Machine Learning Models in Fullerene/Metallofullerene Chromatography

Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies

Xiaoyang Liu

Thesis submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Master of Science
in
Computer Science and Application

Yang Cao, Advisor
Harry C. Dorn
Lenwood S. Heath

August 8, 2019
Blacksburg, VA 24060, U.S.

Keywords: Machine learning, Neural Network, Chromatography, Fullerene, Modeling, Random Forest, XGBoost, Linear Regression, SVM regression, Nearest Neighbor

ABSTRACT

Machine learning methods are now extensively applied in various scientific research areas to build models. Unlike traditional models, machine-learning-based models take a data-driven approach: the algorithms learn, from available data, patterns that are hard to recognize otherwise. Data-driven approaches shift more of the work onto algorithms and computers and can accelerate computation by approaching a problem from an alternative perspective. In this thesis, we explore the possibility of applying machine learning models to the prediction of chromatographic retention behavior. Chromatographic separation is a key technique for the discovery and analysis of fullerenes. In previous studies, differential equation models have been very successful in predicting chromatographic retention. However, most differential equation models require many parameters that must be measured experimentally or computed theoretically, and these are not easy to obtain. Fullerenes and metallofullerenes are rigid, spherical molecules whose cages consist only of carbon atoms. This makes predicting their chromatographic retention behavior, as well as other properties, much simpler than for flexible molecules with many possible conformations.
In this thesis, I propose that the polarizability of a fullerene molecule can be estimated directly from its structure. Structural motifs are used to simplify the model, and the motif-based models provide satisfactory predictions. The data set contains 31,947 isomers with their polarizabilities and is split into a training set containing 90% of the data points and a complementary testing set containing the remaining 10%. In addition, a second testing set of large fullerene isomers is prepared to test whether a model trained on small fullerenes can still give accurate predictions for large fullerenes.

GENERAL AUDIENCE ABSTRACT

Machine learning models can be applied in a wide range of areas, including scientific research. In this thesis, machine learning models are applied to predict the chromatographic behavior of fullerenes from their molecular structures. Chromatography is a common technique for separating mixtures; the separation arises from differences in how strongly different molecules interact with a stationary phase. In real experiments, a mixture usually contains a large family of different compounds, and identifying the target compound requires considerable work and resources. Therefore, models are extremely important for studies of chromatography. Traditional models are built on physical laws and involve several parameters. These physical parameters are either measured experimentally or computed theoretically, and both approaches are time-consuming and difficult to carry out. For fullerenes, my previous studies showed that the chromatography model can be simplified so that only one parameter, the polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes from their structures. The structure of a fullerene is represented by a set of local structures (motifs).
Several types of machine learning models are built and tested on our data set, and the results show that a neural network gives the best predictions.

ACKNOWLEDGEMENT

I would like to express my appreciation to my advisor, Dr. Yang Cao, for giving me the opportunity to pursue a master's degree in computer science. Studying toward a computer science degree opened a new world for me. This project lies at the intersection of computer science and my Ph.D. dissertation research. I also thank my committee members, Dr. Harry Dorn and Dr. Lenwood Heath. I thank Dr. Harry Dorn for his support and understanding of my work and study in computer science. I thank Dr. Lenwood Heath for his help in discussing and revising my thesis. I am very fortunate to have such helpful committee members.

Table of Contents

1. Background
1.1 Chromatography
1.2 High Performance Liquid Chromatography
1.3 HPLC in Fullerenes/Metallofullerenes Separations
2. Machine Learning Models Based on Molecular Geometry
2.1 Machine Learning Models and Applications in Chemistry
2.2 Feature Selection
2.3 Model Selection
3. Conclusion
References

1. Background

Machine learning has developed rapidly over the past few decades [1]. It can be found in news feeds, web search, and image recognition. Machine learning combines statistics and computer science to build models that accomplish tasks that are difficult to achieve with conventional methods. Fundamentally, machine learning algorithms extract information or key features from available data and represent the data set with general models. There are numerous machine learning methods, and it is well established that different models suit different tasks. Generally, tasks fall into two categories: regression and classification [2]. Most machine learning models can be adapted to both. Machine learning models are trained on raw data and then applied to make inferences about unseen data. Training is a major challenge, but advances in algorithms and computing resources now provide several ways to train models sufficiently. Different machine learning methods have different structures that keep them general enough for most tasks. Learning is the step in which a model acquires the hidden key features of the data, and the knowledge extracted from the raw data takes different forms in different models. For example, linear regression describes a linear relationship between the features and the target values, whereas a decision tree applies a hierarchy of rules, each with its own learned parameters, to make decisions under different conditions. The application of machine learning algorithms is also referred to as data mining, the process of extracting key information or features from large amounts of data.
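To make the linear regression case concrete, the following is a minimal pure-Python sketch, not the code used in this thesis. It fits the weights by ordinary least squares, solving the normal equations (X^T X) w = X^T y with Gaussian elimination. The feature vectors stand in for hypothetical structural-motif counts, and the "polarizability" targets are made-up numbers generated from the exact relation y = 2 + 3 x1 + 0.5 x2, so a correct fit should recover those weights.

```python
"""Minimal ordinary-least-squares linear regression in pure Python.

Hypothetical setup: each sample is a vector of structural-motif counts and the
target is a polarizability value. The data below is invented for illustration.
"""
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both at once.
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_regression(features: Sequence[Sequence[float]],
                          targets: Sequence[float]) -> List[float]:
    """Return [w0 (intercept), w1, ..., wd] minimizing the squared error."""
    # Prepend a column of ones so w0 acts as the intercept.
    x = [[1.0] + [float(v) for v in row] for row in features]
    d = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate w0 + sum_i wi * xi for one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


if __name__ == "__main__":
    # Hypothetical motif counts (two motif types per isomer) and targets.
    motif_counts = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
    polarizability = [5.0, 2.5, 5.5, 8.5, 12.0]
    weights = fit_linear_regression(motif_counts, polarizability)
    print([round(w, 6) for w in weights])  # [2.0, 3.0, 0.5]
    print(round(predict(weights, [4, 2]), 6))  # 15.0
```

The normal equations are adequate for a handful of features; for many features or nearly collinear ones, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is the more robust choice.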
Among machine learning algorithms, neural networks, also referred to as deep learning, are the most commonly used technique in scientific research. In the past few decades, various neural networks have been developed. For example, convolutional neural networks have shown their power in image processing. This ability to process images and recognize patterns can also be transferred to chemistry problems that involve chemical structures. Another essential feature is that deep learning can extract features automatically. One of the big challenges of machine learning is feature design, which requires not only machine learning knowledge but also domain knowledge; deep learning is therefore versatile across different studies. Today, given enough data, deep learning models can outperform human experts in some domains. Behind the impressive results of machine learning, mathematics plays an essential role. Machine learning algorithms are fundamentally built on linear algebra, statistics, and programming. The first step in building a machine learning model is to convert the data into vectors, a representation that computer programs can process. Training the model then amounts to solving linear algebra problems defined by the structure of the chosen algorithm. Many machine learning algorithms start from probability: the decision or result is estimated from probability distributions or likelihoods. A machine learning model has a fixed structure but contains several parameters whose values are determined by the data. Learning from a data set means finding the parameter values that suit it, so training a machine learning model means optimizing the parameters of the general model. The development of optimization algorithms has been a central topic in machine learning for decades. A typical method to handle a large amount