Towards Chemical Universal Turing Machines
Stephen Muggleton
Department of Computing, Imperial College London, 180 Queens Gate, London SW7 2AZ
[email protected]

Abstract

Present developments in the natural sciences are providing enormous and challenging opportunities for various AI technologies to have an unprecedented impact in the broader scientific world. If taken up, such applications would not only stretch present AI technology to the limit, but if successful could also have a radical impact on the way natural science is conducted. We review our experience with the Robot Scientist and other Machine Learning applications as examples of such AI-inspired developments. We also consider potential future extensions of such work based on the use of Uncertainty Logics. As a generalisation of the robot scientist we introduce the notion of a Chemical Universal Turing machine. Such a machine would not only be capable of complex cell simulations, but could also be the basis for programmable chemical and biological experimentation robots.

Introduction

Collection and curation of data throughout the sciences is becoming increasingly automated. For example, a single high-throughput experiment in biology can easily generate over a gigabyte of data per day, while in astronomy automatic data collection leads to more than a terabyte of data per night. Throughout the sciences the volumes of archived data are increasing exponentially, supported not only by low-cost digital storage but also by the increasing efficiency of automated instrumentation. It is clear that the future of science involves increasing amounts of automation in all its aspects: data collection, storage of information, hypothesis formation and experimentation. Future advances have the potential to yield powerful new forms of science which could blur the boundaries between theory and experiment. However, to reap the full benefits it is essential that developments in high-speed automation are not introduced at the expense of human understanding and insight.

During the 21st century, Artificial Intelligence techniques have the potential to play an increasingly central and important role in supporting the testing and even formulation of scientific hypotheses. This traditionally human activity has already become unsustainable in many sciences without the aid of computers. This is not only because of the scale of the data involved but also because scientists are unable to conceptualise the breadth and depth of the relationships between relevant databases without computational support. The potential benefits to science of such computerization are high: knowledge derived from large-scale scientific data has the potential to pave the way to new technologies ranging from personalised medicines to methods for dealing with and avoiding climate change (Muggleton 2006).

In the 1990s it took the international human genome project a decade to determine the sequence of a single human genome, but projected increases in the speed of gene sequencing imply that before 2050 it will be feasible to determine the complete genome of every individual human being on Earth. Owing to the scale and rate of data generation, computational models of scientific data now require automatic construction and modification. We are seeing a range of techniques from mathematics, statistics and computer science being used to compute scientific models from empirical data in an increasingly automated way. For instance, in meteorology and epidemiology large-scale empirical data is routinely used to check the predictions of differential equation models concerning climate variation and the spread of diseases.

Machine Learning in Science

Meanwhile, machine learning techniques are being used to automate the generation of scientific hypotheses from data. For instance, Inductive Logic Programming (ILP) enables new hypotheses, in the form of logical rules and principles, to be extracted relative to predefined background knowledge. This background knowledge is formulated and revised by human scientists, who also judge the new hypotheses and may attempt to refute them experimentally. As an example, within the past decade researchers in my group have used ILP to discover key molecular sub-structures within a class of potential cancer-producing agents (Muggleton 1999; Sternberg & Muggleton 2003). Building on the same techniques, we have more recently been able to generate experimentally testable claims about the toxic properties of hydrazine from experimental data: in this instance, analyses of metabolites in rat urine following low doses of the toxin (Tamaddoni-Nezhad et al. 2004).
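To give a flavour of the ILP setting just described, the sketch below is a deliberately tiny, hypothetical illustration in Python: the molecules, functional groups and activity labels are invented and are not taken from the cited studies, and real ILP systems search a far richer space of first-order clauses. It shows the basic contract: background knowledge plus positive and negative examples go in, and a general rule consistent with them comes out.

    # Toy illustration of the ILP setting: induce a rule for which molecules are
    # 'active', relative to background knowledge about their sub-structures.
    # Molecules, groups and labels are invented for illustration.

    background = {                      # background knowledge: has_group(Molecule, Group)
        "m1": {"nitro", "benzene"},
        "m2": {"nitro", "amine"},
        "m3": {"benzene"},
        "m4": {"amine"},
    }
    positives = {"m1", "m2"}            # examples labelled active by the scientist
    negatives = {"m3", "m4"}            # examples labelled inactive

    def covers(group, molecule):
        """Does the rule 'active(M) :- has_group(M, group)' cover this molecule?"""
        return group in background[molecule]

    # Generate-and-test over single-condition hypotheses, keeping those that
    # entail every positive example and exclude every negative example.
    all_groups = set().union(*background.values())
    consistent = [g for g in all_groups
                  if all(covers(g, m) for m in positives)
                  and not any(covers(g, m) for m in negatives)]

    for g in consistent:
        print(f"active(M) :- has_group(M, {g})")   # prints the 'nitro' rule only

In this toy run only the rule conditioned on the nitro group survives; it is then up to the scientist to judge the rule and, if need be, attempt to refute it experimentally.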
In other sciences, the reliance on computational modelling has arguably moved to a new level. In systems biology, the need to account for complex interactions within cells, in gene transduction, signalling and metabolic pathways, is requiring new and richer systems-level modelling. Traditional reductionist approaches in this area concentrated on understanding the functions of individual genes in isolation. However, genome-wide instrumentation, including micro-array technologies, is leading to a system-level approach to biomolecules and pathways and to the formulation and testing of models that describe the detailed behaviour of whole cells. This is new territory for the natural sciences and has resulted in multi-disciplinary international projects such as the virtual E-Cell (Takahashi et al. 2003).

One obstacle to rapid progress in systems biology is the incompatibility of existing models. Often models that account for the shape and charge distribution of individual molecules need to be integrated with models describing the interdependency of chemical reactions. However, differences in the mathematical underpinnings of, say, differential equations, Bayesian networks and logic programs make integrating these various models virtually impossible. Although hybrid models can be built by simply patching two such models together, the underlying differences lead to unpredictable and error-prone behaviour when changes are made.

... yeast survival or death. The robot strategy that worked best (lowest cost for a given accuracy of prediction) not only outperformed two other automated strategies, based on cheapest and random-experiment selection, but also outperformed humans given the same task.

Micro-fluidic robots

One exciting development we might expect in the next 10 years is the construction of the first micro-fluidic robot scientist, which would combine active learning and autonomous experimentation with micro-fluidic technology. Scientists can already build miniaturised laboratories on a chip using micro-fluidics (Fletcher et al. 2002), controlled and directed by a computer. Such chips contain miniature reaction chambers, ducts, gates, ionic pumps and reagent stores, and allow for chemical synthesis and testing at high speed. We can imagine miniaturising our robot scientist technology in this way, with the overall goal of reducing the experimental cycle time from hours to milliseconds. With micro-fluidic technology each chemical reaction not only requires less time to complete, but also requires smaller quantities of input materials, with higher expected yield. On such timescales it should become easier for scientists to reproduce new experiments.

Potential for Uncertainty Logics

One key development from AI is that of formalisms (Halpern 1990) that integrate, in a sound fashion, two of the major branches of mathematics: mathematical logic and probability calculus. Mathematical logic provides a formal foundation for logic programming languages such as Prolog, whereas probability calculus provides the basic axioms of probability for statistical models such as Bayesian networks. The resulting 'probabilistic logic' is a formal language that supports statements of sound inference, such as "The probability of A being true if B is true is 0.7". Pure forms of existing probabilistic logic are unfortunately computationally intractable. However, an increasing number of research groups have developed machine learning techniques that can handle tractable subsets of probabilistic logic (Raedt & Kersting 2004). Although it is early days, such research holds out the promise of sound integration of scientific models from the statistical and computer science communities.
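As a minimal worked example of the kind of statement such a probabilistic logic licenses, take the sentence quoted above, P(A | B) = 0.7, and combine it with two further quantities, P(A | not B) and P(B); the numbers below are invented purely for illustration. The laws of probability then determine P(A) and P(B | A) by total probability and Bayes' rule:

    # Sound probabilistic inference over the statement "P(A|B) = 0.7".
    # The other two quantities are assumptions chosen purely for illustration.
    p_a_given_b = 0.7       # the probability of A being true if B is true
    p_a_given_not_b = 0.2   # assumed probability of A when B is false
    p_b = 0.4               # assumed prior probability of B

    # Law of total probability: P(A) = P(A|B)P(B) + P(A|not B)P(not B)
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

    # Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
    p_b_given_a = p_a_given_b * p_b / p_a

    print(round(p_a, 2), round(p_b_given_a, 2))  # 0.4 0.7

Probabilistic logic frameworks of the kind cited above attach such probabilities to first-order clauses rather than to isolated propositions, which is one source of the computational difficulty mentioned in the text.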
Chemical Universal Turing Machines

Today's generation of micro-fluidic machines is designed to carry out a specific series of chemical reactions, but further flexibility could be added to this toolkit by developing what one might call a 'Chemical Universal Turing Machine' (CUTM). The universal Turing machine devised in 1936 by Alan Turing was intended to mimic the pencil-and-paper operations of a mathematician. A CUTM would be a universal processor capable of performing a broad range of chemical operations on both the reagents available to it at the start and those chemicals it later generates. The machine would automatically prepare and test chemical compounds, but it would also be programmable, thus allowing much the same flexibility as a real chemist has in the lab.

One can think of a CUTM as an automaton connected to a conveyor belt containing a series of flasks: the automaton can move the conveyor to obtain distant flasks, and can mix and make tests on local flasks. Just as Turing's original machine later formed the theoretical basis of modern computation,
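To make the conveyor-belt picture concrete, the following is a minimal sketch of such an automaton in Python. It illustrates the analogy only, not a design from the text: the flask 'tape', the mix and test primitives, the reagent names and the demonstration programme are all assumptions introduced here.

    # Sketch of a CUTM-style controller: a conveyor ('tape') of flasks, a head
    # that the automaton can shift along the conveyor, and two primitive
    # operations on the flask under the head: mixing in a stored reagent and
    # running a test whose outcome can steer the programme.
    # Reagents, the acidity test and the demo programme are hypothetical.

    class ChemicalTuringMachine:
        def __init__(self, n_flasks, reagent_store):
            self.tape = [set() for _ in range(n_flasks)]  # each flask holds a set of chemicals
            self.head = 0                                 # index of the flask under the automaton
            self.store = set(reagent_store)               # reagents available at the start

        def move(self, step):
            """Shift the conveyor so a neighbouring flask sits under the head."""
            self.head = max(0, min(len(self.tape) - 1, self.head + step))

        def mix(self, reagent):
            """Dispense a stored reagent into the current flask."""
            if reagent in self.store:
                self.tape[self.head].add(reagent)

        def test(self, predicate):
            """Measure a property of the current flask; the result drives control flow."""
            return predicate(self.tape[self.head])

    def is_acidic(contents):
        return "acid" in contents

    # Demo programme: acidify alternate flasks, rewind, then test every flask.
    m = ChemicalTuringMachine(n_flasks=5, reagent_store={"acid", "buffer"})
    for i in range(5):
        m.mix("acid" if i % 2 == 0 else "buffer")
        m.move(+1)
    m.head = 0
    outcomes = []
    for _ in range(5):
        outcomes.append(m.test(is_acidic))
        m.move(+1)
    print(outcomes)  # [True, False, True, False, True]

The point of the analogy is programmability: the same flask-and-head hardware, driven by different programmes, could carry out arbitrary sequences of preparation and testing steps, rather than the single fixed series of reactions that today's chips perform.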