The Biological Future of Theoretical Physics
Curtis Callan
Princeton University
The very success of theoretical physics in elucidating the structure of matter at the smallest scales, and of the universe at the largest scales, has made the future course of this discipline uncertain. At the same time, the new ability of biological experiment to produce massive data is creating an urgent need for mathematical frameworks in biology of the kind theoretical physics has traditionally provided for physical science. These overlapping “crises” offer a golden opportunity for both disciplines to collaborate. I will expand on this theme, and sketch some specific examples of how theoretical physicists are taking up this challenge.

Who am I and why am I here? (w. apologies to Adm. Stockdale!)
• I used to be a theoretical particle physicist, writing papers like these:
– Worldsheet Approach to Heterotic Solitons and Instantons
– Brane Dynamics from the Born-Infeld Action
– D-Brane Approach to Black Hole Quantum Mechanics
• But about ten years ago I started writing papers like these:
– Precise physical models of protein-DNA interaction from high-throughput data
– Information capacity of genetic regulatory elements
– Quantifying selection in immune receptor repertoires
• I am occasionally asked why I did this: why did I “switch” to biology?
– Not really because of any dissatisfaction with “mainstream” theoretical physics.
– Rather, pioneering biophysics colleagues (Bialek, Leibler) showed me that biology poses fascinating questions for theory, and that the time is ripe to attack them.
• On reflection, though, it seems to me that biology and theoretical physics have both come to a “crisis” that offers opportunities for both subjects.
– My purpose tonight is to explain what I mean by this and to give you some concrete notion of what theoretical physicists are doing to respond.
– I am here for the “Quantitative Immunology” program (largely populated by theoretical physicists) … my talk may explain why KITP is hosting such an event!

Theoretical Physics is Hugely Ambitious!
It comprehends a lot. But some things escape its net:
It describes how our universe came from the Big Bang.
It describes the behavior of inanimate matter everywhere in our universe.
Thanks to NASA for the image.

The primary task of theoretical physics
• To discover the fundamental, mathematically expressed Laws of Nature … as well as the “stuff” that obeys those laws … in our universe (?).
• These laws are valid within broad domains of phenomena and are in some sense “simple”; they often fit handily on a postcard in shorthand form.
• They were discovered in steps over recent centuries:
– Newtonian mechanics & the gravitational force law (1670s)
– Maxwell’s equations of E&M (1860s) (and special relativity)
– Einstein’s general relativity (gravity as dynamical geometry) (1910s)
– Quantum mechanics of electrons, atoms and molecules (1920s)
– Quantum field theory and the Standard Model of everything (1970s)
– “Big Bang” cosmology and a theory of the origin of our universe (2000s)
• The culmination of this development, the outcome of an explosion of discovery over the last 50 years, is the Standard Model of our world.
– It is a precise mathematical theory whose scope is … everything.
– Some modesty is of course in order, but a victory lap or two is justified.

The Standard Model Consensus
Fifty years ago, there was no agreement on the fundamental nature of matter, on the physical theory governing that matter, or on how the universe worked. Over time, a “picture” and “theory” of matter and the universe came together (on two tracks):
The particle physics track: A specific quantum field theory of the strong, weak and electromagnetic interactions between three “generations” of point-like quarks and leptons; forces mediated by “gauge” gluons, photons, and the W/Z, with a Higgs boson doing symmetry breaking and mass generation. Some 21 parameters (that could have been different) completely define the whole thing. Total agreement with many and varied experiments. The search for “constituents” could stop here!
The cosmology track: The recession of the galaxies suggests a “Big Bang” origin for the universe. The exploration of its thermal “afterglow” (the cosmic microwave background, now at 2.725 K) revealed how the universe was constituted in its earliest seconds of existence. Two surprises came out: the matter of particle physics is there, but is a minority player in the energy census; “dark matter” and “dark energy” dominate, but seem to act on the world only via gravity. This explains many puzzles of astronomy and also how today’s universe coalesced from the Big Bang.

The Standard Model of particle physics
We have a very specific quantum field theory of how these point-particle constituents interact. The quarks are not seen directly: they are “confined”, combining in triples to make neutrons, protons, etc. Our ability to calculate specific results from this theory is limited, but good enough to convince us of its accuracy.
Graphic credit: Particle Data Group

The Standard Model of cosmology
[Figure: timeline of the universe from the BIG BANG, marking how far back we can see emitted photons and where density ripples start to grow.]

A remark on the method of modern physics
Millions of data points in …
A few model parameters out!
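As a toy illustration of the slogan “millions of data points in, a few model parameters out”, here is a minimal sketch assuming a straight-line model with independent Gaussian noise; this is not the actual cosmological or LHC analysis. For Gaussian noise, maximizing the likelihood is the same as least squares, so `numpy.linalg.lstsq` recovers the two hidden parameters. The “true” values, the noise level and the sample size are hypothetical.

```python
import numpy as np

def infer_line(x, y):
    """Maximum-likelihood (slope, intercept) under i.i.d. Gaussian noise.

    With Gaussian noise the log-likelihood is minus the sum of squared
    residuals (up to constants), so ML estimation reduces to least squares.
    """
    A = np.column_stack([x, np.ones_like(x)])     # design matrix [x, 1]
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept

rng = np.random.default_rng(0)
true_slope, true_intercept = 2.0, -1.0            # hypothetical hidden parameters
x = rng.uniform(0.0, 1.0, size=1_000_000)         # a million measurements...
y = true_slope * x + true_intercept + rng.normal(0.0, 5.0, size=x.size)  # ...each very noisy

slope, intercept = infer_line(x, y)
print(f"slope ~ {slope:.3f}, intercept ~ {intercept:.3f}")
```

Each single point is dominated by noise (standard deviation 5, versus a signal range of about 2), yet the two parameters come out sharply because the model linking them to the data is simple.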
In cosmology and particle physics we do not directly measure the “hidden variables” of interest. Given a simple underlying model for how they affect the noisy data sets we do measure, we use statistical inference to “see through the noise” into the underlying physical parameters. This is a nearly universal method in modern physical science. One of my messages is that biology is now entering this era.

Massive resources were deployed to get here
[Images: the WMAP satellite; the LHC accelerator at CERN; the CMS detector at the LHC.]
This consensus picture is the outcome of an enormous effort (intellectual, experimental, sociological, financial) carried out over a period of 50 years. This sustained effort to answer what amounts to a question of “natural philosophy” is remarkable, and a credit to the human race.

Where does theoretical physics go from here?
• There are loose ends, but the historic program has succeeded so well that it has put its own continuation into serious question!
• On the conceptual side, we seem to be at a turning point:
– Once you are down to point-particle constituents of matter, you can’t really “explain” them in terms of more fundamental entities: is the game over?
– There could be more massive point constituents that even the LHC can’t see (and neutrino masses and dark matter point that way … obscurely).
– Going deeper, in the light of string theory, we can see a natural reductionist limit point at the Planck scale, way beyond direct experimental reach.
• On the experimental side, exploring the relevant energy scales is increasingly costly, and surely approaching societal limits.
– The LHC is speaking; we hope for surprises, but we may “only” get a completion of the Standard Model, not a view of deeper physical law.
– Looking beyond the Standard Model, or into inflation, the future experimental projects are in the 10^10 euro class … will taxpayers support them?
– And, in the long run, theory without experiment is not sustainable.
• So, life is going to be hard for fundamental theory from here on. Not impossible, to be sure, but are there other paths to take?

The “other agenda” of theoretical physics
• Discovering the fundamental laws is the historic core mission of theoretical physics, but that’s not all theoretical physicists do!
• In addition, we want to explain phenomena that are not directly baked into the fundamental laws; we call them emergent phenomena.
– E.g. show that superconductivity (a macroscopic quantum effect) follows from the Schrödinger equation for many electrons moving in a host atomic lattice.
– Or prove “confinement”, namely that the quarks of QCD can never be seen outside the hadrons (neutrons and protons) of which they are constituents.
• Better yet, we want to predict unknown emergent phenomena (i.e. derive them from fundamental law) before their experimental discovery:
– Our record on this is not good. The quantized Hall effect is a striking quantum effect and could have been predicted. But it wasn’t: a failure of imagination?
– It doesn’t happen often, but it is not impossible: cosmologists did anticipate Big Bang phenomena like the CMB. Topological insulators were also anticipated.
• The good news is that we have (so far) found that our fundamental laws are able to explain new “emergent” phenomena, usually after the fact …

Life: the “emperor” of all emergent phenomena
• There are phenomena of fundamental importance which are certainly described by the already-known fundamental laws … but whose derivation from those laws completely escapes us.
• The chief of these is Life. There are good reasons why it is time to bring the domain of living matter into the realm of predictive, mathematical science:
– How “living matter” is governed by physical principles has always been a question for theoretical physics, but inadequate data tied our hands.
– The ongoing explosion of quantitative biological data (high-throughput sequencing, expression profiling, …) has created a totally new context for this issue.
– On the biology side, it is becoming clear that we need mathematical frameworks (like we have in physics) to extract meaning from the growing mass of data.
• Developing this kind of theory is what theoretical physicists do, and it is a major intellectual challenge, on a par with our quest for fundamental laws.
– A quick tour of the past, present and future of this enterprise will, I hope, give you a more concrete idea of what I am talking about.
• It used to be said that there is no theory in biology … the day may be coming when there is no biology without theory!

The theoretical physics of Life: past
Historical instances of using theoretical principles to illuminate and make predictions about a broad class of biological phenomena include:
• Schrödinger’s “What is Life?”: genes are carried by a polymer molecule.
– Basic quantum mechanics and the known rate of induction of mutations by X-rays (Muller, 1927) led him to the conclusion that genes had to be carried by a molecule. Many genes → information → linear polymer molecule of heredity.
• Hopfield and Ninio’s “kinetic proofreading” for DNA replication accuracy.
– Boltzmann equilibrium statistics and the binding energy differences between base pairs would give a copying error rate of 1 per 10^4 bases per generation. The actual rate is more like 1 per 10^8, and this led them to propose an energy-consuming enzyme that checks fidelity and corrects mistakes (a single proofreading step that re-uses the same discrimination can, ideally, square the error rate: (10^-4)^2 = 10^-8). Found!
• Berg and Purcell’s explanation of how bacteria locate their next meal.
– Bacteria must move up a density gradient to find a food source (in the real world). B&P showed that a bacterium’s size and the physics of diffusion mean that a bacterium can’t measure the food gradient across its body. How does it know which way is up? Bacteria perform a funny “run and tumble” random walk: as long as the density is increasing, go straight … else pick a new direction at random. True!

The theoretical physics of Life: future
Big theoretical questions are waiting in the wings. They are way beyond our grasp today but not, I trust, forever! My favorite examples are:
• What is it about non-equilibrium statistical mechanics that makes it possible for populations of distinct, stably reproducing entities to arise? We hardly know where to begin in trying to answer this question.
• Are there equations describing the evolution of organisms with thousands of genes; can they be solved, and can anything be said about their global behavior?
• Can we capture the dynamics of a cell? Can equations accurately describe its behavior, given that a cell has thousands of interacting parts? Can we see tissue types as basins of attraction of a large dynamical system?
• Brains carry out tasks that have an abstract representation. Can dynamical network models capture the processing power of a human brain (or eye)?
• These are some of the “big questions” that a future theoretical physics of biology will want to tackle. Today we are doing “warm-up exercises” to prepare ourselves for the big task that lies ahead.

The theoretical physics of Life: present
Theoretical physicists, once they get hooked on biology, try to study issues that cut across kingdoms (“from bacteria to brains”), are suitable for nontrivial mathematical analysis, and can be illuminated by modern biological “big data”:
I: Biological entities respond to stimuli, but noise limits how much they can know. Do quantitative measures of information (Shannon bits) give us new views of biological function/fitness?
II: Biological function often involves probability distributions (pdf’s) on high-dimensional data spaces. These pdf’s must usually be “learned” from sparse data. How can this be done?
III: Could “high-throughput” biological experimental data give access to the molecular “wheels and gears” of cellular function (using a strategy similar to what is common in particle physics and cosmology)?
I will give you a (superficial) tour of what theoretical physicists are doing to attack a few specific problems that fall under these general headings. I will try to show in what way “theoretical physics” and “biological big data” are coming together.

Quick study: how genes are turned on and off
[Figure: a transcription factor (TF) protein at concentration [c] regulates the expression level [g] of a target gene; panels show the mean output and the output noise versus [c].]
A special DNA site (the promoter) is occupied by the TF with occupancy τ, driven by the TF concentration [c] and the DNA binding energy ε.
Occupancy τ enhances RNA polymerase (RNAP) binding and drives transcription, giving a typical on/off mean response.
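A minimal sketch of how occupancy τ could depend on [c] and ε, assuming the simplest equilibrium two-state site (empty or bound) whose bound state carries a Boltzmann weight. The reference concentration `c0`, the choice of energy units (kT), and the Hill coefficient `n` (a stand-in for cooperative binding) are illustrative assumptions, not parameters given in the talk.

```python
import numpy as np

def occupancy(c, eps, c0=1.0, n=1.0):
    """Mean promoter occupancy tau of a two-state (empty/bound) site.

    c   : TF concentration [c], in units of the hypothetical reference c0
    eps : binding energy in units of kT (more negative = tighter binding)
    n   : hypothetical Hill coefficient; n > 1 mimics cooperative binding
    """
    # Boltzmann weight of the bound state relative to the empty state
    w = (np.asarray(c, dtype=float) / c0) ** n * np.exp(-eps)
    return w / (1.0 + w)

c = np.logspace(-2, 2, 9)                          # four decades of [c]
print(np.round(occupancy(c, eps=0.0), 3))          # graded response
print(np.round(occupancy(c, eps=0.0, n=5.0), 3))   # sharper, switch-like
```

Lowering ε shifts the midpoint of the curve to lower [c], while raising `n` steepens it toward an on/off switch. This sketch gives only the mean response; the noise around it is a separate ingredient.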
But diffusion and small numbers of molecules make the output noisy.

(I) Building a fly embryo bit by bit
Development is a cascade of transcription factor (TF) proteins. Hunchback distinguishes the thorax from the abdomen. This decision is driven by the level of maternally supplied Bicoid. This pattern repeats in a TF cascade (Bcd → Hb → Kr, Gt, Kn, …) leading to the expression “stripes” needed to make a multi-segment body.
[Figure: maternally produced Bicoid; Hunchback expression.]
Nuclei must know “where they are” to make a cell fate choice (thorax vs. abdomen). [Bcd] is the signal: if [Bcd] is big enough, Hb is expressed.
[Hb] and [Bcd] levels in tens of thousands of nuclei can be measured by fluorescent immunostaining (ditto for other TFs). Thus, we can measure the distribution of the input/output pairs P([Hb], [Bcd]) over the nuclei in hundreds of eggs. What we see is a noisy switch, like the generic on/off response described earlier. Can it do more than signal just “on” vs. “off”?
Given two variables g and c with joint pdf P(g,c), their correlation is best quantified by “mutual information” (MI). It is non-negative (zero only if g and c are independent) and measured in “bits”. One bit means that knowing [g] will tell you just whether [c] was “high” or “low”; more bits mean finer discrimination is possible. The joint pdf P([Hb],[Bcd]) has been measured, so we can evaluate MI([Hb],[Bcd]) from data. We find MI_data = 1.5 bits. So, the fruit fly nucleus is more than a simple on/off switch! … Why?
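To show what “evaluating MI from data” involves, here is a minimal plug-in estimator that bins (input, output) pairs into a joint histogram and applies I = Σ P(g,c) log2[P(g,c) / (P(g)P(c))]. The synthetic “noisy switch” data, noise level and bin count are hypothetical. Plug-in estimates are biased upward when data are sparse, so a real analysis needs bias corrections that are not shown here.

```python
import numpy as np

def mutual_information_bits(joint):
    """Mutual information (bits) of a discrete joint distribution P(g, c).

    joint : 2-D array of non-negative counts or probabilities
            (rows = g bins, columns = c bins); normalized here.
    """
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()
    pg = p.sum(axis=1, keepdims=True)      # marginal P(g), shape (n, 1)
    pc = p.sum(axis=0, keepdims=True)      # marginal P(c), shape (1, m)
    nz = p > 0                             # convention: 0 * log 0 = 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pg @ pc)[nz])))

# Hypothetical noisy on/off switch: output g only "knows" whether c > 0.5.
rng = np.random.default_rng(1)
c = rng.uniform(0.0, 1.0, 100_000)
g = (c > 0.5).astype(float) + rng.normal(0.0, 0.2, c.size)
counts, _, _ = np.histogram2d(g, c, bins=20)
print(f"estimated MI ~ {mutual_information_bits(counts):.2f} bits")
```

Because this synthetic output depends on the input only through a binary decision, its MI cannot exceed 1 bit. A measured value such as 1.5 bits is therefore evidence of more than a simple switch.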
Here is where theoretical physics comes in! We can ask how big MI([Hb],[Bcd]) could be, given the known input/output rule of the nucleus. This is a variational problem with a simple solution. We find MI_max = 1.7 bits (the true MI is very close to optimal). But we also get the distributions of TF concentrations that achieve this optimum, and they match the data.
[Figure: OFF and ON distributions; black is data, red is the optimal distribution. About 20% of nuclei are neither “on” nor “off”, and this is crucial to having MI > 1 bit.]

The larger theoretical challenge: role of information in the gap gene network
[Figure: expression of bicoid and caudal, and of the gap genes hunchback, kruppel, knirps and giant; the gap genes express in stripes.]
[Figure: gap gene network in D. melanogaster.]
Joint expression patterns of {hb, kr, kn, gt} could be a code for “where you are” on the anterior-posterior axis. NB: neighboring nuclei have different states!
Nuclei need to “locate” themselves among about 100 rows, which requires log2(100) ≈ 6.6 bits. At 1.5 bits (or so) per gene, we need about 6.6/1.5 ≈ 4 readout genes … exactly the number of gap genes. Information optimization can be used to “derive” the regulatory network.

(II) Probability distributions & immune diversity
• T and B cells implement adaptive immunity. These cells have surface receptors to recognize pathogens floating loose or (for T cells) infecting our cells.
• A T/B cell that recognizes a pathogen proliferates to create a big clone to clear the infection. A “memory” clone is left to protect against re-infection.
• New T/B cells are made by stem cells. In each creation event, the germline DNA for the receptor is “edited” to create a new, unique receptor gene.
• You have ~10^7 unique, randomly generated immune cell types in your body. Can we quantify this diversity and understand how it is generated?
• By harvesting T cells, extracting and amplifying DNA, and then sequencing, we obtain more than 10^5 distinct examples of receptor genes from one blood sample.
• There is virtually no overlap between the repertoires of different individuals. We need to understand their statistics in order to learn anything useful.

The genome editing event is carried out by a few DNA repair enzymes. There are only a few “moves” they can apply to the germline DNA. Any new receptor gene sequence σ is the result of a “scenario” E, a set of values for these actions.
[Figure: the recombination moves.]
• Choose possible V, D, J gene segments.
• Chew away some number of bases from each.
• A bit exotic … (don’t ask!)
• Insert some number of bases between genes.
• Account for different insertion probabilities.
For each type of move there is an unknown probability distribution: each V gene has its own likelihood of choice, ditto for each number of VD or DJ insertions, etc. We want to infer a pdf for the generative scenarios. To do so, we assume a plausible structure for it, with one component distribution per type of move.
N.B. Many “scenarios” can yield the same sequence read σ.
Pgen(σ) is the net probability that the result σ is produced in one stem cell event: Pgen(σ) = Σ_{E→σ} P(E), summed over all scenarios E that yield σ. Once we know the right component pdf’s we will be able to evaluate this hidden variable.
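To make Pgen(σ) = Σ_{E→σ} P(E) concrete, here is a toy sketch that enumerates every scenario of a drastically simplified recombination model and sums the probabilities of those that produce a given read. The model has a V choice, deletions from the V end, uniformly random inserted bases and a J choice, with no D segment. All gene sequences and probabilities are hypothetical, chosen only so that several scenarios collide on the same read. Brute-force enumeration like this would not scale to real receptor models.

```python
from itertools import product
from collections import defaultdict

# Hypothetical toy germline segments and move probabilities (illustration only).
V_GENES = {"Va": "CAGT", "Vb": "CAG"}
J_GENES = {"Ja": "TTC", "Jb": "GTTC"}
P_V = {"Va": 0.6, "Vb": 0.4}
P_J = {"Ja": 0.7, "Jb": 0.3}
P_DEL_V = {0: 0.5, 1: 0.3, 2: 0.2}   # bases chewed off the V 3' end
P_INS = {0: 0.4, 1: 0.35, 2: 0.25}   # number of bases inserted between V and J
NUCLEOTIDES = "ACGT"                 # inserted bases assumed uniform (1/4 each)

def scenarios():
    """Yield (scenario E, probability P(E), resulting sequence sigma)."""
    for v, d, n, j in product(P_V, P_DEL_V, P_INS, P_J):
        for ins in product(NUCLEOTIDES, repeat=n):
            p = P_V[v] * P_DEL_V[d] * P_INS[n] * 0.25 ** n * P_J[j]
            v_seq = V_GENES[v][: len(V_GENES[v]) - d]   # apply the deletion
            sigma = v_seq + "".join(ins) + J_GENES[j]
            yield (v, d, "".join(ins), j), p, sigma

def pgen_table():
    """Pgen(sigma) = sum of P(E) over all scenarios E that produce sigma."""
    pgen = defaultdict(float)
    explain = defaultdict(list)
    for E, p, sigma in scenarios():
        pgen[sigma] += p
        explain[sigma].append(E)
    return pgen, explain

pgen, explain = pgen_table()
sigma = "CAGTTC"
print(f"Pgen({sigma}) = {pgen[sigma]:.4f} from {len(explain[sigma])} scenarios")
for E in explain[sigma]:
    print("   ", E)
```

In this toy the read CAGTTC is produced by eight distinct scenarios. That is why the scenario is a hidden variable: the read alone does not reveal which editing history produced it.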
Key theoretical idea: the true component probability distributions are those that maximize the likelihood of the repertoire (the product of Pgen(σ) over all data sequences). Easy search problem! Results for some of the distributions:
[Figure panels: “Number of insertions”; “Pgen distribution & shared sequences”. Error bars: variance over 10 individuals.]
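The talk does not say which search method performs this likelihood maximization. One standard choice when every data point has hidden scenarios is expectation-maximization (EM), which is used here purely as an illustrative stand-in. The toy model is hypothetical and has only two moves: a choice between two V segments of different length, then 0 to 2 uniformly random inserted bases. With these moves, a read such as ACGT can come from either V. The E-step weights each read's compatible scenarios by their posterior probability. The M-step re-estimates the component distributions from those weights. `Counter` collapses identical reads, so each unique sequence is processed once.

```python
from collections import Counter
import numpy as np

V_GENES = ["ACGT", "ACG"]   # hypothetical; "ACGT" + "" and "ACG" + "T" collide
MAX_INS = 2

def compatible(sigma):
    """All hidden scenarios (v, n) that can produce read sigma."""
    return [(v, len(sigma) - len(seq)) for v, seq in enumerate(V_GENES)
            if sigma.startswith(seq) and 0 <= len(sigma) - len(seq) <= MAX_INS]

def weights(sigma, p_v, p_n):
    # P(E) for each compatible scenario; each inserted base has probability 1/4
    return np.array([p_v[v] * p_n[n] * 0.25 ** n for v, n in compatible(sigma)])

def log_likelihood(counts, p_v, p_n):
    # log of the product over reads of Pgen(sigma) = sum of P(E) over scenarios
    return sum(k * np.log(weights(s, p_v, p_n).sum()) for s, k in counts.items())

def em(reads, iters=100):
    counts = Counter(reads)
    p_v = np.full(len(V_GENES), 1.0 / len(V_GENES))   # uniform starting guess
    p_n = np.full(MAX_INS + 1, 1.0 / (MAX_INS + 1))
    history = [log_likelihood(counts, p_v, p_n)]
    for _ in range(iters):
        cv, cn = np.zeros_like(p_v), np.zeros_like(p_n)
        for s, k in counts.items():
            w = weights(s, p_v, p_n)
            w = k * w / w.sum()                   # E-step: posterior over scenarios
            for (v, n), wi in zip(compatible(s), w):
                cv[v] += wi
                cn[n] += wi
        p_v, p_n = cv / cv.sum(), cn / cn.sum()   # M-step: normalized expected counts
        history.append(log_likelihood(counts, p_v, p_n))
    return p_v, p_n, history

# Synthetic repertoire drawn from hypothetical "true" parameters.
rng = np.random.default_rng(2)
true_p_v = np.array([0.7, 0.3])
true_p_n = np.array([0.5, 0.3, 0.2])
reads = []
for _ in range(5000):
    v = rng.choice(len(V_GENES), p=true_p_v)
    n = rng.choice(MAX_INS + 1, p=true_p_n)
    reads.append(V_GENES[v] + "".join(rng.choice(list("ACGT"), size=n)))

p_v, p_n, history = em(reads)
print("P(V) ~", np.round(p_v, 3), "  P(n_ins) ~", np.round(p_n, 3))
```

EM never decreases the likelihood from one iteration to the next. This is why it suits the problem: the hidden scenario is handled by averaging over it, so it never has to be guessed.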
The total possible diversity is enormous: something like 10^13 sequences. The unique clones we observe come nowhere near sampling this diversity. We can get sharp results because we assume that the sequence diversity has a simple hidden source.

Why did I tell you these particular stories?
First, each is a specific instance of a broader class of conceptually similar problems spanning the tree of life, for which big data are available.
Theme I: Information transmission problems are ubiquitous, and data adequate for quantifying how well information is transmitted can now be collected in many systems.
• Higher organisms convert sensory inputs to spikes on neurons. Reconstructing a sensory input from the spike train is an information problem.
• Eukaryotic signaling converts an external chemical signal into the phosphorylation state of internal messenger proteins, which can be quantified with mass spectrometry.
Theme II: Correlated probability distributions on high-dimensional data, and the problem of learning them from sparse data, are everywhere.
• A given visual scene produces a noisy train of “spikes” on the optic nerves. P(spikes|scene) is the mother of all high-dimensional pdf’s.
• The same enzymatic function is provided in different species by proteins with different sequences. Is there a P(aa seq|protein type) to be learned?
Second, they exemplify in various ways the theoretical physics method of “understanding” complex phenomena through inference of a much simpler underlying “hidden” (mathematically expressed) mechanism.

Back to the future (of science)
• It seems blindingly obvious that biology will need some kind of general mathematical framework to organize the data flood that is in the offing.
• I am very skeptical that generic “data mining” approaches (à la Google X) will do the job. Finding and using simpler underlying structures will be key.
• The deep problem is figuring out what complexities can be ignored in creating an “adequate” theory or model of a cell (or brain). Do the “coarse graining” and “universality” that work in physics have a role in biology?
• We need to rise above specific models for each specific biological system. I don’t claim that physicists are uniquely equipped for this task, but they are naturally inclined to look for the generality we really need.
• The concrete examples I described in the talk are a pretty pale imitation of the sort of analysis we will eventually need. They are just first steps.
• The problem is a major intellectual challenge, just as hard and deep as the fundamental physics challenges we have solved in the past. Accepting it will ensure the future vitality of theoretical physics, and of biology.

Acknowledgements
Collaborators on Themes 1, 2 and 3: Gasper Tkacik (IST Austria), Thierry Mora (ENS), Justin Kinney (CSHL), Bill Bialek (Princeton), Aleksandra Walczak (ENS), Ted Cox (Princeton), Zach Sethna (Princeton), Anand Murugan (Stanford)
Physics Dept. & Lewis-Sigler Institute, Princeton University
Funding: NSF, NIH, Keck Foundation