Machine Inspired Synthetic Biology:

Neuromorphic Computing in Mammalian Cells

by

Andrew Moorman

B.Arch.

Cornell University, 2017

Submitted to the MIT Department of Architecture and the Department of Electrical Engineering

and Computer Science in Partial Fulfillment of the Requirements for the Degrees

of

Master of Science in Architecture Studies and

Master of Science in Electrical Engineering and Computer Science

at the

Massachusetts Institute of Technology
February 2020
© 2020 Massachusetts Institute of Technology. All rights reserved

Signature of Author ...... MIT Department of Architecture MIT Department of Electrical Engineering and Computer Science January 17, 2020

Certified by ...... Ron Weiss Professor of Biological Engineering and Electrical Engineering and Computer Science Thesis Supervisor

Certified by ...... Skylar Tibbits Associate Professor of Architecture Thesis Supervisor

Accepted by ...... Leslie K. Norford Professor of Building Technology Chair, Department Committee on Graduate Students

Accepted by ...... Leslie A. Kolodziejski Professor of Electrical Engineering and Computer Science Chair, Department Committee on Graduate Students

Machine Inspired Synthetic Biology: Neuromorphic Computing in Mammalian Cells by Andrew Moorman

Submitted to the Department of Architecture and Department of Electrical Engineering and Computer Science on February 17, 2020, in partial fulfillment of the requirements for the degrees of Master of Science in Electrical Engineering and Computer Science and Master of Science in Architecture Studies

Abstract

Synthetic biologists seek to collect, refine, and repackage nature so that it's easier to design new and reliable biological systems, typically at the cellular or multicellular level. These redesigned systems are often referred to as "biological circuits," for their ability to perform operations on biomolecular signals, rather than electrical signals, and for their aim to behave as predictably and modularly as would integrated circuits in a computer. In natural and synthetic biological systems, the abstraction of these circuits' behaviors to digital logic is often appropriate, especially in decision-making settings wherein the output is selected to coordinate a discrete set of outcomes, e.g. developmental networks or disease-state classification circuits. However, there are challenges in engineering entire genetic systems that mimic digital logic. Biological species do not generally exist at only two possible concentrations but vary over an analog range of concentrations, and are ordinarily uncompartmentalized in the cell. As a result, scaling biological circuits which rely on digital logic schemes can prove difficult in practice. Neuromorphic devices represent a promising computing paradigm which aims to reproduce desirable, high-level characteristics inspired by how the brain processes information - features like tunable signal processing and resource-efficient scaling. They are a versatile substrate for computation, and, in engineered biological systems, marry the practical benefits of digital and analog signal processing. As the decision-making intelligence of engineered-cell therapies, neuromorphic circuits could replace digital logic schemes with a modular and reprogrammable analog template, allowing for more sophisticated computation using fewer resources. This template could then be adapted either externally or autonomously in long-term single-cell medicine.
Here, I describe the implementation of in-vivo neuromorphic circuits in human cell culture models as a proof-of-concept for their application to personalized medicine. While biology has long served as inspiration for the artificial intelligence community, this work will help launch a new, interactive relationship between the two fields, in which nature offers more to AI than a helpful metaphor. Synthetic biology provides a rigorous framework to actively probe how learning systems work in living things, closing the loop between traditional machine learning and naturally intelligent systems. This thesis offers a starting point from which to pursue cell therapeutic strategies and multi-step genetic differentiation programs, while exposing the inherent learning capabilities of biology (e.g., self-repair, operation in noisy environments, etc.). Simultaneously, the results included lay groundwork to analyze the role of machine learning in medicine, where its difficult interpretability contradicts the need to guarantee stable, safe, and efficacious therapies. This thesis should not only spur future research in the use of these approaches for personalized medicine, but also broaden the landscape of academics who find interest in and relevance to its concerns.

Thesis Supervisor: Ron Weiss Title: Professor of Biological Engineering and Electrical Engineering and Computer Science

Thesis Supervisor: Skylar Tibbits Title: Associate Professor of Architecture

Acknowledgments

I would first like to thank my thesis advisors Prof. Ron Weiss and Prof. Skylar Tibbits, whose doors and minds were always open despite the broad and somewhat audacious interdisciplinary scope of this topic. Only because of their patience, trust, and confidence was I able to discover a new world of interest in biology. I would then like to acknowledge my colleagues in the MIT Weiss Lab. I especially thank Christian Cuba-Samaniego and Wenlong Xu for their contributions to the project SBIML, as well as to my growth as a synthetic biologist. I owe much of my progress and excitement in the field to their inspiration and mentorship, and my sanity to their counseling during challenging times. I am grateful to the United States Department of Defense Advanced Research Projects Agency (DARPA) as well as the MIT Department of Architecture for providing funding during my graduate education at MIT. I also acknowledge with gratitude the Department of Biological Engineering administrators Olga Parkin and Darlene Ray for their persistence and help. I thank my sister, Aubry White, for her friendship and for adding four members to my family and support network, including Eli, Callen, and Isabel White, during my tenure at MIT. I must also thank my girlfriend, Jessica Jiang, whose unrelenting patience and companionship have been the foundation of my success. Finally, I must express my very profound gratitude to my parents Bob and Jody Moorman for providing me with unconditional support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

Author Andrew Moorman

Contents

1 Introduction 15
1.1 What is Design in Synthetic Biology? 15
1.2 Cells as Computers 17
1.3 Non-Classical Computation in Dynamical Systems 19
1.3.1 The Computational Behavior of Dynamical Systems 19
1.3.2 Molecular Computation: A Simple Example 21
1.4 Cell Therapies as Computations 24
1.5 Neuromorphic Computing in Cell Therapies 26

2 Molecular Neural Network Computing in Mammalian Cells 29
2.1 A Brief Review of Artificial Neural Networks 29
2.2 On Combinatorial Transcriptional Regulation 30
2.2.1 A Brief Review of Transcriptional Regulation 30
2.2.2 A Transcriptional Artificial Neuron 32
2.3 On Bio-Molecular Sequestration 36
2.3.1 A Brief Review of Molecular Sequestration 36
2.3.2 A Sequestration-Based Artificial Neuron 37
2.3.3 Sequestration-Based Molecular ANNs to Solve the XOR Problem 40
2.4 Conclusions 41
2.5 Methods 43
2.6 Supplementary Proofs 44
2.6.1 Sequestration-Based Biomolecular Perceptron Model 44
2.6.2 Sequestration-Based Biomolecular Neural Network 47

3 Molecular Learning and Adaptation in Mammalian Cells 51
3.1 A Genetic Circuit for Learning in Mammalian Cells 53
3.1.1 Circuit Design 53
3.1.2 Discussion 54
3.1.3 Description of Model 56
3.2 A Stability Analysis for Population-Scale Learning 60
3.2.1 Preliminaries 60
3.2.2 Stability Analysis of a Perceptron Ensemble 64

4 Conclusion 67

A Tables 71

B Figures 73

List of Figures

B-1 Personalized medicine increases in therapeutic precision from group level to single-cell 'living drugs.' 74
B-2 Neuromorphic computing fits within the 'Sense-Compute-Effect' pipeline for engineered-cell therapies. 75
B-3 A general schematic for the operation of neuromorphic gene circuits in mammalian cells. 76
B-4 Circuit design schematic for a three-input artificial neuron computation using transcriptional regulation. 77
B-5 A general functional schematic for the computation performed by a hybrid promoter. 78
B-6 Steady-state fluorescence measurements for a three-input hybrid circuit with the corresponding log-scale reference computation. 79
B-7 Reaction diagrams for molecular binding and sequestration. 80
B-8 Pictorial depiction of molecular sequestration as an artificial neuron. 81
B-9 Schematic for the use of control to regulate the input of species to molecular sequestration. 82
B-10 Transcriptional regulation may be folded into the sequestration framework as a non-linear feature map, φ(B_i, w_i), on input fluxes, B_i, induced by TFs. 83
B-11 Schematic and experimental data displaying thresholding of mRNA by the CasE endoRNase through molecular sequestration. 84
B-12 Circuit diagram for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). 85

B-13 Steady-state fluorescence measurements and the corresponding log-scale and linear-scale reference computations for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). 86
B-14 Steady-state fluorescence measurements and cross-sectional ReLU plots for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). 87
B-15 Steady-state fluorescence measurements for increasing threshold values for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). 88
B-16 Circuit diagram for a three-input sequestration-based artificial neuron with one positive input (eYFP) and two negative inputs (CasE). 89
B-17 Steady-state fluorescence measurements and the corresponding log-scale and linear-scale reference computations for a three-input sequestration-based artificial neuron with one positive input (eYFP) and two negative inputs (CasE). 90
B-18 Steady-state fluorescence measurements for increasing threshold values for a three-input sequestration-based artificial neuron with one positive input (eYFP) and two negative inputs (CasE). 91
B-19 Steady-state fluorescence measurements and the corresponding linear-scale reference computations for differently weighted variants of a three-input sequestration-based artificial neuron. 92
B-20 Circuit design for a two-layer sequestration-based ANN featuring two nodes which computes an analog version of the XOR logical function. 93
B-21 Steady-state fluorescence measurements for a two-layer, two-node sequestration-based ANN computing XOR, plotted for four increasing offset values. 94
B-22 Circuit design for a two-layer sequestration-based ANN featuring three nodes which computes an analog version of the XNOR logical function. 95
B-23 Steady-state fluorescence measurements for a two-layer, three-node sequestration-based ANN computing XNOR, plotted for four increasing offset values. 96
B-24 Circuit design schematic for a three-input ANN using multi-modal transcriptional and post-transcriptional regulation. 97
B-25 Steady-state fluorescence measurements for A) a two-layer, two-node multi-modal ANN compared to B) unimodal (post-transcriptional) regulation. 98
B-26 Simplified circuit schematic for achieving molecular learning in cells. 99
B-27 A circuit design for stoichiometry tuning in mammalian cells. 100
B-28 Serine integrase and serine integrase-RDF fusion catalyze the inversion of DNA regions flanked by attP/attB and attL/attR target sites, respectively, in four stages. 101
B-29 The transfer of species within one update to the active rtTA promoter sites from the introduction of int and intRDF to equilibrium. 102
B-30 When both recombinase enzymes are introduced to the system simultaneously and allowed to degrade, the action on the attL/attR and attP/attB sites is driven by the proportional relationship of the enzymes. 103
B-31 The effect of asynchrony between int and intRDF input signals on the update to active rtTA promoter sites. 104
B-32 The behavior of the comparator module across titrated levels of active attP/attB promoter sites. 105
B-33 Imbalances in the production of the Tet mutants and integrase enzymes upset the relationship between the error and update minima. 106
B-34 Differences in the production time for the titrant and sequesterer species disrupt the error inferred by the system, creating an update even when the number of rtTA sites is already at the optimum value. 107
B-35 Two optimization runs consisting of 2-hour inductions of DAPG and DOX every 24 hours for 10 days. The right simulation begins below the optimum number of active rtTA sites and must increase its value; the left, above the value so that it must decrease its active sites. 108

List of Tables

A.1 Parameter values for the molecular learning circuit ...... 72

Chapter 1

Introduction

1.1 What is Design in Synthetic Biology?

The proposition to redesign life is neither simple nor comfortable, and the field of synthetic biology represents, perhaps, its most overt expression, and synthetic biologists, its most prolific proponents. Whatever else synthetic biology might do, its emergence has enrolled large numbers of scientists and garnered substantial public and private funding, corralling them under a rubric for the design of biology [44, 61, 73, 100]. In synthetic biology, an appeal to design is an important vector of this contagion, and perhaps the most important contributor to its success and popularity. This begs the question: What is design in synthetic biology? The answer is, at least partially, recursive. Synthetic biology is perhaps best seen as an attitude toward and methodology for the design of biological substances – a set of practices which organize how biology is imagined and worked on [72]. Synthetic biologists seek to collect, refine, and repackage nature so that it's easier to design new and reliable biological systems, typically at the cellular or multicellular level. These redesigned systems are often referred to as 'biological circuits' for their ability to perform operations on biomolecular signals, rather than electrical signals, and for their aim to behave as predictably and modularly as would integrated circuits in a computer [26, 117, 83]. Arguably, the term 'design' carries a lot of baggage here. In synthetic biology, it is freighted with ideals borrowed from forward engineering, and metaphorical language, from electronics and software. Design principles such as abstraction, decoupling, and standardization are meant to counterbalance the innate complexity of biological systems, thereby economizing the

time, effort and skill needed to create new biological constructs with desirable properties [116, 3]. In synthetic biology, design is somewhat akin to a program, and the cell, a machine which executes it.

Synthetic biology is often presented as a practical consequence of fast and cheap commercial DNA synthesis: Once long stretches of genetic material can be synthesized quickly and reliably, biologists and engineers can envisage new forms of experimentation and share a standardized registry of their materials [43]. These too can be replicated elsewhere by different people for new applications; they need only synthesize the same genetic substance. The practice of synthetic biology is indebted to a lasting and unshakeable image which portrays DNA as the wellspring of biological information. Like software, DNA is an information encoding, and it too can be read, written, copied, transferred, and transposed. A common metaphor for this dimension of biotechnology is lateralization or flattening: In this flattened world, the substance of almost any biological design can, in principle, be freed from its ties to cell, organ, organism, or species, and set free to circulate and be combined with any other, provided certain conditions are met [39, 88, 71]. The objective of synthetic biology is perhaps, put most simply, to minimize these prerequisite conditions.

Synthetic biology is also frequently presumed an inevitable conclusion to molecular biology, an earlier subdiscipline whose overhaul of biological attention has since proven powerful and lasting in the field [100]. Molecular biology concerns the study of molecules, such as nucleic acids and proteins, and the mechanisms they compose which underlie a cell's proper functioning, maintenance, and replication [2]. According to popular usage, it emerged in the first half of the twentieth century as a fundamentally novel type of biology marked by the identification of DNA as the material basis of genetics [60]. On the conceptual level, this discovery relocated the essence of living beings – and locus of research activity – to the physicochemical structure of the gene, and supplanted the previous 'protein paradigm of life' with a 'nucleic acid paradigm of life.' What stood in its center was a new notion of biological specificity which found its expression by adopting 'genetic information' and 'genetic program' as key words, phrases that had been completely absent from biology before [100]. Life, itself, came to be redefined by molecular biologists as the instructions or information encoded in DNA. Their pursuit was to understand how the genetic information residing in DNA was stored and activated, a conception of control

that would promise mastery over the processes of making and remaking life [60]. On the technical level, the "path to the double helix" in the twentieth century also brought with it a massive import of analytical procedures from biophysics and biochemistry [86]. The shift to technologies in which biological molecules play the central role – restriction and ligation enzymes, plasmids and other vectors, and polymerases, for example – marked the original introduction of 'genetic engineering' [81]. Its techniques flirted with contemporary synthetic biology and conceived the first artificial transgenic events [100, 81]. In tubes outside the cell, biologists could cut (restriction enzymes), paste (ligation enzymes), synthesize (polymerase enzymes) and recombine (recombinase enzymes) genetic material, thus altering life's 'code', not by random mutation, but by design [59]. These technologies were then supplemented by a new mode of constructive biology which shifted the manipulations back into the cell. Following, researchers could easily construct recombinant DNA and analyze the behavior of individual genes, such that not only could existing genetic material be described and evaluated but also new genetic arrangements, which were altogether artificial [99, 81]. These shifts to 1) a primacy of genetic information as the programmatic basis for life and 2) biotechnologies for the synthesis of genetic material largely determined the substance and technique of biological design, thereafter [60, 72]. Together, they founded and reinforced a metaphor which cast information transfer in biology as genetic 'code' executable by molecular 'circuits'. Synthetic biology is largely the plausible extension of this metaphor; if DNA is the code of life, why not reprogram the computer?

1.2 Cells as Computers

The hallmark of scientific understanding is often said to be the reduction of a natural phenomenon to simpler units [85]. This mechanistic reduction allows operational things to be described in terms of more fundamental phenomena – the mechanisms of their parts in operation. In molecular biology, for example, certain high-level cellular features are explained by the physical principles governing the structure and behavior of their underlying molecular processes. These processes are then reassembled, one-by-one, to elucidate the higher-level phenomenon observed [15]. For instance, the molecular biological mechanisms of replication, recombination, and cell division

provide reductive explanations of the major principles of genetics [2]. While classical genetics offered an observational account of gene transmission, molecular biology, whose fundamental mechanisms were studied independently and combined, filled in the gap for explaining how and why the transfer of genetic information occurs between generations [6].

Equally important understanding comes from finding the appropriate abstraction with which to distill a unit of the entire phenomenon. A useful abstraction — a mapping from a real-world domain to a mathematical domain — highlights some essential properties while ignoring other, complicating, ones [98]. For example, computer science distinguishes between two levels to describe a system's behavior: the mechanism of implementation (how the system is built, e.g. the wires in a circuit) and the abstract specification (what the system does, e.g. an 'AND' logic gate). Using abstraction opens up new possibilities for understanding and composing more complex systems, so long as the act of composition doesn't violate the behavior of any unit of abstraction on its own [118]. Computer and biomolecular systems, for instance, start from a small set of elementary components – transistors and molecular reactions, respectively – from which, layer by layer, more complex entities are constructed with evermore sophisticated functions. Similarly, once biological behavior is abstracted as computational behavior, implementation can be related to a real, existing biological system whose equivalent model performs the behavior.
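The implementation/specification distinction can be made concrete with a toy example of my own (not from the thesis): the same abstract specification, an AND gate, realized by a different underlying mechanism, and checked against the specification on every input.

```python
# Illustrative sketch: one abstract specification, two mechanisms of
# implementation. Composition respects the abstraction only if the
# implementation agrees with the specification on all inputs.

def and_spec(a: bool, b: bool) -> bool:
    """Abstract specification: WHAT the system does."""
    return a and b

def and_from_nand(a: bool, b: bool) -> bool:
    """One possible implementation: HOW the system is built,
    composed from NAND primitives (the kind of unit a transistor
    circuit would provide)."""
    def nand(x: bool, y: bool) -> bool:
        return not (x and y)
    return nand(nand(a, b), nand(a, b))

# The two levels agree on every input, so higher layers may use the
# gate without knowing which mechanism implements it.
for a in (False, True):
    for b in (False, True):
        assert and_spec(a, b) == and_from_nand(a, b)
```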

In this vein, an often articulated goal of synthetic biology is to create new biological functions which obey these same low- and high-level equivalences. Many synthetic biologists aim to reshape biology into a predictable systems engineering discipline, urging abstraction and standardization of modules with well-defined behavior, akin to integrated circuit design in VLSI [116, 75]. Appropriately, this often occurs through the creation of so-called synthetic gene circuits. Gene circuits are small networks of interacting genes designed to perform a predefined function inside cells. To create them, synthetic biologists rely on the bottom-up assembly of composable units, and use predictive modeling and prototyping to characterize their performance in-vivo, i.e. with methods of forward engineering [35].

In synthetic biology, like molecular biology, the simplest composable units are often molecular interactions entailed, originally, by DNA [54]. For example, the transcription of RNA from DNA can be abstracted as a functional unit with two inputs (a DNA sequence containing a promoter and a transcription factor) and one output (an RNA molecule) [10]. These molecular

interfaces allow bioengineers to create synthetic modules that interact with endogenous cellular processes and support the creation of more-complex synthetic systems by means of their hierarchical assembly. Like all abstractions, this definition of biological modularity ignores many important details for the sake of conceptual simplicity. However, by focusing initially on the modules' inputs and outputs, this abstraction can make the assembly of complex "biological programs" more tractable.
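The input/output abstraction of a transcriptional unit can be sketched in code. This is an illustrative toy of my own, not a model from the thesis: the Hill-function form is a standard way to model activating promoters, but every parameter value here is hypothetical, and translation is abstracted away entirely.

```python
# Sketch of a transcriptional unit as an input/output module: input, a
# transcription-factor concentration; output, a steady-state rate of
# RNA production, modeled with a standard Hill activation function.
# All parameter values (k_max, K, n) are arbitrary illustrative choices.

def transcription_module(tf_conc: float, k_max: float = 10.0,
                         K: float = 1.0, n: int = 2) -> float:
    """Output production rate of an activating promoter for a given
    transcription-factor concentration."""
    return k_max * tf_conc**n / (K**n + tf_conc**n)

# Hierarchical assembly: the output of one module serves as the input
# of the next (translation of RNA into a TF protein is abstracted away).
def two_stage_cascade(tf_conc: float) -> float:
    return transcription_module(transcription_module(tf_conc))
```

Because each module exposes only an input and an output, composition is just function composition; the many molecular details hidden by the abstraction do not appear at the interface.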

1.3 Non-Classical Computation in Dynamical Systems

1.3.1 The Computational Behavior of Dynamical Systems

Before moving on, it is important we first establish a formal groundwork for interpreting computation, in non-classical terms, from the behavior of dynamical systems. Importantly, when we claim biological systems are capable of computation, we appeal to a different formalism in mathematics than the theory of formal languages familiar to the classical Turing model of computing. This non-classical model situates computation in the language of deterministic dynamical systems theory, an area of mathematics typically ascribed to modeling the time-dependent physical behavior of natural systems. A review of this association is worthwhile; the formal interpretation of dynamical systems behavior as computation will be instrumental when reconciling cell state transformations with a model of computing and applying this perspective to cellular decision-making. At its core, a dynamical system is a mathematical formalization for describing how a point's position in space evolves over time according to a "fixed" rule. Thus, it can be geometrically described in terms of three entities: 1) an abstract state space defined by variables which are chosen to adequately describe the system; 2) the system's current state, a vector within its state space; and 3) a rule that determines the motion of the system through state space [112]. In a classical computational system, that rule is given explicitly by the computer program and its state is stored in memory; in a natural physical system, the rules are the underlying physical laws governing its behavior and its state is implicit and undefined [110]. For a dynamical system in n dimensions, its abstract state space is X^n. In other words, its state can be defined by n variables s_i whose possible values derive from some set X. Together,

these variables comprise an n-dimensional state vector s ∈ X^n, containing all permutations of n members of X. The vector s may contain binary bits, such as in classical computing, or be a vector of continuous variables such as position or molecular concentration. In many dynamical systems, the state vector s changes deterministically over time; the trajectory of these changes through space is defined by a fixed rule, the dynamics, which strings together successive states, so that from a current state, the next can always be determined. In continuous systems, this rule is denoted ṡ = f(s). If the system is dissipative, we can follow these trajectories to a point where they no longer change. Such points are called attractors, and the states whose trajectories ultimately lead to them, their basins of attraction. In non-chaotic dynamics, the computation performed by the system can be interpreted as the determination of an attractor basin, found by following its trajectory from an initial state s_0. The output may be the attractor, a subspace of the attractor in state space, or some projection of the states it passes along its trajectory. The programming problem is thus in finding the relevant dynamics, now restricted to natural (albeit engineered) system properties, which are not arbitrary and whose properties we are hopefully able to predict. Formally, we look to inputs p and investigate how the dynamics of the system, ṡ = f(s, p), vary with p as it changes. Dynamical systems can be autonomous, or closed to inputs from an external environment. Their rules for behavior are not explicit functions of time, and remain constant throughout a system's lifetime. Programming a closed dynamical system thus involves specifying a constant input p, a priori, which parameterizes a family of fixed dynamics. That is, choosing p fixes the computation, i.e. which input states deterministically map to which outputs.
Most biological systems, in contrast, are open or non-autonomous; the behavioral rules governing open dynamical systems are explicit functions of time, in the form of inputs from the environment [102]. However, the analysis of open systems is often much more demanding than that of closed systems. This depends in part on the relationship between the timescale on which the environmental inputs are changing and the timescale on which the dynamics is acting. An input completely changes the dynamics of the system: Immediately after receiving a new input, a system will be in the same position in its state space, but the underlying attractor structure of that space may be entirely unrecognizable. The input parameter p is a function which modulates the dynamics in time and produces an output q. We formally denote this relationship (ṡ, q(t)) = f(s, p(t)).

If possible, we prefer to look at open systems whose inputs evolve on a slower timescale than the dynamics of the system, such that the system dynamics stabilize on a new attractor before the next significant change to the input. The overall dynamics can thus be considered piecewise, as the concatenation of several closed dynamical systems in series [32]. Programming open systems which abide by these limitations on timescales can closely resemble programming closed systems; a slowly modulating p may be treated as a fixed parameter while the computation executes. Open systems may also, however, take a broader, more powerful position in the landscape of computation. Sometimes, when the historic path of an effectively closed system through parameter space impacts p with a lag or delay, it may exhibit hysteresis: Restoring its inputs to previous values may not necessarily restore the system to its previous state at equilibrium. The computation it performs has been irrevocably changed by past behavior. When the change effected by its history progressively optimizes one or several parameters of the closed system, the entirety is said to demonstrate learning [53]. At this point of the discussion, we reduce the scope of dynamical systems to biological systems in particular.
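Computation-as-attractor-determination can be simulated for a minimal example of my own choosing (not a system from the thesis): the one-dimensional dissipative system ṡ = s − s³ has two attractors, s = −1 and s = +1, whose basins of attraction are separated by the unstable point s = 0. Following a trajectory from an initial state s₀ "computes" which basin s₀ belongs to, i.e. its sign.

```python
# Sketch: interpreting a dissipative dynamical system as a computation.
# The closed system ds/dt = s - s^3 has attractors at s = -1 and s = +1;
# settling from an initial state s0 determines its attractor basin.

def settle(s0: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Follow the trajectory from s0 by forward-Euler integration until
    it has effectively stopped changing, and return the attractor."""
    s = s0
    for _ in range(steps):
        s += dt * (s - s**3)
    return s

# Initial states on opposite sides of the basin boundary (s = 0) are
# 'classified' into different attractors.
assert abs(settle(0.2) - 1.0) < 1e-6    # basin of s = +1
assert abs(settle(-1.7) + 1.0) < 1e-6   # basin of s = -1
```

The forward-Euler step size is chosen small enough for numerical stability; in practice one would use an adaptive ODE integrator, but the point here is only the attractor-basin reading of the dynamics.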

1.3.2 Molecular Computation: A Simple Example

We denote most physical systems, biological systems included, with state spaces and time dynamics which are mathematically continuous, meaning the sets of values in their axes are uncountable and the differences between successive values in each set are infinitesimally small. Continuous state space variables with continuous time dynamics are formalized using notation for the set of real numbers, ℝ, which contains such values. A continuous state vector s with n dimensions is thus said to be in the product of sets, ℝ^n, containing all permutations of n real numbers. Continuous sets of this type are useful for describing molecular concentrations, which comprise our state space variables and whose dynamics we study in the cell. Accordingly, the "rules" that determine the motion of a biological system through state space – the flux in molecular concentrations in a cell volume – are the reactions in which its molecules participate. For instance, consider the simple binding reaction of two molecular species A and B to reversibly form a complex C, where k_f describes the rate of association and k_r describes the rate of

21 dissociation. We have the following reaction:

A + B ⇌ C    (forward rate k_f, reverse rate k_r)

This states that the reaction 'consumes' molecules A and B to create molecule C while, simultaneously, C also falls apart, or dissociates, back into A and B. In chemistry, the law of mass action proposes that the rate of a chemical reaction in a fixed volume is directly proportional to the product of the activities or concentrations of the reactants. Here, mass-action kinetics dictate that the 'chemical affinity' of A and B to form C depends on their material nature, contained here in k_f and k_r, as well as their concentrations in solution, [A], [B], and [C]. From this law, we can derive the following ordinary differential equations, which determine the evolution of our system over time:

Ȧ = k_r[C] − k_f[A]·[B]
Ḃ = k_r[C] − k_f[A]·[B]
Ċ = k_f[A]·[B] − k_r[C]

Additionally, its state at time t is given by the vector of concentrations ⟨[A], [B], [C]⟩_t. Note: For convenience, in the remainder of the thesis, I denote concentrations as upper-case symbols without square brackets, so that ⟨[A], [B], [C]⟩_t becomes simply ⟨A, B, C⟩_t.

Assume that B is a species that is controlled by other reactions in the cell (i.e. an input) and that the total concentration of A is conserved, so that A + C = A_tot. Recall our framework for deriving computations from dynamical systems behaviors. We wish to determine how A and C vary as a function of B at an attractor state – where the system is at steady state, or its equilibrium. To do so, we must find the concentrations of A and C at a hypothetical state when their motions are no longer changing, which we can formally state as Ȧ = Ċ = 0. Substituting our description of A_tot above into our rate equations for A and C yields:

$$\dot{C} = k_f A \cdot B - k_r C = k_f (A_{tot} - C)\cdot B - k_r C$$

By setting $\dot{C} = 0$ and letting $K_d = \frac{k_r}{k_f}$, we obtain:

$$0 = k_f (A_{tot} - C)\cdot B - k_r C \implies$$
$$C(k_f B + k_r) = k_f A_{tot} B \implies$$
$$C(B + K_d) = A_{tot} B \implies$$
$$C = \frac{A_{tot} B}{B + K_d} = \frac{A_{tot}\frac{B}{K_d}}{1 + \frac{B}{K_d}}$$

Solving for $A$, we get:

$$0 = k_f A \cdot B - k_r C = k_f A \cdot B - k_r (A_{tot} - A) \implies$$
$$A(k_f B + k_r) = k_r A_{tot} \implies$$
$$A\left(\frac{B}{K_d} + 1\right) = A_{tot} \implies$$
$$A = \frac{A_{tot}}{\frac{B}{K_d} + 1}$$

What can we say about our result, computationally? Provided an input molecule of concentration $B$, our system will calculate for us two outputs $A$ and $C$, whose values are given by the equations above, provided we wait for the system to settle into equilibrium. Thus, we have at our disposal two computations, defined by the variable $B' = B/K_d$ and one fixed parameter $a = A_{tot}$:

$$f_A(B', a) = a \cdot \frac{1}{B' + 1}$$
$$f_C(B', a) = a \cdot \frac{B'}{B' + 1}$$

Although $f_A(B', a)$ and $f_C(B', a)$ may not, themselves, seem desirable, genetic and molecular biology provides us a framework of diverse functions whose forms, especially when composed together, enrich our computational toolbox with broad representational power. As we'll see, with the right experimental techniques, we can rely on the molecular machinery already operating within cells to carry out these computations for us; by encoding a genetic circuit entirely in DNA, the cell is provided a complete set of instructions and parameters to autonomously perform operations like the one above.
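As a sanity check on the algebra, the two closed-form computations can be evaluated directly. This is a minimal sketch with hypothetical values of $a$ and $B'$; the assertions confirm that the two outputs always partition the conserved pool ($f_A + f_C = a$) and that the pool splits evenly at half-saturation ($B' = 1$, i.e. $B = K_d$).

```python
# Equilibrium input-output functions derived above, with B' = B/Kd and a = A_tot.
def f_A(Bp, a):
    return a * 1.0 / (Bp + 1.0)   # free-A output: decreasing in the input

def f_C(Bp, a):
    return a * Bp / (Bp + 1.0)    # complex output: saturating in the input

# Conservation: the outputs partition the total pool A_tot at any input level.
for Bp in (0.01, 1.0, 100.0):
    assert abs(f_A(Bp, 2.0) + f_C(Bp, 2.0) - 2.0) < 1e-12

# Half-saturation: at B = Kd (B' = 1) the pool splits evenly.
assert f_A(1.0, 2.0) == 1.0 == f_C(1.0, 2.0)
```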

1.4 Cell Therapies as Computations

Although the approach and vision of synthetic biology is far broader, I now narrow to decision-making gene circuits as an opportunity where the field can impact and integrate with personalized medicine. Unlike traditional medical care, which caters to broad subtypes of patients, personalized medicine refers to a more precise model which tailors medical treatment to each patient, individually [27]. Equipped with their genetic and molecular profiles, physicians can design a protocol for therapy to address the specific signatures of a patient's disease, potentially with enough precision to differentiate treatment at the cellular level (Figure B-1) [4].

Within personalized medicine, engineered cell therapies promise a level of precision and autonomy at the far end of this spectrum: By changing their genetic programs, cells may be designed to selectively treat themselves in a therapeutically useful manner [63]. For example, amending certain cells of β-thalassemia patients ex-vivo with a functional locus has been demonstrated to reduce their dependence on blood transfusions, and engineering a cancer patient's T cells with cancer-compatible receptors can allow them to destroy tumor cells when transplanted back in-vivo [20, 67]. However, realizing this level of precision has the effect of scaling down typical roles in the medical process. In the most advanced technologies, tools for diagnosis and treatment now have to operate exclusively at the molecular level, and cellular "living drugs" must be provided enough intelligence to produce correct medical decisions entirely on their own. Certainly, this aspect of autonomous decision-making falls within the purview of synthetic biology; when we restructure 'diagnosis' as classification and 'treatment' as molecular production, the problem simplifies to a computation, albeit a complex one.
A computation of this kind requires sophisticated control: Specificity of the strength, timing, and cellular context of its effect is necessary for interventions that have a narrow therapeutic window or require a localized response [120]. For example, in cancer immunotherapy, overdosing or off-target activation can give rise to a viral response in patients and result in significant destruction to otherwise healthy tissue [57]. Because genetic programs are typically delivered indiscriminately to cells, diseased or not, the on-target specificity of their behavior is a strict requirement [115, 120].

In practice, precise regulation in medicine is broadly achieved by the use of known biomolecular markers - or 'biomarkers' - of abnormality. Cellular biomarkers indicate normal or pathogenic biological processes and may be used to characterize the existence and progression of disease [45]. For instance, all known cancers possess biomarkers – typically proteins, altered segments of DNA, or short circulating RNA – expressed in every cell, healthy or not, but at higher or lower rates in those that are malignant [66, 25]. By sensing a panel of such molecular biomarkers (inputs), a genetic program may classify their hosts and perform precise control of differential gene expression (output) on the basis of their prognosis [63]. Traditionally, the output will elicit a cytotoxic or immunogenic effect, providing a means to selectively treat diseased cells despite the non-specific delivery of the genetic circuit.

This relationship between the inputs and outputs of a circuit can subsequently be summarized by a mathematical function, formulated beforehand according to the decision-making task the circuit is designed to implement. Equivalently, this function is the "computation" to be performed by the circuit, mapping biomolecular markers for disease to a tailored biomolecular therapeutic response.

In engineered-cell therapies, gene-encoded molecular computers could, in principle, be designed to enact diverse monitoring, reporting, and therapeutic tasks of this type. However, as discussed, the mapping from physiological cues to effects would need to be elaborately specified. To prevent off-target activation, many different sets of conditions observable in healthy cells must all generate an inert outcome. In computational terms, our separation of categories of cells – alternatively, the classification of cell types – must be meticulous. In cancer therapy, for instance, this dilemma has been a major bottleneck for treatments which are safe, yet efficacious [101, 122]. While individual molecular features greatly improve selectivity in cancer cell-targeting, they are often insufficient for precise classification [14, 23, 21]. Thus, a significant challenge in classifier design is the development of robust systems capable of processing and integrating multiple inputs in order to target cells more carefully [63]. Simultaneously, a classifier must be resource efficient, and not impose too significant a material burden on the normal operations of the cell it inhabits [103, 34]. These contradictory constraints, complex signal integration and resource efficiency of hardware, are long familiar to champions of a bio-inspired concept for computing called 'neuromorphic electronic systems,' or more often 'neuromorphic computing' [76]. It is to this concept that we now turn our attention.

1.5 Neuromorphic Computing in Cell Therapies

In natural and synthetic biological systems, it is common practice to abstract a circuit's behaviors to digital computation [117, 63, 30, 9, 8]. This is often appropriate, especially in decision-making settings wherein the output is selected to coordinate a discrete set of outcomes, e.g. developmental networks or 'all-or-none' disease-state classification circuits. Accordingly, in synthetic biology, desirable computations are often articulated as Boolean logic functions, a high-level specification fundamental to computer systems. This approach relies on an abstraction of graded analog functions, where values above a threshold are classified as '1' (alternatively, True, High, etc.) and values below this threshold are classified as '0' (alternatively, False, Low, etc.) [30]. Its popularity in synthetic biology owes to the fact that biochemical interactions are often well described by Hill functions, which naturally approximate a binary condition [58].

However, there are challenges in engineering entire genetic systems that mimic digital logic. Biological molecules do not generally exist at only two possible concentrations but vary over an analog range of concentrations, and are ordinarily uncompartmentalized in the cell [106]. Moreover, scaling biological circuits which rely on digital logic schemes can prove difficult in practice and unnecessarily exhaust endogenous resources [30, 105, 95]. Thus, while this strategy provides a convenient solution for building logical relationships between inputs, other frameworks may be needed for computation of more sophisticated, continuous functions.

Neuromorphic devices, the 'hardware' counterpart to artificial neural networks (ANNs), represent an alternative computing paradigm which aims to reproduce desirable, high-level characteristics inspired by how the brain processes information - features like tunable signal processing and resource-efficient scaling [76].
They are a versatile substrate for computation, and, in engineered biological systems, would marry the practical benefits of digital and analog signal processing [49]. As the decision-making intelligence of engineered-cell therapies, I argue neuromorphic gene circuits could replace digital logic schemes with a modular and reprogrammable analog template, allowing for more sophisticated computation using fewer resources. This template could then be adapted either externally or autonomously in long-term single cell medicine (Figure B-2).

While biology has long served as inspiration for the artificial intelligence (AI) community, most work has emphasized precedents at the organ and organismal levels; the study of how molecules think and learn has not yet produced a similar variety of computational models and applications [31, 38, 90]. In fact, the study of artificial learning mechanisms in synthetic biomolecular systems is a relatively new endeavor, and precedent work is sparse and largely theoretical.

Previously, several projects have described abstract reaction networks which exhibit neuron-like behavior in simulated chemical systems [49, 48, 7]. Hjelmfelt et al. and Okotama et al., for example, demonstrated chemical reaction networks whose behaviors reflect the McCulloch-Pitts traditional firing model for artificial neurons and rely on activating and inhibitory reactions to establish connection weights between nodes. Using similar mechanisms, Banda et al. have proposed binary and analogue chemical perceptrons in-silico and simulated their ability to learn using weight-race conditions and under discrete time constraints [7, 12]. In-vitro systems have also been suggested which employ combinatorial transcriptional logic schemes that draw analogy to neural network behavior [17], or use DNA strand displacement to perform linear thresholding and temporal computations in cell-free experimental and simulated conditions, respectively [24, 70, 96, 84].

To date, however, no synthetic neuromorphic computing system has been realized experimentally in mammalian biology. Neither have these studies addressed the practical design constraints when implementing such a system in living cells.
Here, I describe the implementation of in-vivo neuromorphic circuits in human cell culture experiments as a proof-of-concept for their application to personalized medicine (Figure B-3). Further, I attempt to illustrate how the process of autonomous adaptation would impact the real, if sometimes implicit, design process which occurs in synthetic biology. It's my hope that this thesis will spur future research in the use of these approaches for personalized medicine and broaden the landscape of academics who find interest in and relevance to its concerns.

In the following section, I outline the use of combinatorial signal integration through the design of hybrid transcriptional promoters and show their computational equivalence to "nodes" in neural networks. I then demonstrate an experimental characterization of a three-input transcription-based artificial neuron and discuss the design limitations of this strategy. I also present an alternative approach which uses molecular sequestration networks to compute a positive and negative weighted sum of inputs. I leverage this reaction motif to experimentally construct genetic circuits for two three-input linear classifiers and two multi-layer networks which carry out the non-linear XOR classification; the design strengths and weaknesses of molecular sequestration are discussed as well. Finally, I turn to the question of learning: I discuss the simulated behavior of a genetically-encoded circuit which learns the optimal stoichiometry of one of its components, and examine how aggregating many cell classifiers in an ensemble approach can increase the stability of learned classifications compared to a single cell. I conclude with experimental evidence in support of the former study, and address implications of an autonomous learning process on the role of the synthetic biologist as a practitioner of design.

Chapter 2

Molecular Neural Network Computing in Mammalian Cells

2.1 A Brief Review of Artificial Neural Networks

Artificial neural networks are a versatile basis for computation and a popular reference point for most neuromorphic hardware. Importantly, they represent a computationally universal system: Any finite state machine, and hence the finite state part of a universal Turing machine, can be simulated by a neural network [79, 107]. Neural networks also form the basis of many collective computational systems such as feedforward or recurrent networks [50]. A biomolecular neural network would in turn serve as the "hardware" for any of these approaches to computation and thus exemplify a neuromorphic computing device.

Feedforward neural networks, also often called artificial neural networks (ANNs) or multi-layer perceptrons (MLPs), are quintessential models in the field of machine learning. They are so-named because they process or "feed" information forward from an input $X$ through intermediate computations $\hat{f}$ to a corresponding output $y$, without connections directing $y$ back to $X$. The purpose of a feedforward network is to define a mapping $y = \hat{f}(X, w)$ where the values of parameters $w$ are optimized to best approximate a desired function $f^*$. When $f^*$ identifies to which of a set of categories (or class) $X$ belongs, it is denoted a classification function or simply a classifier.

Perhaps the most notable classifier in machine learning is the perceptron model for linear classification [119]. In linear classification, $\hat{f}(X, w)$ assigns $X$ to a class based on a linear combination of the input features, where the coefficients are set by $w$. Often, $\hat{f}$ is a nonlinear threshold function, which maps all inputs above a certain threshold to one value or otherwise to a second value. We represent the perceptron function for $n$-dimensional inputs:

$$\hat{f}\left(\sum_{i=1}^{n} x_i w_i\right): \quad x_i, w_i \in \mathbb{R}$$

where $\hat{f}(x)$ is one such threshold function. While, individually, a perceptron is constrained to solving linearly separable functions, by composing together several classifiers in a multilayer structure, feedforward networks are able to distinguish inputs which are not linearly separable. These chains of functions are the most commonly used structures in artificial neural networks, whose "depth" indicates their number of compositions.

In the following sections, I present evidence of biomolecular 'hardware' encoded by plasmid DNA which experimentally emulates the units of neuromorphic computation, artificial neurons. Since the hardware is solely molecular, coupled reactions implement "programmed" computations as the concentrations of molecules evolve in time inside the bodies of cells. I begin with a first attempt using combinatorial transcriptional regulation, and describe the affordances and limitations of this strategy as a platform for ANN-like computation in mammalian cells. I then proceed to a more practical alternative using networks of molecular sequestration reactions.
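The perceptron threshold unit above is easy to state in code. Below is a minimal sketch; the weights and threshold are hand-picked (not learned) to realize the linearly separable AND function, purely for illustration.

```python
# A perceptron: a threshold function applied to the weighted sum of inputs.
def perceptron(x, w, theta):
    s = sum(xi * wi for xi, wi in zip(x, w))  # linear combination of features
    return 1 if s > theta else 0              # nonlinear threshold

# Hand-picked weights realizing AND, a linearly separable function.
w, theta = (1.0, 1.0), 1.5
assert perceptron((0, 0), w, theta) == 0
assert perceptron((1, 0), w, theta) == 0
assert perceptron((0, 1), w, theta) == 0
assert perceptron((1, 1), w, theta) == 1
```

No single choice of `w` and `theta` realizes XOR, which is why the multilayer compositions described above are needed for functions that are not linearly separable.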

2.2 On Combinatorial Transcriptional Regulation

2.2.1 A Brief Review of Transcriptional Regulation

All biological organisms, from the single-celled to the highest orders of life, possess an enormous repertoire of strategies for decision-making in response to cellular and environmental signals [94]. To a large extent, this repertoire is encoded in complex networks of genes whose activities are interlocked by transcriptional regulation. These gene-regulatory networks (GRNs) typically consist of only a few tens to hundreds of genes, the expression of which is slow and asynchronous, but have the capacity for very sophisticated signal processing: In eukaryotes, each node can be regulated combinatorially, often by four to five other nodes, such that the ultimate regulatory function is highly complex [5, 10].

The activity of a gene is regulated by other genes through the concentrations of their products, transcription factors (TFs). Equivalently, TFs are proteins which regulate the transcriptional activity of genes. This is accomplished mechanistically by the interaction of the TFs with their respective DNA-binding sites, each other, and the RNA polymerase (RNAP) complex in a regulatory region of DNA located upstream of the gene [32]. Regulation can be quantified by its 'response function' or 'transfer function,' a relation which connects the 'response' of gene expression to the concentrations of relevant TFs in a cell [35]. The qualitative aspects of a response function – its general shape – are dictated by the types of regulation performed by its TFs. Because a response can, intuitively, take one of two forms – an increase or a decrease – TFs are often placed into two corresponding categories – activators and repressors, respectively – based on their effect.

Eukaryotic activators upregulate transcription of DNA, typically through a mechanism of recruitment: After binding upstream, an activator attracts certain transcriptional machinery to a gene of interest. This machinery comprises many additional proteins – more than thirty for RNA polymerase II, for example – required for RNAP to recognize a promoter and initiate transcription [92, 93]. Without this machinery, however, a gene would stay silent. In contrast, transcriptional repressors downregulate otherwise active genes. This is often achieved by steric hindrance, in which the repressor-binding site overlaps core promoter elements; when the repressor is bound to DNA, it blocks recognition of the promoter by RNAP, without which transcription is unable to occur [109].
How the shape of a response function precisely manifests is dictated by the strength of mechanistic interactions between TFs and their binding sites on the DNA (operators). It is also broadly bound by maximal rates of transcription and translation; these parameters are influenced by the choice of activating TF, the occupation of DNA operator sites, as well as other DNA elements contained within the regulatory region of the gene [92, 10].

We thus have at our disposal a flexible architecture for making and tinkering with arbitrary networks of computations encoded in DNA. Because it is straightforward to modify individual DNA-binding sequences (through point substitutions), adjust their positions within a regulatory region (via insertions and deletions), and move/copy them from one regulatory region to another (via duplications and recombination), their topologies and weights can be programmed or reprogrammed by reassembling new or more operator sequences or else changing their locations with respect to each other and the transcription start site. Specifically, the inclusion of binding domains in a gene's regulatory region sets the network topology; the choice of transcriptional regulator, the sign of a connection's weight. More advanced fine-tuning comes through choice of the binding sequence or its mutational variants.

Buchler et al. previously demonstrated that gene transcription regulated by the architecture of a single promoter can be modelled as a form of Boltzmann Machine (albeit with heavy practical constraints on its topology), a class of universal computing machine [17]. Moreover, in a paper to be released, a collaborator, Professor Ramiz Daniel of the Technion Institute, has shown that a model based on similar gene networks computes a negatively-weighted multiplicative combination - a log-scale subtraction - of repressor concentrations. However, in synthetic mammalian systems, it is more interesting to take a step back and view this framework, instead, through a constructive lens. Specifically, what does a neuron look like in the form of a promoter?

2.2.2 A Transcriptional Artificial Neuron

Figure B-4 displays the design of a neuron with three inputs and one output. Four copies of a binding sequence for an activator, Gal4-VP16, are placed upstream of a core promoter region called the minimal cytomegalovirus (minCMV) promoter. This sequence provides the bare minimum architecture for transcriptional machinery to bind and work, though it is significantly enhanced by recruitment once Gal4-VP16 activators are bound to nearby DNA. The resulting architecture is made repressible by the placement of neighboring operator sequences on either side of the minCMV promoter. One operator, PhlFO, which corresponds to the repressor PhlF, is included upstream, while three more are placed downstream: Two TetO operator sites, which correspond to the repressor TetR, are followed by a second PhlFO at their tail. Each mechanism for repression works by steric hindrance of RNAP as described above, although the first TetO operator plays an additional role. Its binding sequence overlaps with another important regulatory element called the TATA box, where the first step in transcription initiation occurs. When a TetR repressor is bound to this sequence, it blocks recognition by a preinitiation complex, without which none of the process of transcription can begin. In this sense, it carries the most weight of all four operator sites. Should the RNAP happen to be recruited to the promoter, and subsequently break through this complex assemblage of regulators, it produces mRNA copies of its output gene, mKO2, ready for translation into protein.

As mentioned above, we're interested in observing whether this particular stretch of DNA, governed by the regulation of its gene, behaves itself, mathematically speaking; we'd like to ensure it acts like an artificial neuron and not some other arbitrary function. Unfortunately, it's not obvious how a proper artificial neuron should act. From a cell therapeutic standpoint, a neuron is only useful insofar as it makes the right decision. Of course, formally, it must 'separate' defined classes of inputs using their linear combination, i.e. produce a linear separatrix. However, in biology, linear separability of cell biomarkers can sometimes occur across several orders of magnitude; these classifications are visible only when their inputs are put in logarithmic terms. In other circumstances, the difference is more refined; mere linear terms are sufficient to do the trick. Context dictates the best design.

For now, in transcriptional regulation, our operators are multiplicative and we perform a specific and constrained form of log-scale addition (Figure B-5). This is useful for identifying cell disease states whose biomarkers differ by powers of ten. As we'll see, in molecular sequestration, where the operators are additive, we arrive at a linear-scale sum of inputs for which powers of ten look much different. At present, however, our primary concern is evaluating whether our design expresses a linear-scale multiplication of its inputs, i.e. their log-scale addition, and how we can tweak the parameters of the operation.
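To make the multiplicative picture concrete, here is a coarse-grained sketch of a doubly repressible promoter using Hill functions. All parameters ($\alpha$, $K$, $n$) are hypothetical and not fitted to any data in this thesis; the point is only that independent repression terms multiply, so in the strong-repression regime the log of the output is approximately a weighted sum of log-scale inputs.

```python
import math

alpha = 100.0             # maximal (fully de-repressed) expression; hypothetical
K_tet, K_phlf = 1.0, 1.0  # repression thresholds; hypothetical
n = 2.0                   # Hill coefficient; hypothetical

def output(TetR, PhlF):
    # Independent repression terms multiply on the hybrid promoter.
    return alpha / (1 + (TetR / K_tet) ** n) / (1 + (PhlF / K_phlf) ** n)

# Deep in repression, log(output) ~ log(alpha) - n*log(TetR/K) - n*log(PhlF/K):
# a weighted log-scale sum of inputs, with weights set by the Hill coefficients.
T, P = 50.0, 20.0
approx = math.log(alpha) - n * math.log(T / K_tet) - n * math.log(P / K_phlf)
assert abs(math.log(output(T, P)) - approx) < 0.01
```

In this model, a stronger operator (such as the TATA-overlapping TetO described above) would correspond to a larger effective Hill coefficient or smaller $K$, i.e. a heavier log-scale weight.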

I've already described that by recasting the design process in synthetic biology as writing and assembling DNA, constructing a new design hypothesis is fairly straightforward, and can be as simple as rewriting a sequence and waiting for delivery. Evaluating that hypothesis in living systems, however, is a much more complicated enterprise. In mammalian cells, many experiments (including those within this thesis) use a process called transient transfection to validate the behavior of their designs. This method temporarily introduces foreign DNA into a cell using small lipid vesicles or other means; its stay is transient because, unlike genomic DNA, this foreign DNA is not replicated when passed on to daughter cells. As a result, it dilutes with every cell division until its effect is unmeasurable.

Transient transfections reach a pseudo-equilibrium state in a fairly short time window of 1-3 days, at which point we can measure the response to conditions at day zero [78]. In synthetic biology, it is standard to track the concentration of input and output molecules (TFs, for instance) by tying them to a measurable property like fluorescence. For example, in a lipofection-based co-transfection experiment, several copies of separate pieces of DNA – one piece a circuit component and the other, its fluorescent proxy - are first mixed together before other reagents are added which package the DNA copies in fatty vesicles called liposomes. When injected into the liquid medium of a cell culture, these vesicles easily merge with the membranes of cells, since both are made of a phospholipid bilayer, allowing them to deliver a payload of DNA copies into their cytoplasm. Though not all cells will receive the same total number of copies, because the marker and component DNA are first mixed, this results in highly correlated delivery of the two components into each cell. These components will be transcribed and translated by the natural machinery of the cell, per usual, with the stipulation that one produces a fluorescent protein after translation. When combined with single-cell analysis methods such as flow cytometry, we can record its fluorescence to approximate the relative concentration of its sister component. In other words, for each cell in the transfection, we can deduce its stoichiometry, or ratio of parts, without ever observing it directly. These values are then plotted to make statements about the relationship between composition of parts and the computation performed.

However, it should be noted that each cell contains a unique composition, even if measurable. Therefore, strictly speaking, each cell performs a slightly different computation. A key insight by Gam et al. is to leverage this variation to sample a large design space in one 'poly-transfection' experiment [41]. In short, this method involves several decorrelated co-transfections conducted simultaneously: Rather than including a single fluorescent marker for an entire co-transfection sample, a different marker is used with each component of a system. Multiple co-transfections are then carried out together in a single cell culture vessel, in which each cell stochastically receives a unique transfection payload whose stoichiometric composition is unrelated to the rest of the population. As a result, the fluorescence measurement of every transfected cell provides an independent data point describing how the system behaves at a specific combination of its parts. Importantly, in my experiments, this concentration of parts can also be treated as the magnitude of an input; each neuronal node takes as arguments the equilibrium concentration of a component delivered through transfection. Consequently, the poly-transfection method offers a high-throughput solution for characterizing multi-dimensional computations and observing whether they match the mathematical abstraction proposed.
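The contrast between correlated and decorrelated delivery can be sketched with a toy simulation. The Gaussian log-dose model and its parameters below are illustrative assumptions, not measurements; the assertions show that a co-packaged marker tracks its component while separately packaged components sample the design space independently.

```python
import random
random.seed(0)

N = 2000
co, poly = [], []
for _ in range(N):
    # Per-cell DNA uptake is roughly log-normal, so we work with log-doses.
    d = random.gauss(0.0, 1.0)                    # shared dose of a co-packaged pair
    co.append((d, d + random.gauss(0.0, 0.1)))    # marker closely tracks component
    # Poly-transfection: components packaged separately -> independent doses.
    poly.append((random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)))

def pearson(pairs):
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / N, sum(ys) / N
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

assert pearson(co) > 0.9          # co-transfection: marker is a good proxy
assert abs(pearson(poly)) < 0.2   # poly-transfection: decorrelated sampling
```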

Now let's take a look back at Figure B-5. To perform a weighted log-scale addition function in mammalian cells, I utilize the expression of two constitutively-expressed repressors, PhlF, whose cognate operator site is PhlFO, and TetR, whose operator site is TetO, as well as the expression of an output, mKO2, by the hybrid promoter PPhlF/TetO, already described. All promoters are activated by constitutively-expressed Gal4-VP16 from the Hef1A promoter, which has high basal activity in eukaryotic cells. Normally, the PhlF protein binds to a PhlF operator domain upstream of its gene. However, in the presence of an input, DAPG, the PhlF repressor is inhibited, amplifying expression of the output mKO2. Similarly, when DOX is introduced to the system, the TetR repressor is unable to bind upstream Tet operators, up-regulating mKO2. The result of this operation, measured at equilibrium by flow cytometry, is shown in Figure B-6.

We can observe similarity between the expected (coarse-grained) behavior of a log-scale artificial neuron, displayed in the upper-right corner, and the output behavior, shown as a contoured surface with mean values (geometric mean calculated for approximately 1e4 cells per data point) displayed in the adjacent graph. As expected, the TetR protein performs the lion's share of repression due to its overlap of the TATA box region of DNA, and, consequently, its de-repression by DOX contributes more significantly to the expression of mKO2 than the de-repression of PhlF by DAPG. The result: a weighted, multiplicative integration of inputs by design.

So where do we go from here? Perhaps we stop: A single artificial neuron can be pretty powerful in the right hands and given an easy enough problem. Our system is a veritable transcriptional perceptron and can already solve classification problems with linear – or more generally, hyperplanar – separability. If, for instance, a set of two or three miRNA biomarkers can be distinguished using a straight line (in logarithmic scale), with some tweaking, the neuron above would suffice. For the classification of healthy HEK cells from cancerous HeLa cells, this is indeed the case [40]. However, a single artificial neuron is really just the beginning; only when multiple are interconnected into large networks do ANNs derive their true representational power, putting them on par with any imaginable classification of cells [29]. In other words, only armed with a suitably large network of artificial neurons can we feel confident tackling any decision-making problem in single cell therapies.

It's now worth re-examining the architecture of the three-input neuron above. The nature of combinatorial regulation, where input signals are integrated multiplicatively, required an independent TF to carry each signal. In larger networks, this remains true for every input to and connection between neurons; scaling network complexity demands a new part for each edge. Unfortunately for mammalian synthetic biology, well-characterized, reliable, and orthogonal TFs are sparse. Thus, a limiting factor in the development of artificial neurons with combinatorial regulation is the sheer availability of parts. While others like Stanton et al. have made significant contributions to populating the mammalian regulatory toolbox with new and reliable TFs, the number of such parts has not exploded comparably to prokaryotic systems [109].

So, we've laid bare a new set of questions: Do other reaction mechanisms exist from which neuromorphic devices may be constructed with large numbers of inputs at little cost? Can we look beyond transcriptional control for other useful ingredients of computation? The answer, we'll see, lies one step in either direction.

2.3 On Bio-Molecular Sequestration

2.3.1 A Brief Review of Molecular Sequestration

In living cells, many endogenous molecules form oligomers, complexes created when certain compatible molecules collide and stick together. For instance, the TetR and PhlF TFs we’ve seen both form dimers, two-component oligomers, before binding to their respective operator sequences of DNA. In higher eukaryotes, like humans, oligomers play a central role in transcriptional regulation and, subsequently, in proliferation and developmental decision-making [74, 1]. Perhaps because of their utility, these protein families are often ubiquitous in plants and animals and can account for up to half of the transcription factors in some organisms [55, 65]. This reason alone makes them a subject for further inquiry.

The combinatorial formation of regulatory complexes is a useful mechanism to control gene expression. If the active regulatory molecule is a heterodimer AB, then a regulatory response will occur only when protein A and protein B are co-expressed in the cell; a variation on this model yields the multiplicative principle seen in transcriptional regulation above. But what about the alternative scenario in which the heterodimer complex AB is inert and either A or B is the active component? This reaction mechanism goes by a different name: molecular sequestration.

Molecular sequestration occurs when an active molecule A – ‘active’ here denotes that it will later participate in a secondary reaction of interest – is sequestered into an inactive heterodimer complex AB by a titrating molecule B (Figure B-7). When the total concentration of B is larger than a binding constant K_d, the sequestering molecule B serves as a “sink” that buffers against the accumulation of A. For sufficiently strong binding of A to B (K_d < 1 nM), B_tot acts as a linear threshold for A_tot: when the total concentration A_tot is increased, the titrant B is eventually depleted as A_tot approaches B_tot. Beyond this threshold, excess free A is no longer buffered and its concentration grows linearly with the total concentration of A, i.e.,

A = A_tot − B_tot.
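This threshold behavior can be checked with a short equilibrium calculation. The sketch below (illustrative Python with made-up concentrations, not measured values) solves the standard two-species binding equilibrium for A + B ⇌ AB and compares free A against the ramp max(A_tot − B_tot, 0):

```python
import math

def free_A(A_tot, B_tot, Kd):
    """Equilibrium free A for the binding reaction A + B <-> AB.

    Conservation gives A_tot = A + AB and B_tot = B + AB, with
    Kd = A*B/AB; eliminating AB and B yields a quadratic in free A,
    of which we take the positive root."""
    b = B_tot - A_tot + Kd
    return (-b + math.sqrt(b * b + 4.0 * Kd * A_tot)) / 2.0

# For strong binding (Kd far below the totals), free A approaches
# the ramp max(A_tot - B_tot, 0): B_tot acts as a linear threshold.
B_tot, Kd = 50.0, 1e-3  # nM; illustrative values only
for A_tot in (10.0, 50.0, 60.0, 100.0):
    print(A_tot, round(free_A(A_tot, B_tot, Kd), 3),
          max(A_tot - B_tot, 0.0))
```

Shrinking K_d toward zero sharpens the corner at A_tot = B_tot, which is why sub-nanomolar binding is quoted above as the requirement for clean thresholding.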

2.3.2 A Sequestration-Based Artificial Neuron

Let’s take a moment to appreciate this result: A = A_tot − B_tot. At equilibrium, the active (unbound) output A of a molecular computation is (approximately) the sum of a collection of positive and negative terms, subsumed by A_tot and B_tot, respectively. Moreover, if we account for behavior when A_tot < B_tot, we have a more interesting equation yet:

$$A_{free} = \begin{cases} 0 & \text{if } A_{tot} \le B_{tot} \\ A_{tot} - B_{tot} & \text{if } A_{tot} > B_{tot} \end{cases}$$

a nonlinear threshold function on the linear input summation A_tot − B_tot. In machine learning lingo, this ramp-like function is often called a rectified linear unit (ReLU), one of the most popular activation functions in ANNs [82]. Thus, in summary, we have a complete, albeit unweighted, artificial neuron derived entirely from one simple and biologically abundant reaction.

However, in living systems, molecular concentrations like A_tot and B_tot don’t originate as static amounts. They are the outcome of fluxes, a tumultuous struggle between mechanisms for synthesis and degradation in the cell. In realistic settings, we must instead consider an in-vivo model for molecular sequestration, with constant production rates α and β of the species A and B, respectively, a titration rate γ, and decay rate d. Nevertheless, at equilibrium and for fast titration rates, again we find A behaves as a ramp-like function of a linear combination of inputs (see Supplementary Proofs for more detail). Specifically, we have:

$$\bar{A}_{free} \approx \max\left(\frac{\alpha_{tot} - \beta_{tot}}{d}, 0\right) = \begin{cases} 0 & \text{if } \alpha_{tot} \le \beta_{tot} \\ \dfrac{\alpha_{tot} - \beta_{tot}}{d} & \text{if } \alpha_{tot} > \beta_{tot} \end{cases}$$

where α_tot and β_tot describe the summed production rates of A and B, respectively (Figure B-13). As before, we arrive at an artificial neuron at equilibrium, though now expressed in terms of rates of production and decay of molecular species.

This shift from protein concentrations to their fluxes is important for reasons we’ve already observed. Namely, it relocates us back to transcriptional and translational control, where rates of synthesis and degradation are tunable, specified a priori, and encoded in DNA (Figure B-9). Moreover, by mediating these rates through regulation by TFs, our previous transcriptional model may be folded into the sequestration framework as well, identifiable as a nonlinear feature map on inputs X_i corresponding to each TF (Figure B-10).

However, a practical question remains: which real components can we substitute for the active species A and its sequesterer B? As mentioned, there are abundant examples of molecular sequestration in nature: the sense/anti-sense RNA and sigma/anti-sigma pairs in prokaryotic organisms, and leucine zipper proteins, CRISPR/anti-CRISPR, and recombinase proteins in eukaryotes, for example, have already been re-purposed for artificial systems [22, 28, 16, 89, 69]. Similarly, regulatory dimers like TetR and PhlF can be rendered into inactive heterodimer systems by mutation of their binding domains: co-expressing the wild-type and mutant monomers will result in sequestration of the former by the latter in living cells. For example, with 4C and 6C mutants of TetR, assembled and cloned by Christian Cuba-Samaniego, I demonstrated their capacity for sequestration of wild-type TetR experimentally in HEK-293T cells. The route I primarily chose to explore, however, lies in mechanisms governing post-transcriptional regulation, in which gene expression is controlled at the RNA level following transcription. Previously, Dr. Breanna Stillo, a former graduate student, with other members of the Weiss Lab, created and verified a platform of orthogonal RNA-regulating endoribonucleases (endoRNases) [33]. These

enzymes cleave short, specific, often hairpin-structured sequences of RNA called direct repeats, which may be encoded into the transcript of a gene. By inserting the cognate direct repeat of an endoRNase upstream in the transcript, the transcript is destabilized and targeted for degradation post-cleavage. Moreover, because the enzyme remains bound to its cognate RNA hairpin, this is a single-turnover reaction; the mRNA transcript and endoRNase are both functionally inactive following binding.

I hypothesized that, if the degradation of destabilized, cleaved RNA occurs more quickly than its translation, the reaction, altogether, would represent a molecular sequestration of functional mRNA by endoRNase. To test this hypothesis, I selected two endoRNases, Csy4 and CasE, whose binding affinity to their cognate hairpins has previously been shown to exceed the requirements for molecular sequestration (K_d = 0.6 and 0.1 nM, respectively) [56, 111]. For each enzyme, I assembled a DNA construct which constitutively expresses its gene as well as an infrared-fluorescent iRFP marker protein, and measured by flow cytometry the availability of a constitutive eYFP yellow-fluorescent plasmid whose transcript contained an upstream cognate hairpin for the corresponding endoRNase. The results are shown in Figure B-11, in which we can observe successful thresholding of mRNA by the CasE endoRNase, suggesting the reaction conforms to molecular sequestration. This experiment gave evidence that our lab’s endoRNase library would be a suitable platform from which to construct more sophisticated, sequestration-based ANNs.

In the previous paragraphs, much effort was devoted to constructing a biomolecular reaction mechanism for artificial neuronal computation which admits positively and negatively weighted values. Formally, we can illustrate the advantage of a linear classifier with positive and negative weights (versus positive weights only) using a measure of capacity for a functional class called its Vapnik-Chervonenkis or VC-dimension [114]. For a classification model, its VC-dimension is the maximum number of points it is able to shatter. In order to shatter a configuration of points, for every possible assignment of classes there must exist some set of parameter values such that the model correctly classifies each point. Intuitively, this measure can be thought of as gauging the expressiveness of a classifier. Neurons with positive and negative weights are able to correctly distinguish more complicated distributions than neurons which possess positive weights only; the maximum VC-dimensions are 1 and 3, respectively, for two-input artificial neurons with positive-only or positive and negative weights. For the linear classification of biomarkers in cells, possessing the capacity for both addition and subtraction is paramount to a successful artificial neuron.

Though early two-input thresholding experiments provided a strong indication that both operations were taking place, to more thoroughly validate that the framework could be extended to add and subtract multiple unique inputs, I designed and assembled genetic circuits for two additional artificial neurons. The first performs a three-input linear summation of two positive inputs and one negative input; the second, a three-input linear summation of two negative inputs and one positive input (Figure B-12, Figure B-16). From experimental data collected by the poly-transfection method described, we can observe a strong correspondence between steady-state fluorescence measurements and the corresponding log-scale computations, displayed for reference (Figure ??, Figure ??). Note: though the operations performed are linear-scale additions, the fluorescence measurements collected after poly-transfection are uniformly distributed across each dimension in log-scale. Thus, I report experimental data using log-scale axes and provide a linear-scale reference in the adjacent graph. Moreover, we can observe a clear thresholding effect by modulation of the third input to the circuits, as displayed in Figures ?? and ??.

Because the second cornerstone of artificial neuronal computation is their parameterization, or adjustment of weights, I next sought to validate this property in-vivo. By decreasing the stoichiometry, or relative concentration of DNA, for one of the positive inputs while keeping the relative concentration of its transfection marker unchanged, I could adjust that component’s relative rate of expression in-vivo, and thus modulate its weight. I applied this strategy to successfully generate weight changes in three variants of the previous experiment (Figure ??). We can observe an obvious shift in the intercept of x_2, whose weight was modulated by three log-orders of magnitude, decreasing from left to right.
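The VC-dimension comparison can be illustrated by brute force. In the sketch below (an illustrative Python enumeration, not from the experiments above), a threshold unit 1[w1·x1 + w2·x2 + b > 0] stands in for the neuron, and a coarse weight grid is searched for every possible labeling of three points; signed weights realize all eight labelings, while positive-only weights cannot:

```python
import itertools

def predicts(w1, w2, b, pts, labels):
    """True if the threshold unit 1[w1*x + w2*y + b > 0]
    reproduces every label in `labels` on `pts`."""
    return all((w1 * x + w2 * y + b > 0) == lab
               for (x, y), lab in zip(pts, labels))

def can_shatter(pts, grid):
    """Brute force: for EVERY binary labeling of pts, is there some
    (w1, w2, b) drawn from `grid` that realizes it?"""
    for labels in itertools.product([False, True], repeat=len(pts)):
        if not any(predicts(w1, w2, b, pts, labels)
                   for w1 in grid for w2 in grid for b in grid):
            return False
    return True

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
signed = [0.5 * k for k in range(-6, 7)]    # weights in [-3, 3]
positive = [0.5 * k for k in range(0, 7)]   # weights in [0, 3]

print(can_shatter(pts, signed))    # True: three points shattered
print(can_shatter(pts, positive))  # False: labeling only the origin
                                   # positive is unrealizable
```

With positive-only weights the decision function is monotone in both inputs, so any labeling that puts a dominated point in the positive class and a dominating point in the negative class is out of reach.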

2.3.3 Sequestration-Based Molecular ANNs to Solve the XOR Problem

Exclusive logical or (XOR) classification is a model problem in the field of machine learning, particularly because it requires non-linear classification and, consequently, is unsolvable with a single artificial neuron. Correctly classifying the four inputs needed to describe an XOR function requires a VC-dimension of at least 4, one greater than the limit of any neuron I’ve previously shown. Solving this problem is of special interest because it demands a network of computations to be performed, each appropriately weighted and behaving properly. This is a sensitive affair: get any part wrong and the entire computation collapses.

I worked with Christian Cuba-Samaniego to design two alternative models capable of this classification. The first is a traditional two-layer BNN featuring three nodes (Figure B-20) which computes an analog version of the XNOR logical function; the second, a reduced model of only two nodes which computes an analog XOR (Figure B-22). Both require three orthogonal sequestration reactions to take place in the cell: CasE with its cognate mRNA, Csy4 with its cognate mRNA, and a third molecule, φC31 recombinase, which forms a dimer complex and may be sequestered by its mutant partner, φC31-RDF. Following several rounds of stoichiometric tuning, both were ultimately successful, as shown in Figure B-21 and Figure B-23 for the XNOR and XOR computations, respectively.
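As a numerical sanity check of the reduced architecture, the sketch below implements a two-unit ReLU network computing XOR. The weights are the textbook hand-chosen ReLU solution, shown for illustration only; they are not the stoichiometries tuned in the genetic circuits above:

```python
def relu(x):
    """Ramp activation, the same thresholding produced by sequestration."""
    return max(x, 0.0)

def xor_two_node(x1, x2):
    """Two ReLU units: a hidden unit that fires only when both inputs
    are high, fed into the output unit with weight -2."""
    h = relu(x1 + x2 - 1.0)
    return relu(x1 + x2 - 2.0 * h)

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(int(a), int(b), "->", xor_two_node(a, b))
```

On the four binary inputs the network returns 0, 1, 1, 0, i.e. exactly XOR, using only two nonlinear units rather than the three of the classic perceptron solution.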

2.4 Conclusions

I have presented two motifs for biomolecular ANNs which execute artificial neuron computations at equilibrium. In the past, several models of chemical neural networks have been explored and experimentally tested; neural network computations have been advanced furthest in-vitro using DNA strand displacement, itself a sequestration mechanism [24, 97, 70]. Primarily, I believe this model offers a generalized chemical neural network, and a proven blueprint for feasibly implementing this type of computation in living systems. Gene regulation is a powerful source of inspiration that may have broader scalability in synthetic biology in the future. Moreover, there are abundant examples of protein sequestration in nature, and many have already been re-purposed for artificial systems [22, 28, 16, 89, 69].

At this point, one might reasonably inquire whether the two motifs might be combined. This would produce artificial neurons belonging to a hybrid scale of log and linear domains, and could extend our toolbox for ANN computation into two potentially compatible mechanisms. Moreover, from a practical standpoint, these operations would occur across multiple modalities of regulation - transcriptional activation and repression as well as post-transcriptional repression and dominant-negative protein sequestration - offering a safety net of redundancy and orthogonality in the face of uncertainty in the cell. As a proof-of-concept, I experimentally characterized one such circuit combining transcriptional regulation by TetR and Gal4-VP16 with post-transcriptional regulation by CasE (Figure B-24). As predicted, we can observe an order-of-magnitude fold-change in the range of the output, one indicator of strong and redundant regulation (Figure B-25).

I also see this as groundwork on which to build other analogies between biochemical networks and results from the field of machine learning. For instance, to address data which is not linearly separable, machine learning practitioners have often turned to feature maps, cleverly designed functions which transform input data from one representation to another. If successful, the transformed data will have a clear linear margin dividing classes, allowing simpler (i.e. shallower) networks to solve otherwise impossible classification problems. Fortunately, we can re-purpose many mechanisms found in the existing toolbox for bio-engineering as feature maps on input molecular species. Earlier, we suggested activation and inhibition interactions, though, in fact, all Φ_i above can be considered internal feature maps in their respective networks. Our use of molecular sequestration can even be altered into an absolute-value mapping [104]. Moreover, in addition to expressiveness, the careful selection of these functions can allow for greater resource efficiency when designing biomolecular neural networks; an inhibition reaction, for instance, allowed the three-perceptron XOR function of Figure 4a to be reduced to two units in Figure B-20.

Beyond rationally selecting feature maps, designing a BNN to implement a desired behavior in-vivo will require each weight to be set appropriately. In the past, others [80, 49, 70, 96] have suggested their network parameters be optimized in-silico using popular optimization strategies like gradient descent. Here, I reinforce that strategy with a caveat: perhaps deterministic methods are not so realistic for the simulation and training of a biomolecular neural network. Fig. ??, for example, depicts the output variance of one perceptron as the number of input molecules in the system decreases. Rather, well-studied probabilistic methods in machine learning, which operate on the probability distributions of inputs and portray confidence intervals for their behavior, may be more appropriate.

Finally, I hope the development and optimization of BNNs will contribute to the understanding of learning mechanisms biology currently employs. Perhaps, then, the discovery of new, unexpected motifs will simultaneously help biologists and bio-engineers build more complex systems and decode those which already exist in nature.

2.5 Methods

DNA Assembly: All ENTRs were provided at 5 femtomoles/µl and DESTs at 10 femtomoles/µl concentrations. The LR reaction utilized LR Clonase II Plus (ThermoFisher 12538120) according to manufacturer protocols without Proteinase K treatment. Two microliters of each reaction were transformed into 50 µl aliquots of chemically competent (Zymo Research Z-Competent Kit) cells of the bacterial strain E. cloni (Lucigen) per manufacturer protocol. pUC19 controls were provided in 10 picogram/µl quantities. Ampicillin in all LB and LB-agar plates was at a final concentration of 100 µg/mL. Plasmid extraction was done with 4 mL overnight cultures supplemented with the appropriate antibiotic using a Miniprep Kit (Qiagen Inc) and verified by restriction digests and gel electrophoresis. All restriction endonucleases were obtained from New England Biolabs.

Cell Culture: HEK-293T cell lines were graciously provided by Dr. Ron Weiss. All cell lines used in this study were maintained in Dulbecco’s modified Eagle Medium (DMEM, Cellgro) supplemented with 10% FBS (Atlanta BIO), 1% penicillin/streptomycin/L-glutamine (Sigma-Aldrich) and 1% nonessential amino acids (HyClone) at 37 °C and 5% CO2.

DNA Preparation and Transfection: All transfections were carried out in 24-well format. Forty-eight hours prior to transfection, CHO-K1 cells were harvested by adding 250 µl of 0.05% trypsin and incubating for 3 minutes at room temperature, then seeded in 500 µl of complete media in a 24-well plate (5×10^5 cells per well). Transfections were performed with Lipofectamine 3000 transfection reagent according to the manufacturer’s protocol. A total of 1800 ng of DNA diluted to 12 µl was mixed with 36.5 µl Opti-MEM Serum Free Medium (Life Technologies) and 1.5 µl of Lipofectamine 3000 reagent to a final volume of 50 µl. After ensuring each was thoroughly mixed, the complexes were incubated for 12 minutes at room temperature and added dropwise to wells containing HEK-293T cells. Cells were harvested by trypsinization 18 hours after transfection and placed on ice before analysis by flow cytometry.

Flow Cytometry:

Individual 24-well HEK-293T cultures in DMEM media supplemented with FBS, glutamine, and appropriate antibiotics were trypsinized with 100 µl 0.05% Trypsin (ThermoFisher) before being collected by centrifugation and resuspended in 150 µL 1X PBS. The same LSRFortessa flow analyzer (BD Biosciences) was used for all flow cytometry measurements, using the same settings. For each sample, 200,000 events were collected and gated according to forward scatter (FSC-A) (PMT of 100 V) and side scatter (SSC-A) (PMT of 175 V). eYFP was measured using a 488-nm laser and a 530/15 emission filter with a photomultiplier tube (PMT) setting of 300 V. mKate was measured using a 561-nm laser and a 610/20 emission filter with a PMT setting of 500 V. eBFP2 was measured using a 405-nm laser and a 450/50 emission filter with a PMT setting of 325 V. All cytometry data files were analyzed using CytoFlow (Teague Lab, UW-Stout).

2.6 Supplementary Proofs

2.6.1 Sequestration-Based Biomolecular Perceptron Model

Molecular sequestration consists of two species, Z1 and Z2, which form an inactive complex, thereby reducing the species’ individual concentrations, as shown in Fig 1B. We consider constant production rates u and v of the species Z1 and Z2, respectively, a titration rate γ, and decay rate φ. The chemical reactions describing this interaction are:

$$\emptyset \xrightarrow{u} Z_1 \qquad \emptyset \xrightarrow{v} Z_2 \qquad \text{(constitutive activation)}$$

$$Z_1 \xrightarrow{\varphi} \emptyset \qquad Z_2 \xrightarrow{\varphi} \emptyset \qquad \text{(degradation)}$$

$$Z_1 + Z_2 \xrightarrow{\gamma} \emptyset \qquad \text{(molecular sequestration)}$$

where u and v describe the production rates of z1 and z2, respectively, driven by the constant input species X_i with individual rates w_i. Assuming our reactions follow mass-action kinetics, we can derive the following Ordinary Differential Equations (ODEs):

$$\dot{z}_1 = u - \gamma z_1 z_2 - \varphi z_1 \tag{2.1}$$

$$\dot{z}_2 = v - \gamma z_1 z_2 - \varphi z_2 \tag{2.2}$$
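As a numerical check of (2.1)-(2.2), the forward-Euler sketch below (plain Python, with illustrative rates chosen so that γ ≫ φ, not fitted parameters) integrates the system to equilibrium and compares z̄1 against the ramp max((u − v)/φ, 0) anticipated by Proposition 2:

```python
def sequester_eq(u, v, gamma=100.0, phi=1.0, dt=1e-3, steps=20_000):
    """Integrate (2.1)-(2.2) by forward Euler from z1 = z2 = 0 over a
    time horizon steps*dt, long enough to reach equilibrium given the
    exponential convergence rate phi (Proposition 1)."""
    z1 = z2 = 0.0
    for _ in range(steps):
        flux = gamma * z1 * z2      # shared sequestration flux
        z1 += dt * (u - flux - phi * z1)
        z2 += dt * (v - flux - phi * z2)
    return z1

for u, v in ((2.0, 5.0), (8.0, 5.0)):
    ramp = max((u - v) / 1.0, 0.0)  # predicted z1_bar for phi = 1
    print(u, v, round(sequester_eq(u, v), 3), ramp)
```

The agreement is tightest away from the corner u = v, where the exact equilibrium smooths the ramp by roughly √(u/γ).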

A stability analysis of (2.1)-(2.2) has already been demonstrated in [11] for a general model of the same form. Here, we report Proposition 2 from that paper, adapted to our notation.

Proposition 1 The variables of system (2.1)-(2.2) are bounded for any initial condition z1(0), z2(0) ≥ 0. The system admits a unique, asymptotically stable equilibrium point, $\bar{z} = (\bar{z}_1, \bar{z}_2)^\top$, and the convergence is exponential: $\|z(t) - \bar{z}\|_1 \le e^{-\varphi t} \|z(0) - \bar{z}\|_1$, for any φ > 0 and z(0) ≥ 0. Moreover, oscillatory behavior is not possible around the equilibrium, so $z(t) = \bar{z}$ occurs at most once.

The proof, which relies on non-smooth Lyapunov functions, can be found in [11].

Proposition 2 At equilibrium and for fast titration rates γ ≫ φ, z1 behaves as a ramp-like function of a linear combination of inputs x_i:

$$\bar{z}_1 \approx \begin{cases} 0 & \text{if } x' < 0 \\ x' & \text{if } x' \ge 0 \end{cases}, \qquad x' = \sum_{i=1}^{n} x_i w_i' \; : \; x_i, w_i' \in \mathbb{R}$$

Proof: Under the conditions above, it has been previously shown in [28] that molecular sequestration operates as a comparator module: for a fixed v, a threshold is defined for u such that, when u < v, the output concentration z̄1 is nearly zero, and for u ≥ v, it is proportional to (u − v) by a positive factor φ^{-1}. This reasoning is symmetric when u is held constant and z2 is taken as the output. We represent the behavior of z1 at equilibrium as:

$$\bar{z}_1 \approx \begin{cases} 0 & \text{if } u < v \\ \dfrac{u - v}{\varphi} & \text{if } u \ge v \end{cases} \tag{2.3}$$

Next, we aim to find z̄1 and z̄2 as functions of the inputs u and v. To do this, we set ż1 = ż2 = 0, and find

$$\bar{z}_2 = \frac{u - \varphi \bar{z}_1}{\gamma \bar{z}_1} = \frac{v}{\gamma \bar{z}_1 + \varphi}.$$

This results in a second-order polynomial equation

$$a \bar{z}_1^2 + b \bar{z}_1 + c = 0,$$

where a = 1, b = φ/γ + (v − u)/φ and c = −u/γ. The constant term c is negative, resulting in a single positive solution, $\bar{z}_1 = \frac{1}{2}\left(-b + \sqrt{b^2 - 4c}\right)$. If γ ≫ φ, |c| is small compared to b², and the solution can be approximated as:

$$\bar{z}_1 \approx \begin{cases} 0 & \text{if } u < v \\ \dfrac{u - v}{\varphi} & \text{if } u \ge v \end{cases} \tag{2.4}$$

Now, we emphasize that when u ≥ v, the output, $\bar{z}_1 = \varphi^{-1}(u - v)$, defines a sum of input concentrations, x_i, weighted by their rates of production and φ^{-1}:

$$\frac{u - v}{\varphi} = w_0' + \sum_{i=1}^{m} w_i'^{(u)} x_i^{(u)} - \sum_{j=m+1}^{n} w_j'^{(v)} x_j^{(v)} \ge 0$$

Inputs and weights denoted $x_i^{(u)}$ and $w_i^{(u)}$ are associated with input u; $x_j^{(v)}$ and $w_j^{(v)}$, with input v. All weights are effective, i.e. $w_i' = w_i / \varphi$. Thus, we can restate the behavior of z1 at equilibrium as:

$$\bar{z}_1 \approx \begin{cases} 0 & \text{if } x' < 0 \\ x' & \text{if } x' \ge 0 \end{cases}$$

$$x' = w_0' + \sum_{i=1}^{m} w_i'^{(u)} x_i^{(u)} - \sum_{j=m+1}^{n} w_j'^{(v)} x_j^{(v)} = \sum_{i=1}^{n} x_i w_i' \; : \; x_i, w_i' \in \mathbb{R}$$

where the terms $x_i w_i' \ge 0$ are associated with input u, and the terms $x_i w_i' < 0$ with input v.

In the context of ANNs, the ramp function is commonly denoted the Rectified Linear Unit (ReLU) and is a common choice of activation function with strong mathematical justifications [46]. Thus, we can interpret the preceding reaction network as a complete perceptron with positive and negative weights: a positive weight corresponds to the production of z1 by the input u, and a negative weight corresponds to the production of the sequesterer z2 by the input v. Our threshold function $\hat{f}$ is a ReLU-like function.
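The quality of approximation (2.4) can be examined directly from the quadratic. The sketch below (an illustrative Python parameter sweep) evaluates the positive root z̄1 = (−b + √(b² − 4c))/2 and reports its worst-case deviation from the ramp over a grid of inputs; the deviation shrinks as γ/φ grows, consistent with the condition γ ≫ φ:

```python
import math

def z1_exact(u, v, gamma, phi):
    """Positive root of z1^2 + b*z1 + c = 0 with
    b = phi/gamma + (v - u)/phi and c = -u/gamma (a = 1)."""
    b = phi / gamma + (v - u) / phi
    c = -u / gamma
    return (-b + math.sqrt(b * b - 4.0 * c)) / 2.0

def z1_ramp(u, v, phi):
    """The ReLU approximation (2.4)."""
    return max((u - v) / phi, 0.0)

phi = 1.0
for gamma in (10.0, 100.0, 1000.0):
    worst = max(abs(z1_exact(u, v, gamma, phi) - z1_ramp(u, v, phi))
                for u in range(11) for v in range(11))
    print(gamma, round(worst, 4))  # worst-case error falls with gamma
```

The worst case always sits near the corner u = v, where the exact root behaves like √(u/γ) rather than 0.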

However, in the interest of scalability, we introduce an additional non-linear enzymatic reaction to distinguish the input and output molecular species. Practically, when connecting perceptrons in a feed-forward network, this will enable the perceptron to drive downstream processes – for instance, one in which the new output species is an input to a subsequent perceptron – without interfering with the upstream sequestration. The expression for the output at equilibrium is now:

$$Z_1 \xrightarrow{\Phi} Y \qquad Y \xrightarrow{\delta} \emptyset \qquad \text{(production/degradation)}$$

where:

$$\Phi = \alpha \frac{z_1^n}{z_1^n + k_1^n}, \quad \text{for an activator } z_1$$

$$\Phi = \alpha \frac{k_1^n}{z_1^n + k_1^n}, \quad \text{for an inhibitor } z_1$$

All following analyses assume z1 activates a downstream process; similar steps can be followed when z1 serves as an inhibitor. Finally, we describe the output y at equilibrium:

$$\bar{y}(u, v) \approx \begin{cases} 0 & \text{if } u < v \\ \dfrac{\alpha}{\delta} \left( \dfrac{z_1^n}{z_1^n + k_1^n} \right) & \text{if } u \ge v \end{cases} \tag{2.5}$$

2.6.2 Sequestration-Based Biomolecular Neural Network

A Biomolecular Neural Network, or BNN, contains many biomolecular perceptrons organized into layers associated only through feed-forward connections. Each unit in a layer processes an input species $x_{i,j}$, $i \in \{1, \ldots, W\}$, $j \in \{1, \ldots, D\}$, to produce an output species $y_{i,j}$, where W is the width of the network and D is its depth.

For sigmoidal and other nonlinear activation functions, this multilayer feedforward architecture endows perceptron networks with the potential to approximate a wide variety of functions when given appropriate parameters [29, 51].

Proposition 3 Consider a BNN as defined with depth 퐷 and layer width 푤푗 for each layer 푗. Every equilibrium point is locally asymptotically stable.

Proof: The Jacobian of the perceptron P in position i, j is described by $P_{i,j}$:

$$P_{i,j} = -\begin{bmatrix} \varphi + \gamma \bar{z}_{2,i,j} & \gamma \bar{z}_{1,i,j} \\ \gamma \bar{z}_{2,i,j} & \varphi + \gamma \bar{z}_{1,i,j} \end{bmatrix}$$

From 푃푖,푗, we obtain the characteristic polynomial for a single perceptron:

$$\lambda_{i,j}(s) = \det(P_{i,j} - sI) = (s + \varphi)\{s + \varphi + \gamma(\bar{z}_{1,i,j} + \bar{z}_{2,i,j})\}$$

We represent a layer of the network 퐿 at depth 푗 with the matrix 퐿푗:

$$L_j = \begin{bmatrix} P_{j,1} & 0 & \cdots & 0 \\ 0 & P_{j,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & P_{j,m_j} \end{bmatrix},$$

A matrix $C_{j,k}$, $j, k \in \{1, \ldots, D\}$, describes the relation between layers j and k, where j < k. Altogether, we have the following Jacobian for a BNN, which represents the general Jacobian for a feedforward network:

$$J = \begin{bmatrix} L_1 & 0 & \cdots & 0 & \cdots & 0 \\ C_{1,2} & L_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ C_{1,j} & C_{2,j} & \cdots & L_j & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ C_{1,D} & C_{2,D} & \cdots & C_{j,D} & \cdots & L_D \end{bmatrix},$$

Because the matrix J is a lower block-triangular matrix, we can rewrite it as:

$$J = \begin{bmatrix} A & 0 \\ C & D \end{bmatrix},$$

where A and D are also lower block-triangular matrices, C is a matrix, and 0 is a block containing only zeroes. The characteristic polynomial P(s) for the matrix J is therefore the product of the determinants of the diagonal blocks, $P_{i,j}$. Thus, we obtain a final expression for P(s):

$$P(s) = \prod_{j=1}^{D} \left( \prod_{i=1}^{m_j} \lambda_{i,j}(s) \right)$$

From the characteristic polynomial of each block $P_{i,j}$ given above, all eigenvalues have negative real parts, and the BNN with Jacobian J is stable around any equilibrium.
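The sign of the eigenvalues can be verified numerically for a single perceptron block. The sketch below (plain Python, with illustrative equilibrium values for z̄1 and z̄2) computes the eigenvalues of the 2×2 Jacobian via its trace and determinant, recovering −φ and −(φ + γ(z̄1 + z̄2)):

```python
import math

def perceptron_eigs(z1, z2, gamma=100.0, phi=1.0):
    """Eigenvalues of P = -[[phi + gamma*z2, gamma*z1],
                            [gamma*z2, phi + gamma*z1]]
    via the trace/determinant formula for 2x2 matrices."""
    a11, a12 = -(phi + gamma * z2), -(gamma * z1)
    a21, a22 = -(gamma * z2), -(phi + gamma * z1)
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0

# The characteristic polynomial predicts -phi and
# -(phi + gamma*(z1 + z2)): both strictly negative for any
# nonnegative equilibrium concentrations.
lam_fast, lam_slow = perceptron_eigs(3.0, 0.02)
print(lam_fast, lam_slow)  # approximately -303.0 and -1.0
```

The slow mode −φ matches the exponential convergence rate of Proposition 1; the fast mode reflects the rapid sequestration flux.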

Chapter 3

Molecular Learning and Adaptation in Mammalian Cells

To recapitulate, synthetic biologists seek to collect, refine, and repackage nature so that it’s easier to design new and reliable biological systems, typically at the cellular or multicellular level. These redesigned systems are often referred to as “biological circuits,” for their ability to perform operations on biomolecular signals, rather than electrical signals, and for their aim to behave as predictably and modularly as would integrated circuits in a computer. However, unlike computer hardware, engineered circuits in living cells often exhibit poor robustness and substantial variations from one cell to the next [42, 37]. A major cause of such behavior is that the biochemical components which constitute a circuit depend on factors in their molecular environment (or context) within and outside the cell [3]. Despite the effort of bioengineers to design and construct biological processes which obey a pre-determined set of rules, how their designs function in vivo is often circumstantial, or context-dependent.

Here, we refer to context as the environment in which a system finds itself [47]. The context in which single cells and multi-cellular organisms find themselves defines the way they interact, respond to the environment, and process external information. For instance, in embryonic development, chemical signatures in stem cells’ inter- and intracellular environments control their differentiation and patterning, transforming a single zygote into a complex organism [18]. At its best, contextual richness underscores the complexity of biological life. However, for synthetic devices, problems of context are barriers to the reliable function of biological pathways, more often coupled with violations of robustness and modularity. Since the birth of the field, synthetic biologists have had to grapple with issues of context, particularly when their molecular and genetic devices inexplicably fail to function as designed [19]. Recently, progress has been made in taking such environmental factors into account in the modeling and design of molecular circuits [68, 113, 121]. This can tremendously improve the faithfulness of computational models and, in turn, the predictability of rationally designed circuits. However, in practical scenarios, the origins and properties of these influences are barely known and hard to anticipate at design time, and sometimes difficult to uncover even once implemented.

Thus, while synthetic biologists often use principles of rational engineering to understand and construct biological systems, rational concepts may have practical limits in biological design. This is particularly true when optimizing complex, multi-step phenotypes or when the physical link between genotype and phenotype is obscured from observation. For synthetic biologists, this challenge complicates their ability to design and tune a circuit a priori so that it performs as intended in the real environment of a cell.

In gene and engineered-cell therapies, issues of context are critical to safe and efficacious decision-making. As aforementioned, relevant cellular biomarkers may provide a useful proxy for deducing the context of a cell (e.g. its cell state) to steer the direction of intervention. However, their integration into synthetic biological circuits does not circumvent the issue altogether. In practical scenarios, a classification task is not often static or fully understood: diseases like cancer adapt to evade elimination, while organ development requires a complicated coordination of growth factors and external inputs. A viable alternative is to employ adaptive design principles, in which a circuit continuously senses and modifies itself according to changing environmental conditions and a predefined objective. This requires molecular circuits which make inferences about their surroundings and learn in order to improve on future decisions.

As a starting point, in the following section, I discuss the behavior of a genetically-encoded circuit which learns the optimal stoichiometry of a single component in order to emulate the behavior of a second reference circuit. To do so, the entire design is first divided into two sub-circuits, Comparator and Update components, which are individually analyzed. Then, I discuss the requirements and challenges of each, as well as of connecting the two components together.

3.1 A Genetic Circuit for Learning in Mammalian Cells

3.1.1 Circuit Design

In this context, the objective of a learning circuit is to modify a parameter W to minimize the difference between two transfer functions: F1, involving parameter W and input X, and F2, involving reference Y. Because this modification to W should enable F1 to stably emulate F2 over time and in the absence of Y, I consider genetic modifications to W (versus, for instance, a molecular quasi-integral controller). The circuit design is shown in Figure B-26 and Figure B-27.

In the circuit, a small molecule, doxycycline, binds to a reverse tetracycline trans-activator (rtTA), constitutively expressed by a modifiable number of transcriptional units (the modification process is described later), inducing the expression of a monomeric Tet-4C mutant protein. Simultaneously, another small molecule, DAPG, binds to constitutive PhlF to activate expression of a monomeric Tet-6C mutant protein. Unbound Tet-4C and Tet-6C proteins undergo molecular titration, in which either protein is sequestered by the other into an inactive TetR-4C:TetR-6C complex that is unable to participate in downstream processes (although inaccurate, the Tet-4C and Tet-6C mutants are assumed not to interact with DOX or rtTA). The remaining, unbound Tet-4C and Tet-6C proteins may then dimerize to induce expression of serine integrase and serine integrase-RDF (Recombination Directionality Factor) fusion proteins, respectively.

Serine integrase and serine integrase-RDF fusion proteins – hereafter referred to as int and int-RDF – catalyze the inversion of DNA regions flanked by attP/attB and attL/attR target sites, respectively, in four stages shown in Figure B-28: 1) integrases bind as dimers to the region of DNA containing their target sites, 2) bound integrases synapse into a tetramer, uniting the respective target sites, and invert the region of DNA between target sites, 3) the integrase tetramer de-synapses back into two dimers, which 4) unbind from the now-converted target sites (attP/attB to attL/attR or vice-versa).
A unidirectional promoter region for rtTA is placed between recombination sites on multiple integrated transcription units. Because int and int-RDF target the recombination sites produced by each other’s inversion, we utilize them to reversibly tune the stoichiometry of the circuit: int decreases the active copy number of rtTA transcription units while int-RDF increases it.

Disclaimer: This design is only presumed to carry out the learning procedure described above under some assumptions:

1. There exists an active number of rtTA transcriptional units, less than or equal to the total number integrated, which minimizes the difference between F1 and F2. More generally, modifying the active copies is presumed to impact F1.

2. There is a direct relationship between changes to the active copy number of rtTA sites and the difference between F1 and F2; the partial derivative of that difference w.r.t. the active copy number of rtTA is always positive.

3.1.2 Discussion

Figure B-29 shows the transfer of species within one update to the active rtTA promoter sites, from the introduction of int and int-RDF to equilibrium. It's important to note that the long timescale for adaptation would be a strongly limiting factor in any practical setting in mammalian cells; its pace is primarily a result of the slow de-synapsing reaction following DNA inversion, the unbinding of integrases from DNA, and the slow rate of dilution accounting for the removal of update species from the system. In the following simulations, to be able to analyze meaningful behavior, especially when connected to the comparator module, I assume a 10x speedup in de-synapsing and increased degradation approximating SsrA-tag degradation rates in bacteria.

As a baseline analysis, I wished to understand the input-output behavior of the integrase-based update mechanism. In particular, I first wanted to understand whether relative concentrations of int and int-RDF would produce the desired update to the active rtTA promoter sites at equilibrium, as both integrase species degrade from the system. Figure B-30 demonstrates that when both enzymes are introduced to the system simultaneously and allowed to degrade, the action on the attL/attR and attP/attB sites is as anticipated. Moreover, the outcome appears to be driven by the proportional relationship of the enzymes (we can observe this in the anti-diagonal linear change in attP/attB sites), rather than their magnitudes. This is a desirable property when considering the circuit in full; that the update is a result of the ratiometric difference between the integrases insulates the update module from, for instance, the magnitude of their production by the comparator module. Moreover, if the protein sequestration is imperfect, meaning some quantity of both products remains in the system, the leftover sequesterer won't impact the system as strongly. However, this also introduces more issues in the transient regime of each individual update.

Figure B-31 demonstrates the effect of asynchrony between int and int-RDF input signals on the update to active rtTA promoter sites by creating time lags in the introduction of int-RDF. By delaying int-RDF, the pathway initiated by int immediately begins the process of recombination (the values shown in Figure B-31 plot the sum of rtTA promoter sites across all stages of recombination), biasing the rtTA sites at equilibrium toward a lower value and creating an off-diagonal response altogether. The opposite effect is true when delaying int.

Figure B-32 shows the behavior of the comparator module across titrated levels of active attP/attB promoter sites. Here, we can observe two important characteristics for error minimization. First, the relationship between P/B sites and the error signal (defined here as the absolute difference between the reference signal, DAPG:PhlF, and the signal of interest, DOX:rtTA) is convex. Second, the protein sequestration by Tet-4C and Tet-6C generates an update signal (ratio of int to int-RDF) opposite the gradient of the error. Together, these characteristics imply that successive iterations of updates would result in an optimal P/B site value. However, as Figure B-33 exemplifies, this is a precarious implication; imbalances in the production of the Tet mutants and integrase enzymes upset the relationship between the error and update minima. For example, a disparity in the dissociation constants for DAPG:PhlF and DOX:rtTA with their promoter sites separates the minima such that the system would undershoot the "true" minimum. In Figure B-34, again we see the negative impact of input delays. Here, differences in the production time for the titrant and sequesterer species disrupt the error inferred by the system, creating an update even when the number of rtTA sites is already at the optimum value.

When simulating the complete circuit, we can observe the impact of this system's fragility as described above. Figure B-35 displays two optimization runs consisting of 2-hour inductions of DAPG and DOX every 24 hours for 10 days. The right simulation begins below the optimum number of active rtTA sites and must increase its value; the left begins above it, so that it must decrease its active sites. In the end, neither simulation arrives at the true optimum (the right simulation actually passes it).

Through the studies above, I've attempted to qualitatively understand the behavior and challenges of a circuit which modifies itself to learn an optimal stoichiometric parameter provided user inductions. The most palpable takeaway is a stronger understanding of the slow pace and fragility of the system as designed, as well as the method for training it. A working version would need to incorporate finely regulated timing of inductions as well as pathways for comparison and updating which are experimentally verified to be symmetric; neither is an easy feat on its own. In retrospect, true timescale separation between the comparator and update module could actually be advantageous, so that an update is only completed once the error signal is stable. However, this wouldn't eliminate the difficulties of lag times in the comparator as described above. Overall, the circuit design needs to be reconsidered in light of the issues already described, though I believe the analysis was successful in unearthing some issues which should be reexamined in a more mature design.

3.1.3 Description of Model

Integrase Recombination Species

PB : Free promoter flanked by attP-attB recombinase target sites.

LR : Free promoter flanked by attL-attR recombinase target sites.

int : Serine integrase (tetramer).

intRDF : Fusion protein (tetramer) of serine integrase and recombination directionality factor (RDF).

PBint : Complex containing PB and int.

LRint1 : Synaptic complex containing LR and int after recombination.

LRint2 : Complex containing LR and int after de-synapsis.

LRintRDF : Complex containing LR and intRDF.

PBintRDF1 : Synaptic complex containing PB and intRDF after recombination.

PBintRDF2 : Complex containing PB and intRDF after de-synapsis.

Reactions

$$PB + 4\,int \underset{b_{1r}}{\overset{b_{1f}}{\rightleftharpoons}} PBint \underset{r_{1r}}{\overset{r_{1f}}{\rightleftharpoons}} LRint_1 \underset{syn_{1r}}{\overset{syn_{1f}}{\rightleftharpoons}} LRint_2 \underset{b_{2f}}{\overset{b_{2r}}{\rightleftharpoons}} LR + 4\,int$$

$$LR + 4\,intRDF \underset{b_{3r}}{\overset{b_{3f}}{\rightleftharpoons}} LRintRDF \underset{r_{2r}}{\overset{r_{2f}}{\rightleftharpoons}} PBintRDF_1 \underset{syn_{2r}}{\overset{syn_{2f}}{\rightleftharpoons}} PBintRDF_2 \underset{b_{4f}}{\overset{b_{4r}}{\rightleftharpoons}} PB + 4\,intRDF$$

ODEs

Fast Reactions:

$$\frac{d\,PBint}{dt} = b_{1f}\,PB\cdot int^4 - b_{1r}\,PBint + \left[\,r_{1r}\,LRint_1 - r_{1f}\,PBint\,\right] \;\Longrightarrow\; PBint = \frac{PB\cdot int^4}{\bar{K}_{b1}}$$

$$\frac{d\,LRint_2}{dt} = b_{2f}\,LR\cdot int^4 - b_{2r}\,LRint_2 + \left[\,syn_{1f}\,LRint_1 - syn_{1r}\,LRint_2\,\right] \;\Longrightarrow\; LRint_2 = \frac{LR\cdot int^4}{\bar{K}_{b2}}$$

$$\frac{d\,LRintRDF}{dt} = b_{3f}\,LR\cdot intRDF^4 - b_{3r}\,LRintRDF + \left[\,r_{2r}\,PBintRDF_1 - r_{2f}\,LRintRDF\,\right] \;\Longrightarrow\; LRintRDF = \frac{LR\cdot intRDF^4}{\bar{K}_{b3}}$$

$$\frac{d\,PBintRDF_2}{dt} = b_{4f}\,PB\cdot intRDF^4 - b_{4r}\,PBintRDF_2 + \left[\,syn_{2f}\,PBintRDF_1 - syn_{2r}\,PBintRDF_2\,\right] \;\Longrightarrow\; PBintRDF_2 = \frac{PB\cdot intRDF^4}{\bar{K}_{b4}}$$

Slow Reactions:

$$\frac{d\,PB_{tot}}{dt} = r_{1r}\,LRint_1 - r_{1f}\,PBint + r_{2f}\,LRintRDF - r_{2r}\,PBintRDF_1$$

$$\frac{d\,PBintRDF_1}{dt} = r_{2f}\,LRintRDF + syn_{2r}\,PBintRDF_2 - (r_{2r} + syn_{2f})\,PBintRDF_1$$

$$\frac{d\,LRint_1}{dt} = r_{1f}\,PBint + syn_{1r}\,LRint_2 - (r_{1r} + syn_{1f})\,LRint_1$$

$$\frac{d\,int}{dt} = -\gamma\,int \qquad\qquad \frac{d\,intRDF}{dt} = -\gamma\,intRDF$$

$$PB = PB_{tot} - PBint - PBintRDF_1 - PBintRDF_2 = \frac{PB_{tot} - PBintRDF_1}{1 + \dfrac{int^4}{\bar{K}_{b1}} + \dfrac{intRDF^4}{\bar{K}_{b4}}}$$

$$LR = D_{tot} - PB_{tot} - LRintRDF - LRint_1 - LRint_2 = \frac{D_{tot} - PB_{tot} - LRint_1}{1 + \dfrac{int^4}{\bar{K}_{b2}} + \dfrac{intRDF^4}{\bar{K}_{b3}}}$$
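The quasi-steady-state expressions for the free promoters can be sanity-checked numerically. The sketch below uses arbitrary parameter values (the dissociation constants Kb1–Kb4 and all concentrations are chosen only to exercise the algebra, not taken from Table A.1):

```python
# Numerical sketch of the quasi-steady-state (QSS) expressions for the free
# promoters PB and LR. All values are arbitrary; Kb1..Kb4 stand in for the
# effective dissociation constants of the fast binding reactions.

def free_promoters(PB_tot, D_tot, PBintRDF1, LRint1, int_, intRDF,
                   Kb1, Kb2, Kb3, Kb4):
    # Free PB after subtracting the fast-equilibrating bound forms.
    PB = (PB_tot - PBintRDF1) / (1 + int_**4 / Kb1 + intRDF**4 / Kb4)
    # Free LR: everything not in a PB configuration or a slow LR complex.
    LR = (D_tot - PB_tot - LRint1) / (1 + int_**4 / Kb2 + intRDF**4 / Kb3)
    return PB, LR

PB, LR = free_promoters(PB_tot=0.06, D_tot=0.10, PBintRDF1=0.01,
                        LRint1=0.005, int_=0.5, intRDF=0.3,
                        Kb1=1.0, Kb2=1.0, Kb3=1.0, Kb4=1.0)

# Consistency check: re-deriving the fast-bound species from PB recovers
# the conservation total PB_tot used above.
PBint = PB * 0.5**4 / 1.0
PBintRDF2 = PB * 0.3**4 / 1.0
assert abs(PB + PBint + PBintRDF2 + 0.01 - 0.06) < 1e-12
```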

Molecular Titration Species

DOX : Small molecule input.

rtTA : Weight species.

DOX-rtTA : Complex formed from DOX and rtTA; activator for Tet4C.

Tet4C : Mutant Tet protein and analyte with Tet6C.

DAPG : Small molecule reference.

PhlF : Constitutive activator.

DAPG-PhlF : Complex formed from DAPG and PhlF; activator for Tet6C.

Tet6C : Mutant Tet protein and analyte with Tet4C.

Tet4C-6C : Inactive complex formed from Tet4C and Tet6C.

int : Serine integrase (tetramer).

intRDF : Fusion protein (tetramer) of serine integrase and recombination directionality factor (RDF).

ODEs

$$\frac{d\,DOX}{dt} = f_1\,ind(t) - k_{on_1}\,DOX\cdot rtTA + k_{off_1}\,DOX\text{-}rtTA - \gamma\,DOX$$

$$\frac{d\,rtTA}{dt} = v_{max}\,PB - k_{on_1}\,DOX\cdot rtTA + k_{off_1}\,DOX\text{-}rtTA - \gamma\,rtTA$$

$$\frac{d\,DOX\text{-}rtTA}{dt} = k_{on_1}\,DOX\cdot rtTA - k_{off_1}\,DOX\text{-}rtTA - \gamma\,DOX\text{-}rtTA$$

$$\frac{d\,Tet4C}{dt} = v_{max}\,D_{tot}\,\frac{DOX\text{-}rtTA/K_{d1}}{DOX\text{-}rtTA/K_{d1} + 1} - k_{on_2}\,Tet4C\cdot Tet6C + k_{off_2}\,Tet4C\text{-}6C - \gamma\,Tet4C$$

$$\frac{d\,DAPG}{dt} = f_2\,ind(t) - k_{on_3}\,DAPG\cdot PhlF + k_{off_3}\,DAPG\text{-}PhlF - \gamma\,DAPG$$

$$\frac{d\,PhlF}{dt} = v_{max}\,D_{tot} - k_{on_3}\,DAPG\cdot PhlF + k_{off_3}\,DAPG\text{-}PhlF - \gamma\,PhlF$$

$$\frac{d\,DAPG\text{-}PhlF}{dt} = k_{on_3}\,DAPG\cdot PhlF - k_{off_3}\,DAPG\text{-}PhlF - \gamma\,DAPG\text{-}PhlF$$

$$\frac{d\,Tet6C}{dt} = v_{max}\,D_{tot}\,\frac{DAPG\text{-}PhlF/K_{d2}}{DAPG\text{-}PhlF/K_{d2} + 1} - k_{on_2}\,Tet4C\cdot Tet6C + k_{off_2}\,Tet4C\text{-}6C - \gamma\,Tet6C$$

$$\frac{d\,int}{dt} = v_{max}\,D_{tot}\,\frac{Tet4C/K_{d3}}{Tet4C/K_{d3} + 1} - \gamma\,int$$

$$\frac{d\,intRDF}{dt} = v_{max}\,D_{tot}\,\frac{Tet6C/K_{d4}}{Tet6C/K_{d4} + 1} - \gamma\,intRDF$$

where

$$ind(t) = 0.5\left(\tanh\frac{t - period\cdot\lfloor t/period\rfloor - t_{on}}{k_t} - \tanh\frac{t - period\cdot\lfloor t/period\rfloor - t_{off}}{k_t}\right)$$
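The periodic induction window ind(t) is easy to implement and inspect directly; a minimal sketch using the Table A.1 timing values (period 24 hr, t_on = 0 hr, t_off = 2 hr, k_t = 0.3 hr):

```python
import math

# Smoothed periodic induction window ind(t): approximately 1 for
# t_on < (t mod period) < t_off and approximately 0 otherwise, with the
# tanh steepness set by k_t. Timing values follow Table A.1.
def ind(t, period=24.0, t_on=0.0, t_off=2.0, k_t=0.3):
    phase = t - period * math.floor(t / period)
    return 0.5 * (math.tanh((phase - t_on) / k_t)
                  - math.tanh((phase - t_off) / k_t))

# The window is high mid-pulse and low between pulses, in every period.
mid_pulse = ind(1.0)      # inside the first 2-hour induction
between = ind(12.0)       # between inductions
next_pulse = ind(25.0)    # inside the second day's induction
```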

Parameter values for the integrase switch, given in Table A.1, were gathered from [77], and for the comparator, from [91].

3.2 A Stability Analysis for Population-Scale Learning

From the previous section, we've observed that the creation of molecular circuits which learn, though fraught with challenges, is a prospective enterprise. Biological in situ classifiers of this variety thus have a huge potential to impact personalized medicine, as they facilitate the development of therapies tailored not only to a specific disease, but also to a specific individual, their cells, and perhaps even their states as they evolve over time.

However, given the potential variance in context experienced by a cell, this creates reason to feel uneasy: environmental differences may produce a high generalization gap across a seemingly homogeneous population of cell classifiers, and with disastrous consequence. In particular, in cancer immunotherapy, consistent misclassification due to generalization error could give rise to a viral off-target response in patients and result in significant destruction of healthy tissue. For this reason, it's pertinent to study the sensitivity of learning engineered for living cells, both to compare alternative designs and to mitigate the risk of those selected. Because cell classification circuits must be delivered en masse to large cell cultures (typically 500,000 – 10,000,000 cells, simultaneously), the implementation and training process naturally lends itself to massive parallelization. In the final section, I examine how aggregating many such cell classifiers in an ensemble approach can increase classification stability compared to a single deterministic classifier, provided some estimated input variance.

3.2.1 Preliminaries

Stability of Widrow-Hoff under Perturbation. In this section, we focus our attention on the stability and convergence of learning parameters belonging to a perceptron in a noisy environment, which employs an update rule comparable to the Widrow-Hoff algorithm. In particular, we have:

$$W_{k+1} = W_k + \frac{\alpha(e_k + d_k)X_k}{|X_k|^2} \qquad (3.1)$$

$$e_k = y_k - X_k^\top W_k \qquad (3.2)$$

where:

k = time or cycle index,

W_k = [w_{1,k}, w_{2,k}, …, w_{n,k}], value of the weight vector at time k,

α = reduction factor at each cycle,

e_k = error at time k,

d_k = additive noise at time k,

X_k = [x_{1,k}, x_{2,k}, …, x_{n,k}], value of the input vector at time k,

y_k = desired output at time k.
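A single step of this update can be checked numerically. The sketch below (toy values throughout) verifies that, with d_k = 0 and the same input re-presented, the error contracts exactly by a factor of (1 − α):

```python
# One step of the noisy Widrow-Hoff update (Eqs. 3.1-3.2), written out
# directly with plain lists. With d = 0 and the same input re-presented,
# the error should contract exactly by (1 - alpha).

def wh_step(W, X, y, alpha, d=0.0):
    e = y - sum(x * w for x, w in zip(X, W))   # e_k = y_k - X_k^T W_k
    norm2 = sum(x * x for x in X)              # |X_k|^2
    W_next = [w + alpha * (e + d) * x / norm2 for w, x in zip(W, X)]
    return W_next, e

W = [0.0, 0.0]
X = [1.0, 2.0]
y = 3.0
alpha = 0.5

W1, e0 = wh_step(W, X, y, alpha)    # initial error e0 = 3.0
W2, e1 = wh_step(W1, X, y, alpha)   # e1 = (1 - alpha) * e0 = 1.5
```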

Widrow et al. have shown in [119] that, when d_k = 0, for each cycle:

$$\Delta e_k = \Delta(y_k - X_k^\top W_k) = -X_k^\top \Delta W_k = -X_k^\top (W_{k+1} - W_k) = -\frac{\alpha e_k X_k^\top X_k}{|X_k|^2} = -\alpha e_k$$

so that the algorithm converges for 0 < α < 2. However, for d_k ≠ 0, we have:

$$e_{k+1} = e_k - \alpha(e_k + d_k) = (1 - \alpha)e_k - \alpha d_k$$

which need not converge given a sequence

$$\{d_k\} : \nexists \lim_{k\to\infty} d_k$$

In [52], Hui and Zak propose a modified version of the Widrow-Hoff algorithm in which the reduction factor, α, is allowed to vary with the time index k as a new sequence {α_k}, resulting in the convergence of the weight vector even in the presence of noise. To summarize their results: provided 0 ≤ α_k < 1, k = 0, 1, 2, 3, …, satisfying $\prod_{k=1}^{\infty}(1 - \alpha_k) = \epsilon \in [0, 1)$, we can bound the maximum error, e_k, as:

$$\limsup_{k\to\infty} |e_k| \le \epsilon|e_0| + (1 - \epsilon)d, \quad \text{where } d = \sup_k |d_k|$$

Then, for ε > 0 or d = 0, the limit $\lim_{k\to\infty} W_k = W_\infty$ exists; i.e., the weight vector will converge with some bounded error reliant on the initial error, e_0, and maximal noise, d. The crux of the authors' proof lies in the construction of {α_k}:

$$\text{Choose } \{\gamma_k\} : 0 < \gamma_k < 1, \ \sum_{k=1}^{\infty}\gamma_k < \infty$$

$$\text{Set } p = \prod_{k=1}^{\infty}(1 - \gamma_k) > 0$$

$$\text{Set } \tau > 0 : p^{\tau} = \epsilon$$

$$\text{Set } \alpha_k = 1 - (1 - \gamma_k)^{\tau}$$

Then $0 \le \alpha_k < 1$ and $\prod_{k=1}^{\infty}(1 - \alpha_k) = \epsilon$. Moreover, from $e_{k+1} = e_k - \alpha_k(e_k + d_k)$, we can obtain:

$$e_{k+1} = \left(\prod_{j=0}^{k}(1 - \alpha_j)\right) e_0 - \sum_{l=0}^{k}\left(\prod_{j=l+1}^{k}(1 - \alpha_j)\right)\alpha_l d_l \;\Longrightarrow$$

$$\limsup_{k\to\infty}|e_{k+1}| \le \left(\prod_{k=1}^{\infty}(1 - \alpha_k)\right)|e_0| + \left(1 - \prod_{k=1}^{\infty}(1 - \alpha_k)\right)d \le \epsilon|e_0| + (1 - \epsilon)d$$
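The construction can be exercised numerically. The sketch below uses an illustrative choice γ_k = 2^(−(k+1)) with ε = 0.1 (any summable {γ_k} would do), and checks the terminal error against the bound ε|e_0| + (1 − ε)d:

```python
import math
import random

# Numerical check of the Hui-Zak construction: gamma_k = 2^-(k+1) gives a
# convergent product p = prod(1 - gamma_k) > 0; tau is chosen so p^tau = eps,
# and alpha_k = 1 - (1 - gamma_k)^tau then satisfies prod(1 - alpha_k) = eps.

eps = 0.1
N = 60
gammas = [2.0 ** -(k + 1) for k in range(N)]
p = math.prod(1.0 - g for g in gammas)
tau = math.log(eps) / math.log(p)
alphas = [1.0 - (1.0 - g) ** tau for g in gammas]

random.seed(0)
e = 5.0          # initial error e_0
d_max = 0.5      # noise bound d = sup |d_k|
for a in alphas:
    d_k = random.uniform(-d_max, d_max)
    e = (1.0 - a) * e - a * d_k   # e_{k+1} = (1 - alpha_k) e_k - alpha_k d_k

bound = eps * 5.0 + (1.0 - eps) * d_max   # = 0.95
assert abs(e) <= bound + 1e-9
```

Because the γ_k decay geometrically, the gains α_k vanish quickly and the error freezes within the guaranteed band after a few dozen steps.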

In the case of a single perceptron whose evaluation is susceptible to corruption by noise, these guaranteed error bounds and proof of convergence are adequate. However, for the context described in the introduction, they appear insufficient. As in Widrow and Lehr [119], the previous strategy only considers cases where the same complete input pattern X is presented in successive iterations, and none other is presented, i.e. X_1 = X_2 = … = X_k = X_{k+1}. Rather, we'd like to demonstrate stability in learning algorithms whose input is subject to variation as well. For example, for random subsets of a theoretical, complete input space X, it would be useful to provide stability bounds for the same algorithm as above. To do so, in the following section, we apply the framework of random hypothesis stability [36] to an ensemble of our noisy perceptrons.

Random Hypothesis Stability. A complete treatment of random hypothesis stability can be found in [36]. For conciseness, here we restate only the main assumptions and conclusions presented by the authors. Random hypothesis stability generalizes the notion of hypothesis stability for deterministic algorithms by Bousquet et al. to study the performance of algorithms with randomization, typically w.r.t. their parameters or inputs [13]. In particular, the authors discard the assumption of symmetry and address asymmetric learning algorithms in which permutations of the training data may lead to different outcomes due to random sampling. Random hypothesis stability is defined comparably to Bousquet et al., such that a randomized algorithm f has random hypothesis stability β_m w.r.t. the loss function l if:

$$\forall i \in \{1, \ldots, m\}, \quad \mathbb{E}_{D,z,r}\left[\left|l(f_{D,r}, z) - l(f_{D\setminus i,r}, z)\right|\right] \le \beta_m$$

for training set D and z ∈ 𝒵 = 𝒳 × 𝒴. It's assumed that the randomness of an algorithm, r = (r_1, …, r_T) (where r_t, t = 1, …, T are random elements drawn independently from the same distribution ℛ), is independent of the training set D, and that the same r can be applied to f_D and f_{D∖i}, where D ∖ i is the set D with point i removed.

It follows that if we let f_{D,r} be the outcome of a random algorithm with random hypothesis stability β_m w.r.t. the loss function l, bounded 0 ≤ l(f, z) ≤ M for all y ∈ 𝒴, r ∈ ℛ, and D, then for an unsampled training set of size m, with probability 1 − δ:

$$|R(f_{D,r}) - R_{emp}(f_{D,r})| \le \sqrt{\frac{2M^2 + 12Mm\beta_m}{\delta m}}$$
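For concrete values of M, m, β_m, and δ, the bound is a one-line computation; a small helper (all values below are illustrative, not drawn from the thesis experiments):

```python
import math

# Evaluates the random-hypothesis-stability generalization bound
# |R - R_emp| <= sqrt((2 M^2 + 12 M m beta_m) / (delta m)).
def generalization_gap_bound(M, m, beta_m, delta):
    return math.sqrt((2 * M**2 + 12 * M * m * beta_m) / (delta * m))

# The bound tightens as the stability constant beta_m shrinks.
loose = generalization_gap_bound(M=1.0, m=1000, beta_m=1e-2, delta=0.05)
tight = generalization_gap_bound(M=1.0, m=1000, beta_m=1e-4, delta=0.05)
```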

Ensemble Methods. Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance or bias, or to improve predictions. Of these, parallel ensemble methods – methods where the models are generated in parallel – exploit independence between these "learners" to reduce error by averaging. When an ensemble method uses a single base learning algorithm to produce a homogeneous set of predictors, i.e. functions of the same type, it is called a homogeneous ensemble [87]. Here we examine a homogeneous ensemble of perceptron-like classification algorithms. Moreover, we interpret r as being used to randomly sub-sample the complete input distribution, yielding 𝒟 and a noise factor d_{t,D} dependent on D and the index of the algorithm, t. Building on our previous definition, we now have:

$$F_{D,r} = \frac{1}{T}\sum_{t=1}^{T}\left(f_{D,r_t} + d_{t,D}\right)$$

3.2.2 Stability Analysis of a Perceptron Ensemble

Stability of the Perceptron Ensemble with Noise. In the following section, we compute generalization bounds for our perceptron ensemble based on the results previously discussed. For fixed D and z, i.i.d. r_t ∈ ℛ, and the l_1 loss function, we define:

$$I(D,z) = \mathbb{E}_{r_1,\ldots,r_T}\left[\left|l(F_{D,r}) - l(F_{D\setminus i,r})\right|\right]$$

$$= \mathbb{E}_{r_1,\ldots,r_T}\left[\left|l\Big(\frac{1}{T}\sum_{t=1}^{T}(f_{D,r_t} + d_{t,D})\Big) - l\Big(\frac{1}{T}\sum_{t=1}^{T}(f_{D\setminus i,r_t} + d_{t,D\setminus i})\Big)\right|\right]$$

$$\le \frac{1}{T}\,\mathbb{E}_{r_1,\ldots,r_T}\left[\left|\sum_{t=1}^{T}\left(f_{D,r_t} - f_{D\setminus i,r_t} + d_{t,\Delta D}\right)\right|\right]$$

where $d_{t,\Delta D}$ is the differential noise for sets D and D ∖ i, generated by r_t with and without point i. Because r_1, …, r_T are i.i.d. random variables, we have:

$$I(D,z) \le \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{r_t}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t} + d_{t,\Delta D}\right|\right]$$

$$\le \mathbb{E}_{r_t}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t} + d_{t,\Delta D}\right|\right]$$

$$\le \mathbb{E}_{r_t}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t} + d_{t,\Delta D}\right|\left(\mathbb{1}_{i\in r} + \mathbb{1}_{i\notin r}\right)\right]$$

$$\le \mathbb{E}_{r_t}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t}\right|\mathbb{1}_{i\in r}\right] + \mathbb{E}_{r_t}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t}\right|\mathbb{1}_{i\notin r}\right] + \mathbb{E}_{r_t}\left[\left|d_{t,\Delta D}\right|\mathbb{1}_{i\in r}\right] + \mathbb{E}_{r_t}\left[\left|d_{t,\Delta D}\right|\mathbb{1}_{i\notin r}\right]$$

Noticing that f_{D,r_t} = f_{D∖i,r_t} when i ∉ r, we then eliminate the terms whose indicator functions include that condition. Taking the expectation w.r.t. D and z yields:

$$\beta_m = \mathbb{E}_{D,z}\left[I(D,z)\right]$$

$$\le \mathbb{E}_{D,z,r_t}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t}\right|\mathbb{1}_{i\in r}\right] + \mathbb{E}_{D,z,r_t}\left[\left|d_{t,\Delta D}\right|\mathbb{1}_{i\in r}\right]$$

$$\le \mathbb{E}_{r_t}\left[\mathbb{E}_{D,z}\left[\left|f_{D,r_t} - f_{D\setminus i,r_t}\right|\right]\mathbb{1}_{i\in r}\right] + \mathbb{E}_{D,z,r_t}\left[\left|d_{t,\Delta D}\right|\mathbb{1}_{i\in r}\right]$$

$$\le \mathbb{E}_{r_t}\left[\gamma_{|D|}\,\mathbb{1}_{i\in r}\right] + \mathbb{E}_{D,z,r_t}\left[\left|d_{t,\Delta D}\right|\mathbb{1}_{i\in r}\right]$$

Here, γ_{|D|} is the deterministic leave-one-out stability for f with a training set of |D| unique elements [13]. We rewrite the first term as:

$$\mathbb{E}_{r_t}\left[\gamma_{|D|}\,\mathbb{1}_{i\in r}\right] = \sum_{k=1}^{m}\mathbb{P}\left(|D| = k\right)\gamma_k\,\mathbb{E}_{r_t}\left[\mathbb{1}_{i\in r} : |D| = k\right]$$

$$= \sum_{k=1}^{m}\mathbb{P}\left(|D| = k\right)\gamma_k\,\frac{k}{m}$$

$$= \gamma_p\,\frac{p}{m}, \quad \text{if } D \text{ is generated for } p \text{ points by } r \text{ without replacement}$$
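The combinatorial step here, that a fixed point i lands in a random p-subset of m training points with probability p/m, can be sanity-checked by Monte Carlo (the sizes below are arbitrary):

```python
import random

# Monte Carlo check that a fixed point i appears in a random p-subset of m
# training points (sampled without replacement) with probability p / m.
random.seed(1)
m, p, i = 20, 5, 7
trials = 200_000
hits = sum(i in random.sample(range(m), p) for _ in range(trials))
estimate = hits / trials      # should be close to p / m = 0.25
```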

Finally, we rename our noise term N and substitute the bound computed for β_m into the error bound provided above, yielding:

$$|R(f_{D,r}) - R_{emp}(f_{D,r})| \le \sqrt{\frac{2M^2 + 12Mm\beta_m}{\delta m}} \le \sqrt{\frac{2M^2 + 12M\left(\gamma_p\,p + Nm\right)}{\delta m}}$$

Example Calculation with Biologically-Realistic Values. We recognize that in practice, p, m, and γ_p are unknown to an experimenter and must be estimated. Previously, single-cell measurement protocols like scRNA-seq have been used to characterize gene expression variance across a sample cell population comparable to the target population [64, 62]. From multiple scRNA-seq measurements, T, p, and m can be measured directly with a calculated variance. For this demonstration, however, in lieu of scRNA-seq data and to simplify calculation, we assign toy values to each parameter. We have:

M = 100 μM, the dynamic range of the output molecule,

p = 1440, a training period of 24 hours with a change in cell state each minute,

m = 2160 = 1.5p,

δ = 0.05, i.e. 95% confidence,

which yields:

$$|R(f_{D,r}) - R_{emp}(f_{D,r})| \le \sqrt{\frac{2M^2 + 12M(\gamma_p\,p + Nm)}{\delta m}} \le \sqrt{\frac{20000 + 1200(1440\,\gamma_p + 2160\,N)}{0.95 \cdot 2160}} \approx \sqrt{9.75 + 842.1\,\gamma_p + 1263.16\,N}$$

If we assume that the expected noise, N, is five or more orders of magnitude lower than the range of the input signal (i.e. N ≈ 0.001), we can relate the deterministic stability, γ_p, to our bound, γ_r:

$$\gamma_p \approx 0.0012\,\gamma_r^2 - 0.013$$

Chapter 4

Conclusion

The ability to process biomolecular information is a hallmark of living systems; understanding and precise control of this ability is critical to better engineering the behavior of biological systems. Synthetic biology seeks to collect, refine, and repackage these processes so that it's easier to design new and reliable biological systems which behave as predictably and modularly as would integrated circuits in a computer. Its most optimistic proponents seek to apply these principles to human health, where challenges evolve over space and time. Diseases like cancer adapt to evade detection, while organ development requires a complicated coordination of growth factors and external inputs.

These are challenges with complex, multi-step solutions, often where the link between the design and result is unobservable or at least hard to predict. Their solutions must efficiently integrate multiple contextual signals while simultaneously providing a versatile substrate for sophisticated computation from which those signals may be decoded and acted upon. Moreover, all of the circuitry required for intelligent decision-making must 1) operate autonomously and 2) fit within the confines of a living cell. Clearly, this isn't an easy problem.

You can recap about a decade of progress in synthetic biology with the question: "Can a cell act like a computer?" In other words, can we use molecular interactions in cells like we use transistors in digital electronics? The short answer is yes, and most research in synthetic biology today exists in its wake. It's the answer that marked synthetic biology as the epicenter of programming life. But since it was first asked, synthetic biologists have created new questions, like: can our circuits adapt or intelligently account for unknowns? These questions push the boundaries of binary logic, and require new types of cellular computing to answer.

Today, a more interesting question is: "Can a cell act like a brain?" Specifically, can our synthetic decision-making circuits exhibit some of the properties we value in neural systems, like resource efficiency, robustness to damage, and learning behaviors? As the intelligence of smarter, adaptive cell therapies, neuromorphic cell circuits would offer a new learning platform to autonomously explore these challenging applications in live cells and in real time. In this thesis, I've described the implementation of in-vivo neuromorphic circuits in human cell culture models as a proof-of-concept for their application to personalized medicine.

Delivering on the premise of neuromorphic computing requires two pieces: first, a general-purpose unit that we can replicate and join together to represent a very large pool of possible computations. It's a pool that might contain, for instance, disease classifiers or programs for organ development. Which computation we actually perform is decided by the second piece, a method for learning the parameters which define the computation we actually want. In my thesis, I've contributed toward realizing these components in mammalian cells. Specifically, I've developed:

1. Several genetic programs which encode artificial neuron computations, the fundamental unit of a neuromorphic circuit. This part I've worked on primarily through wet-lab research conducted in the Weiss Lab facilities.

2. Molecular learning circuits that would allow the unit to optimize itself autonomously. This I've worked on through biologically-realistic models and simulations, and analyzed formally using methods from statistical machine learning.

I believe these two advancements create a foundation for further study of neuromorphic circuits in mammalian cells, and a proof-of-concept for their application to personalized medicine. As traditional methods fall short, I believe these steps toward cellular neuromorphic computing will help mammalian synthetic biologists grapple with complex challenges like autonomous disease classification and organ differentiation. Simultaneously, the results included lay groundwork to analyze the role of machine learning in medicine, where its difficult interpretability contradicts the need to guarantee stable, safe, and efficacious therapies. I also expect neuromorphic computing to impact the ways synthetic biologists currently design. Most synthetic biologists at some point quote Richard Feynman as an inspiration for their approach to biology: "What I cannot create, I do not understand." [108]

From the beginning, knowing and making have been critically intertwined in the philosophy driving synthetic biologists. Behind clean, published results are messy cycles of design, building, and testing that help us better understand the systems we create. It's easy to think that surrendering this iterative, hands-on process to an autonomous neuromorphic circuit would kill that philosophy: in synthetic biology, we normally create genetic blueprints for the cell which describe how to make molecules that then process information. Any refinements to those blueprints, we make ourselves based on what we know and what we observe. In cellular learning, we transfer most of that design cycle directly into the cells themselves. Practically, it involves creating a genetic blueprint that describes first how to process information, and then how to use that information to modify the blueprint without our direct intervention. But biologists close to my project have actually more often endorsed a different scenario. As researchers, letting our designs figure out topologies and conditions that work gives us more and better material to learn from, probe, and interact with, especially where the natural precedents are too entangled with everything else going on in a cell. Under this perspective, neuromorphic circuits have the potential to expand our view on biological design principles and even help create new ones, and I find that pretty cool.

I want to conclude by reflecting on a question that comes up whenever I explain to a biologist that I have a background in architecture: "Why does a designer care about any of this?" In the SMArchS Computation Group, we all have at least some predilection for trying to understand the world through the lens of computation. We look at design and fabrication and construction and materials, and try to distill the reasons why they can or can't be programmed. Now that we're shifting toward programming life, where a lot of our ideas originated, I feel there's room in that conversation for us to contribute back: when someone in my lab mentions programming morphogenesis or design automation, I always have something to say, and it's coming from architecture. It's those parallels between fields that keep me interested and make me confident that designers have a place in synthetic biology.

Appendix A

Tables

Table A.1: Parameter values for the molecular learning circuit

Parameter Name   Value           Parameter Name   Value
b_1f             10.0^4          b_1r             10.0
b_2f             10.0^3          b_2r             200.0
b_3f             10.0^4          b_3r             10.0
b_4f             10.0^3          b_4r             200.0
r_1f             6.0             r_1r             2.14
r_2f             6.0             r_2r             3.0
syn_1f           0.006           syn_1r           0.017
syn_2f           0.06            syn_2r           0.12
v_max            50 μM hr^-1     γ                0.5 hr^-1
k_on,i           5.0             k_off,i          0.50
K_d,i            10.0            D_tot            0.10 μM
period           24 hr           k_t              0.3 hr
t_on             0 hr            t_off            2 hr

Appendix B

Figures

Figure B-1: Personalized medicine increases in therapeutic precision from the group level to single-cell 'living drugs.'

Figure B-2: Neuromorphic computing fits within the 'Sense-Compute-Effect' pipeline for engineered-cell therapies.

Figure B-3: A general schematic for the operation of neuromorphic gene circuits in mammalian cells.

Figure B-4: Circuit design schematic for a three-input artificial neuron computation using transcriptional regulation. Log-scale addition is performed by transcriptional regulation of the hybrid promoter P-Gal4-TetR-PhlfR by the repressors TetR and PhlF and the activator Gal4-VP16. A constitutively-expressed BFP fluorescent marker is used to retrieve the relative copy number of all circuit components. A pictorial representation of the computation performed is displayed in the upper-right corner.

Figure B-5: A general functional schematic for the computation performed by a hybrid promoter. Log-scale addition and subtraction of transcriptional regulators is performed by multiplying the 'positive' contribution of unique activators with the basal expression rate v0 and dividing by the 'negative' contribution of repressors. Green lines correspond to a positive contribution and red lines to a negative contribution. Black dashed lines indicate the constraint that rates of expression by multiple activation must equal the sum of the activators' individual rates of expression. Green dashed lines indicate a constrained value in the computation.

Figure B-6: Steady-state fluorescence measurements for a three-input hybrid promoter circuit with the corresponding log-scale reference computation. Values plotted are the geometric mean calculated for total cell events > 1e4 for each combination of three inputs shown. We can observe similarity between the expected (coarse-grained) behavior of a log-scale artificial neuron, displayed in the upper-right corner, and the output behavior, shown as a contoured surface with mean values (geometric mean calculated for approximately 1e4 cells per data point) displayed in the adjacent graph.

Figure B-7: Reaction diagrams for molecular binding and sequestration. The conditions given describe the reaction rate constant requirements for molecular sequestration of A by B to take place. If the requirements are met, the top reaction diagram resembles the bottom reaction diagram with respect to the active species A.

Figure B-8: Pictorial depiction of molecular sequestration as an artificial neuron. A summation is performed on the reaction fluxes of 'positive' species, α1, …, αp, and the reaction fluxes of 'negative' species, β1, …, βn, where the sum of β fluxes yields a linear threshold for the sum of α fluxes. The equivalent artificial neuron diagram is given in the upper-right corner.

Figure B-9: Schematic for the use of transcription control to regulate the input of species to molecular sequestration. A summation is performed on the reaction fluxes given by DNA concentration. Weights are specified according to relative copy number, rates of constitutive transcription and translation, and rates of degradation of mRNA and protein. The function f denotes a non-linear activation function which is, in the least restrained case, a ReLU. The equivalent artificial neuron diagram is given in the upper-right corner.

Figure B-10: Transcriptional regulation may be folded into the sequestration framework as a non-linear feature map, φ(B_i, w_i), on input fluxes, B_i, induced by TFs. The example given displays B_i as an activator, wherein the weight w_i now influences the maximal value of expression achievable by the input φ(B_i, w_i).

Figure B-11: Schematic and experimental data displaying thresholding of mRNA by the CasE endoRNase through molecular sequestration. The left figure depicts the schematic for molecular sequestration of mRNA by CasE, where input concentrations of eYFP and CasE plasmid are measured by the poly-transfection markers BFP and iRFP, respectively. The right figure displays the thresholding phenomenon, wherein each dashed line indicates a fixed input concentration of CasE.

Figure B-12: Circuit diagram for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). Molecular sequestration of mRNA by CasE is depicted, where input concentrations of eYFP plasmids are measured by the poly-transfection markers BFP and iRFP, and the input concentration of the CasE plasmid is measured by the poly-transfection marker mKO2. A simplified functional schematic is depicted in the upper-right corner.

Figure B-13: Steady-state fluorescence measurements and the corresponding log-scale and linear-scale reference computations for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of three inputs shown. Though the operations performed are linear-scale additions, the fluorescence measurements collected after poly-transfection are uniformly distributed across each dimension in log-scale. Accordingly, experimental data (large figure) is presented using log-scale axes and a linear-scale reference is provided in the adjacent labeled graph. For more information on protocols for transfection and flow cytometry, see Methods.

Figure B-14: Steady-state fluorescence measurements and cross-sectional ReLU plots for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of three inputs shown. ReLU cross-sections are plotted for each input when the other input is minimal (e.g. X_1 plotted for X_2 = 10^-1). The threshold is marked with a dashed gray line.
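These cross-sections follow from the idealized neuron model in which the two positive inputs add and the negative (CasE) input sets the threshold, out = max(0, x1 + x2 − x3). A sketch (units and the tight-binding limit are simplifying assumptions):

```python
import numpy as np

def neuron(x1, x2, x3):
    """Idealized steady state of the three-input neuron: two
    positive inputs summed, one negative input subtracted, then
    rectified by sequestration in the tight-binding limit."""
    return np.maximum(0.0, x1 + x2 - x3)

# Cross-section of x1 with x2 held minimal (x2 = 0.1) and the
# threshold input fixed at x3 = 1, as in the caption's X_2 = 10^-1:
x1 = np.array([0.0, 0.5, 0.9, 1.5, 3.0])
print(neuron(x1, 0.1, 1.0))  # flat at zero until x1 + 0.1 exceeds 1
```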

Figure B-15: Steady-state fluorescence measurements for increasing threshold values for a three-input sequestration-based artificial neuron with two positive inputs (eYFP) and one negative input (CasE). Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of three inputs shown.

Figure B-16: Circuit diagram for a three-input sequestration-based artificial neuron with one positive input (eYFP) and two negative inputs (CasE). Molecular sequestration of mRNA by CasE is depicted, where the input concentration of the eYFP plasmid is measured by the poly-transfection marker BFP, and input concentrations of the CasE plasmids are measured by the poly-transfection markers iRFP and mKO2. A simplified functional schematic is depicted in the upper-right corner.

Figure B-17: Steady-state fluorescence measurements and the corresponding log-scale and linear-scale reference computations for a three-input sequestration-based artificial neuron with one positive input (eYFP) and two negative inputs (CasE). Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of three inputs. Experimental data (large figure) is presented using log-scale axes and a linear-scale reference is provided in the adjacent labeled graph.

Figure B-18: Steady-state fluorescence measurements for increasing threshold values for a three-input sequestration-based artificial neuron with one positive input (eYFP) and two negative inputs (CasE). Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of three inputs.

Figure B-19: Steady-state fluorescence measurements and the corresponding linear-scale reference computations for differently weighted variants of a three-input sequestration-based artificial neuron. Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of three inputs. Weights are adjusted by decreasing the relative rate of expression for one of the positive inputs while keeping the relative concentration of its transfection marker unchanged. We can observe an increase in the intercept of x_2, whose weight was modulated by three log-orders of magnitude, decreasing from left to right. The corresponding linear-scale reference computations, annotated with their changes in slope, are displayed below.
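In this idealization, modulating a weight by changing an input's relative expression rate, while its transfection-marker axis stays fixed, amounts to scaling that input before the sum. A sketch with a hypothetical scale factor `w2` standing in for the relative expression rate:

```python
import numpy as np

def weighted_neuron(x1, x2, x3, w2):
    """w2 stands in for the relative expression rate of the x2 input
    unit; the measured x2 axis (its transfection marker) is unchanged,
    so only the slope of the response in x2 moves."""
    return np.maximum(0.0, x1 + w2 * x2 - x3)

x2 = np.array([1.0, 10.0, 100.0])
for w2 in (1.0, 0.1, 0.01):  # three log-orders, as in the figure
    print(w2, weighted_neuron(0.0, x2, 0.5, w2))
```

On log-scale axes a linear-scale slope change of this kind appears as a shift in the curve's intercept, consistent with the caption's observation for x_2.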

Figure B-20: Circuit design for a two-layer sequestration-based ANN with two nodes, which computes an analog version of the XOR logical function.
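An analog XOR needs only two rectifying nodes in the textbook construction, out = ReLU(x1 − x2) + ReLU(x2 − x1) = |x1 − x2|. This sketch shows that mathematical structure, not necessarily the exact wiring of the circuit in the figure:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def analog_xor(x1, x2):
    """Two ReLU nodes, each rectifying one sign of the input
    difference; their sum is |x1 - x2|, an analog XOR that is high
    only when the inputs disagree."""
    return relu(x1 - x2) + relu(x2 - x1)

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(int(a), int(b), analog_xor(a, b))  # high only when inputs differ
```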

Figure B-21: Steady-state fluorescence measurements for a two-layer, two-node sequestration-based ANN computing XOR, plotted for four increasing offset values. Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of inputs.

Figure B-22: Circuit design for a two-layer sequestration-based ANN with three nodes, which computes an analog version of the XNOR logical function.

Figure B-23: Steady-state fluorescence measurements for a two-layer, three-node sequestration-based ANN computing XNOR, plotted for four increasing offset values. Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of inputs.

Figure B-24: Circuit design schematic for a three-input ANN using multi-modal transcriptional and post-transcriptional regulation. Log-scale addition is performed by transcriptional regulation of the hybrid promoter P-Gal4-TetR by the repressor TetR and activator Gal4-VP16. Linear-scale addition is performed by sequestration of eYFP mRNA by CasE.
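The two regulatory modes compose naturally: transcriptional fold-changes multiply (which is addition in log-scale), and sequestration subtracts (addition in linear scale). A toy composition of the two stages, with illustrative fold-change values rather than measured parameters:

```python
def transcriptional_stage(act_fold, rep_fold):
    """Log-scale addition: fold-activation multiplies and fold-
    repression divides expression from the hybrid promoter, so
    log(out) = log(act_fold) - log(rep_fold)."""
    return act_fold / rep_fold

def sequestration_stage(mrna, endornase):
    """Linear-scale subtraction with thresholding, the tight-binding
    limit of mRNA sequestration by the endoRNase."""
    return max(0.0, mrna - endornase)

# Gal4-VP16-like activation and TetR-like repression set the mRNA
# level in the log domain; a CasE-like stage then subtracts and
# rectifies that level in the linear domain.
mrna = transcriptional_stage(act_fold=8.0, rep_fold=2.0)   # 4.0
print(sequestration_stage(mrna, endornase=1.5))            # 2.5
```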

Figure B-25: Steady-state fluorescence measurements for A) a two-layer, two-node multi-modal ANN compared to B) unimodal (post-transcriptional) regulation. Values plotted are the geometric mean calculated for total cell events > 1e2 for each combination of inputs.

Figure B-26: Simplified circuit schematic for achieving molecular learning in cells.

Figure B-27: A circuit design for stoichiometry tuning in mammalian cells.

Figure B-28: Serine integrase and serine integrase-RDF fusion proteins catalyze the inversion of DNA regions flanked by attP/attB and attL/attR target sites, respectively, in four stages.

Figure B-29: The transfer of species within one update to the active rtTA promoter sites, from the introduction of int and intRDF to equilibrium.

Figure B-30: When both recombinase enzymes are introduced to the system simultaneously and allowed to degrade, the action on the attL/attR and attP/attB sites is driven by the proportional relationship of the enzymes.

Figure B-31: The effect of asynchrony between int and intRDF input signals on the update to active rtTA promoter sites.

Figure B-32: The behavior of the comparator module across titrated levels of active attP/attB promoter sites.

Figure B-33: Imbalances in the production of the Tet mutants and integrase enzymes upset the relationship between the error and update minima.

Figure B-34: Differences in the production time for the titrant and sequesterer species disrupt the error inferred by the system, creating an update even when the number of rtTA sites is already at the optimum value.

Figure B-35: Two optimization runs consisting of 2-hour inductions of DAPG and DOX every 24 hours for 10 days. The right simulation begins below the optimum number of active rtTA sites and must increase its value; the left begins above it and must decrease its active sites.
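The qualitative behavior of these runs, convergence to the optimum from either side under a pulsed induction schedule, can be caricatured with a proportional-error update (the gain and cycle count here are invented for illustration, not the thesis's model parameters):

```python
def run(sites, optimum, cycles=10, gain=0.5):
    """Caricature of the pulsed optimization: each induction cycle
    moves the active rtTA site count a fixed fraction of the way
    toward the optimum, regardless of the starting side."""
    history = [sites]
    for _ in range(cycles):
        sites += gain * (optimum - sites)  # signed error drives the update
        history.append(sites)
    return history

below = run(10.0, 50.0)  # starts below the optimum and rises
above = run(90.0, 50.0)  # starts above the optimum and falls
print(round(below[-1], 2), round(above[-1], 2))
```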

Bibliography

[1] Tom Alber. Structure of the leucine zipper. Current opinion in genetics & development, 2(2):205–210, 1992.

[2] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J.D. Watson. Molecular Biology of the Cell. Garland, 4th edition, 2002.

[3] Daniel A Anderson, Ross D Jones, Adam P Arkin, and Ron Weiss. Principles of synthetic biology: a mooc for an emerging field. Synthetic Biology, 4(1):ysz010, 2019.

[4] Euan A Ashley. Towards precision medicine. Nature Reviews Genetics, 17(9):507, 2016.

[5] James P Balhoff and Gregory A Wray. Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites. Proceedings of the National Academy of Sciences, 102(24):8591–8596, 2005.

[6] Wolfgang Balzer and CM Dawe. Structure and comparison of genetic theories:(2) the reduction of character-factor genetics to molecular genetics. The British journal for the philosophy of science, 37(2):177–191, 1986.

[7] Peter Banda, Christof Teuscher, and Matthew R Lakin. Online learning in a chemical perceptron. Artificial life, 19(2):195–219, 2013.

[8] Yaakov Benenson, Binyamin Gil, Uri Ben-Dor, Rivka Adar, and Ehud Shapiro. An autonomous molecular computer for logical control of gene expression. Nature, 429(6990):423, 2004.

[9] Yaakov Benenson, Tamar Paz-Elizur, Rivka Adar, Ehud Keinan, Zvi Livneh, and Ehud Shapiro. Programmable and autonomous computing machine made of biomolecules. Nature, 414(6862):430, 2001.

[10] Lacramioara Bintu, Nicolas E Buchler, Hernan G Garcia, Ulrich Gerland, Terence Hwa, Jané Kondev, and Rob Phillips. Transcriptional regulation by the numbers: models. Current opinion in genetics & development, 15(2):116–124, 2005.

[11] Franco Blanchini and Elisa Franco. Structural analysis of biological networks. In A Systems Theoretic Approach to Systems and Synthetic Biology I: Models and System Characterizations, pages 47–71. Springer, 2014.

[12] Drew Blount, Peter Banda, Christof Teuscher, and Darko Stefanovic. Feedforward chemical neural network: An in silico chemical system that learns xor. Artificial life, 23(3):295–317, 2017.

[13] Olivier Bousquet and André Elisseeff. Stability and generalization. Journal of machine learning research, 2(Mar):499–526, 2002.

[14] Renier J Brentjens, Marco L Davila, Isabelle Riviere, Jae Park, Xiuyan Wang, Lindsay G Cowell, Shirley Bartido, Jolanta Stefanski, Clare Taylor, Malgorzata Olszewska, et al. Cd19-targeted t cells rapidly induce molecular remissions in adults with chemotherapy-refractory acute lymphoblastic leukemia. Science translational medicine, 5(177):177ra38–177ra38, 2013.

[15] Ingo Brigandt and Alan Love. Reductionism in biology. 2008.

[16] Nicolas E Buchler and Frederick R Cross. Protein sequestration generates a flexible ultrasensitive response in a genetic network. Molecular systems biology, 5(1):272, 2009.

[17] Nicolas E Buchler, Ulrich Gerland, and Terence Hwa. On schemes of combinatorial transcription logic. Proceedings of the National Academy of Sciences, 100(9):5136–5141, 2003.

[18] Volker Busskamp, Nathan E Lewis, Patrick Guye, Alex HM Ng, Seth L Shipman, Susan M Byrne, Neville E Sanjana, Jernej Murn, Yinqing Li, Shangzhong Li, et al. Rapid neurogenesis through transcriptional activation in human stem cells. Molecular systems biology, 10(11), 2014.

[19] Stefano Cardinale and Adam Paul Arkin. Contextualizing context for synthetic biology–identifying causes of failure of synthetic biological systems. Biotechnology journal, 7(7):856–866, 2012.

[20] Marina Cavazzana-Calvo, Emmanuel Payen, Olivier Negre, Gary Wang, Kathleen Hehir, Floriane Fusil, Julian Down, Maria Denaro, Troy Brady, Karen Westerman, et al. Transfusion independence and hmga2 activation after gene therapy of human β-thalassaemia. Nature, 467(7313):318, 2010.

[21] Patrick Chames, Marc Van Regenmortel, Etienne Weiss, and Daniel Baty. Therapeutic antibodies: successes, limitations and hopes for the future. British journal of pharmacology, 157(2):220–233, 2009.

[22] James Chappell, Alexandra Westbrook, Matthew Verosloff, and Julius B Lucks. Computational design of small transcription activating rnas for versatile and dynamic gene regulation. Nature communications, 8(1):1051, 2017.

[23] Ravi VJ Chari, Michael L Miller, and Wayne C Widdison. Antibody–drug conjugates: an emerging concept in cancer therapy. Angewandte Chemie International Edition, 53(15):3796–3827, 2014.

[24] Kevin M Cherry and Lulu Qian. Scaling up molecular pattern recognition with dna-based winner-take-all neural networks. Nature, 559(7714):370, 2018.

[25] William CS Cho. Micrornas: potential biomarkers for cancer diagnosis, prognosis and targets for therapy. The international journal of biochemistry & cell biology, 42(8):1273–1281, 2010.

[26] George M Church, Michael B Elowitz, Christina D Smolke, Christopher A Voigt, and Ron Weiss. Realizing the potential of synthetic biology. Nature Reviews Molecular Cell Biology, 15(4):289, 2014.

[27] Francis S Collins and Harold Varmus. A new initiative on precision medicine. New England journal of medicine, 372(9):793–795, 2015.

[28] Christian Cuba Samaniego, Giulia Giordano, Jongmin Kim, Franco Blanchini, and Elisa Franco. Molecular titration promotes oscillations and bistability in minimal network models with monomeric regulators. ACS synthetic biology, 5(4):321–333, 2016.

[29] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.

[30] Ramiz Daniel, Jacob R Rubens, Rahul Sarpeshkar, and Timothy K Lu. Synthetic analog computation in living cells. Nature, 497(7451):619, 2013.

[31] Leandro Nunes De Castro. Fundamentals of natural computing: basic concepts, algorithms, and applications. Chapman and Hall/CRC, 2006.

[32] Domitilla Del Vecchio and Richard M Murray. Biomolecular feedback systems. Princeton University Press Princeton, NJ, 2015.

[33] Breanna DiAndreth, Noreen Wauford, Eileen Hu, Sebastian Palacios, and Ron Weiss. Persist: A programmable rna regulation platform using crispr endornases. bioRxiv, 2019.

[34] Hengjiang Dong, Lars Nilsson, and Charles G Kurland. Gratuitous overexpression of genes in escherichia coli leads to growth inhibition and ribosome destruction. Journal of bacteriology, 177(6):1497–1504, 1995.

[35] Xavier Duportet, Liliana Wroblewska, Patrick Guye, Yinqing Li, Justin Eyquem, Julianne Rieders, Tharathorn Rimchala, Gregory Batt, and Ron Weiss. A platform for rapid prototyping of synthetic gene networks in mammalian cells. Nucleic acids research, 42(21):13440–13451, 2014.

[36] Andre Elisseeff, Theodoros Evgeniou, and Massimiliano Pontil. Stability of randomized learning algorithms. Journal of Machine Learning Research, 6(Jan):55–79, 2005.

[37] Michael B Elowitz and Stanislas Leibler. A synthetic oscillatory network of transcriptional regulators. Nature, 403(6767):335, 2000.

[38] Dario Floreano and Claudio Mattiussi. Bio-inspired artificial intelligence: theories, methods, and technologies. MIT press, 2008.

[39] Sarah Franklin, Celia Lury, and Jackie Stacey. Global nature, global culture. Sage, 2000.

[40] Jeremy J Gam, Jonathan Babb, and Ron Weiss. A mixed antagonistic/synergistic mirna repression model enables accurate predictions of multi-input mirna sensor activity. Nature communications, 9(1):2430, 2018.

[41] Jeremy J Gam, Breanna DiAndreth, Ross D Jones, Jin Huh, and Ron Weiss. A ‘poly-transfection’ method for rapid, one-pot characterization and optimization of genetic systems. Nucleic acids research, 47(18):e106–e106, 2019.

[42] Timothy S Gardner, Charles R Cantor, and James J Collins. Construction of a genetic toggle switch in escherichia coli. Nature, 403(6767):339, 2000.

[43] Michele S Garfinkel, Drew Endy, Gerald L Epstein, and Robert M Friedman. Synthetic genomics: options for governance. Industrial Biotechnology, 3(4):333–365, 2007.

[44] Daniel G Gibson, John I Glass, Carole Lartigue, Vladimir N Noskov, Ray-Yuan Chuang, Mikkel A Algire, Gwynedd A Benders, Michael G Montague, Li Ma, Monzia M Moodie, et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science, 329(5987):52–56, 2010.

[45] Biomarkers Definitions Working Group, Arthur J Atkinson Jr, Wayne A Colburn, Victor G DeGruttola, David L DeMets, Gregory J Downing, Daniel F Hoth, John A Oates, Carl C Peck, Robert T Schooley, et al. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clinical pharmacology & therapeutics, 69(3):89–95, 2001.

[46] Richard HR Hahnloser, Rahul Sarpeshkar, Misha A Mahowald, Rodney J Douglas, and H Sebastian Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947, 2000.

[47] Derek K Hitchins. Systems engineering: a 21st century systems methodology. John Wiley & Sons, 2008.

[48] Allen Hjelmfelt and John Ross. Chemical implementation and thermodynamics of collective neural networks. Proceedings of the National Academy of Sciences, 89(1):388–391, 1992.

[49] Allen Hjelmfelt, Edward D Weinberger, and John Ross. Chemical implementation of neural networks and turing machines. Proceedings of the National Academy of Sciences, 88(24):10983–10987, 1991.

[50] John J Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8):2554–2558, 1982.

[51] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural networks, 4(2):251–257, 1991.

[52] Stefen Hui and Stanislaw H Zak. Robust stability analysis of adaptation algorithms for single perceptron. IEEE transactions on neural networks, 2(2):325–328, 1991.

[53] Giacomo Indiveri. Neuromorphic analog vlsi sensor for visual tracking: Circuits and application examples. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 46(11):1337–1347, 1999.

[54] François Jacob and Jacques Monod. Genetic regulatory mechanisms in the synthesis of proteins. Journal of molecular biology, 3(3):318–356, 1961.

[55] Marc Jakoby, Bernd Weisshaar, Wolfgang Dröge-Laser, Jesus Vicente-Carbajosa, Jens Tiedemann, Thomas Kroj, and François Parcy. bzip transcription factors in arabidopsis. Trends in plant science, 7(3):106–111, 2002.

[56] Rabea Jesser, Juliane Behler, Christian Benda, Viktoria Reimann, and Wolfgang R Hess. Biochemical analysis of the cas6-1 rna endonuclease associated with the subtype id crispr-cas system in synechocystis sp. pcc 6803. RNA biology, 16(4):481–491, 2019.

[57] Michael Kalos and Carl H June. Adoptive t cell transfer for cancer immunotherapy in the era of synthetic biology. Immunity, 39(1):49–60, 2013.

[58] Stuart Kauffman. Homeostasis and differentiation in random genetic control networks. Nature, 224(5215):177, 1969.

[59] Evelyn Fox Keller. Physics and the emergence of molecular biology: A history of cognitive and political synergy. Journal of the History of Biology, pages 389–409, 1990.

[60] Evelyn Fox Keller. Refiguring life: Metaphors of twentieth-century biology. Columbia University Press, 1995.

[61] Evelyn Fox Keller. The century of the gene. Harvard University Press, 2009.

[62] Jong Kyoung Kim, Aleksandra A Kolodziejczyk, Tomislav Ilicic, Sarah A Teichmann, and John C Marioni. Characterizing noise structure in single-cell rna-seq distinguishes genuine from technical stochastic allelic expression. Nature communications, 6:8687, 2015.

[63] Tasuku Kitada, Breanna DiAndreth, Brian Teague, and Ron Weiss. Programming gene and engineered-cell therapies with synthetic biology. Science, 359(6376):eaad1067, 2018.

[64] Aleksandra A Kolodziejczyk, Jong Kyoung Kim, Valentine Svensson, John C Marioni, and Sarah A Teichmann. The technology and biology of single-cell rna sequencing. Molecular cell, 58(4):610–620, 2015.

[65] Valérie Ledent, Odier Paquet, and Michel Vervoort. Phylogenetic analysis of the human basic helix-loop-helix proteins. Genome biology, 3(6):research0030–1, 2002.

[66] Christoph Lengauer, Kenneth W Kinzler, and Bert Vogelstein. Genetic instabilities in human cancers. Nature, 396(6712):643, 1998.

[67] Wendell A Lim and Carl H June. The principles of engineering immune cells to treat cancer. Cell, 168(4):724–740, 2017.

[68] Artémis Llamosi, Andres M Gonzalez-Vargas, Cristian Versari, Eugenio Cinquemani, Giancarlo Ferrari-Trecate, Pascal Hersen, and Gregory Batt. What population reveals about individual cell identity: single-cell parameter estimation of models of gene expression in yeast. PLoS computational biology, 12(2):e1004706, 2016.

[69] Jonathan Lloyd, Claire H Tran, Krishen Wadhwani, Christian Cuba Samaniego, Hari KK Subramanian, and Elisa Franco. Dynamic control of aptamer–ligand activity using strand displacement reactions. ACS synthetic biology, 7(1):30–37, 2017.

[70] Randolph Lopez, Ruofan Wang, and Georg Seelig. A molecular multi-gene classifier for disease diagnostics. Nature chemistry, 2018.

[71] Jessica Lovaas. The politics of life itself: Biomedicine, power, and subjectivity in the twenty-first century, by Nikolas Rose (Princeton University Press, Princeton, NJ, 2006). Journal of Biosocial Science, 39(5):795–796, 2007.

[72] Adrian Mackenzie. Design in synthetic biology. BioSocieties, 5(2):180–198, 2010.

[73] Adrian Mackenzie. Synthetic biology and the technicity of biofuels. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 44(2):190–198, 2013.

[74] Mark Eben Massari and Cornelis Murre. Helix-loop-helix proteins: regulators of transcription in eucaryotic organisms. Molecular and cellular biology, 20(2):429–440, 2000.

[75] Harley H McAdams and Adam Arkin. Gene regulation: Towards a circuit engineering discipline. Current Biology, 10(8):R318–R320, 2000.

[76] Carver Mead. Neuromorphic electronic systems. Proceedings of the IEEE, 78(10):1629–1636, 1990.

[77] Ron Milo, Paul Jorgensen, Uri Moran, Griffin Weber, and Michael Springer. Bionumbers—the database of key numbers in molecular and cell biology. Nucleic acids research, 38(suppl_1):D750–D753, 2009.

[78] Ron Milo and Rob Phillips. Cell biology by the numbers. Garland Science, 2015.

[79] Marvin Lee Minsky. Computation. Prentice-Hall Englewood Cliffs, 1967.

[80] Pejman Mohammadi, Niko Beerenwinkel, and Yaakov Benenson. Automated design of synthetic cell classifier circuits using a two-step optimization strategy. Cell systems, 4(2):207–218, 2017.

[81] Michel Morange. A history of molecular biology. Harvard University Press, 2000.

[82] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010.

[83] Nagarajan Nandagopal and Michael B Elowitz. Synthetic biology: integrated gene circuits. science, 333(6047):1244–1248, 2011.

[84] Jackson O’Brien and Arvind Murugan. Temporal pattern recognition through analog molecular computation. arXiv preprint arXiv:1810.02883, 2018.

[85] Anthony O'Hear. An introduction to the philosophy of science. 1993.

[86] Robert Cecil Olby. The path to the double helix: the discovery of DNA. Courier Corporation, 1994.

[87] Nikunj Chandrakant Oza and Stuart Russell. Online ensemble learning. University of California, Berkeley, 2001.

[88] Luciana Parisi. Biotech: Life by contagion. Theory, Culture & Society, 24(6):29–52, 2007.

[89] April Pawluk, Alan R Davidson, and Karen L Maxwell. Anti-crispr: discovery, mechanism and function. Nature Reviews Microbiology, 16(1):12, 2018.

[90] Camelia-Mihaela Pintea. Advances in bio-inspired computing for combinatorial optimization problems. Springer, 2014.

[91] Alexandra Pokhilko, Oliver Ebenhöh, W Marshall Stark, and Sean D Colloms. Mathematical model of a serine integrase-controlled toggle switch with a single input. Journal of The Royal Society Interface, 15(143):20180160, 2018.

[92] Mark Ptashne. How eukaryotic transcriptional activators work. Nature, 335(6192):683, 1988.

[93] Mark Ptashne and Alexander Gann. Transcriptional activation by recruitment. Nature, 386(6625):569, 1997.

[94] Mark Ptashne and Alexander Gann. Genes & signals, volume 402. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2002.

[95] Priscilla EM Purnick and Ron Weiss. The second wave of synthetic biology: from modules to systems. Nature reviews Molecular cell biology, 10(6):410, 2009.

[96] Lulu Qian and Erik Winfree. Scaling up digital circuit computation with dna strand displacement cascades. Science, 332(6034):1196–1201, 2011.

[97] Lulu Qian, Erik Winfree, and Jehoshua Bruck. Neural network computation with dna strand displacement cascades. Nature, 475(7356):368, 2011.

[98] Aviv Regev and Ehud Shapiro. Cells as computation. Nature, 419(6905), 2002.

[99] Hans-Jörg Rheinberger. Beyond nature and culture: modes of reasoning in the age of molecular biology and medicine. Cambridge Studies in Medical Anthropology, pages 19– 30, 2000.

[100] Hans-Jörg Rheinberger. What happened to molecular biology? BioSocieties, 3(3):303–310, 2008.

[101] Ashley G Rivenbark, Siobhan M O'Connor, and William B Coleman. Molecular and cellular heterogeneity in breast cancer: challenges for personalized medicine. The American journal of pathology, 183(4):1113–1124, 2013.

[102] Robert Rosen. Dynamical system theory in biology, Volume I. Stability theory and its ap- plications. New York: Wiley, 1970.

[103] Renana Sabi and Tamir Tuller. Modelling and measuring intracellular competition for finite resources during gene expression. Journal of the Royal Society Interface, 16(154):20180887, 2019.

[104] Christian Cuba Samaniego and Elisa Franco. A molecular device for frequency doubling enabled by molecular sequestration. In 2019 18th European Control Conference (ECC), pages 2146–2151. IEEE, 2019.

[105] R Sarpeshkar. Analog synthetic biology. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372(2012):20130110, 2014.

[106] Herbert M Sauro and Kyung Kim. Synthetic biology: it’s an analog world. Nature, 497(7451):572, 2013.

[107] Hava Siegelmann and Eduardo Sontag. Neural nets are universal computing devices. Sycon-91-08, Rutgers University, 1991.

[108] Michael L Simpson. Cell-free synthetic biology: a bottom-up approach to discovery by design. Molecular Systems Biology, 2(1), 2006.

[109] Brynne C Stanton, Velia Siciliano, Amar Ghodasara, Liliana Wroblewska, Kevin Clancy, Axel C Trefzer, Jonathan D Chesnut, Ron Weiss, and Christopher A Voigt. Systematic transfer of prokaryotic sensors and circuits to mammalian cells. ACS synthetic biology, 3(12):880–891, 2014.

[110] Susan Stepney. Nonclassical computation—a dynamical systems perspective. Handbook of natural computing, pages 1979–2025, 2012.

[111] Samuel H Sternberg, Rachel E Haurwitz, and Jennifer A Doudna. Mechanism of substrate selection by a highly specific crispr endoribonuclease. Rna, 18(4):661–672, 2012.

[112] Steven H Strogatz. Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering. CRC Press, 2018.

[113] Tina Toni and Bruce Tidor. Combined model of intrinsic and extrinsic variability for computational network design with application to synthetic biology. PLoS computational biology, 9(3):e1002960, 2013.

[114] Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of complexity, pages 11–30. Springer, 2015.

[115] Wilfried Weber and Martin Fussenegger. Emerging biomedical applications of synthetic biology. Nature Reviews Genetics, 13(1):21, 2012.

[116] Ron Weiss. Synthetic biology: from bacteria to stem cells. In Proceedings of the 44th annual Design Automation Conference, pages 634–635. ACM, 2007.

[117] Ron Weiss, George E Homsy, and Thomas F Knight. Toward in vivo digital circuits. In Evolution as Computation, pages 275–295. Springer, 2002.

[118] Neil HE Weste and Kamran Eshraghian. Principles of cmos vlsi design: a systems perspective. NASA STI/Recon Technical Report A, 85, 1985.

[119] Bernard Widrow and Michael A Lehr. 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proceedings of the IEEE, 78(9):1415–1442, 1990.

[120] Haifeng Ye, Dominique Aubel, and Martin Fussenegger. Synthetic mammalian gene circuits for biomedical applications. Current opinion in chemical biology, 17(6):910–917, 2013.

[121] Christoph Zechner and Heinz Koeppl. Uncoupled analysis of stochastic reaction networks in fluctuating environments. PLoS computational biology, 10(12):e1003942, 2014.

[122] Jon Zugazagoitia, Cristiano Guedes, Santiago Ponce, Irene Ferrer, Sonia Molina-Pinelo, and Luis Paz-Ares. Current challenges in cancer treatment. Clinical therapeutics, 38(7):1551–1566, 2016.
