A Review of Deep Learning Methods for Antibodies

Antibodies (Review)

Jordan Graves, Jacob Byerly, Eduardo Priego, Naren Makkapati, S. Vince Parish, Brenda Medellin and Monica Berrondo *

Macromoltek, Inc, 2500 W William Cannon Dr, Suite 204, Austin, TX 78745, USA; [email protected] (J.G.); [email protected] (J.B.); [email protected] (E.P.); [email protected] (N.M.); [email protected] (S.V.P.); [email protected] (B.M.)
* Correspondence: [email protected]

Received: 1 April 2020; Accepted: 16 April 2020; Published: 28 April 2020

Abstract: Driven by its successes across domains such as computer vision and natural language processing, deep learning has recently entered the field of biology by aiding in cellular image classification, finding genomic connections, and advancing drug discovery. In drug discovery and protein engineering, a major goal is to design a molecule that will perform a useful function as a therapeutic drug. Typically, the focus has been on small molecules, but new approaches have been developed to apply these same principles of deep learning to biologics, such as antibodies. Here we give a brief background of deep learning as it applies to antibody drug development, and an in-depth explanation of several deep learning algorithms that have been proposed to solve aspects of both protein design in general, and antibody design in particular.

Keywords: antibody; antigen; machine learning; deep learning; neural networks; binding prediction; protein–protein interaction; epitope mapping; drug discovery; drug design

1. Introduction

In this paper, we outline the deep learning techniques that are starting to be applied to the field of antibody design and their results. We outline current challenges in three areas of antibody design: (1) modeling of structure from sequence, (2) prediction of protein interactions, and (3) identification of likely binding sites.
We then touch on the deep learning techniques and algorithms that have been developed towards antibody design in each of these areas. As these three challenge areas have analogues in the general protein space, we additionally describe deep learning approaches for similar problems among proteins more broadly, with the understanding that these approaches may be applicable to the narrower domain of antibody engineering. We also describe the datasets and benchmarks available that could aid in the development and comparison of methods within this field. We conclude with a comparison of these methods and touch on future directions.

Monoclonal antibody therapeutics have become an increasingly popular approach for drug development against targets and indications where small-molecule-based approaches have proven insufficient. With this increase in focus has come the creation of a number of new methods for improving and refining the antibody development pipeline. Innovations in constructing display libraries (phage and yeast) have accelerated the candidate discovery timeline and reduced the challenges associated with downstream development of therapeutic leads. A significant goal of current lead development research has been to reduce the necessity of downstream lead optimization steps such as improvements to the solubility and immunogenicity of the candidates, along with the mitigation of other developability concerns. Other research has expanded the mechanisms of action of potential antibody therapeutics by adding functional domains to create bispecific and Fc effector antibodies, and by exploring different constructs for their unique properties, such as single-chain variable fragments and camelid-derived nanobodies.

Antibodies 2020, 9, 12; doi:10.3390/antib9020012 www.mdpi.com/journal/antibodies

While these in vitro innovations have shortened timelines and improved steps throughout the development pipeline, a new class of innovations has emerged around the in silico engineering and design of antibody candidates. These approaches attempt to harness advances in computational processing power to reduce the cost and increase the speed of lead candidate generation. The advantages of an in silico pipeline would include rapid and cheap scaling of candidate generation, the ability to develop antibodies against challenging antigens, and the application of rational design principles. In contrast to traditional methods of candidate generation such as hybridoma or phage display, an in silico pipeline promises cheaper and faster drug development. However, conventional in silico methods have yet to fully deliver on these promises. Here, we present deep-learning-based approaches that appear to demonstrate greater success than conventional methods with respect to the key challenges of computational antibody design.

2. Antibodies

Antibodies are a type of protein produced as an immune response to invading pathogens. They consist of four chains: two heavy chains and two light chains. The heavy chains include three constant domains and a variable domain, while the light chains have just one constant domain and one variable domain. The variable domains contain the antibody’s binding surface, or “paratope”. The paratope primarily consists of six distinct variable loops: three on the light chain (loops L1, L2, and L3) and three on the heavy chain (loops H1, H2, and H3) (Figure 1). This region, also called the complementarity-determining region, or CDR, is what allows an antibody to bind a target with high specificity [1].
The area is large enough to accommodate many unique contacts, which is part of what allows for such high specificity, especially as compared to typical small molecules, which are able to accommodate far fewer contacts and thus tend to have a greater number of side-effect-causing off-target interactions. The substantial degree of variation between the CDR loops is significant, as the diversity of antibodies is part of what makes them effective binders for such a wide range of targets [1].

Figure 1. Schematic of antibody and ribbon diagram of variable region. The heavy chain (H) of the antibody is depicted in dark blue, while the light chain (L) is shown in light blue. Both chains show labels C for constant region and V for variable region. The complementarity-determining region (CDR) is shown as orange loops on the light chain and yellow loops on the heavy chain. On the right, a ribbon diagram of a CDR is shown with light and heavy chain CDR loops highlighted and labeled (PDB: 1A4J).

The specificity and broad applicability of antibodies make them the subject of much attention in medical research, and this has in turn attracted much attention to the study of antibodies computationally, or in silico. In order to computationally analyze an antibody or predict its effectiveness, it is often necessary to generate a three-dimensional model. As traditional structure-determination methods, such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), and Cryogenic Electron Microscopy (CryoEM), are laborious, time-consuming, and expensive, computational methods have emerged to generate structure predictions using chemistry and existing protein fold data. Several groups have been able to accurately predict antibody structures for a set of benchmarks, but the modeling of the H3 CDR loop continues to present a significant challenge [2]. The biological process which generates the H3 loop is unique relative to the other CDR loops.
The bulk of the loop is encoded in its own gene, separate from the genes which code for the rest of the antibody sequence. Whereas other CDR loops exhibit much less variation and can even be reasonably separated into canonical structural clusters, the H3-encoding gene is actively mutated in isolation before being recombined with the rest of the gene sequence in a process called V(D)J recombination, which creates both sequential and conformational hypervariability among these loops [3,4]. The extreme variety of loop sequences introduced by this process makes it nearly impossible to find homologous loops; there is rarely sufficient homology data to predict structure. This presents a substantial challenge which makes evident the need for new approaches.

Another challenge in computational antibody development is interface prediction. Typically, the interface between two proteins consists of several well-conserved residues that form a tight interaction. This is because two interacting proteins have typically co-evolved over many generations. An antibody’s interaction with its antigen does not benefit from such a long shared history. Antibodies are ad hoc binders generated “on the fly” to address a foreign pathogen. Although antibody–antigen interactions do fall under the general umbrella of protein–protein interactions (PPI), it has become increasingly apparent that antibody–antigen interactions and their interfaces are distinct, with unique properties that reduce the applicability of general protein interaction prediction to the antibody space (Figure 2). As only the antibody side of the interface undergoes its own, separate evolution, the antigen surface lacks many of the features associated with PPIs, such as enrichment in non-polar and aromatic residues.
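
The compositional contrast described above can be made concrete with a small calculation. The following is a minimal Python sketch, not a method from the work reviewed here: it computes the fraction of non-polar and of aromatic residues in a string of one-letter amino acid codes. The residue groupings (`NON_POLAR`, `AROMATIC`), the function name `composition`, and both toy sequences are assumptions introduced purely for illustration; they are not real interface data.

```python
from typing import Dict

# Illustrative residue classes (an assumption): textbooks draw these
# boundaries differently, e.g. for glycine, proline, and histidine.
NON_POLAR = set("AVLIMFWPG")
AROMATIC = set("FWY")


def composition(residues: str) -> Dict[str, float]:
    """Return the fraction of non-polar and of aromatic residues
    in a string of one-letter amino acid codes."""
    residues = residues.upper()
    if not residues:
        raise ValueError("residue string is empty")
    n = len(residues)
    return {
        "non_polar": sum(r in NON_POLAR for r in residues) / n,
        "aromatic": sum(r in AROMATIC for r in residues) / n,
    }


# Hypothetical residue strings, invented for illustration only.
toy_ppi_interface = "LFWIVAYL"    # stands in for a conserved PPI interface
toy_antigen_surface = "SNDKTGEY"  # stands in for an antigen surface patch

print("PPI-like:", composition(toy_ppi_interface))
print("antigen-like:", composition(toy_antigen_surface))
```

In practice the residue strings would come from interface residues identified in solved complexes, and a different residue classification would shift the resulting fractions.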
These interfaces have fewer hydrophobic interactions, and are typically constructed of paratope aromatic hotspots surrounded by polar contacts created by short-chain hydrophilic residues [5]. Most of the cross-interface hydrogen bonds are created by sidechain–sidechain
