bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 Scalable Analysis of Authentic Viral Envelopes on FRONTERA

Fabio Gonzalez-Arias´ ∗, Tyler Reddy†, John E. Stone‡, Jodi A. Hadden-Perilla∗§ and Juan R. Perilla∗§ ∗Department of Chemistry and Biochemistry, University of Delaware, Newark, DE 19716 †Applied Computer Science (CCS-7), Los Alamos National Laboratory, Los Alamos, NM 87544 ‡Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 § Correspondence: [email protected], [email protected]

Abstract—Enveloped infect host cells via fusion of their viral envelope with the plasma membrane. Upon cell entry, viruses gain access to all the macromolecular machinery necessary to replicate, assemble, and bud their progeny from the infected cell. By employing molecular dynamics simulations to characterize the dynamical and chemical-physical properties of viral envelopes, researchers can gain insights into key determinants of viral infection and propagation. Here, the Frontera supercomputer is leveraged for large-scale analysis of authentic viral envelopes, whose lipid compositions are complex and realistic. VMD with support for MPI is employed on the massive parallel computer to overcome previous computational limitations and enable investigation into biology at an unprecedented scale. The modeling and analysis techniques applied to authentic viral envelopes at two levels of particle resolution are broadly applicable to the study of other viruses, including the novel that causes COVID-19. A framework for carrying out scalable analysis of multi-million particle MD simulation trajectories on Frontera is presented, expanding the the utility of the machine in humanity’s ongoing fight against infectious disease. !

1 INTRODUCTION

IRUSES are pathogens that cause infectious diseases in spike (S), a purported viroporin known as the envelope (E) V living organisms. Because of their of lack metabolic protein, and a structurally essential integral membrane (M) capabilities, viruses require the molecular machinery of protein [8]. While each of these components plays a key a host cell to replicate [1]. Virus particles, referred to as role in the viral life cycle, it is ultimately the envelope that virions, exhibit chemical and structural diversity across enables each to carry out its function successfully. As the families. The detailed architecture of a virus determines its envelope is derived from the host , partic- fitness, mechanism of infection and replication, and host cell ularly from the plasma membrane or organelle in which tropism. the virion assembles, its lipid composition may be highly The most basic virions consist of genomic material pro- complex [9]. Further, the envelope may be asymmetric, or tected by a protein shell, called a capsid. The viral genome present different lipid compositions across the inner versus contains the blueprint for synthesis of the macromolecular outer leaflets of the bilayer; membrane asymmetry is crucial components that will form progeny virions. Depending to the function of most organelles and may likewise be im- on the virus, the genome may be encoded as RNA or portant in the assembly or infection processes of viruses [10], as DNA [2], [3]. More complex virus structures exhibit [11]. an exterior membrane, called an envelope, composed of a Detailed characterization of viruses is critical to the . Fusion of the viral envelope with the plasma development of new antiviral therapies [12]. Computational membrane of the host cell initiates infection. Enveloped studies now routinely contribute to the foundation of ba- viruses typically incorporate surface glycoproteins that in- sic science that drives innovation in pharmacy, medicine, teract with host cell receptors to mediate adhesion and and the management of public health. Notably, molecular facilitate membrane fusion [4]. The composition of the viral dynamics (MD) simulations have emerged as a powerful envelope, along with the specificity of the adhesion proteins, tool to investigate viruses. Following advances in high- contributes to the recognition, attachment, and fusion of the performance computing, MD simulations can now be ap- pathogen with its host [5], [6]. Other membrane-embedded plied to elucidate chemical-physical properties of large, structures that may be displayed by a virion include vi- biologically relevant virus structures, revealing insights into ral ion channels, called viroporins, proteins that provide their mechanisms of infection and replication that are in- particle scaffolding or assembly support, or proteins that accessible to experiments [13]–[16]. In recent years, MD participate in the release of progeny virions [7]. simulations of intact enveloped virions, including influenza Enveloped viruses, such as influenza A, Ebola, and A, dengue, and an immature HIV-1 particle, have been HIV-1, account for numerous cases of infection and death reported. State-of-the art lipidomics profiling of viruses has in humans each year. SARS-CoV-2, the novel coronavirus enabled some of these models to contain realistic lipid that causes COVID-19, is also enveloped. The SARS-CoV-2 composition, leading to the first computational studies of virion incorporates a class-I fusion glycoprotein called the authentic viral envelopes [17]–[20]. Here, next-generation bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2 simulation and analysis of authentic viral envelopes is resource for furthering basic science research of large biolog- discussed, leveraging the resources of the leadership-class ical systems, including enveloped viruses. Frontera supercomputer. A framework for deploying large- Frontera comprises multiple computing subsystems. The scale analysis of MD simulation trajectories on Frontera is primary partition is CPU-only and consists of 8,008 compute presented. nodes powered by Intel Xeon Platinum 8280 (CLX) pro- cessors; each node provides 56 cores (28 cores per socket) and 192 GB of RAM. The large memory partition includes 2 COMPUTATIONAL CHALLENGES 16 additional compute nodes with 112 cores (28 cores per MD simulations of membranes can be carried out at coarse- socket) and memory upgraded to 2.1 TB NVDIMM. Another grained (CG) or atomistic levels of detail. CG models em- partition is a hybrid CPU/GPU architecture, consisting of ploy particles that consolidate groups of atoms. Atomistic 90 compute nodes powered by Intel Xeon E5-2620 v4 pro- models employ discrete particles for each constituent atom, cessors and NVIDIA Quadro RTX 5000 GPUs; each node including hydrogen. Due to the size and chemical complex- provides 16 cores (8 cores per socket), four GPUs, and ity of authentic viral envelopes, MD simulations can include 128 GB of RAM. All Frontera nodes contain 240 GB solid- millions – even hundreds of millions – of particles, partic- state drives and are interconnected with Mellanox HDR ularly when the physiological solvent environment of the InfiniBand. The Longhorn subsystem consists of 96 hybrid system is taken into account. Moreover, MD simulations of CPU/GPU compute nodes powered by IBM POWER9 pro- membranes must be performed on timescales of hundreds cessors and NVIDIA Tesla V100 GPUs; each node provides of nanoseconds to microseconds in order to characterize 40 cores (20 cores per socket), four GPUs, 256 GB of RAM, dynamical properties, such as lipid lateral diffusion and flip- and 900 GB of local storage. Eight additional nodes have flop between leaflets of the bilayer [16], [21]. RAM upgraded to 512 GB to support memory-intensive cal- CG models reduce computational expense by eliminat- culations. Longhorn is interconnected with Mellanox EDR ing degrees of freedom, enabling the exploration of ex- InfiniBand. tended simulation timescales; however, the loss of detail Frontera has a multi-tier file system and provides 60 PB versus atomistic models reduces simulation accuracy. An of Lustre-based storage shared across nodes. The ability to approach that combines the strengths of both methods in- store massive amounts of data and analyze this data in volves building and simulating a CG model, which repro- parallel enables the investigation of biological systems at duces essential features of the membrane and allows equili- an unprecedented scale. Researchers gain access, not only bration of lipid species [22], then backmapping the CG rep- to the computational capability to perform MD simulations resentation to an atomistic model for further study [23], [24]. of millions of particles, but also to the capability to extract The backmapping process can be precarious and depends more complex and comprehensive information from their on the molecular mechanics force field and energy mini- simulation trajectories, leading to deeper discoveries rele- mization scheme to resolve structural artifacts. Either way, vant to the advancement of human health. the incredible number of particle-particle interactions that Frontera’s Lustre filesystem is crucial for the perfor- must be computed to reproduce the behavior of a membrane mance of large-scale biomolecular analysis. Lustre decom- bilayer over extended simulation timescales, especially for poses storage resources among a large number of so-called complex, large-scale systems like viral envelopes, presents a Object Store Targets (OSTs) that are themselves composed significant computational challenge [25]. of moderate sized high-performance RAID-protected disk Furthermore, the amount of data generated by MD simu- arrays. In much the same way that a RAID-0 array can stripe lations of intact viral envelopes is tremendous, both in terms a file over multiple disks, Lustre can similarly stripe files of chemical intricacy and storage footprint. MD simulation over multiple OSTs. This second level of striping over OSTs trajectories, which may range from tens to hundreds of is particularly advantageous when working with very large terabytes in size, fail to provide new information about files, since I/O operations at different file offsets can be ser- viruses until they are analyzed in painstaking detail. The viced in parallel by multiple independent OSTs, according magnitude of trajectory files and the sheer numbers of to the stripe count and stripe size set by the user, or by sys- particles that must be tracked during analyses necessitates tem defaults. Figure 1A illustrates how an MD simulation massively parallel computational solutions. It follows that trajectory (and potentially even individual frames) can be such large datasets cannot be feasibly transferred to other striped over OSTs. Figure 1B demonstrates the scalability locations for analysis or visualization, and interaction with of multi-million particle biomolecular analysis on Frontera the data to yield scientific discoveries must be conducted on when using the Lustre filesystem. the same high-performance computing resource used to run the simulation in the first place. 4 PARALLEL ANALYSIS WITH VMD Visual Molecular Dynamics (VMD) is a widely-used soft- 3 COMPUTATIONAL SOLUTIONS ON FRONTERA ware for biomolecular visualization and analysis [26]. Com- Frontera is a leadership-class petascale supercomputer, monly utilized as a desktop application to setup MD sim- funded by the National Science Foundation, and housed at ulations and interact with trajectory data, VMD exploits the Texas Advanced Computing Center at the University multi-core CPUS, CPU vectorization, and GPU-accelerated of Texas at Austin. It is ranked as the fifth most powerful computing techniques to achieve high performance on key supercomputer in the world, and the fastest on any univer- molecular modeling tasks. VMD can also be used in-situ on sity campus. Frontera provides an invaluable computational massive parallel computers to perform large-scale modeling bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 3 tasks, exploiting computing and I/O resources that are distributed memory parallel computers. Computationally orders of magnitude greater than those available on even the demanding VMD commands are executed by the fastest most powerful desktop workstations. MPI implementations available hardware-optimized code path, with a general of VMD have already been employed on supercomputers to purpose C++ implementation as the standard fall-back. enable novel investigation of large virus structures, such as When run in parallel, the I/O operations of each VMD MPI the capsid of HIV-1 [27]. rank are independent of each other, allowing I/O intensive To facilitate large-scale molecular modeling pipelines, trajectory analysis tasks to naturally exploit parallel filesys- VMD incorporates support for distributed memory message tems. passing with MPI, a built-in parallel work scheduler with VMD supports I/O-efficient MD simulation trajectory dynamic load balancing, and easy to use scripting com- formats that have been specifically designed for so-called mands, enabling large-scale parallel execution of molecular burst buffers and flash-based high-performance computing visualization and analysis of MD simulation trajectories. storage tiers, with emerging GPU-Direct Storage interfaces VMD’s Tcl and Python scripting interfaces provide a user- achieving I/O rates of up to 70 GB/sec on a single GPU- friendly mechanism to distribute and schedule user-defined dense compute node, and conventional parallel I/O rates work across MPI ranks, to synchronize workers, and to approaching 1 TB/sec. VMD startup scripts can be cus- gather results. This approach allows user-written modeling tomized to run one or multiple MPI ranks per node, while and analysis scripts to be readily adapted from existing avoiding GPU sharing conflicts or other undesirable out- scripts and protocols that have been developed previously comes, allowing compute, memory, and I/O resources to be on local computing resources, allowing robust tools to be apportioned among MPI ranks, and thereby best-utilized for deployed on a large-scale, often with few changes. the task at hand. To facilitate large-scale simulation and analysis of biolog- ical systems containing millions of particles, VMD has been compiled on Frontera with MPI enabled. Frontera’s capac- ity for massively parallel investigation of intact, authentic viral envelopes is demonstrated by application to envelope models at two levels of particle resolution.

5 MODELINGAUTHENTICVIRALENVELOPES 5.1 Coarse-grained model of HIV-1 envelope Lipidomics profiling by mass spectrometry has established the lipid composition of the HIV-1 viral envelope [28], [29]. Based on this experimental data, a CG model of an authentic HIV-1 envelope was constructed (Fig. 2A). The model ex- hibits a highly complex chemical makeup, with 24 different lipid species and asymmetry across the inner and outer leaflets of the membrane bilayer. The envelope is 120 nm in diameter and, with solvent, comprises 29 million CG beads, represented as MARTINI [30] particles. The CG model was equilibrated using GROMACS 2018.1 [31], and its dynamics were investigated over a production simulation time of over five microseconds. The MD simulation system, complete with solvent environment, is shown in Figure 2B.

5.2 Atomistic model of HIV-1 envelope All-atom MD simulations are the most accurate classical mechanical approach that can be applied to study large- scale biological systems [13]. Following simulation of the Fig. 1. Analysis of large-scale MD simulation trajectories using Lustre CG HIV-1 envelope model, which facilitated equilibration filesystem A Illustration of trajectory file striping using Lustre filesystem. Each frame on the file is striped across multiple OSTs, where the of lipid distributions over an extended simulation timescale, number of stripes are spawned according to the stripe count and stripe an atomistic model was constructed based on backmapping. size of the storage file. Upon submission of the job, the compute nodes The equilibrated CG model was mapped to an atomistic access a series of frames, from the OSTs, to perform the analysis of the trajectory using parallel I/O. Instructions from the Tcl scripting interface representation using the Backward tool [24], which enables involving reading and writing files are requested through the Metadata transformation of CG systems built with MARTINI [30]. Server (MS), fetching the layout extended attributes (EAs) and using this The approach utilizes a series of mapping files that contain information to perform I/O on the file. B Wall-clock time for the transverse structural information and geometric restrictions relevant diffusion analysis of authentic viral envelopes on Frontera, using a stripe count of 16 with different stripe sizes. to the modeling of each lipid species. Local geometries of atomistic lipids are reconstructed, taking into account VMD exploits node-level hardware-optimized CPU and stoichiometry and stereochemistry, to project the CG con- GPU kernels, all of which are also used when running on figuration into an atomistic configuration. An example of an bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 4

can catalyze this movement of lipids, or it can occur sponta- neously over longer timescales. Computational approaches to characterize transverse diffusion in heterogeneous lipid mixtures have been described previously and indicate that flip-flop occurs at relatively slow rates [32], [33]. Current strategies to measure flip-flip events are based on tracking the translocation and reorientation of lipid headgroups (i.e., colored beads in Fig. 2A). A feature released in VMD 1.9.4 by the authors (measure volinterior) [34] facilitates tracking of headgroup dynamics by providing rapid classification of their locations in the inner versus outer leaflets of the bilayer. The method equips VMD with a mechanism to identify the inside versus outside of a biomolecular container. If the container surface is speci- fied as the region of the bilayer composed of lipid tail groups (i.e., silver in Fig. 2A), VMD can detect the presence of lipid headgroups in the endoplasmic versus exoplasmic leaflets. By tracking the number of headgroups that flip-flop from one leaflet to the other during intervals of simulation time, rates of transverse diffusion can be readily calculated [34]. Frontera was leveraged to measure transverse lipid dif- fusion in the CG model of the authentic HIV-1 viral enve- lope using measure volinterior in VMD compiled with MPI support. The entire 5.2 µs of MD simulation, comprising 5,200 frames with a storage footprint of 170 GB, was used for the calculation. To obtain high I/O performance, the trajectory files were set to a 1 MB stripe width, striping over a total of 16 Lustre OSTs. The analysis was run on Frontera’s primary compute partition, utilizing 56 Intel CLX cores per node. Flip-flop events for the lipid DPSM were found to occur at a rate of 2.62 × 10−4 ± 1.49 × 10−5 molecules per nanosecond from the exoplasmic to endoplasmic leaflet, and 2.33 × 10−5 ± 1.41 × 10−6 molecules per nanosecond from the endoplasmic to exoplasmic leaflet (Fig. 3A). Because inward versus outward diffusion of this lipid species is not occurring at the same rate, the calculation reveals the process of lipid distributions being equilibrated across the Fig. 2. Authentic model of HIV-1 viral envelope A Cross-section of CG asymmetric bilayer over the course of the simulation. The envelope model. Inset shows details of the inner (endoplasmic) and Tcl framework for performing the calculation with VMD, the outer (exoplasmic) leaflets of the membrane bilayer. Envelope is 120 nm in diameter and comprises 29 million MARTINI particles. B CG VMD startup script, and the SLURM job submission script envelope model suspended in physiological solvent environment, used used to run the calculation on Frontera, are given in the for long-timescale MD simulations. C Chemical structure and atomistic Supplementary Information. representation of DPSM, one of 24 lipid species included in the HIV-1 envelope model. Each CG lipid in the envelope was backmapped to an Figure 3B shows the wall-clock time for the transverse atomistic version exhibiting full chemical detail. diffusion calculation with one VMD MPI rank per node and two VMD MPI ranks per node. By running multiple VMD atomistic representation of a lipid, along with its chemical MPI ranks per node, I/O operations are more effectively structure, is given in Figure 2C. The final atomistic model overlapped with computations, thereby achieving better of an authentic HIV-1 viral envelope comprises over 280 overall I/O and compute throughput both per-node, and in million atoms including solvent and ions. To complete the the aggregate, leading to an overall performance gain. It is backmapping process, the system must be relaxed to a local worth noting, however, that finite per-node memory, GPU, energy minimum to resolve structural artifacts introduced and interconnect resources ultimately restrict the benefit by the CG to all-atom transformation. of using multiple MPI ranks per-node to achieve better compute-I/O overlap. Ongoing VMD developments aim to improve compute-I/O overlap both for CPUs and for GPU- 6 VIRALENVELOPEANALYSISON FRONTERA accelerated molecular modeling tasks through increased 6.1 Measuring transverse lipid diffusion internal multithreading of trajectory I/O to allow greater Transverse diffusion of lipids occurs when they flip-flop asynchrony and decoupling of I/O from internal VMD between leaflets of the membrane bilayer. Cellular enzymes computational kernels. bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 5

energy minimization based on the CHARMM36 force field, with all the latest corrections in parameters for lipids and heterogenous protein-lipid systems [39]. Figure 4C diagrams the complete procedure for con- structing an atomistic envelope model from an initial CG model. Following backmapping of the CG representation to an all-atom representation, explicit ions must be added to achieve electrostatic neutralization of the system, given the numerous charged lipid headgroups present in the en- velope. Coordinate and topology files are generated using the js plugin in VMD, utilizing a binary file format that overcomes the fixed-column limitation of pdb files. The min- imization of multi-million atom systems can occasionally cause memory overflow or significant loss of performance in NAMD, due to the I/O, even on large-scale compu- tational resources. In remedy, the memory-optimized ver- sion of NAMD implements parallel I/O and a compressed topology file, reducing the memory requirements for large biomolecular systems. Following initial conjugate gradient minimization with NAMD, the model is assessed for major contributors to energetic instability, including structural clashes, close con- tacts, bond lengths or angles significantly exceeding equilib- rium values, and unfavorable dihedral configurations. Such assessments can be rapidly accomplished using VMD com- piled with MPI support on Frontera, using existing VMD commands like measure contacts, measure bond, measure angle, and measure dihed. In a second iteration of minimization, re- gions of the system that are relatively stable are restrained in Fig. 3. Parallel analysis of authentic viral envelopes. A Transverse diffu- order to focus further minimization efforts on regions of that sion analysis of the lipid DPSM in the CG model of the viral envelope. B Wall-clock time for calculation of transverse diffusion on Frontera. remain the most energetically unfavorable. Useful NAMD commands for minimization of unstable systems include minTinyStep and minBabyStep, which control the magnitude 6.2 Iterative relaxation of an atomistic envelope of steps for the line minimizer algorithm for initial and further steps of minimization, respectively. This process is The atomistic model of an authentic HIV-1 viral envelope repeated until all geometric distortions are resolved, and the presented here was derived via backmapping from a CG atomistic model has been driven to a local energy minimum, model. A major limitation of the backmapping approach representative of the stable biological system at equilibrium. involves structural artifacts introduced by the CG to all- atom transformation. Figure 4A shows an example of such structural artifacts in the lipid DPSM, including distorted 7 CONCLUSIONS bond lengths and angles. The current strategy for reme- dying these artifacts is to subject the backmapped model Increased computational capability presents the opportu- to molecular mechanics energy minimization, relaxing the nity for researchers to push the envelope and extend the geometry of each molecule, and the system as a whole, to bounds of scientific knowledge. Building, simulating, and a local energy minimum. Figure 4B shows an example of a analyzing models of intact, authentic viral envelopes, par- post-minimization structure in which the distortions have ticularly at atomistic detail, is a relatively new research been successfully resolved. endeavor under active development. The leadership-class Energy minimization is a common procedure applied to Frontera supercomputer provides a powerful resource to biomolecular systems to eliminate close contacts and non- support the continued growth of this field. The combination ideal geometries prior to the start of MD simulations. The of Frontera’s state-of-the-art, massively parallel architecture energy of atomistic systems is described as an empirical and high-performance software applications, such as VMD, potential energy function V(ri), where the terms depend on that can exploit it will enabling researchers to make dis- the positions r of i atoms [35]. Here, energy minimization of coveries relevant to the treatment and prevention of disease the system is performed on Frontera using NAMD 2.14, an at an unprecedented scale. All of the techniques discussed MD engine based on C++ and charmm++ that is amenable here are applicable to modeling and investigation of other to large systems by virtue of being highly scalable on enveloped viruses, including SARS-CoV-2, which likewise massive parallel computers [36], [37]. Energy minimization comprises a complex lipidome. Owing to resources like is implemented in NAMD using the conjugate gradient Frontera, the computational biological sciences will play an algorithm [38] for maximum performance. The atomistic increasingly important role in driving future innovations model of an authentic HIV-1 viral envelope was subjected to that improve human health and longevity. bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 6

is supported by the U.S. Department of Energy Na- tional Nuclear Security Administration under Contract No. 89233218CNA000001.

REFERENCES

[1] P. Cossart and A. Helenius, “ of viruses and bacteria,” Cold Spring Harbor perspectives in biology, vol. 6, no. 8, p. a016972, 2014. [2] J. L. Van Etten, L. C. Lane, and D. D. Dunigan, “DNA viruses: the really big ones (giruses),” Annual review of microbiology, vol. 64, pp. 83–99, 2010. [3] P. Poltronieri, B. Sun, and M. Mallardo, “RNA viruses: RNA roles in pathogenesis, coreplication and viral load,” Current genomics, vol. 16, no. 5, pp. 327–335, 2015. [4] F. A. Rey and S.-M. Lok, “Common features of enveloped viruses and implications for immunogen design for next-generation vac- cines,” Cell, vol. 172, no. 6, pp. 1319–1334, 2018. [5] S. C. Harrison, “Viral membrane fusion,” Nature structural & molecular biology, vol. 15, no. 7, pp. 690–698, 2008. [6] S. S. Rawat, M. Viard, S. A. Gallo, A. Rein, R. Blumenthal, and A. Puri, “Modulation of entry of enveloped viruses by cholesterol and sphingolipids,” Molecular membrane biology, vol. 20, no. 3, pp. 243–254, 2003. [7] J. L. Nieto-Torres, C. Verdia-B´ aguena,´ C. Castano-Rodriguez,˜ V. M. Aguilella, and L. Enjuanes, “Relevance of viroporin ion channel activity on viral replication and pathogenesis,” Viruses, vol. 7, no. 7, pp. 3552–3573, 2015. [8] A. Wu et al., “Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China,” Cell host & microbe, 2020. [9] T. Tang, M. Bidon, J. A. Jaimes, G. R. Whittaker, and S. Daniel, “Coronavirus membrane offers as a potential target for antiviral development,” Antiviral research, p. 104792, 2020. [10] S. Chiantia and E. London, “Lipid bilayer asymmetry,” in Encyclopedia of Biophysics, G. C. K. Roberts, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 1250–1253. [Online]. Available: https://doi.org/10.1007/978-3-642-16712-6 552 [11] G. van Meer, “Dynamic transbilayer lipid asymmetry,” Cold Spring Harbor perspectives in biology, vol. 3, no. 5, p. a004671, 2011. [12] D. D. Richman and N. Nathanson, “Antiviral therapy,” in Viral Pathogenesis. Elsevier, 2016, pp. 271–287. [13] J. R. Perilla et al., “Molecular dynamics simulations of large macro- molecular complexes,” Current opinion in structural biology, vol. 31, pp. 64–74, 2015. Fig. 4. Resolving distortions in backmapped lipids. A Atomistic rep- [14] J. K. Marzinek, R. G. Huber, and P. J. Bond, “Multiscale modelling resentation of DPSM with structural artifacts post-backmapping. CG and simulation of viruses,” Current Opinion in Structural Biology, representation of the lipid is superimposed to the all-atom structure. vol. 61, pp. 146–152, 2020. B Atomistic representation of DPSM post-minimization with structural [15] J. A. Hadden and J. R. Perilla, “All-atom virus simulations,” artifacts resolved. CG representation of the lipid is superimposed to Current opinion in virology, vol. 31, pp. 82–91, 2018. the relaxed all-atom structure. C Iterative relaxation process for large- [16] T. Reddy and M. S. Sansom, “Computational virology: from the scale authentic atomistic viral envelopes. Following backmapping from inside out,” Biochimica et Biophysica Acta (BBA)-Biomembranes, vol. CG representations to all-atom, the system is subjected to charge neu- 1858, no. 7, pp. 1610–1618, 2016. tralization, followed by generation of topology and coordinate files. Con- [17] T. Reddy et al., “Nothing to sneeze at: a dynamic and integrative jugate gradient minimization using NAMD, in combination with parallel computational model of an influenza A virion,” Structure, vol. 23, analysis using VMD MPI, is employed iteratively to eliminate structural no. 3, pp. 584–597, 2015. artifacts of the membrane and drive the system to a local minimum. [18] T. Reddy and M. S. Sansom, “The role of the membrane in the structure and biophysical robustness of the dengue virion envelope,” Structure, vol. 24, no. 3, pp. 375–382, 2016. ACKNOWLEDGMENTS [19] J. D. Durrant, S. E. Kochanek, L. Casalino, P. U. Ieong, A. C. Dommer, and R. E. Amaro, “Mesoscale all-atom influenza virus This work is supported by National Science Foundation simulations suggest new substrate binding mechanism,” ACS award MCB-2027096, funded in part by the Delaware Es- central science, vol. 6, no. 2, pp. 189–196, 2020. tablished Program to Stimulate Competitive Research (EP- [20] G. S. Ayton and G. A. Voth, “Multiscale computer simulation of SCoR). This research is part of the Frontera computing the immature HIV-1 virion,” Biophysical journal, vol. 99, no. 9, pp. 2757–2765, 2010. project at the Texas Advanced Computing Center. Frontera [21] S. J. Marrink, V. Corradi, P. C. Souza, H. I. Ingolfsson, D. P. is made possible by National Science Foundation award Tieleman, and M. S. Sansom, “Computational modeling of realistic OAC-1818253. VMD development has been supported by cell membranes,” Chemical reviews, vol. 119, no. 9, pp. 6184–6226, National Institutes of Health grant P41-GM104601. This 2019. [22] S. V. Bennun, M. I. Hoopes, C. Xing, and R. Faller, “Coarse-grained research used resources provided by the Los Alamos Na- modeling of lipids,” Chemistry and physics of lipids, vol. 159, no. 2, tional Laboratory Institutional Computing Program, which pp. 59–66, 2009. bioRxiv preprint doi: https://doi.org/10.1101/2020.07.05.188367; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 7

[23] A. J. Rzepiela, L. V. Schafer,¨ N. Goga, H. J. Risselada, A. H. De Vries, and S. J. Marrink, “Reconstruction of atomistic details from coarse-grained structures,” Journal of computational chemistry, vol. 31, no. 6, pp. 1333–1343, 2010. [24] T. A. Wassenaar, K. Pluhackova, R. A. Bockmann, S. J. Marrink, and D. P. Tieleman, “Going backward: a flexible geometric ap- proach to reverse transformation from coarse grained to atomistic models,” Journal of chemical theory and computation, vol. 10, no. 2, pp. 676–690, 2014. [25] D. Poger, B. Caron, and A. E. Mark, “Validating lipid force fields against experimental data: Progress, challenges and perspectives,” Biochimica et Biophysica Acta (BBA)-Biomembranes, vol. 1858, no. 7, pp. 1556–1565, 2016. [26] W. Humphrey, A. Dalke, and K. Schulten, “VMD – visual molecu- lar dynamics,” Journal of molecular graphics, vol. 14, no. 1, pp. 33–38, 1996. [27] J. R. Perilla and K. Schulten, “Physical properties of the HIV- 1 capsid from all-atom molecular dynamics simulations,” Nature communications, vol. 8, no. 1, pp. 1–10, 2017. [28] B. Brugger,¨ B. Glass, P. Haberkant, I. Leibrecht, F. T. Wieland, and H.-G. Krausslich,¨ “The HIV lipidome: a raft with an unusual composition,” Proceedings of the National Academy of Sciences, vol. 103, no. 8, pp. 2641–2646, 2006. [29] M. Lorizate et al., “Comparative lipidomics analysis of HIV-1 particles and their producer cell membrane in different cell lines,” Cellular microbiology, vol. 15, no. 2, pp. 292–304, 2013. [30] S. J. Marrink, H. J. Risselada, S. Yefimov, D. P. Tieleman, and A. H. De Vries, “The MARTINI force field: coarse grained model for biomolecular simulations,” The journal of physical chemistry B, vol. 111, no. 27, pp. 7812–7824, 2007. [31] S. Pronk et al., “GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit,” Bioinformatics, vol. 29, no. 7, pp. 845–854, 2013. [32] F. Ogushi, R. Ishitsuka, T. Kobayashi, and Y. Sugita, “Rapid flip- flop motions of diacylglycerol and ceramide in phospholipid bilayers,” Chemical Physics Letters, vol. 522, pp. 96–102, 2012. [33] R.-X. Gu, S. Baoukina, and D. P. Tieleman, “Cholesterol flip- flop in heterogeneous membranes,” Journal of chemical theory and computation, vol. 15, no. 3, pp. 2064–2070, 2019. [34] A. J. Bryer, J. A. Hadden-Perilla, J. E. Stone, and J. R. Per- illa, “High-performance analysis of biomolecular containers to measure small-molecule transport, transbilayer lipid diffusion, and protein cavities,” Journal of chemical information and modeling, vol. 59, no. 10, pp. 4328–4338, 2019. [35] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. a. Swaminathan, and M. Karplus, “CHARMM: a program for macro- molecular energy, minimization, and dynamics calculations,” Jour- nal of computational chemistry, vol. 4, no. 2, pp. 187–217, 1983. [36] J. C. Phillips et al., “Scalable molecular dynamics with NAMD,” Journal of computational chemistry, vol. 26, no. 16, pp. 1781–1802, 2005. [37] L. Kale´ and S. Krishnan, “Parallel programming using c; wilson, gv; lu, p., eds,” 1996. [38] M. C. Payne, M. P. Teter, D. C. Allan, T. Arias, and a. J. Joannopou- los, “Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients,” Re- views of modern physics, vol. 64, no. 4, p. 1045, 1992. [39] J. B. Klauda et al., “Update of the CHARMM all-atom additive force field for lipids: validation on six lipid types,” The journal of physical chemistry B, vol. 114, no. 23, pp. 7830–7843, 2010.