Supplementary Online Materials

Title: AI-guided discovery of the invariant host response to viral pandemics

Authors: Debashis Sahoo1-3†*, Gajanan D. Katkar4*, Soni Khandelwal1, Mahdi Behroozikhah2, Amanraj Claire4, Vanessa Castillo4, Courtney Tindle4, MacKenzie Fuller4, Sahar Taheri2, Thomas F. Rogers5-6, Nathan Beutler5, Sydney I. Ramirez10, 11, Stephen A. Rawlings11, Victor Pretorius14, David M. Smith11, Dennis R. Burton5, 7-8, Laura E. Crotty Alexander9, Jason Duran15, Shane Crotty10, 11, Jennifer M. Dan10, 11, Soumita Das11† and Pradipta Ghosh4,13†

Affiliations: 1Department of Pediatrics, University of California San Diego. 2Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego. 3Moores Cancer Center, University of California San Diego. 4Department of Cellular and Molecular Medicine, University of California San Diego. 5Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA. 6Division of Infectious Diseases, Department of Medicine, University of California, San Diego, La Jolla, CA 92037, USA. 7IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA. 8Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA. 9Pulmonary Critical Care Section, Veterans Affairs (VA) San Diego Healthcare System, La Jolla, California; Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of California San Diego (UCSD), La Jolla, California 10Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA, USA. 11Department of Medicine, Division of Infectious Diseases and Global Public Health, University of California, San Diego (UCSD), La Jolla, CA, USA. 12Department of Pathology, University of California San Diego. 13Medicine, University of California San Diego. 14Department of Surgery, University of California San Diego. 15Division of Cardiology, Department of Internal Medicine, UC San Diego Medical Center, La Jolla 92037

*Equal contribution † Co-Corresponding

Corresponding authors: Debashis Sahoo, Ph.D.; Assistant Professor, Department of Pediatrics, University of California San Diego; 9500 Gilman Drive, MC 0703, Leichtag Building 132; La Jolla, CA 92093-0831. Phone: 858-246-1803: Fax: 858-246-0019: Email: [email protected]

Soumita Das, Ph.D.; Associate Professor, Department of Pathology, University of California San Diego; 9500 Gilman Drive, George E. Palade Bldg, Rm 256; La Jolla, CA 92093. Phone: 858-246-2062: Email: [email protected]

Pradipta Ghosh, M.D.; Professor, Departments of Medicine and Cell and Molecular Medicine, University of California San Diego; 9500 Gilman Drive (MC 0651), George E. Palade Bldg, Rm 232; La Jolla, CA 92093. Phone: 858-822-7633: Email: [email protected]

Supplementary Materials: Includes • Detailed Materials and Methods • Supplementary Text- n/a • Figures S1-S2 • Tables S1-5 • External Databases - None • References (1-20)

DETAILED SUPPLEMENTARY METHODS:

Data Collection and Annotation Publicly available microarray and RNASeq databases were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) website1-3. Gene expression summarization was performed by normalizing Affymetrix platforms by RMA (Robust Multichip Average)4, 5 and RNASeq platforms by computing TPM (Transcripts Per Million)6, 7 values whenever normalized data were not available in GEO. We used log2(TPM) if TPM > 1 and (TPM - 1) if TPM <= 1 as the final gene expression value for analyses.

Rapid autopsy procedure for tissue collection The lung specimens from the COVID-19 positive human subjects were collected at autopsy (the study was IRB exempt). All donations to this trial were obtained after telephone consent followed by written email confirmation with next of kin/power of attorney per California state law (no in-person visitation could be allowed into our COVID-19 ICU during the pandemic). The team members followed the CDC guidelines for COVID-19 and the autopsy procedures8, 9. Lung specimens were collected in 10% zinc-formalin and stored for 72 h before processing for histology. Patient characteristics are listed in supplementary Table S5. Autopsy 2 was a standard autopsy performed by anatomical pathology in the BSL3 autopsy suite. The patient expired and his family consented to autopsy. After 48 hours, lungs were removed and immersion fixed whole in 10% formalin for 48 hours and then processed further. Lungs were only partially fixed at this time (about 50% fixed in thicker segments) and were sectioned further into small 2-4 cm chunks and immersed in 10% formalin for further investigation. Autopsies 4 and 5 were collected from rapid postmortem lung biopsies. The procedure was performed in the Jacobs Medical Center ICU (all of the ICU rooms have a negative-pressure environment, with air exhausted through HEPA filters [Biosafety Level 3 (BSL3)] for isolation of SARS-CoV-2 virus). Biopsies were performed 2-4 hours after patient expiration. To reduce aerosolization of viral particles, the ventilator was shut off at least 1 hour after loss of pulse and before sample collection. Every team member had personal protective equipment in accordance with the University policies for procedures on patients with COVID-19 (N95 mask + surgical mask, hairnet, full face shield, surgical gowns, double surgical gloves, booties). Lung biopsies were obtained after L-thoracotomy in the 5th intercostal space by our cardiothoracic surgery team. Samples were taken from the left upper lobe (LUL) and left lower lobe (LLL) and then sectioned further.

COVID-19 donors Blood from COVID-19 donors was obtained either at a UC San Diego Health clinic under the approved IRB protocols of the University of California, San Diego (UCSD; 200236X) or at the La Jolla Institute, where donors were recruited under an approved IRB protocol (LJI; VD-214). COVID-19 donors were California residents who were either referred to the study by a health care provider or self-referred. Blood was collected in acid citrate dextrose (ACD) tubes (UCSD) or in EDTA tubes (LJI) and stored at room temperature prior to processing for plasma collection. Seropositivity against SARS-CoV-2 was confirmed by ELISA. At the time of enrollment, all COVID-19 donors provided informed consent to participate in the present and future studies. Patient characteristics are listed in supplementary Table S4.

Plasma isolation Whole blood was collected in heparin-coated blood bags (healthy unexposed donors) or in ACD tubes (COVID-19 donors) and centrifuged for 25 min at 1850 rpm to separate the cellular fraction and plasma. The plasma was then carefully removed from the cell pellet and stored at -80°C.

Animal Study Lung samples from 8-week-old Syrian hamsters were generated from experiments conducted exactly as in a previously published study10. We chose three different groups of samples: uninfected control, SARS-CoV-2 challenge after Den3 (antibody to dengue virus), and SARS-CoV-2 challenge after Anti-CoV2 (CC12.2; a potent SARS-CoV-2 neutralizing antibody)10.

Plasma IL15 cytokine ELISA Plasma obtained from COVID-19 patients was used to quantify IL15 cytokine using the ELISA MAX Deluxe set (Biolegend Cat. No. 435104) according to the manufacturer’s recommended protocol. The concentrations of IL15 cytokine were compared using Welch’s t-test. A p < 0.05 denoted statistical significance.

Immunohistochemistry COVID-19 samples were inactivated by storing in 10% formalin for 2 days and then transferred to zinc-formalin solution for another 3 days. The deactivated tissues were transferred to 70% ethanol and cassettes were prepared for tissue sectioning. The slides containing hamster and human lung tissue sections were deparaffinized in xylene (Sigma-Aldrich Inc., MO, USA; catalog# 534056) and rehydrated in graded alcohols to water. For IL15RA antigen retrieval, slides were immersed in Tris-EDTA buffer (pH 9.0) and boiled for 10 minutes at 100°C. For IL15 antigen retrieval, slides were immersed in Tris-EDTA-Tween 20 buffer (pH 9.0) and pressure cooked for 3 minutes. Endogenous peroxidase activity was blocked by incubation with 3% H2O2 for 10 minutes. To block non-specific binding, 2.5% goat serum (Vector Laboratories, Burlingame, USA; catalog# S-1012) was added. Tissues were then incubated with rabbit IL15RA polyclonal antibody (1:200 dilution; proteintech®, Rosemont, IL, USA; catalog# 16744-1-AP) for 1.5 hours and mouse IL15 monoclonal antibody (1:10 dilution; Santa Cruz Biotechnology, Inc., Dallas, TX, USA; catalog# sc-8437) at room temperature in a humidified chamber and then rinsed with TBS or PBS 3x, 5 minutes each. Sections were incubated with goat anti-rabbit (Vector Laboratories, Burlingame, USA; catalog# MP-7401) and goat anti-mouse (Vector Laboratories, Burlingame, USA; catalog# MP-7402) secondary antibodies for 30 minutes at room temperature and then washed with TBS or PBS 3x, 5 minutes each; incubated with DAB (Vector Laboratories, Burlingame, USA; catalog# SK-4105), counterstained with hematoxylin (Sigma-Aldrich Inc., MO, USA; catalog# MHS1), dehydrated in graded alcohols, cleared in xylene, and cover slipped. 
Epithelial and stromal components of the lung tissue were identified by staining duplicate slides in parallel with hematoxylin and eosin (Sigma-Aldrich Inc., MO, USA; catalog# E4009) and visualizing by Leica DM1000 LED (Leica Microsystems, Germany).

IHC Quantification IHC images were randomly sampled at different 300x300 pixel regions of interest (ROI). The ROIs were analyzed using IHC Profiler11. IHC Profiler uses a spectral deconvolution method of DAB/hematoxylin color spectra by using optimized optical density vectors of the color deconvolution plugin for proper separation of the DAB color spectra. The histogram of the DAB intensity was divided into 4 zones: high positive (0 to 60), positive (61 to 120), low positive (121 to 180) and negative (181 to 235). High positive, positive, and low positive percentages were combined to compute the final percentage positive for each ROI. The range of values for the percent positive was compared among different experimental groups. IL15 staining showed many ROIs with a low final percent positive score. We removed this background noise by focusing only on ROIs with a percent positive score greater than 20%.
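The zone-based scoring described above can be illustrated with a minimal Python sketch. This is not the IHC Profiler implementation; the function names `percent_positive` and `filter_rois` are hypothetical, the input is assumed to be a flat list of integer 8-bit DAB intensities for one ROI, and skipping intensities above 235 is an assumption, since the text does not say how those pixels are handled.

```python
# Intensity zones quoted in the text (DAB channel; lower values = darker stain).
ZONES = (
    ("high_positive", 0, 60),
    ("positive", 61, 120),
    ("low_positive", 121, 180),
    ("negative", 181, 235),
)


def percent_positive(dab_pixels):
    """Return the percentage of scored pixels that fall in the three positive zones."""
    counts = {name: 0 for name, _, _ in ZONES}
    for value in dab_pixels:
        for name, low, high in ZONES:
            if low <= value <= high:
                counts[name] += 1
                break
        # Assumption: integer intensities above 235 match no zone and are skipped.
    scored = sum(counts.values())
    if scored == 0:
        return 0.0
    return 100.0 * (scored - counts["negative"]) / scored


def filter_rois(roi_scores, min_percent=20.0):
    """Keep only ROIs whose percent positive exceeds the cutoff (20% for IL15)."""
    return [score for score in roi_scores if score > min_percent]
```

For example, an ROI with one pixel in each of the four zones scores 75% positive, and a list of ROI scores such as `[5.0, 20.0, 35.5]` is reduced to `[35.5]` by the 20% cutoff.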

RNA sequencing RNA sequencing libraries were generated using the Illumina TruSeq Stranded Total RNA Library Prep Gold with TruSeq Unique Dual Indexes (Illumina, San Diego, CA). Samples were processed following the manufacturer’s instructions, except that the RNA shear time was modified to five minutes. Resulting libraries were multiplexed and sequenced with 100 basepair (bp) Paired End (PE100) to a depth of approximately 25-40 million reads per sample on an Illumina NovaSeq 6000 by the Institute of Genomic Medicine (IGM) at the University of California San Diego. Samples were demultiplexed using bcl2fastq v2.20 Conversion Software (Illumina, San Diego, CA). RNASeq data were processed using kallisto (version 0.45.0), Mesocricetus auratus genome (MesAur1.0) and GRCh38 Ensembl version 94 annotation (Homo_sapiens GRCh38.94 chr_patch_hapl_scaff.gtf). Gene-level TPM values and gene annotations were computed using the tximport and biomaRt R packages. A custom python script was used to organize the data and log-transform the values using log2(TPM) if TPM > 1 and TPM - 1 if TPM <= 1. For the hamster study, the kallisto index was prepared on Mesocricetus_auratus.MesAur1.0.ncrna.fa.gz + Mesocricetus_auratus MesAur1.0 cdna.all.fa.gz. The raw data and processed data are deposited in Gene Expression Omnibus under accession no GSE157058 (Hamster) and GSE157059 (Ileum).
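The piecewise log transform of TPM values stated above can be written in a few lines. The sketch below is not the authors' custom script; the function name `log_reduce` is hypothetical. It shows that the two branches meet at TPM = 1, where both give 0, so the transform is continuous and maps TPM = 0 to -1.

```python
import math


def log_reduce(tpm):
    """Piecewise transform from the text: log2(TPM) if TPM > 1, else TPM - 1."""
    if tpm > 1:
        return math.log2(tpm)
    return tpm - 1
```

Applied to TPM values of 8, 1, 0.5 and 0, this gives 3.0, 0, -0.5 and -1 respectively.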

StepMiner Analysis

StepMiner is a computational tool that identifies step-wise transitions in a time-series data.12 StepMiner performs an adaptive regression scheme to identify the best possible step up or down based on sum-of-square errors. The steps are placed between time points at the sharpest change between low expression and high expression levels, which gives insight into the timing of the gene expression-switching event. To fit a step function, the algorithm evaluates all possible step positions, and for each position, it computes the average of the values on both sides of the step for the constant segments. An adaptive regression scheme is used that chooses the step positions that minimize the square error with the fitted data. Finally, a regression test statistic is computed as follows:

$$F_{stat} = \frac{\sum_{i=1}^{n} (\hat{X}_i - \bar{X})^2 / (m - 1)}{\sum_{i=1}^{n} (X_i - \hat{X}_i)^2 / (n - m)}$$

Where $X_i$ for $i = 1$ to $n$ are the values, $\hat{X}_i$ for $i = 1$ to $n$ are fitted values. m is the degrees of freedom used for the adaptive regression analysis. $\bar{X}$ is the average of all the values: $\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$. For a step position at k, the fitted values $\hat{X}_i$ are computed by using $\frac{1}{k} \sum_{j=1}^{k} X_j$ for $i = 1$ to $k$ and $\frac{1}{n-k} \sum_{j=k+1}^{n} X_j$ for $i = k + 1$ to $n$.

Boolean Analysis

Boolean logic is a simple mathematical relationship of two values, i.e., high/low, 1/0, or positive/negative. The Boolean analysis of gene expression data requires the conversion of expression levels into two possible values. The StepMiner algorithm is reused to perform Boolean analysis of gene expression data.13 The Boolean analysis is a statistical approach which creates binary logical inferences that explain the relationships between phenomena. Boolean analysis is performed to determine the relationship between the expression levels of pairs of genes. The StepMiner algorithm is applied to gene expression levels to convert them into Boolean values (high and low). In this algorithm, first the expression values are sorted from low to high and a rising step function is fitted to the series to identify the threshold. The middle of the step is used as the StepMiner threshold. This threshold is used to convert gene expression values into Boolean values. A noise margin of 2-fold change is applied around the threshold to determine intermediate values, and these values are ignored during Boolean analysis. In a scatter plot, there are four possible quadrants based on Boolean values: (low, low), (low, high), (high, low), (high, high). A Boolean implication relationship is observed if any one of the four possible quadrants or two diagonally opposite quadrants are sparsely populated. Based on this rule, there are six kinds of Boolean implication relationships. Two of them are symmetric: equivalent (corresponding to the positively correlated genes), opposite (corresponding to the highly negatively correlated genes). Four of the Boolean relationships are asymmetric and each corresponds to one sparse quadrant: (low => low), (high => low), (low => high), (high => high). BooleanNet statistics (Figure 2A) is used to assess the sparsity of a quadrant and the significance of the Boolean implication relationships13, 14. 
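As an illustration of the thresholding step, the following Python sketch fits a single rising step to sorted expression values, reports the regression F statistic, and converts values to Boolean calls. It is a simplified stand-in and not the StepMiner software: the names `fit_rising_step` and `to_boolean` are hypothetical, the default of m = 3 degrees of freedom for a one-step fit is an assumption, the "middle of the step" is taken as the midpoint of the two segment means, and the noise margin is taken as +/- 0.5 on the log2 scale.

```python
def fit_rising_step(values, m=3):
    """Fit one step to sorted values by least squares.

    Returns the threshold (midpoint of the two segment means), the step
    position k, and the regression F statistic. m = 3 is an assumed
    degrees-of-freedom value for a one-step fit.
    """
    x = sorted(values)
    n = len(x)
    if n <= m:
        raise ValueError("need more than m values to fit a step")
    mean_all = sum(x) / n
    best = None
    for k in range(1, n):  # try every step position
        left = sum(x[:k]) / k
        right = sum(x[k:]) / (n - k)
        sse = (sum((v - left) ** 2 for v in x[:k])
               + sum((v - right) ** 2 for v in x[k:]))
        if best is None or sse < best[0]:
            best = (sse, k, left, right)
    sse, k, left, right = best
    # Regression sum of squares: fitted values versus the overall mean.
    ssr = k * (left - mean_all) ** 2 + (n - k) * (right - mean_all) ** 2
    if sse == 0:
        f_stat = float("inf") if ssr > 0 else 0.0
    else:
        f_stat = (ssr / (m - 1)) / (sse / (n - m))
    return {"threshold": (left + right) / 2, "k": k, "f_stat": f_stat}


def to_boolean(value, threshold, margin=0.5):
    """Return 'high', 'low', or None for intermediate values inside the margin."""
    if value >= threshold + margin:
        return "high"
    if value <= threshold - margin:
        return "low"
    return None
```

Values that return `None` correspond to the intermediate values that are ignored during Boolean analysis.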
Given a pair of genes A and B, four quadrants are identified by using the StepMiner thresholds on A and B by ignoring the Intermediate values defined by the noise margin of 2 fold change (+/- 0.5 around StepMiner threshold). Number of samples in each quadrant are defined as a00, a01, a10, and a11 (Figure 1A) which is different from X in the previous equation of F stat. Total number of samples where gene expression values for A and B are low is computed using the following equations.

$$nA_{low} = (a_{00} + a_{01}), \quad nB_{low} = (a_{00} + a_{10})$$

The total number of samples considered is computed using the following equation.

$$total = a_{00} + a_{01} + a_{10} + a_{11}$$

The expected number of samples in each quadrant is computed by assuming independence between A and B. For example, the expected number of samples in the bottom-left quadrant, $e_{00} = \hat{n}$, is computed as the probability of A low ((a00 + a01)/total) multiplied by the probability of B low ((a00 + a10)/total) multiplied by the total number of samples. The following equation is used to compute the expected number of samples.

$$n = a_{ij}, \quad \hat{n} = (nA_{low}/total \cdot nB_{low}/total) \cdot total$$

To check whether a quadrant is sparse, a statistical test for ($e_{00} > a_{00}$) or ($\hat{n} > n$) is performed by computing S00 and p00 using the following equations. A quadrant is considered sparse if S00 is high ($\hat{n} > n$) and p00 is small.

$$S_{ij} = \frac{\hat{n} - n}{\sqrt{\hat{n}}}$$

$$p_{00} = \frac{1}{2} \left( \frac{a_{00}}{a_{00} + a_{01}} + \frac{a_{00}}{a_{00} + a_{10}} \right)$$

A suitable threshold is chosen for S00 > sThr and p00 < pThr to check for a sparse quadrant. A Boolean implication relationship is identified when a sparse quadrant is discovered using the following equation.

Boolean Implication = (Sij > sThr, pij < pThr)

A relationship is called Boolean equivalent if top-left and bottom-right quadrants are sparse.

$$Equivalent = (S_{01} > sThr,\ p_{01} < pThr,\ S_{10} > sThr,\ p_{10} < pThr)$$

Boolean opposite relationships have sparse top-right (a11) and bottom-left (a00) quadrants.

$$Opposite = (S_{00} > sThr,\ p_{00} < pThr,\ S_{11} > sThr,\ p_{11} < pThr)$$

Boolean equivalent and opposite are symmetric relationships because the relationship from A to B is the same as from B to A. An asymmetric relationship forms when only one quadrant is sparse (A low => B low: top-left; A low => B high: bottom-left; A high => B high: bottom-right; A high => B low: top-right). These relationships are asymmetric because the relationship from A to B is different from B to A. For example, A low => B low and B low => A low are two different relationships.

A low => B high is discovered if the bottom-left (a00) quadrant is sparse and this relationship satisfies following conditions.

$$A\ low \Rightarrow B\ high = (S_{00} > sThr,\ p_{00} < pThr)$$

Similarly, A low => B low is identified if the top-left (a01) quadrant is sparse.

$$A\ low \Rightarrow B\ low = (S_{01} > sThr,\ p_{01} < pThr)$$

A high => B high Boolean implication is established if the bottom-right (a10) quadrant is sparse as described below.

$$A\ high \Rightarrow B\ high = (S_{10} > sThr,\ p_{10} < pThr)$$

Boolean implication A high => B low is found if the top-right (a11) quadrant is sparse using the following equation.

$$A\ high \Rightarrow B\ low = (S_{11} > sThr,\ p_{11} < pThr)$$

For each quadrant a statistic Sij and an error rate pij are computed. Sij > sThr and pij < pThr are the thresholds used on the BooleanNet statistics to identify Boolean implication relationships.

Boolean analyses in the test dataset GSE47963 use thresholds of sThr = 5 and pThr = 0.05. Boolean analysis on the large normal lung dataset GSE23546 uses thresholds of sThr = 6 and pThr = 0.1. These thresholds are more stringent than the previously used thresholds of sThr = 3 and pThr = 0.1 for BooleanNet13, 15, 16, in order to focus on the strong candidates.
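The sparsity test and the six relationship types can be sketched in Python as follows. This is an illustrative reading of the statistics above and not the BooleanNet software: the function names are hypothetical, quadrants are keyed as (A state, B state) with 0 = low and 1 = high, and the error rate for quadrants other than (0, 0) is obtained by applying the p00 formula symmetrically, which is an assumption. The defaults use the previously published thresholds sThr = 3 and pThr = 0.1.

```python
import math


def quadrant_stats(counts, i, j):
    """Return (S_ij, p_ij) for quadrant (i, j); i is the state of A, j of B."""
    total = sum(counts.values())
    n_a = counts[(i, 0)] + counts[(i, 1)]  # samples with A in state i
    n_b = counts[(0, j)] + counts[(1, j)]  # samples with B in state j
    observed = counts[(i, j)]
    if total == 0 or n_a == 0 or n_b == 0:
        return 0.0, 1.0  # no evidence for sparsity
    expected = (n_a / total) * (n_b / total) * total
    s = (expected - observed) / math.sqrt(expected)
    # Assumption: the p00 formula generalized by symmetry to every quadrant.
    p = 0.5 * (observed / n_a + observed / n_b)
    return s, p


ASYMMETRIC = {
    (0, 0): "A low => B high",   # bottom-left sparse
    (0, 1): "A low => B low",    # top-left sparse
    (1, 0): "A high => B high",  # bottom-right sparse
    (1, 1): "A high => B low",   # top-right sparse
}


def classify(counts, s_thr=3.0, p_thr=0.1):
    """Name the Boolean relationship implied by the sparse quadrants, if any."""
    sparse = set()
    for quadrant in counts:
        s, p = quadrant_stats(counts, *quadrant)
        if s > s_thr and p < p_thr:
            sparse.add(quadrant)
    if sparse == {(0, 1), (1, 0)}:
        return "equivalent"
    if sparse == {(0, 0), (1, 1)}:
        return "opposite"
    if len(sparse) == 1:
        return ASYMMETRIC[next(iter(sparse))]
    return None
```

With 50 samples in (low, low), 50 in (high, high) and none elsewhere, both off-diagonal quadrants have S = 5 and p = 0, so the pair is called equivalent.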

BECC (Boolean Equivalent Correlated Clusters) Analysis

BECC analysis16 is based on Boolean Equivalent relationships, pair-wise correlation and linear regression analysis. BECC analysis begins with a seed gene. We used ACE2 as a seed gene in this paper. BECC analysis identified a set of genes Boolean Equivalent to ACE2 and Boolean Opposite to ACE2 using the BooleanNet statistic described above.

The BECC algorithm identified 367 genes ‘Boolean Equivalent’ and 163 genes ‘Boolean Opposite’ to the ACE2 gene. Reactome pathway analyses on both clusters showed that the 367-gene ACE2-equivalent cluster was enriched in viral response pathways and processes, whereas the 163-gene ACE2-opposite cluster represented housekeeping processes, implying that ACE2 and its related genes are drivers of host response in the setting of viral infections. These clusters were subsequently filtered using differential analysis on another dataset [GSE113211 (n = 118); Fig 2B] that profiled heterogeneous immunophenotypes of children with viral bronchiolitis (confirmed positive for the virus in ~100% patients; of which 25% were infected with Influenza/Para-Influenza and 14.8% with human CoV). Transcriptomes were analyzed in nasal mucosal scrapings (NMS) and PBMC samples taken during an acute visit (AV) and during a subsequent visit at convalescence (CV)17. Of the 367 ACE2-equivalent genes, 166 genes (Table S1; 1-1) retained the “Boolean Equivalent” relationship with ACE2 and their expression was downregulated during the convalescence visit. Of the 163 ACE2-opposite genes, 26 genes (Table S1; 2-1) retained “Boolean Opposite” relationships with ACE2 and their expression was upregulated during the convalescence visit. All subsequent analyses were performed using the 166-gene signature that had a Boolean Equivalent relationship with ACE2 and that was downregulated during a convalescent visit after acute viral bronchiolitis. A gene signature score, computed from the 166 genes that were equivalent to ACE2, is used to order the samples. To compute the ViP signature, first the genes present in this list were normalized according to a modified Z-score approach centered around the StepMiner threshold (formula = (expr - SThr)/(3*stddev)). The normalized expression values for every probeset for the 166 genes were added together to create the final ViP signature. The samples were ordered based on the final ViP signature. To compute the severe ViP signature, the 166 genes were first ordered using a t-test between the mild vs severe cases in the GSE101702 dataset, and the top 20 genes (Table S1; 3-1) were selected from this list.
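The composite score described above can be sketched as follows. This is a minimal illustration and not the authors' code: `signature_score` is a hypothetical name, the StepMiner thresholds are passed in rather than computed, the formula is read as (expr - SThr) / (3 * stddev), and the use of the population standard deviation is an assumption because the text does not say which estimator was used.

```python
import statistics


def signature_score(expr_by_gene, thresholds):
    """Sum modified Z-scores over genes to get one score per sample.

    expr_by_gene: dict mapping gene -> list of expression values (one per sample)
    thresholds:   dict mapping gene -> StepMiner threshold for that gene
    """
    n_samples = len(next(iter(expr_by_gene.values())))
    scores = [0.0] * n_samples
    for gene, values in expr_by_gene.items():
        sd = statistics.pstdev(values)  # assumption: population standard deviation
        if sd == 0:
            continue  # a constant gene carries no information
        thr = thresholds[gene]
        for idx, value in enumerate(values):
            scores[idx] += (value - thr) / (3 * sd)
    return scores


def order_samples(scores):
    """Return sample indices ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=scores.__getitem__)
```

Dividing by three standard deviations keeps each gene's contribution on a comparable scale before the per-gene values are added.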

Single Cell RNASeq data analysis

Single Cell RNASeq data from GSE145926 and GSE150728 were downloaded from Gene Expression Omnibus (GEO) in the HDF5 Feature Barcode Matrix Format. The filtered barcode data matrix was processed using the Seurat v3 R package18. Cell types were identified from relevant gene markers using the SCINA algorithm19: B cells (CD19, MS4A1, CD79A), T cells (CD3D, CD3E, CD3G), CD4 T cells (CCR7, CD4, IL7R, FOXP3, IL2RA), CD8 T cells (CD8A, CD8B), Natural killer cells (KLRF1), Macs Monos DCs (TYROBP, FCER1G), and Epithelial cells (SFTPA1, SFTPB, AGER, AQP4, SFTPC, SCGB3A2, KRT5, CYP2F1, CCDC153, TPPP3). Pseudo bulk datasets were prepared by adding counts from the different cell subtypes and normalizing using log(CPM+1).
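One way to read the pseudo bulk step is sketched below. It is an assumption-laden illustration, not the authors' pipeline: the function name `pseudo_bulk_log_cpm` is hypothetical, counts are summed over all cells that share a cell-type label, and the logarithm is taken as base 2 because the base is not stated in the text.

```python
import math
from collections import defaultdict


def pseudo_bulk_log_cpm(counts_per_cell, cell_types):
    """Sum raw counts per cell type, then apply log2(CPM + 1).

    counts_per_cell: list of dicts mapping gene -> raw count, one dict per cell
    cell_types:      list of cell-type labels, aligned with counts_per_cell
    """
    summed = defaultdict(lambda: defaultdict(float))
    for counts, cell_type in zip(counts_per_cell, cell_types):
        for gene, count in counts.items():
            summed[cell_type][gene] += count
    result = {}
    for cell_type, gene_counts in summed.items():
        library_size = sum(gene_counts.values())
        result[cell_type] = {}
        for gene, count in gene_counts.items():
            cpm = count / library_size * 1e6 if library_size else 0.0
            # Assumption: base-2 logarithm; the text only says log(CPM+1).
            result[cell_type][gene] = math.log2(cpm + 1)
    return result
```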

AI guided discovery of invariant host response

BECC requires the depth of the Boolean equivalent relationship as a parameter. For example, if ACE2 is Equivalent to X and X is Equivalent to Y but ACE2 is not necessarily equivalent to Y, the depth of X is 1 and the depth of Y is 2. The depth controls how much the list of genes that are Boolean equivalent to ACE2 is expanded. This list of genes is converted to a gene expression score based on the average of the normalized gene expression values as mentioned before. The strength of classification of uninfected and infected samples using this score is computed by the ROC-AUC measurement. We performed a regression to identify the best depth that predicts uninfected vs infected samples in the cohort GSE47963 (n = 438). We tested how well the gene expression score distinguishes uninfected and infected samples as they are annotated in many other independent datasets. Our confidence in the host response being invariant depends on this test passing in all properly annotated cohorts without exception.
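The ROC-AUC used to measure how well a score separates infected from uninfected samples can be computed directly from pairwise comparisons. The sketch below is a dependency-free illustration and not the authors' code; the function name `roc_auc` is hypothetical, and labels are assumed to be 1 for infected and 0 for uninfected.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (infected, uninfected) pairs ranked correctly.

    Ties between an infected and an uninfected score count as half.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 1.0 means every infected sample scores above every uninfected sample, and 0.5 means the score is no better than chance.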

Statistical Analyses

The gene signature is used to classify sample categories and the performance of the multi-class classification is measured by ROC-AUC (Receiver Operating Characteristics Area Under The Curve) values. A color-coded bar plot is combined with a density plot to visualize the gene signature-based classification. All statistical tests were performed using R version 3.2.3 (2015-12-10). Standard t-tests were performed using python scipy.stats.ttest_ind package (version 0.19.0) with Welch’s Two Sample t-test (unpaired, unequal variance (equal_var=False), and unequal sample size) parameters. Multiple hypothesis corrections were performed by adjusting p values with statsmodels.stats.multitest.multipletests (fdr_bh: Benjamini/Hochberg principles). The results were independently validated with R statistical software (R version 3.6.1; 2019-07-05). Pathway analyses of gene lists were carried out via the Reactome database and algorithm20. Reactome identifies signaling and metabolic molecules and organizes their relations into biological pathways and processes. Kaplan-Meier analysis is performed using lifelines python package version 0.14.6. Violin, Swarm and Bubble plots are created using python seaborn package version 0.10.1.
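To make the two statistical procedures concrete, the sketch below computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom and the Benjamini/Hochberg adjusted p values using only the Python standard library. It is a stand-in written for illustration, not the scipy or statsmodels code named above; the function names `welch_t` and `benjamini_hochberg` are hypothetical, and the conversion of the t statistic to a p value (done by scipy in the original analysis) is omitted.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance test."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1  # squared standard error of mean(a)
    v2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def benjamini_hochberg(p_values):
    """Return Benjamini/Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```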

SUPPLEMENTARY FIGURES AND LEGENDS

Figure S1. The ViP signature distinguishes infected from uninfected samples across multiple RNA and DNA viruses. (A) Bar plots showing the accuracy of the 166-gene ViP signature in classifying infected vs. uninfected samples. Samples are categorized based on the nature of genetic material in the viruses (RNA vs. DNA) and whether the dataset was generated by assessing infections in in vitro cell-based models or in infected humans (in vivo). ROC-AUC values for the classification of infected samples are shown on the right side of each bar plot. See also Table S2, which classifies these viruses based on their route of entry into cells, i.e., clathrin-dependent vs. independent endocytosis. (B) Bar (top) and violin (bottom) plots show that the ViP signature is equally effective in distinguishing RNA (R) and DNA (D) virus infections from uninfected controls (C) in vivo.

Figure S2. Reactome analysis of the 20-gene severe ViP signature. (A) Reactome pathway analysis of the 20-gene severe ViP signature and visualization based on Voronoi tessellation. Cell cycle, Immune System, Disease and DNA Damage components are highlighted.

SUPPLEMENTARY TABLES INDEX:

Table S1: Gene clusters that constitute the ViP signature and their reactome analyses

Table S2: Classification of host response to viral infection using the ViP signature

Table S3: Table of 20 genes that define “severity” within the 166-gene ViP signature

Table S4. Demographics of the UCSD COVID-19 cohort participants for plasma.

Table S5. Demographics of the UCSD COVID-19 cohort participants for lung tissue.

Table S1: 1-1: List of genes that are equivalent to ACE2 that are elevated in acute infection and reduced in convalescent SARS-CoV infection (ViP signature) [n = 166] ABCD1 ACE2 ADAMTSL3 ADAR ADPRHL2 APOBEC3B APOL4 APOL6 AZI2 B2M BCL2L14 BLZF1 BRIP1 C19orf66 C21orf91 C3orf38 C6orf62 CARD16 CARD17 CASP1 CCNA1 CD274 CHMP5 CLIC4 CNP COL16A1 CSAG1 CSAG2 CXCL16 CYP21A2 CYP2J2 DCLRE1C DHX58 DTX3L ELOVL7 ETV7 FAM26F FANCA FAS FBXO6 FGD6 FRMD3 GBP1 GBP3 GBP4 GCA GCH1 GLRX GMPR GPT2 GTPBP1 HDX HESX1 HIST1H2AD HIST1H2AJ HIST2H2AA4 HIST2H2AB HLA-A HLA-B HLA-C HLA-E HLA-F HLA-G HLA-H HRASLS2 HSH2D HSPB9 HTR2B IDO1 IFI16 IFI27L1 IFI35 IFNE IGFBP4 IL15 IL15RA IL4I1 ISG20 KCNE4 KLHDC7B LAP3 LGALS3BP LGALS9 LGALS9C LMO2 LOX LPAR6 LYSMD2 MASTL MMAA MOV10 MUC13 MYD88 MYH7 N4BP1 NAT8 NCOA7 NMI NRG2 NUB1 NUPR1 OPTN PANX1 PARP12 PARP14 PARP9 PDCD1LG2 PHF11 PLSCR1 PML PNPT1 PPP2R2A PSMA6 PSMB8 PSME2 PTAFR PTPRR RBCK1 RBMS2 RNF114 RNF19B RTP4 S1PR2 SAMD9 SCARB2 SCO2 SECTM1 SHISA5 SLC15A3 SLC16A1 SLC25A28 SLC8A2 SP100 SPATS2L SQRDL STARD5 STAT2 TAP1 TDRD7 TLR2 TLR3 TMEM140 TMEM62 TMEM92 TNFSF10 TNFSF13 TOR1B TRAFD1 TREX1 TRIM21 TRIM22 TRIM25 TRIM26 TRIM69 TYMP UBA6 UBE2L6 UNC93B1 VAMP5 WARS WDFY1 ZBP1 ZFYVE26 ZNF618 ZNF620 ZNFX1

1-2: Reactome analysis ACE2 equivalent genes that are elevated in acute infection and reduced in convalescent SARS-CoV infection [n = 166]

Name pValue FDR

ER-Phagosome pathway 1.1102230246251565e-16 5.773159728050814e-15

Antigen Presentation: Folding, assembly and peptide loading of class I MHC 1.1102230246251565e-16 5.773159728050814e-15

Interferon alpha/beta signaling 1.1102230246251565e-16 5.773159728050814e-15

Endosomal/Vacuolar pathway 1.1102230246251565e-16 5.773159728050814e-15

Class I MHC mediated antigen processing & presentation 1.1102230246251565e-16 5.773159728050814e-15

Antigen processing-Cross presentation 1.1102230246251565e-16 5.773159728050814e-15

Interferon Signaling 1.1102230246251565e-16 5.773159728050814e-15

Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell 1.1102230246251565e-16 5.773159728050814e-15

Adaptive Immune System 1.1102230246251565e-16 5.773159728050814e-15

Immune System 1.1102230246251565e-16 5.773159728050814e-15

Interferon gamma signaling 1.1102230246251565e-16 5.773159728050814e-15

Cytokine Signaling in Immune system 1.1102230246251565e-16 5.773159728050814e-15

Formation of editosomes by ADAR 0.0007332592542007577 0.03111717347600229

UNC93B1 deficiency - HSE 0.0007332592542007577 0.03111717347600229

UCH proteinases 0.0007589554506342022 0.03111717347600229

Nef mediated downregulation of MHC class I complex cell surface expression 0.0013748418863652745 0.053618833568245705

mRNA Editing 0.0017596398496316779 0.0633470345867404

mRNA Editing: A to I Conversion 0.002858899412564564 0.09434368061463061

C6 deamination of adenosine 0.002858899412564564 0.09434368061463061

Metalloprotease DUBs 0.0037503329544381625 0.10925191811698953

Antigen processing: Ubiquitination & Proteasome degradation 0.0038640552673664397 0.10925191811698953

DAP12 interactions 0.004046367337666279 0.10925191811698953

RMTs methylate arginines 0.004046367337666279 0.10925191811698953

Cytosolic sensors of pathogen-associated DNA 0.004271400865153652 0.11105642249399494

Diseases of Immune System 0.004638186457844418 0.11131647498826602

Diseases associated with the TLR signaling cascade 0.004638186457844418 0.11131647498826602

Table S1: 2-1: List of genes that are opposite to ACE2 that are reduced in acute infection and elevated in convalescent SARS-CoV infection [n = 26] ABCC5 ABLIM1 AHNAK C17orf89 CDCA7L CEP68 EIF3L EIF4B EPHA4 FAM172A FITM2 GDF11 GLS GSK3B KLHL36 NDUFA10 NRBP2 OIP5-AS1 PCBD2 RPL15 RPL27A RPL3 SLC16A7 SLC35E2 VIPR1 VPS13A

2-2: Reactome analysis for genes opposite to ACE2 that are reduced in acute infection and elevated in convalescent SARS-CoV infection [n = 26]

Name pValue FDR

GTP hydrolysis and joining of the 60S ribosomal subunit 3.514097013379569e-9 1.9431648823342584e-7

L13a-mediated translational silencing of Ceruloplasmin expression 3.514097013379569e-9 1.9431648823342584e-7

Eukaryotic Translation Initiation 6.072390257294558e-9 1.9431648823342584e-7

Cap-dependent Translation Initiation 6.072390257294558e-9 1.9431648823342584e-7

Formation of a pool of free 40S subunits 6.240078331831711e-8 0.0000015600195829579278

Peptide chain elongation 0.0000013850692115457974 0.00002829669938364532

Nonsense Mediated Decay (NMD) independent of the Exon Junction Complex (EJC) 0.0000016858595985880243 0.00002829669938364532

Eukaryotic Translation Elongation 0.0000017685437114778324 0.00002829669938364532

Eukaryotic Translation Termination 0.000002131796673188191 0.000029845153424634674

Selenocysteine synthesis 0.0000029067716003083888 0.00003164524306820127

Viral mRNA Translation 0.0000030334945886334452 0.00003164524306820127

Response of EIF2AK4 (GCN2) to amino acid deficiency 0.0000031645243068201268 0.00003164524306820127

SRP-dependent cotranslational protein targeting to membrane 0.0000037338339450299074 0.000033604505505269167

Translation 0.0000037640053116572147 0.00003387604780491493

Nonsense-Mediated Decay (NMD) 0.000004555432106423396 0.00003644345685138717

Nonsense Mediated Decay (NMD) enhanced by the Exon Junction Complex (EJC) 0.000004555432106423396 0.00003644345685138717

Axon guidance 0.000013776459824810239 0.00009643521877367167

Nervous system development 0.000021198763962448908 0.00014261250385816382

Influenza Viral RNA Transcription and Replication 0.000023768750643027303 0.00014261250385816382

Selenoamino acid metabolism 0.000027900157407834136 0.00016740094444700482

Regulation of expression of SLITs and ROBOs 0.00002939465523721374 0.0001712730921954453

Major pathway of rRNA processing in the nucleolus and cytosol 0.00003425461843908906 0.0001712730921954453

Influenza Infection 0.00004371898058419532 0.0002185949029209766

rRNA processing in the nucleus and cytosol 0.00005265641581997382 0.0002632820790998691

Signaling by ROBO receptors 0.00009552483029040548 0.0004641491015906496

rRNA processing 0.0001160372753976624 0.0004641491015906496

Metabolism of amino acids and derivatives 0.0017249564294193886 0.006899825717677555

Cellular responses to stress 0.002127603149398083 0.008510412597592332

Developmental Biology 0.002376902588972607 0.009507610355890428

Cellular responses to external stimuli 0.0024219649238000907 0.009687859695200363

Table S1: 3-1: 20-gene severe ViP signature [n = 20]

CLIC4 CARD16 CARD17 HIST2H2AA4 HIST1H2AJ SQRDL HIST1H2AD C21orf91

B2M HIST2H2AB HRASLS2 GCA CCNA1 CASP1 GCH1 TRIM25

TMEM92 IFI27L1 LOX ELOVL7

3-2: Reactome analysis for severe ViP signature genes [n = 20]

Name pValue FDR

RMTs methylate histone arginines 2.92E-09 6.66E-07

Metalloprotease DUBs 1.37E-08 1.567E-06

HDACs deacetylate 3.86E-07 2.934E-05

DNA Damage/Telomere Stress Induced Senescence 6.93E-07 3.947E-05

Packaging Of Telomere Ends 1.32E-06 4.987E-05

RNA Polymerase I Promoter Opening 1.32E-06 4.987E-05

DNA methylation 1.654E-06 4.987E-05

Amyloid fiber formation 1.969E-06 4.987E-05

Senescence-Associated Secretory Phenotype (SASP) 2.196E-06 4.987E-05

Recognition and association of DNA glycosylase with site containing an affected purine 2.267E-06 4.987E-05

Recognition and association of DNA glycosylase with site containing an affected pyrimidine 3.034E-06 5.638E-05

UCH proteinases 3.317E-06 5.638E-05

PRC2 methylates histones and DNA 3.642E-06 5.638E-05

SIRT1 negatively regulates rRNA expression 3.978E-06 5.638E-05

Cleavage of the damaged purine 3.978E-06 5.638E-05

Deubiquitination 4.068E-06 5.638E-05

Depurination 4.337E-06 5.638E-05

ERCC6 (CSB) and EHMT2 (G9a) positively regulate rRNA expression 5.124E-06 6.149E-05

Activated PKN1 stimulates transcription of AR (androgen receptor) regulated genes KLK2 and KLK3 5.556E-06 6.372E-05

HATs acetylate histones 5.793E-06 6.372E-05

Cleavage of the damaged pyrimidine 6.498E-06 6.498E-05

Depyrimidination 6.498E-06 6.498E-05

Ub-specific processing proteases 7.814E-06 6.986E-05

Nucleosome assembly 8.127E-06 6.986E-05

Deposition of new CENPA-containing nucleosomes at the centromere 8.127E-06 6.986E-05

Condensation of Prophase 8.732E-06 6.986E-05

Meiotic recombination 1.075E-05 8.596E-05

B-WICH complex positively regulates rRNA expression 1.394E-05 9.464E-05

Meiotic synapsis 1.394E-05 9.464E-05

Base-Excision Repair, AP Site Formation 1.394E-05 9.464E-05
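
The FDR column in the table above is the kind of value a Benjamini-Hochberg correction produces. The table does not name the correction method, so the following is only a minimal sketch under that assumption; the function name and the toy p-values are hypothetical. The listed rows alone would not reproduce the FDR values shown, because the adjustment depends on the total number of pathways tested, not only on the top entries.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this is an illustration of the generic procedure, not the
    exact computation behind the FDR column in the table.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, not taken from the table.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_p)
for p, q in zip(toy_p, toy_fdr):
    print(f"p = {p:.3g}  adjusted = {q:.3g}")
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, then capped so that the adjusted values stay monotone.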

Table S2. Virus Infection Datasets Used in this Study

Virus | Type (RNA/DNA) | Portal of Entry (CME-dependent or independent) | GSE ID | In vitro/in vivo | Cell type | ROC AUC
HIV | ss RNA (+) | CME-dependent | GSE125817 | in vitro | human monocyte derived dendritic cells (moDCs) | 1
Zika | ss RNA (+) | CME-dependent | GSE146423 | in vitro | A549 cells (Adenocarcinomic human alveolar basal epithelial cells) | 1
WNV | ss RNA (+) | CME-dependent | GSE138841 | in vitro | A549 cells (Adenocarcinomic human alveolar basal epithelial cells) | 1
WNV | ss RNA (+) | CME-dependent | GSE136342 | in vitro | Human monocyte derived DCs (moDCs) | 1
CoV (MERS) | ss RNA (+) | CME-dependent | GSE56677 | in vitro | Calu-3 2B4 cells; Human Coronavirus EMC 2012 (HCoV-EMC) | 1
CoV (HCoV-EMC) | ss RNA (+) | CME-dependent | GSE45042 | in vitro | Calu-3 2B4 cells were infected with Human Coronavirus EMC 2012 (HCoV-EMC) | 1
CoV | ss RNA (+) | CME-dependent | GSE17400 | in vitro | Calu3 cells with SARS-CoV1 and Dhori virus (DHOV), a member of the Orthomyxoviridae family within the Thogotovirus genus | 1
CoV | ss RNA (+) | CME-dependent | GSE30589 | in vitro | Vero E6 cells SARS-CoV-1 | 1
H1N1 | ss RNA (-) | CME-dependent | GSE47963 | in vitro | HAE cultures with H1N1 Inf virus | 1
Inf A/B | ss RNA (-) | CME-dependent | GSE68310 | in vivo (human) | Peripheral blood cells | 0.97
CoV (SARS-CoV-1) | ss RNA (+) | CME-dependent | GSE33267 | in vitro | Calu-3 cells were infected with either icSARS CoV or the icSARS deltaORF6 | 0.97
EnV HHV-6 | ds DNA | CME-independent | GSE40396 | in vivo (human) | Peripheral blood cells | 0.95
CoV (SARS-CoV-1) | ss RNA (+) | CME-dependent | GSE37827 | in vitro | Calu-3 cells were infected with either icSARS CoV or the cSARS Bat SRBD | 0.94
HIV | ss RNA (+) | CME-dependent | GSE140713 | in vivo (human) | PBMCs | 0.93
HCV | ss RNA (+) | CME-dependent | GSE40184 | in vivo (human) | PBMCs | 0.93
HSV | ds DNA | CME-independent | GSE46042 | in vitro | Pluripotent stem cell derived neuronal lineages | 0.89
Ebola | ss RNA (-) | CME-dependent (and Macropinocytosis) | GSE130629 | in vivo (mouse) | Liver and spleen | 0.88
HAV | ss RNA (+) | CME-dependent | GSE40396 | in vivo (human) | Peripheral blood cells | 0.87
CMV | ds DNA | CME-independent | GSE81246 | in vivo (human) | PBMCs | 0.86
SARS-CoV-2 | ss RNA (+) | CME-dependent | GSE147507 | in vitro | Primary human lung epithelium (NHBE) and transformed lung alveolar (A549) cells were mock treated or infected with SARS-CoV-2 (USA-WA1/2020) | 0.86
HHV6 | ds DNA | CME-independent | GSE40396 | in vivo (human) | Peripheral blood cells | 0.85
HEV | ss RNA (+) | CME-dependent | GSE36539 | in vivo (human) | Peripheral blood cells | 0.85
VZV | ds DNA | CME-independent | GSE136586 | in vitro | Primary keratinocytes | 0.81
SARS-CoV1 | ss RNA (+) | CME-dependent | GSE47963 | in vitro | HAE cultures with SARS-CoV1 | 0.81
VZV | ds DNA | CME-independent | GSE54385 | in vitro | human dermal fibroblasts and neuron | 0.75
HBV | ds DNA | CME-dependent | GSE135501 | in vivo (human) | Peripheral blood cells | 0.71
AdV | ss DNA | CME-dependent | GSE40396 | in vivo (human) | | 0.71
HRV | ss RNA (+) | CME-dependent | GSE40396 | in vivo (human) | | 0.7
CMV | ds DNA | CME-independent | GSE99454 | in vitro | MRC-5 and ARPE-19 cells | 0.65
EBV | ds DNA | CME-dependent | GSE135644 | in vitro | AGS and AGS-EBV cells | 0.64
Measles | ss RNA (-) | CME-independent (Caveolin-dependent) | GSE5808 | in vivo | PBMC | 0.53
HPV | ds DNA | CME-independent | GSE137965 | in vitro | Primary keratinocytes | 0.51
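
The ROC AUC column summarizes how well a score separates infected from control samples in each dataset. A minimal sketch of the metric itself follows, using the pairwise (Mann-Whitney) formulation: the AUC is the fraction of infected/control pairs in which the infected sample has the higher score, with ties counted as half. The function name, scores and labels are hypothetical, and the scoring procedure used in the study is not reproduced here.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative.

    scores: one numeric score per sample (hypothetical composite score).
    labels: 1 for infected, 0 for control.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five samples (not data from any GSE series).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(f"ROC AUC = {toy_auc:.3f}")
```

A value of 1 means every infected sample scores above every control, and a value near 0.5 means the score does no better than chance at ranking them.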

Table S3. Table of 20 genes that define “severity” within the 166-gene ViP signature

Gene Name | Host responses/infection | Targeted pathway | Drug/biologics (FDA approved)
CLIC4 | Chloride intracellular channel 4 is required for the replication of Chikungunya virus (PMID: 31483794) | - | Ribavirin (PMID: 31483794)
CARD16 | CARD16 increases after influenza A virus infection (PMID: 29299535) | Therapeutics on NLRP3; NLRP3 inhibitor | Canakinumab, Anakinra, rilonacept
CARD17 | CARD17 increases after Chikungunya virus (CHIKV) infection (PMID: 31211814) and after influenza A virus infection (PMID: 29299535) | Therapeutics on NLRP3; NLRP3 inhibitor | Canakinumab, Anakinra, rilonacept
HIST2H2AA4 | Histone family | pan-histone deacetylase (HDAC) inhibitor SAHA | SAHA
HIST1H2AJ | Histone family | pan-histone deacetylase (HDAC) inhibitor SAHA | SAHA
SQRDL | Sulfide quinone reductase | |
HIST1H2AD | Histone family | pan-histone deacetylase (HDAC) inhibitor SAHA | SAHA
C21orf91 | Cold Sore Susceptibility Gene 1, influences pathogenesis of herpes virus (PMID: 27081513) | - |
B2M | β2 microglobulin mRNA levels are transiently increased after infection with Neurovirulent Sindbis Virus, suggesting that transcription of these mRNAs is coordinately regulated in neurons (PMCID: PMC112110) | |
HIST2H2AB | Histone family | pan-histone deacetylase (HDAC) inhibitor SAHA | SAHA
HRASLS2 | Phospholipase A And Acyltransferase 2 | |
GCA | Grancalcin/GCA heterodimerization of TLR9 is important for TLR9-mediated downstream signaling during fine-tuning processes against viral infection (PMID: 26648480) | |
CCNA1 | Cyclin A1 | |
CASP1 | Caspase 1 | |
GCH1 | GTP cyclohydrolase; Neopterin is a biomarker for viral infection. It is produced as a by-product in tetrahydrobiopterin (BH4) de novo synthesis and mirrors the activity of the rate-limiting enzyme in the BH4 synthesis cascade | |
TRIM25 | Tripartite motif containing 25 (TRIM25) is an E3 ubiquitin ligase and activates RIG1. It is involved in innate immune responses and regulates intracellular signaling and/or RNA virus replication (PMID: 29018447) | |
TMEM92 | Transmembrane Protein 92 | |
IFI27L1 | Interferon Alpha Inducible Protein 27 Like 1 | |
LOX | Lipoxygenase is increased in lung cells during RSV infection. It also gives higher CCL3 and CCL5 (PMID: 28579398) | Maraviroc prevents the function of the CCR5 receptor and blocks CCL5 binding (PMID: 22637726) | Maraviroc
ELOVL7 | Host fatty acid elongase 7; required for efficient virus release and virion infectivity (PMID: 25732827) | |

Table S4. Demographics of the UCSD COVID-19 cohort participants for plasma.

UCSD ID | Acute/Convalescent | Days post symptom onset | Age | Gender | Peak Disease Severity
CoV3 | Acute | 37 | 59 | M | Critical
CoV23 | Acute | 18 | 72 | F | Moderate
CoV36 | Acute | 17 | 43 | F | Critical
CoV37 | Acute | 4 | 25 | F | Mild
CoV38 | Acute | 4 | 32 | M | Mild
CoV41 | Acute | 7 | 83 | M | Mild
CoV40 | Acute | 10 | 83 | M | Severe to Critical (NRB in ICU but never intubated)
CoV42 | Acute | 6 | 58 | M | Mild to Moderate
CoV64 | Acute | 13 | 62 | M | Moderate
CoV65 | Acute | 28 | 42 | M | Critical
CoV66 | Acute | 6 | 53 | F | Moderate
Cov77 | Acute | 9 | 81 | M | Mod-Severe
Cov78 | Acute | 10 | 42 | M | Mod-Severe
CoV81 | Acute | 22 | 37 | F | Critical
CoV82 | Acute | 8 | 53 | M | Critical
CoV92 | Acute | 10 | 84 | M | Critical
CoV93 | Acute | 4 | 41 | M | Critical
CoV97 | Acute | 13 | 65 | M | Critical
CoV98 | Acute | 5 | 86 | M | Critical
CoV99 | Acute | 22 | 77 | F | Critical
CoV14 | Convalescent | 24 | 32 | M | Mild
CoV21 | Convalescent | 27 | 42 | M | Mild
CoV22 | Convalescent | 34 | 51 | M | Mild
CoV25 | Convalescent | 21 | 40 | F | Mild
CoV26 | Convalescent | 30 | 43 | F | Mild
CoV29 | Convalescent | 30 | 55 | F | Mild
CoV30 | Convalescent | 27 | 40 | M | Mild to Moderate
CoV31 | Convalescent | 33 | 64 | M | Mild
Cov7 | Convalescent | 25 | 44 | F | mild
Cov8 | Convalescent | 22 | 45 | M | mild
Cov9 | Convalescent | 25 | 43 | F | mild
Cov12 | Convalescent | 25 | 39 | F | mild
Cov18 | Convalescent | 23 | 48 | F | mild
Cov62 | Convalescent | 50 | 55 | M | mild
Cov63 | Convalescent | 39 | 68 | M | mild to moderate
Cov64 | Convalescent | 56 | 44 | F | mild
Cov201 | Acute | 12 | 73 | M | moderate
Cov204 | Acute | 13 | 54 | M | moderate
Cov205 | Acute | 9 | 37 | M | moderate
Cov206 | Acute | 10 | 77 | F | moderate
Cov207 | Acute | 8 | 49 | M | moderate
Cov210 | Acute | 5 | 48 | M | moderate
Cov211 | Acute | 8 | 39 | F | moderate
Cov202 | Acute | 10 | 37 | M | severe
Cov106 | Convalescent | 74 | 59 | M | Moderate
Cov103 | Convalescent | 79 | 50 | M | mild to moderate
Cov208A | Acute | 7 | 84 | M | Fatal
Cov213A | Acute | NA | 37 | M | Asymptomatic
Cov215 | Acute | 9 | 68 | M | Severe
Cov218 | Acute | 5 | 53 | M | Severe
CoV220 | Acute | N/A | 55 | M | Asymptomatic
CoV221 | Acute | 5 | 68 | M | moderate to severe
CoV222 | Acute | 5 | 68 | M | moderate
CoV223 | Acute | 7 | 57 | M | moderate to severe

Table S5. Demographics of the UCSD COVID-19 cohort participants for lung tissue.

Sample Name | Age | Gender | Days spent hospitalized COVID + until expired | Smoking history | Drugs administered during hospitalization | Subsequent infection? | Reason for surgery (NON SARS-CoV-2) | Histology (NON SARS-CoV-2)
Autopsy 2 | 64 | Male | 35 days | Smoked 1.5-2 packs per day (PPD) until 1988 when they quit. Smoked cannabis daily | Lasix (furosemide), Amiodarone HCL, stress dose steroids, vancomycin/zosyn, IV ceftriaxone, norepinephrine and vasopressin, digoxin, vecuronium | MSSA from being intubated | |
Autopsy 3 | NA | NA | NA | NA | NA | NA | |
Autopsy 4 | 90 | Female | 3 | NA | vancomycin/zosyn, low dose of norepinephrine | none | |
Autopsy 5 | 57 | Male | 28 | NA | Cefepime, Hydroxychloroquine, vancomycin, Norepinephrine, vasopressin, esmolol, Unasyn, bivalirudin, pressors (levo, vaso, phenylephrine), linezolid, Tocilizumab v. Placebo (information was not given what group he belonged to) | E. faecalis and Klebsiella PNA from tracheal aspirate | |
JA 5 (NON SARS-CoV-2) | 46 | Female | NA | Non-smoker | NA | NA | Left Lower Lobe Nodule | Invasive Adenocarcinoma
JA 6 (NON SARS-CoV-2) | 70 | Male | NA | Non-smoker | NA | NA | Left Lower Lobe Nodule | Granulomatous-like inflammation and scar

REFERENCES:

1. Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R. NCBI GEO: mining millions of expression profiles--database and tools. Nucleic Acids Res. 2005;33(Database issue):D562-6. Epub 2004/12/21. doi: 10.1093/nar/gki022. PubMed PMID: 15608262; PMCID: PMC539976.
2. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41(Database issue):D991-5. doi: 10.1093/nar/gks1193. PubMed PMID: 23193258; PMCID: PMC3531084.
3. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207-10. Epub 2001/12/26. PubMed PMID: 11752295; PMCID: PMC99122.
4. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003;31(4):e15. Epub 2003/02/13. PubMed PMID: 12582260; PMCID: PMC150247.
5. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249-64. Epub 2003/08/20. doi: 10.1093/biostatistics/4.2.249. PubMed PMID: 12925520.
6. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. Epub 2011/08/06. doi: 10.1186/1471-2105-12-323. PubMed PMID: 21816040; PMCID: PMC3163565.
7. Pachter L. Models for transcript quantification from RNA-Seq. arXiv e-prints [Internet]. 2011 April 01, 2011. Available from: https://ui.adsabs.harvard.edu/#abs/2011arXiv1104.3889P.
8. Collection and Submission of Post-mortem Specimens from Deceased Persons with Known or Suspected COVID-19 (Interim Guidance): Centers for Disease Control and Prevention (CDC); 2020 [cited 2020 30 April]. Available from: https://www.cdc.gov/coronavirus/2019-ncov/hcp/guidance-postmortem-specimens.html.
9. Amended COVID-19 autopsy guideline statement from the CAP autopsy committee: College of American Pathologists (CAP); 2020 [cited 2020 5th May]. Available from: https://documents.cap.org/documents/COVID-Autopsy-Statement-05may2020.pdf.
10. Rogers TF, Zhao F, Huang D, Beutler N, Burns A, He WT, Limbo O, Smith C, Song G, Woehl J, Yang L, Abbott RK, Callaghan S, Garcia E, Hurtado J, Parren M, Peng L, Ramirez S, Ricketts J, Ricciardi MJ, Rawlings SA, Wu NC, Yuan M, Smith DM, Nemazee D, Teijaro JR, Voss JE, Wilson IA, Andrabi R, Briney B, Landais E, Sok D, Jardine JG, Burton DR. Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model. Science. 2020;369(6506):956-63. Epub 2020/06/17. doi: 10.1126/science.abc7520. PubMed PMID: 32540903; PMCID: PMC7299280.
11. Varghese F, Bukhari AB, Malhotra R, De A. IHC Profiler: an open source plugin for the quantitative evaluation and automated scoring of immunohistochemistry images of human tissue samples. PLoS One. 2014;9(5):e96801. Epub 2014/05/08. doi: 10.1371/journal.pone.0096801. PubMed PMID: 24802416; PMCID: PMC4011881.
12. Sahoo D, Dill DL, Tibshirani R, Plevritis SK. Extracting binary signals from microarray time-course data. Nucleic Acids Res. 2007;35(11):3705-12. doi: 10.1093/nar/gkm284. PubMed PMID: 17517782; PMCID: PMC1920252.
13. Sahoo D, Dill DL, Gentles AJ, Tibshirani R, Plevritis SK. Boolean implication networks derived from large scale, whole genome microarray datasets. Genome Biol. 2008;9(10):R157. doi: 10.1186/gb-2008-9-10-r157. PubMed PMID: 18973690; PMCID: PMC2760884.
14. Sahoo D, Seita J, Bhattacharya D, Inlay MA, Weissman IL, Plevritis SK, Dill DL. MiDReG: a method of mining developmentally regulated genes using Boolean implications. Proc Natl Acad Sci U S A. 2010;107(13):5732-7. Epub 2010/03/17. doi: 10.1073/pnas.0913635107. PubMed PMID: 20231483; PMCID: PMC2851930.
15. Pandey S, Sahoo D. Identification of gene expression logical invariants in Arabidopsis. Plant Direct. 2019;3(3):e00123. Epub 2019/06/28. doi: 10.1002/pld3.123. PubMed PMID: 31245766; PMCID: PMC6508763.
16. Dabydeen SA, Desai A, Sahoo D. Unbiased Boolean analysis of public gene expression data for cell cycle gene identification. Mol Biol Cell. 2019;30(14):1770-9. Epub 2019/05/16. doi: 10.1091/mbc.E19-01-0013. PubMed PMID: 31091168.
17. Jones AC, Anderson D, Galbraith S, Fantino E, Gutierrez Cardenas D, Read JF, Serralha M, Holt BJ, Strickland DH, Sly PD, Bosco A, Holt PG. Personalized Transcriptomics Reveals Heterogeneous Immunophenotypes in Children with Viral Bronchiolitis. Am J Respir Crit Care Med. 2019;199(12):1537-49. Epub 2018/12/19. doi: 10.1164/rccm.201804-0715OC. PubMed PMID: 30562046.
18. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, 3rd, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive Integration of Single-Cell Data. Cell. 2019;177(7):1888-902 e21. Epub 2019/06/11. doi: 10.1016/j.cell.2019.05.031. PubMed PMID: 31178118; PMCID: PMC6687398.
19. Zhang Z, Luo D, Zhong X, Choi JH, Ma Y, Wang S, Mahrt E, Guo W, Stawiski EW, Modrusan Z, Seshagiri S, Kapur P, Hon GC, Brugarolas J, Wang T. SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk Samples. Genes (Basel). 2019;10(7). Epub 2019/07/25. doi: 10.3390/genes10070531. PubMed PMID: 31336988; PMCID: PMC6678337.
20. Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M, Roca CD, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Viteri G, Weiser J, Wu G, Stein L, Hermjakob H, D'Eustachio P. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018;46(D1):D649-D55. Epub 2017/11/18. doi: 10.1093/nar/gkx1132. PubMed PMID: 29145629; PMCID: PMC5753187.