
Content Based Search in Gene Expression Databases and a Meta-analysis of Host Responses to Infection

A Thesis Submitted to the Faculty of Drexel University by Francis X. Bell in partial fulfillment of the requirements for the degree of Doctor of Philosophy November 2015

© Copyright 2015 Francis X. Bell. All Rights Reserved.

Acknowledgments

I would like to acknowledge and thank my advisor, Dr. Ahmet Sacan. Without his advice, support, and patience I would not have been able to accomplish all that I have. I would also like to thank my committee members and the Biomed Faculty who have guided me. I would like to give special thanks to the members of the bioinformatics lab, in particular the members of the Sacan lab: Rehman Qureshi, Daisy Heng Yang, April Chunyu Zhao, and Yiqian Zhou. Thank you for creating a pleasant and friendly environment in the lab. I give the members of my family my sincerest gratitude for all that they have done for me. I cannot begin to repay my parents for their sacrifices. I am eternally grateful for everything they have done. The support of my sisters and their encouragement gave me the strength to persevere to the end.

Table of Contents

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

1. A BRIEF INTRODUCTION TO GENE EXPRESSION

1.1 Central Dogma of Molecular Biology

1.1.1 Basic Transfers

1.1.2 Uncommon Transfers

1.2 Gene Expression

1.2.1 Estimating Gene Expression

1.2.2 DNA Microarrays

1.2.3 Microarray Analysis Methods

1.3 Gene Expression Databases

1.3.1 Small or Specialty Databases

1.3.2 Gene Expression Omnibus: the Large Database

1.3.3 ArrayExpress: the European Counterpart

1.4 Database Management Systems

2. BINARY REPRESENTATIONS OF GENE EXPRESSION STUDIES ENABLE EFFICIENT SEARCHES BY CONTENT

2.1 Background

2.1.1 Previous Attempts to Establish Content-based Searches

2.1.2 Inspiration from Chemoinformatics

2.1.3 Distance Measures

2.2 Methods

2.2.1 Dataset Acquisition

2.2.2 Expression Profile Creation

2.2.3 Binary Vector Creation

2.2.4 Cross Validation Studies

2.3 Results

2.3.1 Single Platform Validation Study

2.3.2 Multiple Platform Validation Study

2.4 Discussion

2.5 Conclusion

3. IMPLEMENTATION OF A DATABASE OF BINARY REPRESENTATIONS OF GENE EXPRESSION STUDIES

3.1 Background and Methods

3.2 Results

3.3 Discussion

3.4 Conclusion

3.5 Future Work

4. ENRICHMENT OF GENE EXPRESSION DATA USING BINARY VECTOR REPRESENTATIONS

4.1 Gene Set Enrichment

4.1.1 KEGG Pathways

4.1.2 Gene Ontology

4.2 Determining Significance

4.2.1 DAVID

4.2.2 GSEA

4.3 Motivation and Reasoning

4.4 Methods

4.4.1 Distance Measures

4.4.2 Database Construction

4.4.3 Exponential Transforms

4.5 Results

4.6 Discussion

4.7 Conclusion and Future Work

5. META-ANALYSIS OF GENE EXPRESSION DURING HOST RESPONSES TO INFECTIONS

5.1 Background

5.2 Methods

5.2.1 Data Acquisition

5.2.2 Binary Search to Distinguish Taxonomical Groups

5.2.3 Selection of a Meta-analysis Method

5.2.4 Statistical Approach

5.3 Results

5.4 Discussion

5.5 Conclusion

6. META-ANALYSIS OF MIRNA EXPRESSION DURING HOST RESPONSES TO INFECTIONS

6.1 Background

6.2 Methods

6.3 Results

6.4 Discussion

6.5 Conclusion

7. CONCLUSION

REFERENCES

APPENDIX A: Data from the Single Platform Study of Binary Representations

APPENDIX B: Data from the Multiple Platform Study of Binary Representations

APPENDIX C: Data from the Initial Search of Binary Representation Database

APPENDIX D: Predicted Enrichments Using Binary Distances

APPENDIX E: Data Used in the Meta-analysis of Gene Expression Studies

APPENDIX F: Enriched Data from the Meta-analysis of Gene Expression

APPENDIX G: Data from the Meta-analysis of microRNA Expression Studies

APPENDIX H: Enriched Data from the Meta-analysis of miRNA Expression

VITA

List of Tables

2.1 Definitions of Binary Distance Measures. A and B are both binary vectors of length n. A+ and A− represent the number of positive and negative bits in A, respectively. A±B± denotes the number of bits in the intersection of A and B. A± ∨ B± denotes the number of bits in the union of A and B. To avoid division by zero, vectors of all positive or all negative bits are removed.

2.2 Definitions of Numerical Distance Measures. A and B are vectors of fold changes of length n. Ai represents the ith term of the vector. µA is the mean of vector A.

2.3 Results of Single Platform Validation Study. True positive rates at a false positive rate of 0 (TPR|FPR=0) and average search time for distance measures (δ) are provided.

2.4 Results of Multiple Platform Validation Study. True positive rates at a false positive rate of 0 (TPR|FPR=0) and average search time for distance measures (δ) are provided. The first search of a distance measure required the most time. In cases where the first search created deviations greater than the average search time, the first search was omitted from calculations.

3.1 Results of Initial Search of Large Database. The time required searching, the average number of bits in the resulting profiles, the average number of common bits, and the number of profiles from the same submission as the query are shown for each distance measure (δ).

4.1 Equations of Exponential Fitted Curves. Equations for the curves that are fit in Figure 4.3 for KEGG pathways and GO associations are provided for distance measures (δ). Equations follow the form a ∗ e^(b∗x).

4.2 Confusion Matrices for Test Enrichments. Given are the numbers of True Positive, False Positive, and False Negative enriched gene sets for the Tanimoto and ratio distances. EASE enrichment was used to determine true enrichment values. TP - True Positives, FP - False Positives, and FN - False Negatives. True Negative enrichment is not included because most sets are not enriched for either distance.

A.1 Summary of Profiles in Single Platform Study. Groups, abbreviations, samples, and profiles are given.

A.2 Subsets Used in Single Platform Study. The GEO Series ID, the Series title, the test group, the number of the subset in the series, and test samples are given for each subset. There are 5 test groups: Breast Cancer (BC), Alzheimer’s Disease (A), (SM), Hepatic (H), and Cardiac (C).

B.1 Summary of Profiles in Multiple Platform Study. Groups, abbreviations, samples, and profiles are given.

B.2 Profiles Used in Multiple Platform Study. The GEO Series ID, the Series title, the test group, the number of the subset in the series, and test samples are given for each subset. There are 5 test groups: Breast Cancer (BC), Huntington’s Disease (HD), Duchenne’s Muscular Dystrophy (DMD), Hepatic (H), and Cardiac (C).

C.1 Summary of Initial Searches of Databases. Searches of the database are performed using 3 distance measures to identify the 100 least distant profiles from the query. The averages and standard deviations of the number of positive bits (PB) in selected profiles, and of the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles, are given for each measure. The time required to select the 100 least distant profiles is included for each measure.

C.2 Profiles Selected by Modified Tanimoto Distance. Profiles are identified by DataSet and profile titles and are ranked according to the modified Tanimoto distance between the selected and query profiles. The platform and number of positive bits (PB) of the selected profile as well as the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles are given for each profile that is selected. The query profile is separated from other profiles by a horizontal line.

C.3 Profiles Selected by Tanimoto Distance. Profiles are identified by DataSet and profile titles and are ranked according to the Tanimoto distance between the selected and query profiles. The platform and number of positive bits (PB) of the selected profile as well as the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles are given for each profile that is selected. The query profile is separated from other profiles by a horizontal line.

C.4 Profiles Selected by Hamming Distance. Profiles are identified by DataSet and profile titles and are ranked according to the Hamming distance between the selected and query profiles. The platform and number of positive bits (PB) of the selected profile as well as the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles are given for each profile that is selected. The query profile is separated from other profiles by a horizontal line, and horizontal lines are used to separate clusters of profiles.

D.1 Enriched KEGG Pathways for GDS2856. Significantly regulated genes from GDS2856 are enriched for KEGG pathways using 3 measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered as proper enrichments while the other enrichments are predicted. Enriched pathways are given for each measure and are shown with values determined by all measures. A value of less than 0.100 is considered enriched.

D.2 Enriched GO Annotations for GDS2856. Significantly regulated genes from GDS2856 are enriched for GO associations using 3 measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered as proper enrichments while the other enrichments are predicted. Enriched pathways are given for each measure and are shown with values determined by all measures. A value of less than 0.100 is considered enriched.

D.3 Enriched KEGG Pathways for GDS3534. Significantly regulated genes from GDS3534 are enriched for KEGG pathways using 3 measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered as proper enrichments while the other enrichments are predicted. Enriched pathways are given for each measure and are shown with values determined by all measures. A value of less than 0.100 is considered enriched.

D.4 Enriched GO Annotations for GDS3534. Significantly regulated genes from GDS3534 are enriched for GO annotations using 3 measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered as proper enrichments while the other enrichments are predicted. Enriched pathways are given for each measure and are shown with values determined by all measures. A value of less than 0.100 is considered enriched.

E.1 Profiles Used in Gene Expression Meta-analysis. The GSE series ID, the Platform ID, the series title, the species of infectious pathogen, control sample IDs, and infection sample IDs are given for each profile. Series that are performed on multiple platforms and series using multiple infectious species are separated into distinct profiles. For profiles that contain a large number of samples, vector notation of the form GSM(i):increment:GSM(i+n) is used.

E.2 Up Regulated Genes From the Meta-analysis. The table lists genes that are significantly upregulated for taxonomical groups and genes that are commonly upregulated in combinations of the groups. The virus taxonomical group produces the most significant genes.

E.3 Down Regulated Genes From the Meta-analysis. The table lists genes that are significantly downregulated for taxonomical groups and genes that are commonly downregulated in combinations of the groups. The virus taxonomical group produces the most significant genes.

E.4 The 100 Most Up Regulated Genes from the Meta-analysis. The 100 genes that are upregulated and generate the smallest P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles.

E.5 The 100 Most Down Regulated Genes from the Meta-analysis. The 100 genes that are downregulated and generate the smallest P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles.

F.5 GO Associations Enriched for the Collection of Profiles. The 20 most significantly enriched GO associations for the collection of profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

F.6 GO Associations Enriched for Bacteria Profiles. The 20 most significantly enriched GO associations for bacteria profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

F.7 GO Associations Enriched for Eukaryote Profiles. The 20 most significantly enriched GO associations for eukaryote profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

F.8 GO Associations Enriched for Virus Profiles. The 20 most significantly enriched GO associations for virus profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

F.1 KEGG Pathways Enriched for the Collection of Profiles. The 20 most significantly enriched KEGG pathways for the collection of profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

F.2 KEGG Pathways Enriched for Bacteria Profiles. The 20 most significantly enriched KEGG pathways for bacteria profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

F.3 KEGG Pathways Enriched for Eukaryote Profiles. The 20 most significantly enriched KEGG pathways for eukaryote profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

F.4 KEGG Pathways Enriched for Virus Profiles. The 20 most significantly enriched KEGG pathways for virus profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

G.1 Profiles Used in miRNA Expression Meta-analysis. The GSE series ID, the Platform ID, the series title, the species of infectious pathogen, control sample IDs, and infection sample IDs are given for each profile. Series that are performed on multiple platforms and series using multiple infectious species are separated into distinct profiles. For profiles that contain a large number of samples, vector notation of the form GSM(i):increment:GSM(i+n) is used.

G.2 Up Regulated miRNA From the Meta-analysis. The table lists miRNA that are significantly upregulated for taxonomical groups and miRNA that are commonly upregulated in combinations of the groups. The bacteria taxonomical group produces the most significant miRNA.

G.3 Down Regulated miRNA From the Meta-analysis. The table lists miRNA that are significantly downregulated for taxonomical groups and miRNA that are commonly downregulated in combinations of the groups. The virus taxonomical group produces the most significant miRNA.

G.4 The 100 Most Up Regulated miRNA from the Meta-analysis. The 100 miRNA that are upregulated and generate the smallest P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles.

G.5 The 100 Most Down Regulated miRNA from the Meta-analysis. The 100 miRNA that are downregulated and generate the smallest P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles.

G.6 mRNA Targets of Significant miRNA. Significant miRNA are mapped to mRNA targets for the entire collection and taxonomical groups. mRNA that correspond to a gene are listed. Bacteria profiles map to the greatest number of genes, over 10,000.

H.1 KEGG Pathways Enriched for the Collection of Profiles. The 20 most significantly enriched KEGG pathways for the collection of profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

H.2 KEGG Pathways Enriched for Bacteria Profiles. The 20 most significantly enriched KEGG pathways for bacteria profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

H.3 KEGG Pathways Enriched for Eukaryote Profiles. The 20 most significantly enriched KEGG pathways for eukaryote profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

H.4 KEGG Pathways Enriched for Virus Profiles. The 20 most significantly enriched KEGG pathways for virus profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. A P-value of 0 indicates a value less than 10⁻⁴.

H.5 GO Associations Enriched for the Collection of Profiles. The 20 most significantly enriched GO associations for the collection of profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. False discovery rates (FDR) are included.

H.6 GO Associations Enriched for Bacteria Profiles. The 20 most significantly enriched GO associations for bacteria profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

H.7 GO Associations Enriched for Eukaryote Profiles. The 20 most significantly enriched GO associations for eukaryote profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

H.8 GO Associations Enriched for Virus Profiles. The 20 most significantly enriched GO associations for virus profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

List of Figures

1.1 Diagram of the Central Dogma of Molecular Biology. Basic transfers are indicated by thick, red arrows. Uncommon transfers are indicated by thin, blue arrows. Rare transfers are not shown. Modified from [2].

1.2 Example of a Simple DNA Microarray. Four probes complementary to different genes are used. These probes are attached to a platform and tagged. When the complementary gene attaches to a probe, it fluoresces. The amount of fluorescence at a spot indicates gene expression.

1.3 GEO Records and Organization. Platforms, Samples, Series, and Datasets are the data types in GEO. The relations between them are shown as arrows.

2.1 ROC Curves of Distance Measures Included in the Single Platform Validation Study. The solid curves plot the true positive rate (y-axis) against the false positive rate (x-axis), averaged over all queries, with error bars representing standard deviation. AUC is shown as a percentage of the total area. The dashed lines show the ROC curves for random selections. Numerical distance measures are in the left column; binary in the right.

2.2 ROC Curves of Distance Measures Included in Multiple Platform Validation Study. The solid curves plot the true positive rate (y-axis) against the false positive rate (x-axis), averaged over all queries, with error bars representing standard deviation. AUC is shown as a percentage of the total area. The dashed lines show the ROC curves for random selections. Numerical distance measures are in the left column; binary in the right.

4.1 Example of KEGG Pathway. The complement cascade is part of KEGG Pathway hsa04610. Gene products are shown in boxes. Dashed lines show indirect effects between products. Solid lines with arrows represent activation. Solid lines with bars represent inhibition.

4.2 Example of GO Hierarchy. The regulation of antigen processing and presentation is defined in GO:0002577. The yellow box is the child of its parent terms shown in white. The possible vocabularies for relations between terms are shown on the right.

4.3 Fit Exponential Curves of P-values and Binary Distances. Exponential curves are fit to plots of P-values and binary distances of enriched gene sets for the profile from GDS1290. Gene sets are represented as blue dots. Curves are shown as continuous red lines. (Left) KEGG Pathways. (Right) GO Associations. (Top) Tanimoto distance. (Bottom) Ratio distance.

5.1 ROC Curves from Cross Validation Study. ROC curves are shown when profiles from the entire set and the bacteria, eukaryota, and viruses taxonomical groups are used to query a binary representation database. The expected result of a random search is provided as a dashed red line.

5.2 Venn Diagrams of Differentially Regulated Genes. Differential expression is tested for three taxonomical groups. The number of common genes refers to genes that are found in different taxonomical groups, not genes that are regulated in the entire collection analysis.

5.3 Phylogenetic Tree and Enrichment of Significant Genes. a) Phylogenetic tree of organisms in gene expression analysis. Viruses are classified by the Baltimore Classification. Bacteria and Eukaryotes are classified by phyla. Opisthokonta is used as a phylum for Animalia and Fungi. The number of significant genes is provided after the group title. b) Heat map of enriched KEGG pathways. (Left) Signaling pathways. (Right) Disease associated pathways.

6.1 Diagram of miRNA Function. After transcription, a miRNA forms a double stranded precursor stem loop with itself. The stem loop is cleaved, producing mature miRNA strands. The miRNA strands attach to a complementary mRNA strand. Translation of the mRNA is prevented or the mRNA is destroyed. Modified from [47].

6.2 Venn Diagrams of Differentially Regulated miRNA. Differential expression is tested for three taxonomical groups. The number of common genes refers to genes that are found in different taxonomical groups, not genes that are regulated in the entire collection analysis.

6.3 Phylogenetic Tree and Enrichment of Significant miRNA. a) Phylogenetic tree of organisms in miRNA expression analysis. Viruses are classified by the Baltimore Classification. Bacteria and Eukaryotes are classified by phyla. Opisthokonta is used as a phylum for Animalia and Fungi. The number of significant miRNA as well as the number of targeted genes are provided after the group title. b) Heat map of enriched KEGG pathways. (Left) Signaling pathways. (Right) Cellular process pathways.

E.1 Phylogenetic Tree of Pathogen Species. The species are divided into 3 groups: Viruses (green), Bacteria (violet), and Eukaryotes (orange). The number of significant genes in each group is shown. Groups that do not produce a significant gene are included.

G.1 Phylogenetic Tree of Pathogen Species. The species are divided into 3 groups: Viruses (green), Bacteria (violet), and Eukaryotes (orange). The number of significant miRNA and the number of corresponding gene targets in each group are shown. Groups that do not produce a significant gene are included.

Abstract

Content Based Search in Gene Expression Databases and a Meta-analysis of Host Responses to Infection

Francis X. Bell

Ahmet Sacan, Ph.D.

The expression of a gene is a function of the number of times that the information encoded by the gene is transcribed. Estimation of gene expression levels is typically performed by determining the concentration of RNA molecules. The high-throughput technology of cDNA microarrays allows the RNA concentrations of thousands of genes to be estimated simultaneously. Databases of gene expression studies have been growing, but the data contained in these databases are not fully interpreted, because cross-comparison of all gene expression studies requires large amounts of memory and computing time, making content-based searching in these databases impractical. In order to improve the efficiency of searching gene expression databases, we introduce the method of representing gene expression data in binary format, inspired by the use of binary fingerprints in chemoinformatics applications. The appropriateness of binary representations of significant gene lists from gene expression studies is tested by performing cross-validation experiments on small and large benchmark datasets. Among the numerical and binary distance measures surveyed, the modified Tanimoto distance was found to provide the best accuracy-speed trade-off; it identifies many relevant profiles as similar to a query profile, and the distance calculation can be executed efficiently by utilizing fast bit-wise operations.

Availability of data from different gene expression experiments in public databases provides an opportunity for performing meta-analysis to obtain common expression profiles across different experiments that may not be apparent from individual studies. We have undertaken a meta-analysis of host responses to infections, utilizing both gene and miRNA expression studies. Common and unique differential expression patterns in response to pathogens from different taxonomic groups are identified, and gene set enrichment is performed to identify the affected biological pathways. Our findings corroborate known common host response mechanisms, while also identifying novel expression profiles that are important for pathogen-specific responses. Some of the significantly differentially expressed genes we have identified are involved in pathogen recognition, proteasome assembly, and common pathogen evasion mechanisms including TNF signaling and apoptosis. Our findings identify under-studied classes of pathogens and also provide insights for pathogen-specific evasion and response mechanisms.


Chapter 1: A Brief Introduction to Gene Expression

This thesis focuses on the analysis of gene expression detected using microarray technology and increasing the accessibility of that analysis. This chapter addresses the theory behind gene expression and microarray technology. It also introduces the methods used to access, analyze, and manipulate gene expression data.

1.1 Central Dogma of Molecular Biology

On a cellular level, information is expressed as sequences of organic molecules. The central dogma of molecular biology attempts to summarize the transfer of information between sequences. Simply put, the dogma states that information is stored in DNA, which is transcribed into RNA, which is then translated to amino acids that form proteins, which in turn provide function [29]. Although it can be worded succinctly, the central dogma is very intricate, with many exceptions and corollaries required to fully explain the information transfer as it is seen in nature. Three basic transfers, two uncommon transfers, and one rare transfer of information are possible. Figure 1.1 provides a simplified depiction of the central dogma including the three basic and two uncommon transfers. The rare transfer has only been seen in experimental settings and is not discussed in this thesis.

1.1.1 Basic Transfers

Cellular information is stored in sequences of nucleic acids called genes, which are composed of deoxyribonucleic acid (DNA) strands. Single stranded DNA is unstable. To prevent degradation, single stranded DNA binds to a complementary DNA strand to form a stable double stranded DNA which coils into a chromosome. The first basic transfer occurs when a chromosome is uncoiled, the DNA is separated into individual strands, information from both strands is read, and new DNA strands containing the same information as the original strands are created. This transfer is known as DNA replication, and few exceptions are needed to further explain it. The most notable exception is the replication of methylated DNA. When a methyl group is attached to a nucleic acid, the gene that includes this nucleic acid cannot participate in other transfers. In general, methylated DNA is copied to the new DNA strands and the methylated nucleic acid continues to prevent the gene from participating in other transfers.

Figure 1.1: Diagram of the Central Dogma of Molecular Biology. Basic transfers are indicated by thick, red arrows. Uncommon transfers are indicated by thin, blue arrows. Rare transfers are not shown. Modified from [2].

The second basic transfer separates the double stranded DNA into single strands, reads information from a gene, and generates a complementary ribonucleic acid (RNA) strand. This transfer is known as transcription, and several exceptions are needed to explain the unequal frequencies with which information is transferred. The first exception is caused by the coiling of chromosomes to reduce the space required to store DNA. Chromosomes are coiled around proteins called histones, which must be deactivated before access to the DNA strand containing the gene to be transcribed is gained. The deactivation of histones requires time and energy. As a result, a gene that is on a section of a chromosome that is coiled around many histones may rarely be transcribed. Another exception to transcription is that genes may be contained on non-continuous strands of DNA. The RNA that is produced from these genes depends upon the ordering of the non-continuous RNA strands, with certain orderings more likely to occur. Thus, information may be transcribed more or less frequently. Other exceptions exist but are not included in this discussion.

The third basic transfer reads information from the RNA and assembles an amino acid chain. With proper folding, the amino acid chain forms a protein. This transfer is known as translation, and it has many exceptions, but only two are mentioned here. Before the amino acid chain is folded into a protein, it can be modified by the addition of functional groups to amino acids or by the removal of amino acids, in a process called post-translational modification. Like other exceptions, post-translational modification disrupts the frequency at which information is transferred; unlike other exceptions, it also alters the information itself. Another exception to translation is RNA silencing, which utilizes small RNA strands to interfere with the functions of other RNA strands. RNA silencing only changes the frequency with which information is transferred and not the information itself.
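
As an illustration only, the transcription and translation transfers described above can be sketched in a few lines of Python. The sketch assumes that the template strand is written 3'→5', so that a base-by-base complement yields the RNA written 5'→3'; it uses a deliberately partial codon table and ignores every exception discussed in this section. The function names and the example sequence are hypothetical.

```python
# Illustrative sketch of transcription and translation (not a full model).
# Assumption: the DNA template is written 3'->5', so complementing each base
# in order gives the RNA strand written 5'->3'.

DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the codons needed by the example are listed.
CODON_TABLE = {
    "AUG": "Met",
    "GCU": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "Stop",
}


def transcribe(template_dna):
    """Return the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(rna):
    """Read the RNA three bases at a time and return the amino acid chain."""
    chain = []
    for i in range(0, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]  # KeyError for codons not in the partial table
        if residue == "Stop":
            break
        chain.append(residue)
    return chain


template = "TACCGATTTACCATT"  # hypothetical template strand, 3'->5'
rna = transcribe(template)
protein = translate(rna)
print(rna)      # AUGGCUAAAUGGUAA
print(protein)  # ['Met', 'Ala', 'Lys', 'Trp']
```

Both functions preserve the information carried by the sequence; the exceptions described above would appear in such a model as changes to how often, or whether, these functions are called.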

1.1.2 Uncommon Transfers

The uncommon transfers were first identified as mechanisms of viral regeneration. The first of these transfers occurs in viruses that do not utilize DNA during their regeneration process. RNA replication occurs when an RNA strand is read, a complementary RNA strand is produced and read, and an RNA strand identical to the original strand is formed. This transfer is primarily seen in, but is not limited to, viruses. It has been identified in cellular organisms as part of RNA silencing. The other uncommon transfer reads an RNA strand, produces a complementary strand of DNA, adds another strand to stabilize the DNA, and inserts the stable DNA strand into the host chromosome. This is known as reverse transcription and is a staple of viruses like HIV and Hepatitis B virus. It has become a valuable tool for experimentally amplifying RNA concentrations using complementary DNA (cDNA), as discussed later.
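
A simplified sketch of reverse transcription is given here in Python. It assumes an RNA strand written 5'→3', produces each DNA strand as a reverse complement so that both are also written 5'→3', and omits insertion into the host chromosome. The function name and the example sequence are hypothetical.

```python
# Illustrative sketch of reverse transcription: RNA -> double stranded cDNA.
# Assumption: all strands are written 5'->3', so each new strand is the
# reverse complement of the strand it is copied from.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return (first_strand, second_strand) of the cDNA made from an RNA strand."""
    # First strand: DNA complementary to the RNA template.
    first_strand = "".join(RNA_TO_DNA[base] for base in reversed(rna))
    # Second strand: DNA complementary to the first strand, stabilizing it.
    second_strand = "".join(DNA_TO_DNA[base] for base in reversed(first_strand))
    return first_strand, second_strand


first, second = reverse_transcribe("AUGGCU")  # hypothetical RNA fragment
print(first)   # AGCCAT
print(second)  # ATGGCT
```

The second strand carries the same sequence as the original RNA with thymine in place of uracil, which is why cDNA can stand in for the RNA from which it was made.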

1.2 Gene Expression

The expression of a gene is a function of the number of times that the information encoded by the gene is transcribed. It varies across tissues and cell lines due to the exceptions of the central dogma. In one tissue, a gene may be methylated or densely coiled around many histones, while in another tissue, it may be unmodified and easily transcribed. A gene may be frequently transcribed in the latter tissue and infrequently transcribed in the former tissue. Therefore, the magnitudes of gene expression between tissues and cell lines can be misleading when testing if an environmental condition has changed the expression of that gene. To detect if an environmental condition alters gene expression, a baseline expression of the gene must be used.

1.2.1 Estimating Gene Expression

Gene expression cannot be measured directly because, as of now, there is no method to directly assess the frequency with which a gene is being transcribed at a discrete moment. Measuring DNA concentrations is not useful because DNA concentrations remain constant, except during replication, when they double. Because of the exceptions to the central dogma, RNA and protein concentrations are not exactly proportional to the frequency at which a gene is read and can only provide estimates of gene expression. Nevertheless, determining the concentrations of RNA or protein at a given time allows a statistically robust estimation of gene expression.

Estimations using protein

Gene expression can be inferred from protein concentrations at a particular moment of time using the preferred method of Western blotting. In this technique, protein is extracted from a sample and bound to tagged antibodies. The antibody-bound protein is added to a gel, and different proteins are separated by an electric current. The proteins are then transferred to a membrane, and the tagged antibodies are stained. The intensity of the stain is proportional to protein concentrations and is correlated with gene expression.

Although proteins can be used to estimate gene expression, they are not the preferred sequences used to do so. Many exceptions exist between the reading of a gene and the maturation of a protein, and these decrease the correlation between protein concentration and gene expression. To compound the problem, only a limited number of proteins can be tested simultaneously using Western blots. The high-throughput technology of protein microarrays allows more proteins to be measured simultaneously, but the limited correlation restricts this approach.

Estimations using RNA

The practice of measuring RNA concentrations to estimate gene expression is more common than measuring protein concentrations. Because there are fewer exceptions to the central dogma between DNA and RNA than between DNA and protein, the correlation between RNA concentrations and gene expression is increased. In addition, RNA types like non-coding RNA (ncRNA) and transfer RNA (tRNA) that are not translated into protein can be included in the analysis. The standard technique for measuring RNA concentrations is Northern blotting. Northern blotting is similar to Western blotting, but instead of protein concentrations, RNA concentrations are measured. In Northern blotting, RNA sequences are isolated, labeled, separated by gel electrophoresis, and stained on a membrane where intensity is proportional to RNA concentration. Instead of being labeled with antibodies, RNA sequences are tagged with complementary nucleotides. Western blotting and Northern blotting are both limited by the number of tagged sequences that can be measured and the relatively small number of sequences from the sample that can be tested.

The number of RNA sequences that can be measured is increased by utilizing replication in a process known as reverse transcription polymerase chain reaction (RT-PCR). Polymerase chain reaction (PCR) is the experimental method to replicate and amplify the concentration of DNA. Reverse transcription is used to create cDNA strands from RNA strands, which can then be amplified using PCR. If the cDNA strands are tagged, the concentration of the cDNA strands can be detected in a process called quantitative PCR (qPCR).

Reverse transcription qPCR can be used to estimate the original RNA concentrations, but the large amounts of noise and variation introduced during amplification create a degree of uncertainty in the estimation. High-throughput DNA microarray technology allows for the concentrations of many RNA sequences to be determined by removing some of the uncertainty, but it introduces additional noise and variation that must be factored into the estimation.

1.2.2 DNA Microarrays

A DNA microarray is a laboratory device that detects the concentrations of DNA sequences bound to fixed, tagged complementary DNA strands. When used in conjunction with cDNA libraries produced from RT-PCR, they can be used to estimate gene expression. The use of cDNA microarrays has become a very common part of research in biological sciences in recent years. The construction of a DNA microarray can be explained, rather simply, as a series of DNA probes attached to spots on a surface. Fluorescently labeled cDNA samples are then added to the slide, and the cDNA fragments bind to specific probes by DNA complementarity. By controlling the location, quantity, and nucleic acid sequence of the attached DNA probes, the concentration of a complementary gene can be estimated as proportional to the fluorescence intensity at that spot. When the complementary strands are cDNA strands from RT-PCR, the intensity can be used to approximate RNA concentration. The basic construction and functionality of DNA microarrays are illustrated in Figure 1.2.

Two forms of microarrays have been used to estimate gene expression. The earlier form uses different colored tags to distinguish between samples under different conditions. The intensity of one color would serve as a baseline for detecting expression while the intensity of the other color would serve as the expression of the test sample. As microarray studies began to include large numbers of test samples, this form of microarray became unmanageable. Each test sample required a corresponding control sample and the total number of samples became excessive. In this thesis, this form of microarray is avoided whenever possible.

Figure 1.2: Example of a Simple DNA Microarray. Four probes complementary to different genes are used. These probes are attached to a platform and tagged. When the complementary gene attaches to a probe, it fluoresces. The amount of fluorescence at a spot indicates gene expression.

A second form of microarray addressed the problem of sample size by using only one sample and one fluorescent color in each test. To establish a baseline of expression, the average expression of several control samples is calculated, and this baseline is compared with the average of the test samples. This form has come to dominate the field, reducing the number of samples by close to a half. Several designs of this form of microarray exist, but they are all based on the normal distribution of sample RNA sequences subject to diffusion and Brownian motion. Because the technology relies upon diffusion, noise and variation are added to an already noisy technology. Despite its drawbacks, this form of microarray is the most commonly used and is preferred in this thesis.

1.2.3 Microarray Analysis Methods

Estimating gene expression from microarray experiments can be very difficult due to the large amount of noise and variability. Many complex analysis approaches have been devised to remove noise and show true expression. However, each approach produces a slightly different result. The ability to draw conclusions from heterogeneous sets of microarray experiments is limited by the use of these different data analysis techniques.

The first analysis step is the normalization of probe intensities to correct for systematic biases in measurements [97]. A common approach assumes that the sum of all intensities is the same in every sample; the intensities of each sample are therefore scaled independently so that the sums agree. Numerous variations of this approach exist that use mean or median values computed from every probe or from a subset of probes in a sample. Linear regression approaches have been used for normalization, assuming a linear response between true expression level and observed intensity. Unfortunately, this assumption does not always hold true [72], and caution should be applied before using a linear regression for normalization. Some normalization approaches utilize the ranking of differential expressions of probes to define where no expression change occurs and adjust probe intensities around it [120]. These approaches tend to fail when the majority of genes are regulated in the same direction, either up or down. Another normalization approach called locally weighted scatterplot smoothing (LOWESS) attempts to remove intensity dependent effects, most notably when the intensities are logarithmically transformed [25, 128].

After normalization, the data are reduced to profiles of genes with statistically different expression levels by clustering groups of genes with similar expressions. Clustering usually requires complex algorithms, and numerous such algorithms have been proposed [105]. Normalized expression values have been clustered by k-means or k-nearest neighbor algorithms [35] or by machine learning techniques like support vector machines or self-organizing maps. Fold change can be used to determine which genes are different by considering only genes with an absolute magnitude of fold change greater than an arbitrary level. Differentially expressed genes can also be identified by selecting genes that show consistently large changes in expression by ranking expressions across samples [44].

To establish a statistical difference, a Z-score can be calculated by dividing the difference of the average expression of a gene for test and control samples by the pooled standard deviation. The Z-score is compared with the normal cumulative distribution to determine significance. The use of an unequal variance t-test between test and control samples for every probe provides a more rigorous statistical decision, but it does not account for type I errors caused by testing multiple comparisons, also known as false discoveries. Algorithms have been proposed for reducing the rate at which these errors occur, known as the false discovery rate (FDR), after statistical testing. FDR can also be addressed before statistical testing with algorithms using permutations [7, 113]. The most notable of these algorithms is called significance analysis of microarrays (SAM). For the SAM algorithm, a t-statistic is calculated between expressions of control and test samples and compared against the average t-statistic of permutations of random selections for control and test samples for the same gene [121].

For this thesis, pre-normalized data are used whenever available. For studies without pre-normalized data, the Robust Multi-chip Average (RMA) algorithm [8, 63, 64] is used for normalization of raw data. RMA adjusts observed expression values by factors based on mean and distribution, normalizes data using a quantile variation of the equal intensity approach, and summarizes data from probes for the same gene using a median polish technique. In this work, a correction for FDR is not applied because FDR corrections are not necessarily applied for pre-normalized data, and data management presents concerns addressed in later chapters.
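As a rough illustration of the Z-score described above, the following Python sketch divides the difference of the group means by a pooled standard deviation and obtains a two-sided p-value from the normal cumulative distribution. The function names and sample values are hypothetical, and the pooling formula (sample variances weighted by their degrees of freedom) is an assumption; the text does not specify which pooled estimate is intended.

```python
import math
from statistics import mean, variance


def pooled_z_score(test, control):
    """Difference of group means divided by a pooled standard deviation.

    Assumption: the pooled variance weights each sample variance by its
    degrees of freedom; other pooling conventions exist.
    """
    n_t, n_c = len(test), len(control)
    pooled_var = ((n_t - 1) * variance(test) + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(test) - mean(control)) / math.sqrt(pooled_var)


def two_sided_p(z):
    """Two-sided p-value from the standard normal cumulative distribution."""
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


# Hypothetical log-scale expression values for a single gene.
test_samples = [8.1, 8.4, 8.3, 8.6]
control_samples = [7.2, 7.5, 7.1, 7.4]
z = pooled_z_score(test_samples, control_samples)
p = two_sided_p(z)
```

Because one such test is run per probe, the resulting p-values are still subject to the multiple comparison problem discussed above.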

1.3 Gene Expression Databases

Microarray technology has become a standard practice in modern biology to estimate gene expression levels under different experimental conditions. The volume of information contained in microarray studies is very large and usually includes details that are not interpreted by the authors of the original study. To allow other researchers access to this data to discover details not addressed by study authors, large scale repositories have been created to hold gene expression data. Since 2004, scientific publishers have required that microarray data be deposited into these repositories before an article can be published [5].

1.3.1 Small or Specialty Databases

Several relatively small databases of microarray studies have been created either for institutions or for specific topics. The Stanford Microarray Database (SMD) was created in 2001 to store results of microarray studies performed at the university [61]. Although Stanford researchers can access the database freely while others are charged for information retrieval, most of the data is now deposited in larger, free databases. Oddly, it is now run from a server located not in California but in New Jersey, at Princeton University, and a name change is planned. The Saccharomyces Genome Database (SGD) is a database entirely devoted to the study of budding yeast, genomes, and yeast microarrays [22]. It was started in 2011 at Stanford University and is still operated at the California school, not in another state. Oncomine is a proprietary database of studies of cancer genomes and gene expression data [101]. The Connectivity Map, BioGPS, and L2L Microarray Database are other examples of small databases.

1.3.2 Gene Expression Omnibus: the Large Database

Currently, the most prominent and widely used database is the Gene Expression Omnibus (GEO) provided by the National Center for Biotechnology Information (NCBI). The database was created in 2000 to efficiently store gene expression data in a freely available location, provide a standard for the submission of data, and offer mechanisms for querying and locating that data [34]. In this thesis, GEO is used as the primary source for microarray data, although the methods presented in subsequent chapters are applicable to other microarray data sources.

Figure 1.3: GEO Records and Organization. Platforms, Samples, Series, and Datasets are the data types in GEO. The relations between them are shown as arrows.

GEO Structure

GEO classifies all of the data deposited to it into two types of records: Platforms and Samples. Platform records are created for the equipment on which studies are performed and designated with a GPL identifier. They contain information about the manufacturer, probe identifiers, and mappings of probes to various gene or protein identifiers. Each experiment submitted to GEO is stored as a Sample record and designated with a GSM identifier. Sample records contain pre-normalized expression values and experimental procedures. If raw data is submitted for an experiment, it can be accessed when the Sample is accessed. Samples performed for the same study are grouped into Series, designated GSE. Series contain descriptions of study motivations and procedures as well as additional conclusions, analyses, and data. Series that meet GEO standards are curated into DataSets, designated GDS. DataSets have analyzed data and detailed information about study procedures, and they can be searched independently. Unfortunately, the rate of study deposition into GEO is greater than the rate at which GEO curates DataSets. It can take years before a Series is curated, and most studies have not been curated yet. Figure 1.3 summarizes GEO data types and their relations.

Expansion of Database

Before 2001, there were no guidelines for submitting microarray data to public repositories and results of microarray studies often could not be verified. In that year, the Minimum Information About a Microarray Experiment (MIAME) standard was created, establishing protocols for transparency with regard to gene expression studies. The MIAME guidelines require that raw data as well as normalized data be submitted to a public repository along with proper annotation of experiments and probe sets and descriptions of experimental procedures and analysis techniques used [9]. Since the adoption of MIAME standards for publishing gene expression studies by most scientific journals, the amount of data in public repositories for gene expression studies has been ever expanding. Having quickly adopted the standard, GEO has collected over 650,000 submissions with an estimated 10,000 more submitted each month [6, 34].

1.3.3 ArrayExpress: the European Counterpart

ArrayExpress is the other large and widely used free database. Created in the early 21st century, it has the same goals as GEO and has also adopted the MIAME standards [10, 92]. The key distinction between ArrayExpress and GEO is the institution that operates the database. GEO is run by NCBI while ArrayExpress is run by the European Bioinformatics Institute (EBI). The large databases mostly contain the same information and new information is shared between them on a regular basis.

1.4 Database Management Systems

Most databases today are designed in an entity-relationship structure, where an entity consists of information or references to the information and relations consist of interactions between the references [36]. The most prominent computational language used to access and modify databases is the Structured Query Language (SQL). A database management system maintains the storage of the data in the file system and allows data retrieval and alteration by SQL queries. Commonly used database management systems include Oracle, Microsoft SQL Server, PostgreSQL, and MySQL. In this study, we utilize the SQLite database management system [91] due to its minimal requirements in system resources. SQLite databases can be accessed interactively using client software or programmatically in computer languages like C++, Java, Python, MATLAB, Perl, and Ruby. Unless otherwise stated, all relational databases created for this thesis are based in SQLite and accessed by Python [26] and MATLAB [69].
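A minimal sketch of programmatic SQLite access from Python is given below, using the standard sqlite3 module and an in-memory database. The table names, column names, and record identifiers are hypothetical; they are loosely modeled on the GEO record types described in Section 1.3.2 and are not the schema used in this thesis.

```python
import sqlite3

# In-memory database for illustration; a file path could be given instead.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series (gse TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sample (
        gsm TEXT PRIMARY KEY,
        gse TEXT REFERENCES series(gse),
        gpl TEXT
    );
    """
)

# Hypothetical identifiers, used only for illustration.
conn.execute("INSERT INTO series VALUES (?, ?)", ("GSE0001", "Example study"))
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("GSM0001", "GSE0001", "GPL570"), ("GSM0002", "GSE0001", "GPL570")],
)
conn.commit()

# Retrieve every sample of one series that was run on one platform.
rows = conn.execute(
    "SELECT gsm FROM sample WHERE gse = ? AND gpl = ? ORDER BY gsm",
    ("GSE0001", "GPL570"),
).fetchall()
sample_ids = [row[0] for row in rows]
```

Parameterized queries (the `?` placeholders) keep identifiers out of the SQL text, which is convenient when many records are loaded in a loop.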

Chapter 2: Binary Representations of Gene Expression Studies Enable Efficient Searches by Content

The availability and rapid growth of microarray databases have made an integrated analysis of these databases computationally challenging. The implementation of content based searching in these databases will aid in this integrated analysis. In this chapter, a novel approach to content based searching in microarray databases that uses binary vector representations of differential gene expression profiles is presented, tested, and discussed. The performance of searches that use binary vector representations is compared with that of searches that use floating point measures, both with data from a single platform and with data from multiple platforms.

2.1 Background

After analyzing results from a gene expression study, researchers find it helpful to compare results with published studies in public databases. Manually searching the databases is inefficient and time consuming. GEO and other databases allow for topical searches to aid users, but these searches are biased by the incomplete categorization and indexing of datasets and do not reliably return all related datasets [96]. In particular, they do not explicitly retrieve datasets that produce similar results or share similar gene expression profiles. Searching for datasets with similar results is termed content based retrieval, which GEO does not currently provide.

2.1.1 Previous Attempts to Establish Content-based Searches

Only a few attempts have been made to establish content based searches in microarray databases. One of the earliest tools for comparing two differential expression profiles was based on a Bayesian similarity metric [62]. This approach was tested on a small collection of microarray studies and compared with the use of other similarity measures. The Bayesian similarity metric performed better than other metrics but not significantly so. This approach, although groundbreaking, could not be scaled to large databases.

A few years later, the first profile comparison algorithm that could be used with large databases was developed utilizing Spearman's rank correlation coefficient [55]. It was efficient for identifying datasets testing the same cell type as the query on the same technology as the query. Even though this approach had limited success when different technologies were compared, it was used to create CellMontage, which compares expression profiles of queries to database records [43]. Although CellMontage quickly identifies datasets testing similar cell types and tissues to those of a query sample, it provides only a small improvement over topical searches.

In 2008, an application called GeneChaser was developed to efficiently obtain datasets that differentially expressed query genes and retrieve relevant data from those datasets [20]. It did this by implementing a program that automatically converts probe set identifiers into correlated annotations [19]. The limitation of this approach was the small number of genes that could be searched simultaneously. Due to computational requirements, this approach could not handle a complete listing of genes from an experiment. Many microarray databases, such as SMD and ArrayExpress, adopted techniques similar to GeneChaser shortly thereafter [61, 81, 92].

Engreitz et al. addressed the speed of content based searches of differential expression profiles. They used the dimension reduction method of independent component analysis to increase the speed of searches 50 fold [37, 38]. The work compared gene expressions for every gene in an experiment, even genes with expressions that were not statistically different.

2.1.2 Inspiration from Chemoinformatics

The motivation for a novel content based search algorithm comes from the Chemoinformatics field. Chemoinformatics databases store chemical properties of a vast number of chemicals such that chemicals with similar properties can later be identified [13, 116]. The set of chemical features is most commonly represented using binary vectors, a format that greatly reduces space for storing data and time for searching this information. A binary vector indicates if an attribute is present using only one bit per attribute. For example, the ethyne molecule would receive a positive bit for a triple bond between atoms and a negative (empty) bit for a ketone group.

An early analysis of binary vector representations of chemical compounds showed that binary vectors were useful for identifying chemical similarities, but searches of binary vectors often produced generic results [41]. Small chemicals with few positive attributes were selected before chemicals that better resembled the query. Nevertheless, binary vectors continue to be widely used to represent chemical compounds. Binary representations have also been used in a wide variety of other fields such as taxonomy [103, 104, 111], paleoecology [53], image analysis [3], biometrics [73], and handwriting identification [17].

In 2011, McCall et al. developed an algorithm to convert data from a single microarray experiment to a binary representation called a barcode [86]. It used binary entropy to compare a large number of datasets. The work was limited to specific platforms and it focused on identifying tissue specific gene expression, rather than content-based searching. A novel content based search using a similar approach is discussed below.

2.1.3 Distance Measures

In order to utilize binary vector representations for searching, an appropriate binary similarity measure must be used to quantify the relatedness of two binary vectors. A similarity measure can be trivially converted to a distance measure by subtraction from a maximum possible similarity value. Without loss of generality, in this thesis distance relationships are used to compare vectors. Non-binary distance measures operating on floating point vectors are referred to as numerical distance measures.

Binary Distance Measures

Many binary distance measures have been proposed, along with their various alterations. Often, these variations are not given a unique name and are later mistaken for the original measure. To avoid confusion, the binary distances used in this work are listed with symbolic definitions in Table 2.1. The binary distance measures, Tanimoto, Modified Tanimoto, Hamming, Kulczynski, Yule, and Bit Correlation, are commonly used distance measures for comparing binary vectors [17, 40, 41, 53, 108, 126]. The simpler binary distance measures, Hamming and Tanimoto, require less computation time and are preferred in many applications. The other binary distance measures attempt to balance the contributions of different combinations of positive and negative traits that the Hamming and Tanimoto distance measures ignore.

Numerical Distance Measures

The numerical distance measures used in this study are given in Table 2.2. The numerical distance measures, Euclidean, Manhattan, Cosine, Pearson, Spearman's, and Bray-Curtis, were chosen for their widespread use and excellent performance under specific conditions [27, 38, 42, 74, 77, 103, 116]. The Cosine distance measure is more commonly referred to as a similarity measure, whereas the Pearson and Spearman's distance measures are used as measures of statistical correlation. These are converted to distance measures as defined in Table 2.2. The Euclidean, Pearson, and Spearman's distance measures have traditionally been the preferred distance measures for comparing microarray expression profiles.

Table 2.1: Definitions of Binary Distance Measures. $A$ and $B$ are both binary vectors of length $n$. $A^{+}$ and $A^{-}$ represent the number of positive and negative bits in $A$, respectively. $A^{\pm}B^{\pm}$ denotes the number of bits in the intersection of $A$ and $B$. $A^{\pm} \lor B^{\pm}$ denotes the number of bits in the union of $A$ and $B$. To avoid division by zero, vectors of all positive or all negative bits are removed.

Distance Measure     Symbol ($\delta$)    Definition
Tanimoto             $\delta_{\tau}$      $1 - \frac{A^{+}B^{+}}{A^{+} \lor B^{+}}$
Modified Tanimoto    $\delta_{\tau m}$    $1 - \alpha \frac{A^{+}B^{+}}{A^{+} \lor B^{+}} - (1 - \alpha) \frac{A^{-}B^{-}}{A^{-} \lor B^{-}}$, where $\alpha = \frac{2}{3} - \frac{A^{+} + B^{+}}{6n}$
Hamming              $\delta_{H}$         $1 - \frac{A^{+}B^{+} + A^{-}B^{-}}{n}$
Kulczynski           $\delta_{K}$         $1 - \frac{A^{+}B^{+}}{A^{+}B^{-} + A^{-}B^{+} + n}$
Yule                 $\delta_{Y}$         $\frac{2 \, A^{+}B^{-} \cdot A^{-}B^{+}}{A^{+}B^{+} \cdot A^{-}B^{-} + A^{+}B^{-} \cdot A^{-}B^{+}}$
Bit Correlation      $\delta_{BC}$        $0.5 - \frac{A^{+}B^{+} \cdot A^{-}B^{-} - A^{+}B^{-} \cdot A^{-}B^{+}}{2 \sqrt{A^{+}} \sqrt{A^{-}} \sqrt{B^{+}} \sqrt{B^{-}}}$
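To make the notation of Table 2.1 concrete, the sketch below computes four of the binary distances from the joint bit counts of two vectors. It is an illustration under stated assumptions, not the implementation used in this work: vectors are plain Python lists of 0 and 1, the function names are hypothetical, and the formulas follow a reading of Table 2.1 in which the union of positive bits equals the number of positions where either vector is positive.

```python
def contingency(a_bits, b_bits):
    """Return the joint counts (A+B+, A+B-, A-B+, A-B-) of two 0/1 vectors."""
    if len(a_bits) != len(b_bits):
        raise ValueError("vectors must have the same length")
    pp = sum(1 for x, y in zip(a_bits, b_bits) if x and y)
    pn = sum(1 for x, y in zip(a_bits, b_bits) if x and not y)
    np_ = sum(1 for x, y in zip(a_bits, b_bits) if not x and y)
    nn = len(a_bits) - pp - pn - np_
    return pp, pn, np_, nn


def tanimoto(a_bits, b_bits):
    pp, pn, np_, _ = contingency(a_bits, b_bits)
    return 1.0 - pp / (pp + pn + np_)  # denominator: union of positive bits


def hamming(a_bits, b_bits):
    pp, _, _, nn = contingency(a_bits, b_bits)
    return 1.0 - (pp + nn) / len(a_bits)


def kulczynski(a_bits, b_bits):
    pp, pn, np_, _ = contingency(a_bits, b_bits)
    return 1.0 - pp / (pn + np_ + len(a_bits))


def yule(a_bits, b_bits):
    pp, pn, np_, nn = contingency(a_bits, b_bits)
    return 2.0 * pn * np_ / (pp * nn + pn * np_)


# Two hypothetical eight-gene profiles.
a = [1, 1, 0, 0, 1, 0, 0, 0]
b = [1, 0, 0, 0, 1, 1, 0, 0]
distances = {f.__name__: f(a, b) for f in (tanimoto, hamming, kulczynski, yule)}
```

For these two vectors the Tanimoto distance is 0.5 and the Hamming distance is 0.25. The functions do not guard against zero denominators, which is one reason degenerate all-zero or all-one vectors are removed beforehand.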

2.2 Methods

In order to determine the applicability of binary vector representations to content based searching in microarray databases, two small datasets are constructed from records that are retrieved from GEO and analyzed. The first dataset contains studies performed on the most commonly used platform in GEO, the Affymetrix HG-U133 Plus 2.0 platform (GPL570), whereas the second dataset contains studies performed on multiple platforms. The results of the datasets are converted to binary and numerical vectors and validation studies are performed.

2.2.1 Dataset Acquisition

Two collections of microarray datasets are partitioned into five groups where each group is expected to produce a distinct expression profile. The first dataset is limited to studies performed on GPL570; the second contains studies performed on multiple platforms. For the single platform study, the datasets are divided into groups of Alzheimer's disease, breast cancer, cardiac complications, hepatic stress, and skeletal muscle. Studies are identified using a search of GEO Series for keywords related to each group and limited to the desired platform. In total, 54 series are included with 1322 profiles being created. Because the data are taken from Series records that have yet to be curated, the validity of the groups is highly questionable. A complete listing of these series, subsets, and profiles can be found in Appendix A.

For the multiple platform study, the datasets are composed of a compendium that was also used by Engreitz et al. [38] and datasets identified by keyword based searching of GEO DataSet titles. The use of curated DataSets ensures the validity of subsets and groups. The groups organize datasets of Duchenne muscular dystrophy, breast cancer, Huntington's disease, hepatic tissue exposed to a chemical stimulus, and cardiac tissue in conditions similar to cardiac arrest. Eighteen GEO DataSets involving hepatic tissue are identified using keywords such as hepatic and filtering results to ensure an appropriate stimulus was present. Eighteen GEO DataSets involving cardiac tissue are identified using keywords such as cardiac, atrium, and cardiovascular and filtered to ensure tissues were exposed to conditions similar to cardiac arrest. Subsets for each dataset are identified, resulting in a total of 289 subsets. A complete listing of these subsets can be found in Appendix B.

Table 2.2: Definitions of Numerical Distance Measures. $A$ and $B$ are vectors of fold changes of length $n$. $A_i$ represents the $i$th term of the vector. $\mu_A$ is the mean of vector $A$.

Distance Measure    Symbol ($\delta$)    Definition
Euclidean           $\delta_{euc}$       $\sqrt{\sum_i (A_i - B_i)^2}$
Manhattan           $\delta_{man}$       $\frac{1}{n} \sum_i |A_i - B_i|$
Cosine              $\delta_{cos}$       $1 - \frac{A \cdot B}{\|A\| \|B\|}$
Pearson             $\delta_{corr}$      $\sqrt{1 - \frac{1}{n-1} \sum_i \frac{A_i - \mu_A}{\mathrm{stdev}_A} \frac{B_i - \mu_B}{\mathrm{stdev}_B}}$
Spearman's          $\delta_{\rho}$      $\delta_{corr}(\mathrm{Rank}(A), \mathrm{Rank}(B))$
Bray-Curtis         $\delta_{bray}$      $\frac{\sum_i |A_i - B_i|}{\sum_i |A_i + B_i|}$
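For comparison with the binary measures, four of the numerical distances in Table 2.2 can be sketched in a few lines of Python. The function names and fold-change values are hypothetical. The Euclidean distance is written with the usual squared differences, the Manhattan distance keeps the 1/n scaling shown in the table, and the Pearson and Spearman's measures are omitted because their exact form in the table is harder to recover.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Mean absolute difference, following the 1/n scaling in Table 2.2."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def bray_curtis(a, b):
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(abs(x + y) for x, y in zip(a, b))
    return numerator / denominator


# Hypothetical fold-change vectors for four genes.
fold_a = [2.0, 0.5, 1.0, 4.0]
fold_b = [1.0, 0.5, 2.0, 4.0]
numeric = {f.__name__: f(fold_a, fold_b) for f in (euclidean, manhattan, cosine, bray_curtis)}
```

Each of these functions touches every floating point entry of both vectors, whereas the binary measures only need bit counts, which is the intuition behind the speed comparison made later in the chapter.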

2.2.2 Expression Profile Creation

Differential expression profiles are used to compare subsets in order to reduce the noise and variability seen with microarrays. Although there will be variability with microarrays regardless of the platform used, differential profiles have been shown to be adequate for comparisons [66, 109]. Differential profiles have also been shown to provide consistent and reproducible analytical results across different microarray platforms [50, 59].

Because the second dataset involves studies taken from experiments performed on different microarray platforms, a standard set of genes and homologous genes of those genes is created. Genes from GPL570 are chosen as the standard set as they are used in the single platform dataset. Genes from other platforms are identified according to platform data tables provided by GEO. If necessary, genes are mapped across species using the Homologene database provided by NCBI [125]. The mean expression value of every gene in each subset is calculated, and a pairwise comparison is done between all subsets in a dataset, with fold change determined as the ratio of mean expression values of a gene between subsets. Profiles are created using fold changes when two subsets are compared, but profiles are not created by comparing subsets from different datasets. In the single platform study, 1322 profiles are created. A listing of the profiles appears in Appendix A. In the multiple platform dataset, 947 profiles are created. A listing of the profiles appears in Appendix B.
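The fold-change step described above can be sketched as follows. This is a simplified illustration: the gene names and expression values are hypothetical, expression values are assumed to be on a linear (not logarithmic) scale, and genes that are missing from either subset are simply skipped, which may differ from how unmapped genes are actually handled.

```python
from statistics import mean


def fold_change_profile(subset_a, subset_b):
    """Ratio of mean expression per gene between two subsets of one dataset.

    Each subset maps a gene name to the expression values of its samples.
    Only genes present in both subsets are compared (an assumption).
    """
    shared = sorted(set(subset_a) & set(subset_b))
    return {gene: mean(subset_a[gene]) / mean(subset_b[gene]) for gene in shared}


# Hypothetical expression values for two subsets of the same dataset.
treated = {"GENE1": [200.0, 220.0, 180.0], "GENE2": [50.0, 55.0, 45.0]}
control = {
    "GENE1": [100.0, 110.0, 90.0],
    "GENE2": [50.0, 45.0, 55.0],
    "GENE3": [10.0, 12.0, 11.0],
}
profile = fold_change_profile(treated, control)
```

Here GENE1 doubles between the subsets, GENE2 is unchanged, and GENE3 is dropped because it appears in only one subset.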

2.2.3 Binary Vector Creation

Student's t-tests are performed to determine significant changes in expression, with significance determined by a p-value of less than 0.05. A more stringent significance cut-off and a correction for multiple hypothesis testing are not used in order to prevent the generation of profiles with very few or no differentially expressed genes. Every gene in the standard set is assigned one bit in the linear binary vectors representing whether the change in the expression is significant. Only one differentially expressed probe set for a gene is considered sufficient for that gene to be labeled significantly different. The direction of expression change is ignored. Binary vector representations that contain all-zeros or all-ones are filtered out. The corresponding numerical profiles are also removed.
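The conversion from per-probe p-values to a binary vector might look like the sketch below. The p-values are assumed to have been computed already by a t-test; the gene names, the dictionary layout, and the choice to assign a zero bit to genes with no probe set on the platform are assumptions made for illustration.

```python
def binary_profile(probe_pvalues, standard_genes, alpha=0.05):
    """One bit per gene: 1 if any probe set of the gene has p < alpha.

    Genes without any probe set receive 0 (an assumption).
    """
    return [
        1 if any(p < alpha for p in probe_pvalues.get(gene, [])) else 0
        for gene in standard_genes
    ]


def is_informative(bits):
    """False for all-zero or all-one vectors, which are filtered out."""
    return 0 < sum(bits) < len(bits)


# Hypothetical standard gene set and per-probe p-values for one comparison.
standard_genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
pvalues = {"GENE1": [0.20, 0.01], "GENE2": [0.30], "GENE3": [0.04]}
bits = binary_profile(pvalues, standard_genes)
keep = is_informative(bits)
```

GENE1 receives a positive bit because one of its two probe sets is significant, even though the other is not.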

2.2.4 Cross Validation Studies

In order to assess the accuracy of each distance measure, a leave-one-out cross validation study is conducted for both datasets. Each differential expression profile in a dataset is used as a query against the entire dataset. The results are ranked according to the distance from the query profile, from least to greatest. Six binary distance measures and six numerical distance measures are investigated.

Receiver operating characteristic (ROC) curves are produced for the rankings of expression profiles under each distance measure for both datasets. True positive selections are assumed to be from the same group as the query. The area under the curve (AUC) statistic is calculated to enable quantitative analysis. The percentage of true positive selections that are identified before a single false positive selection occurs is recorded for each distance measure. The time in which searches are completed when a distance measure is applied is also recorded.
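The two accuracy statistics described above can be computed directly from a ranked list of hits. The sketch below is illustrative only: the function names are hypothetical, the query itself is excluded from its own ranking (an assumption about how the leave-one-out step is handled), and ties in distance are not treated specially.

```python
def ranked_labels(query, distances, groups):
    """Rank all other profiles by distance to the query, nearest first.

    Returns a list of booleans: True where the hit shares the query's group.
    """
    others = [i for i in range(len(groups)) if i != query]
    others.sort(key=lambda i: distances[i])
    return [groups[i] == groups[query] for i in others]


def tpr_before_first_fp(labels):
    """Fraction of all true positives ranked ahead of the first false positive."""
    total = sum(labels)
    if total == 0:
        return 0.0
    found = 0
    for is_true_positive in labels:
        if not is_true_positive:
            break
        found += 1
    return found / total


def roc_auc(labels):
    """Area under the ROC curve for a tie-free ranking (best hit first)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count (positive, negative) pairs in which the positive is ranked first.
    concordant = 0
    positives_seen = 0
    for is_true_positive in labels:
        if is_true_positive:
            positives_seen += 1
        else:
            concordant += positives_seen
    return concordant / (positives * negatives)


# Hypothetical ranking: two hits from the query's group, then a miss, and so on.
labels = [True, True, False, True, False]
sensitivity_at_zero_fpr = tpr_before_first_fp(labels)
auc = roc_auc(labels)
```

Averaging these two values over every query in a dataset gives one summary figure per distance measure.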

2.3 Results

Two validation studies provide an opportunity to determine the accuracy of searching a microarray database with binary representations. The single platform dataset tests to determine if binary representations are efficient on one platform. The multiple platform study is needed to determine if binary representations are efficient on different technologies.

2.3.1 Single Platform Validation Study

The results of the single platform dataset are summarized in Table 2.3. ROC curves for each distance measure are shown in Figure 2.1. In general, the use of binary distance measures produces superior results in both accuracy and speed when compared with the use of numerical distance measures. The use of the Kulczynski distance measure generates the highest percentage of true positive selections before a false positive is selected. The second highest percentage occurs when the Tanimoto distance measure is used.

The average searching time when numerical measures are used is 59.5 ms. On the other hand, the average searching time when binary measures are utilized is only 4.40 ms. Among the binary distance measures, the use of the Hamming distance is the fastest. The use of the Tanimoto distance requires slightly more time. When the Modified Tanimoto measure is utilized, searches require 150% of the time needed when the Tanimoto distance measure is used.

The greatest area under the curve is achieved when the Kulczynski distance measure is utilized. The Yule distance measure is the only binary distance measure to produce an AUC value of less than 90 percent. Three numerical measures, the Manhattan, Spearman's, and Pearson distance measures, perform better than a random search, while only the Manhattan distance generates an AUC value similar to the results when binary measures are used. The results generated with the other numerical distance measures are worse than the result of a random search.

2.3.2 Multiple Platform Validation Study

The results of the multiple platform dataset are summarized in Table 2.4. ROC curves for each distance measure are shown in Figure 2.2. Again, the use of binary distance measures produces superior results in both accuracy and speed when compared with the use of numerical distance measures. As in the single platform study, the use of the Kulczynski distance measure generates the highest percentage of true positive selections before a false positive is selected in the multiple platform study. As opposed to the single platform study, where the Tanimoto distance is the second highest, the second highest percentage in the multiple platform study occurs when the Hamming distance measure is used. The average searching time when numerical measures are used is 28.6 ms. On the other hand, the average searching time when binary measures are utilized is only 4.00 ms. The ranking of distance measures by time required to search is the same as in the single platform study.

The greatest accuracy is achieved when the Bit Correlation distance measure is utilized. The use of the Yule distance measure produces the only other AUC value greater than 90 percent. Of note, the Cosine, Euclidean, Pearson, and Bray Curtis distance measures generate AUC values equivalent to a random search. Two numerical measures, the Manhattan and Spearman's distance measures, produce accuracies comparable to the results when binary measures are used. The results generated with the utilization of the other numerical distance measures are alike. This finding suggests that Euclidean and Pearson distance measures, which are commonly used in microarray data analysis, may not be appropriate for evaluating similarity of differential expression profiles from different technologies.

Table 2.3: Results of Single Platform Validation Study. True positive rates at a false positive rate of 0 (TPR|FPR=0) and average search time for distance measures (δ) are provided.

δ      TPR|FPR=0      Time (ms)        δ       TPR|FPR=0      Time (ms)
δτ     95.3 ± 12.6    2.19 ± 0.19      δeuc    0.77 ± 2.72    28.96 ± 4.84
δτm    84.9 ± 15.0    3.55 ± 0.26      δman    8.58 ± 14.4    40.04 ± 7.12
δH     93.9 ± 15.0    2.13 ± 0.16      δcos    0.95 ± 3.12    31.04 ± 4.66
δK     98.0 ± 8.8     4.77 ± 0.34      δcorr   1.35 ± 4.55    37.96 ± 7.02
δY     63.3 ± 36.3    5.94 ± 0.39      δρ      10.9 ± 8.71    35.42 ± 3.18
δBC    90.2 ± 23.0    5.97 ± 0.35      δbray   0.88 ± 3.05    48.45 ± 8.73

Figure 2.1: ROC Curves of Distance Measures Included in the Single Platform Validation Study. The solid curves plot the true positive rate (y-axis) against the false positive rate (x-axis), averaged over all queries, with error bars representing standard deviation. AUC is shown as a percentage of the total area. The dashed lines show the ROC curves for random selections. Numerical distance measures are in the left column; binary in the right.

2.4 Discussion

One of the main advantages of binary vector representations is the relatively small amount of storage space they require. Binary representations use one bit per gene whereas floating point numbers use 32 bits. The reduction of storage space approaches this ratio when the meta-data stored for each experiment is relatively small.

Table 2.4: Results of Multiple Platform Validation Study. True positive rates at a false positive rate of 0 (TPR|FPR=0) and average search time for distance measures (δ) are provided. The first search of a distance measure required the most time. In cases where the first search created deviations greater than the average search time, the first search was omitted from calculations.

δ      TPR|FPR=0      Time (ms)        δ       TPR|FPR=0      Time (ms)
δτ     31.9 ± 28.6    2.40 ± 0.17      δeuc    0.00 ± 0.04    19.59 ± 0.17
δτm    32.7 ± 22.3    3.66 ± 0.17      δman    28.4 ± 27.8    33.12 ± 4.64
δH     44.8 ± 22.3    2.24 ± 0.01      δcos    0.00 ± 0.07    18.88 ± 0.62
δK     55.0 ± 25.0    4.57 ± 0.01      δcorr   0.00 ± 0.05    22.46 ± 1.28
δY     23.6 ± 28.3    5.58 ± 0.01      δρ      35.2 ± 23.6    23.40 ± 1.14
δBC    37.1 ± 25.4    5.60 ± 0.01      δbray   0.00 ± 0.00    1.94 ± 0.19

Figure 2.2: ROC Curves of Distance Measures Included in Multiple Platform Validation Study. The solid curves plot the true positive rate (y-axis) against the false positive rate (x-axis), averaged over all queries, with error bars representing standard deviation. AUC is shown as a percentage of the total area. The dashed lines show the ROC curves for random selections. Numerical distance measures are in the left column; binary in the right.

The time required to perform a search is used as a criterion to select the three best measures from the validation studies. Although the Bit Correlation and the Kulczynski distance measures display the greatest accuracies over both validation studies, the time required to search the datasets when these distance measures are applied is greater than when the fastest methods are utilized. When the Bit Correlation distance measure is utilized, the time is much greater. The Kulczynski distance measure did produce the greatest initial sensitivity in both studies, but the time required to search when it is utilized is nearly double the time when the Hamming distance measure is used. The time required to search a large database with the use of numerical distance measures is prohibitive. Floating point representations require extensive memory for storage as well. Because their use requires the least amount of time and produces adequate accuracies as determined by the ROC curves and AUC values, the Hamming, Tanimoto, and Modified Tanimoto distance measures are chosen as the three best performing distance measures.

The accuracy improvement achieved when binary distance measures are used is surprising. Despite the fact that information is lost during discretization of the data, using binary distance measures improved accuracy. This may be due to several reasons. It is possible that the information lost during binarization is superfluous noise. Another possible cause is the heterogeneity of the datasets used, wherein binarization of the expression profiles makes datasets from different platforms and experimental conditions more comparable. Furthermore, the validation studies rely on the assumption that subsets taken from the same group should be more similar to each other than subsets taken from different groups. This assumption may not always be true. It is especially tested when groups of the same tissue type exposed to different stresses are used.
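The storage and speed arguments made in this discussion can be illustrated with a short Python sketch. It packs a profile into a single integer so that a distance becomes one XOR followed by a bit count. The helper names are hypothetical, the Hamming distance is taken in its standard form (number of differing positions), and 16,600 is used as an example vector length.

```python
def pack(bits):
    """Pack a sequence of 0/1 values into one integer (first bit is lowest)."""
    value = 0
    for position, bit in enumerate(bits):
        if bit:
            value |= 1 << position
    return value


def hamming(a, b):
    """Number of positions at which two packed vectors differ."""
    return bin(a ^ b).count("1")


def storage_bytes(n_genes, bits_per_gene):
    """Bytes needed to store one profile, rounded up to whole bytes."""
    return (n_genes * bits_per_gene + 7) // 8


a = pack([1, 0, 1, 1, 0])
b = pack([1, 1, 0, 1, 0])
differing = hamming(a, b)

# One bit per gene versus a 32-bit float per gene for a 16,600-gene profile.
binary_size = storage_bytes(16600, 1)
float_size = storage_bytes(16600, 32)
```

For a profile of this length the binary form takes 2,075 bytes against 66,400 bytes for 32-bit floats, which is the 32-fold ratio mentioned above before any meta-data is counted.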

2.5 Conclusion

Binary vector representations are suitable for searching a database of differential expression profiles of microarray data. The use of binary vectors yields accuracies equivalent to or better than floating point measures. The compression of data does not reduce the relevance of the results retrieved by content based searches of binary profiles. At the same time, fast bit-wise operations and the reduction in memory requirements of the binary representations significantly reduce the time required to search a database.

Chapter 3: Implementation of a Database of Binary Representations of Gene Expression Studies

Having demonstrated in Chapter 2 that the use of binary vector representations achieves accuracies equivalent to or better than the use of floating point measures, a database of binary representations from GEO DataSets is created. A profile from the database is used as a query to search the database which demonstrates that a modified Tanimoto distance measure is best suited for content-based searches of differential microarray profiles. The database is refined and implemented on a web server.

3.1 Background and Methods

In order to determine the scalability of binary representations and to identify the best distance measure for searching binary vector representations of differential expression profiles, a larger database containing over 7,500 profiles is established. Datasets are obtained from GEO DataSets posted before February 1, 2006. Differential expression profiles are constructed as in Chapter 2. Binary vector representations that contain all-zeros or all-ones are filtered out. The corresponding numerical profiles are also removed.

Based on the results from Chapter 2, the Tanimoto, Modified Tanimoto, and Hamming distance measures are used to search the large database. The best distance measure is chosen as the measure that produces a limited number of selections of false positive profiles, a maximum number of relevant profiles from the same GDS submission as the query, and a diverse selection of possible true positives. Selections are considered false positives if an immediate correlation between the selected profile and the query profile cannot be made and a literature search does not produce a likely connection between profiles. Selections are deemed true positives if a correlation can be made between profiles. Selections that do not compare biological conditions, such as profiles comparing healthy tissues or experimental techniques, are considered false positives.

A profile is chosen at random to serve as a query in an initial search of the database.

The query profile is taken from GDS1290, a study investigating the differences between CD4+ lymphocytes that develop into T-Helper 1 and 2 (Th1 and Th2) classes of cells with and without the presence of Transforming Growth Factor Beta (TGFβ). The query profile is between the subset of mononuclear cells under a controlled environment at 48 hours and the subset of mononuclear cells under an environment rich in interleukin 4 (IL-4) at 48 hours.

3.2 Results

The 100 least distant profiles from the query when each distance measure is applied are available in Appendix C. The average number of positive bits in the selected profiles, the average number of common positive bits between the query profile and the selected profiles, and the average number of uncommon bits between the profiles are recorded. The number of relevant profiles selected from the same dataset as the query is also recorded for each selected profile. These results are shown in Table 3.1.

The use of all three measures correctly identifies the sample query and the same profile as being the least distant to the query profile. This least distant profile is from the same dataset as the query and is subjected to similar conditions. When the Tanimoto and Modified Tanimoto distance measures are applied, the next 14 profiles are taken from GDS1290 and contain one of the subsets used in the query. There are only subtle changes in rank.

Table 3.1: Results of Initial Search of Large Database. The time required for searching, the average number of positive bits in the resulting profiles, the average number of common positive bits, and the number of profiles from the same submission as the query are shown for each distance measure (δ).

δ      Time (ms)   Positive Bits        Common Positive Bits   GDS1290 Profiles
δτ     12.78       2401.07 ± 1105.70    509.30 ± 211.27        49
δτm    18.77       1835.35 ± 370.91     441.02 ± 187.83        61
δH     8.54        25.72 ± 169.77       20.43 ± 143.52         2

The Hamming distance measure does not produce another profile from GDS1290.

Searching using the Hamming distance measure results in the 100 least distant profiles being clustered into five distinct distance groups, as displayed in Appendix C. The query profile and the closely related GDS1290 profile mentioned earlier form their own clusters. A profile from the dataset GDS1519 forms the third cluster. GDS1519 is from a study that induces fibroblasts to an antiviral state with beta interferon (IFN-β). The fourth and fifth clusters consist of 8 and 66 profiles, respectively, and are composed of datasets of viral infections or immune related conditions.

When the Tanimoto and Modified Tanimoto distance measures are used, searches return many of the same profiles. The second most common dataset from which the profiles are selected is GDS449, which investigates HIV infection and activation of the HIV gene Tat at different stages of the cell cycle. There are 17 and 19 of these profiles when the Tanimoto and Modified Tanimoto distance measures are used, respectively. Profiles from GDS262 are selected when both measures are used. GDS262 compares tissue samples from healthy individuals and Duchenne Muscular Dystrophy patients and compares expression in different tissues. Profiles from GDS1095, a dataset of hematopoietic stem cells, are highly ranked when either measure is applied. Other high ranking profiles include datasets investigating humoral immunity, the regulation of gamma interferon (IFN-γ), and the regulation of tumor necrosis factor alpha (TNF-α).

3.3 Discussion

In the query profile that is used to search the larger database, the IL-4 rich environment induces cells to polarize towards the Th2 lineage. Genes regulated for this lineage are significantly differentially regulated, and the genes involved in the Jak-STAT pathway are up regulated [82]. Genes indicative of the Th1 pathway are down regulated. IFN-γ, one of the Th1 pathway genes, has been well established as a marker for Th1 proliferation [30, 82, 87]. Immunosuppressive genes such as TGFβ are down regulated as well. Profiles that are returned as similar should have these pathways altered.

The use of the Hamming distance measure does not distinguish between profiles sufficiently. It results in large clusters of profiles. The titles of these profiles frequently contain the phrase viral infection, possibly because T lymphocytes proliferate in response to viral infection [45, 87]. The Hamming distance measure is determined by the sum of the common bits between two binary vectors. It does not include uncommon bits. As a result, the Hamming distance measure prefers the selection of binary vectors with a small number of positive bits [126]. The number of common negative bits begins to dominate. The average number of positive bits of the 100 least distant profiles as determined by the Hamming distance measure is much less than the number of positive bits in the query. In fact, 98 of the 100 least distant profiles contain fewer than six positive bits. These results suggest that the Hamming distance is not adequate for searching a large database of binary vector representations of gene expression profiles.

The use of the Tanimoto or the Modified Tanimoto distance measures is better suited for searching a large database of diverse binary vector representations. Both measures are able to identify profiles from the same sample and detect which profiles have similar binary representations. The profiles that are selected from GDS1290 typically include one of the subsets of the query and a subset similar to the other subset in the query profile. The profiles selected from GDS449, a study of cell cycle and the HIV gene Tat, should be very similar to the query as well. HIV infection causes an increase in Th2 proliferation, and activation of the Tat gene causes an increase in expression for many genes and pathways with significantly different expressions in the query [31, 79].
It is reasonable that profiles of hematopoietic stem cells should appear in the results. CD4+ cells are derived from hematopoietic stem cells and common genes are expressed in both. The selections of profiles of Duchenne Muscular Dystrophy, although not immediately associated with CD4+ proliferation, are reasonable as well. Inflammation and an immune response are believed to damage muscles in dystrophin deficient individuals [112]. The inflammation is believed to cause a significant increase in IFN-γ expression [21]. It seems logical that other profiles involving IFN-γ are identified by both measures. Also, profiles involving TNF-α are identified by both measures. Because TNF-α is down regulated during Th2 polarization [30, 82, 87], these profiles are expected to be found in a listing of the least distant profiles.

Not every profile that is selected can be easily identified as being related to the query profile. Some of these profiles may be false positives. A profile from GDS360 is seen in the results of both methods. GDS360 is a study comparing tissue biopsies from breast cancer patients treated with docetaxel. The results of the study do not implicate CD4+ lymphocytes or any of the targeted genes of the query as being involved in docetaxel resistance. The study did implicate several genes involved in the cell cycle [18]. These genes may be significantly differently regulated in the query, but further examination is needed. The other profile found by implementing both measures whose relationship is unclear comes from GDS367. GDS367 is a dataset comparing skeletal repair in normal and non-union fractures. TGFβ may be up regulated in non-union fractures, but fracture healing is a complicated and diverse process [131] that requires extensive study before this entry can be classified as a true positive.

Other potential false positives are specific to the distance measure used. Unlike the Hamming distance measure, which results in profiles with few positive bits, the Tanimoto distance measure results in profiles with many positive bits. The average number of positive bits in the least distant profiles that are found using the Tanimoto distance measure is nearly double the number of positive bits in the query. This selection preference has been noted in previous work [40, 41, 53, 126]. It arises because the Tanimoto distance measure does not include common negative bits. The Modified Tanimoto distance measure attempts to correct this bias by incorporating the contribution of common negative bits [40]. Incorporation of the modification decreases the difference between the number of positive bits in the query profile and the average number of bits in the profiles that are selected.

Use of the Modified Tanimoto distance measure increases the relevancy of the profiles selected as being similar. Fewer profiles establishing gene expressions for healthy tissues are selected than when the Tanimoto distance measure is used. Compared with the Modified Tanimoto distance measure, the Tanimoto distance measure results in fewer relevant profiles from GDS1290 being selected, a smaller ratio of common bits selected to positive bits selected, and the same number of submissions from which profiles originated. The drawback of using the Modified Tanimoto distance measure is the additional time required to search. When searching large databases, the Modified Tanimoto distance measure needs more time than the Hamming distance measure and the Tanimoto distance measure. Although the search time is increased, the gain in performance compensates for the lack of speed. For this reason, among the distance measures investigated in this chapter, the Modified Tanimoto distance measure is the best for searching a large database of binary vector representations of differential expression profiles.
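The opposite selection preferences discussed in this section can be reproduced with a toy example. The sketch below assumes the standard definitions of the two measures (Hamming distance as the count of differing positions, Tanimoto distance as one minus intersection over union of positive bits), which may differ in normalization from the definitions given in Chapter 2. Vectors are represented as sets of positive bit positions, and all sizes are made up.

```python
def hamming(a, b):
    """Count of positions that are positive in exactly one vector."""
    return len(a ^ b)


def tanimoto(a, b):
    """One minus shared positive bits over the union of positive bits."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Query with 1000 positive bits.
query = set(range(0, 1000))
# Candidate sharing 500 positive bits with the query (1500 positives in total).
overlapping = set(range(500, 2000))
# Candidate with only 5 positive bits, none shared with the query.
nearly_empty = set(range(5000, 5005))

hamming_overlapping = hamming(query, overlapping)
hamming_nearly_empty = hamming(query, nearly_empty)
tanimoto_overlapping = tanimoto(query, overlapping)
tanimoto_nearly_empty = tanimoto(query, nearly_empty)
```

Under these assumptions the nearly empty vector is closer to the query by Hamming distance even though it shares no positive bits, while the Tanimoto distance ranks the overlapping vector first. This matches the observation that Hamming searches return profiles with very few positive bits.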

3.4 Conclusion

Experiments on a large database of binary vector representations demonstrate that, among the distance measures investigated, the Modified Tanimoto distance measure is most appropriate for scalable and accurate content based retrieval of microarray differential expression profiles. An implementation allowing the use of any of the binary distance measures used in Chapter 2 is available as a web service at: http://sacan.biomed.drexel.edu/mageoindex/.

3.5 Future Work

The mappings between probe sets and genes rely upon an intermediate mapping to nucleotide accessions. This intermediate step may have led to incorrect mappings. GEO DataSets provide a direct mapping to a gene, and using this information instead of data table annotations will remove these errors. An updated database is under construction to improve mappings.

The expansion of the database to include GEO Series (GSE) is being considered. The difficulty associated with GSE files is the automated detection of unique subsets, because GSE records are not divided into subsets by GEO reviewers and sample nomenclature is inconsistent. A machine learning algorithm may be set up using the current platform definitions to generate binary vectors. An option to allow users to query the database with non-published data is also under development.

Chapter 4: Enrichment of Gene Expression Data Using Binary Vector Representations

After a microarray study produces a list of differentially regulated genes, the determination of which biological functions are associated with those genes may not be apparent due to the large number of genes in the list. Typically, the gene list is enriched by mapping genes to sets of biological associations and determining the statistical over-representation of gene sets in the mappings. In this chapter, a method of gene set enrichment based on the binary representation of gene expression databases established in Chapter 3 is introduced, tested, and its possible implementation is discussed.

4.1 Gene Set Enrichment

Obtaining a list of significantly differentially regulated genes from microarray data does not illuminate the biological implications of the list. These implications can be explored by using the knowledge stored in public databases, where sets of genes associated with functions, processes, and cellular components are available for over-representation enrichment analysis. If a set contains many of the genes found in a list of differentially regulated genes, the biological association of the set is likely to be important for the conditions being studied [60]. Many gene sets have been created and several methods to determine if a set is statistically enriched have been proposed. Any set of genes can be used for over-representation analysis, but a correlation between the gene set and a biological function may not exist. Therefore, a gene set should only be used if it can be obtained from a trusted source and verified for its accuracy. Two prominent and widely trusted sources of gene sets are introduced below.

4.1.1 KEGG Pathways

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of databases for understanding complex cellular processes. It has been maintained by the Kanehisa laboratories since 1995 and has become a popular resource for gene set enrichment [70]. It contains databases for genomic information, chemical substances, and disease and drug information, but it is best known for its annotations of molecular pathways. These pathways can be used to determine the complex interactions between gene products. An illustration of a KEGG pathway's representation of gene product interactions is shown in Figure 4.1. By creating a set of every gene shown in the pathway and ignoring the relations between gene products, the pathway can be simplified into a traditional gene set associated with biological importance. Ignoring relations between gene products causes enrichment to become difficult when the direction of regulation is considered. The enrichment should be reviewed carefully to ensure that the directions of regulation are consistent with the pathway before the enrichment is declared. Fortunately, the direction of regulation is typically ignored with gene expression analysis, and the pathway can be treated as a gene set. This thesis relies heavily upon KEGG for pathway information.

Figure 4.1: Example of KEGG Pathway. The complement cascade is part of KEGG Pathway hsa04610. Gene products are shown in boxes. Dashed lines show indirect effects between products. Solid lines with arrows represent activation. Solid lines with bars represent inhibition.

4.1.2 Gene Ontology

The Gene Ontology (GO) Consortium is a group dedicated to explaining how genes encode biological functions at the molecular, cellular, and tissue system levels. The GO database places genes into a hierarchical network of annotations where annotations are connected to each other by a controlled vocabulary of interactions [28]. An example of a GO annotation tree is given in Figure 4.2. Because GO is very detailed when identifying a gene product with a biological function, the enrichment of a differentially regulated list of genes using GO can be very confusing. A process that regulates a biological function will often have child terms for positive and negative regulation processes. If only the parent term is identified as being enriched, determining the direction of regulation associated with the list is not possible. To avoid this ambiguity, only GO terms without children are enriched in this thesis. If many similar associations are enriched, the significance of a common parent association may be used to more succinctly show enrichments.

Figure 4.2: Example of GO Hierarchy. The regulation of antigen processing and presentation is defined in GO:0002577. The yellow box is the child of its parent terms shown in white. The possible vocabularies for relations between terms are shown on the right.

4.2 Determining Significance

Numerous resources have been developed to aid in the automated enrichment of gene lists [60]. These resources can be classified into two groups by the algorithms used. The algorithms implement either variations of the hypergeometric test or permutation and rank-order tests. Prominent services that use the algorithms are discussed below.

4.2.1 DAVID

The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was created to enable the functional annotation and analysis of large lists of genes, particularly those generated from microarray analysis [33]. It was established in 2003 by the National Institute of Allergy and Infectious Diseases (NIAID), which is part of the National Institutes of Health (NIH). It allows users to submit a list of genes, compares the gene list to several sets of biological associations, and calculates the statistical probability that each set is over-represented using a modified hypergeometric test [56]. As defined in Equation 4.1, the hypergeometric test, also called Fisher's exact test, calculates the probability, p, of selecting up to a number of desired objects, k, from a known number of objects, s, containing a known number of desired objects, d, in a number of tries, t, removing an object once it has been selected.

$$p = \sum_{i=0}^{k} \frac{\binom{d}{i}\binom{s-d}{t-i}}{\binom{s}{t}} \tag{4.1}$$

There are approximately 20,000 genes in the genome and that approximation is used for the total number of objects, s. For the EASE modification, the number of desired objects selected, k, is reduced by a value, E, which is typically set to one to account for very small gene sets.

$$p = \sum_{i=0}^{k-E} \frac{\binom{d}{i}\binom{s-d}{t-i}}{\binom{s}{t}} \tag{4.2}$$

Because of its ease of use, DAVID has become widely used.
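The two sums can be evaluated exactly with integer binomial coefficients. The sketch below follows Equations 4.1 and 4.2 as printed, using the symbols k, s, d, t, and E from the text; the function names are hypothetical and this is not DAVID's implementation. Over-representation is usually judged from the complementary upper tail, which can be obtained as one minus the sum evaluated at k - 1.

```python
from math import comb


def cumulative_hypergeometric(k, s, d, t):
    """Probability of drawing at most k desired objects (Equation 4.1).

    s: total objects, d: desired objects among them, t: number of draws.
    """
    if k < 0:
        return 0.0
    upper = min(k, d, t)
    # comb() returns 0 when more items are requested than are available.
    numerator = sum(comb(d, i) * comb(s - d, t - i) for i in range(upper + 1))
    return numerator / comb(s, t)


def ease_cumulative(k, s, d, t, e=1):
    """Same sum with the upper limit reduced by E (Equation 4.2)."""
    return cumulative_hypergeometric(k - e, s, d, t)


# Small made-up example: 10 objects, 4 desired, 3 draws.
p_at_most_one = cumulative_hypergeometric(1, 10, 4, 3)
p_ease = ease_cumulative(1, 10, 4, 3)
p_all = cumulative_hypergeometric(3, 10, 4, 3)
```

Because Python integers are unbounded, the same functions work unchanged with s near 20,000 without overflow, at the cost of some speed.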

4.2.2 GSEA

Another common approach for the over-representation enrichment of gene lists is to use the Gene Set Enrichment Analysis (GSEA) program. GSEA utilizes random permutations and a weighted Kolmogorov-Smirnov test to determine over-representation [114]. An attractive aspect of GSEA is its ability to correct for multiple comparisons by accounting for false discovery rates. Because false discovery rates are not necessarily included in pre-normalized data and GSEA has been shown to perform poorly on some datasets [65], GSEA is not used in this thesis.

4.3 Motivation and Reasoning

The binary representation database established in Chapter 3 greatly reduces the time needed to search for and the space required to store profiles. An extension of the database can be created to allow for the efficient enrichment of profiles as binary representations. Using the same 16600 gene template, gene sets can be converted to binary with positive bits for genes in that set. Establishing a statistical measure that uses binary vectors to increase the speed of the enrichments presents a challenge as binary distance measures do not produce statistics of probabilities. The binary distance measures would need to be transformed into estimates of probability before statistical enrichment would be possible. In the following section, an approach to transform binary distance measures to estimates of probabilities is tested.

4.4 Methods

An enrichment algorithm mapping binary distances to EASE probabilities is created and tested. The algorithm uses an exponential transform obtained from plots of probabilities and binary distances.

4.4.1 Distance Measures

Binary vector representations of gene sets limit which distance measures can be used for over representation enrichment analysis. Binary vectors of gene sets contain a large fraction of empty bits as genes not in the set are not designated with positive bits. The mostly empty vectors are lacking information and distance measures that incorporate empty bits emphasize the lack of information. As a result, the only binary distance measure used in Chapter 2 that is appropriate for enrichment analysis without modification is the Tanimoto distance. As Equation 4.3 shows, the Tanimoto distance is the intersection divided by the union of positive bits in two vectors and does not include negative bits in the calculation.

$$\delta_{\tau} = 1 - \frac{A_{+}B_{+}}{A_{+} \vee B_{+}} \tag{4.3}$$

Another binary distance measure that was previously used can be altered for applicability to enrichment analysis. The Hamming distance is modified to ignore negative bits, as in Equation 4.4, and is referred to as the Ratio distance for the remainder of this chapter.

$$\delta_{R} = 1 - \frac{A_{+}B_{+}}{B_{+}} \tag{4.4}$$

$A_{+}B_{+}$ is the number of genes that appear in both the differentially regulated gene list and the gene set, and $B_{+}$ is the number of genes in the gene set.
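Both distances depend only on positive bits, so they can be sketched with sets of gene identifiers in place of full binary vectors. The function names and the example gene lists below are illustrative assumptions and are not taken from the thesis.

```python
def tanimoto_distance(profile_genes, gene_set):
    """Equation 4.3: one minus shared genes over the union of both lists."""
    union = profile_genes | gene_set
    if not union:
        return 0.0
    return 1.0 - len(profile_genes & gene_set) / len(union)


def ratio_distance(profile_genes, gene_set):
    """Equation 4.4: one minus shared genes over the size of the gene set."""
    if not gene_set:
        raise ValueError("gene set must contain at least one gene")
    return 1.0 - len(profile_genes & gene_set) / len(gene_set)


# Hypothetical differentially regulated list (A) and pathway gene set (B).
profile = {"IL4", "STAT6", "GATA3", "JAK1", "IFNG"}
pathway = {"STAT6", "JAK1", "JAK3", "IL4R"}

d_tanimoto = tanimoto_distance(profile, pathway)
d_ratio = ratio_distance(profile, pathway)
```

With two shared genes, the Tanimoto distance is 1 - 2/7 and the Ratio distance is 1 - 2/4. The Ratio distance is unaffected by how many other genes the profile contains, whereas the Tanimoto distance grows as the profile gains genes outside the set.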

4.4.2 Database Construction

Gene sets for KEGG pathways and GO associations are retrieved and mapped to the genes of GPL570. The mappings are kept as sparse matrices that can be converted quickly to binary vectors. A database of these sparse matrices is created with tables for each resource. To avoid contradictory enriched associations, only GO annotations without children terms are added to the database.
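The conversion between a sparse gene set and a binary vector can be illustrated as follows. This is only a sketch: the four-gene template and the helper names are hypothetical, and the actual database stores sparse matrices over the full GPL570-derived template rather than Python lists.

```python
def gene_set_to_bits(gene_set, template):
    """Return one bit per template gene: 1 if the gene is in the set, else 0.

    Genes in the set that are absent from the template are ignored.
    """
    members = set(gene_set)
    return [1 if gene in members else 0 for gene in template]


def bits_to_sparse(bits):
    """Keep only the indices of positive bits (a sparse representation)."""
    return [index for index, bit in enumerate(bits) if bit]


def sparse_to_bits(indices, length):
    """Expand stored indices back into a full binary vector."""
    bits = [0] * length
    for index in indices:
        bits[index] = 1
    return bits


# Hypothetical template and gene set, for illustration only.
template = ["TNF", "IL6", "ACTB", "TTC3"]
bits = gene_set_to_bits({"IL6", "TTC3", "NOT_ON_ARRAY"}, template)
sparse = bits_to_sparse(bits)
```

Storing only the indices of positive bits keeps mostly empty gene sets small, while the expansion back to a full vector is a single pass over the stored indices.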

4.4.3 Exponential Transforms

Enrichments are calculated using the EASE test, Equation 4.2, for a test profile from the binary representation database. Distance values between gene sets and the test profile are determined as well. Plots of P-values against distances are generated and equations of exponential curves that fit the data are produced. Because KEGG pathways and GO associations are independent, two equations, one for KEGG and one for GO, are derived from the plots for each binary distance that is used. The equations are implemented to predict the significance of distances from a profile with more significant genes and a profile with fewer significant genes. To validate the predicted enriched gene sets, EASE tests are conducted for both profiles and the enriched subsets are compared to the predicted enriched subsets by using a confusion matrix.

4.5 Results

The subset from GDS1290 used as a query in Chapter 2 is used as the test profile. A total of 1,278 bits are significantly regulated in the profile, with 95 KEGG pathways and 94 GO associations being enriched. The profile containing fewer significant bits is taken from GDS2856. It compares peripheral blood derived monocytes under control settings with cell responses 2 hours after lipopolysaccharide treatment and contains 501 significant genes. 3 KEGG pathways and 14 GO associations are enriched for this profile. The profile containing more significant bits is taken from GDS3435. It compares kidneys administered an intravenous dose of saline with kidneys administered a 200 mg/kg dose of vancomycin and contains 2,067 significant genes. 40 KEGG pathways and 47 GO associations are enriched for this profile.

The plots of P-values and binary distances with fitted exponential curves for the test profile from GDS1290 are shown in Figure 4.3. The equations of the fitted curves are given in Table 4.1. The resulting confusion matrix is displayed in Table 4.2.

The plots of P-values and binary distances roughly trend exponentially. The fitted curves do not follow the data points at the elbow of the data because the data include outliers that cause the curve to be more rounded. The fitted curves tend to fit KEGG pathways better than GO associations. The curves for the Ratio distance seem to follow different exponential trends, with coefficients appearing to depend on the number of significant genes in a given gene set.

Using the Tanimoto distance tends to predict that more sets are enriched than using the Ratio distance. As a result, the Tanimoto distance predicts more true positive and false positive gene sets. Many enriched gene sets are predicted to not be enriched with the use of both measures, but when the Ratio measure is used to predict the enrichment of KEGG pathways for the profile from GDS3435, numerous sets are predicted as being enriched. The large number of predictions results in many false positive gene sets being predicted.

Figure 4.3: Fit Exponential Curves of P-values and Binary Distances. Exponential curves are fit to plots of P-values and binary distances of enriched gene sets for the profile from GDS1290. Gene sets are represented as blue dots. Curves are shown as continuous red lines. (Left) KEGG Pathways. (Right) GO Associations. (Top) Tanimoto distance. (Bottom) Ratio distance.

Table 4.1: Equations of Exponential Fitted Curves. Equations for the curves that are fit in Figure 4.3 for KEGG pathways and GO associations are provided for distance measures (δ). Equations follow the form a ∗ e^(b∗x).

Gene Set   δ    Equation
KEGG       δT   3.674 ∗ 10^−110 ∗ e^(δ∗252.3)
KEGG       δR   1.323 ∗ 10^−9 ∗ e^(δ∗21.74)
GO         δT   3.183 ∗ 10^−36 ∗ e^(δ∗81.01)
GO         δR   0.04965 ∗ e^(δ∗2.54)
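To make the use of these equations concrete, the following sketch evaluates the fitted curves of Table 4.1 for a given distance. The coefficients are copied from the table. The dictionary layout, the function names, the cap of the prediction at 1, and the significance cutoff of 0.05 are assumptions made for illustration and are not stated in this chapter.

```python
import math

# (a, b) coefficients of a * e^(b * distance), copied from Table 4.1.
FITTED_CURVES = {
    ("KEGG", "tanimoto"): (3.674e-110, 252.3),
    ("KEGG", "ratio"): (1.323e-9, 21.74),
    ("GO", "tanimoto"): (3.183e-36, 81.01),
    ("GO", "ratio"): (0.04965, 2.54),
}


def predicted_p_value(resource, measure, distance):
    """Map a binary distance to an estimated EASE probability."""
    a, b = FITTED_CURVES[(resource, measure)]
    # Capping at 1 is an assumption; the raw curve can exceed 1 near distance 1.
    return min(1.0, a * math.exp(b * distance))


def predicted_enriched(resource, measure, distance, alpha=0.05):
    """Call a gene set enriched when the estimate falls below alpha.

    The default alpha of 0.05 is a hypothetical cutoff for this example.
    """
    return predicted_p_value(resource, measure, distance) < alpha


for d in (0.90, 0.95, 0.98, 1.00):
    print(d, predicted_p_value("KEGG", "tanimoto", d))
```

Because the exponents are large, small changes in distance move the estimate across many orders of magnitude, which is one reason the fit at the elbow of the data matters so much for the predictions.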

Table 4.2: Confusion Matrices for Test Enrichments. Given are the numbers of True Positive, False Positive, and False Negative enriched gene sets for the Tanimoto and Ratio distances. EASE enrichment was used to determine true enrichment values. TP - True Positives, FP - False Positives, and FN - False Negatives. True Negative enrichment is not included because most sets are not enriched for either distance.

            Tanimoto Distance              Ratio Distance
            GDS2856       GDS3435          GDS2856       GDS3435
Gene Set    TP  FP  FN    TP  FP  FN       TP  FP  FN    TP  FP  FN
KEGG         2   6   1    20  10  20        0   1   3    37  59   3
GO           1   3  13     7  13  40        0   0  14     1   0  46
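The counts in Table 4.2 can be summarized as the fraction of truly enriched sets that are recovered and the fraction of predictions that are correct. The sketch below does this with the counts copied from the table. The metric names (sensitivity and precision) and the choice to report None when a denominator is zero are choices made for this illustration, not measures reported in the thesis.

```python
# (TP, FP, FN) counts copied from Table 4.2.
CONFUSION = {
    ("tanimoto", "GDS2856", "KEGG"): (2, 6, 1),
    ("tanimoto", "GDS2856", "GO"): (1, 3, 13),
    ("tanimoto", "GDS3435", "KEGG"): (20, 10, 20),
    ("tanimoto", "GDS3435", "GO"): (7, 13, 40),
    ("ratio", "GDS2856", "KEGG"): (0, 1, 3),
    ("ratio", "GDS2856", "GO"): (0, 0, 14),
    ("ratio", "GDS3435", "KEGG"): (37, 59, 3),
    ("ratio", "GDS3435", "GO"): (1, 0, 46),
}


def summarize(tp, fp, fn):
    """Return (sensitivity, precision); None where the denominator is zero."""
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, precision


for key, counts in CONFUSION.items():
    print(key, summarize(*counts))
```

For every row, TP + FN equals the number of gene sets that the EASE test finds enriched for that profile, which gives a quick consistency check on the table.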

4.6 Discussion

Neither the Tanimoto distance nor the Ratio distance can adequately predict if a gene set is enriched using the equations in Table 4.1. The predictions using either distance are not reliable for statistical analysis. The limited predictability may be caused by the size of the set of profiles that is used to generate the exponential fits. The training set consists of only one profile, which may not represent the entire database of binary profiles. Using more profiles to develop the equations of the exponential curves should better summarize the entire database and lead to more accurate predictions.

The use of the Tanimoto distance is limited by the number of false positive predictions it makes. As can be found in Appendix D, these false positive selections tend to predict gene sets with EASE probabilities in the range of 0.1 to 0.25. The number of false predictions should be reduced after the fitted exponential curves are redefined. Further testing is required before the Tanimoto distance is implemented for enrichment analysis.

The Ratio distance should not be used for enrichment analysis of gene lists. Using the distance predicts too few or too many gene sets as being enriched. It does not incorporate the number of genes that are being tested. Because the probability that a gene set is enriched depends on the number of genes in the set and the number of genes being tested, the Ratio distance should not be included in further testing.

4.7 Conclusion and Future Work

A useful enrichment of a gene list produces a large ratio of the number of sets predicted to be enriched to the number of sets determined to be enriched, a small ratio of the number of sets incorrectly predicted as enriched to the number of sets correctly predicted as enriched, and a small ratio of the number of enriched sets that are not predicted to be enriched to the number of sets correctly predicted as enriched. The test that is described above shows that the Ratio distance is not appropriate for enrichment analysis, but a conclusion about the Tanimoto distance cannot be reached. It remains unclear if a transform between the Tanimoto distance and the widely used hypergeometric test can adequately predict gene set enrichments. Another study needs to be conducted before a final decision is made.

This study will increase the number of profiles used to create the exponential transforms by randomly selecting a third of the profiles in the database of binary profiles to include in the training set. The use of more profiles should generate a normal distribution of enrichments centered around an exponential curve, if one exists. This study should also address the method used to generate a curve that summarizes the enrichments. Methods other than the one presented above include, but are not limited to, the extrapolation of an exponential curve fit only at the elbow of the enrichments, the formation of a piece-wise function of moving averages, and the creation of a spline. The predicted gene sets may be compared to those determined by another enrichment method such as GSEA. If this study finds that the Tanimoto distance is appropriate for enrichment analysis, a service should be created for Tanimoto enrichments and a database of the enrichments can be created.

Chapter 5: Meta-analysis of Gene Expression During Host Responses to Infections

Numerous gene expression studies of host responses to infectious pathogens have been conducted and the data from these studies have been made available in the public repositories. Meta-analysis of these studies can identify host responses common among and unique to different pathogens and provide insight into molecular mechanisms of host responses. In this chapter, a meta-analysis of gene expression experiments during responses to infections is performed using an inverse variance effect size method. Using the methods described in Chapter 4, gene set enrichment shows a set of common cell cycle and immune response pathways. Differences between responses to varying pathogens can be partially attributed to mechanisms of host evasion by pathogens.

5.1 Background

The host response to infectious pathogens has been extensively studied. Numerous gene expression studies during infection have been conducted and a common host response to pathogens has been identified [57, 67, 93]. The common host response is initiated when pattern recognition receptors (PRRs) detect pathogen-associated molecular patterns (PAMPs) and start a signaling cascade to protect and alert nearby cells to the danger. Key classes of PRRs are Toll-like receptors (TLRs) and C-type lectin receptors (CLRs), which detect a variety of both intracellular and extracellular PAMPs. Once the pathogen is identified, an immune response specific to the invader is summoned through the release of cytokines and chemokines such as TNF-α, interferons, and interleukins [12, 107]. Mechanisms of these defenses include phagocytosis, antigen presentation, Natural Killer (NK) cell recruitment, B and T lymphocyte proliferation, complement activation, and apoptosis.

Unfortunately, it can be unreliable to draw conclusions from a single gene expression experiment because the data are often very noisy and similar experiments can produce contradicting results. Using data from many experiments yields more robust results, allowing for the generation of more reliable conclusions and hypotheses. Meta-analysis is the statistical process of combining and extracting information from related, yet independent, experiments to generate inferences that could not be discovered from individual studies alone. Gene expression is well suited for meta-analysis and has been utilized to study various [49, 100], auto-immune conditions [110], and pain [76]. For a successful meta-analysis of gene expression data, the statistical method used to combine data across studies, the criteria for data inclusion, the handling of different technologies, the methods for the normalization of arrays, and the aim of the work must be well defined before the meta-analysis is undertaken [14, 99].

Because of the vast number of gene expression studies of the host response to infectious pathogens available in public repositories, the host response is a strong candidate for meta-analysis. A meta-analysis of host responses should identify the mechanisms common among responses to different pathogens, highlight differences between responses to different pathogens, illuminate where information is incomplete, and propose novel hypotheses about interactions between host and pathogens.

5.2 Methods

The following describes a meta-analysis of gene expression studies to identify genes that are differentially expressed between healthy samples and samples during the early host responses to infectious pathogens. The inclusion criteria for infection experiments, data preparation procedure, and the utilization of an inverse variance effect size method to perform a meta-analysis across these experiments are detailed. Genes that are commonly differentially regulated in host responses to different classes of pathogens are identified. These genes are enriched by mappings to biological terms and pathways.

5.2.1 Data Acquisition

All data used in this chapter come from publicly available experimental data deposited in GEO. Candidate studies are identified using the keyword search capabilities of GEO. Keywords include but are not limited to host, pathogen, infection, virus, bacteria, and parasite. A manual curation of the resulting set of studies is undertaken to minimize the heterogeneity of experimental conditions and designs. Pre-normalized data in SOFT format are retrieved from GEO, and when pre-normalized data are not available, raw data are normalized using RMA without correcting for FDR.

The candidate datasets that are identified from the keyword search are filtered using a set of inclusion criteria. Studies are separated into subsets of samples where each subset tests a unique combination of experimental variables such as a time point after infection and a specific cell line. Studies that do not contain at least one control subset and at least one infection subset, with more than one sample in each, are removed from consideration. Studies are reviewed to ensure that genetic alterations are not made to host samples, experiments are performed on two color single channel arrays, and host tissues are derived from humans, , mice, or rats. For time series studies without a control subset, a subset of samples taken at the time of infection is used as a control. Study subsets taken around 24 hours after infection are used as infection subsets; this includes subsets from 18 to 48 hours post infection for studies without a subset taken 24 hours after infection. Clinical samples are exempt from this criterion because the exact time of infection is unknown.

Data are reviewed to ensure proper expression values, where probe sets missing expression values are removed from further analysis. Because many different technologies are used in the collected studies, probe set identifiers are mapped to sequence identifiers using GEO annotation files. Gene identifiers are then mapped to HGNC symbols [48] using Gene [11]. When needed, NCBI Homologene is used to convert genes across species.

5.2.2 Binary Search to Distinguish Taxonomical Groups

A cross validation is performed as in Chapter 2 to determine if profiles from studies using viral, bacterial, and eukaryotic pathogens are similar to each other. Selected profiles are considered true positives if the selected profile is from the same group as the query. ROC curves are constructed, and the AUC statistic is calculated for each taxonomical group and the entire dataset. Initial sensitivity is not recorded because the common host response is present in all groups, and false positive selections occur before many true positive profiles are selected.
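As an illustration of how an AUC statistic can be obtained from a ranked search result, the sketch below counts, for every pair of one true positive and one false positive profile, how often the true positive is ranked first. The function name and the toy rankings are hypothetical, and the sketch assumes a strict ranking without ties; it is not the cross validation code used in this work.

```python
def auc_from_ranking(hits):
    """Area under the ROC curve for one ranked result list.

    hits is ordered from most to least similar to the query; each entry is
    True when the retrieved profile belongs to the same group as the query.
    """
    n_pos = sum(1 for hit in hits if hit)
    n_neg = len(hits) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative profile")
    positives_seen = 0
    concordant_pairs = 0
    for hit in hits:
        if hit:
            positives_seen += 1
        else:
            # Every positive already seen outranks this negative profile.
            concordant_pairs += positives_seen
    return concordant_pairs / (n_pos * n_neg)


print(auc_from_ranking([True, False, True, False]))
```

A value near 0.5 corresponds to the diagonal of the ROC plot, that is, to a search that ranks same-group profiles no better than chance.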

5.2.3 Selection of a Meta-analysis Method

The selection of a statistical method for meta-analysis of gene expression studies is of utmost importance because there are numerous methods from which to choose with each having benefits and limitations. Vote counting is the simplest of these approaches in which genes that are most commonly differentially expressed in individual studies are deemed to be altered. Vote counting is most efficient for studies performed on the same technology because different technologies include different sets of genes, and vote counting does not adjust for the likelihood of genes appearing in a particular set. It also is limited by study selection bias and does not differentiate the direction of regulation [15, 54, 99]. Other algorithms, such as rank aggregation, calculate the fold change of gene expressions, rank them accordingly for individual studies, and combine ranks across multiple studies. As with vote counting, this approach is best suited for studies performed on a common platform, but unlike vote counting it enables a high degree of statistical control [32, 95, 132].

Fisher’s inverse χ² method uses p-values from individual studies to compute a combined statistic that follows a χ² distribution. P-values are determined by calculating the probability that two fold changes are equal by performing a t-test. It can be applied to studies from different technologies but cannot determine the direction of differentiation [54]. The effect size method calculates a statistic similar to a t-value for every gene in a study and combines these by including within and between study variations to produce a summary statistic. It can be used with studies from different technologies by inversely weighting the mean fold change of individual genes by variance and detects the direction of differentiation [23, 49, 99]. The effect size method is computationally intensive and may produce false positive discoveries, so care must be taken when interpreting its results. In this work, data are pooled from different technologies and the direction of differentiation is detected. Therefore, a mean weighted, inverse variance effect size method is implemented to perform a meta-analysis, as described below.
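For comparison with the effect size method, Fisher's combined statistic can be sketched as follows. This is the textbook form of the method (minus two times the sum of the log p-values, referred to a χ² distribution with 2k degrees of freedom for k studies), not code from this work, which does not use Fisher's method for its analysis. The function name is hypothetical, and the closed-form tail probability used here holds only because the degrees of freedom are even.

```python
import math


def fisher_combined(p_values):
    """Return (statistic, combined p-value) for k independent p-values.

    All p-values must be greater than 0 and at most 1.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


statistic, combined_p = fisher_combined([0.05, 0.05])
print(statistic, combined_p)
```

The inputs carry no sign, which illustrates why the method cannot report whether a gene is up or down regulated.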

5.2.4 Statistical Approach

Differential expression profiles are created comparing an infection subset and the appropriate control subset from the same study. Expression values are converted to Z-scores using mean weighted, inverse variance effect sizes as described in [23] for every profile. Briefly, the effect size for each gene in each profile is calculated using Hedges' correction for the sample size bias detectable with Cohen's calculation of mean difference. The calculation of the unbiased effect size is given in Equation 5.1 and its variance in Equation 5.2.

\[ g = d - \frac{3d}{4(n_e + n_c) - 9} \quad \text{where} \quad d = \frac{\bar{X}_e - \bar{X}_c}{S_p} \tag{5.1} \]

where e represents samples in the infection subset, c represents samples in the control subset, n is the number of samples in a given subset, X is the mean expression value of the given subset, Sp is the pooled standard deviation of samples from both subsets, d is Cohen's calculation of mean difference, and g is the unbiased effect size.
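The effect size calculation for a single gene can be sketched as below. The pooled standard deviation is computed with the usual degrees-of-freedom weighting of the two sample variances, which is an assumption because the text does not spell out its formula. The function name and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(infected, control):
    """Unbiased effect size of Equation 5.1 for one gene in one profile.

    infected and control are expression values for the two subsets; each
    subset needs at least two samples and the pooled variance must be nonzero.
    """
    n_e, n_c = len(infected), len(control)
    # Assumed pooled variance: sample variances weighted by degrees of freedom.
    pooled_var = ((n_e - 1) * variance(infected) +
                  (n_c - 1) * variance(control)) / (n_e + n_c - 2)
    d = (mean(infected) - mean(control)) / math.sqrt(pooled_var)
    # Hedges' small-sample correction applied to Cohen's d.
    return d - 3.0 * d / (4.0 * (n_e + n_c) - 9.0)


# Hypothetical log-scale expression values for one gene.
g = hedges_g([8.0, 9.0, 10.0], [5.0, 6.0, 7.0])
print(g)
```

With only three samples per subset the correction is substantial, which is why the bias adjustment matters for the small subsets that are common in microarray studies.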

\[ \sigma_g^2 = \frac{(n_e + n_c)^2 + n_e n_c g^2}{2 n_e n_c (n_e + n_c)} \tag{5.2} \]

Heterogeneity is assumed for all genes and a random effects model is implemented to calculate the weighted mean of effect sizes, µ_τ. Cochran’s Q, Equation 5.3, is traditionally used to test for homogeneity but is required here for the calculation of between study variation.

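\[ Q = \sum \frac{(g - \varsigma)^2}{\sigma_g^2} \quad \text{where} \quad \varsigma = \frac{\sum \frac{g}{\sigma_g^2}}{\sum \frac{1}{\sigma_g^2}} \tag{5.3} \]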

Before calculating the weighted mean expression of each gene, the between-study variation, τ², is estimated using a method of moments, given in Equation 5.4, where k is the number of studies. The weighted mean of effect sizes and its variance are given in Equation 5.5.

\[ \tau^2 = \max\left(0, \frac{(Q - k + 1) \sum \frac{1}{\sigma_g^2}}{\left(\sum \frac{1}{\sigma_g^2}\right)^2 - \sum \frac{1}{\sigma_g^4}}\right) \tag{5.4} \]

\[ \mu_\tau = \sigma_\mu^2 \sum \frac{g}{\sigma_g^2 + \tau^2} \quad \text{where} \quad \sigma_\mu^2 = \frac{1}{\sum \frac{1}{\sigma_g^2 + \tau^2}} \tag{5.5} \]

A Z-score is calculated as the ratio of the weighted mean of effect sizes to its standard deviation. Positive Z-scores represent up regulation and negative Z-scores represent down regulation in the infected samples compared to their respective control samples. Z-scores whose absolute values exceed the critical value of the normal distribution centered at 0 with a unity deviation corresponding to α = 0.0001 are designated as significantly different. False discovery rates are calculated for every gene using the pFDR method [113].

The resulting lists of differentially expressed genes are enriched by mapping significant genes to KEGG pathways and GO process associations. A modified hypergeometric test is used to determine the significance of enrichments using the total number of genes detected as the population size.
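The combination of effect sizes across profiles, following Equations 5.3 to 5.5, can be sketched for a single gene as follows. The inputs are per-profile effect sizes and their variances, so the sketch does not depend on how the variances are obtained. The function names are hypothetical, and treating α = 0.0001 as a two-sided threshold is an assumption, since the text does not say whether the test is one-sided or two-sided.

```python
import math
from statistics import NormalDist


def random_effects_z(effects, variances):
    """Return (mu, var_mu, tau2, z) for one gene across k profiles."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Inverse variance weighted mean used inside Cochran's Q (Equation 5.3).
    fixed_mean = sum(w * g for w, g in zip(weights, effects)) / sum_w
    q = sum(w * (g - fixed_mean) ** 2 for w, g in zip(weights, effects))
    # Method of moments estimate of between-study variation (Equation 5.4).
    denominator = sum_w ** 2 - sum(w * w for w in weights)
    tau2 = max(0.0, (q - k + 1) * sum_w / denominator) if denominator > 0 else 0.0
    # Random effects weighted mean and its variance (Equation 5.5).
    var_mu = 1.0 / sum(1.0 / (v + tau2) for v in variances)
    mu = var_mu * sum(g / (v + tau2) for g, v in zip(effects, variances))
    return mu, var_mu, tau2, mu / math.sqrt(var_mu)


def is_significant(z, alpha=1e-4):
    """Two-sided test against the standard normal (two-sidedness assumed)."""
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)


mu, var_mu, tau2, z = random_effects_z([0.0, 2.0], [1.0, 1.0])
print(mu, var_mu, tau2, z)
```

When the profiles disagree, τ² grows and inflates the variance of the weighted mean, which pulls the Z-score toward zero; a gene therefore needs consistent effects across profiles to pass a stringent threshold.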

5.3 Results

Data are taken from 284 studies which are performed on 72 different microarray platforms. 480 differential expression profiles between control and infected subsets, which consist of 5,448 individual samples, are created. Because they are used as controls for many infection subsets, 1,281 control samples are used more than once. A complete listing of the studies, platforms, subsets, profiles, and samples that are used for analysis can be found in Appendix E.

94 species of pathogens are included in the gene expression analysis. For convenience, the profiles are grouped by pathogenic species into three broad taxonomical categories: Viruses, Bacteria, and Eukaryota, with 223, 173, and 82 profiles, respectively. These categories are further divided into evolutionary subgroups. A detailed phylogenetic tree can be found in Appendix E.

Figure 5.1: ROC Curves from Cross Validation Study. ROC curves are shown when profiles from the entire set and the bacteria, eukaryota, and viruses taxonomical groups are used to query a binary representation database. The expected result of a random search is provided as a dashed red line.

The ROC curves from the cross validation study are displayed in Figure 5.1. The curve for the entire set is most similar to the curve for the viruses group because the viruses group contains the greatest number of profiles. As expected, the initial sensitivities are poor for all groups. The AUC statistics for the entire set, bacteria, eukaryota, and viruses groups are 55.5%, 57.2%, 50.8%, and 55.9%, respectively. These AUC statistics are greater than 55% for every group but the eukaryota group, suggesting that there are small but noticeable differences between groups.

Significant genes are determined at different taxonomic levels: the collection of all profiles, taxonomical groups, evolutionary subgroups, and species. A total of 25,246 genes are detected and are used as the population size for enrichment calculations. A complete listing of significant genes can be found in Appendix F.

The number of genes that are up regulated in a taxonomical group reflects the number of profiles in the group, but a larger ratio of up regulated genes to significant genes is found for bacteria than for viruses or eukaryota. The viruses taxonomical group contains the greatest number of profiles and generates the greatest number of significant genes among taxonomical groups.

Figure 5.2: Venn Diagrams of Differentially Regulated Genes. Differential expression is tested for three taxonomical groups. The number of common genes refers to genes that are found in different taxonomical groups, not genes that are regulated in the entire collection analysis.

Although gene expression is analyzed at the species level as well, there are often too few profiles testing samples that are exposed to a specific pathogenic species to generate the confidence that is needed to designate genes as significant. Several evolutionary groups have an insufficient number of samples, which results in few or no significant genes. For example, no genes are designated as significant from the two profiles of the eukaryotic evolutionary subgroup Amoebozoa. Fewer than one hundred significant genes are found for the bacterial evolutionary subgroups Chlamydiae and Bacteroidetes. If a less stringent α is used, more genes for every group and subgroup will be considered significant.

Across the three taxonomic categories, totals of 1,536 and 3,611 genes are significantly differentially up and down regulated, respectively. 61 genes are commonly up regulated and 35 genes are commonly down regulated from these categories, as shown in Figure 5.2. Significant genes are mapped to KEGG pathways for the collection of profiles and taxonomical groups. Using GO, genes are associated with biological processes. Figure 5.3 a) provides a simple taxonomical breakdown giving the number of significant genes in each subgroup. Figure 5.3 b) shows enriched signaling and disease associated KEGG pathways.

Figure 5.3: Phylogenetic Tree and Enrichment of Significant Genes. a) Phylogenetic tree of organisms in gene expression analysis. Viruses are classified by the Baltimore Classification. Bacteria and Eukaryotes are classified by phyla. Opisthokonta is used as a phylum for Animalia and Fungi. The number of significant genes is provided after the group title. b) Heat map of enriched KEGG pathways. (Left) Signaling pathways. (Right) Disease associated pathways.

Most cellular process pathways are not strongly enriched. A detailed listing of enriched pathways and process associations can be found in Appendix F. No KEGG pathways are enriched for the evolutionary subgroups dsRNA viruses and Fusobacteria. Two GO process associations involving mRNA splicing are significantly enriched for dsRNA viruses, but no associations are significantly enriched for Fusobacteria. These subgroups are not shown in Figure 5.3.

The differences between enrichments for subgroups are subtle but suggest pathogens have similar tactics to evade host responses. A typical strategy for evasion manipulates TNF signaling to prevent apoptosis, both of which are part of the common host response. Pathways for TNF signaling and Apoptosis are significantly enriched for most subgroups but are not enriched for the dsDNA virus, ssRNA RT virus, Actinobacteria, and Euglenozoa subgroups. Pathogens of the dsDNA virus subgroup can inhibit TNF-α to prevent apoptosis [4, 24]. A prominent pathogen of the ssRNA RT virus subgroup is HIV, which manipulates the TNF signaling pathway to prevent apoptosis and support viral replication [46]. The Actinobacteria evolutionary subgroup consists mostly of profiles of cells exposed to Mycobacterium pathogens. The TNF signaling and Apoptosis pathways may not be enriched for this subgroup because strains of M. tuberculosis contain a virulence factor that inhibits host cell apoptosis and alters the expression of many signaling pathways [71]. Many Euglenozoa also evade host immune responses by blocking the TNF signaling and Apoptosis pathways [94]. The GO associations for apoptosis and TNF activity are not enriched for these subgroups.

The enrichment of a disease associated pathway caused by a particular pathogen does not necessarily correspond to the evolutionary subgroup of that pathogen. Influenza A and Toxoplasmosis are significantly enriched for ssRNA(-) viruses and Apicomplexa, respectively, but are also enriched for other subgroups. Epstein-Barr virus infection is not enriched for dsDNA viruses, Tuberculosis is not significantly enriched for Actinobacteria, and Malaria is not enriched for Apicomplexa. Three disease associated pathways are not significantly enriched for any subgroup although the causative pathogen is included in this work.

5.4 Discussion

Differentially regulated genes that are detected support the previously mentioned con- cept of a common host response. This is evidenced by the ROC curves and the AUC statistics resembling a random search as the differences between profiles are subtle. Many of the significantly regulated genes support the innate immune response. KEGG pathways and GO associations for the innate immune response are highly up regulated as well. The cytokines of the common host response, TNF-α, interferons, and interleukins, are up regu- lated and the corresponding pathways and associations are enriched. Many genes induced by interferons are heavily up regulated. This is especially true when cells are exposed to 54 viral pathogens. Genes and GO annotations that are involved in self-recognition and anti- gen processing and presentation are up regulated, although the KEGG pathway for antigen presentation is not. For the most part, PRRs and associated genes are up regulated. TLR2, which recog- nizes a wide range of extracellular stimuli, is up regulated for bacterial and eukaryotic pathogens. TLR8, which recognizes cytosomal single stranded RNA, is up regulated for viral and eukaryotic pathogens. TLR6 is only significantly up regulated for eukaryotes. In combination with TLR2, it recognizes PAMPs found most commonly on eukaryotic and bacterial invaders. TLR4 recognizes lipopolysaccharide from gram-negative bacteria. Be- cause the Bacteria taxonomical group contains gram-positive bacteria, TLR4 expression is not significantly regulated for the taxonomical group but is up regulated for the exclusively gram-negative Proteobacteria evolutionary subgroup. TLR3 which recognizes intracellular double stranded RNA is up regulated for viral and eukaryotic pathogens. Many down- stream targets of TLRs are up regulated including MYD88 for all taxonomical groups and RIPK1, IRF7, and IL-6 for viral and bacterial pathogens. 
The TLR signaling pathway for both KEGG pathways and GO processes is significantly enriched for taxonomical groups and evolutionary subgroups. In fact, GO pathways for TLRs specific for a particular type of pathogen are among the most significantly enriched GO annotations when samples are exposed to that pathogen. Many CLR genes and downstream targets are up regulated. In particular, CLEC4E is up regulated for every taxonomical group. The GO process for stimulatory CLR signaling is up regulated for every taxonomical group. Genes in the com- plement cascade are generally up regulated, although the Complement and Coagulation Cascade KEGG pathway is only weakly up regulated and the complement GO process is not enriched. Several down regulated genes are found for all taxonomical groups. The beta (ACTB) gene is strongly down regulated for viral and bacterial pathogens and is the most significantly down regulated gene for the collection of profiles. However, it is up regulated 55 for eukaryotic pathogens. This may be due to the diversity of eukaryotic species and the relatively small number of eukaryotic profiles used. It may also be the result of detect- ing the actin gene from the motile eukaryotic pathogens. Because of the discrepancy of up regulation of ACTB for the Eukaryota taxonomical group, the confidence that the down regulation of ACTB is part of the common host response is diminished. The zinc-finger pro- tein CXXC5 induces the tumor suppressor protein which maintains genome integrity and whose expression often leads to apoptosis [129]. CXXC5 also has been described as a repressor of WNT signaling [1]. It may be down regulated to limit p53 expression prevent- ing apoptosis and to increase WNT signaling. The nuclear envelope protein GMCL1 has been indicated to enhance the degradation of the double minute 2 homolog protein (MDM2) [84]. MDM2 is a primary repressor of p53 suggesting that GMCL1 is an indirect activator of p53. 
Down regulation of GMCL1 may disrupt MDM2 degradation leading to p53 inactivation. Like CXXC5, GMCL1 may be down regulated to limit the activity of p53 and thus preventing apoptosis. Another commonly down regulated gene, TTC3, does not directly alter p53 expression; rather, it interacts with the PI3K/AKT pathway. The PI3K/AKT pathway is a key determinant of survival or apoptosis and as such is involved in numerous activities in the cell. The members of the AKT family tended to be down regulated during infection. Surprisingly, the downstream targets of the kinase family, especially during viral infection, behaved as if AKT is up regulated. One possible explanation for this is the significant down regulation of TTC3, an ubiquitin E3 responsible for preparing phosphorylated nuclear AKT for proteasome degradation [89, 115, 119]. With the down regulation of TTC3, AKT accumulates in the cell as if it were up regulated and stimulates the signaling pathway although AKT expression is reduced. Therefore, this data may implicate the inhibition of TTC3 as an integral process for the immune response to pathogens. If the down regulation of TTC3 is important for proper host response to pathogens, it could be relevant to Down’s syndrome. Down’s syndrome (DS) is the complete or partial trisomy of chromosome 21 which 56 encodes a region known as the Down’s Syndrome Critical Region (DSCR). DS patients are prone to developing infections and tend to require longer periods of time to clear the infection [52, 98]. TTC3 is located on chromosome 21 in the DSCR and is likely over expressed in DS as a result of the additional copy of the gene [127, 123]. These results suggest the importance of TTC3 inhibition for defending against infection. The extra copy of TTC3 in DS may prevent its proper inhibition and prevent AKT from triggering the downstream targets of the pathway. 
In support of the concept of a common host response, enrichments in the gene expression analysis are very similar for all taxonomical groups. KEGG signaling and disease associated pathways and the corresponding GO processes are generally enriched, while metabolic pathways and processes are not. Pathogenic species co-evolve with host species and as a result have developed ways to manipulate the host response, as is seen with the pathways for TNF signaling and Apoptosis. Because studies of live pathogens are included in this work, some of the genes found here may have been regulated by the pathogen to subvert host detection and clearance. It would have been preferable to use only inactive pathogens to remove subversions of the host response. Unfortunately, GEO contains too few studies using inactive pathogens to limit the scope of the work in this way. A partial correction for pathogenic subversion is the broad range of pathogens included in this work. Because pathogens alter the host response in a species-specific manner, the impact of these subversions should be diminished.

5.5 Conclusion

Meta-analysis is a useful tool for supporting and developing hypotheses. The results of the meta-analysis performed in this chapter support the concept of a common host response and implicate inhibition as an important part of that response. Genes like CXXC5, GMCL1, and TTC3 may contribute strongly to the common response. Further studies using molecular biology techniques that specifically target these genes and their translated proteins should be conducted to test these hypotheses. Many other genes of the common host response are up regulated for the collection of profiles. Differences in the regulation of these genes and in enrichment profiles between evolutionary subgroups suggest how pathogens try to evade the common host response. More studies of host responses to inactivated pathogens would illuminate parts of the common host response obscured by pathogenic evasion.

Chapter 6: Meta-analysis of miRNA Expression During Host Responses to Infections

In Chapter 5, a meta-analysis of gene expression during host responses to infection is performed. In this chapter, the expression of another type of nucleic acid, microRNA (miRNA), during infection is studied. By mapping miRNA to gene products, a deeper understanding of the regulation of gene expression is gained.

6.1 Background

Although most of the expression studies submitted to public repositories have focused on mRNA, mRNA is not the only type of sequence that represents the cellular regulatory state. miRNAs are an increasingly studied class of short oligonucleotides, around 22 nucleotides in length, that regulate gene expression post-transcriptionally. As a mechanism of RNA silencing, an exception to translation, miRNAs bind to complementary mRNA strands and prevent the information contained in those strands from being translated into proteins. A miRNA sequence can bind to multiple mRNA sequences, and an mRNA sequence can be targeted by many miRNA sequences. The targeting of mRNAs by miRNAs is determined mostly by sequence complementarity. Figure 6.1 provides a schematic showing how miRNA alters gene expression. Because many disorders and conditions are caused by a deregulation of gene expression, it is no surprise that miRNAs have been implicated in a variety of disorders such as cancer, Alzheimer's disease, and myocardial disease [16, 51, 117, 124]. miRNAs have only recently been discovered, and many microarray studies have been performed to determine miRNA expression in various conditions. The data from these studies are also deposited into public databases and are well suited for meta-analysis.

Figure 6.1: Diagram of miRNA Function. After transcription, a miRNA forms a double stranded precursor stem loop with itself. The stem loop is cleaved, producing mature miRNA strands. The miRNA strands attach to complementary mRNA strands. Translation of the mRNA is prevented or the mRNA is destroyed. Modified from [47].

A meta-analysis of miRNA studies is conducted to identify miRNA that are differentially expressed between healthy samples and samples during the early host response to infectious pathogens. Using the mRNA targets of differentially expressed miRNAs, an estimation of changes to gene expression can be made. In the following sections, the inclusion criteria for infection experiments, the data preparation procedures, and the use of an inverse variance effect size method to perform a meta-analysis across these experiments are summarized. The miRNAs found to be commonly differentially regulated in host responses to different classes of pathogens are then presented, along with the enriched biological terms and pathways in which their target genes are involved.

6.2 Methods

All data used in this chapter come from publicly available experimental data deposited in GEO. Candidate studies are identified using the keyword search capabilities of GEO. Keywords include but are not limited to host, pathogen, infection, virus, bacteria, and parasite. A manual curation of the resulting set of studies is undertaken to minimize the heterogeneity of experimental conditions and designs. Pre-normalized data in SOFT format are retrieved from GEO, and when pre-normalized data are not available, raw data are normalized using RMA without correcting for FDR.

The candidate datasets identified from the keyword search are filtered using a set of inclusion criteria. Studies are separated into subsets of samples where each subset tests a unique combination of experimental variables such as a time point after infection and a specific cell line. Studies that do not contain at least one control subset and at least one infection subset, with more than one sample in each, are removed from consideration. Studies are reviewed to ensure that genetic alterations are not made to host samples, experiments are performed on two color single channel arrays, and host tissues are derived from humans, primates, mice, or rats. For time series studies without a control subset, subsets of samples taken at the time of infection are used as controls. Study subsets taken around 24 hours after infection are used as infection subsets; for studies without a subset taken 24 hours after infection, this includes subsets from 18 to 48 hours post infection. Clinical samples are exempt from this criterion because the exact time of infection is unknown.

Data are reviewed to ensure proper expression values, and probe sets missing expression values are removed from further analysis. Because many different technologies are used in the collected studies, probe set identifiers are mapped to sequence identifiers using GEO annotation files.
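The time point rule described above can be illustrated with a short Python sketch. This is not the code used in this work; the `Subset` record, its field names, and the `pick_infection_subset` helper are hypothetical, and selecting the subset closest to 24 hours within the 18 to 48 hour window is an assumption about how competing subsets would be resolved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subset:
    """Hypothetical record for one group of samples within a study."""
    label: str
    hours_post_infection: Optional[float]  # None for clinical samples
    n_samples: int
    is_control: bool


def pick_infection_subset(subsets: List[Subset]) -> Optional[Subset]:
    """Return the infection subset closest to 24 h within the 18-48 h window.

    Subsets with a single sample are skipped, mirroring the requirement of
    more than one sample per subset. Clinical subsets, whose infection time
    is unknown, are exempt from the time window (assumed behavior).
    """
    candidates = [s for s in subsets if not s.is_control and s.n_samples > 1]
    clinical = [s for s in candidates if s.hours_post_infection is None]
    if clinical:
        return clinical[0]
    timed = [s for s in candidates if 18 <= s.hours_post_infection <= 48]
    if not timed:
        return None
    return min(timed, key=lambda s: abs(s.hours_post_infection - 24))
```

In this sketch a study whose only infection subsets fall outside the window yields `None` and would be dropped from consideration.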
miRNA identifiers are mapped to current miRBase family identifiers [75]. miRTarBase [58] is used to map miRNA to gene targets, which are then used for further enrichment. Differential expression profiles are created by comparing an infection subset with the appropriate control subset from the same study. Expression values are converted to Z-scores using mean weighted, inverse variance effect sizes as described in Equations 5.1 through 5.5.
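As an illustration of the inverse variance approach, the following Python sketch combines per-profile effect sizes into a single Z-score. It assumes a standard fixed-effects formulation (a standardized mean difference with its approximate sampling variance); Equations 5.1 through 5.5 may differ in detail, and the function names `effect_size` and `combined_z` are hypothetical.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def effect_size(infected: Sequence[float],
                control: Sequence[float]) -> Tuple[float, float]:
    """Standardized mean difference and its approximate sampling variance.

    Each group needs at least two samples and a nonzero pooled variance.
    """
    n1, n2 = len(infected), len(control)
    pooled_var = ((n1 - 1) * variance(infected)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    d = (mean(infected) - mean(control)) / math.sqrt(pooled_var)
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


def combined_z(effects: List[Tuple[float, float]]) -> float:
    """Inverse variance weighted mean effect divided by its standard error."""
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return pooled / math.sqrt(1.0 / sum(weights))
```

Profiles with smaller sampling variance receive larger weights, so studies with more samples contribute more to the combined score.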

6.3 Results

Data are taken from 45 studies performed on 33 platforms, from which 73 profiles are created. Nineteen species of pathogens are included in the analysis. A complete listing of the studies, platforms, subsets, profiles, and samples that are used can be found in Appendix G. miRNA expression is analyzed and significant regulation is determined. A complete listing of significantly up regulated and down regulated miRNA can also be found in Appendix G. Figure 6.3 a) shows the taxonomic division of the pathogens that are included. Unlike for gene expression, increasing the number of profiles in a group does not tend to produce more significant results for miRNA expression. The ssRNA RT virus subgroup does not generate any significant miRNA and is not included in Figure 6.3. The power to determine significance is not limited by sample size, because this subgroup contains the most profiles of any miRNA evolutionary subgroup. It is believed that the diverse experimental procedures of the profiles in the subgroup diminish the ability to designate miRNA as significant. The data from the subgroup are included in further analysis.

Across the three taxonomic categories, totals of 156 and 81 miRNA are significantly up and down regulated, respectively. Two miRNA (miR-142 and miR-652) are commonly up regulated, but no miRNA are commonly down regulated across these categories, as shown in Figure 6.2.

Figure 6.2: Venn Diagrams of Differentially Regulated miRNA. Differential expression is tested for three taxonomical groups. The number of common miRNA refers to miRNA that are found in different taxonomical groups, not miRNA that are regulated in the entire collection analysis.

After mapping miRNA to target mRNA, an enrichment of significant mRNA targets is performed to determine targeted KEGG pathways and GO associations. Figure 6.3 b) shows the signaling and cellular process KEGG pathways that are enriched. Disease associated pathways tend to be enriched for all evolutionary subgroups. A listing of enriched pathways and associations can be found in Appendix H. Many cellular process pathways are not enriched for the Proteobacteria evolutionary subgroup because there are few mRNA targets of the significant miRNA. The ssRNA(-) virus subgroup generates the same number of significant miRNA, but these miRNA target many mRNA, which significantly enrich more pathways.

Figure 6.3: Phylogenetic Tree and Enrichment of Significant miRNA. a) Phylogenetic tree of organisms in miRNA expression analysis. Viruses are classified by the Baltimore Classification. Bacteria and Eukaryotes are classified by phyla. Opisthokonta is used as a phylum for Animalia and Fungi. The number of significant miRNA as well as the number of targeted genes are provided after the group title. b) Heat map of enriched KEGG pathways. (Left) Signaling pathways. (Right) Cellular process pathways.

Enriched pathways for miRNA analysis tend to be enriched for every evolutionary subgroup, but notable exceptions occur with both ssRNA virus subgroups and the Opisthokonta subgroup. Apoptosis is significantly enriched for these subgroups but not for other subgroups. Protein processing in endoplasmic reticulum and other pathways exhibit a similar enrichment pattern. The opposite enrichment pattern is seen for the Hedgehog signaling pathway, which is not significantly enriched for either ssRNA virus or Opisthokonta subgroups but is significantly enriched for the other subgroups. Despite these correlations between enrichments of the ssRNA virus and Opisthokonta subgroups, caution should be used before a strong relation between subgroups is assigned. Only a limited number of profiles using a eukaryotic pathogen is included in this analysis, which greatly limits the analysis of the miRNA expression of the Opisthokonta subgroup. Further studies of miRNA expression from cells exposed to these pathogens are needed before a meaningful conclusion can be made.
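The mapping of significant miRNA to targets followed by pathway enrichment can be sketched in Python as below. The sketch assumes a hypergeometric over-representation test, which is a common choice but is not confirmed here as the procedure used in this work; the function names and the small miRNA-to-target dictionary in the usage note are hypothetical and do not reproduce miRTarBase content.

```python
from math import comb
from typing import Dict, Iterable, Set


def target_genes(mirnas: Iterable[str],
                 targets: Dict[str, Set[str]]) -> Set[str]:
    """Union of the gene targets of the given miRNA (unknown miRNA ignored)."""
    genes: Set[str] = set()
    for mirna in mirnas:
        genes |= targets.get(mirna, set())
    return genes


def hypergeom_pvalue(hits: int, draws: int,
                     successes: int, population: int) -> float:
    """P(X >= hits) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` belong to the pathway."""
    upper = min(draws, successes)
    total = comb(population, draws)
    tail = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(hits, upper + 1))
    return tail / total
```

For example, with a hypothetical table `{"miR-a": {"G1", "G2"}, "miR-b": {"G2", "G3"}}`, `target_genes(["miR-a", "miR-b"], table)` yields three targeted genes, and `hypergeom_pvalue` then scores how many of them fall in a given pathway relative to the gene background. A subgroup with few targets, as noted for Proteobacteria, has little chance of reaching significance under such a test.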

6.4 Discussion

miRNA expression provides a negative control of gene expression, either by restricting gene expression before it becomes detrimental to the cell or by suppressing gene expression to change cellular behavior. Some miRNA expression changes are characteristic of the common host response and are consistent across all pathogens. miR-155 suppresses negative regulators of inflammation and has been found to be up regulated upon infection by numerous pathogens [39, 90, 102, 118]. The up regulation of miR-142 and miR-652 is also characteristic of the common host response [68, 78, 88, 106].

In further support of the concept of a common host response, enrichments are very similar for all taxonomical groups in the miRNA expression analysis. KEGG signaling and disease associated pathways and the corresponding GO processes are generally enriched while metabolic pathways and processes are not enriched, as miRNA are used to limit the over expression of these pathways and processes. Metabolic pathways and processes are enriched for miRNA analysis as part of the host response to limit unnecessary metabolism.

Other miRNA modify the common response to enable a defense specific to a particular pathogen. This is exemplified by miR-29, a suppressor of IFN-γ [83]. IFN-γ is a powerful inflammatory molecule that switches cells to an antiviral state when over expressed. Because an antiviral state is not an appropriate response to most bacterial pathogens, IFN-γ expression is closely controlled by the up regulated miR-29 when cells are exposed to bacterial pathogens. miR-29 is not differentially regulated when cells are exposed to viral pathogens [80, 122, 130] because an antiviral state is preferable. Like miR-29, miR-221 is up regulated when cells are exposed to bacterial pathogens and not differentially regulated when cells are exposed to viral pathogens. One of the mRNA targets of miR-221 is the TLR adaptor molecule, TICAM1.
TICAM1 only interacts with TLR3, whose ligand is intracellular double stranded RNA typically associated with viral pathogens [12]. During viral infection, it is more beneficial to leave miR-221 unchanged, and thereby forgo down regulating its other targets that are normally down regulated during the common host response, than to interfere with PAMP recognition. The only other miRNA known to target TICAM1, miR-193, is also not significantly regulated in response to viral pathogens. Down regulation of miRNAs like miR-29 and miR-221 is uncommon because their expression levels are low initially and some expression is often needed to prevent enduring infection [85].

6.5 Conclusion

The meta-analysis supports the notion that miRNA are used to direct the cell to adapt specifically to an environmental condition. More studies of both gene expression and miRNA expression of hosts exposed to eukaryotic pathogens are needed to better understand the interaction between hosts and eukaryotic pathogens. Binary vectors are not utilized in this chapter because a binary template for miRNA is unnecessary. Because miRNA have not been standardized, a binary template would require too many bits. Also, miRNA are not directly correlated with their targets, so a binary representation is not applicable.

Chapter 7: Conclusion

Data from gene expression studies performed on high-throughput technologies require large amounts of storage space. Databases of these data are very large and cannot be searched efficiently for similarity using current methods, so the information they contain is not fully utilized. In this thesis, binary representations of these data are used to create a method for content based searches of the databases to further explore the knowledge contained in the studies.

Lists of differentially expressed genes are converted to a binary representation in which a positive bit denotes that a gene is present in the list and an empty bit denotes its absence. A modification of the Tanimoto distance is selected as the measure of similarity because of its speed and the relevance of the profiles it selects as similar. The modification uses negative (empty) bits in addition to the positive bits traditionally used in the Tanimoto distance. This requires an additional calculation, which takes more time but returns more similar profiles. Even though the Hamming and unmodified Tanimoto distances require less time to perform searches, the modified Tanimoto distance is considered the best distance measure that is tested. The Kulczynski distance could be a better distance measure based on the relevance of the selected profiles, but it is not studied further in this thesis because the time required to search using the measure is not favorable.

Over-representation enrichment may be possible using binary representations but may not be preferable to traditional methods. This is because the binary distances used to determine enrichment do not produce statistical probabilities and must be transformed.
The Ratio distance, in which the fraction of significant genes in a set is determined, cannot be transformed into a probability because the transformations appear to depend on the numerator. The Tanimoto distance may prove to be adequate for enrichment if a proper conversion can be identified. Until that time, it remains unclear whether binary methods produce results equivalent to those of traditional methods and whether binary methods reduce the time needed to calculate the scores. A proper conversion may be established using a sufficiently large number of profiles to generate a normal distribution of distances with respect to calculated probabilities and a stringent exponential curve fit.

Binary representations are able to contribute to a meta-analysis by ensuring significant differences between sample groupings, in much the same fashion as analysis of variance (ANOVA) tests.

In this thesis, a meta-analysis of gene expression studies of host responses to infections is conducted to distinguish differences between the responses to different types of pathogens. The results show that a common response is present regardless of the pathogen that is encountered. Subtle but noticeable differences between responses to different pathogens can be determined, but limited amounts of data reduce the statistical confidence in these differences. The results of the meta-analysis suggest the importance of down regulated genes, especially genes that inhibit other genes. These include genes such as CXXC5, GMCL1, and TTC3, which could be future candidate targets for drugs against infection.

Along with the meta-analysis of gene expression studies, a meta-analysis of miRNA expression studies of host responses to infections is conducted. miRNA are a recently discovered class of oligonucleotides that alter post-transcriptional expression to respond to specific environments. The results of the meta-analysis show that miRNA are responsible for modifying the common host response to specific types of pathogens.
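The binary representation and the distance measures summarized in this chapter can be sketched in Python as follows. The gene template, the function names, and in particular the `alpha` weight in `tanimoto_with_empty_bits` are hypothetical; the last function only illustrates the idea of also scoring shared empty bits and is not the exact modification defined earlier in this thesis. The functions return similarities; a distance can be taken as one minus the similarity.

```python
from typing import Iterable, List


def to_bits(gene_list: Iterable[str], template: List[str]) -> int:
    """Encode a gene list as an integer bit vector over a fixed gene template."""
    present = set(gene_list)
    bits = 0
    for i, gene in enumerate(template):
        if gene in present:
            bits |= 1 << i  # positive bit: gene is in the list
    return bits


def tanimoto_similarity(a: int, b: int) -> float:
    """Shared positive bits divided by bits that are positive in either vector."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def hamming_distance(a: int, b: int) -> int:
    """Number of template positions at which the two vectors differ."""
    return bin(a ^ b).count("1")


def tanimoto_with_empty_bits(a: int, b: int, n_bits: int,
                             alpha: float = 0.5) -> float:
    """Blend of Tanimoto on positive bits and Tanimoto on empty bits.

    `alpha` is an assumed weight chosen for illustration only.
    """
    mask = (1 << n_bits) - 1
    on_empty = tanimoto_similarity(~a & mask, ~b & mask)
    return alpha * tanimoto_similarity(a, b) + (1 - alpha) * on_empty
```

Because each comparison reduces to a few bitwise operations and bit counts, a query vector can be compared against a large collection of stored profiles quickly, at the cost of one extra Tanimoto calculation when empty bits are included.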

References

[1] Therese Andersson, Erik Södersten, Joshua K Duckworth, Anna Cascante, Nicolas Fritz, Paola Sacchetti, Igor Cervenka, Vitezslav Bryja, and Ola Hermanson. Cxxc5 is a novel bmp4-regulated modulator of wnt signaling in neural stem cells. Journal of Biological Chemistry, 284(6):3672–3681, 2009.

[2] Aseem Z Ansari. Chemical crosshairs on the central dogma. Nature chemical biology, 3(1):2–7, 2007.

[3] Ismail Avcibaş, Mehdi Kharrazi, Nasir Memon, and Bülent Sankur. Image steganalysis with binary similarity measures. EURASIP Journal on Applied Signal Processing, 2005:2749–2757, 2005.

[4] J Baillie, DA Sahlender, and JH Sinclair. Human cytomegalovirus infection inhibits tumor necrosis factor alpha (tnf-α) signaling by targeting the 55-kilodalton tnf-α receptor. Journal of virology, 77(12):7007–7016, 2003.

[5] Catherine A Ball, Alvis Brazma, Helen Causton, Steve Chervitz, Ron Edgar, Pascal Hingamp, John C Matese, Helen Parkinson, John Quackenbush, Martin Ringwald, et al. Submission of microarray data to public repositories. PLoS biology, 2:1276–1277, 2004.

[6] Tanya Barrett, Dennis B Troup, Stephen E Wilhite, Pierre Ledoux, Carlos Evangelista, Irene F Kim, Maxim Tomashevsky, Kimberly A Marshall, Katherine H Phillippy, Patti M Sherman, et al. Ncbi geo: archive for functional genomics data sets - 10 years on. Nucleic acids research, 39(suppl 1):D1005–D1010, 2011.

[7] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), pages 289–300, 1995.

[8] Benjamin M Bolstad, Rafael A Irizarry, Magnus Åstrand, and Terence P Speed. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2):185–193, 2003.

[9] Alvis Brazma, Pascal Hingamp, John Quackenbush, Gavin Sherlock, Paul Spellman, Chris Stoeckert, John Aach, Wilhelm Ansorge, Catherine A Ball, Helen C Causton, et al. Minimum information about a microarray experiment (miame) - toward standards for microarray data. Nature genetics, 29(4):365–371, 2001.

[10] Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo, Niran Abeygunawardena, Ele Holloway, Misha Kapushesky, Patrick Kemmeren, Gonzalo Garcia Lara, et al. Arrayexpress - a public repository for microarray gene expression data at the ebi. Nucleic acids research, 31(1):68–71, 2003.

[11] Garth R Brown, Vichet Hem, Kenneth S Katz, Michael Ovetsky, Craig Wallin, Olga Ermolaeva, Igor Tolstoy, Tatiana Tatusova, Kim D Pruitt, Donna R Maglott, et al. Gene: a gene-centered information resource at ncbi. Nucleic acids research, 43(D1):D36–D42, 2015.

[12] Jonathan Brown, Huizhi Wang, George N Hajishengallis, and Michael Martin. Tlr-signaling networks: an integration of adaptor molecules, kinases, and cross-talk. Journal of dental research, 90(4):417–427, 2011.

[13] Nathan Brown. Chemoinformatics - an introduction for computer scientists. ACM Computing Surveys (CSUR), 41(2):8, 2009.

[14] Patrick Cahan, Felicia Rovegno, Denise Mooney, John C Newman, Georges St Laurent, and Timothy A McCaffrey. Meta-analysis of microarray results: challenges, opportunities, and recommendations for standardization. Gene, 401(1):12–18, 2007.

[15] Anna Campain and Yee H Yang. Comparison study of microarray meta-analysis methods. BMC bioinformatics, 11(1):408, 2010.

[16] James WF Catto, Antonio Alcaraz, Anders S Bjartell, Ralph De Vere White, Christopher P Evans, Susanne Fussel, Freddie C Hamdy, Olli Kallioniemi, Lourdes Mengual, Thorsten Schlomm, et al. Microrna in prostate, bladder, and kidney cancer: a systematic review. European urology, 59(5):671–681, 2011.

[17] Sung-Hyuk Cha, Sungsoo Yoon, and Charles C Tappert. On binary similarity measures for handwritten character recognition. In Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on, pages 4–8. IEEE, 2005.

[18] Jenny C Chang, Eric C Wooten, Anna Tsimelzon, Susan G Hilsenbeck, M Carolina Gutierrez, Richard Elledge, Syed Mohsin, C Kent Osborne, Gary C Chamness, D Craig Allred, et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. The Lancet, 362(9381):362–369, 2003.

[19] Rong Chen, Li Li, and Atul J Butte. Ailun: reannotating gene expression data automatically. Nature methods, 4(11):879–879, 2007.

[20] Rong Chen, Rohan Mallelwar, Ajit Thosar, Shivkumar Venkatasubrahmanyam, and Atul J Butte. Genechaser: identifying all biological and clinical conditions in which genes of interest are differentially expressed. BMC bioinformatics, 9(1):548, 2008.

[21] Yi-Wen Chen, Po Zhao, Rehannah Borup, and Eric P Hoffman. Expression profiling in the muscular dystrophies: identification of novel aspects of molecular pathophysiology. The Journal of cell biology, 151(6):1321–1336, 2000.

[22] J Michael Cherry, Eurie L Hong, Craig Amundsen, Rama Balakrishnan, Gail Binkley, Esther T Chan, Karen R Christie, Maria C Costanzo, Selina S Dwight, Stacia R Engel, et al. Saccharomyces genome database: the genomics resource of budding yeast. Nucleic acids research, page gkr1029, 2011.

[23] Jung Kyoon Choi, Ungsik Yu, Sangsoo Kim, and Ook Joon Yoo. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics, 19(suppl 1):i84–i90, 2003.

[24] Huai-Chia Chuang, Jong-Ding Lay, Shuang-En Chuang, Wen-Chuan Hsieh, Yao Chang, and Ih-Jen Su. Epstein-barr virus (ebv) latent membrane protein-1 down-regulates tumor necrosis factor-α (tnf-α) receptor-1 and confers resistance to tnf-α-induced apoptosis in t cells: Implication for the progression to t-cell lymphoma in ebv-associated hemophagocytic syndrome. The American journal of pathology, 170(5):1607–1617, 2007.

[25] William S Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American statistical association, 74(368):829–836, 1979.

[26] Peter JA Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11):1422–1423, 2009.

[27] L Mike Conner and Bruce D Leopold. A euclidean distance metric to index dispersion from radiotelemetry data. Wildlife Society Bulletin, pages 783–786, 2001.

[28] Gene Ontology Consortium et al. Gene ontology consortium: going forward. Nucleic acids research, 43(D1):D1049–D1056, 2015.

[29] Francis Crick et al. Central dogma of molecular biology. Nature, 227(5258):561–563, 1970.

[30] Annalisa D’Andrea, Xiaojing Ma, Miguel Aste-Amezaga, Carla Paganin, and Giorgio Trinchieri. Stimulatory and inhibitory effects of interleukin (il)-4 and il-13 on the production of cytokines by human peripheral blood mononuclear cells: priming for il-12 and tumor necrosis factor alpha production. The Journal of experimental medicine, 181(2):537–546, 1995.

[31] Cynthia De La Fuente, Francisco Santiago, Longwen Deng, Carolyne Eadie, Irene Zilberman, Kylene Kehn, Anil Maddukuri, Shanese Baylor, Kaili Wu, Chee G Lee, et al. Gene expression profile of hiv-1 tat expressing cells: a close interplay between proliferative and differentiation signals. BMC biochemistry, 3(1):14, 2002.

[32] Robert P DeConde, Sarah Hawley, Seth Falcon, Nigel Clegg, Beatrice Knudsen, and Ruth Etzioni. Combining results of microarray experiments: a rank aggregation approach. Statistical Applications in Genetics and Molecular Biology, 5(1), 2006.

[33] Glynn Dennis Jr, Brad T Sherman, Douglas A Hosack, Jun Yang, Wei Gao, H Clifford Lane, Richard A Lempicki, et al. David: database for annotation, visualization, and integrated discovery. Genome biol, 4(5):P3, 2003.

[34] Ron Edgar, Michael Domrachev, and Alex E Lash. Gene expression omnibus: Ncbi gene expression and hybridization array data repository. Nucleic acids research, 30(1):207–210, 2002.

[35] Michael B Eisen, Paul T Spellman, Patrick O Brown, and David Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–14868, 1998.

[36] Ramez Elmasri and Shamkant B Navathe. Fundamentals of database systems. Pearson, 2014.

[37] Jesse M Engreitz, Rong Chen, Alexander A Morgan, Joel T Dudley, Rohan Mallelwar, and Atul J Butte. Profilechaser: searching microarray repositories based on genome-wide patterns of differential expression. Bioinformatics, 27(23):3317–3318, 2011.

[38] Jesse M Engreitz, Alexander A Morgan, Joel T Dudley, Rong Chen, Rahul Thathoo, Russ B Altman, and Atul J Butte. Content-based microarray search using differential expression profiles. BMC bioinformatics, 11(1):603, 2010.

[39] L Fassi Fehri, Manuel Koch, Elena Belogolova, Hany Khalil, Christian Bolz, Behnam Kalali, Hans J Mollenkopf, Macarena Beigier-Bompadre, Alexander Karlas, Thomas Schneider, et al. Helicobacter pylori induces mir-155 in t cells in a camp-foxp3-dependent manner. PLoS One, 5(3):e9500–e9500, 2010.

[40] Michael A Fligner, Joseph S Verducci, and Paul E Blower. A modification of the jaccard–tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics, 44(2):110–119, 2002.

[41] Darren R Flower. On the properties of bit string-based measures of chemical similarity. Journal of chemical information and computer sciences, 38(3):379–386, 1998.

[42] Jonathan T Foote. Content-based retrieval of music and audio. In Voice, Video, and Data Communications, pages 138–147. International Society for Optics and Photonics, 1997.

[43] Wataru Fujibuchi, Larisa Kiseleva, Takeaki Taniguchi, Hajime Harada, and Paul Horton. Cellmontage: similar expression profile search server. Bioinformatics, 23(22):3103–3104, 2007.

[44] Timothy Galitski, Alok J Saldanha, Cora A Styles, Eric S Lander, and Gerald R Fink. Ploidy regulation of gene expression. Science, 285(5425):251–254, 1999.

[45] Ricardo T Gazzinelli, Masahiko Makino, Sisir K Chattopadhyay, CM Snapper, A Sher, AW Hügin, and HC Morse 3rd. Cd4+ subset regulation in viral infection. preferential activation of th2 cells during progression of retrovirus-induced immunodeficiency in mice. The Journal of Immunology, 148(1):182–188, 1992.

[46] Romas Geleziunas, Weiduan Xu, Kohsuke Takeda, Hidenori Ichijo, and Warner C Greene. Hiv-1 nef inhibits ask1-dependent death signalling providing a potential mechanism for protecting the infected host cell. Nature, 410(6830):834–838, 2001.

[47] GeneCopoeia. miexpress precursor mirna expression clones, 2015.

[48] Kristian A Gray, Bethan Yates, Ruth L Seal, Mathew W Wright, and Elspeth A Bruford. Genenames.org: the hgnc resources in 2015. Nucleic acids research, 43(D1):D1079–D1085, 2015.

[49] Robert Grützmann, Hinnerk Boriss, Ole Ammerpohl, Jutta Lüttges, Holger Kalthoff, Hans Konrad Schackert, Günter Klöppel, Hans Detlev Saeger, and Christian Pilarsky. Meta-analysis of microarray data on pancreatic cancer defines a set of commonly dysregulated genes. Oncogene, 24(32):5079–5088, 2005.

[50] Junming Guo, Ying Miao, Bingxiu Xiao, Rong Huan, Zhen Jiang, Dan Meng, and Yanjun Wang. Differential expression of microrna species in human gastric cancer versus non-tumorous tissues. Journal of gastroenterology and hepatology, 24(4):652–657, 2009.

[51] Sébastien S Hébert, Katrien Horré, Laura Nicolaï, Aikaterini S Papadopoulou, Wim Mandemakers, Asli N Silahtaroglu, Sakari Kauppinen, André Delacourte, and Bart De Strooper. Loss of microrna cluster mir-29a/b-1 in sporadic alzheimer’s disease correlates with increased bace1/β-secretase expression. Proceedings of the National Academy of Sciences, 105(17):6415–6420, 2008.

[52] JM Hilton, DA Fitzgerald, and DM Cooper. Respiratory morbidity of hospitalized children with trisomy 21. Journal of paediatrics and child health, 35(4):383–386, 1999.

[53] Michael Ed Hohn. Binary coefficients: A theoretical and empirical study. Journal of the International Association for Mathematical Geology, 8(2):137–150, 1976.

[54] Fangxin Hong and Rainer Breitling. A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics, 24(3):374–382, 2008.

[55] Paul B Horton, Larisa Kiseleva, and Wataru Fujibuchi. Rapids: an algorithm for rapid expression profile database search. Genome Informatics, 17(2):67–76, 2006.

[56] Douglas A Hosack, Glynn Dennis Jr, Brad T Sherman, H Clifford Lane, Richard A Lempicki, et al. Identifying biological themes within lists of genes with ease. Genome Biol, 4(10):R70, 2003.

[57] Hamid Hossain, Svetlin Tchatalbachev, and Trinad Chakraborty. Host gene expression profiling in pathogen–host interactions. Current opinion in immunology, 18(4):422–429, 2006.

[58] Sheng-Da Hsu, Yu-Ting Tseng, Sirjana Shrestha, Yu-Ling Lin, Anas Khaleel, Chih- Hung Chou, Chao-Fang Chu, Hsi-Yuan Huang, Ching-Min Lin, Shu-Yi Ho, et al. mirtarbase update 2014: an information resource for experimentally validated mirna- target interactions. Nucleic acids research, 42(D1):D78–D85, 2014.

[59] Zhiyuan Hu, Cheng Fan, Daniel S Oh, JS Marron, Xiaping He, Bahjat F Qaqish, Chad Livasy, Lisa A Carey, Evangeline Reynolds, Lynn Dressler, et al. The molec- ular portraits of breast tumors are conserved across microarray platforms. BMC genomics, 7(1):96, 2006.

[60] Da Wei Huang, Brad T Sherman, and Richard A Lempicki. Bioinformatics enrich- ment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic acids research, 37(1):1–13, 2009.

[61] Jeremy Hubble, Janos Demeter, Heng Jin, Maria Mao, Michael Nitzberg, TBK Reddy, Farrell Wymore, Zachariah K Zachariah, Gavin Sherlock, and Catherine A Ball. Implementation of genepattern within the stanford microarray database. Nu- cleic acids research, 37(suppl 1):D898–D901, 2009.

[62] Lawrence Hunter, Ronald C Taylor, Sonia M Leach, and Richard Simon. Gest: a gene expression search tool based on a novel bayesian similarity metric. Bioinformatics, 17(suppl 1):S115–S122, 2001.

[63] Rafael A Irizarry, Benjamin M Bolstad, Francois Collin, Leslie M Cope, Bridget Hobbs, and Terence P Speed. Summaries of affymetrix genechip probe level data. Nucleic acids research, 31(4):e15–e15, 2003.

[64] Rafael A Irizarry, Bridget Hobbs, Francois Collin, Yasmin D Beazer-Barclay, Kristen J Antonellis, Uwe Scherf, Terence P Speed, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2):249–264, 2003.

[65] Rafael A Irizarry, Chi Wang, Yun Zhou, and Terence P Speed. Gene set enrichment analysis made simple. Statistical methods in medical research, 18(6):565–575, 2009.

[66] Rafael A Irizarry, Daniel Warren, Forrest Spencer, Irene F Kim, Shyam Biswal, Bryan C Frank, Edward Gabrielson, Joe GN Garcia, Joel Geoghegan, Gregory Germino, et al. Multiple-laboratory comparison of microarray platforms. Nature methods, 2(5):345–350, 2005.

[67] Richard G Jenner and Richard A Young. Insights into host responses against pathogens from transcriptional profiling. Nature Reviews Microbiology, 3(4):281–294, 2005.

[68] F Ji, B Yang, X Peng, H Ding, H You, and P Tien. Circulating micrornas in hepatitis b virus–infected patients. Journal of viral hepatitis, 18(7):e242–e251, 2011.

[69] Bradley Jones. MATLAB: Statistics Toolbox; User’s Guide. MathWorks, 1997.

[70] Minoru Kanehisa, Susumu Goto, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and Mao Tanabe. Data, information, knowledge and principle: back to metabolism in kegg. Nucleic acids research, 42(D1):D199–D205, 2014.

[71] Joseph Keane, Heinz G Remold, and Hardy Kornfeld. Virulent mycobacterium tuberculosis strains evade apoptosis of infected alveolar macrophages. The Journal of Immunology, 164(4):2016–2020, 2000.

[72] Thomas B Kepler, Lynn Crosby, and Kevin T Morgan. Normalization and analysis of microarray data by self-consistency and local regression. Genome biol, 3(7):1–12, 2002.

[73] Tom AM Kevenaar, Geert Jan Schrijen, Michiel van der Veen, Anton HM Akkermans, and Fei Zuo. Face recognition with renewable and privacy preserving binary templates. In Automatic Identification Advanced Technologies, 2005. Fourth IEEE Workshop on, pages 21–26. IEEE, 2005.

[74] Manesh Kokare, BN Chatterji, and PK Biswas. Comparison of similarity metrics for texture image retrieval. In TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region, volume 2, pages 571–575. IEEE, 2003.

[75] Ana Kozomara and Sam Griffiths-Jones. mirbase: annotating high confidence micrornas using deep sequencing data. Nucleic acids research, page gkt1181, 2013.

[76] Michael L LaCroix-Fralish, Jean-Sebastien Austin, Felix Y Zheng, Daniel J Levitin, and Jeffrey S Mogil. Patterns of pain: meta-analysis of microarray studies of pain. PAIN®, 152(8):1888–1898, 2011.

[77] Wei-Jen Li, Ke Wang, Salvatore J Stolfo, and Benjamin Herzog. Fileprints: Identifying file types by n-gram analysis. In Information Assurance Workshop, 2005. IAW’05. Proceedings from the Sixth Annual IEEE SMC, pages 64–71. IEEE, 2005.

[78] Yu Li, Eric Y Chan, Jiangning Li, Chester Ni, Xinxia Peng, Elizabeth Rosenzweig, Terrence M Tumpey, and Michael G Katze. Microrna expression and virulence in pandemic influenza virus-infected mice. Journal of virology, 84(6):3023–3032, 2010.

[79] Winnie S Liang, Anil Maddukuri, Tanya M Teslovich, Cynthia de la Fuente, Emmanuel Agbottah, Shabnam Dadgar, Kylene Kehn, Sampsa Hautaniemi, Anne Pumfery, Dietrich A Stephan, et al. Therapeutic targets for hiv-1 infection in the host proteome. Retrovirology, 2(1):20, 2005.

[80] Zhen Liu, Bin Xiao, Bin Tang, Bosheng Li, Na Li, Endong Zhu, Gang Guo, Jiang Gu, Yuan Zhuang, Xiaofei Liu, et al. Up-regulated microrna-146a negatively modulate helicobacter pylori-induced inflammatory response in human gastric epithelial cells. Microbes and Infection, 12(11):854–863, 2010.

[81] Margus Lukk, Misha Kapushesky, Janne Nikkilä, Helen Parkinson, Angela Goncalves, Wolfgang Huber, Esko Ukkonen, and Alvis Brazma. A global map of human gene expression. Nature biotechnology, 28(4):322–324, 2010.

[82] Riikka Lund, Tero Aittokallio, Olli Nevalainen, and Riitta Lahesmaa. Identification of novel genes regulated by il-12, il-4, or tgf-β during the early polarization of cd4+ lymphocytes. The Journal of Immunology, 171(10):5328–5336, 2003.

[83] Feng Ma, Sheng Xu, Xingguang Liu, Qian Zhang, Xiongfei Xu, Mofang Liu, Minmin Hua, Nan Li, Hangping Yao, and Xuetao Cao. The microrna mir-29 controls innate and adaptive immune responses to intracellular bacterial infection by targeting interferon-γ. Nature immunology, 12(9):861–869, 2011.

[84] Masaaki Masuhara, Kenji Nagao, Mitsuo Nishikawa, Tohru Kimura, and Toru Nakano. Enhanced degradation of mdm2 by a nuclear envelope component, mouse germ cell-less. Biochemical and biophysical research communications, 308(4):927–932, 2003.

[85] Alexey A Matskevich and Karin Moelling. Dicer is involved in protection against influenza a virus infection. Journal of General Virology, 88(10):2627–2635, 2007.

[86] Matthew N McCall, Karan Uppal, Harris A Jaffee, Michael J Zilliox, and Rafael A Irizarry. The gene expression barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic acids research, 39(suppl 1):D1011–D1015, 2011.

[87] TR Mosmann and RL Coffman. Th1 and th2 cells: different patterns of lymphokine secretion lead to different functional properties. Annual review of immunology, 7(1):145–173, 1989.

[88] Yoshiki Murakami, Masami Tanaka, Hidenori Toyoda, Katsuyuki Hayashi, Masahiko Kuroda, Atsushi Tajima, and Kunitada Shimotohno. Hepatic microrna expression is associated with the response to interferon treatment of chronic hepatitis c. BMC medical genomics, 3(1):48, 2010.

[89] Masayuki Noguchi, Noriyuki Hirata, and Futoshi Suizu. The links between akt and two intracellular proteolytic cascades: Ubiquitination and autophagy. Biochimica et Biophysica Acta (BBA)-Reviews on Cancer, 1846(2):342–352, 2014.

[90] Ryan M O’Connell, Konstantin D Taganov, Mark P Boldin, Genhong Cheng, and David Baltimore. Microrna-155 is induced during the macrophage inflammatory response. Proceedings of the National Academy of Sciences, 104(5):1604–1609, 2007.

[91] Mike Owens and Grant Allen. SQLite. Springer, 2010.

[92] Helen Parkinson, Misha Kapushesky, Nikolay Kolesnikov, Gabriella Rustici, Mohammad Shojatalab, Niran Abeygunawardena, Hugo Berube, Miroslaw Dylag, Ibrahim Emam, Anna Farne, et al. Arrayexpress update–from an archive of functional genomics experiments to the atlas of gene expression. Nucleic acids research, 37(suppl 1):D868–D872, 2009.

[93] JL Pennings, Tjeerd G Kimman, and Riny Janssen. Identification of a common gene expression response in different lung inflammatory diseases in rodents and macaques. PLoS One, 3(7):e2596, 2008.

[94] Christine A Petersen, Katherine A Krumholz, John Carmen, Anthony P Sinai, and Barbara A Burleigh. Trypanosoma cruzi infection and nuclear factor kappa b activation prevent apoptosis in cardiac cells. Infection and immunity, 74(3):1580–1587, 2006.

[95] V Pihur, Somnath Datta, and Susmita Datta. Finding common genes in multiple cancer types through meta–analysis of microarray experiments: A rank aggregation approach. Genomics, 92(6):400–403, 2008.

[96] Heather A Piwowar and Wendy W Chapman. Recall and bias of retrieving gene expression microarray datasets through identifiers. Journal of biomedical discovery and collaboration, 5:7, 2010.

[97] John Quackenbush. Microarray data normalization and transformation. Nature genetics, 32:496–501, 2002.

[98] G Ram and J Chinen. Infections and immunodeficiency in down syndrome. Clinical & Experimental Immunology, 164(1):9–16, 2011.

[99] Adaikalavan Ramasamy, Adrian Mondry, Chris C Holmes, and Douglas G Altman. Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med, 5(9):e184, 2008.

[100] Daniel R Rhodes, Terrence R Barrette, Mark A Rubin, Debashis Ghosh, and Arul M Chinnaiyan. Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer research, 62(15):4427–4433, 2002.

[101] Daniel R Rhodes, Jianjun Yu, Kalyan Shanker, Nandan Deshpande, Radhika Varambally, Debashis Ghosh, Terrence Barrette, Akhilesh Pandey, and Arul M Chinnaiyan. Oncomine: a cancer microarray database and integrated data-mining platform. Neoplasia, 6(1):1–6, 2004.

[102] Antony Rodriguez, Elena Vigorito, Simon Clare, Madhuri V Warren, Philippe Couttet, Dalya R Soond, Stijn van Dongen, Russell J Grocock, Partha P Das, Eric A Miska, et al. Requirement of bic/microrna-155 for normal immune function. Science, 316(5824):608–611, 2007.

[103] David J Rogers and Henry Fleming. A computer program for classifying plants ii. a numerical handling of non-numerical data. Bioscience, 14(9):15–28, 1964.

[104] David J Rogers and Taffee T Tanimoto. A computer program for classifying plants. Science, 132(3434):1115–1118, 1960.

[105] Yvan Saeys, Iñaki Inza, and Pedro Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007.

[106] Yoshimasa Saito, Hidekazu Suzuki, Hitoshi Tsugawa, Hiroyuki Imaeda, Juntaro Matsuzaki, Kenro Hirata, Naoki Hosoe, Masahiko Nakamura, Makio Mukai, Hidetsugu Saito, et al. Overexpression of mir-142-5p and mir-155 in gastric mucosa-associated lymphoid tissue (malt) lymphoma resistant to helicobacter pylori eradication. PLoS One, 2012.

[107] David Sancho and Caetano Reis e Sousa. Signaling by myeloid c-type lectin receptors in immunity and homeostasis. Annual review of immunology, 30:491–529, 2012.

[108] Guang R Shi. Multivariate data analysis in palaeoecology and palaeobiogeography–a review. Palaeogeography, Palaeoclimatology, Palaeoecology, 105(3):199–234, 1993.

[109] Leming Shi, Laura H Reid, Wendell D Jones, Richard Shippy, Janet A Warrington, Shawn C Baker, Patrick J Collins, Francoise De Longueville, Ernest S Kawasaki, Kathleen Y Lee, et al. The microarray quality control (maqc) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature biotechnology, 24(9):1151–1161, 2006.

[110] Guilherme L Silva, Cristina M Junta, Stephano S Mello, Paula S Garcia, Diane M Rassi, Elza T Sakamoto-Hojo, Eduardo A Donadi, and Geraldo AS Passos. Profiling meta-analysis reveals primarily gene coexpression concordance between systemic lupus erythematosus and rheumatoid arthritis. Annals of the New York Academy of Sciences, 1110(1):33–46, 2007.

[111] Peter HA Sneath and Robert R Sokal. Numerical taxonomy. Nature, 193(4818):855–860, 1962.

[112] Melissa J Spencer, Encarnacion Montecino-Rodriguez, Kenneth Dorshkind, and James G Tidball. Helper (cd4+) and cytotoxic (cd8+) t cells promote the pathology of dystrophin-deficient muscle. Clinical Immunology, 98(2):235–243, 2001.

[113] John D Storey. The positive false discovery rate: a bayesian interpretation and the q-value. Annals of statistics, pages 2013–2035, 2003.

[114] Aravind Subramanian, Pablo Tamayo, Vamsi K Mootha, Sayan Mukherjee, Benjamin L Ebert, Michael A Gillette, Amanda Paulovich, Scott L Pomeroy, Todd R Golub, Eric S Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43):15545–15550, 2005.

[115] Futoshi Suizu, Yosuke Hiramuki, Fumihiko Okumura, Mami Matsuda, Akiko J Okumura, Noriyuki Hirata, Masumi Narita, Takashi Kohno, Jun Yokota, Miyuki Bohgaki, et al. The e3 ligase ttc3 facilitates ubiquitination and degradation of phosphorylated akt. Developmental cell, 17(6):800–810, 2009.

[116] S Joshua Swamidass and Pierre Baldi. Mathematical correction for fingerprint similarity measures to improve chemical retrieval. Journal of chemical information and modeling, 47(3):952–964, 2007.

[117] Thomas Thum, Carina Gross, Jan Fiedler, Thomas Fischer, Stephan Kissler, Markus Bussen, Paolo Galuppo, Steffen Just, Wolfgang Rottbauer, Stefan Frantz, et al. Microrna-21 contributes to myocardial disease by stimulating map kinase signalling in fibroblasts. Nature, 456(7224):980–984, 2008.

[118] Esmerina Tili, Jean-Jacques Michaille, Amelia Cimmino, Stefan Costinean, Calin Dan Dumitru, Brett Adair, Muller Fabbri, Hannes Alder, Chang Gong Liu, George Adrian Calin, et al. Modulation of mir-155 and mir-125b levels following lipopolysaccharide/tnf-α stimulation and their possible roles in regulating the response to endotoxin shock. The Journal of Immunology, 179(8):5082–5089, 2007.

[119] Alex Toker. Ttc3 ubiquitination terminates akt-ivation. Developmental cell, 17(6):752–754, 2009.

[120] George C Tseng, Min-Kyu Oh, Lars Rohlin, James C Liao, and Wing Hung Wong. Issues in cdna microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic acids research, 29(12):2549–2557, 2001.

[121] Virginia Goss Tusher, Robert Tibshirani, and Gilbert Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98(9):5116–5121, 2001.

[122] Shunsuke Ura, Masao Honda, Taro Yamashita, Teruyuki Ueda, Hajime Takatori, Ryuhei Nishino, Hajime Sunakozaka, Yoshio Sakai, Katsuhisa Horimoto, and Shuichi Kaneko. Differential microrna expression between hepatitis b and hepatitis c leading disease progression to hepatocellular carcinoma. Hepatology, 49(4):1098–1112, 2009.

[123] Mireia Vilardell, Axel Rasche, Anja Thormann, Elisabeth Maschke-Dutz, Luis A Pérez-Jurado, Hans Lehrach, and Ralf Herwig. Meta-analysis of heterogeneous down syndrome data reveals consistent genome-wide dosage effects related to neurological processes. BMC genomics, 12(1):229, 2011.

[124] Stefano Volinia, George A Calin, Chang-Gong Liu, Stefan Ambs, Amelia Cimmino, Fabio Petrocca, Rosa Visone, Marilena Iorio, Claudia Roldo, Manuela Ferracin, et al. A microrna expression signature of human solid tumors defines cancer gene targets. Proceedings of the National Academy of Sciences of the United States of America, 103(7):2257–2261, 2006.

[125] David L Wheeler, Deanna M Church, Ron Edgar, Scott Federhen, Wolfgang Helmberg, Thomas L Madden, Joan U Pontius, Gregory D Schuler, Lynn M Schriml, Edwin Sequeira, et al. Database resources of the national center for biotechnology information: update. Nucleic acids research, 32(suppl 1):D35–D40, 2004.

[126] P Willett. Similarity-based approaches to virtual screening. Biochemical Society Transactions, 31(Pt 3):603–606, 2003.

[127] E Aït Yahya-Graison, J Aubert, L Dauphinot, I Rivals, M Prieur, G Golfier, J Rossier, L Personnaz, N Creau, H Blehaut, et al. Classification of human chromosome 21 gene-expression variations in down syndrome: impact on disease phenotypes. The American Journal of Human Genetics, 81(3):475–491, 2007.

[128] Yee Hwa Yang, Sandrine Dudoit, Percy Luu, David M Lin, Vivian Peng, John Ngai, and Terence P Speed. Normalization for cdna microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic acids research, 30(4):e15–e15, 2002.

[129] Min Zhang, RuiPeng Wang, YanYi Wang, FeiCi Diao, Fei Lu, Dong Gao, DanYing Chen, ZhongHe Zhai, and HongBing Shu. The cxxc finger 5 protein is required for dna damage-induced p53 activation. Science in China Series C: Life Sciences, 52(6):528–538, 2009.

[130] Rui Zhou, Guoku Hu, Ai-Yu Gong, and Xian-Ming Chen. Binding of nf-kappab p65 subunit to the promoter elements is involved in lps-induced transactivation of mirna genes in human biliary epithelial cells. Nucleic acids research, page gkq056, 2010.

[131] G Zimmermann, P Henle, M Küsswetter, A Moghaddam, A Wentzensen, W Richter, and S Weiss. Tgf-β1 as a marker of delayed fracture healing. Bone, 36(5):779–785, 2005.

[132] Elias Zintzaras and John PA Ioannidis. Meta-analysis for ranked discovery datasets: theoretical framework and empirical demonstration for microarrays. Computational biology and chemistry, 32(1):39–47, 2008.

Appendix A: Data from the Single Platform Study of Binary Representations

Table A.1: Summary of Profiles in Single Platform Study. For each group, the abbreviation and the number of studies and samples are given.

| Group | Abbr. | Studies | Samples |
|---|---|---|---|
| Alzheimer’s Disorder | A | 9 | 41 |
| Breast Cancer | BC | 8 | 77 |
| Skeletal Muscle | SM | 11 | 40 |
| Hepatic Tissue | H | 11 | 55 |
| Cardiac Tissue | C | 10 | 26 |
| Total | | 49 | 239 |

Table A.2: Subsets Used in Single Platform Study. The GEO Series ID, the Series title, the test group, the number of the subset in the series, and test samples are given for each subset. There are 5 test groups: Breast Cancer (BC), Alzheimer’s Disease (A), Skeletal Muscle (SM), Hepatic (H), and Cardiac (C).

| GSE ID | Title | Group | Subset | Samples |
|---|---|---|---|---|
| GSE3156 | Breast Cancer Cell Lines | BC | 1 | GSM70657 |
| GSE3156 | Breast Cancer Cell Lines | BC | 2 | GSM70658 |
| GSE3156 | Breast Cancer Cell Lines | BC | 3 | GSM70659 |
| GSE3156 | Breast Cancer Cell Lines | BC | 4 | GSM70660 |
| GSE3156 | Breast Cancer Cell Lines | BC | 5 | GSM70661 |
| GSE3156 | Breast Cancer Cell Lines | BC | 6 | GSM70662 |
| GSE3156 | Breast Cancer Cell Lines | BC | 7 | GSM70663 |
| GSE3156 | Breast Cancer Cell Lines | BC | 8 | GSM70664 |
| GSE3156 | Breast Cancer Cell Lines | BC | 9 | GSM70665 |
| GSE3156 | Breast Cancer Cell Lines | BC | 10 | GSM70666 |
| GSE3156 | Breast Cancer Cell Lines | BC | 11 | GSM70667 |
| GSE3156 | Breast Cancer Cell Lines | BC | 12 | GSM70668 |
| GSE3156 | Breast Cancer Cell Lines | BC | 13 | GSM70669 |
| GSE3156 | Breast Cancer Cell Lines | BC | 14 | GSM70670 |
| GSE3156 | Breast Cancer Cell Lines | BC | 15 | GSM70671 |
| GSE3156 | Breast Cancer Cell Lines | BC | 16 | GSM70672 |
| GSE3156 | Breast Cancer Cell Lines | BC | 17 | GSM70673 |
| GSE3156 | Breast Cancer Cell Lines | BC | 18 | GSM70674 |
| GSE3156 | Breast Cancer Cell Lines | BC | 19 | GSM70675 |
| GSE3744 | Human breast tumor expression | BC | 1 | GSM85473, GSM85474, GSM85475, GSM85476, GSM85477, GSM85478, GSM85479, GSM85480, GSM85481, GSM85482, GSM85483, GSM85484, GSM85485, GSM85486, GSM85487, GSM85488, GSM85489, GSM85490 |
| GSE3744 | Human breast tumor expression | BC | 2 | GSM85491, GSM85492 |
| GSE3744 | Human breast tumor expression | BC | 3 | GSM85493, GSM85494, GSM85495, GSM85496, GSM85497, GSM85498, GSM85499, GSM85500, GSM85501, GSM85502, GSM85503, GSM85504, GSM85505, GSM85506, GSM85507, GSM85508, GSM85509, GSM85510, GSM85511, GSM85512 |
| GSE3744 | Human breast tumor expression | BC | 4 | GSM85513, GSM85514, GSM85515, GSM85516, GSM85517, GSM85518, GSM85519 |
| GSE4754 | MCF10A cells in 3-D culture after treatment with BRCA1-RNAi or CtIP-RNAi | BC | 1 | GSM106933, GSM106968 |
| GSE4754 | MCF10A cells in 3-D culture after treatment with BRCA1-RNAi or CtIP-RNAi | BC | 2 | GSM106971, GSM106972 |
| GSE4754 | MCF10A cells in 3-D culture after treatment with BRCA1-RNAi or CtIP-RNAi | BC | 3 | GSM106974, GSM106975 |
| GSE4754 | MCF10A cells in 3-D culture after treatment with BRCA1-RNAi or CtIP-RNAi | BC | 4 | GSM106976, GSM106977 |
| GSE5102 | Genomic profile of the estrogen induced neoplastic transformation of the human breast epithelial cell MCF10F | BC | 1 | GSM115080, GSM115081, GSM115082 |
| GSE5102 | Genomic profile of the estrogen induced neoplastic transformation of the human breast epithelial cell MCF10F | BC | 2 | GSM115083, GSM115084, GSM115085 |
| GSE5102 | Genomic profile of the estrogen induced neoplastic transformation of the human breast epithelial cell MCF10F | BC | 3 | GSM115086, GSM115087, GSM115088 |
| GSE5102 | Genomic profile of the estrogen induced neoplastic transformation of the human breast epithelial cell MCF10F | BC | 4 | GSM115089, GSM115090, GSM115091 |
| GSE5102 | Genomic profile of the estrogen induced neoplastic transformation of the human breast epithelial cell MCF10F | BC | 5 | GSM115092, GSM115093, GSM115094 |
| GSE5116 | Genomic Pathways of 17-beta-Estradiol Induced Malignant Cell Transformation in Human Breast Epithelial Cells | BC | 1 | GSM115510, GSM115511, GSM115512 |
| GSE5116 | Genomic Pathways of 17-beta-Estradiol Induced Malignant Cell Transformation in Human Breast Epithelial Cells | BC | 2 | GSM115513, GSM115514, GSM115515 |
| GSE5116 | Genomic Pathways of 17-beta-Estradiol Induced Malignant Cell Transformation in Human Breast Epithelial Cells | BC | 3 | GSM115516, GSM115517, GSM115518 |
| GSE5116 | Genomic Pathways of 17-beta-Estradiol Induced Malignant Cell Transformation in Human Breast Epithelial Cells | BC | 4 | GSM115519, GSM115520, GSM115521 |
| GSE6521 | MCF7 inhibitor | BC | 1 | GSM149913 |
| GSE6521 | MCF7 inhibitor | BC | 2 | GSM149914 |
| GSE6521 | MCF7 inhibitor | BC | 3 | GSM149915 |
| GSE6521 | MCF7 inhibitor | BC | 4 | GSM149916 |
| GSE6521 | MCF7 inhibitor | BC | 5 | GSM149917 |
| GSE6521 | MCF7 inhibitor | BC | 6 | GSM149918 |
| GSE6521 | MCF7 inhibitor | BC | 7 | GSM149919 |
| GSE6521 | MCF7 inhibitor | BC | 8 | GSM149920 |
| GSE6521 | MCF7 inhibitor | BC | 9 | GSM149921 |
| GSE6521 | MCF7 inhibitor | BC | 10 | GSM149922 |
| GSE6521 | MCF7 inhibitor | BC | 11 | GSM149923 |
| GSE6521 | MCF7 inhibitor | BC | 12 | GSM149924 |
| GSE6521 | MCF7 inhibitor | BC | 13 | GSM149925 |
| GSE6521 | MCF7 inhibitor | BC | 14 | GSM149926 |
| GSE6521 | MCF7 inhibitor | BC | 15 | GSM149927 |
| GSE6521 | MCF7 inhibitor | BC | 16 | GSM149928 |
| GSE6521 | MCF7 inhibitor | BC | 17 | GSM149929 |
| GSE6521 | MCF7 inhibitor | BC | 18 | GSM149930 |
| GSE6521 | MCF7 inhibitor | BC | 19 | GSM149931 |
| GSE6521 | MCF7 inhibitor | BC | 20 | GSM149932 |
| GSE6521 | MCF7 inhibitor | BC | 21 | GSM149933 |
| GSE6521 | MCF7 inhibitor | BC | 22 | GSM149934 |
| GSE6521 | MCF7 inhibitor | BC | 23 | GSM149935 |
| GSE6521 | MCF7 inhibitor | BC | 24 | GSM149936 |
| GSE6521 | MCF7 inhibitor | BC | 25 | GSM149937 |
| GSE6521 | MCF7 inhibitor | BC | 26 | GSM149938 |
| GSE6521 | MCF7 inhibitor | BC | 27 | GSM149939 |
| GSE6521 | MCF7 inhibitor | BC | 28 | GSM149940 |
| GSE6521 | MCF7 inhibitor | BC | 29 | GSM149941 |
| GSE6521 | MCF7 inhibitor | BC | 30 | GSM149942 |
| GSE6521 | MCF7 inhibitor | BC | 31 | GSM149943 |
| GSE6521 | MCF7 inhibitor | BC | 32 | GSM149944 |
| GSE6521 | MCF7 inhibitor | BC | 33 | GSM149945 |
| GSE6521 | MCF7 inhibitor | BC | 34 | GSM149946 |
| GSE6521 | MCF7 inhibitor | BC | 35 | GSM149947 |
| GSE6521 | MCF7 inhibitor | BC | 36 | GSM149948 |
| GSE6521 | MCF7 inhibitor | BC | 37 | GSM149949 |
| GSE5764 | Analysis of microdissected invasive lobular and ductal breast carcinomas in relation to normal ductal and lobular cells | BC | 1 | GSM134584, GSM134588, GSM134687, GSM134690, GSM134693, GSM134696, GSM134699, GSM134702, GSM134705, GSM134708 |
| GSE5764 | Analysis of microdissected invasive lobular and ductal breast carcinomas in relation to normal ductal and lobular cells | BC | 2 | GSM134586, GSM134589, GSM134688, GSM134691, GSM134694, GSM134697, GSM134700, GSM134703, GSM134706, GSM134709 |
| GSE5764 | Analysis of microdissected invasive lobular and ductal breast carcinomas in relation to normal ductal and lobular cells | BC | 3 | GSM134587, GSM134591, GSM134689, GSM134692, GSM134695 |
| GSE5764 | Analysis of microdissected invasive lobular and ductal breast carcinomas in relation to normal ductal and lobular cells | BC | 4 | GSM134698, GSM134701, GSM134704, GSM134707, GSM134710 |
| GSE28379 | Gene expression profiles of familial Alzheimer’s disease with presenilin 2 mutation patient-specific induced pluripotent stem cells | A | 1 | GSM701542 |
| GSE28379 | Gene expression profiles of familial Alzheimer’s disease with presenilin 2 mutation patient-specific induced pluripotent stem cells | A | 2 | GSM701543 |
| GSE28379 | Gene expression profiles of familial Alzheimer’s disease with presenilin 2 mutation patient-specific induced pluripotent stem cells | A | 3 | GSM701544, GSM701545 |
| GSE28146 | Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimers disease | A | 1 | GSM697308, GSM697309, GSM697310, GSM697311, GSM697312, GSM697313, GSM697314, GSM697315 |
| GSE28146 | Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimers disease | A | 2 | GSM697316, GSM697317, GSM697318, GSM697319, GSM697320, GSM697321, GSM697322 |
| GSE28146 | Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimers disease | A | 3 | GSM697323, GSM697324, GSM697325, GSM697326, GSM697327, GSM697328, GSM697329, GSM697330 |
| GSE28146 | Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimers disease | A | 4 | GSM697331, GSM697332, GSM697333, GSM697334, GSM697335, GSM697336, GSM697337 |
| GSE29652 | Microarray analysis of the astrocyte transcriptome in the ageing brain: relationship to Alzheimer’s pathology and ApoE genotype | A | 1 | GSM735094, GSM735095, GSM735096 |
| GSE29652 | Microarray analysis of the astrocyte transcriptome in the ageing brain: relationship to Alzheimer’s pathology and ApoE genotype | A | 2 | GSM735097, GSM735098, GSM735099 |
| GSE29652 | Microarray analysis of the astrocyte transcriptome in the ageing brain: relationship to Alzheimer’s pathology and ApoE genotype | A | 3 | GSM735100, GSM735101, GSM735102 |
| GSE29652 | Microarray analysis of the astrocyte transcriptome in the ageing brain: relationship to Alzheimer’s pathology and ApoE genotype | A | 4 | GSM735103, GSM735104, GSM735105 |
| GSE29652 | Microarray analysis of the astrocyte transcriptome in the ageing brain: relationship to Alzheimer’s pathology and ApoE genotype | A | 5 | GSM735106, GSM735107, GSM735108 |
| GSE29652 | Microarray analysis of the astrocyte transcriptome in the ageing brain: relationship to Alzheimer’s pathology and ApoE genotype | A | 6 | GSM735109, GSM735110, GSM735111 |
| GSE21779 | Gene expression data from temporal cortex of young adult, old and AD-like Microcebus murinus | A | 1 | GSM542488, GSM542555, GSM542557, GSM542562, GSM542563, GSM542572, GSM542577 |
| GSE21779 | Gene expression data from temporal cortex of young adult, old and AD-like Microcebus murinus | A | 2 | GSM542556, GSM542558, GSM542560, GSM542561, GSM542571, GSM542573, GSM542574, GSM542575, GSM542576 |
| GSE21779 | Gene expression data from temporal cortex of young adult, old and AD-like Microcebus murinus | A | 3 | GSM542559, GSM542570 |
| GSE18309 | Transcriptomes in Peripheral Blood Mononuclear Cells of Dementia and Alzheimer Patients | A | 1 | GSM457169, GSM457170, GSM457171 |
| GSE18309 | Transcriptomes in Peripheral Blood Mononuclear Cells of Dementia and Alzheimer Patients | A | 2 | GSM457172, GSM457173, GSM457174 |
| GSE18309 | Transcriptomes in Peripheral Blood Mononuclear Cells of Dementia and Alzheimer Patients | A | 3 | GSM457175, GSM457176, GSM457177 |
| GSE9770 | Non-demented individuals with intermediate Alzheimer’s neuropathologies - neuronal expression (6 regions) | A | 1 | GSM242043, GSM242044, GSM242045, GSM242046, GSM242047, GSM242048 |
| GSE9770 | Non-demented individuals with intermediate Alzheimer’s neuropathologies - neuronal expression (6 regions) | A | 2 | GSM242049, GSM242050, GSM242051, GSM242052, GSM242053, GSM242054 |
| GSE9770 | Non-demented individuals with intermediate Alzheimer’s neuropathologies - neuronal expression (6 regions) | A | 3 | GSM242055, GSM242056, GSM242057, GSM242058, GSM242059, GSM242060 |
| GSE9770 | Non-demented individuals with intermediate Alzheimer’s neuropathologies - neuronal expression (6 regions) | A | 4 | GSM242061, GSM242062, GSM242063, GSM242064, GSM242065 |
| GSE9770 | Non-demented individuals with intermediate Alzheimer’s neuropathologies - neuronal expression (6 regions) | A | 5 | GSM242066, GSM242067, GSM242068, GSM242069, GSM242083, GSM242084 |
| GSE9770 | Non-demented individuals with intermediate Alzheimer’s neuropathologies - neuronal expression (6 regions) | A | 6 | GSM242085, GSM242086, GSM242087, GSM242088, GSM242089 |
| GSE6276 | rogae-affy-human-323460 | A | 1 | GSM144420, GSM144421, GSM144422 |
| GSE6276 | rogae-affy-human-323460 | A | 2 | GSM144423, GSM144425, GSM144427 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 1 | GSM119615, GSM119616, GSM119617, GSM119618, GSM119619, GSM119620, GSM119621, GSM119622, GSM119623, GSM119624, GSM119625, GSM119626, GSM119627 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 2 | GSM119628, GSM119629, GSM119630, GSM119631, GSM119632, GSM119633, GSM119634, GSM119635, GSM119636, GSM119637, GSM119638, GSM119639, GSM119640 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 3 | GSM119641, GSM119642, GSM119643, GSM119644, GSM119645, GSM119646, GSM119647, GSM119648, GSM119649, GSM119650, GSM119651, GSM119652 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 4 | GSM119653, GSM119654, GSM119655, GSM119656, GSM119657, GSM119658, GSM119659, GSM119660, GSM119661, GSM119662, GSM119663, GSM119664, GSM119665 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 5 | GSM119666, GSM119667, GSM119668, GSM119669, GSM119670, GSM119671, GSM119672, GSM119673, GSM119674, GSM119675, GSM119676 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 6 | GSM119677, GSM119678, GSM119679, GSM119680, GSM119681, GSM119682, GSM119683, GSM119684, GSM119685, GSM119686, GSM119687, GSM119688 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 7 | GSM238763, GSM238790, GSM238791, GSM238792, GSM238793, GSM238794, GSM238795, GSM238796, GSM238797, GSM238798 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 8 | GSM238799, GSM238800, GSM238801, GSM238802, GSM238803, GSM238804, GSM238805, GSM238806, GSM238807, GSM238808 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 9 | GSM238809, GSM238810, GSM238811, GSM238812, GSM238813, GSM238815, GSM238816, GSM238817, GSM238818, GSM238819, GSM238820, GSM238821, GSM238822, GSM238823, GSM238824, GSM238825 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 10 | GSM238826, GSM238827, GSM238834, GSM238835, GSM238837, GSM238838, GSM238839, GSM238840, GSM238841 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 11 | GSM238842, GSM238843, GSM238844, GSM238845, GSM238846, GSM238847, GSM238848, GSM238851, GSM238854, GSM238855, GSM238856, GSM238857, GSM238858, GSM238860, GSM238861, GSM238862, GSM238863, GSM238864, GSM238865, GSM238867, GSM238868, GSM238870, GSM238871 |
| GSE5281 | Alzheimer’s disease and the normal aged brain (steph-affy-human-433773) | A | 12 | GSM238872, GSM238873, GSM238874, GSM238875, GSM238877, GSM238941, GSM238942, GSM238943, GSM238944, GSM238945, GSM238946, GSM238947, GSM238948, GSM238949, GSM238951, GSM238952, GSM238953, GSM238955, GSM238963 |
| GSE4757 | Alzheimers disease: neurofibrillary tangles (Rogers-3U24NS043571-01S1) | A | 1 | GSM107522, GSM107524, GSM107526, GSM107528, GSM107530, GSM107532, GSM107534, GSM107536, GSM107538, GSM107540 |
| GSE4757 | Alzheimers disease: neurofibrillary tangles (Rogers-3U24NS043571-01S1) | A | 2 | GSM107523, GSM107525, GSM107527, GSM107529, GSM107531, GSM107533, GSM107535, GSM107537, GSM107539, GSM107541 |
| GSE6798 | Reduced expression of mitochondrial oxidative metabolism genes in skeletal muscle of women with PCOS | SM | 1 | GSM155631, GSM155643, GSM155644, GSM155729, GSM156170, GSM156171, GSM156176, GSM156177, GSM156178, GSM156179, GSM156180, GSM156181, GSM156184 |
| GSE6798 | Reduced expression of mitochondrial oxidative metabolism genes in skeletal muscle of women with PCOS | SM | 2 | GSM156186, GSM156187, GSM156510, GSM156511, GSM156512, GSM156749, GSM156750, GSM156751, GSM156752, GSM156753, GSM156763, GSM156946, GSM156948, GSM156949, GSM156950, GSM156951 |
| GSE9103 | Skeletal Muscle Transcript Profiles in Trained or Sedentary Young and Old Subjects | SM | 1 | GSM230387, GSM230388, GSM230389, GSM230390, GSM230391, GSM230392, GSM230393, GSM230394, GSM230395, GSM230396 |
| GSE9103 | Skeletal Muscle Transcript Profiles in Trained or Sedentary Young and Old Subjects | SM | 2 | GSM230397, GSM230398, GSM230399, GSM230400, GSM230401, GSM230402, GSM230403, GSM230404, GSM230405, GSM230406 |
| GSE9103 | Skeletal Muscle Transcript Profiles in Trained or Sedentary Young and Old Subjects | SM | 3 | GSM230407, GSM230408, GSM230409, GSM230410, GSM230411, GSM230412, GSM230413, GSM230414, GSM230415, GSM230416 |
| GSE9103 | Skeletal Muscle Transcript Profiles in Trained or Sedentary Young and Old Subjects | SM | 4 | GSM230417, GSM230418, GSM230419, GSM230420, GSM230421, GSM230422, GSM230423, GSM230424, GSM230425, GSM230426 |
| GSE10685 | Human skeletal muscle biopsies from a 3h IL-6 infusion | SM | 1 | GSM269808, GSM269811, GSM269932 |
| GSE10685 | Human skeletal muscle biopsies from a 3h IL-6 infusion | SM | 2 | GSM269809, GSM269834, GSM269933 |
| GSE10685 | Human skeletal muscle biopsies from a 3h IL-6 infusion | SM | 3 | GSM269810, GSM269835, GSM269934 |
| GSE9419 | The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males | SM | 1 | GSM239380, GSM239381, GSM239392, GSM239393, GSM239394, GSM239395, GSM239396, GSM239397, GSM239398, GSM239399, GSM239400, GSM239401 |
| GSE9419 | The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males | SM | 2 | GSM239382, GSM239383, GSM239402, GSM239403, GSM239404, GSM239405, GSM239406, GSM239407, GSM239408, GSM239409, GSM239410, GSM239411 |
| GSE9419 | The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males | SM | 3 | GSM239384, GSM239385, GSM239412, GSM239413, GSM239414, GSM239415, GSM239416, GSM239417, GSM239418, GSM239419, GSM239420, GSM239421 |
| GSE9419 | The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males | SM | 4 | GSM239386, GSM239387, GSM239422, GSM239423, GSM239424, GSM239425, GSM239426, GSM239427, GSM239428, GSM239429 |
| GSE9419 | The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males | SM | 5 | GSM239388, GSM239389, GSM239430, GSM239431, GSM239432, GSM239433, GSM239434, GSM239435, GSM239436, GSM239437 |
| GSE9419 | The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males | SM | 6 | GSM239390, GSM239391, GSM239438, GSM239439, GSM239440, GSM239441, GSM239442, GSM239443, GSM239444, GSM239445 |
| GSE8157 | Gene expression profiling in skeletal muscle of PCOS after pioglitazone therapy | SM | 1 | GSM201542, GSM201543, GSM201544, GSM201545, GSM201829, GSM201830, GSM201831, GSM201832, GSM201833, GSM201834 |
| GSE8157 | Gene expression profiling in skeletal muscle of PCOS after pioglitazone therapy | SM | 2 | GSM201835, GSM201836, GSM201837, GSM201838, GSM201839, GSM201840, GSM201841, GSM201842, GSM201843, GSM201844 |
| GSE8157 | Gene expression profiling in skeletal muscle of PCOS after pioglitazone therapy | SM | 3 | GSM201849, GSM201850, GSM201851, GSM201852, GSM201853, GSM201854, GSM201855, GSM201856, GSM201857, GSM201858, GSM201859, GSM201861, GSM201862 |
| GSE8157 | Gene expression profiling in skeletal muscle of PCOS after pioglitazone therapy | SM | 4 | GSM201863, GSM201864, GSM201865, GSM201866, GSM201867, GSM201868, GSM201869, GSM201870, GSM201871, GSM201872 |
| GSE15090 | Gene expression profiles in muscle tissue from FSHD patients | SM | 1 | GSM377456, GSM377459, GSM377462, GSM377465, GSM377468 |
| GSE15090 | Gene expression profiles in muscle tissue from FSHD patients | SM | 2 | GSM377457, GSM377460, GSM377463, GSM377466, GSM377469 |
| GSE15090 | Gene expression profiles in muscle tissue from FSHD patients | SM | 3 | GSM377458, GSM377461, GSM377464, GSM377467, GSM377470 |

Table A.2 (continued) GSE ID Title Group Subset Samples GSE7014 Expression data from DM1, SM 1 GSM161935, GSM161936, DM2 and Normal Adult Skele- GSM161937, GSM161938, tal Muscle Biopsies GSM161939, GSM161940, GSM161941, GSM161942, GSM161943, GSM161944 GSE7014 Expression data from DM1, SM 2 GSM161945, GSM161946, DM2 and Normal Adult Skele- GSM161948, GSM161949, tal Muscle Biopsies GSM161950, GSM161951, GSM161952, GSM161953, GSM161954, GSM161955, GSM161956, GSM161957, GSM161958, GSM161959, GSM161960, GSM161961, GSM161962, GSM161963, GSM161964, GSM161965 GSE7014 Expression data from DM1, SM 3 GSM161966, GSM161967, DM2 and Normal Adult Skele- GSM161968, GSM161969, tal Muscle Biopsies GSM161970, GSM161971 GSE21496 Effects of 48h Lower Limb SM 1 GSM537120, GSM537123, Unloading in Human Skeletal GSM537126, GSM537129, Muscle GSM537132, GSM537135, GSM537138 GSE21496 Effects of 48h Lower Limb SM 2 GSM537121, GSM537124, Unloading in Human Skeletal GSM537127, GSM537130, Muscle GSM537133, GSM537136, GSM537139 GSE21496 Effects of 48h Lower Limb SM 3 GSM537122, GSM537125, Unloading in Human Skeletal GSM537128, GSM537131, Muscle GSM537134, GSM537137, GSM537140 GSE24235 Skeletal muscle gene expres- SM 1 GSM595895, GSM595896, sion in response to resistance GSM595897 exercise: sex specific regulation GSE24235 Skeletal muscle gene expres- SM 2 GSM595899, GSM595901, sion in response to resistance GSM595902, GSM595908, exercise: sex specific regulation GSM595910, GSM595911 GSE24235 Skeletal muscle gene expres- SM 3 GSM595904, GSM595906, sion in response to resistance GSM595907 exercise: sex specific regulation GSE24235 Skeletal muscle gene expres- SM 4 GSM595912, GSM595931, sion in response to resistance GSM595932, GSM596002 exercise: sex specific regulation GSE24235 Skeletal muscle gene expres- SM 5 GSM596003, GSM596004, sion in response to resistance GSM596005, GSM596037, exercise: sex specific regulation GSM596054, GSM596055, GSM596056, GSM596057 91

Table A.2 (continued)
GSE ID    Title                                                                                        Group  Subset  Samples
GSE24235  Skeletal muscle gene expression in response to resistance exercise: sex specific regulation  SM     6       GSM596038, GSM596051, GSM596052, GSM596053
GSE12474  Microarray analysis of skeletal muscle hypertrophy induced by heat-stress in healthy humans  SM     1       GSM313315, GSM313316, GSM313317, GSM313318, GSM313324
GSE12474  Microarray analysis of skeletal muscle hypertrophy induced by heat-stress in healthy humans  SM     2       GSM313319, GSM313320, GSM313321, GSM313322, GSM313323
GSE28998  Human skeletal muscle transcriptional response to exercise                                   SM     1       GSM718509, GSM718510, GSM718511
GSE28998  Human skeletal muscle transcriptional response to exercise                                   SM     2       GSM718512, GSM718513, GSM718514
GSE28998  Human skeletal muscle transcriptional response to exercise                                   SM     3       GSM718515, GSM718516, GSM718518, GSM718521
GSE28998  Human skeletal muscle transcriptional response to exercise                                   SM     4       GSM718517, GSM718519, GSM718520, GSM718522
GSE37715  Expression data in human hepatocytes with HCV infection                                      H      1       GSM925906, GSM925907, GSM925908, GSM925909
GSE37715  Expression data in human hepatocytes with HCV infection                                      H      2       GSM925910, GSM925911, GSM925912, GSM925913
GSE37715  Expression data in human hepatocytes with HCV infection                                      H      3       GSM925914, GSM925915, GSM925916
GSE37715  Expression data in human hepatocytes with HCV infection                                      H      4       GSM925917, GSM925918, GSM925919, GSM925920
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      1       GSM774851
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      2       GSM774852
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      3       GSM774853
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      4       GSM774854
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      5       GSM774855
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      6       GSM774856
GSE31264  Primary human hepatocytes treated with IFNalpha and IL28B                                    H      7       GSM924592, GSM924593, GSM924594

Table A.2 (continued) GSE ID Title Group Subset Samples GSE31264 Primary human hepatocytes H 8 GSM924595, GSM924596, treated with IFNalpha and GSM924597 IL28B GSE31264 Primary human hepatocytes H 9 GSM924598, GSM924599, treated with IFNalpha and GSM924600 IL28B GSE31264 Primary human hepatocytes H 10 GSM924601, GSM924602, treated with IFNalpha and GSM924603 IL28B GSE31193 A Robust Induction of Type H 1 GSM773317, GSM773318, III Interferons and Chemokines GSM773319 Defines a Unique Pattern of Hepatic Innate Immunity in Re- sponse to Hepatitis C Virus In- fection GSE23031 Ribavirin Treated Huh7.5.1 H 1 GSM568463, GSM568464, Cells GSM568465 GSE23031 Ribavirin Treated Huh7.5.1 H 2 GSM568466, GSM568467, Cells GSM568468 GSE34022 Expression data of interferon- H 1 GSM840543 alpha treated Huh-7 cells GSE34022 Expression data of interferon- H 2 GSM840544 alpha treated Huh-7 cells GSE29889 Gene expression of a variety of H 1 GSM740159, GSM740160, Huh-7 derived hepatoma cells GSM740161 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 2 GSM740162, GSM740163, Huh-7 derived hepatoma cells GSM740164 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 3 GSM740165, GSM740166, Huh-7 derived hepatoma cells GSM740167 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 4 GSM740168, GSM740169, Huh-7 derived hepatoma cells GSM740170 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 5 GSM740171, GSM740172, Huh-7 derived hepatoma cells GSM740173 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. 93

Table A.2 (continued) GSE ID Title Group Subset Samples GSE29889 Gene expression of a variety of H 6 GSM740174, GSM740175, Huh-7 derived hepatoma cells GSM740176 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 7 GSM740177, GSM740178, Huh-7 derived hepatoma cells GSM740179 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 8 GSM740180, GSM740181, Huh-7 derived hepatoma cells GSM740182 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 9 GSM740183, GSM740184, Huh-7 derived hepatoma cells GSM740185 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 10 GSM740186, GSM740187, Huh-7 derived hepatoma cells GSM740188 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 11 GSM740189, GSM740190, Huh-7 derived hepatoma cells GSM740191 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 12 GSM740192, GSM740193, Huh-7 derived hepatoma cells GSM740194 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE29889 Gene expression of a variety of H 13 GSM740195, GSM740196, Huh-7 derived hepatoma cells GSM740197 and their susceptibility to infec- tion by JHF1 HCV in the HCV cell culture system. GSE25157 CREB3L1 Target Genes in Re- H 1 GSM618135 sponse to Hepatitis C Replicon Infection GSE25157 CREB3L1 Target Genes in Re- H 2 GSM618136 sponse to Hepatitis C Replicon Infection 94

Table A.2 (continued) GSE ID Title Group Subset Samples GSE25156 Host Factors with Reduced Ex- H 1 GSM618131 pression in Two HCV Permis- sive Cell Lines as Compared to the Non-Permissive Parent Cell Line Huh7 GSE25156 Host Factors with Reduced Ex- H 2 GSM618132 pression in Two HCV Permis- sive Cell Lines as Compared to the Non-Permissive Parent Cell Line Huh7 GSE25156 Host Factors with Reduced Ex- H 3 GSM618133, GSM618134 pression in Two HCV Permis- sive Cell Lines as Compared to the Non-Permissive Parent Cell Line Huh7 GSE20948 The Effect of Hepatitis C Virus H 1 GSM523800, GSM523801 Infection on Host Gene Expres- sion GSE20948 The Effect of Hepatitis C Virus H 2 GSM523802, GSM523803, Infection on Host Gene Expres- GSM523804 sion GSE20948 The Effect of Hepatitis C Virus H 3 GSM523805, GSM523806, Infection on Host Gene Expres- GSM523807 sion GSE20948 The Effect of Hepatitis C Virus H 4 GSM523808, GSM523809, Infection on Host Gene Expres- GSM523810 sion GSE20948 The Effect of Hepatitis C Virus H 5 GSM523811, GSM523812, Infection on Host Gene Expres- GSM523813 sion GSE20948 The Effect of Hepatitis C Virus H 6 GSM523814, GSM523815 Infection on Host Gene Expres- sion GSE20948 The Effect of Hepatitis C Virus H 7 GSM523816, GSM523817, Infection on Host Gene Expres- GSM523818 sion GSE20948 The Effect of Hepatitis C Virus H 8 GSM523819, GSM523820, Infection on Host Gene Expres- GSM523821 sion GSE20948 The Effect of Hepatitis C Virus H 9 GSM523822, GSM523823, Infection on Host Gene Expres- GSM523824 sion GSE20948 The Effect of Hepatitis C Virus H 10 GSM523825, GSM523826, Infection on Host Gene Expres- GSM523827 sion GSE15743 IFN alpha-induced gene ex- H 1 GSM394092 pression in human NK cells 95

Table A.2 (continued) GSE ID Title Group Subset Samples GSE15743 IFN alpha-induced gene ex- H 2 GSM394093 pression in human NK cells GSE15743 IFN alpha-induced gene ex- H 3 GSM394094 pression in human NK cells GSE15743 IFN alpha-induced gene ex- H 4 GSM394095 pression in human NK cells GSE11190 Interferon signaling and treat- H 1 GSM281841, GSM281843, ment outcome in chronic hep- GSM281845, GSM281847, atitis C GSM281849, GSM281851, GSM281853, GSM281855, GSM281857, GSM281859, GSM281861, GSM281863, GSM281865, GSM281867, GSM281869, GSM281871, GSM281873, GSM281875, GSM281877, GSM281879, GSM281881, GSM281883, GSM281885, GSM281887, GSM281889, GSM281891, GSM281893, GSM281895, GSM281897, GSM281899, GSM281901, GSM281903, GSM281905, GSM281907, GSM281909, GSM281911, GSM281913 GSE11190 Interferon signaling and treat- H 2 GSM281837 ment outcome in chronic hep- atitis C GSE11190 Interferon signaling and treat- H 3 GSM281838, GSM281840, ment outcome in chronic hep- GSM281842, GSM281844, atitis C GSM281846, GSM281848, GSM281850, GSM281852, GSM281854, GSM281856, GSM281858, GSM281860, GSM281862, GSM281864, GSM281866, GSM281868, GSM281870, GSM281872, GSM281874, GSM281876, GSM281878, GSM281880, GSM281882, GSM281884, GSM281886, GSM281888, GSM281890, GSM281892, GSM281894, GSM281896, GSM281898, GSM281900, GSM281902, GSM281904, GSM281906, GSM281908, GSM281910, GSM281912, GSM281914 96

Table A.2 (continued) GSE ID Title Group Subset Samples GSE11190 Interferon signaling and treat- H 4 GSM281839 ment outcome in chronic hep- atitis C GSE19339 Differential gene expression in C 1 GSM480382, GSM480383, thrombus-derived white blood GSM480384, GSM480385 cells of patients with acute coronary syndrome GSE19339 Differential gene expression in C 2 GSM480386, GSM480387, thrombus-derived white blood GSM480388, GSM480389 cells of patients with acute coronary syndrome GSE29111 Characterisation of Myocardial C 1 GSM720972, GSM720973, Infarction and Unstable Angina GSM720974, GSM720975, with mRNA Profiles from GSM720976, GSM720977, Whole Blood of individual GSM720978, GSM720979, patients GSM720980, GSM720981, GSM720982, GSM720983, GSM720984, GSM720985, GSM720986, GSM720987, GSM720988, GSM720989, GSM720990, GSM720991, GSM720992, GSM720993, GSM720994, GSM720995, GSM720996, GSM720997, GSM720998, GSM720999, GSM721000, GSM721001, GSM721002, GSM721003, GSM721004, GSM721005, GSM721006, GSM721007 GSE29111 Characterisation of Myocardial C 2 GSM721008, GSM721009, Infarction and Unstable Angina GSM721010, GSM721011, with mRNA Profiles from GSM721012, GSM721013, Whole Blood of individual GSM721014, GSM721015, patients GSM721016, GSM721017, GSM721018, GSM721019, GSM721020, GSM721021, GSM721022, GSM721023 GSE13491 Therapeutic efficacy of human C 1 GSM340079, GSM340080, umbilical cord blood-derived GSM340081 mesenchymal stem cells in my- ocardial repair after infarction GSE13491 Therapeutic efficacy of human C 2 GSM340082, GSM340083, umbilical cord blood-derived GSM340084 mesenchymal stem cells in my- ocardial repair after infarction GSE13491 Therapeutic efficacy of human C 3 GSM340085, GSM340086, umbilical cord blood-derived GSM340087 mesenchymal stem cells in my- ocardial repair after infarction 97

Table A.2 (continued) GSE ID Title Group Subset Samples GSE13491 Therapeutic efficacy of human C 4 GSM340088, GSM340089, umbilical cord blood-derived GSM340090 mesenchymal stem cells in my- ocardial repair after infarction GSE12504 Hearts after off-pump coronary C 1 GSM313601, GSM313620, revascularization surgery and GSM313621, GSM313622, on-pump coronary artery by- GSM313623, GSM313624, pass grafting GSM313625, GSM313626, GSM313627, GSM313628 GSE12504 Hearts after off-pump coronary C 2 GSM313629, GSM313631, revascularization surgery and GSM313633, GSM313635, on-pump coronary artery by- GSM313637, GSM313638, pass grafting GSM313639, GSM313640, GSM313641, GSM313642 GSE12486 Changes in cardiac transcrip- C 1 GSM313629, GSM313631, tion profiles following on-pump GSM313633, GSM313635, coronary artery bypass grafting GSM313637 GSE12486 Changes in cardiac transcrip- C 2 GSM313638, GSM313639, tion profiles following on-pump GSM313640, GSM313641, coronary artery bypass grafting GSM313642 GSE34781 Expression data from normal C 1 GSM855105 human and unstable angina pa- tient. GSE34781 Expression data from normal C 2 GSM855106, GSM855107 human and unstable angina pa- tient. 
GSE29819 Myocardial transcriptome anal- C 1 GSM738990, GSM738992, ysis of human arrhythmogenic GSM738994, GSM738996, right ventricular cardiomyopa- GSM738998, GSM739000 thy (ARVC) GSE29819 Myocardial transcriptome anal- C 2 GSM738991, GSM738993, ysis of human arrhythmogenic GSM738995, GSM738997, right ventricular cardiomyopa- GSM738999, GSM739001 thy (ARVC) GSE29819 Myocardial transcriptome anal- C 3 GSM739002, GSM739004, ysis of human arrhythmogenic GSM739006, GSM739008, right ventricular cardiomyopa- GSM739010, GSM739012, thy (ARVC) GSM739014 GSE29819 Myocardial transcriptome anal- C 4 GSM739003, GSM739005, ysis of human arrhythmogenic GSM739007, GSM739009, right ventricular cardiomyopa- GSM739011, GSM739013, thy (ARVC) GSM739015 GSE29819 Myocardial transcriptome anal- C 5 GSM739016, GSM739018, ysis of human arrhythmogenic GSM739020, GSM739022, right ventricular cardiomyopa- GSM739024, GSM739026 thy (ARVC) 98

Table A.2 (continued) GSE ID Title Group Subset Samples GSE29819 Myocardial transcriptome anal- C 6 GSM739017, GSM739019, ysis of human arrhythmogenic GSM739021, GSM739023, right ventricular cardiomyopa- GSM739025, GSM739027 thy (ARVC) GSE12485 Changes in cardiac tran- C 1 GSM313601, GSM313620, scription profiles following GSM313621, GSM313622, off-pump coronary revascular- GSM313623 ization surgery GSE12485 Changes in cardiac tran- C 2 GSM313624, GSM313625, scription profiles following GSM313626, GSM313627, off-pump coronary revascular- GSM313628 ization surgery GSE14975 Rac1-Induced Connective Tis- C 1 GSM373956, GSM373957, sue Growth Factor regulates GSM373958, GSM373959, Connexin 43 and N-Cadherin GSM373960 Expression in Atrial Fibrillation GSE14975 Rac1-Induced Connective Tis- C 2 GSM373961, GSM373962, sue Growth Factor regulates GSM373963, GSM373964, Connexin 43 and N-Cadherin GSM373965 Expression in Atrial Fibrillation GSE4172 Gene expression profiling of C 1 GSM94831, GSM94854, human inflammatory Car- GSM94855, GSM94870 diomyopathy GSE4172 Gene expression profiling of C 2 GSM94836, GSM94837, human inflammatory Car- GSM94838, GSM94839, diomyopathy GSM94840, GSM94841, GSM94842, GSM94843 99

Appendix B: Data from the Multiple Platform Study of Binary Representations

Table B.1: Summary of Profiles in Multiple Platform Study. The test group, its abbreviation, the number of studies, and the number of samples are given for each group.

Group                          Abbr.  Studies  Samples
Duchenne’s Muscular Dystrophy  DMD    7        42
Breast Cancer                  BC     8        52
Huntington’s Disease           HD     13       57
Hepatic Tissue                 H      18       63
Cardiac Tissue                 C      18       75
Total                                 64       289

Table B.2: Profiles Used in Multiple Platform Study. The GEO Series ID, the platform, the Series title, the test group, the number of the subset in the series, and the test samples are given for each subset. There are 5 test groups: Breast Cancer (BC), Huntington’s Disease (HD), Duchenne’s Muscular Dystrophy (DMD), Hepatic (H), and Cardiac (C).

GSE ID  Platform  Title                                       Group  No.  Samples
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     1    GSM1928
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     2    GSM1929
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     3    GSM1930
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     4    GSM1931
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     5    GSM1932
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     6    GSM1933
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     7    GSM1934

Table B.2 (continued)
GSE ID  Platform  Title                                       Group  No.  Samples
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     8    GSM1935
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     9    GSM1936
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     10   GSM1937
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     11   GSM1938
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     12   GSM1939
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     13   GSM1940
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     14   GSM1941
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     15   GSM1942
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     16   GSM1943
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     17   GSM1944
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     18   GSM1945
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     19   GSM1946
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     20   GSM1947
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     21   GSM1948
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     22   GSM1949
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     23   GSM1950
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     24   GSM1951
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     25   GSM1952
GSE53   GPL170    Human mammary epithelium and breast cancer  HD     26   GSM1953
GDS214  GPL246    Duchenne muscular dystrophy (MuscleChip)    DMD    1    GSM4230, GSM4231, GSM4236, GSM4241

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS214 GPL246 Duchenne muscular dystrophy DMD 2 GSM4400, GSM4405, (MuscleChip) GSM4406, GSM4407, GSM4408, GSM4409, GSM4410, GSM4411, GSM4412, GSM4413, GSM4414, GSM4415, GSM4416, GSM4417, GSM4383, GSM4385, GSM4386, GSM4387, GSM4388, GSM4389, GSM4390, GSM4391, GSM4392, GSM4393, GSM4394, GSM48537 GDS236 GPL81 Dystrophin-deficient mdx muscle DMD 1 GSM4372, GSM4373, regeneration GSM4374, GSM4375, GSM4376 GDS236 GPL81 Dystrophin-deficient mdx muscle DMD 2 GSM4377, GSM4378, regeneration GSM4379, GSM4380, GSM4381 GDS563 GPL8300 Duchenne muscular dystrophy DMD 1 GSM15807, GSM15822, (II) (HG-U95A) GSM15823, GSM15824, GSM15825, GSM15826, GSM15827, GSM15828, GSM15829, GSM15830, GSM15831 GDS563 GPL8300 Duchenne muscular dystrophy DMD 2 GSM15833, GSM15834, (II) (HG-U95A) GSM15835, GSM15836, GSM15837, GSM15838, GSM15839, GSM15840, GSM15841, GSM15842, GSM15843, GSM15844 GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 1 GSM15775, GSM15776, traocular muscle development GSM15777 time course GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 2 GSM15845, GSM15846, traocular muscle development GSM15847 time course GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 3 GSM15851, GSM15852, traocular muscle development GSM15853 time course GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 4 GSM15857, GSM15858, traocular muscle development GSM15859 time course GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 5 GSM15767, GSM15771, traocular muscle development GSM15774 time course 102

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 6 GSM15778, GSM15940, traocular muscle development GSM15941 time course GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 7 GSM15848, GSM15849, traocular muscle development GSM15850 time course GDS614 GPL81 Dystrophin-deficient mdx ex- DMD 8 GSM15854, GSM15855, traocular muscle development GSM15856 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 1 GSM16475, GSM16476, aphram muscle development GSM16477 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 2 GSM16481, GSM16482, aphram muscle development GSM16483 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 3 GSM16487, GSM16488, aphram muscle development GSM16489 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 4 GSM16493, GSM16494, aphram muscle development GSM16495 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 5 GSM16499, GSM16500, aphram muscle development GSM16501 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 6 GSM16505, GSM16506, aphram muscle development GSM16507 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 7 GSM16472, GSM16473, aphram muscle development GSM16474 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 8 GSM16478, GSM16479, aphram muscle development GSM16480 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 9 GSM16484, GSM16485, aphram muscle development GSM16486 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 10 GSM16490, GSM16491, aphram muscle development GSM16492 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 11 GSM16496, GSM16497, aphram muscle development GSM16498 time course GDS638 GPL81 Dystrophin-deficient mdx di- DMD 12 GSM16502, GSM16503, aphram muscle development GSM16504 time course 103

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS639 GPL81 Dystrophin-deficient mdx di- DMD 1 GSM15863, GSM15864, aphram muscle development GSM15865 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 2 GSM15936, GSM15937, aphram muscle development GSM15938 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 3 GSM16451, GSM16452, aphram muscle development GSM16453 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 4 GSM16457, GSM16458, aphram muscle development GSM16459 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 5 GSM16463, GSM16464, aphram muscle development GSM16465 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 6 GSM16469, GSM16470, aphram muscle development GSM16471 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 7 GSM15860, GSM15861, aphram muscle development GSM15862 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 8 GSM15933, GSM15934, aphram muscle development GSM15935 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 9 GSM15939, GSM16449, aphram muscle development GSM16450 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 10 GSM16454, GSM16455, aphram muscle development GSM16456 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 11 GSM16460, GSM16461, aphram muscle development GSM16462 time course GDS639 GPL81 Dystrophin-deficient mdx di- DMD 12 GSM16466, GSM16467, aphram muscle development GSM16468 time course GDS703 GPL32 Dystrophin-deficient mdx ex- DMD 1 GSM17197, GSM17198, traocular and leg muscle GSM17199, GSM17200, GSM17201 GDS703 GPL32 Dystrophin-deficient mdx ex- DMD 2 GSM17206, GSM17207, traocular and leg muscle GSM17208, GSM17209, GSM17210 GDS703 GPL32 Dystrophin-deficient mdx ex- DMD 3 GSM24811, GSM24812, traocular and leg muscle GSM24813, GSM24814, GSM24815 104

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS703 GPL32 Dystrophin-deficient mdx ex- DMD 4 GSM24806, GSM24807, traocular and leg muscle GSM24808, GSM24809, GSM24810 GDS806/7 GPL1223 Estrogen positive breast cancer BC 1 GSM22365, GSM22366, recurrence during tamoxifen ther- GSM22367, GSM22368, apy: microdissected tumor GSM22369, GSM22370, GSM22371, GSM22372, GSM22373, GSM22374, GSM22375, GSM22376, GSM22377, GSM22378, GSM22379, GSM22380, GSM22381, GSM22382, GSM22383, GSM22384, GSM22385, GSM22386, GSM22387, GSM22388, GSM22389, GSM22390, GSM22391, GSM22392, GSM22393, GSM22394, GSM22395, GSM22396, GSM22397, GSM22398, GSM22399, GSM22400, GSM22401, GSM22402, GSM22403, GSM22404, GSM22405, GSM22406, GSM22407, GSM22408, GSM22409, GSM22410, GSM22411, GSM22412, GSM22413, GSM22414, GSM22415, GSM22416, GSM22417, GSM22418, GSM22419, GSM22420, GSM22421, GSM22422, GSM22423, GSM22424 105

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS806/7 GPL1223 Estrogen positive breast cancer BC 2 GSM22453, GSM22458, recurrence during tamoxifen ther- GSM22465, GSM22466, apy: microdissected tumor GSM22468, GSM22469, GSM22471, GSM22472, GSM22474, GSM22476, GSM22477, GSM22478, GSM22481, GSM22484, GSM22485, GSM22487, GSM22488, GSM22489, GSM22490, GSM22492, GSM22493, GSM22494, GSM22497, GSM22498, GSM22501, GSM22502, GSM22503, GSM22504, GSM22505, GSM22506, GSM22507, GSM22508, GSM22449, GSM22450, GSM22451, GSM22452, GSM22454, GSM22455, GSM22456, GSM22457, GSM22459, GSM22460, GSM22461, GSM22462, GSM22463, GSM22464, GSM22467, GSM22470, GSM22473, GSM22475, GSM22479, GSM22480, GSM22482, GSM22483, GSM22486, GSM22491, GSM22495, GSM22496, GSM22499, GSM22500 GDS1222 GPL81 Mammary tumorigenesis in BC 1 GSM48242, GSM48244, MMTV-neu model GSM48245 GDS1222 GPL81 Mammary tumorigenesis in BC 2 GSM48246, GSM48247, MMTV-neu model GSM48248, GSM48249 GDS1222 GPL81 Mammary tumorigenesis in BC 3 GSM48236, GSM48237, MMTV-neu model GSM48238, GSM48239, GSM48240 GDS1222 GPL81 Mammary tumorigenesis in BC 4 GSM48241, GSM48243 MMTV-neu model GDS1250 GPL96 Atypical ductal hyperplasia and BC 1 GSM45657, GSM45658, breast cancer GSM45659, GSM45660 GDS1250 GPL96 Atypical ductal hyperplasia and BC 2 GSM45661, GSM45662, breast cancer GSM45663, GSM45664 GSE2155 GPL1794 Microarray Analysis of Gene Ex- BC 1 GSM38895 pression in Human Normal and Breast Cancer Cells GSE2155 GPL1794 Microarray Analysis of Gene Ex- BC 2 GSM38896 pression in Human Normal and Breast Cancer Cells 106

Table B.2 (continued)
GSE ID   Platform  Title                                                                           Group  No.  Samples
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     3    GSM38897
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     4    GSM38898
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     5    GSM38899
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     6    GSM38900
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     7    GSM38901
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     8    GSM38902
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     9    GSM38903
GSE2155  GPL1794   Microarray Analysis of Gene Expression in Human Normal and Breast Cancer Cells  BC     10   GSM38904
GDS2250  GPL570    Basal-like breast cancer tumors                                                 BC     1    GSM85513, GSM85514, GSM85515, GSM85516, GSM85517, GSM85518, GSM85519
GDS2250  GPL570    Basal-like breast cancer tumors                                                 BC     2    GSM85493, GSM85494, GSM85495, GSM85496, GSM85497, GSM85498, GSM85499, GSM85500, GSM85501, GSM85502, GSM85503, GSM85504, GSM85505, GSM85506, GSM85507, GSM85508, GSM85509, GSM85510, GSM85511, GSM85512
GDS2250  GPL570    Basal-like breast cancer tumors                                                 BC     3    GSM85491, GSM85492
GDS2250  GPL570    Basal-like breast cancer tumors                                                 BC     4    GSM85473, GSM85474, GSM85475, GSM85476, GSM85477, GSM85478, GSM85479, GSM85480, GSM85481, GSM85482, GSM85483, GSM85484, GSM85485, GSM85486, GSM85487, GSM85488, GSM85489, GSM85490

Table B.2 (continued)
GSE ID   Platform  Title                                                                                   Group  No.  Samples
GSE4382  GPL180    Repeated observation of breast tumor subtypes in independent gene expression data sets  BC     1    GSM1844, GSM1845, GSM1846, GSM1847, GSM1849, GSM1850, GSM1851, GSM1852, GSM1854, GSM1855, GSM1856, GSM1857, GSM1858, GSM1859, GSM1860, GSM1861, GSM1862, GSM1866, GSM1867, GSM1868, GSM1870, GSM1871, GSM1872, GSM1873, GSM1874, GSM1876, GSM1877, GSM1878, GSM1879, GSM1880, GSM1881, GSM1882, GSM1885, GSM1888, GSM1889, GSM1890, GSM1891, GSM1892, GSM1893, GSM1895, GSM1896, GSM1897, GSM1898, GSM1899, GSM1900, GSM1901, GSM1903, GSM1904, GSM1905, GSM1907, GSM1908, GSM1909, GSM1910, GSM1911, GSM1913, GSM1914, GSM1915, GSM1916, GSM1917, GSM1918, GSM1927, GSM73720, GSM98953
GSE4382  GPL180    Repeated observation of breast tumor subtypes in independent gene expression data sets  BC     2    GSM1887, GSM1902, GSM1906
GDS717   GPL81     Huntington’s disease and combination drug therapy                                       HD     1    GSM13300, GSM13355, GSM13356
GDS717   GPL81     Huntington’s disease and combination drug therapy                                       HD     2    GSM13357, GSM13358, GSM13359
GDS717   GPL81     Huntington’s disease and combination drug therapy                                       HD     3    GSM13360, GSM13361, GSM13362, GSM13363
GDS717   GPL81     Huntington’s disease and combination drug therapy                                       HD     4    GSM13364, GSM13365, GSM13366
GDS1236  GPL85     Huntingtin exon 1 protein overexpression                                                HD     1    GSM49946, GSM49948, GSM49950, GSM49952
GDS1236  GPL85     Huntingtin exon 1 protein overexpression                                                HD     2    GSM49945, GSM49947, GSM49949, GSM49951

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS1331 GPL96 Huntington’s disease: peripheral HD 1 GSM30580, GSM30581, blood expression profile (HG- GSM30582, GSM30583, U133A) GSM30584, GSM30585, GSM30586, GSM30587, GSM30588, GSM30589, GSM30590, GSM30591, GSM30592, GSM30593 GDS1331 GPL96 Huntington’s disease: peripheral HD 2 GSM30542, GSM30543, blood expression profile (HG- GSM30544, GSM30545, U133A) GSM30546 GDS1331 GPL96 Huntington’s disease: peripheral HD 3 GSM30530, GSM30531, blood expression profile (HG- GSM30532, GSM30533, U133A) GSM30534, GSM30535, GSM30536, GSM30537, GSM30538, GSM30539, GSM30540, GSM30541 GDS1332 GPL1449 Huntington’s disease: periph- HD 1 GSM30698, GSM30699, eral blood expression profile GSM30700, GSM30701, (Codelink Uniset 20K) GSM30702, GSM30703, GSM30704, GSM30705, GSM30706, GSM30707, GSM30708, GSM30709, GSM30710, GSM30711 GDS1332 GPL1449 Huntington’s disease: periph- HD 2 GSM30693, GSM30694, eral blood expression profile GSM30695, GSM30696, (Codelink Uniset 20K) GSM30697 GDS1332 GPL1449 Huntington’s disease: periph- HD 3 GSM30681, GSM30682, eral blood expression profile GSM30683, GSM30684, (Codelink Uniset 20K) GSM30685, GSM30686, GSM30687, GSM30688, GSM30689, GSM30690, GSM30691, GSM30692 GDS2169 GPL81 Nuclear and extranuclear mutant HD 1 GSM73205, GSM73208, huntingtin exon 1 protein effect GSM73209 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 2 GSM73212, GSM73214, huntingtin exon 1 protein effect GSM73216, GSM73224 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 3 GSM73217, GSM73222, huntingtin exon 1 protein effect GSM73223 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 4 GSM73192, GSM73196, huntingtin exon 1 protein effect GSM73197 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 5 GSM73200, GSM73218, huntingtin exon 1 protein effect GSM73221, GSM73231 on cerebellum 109

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS2169 GPL81 Nuclear and extranuclear mutant HD 6 GSM73186, GSM73189, huntingtin exon 1 protein effect GSM73191 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 7 GSM73198, GSM73199, huntingtin exon 1 protein effect GSM73227, GSM73228 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 8 GSM73203, GSM73204, huntingtin exon 1 protein effect GSM73207 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 9 GSM73211, GSM73213, huntingtin exon 1 protein effect GSM73215, GSM73225 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 10 GSM73201, GSM73202, huntingtin exon 1 protein effect GSM73206 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 11 GSM73193, GSM73194, huntingtin exon 1 protein effect GSM73195 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 12 GSM73219, GSM73220, huntingtin exon 1 protein effect GSM73232, GSM73233 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 13 GSM73187, GSM73188, huntingtin exon 1 protein effect GSM73190 on cerebellum GDS2169 GPL81 Nuclear and extranuclear mutant HD 14 GSM73210, GSM73226, huntingtin exon 1 protein effect GSM73229, GSM73230 on cerebellum GDS2391 GPL1261 PGC-1alpha transcriptional coac- HD 1 GSM135225, GSM135226, tivator null mutation effect on the GSM135227 brain striatum GDS2391 GPL1261 PGC-1alpha transcriptional coac- HD 2 GSM135217, GSM135219, tivator null mutation effect on the GSM135221 brain striatum GDS2911 GPL1261 Huntington’s disease models HD 1 GSM82441, GSM82442, GSM82443 GDS2911 GPL1261 Huntington’s disease models HD 2 GSM82444, GSM82445, GSM82446 GDS2911 GPL1261 Huntington’s disease models HD 3 GSM82447, GSM82448, GSM82449 GDS2912 GPL1261 Huntington’s disease transgenic HD 1 GSM83863, GSM83872, model: time course GSM83873 GDS2912 GPL1261 Huntington’s disease transgenic HD 2 GSM83870, GSM83874, model: time course GSM83876 GDS2912 GPL1261 Huntington’s disease 
transgenic HD 3 GSM83862, GSM83866, model: time course GSM83871 GDS2912 GPL1261 Huntington’s disease transgenic HD 4 GSM83869, GSM83878, model: time course GSM83879 110

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS2912 GPL1261 Huntington’s disease transgenic HD 5 GSM83867, GSM83868 model: time course GDS2912 GPL1261 Huntington’s disease transgenic HD 6 GSM83864, GSM83865, model: time course GSM83875, GSM83877 GDS2887 GPL570 Moderate stage Huntington’s dis- HD 1 GSM217771, GSM217772, ease lymphocytes GSM217773, GSM217774, GSM217775 GDS2887 GPL570 Moderate stage Huntington’s dis- HD 2 GSM217766, GSM217767, ease lymphocytes GSM217768, GSM217769, GSM217770 GDS2887 GPL570 Moderate stage Huntington’s dis- HD 3 GSM217784, GSM217785, ease lymphocytes GSM217786, GSM217787 GDS2887 GPL570 Moderate stage Huntington’s dis- HD 4 GSM217776, GSM217777, ease lymphocytes GSM217778, GSM217779, GSM217780, GSM217781, GSM217782, GSM217783 GSE7958 GPL1261 Striatal gene expression data HD 1 GSM197114, GSM197116, from 3- and 18-month-old Q92 GSM197117 mice and control mice GSE7958 GPL1261 Striatal gene expression data HD 2 GSM197115, GSM197118, from 3- and 18-month-old Q92 GSM197119, mice and control mice GSE7958 GPL1261 Striatal gene expression data HD 3 GSM197120, GSM197121, from 3- and 18-month-old Q92 GSM197125 mice and control mice GSE7958 GPL1261 Striatal gene expression data HD 4 GSM197122, GSM197123, from 3- and 18-month-old Q92 GSM197124 mice and control mice GSE9375 GPL81 Striatal gene expression data HD 1 GSM238688, GSM238689, from 12 months-old Hdh4/Q80 GSM238690 mice and control mice GSE9375 GPL81 Striatal gene expression data HD 2 GSM238691, GSM238692, from 12 months-old Hdh4/Q80 GSM238693 mice and control mice GSE9857 GPL1261 Striatal gene expression data HD 1 GSM247436, GSM247437, from 12 weeks-old R6/2 mice and GSM247438, GSM247439, control mice GSM247440, GSM247445, GSM247446, GSM247447, GSM247448 GSE9857 GPL1261 Striatal gene expression data HD 2 GSM247441, GSM247442, from 12 weeks-old R6/2 mice and GSM247443, GSM247444, control mice GSM247449, GSM247450, GSM247451, GSM247452, GSM247453 111

Table B.2 (continued) GSE ID Platform Title Group No. Samples GSE3790 GPL96 Human cerebellum, frontal cortex HD 1 GSM86787, GSM86789, [BA4, BA9] and caudate nucleus GSM86791, GSM86793, HD tissue experiment GSM86795, GSM86796, GSM86798, GSM86799, GSM86801, GSM86802, GSM86803, GSM86808, GSM86812, GSM86814, GSM86816, GSM86817, GSM86818, GSM86819, GSM86820, GSM86821, GSM86822, GSM86823, GSM86824, GSM86825, GSM86826, GSM86827, GSM86828, GSM86829, GSM86830, GSM86831, GSM86832, GSM86833, GSM86856 GSE3790 GPL96 Human cerebellum, frontal cortex HD 2 GSM86788, GSM86790, [BA4, BA9] and caudate nucleus GSM86792, GSM86794, HD tissue experiment GSM86797, GSM86800, GSM86804, GSM86805, GSM86806, GSM86807, GSM86809, GSM86810, GSM86811, GSM86813, GSM86815, GSM86834, GSM86835, GSM86836, GSM86837, GSM86838, GSM86839, GSM86840, GSM86841, GSM86842, GSM86843, GSM86844, GSM86845, GSM86846, GSM86847, GSM86848, GSM86849, GSM86850, GSM86851, GSM86852, GSM86853, GSM86854, GSM86855 GSE3790 GPL96 Human cerebellum, frontal cortex HD 3 GSM86929, GSM86931, [BA4, BA9] and caudate nucleus GSM86933, GSM86937, HD tissue experiment GSM86938, GSM86939, GSM86940, GSM86942, GSM86943, GSM86947, GSM86948, GSM86951, GSM86957, GSM86958, GSM86959, GSM86960, GSM86961, GSM86962, GSM86963, GSM86964, GSM86965, GSM86966, GSM86967, GSM86968, GSM86969, GSM86970, GSM86971 112

Table B.2 (continued) GSE ID Platform Title Group No. Samples GSE3790 GPL96 Human cerebellum, frontal cortex HD 4 GSM86927, GSM86928, [BA4, BA9] and caudate nucleus GSM86930, GSM86932, HD tissue experiment GSM86934, GSM86935, GSM86936, GSM86941, GSM86944, GSM86945, GSM86946, GSM86949, GSM86950, GSM86952, GSM86953, GSM86954, GSM86955, GSM86956, GSM86973, GSM86974, GSM86975, GSM86976, GSM86977, GSM86978, GSM86979, GSM86980, GSM86981, GSM86982, GSM86983, GSM86984, GSM86985, GSM86986, GSM86987, GSM86988, GSM86989, GSM86990 GSE3790 GPL96 Human cerebellum, frontal cortex HD 5 GSM87071, GSM87072, [BA4, BA9] and caudate nucleus GSM87073, GSM87074, HD tissue experiment GSM87075, GSM87085, GSM87086, GSM87087, GSM87088, GSM87089, GSM87090, GSM87091, GSM87092, GSM87093, GSM87094, GSM87095, GSM87096, GSM87097, GSM87098, GSM87099, GSM87100, GSM87101 GSE3790 GPL96 Human cerebellum, frontal cortex HD 6 GSM87059, GSM87060, [BA4, BA9] and caudate nucleus GSM87061, GSM87062, HD tissue experiment GSM87063, GSM87067, GSM87068, GSM87069, GSM87070, GSM87076, GSM87077, GSM87078, GSM87079, GSM87080, GSM87081, GSM87082, GSM87083, GSM87084, GSM87103, GSM87104, GSM87105, GSM87106, GSM87107, GSM87108, GSM87109, GSM87110, GSM87111, GSM87112, GSM87113, GSM87114, GSM87115, GSM87116, GSM87117, GSM87118, GSM87119, GSM87120, GSM87121, GSM87122 GDS280 GPL81 treatment effect on liver H 1 GSM5950, GSM5957 GDS280 GPL81 Cytokine treatment effect on liver H 2 GSM5955, GSM5956 113

Table B.2 (continued)

| GSE ID | Platform | Title | Group No. | Samples |
|---|---|---|---|---|
| GDS280 | GPL81 | Cytokine treatment effect on liver | H 3 | GSM5951, GSM5952 |
| GDS280 | GPL81 | Cytokine treatment effect on liver | H 5 | GSM5958, GSM5959 |
| GDS280 | GPL81 | Cytokine treatment effect on liver | H 6 | GSM5960, GSM5961 |
| GDS1093 | GPL81 | NADH-cytochrome P450 reductase deletion effect on liver | H 1 | GSM24728, GSM24729 |
| GDS1093 | GPL81 | NADH-cytochrome P450 reductase deletion effect on liver | H 2 | GSM24730, GSM24747 |
| GDS1213 | GPL339 | Intermittent hypoxia effect on liver | H 1 | GSM32860, GSM32861, GSM32862, GSM32863, GSM32864 |
| GDS1213 | GPL339 | Intermittent hypoxia effect on liver | H 2 | GSM32865, GSM32866, GSM32867, GSM32868, GSM32869 |
| GDS1227 | GPL339 | Dietary supplement Boswellia serrata effects on liver | H 1 | GSM32467, GSM32484 |
| GDS1227 | GPL339 | Dietary supplement Boswellia serrata effects on liver | H 2 | GSM32493, GSM32494 |
| GDS1227 | GPL339 | Dietary supplement Boswellia serrata effects on liver | H 3 | GSM32495, GSM32496 |
| GDS1232 | GPL339 | alpha increase effect on of LDLR knockouts fed a high- high- | H 1 | GSM50801, GSM50802 |
| GDS1227 | GPL339 | Dietary supplement Boswellia serrata effects on liver | H 2 | GSM50803, GSM50804 |
| GDS1227 | GPL339 | Dietary supplement Boswellia serrata effects on liver | H 3 | GSM50805, GSM50806 |
| GDS1227 | GPL339 | Dietary supplement Boswellia serrata effects on liver | H 4 | GSM50807, GSM50808 |
| GDS1274 | GPL85 | A deficiency effect on liver | H 1 | GSM27430, GSM27431, GSM27432, GSM27433, GSM27434, GSM27435, GSM27436 |
| GDS1274 | GPL85 | A deficiency effect on liver | H 2 | GSM27437, GSM27438, GSM27439, GSM27440, GSM27441, GSM27442, GSM27443 |
| GDS1307 | GPL1355 | Various high fat diets effect on liver | H 1 | GSM78877, GSM78878 |
| GDS1307 | GPL1355 | Various high fat diets effect on liver | H 2 | GSM78869, GSM78870 |
| GDS1307 | GPL1355 | Various high fat diets effect on liver | H 3 | GSM78871, GSM78872 |
| GDS1307 | GPL1355 | Various high fat diets effect on liver | H 4 | GSM78873, GSM78874 |

Table B.2 (continued)

| GSE ID | Platform | Title | Group No. | Samples |
|---|---|---|---|---|
| GDS1307 | GPL1355 | Various high fat diets effect on liver | H 5 | GSM78875, GSM78876 |
| GDS1354 | GPL341 | Cirrhosis and liver endothelial cells | H 1 | GSM32440, GSM32441, GSM32442 |
| GDS1354 | GPL341 | Cirrhosis and liver endothelial cells | H 2 | GSM32443, GSM32444, GSM32445 |
| GDS1442 | GPL8300 | PPARα agonist ciprofibrate effect on liver | H 1 | GSM62852, GSM62853 |
| GDS1442 | GPL8300 | PPARα agonist ciprofibrate effect on liver | H 2 | GSM62854, GSM62855 |
| GDS1442 | GPL8300 | PPARα agonist ciprofibrate effect on liver | H 3 | GSM62856, GSM62857, GSM62858 |
| GDS1442 | GPL8300 | PPARα agonist ciprofibrate effect on liver | H 4 | GSM62863, GSM62864, GSM62865, GSM62866 |
| GDS1442 | GPL8300 | PPARα agonist ciprofibrate effect on liver | H 5 | GSM62867, GSM62868, GSM62869, GSM62870 |
| GDS1442 | GPL8300 | PPARα agonist ciprofibrate effect on liver | H 6 | GSM62871, GSM62872, GSM62873, GSM62874 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 1 | GSM63273, GSM63274 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 2 | GSM63275, GSM63276 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 3 | GSM63277, GSM63278 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 4 | GSM63281, GSM63282 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 5 | GSM63283, GSM63284 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 6 | GSM63285, GSM63286 |
| GDS1443 | GPL81 | Type 2 diabetes susceptibility: role of hepatic lipogenic capacity | H 7 | GSM63287, GSM63288 |
| GDS1484 | GPL85 | Anticarcinogen 3H-1,2-dithiole-3-thione effect on liver | H 1 | GSM71311, GSM71368, GSM71369, GSM71370 |
| GDS1484 | GPL85 | Anticarcinogen 3H-1,2-dithiole-3-thione effect on liver | H 2 | GSM71371, GSM71372, GSM71373, GSM71374 |
| GDS1517 | GPL1261 | Stearoyl-CoA desaturase 1-deficient mutants on a very low-fat, high-carbohydrate diet: liver expression profile | H 1 | GSM88882, GSM88883, GSM88884, GSM88885, GSM88886 |
| GDS1517 | GPL1261 | Stearoyl-CoA desaturase 1-deficient mutants on a very low-fat, high-carbohydrate diet: liver expression profile | H 2 | GSM88882, GSM88883, GSM88884, GSM88885, GSM88886 |

Table B.2 (continued)

| GSE ID | Platform | Title | Group No. | Samples |
|---|---|---|---|---|
| GDS1517 | GPL1261 | Stearoyl-CoA desaturase 1-deficient mutants on a very low-fat, high-carbohydrate diet: liver expression profile | H 3 | GSM88887, GSM88888, GSM88889, GSM88890, GSM88891 |
| GDS1517 | GPL1261 | Stearoyl-CoA desaturase 1-deficient mutants on a very low-fat, high-carbohydrate diet: liver expression profile | H 4 | GSM88877, GSM88878, GSM88879, GSM88880, GSM88881 |
| GDS1555 | GPL1261 | Glycerol kinase knockout effect on liver | H 1 | GSM87833, GSM87834, GSM87835, GSM87836 |
| GDS1555 | GPL1261 | Glycerol kinase knockout effect on liver | H 2 | GSM87837, GSM87838, GSM87839, GSM87840 |
| GDS1622 | GPL1355 | Arginine deprivation effect on tumorigenic hepatic cells | H 1 | GSM42161, GSM42162, GSM42163 |
| GDS1622 | GPL1355 | Arginine deprivation effect on tumorigenic hepatic cells | H 2 | GSM42164, GSM42165, GSM42166 |
| GDS1622 | GPL1355 | Arginine deprivation effect on tumorigenic hepatic cells | H 3 | GSM42167, GSM42168, GSM42169 |
| GDS1701 | GPL8321 | Endothelial progenitor cells in fetal liver | H 1 | GSM30084, GSM101117 |
| GDS1701 | GPL8321 | Endothelial progenitor cells in fetal liver | H 2 | GSM30085, GSM101118, GSM101119 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 1 | GSM45690, GSM45691, GSM45692, GSM45693 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 2 | GSM45694, GSM45695, GSM45696, GSM45697 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 3 | GSM45706, GSM45707, GSM45708, GSM45709 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 4 | GSM45698, GSM45699, GSM45700, GSM45701 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 5 | GSM45710, GSM45711, GSM45712, GSM45713 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 6 | GSM45702, GSM45703, GSM45704, GSM45705 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 7 | GSM45714, GSM45715, GSM45716, GSM45717 |
| GDS1808 | GPL81 | Candidate calorie restriction mimetic drugs effect on the liver | H 8 | GSM45718, GSM45719, GSM45720, GSM45721 |
| GDS1916 | GPL1261 | Hepatocyte nuclear factor 4 alpha knockout effect on the embryonic liver | H 1 | GSM69792, GSM69793, GSM69794 |
| GDS1916 | GPL1261 | Hepatocyte nuclear factor 4 alpha knockout effect on the embryonic liver | H 2 | GSM69795, GSM69796, GSM69797 |

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS2093 GPL341 Pregnane X receptor agonist ef- H 1 GSM111888, GSM111890, fect on the liver (RAE230A) GSM111891, GSM111893, GSM111895 GDS2093 GPL341 Pregnane X receptor agonist ef- H 2 GSM111897, GSM111899, fect on the liver (RAE230A) GSM111901, GSM111903, GSM111905 GDS40 GPL32 Cardiac development, maturation C 1 GSM2189, GSM2190, and aging GSM2191 GDS40 GPL32 Cardiac development, maturation C 2 GSM2088, GSM2178, and aging GSM2179 GDS40 GPL32 Cardiac development, maturation C 3 GSM2088, GSM2178, and aging GSM2179 GDS40 GPL32 Cardiac development, maturation C 4 GSM2183, GSM2184, and aging GSM2185 GDS40 GPL32 Cardiac development, maturation C 5 GSM2334, GSM2335, and aging GSM2336 GDS40 GPL32 Cardiac development, maturation C 6 GSM2186, GSM2187, and aging GSM2188 GDS40 GPL32 Cardiac development, maturation C 7 GSM2180, GSM2181, and aging GSM2182 GDS40 GPL32 Cardiac development, maturation C 8 GSM2337, GSM2338, and aging GSM2339 GDS49 GPL75 Congenital heart disease C 1 GSM2266, GSM2267, (Mu11K-A) GSM2268 GDS49 GPL75 Congenital heart disease C 2 GSM2269, GSM2270, (Mu11K-A) GSM2273 GDS49 GPL75 Congenital heart disease C 3 GSM2271, GSM2272, (Mu11K-A) GSM2274 GDS213 GPL246 Congestive heart failure C 1 GSM4423, GSM4425 GDS213 GPL246 Congestive heart failure C 2 GSM4424, GSM52521, GSM52522 GDS399 GPL85 Cardiac aging C 1 GSM6174, GSM6175, GSM6176, GSM6177, GSM6178 GDS399 GPL85 Cardiac aging C 2 GSM6168, GSM6169, GSM6170, GSM6171, GSM6172, GSM6173 GDS404 GPL81 Circadian oscillations and cardio- C 1 GSM6090, GSM6102 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 2 GSM6091 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 3 GSM6092 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 4 GSM6093 vascular function 117

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS404 GPL81 Circadian oscillations and cardio- C 5 GSM6094 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 6 GSM6095 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 7 GSM6096 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 8 GSM6097 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 9 GSM6098 vascular function GDS404 GPL81 Circadian oscillations and cardio- C 10 GSM6099 vascular function GDS411 GPL75 Heart failure and rescue (Mu11K- C 1 GSM10178, GSM10179, A) GSM10180, GSM10181, GSM10182, GSM10183, GSM10184, GSM10185, GSM10186, GSM10187, GSM10188, GSM10189 GDS411 GPL75 Heart failure and rescue (Mu11K- C 2 GSM10190, GSM10191, A) GSM10192, GSM10193, GSM10194, GSM10195, GSM10196, GSM10197, GSM10198, GSM10199, GSM10200, GSM10201 GDS411 GPL75 Heart failure and rescue (Mu11K- C 3 GSM10202, GSM10203, A) GSM10204, GSM10205, GSM10206, GSM10207, GSM10208, GSM10209, GSM10210, GSM10211, GSM10212, GSM10213 GDS411 GPL75 Heart failure and rescue (Mu11K- C 4 GSM10214, GSM10215, A) GSM10216, GSM10217, GSM10218, GSM10219 GDS411 GPL75 Heart failure and rescue (Mu11K- C 5 GSM10220, GSM10221, A) GSM10222, GSM10223, GSM10224, GSM10225 GDS411 GPL75 Heart failure and rescue (Mu11K- C 6 GSM10226, GSM10227, A) GSM10228, GSM10229, GSM10230 GDS437 GPL81 Heart transplants C 1 GSM8926, GSM8931, GSM8932 GDS437 GPL81 Heart transplants C 2 GSM8928, GSM8934, GSM8935 GDS437 GPL81 Heart transplants C 3 GSM8929, GSM8936, GSM8937 GDS437 GPL81 Heart transplants C 4 GSM8930, GSM8938, GSM8939 118

Table B.2 (continued) GSE ID Platform Title Group No. Samples GDS437 GPL81 Heart transplants C 5 GSM8927, GSM8933 GDS1001 GPL81 Na,K-ATPase alpha 1 isoform re- C 1 GSM19023, GSM19024 duced expression effect on hearts GDS1001 GPL81 Na,K-ATPase alpha 1 isoform re- C 2 GSM19025, GSM19026 duced expression effect on hearts GDS1080 GPL1261 Manganese superoxide dismutase C 1 GSM40959, GSM40960 deficiency effect on hearts GDS1080 GPL1261 Manganese superoxide dismutase C 2 GSM40957, GSM40958 deficiency effect on hearts GDS1247 GPL81 Dysferlin deficiency effect on C 1 GSM46151, GSM46152, skeletal and cardiac muscles GSM46153, GSM46154, GSM46155 GDS1247 GPL81 Dysferlin deficiency effect on C 2 GSM46166, GSM46167, skeletal and cardiac muscles GSM46168, GSM46169, GSM46170 GDS1247 GPL81 Dysferlin deficiency effect on C 3 GSM46156, GSM46157, skeletal and cardiac muscles GSM46158, GSM46159, GSM46160 GDS1247 GPL81 Dysferlin deficiency effect on C 4 GSM46161, GSM46162, skeletal and cardiac muscles GSM46163, GSM46164, GSM46165 GDS1264 GPL85 Left ventricular hypertrophy C 1 GSM38239, GSM38240, in spontaneously hypertensive GSM38241, GSM38242 model: time course GDS1264 GPL85 Left ventricular hypertrophy C 2 GSM38243, GSM38244, in spontaneously hypertensive GSM38245, GSM38246 model: time course GDS1264 GPL85 Left ventricular hypertrophy C 3 GSM38247, GSM38248, in spontaneously hypertensive GSM38249, GSM38250 model: time course GDS1264 GPL85 Left ventricular hypertrophy C 4 GSM38251, GSM38252, in spontaneously hypertensive GSM38253, GSM38254 model: time course GDS1264 GPL85 Left ventricular hypertrophy C 5 GSM38255, GSM38256, in spontaneously hypertensive GSM38257, GSM38258 model: time course GDS1302 GPL81 2,3,7,8-tetrachlorodibenzo-p- C 1 GSM60894, GSM60895, dioxin effect on cardiovascular GSM60896, GSM60897, development GSM60902, GSM60903, GSM60906, GSM60907 GDS1302 GPL81 2,3,7,8-tetrachlorodibenzo-p- C 2 GSM60898, GSM60901, dioxin effect on cardiovascular GSM60904, GSM60908 
development GDS1302 GPL81 2,3,7,8-tetrachlorodibenzo-p- C 3 GSM60899, GSM60900, dioxin effect on cardiovascular GSM60905, GSM60909 development 119

Table B.2 (continued)

| GSE ID | Platform | Title | Group No. | Samples |
|---|---|---|---|---|
| GDS1302 | GPL81 | 2,3,7,8-tetrachlorodibenzo-p-dioxin effect on cardiovascular development | C 4 | GSM60910, GSM60911, GSM60912, GSM60913 |

Appendix C: Data from the Initial Search of Binary Representation Database

Table C.1: Summary of Initial Searches of Databases. Searches of the database are performed using three distance measures to identify the 100 profiles least distant from the query. For each measure, the average and standard deviation are given for the number of positive bits (PB) in the selected profiles and for the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and the selected profiles. The time required to select the 100 least distant profiles is also included for each measure.

| Distance Measure | Statistic | PB | CPB | CNB | UB | Search Time |
|---|---|---|---|---|---|---|
| Modified Tanimoto | Average | 1835.35 | 441.02 | 13878.76 | 2280.22 | 18.77 ms |
| Modified Tanimoto | St. Dev. | 370.91 | 187.83 | 721.27 | 742.36 | |
| Tanimoto | Average | 2401.07 | 509.30 | 13464.70 | 2626.00 | 12.78 ms |
| Tanimoto | St. Dev. | 1105.70 | 211.27 | 1107.28 | 1046.40 | |
| Hamming | Average | 25.72 | 20.43 | 15317.13 | 1262.44 | 8.54 ms |
| Hamming | St. Dev. | 169.77 | 143.52 | 46.48 | 129.13 | |

Table C.2: Profiles Selected by Modified Tanimoto Distance. Profiles are identified by DataSet and profile titles and are ranked according to the modified Tanimoto distance between the selected profile and the query profile. The platform and number of positive bits (PB) of the selected profile as well as the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles are given for each profile that is selected. The query profile is separated from other profiles by a horizontal line.

| GDS ID | Title | Profile | Platform | Distance | PB | CPB | CNB | UB |
|---|---|---|---|---|---|---|---|---|
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-4, 48 h | GPL8300 | 0.0000 | 1278 | 1278 | 15322 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-4 TGFbeta, 48 h | GPL8300 | 0.4330 | 1134 | 669 | 14857 | 1074 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-4, 6 h | GPL8300 | 0.4554 | 1965 | 866 | 14223 | 1511 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-12 TGFbeta, 6 h | GPL8300 | 0.4568 | 2028 | 881 | 14175 | 1544 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-4 TGFbeta, 6 h | GPL8300 | 0.4582 | 2060 | 887 | 14149 | 1564 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-12, 6 h | GPL8300 | 0.4587 | 1999 | 868 | 14191 | 1541 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, control, 6 h | GPL8300 | 0.4590 | 2036 | 878 | 14164 | 1558 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-12 TGFbeta, 2 h | GPL8300 | 0.4613 | 2069 | 882 | 14135 | 1583 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, control, 2 h | GPL8300 | 0.4615 | 2105 | 892 | 14109 | 1599 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-4, 2 h | GPL8300 | 0.4643 | 2035 | 865 | 14152 | 1583 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-4 TGFbeta, 2 h | GPL8300 | 0.4665 | 2061 | 867 | 14128 | 1605 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-12, 2 h | GPL8300 | 0.4681 | 2061 | 863 | 14124 | 1613 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-12, 48 h | GPL8300 | 0.5191 | 973 | 467 | 14816 | 1317 |
| GDS1290 | CD4+ lymphocyte polarization into Th1 and Th2 cells in the presence of TGFbeta: time course (HG-U95A) | anti-CD3 anti-CD28, control, 48 h vs anti-CD3 anti-CD28, IL-12 TGFbeta, 48 h | GPL8300 | 0.5556 | 989 | 396 | 14729 | 1475 |
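The bit counts reported in Tables C.1 and C.2 are related to one another, and a short sketch may make that relationship easier to check. The Python below is illustrative only and is not the implementation used for the searches. It assumes that "uncommon bits" are the positions at which the two binary profiles differ, so that the Hamming distance equals UB, and it uses the textbook Tanimoto distance 1 - CPB / (CPB + UB), which may not match the exact definition used elsewhere in this thesis. The modified Tanimoto distance is not defined in this appendix and is therefore not reproduced. The profile length of 16600 bits is inferred from the query row of Table C.2 (1278 + 15322), and the function and variable names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BitCounts:
    pb: int   # positive bits in the selected profile
    cpb: int  # bits positive in both query and selected profile
    cnb: int  # bits negative in both query and selected profile
    ub: int   # bits that differ between the two profiles


def bit_counts(query: set[int], selected: set[int], n_bits: int) -> BitCounts:
    """Count PB, CPB, CNB and UB for profiles given as sets of positive-bit indices."""
    cpb = len(query & selected)   # positive in both
    ub = len(query ^ selected)    # positive in exactly one
    return BitCounts(pb=len(selected), cpb=cpb, cnb=n_bits - cpb - ub, ub=ub)


def hamming_distance(c: BitCounts) -> int:
    """Assumption: Hamming distance is the number of uncommon bits."""
    return c.ub


def tanimoto_distance(c: BitCounts) -> float:
    """Textbook Tanimoto distance; an assumption, not the thesis's modified form."""
    union = c.cpb + c.ub
    return 0.0 if union == 0 else 1.0 - c.cpb / union


# Synthetic profiles sized to mimic the second row of Table C.2:
# query PB = 1278, selected PB = 1134, CPB = 669, on an assumed 16600-bit profile.
N_BITS = 16600
query = set(range(1278))
selected = set(range(609, 609 + 1134))  # overlaps the query in 669 positions

counts = bit_counts(query, selected, N_BITS)
print(counts)
print(hamming_distance(counts), round(tanimoto_distance(counts), 4))
```

With these synthetic profiles the derived CNB and UB agree with the tabulated values for that row (14857 and 1074), which is the consistency check CPB + CNB + UB = 16600 that every row of the table should satisfy.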
(continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 123 GPL8300 0.5904 1224GPL8300 361 0.5942 1187 14459GPL8300 1780 345 0.6124GPL8300 1996 14480 0.6124 1775 340GPL8300 1996 0.6146 14248 439GPL8300 2012 2084 0.6151 13765 439 2396 1627GPL8300 13765 448 0.6166 2396 GPL8300 2218 13686 0.6166 2466 365GPL8300 2218 0.6168 14060 466 2175 1073 13570 466 2564 13570 2564 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4, 48 vs h anti-CD3 anti-CD3 anti-CD28,48 IL-12, h vsIL-4, anti-CD3 48 h anti-CD28, etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 3 HeLa hour anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12, 2 vs h anti-CD3 etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4 vs TGFbeta, anti-CD3 h 48 tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 124 GPL8300 0.6168 1634GPL8300 270 0.6175GPL8300 1955 14519 0.6179 1811 361 1641GPL8300 14049 280 0.6180 2190 2223 14456GPL8300 1864 415 0.6187 1565 13782GPL8300 2403 
359 0.6192 1628 14040GPL8300 2201 462 0.6192 1611 13561GPL91 2577 380 0.6200 13929 2167 2291 344 14101 2155 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4 TGFbeta, vs 6 h anti-CD3 etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 6 HeLa hour anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4 TGFbeta, vs 2 h anti-CD3 anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28, IL-4 TGFbeta, 6 h anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4, 6 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4, 2 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 6 normal, 5 toDuchenne muscular 12 dystrophy, year mix10 to vs 12 year mix tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS262 Duchenne muscular dystrophy Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 125 GPL8300 0.6200GPL8300 2101 0.6202 353 1994GPL8300 350 0.6207 14047 2200 1951 14061GPL8300 2189 370 0.6207 1935 13961GPL8300 2269 445 0.6207 1590 13600GPL91 2555 433 0.6210GPL8300 13654 2052 0.6213 2513 413 1951GPL8300 404 13741 0.6213 2446 2187 13775 2421 401 13788 2411 Lin negative, CD34vs Lin negative positive, CD34 positive etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 6 hour anti-CD3 
anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 6 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4, 2 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12, 6 vs h anti-CD3 normal, 4 toDuchenne muscular 13 dystrophy, year mix10 to vs 12 year mix anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12, 6 vs h anti-CD3 anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,12 IL- TGFbeta, 6 h tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) various types GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS262 Duchenne muscular dystrophy GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1095 Title Hematopoeitic stem cells of Profile Platform Distance PB CPB CNB UB 126 GPL8300 0.6214 1613GPL91 342 0.6214GPL8300 14074 1721 0.6215 2184 421 1875GPL8300 402 13691 0.6215 2488 1646 13773GPL8300 2425 444 0.6216GPL8300 2300 13579 0.6216 2577 344 1990GPL8300 14053 362 0.6216 2203 1642 13963GPL8300 2275 388 0.6217 2233 13835 2377 349 14025 2226 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h control, 6 vs h anti-CD3 normal, 5 toDuchenne muscular 12 dystrophy, year mix5 to vs 6 year mix] anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4, 6 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h control, 2 vs h anti-CD3 etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 3 HeLa hour anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 2 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12 vs TGFbeta, 
anti-CD3 h 2 docetaxel sensitive tumordocetaxel resistant vs tumor (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) treatment tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS262 Duchenne muscular dystrophy GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS360 Breast cancer and docetaxel Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 127 GPL8300 0.6219 1996GPL8300 464 0.6221 2163 13486GPL8300 2650 408 0.6223 1941 13740GPL8300 2452 348 0.6223 2194 14028GPL8300 2224 632 0.6228 1374 8098GPL8300 408 0.6229 7870 GPL8300 1912 13734 0.6234 2458 437 1956GPL8300 13596 397 0.6234 2567 1687 13778 2425 442 13570 2588 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4 TGFbeta, vs 6 h anti-CD3 anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,12, IL- 6 h anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12, 2 vs h anti-CD3 anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,12 IL- TGFbeta, 2 h control, 24 h vs TNFalphaparthenolide, and 24 h etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 hydroxyurea, 6 HeLa hour anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h control, 6 vs h anti-CD3 anti-CD3 anti-CD28,48 IL-12, h vsIL-4, anti-CD3 6 h anti-CD28, tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) keratinocytes in theof presence NF-kappa B inhibitor:course time tion tion into Th1 andthe Th2 presence 
cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1289 TNFalpha effect on epidermal GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 128 GPL8300 0.6236 1726GPL8300 301 0.6237GPL8300 2121 14249 0.6239 2050 390 2114GPL8300 13800 396 0.6239 2410 2114 13762GPL8300 2442 350 0.6241 1644 13985GPL8300 2265 356 0.6243 1979 13952GPL8300 2292 424 0.6246GPL8300 1797 0.6246 13625 2551 422 1690 422 13630 2548 13630 2548 anti-CD3 anti-CD28,48 IL-12, h vsIL-4 anti-CD3 TGFbeta, 6 anti-CD28, h etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 6 HeLa hour etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 0 hour etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 0 hour control, 1 h vsparthenolide, 24 TNFalpha h and anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h control, 2 vs h anti-CD3 non-union skeletal fracture vs normal skeletal fracture anti-CD3 anti-CD28,48 IL-12, h vsIL-4 anti-CD3 TGFbeta, 2 anti-CD28, h tion tion tion keratinocytes in theof presence NF-kappa B inhibitor:course time tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) fractures (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS1289 TNFalpha effect on epidermal GDS1290 CD4+ lymphocyte polariza- GDS367 SkeletalGDS1290 repair in 
non-union CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 129 GPL8300 0.6250 1951GPL91 341 0.6250GPL8300 14019 1601 0.6251 2240 397 2175 295 13740GPL8300 2463 0.6251 12102GPL8300 4203 2103 0.6251GPL8300 390 2035 0.6252 331 13761 2115GPL8300 2449 14052 429 0.6255 2217 2064 13576GPL8300 2595 604 0.6256 1690 12833 3163 416 13635 2549 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4 TGFbeta, vs 2 h anti-CD3 normal, 4 toDuchenne muscular 13 dystrophy, year mix5 to vs 6 year mix] etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 3 hour heart vs prostateuntreated, 1 h vs24 IFN-gamma, h etat/PCEP4 HeLa,yurea, 3 hour GPL8300 hydrox- vscontrol, pCep4 nocodazole, 3 HeLa hour anti-CD3 anti-CD28, 0.6251 IL-4, 48 h vs anti-CD3 anti-CD28,4, IL- 2 h 3093anti-CD3 347 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28, IL-4 TGFbeta, 2 h 13979anti-CD3 2274 anti-CD28,48 IL-12, h vsIL-12 anti-CD3 TGFbeta, 6 anti-CD28, h (HG-U95A) tion sion profiling (HG-U95A) tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) keratinocytes: time course GDS262 Duchenne muscular dystrophy GDS449 Cell cycle and Tat transactiva- GDS422 Normal human tissue expres- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- ProfileGDS1846 Interferon gamma effect on Platform Distance PB CPB CNB UB 130 GPL8300 0.6257 1681GPL8300 404 0.6258 1724 13691GPL8300 2505 418 0.6258 1147 13625GPL8300 2557 408 0.6259GPL8300 1911 13666 0.6259 2526 344 1739GPL8300 13976 342 0.6259 
2280 1709 13983GPL8300 2275 349 0.6260 2170 13947GPL8300 2304 258 0.6262 988 14433 1909 380 13791 2429 anti-CD3 anti-CD28,48 IL-12, h vsIL-12, anti-CD3 6 h anti-CD28, anti-CD3 anti-CD28,48 IL-12, h vscontrol, anti-CD3 6 h anti-CD28, anti-CD3 anti-CD28, IL-4, 6 h vs anti-CD3 anti-CD28,TGFbeta, IL-4 6 h etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 hydroxyurea, 0 HeLa hour anti-CD3 anti-CD28,48 IL-12, h vsIL-12 anti-CD3 TGFbeta, 2 anti-CD28, h control, 4 h vsparthenolide, 24 TNFalpha h and anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28, con- trol, 6 h anti-CD3 anti-CD28, control, 6 h vsIL-4 TGFbeta, anti-CD3 6 h anti-CD28, tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) keratinocytes in theof presence NF-kappa B inhibitor:course time tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1289 TNFalpha effect on epidermal GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 131 GPL8300 0.6265 2044GPL8300 351 0.6268GPL8300 1572 13934 0.6269 2315 346 1617GPL8300 13959 425 0.6274 2295 1282 13577GPL8300 2598 234 0.6275 1283 14568GPL8300 1798 401 0.6276 1698 13679GPL8300 2520 321 0.6276 1618 14071GPL8300 2208 328 0.6281 2000 14033 2239 274 14314 2012 anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,4, IL- 6 h etat/PCEP4 HeLa,yurea, 6 hydrox- hourHeLa, nocodazole, vs 0 hour etat/PCEP4 anti-CD3 anti-CD28, control, 2 h vsIL-12 TGFbeta, 
anti-CD3 6 h anti-CD28, PseudomonasPAK, exoS, xoYPseudomonas aeruginosa deleted vs PAK, needle complex deleted aeruginosa control, 48 h vs TNFalphaparthenolide, and 24 h anti-CD3 anti-CD28,48 IL-12, h vsIL-4, anti-CD3 2 h anti-CD28, anti-CD3 anti-CD28, control, 2 h vsIL-4 TGFbeta, anti-CD3 6 h anti-CD28, measles virus, 3 hvirus, 6 vs h measles tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) Pseudomonas aeruginosa type III secretion system mutants keratinocytes in theof presence NF-kappa B inhibitor:course time tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) measles virus infection:course time tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1022 Lung pneumocyte response to GDS1289 TNFalpha effect on epidermal GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1681 Dendritic cell response to Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 132 GPL8300 0.6283GPL8300 2022 0.6283 339 2022 13963GPL8300 326 2298 0.6286 14030GPL8300 1617 2244 0.6286 577GPL8300 1892 0.6290 12877GPL8300 391 3146 1270 0.6290 13691 391 2190GPL8300 2518 373 0.6290 13691GPL8300 2518 2012 0.6292 13775 2452 323 1125 368 14028 2249 13798 2434 heart vs spinal cordetat/PCEP4 HeLa,yurea, 3 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour GPL8300etat/PCEP4 HeLa,yurea, 3 hour hydrox- 0.6282 vscontrol, pCep4 nocodazole, 0 HeLa hour hypoxia vs normoxia 3022latanoprost 274 free acid,muscle ciliary vsacid, trabecular GPL91 meshwork latanoprost 14313 free etat/PCEP4 HeLa,yurea, 2013 3 hour hydrox- vs 0.6285control, pCep4 nocodazole, 6 HeLa hour untreated, 1 h 1920 vs48 IFN-gamma, h anti-CD3 anti-CD28, 388 IL-4, 48 h vs anti-CD3 anti-CD28, con- trol, 2 h 13710Lin 2502 negative, CD34vs 
Lin negative negative, CD34 positive anti-CD3 anti-CD28,48 IL-12, h vsIL-4 anti-CD3 TGFbeta, 48 anti-CD28, h tion poxia tion keratinocytes: time course tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) various types tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) tion outflow GDS449 Cell cycle and Tat transactiva- GDS2036 Macrophage response to hy- GDS449 Cell cycle and Tat transactiva- GDS1846 Interferon gammaGDS1290 effect on CD4+ lymphocyte polariza- GDS1095 Hematopoeitic stemGDS1290 cells of CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS422 Title NormalGDS449 human tissue expres- Cell cycle and Tat transactiva- ProfileGDS359 Glaucoma and aqueous humor Platform Distance PB CPB CNB UB 133 GPL8300 0.6292 1710GPL8300 302 0.6293 1694 14137 2161 0 15322 1278 GPL8300 0.0000 1278GPL8300 1278 0.6162 15322 1134GPL8300 0 669 0.6357 1965 14857 1074 866 14223 1511 anti-CD3 anti-CD28,48 IL-12, h vscontrol, anti-CD3 2 h anti-CD28, anti-CD3 anti-CD28,48 IL-12, h vsIL-12, anti-CD3 2 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-4, anti-CD3 48 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-4 anti-CD3 TGFbeta, 48 anti-CD28, h anti-CD3 anti-CD28, control, 48 h vsIL-4, anti-CD3 6 h anti-CD28, tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.2 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- ProfileGDS ID PlatformGDS1290 Title CD4+ Distance 
lymphocyte polariza- PBGDS1290 CD4+ CPB lymphocyte polariza- CNB UB Profile Platform Distance PB CPB CNB UB
Table C.3: Profiles Selected by Tanimoto Distance. Profiles are identified by DataSet and profile titles and are ranked according to the Tanimoto distance between the selected and query profiles. The platform and number of positive bits (PB) of the selected profile as well as the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles are given for each profile that is selected. The query profile is separated from other profiles by a horizontal line.
134 GPL8300 0.6367 2028GPL8300 881 0.6381 2060 14175GPL8300 1544 887 0.6396 2036 14149GPL8300 1564 878 0.6397 1999 14164GPL8300 1558 868 0.6419 2105 14191GPL8300 1541 892 0.6422 2069 14109GPL8300 1599 882 0.6467 2035 14135GPL8300 1583 865 0.6493 2061 14152 1583 867 14128 1605 anti-CD3 anti-CD28, control, 48 h vsIL-12 anti-CD3 TGFbeta, 6 anti-CD28, h anti-CD3 anti-CD28, control, 48 h vsIL-4 anti-CD3 TGFbeta, 6 anti-CD28, h anti-CD3 anti-CD28, control, 48 h vscontrol, anti-CD3 6 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-12, anti-CD3 6 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vscontrol, anti-CD3 2 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-12 anti-CD3 TGFbeta, 2 anti-CD28, h anti-CD3 anti-CD28, control, 48 h vsIL-4, anti-CD3 2 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-4 anti-CD3 TGFbeta, 2 anti-CD28, h tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ 
lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 135 GPL8300 0.6515 2061GPL8300 863 0.7382 973 14124GPL8300 1613 0.7883 467 989GPL8300 14816 0.8314 1317 396 1224GPL8300 14729 361 0.8373 1475 1187 14459 1780 345 14480 1775 anti-CD3 anti-CD28, control, 48 h vsIL-12, anti-CD3 2 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-12, anti-CD3 48 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-12 anti-CD3 TGFbeta, 48 anti-CD28, h anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4, 48 vs h anti-CD3 anti-CD3 anti-CD28,48 IL-12, h vsIL-4, anti-CD3 48 h anti-CD28, heart vs prostateAffymetrix, 4 h vs Enzo, 4 h GPL8300CodeLink, 14 h vs GPL8300 Enzo, 4 h 0.8429 0.8397 GPL8300heart vs spinal 6897 cord 3093 0.8435 1110 604 6760 9535 GPL8300 0 12833 5955 0.8450 3163 15322 3022 1278 577 12877 3146 tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) basedtion/labeling methods RNA amplifica- basedtion/labeling methods RNA amplifica- sion profiling (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS422 Normal human tissue expres- Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PBGDS1954 Various T7 RNA polymerase- CPBGDS1954 CNB Various T7 RNA polymerase- GDS422 UB Normal human tissue expres- 136 GPL8300 
0.8451GPL8300 1996 0.8451 439GPL8300 1996 0.8455 13765 439GPL8300 2396 4110 0.8462 13765 721GPL8300 2396 2218 0.8462 11933 466GPL8300 3946 2218 0.8463 13570 466 2564 2084 13570 448 2564 13686GPL8300 2466 0.8477GPL8300 5666 0.8480 918 2223 10574 462 5108 13561 2577 etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour Affymetrix, 4 h vs Affymetrix, 16 h etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 nocodazole, 3 HeLa hour Affymetrix, 16 h vs Enzo, 4 h GPL8300heart vs brain 0.8469Affymetrix, 4 h vs14 5781 CodeLink, h 937anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28, GPL8300 IL-4 TGFbeta, 6 10478 h 0.8471 5185 3178 591 12735 3274 tion basedtion/labeling methods RNA amplifica- tion tion tion basedtion/labeling methods RNA amplifica- tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion sion profiling (HG-U95A) basedtion/labeling methods RNA amplifica- GDS449 Cell cycle and Tat transactiva- GDS1954 Various T7 RNA polymerase- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS1954 Various T7 RNA polymerase- GDS1290 CD4+ lymphocyte polariza- Table C.3 (continued) GDS IDGDS449 Title Cell cycle and Tat transactiva- Profile Platform DistanceGDS422 PB NormalGDS1954 human tissue expres- CPB Various T7 RNA polymerase- CNB UB 137 GPL8300 0.8492GPL8300 4650 0.8510 777GPL91 2300 11449 464 0.8517 4374 2167 13486 445 2650 GPL8300 13600 0.8526GPL8300 2555 2233 0.8527GPL8300 632 1955 0.8530GPL8300 415 8098 2101 0.8530 7870 13782 433 2187GPL8300 2403 444 0.8531 13654 2513 4320 13579 2577 717 11719 4164 heart vs pancreasAffymetrix, 16 h vs CodeLink, 14 h etat/PCEP4 GPL8300 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 
nocodazole, 0.8488 3 HeLa hour normal, 5 toDuchenne muscular 3990 12 dystrophy, year mix10 to vs 12 year mix 692heart vs skeletal muscleheart 12024 vs kidney GPL8300 3884 docetaxel sensitive tumordocetaxel resistant vs 0.8522 tumor etat/PCEP4 HeLa,yurea, 4113 6 hour hydrox- vscontrol, GPL8300 pCep4 nocodazole, 6 HeLa hour 694Lin negative, 0.8526 CD34vs Lin negative positive, CD34 positive anti-CD3 11903 anti-CD28, 3136 IL-4, 48 h vs anti-CD3 4003 anti-CD28,12 IL- TGFbeta, 6 567 h normal, control 12753 vslapse AML, re- 3280 tion (HG-U95A) sion profiling (HG-U95A) tion various types tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sponse to chemotherapy sion profiling (HG-U95A) basedtion/labeling methods RNA amplifica- sion profiling (HG-U95A) treatment GDS449 Cell cycle and Tat transactiva- GDS262 Duchenne muscular dystrophy GDS422 Normal human tissue expres- GDS449 Cell cycle and Tat transactiva- GDS1095 Hematopoeitic stemGDS1290 cells of CD4+ lymphocyte polariza- GDS1059 Acute myeloid leukemia re- Table C.3 (continued) GDS IDGDS422 Title NormalGDS1954 human tissue expres- Various T7 RNA polymerase- ProfileGDS422 NormalGDS360 human tissue expres- Breast cancer and Platform docetaxel Distance PB CPB CNB UB 138 GPL8300 0.8541 2194GPL8300 442 0.8545 2163 13570 2588 437GPL91 13596 0.8553GPL8300 2567 2052 0.8555 421 1994GPL8300 340 13691 0.8563 2488 1627 14248GPL8300 2012 413 0.8564 4311 13741GPL8300 2446 365 0.8570 1951 14060 2175 518 12990 3092 anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,12 IL- TGFbeta, 2 h anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,12, IL- 6 h lung vs prostatenormal, 4 toDuchenne muscular 13 dystrophy, year mix10 to vs 12 year mix etat/PCEP4 HeLa, GPL8300unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- 0.8547trol, nocodazole, 6 hour anti-CD3 3066 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12, 2 vs h 551 anti-CD3 normal, control vs AML, com- 12807plete remission heart vs 3242 
thymusanti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 6 GPL8300 0.8565 2850 702 11713 4185 tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sponse to chemotherapy sion profiling (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS422 Normal human tissue expres- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1059 Acute myeloidGDS422 leukemia re- Normal human tissue expres- Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- ProfileGDS262 Duchenne muscular dystrophy Platform Distance PB CPBGDS1290 CNB CD4+ lymphocyte polariza- UB 139 GPL8300 0.8573 1990GPL8300 404 0.8574 1935 13775GPL8300 2421 408 0.8575GPL8300 2121 13740 0.8576 2452 401 1996GPL8300 13788 424 0.8578 2411 1951 13625GPL8300 2551 408 0.8579 2114 13734GPL8300 2458 380 0.8579 2114 13929GPL8300 2291 402 0.8581 2175 13773 2425 422 13630 2548 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 2 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4, 2 vs h anti-CD3 etat/PCEP4 HeLa,yurea, 0 hour hydrox- vscontrol, pCep4 nocodazole, 6 HeLa hour anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4 TGFbeta, vs 6 h anti-CD3 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12, 6 vs h anti-CD3 etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 0 hour etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 0 hour etat/PCEP4 HeLa,unsychronized, control control - unsy- - chronized vs pCep4 HeLa con- trol, nocodazole, 3 hour tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion into Th1 andthe 
Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 140 GPL8300 0.8585 1634GPL8300 429 0.8589GPL8300 3600 0.8593 13576 2595 476 1941GPL8300 361 0.8594 13237 2887 2170 14049GPL8300 2190 0 0.8595 2115GPL8300 15322 397 0.8597 1278 1875 13778GPL8300 2425 425 0.8597GPL8300 2103 0.8598 13577 2598 418 1641 388 13625 2557 13835 2377 heart vs lunganti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4 TGFbeta, vs 6 h anti-CD3 carboplatin sensitive tumor vs GPL8300carboplatin resistant tumor anti-CD3 0.8585 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-12, 2 vs 2561 h anti-CD3 anti-CD3 anti-CD28, 422 IL-4, 48 h vs anti-CD3 anti-CD28, con- trol, 6 h 13630anti-CD3 2548 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,4, IL- 2 h anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4, 6 vs h anti-CD3 untreated, 1 h vs24 IFN-gamma, h anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4 TGFbeta, vs 2 h anti-CD3 sistant ovarian carcinoma tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) keratinocytes: time course tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1381 Carboplatin 
sensitiveGDS1290 and re- CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1846 Interferon gammaGDS1290 effect on CD4+ lymphocyte polariza- Table C.3 (continued) GDS IDGDS422 Title NormalGDS1290 human tissue expres- CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 141 GPL8300 0.8605 1956GPL8300 416 0.8607GPL8300 1912 13635 0.8609 2549 359 2064GPL8300 14040 370 0.8611 2201 GPL8300 2035 13961 0.8612 2269 396 1979GPL8300 13762 390 0.8617 2442 1628 13800GPL8300 2410 408 0.8622 1611 13666GPL8300 2526 404 0.8623 1565 13691 2505 397 13740 2463 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h control, 6 vs h anti-CD3 etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 hydroxyurea, 6 HeLa hour anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28, IL-4 TGFbeta, 2 h etat/PCEP4 HeLa,yurea, 3 hour hydrox- vscontrol, pCep4 nocodazole, 3 HeLa hour anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h control, 2 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4, 2 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 6 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-4, 6 vs h anti-CD3 tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Table C.3 (continued) GDS IDGDS1290 
Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 142 GPL8300 0.8626 1951GPL8300 353 0.8627GPL8300 2342 0.8627 14047 2200 350 2044GPL91 344 14061 0.8627GPL8300 2189 14101 1721 0.8630 2155 390 2190 437 13761 2449 13417GPL80 2746 GPL8300 0.8643 0.8645 4086 1646 418 665 13550 11704 2632 4231 anti-CD3 anti-CD28,TGFbeta, 48 IL-4 anti-CD28, h IL-4 TGFbeta, vs 2 h anti-CD3 training ,test, non-responder non-responder vs anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,4, IL- 6 h normal, 5 toDuchenne muscular 12 dystrophy, year mix5 to vs 6 year mix] anti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28, con- trol, 2 h lung vs pancreaslung vs spinal cordp53 -/+, 0 hour vshour p53 +/+, GPL8300 12 anti-CD3 anti-CD28,TGFbeta, GPL8300 0.8642 IL-12 48anti-CD28, h control, 2 vs h anti-CD3 0.8643 4283lung vs skeletal muscle 3190 401 362 13679 GPL8300 2520 13963 0.8646 2275 3895 534 12666 3400 therapy response tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) sion profiling (HG-U95A) gene dosage effects GDS1887 Rectal cancer cellsGDS1290 and radio- CD4+ lymphocyte polariza- GDS262 Duchenne muscular dystrophy GDS1290 CD4+ lymphocyte polariza- GDS422 Normal human tissue expres- GDS1290 CD4+ lymphocyte polariza- GDS422 Normal human tissue expres- Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile PlatformGDS422 Distance NormalGDS170 human tissue PB expres- Tumor suppressor protein p53 CPB CNB UB 143 GPL8300 0.8646 1590GPL8300 641 0.8647 1642 11877GPL8300 4082 349 0.8647GPL8300 1911 14025 0.8649 2226 617 1613GPL8300 12044 342 0.8653GPL8300 3939 1797 0.8656 14074 2184 348 1726GPL8300 380 0.8656 14028GPL8300 2224 2022 13791 
0.8656 2429 344GPL8300 2022 0.8657 14053 295 2203 2000 12102 356 4203 13952 2292 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12, 6 vs h anti-CD3 anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h IL-12 vs TGFbeta, anti-CD3 h 2 etat/PCEP4 HeLa,yurea, 6 hour hydrox- vscontrol, pCep4 hydroxyurea, 0 HeLa hour anti-CD3 anti-CD28,TGFbeta, IL-12 48anti-CD28, h control, 6 vs h anti-CD3 non-union skeletal fracture vs normal skeletal fracture anti-CD3 anti-CD28,48 IL-12, h vsIL-4 anti-CD3 TGFbeta, 6 anti-CD28, h etat/PCEP4 HeLa,yurea, 3 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour etat/PCEP4 HeLa,yurea, 3 hour hydrox- vscontrol, pCep4 nocodazole, 0 HeLa hour measles virus, 3 hvirus, 6 vs h measles tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) fractures (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion tion measles virus infection:course time tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS1290 CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS1290 CD4+ lymphocyte polariza- GDS367 SkeletalGDS1290 repair in non-union CD4+ lymphocyte polariza- GDS449 Cell cycle and Tat transactiva- GDS449 Cell cycle and Tat transactiva- GDS1681 Dendritic cell response to Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 144 GPL8300 0.8662 1687 391GPL8300 0.8666 13691 2518 2096GPL8300 388 0.8667GPL8300 2012 0.8676 13710 2502 350 1690GPL8300 470 0.8679 13985 2265 1644 13077 3053 397 13623 2580 anti-CD3 anti-CD28,48 IL-12, h vsIL-4, anti-CD3 6 h anti-CD28, heart vs spleenanti-CD3 anti-CD28, IL-4, 48 h vs anti-CD3 anti-CD28,12, IL- 2 h Lin GPL8300 negative, CD34vs Lin negative negative, CD34 positive 0.8666anti-CD3 anti-CD28,48 IL-12, h vs 2715IL-4 anti-CD3 TGFbeta, 2 anti-CD28, h 391control, 1 h vsparthenolide, 24 TNFalpha h 
and 13691 2518 sion profiling (HG-U95A) various types tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) keratinocytes in theof presence NF-kappa B inhibitor:course time tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) GDS422 Normal human tissue expres- GDS1095 Hematopoeitic stemGDS1290 cells of CD4+ lymphocyte polariza- GDS1289 TNFalpha effect on epidermal Table C.3 (continued) GDS IDGDS1290 Title CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- Profile Platform Distance PB CPB CNB UB 145 GPL8300 0.00000 1278GPL8300 1278 0.06470 15322 1134 0 669GPL72 14857 0.07687 1074 GPL80 2GPL72 0.07687 2 2 0.07687 2 15322 0 1276 3 15322 1278 15321 1276 anti-CD3 anti-CD28, control, 48 h vsIL-4, anti-CD3 48 h anti-CD28, anti-CD3 anti-CD28, control, 48 h vsIL-4 anti-CD3 TGFbeta, 48 anti-CD28, h untreated vs interferon betaGFP positive, gain GPL81 of function, Arm vs GFP positive,function, gain Pnt of 0.07681uninfected, 5 24fected, 72 hour hour vsimaginal in- wing disc bodyfragment wall vs 4disc wing/hinge imaginal fragment wing LPS, 8 h vs LPS and R848, 8 h 15321 GPL570 1275 0.07687 2 2 15322 1276 course gene expression ergistic effectcells: time on course dendritic tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tion into Th1 andthe Th2 presence cells of in TGFbeta: time course (HG-U95A) tiviral state by interferon-beta from variousturbed mutants forment per- muscle develop- GDS171 HIVGDS192 viral infection Wing time imaginalGDS1249 disc spatial Toll-like receptor agonists syn- GDS IDGDS1290 Title CD4+ lymphocyte polariza- GDS1290 CD4+ lymphocyte polariza- GDS1519 Fibroblasts induced Profile toGDS1739 an an- Primary mesodermal cells Platform Distance PB CPB CNB UB
Table C.4: Profiles Selected by Hamming Distance. Profiles are identified by DataSet and profile titles and are ranked according to the Hamming distance between the selected and query profiles. The platform and number of positive bits (PB) of the selected profile as well as the number of common positive bits (CPB), common negative bits (CNB), and uncommon bits (UB) between the query profile and selected profiles are given for each profile that is selected. The query profile is separated from other profiles by a horizontal line, and a horizontal line is used to separate clusters of profiles.
146 GPL72 0.07687GPL341 2 0.07687GPL339 2 2 0.07687 15322 2 2 1276 2 15322 1276 GPL1261 15322 0.07693 1276 1GPL72 0 0.07693GPL85 1 15322 0.07693 1278 GPL90 1 1 0.07693 15322 3 1 1277 0 15322 1277 15322 1278 GFP positive, gain of function, Arm vs GFP positive,function, gain Tkv of high iron, 6 wk vswk low iron, 36 non-tumor, adjacent to tumor, 22 m vs tumor, 22 m TCDD, 1 d vs TCDD, 5 dH4 wild type vs H4 delta(2-26) GPL1319 GPL90H3 wild type vs H4 delta(2-26) 0.07687 GPL90control, 0.07693 control, male 2 vstrol, control, con- female 1wild 0.07693 type vs mutant 391 0 1GFP positive, gain 0 of function, FGFR vs GFP positive, gain GPL81 of function, Arm, Ras 15322 0 15322kainate, 1278 0.07693 immature,kainate, 24 mature, 1278 24 h h 15322 1 vs 0 1278 m,30 wild m, UASGAL1-IFH1, galac- type, 1tose control vs 15322 1277 duodenum:(RAE230A) time course tein mutant modelcellular of carcinoma hepato- fin: time course tants ticus effect onpocampus: immature time hip- course course from variousturbed mutants forment per- muscle develop- minal deletions minal deletions on heart: time course from variousturbed mutants forment per- muscle develop- GDS1054 Iron deficiencyGDS2006 effect on interacting pro- GDS1797 TCDD effect on regenerating GDS1053 Growth receptor mu- GDS1642 Kainate-induced status epilep- GDS1013 IFH1 overexpression: time Table C.4 (continued) GDS IDGDS1739 Title Primary mesodermal cells ProfileGDS1674 Histone H3 andGDS1674 H4 amino ter- Histone H3 andGDS1306 H4 amino ter- MAP kinase Platform activation effect GDS1739 
Distance PB Primary mesodermal cells CPB CNB UB 147 GPL90 0.07693GPL85 1 0.07693GPL82 1 0GPL341 0.07693 1 2 15322 0.07693GPL2529 1278 3 0.07693 1 15321GPL81 1 1277 0 15322 0.07693 0 1277 15322 1GPL90 1278 GPL85 15322 0.07693 0 1278 1 0.07693GPL72 15322 1 0 1278 0.07693GPL1261 1 0 15322 0.07693 1278 1 15322 1 1278 15322 1 1277 15322 1277 wild type, ammonium limited vs leu3limited mutant, ammonium control, immature, 6 h vstrol, mature, con- 72 h 7 d, sorted GFP negatived, vs sorted 17 GFP positive high iron, 21 d vswk low iron, 36 5th generation, control vsgeneration, low-shear 5th modeled microgravity 7 d, sorted GFP negatived, vs unsorted 17 wild type vs spt10 null mutant GPL90wild typemutant vs spt10(C388S) kainate, 0.07693 mature,kainate, mature, 240 h 24 1 h vs GFP positive, loss of function, spi vs GFPfunction, 0 positive, Arm, Ras gain of control, control,HRas-v12 15322 transgenic, 2 to 4 male w, female vs 1278 ticus effect onpocampus: immature time hip- course (MG-U74B) duodenum:(RAE230A) time course gravity effect (MG-U74A) mutants ticus effect onpocampus: immature time hip- course from variousturbed mutants forment per- muscle develop- on heart: time course files mutants GDS1642 Kainate-induced status epilep- GDS1632 OsteoblastGDS1054 differentiation Iron deficiencyGDS1687 effect on Low-shear modeled micro- GDS1631 OsteoblastGDS1534 differentiation DNA-binding protein SPT10 GDS1642 Kainate-induced status epilep- GDS1739 Primary mesodermal cells GDS1306 MAP kinase activation effect Table C.4 (continued) GDS IDGDS1103 Title leu3 mutant expression pro- ProfileGDS1534 Platform DNA-binding protein Distance SPT10 PB CPB CNB UB 148 GPL72 0.07693GPL85 1 0.07693 0 1GPL8300 15322 0 1278 0.07693GPL72 5 15322 0.07693 1278 0GPL341 1 0.07693 15322GPL1261 1 3 1278 0.07693GPL72 15322 3 3 0.07693 1277 GPL72 1 1 15320 1277 0.07693 2 15322 1 1277 15321 2 1277 15321 1277 GFP positive, loss of function, spi vs GFPfunction, positive, Arm gain of kainate, immature,kainate, 
immature, 72 1 h h vs LPS, 8 h vs R848, 8 huntreated, 0 h vs measles virus, 24 h GPL570GFP positive, gain of function, FGFR 0.07693 vs GFP positive, lossfunction, of lmd 1high iron, 6 wk vs high iron,wk 36 1wild type vsdown Gata-1 15322 knock- GFP positive, gain of function, EGFR 1277 vs GFP positive, gain of function, Arm, Ras GFP positive, gain of function, EGFR vs GFP positive, gain of function, Tkv ticus effect onpocampus: immature time hip- course ergistic effectcells: time on course dendritic from variousturbed mutants forment per- muscle develop- duodenum:(RAE230A) time course magakaryocytes from variousturbed mutants forment per- muscle develop- from variousturbed mutants forment per- muscle develop- from variousturbed mutants forment per- muscle develop- measles virus infection:course time GDS1642 Kainate-induced status epilep- GDS1249 Toll-like receptor agonists syn- GDS1739 Primary mesodermal cells GDS1054 Iron deficiencyGDS1245 effect on GATA-1 knockdown effect on GDS1739 Primary mesodermal cells GDS1739 Primary mesodermal cells Table C.4 (continued) GDS IDGDS1739 Title Primary mesodermal cells ProfileGDS1681 Dendritic cell response to Platform Distance PB CPB CNB UB 149 GPL339 0.07693GPL90 1 0.07693 0 3GPL72 15322 1 0.07693 1278 GPL85 1 15322 0.07693 1277 GPL96 2 1 0.07693GPL341 15321 1 0 1277 0.07693GPL80 1 1 15322GPL198 0.07693 1278 0 1 15322 0.07693GPL80 1277 1 15322 0GPL90 0.07693 1278 1 3 0.07693 15322 1 1278 15322 1 1277 1 15322 1277 15322 1277 untreated vs CstF-64pression overex- 0 m, wild type,m, wild control type, vs galactose 30 control vs 50uM zincGFP positive, gain of function, EGFR vs GFP GPL570 positive, gain of function, Notch 0.07693clofibrate, 4 h, individual sam- ples 1 vs clofibrate,vidual 24 samples h, indi- normal, GM0316B vsAG11498 HGPS, 1amoxicillin, 17 d, distalintestine small vs 15322 amoxicillin, 21distal small d, intestine 1277 uninfected, 48fected, 24 hour hour vsrop10-1 in- mutant,rop10-1 mutant, control abscisic acid vs 
uninfected, 48fected, 72 hour hour vswild in- type, ethanolwild limited type, ammonium vs limited course zinc effect onlymphoma line Ramos B-cell pooled and individualcomparison sample ria syndrome:(HG-U133A) fibroblast testine and colon development: time course course GTPaseseedlings rop10-1 mutant course files lipopolysaccharide and CstF- 64 overexpression from variousturbed mutants forment per- muscle develop- GDS1013 IFH1 overexpression:GDS1617 time Motexafin gadolinium and GDS1451 Toxicants effect onGDS1503 liver: Hutchinson-Gilford proge- GDS1273 Amoxicillin effect on small in- GDS171 HIVGDS1723 viral Abscisic infection acid effect time on small GDS171 HIVGDS1103 viral leu3 infection mutant time expression pro- Table C.4 (continued) GDS IDGDS1285 Title Macrophage response to GDS1739 Primary Profile mesodermal cells Platform Distance PB CPB CNB UB 150 GPL90 0.07693 3 2GPL339 0.07693 15321 5 1277 GPL90 1 0.07693GPL85 3 15322 0.07693 1277 GPL1261 1 1 0.07693GPL72 3 1 15322 0.07693 1277 2GPL90 1 15322 1277 15321 0.07693 1 1277 3 15322 2 1277 15321 1277 0 m,40 wild m, UASGAL1-IFH1, galac- type,tose control vs wild type vs spt10 nullMist1 null, control GPL90null, caerulein vs Mist1 0.07693 10 m, UASGAL1-IFH1, control vs 20galactose m, 2 UASGAL1-IFH1, kainate, mature,kainate, mature, 240 h 72 h 15321 vs control, control,HRas-v12 1277 transgenic, 2 female to 4 w, female vs GFP positive, loss of function, spi vs GFPfunction, wg positive, loss of 0 m, UASGAL1-IFH1, control vs 30 m, wild type, galactose regulator null mutant course ticus effect onpocampus: immature time hip- course on heart: time course from variousturbed mutants forment per- muscle develop- course course tion factor Mist1 null GDS1395GDS1270 Viral infection: time course SPT10 global control, 24 transcription h vs virus, 48 hGDS1395 GPL72GDS1395 Viral infection: time courseGDS1395 Viral infection: time course 0.07693GDS1013 Viral infection: virus, 24 time h course vs virus, 1 48 IFH1 h control, 24 h 
overexpression: vs virus, 24 h control, 48GDS1642 time h vs virus, 24 h GPL72 1 Kainate-induced status GPL72 epilep- GDS1306 GPL72 0.07693 15322 MAP 0.07693 kinase activation 1 effect 0.07693 1277 1GDS1739 1 Primary mesodermal 1 0 cells GDS1013 1 IFH1 15322 overexpression: 15322 time 1277 15322 1278 1277 Table C.4 (continued) GDS IDGDS1013 Title IFH1 overexpression: time GDS1395GDS1731 Viral infection: time course Caerulein effect on transcrip- Profile control, 48 h vs virus, 48 h GPL72 0.07693 1 Platform Distance 1 PB 15322 CPB 1277 CNB UB 151 GPL72 0.07693GPL85 1 0.07693GPL90 1 1GPL90 0.07693 15322 3 2 0.07693 1277 GPL72 3 1 15321 0.07693 1277 GPL72 1 0 15322 0.07693 1277 2 15322GPL81 1 1278 15321 0.07693GPL8300 1 1277 1 0.07693 15322 5GPL90 1 1277 2 0.07693 15322 3 1277 15321 1277 1 15322 1277 GFP positive, gain of function, FGFR vs GFP positive, gain of function, Notch kainate, immature,kainate, 24 mature, 240 h h vs 0 m, wild type,m, wild control type, vs galactose 60 0 m, UASGAL1-IFH1, control vs 30galactose m, UASGAL1-IFH1, GFP positive, gain of function, FGFR vs GFP positive, lossfunction, of wg GFP positive, gain of function, FGFR vs GFP positive, lossfunction, of Dl wild type , 4 d vsexpressor, PMP22 4 over- d unprimed, unstimulated,vs 0 IFN-gamma h primed,gamma restimulated, IFN- 24 h 0 m, UASGAL1-IFH1, control vs 40galactose m, UASGAL1-IFH1, ticus effect onpocampus: immature time hip- course course course from variousturbed mutants forment per- muscle develop- from variousturbed mutants forment per- muscle develop- gene dosage andtion point effect on muta- sciatic nerve macrophageIFN-gamma responsetime course restimulation: to course from variousturbed mutants forment per- muscle develop- GDS1642 Kainate-induced status epilep- GDS1013 IFH1 overexpression:GDS1013 time IFH1 overexpression:GDS1739 time Primary mesodermal cells GDS1739 Primary mesodermal cells GDS1371 Peripheral myelin protein 22 GDS1365 IFN-gammaGDS1013 primed IFH1 overexpression: time 
Table C.4 (continued) GDS IDGDS1739 Title Primary mesodermal cells Profile Platform Distance PB CPB CNB UB 152 GPL72 0.07693GPL341 1 0.07693GPL90 1 1GPL72 0.07693 15322 2 3 0.07693 1277 GPL90 1 15321 2 1277 0.07693GPL72 1 3 15321 0.07693 1277 15322GPL90 1 2 1277 0.07693GPL72 1 15321 3 1277 0.07693 15322GPL341 1 0 1277 0.07693 1 15322 1 1278 15322 0 1277 15322 1278 GFP positive, gain of function, FGFR vs GFP positive, lossfunction, of spi control, 17 d,testine distal small vs in- amoxicillin,distal small 21 intestine d, 0 m, UASGAL1-IFH1, control vs 60 m, wild type, galactose GFP positive, gain of function, EGFR vs GFP positive, lossfunction, of wg 0 m,20 wild m, UASGAL1-IFH1, galac- type,tose control vs GFP positive, gain of function, EGFR vs GFP positive, lossfunction, of Dl 0 m, UASGAL1-IFH1, control vs 60galactose m, UASGAL1-IFH1, GFP positive, gain of function, EGFR vs GFP positive, gain of function, Ras conditioned mediumBSN cells vs heregulin from testine and colon development: time course course from variousturbed mutants forment per- muscle develop- course from variousturbed mutants forment per- muscle develop- course from variousturbed mutants forment per- muscle develop- to non-branching and branch- ing growth promoting factors from variousturbed mutants forment per- muscle develop- GDS1273 Amoxicillin effect on small in- GDS1013 IFH1 overexpression:GDS1739 time Primary mesodermal cells GDS1013 IFH1 overexpression:GDS1739 time Primary mesodermal cells GDS1013 IFH1 overexpression:GDS1739 time Primary mesodermal cells GDS1518 Ureteric bud epithelia response Table C.4 (continued) GDS IDGDS1739 Title Primary mesodermal cells Profile Platform Distance PB CPB CNB UB 153 GPL72 0.07693GPL90 1 0.07693 1 1GPL82 15322 1GPL90 0.07693 1277 1 0.07693 15322 1 1277 0GPL90 0 0.07693 15322 1 1278 15322GPL72 1278 1 0.07693GPL90 1 15322 1277 0.07693 0 1 15322 1 1278 15322 1277 ble deletion GFP positive, gain of function, EGFR vs GFP positive, lossfunction, of lmd 30 m, wild 
type,30 m, galactose UASGAL1-IFH1, galac- vs tose 7 d, unsorted vs 17 d, unsorted7 GPL81 d, sorted GFP positived, vs sorted GFP 17 positive 0.07693 1wild type vs GHR -/- 030 m, wild type,40 m, galactose UASGAL1-IFH1, galac- vs tose GPL81 1532225uM zinc vs MGd/25uM zinc 1278 GPL570 0.07693GFP positive, gain of 1 function, 0.07693Arm vs GFP positive,function, wg loss 1 of 030 m, wild type,60 m, galactose UASGAL1-IFH1, galac- 1 vs tose 15322 1278 15322 1277 course (MG-U74A) tants zinc effect onlymphoma line Ramos B-cell course from variousturbed mutants forment per- muscle develop- (MG-U74B) course from variousturbed mutants forment per- muscle develop- GDS1013 IFH1 overexpression:GDS1631 time Osteoblast differentiation GDS1033 fhl1 and ifh1 deletionGDS1383 mutantsGDS1053 fhl1 Salinity deletion stress vs response fhl1 ifh1 dou- Growth hormone receptor mu- control, FL478 vs saline, IR29GDS1617 GPL2025 Motexafin gadolinium 0.07693 and 1 3GDS1013 IFH1 overexpression: 15320 time 1277 Table C.4 (continued) GDS IDGDS1739 Title Primary mesodermal cells GDS1632 Profile Osteoblast differentiation GDS1013 IFH1 Platform overexpression: time Distance PBGDS1739 Primary CPB mesodermal cells CNB UB 154 GPL97 0.07693GPL198 1 0.07693GPL90 1 0 0.07693GPL341 1 3 15322 0.07693 1278 3 15322 0 1277 0GPL72 15322 1278 0.07693 15322GPL72 1 1278 0.07693 2GPL72 1 15321 0.07693 1 1277 GPL80 1 15322 0.07693 1 1277 3 15322 1 1277 15322 1277 normal, GM00038Cmal, vs GM0316B nor- wild type, control vs wild type, abscisic acid 0 m,60 wild m, UASGAL1-IFH1, galac- type,tose control vs low iron, 6 wk vswk low iron, 36 adult, LPS vs aged, saline GPL1261GFP positive, gain of function, Arm 0.07693 vs GFP positive,function, lmd loss of 1GFP positive, gain of function, Arm vs GFP 2 positive,function, Dl loss of GFP positive, gain of function, 15321EGFR vs GFP positive, gain of function, Arm 1277 uninfected, 48fected, 0 hour hour vs in- GTPaseseedlings rop10-1 mutant course duodenum:(RAE230A) time course 
lipopolysaccharide-induced neuroinflammation andness sick- behavior from variousturbed mutants forment per- muscle develop- from variousturbed mutants forment per- muscle develop- course ria syndrome:(HG-U133B) fibroblast from variousturbed mutants forment per- muscle develop- GDS1723 Abscisic acid effect on small GDS1013 IFH1 overexpression:GDS1054 time Iron deficiencyGDS1311 effect on Age effect on GDS1739 Primary mesodermal cells GDS1739 Primary mesodermal cells GDS171 HIV viral infection time Table C.4 (continued) GDS IDGDS1504 Title Hutchinson-Gilford proge- ProfileGDS1739 Primary Platform mesodermal cells Distance PB CPB CNB UB 155

Appendix D: Predicted Enrichments Using Binary Distances

Table D.1: Enriched KEGG Pathways for GDS2856. Significantly regulated genes from GDS2856 are tested for enrichment of KEGG pathways using three measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered the proper enrichments, while the enrichments from the other two measures are predicted. Enriched pathways are listed under each measure and are shown with the values determined by all three measures. A pathway is considered enriched by a measure when that measure's value is less than 0.100.

Measure   Pathway    EASE      δτ        δR
Tanimoto  hsa04151   0.2542    0.0214    1.5251
          hsa04080   0.4630    0.0538    1.6730
          hsa04724   0.0594    0.0518    0.8059
          hsa04144   0.6691    0.0964    1.8685
          hsa04630   0.2013    0.0641    1.2164
          hsa04068   0.0521    0.0386    0.8583
          hsa04060   0.4114    0.0511    1.6150
          hsa05200   0.8529    0.0829    2.1179
Ratio     hsa00785   1.0000    0.8330    0.0159
EASE      hsa04724   0.0595    0.0518    0.8059
          hsa04068   0.0521    0.0386    0.8584
          hsa05020   0.0939    0.2096    0.3266
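The 0.100 cutoff stated in the caption can be applied mechanically to each row. The following is a minimal sketch, not the thesis implementation: the function name `enriched_by` and the row layout are assumptions made for illustration, and the five rows are copied from Table D.1.

```python
# Cutoff from the table caption: a value below 0.100 counts as enriched.
THRESHOLD = 0.100

# A few rows copied from Table D.1: (pathway, EASE, Tanimoto distance, ratio distance).
rows = [
    ("hsa04724", 0.0594, 0.0518, 0.8059),
    ("hsa04068", 0.0521, 0.0386, 0.8583),
    ("hsa00785", 1.0000, 0.8330, 0.0159),
    ("hsa05020", 0.0939, 0.2096, 0.3266),
    ("hsa04151", 0.2542, 0.0214, 1.5251),
]


def enriched_by(ease, tanimoto, ratio, threshold=THRESHOLD):
    """Return the names of the measures whose value is below the threshold.

    Hypothetical helper for illustration only.
    """
    scores = {"EASE": ease, "Tanimoto": tanimoto, "Ratio": ratio}
    return [name for name, value in scores.items() if value < threshold]


# Map each pathway to the list of measures that call it enriched.
calls = {pathway: enriched_by(e, t, r) for pathway, e, t, r in rows}

for pathway, measures in calls.items():
    print(pathway, measures)
```

Read this way, a pathway such as hsa04724 is flagged by both EASE and the Tanimoto distance, which is why it appears under both headings in the table, while hsa00785 is flagged only by the ratio distance.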

Table D.2: Enriched GO Annotations for GDS2856. Significantly regulated genes from GDS2856 are tested for enrichment of GO annotations using three measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered the proper enrichments, while the enrichments from the other two measures are predicted. Enriched annotations are listed under each measure and are shown with the values determined by all three measures. An annotation is considered enriched by a measure when that measure's value is less than 0.100.

Measure   Pathway    EASE      δτ        δR
Tanimoto  GO0045211  0.0081    0.0962    0.5265
          GO0005615  0.7675    0.0974    0.5854
          GO0005829  0.9678    0.0820    0.5889
          GO0005622  0.4877    0.0799    0.5802
Ratio     (none)
EASE      GO0035249  0.0999    0.2648    0.4775
          GO0060395  0.0247    0.2018    0.4763
          GO0045211  0.0081    0.0962    0.5265
          GO0035235  0.0384    0.2613    0.4185
          GO0031116  0.0491    0.3012    0.3330
          GO0071625  0.0651    0.3018    0.3646
          GO0030165  0.0500    0.1847    0.5134
          GO0048011  0.0745    0.1122    0.5522
          GO0007605  0.0021    0.1039    0.4980
          GO0030216  0.0884    0.2338    0.5009
          GO0005234  0.0185    0.2594    0.3681
          GO0060291  0.0821    0.2640    0.4661
          GO0060292  0.0826    0.3023    0.3903
          GO0042043  0.0651    0.3018    0.3646

Table D.3: Enriched KEGG Pathways for GDS3534. Significantly regulated genes from GDS3534 are tested for enrichment of KEGG pathways using three measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered the proper enrichments, while the enrichments from the other two measures are predicted. Enriched pathways are listed under each measure and are shown with the values determined by all three measures. A pathway is considered enriched by a measure when that measure's value is less than 0.100.

Measure   Pathway    EASE      δτ        δR
Tanimoto  hsa05169   0.2242    0.0488    0.1425
          hsa04151   0.0180    0.0032    0.0976
          hsa05161   0.0402    0.0633    0.0674
          hsa04015   0.0323    0.0228    0.0823
          hsa04010   0.0368    0.0129    0.0949
          hsa04080   0.8251    0.0489    0.3238
          hsa04022   0.0156    0.0371    0.0582
          hsa04020   0.2374    0.0662    0.1403
          hsa04024   0.2160    0.0487    0.1403
          hsa04144   0.3280    0.0277    0.1782
          hsa04921   0.0878    0.0644    0.0912
          hsa05206   0.9402    0.0559    0.4099
          hsa04014   0.0749    0.0235    0.1057
          hsa05016   0.0582    0.0346    0.0906
          hsa05010   0.1486    0.0652    0.1134
          hsa05012   0.1725    0.0994    0.1108
          hsa04725   0.0102    0.0853    0.0347
          hsa04727   0.0004    0.0828    0.0104
          hsa04120   0.0880    0.0881    0.0834
          hsa05166   0.0623    0.0144    0.1077
          hsa04810   0.0612    0.0257    0.0976
          hsa04510   0.0249    0.0227    0.0765
          hsa05222   0.0008    0.0928    0.0117
          hsa05215   0.0005    0.0829    0.0111
          hsa04068   0.0726    0.0878    0.0767
          hsa05205   0.0992    0.0351    0.1086
          hsa05200   0.8e-04   0.0005    0.0545
          hsa05203   0.1093    0.0352    0.1124
          hsa04910   0.0234    0.0627    0.0552
          hsa04261   0.1621    0.0893    0.1102
Ratio     hsa04915   0.1121    0.1688    0.0731
          hsa00565   0.3467    0.5282    0.0834
          hsa00564   0.1369    0.1890    0.0778
          hsa04914   0.0222    0.1485    0.0335
          hsa04964   0.5892    0.8473    0.0976

Table D.3 (continued)
Measure   Pathway    EASE      δτ        δR
          hsa04151   0.0180    0.0032    0.0976
          hsa05161   0.0402    0.0633    0.0674
          hsa04270   0.1128    0.1219    0.0840
          hsa03440   0.5235    0.7523    0.0976
          hsa04015   0.0323    0.0228    0.0823
          hsa00920   0.4061    0.9538    0.0097
          hsa04010   0.0368    0.0129    0.0949
          hsa04152   0.0934    0.1091    0.0797
          hsa00600   0.2428    0.4692    0.0621
          hsa00250   0.2876    0.5928    0.0534
          hsa04722   0.1128    0.1219    0.0840
          hsa04668   0.0924    0.1354    0.0728
          hsa04723   0.0461    0.1341    0.0515
          hsa04022   0.0156    0.0371    0.0582
          hsa04726   0.1057    0.1356    0.0780
          hsa03320   0.2504    0.3329    0.0880
          hsa05014   0.1927    0.4174    0.0559
          hsa03430   0.5892    0.8473    0.0976
          hsa05410   0.0945    0.2102    0.0582
          hsa05412   0.0777    0.2346    0.0473
          hsa05414   0.0889    0.1880    0.0602
          hsa05416   0.0067    0.2060    0.0122
          hsa00750   0.6059    1.0764    0.0073
          hsa05230   0.0185    0.2073    0.0220
          hsa04921   0.0878    0.0644    0.0912
          hsa05145   0.1128    0.1219    0.0840
          hsa04320   0.4675    0.7518    0.0754
          hsa05120   0.1444    0.2956    0.0609
          hsa04540   0.0465    0.1671    0.0450
          hsa04962   0.0999    0.4158    0.0292
          hsa04620   0.1738    0.1700    0.0944
          hsa00430   0.1805    0.8449    0.0026
          hsa00531   0.4621    0.8465    0.0473
          hsa00532   0.4956    0.8467    0.0587
          hsa00030   0.3158    0.6670    0.0473
          hsa00130   0.7974    1.0770    0.0976
          hsa05016   0.0582    0.0345    0.0906
          hsa05219   0.0702    0.4150    0.0207
          hsa03410   0.4196    0.6679    0.0789
          hsa00970   0.2068    0.3323    0.0745
          hsa04725   0.0102    0.0853    0.0347

Table D.3 (continued)
Measure   Pathway    EASE      δτ        δR
          hsa04122   0.7685    1.0769    0.0702
          hsa04727   0.0004    0.0828    0.0104
          hsa04120   0.0880    0.0881    0.0834
          hsa00524   1.0000    1.2156    0.0976
          hsa00520   0.2610    0.4695    0.0674
          hsa04961   0.3895    0.5287    0.0976
          hsa03460   0.2243    0.4179    0.0653
          hsa04966   0.2649    0.6665    0.0347
          hsa04730   0.2229    0.3726    0.0725
          hsa00120   0.6751    0.9550    0.0976
          hsa05211   0.0344    0.2331    0.0281
          hsa05216   0.1621    0.5914    0.0229
          hsa04110   0.0934    0.1091    0.0797
          hsa00511   0.4276    0.8463    0.0376
          hsa00290   1.0000    1.2155    0.0473
          hsa05220   0.0374    0.2084    0.0332
          hsa03018   0.1003    0.2352    0.0559
          hsa04810   0.0612    0.0257    0.0976
          hsa05134   0.2578    0.4183    0.0754
          hsa00450   0.6751    0.9550    0.0976
          hsa05032   0.0174    0.1326    0.0324
          hsa05033   0.4035    0.5940    0.0894
          hsa05030   0.1631    0.4169    0.0473
          hsa00051   0.3937    0.6677    0.0702
          hsa05218   0.0143    0.1848    0.0216
          hsa04510   0.0249    0.0227    0.0765
          hsa05222   0.0008    0.0928    0.0117
          hsa05212   0.2068    0.3323    0.0745
          hsa05213   0.0591    0.3292    0.0266
          hsa05210   0.0092    0.2063    0.0146
          hsa00790   0.5738    0.9545    0.0473
          hsa05214   0.3035    0.3737    0.0976
          hsa05215   0.0005    0.0829    0.0111
          hsa04068   0.0726    0.0878    0.0767
          hsa04064   0.1664    0.2117    0.0834
          hsa00650   0.2649    0.6665    0.0347
          hsa05221   0.1792    0.3719    0.0592
          hsa00062   0.4092    0.7514    0.0559
          hsa00061   0.5352    0.9543    0.0347
          hsa04210   0.0640    0.1873    0.0497
          hsa04370   0.2384    0.3728    0.0773

Table D.3 (continued)
Measure   Pathway    EASE      δτ        δR
          hsa05200   8.3e-05   0.0005    0.0545
          hsa04662   0.1923    0.2965    0.0762
          hsa04666   0.1769    0.2118    0.0868
          hsa05110   0.2243    0.4179    0.0653
          hsa00471   1.0000    1.2155    0.0473
          hsa01040   0.5892    0.8473    0.0976
          hsa04910   0.0234    0.0627    0.0552
          hsa00072   0.4061    0.9538    0.0097
          hsa00270   0.3568    0.5935    0.0739
EASE      hsa04914   0.0222    0.1485    0.0335
          hsa04151   0.0180    0.0032    0.0976
          hsa05161   0.0402    0.0633    0.0674
          hsa04015   0.0323    0.0228    0.0823
          hsa04010   0.0368    0.0129    0.0949
          hsa04152   0.0934    0.1091    0.0797
          hsa04668   0.0924    0.1354    0.0728
          hsa04723   0.0461    0.1341    0.0515
          hsa04022   0.0156    0.0371    0.0582
          hsa05410   0.0945    0.2102    0.0582
          hsa05412   0.0777    0.2346    0.0473
          hsa05414   0.0890    0.1880    0.0602
          hsa05416   0.0067    0.2060    0.0122
          hsa05230   0.0185    0.2073    0.0220
          hsa04921   0.0878    0.0644    0.0912
          hsa04540   0.0465    0.1671    0.0450
          hsa04962   0.0999    0.4158    0.0292
          hsa04014   0.0749    0.0235    0.1057
          hsa05016   0.0582    0.0345    0.0906
          hsa05219   0.0702    0.4150    0.0207
          hsa04725   0.0102    0.0853    0.0347
          hsa04727   0.0004    0.0828    0.0104
          hsa04120   0.0880    0.0881    0.0834
          hsa05211   0.0344    0.2331    0.0281
          hsa04110   0.0934    0.1091    0.0797
          hsa05220   0.0374    0.2084    0.0332
          hsa05166   0.0623    0.0144    0.1077
          hsa04810   0.0612    0.0257    0.0976
          hsa05032   0.0174    0.1326    0.0324
          hsa05218   0.0143    0.1848    0.0216
          hsa04510   0.0249    0.0227    0.0765
          hsa05222   0.0008    0.0928    0.0117

Table D.3 (continued)
Measure   Pathway    EASE      δτ        δR
          hsa05213   0.0591    0.3292    0.0266
          hsa05210   0.0092    0.2063    0.0146
          hsa05215   0.0005    0.0829    0.0111
          hsa04068   0.0726    0.0878    0.0767
          hsa04210   0.0640    0.1873    0.0497
          hsa05205   0.0992    0.0351    0.1086
          hsa05200   8.3e-05   0.0005    0.0545
          hsa04910   0.0234    0.0627    0.05524

Table D.4: Enriched GO Annotations for GDS3534. Significantly regulated genes from GDS3534 are tested for enrichment of GO annotations using three measures: the Tanimoto distance δτ, the ratio distance δR, and the EASE score. The EASE enrichments are considered the proper enrichments, while the enrichments from the other two measures are predicted. Enriched annotations are listed under each measure and are shown with the values determined by all three measures. An annotation is considered enriched by a measure when that measure's value is less than 0.100.

Measure   Pathway    EASE      δτ        δR
Tanimoto  GO0005794  0.3231    0.02341   0.4501
          GO0005524  0.3101    0.0058    0.4523
          GO0005509  0.9670    0.0570    0.4818
          GO0005925  0.0504    0.0666    0.4226
          GO0005739  0.3022    0.0092    0.4513
          GO0048471  0.0565    0.0347    0.4306
          GO0009986  0.8539    0.0826    0.4726
          GO0005615  0.9973    0.0233    0.4862
          GO0042803  0.4440    0.0363    0.4537
          GO0005654  0.0294    0.0011    0.4442
          GO0005730  0.9963    0.0493    0.4906
          GO0005829  0.0013    0.0005    0.4385
          GO0008270  0.8161    0.0140    0.4662
          GO0048011  0.0222    0.0952    0.4071
          GO0007596  0.2604    0.0609    0.4436
          GO0010467  0.4458    0.0255    0.4543
          GO0070062  0.0801    0.0010    0.4481
          GO0005743  0.0852    0.0605    0.4294
          GO0046982  0.5985    0.0866    0.4588
          GO0005622  0.9999    0.0318    0.4986
Ratio     GO0016035  0.078     0.4302    0.0935
EASE      GO0010259  0.0510    0.38312   0.25639
          GO0030014  0.0098    0.3685    0.20684
          GO0030018  0.0477    0.2214    0.38522
          GO0045294  0.0072    0.3829    0.15723
          GO0045296  0.0514    0.3550    0.30413
          GO0005791  0.0354    0.3171    0.32813
          GO0030521  0.0702    0.3295    0.34324
          GO0051082  0.0423    0.2291    0.38002
          GO0071456  0.0691    0.2296    0.39089
          GO0005925  0.0504    0.0666    0.42255
          GO0001102  0.0924    0.3421    0.34433
          GO0042787  0.0460    0.2642    0.36813
          GO0005769  0.0410    0.1343    0.40572
          GO0048471  0.0565    0.0347    0.43063

Table D.4 (continued)
Measure   Pathway    EASE      δτ        δR
          GO0071549  0.0779    0.3832    0.28177
          GO0009791  0.0534    0.2550    0.37583
          GO0051721  0.0306    0.3687    0.25832
          GO0042632  0.0123    0.2634    0.33628
          GO0051539  0.0617    0.3294    0.33821
          GO0007252  0.0226    0.3830    0.21158
          GO0000049  0.0118    0.3051    0.30413
          GO0005654  0.0294    0.0011    0.44424
          GO0005829  0.0013    0.0005    0.43849
          GO0030145  0.0954    0.3176    0.36334
          GO0031527  0.0182    0.3686    0.23402
          GO0008503  0.0458    0.4138    0.14719
          GO0005782  0.0713    0.3420    0.33301
          GO0048011  0.0223    0.0952    0.40711
          GO0046716  0.0937    0.3832    0.2933
          GO0005080  0.0311    0.3055    0.33301
          GO0008385  0.0681    0.3982    0.23657
          GO0016035  0.0783    0.4302    0.093521
          GO0046961  0.0576    0.3688    0.29008
          GO0030315  0.0617    0.3294    0.33821
          GO0070062  0.0801    0.0010    0.4481
          GO0005743  0.0852    0.0605    0.42944
          GO0005741  0.0770    0.2071    0.39954
          GO0019430  0.0863    0.3982    0.25367
          GO0019432  0.0269    0.3054    0.32854
          GO0005689  0.0817    0.3689    0.30858
          GO0071377  0.0467    0.3293    0.32764
          GO0015288  0.0666    0.4138    0.17648
          GO0031069  0.0947    0.3553    0.33301
          GO0007494  0.0044    0.3828    0.13689
          GO0030890  0.0924    0.3421    0.34433
          GO0015629  0.0759    0.1395    0.41398
          GO0042470  0.0024    0.1972    0.34158

Appendix E: Data Used in the Meta-analysis of Gene Expression Studies

Figure E.1: Phylogenetic Tree of Pathogen Species. The species are divided into 3 groups: Viruses (green), Bacteria (violet), and Eukaryotes (orange). The number of significant genes in each group is shown. Groups that do not produce a significant gene are included.

Table E.1: Profiles Used in Gene Expression Meta-analysis. The GSE series ID, the Platform ID, the series title, the species of infectious pathogen, control sample IDs, and infection sample IDs are given for each profile. Series that are performed on multiple platforms and series using multiple infectious species are separated into distinct profiles. For profiles that contain a large number of samples, vector notation of the form GSM(i):increment:GSM(i+n) is used.

GEO ID | Platform | Title | Infection Species | Control Samples | Infection Samples
GSE2600 | GPL570 | Promyelocytic cell response to Anaplasma phagocytophilum infection profile 1 | A. phagocytophilum | GSM49939, GSM49940, GSM49941 | GSM49942, GSM49943, GSM49944
GSE4764 | GPL81 | Intestinal lymphoid tissues infected with Yersinia enterocolitica: time course profile 1 | Y. enterocolitica | GSM107566, GSM107568 | GSM107571, GSM107572
GSE3408 | GPL1293 | M-CSF and GM-CSF differentiated macrophages response to bacillus Calmette-Guerin (Subset A) profile 1 | M. bovis | GSM76894, GSM76900 | GSM76897, GSM76902
GSE3408 | GPL1294 | M-CSF and GM-CSF differentiated macrophages response to bacillus Calmette-Guerin (Subset B) profile 1 | M. bovis | GSM76905, GSM76909 | GSM76906, GSM76910
GSE3408 | GPL1295 | M-CSF and GM-CSF differentiated macrophages response to bacillus Calmette-Guerin (Subset C) profile 1 | M. bovis | GSM76912, GSM76916 | GSM76913, GSM76917
GSE6765 | GPL1261 | Small intestine response to Aeromonas caviae infection profile 1 | A. caviae | GSM155996, GSM155997, GSM155998, GSM155999 | GSM155916, GSM155917, GSM155994, GSM155918, GSM155995
GSE6802 | GPL571 | Bronchial epithelial cell line response to various airway pathogens profile 1 | S. aureus | GSM157034, GSM157035, GSM157036 | GSM157037, GSM157038, GSM157039

GSM157043, GSM157044 GSM159477,GSM159480, GSM159478 GSM159479, GSM159481,GSM159483, GSM159484 GSM159482, GSM148023, GSM148024 GSM196068, GSM196069 GSM245725,GSM245727, GSM245728 GSM245726, GSM157036 GSM159487, GSM159488 GSM159487, GSM159488 GSM196064 GSM245731, GSM245732 P. aeroginosa GSM157034, GSM157035, C. pneumoniaB. abortus GSM154460, GSM154461B. melitensis GSM154463, GSM154464 GSM155205, GSM155248S. gordonii GSM155205, GSM155248 GSM155249, GSM155251 GSM155256, GSM155257 GSM159485, GSM159486, F. nucleatum GSM159485, GSM159486, E. coliP. aeroginosa GSM148017, GSM148018 GSM196062, GSM196063, GSM148021, GSM148022, P. gingivalis GSM245729, GSM245730, responsepathogens to profile 2 various airway Chlamydia pneumoniae infection profile 1 by virB mutantprofile 1 Brucella strains by virB mutantprofile 2 Brucella strains sponse to commensal andtunistic oral oppor- microbial species pro- file 1 sponse to commensal andtunistic oral oppor- microbial species pro- file 2 to intracellular bacterialnities commu- profile 1 domonasairway aeruginosa epithelium fromand exposed C57Bl6 MMP-7 andcient mice. 
MMP-10 profile 1 defi- sponse to oral pathogen infections profile 1 Table E.1 (continued) GEO IDGSE6802 Platform GPL571 Title Bronchial epithelial cell line Infection Species Control Samples Infection Samples GSE6690 GPL81 Macrophage cell line response to GSE6810 GPL1261 Splenocyte response toGSE6810 infection GPL1261 Splenocyte response toGSE6927 infection GPL96 Gingival epithelial cell line re- GSE6927 GPL96 Gingival epithelial cell line re- GSE6419 GPL8321 Host bladder urothelialGSE7957 response GPL1261 Expression data from Pseu- GSE9723 GPL96 Gingival epithelial cell line re- 167 GSM245733,GSM245735, GSM245736 GSM245734, GSM305429,GSM305433, GSM305435 GSM305431, GSM305437,GSM305441, GSM305442 GSM305439, GSM259189,GSM259191 GSM259190, GSM259192,GSM259194 GSM259193, GSM259180,GSM259182 GSM259181, GSM259183,GSM259185 GSM259184, GSM284859,GSM284865 GSM284862, GSM245729,GSM245731, GSM245732 GSM245730, GSM305434,GSM305438, GSM305440 GSM305436, GSM305434,GSM305438, GSM305440 GSM305436, GSM259188 GSM259188 GSM259179 GSM259179 GSM284864 A.comitans actinomycetem- S. flexneriS. flexneri GSM213913, GSM213914F. tularensis GSM213915, GSM213916 GSM213913, GSM213914 GSM305430, GSM213917, GSM213918 GSM305432, F. tularensis GSM305430, GSM305432, H. pylori GSM259186, GSM259187, H. pylori GSM259186, GSM259187, H. pylori GSM259177, GSM259178, H. pylori GSM259177, GSM259178, S. 
aureus GSM284858, GSM284861, sponse to oral pathogen infections profile 2 sponse toflexneri infection in vivo profile pathogen 1 Shigella sponse toflexneri infection in vivo profile pathogen 2 Shigella sponse to Francisella tularensis in- fections profile 1 sponse to Francisella tularensis in- fections profile 2 fect on a gastric epithelialitor progen- cell line profile 1 fect on a gastric epithelialitor progen- cell line profile 2 fect on a gastric epithelialitor progen- cell line profile 3 fect on a gastric epithelialitor progen- cell line profile 4 ins effectmononuclear cells on profile 1 peripheral blood Table E.1 (continued) GEO IDGSE9723 Platform GPL96 Title Gingival epithelial cell line re- Infection Species Control Samples Infection Samples GSE8636 GPL96 Intestinal tissue xenograft re- GSE8636 GPL96 IntestinalGSE12108 tissue GPL570 xenograft re- Peripheral blood monocyte re- GSE12108 GPL570 Peripheral blood monocyte re- GSE10262 GPL1261 Helicobacter pylori infection ef- GSE10262 GPL1261 Helicobacter pylori infection ef- GSE10262 GPL1261 Helicobacter pylori infection ef- GSE10262 GPL1261 Helicobacter pylori infection ef- GSE11281 GPL570 Staphylococcal superantigen tox- 168 GSM284860,GSM284866 GSM284863, GSM161570, GSM161571 GSM403380,GSM403384 GSM403382, GSM403387, GSM403388 GSM403444,GSM403446, GSM403447 GSM403445, GSM509706,GSM509714, GSM509711, GSM509724, GSM509729 GSM509719, GSM882309,GSM882312 GSM882310, GSM501251,GSM501253 GSM501252, GSM284864 GSM403378, GSM403379 GSM403378, GSM403379 GSM403378, GSM403379 GSM509781,GSM509777, GSM509779, GSM509773, GSM509775, GSM509769, GSM509771, GSM509765, GSM509767, GSM509761, GSM509763, GSM509757, GSM509759, GSM509753, GSM509751 GSM509755, GSM882307 S. aureus GSM284858, GSM284861, L. monocytogenesC. pneumonia GSM161581, GSM161582R. prowazekii GSM161223, GSM321605, GSM321606 GSM161226, GSM403281, GSM321607, GSM321608 GSM403377, R. prowazekii GSM403281, GSM403377, R. prowazekii GSM403281, GSM403377, S. 
aureus GSM509785, GSM509783, P. aeroginosa GSM882304, GSM882306, M. tuberculosis GSM501254, GSM501255 GSM501249, GSM501250, ins effectmononuclear cells on profile 2 peripheral blood infection series profile 1 effect on dendritic cells profile 1 sponse to infectionprowazekii by profile 1 Rickettsia sponse to infectionprowazekii by profile 2 Rickettsia sponse to infectionprowazekii by profile 3 Rickettsia whole blood profile 1 lung epithelial cells profile 1 comparison withparenchyma profile 1 normal lung Table E.1 (continued) GEO IDGSE11281 Platform GPL570 Title Staphylococcal superantigen tox- Infection Species Control Samples Infection Samples GSE7013 GPL81GSE12806 GPL570 Gnotobiotic mouse ileum; Listeria Chlamydia pneumonia infection GSE16123 GPL4133 HMEC-1 endothelial cell line re- GSE16123 GPL4133 HMEC-1 endothelial cell line re- GSE16123 GPL4133 HMEC-1 endothelial cell line re- GSE20346 GPL6947 Severe influenza A infection: GSE36174 GPL85 LPS-induced expression in rat GSE20050 GPL1352 Caseous tuberculosis granulomas 169 GSM528211,GSM528209 GSM528210, GSM528234,GSM528232 GSM528233, GSM605513,GSM605515, GSM605514, GSM605517 GSM605516, GSM605554,GSM605556, GSM605555, GSM605558, GSM605559 GSM605557, GSM605539,GSM605541, GSM605542 GSM605540, GSM605570,GSM605572, GSM605573 GSM605571, GSM14506,GSM14508 GSM14507, GSM528194,GSM528192, GSM528215 GSM528193, GSM528194,GSM528192, GSM528215 GSM528193, GSM605498,GSM605500, GSM605501 GSM605499, GSM605498,GSM605500, GSM605501 GSM605499, GSM605529, GSM605530 GSM605529, GSM605530 GSM14500, GSM14501 B. suis GSM528217, GSM528216, B. suis GSM528217, GSM528216, S. aureus GSM605496, GSM605497, S. aureus GSM605496, GSM605497, E. coli GSM605527, GSM605528, E. coli GSM605527, GSM605528, P. aeroginosa GSM14498, GSM14499, suis strainmacrophage cell infection line: time effectprofile course 1 on suis strainmacrophage cell infection line: time effectprofile course 2 on ent genetic predispositions for so- matic cell score response toreus S. 
and au- E. coli:file time 1 course pro- ent genetic predispositions for so- matic cell score response toreus S. and au- E. coli:file time 2 course pro- ent genetic predispositions for so- matic cell score response toreus S. and au- E. coli:file time 3 course pro- ent genetic predispositions for so- matic cell score response toreus S. and au- E. coli:file time 4 course pro- aeruginosa infected lung epithelial cell comparison profile 1 Table E.1 (continued) GEO IDGSE21117 Platform GPL1261 Rough Title attenuated Brucella Infection Species Control Samples Infection Samples GSE21117 GPL1261 Rough attenuated Brucella GSE24560 GPL2112 Mammary gland cells of differ- GSE24560 GPL2112 Mammary gland cells of differ- GSE24560 GPL2112 Mammary gland cells of differ- GSE24560 GPL2112 Mammary gland cells of differ- GSE923 GPL96 Mucoid and motile Pseudomonas 170 GSM14513,GSM14515, GSM14516 GSM14514, GSM14502,GSM14504, GSM14505 GSM14503, GSM14509,GSM14511, GSM14512 GSM14510, GSM262580,GSM262672, GSM262671, GSM262688 GSM262674, GSM245725,GSM245727, GSM245728 GSM245726, GSM280331,GSM280343, GSM280345 GSM280337, GSM280333,GSM280347, GSM280349 GSM280339, GSM280335,GSM280351, GSM280353 GSM280341, GSM14500, GSM14501 GSM14500, GSM14501 GSM14500, GSM14501 GSM262675,GSM262691 GSM262689, GSM265991, GSM265992 GSM280344, GSM280346 GSM280348, GSM280350 GSM280352, GSM280354 P. aeroginosa GSM14498, GSM14499, P. aeroginosa GSM14498, GSM14499, P. aeroginosa GSM14498, GSM14499, L. monocytogenes GSM262670, GSM262673, P. gingivalis GSM265989, GSM265990, M. tuberculosis GSM280332, GSM280338, M. tuberculosis GSM280334, GSM280340, M. 
tuberculosis GSM280336, GSM280342, aeruginosa infected lung epithelial cell comparison profile 2 aeruginosa infected lung epithelial cell comparison profile 3 aeruginosa infected lung epithelial cell comparison profile 4 of brainswith from lethalgenes mice over Listeria a infected 4-day monocyto- 1 period profile gival Epithelialtal Cell Remodeling and Cytoskele- Cytokine Pro- ductions. profile 1 Susceptibility Genes with Human Macrophage Geneprofile s profile 1 Expression Susceptibility Genes with Human Macrophage Geneprofile s profile 2 Expression Susceptibility Genes with Human Macrophage Geneprofile s profile 3 Expression Table E.1 (continued) GEO IDGSE923 Platform GPL96 Title Mucoid and motile Pseudomonas Infection Species Control Samples Infection Samples GSE923 GPL96 Mucoid and motile Pseudomonas GSE923 GPL96 Mucoid and motile Pseudomonas GSE10416 GPL151 Gene expression based profiling GSE10526 GPL96 Role of P. gingivalis SerB in Gin- GSE11199 GPL570 Identification of Tuberculosis GSE11199 GPL570 Identification of Tuberculosis GSE11199 GPL570 Identification of Tuberculosis 171 GSM289412,GSM289414, GSM289415 GSM289413, GSM305592,GSM305594, GSM305595 GSM305593, GSM305596,GSM305598, GSM305599 GSM305597, GSM307651,GSM307655, GSM307653, GSM307660, GSM307657, GSM307668, GSM307663, GSM307676 GSM307671, GSM307658,GSM307664, GSM307661, GSM307672, GSM307677 GSM307669, GSM325763, GSM325764 GSM289410, GSM289411 GSM305590, GSM305591 GSM305590, GSM305591 GSM307654,GSM307659, GSM307656, GSM307667, GSM307662, GSM307675 GSM307670, GSM307654,GSM307659, GSM307656, GSM307667, GSM307662, GSM307675 GSM307670, S. Pyogenes GSM289408, GSM289409, S. gordonii GSM305588, GSM305589, P. gingivalis GSM305588, GSM305589, B. cepacia GSM307650, GSM307652, P. aeroginosa GSM307650, GSM307652, E. coliE. coli GSM325744, GSM325747 GSM325722, GSM325729, GSM325744, GSM325747 GSM325757, GSM325758 nasal-associated lymphoid tissue (NALT) togenes Streptococcus infection. 
profile 1 pyo- sponses tobiome a Comprising Complexand Opportunistic Commensal Micro- Oral Microbial Species profile 1 sponses tobiome a Comprising Complexand Opportunistic Commensal Micro- Oral Microbial Species profile 2 Patterns inMacrophages in Human Responseaeruginosa and to B. Alveolar P. cepacia1 profile Patterns inMacrophages in Human Responseaeruginosa and to B. Alveolar P. cepacia2 profile ysis ofmurine BA- colonic orO157 epithelium BL- infection profile after associated 1 ysis ofmurine BA- colonic orO157 epithelium BL- infection profile after associated 2 Table E.1 (continued) GEO IDGSE11494 Platform GPL1261 Innate Title immune response of murine Infection Species Control Samples Infection Samples GSE12121 GPL570 Gingival Epithelial Cell Re- GSE12121 GPL570 Gingival Epithelial Cell Re- GSE12245 GPL80 Similarity of Gene Expression GSE12245 GPL80 Similarity of Gene Expression GSE12998 GPL1261 Comparative transcriptomic anal- GSE12998 GPL1261 Comparative transcriptomic anal- 172 GSM326090,GSM326097, GSM326091, GSM326102, GSM326101, GSM326106, GSM326105, GSM326114, GSM326110, GSM326120, GSM326115, GSM326122, GSM326121, GSM326164, GSM326163, GSM326172, GSM326166, GSM326177, GSM326173, GSM326180, GSM326179, GSM326191, GSM326184, GSM365204, GSM326192, GSM365206, GSM365205, GSM365208, GSM365207, GSM365210, GSM365209, GSM365230, GSM365211, GSM365232, GSM365231, GSM365234, GSM365233, GSM365239, GSM365240 GSM365238, GSM349131,GSM349133, GSM349132, GSM349135, GSM349134, GSM349137, GSM349136, GSM349139, GSM349138, GSM349141, GSM349142 GSM349140, GSM359506,GSM359508 GSM359507, GSM326096,GSM326100, GSM326099, GSM326112, GSM326109, GSM326119, GSM326113, GSM326167, GSM326165, GSM326174, GSM326169, GSM326183, GSM326175, GSM326188, GSM326187, GSM326190 GSM326189, GSM359502 B. pseudomallei GSM326094, GSM326095, H. pylori GSM349127, GSM349128E. coli GSM349129, GSM349130, GSM359500, GSM359501, E. 
coli GSM361871, GSM361873 GSM361872, GSM361874 Identifies a Blood Biomarker Sig- nature for theticemic Diagnosis Melioidosis profile of 1 Sep- tric epithelium profile 1 bacterial signal indole profile 1 model of Ecoli mastitis profile 1 Table E.1 (continued) GEO IDGSE13015 Platform GPL6106 Genomic Title Transcriptional Profiling Infection Species Control Samples Infection Samples GSE13873 GPL1261 Expression data from murine gas- GSE14379 GPL8060 Human epithelial cell response to GSE14485 GPL339 Expression data from mouse 173 GSM400920,GSM400922 GSM400921, GSM400928,GSM400930, GSM400929, GSM400932 GSM400931, GSM403282,GSM403288, GSM403286, GSM403292, GSM403290, GSM403296, GSM403294, GSM403300, GSM403298, GSM403306, GSM403302, GSM403310, GSM403308, GSM403316, GSM403314, GSM403320, GSM403318, GSM403393, GSM403324, GSM403397, GSM403395, GSM403401, GSM403399, GSM403405, GSM403403, GSM403409, GSM403407, GSM403415, GSM403413, GSM403419, GSM403417, GSM403429, GSM403421, GSM403437, GSM403435, GSM403441, GSM403439, GSM403553 GSM403548, GSM400915,GSM400917, GSM400916, GSM400919 GSM400918, GSM400925,GSM400927 GSM400926, GSM403312,GSM403326, GSM403322, GSM403330, GSM403328, GSM403423, GSM403411, GSM403426, GSM403424, GSM403431, GSM403427, GSM403433, GSM403432, GSM403551 GSM403546, B. anthracis GSM377031, GSM377036M. tuberculosis GSM377030, GSM377041 GSM400913, GSM400914, M. tuberculosis GSM400923, GSM400924, S. 
aureus GSM403284, GSM403304, file of HumanMononuclear Peripheral Cells Blood Bacillus Exposed anthracis in to vitro1 profile dominant monocytopenia with in- creased susceptibility to mycobac- terial infection profile 1 dominant monocytopenia with in- creased susceptibility to mycobac- terial infection profile 2 and Decreased CentralT Cells Memory in ChildrenStaphylococcus with aureus Invasive Infections profile 1 Table E.1 (continued) GEO IDGSE15068 Platform GPL3566 Whole Title Genome Expression pro- GSE16020 GPL570 Patients affected with autosomal Infection Species Control Samples Infection Samples GSE16020 GPL570 Patients affected with autosomal GSE16129 GPL96 Enhanced Monocyte Response 174 GSM403283,GSM403289, GSM403287, GSM403293, GSM403291, GSM403297, GSM403295, GSM403301, GSM403299, GSM403307, GSM403303, GSM403311, GSM403309, GSM403317, GSM403315, GSM403321, GSM403319, GSM403394, GSM403325, GSM403398, GSM403396, GSM403402, GSM403400, GSM403406, GSM403404, GSM403410, GSM403408, GSM403416, GSM403414, GSM403420, GSM403418, GSM403425, GSM403422, GSM403430, GSM403428, GSM403436, GSM403434, GSM403440, GSM403438, GSM403549, GSM403554 GSM403442, GSM403550,GSM403561, GSM403560, GSM403572, GSM403571, GSM403574, GSM403573, GSM403576 GSM403575, GSM406980,GSM406992 GSM406986, GSM403313,GSM403327, GSM403323, GSM403331, GSM403329, GSM403547, GSM403552 GSM403412, GSM403563,GSM403566, GSM403564, GSM403568, GSM403567, GSM403570 GSM403569, GSM406989 S. aureus GSM403285, GSM403305, S. aureus GSM403555, GSM403562, F. 
tularensis GSM406977, GSM406983, and Decreased CentralT Cells Memory in ChildrenStaphylococcus with aureus Invasive Infections profile 1 and Decreased CentralT Cells Memory in ChildrenStaphylococcus with aureus Invasive Infections profile 1 infected with Ft LVSwith (without LPS pretreatment) or profile 1 Table E.1 (continued) GEO IDGSE16129 Platform GPL97 Title Enhanced Monocyte Response Infection Species Control Samples Infection Samples GSE16129 GPL6106 Enhanced Monocyte Response GSE16207 GPL1261 Expression data from mouse liver 175 GSM411458,GSM411460, GSM411459, GSM411462 GSM411461, GSM411470,GSM411472 GSM411471, GSM463603,GSM463605, GSM463604, GSM463607 GSM463606, GSM463613,GSM463615, GSM463614, GSM463617 GSM463616, GSM435897,GSM435899, GSM435900 GSM435898, GSM456428,GSM456434 GSM456431, GSM456429,GSM456435 GSM456432, GSM456426,GSM456433 GSM456430, GSM411430,GSM411432, GSM411431, GSM411434, GSM411433, GSM411436, GSM411435, GSM411438, GSM411439 GSM411437, GSM411442,GSM411444, GSM411445 GSM411443, GSM463610,GSM463612 GSM463611, GSM463620, GSM463621 GSM435907, GSM435908 GSM456370 GSM456371 GSM456396 H. pylori GSM411428, GSM411429, H. pylori GSM411440, GSM411441, P. gingivalis GSM463608, GSM463609, P. gingivalis GSM463618, GSM463619, M. tuberculosis GSM435905, GSM435906, Y. pestis GSM456324, GSM456337, Y. pestis GSM456329, GSM456369, Y. pestis GSM456394, GSM456395, genitors to H. pylori isolatesSwedish from patients with chronicrophic at- gastritis 1 profile 1 genitors to H. 
pylori isolatesSwedish from patients with chronicrophic at- gastritis 1 profile 2 givalis infected Mouse profile 1 givalis infected Mouse profile 2 stimulated with IFN-gfected and/or with in- TB profile 1 model after infectiontype with Yersinia pestis wild- CO92Braun and its mutant profile 1 model after infectiontype with Yersinia pestis wild- CO92Braun and lipoprotein its mutant profile 2 model after infectiontype with Yersinia pestis wild- CO92Braun and lipoprotein its mutant profile 3 Table E.1 (continued) GEO IDGSE16390 Platform GPL1261 Response Title of gastric epithelial pro- Infection Species Control Samples Infection Samples GSE16390 GPL1261 Response of gastric epithelial pro- GSE17110 GPL339 Gene expression data from P. gin- GSE17110 GPL339 Gene expression data from P. gin- GSE17477 GPL571 Expresson from human THP-1 GSE18293 GPL1261 A murine pneumonic plague GSE18293 GPL1261 A murine pneumonic plague GSE18293 GPL1261 A murine pneumonic plague 176

GEO ID: GSE19439
Platform: GPL6947
Title: Blood Transcriptional profiles in Active and Latent Tuberculosis UK (Training Set) profile 1
Infection Species: M. tuberculosis
Control Samples: GSM484450, GSM484453, GSM484454, GSM484457, GSM484462, GSM484465, GSM484477, GSM484479, GSM484480, GSM484481, GSM484483, GSM484487
Infection Samples: GSM484449, GSM484451, GSM484456, GSM484459, GSM484463, GSM484466, GSM484468, GSM484470, GSM484472, GSM484474, GSM484475, GSM484478, GSM484485

GEO ID: GSE19444
Platform: GPL6947
Title: Blood Transcriptional profiles of Active and Latent TB (UK Test Set) profile 1
Infection Species: M. tuberculosis
Control Samples: GSM484603, GSM484604, GSM484612, GSM484613, GSM484622, GSM484623, GSM484632, GSM484633, GSM484634, GSM484644, GSM484647, GSM484648
Infection Samples: GSM484595, GSM484596, GSM484597, GSM484598, GSM484605, GSM484606, GSM484610, GSM484611, GSM484618, GSM484619, GSM484620, GSM484621, GSM484628, GSM484629, GSM484630, GSM484631, GSM484638, GSM484639, GSM484641, GSM484642, GSM484645

GEO ID: GSE19855
Platform: GPL339
Title: Molecular Characterization of Treponema denticola infection-induced bone and soft tissue transcriptional profiles profile 1
Infection Species: T. denticola
Control Samples: GSM463608, GSM463609, GSM463610, GSM463611, GSM463612
Infection Samples: GSM495867, GSM495868, GSM495869, GSM495870, GSM495871

GEO ID: GSE19855
Platform: GPL339
Title: Molecular Characterization of Treponema denticola infection-induced bone and soft tissue transcriptional profiles profile 2
Infection Species: T. denticola
Control Samples: GSM463618, GSM463619, GSM463620, GSM463621
Infection Samples: GSM495872, GSM495873, GSM495874, GSM495875, GSM495876

GEO ID: GSE20262
Platform: GPL96
Title: Interactome analysis of longitudinal pharyngeal infection of cynomolgus macaques by group A Streptococcus profile 1
Infection Species: S. pyogenes
Control Samples: GSM507588, GSM507612, GSM507624, GSM507648, GSM507660, GSM507684, GSM507696, GSM507720, GSM507732, GSM507756, GSM507768, GSM507792, GSM507804, GSM507828, GSM507840, GSM507864, GSM507876, GSM507900, GSM507912, GSM507936
Infection Samples: GSM507579, GSM507603, GSM507615, GSM507639, GSM507651, GSM507675, GSM507687, GSM507711, GSM507723, GSM507747, GSM507759, GSM507783, GSM507795, GSM507819, GSM507831, GSM507855, GSM507867, GSM507891, GSM507903, GSM507927

GEO ID: GSE21065
Platform: GPL4134
Title: Sex-related differences in gene expression following Coxiella burnetii infection: potential role of circadian rhythm profile 1
Infection Species: C. burnetii
Control Samples: GSM526717, GSM526718, GSM526725, GSM526735, GSM526736, GSM526742, GSM526743, GSM526750
Infection Samples: GSM526721, GSM526722, GSM526729, GSM526731, GSM526739, GSM526746, GSM526747, GSM526753

GEO ID: GSE21149
Platform: GPL7202
Title: Mycobacterium bovis-BCG vaccination induces a specific pulmonary transcriptome biosignature in mice profile 1
Infection Species: M. bovis
Control Samples: GSM529495, GSM529496, GSM529497, GSM529498
Infection Samples: GSM529499, GSM529500, GSM529501, GSM529502

GEO ID: GSE21149
Platform: GPL7202
Title: Mycobacterium bovis-BCG vaccination induces a specific pulmonary transcriptome biosignature in mice profile 2
Infection Species: M. bovis
Control Samples: GSM529522, GSM529523, GSM529524, GSM529525, GSM529526
Infection Samples: GSM529527, GSM529528, GSM529529, GSM529530, GSM529531

GEO ID: GSE21833
Platform: GPL1261
Title: Helicobacter pylori colonization ameliorates homeostasis in mice profile 1
Infection Species: H. pylori
Control Samples: GSM543406, GSM543407, GSM543408
Infection Samples: GSM543409, GSM543410, GSM543411

GSM549435,GSM549437, GSM549436, GSM549439, GSM549438, GSM549441, GSM549440, GSM549443, GSM549442, GSM549445, GSM549446 GSM549444, GSM549421,GSM549423, GSM549422, GSM549425, GSM549424, GSM549427, GSM549426, GSM549429, GSM549428, GSM549431, GSM549430, GSM549433, GSM549432, GSM549447:GSM549472 GSM549434, GSM552585,GSM552587, GSM552586, GSM552589, GSM552590 GSM552588, GSM552609,GSM552611, GSM552610, GSM552613, GSM552614 GSM552612, GSM550322,GSM550324, GSM550323, GSM550326, GSM550325, GSM550328, GSM550327, GSM550330, GSM550329, GSM550332, GSM550331, GSM550334, GSM550333, GSM550336, GSM550335, GSM550338, GSM550337, GSM550340, GSM550339, GSM550342:GSM550400 GSM550341, GSM550322,GSM550324, GSM550323, GSM550326, GSM550325, GSM550328, GSM550327, GSM550330, GSM550329, GSM550332, GSM550331, GSM550334, GSM550333, GSM550336, GSM550335, GSM550338, GSM550337, GSM550340, GSM550339, GSM550342:GSM550400 GSM550341, GSM552575,GSM552577, GSM552578 GSM552576, GSM552575,GSM552577, GSM552578 GSM552576, S. Pyogenes GSM550320, GSM550321, S. aureus GSM550320, GSM550321, F. tularensis GSM552573, GSM552574, F. 
tularensis GSM552573, GSM552574, file s of patients withculosis active (TB) tuber- and othertory inflamma- and infectious diseases profile 1 file s of patients withculosis active (TB) tuber- and othertory inflamma- and infectious diseases profile 2 to virulent F. tularensisthe mouse Schu4 model in of infectionfile pro- 1 to virulent F. tularensisthe mouse Schu4 model in of infectionfile pro- 2 Table E.1 (continued) GEO IDGSE22098 Platform GPL6947 Whole Title blood transcriptional pro- Infection Species Control Samples Infection Samples GSE22098 GPL6947 Whole blood transcriptional pro- GSE22203 GPL10504 Immunological responses unique GSE22203 GPL10504 Immunological responses unique 179 GSM552639,GSM552641, GSM552640, GSM552643, GSM552644 GSM552642, GSM552663,GSM552665, GSM552664, GSM552667, GSM552668 GSM552666, GSM566329,GSM566331 GSM566330, GSM575860,GSM575862, GSM575863 GSM575861, GSM615985,GSM616001 GSM615993, GSM616002 GSM552629,GSM552631, GSM552632 GSM552630, GSM552629,GSM552631, GSM552632 GSM552630, GSM566318 GSM575859 GSM616005 F. tularensis GSM552627, GSM552628, F. tularensis GSM552627, GSM552628, M. tuberculosis GSM566316, GSM566317, M. tuberculosis GSM575857, GSM575858, B. pseudomallei GSM615989, GSM615997, B. pseudomallei GSM615990, GSM615998 GSM615986, GSM615994, to virulent F. tularensisthe mouse Schu4 model in of infectionfile pro- 3 to virulent F. 
tularensisthe mouse Schu4 model in of infectionfile pro- 4 ing mycobacteriamacrophages infection isautocrine-paracrine in dependentprofile 1 on signaling 1 influencesand bacterial pathology during the clearance infection with Mycobacterium tuberculosis profile 1 filing of adosis murine model acute reveals newinto melioi- how insights Burkholderia pseudoma- llei overcome hostnity innate profile 1 immu- filing of adosis murine model acute reveals newinto melioi- how insights Burkholderia pseudoma- llei overcome hostnity innate profile 2 immu- Table E.1 (continued) GEO IDGSE22203 Platform GPL10504 Immunological responses Title unique Infection Species Control Samples Infection Samples GSE22203 GPL10504 Immunological responses unique GSE22935 GPL1261 Arginine utilization follow- GSE23508 GPL7202 Suppressor of cytokine signaling- GSE25074 GPL6103 Genome wide transcriptome pro- GSE25074 GPL6103 Genome wide transcriptome pro- 180 GSM676126,GSM676128 GSM676127, GSM676129,GSM676131 GSM676130, GSM677734,GSM677736 GSM677735, GSM677737,GSM677739 GSM677738, GSM692448,GSM692450, GSM692451 GSM692449, GSM692456,GSM692458, GSM692459 GSM692457, GSM676134 GSM676134 GSM677746 GSM677747 GSM692454, GSM692455 GSM692454, GSM692455 H. pylori GSM676132, GSM676133, H. pylori GSM676132, GSM676133, H. pylori GSM677730, GSM677731, H. pylori GSM677732, GSM677733, M. tuberculosis GSM692452, GSM692453, M. tuberculosis GSM692452, GSM692453, E. coli GSM700775, GSM700776 GSM700777, GSM700778 tric epithelial cells induced by He- licobacter pylori infection1 profile tric epithelial cells induced by He- licobacter pylori infection2 profile the molecular detailsgastritis of in atrophic Helicobacter pylori in- fected corpus mucosa profile 1 the molecular detailsgastritis of in atrophic Helicobacter pylori in- fected corpus mucosa profile 2 berculosis patients,tious latent individuals infec- and healthy con- trols. profile 1 berculosis patients,tious latent individuals infec- and healthy con- trols. 
profile 2 dritic cells profile 1 Table E.1 (continued) GEO IDGSE27347 Platform GPL11094 Gene expression Title changes in gas- Infection Species Control Samples Infection Samples GSE27347 GPL11094 Gene expression changes in gas- GSE27411 GPL6255 Transcriptional analysis reveals GSE27411 GPL6255 Transcriptional analysis reveals GSE27984 GPL6848 PPD stimulated PBMCs from tu- GSE27984 GPL6848 PPD stimulated PBMCs from tu- GSE28340 GPL1261 Expression data from mouse den- 181 GSM709414,GSM709418, GSM709415, GSM709425, GSM709419, GSM709433, GSM709429, GSM709440, GSM709435, GSM709445, GSM709441, GSM709456, GSM709446, GSM709465, GSM709457, GSM709484, GSM709468, GSM709492, GSM709486, GSM709495, GSM709494, GSM709517, GSM709497, GSM709519 GSM709518, GSM709420,GSM709428, GSM709423, GSM709432, GSM709430, GSM709438, GSM709437, GSM709443, GSM709442, GSM709448, GSM709444, GSM709453, GSM709451, GSM709462, GSM709460, GSM709476, GSM709463, GSM709478, GSM709477, GSM709481, GSM709480, GSM709493, GSM709491, GSM709498, GSM709496, GSM709500, GSM709499, GSM709511, GSM709509, GSM709513, GSM709512, GSM709515, GSM709514, GSM709520 GSM709516, M. tuberculosis GSM709413, GSM709416, tion analysis of wholeexpression blood gene profile stional reveal func- networkspathogenesis. 
underlying TB profile [Agilent-014850] 1 Table E.1 (continued) GEO IDGSE28623 Platform GPL4133 Pathway Title and functional associa- Infection Species Control Samples Infection Samples 182 GSM709417,GSM709422, GSM709421, GSM709426, GSM709424, GSM709431, GSM709427, GSM709436, GSM709434, GSM709447, GSM709439, GSM709450, GSM709449, GSM709454, GSM709452, GSM709458, GSM709455, GSM709461, GSM709459, GSM709466, GSM709464, GSM709469, GSM709467, GSM709471, GSM709470, GSM709473, GSM709472, GSM709475, GSM709474, GSM709482, GSM709479, GSM709485, GSM709483, GSM709488, GSM709487, GSM709490, GSM709489, GSM709502, GSM709501, GSM709504, GSM709503, GSM709506, GSM709505, GSM709508, GSM709510 GSM709507, GSM710390,GSM710413 GSM710400, GSM709420,GSM709428, GSM709423, GSM709432, GSM709430, GSM709438, GSM709437, GSM709443, GSM709442, GSM709448, GSM709444, GSM709453, GSM709451, GSM709462, GSM709460, GSM709476, GSM709463, GSM709478, GSM709477, GSM709481, GSM709480, GSM709493, GSM709491, GSM709498, GSM709496, GSM709500, GSM709499, GSM709511, GSM709509, GSM709513, GSM709512, GSM709515, GSM709514, GSM709520 GSM709516, GSM710408 M. tuberculosis GSM709413, GSM709416, B. pseudomallei GSM710388, GSM710392, tion analysis of wholeexpression blood gene profile stional reveal func- networkspathogenesis. underlying TB profile [Agilent-014850] 2 innate immunestreptozotocin-induced pathwayshosts diabetic leads to more in severeduring disease infection withria Burkholde- pseudomallei profile 1 Table E.1 (continued) GEO IDGSE28623 Platform GPL4133 Pathway Title and functional associa- Infection Species Control Samples Infection Samples GSE28683 GPL6103 Delayed activation of host 183 GSM710391,GSM710402 GSM710401, GSM752014,GSM752016 GSM752015, GSM770967,GSM770969 GSM770968, GSM787748,GSM787750 GSM787749, GSM787751,GSM787753 GSM787752, GSM792382,GSM792384 GSM792383, GSM710409 GSM752019 GSM770957 GSM787747 GSM787747 GSM792381 B. pseudomallei GSM710389, GSM710393, C. burnetii GSM752017, GSM752018, C. 
trachomatis GSM770955, GSM770956, M. tuberculosis GSM787745, GSM787746, M. tuberculosis GSM787745, GSM787746, H. pylori GSM792379, GSM792380, innate immunestreptozotocin-induced pathwayshosts diabetic leads to more in severeduring disease infection withria Burkholde- pseudomallei profile 2 phoblastic cell line,placenta infection a profile 1 model of innate immuneChlamydia pathways trachomatis-infected endocervical cells profile 1 in reponse is strain-specific profile 1 reponse is strain-specific profile 2 gastric mucosa ofto mice Pteridium aquilinum exposed and/orfected in- withprofile Helicobacter 1 pylori Table E.1 (continued) GEO IDGSE28683 Platform GPL6103 Delayed Title activation of host Infection Species Control Samples Infection Samples GSE30330 GPL6480 Coxiella burnetii infects JEG tro- GSE31149 GPL6104 activates multiple GSE31734 GPL6246 Mtb-mediated host transcriptional GSE31734 GPL6246 Mtb-mediated host transcriptional GSE31991 GPL11098 To identify alterations induced in 184 GSM824477,GSM824481, GSM824478, GSM824606, GSM824596, GSM824626, GSM824636 GSM824616, GSM824556,GSM824558, GSM824557, GSM824560, GSM824559, GSM824621, GSM824601, GSM824641 GSM824631, GSM824648,GSM824656, GSM824649, GSM824664, GSM824657, GSM824672, GSM824665, GSM824680, GSM824681 GSM824673, GSM824463,GSM824465, GSM824464, GSM824467, GSM824466, GSM824469, GSM824468, GSM824471, GSM824470, GSM824473, GSM824472, GSM824475, GSM824474, GSM824592, GSM824476, GSM824611, GSM824602, GSM824622, GSM824632 GSM824612, GSM824545,GSM824547, GSM824546, GSM824549, GSM824548, GSM824551, GSM824550, GSM824553, GSM824552, GSM824555, GSM824554, GSM824607, GSM824597, GSM824627, GSM824637 GSM824617, GSM824650,GSM824658, GSM824651, GSM824666, GSM824659, GSM824674, GSM824675 GSM824667, E. coli GSM824461, GSM824462, E. coli GSM824543, GSM824544, E. 
coli GSM824642, GSM824643, fiers Identify Staphylococcusreus au- Infection inmans profile Mice 1 and Hu- fiers Identify Staphylococcusreus au- Infection inmans profile Mice 2 and Hu- fiers Identify Staphylococcusreus au- Infection inmans profile Mice 3 and Hu- Table E.1 (continued) GEO IDGSE33341 Platform GPL1261 Gene Title Expression-Based Classi- Infection Species Control Samples Infection Samples GSE33341 GPL1261 Gene Expression-Based Classi- GSE33341 GPL1261 Gene Expression-Based Classi- 185

GEO ID: GSE33341
Platform: GPL571
Title: Gene Expression-Based Classifiers Identify Staphylococcus aureus Infection in Mice and Humans profile 1
Infection Species: S. aureus
Control Samples: GSM824747, GSM824748, GSM824749, GSM824750, GSM824751, GSM824752, GSM824753, GSM824754, GSM824755, GSM824756, GSM824757, GSM824758, GSM824759, GSM824760, GSM824761, GSM824762, GSM824763, GSM824764, GSM824765, GSM824766, GSM824767, GSM824768, GSM824769, GSM824770, GSM824771, GSM824772, GSM824773, GSM824774, GSM824775, GSM824776, GSM824777, GSM824778, GSM824779, GSM824780, GSM824781, GSM824782, GSM824783, GSM824784, GSM824785, GSM824786, GSM824787, GSM824788, GSM824790
Infection Samples: GSM824707, GSM824708, GSM824709, GSM824710, GSM824711, GSM824712, GSM824713, GSM824714, GSM824715, GSM824716, GSM824717, GSM824718, GSM824719, GSM824720, GSM932380, GSM932381, GSM932382, GSM932383, GSM932384

GEO ID: GSE33341
Platform: GPL571
Title: Gene Expression-Based Classifiers Identify Staphylococcus aureus Infection in Mice and Humans profile 2
Infection Species: E. coli
Control Samples: GSM824747, GSM824748, GSM824749, GSM824750, GSM824751, GSM824752, GSM824753, GSM824754, GSM824755, GSM824756, GSM824757, GSM824758, GSM824759, GSM824760, GSM824761, GSM824762, GSM824763, GSM824764, GSM824765, GSM824766, GSM824767, GSM824768, GSM824769, GSM824770, GSM824771, GSM824772, GSM824773, GSM824774, GSM824775, GSM824776, GSM824777, GSM824778, GSM824779, GSM824780, GSM824781, GSM824782, GSM824783, GSM824784, GSM824785, GSM824786, GSM824787, GSM824788, GSM824790
Infection Samples: GSM824721, GSM824722, GSM824723, GSM824724, GSM824725, GSM824726, GSM824727, GSM824728, GSM824729, GSM824730, GSM824731, GSM824732, GSM824733, GSM824734, GSM824735, GSM824736, GSM824737, GSM824738, GSM824739, GSM824740, GSM824741, GSM824742, GSM824743, GSM824744, GSM824745, GSM824746, GSM932374, GSM932375, GSM932376, GSM932377, GSM932378, GSM932379

GEO ID: GSE33901
Platform: GPL7202
Title: The Role of the E2F1 transcription factor in the innate immune response to systemic LPS [mRNA] profile 1
Infection Species: E. coli
Control Samples: GSM838585, GSM838586, GSM838587, GSM838588, GSM838589, GSM838590
Infection Samples: GSM838605, GSM838606, GSM838607, GSM838608, GSM838609, GSM838610

GEO ID: GSE33901
Platform: GPL7202
Title: The Role of the E2F1 transcription factor in the innate immune response to systemic LPS [mRNA] profile 2
Infection Species: E. coli
Control Samples: GSM838591, GSM838592, GSM838593, GSM838594, GSM838595, GSM838596
Infection Samples: GSM838611, GSM838612, GSM838613, GSM838614, GSM838615, GSM838616

GSM840142 GSM841857,GSM841859 GSM841858, GSM842082,GSM842084 GSM842083, GSM842925,GSM842929, GSM842927, GSM842933, GSM842931, GSM842937, GSM842935, GSM842941, GSM842939, GSM842945:2:GSM842971, GSM842943, GSM842972,GSM842976, GSM842974, GSM842980, GSM842978, GSM842984, GSM842982, GSM842988, GSM842986, GSM842992, GSM842990, GSM842996, GSM842994, GSM843000:2:GSM843182 GSM842998, GSM841862,GSM841864, GSM841863, GSM841866 GSM841865, GSM842077 GSM842928,GSM842932, GSM842930, GSM842936, GSM842934, GSM842940, GSM842938, GSM842944:2:GSM842970, GSM842942, GSM842973,GSM842977, GSM842975, GSM842981, GSM842979, GSM842985, GSM842983, GSM842989, GSM842987, GSM842993, GSM842991, GSM842997, GSM842995, GSM843001:2:GSM843181 GSM842999, B. pertussis GSM840158, GSM840159 GSM840140, GSM840141, L. monocytogenes GSM841860, GSM841861, E. coli GSM842075, GSM842076, M. tuberculosis GSM842924, GSM842926, lung airway gene expressioning dur- B. pertussis infectionprofile in 1 mice macrophageafter gene L. monocytogenesprofile infection 1 expression toneal cells to a non-pathogenic E. 
coli infection profile 1 ture of variationresponse to in Mycobacterium tuber- theculosis immune infection (expression) pro- file 1 Table E.1 (continued) GEO IDGSE33995 Platform GPL6885 The Title effect of on Infection Species Control Samples Infection Samples GSE34103 GPL6884 Genome wide analysis of GSE34114 GPL1261 Temporal response of mouse peri- GSE34151 GPL10558 Deciphering the genetic architec- 188 GSM851868,GSM851870, GSM851869, GSM851872, GSM851871, GSM851874, GSM851875 GSM851873, GSM933064,GSM933066 GSM933065, GSM944835,GSM944849, GSM944842, GSM944863 GSM944856, GSM979837, GSM979838 GSM851878,GSM851880, GSM851879, GSM851882, GSM851881, GSM851884, GSM851883, GSM851886, GSM851885, GSM851888, GSM851887, GSM851890, GSM851889, GSM851892, GSM851893 GSM851891, GSM933060 GSM944845,GSM944859 GSM944852, M. tuberculosis GSM851876, GSM851877, M. lepreaE. coli GSM867906, GSM867907E. coli GSM867904, GSM867905 M. GSM869060, tuberculosis GSM869062 GSM869061, GSM869063 S. GSM869060, aureus GSM869062 GSM898797, GSM898798 GSM869061, GSM869063 GSM898799, GSM898800 GSM933058, GSM933059, S. aureus GSM944831, GSM944838, M. bovis GSM979843, GSM979844 GSM979835, GSM979836, in pulmonary tuberculosis and sar- coidosis profile 1 rae on primarycell gene human expression profile Schwann 1 mouse mammary tissuecoli upon infection profile E. 1 mouse mammary tissuecoli upon infection profile E. 
1 macrophages to M.infection profile 1 tuberculosis Staphylococcus aureusHemolysin profile 1 alpha- aureus-infected micezolid with and line- vancomycinprofile 1 treatment macrophages after infectiontwo with Argentinean Mycobacterium bovis isolates profile 1 Table E.1 (continued) GEO IDGSE34608 Platform GPL6480 Gene Title and microRNA expression Infection Species Control Samples Infection Samples GSE35423 GPL2986 Effect of live Mycobacterium lep- GSE35472 GPL13381 Alteration of gene expression in GSE35472 GPL13381 Alteration of gene expression in GSE36686 GPL10813 The IRF3 dependent response of GSE38053 GPL1261 Host Response Signature to GSE38531 GPL1261 Expression data in Staphylococcus GSE39819 GPL2112 Transcriptional response of bovine 189 GSM979841, GSM979842 GSM992788,GSM992790, GSM992789, GSM992792, GSM992791, GSM992794, GSM992795 GSM992793, GSM1002233, GSM1002234, GSM1002235, GSM1002236, GSM1002237, GSM1002238 GSM1002251, GSM1002252, GSM1002253, GSM1002254, GSM1002255, GSM1002256 GSM992798,GSM992800, GSM992799, GSM992802, GSM992801, GSM992804, GSM992803, GSM992806, GSM992805, GSM992808, GSM992807, GSM992810, GSM992809, GSM992812, GSM992811, GSM992814, GSM992813, GSM992816, GSM992817 GSM992815, GSM1002199, GSM1002200, GSM1002201, GSM1002202 GSM1002217, GSM1002218, GSM1002219, GSM1002220 M. bovis GSM979843, GSM979844 GSM979839, GSM979840, S. aureus GSM992796, GSM992797, F. tularensis GSM1002197, GSM1002198, F. tularensis GSM1002215, GSM1002216, macrophages after infectiontwo with Argentinean Mycobacterium bovis isolates profile 2 nature distinguishes viral infection from bacterial infection inyoung febrile children. profile 1 grown A549s compared tolayer mono- grown A549s infectedF. with tularensis SchuS4hour time-course over (2, apost 6 infection) 22 profile and 1 22 h grown A549s compared tolayer mono- grown A549s infectedF. 
with tularensis SchuS4hour time-course over (2, apost 6 infection) 22 profile and 2 22 h Table E.1 (continued) GEO IDGSE39819 Platform GPL2112 Transcriptional Title response of bovine Infection Species Control Samples Infection Samples GSE40396 GPL10558 Whole blood transcriptional sig- GSE40808 GPL4133 Rotating wall vessel (RWV) GSE40808 GPL4133 Rotating wall vessel (RWV) 190 GSM1008257, GSM1008258, GSM1008259, GSM1008260, GSM1008261, GSM1008262, GSM1008263, GSM1008264, GSM1008265, GSM1008266, GSM1008267, GSM1008268 GSM1015990, GSM1015991, GSM1015992 GSM1015993, GSM1015994, GSM1015995 GSM1083062, GSM1083063, GSM1083064, GSM1083065 GSM1083066, GSM1083067, GSM1083068 GSM1008249, GSM1008250, GSM1008251, GSM1008252, GSM1008253, GSM1008254, GSM1008255, GSM1008256 GSM1015998 GSM1015996, GSM1015997, GSM1015998 GSM1083078 GSM1083078 C. trachomatis GSM1008247, GSM1008248, P. gingivalis GSM1015996, GSM1015997, A.comitans actinomycetm- L. interrogans GSM1083076, GSM1083077, F. tularensis GSM1083076, GSM1083077, dometrial biopsiesresponse inflammatory to Chlamydia trachoma- tis genital tract infection profile 1 ibacteror actinomycetemcomitans Porphyromonas gingivalisprimary on murinederived bone dendritic marrow- 1 cells profile ibacteror actinomycetemcomitans Porphyromonas gingivalisprimary on murinederived bone dendritic marrow- 2 cells profile larensis Inducesmonary a Inflammatory Unique Response: Role Pul- of Bacterial Gene Expression in Temporal RegulationDefense of Responses profile 1 Host larensis Inducesmonary a Inflammatory Unique Response: Role Pul- of Bacterial Gene Expression in Temporal RegulationDefense of Responses profile 2 Host Table E.1 (continued) GEO IDGSE41075 Platform GPL571 Title Transcriptome profiling of en- Infection Species Control Samples Infection Samples GSE41383 GPL6887 Effect of infection with Aggregat- GSE41383 GPL6887 Effect of infection with Aggregat- GSE44320 GPL7202 Francisella tularensis subsp. 
tu- GSE44320 GPL7202 Francisella tularensis subsp. tu- 191 GSM1083069, GSM1083070, GSM1083071 GSM1083072, GSM1083073, GSM1083074, GSM1083075 GSM1083079, GSM1083080, GSM1083081 GSM1083082, GSM1083083, GSM1083084 GSM1083078 GSM1083078 GSM1083078 GSM1083078 F. tularensis GSM1083076, GSM1083077, F. tularensis GSM1083076, GSM1083077, F. tularensis GSM1083076, GSM1083077, F. tularensis GSM1083076, GSM1083077, F. tularensis GSM1154333, GSM1154340 GSM1154322, GSM1154337 larensis Inducesmonary a Inflammatory Unique Response: Role Pul- of Bacterial Gene Expression in Temporal RegulationDefense of Responses profile 3 Host larensis Inducesmonary a Inflammatory Unique Response: Role Pul- of Bacterial Gene Expression in Temporal RegulationDefense of Responses profile 4 Host larensis Inducesmonary a Inflammatory Unique Response: Role Pul- of Bacterial Gene Expression in Temporal RegulationDefense of Responses profile 5 Host larensis Inducesmonary a Inflammatory Unique Response: Role Pul- of Bacterial Gene Expression in Temporal RegulationDefense of Responses profile 6 Host motes hostintracellular resistance bacterialby against infection negativeI regulation interferon ofprofile production 1 type [Set 1] Table E.1 (continued) GEO IDGSE44320 Platform GPL7202 Francisella Title tularensis subsp. tu- Infection Species Control Samples Infection Samples GSE44320 GPL7202 Francisella tularensis subsp. tu- GSE44320 GPL7202 Francisella tularensis subsp. tu- GSE44320 GPL7202 Francisella tularensis subsp. tu- GSE47672 GPL6887 TPL-2;ERK1/2 signaling pro- 192 GSM1155270, GSM1155271, GSM1155272, GSM1155273, GSM1155274, GSM1155275 GSM1236484, GSM1236485, GSM1236486 GSM1236487, GSM1236488, GSM1236489 GSM1237722, GSM1237723, GSM1237724 GSM1237725, GSM1237726, GSM1237727 GSM1237728, GSM1237729, GSM1237730 GSM1155266, GSM1155267, GSM1155268, GSM1155269 GSM1236483 GSM1236483 GSM1237733 GSM1237733 GSM1237733 M. tuberculosis GSM1155264, GSM1155265, M. catarrhalis GSM1169177,P. 
GSM1169178 aeroginosa GSM1169179, GSM1169180 GSM1236481, GSM1236482, S. enterica GSM1236481, GSM1236482, S. enterica GSM1237731, GSM1237732, B. pertussis GSM1237731, GSM1237732, B. pertussis GSM1237731, GSM1237732, lar interplay betweencatarrhalis and Moraxella human respiratory tract epithelial cells:response Expression of humanepithelial Detroit phanryngeal 562 cellsherent to Moraxella catarrhalis ad- strain BBH18 profile 1 bacterial pigmented virulence fac- tors and orchestrates anti-bacterial defenses (part 1) profile 1 III secretion effector SteA oncells host profile 1 III secretion effector SteA oncells host profile 2 murine macrophages to the adeny- late cyclase toxinpertussis profile 1 of Bordetella murine macrophages to the adeny- late cyclase toxinpertussis profile 2 of Bordetella murine macrophages to the adeny- late cyclase toxinpertussis profile 3 of Bordetella Table E.1 (continued) GEO IDGSE47711 Platform GPL16025 Characterization of Title the molecu- Infection Species Control Samples Infection Samples GSE48130 GPL4134 Aryl receptor senses GSE51043 GPL6244 Global impact of Salmonella type GSE51043 GPL6244 Global impact of Salmonella type GSE51075 GPL1261 Transcriptional responses of GSE51075 GPL1261 Transcriptional responses of GSE51075 GPL1261 Transcriptional responses of 193 GSM117566,GSM117568 GSM117567, GSM1267507, GSM1267514, GSM1267521, GSM1267527 GSM1267533, GSM1267539, GSM1267546 GSM1267559, GSM1267566, GSM1267573 GSM1267586, GSM1267593, GSM1267599 GSM1267611, GSM1267614, GSM1267620, GSM1267627 GSM1267634, GSM1267641, GSM1267648 GSM117562 GSM1267522, GSM1267528 GSM1267547, GSM1267553 GSM1267574, GSM1267580 GSM1267600, GSM1267605 GSM1267628 GSM1267649, GSM1267655 B. pertussis GSM117560, GSM117561, B. melitensis GSM1267508, GSM1267515, S. aureus GSM1267534, GSM1267540, S. aureus GSM1267560, GSM1267567, S. aureus GSM1267587, GSM1267594, S. aureus GSM1267612, GSM1267621, S. 
aureus GSM1267635, GSM1267642, sis infected mouseprofile 1 macrophages Interferon-regulated Target Genes Characterizes Staphylococcal En- terotoxin B Lethality profile 1 Interferon-regulated Target Genes Characterizes Staphylococcal En- terotoxin B Lethality profile 2 Interferon-regulated Target Genes Characterizes Staphylococcal En- terotoxin B Lethality profile 3 Interferon-regulated Target Genes Characterizes Staphylococcal En- terotoxin B Lethality profile 4 Interferon-regulated Target Genes Characterizes Staphylococcal En- terotoxin B Lethality profile 5 Interferon-regulated Target Genes Characterizes Staphylococcal En- terotoxin B Lethality profile 6 Table E.1 (continued) GEO IDGSE5202 Platform GPL1261 Title Expression data from B. meliten- Infection Species Control Samples Infection Samples GSE52474 GPL1261 Late Multiple Organ Surge in GSE52474 GPL1261 Late Multiple Organ Surge in GSE52474 GPL1261 Late Multiple Organ Surge in GSE52474 GPL1261 Late Multiple Organ Surge in GSE52474 GPL1261 Late Multiple Organ Surge in GSE52474 GPL1261 Late Multiple Organ Surge in 194 GSM1282929, GSM1282935, GSM1282937, GSM1282938, GSM1282942, GSM1282948, GSM1282951, GSM1282955 GSM123423,GSM123425 GSM123424, GSM1327526, GSM1327527, GSM1327528, GSM1327529, GSM1327530, GSM1327531, GSM1327533, GSM1327535, GSM1327540 GSM1327539, GSM1327543, GSM1327545, GSM1327547, GSM1327548, GSM1327551 GSM1282940, GSM1282943, GSM1282945, GSM1282946, GSM1282950, GSM1282954, GSM1282959 GSM123429, GSM123431 GSM1327544, GSM1327546, GSM1327549, GSM1327550 GSM1327544, GSM1327546, GSM1327549, GSM1327550 S. aureus GSM1282927, GSM1282933, M. tuberculosis GSM123427, GSM123428, H. pyloriS. aureus GSM1310513, GSM1310514 GSM1310515, GSM1310516 M. smegmatis GSM1310513, GSM1310514 GSM1310517, GSM1310518 GSM1327541, GSM1327542, M. tuberculosis GSM1327541, GSM1327542, M. 
tuberculosis GSM1400449, GSM1400450 GSM1400453, GSM1400454 Architecture and Regulatorypact Im- of microRNA ExpressionResponse in to Infection (functional investigation of the29) role profile of 1 miR- tric mucosamice from infected with a H.immunized with pylori transgenic BabA profile and 1 croRNA(s) Regulatetory response Inflamma- in Mastitis Mice,duced in- by StaphylococcusInfection [Microarray] aureus profile 1 croRNA(s) Regulatetory response Inflamma- in Mastitis Mice,duced in- by StaphylococcusInfection [Microarray] aureus profile 2 blood profile 1 blood profile 2 Tfap4-KO CD8early T activation profile 1 cells during Table E.1 (continued) GEO IDGSE53143 Platform GPL10558 A Genomic Portrait Title of the Genetic Infection Species Control Samples Infection Samples GSE5398 GPL339 Gene expression profile s in gas- GSE54230 GPL6887 Histone H3 Acetylation and mi- GSE54230 GPL6887 Histone H3 Acetylation and mi- GSE54992 GPL570 Expression data from peripheral GSE54992 GPL570 Expression data from peripheral GSE58078 GPL6246 Microarray analysis of WT and 195 GSM142472,GSM142474, GSM142473, GSM142476, GSM142475, GSM142486, GSM142485, GSM142488, GSM142489 GSM142487, GSM142490,GSM142492, GSM142491, GSM142495, GSM142496 GSM142493, GSM183523,GSM183527 GSM183525, GSM203412,GSM203414 GSM203413, GSM142479,GSM142481, GSM142480, GSM142483, GSM142484 GSM142482, GSM142479,GSM142481, GSM142480, GSM142483, GSM142484 GSM142482, GSM183522 GSM203417 S. aureusC. trachomatis GSM1402498, GSM1402499C. trachomatis GSM1402504, GSM1402505 GSM1402500, GSM1402501 GSM1402506, GSM1402507 GSM141250, GSM141251B. burgdorferi GSM141252, GSM141253 GSM142477, GSM142478, H. pylori GSM142477, GSM142478, H. pyloriE. coli GSM177078, GSM177081 GSM177079, GSM177082 GSM183520, GSM183521, B. 
pseudomallei GSM203415, GSM203416, sponses to infection with Chlamy- dia trachomatis profile 1 sponses to infection with Chlamy- dia trachomatis profile 2 Borrelia burgdorferi-activated en- dothelium to favor chronic inflam- mation profile 1 expression profile s profile 1 expression profile s profile 2 Tolerant Macrophages stimulated with LPS profile 1 Burkholderia pseudomalleia Human Macrophage Cell (THP- in 1) Model ofUsing Whole-genome Acute Microarray Melioidosis profile 1 B. burgdorferi duringand invasion adherence of humancells neuroglial profile 1 Table E.1 (continued) GEO IDGSE58151 Platform GPL6244 Human Title enteroendcrine cellGSE58151 re- GPL6244 Human enteroendcrine cellGSE6092 re- GPL570 IFN-gamma alters the response of GSE6143 GPL193 Infection Species H. pylori genotypes and host gene Control Samples Infection Samples GSE6143 GPL193 H. pylori genotypes and host gene GSE7348 GPL1261 Gene Expression in Naive and GSE7577 GPL96 Toxicogenomic Effect of GSE8219 GPL2129 Global transcriptome analysis of 196 GSM76969,GSM76971 GSM76970, GSM37932,GSM37945 GSM37938, GSM123730,GSM123738, GSM123734, GSM123745, GSM123742, GSM123751, GSM123748, GSM123757, GSM123754, GSM123762, GSM123760, GSM123767, GSM123764, GSM123773, GSM123770, GSM123779, GSM123777, GSM123786, GSM123782, GSM123793, GSM123797 GSM123789, GSM123729,GSM123737, GSM123733, GSM123747, GSM123741, GSM123759, GSM123753, GSM123772, GSM123766, GSM123781, GSM123775, GSM123788, GSM123785, GSM123796 GSM123792, GSM76965 GSM37941 GSM123740,GSM123746, GSM123744, GSM123752, GSM123750, GSM123758, GSM123756, GSM123763, GSM123761, GSM123769, GSM123765, GSM123774, GSM123771, GSM123780, GSM123778, GSM123787, GSM123784, GSM123795, GSM123799 GSM123791, GSM123740,GSM123746, GSM123744, GSM123752, GSM123750, GSM123758, GSM123756, GSM123763, GSM123761, GSM123769, GSM123765, GSM123774, GSM123771, GSM123780, GSM123778, GSM123787, GSM123784, GSM123795, GSM123799 GSM123791, N. brasiliensis GSM76963, GSM76964, C. 
parvum GSM37929, GSM37935, P. falciparum GSM123732, GSM123736, P. falciparum GSM123732, GSM123736, postrongylus brasiliensistion: time infec- course profile 1 responseparvum to infection:profile Cryptosporidium 1 time course tomatic malaria: peripheral blood mononuclear cells profile 1 tomatic malaria: peripheral blood mononuclear cells profile 2 Table E.1 (continued) GEO IDGSE3414 Platform GPL1261 Title Lung immune response to Nip- Infection Species Control Samples Infection Samples GSE2077 GPL8300 Ileocecal epithelial cell line GSE5418 GPL96 Presymptomatic and symp- GSE5418 GPL96 Presymptomatic and symp- 197 GSM98653,GSM98669,GSM98685, GSM98693 GSM98661, GSM98677, GSM183611,GSM183618 GSM183613, GSM183610,GSM183614, GSM183612, GSM183616, GSM183615, GSM183619 GSM183617, GSM177136,GSM177138, GSM177139 GSM177137, GSM189604,GSM189606, GSM189605, GSM189608, GSM189607, GSM189610, GSM189611 GSM189609, GSM189636,GSM189638, GSM189637, GSM189640, GSM189639, GSM189642, GSM189643 GSM189641, GSM175250, GSM175257 GSM98668,GSM98684, GSM98692 GSM98676, GSM183607,GSM183609 GSM183608, GSM183607,GSM183609 GSM183608, GSM177141, GSM177142 GSM189598,GSM189600, GSM189599, GSM189602, GSM189603 GSM189601, GSM189630,GSM189632, GSM189631, GSM189634, GSM189635 GSM189633, GSM175256 P. chabaudi GSM98652, GSM98660, A. fumigatus GSM160530, GSM160534 GSM160532, GSM160536 C. albicans GSM177134, GSM177140, P. falciparum GSM189596, GSM189597, P. falciparum GSM189628, GSM189629, C. 
hominis GSM175249, GSM175253, differences in responsestage to blood- malariacourse profile infection: 1 time to Aspergillus fumigatus infection in vitro profile 1 dothelial cell line profile 1 course profile 1 course profile 2 man intestinal tissuescreased causes expression of in- Osteoprote- gerin profile 1 Table E.1 (continued) GEO IDGSE4324 Platform GPL339 Title Effect of gonadal steroids on sex Infection Species Control Samples Infection Samples GSE6965 GPL570 Immature dendritic cell response GSE7586 GPL570 Placental malaria profile 1 P. falciparum GSM183605, GSM183606, GSE7586 GPL570 Placental malaria profile 2GSE7355 GPL96 P. falciparum Candida albicans effect on en- GSM183605, GSM183606, GSE7814 GPL8321 Cerebral malaria model: time GSE7814 GPL8321 Cerebral malaria model: time GSE7268 GPL570 Cryptosporidium infection of hu- 198 GSM175252,GSM175259 GSM175255, GSM307838,GSM307842, GSM307839, GSM307844, GSM307843, GSM307851, GSM307847, GSM307854, GSM307853, GSM307858, GSM307856, GSM307861, GSM307860, GSM307866, GSM307864, GSM307873 GSM307867, GSM319500,GSM319502 GSM319501, GSM341003,GSM341005 GSM341004, GSM175256 GSM307845,GSM307848, GSM307846, GSM307850, GSM307849, GSM307855, GSM307852, GSM307859, GSM307857, GSM307863, GSM307862, GSM307868, GSM307865, GSM307870, GSM307869, GSM307872 GSM307871, GSM319499 GSM340998,GSM341000, GSM340999, GSM341002 GSM341001, C. parvum GSM175249, GSM175253, L. amazonensis GSM289459, GSM289460P. chabaudi GSM289455, GSM289458 GSM307840, GSM307841, P. chabaudiP. chabaudi GSM319493, GSM319494 GSM319495, GSM319496 GSM319497, GSM319498, T. 
cruzi GSM340996, GSM340997, man intestinal tissuescreased causes expression of in- Osteoprote- gerin profile 2 BALB/chousing mouse multiplyingamazonensis Leishmania macrophages amastigotes profile 1 Triggers Early Expansion ofural Nat- Killer Cells profile 1 Triggers Early Expansion of Natu- ral Killer Cells - NKprofile cell 1 response Triggers Early Expansion of Natu- ral Killer Cells - NKprofile cell 2 response mouse 24 hours afterinfection intradermal with Trypanosoma cruzi profile 1 Table E.1 (continued) GEO IDGSE7268 Platform GPL570 Title Cryptosporidium infection of hu- Infection Species Control Samples Infection Samples GSE11497 GPL1261 Transcriptional signatures of GSE12249 GPL5137 Experimental Malaria Infection GSE12727 GPL7275 Experimental Malaria Infection GSE12727 GPL7275 Experimental Malaria Infection GSE13522 GPL1261 Transcriptional response in of 199 GSM341006, GSM341007 GSM341008,GSM341010 GSM341009, GSM417722, GSM417724 GSM340998,GSM341000, GSM340999, GSM341002 GSM341001, GSM340998,GSM341000, GSM340999, GSM341002 GSM341001, GSM417721 T. cruzi GSM340996, GSM340997, T. cruzi GSM340996, GSM340997, T. cruziT. cruzi GSM346943, GSM346944T. cruzi GSM346941, GSM346942 GSM346951, GSM346952L. amazonensis GSM346949, GSM346950 GSM346959, GSM346960 GSM417691, GSM346957, GSM346958 GSM417719, S. mansoni GSM438037, GSM438038 GSM438041, GSM438042 mouse 24 hours afterinfection intradermal with Trypanosoma cruzi profile 2 mouse 24 hours afterinfection intradermal with Trypanosoma cruzi profile 3 mary fibroblasts, endothelialsmooth and muscle cells infected with Trypanosoma cruzi profile 1 mary fibroblasts, endothelialsmooth and muscle cells infected with Trypanosoma cruzi profile 2 mary fibroblasts, endothelialsmooth and muscle cells infected with Trypanosoma cruzi profile 3 between uninfected DCs and DCs housing L. 
amazonensis profile 1 TYPE INCELLS ACTIVATED DURING TREG AHELMINTH INFECTION CHRONIC profile 1 Table E.1 (continued) GEO IDGSE13522 Platform GPL1261 Transcriptional Title response in skin of Infection Species Control Samples Infection Samples GSE13522 GPL1261 Transcriptional response in skin of GSE13791 GPL570 Expression data from human pri- GSE13791 GPL570 Expression data from human pri- GSE13791 GPL570 Expression data from human pri- GSE16644 GPL1261 Transcript abundance comparison GSE17580 GPL8321 PRONOUNCED PHENO-

Table E.1 (continued)
GEO ID | Platform | Title | Infection Species | Control Samples | Infection Samples
GSE17580 | GPL8321 | PRONOUNCED PHENOTYPE IN ACTIVATED TREG CELLS DURING A CHRONIC HELMINTH INFECTION profile 2 | S. mansoni | GSM438039, GSM438040 | GSM438044, GSM438045
GSE18175 | GPL9207 | Transcriptomic alterations in a myoblast cell line infected with four distinct strains of Trypanosoma cruzi profile 1 | T. cruzi | GSM454467, GSM454492, GSM454493, GSM454494 | GSM454495, GSM454496, GSM454497, GSM454498
GSE18175 | GPL9207 | Transcriptomic alterations in a myoblast cell line infected with four distinct strains of Trypanosoma cruzi profile 2 | T. cruzi | GSM454467, GSM454492, GSM454493, GSM454494 | GSM454499, GSM454500, GSM454502, GSM454503
GSE18175 | GPL9207 | Transcriptomic alterations in a myoblast cell line infected with four distinct strains of Trypanosoma cruzi profile 3 | T. cruzi | GSM454467, GSM454492, GSM454493, GSM454494 | GSM454505, GSM454506, GSM454507, GSM454508
GSE18175 | GPL9207 | Transcriptomic alterations in a myoblast cell line infected with four distinct strains of Trypanosoma cruzi profile 4 | T. cruzi | GSM454467, GSM454492, GSM454493, GSM454494 | GSM454509, GSM454510, GSM454511, GSM454512
GSE20087 | GPL7275 | Delineation of diverse macrophage activation programs in response to intracellular parasites and cytokines profile 1 | T. cruzi | GSM501905, GSM501927, GSM501939, GSM501964, GSM501991, GSM502006 | GSM501901, GSM501934, GSM501980, GSM501996
GSE20087 | GPL7275 | Delineation of diverse macrophage activation programs in response to intracellular parasites and cytokines profile 2 | L. mexicana | GSM501905, GSM501927, GSM501939, GSM501964, GSM501991, GSM502006 | GSM501922, GSM502001
GSE20524 | GPL8321 | Blood Gene Expression Signatures Predict Invasive Candidiasis profile 1 | C. albicans | GSM515522, GSM515523, GSM515524, GSM515525, GSM515555, GSM515556, GSM515557, GSM515558, GSM515580, GSM515581, GSM515591, GSM515592, GSM515593 | GSM515527, GSM515528, GSM515529, GSM515530, GSM515531, GSM515532, GSM515533, GSM515534, GSM515562, GSM515563, GSM515564, GSM515565, GSM515566, GSM515567
GSE20524 | GPL8321 | Blood Gene Expression Signatures Predict Invasive Candidiasis profile 2 | C. albicans | GSM515522, GSM515523, GSM515524, GSM515525, GSM515555, GSM515556, GSM515557, GSM515558, GSM515580, GSM515581, GSM515591, GSM515592, GSM515593 | GSM515583, GSM515584
GSE22402 | GPL6104 | Genome-wide analysis of gene expression response upon infection with Toxoplasma gondii profile 1 | T. gondii | GSM557063, GSM557064, GSM557065, GSM557066 | GSM557074, GSM557075, GSM557076
GSE22402 | GPL6104 | Genome-wide analysis of gene expression response upon infection with Toxoplasma gondii profile 2 | T. gondii | GSM557063, GSM557064, GSM557065, GSM557066 | GSM557077, GSM557078, GSM557079
GSE22986 | GPL5175 | Microarray Expression Analysis of 3 Canonical Toxoplasma infections of human neuroepithelial cells profile 1 | T. gondii | GSM567354, GSM567355, GSM567356 | GSM567351, GSM567352, GSM567353
GSE22986 | GPL5175 | Microarray Expression Analysis of 3 Canonical Toxoplasma infections of human neuroepithelial cells profile 2 | T. gondii | GSM567360, GSM567361, GSM567362 | GSM567357, GSM567358, GSM567359
GSE22986 | GPL5175 | Microarray Expression Analysis of 3 Canonical Toxoplasma infections of human neuroepithelial cells profile 3 | T. gondii | GSM567366, GSM567367, GSM567368 | GSM567363, GSM567364, GSM567365
GSE23565 | GPL7275 | Interferon signaling during early experimental malaria infection profile 1 | P. chabaudi | GSM578069, GSM578070, GSM578071, GSM578072, GSM578073, GSM578074, GSM578075, GSM578076, GSM578077, GSM578078 | GSM578079, GSM578080, GSM578081, GSM578082, GSM578083, GSM578084, GSM578085, GSM578086, GSM578087, GSM578088
GSE24121 | GPL1261 | Oligoarray analysis of mRNA species from DCs treated with C. parvum for 24 h. profile 1 | C. parvum | GSM593404, GSM593405, GSM593406 | GSM593672, GSM593673, GSM593674
GSE24121 | GPL1261 | Oligoarray analysis of mRNA species from DCs treated with C. parvum for 24 h. profile 2 | C. parvum | GSM593404, GSM593405, GSM593406 | GSM593675, GSM593676, GSM593677
GSE24791 | GPL6244 | Expression profile of human natural killer cells co-cultured with P. falciparum infected erythrocytes profile 1 | P. falciparum | GSM610489, GSM610493, GSM610497, GSM610501, GSM610505, GSM610509, GSM610511, GSM610513, GSM610515 | GSM610486, GSM610490, GSM610494, GSM610498, GSM610502, GSM610506, GSM610510, GSM610512, GSM610514
GSE27171 | GPL6887 | Migrating japonicum schistosomula induce type-2 inflammation in the murine lung profile 1 | S. japonicum | GSM671570, GSM671571, GSM671572 | GSM671573, GSM671574, GSM671575
GSE29584 | GPL1261 | Expression Data from Toxoplasma gondii Infected Murine Macrophages and Dendritic Cells profile 1 | T. gondii | GSM732396, GSM732397 | GSM732392, GSM732393
GSE29584 | GPL1261 | Expression Data from Toxoplasma gondii Infected Murine Macrophages and Dendritic Cells profile 2 | T. gondii | GSM732396, GSM732397 | GSM732394, GSM732395
GSE29584 | GPL1261 | Expression Data from Toxoplasma gondii Infected Murine Macrophages and Dendritic Cells profile 3 | T. gondii | GSM732402, GSM732403 | GSM732398, GSM732399
GSE29584 | GPL1261 | Expression Data from Toxoplasma gondii Infected Murine Macrophages and Dendritic Cells profile 4 | T. gondii | GSM732402, GSM732403 | GSM732400, GSM732401
GSE29584 | GPL1261 | Expression Data from Toxoplasma gondii Infected Murine Macrophages and Dendritic Cells profile 5 | T. gondii | GSM732408, GSM732409 | GSM732404, GSM732405
GSE29584 | GPL1261 | Expression Data from Toxoplasma gondii Infected Murine Macrophages and Dendritic Cells profile 6 | T. gondii | GSM732408, GSM732409 | GSM732406, GSM732407
GSE31995 | GPL6246 | Gene Expression data from Mouse Balb/c Bone marrow derived macrophages infected by the promastigote form of Leishmania major parasite (P) or Killed parasite (Kp) during a time course of infection [Balb/c] profile 1 | L. major | GSM792485, GSM792486, GSM792487 | GSM792500, GSM792501, GSM792502
GSE31996 | GPL6246 | Gene Expression data from Mouse Balb/c Bone marrow derived macrophages infected by the promastigote form of Leishmania major parasite (P) or Killed parasite (Kp) during a time course of infection [Balb/c] profile 1 | L. major | GSM792533, GSM792534, GSM792535 | GSM792548, GSM792549, GSM792550
GSE32007 | GPL6887 | IDR-1018 treatment of mice infected with Plasmodium berghei, spleen and brain expression data profile 1 | P. berghei | GSM792639, GSM792646, GSM792667 | GSM792644, GSM792645, GSM792648, GSM792653, GSM792655, GSM792658, GSM792660, GSM792665, GSM792668
GSE32007 | GPL6887 | IDR-1018 treatment of mice infected with Plasmodium berghei, spleen and brain expression data profile 2 | P. berghei | GSM792641, GSM792651, GSM792662 | GSM792640, GSM792643, GSM792647, GSM792650, GSM792652, GSM792656, GSM792657, GSM792664, GSM792666
GSE32621 | GPL6887 | Transcriptional profiling and functional characterization of Schistosoma japonicum-stimulated Alternatively Activated Macrophages profile 1 | S. japonicum | GSM808650, GSM808652 | GSM808651, GSM808653
GSE34404 | GPL10558 | The genomic architecture of host whole blood transcriptional response to malaria infection profile 1 | P. falciparum | GSM848438, GSM848439, GSM848440, GSM848441, GSM848442, GSM848443, GSM848444, GSM848445, GSM848446, GSM848447, GSM848448, GSM848449, GSM848450, GSM848451, GSM848452, GSM848453, GSM848454, GSM848455, GSM848456, GSM848457, GSM848458, GSM848459, GSM848460, GSM848461, GSM848462, GSM848463, GSM848464, GSM848465, GSM848466, GSM848467, GSM848468, GSM848469, GSM848470, GSM848471, GSM848472, GSM848473, GSM848474, GSM848475, GSM848476, GSM848477, GSM848478, GSM848479, GSM848480, GSM848481, GSM848482, GSM848483, GSM848484, GSM848485, GSM848486, GSM848487, GSM848488, GSM848489, GSM848490, GSM848491, GSM848492, GSM848493, GSM848494, GSM848495, GSM848496, GSM848497, GSM848498:GSM848531 | GSM848532, GSM848533, GSM848534, GSM848535, GSM848536, GSM848537, GSM848538, GSM848539, GSM848540, GSM848541, GSM848542, GSM848543, GSM848544, GSM848545, GSM848546, GSM848547, GSM848548, GSM848549, GSM848550, GSM848551, GSM848552, GSM848553, GSM848554, GSM848555, GSM848556, GSM848557, GSM848558, GSM848559, GSM848560, GSM848561, GSM848562, GSM848563, GSM848564, GSM848565, GSM848566, GSM848567, GSM848568, GSM848569, GSM848570, GSM848571, GSM848572, GSM848573, GSM848574, GSM848575, GSM848576, GSM848577, GSM848578, GSM848579, GSM848580, GSM848581, GSM848582, GSM848583, GSM848584, GSM848585, GSM848586, GSM848587, GSM848588, GSM848589, GSM848590, GSM848591, GSM848592

206 GSM953285, GSM953286 GSM1018244, GSM1018245, GSM1018246, GSM1018247 GSM1032247, GSM1032248, GSM1032249 GSM953289 GSM1018242, GSM1018243 GSM1032237 B. malayiB. malayi GSM5123, GSM5136T. gondii GSM5123, GSM5136L. GSM5125, major GSM5137 GSM5123, GSM5136L. GSM5128, donovani GSM5138 GSM5123, GSM5136L. donovani GSM5129, GSM5139 GSM5123, GSM5136 GSM5132, GSM5140 GSM953287, GSM5134, GSM5141 GSM953288, B. malayi GSM983110, GSM983111P. chabaudi GSM983108, GSM983109 GSM1018240, GSM1018241, L. major GSM1032235, GSM1032236, exposed to phylogeneticallytinct parasites dis- profile 1 exposed to phylogeneticallytinct parasites dis- profile 2 exposed to phylogeneticallytinct parasites dis- profile 3 exposed to phylogeneticallytinct parasites dis- profile 4 exposed to phylogeneticallytinct parasites dis- profile 5 ysis ofto Leishmania mouse donovani infection liverprofile 1 subjected acts with interleukinin 8 iDCs but receptors causesexpression different gene patterns comparediDCs stimulated to by interleukinprofile 8. 
1 sensor of malaria infection profile 1 major infectedcells human profile 1 dendritic Table E.1 (continued) GEO IDGSE360 Platform GPL8300 Title Macrophages andGSE360 dendritic cells GPL8300 Macrophages andGSE360 dendritic cells GPL8300 Macrophages andGSE360 dendritic cells GPL8300 Macrophages Infection and SpeciesGSE360 dendritic cells GPL8300 Macrophages Control Samples andGSE38985 dendritic cells GPL6887 Whole genome microarray anal- Infection Samples GSE39999 GPL570 Filarial AsnRS inter- GSE41496 GPL13912 Toll-like receptor 7 is a primary GSE42088 GPL570 Expression data from Leishmania

Table E.1 (continued)
GEO ID | Platform | Title | Infection Species | Control Samples | Infection Samples
GSE42606 | GPL10558 | Genome-wide transcriptional profiling of human PBMCs stimulated with Candida albicans and non-fungal inflammatory stimuli profile 1 | C. albicans | GSM1046792, GSM1046793, GSM1046794, GSM1046795, GSM1046796, GSM1046797, GSM1046798, GSM1046799, GSM1046800, GSM1046801, GSM1046802, GSM1046803, GSM1046804, GSM1046805, GSM1046806, GSM1046807, GSM1046808, GSM1046809, GSM1046810, GSM1046811, GSM1046812, GSM1046813, GSM1046814, GSM1046815, GSM1046816, GSM1046817, GSM1046818, GSM1046819, GSM1046820, GSM1046821, GSM1046822, GSM1046823, GSM1046824, GSM1046825, GSM1046826 | GSM1046530, GSM1046531, GSM1046532, GSM1046533, GSM1046534, GSM1046535, GSM1046536, GSM1046537, GSM1046538, GSM1046539, GSM1046540, GSM1046541, GSM1046542, GSM1046543, GSM1046544, GSM1046545, GSM1046546, GSM1046547, GSM1046548, GSM1046549, GSM1046550, GSM1046551, GSM1046552, GSM1046553, GSM1046554, GSM1046555, GSM1046556, GSM1046557, GSM1046558, GSM1046559, GSM1046560, GSM1046561, GSM1046562, GSM1046563, GSM1046564, GSM1046565
GSE42606 | GPL10558 | Genome-wide transcriptional profiling of human PBMCs stimulated with Candida albicans and non-fungal inflammatory stimuli profile 2 | C. albicans | GSM1046792, GSM1046793, GSM1046794, GSM1046795, GSM1046796, GSM1046797, GSM1046798, GSM1046799, GSM1046800, GSM1046801, GSM1046802, GSM1046803, GSM1046804, GSM1046805, GSM1046806, GSM1046807, GSM1046808, GSM1046809, GSM1046810, GSM1046811, GSM1046812, GSM1046813, GSM1046814, GSM1046815, GSM1046816, GSM1046817, GSM1046818, GSM1046819, GSM1046820, GSM1046821, GSM1046822, GSM1046823, GSM1046824, GSM1046825, GSM1046826 | GSM1046590, GSM1046591, GSM1046592, GSM1046593, GSM1046594, GSM1046595, GSM1046596, GSM1046597, GSM1046598, GSM1046599, GSM1046600, GSM1046601, GSM1046602, GSM1046603, GSM1046604, GSM1046605, GSM1046606, GSM1046607, GSM1046608, GSM1046609, GSM1046610, GSM1046611, GSM1046612, GSM1046613, GSM1046614, GSM1046615, GSM1046616, GSM1046617, GSM1046618, GSM1046619, GSM1046620, GSM1046621, GSM1046622, GSM1046623
GSE42606 | GPL10558 | Genome-wide transcriptional profiling of human PBMCs stimulated with Candida albicans and non-fungal inflammatory stimuli profile 3 | C. albicans | GSM1046792, GSM1046793, GSM1046794, GSM1046795, GSM1046796, GSM1046797, GSM1046798, GSM1046799, GSM1046800, GSM1046801, GSM1046802, GSM1046803, GSM1046804, GSM1046805, GSM1046806, GSM1046807, GSM1046808, GSM1046809, GSM1046810, GSM1046811, GSM1046812, GSM1046813, GSM1046814, GSM1046815, GSM1046816, GSM1046817, GSM1046818, GSM1046819, GSM1046820, GSM1046821, GSM1046822, GSM1046823, GSM1046824, GSM1046825, GSM1046826 | GSM1046650, GSM1046651, GSM1046652, GSM1046653, GSM1046654, GSM1046655, GSM1046656, GSM1046657, GSM1046658, GSM1046659, GSM1046660, GSM1046661, GSM1046662, GSM1046663, GSM1046664, GSM1046665, GSM1046666, GSM1046667, GSM1046668, GSM1046669, GSM1046670, GSM1046671, GSM1046672, GSM1046673

210 GSM1046697, GSM1046698, GSM1046699, GSM1046701, GSM1046703, GSM1046705, GSM1046706, GSM1046708, GSM1046709, GSM1046711, GSM1046712, GSM1046713, GSM1046715, GSM1046716, GSM1046718, GSM1046719, GSM1046721, GSM1046722, GSM1046724, GSM1046726, GSM1046727, GSM1046729, GSM1046731, GSM1046732, GSM1046734, GSM1046735, GSM1046737, GSM1046739, GSM1046740, GSM1046742, GSM1046743, GSM1046745, GSM1046746, GSM1046748, GSM1046750, GSM1046751 GSM98653,GSM98669,GSM98685, GSM98693 GSM98661, GSM98677, GSM1061471, GSM1061472, GSM1061473 GSM1046794, GSM1046795, GSM1046796, GSM1046797, GSM1046798, GSM1046799, GSM1046800, GSM1046801, GSM1046802, GSM1046803, GSM1046804, GSM1046805, GSM1046806, GSM1046807, GSM1046808, GSM1046809, GSM1046810, GSM1046811, 
GSM1046812, GSM1046813, GSM1046814, GSM1046815, GSM1046816, GSM1046817, GSM1046818, GSM1046819, GSM1046820, GSM1046821, GSM1046822, GSM1046823, GSM1046824, GSM1046825, GSM1046826 GSM98668,GSM98684, GSM98692 GSM98676, C. albicans GSM1046792, GSM1046793, P. chabaudi GSM98652, GSM98660, E. histolytica GSM1061459, GSM1061460 GSM1061469, GSM1061470, filing of humanlated with PBMCs Candida stimu- non-fungal albicans inflammatory and stimuli profile 4 differences in responsestage to blood- malariacourse profile infection: 1 time tor Q223R PolymorphismHost on Transcriptome the Following In- fection with E. histolytica.1 profile Table E.1 (continued) GEO IDGSE42606 Platform GPL10558 Genome-wide transcriptional pro- Title Infection Species Control Samples Infection Samples GSE4324 GPL339 Effect of gonadal steroids on sex GSE43372 GPL6246 The Effect of the Recep- 211 GSM1061493, GSM1061494, GSM1061495, GSM1061496, GSM1061497, GSM1061498, GSM1061499, GSM1061500, GSM1061501, GSM1061502, GSM1061503, GSM1061504, GSM1061505, GSM1061506, GSM1061507, GSM1061508, GSM1061509 GSM1072817, GSM1072818, GSM1072819, GSM1072820, GSM1072821, GSM1072822, GSM1072823, GSM1072824 GSM1111121, GSM1111122 GSM1061479 GSM1111112 E. histolytica GSM1061477, GSM1061478, L. braziliensis GSM1072813, GSM1072814N. caninum GSM1072815, GSM1072816, T. gondii GSM1111106, GSM1111107 GSM1111098, GSM1111099 Apicomplexa GSM1111106, GSM1111107 GSM1111100, GSM1111101 GSM1111110, GSM1111111, Apicomplexa GSM1333786, GSM1333787 GSM1333788, GSM1333789 tor Q223R PolymorphismHost on Transcriptome the Following In- fection with E. 
histolytica.2 profile dependent pathology in Leishma- nia braziliensis infection profile 1 dependent innate immune signal- ing byspecies-I closely profile 1 related parasite dependent innate immune signal- ing byspecies-I closely profile 2 related parasite dependent innate immune signal- ing byspecies-II closely profile 4 related parasite tion of macrophages profile 1 Table E.1 (continued) GEO IDGSE43372 Platform GPL6246 The Title Effect of the Leptin Recep- Infection Species Control Samples Infection Samples GSE43880 GPL10558 CD8 T cells induce perforin- GSE45632 GPL6887 Differential induction of TLR3- GSE45632 GPL6887 Differential induction of TLR3- GSE45633 GPL6883 Differential induction of TLR3- GSE55298 GPL1261 Toxoplasma RH and Mock Infec- 212 GSM1341374, GSM1341375, GSM1341376, GSM1341377, GSM1341378, GSM1341379, GSM1341380, GSM1341381 GSM162966,GSM162969 GSM162967, GSM47630,GSM47632 GSM47631, GSM39116,GSM39118,GSM39120, GSM39117, GSM39122, GSM39119, GSM39124, GSM39121, GSM39126, GSM39123, GSM39128, GSM39125, GSM39130, GSM39127, GSM39132, GSM39129, GSM39134, GSM39131, GSM39136, GSM39137 GSM39133, GSM39135, GSM2602,GSM2622 GSM2612, GSM76400,GSM76402, GSM76403 GSM76401, GSM52867,GSM52869,GSM52871, GSM52868, GSM52873, GSM52870, GSM52875, GSM52876 GSM52872, GSM52874, GSM1341366, GSM1341367, GSM1341368, GSM1341369, GSM1341370, GSM1341371, GSM1341372, GSM1341373 GSM162964 GSM47627,GSM47629 GSM47628, GSM39104,GSM39106,GSM39108, GSM39105, GSM39110, GSM39107, GSM39112, GSM39109, GSM39114, GSM39115 GSM39111, GSM39113, GSM2591,GSM2617 GSM2607, GSM76392,GSM76394, GSM76395 GSM76393, GSM52861,GSM52863,GSM52865, GSM52866 GSM52862, GSM52864, Euglenozoa GSM1341364, GSM1341365, T. 
cruzi GSM162953, GSM162955, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Respitoryvirus syncytial Human rotavirus A GSM52859, GSM52860, Leishmania braziliensisidentifies lesions transcriptionalules associated mod- immunopathology profile with 1 cutaneous panosoma cruzi-infectedprofile 1 cells vector profile 1 eral blood mononuclear cellsfile pro- 1 profile 1 to respiratory syncytialfection: virus time course in- profile 1 eral blood mononuclear cellsfile pro- 1 Table E.1 (continued) GEO IDGSE55664 Platform GPL10558 Genomic Title profiling of human Infection Species Control Samples Infection Samples GSE7047 GPL570 Transcriptome profile of Try- GSE2504 GPL96 T cells infected with an HIV-based GSE2171 GPL201 HIV-1 infection effect on periph- GSE511 GPL80 HIV viral infection time course GSE3397 GPL570 Bronchial epithelial cell response GSE2729 GPL8300 Acute rotavirus infection: periph- 213 GSM108096,GSM108098 GSM108097, GSM157040,GSM157042 GSM157041, GSM154936,GSM155182, GSM155180, GSM155186 GSM155184, GSM155179,GSM155183, GSM155181, GSM155187 GSM155185, GSM135575,GSM135582, GSM135579, GSM135589 GSM135586, GSM175455,GSM175457, GSM175456, GSM175444, GSM175445 GSM175443, GSM257862,GSM257866, GSM257868 GSM257865, GSM148687,GSM148689 GSM148688, GSM108093,GSM108095 GSM108094, GSM157034,GSM157036 GSM157035, GSM155228,GSM155233, GSM155230, GSM155237 GSM155235, GSM155229,GSM155234, GSM155232, GSM155238 GSM155236, GSM135573 GSM175454,GSM175441, GSM175442 GSM175440, GSM257864,GSM257869, GSM257870 GSM257867, GSM148692 Human immunodefi- ciency virus Respitoryvirus syncytial Human immunodefi- ciency virus Human immunodefi- ciency virus Measels virus GSM135280, GSM135572, Hantavirus GSM175452, GSM175453, Sendai virus GSM257861, GSM257863, Human herpesvirus 8 GSM148690, GSM148691, Nef protein expressionCD4+ effect T cells on profile 1 bronchial epithelialulated cells stim- pathogens. 
by profile 1 different airway profile s ofcells CD4+ from and HIV-infectedand CD8+ pateints T uninfectedprofile 1 control group profile s ofcells CD4+ from and HIV-infectedand CD8+ pateints T uninfectedprofile 2 control group mononuclear cells profile 1 hantavirus infection: timeprofile course 1 Sendai virus infection in vitro pro- file 1 primary pulmonary microvascular cells profile 1 Table E.1 (continued) GEO IDGSE4785 Platform GPL570 Title Simian immune deficiency virus Infection Species Control Samples Infection Samples GSE6802 GPL571 Gene expression analysis of GSE6740 GPL96 Comparison of transcriptional GSE6740 GPL96 Comparison of transcriptional GSE5808 GPL96 Measles: peripheral blood GSE7271 GPL341 Male and female lungs response to GSE10211 GPL81 Airway epithelial cells response to GSE6489 GPL570 Human herpesvirus-8 infection of 214 GSM509732,GSM509741, GSM509746 GSM509736, GSM523808,GSM523810 GSM523809, GSM307741, GSM307742 GSM762061,GSM762065, GSM762067 GSM762063, GSM697654,GSM697656 GSM697655, GSM338267, GSM338271 GSM338264, GSM338270 GSM509781,GSM509777, GSM509779, GSM509773, GSM509775, GSM509769, GSM509771, GSM509765, GSM509767, GSM509761, GSM509763, GSM509757, GSM509759, GSM509753, GSM509751 GSM509755, GSM523824 GSM307738,GSM307740 GSM307739, GSM762064, GSM762066 GSM560538, GSM560539 GSM560540, GSM560541 GSM560538, GSM560539 GSM560542, GSM560543 GSM697651,GSM697653 GSM697652, Human immunodefi- ciency virus Influenzavirus A GSM509785, GSM509783, Hepatitis C virus GSM523822, GSM523823, Lassanavirus mammare- West Nile virus GSM762060, GSM762062, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus macrophages incourse profile 1 vitro: time whole blood profile 1 on Huh7 hepatomacourse profile 1 cells: time Lassa fever: liver profile 1 on retinal pigment epithelium cells profile 1 co-infected with HIV-GFP(G) and SIV-VLP(G) profile 1 co-infected with HIV-GFP(G) and SIV-VLP(G) profile 2 apy (HAART) interruptionon 
effect HIV patients:profile jejunal 1 mucosa Table E.1 (continued) GEO IDGSE13395 Platform GPL8300 HIV-1 Title infectionGSE20346 GPL6947 effect Severe on influenza A infection: Infection Species Control Samples Infection Samples GSE20948 GPL570 Hepatitis C Virus infection effect GSE12254 GPL570 Nonhuman model of GSE30719 GPL6244 West Nile virus infection effect GSE22589 GPL570 Monocyte-derived dendritic cells GSE22589 GPL570 Monocyte-derived dendritic cells GSE28177 GPL570 Highly Active Antiretroviral Ther- 215 GSM751340,GSM751337, GSM751338, GSM751335, GSM751336, GSM751331, GSM751333, GSM751329, GSM751330, GSM751327, GSM751328, GSM751323, GSM751326, GSM751293, GSM751319, GSM751351, GSM751295 GSM751350, GSM751412,GSM751408, GSM751409, GSM751405, GSM751407, GSM751402, GSM751403, GSM751420, GSM751401, GSM751399, GSM751400, GSM751392, GSM751396, GSM751360, GSM751390, GSM751418, GSM751364 GSM751419, GSM751458,GSM751456, GSM751457, GSM751454, GSM751455, GSM751447, GSM751453, GSM751443, GSM751445, GSM751441, GSM751442, GSM751438, GSM751439, GSM751426 GSM751427, GSM751342,GSM751359, GSM751358, GSM751316, GSM751318, GSM751310, GSM751314, GSM751305, GSM751307, GSM751354, GSM751304, GSM751298, GSM751355, GSM751353, GSM751352, GSM751349, GSM751348, GSM751347 GSM751346, GSM751414,GSM751391, GSM751410, GSM751386, GSM751388, GSM751379, GSM751382, GSM751376, GSM751378, GSM751371, GSM751375, GSM751367, GSM751368, GSM751362 GSM751363, GSM751451,GSM751437, GSM751448, GSM751430, GSM751433, GSM751425 GSM751429, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Therapy effects onin mitochondria various tissues profile 1 Therapy effects onin mitochondria various tissues profile 2 Therapy effects onin mitochondria various tissues profile 3 Table E.1 (continued) GEO IDGSE30310 Platform GPL9392 HIV Title infection and Antiretroviral Infection Species Control Samples Infection Samples GSE30310 GPL9392 HIV infection and Antiretroviral GSE30310 
GPL9392 HIV infection and Antiretroviral 216 GSM697492,GSM697494, GSM697493, GSM697496, GSM697495, GSM697498, GSM697497, GSM697500, GSM697499, GSM697502, GSM697501, GSM697504, GSM697505 GSM697503, GSM670449,GSM670453, GSM670451, GSM670457, GSM670455, GSM670461 GSM670459, GSM876868,GSM876844, GSM876892, GSM876893, GSM876869, GSM876870, GSM876845, GSM876846, GSM876894, GSM876895, GSM876871, GSM876872, GSM876847, GSM876848, GSM876896, GSM876897, GSM876849 GSM876873, GSM876874,GSM876850, GSM876898, GSM876899, GSM876875, GSM876876, GSM876851, GSM876852, GSM876900, GSM876901, GSM876877, GSM876878, GSM876853, GSM876854, GSM876902, GSM876903, GSM876879, GSM876880, GSM876855, GSM876856 GSM876904, GSM697483,GSM697485, GSM697484, GSM697487, GSM697486, GSM697489, GSM697488, GSM697491 GSM697490, GSM670465,GSM670467, GSM670466, GSM670469 GSM670468, GSM876862,GSM876838, GSM876886, GSM876887, GSM876863, GSM876864, GSM876839, GSM876840, GSM876888, GSM876889, GSM876865, GSM876866, GSM876841, GSM876842, GSM876890, GSM876891, GSM876843 GSM876867, GSM876862,GSM876838, GSM876886, GSM876887, GSM876863, GSM876864, GSM876839, GSM876840, GSM876888, GSM876889, GSM876865, GSM876866, GSM876841, GSM876842, GSM876890, GSM876891, GSM876843 GSM876867, Human immunodefi- ciency virus Influenza A H1N1 GSM670463, GSM670464, Human immunodefi- ciency virus Human immunodefi- ciency virus on brain ofassociated patients neurocognitive withders disor- profile 1 HIV- effect on peripheralcourse blood: profile 1 time rocognitive impairment: brain re- gions profile 1 rocognitive impairment: brain re- gions profile 2 Table E.1 (continued) GEO IDGSE28160 Platform GPL570 Title Antiretroviral therapy effect Infection Species Control Samples Infection Samples GSE27131 GPL6244 Severe pandemic H1N1 influenza GSE35864 GPL570 Two types of HIV-associated neu- GSE35864 GPL570 Two types of HIV-associated neu- 217 GSM876881,GSM876857, GSM876905, GSM876906, GSM876882, GSM876883, GSM876858, GSM876859, GSM876907, GSM876908, 
GSM876884, GSM876885, GSM876860, GSM876861 GSM876909, GSM280202,GSM280204, GSM280203, GSM280208 GSM280205, GSM280211,GSM280221 GSM280215, GSM876862,GSM876838, GSM876886, GSM876887, GSM876863, GSM876864, GSM876839, GSM876840, GSM876888, GSM876889, GSM876865, GSM876866, GSM876841, GSM876842, GSM876890, GSM876891, GSM876843 GSM876867, GSM280201 GSM280201 Human immunodefi- ciency virus Human herpesvirus 4 GSM280199, GSM280200, Human herpesvirus 4 GSM280199, GSM280200, rocognitive impairment: brain re- gions profile 3 tome changesprocesses identify likely cellular Epstein targeted Barr Virus during infection.file pro- 1 tome changesprocesses identify likelyEpstein cellular targeted Barr Virus during infection.file pro- 2 Table E.1 (continued) GEO IDGSE35864 Platform GPL570 Title Two types of HIV-associated neu- Infection Species Control Samples Infection Samples GSE11093 GPL2986 EBER2 RNA-induced transcrip- GSE11093 GPL2986 EBER2 RNA-induced transcrip- 218 GSM286648,GSM286654, GSM286651, GSM286660, GSM286657, GSM286666, GSM286663, GSM286672, GSM286669, GSM286678, GSM286675, GSM286684, GSM286681, GSM286690, GSM286687, GSM286696, GSM286693, GSM286702, GSM286699, GSM286708, GSM286705, GSM286714, GSM286710, GSM286720, GSM286717, GSM286726, GSM286723, GSM286732, GSM286729, GSM286738 GSM286735, GSM347800,GSM347804, GSM347801, GSM347806, GSM347805, GSM347808, GSM347807, GSM347823, GSM347809, GSM347829, GSM347824, GSM347831, GSM347830, GSM347833, GSM347832, GSM347835, GSM347836 GSM347834, GSM286653,GSM286659, GSM286656, GSM286665, GSM286662, GSM286671, GSM286668, GSM286677, GSM286674, GSM286683, GSM286680, GSM286689, GSM286686, GSM286695, GSM286692, GSM286701, GSM286698, GSM286707, GSM286704, GSM286713, GSM286709, GSM286719, GSM286716, GSM286725, GSM286722, GSM286731, GSM286728, GSM286737 GSM286734, GSM347802,GSM347810, GSM347803, GSM347812, GSM347811, GSM347814, GSM347813, GSM347816, GSM347815, GSM347818, GSM347817, GSM347821, GSM347820, GSM347825, GSM347822, GSM347827, 
GSM347828 GSM347826, Rhinovirus A GSM286647, GSM286650, Human immunodefi- ciency virus in vivo humantion: rhinovirus infec- insightssponse. profile into 1 the host re- Control: Hippocampussion profile s Expres- profile 1 Table E.1 (continued) GEO IDGSE11348 Platform GPL570 Title Gene expression profile s during Infection Species Control Samples Infection Samples GSE13824 GPL3535 SIV Encephalitis and Uninfected 219 GSM410756,GSM410764, GSM410757, GSM410770, GSM410765, GSM410776, GSM410771, GSM410778, GSM410777, GSM410788, GSM410779, GSM410794, GSM410789, GSM410798, GSM410799 GSM410797, GSM410752,GSM410758, GSM410753, GSM410762, GSM410759, GSM410768, GSM410763, GSM410780, GSM410769, GSM410782, GSM410781, GSM410786, GSM410783, GSM410790, GSM410787, GSM410800, GSM410801 GSM410791, GSM429203,GSM429207, GSM429205, GSM429211, GSM429209, GSM429215, GSM429213, GSM429219, GSM429217, GSM429223, GSM429221, GSM429227, GSM429225, GSM429231, GSM429229, GSM429234, GSM429232, GSM429238, GSM429240 GSM429236, GSM410750,GSM410754, GSM410751, GSM410760, GSM410755, GSM410766, GSM410761, GSM410772, GSM410767, GSM410774, GSM410773, GSM410784, GSM410775, GSM410792, GSM410785, GSM410795, GSM410796 GSM410793, GSM410750,GSM410754, GSM410751, GSM410760, GSM410755, GSM410766, GSM410761, GSM410772, GSM410767, GSM410774, GSM410773, GSM410784, GSM410775, GSM410792, GSM410785, GSM410795, GSM410796 GSM410793, GSM429206,GSM429210, GSM429208, GSM429214, GSM429212, GSM429218, GSM429216, GSM429222, GSM429220, GSM429226, GSM429224, GSM429230, GSM429228, GSM429235, GSM429233, GSM429239 GSM429237, Human immunodefi- ciency virus Human immunodefi- ciency virus Rhinovirus A GSM429202, GSM429204, Tissue RevealsGene-Expression Stage-Specific, SignaturesHIV-1 Infection profile in 1 Tissue RevealsGene-Expression Stage-Specific, SignaturesHIV-1 Infection profile in 2 symptomatic respiratory viralfection in- in adults profile 1 Table E.1 (continued) GEO IDGSE16363 Platform GPL570 Title Microarray Analysis of 
Lymphatic Infection Species Control Samples Infection Samples GSE16363 GPL570 Microarray Analysis of Lymphatic GSE17156 GPL571 Gene expression signatures of 220 GSM429242,GSM429246, GSM429244, GSM429250, GSM429248, GSM429254, GSM429252, GSM429258, GSM429256, GSM429262, GSM429260, GSM429266, GSM429264, GSM429270, GSM429268, GSM429274, GSM429272, GSM429278, GSM429280 GSM429276, GSM429282,GSM429286, GSM429284, GSM429290, GSM429288, GSM429294, GSM429292, GSM429298, GSM429296, GSM429302, GSM429300, GSM429306, GSM429304, GSM429310, GSM429308, GSM429314 GSM429312, GSM454282,GSM454284 GSM454283, GSM454282,GSM454284 GSM454283, GSM429245,GSM429249, GSM429247, GSM429253, GSM429251, GSM429257, GSM429255, GSM429261, GSM429259, GSM429265, GSM429263, GSM429269, GSM429267, GSM429273, GSM429271, GSM429277, GSM429279 GSM429275, GSM429281,GSM429285, GSM429283, GSM429289, GSM429287, GSM429293, GSM429291, GSM429297, GSM429295, GSM429301, GSM429299, GSM429305, GSM429303, GSM429309, GSM429307, GSM429313 GSM429311, GSM454279,GSM454281 GSM454280, GSM454279,GSM454281 GSM454280, Influenza A H3N2 GSM429241, GSM429243, Respitoryvirus syncytial Respitoryvirus syncytial Respitoryvirus syncytial Influenza A H1N1 GSM528634, GSM528635 GSM528769, GSM528770 symptomatic respiratory viralfection in- in adults profile 2 symptomatic respiratory viralfection in- in adults profile 3 Protein Kinaseinnate Regulates immune early responsesRSV Infection during profile 1 Protein Kinaseinnate Regulates immune early responsesRSV Infection during profile 2 man bronchial epithelialinfluenza cells virus, to interferon-beta viral profile 1 RNA and Table E.1 (continued) GEO IDGSE17156 Platform GPL571 Title Gene expression signatures of Infection Species Control Samples Infection Samples GSE17156 GPL571 Gene expression signatures of GSE18170 GPL7202 Double-Stranded RNA-Activated GSE18170 GPL7202 Double-Stranded RNA-Activated GSE19392 GPL3921 Dynamic responses of primary hu- 221 GSM523808,GSM523810 GSM523809, 
GSM542948,GSM542950, GSM542949, GSM542957, GSM542952, GSM542963, GSM542958, GSM542970, GSM542964, GSM542974, GSM542975 GSM542972, GSM542945,GSM542947, GSM542946, GSM542953, GSM542951, GSM542955, GSM542954, GSM542959, GSM542956, GSM542961, GSM542960, GSM542965, GSM542962, GSM542967, GSM542966, GSM542969, GSM542968, GSM542973, GSM542971, GSM542977, GSM542976, GSM542979, GSM542980 GSM542978, GSM555583,GSM555585, GSM555584, GSM555587 GSM555586, GSM555588,GSM555590, GSM555589, GSM555592 GSM555591, GSM523824 GSM542943, GSM542944 GSM542943, GSM542944 GSM555580,GSM555582 GSM555581, GSM555580,GSM555582 GSM555581, Hepatitis C virus GSM523822, GSM523823, Influenza A H1N1 GSM542941, GSM542942, Influenza A H1N1 GSM542941, GSM542942, Influenza A H1N1 GSM555578, GSM555579, Influenza A H3N2 GSM555578, GSM555579, fection on Host Geneprofile 1 Expression caused by pandemic H1N1 profile 1 caused by pandemic H1N1 profile 2 screening identifies neweffective broadly influenza antivirals pro- file 1 screening identifies neweffective broadly influenza antivirals pro- file 2 Table E.1 (continued) GEO IDGSE20948 Platform GPL570 Title The Effect of Hepatitis C Virus In- Infection Species Control Samples Infection Samples GSE21802 GPL6102 Hosts responses in critical disease GSE21802 GPL6102 Hosts responses in critical disease GSE22319 GPL10536 Gene expression signature-based GSE22319 GPL10536 Gene expression signature-based 222 GSM555593,GSM555595, GSM555594, GSM555597 GSM555596, GSM555598,GSM555600, GSM555599, GSM555602 GSM555601, GSM555603,GSM555605, GSM555604, GSM555607 GSM555606, GSM596105,GSM596111 GSM596108, GSM602168,GSM602170 GSM602169, GSM686092,GSM686094 GSM686093, GSM686095,GSM686097 GSM686096, GSM686098,GSM686100 GSM686099, GSM555580,GSM555582 GSM555581, GSM555580,GSM555582 GSM555581, GSM555580,GSM555582 GSM555581, GSM596110 GSM602167 GSM686091 GSM686091 GSM686091 Influenza A H5N1 GSM555578, GSM555579, Influenza A H5N2 GSM555578, GSM555579, Influenza A H7N1 GSM555578, GSM555579, 
Murid herpesvirus 1 GSM596104, GSM596107, Murid herpesvirus 1 GSM602165, GSM602166, Influenza A H5N1 GSM686089, GSM686090, Influenza A H5N1 GSM686089, GSM686090, Influenza A H5N1 GSM686089, GSM686090, screening identifies neweffective broadly influenza antivirals pro- file 3 screening identifies neweffective broadly influenza antivirals pro- file 4 screening identifies neweffective broadly influenza antivirals pro- file 5 tomegalovirus (HCMV)-infected monocytes. profile 1 to expression oftomegalovirus the (hCMV) humanimmediate-early cy- 72-kDa 1 (IE1)profile 1 protein fluenza Viruses AvoidInflammatory Response of Effective Human Macrophages profile 1 fluenza Viruses AvoidInflammatory Response of Effective Human Macrophages profile 2 fluenza Viruses AvoidInflammatory Response of Effective Human Macrophages profile 3 Table E.1 (continued) GEO IDGSE22319 Platform GPL10536 Gene expression Title signature-based Infection Species Control Samples Infection Samples GSE22319 GPL10536 Gene expression signature-based GSE22319 GPL10536 Gene expression signature-based GSE24238 GPL8300 The role of integrins in human cy- GSE24434 GPL6244 Host cell transcriptome response GSE27702 GPL571 Highly Pathogenic Avian In- GSE27702 GPL571 Highly Pathogenic Avian In- GSE27702 GPL571 Highly Pathogenic Avian In- 223 GSM692554,GSM692558, GSM692560 GSM692556, GSM697597,GSM697599 GSM697598, GSM716282,GSM716284 GSM716283, GSM716346,GSM716348 GSM716347, GSM726386,GSM726392, GSM726389, GSM726398, GSM726395, GSM726404, GSM726401, GSM726410, GSM726407, GSM726416, GSM726413, GSM726443, GSM726419, GSM726467, GSM726449, GSM726473, GSM726479 GSM726470, GSM692557, GSM692559 GSM697581 GSM716244 GSM716308 GSM726539,GSM726545, GSM726542, GSM726551, GSM726548, GSM726557, GSM726554, GSM726563, GSM726560, GSM726569, GSM726566, GSM726575, GSM726572, GSM726581:3:GSM726755 GSM726578, Influenzavirus A GSM692553, GSM692555, Influenza A H5N1 GSM697579, GSM697580, Rhinovirus A GSM716242, GSM716243, Rhinovirus A 
GSM716306, GSM716307, Rhinovirus A GSM726533, GSM726536, eral bloodin mononuclear childrenmedia cells with causedHaemophilus influenzae by acute profile 1 Nontypeable otitis sponse to InfectionPathogenic H5N1 with Avian Influenza Highly Virus profile 1 duced by rhinovirus infectionvitro profile in- 1 duced by rhinovirus infectionvitro profile in- 2 tion to globalscriptional whole changes profile blood 1 tran- Table E.1 (continued) GEO IDGSE27990 Platform GPL13287 Transcriptome profile Title of periph- Infection Species Control Samples Infection Samples GSE28166 GPL6480 Host Regulatory Network Re- GSE28904 GPL6883 Global transcriptional changes in- GSE28904 GPL6883 Global transcriptional changes in- GSE29385 GPL10558 Influenzavirus serotype associa- 224 GSM726293,GSM726308, GSM726305, GSM726317, GSM726311, GSM726326, GSM726320, GSM726332, GSM726329, GSM726338, GSM726335, GSM726344, GSM726341, GSM726350, GSM726347, GSM726359, GSM726353, GSM726371, GSM726368, GSM726383, GSM726380, GSM726455, GSM726452, GSM726461, GSM726458, GSM726482, GSM726464, GSM726488, GSM726485, GSM726494, GSM726491, GSM726500, GSM726497, GSM726506, GSM726503, GSM726512, GSM726509, GSM726518, GSM726515, GSM726524, GSM726521, GSM726530 GSM726527, GSM726422,GSM726428, GSM726425, GSM726434, GSM726431, GSM726440, GSM726437, GSM726476 GSM726446, GSM726539,GSM726545, GSM726542, GSM726551, GSM726548, GSM726557, GSM726554, GSM726563, GSM726560, GSM726569, GSM726566, GSM726575, GSM726572, GSM726581, GSM726578, GSM726587, GSM726584, GSM726593, GSM726590, GSM726599, GSM726596, GSM726605, GSM726602, GSM726611, GSM726608, GSM726617, GSM726614, GSM726623, GSM726620, GSM726629, GSM726626, GSM726635, GSM726632, GSM726641, GSM726638, GSM726647, GSM726644, GSM726653, GSM726650, GSM726659, GSM726656, GSM726665:3:GSM726755 GSM726662, GSM726539,GSM726545, GSM726542, GSM726551, GSM726548, GSM726557, GSM726554, GSM726563, GSM726560, GSM726569:3:GSM726755 GSM726566, Influenza A H1N1 GSM726533, GSM726536, Influenza A 
H1N1 GSM726533, GSM726536, tion to globalscriptional whole changes profile blood 2 tran- tion to globalscriptional whole changes profile blood 3 tran- Table E.1 (continued) GEO IDGSE29385 Platform GPL10558 Influenzavirus Title serotype associa- Infection Species Control Samples Infection Samples GSE29385 GPL10558 Influenzavirus serotype associa- 225 GSM726296,GSM726302, GSM726299, GSM726323, GSM726314, GSM726362, GSM726356, GSM726377 GSM726374, GSM741088,GSM741090 GSM741089, GSM741100,GSM741102 GSM741101, GSM741112,GSM741114 GSM741113, GSM757902,GSM757934, GSM757918, GSM757965, GSM757950, GSM757997, GSM757981, GSM758044, GSM758028, GSM758076, GSM758060, GSM758106, GSM758091, GSM758138, GSM758154 GSM758122, GSM768661,GSM768673 GSM768667, GSM726539,GSM726545, GSM726542, GSM726551, GSM726548, GSM726557, GSM726554, GSM726563, GSM726560, GSM726569:3:GSM726755 GSM726566, GSM741082,GSM741086 GSM741084, GSM741094,GSM741096 GSM741095, GSM741106,GSM741108 GSM741107, GSM757931,GSM757962, GSM757947, GSM757994, GSM757978, GSM758025, GSM758010, GSM758057, GSM758041, GSM758088, GSM758073, GSM758119, GSM758103, GSM758151 GSM758135, GSM768660 Influenza A H3N2 GSM726533, GSM726536, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Influenza A H5N1 GSM757899, GSM757915, Influenza A H1N1 GSM768658, GSM768659, tion to globalscriptional whole changes profile blood 4 tran- glycan-related gene expression in primary human cells infected with HIV-1 profile 1 glycan-related gene expression in primary human cells infected with HIV-1 profile 2 glycan-related gene expression in primary human cells infected with HIV-1 profile 3 health human subjects beforeafter and they werelive challenged influenza with (H3N2/Wisconsin) viruses profile 1 Pandemic A (H1N1pdm) Infection in Mice profile 1 Table E.1 (continued) GEO IDGSE29385 Platform GPL10558 Influenzavirus Title serotype associa- Infection Species Control Samples Infection Samples GSE29939 GPL11095 
Microarray analysis to examine GSE29939 GPL11095 Microarray analysis to examine GSE29939 GPL11095 Microarray analysis to examine GSE30550 GPL9188 Temporal expression data from 17 GSE31022 GPL6887 Role of Interleukin-6 in Influenza 226 GSM778261,GSM778263 GSM778262, GSM786643 GSM778255,GSM778257 GSM778256, Respitoryvirus syncytial Influenza A H1N1 GSM786638, GSM786639Influenza A H1N1 GSM786637, GSM786640 GSM786638, GSM786639Influenza A H1N1 GSM786641, GSM786642, GSM786638, GSM786639 GSM786644, GSM786645 Humanvirus respiratory (HRSV)macrophage cells syncytial infected atpost 4, mouse infection profile 24 1 hours Influenza A VirusHeterogeneous Isolates Show Macaques profile Virulence 1 in Influenza A VirusHeterogeneous Isolates Show Macaques profile Virulence 3 in Influenza A VirusHeterogeneous Isolates Show Macaques profile Virulence 2 in Table E.1 (continued) GEO IDGSE31378 Platform GPL1261 Host Title cell gene expression in Infection Species Control Samples Infection Samples GSE31694 GPL9861 Pandemic Swine-Origin H1N1 GSE31694 GPL9861 Pandemic Swine-Origin H1N1 GSE31694 GPL9861 Pandemic Swine-Origin H1N1 227 GSM788510,GSM788525, GSM788520, GSM788549, GSM788545, GSM788554, GSM788550, GSM788557, GSM788556, GSM788559 GSM788558, GSM796505,GSM796507 GSM796506, GSM796528,GSM796530, GSM796531 GSM796529, GSM788508,GSM788511, GSM788509, GSM788513, GSM788512, GSM788515, GSM788514, GSM788517, GSM788516, GSM788519, GSM788518, GSM788522, GSM788521, GSM788524, GSM788523, GSM788527, GSM788526, GSM788529, GSM788528, GSM788531, GSM788530, GSM788533, GSM788532, GSM788535, GSM788534, GSM788537, GSM788536, GSM788539, GSM788538, GSM788541, GSM788540, GSM788543, GSM788542, GSM788546, GSM788544, GSM788548, GSM788547, GSM788552, GSM788551, GSM788555, GSM788560 GSM788553, GSM796504 GSM796522, GSM796523 Human herpesvirus 4 GSM788506, GSM788507, Influenzavirus A GSM796502, GSM796503, Influenza A H3N2 GSM796520, GSM796521, epigenotype expandingpolycomb to target non- genes,by Epstein-Barr 
induced virus infectionhuman in gastricMethylation] cancer profile 1 [Illumina airway epithelial cells to Influenza infection and the importance of In- terferon type I signaling insponse this [mAEC]. re- profile 1 airway epithelialfluenza or RSV cells infection [hAECs- Agilent]. to profile 1 In- Table E.1 (continued) GEO IDGSE31788 Platform GPL8490 Aberrant Title DNA Infection Species Control Samples Infection Samples GSE32137 GPL7202 The response of murine primary GSE32138 GPL6480 The response of human primary 228 GSM796532,GSM796534, GSM796535 GSM796533, GSM796542, GSM796543 GSM796546, GSM796547 GSM822999,GSM823001, GSM823000, GSM823003 GSM823002, GSM823017,GSM823019, GSM823018, GSM823021 GSM823020, GSM823222,GSM823224 GSM823223, GSM823255,GSM823257 GSM823256, GSM796524,GSM796526, GSM796527 GSM796525, GSM796538, GSM796539 GSM796544, GSM796545, GSM822970 GSM822970 GSM823191 GSM823191 Respitoryvirus syncytial Influenza A H3N2 GSM796536, GSM796537Respitoryvirus GSM796540, syncytial GSM796541, Nipah virusInfluenza A H5N1 GSM813064, GSM813066 GSM822968, GSM813065, GSM813067 GSM822969, Influenza A H5N1 GSM822968, GSM822969, SARS coronavirus GSM823189, GSM823190, SARS coronavirus GSM823189, GSM823190, airway epithelialfluenza or RSV cells infection [hAECs- Agilent]. 
to profile 2 In- airway epithelialfluenza or RSV cells infection [hAECs- Illumina] to profile 1 In- airway epithelialfluenza or RSV cells infection [hAECs- Illumina] to profile 2 In- tome signatureinfection of in Nipah primarycells profile virus 1 endothelial infection of C57Bl/6 mouse model - Data from 3 viraland doses 7 at days 1, post 2, infection 4 profile 1 infection of C57Bl/6 mouse model - Data from 3 viraland doses 7 at days 1, post 2, infection 4 profile 2 icSARS deltaORF6 infectionsthe of 2B4 clonal derivative of3 Calu- cells - Time course profile 1 icSARS deltaORF6 infectionsthe of 2B4 clonal derivative of3 Calu- cells - Time course profile 2 Table E.1 (continued) GEO IDGSE32138 Platform GPL6480 The Title response of human primary Infection Species Control Samples Infection Samples GSE32139 GPL6947 The response of human primary GSE32139 GPL6947 The response of human primary GSE32902 GPL2895 Analysis of the early transcrip- GSE33263 GPL4134 IM001: Influenza A/VN/1203/04 GSE33263 GPL4134 IM001: Influenza A/VN/1203/04 GSE33267 GPL4133 SCL005: icSARS CoV Urbani or GSE33267 GPL4133 SCL005: icSARS CoV Urbani or 229 GSM824946,GSM824948, GSM824949 GSM824947, GSM824964,GSM824966, GSM824967 GSM824965, GSM838261,GSM838265, GSM838263, GSM838269, GSM838271 GSM838267, GSM838261,GSM838265, GSM838263, GSM838269, GSM838271 GSM838267, GSM824942, GSM824943 GSM824942, GSM824943 GSM838260,GSM838264, GSM838262, GSM838268, GSM838270 GSM838266, GSM838260,GSM838264, GSM838262, GSM838268, GSM838270 GSM838266, Influenza A H1N1 GSM824940, GSM824941, Influenza A H5N1 GSM824940, GSM824941, Human immunodefi- ciency virus Human immunodefi- ciency virus mune response defines pathology and death ininfected nonhuman by primates highly pathogenicfluenza virus. in- profile 1 mune response defines pathology and death ininfected nonhuman by primates highly pathogenicfluenza virus. 
in- profile 2 miRNA and mRNA in Primary Pe- ripheral Blood Mononuclear Cells Infected with Humaneficiency Immunod- Virus (HIV-1) [mRNA] profile 1 miRNA and mRNA in Primary Pe- ripheral Blood Mononuclear Cells Infected with Humaneficiency Immunod- Virus (HIV-1) [mRNA] profile 2 Table E.1 (continued) GEO IDGSE33351 Platform GPL9861 Early Title and sustained innate im- Infection Species Control Samples Infection Samples GSE33351 GPL9861 Early and sustained innate im- GSE33877 GPL6947 Comparative Expression profile of GSE33877 GPL6947 Comparative Expression profile of 230 GSM844147,GSM844149, GSM844148, GSM844151, GSM844150, GSM844153, GSM844152, GSM844155, GSM844154, GSM844160, GSM844156, GSM844179, GSM844161, GSM844191, GSM844186, GSM844218, GSM844217, GSM844221, GSM844219, GSM844224, GSM844223, GSM844226, GSM844225, GSM844228, GSM844227, GSM844232, GSM844233 GSM844231, GSM844139,GSM844143, GSM844141, GSM844194, GSM844144, GSM844197, GSM844196, GSM844199, GSM844201 GSM844198, Influenzavirus A GSM844136, GSM844137, in patients with acutefluenza RSV infection or profile 1 In- Table E.1 (continued) GEO IDGSE34205 Platform GPL570 Title Transcriptional profile of PBMCs Infection Species Control Samples Infection Samples 231 GSM844133,GSM844135, GSM844134, GSM844146, GSM844145, GSM844158, GSM844157, GSM844162, GSM844159, GSM844164, GSM844163, GSM844166, GSM844165, GSM844168, GSM844167, GSM844172, GSM844169, GSM844174, GSM844173, GSM844176, GSM844175, GSM844178, GSM844177, GSM844181, GSM844180, GSM844183, GSM844182, GSM844185, GSM844184, GSM844188, GSM844187, GSM844190, GSM844189, GSM844193, GSM844192, GSM844206, GSM844202, GSM844208, GSM844207, GSM844210, GSM844209, GSM844212, GSM844211, GSM844214, GSM844213, GSM844216, GSM844215, GSM844222, GSM844220, GSM844230 GSM844229, GSM864735,GSM864737 GSM864736, GSM844138,GSM844142, GSM844140, GSM844171, GSM844170, GSM844200, GSM844195, GSM844204, GSM844205 GSM844203, GSM864728 Respitoryvirus syncytial Influenza A H1N1 GSM864726, 
GSM864727, in patients with acutefluenza RSV infection or profile 2 In- sion profile sA549 of cells hPAF1 deficient duringH1N1 infection influenza with A virusular or stomatitis vesic- virus (VSV) profile 1 Table E.1 (continued) GEO IDGSE34205 Platform GPL570 Title Transcriptional profile of PBMCs Infection Species Control Samples Infection Samples GSE35266 GPL10558 Analysis of global gene expres- 232 GSM864744,GSM864746 GSM864745, GSM867331,GSM867333, GSM867334 GSM867332, GSM877620,GSM877622 GSM877621, GSM877823,GSM877829, GSM877826, GSM877845 GSM877843, GSM877907,GSM877955, GSM877956 GSM877908, GSM877885,GSM877932, GSM877933 GSM877886, GSM864726,GSM864728 GSM864727, GSM867325,GSM867327, GSM867326, GSM867329, GSM867330 GSM867328, GSM877631 GSM877820, GSM877824 GSM877947, GSM877948 GSM877924, GSM877925 VesicularIndiana stomatitis virus Human immunodefi- ciency virus Influenza A H5N1 GSM877629, GSM877630, Influenza A H5N1 GSM877810, GSM877815, Human herpesvirus 1 GSM877899, GSM877900, Human herpesvirus 1 GSM877877, GSM877878, sion profile sA549 of cells hPAF1 deficient duringH1N1 infection influenza with A virusular or stomatitis vesic- virus (VSV) profile 2 the CNS duringimmunodeficiency virus chronic infection simian profile 1 with Mx1-mediated resistancehighly to pathogenic influenza virus infection: mechanisms of survival profile 1 PB1-F2 proteinvirus of increases influenza virulencehibiting A by the in- earlysponse in vivo interferon profile 1 re- nervous system anding liver follow- herpes simplexinfection. virus profile corneal 1 nervous system anding liver follow- herpes simplexinfection. 
virus profile corneal 2 Table E.1 (continued) GEO IDGSE35266 Platform GPL10558 Analysis of Title global gene expres- Infection Species Control Samples Infection Samples GSE35397 GPL8300 Host response and dysfunction in GSE35933 GPL7202 Molecular signatures associated GSE35940 GPL7202 A single N66S mutation in the GSE35943 GPL7202 Functional genomics of central GSE35943 GPL7202 Functional genomics of central 233 GSM879371,GSM879373 GSM879372, GSM888936,GSM888953 GSM888950, GSM888935,GSM888952 GSM888943, GSM888986 GSM879382 GSM888948 GSM888948 SARS coronavirus GSM879380, GSM879381, Influenza A H1N1 GSM888933, GSM888941, Influenza A H1N1 GSM888933, GSM888941, Influenza A H1N1 GSM888976, GSM888979 GSM888957, GSM888974, sponse to mouse-adaptedvirus in SARS wild type, STAT1IFNAR1 -/-, and -/- mouse geneticgrounds profile back- 1 flammatory macrophages, nuclear receptors and interferon regulatory factors in increasedpandemic virulence 2009 H1N1 of influenza A virus after host adaptation1 profile flammatory macrophages, nuclear receptors and interferon regulatory factors in increasedpandemic virulence 2009 H1N1 of influenza A virus after host adaptation2 profile flammatory macrophages, nuclear receptors and interferon regulatory factors in increasedpandemic virulence 2009 H1N1 of influenza A virus after host adaptation3 profile Table E.1 (continued) GEO IDGSE36016 Platform GPL7202 Transcriptomic Title analysis of host re- Infection Species Control Samples Infection Samples GSE36328 GPL7202 IM002, IM009 - Implication of in- GSE36328 GPL7202 IM002, IM009 - Implication of in- GSE36328 GPL7202 IM002, IM009 - Implication of in- 234 GSM888982 GSM896488,GSM896490, GSM896489, GSM896495, GSM896491, GSM896497, GSM896496, GSM896504, GSM896500, GSM896506, GSM896505, GSM896509, GSM896508, GSM896514, GSM896515 GSM896511, GSM896494,GSM896499, GSM896498, GSM896502, GSM896501, GSM896507, GSM896503, GSM896512, GSM896510, GSM896516, GSM896513, GSM896518, GSM896519 GSM896517, Influenza A H1N1 
GSM888976, GSM888979 GSM888961, GSM888973 Influenza A H1N1 GSM888976, GSM888979 GSM888962, GSM888990 Influenza A H1N1 GSM888976, GSM888979 GSM888960, GSM888978, Hepatitus E virus GSM896492, GSM896493, flammatory macrophages, nuclear receptors and interferon regulatory factors in increasedpandemic virulence 2009 H1N1 of influenza A virus after host adaptation4 profile flammatory macrophages, nuclear receptors and interferon regulatory factors in increasedpandemic virulence 2009 H1N1 of influenza A virus after host adaptation5 profile flammatory macrophages, nuclear receptors and interferon regulatory factors in increasedpandemic virulence 2009 H1N1 of influenza A virus after host adaptation6 profile program ishepatitis induced E by infectiontransplant chronic in kidney- 1 recipients profile Table E.1 (continued) GEO IDGSE36328 Platform GPL7202 IM002, Title IM009 - Implication of in- GSE36328 GPL7202 IM002, IM009 - Implication of in- Infection SpeciesGSE36328 GPL7202 Control Samples IM002, IM009 - Implication of in- Infection Samples GSE36539 GPL6480 Interferon-related transcriptional 235 GSM896840,GSM896842, GSM896841, GSM896844, GSM896845 GSM896843, GSM907609,GSM907614 GSM907612, GSM907628,GSM907633, GSM907634 GSM907632, GSM907660,GSM907663 GSM907662, GSM921812 GSM921825, GSM921826 GSM921839, GSM921840 GSM921856 GSM921934,GSM921936 GSM921935, GSM921909 GSM896820,GSM896822 GSM896821, GSM907684 GSM907684 GSM907684 Influenza A H1N1 GSM896818, GSM896819, SARS coronavirus GSM907681, GSM907683, SARS coronavirus GSM907681, GSM907683, SARS coronavirus GSM907681, GSM907683, Influenza A H1N1 GSM921869, GSM921870 GSM921810, GSM921811, Influenza A H1N1Influenza A H1N1 GSM921869, GSM921870Influenza A H1N1 GSM921869, GSM921870 GSM921823,Influenza A GSM921824, H1N1 GSM921869, GSM921870 GSM921837, GSM921838, GSM921907, GSM921854, GSM921908, GSM921855, tion with H1N1 influenza(A/Mexico/InDRE4487/H1N1/2009) A virus profile 1 sponse to SARSvariants mouse MA-15, adapted and 
MA-15epsilon, MA-15-gamma in youngaged and mice profile 1 sponse to SARSvariants mouse MA-15, adapted and MA-15epsilon, MA-15-gamma in youngaged and mice profile 2 sponse to SARSvariants mouse MA-15, adapted and MA-15epsilon, MA-15-gamma in youngaged and mice profile 3 Time Response profile 1 Time Response profile 2 Time Response profile 3 Time Response profile 4 sponse inwith Calu-3 cell A/CA/04/2009virus Infection profile 1 Influenza Table E.1 (continued) GEO IDGSE36553 Platform GPL6947 mRNA Title profiling during infec- Infection Species Control Samples Infection Samples GSE36969 GPL7202 Global gene expression in re- GSE36969 GPL7202 Global gene expression in re- GSE36969 GPL7202 Global gene expression in re- GSE37569 GPL7202 CA04M001 - Virus Dose and GSE37569 GPL7202 CA04M001GSE37569 - GPL7202 Virus Dose CA04M001 and GSE37569 - GPL7202 Virus Dose CA04M001 and GSE37571 - GPL6480 Virus Dose Host and Regulatory Network Re- 236 GSM921974,GSM921976, GSM921975, GSM921978 GSM921977, GSM925914,GSM925916 GSM925915, GSM928565,GSM928567 GSM928566, GSM928595,GSM928597 GSM928596, GSM921958 GSM925908, GSM925909 GSM928540 GSM928540 Influenza A H5N1 GSM921956, GSM921957, Hepatitis C virus GSM925906, GSM925907, SARS coronavirus GSM928538, GSM928539, SARS coronavirus GSM928538, GSM928539, with HAin avirulentRG1/2004(H5N1) profile 1 mutation A/Vietnam/1203-CIP048- cytes with HCV infection profile 1 icSARS SRBD (spike receptor binding domain from the wild type strain Urbani to allow for infection of human and non-human primate cells) infections of the 2B4derivative clonal of Calu-3 cellscourse profile - 1 Time icSARS Bat SRBD (spike receptor binding domain from the wild type strain Urbani to allow for infection of human and non-human primate cells) infections of the 2B4derivative clonal of Calu-3 cellscourse profile - 2 Time Table E.1 (continued) GEO IDGSE37572 Platform GPL7202 IM004 Title - mouse infection Infection Species Control Samples Infection Samples GSE37715 GPL570 
Expression data in human hepato- GSE37827 GPL6480 SCL006,icSARS CoV Urbani or GSE37827 GPL6480 SCL006,icSARS CoV Urbani or 237 GSM937330,GSM937334, GSM937332, GSM937353, GSM937346, GSM937364, GSM937357, GSM937370, GSM937368, GSM937384, GSM937377, GSM937390, GSM937385, GSM937402, GSM937393, GSM937408, GSM937404, GSM937418, GSM937413, GSM937433, GSM937431, GSM937438 GSM937437, GSM937333,GSM937350, GSM937342, GSM937355, GSM937354, GSM937362, GSM937358, GSM937372, GSM937366, GSM937380, GSM937373, GSM937382, GSM937381, GSM937392, GSM937387, GSM937401, GSM937395, GSM937407, GSM937405, GSM937420, GSM937410, GSM937425, GSM937421, GSM937429, GSM937427, GSM937434, GSM937432, GSM937436, GSM937441 GSM937435, GSM937369,GSM937389, GSM937375, GSM937414, GSM937440 GSM937391, GSM937369,GSM937389, GSM937375, GSM937414, GSM937440 GSM937391, West Nile virus GSM937340, GSM937343, West Nile virus GSM937340, GSM937343, dengue infectionblood in mononuclearNicaraguan peripheral children profile 1 cells of dengue infectionblood in mononuclearNicaraguan peripheral children profile 2 cells of Table E.1 (continued) GEO IDGSE38246 Platform GPL15615 Transcriptional Title response to Infection Species Control Samples Infection Samples GSE38246 GPL15615 Transcriptional response to 238 GSM937335,GSM937339, GSM937338, GSM937351, GSM937347, GSM937360, GSM937352, GSM937378, GSM937363, GSM937386, GSM937383, GSM937399, GSM937396, GSM937416, GSM937400, GSM937423, GSM937419, GSM937439, GSM937442 GSM937428, GSM937331,GSM937337, GSM937336, GSM937344, GSM937341, GSM937348, GSM937345, GSM937356, GSM937349, GSM937361, GSM937359, GSM937367, GSM937365, GSM937374, GSM937371, GSM937379, GSM937376, GSM937394, GSM937388, GSM937398, GSM937397, GSM937406, GSM937403, GSM937411, GSM937409, GSM937415, GSM937412, GSM937422, GSM937417, GSM937426, GSM937430 GSM937424, GSM945912,GSM945914, GSM945915 GSM945913, GSM937369,GSM937389, GSM937375, GSM937414, GSM937440 GSM937391, GSM937369,GSM937389, GSM937375, GSM937414, GSM937440 
GSM937391, West Nile virus GSM937340, GSM937343, West Nile virus GSM937340, GSM937343, Hepatitis C virus GSM945916, GSM945917 GSM945910, GSM945911, Hepatitis C virus GSM948651, GSM948652 GSM948657, GSM948658 dengue infectionblood in mononuclearNicaraguan peripheral children profile 3 cells of dengue infectionblood in mononuclearNicaraguan peripheral children profile 4 cells of acute hepatitis C patients profile 1 fection of Huh7 cells profile 1 Table E.1 (continued) GEO IDGSE38246 Platform GPL15615 Transcriptional Title response to Infection Species Control Samples Infection Samples GSE38246 GPL15615 Transcriptional response to GSE38597 GPL570 Gene expression profiling of 6 GSE38720 GPL571 Time series data of HCV (JC1) in- 239 GSM949622,GSM949628, GSM949626, GSM949632, GSM949635 GSM949631, GSM951675,GSM951677, GSM951676, GSM951679, GSM951678, GSM951681, GSM951680, GSM951683, GSM951682, GSM951685, GSM951684, GSM951687, GSM951686, GSM951689, GSM951688, GSM951691, GSM951690, GSM951693, GSM951692, GSM951695, GSM951694, GSM951697, GSM951696, GSM951699, GSM951698, GSM951701, GSM951700, GSM951703, GSM951702, GSM951705-GSM951781 GSM951704, GSM949623,GSM949625, GSM949624, GSM949629, GSM949627, GSM949633, GSM949630, GSM949636, GSM949637 GSM949634, GSM951642,GSM951644, GSM951643, GSM951646, GSM951645, GSM951648, GSM951647, GSM951650, GSM951649, GSM951652, GSM951651, GSM951654, GSM951653, GSM951656, GSM951655, GSM951658, GSM951657, GSM951660, GSM951659, GSM951662, GSM951661, GSM951664, GSM951663, GSM951666, GSM951665, GSM951669, GSM951668, GSM951674 GSM951672, Human immunodefi- ciency virus Rhinovirus A GSM951640, GSM951641, Monkeys duringprofile 1 SIV infection blood transcriptional responseRespiratory to Syncytial Virus,fluenza and In- Rhinovirus lower res- piratory tract infectionchildren (LRTI) profile in 1 Table E.1 (continued) GEO IDGSE38795 Platform GPL10558 TFH Title cell dynamics in Rhesus Infection Species Control Samples Infection Samples GSE38900 GPL6884 
Genome-wide analysis of whole 240 GSM951782,GSM951784, GSM951783, GSM951786, GSM951785, GSM951788, GSM951787, GSM951790, GSM951789, GSM951792, GSM951791, GSM951794, GSM951793, GSM951796, GSM951795, GSM951798, GSM951797, GSM951800, GSM951799, GSM951802, GSM951801, GSM951804, GSM951803, GSM951806, GSM951805, GSM951808, GSM951807, GSM951810, GSM951811 GSM951809, GSM951812,GSM951814, GSM951813, GSM951816, GSM951815, GSM951818, GSM951817, GSM951820, GSM951819, GSM951822, GSM951821, GSM951824, GSM951823, GSM951826, GSM951827 GSM951825, GSM951642,GSM951644, GSM951643, GSM951646, GSM951645, GSM951648, GSM951647, GSM951650, GSM951649, GSM951652, GSM951651, GSM951654, GSM951653, GSM951656, GSM951655, GSM951658, GSM951657, GSM951660, GSM951659, GSM951662, GSM951661, GSM951664, GSM951663, GSM951666, GSM951665, GSM951669, GSM951668, GSM951674 GSM951672, GSM951640,GSM951642, GSM951641, GSM951644, GSM951643, GSM951646, GSM951645, GSM951648, GSM951647, GSM951650, GSM951649, GSM951652, GSM951651, GSM951654, GSM951653, GSM951656, GSM951655, GSM951658, GSM951657, GSM951660, GSM951659, GSM951662, GSM951661, GSM951664, GSM951663, GSM951666, GSM951665, GSM951669, GSM951668, GSM951674 GSM951672, Influenzavirus A GSM951640, GSM951641, Respitoryvirus syncytial blood transcriptional responseRespiratory to Syncytial Virus,fluenza and In- Rhinovirus lower res- piratory tract infectionchildren (LRTI) profile in 2 blood transcriptional responseRespiratory to Syncytial Virus,fluenza and In- Rhinovirus lower res- piratory tract infectionchildren (LRTI) profile in 3 Table E.1 (continued) GEO IDGSE38900 Platform GPL6884 Genome-wide Title analysis of whole Infection Species Control Samples Infection Samples GSE38900 GPL6884 Genome-wide analysis of whole 241 GSM1226245, GSM1226246, GSM1226247, GSM1226248, GSM1226249, GSM1226250, GSM1226251, GSM1226252, GSM1226253, GSM1226254, GSM1226255, GSM1226256, GSM1226257, GSM1226258, GSM1226259, GSM1226260, GSM1226261, GSM1226262, GSM1226263, GSM1226264, 
GSM1226265, GSM1226266, GSM1226267, GSM1226268, GSM1226269, GSM1226270, GSM1226271, GSM1226272 GSM1226237, GSM1226238, GSM1226239, GSM1226240, GSM1226241, GSM1226242, GSM1226243, GSM1226244 Respitoryvirus syncytial Murid herpesvirus 1 GSM971629, GSM971641Murid herpesvirus 1 GSM971631, GSM971643 GSM971633, GSM971645Murid herpesvirus 1 GSM971635, GSM971647 GSM971637, GSM971649Influenza A H1N1 GSM971639, GSM971651 GSM984816, GSM984817 GSM984815, GSM984818 blood transcriptional responseRespiratory to Syncytial Virus,fluenza and In- Rhinovirus lower res- piratory tract infectionchildren (LRTI) profile in 1 of the responsescells and of dendritic murine cellMCMV subsets infection NK profile to 1 of the responsescells and of dendritic murine cellMCMV subsets infection NK profile to 2 of the responsescells and of dendritic murine cellMCMV subsets infection NK profile to 3 ysis of acute host responses during 2009 pandemic H1N1infection in influenza mouse, macaque,swine and (macaque dataset) profile 1 Table E.1 (continued) GEO IDGSE38900 Platform GPL10558 Genome-wide analysis Title of whole Infection Species Control Samples Infection Samples GSE39555 GPL1261 Genome-wide expression study GSE39555 GPL1261 Genome-wide expression study GSE39555 GPL1261 Genome-wide expression study GSE40088 GPL14569 Comparative transcriptomic anal- 242 GSM984821,GSM984823 GSM984822, GSM990635,GSM990637 GSM990636, GSM992753,GSM992755, GSM992754, GSM992757, GSM992756, GSM992759, GSM992758, GSM992761, GSM992760, GSM992763 GSM992762, GSM992764,GSM992766, GSM992765, GSM992768, GSM992767, GSM992770, GSM992769, GSM992772, GSM992773 GSM992771, GSM984832 GSM990634 GSM992798,GSM992800, GSM992799, GSM992802, GSM992801, GSM992804, GSM992803, GSM992806, GSM992805, GSM992808, GSM992807, GSM992810, GSM992809, GSM992812, GSM992811, GSM992814, GSM992813, GSM992816, GSM992817 GSM992815, GSM992796,GSM992798, GSM992797, GSM992800, GSM992799, GSM992802, GSM992801, GSM992804, GSM992803, GSM992806, GSM992805, GSM992808, 
GSM992807, GSM992810, GSM992809, GSM992812, GSM992811, GSM992814, GSM992813, GSM992816, GSM992817 GSM992815, Influenza A H1N1 GSM984830, GSM984831, Influenza A H5N1 GSM990632, GSM990633, Human herpesvirus 6 GSM992796, GSM992797, Unidentifiedovirus aden- ysis of acute host responses during 2009 pandemic H1N1infection in influenza mouse, macaque,swine and (mouse dataset) profile 1 profile 1 nature distinguishes viral infection from bacterial infection inyoung febrile children. profile 1 nature distinguishes viral infection from bacterial infection inyoung febrile children. profile 2 Table E.1 (continued) GEO IDGSE40091 Platform GPL7202 Comparative Title transcriptomic anal- Infection Species Control Samples Infection Samples GSE40281 GPL570 Signaling pathways of HPAIV GSE40396 GPL10558 Whole blood transcriptional sig- GSE40396 GPL10558 Whole blood transcriptional sig- 243 GSM992774,GSM992776, GSM992775, GSM992778, GSM992779 GSM992777, GSM992780,GSM992782, GSM992781, GSM992784, GSM992783, GSM992786, GSM992787 GSM992785, GSM1001729, GSM1001730, GSM1001731 GSM992798,GSM992800, GSM992799, GSM992802, GSM992801, GSM992804, GSM992803, GSM992806, GSM992805, GSM992808, GSM992807, GSM992810, GSM992809, GSM992812, GSM992811, GSM992814, GSM992813, GSM992816, GSM992817 GSM992815, GSM992798,GSM992800, GSM992799, GSM992802, GSM992801, GSM992804, GSM992803, GSM992806, GSM992805, GSM992808, GSM992807, GSM992810, GSM992809, GSM992812, GSM992811, GSM992814, GSM992813, GSM992816, GSM992817 GSM992815, GSM1001728 Rhinovirus A GSM992796, GSM992797, Enterovirus A GSM992796, GSM992797, Influenza A H5N1 GSM1001726, GSM1001727, nature distinguishes viral infection from bacterial infection inyoung febrile children. profile 3 nature distinguishes viral infection from bacterial infection inyoung febrile children. 
profile 4 A/Vietnam/1203/04 H5N1fluenza virus of in- IDO1 C57BL/6J mice, KO miceKO mice profile or 1 TNFRSF1B Table E.1 (continued) GEO IDGSE40396 Platform GPL10558 Whole blood Title transcriptional sig- Infection Species Control Samples Infection Samples GSE40396 GPL10558 Whole blood transcriptional sig- GSE40792 GPL7202 IM010 - Infection with 244 GSM1002368, GSM1002369, GSM1002375, GSM1002378, GSM1002379, GSM1002390, GSM1002394, GSM1002395, GSM1002402, GSM1002403, GSM1002407, GSM1002410, GSM1002415, GSM1002418, GSM1002419, GSM1002430, GSM1002431, GSM1002439, GSM1002442, GSM1002443 GSM1002567, GSM1002568, GSM1002569 GSM1003154, GSM1003155, GSM1003156 GSM1003196, GSM1003197, GSM1003198 GSM1023562, GSM1023566, GSM1023572, GSM1023575, GSM1023584 GSM1031689, GSM1031690, GSM1031691, GSM1031692 GSM1002374, GSM1002382, GSM1002383, GSM1002386, GSM1002387, GSM1002391, GSM1002398, GSM1002399, GSM1002406, GSM1002411, GSM1002414, GSM1002422, GSM1002423, GSM1002426, GSM1002427, GSM1002434, GSM1002435, GSM1002438 GSM1002575 GSM1003171 GSM1003171 GSM1023561, GSM1023563, GSM1023565, GSM1023568, GSM1023571, GSM1023574, GSM1023577, GSM1023579, GSM1023583, GSM1023585, GSM1023588 GSM1031685, GSM1031686, GSM1031687, GSM1031688 Hepatitis C virus GSM1002366, GSM1002372, SARS coronavirus GSM1002573, GSM1002574, SARS coronavirus GSM1002608, GSM1002609Influenza GSM1002603, A GSM1002604 H1N1 GSM1003169, GSM1003170, Influenza A H1N1 GSM1003169, GSM1003170, Lassanavirus mammare- Human immunodefi- ciency virus responses from macrophagespatients chronically of infected with Hepatitis C virus profile 1 MA15 of C57BL/6J mice andfrsf1b Tn- knockout mice profile 1 MA15 ofKEPI C57BL/6J (ppp1r14c) knockouts micefile pro- 1 and cells with H1N1A/Netherlands/602/2009 profile influenza 1 virus cells with H1N1A/Netherlands/602/2009 profile influenza 2 virus Human PrimatesLassa Infected Virus to Understand With the Im- mune Response to Lassa Infection profile 1 mDCs in HIV infection profile 1 Table E.1 
(continued) GEO IDGSE40812 Platform GPL10558 Impaired TLR3-mediated immune Title Infection Species Control Samples Infection Samples GSE40824 GPL7202 SM019 - Infection with SARS GSE40827 GPL7202 SM020 - Infection with SARS GSE40844 GPL6480 ICL010 - Infection of Calu-3 GSE40844 GPL6480 ICL010 - Infection of Calu-3 GSE41752 GPL4133 Transcriptional Profiling in Non- GSE42058 GPL570 Expression data from CD11c+ 245 GSM1047012, GSM1047017, GSM1047022, GSM1047027, GSM1047052, GSM1047057, GSM1047077 GSM1047013, GSM1047018, GSM1047023, GSM1047028, GSM1047053, GSM1047058, GSM1047078 GSM1047014, GSM1047019, GSM1047024, GSM1047029, GSM1047054, GSM1047059, GSM1047079 GSM1047015, GSM1047020, GSM1047025, GSM1047030, GSM1047055, GSM1047060, GSM1047080 GSM1047159, GSM1047160, GSM1047161, GSM1047162, GSM1047163, GSM1047184, GSM1047185, GSM1047186, GSM1047187, GSM1047188, GSM1047219, GSM1047220, GSM1047221, GSM1047222, GSM1047223 GSM1047164, GSM1047165, GSM1047166, GSM1047167, GSM1047168, GSM1047199, GSM1047200, GSM1047201, GSM1047202, GSM1047203, GSM1047214, GSM1047215, GSM1047216, GSM1047217, GSM1047218 GSM1047021, GSM1047026, GSM1047051, GSM1047056, GSM1047076 GSM1047021, GSM1047026, GSM1047051, GSM1047056, GSM1047076 GSM1047021, GSM1047026, GSM1047051, GSM1047056, GSM1047076 GSM1047021, GSM1047026, GSM1047051, GSM1047056, GSM1047076 GSM1047151, GSM1047152, GSM1047153, GSM1047154, GSM1047155, GSM1047156, GSM1047157, GSM1047158, GSM1047204, GSM1047205, GSM1047206, GSM1047207, GSM1047208 GSM1047151, GSM1047152, GSM1047153, GSM1047154, GSM1047155, GSM1047156, GSM1047157, GSM1047158, GSM1047204, GSM1047205, GSM1047206, GSM1047207, GSM1047208 Influenza A H1N1 GSM1047011, GSM1047016, Influenza A H1N1 GSM1047011, GSM1047016, Influenza A H1N1 GSM1047011, GSM1047016, Influenza A H1N1 GSM1047011, GSM1047016, Influenza A H1N1 GSM1047149, GSM1047150, Influenza A H1N1 GSM1047149, GSM1047150, cesses that distinguish lethal from non-lethal influenza infection pro- file 1 cesses that distinguish 
lethal from non-lethal influenza infection pro- file 2 cesses that distinguish lethal from non-lethal influenza infection pro- file 3 cesses that distinguish lethal from non-lethal influenza infection pro- file 4 cell types duringlethal lethal influenza and infection profile non- 1 cell types duringlethal lethal influenza and infection profile non- 2 Table E.1 (continued) GEO IDGSE42638 Platform GPL6887 Identification Title of biological pro- Infection Species Control Samples Infection Samples GSE42638 GPL6887 Identification of biological pro- GSE42638 GPL6887 Identification of biological pro- GSE42638 GPL6887 Identification of biological pro- GSE42639 GPL6887 Transcriptomic comparison of 5 GSE42639 GPL6887 Transcriptomic comparison of 5 246 GSM1047174, GSM1047175, GSM1047176, GSM1047177, GSM1047178, GSM1047194, GSM1047195, GSM1047196, GSM1047197, GSM1047198, GSM1047209, GSM1047210, GSM1047211, GSM1047212, GSM1047213 GSM1047169, GSM1047170, GSM1047171, GSM1047172, GSM1047173, GSM1047179, GSM1047180, GSM1047181, GSM1047182, GSM1047183, GSM1047189, GSM1047190, GSM1047191, GSM1047192, GSM1047193 GSM1058203, GSM1058236, GSM1058243 GSM1047151, GSM1047152, GSM1047153, GSM1047154, GSM1047155, GSM1047156, GSM1047157, GSM1047158, GSM1047204, GSM1047205, GSM1047206, GSM1047207, GSM1047208 GSM1047151, GSM1047152, GSM1047153, GSM1047154, GSM1047155, GSM1047156, GSM1047157, GSM1047158, GSM1047204, GSM1047205, GSM1047206, GSM1047207, GSM1047208 GSM1058227 Influenza A H1N1 GSM1047149, GSM1047150, Influenza A H1N1 GSM1047149, GSM1047150, Influenza A H5N1 GSM1058200, GSM1058202, cell types duringlethal lethal influenza and infection profile non- 3 cell types duringlethal lethal influenza and infection profile non- 4 human Calu-3 cellswith to infection RG3/2004 A/Vietnam/1203-CIP048- deletion), (H5N1)CIP048-RG3/2004 (PB1-F2 (PB2-627E A/Vietnam/1203- mutant)A/Vietnam/1203/2004 (H5N1) orprofile 1 (H5N1) WT: Table E.1 (continued) GEO IDGSE42639 Platform GPL6887 Transcriptomic Title 
comparison of 5 Infection Species Control Samples Infection Samples GSE42639 GPL6887 Transcriptomic comparison of 5 GSE43203 GPL6480 ICL0011 - Host response in 247 GSM1058192, GSM1058235, GSM1058244 GSM1058209, GSM1058230, GSM1058240 GSM1058214, GSM1058218, GSM1058233 GSM1058267, GSM1058270, GSM1058275 GSM1058227 GSM1058227 GSM1058227 GSM1058283 Influenza A H5N1 GSM1058200, GSM1058202, Influenza A H5N1 GSM1058200, GSM1058202, Influenza A H5N1 GSM1058200, GSM1058202, Influenza A H5N1 GSM1058256, GSM1058266, human Calu-3 cellswith to infection RG3/2004 A/Vietnam/1203-CIP048- deletion), (H5N1)CIP048-RG3/2004 (PB1-F2 (PB2-627E A/Vietnam/1203- mutant)A/Vietnam/1203/2004 (H5N1) orprofile 2 (H5N1) WT: human Calu-3 cellswith to infection RG3/2004 A/Vietnam/1203-CIP048- deletion), (H5N1)CIP048-RG3/2004 (PB1-F2 (PB2-627E A/Vietnam/1203- mutant)A/Vietnam/1203/2004 (H5N1) orprofile 3 (H5N1) WT: human Calu-3 cellswith to infection RG3/2004 A/Vietnam/1203-CIP048- deletion), (H5N1)CIP048-RG3/2004 (PB1-F2 (PB2-627E A/Vietnam/1203- mutant)A/Vietnam/1203/2004 (H5N1) orprofile 4 (H5N1) WT: man Calu-3 cells to infectionNS1trunc124: with A/Vietnam/1203- CIP048-RG4/2004 (H5N1)WT: or (H5N1) profile 1 A/Vietnam/1203/2004 Table E.1 (continued) GEO IDGSE43203 Platform GPL6480 ICL0011 Title - Host response in Infection Species Control Samples Infection Samples GSE43203 GPL6480 ICL0011 - Host response in GSE43203 GPL6480 ICL0011 - Host response in GSE43204 GPL6480 ICL0012 - Host response in hu- 248 GSM1058250, GSM1058252, GSM1058277 GSM1060258, GSM1060263, GSM1060269, GSM1060288 GSM1060299, GSM1060308, GSM1060310 GSM1071345, GSM1071346, GSM1071347, GSM1071348 GSM1080842, GSM1080837 GSM1045807, GSM1045808 GSM1064972, GSM1064973 GSM1058283 GSM1060278, GSM1060285 GSM1060304, GSM1060316 GSM1071339, GSM1071340 GSM1080828, GSM1080834 GSM1039741, GSM1039742 GSM1039743, GSM1039744 GSM1045805, GSM1045806, GSM1064970, GSM1064971, GSM1064974, GSM1064975 GSM1045805, GSM1045806, GSM1064970, GSM1064971, 
GSM1064974, GSM1064975 Influenza A H5N1 GSM1058256, GSM1058266, Influenza A H5N1 GSM1060268, GSM1060274, Influenza A H5N1 GSM1060289, GSM1060297, JC polyomavirus GSM1071337, GSM1071338, Hepatitis C virus GSM1080825, GSM1080831, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus man Calu-3 cells to infectionNS1trunc124: with A/Vietnam/1203- CIP048-RG4/2004 (H5N1)WT: or (H5N1) profile 2 A/Vietnam/1203/2004 PB2-627E:CIP048-RG2/2004 (H5N1) profile A/Vietnam/1203- 1 PB1-F2del:CIP048-RG3/2004 A/Vietnam/1203- (H5N1)10ˆ3 PFU profile 1 at multipotential neuralcells progenitor to astrocytes revealstibility suscep- factors for JC Virus1 profile mRNA translation during hepatitis C virus infection profile 1 ronal dysfunction through disrup- tion of microRNAs. profile 1 ronal dysfunction through disrup- tion of microRNAs. profile 1 ronal dysfunction through disrup- tion of microRNAs. profile 2 Table E.1 (continued) GEO IDGSE43204 Platform GPL6480 ICL0012 Title - Host response in hu- Infection Species Control Samples Infection Samples GSE43301 GPL7202 IM005 - Mouse infection with GSE43302 GPL7202 IM006A - Mouse infection with GSE43794 GPL6244 Differentiation of human fetal GSE44210 GPL6480 Genome-wide analysis of host GSE44265 GPL6244 HIV-1 Tat protein promotes neu- GSE44265 GPL11532 HIV-1 Tat protein promotes neu- GSE44265 GPL11532 HIV-1 Tat protein promotes neu- 249 GSM1064976, GSM1064977 GSM1085378, GSM1085383, GSM1085386, GSM1085397 GSM1085559, GSM1085568, GSM1085575, GSM1085577, GSM1085579 GSM1085540, GSM1085550, GSM1085557, GSM1085570, GSM1085578, GSM1085585, GSM1085586 GSM1085719, GSM1085721, GSM1085723, GSM1085725, GSM1085727, GSM1085729 GSM1086231, GSM1086235, GSM1086246, GSM1086251 GSM1089169, GSM1089170, GSM1089171 GSM1089172, GSM1089173, GSM1089174 GSM1045805, GSM1045806, GSM1064970, GSM1064971, GSM1064974, GSM1064975 GSM1039741, GSM1039742 GSM1039745, GSM1039746 GSM1085396 GSM1085571 GSM1085571 GSM1085718, GSM1085720, 
GSM1085722, GSM1085724, GSM1085726, GSM1085728 GSM1089165 GSM1089168 Human immunodefi- ciency virus Human immunodefi- ciency virus Influenza A H5N1 GSM1085373, GSM1085385, Influenza A H5N1 GSM1085545, GSM1085555, Influenza A H5N1 GSM1085545, GSM1085555, Human immunodefi- ciency virus SARS coronavirus GSM1086238, GSM1086254Coxsackievirus GSM1086223, GSM1086228, GSM1089163, GSM1089164, Coxsackievirus GSM1089166, GSM1089167, ronal dysfunction through disrup- tion of microRNAs. profile 3 HIV-1 Vpr protein leads to thevelopment de- of neurocognitive dis- orders. profile 1 withRG3/2004 (H5N1) A/Vietnam/1203-CIP048- profile 1 withRG4/2004 (H5N1) A/Vietnam/1203-CIP048- profile 1 withRG4/2004 (H5N1) A/Vietnam/1203-CIP048- profile 2 HIV-Tat protein isVascular mediated Endothelial Growth via Fac- tor Receptor-2 profile 1 EMC respiratory infection profile 1 tion of 129S1tissue and day 6) 129X1 profile 1 (heart tion of 129S1tissue and day 6) 129X1 profile 2 (heart Table E.1 (continued) GEO IDGSE44265 Platform GPL11532 HIV-1 Tat Title protein promotes neu- Infection Species Control Samples Infection Samples GSE44266 GPL6244 Deregulation of microRNAs by GSE44441 GPL7202 IM006B - Mouse infection GSE44445 GPL7202 IM007 - Mouse infection GSE44445 GPL7202 IM007 - Mouse infection GSE44460 GPL6244 Induction of IL-17+ T-cells by GSE44542 GPL14569 Rhesus macaque model of hCoV- GSE44706 GPL6887 Coxsackievirus B3 (CVB3) infec- GSE44706 GPL6887 Coxsackievirus B3 (CVB3) infec- 250 GSM1096513, GSM1096519, GSM1096528 GSM1111185, GSM1111186, GSM1111187 GSM1111191, GSM1111192, GSM1111193, GSM1111194 GSM1119621, GSM1119623, GSM1119625, GSM1119627, GSM1119629, GSM1119631, GSM1119633, GSM1119635 GSM1119637, GSM1119640, GSM1119649 GSM1122296, GSM1122297, GSM1122298 GSM1163241, GSM1163242, GSM1163243, GSM1163244 GSM1096534 GSM1111184 GSM1111190 GSM1119626, GSM1119628, GSM1119630, GSM1119632, GSM1119634, GSM1119636 GSM1119644, GSM1119647, GSM1119650, GSM1119653, GSM1119655, GSM1119657, GSM1119659 
GSM1122301 GSM1163389 SARS coronavirus GSM1096514, GSM1096531, JC polyomavirus GSM1111182, GSM1111183, JC polyomavirus GSM1111188, GSM1111189, Human herpesvirus 4 GSM1119622, GSM1119624, Human herpesvirus 4 GSM1119638, GSM1119642, Human herpesvirus 1 GSM1122299, GSM1122300, SARS coronavirus GSM1163295, GSM1163296, with novelEMC human predict potential coronavirus and antivirals importantSARS-coronavirus. differences profile 1 with Cell Lines Support Varying Levels of JC Virus Infectionferences due in to Cellular Gene Dif- Expres- sion profile 1 Cell Lines Support Varying Levels of JC Virus Infectionferences due in to Cellular Gene Dif- Expres- sion profile 2 in humanEBV experiencing infection (Ref8) primary profile 1 in humanEBV experiencing infection (HT12) primary profile 1 rived neuronallineages ascellular human models forplex Herpes Virus, type Sim- 1tions (HSV-1) profile infec- 1 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 1 Table E.1 (continued) GEO IDGSE45042 Platform GPL6480 Cell Title host-response to infection Infection Species Control Samples Infection Samples GSE45639 GPL6244 Clonal Immortalized Human Glial GSE45639 GPL6244 Clonal Immortalized Human Glial GSE45918 GPL6883 Peripheral blood gene expression GSE45919 GPL10558 Peripheral blood gene expression GSE46042 GPL10558 Induced pluripotent stem cell de- GSE47960 GPL6480 SHAE002: SARS-CoV, SARS- 251 GSM1163274, GSM1163275, GSM1163276 GSM1163400, GSM1163401 GSM1163438, GSM1163439, GSM1163440, GSM1163441 GSM1163454, GSM1163455, GSM1163456, GSM1163457 GSM1163514, GSM1163515, GSM1163516 GSM1163543, GSM1163544 GSM1163602, GSM1163603, GSM1163604 GSM1163617, GSM1163618, GSM1163619 GSM1163569, GSM1163570, GSM1163571 GSM1168999, GSM1169000, GSM1169001 GSM1163389 GSM1163491 GSM1163491 GSM1163491 GSM1163491 GSM1163652 GSM1163652 GSM1163652 GSM1163652 GSM1168995 SARS coronavirus GSM1163295, GSM1163296, SARS coronavirus GSM1163489, GSM1163490, SARS coronavirus GSM1163489, GSM1163490, SARS coronavirus 
GSM1163489, GSM1163490, SARS coronavirus GSM1163489, GSM1163490, SARS coronavirus GSM1163650, GSM1163651, SARS coronavirus GSM1163650, GSM1163651, SARS coronavirus GSM1163650, GSM1163651, SARS coronavirus GSM1163650, GSM1163651, Vaccinia virus GSM1168993, GSM1168994, dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 2 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 1 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 2 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 3 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 4 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 1 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 2 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 3 dORF6 and SARS-BatSRBDfection in- of HAE cultures. profile 4 creatic cells PANC-1 infected with oncolytic vaccinia1h153 profile virus 1 GLV- Table E.1 (continued) GEO IDGSE47960 Platform GPL6480 SHAE002: Title SARS-CoV, SARS- Infection Species Control Samples Infection Samples GSE47961 GPL6480 SHAE003: SARS-CoV, SARS- GSE47961 GPL6480 SHAE003: SARS-CoV, SARS- GSE47961 GPL6480 SHAE003: SARS-CoV, SARS- GSE47961 GPL6480 SHAE003: SARS-CoV, SARS- GSE47962 GPL6480 SHAE004: SARS-CoV, SARS- GSE47962 GPL6480 SHAE004: SARS-CoV, SARS- GSE47962 GPL6480 SHAE004: SARS-CoV, SARS- GSE47962 GPL6480 SHAE004: SARS-CoV, SARS- GSE48121 GPL571 Expression data from human pan- 252 GSM1169616, GSM1169617, GSM1169618 GSM1169640, GSM1169641, GSM1169642 GSM1169686, GSM1169687, GSM1169688 GSM1196103, GSM1196104 GSM1196129 GSM1204592, GSM1204593, GSM1204594 GSM1204595, GSM1204596, GSM1204597 GSM1169665 GSM1169665 GSM1169665 SARS coronavirus GSM1169663, GSM1169664, SARS coronavirus GSM1169663, GSM1169664, SARS coronavirus GSM1169663, GSM1169664, SARS coronavirus GSM1196116, GSM1196117SARS GSM1196101, coronavirus GSM1196102, GSM1196116, GSM1196117Human GSM1196127, immunodefi- GSM1196128, ciency virus deltaNSP16 or icSARS ExoNI in- fections of the 2B4tive 
clonal of deriva- Calu-3 cellsprofile - 1 Time course deltaNSP16 or icSARS ExoNI in- fections of the 2B4tive clonal of deriva- Calu-3 cellsprofile - 2 Time course deltaNSP16 or icSARS ExoNI in- fections of the 2B4tive clonal of deriva- Calu-3 cellsprofile - 3 Time course and SARS nsp16 mutantfections virus of in- C57BL6 mice -course A profile time 1 and SARS nsp16 mutantfections virus of in- C57BL6 mice -course A profile time 2 ofIndependent LiveRhesus NefSIV macaques Attenuatedinfection infected profile during 1 acute Rev- Table E.1 (continued) GEO IDGSE48142 Platform GPL6480 SCL008: Title icSARS CoV, icSARS- Infection Species Control Samples Infection Samples GSE48142 GPL6480 SCL008: icSARS CoV, icSARS- GSE48142 GPL6480 SCL008: icSARS CoV, icSARS- GSE49263 GPL7202 SM014 - SARS MA15 wild type, GSE49263 GPL7202 SM014 - SARS MA15 wild type, GSE49663 GPL3535 Gene expression in the blood 253 GSM1208076, GSM1208077, GSM1208078, GSM1208079 GSM1208108, GSM1208109, GSM1208110, GSM1208111 GSM1208124, GSM1208125, GSM1208126, GSM1208127 GSM1208139, GSM1208140, GSM1208141, GSM1208142 GSM1231651, GSM1231652, GSM1231653 GSM1208094, GSM1208095 GSM1208094, GSM1208095 GSM1208094, GSM1208095 GSM1208094, GSM1208095 GSM1231661 Influenza A H7N9 GSM1208092, GSM1208093, Influenza A H7N9 GSM1208092, GSM1208093, Influenza A H7N9 GSM1208092, GSM1208093, Influenza A H7N9 GSM1208092, GSM1208093, Chikungunya virus GSM1211129, GSM1211130SARS coronavirus GSM1211133, GSM1211134 GSM1231659, GSM1231660, the novel avian-origin influenza A (H7N9) virus:termediate host-response specific between andavian in- (H5N1 and H7N7)man and (H3N2) viruses. hu- profile 1 the novel avian-origin influenza A (H7N9) virus:termediate host-response specific between andavian in- (H5N1 and H7N7)man and (H3N2) viruses. hu- profile 2 the novel avian-origin influenza A (H7N9) virus:termediate host-response specific between andavian in- (H5N1 and H7N7)man and (H3N2) viruses. 
hu- profile 3 the novel avian-origin influenza A (H7N9) virus:termediate host-response specific between andavian in- (H5N1 and H7N7)man and (H3N2) viruses. hu- profile 4 HEK293T cellsChikungunya virus profile infected 1 with MA15 ofCXCR3 C57BL6/J knockouts mice instrain of and mice. the profile 1 same Table E.1 (continued) GEO IDGSE49840 Platform GPL17077 Transcriptomic characterization of Title Infection Species Control Samples Infection Samples GSE49840 GPL17077 Transcriptomic characterization of GSE49840 GPL17077 Transcriptomic characterization of GSE49840 GPL17077 Transcriptomic characterization of GSE49985 GPL15207 Gene expression profile in GSE50878 GPL7202 SM007 - Infection with SARS 254 GSM1245695, GSM1245696, GSM1245697, GSM1245698, GSM1245699 GSM1246908, GSM1246917 GSM1305811, GSM1305812, GSM1305813, GSM1305814 GSM1305815, GSM1305816, GSM1305817, GSM1305818 GSM1320802, GSM1320803, GSM1320804, GSM1320805, GSM1320837, GSM1320838, GSM1320839, GSM1320840, GSM1320876, GSM1320877, GSM1320878, GSM1320879, GSM1320888, GSM1320889, GSM1320890, GSM1320891, GSM1320940, GSM1320941, GSM1320942, GSM1320943, GSM1320952, GSM1320953, GSM1320954, GSM1320955, GSM1320964, GSM1320965, GSM1320966, GSM1320967, GSM1320976, GSM1320977, GSM1320978, GSM1320979 GSM1245692, GSM1245693, GSM1245694 GSM1245801, GSM1245802 GSM1245803, GSM1245804 GSM1246921 GSM1305809, GSM1305810 GSM1305809, GSM1305810 GSM1320800, GSM1320801, GSM1320833, GSM1320834, GSM1320835, GSM1320836, GSM1320872, GSM1320873, GSM1320874, GSM1320875, GSM1320884, GSM1320885, GSM1320886, GSM1320887, GSM1320936, GSM1320937, GSM1320938, GSM1320939, GSM1320948, GSM1320949, GSM1320950, GSM1320951, GSM1320960, GSM1320961, GSM1320962, GSM1320963, GSM1320972, GSM1320973, GSM1320974, GSM1320975 Human immunodefi- ciency virus Human immunodefi- ciency virus Influenza A H1N1 GSM1246907, GSM1246912, Alphapapillomavirus GSM1305807, GSM1305808, Alphapapillomavirus GSM1305807, GSM1305808, Hepatitis C virus GSM1320798, GSM1320799, 
macaque tongue profile 1 macaqueprofile 1 tongue epithelium C57BL6 andmice RIPK3 profile 1 knock-out gene in HPV16E2 protein express- ing human keratinocyte profile 1 gene in HPV16E2 protein express- ing human keratinocyte profile 2 dict Innate Antiviral Immunesponses Re- and Hepatitis C Virus Per- missiveness. profile 1 Table E.1 (continued) GEO IDGSE51436 Platform GPL3535 Expression Title data from rhesus Infection Species Control Samples Infection Samples GSE51438 GPL3535 Expression data from rhesus GSE51526 GPL7202 IM015 - Influenza infection of GSE54008 GPL10904 Genome-wide profiling of human GSE54008 GPL10904 Genome-wide profiling of human GSE54648 GPL10558 Interferon Lambda Alleles Pre- 255 GSM1361461, GSM1361479, GSM1361482, GSM1361483 GSM1361459, GSM1361468, GSM1361477, GSM1361493 GSM1377282, GSM1377288, GSM1377305, GSM1377320, GSM1377324, GSM1377325 GSM1377287, GSM1377306, GSM1377326 GSM1380961, GSM1380963, GSM1380965 GSM1380967, GSM1380969, GSM1380971 GSM1361480 GSM1361480 GSM1377308, GSM1377313, GSM1377315 GSM1377309, GSM1377314, GSM1377316 GSM1380960, GSM1380962, GSM1380964 GSM1380966, GSM1380968, GSM1380970 Influenza A H3N2 GSM1350020, GSM1350021 GSM1350022, GSM1350023 Influenza A H3N2 GSM1350020, GSM1350021 GSM1350024, GSM1350025 Influenza A H5N1 GSM1361455, GSM1361471, Influenza A H5N1 GSM1361455, GSM1361471, Ebola virus GSM1377281, GSM1377304, Ebola virus GSM1377283, GSM1377307, Lymphocyticomeningitis chori- marenavirus mam- Lymphocyticomeningitis chori- marenavirus mam- sponse to highly andinfluenza less virulent Afections (H3N2) at viruspost-infection profile 12, 1 in- 48 and 96sponse to highly h andinfluenza less virulent Afections (H3N2) at viruspost-infection profile 12, 2 in- 48 and 96A/Vietnam/1203/2004 h profile 1 (H5N1) A/Vietnam/1203/2004profile 2 (H5N1) experimental Ebola hemorrhagic fever pathogenesis profile 1 experimental Ebola hemorrhagic fever pathogenesis profile 2 naling directly onfollowing CD8+ LCMV T infection in cells presence 
the or absence ofprofile 1 NK cells. naling directly onfollowing CD8+ LCMV T infection in cells presence the or absence ofprofile 2 NK cells. Table E.1 (continued) GEO IDGSE55994 Platform GPL6103 Global Title murine pulmonary re- GSE55994 GPL6103 Global murine pulmonary re- GSE56433 GPL7202 IM013 Infection - Species Mouse infection with Control Samples Infection Samples GSE56433 GPL7202 IM013 - Mouse infection with GSE57214 GPL17400 Host genetic diversity enables GSE57214 GPL17400 Host genetic diversity enables GSE57355 GPL13912 The effect of type-I interferon sig- GSE57355 GPL13912 The effect of type-I interferon sig- 256 GSM1380973, GSM1380975, GSM1380977 GSM135302, GSM135303 GSM135309,GSM135312 GSM135310, GSM1466670, GSM1466672, GSM1466674 GSM1380972, GSM1380974, GSM1380976 GSM135292,GSM135294, GSM135293, GSM135296, GSM135295, GSM135298 GSM135297, GSM135292,GSM135294, GSM135293, GSM135296, GSM135295, GSM135298 GSM135297, GSM1466669, GSM1466671, GSM1466673 Lymphocyticomeningitis chori- marenavirus mam- New world arenavirus GSM135290, GSM135291, New world arenavirus GSM135290, GSM135291, Human herpesvirus 4 GSM1422231, GSM1422232 GSM1422235, GSM1422236 Human immunodefi- ciency virus naling directly onfollowing CD8+ LCMV T infection in cells presence the or absence ofprofile 3 NK cells. 
hemorrhagic fever profile 1 hemorrhagic fever profile 2 Barr virustalized infection normal of oralprofile keratinocytes immor- 1 PBMC andinfections confirmed in1-pathways by analysis vivosubject N-of- of human single- 1 transcriptome profile Table E.1 (continued) GEO IDGSE57355 Platform GPL13912 The effect of Title type-I interferon sig- Infection Species Control Samples Infection Samples GSE5790 GPL570 Primate blood signs of arenavirus GSE5790 GPL570 Primate blood signs of arenavirus GSE58914 GPL570 Expression data from Epstein- GSE60153 GPL6244 Concordance between ex vivo

Table E.2: Up Regulated Genes From the Meta-analysis. The table lists genes that are significantly upregulated for taxonomical groups and genes that are commonly upregulated in combinations of the groups. The virus taxonomical group produces the most significant genes.

Bacteria: ABCC1 ABI1 ACBD3 ACOT9 ACP2 ACSL4 ACTR1A ACVR1B ADAM17 ADAMTS1 ADAMTS4 ADARB1 ADORA2B ADPRH ADRM1 AGFG1 AGO2 AGPAT4 AHSP ALAS1 ALDH1A3 ALOX5 AMPD3 ANXA1 ANXA2 ANXA4 ANXA8L1 AOAH ARF1 ARID5B ARPC2 ARPC3 ARRDC4 ARSD ATF4 ATF5 ATP11A ATP6V0B ATP6V0C ATP6V0E1 ATP6V1D ATP6V1E1 ATP6V1G1 AZIN2 B4GALT1 BACH1 BASP1 BATF3 BCL6 BID BLVRA BRPF1 BTG2 BUD31 CA13 CAP1 CAPG CCDC12 CCDC93 CCL17 CCT3 CD14 CD177 CD38 CD40 CD44 CD52 CDC42 CDC42EP2 CDC42EP4 CDK1 CEBPB CERS6 CGGBP1 CHEK1
Eukaryotes: ACTB AGPAT4-IT1 ANGPTL2 ANKRD22 ANXA2P1 ASAP1-IT1 ATXN7L3B BRI3P1 BZW1 CASP3 CCL1 CCL23 CCL3L3 CCL4L1 CCZ1B CD69 CLEC6A CTSL EIF4A1 FAM74A4 FCGR1C FLJ45248 FTH1P11 FTH1P12 FTH1P16 FTH1P2 GNAI3 GPR132 GPR65 HGF HMGN2P46 IGLL3P IL12B IL24 KRT8P9 LAIR2 LINC00239 LOC100128317 LOC100131785 LOC100131859 LOC642590 LOC644173 LOC646347 LOC646970 LOC653071 LOC727896 LOC728602 LOC729200 LRRC25 LSS MT1P3 NCF1B NKAP NUDCD1 PAM PDCD1LG2 PGAM4 PPIH RAB24 RASSF3 RBM11 RNF149 RNU105C SDHA SEC14L1 SIGLEC14 SOWAHC STAT5A STEAP1B SUMO1P3 TLR6 TMEM251 TNFRSF11B TNFSF4 TTTY10
Viruses: AARS ABL2 ACKR1 ACR ACTL6B ACTR3BP5 ADAM30 ADAMTS9 ADAP2 ADAR AEN AFF1 AFF4 AGAP7P AGRN AIM2 AKIRIN2 ALAS2 ALDH8A1 AMER3 AMICA1 ANKFY1 ANKRD18B ANKRD20A2 APOBEC1 APOBEC3A APOBEC3B APOL2 APOL4 AQP7P3 AQP9 ARAFP2 ARC AREL1 ARHGAP30 ARPC4-TTLL3 ART3 ASAH2 ASB4 ASB5 ASCL2 ASCL3 ASPHD2 ASS1 ATP10A AVIL B4GALT3 BEAN1 BRIP1 BYSL C1QC C3AR1 C4A C4B C9 CACNA1A CALCB CARD17 CARS CASP12 CCL11 CCL18 CCL7 CCND2 CCNYL1 CCR2 CCRN4L CD300A CD53 CD70 CD80 CDCP1 CDKN2D CDRT3 CEBPG
Bacteria & Eukaryotes: AGTRAP ALDOA APOL1 BATF CD163 EBI3 EHD1 FCGR2C GNG5 GSDMD HIF1A HK2 HSP90AA1 IER3 ITPRIP LDHA MAFG MARCKS MBD3L5 NFKB2 PDE4B PKM PTGS2 PVRL2 RBBP8 SLC2A6 SLC43A3 SNX10 SRA1 TLR2 UBC
Eukaryotes & Viruses: APOBEC3G APOBEC3H CCR1 CD68 EIF2S2 ETF1 FCGR1B HIST2H2AA4 HPRT1 IDO1 LACTB LILRA3 METRNL MS4A4A MT1H NOD2 NOP10 OSCAR RAB8B RNF213 SIGLEC5 SKIL THEMIS2 TLR8 TRIB1 TYMP
All: ARID5A B2M BAZ1A CASP1 CASP4 CASP5 CCRL2 CD274 CD86 CDKN1A CHMP5 CLEC4E CMTR1 CRLF2 CSF1 CSF2RB CXCL8 GBP1 GCH1 GLRX HLA-B HSPA1B ICAM1 IL15RA IL1B IL1RN IL6 JAK2 JUNB LILRB3 MLKL MTHFD2 MYD88 N4BP1 NAMPT NFKBIA NFKBIZ OAS1 OAS2 OASL PILRA PIM1 PLAUR PMAIP1 PSMA6 RELB RHBDF2 SAMSN1 SOCS3 SOD2 SP100 SP110 SRGN STAT1 STAT3 TGM2 TIMP1 TNFAIP6 TNFRSF9 TREM1 UPP1
Bacteria & Viruses: ABTB2 ADM AFTPH AIDA AIF1 APOL6 ARF4 ARG2 ARL5B ATF3 ATP1B3 ATXN7L3 AZI2 B4GALT5 BATF2 BCL10 BCL2A1 BCL2L1 BCL3 BIRC2 BIRC3 BST1 BST2 BTG3 C1QB C1R C1S C2 C3 CAPZA2 CARD16 CASP7 CCDC59 CCL13 CCL19 CCL2 CCL20 CCL3L1 CCL5 CCL8 CCNL1 CD300LF CD47 CEACAM1 CEBPD CFB CFLAR CLEC4A CLEC4D CLEC5A CLIC4 CMPK2 CREM CSF3 CSRNP1 CSTA CXCL10 CXCL11 CXCL16 CXCL2 CXCL9 CXCR2 CYBB CYP2C18 DAPP1 DCP2 DCUN1D3 DDX58 DDX60 DHX58 DRAM1 DTX3L DUOX1 DUSP16 DUSP2 DUSP8 EGR2 EIF2AK2 ELF3 EMILIN2 ERAP1 ETV6 ETV7
Table E.2 (continued)
Bacteria: CHI3L1 CHL1 CHMP2B CHMP4B CKAP4 CLCN5 CLDN4 CLIC1 CLK3 CNDP2 CNIH4 COL1A1 COL4A1 COL4A2 COQ10B COX17 CP CREB3 CRISPLD2 CSF2 CSF2RA CSNK1A1 CST7 CSTB CTSC CTSZ CXCL1 CXCL6 CYP1B1 DACH1 DBNL DDX39A DENND3 DGAT2 DHRS12 DIAPH2 DLX2 DMXL2 DNPEP DPH3 DSCR3 DUSP1 DUSP4 DUSP5 DYNLRB1 E2F1 EAPP EDN1 EEA1 EGR1 EIF3CL EIF4E3 EIF5A EIF6 ELL ELL2 EMILIN1 ENO1 ENPP4 ENTPD1 ERH ERO1L ERRFI1 ETS2 FBN1 FBXL5 FBXO6 FCAR FCGR3A FERMT3 FGF2 FLAD1 FLOT2 FSTL1 GALNS GAPDH GAS7 GBA
Eukaryotes: TXNRD1 UGP2 VAMP3
Bacteria & Viruses: F3 FAM26F FAM49B FAS FCER1G FCGR1A FCGR2B FFAR2 FGL2 FGR FMNL2 FNDC3B FOSL1 FOSL2 FPR1 FPR2 GADD45B GBP4 GBP5 GLIPR2 GNA13 GPR84 GRINA GTPBP2 GZMB H2AFX HAX1 HBEGF HELZ2 HERC6 HK3 HLA-A HLA-G IFI27 IFI35 IFI44 IFIH1 IFIT1 IFIT1B IFIT2 IFIT3 IFITM1 IFRD1 IL15 IL18RAP IL1R2 IL36G INHBA IRF1 IRF7 IRF9 IRGM ISG15 ISG20 KDM5C KIAA1549L KIAA1551 KLF6 LCP2 LGALS3BP LGALS8 LGALS9 LILRA6 LITAF LY6E LY96 MAD2L1BP MAP3K8 MARCH5 MCEMP1 MCL1 MED27 MEFV MITD1 MMP13 MNDA MOB1A MORC3 MSR1 MUC4 MX1 MXD1 MYL12A NAA25 NCF1 NCF4 NFIL3 NFKB1 NFKBIE NMI NMRK2 NUB1 OAS3 OGFR OPTN OSMR
Viruses: CELA2A CH25H CHAC1 CHAC2 CHIT1 CHMP4A CKS1B CLCA4 CLEC12A CLEC7A CNGB3 CNP COL11A2 COL24A1 COL27A1 COL2A1 CPEB2 CPEB3 CPEB4 CREB5 CRNDE CT47A11 CTRL CX3CL1 CXCL13 CXCL3 CYR61 CYSLTR1 DCK DCLRE1C DCP1A DDA1 DDX52 DDX60L DHRS9 DIO3 DISC1 DNAH17 DNAJA1 DNM1P46 DOK3 DOT1L DPH3P1 DPYS DSE DUOXA2 DUSP6 EFCAB10 EGR4 EHD4 EIF1AY EPSTI1 ERC2-IT1 ETNK1 ETV2 ETV3 EVI2A EVI2B EWSR1 FAM122C FAM133B FAM208B FAM225A FAM46A FAM71F1 FAM90A1 FAM90A7P FAP FBXO39 FCGR2A FCGR3B FCN1 FEM1C FFAR3 FGD6 FGF23 FGF8 FKBP1A FLJ31662 FLJ42200 FLJ46365 FLOT1 FMR1
Table E.2 (continued)
Viruses: FNDC3A FOS FP6628 FXYD6 FYB GADD45A GADD45G GBP6 GCA GCG GEM GFPT2 GHSR GLDC GNB4 GOLGA6L10 GOLGA6L9 GPR141 GPR88 GPR97 GRAMD3 GRIN3A GTF2A1L GTF3C6 GYG2P1 HAS2 HAVCR2 HCG9 HERC5 HES4 HESX1 HIST1H2APS1 HIST1H2BC HIST1H2BG HIST2H2AA3 HIST2H2BC HIST2H2BE HIST2H4A HLA-E HMBS HOMER1 HORMAD1 HRASLS2 HSH2D HSPA6 HSPB9 ICAM5 IER2 IER5 IFI16 IFI27L2 IFI44L IFI6 IFIT5 IFITM2 IFITM3 IFITM4P IFNA1 IFNA16 IFNB1 IFNG IFNL2 IGLON5 IGLV- IL11 IL18BP IL18R1 IL1A IL27 IL2RA IL6ST ILDR1 ISLR ITGA4 ITPKC JPX KCND3 KCNE4 KCNT1 KEAP1 KHDC1 KIAA1033 KLF4 KLHDC7B KLK3 KRT75 KRTAP6-3 KRTAP9-1 LAG3 LAP3 LARP1 LAT2 LGALS9C LGI2 LHX1 LILRA1 LILRB2 LIMS3-LOC440895
Bacteria & Viruses: OVCH1 PADI4 PARP14 PARP3 PARP9 PCGF5 PDPN PHF11 PIK3AP1 PIM3 PLA1A PLEK PLSCR1 PML PNCK PNP PPA1 PPP1R15A PPP2R2A PPP4R2 PRSS46 PSMA4 PSMB10 PSMB8 PSMB9 PSMD10 PSME1 PSME2 PSTPIP2 PTP4A1 PTPN2 PTX3 PVR RAB10 RAP2C RGS16 RIOK3 RIPK1 RIPK2 RND1 RND3 RNF114 RNF19A RNF19B RNF31 RSAD2 RTCA RTP4 S100A11 S100A8 SAA1 SAMHD1 SAT1 SBNO2 SDC4 SEC61G SELL SERPINB2 SERPINB9 SERPINE1 SERPING1 SERTAD1 SGMS2 SHISA5 SLAMF7 SLAMF8 SLC11A1 SLC25A28 SLC31A2 SLC3A2 SLFN12 SLFN13 SLFN5 SNX6 SOCS1 SPANXN5 SPATS2L
Bacteria: GCNT2 GDA GFPT1 GJA1 GLA GMFG GNG12 GNS GOLGA6L4 GOSR2 GRAMD1B GRB2 GSAP GSR GTF2B GYPB H3F3B HCAR2 HCK HCLS1 HDC HEATR6 HIVEP1 HK1 HLA-C HMGB2 HMOX1 HNRNPKP3 HP HTATIP2 HTR7P1 HTRA1 IDI1 IFNAR2 IFNGR2 IFT57 IGSF6 IKBKE IL10 IL13RA1 IL1R1 IL2RG IL4R INO80C INSL3 INTS12 IRAK2 IRAK3 IRF2 ITGA5 ITGAL ITGAM ITGB3 ITIH4 JAK3 JDP2 JMJD6 JUN KAZN KDM6B KLF5 KLF7 KRAS KRT19P2 LAIR1 LAMP2 LBP LCN2 LCP1 LDLR LGALS1 LGALS3 LILRP2 LIMK2 LIPG LMO4 LOC728877 LOC730020
Table E.2 (continued)
Bacteria: LOX LPAR1 LPCAT2 LPP LRP12 LRRC59 LRRFIP2 LRRK1 LTB4R LTF LYN LYVE1 MAFF MAPK13 MAPKAPK2 MARCKSL1 MCM2 MCMBP MCTP1 MED10 MED7 METTL1 MFSD1 MFSD7 MKI67 MMADHC MMP14 MMP3 MMP8 MMP9 MORF4L2 MPO MRPL27 MRPL52 MRPL54 MRPS10 MS4A6A MSN MTHFR MVP MYADM MYL6 MYO10 MYOF NAB1 NDEL1 NDRG1 NEK6 NFE2L2 NFKBIB NKG7 NLRP3 NNMT NOD1 NOL3 NPC2 NRAS NRP2 NUBP1 NUP50 NUP54 NUP62 NUPR1 OGFRL1 OLFM4 OR2J2 ORMDL2 OSTF1 P2RY13 PAFAH1B2 PALM2-AKAP2 PARP11 PCDHA4 PDCD10 PDSS1 PELO PFKFB3 PFKP PFN1 PGAM1 PGD PGK1 PGLYRP1 PGS1 PHLDA1 PICALM PIWIL2 PLA2G4A PLAGL2 PLAT PLAU PLD1 PLEKHM3 PLK3 POMP PPP1R15B PRELID1 PRKCD PRR13 PSEN1 PSMA1 PSMA2 PSMA5 PSMA7 PSMB2 PSMB4 PSMB6 PSMC4 PSMC6 PSMD1 PSMD2 PSMD4 PTGES PTGES3 PTPN1 PTPN12 PXDN PYROXD1 RAB11A RAB27A RAB32 RAB5A RAC2 RAI14 RALB RAN RAPGEF5 RASSF4 RBM7 RBMS1 RHAG RHOC RHOG RHOQ RHOV RIC1 RILPL2 RLF RLIM RNASEH1 RNF217 RNU1-4 RPS27L RRAS RSL24D1 RUNX1 S100A9 SAP30 SAV1 SBDS SCYL2 SEC13 SEMA4A SEPW1 SERPINA1 SF3B6 SH3BP2 SHFM1 SIRPA SLA SLC11A2 SLC2A1 SLC2A14 SLC2A3 SLC35B1 SLC39A1 SLC6A14 SLC7A11 SLC7A2 SLC7A7 SLCO3A1 SLCO4A1 SLMO2 SLPI SMAD1 SMNDC1 SNAP29 SNORA14B SNRNP27 SNRPA1 SNX18 SPIC SPTLC2 SQLE SRP19 SRPR STEAP4 STK17B STOML1 STX12 STX18 SUMO1 SURF4 SYNJ1 TAX1BP1 TBCB TCF7L2 TDG TGFBI TIFAB TIPARP TJP2 TLE3 TMED5 TMEM11 TMEM110 TNFAIP2
Viruses: LIMS3L LINC-ROR LINC00106 LINC00152 LINC00189 LINC00265 LINC00302 LINC00327 LINC00467 LINC00472 LINC00483 LINC00487 LINC01000 LIPN LMAN1L LMO2 LNPEP LOC100128107 LOC100128348 LOC100128437 LOC100129111 LOC100129115 LOC100129312 LOC100129447 LOC100129721 LOC100130156 LOC100130433 LOC100130442 LOC100130587 LOC100130876 LOC100130930 LOC100131355 LOC100131929 LOC100132368 LOC100132874 LOC100133131 LOC100134391 LOC100190986 LOC100288069 LOC100289495 LOC100291105 LOC100505585 LOC100505915 LOC100505921 LOC100506253 LOC100506257 LOC100506473 LOC100506499 LOC100506514 LOC100506633 LOC100506639 LOC100507091 LOC100507209 LOC100507464 LOC100507547 LOC100507747 LOC100509490 LOC100509553 LOC388210 LOC389602 LOC401357 LOC401557 LOC643733 LOC645427 LOC646517 LOC727710 LOC729468 LOC729839 LOC730338 LONRF3 LRRC4 LSMEM1 LST1 LY86 LYRM1 MAB21L3 MAFB MAK MAP3K13 MARCH1 MARK2P9 MASTL MATN1-AS1 MB21D1 MBD1 MFSD2A MGC10814 MICB MMP19 MOB3C MOV10 MPEG1 MRGPRG MROH3P MS4A7 MT1A MT1B MT1E MT1F MT1G MT1IP MT1X MT2A MTRNR2L1 MTRNR2L10 MTRNR2L3 MTRNR2L7 MUC17 MX2 MYBPH MYCBP2 MYH6 MYO1G MYOCD NAA20 NABP1 NADK NAV3 NBN NCOA7 NEUROG2 NLRC5 NPIPA1 NPPC NRIP3 NT5C3A OLR1 OR5H1 ORM1 OSM OTOF OVOL1 P2RY12 P2RY14 PANK2 PARP10 PARP12 PARP8 PATL1 PAX6 PDCD1 PDLIM7 PELI1 PGF PIGA PIK3R5 PIP5K1A PIWIL4 PKIB PLAC8 PLEKHA4 PLSCR2 PNPT1 POLE4 POU3F1 POU5F1 PPEF2 PPM1K PPP1R18 PRAM1 PRLHR PRSS22 PRTN3 PSMD14 PSORS1C2 PTAFR PTCHD3 PYHIN1 RASD1 RASL10B RAX2 RBM43 RBMS2 RCAN1 REC8 REREP3 RETNLB RFPL1 RHCG RNF135 RPL39L
Bacteria & Viruses: SPHK1 SPRED1 SRP54 STAT2 STX11 SUSD6 TANK TAP1 TAPBP TAPBPL TBK1 TCIRG1 TDRD7 TFEC TGM1 TIFA TLR3 TMEM167A TNF TNFAIP3 TNFSF10 TOR1AIP1 TOR1AIP2 TOR3A TRAFD1 TRIM21 TRIM6-TRIM34 TTC39B TXN UBA7 UBAP1 UBD UBE2D3 UBE2F UBE2L6 UGCG USP18 VCAM1 VCAN VCPIP1 WARS WASF2 XAF1 XDH XRN1 YRDC ZBP1 ZC3H12A ZC3H12C ZC3HAV1 ZCCHC6 ZFP36 ZMYND15 ZNF281 ZNFX1
Table E.2 (continued)
Bacteria: TNFRSF14 TNFRSF1A TNFRSF1B TNIP2 TOR1A TPM3 TRAF1 TRAF3 TREML4 TRIB3 TRIP10 TRIP13 TSPO TWF1 TWF2 TXNDC9 TXNL1 TYROBP TYW5 UBALD2 UBE2J1 UBE2V1 URAD VDR VEGFA VEGFC VNN1 VRK2 WBP4 WDHD1 WTAP XRCC4 YKT6 ZBTB43 ZFAND3 ZNF143 ZNF259P1 ZNF490 ZNRF1
Viruses: RSPH6A SAA2-SAA4 SAMD9 SAMD9L SCARB2 SCFD1 SCIMP SDC3 SDCBP SDCBP2 SECTM1 SELE SELP SELT SEMA6B SERPINC1 SH2D5 SIGLEC1 SIGLEC11 SIGLEC7 SIK1 SLAMF9 SLC16A6 SLC22A4 SLC25A25 SLC26A8 SLC30A1 SLC35G2 SLC7A5 SMC1B SMCHD1 SMCR8 SMG1 SMG7 SNAR-A3 SNAR-B2 SNAR-C4 SNAR-D SNAR-F SNAR-G2 SNAR-H SNHG5 SNORD29 SNORD86 SNRPC SNX2 SP140 SPDYE5 SPDYE7P SPEM1 SPI1 SPRR1A SPSB1 SPTY2D1 SSC5D STARD5 STXBP5L SZRD1 TAP2 TCP10 TDRD12 TET2 TFE3 THBS1 TICAM1 TINAGL1 TLK2 TLR7 TMEM106A TMEM140 TMEM171 TMEM255B TNFAIP8L3 TNFRSF10B TNFSF13B TNFSF15 TNFSF9 TNIP3 TNNT2 TOP1P2 TP53TG3 TRA2A TRANK1 TRAPPC3L TRIM10 TRIM14 TRIM15 TRIM22 TRIM25 TRIM26 TRIM38 TRIM5 TRIM56 TRIM6 TRIM69 TUSC8 UBE2SP2 UBQLNL USP15 USP25 USP41 USP42 VMP1 VN1R1 VTRNA1-3 WDFY1 WHAMM WSB1 ZBTB21 ZCCHC2 ZNF296 ZNF536 ZNF800 ZUFSP

Table E.3: Down Regulated Genes From the Meta-analysis. The table lists genes that are significantly downregulated for taxonomical groups and genes that are commonly downregulated in combinations of the groups. The virus taxonomical group produces the most significant genes.

Bacteria: AASDHPPT ACADM ACYP1 AK3 ANGPTL2 BPTF CAND2 CD22 CD8B CDON CECR5 CTR9 DTD2 ENTPD5 FAM73B FBXO41 GALM GPR89A GRAP2 HEXIM1 HTATSF1P2 IGF1 INPP5F ITM2A LBH LOC645314 LOC728667 LRRN3 MATR3 MROH8 ND4L NDUFB11 OCIAD2 PMS2 POLG2
Eukaryotes: AARSD1 ACAP2 ACOT6 ACTG1 ADARB2-AS1 AGA AGAP10 AGAP6 ALDOB AMZ1 ANAPC13 ANG ANKRD18A ANKRD20A1 ANKRD20A11P ANKRD23 APBB2 ARL17A ARMCX4 ATP5L ATPAF1 ATRAID BCL9 BCYRN1 BEGAIN BRD8 CALM1 CAPN3 CATSPER2 CCDC57 CCL15-CCL14 CD163L1 CD36 CEBPA-AS1 CENPU
Viruses: AAAS AACS AAMP AARD ABCA2 ABCA3 ABCB1 ABCB6 ABCB8 ABCC3 ABCD3 ABCD4 ABCE1 ABHD10 ABHD12 ABHD14A ABHD14B ABHD17A ABHD17AP1 ABI2 ABI3BP ABL1 ABLIM1 ABLIM2 ABTB1 ACACA ACACB ACAD10 ACAD9 ACADS ACADSB ACADVL ACAP3 ACAT1 ACBD6
Bacteria & Eukaryotes: BACH2 GUSBP2 LETMD1 LOC100130276 MORC2-AS1 POLR2J4 SMA5 TAF15 ZNF260 ZNF607
Eukaryotes & Viruses: AASDH ABCA5 ACAD11 ACCS ADCY7 AGO4 AHNAK AKTIP ALCAM ALDH5A1 ALKBH7 AMT ANAPC4 ANAPC5 ANGEL2 ARHGAP35 BBS2 BCKDHA BIVM BNC2 BRWD1 BTD CADM1 CAMK2G CAMKMT CARNS1 CCDC112 CDC16 CDK10 CDK5RAP3 CIRBP CNOT1 CPT1A CREBL2 CRELD1
All: AKAP11 AKR7A2 ALDH3A2 ALDH6A1 ANKH ANKMY2 APLP2 CAT CBFA2T2 DBT FAT3 FBXO21 FECH FOXN3 GMCL1 GTF3C2 HADH HIBCH HLCS KAT6B LDLRAP1 MKNK2 MTX3 PCMTD2 PEX6 PPP3CA PRKDC SPG7 TMEM64 TTC3 UBTF WDR6 ZNF148 ZNF318 ZNF589
Bacteria & Viruses: ACAA2 ACSS1 ACTB ADCK3 ADD1 ADD3 ARHGAP18 ARHGEF6 ATP2A3 BBS10 BTBD6 CBFA2T3 CBX7 CDAN1 CDC14B CEP41 CEP68 CNNM3 CRYL1 CSPP1 CXXC5 D2HGDH DCAF17 DYRK2 DZIP1 EMR4P ENOSF1 EPM2AIP1 FAHD2A FAM102A FAM159A FKTN FOXP1 GGA2 GLCCI1 GOLGA8A GPD1L GSTM4 GTF2I HDAC5 HELZ HES6 HTATSF1 IL16 IVD JADE1 KDSR KIAA1147 KIFC2 KLF12 KLHDC2 KLHL17 LDHB LEF1 LINC00094
Table E.3 (continued)
Bacteria: POM121 RASGRP3 SATB1 SKP2 SLC9A3R2 SMEK1 SYNRG SYTL2 TBCCD1 TMEM206 TRIM61 TTC28 VSX2 ZBTB20 ZDHHC14 ZNF12 ZNF667 ZNF785 ZNRF2
Eukaryotes: CEP44 CERS6 CES2 CFHR3 CHD9 CLCA2 CLPTM1L COL4A5 CPED1 CPS1 CROCCP3 CUL4A CYB561 CYP4Z1 DCAF11 DCAF6 DCN DDB2 DEFA1B DNAJC27 DPF3 DPM3 ELOVL7 ENAH EPB41L4A-AS1 EVL FAM150B FAM178A FAM200B FAM8A1 FAN1 FBXO32 FHL1 FLJ44124 FOXD4L3 FOXD4L4 FRMD6-AS1 FUT6 GGT2 GLS2 GOLGA3 GPR137C GTF2IRD2P1 HDDC2 HERC2P2 HIGD2B HIST1H4B HLF HNRNPA1P10 HNRNPA1P4 IDUA IFNW1
Eukaryotes & Viruses: CRTAP CSAD CSDE1 CTDSP2 CYB561A3 DCAF16 DCAF8 DCUN1D4 DDR1 DDX17 DHX30 DYM EEF1A1 EIF3FP2 EIF3K EIF3L ELP2 ERCC5 EXD3 FAAH FAM117A FAM134A FAM13A FAM208A FAM3A FAM3C FAM73A FARS2 FASTK FBXO3 FBXO9 FCGBP FNBP1 FNTA FOXJ2 GALT GDPD5 GPBP1L1 HACD2 HACD3 HADHA IDH3B IL11RA ILVBL INADL INTS1 INTS10 INTS4 IPO9 ITFG3 KIAA1324 KIZ KLHL3 KMT2A KMT2C LANCL1 LHPP LOC148430 LONP2 LPHN1 LPP LRRC14 LYRM7 MAN2C1 MBD4 MBLAC2 METTL17 METTL7A MMS19 MPPE1 MRPS27 MTSS1 MYH10 NAPEPLD NCOR1 NFIB NME3 NRBP2
Bacteria & Viruses: LOC642852 MAP2K5 MAP4K2 MAPRE2 MCCC2 MDH1 MEF2C METTL8 MPHOSPH9 MTERF2 NFATC3 NNT NR2F2 NUMA1 OR51D1 OSGEPL1 PANK1 PARP1 PATZ1 PCMTD1 PCYOX1 PDCD4 PDE3B PDE7A PDPR PER3 PJA1 POU6F1 PRKACB PRPS2 PSKH1 PTCD3 RBL2 RBM4B RETSAT RFX7 RNF187 SACS SCRN3 SENP7 SEPT6 SHPRH SLC25A15 SMARCA2 SMARCC1 SMARCC2 SPIN2B SRPK2 SRR ST6GAL1 ST8SIA6 SUCLG2 TARSL2 TCTN3 TDRD3 TEX2 THEM4
Viruses: ACLY ACO1 ACO2 ACOX1 ACOX3 ACP6 ACSF3 ACSM3 ACSS2 ACTN4 ACTR1A ACTR1B ACTR3C ACTR5 ACTR6 ACTR8 ACVR2B ADAL ADAM11 ADAM23 ADAM28 ADAMTS10 ADAMTS5 ADCK2 ADCK5 ADCY3 ADCY5 ADCY9 ADH5 ADK ADNP ADRA2A AES AFG3L2 AGAP3 AGAP5 AGFG2 AGGF1 AGL AGMO AGO1 AGO3 AGPAT4 AGPS AGT AHCTF1 AHCYL1 AHNAK2 AIFM1 AIP AK1 AK4 AKAP1 AKAP9 AKR1A1 AKR1B1 AKR1C1 AKR1C3 AKR1E2 AKT1 ALAD ALDH16A1
Table E.3 (continued)
Eukaryotes: IRF2BP2 ISCA1P1 ITPR3 JMJD8 KLF3-AS1 KLHL14 KLHL20 L3MBTL1 LINC00176 LINC00235 LINC00282 LINC00667 LINC01011 LINC01205 LINC01552 LOC100128056 LOC100128288 LOC100130093 LOC100130353 LOC100131289 LOC100131303 LOC100131479 LOC100131541 LOC100134868 LOC389765 LOC391334 LOC391578 LOC392440 LOC400446 LOC401098 LOC441087 LOC441228 LOC442517 LOC642817 LOC642934 LOC643605 LOC645261 LOC646588 LOC647264 LOC653080 LOC728178 LOC728196 LOC728452 LOC728499 LOC728554 LOC728715 LRRC37BP1 LST1 LTBP4 MAP1LC3B MGC24103 MMP11 MPC1 MRFAP1L1 MUM1L1 MUSTN1 MYLIP NBPF10 NDUFB5 NEDD4L NSUN5 NUTM2B OFD1 OR7E125P OSBPL10 OSBPL5 PCDHB19P PCF11 PDE5A PDPK1 PIAS4 PIGC PIK3CA PIP5K1B PKD1 PKP2 PLAG1 PPCS
Viruses: ALDH1A1 ALDH2 ALDH4A1 ALDH7A1 ALDH9A1 ALDOC ALG1 ALG1L ALG1L2 ALG6 ALG9 ALKBH2 ALKBH5 ALKBH6 ALKBH8 ALMS1 ALOX5 AMACR AMDHD2 AMFR AMMECR1 AMOT ANAPC1 ANAPC11 ANAPC16 ANAPC2 ANAPC7 ANGEL1 ANK3 ANKHD1 ANKRD11 ANKRD13B ANKRD16 ANKRD26 ANKRD27 ANKRD28 ANKRD40 ANKRD44 ANKRD50 ANKRD54 ANKRD6 ANKS3 ANKS6 ANKZF1 ANO9 ANTXR2 ANXA11 ANXA6 AP2A2 AP3M1 AP3S2 AP4B1 AP4M1 APBB1 APBB3 APC APEH APH1A APMAP APOA1BP APP APPL1 APPL2 APRT ARAP3 ARFGEF3 ARFIP2 ARFRP1
Bacteria & Viruses: TMEM143 TMEM25 TNFRSF25 TNRC6B TPCN1 TRIM28 TRMT2B TTC14 UBE4B UCKL1 ULK2 WWP1 YLPM1 ZBED3 ZBTB14 ZBTB44 ZC3H14 ZFP30 ZKSCAN1 ZMYM3 ZNF266 ZNF346 ZNF398 ZNF704 ZNF706
Eukaryotes & Viruses: NSUN5P1 NSUN5P2 NT5DC1 NUDT16 PALLD PARP16 PARVA PCBP2 PCM1 PDXDC1 PHKB PIAS1 PINK1 PLLP PNISR PPP1R3E PREPL PRKCZ PSMA3-AS1 PXMP4 RAB40C RALGPS2 RBM5 RCOR3 RMDN1 RNF170 RPL3 RPL7A RPS3A SESN1 SIAE SLC44A2 SLC4A4 SMDT1 SNRNP200 SNRPN SNURF SORBS2 SRSF5 STXBP4 SUGP2 SUN1 SUN2 SYNJ2 TAF9B TCF3 TLE2 TMEM181 TMEM203 TMEM245 TMEM8B TMX4 TPRG1L TSPAN6 TTC13 TULP4 UBR3 USP21
Table E.3 (continued)
Eukaryotes & Viruses: VAMP1 VAPB VCL VPS28 VPS52 WASH3P ZBED5 ZBTB4 ZER1 ZFP14 ZKSCAN4 ZNF362 ZNF519 ZNF565
Eukaryotes: PRPSAP2 PRR5 PSG7 RAB3GAP2 RAET1G RASA4CP RAX2 RDH13 RDH5 RERE RMRP RN7SK RNA5S9 RNASE4 RNU1-1 RNU1-3 RNU1-4 RNU11 RNU12 RNU2-1 RNU4-1 RNU4-2 RNU4ATAC RNU6-1 RNU6-15P RNVU1-18 RNY3 RPAIN RPL32 RPL38 RPS6KB2 RUNX1-IT1 RUNX1T1 SCAND2P SCARNA13 SCARNA14 SCARNA2 SCARNA21 SCARNA6 SCARNA9 SCNN1D SDHAP2 SERAC1 SHF SHROOM4 SIKE1 SLC9A3R1 SNHG5 SNHG7 SNORA12 SNORA16A SNORA2B SNORA48 SNORA55 SNORA62 SNORA73B SNORA7B SNORD104 SNORD12 SNORD12C SNORD13 SNORD15B SNORD1B SNORD21 SNORD32A SNORD3A SNORD3C SNORD3D SNORD46 SNORD4B SNORD66 SNORD68 SNORD71 SNORD83B SNX1 SNX29P2 SPATA6L SPOP SRRM2 SSX2IP STAG3L2 SUGT1P1 TAB1 TAF1L TBC1D32 TCEAL1 TCEAL8 TDRD1 TIA1 TMED4 TMEM107 TMEM191A TMEM69 TMEM80 TMEM9B TPD52 TPT1 TRIM4 TRIM53AP TRIM66 TRMT13 TSC22D3 TSPAN12 TSTD1
Viruses: ARGLU1 ARHGAP1 ARHGAP12 ARHGAP24 ARHGAP44 ARHGEF1 ARHGEF10 ARHGEF18 ARHGEF19 ARHGEF26 ARHGEF28 ARHGEF38 ARHGEF4 ARHGEF9 ARID1A ARID1B ARID2 ARL10 ARL15 ARL2BP ARMC1 ARMC9 ARMCX5 ARRDC1 ARRDC1-AS1 ARRDC5 ARSK ARVCF ASB8 ASF1A ASL ASMTL ASPSCR1 ASRGL1 ASXL1 ASXL2 ATF2 ATG13 ATG14 ATG16L2 ATG2B ATG4C ATL1 ATM ATP13A2 ATP1A1 ATP1A1-AS1 ATP1B1 ATP2B1 ATP2B4 ATP2C1 ATP5A1 ATP5G2 ATP5SL ATP6AP1 ATP6V0E2 ATP9A ATP9B ATPAF2 ATRN ATRX ATXN1 ATXN10 ATXN2 ATXN3 ATXN7L3B B3GALT4 B3GAT3 B3GNT4 B3GNT9 B4GALNT1 B4GALT6 B4GAT1 B9D2 BACE1 BACE2 BAG5 BAG6 BAIAP2L1 BAIAP3 BANK1 BAP1 BBIP1 BBS1 BBS12 BBS4 BBS9 BBX BCAP31 BCAS1 BCAS3 BCAT2 BCDIN3D BCKDHB BCKDK BCL11A BCL11B BCL7A BCL7C BCS1L BDH1 BET1L BHLHB9 BHLHE41
Table E.3 (continued)
Eukaryotes: TTC17 TTLL2 TTLL3 TUG1 UBL7 UNC80 VEZF1 VTRNA1-1 WASH1 WDR26 YY1AP1 ZBTB48 ZMYM4 ZNF204P ZNF232 ZNF234 ZNF28 ZNF30 ZNF302 ZNF324 ZNF33B ZNF430 ZNF525 ZNF586 ZNF69 ZNF700 ZNF74 ZNF773 ZNF816 ZNF827 ZNF839 ZNF84
Viruses: BICD2 BIN1 BIN3 BLK BLOC1S5 BMP4 BMPR1A BNIP3 BNIP3L BOD1 BOD1L1 BOK BOLA1 BPHL BRAT1 BRD3 BRF1 BRI3BP BSG BTBD2 BTF3L4 BTRC BUB3 BZW2 C2CD2 C2CD5 CA11 CA8 CAB39L CABIN1 CACNA2D1 CACUL1 CAD CAMK1 CAMK1D CAMSAP2 CAMTA1 CAPN7 CARF CARKD CARS2 CASC3 CASC4 CASD1 CASP2 CASP6 CAV1 CBR4 CBX1 CBX5 CBX6 CBX8 CBY1 CC2D1A CC2D1B CC2D2A CCBL1 CCDC107 CCDC14 CCDC146 CCDC153 CCDC159 CCDC22 CCDC28A CCDC54 CCDC65 CCDC85B CCDC88A CCDC88C CCDC91 CCDC92 CCM2 CCNA2 CCNB1 CCNG1 CCNG2 CCNH CCNI CCNO CCNY CCPG1 CCS CCSER2 CCT4 CD1C CD2 CD248 CD81 CD9 CD96 CD99L2 CDC123 CDC23 CDC25B CDC37 CDC40 CDC42BPA CDH13 CDH2 CDHR3 CDIPT CDK19 CDK20 CDK5 CDK5RAP1 CDK6 CDKN1B CDPF1 CDRT4 CDS1 CEBPA CELSR1 CELSR2 CENPB CENPK CENPO CENPP CENPT CEP112 CEP131 CEP164 CEP250 CEP290 CEP63 CEP70 CEP78 CEP97 CERK CERS2 CERS4 CETN2 CFAP20 CFAP97 CGNL1 CHCHD2 CHD1L CHD3 CHD4 CHD6 CHDH CHEK2 CHKB CHML CHMP1A CHMP7 CHRDL1 CHST14 CHTF18 CHURC1 CIAO1 CIZ1 CKB CKMT1A CLASP2 CLCC1 CLCN2 CLCN3 CLDN12 CLDN3 CLIC3 CLIC5 CLIC6 CLIP3 CLK2 CLMN CLN8
Table E.3 (continued)
Viruses: CLNS1A CLOCK CLPP CLSTN1 CLSTN2 CLUAP1 CLYBL CMBL CMTM8 CNGA1 CNNM2 CNOT10 CNOT11 CNOT2 CNOT6 CNOT7 CNPY4 CNTLN CNTNAP3 CNTROB COA3 COASY COBLL1 COG1 COG2 COG5 COG6 COG7 COIL COL4A4 COL6A2 COL6A5 COLGALT1 COLGALT2 COPA COPS6 COPS7A COPZ1 COPZ2 COQ10A COQ4 COQ5 COQ7 COQ9 COX11 COX7A2L CPD CPE CPLX1 CPM CPSF1 CPSF3 CPSF3L CPSF4 CPSF6 CRADD CREB3L2 CREB3L4 CREBZF CRHR1-IT1 CRIM1 CRIP2 CRIPAK CRLF1 CRLS1 CROT CRY2 CRYZ CRYZL1 CS CSGALNACT1 CSNK1G2 CSNK2A1 CSNK2A2 CSRP2BP CSTF2T CTBP1 CTBP2 CTCF CTDSP1 CTDSPL CTNNB1 CTNNBIP1 CTNND2 CTPS2 CTSA CTSF CTSV CTTNBP2 CUEDC2 CUL7 CUL9 CUTA CXADRP3 CXCR5 CXXC1 CXXC4 CYB5B CYB5D2 CYB5R3 CYB5RL CYBRD1 CYFIP2 CYHR1 CYP27A1 CYP2S1 CYP39A1 CYTB CYTH2 DAAM2 DAD1 DAG1 DAGLB DALRD3 DAP3 DAPK1 DARS2 DAW1 DBN1 DBNDD1 DBNDD2 DBP DCAF4 DCBLD2 DCDC2 DCHS1 DCP1B DCTN2 DCUN1D2 DCXR DDB1 DDHD2 DDRGK1 DDX18 DDX20 DDX26B DDX42 DDX46 DDX5 DDX54 DDX6 DEAF1 DECR2 DEF6 DEGS2 DENND4C DENND5B DENND6A DEPTOR DET1 DEXI DFFA DFFB DGAT1 DGAT2 DGCR6 DGCR8 DGKA DGKG DGKQ DHCR24 DHCR7 DHDDS DHFRL1 DHODH DHRS13 DHRS3 DHRS7 DHTKD1 DHX16 DHX32 DHX35 DHX57 DHX9 DIAPH1 DIDO1 DIEXF DIP2A DIP2C DIRC2 DIS3L DIS3L2 DIXDC1 DLG1 DLG3 DLST DMAP1 DMD DMTN
Table E.3 (continued)
Viruses: DMXL1 DNAH5 DNAH6 DNAJC10 DNAJC11 DNAJC16 DNAJC18 DNAJC28 DNAJC30 DNAL4 DNASE2 DNER DNLZ DNMT3A DOCK1 DOCK4 DOCK6 DOCK7 DOCK9 DOLK DOLPP1 DPAGT1 DPEP2 DPH6 DPP7 DPYD DPYSL2 DPYSL3 DRG2 DROSHA DSCR3 DSP DSTN DSTYK DTD1 DTNB DTWD2 DUS2 DUS3L DUSP14 DUSP18 DUSP19 DVL1 DVL2 DYNC1LI2 DYNC2H1 DYNC2LI1 DYNLL2 DYNLRB2 DZANK1 DZIP1L E2F5 E4F1 EBPL ECH1 ECHS1 ECI2 EDC4 EDF1 EDIL3 EEF1D EEF1G EEF2 EEF2K EEFSEC EFCAB1 EFHC1 EFHC2 EFNA4 EGLN1 EGLN2 EHMT1 EHMT2 EID1 EIF2AK4 EIF2B4 EIF2D EIF3C EIF3E EIF3F EIF3H EIF3I EIF3M EIF4B EIF4EBP2 EIF4H ELL3 ELMO1 ELOVL1 ELOVL5 ELP4 ELP6 EMC1 EML1 EML3 EML4 EMP2 ENDOG ENGASE ENO1 ENO2 ENOX2 ENTPD4 ENTPD6 EPAS1 EPB41L1 EPB41L4A EPDR1 EPHA1 EPHB4 EPHB6 EPHX1 EPHX2 EPN2 EPOR EPS15L1 ERBB2 ERBB2IP ERCC3 ERCC4 ERCC6 ERGIC3 ERMARD ERMP1 ESYT1 ESYT2 ETAA1 ETFB EVA1B EVI5 EXD2 EXOSC10 EXOSC7 EXPH5 EXTL2 EZH1 FADS1 FAHD1 FAIM3 FAM101B FAM102B FAM109A FAM110A FAM114A1 FAM117B FAM120B FAM127A FAM127B FAM134B FAM134C FAM149A FAM149B1 FAM156A FAM160B2 FAM162A FAM168A FAM168B FAM172A FAM173A FAM173B FAM174B FAM175A FAM185A FAM188B FAM199X FAM19A1 FAM213A FAM217B FAM219B FAM21C FAM228B FAM229B FAM35A FAM50A FAM63A FAM64A FAM65A FAM71E1
Table E.3 (continued)
Viruses: FAM92A1 FAM98C FANCC FANCD2 FANCE FANCF FANCG FANCM FANK1 FASTKD1 FAT4 FAU FBL FBLN2 FBLN5 FBP1 FBRS FBXL14 FBXL16 FBXL17 FBXL20 FBXL4 FBXL6 FBXL7 FBXO36 FBXO46 FBXW9 FCGRT FCHSD2 FCRL3 FCRLA FGD3 FGFR1OP FGFR2 FGFR3 FGFRL1 FGGY FH FHOD1 FIG4 FIGN FITM2 FKBP9 FLJ20021 FLYWCH1 FLYWCH2 FMNL3 FMO4 FMO5 FN3KRP FOCAD FOXA1 FOXA2 FOXP2 FOXRED2 FPGS FRA10AC1 FRAT1 FREM2 FTO FTSJ1 FUCA1 FUK FUZ FZD1 FZD2 FZD3 FZD8 FZR1 G2E3 GAA GABARAPL1 GAK GALC GALNT11 GALNT12 GALNT2 GAMT GANAB GAPVD1 GATAD1 GATAD2B GATC GATS GATSL3 GBA2 GCDH GCN1L1 GDI1 GDPD3 GEMIN4 GEMIN8 GFOD2 GGA1 GGACT GGCX GGNBP2 GGT7 GHDC GHR GID8 GIGYF1 GIPC1 GIT2 GK5 GKAP1 GLB1 GLB1L2 GLG1 GLIS2 GLMN GLO1 GLOD4 GLT8D1 GLTSCR2 GLUD1 GLYCTK GM2A GMEB2 GMPR2 GNA11 GNA12 GNAQ GNB2L1 GNB5 GNE GNL3L GNPDA2 GOLGA2 GOLGA2P5 GOLGA8B GOLGB1 GOLIM4 GOT1 GOT2 GPAA1 GPAM GPANK1 GPATCH11 GPATCH8 GPHN GPI GPR126 GPR155 GPR183 GPR56 GPRASP2 GPRIN3 GPSM1 GPSM2 GPX8 GRAMD1C GRAMD4 GRHPR GRK6 GRSF1 GSAP GSK3B GSTA4 GSTM2 GSTM3 GSTP1 GSTT1 GSTT2B GTF2H4 GTF3C1 GTF3C5 GTPBP6 GUCD1 GUCY1A3 GUCY1B3 GULP1 GUSB GUSBP11 GXYLT1 GZF1 H2AFV H2AFY2 H3F3A HABP4 HACD4 HACE1 HACL1 HADHB HAGH HAGHL HARS2 HAUS5 HAUS7 HCFC2 HDAC1
Table E.3 (continued)
Viruses: HDAC11 HDAC2 HDAC8 HDHD2 HDHD3 HECA HECTD1 HECTD3 HEMK1 HERC2 HEXA HGS HGSNAT HIBADH HID1 HILPDA HINT2 HIP1 HIPK2 HIRIP3 HIST1H2BO HKR1 HLA-DMB HMCN1 HMGB1 HNRNPA0 HNRNPL HNRNPR HNRNPU HNRNPUL1 HNRNPUL2 HOGA1 HOOK1 HOOK3 HOXA4 HP1BP3 HPS3 HPS4 HS1BP3 HS2ST1 HS3ST5 HS6ST2 HSD17B10 HSD17B11 HSD17B4 HSD17B8 HSDL2 HSP90AA4P HSP90AB1 HSP90AB3P HSP90AB5P HSPA2 HSPG2 HTRA1 HTRA3 HTT HUWE1 HYAL1 HYI IARS2 ID4 IDE IDH3G IDS IFT122 IFT140 IFT27 IFT46 IFT80 IFT81 IFT88 IGF1R IGFBP2 IGFBP3 IK IKBIP IKBKAP IL27RA ILF3 IMMP2L IMMT IMP3 IMPACT IMPDH1 INAFM1 ING2 ING4 INPP5A INPP5B INPP5E INPP5K INPPL1 INSIG2 INTS3 INTS9 INVS IPMK IPO11 IPO8 IPP IQCC IQCE IQGAP1 IQSEC1 IRAK1BP1 IRF2BP1 IRF3 IRS1 IRX2 ISCA1 ISPD ITCH ITFG2 ITGA6 ITGA8 ITGB3BP ITM2C ITPKB ITPR1 ITPR2 IVNS1ABP IYD JAK1 JAM3 JMJD7-PLA2G4B JMY KANK2 KAT5 KAT7 KAT8 KATNAL1 KAZALD1 KBTBD11 KBTBD3 KBTBD4 KCND2 KCNK6 KCNMB4 KCNN2 KCNN4 KCTD7 KDELC1 KDM1B KDM3A KDM3B KDM4B KDM4C KDM5B KHK KIAA0141 KIAA0195 KIAA0355 KIAA0391 KIAA0556 KIAA0586 KIAA0753 KIAA0922 KIAA0930 KIAA1107 KIAA1161 KIAA1211 KIAA1217 KIAA1324L KIAA1328 KIAA1377 KIAA1407 KIAA1429 KIAA1467 KIAA1671 KIAA1715 KIAA1804 KIAA1841 KIAA1958
Table E.3 (continued)
Viruses: KIAA2018 KIAA2022 KIAA2026 KIDINS220 KIF13A KIF16B KIF1C KIF20A KIF21A KIF21B KIF22 KIF2A KIF3A KIFAP3 KIT KLC1 KLF13 KLF9 KLHDC1 KLHDC10 KLHDC3 KLHL12 KLHL21 KLHL22 KLHL23 KLHL24 KLHL26 KLHL35 KLHL36 KLHL42 KLHL8 KLHL9 KMT2E KNTC1 KPNA1 KPNA6 KRAS KRBA1 KRT10 KRTAP10-1 KTN1-AS1 L2HGDH L3MBTL3 LAMA4 LAMB2 LAMC1 LAMP1 LAS1L LCA5 LCLAT1 LCMT1 LCMT2 LCORL LDB1 LDB2 LDLRAD4 LDOC1L LEMD2 LFNG LIMCH1 LIMD1 LINC00339 LIPT2 LLGL2 LMBR1 LMBRD1 LMBRD2 LMNA LNX1 LOC100129233 LOC100129280 LOC100129399 LOC100130691 LOC100133306 LOC100288842 LOC100422737 LOC100505710 LOC100506123 LOC100506255 LOC100506286 LOC100506974 LOC100506990 LOC100507487 LOC155060 LOC389705 LOC401320 LOC646836 LOC728392 LOC728575 LOC728705 LOC729033 LONP1 LPAR2 LPAR3 LPCAT1 LPCAT3 LPIN1 LRBA LRCH4 LRFN3 LRIG1 LRP6 LRPAP1 LRPPRC LRRC1 LRRC23 LRRC26 LRRC28 LRRC36 LRRC45 LRRC46 LRRC47 LRRC57 LRRC58 LRRCC1 LRTOMT LSM10 LSM11 LSM14A LSM14B LSS LTA4H LUC7L2 LUC7L3 LURAP1L LXN LY9 LYPD2 LYPLA2 LYRM4 LYRM5 LYSMD1 LYSMD4 LZTFL1 LZTR1 LZTS3 MACF1 MACROD1 MAD1L1 MADD MAGED1 MAGED2 MAGEH1 MAGI1 MAGI3 MAL MAN1A1 MAN1A2 MAN1B1 MAN1C1 MAN2A2 MAN2B1 MAN2B2 MANBA MAOB MAP2K6 MAP3K1 MAP3K12 MAP3K2 MAP3K3 MAP3K4 MAP3K5 MAP7 MAP7D3 MAP9 MAPK1 MAPK12 MAPK14
Table E.3 (continued)
Viruses: MAPK3 MAPK8 MAPK8IP3 MAPKAP1 MAPKBP1 MAPRE1 MAPT MARCH1 MARCH2 MARCH9 MARK1 MARVELD1 MAST3 MAT2B MAVS MAZ MBD3 MBNL2 MBNL3 MBOAT2 MBOAT7 MBTD1 MBTPS1 MCCC1 MCEE MCM3AP MCM3AP-AS1 MCOLN1 MCRS1 MCTS2P MDM1 MDN1 MECOM MECP2 MECR MED12 MED16 MED22 MED23 MED25 MED29 MED4 MEGF6 MEGF8 MEIS1 MEN1 MEPCE MESP1 MET METAP1 METTL25 METTL3 MEX3A MEX3B MFAP3L MFF MFHAS1 MFNG MFSD1 MFSD11 MFSD3 MFSD6 MGA MGAT4B MGC27345 MGLL MGRN1 MIA MIA3 MIB1 MIB2 MICAL1 MID1IP1 MIF MINA MINK1 MIPEP MIPOL1 MKL2 MKRN2 MKS1 MLEC MLH3 MLPH MLST8 MLXIP MLYCD MMAB MMACHC MME MMGT1 MMP15 MMS22L MNT MOB3B MON2 MORC2 MORC4 MORF4L1 MORN3 MOSPD2 MOSPD3 MPC2 MPDZ MPHOSPH8 MPND MPP7 MPPED2 MPRIP MPZL1 MRE11A MRGPRX4 MRI1 MRM1 MROH1 MRPL14 MRPL28 MRPL34 MRPL43 MRPL49 MRPL55 MRPS15 MRPS21 MRPS24 MRPS25 MRPS26 MRPS33 MRPS34 MRPS35 MRPS9 MSANTD3-TMEFF1 MSH2 MSH3 MSH6 MSI2 MSTO1 MTA3 MTAP MTCH1 MTDH MTERF1 MTERF4 MTFP1 MTFR1 MTFR1L MTG1 MTHFD1 MTIF3 MTM1 MTMR10 MTR MTRF1 MTURN MTUS1 MUM1 MUS81 MUT MUTYH MVK MXD4 MYADM MYB MYH9 MYLK MYO18A MYO1B MYO1D MYO5C MYO6 MYO9A MYO9B MYZAP MZF1 MZT2B N4BP2 N6AMT2 NAA30 NAA35 NAA38 NAA40 NACA NADSYN1 NAF1
NANP NAP1L1 NAP1L3 NARF NARFL NARS2 NAT6 NAT9 NBAS NBEA NBEAL2 NBR1 NCALD NCAPD2 NCBP2 NCDN NCKAP5 NCOA1 NCOR2 NDE1 NDP NDRG2 NDRG3 NDST1
NDUFA10 NDUFB10 NDUFB8 NDUFB9 NDUFC2 NDUFS1 NDUFS2 NDUFS8 NDUFV1 NDUFV3 NEBL NEDD4 NEGR1 NEIL1 NEIL2 NEK1 NEK9 NELFB NELFCD NEO1 NET1 NEURL1B NEURL2 NEURL4 NFIA NFIC
NFYC NGEF NHEJ1 NHLRC2 NHSL1 NICN1 NIN NIPA1 NIPAL2 NIPAL3 NIPBL NIPSNAP1 NISCH NIT1 NIT2 NLK NLN NME4 NMNAT3 NMT2 NOL7 NOMO1 NOXA1 NPAT NPEPL1 NPHP3
NPHP4 NPIPB7 NPM1 NPNT NR2C1 NRCAM NREP NRP1 NRTN NSA2 NSD1 NSDHL NSFL1C NSMCE4A NSMF NSUN7 NT5C NT5C3B NTF4 NTHL1 NTPCR NUAK1 NUBP1 NUBPL NUCB2 NUCKS1
NUDCD3 NUDT11 NUDT14 NUDT16L1 NUDT4 NUDT7 NUP133 NUP210 NUP85 NYNRIN OARD1 OBFC1 OBSL1 OCRL ODF2 OGFOD2 OGT OIP5-AS1 OLFML2A OPA1 OPHN1 ORAI3 ORC5 ORMDL3 OS9
OSBPL1A OSBPL3 OSGEP OVOL2 OXA1L OXCT1 OXGR1 OXNAD1 P2RY1 P3H1 P3H2 P3H4 P4HB P4HTM PABPC1 PABPC4 PABPN1 PACS1 PAFAH1B1 PAFAH2 PAGR1 PAICS PAIP2 PAIP2B PALMD PAM
PANK3 PAPSS1 PAPSS2 PAQR4 PAQR7 PAQR8 PARD3B PARK7 PARN PARS2 PASK PBRM1 PBX1 PBX3 PBXIP1 PC PCBD2 PCBP3 PCCA PCCB PCDHAC2 PCDHGA9 PCDHGB4 PCED1A PCID2 PCNT
PCNX PCNXL2 PCNXL3 PCP4L1 PCSK4 PCSK5 PCYOX1L PCYT2 PDCD6 PDCD6IP PDCL PDDC1 PDGFD
PDHA1 PDHB PDK1 PDK2 PDS5A PDS5B PDSS2 PDXK PDZD8 PEBP1 PECR PELI3 PEMT
PEX1 PEX11B PEX11G PEX19 PEX26 PFAS PFDN5 PFKM PFN2 PGAP3 PGM2 PGM2L1 PGM3 PGP PGPEP1 PGRMC1 PGRMC2 PHB PHF1 PHF10 PHF14 PHF20 PHF3 PHKA2 PHKG2 PHOSPHO2
PHOSPHO2-KLHL23 PHTF1 PI4KA PIAS2 PIAS3 PIBF1 PIGO PIGP PIGQ PIGT PIGU PIH1D2 PIK3IP1 PIK3R1 PIK3R2 PIK3R4 PIP4K2B PIR PITPNC1 PITRM1 PJA2 PKD2 PKI55 PKIA PKN1
PLA2G4B PLCB1 PLCB2 PLCD1 PLCE1 PLCG1 PLD1 PLEKHA1 PLEKHA5 PLEKHA6 PLEKHB1 PLEKHG3 PLEKHH2 PLEKHJ1 PLIN3 PLS1 PLSCR3 PLTP PLXNA1 PLXNB1 PLXND1 PMM1 PMPCA PNKD PNMA1 PNPLA6
PNPO POF1B POFUT1 POFUT2 POGZ POLD4 POLDIP2 POLG POLH POLL POLR1B POLR1C POLR1E POLR2B POLR2E POLR2L POLR3E POLR3GL POLRMT POMT1 PON2 POP5 POT1 POU2F1 PP7080 PPAP2A
PPARA PPARGC1B PPFIBP2 PPHLN1 PPIA PPIAL4B PPIF PPIL2 PPIP5K2 PPM1A PPM1B PPM1F PPM1L PPM1M PPOX PPP1CB PPP1CC PPP1R12B PPP1R12C PPP1R13B PPP1R14C PPP1R16A PPP1R16B PPP1R35 PPP1R37 PPP1R3C
PPP1R9A PPP2R1A PPP2R2B PPP2R3B PPP2R5D PPP2R5E PPP5C PPP6R3 PPT1 PPWD1 PQBP1 PQLC1 PQLC3 PRCP PRDX2 PREP PRICKLE1 PRICKLE2 PRIMPOL PRKAA2 PRKAB2 PRKACA PRKAR1B PRKAR2A PRKCA PRKCI
PRKCSH PRKD3 PRKRA PRMT2 PRMT9 PRNP PROC PROS1 PRPF19 PRPF6 PRPF8 PRPSAP1
PRR12 PRRC2A PRRC2B PRRG1 PSD3 PSIP1 PSMG3-AS1 PSTK PTAR1 PTBP3 PTCH1 PTDSS1 PTDSS2 PTEN
PTER PTGR2 PTK2 PTK7 PTOV1 PTPDC1 PTPN3 PTPN4 PTPRF PTPRM PTRF PUM1 PUM2 PURA PURG PUSL1 PVRL3 PWP1 PXYLP1 PYCRL PYGB PYGL PYGO2 PYROXD2 QARS QRICH1
QRSL1 QSER1 R3HCC1 R3HDM2 RAB11B RAB11FIP2 RAB11FIP3 RAB11FIP4 RAB14 RAB2A RAB36 RAB3A RAB40B RAB5B RAB6B RABAC1 RABEP1 RABEP2 RABGAP1 RABL2A RABL2B RABL3 RACGAP1 RAD17 RAD50 RAD51D
RADIL RAI1 RALBP1 RALGAPB RALGPS1 RAMP1 RANBP9 RAPGEF1 RARB RARG RASA1 RASAL3 RASGEF1A RASGRP2 RASSF5 RASSF7 RB1CC1 RBBP4 RBBP7 RBBP9 RBFA RBM12 RBM14 RBM15B RBM23 RBM26
RBM4 RBMS3 RBMX RBPMS2 RCAN3 RCBTB2 RDH11 RDH14 RDM1 RDX RECK REEP4 REPS1 REV1 REV3L RFNG RFT1 RFWD2 RFX1 RFX2 RFX3 RGCC RGL2 RHBDD2 RHOA RHOBTB2
RHOBTB3 RHOQ RHOT2 RIC8A RIC8B RIMBP3 RIMKLA RIMS3 RING1 RIPPLY3 RITA1 RMDN2 RMND1 RMND5A RNASEH2A RNASEH2C RNASEK RNF13 RNF130 RNF167 RNF208 RNF215 RNF26 RNF44 RNF5 RNFT1
RNFT2 RNMT RNPEPL1 ROBO1 ROGDI ROR1 RPA1 RPA2 RPAP1 RPGR RPL10A RPL10L RPL10P6 RPL10P9 RPL13 RPL13A RPL14 RPL15 RPL17 RPL18 RPL19 RPL22 RPL23 RPL27 RPL27P11 RPL34P34
RPL35 RPL35A RPL36 RPL37A RPL4 RPL4P5 RPL5 RPL6 RPL6P25 RPL7 RPL8 RPLP0 RPN1 RPP21 RPP40
RPRD2 RPRM RPS10 RPS10-NUDT3 RPS14 RPS16 RPS25 RPS25P6 RPS26 RPS29 RPS3
RPS3AP49 RPS3AP6 RPS6KA5 RPS8 RPSA RPUSD3 RPUSD4 RRAGA RSAD1 RSL24D1 RSPH3 RSRC1 RSU1 RTKN2 RTN3 RTTN RWDD2A RXRA RXRB RYK S1PR4 SAAL1 SAC3D1 SACM1L SAE1 SALL2
SAMD1 SAMD12 SAMD13 SAMM50 SAP30L SARM1 SARS2 SAYSD1 SBNO1 SC5D SCAI SCAP SCAPER SCARB1 SCD5 SCFD2 SCMH1 SCN2A SCNN1A SCRIB SCRN1 SCRN2 SDF4 SDHAF1 SDR39U1 SEC11A
SEC14L2 SEC14L3 SEC22A SEC22C SEC31A SEC61A2 SEC62 SEC63 SELENBP1 SELM SELO SEMA3E SEMA4F SENP8 SEPHS1 SEPT3 SEPT8 SEPT9 SEPT11 SERBP1 SERINC5 SERPINB6 SERPINF1 SERTAD2 SESN3 SESTD1
SETBP1 SETD3 SETD6 SETD7 SETMAR SF1 SF3A1 SF3B1 SF3B2 SFI1 SFT2D3 SFXN3 SGCE SGIP1 SGK223 SGSH SGSM2 SGSM3 SH2B1 SH2D1B SH3BGRL2 SH3BP4 SH3D19 SH3GLB2 SH3PXD2A SH3YL1
SHISA2 SHPK SIGIRR SIGLEC17P SIMC1 SIN3A SIPA1 SIPA1L3 SIRPG SIRT1 SIRT3 SIRT4 SIVA1 SKI SKIV2L2 SLAIN1 SLAIN2 SLC12A2 SLC12A6 SLC12A7 SLC12A9 SLC16A2 SLC16A7 SLC17A5 SLC1A1 SLC22A23
SLC22A5 SLC23A2 SLC25A10 SLC25A11 SLC25A13 SLC25A16 SLC25A17 SLC25A26 SLC25A27 SLC25A29 SLC25A3 SLC25A35 SLC25A36 SLC25A39 SLC25A4 SLC25A40 SLC25A42 SLC25A5 SLC25A6 SLC26A2 SLC27A1 SLC2A10 SLC2A4RG SLC2A8 SLC30A9 SLC35A1
SLC35A5 SLC35B2
SLC35D1 SLC35E1 SLC35E2 SLC35E3 SLC35F3 SLC37A4 SLC38A1 SLC38A10 SLC38A9 SLC39A10 SLC39A3 SLC41A3 SLC44A1 SLC45A4 SLC46A3 SLC47A2 SLC48A1 SLC4A3 SLC4A7 SLC5A3 SLC5A6 SLC6A6 SLC7A4 SLC7A6
SLC8B1 SLC9A7 SLC9A7P1 SLCO4C1 SLFN14 SLX4IP SMAD2 SMAD4 SMAD5 SMAD9 SMAP1 SMARCA1 SMARCA4 SMARCAD1 SMARCAL1 SMARCD1 SMARCD2 SMC1A SMC3 SMC5 SMC6 SMG6 SMIM10 SMIM11 SMIM14 SMIM7
SMIM8 SMO SMPD1 SMPD2 SMPD4 SMPD5 SMURF2 SMYD3 SMYD4 SNAP47 SNAPIN SNED1 SNRNP25 SNRNP70 SNTA1 SNUPN SNX14 SNX15 SNX17 SNX21 SNX25 SNX27 SOAT1 SOCS5 SOD1 SON
SORD SORL1 SORT1 SOS1-IT1 SOX11 SOX13 SP1 SPAG5-AS1 SPAST SPATA17 SPATA24 SPATA6 SPATA7 SPECC1L SPICE1 SPIN1 SPIN3 SPINT2 SPIRE1 SPNS2 SPNS3 SPOCK2 SPPL2B SPRY1 SPSB3 SPTAN1
SPTBN1 SPTLC3 SRCIN1 SREBF1 SRP14 SRRM1 SRSF1 SRSF8 SS18L2 SSBP2 SSBP3 SSNA1 SSPN ST13 ST3GAL6 ST6GALNAC6 STARD10 STAT6 STAU2 STIM1 STK11IP STK16 STK25 STK26 STK33 STK36
STK38 STOX1 STOX2 STRA13 STRADB STRBP STUB1 STX5 STXBP5 STXBP6 STYX SUCLA2 SUCLG1 SUDS3 SUMF1 SUMF2 SUPT16H SUPT3H SUPT6H SUPT7L SUV39H1 SUV420H1 SUV420H2 SVIL SVIP SWAP70
SWSAP1 SYBU SYNCRIP SYNE2 SYNGR1 SYNJ2BP SYPL1 SYT17 SYTL1 TAAR9 TACC1 TADA2A TAF4 TAF6 TAF6L TALDO1 TAMM41 TANC2 TARDBP TAZ TBC1D10C TBC1D13 TBC1D19 TBC1D22A TBC1D25 TBC1D2B
TBC1D30 TBC1D4 TBC1D5 TBC1D8B TBC1D9B TBCD TBL1X TBL1XR1 TC2N TCAF1 TCEA2 TCEA3 TCEAL5 TCF12 TCF20 TCF25 TCF7 TCTA TCTN1 TDP1 TDRKH TEAD2 TECPR1 TECR TEF TEKT1
TELO2 TENM3 TERF1 TERF2 TERF2IP TET1 TEX261 TEX264 TFAP4 TFB1M TFRC TGFB1 TGFBR1 TGFBR2 TGFBRAP1 THADA THAP11 THAP4 THAP7 THAP8 THBS3 THEM6 THNSL1 THRA THRAP3 THRB
THSD4 THTPA TICRR TIE1 TIMM17B TIMM21 TIMM44 TIMM9 TK2 TKT TLCD1 TLDC1 TLK1 TLN2 TM2D2 TM2D3 TM4SF1 TM7SF2 TM7SF3 TM9SF2 TM9SF3 TMBIM6 TMCC1 TMCO3 TMCO6 TMEM102
TMEM106B TMEM106C TMEM109 TMEM115 TMEM117 TMEM120B TMEM121 TMEM129 TMEM134 TMEM135 TMEM14A TMEM150A TMEM150C TMEM161A TMEM164 TMEM175 TMEM177 TMEM18 TMEM184C TMEM19 TMEM191B TMEM194B TMEM204 TMEM205 TMEM216 TMEM218
TMEM223 TMEM231 TMEM261 TMEM263 TMEM38A TMEM41B TMEM42 TMEM50B TMEM56-RWDD3 TMEM57 TMEM59 TMEM63A TMEM8A TMEM9 TMEM97 TMLHE TMTC2 TMTC3 TNFRSF1A TNIK TNKS TNNI3 TNRC6A TNS1 TNS3
TOB2 TOM1L1 TOMM20 TOMM7 TOP1MT TOP2B TOP3B TP53BP1 TP53I13 TP53INP2 TP53RK TP73-AS1 TPK1 TPM1 TPM2 TPP1 TPR TPRN TRAF3IP1 TRAF5 TRAF7 TRAK2 TRAP1 TRAPPC11 TRAPPC12 TRAPPC6A
TREM2 TRERF1 TRIL TRIM2 TRIM23 TRIM24 TRIM27 TRIM32 TRIM33 TRIM37 TRIM41 TRIM45 TRIM59 TRIM62 TRIM65 TRIM68 TRIQK TRIT1 TRMT10C TRMT2A TRMT61B TRNAU1AP TRNP1 TRPC4AP TRPM4 TRPS1
TRPT1 TRRAP TRUB2 TSC1 TSC2 TSEN2 TSEN34 TSEN54 TSGA10 TSHZ1 TSN TSPAN14 TSPAN15 TSPAN18 TSPAN3 TSPAN32 TSPAN4 TSPAN5 TSPYL1 TSPYL4 TSPYL5 TST TTC12 TTC19 TTC25 TTC29
TTC30A TTC31 TTC37 TTC5 TTC8 TTLL1 TTLL12 TTPA TUBA4A TUBB TUBGCP2 TUBGCP3 TUBGCP5 TUBGCP6 TUFM TUSC2 TUSC3 TWSG1 TXLNA TXNDC16 TXNRD2 TXNRD3 TYRO3 TYSND1 UBA52 UBAC1
UBE2E2 UBE2I UBE3A UBE3B UBE3C UBE4A UBL3 UBN1 UBQLN2 UBR1 UBR5 UBXN1 UBXN2B UBXN6 UCHL5 UCK1 UFSP2 UGGT2 UGT1A6 UGT8 UHRF2 ULK1 ULK4 UNC119 UNC119B UNC13B
UNC45A UNC5CL UPF1 UPF3B UPRT UQCC3 UQCRC2 URI1 USE1 USF2 USP13 USP19 USP20 USP24 USP28 USP30 USP34 USP46 USP48 USP5 USP7 USP9X UTRN VAC14 VANGL1 VANGL2
VARS2 VAT1 VCP VDAC2 VDAC3 VEGFB VGLL4 VIM VIPR1 VKORC1 VMAC VPS11 VPS13A VPS13B VPS13C VPS13D VPS26B VPS35 VPS36 VPS41 VPS51 VPS53 VPS8 VSIG10 VWA1 VWA8
WASL WBSCR27 WDPCP WDR11 WDR13 WDR19 WDR24 WDR3 WDR33 WDR35 WDR54 WDR59 WDR60 WDR81 WDR82 WDR83 WDR91 WDR92 WDSUB1 WIPI2 WLS WNK2 WRAP53 WRAP73 WRB WWC2
WWOX WWTR1 XPA XPC XPO4 XPO7 XPOT XPR1 XRCC1 XRCC5 XRCC6 XRCC6BP1 XYLT2 YDJC YEATS4 YIPF2 YPEL1 YPEL2 YWHAQ YY1 ZADH2 ZBED8 ZBTB1 ZBTB25 ZBTB40 ZBTB41
ZBTB8A ZC2HC1C ZC3H6 ZCCHC14 ZCCHC17 ZCCHC7 ZCWPW1 ZDHHC1 ZDHHC17 ZDHHC20 ZDHHC3 ZDHHC8 ZDHHC9 ZFHX3 ZFP2 ZFP28 ZFP3 ZFP62 ZFP64 ZFYVE19 ZFYVE21 ZFYVE27 ZFYVE9 ZGPAT ZHX3 ZIK1
ZMAT1 ZMAT3 ZMPSTE24 ZMYM1 ZMYM2 ZMYND11 ZMYND12 ZMYND8 ZNF133 ZNF174 ZNF182 ZNF219 ZNF235 ZNF248 ZNF251 ZNF263 ZNF280D ZNF282 ZNF285 ZNF286A ZNF287 ZNF32 ZNF329 ZNF337 ZNF33A ZNF343
ZNF354A ZNF383 ZNF395 ZNF414 ZNF431 ZNF444 ZNF445 ZNF449 ZNF451 ZNF500 ZNF512 ZNF512B ZNF514 ZNF524 ZNF559 ZNF605 ZNF608 ZNF609 ZNF629 ZNF638 ZNF641 ZNF652 ZNF658 ZNF678 ZNF692 ZNF7
ZNF76 ZNF764 ZNF770 ZNF778 ZNF789 ZNF792 ZNF815P ZNF816-ZNF321P ZNF879 ZNF93 ZNRF1 ZNRF3 ZRANB1 ZRANB2 ZRANB3 ZSCAN18 ZSCAN21 ZSCAN26 ZSCAN5A ZW10 ZXDB ZXDC ZYG11B
Viruses Gene P-value FDR STAT1 RSAD2 19.9598 MX1 1.46E-83 17.3636 OAS2 5.37E-63 OAS1 16.4759 PARP14 15.9958 1.34E-56 2.61E-53 OASL 15.3315 15.2635 7.18E-49 IFIT2 1.74E-48 15.1934 DDX58 4.43E-48 IFIT3 14.6613 14.5588 1.11E-44 B2M 4.47E-44 GBP1 14.3919 4.55E-43 IFI44 14.3072 GBP4 13.3866 1.40E-42 4.46E-37 HELZ2 13.2670 IFIH1 2.04E-36 13.2537 13.1516 2.27E-36 DDX60 8.18E-36 USP18 13.1287 13.0452 1.04E-35 DHX58 2.93E-35 13.0226 IRF9 12.7477 3.73E-35 OAS3 1.22E-33 HERC6 12.7433 XAF1 12.5673 1.23E-33 12.2624 1.09E-32 MNDA 4.60E-31 11.9892 11.9656 1.16E-29 1.48E-29
Eukaryotes Gene P-value FDR LDHA FTH1P16 SIGLEC14 9.4434 FTH1P11 8.7201 8.3216 8.85E-17 LINC00239 2.36E-14 6.27E-13 TGM2 8.1483 7.9742 FTH1P2 6.84E-13 2.00E-12 FTH1P12 7.8104 LOC644173 7.4814 6.36E-12 CXCL8 7.4548 6.9758 5.96E-11 LAIR2 6.93E-11 1.95E-09 SP100 6.8130 NAMPT 5.26E-09 6.7888 LOC642590 5.62E-09 6.6636 ANGPTL2 6.5870 6.4561 SOCS3 1.25E-08 1.98E-08 4.14E-08 6.3684 IL24 6.54E-08 CTSL 6.1677 ACTB 2.02E-07 AGPAT4-IT1 6.1011 5.9000 BATF 6.0822 2.95E-07 6.0739 CSF1 8.36E-07 3.15E-07 BRI3P1 3.26E-07 5.8979 MS4A4A 8.34E-07 5.8897 5.7761 5.7371 8.64E-07 1.51E-06 1.83E-06
Bacteria Gene P-value FDR HLA-B NFKBIA 18.1523 ICAM1 13.5015 9.10E-69 CD44 5.62E-37 13.1464 GBP1 4.79E-35 CXCL2 12.5001 1.52E-31 TGM2 11.4796 11.3073 2.61E-26 NAMPT 1.59E-25 11.0824 LILRB3 10.7715 1.73E-24 GADD45B 4.59E-23 10.7189 10.2027 SBNO2 7.28E-23 1.46E-20 PLSCR1 9.9462 STAT1 9.9134 1.64E-19 CFLAR 2.11E-19 9.9100 CEBPB 9.6405 2.04E-19 RBMS1 9.4536 2.67E-18 IL1RN 9.4239 1.49E-17 SOCS3 1.87E-17 9.3614 IL6 9.2252 3.18E-17 TNFAIP3 1.07E-16 JAK2 9.1440 9.1475 CCL2 2.05E-16 2.08E-16 TIPARP 9.1355 NFKBIZ 9.0909 2.12E-16 8.9712 3.05E-16 8.9648 8.64E-16 8.80E-16
Collection Gene P-value FDR STAT1 HLA-B 21.3106 GBP1 6.43E-96 18.6293 B2M 7.70E-73 OAS2 18.4049 3.68E-71 RSAD2 17.1078 NFKBIA 16.7699 2.96E-61 16.4569 7.55E-59 OAS1 16.4285 1.03E-56 OASL 1.46E-56 PARP14 16.2544 2.25E-55 16.0887 GBP4 15.7892 2.99E-54 DDX58 3.25E-52 NAMPT 15.5490 15.5263 1.29E-50 IFIT3 15.4315 1.70E-50 MX1 6.90E-50 IFIT2 15.3797 1.44E-49 CXCL2 15.3426 IFIH1 2.39E-49 15.0886 14.9244 1.08E-47 GADD45B 1.20E-46 14.6004 ICAM1 14.7677 1.30E-44 1.17E-45 IL1RN 14.3428 TGM2 5.15E-43 14.1104 ZC3HAV1 1.34E-41 13.7086 14.0417 PHF11 3.31E-39 3.39E-41 13.6957 3.79E-39
Table E.4: The 100 Most Up Regulated Genes from the Meta-analysis. The 100 genes that are upregulated and generate the least P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles.
Viruses Gene P-value FDR PHF11 NT5C3A 11.9505 ZC3HAV1 11.8933 1.71E-29 11.7878 CXCL10 3.15E-29 1.06E-28 IFI35 11.6031 LGALS9 8.90E-28 NCOA7 11.5914 11.4060 TRAFD1 9.88E-28 8.08E-27 11.0581 IRF7 11.0451 3.79E-25 4.25E-25 CXCL11 ZNFX1 11.0124 11.0153 EIF2AK2 5.78E-25 5.75E-25 10.9993 TRIM5 10.9544 6.51E-25 1.04E-24 IFIT1 10.9459 PML 1.11E-24 IRGM 10.9178 1.44E-24 PNPT1 10.8197 10.7494 WARS 4.00E-24 10.6877 8.35E-24 IRF1 1.52E-23 10.6675 CXCL2 1.85E-23 TRIM21 10.6093 10.5876 CMPK2 3.36E-23 10.5776 4.15E-23 PARP9 4.53E-23 10.5585 BST2 5.34E-23 10.3651 RNF213 3.83E-22 BATF2 10.3584 10.2900 4.03E-22 8.04E-22 10.2875 7.98E-22
Eukaryotes Gene P-value FDR MBD3L5 NOD2 GBP1 5.6764 PTGS2 2.44E-06 5.6637 CDKN1A TLR8 2.59E-06 5.6478 5.5971 5.5544 MT1H 2.78E-06 3.57E-06 OAS2 4.29E-06 FCGR2C 5.5319 5.5313 ICAM1 4.65E-06 4.62E-06 5.4995 NFKBIA 5.4951 TYMP 5.49E-06 5.47E-06 5.4786 PGAM4 5.4640 5.85E-06 RAB8B 6.24E-06 5.4042 HIST2H2AA4 5.3956 5.3362 8.22E-06 HSP90AA1 5.3490 8.49E-06 1.12E-05 HK2 5.3177 1.07E-05 LOC729200 1.21E-05 PKM 5.2890 CLEC6A 5.3037 1.34E-05 HGF 1.27E-05 SLC43A3 5.2659 5.2789 HPRT1 1.43E-05 1.38E-05 5.2426 MT1P3 5.2574 TNFAIP6 1.56E-05 1.47E-05 5.2203 B2M 5.2173 1.74E-05 5.2138 1.76E-05 1.77E-05 5.2038 1.85E-05
Bacteria Gene P-value FDR CP SLCO3A1 8.8628 SOD2 8.9261 2.03E-15 C3 KLF6 1.20E-15 8.7949 PLAUR 3.57E-15 8.7444 CSF2RB 8.7250 8.7038 C2 5.20E-15 5.97E-15 8.6908 6.96E-15 IFI35 7.56E-15 KDM5C 8.5475 PSMB8 8.5404 8.3180 IL15RA 2.53E-14 2.61E-14 1.49E-13 8.3113 GBP4 8.2513 1.54E-13 FCER1G 9.24E-14 MSR1 8.2090 8.1691 STAT3 9.48E-14 9.74E-14 8.1683 MAFF 8.1577 CLEC4E 9.01E-14 1.64E-13 8.1436 FPR2 8.0768 CYBB 1.68E-13 2.40E-13 IRF1 8.0612 8.0407 HIF1A 2.35E-13 TIMP1 3.07E-13 8.0396 8.0343 TPM3 2.94E-13 8.0302 3.00E-13 IER3 2.83E-13 7.9969 CXCL16 4.16E-13 7.9083 7.9565 8.01E-13 5.44E-13
Table E.4 (continued)
Collection Gene P-value FDR PLSCR1 13.5976 1.39E-38 IFI44 JAK2 IFI35 13.4637 PML 8.22E-38 13.4106 1.62E-37 DHX58 13.4027 HERC6 1.74E-37 13.3163 13.3014 IRF1 5.18E-37 6.13E-37 13.2438 CSF2RB 1.28E-36 IRF9 13.1653
13.1831 SOCS3 3.40E-36 2.76E-36 OAS3 13.1509 13.1475 CFLAR 4.00E-36 4.07E-36 DDX60 13.0592 12.9806 1.26E-35 HELZ2 3.42E-35 12.9540 WARS 4.71E-35 12.9442 LILRB3 5.22E-35 12.9071 CXCL11 12.9037 8.24E-35 XAF1 8.42E-35 12.8471 USP18 1.70E-34 TNFAIP3 12.8333 12.7545 1.99E-34 IRF7 12.7315 5.35E-34 7.02E-34 SP100 IL15RA 12.7195 12.7046 ISG20 8.00E-34 12.6299 9.48E-34 TNFSF10 2.39E-33 12.5903 12.3500 3.87E-33 7.60E-32
Viruses Gene P-value FDR SAMHD1 10.2604 HERC5 1.04E-21 SSC5D 10.1630 MX2 2.76E-21 10.1428 MT2A 3.34E-21 ISG20 10.1405 10.1379 TRIM25 3.37E-21 3.40E-21 10.0872 GADD45B 10.0834 10.0467 5.44E-21 SP110 5.57E-21 7.95E-21 CSRNP1 10.0410 SAMD9 9.9186 8.31E-21 NLRC5 2.71E-20 9.9094 ISG15 2.93E-20 9.8664 SLFN13 4.26E-20 9.8517 SP100 9.8135 CREM 4.86E-20 6.91E-20 9.7896 TNFSF10 9.7875 CEACAM1 9.7773 8.52E-20 9.7763 8.60E-20 SNHG5 9.29E-20 9.28E-20 TRIM69 9.7568 PARP12 9.7357 1.11E-19 EPSTI1 9.7193 1.35E-19 IFNA1 1.55E-19 9.6906 IFI6 2.00E-19 9.6881 UBE2L6 2.03E-19 SAMD9L 9.5685 9.5915 9.5485 6.05E-19 5.00E-19 7.26E-19
Eukaryotes Gene P-value FDR SDHA PVRL2 SUMO1P3 5.1762 BZW1 5.1647 5.1562 2.09E-05 TTTY10 2.18E-05 2.27E-05 CASP4 5.1025 IL1B 5.1004 2.84E-05 LILRA3 2.86E-05 5.0973 NUDCD1 2.89E-05 KRT8P9 5.0908 5.0749 5.0740 HSPA1B 2.95E-05 3.13E-05 RHBDF2 3.13E-05 5.0668 EIF2S2 5.0044 3.23E-05 HLA-B 5.0007 4.23E-05 SEC14L1 4.29E-05 4.9761 MYD88 4.9051 4.76E-05 TLR2 4.8923 6.37E-05 CHMP5 6.68E-05 4.8872 NOP10 6.83E-05 RNF213 4.8730 4.8716 SOWAHC 7.15E-05 7.11E-05 4.8641 LOC100131859 4.8396 4.8565 4.8439 7.32E-05 ANXA2P1 7.99E-05 7.52E-05 7.91E-05 PLAUR 4.8292 CRLF2 8.32E-05 STAT3 4.8237 8.52E-05 4.7941 4.7706 9.73E-05 1.07E-04
Bacteria Gene P-value FDR IRAK3 MMP9 7.9050 EHD1 7.8794 7.86E-13 SAMSN1 9.65E-13 7.8454 IFIH1 7.7883 RAP2C 1.20E-12 1.89E-12 PTPN2 7.7867 7.7770 EIF6 1.93E-12 7.7745 1.98E-12 ETS2 1.98E-12 NCF1 7.7743 SNX10 7.7477 2.01E-12 TAPBP 7.7465 2.40E-12 7.7449 PNCK 2.42E-12 7.7440 2.38E-12 FCGR1A 2.35E-12 7.7418 MMP14 7.7183 FOSL2 2.37E-12 2.81E-12 7.7173 RTP4 2.82E-12 7.7007 ZBP1 3.14E-12 B2M
7.6968 UPP1 7.6641 3.20E-12 CASP4 4.05E-12 7.6609 DDX58 7.6441 4.10E-12 7.6387 RIPK2 4.63E-12 7.6123 4.76E-12 CTSZ 5.74E-12 7.5492 ZC3H12A 7.5261 9.22E-12 TNFSF10 7.5267 1.07E-11 7.5074 1.08E-11 1.23E-11
Table E.4 (continued)
Collection Gene P-value FDR CCL2 12.3458 7.85E-32 MNDA SOD2 12.2824 CXCL10 1.68E-31 NFKBIZ 12.2589 12.2337 2.21E-31 IL6 2.95E-31 12.2267 PSMB8 3.16E-31 TRAFD1 12.1408 12.1744 PLAUR 12.1271 8.55E-31 5.78E-31 9.94E-31 PARP9 12.0956 SOCS1 1.41E-30 12.0692 ZBP1 1.91E-30 12.0640 GCH1 1.97E-30 CD274 12.0468 2.39E-30 12.0346 BATF2 2.73E-30 12.0192 SP110 3.24E-30 12.0009 IRGM 3.98E-30 11.9627 CMPK2 6.20E-30 11.9335 FCGR1A 11.8343 8.67E-30 BAZ1A 11.7877 2.78E-29 4.75E-29 ATF3 11.7196 RNF19B 1.03E-28 MLKL 11.7050 11.7011 1.21E-28 SAMHD1 1.25E-28 11.6885 11.6562 JUNB 1.43E-28 2.06E-28 RTP4 11.6264 2.84E-28 11.5834 4.62E-28
Viruses Gene P-value FDR IL1RN SOCS1 9.5134 TDRD7 9.5064 9.95E-19 NMRK2 9.4871 1.04E-18 ANKFY1 9.4813 1.19E-18 ATF3 9.4628 1.24E-18 TMEM140 1.46E-18 9.4354 IFI27 9.4363 1.84E-18 PLSCR1 1.84E-18 PSMB8 9.3413 9.3273 PIK3AP1 4.27E-18 4.79E-18 9.2790 KIAA1549L 9.2035 9.1917 7.34E-18 ZBP1 1.40E-17 1.54E-17 JAK2 NAMPT 9.1895 SHISA5 9.1740 1.56E-17 9.1690 TRIM26 1.78E-17 9.1261 1.83E-17 HMBS 9.1237 2.67E-17 AZI2 2.71E-17 9.0885 MUC4 3.68E-17 SECTM1 9.0457 9.0342 SIGLEC1 9.0330 5.22E-17 5.75E-17 STAT2 9.0245 5.78E-17 CHMP5 6.20E-17 8.9757 8.9569 9.44E-17 1.10E-16
Eukaryotes Gene P-value FDR PMAIP1 SRGN SP110 4.7478 FCGR1C 1.18E-04 4.7174 SIGLEC5 4.7163 CSF2RB 1.35E-04 4.7096 4.6931 HMGN2P46 1.35E-04 1.39E-04 MARCKS 1.47E-04 4.6922 4.6902 LRRC25 1.47E-04 1.48E-04 4.6860 CASP5 1.50E-04 IL12B 4.6553 CASP3 1.70E-04 4.6512 CCL3L3 1.72E-04 4.6377 CCZ1B 4.6174 HIF1A 1.80E-04 4.5946 1.91E-04 OAS1 2.10E-04 4.5913 ARID5A 2.13E-04 4.5788 RBBP8 2.22E-04 4.5723 NFKBIZ 4.5686 PSMA6 2.27E-04 2.31E-04 4.5636 CCR1 4.5575 2.35E-04 ITPRIP 2.38E-04 4.5508 IL1RN 2.44E-04 4.5431 SRA1 4.5407 2.51E-04 2.52E-04 4.5397 2.52E-04 4.5316 2.59E-04
Bacteria Gene P-value FDR PSMA5 BST1 7.4917 NFKB2 1.34E-11 PSME2 7.4896
7.4693 GNG12 1.35E-11 7.4391 1.56E-11 RHBDF2 7.4308 1.94E-11 CD274 7.3824 2.04E-11 CCRL2 2.90E-11 7.3673 MXD1 7.3588 3.21E-11 ZC3HAV1 7.3371 3.38E-11 7.3543 CD47 3.89E-11 3.46E-11 JUNB HERC6 7.3258 SEMA4A 7.2690 4.19E-11 7.2604 PARP14 7.2416 6.19E-11 6.52E-11 WARS 7.34E-11 7.2187 LCN2 8.60E-11 7.2006 NCF4 9.73E-11 NFKBIE 7.1984 BIRC3 7.1907 9.79E-11 7.1882 IL1B 1.03E-10 1.03E-10 7.1817 BCL2A1 1.07E-10 TBK1 7.1758 7.1771 PSMA6 1.10E-10 1.10E-10 7.1675 7.1572 1.16E-10 1.24E-10
Table E.4 (continued)
Collection Gene P-value FDR TRIM21 11.5467 6.98E-28 IFI27 TNFAIP6 CSRNP1 11.3915 11.5317 4.04E-27 CASP4 8.20E-28 11.3440 PSME2 6.79E-27 11.3103 CEACAM1 9.62E-27 11.2473 11.2953 IFIT1 1.91E-26 1.13E-26 OGFR SLFN5 11.2314 11.2291 2.24E-26 FPR2 2.28E-26 11.1840 BST2 3.70E-26 CREM 11.1670 4.43E-26 NT5C3A 11.1540 11.1095 5.07E-26 KLF6 11.0843 8.16E-26 MSR1 1.07E-25 N4BP1 11.0732 1.18E-25 11.0599 TRIM5 10.9624 1.36E-25 MXD1 3.91E-25 10.9234 FCER1G 5.88E-25 10.9206 ISG15 10.8918 6.01E-25 ZNFX1 8.15E-25 10.8831 C2 10.8411 8.89E-25 IL1B 1.39E-24 10.8258 10.8037 1.63E-24 2.05E-24
Viruses Gene P-value FDR ACTB CXXC5 -19.7209 TTC3 -12.2334 8.39E-82 PCYOX1 6.28E-31 CBX6 -11.948 -11.3932 9.07E-27 AGO4 1.70E-29 -10.9442 P4HTM 1.11E-24 -10.9003 GLG1 -10.7062 1.71E-24 PPIA 1.30E-23 -10.6981 PITPNC1 1.39E-23 RBL2 -10.4615 -10.5649 1.45E-22 WWP1 5.08E-23 ZBTB4 -10.4412 -10.2892 1.77E-22 CYHR1 7.98E-22 -10.1298 ATRN 3.64E-21 -10.1153 CNNM3 4.16E-21 -10.033 EIF4EBP2 -9.9959 -9.9022 KLHDC3 8.88E-21 1.27E-20 3.10E-20 ZSCAN18 -9.8901 -9.8708 ZYG11B 3.45E-20 4.13E-20 ADD1 -9.8506 AKTIP 4.85E-20 -9.7929 GTF2I -9.7785 AHNAK 8.35E-20 -9.7276 9.29E-20 -9.7189 1.44E-19 1.54E-19
Eukaryotes Gene P-value FDR SNORD12C LINC01552 ZNF69 -10.2563 8.85E-20 LOC653080 -9.2043 AGAP6 6.17E-16 PNISR -8.7758 -9.1283 EPB41L4A-AS1 1.66E-14 9.91E-16 -7.6086 GUSBP2 -7.8779 RMRP 2.67E-11 3.95E-12 -7.7952 ALDH3A2 6.62E-12 -7.5527 BCYRN1 SCARNA9 3.65E-11 -7.0338 -7.558 RPL38 1.47E-09 -7.0095 ZNF318 3.70E-11 -6.9822 1.68E-09 PPP1R3E 1.95E-09 HNRNPA1P4 -6.8841 LETMD1 -6.8431 -6.8045 3.58E-09 -6.8101 WASH3P 4.59E-09 5.21E-09 5.18E-09 SNORD3D -6.8158 KIZ -6.673 5.35E-09 RNU1-4 -6.6241 STAG3L2 1.21E-08 1.58E-08 SORBS2 TCEAL1 -6.5325 -6.544 -6.4993 2.69E-08 3.27E-08 2.56E-08 -6.4444 -6.4335 4.36E-08 4.58E-08
Bacteria Gene P-value FDR ACTB TTC3 ND4L -20.9788 SRR 1.75E-92 -10.1263 ZBTB20 2.92E-20 -8.7570 PDCD4 4.81E-15 UCKL1 -7.5034 -8.0526 MEF2C 1.25E-11 -7.3255 2.88E-13 CNNM3 -7.2897 4.15E-11 WWP1 -7.2426 5.36E-11 ZNF148 -7.0723 7.36E-11 ZNF398 2.11E-10 -6.8485 ENTPD5 -6.8259 8.47E-10 CEP68 -6.7284 9.64E-10 -6.7186 FOXN3 1.74E-09 AKAP11 1.85E-09 -6.7111 6-Sep -6.6466 1.91E-09 -6.4648 MCCC2 2.86E-09 FAHD2A 8.40E-09 -6.3813 SLC9A3R2 -6.2585 -6.2519 ANKMY2 1.35E-08 -6.1549 2.88E-08 2.95E-08 ACAA2 5.17E-08 -6.1057 TPCN1 6.75E-08 LDHB -6.0355 1.01E-07 -5.9998 1.23E-07 -5.9932 1.27E-07
Table E.5: The 100 Most Down Regulated Genes from the Meta-analysis. The 100 genes that are downregulated and generate the least P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles.
Collection Gene P-value FDR ACTB TTC3 -26.0561 CXXC5 1.99E-144 -16.5721 FOXN3 -13.3172 1.75E-57 WWP1 -12.1929 5.28E-37 PCYOX1 4.69E-31 -12.1069 -12.0672 ZNF148 1.25E-30 1.93E-30 CNNM3 -11.7585 ZNF318 -11.6497 6.62E-29 WDR6 2.19E-28 -11.3457 TMEM64 6.73E-27 -11.3419 -11.316 FBXO21 6.87E-27 SRR -11.2334 9.11E-27 P4HTM 2.22E-26 RBL2 -11.1982 -11.1141 SMARCA2 3.19E-26 -11.0146 7.83E-26 -11.0792 SPG7 2.22E-25 1.12E-25 IVD -10.9536 AGO4 4.27E-25 ZBTB4 -10.7791 -10.7346 AKAP11 2.65E-24 -10.6925 4.20E-24 -10.671 CEP68 6.44E-24 CAT 8.03E-24 -10.5701 ADD1 2.29E-23 -10.5533 -10.5512 2.69E-23 2.71E-23
Viruses Gene P-value FDR NFIA SPG7 STRBP -9.6839 YLPM1 -9.6652 2.09E-19 -9.5827 IGF1R 2.49E-19 -9.5799 5.39E-19 TCEA3 5.48E-19 -9.5416 WDR6 -9.5105 7.67E-19 SMARCA2 -9.505 1.01E-18 -9.5064 EEF2K 1.03E-18 BRD3 1.04E-18 -9.504 SEC63 -9.5008 PBXIP1 1.04E-18 -9.4897 NUDT16L1 1.06E-18 -9.4534 -9.4662 1.17E-18 PI4KA 1.58E-18 1.42E-18 LYRM7 -9.4236 VPS13A -9.405 2.04E-18 RAB6B -9.3749 2.41E-18 PGM2L1 3.17E-18 -9.365 RPL3 -9.3385 SOD1 3.45E-18 4.35E-18 KIAA0586 -9.307 -9.2683 MRI1 -9.2944 5.75E-18 7.98E-18 ATP6V0E2 6.41E-18 -9.2572 KAT8 -9.2684 8.78E-18 ACADSB 8.04E-18 -9.2285 -9.2338 BBS2 1.13E-17 1.08E-17 -9.216 1.26E-17
Eukaryotes Gene P-value FDR ANKRD20A1 SNORD13 -6.3874 L3MBTL1 6.05E-08 GALT -6.3791 ZNF700 -6.3527 6.24E-08 SNORD3C 7.09E-08 ZNF430 -6.2875 LST1 -6.2649 -6.2388 1.06E-07 SNORD68 1.20E-07 1.39E-07 AKAP11 -6.2116 SNORD3A 1.62E-07 -6.2038 -6.2026 CES2 1.67E-07 -6.1331 TPRG1L 1.65E-07 -6.0965 LOC728499 2.46E-07 2.98E-07 LOC728554 -6.0899 -6.023 SCARNA2 -6.004 3.05E-07 BBS2 -5.9796 4.40E-07 4.86E-07 IDH3B 5.47E-07 -5.981 RNU4ATAC 5.51E-07 DCAF8 -5.9597 -5.9045 TBC1D32 -5.9382 6.09E-07 8.26E-07 VTRNA1-1 6.84E-07 SSX2IP -5.8874 -5.8889 SNORA12 -5.8837 8.52E-07 8.56E-07 ZNF827 8.59E-07 FAN1 -5.8754 -5.8053 8.79E-07 1.32E-06 -5.7957 1.38E-06 -5.7877 1.43E-06
Bacteria Gene P-value FDR TMEM64 GGA2 -5.9226 ADD3 1.86E-07 ANKH -5.8832 PDE7A -5.8610 2.32E-07 SRPK2 -5.7682 2.59E-07 CXXC5 -5.7200 4.31E-07 DYRK2 -5.6979 5.52E-07
CBX7 -5.6952 6.11E-07 TTC14 -5.6586 6.19E-07 OSGEPL1 7.42E-07 -5.6477 LOC642852 -5.6296 -5.6222 7.88E-07 ZNRF2 -5.5966 8.65E-07 8.98E-07 TNRC6B 1.03E-06 BACH2 -5.5894 -5.5818 ACADM 1.06E-06 HIBCH 1.10E-06 -5.5782 -5.5052 HADH 1.12E-06 FBXO21 1.65E-06 -5.5043 IVD -5.4340 1.65E-06 -5.4325 HELZ 2.37E-06 NUMA1 2.38E-06 PDPR -5.4116 -5.3963 ZNF12 -5.3928 2.64E-06 2.84E-06 LOC100130276 -5.2989 2.88E-06 -5.3234 GMCL1 4.57E-06 -5.3168 4.08E-06 4.20E-06 -5.2874 4.81E-06
Table E.5 (continued)
Collection Gene P-value FDR TPCN1 ZSCAN18 -10.429 -10.3707 6-Sep 1.65E-22 9.27E-23 SRPK2 -10.3291 UCKL1 -10.3261 2.47E-22 FAHD2A -10.2374 2.53E-22 -10.2198 PRKDC 6.09E-22 7.25E-22 CYHR1 -10.2191 PDCD4 -10.1485 7.26E-22 LDLRAP1 1.44E-21 -10.1245 -10.0496 GTF2I 1.82E-21 3.77E-21 CBX6 -10.0453 KLF12 3.89E-21 -10.0455 ATRN -9.9987 3.91E-21 MRI1 -9.9584 6.10E-21 TARSL2 -9.9382 ALDH3A2 -9.9308 9.07E-21 -9.9135 GGA2 1.10E-20 1.17E-20 1.37E-20 BBS2 -9.8706 YLPM1 -9.863 GMCL1 2.03E-20 -9.8614 RPL3 -9.8424 2.18E-20 2.20E-20 GTF3C2 2.64E-20 -9.8382 CRTAP -9.8046 GLG1 2.74E-20 -9.7934 3.69E-20 AKTIP 4.10E-20 -9.766 -9.6855 5.25E-20 1.13E-19
Viruses Gene P-value FDR KIAA1407 -9.2017 PGRMC2 1.41E-17 DDX18 -9.1702 DYNC2H1 1.82E-17 -9.1704 -9.1632 EPHX1 1.83E-17 1.91E-17 NDRG3 -9.1066 RASSF7 -9.0848 3.14E-17 CSDE1 -9.063 3.78E-17 IVNS1ABP -9.0547 -9.0547 4.58E-17 FOXN3 4.90E-17 4.87E-17 TARSL2 -9.0518 FBXO3 -9.0093 4.97E-17 ATXN2 7.06E-17 -8.9901 NFIB -8.9677 8.34E-17 FLYWCH2 -8.9309 1.01E-16 CRTAP -8.9334 1.36E-16 TPCN1 1.34E-16 -8.8911 TMEM64 -8.8896 1.91E-16 RAB11FIP4 -8.8785 -8.8642 1.93E-16 POLR2E 2.10E-16 2.36E-16 CAMK2G -8.8558 -8.85 SUN2 2.53E-16 ABCA5 2.65E-16 SLC25A5 -8.8463 -8.8163 -8.8073 YPEL1 2.71E-16 3.47E-16 IQSEC1 3.71E-16 -8.7998 -8.7928 3.94E-16 4.08E-16
Eukaryotes Gene P-value FDR ZNF586 ALDH6A1 CRTAP NT5DC1 -5.7481 -5.738 SNHG7 1.76E-06 1.85E-06 SNORD46 -5.7333 -5.7175 ZNF839 1.85E-06 ALCAM 2.01E-06 -5.7065 -5.6934 CDC16 2.12E-06 2.26E-06 SNORA7B -5.6832 -5.6521 NBPF10 2.37E-06 ARMCX4 2.75E-06 -5.6255 -5.6142 HERC2P2
3.13E-06 3.31E-06 RNU11 -5.6036 -5.5882 TTLL3 3.48E-06 -5.5882 3.68E-06 ZNF232 3.72E-06 ITPR3 -5.5766 LOC100130353 -5.5781 3.86E-06 SDHAP2 -5.5482 -5.5686 3.86E-06 FAM200B 4.41E-06 4.00E-06 -5.5456 ELP2 4.38E-06 -5.546 AKR7A2 -5.5364 AGAP10 4.42E-06 4.58E-06 RN7SK -5.4964 -5.4945 BRWD1 -5.4973 5.48E-06 LOC100128056 5.44E-06 -5.4597 5.50E-06 -5.4806 -5.4665 6.34E-06 5.83E-06 6.20E-06
Bacteria Gene P-value FDR VSX2 ATP2A3 CBFA2T2 -5.2411 WDR6 -5.2400 -5.2377 6.04E-06 ZKSCAN1 6.06E-06 6.12E-06 PER3 -5.1994 -5.2131 ADCK3 7.37E-06 6.95E-06 MTX3 -5.1788 PSKH1 -5.1088 RFX7 8.11E-06 1.14E-05 -5.0999 PRPS2 -5.0792 1.19E-05 TMEM206 1.31E-05 -4.9926 CAT -4.9560 -4.9566 1.98E-05 FAT3 2.35E-05 2.35E-05 AK3 PMS2 -4.9451 -4.9019 CDAN1 2.48E-05 3.02E-05 PCMTD2 -4.8909 -4.8917 PARP1 -4.8358 3.16E-05 -4.8308 3.16E-05 PRKDC 3.98E-05 4.06E-05 SMARCA2 -4.8274 ZNF318 -4.8255 -4.8217 4.11E-05 TARSL2 4.14E-05 4.18E-05 MAPRE2 -4.8051 -4.7956 CAND2 4.51E-05 -4.7783 PCYOX1 4.67E-05 5.02E-05 -4.7779 -4.7757 5.03E-05 5.07E-05
Table E.5 (continued)
Collection Gene P-value FDR TNRC6B -9.678 EIF4EBP2 -9.6597 LYRM7 1.20E-19 1.42E-19 AKR7A2 -9.6599 -9.6576 APLP2 1.43E-19 PPP3CA 1.45E-19 -9.6394 IDH3B -9.5894 1.71E-19 CIRBP 2.73E-19 -9.5353 ANKMY2 -9.4471 -9.4454 4.46E-19 ANKH 9.96E-19 1.01E-18 SIGIRR -9.3239 HADH -9.324 2.99E-18 RALGPS2 -9.3206 -9.2962 3.00E-18 KAT6B 3.06E-18 3.81E-18 ADD3 -9.2543 NDRG3 -9.238 5.55E-18 OSGEPL1 -9.2373 -9.2123 KIF16B 6.40E-18 6.41E-18 7.98E-18 KIZ -9.2118 ZNF398 7.98E-18 UBE4B -9.1632 -9.2091 MCCC2 -9.1331 1.21E-17 8.15E-18 OR51D1 -9.1314 1.57E-17 AHNAK -9.1106 1.59E-17 ZC3H14 -9.1046 1.90E-17 ANAPC5 -9.0979 2.00E-17 -9.0942 2.12E-17 2.18E-17
Viruses Gene P-value FDR UBE4B PBRM1 -8.7933 KIF16B -8.7881 4.09E-16 DHRS3 -8.7648 4.23E-16 FBXO21 -8.7649 5.11E-16 JADE1 -8.744 5.13E-16 YWHAQ -8.7336 6.07E-16 OSGEPL1 -8.6967 -8.6948 6.58E-16 ZNF148 8.94E-16 9.05E-16 XPC -8.6827 IVD 9.95E-16 ACO2 -8.6646 SMYD3 -8.6619 1.15E-15 -8.6607 GSTM4 -8.6403 1.17E-15 HNRNPUL1 1.18E-15 -8.619 -8.6222 1.40E-15 TMEM57
1.62E-15 1.65E-15 PCBP2 -8.6182PDDC1 1.66E-15 -8.6169AMACR -8.5972 1.67E-15 NUBPL -8.5904 1.97E-15 ZNF318 2.07E-15 -8.5908ALMS1 -8.5862 2.07E-15 MAPK3 -8.5694 2.12E-15 IKBIP -8.5583 2.42E-15 2.64E-15 -8.5561 2.67E-15 Eukaryotes GeneFASTKUBR3TTC3 P-valueCPS1 -5.4558BNC2 FDR 6.42E-06 -5.4312SNORA55 -5.4294 7.31E-06 SLC9A3R1 -5.4238 7.32E-06 RNASE4 -5.3996 -5.4085LOC441087 7.49E-06 -5.3912 8.37E-06 8.09E-06 LOC100130093 8.63E-06 -5.3638HACD2 -5.3532 -5.3483BTD 9.97E-06 1.05E-05 1.06E-05 GPBP1L1IL11RA -5.3384TMEM203 1.11E-05 -5.3208PIAS1 -5.3245 1.20E-05 SNRNP200 -5.3167 1.18E-05 -5.3115RBM5 1.21E-05 1.23E-05 WASH1 -5.3011 -5.3049UNC80 1.28E-05 1.27E-05 FOXD4L3 -5.2954LINC00176 -5.2954 1.30E-05 LYRM7 1.31E-05 -5.2868 -5.2846POLR2J4 -5.2775 1.34E-05 1.35E-05 1.38E-05 -5.275 -5.2695 1.39E-05 1.42E-05 Bacteria GeneHEXIM1SUCLG2 -4.7593 P-valuePPP3CA -4.7404RBL2 5.44E-05 FDR MAP4K2 -4.7350 5.92E-05 PCMTD1 6.03E-05 -4.7289 -4.7257LINC00094 -4.7218 6.16E-05 6.23E-05 NFATC3 -4.6994 6.33E-05 ST8SIA6 7.00E-05 -4.6934PANK1 -4.6892PRKACB 7.16E-05 MATR3 7.29E-05 -4.6852 -4.6731ZNF706 7.38E-05 7.75E-05 FAM102A -4.6702KDSR -4.6232 7.84E-05 -4.6024ANGPTL2 9.51E-05 1.03E-04 KIAA1147 -4.5942 -4.6020INPP5F -4.5849 1.06E-04 1.03E-04 METTL8 1.10E-04 FKTN -4.5804 -4.5796ALDH3A2 1.12E-04 ZFP30 1.12E-04 -4.5675 -4.5705ZNF704 1.18E-04 1.16E-04 RASGRP3 -4.5593 -4.5562 -4.5509 1.21E-04 1.23E-04 1.25E-04 Table E.5 (continued) Collection GeneTMEM57 -9.0884PI4KA P-valueEEF2K 2.29E-17 FDR -9.0823NFIB -9.0776 2.40E-17 ZBTB20 2.49E-17 -9.0759MAP2K5 -9.0633 -9.0571SENP7 2.52E-17 2.82E-17 PITPNC1 2.97E-17 -9.0523 -9.0165PCMTD1 -8.9949 3.09E-17 NFIA 4.20E-17 TCEA3 5.02E-17 -8.9866DCAF8 -8.982TTC14 5.40E-17 -8.9684INADL 5.58E-17 -8.9666 6.27E-17 OGT -8.9515 6.30E-17 MBLAC2 -8.943 7.16E-17 MEF2C -8.9486STRBP -8.8863 7.63E-17 7.33E-17 NUBPL -8.8864 1.24E-16 EPHX1 -8.8834 1.24E-16 GSTM4 -8.8743 1.26E-16 PDDC1 -8.8674 1.37E-16 VPS13A -8.8639 1.44E-16 NUMA1 -8.8511 1.48E-16 -8.8501 1.64E-16 1.65E-16 398

Appendix F: Enriched Data from the Meta-analysis of Gene Expression

Table F.5: GO Associations Enriched for the Collection of Profiles. The 20 most significantly enriched GO associations for the collection of profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0019221 | cytokine-mediated signaling pathway | 167 | 389 | 1.96E-34 | 1.73E-33
GO:0010467 | gene expression | 260 | 821 | 1.37E-26 | 1.16E-25
GO:0045087 | innate immune response | 274 | 911 | 3.80E-24 | 3.14E-23
GO:0006351 | transcription, DNA-templated | 593 | 2439 | 1.35E-23 | 1.11E-22
GO:0051607 | defense response to virus | 78 | 149 | 1.59E-23 | 1.31E-22
GO:0060337 | type I interferon signaling pathway | 47 | 68 | 5.45E-22 | 4.42E-21
GO:0006915 | apoptotic process | 214 | 690 | 7.69E-21 | 6.16E-20
GO:0060333 | interferon-gamma-mediated signaling pathway | 49 | 77 | 2.13E-20 | 1.70E-19
GO:0043122 | regulation of I-kappaB kinase/NF-kappaB signaling | 95 | 226 | 1.84E-19 | 1.45E-18
GO:0051092 | positive regulation of NF-kappaB transcription factor activity | 63 | 129 | 2.47E-17 | 1.90E-16
GO:0045071 | negative regulation of viral genome replication | 31 | 45 | 4.77E-15 | 3.56E-14
GO:0002224 | toll-like receptor signaling pathway | 57 | 125 | 3.58E-14 | 2.64E-13
GO:0034142 | toll-like receptor 4 signaling pathway | 48 | 98 | 1.15E-13 | 8.36E-13
GO:0031663 | lipopolysaccharide-mediated signaling pathway | 22 | 29 | 1.17E-12 | 8.34E-12
GO:0000122 | negative regulation of transcription from RNA polymerase II promoter | 190 | 696 | 1.22E-12 | 8.65E-12

Table F.5 (continued)

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0002479 | antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent | 39 | 75 | 1.75E-12 | 1.24E-11
GO:0006281 | DNA repair | 133 | 444 | 2.97E-12 | 2.08E-11
GO:0002223 | stimulatory C-type lectin receptor signaling pathway | 52 | 119 | 3.57E-12 | 2.50E-11
GO:0034138 | toll-like receptor 3 signaling pathway | 41 | 83 | 4.53E-12 | 3.16E-11
GO:0035666 | TRIF-dependent toll-like receptor signaling pathway | 39 | 77 | 5.26E-12 | 3.66E-11
GO:0006520 | cellular amino acid metabolic process | 111 | 353 | 6.49E-12 | 4.50E-11
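The "Study Genes" and "Total Genes" columns can be read together as the share of each term's annotated genes that were found enriched. The following is a minimal sketch of that reading, using three rows copied from Table F.5; the function names `enriched_fraction` and `rank_by_fraction` are hypothetical helpers introduced here for illustration and are not part of the analysis pipeline described in this thesis.

```python
# Illustrative rows copied from Table F.5: (GO ID, study genes, total genes).
ROWS = [
    ("GO:0019221", 167, 389),  # cytokine-mediated signaling pathway
    ("GO:0060337", 47, 68),    # type I interferon signaling pathway
    ("GO:0031663", 22, 29),    # lipopolysaccharide-mediated signaling pathway
]


def enriched_fraction(study_genes, total_genes):
    """Return the share of a term's annotated genes that are in the study set."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return study_genes / total_genes


def rank_by_fraction(rows):
    """Sort (GO ID, fraction) pairs from the largest fraction to the smallest."""
    pairs = [(go_id, enriched_fraction(s, t)) for go_id, s, t in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for go_id, fraction in rank_by_fraction(ROWS):
        print(f"{go_id}\t{fraction:.3f}")
```

A term with few annotated genes, such as GO:0031663, can have a high enriched fraction while ranking lower by P-value, so the fraction complements the significance columns rather than replacing them.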

Table F.6: GO Associations Enriched for Bacteria Profiles. The 20 most significantly enriched GO associations for bacteria profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0019221 | cytokine-mediated signaling pathway | 102 | 389 | 4.08E-50 | 1.27E-48
GO:0045087 | innate immune response | 148 | 911 | 2.20E-44 | 6.59E-43
GO:0006915 | apoptotic process | 109 | 690 | 1.46E-31 | 3.96E-30
GO:0060337 | type I interferon signaling pathway | 35 | 68 | 8.78E-30 | 2.30E-28
GO:0002479 | antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent | 33 | 75 | 2.74E-25 | 6.62E-24
GO:0002223 | stimulatory C-type lectin receptor signaling pathway | 39 | 119 | 7.68E-24 | 1.82E-22
GO:0060333 | interferon-gamma-mediated signaling pathway | 31 | 77 | 2.09E-22 | 4.85E-21
GO:0002224 | toll-like receptor signaling pathway | 35 | 125 | 5.06E-19 | 1.06E-17
GO:0034142 | toll-like receptor 4 signaling pathway | 31 | 98 | 9.56E-19 | 1.98E-17

Table F.6 (continued)

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0045944 | positive regulation of transcription from RNA polymerase II promoter | 107 | 959 | 9.68E-19 | 2.00E-17
GO:0016032 | viral process | 82 | 638 | 3.52E-18 | 7.15E-17
GO:0071222 | cellular response to lipopolysaccharide | 29 | 90 | 6.91E-18 | 1.39E-16
GO:0019882 | antigen processing and presentation | 44 | 227 | 8.58E-17 | 1.66E-15
GO:0051607 | defense response to virus | 35 | 149 | 2.28E-16 | 4.34E-15
GO:0034138 | toll-like receptor 3 signaling pathway | 26 | 83 | 7.49E-16 | 1.38E-14
GO:0051092 | positive regulation of NF-kappaB transcription factor activity | 32 | 129 | 7.88E-16 | 1.45E-14
GO:0035666 | TRIF-dependent toll-like receptor signaling pathway | 25 | 77 | 1.01E-15 | 1.85E-14
GO:0007596 | blood coagulation | 64 | 473 | 1.59E-15 | 2.86E-14
GO:0006977 | DNA damage response, signal transduction by p53 class mediator resulting in cell cycle arrest | 23 | 66 | 2.29E-15 | 4.09E-14
GO:0006521 | regulation of cellular amino acid metabolic process | 21 | 63 | 1.03E-13 | 1.72E-12

Table F.7: GO Associations Enriched for Eukaryote Profiles. The 20 most significantly enriched GO associations for eukaryote profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0060333 | interferon-gamma-mediated signaling pathway | 14 | 77 | 3.79E-08 | 3.92E-06
GO:0019221 | cytokine-mediated signaling pathway | 32 | 389 | 1.62E-07 | 1.67E-05
GO:0032496 | response to lipopolysaccharide | 23 | 230 | 3.00E-07 | 2.96E-05
GO:0006952 | defense response | 76 | 1441 | 4.62E-07 | 4.52E-05

Table F.7 (continued)

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0050707 | regulation of cytokine secretion | 16 | 122 | 4.99E-07 | 4.81E-05
GO:0009607 | response to biotic stimulus | 44 | 678 | 7.86E-07 | 7.41E-05
GO:0006955 | immune response | 69 | 1290 | 9.83E-07 | 9.14E-05
GO:0044237 | cellular metabolic process | 309 | 8513 | 1.71E-06 | 1.57E-04
GO:0034097 | response to cytokine | 25 | 300 | 2.84E-06 | 2.52E-04
GO:0045630 | positive regulation of T-helper 2 cell differentiation | 4 | 6 | 3.54E-06 | 3.11E-04
GO:0046006 | regulation of activated T cell proliferation | 8 | 34 | 3.73E-06 | 3.25E-04
GO:0051223 | regulation of protein transport | 42 | 692 | 7.43E-06 | 6.18E-04
GO:0045429 | positive regulation of nitric oxide biosynthetic process | 8 | 37 | 7.52E-06 | 6.22E-04
GO:0006950 | response to stress | 126 | 2992 | 1.41E-05 | 1.13E-03
GO:0043123 | positive regulation of I-kappaB kinase/NF-kappaB signaling | 17 | 176 | 1.60E-05 | 1.27E-03
GO:0002711 | positive regulation of T cell mediated immunity | 7 | 31 | 2.01E-05 | 1.56E-03
GO:0051092 | positive regulation of NF-kappaB transcription factor activity | 14 | 129 | 2.37E-05 | 1.78E-03
GO:0019752 | carboxylic acid metabolic process | 47 | 859 | 3.15E-05 | 2.25E-03
GO:0042592 | homeostatic process | 59 | 1168 | 3.31E-05 | 2.34E-03
GO:0055086 | nucleobase-containing small molecule metabolic process | 32 | 501 | 3.37E-05 | 2.37E-03

Table F.8: GO Associations Enriched for Virus Profiles. The 20 most significantly enriched GO associations for virus profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0010467 | gene expression | 249 | 821 | 2.93E-23 | 2.58E-22
GO:0051607 | defense response to virus | 68 | 149 | 7.24E-17 | 5.87E-16

Table F.8 (continued)

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0019221 | cytokine-mediated signaling pathway | 131 | 389 | 9.06E-17 | 7.32E-16
GO:0060337 | type I interferon signaling pathway | 41 | 68 | 2.75E-16 | 2.21E-15
GO:0045071 | negative regulation of viral genome replication | 29 | 45 | 3.99E-13 | 3.07E-12
GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 52 | 118 | 1.53E-12 | 1.16E-11
GO:0060333 | interferon-gamma-mediated signaling pathway | 39 | 77 | 3.68E-12 | 2.78E-11
GO:0045944 | positive regulation of transcription from RNA polymerase II promoter | 240 | 959 | 8.73E-12 | 6.54E-11
GO:0043123 | positive regulation of I-kappaB kinase/NF-kappaB signaling | 66 | 176 | 1.63E-11 | 1.21E-10
GO:0000122 | negative regulation of transcription from RNA polymerase II promoter | 184 | 696 | 2.05E-11 | 1.52E-10
GO:0045087 | innate immune response | 228 | 911 | 2.91E-11 | 2.15E-10
GO:0006954 | inflammatory response | 119 | 400 | 3.16E-11 | 2.33E-10
GO:0006915 | apoptotic process | 181 | 690 | 5.98E-11 | 4.39E-10
GO:0006412 | translation | 95 | 304 | 1.60E-10 | 1.17E-09
GO:0008285 | negative regulation of cell proliferation | 161 | 614 | 6.98E-10 | 5.01E-09
GO:0043409 | negative regulation of MAPK cascade | 52 | 136 | 9.24E-10 | 6.61E-09
GO:0071345 | cellular response to cytokine stimulus | 60 | 168 | 1.24E-09 | 8.79E-09
GO:0009108 | coenzyme biosynthetic process | 42 | 103 | 3.60E-09 | 2.52E-08
GO:0019058 | viral life cycle | 53 | 145 | 4.23E-09 | 2.96E-08
GO:0006413 | translational initiation | 72 | 223 | 5.15E-09 | 3.58E-08

Table F.1: KEGG Pathways Enriched for the Collection of Profiles. The 20 most significantly enriched KEGG pathways for the collection of profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 is a value less than 10^-4.

ID | Title | Study Genes | Total Genes | P-value
hsa00640 | Propanoate metabolism | 21 | 29 | 0
hsa00620 | Pyruvate metabolism | 22 | 41 | 0
hsa00071 | Fatty acid degradation | 22 | 45 | 0
hsa03050 | Proteasome | 24 | 45 | 0
hsa00310 | Lysine degradation | 25 | 53 | 0
hsa05150 | Staphylococcus aureus infection | 26 | 58 | 0
hsa04623 | Cytosolic DNA-sensing pathway | 28 | 65 | 0
hsa05134 | Legionellosis | 28 | 56 | 0
hsa04621 | NOD-like receptor signaling pathway | 29 | 58 | 0
hsa05230 | Central carbon metabolism in cancer | 30 | 68 | 0
hsa04115 | p53 signaling pathway | 31 | 69 | 0
hsa04146 | Peroxisome | 33 | 83 | 0
hsa05140 | Leishmaniasis | 35 | 75 | 0
hsa05323 | Rheumatoid arthritis | 36 | 92 | 0
hsa05132 | Salmonella infection | 36 | 87 | 0
hsa04070 | Phosphatidylinositol signaling system | 36 | 83 | 0
hsa05133 | Pertussis | 39 | 76 | 0
hsa04066 | HIF-1 signaling pathway | 39 | 104 | 0
hsa04210 | Apoptosis | 39 | 87 | 0
hsa05146 | Amoebiasis | 40 | 110 | 0

Table F.2: KEGG Pathways Enriched for Bacteria Profiles. The 20 most significantly enriched KEGG pathways for bacteria profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 is a value less than 10^-4.

ID | Title | Study Genes | Total Genes | P-value
hsa05110 | Vibrio cholerae infection | 13 | 55 | 0
hsa05144 | Malaria | 14 | 50 | 0
hsa00010 | Glycolysis / Gluconeogenesis | 16 | 68 | 0
hsa05150 | Staphylococcus aureus infection | 17 | 58 | 0
hsa05131 | Shigellosis | 17 | 66 | 0
hsa04622 | RIG-I-like receptor signaling pathway | 18 | 71 | 0
hsa05134 | Legionellosis | 18 | 56 | 0
hsa04640 | Hematopoietic cell lineage | 19 | 88 | 0
hsa04666 | Fc gamma R-mediated phagocytosis | 19 | 92 | 0
hsa05120 | Epithelial cell signaling in Helicobacter pylori infection | 20 | 69 | 0
hsa04660 | T cell receptor signaling pathway | 20 | 105 | 0
hsa04662 | B cell receptor signaling pathway | 20 | 73 | 0
hsa03050 | Proteasome | 21 | 45 | 0
hsa05146 | Amoebiasis | 21 | 110 | 0
hsa04210 | Apoptosis | 21 | 87 | 0
hsa05142 | Chagas disease (American trypanosomiasis) | 22 | 105 | 0
hsa05140 | Leishmaniasis | 22 | 75 | 0
hsa04621 | NOD-like receptor signaling pathway | 22 | 58 | 0
hsa05132 | Salmonella infection | 22 | 87 | 0
hsa05145 | Toxoplasmosis | 23 | 121 | 0

Table F.3: KEGG Pathways Enriched for Eukaryotes Profiles. The 20 most significantly enriched KEGG pathways for eukaryote profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 is a value less than 10^-4.

ID | Title | Study Genes | Total Genes | P-value
hsa05134 | Legionellosis | 12 | 56 | 0
hsa04620 | Toll-like receptor signaling pathway | 15 | 107 | 0
hsa04380 | Osteoclast differentiation | 16 | 132 | 0
hsa04630 | Jak-STAT signaling pathway | 17 | 157 | 0
hsa05168 | Herpes simplex infection | 18 | 187 | 0
hsa05164 | Influenza A | 17 | 178 | 0
hsa04668 | TNF signaling pathway | 13 | 111 | 0
hsa04621 | NOD-like receptor signaling pathway | 9 | 58 | 0
hsa05162 | Measles | 14 | 135 | 0
hsa05143 | African trypanosomiasis | 7 | 35 | 0
hsa04066 | HIF-1 signaling pathway | 12 | 104 | 0
hsa05145 | Toxoplasmosis | 13 | 121 | 0.0001
hsa05140 | Leishmaniasis | 10 | 75 | 0.0001
hsa04064 | NF-kappa B signaling pathway | 11 | 92 | 0.0001
hsa05144 | Malaria | 8 | 50 | 0.0001
hsa05205 | Proteoglycans in cancer | 17 | 205 | 0.0001
hsa04060 | Cytokine-cytokine receptor interaction | 20 | 266 | 0.0001
hsa05132 | Salmonella infection | 10 | 87 | 0.0002
hsa05142 | Chagas disease (American trypanosomiasis) | 11 | 105 | 0.0002
hsa05152 | Tuberculosis | 15 | 180 | 0.0003
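The KEGG table captions state that a reported P-value of 0 stands for a value below 10^-4, so a literal zero should not be used in downstream arithmetic such as log-transformation. The sketch below shows one way to honor that convention when displaying or bounding the values; the names `P_FLOOR`, `format_p`, and `upper_bound` are hypothetical and introduced only for illustration.

```python
# Reporting floor taken from the table captions: a printed 0 means p < 10^-4.
P_FLOOR = 1e-4


def format_p(reported):
    """Render a reported P-value, showing the floor instead of a literal 0."""
    if reported == 0:
        return "< 1e-04"
    return f"{reported:g}"


def upper_bound(reported):
    """Return a usable upper bound on the P-value (the floor when 0 is reported)."""
    return P_FLOOR if reported == 0 else reported


if __name__ == "__main__":
    # Values as they appear in the P-value column of the KEGG tables.
    for value in (0, 0.0001, 0.0003):
        print(format_p(value), upper_bound(value))
```

Using the floor as an upper bound is conservative: the true value is smaller, so any threshold passed by the bound is also passed by the underlying P-value.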

Table F.4: KEGG Pathways Enriched for Viruses Profiles. The 20 most significantly enriched KEGG pathways for virus profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. A P-value of 0 is a value less than 10^-4.

ID | Title | Study Genes | Total Genes | P-value
hsa00630 | Glyoxylate and dicarboxylate metabolism | 17 | 29 | 0
hsa00640 | Propanoate metabolism | 18 | 29 | 0
hsa00310 | Lysine degradation | 26 | 53 | 0
hsa04914 | Progesterone-mediated oocyte maturation | 35 | 89 | 0
hsa04620 | Toll-like receptor signaling pathway | 39 | 107 | 0
hsa05142 | Chagas disease (American trypanosomiasis) | 41 | 105 | 0
hsa04919 | Thyroid hormone signaling pathway | 43 | 120 | 0
hsa05162 | Measles | 51 | 135 | 0
hsa04380 | Osteoclast differentiation | 51 | 132 | 0
hsa04120 | Ubiquitin mediated proteolysis | 51 | 138 | 0
hsa04668 | TNF signaling pathway | 51 | 111 | 0
hsa05160 | Hepatitis C | 52 | 134 | 0
hsa05161 | Hepatitis B | 55 | 147 | 0
hsa05168 | Herpes simplex infection | 64 | 187 | 0
hsa05164 | Influenza A | 69 | 178 | 0
hsa05166 | HTLV-I infection | 81 | 262 | 0
hsa05200 | Pathways in cancer | 113 | 399 | 0
hsa04210 | Apoptosis | 33 | 87 | 0
hsa04611 | Platelet activation | 45 | 132 | 0
hsa04064 | NF-kappa B signaling pathway | 34 | 92 | 0

Appendix G: Data from the Meta-analysis of microRNA Expression Studies

Figure G.1: Phylogenetic Tree of Pathogen Species. The species are divided into 3 groups: Viruses (green), Bacteria (violet), and Eukaryotes (orange). The number of sig- nificant miRNA and the number of corresponding gene targets in each group are shown. Groups that do not produce a significant gene are included. 408 GSM493633,GSM493635, GSM493634, GSM493637, GSM493636, GSM493639, GSM493638, GSM493641 GSM493640, GSM575155,GSM575157 GSM575156, GSM624319,GSM624321 GSM624320, GSM722294,GSM722296, GSM722295, GSM722298, GSM722299 GSM722297, GSM797503,GSM797517, GSM797506, GSM797533, GSM797524, GSM797543, GSM797535, GSM797546, GSM797544, GSM797550 GSM797548, GSM797504,GSM797507, GSM797505, GSM797511, GSM797508, GSM797520, GSM797519, GSM797549 GSM797542, GSM575154 GSM624327 GSM722308 GSM797515,GSM797532, GSM797530, GSM797539, GSM797534, GSM797541 GSM797540, GSM797515,GSM797532, GSM797530, GSM797539, GSM797534, GSM797541 GSM797540, GSM493644,GSM493646, GSM493645, GSM493648, GSM493647, GSM493650, GSM493651 GSM493649, . The GSE series ID, the Platform ID, the series title, the species of H. pylori GSM493642,M. smegmatis GSM493643, GSM575152,M. tuberculosis GSM575153, GSM624325,M. tuberculosis GSM624326, GSM722306,H. pylori GSM722307, GSM797510,H. pylori GSM797513, GSM797510, GSM797513, sion of microRNA-142-3p Reduces WASP Family Proteins and Controls Phagocytosis profile 1 volved the tuberculosis or latentinfection TB II profile 1 profile s in Individualsand Active with Tuberculosis Latent profile 1 cer disease caused bypylori Helicobacter infection in alation. Western profile popu- 1 cer disease caused bypylori Helicobacter infection in alation. Western profile popu- 2 human biopsy samples,pylori H. 
positive versus negativefile pro- 1 Profiles Used in miRNA Expression Meta-analysis GSE23429 GPL9347 Mycobacteria-Dependent Expres- GSE25435 GPL10850 Analysis of the microRNA that in- GSE29190 GPL10850 Comparative miRNA Expression GSE32174 GPL8178 microRNA profiling in duodenal ul- GSE32174 GPL8178 microRNA profiling in duodenal ul- GEO IDGSE19769 Platform GPL9081 miRNA Title expression signatures for Infection Species Control Samples Infection Samples Table G.1: infectious pathogen, control sampleplatforms IDs, and and series infection using sample multiplesamples, IDs vector infectious are notation species of given are the for separated form each textitGSM(i):increment:GSM(i+n) into profile. is distinct used. profiles. Series that For are profiles performed that on contain a multiple large number of 409 GSM838573,GSM838575, GSM838574, GSM838577, GSM838578 GSM838576, GSM838579,GSM838581, GSM838580, GSM838583, GSM838584 GSM838582, GSM851801,GSM851803, GSM851802, GSM851805, GSM851804, GSM851807, GSM851808 GSM851806, GSM1195199, GSM1195200, GSM1195201 GSM1195205, GSM1195206, GSM1195207 GSM838561,GSM838563, GSM838564 GSM838562, GSM851811,GSM851813, GSM851812, GSM851815, GSM851816 GSM851814, GSM1195189 GSM1195189 GSM838555,GSM838557, GSM838558 GSM838556, E. coliE. coli GSM838553, GSM838554, M. tuberculosis GSM838559, GSM851809, GSM838560, M. tuberculosis GSM851810, GSM1195187, GSM1195188, E. 
coli GSM1195187, GSM1195188, tion factor in the innatesponse immune to re- systemicprofile LPS 2 [miRNA] in pulmonary tuberculosis andcoidosis profile sar- 1 ter subcutaneous injection ofnegative Gram and positive bacteria inmice the profile 1 ter subcutaneous injection ofnegative Gram and positive bacteria inmice the profile 2 tion factor in the innatesponse immune to re- systemicprofile 1 LPS [miRNA] GSE33900 GPL8786 The Role of the E2F1 transcrip- GSE34608 GPL7731 Gene and microRNA expression GSE49189 GPL15518 Circulating miRNA expression af- GSE49189 GPL15518 Circulating miRNA expression af- Table G.1 (continued) GEO IDGSE33900 Platform GPL8786 The Title Role of the E2F1 transcrip- Infection Species Control Samples Infection Samples 410 GSM1210582, GSM1210584, GSM1210585, GSM1210587, GSM1210588, GSM1210590, GSM1210592, GSM1210594, GSM1210596, GSM1210598, GSM1210600, GSM1210602, GSM1210604, GSM1210606, GSM1210607, GSM1210609, GSM1210610, GSM1210612, GSM1210613, GSM1210615, GSM1210618, GSM1210620, GSM1210621, GSM1210626, GSM1210627, GSM1210628, GSM1210630, GSM1210633, GSM1210635, GSM1210636, GSM1210638, GSM1210641, GSM1210643, GSM1210644, GSM1210646, GSM1210649, GSM1210651, GSM1210652, GSM1210654, GSM1210657 GSM1210586, GSM1210589, GSM1210591, GSM1210593, GSM1210595, GSM1210597, GSM1210599, GSM1210601, GSM1210603, GSM1210605, GSM1210608, GSM1210611, GSM1210614, GSM1210616, GSM1210617, GSM1210619, GSM1210622, GSM1210623, GSM1210624, GSM1210625, GSM1210629, GSM1210631, GSM1210632, GSM1210634, GSM1210637, GSM1210639, GSM1210640, GSM1210642, GSM1210645, GSM1210647, GSM1210648, GSM1210650, GSM1210653, GSM1210655, GSM1210656, GSM1210658, GSM1210661, GSM1210663, M. 
tuberculosis GSM1210581, GSM1210583, Architecture and Regulatory Impact of microRNAsponse Expression to Infection in profile 1 Re- Table G.1 (continued) GEO IDGSE49951 Platform GPL16770 A Genomic Title Portrait of the Genetic Infection Species Control Samples Infection Samples 411 GSM1210659, GSM1210660, GSM1210662, GSM1210665, GSM1210667, GSM1210668, GSM1210672, GSM1210673, GSM1210675, GSM1210677, GSM1210680, GSM1210682, GSM1210683, GSM1210685, GSM1210688, GSM1210690, GSM1210691, GSM1210693, GSM1210696, GSM1210698, GSM1210699, GSM1210701, GSM1210704, GSM1210706, GSM1210707, GSM1210709, GSM1210712, GSM1210714, GSM1210715, GSM1210719, GSM1210720, GSM1210722 GSM1314316, GSM1314317, GSM1314318, GSM1314319, GSM1314320, GSM1314321, GSM1314322, GSM1314323 GSM1314332, GSM1314333, GSM1314334, GSM1314335, GSM1314336, GSM1314337, GSM1314338, GSM1314339 GSM1314342, GSM1314343, GSM1314344, GSM1314345, GSM1314346, GSM1314347 GSM1210664, GSM1210666, GSM1210669, GSM1210670, GSM1210671, GSM1210674, GSM1210676, GSM1210678, GSM1210679, GSM1210681, GSM1210684, GSM1210686, GSM1210687, GSM1210689, GSM1210692, GSM1210694, GSM1210695, GSM1210697, GSM1210700, GSM1210702, GSM1210703, GSM1210705, GSM1210708, GSM1210710, GSM1210711, GSM1210713, GSM1210717, GSM1210718, GSM1210721 GSM1314326, GSM1314327, GSM1314328, GSM1314329, GSM1314330, GSM1314331 H. pyloriH. 
pylori GSM1314324, GSM1314325, GSM1314340, GSM1314341, cancer profile 1 cancer profile 2 Table G.1 (continued) GEO IDGSE49951 Platformcontinued Title Infection Species Control Samples Infection Samples GSE54397 GPL15159 microRNA expressions in gastric GSE54397 GPL15159 microRNA expressions in gastric 412 GSM1305100, GSM1305101, GSM1305102, GSM1305103, GSM1305106, GSM1305107, GSM1305108, GSM1305109, GSM1305110, GSM1305111, GSM1305114, GSM1305115, GSM1305116, GSM1305117, GSM1305118, GSM1305119, GSM1305122, GSM1305123, GSM1305124, GSM1305125, GSM1305126, GSM1305127, GSM1305129, GSM1305130, GSM1305131, GSM1305132, GSM1305133, GSM1305134, GSM1305135, GSM1305138, GSM1305139, GSM1305140, GSM1305141 GSM381751,GSM381753, GSM381752, GSM381755, GSM381754, GSM381757, GSM381756, GSM381759, GSM381758, GSM381761, GSM381760, GSM381763, GSM381762, GSM381765, GSM381764, GSM381767, GSM381766, GSM381769, GSM381768, GSM381771, GSM381770, GSM381773, GSM381774 GSM381772, GSM410658,GSM410660, GSM410659, GSM410662, GSM410661, GSM410664, GSM410665 GSM410663, GSM381741,GSM381743, GSM381742, GSM381745, GSM381744, GSM381747, GSM381746, GSM381749, GSM381750 GSM381748, GSM410655,GSM410657 GSM410656, GSM1305098, GSM1305099, GSM1305104, GSM1305105, GSM1305112, GSM1305113, GSM1305120, GSM1305121, GSM1305128, GSM1305136, GSM1305137 O. viverrini GSM1305096, GSM1305097, Hepatitis C virus GSM381739, GSM381740, Human herpesvirus 8 hepatitis c virus (HCV) liversamples biopsy profile 1 microRNAs in AIDS-KS(and biopsies normal skinprofile control 1 biopsies) duced cholangiocarcinoma bycroarray: profile mi- 1 GSE15288 GPL6955 microRNA expression in human GSE16353 GPL8617 The profile of cellular and KSHV Table G.1 (continued) GEO IDGSE53992 Platform GPL18159 The miRNAome Title of O. 
viverrini in- Infection Species Control Samples Infection Samples 413 GSM425704,GSM425706, GSM425707 GSM425705, GSM556724,GSM556726, GSM556727 GSM556725, GSM591391,GSM591393, GSM591392, GSM591398, GSM591397, GSM591400 GSM591399, GSM591394,GSM591396, GSM591395, GSM591402, GSM591401, GSM591404 GSM591403, GSM612738, GSM612739 GSM613377,GSM613379, GSM613378, GSM613381 GSM613380, GSM544417, GSM544418 GSM544419, GSM544420 GSM556722, GSM556723 GSM591405,GSM591407, GSM591406, GSM591409, GSM591408, GSM591411, GSM591412 GSM591410, GSM591405,GSM591407, GSM591406, GSM591409, GSM591408, GSM591411, GSM591412 GSM591410, GSM612749 GSM613384 GSM425700,GSM425702, GSM425703 GSM425701, Human herpesvirus 8 Human immunodefi- ciency virus Hepatitis B virus GSM556720,Human immunodefi- GSM556721, ciency virus Human immunodefi- ciency virus Alphapapillomavirus GSM612747, GSM612748, Influenza A H1N1 GSM613382, GSM613383, tivity contributes to perturbationlymphocyte of miRNAs by HIV-1 pro- file 1 chronically infected HIV-1 patients, LTNPs and healthy controls1 profile chronically infected HIV-1 patients, LTNPs and healthy controls2 profile lates the expression of hostNAs microR- profile 1 Expression inMononuclear Cells Peripheral ofPatients critically Blood with ill Influenzaprofile 1 A (H1N1) microRNAs in KSHV infected lym- phatic endothelial cellshours post (6 infection) profile and 1 72 during the HBV acuteinfections profile and 1 chronic GSE21892 GPL10428 Tat RNA silencing suppressor ac- GSE24022 GPL8227 miRNA profiling of CD4+ T cells in GSE24022 GPL8227 miRNA profiling of CD4+ T cells in GSE24908 GPL6955 Human Papillomavirus 16 E5 modu- GSE24956 GPL10850 Microarray Analysis of MicroRNA Table G.1 (continued) GEO IDGSE17016 Platform GPL9081 The Title profile of cellular and KSHV GSE22378 GPL8469 Analysis of micorRNA expression Infection Species Control Samples Infection Samples 414 GSM645123,GSM645125 GSM645124, GSM645126,GSM645128 GSM645127, GSM800350,GSM800352 GSM800351, 
GSM800353,GSM800355 GSM800354, GSM825760,GSM825763, GSM825762, GSM825768, GSM825771 GSM825767, GSM645129,GSM645131 GSM645130, GSM654620, GSM654621 GSM654622, GSM654623 GSM654620, GSM654621 GSM654624, GSM654625 GSM800349 GSM800349 GSM825761,GSM825769, GSM825765, GSM825775, GSM825774, GSM825777, GSM825778 GSM825776, GSM645129,GSM645131 GSM645130, Rabbies Virus strain FJDRV Rabbies Virus strain FJDRV Human herpesvirus 4 Human herpesvirus 4 Rhinovirus ARhinovirus A GSM800347,Human immunodefi- GSM800348, ciency virus GSM800347, GSM800348, analysis of microRNA expression in of mice infected with FJDRV, a streetvirulence, rabies and ERA, virusadapted a virus laboratory- with with lowerprofile 2 high virulence during Epstein-Barr virusLMP1 encoded and LMP2ATW03 cells transfected profile 1 in rhinosinusitis profile 2 ripheral bloodfrom mononuclear HIV-1-infected elite cells sors, suppres- viremicfected patients, control donors profile and 1 unin- analysis of microRNA expression in brains of mice infected with FJDRV, a streetvirulence, rabies and ERA, virusadapted a virus laboratory- with with lowerprofile 1 high virulence during Epstein-Barr virusLMP1 encoded and LMP2ATW03 cells transfected profile 2 in rhinosinusitis profile 1 GSE26269 GPL11354 Genome-wide identification and GSE26596 GPL11442 Analysis of microRNA expression GSE32300 GPL7724 MicroRNA expressionGSE33387 in GPL14822 chronic NanoString miRNA profiling of pe- Table G.1 (continued) GEO IDGSE26269 Platform GPL11354 Genome-wide Title identification and Infection Species Control SamplesGSE26596 GPL11442 Analysis of microRNA expression Infection Samples GSE32300 GPL7724 MicroRNA expression in chronic 415 GSM825758,GSM825764, GSM825759, GSM825770, GSM825766, GSM825773 GSM825772, GSM828595,GSM828600, GSM828598, GSM828605, GSM828608 GSM828604, GSM828596,GSM828601, GSM828597, GSM828607, GSM828609 GSM828603, GSM837606,GSM837608, GSM837607, GSM837610, GSM837611 GSM837609, GSM887886,GSM887888 GSM887887, 
GSM894219,GSM894235, GSM894227, GSM894251, GSM894259 GSM894243, GSM828599,GSM828606, GSM828602, GSM828611, GSM828612 GSM828610, GSM828599,GSM828606, GSM828602, GSM828611, GSM828612 GSM828610, GSM837600,GSM837602, GSM837601, GSM837604, GSM837605 GSM837603, GSM887885 GSM894230,GSM894238, GSM894231, GSM894246, GSM894239, GSM894254, GSM894255 GSM894247, GSM825761,GSM825769, GSM825765, GSM825775, GSM825774, GSM825777, GSM825778 GSM825776, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Influenza A H1N1 GSM887883,Influenza A H1N1 GSM887884, GSM894222, GSM894223, clear cell miRNA profile1-infected s elite suppressors, of viremic HIV- patients, anddonors profile 1 uninfected control clear cell miRNA profile1-infected s elite suppressors, of viremic HIV- patients, anddonors profile 2 uninfected control miRNA and mRNA inripheral Primary Pe- Blood MononuclearInfected Cells with Humanciency Immunodefi- Virus (HIV-1) [miRNA] pro- file 1 Control vs.polyI:C Influenza-infected stimulated. profile vs. 
1 tion with H1N1(A/Mexico/InDRE4487/H1N1/2009) influenza A virus profile 1 ripheral bloodfrom mononuclear HIV-1-infected elite cells sors, suppres- viremicfected patients, control donors profile and 2 unin- GSE33492 GPL14842 TaqMan Peripheral blood mononu- GSE33492 GPL14842 TaqMan Peripheral blood mononu- GSE33837 GPL14907 Comparative Expression profile of GSE36316 GPL7722 Murine primary dendriticGSE36461 GPL15271 cells: MiRNA profiling during infec- Table G.1 (continued) GEO IDGSE33387 Platform GPL14822 NanoString miRNA profiling Title of pe- Infection Species Control Samples Infection Samples 416 GSM894266,GSM894282, GSM894274, GSM894298, GSM894306 GSM894290, GSM906424,GSM906426 GSM906425, GSM945524,GSM945533, GSM945526, GSM945537, GSM945535, GSM945541, GSM945542 GSM945539, GSM906427,GSM906429 GSM906428, GSM928221, GSM928245 GSM928224, GSM928248 GSM928227, GSM928251 GSM928230, GSM928254 GSM928233, GSM928257 GSM928236, GSM928260 GSM928239, GSM928263 GSM928242, GSM928266 GSM945522,GSM945529, GSM945528, GSM945531, GSM945530, GSM945543, GSM945545 GSM945532, GSM894269,GSM894277, GSM894270, GSM894285, GSM894278, GSM894293, GSM894286, GSM894301, GSM894302 GSM894294, Influenza A H7N7 GSM894261, GSM894262, Human herpesvirus 4 Xenotropic murine leukemiarelated virus virus- Xenotropic murine leukemiarelated virus virus- Xenotropic murine leukemiarelated virus virus- Xenotropic murine leukemiarelated virus virus- Human immunodefi- ciency virus mor suppressor miR-34a ispromoting in growth EBV-infected Bprofile 1 cells Associated microRNAs in Four Cell Types in Culture profile 1 with H7N7(A/Ck/Germany/R28/H7N7/2003) influenzaprofile 1 A virus Associated microRNAs in Four Cell Types in Culture profile 2 Associated microRNAs in Four Cell Types in Culture profile 3 Associated microRNAs in Four Cell Types in Culture profile 4 in LTNPs and Chronic HIV(CHI) patients profile 1 GSE36926 GPL7722 The Epstein-Barr virus induced tu- GSE37788 GPL11432 Identification of XMRV Infection- 
Table G.1 (continued) GEO IDGSE36462 Platform GPL15271 MiRNA profiling Title during infection Infection SpeciesGSE37788 GPL11432 Identification of Control XMRV Samples Infection- GSE37788 GPL11432 Identification of XMRV Infection- GSE37788 GPL11432 Identification Infection of Samples XMRV Infection- GSE38556 GPL11487 MicroRNA profiling of monocytes 417 GSM1000488, GSM1000489, GSM1000490, GSM1000491, GSM1000492, GSM1000493, GSM1000494, GSM1000495, GSM1000496, GSM1000497, GSM1000498, GSM1000499, GSM1000500, GSM1000501, GSM1000502, GSM1000503, GSM1000504, GSM1000505 GSM1000523, GSM1000524, GSM1000525, GSM1000526, GSM1000527, GSM1000528, GSM1000529, GSM1000530, GSM1000531 GSM1079564, GSM1079565 GSM1079610, GSM1079611 GSM1079612, GSM1079613 GSM1000471, GSM1000472, GSM1000473, GSM1000474, GSM1000475, GSM1000476, GSM1000477, GSM1000478, GSM1000479, GSM1000480, GSM1000481, GSM1000482, GSM1000483, GSM1000484, GSM1000485, GSM1000486, GSM1000487 GSM1079086, GSM1079087 GSM1079088, GSM1079089 GSM1079562, GSM1079563, GSM1079608, GSM1079609 GSM1079562, GSM1079563, GSM1079608, GSM1079609 GSM1079562, GSM1079563, GSM1079608, GSM1079609 GSM1000471, GSM1000472, GSM1000473, GSM1000474, GSM1000475, GSM1000476, GSM1000477, GSM1000478, GSM1000479, GSM1000480, GSM1000481, GSM1000482, GSM1000483, GSM1000484, GSM1000485, GSM1000486, GSM1000487 Hepatitis C virus GSM1000469, GSM1000470, Hepatitis C virus GSM1000469, GSM1000470, Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus ically expressed in HCV-associated hepatocellular carcinoma profile 2 ronal dysfunction throughtion of microRNAs. disrup- profile 1 ronal dysfunction throughtion of microRNAs. disrup- profile 3 ronal dysfunction throughtion of microRNAs. disrup- profile 4 ically expressed in HCV-associated hepatocellular carcinoma profile 1 ronal dysfunction throughtion of microRNAs. 
disrup- profile 2 GSE40744 GPL14613 Identification of microRNAs specif- GSE44265 GPL8227 HIV-1 Tat protein promotes neu- GSE44265 GPL8227 HIV-1 Tat protein promotesGSE44265 neu- GPL8227 HIV-1 Tat protein promotes neu- Table G.1 (continued) GEO IDGSE40744 Platform GPL14613 Identification of microRNAs Title specif- Infection Species Control Samples Infection Samples GSE44265 GPL8227 HIV-1 Tat protein promotes neu- 418 GSM1047653, GSM1047654, GSM1079566 GSM1047655, GSM1047656 GSM1084006, GSM1084007, GSM1084008 GSM1047649, GSM1047650 GSM1047653, GSM1047654 GSM1047649, GSM1047650 GSM1047655, GSM1047656 GSM1173253, GSM1173254 GSM1173255, GSM1173256 GSM1079562, GSM1079563 GSM1079564, GSM1079565 GSM1079562, GSM1079563 GSM1079610, GSM1079611 GSM1047649, GSM1047650, GSM1079567 GSM1047649, GSM1047650, GSM1079567 GSM1084011, GSM1084012 GSM1047649, GSM1047650 GSM1047651, GSM1047652 Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Human immunodefi- ciency virus Hepatitis C virus GSM1084009, GSM1084010, ronal dysfunction throughtion of microRNAs. disrup- profile 2 HIV-1 Vpr protein leadsvelopment to of the neurocognitive de- disor- ders. profile 2 virus infection profile 1 ronal dysfunction throughtion of microRNAs. disrup- profile 1 ronal dysfunction throughtion of microRNAs. disrup- profile 3 ronal dysfunction throughtion of microRNAs. disrup- profile 1 HIV-1 Vpr protein leadsvelopment to of the neurocognitive de- disor- ders. profile 1 HIV-1 Vpr protein leadsvelopment to of the neurocognitive de- disor- ders. profile 2 HIV-1 Vpr protein leadsvelopment to of the neurocognitive de- disor- ders. 
profile 1 GSE44265 GPL16346 HIV-1 Tat protein promotes neu- GSE44266 GPL16346 Deregulation of microRNAs by GSE44369 GPL13264 MiRNA analysis during hepatitis C Table G.1 (continued) GEO IDGSE44265 Platform GPL16346 HIV-1 Tat Title protein promotes neu- GSE44265 GPL16346 HIV-1 Tat protein promotes neu- GSE44265 GPL16384 HIV-1 Tat protein promotes neu- GSE44266 Infection Species GPL8227 Deregulation of Control Samples microRNAs by GSE44266 GPL8227 Deregulation of microRNAs by GSE44266 Infection Samples GPL16346 Deregulation of microRNAs by 419 GSM1122593, GSM1122594, GSM1122595 GSM1122596, GSM1122597, GSM1122598 GSM1122599, GSM1122600, GSM1122601 GSM1122602, GSM1122603, GSM1122604 GSM1122605, GSM1122606, GSM1122607 GSM1123138, GSM1123139, GSM1123140 GSM1123141, GSM1123142, GSM1123143 GSM1122590, GSM1122591, GSM1122592 GSM1122590, GSM1122591, GSM1122592 GSM1122590, GSM1122591, GSM1122592 GSM1122590, GSM1122591, GSM1122592 GSM1123137 GSM1123137 GSM1122590, GSM1122591, GSM1122592 Lymphocytic chori- omeningitismarenavirus mam- Lymphocytic chori- omeningitismarenavirus mam- Lymphocytic chori- omeningitismarenavirus mam- Lymphocytic chori- omeningitismarenavirus mam- Lymphocytic chori- omeningitismarenavirus mam- Influenza A H1N1 GSM1123135, GSM1123136, Influenza A H1N1 GSM1123135, GSM1123136, Memory CD8 T cell Fatesulating by Proliferation Mod- in ResponseInfection to profile 2 Memory CD8 T cell Fatesulating by Proliferation Mod- in ResponseInfection to profile 3 Memory CD8 T cell Fatesulating by Proliferation Mod- in ResponseInfection to profile 4 Memory CD8 T cell Fatesulating by Proliferation Mod- in ResponseInfection to profile 5 profile s ofwith mouse 2009 lungsfluenza infected pandemic virus andvirus H1N1 profile 1 seasonal in- H1N1 profile s ofwith mouse 2009 lungsfluenza infected pandemic virus andvirus H1N1 profile 2 seasonal in- H1N1 Memory CD8 T cell Fatesulating by Proliferation Mod- in ResponseInfection to profile 1 GSE46052 GPL7724 miR-17 92 Regulates 
Effector and GSE46052 GPL7724 miR-17 92 Regulates Effector and GSE46052 GPL7724 miR-17 92 Regulates Effector and GSE46052 GPL7724 miR-17 92 Regulates Effector and GSE46087 GPL17018 Differential microRNA expression GSE46087 GPL17018 Differential microRNA expression Table G.1 (continued) GEO IDGSE46052 Platform GPL7724 miR-17 Title 92 Regulates Effector and Infection Species Control Samples Infection Samples 420 GSM1125524, GSM1125525, GSM1125526, GSM1125527, GSM1125528, GSM1125529, GSM1125530, GSM1125531, GSM1125532, GSM1125533, GSM1125534, GSM1125535, GSM1125536, GSM1125537, GSM1125538, GSM1125539, GSM1125540, GSM1125541, GSM1125542, GSM1125543, GSM1125544, GSM1125545, GSM1125546, GSM1125547, GSM1125548, GSM1125549, GSM1125550, GSM1125551, GSM1125552, GSM1125553, GSM1125554, GSM1125555, GSM1125556, GSM1125557, GSM1125558, GSM1125559, GSM1125560, GSM1125561, GSM1125562, GSM1125563, GSM1125564, GSM1125565, GSM1125566 GSM1208925, GSM1208926 GSM1329059, GSM1329060, GSM1329070 GSM1329061, GSM1329062, GSM1329063, GSM1329064, GSM1329065 GSM1208922 GSM1329054, GSM1329055, GSM1329056, GSM1329057, GSM1329058 GSM1329054, GSM1329055, GSM1329056, GSM1329057, GSM1329058 GSM1125569, GSM1125570, GSM1125571, GSM1125572, GSM1125573, GSM1125574, GSM1125575, GSM1125576, GSM1125577, GSM1125578, GSM1125579, GSM1125580, GSM1125581, GSM1125582, GSM1125583, GSM1125584, GSM1125585, GSM1125586, GSM1125587, GSM1125588, GSM1125589 Influenza A H1N1 GSM1125567, GSM1125568, Chikungunya virus GSM1208920, GSM1208921, Human immunodefi- ciency virus Human immunodefi- ciency virus HEK293T cellsChikungunya virus profile 1 infected with pression of MEF2Cmonkey and in human SIV/HIV neurons neuro- in logical disease profile 1 pression of MEF2Cmonkey and in human SIV/HIV neurons neuro- in logical disease profile 2 NAs in Influenza A patients.1 profile GSE49884 GPL14613 microRNA expression profileGSE55069 in GPL15829 MicroRNA-21 dysregulates the ex- GSE55069 GPL15829 MicroRNA-21 dysregulates the ex- Table 
G.1 (continued) GEO IDGSE46176 Platform GPL17036 Expression of Title circulating microR- Infection Species Control Samples Infection Samples 421 All miR-142 miR-652 GSM1329066, GSM1329067, GSM1329068, GSM1329069 GSM1379524, GSM1379525, GSM1379526, GSM1379527, GSM1379528, GSM1379529, GSM1379530, GSM1379531 GSM1381321, GSM1381322, GSM1381323 BacteriaViruses miR-138 & miR-564 miR-629 EukaryotesViruses & miR-663 GSM1379519, GSM1379520, GSM1379521, GSM1379522, GSM1379523 GSM1381320 GSM1329054, GSM1329055, GSM1329056, GSM1329057, GSM1329058 . The table lists miRNA that are significantly upregulated for taxonomical Bacteria &karyotes Eu- miR-21 miR-221 miR-329 miR-548 miR-769 miR-8 Human immunodefi- ciency virus Human immunodefi- ciency virus Enterovirus A GSM1381318, GSM1381319, Viruses miR-190 miR-205 miR-299 miR-367 miR-492 miR-596 miR-608 miR-634 miR-639 Eukaryotes miR-1202 miR-2110 miR-218 miR-2861 miR-431 miR-645 on StimulatedMononuclear Peripheral Cellsinfected Blood Elite from Controllers,Patients, Viremic HIV-1 Uninfected Treated Donors profile 1 Patients and HT29 cells profile 1 pression of MEF2Cmonkey and in human SIV/HIV neurons neuro- in logical disease profile 3 Up Regulated miRNA From the Meta-analysis Bacteria let-7 miR-10 miR-101 miR-103 miR-1224 miR-1225 miR-1226 miR-1228 miR-124 GSE57323 GPL17391 microRNA Expression profile GSE57372 GPL14613 miRNA profiling of EV71 infected Table G.1 (continued) GEO IDGSE55069 Platform GPL15829 MicroRNA-21 dysregulates Title the ex- Infection Species Control Samples Infection Samples Table G.2: groups and miRNA that are commonlysignificant upregulated miRNA. in combinations of the groups. 
The Bacteria taxonomical group produces the most 422 All BacteriaViruses & EukaryotesViruses & Bacteria &karyotes Eu- Viruses miR-668 miR-941 Eukaryotes Table G.2 (continued) Bacteria miR-126 miR-128 miR-129 miR-130 miR-132 miR-133 miR-134 miR-135 miR-136 miR-139 miR-140 miR-144 miR-145 miR-146 miR-148 miR-15 miR-150 miR-154 miR-155 miR-17 miR-181 miR-182 miR-183 miR-184 miR-185 miR-186 423 All BacteriaViruses & EukaryotesViruses & Bacteria &karyotes Eu- Viruses Eukaryotes Table G.2 (continued) Bacteria miR-188 miR-19 miR-191 miR-192 miR-193 miR-194 miR-196 miR-197 miR-199 miR-202 miR-210 miR-22 miR-223 miR-23 miR-24 miR-25 miR-26 miR-27 miR-28 miR-29 miR-290 miR-296 miR-30 miR-320 miR-322 miR-324 424 All BacteriaViruses & EukaryotesViruses & Bacteria &karyotes Eu- Viruses Eukaryotes Table G.2 (continued) Bacteria miR-326 miR-328 miR-330 miR-331 miR-335 miR-337 miR-338 miR-339 miR-34 miR-340 miR-342 miR-344 miR-361 miR-362 miR-363 miR-365 miR-368 miR-370 miR-373 miR-374 miR-378-2 miR-379 miR-423 miR-425 miR-433 miR-451 425 All BacteriaViruses & EukaryotesViruses & Bacteria &karyotes Eu- Viruses Eukaryotes Table G.2 (continued) Bacteria miR-454 miR-483 miR-484 miR-485 miR-486 miR-491 miR-493 miR-497 miR-498 miR-500 miR-503 miR-505 miR-506 miR-515 miR-541 miR-542 miR-550 miR-557 miR-574 miR-575 miR-584 miR-590 miR-598 miR-601 miR-610 miR-623 426 All BacteriaViruses & EukaryotesViruses & Bacteria &karyotes Eu- Viruses Eukaryotes Table G.2 (continued) Bacteria miR-624 miR-625 miR-627 miR-638 miR-650 miR-654 miR-659 miR-665 miR-671 miR-7 miR-743 miR-744 miR-760 miR-766 miR-770 miR-874 miR-877 miR-9 miR-936 miR-939 miR-940 miR-942 miR-95 miR-96 427 All BacteriaViruses & EukaryotesViruses & . 
The table lists miRNA that are significantly downregulated for taxo- Bacteria &karyotes Eu- Viruses miR-1 miR-137 miR-153 miR-202 miR-32 miR-326 miR-384 miR-412 miR-488 miR-499 miR-544 miR-548 miR-549 miR-551 miR-553 miR-555 miR-558 miR-561 miR-563 miR-569 miR-573 miR-576 miR-577 Eukaryotes miR-1203 miR-122 miR-1258 miR-1260b miR-1261 miR-1272 miR-1275 miR-1288 miR-1298 miR-1301 miR-1305 miR-1469 miR-216 miR-3117 miR-3127 miR-3160 miR-3190 miR-3198 miR-365 miR-567 miR-604 miR-759 Down Regulated miRNA From the Meta-analysis Bacteria miR-467 Table G.3: nomical groups and miRNA thatthe are most commonly significant upregulated miRNA. in combinations of the groups. The viruses taxonomical group produces 428 All BacteriaViruses & EukaryotesViruses & Bacteria &karyotes Eu- Viruses miR-580 miR-581 miR-586 miR-587 miR-589 miR-593 miR-597 miR-598 miR-599 miR-600 miR-606 miR-607 miR-611 miR-613 miR-614 miR-618 miR-619 miR-621 miR-622 miR-624 miR-627 miR-632 miR-633 miR-635 miR-637 miR-640 Eukaryotes Table G.3 (continued) Bacteria 429 All Viruses Genemir-596mir-639 P-value 8.1720mir-367 FDR 5.5466 1.36E-14 mir-492 5.5266 4.12E-07 mir-941 5.4524 4.45E-07 mir-608 4.9482 6.31E-07 mir-668 4.9069 6.27E-06 mir-629 4.8420 7.24E-06 mir-190 4.5836 9.86E-06 mir-634 4.5316 3.06E-05 mir-663 4.4713 3.85E-05 4.3627 4.69E-05 7.50E-05 BacteriaViruses & EukaryotesViruses & . 
The 100 miRNA that are upregulated and generate the Eukaryotes Genemir-8mir-548mir-663 P-valuemir-769 6.6771 FDR 6.6630mir-329 2.70E-09 5.0012 1.98E-09 mir-21 4.8421 1.05E-05 mir-218 4.6473 1.90E-05 mir-221 4.66E-05 4.5043mir-2110 4.4555 8.20E-05 mir-645 4.3378 8.43E-05 mir-431 4.2399 1.33E-04 1.91E-04 4.1762 4.1186 2.43E-04 2.82E-04 Bacteria &karyotes Eu- Viruses miR-641 miR-643 miR-644 miR-645 miR-651 miR-657 miR-658 miR-769 miR-802 Bacteria Genemir-17let-7 P-value 16.5433mir-515 FDR 4.00E-58 mir-154 15.9131 14.9628mir-15 5.54E-54 8.68E-48 14.8290mir-10 4.78E-47 14.1916mir-500 3.97E-43 13.9480mir-30 13.3569 1.02E-41 mir-181 2.79E-38 12.8544mir-29 10.9745 1.77E-35 mir-130 8.40E-26 10.4550 10.4496 1.98E-23 1.90E-23 Eukaryotes The 100 Most Up Regulated miRNA from the Meta-analysis Table G.3 (continued) Bacteria Collection Gene P-value FDR mir-17mir-500 10.3464mir-10 10.0619 3.62E-22 mir-15 4.39E-21 9.1580mir-515 8.8444mir-596 1.95E-17 8.2470mir-132 2.63E-16 8.1143 1.18E-14 mir-181 7.8886 2.02E-14 mir-130 7.8395 1.24E-13 mir-21 7.7206 1.57E-13 mir-188 3.68E-13 7.6189 7.5902 7.40E-13 8.49E-13 Table G.4: least P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles. 
430 Viruses Genemir-299mir-142 P-value 4.3294mir-564 FDR 4.3289 8.47E-05 mir-205 4.1891 8.36E-05 mir-138 4.0406 1.47E-04 mir-652 4.0285 2.73E-04 3.8920 2.83E-04 4.88E-04 Eukaryotes Genemir-652mir-142mir-1202 P-value 4.0756mir-2861 FDR 4.0196 3.18E-04 3.9655 3.91E-04 3.9082 4.64E-04 5.57E-04 Bacteria Genemir-132mir-188 P-value 10.3858mir-148 FDR 3.39E-23 10.3418mir-192 4.93E-23 10.2994mir-21 7.10E-23 9.5887mir-23 7.77E-20 9.3965mir-505 9.3603mir-193 4.52E-19 8.8467mir-7 5.97E-19 8.7110 6.05E-17 mir-368 1.89E-16 mir-142 8.6447 8.6371mir-25 3.19E-16 8.5826 3.24E-16 mir-140 4.94E-16 8.4896mir-199 8.4448mir-146 1.05E-15 8.4237 1.46E-15 mir-221 8.4237 1.68E-15 mir-362 8.3299 1.62E-15 mir-340 8.2957 3.07E-15 mir-28 8.2833 3.95E-15 mir-26 2.06E-15 8.2746mir-629 8.2073mir-324 2.14E-15 8.0491mir-506 2.22E-15 7.9902 7.48E-15 mir-665 7.9565 1.09E-14 mir-126 7.8119 1.41E-14 mir-19 7.8054 4.28E-14 4.49E-14 7.7861 5.02E-14 Table G.4 (continued) Collection Genemir-1224 7.4239 P-value FDR 2.79E-12 mir-668let-7 6.9608mir-324 5.39E-11 mir-29 6.8121 6.8046mir-625 1.46E-10 1.47E-10 6.7104mir-330 6.6737mir-30 2.69E-10 6.6160 3.31E-10 mir-25 4.70E-10 6.5924mir-290 6.4173mir-564 5.31E-10 6.3873mir-154 1.64E-09 6.3727 1.92E-09 mir-368 6.3563 2.04E-09 mir-665 6.3507 2.12E-09 mir-663 6.2914 2.14E-09 mir-542 6.2593 3.04E-09 mir-608 6.1099 3.63E-09 mir-7 6.0387 9.08E-09 mir-34 1.38E-08 mir-340 5.9893 5.9675 1.82E-08 5.8532 2.02E-08 3.94E-08 mir-629mir-148 7.3934mir-1225 7.3002 3.26E-12 mir-221 7.1409 6.11E-12 mir-142 1.85E-11 7.1088mir-138 7.0679 2.19E-11 6.9966 2.79E-11 4.40E-11 431 Viruses Gene P-value FDR Viruses Genemir-587mir-624 P-value -4.4883mir-632 -4.5010 FDR 4.48E-05 mir-488 -4.5270 4.29E-05 mir-558 -4.5852 3.86E-05 mir-561 -4.7445 3.09E-05 mir-640 -4.7517 1.45E-05 -4.7546 1.43E-05 1.43E-05 . 
The 100 miRNA that are downregulated and generate Eukaryotes Genemir-1203mir-122 -3.9595 P-valuemir-216 4.62E-04 FDR -4.0113mir-1469 3.94E-04 -4.0793mir-1288 -4.1354 3.23E-04 mir-759 2.71E-04 -4.1388mir-3160 2.76E-04 -4.3239 -4.3520 1.36E-04 1.30E-04 Eukaryotes Gene P-value FDR Bacteria Genemir-467 P-value -4.3491 FDR 2.71E-05 Bacteria Genemir-1225mir-22 7.7725 P-valuemir-330 FDR 5.51E-14 7.7540mir-329 7.6927mir-223 6.14E-14 7.5827 9.73E-14 mir-575 7.5255 2.22E-13 mir-378-2 7.5163 3.38E-13 7.4523mir-361 3.54E-13 mir-1224 5.62E-13 7.4194mir-766 7.4108 7.05E-13 mir-374 7.35E-13 7.2920mir-625 7.2865 1.75E-12 mir-34 7.2265 1.79E-12 2.73E-12 7.1851 3.62E-12 The 100 Most Down Regulated miRNA from the Meta-analysis Table G.4 (continued) Collection Genemir-654mir-454 P-value 5.8360mir-126 FDR 5.8082 4.26E-08 mir-1228 5.8077 4.91E-08 mir-484 5.8003 4.81E-08 mir-129 4.91E-08 5.7765mir-8 5.7585 5.52E-08 mir-374 6.01E-08 mir-493 5.7410 5.6700mir-492 6.52E-08 5.6479 9.68E-08 mir-134 5.6389 1.08E-07 mir-486 5.5632 1.11E-07 mir-197 5.5306 1.69E-07 5.5152 1.99E-07 2.14E-07 Collection Genemir-548mir-1 P-value -3.9744mir-641 FDR 1.84E-04 mir-137 -4.0197 -4.1334mir-553 -4.1571 1.55E-04 1.01E-04 mir-384 -4.2336 9.34E-05 mir-586 -4.3344 7.05E-05 -4.3764 5.01E-05 4.23E-05 Table G.5: the least P-values are given with corresponding Z-scores for taxonomical groups and the collection of all profiles. 
432 Viruses Genemir-202mir-544 P-value -4.8186mir-589 -4.8274 FDR 1.06E-05 mir-657 -4.9323 1.04E-05 mir-580 -4.9369 6.50E-06 mir-645 -4.9657 6.49E-06 mir-637 -5.0329 5.86E-06 mir-137 -5.0336 4.23E-06 mir-658 -5.0834 4.32E-06 mir-619 -5.1120 3.41E-06 mir-644 -5.1327 3.01E-06 mir-576 -5.1393 2.77E-06 mir-598 -5.1466 2.74E-06 mir-651 -5.2371 2.71E-06 mir-569 -5.3283 1.72E-06 mir-555 -5.3358 1.07E-06 mir-581 -5.3406 1.06E-06 mir-643 -5.3833 1.07E-06 mir-635 -5.4419 8.68E-07 mir-627 -5.5022 6.47E-07 mir-586 -5.6187 4.93E-07 mir-1 -5.6744 2.83E-07 mir-618 2.13E-07 mir-611 -5.6852 -5.7616mir-599 -5.7932 2.09E-07 1.39E-07 mir-621 -5.8220 1.21E-07 -5.8243 1.07E-07 1.11E-07 Eukaryotes Genemir-1301mir-365 -4.4801 P-valuemir-3198 7.87E-05 FDR -4.4856mir-3127 -4.5026 8.06E-05 mir-1260b 7.83E-05 -4.5556 -4.8911mir-1305 6.81E-05 1.59E-05 mir-604 -4.9141mir-1275 1.52E-05 -5.0219mir-567 -5.1328 1.03E-05 mir-3117 6.33E-06 -5.2207mir-1258 -5.4051 4.39E-06 mir-3190 1.79E-06 -5.5712mir-1272 8.01E-07 -5.6093mir-1261 7.50E-07 -5.8033mir-1298 2.88E-07 -6.1502 4.29E-08 -7.9771 3.44E-13 Bacteria Gene P-value FDR Table G.5 (continued) Collection Genemir-597mir-613 P-value -4.8556mir-563 -5.1284 FDR 5.10E-06 mir-577 -5.4332 1.46E-06 mir-622 -6.3694 3.27E-07 -25.2503 2.02E-09 4.52E-137 433 143 Viruses Genemir-384mir-607 P-value -5.8583mir-553 -5.8626 FDR 9.56E-08 mir-614 -5.8906 9.87E-08 mir-769 -5.9173 8.85E-08 mir-802 -6.0004 8.03E-08 mir-32 -6.0325 5.17E-08 mir-641 -6.0784 4.57E-08 mir-412 -6.3011mir-577 3.72E-08 -6.5118 9.89E-09 mir-551 -7.4611 2.73E-09 mir-593 -7.5092 3.51E-12 mir-597 -7.5640 2.74E-12 mir-563 -9.1851 2.05E-12 mir-613 -9.2637 1.41E-17 mir-548 -9.8577 8.51E-18 mir-622 -10.0080 3.88E-20 -25.8178 1.31E-20 2.66E- Eukaryotes Gene P-value FDR Bacteria Gene P-value FDR Table G.5 (continued) Collection Gene P-value FDR 434

Table G.6: mRNA Targets of Significant miRNA. Significant miRNA are mapped to their mRNA targets for the entire collection and for each taxonomical group. Only mRNA targets that correspond to a gene are listed. Bacteria profiles map to the greatest number of genes, over 10,000.

Collection | Bacteria | Eukaryotes | Viruses
A2M, A4GALT | A2M, A4GALT | A2M, AACS | ABCB5, ABCB6
AAAS, AACS | AAAS, AACS | ABCC9, ABCD3 | ABCB7, ABCC1
AADACL3, AADAT | AADACL3, AADAT | ABHD3, ABLIM1 | ABCC4, ABCC9
AAGAB, AAMP | AAGAB, AAMDC | ACACA, ACAT1 | ABCF2, ABHD11
AAR2, AARS | AAMP, AAR2 | ACBD5, ACIN1 | ABHD12, ACADVL
AARS2, AARSD1 | AARS, AARS2 | ACLY, ACSL3 | ACLY, ACP2
AASDH, AASDHPPT | AARSD1, AASDH | ACTB, ACTG1 | ACPL2, ACTA1
AATF, AATK | AASDHPPT, AATF | ACTN1, ACTN4 | ACTB, ACTC1
ABAT, ABCA1 | AATK, ABAT | ACTR2, ACVR1C | ACTG1, ACTN1
ABCA12, ABCA13 | ABCA1, ABCA12 | ACVR2B, ADAM10 | ACTN4, ACVR1C
ABCA3, ABCA4 | ABCA13, ABCA3 | ADAM17, ADCY3 | ADAM12, ADAMTSL4
ABCA5, ABCA7 | ABCA4, ABCA5 | ADD1, ADD2 | ADAR, ADPGK
ABCA8, ABCB10 | ABCA6, ABCA7 | ADNP, ADO | ADRM1, AFAP1
ABCB5, ABCB6 | ABCA8, ABCB1 | ADPGK, ADRM1 | AGAP3, AGBL2
ABCB7, ABCB8 | ABCB10, ABCB6 | AFTPH, AGAP1 | AGMAT, AGO1
ABCB9, ABCC1 | ABCB7, ABCB8 | AGAP3, AGBL2 | AGO2, AGPAT6
ABCC11, ABCC3 | ABCB9, ABCC1 | AGGF1, AGO1 | AGRN, AGTPBP1
ABCC4, ABCC5 | ABCC10, ABCC11 | AGO2, AGO4 | AGTR2, AGTRAP
ABCC8, ABCC9 | ABCC3, ABCC4 | AGPAT5, AGPAT6 | AHNAK2, AIFM2
ABCD1, ABCD3 | ABCC5, ABCC8 | AGTR2, AHI1 | AIM2, AK1
ABCD4, ABCE1 | ABCC9, ABCD1 | AHSA1, AHSA2 | AKAP11, AKAP12
ABCF1, ABCF2 | ABCD3, ABCD4 | AIM1, AIMP2 | AKAP2, AKAP4
ABCG2, ABCG8 | ABCE1, ABCF1 | AK1, AKAP11 | AKR1D1, ALAD
ABHD10, ABHD11 | ABCF2, ABCG2 | AKAP2, AKAP8 | ALDH2, ALG2
ABHD12, ABHD12B | ABCG4, ABCG8 | AKAP9, AKT2 | ALG3, AMDHD1
ABHD14B, ABHD15 | ABHD10, ABHD12 | AKT3, ALAD | AMZ1, ANGPTL4
ABHD17C, ABHD2 | ABHD12B, ABHD14B | ALDOA, ALG3 | ANKFY1, ANKIB1
ABHD3, ABHD4 | ABHD15, ABHD16A | ALMS1, AMMECR1L | ANKRD17, ANKRD29
ABHD5, ABI2 | ABHD17B, ABHD17C | AMOT, ANK2 | ANO8, ANP32A
ABL1, ABL2 | ABHD2, ABHD3 | ANKRD10, ANKRD12 | ANP32B, ANP32E
ABLIM1, ABR | ABHD4, ABHD5 | ANKRD13C, ANKRD28 | ANPEP, ANXA2
ABT1, ABTB1 | ABI2, ABI3 | ANKRD46, ANP32A | AP1B1, AP1G1
ABTB2, ACAA1 | ABL1, ABL2 | ANXA11, ANXA2 | AP1M1, AP1S1
ACAA2, ACACA | ABLIM1, ABR | ANXA7, AP1B1 | AP1S3, AP2S1
ACACB, ACAD8 | ABRACL, ABT1 | AP1G1, AP2A1 | AP3B1, AP3D1
ACADM, ACADS | ABTB1, ABTB2 | AP3B1, AP3M1 | APEH, APPL1
ACADSB, ACADVL | ACAA1, ACAA2 | AP3M2, APAF1 | AR, ARCN1
ACAP3, ACAT1 | ACACA, ACACB | APBB2, APC | AREL1, ARF3
ACAT2, ACBD3 | ACAD8, ACADL | APLP2, APMAP | ARF4, ARG1
ACBD4, ACBD5 | ACADM, ACADS | APOL2, APOLD1 | ARGLU1, ARHGAP12
ACBD6, ACE | ACADSB, ACADVL | APP, APPL1 | ARHGAP29, ARHGEF18
ACER3, ACIN1 | ACAP3, ACAT1 | ARCN1, AREL1 | ARHGEF3, ARID1A
ACLY, ACOT12 | ACAT2, ACBD3 | ARF4, ARHGAP12 | ARID2, ARL1
ACOT2, ACOT4 | ACBD4, ACBD5 | ARHGAP21, ARHGAP42 | ARL10, ARL14EP
ACOT6, ACOT7 | ACBD6, ACD | ARHGEF12, ARHGEF18 | ARL15, ARL6IP6
ACOT8, ACOT9 | ACE, ACE2 | ARHGEF3, ARID1A | ARMC10, ARMC7
ACOX1, ACP1 | ACER3, ACIN1 | ARID4A, ARIH2 | ARNTL, ARPC5
ACP2, ACPL2 | ACLY, ACN9 | ARL1, ARL10 | ARRDC1, ARRDC4
ACPP, ACPT | ACOT11, ACOT12 | ARL14EP, ARL15 | ARSB, ASAP1
ACR, ACSF2 | ACOT2, ACOT4 | ARL6, ARL6IP6 | ASH2L, ASXL2
ACSF3, ACSL1 | ACOT6, ACOT7 | ARMCX3, ARNTL | ATAD2, ATF7
ACSL3, ACSL4 | ACOT8, ACOT9 | ARSB, ARSG | ATL3, ATP11A
ACSL5, ACSL6 | ACOX1, ACOX2 | ART3, ASRGL1 | ATP11B, ATP11C
ACSM1, ACSM2A | ACP1, ACP2 | ASXL2, ASXL3 | ATP13A1, ATP2A2
ACSS1, ACSS2 | ACP6, ACPL2 | ATAD1, ATAD2B | ATP2B4, ATP5B
ACSS3, ACTA1 | ACPP, ACPT | ATF2, ATF5 | ATP5G2, ATP6V0A1
ACTA2, ACTB | ACR, ACSF2 | ATF7, ATL2 | ATP6V1A, ATP6V1B2
ACTC1, ACTG1 | ACSF3, ACSL1 | ATMIN, ATP11A | ATP6V1C1, ATP6V1E1
ACTL6A, ACTL7A | ACSL3, ACSL4 | ATP11B, ATP11C | ATP6V1F, ATXN1
ACTL8, ACTL9 | ACSL5, ACSL6 | ATP13A3, ATP1A2 | ATXN2L, ATXN7L3B
ACTN1, ACTN2 | ACSM1, ACSM2A | ATP2A2, ATP2B4 | AXIN1, AXL
ACTN4, ACTR10 | ACSS1, ACSS2 | ATP5A1, ATP5B | B3GNT2, B4GALT1
ACTR1A, ACTR1B | ACSS3, ACTA1 | ATP5G3, ATP5J | BACH1, BAG5
ACTR2, ACTRT2 | ACTA2, ACTB | ATP6, ATP6V0E1 | BARX2, BAX
ACVR1, ACVR1B | ACTG1, ACTL6A | ATP6V1E1, ATP6V1F | BCAP29, BCAS4
ACVR1C, ACVR2A | ACTL7A, ACTL8 | ATP7A, ATP7B | BCAT1, BCAT2

ACVR2B, ADA | ACTL9, ACTN1 | ATRX, ATXN1 | BCKDHB, BCL11A
ADAD2, ADAM10 | ACTN2, ACTN4 | ATXN2L, ATXN7L3 | BCL11B, BCL2
ADAM12, ADAM15 | ACTR10, ACTR1A | ATXN7L3B, AUTS2 | BCL2L11, BCL6B
ADAM2, ADAM21 | ACTR1B, ACTR2 | AXIN1, B3GALNT1 | BCOR, BDNF
ADAM32, ADAM33 | ACTR3, ACTR5 | B3GNT5, B4GALT1 | BHLHE40, BICD2
ADAM8, ADAM9 | ACTR8, ACTRT2 | B4GALT2, B4GALT7 | BLCAP, BMP7
ADAMTS1, ADAMTS12 | ACVR1, ACVR1B | BACH1, BAG3 | BMP8A, BOD1
ADAMTS16, ADAMTS17 | ACVR1C, ACVR2A | BAG6, BAP1 | BRD2, BRDT
ADAMTS2, ADAMTS4 | ACVR2B, ADA | BARX2, BASP1 | BRI3BP, BRMS1L
ADAMTS5, ADAMTS6 | ADAD2, ADAM10 | BATF2, BAX | BRPF3, BTBD2
ADAMTS9, ADAMTSL1 | ADAM12, ADAM15 | BAZ1B, BBC3 | BTC, BTF3
ADAMTSL4, ADAP2 | ADAM17, ADAM18 | BCAT1, BCL2 | C10ORF137,
ADAR, ADARB1 | ADAM2, ADAM21 | BCL2L1, BCL2L11 | C11ORF31, C11ORF48
ADAT1, ADC | ADAM29, ADAM32 | BCL2L13, BCL2L2 | C11ORF58, C12ORF10
ADCK1, ADCK2 | ADAM33, ADAM8 | BCL6, BDH2 | C12ORF40, C12ORF49
ADCK3, ADCK4 | ADAM9, ADAMTS1 | BICD2, BIRC5 | C12ORF57, C19ORF54
ADCY1, ADCY3 | ADAMTS12, ADAMTS16 | BIRC6, BLCAP | C1ORF173, C1ORF174
ADCY6, ADCY7 | ADAMTS17, ADAMTS2 | BMF, BMI1 | C1ORF27, C1ORF43
ADCY9, ADD1 | ADAMTS4, ADAMTS5 | BMP8A, BMPR2 | C1ORF56, C1ORF85
ADD3, ADGB | ADAMTS6, ADAMTS9 | BNIP3, BNIP3L | C2, C2ORF48
ADH1A, ADH5 | ADAMTSL1, ADAMTSL5 | BOC, BOD1 | C4BPB, C5ORF24
ADH6, ADH7 | ADAP2, ADAR | BPGM, BPTF | C6ORF118, C7ORF31
ADI1, ADIG | ADARB1, ADAT1 | BRAP, BRCA1 | C9ORF72, CA3
ADIPOR1, ADIPOR2 | ADC, ADCK1 | BRCA2, BRD1 | CA5B, CACNA2D1
ADK, ADM | ADCK2, ADCK3 | BRD2, BRD3 | CACNG8, CAD
ADNP, ADO | ADCK4, ADCY1 | BRD4, BRD7 | CALCOCO2, CALM1
ADORA2A-AS1, ADPGK | ADCY3, ADCY6 | BRI3BP, BRMS1L | CALM2, CALM3
ADPRHL2, ADRA1A | ADCY7, ADCY9 | BTBD3, BTBD7 | CALR, CAMK2G
ADRA1B, ADRBK2 | ADD1, ADD2 | BTF3, BTF3L4 | CAND1, CAP1
ADRM1, ADSL | ADD3, ADGB | BTG2, BZW1 | CAPG, CAPN1
ADSS, ADSSL1 | ADH1A, ADH5 | C10ORF118, C11ORF16 | CAPNS1, CAPRIN1
AEBP2, AEN | ADH6, ADH7 | C11ORF48, C11ORF57 | CARS, CASP3
AFAP1, AFAP1L1 | ADI1, ADIG | C11ORF58, C12ORF65 | CASP8AP2, CAST
AFAP1L2, AFF1 | ADIPOR1, ADIPOR2 | C14ORF39, C15ORF40 | CBR4, CBS
AFF4, AFG3L2 | ADK, ADM | C19ORF54, C1ORF122 | CBX2, CBX5
AFP, AFTPH | ADNP, ADNP2 | C1ORF43, C1ORF85 | CCDC102A, CCDC113
AGA, AGAP1 | ADO, ADORA2A-AS1 | C2, C20ORF194 | CCDC12, CCDC124
AGAP3, AGBL2 | ADORA2B, ADPGK | C20ORF24, C2ORF43 | CCDC134, CCDC170
AGBL3, AGER | ADPRH, ADPRHL2 | C4BPB, C5ORF24 | CCDC180, CCDC22
AGFG1, AGFG2 | ADRA1A, ADRA1B | C7ORF31, C8ORF33 | CCDC74B, CCDC88C
AGGF1, AGK | ADRA2A, ADRBK2 | C9ORF72, CA5B | CCL14, CCL2
AGL, AGMAT | ADRM1, ADSL | CABYR, CACNG8 | CCND1, CCND2
AGO1, AGO2 | ADSS, ADSSL1 | CALCOCO2, CALD1 | CCND3, CCNE2
AGO4, AGPAT1 | AEBP2, AEN | CALR, CALU | CCNJ, CCSAP
AGPAT3, AGPAT4 | AFAP1, AFAP1L1 | CAMTA1, CAND1 | CCT6A, CCT7
AGPAT5, AGPAT6 | AFAP1L2, AFF1 | CANX, CAPNS1 | CD109, CD2AP
AGPS, AGR2 | AFF4, AFG3L2 | CAPRIN1, CARS | CD44, CD46
AGR3, AGRN | AFP, AFTPH | CASC3, CASC5 | CD63, CD81
AGRP, AGTR2 | AGA, AGAP1 | CASKIN2, CASP3 | CD9, CDC20
AGTRAP, AGXT2 | AGAP3, AGBL2 | CASP7, CAST | CDC27, CDC42
AHCTF1, AHCYL1 | AGBL3, AGER | CBS, CBX2 | CDC42BPB, CDC42EP1
AHCYL2, AHDC1 | AGFG1, AGFG2 | CCDC104, CCDC14 | CDC42EP4, CDC42SE1
AHI1, AHNAK | AGGF1, AGK | CCDC142, CCDC34 | CDC6, CDCA4
AHNAK2, AHR | AGL, AGMAT | CCDC47, CCDC74B | CDCP1, CDH1
AHRR, AHSA1 | AGO1, AGO2 | CCL20, CCND1 | CDH13, CDH2
AHSA2, AIDA | AGO4, AGPAT1 | CCNE2, CCNG1 | CDH4, CDK1
AIFM1, AIFM2 | AGPAT2, AGPAT3 | CCNH, CCNJ | CDK14, CDK4
AIFM3, AIG1 | AGPAT4, AGPAT5 | CCR1, CCSAP | CDK6, CDK9
AIM1, AIM1L | AGPAT6, AGPS | CCT3, CCT6P1 | CDKN1A, CDKN1B
AIM2, AIMP1 | AGR2, AGR3 | CCT7, CD276 | CDKN2AIP, CEBPA
AIMP2, AJUBA | AGRN, AGRP | CD44, CD46 | CEBPG, CEBPZ
AK1, AK2 | AGT, AGTR1 | CD47, CD81 | CENPF, CERS2
AK3, AK7 | AGTR2, AGTRAP | CD83, CDC25A | CETN3, CHAF1B
AKAP1, AKAP10 | AGXT2, AHCTF1 | CDC25C, CDC42 | CHAMP1, CHMP2A
AKAP11, AKAP12 | AHCYL1, AHCYL2 | CDC42EP1, CDC42EP3 | CHRAC1, CHST11
AKAP13, AKAP17A | AHDC1, AHI1 | CDC6, CDCA4 | CHSY1, CIZ1
AKAP2, AKAP4 | AHNAK, AHNAK2 | CDIP1, CDK1 | CKAP2L, CKS1B
AKAP7, AKAP8 | AHR, AHRR | CDK18, CDK19 | CLCN3, CLCN7
AKAP9, AKIP1 | AHSA1, AHSA2 | CDK2AP1, CDK4 | CLDN12, CLEC1A
AKIRIN1, AKR1A1 | AIDA, AIF1L | CDK5RAP2, CDK6 | CLEC4D, CLIC4
AKR1B1, AKR1B10 | AIFM1, AIFM2 | CDKL5, CDKN1A | CLOCK, CLTC

AKR1C2, AKR1C3 | AIFM3, AIG1 | CDKN1B, CDKN1C | CLUAP1, CLYBL
AKR1D1, AKR1E2 | AIM1, AIM1L | CDKN2AIP, CDY2B | CNEP1R1, CNIH4
AKR7A2, AKR7L | AIMP1, AIMP2 | CEACAM8, CEBPG | CNN3, CNOT3
AKT1, AKT1S1 | AJUBA, AK1 | CELF1, CENPF | CNOT6, COIL
AKT2, AKT3 | AK2, AK3 | CENPO, CENPT | COL12A1, COL4A1
ALAD, ALAS1 | AK4, AK7 | CEP104, CEP152 | COMMD2, COPB1
ALCAM, ALDH16A1 | AKAP1, AKAP10 | CEP250, CEP97 | COPG1, COPZ1
ALDH18A1, ALDH1A2 | AKAP11, AKAP12 | CERS2, CERS6 | COQ6, CORO1C
ALDH1A3, ALDH1B1 | AKAP13, AKAP17A | CFL1, CHCHD2 | COTL1, COX2
ALDH2, ALDH3A2 | AKAP2, AKAP4 | CHN1, CHST12 | COX7A2, CPEB4
ALDH3B2, ALDH4A1 | AKAP5, AKAP7 | CHST3, CKAP2 | CPOX, CPSF1
ALDH5A1, ALDH7A1 | AKAP8, AKAP9 | CKAP2L, CKAP5 | CPSF3, CREB5
ALDH8A1, ALDH9A1 | AKIP1, AKIRIN1 | CKB, CLCN2 | CREBBP, CREG1
ALDOA, ALDOC | AKR1A1, AKR1B1 | CLCN3, CLCN5 | CRELD2, CRH
ALG10, ALG10B | AKR1B10, AKR1C2 | CLCN7, CLDN23 | CRIP2, CRKL
ALG12, ALG13 | AKR1C3, AKR1CL1 | CLEC11A, CLEC4D | CRYGS, CSDE1
ALG2, ALG3 | AKR1E2, AKR7A2 | CLIC1, CLIC4 | CSE1L, CSNK1A1
ALG8, ALK | AKR7L, AKT1 | CLIP4, CLOCK | CSNK2A1, CSNK2A2
ALKBH1, ALKBH4 | AKT1S1, AKT2 | CLSPN, CLUAP1 | CSRP1, CSTF3
ALKBH5, ALMS1 | AKT3, ALAD | CLYBL, CNIH4 | CTBP1, CTBP2
ALPI, ALS2 | ALAS1, ALCAM | CNN3, CNOT1 | CTC1, CTDNEP1
ALYREF, AMACR | ALDH16A1, ALDH18A1 | CNOT3, CNOT6 | CTGF, CTNNA1
AMBRA1, AMD1 | ALDH1A2, ALDH1A3 | CNRIP1, CNTNAP2 | CTSC, CTTN
AMDHD1, AMELX | ALDH1B1, ALDH2 | CNTRL, COBLL1 | CUL4B, CXCL1
AMELY, AMER1 | ALDH3A1, ALDH3A2 | COCH, COIL | CXCL2, CXCL3
AMHR2, AMIGO1 | ALDH3B2, ALDH4A1 | COL4A1, COL5A2 | CYP20A1, CYP2E1
AMMECR1, AMMECR1L | ALDH5A1, ALDH6A1 | COL6A1, COPA | CYR61, CYTB
AMOT, AMOTL1 | ALDH7A1, ALDH8A1 | COPE, CORO1A | DBN1, DCAF7
AMPD1, AMPD2 | ALDH9A1, ALDOA | CORO1C, CORO2A | DCAKD, DCTN4
AMPD3, AMPH | ALDOB, ALDOC | COX1, COX2 | DCTPP1, DCUN1D1
AMT, AMY1B | ALG1, ALG10 | COX7A2, CPA3 | DDB1, DDT
AMY1C, AMZ1 | ALG10B, ALG12 | CPEB3, CPNE4 | DDTL, DDX24
ANAPC1, ANAPC10 | ALG13, ALG2 | CPNE5, CREB1 | DDX39A, DDX3X
ANAPC11, ANAPC13 | ALG3, ALG5 | CREBBP, CREBZF | DDX42, DDX5
ANAPC15, ANAPC16 | ALG8, ALG9 | CRH, CRK | DDX50, DDX6
ANAPC5, ANAPC7 | ALK, ALKBH1 | CROT, CSAG1 | DDX60, DEFB125
ANG, ANGEL1 | ALKBH4, ALKBH5 | CSNK1A1, CSNK1G1 | DFNA5, DGCR8
ANGEL2, ANGPT4 | ALKBH6, ALMS1 | CSNK2A1, CSRP1 | DGKH, DHRS1
ANGPTL4, ANK2 | ALPI, ALPK3 | CSTF2T, CTC1 | DHX15, DHX30
ANKFY1, ANKH | ALPPL2, ALS2 | CTNNA1, CTNNB1 | DKK1, DLX4
ANKHD1, ANKIB1 | ALX1, ALYREF | CTPS1, CXORF38 | DMKN, DMTN
ANKLE2, ANKRD1 | AMACR, AMBRA1 | CYB5D1, CYBRD1 | DNAJB1, DNAJB12
ANKRD10, ANKRD11 | AMD1, AMDHD1 | CYP1B1, CYP4V2 | DNAJB4, DNAJB9
ANKRD13A, ANKRD13B | AMDHD2, AMELX | CYP7A1, CYTB | DNAJC10, DOCK10
ANKRD13C, ANKRD13D | AMELY, AMER1 | DAAM1, DAG1 | DOCK9, DOK6
ANKRD17, ANKRD18A | AMHR2, AMIGO1 | DARS2, DAXX | DOLPP1, DPH5
ANKRD18CP, ANKRD20A11P | AMIGO2, AMMECR1 | DAZAP1, DBN1 | DPP7, DPY19L1
ANKRD22, ANKRD26 | AMMECR1L, AMOT | DBNL, DCAF10 | DPY30, DRAP1
ANKRD27, ANKRD28 | AMOTL1, AMOTL2 | DCAF7, DCAF8 | DROSHA, DSC3
ANKRD29, ANKRD30A | AMPD1, AMPD2 | DCP1A, DCTD | DSG2, DST
ANKRD31, ANKRD32 | AMPD3, AMPH | DCTN4, DCTN5 | DTX1, DYM
ANKRD40, ANKRD44 | AMT, AMY1B | DDAH1, DDHD2 | DYNC1LI1, DYNLT1
ANKRD46, ANKRD52 | AMY1C, AMZ1 | DDIT4, DDR2 | E2F1, E2F3
ANKRD53, ANKRD54 | AMZ2, ANAPC1 | DDX21, DDX24 | E2F5, E2F6
ANKRD6, ANKRD9 | ANAPC10, ANAPC11 | DDX39A, DDX3X | ECHS1, EDEM1
ANKS1A, ANKS4B | ANAPC13, ANAPC15 | DDX3Y, DDX46 | EDEM3, EDF1
ANKUB1, ANKZF1 | ANAPC16, ANAPC4 | DENND2C, DENR | EDN1, EED
ANLN, ANO1 | ANAPC5, ANAPC7 | DERA, DERL1 | EEF1A1, EEF2
ANO3, ANO6 | ANG, ANGEL1 | DGCR8, DHCR7 | EFCAB1, EFR3A
ANP32A, ANP32B | ANGEL2, ANGPT4 | DHX15, DHX30 | EFTUD2, EGFR
ANP32C, ANP32E | ANGPTL4, ANGPTL7 | DICER1, DKK2 | EHMT1, EHMT2
ANPEP, ANTXR1 | ANK2, ANKFY1 | DLG1, DLST | EID1, EIF3L
ANTXR2, ANXA1 | ANKH, ANKHD1 | DLX4, DLX5 | EIF4A1, EIF4E
ANXA11, ANXA2 | ANKIB1, ANKLE2 | DMD, DMTF1 | EIF4ENIF1, EIF4G1
ANXA4, ANXA5 | ANKRD1, ANKRD10 | DMWD, DMXL1 | EIF4H, EIF5A
ANXA6, ANXA7 | ANKRD11, ANKRD12 | DNAJB1, DNAJB4 | ELL, ELMOD2
ANXA8, ANXA8L1 | ANKRD13A, ANKRD13B | DNAJC18, DNMT1 | ELOVL1, EMD
ANXA8L2, AOC2 | ANKRD13C, ANKRD13D | DOCK10, DOCK4 | EMILIN3, EML3
AP1AR, AP1B1 | ANKRD17, ANKRD18A | DOCK5, DOCK7 | EML4, EMP3
AP1G1, AP1M1 | ANKRD18CP, ANKRD2 | DPH5, DPP8 | ENAH, EPB41L2

AP1S1, AP1S2 | ANKRD20A11P, ANKRD22 | DSE, DST | EPB41L4B, ERBB2
AP1S3, AP2A1 | ANKRD26, ANKRD27 | DSTYK, DTX3L | ERBB2IP, ERBB3
AP2A2, AP2B1 | ANKRD28, ANKRD29 | DUSP10, DUSP18 | ERGIC2, ERLIN2
AP2M1, AP2S1 | ANKRD30A, ANKRD31 | DUSP2, DUSP8 | ERMN, ESF1
AP3B1, AP3D1 | ANKRD32, ANKRD36B | DVL2, DYNC1H1 | ESR1, ESRRA
AP3M1, AP3M2 | ANKRD36BP1, ANKRD39 | DYNC1I2, DYNC1LI2 | ESYT1, ETNK1
AP3S1, AP3S2 | ANKRD40, ANKRD42 | DZIP1L, E2F1 | ETNK2, ETV7
AP4B1, AP4E1 | ANKRD44, ANKRD46 | E2F2, E2F3 | EVI2B, EXOC2
AP4S1, AP5M1 | ANKRD52, ANKRD53 | EARS2, EBNA1BP2 | EXOC3, EXOC5
AP5S1, APAF1 | ANKRD54, ANKRD6 | EBP, ECI2 | EZH2, F11R
APBA2, APBA3 | ANKRD9, ANKS1A | EDEM3, EDIL3 | F2, F2RL1
APBB1, APBB2 | ANKS1B, ANKS3 | EDNRA, EEF1A1 | FABP3, FABP4
APC, APCDD1 | ANKS4B, ANKS6 | EEF2, EFCAB6 | FADD, FADS1
APCDD1L, APEH | ANKUB1, ANKZF1 | EFNA1, EFNB2 | FAM102A, FAM105B
APEX1, APITD1 | ANLN, ANO1 | EGFR, EGLN3 | FAM126B, FAM167A
APLN, APLNR | ANO2, ANO3 | EIF2AK1, EIF2S1 | FAM177A1, FAM208A
APLP1, APLP2 | ANO6, ANP32A | EIF3B, EIF3I | FAM209A, FAM214A
APOA1BP, APOBEC3B | ANP32B, ANP32C | EIF3L, EIF4A1 | FAM21B, FAM21C
APOBEC3C, APOBEC4 | ANP32E, ANPEP | EIF4A2, EIF4E | FAM3C, FAM49B
APOC1, APOF | ANTXR1, ANTXR2 | EIF4EBP2, EIF4ENIF1 | FAM57A, FAM65A
APOH, APOL2 | ANXA1, ANXA11 | EIF4G1, EIF4G3 | FAM81A, FAM98A
APOLD1, APOPT1 | ANXA2, ANXA3 | EIF5, EIF5A | FANCD2, FANCI
APP, APPBP2 | ANXA4, ANXA5 | ELAVL2, ELAVL4 | FASN, FAT2
APPL1, APRT | ANXA6, ANXA7 | ELL, ELMO2 | FBLN2, FBN1
AQP10, AQP12B | ANXA8, ANXA8L1 | ELOVL1, ELOVL4 | FBXO22, FBXO28
AQP5, AR | ANXA8L2, AOC2 | ELOVL7, EMILIN3 | FBXO3, FBXO33
ARAF, ARAP1 | AOX1, AP1AR | EML4, ENAH | FBXO45, FERMT2
ARCN1, AREG | AP1B1, AP1G1 | ENO1, ENTPD4 | FGF2, FGFR2
AREGB, AREL1 | AP1M1, AP1M2 | ENTPD6, EP300 | FHDC1, FIG4
ARF1, ARF3 | AP1S2, AP1S3 | EP400, EPB41L2 | FILIP1, FKRP
ARF4, ARF5 | AP2A1, AP2A2 | EPHA4, EPHB4 | FLNA, FLNB
ARF6, ARFGAP1 | AP2B1, AP2M1 | EPM2A, ERBB2 | FLNC, FLOT2
ARFGAP2, ARFGAP3 | AP2S1, AP3B1 | ERBB2IP, ERC1 | FMNL2, FMNL3
ARFGEF1, ARFGEF2 | AP3D1, AP3M1 | ERCC6L, ERP29 | FN1, FNDC3A
ARFIP1, ARFIP2 | AP3M2, AP3S1 | ERRFI1, ESF1 | FNDC3B, FOLR1
ARG1, ARG2 | AP3S2, AP4B1 | ESR1, ESYT2 | FOPNL, FOXK2
ARGLU1, ARHGAP1 | AP4E1, AP4S1 | ETNK1, ETNK2 | FOXN2, FOXN4
ARHGAP10, ARHGAP11A | AP5M1, AP5S1 | ETS1, ETS2 | FOXO1, FOXO4
ARHGAP12, ARHGAP17 | APAF1, APBA2 | EVI2B, EVL | FOXP1, FRG1
ARHGAP19, ARHGAP21 | APBA3, APBB1 | EXOC5, EXOC8 | FRMD7, FRS2
ARHGAP26, ARHGAP28 | APBB2, APC | EXOSC1, EXOSC10 | FSTL1, FTH1
ARHGAP29, ARHGAP32 | APCDD1, APCDD1L | EYA4, EZR | FTSJ1, FUBP1
ARHGAP33, ARHGAP35 | APEH, APEX1 | FAM102A, FAM117B | FUBP3, FUK
ARHGAP36, ARHGAP39 | APEX2, APH1A | FAM118A, FAM126B | FXN, G3BP1
ARHGAP42, ARHGAP44 | API5, APITD1 | FAM134B, FAM13B | G3BP2, G6PC3
ARHGAP5, ARHGDIA | APLN, APLNR | FAM177A1, FAM19A3 | G6PD, GAK
ARHGEF1, ARHGEF10 | APLP1, APLP2 | FAM209A, FAM20B | GAPDH, GART
ARHGEF10L, ARHGEF12 | APOA1BP, APOBEC3B | FAM217B, FAM21B | GATA3, GATA4
ARHGEF18, ARHGEF2 | APOBEC3C, APOBEC4 | FAM3C, FAM46A | GATA6, GATAD2B
ARHGEF25, ARHGEF26 | APOC1, APOF | FAM49B, FAM53C | GCFC2, GCH1
ARHGEF28, ARHGEF3 | APOH, APOL2 | FAM65A, FAM83G | GFPT2, GGPS1
ARHGEF33, ARHGEF37 | APOLD1, APOO | FAM98A, FANCD2 | GIMAP4, GJA1
ARHGEF4, ARHGEF40 | APOPT1, APP | FANCI, FARSA | GJB3, GLA
ARHGEF5, ARHGEF6 | APPBP2, APPL1 | FAS, FASLG | GLI1, GLI2
ARHGEF7, ARID1A | APPL2, APRT | FASN, FAT1 | GLTSCR2, GNA13
ARID1B, ARID2 | APTX, AQP1 | FAT2, FAXDC2 | GNAI1, GNAI2
ARID3A, ARID3B | AQP10, AQP12B | FBLN5, FBN3 | GNAI3, GNAQ
ARID4A, ARID4B | AQP4, AQP5 | FBXL17, FBXL2 | GNAT2, GNB1
ARID5A, ARID5B | AR, ARAF | FBXO11, FBXO21 | GNB2, GNG5
ARIH1, ARIH2 | ARAP1, ARAP2 | FBXO28, FBXO3 | GNPDA2, GNPNAT1
ARL1, ARL10 | ARCN1, AREG | FBXO45, FBXO7 | GNRHR, GOLGA3
ARL13B, ARL14 | AREGB, AREL1 | FBXW2, FECH | GOLGA5, GOLGA7
ARL14EP, ARL15 | ARF1, ARF3 | FERMT2, FIGN | GOLGA8A, GOLGA8B
ARL16, ARL2 | ARF4, ARF5 | FILIP1L, FKBP1A | GOLPH3, GOT1
ARL2BP, ARL3 | ARF6, ARFGAP1 | FKBP5, FKBP8 | GPAA1, GPBP1L1
ARL4C, ARL5B | ARFGAP2, ARFGAP3 | FKRP, FLNA | GPC1, GPC4
ARL6, ARL6IP1 | ARFGEF1, ARFGEF2 | FLT1, FMOD | GPCPD1, GPD1L
ARL6IP4, ARL6IP6 | ARFIP1, ARFIP2 | FMR1, FN1 | GPD2, GPR137C
ARL8A, ARL8B | ARFRP1, ARG1 | FNBP1, FOPNL | GPR83, GPX1P1
ARL9, ARMC10 | ARG2, ARGLU1 | FOS, FOXJ3 | GRAMD4, GRK5
ARMC12, ARMC2 | ARHGAP1, ARHGAP10 | FOXK2, FOXN2 | GRK6, GRPEL1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ARMC6, ARMC7 ARHGAP11A, ARHGAP12 FOXN3, FOXO3 GSE1, GSTO1 ARMC8, ARMCX2 ARHGAP17, ARHGAP19 FOXP1, FPGS GTF2H1, GTF3C1 ARMCX3, ARNT ARHGAP21, ARHGAP22 FRAT2, FRS2 GTF3C5, GTF3C6 ARNT2, ARNTL ARHGAP26, ARHGAP28 FSCN1, FSTL3 H1FX, H2AFX ARNTL2, ARPC1A ARHGAP29, ARHGAP32 FUBP1, FUNDC2 H2AFY2, H3F3B ARPC2, ARPC3 ARHGAP33, ARHGAP35 FUS, FYTTD1 H3F3C, HADH ARPC5, ARPC5L ARHGAP36, ARHGAP39 G3BP1, G3BP2 HAND2, HAT1 ARPP19, ARR3 ARHGAP42, ARHGAP44 G6PC3, GALNT10 HBS1L, HCN2 ARRB1, ARRB2 ARHGAP5, ARHGDIA GANAB, GAPDH HCN4, HDAC2 ARRDC3, ARRDC4 ARHGEF1, ARHGEF10 GAPVD1, GART HDAC4, HDLBP ARSA, ARSD ARHGEF10L, ARHGEF12 GATA4, GATA6 HEATR2, HELLS ARSF, ARSG ARHGEF18, ARHGEF2 GATAD2B, GCN1L1 HERC2, HERC5 ARSI, ARSJ ARHGEF25, ARHGEF26 GDAP1, GDI1 HIC2, HIF1A ART3, ARTN ARHGEF28, ARHGEF3 GDI2, GGPS1 HIF3A, HIGD1A ARVCF, ARX ARHGEF33, ARHGEF37 GID4, GID8 HINT2, HIPK3 ASAH1, ASAP1 ARHGEF4, ARHGEF40 GLA, GLCCI1 HIST1H1B, HIST1H1C ASAP3, ASB1 ARHGEF5, ARHGEF6 GLCE, GLG1 HIST1H1E, HIST1H2AC ASB11, ASB13 ARHGEF7, ARID1A GLI3, GLO1 HIST1H2BB, HIST1H2BD ASB2, ASB3 ARID1B, ARID2 GLOD4, GLRX5 HIST1H2BH, HIST1H2BJ ASB4, ASB6 ARID3A, ARID3B GLYR1, GNAQ HIST1H2BK, HIST1H3B ASCC2, ASCC3 ARID4A, ARID4B GNB4, GNE HIST1H4B, HIST1H4E ASCL2, ASF1A ARID5A, ARID5B GNG10, GNL3L HIST2H2AC, HIST2H3A ASF1B, ASGR1 ARIH1, ARIH2 GNPDA2, GOLGA4 HIST2H3C, HIST2H3D ASH1L, ASH2L ARL1, ARL10 GOLGA5, GOT1 HIST2H4B, HIST3H3 ASIC1, ASIC4 ARL13B, ARL14 GPAM, GPBP1 HIVEP1, HIVEP2 ASIP, ASMTL ARL14EP, ARL15 GPC4, GPCPD1 HMGB1, HMGCR ASNA1, ASNS ARL16, ARL17A GPD1L, GPD2 HMGCS1, HMOX1 ASNSD1, ASPA ARL2, ARL2BP GPHB5, GPM6A HNF4A, HNRNPA0 ASPG, ASPH ARL3, ARL4A GPR107, GPR64 HNRNPA1, HNRNPA1L2 ASPM, ASPN ARL4C, ARL5A GRB10, GRK6 HNRNPA3, HNRNPD ASPSCR1, ASRGL1 ARL5B, ARL6IP1 GRPEL1, GRPEL2 HNRNPH1, HNRNPH3 ASS1, ASTE1 ARL6IP4, ARL6IP5 GSE1, GSG2 HNRNPU, HNRNPUL2 ASTL, ASTN1 ARL6IP6, ARL8A GSTM3, GTDC2 HOOK1, HOXA11 ASXL1, ASXL2 ARL8B, ARL9 GTF2A1, GTF2B HOXA13, 
HOXA9 ASXL3, ATAD1 ARMC1, ARMC10 GTF2E1, GTF2H2 HP1BP3, HPS4 ATAD2, ATAD2B ARMC12, ARMC2 GTF2I, GTF3C1 HRASLS5, HS2ST1 ATAD3A, ATAD3B ARMC6, ARMC7 GTF3C5, GTF3C6 HSD17B11, HSD17B8 ATAD3C, ATAD5 ARMC8, ARMCX2 GXYLT2, GYS1 HSD3B7, HSFY2 ATAT1, ATE1 ARMCX3, ARNT H2AFY2, H3F3B HSP90B1, HSPA1B ATF1, ATF2 ARNT2, ARNTL HAPLN1, HAUS8 HSPA4, HSPD1 ATF3, ATF5 ARNTL2, ARPC1A HBS1L, HCCS HSPG2, HYI ATF6, ATF6B ARPC1B, ARPC2 HCFC1, HDGF IARS, IFI44 ATF7, ATF7IP ARPC3, ARPC5 HDLBP, HEATR2 IFIT1, IFIT2 ATF7IP2, ATG10 ARPC5L, ARPP19 HECTD1, HECTD2 IFIT3, IFT52 ATG12, ATG13 ARPP21, ARR3 HECTD3, HECW2 IGF1R, IGF2R ATG14, ATG16L1 ARRB1, ARRB2 HELLS, HERC4 IGFBP7, IGSF1 ATG16L2, ATG2B ARRDC1, ARRDC2 HERPUD2, HES1 IGSF3, IKBKAP ATG3, ATG4A ARRDC3, ARRDC4 HEXIM1, HHAT IKZF4, IL11 ATG4B, ATG4C ARSA, ARSD HIC2, HIF1AN IL24, IL27RA ATG9A, ATHL1 ARSF, ARSI HIGD1A, HIPK1 IL32, IL6 ATIC, ATL1 ARSJ, ART3 HIPK2, HIST1H1C IL8, ILF3 ATL2, ATL3 ART5, ARTN HIST1H1E, HIST1H2AC ILVBL, IMMT ATM, ATMIN ARVCF, ARX HIST1H2AE, HIST1H2BB IMPDH1, INADL ATN1, ATOH1 ASAH1, ASAP1 HIST1H2BD, HIST1H2BH INCENP, INPP5F ATOH8, ATOX1 ASAP3, ASB1 HIST1H2BJ, HIST1H3B INPPL1, INSIG1 ATP10A, ATP10B ASB11, ASB13 HIST1H3D, HIST2H2AC INSIG2, INTS1 ATP10D, ATP11A ASB2, ASB3 HIST2H3A, HIST2H3D INTS6, IP6K2 ATP11B, ATP11C ASB4, ASB6 HIST2H4B, HIVEP1 IPO13, IPO8 ATP13A1, ATP13A2 ASCC2, ASCC3 HLA-DQA1, HLX IPO9, IQCD ATP13A3, ATP1A1 ASCL2, ASF1A HMGA1, HMGB1 IQGAP3, IQSEC1 ATP1A2, ATP1B1 ASF1B, ASGR1 HMGB3, HMGCS1 IRF2BP1, IRF2BPL ATP1B2, ATP1B3 ASH1L, ASH2L HMGXB4, HMOX1 IRS4, ISG15 ATP2A2, ATP2B1 ASIC1, ASIC4 HNRNPA0, HNRNPA1 ISG20, ISOC1 ATP2B4, ATP2C1 ASIP, ASMTL HNRNPA2B1, HNRNPA3 ISOC2, IST1 ATP5A1, ATP5B ASNA1, ASNS HNRNPD, HNRNPH1 ITGA1, ITGA3 ATP5C1, ATP5E ASNSD1, ASPA HNRNPH3, HNRNPK ITGA6, ITGAV ATP5F1, ATP5G2 ASPG, ASPH HOXA11, HOXA13 ITGB1, ITGB4 ATP5G3, ATP5H ASPM, ASPN HOXA9, HOXB5 ITM2B, ITPR1 ATP5I, ATP5J ASPRV1, ASPSCR1 HOXC10, HOXD13 JMJD1C, JUNB ATP5J2, ATP5L ASRGL1, ASS1 HPS5, HPSE JUND, JUP ATP5O, ATP5SL 
ASTE1, ASTL HRASLS5, HS3ST3B1 KANK2, KAT2A

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ATP6, ATP6AP1 ASTN1, ASTN2 HSFY2, HSP90AA1 KAT2B, KATNAL1 ATP6AP1L, ATP6AP2 ASXL1, ASXL2 HSPA14, HSPA1B KCND1, KCND3 ATP6V0A1, ATP6V0D2 ASXL3, ATAD1 HSPG2, IARS KCNE1, KCNJ2 ATP6V0E1, ATP6V1A ATAD2, ATAD2B ICAM1, ICMT KCNN4, KCTD12 ATP6V1B2, ATP6V1C1 ATAD3A, ATAD3B IDH2, IDS KCTD20, KDELR1 ATP6V1E1, ATP6V1F ATAD3C, ATAD5 IFNA1, IGF1R KDM1A, KDM2A ATP6V1G1, ATP6V1G2-DDX39B ATAT1, ATE1 IGSF3, IKBKB KDM2B, KIAA1191 ATP7A, ATP7B ATF1, ATF2 IL1B, IL6 KIAA1211, KIAA1522 ATP8A1, ATP8A2 ATF3, ATF4 ILF3, ILK KIAA1598, KIAA1671 ATP8B2, ATP9A ATF5, ATF6 IMMT, IMPA1 KIAA1731, KIF1A ATPIF1, ATR ATF6B, ATF7 IMPDH1, INADL KIF1C, KIF2A ATRAID, ATRNL1 ATF7IP, ATF7IP2 INCENP, INPP4A KIF2C, KIF4A ATRX, ATXN1 ATG10, ATG12 INSIG2, INTS1 KIF5B, KLHDC4 ATXN10, ATXN1L ATG13, ATG14 IPO13, IPO5 KLHDC5, KLHL3 ATXN2, ATXN2L ATG16L1, ATG16L2 IPP, IQCE KLHL42, KLK12 ATXN7, ATXN7L1 ATG2B, ATG3 IRAK1, IREB2 KMT2C, KNCN ATXN7L3, ATXN7L3B ATG4A, ATG4B IRF2BP1, IRF2BP2 KPNA2, KRAS AUNIP, AUP1 ATG4C, ATG7 IRS4, ISCU KRT75, KRTAP4-5 AURKA, AURKAIP1 ATG9A, ATHL1 ISG20L2, ISOC1 L1CAM, LAMC1 AURKB, AUTS2 ATIC, ATL1 ISOC2, ITGAV LARP4, LASP1 AVEN, AVL9 ATL2, ATL3 ITGB1, ITSN2 LCA5L, LCP1 AVP, AVPI1 ATM, ATMIN IVNS1ABP, JAG1 LEMD3, LEPREL2 AVPR1B, AXIN1 ATN1, ATOH1 JMJD6, JMY LEPREL4, LETM1 AXIN2, AXL ATOH8, ATOX1 JPH1, JTB LGALS1, LGALS3BP AZGP1, AZIN1 ATP10A, ATP10B JUNB, JUND LHFPL2, LHX4 B2M, B3GALNT1 ATP10D, ATP11A KAT6A, KATNAL1 LIFR, LIMS1 B3GAT1, B3GNT2 ATP11B, ATP11C KBTBD6, KBTBD7 LIN54, LIN7C B3GNT3, B3GNT4 ATP13A1, ATP13A2 KCND3, KCNK13 LIPC, LMNB1 B3GNT5, B3GNT8 ATP13A3, ATP1A1 KCTD12, KCTD20 LNPEP, LONP2 B4GALNT3, B4GALT1 ATP1A2, ATP1A3 KDELR1, KDM2A LONRF1, LPIN2 B4GALT2, B4GALT3 ATP1B1, ATP1B2 KDR, KEAP1 LPL, LRBA B4GALT5, B4GALT6 ATP1B3, ATP2A2 KHSRP, KIAA0101 LRP1, LRRC8A BAAT, BACE1 ATP2B1, ATP2B4 KIAA1161, KIAA1191 LRRC8B, LRRC8C BACE2, BACH1 ATP2C1, ATP5A1 KIAA1211, KIAA1462 LRRC8D, LRRK2 BACH2, BAD ATP5B, ATP5C1 KIAA1551, 
KIAA1715 LRWD1, LSM4 BAG1, BAG2 ATP5E, ATP5F1 KIF16B, KIF1A LZTFL1, MACROD1 BAG3, BAG5 ATP5G2, ATP5G3 KIF1C, KIF3B MAD2L1, MAGEB3 BAG6, BAHCC1 ATP5H, ATP5I KIF4A, KIF5C MAN1A2, MAN1B1 BAHD1, BAI2 ATP5J, ATP5J2 KIFAP3, KIT MAP3K5, MAP4 BAI3, BAK1 ATP5L, ATP5O KITLG, KLF11 MAP4K2, MAPK1 BAMBI, BAP1 ATP5S, ATP5SL KLF5, KLF9 MARCKS, MAST4 BARD1, BARX2 ATP6, ATP6AP1 KLHL12, KLHL13 MAT2A, MATR3 BASP1, BATF2 ATP6AP1L, ATP6AP2 KLHL15, KLHL20 MAZ, MBTPS1 BAX, BAZ1A ATP6V0A1, ATP6V0C KLHL24, KLHL31 MCAM, MCCC2 BAZ1B, BAZ2A ATP6V0D1, ATP6V0D2 KLHL42, KLHL8 MCFD2, MCL1 BAZ2B, BBC3 ATP6V0E1, ATP6V1A KMT2C, KMT2D MCM2, MCM3 BBS10, BBS12 ATP6V1B2, ATP6V1C1 KMT2E, KNSTRN MCM4, MCM5 BBS7, BBS9 ATP6V1C2, ATP6V1E1 KPNA6, KPNB1 MCM6, MCM7 BBX, BCAP29 ATP6V1F, ATP6V1G1 KRT10, KRT14 MCMBP, MCOLN2 BCAP31, BCAS2 ATP6V1G2-DDX39B, KRT18, KRTAP4-5 MCTS1, MDC1 ATP6V1H BCAS4, BCAT1 ATP7A, ATP7B LAMB3, LAMP1 MDM2, MECP2 BCAT2, BCCIP ATP8A1, ATP8A2 LAMP2, LAMTOR5 MECR, MED1 BCDIN3D, BCHE ATP8B2, ATP8B3 LARS, LASP1 MED19, MEF2A BCKDHA, BCKDHB ATP9A, ATPAF1 LATS1, LCA5 MESP2, MET BCKDK, BCL10 ATPAF2, ATPIF1 LCORL, LDHB METTL13, METTL21D BCL11A, BCL11B ATR, ATRAID LGALS3BP, LGTN MFF, MFHAS1 BCL2, BCL2A1 ATRIP, ATRNL1 LHFPL2, LIFR MFN2, MFSD10 BCL2L1, BCL2L11 ATRX, ATXN1 LIMA1, LIMCH1 MFSD12, MGAT5 BCL2L12, BCL2L13 ATXN10, ATXN1L LIMS1, LIN7C MGC27345, MGST1 BCL2L2, BCL3 ATXN2, ATXN2L LMBR1, LMNB2 MIA2, MIER1 BCL6, BCL6B ATXN7, ATXN7L1 LONRF1, LONRF2 MIER3, MIF BCL7A, BCL7B ATXN7L3, ATXN7L3B LPAR1, LPGAT1 MINPP1, MIS12 BCL9, BCL9L AUH, AUNIP LPHN2, LPIN2 MITF, MKI67 BCLAF1, BCOR AUP1, AURKA LRBA, LRP10 MKI67IP, MMD BCORL1, BCR AURKAIP1, AURKB LRP11, LRP6 MME, MMP3 BCS1L, BDH2 AUTS2, AVEN LRRC57, LRRFIP1 MMS22L, MOB4 BDNF, BDP1 AVIL, AVL9 LSM4, LUC7L2 MOBP, MON2 BECN1, BEND3 AVP, AVPI1 LUZP1, LYN MORC3, MORF4L1 BEND5, BEX1 AVPR1B, AXIN1 LYRM7, LYSMD1 MORF4L2, MOV10 BEX2, BEX5 AXIN2, AXL MACF1, MAFG MPDU1, MPP7 440

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses BFAR, BFSP1 AZGP1, AZIN1 MAK16, MALT1 MPRIP, MRAS BFSP2, BGLAP B2M, B3GALNT1 MAN1A2, MAP1B MRC2, MRE11A BHLHE22, BHLHE40 B3GALT1, B3GAT1 MAP3K1, MAP3K2 MRFAP1, MRGPRX1 BHLHE41, BICD1 B3GNT2, B3GNT3 MAP3K5, MAPK1 MROH7, MRPL17 BICD2, BID B3GNT4, B3GNT5 MAPK11, MAPK14 MRPL19, MRPL36 BIK, BIN1 B3GNT8, B4GALNT3 MAPK9, MAPRE1 MSH2, MSH3 BIN3, BIRC3 B4GALT1, B4GALT2 MARCKS, MARS MSH6, MSI2 BIRC5, BIRC6 B4GALT3, B4GALT5 MATR3, MAZ MTA2, MTF1 BIVM, BLCAP B4GALT6, BAAT MBNL1, MBNL2 MTG2, MTHFD2 BLM, BLMH BABAM1, BACE1 MBTPS1, MCCC2 MTHFS, MTMR12 BLOC1S2, BLOC1S4 BACE2, BACH1 MCFD2, MCM3 MTX1, MXD1 BLOC1S5, BLOC1S6 BACH2, BAD MCM3AP, MCM7 MXD4, MXI1 BLVRA, BMF BAG1, BAG2 MCMBP, MCTS1 MXRA7, MYCBP BMI1, BMP1 BAG3, BAG4 MDFIC, MDH2 MYCN, MYO18A BMP2, BMP2K BAG5, BAG6 MDM4, MDN1 MYO1A, MYO1B BMP7, BMP8A BAHCC1, BAHD1 MEA1, MECP2 MYO3B, MYO5A BMPR1B, BMPR2 BAI2, BAI3 MED25, MEF2A MYOCD, NAB1 BMS1, BNC1 BAK1, BAMBI MEF2C, MEGF9 NABP2, NAP1L2 BNC2, BNIP2 BAP1, BARD1 MEIS1, MEOX2 NAT14, NBEA BNIP3, BNIP3L BARX1, BARX2 MESDC2, MESP2 NBL1, NBPF15 BOC, BOD1 BASP1, BATF METTL13, MFN2 NCAPD3, NCAPG BOK, BOP1 BATF2, BATF3 MFSD12, MFSD8 NCKAP1, NCL BORA, BPGM BAX, BAZ1A MGA, MGAT4A NCLN, NCOA2 BPIFA1, BPIFB3 BAZ1B, BAZ2A MGAT4B, MGAT5 NCS1, NCSTN BPTF, BPY2 BAZ2B, BBC3 MGST1, MIB1 ND1, ND4 BPY2B, BPY2C BBS10, BBS12 MICALL1, MIDN NDE1, NDUFC2 BRAP, BRAT1 BBS7, BBS9 MIEN1, MIF NDUFS4, NEFH BRCA1, BRCA2 BBX, BCAN MINK1, MKI67 NEFL, NEO1 BRCC3, BRD1 BCAP29, BCAP31 MKNK2, MLX NETO2, NFAT5 BRD2, BRD3 BCAR3, BCAS2 MMP1, MMP2 NFE2L2, NHSL1 BRD4, BRD7 BCAS4, BCAT1 MMP9, MOAP1 NHSL2, NIPA2 BRD8, BRDT BCAT2, BCCIP MOB3B, MON2 NKRF, NME4 BRF1, BRI3 BCDIN3D, BCHE MORC3, MORF4L1 NOC2L, NOTCH1 BRI3BP, BRIP1 BCKDHA, BCKDK MORF4L2, MOXD1 NOTCH2, NOTCH3 BRMS1L, BRPF1 BCL10, BCL11A MPP5, MPV17 NPM1, NPTN BRPF3, BRSK1 BCL11B, BCL2 MRAP2, MRAS NR1H3, NR3C1 BRWD1, BSDC1 BCL2A1, BCL2L1 MRGPRX1, MROH1 NR5A2, NRP1 BSG, BSN BCL2L11, BCL2L12 MROH7, MROH8 
NSFL1C, NSUN4 BTAF1, BTBD1 BCL2L13, BCL2L2 MRPL10, MRPL24 NT5C1B, NT5E BTBD10, BTBD17 BCL3, BCL6 MRPL36, MRPS10 NTRK3, NUAK1 BTBD2, BTBD3 BCL6B, BCL7A MRPS27, MSH2 NUFIP2, NUP160 BTBD7, BTBD9 BCL7B, BCL7C MSH6, MSI2 NUP205, NUP210 BTC, BTF3 BCL9, BCL9L MSN, MT4 NUP43, NUP50 BTF3L4, BTG1 BCLAF1, BCOR MTA2, MTAP NUP98, NUS1 BTG2, BTG3 BCORL1, BCR MTCH1, MTG2 NVL, NXN BTLA, BTN2A1 BCS1L, BDH2 MTMR12, MTMR9 NXPH2, NXT2 BTN2A3P, BTN3A1 BDKRB1, BDNF MTPN, MTSS1L OARD1, OASL BTN3A3, BTNL3 BDP1, BECN1 MUC1, MXI1 OAT, OCIAD2 BTNL9, BTRC BEND3, BEND4 MXRA7, MYBL1 ORC3, ORMDL2 BUB1, BUB1B BEND5, BEST3 MYC, MYCBP OSBPL10, OSBPL7 BUB3, BUD13 BET1, BET1L MYCBP2, MYEF2 OSBPL8, OSTF1 BVES, BVES-AS1 BEX1, BEX2 MYO5C, MYO9A OXCT1, OXTR BYSL, BZRAP1 BEX4, BEX5 NAA30, NABP2 P4HA1, PACS2 BZW1, BZW2 BFAR, BFSP1 NACC1, NAIP PACSIN3, PAFAH1B3 C10ORF10, C10ORF105 BFSP2, BGLAP NALCN, NAP1L2 PAPD5, PAPD7 C10ORF114, C10ORF118 BHLHB9, BHLHE22 NARS, NAT8L PAQR8, PARD6B C10ORF137, C10ORF2 BHLHE40, BHLHE41 NBEA, NBPF15 PARP6, PARVA C10ORF35, C10ORF71 BICD1, BICD2 NCAM1, NCAPD2 PAX3, PAXBP1 C10ORF88, C10ORF99 BID, BIK NCAPG, NCAPG2 PAXIP1, PBX2 C11ORF16, C11ORF24 BIN1, BIN3 NCDN, NCKAP1 PCDH7, PCDHB14 C11ORF31, C11ORF48 BIRC2, BIRC3 NCKAP5L, NCL PCDHGA10, PCDHGA2 C11ORF52, C11ORF54 BIRC5, BIRC6 NCLN, NCOA3 PCMTD2, PCSK7 C11ORF57, C11ORF63 BIVM, BLCAP NCOA4, NCOR1 PCTP, PDCD10 C11ORF65, C11ORF68 BLM, BLMH NCOR2, NCSTN PDCD4, PDE12 C11ORF83, C11ORF84 BLNK, BLOC1S2 ND1, ND4 PDHA1, PDIA3 C11ORF91, C11ORF95 BLOC1S4, BLOC1S5 NDC1, NDE1 PDLIM7, PEG10 C11ORF96, C12ORF10 BLOC1S6, BLVRA NDFIP1, NDUFB5 PFDN1, PFN1 C12ORF29, C12ORF4 BLVRB, BMF NDUFC2, NDUFS1 PGAM1, PGD C12ORF40, C12ORF49 BMI1, BMP1 NEFH, NEK1 PGM2, PGRMC2 C12ORF5, C12ORF56 BMP2, BMP2K NETO2, NFAT5 PHAX, PHF12 C12ORF57, C12ORF60 BMP6, BMP7 NFATC2IP, NFE2L1 PHF20L1, PHKA1 441

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses C12ORF65, C12ORF66 BMP8A, BMP8B NFE2L2, NFIB PHKG2, PHLDA3 C12ORF68, C12ORF76 BMPR1A, BMPR1B NFKB1, NFS1 PHLDB2, PHLPP1 C14ORF166, C14ORF178 BMPR2, BMS1 NFYA, NFYC PHLPP2, PI16 C14ORF183, C14ORF28 BNC1, BNC2 NGRN, NHSL1 PICALM, PIGA C14ORF37, C14ORF79 BNIP2, BNIP3 NIN, NIPA2 PIGS, PIGT C15ORF27, C15ORF32 BNIP3L, BOC NIPBL, NKRF PIM1, PIP5K1A C15ORF39, C15ORF40 BOD1, BOD1L1 NKTR, NLE1 PIP5K1C, PIR C15ORF41, C15ORF52 BOK, BOLA2 NLGN3, NME2 PITPNC1, PKD1L1 C15ORF56, C15ORF57 BOLA2B, BOP1 NME4, NMNAT2 PKM, PLA2G4F C16ORF11, C16ORF46 BORA, BPGM NODAL, NOL8 PLCB3, PLCXD2 C16ORF52, C16ORF54 BPIFA1, BPIFB3 NOLC1, NONO PLEC, PLEK2 C16ORF58, C16ORF59 BPTF, BPY2 NOP2, NOP58 PLEKHA1, PLEKHB2 C16ORF62, C16ORF70 BPY2B, BPY2C NPEPPS, NR0B2 PLEKHG2, PLEKHH2 C16ORF71, C16ORF72 BRAF, BRAP NR2C2, NR2C2AP PLGRKT, PLK2 C17ORF100, C17ORF50 BRAT1, BRCA1 NSFL1C, NSMF PLS1, PLS3 C17ORF51, C17ORF58 BRCA2, BRCC3 NSUN2, NSUN5 PLXDC2, PLXNA4 C17ORF64, C17ORF66 BRD1, BRD2 NT5C2, NT5C3A PLXNB2, PNMA2 C17ORF70, C17ORF74 BRD3, BRD4 NT5DC2, NTF3 PNN, PNP C17ORF75, C17ORF77 BRD7, BRD8 NTRK2, NUBPL POGK, POLA2 C17ORF78, C17ORF80 BRDT, BRE NUF2, NUFIP2 POLD1, POLR2A C17ORF96, C17ORF98 BRF1, BRI3 NUMBL, NUP133 POLR2I, POLR2K C18ORF32, C18ORF54 BRI3BP, BRIP1 NUP205, NUP210 POLR3A, POLRMT C19ORF12, C19ORF18 BRMS1L, BRPF1 NUP43, NUP93 POM121, POM121C C19ORF47, C19ORF48 BRPF3, BRSK1 NUP98, NUTF2 POMC, PPA2 C19ORF53, C19ORF54 BRSK2, BRWD1 NXN, OAZ2 PPAP2C, PPARG C19ORF55, C19ORF73 BSDC1, BSG OGFOD1, OGG1 PPCS, PPIA C1D, C1GALT1 BSN, BTAF1 OLA1, OLIG2 PPIB, PPM1D C1ORF106, C1ORF109 BTBD1, BTBD10 OLR1, ORC2 PPM1H, PPP1CA C1ORF111, C1ORF112 BTBD17, BTBD2 OSBP, OSBP2 PPP1R12C, PPP1R13B C1ORF115, C1ORF123 BTBD3, BTBD7 OSBPL10, OSBPL1A PPP1R16A, PPP2R1A C1ORF134, C1ORF168 BTBD9, BTC OSBPL3, OSR1 PPP2R2A, PPP2R2C C1ORF172, C1ORF173 BTF3, BTF3L4 OSTF1, OTUD1 PPP2R4, PPP2R5A C1ORF174, C1ORF198 BTG1, BTG2 OTUD5, P4HA1 PQBP1, PRC1 C1ORF21, C1ORF27 BTG3, BTLA 
PACS2, PAG1 PRCP, PRDX2 C1ORF35, C1ORF43 BTN2A1, BTN2A3P PAIP2, PAK1 PRDX4, PRDX6 C1ORF53, C1ORF56 BTN3A1, BTN3A3 PALB2, PALLD PREX1, PRIMA1 C1ORF61, C1ORF63 BTNL3, BTNL9 PAN2, PAN3 PRKAA1, PRKAG1 C1ORF85, C1ORF86 BTRC, BUB1 PANK3, PAPD5 PRKAR2A, PRKCE C1QA, C1QBP BUB1B, BUB3 PARK7, PARL PRKG2, PRMT5 C1QTNF1, C1QTNF2 BUD13, BUD31 PARP1, PARP11 PROCR, PROM1 C1QTNF3, C1QTNF6 BVES, BVES-AS1 PARP4, PARP6 PRPF40A, PRSS21 C1QTNF9, C1S BYSL, BZRAP1 PARP9, PBRM1 PRUNE2, PSAT1 C2, C20ORF141 BZW1, BZW2 PBX1, PBX2 PSD3, PSG3 C20ORF173, C20ORF194 C10ORF10, C10ORF105 PBX3, PCBP1 PSG6, PSG9 C20ORF197, C20ORF24 C10ORF114, C10ORF118 PCBP2, PCDH8 PSIP1, PSMB5 C20ORF72, C20ORF85 C10ORF137, C10ORF2 PCDHB14, PCDHGA10 PSMD12, PSMD3 C20ORF96, C21ORF33 C10ORF35, C10ORF71 PCDHGA2, PCMTD2 PSME1, PSMG1 C21ORF58, C21ORF59 C10ORF88, C10ORF99 PCNT, PCSK7 PTAR1, PTBP1 C21ORF62, C21ORF91 C11ORF16, C11ORF24 PDCD4, PDGFD PTBP2, PTGES2 C22ORF29, C22ORF43 C11ORF30, C11ORF48 PDHA2, PDHB PTGS2, PTK2 C2CD2, C2CD3 C11ORF52, C11ORF54 PDIK1L, PDXDC1 PTMA, PTMAP7 C2CD4A, C2CD5 C11ORF57, C11ORF58 PEA15, PEBP1 PTPLA, PTPLAD1 C2ORF16, C2ORF18 C11ORF63, C11ORF65 PEF1, PEG10 PTPLB, PTPMT1 C2ORF42, C2ORF43 C11ORF68, C11ORF75 PELI1, PELI2 PTPN1, PTPN22 C2ORF44, C2ORF47 C11ORF82, C11ORF83 PELO, PER3 PTPN9, PTPRD C2ORF48, C2ORF49 C11ORF84, C11ORF9 PEX1, PEX19 PTPRF, PTPRG C2ORF69, C2ORF72 C11ORF91, C11ORF96 PFDN1, PFKFB2 PUS7, PWP1 C2ORF74, C2ORF81 C12ORF10, C12ORF23 PFN1, PGRMC2 PWWP2A, PXDN C2ORF88, C3 C12ORF29, C12ORF4 PHACTR2, PHAX PXN, PYGB C3ORF17, C3ORF30 C12ORF40, C12ORF49 PHC3, PHF12 QKI, QRSL1 C3ORF36, C3ORF37 C12ORF5, C12ORF52 PHF14, PHF16 RAB11FIP2, RAB27B C3ORF38, C3ORF49 C12ORF56, C12ORF57 PHF17, PHF20 RAB30, RAB34 C3ORF58, C4BPB C12ORF60, C12ORF65 PHF20L1, PHF21A RAB39B, RABEPK C4ORF17, C4ORF27 C12ORF66, C12ORF68 PHF5A, PHIP RABGAP1L, RABL2A C4ORF32, C4ORF45 C12ORF76, C14ORF1 PHKA1, PHLDA3 RABL2B, RAC1 C5, C5AR1 C14ORF166, C14ORF178 PHLPP2, PHOX2A RAI1, RALB C5ORF15, C5ORF22 C14ORF183, C14ORF2 PHPT1, 
PHRF1 RAP1B, RAP1GAP2 C5ORF24, C5ORF30 C14ORF28, C14ORF37 PHTF1, PIAS3 RAPGEF2, RARA C5ORF34, C5ORF42 C14ORF79, C14ORF80 PIGN, PIGQ RASSF1, RASSF5 C5ORF51, C6ORF106 C15ORF27, C15ORF32 PIGX, PIK3C2A RBBP5, RBFOX2 C6ORF118, C6ORF120 C15ORF39, C15ORF40 PIK3R1, PIP4K2A RBM12B, RBM28

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses C6ORF132, C6ORF141 C15ORF41, C15ORF52 PIP5K1A, PITPNC1 RBM39, RBM42 C6ORF195, C6ORF201 C15ORF56, C15ORF57 PITPNM1, PKD2 RBM47, RCC1 C6ORF211, C6ORF222 C16ORF11, C16ORF46 PKM, PKNOX1 RCC2, RCOR2 C6ORF223, C6ORF25 C16ORF52, C16ORF54 PKP2, PLAGL2 RELN, REM2 C6ORF58, C6ORF62 C16ORF58, C16ORF59 PLAT, PLD1 RER1, RFC2 C7ORF26, C7ORF31 C16ORF62, C16ORF70 PLEC, PLEKHA1 RFC5, RFT1 C7ORF33, C7ORF49 C16ORF71, C16ORF72 PLOD2, PLOD3 RGMB, RGN C7ORF60, C7ORF69 C17ORF100, C17ORF50 PLP2, PLXNB2 RGS12, RGS17 C8A, C8G C17ORF51, C17ORF53 PM20D2, PMAIP1 RGS5, RHOC C8ORF12, C8ORF31 C17ORF58, C17ORF59 PMP22, POFUT1 RHPN2, RIMS2 C8ORF33, C8ORF4 C17ORF64, C17ORF66 POGK, POGZ RIN1, RIPK2 C8ORF44, C8ORF46 C17ORF70, C17ORF74 POLB, POLD2 RNASEK, RNF138 C8ORF47, C9ORF114 C17ORF75, C17ORF77 POLE, POLG RNF141, RNF152 C9ORF163, C9ORF24 C17ORF78, C17ORF80 POLR1B, POLR2A RNF170, RNF213 C9ORF37, C9ORF40 C17ORF96, C17ORF98 POLR3A, POLR3B RNF4, RNF40 C9ORF41, C9ORF64 C18ORF25, C18ORF32 POLR3H, POM121 ROCK2, RPE C9ORF66, C9ORF72 C18ORF54, C18ORF56 POMGNT1, POMZP3 RPE65, RPL13A C9ORF78, C9ORF84 C19ORF12, C19ORF18 POU3F2, PPAP2A RPL14, RPL18A C9ORF89, CA1 C19ORF47, C19ORF48 PPARA, PPFIA4 RPL21, RPL23A CA10, CA12 C19ORF53, C19ORF54 PPIA, PPIAL4B RPL26, RPL27 CA13, CA14 C19ORF55, C19ORF73 PPIF, PPM1H RPL29, RPL32 CA2, CA3 C1D, C1GALT1 PPM1L, PPP1CA RPL35A, RPL4 CA5A, CA5B C1GALT1C1, C1ORF106 PPP1R15B, PPP1R9B RPL7L1, RPLP1 CA8, CA9 C1ORF109, C1ORF111 PPP2R1A, PPP2R2A RPP14, RPP40 CAAP1, CAB39 C1ORF112, C1ORF115 PPP2R2C, PPP2R5E RPRD2, RPS16 CAB39L, CABIN1 C1ORF122, C1ORF123 PPP6C, PPRC1 RPS29, RPS6 CABLES1, CABLES2 C1ORF134, C1ORF168 PRC1, PRDM16 RPS7, RPSA CABP1, CABP2 C1ORF172, C1ORF174 PRDX4, PREP RRBP1, RRM1 CABP5, CABYR C1ORF198, C1ORF21 PREPL, PRICKLE2 RRM2B, RRP36 CACFD1, CACHD1 C1ORF216, C1ORF27 PRICKLE4, PRKAA1 RSF1, RSRC2 CACNA1B, CACNA1I C1ORF35, C1ORF43 PRKAB1, PRKAB2 RXFP1, S100A11 CACNA2D1, CACNA2D3 C1ORF52, C1ORF53 PRKACA, PRKAR2A 
S100A16, S100G CACNA2D4, CACNB1 C1ORF56, C1ORF61 PRKCE, PRMT3 SAC3D1, SAMD15 CACNB2, CACNB3 C1ORF63, C1ORF74 PROM1, PROSER1 SAMD8, SAT1 CACNG1, CACNG4 C1ORF85, C1ORF86 PRPF39, PRR11 SATB1, SBNO2 CACNG5, CACNG8 C1QA, C1QB PRRC1, PRRC2A SCAF11, SCARF1 CACUL1, CAD C1QBP, C1QTNF1 PRRC2C, PRUNE2 SCML2, SCN3A CADM1, CADPS2 C1QTNF2, C1QTNF3 PSAT1, PSMB5 SCP2, SDC4 CAGE1, CALB2 C1QTNF6, C1QTNF9 PSMC4, PSMD10 SDPR, SDR39U1 CALCB, CALCOCO2 C1QTNF9B-AS1, C1S PSMD12, PSMD2 SEC11C, SEC13 CALD1, CALHM3 C2, C20ORF141 PSMD3, PSMD4 SEC16A, SEC23IP CALM1, CALM2 C20ORF173, C20ORF194 PSPH, PTAR1 SEC61A1, SEC62 CALM3, CALML5 C20ORF197, C20ORF24 PTBP3, PTDSS2 SECISBP2, SEMA4D CALML6, CALN1 C20ORF27, C20ORF72 PTEN, PTGFR SEMG2, SEPT2 CALR, CALU C20ORF85, C20ORF96 PTK2, PTPDC1 SEPT6, SERBP1 CAMK2B, CAMK2D C21ORF33, C21ORF58 PTPLA, PTPN1 SERP1, SERPINB5 CAMK2G, CAMK4 C21ORF59, C21ORF62 PTPN11, PTPN12 SERPINE1, SETD7 CAMKK2, CAMKV C21ORF91, C22ORF29 PTPN13, PTPN14 SFI1, SFN CAMLG, CAMSAP1 C22ORF42, C22ORF43 PTPN3, PTPN9 SFXN1, SGK3 CAMSAP2, CAMSAP3 C2CD2, C2CD3 PTPRD, PTPRF SGSH, SH2D1A CAMTA1, CAMTA2 C2CD4A, C2CD5 PTX3, PUM2 SH2D4A, SH3BGRL3 CAND1, CAND2 C2ORF16, C2ORF18 PURA, PURB SH3PXD2B, SH3TC2 CANT1, CANX C2ORF42, C2ORF43 PXN, PYGO1 SHCBP1, SHE CAP1, CAPG C2ORF44, C2ORF47 QKI, QRSL1 SIGLEC10, SIGMAR1 CAPN1, CAPN11 C2ORF49, C2ORF69 RAB11FIP1, RAB11FIP2 SIK1, SIN3A CAPN13, CAPN2 C2ORF72, C2ORF74 RAB22A, RAB30 SIRT1, SKP2 CAPN7, CAPNS1 C2ORF81, C2ORF88 RAB5C, RAB6A SLBP, SLC12A6 CAPRIN1, CAPRIN2 C3, C3ORF17 RAB6B, RAB6C SLC16A9, SLC25A1 CAPS2, CAPZA1 C3ORF18, C3ORF30 RABGAP1, RABGEF1 SLC25A10, SLC25A19 CAPZA2, CAPZB C3ORF36, C3ORF37 RABL6, RAC1 SLC25A22, SLC25A30 CARD10, CARD11 C3ORF38, C3ORF49 RACGAP1, RAD23B SLC25A32, SLC25A44 CARD8, CARF C3ORF58, C4BPB RAI1, RAI14 SLC26A7, SLC27A4 CARHSP1, CARM1 C4ORF17, C4ORF27 RALGPS2, RAN SLC29A1, SLC35A5 CARNS1, CARS C4ORF32, C4ORF45 RANBP10, RANGAP1 SLC35F5, SLC37A3 CARS2, CASC3 C5, C5AR1 RAP1GAP2, RAPGEF6 SLC39A14, SLC39A3 CASC4, CASC5 C5ORF15, 
C5ORF22 RAPH1, RARA SLC39A6, SLC44A1 CASD1, CASK C5ORF24, C5ORF30 RASEF, RASGRP1 SLC45A3, SLC7A1 CASKIN2, CASP14 C5ORF34, C5ORF42 RASGRP3, RASSF2 SLC7A11, SLX4 CASP2, CASP3 C5ORF51, C6ORF106 RB1, RBBP5 SMAD2, SMAP2 CASP7, CASP8 C6ORF120, C6ORF132 RBM10, RBM15B SMARCA1, SMARCA2 CASP8AP2, CASP9 C6ORF141, C6ORF15 RBM33, RBM39 SMARCB1, SMARCC1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses CASQ1, CASQ2 C6ORF195, C6ORF201 RBM42, RBM6 SMC4, SMEK1 CAST, CAT C6ORF211, C6ORF222 RBMXL1, RDH11 SMIM14, SMO CATSPERB, CATSPERG C6ORF223, C6ORF25 RECK, RER1 SMU1, SNAI1 CAV1, CAV2 C6ORF47, C6ORF58 RERE, REST SNAI2, SNAPIN CBFA2T2, CBFB C6ORF62, C6ORF89 REV1, REV3L SNCA, SNRNP200 CBL, CBLN4 C7ORF10, C7ORF26 RFC1, RGMB SNRNP35, SNX5 CBR1, CBR4 C7ORF31, C7ORF33 RGS5, RHBDD1 SNX6, SOD2 CBS, CBWD1 C7ORF49, C7ORF55 RHOA, RHOB SOS2, SOX4 CBX1, CBX2 C7ORF60, C7ORF69 RHOQ, RICTOR SOX5, SOX6 CBX3, CBX4 C8A, C8G RIN2, RIT1 SOX9, SP1 CBX5, CBX6 C8ORF12, C8ORF22 RMND5A, RNASEK SPAG5, SPATA2 CBX7, CBY1 C8ORF31, C8ORF33 RND3, RNF10 SPC24, SPEN CC2D1B, CC2D2A C8ORF4, C8ORF44 RNF11, RNF152 SPERT, SPI1 CC2D2B, CCAR1 C8ORF46, C8ORF47 RNF170, RNF20 SPIN1, SPINK1 CCBL1, CCBL2 C9ORF114, C9ORF142 RNF215, RNF219 SPIRE1, SPP1 CCDC102A, CCDC103 C9ORF16, C9ORF163 RNF38, RNF4 SPRTN, SPRY2 CCDC106, CCDC108 C9ORF24, C9ORF37 RNF40, RNF44 SPRYD4, SPTBN1 CCDC109B, CCDC110 C9ORF40, C9ORF41 RP2, RPA1 SRC, SREK1 CCDC111, CCDC112 C9ORF64, C9ORF66 RPE, RPL10A SRF, SRGAP1 CCDC113, CCDC115 C9ORF72, C9ORF78 RPL12, RPL13A SRI, SRPK1 CCDC117, CCDC120 C9ORF84, C9ORF85 RPL14, RPL15 SRRM1, SRRM2 CCDC121, CCDC124 C9ORF89, CA1 RPL18A, RPL21 SRSF4, SRSF5 CCDC134, CCDC136 CA10, CA11 RPL23A, RPL26 SRSF6, SRSF7 CCDC137, CCDC14 CA12, CA13 RPL27, RPL29 SRSF9, SRXN1 CCDC141, CCDC142 CA14, CA2 RPL3, RPL32 SSFA2, SSNA1 CCDC146, CCDC147 CA3, CA5A RPL35A, RPL36A SSR1, STAMBPL1 CCDC15, CCDC150 CA5B, CA8 RPL4, RPL7L1 STIM1, STK24 CCDC151, CCDC170 CA9, CAAP1 RPL8, RPLP0 STXBP3, STYX CCDC174, CCDC176 CAB39, CAB39L RPRD2, RPS13 SUCLA2, SUGP1 CCDC18, CCDC180 CABIN1, CABLES1 RPS15, RPS16 SUPT6H, SUSD1 CCDC181, CCDC19 CABLES2, CABP1 RPS17, RPS2 SUV420H1, SUZ12 CCDC22, CCDC28A CABP2, CABP5 RPS24, RPS25 SYBU, SYMPK CCDC30, CCDC34 CABP7, CABYR RPS29, RPS5 SYNE1, SYNE2 CCDC38, CCDC42 CACFD1, CACHD1 RPS6, RPS6KA3 SYNJ1, SYNPO2L CCDC43, CCDC47 CACNA1B, CACNA1C RPS7, RPSA 
SYPL1, SYT4 CCDC51, CCDC57 CACNA1I, CACNA1S RRAGC, RRM2B SYTL2, TAC1 CCDC58, CCDC59 CACNA2D1, CACNA2D3 RSF1, RSPRY1 TACC3, TAGLN2 CCDC6, CCDC65 CACNA2D4, CACNB1 RTN4, RUFY3 TAOK1, TAOK2 CCDC69, CCDC7 CACNB2, CACNB3 RUNDC3B, SACM1L TAS2R7, TAT CCDC71, CCDC74B CACNB4, CACNG1 SAMD5, SAP30L TAZ, TBC1D9 CCDC77, CCDC8 CACNG3, CACNG4 SAPCD2, SAR1A TBC1D9B, TBCD CCDC86, CCDC88C CACNG5, CACNG8 SART1, SASH1 TCEAL1, TCERG1 CCDC92, CCDC94 CACUL1, CAD SAT1, SATB1 TCP1, TDP1 CCDC96, CCDC97 CADM1, CADPS2 SBNO1, SCAF11 TELO2, TERT CCER1, CCIN CAGE1, CALB2 SCARB2, SCARF1 TET3, TEX2 CCKBR, CCL1 CALCB, CALCOCO2 SCML2, SCN4B TFRC, TGFB1 CCL11, CCL14 CALCRL, CALD1 SCP2, SCRN1 TGFBR1, TH1L CCL15, CCL17 CALHM3, CALM2 SDHA, SEC13 THAP2, THBS1 CCL19, CCL2 CALM3, CALML5 SEC24C, SEC61G THRAP3, THY1 CCL20, CCL22 CALML6, CALN1 SEC63, SECISBP2 TIAM1, TIGD4 CCL24, CCL25 CALR, CALU SELE, SEMA5A TIMP3, TKT CCL26, CCL28 CAMK2B, CAMK2D SEPHS1, SEPHS2 TLE1, TLE4 CCL3L1, CCL3L3 CAMK2G, CAMK4 SEPT11, SEPT2 TLR4, TMCC1 CCL4, CCL5 CAMKK2, CAMKMT SEPT7, SERAC1 TMED5, TMEM106C CCL8, CCNA1 CAMKV, CAMLG SERBP1, SERINC3 TMEM107, TMEM127 CCNA2, CCNB1 CAMP, CAMSAP1 SERPINB5, SERPINI1 TMEM136, TMEM165 CCNB2, CCNC CAMSAP2, CAMSAP3 SERTAD4, SESN1 TMEM167A, TMEM68 CCND1, CCND2 CAMTA1, CAMTA2 SESTD1, SET TMEM87A, TMEM97 CCND3, CCNE1 CAND1, CAND2 SETD1B, SETDB1 TMEM99, TMF1 CCNE2, CCNF CANT1, CANX SF1, SF3B1 TMOD3, TMSB4X CCNG1, CCNG2 CAP1, CAPG SF3B3, SFPQ TMX1, TNIP3 CCNI, CCNJ CAPN1, CAPN11 SFRP2, SFXN1 TNKS1BP1, TNPO2 CCNK, CCNL1 CAPN13, CAPN2 SGCB, SGK3 TNRC6B, TNS4 CCNO, CCNT2 CAPN7, CAPN9 SGSM1, SGTA TOB1, TOR2A CCNY, CCNYL1 CAPNS1, CAPRIN1 SH2D1A, SHC1 TP53, TP73 CCP110, CCPG1 CAPRIN2, CAPS2 SHISA2, SIGLEC10 TPD52L2, TPM1 CCR1, CCR2 CAPZA1, CAPZA2 SIGMAR1, SIP1 TPM2, TPM3 CCR4, CCR7 CAPZB, CARD10 SIRT1, SKI TPM4, TPPP CCR9, CCRN4L CARD11, CARD14 SKIV2L, SKP1 TPSD1, TPSG1 CCS, CCSAP CARD8, CARF SKP2, SLAIN2 TRA2B, TRAP1 CCSER2, CCT2 CARHSP1, CARM1 SLC11A2, SLC12A6 TRAPPC2P1, TRAPPC3

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses CCT3, CCT4 CARNS1, CARS SLC15A2, SLC16A10 TRIM2, TRIM24 CCT5, CCT6A CARS2, CASC3 SLC16A9, SLC17A5 TRIM26, TRIM36 CCT6B, CCT6P1 CASC4, CASC5 SLC19A2, SLC22A23 TRIM37, TRIM59 CCT7, CCT8 CASD1, CASK SLC25A10, SLC25A3 TRIM9, TRIO CCZ1, CCZ1B CASKIN2, CASP14 SLC25A30, SLC25A36 TRMT61A, TROVE2 CD101, CD109 CASP2, CASP3 SLC25A44, SLC25A5 TRPA1, TRPM4 CD14, CD151 CASP4, CASP7 SLC25A51, SLC26A2 TRPM6, TRPS1 CD160, CD164 CASP8, CASP8AP2 SLC30A7, SLC31A1 TSC1, TSHR CD177, CD1D CASP9, CASQ1 SLC35A2, SLC35F5 TSKU, TSPAN14 CD22, CD226 CASQ2, CASR SLC37A3, SLC37A4 TSPAN18, TSPAN19 CD24P4, CD27 CAST, CAT SLC38A1, SLC39A8 TSPAN4, TSPAN6 CD274, CD276 CATSPERB, CATSPERG SLC3A2, SLC44A1 TST, TTC17 CD2AP, CD2BP2 CAV1, CAV2 SLC52A2, SLC5A3 TTC37, TUBA1B CD300A, CD300C CBFA2T2, CBFB SLC6A9, SLC7A1 TUBAL3, TUBB2B CD302, CD320 CBL, CBLC SLC7A11, SLC9A1 TULP4, TWF1 CD33, CD34 CBLL1, CBLN4 SLC9A6, SLFN11 TWF2, TWSG1 CD36, CD37 CBR1, CBR3 SLK, SLMAP TXLNA, UBA2 CD38, CD3EAP CBR4, CBS SMAD2, SMAD3 UBA6, UBAC2 CD40, CD40LG CBWD1, CBX1 SMAD7, SMAP2 UBC, UBE2I CD44, CD46 CBX2, CBX3 SMARCA4, SMARCE1 UBE2Q2, UBE2W CD47, CD55 CBX4, CBX5 SMC1A, SMC6 UBE2Z, UBQLN4 CD59, CD5L CBX6, CBX7 SMCHD1, SMNDC1 UBR5, UBTF CD63, CD69 CBY1, CC2D1B SMOC1, SMU1 UCK1, UGGT1 CD79A, CD81 CC2D2A, CC2D2B SMURF2, SNF8 UGGT2, UGP2 CD83, CD8A CCAR1, CCBL1 SNRK, SNRNP200 UGT2B17, UGT8 CD9, CD96 CCBL2, CCDC102B SNRNP48, SNX30 UHMK1, UHRF1 CD97, CD99 CCDC103, CCDC106 SNX5, SOCS1 UHRF1BP1, UNC119B CD99L2, CDA CCDC108, CCDC109B SOCS4, SOCS5 UNC13D, UNC45A CDADC1, CDAN1 CCDC11, CCDC110 SOD2, SOD3 UNC93B1, URI1 CDC123, CDC14A CCDC111, CCDC112 SOGA3, SON USP10, USP21 CDC14B, CDC16 CCDC113, CCDC115 SORT1, SOST USP33, USP45 CDC20, CDC23 CCDC117, CCDC12 SOWAHC, SOX2 UST, UTRN CDC25A, CDC25B CCDC120, CCDC121 SOX5, SP1 UVRAG, VAC14 CDC25C, CDC27 CCDC124, CCDC125 SP3, SPAG5 VAMP7, VASP CDC34, CDC37 CCDC134, CCDC136 SPAG9, SPATS2L VEGFA, VIM CDC37L1, CDC42 CCDC137, CCDC14 
SPECC1L, SPEN VKORC1, VMP1 CDC42BPA, CDC42BPB CCDC141, CCDC142 SPG11, SPIN1 VPS13C, VPS37B CDC42EP1, CDC42EP4 CCDC144A, CCDC144NL SPIRE1, SPRTN VPS4B, VPS53 CDC42SE1, CDC5L CCDC146, CCDC147 SPRY2, SPRYD3 VSIG1, WASF2 CDC6, CDC7 CCDC15, CCDC150 SPTBN1, SPTBN2 WDFY1, WDR11 CDC73, CDCA2 CCDC151, CCDC170 SPTLC1, SPTLC3 WDR18, WDR33 CDCA3, CDCA4 CCDC174, CCDC176 SPTSSA, SREK1 WDR62, WLS CDCA5, CDCA7 CCDC18, CCDC180 SRF, SRP68 XPNPEP3, XPO6 CDCA7L, CDCA8 CCDC181, CCDC19 SRPK1, SRPK2 XPOT, XRCC6 CDCP1, CDH1 CCDC22, CCDC25 SRRM2, SRRT YBX1, YBX3 CDH11, CDH13 CCDC28A, CCDC3 SRSF11, SRSF4 YES1, YIPF6 CDH18, CDH19 CCDC30, CCDC34 SS18, SSFA2 YTHDC1, YTHDF2 CDH2, CDH20 CCDC38, CCDC39 SSH1, SSH2 YWHAH, YWHAQ CDH3, CDH4 CCDC41, CCDC42 SSR1, SSR3 YWHAZ, YY1 CDH5, CDHR1 CCDC43, CCDC47 SSRP1, ST6GAL1 ZBTB18, ZBTB21 CDHR2, CDHR3 CCDC51, CCDC53 ST6GALNAC4, STAG2 ZBTB39, ZBTB4 CDHR5, CDIP1 CCDC57, CCDC58 STAM2, STAMBP ZBTB44, ZBTB6 CDIPT, CDK1 CCDC59, CCDC6 STARD13, STAT3 ZBTB9, ZC3H11A CDK10, CDK11A CCDC65, CCDC68 STAT5A, STAU2 ZC3H14, ZCCHC14 CDK12, CDK13 CCDC69, CCDC7 STIM1, STK16 ZCCHC6, ZDHHC5 CDK14, CDK16 CCDC71, CCDC74B STK3, STMN3 ZEB1, ZEB2 CDK17, CDK18 CCDC77, CCDC8 STOX2, STRBP ZFAND5, ZFC3H1 CDK19, CDK2 CCDC82, CCDC86 STXBP5, SUCLA2 ZHX1, ZMYND8 CDK20, CDK2AP1 CCDC88C, CCDC92 SUN2, SUPT6H ZNF207, ZNF236 CDK4, CDK5R2 CCDC94, CCDC96 SUPV3L1, SUV420H1 ZNF264, ZNF280C CDK5RAP1, CDK5RAP2 CCDC97, CCER1 SUZ12, SV2A ZNF326, ZNF384 CDK6, CDK7 CCIN, CCKBR SYNE2, SYNGR1 ZNF395, ZNF398 CDK8, CDK9 CCL1, CCL11 SYNJ2, SYT4 ZNF436, ZNF48 CDKAL1, CDKL5 CCL14, CCL15 TACC3, TAF1 ZNF561, ZNF567 CDKN1A, CDKN1B CCL17, CCL19 TAF5, TANGO6 ZNF568, ZNF579 CDKN1C, CDKN2A CCL2, CCL20 TAOK1, TAOK2 ZNF587, ZNF594 CDKN2AIP, CDKN2AIPNL CCL22, CCL24 TARDBP, TAS2R7 ZNF598, ZNF608 CDKN2B, CDKN2C CCL25, CCL26 TATDN2, TAZ ZNF622, ZNF638 CDKN2D, CDKN3 CCL27, CCL28 TBC1D22B, TBC1D9B ZNF689, ZNF701 CDON, CDR1 CCL3L1, CCL3L3 TBK1, TBL1XR1 ZNF746, ZNF799 CDR2L, CDRT15L2 CCL4, CCL5 TBX19, TCEAL1 ZNF804A, ZSCAN12 445

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses CDRT4, CDS1 CCL7, CCL8 TCEB2, TCERG1 ZXDA CDS2, CDT1 CCM2L, CCNA1 TCF21, TCF7L1 CDV3, CDX2 CCNA2, CCNB1 TCOF1, TCP1 CDX4, CDY2B CCNB1IP1, CCNB2 TCP11, TDRP CEACAM21, CEACAM3 CCNC, CCND1 TESK2, TET1 CEACAM5, CEACAM8 CCND2, CCND3 TET3, TEX10 CEBPA, CEBPB CCNDBP1, CCNE1 TFAP2A, TFDP2 CEBPD, CEBPG CCNE2, CCNF TFRC, TGFB1 CEBPZ, CECR2 CCNG1, CCNG2 TGFB2, TGFBI CECR5, CECR5-AS1 CCNI, CCNJ TGFBR1, TGFBR2 CEL, CELA3B CCNK, CCNL1 TGFBR3, TGFBRAP1 CELF1, CELF2 CCNL2, CCNO TGIF1, TGIF2 CELF4, CELSR1 CCNT1, CCNT2 THOC2, THOP1 CELSR3, CEND1 CCNY, CCNYL1 TIAM1, TICAM1 CENPA, CENPB CCP110, CCPG1 TIMM50, TIMP3 CENPE, CENPF CCR1, CCR2 TIPARP, TLE3 CENPI, CENPJ CCR4, CCR6 TLE4, TLN1 CENPK, CENPM CCR7, CCR9 TLR4, TM9SF3 CENPN, CENPO CCRN4L, CCS TMED5, TMED7 CENPP, CENPQ CCSAP, CCSER2 TMEM107, TMEM127 CENPT, CENPV CCT2, CCT3 TMEM132B, TMEM136 CEP104, CEP112 CCT4, CCT5 TMEM147, TMEM165 CEP120, CEP128 CCT6A, CCT6B TMEM167A, TMEM167B CEP152, CEP164 CCT6P1, CCT7 TMEM168, TMEM183A CEP170, CEP170B CCT8, CCZ1 TMEM2, TMEM245 CEP19, CEP192 CCZ1B, CD101 TMEM248, TMEM41A CEP250, CEP350 CD109, CD14 TMEM56, TMEM59 CEP41, CEP44 CD151, CD160 TMEM64, TMEM74 CEP55, CEP63 CD163L1, CD164 TMEM97, TMEM99 CEP68, CEP72 CD177, CD1D TMPO, TMUB1 CEP78, CEP85 CD22, CD226 TMX4, TNFAIP3 CEP85L, CEP97 CD24, CD248 TNFRSF11B, TNFSF10 CEPT1, CERCAM CD24P4, CD27 TNIP1, TNIP2 CERK, CERS1 CD274, CD276 TNIP3, TNKS2 CERS2, CERS5 CD28, CD2AP TNPO1, TNRC6B CERS6, CES1 CD2BP2, CD300A TNS3, TNXB CES2, CETN1 CD300C, CD302 TOB1, TOB2 CETN2, CETN3 CD320, CD33 TOM1, TOMM20 CFB, CFH CD34, CD36 TOP2A, TOP3A CFHR2, CFI CD37, CD38 TOPORS, TP53 CFL1, CFL2 CD3EAP, CD40 TP53BP2, TP63 CFLAR, CFP CD40LG, CD44 TPD52, TPD52L2 CFTR, CGA CD46, CD47 TPM1, TPRG1L CGN, CGNL1 CD55, CD59 TPSG1, TRAF4 CHAC1, CHAF1A CD5L, CD68 TRAM2, TRAP1 CHAF1B, CHAMP1 CD69, CD79A TRAPPC2, TRAPPC2P1 CHAT, CHCHD10 CD81, CD83 TRIB1, TRIM16L CHCHD2, CHCHD5 CD86, CD8A TRIM2, TRIM27 CHCHD6, CHD1 CD9, CD96 
TRIM28, TRIM29 CHD3, CHD4 CD97, CD99 TRIM33, TRIM36 CHD6, CHD7 CD99L2, CDA TRIM37, TRIM38 CHD8, CHD9 CDADC1, CDAN1 TRIM44, TRIM59 CHDH, CHEK1 CDC123, CDC14A TRIM65, TRMT5 CHERP, CHFR CDC14B, CDC16 TRMT61A, TRPC3 CHI3L2, CHIC2 CDC20, CDC23 TRPM7, TRRAP CHL1, CHM CDC25A, CDC25B TSC22D2, TSHZ3 CHML, CHMP2A CDC25C, CDC27 TSKU, TSN CHMP3, CHMP4B CDC34, CDC37 TSNAX, TSPAN14 CHMP4C, CHMP7 CDC37L1, CDC40 TSPAN6, TSPYL1 CHN1, CHORDC1 CDC42, CDC42BPA TSR2, TST CHP1, CHPF CDC42BPB, CDC42EP1 TTC21B, TTC33 CHPF2, CHPT1 CDC42EP3, CDC42EP4 TUB, TUBA1B CHRAC1, CHRDL2 CDC42SE1, CDC42SE2 TUBA1C, TUBAL3 CHRFAM7A, CHRM2 CDC5L, CDC6 TUBB3, TUBB4B CHRM3, CHRNA2 CDC7, CDC73 TUBGCP5, TUFM CHRNA5, CHRNA7 CDCA2, CDCA3 TUT1, TXLNG2P CHRNA9, CHRNB1 CDCA4, CDCA5 TXN, TXNL4A CHRNB4, CHST11 CDCA7, CDCA7L UBA1, UBAP1 CHST12, CHST14 CDCA8, CDCP1 UBAP2, UBC CHST3, CHST6 CDH1, CDH11 UBE2I, UBE2J1 CHST8, CHST9 CDH12, CDH13 UBE2J2, UBE2K 446

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses CHSY1, CHSY3 CDH18, CDH19 UBE2L3, UBE2N CHTF18, CHTF8 CDH2, CDH20 UBE2W, UBIAD1 CHTOP, CHUK CDH3, CDH5 UBQLN4, UBR3 CIAO1, CIAPIN1 CDH6, CDH9 UBR5, UBR7 CIB1, CIB2 CDHR1, CDHR2 UGGT1, UGT2B17 CIC, CIDEA CDHR3, CDHR5 UGT8, UHRF1 CIITA, CIRBP CDIP1, CDIPT UHRF1BP1, UHRF2 CIRH1A, CISD1 CDK1, CDK10 UNC13B, UQCR10 CISD2, CIT CDK11A, CDK12 UROD, USP10 CITED2, CIZ1 CDK13, CDK14 USP12, USP15 CKAP2, CKAP2L CDK16, CDK17 USP18, USP22 CKAP4, CKAP5 CDK18, CDK19 USP28, USP34 CKB, CKLF CDK2, CDK20 USP47, USP5 CKMT1B, CKS1B CDK2AP1, CDK2AP2 USP7, USP9X CKS2, CLASP1 CDK4, CDK5 UTP14A, UTRN CLASP2, CLCA2 CDK5R1, CDK5R2 VAC14, VANGL1 CLCA4, CLCC1 CDK5RAP1, CDK5RAP2 VASH2, VCAM1 CLCN2, CLCN3 CDK5RAP3, CDK6 VCL, VCP CLCN5, CLCN6 CDK7, CDK8 VCPIP1, VDAC1 CLCN7, CLDN1 CDK9, CDKAL1 VEGFA, VHL CLDN12, CLDN14 CDKL2, CDKL5 VIM, VOPP1 CLDN23, CLDN4 CDKN1A, CDKN1B VPS13A, VPS26A CLDN5, CLDN7 CDKN1C, CDKN2A VPS36, VPS37B CLDN9, CLDND1 CDKN2AIP, CDKN2AIPNL VPS53, VPS54 CLEC11A, CLEC14A CDKN2B, CDKN2C VSIG1, WASF1 CLEC16A, CLEC1A CDKN2D, CDKN3 WASF3, WDFY1 CLEC3B, CLEC4D CDO1, CDON WDR18, WDR34 CLEC4G, CLEC5A CDR1, CDR2L WDR37, WDR6 CLIC1, CLIC2 CDRT15L2, CDRT4 WDR61, WDR62 CLIC3, CLIC4 CDS1, CDS2 WDR7, WEE1 CLINT1, CLIP1 CDT1, CDV3 WFS1, WHSC1 CLIP2, CLIP3 CDX2, CDX4 WHSC1L1, WIBG CLIP4, CLK1 CDY2B, CEACAM21 WNK1, WNK3 CLK3, CLK4 CEACAM3, CEACAM5 WNT1, WNT3 CLMN, CLN3 CEACAM6, CEACAM8 WNT5A, WWC2 CLN6, CLN8 CEBPA, CEBPB XIAP, XPO6 CLNS1A, CLOCK CEBPD, CEBPE XRCC6, YAP1 CLP1, CLPB CEBPG, CECR2 YBX3, YES1 CLPP, CLPTM1 CECR5, CECR5-AS1 YIPF6, YKT6 CLPTM1L, CLPX CEL, CELA3B YLPM1, YME1L1 CLSPN, CLSTN1 CELF1, CELF2 YOD1, YWHAB CLSTN3, CLTA CELF4, CELSR1 YWHAE, YWHAG CLTB, CLTC CELSR2, CELSR3 YY1, YY1AP1 CLU, CLUAP1 CEMP1, CEND1 ZADH2, ZBTB18 CLUH, CLVS1 CENPA, CENPB ZBTB20, ZBTB21 CMAS, CMC1 CENPE, CENPF ZBTB38, ZBTB39 CMPK1, CMPK2 CENPI, CENPJ ZBTB4, ZBTB44 CMTM1, CMTM3 CENPK, CENPM ZBTB47, ZBTB7A CMTM4, CMTM6 CENPN, CENPP ZBTB8A, ZC3H14 
CMTM8, CMYA5 CENPQ, CENPT ZC3H18, ZCCHC14 CNBP, CNDP1 CENPV, CEP104 ZCCHC2, ZCCHC24 CNDP2, CNEP1R1 CEP112, CEP120 ZCCHC7, ZEB1 CNFN, CNGB1 CEP128, CEP152 ZEB2, ZFAND5 CNIH, CNIH2 CEP164, CEP170 ZFPM2, ZFYVE16 CNIH3, CNIH4 CEP170B, CEP19 ZFYVE19, ZFYVE20 CNN3, CNNM2 CEP192, CEP250 ZFYVE9, ZKSCAN8 CNNM3, CNNM4 CEP350, CEP41 ZMIZ1, ZMYM2 CNOT1, CNOT11 CEP44, CEP55 ZMYND11, ZMYND8 CNOT2, CNOT3 CEP57, CEP63 ZNF101, ZNF131 CNOT4, CNOT6 CEP68, CEP72 ZNF160, ZNF207 CNOT6L, CNOT7 CEP78, CEP85 ZNF217, ZNF233 CNOT8, CNP CEP85L, CEP97 ZNF236, ZNF264 CNPY3, CNPY4 CEPT1, CERCAM ZNF275, ZNF292 CNRIP1, CNTD2 CERK, CERS1 ZNF321P, ZNF326 CNTLN, CNTNAP1 CERS2, CERS4 ZNF35, ZNF367 CNTNAP2, CNTNAP3 CERS5, CERS6 ZNF395, ZNF407 CNTNAP3B, CNTRL CES1, CES2 ZNF416, ZNF436 CNTROB, COA1 CES3, CETN1 ZNF460, ZNF532 COA6, COASY CETN2, CFB ZNF567, ZNF571 COBLL1, COCH CFH, CFHR2 ZNF587, ZNF598 COG1, COG2 CFI, CFL1 ZNF608, ZNF618 447

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses COG3, COG4 CFL2, CFLAR ZNF652, ZNF658 COG5, COG8 CFP, CFTR ZNF667, ZNF701 COIL, COL10A1 CGA, CGGBP1 ZNF708, ZNF711 COL11A1, COL12A1 CGN, CGNL1 ZNF714, ZNF770 COL14A1, COL15A1 CH25H, CHAC1 ZNF772, ZNF816 COL18A1, COL1A1 CHAF1A, CHAF1B ZNF829, ZRANB1 COL1A2, COL21A1 CHAMP1, CHAT ZSCAN4, ZW10 COL27A1, COL2A1 CHCHD10, CHCHD2 ZYG11B, ZYX COL3A1, COL4A1 CHCHD3, CHCHD5 COL4A2, COL4A3 CHCHD6, CHCHD7 COL4A5, COL4A6 CHD1, CHD1L COL5A2, COL5A3 CHD2, CHD3 COL6A1, COL6A2 CHD4, CHD5 COL6A5, COL6A6 CHD6, CHD7 COL7A1, COL8A2 CHD8, CHD9 COL9A2, COL9A3 CHDH, CHEK1 COLEC10, COLEC12 CHERP, CHFR COLGALT1, COLGALT2 CHI3L2, CHIC2 COLQ, COMMD10 CHL1, CHM COMMD2, COMMD4 CHML, CHMP2A COMMD6, COMMD7 CHMP2B, CHMP3 COMMD9, COMT CHMP4B, CHMP4C COPA, COPB1 CHMP6, CHMP7 COPB2, COPE CHN1, CHORDC1 COPG1, COPRS CHP1, CHPF COPS2, COPS3 CHPF2, CHPT1 COPS5, COPS6 CHRAC1, CHRDL1 COPS7A, COPS7B CHRDL2, CHRFAM7A COPS8, COPZ1 CHRM1, CHRM2 COQ10B, COQ2 CHRM3, CHRNA2 COQ3, COQ6 CHRNA5, CHRNA7 CORO1A, CORO1B CHRNA9, CHRNB1 CORO1C, CORO2A CHRNB4, CHST11 CORO2B, CORO6 CHST12, CHST14 CORO7, COTL1 CHST15, CHST3 COX1, COX10 CHST4, CHST6 COX14, COX16 CHST7, CHST8 COX17, COX18 CHST9, CHSY1 COX2, COX3 CHSY3, CHTF18 COX4I1, COX5A CHTF8, CHTOP COX6B1, COX7A2 CHUK, CIAO1 COX7A2L, COX7B CIAPIN1, CIB1 COX7C, COX8A CIB2, CIC COX8C, CPA3 CIDEA, CIITA CPA4, CPAMD8 CIRBP, CIRH1A CPD, CPE CISD1, CISD2 CPEB2, CPEB3 CISH, CIT CPEB4, CPED1 CITED2, CIZ1 CPNE1, CPNE2 CKAP2, CKAP2L CPNE4, CPNE5 CKAP4, CKAP5 CPNE6, CPNE8 CKB, CKLF CPO, CPOX CKMT1B, CKS1B CPPED1, CPS1 CKS2, CLASP1 CPSF1, CPSF2 CLASP2, CLCA2 CPSF3, CPSF3L CLCA4, CLCC1 CPSF4, CPSF6 CLCN2, CLCN3 CPSF7, CPT1A CLCN5, CLCN6 CPT1B, CPT2 CLCN7, CLDN1 CR2, CRABP1 CLDN12, CLDN14 CRAMP1L, CRB2 CLDN23, CLDN4 CRB3, CRBN CLDN5, CLDN7 CREB1, CREB3L2 CLDN9, CLDND1 CREB5, CREBBP CLEC11A, CLEC14A CREBL2, CREBRF CLEC16A, CLEC2B CREBZF, CRELD2 CLEC3B, CLEC4D CRH, CRHBP CLEC4G, CLEC5A CRHR2, CRIM1 CLIC1, CLIC2 CRIP2, CRISP1 
CLIC3, CLIC4 CRISPLD2, CRK CLINT1, CLIP1 CRKL, CRLF3 CLIP2, CLIP3 CRLS1, CRNKL1 CLIP4, CLK1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses CRNN, CROT CLK3, CLK4 CRTAM, CRTAP CLMN, CLMP CRTC1, CRTC2 CLN3, CLN5 CRTC3, CRY1 CLN6, CLN8 CRY2, CRYAA CLNS1A, CLOCK CRYBA1, CRYBA2 CLP1, CLPB CRYBG3, CRYGC CLPP, CLPTM1 CRYGN, CRYGS CLPTM1L, CLPX CS, CSAG1 CLSPN, CLSTN1 CSDE1, CSE1L CLSTN2, CLSTN3 CSF1, CSGALNACT1 CLTA, CLTB CSGALNACT2, CSHL1 CLTC, CLU CSK, CSN3 CLUAP1, CLUH CSNK1A1, CSNK1D CLVS1, CLYBL CSNK1E, CSNK1G1 CMAS, CMC1 CSNK1G2, CSNK1G3 CMC2, CMC4 CSNK2A1, CSNK2A2 CMPK1, CMPK2 CSPG5, CSPP1 CMSS1, CMTM1 CSRNP2, CSRNP3 CMTM2, CMTM3 CSRP1, CST1 CMTM4, CMTM6 CST2, CST3 CMTM7, CMTM8 CST4, CST5 CMYA5, CNBP CST6, CST9L CNDP1, CNDP2 CSTB, CSTF2 CNEP1R1, CNFN CSTF2T, CSTF3 CNGB1, CNIH CTBP1, CTBP2 CNIH2, CNIH3 CTC1, CTCF CNIH4, CNKSR3 CTDNEP1, CTDP1 CNN2, CNN3 CTDSP1, CTDSP2 CNNM2, CNNM3 CTDSPL, CTDSPL2 CNNM4, CNOT1 CTGF, CTH CNOT10, CNOT11 CTHRC1, CTIF CNOT2, CNOT3 CTLA4, CTNNA1 CNOT4, CNOT6 CTNNB1, CTNNBIP1 CNOT6L, CNOT7 CTNNBL1, CTNND1 CNOT8, CNP CTPS1, CTPS2 CNPY3, CNPY4 CTR9, CTRB1 CNRIP1, CNTD2 CTRB2, CTSA CNTFR, CNTLN CTSB, CTSC CNTNAP1, CNTNAP2 CTSD, CTSE CNTNAP3, CNTNAP3B CTSF, CTSK CNTRL, COA1 CTSL1, CTSW COA4, COA6 CTSZ, CTTN COASY, COBLL1 CTTNBP2NL, CTXN1 COCH, COG1 CUEDC1, CUEDC2 COG2, COG3 CUL1, CUL2 COG4, COG5 CUL3, CUL4A COG8, COIL CUL4B, CUL5 COL10A1, COL11A1 CUL7, CUL9 COL12A1, COL14A1 CUTA, CUX1 COL15A1, COL17A1 CUZD1, CWC15 COL18A1, COL1A1 CWC22, CWC25 COL1A2, COL21A1 CWF19L1, CXADR COL27A1, COL2A1 CXCL1, CXCL12 COL3A1, COL4A1 CXCL16, CXCL2 COL4A2, COL4A3 CXCL3, CXCL5 COL4A4, COL4A5 CXCL9, CXCR1 COL4A6, COL5A1 CXCR2, CXCR4 COL5A2, COL5A3 CXCR7, CXORF27 COL6A1, COL6A2 CXORF30, CXORF31 COL6A5, COL6A6 CXORF38, CXORF40B COL7A1, COL8A2 CXORF48, CXORF56 COL9A2, COL9A3 CXORF57, CXORF58 COLEC10, COLEC12 CXXC1, CXXC4 COLGALT1, COLGALT2 CXXC5, CYB561 COLQ, COMMD10 CYB561A3, CYB561D1 COMMD3, COMMD4 CYB5B, CYB5R1 COMMD6, COMMD7 CYB5R3, CYB5R4 COMMD8, COMMD9 CYBA, CYBRD1 COMT, COPA CYCS, CYFIP1 COPB1, COPB2 CYFIP2, CYGB COPE, COPG1 449

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses CYLD, CYORF17 COPRS, COPS2 CYP19A1, CYP1A1 COPS3, COPS5 CYP1A2, CYP1B1 COPS6, COPS7A CYP20A1, CYP21A1P COPS7B, COPS8 CYP21A2, CYP24A1 COPZ1, COPZ2 CYP26A1, CYP26B1 COQ10B, COQ2 CYP27B1, CYP27C1 COQ3, COQ6 CYP2B6, CYP2E1 COQ9, CORO1A CYP2F1, CYP2J2 CORO1B, CORO1C CYP2R1, CYP2U1 CORO2A, CORO2B CYP39A1, CYP3A5 CORO6, CORO7 CYP4F22, CYP4V2 COTL1, COX1 CYP51A1, CYP7A1 COX10, COX14 CYP7B1, CYP8B1 COX16, COX17 CYR61, CYSLTR1 COX18, COX2 CYSTM1, CYTB COX3, COX4I1 CYTH1, CYTH3 COX5A, COX6B1 CYTH4, CYTIP COX7A2, COX7A2L CYTL1, CYYR1 COX7B, COX7C D2HGDH, DAAM1 COX8A, COX8C DAAM2, DAB2 CPA1, CPA3 DAB2IP, DACH1 CPA4, CPA6 DACT1, DACT2 CPAMD8, CPB2 DAG1, DAGLA CPD, CPE DAGLB, DALRD3 CPEB2, CPEB3 DAP, DAP3 CPEB4, CPED1 DAPK2, DAPK3 CPM, CPN2 DARS2, DAXX CPNE1, CPNE2 DAZAP1, DAZAP2 CPNE3, CPNE4 DAZL, DBF4 CPNE5, CPNE6 DBF4B, DBH CPNE8, CPO DBN1, DBNDD2 CPOX, CPPED1 DBNL, DBT CPS1, CPSF1 DCAF10, DCAF12 CPSF2, CPSF3 DCAF13, DCAF16 CPSF3L, CPSF4 DCAF17, DCAF4L2 CPSF6, CPSF7 DCAF5, DCAF6 CPT1A, CPT1B DCAF7, DCAF8 CPT2, CR1 DCAKD, DCBLD1 CR2, CRABP1 DCBLD2, DCD CRADD, CRAMP1L DCDC2, DCHS2 CRAT, CRB1 DCK, DCLRE1A CRB2, CRB3 DCLRE1B, DCLRE1C CRBN, CREB1 DCP1A, DCP2 CREB3L2, CREB5 DCPS, DCST1 CREBBP, CREBL2 DCST2, DCSTAMP CREBRF, CREBZF DCTD, DCTN1 CREG1, CRELD1 DCTN2, DCTN3 CRELD2, CRH DCTN4, DCTN5 CRHBP, CRHR2 DCTN6, DCTPP1 CRIM1, CRIP2 DCUN1D1, DCUN1D3 CRISP1, CRISPLD2 DCUN1D4, DCUN1D5 CRK, CRKL DCX, DDAH1 CRLF1, CRLF3 DDAH2, DDB1 CRLS1, CRNKL1 DDHD1, DDHD2 CRNN, CROT DDI2, DDIT3 CRTAM, CRTAP DDIT4, DDN CRTC1, CRTC2 DDOST, DDR1 CRTC3, CRY1 DDR2, DDRGK1 CRY2, CRYAA DDT, DDTL CRYBA1, CRYBA2 DDX10, DDX11 CRYBG3, CRYGC DDX17, DDX18 CRYGN, CRYGS DDX20, DDX21 CRYL1, CRYZ DDX23, DDX24 CS, CSAG1 DDX27, DDX28 CSDC2, CSDE1 DDX31, DDX39A CSE1L, CSF1 DDX39B, DDX3X CSF1R, CSGALNACT1 DDX3Y, DDX41 CSGALNACT2, CSHL1 DDX42, DDX46 CSK, CSN3 DDX47, DDX49 CSNK1A1, CSNK1A1L DDX5, DDX50 CSNK1D, CSNK1E 450

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses DDX52, DDX54 CSNK1G1, CSNK1G2 DDX56, DDX6 CSNK1G3, CSNK2A1 DDX60, DEAF1 CSPG4, CSPG5 DEDD, DEDD2 CSPP1, CSRNP2 DEF8, DEFA4 CSRNP3, CSRP1 DEFA5, DEFB104A CSRP2BP, CST1 DEFB104B, DEFB106A CST2, CST3 DEFB106B, DEFB118 CST4, CST5 DEFB123, DEFB125 CST6, CST9L DEFB126, DEFB132 CSTB, CSTF1 DEFB4A, DEFB4B CSTF2T, CSTF3 DEGS1, DEK CTAGE5, CTBP1 DENND1A, DENND2A CTBP2, CTC1 DENND3, DENND4A CTCF, CTDNEP1 DENND4B, DENND5A CTDP1, CTDSP1 DENND6A, DENND6B CTDSP2, CTDSPL DENR, DEPDC1 CTDSPL2, CTGF DEPDC5, DEPDC7 CTH, CTHRC1 DERA, DERL1 CTIF, CTLA4 DERL2, DESI1 CTNNA1, CTNNAL1 DESI2, DEXI CTNNB1, CTNNBIP1 DFFA, DFNA5 CTNNBL1, CTNND1 DFNB31, DGAT1 CTNS, CTPS1 DGCR11, DGCR14 CTPS2, CTR9 DGCR2, DGCR6L CTRB1, CTRB2 DGCR8, DGKA CTSA, CTSB DGKD, DGKH CTSC, CTSD DGKZ, DHCR24 CTSE, CTSF DHCR7, DHFR CTSH, CTSK DHFRL1, DHFRP1 CTSL1, CTSL2 DHH, DHPS CTSW, CTSZ DHRS1, DHRS11 CTTN, CTTNBP2NL DHRS13, DHRS3 CTXN1, CUEDC1 DHRS4, DHRS7B CUEDC2, CUL1 DHTKD1, DHX15 CUL2, CUL3 DHX16, DHX30 CUL4A, CUL4B DHX32, DHX33 CUL5, CUL7 DHX35, DHX36 CUL9, CUTA DHX37, DHX38 CUX1, CUX2 DHX40, DHX57 CUZD1, CWC15 DHX8, DHX9 CWC22, CWC25 DIABLO, DIAPH1 CWC27, CWF19L1 DIAPH2, DICER1 CXADR, CXCL1 DIDO1, DIEXF CXCL12, CXCL13 DIMT1, DIO1 CXCL16, CXCL2 DIP2A, DIP2C CXCL3, CXCL5 DIRAS1, DIS3 CXCL6, CXCL9 DIS3L, DISP1 CXCR1, CXCR2 DIXDC1, DKC1 CXCR4, CXCR6 DKK1, DKK2 CXCR7, CXORF27 DKK4, DLAT CXORF30, CXORF31 DLC1, DLD CXORF38, CXORF40B DLEU2, DLG1 CXORF48, CXORF56 DLG2, DLG3 CXORF57, CXORF58 DLG4, DLG5 CXXC1, CXXC4 DLGAP5, DLK1 CXXC5, CYB561 DLL1, DLST CYB561A3, CYB561D1 DLX1, DLX2 CYB561D2, CYB5A DLX5, DLX6 CYB5B, CYB5D2 DMAP1, DMBX1 CYB5R1, CYB5R3 DMD, DMGDH CYB5R4, CYBA DMKN, DMPK CYBRD1, CYCS DMRT2, DMRT3 CYFIP1, CYFIP2 DMTF1, DMTN CYGB, CYLC2 DMWD, DMXL1 CYLD, CYORF17 DMXL2, DNAAF2 CYP11B2, CYP19A1 DNAAF3, DNAH3 CYP1A1, CYP1A2 DNAH8, DNAJA1 CYP1B1, CYP20A1 DNAJA2, DNAJA3 CYP21A1P, CYP21A2 DNAJA4, DNAJB1 CYP24A1, CYP26A1 DNAJB11, DNAJB12 CYP26B1, CYP27B1 451

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses DNAJB14, DNAJB2 CYP27C1, CYP2B6 DNAJB4, DNAJB5 CYP2C8, CYP2D6 DNAJB6, DNAJB8 CYP2D7P1, CYP2E1 DNAJB9, DNAJC1 CYP2F1, CYP2J2 DNAJC10, DNAJC11 CYP2R1, CYP2U1 DNAJC13, DNAJC15 CYP39A1, CYP3A4 DNAJC16, DNAJC19 CYP3A5, CYP4A11 DNAJC2, DNAJC21 CYP4A22, CYP4F11 DNAJC27, DNAJC3 CYP4F22, CYP4F8 DNAJC30, DNAJC5 CYP4V2, CYP51A1 DNAJC5G, DNAJC6 CYP7A1, CYP7B1 DNAJC7, DNAJC8 CYP8B1, CYR61 DNAJC9, DNAL1 CYSLTR1, CYSTM1 DNAL4, DNALI1 CYTB, CYTH1 DNASE1L1, DNASE1L3 CYTH2, CYTH3 DNASE2, DNASE2B CYTH4, CYTIP DNHD1, DNLZ CYTL1, CYYR1 DNM1L, DNM2 D2HGDH, DAAM1 DNM3, DNMBP DAAM2, DAB2 DNMT1, DNMT3A DAB2IP, DACH1 DNMT3B, DNPEP DACT1, DACT2 DNPH1, DNTTIP1 DAG1, DAGLA DNTTIP2, DOCK1 DAGLB, DAK DOCK10, DOCK11 DALRD3, DAP DOCK3, DOCK4 DAP3, DAPK1 DOCK5, DOCK6 DAPK2, DAPK3 DOCK7, DOCK9 DARS2, DAXX DOHH, DOK2 DAZAP1, DAZAP2 DOK6, DOLK DAZL, DBF4 DOLPP1, DONSON DBF4B, DBH DOPEY2, DOT1L DBN1, DBNDD2 DPAGT1, DPF1 DBNL, DBT DPF2, DPH1 DCAF10, DCAF11 DPH5, DPH7 DCAF12, DCAF13 DPM1, DPP7 DCAF16, DCAF17 DPP8, DPP9 DCAF4, DCAF4L2 DPT, DPY19L1 DCAF5, DCAF6 DPY19L2, DPY19L4 DCAF7, DCAF8 DPY30, DPYSL2 DCAKD, DCBLD1 DPYSL3, DPYSL5 DCBLD2, DCC DRAP1, DRAXIN DCD, DCDC2 DRD5, DRG1 DCHS2, DCK DRG2, DROSHA DCLRE1A, DCLRE1B DSC2, DSC3 DCLRE1C, DCP1A DSCC1, DSCR3 DCP2, DCPS DSCR8, DSE DCST1, DCST2 DSG1, DSG2 DCSTAMP, DCTD DSN1, DSP DCTN1, DCTN2 DST, DSTN DCTN3, DCTN4 DSTYK, DTD1 DCTN5, DCTN6 DTD2, DTL DCTPP1, DCUN1D1 DTNB, DTWD2 DCUN1D2, DCUN1D3 DTX1, DTX2 DCUN1D4, DCUN1D5 DTX3L, DTX4 DCX, DCXR DTYMK, DUOXA1 DDAH1, DDAH2 DUS1L, DUS2L DDB1, DDB2 DUS3L, DUSP10 DDHD1, DDHD2 DUSP11, DUSP12 DDI2, DDIT3 DUSP14, DUSP16 DDIT4, DDN DUSP18, DUSP2 DDOST, DDR1 DUSP22, DUSP23 DDR2, DDRGK1 DUSP3, DUSP5 DDT, DDX1 DUSP6, DUSP7 DDX10, DDX11 DUSP8, DUSP9 DDX17, DDX18 DUT, DVL1 DDX19A, DDX20 DVL2, DVL3 DDX21, DDX23 DYDC1, DYNC1H1 DDX24, DDX25 DYNC1I1, DYNC1LI1 DDX27, DDX28 DYNC1LI2, DYNC2H1 DDX31, DDX39A DYNLL1, DYNLL2 DDX39B, DDX3X DYNLRB2, DYRK1A DDX3Y, DDX41 452

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses DYRK1B, DYRK2 DDX42, DDX46 DZANK1, DZIP1 DDX47, DDX49 DZIP3, E2F1 DDX5, DDX50 E2F2, E2F3 DDX51, DDX52 E2F4, E2F5 DDX54, DDX56 E2F6, E2F7 DDX6, DDX60 E2F8, E4F1 DEAF1, DEC1 EAF1, EAF2 DEDD, DEDD2 EAPP, EARS2 DEF8, DEFA1 EBF1, EBI3 DEFA3, DEFA4 EBNA1BP2, EBP DEFA5, DEFB104A EBPL, ECD DEFB104B, DEFB106A ECE1, ECEL1 DEFB106B, DEFB118 ECH1, ECHDC1 DEFB123, DEFB126 ECHDC2, ECHS1 DEFB132, DEFB4A ECI1, ECI2 DEFB4B, DEGS1 ECM2, ECSIT DEK, DENND1A ECT2, EDA DENND2A, DENND2D EDA2R, EDARADD DENND3, DENND4A EDC3, EDC4 DENND4B, DENND5A EDDM3B, EDEM1 DENND6A, DENND6B EDEM2, EDEM3 DENR, DEPDC1 EDF1, EDIL3 DEPDC1B, DEPDC5 EDN1, EDNRA DEPDC7, DERA EEA1, EED DERL1, DERL2 EEF1A1, EEF1A2 DESI1, DESI2 EEF1B2, EEF1D DET1, DEXI EEF1E1, EEF1G DFFA, DFFB EEF2, EEF2K DFNA5, DFNB31 EEFSEC, EFCAB1 DGAT1, DGCR11 EFCAB11, EFCAB13 DGCR14, DGCR2 EFCAB14, EFCAB5 DGCR6L, DGCR8 EFEMP2, EFHC1 DGKA, DGKD EFHD1, EFHD2 DGKI, DGKZ EFNA1, EFNA2 DHCR24, DHCR7 EFNA3, EFNA5 DHFR, DHFRL1 EFNB1, EFNB2 DHFRP1, DHH EFNB3, EFR3A DHODH, DHPS EFS, EFTUD1 DHRS1, DHRS11 EFTUD2, EGFL7 DHRS13, DHRS3 EGFLAM, EGFR DHRS4, DHRS7B EGLN1, EGLN2 DHRSX, DHTKD1 EGLN3, EGR1 DHX15, DHX16 EGR2, EGR3 DHX30, DHX32 EHBP1, EHBP1L1 DHX33, DHX34 EHD1, EHD2 DHX35, DHX36 EHD3, EHD4 DHX37, DHX38 EHF, EHHADH DHX40, DHX57 EHMT1, EHMT2 DHX8, DHX9 EID1, EIF1 DIABLO, DIAPH1 EIF1AD, EIF1AX DIAPH2, DIAPH3 EIF2A, EIF2AK1 DICER1, DIDO1 EIF2AK2, EIF2AK3 DIEXF, DIMT1 EIF2B1, EIF2B2 DIO1, DIP2A EIF2B3, EIF2B4 DIP2C, DIRAS1 EIF2B5, EIF2C1 DIRC2, DIS3 EIF2C3, EIF2C4 DIS3L, DISC1 EIF2S1, EIF2S2 DISP1, DIXDC1 EIF2S3, EIF3A DKC1, DKK1 EIF3B, EIF3C DKK2, DKK4 EIF3CL, EIF3D DLAT, DLC1 EIF3E, EIF3F DLD, DLEU2 EIF3G, EIF3H DLG1, DLG2 EIF3I, EIF3J DLG3, DLG4 EIF3K, EIF3L DLG5, DLGAP4 EIF3M, EIF4A1 DLGAP5, DLK1 EIF4A2, EIF4A3 DLL1, DLST EIF4B, EIF4E DLX1, DLX2 EIF4E2, EIF4EBP1 DLX4, DLX5 EIF4EBP2, EIF4ENIF1 DLX6, DMAP1 EIF4G1, EIF4G2 DMBX1, DMD

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses EIF4G3, EIF4H DMGDH, DMKN EIF5, EIF5A DMPK, DMRT2 EIF5A2, EIF5AL1 DMRT3, DMTF1 EIF5B, EIF6 DMTN, DMWD ELAC2, ELAVL1 DMXL1, DMXL2 ELAVL2, ELAVL4 DNAAF2, DNAAF3 ELF4, ELK1 DNAH3, DNAH5 ELK3, ELK4 DNAH8, DNAJA1 ELL2, ELMO2 DNAJA2, DNAJA3 ELMOD2, ELMSAN1 DNAJA4, DNAJB1 ELOF1, ELOVL1 DNAJB11, DNAJB12 ELOVL2, ELOVL4 DNAJB14, DNAJB2 ELOVL5, ELOVL6 DNAJB4, DNAJB5 ELOVL7, ELP2 DNAJB6, DNAJB8 ELP3, ELP5 DNAJB9, DNAJC1 EMB, EMC1 DNAJC11, DNAJC12 EMC10, EMC2 DNAJC13, DNAJC14 EMC6, EMC7 DNAJC15, DNAJC16 EMC8, EMCN DNAJC19, DNAJC2 EMD, EME1 DNAJC21, DNAJC25 EME2, EMID1 DNAJC27, DNAJC3 EMILIN3, EML1 DNAJC30, DNAJC5 EML3, EML4 DNAJC5G, DNAJC6 EML5, EMP1 DNAJC7, DNAJC8 EMP3, EMR1 DNAJC9, DNAL1 EMR2, EMR3 DNAL4, DNALI1 EMX2, EN2 DNASE1L1, DNASE1L3 ENAH, ENC1 DNASE2, DNASE2B ENDOD1, ENDOU DND1, DNHD1 ENG, ENHO DNLZ, DNM1L ENKD1, ENO1 DNM2, DNM3 ENO2, ENOPH1 DNMBP, DNMT1 ENOSF1, ENOX2 DNMT3A, DNMT3B ENPP4, ENPP5 DNPEP, DNPH1 ENPP6, ENSA DNTTIP1, DNTTIP2 ENTHD1, ENTPD1 DOCK1, DOCK10 ENTPD3, ENTPD4 DOCK11, DOCK3 ENTPD6, ENTPD7 DOCK4, DOCK5 ENY2, EOGT DOCK6, DOCK7 EOMES, EP300 DOCK9, DOHH EP400, EPAS1 DOK2, DOK3 EPB41, EPB41L1 DOK4, DOK6 EPB41L2, EPB41L3 DOK7, DOLK EPB41L4A, EPB41L4B DONSON, DOPEY2 EPB41L5, EPC1 DOT1L, DPAGT1 EPDR1, EPHA1 DPF1, DPF2 EPHA2, EPHA4 DPF3, DPH1 EPHA5, EPHA7 DPH5, DPH7 EPHB2, EPHB4 DPM1, DPP3 EPHX1, EPM2A DPP7, DPP8 EPM2AIP1, EPN2 DPP9, DPPA4 EPOR, EPPK1 DPT, DPY19L1 EPRS, EPS8 DPY19L2, DPY19L3 EPS8L2, EPSTI1 DPY19L4, DPYD EPT1, ERAL1 DPYSL2, DPYSL3 ERAP1, ERAP2 DPYSL5, DR1 ERBB2, ERBB2IP DRAM1, DRAP1 ERBB3, ERBB4 DRAXIN, DRD3 ERC1, ERCC1 DRD4, DRD5 ERCC2, ERCC3 DRG1, DRG2 ERCC4, ERCC6L DROSHA, DSC1 ERCC6L2, ERCC8 DSC2, DSC3 EREG, ERF DSCAML1, DSCC1 ERGIC1, ERGIC2 DSCR3, DSCR8 ERGIC3, ERH DSE, DSEL ERI1, ERI2 DSG1, DSG2 ERICH1, ERLIN1 DSN1, DSP ERLIN2, ERMN DST, DSTN ERMP1, ERN1 DSTYK, DTD1 ERO1L, ERO1LB DTD2, DTL ERP44, ERRFI1 DTNB, DTWD2 454

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ESAM, ESCO2 DTX1, DTX2 ESD, ESPL1 DTX3, DTX3L ESR1, ESR2 DTX4, DTYMK ESRP1, ESRP2 DUOXA1, DUS1L ESRRA, ESRRB DUS2L, DUS3L ESX1, ESYT1 DUS4L, DUSP1 ESYT2, ETF1 DUSP10, DUSP11 ETFA, ETNK1 DUSP12, DUSP14 ETNK2, ETS1 DUSP16, DUSP18 ETS2, ETV1 DUSP2, DUSP21 ETV3, ETV5 DUSP22, DUSP23 ETV6, ETV7 DUSP3, DUSP5 EVC, EVI2B DUSP6, DUSP7 EVI5, EVL DUSP8, DUSP9 EWSR1, EXD2 DUT, DVL1 EXD3, EXO1 DVL2, DVL3 EXOC2, EXOC3 DYDC1, DYNC1H1 EXOC4, EXOC5 DYNC1I1, DYNC1LI1 EXOC6B, EXOC7 DYNC1LI2, DYNC2H1 EXOC8, EXOG DYNC2LI1, DYNLL1 EXOSC1, EXOSC10 DYNLL2, DYNLRB2 EXOSC2, EXOSC6 DYNLT1, DYRK1A EXPH5, EXT2 DYRK1B, DYRK2 EXTL2, EXTL3 DYRK3, DZANK1 EYA3, EYA4 DZIP1, DZIP3 EZH1, EZH2 E2F1, E2F2 EZR, F10 E2F3, E2F4 F11R, F13B E2F5, E2F6 F2, F2R E2F7, E2F8 F2RL1, F3 E4F1, EAF1 F8A3, FAAH2 EAF2, EAPP FABP1, FABP3 EARS2, EBF1 FABP4, FABP6 EBI3, EBNA1BP2 FABP7, FADD EBP, EBPL FADS1, FADS2 ECD, ECE1 FADS3, FAF1 ECE2, ECEL1 FAF2, FAH ECH1, ECHDC1 FAHD1, FAHD2A ECHDC2, ECHS1 FAIM, FAM101B ECI1, ECI2 FAM102A, FAM102B ECM2, ECSIT FAM103A1, FAM104A ECT2, EDA FAM105A, FAM105B EDA2R, EDARADD FAM109B, FAM110C EDC3, EDC4 FAM111A, FAM111B EDDM3B, EDEM1 FAM114A1, FAM115A EDEM2, EDEM3 FAM115C, FAM117A EDF1, EDIL3 FAM117B, FAM120A EDN1, EDN2 FAM120AOS, FAM120B EDNRA, EEA1 FAM120C, FAM122A EED, EEF1A1 FAM122B, FAM122C EEF1A2, EEF1B2 FAM124B, FAM126A EEF1D, EEF1E1 FAM126B, FAM127A EEF1G, EEF2 FAM127B, FAM129A EEF2K, EEFSEC FAM129B, FAM131A EFCAB11, EFCAB13 FAM131C, FAM132B EFCAB14, EFCAB5 FAM133A, FAM134A EFEMP2, EFHC1 FAM134C, FAM135A EFHD1, EFHD2 FAM136A, FAM13A EFNA1, EFNA2 FAM13B, FAM13C EFNA3, EFNA4 FAM149A, FAM150B EFNA5, EFNB1 FAM151A, FAM155B EFNB2, EFNB3 FAM160A2, FAM160B1 EFS, EFTUD1 FAM160B2, FAM167A EFTUD2, EGFL7 FAM168A, FAM168B EGFLAM, EGFR FAM171A1, FAM171A2 EGLN1, EGLN2 FAM171B, FAM173B EGLN3, EGR1 FAM177A1, FAM178A EGR2, EGR3 FAM179A, FAM187A EHBP1, EHBP1L1 FAM188B, FAM189A1 EHD1, EHD2 FAM189A2, FAM189B EHD3, EHD4 FAM192A, FAM193B EHF, 
EHHADH

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses FAM195A, FAM196A EHMT1, EHMT2 FAM198B, FAM199X EI24, EID1 FAM203A, FAM208A EIF1, EIF1AD FAM208B, FAM209A EIF1AX, EIF1AY FAM20B, FAM20C EIF2A, EIF2AK1 FAM210A, FAM211A EIF2AK2, EIF2AK3 FAM212B, FAM213A EIF2B1, EIF2B2 FAM213B, FAM214A EIF2B3, EIF2B4 FAM214B, FAM216A EIF2B5, EIF2C1 FAM216B, FAM217B EIF2C3, EIF2C4 FAM219B, FAM21B EIF2S1, EIF2S2 FAM21C, FAM221A EIF2S3, EIF3A FAM221B, FAM222A EIF3B, EIF3C FAM222B, FAM227B EIF3CL, EIF3D FAM26E, FAM3A EIF3E, EIF3F FAM3B, FAM3C EIF3G, EIF3H FAM43B, FAM46A EIF3I, EIF3J FAM46C, FAM47B EIF3K, EIF3L FAM47C, FAM47E EIF3M, EIF4A1 FAM49A, FAM49B EIF4A2, EIF4A3 FAM50A, FAM53C EIF4B, EIF4E FAM57A, FAM58A EIF4E2, EIF4EBP1 FAM5B, FAM60A EIF4EBP2, EIF4ENIF1 FAM63B, FAM65A EIF4G1, EIF4G2 FAM65C, FAM69A EIF4G3, EIF4H FAM69B, FAM71E1 EIF5, EIF5A FAM71F2, FAM72A EIF5A2, EIF5AL1 FAM73B, FAM76A EIF5B, EIF6 FAM81A, FAM83B ELAC2, ELAVL1 FAM83D, FAM83G ELAVL2, ELAVL3 FAM83H, FAM84B ELAVL4, ELF1 FAM89A, FAM89B ELF3, ELF4 FAM8A1, FAM91A1 ELK1, ELK3 FAM96A, FAM96B ELK4, ELL FAM98A, FAM98B ELL2, ELMO2 FAM99A, FAM9B ELMOD2, ELMSAN1 FAN1, FANCA ELOF1, ELOVL1 FANCC, FANCD2 ELOVL2, ELOVL4 FANCE, FANCG ELOVL5, ELOVL6 FANCI, FANCM ELOVL7, ELP2 FAR1, FARP1 ELP3, ELP5 FARSA, FARSB EMB, EMC1 FAS, FASLG EMC10, EMC2 FASN, FASTK EMC3, EMC6 FASTKD1, FASTKD2 EMC7, EMC8 FASTKD5, FAT1 EMCN, EMD FAT2, FAT3 EME1, EME2 FAT4, FAU EMID1, EMILIN3 FAXC, FAXDC2 EML1, EML4 FBL, FBLIM1 EML5, EMP1 FBLN1, FBLN2 EMP3, EMR1 FBLN5, FBN1 EMR2, EMR3 FBN2, FBN3 EMX2, EN2 FBP2, FBRS ENAH, ENC1 FBXL15, FBXL16 ENDOD1, ENDOG FBXL17, FBXL18 ENDOU, ENG FBXL19, FBXL2 ENHO, ENKD1 FBXL20, FBXL3 ENO1, ENO2 FBXL5, FBXL8 ENOPH1, ENOSF1 FBXO10, FBXO11 ENOX1, ENOX2 FBXO18, FBXO21 ENPEP, ENPP4 FBXO22, FBXO25 ENPP5, ENPP6 FBXO28, FBXO3 ENSA, ENTHD1 FBXO30, FBXO31 ENTPD1, ENTPD3 FBXO33, FBXO34 ENTPD4, ENTPD6 FBXO36, FBXO38 ENTPD7, ENY2 FBXO4, FBXO41 EOGT, EOMES FBXO42, FBXO44 EP300, EP400 FBXO45, FBXO5 EPAS1, EPB41 FBXO7, FBXO8 EPB41L1, 
EPB41L2 FBXW11, FBXW2 EPB41L3, EPB41L4A

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses FBXW7, FCAR EPB41L5, EPC1 FCGRT, FCHO2 EPDR1, EPG5 FCHSD1, FCRL4 EPGN, EPHA1 FCRLA, FDFT1 EPHA2, EPHA4 FDPS, FDX1 EPHA5, EPHA7 FDXACB1, FDXR EPHB2, EPHB4 FECH, FEM1A EPHX1, EPM2A FEM1B, FEM1C EPM2AIP1, EPN1 FEN1, FER EPN2, EPOR FERMT1, FERMT2 EPPIN, EPPK1 FETUB, FFAR1 EPRS, EPS8 FGA, FGB EPS8L2, EPSTI1 FGD2, FGD3 EPT1, ERAL1 FGD6, FGF1 ERAP1, ERAP2 FGF10, FGF11 ERBB2, ERBB2IP FGF16, FGF17 ERBB3, ERBB4 FGF2, FGF22 ERC1, ERCC1 FGF23, FGF3 ERCC2, ERCC3 FGF4, FGF5 ERCC4, ERCC5 FGF7, FGF8 ERCC6L, ERCC6L2 FGF9, FGFBP3 ERCC8, EREG FGFR1, FGFR1OP ERF, ERG FGFR2, FGFR3 ERGIC1, ERGIC2 FGFRL1, FGG ERGIC3, ERH FGL2, FH ERI1, ERI2 FHDC1, FHIT ERICH1, ERLIN1 FHL1, FHL3 ERLIN2, ERMN FHOD1, FIBCD1 ERMP1, ERN1 FIBP, FICD ERO1L, ERO1LB FIG4, FIGF ERP29, ERP44 FIGN, FIGNL1 ERRFI1, ESAM FILIP1, FILIP1L ESCO2, ESD FIP1L1, FIS1 ESF1, ESPL1 FIZ1, FJX1 ESR1, ESR2 FKBP10, FKBP14 ESRP1, ESRP2 FKBP15, FKBP1A ESRRA, ESRRB FKBP3, FKBP4 ESX1, ESYT1 FKBP5, FKBP7 ESYT2, ETF1 FKBP8, FKBP9 ETFA, ETNK1 FKBPL, FKTN ETNK2, ETNPPL FLAD1, FLCN ETS1, ETS2 FLG2, FLII ETV1, ETV3 FLNA, FLNB ETV4, ETV5 FLNC, FLOT1 ETV6, EVC FLOT2, FLT1 EVI2A, EVI2B FLT3, FLT4 EVI5, EVL FLVCR1, FLYWCH1 EWSR1, EXD2 FMNL2, FMNL3 EXD3, EXO1 FMO6P, FMOD EXOC2, EXOC3 FMR1, FN1 EXOC4, EXOC5 FN3K, FNBP1 EXOC6B, EXOC7 FNBP1L, FNDC1 EXOC8, EXOG FNDC3A, FNDC3B EXOSC1, EXOSC10 FNDC4, FNDC7 EXOSC2, EXOSC6 FNDC9, FNIP1 EXOSC8, EXPH5 FNIP2, FNTA EXT1, EXT2 FOCAD, FOLR1 EXTL2, EXTL3 FOLR2, FOPNL EYA3, EYA4 FOS, FOSL1 EYS, EZH1 FOSL2, FOXA1 EZH2, EZR FOXA2, FOXA3 F10, F11R FOXC1, FOXD1 F13B, F2 FOXD2, FOXD3 F2R, F3 FOXD4L6, FOXF2 F5, F8A3 FOXI1, FOXJ2 FAAH, FAAH2 FOXJ3, FOXK1 FABP1, FABP4 FOXK2, FOXM1 FABP6, FABP7 FOXN2, FOXN3 FADD, FADS1 FOXN4, FOXO1 FADS2, FADS3 FOXO3, FOXO4 FAF1, FAF2 FOXP1, FOXP4 FAH, FAHD1 457

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses FOXQ1, FOXRED1 FAHD2A, FAHD2B FOXRED2, FPGS FAIM, FAM101B FPGT, FPGT-TNNI3K FAM102A, FAM102B FPR1, FRA10AC1 FAM103A1, FAM104A FRAT1, FRAT2 FAM105A, FAM105B FREM1, FREM2 FAM107B, FAM109B FRG1, FRK FAM110A, FAM110B FRMD1, FRMD4B FAM110C, FAM111A FRMD7, FRMD8 FAM111B, FAM114A1 FRMPD4, FRS2 FAM114A2, FAM115A FRY, FRZB FAM115C, FAM117A FSCB, FSCN1 FAM117B, FAM118A FSCN2, FSIP1 FAM120A, FAM120AOS FSTL1, FTCD FAM120B, FAM120C FTH1, FTL FAM122A, FAM122B FTO, FTSJ1 FAM122C, FAM124B FTSJ3, FTSJD2 FAM126A, FAM126B FUBP1, FUBP3 FAM127A, FAM127B FUK, FUNDC2 FAM129A, FAM129B FURIN, FUS FAM131A, FAM131C FUT1, FUT10 FAM132B, FAM133A FUT11, FUT6 FAM134A, FAM134B FUT9, FUZ FAM134C, FAM135A FXN, FXR1 FAM136A, FAM13A FXR2, FXYD2 FAM13B, FAM13C FXYD4, FXYD6 FAM149A, FAM150B FYCO1, FYN FAM151A, FAM155B FYTTD1, FZD1 FAM160A2, FAM160B1 FZD10, FZD3 FAM160B2, FAM167A FZD4, FZD5 FAM168A, FAM168B FZD7, FZD8 FAM171A1, FAM171A2 G2E3, G3BP1 FAM171B, FAM172A G3BP2, G6PC FAM173B, FAM175B G6PC3, G6PD FAM177A1, FAM178A GAB1, GAB2 FAM179A, FAM187A GAB3, GABARAPL1 FAM188A, FAM188B GABARAPL2, GABBR1 FAM189A1, FAM189A2 GABPA, GABPB1 FAM189B, FAM192A GABPB2, GABRA1 FAM193A, FAM193B GABRB3, GABRE FAM195A, FAM196A GABRP, GABRQ FAM198B, FAM199X GADD45A, GADD45G FAM19A2, FAM203A GADD45GIP1, GAGE1 FAM206A, FAM207A GAGE12B, GAGE12D FAM208A, FAM208B GAGE12E, GAGE12H FAM209A, FAM20B GAGE2C, GAGE2D FAM20C, FAM210A GAGE2E, GAK FAM211A, FAM212B GAL3ST4, GALK1 FAM213A, FAM213B GALK2, GALNT1 FAM214A, FAM214B GALNT11, GALNT12 FAM216A, FAM216B GALNT16, GALNT18 FAM217B, FAM219B GALNT2, GALNT3 FAM21B, FAM21C GALNT5, GALNT7 FAM221A, FAM221B GALNTL5, GALT FAM222A, FAM222B GANAB, GAPDH FAM227B, FAM26E GAPDHS, GAPVD1 FAM32A, FAM35A GAREM, GAREML FAM35BP, FAM3A GARNL3, GARS FAM3B, FAM3C GART, GAS1 FAM43B, FAM45A GAS2L2, GAS2L3 FAM46A, FAM46C GAS7, GATA1 FAM47B, FAM47C GATA3, GATA4 FAM47E, FAM49A GATA5, GATA6 FAM49B, FAM50A GATAD1, GATAD2A FAM53C, FAM57A GATAD2B, 
GATC FAM58A, FAM5B GATM, GATS FAM60A, FAM63B GBAS, GBF1 FAM65A, FAM65B GBP2, GBP4 FAM65C, FAM69A GBX2, GC FAM69B, FAM71E1 GCA, GCAT FAM71F2, FAM72A GCC1, GCC2 FAM73B, FAM76A

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses GCFC2, GCH1 FAM81A, FAM83B GCHFR, GCK FAM83D, FAM83G GCLM, GCM1 FAM83H, FAM84B GCN1L1, GCNT1 FAM89A, FAM89B GCNT2, GCNT3 FAM8A1, FAM91A1 GCNT4, GCSAML FAM96A, FAM96B GDAP1, GDAP2 FAM98A, FAM98B GDE1, GDF10 FAM99A, FAM9B GDF11, GDF15 FAN1, FANCA GDF5, GDF7 FANCC, FANCD2 GDF9, GDI1 FANCE, FANCG GDI2, GDNF FANCI, FANCM GDPD1, GDPD5 FAR1, FARP1 GEM, GEMIN4 FARSA, FARSB GEMIN5, GEMIN6 FAS, FASLG GEMIN7, GEN1 FASN, FASTK GFAP, GFER FASTKD1, FASTKD2 GFM1, GFM2 FASTKD5, FAT1 GFOD1, GFPT1 FAT2, FAT3 GFPT2, GFRA1 FAT4, FAU GGA2, GGA3 FAXC, FAXDC2 GGCT, GGPS1 FBL, FBLIM1 GGT5, GGT7 FBLN1, FBLN2 GHITM, GID4 FBLN5, FBN1 GID8, GIGYF1 FBN2, FBN3 GIGYF2, GIMAP2 FBP2, FBRS GIMAP4, GIMAP8 FBRSL1, FBXL15 GINM1, GINS1 FBXL16, FBXL17 GINS2, GINS3 FBXL18, FBXL19 GIP, GIPC1 FBXL2, FBXL20 GIPC2, GIT1 FBXL3, FBXL5 GJA1, GJA3 FBXL6, FBXL8 GJA8, GJB1 FBXO10, FBXO11 GJB2, GJB3 FBXO17, FBXO18 GJB4, GJB5 FBXO21, FBXO22 GJB6, GJC1 FBXO24, FBXO25 GJE1, GK FBXO28, FBXO3 GK5, GLB1 FBXO30, FBXO31 GLB1L3, GLCCI1 FBXO33, FBXO34 GLCE, GLDN FBXO36, FBXO38 GLE1, GLG1 FBXO4, FBXO41 GLI1, GLI2 FBXO42, FBXO44 GLI3, GLIPR1 FBXO45, FBXO5 GLIS2, GLMN FBXO7, FBXO8 GLO1, GLOD4 FBXW11, FBXW2 GLRA2, GLRX FBXW7, FCAR GLRX3, GLRX5 FCF1, FCGR1B GLS, GLS2 FCGRT, FCHO2 GLT8D1, GLT8D2 FCHSD1, FCHSD2 GLTP, GLTPD1 FCN2, FCRL4 GLTPD2, GLTSCR1 FCRLA, FDFT1 GLTSCR1L, GLTSCR2 FDPS, FDX1 GLUD1, GLUD2 FDXACB1, FDXR GLUL, GLYCTK FECH, FEM1A GLYR1, GMEB2 FEM1B, FEM1C GMFB, GMNN FEN1, FER GMPPB, GMPR2 FERMT1, FERMT2 GMPS, GNA12 FES, FETUB GNA13, GNA14 FFAR1, FGA GNAI1, GNAI2 FGB, FGD2 GNAI3, GNAL FGD3, FGD6 GNAQ, GNAS FGF1, FGF10 GNAT2, GNAZ FGF11, FGF16 GNB1, GNB2 FGF17, FGF2 GNB2L1, GNB4 FGF20, FGF22 GNB5, GNE FGF23, FGF3 GNG12, GNG2 FGF4, FGF5 GNG5, GNG5P2 FGF7, FGF8 GNG7, GNG8 FGF9, FGFBP1 GNGT1, GNGT2 FGFBP3, FGFR1 GNL1, GNL2 FGFR1OP, FGFR1OP2 459

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses GNL3L, GNMT FGFR2, FGFR3 GNPAT, GNPDA1 FGFRL1, FGG GNPDA2, GNPNAT1 FGGY, FGL2 GNPTAB, GNRH1 FH, FHDC1 GNRHR, GNS FHIT, FHL1 GOLGA2P5, GOLGA3 FHL2, FHL3 GOLGA4, GOLGA5 FHL5, FHOD1 GOLGA6A, GOLGA6C FIBCD1, FIBP GOLGA6L9, GOLGA7 FICD, FIG4 GOLGA8A, GOLGA8B FIGF, FIGN GOLGB1, GOLIM4 FIGNL1, FILIP1L GOLM1, GOLPH3 FIP1L1, FIS1 GOLPH3L, GOLT1B FITM2, FIZ1 GON4L, GOPC FJX1, FKBP10 GORASP2, GOSR1 FKBP14, FKBP15 GOSR2, GOT1L1 FKBP1A, FKBP1B GOT2, GP5 FKBP2, FKBP3 GP9, GPAA1 FKBP4, FKBP5 GPAM, GPANK1 FKBP7, FKBP8 GPATCH1, GPATCH11 FKBP9, FKBP9L GPATCH2, GPATCH3 FKBPL, FKRP GPATCH4, GPATCH8 FKTN, FLAD1 GPBAR1, GPBP1 FLCN, FLG GPBP1L1, GPC1 FLG2, FLI1 GPC4, GPD1L FLII, FLNA GPD2, GPER FLNB, FLNC GPHB5, GPHN FLOT1, FLOT2 GPI, GPIHBP1 FLT1, FLT3 GPLD1, GPM6A FLT4, FLVCR1 GPM6B, GPN1 FLYWCH1, FMNL2 GPN2, GPN3 FMNL3, FMO1 GPR107, GPR108 FMO3, FMO6P GPR112, GPR115 FMOD, FMR1 GPR116, GPR126 FN1, FN3K GPR137B, GPR137C FN3KRP, FNBP1 GPR143, GPR153 FNBP1L, FNDC1 GPR157, GPR162 FNDC3A, FNDC3B GPR17, GPR180 FNDC4, FNDC7 GPR19, GPR22 FNDC9, FNIP1 GPR34, GPR37 FNIP2, FNTA GPR37L1, GPR50 FNTB, FOCAD GPR55, GPR56 FOLH1, FOLR2 GPR63, GPR64 FOPNL, FOS GPR78, GPR82 FOSL1, FOSL2 GPR83, GPR87 FOXA1, FOXA2 GPR89A, GPR98 FOXA3, FOXC1 GPRC5A, GPRC5C FOXD1, FOXD2 GPRIN1, GPRIN2 FOXD3, FOXD4L6 GPRIN3, GPS1 FOXE1, FOXF2 GPSM2, GPX1 FOXG1, FOXI1 GPX1P1, GPX2 FOXJ2, FOXJ3 GPX3, GPX7 FOXK1, FOXK2 GPX8, GRAMD1A FOXM1, FOXN2 GRAMD4, GRASP FOXN3, FOXN4 GRB10, GRB2 FOXO1, FOXO3 GRB7, GREB1 FOXO4, FOXP1 GREB1L, GRHL1 FOXP4, FOXQ1 GRHL3, GRHPR FOXRED1, FOXRED2 GRIA1, GRIA2 FPGS, FPGT GRIA4, GRID2IP FPGT-TNNI3K, FPR1 GRIK1-AS1, GRIK4 FRA10AC1, FRAS1 GRIK5, GRIN2A FRAT1, FRAT2 GRIN2D, GRINA FREM1, FREM2 GRIP2, GRIPAP1 FRG1, FRK GRK6, GRM3 FRMD1, FRMD3 GRM5, GRM7 FRMD4A, FRMD4B GRN, GRPEL1 FRMD6, FRMD8 GRPEL2, GRPR FRMPD4, FRS2 GRSF1, GRTP1 FRS3, FRY GRWD1, GSDMB FRZB, FSCB GSE1, GSG1 FSCN1, FSCN2 460

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses GSG1L, GSG2 FSIP1, FSTL1 GSK3A, GSK3B FSTL3, FTCD GSN, GSPT1 FTH1, FTL GSR, GSS FTO, FTSJ1 GSTA4, GSTCD FTSJ3, FTSJD2 GSTM2, GSTM3 FUBP1, FUBP3 GSTM4, GSTO1 FUK, FUNDC2 GSTP1, GSTT2B FURIN, FUS GTF2A1, GTF2B FUT1, FUT10 GTF2E1, GTF2E2 FUT11, FUT4 GTF2H1, GTF2H2 FUT6, FUT8 GTF2H2C, GTF2H3 FUT9, FUZ GTF2H4, GTF2I FXN, FXR1 GTF3C1, GTF3C2 FXR2, FXYD2 GTF3C3, GTF3C4 FXYD4, FXYD6 GTF3C5, GTF3C6 FYCO1, FYN GTPBP1, GTPBP3 FYTTD1, FZD1 GTPBP4, GTPBP8 FZD10, FZD3 GTSE1, GTSF1L FZD4, FZD5 GUCA2B, GUCD1 FZD6, FZD7 GUCY1A2, GUCY1B3 FZD8, FZR1 GULP1, GUSB G2E3, G3BP1 GXYLT1, GXYLT2 G3BP2, G6PC GYG1, GYLTL1B G6PC3, G6PD GYS1, GYS2 GAB1, GAB2 GZF1, GZMK GAB3, GABARAP H1F0, H1FX GABARAPL1, GABARAPL2 H1FX-AS1, H2AFV GABBR1, GABBR2 H2AFX, H2AFY GABPA, GABPB1 H2AFZ, H3F3A GABPB2, GABRA1 H3F3AP6, H3F3B GABRA3, GABRB3 H3F3C, H6PD GABRE, GABRG3 HABP4, HACE1 GABRP, GABRQ HACL1, HADH GADD45A, GADD45G HADHA, HADHB GADD45GIP1, GAGE1 HAGH, HAND1 GAGE12B, GAGE12D HAND2, HAPLN1 GAGE12E, GAGE12H HAPLN2, HARS GAGE2C, GAGE2D HARS2, HAS2 GAGE2E, GAK HAS3, HAUS1 GAL3ST1, GAL3ST4 HAUS2, HAUS3 GALK1, GALK2 HAUS6, HAUS8 GALNT1, GALNT11 HAVCR1, HAX1 GALNT12, GALNT16 HBB, HBE1 GALNT18, GALNT2 HBEGF, HBP1 GALNT3, GALNT5 HBS1L, HCAR1 GALNT7, GALNTL5 HCCS, HCFC1 GALR2, GALR3 HCLS1, HCN2 GALT, GANAB HCN3, HCN4 GAPDH, GAPDHS HCRTR1, HCST GAPVD1, GAR1 HDAC1, HDAC10 GAREM, GAREML HDAC2, HDAC4 GARNL3, GARS HDAC5, HDAC6 GART, GAS1 HDAC7, HDDC2 GAS2, GAS2L1 HDGF, HDHD1 GAS2L2, GAS2L3 HDHD2, HDHD3 GAS6, GAS7 HDLBP, HEATR1 GATA1, GATA2 HEATR2, HEATR3 GATA3, GATA4 HEATR5B, HEATR6 GATA5, GATA6 HECTD1, HECTD2 GATAD1, GATAD2A HECTD3, HECTD4 GATAD2B, GATC HEG1, HELLS GATM, GATS HELQ, HELZ GBA, GBAS HEMGN, HERC1 GBF1, GBP1 HERC2, HERC3 GBP2, GBP3 HERC4, HERC5 GBP4, GBX2 HERC6, HERPUD1 GC, GCA HERPUD2, HES1 GCAT, GCC1 HES2, HES4 GCC2, GCFC2 HES6, HESX1 GCH1, GCHFR 461

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses HEXIM1, HEY1 GCK, GCLC HEY2, HEYL GCLM, GCM1 HFM1, HGD GCN1L1, GCNT1 HGFAC, HGS GCNT2, GCNT3 HHAT, HHATL GCNT4, GCSAM HHEX, HIATL1 GCSAML, GCSH HIBADH, HIC1 GDA, GDAP1 HIC2, HID1 GDAP2, GDE1 HIF1A, HIF1AN GDF10, GDF11 HIF3A, HIGD1A GDF15, GDF5 HINT2, HINT3 GDF7, GDF9 HIP1R, HIPK1 GDI1, GDI2 HIPK2, HIPK3 GDNF, GDPD1 HIRA, HIRIP3 GDPD5, GEM HIST1H1A, HIST1H1B GEMIN4, GEMIN5 HIST1H1C, HIST1H1D GEMIN6, GEMIN7 HIST1H1E, HIST1H2AA GEN1, GFAP HIST1H2AB, HIST1H2AC GFER, GFM1 HIST1H2AD, HIST1H2AE GFM2, GFOD1 HIST1H2AG, HIST1H2AH GFOD2, GFPT1 HIST1H2AI, HIST1H2AJ GFPT2, GFRA1 HIST1H2AK, HIST1H2AL GGA1, GGA2 HIST1H2AM, HIST1H2BA GGA3, GGCT HIST1H2BB, HIST1H2BC GGH, GGPS1 HIST1H2BD, HIST1H2BE GGT5, GGT7 HIST1H2BF, HIST1H2BG GHITM, GHSR HIST1H2BH, HIST1H2BI GID4, GID8 HIST1H2BJ, HIST1H2BK GIGYF1, GIGYF2 HIST1H2BL, HIST1H2BO GIMAP2, GIMAP4 HIST1H3A, HIST1H3B GIMAP8, GINM1 HIST1H3C, HIST1H3D GINS1, GINS2 HIST1H3E, HIST1H3F GINS3, GINS4 HIST1H3G, HIST1H3H GIP, GIPC1 HIST1H3I, HIST1H3J GIPC2, GIT1 HIST1H4A, HIST1H4B GIT2, GJA1 HIST1H4C, HIST1H4D GJA5, GJA8 HIST1H4E, HIST1H4F GJB1, GJB2 HIST1H4H, HIST1H4I GJB4, GJB5 HIST1H4J, HIST1H4K GJB6, GJC1 HIST1H4L, HIST2H2AA3 GJE1, GK HIST2H2AA4, HIST2H2AB GK5, GKN1 HIST2H2AC, HIST2H2BC GLA, GLB1 HIST2H2BE, HIST2H2BF GLB1L3, GLCCI1 HIST2H3A, HIST2H3C GLCE, GLDC HIST2H3D, HIST2H4A GLDN, GLE1 HIST2H4B, HIST3H2A GLG1, GLI1 HIST3H2BB, HIST3H3 GLI2, GLI3 HIST4H4, HIVEP1 GLIPR1, GLIPR2 HIVEP2, HIVEP3 GLIS2, GLMN HJURP, HK1 GLO1, GLOD4 HK2, HKR1 GLRA2, GLRX HLA-A, HLA-C GLRX2, GLRX3 HLA-DMA, HLA-DRB5 GLRX5, GLS HLA-E, HLA-G GLS2, GLT8D1 HM13, HMBOX1 GLT8D2, GLTP HMBS, HMCN1 GLTPD1, GLTPD2 HMG20A, HMGA1 GLTSCR1, GLTSCR1L HMGA2, HMGB1 GLTSCR2, GLUD1 HMGB2, HMGB3 GLUD2, GLUL HMGCL, HMGCLL1 GLYCTK, GLYR1 HMGCR, HMGCS1 GMCL1, GMCL1P1 HMGN1, HMGN2 GMEB2, GMFB HMGN4, HMGXB3 GMNN, GMPPA HMGXB4, HMHA1 GMPPB, GMPR2 HMMR, HMOX1 GMPS, GNA12 HMOX2, HMP19 GNA13, GNA14 HN1, HN1L GNAI1, GNAI2 HNF1B, HNF4A 
GNAI3, GNAL HNRNPA0, HNRNPA1 GNAQ, GNAS HNRNPA1L2, HNRNPA2B1 GNAT2, GNAZ HNRNPA3, HNRNPAB GNB1, GNB2

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses HNRNPC, HNRNPD GNB2L1, GNB3 HNRNPDL, HNRNPF GNB4, GNB5 HNRNPH1, HNRNPH2 GNE, GNG10 HNRNPH3, HNRNPK GNG12, GNG2 HNRNPL, HNRNPLL GNG5, GNG5P2 HNRNPM, HNRNPR GNG7, GNG8 HNRNPU, HNRNPUL1 GNGT1, GNGT2 HNRNPUL2, HNRPDL GNL1, GNL2 HOGA1, HOMER1 GNL3, GNL3L HOOK1, HOOK3 GNMT, GNPAT HORMAD2, HOXA1 GNPDA1, GNPDA2 HOXA10, HOXA11 GNPNAT1, GNPTAB HOXA13, HOXA2 GNRH1, GNRHR HOXA3, HOXA5 GNS, GOLGA2P5 HOXA6, HOXA9 GOLGA3, GOLGA4 HOXB1, HOXB2 GOLGA5, GOLGA6A HOXB3, HOXB4 GOLGA6C, GOLGA6L9 HOXB5, HOXB8 GOLGA7, GOLGA8A HOXB9, HOXC10 GOLGA8B, GOLGB1 HOXC11, HOXC12 GOLIM4, GOLM1 HOXC4, HOXC6 GOLPH3, GOLPH3L HOXC8, HOXD10 GOLT1B, GON4L HOXD11, HOXD13 GOPC, GORASP2 HOXD3, HP1BP3 GOSR1, GOSR2 HPCAL1, HPDL GOT1, GOT1L1 HPN, HPRT1 GOT2, GP1BB HPS1, HPS3 GP5, GP9 HPS4, HPS5 GPAA1, GPAM HPS6, HRAS GPANK1, GPATCH1 HRASLS5, HRC GPATCH11, GPATCH2 HRCT1, HRH1 GPATCH3, GPATCH4 HRH2, HRH4 GPATCH8, GPBAR1 HRSP12, HS2ST1 GPBP1, GPBP1L1 HS3ST3A1, HS3ST3B1 GPC1, GPC4 HS6ST2, HS6ST3 GPC6, GPCPD1 HSBP1P2, HSD11B1 GPD1, GPD1L HSD17B10, HSD17B11 GPD2, GPER HSD17B12, HSD17B3 GPHB5, GPHN HSD17B4, HSD17B7 GPI, GPIHBP1 HSD17B7P2, HSD17B8 GPLD1, GPM6A HSD3B2, HSD3B7 GPM6B, GPN1 HSDL1, HSDL2 GPN2, GPN3 HSF2, HSFY2 GPNMB, GPR1 HSH2D, HSP90AA1 GPR107, GPR108 HSP90AB1, HSP90B1 GPR112, GPR115 HSP90B3P, HSPA12A GPR116, GPR123 HSPA13, HSPA14 GPR126, GPR135 HSPA1A, HSPA1B GPR137B, GPR137C HSPA1L, HSPA2 GPR143, GPR153 HSPA4, HSPA4L GPR157, GPR161 HSPA5, HSPA8 GPR162, GPR17 HSPA9, HSPB2 GPR180, GPR183 HSPB3, HSPB7 GPR19, GPR22 HSPB8, HSPBAP1 GPR27, GPR34 HSPBP1, HSPD1 GPR37, GPR37L1 HSPG2, HSPH1 GPR39, GPR50 HTATSF1, HTN3 GPR55, GPR56 HTR1A, HTR1D GPR63, GPR64 HTR2C, HTR3E GPR78, GPR82 HTR7, HTRA1 GPR83, GPR87 HTRA3, HTRA4 GPR89A, GPR89B HTT, HUS1 GPR98, GPRASP2 HUWE1, HYAL1 GPRC5A, GPRC5C HYAL3, HYI GPRIN1, GPRIN2 HYLS1, HYOU1 GPRIN3, GPS1 HYPK, IAH1 GPSM2, GPT2 IAPP, IARS GPX1, GPX2 IARS2, IBA57 GPX3, GPX4 IBTK, ICAM1 GPX7, GPX8 ICAM3, ICAM4 GRAMD1A, GRAMD3 
ICK, ICMT GRASP, GRB10

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ICOSLG, ID1 GRB14, GRB2 ID2, ID3 GRB7, GREB1 ID4, IDE GREB1L, GREM1 IDH1, IDH3A GREM2, GRHL1 IDI1, IDS GRHL2, GRHL3 IDUA, IER2 GRHPR, GRIA1 IER3, IER3IP1 GRIA2, GRIA4 IFI16, IFI27 GRID2IP, GRIK1-AS1 IFI30, IFI44 GRIK2, GRIK4 IFI44L, IFIH1 GRIK5, GRIN2A IFIT1, IFIT2 GRIN2D, GRINA IFIT3, IFIT5 GRIP2, GRIPAP1 IFITM1, IFITM3 GRK6, GRM1 IFNA21, IFNAR1 GRM3, GRM5 IFNAR2, IFNB1 GRM7, GRM8 IFNE, IFNG GRN, GRPEL1 IFNGR1, IFNGR2 GRPEL2, GRPR IFNLR1, IFRD1 GRSF1, GRTP1 IFRD2, IFT122 GRWD1, GSDMB IFT140, IFT172 GSE1, GSG1 IFT52, IFT74 GSG1L, GSG2 IFT81, IGDCC3 GSK3A, GSK3B IGDCC4, IGF1R GSN, GSPT1 IGF2, IGF2-AS GSPT2, GSR IGF2BP1, IGF2BP2 GSS, GSTA1 IGF2BP3, IGF2R GSTA4, GSTCD IGFBP1, IGFBP2 GSTK1, GSTM2 IGFBP4, IGFBP5 GSTM3, GSTM4 IGFBP7, IGFBPL1 GSTO1, GSTP1 IGHMBP2, IGHV1-69 GSTT2, GSTT2B IGIP, IGLC1 GTDC2, GTF2A1 IGLC2, IGLC3 GTF2A2, GTF2B IGLC7, IGLON5 GTF2E1, GTF2E2 IGLV1-36, IGLV1-44 GTF2F1, GTF2H1 IGLV1-47, IGLV3-10 GTF2H2, GTF2H2C IGSF1, IGSF10 GTF2H3, GTF2H4 IGSF21, IGSF3 GTF2I, GTF2IRD2 IGSF8, IGSF9B GTF2IRD2B, GTF3C1 IK, IKBIP GTF3C2, GTF3C3 IKBKAP, IKBKB GTF3C4, GTF3C5 IKBKE, IKBKG GTF3C6, GTPBP1 IKZF2, IKZF4 GTPBP2, GTPBP3 IKZF5, IL10 GTPBP4, GTPBP8 IL11, IL12RB2 GTSE1, GTSF1L IL13, IL13RA2 GUCA2B, GUCD1 IL15, IL17A GUCY1A2, GUCY1B3 IL17RA, IL17RB GUF1, GUK1 IL17RC, IL17RD GULP1, GUSB IL18BP, IL1A GXYLT1, GXYLT2 IL1B, IL1R1 GYG1, GYG2 IL1RAP, IL2 GYLTL1B, GYS1 IL21R, IL22 GYS2, GZF1 IL24, IL25 GZMK, H1F0 IL27, IL27RA H1FX, H1FX-AS1 IL31RA, IL32 H2AFV, H2AFX IL33, IL36RN H2AFY, H2AFY2 IL37, IL4 H2AFZ, H3F3A IL4R, IL5 H3F3AP6, H3F3B IL6, IL6R H6PD, HABP4 IL6ST, IL7 HACE1, HACL1 IL8, ILDR1 HADH, HADHA ILF2, ILF3 HADHB, HAGH ILK, ILKAP HAL, HAND1 ILVBL, IMMT HAPLN1, HAPLN2 IMP3, IMP4 HARS, HARS2 IMPA1, IMPA2 HAS2, HAS3 IMPAD1, IMPDH1 HAUS1, HAUS2 IMPDH2, INA HAUS3, HAUS6 INADL, INCENP HAUS8, HAVCR1 INF2, ING4 HAX1, HBB INHBA, INHBB HBE1, HBEGF 464

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses INHBE, INIP HBP1, HBQ1 INO80B, INO80D HBS1L, HCAR1 INPP4A, INPP5A HCCS, HCFC1 INPP5D, INPP5F HCLS1, HCN2 INPPL1, INSC HCN3, HCN4 INSIG1, INSIG2 HCRTR1, HCST INSL5, INSM2 HDAC1, HDAC10 INTS1, INTS10 HDAC2, HDAC4 INTS12, INTS2 HDAC5, HDAC6 INTS3, INTS4 HDDC2, HDGF INTS5, INTS6 HDHD1, HDHD2 INTS7, INTS8 HDHD3, HDLBP IP6K1, IP6K2 HEATR1, HEATR2 IP6K3, IPCEF1 HEATR3, HEATR5A IPMK, IPO11 HEATR5B, HEATR6 IPO13, IPO4 HEBP2, HECA IPO5, IPO7 HECTD1, HECTD2 IPO8, IPO9 HECTD3, HECTD4 IPP, IPPK HEG1, HELLS IQCA1, IQCB1 HELQ, HELZ IQCC, IQCD HEMGN, HERC1 IQCE, IQCG HERC2, HERC3 IQGAP1, IQGAP3 HERC4, HERC6 IQSEC1, IRAK1 HERPUD1, HERPUD2 IRAK2, IRAK3 HES1, HES2 IRAK4, IREB2 HES4, HES6 IRF1, IRF2 HESX1, HEXA IRF2BP1, IRF2BP2 HEXB, HEXIM1 IRF2BPL, IRF4 HEY1, HEY2 IRF5, IRF6 HEYL, HFM1 IRF7, IRF9 HGD, HGFAC IRGQ, IRS1 HGS, HHAT IRS2, IRS4 HHATL, HHEX IRX2, IRX4 HHLA3, HIAT1 ISCA1, ISCU HIATL1, HIBADH ISG15, ISG20 HIC1, HIC2 ISG20L2, ISL1 HID1, HIF1A ISM2, ISOC1 HIF1AN, HIGD1A ISOC2, IST1 HINT1, HINT3 ISYNA1, ITCH HIP1R, HIPK1 ITFG1, ITGA1 HIPK2, HIPK3 ITGA10, ITGA11 HIRA, HIRIP3 ITGA2, ITGA3 HIST1H1A, HIST1H1B ITGA4, ITGA5 HIST1H1C, HIST1H1D ITGA6, ITGA9 HIST1H1E, HIST1H2AA ITGAV, ITGB1 HIST1H2AB, HIST1H2AC ITGB2, ITGB3 HIST1H2AD, HIST1H2AE ITGB4, ITGB5 HIST1H2AG, HIST1H2AH ITGB6, ITGB8 HIST1H2AI, HIST1H2AJ ITGBL1, ITIH1 HIST1H2AK, HIST1H2AL ITIH5, ITM2B HIST1H2AM, HIST1H2BA ITM2C, ITPA HIST1H2BB, HIST1H2BC ITPKA, ITPKB HIST1H2BD, HIST1H2BE ITPKC, ITPR1 HIST1H2BF, HIST1H2BG ITPR2, ITPR3 HIST1H2BH, HIST1H2BI ITPRIP, ITPRIPL2 HIST1H2BJ, HIST1H2BK ITSN1, ITSN2 HIST1H2BL, HIST1H2BO IVD, IVL HIST1H3A, HIST1H3B IVNS1ABP, IWS1 HIST1H3C, HIST1H3D IYD, IZUMO2 HIST1H3E, HIST1H3F JAG1, JAG2 HIST1H3G, HIST1H3H JAK1, JAK2 HIST1H3I, HIST1H3J JAK3, JAKMIP1 HIST1H4A, HIST1H4B JAKMIP2, JAKMIP3 HIST1H4C, HIST1H4D JARID2, JAZF1 HIST1H4E, HIST1H4F JHDM1D, JMJD1C HIST1H4H, HIST1H4I JMJD4, JMJD6 HIST1H4J, HIST1H4K JMJD8, JMY HIST1H4L, HIST2H2AA3 
JOSD1, JPH1 HIST2H2AA4, HIST2H2AB JPH4, JTB HIST2H2AC, HIST2H2BC JUN, JUNB HIST2H2BE, HIST2H2BF

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses JUND, JUP HIST2H3A, HIST2H3C KAAG1, KALRN HIST2H3D, HIST2H4A KANK1, KANK2 HIST2H4B, HIST3H2A KANSL1, KANSL2 HIST3H2BB, HIST3H3 KANSL3, KARS HIST4H4, HIVEP1 KAT2A, KAT2B HIVEP2, HIVEP3 KAT5, KAT6A HJURP, HK1 KAT6B, KAT7 HK2, HKDC1 KATNAL1, KATNB1 HKR1, HLA-A KATNBL1, KBTBD11 HLA-C, HLA-DMA KBTBD2, KBTBD3 HLA-DQB2, HLA-DRB5 KBTBD4, KBTBD6 HLA-E, HLA-G KBTBD7, KBTBD8 HLTF, HM13 KCNA4, KCNC1 HMBOX1, HMBS KCNC4, KCND1 HMCN1, HMG20A KCND3, KCNE1 HMGA1, HMGA2 KCNE1L, KCNG1 HMGB1, HMGB2 KCNG3, KCNH1 HMGB3, HMGCL KCNH2, KCNH3 HMGCLL1, HMGCR KCNH4, KCNH8 HMGCS1, HMGN1 KCNJ14, KCNJ15 HMGN2, HMGN4 KCNJ16, KCNJ2 HMGN5, HMGXB3 KCNJ6, KCNJ9 HMGXB4, HMHA1 KCNK1, KCNK13 HMMR, HMOX1 KCNK4, KCNK5 HMOX2, HMP19 KCNK7, KCNK9 HN1, HN1L KCNMB4, KCNN4 HNF1A, HNF1B KCNQ2, KCNS2 HNF4A, HNRNPA0 KCNS3, KCNT2 HNRNPA1, HNRNPA1L2 KCTD1, KCTD10 HNRNPA2B1, HNRNPA3 KCTD11, KCTD12 HNRNPA3P1, HNRNPAB KCTD14, KCTD15 HNRNPC, HNRNPCL1 KCTD17, KCTD2 HNRNPD, HNRNPDL KCTD20, KCTD3 HNRNPF, HNRNPH1 KCTD4, KCTD5 HNRNPH2, HNRNPH3 KCTD7, KDELC2 HNRNPK, HNRNPL KDELR1, KDELR2 HNRNPLL, HNRNPM KDM1A, KDM1B HNRNPR, HNRNPU KDM2A, KDM2B HNRNPUL1, HNRNPUL2 KDM3A, KDM3B HNRPDL, HOGA1 KDM4A, KDM4B HOMER1, HOOK1 KDM4C, KDM4D HOOK3, HORMAD2 KDM5A, KDM5B HOXA1, HOXA10 KDM5C, KDM5D HOXA11, HOXA13 KDM6B, KDM8 HOXA2, HOXA3 KDR, KDSR HOXA5, HOXA6 KEAP1, KERA HOXA7, HOXA9 KHDC3L, KHDRBS1 HOXB1, HOXB2 KHDRBS2, KHDRBS3 HOXB3, HOXB4 KHNYN, KHSRP HOXB5, HOXB7 KIAA0020, KIAA0060 HOXB8, HOXB9 KIAA0061, KIAA0100 HOXC10, HOXC11 KIAA0101, KIAA0141 HOXC12, HOXC4 KIAA0195, KIAA0196 HOXC6, HOXC8 KIAA0226L, KIAA0232 HOXC9, HOXD10 KIAA0247, KIAA0319L HOXD11, HOXD13 KIAA0355, KIAA0368 HOXD3, HOXD8 KIAA0430, KIAA0432 HP1BP3, HPCAL1 KIAA0586, KIAA0825 HPCAL4, HPDL KIAA0895, KIAA0895L HPN, HPRT1 KIAA0899, KIAA0907 HPS1, HPS4 KIAA0922, KIAA1033 HPS5, HPS6 KIAA1109, KIAA1143 HRAS, HRASLS5 KIAA1147, KIAA1161 HRC, HRCT1 KIAA1191, KIAA1217 HRH1, HRH2 KIAA1244, KIAA1279 HRH4, HRSP12 KIAA1307, 
KIAA1324L HS1BP3, HS2ST1 KIAA1328, KIAA1377 HS3ST3A1, HS3ST3B1 KIAA1407, KIAA1429 HS6ST2, HS6ST3 KIAA1432, KIAA1462 HSBP1P2, HSD11B1 KIAA1468, KIAA1491 HSD17B10, HSD17B11

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses KIAA1522, KIAA1524 HSD17B12, HSD17B14 KIAA1549, KIAA1549L HSD17B2, HSD17B3 KIAA1551, KIAA1586 HSD17B4, HSD17B6 KIAA1598, KIAA1644 HSD17B7, HSD17B7P2 KIAA1671, KIAA1715 HSD17B8, HSD3B1 KIAA1731, KIAA1804 HSD3B2, HSD3B7 KIAA1841, KIAA1919 HSDL1, HSDL2 KIAA1958, KIAA1967 HSF1, HSF2 KIAA2013, KIAA2026 HSF4, HSFY2 KIDINS220, KIF11 HSH2D, HSP90AA1 KIF13B, KIF14 HSP90AB1, HSP90B1 KIF15, KIF16B HSP90B3P, HSPA12A KIF17, KIF18A HSPA13, HSPA14 KIF18B, KIF1A HSPA1A, HSPA1B KIF1B, KIF1C HSPA1L, HSPA2 KIF20A, KIF20B HSPA4, HSPA4L KIF21A, KIF22 HSPA5, HSPA8 KIF23, KIF24 HSPA9, HSPB11 KIF27, KIF2A HSPB2, HSPB3 KIF2C, KIF3B HSPB6, HSPB7 KIF3C, KIF4A HSPB8, HSPBAP1 KIF4B, KIF5A HSPBP1, HSPD1 KIF5B, KIF5C HSPG2, HSPH1 KIF6, KIFAP3 HTATIP2, HTATSF1 KIFC1, KIFC2 HTN3, HTR1A KIN, KIR3DL1 HTR1B, HTR1D KIR3DL2, KIRREL HTR2C, HTR3A KIT, KITLG HTR3E, HTR7 KL, KLB HTRA1, HTRA2 KLC1, KLC2 HTRA3, HTRA4 KLC4, KLF10 HTT, HUS1 KLF11, KLF12 HUWE1, HYAL1 KLF13, KLF14 HYAL3, HYLS1 KLF2, KLF3 HYOU1, HYPK KLF4, KLF5 IAH1, IAPP KLF6, KLF7 IARS, IARS2 KLF8, KLF9 IBA57, IBSP KLHDC1, KLHDC10 IBTK, ICAM1 KLHDC2, KLHDC3 ICAM3, ICAM4 KLHDC4, KLHDC5 ICAM5, ICK KLHDC7B, KLHL1 ICMT, ICOSLG KLHL11, KLHL12 ICT1, ID1 KLHL13, KLHL15 ID2, ID3 KLHL17, KLHL2 ID4, IDE KLHL20, KLHL21 IDH1, IDH2 KLHL22, KLHL23 IDH3A, IDH3B KLHL24, KLHL28 IDH3G, IDI1 KLHL3, KLHL31 IDS, IDUA KLHL32, KLHL34 IER2, IER3 KLHL35, KLHL36 IER3IP1, IER5 KLHL4, KLHL42 IFFO2, IFI16 KLHL5, KLHL6 IFI27, IFI44 KLHL7, KLHL8 IFI44L, IFIH1 KLHL9, KLK10 IFIT1, IFIT2 KLK11, KLK12 IFIT3, IFIT5 KLK3, KLK6 IFITM1, IFITM3 KLK7, KLRB1 IFNA21, IFNAR1 KLRC1, KLRC2 IFNAR2, IFNB1 KLRC3, KLRC4 IFNE, IFNG KLRF1, KLRG1 IFNGR1, IFNGR2 KLRK1, KMT2A IFNLR1, IFRD1 KMT2C, KMT2D IFRD2, IFT122 KMT2E, KNCN IFT140, IFT172 KNG1, KNSTRN IFT74, IFT81 KNTC1, KPNA1 IGDCC3, IGDCC4 KPNA2, KPNA3 IGF1, IGF1R KPNA4, KPNA5 IGF2, IGF2-AS KPNA6, KPNB1 IGF2BP1, IGF2BP2 KPTN, KRAS IGF2BP3, IGF2R KRBOX4, KRCC1 IGFBP1, IGFBP2 KREMEN1, 
KREMEN2 IGFBP3, IGFBP4

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses KRI1, KRT15 IGFBP5, IGFBP7 KRT18P8, KRT19 IGFBPL1, IGFLR1 KRT3, KRT32 IGFN1, IGHMBP2 KRT38, KRT4 IGHV1-69, IGIP KRT5, KRT6B IGJ, IGLC1 KRT6C, KRT7 IGLC2, IGLC3 KRT71, KRT72 IGLC7, IGLON5 KRT75, KRT77 IGLV1-36, IGLV1-44 KRT80, KRT85 IGLV1-47, IGLV3-10 KRTAP10-12, KRTAP10-2 IGSF1, IGSF10 KRTAP10-4, KRTAP10-6 IGSF21, IGSF3 KRTAP11-1, KRTAP12-2 IGSF6, IGSF8 KRTAP19-1, KRTAP3-2 IGSF9B, IK KRTAP3-3, KRTAP4-11 IKBIP, IKBKAP KRTAP4-2, KRTAP4-4 IKBKB, IKBKE KRTAP4-5, KRTAP4-6 IKBKG, IKZF1 KRTAP4-7, KRTAP5-9 IKZF2, IKZF4 KRTAP9-8, KSR2 IKZF5, IL10 KTN1, KXD1 IL11, IL12A L1CAM, L2HGDH IL12RB2, IL13 L3MBTL2, L3MBTL4 IL13RA1, IL13RA2 LACC1, LACE1 IL15, IL17A LACRT, LACTB IL17RA, IL17RB LACTB2, LAIR2 IL17RC, IL17RD LAMA3, LAMA4 IL18BP, IL18R1 LAMA5, LAMB1 IL1A, IL1B LAMB2, LAMB3 IL1R1, IL1R2 LAMC1, LAMC2 IL1RAP, IL1RL1 LAMP1, LAMP2 IL2, IL20RA LAMTOR1, LAMTOR2 IL21R, IL22 LAMTOR3, LAMTOR4 IL22RA1, IL25 LAMTOR5, LANCL1 IL27, IL27RA LAPTM4A, LAPTM4B IL3, IL31RA LAPTM5, LARGE IL32, IL33 LARP1, LARP1B IL36RN, IL37 LARP4, LARP4B IL4, IL4R LARS, LARS2 IL5, IL6 LAS1L, LASP1 IL6R, IL6ST LATS1, LATS2 IL7, IL7R LBP, LBR IL8, ILDR1 LBX2, LCA5 ILF2, ILF3 LCA5L, LCE1E ILK, ILKAP , LCLAT1 ILVBL, IMMT LCN1, LCN9 IMP3, IMP4 LCNL1, LCOR IMPA1, IMPA2 LCORL, LCP1 IMPACT, IMPAD1 LDHA, LDHB IMPDH1, IMPDH2 LDLR, LDLRAD4 IMPG1, INA LDLRAP1, LEAP2 INADL, INCENP LECT1, LECT2 INF2, ING1 LEF1, LEMD2 ING2, ING4 LEMD3, LENG8 INHA, INHBA LENG9, LEPR INHBB, INHBE LEPRE1, LEPREL2 INIP, INO80B LEPREL4, LEPROT INO80C, INO80D LEPROTL1, LETM1 INPP4A, INPP5A LETMD1, LFNG INPP5B, INPP5D LGALS1, LGALS3 INPP5E, INPP5F LGALS3BP, LGALS9 INPP5J, INPP5K LGALS9B, LGALS9C INPPL1, INSC LGALSL, LGI2 INSIG1, INSIG2 LGMN, LGR4 INSL4, INSL5 LGR6, LGTN INSM2, INSR LHCGR, LHFP INTS1, INTS10 LHFPL2, LHFPL4 INTS12, INTS2 LHPP, LHX4 INTS3, INTS4 LHX6, LHX9 INTS5, INTS6 LIF, LIFR INTS7, INTS8 LIG1, LIG3 IP6K1, IP6K2 LIG4, LILRA1 IP6K3, IPCEF1 LILRA3, LILRA5 IPMK, IPO11 468

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses LILRB4, LILRB5 IPO13, IPO4 LIM2, LIMA1 IPO5, IPO7 LIMCH1, LIMD1 IPO8, IPO9 LIMK1, LIMS1 IPP, IPPK LIN-41, LIN28A IQCA1, IQCB1 LIN28B, LIN37 IQCC, IQCE LIN52, LIN7A IQCG, IQCK LIN7B, LIN7C IQGAP1, IQSEC1 LIN9, LINC00174 IRAK1, IRAK2 LINC00238, LINC00266-1 IRAK3, IRAK4 LINC00304, LINC00305 IREB2, IRF1 LINC00313, LINC00319 IRF2, IRF2BP1 LINC00324, LINC00334 IRF2BP2, IRF2BPL LINC00470, LINC00483 IRF4, IRF5 LINC00487, LINC00521 IRF6, IRF7 LINC00684, LINC00862 IRF9, IRGC LINGO3, LIPA IRGQ, IRS1 LIPC, LIPG IRS2, IRS4 LITAF, LIX1L IRX2, IRX4 LLGL1, LLGL2 ISCA1, ISCU LLPH, LMAN1 ISG15, ISG20 LMAN2, LMAN2L ISG20L2, ISL1 LMBR1, LMBR1L ISM2, ISOC1 LMBRD1, LMBRD2 ISOC2, ISY1 LMCD1, LMF2 ISYNA1, ITCH LMLN, LMNA ITFG1, ITFG3 LMNB1, LMNB2 ITGA1, ITGA10 LMO2, LMO3 ITGA11, ITGA2 LMO7, LMOD1 ITGA3, ITGA4 LMOD3, LMTK2 ITGA5, ITGA6 LMTK3, LNPEP ITGA8, ITGA9 LOC391247, LOC81691 ITGAV, ITGAX LOH12CR1, LONP1 ITGB1, ITGB2 LONP2, LONRF1 ITGB3, ITGB3BP LONRF2, LOX ITGB4, ITGB5 LOXHD1, LOXL1 ITGB6, ITGB8 LOXL2, LOXL4 ITGBL1, ITIH1 LPAL2, LPAR1 ITIH5, ITM2A LPCAT1, LPCAT3 ITM2B, ITM2C LPGAT1, LPHN1 ITPA, ITPK1 LPHN2, LPHN3 ITPKA, ITPKB LPIN1, LPIN2 ITPKC, ITPR1 LPIN3, LPL ITPR2, ITPR3 LPPR4, LPPR5 ITPRIP, ITPRIPL2 LRAT, LRBA ITSN1, ITSN2 LRCH2, LRCH4 IVD, IVL LRFN1, LRFN4 IVNS1ABP, IWS1 LRG1, LRIG1 IYD, IZUMO2 LRIG3, LRP1 JAG1, JAG2 LRP10, LRP1B JAK1, JAK2 LRP3, LRP4 JAK3, JAKMIP1 LRP5, LRP6 JAKMIP2, JAKMIP3 LRP8, LRPPRC JAM2, JARID2 LRR1, LRRC1 JAZF1, JHDM1D LRRC10B, LRRC16A JMJD1C, JMJD4 LRRC17, LRRC19 JMJD6, JMJD8 LRRC20, LRRC27 JMY, JOSD1 LRRC28, LRRC32 JPH1, JPH4 LRRC34, LRRC36 JTB, JUN LRRC37A2, LRRC4 JUNB, JUND LRRC40, LRRC41 JUP, KAAG1 LRRC42, LRRC47 KALRN, KANK1 LRRC57, LRRC58 KANK2, KANSL1 LRRC59, LRRC70 KANSL2, KANSL3 LRRC8A, LRRC8B KARS, KAT2A LRRC8C, LRRC8D KAT2B, KAT5 LRRCC1, LRRFIP1 KAT6A, KAT6B LRRFIP2, LRRK2 KAT7, KATNA1 LRRN1, LRRN3 KATNAL1, KATNB1 LRWD1, LSAMP KATNBL1, KBTBD11 LSG1, LSM1 KBTBD2, KBTBD3 469

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses LSM10, LSM11 KBTBD4, KBTBD6 LSM12, LSM14A KBTBD7, KBTBD8 LSM14B, LSM2 KCNA4, KCNA5 LSM3, LSM5 KCNC1, KCNC4 LSM6, LSM7 KCND1, KCND3 LSMEM1, LSMEM2 KCNE1, KCNE1L LSP1, LSR KCNF1, KCNG1 LSS, LTA4H KCNG3, KCNH1 LTBP1, LTBP2 KCNH2, KCNH3 LTBP3, LTBP4 KCNH4, KCNH8 LTBR, LTC4S KCNJ14, KCNJ15 LTK, LTN1 KCNJ16, KCNJ2 LTV1, LUC7L KCNJ6, KCNJ9 LUC7L2, LUC7L3 KCNK1, KCNK13 LURAP1L, LUZP1 KCNK18, KCNK2 LXN, LY6E KCNK4, KCNK5 LY6G5B, LY6G6E KCNK7, KCNK9 LY6H, LY6K KCNMB4, KCNN2 LYAR, LYL1 KCNN4, KCNQ1 LYN, LYPD2 KCNQ2, KCNS2 LYPD3, LYPD4 KCNS3, KCNT2 LYPD6, LYPLA1 KCTD1, KCTD10 LYPLA2, LYRM2 KCTD11, KCTD12 LYRM5, LYRM7 KCTD14, KCTD15 LYSMD1, LYSMD3 KCTD17, KCTD2 LYST, LYZL1 KCTD20, KCTD3 LYZL2, LZIC KCTD4, KCTD5 LZTFL1, LZTR1 KCTD7, KCTD9 LZTS2, LZTS3 KDELC1, KDELC2 M6PR, MAB21L1 KDELR1, KDELR2 MACC1, MACF1 KDM1A, KDM1B MACROD1, MAD1L1 KDM2A, KDM2B MAD2L1, MAD2L1BP KDM3A, KDM3B MAF1, MAFB KDM4A, KDM4B MAFF, MAFG KDM4C, KDM4D MAFK, MAGEA12 KDM5A, KDM5B MAGEA2, MAGEA2B KDM5C, KDM5D MAGEA3, MAGEA6 KDM6B, KDR MAGEB1, MAGEB2 KDSR, KEAP1 MAGEB3, MAGED1 KERA, KHDC1 MAGED2, MAGED4 KHDC3L, KHDRBS1 MAGED4B, MAGI1 KHDRBS2, KHDRBS3 MAGOHB, MAK16 KHK, KHNYN MAL2, MALSU1 KHSRP, KIAA0020 MALT1, MAML1 KIAA0060, KIAA0061 MAML2, MAMLD1 KIAA0100, KIAA0101 MAN1A1, MAN1A2 KIAA0141, KIAA0195 MAN1B1, MAN1C1 KIAA0196, KIAA0226L MAN2A1, MAN2A2 KIAA0232, KIAA0247 MAN2B1, MANBA KIAA0319L, KIAA0355 MANEAL, MANF KIAA0368, KIAA0430 MANSC1, MAOA KIAA0432, KIAA0586 MAP1B, MAP1LC3A KIAA0825, KIAA0895 MAP1LC3B, MAP1LC3B2 KIAA0895L, KIAA0899 MAP1S, MAP2 KIAA0907, KIAA0922 MAP2K1, MAP2K2 KIAA0930, KIAA0947 MAP2K3, MAP2K4 KIAA1024, KIAA1033 MAP2K5, MAP2K7 KIAA1045, KIAA1107 MAP3K1, MAP3K10 KIAA1109, KIAA1143 MAP3K12, MAP3K13 KIAA1147, KIAA1161 MAP3K2, MAP3K3 KIAA1191, KIAA1199 MAP3K4, MAP3K6 KIAA1211, KIAA1217 MAP3K7, MAP3K8 KIAA1244, KIAA1279 MAP3K9, MAP4 KIAA1307, KIAA1324 MAP4K2, MAP6D1 KIAA1324L, KIAA1328 MAP7, MAP7D1 KIAA1377, KIAA1407 MAP7D2, MAP7D3 
KIAA1429, KIAA1432 MAPK1, MAPK11 KIAA1462, KIAA1468 MAPK14, MAPK1IP1L KIAA1491, KIAA1522 MAPK3, MAPK4 KIAA1524, KIAA1549 MAPK6, MAPK7 KIAA1549L, KIAA1551

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses MAPK8, MAPK8IP2 KIAA1586, KIAA1598 MAPK9, MAPKAPK2 KIAA1644, KIAA1671 MAPRE1, MAPRE3 KIAA1704, KIAA1715 MAPT, 1-Mar KIAA1731, KIAA1804 1-Mar, 11-Mar KIAA1841, KIAA1919 2-Mar, 5-Mar KIAA1958, KIAA1967 6-Mar, 8-Mar KIAA2013, KIAA2026 MARCKS, MARCKSL1 KIDINS220, KIF11 MARK2, MARK3 KIF13A, KIF13B MARK4, MARS KIF14, KIF15 MARS2, MAS1L KIF16B, KIF17 MAST1, MAST2 KIF18A, KIF18B MAST3, MAST4 KIF1A, KIF1B MAT1A, MAT2A KIF1C, KIF20A MAT2B, MATK KIF20B, KIF21A MATN3, MATR3 KIF21B, KIF22 MAU2, MAVS KIF23, KIF24 MAX, MAZ KIF26A, KIF27 MB21D2, MBD1 KIF2A, KIF2C MBD3, MBD5 KIF3B, KIF3C MBD6, MBIP KIF4A, KIF4B MBLAC2, MBNL1 KIF5A, KIF5B MBNL2, MBNL3 KIF5C, KIF6 MBOAT7, MBTD1 KIF7, KIFAP3 MBTPS1, MBTPS2 KIFC1, KIFC2 MC4R, MCAM KIFC3, KIN MCAT, MCCC2 KIR3DL1, KIR3DL2 MCCD1, MCFD2 KIR3DX1, KIRREL MCHR2, MCL1 KIRREL2, KIT MCM10, MCM2 KITLG, KL MCM3, MCM3AP KLB, KLC1 MCM4, MCM5 KLC2, KLC4 MCM6, MCM7 KLF10, KLF11 MCM8, MCM9 KLF12, KLF13 MCMBP, MCMDC2 KLF14, KLF2 MCOLN2, MCPH1 KLF3, KLF4 MCRS1, MCTP1 KLF5, KLF6 MCTS1, MCU KLF7, KLF8 MCUR1, MDC1 KLF9, KLHDC1 MDFI, MDFIC KLHDC10, KLHDC2 MDGA1, MDGA2 KLHDC3, KLHDC7B MDH1, MDH2 KLHL1, KLHL11 MDK, MDM2 KLHL12, KLHL14 MDM4, MDN1 KLHL15, KLHL17 ME1, ME2 KLHL18, KLHL2 ME3, MEA1 KLHL20, KLHL21 MEAF6, MECP2 KLHL22, KLHL23 MECR, MED1 KLHL24, KLHL28 MED12, MED12L KLHL3, KLHL32 MED13, MED13L KLHL34, KLHL35 MED14, MED15 KLHL36, KLHL4 MED16, MED17 KLHL42, KLHL5 MED18, MED21 KLHL6, KLHL7 MED22, MED24 KLHL8, KLHL9 MED25, MED28 KLK10, KLK11 MED4, MED6 KLK3, KLK6 MED8, MEF2A KLK7, KLRB1 MEF2C, MEF2D KLRC1, KLRC2 MEGF11, MEGF8 KLRC3, KLRC4 MEGF9, MEIS1 KLRF1, KLRG1 MEIS2, MEIS3 KLRK1, KMT2A MEIS3P2, MELK KMT2C, KMT2D MEN1, MEOX2 KMT2E, KNG1 MEP1B, MEPCE KNSTRN, KNTC1 MEPE, MERTK KPNA1, KPNA2 MESDC1, MESDC2 KPNA3, KPNA4 MESP2, MET KPNA5, KPNA6 METAP1, METAP2 KPNB1, KPTN METRN, METTL10 KRAS, KRBOX4 METTL13, METTL16 KRCC1, KREMEN1 METTL17, METTL18 KREMEN2, KRI1 471

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses METTL20, METTL21A KRT14, KRT15 METTL21D, METTL22 KRT17P2, KRT18 METTL23, METTL25 KRT18P8, KRT19 METTL2A, METTL2B KRT222, KRT3 METTL4, METTL6 KRT32, KRT33A METTL9, MEX3B KRT38, KRT4 MEX3C, MFGE8 KRT5, KRT6B MFHAS1, MFI2 KRT6C, KRT7 MFN2, MFNG KRT71, KRT72 MFSD1, MFSD10 KRT77, KRT8 MFSD12, MFSD3 KRT80, KRT85 MFSD5, MFSD6 KRTAP10-12, KRTAP10-2 MFSD6L, MFSD8 KRTAP10-4, KRTAP10-6 MGA, MGAT2 KRTAP11-1, KRTAP12-2 MGAT3, MGAT4A KRTAP19-1, KRTAP3-2 MGAT4B, MGAT5 KRTAP3-3, KRTAP4-11 MGC27345, MGEA5 KRTAP4-2, KRTAP4-4 MGLL, MGMT KRTAP4-5, KRTAP4-6 MGP, MGRN1 KRTAP4-7, KRTAP5-3 MGST1, MGST2 KRTAP5-7, KRTAP5-9 MIA2, MIB1 KRTAP9-8, KSR2 MIB2, MICA KTN1, KXD1 MICAL1, MICALL1 L1CAM, L2HGDH MICU1, MICU2 L3HYPDH, L3MBTL2 MID1, MID1IP1 L3MBTL4, LACC1 MID2, MIDN LACE1, LACRT MIEN1, MIER1 LACTB, LACTB2 MIER2, MIER3 LAIR1, LAIR2 MIF4GD, MINA LAMA1, LAMA3 MINK1, MINOS1 LAMA4, LAMA5 MINPP1, MIOS LAMB1, LAMB2 MIPEP, MIPOL1 LAMB3, LAMC1 MIR124-2, MIR143 LAMC2, LAMP1 MIR192, MIR194-2 LAMP2, LAMP3 MIR30C2, MIR34C LAMTOR1, LAMTOR2 MIR671, MIR7-3HG LAMTOR3, LAMTOR4 MIR769, MIS12 LAMTOR5, LANCL1 MIS18A, MIS18BP1 LAPTM4A, LAPTM4B MISP, MITD1 LAPTM5, LARGE MITF, MKI67 LARP1, LARP1B MKI67IP, MKL2 LARP4, LARP4B MKLN1, MKNK2 LARP6, LARS MKRN1, MKRN2 LARS2, LAS1L MKX, MLC1 LASP1, LAT2 MLEC, MLF1IP LATS1, LATS2 MLF2, MLH1 LBH, LBP MLH3, MLLT1 LBR, LBX2 MLLT10, MLLT11 LCA5, LCA5L MLLT4, MLLT6 LCAT, LCE1E MLN, MLNR LCK, LCLAT1 MLX, MLXIP LCN1, LCN9 MLYCD, MMAB LCNL1, LCOR MMACHC, MMADHC LCORL, LCP1 MMD, MME LCT, LDB2 MMEL1, MMP1 LDHA, LDHB MMP11, MMP13 LDLR, LDLRAD3 MMP14, MMP15 LDLRAD4, LDLRAP1 MMP16, MMP17 LDOC1, LDOC1L MMP2, MMP20 LEAP2, LECT1 MMP23A, MMP24 LECT2, LEF1 MMP25, MMP3 LEMD2, LEMD3 MMP7, MMP9 LENG1, LENG8 MMRN1, MMS19 LENG9, LEPR MMS22L, MN1 LEPRE1, LEPREL2 MND1, MNS1 LEPREL4, LEPROT MNT, MOAP1 LEPROTL1, LETM1 MOB1A, MOB1B LETMD1, LFNG MOB3A, MOB3B LGALS3, LGALS3BP MOB3C, MOB4 LGALS9, LGALS9B MOBP, MOCOS LGALS9C, LGALSL MOCS3, 
MOGS LGMN, LGR4

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses MON1B, MON2 LGR5, LGR6 MORC2, MORC3 LGTN, LHCGR MORC4, MORF4L1 LHFP, LHFPL2 MORF4L2, MORN2 LHFPL4, LHPP MOSPD2, MOV10 LHX2, LHX6 MOV10L1, MOXD1 LHX9, LIAS MPC1, MPDU1 LIF, LIFR MPDZ, MPG LIG1, LIG3 MPHOSPH10, MPHOSPH8 LIG4, LILRA1 MPHOSPH9, MPI LILRA3, LILRA4 MPL, MPLKIP LILRA5, LILRB4 MPND, MPP1 LILRB5, LIM2 MPP2, MPP5 LIMA1, LIMCH1 MPP6, MPPED2 LIMD1, LIMD2 MPRIP, MPST LIMK1, LIMS1 MPV17, MPZL1 LIN-41, LIN28A MRAP2, MRAS LIN28B, LIN37 MRC2, MRE11A LIN52, LIN7A MREG, MRFAP1 LIN7B, LIN7C MRGBP, MRGPRX1 LIN9, LINC00174 MRGPRX3, MRM1 LINC00238, LINC00266-1 MROH1, MROH7 LINC00304, LINC00305 MROH8, MROH9 LINC00313, LINC00319 MRPL1, MRPL10 LINC00324, LINC00334 MRPL12, MRPL13 LINC00470, LINC00483 MRPL14, MRPL15 LINC00487, LINC00521 MRPL19, MRPL2 LINC00684, LINC00862 MRPL20, MRPL21 LINGO3, LINS MRPL24, MRPL28 LIPA, LIPC MRPL3, MRPL32 LIPG, LITAF MRPL36, MRPL37 LIX1L, LLGL1 MRPL38, MRPL39 LLGL2, LLPH MRPL4, MRPL40 LMAN1, LMAN2 MRPL42, MRPL43 LMAN2L, LMBR1 MRPL44, MRPL45 LMBR1L, LMBRD1 MRPL46, MRPL47 LMBRD2, LMCD1 MRPL49, MRPL50 LMF2, LMLN MRPL51, MRPL53 LMNA, LMNB1 MRPL9, MRPS10 LMNB2, LMO2 MRPS11, MRPS14 LMO3, LMO7 MRPS15, MRPS16 LMOD1, LMOD3 MRPS18B, MRPS2 LMTK2, LMTK3 MRPS23, MRPS24 LNPEP, LNX2 MRPS25, MRPS26 LOC100128295, LOC150786 MRPS27, MRPS31 LOC339924, LOC391247 MRPS33, MRPS34 LOC81691, LOC91431 MRPS35, MRPS5 LOH12CR1, LONP1 MRPS6, MRPS9 LONP2, LONRF1 MRRF, MRS2 LONRF2, LOX MRTO4, MS4A10 LOXHD1, LOXL1 MS4A14, MS4A2 LOXL2, LOXL4 MS4A3, MS4A4A LPAL2, LPAR1 MS4A6E, MS4A8 LPCAT1, LPCAT3 MSANTD1, MSANTD2 LPGAT1, LPHN1 MSANTD3, MSANTD4 LPHN2, LPHN3 MSC, MSH2 LPIN1, LPIN2 MSH3, MSH6 LPIN3, LPL MSI1, MSI2 LPP, LPPR4 MSL2, MSL3 LPPR5, LPXN MSLN, MSMO1 LRAT, LRBA MSN, MSRB1 LRCH2, LRCH3 MSRB3, MSS51 LRCH4, LRFN1 MST1, MSTO1 LRFN3, LRFN4 MT1E, MT1F LRG1, LRIF1 MT1X, MT4 LRIG1, LRIG3 MTA2, MTA3 LRP1, LRP10 MTAP, MTCH2 LRP11, LRP1B MTDH, MTERFD1 LRP2, LRP3 MTF1, MTF2 LRP4, LRP5 MTFP1, MTFR1 LRP6, LRP8 MTFR1L, MTFR2 
LRPAP1, LRPPRC

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses MTG1, MTHFD1 LRR1, LRRC1 MTHFD1L, MTHFD2 LRRC10B, LRRC15 MTHFR, MTHFS LRRC17, LRRC19 MTHFSD, MTIF2 LRRC2, LRRC20 MTMR1, MTMR10 LRRC27, LRRC28 MTMR12, MTMR14 LRRC32, LRRC34 MTMR2, MTMR3 LRRC36, LRRC37A2 MTMR4, MTMR6 LRRC4, LRRC40 MTMR9, MTO1 LRRC41, LRRC42 MTOR, MTPAP LRRC47, LRRC57 MTPN, MTR LRRC58, LRRC59 MTRF1, MTRF1L LRRC70, LRRC8A MTRR, MTSS1 LRRC8B, LRRC8C MTSS1L, MTUS1 LRRC8D, LRRC8E MTUS2, MTX1 LRRCC1, LRRFIP1 MTX3, MUC1 LRRFIP2, LRRK2 MUC13, MUC15 LRRN1, LRRN2 MUC17, MUC21 LRRN3, LRRTM1 MUC3A, MUC4 LSAMP, LSG1 MUC5AC, MUC5B LSM1, LSM10 MUC6, MVB12B LSM11, LSM12 MX2, MXD1 LSM14A, LSM14B MXD3, MXD4 LSM2, LSM3 MXI1, MXRA7 LSM4, LSM5 MXRA8, MYADM LSM6, LSM7 MYB, MYBBP1A LSMEM1, LSMEM2 MYBL1, MYBL2 LSP1, LSR MYBPC1, MYBPH LSS, LTA4H MYC, MYCBP LTBP1, LTBP2 MYCBP2, MYCL1 LTBP3, LTBP4 MYCN, MYCT1 LTBR, LTC4S MYEF2, MYEOV LTK, LTN1 MYF5, MYH1 LTV1, LUC7L MYH10, MYH14 LUC7L2, LUC7L3 MYH2, MYH4 LURAP1, LURAP1L MYH9, MYL12B LUZP1, LXN MYL2, MYL6 LY6D, LY6E MYLIP, MYLK LY6G5B, LY6G6E MYLK3, MYLPF LY6H, LY6K MYO10, MYO18A LYAR, LYL1 MYO19, MYO1A LYN, LYNX1 MYO1B, MYO1C LYPD1, LYPD2 MYO1D, MYO1E LYPD3, LYPD4 MYO3B, MYO5A LYPD6, LYPLA1 MYO5B, MYO5C LYPLA2, LYRM2 MYO6, MYO7A LYRM5, LYRM7 MYO9A, MYO9B LYSMD1, LYSMD3 MYOCD, MYOD1 LYST, LYVE1 MYOF, MYOM1 LYZ, LYZL1 MYOT, MYPN LYZL2, LZIC MYPOP, MYRF LZTFL1, LZTR1 MYSM1, MYT1L LZTS2, LZTS3 MZT1, MZT2A M6PR, MAB21L1 MZT2B, N4BP1 MACC1, MACF1 N4BP2, N4BP2L2 MAD1L1, MAD2L1 N6AMT1, NAA10 MAD2L1BP, MAD2L2 NAA15, NAA16 MADD, MAF1 NAA20, NAA25 MAFB, MAFF NAA30, NAA35 MAFG, MAFK NAA40, NAA50 MAGEA11, MAGEA12 NAA60, NAALAD2 MAGEA2, MAGEA2B NAB1, NAB2 MAGEA3, MAGEA6 NABP1, NABP2 MAGEA9, MAGEA9B NACA, NACC1 MAGEB1, MAGEB2 NACC2, NADK MAGEC3, MAGED1 NADK2, NAF1 MAGED2, MAGED4 NAGA, NAGK MAGED4B, MAGEF1 NAGPA, NAIP MAGI1, MAGIX NAMPT, NANOG MAGOHB, MAGT1 NANS, NAP1L1 MAK16, MAL2 NAP1L2, NAPEPLD MALL, MALSU1 474

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses NAPG, NAPRT1 MALT1, MAML1 NAPSB, NARF MAML2, MAMLD1 NARS, NARS2 MAN1A1, MAN1A2 NASP, NAT1 MAN1C1, MAN2A1 NAT10, NAT14 MAN2A2, MAN2B1 NAT8L, NAV1 MAN2C1, MANBA NAV2, NAV3 MANEAL, MANF NBEA, NBEAL2 MANSC1, MAOA NBL1, NBN MAP1A, MAP1B NBPF15, NBPF23 MAP1LC3A, MAP1LC3B NBR1, NCAM1 MAP1LC3B2, MAP1LC3C NCAM2, NCAPD2 MAP1S, MAP2 NCAPD3, NCAPG MAP2K1, MAP2K2 NCAPG2, NCAPH MAP2K3, MAP2K4 NCAPH2, NCBP1 MAP2K5, MAP2K6 NCBP2, NCDN MAP2K7, MAP3K1 NCEH1, NCF4 MAP3K10, MAP3K11 NCK2, NCKAP1 MAP3K12, MAP3K13 NCKAP5, NCKAP5L MAP3K14, MAP3K2 NCKIPSD, NCL MAP3K3, MAP3K4 NCLN, NCOA1 MAP3K5, MAP3K6 NCOA2, NCOA3 MAP3K7, MAP3K7CL NCOA4, NCOA5 MAP3K8, MAP3K9 NCOA6, NCOR1 MAP4, MAP4K3 NCOR2, NCS1 MAP6D1, MAP7 NCSTN, ND1 MAP7D1, MAP7D2 ND2, ND3 MAP7D3, MAPK1 ND4, ND4L MAPK11, MAPK14 ND5, ND6 MAPK1IP1L, MAPK3 NDC1, NDC80 MAPK4, MAPK6 NDE1, NDEL1 MAPK7, MAPK8 NDFIP1, NDFIP2 MAPK8IP1, MAPK8IP2 NDN, NDNL2 MAPK8IP3, MAPK9 NDOR1, NDP MAPKAP1, MAPKAPK2 NDRG1, NDRG3 MAPKAPK3, MAPRE1 NDST1, NDST3 MAPRE3, MAPT NDUFA1, NDUFA10 1-Mar, 2-Mar NDUFA13, NDUFA2 1-Mar, 11-Mar NDUFA3, NDUFA4 2-Mar, 3-Mar NDUFA4L2, NDUFA5 4-Mar, 5-Mar NDUFA7, NDUFA9 6-Mar, 8-Mar NDUFAF1, NDUFAF3 MARCKS, MARCKSL1 NDUFAF4, NDUFAF7 MARK2, MARK3 NDUFB10, NDUFB11 MARK4, MARS NDUFB2, NDUFB3 MARS2, MAS1L NDUFB5, NDUFB6 MAST1, MAST2 NDUFB8, NDUFC2 MAST3, MAST4 NDUFS1, NDUFS2 MASTL, MAT1A NDUFS4, NDUFS5 MAT2A, MAT2B NDUFS6, NDUFS7 MATK, MATN3 NDUFV1, NDUFV3 MATR3, MAU2 NEBL, NECAB1 MAVS, MAX NECAP2, NEDD1 MAZ, MB NEDD4, NEDD9 MB21D2, MBD1 NEFH, NEFM MBD3, MBD5 NEIL1, NEIL2 MBD6, MBIP NEIL3, NEK1 MBLAC2, MBNL1 NEK2, NEK3 MBNL2, MBNL3 NEK4, NEK6 MBOAT7, MBTD1 NEK7, NEK9 MBTPS1, MBTPS2 NELFB, NELFCD MC4R, MCAM NELFE, NEMF MCAT, MCC NENF, NEO1 MCCC2, MCCD1 NES, NET1 MCFD2, MCHR2 NETO2, NEU3 MCL1, MCM10 NEURL, NEURL2 MCM2, MCM3 NEURL3, NEURL4 MCM3AP, MCM4 NEUROD1, NEUROD2 MCM5, MCM6 NEUROD4, NEUROG1 MCM7, MCM8 NF1, NF2 MCM9, MCMBP NFAT5, NFATC1 MCMDC2, MCOLN2 475

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses NFATC2IP, NFATC3 MCPH1, MCRS1 NFE2L1, NFE2L2 MCTP1, MCTP2 NFIA, NFIB MCTS1, MCU NFIC, NFIL3 MCUR1, MDC1 NFIX, NFKB1 MDFI, MDFIC NFKB2, NFKBIA MDGA1, MDGA2 NFKBIL1, NFKBIZ MDH1, MDH2 NFS1, NFU1 MDK, MDM1 NFXL1, NFYA MDM2, MDM4 NFYB, NFYC MDN1, ME1 NGFR, NGFRAP1 ME2, ME3 NGLY1, NGRN MEA1, MEAF6 NHLRC2, NHLRC3 MECP2, MECR NHP2, NHP2L1 MED1, MED10 NHSL1, NHSL2 MED12, MED12L NICN1, NID1 MED13, MED13L NID2, NIN MED14, MED15 NINL, NIP7 MED16, MED17 NIPA1, NIPA2 MED18, MED20 NIPAL1, NIPAL2 MED21, MED22 NIPBL, NIPSNAP1 MED23, MED24 NISCH, NKAIN1 MED25, MED26 NKAIN2, NKAP MED28, MED31 NKG7, NKIRAS2 MED4, MED6 NKRF, NKTR MED8, MED9 NKX2-1, NKX2-5 MEF2A, MEF2C NKX3-1, NKX3-2 MEF2D, MEGF11 NLE1, NLGN4X MEGF8, MEGF9 NLGN4Y, NLK MEIS1, MEIS2 NLRC4, NLRP1 MEIS3, MEIS3P2 NLRP10, NLRP13 MELK, MEN1 NLRP14, NLRP2 MEOX2, MEP1A NLRP3, NLRP7 MEP1B, MEPCE NLRX1, NMD3 MEPE, MERTK NME2, NME3 MESDC1, MESDC2 NME4, NME5 MESP2, MEST NME6, NME8 MET, METAP1 NMI, NMRK2 METAP2, METRN NMT1, NMU METRNL, METTL10 NNMT, NNT METTL13, METTL16 NOA1, NOB1 METTL17, METTL18 NOBOX, NOC2L METTL20, METTL21A NOC3L, NOC4L METTL21D, METTL22 NOD2, NOL10 METTL23, METTL25 NOL11, NOL12 METTL2A, METTL2B NOL4, NOL6 METTL4, METTL6 NOL7, NOL8 METTL7A, METTL9 NOL9, NOLC1 MEX3B, MEX3C NOM1, NOMO1 MEX3D, MFAP4 NOMO3, NONO MFAP5, MFF NOP10, NOP14 MFGE8, MFHAS1 NOP16, NOP2 MFI2, MFN2 NOP58, NOS3 MFNG, MFSD1 NOSIP, NOTCH1 MFSD10, MFSD12 NOTCH2, NOTCH2NL MFSD3, MFSD5 NOTCH3, NOTCH4 MFSD6, MFSD6L NOVA1, NOX4 MFSD8, MFSD9 NOXO1, NPAS1 MGA, MGAT2 NPAS2, NPAT MGAT3, MGAT4A NPC1, NPC2 MGAT4B, MGAT5 NPDC1, NPEPL1 MGEA5, MGLL NPEPPS, NPHP3 MGMT, MGP NPHS2, NPLOC4 MGRN1, MGST1 NPM1, NPM3 MGST2, MGST3 NPNT, NPPC MIA2, MIA3 NPR3, NPRL3 MIB1, MIB2 NPTN, NPTX1 MICA, MICAL1 NPTXR, NPY MICAL2, MICALL1 NPY1R, NPY4R MICU1, MICU2 NR0B2, NR1D1 MID1, MID1IP1 NR1D2, NR1H2 MID2, MIDN 476

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses NR1H3, NR1I2 MIEN1, MIER1 NR2C2, NR2C2AP MIER2, MIER3 NR2E1, NR2F1 MIF, MIF4GD NR2F2, NR2F6 MINA, MINK1 NR3C1, NR3C2 MINOS1, MINPP1 NR4A1, NR4A2 MIOS, MIPEP NR4A3, NR5A2 MIPOL1, MIR124-2 NR6A1, NRARP MIR143, MIR192 NRAS, NRBF2 MIR194-2, MIR22HG NRBP1, NRD1 MIR30C2, MIR34C NRDE2, NREP MIR671, MIR7-3HG NRG4, NRIP1 MIR769, MIS12 NRIP2, NRK MIS18A, MIS18BP1 NRN1, NRP1 MISP, MITD1 NRP2, NRSN2 MITF, MIXL1 NRXN1, NRXN3 MKI67, MKI67IP NSA2, NSD1 MKL1, MKL2 NSDHL, NSF MKLN1, MKNK1 NSFL1C, NSG1 MKNK2, MKRN1 NSMAF, NSMCE4A MKRN2, MKX NSMF, NSRP1 MLC1, MLEC NSUN2, NSUN4 MLF1IP, MLF2 NSUN5, NT5C1A MLH1, MLH3 NT5C1B, NT5C2 MLLT1, MLLT10 NT5C3, NT5C3A MLLT11, MLLT3 NT5C3B, NT5DC1 MLLT4, MLLT6 NT5DC2, NT5DC3 MLN, MLNR NT5E, NT5M MLX, MLXIP NTAN1, NTF3 MLYCD, MMAB NTHL1, NTMT1 MMACHC, MMADHC NTN1, NTN3 MMD, MME NTN4, NTPCR MMEL1, MMP1 NTRK2, NTRK3 MMP10, MMP11 NTSR1, NUAK1 MMP12, MMP13 NUAK2, NUB1 MMP14, MMP15 NUBP1, NUBP2 MMP16, MMP17 NUBPL, NUCB1 MMP19, MMP2 NUCB2, NUCKS1 MMP20, MMP23A NUDC, NUDCD2 MMP24, MMP25 NUDCD3, NUDT11 MMP26, MMP3 NUDT12, NUDT13 MMP7, MMP8 NUDT15, NUDT19 MMP9, MMRN1 NUDT21, NUDT3 MMS19, MMS22L NUDT4, NUDT5 MN1, MNAT1 NUDT8, NUF2 MND1, MNDA NUFIP1, NUFIP2 MNF1, MNS1 NUMA1, NUMB MNT, MNX1 NUP107, NUP153 MOAP1, MOB1A NUP155, NUP160 MOB1B, MOB3A NUP188, NUP205 MOB3B, MOB3C NUP210, NUP214 MOB4, MOBP NUP35, NUP37 MOCOS, MOCS1 NUP50, NUP54 MOCS3, MOGS NUP62, NUP62CL MON1B, MON2 NUP85, NUP93 MORC2, MORC3 NUP98, NUPL1 MORC4, MORF4L1 NUPL2, NUPR1 MORF4L2, MORN2 NUS1, NUSAP1 MOSPD2, MOSPD3 NUTF2, NUTM1 MOV10, MOV10L1 NUTM2A, NUTM2B MOXD1, MPC1 NUTM2D, NVL MPC2, MPDU1 NXF3, NXN MPDZ, MPG NXNL2, NXPE3 MPHOSPH10, MPHOSPH8 NXPH2, NXT1 MPHOSPH9, MPI NXT2, NYNRIN MPL, MPLKIP OARD1, OAS1 MPND, MPP1 OAS2, OASL MPP2, MPP5 OAT, OAZ1 MPP6, MPPE1 OAZ2, OBSL1 MPPED2, MPRIP OCA2, OCIAD2 MPST, MPV17 OCLN, OCRL MPZL1, MRAP2 477

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ODC1, ODF3B MRAS, MRC2 ODF3L1, OFCC1 MRE11A, MREG OFD1, OGDH MRFAP1, MRGBP OGDHL, OGFOD1 MRGPRX1, MRGPRX3 OGFR, OGFRL1 MRM1, MROH1 OGN, OGT MROH7, MROH8 OIP5, OLA1 MROH9, MRPL1 OLFM4, OLFML2A MRPL10, MRPL11 OLFML2B, OLFML3 MRPL12, MRPL13 OLIG2, OLIG3 MRPL14, MRPL15 OLR1, OMA1 MRPL16, MRPL18 ONECUT2, OPA1 MRPL19, MRPL2 OPA3, OPN1LW MRPL20, MRPL21 OPN1MW, OPN1MW2 MRPL22, MRPL24 OPN1SW, OPRK1 MRPL27, MRPL28 OPTN, OR10D1P MRPL3, MRPL32 OR11A1, OR1A1 MRPL34, MRPL35 OR1C1, OR1J2 MRPL36, MRPL37 OR1R1P, OR2C3 MRPL38, MRPL39 OR2J1, OR2J3 MRPL4, MRPL40 OR2L2, OR2W1 MRPL42, MRPL43 OR3A1, OR3A2 MRPL44, MRPL45 OR51B5, OR52K1 MRPL46, MRPL47 OR5E1P, OR6B1 MRPL49, MRPL50 OR6V1, OR7A17 MRPL51, MRPL53 OR7D2, OR8B2 MRPL9, MRPS10 OR8B3, OR8D2 MRPS11, MRPS12 OR8G7P, OR8U1 MRPS14, MRPS15 ORAI2, ORAI3 MRPS16, MRPS18A ORAOV1, ORC1 MRPS18B, MRPS2 ORC2, ORC3 MRPS22, MRPS23 ORC4, ORC5 MRPS24, MRPS25 ORC6, ORMDL1 MRPS26, MRPS27 ORMDL2, ORMDL3 MRPS28, MRPS31 OSBP, OSBP2 MRPS33, MRPS34 OSBPL10, OSBPL11 MRPS35, MRPS5 OSBPL1A, OSBPL2 MRPS6, MRPS9 OSBPL3, OSBPL6 MRRF, MRS2 OSBPL7, OSBPL8 MRTO4, MS4A10 OSBPL9, OSGEPL1 MS4A14, MS4A2 OSMR, OSR1 MS4A3, MS4A4A OSTC, OSTF1 MS4A6E, MS4A8 OSTM1, OTOGL MSANTD1, MSANTD2 OTOL1, OTOR MSANTD3, MSANTD4 OTP, OTUB1 MSC, MSH2 OTUB2, OTUD1 MSH3, MSH6 OTUD3, OTUD4 MSI1, MSI2 OTUD5, OTUD6A MSL2, MSL3 OTUD7A, OTUD7B MSLN, MSMO1 OTX2, OVCH2 MSN, MSRA OVGP1, OVOL1 MSRB3, MSS51 OXA1L, OXCT1 MST1, MST4 OXGR1, OXNAD1 MSTN, MSTO1 OXR1, OXTR MT1E, MT1F P2RX1, P2RX4 MT1HL1, MT1M P2RX5, P2RX6 MT1P3, MT1X P2RX7, P2RY11 MT4, MTA2 P2RY2, P4HA1 MTA3, MTAP P4HA2, P4HA3 MTCH1, MTCH2 P4HB, PA2G4 MTCP1, MTDH PAAF1, PABPC1 MTERF, MTERFD1 PABPC1L, PABPC3 MTERFD2, MTF1 PABPC4, PABPN1 MTF2, MTFMT PACS1, PACS2 MTFP1, MTFR1 PACSIN2, PACSIN3 MTFR1L, MTFR2 PADI4, PAF1 MTG1, MTG2 PAFAH1B1, PAFAH1B2 MTHFD1, MTHFD1L PAFAH1B3, PAG1 MTHFD2, MTHFD2L PAGE2B, PAGR1 MTHFR, MTHFS PAH, PAICS MTHFSD, MTIF2 PAIP1, PAIP2 MTMR1, MTMR10 478

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PAK1, PAK1IP1 MTMR12, MTMR14 PAK2, PAK3 MTMR2, MTMR3 PAK4, PAK6 MTMR4, MTMR6 PALB2, PALD1 MTMR9, MTO1 PALLD, PALM MTOR, MTPAP PALM2, PALM2-AKAP2 MTPN, MTR PALMD, PAM MTRF1, MTRF1L PAMR1, PAN2 MTRR, MTSS1 PAN3, PANK2 MTSS1L, MTUS1 PANK3, PANX1 MTUS2, MTX1 PAPD4, PAPD5 MTX3, MUC1 PAPD7, PAPOLA MUC13, MUC15 PAPOLG, PAPPA MUC17, MUC21 PAPSS1, PAPSS2 MUC3A, MUC4 PAQR3, PAQR5 MUC5AC, MUC5B PAQR6, PAQR8 MUC6, MUC7 PARD3, PARD6B MUS81, MUT PARK2, PARK7 MVB12A, MVB12B PARL, PARM1 MVD, MVK PARN, PARP1 MVP, MX2 PARP10, PARP11 MXD1, MXD3 PARP12, PARP15 MXD4, MXI1 PARP16, PARP4 MXRA7, MXRA8 PARP9, PARVA MYADM, MYB PARVB, PARVG MYBBP1A, MYBL1 PASK, PATE2 MYBL2, MYBPC1 PATL1, PATZ1 MYBPH, MYC PAWR, PAX3 MYCBP, MYCBP2 PAX6, PAXBP1 MYCL1, MYCN PAXIP1, PBDC1 MYCT1, MYD88 PBRM1, PBX1 MYEF2, MYEOV PBX2, PBX3 MYF5, MYF6 PBXIP1, PC MYH1, MYH10 PCBD1, PCBP1 MYH14, MYH2 PCBP2, PCBP3 MYH4, MYH7B PCCB, PCDH10 MYH9, MYL12B PCDH7, PCDH8 MYL2, MYL6 PCDHA4, PCDHB10 MYLIP, MYLK PCDHB14, PCDHB15 MYLK3, MYLPF PCDHB8, PCDHGA1 MYO10, MYO18A PCDHGA10, PCDHGA2 MYO19, MYO1C PCDHGA8, PCDHGB2 MYO1D, MYO1E PCDHGB4, PCDHGC3 MYO3B, MYO5A PCDHGC4, PCDHGC5 MYO5B, MYO5C PCED1A, PCED1B MYO6, MYO7A PCF11, PCGF1 MYO9A, MYO9B PCGF2, PCGF3 MYOD1, MYOF PCGF5, PCID2 MYOG, MYOM1 PCIF1, PCK2 MYOT, MYPN PCMT1, PCMTD1 MYPOP, MYRF PCMTD2, PCNA MYSM1, MYT1 PCNP, PCNT MYT1L, MZT1 PCNX, PCNXL4 MZT2A, MZT2B PCOLCE2, PCSK4 N4BP1, N4BP2 PCSK6, PCSK7 N4BP2L2, N6AMT1 PCSK9, PCTP NAA10, NAA15 PCYOX1, PCYOX1L NAA16, NAA20 PCYT1A, PCYT1B NAA25, NAA30 PCYT2, PDAP1 NAA35, NAA40 PDCD10, PDCD11 NAA50, NAA60 PDCD4, PDCD5 NAALAD2, NAB1 PDCD6, PDCD6IP NAB2, NABP1 PDCL3, PDE12 NABP2, NACA PDE2A, PDE3A NACC1, NACC2 PDE3B, PDE4A NADK, NADK2 PDE4D, PDE4DIP NAE1, NAF1 PDE5A, PDE6G NAGA, NAGK PDE6H, PDE8B NAGPA, NAIP PDF, PDGFA NAMPT, NANOG PDGFB, PDGFC NANOS1, NANS PDGFD, PDGFRA NAP1L1, NAP1L2 479

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PDGFRL, PDHA1 NAP1L4, NAPEPLD PDHA2, PDHB NAPG, NAPRT1 PDHX, PDIA3 NAPSB, NARF PDIA4, PDIA6 NARG2, NARS PDIK1L, PDK1 NARS2, NASP PDK2, PDK4 NAT1, NAT14 PDLIM1, PDLIM2 NAT6, NAT8L PDLIM3, PDLIM4 NAV1, NAV2 PDLIM5, PDLIM7 NAV3, NBEA PDP1, PDP2 NBEAL2, NBL1 PDPK1, PDPN NBN, NBPF15 PDPR, PDRG1 NBPF23, NBR1 PDS5A, PDS5B NCAM1, NCAM2 PDSS1, PDXDC1 NCAN, NCAPD2 PDXK, PDXP NCAPD3, NCAPG PDZD11, PDZD2 NCAPG2, NCAPH PDZD4, PDZD8 NCAPH2, NCBP1 PDZK1, PDZRN3 NCBP2, NCDN PDZRN4, PEA15 NCEH1, NCF2 PEBP1, PEBP4 NCF4, NCK2 PEF1, PEG10 NCKAP1, NCKAP5 PEG3, PELI1 NCKAP5L, NCKIPSD PELI2, PELO NCL, NCLN PELP1, PER1 NCOA1, NCOA2 PER3, PERP NCOA3, NCOA4 PES1, PET112 NCOA5, NCOA6 PEX1, PEX11B NCOR1, NCOR2 PEX13, PEX14 NCS1, NCSTN PEX19, PEX26 ND1, ND2 PEX5, PEX6 ND3, ND4 PFAS, PFDN1 ND4L, ND5 PFDN2, PFDN6 ND6, NDC1 PFKFB2, PFKFB3 NDC80, NDE1 PFKFB4, PFKL NDEL1, NDFIP1 PFKM, PFKP NDFIP2, NDN PFN1, PFN2 NDNL2, NDOR1 PGAM1, PGD NDP, NDRG1 PGK1, PGLYRP1 NDRG2, NDRG3 PGM1, PGM2 NDST1, NDST3 PGM2L1, PGM3 NDUFA1, NDUFA10 PGM5, PGP NDUFA13, NDUFA2 PGR, PGRMC1 NDUFA3, NDUFA4 PGRMC2, PGS1 NDUFA4L2, NDUFA5 PHACTR2, PHACTR4 NDUFA7, NDUFA9 PHAX, PHB NDUFAF1, NDUFAF3 PHB2, PHC1 NDUFAF4, NDUFAF7 PHC2, PHC3 NDUFB1, NDUFB10 PHEX, PHF10 NDUFB11, NDUFB2 PHF12, PHF13 NDUFB3, NDUFB4 PHF14, PHF16 NDUFB5, NDUFB6 PHF17, PHF19 NDUFB8, NDUFC2 PHF20, PHF20L1 NDUFS1, NDUFS2 PHF21A, PHF23 NDUFS5, NDUFS6 PHF3, PHF6 NDUFS7, NDUFS8 PHF8, PHGDH NDUFV1, NDUFV3 PHIP, PHKA1 NEBL, NECAB1 PHKA2, PHKB NECAP1, NECAP2 PHKG2, PHLDA2 NEDD1, NEDD4 PHLDB1, PHLDB2 NEDD4L, NEDD9 PHLDB3, PHLPP1 NEFH, NEFM PHLPP2, PHOX2A NEIL1, NEIL2 PHPT1, PHRF1 NEIL3, NEK1 PHTF1, PHTF2 NEK2, NEK3 PHYH, PHYHIP NEK4, NEK6 PI16, PI4K2A NEK7, NEK9 PI4K2B, PI4KA NELFB, NELFCD PI4KB, PIAS1 NELFE, NEMF PIAS3, PIAS4 NENF, NEO1 PICALM, PIEZO1 NES, NET1 PIF1, PIFO NETO2, NEU1 PIGB, PIGC NEU3, NEU4 480

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PIGF, PIGG NEURL, NEURL2 PIGH, PIGM NEURL3, NEURL4 PIGN, PIGO NEUROD1, NEUROD2 PIGP, PIGQ NEUROD4, NEUROG1 PIGS, PIGT NF1, NF2 PIGU, PIGX NFAT5, NFATC1 PIH1D1, PIH1D3 NFATC2, NFATC2IP PIK3AP1, PIK3C2A NFATC3, NFE2L1 PIK3C2B, PIK3C3 NFE2L2, NFIA PIK3CA, PIK3CB NFIB, NFIC PIK3CD, PIK3CG NFIL3, NFIX PIK3R1, PIK3R2 NFKB1, NFKB2 PIK3R3, PIK3R4 NFKBIA, NFKBIE PIKFYVE, PIM1 NFKBIL1, NFKBIZ PIM2, PIM3 NFS1, NFU1 PIN1, PIN4 NFXL1, NFYA PINX1, PIP4K2A NFYB, NFYC PIP4K2B, PIP4K2C NGDN, NGFR PIP5K1A, PIP5K1C NGFRAP1, NGLY1 PIP5KL1, PIPOX NGRN, NHLRC2 PIR, PISD NHLRC3, NHP2 PITHD1, PITPNA NHP2L1, NHSL1 PITPNB, PITPNC1 NHSL2, NICN1 PITPNM1, PITPNM2 NID1, NID2 PITRM1, PITX1 NIN, NINJ1 PIWIL4, PJA2 NINL, NIP7 PKD1, PKD1L1 NIPA1, NIPA2 PKD2, PKD2L1 NIPAL1, NIPAL2 PKDCC, PKHD1L1 NIPBL, NIPSNAP1 PKIG, PKM NISCH, NIT2 PKMYT1, PKN1 NKAIN1, NKAIN2 PKN3, PKNOX1 NKAP, NKD1 PKP1, PKP2 NKG7, NKIRAS2 PKP4, PLA2G10 NKRF, NKTR PLA2G12A, PLA2G2D NKX2-1, NKX2-2 PLA2G2E, PLA2G4A NKX2-5, NKX3-1 PLA2G4C, PLA2G4F NKX3-2, NKX6-1 PLA2G5, PLA2G7 NLE1, NLGN4X PLAC1L, PLAC8L1 NLGN4Y, NLK PLAG1, PLAGL1 NLRC4, NLRP1 PLAGL2, PLAT NLRP10, NLRP13 PLAU, PLAUR NLRP14, NLRP2 PLCB1, PLCB3 NLRP3, NLRP7 PLCD1, PLCD3 NLRX1, NMD3 PLCE1, PLCG1 NME2, NME3 PLCG2, PLCH2 NME4, NME5 PLCL2, PLCXD1 NME6, NME8 PLCXD2, PLD1 NMI, NMNAT1 PLD3, PLD4 NMNAT2, NMRK1 PLD6, PLEC NMRK2, NMT1 PLEK2, PLEKHA1 NMU, NNMT PLEKHA4, PLEKHA5 NNT, NOA1 PLEKHA7, PLEKHA8 NOB1, NOBOX PLEKHA8P1, PLEKHB1 NOC2L, NOC3L PLEKHB2, PLEKHF1 NOC4L, NOD2 PLEKHF2, PLEKHG1 NOL10, NOL11 PLEKHG2, PLEKHG3 NOL12, NOL4 PLEKHG4, PLEKHH1 NOL6, NOL7 PLEKHH2, PLEKHH3 NOL8, NOL9 PLEKHJ1, PLEKHM1 NOLC1, NOM1 PLEKHM2, PLEKHM3 NOMO1, NOMO3 PLEKHN1, PLEKHO1 NONO, NOP10 PLGRKT, PLIN3 NOP14, NOP16 PLIN4, PLK1 NOP2, NOP56 PLK2, PLK3 NOP58, NOP9 PLK4, PLLP NOS1AP, NOS2 PLOD1, PLOD2 NOS3, NOSIP PLOD3, PLP2 NOTCH1, NOTCH2 PLRG1, PLS1 NOTCH2NL, NOTCH3 PLS3, PLSCR1 NOTCH4, NOTUM PLSCR3, PLSCR4 NOVA1, NOX3 481

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PLTP, PLXDC2 NOX4, NOXO1 PLXNA1, PLXNA3 NPAS1, NPAS2 PLXNA4, PLXNB2 NPAT, NPC1 PLXNC1, PLXND1 NPC2, NPDC1 PM20D2, PMAIP1 NPEPL1, NPEPPS PMEL, PMEPA1 NPHP3, NPHS2 PMF1, PML NPLOC4, NPM1 PMM2, PMP2 NPM3, NPNT PMPCA, PMS1 NPPC, NPR1 PMS2, PNISR NPR3, NPRL2 PNKD, PNLDC1 NPRL3, NPTN PNLIP, PNLIPRP2 NPTX1, NPTX2 PNLIPRP3, PNMA2 NPTXR, NPY PNN, PNP NPY1R, NPY4R PNPLA2, PNPLA3 NQO2, NR0B2 PNPLA6, PNPO NR1D1, NR1D2 PNRC1, PODN NR1H2, NR1I2 PODXL, PODXL2 NR2C2, NR2C2AP POFUT1, POFUT2 NR2E1, NR2F1 POGK, POGZ NR2F2, NR2F6 POLA1, POLA2 NR3C1, NR3C2 POLB, POLD1 NR4A1, NR4A2 POLD2, POLD3 NR4A3, NR6A1 POLD4, POLDIP2 NRARP, NRAS POLDIP3, POLE NRBF2, NRBP1 POLE2, POLE3 NRD1, NRDE2 POLE4, POLG NREP, NRF1 POLI, POLK NRG1, NRG4 POLM, POLN NRIP1, NRIP2 POLQ, POLR1A NRIP3, NRK POLR1B, POLR1C NRN1, NRP1 POLR1D, POLR2A NRP2, NRSN2 POLR2B, POLR2C NRTN, NRXN1 POLR2D, POLR2E NRXN3, NSA2 POLR2H, POLR2I NSD1, NSDHL POLR2K, POLR2L NSF, NSFL1C POLR2M, POLR3A NSG1, NSL1 POLR3B, POLR3D NSMAF, NSMCE4A POLR3E, POLR3G NSMF, NSRP1 POLR3H, POLR3K NSUN2, NSUN5 POLRMT, POM121 NSUN7, NT5C POM121C, POM121L1P NT5C1A, NT5C1B POMC, POMGNT1 NT5C2, NT5C3 POMP, POMT2 NT5C3A, NT5C3B PON2, PON3 NT5DC1, NT5DC2 POP1, POP4 NT5DC3, NT5E POPDC2, POR NT5M, NTAN1 POTEF, POTEG NTF3, NTF4 POU2F1, POU2F3 NTHL1, NTMT1 POU3F2, POU4F1 NTN1, NTN3 POU4F2, POU4F3 NTN4, NTNG1 POU5F1, POU6F1 NTPCR, NTRK2 POU6F2, PPA1 NTRK3, NTSR1 PPA2, PPAN NUAK1, NUAK2 PPAN-P2RY11, PPAP2A NUB1, NUBP1 PPAP2B, PPAP2C NUBP2, NUBPL PPAPDC1A, PPARA NUCB1, NUCB2 PPARD, PPARG NUCKS1, NUDC PPARGC1A, PPARGC1B NUDCD2, NUDCD3 PPAT, PPBP NUDT11, NUDT12 PPCS, PPFIA1 NUDT13, NUDT15 PPFIA2, PPFIA4 NUDT19, NUDT2 PPFIBP1, PPIA NUDT21, NUDT3 PPIAL4G, PPIB NUDT4, NUDT5 PPID, PPIE NUDT8, NUF2 PPIF, PPIG NUFIP1, NUFIP2 PPIL1, PPIL2 NUMA1, NUMB PPIL4, PPL NUP107, NUP153 PPM1A, PPM1B NUP155, NUP160 PPM1D, PPM1E NUP188, NUP205 PPM1F, PPM1G NUP210, NUP214 482

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PPM1H, PPM1L NUP35, NUP37 PPOX, PPP1CA NUP50, NUP54 PPP1CB, PPP1CC NUP62, NUP62CL PPP1R10, PPP1R11 NUP85, NUP93 PPP1R12A, PPP1R12B NUP98, NUPL1 PPP1R12C, PPP1R13B NUPL2, NUPR1 PPP1R14C, PPP1R15A NUS1, NUSAP1 PPP1R15B, PPP1R16A NUTF2, NUTM1 PPP1R16B, PPP1R17 NUTM2A, NUTM2B PPP1R18, PPP1R1A NUTM2D, NVL PPP1R1C, PPP1R2 NXF3, NXN PPP1R26, PPP1R36 NXNL2, NXPE3 PPP1R3A, PPP1R3B NXPH2, NXPH4 PPP1R3C, PPP1R3D NXT1, NXT2 PPP1R3G, PPP1R7 NYNRIN, NYX PPP1R8, PPP1R9A OAF, OARD1 PPP2CA, PPP2R1A OAS1, OAS2 PPP2R1B, PPP2R2A OASL, OAT PPP2R2B, PPP2R2C OAZ1, OAZ2 PPP2R2D, PPP2R3A OBSCN, OCA2 PPP2R3B, PPP2R4 OCIAD2, OCLN PPP2R5A, PPP2R5C OCRL, ODAM PPP2R5E, PPP3CA ODC1, ODF2 PPP3CB, PPP3R1 ODF3B, ODF3L1 PPP3R2, PPP4C OFCC1, OFD1 PPP4R1, PPP4R1L OGDH, OGDHL PPP4R2, PPP5C OGFOD1, OGFOD2 PPP6C, PPP6R2 OGFR, OGFRL1 PPP6R3, PPRC1 OGN, OGT PPT1, PPT2 OIP5, OLA1 PPTC7, PQBP1 OLFM4, OLFML2A PRADC1, PRAF2 OLFML2B, OLFML3 PRAMEF1, PRAMEF10 OLIG3, OLR1 PRAMEF13, PRAMEF14 OMA1, ONECUT2 PRAMEF16, PRAMEF17 OPA1, OPA3 PRAMEF2, PRAP1 OPN1LW, OPN1MW PRC1, PRCC OPN1MW2, OPN1SW PRCP, PRDM1 OPN3, OPRD1 PRDM11, PRDM16 OPRK1, OPTN PRDM2, PRDM4 OR10D1P, OR10H1 PRDM8, PRDX1 OR10H2, OR10H5 PRDX2, PRDX3 OR11A1, OR1A1 PRDX4, PREB OR1A2, OR1C1 PRELID1, PREPL OR1J2, OR1R1P PREX1, PREX2 OR2C3, OR2F2 PRG3, PRH2 OR2J1, OR2J3 PRICKLE1, PRICKLE2 OR2L2, OR2S2 PRICKLE3, PRICKLE4 OR2W1, OR3A1 PRIM1, PRIMA1 OR3A2, OR51B5 PRKAA1, PRKAA2 OR52K1, OR5E1P PRKAB2, PRKACA OR6B1, OR6V1 PRKACB, PRKAG1 OR7A17, OR7D2 PRKAR1A, PRKAR2A OR8B2, OR8B3 PRKCA, PRKCD OR8D2, OR8G7P PRKCDBP, PRKCE OR8U1, ORAI2 PRKCG, PRKCI ORAI3, ORAOV1 PRKCSH, PRKD1 ORC1, ORC2 PRKD2, PRKD3 ORC4, ORC5 PRKG2, PRKRIR ORC6, ORMDL1 PRLR, PRMT1 ORMDL3, OSBP PRMT2, PRMT3 OSBP2, OSBPL10 PRMT5, PRMT7 OSBPL11, OSBPL1A PRMT8, PRND OSBPL2, OSBPL3 PRNP, PRO0992 OSBPL6, OSBPL8 PROCA1, PROCR OSBPL9, OSCP1 PRODH, PROKR2 OSGEPL1, OSMR PROM1, PROS1 OSR1, OSTC PROSC, PROSER1 OSTF1, OSTM1 PROX1, PROX2 OTC, OTOGL 
PRPF19, PRPF31 OTOL1, OTOR PRPF38A, PRPF38B OTP, OTUB1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PRPF39, PRPF40A OTUB2, OTUD1 PRPF40B, PRPF4B OTUD3, OTUD4 PRPF6, PRPF8 OTUD5, OTUD6A PRPS1, PRPS2 OTUD7A, OTUD7B PRPSAP1, PRR11 OTX2, OVCA2 PRR12, PRR13 OVCH2, OVGP1 PRR14L, PRR15L OVOL1, OXA1L PRR3, PRR4 OXCT1, OXGR1 PRR5, PRRC1 OXNAD1, OXR1 PRRC2A, PRRC2B OXSR1, P2RX1 PRRC2C, PRRG1 P2RX2, P2RX4 PRRG2, PRRG4 P2RX5, P2RX6 PRRT3, PRSS21 P2RX7, P2RY11 PRSS22, PRSS23 P2RY2, P2RY6 PRSS53, PRSS8 P4HA1, P4HA2 PRTG, PRUNE2 P4HA3, P4HB PRX, PSAP PA2G4, PAAF1 PSAT1, PSD3 PABPC1, PABPC1L PSG2, PSG3 PABPC3, PABPC4 PSG6, PSG9 PABPN1, PACRG PSIP1, PSKH1 PACRGL, PACS1 PSMA2, PSMA3 PACS2, PACSIN2 PSMA4, PSMA5 PACSIN3, PADI4 PSMA6, PSMA7 PAF1, PAFAH1B1 PSMB1, PSMB2 PAFAH1B2, PAFAH1B3 PSMB3, PSMB4 PAG1, PAGE2B PSMB5, PSMB6 PAGR1, PAH PSMB7, PSMC1 PAICS, PAIP1 PSMC2, PSMC3 PAIP2, PAK1 PSMC3IP, PSMC4 PAK1IP1, PAK2 PSMC5, PSMD1 PAK3, PAK4 PSMD10, PSMD11 PAK6, PALB2 PSMD12, PSMD13 PALD1, PALLD PSMD14, PSMD2 PALM, PALM2-AKAP2 PSMD3, PSMD4 PALMD, PAM PSMD6, PSMD7 PAM16, PAMR1 PSMD8, PSMD9 PAN2, PAN3 PSME1, PSME3 PANK2, PANK3 PSME4, PSMF1 PANX1, PAPD4 PSMG1, PSMG2 PAPD5, PAPD7 PSMG3, PSMG4 PAPOLA, PAPOLG PSORS1C1, PSPH PAPPA, PAPSS1 PSPN, PSRC1 PAPSS2, PAQR3 PTAFR, PTAR1 PAQR4, PAQR5 PTBP1, PTBP2 PAQR6, PAQR8 PTBP3, PTCD1 PARD3, PARD6B PTCD3, PTCH1 PARK2, PARK7 PTCHD1, PTCHD3 PARL, PARM1 PTDSS1, PTDSS2 PARN, PARP1 PTEN, PTER PARP10, PARP11 PTGDS, PTGER2 PARP12, PARP14 PTGES, PTGES2 PARP15, PARP16 PTGFR, PTGFRN PARP4, PARP6 PTGR2, PTGS1 PARP8, PARP9 PTGS2, PTH PARVB, PARVG PTH2, PTK2 PASK, PATE2 PTK2B, PTK7 PATL1, PATZ1 PTMA, PTMAP7 PAWR, PAX2 PTMS, PTOV1 PAX3, PAX6 PTP4A1, PTP4A2 PAX7, PAXBP1 PTP4A3, PTPDC1 PAXIP1, PBDC1 PTPLA, PTPLAD1 PBK, PBLD PTPLAD2, PTPLB PBRM1, PBX1 PTPMT1, PTPN1 PBX2, PBX3 PTPN11, PTPN12 PBXIP1, PC PTPN13, PTPN14 PCBD1, PCBP1 PTPN18, PTPN22 PCBP2, PCBP3 PTPN23, PTPN3 PCCA, PCCB PTPN4, PTPN7 PCDH10, PCDH7 PTPN9, PTPRA PCDH8, PCDH9 PTPRD, PTPRE PCDHA11, PCDHA2 484

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses PTPRF, PTPRG PCDHA4, PCDHA6 PTPRH, PTPRJ PCDHB10, PCDHB14 PTPRK, PTPRM PCDHB15, PCDHB8 PTPRN, PTPRN2 PCDHGA1, PCDHGA10 PTPRO, PTPRR PCDHGA2, PCDHGA8 PTPRS, PTPRT PCDHGB2, PCDHGB4 PTPRU, PTPRZ1 PCDHGC3, PCDHGC4 PTRH1, PTRH2 PCDHGC5, PCED1A PTS, PTTG1 PCED1B, PCF11 PTTG1IP, PTX3 PCGF1, PCGF2 PUF60, PUM1 PCGF3, PCGF5 PUM2, PURA PCGF6, PCID2 PURB, PURG PCIF1, PCK2 PUS1, PUS10 PCMT1, PCMTD1 PUS7, PVR PCMTD2, PCNA PVRL2, PWP1 PCNP, PCNT PWP2, PWWP2A PCNX, PCNXL4 PXDC1, PXDN PCOLCE2, PCSK1 PXMP2, PXN PCSK4, PCSK5 PYCR1, PYCR2 PCSK6, PCSK7 PYCRL, PYGB PCSK9, PCTP PYGL, PYGM PCYOX1, PCYOX1L PYGO1, PYROXD2 PCYT1A, PCYT1B QARS, QDPR PCYT2, PDAP1 QKI, QPRT PDCD10, PDCD11 QRICH1, QRSL1 PDCD4, PDCD5 QSER1, QSOX1 PDCD6, PDCD6IP QSOX2, QTRT1 PDCL, PDCL3 QTRTD1, R3HCC1L PDE11A, PDE12 R3HDM2, R3HDM4 PDE2A, PDE3A R3HDML, RAB10 PDE3B, PDE4A RAB11A, RAB11FIP1 PDE4D, PDE4DIP RAB11FIP2, RAB11FIP3 PDE5A, PDE6D RAB11FIP5, RAB12 PDE6G, PDE6H RAB13, RAB14 PDE8A, PDE8B RAB18, RAB1A PDF, PDGFA RAB1B, RAB21 PDGFB, PDGFC RAB22A, RAB23 PDGFD, PDGFRA RAB26, RAB27A PDGFRL, PDHA1 RAB27B, RAB28 PDHA2, PDHB RAB2A, RAB30 PDHX, PDIA3 RAB31, RAB33B PDIA4, PDIA5 RAB34, RAB35 PDIA6, PDIK1L RAB36, RAB37 PDK1, PDK2 RAB38, RAB39B PDK4, PDLIM1 RAB3B, RAB3C PDLIM2, PDLIM3 RAB3D, RAB3GAP2 PDLIM4, PDLIM5 RAB3IP, RAB40A PDLIM7, PDP1 RAB40B, RAB40C PDP2, PDPK1 RAB4B, RAB5A PDPN, PDPR RAB5B, RAB5C PDRG1, PDS5A RAB6A, RAB6C PDS5B, PDSS1 RAB7A, RAB8A PDSS2, PDXDC1 RAB8B, RAB9B PDXK, PDXP RABAC1, RABEP1 PDZD11, PDZD2 RABEPK, RABGAP1 PDZD4, PDZD8 RABGAP1L, RABGGTB PDZK1, PDZRN3 RABL2A, RABL2B PDZRN4, PEA15 RABL6, RAC1 PEAK1, PEBP1 RAC2, RAC3 PEBP4, PECR RACGAP1, RACGAP1P PEF1, PEG10 RAD1, RAD21 PEG3, PELI1 RAD23A, RAD23B PELI2, PELO RAD51, RAD51B PELP1, PEMT RAD51C, RAD51D PEPD, PER1 RAD52, RAD54B PER2, PER3 RAD9A, RADIL PERP, PES1 RAE1, RAF1 PET112, PEX1 RAG1, RAI1 PEX11A, PEX11B RAI14, RAI2 PEX13, PEX14 RALA, RALB PEX16, PEX19 485

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses RALBP1, RALGAPA1 PEX26, PEX5 RALGAPA2, RALGAPB PEX5L, PEX6 RALGDS, RALGPS2 PFAS, PFDN2 RAN, RANBP10 PFDN4, PFDN5 RANBP2, RANBP3 PFDN6, PFKFB2 RANBP6, RANGAP1 PFKFB3, PFKFB4 RAP1A, RAP1B PFKL, PFKM RAP1GAP, RAP1GAP2 PFKP, PFN1 RAP2A, RAP2B PFN2, PGAM1 RAP2C, RAPGEF1 PGAM4, PGAP1 RAPGEF2, RAPGEF3 PGD, PGGT1B RAPGEF4, RAPGEF6 PGK1, PGLYRP1 RAPGEFL1, RAPH1 PGM1, PGM2 RARA, RARB PGM2L1, PGM3 RARG, RARRES2 PGM5, PGP RARS, RARS2 PGR, PGRMC1 RASA1, RASA2 PGRMC2, PGS1 RASA3, RASA4B PHACTR2, PHACTR4 RASAL2, RASD1 PHAX, PHB RASEF, RASGEF1A PHB2, PHC1 RASGEF1C, RASGRP1 PHC2, PHC3 RASGRP3, RASIP1 PHEX, PHF10 RASL10A, RASL10B PHF12, PHF13 RASSF1, RASSF2 PHF14, PHF16 RASSF3, RASSF5 PHF17, PHF19 RASSF6, RASSF8 PHF20, PHF20L1 RASSF9, RAVER1 PHF21A, PHF21B RAVER2, RAX PHF23, PHF3 RB1, RB1CC1 PHF6, PHF7 RBAK, RBBP4 PHF8, PHGDH RBBP5, RBBP6 PHIP, PHKA1 RBBP7, RBCK1 PHKA2, PHKB RBFOX1, RBFOX2 PHKG2, PHLDA2 RBL1, RBL2 PHLDA3, PHLDB1 RBM10, RBM12 PHLDB2, PHLDB3 RBM12B, RBM14 PHLPP1, PHLPP2 RBM15B, RBM17 PHOSPHO2, PHOX2A RBM19, RBM20 PHPT1, PHRF1 RBM22, RBM23 PHTF1, PHTF2 RBM24, RBM26 PHYH, PHYHIP RBM27, RBM28 PI4K2A, PI4K2B RBM3, RBM33 PI4KA, PI4KB RBM34, RBM39 PIAS1, PIAS2 RBM4, RBM42 PIAS3, PIAS4 RBM43, RBM46 PICALM, PIDD RBM47, RBM4B PIEZO1, PIF1 RBM5, RBM6 PIFO, PIGA RBM7, RBM8A PIGB, PIGC RBMS1, RBMS2 PIGF, PIGG RBMS3, RBMX PIGH, PIGM RBMXL1, RBMY1D PIGN, PIGO RBMY1F, RBMY1HP PIGP, PIGQ RBP1, RBP7 PIGS, PIGT RBPJ, RBPMS PIGU, PIGX RBPMS2, RBX1 PIH1D1, PIH1D3 RC3H1, RC3H2 PIK3AP1, PIK3C2A RCAN1, RCBTB1 PIK3C2B, PIK3C3 RCBTB2, RCC1 PIK3CA, PIK3CB RCC2, RCCD1 PIK3CD, PIK3CG RCE1, RCL1 PIK3R1, PIK3R2 RCN1, RCN2 PIK3R3, PIK3R4 RCN3, RCOR1 PIKFYVE, PIM1 RCOR2, RCOR3 PIM2, PIM3 RCSD1, RCVRN PIN1, PIN4 RDH10, RDH11 PINX1, PIP4K2A RDH16, RDH5 PIP4K2B, PIP4K2C RDM1, RDX PIP5K1A, PIP5K1C RECK, RECQL PIP5KL1, PIPOX RECQL4, REEP1 PISD, PITHD1 REEP3, REEP4 PITPNA, PITPNB REEP5, REL PITPNC1, PITPNM1 486

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses RELA, RELN PITPNM2, PITRM1 REM1, REM2 PITX1, PITX3 REN, REPS1 PIWIL4, PJA2 REPS2, RER1 PKD1, PKD2 RERE, RERG PKD2L1, PKDCC RERGL, REST PKHD1L1, PKIA RET, RETSAT PKIG, PKM REV1, REV3L PKMYT1, PKN1 REXO1, REXO2 PKN2, PKN3 REXO4, RFC1 PKNOX1, PKP1 RFC2, RFC3 PKP2, PKP4 RFC4, RFC5 PLA2G10, PLA2G12A RFFL, RFK PLA2G2D, PLA2G2E RFT1, RFTN1 PLA2G4A, PLA2G4B RFTN2, RFWD3 PLA2G4C, PLA2G4F RFX1, RFX2 PLA2G5, PLA2G7 RFX3, RFX5 PLAC1L, PLAC8L1 RFX7, RGCC PLAG1, PLAGL1 RGL2, RGMA PLAGL2, PLAT RGMB, RGN PLAU, PLAUR RGP1, RGPD1 PLCB1, PLCB2 RGPD2, RGPD3 PLCB3, PLCB4 RGPD4, RGPD5 PLCD1, PLCD3 RGPD8, RGS12 PLCE1, PLCG1 RGS13, RGS14 PLCG2, PLCH2 RGS16, RGS17 PLCL2, PLCXD1 RGS18, RGS19 PLCXD2, PLD1 RGS2, RGS22 PLD3, PLD4 RGS3, RGS5 PLD6, PLEC RGS7BP, RGS8 PLEK2, PLEKHA1 RHAG, RHBDD1 PLEKHA3, PLEKHA4 RHBDD2, RHBDF2 PLEKHA5, PLEKHA7 RHEB, RHEBL1 PLEKHA8, PLEKHA8P1 RHOA, RHOB PLEKHB1, PLEKHB2 RHOBTB1, RHOBTB2 PLEKHF1, PLEKHF2 RHOBTB3, RHOC PLEKHG1, PLEKHG2 RHOF, RHOG PLEKHG3, PLEKHG4 RHOQ, RHOT1 PLEKHH1, PLEKHH2 RHOT2, RHOU PLEKHH3, PLEKHJ1 RHOV, RHPN1 PLEKHM1, PLEKHM2 RHPN2, RIC8A PLEKHM3, PLEKHN1 RIC8B, RICTOR PLGRKT, PLIN3 RIF1, RIIAD1 PLIN4, PLK1 RIMKLA, RIMS2 PLK2, PLK3 RIMS3, RIMS4 PLK4, PLLP RIN1, RIN2 PLOD1, PLOD2 RING1, RIOK1 PLOD3, PLP2 RIOK2, RIOK3 PLRG1, PLS1 RIPK2, RIPPLY2 PLS3, PLSCR1 RIT1, RLF PLSCR3, PLSCR4 RLIM, RMDN1 PLTP, PLXDC2 RMDN3, RMI1 PLXNA1, PLXNA2 RMND5A, RMND5B PLXNA3, PLXNA4 RN7SK, RNASE10 PLXNB2, PLXNC1 RNASE2, RNASE4 PLXND1, PM20D2 RNASEH1, RNASEH2A PMAIP1, PMEL RNASEH2B, RNASEK PMEPA1, PMF1 RNASEL, RND1 PML, PMM2 RND2, RND3 PMP2, PMP22 RNF10, RNF11 PMPCA, PMS1 RNF111, RNF113A PMS2, PNISR RNF114, RNF115 PNKD, PNKP RNF121, RNF123 PNLDC1, PNLIP RNF125, RNF126 PNLIPRP2, PNLIPRP3 RNF128, RNF13 PNMA2, PNMT RNF138, RNF14 PNN, PNOC RNF141, RNF144A PNP, PNPLA2 RNF144B, RNF145 PNPLA3, PNPLA4 RNF146, RNF149 PNPLA6, PNPLA8 RNF150, RNF152 PNPO, PNPT1 RNF157, RNF167 PNRC1, POC1B 487

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses RNF168, RNF170 PODN, PODNL1 RNF181, RNF182 PODXL, PODXL2 RNF183, RNF186 POFUT1, POFUT2 RNF187, RNF19A POGK, POGLUT1 RNF19B, RNF2 POGZ, POLA1 RNF20, RNF213 POLA2, POLB RNF214, RNF215 POLD1, POLD2 RNF217, RNF219 POLD3, POLD4 RNF220, RNF26 POLDIP2, POLDIP3 RNF34, RNF38 POLE, POLE2 RNF4, RNF40 POLE3, POLE4 RNF41, RNF43 POLG, POLH RNF44, RNF5 POLI, POLK RNF6, RNF7 POLM, POLN RNFT1, RNFT2 POLQ, POLR1A RNGTT, RNH1 POLR1B, POLR1C RNMT, RNMTL1 POLR1D, POLR2A RNPC3, RNPEP POLR2B, POLR2C RNPS1, RNU1-1 POLR2D, POLR2E RNU4-8P, RNU6-50P POLR2F, POLR2G RNU6-82P, ROBO1 POLR2H, POLR2I ROBO4, ROCK1 POLR2J, POLR2L ROCK2, ROGDI POLR2M, POLR3A ROPN1L, ROR1 POLR3B, POLR3D ROR2, RORA POLR3E, POLR3G RORB, RORC POLR3H, POLR3K RP1, RP1L1 POLRMT, POM121 RP2, RP9P POM121C, POM121L1P RPA1, RPA2 POMC, POMGNT1 RPAIN, RPAP1 POMP, POMT2 RPAP2, RPAP3 PON2, PON3 RPE, RPE65 POP1, POP4 RPF1, RPGRIP1L POP5, POPDC2 RPH3A, RPH3AL POR, PORCN RPIA, RPL10 POTEF, POTEG RPL10A, RPL10L POTEJ, POU2F1 RPL11, RPL12 POU2F2, POU2F3 RPL13, RPL13A POU3F2, POU4F1 RPL14, RPL15 POU4F2, POU4F3 RPL18, RPL18A POU5F1, POU6F1 RPL19, RPL21 POU6F2, PPA1 RPL22, RPL23 PPA2, PPAN RPL23A, RPL24 PPAN-P2RY11, PPAP2A RPL26, RPL27 PPAP2B, PPAPDC1A RPL27A, RPL28 PPARA, PPARD RPL29, RPL3 PPARG, PPARGC1A RPL30, RPL31 PPARGC1B, PPAT RPL32, RPL34 PPBP, PPCS RPL35, RPL35A PPDPF, PPFIA1 RPL36, RPL36A PPFIA2, PPFIA4 RPL36AL, RPL37 PPFIBP1, PPIA RPL37A, RPL39 PPIAL4G, PPIB RPL39L, RPL4 PPIC, PPID RPL5, RPL6 PPIE, PPIF RPL7, RPL7A PPIG, PPIL1 RPL7L1, RPL8 PPIL2, PPIL4 RPL9, RPLP0 PPL, PPM1A RPLP1, RPLP2 PPM1B, PPM1D RPN2, RPP14 PPM1E, PPM1F RPP30, RPP38 PPM1G, PPM1H RPP40, RPRD1A PPM1L, PPM1M RPRD1B, RPRD2 PPOX, PPP1CA RPS10, RPS12 PPP1CB, PPP1CC RPS13, RPS14 PPP1R10, PPP1R11 RPS15, RPS15A PPP1R12A, PPP1R12B RPS16, RPS17 PPP1R12C, PPP1R13B RPS18, RPS19 PPP1R13L, PPP1R14A RPS19BP1, RPS2 PPP1R14B, PPP1R14C RPS21, RPS23 PPP1R15A, PPP1R15B RPS24, RPS25 PPP1R16B, PPP1R17 RPS26, RPS27 
PPP1R18, PPP1R1A

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses RPS27A, RPS27L PPP1R1C, PPP1R2 RPS28, RPS29 PPP1R26, PPP1R36 RPS3, RPS3A PPP1R3A, PPP1R3B RPS4X, RPS4Y1 PPP1R3C, PPP1R3D RPS5, RPS6 PPP1R3G, PPP1R7 RPS6KA1, RPS6KA2 PPP1R8, PPP1R9A RPS6KA3, RPS6KA5 PPP2CA, PPP2CB RPS6KA6, RPS6KB1 PPP2R1A, PPP2R1B RPS6KB2, RPS7 PPP2R2A, PPP2R2B RPS8, RPS9 PPP2R2C, PPP2R2D RPSA, RPUSD2 PPP2R3A, PPP2R3B RPUSD3, RPUSD4 PPP2R4, PPP2R5A RQCD1, RRAD PPP2R5C, PPP2R5E RRAGA, RRAGC PPP3CA, PPP3CB RRAGD, RRAS PPP3R1, PPP3R2 RRAS2, RRBP1 PPP4C, PPP4R1 RREB1, RRM1 PPP4R1L, PPP4R2 RRM2, RRM2B PPP5C, PPP6C RRN3, RRP1 PPP6R2, PPP6R3 RRP12, RRP15 PPRC1, PPT1 RRP1B, RRP36 PPT2, PPTC7 RRP7A, RRP8 PPWD1, PQBP1 RRP9, RRS1 PQLC2, PQLC3 RSAD2, RSBN1 PRADC1, PRAF2 RSBN1L, RSF1 PRAMEF1, PRAMEF10 RSL1D1, RSL24D1 PRAMEF13, PRAMEF14 RSPH4A, RSPH9 PRAMEF16, PRAMEF17 RSPO1, RSPO2 PRAMEF2, PRAP1 RSPO3, RSPO4 PRB2, PRC1 RSPRY1, RSRC1 PRCC, PRCP RSRC2, RSU1 PRDM1, PRDM11 RTBDN, RTCA PRDM13, PRDM16 RTCB, RTEL1-TNFRSF6B PRDM2, PRDM4 RTFDC1, RTKN2 PRDM8, PRDX1 RTN2, RTN3 PRDX2, PRDX3 RTN4, RTP1 PRDX4, PRDX5 RTTN, RUFY2 PRDX6, PREB RUFY3, RUNDC1 PRELID1, PREPL RUNDC3B, RUNX1 PREX2, PRG3 RUNX1T1, RUNX2 PRG4, PRH2 RUNX3, RUSC1 PRICKLE1, PRICKLE2 RUSC2, RUVBL1 PRICKLE3, PRICKLE4 RUVBL2, RWDD1 PRIM1, PRKAA1 RXFP1, RXFP3 PRKAA2, PRKAB2 RXRB, RYBP PRKACA, PRKACB RYK, RYR2 PRKAG1, PRKAR1A S100A1, S100A10 PRKAR2A, PRKAR2B S100A11, S100A14 PRKCA, PRKCB S100A16, S100A2 PRKCD, PRKCDBP S100A5, S100A8 PRKCE, PRKCG S100B, S100G PRKCI, PRKCSH S100PBP, S100Z PRKD1, PRKD2 SAA1, SAC3D1 PRKD3, PRKDC SACM1L, SACS PRKG2, PRKRIR SAE1, SAFB PRLR, PRMT1 SAG, SALL1 PRMT2, PRMT3 SALL2, SALL4 PRMT5, PRMT7 SAMD1, SAMD10 PRMT8, PRND SAMD12, SAMD13 PRNP, PRO0992 SAMD15, SAMD3 PROCA1, PROCR SAMD4B, SAMD5 PRODH, PROK2 SAMD8, SAMD9 PROKR2, PROM1 SAMD9L, SAMHD1 PROS1, PROSC SAMSN1, SAP18 PROSER1, PROX1 SAP30, SAP30BP PROX2, PRPF19 SAP30L, SAPCD2 PRPF31, PRPF38A SAR1A, SARS PRPF38B, PRPF39 SARS2, SART1 PRPF40A, PRPF40B SART3, SASH1 PRPF4B, 
PRPF6 SASS6, SAT1 PRPF8, PRPH SAT2, SATB1 PRPS1, PRPS2

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses SATB2, SAV1 PRPSAP1, PRR11 SAYSD1, SBDS PRR12, PRR13 SBDSP1, SBF1 PRR14, PRR14L SBF2, SBK1 PRR15L, PRR3 SBNO1, SBNO2 PRR4, PRR5 SBSN, SC5D PRR5L, PRR7 SCAF1, SCAF11 PRRC1, PRRC2A SCAF4, SCAF8 PRRC2B, PRRC2C SCAMP1, SCAMP2 PRRG1, PRRG2 SCAMP3, SCAMP4 PRRG3, PRRG4 SCAND2P, SCAP PRRT3, PRRX1 SCARA5, SCARB1 PRRX2, PRSS16 SCARB2, SCARF1 PRSS21, PRSS22 SCCPDH, SCD PRSS23, PRSS50 SCD5, SCFD1 PRSS53, PRSS8 SCG2, SCG3 PRTG, PRUNE2 SCGB1D1, SCGB2A2 PRX, PRY SCGB3A1, SCHIP1 PRY2, PRYP3 SCIMP, SCIN PRYP4, PSAP SCLY, SCMH1 PSAT1, PSD3 SCML2, SCN11A PSD4, PSEN1 SCN1A, SCN1B PSG2, PSG3 SCN3A, SCN3B PSG9, PSIP1 SCN4A, SCN9A PSKH1, PSMA2 SCNN1D, SCO1 PSMA3, PSMA4 SCOC, SCP2D1 PSMA5, PSMA6 SCRG1, SCRIB PSMA7, PSMB1 SCRN1, SCRN3 PSMB2, PSMB3 SCUBE2, SCYL1 PSMB4, PSMB5 SCYL2, SCYL3 PSMB6, PSMB7 SDAD1, SDC1 PSMB8, PSMC1 SDC3, SDC4 PSMC2, PSMC3 SDCBP, SDCCAG8 PSMC3IP, PSMC4 SDE2, SDF2 PSMC5, PSMC6 SDF2L1, SDF4 PSMD1, PSMD10 SDHA, SDHAF2 PSMD11, PSMD12 SDHB, SDHC PSMD13, PSMD14 SDPR, SDR16C5 PSMD2, PSMD3 SDR39U1, SEC11A PSMD4, PSMD5 SEC11C, SEC13 PSMD6, PSMD7 SEC14L1, SEC14L3 PSMD8, PSMD9 SEC14L4, SEC16A PSME1, PSME3 SEC16B, SEC23A PSME4, PSMF1 SEC23B, SEC23IP PSMG1, PSMG2 SEC24A, SEC24B PSMG3, PSMG4 SEC24C, SEC24D PSORS1C1, PSPH SEC31A, SEC61A1 PSPN, PSRC1 SEC61A2, SEC61B PSTPIP2, PTAFR SEC62, SEC63 PTAR1, PTBP1 SECISBP2, SECISBP2L PTBP2, PTBP3 SEH1L, SEL1L PTCD1, PTCD2 SEL1L3, SELE PTCD3, PTCH1 SELENBP1, SELPLG PTCHD1, PTCHD3 SELRC1, SEMA3B PTDSS1, PTDSS2 SEMA3C, SEMA3D PTEN, PTER SEMA3G, SEMA4B PTGDS, PTGER2 SEMA4C, SEMA4D PTGES, PTGES2 SEMA4G, SEMA5A PTGES3, PTGFR SEMA6A, SEMA6B PTGFRN, PTGR2 SEMG2, SENP1 PTGS1, PTGS2 SENP2, SENP3 PTH, PTH2 SENP5, SENP6 PTH2R, PTK2 SENP7, SEPHS1 PTK2B, PTK7 SEPHS2, SEPN1 PTMA, PTMS SEPSECS, SEPT1 PTOV1, PTP4A1 SEPT10, SEPT11 PTP4A2, PTP4A3 SEPT2, SEPT4 PTPDC1, PTPLA SEPT6, SEPT7 PTPLAD1, PTPLAD2 SEPT8, SEPT9 PTPLB, PTPMT1 SEPW1, SERAC1 PTPN1, PTPN11 SERBP1, SERF2 PTPN12, PTPN13

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses SERINC1, SERINC2 PTPN14, PTPN18 SERINC3, SERINC5 PTPN22, PTPN23 SERP1, SERPINA1 PTPN3, PTPN4 SERPINA12, SERPINA3 PTPN7, PTPN9 SERPINB1, SERPINB10 PTPRA, PTPRB SERPINB5, SERPINB9 PTPRD, PTPRE SERPIND1, SERPINE1 PTPRF, PTPRG SERPINE2, SERPINF1 PTPRH, PTPRJ SERPING1, SERPINH1 PTPRK, PTPRM SERPINI1, SERTAD2 PTPRN, PTPRN2 SERTAD3, SERTAD4 PTPRO, PTPRR SESN1, SESN2 PTPRS, PTPRT SESN3, SESTD1 PTPRU, PTPRZ1 SET, SETD1A PTRF, PTRH1 SETD1B, SETD2 PTRH2, PTS SETD3, SETD4 PTTG1, PTTG1IP SETD5, SETD6 PTX3, PUF60 SETD7, SETD8 PUM1, PUM2 SETDB1, SETX PURA, PURB SF1, SF3A1 PURG, PUS1 SF3A3, SF3B1 PUS10, PUS3 SF3B2, SF3B3 PUS7, PVALB SF3B4, SFI1 PVR, PVRL2 SFMBT1, SFN PWP1, PWP2 SFPQ, SFR1 PWWP2A, PXDC1 SFRP1, SFRP2 PXDN, PXK SFSWAP, SFT2D1 PXMP2, PXN SFTPD, SFXN1 PYCARD, PYCR1 SFXN2, SFXN5 PYCR2, PYCRL SGCB, SGCD PYGB, PYGL SGK223, SGK3 PYGM, PYGO1 SGMS1, SGMS2 PYGO2, PYROXD2 SGOL1, SGOL2 QARS, QDPR SGPL1, SGPP1 QKI, QPCTL SGPP2, SGSH QPRT, QRICH1 SGSM1, SGSM3 QRSL1, QSER1 SGTA, SGTB QSOX1, QSOX2 SH2B1, SH2B3 QTRT1, QTRTD1 SH2D1A, SH2D3A R3HCC1L, R3HDM1 SH2D3C, SH2D4A R3HDM2, R3HDM4 SH2D6, SH3BGRL R3HDML, RAB10 SH3BGRL2, SH3BGRL3 RAB11A, RAB11FIP1 SH3BP1, SH3BP2 RAB11FIP2, RAB11FIP3 SH3BP4, SH3BP5 RAB11FIP5, RAB12 SH3BP5L, SH3D19 RAB13, RAB14 SH3D21, SH3GL1 RAB18, RAB1A SH3GL2, SH3GLB1 RAB1B, RAB21 SH3GLB2, SH3PXD2A RAB22A, RAB23 SH3PXD2B, SH3RF1 RAB26, RAB27A SH3TC2, SHANK1 RAB27B, RAB28 SHBG, SHC1 RAB2A, RAB2B SHC2, SHC3 RAB30, RAB31 SHC4, SHCBP1 RAB32, RAB33B SHE, SHFM1 RAB34, RAB35 SHH, SHISA2 RAB36, RAB37 SHISA4, SHISA5 RAB38, RAB3A SHMT1, SHMT2 RAB3B, RAB3C SHOC2, SHPRH RAB3D, RAB3GAP1 SHQ1, SHROOM2 RAB3GAP2, RAB3IL1 SHROOM3, SHROOM4 RAB3IP, RAB40A SIAH1, SIAH2 RAB40B, RAB40C SIGIRR, SIGLEC10 RAB4B, RAB5A SIGLEC16, SIGLEC5 RAB5B, RAB5C SIGLEC8, SIGMAR1 RAB6A, RAB6C SIK1, SIK2 RAB7A, RAB7L1 SIK3, SIKE1 RAB8A, RAB8B SIMC1, SIN3A RAB9B, RABAC1 SIN3B, SIP1 RABEP1, RABGAP1 SIPA1L1, SIPA1L2 RABGGTA, RABGGTB SIPA1L3, 
SIRPA RABL3, RABL6 SIRT1, SIRT2 RAC1, RAC2

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses SIRT4, SIRT5 RAC3, RACGAP1 SIRT7, SIVA1 RACGAP1P, RAD1 SIX1, SIX4 RAD21, RAD23A SIX6, SKA1 RAD23B, RAD51 SKA2, SKA3 RAD51AP1, RAD51B SKAP2, SKI RAD51C, RAD51D SKIL, SKIV2L RAD52, RAD54B SKP1, SKP2 RAD54L, RAD54L2 SLA2, SLAIN1 RAD9A, RADIL SLAIN2, SLBP RAE1, RAET1E SLC10A3, SLC10A7 RAF1, RAG1 SLC11A2, SLC12A2 RAI1, RAI14 SLC12A3, SLC12A4 RAI2, RALA SLC12A6, SLC12A7 RALB, RALBP1 SLC13A4, SLC15A1 RALGAPA1, RALGAPA2 SLC15A4, SLC16A1 RALGAPB, RALGDS SLC16A10, SLC16A14 RALGPS2, RALY SLC16A2, SLC16A3 RAMP2-AS1, RAN SLC16A4, SLC16A6 RANBP1, RANBP10 SLC16A7, SLC16A9 RANBP17, RANBP2 SLC17A1, SLC17A2 RANBP3, RANBP6 SLC17A3, SLC17A4 RANBP9, RANGAP1 SLC17A5, SLC17A7 RAP1A, RAP1B SLC17A8, SLC18B1 RAP1GAP, RAP1GAP2 SLC19A1, SLC19A2 RAP2A, RAP2B SLC19A3, SLC1A1 RAP2C, RAPGEF1 SLC1A4, SLC1A5 RAPGEF2, RAPGEF3 SLC20A1, SLC20A2 RAPGEF4, RAPGEF6 SLC22A14, SLC22A15 RAPGEFL1, RAPH1 SLC22A16, SLC22A18 RARA, RARB SLC22A2, SLC22A3 RARG, RARRES2 SLC22A5, SLC25A1 RARS, RARS2 SLC25A10, SLC25A12 RASA1, RASA2 SLC25A13, SLC25A14 RASA3, RASA4B SLC25A15, SLC25A18 RASAL1, RASAL2 SLC25A19, SLC25A22 RASD1, RASEF SLC25A23, SLC25A24 RASGEF1A, RASGEF1C SLC25A28, SLC25A3 RASGRF1, RASGRP1 SLC25A30, SLC25A31 RASGRP3, RASIP1 SLC25A32, SLC25A33 RASL10A, RASL10B SLC25A36, SLC25A37 RASSF1, RASSF2 SLC25A38, SLC25A39 RASSF3, RASSF5 SLC25A4, SLC25A40 RASSF6, RASSF7 SLC25A41, SLC25A44 RASSF8, RASSF9 SLC25A5, SLC25A51 RAVER1, RAVER2 SLC25A6, SLC26A2 RAX, RB1 SLC26A3, SLC26A7 RB1CC1, RBAK SLC27A1, SLC27A2 RBBP4, RBBP5 SLC27A4, SLC27A6 RBBP6, RBBP7 SLC29A1, SLC29A2 RBBP9, RBCK1 SLC29A3, SLC2A1 RBFOX1, RBFOX2 SLC2A10, SLC2A11 RBL1, RBL2 SLC2A12, SLC2A13 RBM10, RBM12 SLC2A14, SLC2A3 RBM12B, RBM14 SLC2A4, SLC2A6 RBM15B, RBM17 SLC30A1, SLC30A2 RBM19, RBM20 SLC30A6, SLC30A7 RBM22, RBM23 SLC31A1, SLC32A1 RBM24, RBM26 SLC33A1, SLC34A2 RBM27, RBM28 SLC35A1, SLC35A2 RBM3, RBM33 SLC35A4, SLC35A5 RBM34, RBM38 SLC35B2, SLC35B3 RBM39, RBM4 SLC35B4, SLC35C1 RBM41, RBM42 
SLC35D1, SLC35D2 RBM43, RBM46 SLC35E2, SLC35E2B RBM47, RBM4B SLC35F1, SLC35F2 RBM5, RBM6 SLC35F3, SLC35F4 RBM8A, RBMS1 SLC35F5, SLC35F6 RBMS2, RBMS3 SLC35G1, SLC36A1 RBMX, RBMXL1 SLC37A1, SLC37A2 RBMY1D, RBMY1F SLC37A3, SLC37A4 RBMY1HP, RBP1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses SLC38A1, SLC38A2 RBP7, RBPJ SLC38A4, SLC38A5 RBPMS, RBPMS2 SLC39A1, SLC39A10 RBX1, RC3H1 SLC39A11, SLC39A12 RC3H2, RCAN1 SLC39A13, SLC39A14 RCBTB1, RCBTB2 SLC39A2, SLC39A3 RCC1, RCC2 SLC39A7, SLC39A8 RCCD1, RCE1 SLC39A9, SLC3A2 RCL1, RCN1 SLC41A3, SLC43A1 RCN2, RCN3 SLC44A1, SLC44A5 RCOR1, RCOR3 SLC45A1, SLC45A3 RCSD1, RCVRN SLC46A1, SLC46A2 RDH10, RDH11 SLC46A3, SLC48A1 RDH13, RDH16 SLC4A10, SLC4A1AP RDH5, RDM1 SLC4A2, SLC4A7 RDX, RECK SLC4A8, SLC50A1 RECQL, RECQL4 SLC52A1, SLC52A2 REEP1, REEP2 SLC52A3, SLC5A1 REEP3, REEP4 SLC5A10, SLC5A3 REEP5, REL SLC5A4, SLC5A5 RELA, RELB SLC5A6, SLC6A10P RELN, REM1 SLC6A12, SLC6A15 REN, REPIN1 SLC6A16, SLC6A19 REPS1, REPS2 SLC6A4, SLC6A6 RER1, RERE SLC6A8, SLC6A9 RERG, RERGL SLC7A1, SLC7A11 REST, RET SLC7A13, SLC7A2 RETSAT, REV1 SLC7A5, SLC7A6 REV3L, REXO1 SLC7A9, SLC9A1 REXO2, REXO4 SLC9A2, SLC9A3R1 RFC1, RFC2 SLC9A3R2, SLC9A5 RFC3, RFC4 SLC9A6, SLC9A7 RFC5, RFFL SLCO1B1, SLCO2A1 RFK, RFT1 SLCO3A1, SLCO4A1 RFTN1, RFTN2 SLCO4C1, SLFN11 RFWD2, RFWD3 SLFN13, SLFN5 RFX1, RFX2 SLIRP, SLIT2 RFX3, RFX5 SLITRK1, SLK RFX7, RGCC SLMAP, SLMO2 RGL2, RGMA SLPI, SLTM RGMB, RGP1 SLX4, SLX4IP RGPD1, RGPD2 SMAD1, SMAD2 RGPD3, RGPD4 SMAD3, SMAD4 RGPD5, RGPD8 SMAD5, SMAD6 RGS12, RGS13 SMAD7, SMAD9 RGS14, RGS16 SMAP1, SMAP2 RGS18, RGS19 SMARCA1, SMARCA2 RGS2, RGS20 SMARCA4, SMARCA5 RGS22, RGS3 SMARCAD1, SMARCAL1 RGS5, RGS7BP SMARCB1, SMARCC1 RGS8, RHAG SMARCC2, SMARCD1 RHBDD2, RHBDF1 SMARCD2, SMARCE1 RHBDF2, RHBDL2 SMC1A, SMC2 RHEB, RHEBL1 SMC3, SMC4 RHOA, RHOB SMC6, SMCHD1 RHOBTB1, RHOBTB2 SMCO4, SMCR7L RHOBTB3, RHOC SMCR8, SMEK1 RHOF, RHOG SMEK2, SMG1 RHOQ, RHOT1 SMG5, SMG6 RHOT2, RHOU SMG7, SMIM12 RHOV, RHPN1 SMIM14, SMIM15 RHPN2, RIBC2 SMIM8, SMN1 RIC8A, RIC8B SMN2, SMNDC1 RICTOR, RIF1 SMO, SMOC2 RIIAD1, RILPL2 SMPD4, SMPDL3A RIMKLA, RIMS3 SMPDL3B, SMS RIMS4, RIN1 SMTNL1, SMTNL2 RIN2, RING1 SMU1, SMUG1 RIOK1, RIOK2 SMURF1, SMURF2 RIOK3, RIPK2 SMYD1, SMYD2 RIPK4, RIPPLY2 SMYD5, 
SNAI1 RIT1, RLF

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses SNAI2, SNAP23 RLIM, RMDN1 SNAP29, SNAP91 RMDN3, RMI1 SNAPC4, SNAPIN RMND5A, RMND5B SNCA, SNCB RN7SK, RNASE1 SNCG, SND1 RNASE10, RNASE2 SNED1, SNF8 RNASE4, RNASE6 SNORA25, SNORD114-10 RNASEH1, RNASEH2A SNORD116-21, SNPH RNASEH2B, RNASEK SNRK, SNRNP200 RNASEL, RNASET2 SNRNP27, SNRNP35 RND1, RND2 SNRNP40, SNRNP48 RND3, RNF10 SNRNP70, SNRPA RNF103, RNF11 SNRPA1, SNRPB RNF111, RNF113A SNRPB2, SNRPC RNF114, RNF115 SNRPD1, SNRPD2 RNF121, RNF123 SNRPD3, SNRPE RNF125, RNF126 SNRPF, SNTB1 RNF128, RNF13 SNTB2, SNTG2 RNF130, RNF138 SNTN, SNUPN RNF139, RNF14 SNW1, SNX1 RNF141, RNF144A SNX10, SNX11 RNF144B, RNF145 SNX12, SNX13 RNF146, RNF149 SNX15, SNX16 RNF150, RNF152 SNX17, SNX18 RNF157, RNF167 SNX19, SNX2 RNF168, RNF170 SNX24, SNX29 RNF181, RNF182 SNX3, SNX30 RNF183, RNF185 SNX32, SNX4 RNF186, RNF187 SNX5, SNX6 RNF19A, RNF19B SNX7, SNX9 RNF2, RNF20 SOAT1, SOAT2 RNF213, RNF214 SOBP, SOCS1 RNF215, RNF216 SOCS2, SOCS3 RNF217, RNF219 SOCS4, SOCS5 RNF220, RNF24 SOCS6, SOD1 RNF26, RNF34 SOD2, SOD3 RNF38, RNF4 SOGA1, SOGA2 RNF40, RNF41 SOLH, SON RNF43, RNF44 SORBS2, SORBS3 RNF5, RNF6 SORCS2, SORCS3 RNF7, RNFT1 SORD, SORT1 RNFT2, RNGTT SOS1, SOS2 RNH1, RNMT SOST, SOSTDC1 RNMTL1, RNPC3 SOWAHC, SOX1 RNPEP, RNPEPL1 SOX11, SOX13 RNPS1, RNU1-1 SOX18, SOX2 RNU4-8P, RNU6-50P SOX4, SOX5 RNU6-82P, ROBO1 SOX6, SOX9 ROBO2, ROBO3 SP1, SP100 ROBO4, ROCK1 SP2, SP3 ROCK2, ROGDI SP4, SP7 ROM1, ROPN1L SPACA3, SPAG1 ROR1, ROR2 SPAG5, SPAG9 RORA, RORB SPARC, SPARCL1 RORC, ROS1 SPAST, SPATA12 RP1, RP1L1 SPATA13, SPATA18 RP2, RP9P SPATA2, SPATA20 RPA1, RPA2 SPATA3, SPATA5 RPA3, RPAIN SPATA6, SPATA6L RPAP1, RPAP2 SPATA9, SPATC1L RPAP3, RPE SPATS2, SPATS2L RPE65, RPF1 SPC24, SPC25 RPGRIP1, RPGRIP1L SPCS2, SPCS3 RPH3A, RPH3AL SPDEF, SPDYA RPIA, RPL10 SPECC1, SPECC1L RPL10A, RPL10L SPEN, SPERT RPL11, RPL12 SPG11, SPG20 RPL13, RPL13A SPHK1, SPHK2 RPL14, RPL15 SPI1, SPIC RPL18, RPL18A SPIN1, SPIN4 RPL19, RPL21 SPINK1, SPINK5 RPL22, RPL23 494

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses SPINK7, SPINT4 RPL23A, RPL24 SPIRE1, SPN RPL26, RPL26L1 SPNS1, SPNS2 RPL27, RPL27A SPO11, SPOPL RPL28, RPL29 SPP1, SPPL2C RPL3, RPL30 SPR, SPRED1 RPL31, RPL32 SPRED2, SPRR2G RPL34, RPL35 SPRR3, SPRTN RPL35A, RPL36 SPRY1, SPRY2 RPL36A, RPL36AL SPRY4, SPRYD3 RPL37, RPL37A SPRYD4, SPRYD7 RPL39, RPL39L SPSB1, SPTAN1 RPL4, RPL5 SPTB, SPTBN1 RPL6, RPL7 SPTBN2, SPTBN5 RPL7A, RPL7L1 SPTLC1, SPTLC3 RPL8, RPL9 SPTSSA, SPTY2D1 RPLP0, RPLP1 SPZ1, SQLE RPLP2, RPN1 SQSTM1, SRC RPN2, RPP14 SRCAP, SRD5A1 RPP25L, RPP30 SRD5A3, SREBF1 RPP38, RPP40 SREBF2, SREK1 RPRD1A, RPRD1B SREK1IP1, SRF RPRD2, RPS10 SRGAP1, SRGAP2 RPS12, RPS13 SRGAP2C, SRI RPS14, RPS15 SRL, SRM RPS15A, RPS16 SRP19, SRP68 RPS17, RPS18 SRP72, SRP9 RPS19, RPS19BP1 SRPK1, SRPK2 RPS2, RPS21 SRPK3, SRPR RPS23, RPS24 SRPRB, SRRM1 RPS25, RPS26 SRRM2, SRRM2-AS1 RPS27, RPS27A SRRM3, SRRT RPS27L, RPS28 SRSF1, SRSF10 RPS29, RPS3 SRSF11, SRSF12 RPS3A, RPS4X SRSF2, SRSF3 RPS4Y1, RPS5 SRSF4, SRSF5 RPS6, RPS6KA1 SRSF6, SRSF7 RPS6KA2, RPS6KA3 SRSF9, SRXN1 RPS6KA4, RPS6KA5 SS18, SSB RPS6KA6, RPS6KB1 SSFA2, SSH1 RPS6KB2, RPS6KC1 SSH2, SSNA1 RPS7, RPS8 SSPN, SSR1 RPS9, RPSA SSR2, SSR3 RPTOR, RPUSD2 SSR4, SSRP1 RPUSD3, RPUSD4 SSSCA1, SSU72 RQCD1, RRAD SSX1, SSX2IP RRAGA, RRAGC SSX7, ST13 RRAGD, RRAS ST14, ST3GAL2 RRAS2, RRBP1 ST3GAL5, ST3GAL6 RREB1, RRM1 ST5, ST6GAL1 RRM2, RRM2B ST6GALNAC1, RRN3, RRP1 ST6GALNAC5 ST7L, ST8SIA4 RRP12, RRP15 STAG1, STAG2 RRP1B, RRP36 STAG3L2, STAG3L3 RRP7A, RRP8 STAM, STAM2 RRP9, RRS1 STAMBP, STAMBPL1 RSAD1, RSAD2 STARD5, STARD6 RSBN1, RSBN1L STARD7, STARD8 RSF1, RSL1D1 STAT1, STAT2 RSL24D1, RSPH4A STAT3, STAT5A RSPH9, RSPO1 STAT5B, STAT6 RSPO2, RSPO3 STAU1, STC2 RSPO4, RSPRY1 STEAP2, STEAP3 RSRC1, RSRC2 STEAP4, STIL RSU1, RTBDN STIM1, STIM2 RTCA, RTCB STIP1, STK10 RTEL1-TNFRSF6B, RTF1 STK11, STK11IP RTFDC1, RTKN STK16, STK17B RTKN2, RTN2 STK19, STK24 RTN3, RTN4 STK25, STK3 RTP1, RTTN 495

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses STK33, STK35 RUFY2, RUFY3 STK38, STK38L RUNDC1, RUNDC3B STK4, STK40 RUNX1, RUNX1T1 STMN1, STMN3 RUNX2, RUNX3 STON2, STOX1 RUSC1, RUSC2 STOX2, STRADB RUVBL1, RUVBL2 STRAP, STRBP RWDD1, RXFP1 STRIP1, STRN RXFP3, RXRA STRN3, STRN4 RXRB, RYBP STS, STT3A RYK, RYR2 STT3B, STUB1 RYR3, S100A1 STX12, STX16 S100A10, S100A11 STX17, STX19 S100A14, S100A16 STX1A, STX1B S100A2, S100A5 STX3, STX4 S100A7, S100A7A STX7, STXBP1 S100A8, S100A9 STXBP3, STXBP4 S100B, S100G STXBP5, STYX S100P, S100PBP SUB1, SUCLA2 S100Z, S1PR5 SUCLG1, SUCLG2 SAA1, SAC3D1 SUCO, SUDS3 SACM1L, SACS SUFU, SUGP1 SAE1, SAFB SUGP2, SUGT1 SAG, SALL1 SULF1, SULF2 SALL2, SALL4 SULT1A3, SULT1C2 SAMD1, SAMD10 SULT2A1, SUMF1 SAMD11, SAMD12 SUMF2, SUMO1 SAMD13, SAMD14 SUMO2, SUN1 SAMD3, SAMD4B SUN2, SUOX SAMD5, SAMD8 SUPT16H, SUPT20H SAMD9, SAMD9L SUPT3H, SUPT4H1 SAMHD1, SAMSN1 SUPT5H, SUPT6H SAP18, SAP30 SUPT7L, SUPV3L1 SAP30BP, SAP30L SURF2, SURF4 SAPCD2, SAR1A SURF6, SUSD1 SAR1B, SARS SUSD3, SUV39H1 SARS2, SART1 SUV420H1, SUV420H2 SART3, SASH1 SUZ12, SV2A SASS6, SAT1 SVIL, SWT1 SAT2, SATB1 SYAP1, SYBU SATB2, SAV1 SYCP2L, SYDE2 SAYSD1, SBDS SYF2, SYK SBDSP1, SBF1 SYMPK, SYN1 SBF2, SBK1 SYNCRIP, SYNDIG1 SBNO1, SBNO2 SYNDIG1L, SYNE1 SBSN, SC5D SYNE2, SYNE3 SCAF1, SCAF11 SYNGAP1, SYNGR1 SCAF4, SCAF8 SYNGR2, SYNGR3 SCAMP1, SCAMP2 SYNJ1, SYNJ2 SCAMP3, SCAMP4 SYNJ2BP, SYNM SCAND2P, SCAP SYNPO2, SYNPO2L SCARA3, SCARA5 SYNPR, SYP SCARB1, SCARB2 SYPL1, SYT1 SCARF1, SCCPDH SYT11, SYT12 SCD, SCD5 SYT14, SYT17 SCFD1, SCG2 SYT2, SYT4 SCG3, SCGB1D1 SYT5, SYTL1 SCGB1D2, SCGB2A2 SYTL2, SYVN1 SCGB3A1, SCHIP1 SZRD1, TAAR2 SCIMP, SCIN TAAR6, TAB1 SCLY, SCMH1 TAB2, TAB3 SCML1, SCML2 TAC1, TAC4 SCN11A, SCN1A TACC1, TACC2 SCN1B, SCN2A TACC3, TACO1 SCN3A, SCN3B TACR2, TACR3 SCN4A, SCN9A TACSTD2, TADA3 SCNN1D, SCO1 TAF1, TAF11 SCO2, SCOC TAF12, TAF15 SCP2, SCP2D1 TAF1C, TAF1D SCRG1, SCRIB TAF2, TAF4 SCRN1, SCRN3 TAF4B, TAF5 SCUBE2, SCYL1 496

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TAF5L, TAF6 SCYL2, SDAD1 TAF6L, TAF7 SDC1, SDC3 TAF8, TAF9 SDC4, SDCBP TAF9B, TAGLN SDCCAG8, SDE2 TAGLN2, TALDO1 SDF2, SDF2L1 TANC2, TANGO6 SDF4, SDHA TAOK1, TAOK2 SDHAF2, SDHB TAOK3, TAPBP SDHC, SDR16C5 TAPT1, TARBP2 SEC11A, SEC11C TARDBP, TARS SEC13, SEC14L1 TARS2, TARSL2 SEC14L3, SEC14L4 TAS2R7, TAT SEC16A, SEC16B TATDN2, TAX1BP1 SEC22A, SEC22C TAX1BP3, TAZ SEC23A, SEC23B TBATA, TBC1D1 SEC23IP, SEC24A TBC1D10B, TBC1D12 SEC24B, SEC24C TBC1D13, TBC1D15 SEC24D, SEC31A TBC1D17, TBC1D20 SEC61A1, SEC61A2 TBC1D22A, TBC1D22B SEC61B, SEC61G TBC1D25, TBC1D27 SEC62, SEC63 TBC1D28, TBC1D2B SECISBP2, SECISBP2L TBC1D30, TBC1D4 SECTM1, SEH1L TBC1D8, TBC1D8B SEL1L, SEL1L3 TBC1D9, TBC1D9B SELE, SELENBP1 TBCB, TBCC SELPLG, SELRC1 TBCCD1, TBCD SEMA3B, SEMA3C TBCEL, TBCK SEMA3D, SEMA3G TBK1, TBKBP1 SEMA4B, SEMA4C TBL1X, TBL1XR1 SEMA4D, SEMA4G TBL3, TBP SEMA5A, SEMA6A TBPL1, TBRG1 SEMA6B, SEMG2 TBRG4, TBX1 SENP1, SENP2 TBX15, TBX18 SENP3, SENP5 TBX19, TBX21 SENP6, SENP7 TBX22, TBX3 SENP8, SEPHS1 TBX4, TBX5 SEPHS2, SEPN1 TBXAS1, TCAM1P SEPSECS, SEPT1 TCEA1, TCEA2 SEPT10, SEPT11 TCEA3, TCEAL1 SEPT2, SEPT4 TCEAL7, TCEB2 SEPT7, SEPT8 TCEB3, TCERG1 SEPT9, SEPW1 TCF12, TCF19 SERAC1, SERBP1 TCF20, TCF21 SERF2, SERGEF TCF25, TCF3 SERINC1, SERINC2 TCF4, TCF7 SERINC3, SERINC5 TCF7L1, TCF7L2 SERP1, SERPINA1 TCFL5, TCHP SERPINA12, SERPINA3 TCL1A, TCL1B SERPINB1, SERPINB10 TCOF1, TCP1 SERPINB2, SERPINB5 TCP11L1, TCP11L2 SERPINB6, SERPINB9 TCTN1, TCTN3 SERPIND1, SERPINE1 TDG, TDP1 SERPINE2, SERPINF1 TDRD3, TDRD5 SERPING1, SERPINH1 TDRD7, TDRP SERPINI1, SERPINI2 TEAD1, TEAD2 SERTAD2, SERTAD3 TEAD4, TECPR1 SERTAD4, SESN1 TECPR2, TECR SESN2, SESN3 TECTB, TEF SESTD1, SET TEFM, TEK SETD1A, SETD1B TEKT1, TEKT3 SETD2, SETD3 TEKT4, TELO2 SETD4, SETD5 TENC1, TENM3 SETD6, SETD7 TENM4, TEP1 SETD8, SETDB1 TERF1, TERF2 SETX, SF1 TERF2IP, TERT SF3A1, SF3A3 TES, TESK1 SF3B1, SF3B2 TESK2, TESPA1 SF3B3, SF3B4 TET1, TET2 SFMBT1, SFN TET3, TEX10 SFPI1,
SFPQ TEX101, TEX13A SFR1, SFRP1 TEX13B, TEX15 SFRP2, SFRP4

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TEX261, TEX264 SFSWAP, SFT2D1 TEX30, TEX33 SFTPD, SFXN1 TEX36, TEX38 SFXN2, SFXN5 TF, TFAM SGCB, SGCD TFAP2A, TFAP2B SGK1, SGK223 TFAP2C, TFAP4 SGK3, SGMS1 TFCP2, TFCP2L1 SGMS2, SGOL1 TFDP2, TFE3 SGOL2, SGPL1 TFF2, TFG SGPP1, SGPP2 TFPI, TFRC SGSH, SGSM1 TG, TGFA SGSM3, SGTA TGFB1, TGFB2 SGTB, SH2B1 TGFB3, TGFBI SH2B3, SH2D1A TGFBR1, TGFBR2 SH2D1B, SH2D3A TGFBR3, TGFBRAP1 SH2D3C, SH2D6 TGIF1, TGIF2 SH3BGRL, SH3BGRL2 TGIF2LX, TGIF2LY SH3BGRL3, SH3BP1 TGM2, TGM7 SH3BP2, SH3BP4 TGOLN2, TH1L SH3BP5, SH3BP5L THADA, THAP10 SH3D19, SH3D21 THAP2, THAP3 SH3GL1, SH3GL2 THAP6, THAP7 SH3GL3, SH3GLB1 THBD, THBS1 SH3GLB2, SH3PXD2A THBS3, THEM4 SH3PXD2B, SH3RF1 THEM6, THEMIS2 SH3TC2, SHANK1 THOC1, THOC2 SHBG, SHC1 THOC3, THOC5 SHC2, SHC3 THOP1, THRA SHC4, SHCBP1 THRAP3, THSD4 SHE, SHFM1 THUMPD1, THUMPD3 SHH, SHISA2 THY1, TIA1 SHISA4, SHISA5 TIAF1, TIAL1 SHMT1, SHMT2 TIAM1, TICAM1 SHOC2, SHPK TICAM2, TICRR SHPRH, SHQ1 TIFA, TIGD3 SHROOM2, SHROOM3 TIGD4, TIMELESS SHROOM4, SI TIMM10, TIMM10B SIAH1, SIAH2 TIMM13, TIMM17A SIDT1, SIGIRR TIMM17B, TIMM23 SIGLEC10, SIGLEC16 TIMM44, TIMM50 SIGLEC5, SIGLEC8 TIMM8A, TIMM9 SIGMAR1, SIK1 TIMMDC1, TIMP1 SIK2, SIK3 TIMP2, TIMP3 SIKE1, SIMC1 TIPARP, TIPIN SIN3A, SIN3B TIPRL, TJAP1 SIP1, SIPA1 TJP1, TJP2 SIPA1L1, SIPA1L2 TJP3, TK1 SIPA1L3, SIRPA TK2, TKT SIRT1, SIRT2 TKTL1, TKTL2 SIRT4, SIRT5 TLDC1, TLE3 SIRT7, SIVA1 TLE4, TLK1 SIX1, SIX4 TLK2, TLL1 SIX5, SIX6 TLN1, TLN2 SKA1, SKA2 TLR1, TLR10 SKA3, SKAP2 TLR2, TLR4 SKI, SKIL TLX1, TLX3 SKIV2L, SKIV2L2 TM4SF1, TM7SF2 SKP1, SKP2 TM9SF1, TM9SF3 SLA, SLA2 TM9SF4, TMA16 SLAIN1, SLAIN2 TMBIM1, TMBIM6 SLAMF1, SLBP TMC3, TMC6 SLC10A3, SLC10A7 TMC7, TMC8 SLC11A2, SLC12A2 TMCC1, TMCC2 SLC12A3, SLC12A4 TMCC3, TMCO1 SLC12A6, SLC12A7 TMCO3, TMED1 SLC13A2, SLC13A4 TMED10, TMED2 SLC15A1, SLC15A4 TMED3, TMED4 SLC16A1, SLC16A10 TMED5, TMED7 SLC16A14, SLC16A2 TMED9, TMEFF1 SLC16A3, SLC16A4 TMEFF2, TMEM100 SLC16A6, SLC16A7 TMEM101, TMEM106A SLC16A9, 
SLC17A1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TMEM106B, TMEM106C SLC17A2, SLC17A3 TMEM107, TMEM109 SLC17A4, SLC17A5 TMEM11, TMEM115 SLC17A6, SLC17A7 TMEM120A, TMEM120B SLC17A8, SLC18B1 TMEM126A, TMEM127 SLC19A1, SLC19A2 TMEM129, TMEM130 SLC19A3, SLC1A1 TMEM131, TMEM132B SLC1A3, SLC1A4 TMEM132C, TMEM134 SLC1A5, SLC1A6 TMEM135, TMEM136 SLC20A1, SLC20A2 TMEM138, TMEM140 SLC22A13, SLC22A14 TMEM147, TMEM14A SLC22A15, SLC22A16 TMEM14E, TMEM154 SLC22A18, SLC22A2 TMEM155, TMEM159 SLC22A3, SLC22A5 TMEM160, TMEM161A SLC22A8, SLC23A2 TMEM163, TMEM164 SLC25A1, SLC25A10 TMEM165, TMEM167A SLC25A12, SLC25A13 TMEM168, TMEM169 SLC25A14, SLC25A15 TMEM17, TMEM170B SLC25A16, SLC25A17 TMEM173, TMEM178A SLC25A18, SLC25A19 TMEM179B, TMEM180 SLC25A22, SLC25A23 TMEM181, TMEM183A SLC25A24, SLC25A28 TMEM184B, TMEM184C SLC25A3, SLC25A30 TMEM185A, TMEM189 SLC25A31, SLC25A32 TMEM189-UBE2V1, SLC25A33, SLC25A36 TMEM19 TMEM190, TMEM192 SLC25A37, SLC25A38 TMEM194A, TMEM198 SLC25A39, SLC25A4 TMEM2, TMEM200A SLC25A40, SLC25A41 TMEM200B, TMEM201 SLC25A44, SLC25A46 TMEM203, TMEM204 SLC25A5, SLC25A51 TMEM209, TMEM214 SLC25A6, SLC26A2 TMEM217, TMEM237 SLC26A3, SLC26A4 TMEM239, TMEM241 SLC26A7, SLC27A1 TMEM243, TMEM245 SLC27A2, SLC27A4 TMEM246, TMEM248 SLC27A6, SLC28A1 TMEM25, TMEM251 SLC29A1, SLC29A2 TMEM254, TMEM255A SLC29A3, SLC29A4 TMEM257, TMEM30A SLC2A1, SLC2A10 TMEM30B, TMEM33 SLC2A11, SLC2A12 TMEM38B, TMEM39A SLC2A13, SLC2A14 TMEM41A, TMEM41B SLC2A3, SLC2A4 TMEM42, TMEM43 SLC2A4RG, SLC2A5 TMEM45A, TMEM45B SLC2A6, SLC30A1 TMEM47, TMEM50A SLC30A10, SLC30A2 TMEM53, TMEM55B SLC30A4, SLC30A5 TMEM56, TMEM59 SLC30A6, SLC30A7 TMEM63B, TMEM64 SLC30A9, SLC31A1 TMEM65, TMEM66 SLC31A2, SLC32A1 TMEM68, TMEM69 SLC33A1, SLC34A2 TMEM81, TMEM86B SLC35A1, SLC35A2 TMEM87A, TMEM87B SLC35A4, SLC35A5 TMEM8A, TMEM8B SLC35B2, SLC35B3 TMEM9, TMEM97 SLC35B4, SLC35C1 TMEM98, TMEM9B SLC35D1, SLC35D2 TMF1, TMOD2 SLC35E2, SLC35E2B TMOD3, TMOD4 SLC35F1, SLC35F2 TMPO, TMPRSS11A SLC35F3, SLC35F4 TMPRSS11D, TMPRSS12 
SLC35F5, SLC35F6 TMPRSS13, TMSB10 SLC35G1, SLC36A1 TMSB4X, TMTC1 SLC37A1, SLC37A2 TMTC2, TMTC3 SLC37A3, SLC37A4 TMTC4, TMUB1 SLC38A1, SLC38A10 TMUB2, TMX1 SLC38A2, SLC38A4 TMX2, TMX3 SLC38A5, SLC38A6 TMX4, TNC SLC39A1, SLC39A10 TNFAIP1, TNFAIP2 SLC39A11, SLC39A12 TNFAIP3, TNFAIP8L1 SLC39A13, SLC39A14 TNFRSF10A, TNFRSF10B SLC39A2, SLC39A3 TNFRSF10D, TNFRSF11B SLC39A6, SLC39A7 TNFRSF12A, TNFRSF13B SLC39A8, SLC39A9 TNFRSF17, TNFRSF1A SLC3A2, SLC41A3

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TNFRSF21, TNFRSF6B SLC43A1, SLC43A3 TNFRSF9, TNFSF10 SLC44A1, SLC44A2 TNFSF11, TNFSF12 SLC44A5, SLC45A1 TNFSF13, TNFSF13B SLC45A3, SLC46A1 TNFSF9, TNIK SLC46A2, SLC46A3 TNIP1, TNIP2 SLC48A1, SLC4A10 TNIP3, TNK1 SLC4A1AP, SLC4A2 TNK2, TNKS SLC4A7, SLC4A8 TNKS1BP1, TNKS2 SLC50A1, SLC52A1 TNNC1, TNNI1 SLC52A2, SLC52A3 TNNT2, TNP1 SLC5A1, SLC5A10 TNPO1, TNPO2 SLC5A3, SLC5A4 TNPO3, TNRC18 SLC5A5, SLC5A6 TNRC6A, TNRC6B SLC6A10P, SLC6A12 TNRC6C, TNS1 SLC6A14, SLC6A15 TNS3, TNS4 SLC6A16, SLC6A19 TOB1, TOB2 SLC6A4, SLC6A6 TOE1, TOLLIP SLC6A8, SLC6A9 TOM1, TOM1L1 SLC7A1, SLC7A11 TOM1L2, TOMM20 SLC7A13, SLC7A2 TOMM22, TOMM34 SLC7A5, SLC7A6 TOMM40, TOMM5 SLC7A9, SLC9A1 TOMM70A, TONSL SLC9A2, SLC9A3R1 TOP1, TOP1P2 SLC9A3R2, SLC9A5 TOP2A, TOP3A SLC9A6, SLC9A7 TOPBP1, TOPORS SLC9A9, SLCO1B1 TOR1A, TOR1AIP1 SLCO2A1, SLCO3A1 TOR1AIP2, TOR1B SLCO4A1, SLCO4C1 TOR2A, TOR3A SLFN11, SLFN12 TOR4A, TOX2 SLFN13, SLFN5 TOX3, TOX4 SLIRP, SLIT1 TP53, TP53AIP1 SLIT2, SLITRK1 TP53BP1, TP53BP2 SLITRK4, SLK TP53I11, TP53INP1 SLMAP, SLMO2 TP53RK, TP63 SLPI, SLTM TP73, TPBG SLU7, SLX4 TPCN1, TPD52 SLX4IP, SMAD1 TPD52L1, TPD52L2 SMAD2, SMAD3 TPD52L3, TPGS2 SMAD4, SMAD5 TPI1, TPM1 SMAD6, SMAD7 TPM2, TPM3 SMAD9, SMAP1 TPM4, TPO SMAP2, SMARCA1 TPP1, TPP2 SMARCA2, SMARCA4 TPPP, TPPP3 SMARCA5, SMARCAD1 TPRG1L, TPRKB SMARCAL1, SMARCB1 TPRN, TPSD1 SMARCC1, SMARCC2 TPSG1, TPST2 SMARCD1, SMARCD2 TPT1, TPTE2 SMARCE1, SMC1A TPX2, TRA2A SMC2, SMC3 TRA2B, TRABD SMC4, SMC6 TRAF1, TRAF3IP1 SMCHD1, SMCO4 TRAF3IP2, TRAF3IP3 SMCR7L, SMCR8 TRAF4, TRAF6 SMEK1, SMEK2 TRAF7, TRAFD1 SMG1, SMG5 TRAIP, TRAK1 SMG6, SMG7 TRAK2, TRAM1 SMIM12, SMIM14 TRAM2, TRANK1 SMIM15, SMIM8 TRAP1, TRAPPC1 SMN1, SMN2 TRAPPC10, TRAPPC11 SMNDC1, SMO TRAPPC12, TRAPPC13 SMOC2, SMPD3 TRAPPC2, TRAPPC2P1 SMPD4, SMPDL3A TRAPPC3, TRAPPC9 SMPDL3B, SMS TRAV12-2, TRAV22 SMTNL1, SMTNL2 TRAV23DV6, TRAV39 SMU1, SMUG1 TRAV6, TRAV9-2 SMURF1, SMURF2 TRBV7-3, TRDC SMYD1, SMYD2 TRDN, TREH SMYD3, SMYD4 
TREML3P, TRERF1 SMYD5, SNAI1 TRIB1, TRIB2 SNAI2, SNAP23 TRIB3, TRIM11 SNAP29, SNAP91 TRIM14, TRIM16 SNAPC4, SNAPC5

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TRIM16L, TRIM2 SNAPIN, SNCA TRIM22, TRIM23 SNCB, SNCG TRIM24, TRIM25 SND1, SNED1 TRIM26, TRIM28 SNF8, SNHG16 TRIM29, TRIM3 SNORA25, SNORA70 TRIM32, TRIM33 SNORD114-10, SNORD116- 21 TRIM36, TRIM37 SNPH, SNRK TRIM38, TRIM4 SNRNP200, SNRNP27 TRIM40, TRIM44 SNRNP35, SNRNP40 TRIM47, TRIM49C SNRNP48, SNRNP70 TRIM5, TRIM50 SNRPA, SNRPA1 TRIM59, TRIM63 SNRPB, SNRPB2 TRIM65, TRIM68 SNRPC, SNRPD1 TRIM69, TRIM71 SNRPD2, SNRPD3 TRIM73, TRIM74 SNRPE, SNRPF TRIM8, TRIM9 SNRPG, SNRPN TRIO, TRIP10 SNTA1, SNTB1 TRIP12, TRIP13 SNTB2, SNTG2 TRIP6, TRMT1 SNTN, SNUPN TRMT10C, TRMT112 SNURF, SNW1 TRMT13, TRMT2A SNX1, SNX10 TRMT5, TRMT61A SNX11, SNX12 TRMT61B, TRNT1 SNX13, SNX15 TROAP, TROVE2 SNX16, SNX17 TRPA1, TRPC1 SNX18, SNX19 TRPC3, TRPC4AP SNX2, SNX24 TRPC5, TRPC5OS SNX29, SNX3 TRPM4, TRPM6 SNX30, SNX32 TRPM7, TRPM8 SNX4, SNX5 TRPS1, TRPV1 SNX6, SNX7 TRPV4, TRRAP SNX8, SNX9 TRUB1, TRUB2 SOAT1, SOAT2 TSC1, TSC2 SOBP, SOCS1 TSC22D1, TSC22D2 SOCS2, SOCS3 TSC22D3, TSC22D4 SOCS4, SOCS5 TSEN15, TSEN34 SOCS6, SOCS7 TSEN54, TSFM SOD1, SOD2 TSG101, TSHR SOD3, SOGA1 TSHZ3, TSKU SOGA2, SOLH TSLP, TSN SON, SORBS1 TSNARE1, TSNAX SORBS2, SORBS3 TSPAN13, TSPAN14 SORCS2, SORCS3 TSPAN15, TSPAN18 SORD, SORT1 TSPAN19, TSPAN3 SOS1, SOS2 TSPAN33, TSPAN4 SOST, SOSTDC1 TSPAN5, TSPAN6 SOWAHC, SOX1 TSPYL1, TSPYL2 SOX11, SOX13 TSPYL4, TSR1 SOX18, SOX2 TST, TSTD1 SOX3, SOX4 TSTD2, TTC1 SOX5, SOX9 TTC12, TTC17 SP1, SP100 TTC18, TTC19 SP2, SP3 TTC21A, TTC21B SP4, SP7 TTC22, TTC23L SPACA3, SPAG1 TTC26, TTC28 SPAG5, SPAG9 TTC29, TTC3 SPARC, SPARCL1 TTC30A, TTC30B SPAST, SPATA12 TTC32, TTC33 SPATA13, SPATA18 TTC36, TTC37 SPATA2, SPATA20 TTC38, TTC39A SPATA3, SPATA5 TTC39B, TTC5 SPATA6, SPATA6L TTC6, TTC7A SPATA9, SPATC1L TTC7B, TTC8 SPATS2, SPATS2L TTC9C, TTF2 SPC24, SPC25 TTI1, TTI2 SPCS1, SPCS2 TTK, TTL SPCS3, SPDEF TTLL1, TTLL12 SPDL1, SPDYA TTLL13, TTLL4 SPECC1, SPECC1L TTLL5, TTLL7 SPEN, SPG11 TTPA, TTPAL SPG20, SPHK1 501

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TTYH2, TTYH3 SPI1, SPIC TUB, TUBA1A SPICE1, SPIN1 TUBA1B, TUBA1C SPIN3, SPIN4 TUBA8, TUBAL3 SPINK5, SPINK7 TUBB, TUBB2A SPINT4, SPIRE1 TUBB2B, TUBB3 SPN, SPNS1 TUBB4A, TUBB4B SPNS2, SPO11 TUBB6, TUBBP1 SPOCD1, SPON2 TUBG1, TUBGCP2 SPOPL, SPP1 TUBGCP3, TUBGCP4 SPPL2C, SPR TUBGCP5, TUBGCP6 SPRED1, SPRED2 TUFM, TUFT1 SPRR1A, SPRR2C TULP3, TULP4 SPRR2G, SPRR3 TUSC1, TUSC2 SPRTN, SPRY1 TUSC3, TUSC5 SPRY2, SPRY3 TUT1, TVP23A SPRY4, SPRYD3 TVP23B, TVP23C SPRYD4, SPRYD7 TWF1, TWF1P1 SPSB1, SPTA1 TWF2, TWISTNB SPTAN1, SPTB TWSG1, TXLNA SPTBN1, SPTBN2 TXLNG, TXLNG2P SPTBN5, SPTLC1 TXN, TXN2 SPTLC3, SPTSSA TXNDC15, TXNDC16 SPTY2D1, SPZ1 TXNDC5, TXNIP SQLE, SQRDL TXNL1, TXNL4A SQSTM1, SRBD1 TXNRD1, TXNRD2 SRC, SRCAP TYMP, TYMS SRD5A1, SRD5A3 TYSND1, TYW3 SREBF1, SREBF2 TYW5, U2AF2 SREK1, SREK1IP1 U2SURP, UACA SRF, SRGAP1 UAP1, UBA1 SRGAP2, SRGAP2C UBA2, UBA3 SRGN, SRL UBA52, UBA6 SRM, SRP14 UBAC2, UBALD1 SRP19, SRP54 UBAP1, UBAP2 SRP68, SRP72 UBAP2L, UBASH3B SRP9, SRP9P1 UBB, UBC SRPK1, SRPK2 UBD, UBE2A SRPK3, SRPR UBE2C, UBE2D1 SRPRB, SRPX UBE2D2, UBE2D3 SRRM1, SRRM2 UBE2E2, UBE2E3 SRRM2-AS1, SRRM3 UBE2F, UBE2G1 SRRT, SRSF1 UBE2G2, UBE2H SRSF10, SRSF11 UBE2I, UBE2J1 SRSF12, SRSF2 UBE2K, UBE2L3 SRSF3, SRSF4 UBE2M, UBE2N SRSF5, SRSF6 UBE2NL, UBE2O SRSF7, SRSF9 UBE2Q1, UBE2Q2 SRXN1, SS18 UBE2R2, UBE2S SS18L2, SSB UBE2V1, UBE2V2 SSFA2, SSH1 UBE2W, UBE2Z SSH2, SSNA1 UBE3B, UBE3C SSPN, SSPO UBE4A, UBE4B SSR1, SSR2 UBFD1, UBIAD1 SSR3, SSR4 UBL3, UBL5 SSRP1, SSSCA1 UBL7, UBN1 SSU72, SSX1 UBN2, UBOX5 SSX2IP, SSX7 UBP1, UBQLN1 ST13, ST14 UBQLN2, UBQLN4 ST18, ST3GAL2 UBR1, UBR2 ST3GAL5, ST3GAL6 UBR3, UBR4 ST5, ST6GAL1 UBR5, UBR7 ST6GALNAC1, ST6GALNAC5 UBTD2, UBTF ST7L, ST8SIA2 UBXN1, UBXN2A ST8SIA4, STAG1 UBXN2B, UBXN6 STAG2, STAG3L2 UBXN7, UBXN8 STAG3L3, STAM UCHL1, UCK1 STAM2, STAMBP UCK2, UCMA STAMBPL1, STARD5 UCP2, UEVLD STARD6, STARD7 UFC1, UFM1 STARD8, STAT1 502

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses UFSP2, UGCG STAT2, STAT3 UGDH, UGGT1 STAT5A, STAT5B UGGT2, UGP2 STAT6, STAU1 UGT2B17, UGT2B4 STC2, STEAP2 UGT3A1, UGT8 STEAP3, STEAP4 UHMK1, UHRF1 STIL, STIM1 UHRF1BP1, UHRF2 STIM2, STIP1 ULBP2, ULK1 STK10, STK11 ULK2, ULK3 STK11IP, STK16 UMPS, UNC119 STK17B, STK19 UNC119B, UNC13B STK24, STK25 UNC13C, UNC13D STK3, STK33 UNC45A, UNC45B STK35, STK36 UNC50, UNC5B STK38, STK38L UNC5C, UNC5CL STK4, STK40 UNC93B1, UNG STMN1, STMN3 UNK, UNKL STOM, STOML2 UPF1, UPF2 STON1-GTF2A1L, STON2 UPF3A, UPP1 STOX1, STOX2 UPP2, UQCC STRADB, STRAP UQCR10, UQCR11 STRBP, STRIP1 UQCRC1, UQCRC2 STRIP2, STRN UQCRFS1, UQCRQ STRN3, STRN4 URB1, URB2 STS, STT3A URGCP, URM1 STT3B, STUB1 UROC1, UROD STX10, STX11 UROS, USE1 STX12, STX16 USF1, USF2 STX17, STX18 USH1C, USH2A STX19, STX1A USMG5, USO1 STX1B, STX2 USP1, USP10 STX3, STX4 USP11, USP12 STX5, STX7 USP13, USP14 STXBP1, STXBP2 USP15, USP16 STXBP3, STXBP4 USP18, USP19 STXBP5, STYX USP2, USP21 SUB1, SUCLA2 USP22, USP24 SUCLG1, SUCLG2 USP25, USP28 SUCO, SUDS3 USP29, USP3 SUFU, SUGP1 USP30, USP31 SUGP2, SUGT1 USP32, USP33 SULF1, SULF2 USP34, USP36 SULT1A3, SULT1B1 USP37, USP38 SULT1C2, SULT1E1 USP39, USP40 SULT2A1, SUMF1 USP42, USP44 SUMF2, SUMO1 USP45, USP46 SUMO2, SUMO3 USP47, USP48 SUN1, SUN2 USP5, USP53 SUOX, SUPT16H USP54, USP6NL SUPT20H, SUPT3H USP7, USP8 SUPT4H1, SUPT5H USP9X, USPL1 SUPT6H, SUPT7L UST, UTP11L SUPV3L1, SURF2 UTP14A, UTP14C SURF4, SURF6 UTP15, UTP18 SUSD3, SUSD5 UTP20, UTP23 SUV39H1, SUV420H1 UTP3, UTP6 SUV420H2, SUZ12 UTRN, UTS2 SV2A, SV2B UVRAG, UVSSA SVIL, SVIP UXT, VAC14 SWAP70, SWT1 VAMP2, VAMP3 SYAP1, SYBU VAMP5, VAMP7 SYCE1, SYCP2L VAMP8, VANGL1 SYDE2, SYF2 VANGL2, VAPB SYK, SYMPK VARS, VASH1 SYN1, SYNC VASH2, VASN SYNCRIP, SYNDIG1 VASP, VAT1 SYNDIG1L, SYNE1 VAT1L, VAV2 SYNE2, SYNE3 VAV3, VBP1 SYNGAP1, SYNGR1 VCAM1, VCL SYNGR2, SYNGR3 VCP, VCPIP1 SYNJ1, SYNJ2 VDAC1, VDAC2 SYNJ2BP, SYNM 503

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses VDAC3, VDR SYNPO2, SYNPR VEGFA, VEGFC SYP, SYPL1 VEZF1, VEZT SYS1, SYT1 VGLL3, VGLL4 SYT11, SYT12 VHL, VILL SYT14, SYT17 VIM, VIP SYT2, SYT4 VKORC1, VLDLR SYT5, SYT9 VMA21, VMO1 SYTL1, SYVN1 VMP1, VN1R1 SZRD1, SZT2 VNN2, VOPP1 TAAR2, TAAR6 VPRBP, VPREB3 TAB1, TAB2 VPS11, VPS13A TAB3, TAC1 VPS13B, VPS13C TAC3, TAC4 VPS13D, VPS16 TACC1, TACC2 VPS18, VPS26A TACC3, TACO1 VPS28, VPS33A TACR2, TACR3 VPS35, VPS36 TACSTD2, TADA3 VPS37A, VPS37B TAF1, TAF10 VPS39, VPS41 TAF11, TAF12 VPS45, VPS4A TAF15, TAF1A VPS4B, VPS51 TAF1C, TAF1D VPS53, VPS54 TAF2, TAF4 VPS8, VPS9D1 TAF4B, TAF5 VSIG1, VSNL1 TAF5L, TAF6 VSTM2L, VTCN1 TAF6L, TAF7 VTI1A, VTI1B TAF8, TAF9 VTN, VWA7 TAF9B, TAGLN VWA8, VWA9 TAGLN2, TAGLN3 WAC, WAPAL TALDO1, TANC2 WARS, WARS2 TANGO6, TAOK1 WASF1, WASF2 TAOK2, TAOK3 WASF3, WASF4P TAPBP, TAPT1 WASH3P, WASL TARBP1, TARBP2 WBP1, WBP11 TARDBP, TARS WBP1L, WBP2 TARS2, TARSL2 WBP2NL, WBP4 TAS2R10, TAS2R13 WBSCR16, WBSCR22 TAS2R7, TAS2R9 WDFY1, WDFY3 TASP1, TATDN2 WDFY4, WDR1 TAX1BP1, TAX1BP3 WDR11, WDR12 TAZ, TBATA WDR17, WDR18 TBC1D1, TBC1D10B WDR19, WDR25 TBC1D12, TBC1D13 WDR26, WDR3 TBC1D14, TBC1D15 WDR33, WDR34 TBC1D16, TBC1D17 WDR36, WDR37 TBC1D2, TBC1D20 WDR4, WDR43 TBC1D22A, TBC1D22B WDR44, WDR45 TBC1D25, TBC1D27 WDR45B, WDR46 TBC1D28, TBC1D2B WDR47, WDR48 TBC1D30, TBC1D4 WDR5, WDR54 TBC1D8, TBC1D8B WDR55, WDR59 TBC1D9, TBC1D9B WDR5B, WDR6 TBCA, TBCB WDR60, WDR61 TBCC, TBCCD1 WDR62, WDR67 TBCE, TBCEL WDR7, WDR73 TBCK, TBK1 WDR74, WDR75 TBKBP1, TBL1X WDR76, WDR77 TBL1XR1, TBL3 WDR81, WDR82 TBP, TBPL1 WDR83OS, WDR86 TBRG1, TBRG4 WDR89, WDR96 TBX1, TBX15 WDTC1, WDYHV1 TBX18, TBX19 WEE1, WFDC11 TBX2, TBX21 WFDC3, WFIKKN1 TBX22, TBX3 WFS1, WHAMM TBX4, TBX5 WHSC1, WHSC1L1 TBXA2R, TBXAS1 WIBG, WIF1 TC2N, TCAM1P WIPF1, WIPF2 TCEA1, TCEA2 WIPI2, WISP1 TCEA3, TCEAL1 WISP3, WIZ TCEAL7, TCEB2 WLS, WNK1 TCEB3, TCEB3B WNK3, WNK4 TCERG1, TCF12 504

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses WNT1, WNT10B TCF19, TCF21 WNT16, WNT2 TCF25, TCF3 WNT3, WNT3A TCF4, TCF7 WNT4, WNT5A TCF7L1, TCF7L2 WNT5B, WNT7B TCFL5, TCHP WNT8B, WNT9A TCL1A, TCL1B WNT9B, WRAP53 TCN1, TCOF1 WSB2, WT1 TCP1, TCP11L1 WTIP, WWC2 TCP11L2, TCTA WWC3, WWP2 TCTEX1D2, TCTN1 WWTR1, XAF1 TCTN2, TCTN3 XBP1, XIAP TDG, TDP1 XIRP1, XIRP2 TDRD3, TDRD5 XIST, XK TDRD7, TDRKH XKR4, XKR6 TDRP, TEAD1 XKR8, XKR9 TEAD2, TEAD3 XKRX, XPA TEAD4, TECPR1 XPC, XPNPEP3 TECPR2, TECR XPO1, XPO4 TECTB, TEF XPO5, XPO6 TEFM, TEK XPO7, XPOT TEKT1, TEKT3 XPR1, XRCC1 TEKT4, TELO2 XRCC5, XRCC6 TENC1, TENM3 XRN1, XRN2 TENM4, TEP1 XRRA1, XYLT1 TERF1, TERF2 XYLT2, YAP1 TERF2IP, TERT YARS, YARS2 TES, TESK1 YBX1, YBX2 TESK2, TESPA1 YBX3, YDJC TET1, TET2 YEATS2, YES1 TET3, TEX10 YIF1B, YIPF1 TEX101, TEX13A YIPF2, YIPF3 TEX13B, TEX15 YIPF4, YIPF5 TEX2, TEX261 YIPF6, YKT6 TEX264, TEX30 YLPM1, YME1L1 TEX33, TEX36 YOD1, YPEL1 TEX38, TF YPEL2, YPEL3 TFAM, TFAP2A YPEL4, YPEL5 TFAP2B, TFAP2C YRDC, YTHDC1 TFAP4, TFB2M YTHDC2, YTHDF1 TFCP2, TFCP2L1 YTHDF2, YWHAB TFDP2, TFE3 YWHAE, YWHAG TFEB, TFF2 YWHAH, YWHAQ TFG, TFPI YWHAZ, YY1 TFPT, TFRC YY1AP1, ZADH2 TG, TGDS ZAP70, ZBED1 TGFA, TGFB1 ZBED2, ZBED3 TGFB1I1, TGFB2 ZBED4, ZBED6CL TGFB3, TGFBI ZBTB1, ZBTB10 TGFBR1, TGFBR2 ZBTB11, ZBTB14 TGFBR3, TGFBRAP1 ZBTB18, ZBTB2 TGIF1, TGIF2 ZBTB20, ZBTB21 TGIF2LX, TGIF2LY ZBTB22, ZBTB24 TGM1, TGM2 ZBTB3, ZBTB32 TGM7, TGOLN2 ZBTB33, ZBTB34 TGS1, THADA ZBTB38, ZBTB39 THAP10, THAP2 ZBTB4, ZBTB40 THAP3, THAP6 ZBTB41, ZBTB43 THAP7, THAP9 ZBTB44, ZBTB45 THBD, THBS1 ZBTB47, ZBTB48 THBS3, THEM4 ZBTB5, ZBTB6 THEM6, THEMIS2 ZBTB7A, ZBTB7B THOC1, THOC2 ZBTB7C, ZBTB8A THOC3, THOC5 ZBTB8OS, ZBTB9 THOC6, THOC7 ZC3H11A, ZC3H11B THOP1, THPO ZC3H12A, ZC3H12C THRA, THRAP3 ZC3H15, ZC3H18 THRB, THSD4 ZC3H3, ZC3H4 THUMPD1, THUMPD3 ZC3H7A, ZC3H7B TIA1, TIAF1 ZC3H8, ZC3HAV1 TIAL1, TIAM1 ZCCHC11, ZCCHC13 TICAM1, TICAM2 505

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ZCCHC14, ZCCHC2 TICRR, TIFA ZCCHC24, ZCCHC3 TIGD3, TIMELESS ZCCHC5, ZCCHC6 TIMM10, TIMM10B ZCCHC9, ZCWPW2 TIMM13, TIMM17A ZDHHC11, ZDHHC14 TIMM17B, TIMM23 ZDHHC15, ZDHHC17 TIMM44, TIMM50 ZDHHC19, ZDHHC2 TIMM8A, TIMM8B ZDHHC20, ZDHHC23 TIMM9, TIMMDC1 ZDHHC3, ZDHHC4 TIMP1, TIMP2 ZDHHC5, ZDHHC6 TIMP3, TINF2 ZDHHC7, ZDHHC8 TIPARP, TIPIN ZDHHC9, ZEB1 TIPRL, TIRAP ZEB2, ZER1 TJAP1, TJP1 ZFAND1, ZFAND3 TJP2, TJP3 ZFAND4, ZFAND5 TK1, TK2 ZFHX3, ZFHX4 TKT, TKTL1 ZFP1, ZFP2 TKTL2, TLDC1 ZFP28, ZFP3 TLE1, TLE3 ZFP30, ZFP36L1 TLE4, TLK1 ZFP36L2, ZFP41 TLK2, TLL1 ZFP57, ZFP62 TLN1, TLN2 ZFP69, ZFP90 TLR1, TLR10 ZFP91, ZFPL1 TLR2, TLR3 ZFPM2, ZFR TLR4, TLX1 ZFX, ZFYVE1 TLX3, TM4SF1 ZFYVE16, ZFYVE20 TM6SF1, TM7SF2 ZFYVE21, ZFYVE26 TM9SF1, TM9SF3 ZFYVE27, ZFYVE28 TM9SF4, TMA16 ZFYVE9, ZG16B TMBIM1, TMBIM6 ZHX3, ZIC2 TMC3, TMC6 ZIC5, ZIK1 TMC7, TMC8 ZKSCAN2, ZKSCAN7 TMCC1, TMCC2 ZKSCAN8, ZMAT1 TMCC3, TMCO1 ZMAT2, ZMAT3 TMCO3, TMCO6 ZMAT5, ZMIZ1 TMED1, TMED10 ZMIZ2, ZMPSTE24 TMED2, TMED3 ZMYM1, ZMYM2 TMED4, TMED5 ZMYM3, ZMYND11 TMED7, TMED9 ZMYND19, ZMYND8 TMEFF1, TMEFF2 ZNF106, ZNF107 TMEM100, TMEM101 ZNF114, ZNF12 TMEM102, TMEM104 ZNF121, ZNF131 TMEM106A, TMEM106B ZNF134, ZNF135 TMEM107, TMEM108 ZNF143, ZNF146 TMEM109, TMEM11 ZNF148, ZNF157 TMEM115, TMEM120A ZNF165, ZNF174 TMEM120B, TMEM121 ZNF175, ZNF180 TMEM123, TMEM126A ZNF184, ZNF20 TMEM127, TMEM128 ZNF200, ZNF202 TMEM129, TMEM130 ZNF207, ZNF208 TMEM131, TMEM132A ZNF212, ZNF214 TMEM132B, TMEM132C ZNF215, ZNF217 TMEM134, TMEM135 ZNF219, ZNF221 TMEM136, TMEM138 ZNF226, ZNF227 TMEM140, TMEM147 ZNF23, ZNF230 TMEM14A, TMEM14C ZNF236, ZNF248 TMEM14E, TMEM154 ZNF25, ZNF251 TMEM155, TMEM156 ZNF252P, ZNF256 TMEM159, TMEM160 ZNF260, ZNF263 TMEM161A, TMEM161B ZNF264, ZNF267 TMEM163, TMEM164 ZNF273, ZNF275 TMEM165, TMEM167A ZNF276, ZNF280B TMEM167B, TMEM168 ZNF280C, ZNF280D TMEM169, TMEM17 ZNF281, ZNF282 TMEM170B, TMEM173 ZNF283, ZNF284 TMEM175, TMEM178A ZNF286A, ZNF292 
TMEM179B, TMEM180 ZNF296, ZNF3 TMEM181, TMEM183A ZNF302, ZNF317 TMEM184B, TMEM184C ZNF318, ZNF320 TMEM185A, TMEM187 ZNF322, ZNF322P1 TMEM189, TMEM189-UBE2V1

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ZNF324, ZNF326 TMEM19, TMEM190 ZNF330, ZNF331 TMEM192, TMEM194A ZNF334, ZNF33A TMEM198, TMEM2 ZNF341, ZNF346 TMEM200A, TMEM200B ZNF35, ZNF350 TMEM201, TMEM203 ZNF354A, ZNF354B TMEM204, TMEM206 ZNF362, ZNF365 TMEM208, TMEM209 ZNF367, ZNF37A TMEM214, TMEM217 ZNF383, ZNF384 TMEM223, TMEM230 ZNF385A, ZNF385B TMEM239, TMEM241 ZNF395, ZNF398 TMEM243, TMEM245 ZNF407, ZNF408 TMEM246, TMEM248 ZNF416, ZNF417 TMEM25, TMEM251 ZNF418, ZNF420 TMEM254, TMEM255A ZNF423, ZNF426 TMEM256, TMEM257 ZNF428, ZNF432 TMEM259, TMEM30A ZNF436, ZNF44 TMEM30B, TMEM33 ZNF442, ZNF443 TMEM38B, TMEM39A ZNF445, ZNF449 TMEM41A, TMEM41B ZNF45, ZNF451 TMEM42, TMEM43 ZNF460, ZNF461 TMEM44, TMEM45A ZNF462, ZNF467 TMEM45B, TMEM47 ZNF473, ZNF474 TMEM5, TMEM50A ZNF48, ZNF483 TMEM50B, TMEM51 ZNF488, ZNF490 TMEM53, TMEM54 ZNF503, ZNF506 TMEM55B, TMEM56 ZNF507, ZNF511 TMEM59, TMEM62 ZNF512, ZNF512B TMEM63B, TMEM64 ZNF513, ZNF516 TMEM65, TMEM66 ZNF518A, ZNF519 TMEM69, TMEM74B ZNF521, ZNF525 TMEM79, TMEM81 ZNF526, ZNF532 TMEM86B, TMEM87A ZNF543, ZNF551 TMEM87B, TMEM8A ZNF552, ZNF555 TMEM8B, TMEM9 ZNF557, ZNF558 TMEM97, TMEM98 ZNF559, ZNF561 TMEM99, TMEM9B ZNF562, ZNF563 TMF1, TMOD2 ZNF567, ZNF568 TMOD3, TMOD4 ZNF571, ZNF572 TMPO, TMPRSS11A ZNF573, ZNF579 TMPRSS11D, TMPRSS11E ZNF581, ZNF587 TMPRSS12, TMPRSS13 ZNF589, ZNF592 TMPRSS6, TMSB10 ZNF593, ZNF594 TMSB15A, TMTC1 ZNF597, ZNF598 TMTC2, TMTC3 ZNF605, ZNF606 TMTC4, TMUB1 ZNF607, ZNF608 TMUB2, TMX2 ZNF609, ZNF611 TMX3, TMX4 ZNF614, ZNF618 TNC, TNFAIP1 ZNF621, ZNF622 TNFAIP2, TNFAIP3 ZNF625, ZNF629 TNFAIP8L1, TNFRSF10A ZNF638, ZNF641 TNFRSF10B, TNFRSF10D ZNF644, ZNF646 TNFRSF11A, TNFRSF11B ZNF652, ZNF654 TNFRSF12A, TNFRSF13B ZNF655, ZNF660 TNFRSF17, TNFRSF1A ZNF664, ZNF667 TNFRSF21, TNFRSF25 ZNF668, ZNF678 TNFRSF6B, TNFRSF9 ZNF682, ZNF687 TNFSF10, TNFSF11 ZNF689, ZNF696 TNFSF12, TNFSF13 ZNF700, ZNF701 TNFSF13B, TNFSF15 ZNF703, ZNF704 TNFSF9, TNIK ZNF706, ZNF708 TNIP1, TNIP2 ZNF71, ZNF710 TNIP3, TNK1 
ZNF711, ZNF714 TNK2, TNKS ZNF721, ZNF74 TNKS1BP1, TNKS2 ZNF746, ZNF747 TNNC1, TNNI1 ZNF76, ZNF763 TNNT2, TNP1 ZNF77, ZNF770 TNP2, TNPO1 ZNF771, ZNF772 TNPO2, TNPO3 ZNF776, ZNF777 TNRC18, TNRC6A ZNF780A, ZNF780B TNRC6B, TNRC6C ZNF785, ZNF791 TNS1, TNS3

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ZNF792, ZNF793 TNS4, TOB1 ZNF799, ZNF80 TOB2, TOE1 ZNF800, ZNF804A TOLLIP, TOM1 ZNF804B, ZNF805 TOM1L1, TOM1L2 ZNF813, ZNF814 TOMM20, TOMM22 ZNF821, ZNF823 TOMM34, TOMM40 ZNF827, ZNF829 TOMM5, TOMM70A ZNF83, ZNF84 TONSL, TOP1 ZNF841, ZNF843 TOP1P2, TOP2A ZNF85, ZNF862 TOP2B, TOP3A ZNF92, ZNFX1 TOPBP1, TOPORS ZNHIT3, ZNHIT6 TOR1A, TOR1AIP1 ZNRD1-AS1, ZNRF1 TOR1AIP2, TOR1B ZNRF3, ZP3 TOR2A, TOR3A ZPBP, ZPBP2 TOR4A, TOX ZPLD1, ZRANB1 TOX2, TOX3 ZRANB2, ZSCAN18 TOX4, TP53 ZSCAN23, ZSCAN25 TP53AIP1, TP53BP1 ZSWIM1, ZSWIM3 TP53BP2, TP53I11 ZSWIM6, ZSWIM7 TP53I13, TP53I3 ZSWIM8, ZW10 TP53INP1, TP53RK ZWINT, ZXDC TP53TG5, TP63 ZYG11A, ZYG11B TP73, TPBG ZYX, ZZEF1 TPCN1, TPCN2 TPD52, TPD52L1 TPD52L2, TPD52L3 TPGS2, TPI1 TPM1, TPM3 TPM4, TPO TPP1, TPP2 TPPP, TPPP3 TPRA1, TPRG1 TPRG1L, TPRKB TPRN, TPSG1 TPST1, TPST2 TPT1, TPTE2 TPX2, TRA2A TRA2B, TRABD TRAF1, TRAF3 TRAF3IP1, TRAF3IP2 TRAF3IP3, TRAF4 TRAF6, TRAF7 TRAFD1, TRAIP TRAK1, TRAK2 TRAM1, TRAM2 TRANK1, TRAP1 TRAPPC1, TRAPPC10 TRAPPC11, TRAPPC12 TRAPPC13, TRAPPC2 TRAPPC2P1, TRAPPC3 TRAPPC6A, TRAPPC9 TRAV12-2, TRAV22 TRAV23DV6, TRAV39 TRAV6, TRAV9-2 TRBV7-3, TRDC TRDN, TREH TREML3P, TRERF1 TREX2, TRIB1 TRIB2, TRIB3 TRIM11, TRIM13 TRIM14, TRIM16 TRIM16L, TRIM17 TRIM2, TRIM22 TRIM23, TRIM24 TRIM25, TRIM27 TRIM28, TRIM29 TRIM3, TRIM32 TRIM33, TRIM36 TRIM37, TRIM38 TRIM4, TRIM40 TRIM44, TRIM47 508

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TRIM49C, TRIM5 TRIM50, TRIM58 TRIM59, TRIM6 TRIM6-TRIM34, TRIM63 TRIM65, TRIM68 TRIM69, TRIM7 TRIM71, TRIM73 TRIM74, TRIM8 TRIM9, TRIO TRIOBP, TRIP10 TRIP11, TRIP12 TRIP13, TRIP6 TRIT1, TRMT1 TRMT10C, TRMT11 TRMT112, TRMT13 TRMT2A, TRMT2B TRMT5, TRMT61A TRMT61B, TRNT1 TRO, TROAP TROVE2, TRPA1 TRPC1, TRPC3 TRPC4, TRPC4AP TRPC5, TRPC5OS TRPC6, TRPC7 TRPM1, TRPM2 TRPM4, TRPM7 TRPM8, TRPS1 TRPV1, TRPV3 TRPV4, TRPV6 TRRAP, TRUB1 TRUB2, TSC1 TSC2, TSC22D1 TSC22D2, TSC22D3 TSC22D4, TSEN15 TSEN34, TSEN54 TSFM, TSG101 TSGA10, TSHB TSHR, TSHZ3 TSKU, TSLP TSN, TSNARE1 TSNAX, TSPAN12 TSPAN13, TSPAN14 TSPAN15, TSPAN17 TSPAN18, TSPAN3 TSPAN33, TSPAN5 TSPAN6, TSPAN7 TSPAN9, TSPYL1 TSPYL2, TSPYL4 TSPYL6, TSR1 TSR2, TST TSTA3, TSTD1 TSTD2, TTC1 TTC12, TTC17 TTC18, TTC19 TTC21A, TTC22 TTC23L, TTC26 TTC28, TTC29 TTC3, TTC30A TTC30B, TTC32 TTC33, TTC36 TTC37, TTC38 TTC39A, TTC39B TTC5, TTC6 TTC7A, TTC7B TTC8, TTC9C TTF1, TTF2 TTI1, TTI2 TTK, TTL TTLL1, TTLL12 TTLL13, TTLL3 TTLL4, TTLL5 509

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses TTLL7, TTN TTPA, TTPAL TTYH2, TTYH3 TUB, TUBA1A TUBA1B, TUBA1C TUBA3D, TUBA8 TUBAL3, TUBB TUBB1, TUBB2A TUBB2B, TUBB3 TUBB4A, TUBB4B TUBB6, TUBBP1 TUBD1, TUBE1 TUBG1, TUBGCP2 TUBGCP3, TUBGCP4 TUBGCP5, TUBGCP6 TUFM, TUFT1 TULP2, TULP3 TULP4, TUSC1 TUSC2, TUSC3 TUSC5, TUT1 TVP23A, TVP23B TVP23C, TWF1 TWF1P1, TWF2 TWIST2, TWISTNB TWSG1, TXK TXLNA, TXLNG TXLNG2P, TXN TXN2, TXNDC11 TXNDC12, TXNDC15 TXNDC16, TXNDC5 TXNIP, TXNL1 TXNL4A, TXNRD1 TXNRD2, TYK2 TYMP, TYMS TYSND1, TYW1 TYW3, TYW5 U2AF2, U2SURP UACA, UAP1 UBA1, UBA2 UBA3, UBA52 UBA6, UBAC2 UBALD1, UBAP1 UBAP2, UBAP2L UBASH3B, UBB UBC, UBD UBE2A, UBE2B UBE2C, UBE2D1 UBE2D2, UBE2D3 UBE2E1, UBE2E2 UBE2E3, UBE2F UBE2G1, UBE2G2 UBE2H, UBE2I UBE2J1, UBE2J2 UBE2K, UBE2L3 UBE2M, UBE2N UBE2NL, UBE2O UBE2Q1, UBE2Q2 UBE2R2, UBE2S UBE2V1, UBE2V2 UBE2W, UBE2Z UBE3A, UBE3B UBE3C, UBE4A UBE4B, UBFD1 UBIAD1, UBL3 UBL5, UBL7 UBN1, UBN2 UBOX5, UBP1 UBQLN1, UBQLN2 UBQLN3, UBQLN4 UBR1, UBR2 UBR3, UBR4 510

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses UBR5, UBR7 UBTD2, UBTF UBXN1, UBXN2A UBXN2B, UBXN6 UBXN7, UBXN8 UCHL1, UCHL3 UCK1, UCK2 UCMA, UCP2 UCP3, UEVLD UFC1, UFL1 UFM1, UFSP2 UGCG, UGDH UGGT1, UGP2 UGT1A10, UGT2A1 UGT2B17, UGT2B4 UGT3A1, UGT8 UHMK1, UHRF1 UHRF1BP1, UHRF2 ULBP2, ULK1 ULK2, ULK3 UMPS, UNC119 UNC13A, UNC13B UNC13C, UNC13D UNC45A, UNC45B UNC50, UNC5A UNC5B, UNC5C UNC5CL, UNC5D UNC79, UNC93B1 UNG, UNK UNKL, UPF1 UPF2, UPF3A UPF3B, UPP1 UPP2, UQCC UQCR10, UQCR11 UQCRB, UQCRC1 UQCRC2, UQCRFS1 UQCRQ, URB1 URB2, URGCP URI1, URM1 UROC1, UROD UROS, USB1 USE1, USF1 USF2, USH1C USH2A, USMG5 USO1, USP1 USP10, USP11 USP12, USP13 USP14, USP15 USP16, USP18 USP19, USP2 USP20, USP21 USP22, USP24 USP25, USP28 USP29, USP3 USP30, USP31 USP32, USP32P1 USP33, USP34 USP36, USP37 USP38, USP39 USP40, USP42 USP44, USP45 USP46, USP47 USP48, USP5 USP53, USP54 USP6, USP6NL USP7, USP8 USP9X, USPL1 UST, UTP11L UTP14A, UTP14C UTP15, UTP18 UTP20, UTP23 511

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses UTP3, UTP6 UTRN, UTS2 UVRAG, UVSSA UXT, VAC14 VAMP2, VAMP3 VAMP5, VAMP7 VAMP8, VANGL1 VANGL2, VAPA VAPB, VARS VASH1, VASH2 VASN, VASP VAT1, VAT1L VAV1, VAV2 VAV3, VBP1 VCAM1, VCL VCP, VCPIP1 VDAC1, VDAC2 VDAC3, VDR VEGFA, VEGFC VEZF1, VEZT VGLL3, VGLL4 VHL, VILL VIM, VIP VIPAS39, VIT VKORC1, VLDLR VMA21, VMO1 VMP1, VN1R1 VNN2, VOPP1 VPRBP, VPREB1 VPREB3, VPS11 VPS13A, VPS13B VPS13C, VPS13D VPS16, VPS18 VPS25, VPS26A VPS28, VPS33A VPS35, VPS36 VPS37A, VPS37B VPS37C, VPS39 VPS41, VPS45 VPS4A, VPS4B VPS51, VPS53 VPS54, VPS8 VPS9D1, VRK1 VSIG1, VSNL1 VSTM2L, VTCN1 VTI1A, VTI1B VTN, VWA7 VWA8, VWA9 WAC, WAPAL WARS, WARS2 WASF1, WASF2 WASF3, WASF4P WASH3P, WASL WBP1, WBP11 WBP1L, WBP2 WBP2NL, WBP4 WBP5, WBSCR16 WBSCR22, WDFY1 WDFY3, WDFY4 WDHD1, WDR1 WDR11, WDR12 WDR13, WDR17 WDR18, WDR19 WDR25, WDR26 WDR3, WDR33 WDR34, WDR36 WDR37, WDR4 WDR43, WDR44 WDR45, WDR45B WDR46, WDR47 WDR48, WDR5 512

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses WDR53, WDR54 WDR55, WDR59 WDR5B, WDR6 WDR60, WDR61 WDR62, WDR67 WDR7, WDR73 WDR74, WDR75 WDR76, WDR77 WDR81, WDR82 WDR83OS, WDR86 WDR89, WDR91 WDR96, WDTC1 WDYHV1, WEE1 WFDC11, WFDC3 WFDC6, WFIKKN1 WFS1, WHAMM WHAMMP3, WHSC1 WHSC1L1, WIBG WIF1, WIPF1 WIPF2, WIPF3 WIPI2, WISP1 WISP2, WISP3 WIZ, WLS WNK1, WNK3 WNK4, WNT1 WNT10B, WNT16 WNT2, WNT3 WNT3A, WNT4 WNT5A, WNT5B WNT6, WNT7B WNT8A, WNT8B WNT9A, WNT9B WRAP53, WRAP73 WRB, WSB2 WT1, WTAP WTIP, WWC1 WWC2, WWC3 WWP2, WWTR1 XAF1, XBP1 XDH, XIAP XIRP1, XIRP2 XIST, XK XKR4, XKR6 XKR8, XKR9 XKRX, XPA XPC, XPNPEP3 XPO1, XPO4 XPO5, XPO6 XPO7, XPOT XPR1, XRCC1 XRCC2, XRCC5 XRCC6, XRN1 XRN2, XRRA1 XYLT1, XYLT2 YAF2, YAP1 YARS, YARS2 YBX1, YBX2 YBX3, YDJC YEATS2, YES1 YIF1A, YIF1B YIPF1, YIPF2 YIPF3, YIPF4 YIPF5, YIPF6 YKT6, YLPM1 YME1L1, YOD1 YPEL1, YPEL2 YPEL3, YPEL4 YPEL5, YRDC YTHDC1, YTHDC2 YTHDF1, YTHDF2 YWHAB, YWHAE 513

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses YWHAG, YWHAH YWHAQ, YWHAZ YY1, YY1AP1 ZADH2, ZAP70 ZBED1, ZBED2 ZBED3, ZBED4 ZBED6CL, ZBTB1 ZBTB10, ZBTB11 ZBTB14, ZBTB18 ZBTB2, ZBTB20 ZBTB21, ZBTB22 ZBTB24, ZBTB25 ZBTB3, ZBTB32 ZBTB33, ZBTB34 ZBTB38, ZBTB39 ZBTB4, ZBTB40 ZBTB41, ZBTB43 ZBTB44, ZBTB45 ZBTB47, ZBTB48 ZBTB5, ZBTB6 ZBTB7A, ZBTB7B ZBTB7C, ZBTB8A ZBTB8OS, ZBTB9 ZC3H11A, ZC3H11B ZC3H12A, ZC3H12C ZC3H14, ZC3H15 ZC3H18, ZC3H3 ZC3H4, ZC3H7A ZC3H7B, ZC3H8 ZC3HAV1, ZCCHC11 ZCCHC13, ZCCHC14 ZCCHC2, ZCCHC24 ZCCHC3, ZCCHC5 ZCCHC6, ZCCHC7 ZCCHC8, ZCCHC9 ZCWPW2, ZDHHC11 ZDHHC14, ZDHHC15 ZDHHC17, ZDHHC19 ZDHHC2, ZDHHC20 ZDHHC23, ZDHHC3 ZDHHC4, ZDHHC5 ZDHHC6, ZDHHC7 ZDHHC8, ZDHHC9 ZEB1, ZEB2 ZER1, ZFAND1 ZFAND3, ZFAND4 ZFAND5, ZFAT ZFC3H1, ZFHX3 ZFHX4, ZFP1 ZFP2, ZFP28 ZFP3, ZFP30 ZFP36L1, ZFP36L2 ZFP37, ZFP41 ZFP57, ZFP62 ZFP64, ZFP69 ZFP90, ZFP91 ZFPL1, ZFPM2 ZFR, ZFX ZFY, ZFYVE1 ZFYVE16, ZFYVE20 ZFYVE21, ZFYVE26 ZFYVE27, ZFYVE28 ZFYVE9, ZG16B ZHX1, ZHX3 ZIC2, ZIC3 ZIC5, ZIK1 ZKSCAN2, ZKSCAN3 ZKSCAN5, ZKSCAN7 ZKSCAN8, ZMAT1 ZMAT2, ZMAT3 ZMAT4, ZMAT5 514

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ZMIZ1, ZMIZ2 ZMPSTE24, ZMYM1 ZMYM2, ZMYM3 ZMYND11, ZMYND19 ZMYND8, ZNF106 ZNF107, ZNF114 ZNF12, ZNF121 ZNF131, ZNF133 ZNF134, ZNF135 ZNF136, ZNF140 ZNF141, ZNF143 ZNF146, ZNF148 ZNF155, ZNF157 ZNF165, ZNF174 ZNF175, ZNF180 ZNF184, ZNF195 ZNF20, ZNF200 ZNF202, ZNF207 ZNF208, ZNF212 ZNF214, ZNF215 ZNF217, ZNF219 ZNF221, ZNF226 ZNF227, ZNF23 ZNF230, ZNF233 ZNF236, ZNF238 ZNF248, ZNF25 ZNF251, ZNF252P ZNF254, ZNF256 ZNF260, ZNF263 ZNF264, ZNF267 ZNF273, ZNF275 ZNF276, ZNF28 ZNF280B, ZNF280C ZNF280D, ZNF281 ZNF282, ZNF283 ZNF284, ZNF286A ZNF292, ZNF296 ZNF3, ZNF30 ZNF302, ZNF317 ZNF318, ZNF320 ZNF322, ZNF322P1 ZNF324, ZNF324B ZNF326, ZNF330 ZNF331, ZNF334 ZNF33A, ZNF341 ZNF346, ZNF35 ZNF350, ZNF354A ZNF354B, ZNF362 ZNF365, ZNF367 ZNF37A, ZNF383 ZNF384, ZNF385A ZNF385B, ZNF395 ZNF398, ZNF407 ZNF408, ZNF410 ZNF416, ZNF417 ZNF418, ZNF420 ZNF423, ZNF426 ZNF428, ZNF430 ZNF432, ZNF436 ZNF44, ZNF440 ZNF442, ZNF443 ZNF445, ZNF449 ZNF45, ZNF451 ZNF460, ZNF461 ZNF462, ZNF467 ZNF468, ZNF473 ZNF474, ZNF48 ZNF480, ZNF483 ZNF488, ZNF490 ZNF493, ZNF496 ZNF503, ZNF506 515

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ZNF507, ZNF511 ZNF512, ZNF512B ZNF513, ZNF516 ZNF518A, ZNF519 ZNF521, ZNF525 ZNF526, ZNF529 ZNF532, ZNF543 ZNF548, ZNF549 ZNF551, ZNF552 ZNF554, ZNF555 ZNF556, ZNF557 ZNF558, ZNF559 ZNF561, ZNF562 ZNF563, ZNF567 ZNF568, ZNF571 ZNF572, ZNF573 ZNF579, ZNF581 ZNF584, ZNF587 ZNF589, ZNF592 ZNF593, ZNF594 ZNF597, ZNF598 ZNF600, ZNF605 ZNF606, ZNF607 ZNF608, ZNF609 ZNF611, ZNF614 ZNF618, ZNF622 ZNF625, ZNF629 ZNF638, ZNF641 ZNF644, ZNF646 ZNF652, ZNF654 ZNF655, ZNF660 ZNF664, ZNF667 ZNF668, ZNF672 ZNF678, ZNF682 ZNF687, ZNF689 ZNF696, ZNF700 ZNF701, ZNF703 ZNF704, ZNF706 ZNF708, ZNF71 ZNF710, ZNF711 ZNF714, ZNF721 ZNF74, ZNF740 ZNF746, ZNF747 ZNF76, ZNF763 ZNF766, ZNF77 ZNF770, ZNF771 ZNF772, ZNF776 ZNF777, ZNF778 ZNF780A, ZNF780B ZNF785, ZNF791 ZNF792, ZNF793 ZNF80, ZNF800 ZNF804B, ZNF805 ZNF81, ZNF813 ZNF814, ZNF821 ZNF823, ZNF827 ZNF829, ZNF83 ZNF839, ZNF84 ZNF841, ZNF843 ZNF85, ZNF862 ZNF92, ZNFX1 ZNHIT1, ZNHIT2 ZNHIT3, ZNHIT6 ZNRD1-AS1, ZNRF1 ZNRF2, ZNRF3 ZP3, ZPBP ZPBP2, ZPLD1 ZRANB1, ZRANB2 ZSCAN16, ZSCAN18 ZSCAN23, ZSCAN25 ZSCAN29, ZSWIM1 516

Table G.6 (continued) Collection Bacteria Eukaryotes Viruses ZSWIM3, ZSWIM6 ZSWIM7, ZSWIM8 ZW10, ZWINT ZXDA, ZXDC ZYG11A, ZYG11B ZYX, ZZEF1

Appendix H: Enriched Data from the Meta-analysis of miRNA Expression

Table H.1: KEGG Pathways Enriched for the Collection of Profiles. The 20 most significantly enriched KEGG pathways for the collection of profiles are shown. For each pathway, the total number of genes (Total Genes) and the number of enriched miRNA target genes (Study Genes) are provided. A P-value of 0 denotes a value less than $10^{-4}$.

| ID | Title | Study Genes | Total Genes | P-value |
|---|---|---|---|---|
| hsa00471 | D-Glutamine and D-glutamate metabolism | 4 | 4 | 0 |
| hsa05211 | Renal cell carcinoma | 48 | 66 | 0 |
| hsa05212 | Pancreatic cancer | 48 | 66 | 0 |
| hsa04110 | Cell cycle | 84 | 124 | 0 |
| hsa04550 | Signaling pathways regulating pluripotency of stem cells | 91 | 142 | 0 |
| hsa04390 | Hippo signaling pathway | 98 | 154 | 0 |
| hsa04510 | Focal adhesion | 121 | 207 | 0 |
| hsa05205 | Proteoglycans in cancer | 130 | 204 | 0 |
| hsa05166 | HTLV-I infection | 147 | 261 | 0 |
| hsa05200 | Pathways in cancer | 253 | 398 | 0 |
| hsa04014 | Ras signaling pathway | 130 | 228 | 0 |
| hsa03430 | Mismatch repair | 20 | 23 | 0 |
| hsa05210 | Colorectal cancer | 43 | 62 | 0 |
| hsa03410 | Base excision repair | 26 | 33 | 0 |
| hsa05220 | Chronic myeloid leukemia | 49 | 73 | 0 |
| hsa05161 | Hepatitis B | 86 | 146 | 0 |
| hsa03050 | Proteasome | 32 | 44 | 0 |
| hsa04520 | Adherens junction | 48 | 73 | 0 |
| hsa04810 | Regulation of actin cytoskeleton | 119 | 215 | 0 |
| hsa03030 | DNA replication | 27 | 36 | 0 |
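This appendix does not restate how the enrichment P-values were computed. As a rough illustration only, the sketch below shows one common way such a value could be obtained from the Study Genes and Total Genes columns: a one-sided hypergeometric test. The choice of test, the function names `hypergeom_enrichment_p` and `format_p`, and the study-set size `n` and background size `N` are all assumptions made for this example and are not taken from the thesis.

```python
from math import comb


def hypergeom_enrichment_p(k: int, K: int, n: int, N: int) -> float:
    """One-sided hypergeometric tail probability P(X >= k).

    k: study genes found in the pathway      (Study Genes column)
    K: all genes annotated to the pathway    (Total Genes column)
    n: size of the study gene set            (hypothetical; not in the table)
    N: size of the background gene universe  (hypothetical; not in the table)
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability mass for every overlap size from k up to the maximum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def format_p(p: float, floor: float = 1e-4) -> str:
    """Mirror the table convention: values below the floor are printed as 0."""
    return "0" if p < floor else f"{p:.4f}"


# Renal cell carcinoma row of Table H.1: 48 study genes out of 66 pathway genes.
# n=8000 and N=20000 are placeholder values chosen only for illustration.
p = hypergeom_enrichment_p(k=48, K=66, n=8000, N=20000)
print(format_p(p))
```

Exact integer binomials from `math.comb` are used so that the tail sum does not underflow for large gene universes; a statistics library would be the more usual choice in practice.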

Table H.2: KEGG Pathways Enriched for Bacteria Profiles. The 20 most significantly enriched KEGG pathways for bacteria profiles are shown. For each pathway, the total number of genes (Total Genes) and the number of enriched miRNA target genes (Study Genes) are provided. A P-value of 0 denotes a value less than $10^{-4}$.

ID | Title | Study Genes | Total Genes | P-value
hsa00471 | D-Glutamine and D-glutamate metabolism | 4 | 4 | 0
hsa04110 | Cell cycle | 85 | 124 | 0
hsa04550 | Signaling pathways regulating pluripotency of stem cells | 98 | 142 | 0
hsa04390 | Hippo signaling pathway | 109 | 154 | 0
hsa05205 | Proteoglycans in cancer | 136 | 204 | 0
hsa05200 | Pathways in cancer | 276 | 398 | 0
hsa03430 | Mismatch repair | 21 | 23 | 0
hsa05211 | Renal cell carcinoma | 49 | 66 | 0
hsa05212 | Pancreatic cancer | 49 | 66 | 0
hsa05202 | Transcriptional misregulation in cancer | 114 | 179 | 0
hsa04510 | Focal adhesion | 128 | 207 | 0
hsa04062 | Chemokine signaling pathway | 117 | 189 | 0
hsa04310 | Wnt signaling pathway | 90 | 140 | 0
hsa05217 | Basal cell carcinoma | 41 | 55 | 0
hsa03410 | Base excision repair | 27 | 33 | 0
hsa03050 | Proteasome | 34 | 44 | 0
hsa05210 | Colorectal cancer | 45 | 62 | 0
hsa04014 | Ras signaling pathway | 137 | 228 | 0
hsa05222 | Small cell lung cancer | 59 | 86 | 0
hsa05219 | Bladder cancer | 30 | 38 | 0

Table H.3: KEGG Pathways Enriched for Eukaryote Profiles. The 20 most significantly enriched KEGG pathways for eukaryote profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. A P-value of 0 is a value less than 10^-4.

ID | Title | Study Genes | Total Genes | P-value
hsa05219 | Bladder cancer | 15 | 38 | 0
hsa04520 | Adherens junction | 20 | 73 | 0
hsa05210 | Colorectal cancer | 23 | 62 | 0
hsa05222 | Small cell lung cancer | 25 | 86 | 0
hsa05212 | Pancreatic cancer | 26 | 66 | 0
hsa04066 | HIF-1 signaling pathway | 27 | 103 | 0
hsa04110 | Cell cycle | 30 | 124 | 0
hsa04510 | Focal adhesion | 37 | 207 | 0
hsa05161 | Hepatitis B | 40 | 146 | 0
hsa05166 | HTLV-I infection | 47 | 261 | 0
hsa05205 | Proteoglycans in cancer | 49 | 204 | 0
hsa05206 | MicroRNAs in cancer | 54 | 297 | 0
hsa05200 | Pathways in cancer | 76 | 398 | 0
hsa05220 | Chronic myeloid leukemia | 19 | 73 | 0
hsa04068 | FoxO signaling pathway | 27 | 134 | 0
hsa04390 | Hippo signaling pathway | 30 | 154 | 0
hsa05215 | Prostate cancer | 20 | 89 | 0
hsa05223 | Non-small cell lung cancer | 15 | 56 | 0
hsa05203 | Viral carcinogenesis | 34 | 206 | 0
hsa04151 | PI3K-Akt signaling pathway | 49 | 347 | 0

Table H.4: KEGG Pathways Enriched for Virus Profiles. The 20 most significantly enriched KEGG pathways for virus profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. A P-value of 0 is a value less than 10^-4.

ID | Title | Study Genes | Total Genes | P-value
hsa05219 | Bladder cancer | 13 | 38 | 0
hsa05100 | Bacterial invasion of epithelial cells | 19 | 78 | 0
hsa04520 | Adherens junction | 20 | 73 | 0
hsa04110 | Cell cycle | 30 | 124 | 0
hsa05203 | Viral carcinogenesis | 34 | 206 | 0
hsa04510 | Focal adhesion | 39 | 207 | 0
hsa05205 | Proteoglycans in cancer | 49 | 204 | 0
hsa05200 | Pathways in cancer | 71 | 398 | 0
hsa05212 | Pancreatic cancer | 16 | 66 | 0
hsa04151 | PI3K-Akt signaling pathway | 46 | 347 | 0
hsa04115 | p53 signaling pathway | 16 | 68 | 0
hsa05206 | MicroRNAs in cancer | 40 | 297 | 0
hsa05210 | Colorectal cancer | 15 | 62 | 0
hsa05222 | Small cell lung cancer | 18 | 86 | 0
hsa04810 | Regulation of actin cytoskeleton | 32 | 215 | 0
hsa05202 | Transcriptional misregulation in cancer | 28 | 179 | 0
hsa04015 | Rap1 signaling pathway | 31 | 211 | 0
hsa05131 | Shigellosis | 15 | 65 | 0
hsa05214 | Glioma | 15 | 65 | 0
hsa05132 | Salmonella infection | 17 | 86 | 0

Table H.5: GO Associations Enriched for the Collection of Profiles. The 20 most significantly enriched GO associations for the collection of profiles are shown. The total number of genes and the total number of enriched miRNA target genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0010467 | gene expression | 668 | 821 | 5.35E-128 | 8.50E-128
GO:0000278 | mitotic cell cycle | 344 | 409 | 1.78E-73 | 2.75E-73
GO:0000122 | negative regulation of transcription from RNA polymerase II promoter | 494 | 696 | 6.73E-59 | 1.03E-58
GO:0006367 | transcription initiation from RNA polymerase II promoter | 189 | 245 | 4.00E-31 | 5.83E-31
GO:0007067 | mitotic nuclear division | 185 | 240 | 2.00E-30 | 2.90E-30
GO:0048011 | neurotrophin TRK receptor signaling pathway | 203 | 274 | 4.51E-29 | 6.53E-29
GO:0019058 | viral life cycle | 124 | 145 | 4.53E-29 | 6.56E-29
GO:0007173 | epidermal growth factor receptor signaling pathway | 157 | 197 | 4.71E-29 | 6.82E-29
GO:0007411 | axon guidance | 278 | 409 | 1.28E-28 | 1.85E-28
GO:0006366 | transcription from RNA polymerase II promoter | 346 | 536 | 1.36E-28 | 1.96E-28
GO:0000082 | G1/S transition of mitotic cell cycle | 133 | 160 | 2.06E-28 | 2.96E-28
GO:0090263 | positive regulation of canonical Wnt signaling pathway | 113 | 130 | 4.80E-28 | 6.91E-28
GO:0019083 | viral transcription | 100 | 111 | 5.34E-28 | 7.68E-28
GO:0000086 | G2/M transition of mitotic cell cycle | 119 | 141 | 7.39E-27 | 1.06E-26
GO:0007596 | blood coagulation | 308 | 473 | 1.47E-26 | 2.10E-26
GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 102 | 118 | 4.49E-25 | 6.37E-25
GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | 95 | 108 | 8.73E-25 | 1.24E-24
GO:0043687 | post-translational protein modification | 223 | 326 | 7.65E-24 | 1.08E-23
GO:0007179 | transforming growth factor beta receptor signaling pathway | 112 | 137 | 5.17E-23 | 7.26E-23
GO:0001701 | in utero embryonic development | 164 | 224 | 6.99E-23 | 9.81E-23

Table H.6: GO Associations Enriched for Bacteria Profiles. The 20 most significantly enriched GO associations for bacteria profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0010467 | gene expression | 710 | 821 | 4.08E-134 | 4.30E-134
GO:0000278 | mitotic cell cycle | 359 | 409 | 7.52E-72 | 7.69E-72
GO:0000122 | negative regulation of transcription from RNA polymerase II promoter | 538 | 696 | 1.15E-65 | 1.16E-65
GO:0006367 | transcription initiation from RNA polymerase II promoter | 206 | 245 | 8.28E-36 | 8.14E-36
GO:0007411 | axon guidance | 310 | 409 | 2.16E-35 | 2.13E-35
GO:0007067 | mitotic nuclear division | 196 | 240 | 1.17E-30 | 1.13E-30
GO:0000398 | mRNA splicing, via spliceosome | 162 | 190 | 4.79E-30 | 4.63E-30
GO:0048011 | neurotrophin TRK receptor signaling pathway | 217 | 274 | 4.94E-30 | 4.77E-30
GO:0007173 | epidermal growth factor receptor signaling pathway | 166 | 197 | 1.90E-29 | 1.83E-29
GO:0019058 | viral life cycle | 130 | 145 | 2.40E-29 | 2.32E-29
GO:0090263 | positive regulation of canonical Wnt signaling pathway | 119 | 130 | 3.64E-29 | 3.51E-29
GO:0000086 | G2/M transition of mitotic cell cycle | 125 | 141 | 3.62E-27 | 3.47E-27
GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | 101 | 108 | 3.80E-27 | 3.64E-27
GO:0007596 | blood coagulation | 330 | 473 | 3.62E-26 | 3.45E-26
GO:0000082 | G1/S transition of mitotic cell cycle | 136 | 160 | 3.00E-25 | 2.85E-25
GO:0001701 | in utero embryonic development | 177 | 224 | 1.30E-24 | 1.23E-24
GO:0019083 | viral transcription | 101 | 111 | 1.59E-24 | 1.51E-24
GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 106 | 118 | 1.79E-24 | 1.69E-24
GO:0008543 | fibroblast growth factor receptor signaling pathway | 138 | 165 | 3.86E-24 | 3.64E-24
GO:0090090 | negative regulation of canonical Wnt signaling pathway | 130 | 155 | 4.40E-23 | 4.14E-23

Table H.7: GO Associations Enriched for Eukaryote Profiles. The 20 most significantly enriched GO associations for eukaryote profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0010467 | gene expression | 178 | 821 | 6.65E-41 | 1.32E-39
GO:0000122 | negative regulation of transcription from RNA polymerase II promoter | 135 | 696 | 4.38E-26 | 7.64E-25
GO:0045944 | positive regulation of transcription from RNA polymerase II promoter | 166 | 959 | 4.89E-26 | 8.51E-25
GO:0000278 | mitotic cell cycle | 84 | 409 | 3.03E-18 | 4.77E-17
GO:0019058 | viral life cycle | 44 | 145 | 1.07E-16 | 1.62E-15
GO:0019083 | viral transcription | 37 | 111 | 9.02E-16 | 1.34E-14
GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 37 | 118 | 8.48E-15 | 1.23E-13
GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | 35 | 108 | 1.41E-14 | 2.04E-13
GO:0008283 | cell proliferation | 104 | 655 | 4.13E-14 | 5.86E-13
GO:0007179 | transforming growth factor beta receptor signaling pathway | 39 | 137 | 5.89E-14 | 8.33E-13
GO:0006413 | translational initiation | 50 | 223 | 5.93E-13 | 8.21E-12
GO:0006367 | transcription initiation from RNA polymerase II promoter | 52 | 245 | 2.02E-12 | 2.75E-11
GO:0006412 | translation | 59 | 304 | 3.80E-12 | 5.12E-11
GO:0051301 | cell division | 66 | 364 | 5.25E-12 | 7.03E-11
GO:0048011 | neurotrophin TRK receptor signaling pathway | 54 | 274 | 1.65E-11 | 2.13E-10
GO:0042771 | intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator | 16 | 30 | 1.73E-11 | 2.24E-10
GO:0007219 | Notch signaling pathway | 34 | 131 | 3.97E-11 | 5.04E-10
GO:0030036 | actin cytoskeleton organization | 53 | 274 | 5.27E-11 | 6.65E-10
GO:0007173 | epidermal growth factor receptor signaling pathway | 43 | 197 | 6.02E-11 | 7.51E-10
GO:0001701 | in utero embryonic development | 46 | 224 | 1.19E-10 | 1.47E-09

Table H.8: GO Associations Enriched for Virus Profiles. The 20 most significantly enriched GO associations for virus profiles are shown. The total number of genes and the total number of enriched genes in each pathway shown are provided. False discovery rates (FDR) are included.

ID | Title | Study Genes | Total Genes | P-value | FDR
GO:0010467 | gene expression | 131 | 821 | 2.74E-24 | 6.56E-23
GO:0000122 | negative regulation of transcription from RNA polymerase II promoter | 110 | 696 | 3.27E-20 | 7.00E-19
GO:0045944 | positive regulation of transcription from RNA polymerase II promoter | 134 | 959 | 1.61E-19 | 3.41E-18
GO:0007596 | blood coagulation | 76 | 473 | 9.93E-15 | 1.88E-13
GO:0008285 | negative regulation of cell proliferation | 90 | 614 | 1.05E-14 | 1.97E-13
GO:0000278 | mitotic cell cycle | 63 | 409 | 1.25E-11 | 2.15E-10
GO:0008283 | cell proliferation | 86 | 655 | 1.85E-11 | 3.15E-10
GO:0006915 | apoptotic process | 86 | 690 | 2.72E-10 | 4.40E-09
GO:0051301 | cell division | 55 | 364 | 5.13E-10 | 8.24E-09
GO:0007173 | epidermal growth factor receptor signaling pathway | 37 | 197 | 8.34E-10 | 1.31E-08
GO:0001701 | in utero embryonic development | 40 | 224 | 8.51E-10 | 1.33E-08
GO:0042493 | response to drug | 53 | 355 | 1.63E-09 | 2.49E-08
GO:0030168 | platelet activation | 38 | 212 | 1.99E-09 | 3.01E-08
GO:0000082 | G1/S transition of mitotic cell cycle | 32 | 160 | 2.19E-09 | 3.29E-08
GO:0001525 | angiogenesis | 42 | 250 | 2.25E-09 | 3.37E-08
GO:0019058 | viral life cycle | 30 | 145 | 2.94E-09 | 4.37E-08
GO:0000904 | cell morphogenesis involved in differentiation | 31 | 155 | 3.87E-09 | 5.71E-08
GO:0006367 | transcription initiation from RNA polymerase II promoter | 41 | 245 | 3.91E-09 | 5.77E-08
GO:0007411 | axon guidance | 57 | 409 | 5.41E-09 | 7.87E-08
GO:0048011 | neurotrophin TRK receptor signaling pathway | 43 | 274 | 1.22E-08 | 1.72E-07

Vita

Francis Bell June 12, 1987 Philadelphia, PA

• Education

– Drexel University, Philadelphia, PA
  Dual Bachelor/Master of Science in Biomedical Engineering, Graduation - June 2010
  Biomaterials and Tissue Engineering
– Drexel University, Philadelphia, PA
  Doctorate in Biomedical Engineering, Anticipated Graduation - November, 2015
  Bioinformatics/Computational Biology

• Publications

– Cross-species gene and micro-RNA meta-analysis of host responses to pathogens. Francis Bell and Ahmet Sacan. (Being reviewed).
– Content-based Search of Gene Expression Databases Using Binary Fingerprints of Differential Expression Profiles. Francis Bell and Ahmet Sacan. Network Modeling Analysis in Health Informatics and Bioinformatics. 2015.
– BioGUI: A Graphical User Interface for Bioinformatics Applications. Francis Bell and Ahmet Sacan. International Conference on Intelligent Systems for Molecular Biology (ISMB), 2014.
– PDB Circle Plot: A novel visualization protein structures. Francis Bell, Chunyu Zhao, and Ahmet Sacan. arXiv preprint arXiv:1402.5323. 2014.
– Chapter In: Microarray Image Analysis: Theory and Practice. Yiqian Zhou, Rehman Qureshi, Francis Bell, and Ahmet Sacan. Luis Rueda (Ed.), CRC Press. 2013.
– Content Based Searching Of Gene Expression Databases Using Binary Fingerprints Of Differential Expression Profiles. Francis Bell and Ahmet Sacan. IEEE Transactions on International Symposium on Health Informatics and Bioinformatics (HIBIT). 2012.