Analysis of Gene Co-expression Networks of Two Coccolithophore Species

In affiliation with

California State University, San Marcos

In partial fulfillment of the Requirements for the Degree of

Master of Computer Science

By

Nitesh Balasaheb Sabankar

Summer 2018

ABSTRACT

Emiliania Huxleyi (E. Huxleyi) and Gephyrocapsa oceanica (G. Oceanica) are among the most abundant species of coccolithophores in the ocean. Both G. Oceanica and E. Huxleyi produce coccoliths, which makes a comparative genomics approach feasible [1, 2]. Coccolithophores play an important role in the oceanic carbon cycle through calcification of coccoliths and photosynthesis. The main objective of this project was to study and compare the two sister coccolithophores, E. Huxleyi and G. Oceanica, using differential gene co-expression network analysis.

A gene co-expression network consists of nodes, which correspond to genes, and edges, which correspond to co-expression relationships between genes. The direction and type of the co-expression relationships are not determined in such networks. Gene co-expression networks allow the identification of candidate biomarkers and therapeutic targets, and they enable inference about diseases and the system-level functionality of genes. These inferences help in identifying gene characteristics [4, 5].

In this project, the RNA-Seq data of E. Huxleyi and G. Oceanica were compared and divided into groups of co-expressed genes shared between the two species, called ‘modules’. These modules were then compared with external traits to find out how the modules and traits are related. Functional enrichment analysis was also performed on the modules of interest to identify classes of genes that are significantly enriched in them. Using 12 modules that are highly preserved in both data sets, lipid metabolism genes and biomineralization genes were related to these modules, and biological functions for these genes were obtained. In addition, lists of genes that can be related to the set of biomineralization genes were obtained, which can help in studying the biomineralization process in the two species in detail.

ACKNOWLEDGEMENT

I would first like to express my sincere gratitude and thanks to Dr. Xiaoyu Zhang, without whose constant support and advice this project would not be possible. I would also like to thank Dr. Ahmad Hadaegh and Dr. Betsy Read for taking time off their busy schedule to guide me through the entire process. I am deeply indebted to all of them. My sincere appreciation also goes out to other faculty members and staff of the Computer Science department, my fellow students and my family members, without whose support and motivation I would not have been able to complete this project.

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENT

1 INTRODUCTION

2 BACKGROUND

Gene co-expression networks

Software for co-expression network analysis

Weighted gene co-expression network analysis (WGCNA)

3 ARCHITECTURE

Input Data

Data pre-processing

Low-count filtering

Log-transforming data

Normalization

Co-expression network construction

Soft-thresholding power selection

Adjacency matrix construction

Topological Overlap Matrix based network construction

Hierarchical Clustering and module identification

Assessing module preservation

Module-trait relationship

Functional enrichment analysis

4 IMPLEMENTATION AND RESULTS

Input Data sets

Data pre-processing

Evaluating correlation between the datasets

Co-expression network construction

Adjacency matrix construction

Topological Overlap Matrix based network construction

Scaling of Topological Overlap Matrices

Hierarchical clustering and module assignment

Imposing unmerged modules

Imposing merged modules

Assessing module preservation

Relating modules to external information

Module-trait relationship

Functional Enrichment Analysis

Relating modules to biomineralization genes

Relating modules to lipid metabolism genes

5 CONCLUSION AND FUTURE WORK

6 REFERENCES

7 APPENDICES

APPENDIX 1 – GO analysis table for unmerged E. Huxleyi data modules

APPENDIX 2 – Table relating unmerged modules to biomineralization genes

APPENDIX 3 – Table relating unmerged modules to lipid metabolism genes

APPENDIX 4 – GO analysis table for merged E. Huxleyi data modules

APPENDIX 5 – Table relating merged modules to biomineralization genes

APPENDIX 6 – Table relating merged modules to lipid metabolism genes

1 INTRODUCTION

G. Oceanica and E. Huxleyi are microscopic phytoplankton. These organisms are found mostly in the upper layers of the ocean, where there is sufficient sunlight. Like plants, phytoplankton derive energy through photosynthesis [6]. Coccolithophores are spherical cells, about 5-100 micrometers across, enclosed by calcareous plates called coccoliths. They are among the most important micro-algae and the third-most prominent group of phytoplankton [7].

Coccolithophores are exclusively marine organisms and, like other phytoplankton, are abundant in those parts of the ocean that receive sufficient sunlight [7]. Some coccolithophores differ from other oceanic phytoplankton in that they have an outer sphere of calcite plates known as coccoliths. Because of their unique properties, research on coccolithophores has emerged as a major area of interest among scientists who study global climate change.

E. Huxleyi is one of the most abundant species of coccolithophores. The species was named after Cesare Emiliani and Thomas Huxley, two scientists who discovered coccoliths embedded in the sediments at the bottom of the ocean [8]. The coccoliths of E. Huxleyi are generally transparent and colorless, and are made of calcite that refracts light very efficiently in water.

This study compares the RNA-Seq data of G. Oceanica and E. Huxleyi, mainly through gene co-expression network analysis, to find similarities and differences between the two species. The analysis involves constructing a gene correlation network and dividing it into small groups of correlated genes. RNA-seq data were generated for the two coccolithophores, E. Huxleyi and G. Oceanica, under four growth conditions (0mM calcium, 9mM calcium, 0mM calcium with a spike of sodium carbonate, and 9mM calcium with a spike of sodium carbonate).

Errors and bias may be introduced into the data sets by several factors, such as differences in probe labeling, the concentration of the target RNA sequence, and instrumental noise. Therefore, the first step in this study was to preprocess the data sets so that they can be compared against a common reference level. Data preprocessing includes low-count filtering, log transformation, and data normalization, which aim to correct for systematic measurement errors and bias in the observed data [9].

The second step constructs a gene co-expression network, represented mathematically by an adjacency matrix whose elements indicate the co-expression similarity between pairs of genes.

After constructing the gene co-expression network, hierarchical clustering was used to identify groups of co-expressed genes. Co-expressed genes share expression level changes across different samples. The topological overlap measure (explained in detail in section 3.3.3) was used to measure the dissimilarity between clusters, which can result in biologically meaningful modules.

In the next step, the association between the modules and the external information available for the two species was calculated. Several methods can be used to measure the association of a module with a phenotypic trait. In this study, the association between the module eigengene (explained in detail in section 3.6) and the traits was tested. These associations help to find interesting modules by observing their expression levels with respect to the growth conditions [10].

In this project, a number of co-expression modules were identified that are up-regulated or down-regulated under different growth conditions. It was also observed that modules in both species show similar expression patterns across the growth conditions. The modules were also compared with biomineralization genes (explained in section 4.8.3) and lipid metabolism genes (explained in section 4.8.4) to find out their association with the modules. Functional enrichment analysis (explained in section 4.8.2) was performed on biologically interesting modules to identify classes of genes that are over-represented in the modules. This helped to identify the biological functions performed by genes in each module and to evaluate the functional properties of the experimentally derived gene sets.

The report is organized in 7 sections. The first three sections provide a general idea of the analysis, with detailed steps given in the architecture section. Sections 4 to 7 provide the implementation and results of the analysis. The R program and required data files are available on GitHub at https://github.com/niteshsabankar/R-script2. All result files are available on the CSUSM bioinformatics server bioinfo.csusm.edu at /home/saban001/project.

2 BACKGROUND

The main objective of this study was to analyze gene co-expression networks of two species of coccolithophores, G. Oceanica and E. Huxleyi, and to identify genes expressed similarly across different growth conditions using differential expression analysis. Of particular interest were genes and gene networks related to biomineralization, as both G. Oceanica and E. Huxleyi calcify under normal conditions. The inputs to the analysis were gene expression matrices from RNA-seq data sets of the two species, which were produced in previous research [19]. Gene expression patterns are a common and useful way to investigate the cause of a trait. Differentially expressed genes may play a role in comparing the two species.

Gene co-expression networks

A gene co-expression network is an undirected graph in which each node corresponds to a gene, and a pair of nodes is connected by an edge if there is a significant co-expression relationship between them. Given gene expression profiles of a number of genes across several samples or experimental conditions, a gene co-expression network can be constructed by looking for pairs of genes that show a similar expression pattern across samples, since the transcript levels of two co-expressed genes rise and fall together. Gene co-expression networks are of biological interest because co-expressed genes are often controlled by the same transcriptional regulatory program or are functionally related [19].

Software for co-expression network analysis

The traditional approach to finding differentially expressed genes is to compare expression levels between groups and produce a list of differentially expressed candidate genes. Several software packages were considered for the analysis: HO-GSVD, C3D, and DiffCoEx [20], which are singular value decomposition methods, and WGCNA, which generates a weighted network and divides it into modules. After comparing the methods, WGCNA was chosen because of its relevance to biological function. Whereas the other methods mainly emphasize individual genes, WGCNA emphasizes clusters (modules). The major statistical advantage of WGCNA is that it alleviates the multiple testing problem. WGCNA uses unsupervised hierarchical clustering based on Topological Overlap (TO) [11] to exploit the higher-order co-expression relationships present in RNA-Seq data and to fit the data to this type of structure [20].

Weighted gene co-expression network analysis (WGCNA)

WGCNA was first described in 2005 [11] and implemented as an R package in 2008 [12]. It has been used in a range of biological contexts, such as cancer, mouse genetics, yeast genetics, and the analysis of brain imaging data [22].

The general outline of the procedure is as follows:

In the first step, a gene co-expression similarity matrix is defined using the Pearson correlation of all gene pairs. It is used to define the network.

Next, the similarity matrix is transformed into an adjacency matrix (network), which quantifies how strongly genes are connected to one another.

A major step in the module centric analysis is to cluster genes into network modules using a network proximity measure. A pair of genes has a high proximity if it is closely interconnected. Typically, WGCNA uses the Topological Overlap Measure (TOM) as proximity.

The TOM is a highly robust measure of network interconnectedness (proximity). This proximity is used as input of average linkage hierarchical clustering. Modules are defined as branches of the resulting cluster tree using the dynamic branch cutting approach.

Next, the genes inside a given module are summarized by the module eigengene, which is defined as the first principal component of the standardized expression profiles. Eigengenes define robust biomarkers [21].

To find modules that relate to a clinical trait of interest, module eigengenes are correlated with the clinical trait of interest.

3 ARCHITECTURE

Figure 1: Architecture for G. Oceanica and E. Huxleyi co-expression network analysis

Figure 1 shows the architecture and workflow of the G. Oceanica and E. Huxleyi differential co-expression network analysis. As shown in the architecture, the analysis was carried out on both data sets separately, and the results were compared to find similarities between them. The details of the architecture are explained below.

Input Data

In the study, differentially expressed genes were analyzed in the two species of coccolithophores by mapping RNA-seq data from four different growth conditions to their genomes. The RNA-seq data were generated using high-throughput sequencing [4] for the two species. The RNA-seq reads were obtained from four different growth conditions:

• with 0mM calcium,
• with 9mM calcium,
• with 0mM calcium with a spike of sodium carbonate, and
• with 9mM calcium with a spike of sodium carbonate.

Next, to compare genes in both E. Huxleyi and G. Oceanica, a BLAST database was used. BLAST (Basic Local Alignment Search Tool) is an algorithm, which is used for comparing biological sequence information. The BLAST search helps in comparing a query sequence with a database of sequences. There are different types of BLASTs that are available according to the query sequences.

The protein-protein BLAST (blastp) program takes a protein query and returns the most similar protein sequences from a protein database specified by the user. Therefore, a BLAST database of the differentially expressed genes of G. Oceanica was created, and blastp was used to compare the genes of E. Huxleyi against the G. Oceanica database. The result was a list of genes that are matched in both E. Huxleyi and G. Oceanica.

Data pre-processing

Biological processes depend on complex interactions between many genes and gene products. To understand the role of a single gene or gene product in this network, many different types of information are needed, such as genome-wide knowledge of gene expression. RNA-seq technology is a useful tool for understanding gene regulation and interactions. It allows the expression levels of thousands of genes to be monitored simultaneously. RNA-Seq data, however, contain many undesirable systematic variations, and some variation is commonly observed even between replicated experiments. Data preparation is the process of removing some of the sources of variation that affect the measured gene expression levels [9]. Data preparation was performed in three steps:

Low-count filtering

Low-count genes can distort the statistical models used for differential expression analysis. Co-expression network analysis makes assumptions about the distribution of the data, and a data set with many zeros or values close to zero can skew the results. Therefore, low-count genes are filtered out for better results.

In the filtering code, the variable counts stands for the data matrix containing the G. Oceanica or E. Huxleyi RNA-Seq data. After the low-expressed genes are filtered out of both data sets, the data sets are checked for the genes they have in common, and only those genes are considered for further analysis.
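The filtering and common-gene step described above can be sketched in base R as follows. This is a minimal illustration on made-up count matrices, not the project's script: the matrices counts_a and counts_b, the gene names, and the helper filter_low_counts are hypothetical, and the thresholds (more than 10 counts in at least 2 samples) are taken from the filtering code shown in section 4.

```r
# Toy count matrices (hypothetical values); rows are genes, columns are samples.
counts_a <- matrix(c( 0,  0,  1,  0,
                     25, 40, 12,  3,
                     11,  2, 30,  8,
                      5,  6,  7,  9),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("g1", "g2", "g3", "g4"), paste0("s", 1:4)))
counts_b <- matrix(c(50, 60,  2,  1,
                      0,  1,  0,  0,
                     14, 18, 22,  9,
                     30, 12,  4, 15),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("g2", "g1", "g3", "g5"), paste0("s", 1:4)))

# Keep genes whose count exceeds `min_count` in at least `min_samples` samples.
filter_low_counts <- function(counts, min_count = 10, min_samples = 2) {
  keep <- rowSums(counts > min_count) >= min_samples
  counts[keep, , drop = FALSE]
}

filt_a <- filter_low_counts(counts_a)
filt_b <- filter_low_counts(counts_b)

# Genes that survive filtering in both data sets, put in the same row order.
common <- intersect(rownames(filt_a), rownames(filt_b))
filt_a <- filt_a[common, , drop = FALSE]
filt_b <- filt_b[common, , drop = FALSE]
print(common)
```

Reordering both matrices by the shared gene list keeps row i referring to the same gene in each data set, which the later pairwise comparisons rely on.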

Log-transforming data

RNA-Seq expression levels are generally spread over a very wide range. This differs from the input that most co-expression network analysis methods were designed for. It is therefore important to log-transform the data so that their distribution resembles that of microarray data, which is what WGCNA and other co-expression network analysis methods expect as input. Log transformation is applied to make the data as close to "normal" as possible and thus to increase the validity of the associated statistical analyses.

Normalization

RNA-Seq data contain the expression levels of thousands of genes simultaneously. Each sample is grown under a different condition (e.g., spike or no spike). Small differences in RNA quantities and experimental errors may cause the intensity level to vary from one replicate to another, irrespective of the biological expression of the genes. Handling this inherent problem requires normalization of the data, which minimizes technical effects and makes the data comparable.
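As an illustration of how sequencing-depth normalization and log transformation fit together, the following base R sketch applies a median-of-ratios size factor, which is the idea behind the DESeq2 normalization used later in this report. It is a simplified stand-in written for this example, not the DESeq2 implementation; the count values are hypothetical, and the pseudocount of 1 mirrors the log2(x + 1) transformation used in section 4.

```r
# Hypothetical counts: sample s2 is sequenced twice as deeply as s1,
# and s3 four times as deeply. Rows are genes.
counts <- matrix(c( 10,  20,  40,
                   100, 200, 400,
                     5,  10,  20,
                    50, 100, 200,
                     0,   3,   7),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("g", 1:5), c("s1", "s2", "s3")))

# Per-gene log geometric mean across samples; genes with a zero count
# give -Inf and are skipped when estimating size factors.
log_geo_mean <- rowMeans(log(counts))
usable <- is.finite(log_geo_mean)

# Size factor of a sample = median ratio of its counts to the gene-wise
# geometric means (computed on the log scale).
size_factors <- apply(counts, 2, function(col) {
  exp(median((log(col) - log_geo_mean)[usable]))
})

# Divide each column by its size factor, then log-transform with a
# pseudocount of 1 so that zero counts map to 0 instead of -Inf.
normalized <- sweep(counts, 2, size_factors, "/")
log_norm <- log2(normalized + 1)
print(round(size_factors, 3))
```

After normalization the first four genes have identical values in all three samples, which is the intended effect: the depth difference is removed while the differences between genes are kept.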

Co-expression network construction

Co-expression network construction converts the expression data into a graph of nodes and edges. Nodes correspond to genes, and edges correspond to co-expression relationships between genes.

Co-expression network construction mainly involves two steps, i.e. Adjacency matrix construction and then Topological Overlap Matrix based network construction [14]. Following are the steps performed to construct co-expression networks for E. Huxleyi and G. Oceanica.

Soft-thresholding power selection

The soft-thresholding power is the exponent to which the gene correlations are raised. The assumption is that raising the correlations to a power reduces the noise of the correlations in the adjacency matrix. To pick the power, the R function pickSoftThreshold was used, which evaluates, for each candidate power, how closely the resulting network resembles a scale-free topology. The power that produced the closest fit to a scale-free network was used as the soft-thresholding power.
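The idea behind the scale-free fit can be sketched in base R. The code below is a simplified illustration, not the internals of pickSoftThreshold: for each candidate power it builds a signed adjacency, computes the connectivity k of every gene, bins k, and regresses log10 of the bin frequency on log10 of the mean k in each bin; the R² of that line is taken as the fit. The simulated expression matrix, the helper scale_free_fit, the number of bins, and the list of powers are all assumptions made for this example.

```r
set.seed(1)
# Hypothetical expression matrix: 30 samples (rows) x 200 genes (columns).
expr <- matrix(rnorm(30 * 200), nrow = 30)
# Give the first 60 genes a shared signal so that some genes are correlated.
shared <- rnorm(30)
expr[, 1:60] <- expr[, 1:60] + outer(shared, runif(60, 0.5, 2))

# R^2 of the log-log fit between connectivity and its frequency.
scale_free_fit <- function(expr, power, n_bins = 10) {
  adj <- (0.5 + 0.5 * cor(expr))^power   # signed adjacency
  k <- rowSums(adj) - 1                  # connectivity, excluding self-adjacency
  bins <- cut(k, breaks = n_bins)
  p_k <- as.numeric(table(bins)) / length(k)
  mean_k <- as.numeric(tapply(k, bins, mean))
  ok <- p_k > 0                          # drop empty bins before taking logs
  x <- log10(mean_k[ok])
  y <- log10(p_k[ok])
  summary(lm(y ~ x))$r.squared
}

powers <- c(1, 2, 4, 6, 8, 10, 12)
fit_r2 <- sapply(powers, function(b) scale_free_fit(expr, b))
print(data.frame(power = powers, r2 = round(fit_r2, 3)))
```

In practice the fit is usually inspected together with the mean connectivity, because very high powers can give a good fit while leaving almost no connections in the network.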

The first step in constructing a co-expression network was to create an adjacency matrix, which provides the similarity of every gene with every other gene. This matrix was then raised to the soft-thresholding power β to satisfy the scale-free topology of the network [14]. More about the parameter β will be discussed in section …

        J1                                    J2

J1      Correlation between J1 and J1         Correlation between J2 and J1
        based on their expression:            based on their expression:
        co-expression                         co-expression

J2      Correlation between J1 and J2         Correlation between J2 and J2
        based on their expression:            based on their expression:
        co-expression                         co-expression

Adjacency matrix construction

Adjacency matrix construction involves measuring the similarity of every gene with every other gene. A core component of co-expression networks is the choice of a similarity measure that defines the co-expression similarity of genes and quantifies the level of co-expression. Pearson correlation was used as the measure of similarity between expression profiles.

From the input n × m matrix X = [x_il], where the row indices (i = 1, . . ., n) correspond to network nodes (such as genes) and the column indices (l = 1, . . ., m) correspond to sample measurements, similarities in expression profiles were calculated by Pearson correlation, cor(x_i, x_j), creating a correlation matrix. The adjacency matrix A = [a_ij] was then calculated from the correlation matrix by raising the transformed correlation to a soft-threshold power β. The use of β is explained in section 4.5.1. This leads to a network that satisfies scale-free topology:

$$a_{ij} = \left| 0.5 + 0.5 \times \mathrm{cor}(x_i, x_j) \right|^{\beta}$$

The adjacency() function allows a choice between “signed”, “unsigned” and “unsigned hybrid” for calculating adjacencies from correlation values. These are three different ways of handling negative correlations. The calculations were done for a “signed” network, meaning that negative correlations give low adjacency [14]. An adjacency matrix with dimensions n × n was constructed for each data set separately.
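The signed adjacency formula can be checked on a small example. The base R sketch below computes it directly from Pearson correlations rather than through the adjacency() function; the three gene profiles, the helper signed_adjacency, and the power beta = 6 are hypothetical and chosen only to make the effect of the sign visible.

```r
# Hypothetical expression profiles: 5 samples (rows) x 3 genes (columns).
expr <- cbind(geneA = c(1, 2, 3, 4, 5),
              geneB = c(2, 4, 6, 8, 10),   # perfectly correlated with geneA
              geneC = c(5, 4, 3, 2, 1))    # perfectly anti-correlated with geneA

beta <- 6   # assumed soft-thresholding power, for illustration only

# Signed adjacency: a_ij = |0.5 + 0.5 * cor(x_i, x_j)|^beta
signed_adjacency <- function(expr, beta) {
  abs(0.5 + 0.5 * cor(expr))^beta
}

adj <- signed_adjacency(expr, beta)
print(round(adj, 4))
```

The positively correlated pair (geneA, geneB) receives an adjacency of 1, while the anti-correlated pair (geneA, geneC) receives an adjacency of 0, which is what distinguishes a signed network from an unsigned one.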

Topological Overlap Matrix based network construction

From the adjacency matrix, a Topological Overlap Matrix (TOM) was constructed. It minimizes the effects of noise and spurious associations. TOM was used because it has been found useful in generating biologically meaningful clusters. It describes how well connected two genes are in terms of how many neighbors they share. The topological overlap of two nodes reflects their similarity in terms of the nodes they both connect to. Two nodes have high topological overlap if they are connected to roughly the same group of nodes in the network, that is, if they share the same neighborhood. The detailed mechanism of TOM is provided in section 4.5.

The topological overlap measure is defined from the matrix product of the adjacency matrix with itself, normalized so that the result lies between 0 and 1. The normalization divides by the minimum of the connectivities k of the two nodes, where the connectivity k of a node is the row sum of its adjacencies. This formula assumes that the diagonal elements of the adjacency matrix are zero. The adjacency matrix is thus the input to the TOM, which builds a new adjacency that takes topological similarity into account. It measures how similar two nodes are, based on their shared neighbors, on a scale from 0 to 1 [14].
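The verbal definition above can be written out in a few lines of base R. The sketch assumes the commonly cited form of the measure, TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), which the text describes only in words; the helper topological_overlap and the four-node adjacency matrix are hypothetical.

```r
# Topological overlap from an adjacency matrix with entries in [0, 1].
topological_overlap <- function(adj) {
  diag(adj) <- 0                   # the formula assumes a zero diagonal
  k <- rowSums(adj)                # connectivity of each node
  shared <- adj %*% adj            # sum over u of a_iu * a_uj
  min_k <- outer(k, k, pmin)       # min(k_i, k_j) for every pair
  tom <- (shared + adj) / (min_k + 1 - adj)
  diag(tom) <- 1
  tom
}

# Hypothetical network: nodes 1-3 are tightly connected, node 4 only weakly.
adj <- matrix(0.1, nrow = 4, ncol = 4)
adj[1:3, 1:3] <- 0.9
diag(adj) <- 0

tom <- topological_overlap(adj)
print(round(tom, 3))
```

Nodes 1 and 2 share a strong common neighbor (node 3) and obtain a high overlap, whereas node 4 shares only weak connections with them and obtains a low overlap even though its direct adjacencies are not zero.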

Hierarchical Clustering and module identification

Hierarchical clustering is a common method for clustering biological networks. Hierarchical clustering was performed on a TOM-based dissimilarity. An agglomerative strategy was used: at the start of the procedure each node is treated as a separate cluster (singleton), and clusters are then merged iteratively if their similarity is sufficiently high.

There are a number of cluster agglomeration methods (i.e. linkage methods) for measuring the similarity between two clusters of observations. The most common methods are:

• Maximum or complete linkage clustering: It computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the largest value (i.e. maximum value) of these dissimilarities as the distance between the two clusters. It tends to produce more compact clusters as shown in figure 2.


Figure 2: Complete Linkage method illustration

• Minimum or single linkage clustering: It computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the smallest of these dissimilarities as a linkage criterion. It tends to produce long, ‘loose’ clusters as shown in figure 3.

Figure 3: Single Linkage method illustration

• Mean or average linkage clustering: It computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the average of these dissimilarities as the distance between the two clusters as shown in figure 4.


Figure 4: Average Linkage method illustration

• Ward’s minimum variance method: It minimizes the total within-cluster variance. At each step the pair of clusters with the minimum between-cluster distance is merged as shown in figure 5 [15].

Figure 5: Ward’s minimum variance method illustration

After careful consideration, Ward’s minimum variance method was used for hierarchical clustering. It was chosen because it distributed genes evenly across modules and made it easy to reach the required module sizes by changing the tree cut height.
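The four linkage methods listed above can be compared with the base R function hclust. The sketch below runs each method on a small set of hypothetical two-dimensional points; the method name "ward.D2" is one of the two Ward variants that hclust offers, and which variant the project used is not stated, so that choice is an assumption.

```r
# Hypothetical 2-D points forming two well-separated groups.
pts <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10))
rownames(pts) <- paste0("p", 1:6)
d <- dist(pts)   # Euclidean distances between points

# Build one tree per linkage method.
methods <- c("complete", "single", "average", "ward.D2")
trees <- lapply(methods, function(m) hclust(d, method = m))
names(trees) <- methods

# Height of the final merge, i.e. the distance assigned to the two groups.
top_heights <- sapply(trees, function(tr) max(tr$height))
print(round(top_heights, 2))

# Cutting the Ward tree into two clusters recovers the two groups.
labels_ward <- cutree(trees[["ward.D2"]], k = 2)
print(labels_ward)
```

The final merge height differs by method: single linkage reports the closest pair between the groups, complete linkage the farthest pair, and average linkage a value in between, which is why the same data can yield differently shaped dendrograms.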

Hierarchical clustering has the advantage that it does not require any assumptions about the number or size of the clusters. Module detection relies on the choice of the cut height in the dendrogram, a tree diagram showing the hierarchical relationships between genes.

Consensus Topological Overlap (CTO) was calculated by taking the component-wise (parallel) minimum of the TOMs of the individual sets. The R function pmin() was used to calculate the component-wise minimum of the two TOM matrices. This ensures that the consensus topological overlap of two genes is large only if the corresponding entries in both sets are large. The dissimilarity matrix of the consensus TOM (DTOM) was then calculated:

$$\mathrm{DTOM}_{ij} = 1 - \mathrm{TOM}_{ij}$$

DTOM was used as a distance measure, indicating how distant node i is from node j. In the DTOM matrix, high connectivity produces a low value and no connectivity gives a value of 1 or close to 1. Clustering the genes on the DTOM matrix therefore produces a network structure with clusters of genes that share many common neighbors. Modules are then found using the cutreeDynamic() function.
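The consensus and dissimilarity steps can be illustrated with base R alone. In the sketch below the two 3 × 3 TOM matrices are hypothetical, and cutree() with a fixed number of clusters stands in for the dynamic branch cutting of cutreeDynamic(), which belongs to a separate package and is not reproduced here.

```r
# Hypothetical TOMs for the same three genes in two data sets.
tom_a <- matrix(c(1.0, 0.8, 0.2,
                  0.8, 1.0, 0.6,
                  0.2, 0.6, 1.0), nrow = 3, byrow = TRUE,
                dimnames = list(paste0("g", 1:3), paste0("g", 1:3)))
tom_b <- matrix(c(1.0, 0.7, 0.5,
                  0.7, 1.0, 0.1,
                  0.5, 0.1, 1.0), nrow = 3, byrow = TRUE,
                dimnames = list(paste0("g", 1:3), paste0("g", 1:3)))

# Consensus TOM: component-wise minimum, large only if both entries are large.
consensus_tom <- pmin(tom_a, tom_b)

# Dissimilarity used as the clustering distance: DTOM_ij = 1 - TOM_ij.
diss_tom <- 1 - consensus_tom

# Cluster on the dissimilarity; cutree() is a simple stand-in for
# dynamic branch cutting.
tree <- hclust(as.dist(diss_tom), method = "ward.D2")
modules <- cutree(tree, k = 2)
print(consensus_tom)
print(modules)
```

Genes g2 and g3 overlap strongly in the first data set (0.6) but not in the second (0.1), so the minimum keeps them apart; only the g1-g2 pair, which is high in both, ends up in the same module.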

Assessing module preservation

The next step was to assess the preservation of the gene co-expression modules, to see how well they agree between the G. Oceanica and E. Huxleyi networks. This was done in a qualitative way by assigning module labels in the reference network and then imposing the same labels on the second network. The R function modulePreservation provides statistics on how well the modules are preserved between two networks. Highly preserved modules were considered for further analysis.

The detected modules were imposed on the normalized data sets of G. Oceanica and E. Huxleyi, dividing each data set into modules, which are groups of genes with similar expression profiles. The function modulePreservation was then used to calculate module preservation between the two data sets. For each reference-test pair, the function calculates module preservation statistics that measure how well the modules of the reference set are preserved, using a Z-score summary.

The Z-score gives a significance threshold for stating whether or not a module is significantly preserved between the two data sets. It is given as follows:

$$Z = \frac{\mathrm{observed} - \mathrm{mean}_{\mathrm{permuted}}}{\mathrm{sd}_{\mathrm{permuted}}}$$

It is calculated by permutation testing. The genes in the data sets are randomly permuted multiple times, and the mean value mean_permuted and the standard deviation sd_permuted of the resulting statistics are calculated. The observed value is computed from the genes in each module. The Z-score measures how far the observed value lies from the value expected under the null hypothesis.
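A permutation Z-score of this kind can be sketched in base R for a single statistic. The example below uses the mean pairwise correlation among a module's genes as the observed value and permutes the module membership to obtain the null distribution. The modulePreservation function computes several statistics and combines them, so this is only an illustration of the principle; the simulated data, the helper mean_module_cor, and the 200 permutations are assumptions.

```r
set.seed(42)
# Hypothetical test-set expression: 20 samples (rows) x 50 genes (columns).
expr <- matrix(rnorm(20 * 50), nrow = 20)
# Genes 1-10 form a module driven by a shared signal.
driver <- rnorm(20)
expr[, 1:10] <- expr[, 1:10] + 2 * driver
in_module <- rep(c(TRUE, FALSE), times = c(10, 40))

# Statistic: mean pairwise correlation among the genes labelled as members.
mean_module_cor <- function(expr, members) {
  cm <- cor(expr[, members])
  mean(cm[upper.tri(cm)])
}

observed <- mean_module_cor(expr, in_module)

# Null distribution: the same statistic after shuffling the module labels.
permuted <- replicate(200, mean_module_cor(expr, sample(in_module)))

z <- (observed - mean(permuted)) / sd(permuted)
print(round(c(observed = observed, mean_permuted = mean(permuted), z = z), 3))
```

A large positive Z indicates that the genes labelled as a module are far more tightly correlated than randomly chosen gene sets of the same size.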

Module-trait relationship

One of the analyses was finding the module-trait relationship. In this analysis, modules that are significantly associated with the traits were identified. A summary profile (the module eigengene) was used to find the module-trait relationship. The first principal component is referred to as the module eigengene (ME); it is a single profile across samples that captures the largest share of the variance of all genes in a module. If the ME of a module is highly correlated with trait data, there is a good chance that most genes in that module are highly correlated with that trait [16].
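The module eigengene and its correlation with a trait can be computed with base R. The sketch below standardizes a hypothetical module, takes the first left singular vector as the eigengene, and correlates it with a binary trait. The simulated data, the 0/1 trait coding, and the helper module_eigengene are assumptions; because the sign of a principal component is arbitrary, the sketch orients it to agree with the average expression.

```r
set.seed(7)
# Hypothetical trait over 8 samples, e.g. 0 = no spike, 1 = spike (assumed coding).
trait <- c(0, 0, 0, 0, 1, 1, 1, 1)
# Hypothetical module: 8 samples (rows) x 6 genes (columns) following the trait.
expr <- sapply(1:6, function(i) 3 * trait + rnorm(8, sd = 0.5))

# First principal component of the standardized expression profiles.
module_eigengene <- function(expr) {
  z <- scale(expr)                 # center and scale each gene
  me <- svd(z)$u[, 1]              # first left singular vector, one value per sample
  if (cor(me, rowMeans(z)) < 0) me <- -me   # fix the arbitrary sign
  me
}

me <- module_eigengene(expr)
trait_cor <- cor(me, trait)
trait_p <- cor.test(me, trait)$p.value
print(round(c(correlation = trait_cor, p_value = trait_p), 4))
```

Because every gene in this toy module follows the trait, the eigengene is almost perfectly correlated with it; in real data the correlation and its p-value are what identify a module as trait-associated.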

Functional enrichment analysis

Functional enrichment analysis was performed on modules whose genes share a common biological function, to retrieve a functional profile of each gene set and to better understand the underlying biological processes. This was done by comparing the input gene set to each of the terms in the gene ontology. A statistical test was performed for each term to see whether it is enriched for the input genes. The Trinity software was used as the functional enrichment analysis tool [10].
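One common form of the per-term test is the hypergeometric (one-sided Fisher) test, sketched below in base R. Whether the tool used in this project applies exactly this test or a corrected variant is not stated here, so the sketch should be read as an illustration of the principle. All counts and the additional p-values are hypothetical.

```r
# Hypothetical counts for a single GO term.
n_background <- 1000   # genes in the background set
n_term       <- 50     # background genes annotated with the term
n_module     <- 100    # genes in the module being tested
n_overlap    <- 15     # module genes annotated with the term

# Probability of seeing at least n_overlap annotated genes in a random
# draw of n_module genes from the background.
p_value <- phyper(n_overlap - 1, n_term, n_background - n_term,
                  n_module, lower.tail = FALSE)

# Overlap expected by chance, for comparison with the observed overlap.
expected_overlap <- n_module * n_term / n_background

# When many terms are tested, the p-values are adjusted for multiple testing.
# The two extra p-values here are made up for illustration.
adjusted <- p.adjust(c(p_value, 0.04, 0.6), method = "BH")
print(c(expected = expected_overlap, p_value = p_value))
print(adjusted)
```

With 5 annotated genes expected by chance and 15 observed, the term would be reported as enriched in this module.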

4 IMPLEMENTATION AND RESULTS

Input Data sets

The input to this analysis was gene expression matrices of G. Oceanica and E. Huxleyi generated by mapping RNA-seq data sets to their corresponding genomes. Sample data of the expression matrices are shown in Table 1.

Table 1 E. Huxleyi expression data

ID       0.1    0.2    0.3   0S.1   0S.2   0S.3    9.1    9.2    9.3   9S.1   9S.2   9S.3
78860      0      0      0      0      0      0      0      0      0      0      0      0
62533      0      0      0      0      0      1      0      0      0      1      0      0
62524      7      2      1      5      6     10      4      1      1      7      6      6
62542    143      2      2   1009    109     11     74      1      4    430      9     26
69477      4      6      2     11     26     21      3      4      9      3     23      6
69499     27      2      5     52     17      3     33      1      4     44     16      3
69472      0      0      0      0      0      0      0      0      0      0      0      0
69467     21     72     45     16     60     55     13     42     53     34     54     78
69471      0      0      0      0      0      0      0      1      0      0      0      0

The first column contains the gene IDs, and the subsequent 12 columns give the expression levels under the different growth conditions. There were four growth conditions and three samples per condition:

1) 0.1, 0.2 and 0.3: with 0mM calcium
2) 0S.1, 0S.2 and 0S.3: with 0mM calcium with a spike of sodium carbonate
3) 9.1, 9.2 and 9.3: with 9mM calcium
4) 9S.1, 9S.2 and 9S.3: with 9mM calcium with a spike of sodium carbonate

The G. Oceanica data also have the same four growth conditions and three samples per condition, but use different gene IDs from those of E. Huxleyi. In order to perform co-expression analysis, matched genes between the two species were identified. An E. Huxleyi gene was considered matched with a G. Oceanica gene if it was the top hit of the other gene in the blastp search and the percentage of identity of their top HSP was over 90%. According to these stringent criteria, 10,624 pairs of matched E. Huxleyi and G. Oceanica genes were found.

For further analysis, the gene IDs of G. Oceanica were then replaced with the matched gene IDs of E. Huxleyi.
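The matching and ID replacement can be sketched in base R on a small table of hits. Everything in the example is hypothetical: the gene IDs, the hit values, and the toy expression matrix. The column names follow the usual BLAST tabular fields (query ID, subject ID, percent identity, bit score), and ranking hits by bit score to find the top hit is an assumption, since the text does not say how the top hit was determined. The sketch checks one search direction only.

```r
# Hypothetical blastp hits: E. Huxleyi queries against a G. Oceanica database.
hits <- data.frame(
  qseqid   = c("Eh_1", "Eh_1", "Eh_2", "Eh_3", "Eh_3"),
  sseqid   = c("Go_10", "Go_11", "Go_20", "Go_30", "Go_31"),
  pident   = c(98.5, 91.0, 85.0, 93.2, 99.0),
  bitscore = c(500, 300, 420, 610, 250),
  stringsAsFactors = FALSE
)

# Top hit per query: sort by query and descending bit score, keep the first row.
hits <- hits[order(hits$qseqid, -hits$bitscore), ]
top <- hits[!duplicated(hits$qseqid), ]

# Keep pairs whose top hit has more than 90% identity.
matched <- top[top$pident > 90, c("qseqid", "sseqid")]

# Replace G. Oceanica IDs with the matched E. Huxleyi IDs in a toy matrix.
id_map <- setNames(matched$qseqid, matched$sseqid)
go_expr <- matrix(1:6, nrow = 3,
                  dimnames = list(c("Go_10", "Go_20", "Go_30"), c("s1", "s2")))
go_expr <- go_expr[rownames(go_expr) %in% names(id_map), , drop = FALSE]
rownames(go_expr) <- unname(id_map[rownames(go_expr)])
print(matched)
print(go_expr)
```

After the replacement both expression matrices are indexed by E. Huxleyi gene IDs, so rows can be aligned by name in the later steps.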

Data pre-processing

Columns 1, 4, 7 and 10 of the two data sets corresponded to samples from a different batch and showed much larger variations in expression level than the remaining samples. In order to minimize noise due to batch effects, these 4 samples were discarded from both data sets, leaving 8 samples (conditions) for further analysis. These 8 conditions act as the external traits when relating modules to clinical traits.

As shown in Table 1, the E. Huxleyi expression data set contains some rows with zero or close to zero expression levels. These rows of low expression levels were removed from the data sets using the following R code.

    # TRUE for genes whose count exceeds 10 in at least 2 samples
    isexpr <- rowSums(counts > 10) >= 2
    # Keep only those genes, filtering out the rest
    counts <- counts[isexpr, ]

Further, the data sets were normalized to minimize technical variation in the data. Log transformation was performed using the following R code.

    library(DESeq2)
    # dds is assumed to be a DESeqDataSet whose size factors have been estimated
    log.norm.counts <- log2(counts(dds, normalized = TRUE) + 1)  # log2 of the normalized counts

As the R code shows, 1 is added to every element of the data set to avoid taking the logarithm of a count of 0, which would yield negative infinity and create problems in later calculations.

In the next step, the DESeq2 package was used for normalization; it estimates differences in sequencing depth between samples [13]. Normalization was applied to both data sets to adjust the relative expression measures between samples, compensating for various sources of variability in the assay and allowing accurate comparison of results between different samples and conditions.

Figures 6 and 7 show density plots of the log-normalized gene expression levels of G. Oceanica and E. Huxleyi. The original data set for G. Oceanica contained 33214 genes, and the E. Huxleyi data set contained 39130 genes.

Figure 2: G. Oceanica density plot before (left) and after (right) data pre-processing

Figure 3: E. Huxleyi density plot before (left) and after (right) data pre-processing

As shown in the above graphs, the data are approximately normally distributed after pre-processing in both data sets. After data pre-processing, the 7033 matched genes available in both data sets were considered for further analysis.

Evaluating correlation between the datasets

Before performing co-expression network construction and analysis, it is important to assess the comparability of the two data sets. This was done by correlating measures of average gene expression between the two data sets. The higher the correlation, the better the chance of finding similarities between the two data sets at subsequent stages of the analysis. The following R code evaluates the correlation between the data sets.

    rankExprA1 = rank(rowMeans(datExprA1))  # mean expression of each gene, ranked across genes
    rankExprA2 = rank(rowMeans(datExprA2))
    verboseScatterplot(rankExprA1, rankExprA2, xlab = "Ranked Expression (G. Oceanica)", ylab = "Ranked Expression (E. Huxleyi)", pch = 1)
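The ranking step can be illustrated without WGCNA. The sketch below uses two small invented expression matrices (rows are genes) and computes the correlation of ranked mean expression with base R's cor(); the matrices are assumptions for illustration only.

```r
# Hypothetical per-gene mean expression in the two species.
meanA1 <- c(g1 = 5.2, g2 = 0.3, g3 = 12.1, g4 = 7.7, g5 = 1.9)
meanA2 <- c(g1 = 4.0, g2 = 0.9, g3 = 9.5, g4 = 10.2, g5 = 0.1)

# Two-sample toy matrices whose row means equal the values above.
datExprA1 <- cbind(c1 = meanA1 - 0.1, c2 = meanA1 + 0.1)
datExprA2 <- cbind(c1 = meanA2 - 0.1, c2 = meanA2 + 0.1)

# Rank genes by mean expression within each data set, then correlate the ranks.
rankExprA1 <- rank(rowMeans(datExprA1))
rankExprA2 <- rank(rowMeans(datExprA2))
rankCor <- cor(rankExprA1, rankExprA2)
```

Correlating ranks in this way is equivalent to a Spearman correlation of the mean expression values, which makes the comparison insensitive to differences in scale between the two data sets.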

Data sets with different gene counts were evaluated to find the best correlation. The correlation graphs are shown below. Each dot represents a gene, plotted by its rank in the G. Oceanica data set versus its rank in the E. Huxleyi data set.

Figure 4: Correlation graph for 90% matched data sets (left) and 95% matched data sets (right)

Figure 5: Correlation graph for 97% matched data sets (left) and 100% matched data sets (right)

Figure 4 and Figure 5 show correlation graphs for different percentages of matched data. Each is a plot of the E. Huxleyi versus G. Oceanica matched data set after taking the mean of each row and ranking the genes by that mean. After careful investigation, the data sets with 90% matched gene data were selected for further evaluation because they combine a high match percentage, a data set large enough for analysis, and a correlation of 0.52, the highest among all correlation plots.

Co-expression network construction

Adjacency matrix construction

For adjacency matrix construction, the soft-thresholding power β must be chosen. Zhang et al. [11] proposed a scale-free topology criterion for this choice. Soft-thresholding allows for more biologically relevant modules, and the results are robust to the choice of the parameter β. On a logarithmic scale, the weighted network adjacency is linearly related to the co-expression similarity:

log(a_ij) = β × log(s_ij)

where a_ij is an entry of the adjacency matrix and s_ij is the corresponding entry of the similarity matrix. The soft-thresholding power was determined for both data sets using the scale-free topology graphs shown below.

Figure 6: Scale Independence and Mean connectivity graph for G. Oceanica

Figure 7: Scale Independence and Mean connectivity graph for E. Huxleyi

As shown in Figure 7, for E. Huxleyi, power 18 is the lowest power at which the scale-free topology fit index curve flattens out upon reaching a high value. In Figure 6, for G. Oceanica, the lowest power that satisfies the scale-free topology criterion is also 18. Power 18 was therefore chosen for both data sets.

Next, the adjacency matrix was calculated for both data sets using soft power 18. The following R code constructs the adjacency matrices.

    softPower = 18  # the β value explained in 4.4.1
    adjacencyA1 = adjacency(t(datExprA1), power = softPower, type = "signed")
    diag(adjacencyA1) = 0  # required for TOM calculation
    adjacencyA2 = adjacency(t(datExprA2), power = softPower, type = "signed")
    diag(adjacencyA2) = 0  # required for TOM calculation
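What a signed soft-thresholded adjacency computes can be sketched in base R. The sketch assumes the usual signed similarity s_ij = (1 + cor_ij) / 2 raised to the power β; the expression values are invented, with samples in rows and genes in columns.

```r
# Toy expression data (hypothetical): 5 samples in rows, 4 genes in columns.
expr <- cbind(g1 = c(1, 2, 3, 4, 5),
              g2 = c(2, 4, 6, 8, 10),   # perfectly correlated with g1
              g3 = c(5, 4, 3, 2, 1),    # perfectly anti-correlated with g1
              g4 = c(1, 3, 2, 5, 4))    # moderately correlated with g1

beta <- 18                              # soft-thresholding power

# Signed similarity lies in [0, 1]; anti-correlated genes get values near 0.
s <- (1 + cor(expr)) / 2

# Soft thresholding: raise the similarity to the power beta.
a <- s^beta
diag(a) <- 0

print(round(a["g1", ], 4))
```

A high power such as 18 pushes moderate similarities toward zero (here a correlation of 0.8 gives an adjacency of about 0.15), while strongly co-expressed pairs stay close to 1.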

Topological Overlap Matrix based network construction

Using the adjacency matrices, a TOM similarity matrix was generated for each data set. The R snippet for generating the TOMs is as follows.

    TOMA1 = TOMsimilarity(adjacencyA1, TOMType = "signed")  # generate the TOM from the adjacency matrix
    TOMA2 = TOMsimilarity(adjacencyA2, TOMType = "signed")
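For intuition, the basic topological overlap measure of Zhang and Horvath [11] can be written in a few lines of base R. This is a sketch of the unsigned definition on an invented 3-gene adjacency matrix; the helper name tom_from_adjacency is hypothetical, and TOMsimilarity() with TOMType = "signed" may differ in detail.

```r
# Basic topological overlap: TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),
# where l_ij sums a_iu * a_uj over shared neighbours u and k_i is the connectivity.
tom_from_adjacency <- function(a) {
  diag(a) <- 0
  k <- rowSums(a)                    # connectivity of each gene
  l <- a %*% a                       # shared-neighbour term
  denom <- outer(k, k, pmin) + 1 - a
  tom <- (l + a) / denom
  diag(tom) <- 1
  tom
}

# Toy symmetric adjacency matrix (hypothetical values).
a <- matrix(c(0.0, 0.8, 0.6,
              0.8, 0.0, 0.1,
              0.6, 0.1, 0.0), nrow = 3, byrow = TRUE)
tom <- tom_from_adjacency(a)
```

Genes 2 and 3 have a direct adjacency of only 0.1, but their overlap is higher because both are strongly connected to gene 1; this is the sense in which TOM takes shared neighbours into account.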

Scaling of Topological Overlap Matrices

Topological overlap values in the G. Oceanica data may be systematically lower than those in the E. Huxleyi data. Since the consensus is defined as the component-wise minimum of the two TOMs, a bias may result. Therefore, a simple scaling was used that mitigates the effect of different statistical properties to some degree. The E. Huxleyi TOM was scaled such that its 95th percentile equals the 95th percentile of the G. Oceanica TOM. Figure 8 shows the quantile-quantile plot of the TOMs before and after scaling.

Figure 8: Quantile-quantile plot of TOMs in G. Oceanica and E. Huxleyi data

In the quantile-quantile plot of the TOMs in the G. Oceanica and E. Huxleyi data sets, the black points are TOM values before scaling and the red points are TOM values after scaling. As shown in Figure 8, the TOM values after scaling are closer to the reference line shown in blue, which indicates that the distributions of TOM values in the two data sets are more similar after scaling.
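The scaling and the component-wise minimum can be sketched in base R. The thesis does not state the exact scaling formula, so the power transformation below is an assumption, and the TOM values are invented vectors standing in for the off-diagonal entries of the two matrices.

```r
# Hypothetical TOM values for the two data sets (A1: G. Oceanica, A2: E. Huxleyi).
tomA1 <- seq(0.05, 0.85, length.out = 21)
tomA2 <- seq(0.02, 0.62, length.out = 21)

# Reference quantile used for scaling.
q <- 0.95
qA1 <- quantile(tomA1, q, names = FALSE)
qA2 <- quantile(tomA2, q, names = FALSE)

# Assumed scaling: raise A2 to a power chosen so that its 95th percentile matches A1's.
scalePower <- log(qA1) / log(qA2)
tomA2scaled <- tomA2^scalePower

# Consensus: component-wise minimum of the reference and the scaled TOM values.
consensus <- pmin(tomA1, tomA2scaled)
```

A power transformation keeps all values in [0, 1] and preserves their order, which is why it is a convenient choice for this kind of sketch.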

The consensus TOM was then calculated using the pmin() function, which returns the component-wise minimum of the two TOM matrices. The consensus tree was then constructed from the consensus TOM. Figure 9 shows the consensus tree dendrogram.

Figure 9: Consensus tree dendrogram

Then all modules were plotted as a dendrogram. Module similarities are measured by correlations of module eigengenes; a "consensus" measure is defined as the "consensus quantile" over the corresponding relationship in each set. Once the similarity was calculated, average linkage hierarchical clustering of the module eigengenes was performed, the dendrogram was cut at the height cutHeight, and the modules on each branch were merged.

The consensus tree was cut at height 30 with deepSplit set to 2. deepSplit is an option of the tree-cutting step that controls how finely clusters are split based on their size and shape. The R function cutreeDynamic() was used to cut the tree. Height 30 was chosen to obtain small modules that can be analyzed further. The resulting modules (clusters) are shown as unmerged in Figure 10, displayed using different colors.

Figure 10: Consensus tree module choices for merged and unmerged modules

These unmerged modules can be further combined by grouping modules with similar eigengenes. Figure 11 shows the hierarchical clustering of module eigengenes. As shown in the figure, modules that join below height 0.15 (indicated by the red line) were merged.

Figure 11: Consensus clustering of consensus module eigengenes
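The merging rule can be sketched with base R's hclust() and cutree(). The three eigengene vectors below are invented, and the plain Pearson-based dissimilarity is an assumption; the thesis uses a consensus measure across both data sets.

```r
# Hypothetical module eigengenes: 6 samples in rows, 3 modules in columns.
MEs <- cbind(MEa = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
             MEb = c(1.1, 2.0, 3.2, 3.9, 5.1, 6.0),  # nearly identical to MEa
             MEc = c(3.0, 1.0, 4.0, 1.0, 5.0, 2.0))  # unrelated pattern

# Dissimilarity between eigengenes, then average linkage clustering.
MEdiss <- 1 - cor(MEs)
METree <- hclust(as.dist(MEdiss), method = "average")

# Modules joining below height 0.15 receive the same merged label.
mergedLabels <- cutree(METree, h = 0.15)
print(mergedLabels)
```

A height of 0.15 corresponds to an eigengene correlation of 0.85, so only modules with very similar expression profiles are combined.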

As with the unmerged modules, the resulting merged modules (clusters) are shown as merged in Figure 10, displayed using different colors. Both merged and unmerged modules were used for independent analyses. Unmerged modules are small enough to be useful for comparison with external trait information, whereas merged modules have uneven sizes but are more sharply differentiated by gene expression level.

Hierarchical clustering and module assignment

After scaling, the scaled TOMs were used for hierarchical clustering of the data sets. The resulting modules were then imposed on the G. Oceanica and E. Huxleyi co-expression networks. The following R code performs hierarchical clustering and module identification.

    consensusTOM = pmin(TOMA1, TOMA2)  # component-wise minimum of the two TOMs
    consTree = flashClust(as.dist(1 - consensusTOM), method = "ward")  # hierarchical clustering based on consensusTOM
    moduleLabels = cutreeDynamic(dendro = consTree, distM = 1 - consensusTOM, deepSplit = 2, cutHeight = 30, minClusterSize = 30, pamRespectsDendro = FALSE)  # module assignment
    moduleColors = labels2colors(moduleLabels)  # label modules with colors for visualization
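The clustering step can be illustrated on a tiny consensus TOM with base R alone. This sketch swaps in hclust() with average linkage and a fixed cutree() in place of flashClust() and cutreeDynamic(), so it shows only the general idea; the matrix values are invented.

```r
# Hypothetical consensus TOM for 4 genes: two tightly overlapping pairs.
genes <- paste0("gene", 1:4)
consensusTOM <- matrix(c(1.0, 0.9, 0.1, 0.2,
                         0.9, 1.0, 0.2, 0.1,
                         0.1, 0.2, 1.0, 0.8,
                         0.2, 0.1, 0.8, 1.0),
                       nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

# High overlap means small distance, so cluster on 1 - TOM.
dissTOM <- 1 - consensusTOM
consTree <- hclust(as.dist(dissTOM), method = "average")

# Stand-in for the dynamic tree cut: ask for two clusters directly.
moduleLabels <- cutree(consTree, k = 2)
print(moduleLabels)
```

cutreeDynamic() differs from this fixed cut in that it adapts the cut to the shape of each branch, which is what the deepSplit and minClusterSize arguments control.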

Imposing unmerged modules

For the unmerged analysis, the consensus TOM was divided into modules, and each gene ID was assigned to a module identified by a color. These consensus modules were then used to partition the G. Oceanica and E. Huxleyi data sets: each gene ID in a data set was compared with the consensus module genes and, when it matched, was assigned that module in that data set.

Imposing merged modules

The experiment was repeated with merged modules: the consensus TOM was formed and then divided into modules. As in the unmerged case, the consensus modules were used to partition the G. Oceanica and E. Huxleyi data sets. The only obvious difference was the smaller number of modules in the merged case, because modules with similar eigengenes had been combined. Figure 12 shows the unmerged and merged consensus modules imposed on the G. Oceanica dendrogram, and Figure 13 shows them imposed on the E. Huxleyi dendrogram.

Figure 12: Imposing unmerged modules (left) and merged modules (right) on G. Oceanica data set

Figure 13: Imposing unmerged modules (left) and merged modules (right) on E. Huxleyi data set

Assessing module preservation

After assigning modules to the two data sets, a Z-score was calculated to assess module preservation between the two networks. If the Z-score is less than 5, there is no evidence that the module is preserved; that is, the connectivity pattern and density are not preserved between the networks. In contrast, a Z-score higher than 10 is strong evidence that the module is preserved. 5 < Z < 10 indicates moderate preservation [14, 16]. The R code for assessing module preservation is given below.

    multiData = list(A1 = list(data = t(datExprA1g)), A2 = list(data = t(datExprA2g)))  # data structure holding the data sets of the two species
    multiColor = list(A1 = moduleColors)  # module assignment of the reference network
    mp = modulePreservation(multiData, multiColor, referenceNetworks = 1, networkType = "signed", nPermutations = 30)  # perform module preservation analysis
    stat = mp$preservation$Z$ref.A1$inColumnsAlsoPresentIn.A2  # store the preservation Z statistics
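The Z-score thresholds can be expressed as a small base R helper. The function name classifyPreservation is hypothetical, and assigning the boundary values 5 and 10 to the higher category is an assumption, since the text only describes the open intervals.

```r
# Classify preservation Z-scores using the thresholds described above.
# Assumption: Z = 5 counts as "moderate" and Z = 10 counts as "strong".
classifyPreservation <- function(z) {
  cut(z,
      breaks = c(-Inf, 5, 10, Inf),
      labels = c("none", "moderate", "strong"),
      right = FALSE)
}

# Example Z-scores: two illustrative values and the pink module from Table 2.
zScores <- c(lowExample = 3, midExample = 7, pink = 29.80328)
evidence <- classifyPreservation(zScores)
print(data.frame(Z = zScores, evidence = evidence))
```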

For simplicity, only the 12 modules with the highest Z-scores were considered, even though roughly 25 modules had a Z-score above 10. The remaining results are available on the CSUSM bioinfo server, with details provided in section 1. Table 2 and Table 3 show statistical summaries containing the Z-scores of the top 12 modules for the two networks, for unmerged and merged modules respectively.

Module | Module Size | Z-Score
pink | 221 | 29.80328
brown | 314 | 27.04081
purple | 210 | 19.79626
grey60 | 161 | 19.58156
yellow | 274 | 18.78055
lightyellow | 153 | 18.67151
greenyellow | 196 | 18.02123
darkolivegreen | 64 | 17.54934
cyan | 182 | 16.91372
magenta | 218 | 15.71001
red | 245 | 15.25065
orange | 115 | 13.74684

Table 2: 12 highly preserved modules in two data sets according to unmerged modules

Module | Module Size | Z-Score
paleturquoise | 1000 | 37.71636
darkolivegreen | 661 | 30.84493
magenta | 973 | 20.76983
orange | 360 | 17.6227
green | 529 | 17.0803
lightyellow | 153 | 16.66606
tan | 185 | 14.32406
lightcyan | 163 | 13.06638
saddlebrown | 97 | 12.72574
midnightblue | 167 | 11.53779
lightgreen | 159 | 11.48454
black | 407 | 10.87629

Table 3: 12 highly preserved modules in two data sets according to merged modules

Relating modules to external information

Module-trait relationship

After assessing module preservation, the top 12 preserved modules were considered for analyzing the relationship between the data sets. The module-trait relationship was assessed using the preserved modules to identify modules that are significantly associated with the traits. For this, module eigengenes (MEs) were calculated for both data sets and correlated with the trait data. The following R code calculates the MEs in the two data sets.

    PCs1A = moduleEigengenes(t(datExprA1g), colors = moduleColors)
    ME_1A = PCs1A$eigengenes  # module eigengenes for the first data set
    PCs2A = moduleEigengenes(t(datExprA2g), colors = modulesA1)  # modulesA1: module labels imposed on the second data set (defined elsewhere)
    ME_2A = PCs2A$eigengenes  # module eigengenes for the second data set

The trait data for the analysis are the growth conditions of G. Oceanica and E. Huxleyi. The following combinations of growth conditions were used as trait data:

0Ca.NoSpike: 0 mM calcium without a spike of sodium carbonate
0Ca.Spike: 0 mM calcium with a spike of sodium carbonate
9Ca.NoSpike: 9 mM calcium without a spike of sodium carbonate
9Ca.Spike: 9 mM calcium with a spike of sodium carbonate
NoSpike: no spike of sodium carbonate
Spike: spike of sodium carbonate
0Ca: 0 mM calcium
9Ca: 9 mM calcium

A binary table was created to hold the above conditions as trait data. The trait data were then correlated with the module eigengenes using Pearson correlation to identify modules that are significantly associated with the traits. The R code is given below:

    moduleTraitCorA1 = cor(ME_1A, datTraits, use = "p")  # Pearson correlation (the default method) between module eigengenes and traits; use = "p" selects pairwise complete observations
    moduleTraitPvalueA1 = corPvalueStudent(moduleTraitCorA1, 8)  # Student p-values for 8 samples
    moduleTraitCorA2 = cor(ME_2A, datTraits, use = "p")
    moduleTraitPvalueA2 = corPvalueStudent(moduleTraitCorA2, 8)
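The eigengene and its trait correlation can be reproduced in base R on a toy module. The sketch assumes that a module eigengene is the first principal component of the scaled expression matrix and that the p-value is the two-sided Student t-test p-value for a correlation; the expression values and the binary trait are invented.

```r
# Toy module (hypothetical): 8 samples in rows, 3 genes in columns.
expr <- cbind(g1 = c(1, 2, 1, 2, 5, 6, 5, 6),
              g2 = c(2, 3, 2, 3, 7, 8, 7, 8),
              g3 = c(0, 1, 1, 0, 4, 5, 5, 4))
spike <- c(0, 0, 0, 0, 1, 1, 1, 1)   # binary trait: 1 = spike of sodium carbonate

# Eigengene: first left singular vector of the scaled expression matrix.
scaledExpr <- scale(expr)
ME <- svd(scaledExpr)$u[, 1]

# Align the sign so the eigengene follows the average expression of the module.
if (cor(ME, rowMeans(scaledExpr)) < 0) ME <- -ME

# Pearson correlation with the trait and its Student p-value.
n <- length(spike)
r <- cor(ME, spike)
tStat <- r * sqrt((n - 2) / (1 - r^2))
pValue <- 2 * pt(-abs(tStat), df = n - 2)
```

With only 8 samples, correlations need to be quite strong to reach small p-values, which is worth keeping in mind when reading the heatmaps.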

The result of the correlation step is a module-trait relationship matrix, which is plotted as a heatmap in the figures that follow.

Unmerged modules

• G. Oceanica module-trait relationship

Figure 14: Module-trait relationship heatmap for G. Oceanica (unmerged modules)

Each row in Figure 14 corresponds to a module eigengene, and each column to a trait. The numbers in the table report the correlation between the corresponding module eigengene and trait, with the p-value printed below the correlation in parentheses. The table is color coded by correlation according to the legend on the right side of the heatmap. Red indicates up-regulation, blue indicates down-regulation, and white indicates no change in expression level.

Figure 14 also shows that many module eigengenes of G. Oceanica had strong correlations with the Spike/NoSpike conditions but weaker correlations with the 0Ca/9Ca conditions. Modules brown, purple, grey60, yellow, red and orange were up-regulated with spike, and modules pink, lightyellow, greenyellow, darkolivegreen, cyan and magenta were down-regulated.

• E. Huxleyi module-trait relationship

Figure 15: Module-trait relationship heatmap for E. Huxleyi (unmerged modules)

From the heatmap in Figure 15, it can be inferred that the module eigengenes of E. Huxleyi show a pattern similar to that of the G. Oceanica module eigengenes. They have strong correlations with the Spike/NoSpike conditions but weaker correlations with the 0Ca/9Ca conditions. Modules brown, purple, grey60, yellow, red and orange were up-regulated with spike, and modules pink, lightyellow, greenyellow, darkolivegreen, cyan and magenta were down-regulated.

The similar pattern of module-trait relationships in both data sets also indicates that the module assignment in the two data sets is consistent.

Merged modules

• G. Oceanica module-trait relationship

Figure 16: Module-trait relationship heatmap for G. Oceanica (merged modules)

Figure 16 shows that many eigengenes of the merged G. Oceanica modules also have strong correlations with the Spike/NoSpike conditions but weaker correlations with the 0Ca/9Ca conditions. Modules paleturquoise, orange, lightgreen and black were up-regulated with spike, and modules darkolivegreen, magenta, green, lightyellow, tan, saddlebrown and midnightblue were down-regulated.

• E. Huxleyi module-trait relationship

Figure 17: Module-trait relationship heatmap for E. Huxleyi (merged modules)

In Figure 17, the merged E. Huxleyi module eigengenes green, tan, lightcyan, lightgreen and black differ from the corresponding merged G. Oceanica module eigengenes in Figure 16. This may be the result of the larger module sizes and their effect on the eigengene calculation.

Functional Enrichment Analysis

The gene ontology (GO) terms for all genes in the G. Oceanica and E. Huxleyi data sets were extracted using the Trinotate tool [5]. GO enrichment analyses were then performed on the 12 modules using the Trinity software [10]. This analysis returns functional categories that are overrepresented with respect to a reference gene set. The reference gene set here was all genes considered for the WGCNA analysis of both G. Oceanica and E. Huxleyi.

The top 15 functional GO terms, sorted from lowest to highest P-value, were considered when analyzing gene functions within modules. The P-value is the probability of observing an enrichment at least as strong as the one found if the GO term were not truly overrepresented in the module. Following the guilt-by-association principle, the top 15 gene functions can be regarded as the functions of the whole module. The bar plots below show gene functions for the E. Huxleyi modules that contain biomineralization genes (explained in section 4.8.3). Because the P-values were small, the negative log10 of each P-value was taken to make the graphs easier to read. Therefore, the higher the -log10(P-value), the lower the probability that the GO term was obtained by chance.
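The selection and transformation of P-values can be sketched in base R. The three rows below reuse rounded P-values from the brown module table in Appendix 1; the data frame layout and column names are assumptions for illustration.

```r
# GO terms with rounded P-values taken from the brown module table (Appendix 1).
go <- data.frame(term = c("ribosome", "ribosomal subunit", "translation"),
                 pvalue = c(1.66e-13, 2.34e-20, 4.63e-18),
                 stringsAsFactors = FALSE)

# Sort from lowest to highest P-value and keep at most the top 15 terms.
go <- go[order(go$pvalue), ]
top <- head(go, 15)

# Negative log10 turns very small P-values into readable bar heights.
top$negLog10P <- -log10(top$pvalue)
print(top)
```

On this scale each additional unit corresponds to a tenfold smaller P-value, so a bar of height 19.6 is about a million times more significant than one of height 13.6.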

Unmerged E. Huxleyi modules

I. Brown module:

Figure 18: Bar plot of -log10(P-value) vs. function for brown module

II. Pink module:

Figure 19: Bar plot of -log10(P-value) vs. function for pink module

III. Greenyellow module:

Figure 20: Bar plot of -log10(P-value) vs. function for greenyellow module

IV. Grey60 module:

Figure 21: Bar plot of -log10(P-value) vs. function for grey60 module

V. Yellow module:

Figure 22: Bar plot of -log10(P-value) vs. function for yellow module

VI. Purple module:

Figure 23: Bar plot of -log10(P-value) vs. function for purple module

VII. Darkolivegreen module:

Figure 24: Bar plot of -log10(P-value) vs. function for darkolivegreen module

Merged E. Huxleyi modules

I. Paleturquoise module:

Figure 25: Bar plot of -log10(P-value) vs. function for paleturquoise module

II. Darkolivegreen module:

Figure 26: Bar plot of -log10(P-value) vs. function for darkolivegreen module

III. Black module:

Figure 27: Bar plot of -log10(P-value) vs. function for black module

IV. Orange module:

Figure 28: Bar plot of -log10(P-value) vs. function for orange module

Relating modules to biomineralization genes

The investigation of genes involved in the process of calcification might also provide information about changes in calcification under future ocean conditions. Genes putatively related to calcification (e.g. calcium and inorganic carbon transport, H+ transport and carbonic anhydrases) have been identified via gene expression studies comparing calcifying (potentially by forming the organic biomineralization surfaces) and non-calcifying E. Huxleyi cells [17].

A total of 85 genes have been identified as taking part in the biomineralization process in E. Huxleyi. These genes were matched against all modules to find out which modules they belong to. As a result, 18 biomineralization genes were found to fall within one of the WGCNA modules. The bar plots in Figures 29 and 30 show the number of biomineralization genes in each module. Modules are plotted in order of preservation, from the highest at the top to the lowest at the bottom of the graph.
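Matching a gene list against module assignments amounts to a lookup followed by a count. The base R sketch below uses three gene IDs and module colors that appear in Table 4, plus two made-up IDs (111111 and 999999) to show a non-biomineralization gene and a gene absent from the network.

```r
# Module color per gene ID. The first four pairs follow Table 4; "111111" is hypothetical.
moduleColors <- c("416800" = "brown", "373149" = "brown",
                  "439538" = "grey60", "436956" = "pink",
                  "111111" = "yellow")

# Biomineralization gene list; "999999" is a hypothetical ID not present in the network.
biominGenes <- c("416800", "373149", "439538", "999999")

# Keep the genes that fall in a module and count them per module.
hits <- biominGenes[biominGenes %in% names(moduleColors)]
genesPerModule <- table(moduleColors[hits])
print(genesPerModule)
```

The resulting counts per module are the quantities a bar plot such as Figure 29 displays.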

Unmerged modules

Figure 29: No. of biomineralization genes in each unmerged WGCNA module

The GO annotation term was then checked for each biomineralization gene, and a table was compiled listing, for each gene, the specific (rather than generic) GO function with the lowest P-value. The table also lists the genes that share the same GO term. These lists can be used to study the biomineralization process in detail. The table is as follows.

Gene ID | Module | Biomineralization process | GO term | P-value | Function | All genes with same term
416800 | brown | Ca2+/H+ exchanger (CAX3) (CAX family) | GO:0099516 | 3.45E-05 | ion antiporter activity | 200599, 354606, 416800, 433656, 453445, 99943
373149 | brown | carbonic anhydrase, gamma | NA | NA | NA | NA
99943 | brown | anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers | NA | NA | NA | NA
413949 | brown | V-ATPase, D | GO:0006811 | 0.001766593 | ion transport | 198810, 200599, 213954, 214474, 217075, 239455, 239659, 354606, 413949, 416800, 426015, 430018, 433371, 433656, 437003, 438874, 447975, 453445, 96194, 99943
466232 | darkolivegreen | anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers | GO:0000324 | 0.003746599 | fungal-type vacuole | 442901, 466232
463266 | greenyellow | fibrillins and related proteins containing Ca2+-binding EGF-like domains | GO:0007165 | 0.007811999 | signal transduction | 201455, 225639, 253074, 353175, 353329, 420398, 431876, 433276, 434388, 436621, 438194, 441886, 45288, 463266, 466586, 50390
439538 | grey60 | V-ATPase, A | GO:0033178 | 5.34E-08 | proton-transporting two-sector ATPase complex, catalytic domain | 433142, 435128, 439538, 461699, 95543
435128 | grey60 | V-ATPase, B | GO:0046034 | 2.38E-07 | ATP metabolic process | 107385, 433142, 435128, 436550, 439538, 461699, 63832, 68485
196760 | paleturquoise | bicarbonate transporter, anion exchanger-like Cl-/HCO3- exchangers | GO:0021860 | 0.009237543 | pyramidal neuron development | 196760
436956 | pink | anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers | GO:0000324 | 0.036877167 | fungal-type vacuole | 435850, 436956
434034 | pink | eukaryotic Na+/H+ exchanger | GO:0098588 | 0.021729088 | bounding membrane of organelle | 214071, 251432, 355894, 368295, 432191, 434034, 434324, 436125, 436956, 437226, 437571, 437586, 437972, 439489, 452762, 453191, 455644, 471368, 50604
469783 | purple | bicarbonate transporter | GO:0008514 | 0.003169881 | organic anion transmembrane transporter activity | 231096, 244268, 451704, 462391, 465146, 469783, 96009
439740 | red | H+ PPase | NA | NA | NA | NA
420005 | salmon | V-ATPase, D | GO:0016887 | 0.020055213 | ATPase activity | 194578, 199049, 256467, 258138, 259135, 420005, 439359, 44664, 47896
62679 | turquoise | carbonic anhydrase, alpha | NA | NA | NA | NA
464767 | turquoise | V-ATPase, A | GO:0006873 | 0.00340069 | cellular ion homeostasis | 100055, 103027, 111405, 196760, 217082, 222551, 369600, 416312, 418014, 431695, 441105, 44130, 446686, 448708, 462143, 464767, 468148, 97320
63173 | yellow | carbonic anhydrase, gamma | NA | NA | NA | NA
72273 | yellow | cation/H+ exchanger (CAX family) | GO:0015085 | 0.001900602 | calcium ion transmembrane transporter activity | 453185, 72273, 78746, 94977

Table 4: Enriched GO terms overlapped with biomineralization genes for E. Huxleyi unmerged modules analysis

Merged modules

Figure 30: No. of biomineralization genes in each merged WGCNA module

As for the unmerged modules, the GO annotation term was checked for each biomineralization gene in the merged-module analysis, and a table was compiled listing, for each gene, the specific (rather than generic) GO function with the lowest P-value. The table is as follows.

Gene ID | Module | Biomineralization process | GO term | Over represented P-value | Function | All genes with same term
420005 | black | V-ATPase, D | GO:0016887 | 0.001925979 | ATPase activity | 110859, 194578, 199049, 221221, 256467, 258138, 259135, 420005, 420596, 437166, 439359, 44664, 461008, 463439, 47896, 55447, 63150, 63180, 76565
436956 | darkolivegreen | anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers | NA | NA | NA | NA
434034 | darkolivegreen | eukaryotic Na+/H+ exchanger | GO:0098657 | 0.013852678 | import into cell | 434034, 464098
466232 | darkolivegreen | anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers | GO:0000324 | 0.00481779 | fungal-type vacuole | 435850, 436956, 442901, 466232
463266 | darkolivegreen | fibrillins and related proteins containing Ca2+-binding EGF-like domains | NA | NA | NA | NA
439740 | orange | H+ PPase | GO:0003824 | NA | NA | NA
439538 | paleturquoise | V-ATPase, A | GO:1902600 | 0.00013765 | hydrogen ion transmembrane transport | 196760, 216523, 413949, 433142, 435128, 439538, 457332, 461699, 95543
63173 | paleturquoise | carbonic anhydrase, gamma | NA | NA | NA | NA
196760 | paleturquoise | bicarbonate transporter, anion exchanger-like Cl-/HCO3- exchangers | GO:0015301 | 8.98E-08 | anion:anion antiporter activity | 196760, 200599, 432785, 433656, 451704, 453445, 462391, 465146, 469783, 99943
416800 | paleturquoise | Ca2+/H+ exchanger (CAX3) (CAX family) | GO:0015078 | 0.000119598 | hydrogen ion transmembrane transporter activity | 100529, 216523, 354280, 413949, 416800, 433142, 439538, 443369, 457332, 461699, 72273
469783 | paleturquoise | bicarbonate transporter | GO:0015301 | 8.98E-08 | anion:anion antiporter activity | 196760, 200599, 432785, 433656, 451704, 453445, 462391, 465146, 469783, 99943
373149 | paleturquoise | carbonic anhydrase, gamma | NA | NA | NA | NA
435128 | paleturquoise | V-ATPase, B | GO:0015988 | 0.000101409 | energy coupled proton transmembrane transport, against electrochemical gradient | 216523, 413949, 435128, 439538, 457332, 95543
72273 | paleturquoise | cation/H+ exchanger (CAX family) | GO:0015491 | 0.002781175 | cation:cation antiporter activity | 100529, 354606, 416800, 72273
99943 | paleturquoise | anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers | GO:0015301 | 8.98E-08 | anion:anion antiporter activity | 196760, 200599, 432785, 433656, 451704, 453445, 462391, 465146, 469783, 99943
413949 | paleturquoise | V-ATPase, D | GO:0090662 | 0.000101409 | ATP hydrolysis coupled transmembrane transport | 216523, 413949, 435128, 439538, 457332, 95543
62679 | turquoise | carbonic anhydrase, alpha | NA | NA | NA | NA
464767 | turquoise | V-ATPase, A | GO:0050801 | 0.004355867 | ion homeostasis | 100055, 103027, 104009, 199932, 214231, 217082, 222551, 369600, 416312, 418014, 423009, 431695, 441105, 44130, 446686, 448708, 451134, 462143, 464767, 468148, 471020, 56555, 97320

Table 5: Enriched GO terms overlapped with biomineralization genes for E. Huxleyi merged modules analysis

Detailed information about all enriched GO terms that overlap with biomineralization genes is available in APPENDIX 2 (for the unmerged modules analysis) and APPENDIX 5 (for the merged modules analysis).

Relating modules to lipid metabolism genes

‘Lipid metabolism’ is a process that involves the breakdown of fatty acid molecules in cells, as well as their synthesis to form more complex lipid structures [18]. The differentially expressed genes shared by E. Huxleyi and G. Oceanica were compared to the lipid metabolism proteins present in E. Huxleyi. Following steps similar to those used for the biomineralization genes, 707 genes were identified as taking part in lipid metabolism in E. Huxleyi, of which 466 remained after removing duplicates.

These genes were matched against all modules to find out which modules they belong to. As a result, 127 lipid genes were found to fall within one of the WGCNA modules. All lipid genes, with their gene IDs and associated module colors, are available in tabular format in APPENDIX 3 and APPENDIX 6. The bar plots in Figures 31 and 32 show the number of lipid metabolism genes in each module. Modules are plotted in order of preservation, from the highest at the top to the lowest at the bottom of the graph.

Unmerged modules

Figure 31: No. of lipid metabolism genes in each unmerged WGCNA module

Merged modules

Figure 32: No. of lipid metabolism genes in each merged WGCNA module

5 CONCLUSION AND FUTURE WORK

In this project, two data sets of coccolithophore species were explored by WGCNA analysis. For each data set, modules consisting of co-expressed genes were constructed based on Topological Overlap (TO), and the modules were then related to the additional data available for the data set. The goal of the project was to construct WGCNA modules and find differentially expressed genes in the data sets. The data sets were divided into 34 modules when modules with similar eigengenes were not grouped together, and into 19 modules when they were grouped together. Detailed analysis was performed separately on these two module sets and the results were discussed.

When the modules were compared across growth conditions, a substantial change in gene expression levels was found in both data sets between conditions with and without a spike of sodium carbonate. Modules, which are groups of genes, were either up-regulated or down-regulated in response to the change in growth conditions. There was little change in expression level between growth conditions with and without 9 mM calcium. Modules were also compared with biomineralization genes and then associated with gene ontology (GO) functions to find the functions associated with biomineralization genes. Appendix 2 shows the table relating modules to biomineralization genes and GO functions. Similarly, modules were compared with lipid metabolism genes to find out which of these genes fall within WGCNA modules and how many there are.

In the future, the results obtained can be compared with those of other gene co-expression network analysis methods, such as HO-GSVD or DiffCoEx, to make sure that the results are consistent. Similarly, additional data sets and traits should be obtained to repeat the analysis and confirm that the results are in agreement.

6 REFERENCES

[1] Betsy A. Read, Xiaoyu Zhang (11 July 2013). Pan genome of the phytoplankton Emiliania underpins its global distribution.

[2] Jorg Bollmann, Christine Klaas, and Larry E. Brand. Morphological and Physiological Characteristics of Gephyrocapsa oceanica var. typica Kamptner 1943 in Culture Experiments: Evidence for Genotypic Variability. 19 October 2009

[3] Bollmann J. Morphology and biogeography of Gephyrocapsa coccoliths in sediments. Mar Micropaleontol 29 (1997) 319–350

[4] Jason A. Reuter, Damek Spacek, Michael P. Snyder. High-Throughput Sequencing Technologies. 2015 May 21

[5] Bryant DM, Johnson K, DiTommaso T. A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors. 2016/12

[6] Thurman, H.V., 2007. Introductory Oceanography. Academic Internet Publishers

[7] Bown, P.R.; Lees, J.A.; Young, J.R. (August 17, 2004). "Calcareous nannoplankton evolution and diversity through time". In Thierstein, Hans R.; Young, Jeremy R. Coccolithophores-from molecular processes to global impact.

[8] Amos Winter; William G. Siesser (2006). Coccolithophores. Cambridge University Press.

[9] Mark D Robinson, Alicia Oshlack (2 March 2010). A scaling normalization method for differential expression analysis of RNA-seq data. BioMed Central Ltd.

[10] Brian J. Haas (2013 Jul 11). De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity.

[11] Zhang, B. and S. Horvath, A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol, 2005.

[12] Almaas, E., Biological impacts and context of network theory. J Exp Biol, 2007.

[13] Michael I Love, Simon Anders, Wolfgang Huber, “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2” Genome Biology 2014.

[14] Peter Langfelder, Steve Horvath (2008). WGCNA: An R package for weighted correlation network analysis.

[15] Izenman A.J. (2013) Cluster Analysis. In: Modern Multivariate Statistical Techniques. Springer Texts in Statistics. Springer, New York, NY.

[16] Langfelder, P. and S. Horvath, Eigengene networks for studying the relationships between co-expression modules.

[17] Ina Benner, Rachel E. Diner, Stephane C. Lefebvre, Dian Li, Tomoko Komada, Edward J. Carpenter, Jonathon H. Stillman (26 August 2013). E huxleyi increases calcification but not expression of calcification-related genes in long-term exposure to elevated temperature and pCO2.

[18] Geoffrey M. Cooper, Robert E. Hausman (2015-10-08). The Cell: A Molecular Approach.

[19] Jianqiang Li, Doudou Zhou, Weiliang Qiu, Yuliang Shi, Ji-Jiang Yang, Shi Chen, Qing Wang, Hui Pan (2018/01/12). Application of Weighted Gene Co-expression Network Analysis for Data from Paired Design

[20] Xiaolin Xiao, Aida Moreno, Maxime Rotival, L Bottolo, Enrico Petretto (January 2014). Multi-tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules.

[21] Amir Foroushani, Rupesh Agrahari, Roderick Docking, Linda Chang, Gerben Duns, Monika Hudoba, Aly Karsan and Habil Zare (8 March 2017), Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications

[22] Bin Zhang, Steve Horvath (2005), Statistical Applications in Genetics and Molecular Biology

7 APPENDICES

APPENDIX 1 – GO analysis table for unmerged E. Huxleyi data modules

• Brown
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0044391  2.34E-20                 19.63057099   26          55        ribosomal subunit
GO:0044445  1.42E-19                 18.84845806   25          55        cytosolic part
GO:0006412  4.63E-18                 17.33403677   31          98        translation
GO:0003735  7.26E-18                 17.13895807   29          85        structural constituent of ribosome
GO:0043043  2.07E-17                 16.68363436   31          103       peptide biosynthetic process
GO:0022625  3.00E-17                 16.52220162   16          22        cytosolic large ribosomal subunit
GO:0043604  7.49E-17                 16.12552288   32          116       amide biosynthetic process
GO:0015934  6.63E-16                 15.17827732   18          33        large ribosomal subunit
GO:0006518  1.18E-15                 14.92774681   32          127       peptide metabolic process
GO:0005198  1.25E-15                 14.90239514   30          111       structural molecule activity
GO:0030529  8.17E-15                 14.08790011   42          233       intracellular ribonucleoprotein complex
GO:1990904  8.17E-15                 14.08790011   42          233       ribonucleoprotein complex
GO:0043603  1.23E-13                 12.90962141   33          159       cellular amide metabolic process
GO:1901566  1.30E-13                 12.88610354   49          340       organonitrogen compound biosynthetic process
GO:0005840  1.66E-13                 12.78085326   23          75        ribosome

• Cyan
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0043161  0.000149461              3.825471839   8           62        proteasome-mediated ubiquitin-dependent protein catabolic process
GO:0010498  0.000168611              3.77311307    8           63        proteasomal protein catabolic process
GO:0034657  0.0007099                3.148803107   2           2         GID complex
GO:0000109  0.000709949              3.148772643   2           2         nucleotide-excision repair complex
GO:0030163  0.001166738              2.933026586   8           84        protein catabolic process
GO:1990391  0.001169306              2.932071843   3           9         DNA repair complex
GO:0034968  0.001179428              2.928328458   4           19        histone lysine methylation
GO:0044248  0.00131378               2.881477288   18          320       cellular catabolic process
GO:0006511  0.001377626              2.860868615   8           86        ubiquitin-dependent protein catabolic process
GO:0019941  0.001377626              2.860868615   8           86        modification-dependent protein catabolic process
GO:0051569  0.00176641               2.752908604   3           10        regulation of histone H3-K4 methylation
GO:0016571  0.001777743              2.750130929   4           21        histone methylation
GO:0009312  0.001797327              2.745372881   3           10        oligosaccharide biosynthetic process
GO:0006296  0.002086156              2.680653217   2           3         nucleotide-excision repair, DNA incision, 5'-to lesion
GO:0043632  0.002166638              2.664213703   8           92        modification-dependent macromolecule catabolic process

• Darkolivegreen
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0042284  3.49E-05                 4.457523008   2           2         sphingolipid delta-4 desaturase activity
GO:0000982  0.001485765              2.828049952   2           6         transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding
GO:0001077  0.001485765              2.828049952   2           6         transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding
GO:0001228  0.001485765              2.828049952   2           6         transcriptional activator activity, RNA polymerase II transcription regulatory region sequence-specific binding
GO:0000978  0.002056524              2.686866163   2           6         RNA polymerase II core promoter proximal region sequence-specific DNA binding
GO:0001134  0.002530524              2.59678955    2           9         transcription factor activity, transcription factor recruiting
GO:0001135  0.002530524              2.59678955    2           9         transcription factor activity, RNA polymerase II transcription factor recruiting
GO:0000987  0.003331482              2.477362531   2           8         core promoter proximal region sequence-specific DNA binding
GO:0001159  0.003331482              2.477362531   2           8         core promoter proximal region DNA binding
GO:0000324  0.003746599              2.426362831   2           7         fungal-type vacuole
GO:0000322  0.004348456              2.361664942   2           8         storage vacuole
GO:0016717  0.005816333              2.235350737   2           10        oxidoreductase activity, acting on paired donors, with oxidation of a pair of donors resulting in the reduction of molecular oxygen to two molecules of water
GO:0047066  0.005980475              2.223264293   1           1         phospholipid-hydroperoxide glutathione peroxidase activity
GO:0031058  0.006601959              2.180327184   2           11        positive regulation of histone modification
GO:1905269  0.006601959              2.180327184   2           11        positive regulation of chromatin organization

• Greenyellow
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0008361  3.89E-06                 5.410396609   7           24        regulation of cell size
GO:0032535  3.89E-06                 5.410396609   7           24        regulation of cellular component size
GO:0043254  1.08E-05                 4.968103237   9           48        regulation of protein complex assembly
GO:0032271  1.79E-05                 4.746941244   6           21        regulation of protein polymerization
GO:0090066  1.87E-05                 4.72847559    7           29        regulation of anatomical structure size
GO:0032956  3.09E-05                 4.509711796   6           23        regulation of actin cytoskeleton organization
GO:0044087  5.15E-05                 4.288467999   10          72        regulation of cellular component biogenesis
GO:0030833  6.67E-05                 4.175792146   5           16        regulation of actin filament polymerization
GO:0032970  7.56E-05                 4.1211996     6           26        regulation of actin filament-based process
GO:1902903  9.13E-05                 4.039521537   6           28        regulation of supramolecular fiber organization
GO:0008064  0.000154238              3.811807664   5           19        regulation of actin polymerization or depolymerization
GO:0030832  0.000154238              3.811807664   5           19        regulation of actin filament length
GO:0051493  0.000606729              3.217005222   6           39        regulation of cytoskeleton organization
GO:0051893  0.000647275              3.18891121    2           2         regulation of focal adhesion assembly
GO:0090109  0.000647275              3.18891121    2           2         regulation of cell-junction assembly

• Grey60
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0009521  1.85E-20                 19.73355269   18          35        photosystem
GO:0009523  4.25E-19                 18.37184211   17          34        photosystem II
GO:0098796  8.26E-19                 18.08298279   27          128       membrane protein complex
GO:0006091  5.13E-18                 17.2897671    22          82        generation of precursor metabolites and energy
GO:0009765  2.70E-17                 16.56835765   15          29        photosynthesis, light harvesting
GO:0016168  2.70E-17                 16.56835765   15          29        chlorophyll binding
GO:0030076  3.28E-17                 16.48473692   14          24        light-harvesting complex
GO:0044436  1.61E-16                 15.79329517   23          105       thylakoid part
GO:0018298  1.72E-15                 14.76500784   15          37        protein-chromophore linkage
GO:0009535  1.81E-15                 14.74272027   20          82        chloroplast thylakoid membrane
GO:0055035  2.41E-15                 14.61885007   20          83        plastid thylakoid membrane
GO:0034357  7.23E-15                 14.1407674    20          88        photosynthetic membrane
GO:0042651  7.23E-15                 14.1407674    20          88        thylakoid membrane
GO:0046906  9.93E-14                 13.00305527   17          70        tetrapyrrole binding
GO:0032991  1.47E-12                 11.83133921   52          810       macromolecular complex

• Lightyellow
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0010586  0.000362808              3.440323374   3           7         miRNA metabolic process
GO:0042819  0.000805974              3.093679183   3           9         vitamin B6 biosynthetic process
GO:0008614  0.001353139              2.868657663   2           3         pyridoxine metabolic process
GO:0008615  0.001353139              2.868657663   2           3         pyridoxine biosynthetic process
GO:0008543  0.001579936              2.801360537   2           3         fibroblast growth factor receptor signaling pathway
GO:0035278  0.001579936              2.801360537   2           3         miRNA mediated inhibition of translation
GO:0040033  0.001579936              2.801360537   2           3         negative regulation of translation, ncRNA-mediated
GO:0045974  0.001579936              2.801360537   2           3         regulation of translation, ncRNA-mediated
GO:0072525  0.001996589              2.699711275   3           12        pyridine-containing compound biosynthetic process
GO:0035198  0.002916771              2.535097731   2           4         miRNA binding
GO:0042816  0.003172292              2.498626815   3           14        vitamin B6 metabolic process
GO:0009443  0.004458681              2.350793637   2           5         pyridoxal 5'-phosphate salvage
GO:0061158  0.004514527              2.345387777   2           5         3'-UTR-mediated mRNA destabilization
GO:0043487  0.004800654              2.318699555   3           16        regulation of RNA stability
GO:0043488  0.004800654              2.318699555   3           16        regulation of mRNA stability

• Magenta
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0022411  0.000373297              3.427945196   5           21        cellular component disassembly
GO:0005776  0.000412509              3.384566547   3           6         autophagosome
GO:0004396  0.000940968              3.026425127   2           2         hexokinase activity
GO:0000421  0.002647117              2.57722687    2           3         autophagosome membrane
GO:1901068  0.004014869              2.396328658   3           11        guanosine-containing compound metabolic process
GO:0006914  0.004165706              2.380311392   4           22        autophagy
GO:0000422  0.005002392              2.300822253   2           4         autophagy of mitochondrion
GO:0061726  0.005002392              2.300822253   2           4         mitochondrion disassembly
GO:0007219  0.005267699              2.278379055   3           12        Notch signaling pathway
GO:0000045  0.005367477              2.270229776   2           4         autophagosome assembly
GO:1905037  0.005367477              2.270229776   2           4         autophagosome organization
GO:0098805  0.005894782              2.229532274   12          170       whole membrane
GO:1903008  0.008388961              2.076291817   2           5         organelle disassembly
GO:0005774  0.008982985              2.046579326   8           96        vacuolar membrane
GO:0003887  0.009102164              2.04085535    3           15        DNA-directed DNA polymerase activity

• Orange
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0042597  0.002466865              2.607854564   3           17        periplasmic space
GO:0016413  0.003834718              2.416266571   2           6         O-acetyltransferase activity
GO:0016878  0.007427017              2.129185561   2           8         acid-thiol ligase activity
GO:0035251  0.009076494              2.042081863   2           9         UDP-glucosyltransferase activity
GO:0016646  0.010812517              1.966073208   2           10        oxidoreductase activity, acting on the CH-NH group of donors, NAD or NADP as acceptor
GO:0050728  0.013578759              1.867139907   2           11        negative regulation of inflammatory response
GO:0003824  0.013959039              1.855144475   53          2508      catalytic activity
GO:1901264  0.014906709              1.826618222   3           32        carbohydrate derivative transport
GO:0071248  0.015619886              1.806322137   2           12        cellular response to metal ion
GO:0006612  0.015666303              1.805033482   2           12        protein targeting to membrane
GO:1903469  0.016059822              1.794259267   1           1         removal of RNA primer involved in mitotic DNA replication
GO:0036041  0.016130137              1.792361935   1           1         long-chain fatty acid binding
GO:1904012  0.016130137              1.792361935   1           1         platinum binding
GO:1904013  0.016130137              1.792361935   1           1         xenon atom binding
GO:0008484  0.016159467              1.791572975   2           12        sulfuric ester hydrolase activity

• Pink
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0010008  0.000770023              3.113496404   7           47        endosome membrane
GO:0009395  0.001185019              2.926274785   3           7         phospholipid catabolic process
GO:0098805  0.001216391              2.914926899   15          170       whole membrane
GO:0044440  0.001297293              2.88696188    7           51        endosomal part
GO:0008289  0.001673468              2.776382466   7           52        lipid binding
GO:0098657  0.002350327              2.62887179    2           2         import into cell
GO:0046434  0.002365478              2.62608101    5           31        organophosphate catabolic process
GO:0015837  0.002867329              2.542522465   2           2         amine transport
GO:0006644  0.003181461              2.497373337   7           58        phospholipid metabolic process
GO:0030136  0.00357084               2.447229665   2           3         clathrin-coated vesicle
GO:1901981  0.00411028               2.386128584   3           11        phosphatidylinositol phosphate binding
GO:0031901  0.004163755              2.380514816   3           13        early endosome membrane
GO:0032091  0.004252179              2.371388496   2           5         negative regulation of protein binding
GO:0005886  0.004276824              2.368878665   23          362       plasma membrane
GO:0007589  0.004377184              2.358805154   2           3         body fluid secretion

• Purple
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0099516  6.15E-05                 4.210956254   5           16        ion antiporter activity
GO:0016757  0.000160256              3.79518652    13          138       transferase activity, transferring glycosyl groups
GO:0044446  0.000163907              3.785402992   71          1606      intracellular organelle part
GO:0015301  0.000185694              3.731202729   4           11        anion:anion antiporter activity
GO:0044422  0.00022774               3.642560976   71          1622      organelle part
GO:0043228  0.000311338              3.506767899   27          438       non-membrane-bounded organelle
GO:0043232  0.000311338              3.506767899   27          438       intracellular non-membrane-bounded organelle
GO:0016020  0.00060724               3.21663966    54          1195      membrane
GO:0031224  0.000622207              3.206064906   45          948       intrinsic component of membrane
GO:0005575  0.000806099              3.093611368   126         3447      cellular_component
GO:0018279  0.000832989              3.079360629   3           7         protein N-linked glycosylation via asparagine
GO:0044464  0.000898898              3.046289752   118         3182      cell part
GO:0016021  0.000927041              3.032901069   44          938       integral component of membrane
GO:0044444  0.000950263              3.022156065   76          1854      cytoplasmic part
GO:1902570  0.001001827              2.99920731    2           2         protein localization to nucleolus

• Red
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0006782  7.92E-05                 4.101150903   4           8         protoporphyrinogen IX biosynthetic process
GO:0009536  0.000120692              3.91832115    20          228       plastid
GO:0046501  0.00013995               3.854026517   4           9         protoporphyrinogen IX metabolic process
GO:0015995  0.000286796              3.542426928   5           18        chlorophyll biosynthetic process
GO:0009507  0.000359448              3.444364191   18          210       chloroplast
GO:0046148  0.000474561              3.323707841   7           43        pigment biosynthetic process
GO:0006779  0.000759885              3.119252239   5           22        porphyrin-containing compound biosynthetic process
GO:0033013  0.001424422              2.846361275   7           49        tetrapyrrole metabolic process
GO:1990726  0.001491325              2.826427669   2           2         Lsm1-7-Pat1 complex
GO:0033014  0.002041109              2.690133872   5           27        tetrapyrrole biosynthetic process
GO:0009543  0.0022676                2.644433603   4           16        chloroplast thylakoid lumen
GO:0031978  0.0022676                2.644433603   4           16        plastid thylakoid lumen
GO:0046906  0.002524455              2.597832355   8           70        tetrapyrrole binding
GO:0042440  0.003184082              2.497015743   7           58        pigment metabolic process
GO:0009407  0.003896513              2.409323922   2           3         toxin catabolic process

• Yellow
category    over_represented_pvalue  -Log(pvalue)  numDEInCat  numInCat  term
GO:0030684  2.01E-08                 7.697878896   10          26        preribosome
GO:0030688  5.64E-05                 4.248685791   3           3         preribosome, small subunit precursor
GO:0006364  0.000120007              3.920792133   14          115       rRNA processing
GO:0034660  0.000149537              3.825250486   20          208       ncRNA metabolic process
GO:0001650  0.000161598              3.79156344    4           8         fibrillar center
GO:0016072  0.000209752              3.67829419    14          121       rRNA metabolic process
GO:0000462  0.000216025              3.665495431   3           4         maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)
GO:0022613  0.000455891              3.341138857   7           37        ribonucleoprotein complex biogenesis
GO:0030687  0.000561605              3.250569059   4           11        preribosome, large subunit precursor
GO:0033198  0.000573822              3.241222872   3           5         response to ATP
GO:0034470  0.000821617              3.085330347   16          170       ncRNA processing
GO:0005730  0.000975928              3.010582213   15          155       nucleolus
GO:0071470  0.001031923              2.986352514   3           6         cellular response to osmotic stress
GO:0044452  0.001106592              2.956012405   5           21        nucleolar part
GO:0005618  0.001235632              2.908110912   6           31        cell wall
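
The -Log(pvalue) column in the Appendix 1 tables appears to be the negative base-10 logarithm of the over-represented p-value. The following is a minimal Python sketch under that assumption; the helper name neg_log10 is hypothetical and is not part of the original analysis. Because the p-values are printed in rounded form, the recomputed values agree with the table only approximately.

```python
import math


def neg_log10(p_value: float) -> float:
    """Return -log10(p), the transform assumed for the -Log(pvalue) column."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


# (GO term, printed p-value, printed -Log(pvalue)) copied from Appendix 1.
PRINTED_ROWS = [
    ("GO:0044391", 2.34e-20, 19.63057099),    # Brown, ribosomal subunit
    ("GO:0043161", 0.000149461, 3.825471839),  # Cyan, first row
]

for go_term, p_value, printed in PRINTED_ROWS:
    recomputed = neg_log10(p_value)
    # The difference reflects only the rounding of the printed p-value.
    print(f"{go_term}: recomputed {recomputed:.5f}, printed {printed:.5f}")
```

Larger values in this column correspond to smaller p-values, so each module's table is effectively sorted from the most to the least strongly enriched term.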

APPENDIX 2 – Table relating unmerged modules to biomineralization genes

Gene ID: 416800
module: brown
Biomineralization process: Ca2+/H+ exchanger (CAX3) (CAX family)

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0008150  6.03E-07                 197           100158, 100710, 101014, 101092, 104884, ...  biological_process
GO:0099516  3.45E-05                 6             200599, 354606, 416800, 433656, 453445, ...  ion antiporter activity
GO:0044444  0.000101                 112           100158, 100710, 105936, 109999, 123268, ...  cytoplasmic part
GO:0003674  0.000169                 199           100158, 100710, 101014, 101092, 104884, ...  molecular_function
GO:0044424  0.00074                  158           100158, 100710, 105936, 109999, 115039, ...  intracellular part
GO:0005575  0.000777                 181           100158, 100710, 105936, 109999, 115039, ...  cellular_component
GO:0044464  0.001309                 168           100158, 100710, 105936, 109999, 115039, ...  cell part
GO:0015297  0.001545                 7             200599, 214474, 354606, 416800, 433656, ...  antiporter activity
GO:0006811  0.001767                 20            198810, 200599, 213954, 214474, 217075, ...  ion transport
GO:0043226  0.004222                 100           100710, 115039, 123268, 196861, 198939, ...  organelle
GO:0015368  0.00557                  2             354606, 416800                               calcium:cation antiporter activity
GO:0015291  0.007177                 10            200599, 214474, 217075, 354606, 372934, ...  secondary active transmembrane transporter activity
GO:0015077  0.009432                 6             217075, 354606, 413949, 416800, 443369, ...  monovalent inorganic cation transmembrane transporter activity
GO:0043229  0.010145                 94            100710, 123268, 196861, 200695, 200700, ...  intracellular organelle
GO:0022891  0.012817                 17            198810, 200599, 213954, 217075, 354606, ...  substrate-specific transmembrane transporter activity
GO:0015491  0.017691                 2             354606, 416800                               cation:cation antiporter activity
GO:0022857  0.020006                 21            100710, 198810, 200599, 213954, 214474, ...  transmembrane transporter activity
GO:0022892  0.023608                 18            194900, 198810, 200599, 213954, 217075, ...  substrate-specific transporter activity
GO:0044422  0.0259                   87            100710, 117708, 123268, 200599, 200700, ...  organelle part
GO:0044446  0.027768                 86            100710, 117708, 123268, 200599, 200700, ...  intracellular organelle part
GO:0005215  0.029113                 23            100710, 194900, 198810, 200599, 213954, ...  transporter activity
GO:0022804  0.04502                  11            200599, 214474, 217075, 354606, 372934, ...  active transmembrane transporter activity

Gene ID: 373149
module: brown
Biomineralization process: carbonic anhydrase, gamma

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0009987  1.41E-07                 164           100158, 100710, 101014, 101092, 104884, ...  cellular process
GO:0044237  1.81E-07                 135           100158, 101014, 101092, 104884, 105936, ...  cellular metabolic process
GO:0008150  6.03E-07                 197           100158, 100710, 101014, 101092, 104884, ...  biological_process
GO:0032991  2.76E-06                 64            100158, 109999, 123268, 200700, 217317, ...  macromolecular complex
GO:0008152  7.38E-06                 149           100158, 101014, 101092, 104884, 105936, ...  metabolic process
GO:0044444  0.000101                 112           100158, 100710, 105936, 109999, 123268, ...  cytoplasmic part
GO:0003674  0.000169                 199           100158, 100710, 101014, 101092, 104884, ...  molecular_function
GO:0044424  0.00074                  158           100158, 100710, 105936, 109999, 115039, ...  intracellular part
GO:0005575  0.000777                 181           100158, 100710, 105936, 109999, 115039, ...  cellular_component
GO:0044464  0.001309                 168           100158, 100710, 105936, 109999, 115039, ...  cell part
GO:0005488  0.002762                 135           100158, 101014, 104884, 105936, 107321, ...  binding
GO:0043226  0.004222                 100           100710, 115039, 123268, 196861, 198939, ...  organelle
GO:0065003  0.004746                 18            123268, 234133, 249675, 309232, 351077, ...  macromolecular complex assembly
GO:0016836  0.005705                 5             100158, 235900, 373149, 420746, 437725       hydro-lyase activity
GO:0043229  0.010145                 94            100710, 123268, 196861, 200695, 200700, ...  intracellular organelle
GO:0043933  0.010151                 18            123268, 234133, 249675, 309232, 351077, ...  macromolecular complex subunit organization
GO:0016829  0.011903                 10            100158, 222366, 235900, 254913, 373149, ...  lyase activity
GO:0022607  0.019472                 21            123268, 234133, 249675, 309232, 351077, ...  cellular component assembly
GO:0044422  0.0259                   87            100710, 117708, 123268, 200599, 200700, ...  organelle part
GO:0044446  0.027768                 86            100710, 117708, 123268, 200599, 200700, ...  intracellular organelle part
GO:0016835  0.038904                 5             100158, 235900, 373149, 420746, 437725       carbon-oxygen lyase activity
GO:0009735  0.042356                 3             373149, 427548, 456646                       response to cytokinin

Gene ID: 99943
module: brown
Biomineralization process: anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0009987  1.41E-07                 164           100158, 100710, 101014, 101092, 104884, ...  cellular process
GO:0008150  6.03E-07                 197           100158, 100710, 101014, 101092, 104884, ...  biological_process
GO:0044444  0.000101                 112           100158, 100710, 105936, 109999, 123268, ...  cytoplasmic part
GO:0003674  0.000169                 199           100158, 100710, 101014, 101092, 104884, ...  molecular_function
GO:0044424  0.00074                  158           100158, 100710, 105936, 109999, 115039, ...  intracellular part
GO:0005575  0.000777                 181           100158, 100710, 105936, 109999, 115039, ...  cellular_component
GO:0044464  0.001309                 168           100158, 100710, 105936, 109999, 115039, ...  cell part
GO:0005488  0.002762                 135           100158, 101014, 104884, 105936, 107321, ...  binding
GO:0043226  0.004222                 100           100710, 115039, 123268, 196861, 198939, ...  organelle
GO:0043229  0.010145                 94            100710, 123268, 196861, 200695, 200700, ...  intracellular organelle
GO:0044422  0.0259                   87            100710, 117708, 123268, 200599, 200700, ...  organelle part

Gene ID: 413949
module: brown
Biomineralization process: V-ATPase, D

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0009987  1.41E-07                 164           100158, 100710, 101014, 101092, 104884, ...  cellular process
GO:0008150  6.03E-07                 197           100158, 100710, 101014, 101092, 104884, ...  biological_process
GO:0032991  2.76E-06                 64            100158, 109999, 123268, 200700, 217317, ...  macromolecular complex
GO:0044444  0.000101                 112           100158, 100710, 105936, 109999, 123268, ...  cytoplasmic part
GO:0003674  0.000169                 199           100158, 100710, 101014, 101092, 104884, ...  molecular_function
GO:0044424  0.00074                  158           100158, 100710, 105936, 109999, 115039, ...  intracellular part
GO:0005575  0.000777                 181           100158, 100710, 105936, 109999, 115039, ...  cellular_component
GO:0044464  0.001309                 168           100158, 100710, 105936, 109999, 115039, ...  cell part
GO:0006811  0.001767                 20            198810, 200599, 213954, 214474, 217075, ...  ion transport
GO:0044699  0.001933                 108           100158, 100710, 101014, 101092, 115039, ...  NA
GO:1902578  0.003198                 22            100710, 200599, 213954, 214474, 217075, ...  NA
GO:0044765  0.004163                 21            100710, 200599, 213954, 214474, 217075, ...  NA
GO:0043226  0.004222                 100           100710, 115039, 123268, 196861, 198939, ...  organelle
GO:0015672  0.007254                 7             198810, 213954, 214474, 354606, 413949, ...  monovalent inorganic cation transport
GO:0015077  0.009432                 6             217075, 354606, 413949, 416800, 443369, ...  monovalent inorganic cation transmembrane transporter activity
GO:0043229  0.010145                 94            100710, 123268, 196861, 200695, 200700, ...  intracellular organelle
GO:0022891  0.012817                 17            198810, 200599, 213954, 217075, 354606, ...  substrate-specific transmembrane transporter activity
GO:0044763  0.014024                 79            100158, 101014, 101092, 115039, 118523, ...  NA
GO:0015075  0.017709                 15            198810, 200599, 213954, 217075, 354606, ...  ion transmembrane transporter activity
GO:0022857  0.020006                 21            100710, 198810, 200599, 213954, 214474, ...  transmembrane transporter activity
GO:0022892  0.023608                 18            194900, 198810, 200599, 213954, 217075, ...  substrate-specific transporter activity
GO:0044422  0.0259                   87            100710, 117708, 123268, 200599, 200700, ...  organelle part
GO:0044446  0.027768                 86            100710, 117708, 123268, 200599, 200700, ...  intracellular organelle part
GO:0005215  0.029113                 23            100710, 194900, 198810, 200599, 213954, ...  transporter activity
GO:0030641  0.042833                 2             413949, 99943                                regulation of cellular pH
GO:0051453  0.042833                 2             413949, 99943                                regulation of intracellular pH
GO:0022804  0.04502                  11            200599, 214474, 217075, 354606, 372934, ...  active transmembrane transporter activity

Gene ID: 466232
module: darkolivegreen
Biomineralization process: anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0000324  0.003747                 2             442901, 466232                               fungal-type vacuole
GO:0000322  0.004348                 2             442901, 466232                               storage vacuole
GO:0035445  0.033896                 1             466232                                       borate transmembrane transport
GO:0046713  0.033896                 1             466232                                       borate transport
GO:0046715  0.033896                 1             466232                                       borate transmembrane transporter activity
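
Each table in this appendix appears to list the enriched GO terms of a module whose gene lists include the biomineralization gene named in its heading. The following is a minimal Python sketch of that lookup, assuming the enrichment results of a module are available as (GO term, p-value, function, gene IDs) records. The record layout and the names GoRow and terms_for_gene are hypothetical and are not taken from the original analysis; the data rows are copied from the darkolivegreen table for gene 466232.

```python
from typing import List, NamedTuple, Tuple


class GoRow(NamedTuple):
    """One enrichment row: GO term, p-value, function, genes sharing the term."""
    go_term: str
    p_value: float
    function: str
    genes: Tuple[int, ...]


# Rows copied from the darkolivegreen table for gene 466232.
DARKOLIVEGREEN_ROWS = [
    GoRow("GO:0000324", 0.003747, "fungal-type vacuole", (442901, 466232)),
    GoRow("GO:0000322", 0.004348, "storage vacuole", (442901, 466232)),
    GoRow("GO:0035445", 0.033896, "borate transmembrane transport", (466232,)),
    GoRow("GO:0046713", 0.033896, "borate transport", (466232,)),
    GoRow("GO:0046715", 0.033896, "borate transmembrane transporter activity", (466232,)),
]


def terms_for_gene(rows: List[GoRow], gene_id: int) -> List[GoRow]:
    """Return the rows whose gene list contains gene_id, smallest p-value first."""
    matching = [row for row in rows if gene_id in row.genes]
    return sorted(matching, key=lambda row: row.p_value)


# Gene 442901 shares only the two vacuole terms with gene 466232.
for row in terms_for_gene(DARKOLIVEGREEN_ROWS, 442901):
    print(row.go_term, row.p_value, row.function)
```

Note that most gene lists in this appendix are truncated with "...", so a lookup like this is only reliable on the complete lists, not on the printed excerpts.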

Gene ID: 463266
module: greenyellow
Biomineralization process: fibrillins and related proteins containing Ca2+-binding EGF-like domains

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0007165  0.007812                 16            201455, 225639, 253074, 353175, 353329, ...  signal transduction
GO:0050794  0.01132                  38            103629, 104702, 201455, 206853, 225639, ...  regulation of cellular process
GO:0050789  0.035194                 38            103629, 104702, 201455, 206853, 225639, ...  regulation of biological process

Gene ID: 439538
module: grey60
Biomineralization process: V-ATPase, A

GO term     over_represented_pvalue  No. of genes  All genes with same term                     function
GO:0098796  8.26E-19                 27            211477, 216523, 218313, 354280, 355949, ...  membrane protein complex
GO:0032991  1.47E-12                 52            211477, 213701, 216523, 218313, 308697, ...  macromolecular complex
GO:1901564  3.59E-12                 63            103368, 107385, 110366, 122012, 204056, ...  organonitrogen compound metabolic process
GO:0044444  4.13E-09                 76            107385, 110366, 122012, 211477, 213701, ...  cytoplasmic part
GO:0043234  1.95E-08                 31            211477, 213701, 216523, 218313, 354280, ...  protein complex
GO:0006818  3.62E-08                 8             216523, 355949, 433142, 435128, 437063, ...  hydrogen transport
GO:0015992  3.62E-08                 8             216523, 355949, 433142, 435128, 437063, ...  proton transport
GO:0033178  5.34E-08                 5             433142, 435128, 439538, 461699, 95543        proton-transporting two-sector ATPase complex, catalytic domain
GO:0044237  5.69E-08                 80            107385, 110366, 122012, 200040, 204056, ...  cellular metabolic process
GO:0008152  1.32E-07                 89            103368, 107385, 110366, 113987, 122012, ...  metabolic process
GO:0046034  2.38E-07                 8             107385, 433142, 435128, 436550, 439538, ...  ATP metabolic process
GO:1902600  3.99E-07                 6             216523, 433142, 435128, 439538, 461699, ...  hydrogen ion transmembrane transport
GO:0009199  1.28E-06                 8             107385, 433142, 435128, 436550, 439538, ...  ribonucleoside triphosphate metabolic process
GO:0009205  1.28E-06                 8             107385, 433142, 435128, 436550, 439538, ...  purine ribonucleoside triphosphate metabolic process
GO:0009144  1.88E-06                 8             107385, 433142, 435128, 436550, 439538, ...  purine nucleoside triphosphate metabolic process
GO:0019693  2.83E-06                 11            107385, 433142, 435128, 436305, 436550, ...  ribose phosphate metabolic process
GO:0009141  5.87E-06                 8             107385, 433142, 435128, 436550, 439538, ...  nucleoside triphosphate metabolic process
GO:0009126  5.99E-06                 8             107385, 433142, 435128, 436550, 439538, ...  purine nucleoside monophosphate metabolic process
GO:0009167  5.99E-06                 8             107385, 433142, 435128, 436550, 439538, ...  purine ribonucleoside monophosphate metabolic process
GO:0015988  6.18E-06                 4             216523, 435128, 439538, 95543                energy coupled proton transmembrane transport, against electrochemical gradient
GO:0015991  6.18E-06                 4             216523, 435128, 439538, 95543                ATP hydrolysis coupled proton transport
GO:0090662  6.18E-06                 4             216523, 435128, 439538, 95543                ATP hydrolysis coupled transmembrane transport
GO:0099131  6.18E-06                 4             216523, 435128, 439538, 95543                ATP hydrolysis coupled ion transmembrane transport
GO:0099132  6.18E-06                 4             216523, 435128, 439538, 95543                ATP hydrolysis coupled cation transmembrane transport
GO:0044424  7.20E-06                 94            107385, 110366, 122012, 204056, 211477, ...  intracellular part
GO:0033180  7.23E-06                 3             435128, 439538, 95543                        proton-transporting V-type ATPase, V1 domain
GO:0044769  8.55E-06                 4             216523, 433142, 439538, 461699               ATPase activity, coupled to transmembrane movement of ions, rotational mechanism
GO:0006807  1.15E-05                 66            103368, 107385, 110366, 122012, 204056, ...  nitrogen compound metabolic process
GO:0015672  1.24E-05                 8             216523, 355949, 433142, 435128, 437063, ...  monovalent inorganic cation transport
GO:0009987  1.28E-05                 88            107385, 110366, 122012, 200040, 204056, ...  cellular process
GO:0009117  1.59E-05                 12            107385, 427769, 433142, 435128, 436305, ...  nucleotide metabolic process
GO:0009150  1.82E-05                 9             107385, 433142, 435128, 436305, 436550, ...  purine ribonucleotide metabolic process
GO:0006753  1.91E-05                 12            107385, 427769, 433142, 435128, 436305, ...  nucleoside phosphate metabolic process
GO:0044281  2.13E-05                 30            107385, 122012, 213701, 249205, 373551, ...  small molecule metabolic process
GO:0071704  2.34E-05                 75            103368, 107385, 110366, 113987, 122012, ...  organic substance metabolic process
GO:0009161  2.44E-05                 8             107385, 433142, 435128, 436550, 439538, ...  ribonucleoside monophosphate metabolic process
GO:0016820  2.52E-05                 7             216523, 355949, 433142, 435128, 439538, ...  hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances
GO:0006163  2.64E-05                 9             107385, 433142, 435128, 436305, 436550, ...  purine nucleotide metabolic process
GO:1901363  2.71E-05                 54            107385, 110366, 211477, 218313, 227656, ...  heterocyclic compound binding
GO:0097159  2.92E-05                 54            107385, 110366, 211477, 218313, 227656, ...  organic cyclic compound binding
GO:0009123  2.92E-05                 8             107385, 433142, 435128, 436550, 439538, ...  nucleoside monophosphate metabolic process
GO:0008150  4.39E-05                 104           103368, 107385, 110366, 113987, 122012, ...  biological_process
GO:0009259  5.19E-05                 9             107385, 433142, 435128, 436305, 436550, ...  ribonucleotide metabolic process
GO:0015078  5.55E-05                 5             216523, 354280, 433142, 439538, 461699       hydrogen ion transmembrane transporter activity
GO:0019829  7.40E-05                 4             216523, 433142, 439538, 461699               cation-transporting ATPase activity
GO:0072521  8.18E-05                 9             107385, 433142, 435128, 436305, 436550, ...  purine-containing compound metabolic process
GO:0044238  9.46E-05                 68            103368, 107385, 110366, 113987, 122012, ...  primary metabolic process
GO:0044710  9.52E-05                 44            107385, 113987, 122012, 213701, 227656, ...  NA
GO:0055086  0.000103                 12            107385, 427769, 433142, 435128, 436305, ...  nucleobase-containing small molecule metabolic process
GO:0042625  0.000118                 4             216523, 433142, 439538, 461699               ATPase coupled ion transmembrane transporter activity
GO:0044464  0.000167                 95            107385, 110366, 122012, 204056, 211477, ...  cell part
GO:0098662  0.000346                 6             216523, 433142, 435128, 439538, 461699, ...  inorganic cation transmembrane transport
GO:0098660  0.000557                 6             216523, 433142, 435128, 439538, 461699, ...  inorganic ion transmembrane transport
GO:1901135  0.000651                 13            107385, 122012, 427769, 433142, 435128, ...  carbohydrate derivative metabolic process
GO:0098655  0.000694                 6             216523, 433142, 435128, 439538, 461699, ...  cation transmembrane transport
GO:0019637  0.000737                 13            107385, 122012, 427769, 433142, 435128, ...  organophosphate metabolic process
GO:0003674  0.000852                 105           107385, 110366, 113987, 122012, 123375, ...  molecular_function
GO:0022853  0.000905                 4             216523, 433142, 439538, 461699               active ion transmembrane transporter activity
GO:0005575  0.001459                 97            107385, 110366, 113987, 122012, 204056, ...  cellular_component
GO:0015077  0.001806                 5             216523, 354280, 433142, 439538, 461699       monovalent inorganic cation transmembrane transporter activity
GO:0036442  0.002063                 2             216523, 439538                               hydrogen-exporting ATPase activity
GO:0046961  0.002063                 2             216523, 439538                               proton-transporting ATPase activity, rotational mechanism
GO:0044446  0.002639                 52            122012, 211477, 216523, 218313, 227656, ...  intracellular organelle part
GO:0044422  0.003315                 52            122012, 211477, 216523, 218313, 227656, ...  organelle part
GO:0005488  0.003704                 73            107385, 110366, 122012, 211477, 213701, ...  binding
GO:0016020  0.005687                 39            211477, 213701, 216523, 218313, 227656, ...  membrane
GO:0006812  0.006146                 8             216523, 355949, 433142, 435128, 437063, ...  cation transport
GO:0044425  0.00669                  36            211477, 216523, 218313, 227656, 250739, ...  membrane part
GO:0034220  0.007797                 6             216523, 433142, 435128, 439538, 461699, ...  ion transmembrane transport
GO:0042626  0.008796                 4             216523, 433142, 439538, 461699               ATPase activity, coupled to transmembrane movement of substances
GO:0043492  0.009168                 4             216523, 433142, 439538, 461699               ATPase activity, coupled to movement of substances
GO:0022804  0.010648                 8             216523, 355949, 432785, 433142, 435128, ...  active transmembrane transporter activity
GO:0000331  0.013584                 1             439538                                       contractile vacuole
GO:0034641  0.013908                 37            107385, 110366, 122012, 204056, 308697, ...  cellular nitrogen compound metabolic process
GO:0015399  0.013933                 4             216523, 433142, 439538, 461699               primary active transmembrane transporter activity
GO:0015405  0.013933                 4             216523, 433142, 439538, 461699               P-P-bond-hydrolysis-driven transmembrane transporter activity
GO:0044437  0.02128                  6             355949, 415280, 417676, 435128, 439538, ...  vacuolar part
GO:0006811  0.021484                 10            216523, 355949, 432785, 433142, 435128, ...  ion transport
GO:0006796  0.022432                 14            107385, 122012, 200040, 427769, 433142, ...  phosphate-containing compound metabolic process
GO:0044699  0.02385                  54            107385, 113987, 122012, 213701, 216523, ...  NA
GO:0022857  0.025922                 12            216523, 354280, 355949, 432184, 432785, ...  transmembrane transporter activity
GO:0022890  0.034149                 5             216523, 354280, 433142, 439538, 461699       inorganic cation transmembrane transporter activity
GO:0006793  0.035907                 14            107385, 122012, 200040, 427769, 433142, ...  phosphorus metabolic process
GO:0044763  0.036538                 41            107385, 122012, 213701, 249205, 354280, ...  NA
GO:0055085  0.044509                 9             216523, 432184, 432785, 433142, 435128, ...  transmembrane transport

Gene ID: 435128
module: grey60
Biomineralization process: V-ATPase, B

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0098796 | 8.26E-19 | membrane protein complex | 211477, 216523, 218313, 354280, 355949, ... | 27
GO:0032991 | 1.47E-12 | macromolecular complex | 211477, 213701, 216523, 218313, 308697, ... | 52
GO:1901564 | 3.59E-12 | organonitrogen compound metabolic process | 103368, 107385, 110366, 122012, 204056, ... | 63
GO:0044444 | 4.13E-09 | cytoplasmic part | 107385, 110366, 122012, 211477, 213701, ... | 76
GO:0043234 | 1.95E-08 | protein complex | 211477, 213701, 216523, 218313, 354280, ... | 31
GO:0006818 | 3.62E-08 | hydrogen transport | 216523, 355949, 433142, 435128, 437063, ... | 8
GO:0015992 | 3.62E-08 | proton transport | 216523, 355949, 433142, 435128, 437063, ... | 8
GO:0033178 | 5.34E-08 | proton-transporting two-sector ATPase complex, catalytic domain | 433142, 435128, 439538, 461699, 95543 | 5
GO:0044237 | 5.69E-08 | cellular metabolic process | 107385, 110366, 122012, 200040, 204056, ... | 80
GO:0008152 | 1.32E-07 | metabolic process | 103368, 107385, 110366, 113987, 122012, ... | 89
GO:0046034 | 2.38E-07 | ATP metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:1902600 | 3.99E-07 | hydrogen ion transmembrane transport | 216523, 433142, 435128, 439538, 461699, ... | 6
GO:0009199 | 1.28E-06 | ribonucleoside triphosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0009205 | 1.28E-06 | purine ribonucleoside triphosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0009144 | 1.88E-06 | purine nucleoside triphosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0019693 | 2.83E-06 | ribose phosphate metabolic process | 107385, 433142, 435128, 436305, 436550, ... | 11
GO:0009141 | 5.87E-06 | nucleoside triphosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0009126 | 5.99E-06 | purine nucleoside monophosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0009167 | 5.99E-06 | purine ribonucleoside monophosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0015988 | 6.18E-06 | energy coupled proton transmembrane transport, against electrochemical gradient | 216523, 435128, 439538, 95543 | 4

GO:0015991 | 6.18E-06 | ATP hydrolysis coupled proton transport | 216523, 435128, 439538, 95543 | 4
GO:0090662 | 6.18E-06 | ATP hydrolysis coupled transmembrane transport | 216523, 435128, 439538, 95543 | 4
GO:0099131 | 6.18E-06 | ATP hydrolysis coupled ion transmembrane transport | 216523, 435128, 439538, 95543 | 4
GO:0099132 | 6.18E-06 | ATP hydrolysis coupled cation transmembrane transport | 216523, 435128, 439538, 95543 | 4
GO:0044424 | 7.20E-06 | intracellular part | 107385, 110366, 122012, 204056, 211477, ... | 94
GO:0033180 | 7.23E-06 | proton-transporting V-type ATPase, V1 domain | 435128, 439538, 95543 | 3
GO:0006807 | 1.15E-05 | nitrogen compound metabolic process | 103368, 107385, 110366, 122012, 204056, ... | 66
GO:0015672 | 1.24E-05 | monovalent inorganic cation transport | 216523, 355949, 433142, 435128, 437063, ... | 8
GO:0009987 | 1.28E-05 | cellular process | 107385, 110366, 122012, 200040, 204056, ... | 88
GO:0009117 | 1.59E-05 | nucleotide metabolic process | 107385, 427769, 433142, 435128, 436305, ... | 12
GO:0009150 | 1.82E-05 | purine ribonucleotide metabolic process | 107385, 433142, 435128, 436305, 436550, ... | 9
GO:0006753 | 1.91E-05 | nucleoside phosphate metabolic process | 107385, 427769, 433142, 435128, 436305, ... | 12
GO:0044281 | 2.13E-05 | small molecule metabolic process | 107385, 122012, 213701, 249205, 373551, ... | 30
GO:0071704 | 2.34E-05 | organic substance metabolic process | 103368, 107385, 110366, 113987, 122012, ... | 75
GO:0009161 | 2.44E-05 | ribonucleoside monophosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0016820 | 2.52E-05 | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 216523, 355949, 433142, 435128, 439538, ... | 7
GO:0006163 | 2.64E-05 | purine nucleotide metabolic process | 107385, 433142, 435128, 436305, 436550, ... | 9
GO:1901363 | 2.71E-05 | heterocyclic compound binding | 107385, 110366, 211477, 218313, 227656, ... | 54
GO:0097159 | 2.92E-05 | organic cyclic compound binding | 107385, 110366, 211477, 218313, 227656, ... | 54
GO:0009123 | 2.92E-05 | nucleoside monophosphate metabolic process | 107385, 433142, 435128, 436550, 439538, ... | 8
GO:0008150 | 4.39E-05 | biological_process | 103368, 107385, 110366, 113987, 122012, ... | 104

GO:0009259 | 5.19E-05 | ribonucleotide metabolic process | 107385, 433142, 435128, 436305, 436550, ... | 9
GO:0072521 | 8.18E-05 | purine-containing compound metabolic process | 107385, 433142, 435128, 436305, 436550, ... | 9
GO:0044238 | 9.46E-05 | primary metabolic process | 103368, 107385, 110366, 113987, 122012, ... | 68
GO:0044710 | 9.52E-05 | NA | 107385, 113987, 122012, 213701, 227656, ... | 44
GO:0055086 | 0.000103 | nucleobase-containing small molecule metabolic process | 107385, 427769, 433142, 435128, 436305, ... | 12
GO:0044464 | 0.000167 | cell part | 107385, 110366, 122012, 204056, 211477, ... | 95
GO:0098662 | 0.000346 | inorganic cation transmembrane transport | 216523, 433142, 435128, 439538, 461699, ... | 6
GO:0098660 | 0.000557 | inorganic ion transmembrane transport | 216523, 433142, 435128, 439538, 461699, ... | 6
GO:1901135 | 0.000651 | carbohydrate derivative metabolic process | 107385, 122012, 427769, 433142, 435128, ... | 13
GO:0098655 | 0.000694 | cation transmembrane transport | 216523, 433142, 435128, 439538, 461699, ... | 6
GO:0019637 | 0.000737 | organophosphate metabolic process | 107385, 122012, 427769, 433142, 435128, ... | 13
GO:0003674 | 0.000852 | molecular_function | 107385, 110366, 113987, 122012, 123375, ... | 105
GO:0005575 | 0.001459 | cellular_component | 107385, 110366, 113987, 122012, 204056, ... | 97
GO:0044446 | 0.002639 | intracellular organelle part | 122012, 211477, 216523, 218313, 227656, ... | 52
GO:0044422 | 0.003315 | organelle part | 122012, 211477, 216523, 218313, 227656, ... | 52
GO:0005488 | 0.003704 | binding | 107385, 110366, 122012, 211477, 213701, ... | 73
GO:0016020 | 0.005687 | membrane | 211477, 213701, 216523, 218313, 227656, ... | 39
GO:0006812 | 0.006146 | cation transport | 216523, 355949, 433142, 435128, 437063, ... | 8
GO:0044425 | 0.00669 | membrane part | 211477, 216523, 218313, 227656, 250739, ... | 36
GO:0034220 | 0.007797 | ion transmembrane transport | 216523, 433142, 435128, 439538, 461699, ... | 6
GO:0022804 | 0.010648 | active transmembrane transporter activity | 216523, 355949, 432785, 433142, 435128, ... | 8
GO:0034641 | 0.013908 | cellular nitrogen compound metabolic process | 107385, 110366, 122012, 204056, 308697, ... | 37

GO:0044437 | 0.02128 | vacuolar part | 355949, 415280, 417676, 435128, 439538, ... | 6
GO:0006811 | 0.021484 | ion transport | 216523, 355949, 432785, 433142, 435128, ... | 10
GO:0006796 | 0.022432 | phosphate-containing compound metabolic process | 107385, 122012, 200040, 427769, 433142, ... | 14
GO:0044699 | 0.02385 | NA | 107385, 113987, 122012, 213701, 216523, ... | 54
GO:0022857 | 0.025922 | transmembrane transporter activity | 216523, 354280, 355949, 432184, 432785, ... | 12
GO:0006793 | 0.035907 | phosphorus metabolic process | 107385, 122012, 200040, 427769, 433142, ... | 14
GO:0044763 | 0.036538 | NA | 107385, 122012, 213701, 249205, 354280, ... | 41
GO:0055085 | 0.044509 | transmembrane transport | 216523, 432184, 432785, 433142, 435128, ... | 9

Gene ID: 196760
module: paleturquoise
Biomineralization process: bicarbonate transporter, anion exchanger-like Cl-/HCO3- exchangers

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0021860 | 0.009238 | pyramidal neuron development | 196760 | 1
GO:0021884 | 0.009238 | forebrain neuron development | 196760 | 1
GO:0021954 | 0.009238 | central nervous system neuron development | 196760 | 1
GO:0035641 | 0.009238 | locomotory exploration behavior | 196760 | 1
GO:0097441 | 0.009238 | basilar dendrite | 196760 | 1
GO:0097442 | 0.009238 | CA3 pyramidal cell dendrite | 196760 | 1
GO:0043025 | 0.011924 | neuronal cell body | 196760, 351492 | 2
GO:0044297 | 0.018163 | cell body | 196760, 351492 | 2
GO:0015701 | 0.023157 | bicarbonate transport | 196760 | 1
GO:0030425 | 0.023581 | dendrite | 196760, 351492 | 2
GO:0097440 | 0.028637 | apical dendrite | 196760 | 1
GO:0035640 | 0.032624 | exploration behavior | 196760 | 1
GO:0035640 | 0.032624 | exploration behavior | 196760 | 1
GO:0048666 | 0.035305 | neuron development | 196760 | 1
GO:0048854 | 0.036782 | brain morphogenesis | 196760 | 1
GO:0008510 | 0.036883 | sodium:bicarbonate symporter activity | 196760 | 1
GO:0015106 | 0.036883 | bicarbonate transmembrane transporter activity | 196760 | 1

Gene ID: 436956
module: pink
Biomineralization process: anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0098805 | 0.001216 | whole membrane | 214071, 251432, 355894, 368295, 432191, ... | 15
GO:0005886 | 0.004277 | plasma membrane | 108504, 123368, 197284, 251432, 309260, ... | 23
GO:0098588 | 0.021729 | bounding membrane of organelle | 214071, 251432, 355894, 368295, 432191, ... | 19
GO:0000324 | 0.036877 | fungal-type vacuole | 435850, 436956 | 2
GO:0000322 | 0.042733 | storage vacuole | 435850, 436956 | 2

Gene ID: 434034
module: pink
Biomineralization process: eukaryotic Na+/H+ exchanger

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0098805 | 0.001216 | whole membrane | 214071, 251432, 355894, 368295, 432191, ... | 15
GO:0005886 | 0.004277 | plasma membrane | 108504, 123368, 197284, 251432, 309260, ... | 23
GO:0098588 | 0.021729 | bounding membrane of organelle | 214071, 251432, 355894, 368295, 432191, ... | 19

Gene ID: 469783
module: purple
Biomineralization process: bicarbonate transporter

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0016020 | 0.000607 | membrane | 100529, 100633, 107728, 110254, 111592, ... | 54
GO:0031224 | 0.000622 | intrinsic component of membrane | 100529, 100633, 106166, 106729, 107728, ... | 45
GO:0005575 | 0.000806 | cellular_component | 100056, 100529, 100633, 101352, 106166, ... | 126
GO:0044464 | 0.000899 | cell part | 100056, 100529, 100633, 101352, 106166, ... | 118
GO:0016021 | 0.000927 | integral component of membrane | 100529, 100633, 106166, 106729, 107728, ... | 44
GO:0044425 | 0.001094 | membrane part | 100529, 100633, 106166, 106729, 107728, ... | 49
GO:0006811 | 0.002459 | ion transport | 100529, 107728, 113427, 206643, 208089, ... | 15
GO:0008514 | 0.00317 | organic anion transmembrane transporter activity | 231096, 244268, 451704, 462391, 465146, ... | 7
GO:0015297 | 0.005703 | antiporter activity | 100529, 451704, 462391, 465146, 469783 | 5
GO:0006810 | 0.008504 | transport | 100529, 107728, 110254, 112952, 113427, ... | 30
GO:0008509 | 0.009696 | anion transmembrane transporter activity | 231096, 244268, 451704, 462391, 465146, ... | 7
GO:0051234 | 0.010295 | establishment of localization | 100529, 107728, 110254, 112952, 113427, ... | 30
GO:0008150 | 0.0111 | biological_process | 100056, 100529, 100598, 100633, 101352, ... | 122
GO:0051179 | 0.01385 | localization | 100529, 107728, 110254, 112952, 113427, ... | 31
GO:0003674 | 0.01409 | molecular_function | 100056, 100529, 100598, 100633, 101352, ... | 129
GO:0005215 | 0.014164 | transporter activity | 100529, 107728, 110254, 113427, 208089, ... | 18
GO:0022892 | 0.015755 | substrate-specific transporter activity | 100529, 113427, 208089, 231096, 244268, ... | 14
GO:0009987 | 0.023562 | cellular process | 100529, 100633, 101352, 106166, 107102, ... | 96
GO:0015075 | 0.023659 | ion transmembrane transporter activity | 100529, 208089, 231096, 244268, 451704, ... | 11
GO:0022891 | 0.025782 | substrate-specific transmembrane transporter activity | 100529, 208089, 231096, 244268, 250560, ... | 12
GO:0006820 | 0.02642 | anion transport | 244268, 250560, 451704, 462391, 466304, ... | 7
GO:0022857 | 0.029826 | transmembrane transporter activity | 100529, 107728, 208089, 231096, 244268, ... | 15
GO:0015081 | 0.046512 | sodium ion transmembrane transporter activity | 100529, 469783 | 2

Gene ID: 420005
module: salmon
Biomineralization process: V-ATPase, D

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0044424 | 0.00033 | intracellular part | 103684, 104883, 108460, 109612, 115029, ... | 100
GO:0008150 | 0.000487 | biological_process | 103684, 104883, 108460, 115029, 121403, ... | 114
GO:0005575 | 0.00066 | cellular_component | 103684, 104883, 108460, 109612, 115029, ... | 112
GO:0003674 | 0.00119 | molecular_function | 103684, 104883, 108460, 109612, 115029, ... | 119
GO:0044464 | 0.001285 | cell part | 103684, 104883, 108460, 109612, 115029, ... | 104
GO:0044446 | 0.006292 | intracellular organelle part | 103684, 104883, 108460, 115029, 121403, ... | 57
GO:0044422 | 0.007869 | organelle part | 103684, 104883, 108460, 115029, 121403, ... | 57
GO:0043229 | 0.015413 | intracellular organelle | 104883, 108460, 109612, 194578, 196361, ... | 58
GO:0043226 | 0.015946 | organelle | 104883, 108460, 109612, 194578, 196361, ... | 60
GO:0016887 | 0.020055 | ATPase activity | 194578, 199049, 256467, 258138, 259135, ... | 9
GO:0044444 | 0.045223 | cytoplasmic part | 103684, 108460, 115029, 121403, 125388, ... | 59

Gene ID: 464767
module: turquoise
Biomineralization process: V-ATPase, A

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0042592 | 0.000262 | homeostatic process | 100055, 103027, 104009, 105320, 111405, ... | 60
GO:0019725 | 0.000301 | cellular homeostasis | 100055, 103027, 111405, 196760, 217082, ... | 43
GO:0048878 | 0.001349 | chemical homeostasis | 100055, 103027, 104009, 111405, 194068, ... | 32
GO:0006873 | 0.003401 | cellular ion homeostasis | 100055, 103027, 111405, 196760, 217082, ... | 18
GO:0050801 | 0.006442 | ion homeostasis | 100055, 103027, 104009, 111405, 196760, ... | 25
GO:0003008 | 0.009479 | system process | 102463, 104009, 105638, 121824, 202804, ... | 22
GO:0098771 | 0.010586 | inorganic ion homeostasis | 100055, 103027, 104009, 111405, 196760, ... | 22
GO:0034654 | 0.011139 | nucleobase-containing compound biosynthetic process | 105320, 106724, 122834, 125567, 194430, ... | 88
GO:0055082 | 0.011718 | cellular chemical homeostasis | 100055, 103027, 111405, 196760, 217082, ... | 19
GO:0065008 | 0.01388 | regulation of biological quality | 100055, 103027, 104009, 104534, 105320, ... | 89
GO:0005765 | 0.014141 | lysosomal membrane | 104979, 204576, 237489, 242941, 430501, ... | 9
GO:0098852 | 0.017233 | lytic vacuole membrane | 104979, 204576, 237489, 242941, 426930, ... | 12
GO:0009156 | 0.018815 | ribonucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 12
GO:0030003 | 0.02219 | cellular cation homeostasis | 100055, 103027, 111405, 196760, 217082, ... | 15
GO:0043227 | 0.022204 | membrane-bounded organelle | 100055, 100138, 100420, 101233, 101242, ... | 377
GO:0006139 | 0.025292 | nucleobase-containing compound metabolic process | 102628, 104534, 105320, 105330, 105856, ... | 225
GO:0009124 | 0.025377 | nucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 12
GO:0055080 | 0.028913 | cation homeostasis | 100055, 103027, 111405, 196760, 199932, ... | 20
GO:0009127 | 0.031596 | purine nucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 9
GO:0009168 | 0.031596 | purine ribonucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 9
GO:0007600 | 0.034097 | sensory perception | 102463, 105638, 211275, 222362, 250169, ... | 11
GO:0050877 | 0.040182 | nervous system process | 102463, 105638, 121824, 211275, 222362, ... | 16
GO:0046483 | 0.048569 | heterocycle metabolic process | 102628, 104086, 104534, 105320, 105330, ... | 252

Gene ID: 72273
module: yellow
Biomineralization process: cation/H+ exchanger (CAX family)

GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0015085 | 0.001901 | calcium ion transmembrane transporter activity | 453185, 72273, 78746, 94977 | 4
GO:0072509 | 0.00766 | divalent inorganic cation transmembrane transporter activity | 453185, 53810, 72273, 78746, 94977 | 5
GO:0006816 | 0.008921 | calcium ion transport | 453185, 72273, 78746, 94977 | 4
GO:0046873 | 0.047592 | metal ion transmembrane transporter activity | 116528, 453185, 53810, 72273, 78746, ... | 6
GO:0008150 | 0.049262 | biological_process | 100277, 101380, 103787, 104255, 10457 ... | 152

APPENDIX 3 – Table relating unmerged modules to lipid metabolism genes

Gene ID | Module | Lipid
74568 | black | Biosyn. of unsat. fatty acid
195308 | black | Sphingolipid
71084 | black | Syn. & degrad ketone
455111 | black | alpha-Linolenic acid metabolism
466572 | blue | Glycerolipid
462459 | blue | Biosyn. of unsat. fatty acid
65804 | blue | Sphingolipid
99752 | blue | Fatty acid metabolism
437991 | blue | Sphingolipid
468545 | blue | Steroid biosynthesis

455687 | blue | alpha-Linolenic acid metabolism
462898 | blue | Syn. & degrad ketone
103579 | blue | Ether Lipid metabolism
216985 | blue | Fatty acid metabolism
77411 | brown | Ether Lipid metabolism
372934 | brown | Biosyn. of unsat. fatty acid
235900 | brown | Fatty acid elongation
417285 | brown | Linoleic acid
462793 | brown | Primary bile acid
42521 | brown | Sphingolipid
113631 | brown | Arachidonic acid metabolism
441547 | brown | Fatty acid metabolism
217317 | brown | Glycerophospholipid
450766 | brown | Ether Lipid metabolism
115039 | brown | alpha-Linolenic acid metabolism
435118 | cyan | Biosyn. of unsat. fatty acid
416107 | cyan | Syn. & degrad ketone
97678 | cyan | Glycerolipid
435400 | cyan | alpha-Linolenic acid metabolism
442028 | cyan | Biosyn. of unsat. fatty acid
463079 | cyan | Sphingolipid
461539 | darkgreen | alpha-Linolenic acid metabolism
350751 | darkgreen | Glycerophospholipid
57780 | darkgreen | Glycerophospholipid
462014 | darkgrey | Glycerolipid
71725 | darkgrey | Glycerophospholipid
434850 | darkmagenta | Biosyn. of unsat. fatty acid
433989 | darkorange | Arachidonic acid metabolism
104256 | darkorange | Glycerophospholipid
455453 | darkorange | Sphingolipid
44703 | darkred | Biosyn. of unsat. fatty acid
363689 | darkturquoise | Arachidonic acid metabolism
107980 | darkturquoise | Sphingolipid
310078 | darkturquoise | Biosyn. of unsat. fatty acid
55362 | green | Glycerophospholipid
351545 | green | Ether Lipid metabolism
62920 | green | alpha-Linolenic acid metabolism
426762 | greenyellow | Biosyn. of unsat. fatty acid
225887 | greenyellow | Fatty acid metabolism
65222 | greenyellow | Ether Lipid metabolism
439583 | greenyellow | Steroid biosynthesis
437926 | greenyellow | Fatty acid metabolism
462385 | grey60 | Fatty acid biosynthesis

71654 | grey60 | Biosyn. of unsat. fatty acid
466230 | grey60 | Glycerolipid
454032 | lightcyan | Sphingolipid
451417 | lightcyan | Fatty acid metabolism
228782 | lightcyan | alpha-Linolenic acid metabolism
450371 | lightcyan | Sphingolipid
467794 | lightcyan | Sphingolipid
434624 | lightgreen | Fatty acid metabolism
231395 | lightgreen | Arachidonic acid metabolism
444084 | lightyellow | Biosyn. of unsat. fatty acid
456684 | lightyellow | Fatty acid metabolism
451975 | lightyellow | Glycerolipid
358135 | magenta | Glycerolipid
428205 | magenta | Steroid biosynthesis
353558 | magenta | Fatty acid metabolism
209748 | magenta | Glycerolipid
100082 | magenta | Glycerolipid
436719 | magenta | alpha-Linolenic acid metabolism
464665 | midnightblue | Fatty acid metabolism
455647 | midnightblue | Glycerolipid
206840 | midnightblue | Glycerolipid
462451 | midnightblue | Sphingolipid
428713 | midnightblue | Fatty acid biosynthesis
64273 | midnightblue | Steroid biosynthesis
200969 | orange | Sphingolipid
438299 | pink | Glycerolipid
308356 | purple | Arachidonic acid metabolism
310116 | purple | Glycerolipid
461962 | purple | Fatty acid biosynthesis
470128 | red | Arachidonic acid metabolism
414354 | red | Fatty acid biosynthesis
357482 | royalblue | alpha-Linolenic acid metabolism
61017 | royalblue | Ether Lipid metabolism
459364 | salmon | Ether Lipid metabolism
450278 | salmon | Steroid biosynthesis
432676 | salmon | Steroid biosynthesis
427068 | skyblue | Biosyn. of unsat. fatty acid
439159 | skyblue | Ether Lipid metabolism
425873 | skyblue | Biosyn. of unsat. fatty acid
350230 | tan | Fatty acid metabolism
198823 | tan | alpha-Linolenic acid metabolism
46870 | tan | alpha-Linolenic acid metabolism
42019 | tan | Ether Lipid metabolism

195845 | tan | Biosyn. of unsat. fatty acid
97603 | tan | Arachidonic acid metabolism
452872 | tan | Biosyn. of unsat. fatty acid
64122 | turquoise | Steroid biosynthesis
233079 | turquoise | Glycerolipid
206165 | turquoise | Sphingolipid
105406 | turquoise | Glycerolipid
448517 | turquoise | Arachidonic acid metabolism
438072 | turquoise | alpha-Linolenic acid metabolism
111458 | turquoise | Sphingolipid
218439 | turquoise | Sphingolipid
448764 | turquoise | Biosyn. of unsat. fatty acid
43539 | turquoise | alpha-Linolenic acid metabolism
434256 | turquoise | Fatty acid metabolism
108787 | turquoise | Ether Lipid metabolism
464423 | turquoise | Biosyn. of unsat. fatty acid
65073 | turquoise | alpha-Linolenic acid metabolism
463256 | turquoise | Sphingolipid
448559 | turquoise | Syn. & degrad ketone
234622 | turquoise | alpha-Linolenic acid metabolism
451556 | violet | Sphingolipid
441988 | white | Biosyn. of unsat. fatty acid
357847 | white | Fatty acid metabolism
63173 | yellow | Biosyn. of unsat. fatty acid
55855 | yellow | Biosyn. of unsat. fatty acid
45086 | yellow | Syn. & degrad ketone
51150 | yellow | Fatty acid metabolism
434784 | yellow | Glycerophospholipid
464150 | yellow | alpha-Linolenic acid metabolism
122167 | yellow | Sphingolipid
77738 | yellow | Ether Lipid metabolism

APPENDIX 4 – GO analysis table for merged E. Huxleyi data modules

• Paleturquoise

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0006412 | 1.53E-21 | 20.81670708 | 58 | 98 | translation
GO:0003735 | 3.59E-21 | 20.44525139 | 53 | 85 | structural constituent of ribosome
GO:0044445 | 2.75E-20 | 19.56122377 | 40 | 55 | cytosolic part
GO:0043043 | 4.10E-20 | 19.38686686 | 58 | 103 | peptide biosynthetic process
GO:0006518 | 1.36E-19 | 18.86760946 | 65 | 127 | peptide metabolic process
GO:0043604 | 1.90E-18 | 17.72040008 | 60 | 116 | amide biosynthetic process
GO:0044391 | 9.00E-18 | 17.04563347 | 38 | 55 | ribosomal subunit
GO:1901566 | 9.73E-18 | 17.01210918 | 118 | 340 | organonitrogen compound biosynthetic process
GO:0005840 | 1.90E-17 | 16.72017513 | 45 | 75 | ribosome
GO:0030529 | 1.24E-16 | 15.90778396 | 90 | 233 | intracellular ribonucleoprotein complex
GO:1990904 | 1.24E-16 | 15.90778396 | 90 | 233 | ribonucleoprotein complex
GO:0005198 | 2.83E-15 | 14.54812116 | 54 | 111 | structural molecule activity
GO:0043603 | 7.46E-15 | 14.12721609 | 67 | 159 | cellular amide metabolic process
GO:0022625 | 4.61E-14 | 13.33607139 | 20 | 22 | cytosolic large ribosomal subunit
GO:0003723 | 2.50E-12 | 11.60123787 | 116 | 387 | RNA binding

• Darkolivegreen

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0035556 | 0.000282555 | 3.548897568 | 28 | 144 | intracellular signal transduction
GO:0046332 | 0.000519105 | 3.284744536 | 4 | 5 | SMAD binding
GO:0001952 | 0.00055965 | 3.252083682 | 3 | 3 | regulation of cell-matrix adhesion
GO:0042981 | 0.000658487 | 3.181452778 | 15 | 63 | regulation of apoptotic process
GO:0043067 | 0.000817493 | 3.087516141 | 15 | 64 | regulation of programmed cell death
GO:0030136 | 0.000924253 | 3.034209287 | 3 | 3 | clathrin-coated vesicle
GO:0006869 | 0.000957414 | 3.018900347 | 10 | 30 | lipid transport
GO:0043066 | 0.001102197 | 2.957740599 | 10 | 36 | negative regulation of apoptotic process
GO:0043069 | 0.001102197 | 2.957740599 | 10 | 36 | negative regulation of programmed cell death
GO:0004674 | 0.0017551 | 2.755698214 | 22 | 114 | protein serine/threonine kinase activity
GO:0030155 | 0.002123979 | 2.672849883 | 6 | 16 | regulation of cell adhesion
GO:0007165 | 0.002444133 | 2.611875224 | 42 | 275 | signal transduction
GO:0010941 | 0.002999827 | 2.522903778 | 15 | 72 | regulation of cell death
GO:1902936 | 0.003301649 | 2.481269103 | 3 | 4 | phosphatidylinositol bisphosphate binding
GO:0007040 | 0.003346387 | 2.475423775 | 3 | 4 | lysosome organization
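
The -Log(pvalue) column in these tables appears to be the negative base-10 logarithm of the over-represented p-value. The short Python sketch below checks that reading against three rows of the Darkolivegreen table. The function name `neg_log10` and the `rows` list are illustrative assumptions and are not part of the original analysis pipeline.

```python
import math


def neg_log10(p_value: float) -> float:
    """Return -log10(p), the transform assumed for the -Log(pvalue) column."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in the interval (0, 1]")
    return -math.log10(p_value)


# (category, over-represented p-value, reported -Log(pvalue)),
# copied from the Darkolivegreen table above.
rows = [
    ("GO:0035556", 0.000282555, 3.548897568),
    ("GO:0046332", 0.000519105, 3.284744536),
    ("GO:0001952", 0.00055965, 3.252083682),
]

for category, p_value, reported in rows:
    # Print the recomputed value next to the value reported in the table.
    print(category, round(neg_log10(p_value), 6), reported)
```

Larger values in this column correspond to smaller p-values, so terms near the top of each module table are the most strongly enriched.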

• Magenta

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0044430 | 7.81E-05 | 4.107515725 | 41 | 167 | cytoskeletal part
GO:0005776 | 0.00242054 | 2.616087706 | 4 | 6 | autophagosome
GO:0008017 | 0.00242547 | 2.615204017 | 9 | 23 | microtubule binding
GO:0008092 | 0.002856689 | 2.544137017 | 22 | 89 | cytoskeletal protein binding
GO:0005815 | 0.003526557 | 2.452649057 | 19 | 73 | microtubule organizing center
GO:0003677 | 0.004745526 | 2.323715663 | 54 | 282 | DNA binding
GO:0022411 | 0.005014085 | 2.299808329 | 8 | 21 | cellular component disassembly
GO:0006914 | 0.006560963 | 2.183032429 | 8 | 22 | autophagy
GO:0016461 | 0.006570014 | 2.182433684 | 3 | 4 | unconventional myosin complex
GO:0003779 | 0.007198221 | 2.142774833 | 12 | 43 | actin binding
GO:0004197 | 0.007510124 | 2.12435288 | 6 | 14 | cysteine-type endopeptidase activity
GO:0051639 | 0.008128573 | 2.089985675 | 3 | 4 | actin filament network formation
GO:0051764 | 0.008128573 | 2.089985675 | 3 | 4 | actin crosslink formation
GO:0031252 | 0.008227811 | 2.084715678 | 4 | 7 | cell leading edge
GO:0035194 | 0.008615562 | 2.064716368 | 3 | 4 | posttranscriptional gene silencing by RNA

• Orange

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0009536 | 3.43E-05 | 4.464938055 | 27 | 228 | plastid
GO:0009507 | 6.58E-05 | 4.18169705 | 25 | 210 | chloroplast
GO:0046148 | 0.000150688 | 3.821921771 | 9 | 43 | pigment biosynthetic process
GO:0015995 | 0.000168059 | 3.77453949 | 6 | 18 | chlorophyll biosynthetic process
GO:0006782 | 0.000304756 | 3.516047948 | 4 | 8 | protoporphyrinogen IX biosynthetic process
GO:0042440 | 0.00042484 | 3.371774843 | 10 | 58 | pigment metabolic process
GO:0006779 | 0.000543569 | 3.264745643 | 6 | 22 | porphyrin-containing compound biosynthetic process
GO:0046501 | 0.000545087 | 3.263534033 | 4 | 9 | protoporphyrinogen IX metabolic process
GO:0006348 | 0.000550188 | 3.259488558 | 3 | 4 | chromatin silencing at telomere
GO:0033013 | 0.000679371 | 3.167892809 | 9 | 49 | tetrapyrrole metabolic process
GO:0015994 | 0.000841335 | 3.075031206 | 7 | 32 | chlorophyll metabolic process
GO:0033014 | 0.001784489 | 2.748486229 | 6 | 27 | tetrapyrrole biosynthetic process
GO:0006750 | 0.002250513 | 2.647718446 | 2 | 2 | glutathione biosynthetic process
GO:0019184 | 0.002250513 | 2.647718446 | 2 | 2 | nonribosomal peptide biosynthetic process
GO:0006222 | 0.002394651 | 2.620757806 | 3 | 6 | UMP biosynthetic process

• Green

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0051119 | 0.001429029 | 2.844958881 | 7 | 24 | sugar transmembrane transporter activity
GO:0046323 | 0.001481306 | 2.829355316 | 3 | 4 | glucose import
GO:0008643 | 0.001794557 | 2.746042652 | 10 | 46 | carbohydrate transport
GO:0007623 | 0.001830418 | 2.7374497 | 5 | 13 | circadian rhythm
GO:0015144 | 0.002378856 | 2.623631913 | 7 | 26 | carbohydrate transmembrane transporter activity
GO:0006721 | 0.002913278 | 2.535618076 | 8 | 34 | terpenoid metabolic process
GO:1901476 | 0.003033403 | 2.518069922 | 7 | 27 | NA
GO:0046975 | 0.003810117 | 2.41906167 | 3 | 5 | histone methyltransferase activity (H3-K36 specific)
GO:0045489 | 0.004372117 | 2.359308221 | 5 | 16 | pectin biosynthetic process
GO:0055072 | 0.004783996 | 2.320209237 | 5 | 16 | iron ion homeostasis
GO:0050265 | 0.004876682 | 2.311875577 | 2 | 2 | RNA uridylyltransferase activity
GO:0071076 | 0.004876682 | 2.311875577 | 2 | 2 | RNA 3' uridylation
GO:0090065 | 0.004876682 | 2.311875577 | 2 | 2 | regulation of production of siRNA involved in RNA interference
GO:1900368 | 0.004876682 | 2.311875577 | 2 | 2 | regulation of RNA interference
GO:1900370 | 0.004876682 | 2.311875577 | 2 | 2 | positive regulation of RNA interference

• Lightyellow

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0010586 | 0.000362808 | 3.440323374 | 3 | 7 | miRNA metabolic process
GO:0042819 | 0.000805974 | 3.093679183 | 3 | 9 | vitamin B6 biosynthetic process
GO:0008614 | 0.001353139 | 2.868657663 | 2 | 3 | pyridoxine metabolic process
GO:0008615 | 0.001353139 | 2.868657663 | 2 | 3 | pyridoxine biosynthetic process
GO:0008543 | 0.001579936 | 2.801360537 | 2 | 3 | fibroblast growth factor receptor signaling pathway
GO:0035278 | 0.001579936 | 2.801360537 | 2 | 3 | miRNA mediated inhibition of translation
GO:0040033 | 0.001579936 | 2.801360537 | 2 | 3 | negative regulation of translation, ncRNA-mediated
GO:0045974 | 0.001579936 | 2.801360537 | 2 | 3 | regulation of translation, ncRNA-mediated
GO:0072525 | 0.001996589 | 2.699711275 | 3 | 12 | pyridine-containing compound biosynthetic process
GO:0035198 | 0.002916771 | 2.535097731 | 2 | 4 | miRNA binding
GO:0042816 | 0.003172292 | 2.498626815 | 3 | 14 | vitamin B6 metabolic process
GO:0009443 | 0.004458681 | 2.350793637 | 2 | 5 | pyridoxal 5'-phosphate salvage
GO:0061158 | 0.004514527 | 2.345387777 | 2 | 5 | 3'-UTR-mediated mRNA destabilization
GO:0043487 | 0.004800654 | 2.318699555 | 3 | 16 | regulation of RNA stability
GO:0043488 | 0.004800654 | 2.318699555 | 3 | 16 | regulation of mRNA stability

• Tan

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:1903214 | 0.000696515 | 3.157069765 | 2 | 2 | regulation of protein targeting to mitochondrion
GO:1903747 | 0.000696515 | 3.157069765 | 2 | 2 | regulation of establishment of protein localization to mitochondrion
GO:1903749 | 0.000696515 | 3.157069765 | 2 | 2 | positive regulation of establishment of protein localization to mitochondrion
GO:1903955 | 0.000696515 | 3.157069765 | 2 | 2 | positive regulation of protein targeting to mitochondrion
GO:1901800 | 0.000913253 | 3.039408927 | 4 | 17 | positive regulation of proteasomal protein catabolic process
GO:0015788 | 0.000996947 | 3.001328032 | 3 | 8 | UDP-N-acetylglucosamine transport
GO:1903052 | 0.001431502 | 2.844208098 | 4 | 19 | positive regulation of proteolysis involved in cellular protein catabolic process
GO:1903364 | 0.001755179 | 2.755678638 | 4 | 20 | positive regulation of cellular protein catabolic process
GO:0061136 | 0.002036942 | 2.691021319 | 4 | 21 | regulation of proteasomal protein catabolic process
GO:0045732 | 0.002692845 | 2.569788685 | 4 | 23 | positive regulation of protein catabolic process
GO:1903050 | 0.003428906 | 2.464844478 | 4 | 24 | regulation of proteolysis involved in cellular protein catabolic process
GO:1903362 | 0.004008384 | 2.397030732 | 4 | 25 | regulation of cellular protein catabolic process
GO:0015781 | 0.004605602 | 2.336713556 | 3 | 13 | pyrimidine nucleotide-sugar transport
GO:0009535 | 0.006367788 | 2.196011388 | 7 | 82 | chloroplast thylakoid membrane
GO:0010822 | 0.006766774 | 2.169618303 | 2 | 5 | positive regulation of mitochondrion organization

• Lightcyan

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0007283 | 0.00101522 | 2.993439799 | 4 | 20 | spermatogenesis
GO:0048232 | 0.001484695 | 2.828362722 | 4 | 22 | male gamete generation
GO:0016607 | 0.002805299 | 2.552020914 | 5 | 42 | nuclear speck
GO:0034660 | 0.002977001 | 2.526221003 | 12 | 208 | ncRNA metabolic process
GO:0000738 | 0.00314699 | 2.502104633 | 2 | 4 | DNA catabolic process, exonucleolytic
GO:0007276 | 0.003813974 | 2.418622325 | 4 | 28 | gamete generation
GO:0006353 | 0.004442372 | 2.352385069 | 3 | 15 | DNA-templated transcription, termination
GO:0030182 | 0.004984249 | 2.302400293 | 2 | 5 | neuron differentiation
GO:0006308 | 0.00513527 | 2.28943675 | 2 | 5 | DNA catabolic process
GO:0004540 | 0.00533006 | 2.273267885 | 4 | 31 | ribonuclease activity
GO:0016866 | 0.006596425 | 2.180691397 | 4 | 33 | intramolecular transferase activity
GO:0010558 | 0.006825388 | 2.165872666 | 7 | 95 | negative regulation of macromolecule biosynthetic process
GO:0019901 | 0.006902662 | 2.16098342 | 4 | 33 | protein kinase binding
GO:0031023 | 0.007355137 | 2.133409231 | 2 | 6 | microtubule organizing center organization
GO:0004534 | 0.007580143 | 2.120322605 | 2 | 6 | 5'-3' exoribonuclease activity

• Saddlebrown

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0015377 | 0.000215211 | 3.667135524 | 2 | 2 | cation:chloride symporter activity
GO:0015379 | 0.000215211 | 3.667135524 | 2 | 2 | potassium:chloride symporter activity
GO:0022820 | 0.000215211 | 3.667135524 | 2 | 2 | potassium ion symporter activity
GO:0032807 | 0.000851043 | 3.070048367 | 2 | 3 | DNA ligase IV complex
GO:0000151 | 0.001163356 | 2.934287398 | 5 | 54 | ubiquitin ligase complex
GO:0000153 | 0.001233664 | 2.908803163 | 2 | 4 | cytoplasmic ubiquitin ligase complex
GO:0006297 | 0.001598928 | 2.796171073 | 2 | 4 | nucleotide-excision repair, DNA gap filling
GO:0005874 | 0.001795237 | 2.745878182 | 4 | 37 | microtubule
GO:0009615 | 0.002019182 | 2.694824524 | 3 | 17 | response to virus
GO:0005057 | 0.002034361 | 2.691571949 | 2 | 5 | signal transducer activity, downstream of receptor
GO:0006266 | 0.002637671 | 2.578779361 | 2 | 5 | DNA ligation
GO:0051103 | 0.002637671 | 2.578779361 | 2 | 5 | DNA ligation involved in DNA repair
GO:0099513 | 0.003459521 | 2.460984007 | 4 | 44 | polymeric cytoskeletal fiber
GO:0004534 | 0.003531738 | 2.452011464 | 2 | 6 | 5'-3' exoribonuclease activity
GO:0003909 | 0.003555429 | 2.449107977 | 2 | 6 | DNA ligase activity

• Midnightblue

category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0032993 | 0.000359336 | 3.444498733 | 4 | 15 | protein-DNA complex
GO:0006307 | 0.000554579 | 3.256036696 | 2 | 2 | DNA dealkylation involved in DNA repair
GO:0071705 | 0.001447556 | 2.839364723 | 15 | 264 | nitrogen compound transport
GO:0000786 | 0.001476444 | 2.830782977 | 3 | 10 | nucleosome
GO:0044815 | 0.001476444 | 2.830782977 | 3 | 10 | DNA packaging complex
GO:0006497 | 0.001880929 | 2.725627495 | 3 | 11 | protein lipidation
GO:0005905 | 0.003121034 | 2.505701536 | 2 | 4 | clathrin-coated pit
GO:0005769 | 0.003165507 | 2.499556711 | 3 | 13 | early endosome
GO:0008104 | 0.003633978 | 2.439617751 | 13 | 234 | protein localization
GO:0033036 | 0.004507321 | 2.346081556 | 13 | 240 | macromolecule localization
GO:0035510 | 0.004726793 | 2.325433438 | 2 | 5 | DNA dealkylation
GO:2000377 | 0.004847263 | 2.314503434 | 3 | 15 | regulation of reactive oxygen species metabolic process
GO:0015031 | 0.005238536 | 2.280790075 | 11 | 189 | protein transport
GO:0015833 | 0.005430891 | 2.26512893 | 11 | 190 | peptide transport
GO:0042886 | 0.006069902 | 2.21681833 | 11 | 193 | amide transport

• Lightgreen
category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0030041 | 0.001491519 | 2.826371102 | 2 | 3 | actin filament polymerization
GO:0051748 | 0.003983005 | 2.399789118 | 2 | 5 | UTP-monosaccharide-1-phosphate uridylyltransferase activity
GO:0005777 | 0.005480733 | 2.261161389 | 4 | 34 | peroxisome
GO:0004003 | 0.005684345 | 2.245319581 | 3 | 18 | ATP-dependent DNA helicase activity
GO:0008154 | 0.007091484 | 2.149262895 | 2 | 6 | actin polymerization or depolymerization
GO:0045010 | 0.007276545 | 2.138074783 | 2 | 6 | actin nucleation
GO:0043138 | 0.007389242 | 2.131400083 | 2 | 6 | 3'-5' DNA helicase activity
GO:0070569 | 0.007685326 | 2.114337678 | 2 | 7 | uridylyltransferase activity
GO:0008026 | 0.008047331 | 2.094348156 | 5 | 59 | ATP-dependent helicase activity
GO:0070035 | 0.008047331 | 2.094348156 | 5 | 59 | purine NTP-dependent helicase activity
GO:0043291 | 0.008391857 | 2.076141905 | 1 | 1 | RAVE complex
GO:0019369 | 0.009552424 | 2.019886411 | 2 | 8 | arachidonic acid metabolic process
GO:0016779 | 0.009622568 | 2.016709031 | 5 | 61 | activity
GO:0033046 | 0.010056725 | 1.997543415 | 2 | 7 | negative regulation of sister chromatid segregation
GO:0033048 | 0.010056725 | 1.997543415 | 2 | 7 | negative regulation of mitotic sister chromatid segregation

• Black
category | over represented pvalue | -Log(pvalue) | num DE In Cat | num In Cat | term
GO:0016805 | 0.000149246 | 3.826097833 | 3 | 3 | dipeptidase activity
GO:0005730 | 0.000594551 | 3.225811075 | 20 | 155 | nucleolus
GO:0004298 | 0.000740638 | 3.130394082 | 4 | 8 | threonine-type endopeptidase activity
GO:0070003 | 0.000740638 | 3.130394082 | 4 | 8 | threonine-type peptidase activity
GO:0005737 | 0.001690799 | 2.771907934 | 71 | 879 | cytoplasm
GO:0019774 | 0.001841394 | 2.734853169 | 3 | 5 | proteasome core complex, beta-subunit complex
GO:0016887 | 0.001925979 | 2.715348405 | 19 | 167 | ATPase activity
GO:0097284 | 0.002372928 | 2.624715369 | 2 | 2 | hepatocyte apoptotic process
GO:0048640 | 0.003006091 | 2.521997823 | 2 | 2 | negative regulation of developmental growth
GO:0001085 | 0.003548525 | 2.449952076 | 4 | 12 | RNA polymerase II transcription factor binding
GO:0000980 | 0.003598513 | 2.443876921 | 2 | 2 | RNA polymerase II distal enhancer sequence-specific DNA binding
GO:0001158 | 0.003598513 | 2.443876921 | 2 | 2 | enhancer sequence-specific DNA binding
GO:0035326 | 0.003598513 | 2.443876921 | 2 | 2 | enhancer binding
GO:0071049 | 0.003603899 | 2.443227419 | 2 | 2 | nuclear retention of pre-mRNA with aberrant 3'-ends at the site of transcription
GO:0005839 | 0.003605973 | 2.442977565 | 3 | 6 | proteasome core complex
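
The -Log(pvalue) column in the enrichment tables of this appendix appears to be the negative base-10 logarithm of the over-represented p-value. The following is a minimal Python sketch of that relationship, not the code used in the project. It uses three rows copied from the Saddlebrown table; the helper name neg_log10 is hypothetical and introduced only for illustration.

```python
import math

# (GO category, over-represented p-value, reported -Log(pvalue)),
# copied from the Saddlebrown table in this appendix.
rows = [
    ("GO:0015377", 0.000215211, 3.667135524),
    ("GO:0032807", 0.000851043, 3.070048367),
    ("GO:0000151", 0.001163356, 2.934287398),
]


def neg_log10(p_value):
    """Return -log10(p); hypothetical helper, larger values mean smaller p-values."""
    return -math.log10(p_value)


for category, p_value, reported in rows:
    computed = neg_log10(p_value)
    # The recomputed value should agree with the reported column up to rounding.
    print(f"{category}  p={p_value}  computed={computed:.6f}  reported={reported}")
```

Under this reading, a value above 2 in the -Log(pvalue) column corresponds to a p-value below 0.01, which gives a quick way to scan the tables.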

APPENDIX 5 – Table relating merged modules to biomineralization genes

Gene ID: 420005
module: black
Biomineralization process: V-ATPase, D
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0016887 | 0.001926 | ATPase activity | 110859, 194578, 199049, 221221, ... | 19
GO:0003674 | 0.012726 | molecular_function | 102951, 103655, 103684, 104343, ... | 240
GO:0016787 | 0.034776 | hydrolase activity | 102951, 103655, 108620, 110807, ... | 64

Gene ID: 434034
module: darkolivegreen
Biomineralization process: eukaryotic Na+/H+ exchanger
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0098805 | 0.006383 | whole membrane | 122965, 212097, 214071, 225887, 251155, ... | 28
GO:0098657 | 0.013853 | import into cell | 434034, 464098 | 2

Gene ID: 466232
module: darkolivegreen
Biomineralization process: anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0000324 | 0.004818 | fungal-type vacuole | 435850, 436956, 442901, 466232 | 4

Gene ID: 463266
module: darkolivegreen
Biomineralization process: fibrillins and related proteins containing Ca2+-binding EGF-like domains
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0007165 | 0.002444 | signal transduction | 101236, 114652, 123368, 197284, 201455, ... | 42

Gene ID: 439740
module: orange
Biomineralization process: H+ Ppase
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0003824 | 0.016541 | catalytic activity | 102363, 104448, 104632, 104739, 105407, 106031, ... | 145

Gene ID: 439538
module: paleturquoise
Biomineralization process: V-ATPase, A
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:1901564 | 1.05E-11 | organonitrogen compound metabolic process | 100158, 100633, 101014, 101092, 103267, ... | 274
GO:0009987 | 3.34E-09 | cellular process | 100158, 100277, 100529, 100633, 100710, ... | 527
GO:0044424 | 1.64E-08 | intracellular part | 100056, 100158, 100529, 100633, 100710, ... | 556
GO:0044444 | 3.07E-08 | cytoplasmic part | 100158, 100529, 100633, 100710, 101352, ... | 373
GO:0044422 | 1.75E-05 | organelle part | 100529, 100633, 100710, 101352, 101380, ... | 318
GO:0006811 | 5.14E-05 | ion transport | 100529, 107728, 113427, 116528, 196760, ... | 59
GO:0006753 | 6.57E-05 | nucleoside phosphate metabolic process | 100158, 107385, 212824, 214924, 217317, ... | 37
GO:0006163 | 6.77E-05 | purine nucleotide metabolic process | 100158, 107385, 214924, 254913, 359239, ... | 25
GO:0055086 | 6.99E-05 | nucleobase-containing small molecule metabolic process | 100158, 107385, 118523, 211444, 212824, ... | 42
GO:0009150 | 8.97E-05 | purine ribonucleotide metabolic process | 100158, 107385, 254913, 359239, 365588, ... | 24
GO:0015988 | 0.000101 | energy coupled proton transmembrane transport, against electrochemical gradient | 216523, 413949, 435128, 439538, 457332, ... | 6
GO:1902600 | 0.000138 | hydrogen ion transmembrane transport | 196760, 216523, 413949, 433142, 435128, ... | 9
GO:0006812 | 0.000177 | cation transport | 100529, 107728, 113427, 116528, 196760, ... | 37
GO:0044769 | 0.003514 | ATPase activity, coupled to transmembrane movement of ions, rotational mechanism | 216523, 413949, 433142, 439538, 461699 | 5
GO:0015075 | 0.005804 | ion transmembrane transporter activity | 100529, 116528, 196760, 198810, 200599, ... | 44
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0006793 | 0.034006 | phosphorus metabolic process | 100158, 100633, 104255, 107224, 107385, ... | 73
GO:0016820 | 0.043697 | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 216523, 242411, 355949, 413949, 433142, ... | 12

Gene ID: 196760
module: paleturquoise
Biomineralization process: bicarbonate transporter, anion exchanger-like Cl-/HCO3- exchangers
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0008150 | 1.23E-10 | biological_process | 100056, 100158, 100277, 100529, 100598, ... | 666
GO:0015301 | 8.98E-08 | anion:anion antiporter activity | 196760, 200599, 432785, 433656, 451704, ... | 10
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0015077 | 4.21E-05 | monovalent inorganic cation transmembrane transporter activity | 100529, 116528, 196760, 216523, 217075, ... | 18
GO:0015297 | 5.94E-05 | antiporter activity | 100529, 196760, 200599, 214474, 310007, ... | 17
GO:0022857 | 0.000779 | transmembrane transporter activity | 100529, 100710, 107728, 116528, 196760, ... | 69
GO:0098662 | 0.003113 | inorganic cation transmembrane transport | 116528, 196760, 206643, 208089, 216523, ... | 15
GO:0008514 | 0.003222 | organic anion transmembrane transporter activity | 196760, 200599, 213954, 217075, 231096, ... | 20
GO:0015701 | 0.026024 | bicarbonate transport | 196760, 99943 | 2
GO:0055067 | 0.027556 | monovalent inorganic cation homeostasis | 196760, 197732, 413949, 469783, 99943 | 5
GO:0005452 | 0.028903 | inorganic anion exchanger activity | 196760, 469783, 99943 | 3
GO:0006821 | 0.030207 | chloride transport | 196760, 466304, 99943 | 3
GO:0046873 | 0.031927 | metal ion transmembrane transporter activity | 100529, 116528, 196760, 208089, 217075, ... | 17
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0030641 | 0.036379 | regulation of cellular pH | 196760, 413949, 469783, 99943 | 4
GO:0051453 | 0.036379 | regulation of intracellular pH | 196760, 413949, 469783, 99943 | 4

Gene ID: 416800
module: paleturquoise
Biomineralization process: Ca2+/H+ exchanger (CAX3) (CAX family)
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0044464 | 1.28E-07 | cell part | 100056, 100158, 100529, 100633, 100710, ... | 592
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0015297 | 5.94E-05 | antiporter activity | 100529, 196760, 200599, 214474, 310007, ... | 17
GO:0015078 | 0.00012 | hydrogen ion transmembrane transporter activity | 100529, 216523, 354280, 413949, 416800, ... | 11
GO:0022804 | 0.00037 | active transmembrane transporter activity | 100529, 107728, 196760, 200599, 201172, ... | 39
GO:0043229 | 0.024169 | intracellular organelle | 100710, 101352, 101380, 103787, 104255, ... | 304
GO:0015369 | 0.024794 | calcium:proton antiporter activity | 416800, 72273 | 2
GO:0051139 | 0.024794 | metal ion:proton antiporter activity | 416800, 72273 | 2
GO:0015299 | 0.030259 | solute:proton antiporter activity | 100529, 416800, 72273 | 3
GO:0046873 | 0.031927 | metal ion transmembrane transporter activity | 100529, 116528, 196760, 208089, 217075, ... | 17
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0006816 | 0.037167 | calcium ion transport | 206643, 354606, 416800, 453185, 72273, ... | 7

Gene ID: 469783
module: paleturquoise
Biomineralization process: Ca2+/H+ exchanger (CAX3) (CAX family)
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0008150 | 1.23E-10 | biological_process | 100056, 100158, 100277, 100529, 100598, ... | 666
GO:0015301 | 8.98E-08 | anion:anion antiporter activity | 196760, 200599, 432785, 433656, 451704, ... | 10
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0015077 | 4.21E-05 | monovalent inorganic cation transmembrane transporter activity | 100529, 116528, 196760, 216523, 217075, ... | 18
GO:0015297 | 5.94E-05 | antiporter activity | 100529, 196760, 200599, 214474, 310007, ... | 17
GO:0022857 | 0.000779 | transmembrane transporter activity | 100529, 100710, 107728, 116528, 196760, ... | 69
GO:0008514 | 0.003222 | organic anion transmembrane transporter activity | 196760, 200599, 213954, 217075, 231096, ... | 20
GO:0055067 | 0.027556 | monovalent inorganic cation homeostasis | 196760, 197732, 413949, 469783, 99943 | 5
GO:0005452 | 0.028903 | inorganic anion exchanger activity | 196760, 469783, 99943 | 3
GO:0046873 | 0.031927 | metal ion transmembrane transporter activity | 100529, 116528, 196760, 208089, 217075, ... | 17
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0030641 | 0.036379 | regulation of cellular pH | 196760, 413949, 469783, 99943 | 4
GO:0051453 | 0.036379 | regulation of intracellular pH | 196760, 413949, 469783, 99943 | 4

Gene ID: 373149
module: paleturquoise
Biomineralization process: carbonic anhydrase, gamma
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0032991 | 3.90E-11 | macromolecular complex | 100158, 101352, 101380, 107102, 108359, ... | 199
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0098796 | 8.00E-06 | membrane protein complex | 101380, 113427, 211477, 216523, 218228, ... | 41
GO:0065003 | 0.000605 | macromolecular complex assembly | 107102, 110366, 116528, 117294, 123268, ... | 52
GO:0043933 | 0.002084 | macromolecular complex subunit organization | 107102, 110366, 116528, 117294, 123268, ... | 53
GO:0043234 | 0.020456 | protein complex | 101380, 108933, 109999, 110744, 113427, ... | 89
GO:0043229 | 0.024169 | intracellular organelle | 100710, 101352, 101380, 103787, 104255, ... | 304
GO:0071840 | 0.040567 | cellular component organization or biogenesis | 100710, 101380, 104573, 106538, 107102, ... | 124

Gene ID: 435128
module: paleturquoise
Biomineralization process: V-ATPase, B
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:1901564 | 1.05E-11 | organonitrogen compound metabolic process | 100158, 100633, 101014, 101092, 103267, ... | 274
GO:0044237 | 7.19E-10 | cellular metabolic process | 100158, 100633, 101014, 101092, 101352, ... | 429
GO:1901363 | 3.76E-09 | heterocyclic compound binding | 101352, 101940, 104255, 105936, 107102, ... | 304
GO:0006807 | 3.82E-09 | nitrogen compound metabolic process | 100158, 100633, 101014, 101092, 101352, ... | 373
GO:0044424 | 1.64E-08 | intracellular part | 100056, 100158, 100529, 100633, 100710, ... | 556
GO:0009144 | 3.16E-06 | purine nucleoside triphosphate metabolic process | 100158, 107385, 214924, 254913, 359239, ... | 19
GO:0044281 | 3.21E-06 | small molecule metabolic process | 100158, 100633, 101014, 101092, 107385, ... | 140
GO:0009141 | 6.86E-06 | nucleoside triphosphate metabolic process | 100158, 107385, 214924, 254913, 359239, ... | 20
GO:0015988 | 0.000101 | energy coupled proton transmembrane transport, against electrochemical gradient | 216523, 413949, 435128, 439538, 457332, ... | 6
GO:0036094 | 0.023403 | small molecule binding | 104255, 107102, 107224, 107385, 108933, ... | 170
GO:0043229 | 0.024169 | intracellular organelle | 100710, 101352, 101380, 103787, 104255, ... | 304
GO:0044421 | 0.033613 | extracellular region part | 100633, 113987, 115039, 196229, 198201, ... | 37
GO:0006793 | 0.034006 | phosphorus metabolic process | 100158, 100633, 104255, 107224, 107385, ... | 73
GO:0016820 | 0.043697 | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 216523, 242411, 355949, 413949, 433142, ... | 12
GO:0006458 | 0.043851 | 'de novo' protein folding | 109999, 425512, 454161, 72380 | 4
GO:0070062 | 0.045682 | extracellular exosome | 100633, 115039, 198939, 206643, 212824, ... | 29

Gene ID: 72273
module: paleturquoise
Biomineralization process: cation/H+ exchanger (CAX family)
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0008150 | 1.23E-10 | biological_process | 100056, 100158, 100277, 100529, 100598, ... | 666
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0044446 | 9.52E-06 | intracellular organelle part | 100529, 100633, 100710, 101352, 101380, ... | 317
GO:0015077 | 4.21E-05 | monovalent inorganic cation transmembrane transporter activity | 100529, 116528, 196760, 216523, 217075, ... | 18
GO:0015297 | 5.94E-05 | antiporter activity | 100529, 196760, 200599, 214474, 310007, ... | 17
GO:0022857 | 0.000779 | transmembrane transporter activity | 100529, 100710, 107728, 116528, 196760, ... | 69
GO:0015491 | 0.002781 | cation:cation antiporter activity | 100529, 354606, 416800, 72273 | 4
GO:0043229 | 0.024169 | intracellular organelle | 100710, 101352, 101380, 103787, 104255, ... | 304
GO:0015369 | 0.024794 | calcium:proton antiporter activity | 416800, 72273 | 2
GO:0051139 | 0.024794 | metal ion:proton antiporter activity | 416800, 72273 | 2
GO:0015299 | 0.030259 | solute:proton antiporter activity | 100529, 416800, 72273 | 3
GO:0046873 | 0.031927 | metal ion transmembrane transporter activity | 100529, 116528, 196760, 208089, 217075, ... | 17
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0006816 | 0.037167 | calcium ion transport | 206643, 354606, 416800, 453185, 72273, ... | 7

Gene ID: 99943
module: paleturquoise
Biomineralization process: anion exchanger-like, SLC4 Na+ independent Cl-/HCO3- exchangers
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0008150 | 1.23E-10 | biological_process | 100056, 100158, 100277, 100529, 100598, ... | 666
GO:0043232 | 2.18E-10 | intracellular non-membrane-bounded organelle | 101352, 103787, 106538, 108359, 110237, ... | 121
GO:0015301 | 8.98E-08 | anion:anion antiporter activity | 196760, 200599, 432785, 433656, 451704, ... | 10
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0015077 | 4.21E-05 | monovalent inorganic cation transmembrane transporter activity | 100529, 116528, 196760, 216523, 217075, ... | 18
GO:0015297 | 5.94E-05 | antiporter activity | 100529, 196760, 200599, 214474, 310007, ... | 17
GO:0022857 | 0.000779 | transmembrane transporter activity | 100529, 100710, 107728, 116528, 196760, ... | 69
GO:0008514 | 0.003222 | organic anion transmembrane transporter activity | 196760, 200599, 213954, 217075, 231096, ... | 20
GO:0043229 | 0.024169 | intracellular organelle | 100710, 101352, 101380, 103787, 104255, ... | 304
GO:0015701 | 0.026024 | bicarbonate transport | 196760, 99943 | 2
GO:0055067 | 0.027556 | monovalent inorganic cation homeostasis | 196760, 197732, 413949, 469783, 99943 ... | 5
GO:0005452 | 0.028903 | inorganic anion exchanger activity | 196760, 469783, 99943 | 3
GO:0006821 | 0.030207 | chloride transport | 196760, 466304, 99943 | 3
GO:0046873 | 0.031927 | metal ion transmembrane transporter activity | 100529, 116528, 196760, 208089, 217075, ... | 17
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0044421 | 0.033613 | extracellular region part | 100633, 113987, 115039, 196229, 198201, ... | 37
GO:0030641 | 0.036379 | regulation of cellular pH | 196760, 413949, 469783, 99943 | 4
GO:0051453 | 0.036379 | regulation of intracellular pH | 196760, 413949, 469783, 99943 | 4
GO:0070062 | 0.045682 | extracellular exosome | 100633, 115039, 198939, 206643, 212824, ... | 29
GO:0046983 | 0.047501 | protein dimerization activity | 116528, 194396, 206643, 212580, 217317, ... | 22

Gene ID: 413949
module: paleturquoise
Biomineralization process: V-ATPase, D
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0032991 | 3.90E-11 | macromolecular complex | 100158, 101352, 101380, 107102, 108359, ... | 199
GO:0003674 | 6.30E-06 | molecular_function | 100056, 100158, 100277, 100529, 100598, ... | 680
GO:0098796 | 8.00E-06 | membrane protein complex | 101380, 113427, 211477, 216523, 218228, ... | 41
GO:0015077 | 4.21E-05 | monovalent inorganic cation transmembrane transporter activity | 100529, 116528, 196760, 216523, 217075, ... | 18
GO:0090662 | 0.000101 | ATP hydrolysis coupled transmembrane transport | 216523, 413949, 435128, 439538, 457332, ... | 6
GO:0099131 | 0.000101 | ATP hydrolysis coupled ion transmembrane transport | 216523, 413949, 435128, 439538, 457332, ... | 6
GO:1902600 | 0.000138 | hydrogen ion transmembrane transport | 196760, 216523, 413949, 433142, 435128, ... | 9
GO:0043234 | 0.020456 | protein complex | 101380, 108933, 109999, 110744, 113427, ... | 89
GO:0043229 | 0.024169 | intracellular organelle | 100710, 101352, 101380, 103787, 104255, ... | 304
GO:0055067 | 0.027556 | monovalent inorganic cation homeostasis | 196760, 197732, 413949, 469783, 99943 | 5
GO:0008324 | 0.032877 | cation transmembrane transporter activity | 100529, 116528, 196760, 198810, 208089, ... | 29
GO:0030641 | 0.036379 | regulation of cellular pH | 196760, 413949, 469783, 99943 | 4
GO:0051453 | 0.036379 | regulation of intracellular pH | 196760, 413949, 469783, 99943 | 4
GO:0016820 | 0.043697 | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 216523, 242411, 355949, 413949, 433142, ... | 12

Gene ID: 62679
module: turquoise
Biomineralization process: carbonic anhydrase, alpha
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0044763 | 0.011661 | NA | 100055, 101242, 103027, 103327, 104031, ... | 314
GO:0009987 | 0.036374 | cellular process | 100055, 100973, 101233, 101242, 102628, ... | 569

Gene ID: 464767
module: turquoise
Biomineralization process: V-ATPase, A
GO term | over represented pvalue | function | All genes with same term | No. of genes
GO:0042592 | 0.000134 | homeostatic process | 100055, 103027, 104009, 105320, 122834, ... | 55
GO:0019725 | 0.000284 | cellular homeostasis | 100055, 103027, 217082, 222551, 224426, ... | 39
GO:0048878 | 0.001202 | chemical homeostasis | 100055, 103027, 104009, 194068, 199932, ... | 29
GO:0003008 | 0.001567 | system process | 102463, 104009, 105638, 121824, 202804, ... | 22
GO:0043227 | 0.003649 | membrane-bounded organelle | 100055, 100138, 101233, 101242, 102628, ... | 340
GO:0050801 | 0.004356 | ion homeostasis | 100055, 103027, 104009, 199932, 214231, ... | 23
GO:0006873 | 0.004829 | cellular ion homeostasis | 100055, 103027, 217082, 222551, 369600, ... | 16
GO:0005765 | 0.005229 | lysosomal membrane | 104979, 204576, 237489, 242941, 430501, ... | 9
GO:0098771 | 0.008877 | inorganic ion homeostasis | 100055, 103027, 104009, 199932, 214231, ... | 20
GO:0050877 | 0.011263 | nervous system process | 102463, 105638, 121824, 211275, 222362, ... | 16
GO:0044763 | 0.011661 | NA | 100055, 101242, 103027, 103327, 104031, ... | 314
GO:0007600 | 0.01213 | sensory perception | 102463, 105638, 211275, 222362, 250169, ... | 11
GO:0034654 | 0.0122 | nucleobase-containing compound biosynthetic process | 105320, 106724, 122834, 125567, 194430, ... | 78
GO:0055082 | 0.012981 | cellular chemical homeostasis | 100055, 103027, 217082, 222551, 369600, ... | 17
GO:0065008 | 0.013111 | regulation of biological quality | 100055, 103027, 104009, 104534, 105320, ... | 79
GO:0009127 | 0.013169 | purine nucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 9
GO:0009168 | 0.013169 | purine ribonucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 9
GO:0043623 | 0.013242 | cellular protein complex assembly | 200144, 203237, 211393, 232861, 240076, ... | 20
GO:0032501 | 0.013486 | multicellular organismal process | 102463, 104009, 105320, 105330, 105638, ... | 62
GO:0098852 | 0.015572 | lytic vacuole membrane | 104979, 204576, 237489, 242941, 426930, ... | 11
GO:0043226 | 0.016588 | organelle | 100055, 100138, 101233, 101242, 102628, ... | 385
GO:0006139 | 0.017569 | nucleobase-containing compound metabolic process | 102628, 104534, 105320, 105330, 105856, ... | 200
GO:0009156 | 0.017735 | ribonucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 11
GO:0071822 | 0.01884 | protein complex subunit organization | 194068, 199313, 200144, 202828, 203237, ... | 35
GO:0009124 | 0.023662 | nucleoside monophosphate biosynthetic process | 224909, 360925, 421581, 432220, 441105, ... | 11
GO:0006461 | 0.0238 | protein complex assembly | 194068, 199313, 200144, 203237, 211393, ... | 32
GO:0044699 | 0.024956 | NA | 100055, 101242, 103027, 103327, 104009, ... | 410
GO:0007605 | 0.026074 | sensory perception of sound | 102463, 105638, 211275, 446686, 45912, ... | 6
GO:0050954 | 0.026074 | sensory perception of mechanical stimulus | 102463, 105638, 211275, 446686, 45912, ... | 6
GO:0055080 | 0.026872 | cation homeostasis | 100055, 103027, 199932, 214231, 217082, ... | 18
GO:0046483 | 0.030763 | heterocycle metabolic process | 102628, 104086, 104534, 105320, 105330, ... | 224
GO:0043229 | 0.030951 | intracellular organelle | 100055, 100138, 101242, 102628, 104031, ... | 366
GO:0098588 | 0.031091 | bounding membrane of organelle | 100055, 102463, 104031, 104937, 104979, ... | 76
GO:0030003 | 0.034061 | cellular cation homeostasis | 100055, 103027, 217082, 222551, 369600, ... | 13
GO:0034641 | 0.034942 | cellular nitrogen compound metabolic process | 102628, 104086, 104534, 104979, 105320, ... | 246
GO:0009987 | 0.036374 | cellular process | 100055, 100973, 101233, 101242, 102628, ... | 569
GO:0044424 | 0.040502 | intracellular part | 100055, 100138, 100973, 101050, 101242, ... | 608
GO:0044463 | 0.041571 | cell projection part | 102463, 105638, 199357, 211393, 212652, ... | 20
GO:0005488 | 0.042195 | binding | 100055, 100973, 101233, 101622, 102628, ... | 526

APPENDIX 6 – Table relating merged modules to lipid metabolism genes

Gene ID | module | Lipid
74568 | black | Biosyn. of unsat. fatty acid
195308 | black | Sphingolipid
71084 | black | Syn. & degrad ketone
455111 | black | alpha-Linolenic acid metabolism
466572 | blue | Glycerolipid
462459 | blue | Biosyn. of unsat. fatty acid
65804 | blue | Sphingolipid
99752 | blue | Fatty acid metabolism
437991 | blue | Sphingolipid
468545 | blue | Steroid biosynthesis
455687 | blue | alpha-Linolenic acid metabolism
462898 | blue | Syn. & degrad ketone
103579 | blue | Ether Lipid metabolism
216985 | blue | Fatty acid metabolism
77411 | brown | Ether Lipid metabolism
372934 | brown | Biosyn. of unsat. fatty acid
235900 | brown | Fatty acid elongation
417285 | brown | Linoleic acid
462793 | brown | Primary bile acid
42521 | brown | Sphingolipid
113631 | brown | Arachidonic acid metabolism
441547 | brown | Fatty acid metabolism
217317 | brown | Glycerophospholipid
450766 | brown | Ether Lipid metabolism
115039 | brown | alpha-Linolenic acid metabolism
435118 | cyan | Biosyn. of unsat. fatty acid
416107 | cyan | Syn. & degrad ketone
97678 | cyan | Glycerolipid
435400 | cyan | alpha-Linolenic acid metabolism
442028 | cyan | Biosyn. of unsat. fatty acid
463079 | cyan | Sphingolipid
461539 | darkgreen | alpha-Linolenic acid metabolism
350751 | darkgreen | Glycerophospholipid
57780 | darkgreen | Glycerophospholipid
462014 | darkgrey | Glycerolipid
71725 | darkgrey | Glycerophospholipid
434850 | darkmagenta | Biosyn. of unsat. fatty acid
433989 | darkorange | Arachidonic acid metabolism
104256 | darkorange | Glycerophospholipid
455453 | darkorange | Sphingolipid
44703 | darkred | Biosyn. of unsat. fatty acid
363689 | darkturquoise | Arachidonic acid metabolism
107980 | darkturquoise | Sphingolipid
310078 | darkturquoise | Biosyn. of unsat. fatty acid
55362 | green | Glycerophospholipid
351545 | green | Ether Lipid metabolism
62920 | green | alpha-Linolenic acid metabolism
426762 | greenyellow | Biosyn. of unsat. fatty acid
225887 | greenyellow | Fatty acid metabolism
65222 | greenyellow | Ether Lipid metabolism
439583 | greenyellow | Steroid biosynthesis
437926 | greenyellow | Fatty acid metabolism
462385 | grey60 | Fatty acid biosynthesis
71654 | grey60 | Biosyn. of unsat. fatty acid
466230 | grey60 | Glycerolipid
454032 | lightcyan | Sphingolipid
451417 | lightcyan | Fatty acid metabolism
228782 | lightcyan | alpha-Linolenic acid metabolism
450371 | lightcyan | Sphingolipid
467794 | lightcyan | Sphingolipid
434624 | lightgreen | Fatty acid metabolism
231395 | lightgreen | Arachidonic acid metabolism
444084 | lightyellow | Biosyn. of unsat. fatty acid
456684 | lightyellow | Fatty acid metabolism
451975 | lightyellow | Glycerolipid
358135 | magenta | Glycerolipid
428205 | magenta | Steroid biosynthesis
353558 | magenta | Fatty acid metabolism
209748 | magenta | Glycerolipid
100082 | magenta | Glycerolipid
436719 | magenta | alpha-Linolenic acid metabolism
464665 | midnightblue | Fatty acid metabolism
455647 | midnightblue | Glycerolipid
206840 | midnightblue | Glycerolipid
462451 | midnightblue | Sphingolipid
428713 | midnightblue | Fatty acid biosynthesis
64273 | midnightblue | Steroid biosynthesis
200969 | orange | Sphingolipid
438299 | pink | Glycerolipid
308356 | purple | Arachidonic acid metabolism
310116 | purple | Glycerolipid
461962 | purple | Fatty acid biosynthesis
470128 | red | Arachidonic acid metabolism
414354 | red | Fatty acid biosynthesis
357482 | royalblue | alpha-Linolenic acid metabolism
61017 | royalblue | Ether Lipid metabolism
459364 | salmon | Ether Lipid metabolism
450278 | salmon | Steroid biosynthesis
432676 | salmon | Steroid biosynthesis
427068 | skyblue | Biosyn. of unsat. fatty acid
439159 | skyblue | Ether Lipid metabolism
425873 | skyblue | Biosyn. of unsat. fatty acid
350230 | tan | Fatty acid metabolism
198823 | tan | alpha-Linolenic acid metabolism
46870 | tan | alpha-Linolenic acid metabolism
42019 | tan | Ether Lipid metabolism
195845 | tan | Biosyn. of unsat. fatty acid
97603 | tan | Arachidonic acid metabolism
452872 | tan | Biosyn. of unsat. fatty acid
64122 | turquoise | Steroid biosynthesis
233079 | turquoise | Glycerolipid
206165 | turquoise | Sphingolipid
105406 | turquoise | Glycerolipid
448517 | turquoise | Arachidonic acid metabolism
438072 | turquoise | alpha-Linolenic acid metabolism
111458 | turquoise | Sphingolipid
218439 | turquoise | Sphingolipid
448764 | turquoise | Biosyn. of unsat. fatty acid
43539 | turquoise | alpha-Linolenic acid metabolism
434256 | turquoise | Fatty acid metabolism
108787 | turquoise | Ether Lipid metabolism
464423 | turquoise | Biosyn. of unsat. fatty acid
65073 | turquoise | alpha-Linolenic acid metabolism
463256 | turquoise | Sphingolipid
448559 | turquoise | Syn. & degrad ketone
234622 | turquoise | alpha-Linolenic acid metabolism
451556 | violet | Sphingolipid
441988 | white | Biosyn. of unsat. fatty acid
357847 | white | Fatty acid metabolism
63173 | yellow | Biosyn. of unsat. fatty acid
55855 | yellow | Biosyn. of unsat. fatty acid
45086 | yellow | Syn. & degrad ketone
51150 | yellow | Fatty acid metabolism
434784 | yellow | Glycerophospholipid
464150 | yellow | alpha-Linolenic acid metabolism
122167 | yellow | Sphingolipid
77738 | yellow | Ether Lipid metabolism
