Effect of Caffeine Exposure on Gene Expression in Mesenchymal Stem Cells

Madison Pedreira

Abstract

Millions of women consume caffeine during pregnancy, which may affect their child’s development. In animal models, fetal caffeine exposure has been shown to alter embryonic cardiac tissue and to produce long-term effects. Some human studies have linked fetal caffeine exposure to miscarriage and reduced birth weight, but the findings are not consistent across studies. To better understand the influence of caffeine on stem cell gene expression, we cultured mesenchymal stem cells (MSCs) from human umbilical cords. One group of samples was obtained from mothers who consumed high levels of caffeine during pregnancy, and the other group from mothers who consumed lower levels. RNA was isolated from 3 cell lines at each level of caffeine exposure and analyzed by RNAseq. Analysis of the RNAseq data revealed differences in gene expression between the two groups. In this report, we demonstrate that genes involved in apoptosis, gliomas, and transcriptional activation are upregulated in the high caffeine samples, while genes involved in apoptosis inhibition, embryonic and neural development, and transcriptional repression are downregulated.

Introduction

Environmental factors can have a major effect on the development of the fetus in the womb. Anything consumed by the pregnant mother has the potential to affect the fetus’s gene expression directly or through epigenetic pathways[1]. Smoking and drinking during pregnancy have been shown to have serious consequences for the fetus; caffeine, however, has not been considered as dangerous. Understanding the consequences of caffeine consumption during pregnancy is a public health concern that directly affects children and their development. Animal studies have reported heart-related abnormalities, including ventricular septal defects[2] and decreased embryonic cardiac tissue[3], in embryos exposed to caffeine in utero. In humans, caffeine exposure has been correlated with lower birth weight[4] and miscarriage[5], but not all studies report these associations. Data from our previous animal experiments showed that embryonic caffeine exposure leads to altered cardiac function in mice[3, 6, 7], as described in the following paragraphs. These mouse data are important to consider alongside our human stem cell analysis in order to better understand the effect of fetal caffeine exposure.

One of our previous animal studies showed that in utero exposure to caffeine affects embryonic development and cardiac function after birth[3]. After pregnant mice were treated with caffeine, either the embryos were collected or the mothers were allowed to give birth. This allowed analysis of both the prenatal fetus and the postnatal offspring to determine the extent to which caffeine affects development after fertilization. In the embryos, cardiac histology showed that the caffeine-treated group had a smaller cross-sectional area of the ventricular myocardium, indicating that embryonic development of the heart was negatively affected by caffeine exposure.
In the pups, echocardiography was used to assess cardiac function and showed a decrease in overall cardiac function and in systolic function. These results indicate that in utero caffeine exposure in mice affects heart development both before and after birth, a concern that must be studied in human samples to determine whether caffeine consumption during pregnancy is dangerous.

A related concern is that fetal caffeine exposure may affect not only the offspring but also later generations. If so, the consequences of consuming caffeine become much greater, and another of our previous animal studies indicates that this may be the case[7]. Pregnant mice were treated with caffeine, and their offspring and the next two generations displayed altered cardiac morphology and function. Generation 1 was the only generation exposed to caffeine in utero, yet the next two generations displayed adverse effects into adulthood. Symptoms of altered cardiac function often did not appear until adulthood in the 2nd and 3rd generations, indicating that fetal caffeine exposure may have latent transgenerational effects. This raises questions about how representative this study is of humans. To understand the extent of the danger of caffeine consumption during pregnancy, these findings must be confirmed in human models.

These findings are also supported by animal cell studies. Another previous study demonstrated that gene expression and cell morphology are altered when mouse cardiomyocytes are treated with caffeine[6]. Cells treated with caffeine had higher mRNA expression of several cardiac structural genes, cardiac transcription factors, and cardiac miRNAs. The cells also displayed morphological changes and were reduced in number above a certain caffeine concentration. This study supports the results of the two studies discussed above, but with cellular data. Using cells to study the effect of caffeine treatment is helpful because there is less genetic variability that may affect the results, as cells grow in a more controlled environment than breeding mice.

As described, there is existing evidence from mouse studies indicating that in utero caffeine exposure affects embryonic development in ways that may lead to altered phenotypes in adulthood. The next step in this sequence of studies is to use a human model to verify the data from our animal studies. The model for fetal development in this study is human umbilical cord MSCs taken from subjects, with the samples separated into two groups: high vs. low in utero caffeine exposure. High caffeine exposure is defined as greater than 71.3 mg/day (about 6 oz. coffee) and low caffeine exposure as less than 24.9 mg/day. This is not an ideal model, but it is a good first step because it allows us to examine human fetal cells in response to in utero caffeine exposure. Because these cells are mesenchymal stem cells from fetal tissue, we can compare them to our mouse data. MSCs are undifferentiated stem cells and will allow us to identify genes that are differentially expressed in response to caffeine exposure. This will give us a baseline so that future studies can examine these cells after differentiation into cardiomyocytes. RNAseq data from the two human sample groups (high vs. low caffeine exposure) will be compared on the basis of gene expression to determine any differences between them. This also allows us to determine whether human MSCs respond to caffeine in a way similar to what we have observed in animal models.
Here, we use an RNAseq analysis pipeline to understand the differentially expressed genes between high caffeine and low caffeine groups. This will help us understand if caffeine consumption during pregnancy has an effect on gene expression in the embryo, and whether this has long-term effects on the health of the fetus.

Methods

Quality control and alignment to transcriptome

The 6 RNAseq samples (3 high caffeine, 3 low caffeine) were checked for quality using FastQC (version 0.11.4)[8] and MultiQC (version 1.1)[9], both before and after trimming. These reports provide quality scores, per-base sequence content, and GC content. Based on the quality control report, the sequences were trimmed using Trimmomatic (version 0.36)[10] to remove low-quality bases. The samples were then aligned to the human transcriptome using STAR (version 2.7.3a)[11]. STAR is an efficient alignment tool for RNAseq because it takes splicing into account when aligning reads rather than mapping them directly to the genomic sequence.

Quantification and differential expression analysis

Gene and transcript abundance and expression values were quantified with RSEM (version 1.2.31)[12]. RSEM also produces a correlation scatterplot with correlation scores between each pair of replicates, along with a Multi-Dimensional Scaling (MDS) plot for the raw expression data. DESeq2[13] was used for the differential expression analysis to determine which of the genes quantified by RSEM were differentially expressed between the two study conditions. Genes were considered significant if they had |log2(FC)| ≥ 1.0 and a False Discovery Rate corrected p-value ≤ 0.05. Another MDS plot was generated in DESeq2 once the expression data were normalized. The DESeq2 results were filtered to protein-coding genes specifically so that genes of interest could be identified in developmental pathways to understand the effect that caffeine has on embryonic development.
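
The significance filter described above can be sketched in a few lines of Python. This is only an illustration of the two thresholds, not the code used in this study: the function name `filter_significant`, the dictionary keys, and the toy rows are all hypothetical, and the actual filtering was done on DESeq2 output.

```python
def filter_significant(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Return rows with |log2FC| >= lfc_cutoff and adjusted p-value <= fdr_cutoff."""
    kept = []
    for row in results:
        padj = row["padj"]
        if padj is None:
            # Rows without an adjusted p-value cannot be called significant.
            continue
        if abs(row["log2fc"]) >= lfc_cutoff and padj <= fdr_cutoff:
            kept.append(row)
    return kept


# Hypothetical toy rows, not data from this study.
toy_results = [
    {"gene": "GENE_A", "log2fc": 2.4, "padj": 0.003},   # up, significant
    {"gene": "GENE_B", "log2fc": 0.6, "padj": 0.001},   # fold change too small
    {"gene": "GENE_C", "log2fc": -1.8, "padj": 0.020},  # down, significant
    {"gene": "GENE_D", "log2fc": -3.1, "padj": 0.210},  # fails the FDR cutoff
    {"gene": "GENE_E", "log2fc": 1.5, "padj": None},    # no adjusted p-value
]

significant = filter_significant(toy_results)
upregulated = [r["gene"] for r in significant if r["log2fc"] > 0]
downregulated = [r["gene"] for r in significant if r["log2fc"] < 0]
print(upregulated, downregulated)
```

Taking the absolute value of log2(FC) is what lets a single cutoff of 1.0 capture both a doubling and a halving of expression.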

Pathway identification

GeneCards, a searchable online database of human genes, was used to identify potential pathways for the differentially expressed genes found by the RNAseq pipeline. A more accurate and robust pathway analysis would be done in a continuation of this study; GeneCards was used here as a preliminary pathway analysis. GeneCards draws on sources including UniProt for its summary of each gene.

Results and Discussion

Quality control analysis shows high quality scores and transcriptome alignment rate

To assess the sequencing quality of our isolated RNA samples, we ran quality control before any further analysis. Quality control provides the information that guides trimming, which in turn leads to a better alignment to the transcriptome. The quality of the samples was analyzed using FastQC[8], and the quality report determined how much needed to be trimmed from the ends of the reads to ensure that only high-quality bases were retained. This allows for the closest possible mapping to the transcriptome, for which we used STAR[11]. The percentage of reads retained after trimming (Table 1) is high because of the high quality scores of the samples. The quality scoring used here assigns only 4 quality score values, as compared to the previous range of 40, resulting in the highest possible scores for these samples. Trimming allows for more efficient mapping to the transcriptome, which contributes to the transcriptome alignment rate being above 90% for each sample (Table 2). We use transcriptome alignment rate rather than genome alignment rate because, for RNAseq samples, mapping to the genome can result in reads mapping to multiple locations, whereas mapping to the transcriptome chooses the best possible match for each read. The coverage is the total number of nucleotides sequenced divided by the size of the genome, which is misleading for the reasons stated above. The effective coverage is the average coverage divided by the effectively covered fraction of the genome, defined as the fraction of bases having a coverage greater than 5. The effective coverage is very high, with an average of 232.40 (Table 2). This indicates that future studies could use more replicates and the coverage would still be high enough to run a successful analysis.
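
To make the two coverage definitions above concrete, the following Python sketch computes them from a list of per-base read depths. It is a toy illustration under stated assumptions: the function name `coverage_summary` and the ten-base depth list are hypothetical, and the sum of per-base depths is assumed to stand in for the total number of nucleotides sequenced.

```python
def coverage_summary(per_base_depth, min_depth=5):
    """Return (average coverage, effectively covered fraction, effective coverage)."""
    n_bases = len(per_base_depth)
    # Average coverage: total sequenced nucleotides divided by reference size.
    average = sum(per_base_depth) / n_bases
    # Effectively covered fraction: share of bases with depth above min_depth.
    covered = sum(1 for depth in per_base_depth if depth > min_depth)
    fraction = covered / n_bases
    # Effective coverage: average coverage divided by that fraction.
    effective = average / fraction if fraction > 0 else float("nan")
    return average, fraction, effective


# Hypothetical reference of 10 bases in which only 4 bases are well covered.
toy_depths = [0, 0, 0, 0, 0, 0, 40, 60, 80, 20]
average, fraction, effective = coverage_summary(toy_depths)
print(average, fraction, effective)
```

In the toy example the reads pile up on a minority of bases, so the effective coverage is higher than the average coverage. The same pattern appears in Table 2, where expressed transcripts occupy only part of the genome.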

Quantified transcript abundance shows similarity between replicates

To check consistency between replicates of the same condition, we quantified the transcript expression values for every gene in each sample so that they could be compared against each other. We used RSEM[12] to quantify transcript abundance and expression values for each sample. RSEM also calculates correlation scores between the samples and generates scatterplots and MDS plots to visualize these comparisons. This check matters because too much biological variability between samples within the same condition would mean that the 3 samples of each condition could not be effectively pooled for differential analysis against the other condition. Correlation values and scores are given in Table 3, and scatterplots for each comparison (Figure 4) show the level of similarity. The least similar samples within their condition were CAF1 vs. CAF2 and VEH2 vs. VEH3. The most similar samples were CAF1 vs. CAF3 and VEH1 vs. VEH3. The scatterplots with higher correlation scores have fewer outliers away from the trendline, while those with lower correlation scores have a broader range of points deviating from the main line. The correlations can also be visualized on the Multi-Dimensional Scaling (MDS) plot (Figure 5) from the quantification analysis on the raw (un-normalized) expression data. As seen on the plot, CAF1 and CAF3 are close together while CAF2 is farther away, which is supported by the correlation scores. VEH1 and VEH3 are also plotted close to each other while VEH2 is farther away. The discrepancies of VEH2 and CAF2 may be explained by the sample collection steps: both had lower starting concentrations than the other samples, and VEH2 had an RNA Integrity Number (RIN) almost half that of the other samples.

Caffeine exposure resulted in 126 differentially expressed protein-coding genes

The differential analysis identified 126 protein-coding genes, 68 overexpressed and 58 underexpressed in the caffeine samples. Figure 6 shows the MDS plot from the differential analysis step after normalization, and Table 7 lists the differentially expressed genes most relevant to developmental pathways. Each gene identified and described in the sections below was searched on GeneCards for its biological function.

Upregulated genes involved in apoptosis, transcriptional activation, and developmental pathways were identified

To understand the effect of in utero caffeine exposure on human fetal stem cells, we analyzed which genes were overexpressed in the high caffeine samples compared to the low caffeine group. One pathway with many genes upregulated in the caffeine samples is apoptosis. MYCT1 and SFRP4 are correlated with apoptosis when overexpressed. PPP1R13L, OAS3, and CASP10 also have roles in apoptosis, with CASP10 acting in the execution phase. Brain and neuronal development pathways are also upregulated in the caffeine samples. CDH8 plays a role in axon outgrowth and CNTNAP3 plays a role in mediating neuron-glial interactions in the brain. EFEMP1 plays a role in malignant gliomas, and AEBP1 is often associated with glioblastoma when overexpressed, as it is here. Overexpression of these two genes is therefore associated with both caffeine consumption during pregnancy and brain cancer. Some pathways related to cardiac development and regulation are also upregulated. SFRP4 is correlated with apoptosis when it is expressed more highly in the ventricular myocardium. POPDC2 and KCNK6 help maintain cardiac conduction and regulate heart rate. BHLHE40 is involved in heart development and also affects blood pressure by repressing transcription of ATP1B1 in the cardiovascular system. Genes involved in DNA modification are also expressed differently in the caffeine group. HIST1H2BC is a component of nucleosomes, and ZNF219 is a repressor of nucleosome binding domain 1 and is therefore linked to chromatin in its more active conformation. CPA4 is a potential component of the hyperacetylation pathway, which would also be associated with transcriptionally active chromatin.
These results suggest that fetal caffeine exposure is associated with overexpression of genes involved in cell death, neural development, tumor origination, cardiac development, and a more transcriptionally active chromatin conformation. Gene expression in these important developmental pathways is therefore altered in response to in utero caffeine exposure.

Downregulated genes involved in inhibition of apoptosis, transcriptional silencing, and developmental pathways were identified

Identifying underexpressed genes is just as important as identifying overexpressed genes for understanding how fetal caffeine exposure genetically affects the child, because it tells us which pathways may be inhibited during fetal development. TNFRSF10D, CDK15, and PARM1 were all downregulated in the caffeine samples, and all inhibit apoptosis. This contrasts with the five genes involved in apoptosis that were overexpressed in the caffeine samples. Also, while multiple genes involved in gliomas were overexpressed in the caffeine samples, several genes involved in neuronal development were underexpressed. These include CIT (central nervous system development), LSAMP (neuronal growth and limbic system development), GAS7 (neuronal development), DOK6 and DLX2 (brain development), and DAB1 (brain development and directing migration of neurons). As in the caffeine samples, some heart development genes are upregulated in the control samples. CENPF may be a regulator of the embryonic cardiomyocyte cell cycle and KCNIP3 plays a role in cardiac conduction. The DNA modification genes that were upregulated in the caffeine samples were correlated with active chromatin. By contrast, SALL1 is more highly expressed in the control samples and may be a component of Histone Deacetylase, which would result in transcriptional repression. Every gene involved in embryonic development was upregulated in the control samples, including CENPF, PEG10, TNFRSF19, MMP16 (breaks down extracellular matrix during embryonic development), and LAMA1 (organizes cells into tissues during embryonic development). These results indicate that genes opposing the functions of the upregulated genes described in the previous section are downregulated in the caffeine samples.
The pathways represented by the overexpressed and underexpressed genes in this differential expression analysis overlap with each other. The basic pathway analysis performed in this study is not exhaustive, but it provides a baseline from which to draw preliminary conclusions that can be investigated further with a more refined pathway analysis.

Conclusion

Our previous studies using animal models have shown compelling results about the negative effects of caffeine consumption during pregnancy. In one study, we demonstrated that cardiac development was altered in the mouse fetus prior to birth and that the pups also showed postnatal effects of in utero caffeine exposure[3]. Other data show that these consequences may be transgenerational: indications of altered cardiac development and function in mice were seen two and three generations after the fetal caffeine exposure[7]. If these results were also demonstrated in humans, caffeine consumption during pregnancy would become a serious health consideration, as smoking and drinking alcohol are.

We used human stem cells from umbilical cords in this study to explore the effect of caffeine exposure on gene expression in human samples. It has been shown previously that cell morphology and gene expression are altered in mouse cells treated with caffeine[6]. We are interested in studying human cells to compare these findings and determine the relevance of our animal studies to humans. The existing mouse data showing how in utero caffeine exposure affects embryonic development, and morphology and function into adulthood, are compelling and lead us to believe that caffeine consumption during pregnancy may be a relevant public health concern. To draw conclusions about the importance of these studies for human babies, there must be a human model of fetal development in which to test the effect of caffeine. We used undifferentiated MSCs from human umbilical cords taken from subjects who consumed either high or low levels of caffeine during pregnancy. Using these cells as a model for human fetal development allows us to directly compare our results to previous mouse cell studies.

This study shows that 126 protein-coding genes are differentially expressed between the high and low caffeine exposure groups. Many of these genes are involved in developmental processes, including but not limited to embryonic, cardiac, and neural development pathways. Caffeine samples expressed apoptosis-inducing genes more highly than the control samples, whereas the control samples had greater expression of genes involved in inhibiting apoptosis. Similarly, caffeine samples showed greater expression of genes involved in transcriptional activation, including nucleosome binding domain protein repression and histone hyperacetylation. Meanwhile, one gene upregulated in the control samples is thought to be a component of Histone Deacetylase. These results indicate that the two conditions show opposing functional patterns that may play a role in embryonic development. Future steps in this study would include performing the same RNAseq analysis in differentiated cells, such as human cardiac or neuronal cells treated with caffeine. To be consistent with the existing literature, pathway analysis, real-time PCR confirmation, and examination of cell morphology would be useful in comparing the results of animal studies with these new human-model data.

Figures, Tables, and Legends

Sample   Reads before QC   Reads after QC   % retained
CAF1     71,763,931        69,531,117       96.89%
CAF2     72,190,964        69,757,771       96.63%
CAF3     110,091,053       106,679,283      96.90%
VEH1     65,130,347        63,054,726       96.81%
VEH2     80,684,221        78,544,851       97.35%
VEH3     71,655,149        69,415,580       96.87%

Table 1: The number of reads in each sample before and after QC/trimming indicates high sequence quality. Read quality was assessed using FastQC[8], and this report guided the trimming done by Trimmomatic[10]. The 2nd and 3rd columns show that very few reads were removed based on the quality control report. The last column shows the percentage of reads retained after trimming.
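
As a quick arithmetic check, the last column of Table 1 can be recomputed from the read counts in the other two columns. The Python sketch below does this; the read counts are taken from Table 1, while the variable and function names are only illustrative.

```python
# Read counts (before QC, after QC) copied from Table 1.
read_counts = {
    "CAF1": (71_763_931, 69_531_117),
    "CAF2": (72_190_964, 69_757_771),
    "CAF3": (110_091_053, 106_679_283),
    "VEH1": (65_130_347, 63_054_726),
    "VEH2": (80_684_221, 78_544_851),
    "VEH3": (71_655_149, 69_415_580),
}


def percent_retained(before, after):
    """Percentage of reads that survive quality trimming."""
    return 100.0 * after / before


retained = {
    sample: f"{percent_retained(before, after):.2f}%"
    for sample, (before, after) in read_counts.items()
}
for sample, value in retained.items():
    print(sample, value)
```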

Sample   Transcriptome alignment rate   Coverage   Effective coverage
CAF1     93.57%                         56.81      224.96
CAF2     94.58%                         56.10      222.40
CAF3     94.28%                         79.39      297.67
VEH1     92.75%                         44.83      187.57
VEH2     93.25%                         62.55      239.46
VEH3     93.75%                         55.58      222.31

Table 2: The transcriptome alignment rate and genome coverage after trimming are high, indicating that the sequences aligned well to the human reference transcriptome. STAR[11] was used because it maps the samples to the human transcriptome, which is better for RNA sequences because it takes splicing into account. We look at transcriptome alignment rate rather than genome alignment rate because mapping RNAseq samples to the genome can result in reads mapping to several locations rather than just the desired one. Coverage is the number of nucleotides sequenced divided by the size of the genome, which is misleading since mapping to the transcriptome is better for RNAseq. The effective coverage is therefore a more accurate representation: it is the average coverage divided by the effectively covered fraction of the genome (the fraction of bases having a coverage > 5).

Condition   Replicate 1   Replicate 2   Correlation   Correlation score
CAF         CAF1          CAF2          0.9493        1.295
CAF         CAF1          CAF3          0.9736        1.579
CAF         CAF2          CAF3          0.9584        1.381
VEH         VEH1          VEH2          0.9297        1.153
VEH         VEH1          VEH3          0.9672        1.484
VEH         VEH2          VEH3          0.9232        1.115

Table 3: Correlation scores between each pair of samples within conditions (caffeine and control) show how similar the replicates are to each other. RSEM[12] was used to determine gene and transcript expression values. These quantifications were used to compare samples within the same condition based on how similarly their genes were expressed, and to generate these correlation values and scores. For the high caffeine samples, CAF1 and CAF3 were most similar to each other, indicating that CAF2 was a slight outlier; likewise, for the low caffeine samples, VEH1 and VEH3 were most similar and VEH2 was a slight outlier. These correlations are visually depicted in Figures 4 and 5. These conclusions are important because if the replicate samples were very dissimilar, we could not justify grouping them together to compare all the high caffeine samples to all the low caffeine samples.
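
The relationship between the two numeric columns of Table 3 is not defined in the text. As an observation only, the reported scores agree with -log10(1 - r) to within about 0.001, where r is the tabulated correlation. The Python sketch below checks this and also shows a plain Pearson correlation on toy data. Both the formula for the score and the choice of Pearson correlation are assumptions, not the documented method of the pipeline, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_score(r):
    """Assumed transform: -log10(1 - r), which spreads out values of r near 1."""
    return -math.log10(1.0 - r)


# (correlation, reported score) pairs copied from Table 3.
table3 = {
    ("CAF1", "CAF2"): (0.9493, 1.295),
    ("CAF1", "CAF3"): (0.9736, 1.579),
    ("CAF2", "CAF3"): (0.9584, 1.381),
    ("VEH1", "VEH2"): (0.9297, 1.153),
    ("VEH1", "VEH3"): (0.9672, 1.484),
    ("VEH2", "VEH3"): (0.9232, 1.115),
}

differences = {
    pair: abs(correlation_score(r) - reported)
    for pair, (r, reported) in table3.items()
}
for pair, diff in differences.items():
    print(pair, f"difference from reported score: {diff:.4f}")

# Toy expression vectors (hypothetical) to show the correlation itself.
toy_r = pearson([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

If the assumption holds, the score is simply a more readable scale for correlations that are all close to 1.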

[Figure 4: six scatterplot panels comparing replicates within each condition: CAF1 vs. CAF2, CAF1 vs. CAF3, CAF2 vs. CAF3, VEH1 vs. VEH2, VEH1 vs. VEH3, and VEH2 vs. VEH3. The two axes of each panel are the two samples being compared.]

Figure 4: Caffeine and vehicle samples were compared against each other according to the correlation scores in Table 3. The correlation scores were calculated using RSEM[12], and these scatterplots show the level of similarity between each pair of samples within the same testing condition. The X and Y values are the expression values for each gene, which were determined by RSEM. Samples that are more similar will have dots arranged in a narrower line, with fewer points deviating from the trendline. This indicates that genes behave more similarly in both samples.

Figure 5: This Multi-Dimensional Scaling plot is from the quantification step before normalization and is another representation of the correlation scores[12] in Table 3. RSEM generates this plot to show similarity between samples, much like the scatterplots in Figure 4. The axes (Leading logFC dim1 and Leading logFC dim2) represent log fold change, so samples close together are more related because there are fewer differences between them (lower fold change). High caffeine samples are shown in red and low caffeine samples are shown in blue. CAF2 and VEH2 appear to be the outliers in their condition groups.

Figure 6: This Multi-Dimensional Scaling plot is from the differential analysis step after normalization. This plot was generated using DESeq2[13], and similarity is determined here by using differential expression. The axes (Leading logFC dim1 and Leading logFC dim2) represent log fold change. Ideally, replicates of the same condition should be close together on the plot. High caffeine samples are shown in red and low caffeine samples are shown in blue. CAF2 and VEH2 are still outliers, as their correlation scores with the other samples in their group were the lowest.

#    Gene        Pathway                                 Log2(FC)   p-value
1    MYCT1       Apoptosis                               3.652      0.0150
2    CDH8        Brain development                       3.310      5.02×10^-8
3    SFRP4       Apoptosis & heart development           3.167      0.0468
4    EFEMP1      Brain development                       2.838      3.49×10^-9
5    CPA4        DNA modification                        2.243      9.40×10^-5
6    POPDC2      Heart development                       2.107      0.0113
7    CNTNAP3     Brain development                       2.073      0.0487
8    XAF1        Apoptosis (inhibitory)                  2.008      0.0162
9    CASP10      Apoptosis                               1.911      0.0424
10   AEBP1       Brain development                       1.828      0.0347
11   PPP1R13L    Apoptosis                               1.818      0.00689
12   HIST1H2BC   DNA modification                        1.767      0.0279
13   OAS3        Apoptosis                               1.525      0.00193
14   KCNK6       Heart development                       1.393      0.00646
15   ZNF219      DNA modification                        1.026      0.00306
16   BHLHE40     Heart development                       1.020      8.18×10^-5
17   MCM5        Cell cycle phases                       -1.007     0.00126
18   CENPF       Cell cycle phases & heart development   -1.024     0.00739
19   RRM2        Wnt signaling pathway                   -1.034     0.00709
20   CIT         Brain development                       -1.036     0.0438
21   TNFRSF10D   Apoptosis (inhibitory)                  -1.151     0.0112
22   BRCA2       DNA repair                              -1.176     0.0203
23   KCNIP3      Heart development                       -1.189     0.0495
24   PEG10       Embryonic development                   -1.216     0.0154
25   PPP2R2B     Cell cycle phases                       -1.230     0.0237
26   MMP16       Embryonic development                   -1.239     0.00707
27   FANCD2      DNA repair                              -1.283     0.00157
28   TNFRSF19    Embryonic development                   -1.317     0.0252
29   CDK15       Apoptosis (inhibitory)                  -1.392     0.00377
30   ID2         Wnt signaling pathway                   -1.403     0.00512
31   LSAMP       Brain development                       -1.510     0.0281
32   GAS7        Brain development                       -1.613     0.00535
33   TCF7        Wnt signaling pathway                   -1.827     0.0290
34   DOK6        Brain development                       -1.878     9.40×10^-5
35   PARM1       Apoptosis (inhibitory)                  -1.883     0.0144
36   LAMA1       Embryonic development                   -1.959     0.0144
37   DLX2        Brain development                       -1.964     0.0146
38   CAMK1G      Heart development                       -2.491     0.0332
39   DAB1        Brain development                       -4.556     0.0468
40   SALL1       DNA modification                        -4.598     0.000426

Table 7: This list includes biologically relevant differentially expressed genes between the two groups. Differential genes were determined by DESeq2[13] and their functions were determined using the GeneCards online database. The list is in order of log2(FC), so the top is the most upregulated and the bottom is the most downregulated in caffeine samples, with the less differentially expressed genes in the middle. Fold change is 2^(log2(FC)). For example, a log2(FC) value of 3.652 (gene #1) corresponds to 12.57-fold upregulation.
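
The conversion from log2(FC) to fold change used in the caption can be written as a short Python sketch. The function names are illustrative only. For negative values, the helper reports the reciprocal so that downregulation reads as "N-fold down"; that is a presentation choice made here and not something stated in the table.

```python
def fold_change(log2fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


def describe(log2fc):
    """Format a log2 fold change as an 'N-fold up' or 'N-fold down' string."""
    fc = fold_change(log2fc)
    if log2fc >= 0:
        return f"{fc:.2f}-fold up"
    # For downregulated genes, report how many times lower expression is.
    return f"{1.0 / fc:.2f}-fold down"


print(describe(3.652))   # MYCT1, gene #1 in Table 7: about 12.57-fold up
print(describe(-1.007))  # MCM5, gene #17: just over 2-fold down
```

A log2(FC) of ±1.0, the cutoff used in the Methods, therefore corresponds to a doubling or halving of expression.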

Works Cited

1. Shirodkar A. V. & Marsden P. A. (2011) Epigenetics in cardiovascular disease. Current opinion in cardiology 26, 209–215, PMID: 21415727

2. Matsuoka R, Uno H, Tanaka H, Kerr CS, Nakazawa K, et al. (1987) Caffeine induces cardiac and other malformations in the rat. Am J Med Genet Suppl 3: 433–443, PMID: 3130878

3. Wendler CC, Busovsky-McNeal M, Ghatpande S, Kalinowski A, Russell KS, et al. (2009) Embryonic caffeine exposure induces adverse effects in adulthood. FASEB J 23: 1272–1278, PMID: 19088180

4. Vlajinac H. D., Petrovic R. R., Marinkovic J. M., Sipetic S. B. & Adanja B. J. (1997) Effect of caffeine intake during pregnancy on birth weight. Am J Epidemiol 145, 335–338, PMID: 9054237

5. Weng X., Odouli R. & Li D. K. (2008) Maternal caffeine consumption during pregnancy and the risk of miscarriage: a prospective cohort study. Am J Obstet Gynecol, PMID: 18221932

6. Fang X, Mei W, Barbazuk WB, Rivkees SA, Wendler CC. (2014) Caffeine exposure alters cardiac gene expression in embryonic cardiomyocytes. Am J Physiol Regul Integr Comp Physiol. PMID: 25354728

7. Fang X, Poulsen RR, Rivkees SA, Wendler CC. (2016). In Utero Caffeine Exposure Induces Transgenerational Effects on the Adult Heart. Sci Rep. 2016;6:34106. PMID: 27677355

8. Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

9. Philip Ewels, Mans Magnusson, Sverker Lundin and Max Kaller (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics. PMID: 27312411

10. Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170. PMID: 24695404

11. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1):15-21. PMID: 23104886

12. Li B and Dewey CN (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323. PMID: 21816040

13. Love MI, Huber W, Anders S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. PMID: 25516281
