Uncovering transcriptional cis- regulatory elements active in adult zebrafish exocrine pancreas

Diogo José Coutinho Ribeiro Dissertação de Mestrado em Bioquímica apresentada à Faculdade de Ciências da Universidade do Porto e Instituto de Ciências Biomédicas Abel Salazar da Universidade do Porto

2018

Orientador Dr. José Bessa Principal Investigator, Instituto de Biologia Celular e Molecular / Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal

Coorientadores Dra. Renata Bordeira-Carriço Post-Doc Researcher, Instituto de Biologia Celular e Molecular / Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal

Prof. Dr. Pedro Rodrigues 1 Associate Professor, Instituto de Ciências Biomédicas de Abel Salazar, Universidade do Porto, Portugal

Todas as correções determinadas

Todas as correções determinadas pelo júri, e só essas, foram efetuadas. O Presidente do Júri, Porto, ______/______/______

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 680156 - ZPR).

Aos meus pais, Ao meu irmão, À memória da minha avó.

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Acknowledgments

Ao meu Orientador Doutor José Carlos Ribeiro Bessa que me deu a oportunidade de participar neste projeto e, sem a liderança do mesmo, nada do que se segue no presente trabalho teria sido alcançável. Os meus agradecimentos recaem na sua incondicional disponibilidade e apoio em todos os aspetos e momentos, quer a nível profissional, quer pessoal. Por todas as sugestões, motivação, incentivo que se afiguraram fundamentais para o sucesso deste trabalho, bem assim por todo o seu conhecimento inigualável. Não é, de todo, possível imaginar um tão fantástico líder nesta fase crucial do meu percurso profissional. Para além de uma relação profissional de enorme relevo que foi estabelecida ao longo do presente ano, guardo igualmente com estima uma relação de amizade que nunca perderei. Por tudo isto, um inexprimível Obrigado. À minha coorientadora Doutora Renata Bordeira-Carriço que foi a minha companheira desde o primeiro dia ao último e que desempenhou um papel fundamental e extraordinário ao longo desta fase. Grande parte do que sei hoje se deve ao conhecimento que por si me foi transmitido e que levarei comigo no meu futuro caminho profissional. Agradeço, ademais, todas as palavras e momentos de amizade que guardarei com carinho. Ao Professor Pedro Rodrigues, um especial agradecimento pela disponibilidade, que constituiu uma peça fundamental para a realização deste trabalho. Aos restantes membros do grupo VDR e colaboradores, que constituíram uma peça basilar para a minha formação, um enorme Obrigado, por me permitirem a oportunidade de aprender com os melhores, pela partilha de conhecimento e entreajuda. Um grande abraço para todos com carinho e amizade ao João, Marta, Fábio, Ana G, Ana E, Leonor, Joana T, Joana M, Filipa, Mafalda e Isabel, a quem devo um enorme obrigado. Finalmente, àqueles a quem um obrigado nunca será suficiente: aos meus pais e ao meu irmão que foram, são e sempre serão o meu suporte na vida e que estiveram incondicionalmente ao meu lado. Aos meus amigos do coração que são uma família e a quem nunca terei oportunidade de lhes agradecer o suficiente. À minha namorada, cujas palavras de carinho, amor, paciência e apoio, que guardarei para a vida no meu coração, foram determinantes. À memória da minha avó que faleceu o ano passado vítima de cancro do pâncreas e de quem guardo as palavras de força, recordando-as diariamente e para o resto dos meus dias. Um perpétuo obrigado pela inspiração.

7

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

8

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Table of contents

Abstract ...... 11 Resumo ...... 13 List of figures ...... 15 List of tables ...... 17 List of abbreviations ...... 18 Introduction...... 20 1. Pancreas ...... 20 1.1 Early pancreas development and morphology……………………………..20 1.2. Danio rerio as a model system to study pancreas development and function ...... 22 1.2.1. Zebrafish development ...... 22 1.2.2. The zebrafish pancreas ...... 24 1.3. Pancreatic diseases ...... 26 2. regulation: general aspects ...... 27 2.1. The role of cis-regulatory elements in the genome ...... 28 2.2. Genome-wide techniques to predict cis-regulatory elements ...... 29 2.2.1. Chromatin epigenetic marks for enhancers and promoters ...... 31 2.2.2. Accessing open chromatin regions using ATAC-seq to find cis- regulatory elements ...... 34 2.2.3. Contemporary tools to identify chromatin interactions genome-wide 35 2.3. Genetic tools to study cis-regulatory elements in vivo ...... 36 2.3.1. Enhancer assays to test for CREs activity ...... 37 2.3.2. Genome editing as a tool for loss of function assays ...... 39 3. Objectives ...... 41 4. Material and Methods ...... 42 4.1. Zebrafish stocks, husbandry, breeding and embryo culture ...... 42 4.2. DNA extraction from zebrafish embryos ...... 42 4.3. Generation of the plasmids carrying the selected putative enhancer sequences ...... 43 4.3.1. PCR amplification of the selected sequences ...... 43 4.3.2. Cloning of fragments into pCR®8/GW/TOPO vector ...... 45 4.3.3. Recombination of the selected sequences into Z48 and ZED vectors ...... 46 4.3.4. Purification of the plasmids for injection ...... 48

9

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

4.4. Generation of transgenic zebrafish ...... 49 4.4.1. RNA synthesis ...... 49 4.4.2. Purification of the synthesized mRNA ...... 49 4.4.3. Injection of the plasmids and mRNA into zebrafish embryos ...... 50 4.5. Screening for enhancer activity ...... 50 4.6. Mutagenesis using the CRISPR-Cas9 system ...... 51 4.6.1. Generation of Cas9 mRNA and sgRNAs ...... 51 4.6.2. Injection of sgRNAs into zebrafish embryos ...... 52 4.7. Real time quantitative PCR ...... 54 4.8. Statistical Analysis ...... 55 5. Results ...... 56 5.1. Analysis of enhancer activity of predicted cis-regulatory elements in pancreas ...... 56 5.2. Comparative analysis of the distal enhancers of ptf1a in human and zebrafish ...... 67 5.2.1. Cell-specific expression pattern of hPtf1aE and ptf1aE3 ...... 68 5.2.2. Deletion of ptf1aE3 in zebrafish ...... 71 5.3. Analysis of the impact of hnf4a knockdown ...... 73 6. Discussion ...... 76 7. Conclusions ...... 82 References ...... 83

10

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Abstract

Non-coding cis-regulatory elements (CREs), such as enhancers, are known to play crucial roles in maintaining proper gene expression. Besides alterations in the protein- coding regions, the disruption of CREs can also compromise gene expression, potentially leading to disease. However, it is still poorly explored how this correlation is established. The aim of this study is to identify active pancreatic CREs of expressed in pancreas, particularly genes involved in pancreas development and function, including tumour-suppressor genes (TSGs), and further evaluate the impact of the loss-of-function of these CREs in pancreatic disorders, using zebrafish as a model.

To achieve this goal, we analyzed data from ATAC-seq and ChIP-seq for epigenetic marks associated with enhancer activity (e.g.: H3K27Ac), obtained from adult zebrafish pancreas, and selected putative CREs in the vicinity of exocrine pancreas specific genes and TSGs. We then performed transient in vivo reporter assays in zebrafish to test the enhancer activity of 17 selected putative CREs and identified 11 (64.7%) active in differentiated exocrine pancreas, validating our zebrafish adult pancreas-specific data as powerful and reliable tools for the prediction of CREs. One of the most active enhancers identified in this study is distally downstream of the pancreas specific transcription factor 1α (ptf1a) gene, that encodes a transcription factor (TF) crucial for the pancreas development and differentiation of acinar cells. Analyzing an adult stable zebrafish reporter line of this enhancer, we observed that this element is active in differentiated exocrine acinar and duct cells, pancreatic progenitor cells, but not in endocrine cells, reproducing the endogenous activity of the predicted target gene ptf1a. Moreover, exploring recent Hi-ChIP data from our laboratory, we observed that the identified enhancer establishes direct contact with the ptf1a promoter, supporting the hypothesis of being an enhancer regulating the expression of ptf1a in zebrafish pancreas.

To further validate this hypothesis, we used CRISPR-Cas9 to induce deletions in the ptf1a-enhancer and observed a reduction in the pancreatic area with the pancreas organogenesis apparently affected. This is in accordance with previously described pancreatic agenesis resulting from the deletion of a distal enhancer of PTF1A in humans. Interestingly, we observed that this human CRE has cell-specific expression patterns identical to the newly identified distal ptf1a-enhancer in zebrafish, suggesting these regions might be functional orthologues.

Lastly, by targeting hnf4a using CRISPR-Cas9 we evaluated how the loss of function of this TF encoding gene, whose TF binding sites were found to be enriched in our predicted

11

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas enhancers, could affect the expression of selected pancreatic genes. Upon disrupting this gene, we detected a significant downregulation of downstream target genes, including ptf1a. This suggests that this TF may be a master regulator of pancreas development through interaction with key enhancers of pancreatic genes.

Overall, these findings will contribute to validate zebrafish as a powerful model organism to understand the mechanisms of human gene regulation and development of the pancreas, and help clarifying the molecular networks involved in the development of pancreatic disorders, especially pancreatic cancer and pancreatic agenesis.

Key words: cis-regulatory elements; enhancers; gene regulation; pancreas; disease; zebrafish; transgenesis; CRISPR-Cas9

12

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Resumo

Os elementos cis-reguladores não codificantes, tais como os enhancers, são conhecidos por desempenhar funções importantes no controlo adequado da expressão génica. Para além das alterações nas regiões que codificam para proteínas, a alteração de elementos cis-reguladores pode igualmente comprometer a expressão génica, podendo potencialmente levar à doença. No entanto, a forma como esta relação se estabelece ainda está muito pouco explorada. O objetivo deste estudo é identificar elementos cis-reguladores ativos de genes expressos no pancreas, particularmente genes envolvidos no desenvolvimento e função do pancreas, incluindo genes supressores tumorais, e posteriormente avaliar o impacto da perda de função destes elementos em doenças pancreáticas, usando o peixe-zebra como modelo.

Para atingir este objetivo, analisámos dados de ATAC-seq e ChIP-seq para marcas epigenéticas associadas à atividade de enhancers (p. ex.: H3K27Ac), obtidos a partir do pancreas de peixe-zebra adulto, e selecionamos possíveis elementos cis-reguladores na proximidade de genes específicos do pancreas exócrino e genes supressores tumorais. Em seguida, realizamos ensaios transientes in vivo em peixe-zebra, utilizando genes repórter para testar a atividade enhancer de dezassete possíveis elementos cis- reguladores, tendo identificado onze (64.7%) ativos no pancreas exócrino diferenciado, validando os nossos dados de ATAC-seq e ChIP-seq específicos de pancreas adulto como fidedignos para a previsão destes elementos. Um dos enhancers mais ativos identificado neste estudo está distalmente localizado a jusante do gene ptf1a, que codifica um fator de transcrição crucial para o desenvolvimento do pancreas e diferenciação de células acinares. Analisando uma linha transgénica estável estabelecida em peixe-zebra deste enhancer, observamos que este elemento está ativo em células acinares e do ducto diferenciadas, células progenitoras pancreáticas, mas não em células endócrinas, reproduzindo a atividade endógena do gene alvo previsto ptf1a. Para além disso, explorando dados recentes de Hi-ChIP do nosso laboratório, observamos que o enhancer identificado estabelece contacto direto com o promotor de ptf1a, suportando a hipótese de ser um enhancer que regula a expressão de ptf1a no pancreas de peixe-zebra.

Para validar esta hipótese, usámos CRISPR-Cas9 para induzir deleções no enhancer de Ptf1a e observámos uma redução na área do pâncreas, estando a organogénese aparentemente afetada. Isto está de acordo com a agenesia pancreática previamente descrita, resultante de uma deleção de um enhancer distal de PTF1A em humanos. Curiosamente, observamos que este elemento cis-regulatório humano possui padrões

13

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas de expressão idênticos ao enhancer distal de ptf1a identificado em peixe-zebra, sugerindo que estas regiões possam ser funcionalmente ortólogas.

Por último, mutando hnf4a por CRISPR-Cas9 avaliámos o efeito da perda de função do fator de transcrição codificado por este gene, cujos locais de ligação estão enriquecidos nos enhancers previstos, na expressão de genes alvo no pancreas. Depois de mutarmos este gene, detetámos uma redução significativa da expressão de genes alvo, incluindo ptf1a. Isto sugere que este fator de transcrição poderá ser um importante regulador do desenvolvimento do pancreas através da interação com enhancers associados a genes pancreáticos.

De um modo geral, estes resultados vão contribuir para a validação do peixe-zebra como um importante modelo animal para a compreensão dos mecanismos envolvidos na regulação da expressão génica e desenvolvimento do pancreas em humanos, e ajudar a esclarecer as vias moleculares envolvidas no desenvolvimento de doenças pancreáticas, como por exemplo agenesia do pâncreas e cancro do pancreas.

Palavras-chave: elementos cis-regulatórios; enhancers; regulação genética; pancreas; doença; peixe-zebra; transgénese; CRISPR-Cas9

14

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

List of figures

Figure 1. Pancreatic morphogenesis and developmental regulation...... 21 Figure 2. Islet morphogenesis ...... 21 Figure 3. Zebrafish life cycle...... 24 Figure 4. Comparison of the microscopy anatomy of pancreatic tissues in adult zebrafish and human ...... 24 Figure 5. Schematic representation of the zebrafish pancreas development ... 25 Figure 6. Overview of gene regulatory elements ...... 28 Figure 7. Genetic networks of cis-regulatory elements ...... 29 Figure 8. Chromatin accessibility and histone marks associated with cis- regulatory elements ...... 32 Figure 9. Overview of the chromatin immunoprecipitation (ChIP) procedure ... 33 Figure 10. ATAC-seq reaction schematics ...... 34 Figure 11. Overview of Hi-C technique ...... 36 Figure 12. Embryo transgenic experiments in zebrafish ...... 37 Figure 13. Assessment of gene regulatory events in vivo ...... 38 Figure 14. Genomic engineering using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 endonucleases ...... 40 Figure 15. UCSC Genome Browser tracks showing different data sets in the region of pancreatic transcription factor 1 subunit alpha gene ...... 43 Figure 16. Schematics of the in vivo reporter (destination) vectors ...... 47 Figure 17. Overview of the cloning and recombination protocol ...... 48 Figure 18. Landscape of selected genes and associated CREs ...... 58 Figure 19. GFP midbrain expression as an internal control of transgenesis .... 58 Figure 20. Overall enhancer activity of the tested sequences ...... 59 Figure 21. Enhancer assays for CREs of pancreatic genes ...... 61 Figure 22. Enhancer assays for CREs of TSGs ...... 62 Figure 23. Enhancer activity within the pancreatic primordium...... 63 Figure 24. Ptf1aE3 expression pattern in a stable transgenic line ...... 64 Figure 25. Nkx6.1 is a molecular marker for both progenitor and differentiated ductal cells...... 65 Figure 26. Cell type-specific expression pattern driven by the tg(ZED:ptf1aE3) line ...... 66

15

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 27. Zebrafish and human distal ptf1a enhancers...... 67 Figure 28. Human landscape of PTF1A...... 68 Figure 29. hPtf1aE has enhancer activity in the pancreas ...... 68 Figure 30. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in differentiated exocrine acinar cells ...... 69 Figure 31. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in pancreatic progenitor cells at 48 hpf ...... 70 Figure 32. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in differentiated exocrine ductal cells ...... 70 Figure 33. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in endocrine cells ...... 71 Figure 34. Pancreatic area is reduced in ptf1aE3 mutant embryos ...... 71 Figure 35. Phenotype displayed by mutant embryos for Ptf1aE3 ...... 72 Figure 36. Ptf1aE3 gel electrophoresis after CRISPR-Cas9 ...... 72 Figure 37. Chromatograph of the CRISPR-generated mutation in the hnf4a gene ...... 73 Figure 38. Total pancreatic area is not affected in hnf4a knockout embryos .... 73 Figure 39. Elastase:mCherry signal is downregulated in hnf4a knockout embryos ...... 74 Figure 40. Elastase is a molecular marker of the exocrine pancreas ...... 74 Figure 41. Hnf4a downstream target genes expression levels are affected upon hnf4a disruption ...... 75

16

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

List of tables

Table 1. Primers used for the amplification of cis-regulatory sequences...... 44 Table 2. Oligonucleotides used for CRISPR-Cas9...... 51 Table 3. Primers used for RT-qPCR ...... 54 Table 4. List of selected genes and associated cis-regulatory elements ...... 56 Table 5. Percentage of tissue-specific expression in the pancreas ...... 63

17

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

List of abbreviations

ATAC Assay for Transposase-Accessible Chromatin

BAC Bacterial Artificial

bHLH basic-helix-loop-helix

bp (s)

BSA Albumin from bovine serum

ChIP-seq Chromatin Immunoprecipitation followed by sequencing

CRE Cis-regulatory element

CRISPR Clustered Regularly Interspaced Short Palindromic Repeats

CTCT CCCTC-binding factor

dNTPs deoxynucleotide triphosphates

dpf days post fertilization

DSB Double strand break

EB Extraction buffer

ENU N-ethyl-N-nitrosourea

FACS Fluorescent Activated Cell Sorting

GFP Green Fluorescent Protein

GWAS Genome-Wide Association Studies

HAT Histone acetyltransferase

hDFb human dermal fibroblasts

hESC human Embryonic Stem Cells

HMT Histone methyltransferase

hpf hours post fertilization

kb kilobase pairs

NC Negative control

18

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

NGS Next Generation Sequencing

NHEJ Non-homologous end joining

O.N. overnight

PC Pancreatic cancer

PCR Polymerase Chain Reaction

PDAC Pancreatic ductal adenocarcinoma

Pol II RNA polymerase II

PTU 1-phenyl-2-thiourea

RFP Red Fluorescent Protein

RT Room temperature sgRNA Single guide RNA

SNP Single-Nucleotide Polymorphism

T1D Type 1 Diabetes

T2D Type 2 Diabetes

TE Tris-EDTA

TF Transcription Factor

TFBS Transcription Factor Binding Site

TSG Tumour-suppressor genes

TSS Transcription Starting Site

WT Wild-type

ZED Zebrafish enhancer detection

19

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Introduction 1. Pancreas

1.1. Early pancreas development and morphology

The pancreas is a multifunctional flattened organ whose cells direct a variety of vital endocrine and exocrine physiological functions in the organism (Gittes, 2009). Over the past few years, the knowledge about the mechanisms responsible for the fundamental processes of pancreas development had an incredible growth due to modern molecular biology approaches, epigenetics and gain- or loss-of-function studies in model organisms (Benitez, Goodyer and Kim, 2012). The combined activity of specific signaling networks result in precise allocation of the pancreatic primordium within the developing gut, and concomitantly, in the regionalized expression of early pancreatic markers, among which pancreas specific transcription factor 1α (PTF1A) and pancreatic/duodenal homeobox factor 1 (PDX1) (Figure 1) (Benitez, Goodyer and Kim, 2012). PTF1A, also known as p48, is a basic-helix-loop-helix (bHLH) transcription factor (TF) part of the PTF1 complex that plays a crucial role in patterning and differentiation of the pancreas in all known vertebrates. Furthermore, PDX1 is a critical TF of early pancreas development (Hald et al., 2008; Hesselson, Anderson and Stainier, 2011; Jennings et al., 2015). Loss of function studies have shown that the lack of each gene alone leads to almost complete pancreas agenesis (Hald et al., 2008). Moreover, Dong and coworkers (Dong et al., 2008) demonstrated that PTF1A besides being essential for normal exocrine cell fate, also plays a role in supporting endocrine differentiation. Pancreas development starts with the thickening of the distal foregut endoderm and evagination of two pancreatic buds, dorsal and ventral (Jørgensen et al., 2007; Benitez, Goodyer and Kim, 2012; Jennings et al., 2015). Following a transient epithelial stratification, tubular structures containing a distal multipotent Ptf1a+ (‘tip’) arise to origin both endocrine and exocrine cells, while a bipotent Nk6 homeobox (Nkx6+) central duct- like structure (‘trunk’) differentiate into ductal and endocrine cells (Figure 1) (Benitez, Goodyer and Kim, 2012; Jennings et al., 2015). Nkx6 and Ptf1a are negatively crossregulated, however in initial stages of pancreatic development there is coexpression of both genes (Schaffer et al., 2010). The exocrine tissue constitutes almost 95% of the total pancreatic mass and is composed by two different cell types: acinar and duct cells (Das et al., 2014). Acini secret digestive enzymes, such as Amylase, Trypsin, Chymotrypsin, Elastase, Carboxypeptidase, that are guided into the digestive tract through a duct system (Field et al., 2003).

20

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Figure 1. Pancreatic morphogenesis and developmental regulation. The pancreas arises from the distal foregut endoderm and contains a distinguishing combination of cell lineages. After pancreatic budding, epithelial expansion and proliferation, a subset of cells start expressing Ptf1a, Pdx1, and other TFs (panel 1 and 2). A multipotent “tip” and a bipotent “trunk” is established in a tube-like structure. Multipotent Ptf1a+ progenitor cells origin acinar, duct and endocrine cells, whereas bipotent Nkx6+ cells produce duct and endocrine cells (panel 3). Lately, pancreatic branching, cell differentiation and expansion, and islet formation drive pancreatic morphogenesis (panel 4) (Benitez, Goodyer and Kim, 2012).

Likewise, the endocrine domain of the pancreas arises from a subset of central duct-like progenitor cells that express the bHLH TF encoding gene Neurogenin3 (NGN3) and aggregate to form islets of Langerhans. It has been reported that the levels of NGN3 are fundamental for the pancreatic progenitor commitment into the endocrine lineage (Wang et al., 2010). Ngn3+ endocrine progenitors give rise to five different endocrine cell types that produce, respectively, five different hormones: α-cells - glucagon; β-cells - insulin; δ-cells - somatostatin; ε-cells - ghrelin; and γ [or PP]-cells - pancreatic polypeptide) (Figure 2) in a temporally controlled manner (Miyatsuka et al., 2011).

Figure 2. Islet morphogenesis. Endocrine cells arise from Ngn3+ endocrine progenitors (pink). These cells migrate and aggregate into clusters to form islets (Benitez, Goodyer and Kim, 2012).

21

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

1.2. Danio rerio as a model system to study pancreas development and function

Zebrafish (Danio rerio), a small tropical freshwater vertebrate, has emerged over the past few years as a popular animal model to study vertebrate developmental biology and human diseases (Tiso, Moro and Argenton, 2009; Yee, Kazi and Yee, 2013; Kojić et al., 2017). Multiple characteristics of zebrafish, including a relatively inexpensive maintenance, husbandry and high fecundity rate make it a preferred model for research (Ishibashi et al., 2013). Zebrafish spawning can produce hundreds of eggs that are externally fertilized allowing for genetic manipulation and transgenesis assays through microinjection techniques (Dooley and Zon, 2000; Ishibashi et al., 2013). Due to rapid ex-utero embryonic development, most of the organs are already formed 24 hours post fertilization (hpf) (Kojić et al., 2017). Embryo transparency facilitates the acquisition of noninvasive images of developmental processes and monitoring of fluorescent proteins during genetic screenings (Yee and Pack, 2005; Kojić et al., 2017). Generation time is very short, reaching sexual maturity within 3 months and allowing the rapid transfer of transgenes to the next generations (Ishibashi et al., 2013). As a high-throughput screening model organism, zebrafish enables genome editing in newly identified candidate genes and regulatory elements (L. Li et al., 2016). Moreover, there are 26,206 genes annotated in the zebrafish genome, 69% having a human orthologous (Howe et al., 2013). Equally, even though they are two evolutionary distant species, 70% of human genes have functional orthologs in zebrafish, being the developmental processes and the associated gene regulatory networks conserved (Perez-Rico and Shkumatava, no date; Navratilova et al., 2009; Yee, Kazi and Yee, 2013). Through this, zebrafish can provide a forward genetic approach for assigning function to genes, positioning them into developmental and/or disease-related trails and its full potential has only begun to be unveiled. (Ota and Kawahara, 2014)

1.2.1. Zebrafish development

Zebrafish embryogenesis can be divided into 7 stages of development during the first 3 days post fertilization (dpf): the zygote, cleavage, blastula, gastrula, segmentation, pharyngula, and hatching periods (Figure 3). The externally fertilized egg is in the zygote stage until the first cleavage occurs (Kimmel et al., 1995). After successive cleavages, the egg enters the blastula period at the 128-cell stage (Kimmel et al., 1995). This period is characterized by morphogenetic cell movements to produce the primary germ layers

22

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

and the embryonic axis (Kimmel et al., 1995). Around 10 hpf, the embryo is in the segmentation period to undergo several developmental processes, including the development of somites, some primary organs and elongation of the embryo (Kimmel et al., 1995). By this stage the first body movements occur (Kimmel et al., 1995). Between 24-48 hpf, the embryo enters the pharyngula period and is now a bilaterally organized organism with a well-developed notochord and almost developed nervous system (Kimmel et al., 1995). By the second day post fertilization (hatching period), the larvae have completed most of its morphogenesis and continues to grow (Kimmel et al., 1995). After hatching, zebrafish larvae inflates its swim bladder and gradually begins to swim actively moving its jaws, operculum, pectoral fins and eyes (Kimmel et al., 1995). These processes of development result in swift escape responses, herald respiration, seeking of prey and feeding (Kimmel et al., 1995). Exploiting the advantages of this animal model, including its rapid morphogenesis, processes such as development, differentiation and physiology of single tissues can be explored. Hence, studies using zebrafish have enabled the discovery of novel genetic elements and their functional roles in development of several organs (De La Calle-Mustienes et al., 2005; Navratilova et al., 2009; Aday et al., 2011; Wang et al., 2011; Yee, Kazi and Yee, 2013; Ung et al., 2015).

23

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 3. Zebrafish life cycle. (A) Zebrafish develop from one-cell stage zygote. Gastrulation starts at around 6 hpf hatching at approximately 48 hpf. Sexual maturity is reached around 3 months of age. Reproduced from (D’Costa and Shepherd, 2009). (B) Representation of several zebrafish embryonic developmental stages. Pigmentation is absent. Scale bar = 250 µm. Adapted from (Kimmel et al., 1995).

1.2.2. The zebrafish pancreas

The zebrafish pancreas has long been a subject of study since 1996 at the molecular, genetic and morphological level to understand how the organ develops and is regulated at the transcriptional level (Pack et al., 1996). Just like in humans, zebrafish pancreas plays fundamental roles in glucose homeostasis, food digestion and nutrition, being constituted by both endocrine and exocrine compartments (Field et al., 2003). In adult zebrafish, the pancreas is diffusely dispersed in the mesentery and within the intestinal loops (Yee and Pack, 2005) being the microscopic structure of adult pancreas in zebrafish very similar to that of adult human (Figure 4).

Figure 4. Comparison of the microscopy anatomy of pancreatic tissues in adult zebrafish and human. Histological sections of (A) WT zebrafish and (B) human were stained with hematoxylin and eosin. The exocrine pancreas consists of ducts and acini, and the islets containing endocrine cells are as indicated and highly similar between zebrafish and human (Yee, Kazi and Yee, 2013).

24

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

The structure of zebrafish pancreas is similar to other vertebrates (Tiso, Moro and Argenton, 2009), composed by exocrine acini, responsible for the production and secretion of digestive enzymes (e.g.: proteases, amylases, lipases and nucleases), a ductal network that transport the enzymes into the intestine, and an endocrine portion encompassing a principal and secondary islets constituted of hormone-secreting cells (Zecchin et al., 2004; Tiso, Moro and Argenton, 2009; Wang et al., 2011). In zebrafish, both compartments have different spatiotemporal origins, they arise from two adjacent areas of the gut that expand sequentially at two different developmental times; a first one posterodorsal and a second anteroventral (Figure 5).

Figure 5. Schematic representation of the zebrafish pancreas development. Important genes regarding pancreas morphogenetic events are in green, blue and magenta. Dorsal view images with anterior at the top. L – liver; G – gut, S – swim bladder (Tiso, Moro and Argenton, 2009).

At approximately 24 hpf the dorsal bud arises from endodermal cells expressing the pancreatic and duodenal homeobox-1 gene pdx1, an early pancreatic and gut marker (Biemar et al., 2001). This dorsal bud gives origin to endocrine cells of the principal islet expressing the main pancreatic hormones (insulin and glucagon). By 48 hpf, the principal islet is already organized, with alpha and delta cells at the periphery surrounding a central core of beta cells. The second bud emerges at around 40 hpf from the ventral portion of the gut. This anteroventral bud gives rise to the exocrine tissue, pancreatic ducts and a small portion of endocrine cells that differentiate near the pancreatic duct. This process requires the activity of the evolutionarily conserved TF Ptf1a, an exocrine marker necessary for determining the cell fate of pancreatic progenitors that will become exocrine and endocrine epithelia during pancreas development (Zecchin et al., 2004; Dong et al., 2008; Pashos et al., 2013). Moreover, ptf1a deficiency in mice and zebrafish

25

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas leads to loss of the ventral bud formation (Krapp et al., 1998; Kawaguchi et al., 2002; Lin et al., 2004; Zecchin et al., 2004; Pashos et al., 2013). By 52 hpf, the ventral bud grows towards the dorsal bud, progressively engulfing the islet. With the onset of expression of trypsin and nkx2.2a, the pancreas continues to grow, pancreatic ducts further develop and branch, while acini enlarge and mature (Gittes, 2009). The formation of the mature endocrine and exocrine pancreas is a complex process that requires migration, extensive proliferation and differentiation of different cell types (Zecchin et al., 2004). Later on, by 6 dpf the zebrafish pancreas has already established its mature shape and position and the overall structure resembles to the adult one since all the cell types are already fully differentiated (Gnügge, Meyer and Driever, 2004). The overall structural similarities of pancreas in zebrafish and humans suggest highly conserved developmental biology and genetics as well as functions of the exocrine and endocrine pancreas in these organisms (Yee and Pack, 2005; Benitez, Goodyer and Kim, 2012; Yee, Kazi and Yee, 2013). These data provide support for zebrafish as a vertebrate model organism to dissect the molecular mechanisms that control human pancreatic development and diseases (Field et al., 2003).

1.3. Pancreatic diseases

The increasing incidence of human pancreatic diseases, such as diabetes mellitus and pancreatic cancer (PC), led to an augmented interest in understanding pancreas developmental biology (Benitez, Goodyer and Kim, 2012). Among the several types of PC, pancreatic ductal adenocarcinoma (PDAC) is the predominant histopathological disorder and one of the most debilitating and lethal diseases, with growing occurrence and poor prognosis, less than 5% survival rate (Hezel et al., 2012). Due to the lack of biomarkers for an early diagnosis, when the tumour is identified there is usually no effective therapy (Yee and Pack, 2005; Kim et al., 2013). Although several mutations in oncogenic and tumour-suppressor genes (TSG) can contribute to the development of PC, these are not specific for PDAC (Yee and Pack, 2005; Esteller, 2007; Yee, Kazi and Yee, 2013). On another hand, diabetes is a growing public health problem afflicting millions worldwide (Tabish, 2007; Dong et al., 2008). Type 1 diabetes (T1D) is an autoimmune disease characterized by the destruction of insulin-producing β-cells with a huge impact on glucose homeostasis. In contrast, type 2 diabetes (T2D), the most common form of the disorder, is a multifactorial metabolic illness normally characterized by a progressive failure of β-cells and lack of responsiveness to normal insulin levels (Chen, Magliano and Zimmet, 2012; Jennings et al., 2015).

26

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Apart from the numerous alterations in the protein-encoding part of genes, these genetic diseases can equally arise by disruption of the non-coding functional elements that maintain correct gene expression (Santos-Rosa et al., 2002; Creyghton et al., 2010). Several variants in non-coding regions of the DNA associated with pancreatic diseases have already been identified by several genome-wide association studies (GWAS) (Mahajan et al., 2014). Some of these disease-associated single-nucleotide polymorphisms (SNPs) are located in the proximity of genes that play key functions in pancreas development, such as PTF1A, PDX1, HNF1B, HNF1A, HNF4A and NOTCH2 (Sellick et al., 2004; Maestro et al., 2007; Mahajan et al., 2014). Moreover, it is now clear that when the regulatory system of a gene is disrupted, gene expression may be compromised, potentially leading to disease (Kleinjan and van Heyningen, 2005; Weedon et al., 2014). Many mechanisms causing disruption of cis-regulatory elements (CREs) have already been encountered in actual disease cases (Kleinjan and Coutinho, 2009). In a study with human embryonic stem cell (hESC)-derived pancreatic progenitor cells, six recessive mutations were uncovered in a previously uncharacterized developmental enhancer of PTF1A resulting in pancreatic agenesis (Weedon et al., 2014). Therefore, understanding how mutations in developmental tissue-specific CREs are linked to human disorders is a very exciting and unexplored field that can provide important advances to the study of pancreatic diseases.

2. Gene regulation: general aspects

In multicellular organisms, the exact spatiotemporal control of gene expression is fundamental for the correct function of organs and developmental processes (Kleinjan and Coutinho, 2009; Atkinson and Halfon, 2014; Fernández-Miñán et al., 2016). The transcriptional networks underlying cell development and function require the precise orchestration of a composite set of interactions among a variety of proteins and DNA sequences (Kornberg, 1974; Ong and Corces, 2011). The complexity of transcription control, chromatin structure, epigenetic modifications and combinatorial binding of trans- activating factors to promoters and CREs result in precise regulation of eukaryotic genes (Kleinjan and Coutinho, 2009). Around 99% of the is constituted by non- coding DNA, including promoters, insulators, enhancers and silencers, that play key roles in controlling gene expression (Beaton and Cavalier-smith, 1999; Creyghton et al., 2010; Lelli, Slattery and Mann, 2012). Hence, unveiling the role of every regulatory element and how they work together coordinately in time and space would allow us to understand how gene expression can be correctly sorted out (Hoffman and Jones, 2009).

27

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

2.1. The role of cis-regulatory elements in the genome

Typically, CREs can be divided into proximal (i.e., promoters) or distal (e.g.: enhancers, insulators and silencers) concerning the genes they control (Maston, Evans and Green, 2006). Required for the proper transcription of genes, promoters play an essential role in the assembly of the transcription machinery that recruits RNA polymerase II (Pol II) to the Transcription Starting Site (TSS) (Figure 6) (Bessa et al., 2009; Atkinson and Halfon, 2014). In many genes, the transcriptional information encoded in the promoters is enough to control their transcription. However, in genes with more complex and dynamic networks, besides promoters, other cis-regulatory regions proved to play essential functions in the control of gene expression during development and in response to environmental stimulus (Bessa et al., 2014; Danino et al., 2015).

Figure 6. Overview of gene regulatory elements. Schematic representation of proximal and distal CREs and the interactions established with each other. Promoters (light and dark blue) are next to the transcription starting site (TSS) and are enriched for transcription factor binding sites (TFBS) to serve as anchors for enhancers. Insulators have an opposite effect compared to enhancers, they turn off the expression of genes in specific tissues and specific timepoints acting as ‘barriers’ for enhancers and silencers (Maston, Evans and Green, 2006; Luizon and Ahituv, 2015). Reproduced from (Luizon and Ahituv, 2015).

Insulators are a specific type of CREs that function in a position-dependent, orientation- independent manner to prevent interactions between genes and adjacent regulatory regions (Figure 6) (Maston, Evans and Green, 2006). Although the exact mechanisms by which insulators block enhancer-promoter interactions and heterochromatin spread are still being explored, the only known conserved TF that mediate insulator activity in vertebrates is the CCCTC-binding factor (CTCF) (Kuhn et al., 2003; Maston, Evans and Green, 2006). The most extensively studied CREs are transcriptional enhancers, which are non-coding DNA sequences that contain multiple binding sites for a variety of TFs (Rodríguez-Seguí and Bessa, 2015). TFs are trans-acting regulatory elements that play a very important

28

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

role in transcriptional regulation (Spitz and Furlong, 2012). When bound by TFs, enhancers mediate the recruitment and activation of Pol II at the target gene’s promoter to activate transcription (Figure 7A) (Tjian, 1994; Ong and Corces, 2011; Kumar, Garg and Garg, 2014). For such, the looping of the DNA strand is required, with cohesin displaying an important role in enhancer-promoter interaction, stabilizing the looping necessary for the direct interaction between the regulatory elements (Figure 7B) (Maston, Evans and Green, 2006; Ong and Corces, 2011; Shlyueva, Stampfel and Stark, 2014). Moreover, enhancers can be located very far away from their target genes. An extreme example was the identification of an enhancer located 1 Mb away from its target (Sagai, 2005; Stadhouders, 2012). An interesting possibility is that, regardless of their position, distance or orientation in relation to the promoter of a gene, enhancers may control gene expression in a spatio-temporal and tissue-specific manner (Visel et al., 2009; Shlyueva, Stampfel and Stark, 2014).

Figure 7. Genetic networks of cis-regulatory elements. (A) Enhancers are regions of the chromatin that are enriched for TF binding sites and that can enhance the transcription of target genes from its transcription starting site (TSS). (B) In specific tissues, active enhancers (Enhancer A) are bound by activating TFs and are brought into proximity of their target promoters by chromatin looping mediated by cohesin (Shlyueva, Stampfel and Stark, 2014).

29

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

The networks that coordinate correct assembly of genomes and gene expression are extremely dependent on cis and trans-regulatory elements with the entire pathway being triggered by specific TFs that bind key cis-regulatory sequences, recruiting co-activator complexes and the transcriptional machinery to allow transcription and production of proteins (Boyer et al., 2005). Through feed forward loops and through displacement of chromosomal proteins and exposure of the DNA, different time and tissue specific TFs and enhancers may be active (Kornberg, 1974; Hawkins et al., 2011). Overall, proper transcription is accomplished by multiple CREs spread over large genomic distances that operate together with TFs to define regulatory landscapes (Ragvin et al., 2011). A better understanding of the regulatory code and the mechanisms that translate that information into spatio-temporal correct gene expression and moreover how mutations in CREs change the read-out of the code will be a necessary and interesting advance for disease management and treatment (Kleinjan and Coutinho, 2009).

2.2. Genome-wide techniques to predict cis-regulatory elements

In a human cell, one of the major functions of chromatin is to pack the DNA in the nucleus around a set of histone proteins to form nucleosomes (Kornberg, 1974). This packaging engulfs inactive genomic regions and leaves biologically non-coding active regions (e.g.: promoters, enhancers) exposed, facilitating the accessibility of proteins to DNA and proper gene transcription control (Odom et al., 2004; Vaezeslami, Sterling and Reznikoff, 2007; Fernández-Miñán et al., 2016). Therefore, nucleosome positioning and dynamic modifications of DNA/histones interaction, which plays a very important role in gene regulation, has become a very challenging field (Hawkins et al., 2011; Buenrostro et al., 2013). Recent advances in molecular and computational biology have allowed the application of genome-wide tools to map DNA-protein interactions and epigenetic marks for enhancer prediction (Farnham, 2009; Ishibashi et al., 2013). In addition, chromatin state can directly influence transcription by altering the packaging of DNA to allow or prevent the access of TFs (Park, 2009a). Therefore, efficient techniques such as chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq) and Hi-ChIP, when combined provide a comprehensive portrait of the regulatory landscapes of highly transcribed genes genome-wide (Buenrostro et al., 2013).

30

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

2.2.1. Chromatin epigenetic marks for enhancers and promoters

Specific-chromatin features, such as histone post-translational modifications, are often associated with regulatory elements and have been used to predict cis-regulatory modules (eg.: promoters and enhancers) (Spitz and Furlong, 2012). These histone modifications are usually mediated by enzymes, such as histone acetyltransferases (HAT) (eg.: p300/CBP) and histone methyltransferases (HMT) in response to various stimulus (Taatjes, Marr and Tjian, 2004). Heintzman and coworkers (Heintzman et al., 2007) identified in cell lines the chromatin features that can distinguish enhancers from promoters. Rapidly, these histone modifications were validated using embryo model organisms such as Drosophila, zebrafish and mouse (Visel et al., 2009; Aday et al., 2011; Négre et al., 2011). Chromatin state at promoters is generally stable, being the most common epigenetic marks used to identify them the trimethylation of both histone H3 lysine 4 (H3K4me3) and lysine 27 (H3K27m3) and the acetylation of lysine 27 in histone 3 (H3K27Ac) (Figure 8C) (Hawkins et al., 2011; Rada-Iglesias et al., 2011). Moreover, it was shown that enhancers in mouse and hESC can oscillate between two states: active or poised (Figure 8B, D). Active enhancers are characterized by the monomethylation of lysine 4 in histone 3 (H3K4me1) and the acetylation of lysine 27 in histone 3 (H3K27Ac). In contrast, poised enhancers are characterized by an absence of the acetylation mark, are enriched for histone 3 lysine 27 trimethylation (H3K27m3) and maintain the H3K4me1 mark (Santos- Rosa et al., 2002; Heintzman et al., 2007; Creyghton et al., 2010; Hawkins et al., 2011; Rada-Iglesias et al., 2011).

31

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 8. Chromatin accessibility and histone marks associated with cis-regulatory elements. (A) Chromatin has proven to be a “gatekeeper” for TF accessibility and enhancer activity. Nucleosomes when positioned close can restrict access for TFs. The transition from the “closed” to “open” state is determined by regulatory DNA-binding proteins (eg.: TFs, insulator proteins ‘CTCF’, repressors and polymerases). (B) Active enhancers are often associated with flanking histones marked by the H3 acetylation at lysine 27 (H3K27Ac) and H3 monomethylation at lysine 4 (H3K4me1). (C) Active promoters (represented bound by Pol II) are usually flanked by nucleosomes with H3K27Ac and H3K4m3 marks. (D) Closed or poised enhancers are characterized by the combination of H3K4m1 and the repressive H3K27m3 mark (Shlyueva, Stampfel and Stark, 2014).

Hence, many genome-wide techniques have been used to investigate gene regulation, among them, ChIP-seq (Hoffman and Jones, 2009; Park, 2009a; Raha, Hong and Snyder, 2010). This widely employed technique allows a genome-wide profiling of DNA- binding proteins, histone modifications, TFBS or nucleosomes that are essential for deciphering gene regulatory networks that underlie several biological processes (Park, 2009b). ChIP-seq enriches DNA fragments to which a specific protein is bound. It uses antibodies that bind specifically to the protein of interest bound to DNA or histones with specific modifications to immunoprecipitate the DNA-protein complex. Finally, the released DNA is purified and assayed directly by next generation sequencing (NGS) to determine the sequences motifs bound by the protein (Figure 9) (Odom et al., 2004; Hoffman and Jones, 2009; Shlyueva, Stampfel and Stark, 2014). Therefore, ChIP-seq has become a very important tool for profiling transcriptional networks and unveiling enhancers in chromatin landscapes, improving our understanding on gene regulation (Creyghton et al., 2010; Ong and Corces, 2011).

32

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Figure 9. Overview of the chromatin immunoprecipitation (ChIP) procedure. Cells are initially treated with a cross- linking agent that covalently links DNA-interacting proteins to the DNA. The genomic DNA is then isolated and sheared, usually by sonication, into a suitable fragment size distribution (100-300 bp). An antibody that specifically recognizes the protein of interest is added and immunoprecipitation is used to isolate appropriate DNA-protein complexes. The cross- links are then reversed, and the DNA fragments purified. ChIP-seq uses the released DNA for Next Generation Sequencing (NGS) to access the sequences bound by the protein (Hoffman and Jones, 2009; Shlyueva, Stampfel and Stark, 2014).

33

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

2.2.2. Accessing open chromatin regions using ATAC-seq to find cis- regulatory elements

To allow the binding of TFs to DNA, the disruption of nucleosomes is necessary (Workman, 2006). Thus, nucleosome-free regions are often enriched with CREs that bind these trans-regulatory elements (Q. Zhang et al., 2017). Different genome-wide techniques have been developed to identify chromatin accessibility (‘open chromatin’), nucleosome positioning and TFBS (Buenrostro et al., 2015). ATAC-seq has emerged as an alternative to DNAse I hypersensitivity digestion and is a powerful and efficient tool to assay open chromatin genome-wide (Buenrostro et al., 2013, 2015; Lee et al., 2015). ATAC-seq probes DNA accessibility with hyperactive Tn5 transposase, which simultaneously cut and inserts sequencing adapters into accessible regions of chromatin for high-throughput sequencing (Figure 10) (Goryshin and Reznikoff, 1998). The resulting sequencing reads allow for multidimensional analysis of regulatory landscapes of chromatin and are used to detect regions of open chromatin, as well as to map regions of TF binding and nucleosome position. Thus, ATAC-seq has proved to be a robust and sensitive method for the profiling of open chromatin regions, providing a multidimensional representation of CREs (Buenrostro et al., 2013, 2015).

Figure 10. ATAC-seq reaction schematics. Tn5 Transposase (green), loaded with sequencing adaptors (red and blue), inserts only in regions of accessible chromatin (between nucleosomes in gray) and generates sequencing fragments that can be amplified by PCR (Buenrostro et al., 2013).

34

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

2.2.3. Contemporary tools to identify chromatin interactions genome- wide

The study of the packaging and three-dimensional architecture of chromatin in the nucleus is shedding light on the spatial aspects of CREs and their target genes (van Berkum et al., 2010). Genome-wide identification of CREs as well as their role in target genes expression has not been a straightforward process since enhancers are often located thousands of base pairs away from their targets (Belton et al., 2012). In molecular assays based on Chromosome Conformation Capture (3C) technology, such as 4C (circularized 3C), 5C (carbon-copy 3C), and Hi-C (Figure 11), next generation sequencing can be used to identify interacting chromatin points (Jon-Matthew Belton, 2013; Fernández-Miñán et al., 2016; Carty et al., 2017). Hi-C allow the identification of chromatin interactions between and within by covalent crosslinking DNA and proteins. After chromatin digestion and biotinylation, the DNA present in crosslinked complexes is ligated to form chimeric DNA molecules. Following this rational, biotin is removed from unligated ends and molecules are fragmented to reduce their overall size. Output molecules with internal biotin incorporation are pulled down with specific beads and modified for deep sequencing of all ligation junctions (Figure 11A-D) (van Berkum et al., 2010; Belton et al., 2012). The readout of the method varies with the C technique applied. Overall, the process focus on analyzing a set of loci enabling “one-versus-some” (basic 3C), “one-versus all” (4C), or “many-to-many” (5C) examinations of regions of interest. Instead, Hi-C enables an “all-to-all” interaction profiling (Figure 11F), functioning as a powerful method for studying the three-dimensional architecture of the genome by mapping chromatin interactions, such as between CREs and promoters in a genome- wide manner. Therefore, the unparalleled nature of the Hi-C method provides the previously missing dimension of spatial context, giving a new perspective for the study of the genome organization and its implications for health and disease (Dekker et al., 2002; van Berkum et al., 2010; Belton et al., 2012; Carty et al., 2017).

35

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 11. Overview of Hi-C technique. (A-E) Representation of the crosslink, digestion, ligation and pulldown methods used in Hi-C (F) Different techniques for the analysis of ligation junctions, including 3C, and its high-throughput derivatives 4C, 5C and Hi-C. Colored blocks indicate different regulatory DNA elements; black arrow indicates transcription starting point; and curved colored arrows indicate the interactions between regulatory elements (Wijchers and de Laat, 2011; Belton et al., 2012).

2.3. Genetic tools to study cis-regulatory elements in vivo

Molecular techniques in model organisms have been rapidly advancing and seriously facilitated the study of gene regulation and the developmental events in specific tissues (Yee, Kazi and Yee, 2013). Although several methods have been described, such as microinjection techniques and forward and reverse genetics, the identification of CREs and their functional validation remains challenging (Kawakami, 2007; Wittkopp and Kalay, 2012; Kalender Atak et al., 2017). Therefore, for enhancer validation, the use of in vivo transgenic reporter assays (Figure 12), which allow the evaluation of the tissue specificity of the reporter construct/sequence in a multicellular organism in different cell types and developmental timepoints, rather than in vitro cell culture assays (e.g.: luciferase assays), is preferred (Bessa and Gómez-skarmeta, 2011).

36

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Figure 12. Embryo transgenic experiments in zebrafish. Transposase mRNA and a transposon donor plasmid containing a Tol2 construct with a minimal promoter and the gene encoding green fluorescent protein (GFP) are co- injected into zebrafish one-cell stage fertilized eggs. The Tol2 construct is excised from the donor plasmid and integrated into the genome. In this figure, the promoter is driving expression of GFP in the midbrain of the embryo (depicted in green). Adapted from (Kawakami, 2007).

2.3.1. Enhancer assays to test for CREs activity

Several model organisms, such as zebrafish, mouse and the common fruit fly (Drosophila melanogaster) have been used to generate transgenics and test CREs for enhancer activity (Wittkopp and Kalay, 2012). Initially, reporter assays using Bacterial Artificial Chromosomes (BACs), in which large DNA fragments are inserted through recombineering, were used to test the cis-regulatory activity of large genomic regions (up to 200 kb) in their genomic context (Jeong, 2006; Pashos et al., 2013). In a study, Pashos and coworkers (Pashos et al., 2013), using Tol2-mediated and BAC transgenesis for the zebrafish ptf1a gene, identified different CREs necessary for correct ptf1a expression and normal pancreas development. These CREs included a novel ptf1a autoregulatory element, several neuronal enhancers and an enhancer active in the ventral pancreas (Pashos et al., 2013). Moreover, Pashos’s study revealed that this autoregulatory element was necessary for proper pancreas development (Pashos et al., 2013). However, the generation of transgenic animals with BACs is more difficult than with smaller DNA fragments (Udvadia and Linney, 2003; L. Li et al., 2016). The use of short genomic elements (bellow 5kb) in transposon-mediated enhancer assays,

37

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas containing putative enhancer regions, is a simpler alternative to BACs (Ariza-Cosano et al., 2012). In enhancer assays, the most commonly used reporter vectors consist of a (1) cloning site to insert the sequences of interest, (2) a minimal promoter with low endogenous expression that is able to read a wide range of different enhancers, and (3) a sensible reporter gene, which can be detected in vivo (Figure 13) (Bessa et al., 2009; Bessa and Gómez-skarmeta, 2011; Ariza-Cosano et al., 2012).

Figure 13. Assessment of gene regulatory events in vivo. Transcriptional reporters that are integrated into the genome allow the assessment of gene regulatory activities in specific tissues (Shlyueva, Stampfel and Stark, 2014).

In model organisms, cis-regulatory activity is usually assessed through transient expression or through stable transgenic lines (Ariza-Cosano et al., 2012). By transient expression, the enhancer activity of the sequences is analyzed at desired developmental stages of embryos injected with the destination construct at one-cell stage (Hadrys et al., 2006; Miyatsuka et al., 2011; Pashos et al., 2013). The vector is randomly integrated in the host genome in such a way that the DNA integration occurs in several embryonic cells, allowing the detection of the enhancer activity in a mosaic manner (Ung et al., 2015). Consequently, this approach requires the analysis of a large number of embryos to define a consistent expression pattern through imaging techniques with the advantage of analyzing many different independent integrations of the construct (Grabher and Wittbrodt, 2007; Bessa and Gómez-skarmeta, 2011). Furthermore, the exact expression pattern directed by the enhancer can only be fully reconstructed by the establishment of reporter stable transgenic lines (L. Li et al., 2016). This way, reporter assays in model organisms allow for rapid and large-scale genomic screening of regulatory elements, and for efficient detection of enhancer activity in vivo (Bessa et al., 2009).

38

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

2.3.2. Genome editing as a tool for loss of function assays

Specific genetic modifications in the genome of model organisms by directly manipulating the information encoded in specific genes and regulatory elements is crucial for biomedical research (M. Li et al., 2016; Y. Zhang et al., 2017). Traditional forward genetic screens using chemical mutagens such as N-ethyl-N-nitrosourea (ENU) and retroviral insertions have been used for the identification of relevant developmental genes and pathways (Pack et al., 1996; Amsterdam et al., 1999). Additionally, their functional characterization is enabled by reverse genome-editing tools such as morpholino oligonucleotides and engineered nucleases [e.g.: Zinc Finger Nucleases (ZFN), Transcription Activator-Like Effector Nucleases (TALENs) and CRISPR] (Amsterdam et al., 1999; Woong Y Hwang et al., 2013; M. Li et al., 2016). Generally, these tools contain a specific DNA binding domain fused to a non-specific DNA cleavage domain that generates double-stand breaks (DSBs) in the DNA. As a normal cellular response, the DSBs are rapidly repaired mostly through the non-homologous end joining (NHEJ) pathway generating random insertions, deletions and substitutions (Indels) (Figure 14) (Woong Y. Hwang et al., 2013; M. Li et al., 2016).

39

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 14. Genomic engineering using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)- Cas9 endonucleases. CRISPR-Cas9 is a complex between the Cas9 protein and a small RNA molecule, the single guide RNA (sgRNA). The endonuclease is designed to site specifically recognize a single site in a vertebrate genome. Cas9 generates double-strand breaks (DSBs), which are repaired by the endogenous DNA repair machinery. This usually occurs without a provided template by nonhomologous end-joining (NHEJ) mechanisms. The repair mechanisms normally generate indels at the targeting site (M. Li et al., 2016).

The recently discovered CRISPR system is constituted by a diverse family of endonucleases found in archaeal and bacterial genomes that have been adapted and developed as a suitable in vivo genome editing tool in several model organisms (Woong Y. Hwang et al., 2013). The Cas9 endonuclease has a domain that recognizes short sequences of DNA usually referred as protospacer adjacent motifs (PAM). However, most of the Cas9-binding specificity comes from a small single guide RNA (sgRNA) molecule that guides the Cas9 protein to the target DNA site on the genome to function as an endonuclease and induce double-strand breaks (DSBs) (Woong Y. Hwang et al., 2013; Felker and Mosimann, 2016; Y. Zhang et al., 2017). Genetic research commonly applies the study of mutant phenotypes caused by molecular disruptions to infer the function of the disrupted entity in the normal situation. In this way, the study of mutant phenotypes caused by the disruption of specific genes and its regulatory elements provides a valuable and powerful resource for the discovery of new insights into the architecture of the genome (Woong Y Hwang et al., 2013; Felker and Mosimann, 2016; M. Li et al., 2016; Y. Zhang et al., 2017).

40

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

3. Objectives

The goal of this thesis was to identify CREs with enhancer activity in pancreas, to explore the regulatory networks of the zebrafish adult exocrine pancreas. To achieve this, we coupled next-generation sequencing techniques with in vivo reporter assays. Based on data from ATAC-seq, that allow to detect open chromatin regions, and ChIP-seq for epigenetic marks associated with enhancer activity (e.g.: H3K27Ac), previously obtained by our group (unpublished), we selected putative CREs in the vicinity of genes of interest, to test for enhancer activity. The regions selected were located in the landscape of genes that are known to be involved in pancreatic development, as well as genes known to be actively expressed in pancreas. To identify putative disease-associated enhancers, we also tested the enhancer activity of sequences located in the landscape of TSGs described as being involved in PC. After identifying active enhancers in pancreas, we performed functional assays to assess their role in regulating gene expression, by deleting the selected sequences through CRISPR-Cas9, and evaluating the expression of the predicted target gene and the phenotypic consequences of enhancer deletion, focusing on pancreas development. Moreover, we also analyzed the enhancer activity of a human distal enhancer of PTF1A, comparing with the zebrafish distal enhancer identified in this study, to ascertain whether there might exist conservation of enhancer function between both species. Lastly, we evaluated how the loss of function of a pancreatic TF encoding gene (hnf4a) could affect pancreas morphogenesis and the expression of pancreatic genes associated with these cis-regulatory regions. Clarifying the basic mechanisms that regulate pancreas development and organogenesis will help us describe molecular networks involved in the development of pancreatic diseases such as diabetes and PC. Therefore, we expect this work to provide a valuable and powerful information on the prediction and functionality of the pancreatic regulatory genome in vivo.

41

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

4. Material and Methods 4.1. Zebrafish stocks, husbandry, breeding and embryo culture

Adult zebrafish AB/TU WT strains as well as transgenic lines generated in this study were reared at 27-29ºC under a 10h dark/14h light cycle in a recirculating housing system. The system contains a mechanical filter (filter pad and canister filter), biological filter (bacteria), chemical filter (activate carbon filter) and a life support system containing a recirculation pump, heaters, UV light sterilizer and reverse osmosis water. To generate embryos, adult female and male fish were identified and placed in a breeding tank separated with a partition at room temperature the night before spawning. Embryos were collected in the morning after spawning and cultured at 28ºC in petri dishes, generally containing E3 medium [5mM NaCl, 0.17mM KCl, 0.33mM CaCl2·2H2O, 0.33mM

MgSO4·7H2O and 0.01% methylene blue (Sigma-Aldrich), pH 7.2] or E3 supplemented with 0.01% PTU (1-phenyl-2-thiourea) to delay pigmentation formation (Ishibashi et al., 2013). Injected embryos were cleaned at 24 hpf with 0.36% bleach (Sodium hypochlorite; Sigma-Aldrich) diluted in tap water, in 2 incubations of 5 min of bleach, alternated with tap water, followed by 2 final incubations, of 5 min each, with tap water. In the end, embryos were collected to new plates containing a fresh E3 medium or E3 plus PTU. For in vivo reporter screening, embryos were anaesthetized by adding tricaine (ethyl 3- aminobenzoate methane-sulfonate salt, Sigma-Aldrich) to the E3 medium and moved into new petri dishes for immunohistochemistry assays. For the establishment of stable transgenic lines, hundreds of embryos were injected, F0s screened for the expression of the internal control of transgenesis and grown until adulthood. Adult F0s were outcrossed with WT adults and the offspring screened for the internal control of transgenesis and for the pattern of expression of the regulatory element. The following transgenic lines were used in this work: tg(ela:mCherry), tg(sst:mCherry) and tg(ptf1aE3:EGFP). All efforts were made to minimize animal stress and suffering. The i3S animal facility and the project are licensed by the Direcção Geral de Alimentação e Veterinária (DGAV) and all the protocols used for the experiments were approved by the i3S Animal Welfare and Ethics Review Body.

4.2. DNA extraction from zebrafish embryos

Genomic DNA was extracted from whole zebrafish embryos at 24 hpf and used as a template for amplification of fragments. Embryos were collected, dechorionated, washed

42

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

in ddH2O and incubated 3 hours at 56ºC with agitation in 50 µl genomic extraction buffer (EB) [10mM Tris, pH 8.2; 100mM EDTA, pH 8.0; 200mM NaCl; 0.5% SDS and 200 µg/ml proteinase K (Roche)]. Next, 100 µl 100% molecular grade absolute ethanol (Merck) was added and samples incubated at -20ºC overnight (O.N.). After precipitation, the DNA was centrifuged 10 mins at 13000 rpm, discarding the supernatant, washed with 200 µl of 70% ethanol diluted in molecular grade water with 2 min centrifugation at 13000 rpm, and let to dry. The DNA was resuspended in 20 µl of TE buffer with RNase [10mM Tris, pH 8.0; 1mM EDTA pH 8.0 and 100 µg/ml RNAse (Sigma-Aldrich)], incubated 1 hour at 37ºC, and stored at -20ºC.

4.3. Generation of the plasmids carrying the selected putative enhancer sequences

Based on data previously obtained in our laboratory, namely ChIP-seq for the H3K27Ac and ATAC-seq in zebrafish pancreas (unpublished data), this analysis was focused on putative enhancers of genes known to be expressed in zebrafish pancreas, as well as two TSGs associated with PC. The sequences of interest have been selected considering the signal of activity concomitantly in H3K27Ac ChIP and ATAC-seq data, in non-coding regions within the landscape of each gene, depicted in Figure 15 for ptf1a, as an example.

Figure 15. UCSC Genome Browser tracks showing different data sets in the region of pancreatic transcription factor 1 subunit alpha gene (ptf1a, green box) in zebrafish and putative regions for enhancer sequences (red boxes); coordinates: chr2 29,416,955-29,481,705; UCSC Genome Browser on Zebrafish Jul. 2010, Version: (Zv9/danRer7). Open chromatin regions detected with transposase-accessible chromatin using sequencing (blue track) overlap with the epigenetic mark H3K27Ac (black track) in putative enhancer regions.

4.3.1. PCR amplification of the selected sequences

Specific primers (Sigma-Aldrich) targeting the genomic regions of interest were designed using NCBI/Primer-BLAST software and confirmed with Biotools/Oligo Calc software (Table 1). A standard PCR reaction using a proof-reading iMax-TM II DNA polymerase (INtRON Biotechnology) was performed, following the manufacturer’s instructions for a standard 20 µl PCR reaction. Commercial water was used for all the PCR reactions

43

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

(HyClone™ Water, GE Healthcare). PCR amplification occurred in thermocycler (Veriti, Applied Biosystems), with the following conditions: initial denaturation at 94°C for 3 min; 35 cycles: denaturing at 94°C for 30 s, annealing adjusted on primers melting temperature (Table 1) for 45 s, elongation at 72°C for 1 min/per kb amplified sequence, and a final elongation step at 72°C for 7 min, kept at 12°C until samples were collected and stored at -20ºC. To confirm that only a single band of the correct size was amplified, 10 µl of the PCR product were loaded onto a 1% agarose gel. Amplified products were visualized by transillumination under UV (TFX – 35 M, VILBER LOURMAT), the band of interest excised and purified using a NZYGelpure kit (NZYTech) according to the manufacturer’s instructions. The zebrafish sequences were amplified from DNA of WT zebrafish embryos (TU/AB) at 24hpf. Regarding the human enhancer of PTF1A, the PCR was performed using as template DNA from human dermal fibroblasts (hDFb) that was available in the laboratory.

Table 1. Primers used for the amplification of cis-regulatory sequences.

Target Orientation Sequence (5’ - 3’) Tm (ºC) Amplicon (bp) sequence

Forward AAGCAATGAAGGCTGTTTTGTTTTC 60.9 arid1abE1 1125 Reverse TTTAGCACAGAGTGTGTTCTTGC 60.9

Forward GCCATGTCGTAGAATTAAAAAGGG 62.0 arid1abE2 1386 Reverse TGAGCTCGGGGAAAAACTGC 60.5

Forward TGGTAGTGGTTGTTCTCAGCG 61.2 arid1abE2b 934 Reverse TGAAGGGAAATCCTGTCAGCC 61.2

Forward ATGCTTTGAGATCACATTCCCG 60.1 arid1abE3 1011 Reverse CAGGTTTTGGAGACGCAGACG 63.2

Forward GCACAAAAATCGGCAAAGCC 58.4 arid1abE4 1578 Reverse TCTGTGACAAAATGTATCAAGTTCC 60.9

Forward GTCATCATGGTATCAGAGAGTAAACC 64.6 arid1abE5 2084 Reverse GGAATGAAAAGAGCATTCAGTGATG 62.5

Forward TGACATACTTCTACCACCCGC 61.2 arid1abP1 1027 Reverse TGTAGGAAGCAGCAGTAGCC 60.5

Forward TTGGCTTCAAAACCTAGCTGGA 60.1 hARID1A 2053 Reverse CTCCACTTAACTGGGTGCTGG 63.2

Forward TTTCCCTTATGTCAAGTAAAGATCG 60.9 amy2aE1 1530 Reverse GGATTTGTTCAATTATTATCTTGCTC 60.1

amy2aE2 Forward TGAAGCTGACCTCTAGACAAAGC 62.9 842

44

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Reverse ATCATGCACTTTGGTCTTTGAACC 62.0

Forward CAATGAAAATTAGAATAATACTTGTAGC 59.9 amy2aE3 1641 Reverse GCATTGATAATCTTACCTAAATAAGTA 59.2

Forward CACTTTACTACAGTCGTGTGG 59.5 cpa1E1 1559 Reverse CAAATGTCTTTAACCTCCACTGT 59.2

Forward GAACCACTGTTTACACAAATATTTG 59.2 cpa5E1 1939 Reverse GCATTGATTTTGTGAATAACGTTGT 59.2

Forward TGTGAACAAGCATGCCTTTGG 59.5 ctrb1E1 1515 Reverse TGTTCAGAAATGTGTGGGGG 58.4

Forward GACGCTGACGTCACATGCTCC 65.3 tp53E1 1137 Reverse TGGGGGCTGAATACTTCTGAG 61.2

Forward CTGTCGTTGTTGCATTGATGG 59.5 ptf1aE1 1707 Reverse GGCCCTTTTACGATAACGTTTG 60.1

Forward CAGTGCAGAGTAAATTTGTTAACG 60.3 ptf1aE3 1828 Reverse GGACAGCTAACAGGCTAAAACA 60.1

Forward CACCAACAACAGGGGCAACTGAAC 66.9 hPTF1a 744 Reverse TATGTCCTTCCTAGGCTGGTT 59.5

Forward GATTGAAGCATATGGAGAAACAGG 62.0 tryE1 2724 Reverse GATTTAAAGATACACATTCAGATTGTG 60.8

Forward ACCACCACAACAACATCAAAAGC 60.9 tryE2 3131 Reverse GCTAATTCCAGCAGAAGCTTTTTC 62.0

4.3.2. Cloning of fragments into pCR®8/GW/TOPO vector

PCR products were subcloned into pCR®8/GW/TOPO vector (Invitrogen #250020) by A- overhang cloning mediated through topoisomerase activity (‘TOPO® TA cloning’) according to the manufacturer’s instructions. The ligated product was chemically transformed into 50 µl commercially available competent cells (MultiShotTM FlexPLate Mach1TM T1R, Invitrogen) according to chemical transformation standard procedures. Transformed cells were plated on LB agar plates containing Spectinomycin (Sigma) at 100 µg/ml. Grown colonies were screened by colony PCR for positive bacterial clones using the M13 forward/reverse combination of primers: M13 forward (5’- TGTAAAACGACGGCCAGT-3’) and M13 reverse (5’-TCAGGAAACAGCTATGAC-3’). Colony PCR was performed as 12.5 µl reaction: 1.25 µl of 10× B2 Buffer, 0.75 µl of

25mM MgCl2, 0.5 µl dNTPs [10mM A, 10mM U, 10mM C, 10mM G (ThermoScientific)], 0.5 µl of primers M13 Forward/Reverse (10 µM final concentration, Invitrogen), 0.10 µl

45

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

® of HOT FIREPol DNA Polymerase (Solis BioDyne) and 8.90 µl of H2O (HyClone™ Water, GE Healthcare). PCR conditions were as follows: denaturing 95ºC for 15 minutes; 35 cycles: denaturing 95ºC for 30 s, annealing 56ºC for 1 min, elongation 72ºC for 1 min/per kb amplified sequence; 72ºC for 7 min; 4ºC. Then, 10 µl of the colony PCR product were used for 1% agarose gel electrophoresis and the positive clones were identified. Positive clones were picked up from the plate to inoculate in 3 ml LB medium containing Spectinomycin (100 µg/ml) and grown O.N. at 37ºC in a shaking incubator (MaxQ6000, ThermoScientific). Plasmid DNA was extracted from the culture using a standard NZYMiniprep kit (NZYTech) according to manufacturer’s instructions and stored at -20ºC. Isolated plasmid DNA concentration and purity was determined using a NanoDrop 1000 Spectrophotometer (ThermoScientific). Sanger sequencing was used to confirm the insertion of the correct sequence in the pCR®8/GW/TOPO vector with following primers: M13 forward (5’-TGTAAAACGACGGCCAGT-3’) and M13 reverse (5’- TCAGGAAACAGCTATGAC-3’) on an 3130XL Genetic Analyzer sequencer (Applied Biosystems), analyzed using Sequence Scanner v1.0 (Applied Biosystems) and aligned using Benchling platform (Benchling [Biology Software], https://benchling.com) to confirm the cloned fragment.

4.3.3. Recombination of the selected sequences into Z48 and ZED vectors

The entry vectors pCR®8/GW/TOPO containing the enhancer sequences, obtained in the previous task, were recombined with destination vectors Z48 and ZED. Both vectors contain a minimal promoter driving an in vivo fluorescent reporter gene (GFP), so GFP expression only occurs in specific tissues in the presence of an active enhancer (Bessa et al., 2009, 2014; L. Li et al., 2016). Z48 vector consists in a plasmid carrying Tol2 recognition sequences to allow integration in the genome by resorting to transposase, a gata2a minimal promoter, GFP as a reporter gene, a GW (Gateway) cassette for easy recombination of amplified genomic sequences and an internal zebrafish enhancer (irx Z48 enhancer ‘Z54390’), and it has been used to further perform transient enhancer assays. In zebrafish, the Z48 enhancer is very active in the midbrain, functioning as a strong internal control of transgenesis (De La Calle-Mustienes et al., 2005) (Figure 16A). For the establishment of stable transgenic lines, the ZED vector (Bessa et al., 2009) has been used. The vector also contains Tol2 recognition sequences and is composed by two main different cassettes: the internal control cassette of transgenesis that is constituted by the Cardiac Actin promoter and the red fluorescent protein (RFP); and the

46

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

enhancer detection cassette composed by the Gateway entry site, the gata2 minimal promoter and the enhanced green fluorescent protein (EGFP) reporter gene (Figure 16B). This cassette is flanked by two insulator sequences that protect the cassette from position effect (Bessa et al., 2009).

Figure 16. Schematics of the in vivo reporter (destination) vectors. (A) Diagram of the Z48 construct and the pattern of expression given by the Z48 enhancer at 72 hpf. Scale bar = 250 µm. (B) Schematics of the ZED vector (Bessa et al., 2009).

Upon site-specific recombination using Gateway® LR Clonase® II Enzyme mix (Invitrogen), the candidate fragments were cloned into the destination vectors (Figure

17), according to manufacturer’s instructions. A standard reaction was prepared as follows: 1 µl with 50 ng of the pCR®8 construct, 1 µl with 50 ng of the destination vector and 0.5 µl LR Clonase® II Enzyme mix. The mixture was incubated in a thermocycler (Veriti, Applied Biosystems) set at 25ºC for one hour and stopped by adding 0.25 µl Proteinase K and a 10 min incubation at 37ºC. The recombinant solution was chemically transformed into 50 µl of (MultiShotTM FlexPLate Mach1TM T1R, Invitrogen) competent bacteria as previously described. Cells were plated on agar plates containing 100 µg/ml Ampicillin (Normon). Grown colonies were used to inoculate in 3 ml LB medium with ampicillin (100 µg/ml) O.N. Plasmid DNA was extracted using a NZYMiniprep kit (NZYTech) as described above and stored at -20ºC. Isolated plasmid DNA concentration and purity was determined as described. Sequences were analyzed by sanger sequencing.

47

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 17. Overview of the cloning and recombination protocol. (1) Initially, a putative enhancer is amplified from genomic DNA with specific primers. (2) The amplified DNA is cloned into the pCR8/GW/TOPO entry vector. (3) The DNA fragment is recombined into a desired destination vector to test the enhancer activity of the candidate sequence. In (3) is represented a destination vector similar to ZED vector and BglII restriction sites are annotated as black lines (Bessa and Gómez-skarmeta, 2011).

4.3.4. Purification of the plasmids for injection

For the purification of DNA, UltraPureTM Phenol:Chloroform:Isoamyl Alcohol (25:24:1, v/v, Invitrogen) has been used. The volume of the isolated DNA was adjusted to 100 µl by adding hyper-pure H2O (HyClone™ Water, GE Healthcare). 100 µl Phenol:Chloroform have been added followed by vortexing. The mixture was centrifuged at 13000 rpm for 5 minutes to obtain two phases. The aqueous phase was transferred to a new tube RNAse free and mixed by vortexing with 100 µl of chloroform (Fisher Chemicals). DNA was precipitated by adding 10 µl of sodium acetate 3 M pH 5.6 and two volumes of cold ethanol 100 % per 100 µl of aqueous phase collected, mixed and incubated for 1 hour at -80ºC or O.N. at -20ºC. The mixture was centrifuged at 13000 rpm for 15 minutes, discarding the supernatant, the pellet resuspended in hyper-pure H2O and samples stored at -20ºC. Purified plasmid DNA concentration was determined by spectrophotometric analysis using a NanoDrop 1000 Spectrophotometer (ThermoScientific).

48

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

4.4. Generation of transgenic zebrafish

4.4.1. RNA synthesis

Tol2-mediated transgenesis in zebrafish depends on transposase, which allows integration of the sequence of interest in the genomic DNA (Kawakami, 2007). The DNA sequence of interest, located between two Tol2 sequences is excised from a circular vector and then integrated into the genome (Felker and Mosimann, 2016). Therefore, the destination vector carrying a reporter gene is co-injected with transposase mRNA (Kawakami, 2007). To obtain Tol2 mRNA, Tol2-pCS2FA vector (Kawakami, 2007; Kwan et al., 2007) was first amplified by transformation into competent cells using ampicillin selection, isolated and linearized with NotI restriction enzyme (Anza, Invitrogen) for 1h at 37ºC. Linearized DNA was purified by phenol/chloroform extraction, confirmed with 1% agarose gel electrophoresis and quantified by nanodrop spectrophotometric analysis using a NanoDrop 1000 Spectrophotometer (ThermoScientific). Using this DNA as template, transposase mRNA was synthesized in vitro with the following conditions: 14 µl of linearized DNA mixed with 10 µl transcription buffer 5×, 5 µl NTP mix (10mM A,

10mM U, 10mM C, 5mM G), 5 µl DTT, 5 µl Cap (5mM G), 9.5 µl hyper-pure H2O (HyClone™ Water, GE Healthcare) and NZYRibonuclease Inhibitor (NZYTech). The mixture was incubated 2 hours at 37ºC with 2 µl of RNA polymerase SP6 (ThermoFisher Scientific). Finally, the template DNA was digested by adding 2 µl DNase and incubated 1 hour at 37ºC.

4.4.2. Purification of the synthesized mRNA

For the RNA purification an adapted Sephadex protocol has been followed. The piston of a sterilized syringe (Omnifix 100 Solo, B/BRAUN) was removed. Glass wool was inserted into the syringe (± 0.1 mm) and Sephadex® G-50 fine DNA grade (GE Healthcare) was added to the top (1ml). To prepare a good extraction column, the syringe was inserted into a falcon to discard liquids, and centrifuged at 4000 rpm, at 4ºC, for 5 minutes. The flow-through was discarded and Sephadex column checked for reaching 0.6 mm. Then 50 µl of Hyper-pure H2O (HyClone™ Water, GE Healthcare) was added to the column and an 0.5 µl tube without the cover was attached to the tip of the syringe. The column was centrifuged again in the same conditions. The flow-through was confirmed as 50 µl and discarded, otherwise, centrifugation was repeated. Next, synthesized RNA was loaded into the column and a new 0.5 µl tube was attached to the tip of the syringe. Centrifugation was performed in the same conditions and the RNA was

49

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

collected to a new tube, on ice. After, 50 µl of Hyper-pure H2O was added to the Sephadex column and centrifuged, to wash the remaining RNA from the column, and was added to the purified RNA sample, on ice. This was followed by a purification with phenol/chloroform as described above. RNA integrity was verified by 0.8% agarose gel electrophoresis and quantified by spectrophotometry. Pure RNA was stored at -80ºC.

4.4.3. Injection of the plasmids and mRNA into zebrafish embryos

Z48 vector and ZED vectors carrying the putative enhancer sequences were injected with transposase mRNA in one-cell stage WT zebrafish embryos (TU or AB strain), to evaluate their in vivo enhancer activity. In the day before the injection AB/TU WT zebrafish females and males (3:2 or 2:1 ratio) were separated in breeding boxes O.N. The next day, fishes were crossed, and the embryos immediately collected. Injection of embryos was performed by using a 0.58x1.00x100mm microneedle (GB100F-10, SCIENCE PRODUCTS GmbH) previously made in a pooler (PN-31, NARISHIGE). The microneedle had its tip carefully cut and was placed into a microinjector (IM 300 Microinjector, NARISHIGE). The injection mix was prepared as follows: plasmid DNA to a final concentration 25ng/µl, transposase mRNA to a final concentration 25ng/µl and 0.05% phenol red. The microneedle was filled with the mix and one cell stage zebrafish embryos were injected in the cell or yolk with approximately 2 nl of mixture.

4.5. Screening for enhancer activity

To assess in vivo enhancer activity of the tested sequences specifically in the pancreas, immunohistochemistry for pancreatic markers was performed in the F0 embryos injected with the Z48 vectors and F1 embryos from transgenic founders carrying the ZED vectors. Around 20 zebrafish embryos per condition, at 12 dpf, were euthanized by overdose with tricaine methane sulfonate (MS222, 200-300 mg/l) prolonged immersion, then they were fixed in formaldehyde 4% (in PBS-1×) 1 hour at room temperature (RT). Embryos were washed in PBS-T (0.5 % Triton X-100 in PBS-1x) 5 minutes at RT (twice). Then, embryos were permeabilized with 1% Triton X-100 in PBS-1x 1 hour at RT followed by blocking with 0.5% BSA-PBS-T 1 hour at RT. For the immunostaining, embryos were incubated with 200 µl primary antibody diluted in 0.5% BSA-PBS-T at 4ºC O.N. with shaking. Primary antibodies used: anti-Amylase (1:50, Sigma), anti-AlcamA Zn8 (1:50, DSHB) and anti-Nkx6.1 (1:50, DSHB). In the next day, after removal of the primary antibody, embryos were washed 6 times with PBS-T (0.5% Triton X-100 in PBS-1x) for 10 minutes and incubated with 200 µl secondary antibody and DAPI (1:1000, Invitrogen) diluted in 50

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

0.5% BSA-PBS-T 4 hours at RT. Secondary antibodies used: Alexa 568 (1:800, Invitrogen) and Alexa 647 (1:800, Invitrogen). Finally, embryos were washed with PBS- T 0.5% for 10 min, 6 times (last wash 30 minutes) and stored in 50% Glycerol/PBS. For confocal imaging, eyes or yolk were removed (depending on the developmental stage of the embryo), and embryos were mounted and assembled in 50% Glycerol/PBS. Confocal images were acquired by using a SP5II Leica confocal microscope and processed with ImageJ and Inkscape.

4.6. Mutagenesis using the CRISPR-Cas9 system

4.6.1. Generation of Cas9 mRNA and sgRNAs

The sgRNAs targeting regions flanking ptf1aE3 and hnf4a, were designed using the Crisprscan software (Table 2).

Table 2. Oligonucleotides used for CRISPR-Cas9. sgRNA Oligonucleotide orientation Sequence (5’ - 3’)

Forward TAGGGAATGTTGTTTGACTACA ptf1aE3_Sg1 Reverse AAACTGTAGTCAAACAACATTCCC

Forward TAGGCCTGACAAGCTTCTGACC ptf1aE3_Sg2 Reverse AAACGGTCAGAAGCTTGTCAGGCC

Forward TAGGCGAAAACGTTTTAGCAGA ptf1aE3_Sg3 Reverse AAACTCTGCTAAAACGTTTTCGCC

Forward TAGGTGTGAGGTAGTTCGCATT ptf1aE3_Sg4 Reverse AAACAATGCGAACTACGTCACACC

Forward TAGGACAACATATTGTTGTAGT ptf1aE3_Sg5 Reverse AAACACTACAACAATATGTTGTCC

Forward TAGGTGGCCCGGTCCCCACAAA Hnf4a_Sg1 Reverse AAACTTTGTGGGGACCGGGCCA

Forward TAGGTAGTGCTTTCCTGTGGGG Hnf4a_Sg2 Reverse AAACGGCCACAGGAAAGGACTA

Forward TAGGTGGCAATTTCTTGTGGTG Hnf4a_Sg3 Reverse AAACCACCACAAGAAAGTGCCA

Forward TAGGTGCCGGCCCCTAAGTGCT Hnf4a_Sg4 Reverse AAACACCACTTAGGGGCCGGCA

These oligonucleotides were annealed in vitro by denaturation at 95ºC for 5 min followed by slow cooling at RT. Annealed oligonucleotides (1:10) were further inserted into 100

51

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas ng of pDR274 vector (Addgene plasmid #42250) previously cut with BsaI (Anza, ThermoScientific) and purified with NZYGelpure kit (NZYTech), using T4 Ligase (ThermoScientific), following the manufacturer’s instructions. The ligation product was cloned in MachI bacteria as described above and plated in agar plates with 50 µg/µl kanamycin. Colonies were tested by colony PCR using M13 forward primer (5’- TGTAAAACGACGGCCAGT-3’) and the oligonucleotide for the sgRNA with the TAGG sequence. Colony PCR was performed as 12 µl reaction: 2.5 µl of 10× B2 Buffer, 1.5 µl of 25mM MgCl2, 1 µl dNTPs [10mM A, 10mM U, 10mM C, 10mM G (ThermoScientific)], 1 µl of primers M13 forward/oligonucleotide (10 µM final concentration), 0.2 µl of HOT ® FIREPol DNA Polymerase (Solis BioDyne) and 4.8 µl of H2O (HyClone™ Water, GE Healthcare). PCR conditions were as follows: denaturing 95ºC for 15 minutes; 30 cycles: denaturing 95ºC for 30 s, annealing 56ºC for 1 min, elongation 72ºC for 90 s; 72ºC for 7 min; 4ºC. Positive colonies were grown in LB with 50 µg/µl kanamycin and DNA extracted using a standard NZYMiniprep kit (NZYTech) according to manufacturer’s instructions. Correct cloning was confirmed through sequencing using M13 forward primer. The pDR274 vectors carrying the sgRNA sequences were then cut with HindIII (ThermoScientific) and purified by sephadex and phenol-chloroform as described in section 4.4.2. For the Cas9 mRNA, the vector pCS2-nCas9n (Addgene plasmid #47929) was cut with NotI (ThermoScientific), confirmed in 1% agarose gel electrophoresis, and purified with NZYGelpure kit (NZYTech), according to the manufacturer’s instructions. SgRNA and Cas9 mRNA were synthetized by in vitro transcription according to the procedures described in subheading 4.4.1. using, respectively, Sp6 and T7 RNA polymerases (ThermoScientific). RNA was purified by Sephadex followed by Phenol:Chloroform extraction. RNA integrity was verified by 0.8% agarose gel electrophoresis and quantified by spectrophotometry. Pure RNA was stored at -80ºC.

4.6.2. Injection of sgRNAs into zebrafish embryos

WT embryos (TU) were injected with Cas9-encoding mRNA as a negative control (NC) or co-injected with Cas9-encoding mRNA and sgRNAs targeting ptf1aE3. Different combinations of sgRNAs flanking ptf1aE3 enhancer were used to find the most effective ones to delete the enhancer. Genomic DNA from injected embryos was extracted as described in subheading 4.2. The presence of a deletion of the enhancer was tested by PCR amplification using primers flanking the enhancer region: ptf1aE3 forward (5’- CAGTGCAGAGTAAATTTGTTAACG-3’) and ptf1aE3 reverse (5’

52

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

GGACAGCTAACAGGCTAAAACA-3’), which product was analyzed by electrophoresis in 2% agarose gel. PCR conditions were as follows: initial denaturation at 94°C for 3 min; 35 cycles: denaturing at 94°C for 30 s, annealing at 62ºC for 45 s, elongation at 72°C for 1 min, and a final elongation step at 72°C for 7 min, 4ºC. Similarly, stable transgenic lines carrying the fluorescent reporter gene mCherry in elastase-producing cells were injected with Cas9-encoding mRNA as a NC or co-injected with Cas9-encoding mRNA and sgRNAs targeting hnf4a. Genomic DNA from injected embryos was extracted as previously described. The presence of a mutation in the gene was tested by PCR amplification using oligonucleotides for the sgRNAs with the TAGG sequence, which product was analyzed by electrophoresis in 1% agarose. PCR conditions were as follows: initial denaturation at 95°C for 15 min; 35 cycles: denaturing at 95°C for 30 s, annealing at 62ºC for 45 s, elongation at 72°C for 90 s, and a final elongation step at 72°C for 7 min, 4º C. Sanger sequencing was used to confirm the generated mutation. Likewise, for hnf4a, the most effective sgRNA was selected to disrupt the gene. Embryos RFP-positive in the pancreas, expressing Elastase in this organ, were selected for Fluorescent Activated Cell Sorting (FACS) sorting. WT embryos were used as NC for the cell sorting. Injected embryos were transferred to an eppendorf tube at 8 dpf and euthanized by overdose with tricaine methane sulfonate (MS222, 200-300 mg/l) prolonged immersion and placed on ice. Then all the medium was removed and 300 µl filtered Dissection buffer (HBSS 1x, Hepes 10mM, EDTA 2mM) added. Embryos were centrifuged at 300g for 30 secs. Supernatant was discarded and resuspended in 500 µl Digestion Buffer (Dissection Buffer, Colagenase IV 0.125 mg/ml, TrypLE Select 1x) pre- warmed at 30ºC. Embryos were incubated at 30ºC with shaking at 800 rpm for 30 min with mechanical dissociation every 5 min. Then, the mixture was centrifuged at 300g for 4 min, the supernatant removed, and the pellet resuspended in 500 µl of filtered FACS Buffer (PBS 1x, 0.5% BSA) filtered with 40 µm mesh cell strainer (FALCON) into a 5mL PP flow cytometry tube (FALCON). Elastase-producing cells expressing RFP were sorted by a BD FACSARIA II cytometer and collected into a 1.5mL eppendorf with 500 µl of FACS Buffer. After cell sorting, RNA was extracted from the cells using TRIzol (Invitrogen, ThermoScientific). Cells were centrifuged 10 mins at 300g and the supernatant removed. 500 µl TRIzol (Invitrogen, ThermoScientific) was added to the cells and vortexed 10 min at RT. Next, 100 µl chloroform were added, vortexed and centrifuged at 13300 rpm for 20 min at 4ºC. The upper phase was collected to a new 1.5 ml tube. 300 µl isopropanol (Fisher Chemicals) and 2 µl glycogen 4mg/ml (Sigma- Aldrich) were added, mixed and precipitated at -20ºC ON. The mixture was spin down for 20 min at 4ºC and the pellet washed with 600 µl 70% ethanol (centrifuged 10 min at 13300 rpm). Ethanol was discarded, and the pellet let to dry. Next, DNase I treatment

53

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas was performed using 1 µl DNAse I (ThermoScientific), 1 µl 10x reaction buffer and 0.5 µl NZY Ribonuclease Inhibitor (40 U/µl) at 0.05 µl/µl final concentration. The mixture was incubated 30min at 37ºC. To stop the reaction, 1 µl EDTA 50mM was added per 1 µg of estimated RNA, mixed and placed in ice. The final volume was completed to 60 µl with

H2O HyPure. For the purification and precipitation of RNA, phenol-chloroform extraction was performed as described. RNA was stored at -80ºC.

4.7. Real time quantitative PCR

The transcriptional activity of target genes upon site-directed mutagenesis in the zebrafish adult exocrine pancreas was detected by RT-qPCR. 50-100 ng of total mRNA was used for reverse transcription using random hexamers (NZYTech) and SuperScriptTM II Reverse Transcriptase (Invitrogen) according to the manufacturer’s instructions. Real-time quantitative PCR (qPCR) analysis was performed using iTaqTM Universal SYBR Green Supermix (BIO-RAD) on a Real time PCR3 (CFX Biorad) 96 real- time system. Three technical replicates were used per round of qPCR. Sets of primers (Sigma-Aldrich) targeting the exon junction of specific genes were designed using NCBI/Primer-BLAST software and confirmed with Biotools/Oligo Calc software. Table 3 describes sets of primers used for RT-qPCR. Expression of the gene’s transcripts was normalized to the endogenous control genes encoding β-actin and eef1a1. Results are shown as the mean of the three technical replicates ± S.D.

Table 3. Primers used for RT-qPCR. Target gene Orientation Sequence (5’ - 3’) Amplicon (bp)

Forward GTACTCATGCAGGTTCAACAG hnf4a 228 Reverse CCCGGTGATGAAATTTGTCTAG

Forward CACAGTTACCATGTCACATGTAC hnf1a 236 Reverse CAAACTGCCTGAAGACCCTGG

Forward GCCACCGAGGAACAAGATCC ptf1a 200 Reverse GGGGGGTTCATTCTCAATATTGTTT

Forward CAGGATACGACAGAAGACCAG arid1ab 241 Reverse CTCTTCTGTGTGTAATTCGGTTG

Forward CCGGGAACTAAAGGTTGACG nr5a2 148 Reverse TAGAGCGTAATCCACCTGTTG

Forward AAAAGCACTTCGAAGGAAATTAAAG xbp1 247 Reverse GGACTCCAGTACCTGAACCTG

54

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Forward CCGCTAGCATTACCCTCC eef1a - Reverse CTTCTCAGGCTGACTGTGC

Forward CGTGCTGTCTTCCCATCCA β-actin - Reverse TCACCAACGTACCTGTCTTTCTG

4.8. Statistical Analysis

Assuming that the data is normally distributed, group values expressed as the mean ± S.D, were compared by a two-tailed t-test to test the significance of differences among sample averages. Values expressed as percentages were compared by Chi-square with Yates’ correction test. P-values of less than 0.05 were considered significant.

55

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

5. Results

5.1. Analysis of enhancer activity of predicted cis-regulatory elements in pancreas

To identify CREs with enhancer activity in pancreas, 17 sequences sharing both ATAC and ChIP for H3K27Ac enrichment in adult zebrafish pancreas were selected to perform enhancer assays. These sequences were obtained from the zebrafish genome (Zv9/danRer7), the majority located in the proximity of exocrine pancreas specific genes such as amy2a, cpa1, cpa5, ctrb1, ptf1a and prss1; but also near TSGs altered in PC, namely tp53 and arid1ab (Table 4 and Figure 18). These sequences in the landscape of TSGs have the potential to be putative PC-associated enhancers.

Table 4. List of selected genes and associated cis-regulatory elements.

Selected genes CREs Genomic coordinates (Zv9/danRer7); length

E1 chr19:30,795,899-30,797,023; 1125 bp

E2 chr19:30,852,102-30,853,487; 1386 bp

E2b chr19:30,856,603-30,857,536; 934 bp arid1ab E3 chr19:30,863,340-30,864,350; 1011 bp

E4 chr19:30,884,418-30,885,995; 1578 bp

E5 chr19:30,723,050-30,725,133; 2084 bp

E1 chr2:15,419,317-15,420,860; 1544 bp

amy2a E2 chr2:15,464,763-15,465,604; 842 bp E3 chr2:15,499,605-15,501,245; 1641 bp cpa1 E1 chr25:18,943,446-18,944,999; 1554 bp cpa5 E1 chr25:18,959,212-18,961,149; 1938 bp ctrb1 E1 chr7:36,528,475-36,529,989; 1515 bp

E1 chr2:29,421,617-29,423,323; 1707 bp ptf1a E3 chr2:29,464,044-29,465,920; 1877 bp

E1 chr16:27,328,708-27,331,431; 2724 bp prss1 E2 chr16:27,331,328-27,334,458; 3131 bp tp53 E1 chr5:25,755,139-25,756,275; 1137 bp

56

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

57

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 18. Landscape of selected genes and associated CREs. UCSC Genome Browser on Zebrafish Jul. 2010, Version: (Zv9/danRer7). Open chromatin regions detected with transposase-accessible chromatin using sequencing (blue track) overlap with the epigenetic mark H3K27Ac (black track) in putative enhancer regions of arid1ab (A), amy2a (B), cpa1 (C), cpa5 (D), ctrb1 (E), ptf1a (F), prss1 (G) and tp53 (H).

A ptf1a enhancer previously described in zebrafish (Pashos et al., 2013) was used as a positive control (ptf1aE1), whereas empty Z48 vector was used as NC. Injected embryos were selected for GFP expression in midbrain at 72 hpf, as a positive control of transgenesis (Figure 19).

Figure 19. GFP midbrain expression as an internal control of transgenesis. Brightfield and epifluorescence images of a live zebrafish embryo at 72 hpf, side oriented, anterior right, dorsal top, injected with the Z48 vector showing Z48- mediated GFP expression in the midbrain (green); Magnification ×6.25, zoom ×2.5. Fluorescence images captured with Leica M205 microscope.

The immunohistochemistry in the transgenic embryos at 12 dpf, for the pancreatic markers Amylase and AlcamA, allowed to determine if the enhancer mediated GFP expression was localized in the pancreas. Embryos with GFP expression in this tissue were counted for all the conditions, including the NC, that was considered as the threshold above which we considered as enhancer activity. Regarding these criteria, we started by analyzing the enhancer activity of the predicted CREs of pancreatic genes:

58

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

amylase alpha 2A (amy2a), carboxypeptidase A1 (cpa1), carboxypeptidase A5 (cpa5), chymotrypsinogen B1 (ctrb1), pancreas specific transcription factor 1a (ptf1a) and serine protease 1 (prss1) (Figure 21). For these sequences we observed that 7 out of 10 (70%) CREs had enhancer activity in the pancreas above the threshold defined by the empty Z48 vector, which was set as 7.5% of GFP+ embryos (Figure 20).

1 0 0

**** n = 2 7 **** n = 1 7

8 0

s o

y ****

r n = 3 0 b 6 0 m ****

e n = 2 6 ****

+ n = 2 8

P

F

G

f 4 0 o

****

n = 3 5 %

ns ns 2 0 ns ns n = 2 4 n = 3 0 n = 1 8 n = 1 8 ns ns ns n = 2 4 ns n = 1 7 n = 2 4 n = 1 8 n = 5 3 ns ns ns n = 2 2 n = 2 8 n = 2 0 0

1 3 1 2 3 1 1 1 1 2 1 2 b 3 4 5 1 C E E E E E E E E E E E E 2 E E E E N a a a a a 1 5 1 1 1 b b E b b b 3 1 1 2 2 2 a a b s s a a b a a a 5 f f y y y r s s t t p p t r r 1 1 a 1 1 1 tp p p m m m c c c p p id id 1 id id id a a a r r id r r r a a r a a a a

Figure 20. Overall enhancer activity of the tested sequences. NC – Negative control (Z48 empty vector). Red dashed line corresponds to the threshold defined by the NC. Values are expressed as percentages and compared by Chi-square with Yates’ correction test. P-values of less than 0.05 are considered significant. ****P<0.0001.

We started by analyzing the positive control, ptf1aE1, previously described as being an enhancer with activity in pancreas (Pashos et al., 2013), and we observed GFP expression in this tissue in 63.3% (n=30) of the transgenic embryos. Regarding the other CRE of ptf1a, which has not been explored previously, we observed that the ptf1aE3 sequence showed a striking activity in the pancreas, driving GFP expression in 85.2% (n=27) of the embryos. Also, ctrb1E1 had a strong activity in pancreas, with 82.4% (n=17) of the embryos showing GFP expression. In the case of amy2a, both amy2aE1 and amy2aE2 were able to drive GFP expression in the pancreas, in about 11% (n=18 for each) of the embryos, while the amy2aE3 showed enhancer activity in 53.8% (n=26) of the injected embryos. Cpa1E1 showed GFP expression in only 5.9% (n=17) of the

59

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas embryos, below the threshold established by the NC, whereas cpa5 putative enhancer cpa5E1 elicited GFP pancreatic expression in 16.7% (n=24) of the embryos. Curiously, any of the putative enhancers of prss1 seemed to have activity in pancreas, the prss1E1 did not drive any GFP expression in the analyzed embryos (n=22), and the prss1E2 only showed GFP expression in 4.2% (n=24) of embryos, below the NC threshold.

60

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Figure 21. Enhancer assays for CREs of pancreatic genes. Immunohistochemistry of transgenic zebrafish embryos (10-12 dpf) injected with the NC (Z48 empty vector), positive control (ptf1aE1), and ptf1aE3, amy2aE1, amy2aE2, amy2aE3, cpa5E1 and ctrb1E1 enhancers. The pancreatic field is defined by the Zn-8 staining (yellow dashed line). The endocrine pancreas domain is defined by the absence of Zn-8 and Amylase staining (magenta dashed line). All samples were stained with DAPI (blue), anti-Amylase (red) and Zn-8 (white). Images were captured with a Leica SP5II confocal microscope.

In addition, we analyzed the activity of CREs in the landscape of TSGs: AT rich interactive domain 1Ab (arid1ab) and tumour protein p53 (tp53) (Figure 22). The majority of these putative CREs showed enhancer activity in the pancreas, above the NC threshold (Figure 20). For arid1ab putative enhancers, three of them showed enhancer activity, whereas the other three did not. Arid1abE1 injected embryos had GFP expression in 16.7% (n=30) of the cases, arid1abE2 in 8.3% (n=24) of the embryos, and arid1abE2b was the enhancer with higher activity for this genomic landscape, with pancreatic GFP expression in 50.0% (n=28) of the embryos. Contrarily, neither arid1abE3 nor arid1abE4 showed any GFP expression in the pancreas (n=28 and n=20, respectively, and the arid1abE5 drove GFP expression in only 5.6% (n=18) of the embryos, below the threshold. Tp53 enhancer, tp53E1, was able to drive GFP expression in 31.3% (n=35) of injected embryos.

61

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 22. Enhancer assays for CREs of TSGs. Immunohistochemistry of transgenic zebrafish embryos (10-12 dpf) injected with the arid1abE1, arid1abE2, arid1abE2b and tp53E1 enhancers. The pancreatic field is defined by the Zn-8 staining (yellow dashed line). The endocrine pancreas domain is defined by the absence of Zn-8 and Amylase staining (red dashed line). All samples were stained with DAPI (blue), anti-Amylase (red) and Zn-8 (white). Images were captured with a Leica SP5II confocal microscope.

In total, we have identified 11 out of 17 (64.7%) as having enhancer activity in the pancreas: aridabE1, arid1abE2, arid1abE2b, amy2aE1, amy2aE2, amy2aE3, cpa5E1, ctrb1E1, ptf1aE1, ptf1aE3 and tp53E1 (Figure 20). It was possible to observe GFP expression in domains outside of the pancreas in some of the embryos (e.g.: kidney and gut), however our analysis was focused on the pancreatic field. We have also observed that some of the tested sequences were able to significantly drive expression of the in vivo reporter gene in the endocrine pancreas (Figure 23), being the more significant the enhancers arid1abE1, arid1aE2b, amy2aE3, ctrb1E1, ptf1aE1 and tp53E1.

62

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

100 E x o c r in e

**** E n d o c r in e 80

**** s

o **** y

r 60 ****

b

m

e

f *** o

40 ****

% **

20 ** ** * * *

0

1 3 1 2 3 1 1 1 1 2 1 2 b 3 4 5 1 C E E E E E E E E E E E E 2 E E E E a a a a a 1 5 1 1 1 b b E b b b 3 N 1 1 2 2 2 a a s s a a b a a a 5 f f y y y b t t p p tr s s 1 1 a 1 1 1 p p p c c r r d d 1 d d d t m m m c p p i i i i i a a a r r id r r r a a r a a a a

Figure 23. Enhancer activity within the pancreatic primordium. NC – Negative control (Z48 empty vector). Values are expressed as percentages and compared by Chi-square with Yates’ correction test. P-values of less than 0.05 are considered significant. ****P<0.0001, ***P<0.001, **P<0.01, *P<0.05.

To evaluate if the GFP expression in the endocrine pancreas was linked with the accessibility of the chromatin, we decided to see if these sequences had ATAC signal for the endocrine pancreas. Our group have performed ATAC-seq for dissected islets (unpublished) and this data was used for this estimation. We observed that the enhancers of the TSGs arid1aE1, arid1E2b and tp53E1 had ATAC signal in the endocrine pancreas. However, the enhancers from the pancreatic genes amy2aE3, ctrb1E1 and ptf1aE1 did not have ATAC signal in endocrine pancreas (Table 5).

Table 5. Percentage of tissue-specific expression in the pancreas. ATAC-seq signal in the endocrine pancreas. Tissue-specific enhancer activity is denoted as percentages.

ATAC signal Enhancer % exocrine expression % endocrine expression n endocrine

No ptf1aE1 63,33 20,00 30 No ptf1aE3 85,19 3,70 27 No amy2aE1 5,56 5,56 18

Yes amy2aE2 11,11 0,00 18 No amy2aE3 57,69 19,23 26

No cpa1E1 5,88 0,00 17

63

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

No cpa5E1 16,67 0,00 24

No ctrb1E1 76,47 11,76 17 No prss1E1 0,00 0,00 22 No prss1E2 4,17 0,00 24 Yes arid1aE1 16,67 13,33 30

Yes arid1aE2 8,33 4,17 24 Yes arid1aE2b 42,86 35,71 28 Yes arid1aE3 0,00 0,00 28 Yes arid1aE4 0,00 0,00 20 Yes arid1aE5 5,56 0,00 18

Yes tp53E1 34,29 11,43 35

NC 7,55 0,00 53

In order to validate the results obtained from the transient enhancer assays and better define the associated expression patterns, the enhancer with higher levels of activity, namely ptf1aE3, was selected to generate stable transgenic lines. This sequence was recombined into the ZED vector (Bessa et al., 2009) and stable transgenic animals were generated. F1 embryos resulting from founders were grown and fixed for immunohistochemistry at 12 dpf. An intense GFP expression was observed throughout the pancreas, validating ptf1aE3 as a pancreatic enhancer active in exocrine pancreas, as observed in the transient enhancer assays, showing also some expression in the lenses (Figure 24).

Figure 24. Ptf1aE3 expression pattern in a stable transgenic line. Expression patterns shown by a generated transgenic line (F1) through the injection of the ZED:ptf1aE3 construct. Lateral view of darkfield and epifluorescent channels are shown. Larvae at 8 dpf lateral to the right. Images captured with Leica M205 microscope.

In order to analyze the cell-specific expression of ptf1aE3 in a stable transgenic line, immunohistochemistry for cell-type specific pancreatic markers was performed. We validated Nkx6.1 as a marker of pancreatic progenitor cells and differentiated ductal cells. At 48 hpf Nkx6.1 is a molecular marker of pancreatic progenitor cells while at later

64

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

stages of development functions as a molecular marker of differentiated ductal cells (Figure 25). Depending on the embryo developmental stage, Nkx6.1 help us distinguish between different cell types. As a marker for acinar cells we used Amylase or Elastase, proteins specifically expressed in acini. We took advantage of the tg(ela:mCherry) transgenic zebrafish existent in our lab, expressing the mCherry fluorescent marker under the promoter of elastase, to further assess enhancer activity in acinar cells. To see the enhancer activity in endocrine cells we used another transgenic zebrafish tg(sst:mCherry), carrying a mCherry in vivo reporter of Somatostatin as a molecular marker of the endocrine pancreas, where mCherry is expressed in -cells under the promoter of sst gene.

Figure 25. Nkx6.1 is a molecular marker for both progenitor and differentiated ductal cells. Immunohistochemistry of zebrafish embryos where we can observe Nkx6.1 (white), sst:mCherry (red) and DAPI (blue) at 48 hpf in a stable tg(sst:mCherry) line (left panel) and Nkx6.1 (white), Amylase (red) and DAPI (blue) in WT embryos at 10 dpf (right panel). Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×4.00 and x40, zoom x2.00, respectively.

From the outcross of the stable transgenic tg(ZED:ptf1aE3) with tg(ela:mCherry) and tg(sst:mCherry) lines we obtained embryos where we could evaluate the enhancer activity in acinar and endocrine cells, respectively. Ptf1aE3 displayed a broad expression of GFP throughout the pancreatic field (Figure 26). The enhancer demonstrated activity in differentiated acinar cells, depicted by the co-localization of GFP with ela:mCherry (Figure 26A), confirming the results obtained with transient enhancer assays (Figure 21). We observed co-localized expression of GFP with Nkx6.1 at 48 hpf and at 11 dpf, showing Ptf1E3 enhancer activity in progenitor pancreatic cells and differentiated ductal cells, respectively (Figure 26B and C). In the embryos obtained by the outcross of

65

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas tg(ZED:ptf1aE3) with tg(sst:mCherry), at 12 dpf it was not visible any GFP expression within the endocrine pancreas domain defined by sst:mCherry, suggesting that this enhancer is not active in these cells (Figure 26D).

Figure 26. Cell type-specific expression pattern driven by the tg(ZED:ptf1aE3) line. Immunohistochemistry of tg(ZED:ptf1aE3) zebrafish embryos showing cell-specific enhancer activity. (A) Enhancer activity in differentiated acinar cells. Tg(ZED:ptf1aE3) line was outcrossed with tg(ela:mCherry) line and embryos stained with Zn-8 (white) and DAPI (blue). (B) Enhancer activity in pancreatic progenitor cells (yellow dashed line). Tg(ZED:ptf1aE3) line was outcrossed with tg(sst:mCherry) line and embryos stained for Nkx6.1 (white) and DAPI (blue). (C) Enhancer activity in differentiated ductal cells. Tg(ZED:ptf1aE3) line was outcrossed with tg(ela:mCherry) line and embryos stained for Nkx6.1 (white) and DAPI (blue). (D) Absence of enhancer activity in endocrine cells. Somatostatin cells define the endocrine domain (magenta dashed line). Tg(ZED:ptf1aE3) line was outcrossed with tg(sst:mCherry) line and embryos stained with Zn-8 (white) and DAPI (blue). Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×1.00 for A, B, C and zoom x2.00 for D.

66

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

5.2. Comparative analysis of the distal enhancers of ptf1a in human and zebrafish

From recent Hi-ChIP results obtained by our laboratory (unpublished), we observed that the ptf1aE3 enhancer establishes direct interaction with the ptf1a promoter (Figure 27A). Therefore, analyzing the landscape of ptf1a in zebrafish, we observe that the newly identified enhancer ptf1aE3 (Figures 21, 24, 26) is a putative distal enhancer of this gene, located 35 kb away from it (Figure 27A). Comparatively, it has been previously described in human, a sequence located 25 kb downstream of PTF1A (Figure 27B and Figure 28) acting as a developmental enhancer of PTF1A (Weedon et al., 2014). Moreover, Weedon and colleagues (Weedon et al., 2014) demonstrated that the deletion of this region causes loss of PTF1A expression with consequent pancreatic agenesis in humans.

Figure 27. Zebrafish and human distal ptf1a enhancers. (A) Screenshot from UCSC Genome Browser on Zebrafish Jul. 2010, Version: (Zv9/danRer7) showing ptf1aE3 enhancer location and epigenomic landscape. E3 is interacting with the ptf1a promoter (4C data), show an enrichment for the histone mark H3K27Ac (ChIP-seq data) and is in an open chromatin region (ATAC-seq data). The sequence is conserved in fish and among other vertebrates in some extent (bottom tracks). (B) Screenshot from UCSC Genome Browser on Human Dec. 2013 (GRCh38/hg38) showing hPTF1A enhancer location and conservation between vertebrates and mammals.

67

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 28. Human landscape of PTF1A. hPtf1aE enhancer is distally located 25 kb downstream of PTF1A. ChIP-seq data for the enhancer mark H3K4me1 is annotated. Blue and green tracks show occupancy for FOXA2 and PDX1 TFs, respectively. Both TFs are essential for pancreatic development (Palanker et al., 2006). This element is highly conserved between vertebrates. Illustrated from Weedon et al., 2013.

5.2.1. Cell-specific expression pattern of hPtf1aE and ptf1aE3

The human and zebrafish landscapes around this region look very similar, so we hypothesized whether these enhancers could be functional orthologues between the 2 species, despite of the lack of sequence conservation between zebrafish and humans for the two sequences. To address this question, we performed enhancer assays for this sequence using Z48 vector. The human enhancer has shown enhancer activity in the pancreas in 60% of the embryos analyzed (n=42) (Figure 29).

Figure 29. hPtf1aE has enhancer activity in pancreas. Representative confocal projection of a transgenic zebrafish embryo (11 dpf) injected with the human PTF1A enhancer. The pancreatic field is defined by the Zn-8 staining (yellow dashed line). All samples were stained with DAPI (blue), anti-Amylase (red) and Zn-8 (white). Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×1.00.

To explore whether the human and the zebrafish enhancers are functionally equivalent we evaluated whether they could drive the same cell-type specific expression pattern in transient reporter assays. We observed that both enhancers are active in differentiated acinar cells (Figure 30) at 11 dpf, since the human enhancer drove an expression pattern

68

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

co-localizing with Amylase in 56% of the embryos (n=18) and the zebrafish enhancer in 85% of the embryos (n=27). We saw co-localization of GFP expression with Nkx6.1 at 48 hpf (Figure 31) in both enhancers: the human enhancer in 86% of the embryos (n=18) and the zebrafish enhancer in 64% of the embryos (n=16), suggesting that both enhancers are active in pancreatic progenitor cells. It was also observed that both enhancers are active in differentiated ductal cells (Figure 32) at 11 dpf, as the human enhancer drove an expression pattern co-localizing with Nkx6.1 in 58% of the embryos (n=12), and the zebrafish enhancer in 67% of the embryos (n=13). Lastly, we observed that the human enhancer hPTF1E is more active in endocrine cells, being able to drive an expression pattern in the endocrine primordium in 69% of the embryos (n=32) (Figure 33), contrariwise to the zebrafish ptf1aE3 in which endocrine expression was detected in only 3.70% of the embryos. Although we have seen some enhancer activity for ptf1aE3 in F0s endocrine cells, we did not observe expression in the endocrine pancreas in the stable transgenic line (Figure 26D).

Figure 30. Comparison of expression patterns driven by the human and zebrafish ptf1a enhancers in differentiated exocrine acinar cells. Immunohistochemistry of transgenic zebrafish embryos (11 dpf) injected with the human hPtf1a (upper panel) and zebrafish ptf1aE3 (lower panel) enhancers. The pancreatic field is defined by the Zn-8 staining. All samples were stained with DAPI (blue), anti-Amylase (red) and Zn-8 (white). Orange arrows indicate co- localization. Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×4.00.

69

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 31. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in pancreatic progenitor cells at 48 hpf. Immunohistochemistry of transgenic zebrafish embryos (48 hpf) injected with the human hPtf1a (upper panel) and zebrafish ptf1aE3 (lower panel) enhancers. The endocrine pancreatic field is defined by the Somatostatin domain from a stable transgenic line tg(sst:mCherry). All samples were stained with DAPI (blue) and for Nkx6.1 (white). Orange arrows indicate co-localization. Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×4.00.

Figure 32. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in differentiated exocrine ductal cells. Immunohistochemistry of transgenic zebrafish embryos (11 dpf) injected with the human hPtf1a (upper panel) and zebrafish ptf1aE3 (lower panel) enhancers. All samples were stained with DAPI (blue) and for Nkx6.1 (white). Orange arrows indicate co-localization. Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×2.00 and x40, zoom x3.00, respectively.

70

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Figure 33. Comparison of expression patters driven by the human and zebrafish ptf1a enhancers in endocrine cells. Immunohistochemistry of transgenic zebrafish embryos (11 dpf) injected with the human hPtf1a (upper panel) and zebrafish ptf1aE3 (lower panel) enhancers. The endocrine domain (red dashed line) is defined by the absence of Zn-8 and Amylase staining. All samples were stained with DAPI (blue) and Zn-8 (white). Images were captured with a Leica SP5II confocal microscope using a magnification ×40, zoom ×1.00 and x40, zoom x2.00, respectively.

5.2.2. Deletion of ptf1aE3 in zebrafish

To explore the functional role of ptf1aE3 and whether the enhancer is necessary for the transcriptional regulation of ptf1a in the pancreas, we aimed to generate mutant embryos by CRISPR, through different combinations of sgRNAs to delete the enhancer. We explored the phenotypes associated with these embryos by confocal microscopy and PCR amplification followed by gel electrophoresis. Embryos injected with Cas9 mRNA plus sgRNAs targeting ptf1aE3 show a decrease in the pancreatic area when compared with the controls injected with Cas9 mRNA (Figure 34) resulting in defective pancreas organogenesis (Figure 35).

Pancreatic area

20000

15000 **** 2

m 10000 µ

5000

0 Cas9 mRNA ptf1aE3 sgRNAs

Figure 34. Pancreatic area is reduced in ptf1aE3 mutant embryos. Cas9 mRNA (n=22) and ptf1aE3 Sg3+Sg4 (n=26) group values are expressed as the mean ± SD and compared by Student’s t-test. P-values of less than 0.05 are considered significant. ****P<0.0001.

71

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 35. Phenotype displayed by mutant embryos for Ptf1aE3. Immunohistochemistry of mosaic mutant zebrafish embryos (11 dpf) injected with Cas9 mRNA or co-injected with Cas9 mRNA plus two flanking sgRNAs (Sg3+Sg4) targeting Ptf1aE3. All samples were stained with DAPI (blue), anti-Amylase (red) and Zn-8 (white). Fluorescent samples captured with Leica SP5II confocal microscope with magnification ×40, zoom ×1.00.

After DNA extraction from control embryos injected with Cas9 mRNA and embryos injected with Cas9 mRNA plus a cocktail of sgRNAs, enhancer amplification by PCR and gel electrophoresis we could not identify any change in the enhancer size nor the presence of additional bands in none of the different injections conditions (Figure 36). These data suggest that, despite the very promising result observed by confocal microscopy, we were not able to demonstrate molecularly the deletion of the enhancer.

Figure 36. Ptf1aE3 gel electrophoresis after CRISPR-Cas9. 2% agarose gel electrophoresis after PCR amplification of Ptf1aE3 from injected embryos DNA. NC – Embryos injected with Cas9 mRNA (control). Ptf1aE3 is 1828 bp length.

72

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

5.3. Analysis of the impact of hnf4a knockdown

One of the TF binding sites enriched in the detected ATAC and H3K27Ac ChIP-seq positive regions is HNF4a, including the regions confirmed in this study as having enhancer activity in pancreas (data not shown). Therefore, we wanted to evaluate the importance of this TF in the regulation of expression of pancreatic genes. To achieve this, we targeted hnf4a to assess the phenotypical and transcriptional impact on downstream target genes. The presence of the CRISPR-generated mutation was confirmed by PCR amplification followed by Sanger sequencing (Figure 37). Mosaic hnf4a knockout embryos were generated by co-injecting sgRNA and Cas9 mRNA in tg(ela:mCherry) (Figure 40A) embryos. Phenotypically we did not observe a significant change in the pancreatic area (Figure 38), but we did consistently observe a loss of mCherry intensity in the pancreas (Figure 39A and B).

Figure 37. Chromatograph of the CRISPR-generated mutation in the hnf4a gene. Mutation mediated by the Hnf4_Sg4. Screenshot from Sanger sequencing raw data.

P a n c re a tic a r e a

2 5 0 0 0 ns

2 0 0 0 0

1 5 0 0 0

² m

µ 1 0 0 0 0

5 0 0 0

0 C a s 9 m R N A H n f4 a s g R N A

Figure 38. Total pancreatic area is not affected in hnf4a knockout embryos. Cas9 mRNA (n=13) and hnf4a sgRNA (n=14) group values are expressed as the mean ± SD and compared by Student’s t-test. P-values of less than 0.05 are considered significant.

73

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Figure 39. Elastase:mCherry signal is downregulated in hnf4a knockout embryos. (A) Cas9 mRNA (n=13) and hnf4a sgRNA (n=14) group values are expressed as the mean ± SD and compared by Student’s t-test. P-values of less than 0.05 are considered significant. *P<0.05. (B) - Pancreas of 8 dpf hnf4a KO embryo and control (Cas9 mRNA) from a stable transgenic line tg(ela:mCherry), side oriented, anterior right, dorsal top. Magnification ×6,25, zoom ×3,00. Fluorescence images captured with Leica M205 microscope.

At 8 dpf ela:mCherry positive cells from the Cas9 mRNA injection control and from sgRNA targeting hnf4a injected embryos were sorted by (FACS) (Figure 40B) and the RNA extracted to perform RT-qPCR for downstream target genes, including ptf1a, nuclear receptor subfamily 5, group A, member 2 (nr5a2) and X-box binding protein 1 (xbp1).

Figure 40. Elastase is a molecular marker of the exocrine pancreas. (A) Brightfield and epifluorescence images of a live zebrafish embryo at 8 dpf from a stable transgenic line tg(ela:mCherry), side oriented, anterior right, dorsal top. Fluorescence images captured with Leica M205 stereoscope, magnification ×6,25, zoom ×2,5. (B) FACS for mCherry positive cells. WT control population on top. Fluorescent sorting of elastase:mCherry population on the bottom.

74

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

At the transcriptional level we observed that hnf4a disruption resulted in a downregulation of hnf4a levels as well as downstream target genes including nr5a2 and a strikingly reduction in ptf1a expression in the pancreas. Xbp1, a known downstream direct target of hnf4a seems to be unaffected (Figure 41).

Figure 41. Hnf4a downstream target genes expression levels are affected upon hnf4a disruption. Expression levels were assessed by RT-qPCR using 8 dpf zebrafish Elastase-positive cells RNA from tg(ela:mCherry) embryos injected with Cas9 mRNA (n>300) or Cas9 plus sgRNA (n>300). Hnf4a, nr5a2, ptf1a and xbp1 expression levels were normalized to the endogenous control genes β-actin and eef1a1 expression levels. *P<0.05, compared with control.

75

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

6. Discussion

In this study we were able to demonstrate the reliability of the predictions of active enhancers from ATAC and ChIP-seq for H3K27Ac combined, by using appropriate tools to biologically test enhancer activity in vivo. From a set of 17 putative enhancer elements in the differentiated adult zebrafish pancreas (Table 4) identified through the enhancer mark H3K27Ac and ATAC-seq for global enhancer discovery (Figure 18), we have uncovered 11 active pancreatic enhancers (64.7%) with expression patterns above background levels. A group of these selected putative enhancers were located near exocrine pancreas specific genes and we have identified 7 out of 10 (70%) as having enhancer activity in the pancreas (Figure 21). One of them, Ptf1aE1, has been previously described as an enhancer active in pancreas in zebrafish embryos, and we used it as a positive control of our assays. With these results, we validated our pancreas-specific data and both epigenetic marks and open chromatin as good tools to predict CREs. The previously unexplored ptf1aE3 revealed a striking expression in the exocrine pancreas. Contrarily to what has been observed by Pashos et al., 2013, who have tested for enhancer activity non-coding sequences ranging in size from 3 to 13 kb upstream and downstream of ptf1a, and did not observe reporter gene expression in the pancreas for this region. Instead, enhancer activity of sequences located near the enhancer identified in this study was observed in the hindbrain, spinal cord and notochord (Pashos et al., 2013). One possible explanation for this discrepancy is that the sequences tested by Pashos et al., 2013 may include binding sites for repressive TFs that ablate the enhancer activity of these non-coding sequences specifically in the pancreas when isolated from their genomic context. Regarding the other pancreatic enhancers, this is the first study to have identified 5 enhancers active in the zebrafish pancreas: three enhancers in the landscape of amy2a, one enhancer of cpa5 and one enhancer of ctrb1. Hence, all these enhancers may contribute to the regulation of the expression of these pancreas-specific genes in differentiated cells. The remaining tested sequences had little activity in the pancreas, namely an enhancer of cpa1 and two prss1 enhancers. Curiously, prss1 and cpa1 are genes highly expressed in the acini, though we could not identify any enhancer nearby their genomic location. One possibility is that the transcription may be mediated by the promoter alone, or their enhancers are located more distally. Likewise, we observed that prss1E1 did not have any activity, being much below the NC threshold. This may indicate these sequences could act as a silencers, negatively controlling the expression of the reporter gene. One way to address this hypothesis would be cloning the sequences in a construct containing an ubiquitous promoter or the elastase promoter

76

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

and quantitatively evaluate the in vivo reporter fluorescence upon injecting the elastase promoter alone and the construct with the sequences upstream of the promoter in zebrafish.

For the remaining putative enhancers, located near TSGs, we have identified 4 out of 7 (57%) as having enhancer activity in the pancreas (Figure 22). Three arid1ab and one tp53 enhancers were uncovered in this study. As these genes are altered in PC (Weissmueller et al., 2014; Gan et al., 2018), their loss-of-function in pathological situations may arise from the disruption of the elements responsible for their transcriptional regulation, namely enhancers. Therefore, the identification of disease- associated enhancers and their functional validation as regulatory elements of TSGs may add important information regarding the development of oncogenic diseases in a tissue-specific manner. Overall, our adult zebrafish pancreas-specific data constitutes a strong and reliable information for the discovery of putative disease-associated enhancers.

We have also detected that some of the tested putative enhancers were able to drive GFP expression in the endocrine pancreas besides the exocrine field (Figure 23 and Table 5). This was expected since the H3K27Ac ChIP-seq and ATAC-seq data used for the selection of pancreas-specific CREs is from whole zebrafish adult pancreas. Thus, although there is an overrepresentation of acinar and ductal cells in the pancreas, islets of Langerhans are also included in the tissue, so it was expectable to uncover endocrine enhancers. Recent endocrine-specific ATAC-seq data from our laboratory (unpublished) justify why some of our enhancers have endocrine pancreas activity namely arid1aE1, arid1aE2b and tp53E1 (Fig5). It is interesting to note that all the TSGs-associated enhancers have both ATAC signal for the endocrine and exocrine pancreas. This finding demonstrate that these enhancers might regulate the expression of these TSGs in the whole pancreas, which is expected since TSGs should be active in most tissues. We also observed that three enhancers of exocrine-specific genes (amy2aE3, ctrb1E1 and ptf1aE1) had ectopic expression in the endocrine pancreas (19.2%, 11.76% and 20% of GFP expression, respectively). This unexpected finding might be justified by the nature of transient reporter assays using the Z48 vector once the transposon is being randomly integrated in the genome and is not protected from position effects. Therefore, the minimal promoter may be non-specifically activated to drive reporter gene expression in tissues were the enhancer may not be active. Alternatively, these results might result from isolating these sequences from their genomic context, where silencers might be operating. Importantly, the NC does not show such position effect.

77

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

We have shown that transient enhancer assays with the Z48 vector system is a strong and reliable tool to access enhancer activity in zebrafish. As the Z48 vector carries GFP reporter gene and an endogenous zebrafish enhancer Z48, it has an internal control of transgenesis (Figure 19), allowing the detection of enhancer activity in specific tissues through visual examination of expression patterns in a mosaic manner. However, the Z48 vector has some disadvantages: the background resulting from the cis-regulatory activity of genomic regions surrounding the transposon insertion site, also known as position effect (Chung, Whiteley and Felsenfeld, 1993) and the mosaicism caused both by late integration of the transposon in the genome and irregular distribution of the DNA in the embryo (Stuart, McMurray and Westerfield, 1988; Stuart et al., 1990). Focusing on ptf1aE3, we observed that the enhancer was able to drive expression in the endocrine pancreas in 3.70% of the embryos, which is not expected since ptf1a is not expressed in islets. To overcome this issue, we exploited the advantages of the ZED vector, specially the minimization of position effects (Bessa et al., 2009). The Zebrafish Enhancer Detector (ZED) vector is a novel improved Tol2 transposon-based vector used to assess enhancer activity in zebrafish. We used this vector, carrying a sensitive and specific minimal promoter chosen for optimal enhancer activity detection, insulator sequences that shield the minimal promoter from position effects, thereby increasing the specificity of the signal, and a positive control of transgenesis (Bessa et al., 2009), to generate a stable transgenic line for the most active pancreatic enhancer (ptf1aE3) (Figure 24).

Ptf1a is a very important TF necessary for development and specification of the pancreas (Sellick et al., 2004; Dong et al., 2008; Jusuf and Harris, 2009; Hesselson, Anderson and Stainier, 2011). Adult ptf1aE3 transgenic zebrafish offspring was screened and we have identified ptf1aE3 as a pancreas-specific enhancer with some ectopic expression in the lenses, which has not been described for ptf1a in zebrafish. In a pancreas cell-type specific assay we have observed that ptf1aE3 enhancer is active in differentiated exocrine acinar and duct cells (Figure 26A,C), as expected since Ptf1a is known to be broadly expressed in the exocrine pancreas (Lin et al., 2004; Zecchin et al., 2004). Additionally, we also observed activity of ptf1aE3 enhancer in the pancreatic progenitors surrounding the cluster of endocrine cells of the principal islet, but not in endocrine cells (Figure 26B,D), which also is in accordance with the previously described expression of ptf1a in the multipotent pancreatic progenitors (Kawaguchi et al., 2002; Hald et al., 2008) and elucidates the minor expression that was observed in transient assays in the endocrine pancreas. In summary, the pattern of expression of ptf1a gene in pancreatic cells, including an ectopic expression in the lenses, concomitant with the pattern of expression of ptf1aE3, suggests that the enhancer was able to reproduce the

78

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

endogenous activity of the predicted target gene, mainly the exocrine-specific expression. Therefore, although transient transgenic assays with the Z48 vector allow for high-throughput screens, the establishment of stable transgenic lines with the ZED vector is important, since allows a more clear and detailed description of expression patterns. However, these results were not enough to demonstrate that this enhancer is in fact regulating the expression of this gene. To check this possibility, we analyzed recent adult-specific Hi-ChIP data from our laboratory (Figure 27A), where we observed that the ptf1aE3 enhancer establishes direct interactions with the ptf1a promoter, supporting the hypothesis of ptf1aE3 being an enhancer regulating the expression of ptf1a in zebrafish pancreas.

We observed that ptf1aE3 is a distal enhancer in the landscape of ptf1a located 35kb downstream of ptf1a promoter. Interestingly, it has been identified in human cell lines, by Weedon et al (2014), a putative enhancer 25kb downstream of PTF1A in a highly conserved region (Figure 28). We hypothesized that, due to the similarities in the landscape of this gene in both vertebrates (Figure 27A-B), these two CREs could be functional orthologs. First, we observed that this human sequence had enhancer activity in the pancreas (Figure 29), demonstrating that both enhancers are functional in zebrafish. Furthermore, to answer if the two enhancers are functionally equivalent we wondered if they could drive the same cell-type specific expression pattern in the pancreas. Through transient reporter assays in zebrafish, we observed that both enhancers were very active in progenitor cells, differentiated exocrine acinar and duct cells but not in endocrine cells (Figures 30-33). The human enhancer hPtf1aE was able to drive reporter gene expression in the endocrine domain, but in a stable transgenic line we observed that the zebrafish enhancer Ptf1aE3 did not. This difference in the endocrine enhancer activity could be related with the fact that we are testing a human sequence in zebrafish. Nevertheless, ptf1a is not expressed in the endocrine pancreas, supporting the hypothesis of being a trans-species problem. Weedon et al., 2013, described this region as a human developmental enhancer specifically active in pancreatic embryonic progenitors. In the present study, we observed that besides its expression in progenitor cells, the hPtf1aE enhancer is also active in differentiated cell types, in zebrafish, as we observed for Ptf1aE3. Taken together, these data suggest that, despite both hPtf1aE and Ptf1aE3 lack sequence conservation, the molecular machinery operating through these enhancers specifically in the pancreas is similar and functional in zebrafish. These findings are in agreement with what has been already described by Fisher et al., 2006, in which functional conservation between human and zebrafish is possible without . Moreover, conserved genes are predicted to

79

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas regulate zebrafish and human pancreas development in a similar way, since there are already many evidences that orthologous signaling pathways and TFs regulate pancreas development in these vertebrates (Yee and Pack, 2005). Hence, these findings validate zebrafish as a powerful model organism to dissect and understand the mechanisms of human gene regulation and development in the pancreas.

The data obtained in this study are quite promising in supporting the hypothesis that the same principle, of functional homology without sequence conservation, could be applied to enhancers. However, although human and zebrafish ptf1a enhancers drive expression in similar regions, suggesting that they might work as functional orthologous enhancers, to further demonstrate this hypothesis, it is necessary to ablate the ptf1aE3 enhancer and see whether in zebrafish occurs a similar phenotype to the observed in humans upon deletion of hPtf1aE, causing loss of PTF1A expression with consequent pancreas agenesis (Weedon et al., 2014). In order to validate our hypothesis, we used CRISPR- Cas9 system with two sgRNAs flanking the ptf1aE3 enhancer to promote its deletion. We observed the reduction in the pancreatic area, and pancreas organogenesis seemed to be affected, comparing with the embryos injected only with Cas9 mRNA (Figure 34- 35). These results were very promising, as we expect to see in pancreas agenesis, or at least reduction of pancreas size due to the reduction in the levels of ptf1a, upon deletion of the enhancer. Unfortunately, we could not detect molecularly the mutations associated with the phenotype through agarose gel electrophoresis nor DNA sequencing (Figure 36). One hypothesis might be the efficiency of the double strand breaks being very low. Therefore, we cannot assert that the observed phenotype of defective pancreas organogenesis, was a consequence of the enhancer knockdown and not an injection side-effect, even though we did not observe any impact on the control embryos injected in the same conditions.

Uncovering specific TFs that might be determinant for the expression of pancreatic genes, with possible important consequences in pancreas organogenesis and development, would provide insights into the molecular basis for proper pancreatic function and characterization of pancreatic diseases. The uncovered putative active enhancers are enriched with motifs of HNF4a binding, including ptf1a enhancers (data now shown). Therefore, we wanted to see what the impact of the disruption of this pancreatic TF was, phenotypically and transcriptionally. By CRISPR-Cas9 targeting hnf4a (Figure 37) we downregulated this gene and we observed a mild phenotype associated with the reduction of the intensity of fluorescence from elastase:mCherry reporter gene (Figure 39A-B) but not in the total pancreatic area (Figure 38) suggesting that somehow HNF4 might be regulating the elastase promoter function. However, by

80

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

CRISPR-Cas9 we did not obtain a complete deletion of hnf4a. For a complete loss-of- function, mosaic F0 adult animals would have to be outcrossed to generate heterozygotes that would be raised and then in-crossed to generate homozygous mutants. From the results obtained with mosaic mutant embryos, upon ablating hnf4a we observed a downregulation of the expression of downstream target genes of hnf4a (Figure 41), one of them nr5a2 (Nissim et al., 2016) and a striking reduction in ptf1a expression levels, which was expected since most of ptf1a enhancers are enriched with motifs of HNF4 binding (unpublished data). Moreover, we observed a small but not significant reduction in xbp1 expression levels, which may indicate that this described downstream target gene (Moore et al., 2016) is not entirely dependent on hnf4a- mediated trans-regulation. However, these results are from two biological replicates, therefore more replicates are required to address if they are consistent. Nevertheless, these findings suggest that HNF4a may act as a master regulator of pancreas through interaction with the CREs of many pancreatic genes including ptf1a.

81

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

7. Conclusions

In this master thesis we aimed to uncover several cis-regulatory elements active in the zebrafish pancreas. Therefore, we proposed to study this CREs in vivo through enhancer reporter assays and loss-of-function techniques using zebrafish as a model.

The main conclusions of this thesis are:

- Most of the predicted pancreatic active enhancers (11/17) tested in this study show enhancer activity in pancreas.

- Our pool of validated enhancers shows that our H3K27Ac and ATAC data are strong and reliable for the discovery of new disease-associated enhancers.

- We have identified and validated through reporter assays a novel distal enhancer (ptf1aE3) in the landscape of ptf1a located 35kb downstream of zebrafish ptf1a promoter active in acinar, ductal and multipotent pancreatic progenitor cells.

- The previously described human distal PTF1A enhancer seems to be functionally equivalent to the uncovered ptf1aE3 enhancer from the zebrafish ptf1a regulatory landscape. This finding highlights zebrafish potential as a model to identify functional orthologs between vertebrates.

- An alteration in pancreas phenotype was observed in the embryos injected with sgRNAs to delete ptf1aE3, although we could not confirm the deletion of the enhancer.

- hnf4a disruption led to a downregulation of pancreas downstream target genes, including ptf1a.

- HNF4 may act as a master regulator of pancreatic genes.

Despite great advances in studying mechanisms of gene regulation we are still a long way from understanding the regulatory networks in the same extent that we can read the genetic code. We have demonstrated that the integration of genome-wide techniques to predict CREs, reporter assays for their identification and loss of function assays for functional characterization constitute a very complete and unbiased approach to study tissue-specific gene regulation. Hence, our work constitutes a small but important step into the depths of gene regulation and provides a more complete portrait of ptf1a regulation and hnf4a transcriptional networks in the pancreas.

82

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

References

Aday, A. W. et al. (2011) ‘Identification of cis regulatory features in the embryonic zebrafish genome through large-scale profiling of H3K4me1 and H3K4me3 binding sites’, Developmental Biology, 357(2), pp. 450–462. doi: 10.1016/j.ydbio.2011.03.007.

Amsterdam, A. et al. (1999) ‘A large-scale insertional mutagenesis screen in zebrafish’, Genes and Development, 13(20), pp. 2713–2724. doi: 10.1101/gad.13.20.2713.

Ariza-Cosano, A. et al. (2012) ‘Differences in enhancer activity in mouse and zebrafish reporter assays are often associated with changes in gene expression’, BMC Genomics, 13(1). doi: 10.1186/1471-2164-13-713.

Atkinson, T. J. and Halfon, M. S. (2014) ‘Regulation of Gene Expression in the Genomic Context’, Computational and Structural Biotechnology Journal. Research Network of Computational and Structural Biotechnology, 9(13), p. e201401001. doi: 10.5936/csbj.201401001.

Beaton, M. J. and Cavalier-smith, T. (1999) ‘Eukaryotic non-coding DNA is functiona evidence from the differential scaling of cryptomonad genomes.pdf’, 7(June).

Belton, J. M. et al. (2012) ‘Hi-C: A comprehensive technique to capture the conformation of genomes’, Methods. Elsevier Inc., 58(3), pp. 268–276. doi: 10.1016/j.ymeth.2012.05.001.

Benitez, C. M., Goodyer, W. R. and Kim, S. K. (2012) ‘Deconstructing pancreas developmental biology’, Cold Spring Harbor Perspectives in Biology, 4(6), pp. 1–17. doi: 10.1101/cshperspect.a012401. van Berkum, N. L. et al. (2010) ‘Hi-C: A Method to Study the Three-dimensional Architecture of Genomes.’, Journal of Visualized Experiments, (39), pp. 1–7. doi: 10.3791/1869.

Bessa, J. et al. (2009) ‘Zebrafish Enhancer Detection (ZED) vector: A new tool to facilitate transgenesis and the functional analysis of cis-regulatory regions in zebrafish’, Developmental Dynamics, 238(9), pp. 2409–2417. doi: 10.1002/dvdy.22051.

Bessa, J. et al. (2014) ‘A mobile insulator system to detect and disrupt cis-regulatory landscapes in vertebrates’, Genome Research, 24(3), pp. 487–495. doi: 10.1101/gr.165654.113.

Bessa, J. and Gómez-skarmeta, J. L. (2011) ‘Molecular Methods for Evolutionary

83

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Genetics’, 772, pp. 397–408. doi: 10.1007/978-1-61779-228-1.

Biemar, F. et al. (2001) ‘Pancreas development in zebrafish: Early dispersed appearance of endocrine hormone expressing cells and their convergence to form the definitive islet’, Developmental Biology, 230(2), pp. 189–203. doi: 10.1006/dbio.2000.0103.

Boyer, L. A. et al. (2005) ‘Core transcriptional regulatory circuitry in human embryonic stem cells’, Cell, 122(6), pp. 947–956. doi: 10.1016/j.cell.2005.08.020.

Buenrostro, J. D. et al. (2013) ‘Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position’, Nature Methods. Nature Publishing Group, 10(12), pp. 1213–1218. doi: 10.1038/nmeth.2688.

Buenrostro, J. D. et al. (2015) ‘ATAC-seq: A method for assaying chromatin accessibility genome-wide’, Current Protocols in Molecular Biology, 2015(January), p. 21.29.1- 21.29.9. doi: 10.1002/0471142727.mb2129s109.

Carty, M. et al. (2017) ‘An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data’, Nature Communications, 8(May), pp. 1–10. doi: 10.1038/ncomms15454.

Chen, L., Magliano, D. J. and Zimmet, P. Z. (2012) ‘The worldwide epidemiology of type 2 diabetes mellitus - Present and future perspectives’, Nature Reviews Endocrinology. Nature Publishing Group, 8(4), pp. 228–236. doi: 10.1038/nrendo.2011.183.

Chung, J. H., Whiteley, M. and Felsenfeld, G. (1993) ‘A 5′ element of the chicken β- globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila’, Cell, 74(3), pp. 505–514. doi: 10.1016/0092- 8674(93)80052-G.

Creyghton, M. P. et al. (2010) ‘Histone H3K27ac separates active from poised enhancers and predicts developmental state’, Proceedings of the National Academy of Sciences, 107(50), pp. 21931–21936. doi: 10.1073/pnas.1016071107.

D’Costa, A. and Shepherd, I. T. (2009) ‘Zebrafish Development and Genetics: Introducing Undergraduates to Developmental Biology and Genetics in a Large Introductory Laboratory Class’, Zebrafish, 6(2), pp. 169–177. doi: 10.1089/zeb.2008.0562.

Danino, Y. M. et al. (2015) ‘The core promoter: At the heart of gene expression’, Biochimica et Biophysica Acta - Gene Regulatory Mechanisms. Elsevier B.V., 1849(8), pp. 1116–1131. doi: 10.1016/j.bbagrm.2015.04.003. 84

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Das, S. L. M. et al. (2014) ‘Relationship between the exocrine and endocrine pancreas after acute pancreatitis’, World Journal of Gastroenterology, 20(45), pp. 17196–17205. doi: 10.3748/wjg.v20.i45.17196.

Dekker, J. et al. (2002) ‘Capturing chromosome conformation’, Science, 295(5558), pp. 1306–1311. doi: 10.1126/science.1067799.

Dong, P. D. S. et al. (2008) ‘Graded levels of Ptf1a differentially regulate endocrine and exocrine fates in the developing pancreas’, Genes and Development, 22(11), pp. 1445– 1450. doi: 10.1101/gad.1663208.

Dooley, K. and Zon, L. I. (2000) ‘Zebrafish: a model system for the study of human disease’, Current Opinion in Genetics & Development, 10(3), pp. 252–256. doi: 10.1016/S0959-437X(00)00074-5.

Esteller, M. (2007) ‘Cancer epigenomics: DNA methylomes and histone-modification maps’, Nature Reviews Genetics, 8(4), pp. 286–298. doi: 10.1038/nrg2005.

Farnham, P. J. (2009) ‘Insights from genomic profiling of transcription factors’, Nature Reviews Genetics, 10(9), pp. 605–616. doi: 10.1038/nrg2636.

Felker, A. and Mosimann, C. (2016) Contemporary zebrafish transgenesis with Tol2 and application for Cre/lox recombination experiments, Methods in Cell Biology. Elsevier Ltd. doi: 10.1016/bs.mcb.2016.01.009.

Fernández-Miñán, A. et al. (2016) ‘Assay for transposase-accessible chromatin and circularized chromosome conformation capture, two methods to explore the regulatory landscapes of genes in zebrafish’, Methods in Cell Biology, 135, pp. 413–430. doi: 10.1016/bs.mcb.2016.02.008.

Field, H. A. et al. (2003) ‘Formation of the digestive system in zebrafish. II. Pancreas morphogenesis’, Developmental Biology, 261(1), pp. 197–208. doi: 10.1016/S0012- 1606(03)00308-7.

Gan, K. A. et al. (2018) ‘Identification of single nucleotide non-coding driver mutations in cancer’, Frontiers in Genetics, 9(FEB), pp. 1–10. doi: 10.3389/fgene.2018.00016.

Gittes, G. K. (2009) ‘Developmental biology of the pancreas: A comprehensive review’, Developmental Biology. Elsevier B.V., 326(1), pp. 4–35. doi: 10.1016/j.ydbio.2008.10.024.

Gnügge, L., Meyer, D. and Driever, W. (2004) ‘Pancreas development in zebrafish.’, Methods in cell biology, 76, pp. 531–551.

85

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Goryshin, I. Y. and Reznikoff, W. S. (1998) ‘Tn 5 in Vitro TranIgor Yu Reznikoff, William S.sposition’, Journal of Biological Chemistry, 273(13), pp. 7367–7374. doi: 10.1074/jbc.273.13.7367.

Grabher, C. and Wittbrodt, J. (2007) ‘Meganuclease and transposon mediated transgenesis in medaka’, Genome Biology, 8(SUPPL. 1), pp. 1–7. doi: 10.1186/gb-2007- 8-s1-s10.

Hadrys, T. et al. (2006) ‘Conserved co-regulation and promoter sharing of hoxb3a and hoxb4a in zebrafish’, Developmental Biology, 297(1), pp. 26–43. doi: 10.1016/j.ydbio.2006.04.446.

Hald, J. et al. (2008) ‘Generation and characterization of Ptf1a antiserum and localization of Ptf1a in relation to Nkx6.1 and Pdx1 during the earliest stages of mouse pancreas development’, Journal of Histochemistry and Cytochemistry, 56(6), pp. 587–595. doi: 10.1369/jhc.2008.950675.

Hawkins, R. D. et al. (2011) ‘Dynamic chromatin states in human ES cells reveal potential regulatory sequences and genes involved in pluripotency’, Cell Research. Nature Publishing Group, 21(10), pp. 1393–1409. doi: 10.1038/cr.2011.146.

Heintzman, N. D. et al. (2007) ‘Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome’, Nature Genetics, 39(3), pp. 311–318. doi: 10.1038/ng1966.

Hesselson, D., Anderson, R. M. and Stainier, D. Y. R. (2011) ‘Suppression of Ptf1a activity induces acinar-to-endocrine conversion’, Current Biology. Elsevier Ltd, 21(8), pp. 712–717. doi: 10.1016/j.cub.2011.03.041.

Hezel, A. F. et al. (2012) ‘TGF-β and αvβ6 integrin act in a common pathway to suppress pancreatic cancer progression’, Cancer Research, 72(18), pp. 4840–4845. doi: 10.1158/0008-5472.CAN-12-0634.

Hoffman, B. G. and Jones, S. J. M. (2009) ‘Genome-wide identification of DNA-protein interactions using chromatin immunoprecipitation coupled with flow cell sequencing’, Journal of Endocrinology, 201(1), pp. 1–13. doi: 10.1677/JOE-08-0526.

Howe, K. et al. (2013) ‘The zebrafish reference genome sequence and its relationship to the human genome’, Nature, 496(7446), pp. 498–503. doi: 10.1038/nature12111.

Hwang, W. Y. et al. (2013) ‘Efficient genome editing in zebrafish using a CRISPR-Cas system’, Nature Biotechnology. Nature Publishing Group, 31(3), pp. 227–229. doi: 10.1038/nbt.2501. 86

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Hwang, W. Y. et al. (2013) ‘Efficient In Vivo Genome Editing Using RNA-Guided Nucleases Woong’, Nature biotechnology, 31(3), pp. 227–229. doi: 10.1038/nbt.2501.Efficient.

Ishibashi, M. et al. (2013) ‘Using zebrafish transgenesis to test human genomic sequences for specific enhancer activity’, Methods. Elsevier Inc., 62(3), pp. 216–225. doi: 10.1016/j.ymeth.2013.03.018.

Jennings, R. E. et al. (2015) ‘Human pancreas development’, Development, 142(18), pp. 3126–3137. doi: 10.1242/dev.120063.

Jeong, Y. (2006) ‘A functional screen for sonic hedgehog regulatory elements across a 1 Mb interval identifies long-range ventral forebrain enhancers’, Development, 133(4), pp. 761–772. doi: 10.1242/dev.02239.

Jon-Matthew Belton (2013) ‘Genomes’, Genomes, 58(3), pp. 1–16. doi: 10.1016/j.ymeth.2012.05.001.Hi-C.

Jørgensen, M. C. et al. (2007) ‘An illustrated review of early pancreas development in the mouse’, Endocrine Reviews, 28(6), pp. 685–705. doi: 10.1210/er.2007-0016.

Jusuf, P. R. and Harris, W. A. (2009) ‘Ptf1a is expressed transiently in all types of amacrine cells in the embryonic zebrafish retina’, Neural Development, 4(1). doi: 10.1186/1749-8104-4-34.

Kalender Atak, Z. et al. (2017) ‘Identification of cis-regulatory mutations generating de novo edges in personalized cancer gene regulatory networks’, Genome Medicine. Genome Medicine, 9(1), pp. 1–13. doi: 10.1186/s13073-017-0464-7.

Kawaguchi, Y. et al. (2002) ‘The role of the transcriptional regulator Ptf1a in converting intestinal to pancreatic progenitors’, Nature Genetics, 32(1), pp. 128–134. doi: 10.1038/ng959.

Kawakami, K. (2007) ‘Tol2: A versatile gene transfer vector in vertebrates’, Genome Biology, 8(SUPPL. 1), pp. 1–10. doi: 10.1186/gb-2007-8-s1-s7.

Kim, J. et al. (2013) ‘An iPSC Line from Human Pancreatic Ductal Adenocarcinoma Undergoes Early to Invasive Stages of Pancreatic Cancer Progression’, Cell Reports. Elsevier, 3(6), pp. 2088–2099. doi: 10.1016/j.celrep.2013.05.036.

Kimmel, C. B. et al. (1995) ‘Stages of embryonic development of the zebrafish.’, Developmental dynamics : an official public, 203(3), pp. 253–310. doi: 10.1002/aja.1002030302.

87

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Kleinjan, D. A. and van Heyningen, V. (2005) ‘Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease’, The American Journal of Human Genetics, 76(1), pp. 8–32. doi: 10.1086/426833.

Kleinjan, D. J. and Coutinho, P. (2009) ‘Cis-ruption mechanisms: Disruption of cis- regulatory control as a cause of human genetic disease’, Briefings in Functional Genomics and Proteomics, 8(4), pp. 317–332. doi: 10.1093/bfgp/elp022.

Kojić, S. et al. (2017) ‘Zebrafish (Danio rerio) in deciphering molecular mechanisms of human diseases’, Biologia Serbica, 39(1), pp. 66–73. doi: 10.5281/zenodo.826900.

Kornberg, R. (1974) ‘Chromatin Structure : A Repeating Unit of Histones and DNA Chromatin structure is based on a repeating unit of eight’, Science, 184, pp. 868–871.

Krapp, A. et al. (1998) ‘The bHLH protein PTF1-p48 is essential for the formation of the exocrine and the correct spatial organization of the endocrine pancreas’, Genes and Development, 12(23), pp. 3752–3763. doi: 10.1101/gad.12.23.3752.

Kuhn, E. J. et al. (2003) ‘A test of insulator interactions in Drosophila’, EMBO Journal, 22(10), pp. 2463–2471. doi: 10.1093/emboj/cdg241.

Kumar, A., Garg, S. and Garg, N. (2014) Regulation of Gene Expression, Reviews in Cell Biology and Molecular Medicine.

Kwan, K. M. et al. (2007) ‘The Tol2kit: A multisite gateway-based construction Kit for Tol2 transposon transgenesis constructs’, Developmental Dynamics, 236(11), pp. 3088– 3099. doi: 10.1002/dvdy.21343.

De La Calle-Mustienes, E. et al. (2005) ‘A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts’, Genome Research, 15(8), pp. 1061–1072. doi: 10.1101/gr.4004805.

Lee, D. et al. (2015) ‘A method to predict the impact of regulatory variants from DNA sequence’, Nature Genetics. Nature Publishing Group, 47(8), pp. 955–961. doi: 10.1038/ng.3331.

Lelli, K. M., Slattery, M. and Mann, R. S. (2012) ‘Disentangling the Many Layers of Eukaryotic Transcriptional Regulation’, Annual Review of Genetics, 46(1), pp. 43–68. doi: 10.1146/annurev-genet-110711-155437.

Li, L. et al. (2016) ‘In Vivo Screening Using Transgenic Zebrafish Embryos Reveals New Effects of HDAC Inhibitors Trichostatin A and Valproic Acid on Organogenesis’, PLoS ONE, 11(2). doi: 10.1371/journal.pone.0149497.

88

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

Li, M. et al. (2016) ‘Zebrafish Genome Engineering Using the CRISPR–Cas9 System’, Trends in Genetics. Elsevier Ltd, 32(12), pp. 815–827. doi: 10.1016/j.tig.2016.10.005.

Lin, J. W. et al. (2004) ‘Differential requirement for ptf1a in endocrine and exocrine lineages of developing zebrafish pancreas’, Developmental Biology, 270(2), pp. 474– 486. doi: 10.1016/j.ydbio.2004.02.023.

Luizon, M. R. and Ahituv, N. (2015) ‘Uncovering drug-responsive regulatory elements’, Pharmacogenomics, 16(16), pp. 1829–1841. doi: 10.2217/pgs.15.121.

Maestro, M. A. et al. (2007) ‘Distinct roles of HNF1beta, HNF1alpha, and HNF4alpha in regulating pancreas development, beta-cell function and growth.’, Endocrine development, 12, pp. 33–45. doi: 10.1159/0000109603.

Mahajan, A. et al. (2014) ‘Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility’, Nature Genetics, 46(3), pp. 234–244. doi: 10.1038/ng.2897.

Maston, G. A., Evans, S. K. and Green, M. R. (2006) ‘Transcriptional Regulatory Elements in the Human Genome’, Annual Review of Genomics and Human Genetics, 7(1), pp. 29–59. doi: 10.1146/annurev.genom.7.080505.115623.

Miyatsuka, T. et al. (2011) ‘Neurogenin3 inhibits proliferation in endocrine progenitors by inducing Cdkn1a’, Proceedings of the National Academy of Sciences, 108(1), pp. 185– 190. doi: 10.1073/pnas.1004842108.

Moore, B. D. et al. (2016) ‘Transcriptional regulation of X-Box-binding protein One (XBP1) by hepatocyte nuclear factor 4- (HNF4-) Is vital to beta-cell function’, Journal of Biological Chemistry, 291(12), pp. 6146–6157. doi: 10.1074/jbc.M115.685750.

Navratilova, P. et al. (2009) ‘Systematic human/zebrafish comparative identification of cis-regulatory activity around vertebrate developmental transcription factor genes’, Developmental Biology. Elsevier Inc., 327(2), pp. 526–540. doi: 10.1016/j.ydbio.2008.10.044.

Négre, N. et al. (2011) ‘A cis-regulatory map of the Drosophila genome’, Nature, 471(7339), pp. 527–531. doi: 10.1038/nature09990.

Nissim, S. et al. (2016) ‘Iterative use of nuclear receptor Nr5a2 regulates multiple stages of liver and pancreas development’, Developmental Biology. Elsevier, 418(1), pp. 108– 123. doi: 10.1016/j.ydbio.2016.07.019.

Odom, D. T. et al. (2004) ‘Control of Pancreas and Liver Gene Expression by HNF

89

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Transcription Factors’, Science, 303(5662), pp. 1378–1381. doi: 10.1126/science.1089769.

Ong, C.-T. and Corces, V. G. (2011) ‘Enhancer function: new insights into the regulation of tissue-specific gene expression’, Nature Reviews Genetics. Nature Publishing Group, 12(4), pp. 283–293. doi: 10.1038/nrg2957.

Ota, S. and Kawahara, A. (2014) ‘Zebrafish: A model vertebrate suitable for the analysis of human genetic disorders’, Congenital Anomalies, 54(1), pp. 8–11. doi: 10.1111/cga.12040.

Pack, M. et al. (1996) ‘Mutations affecting development of zebrafish digestive organs’, Development, 123, pp. 321–328. doi: 10.5167/uzh-229.

Palanker, L. et al. (2006) ‘Dynamic regulation of’, Development, 3562, pp. 3549–3562. doi: 10.1101/gad.1752608.lineages.

Park, P. J. (2009a) ‘ChIP-seq: Advantages and challenges of a maturing technology’, Nature Reviews Genetics, 10(10), pp. 669–680. doi: 10.1038/nrg2641.

Park, P. J. (2009b) ‘ChIP-seq: Advantages and challenges of a maturing technology’, Nature Reviews Genetics. Nature Publishing Group, 10(10), pp. 669–680. doi: 10.1038/nrg2641.

Pashos, E. et al. (2013) ‘Distinct enhancers of ptf1a mediate specification and expansion of ventral pancreas in zebrafish’, Developmental Biology, 381(2), pp. 471–481. doi: 10.1016/j.ydbio.2013.07.011.

Perez-Rico, Y. A. and Shkumatava, A. (no date) ‘SI: Comparative analyses of super- enhancers reveal conserved elementsi n vertebrate genomes’, pp. 1–14.

Rada-Iglesias, A. et al. (2011) ‘A unique chromatin signature uncovers early developmental enhancers in humans’, Nature. Nature Publishing Group, 470(7333), pp. 279–285. doi: 10.1038/nature09692.

Ragvin, A. et al. (2011) ‘Correction for Ragvin et al., Long-range gene regulation links genomic type 2 diabetes and obesity risk regions to HHEX, SOX4, and IRX3’, Proceedings of the National Academy of Sciences, 108(10), pp. 4264–4264. doi: 10.1073/pnas.1101890108.

Raha, D., Hong, M. and Snyder, M. (2010) ‘ChIP-Seq: a method for global identification of regulatory elements in the genome’, Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.], Chapter 21(July), pp. 1–14. doi:

90

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

10.1002/0471142727.mb2119s91.

Rodríguez-Seguí, S. A. and Bessa, J. (2015) ‘Gatekeepers of pancreas: TEAD and YAP.’, Oncotarget, 6(18), pp. 15736–15737. doi: 10.18632/oncotarget.4607.

Sagai, T. (2005) ‘Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb’, Development, 132(4), pp. 797–803. doi: 10.1242/dev.01613.

Santos-Rosa, H. et al. (2002) ‘Active genes are tri-methylated at K4 of histone H3’, Nature. Macmillian Magazines Ltd., 419, p. 407. Available at: http://dx.doi.org/10.1038/nature01080.

Schaffer, A. E. et al. (2010) ‘Nkx6 transcription factors and Ptf1a function as antagonistic lineage determinants in multipotent pancreatic progenitors’, Developmental Cell. Elsevier Ltd, 18(6), pp. 1022–1029. doi: 10.1016/j.devcel.2010.05.015.

Sellick, G. S. et al. (2004) ‘Mutations in PTF1A cause pancreatic and cerebellar agenesis’, Nature Genetics, 36(12), pp. 1301–1305. doi: 10.1038/ng1475.

Shlyueva, D., Stampfel, G. and Stark, A. (2014) ‘Transcriptional enhancers: From properties to genome-wide predictions’, Nature Reviews Genetics. Nature Publishing Group, 15(4), pp. 272–286. doi: 10.1038/nrg3682.

Spitz, F. and Furlong, E. E. M. (2012) ‘Transcription factors: From enhancer binding to developmental control’, Nature Reviews Genetics. Nature Publishing Group, 13(9), pp. 613–626. doi: 10.1038/nrg3207.

Stadhouders, R. (2012) ‘Transcription regulation by distal enhancers’, (August), pp. 181– 186.

Stuart, G. W. et al. (1990) ‘Stable lines of transgenic zebrafish exhibit reproducible patterns of transgene expression.’, Development (Cambridge, England), 109(3), pp. 577–84. doi: 10.1111/j.1440-169x.2008.01023.x.

Stuart, G. W., McMurray, J. V and Westerfield, M. (1988) ‘Replication, integration and stable germ-line transmission of foreign sequences injected into early zebrafish embryos.’, Development (Cambridge, England), 103(2), pp. 403–12. doi: 10.1242/dev.068080.

Taatjes, D. J., Marr, M. T. and Tjian, R. (2004) ‘Regulatory diversity among metazoan co-activator complexes’, Nature Reviews Molecular Cell Biology, 5(5), pp. 403–410. doi: 10.1038/nrm1369.

91

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements active in adult zebrafish exocrine pancreas

Tabish, S. A. (2007) ‘Is Diabetes Becoming the Biggest Epidemic of the Twenty-first Century?’, International journal of health sciences, 1(2), pp. V–VIII.

Tiso, N., Moro, E. and Argenton, F. (2009) ‘Zebrafish pancreas development’, Molecular and Cellular Endocrinology, 312(1–2), pp. 24–30. doi: 10.1016/j.mce.2009.04.018.

Tjian, R. (1994) ‘Molecular machines that control genes’, Scientific American.

Udvadia, A. J. and Linney, E. (2003) ‘Windows into development: Historic, current, and future perspectives on transgenic zebrafish’, Developmental Biology, 256(1), pp. 1–17. doi: 10.1016/S0012-1606(02)00083-0.

Ung, C. Y. et al. (2015) ‘Mosaic Zebrafish Transgenesis for Functional Genomic Analysis of Candidate Cooperative Genes in Tumor Pathogenesis’, Journal of Visualized Experiments, (97), pp. 1–8. doi: 10.3791/52567.

Vaezeslami, S., Sterling, R. and Reznikoff, W. S. (2007) ‘Site-directed mutagenesis studies of Tn5 transposase residues involved in synaptic complex formation’, Journal of Bacteriology, 189(20), pp. 7436–7441. doi: 10.1128/JB.00524-07.

Visel, A. et al. (2009) ‘ChIP-seq accurately predicts tissue-specific activity of enhancers’, Nature. Nature Publishing Group, 457(7231), pp. 854–858. doi: 10.1038/nature07730.

Wang, S. et al. (2010) ‘Neurog3 gene dosage regulates allocation of endocrine and exocrine cell fates in the developing mouse pancreas’, Developmental Biology. Elsevier Inc., 339(1), pp. 26–37. doi: 10.1016/j.ydbio.2009.12.009.

Wang, Y. et al. (2011) ‘Genetic inducible fate mapping in larval zebrafish reveals origins of adult insulin-producing -cells’, Development, 138(4), pp. 609–617. doi: 10.1242/dev.059097.

Weedon, M. N. et al. (2014) ‘Recessive mutations in a distal PTF1A enhancer cause isolated pancreatic agenesis’, Nature Genetics. Nature Publishing Group, 46(1), pp. 61– 64. doi: 10.1038/ng.2826.

Weissmueller, S. et al. (2014) ‘Mutant p53 drives pancreatic cancer metastasis through cell-autonomous PDGF receptor β signaling’, Cell. Elsevier Inc., 157(2), pp. 382–394. doi: 10.1016/j.cell.2014.01.066.

Wijchers, P. J. and de Laat, W. (2011) ‘Genome organization influences partner selection for chromosomal rearrangements’, Trends in Genetics. Elsevier Ltd, 27(2), pp. 63–71. doi: 10.1016/j.tig.2010.11.001.

Wittkopp, P. J. and Kalay, G. (2012) ‘Cis-regulatory elements: Molecular mechanisms

92

FCUP/ICBAS Uncovering transcriptional cis-regulatory elements

active in adult zebrafish exocrine pancreas

and evolutionary processes underlying divergence’, Nature Reviews Genetics. Nature Publishing Group, 13(1), pp. 59–69. doi: 10.1038/nrg3095.

Workman, J. L. (2006) ‘Nucleosome displacement in transcription’, Genes and Development, 20(15), pp. 2009–2017. doi: 10.1101/gad.1435706.

Yee, N. S., Kazi, A. A. and Yee, R. K. (2013) ‘Translating Discovery in Zebrafish Pancreatic Development to Human Pancreatic Cancer: Biomarkers, Targets, Pathogenesis, and Therapeutics’, Zebrafish, 10(2), pp. 132–146. doi: 10.1089/zeb.2012.0817.

Yee, N. S. and Pack, M. (2005) ‘Zebrafish as a model for pancreatic cancer research’, Methods Mol Med, 103(FEBRUARY 2005), pp. 273–298. doi: 1-59259-780-7:273 [pii].

Zecchin, E. et al. (2004) ‘Evolutionary conserved role of ptf1a in the specification of exocrine pancreatic fates’, Developmental Biology, 268(1), pp. 174–184. doi: 10.1016/j.ydbio.2003.12.016.

Zhang, Q. et al. (2017) ‘Genome-wide open chromatin regions and their effects on the regulation of silk protein genes in Bombyx mori’, Scientific Reports. Springer US, 7(1), pp. 1–9. doi: 10.1038/s41598-017-13186-6.

Zhang, Y. et al. (2017) ‘Programmable base editing of zebrafish genome using a modified CRISPR-Cas9 system’, Nature Communications. Springer US, 8(1), pp. 6–10. doi: 10.1038/s41467-017-00175-6.

93