Integrated Genome Browser: Visual Analytics Platform for Genomics

bioRxiv preprint doi: https://doi.org/10.1101/026351; this version posted September 11, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bioinformatics, YYYY, 0–0 doi: 10.1093/bioinformatics/xxxxx Advance Access Publication Date: DD Month YYYY Original Paper

Genome Analysis

Integrated Genome Browser: visual analytics platform for genomics

Nowlan H. Freese1*, David C. Norris1*, and Ann E. Loraine1**

1Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 600 Laureate Way, Kannapolis, NC 28081, USA

*Authors contributed equally to this work.

**To whom correspondence should be addressed.


Abstract

Motivation: Genome browsers that support fast navigation and interactive visual analytics can help scientists achieve deeper insight into large-scale genomic data sets more quickly, thus accelerating the discovery process. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser.

Results: Here we describe multiple updates to IGB, including all-new capability to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of data sets from RNA-Seq, ChIP-Seq, and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related data sets. To facilitate this, we enhanced IGB’s ability to consume data from diverse sources, including Galaxy, Distributed Annotation, and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data.

Availability: IGB is open source and is freely available from http://bioviz.org/igb. Contact: [email protected]

1 Introduction

Genome browsers are visualization software tools that display genomic data in interactive, graphical formats. Since the 1990s, genome browsers have played an essential role in genomics, first as tools for building, inspecting, and annotating assemblies and later as tools for distributing data to the public (Durbin and Thierry-Mieg, 1991; Harris, 1997; Kent, et al., 2002). Later, the rise of genome-scale assays created the need for a new generation of genome browsers that could display users’ experimental data alongside reference sequence data and annotations.

Integrated Genome Browser (IGB), first developed in 2001 at Affymetrix, was among the first of this new breed of tools. IGB was first written to support Affymetrix scientists and collaborators who were using whole genome tiling arrays to probe gene expression and transcription factor binding sites as part of the ENCODE project (Kapranov, et al., 2003). As such, IGB was designed from the start to handle and display what we would now call “big data” in bioinformatics: millions of probe intensity values per sample. IGB was one of the first genome browsers to support visual analytics, in which interactive visual interfaces augment our natural ability to notice patterns in data. The “thresholding” feature described below is an example.

Because early IGB development was publicly funded, Affymetrix released IGB and its companion graphics library, the Genoviz Software Development Kit (Helt, et al., 2009), as open source software in 2004. Our first article introducing IGB appeared in 2009 (Nicol, et al., 2009) and focused on visualization of tiling array data. Here, we describe new visual analytics and data integration features developed for high-throughput sequencing data. We also introduce a new plug-in application programming interface (API) that makes adding new functionality easier for developers, transforming IGB into an extensible visual analytics platform for genomics.


2 Results

IGB is implemented as a stand-alone, rich client desktop program using the Java programming language. To run IGB, users download and run platform-specific installers; these also support automatic updates. Mac and Windows installers also include a copy of the Java virtual machine, which is installed in an IGB-specific location, making it unnecessary for users to install (and maintain) Java separately.

IGB’s implementation as a local application rather than as a Web app means that IGB can access the full processing power of the user’s local computer. IGB is always present on the user’s desktop, regardless of internet connection status, but IGB can also use the Web to consume data, as described below. Because IGB runs locally, users view their own datasets without uploading them to a server, which can be important when working with confidential data. IGB’s implementation in Java means that developers benefit from Java’s robust support for multi-threaded, object-oriented programming and dozens of well-tested, robust libraries and tools that are available for Java.

3.1 Viewing genomes and annotations

On startup, IGB displays a “home” screen featuring a carousel of images linking to the latest versions of model organism, crop plant, and reference human genome assemblies. More species and genome versions are available via menus in the Current Genome tabbed panel, including more than seventy animal, plant, and microbial genomes. Users can also load and visualize their own genome assemblies, called “custom” genomes in IGB, provided they have a reference sequence file in FASTA or 2bit format.

Once a user selects a species and a genome version, IGB automatically loads the reference gene model annotations for that genome, if available. For any given genome version, additional annotations or high-throughput data sets may also be available via the Data Access Panel interface. These data sets can come from different sites, and to highlight this, IGB can display a favicon.ico graphic distinguishing these different data sources. This is why IGB is named “integrated”: it integrates data from different sources into the same view. To illustrate, Figure 1 shows an example view of integrated data sets from the human genome. Reference gene model annotations and other data sets from IGB Quickload are shown, together with tracks loaded from a DAS1 server hosted by the UCSC Genome Bioinformatics group.

3.2 Navigating and interacting

Genomic data sets span many scales, and fast navigation through these different scales is a key feature for a genome browser. Users need to be able to quickly travel between base-level views depicting sequence details like splice sites, gene-level views depicting the exon-intron organization of genes, and chromosome-level views showing larger-scale structures, like centromeres and chromosome bands. Tools that support fast navigation through the data can accelerate the discovery process.

For this reason, IGB implements a visualization technique called one-dimensional, animated semantic zooming, in which objects change their appearance in an animated fashion around a central line, called the “zoom focus” (Loraine and Helt, 2002). Animation helps users stay oriented during zooming and can create the impression of flying through one’s data (Bederson and Boltman, 1999; Cockburn, et al., 2009).

To set the zoom focus, users click a location in the display. A vertical, semi-opaque line called the “zoom stripe” indicates the zoom focus position and also serves as a pointer and guideline tool when viewing sequence or exon boundaries. When users zoom, the display appears to contract or expand around this central zoom stripe, which remains in place, thus creating a feeling of stability and control even while the virtual genomic landscape is rapidly changing.

IGB also supports “jump zooming,” in which a request to zoom triggers what feels like instantaneous teleportation to a new location. To jump-zoom, users can double-click an item, click-drag a region in the coordinates track, or search using the Advanced Search tab or the Quick Search box at the top left of the display.
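The idea of zooming around a fixed zoom stripe can be illustrated with a little arithmetic. The following is a minimal sketch, not IGB's own code: the function name `zoom_about_focus` and its parameters are hypothetical, and it only assumes that the visible region is an interval of genomic coordinates that is rescaled so the focus keeps the same relative position on screen.

```python
def zoom_about_focus(start, end, focus, factor):
    """Rescale the visible interval [start, end] around a fixed focus.

    factor > 1 zooms in (fewer bases visible); factor < 1 zooms out.
    The focus keeps its relative position in the view, which is what
    makes the display appear to expand or contract around the stripe.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not start <= focus <= end:
        raise ValueError("focus must lie inside the visible interval")
    new_start = focus - (focus - start) / factor
    new_end = focus + (end - focus) / factor
    return new_start, new_end


# Zoom in 2x on a 1000-base view with the zoom stripe at position 250.
print(zoom_about_focus(0, 1000, 250, 2))
```

In this example the focus sits one quarter of the way across the view both before and after zooming, so a feature under the stripe would not appear to move.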

Fig. 1. Viewing multiple human genome data sets in IGB. IGB screen showing human genome build 38, released in December 2013. Gene annotations load by default, with additional data available in the section labeled Available Data (lower left). Each dataset occupies a separate track within the main window, and can be colored according to user preference. Right-clicking items in the main window activates a context menu with options to search Google, run BLAST, or view the underlying genomic sequence.


Fig. 2. RNA-Seq data from human lung adenocarcinomas bearing mutant or wild-type (WT) alleles of the KRAS oncogene. (a) Coverage depth graphs show transcript abundance across a 250 kb region. One peak is taller in the mutant sample, indicating higher expression. (b) Overlaid depth graphs showing a discontinuity in coverage indicating differential splicing in PFN2. Quantification of split reads by FindJunctions further supports differential splicing. (c) Zoomed-in view of (b), showing aligned reads.

Moving without changing the scale (panning) is also important for fast navigation through data. In IGB, clicking arrows in the toolbar moves the display left or right, and scrollbars offer ways to move more rapidly. Users can click-drag the selection (arrow) cursor into the left or right border of the main display window to activate continuous pan. For finer-scale control, a move tool cursor enables click-dragging the display in any direction.

IGB helps users ask and answer questions about their data by supporting multiple ways for users to interact with what they see. Selecting data display elements (Glyphs) triggers display of meta-data about the selected item and mouse-over causes a tooltip to appear. Right-clicking items within tracks activates a context menu with options to search Google, run a BLAST search at NCBI, or open a sequence viewer (Fig. 1). Right-clicking track labels activates context menus showing a rich suite of visual analytics tools and functions, some of which we discuss in more detail below.

3.3 Loading data from files or URLs

IGB can consume data from local files or URLs and can read more than 30 different file formats popular in genomics. When users open a data set, a new track is added to the main display. Users then choose how much data to add to the new track by operating data loading controls. Larger data sets, such as RNA-Seq data, should be loaded on a region-by-region basis, while others that are small enough to fit into memory can be loaded in their entirety, e.g., a BED file containing ChIP-Seq peaks. To load data into a region, users zoom and pan to the region of interest and click a button labeled “Load Data.” To load all data in an opened data set, they can change the data set’s loading method by selecting a “genome” load mode setting in the Data Access Panel.

This behavior differs from other genome browsers in that most other tools link navigation and data loading. In other tools, a request to navigate to a new location or change the zoom level both redraws the display and also triggers a data loading operation. Although sometimes convenient for users, this limits the types of navigation interactions a tool can support. This trade-off can be seen in the Integrative Genomics Viewer (IGV), a desktop Java application developed after IGB (Thorvaldsdottir, et al., 2013). IGV auto-loads data but restricts movement; for example, it lacks panning scrollbars and does not support fast, animated zooming. IGB, by contrast, prioritizes navigation speed and gives users total control over when data load, which can be important when loading data from distant locations over slower internet connections. IGB gives users control over when such delays might occur, thus making waiting more palatable.

3.4 Sharing and integrating data

IGB aims to support the scientific discovery process by making it easy for users to document and share results. Taking a cue from Web browsers, IGB for many years has supported bookmarking genomic scenes. In IGB, genomic scene bookmarks record the location, genome version, and data sets loaded into the current view. Users can also add free text notes and thus record conclusions about what they see. Selecting a bookmark causes IGB to zoom and pan to the bookmarked location and load all associated data sets, thus enabling users to quickly return to a region of interest. Users can sort, edit, import and export bookmarks using the Bookmarks tab. Exporting bookmarks creates an HTML file which users can re-import into IGB or open in a Web browser. If IGB is running, clicking an IGB bookmark in a Web browser causes IGB to zoom to the bookmarked location and load data sets associated with the bookmark.

IGB loads bookmarks through a ReST-style endpoint implemented within IGB itself, using a port on the user’s computer. This endpoint allows bookmarks to be loaded from HTML hyperlinks embedded in web pages, spreadsheets, or any other document type that supports hyperlinks. IGB was the first desktop genome browser to implement this technique; for many years, Affymetrix used it to display probe set alignments on their NetAffx Web site (Liu, et al., 2003).

More recently, we used IGB’s ReST-style bookmarking system to implement a JavaScript bridge between IGB and the Galaxy bioinformatics workflow system (Goecks, et al., 2010). When users generate IGB-compatible data files within Galaxy, they can now click a ‘display in IGB’ hyperlink. Clicking this link opens a BioViz.org Web page containing a JavaScript program that forwards the hyperlink to IGB, causing IGB to retrieve that data directly from Galaxy. If IGB is not running, the JavaScript instead invites the user to launch IGB. Once they do, the data set loads.

IGB supports multiple formats and protocols for sharing data set collections and integrating across data sources. The most lightweight and easy to use of these is the IGB-specific “Quickload” format, which consists of a simple directory structure containing plain text metadata files. The metadata files can reference data sets stored in the same Quickload directory or in other locations, including Web, ftp sites, or cloud storage resources such as Dropbox and iPlant (Goff, et al., 2011). This flexibility makes it possible for a Quickload site to aggregate data sets from multiple locations. The metadata for a Quickload site also includes styling and data access directives controlling the way IGB loads each file, and how the data will look when loaded.

Sharing a Quickload site is straightforward. Users simply copy the contents to a publicly accessible location and then publicize the URL, which IGB users then add as a new Quickload site to their copy of IGB. Typically, users copy their Quickload sites to the content directories of Web sites, and IGB supports secure access via the HTTP basic authentication protocol, making it easy to keep data sets private if desired. If users do not require password-protection for their data, they can also use Public Dropbox folders to share Quickload sites.

3.5 Visualizing RNA-Seq data

As ultra high-throughput sequencing of cDNA (RNA-Seq) has mostly replaced microarrays for surveying gene expression, we added new features to IGB to support visualization of RNA-Seq data. RNA-Seq data processing workflows typically produce large read alignment (BAM) files, which IGB can open and display. IGB also implements many new visual analytics functions that operate on BAM file tracks and highlight biologically meaningful patterns in the data. We describe a subset of these functions below.

Right-clicking a BAM track label opens a context menu listing options to create coverage graphs, called “depth” graphs in IGB. At present, IGB supports two depth graph types: “depth graph start” that counts a read’s first mapped base and “depth graph all” that counts the number of reads overlapping a position. The latter is useful when investigating overall expression at a locus, and the former is useful when investigating sequencing bias. Both graph types are implemented using IGB’s track operations API, which developers can use to add all-new graph generation algorithms as plug-ins, discussed below.

By comparing depth graphs for multiple samples, users can identify differences in transcript levels. Figure 2a shows “depth graph all” graphs made from RNA-Seq alignment (BAM) tracks; reads were from sequencing lung cancer samples bearing wild-type or mutant copies of the KRAS oncogene (Kalari, et al., 2012). Using the Graph tab, users can place both graphs on the same scale, and if sequencing depth is similar, peaks that are the same height and shape reflect similar expression. However, one peak is much taller in the mutant sample, indicating higher expression. The gene is AQP3, encoding a water/glycerol-transporting protein (Hara-Chikuma and Verkman, 2005).

Visual analysis of RNA-Seq data can also highlight alternative splicing differences between samples. Figure 2b shows PFN2, which produces two isoforms due to alternative splicing. Here, the depth graphs from Figure 2a were merged into a single track. Comparing the discontinuities in the peaks to the gene models shows that the mutant sample favors the shorter isoform. To further aid splicing analysis, we developed FindJunctions, a visual analytics tool. FindJunctions identifies split read alignments in an RNA-Seq track, uses them to identify potential exon-exon junctions, and then creates an all-new track containing junction features annotated with the number of alignments that supported them. Figure 2b shows an example where quantification of split reads by FindJunctions reinforces the finding that PFN2 is differentially spliced between samples.

Viewing the read alignments is also informative. Figure 2c shows a zoomed-in view of the alternatively spliced region from Fig. 2b. Read alignments show that the shorter isoform predominates in the sample bearing mutant KRAS.
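To make the two depth graph types and the junction counting of Section 3.5 concrete, here is a minimal sketch under stated assumptions. It is not IGB's implementation or its track operations API. Each read is modelled as a list of aligned blocks with half-open `(start, end)` coordinates, a split read has more than one block, the "first mapped base" is assumed to be the leftmost aligned position, and the gaps between blocks are assumed not to count toward coverage. All function names are hypothetical.

```python
from collections import Counter


def depth_all(reads):
    """Per position, count the reads with an aligned block covering it."""
    depth = Counter()
    for blocks in reads:
        for start, end in blocks:  # half-open interval [start, end)
            for pos in range(start, end):
                depth[pos] += 1
    return depth


def depth_start(reads):
    """Per position, count the reads whose leftmost aligned base is there."""
    return Counter(blocks[0][0] for blocks in reads if blocks)


def junction_counts(reads):
    """Count split reads supporting each gap between consecutive blocks."""
    counts = Counter()
    for blocks in reads:
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            counts[(left_end, right_start)] += 1
    return counts


# Three toy reads; the last two are split across the same gap.
reads = [
    [(100, 150)],
    [(120, 150), (300, 320)],
    [(130, 150), (300, 330)],
]

print(depth_all(reads)[140])     # reads covering position 140
print(depth_start(reads)[120])   # reads starting at position 120
print(junction_counts(reads))    # gap (150, 300) and its supporting reads
```

The per-base loop in `depth_all` is chosen for readability; a real tool working on full BAM files would need something more efficient.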

Fig. 3. Visualizing ChIP-Seq data. (a) MACS BED file with peak regions from mouse ChIP-Seq data investigating binding sites for transcription factor SOX9. Peak regions in the track labeled “MACS” are colored by score, and higher-scoring regions appear darker. (b) Zoomed in view of (a). MACS identified four significant peaks, of which two exceeded a user- defined coverage threshold, visible as a thin horizontal line in the ChIP-Seq WIG track (top). (c) Zoomed in view of (b). Searching for the SOX9 binding motif identified sites under the most significant peaks in (b).

Users can interact with reads and other data displayed in the viewer using selection operations. Clicking a single item selects it, and click-dragging over multiple items selects all of them. Pressing keyboard modifiers while clicking an item adds (SHIFT) or removes (CTRL) it from the pool of selected items. IGB reports the identity or number of selected items in the upper right corner. In this way, users can identify and count items that match a biologically interesting pattern, e.g., spliced reads that support a junction. As an additional visual cue, when a read or annotation is selected, all items with matching boundaries on either the 5’ or 3’ end are highlighted. This technique, called “edge matching,” aids in visualization of read boundaries and identifying alternative donor/acceptor sites.

3.6 Visualizing ChIP-Seq data

Knowing where a transcription factor binds DNA in relation to nearby genes is key to understanding its function, as genes whose promoters are bound by a given transcription factor are likely to be regulated by it. Identifying binding sites of DNA-binding proteins is now routinely done using whole-genome ChIP-Seq, in which DNA cross-linked to protein is immunoprecipitated using antibodies against the protein of interest and then sequenced. Subsequent data analysis typically involves mapping reads onto a reference genome, identifying regions with large numbers of immunoprecipitated reads, and then performing statistical analysis to assess significance of enrichment. Each step requires users to choose analysis parameters whose effects and importance may be hard to predict. To validate these choices, it is important to view the “raw” alignments and statistical analysis results in a genome browser. As an example, we describe using IGB to view results from a ChIP-Seq analysis done using MACS, a widely used tool (Zhang, et al., 2008).

MACS produces a BED format file containing peak locations and significance scores indicating which peaks likely contain a binding site. Opening and loading this BED file in IGB creates a new track with single-span annotations representing the extent of each peak. At first, they look identical, varying only by length, and it is difficult to distinguish them. To highlight the highly scoring peak regions, IGB offers a powerful “Color by” visual analytics feature that can assign colors from a heatmap using quantitative variables associated with features. For this, users right-click a track, select “Color by” from the context menu, and then operate a heatmap editor adopted from the Cytoscape codebase to color-code by score (Shannon, et al., 2003). Doing this makes it easy to identify the highest scoring region most likely to contain a binding site (Fig. 3a).

ChIP-Seq analysis tools also typically produce WIG-format depth-graph files that report where immunoprecipitated sequences have “piled up,” forming peaks. Loading this WIG file into IGB creates a new graph track that shows how these peaks coincide with regions from the BED file. In IGB, users can move tracks to new locations in the display. As shown in Figure 3b, placing the WIG track above the color-coded BED track makes it easy to observe how taller peaks typically have higher significance values.
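As a rough illustration of what coloring by score involves, the sketch below maps a peak score onto a gray level so that higher-scoring peaks come out darker, as in Fig. 3a. It is not the Cytoscape-derived heatmap editor that IGB uses. The function name, the linear mapping, and the example peak tuples are all assumptions made for illustration.

```python
def score_to_gray(score, lo, hi):
    """Map a score in [lo, hi] to an (r, g, b) gray; higher is darker.

    Scores outside the range are clamped to the nearest end.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    fraction = min(max((score - lo) / (hi - lo), 0.0), 1.0)
    level = round(255 * (1.0 - fraction))
    return (level, level, level)


# Hypothetical peaks as (chromosome, start, end, score) tuples.
peaks = [("chr1", 1000, 1400, 50), ("chr1", 5000, 5300, 900)]
for chrom, start, end, score in peaks:
    print(chrom, start, end, score_to_gray(score, 0, 1000))
```

A linear gray ramp is only one possible choice; a heatmap editor would typically let the user pick the color stops and the score range interactively.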

Fig. 4. Visualizing bisulfite sequencing data. (a) Bismark bedGraph file from an Arabidopsis bisulfite sequence experiment. Peaks indicate regions containing many methylated cytosine residues. (b) Zoomed in view of (a). A user-defined threshold shows that most cytosines in the first intron are methylated. (c) Zoomed in view of (b), showing the positive and negative strand aligned reads. Thymines are colored white, and cytosines red. Unmethylated cytosines that were converted to thymines appear as white columns occupying the same base pair positions as marks below the sequence axis, which indicate cytosines in the reference sequence. The highly-methylated region on the left contains many marks and few white columns.
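The user-defined thresholds in Fig. 3b and Fig. 4b pick out stretches of a graph track whose values stay above a cutoff. A minimal sketch of that kind of operation follows. It assumes one value per base, a strict greater-than comparison, and half-open output intervals. The function name `regions_above` is hypothetical and this is not IGB's thresholding code.

```python
def regions_above(values, threshold, start=0):
    """Return half-open (begin, end) runs where values exceed threshold.

    values[i] is the graph value at genomic position start + i.
    """
    regions = []
    run_begin = None
    for i, y in enumerate(values):
        if y > threshold:
            if run_begin is None:
                run_begin = i  # a new run opens here
        elif run_begin is not None:
            regions.append((start + run_begin, start + i))
            run_begin = None
    if run_begin is not None:  # run extends to the end of the data
        regions.append((start + run_begin, start + len(values)))
    return regions


coverage = [0, 5, 6, 1, 7, 8, 9]
print(regions_above(coverage, 4))  # two runs pass a low threshold
print(regions_above(coverage, 6))  # raising it leaves only one
```

Re-running the function with different thresholds mimics moving a slider: the number and extent of the reported regions change with the cutoff.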

IGB’s graph thresholding function, originally developed for tiling arrays, is useful for exploring the relationship between coverage depth and peak score. Available from the IGB Graph tab, the thresholding function identifies regions in a graph track where consecutive y-values exceed a user-defined threshold. Users can change the threshold dynamically using a slider and observe in real-time how changing the threshold value affects the number and extent of identified regions. Users can promote regions to new tracks and save them in BED format. As an example, Figure 3b shows four MACS-detected peaks near the Hs6st1 gene, a regulatory target for the SOX9 transcription factor in mouse (Kadaja, et al., 2014). Two exceed the current threshold setting, providing a visual cue that these two peaks may be most important for regulation.

Sometimes the recognition sequence for the transcription factor being studied is known or can be deduced from the data. IGB offers a way for users to visualize instances of binding site motifs. Using the Advanced Search tab, users can search the genomic region in view using regular expressions. For instance, to search for the SOX9 binding motif AGCCGYG (where Y can be C or T) a user would enter “AGCCG[CT]G”. In this example, the search found several instances of this motif, with one located in each of the two user-defined significant peaks near the Hs6st1 gene (Fig. 3c).

3.7 Visualizing bisulfite sequencing data

Whole genome bisulfite sequencing (WGBS) refers to bisulfite conversion of unmethylated cytosines to thymines followed by sequencing. This technique can identify methylated sites throughout a genome and reveal potential epigenetic regulation of gene expression. Analyzing bisulfite data involves mapping the reads onto the genome using tools that can accommodate reads where many but not all C residues have been converted. Several such tools are available; here we describe using IGB to visualize output of Bismark (Krueger and Andrews, 2011).

Bismark produces a BAM file containing read alignments and a depth-graph file (in bedGraph format) reporting percent methylation calculated from sliding windows along the genome. Figure 4a shows a Bismark bedGraph file from an experiment investigating methylation in the model plant Arabidopsis thaliana (Yelagandula, et al., 2014). All of chromosome one is shown, and peaks indicate regions of high methylation. This whole chromosome view makes it easy to observe that the centromeric region is highly methylated. From here, users can zoom in to examine methylation at a region, gene, or base pair level.

Figure 4b shows a zoomed-in view of one of the tallest peaks visible in Figure 4a. Similar to the ChIP-Seq data analysis described in the previous section, users can apply the thresholding feature to identify regions of high methylation within a gene. Here, thresholding at the 70th percentile (Fig. 4b) highlights how the first intron, but not the second and third introns, contains many methylated cytosines.

Closer examination of aligned reads provides further support of methylation. In IGB, nucleotide residues are color-coded, and users can change these colors. In addition, read alignment tracks can be configured so that only mismatched bases are color-coded. Figure 4c shows a view of bisulfite sequencing data in which cytosines are red and thymines are white. In this view, white columns in the read track (labeled WGBS) represent unmethylated cytosines, and marks in the coordinates track indicate cytosine residues in the reference sequence. Regions with many cytosines and few white columns are highly methylated.

3.8 Extending IGB using plug-ins

Developers have created dozens of genome browser tools, each one aiming to meet a need not met by the tools that preceded it. And yet, each new tool has faced similar problems, such as how to consume data from files, how to lay out genomic features and graphs into tracks, and how to support zooming through vast differences in scale. As described above, IGB has solved many of these problems and offers a flexible and fast environment for users to explore the genomic landscape. If it were easy for developers to add new features and tools to IGB, they could spend more time creating new visualizations and less time re-implementing features that already exist. Toward this end, we transformed IGB into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data.

The IGB software architecture now resembles other popular open source Java projects that support adding new functionality via plug-ins, including Eclipse, the NetBeans Rich Client Platform, and Cytoscape (Shannon, et al., 2003). This similarity comes from our common use of OSGi, a services-based architectural framework and community standard for building modular software. By adopting this framework, we increased IGB extensibility, simplified adding new features, and created a plug-in API that empowers community developers to contribute new functionality without needing deep understanding of IGB internal systems. The plug-in API is new, but community developers are already using it to create novel visualizations (Céol and Müller, 2015). Documentation describing how to create plug-ins for IGB is available from the IGB developer’s guide.

3 Future directions

IGB offers users powerful utilities for viewing, analyzing, and interacting with data within an environment that feels fast, flexible, and highly interactive. In future, we plan to make IGB’s Quickload data sharing system easier to use by providing tools for building Quickload sites from within IGB, along with a Quickload registry for users to publicize their sites. In addition, we will continue developing and improving the IGB plug-in APIs, providing documentation and example plug-ins to demonstrate how developers can use IGB as a platform to support genomics research.

Acknowledgements

We thank the many developers, testers, and designers who contributed to IGB, some of whom include Gregg Helt, Michael Lawrence, Lance Frohman, John Nicol, Hiral Vora, Alyssa Gulledge, Fuquan Wang, David Nix, Ido Tamir, Vikram Bishnoi, Anuj Puram, Richard Linchangco, Mason Myer, Katherine Kubiak, Zhong Ren, Tarun Kanaparthi, Kyle Suttlemyre, and Tarun Mall.

Funding

This work was supported by the National Institutes of Health [R01GM103463 to A.L.].

Conflict of Interest: none declared.

References

Bederson, B.B. and Boltman, A. (1999) Does animation help users build mental maps of spatial information? In, Information Visualization. (Info Vis '99) Proceedings. IEEE Symposium. p. 28-35.
Céol, A. and Müller, H. (2015) The MI Bundle: Enabling Network and Structural Biology in genome visualization tools. Bioinformatics.
Cockburn, A., Karlson, A. and Bederson, B.B. (2009) A review of overview+detail, zooming, and focus+context interfaces. ACM Comput. Surv. 41(1):1-31.
Durbin, R. and Thierry-Mieg, J. (1991) A C. elegans Database. In. Documentation, code and data available from anonymous FTP servers at lirmm.lirmm.fr, cele.mrc-lmb.cam.ac.uk and ncbi.nlm.nih.gov.
Goecks, J., Nekrutenko, A. and Taylor, J. (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11(8):R86.
Goff, S.A., et al. (2011) The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Front Plant Sci 2:34.
Hara-Chikuma, M. and Verkman, A.S. (2005) Aquaporin-3 functions as a glycerol transporter in mammalian skin. Biology of the cell / under the auspices of the European Cell Biology Organization 97(7):479-486.
Harris, N.L. (1997) Genotator: a workbench for sequence annotation. Genome Res 7(7):754-762.
Helt, G.A., et al. (2009) Genoviz Software Development Kit: Java tool kit for building genomics visualization applications. BMC bioinformatics 10:266.
Kadaja, M., et al. (2014) SOX9: a stem cell transcriptional regulator of secreted niche signaling factors. Genes Dev 28(4):328-341.
Kalari, K.R., et al. (2012) Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations. Frontiers in Oncology 2:12.
Kapranov, P., Sementchenko, V.I. and Gingeras, T.R. (2003) Beyond expression profiling: next generation uses of high density oligonucleotide arrays. Briefings in functional genomics & proteomics 2(1):47-56.
Kent, W.J., et al. (2002) The human genome browser at UCSC. Genome Res 12(6):996-1006.
Krueger, F. and Andrews, S.R. (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27(11):1571-1572.
Liu, G., et al. (2003) NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res 31(1):82-86.
Loraine, A.E. and Helt, G.A. (2002) Visualizing the genome: techniques for presenting human genome data and annotations. BMC bioinformatics 3:19.
Nicol, J.W., et al. (2009) The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 25(20):2730-2731.
Shannon, P., et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498-2504.
Thorvaldsdottir, H., Robinson, J.T. and Mesirov, J.P. (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14(2):178-192.
Yelagandula, R., et al. (2014) The histone variant H2A.W defines heterochromatin and promotes chromatin condensation in Arabidopsis. Cell 158(1):98-109.
Zhang, Y., et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9):R137.