Database, 2019, 1–11 doi: 10.1093/database/baz120 Original article

Original article RPGeNet v2.0: expanding the universe of retinal disease interactions network Rodrigo Arenas-Galnares1,2,‡, Sergio Castillo-Lara1,2,‡, Vasileios Toulis1,2,3, Daniel Boloc4, Roser Gonzàlez-Duarte5, Gemma Marfany1,2,3,5,†,* and Josep F. Abril1,2,†,* 1Department of Genetics, Microbiology and Statistics, University of Barcelona, Barcelona, 08028, Catalo- nia, Spain, 2Institute of Biomedicine (IBUB), University of Barcelona, Barcelona, 08028, Catalonia, Spain, 3CIBERER, ISCIII, University of Barcelona, Barcelona, 08028, Catalonia, Spain, 4Faculty of Medicine, University of Barcelona, Barcelona, 08036, Catalonia, Spain and 5DBGen Ocular Genomics, Barcelona, 08028, Catalonia, Spain *Corresponding author. Tel: +34 93 403 1305; Email: [email protected] †The authors wish it to be known that last two authors should be regarded as joint senior authors. ‡The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors. Citation details: Arenas-Galnares,R., Castillo-Lara,S., Toulis,V. et al. RPGeNet v2.0: expanding the universe of retinal disease gene interactions network. Database (2019) Vol. 2019: article ID baz120; doi:10.1093/database/baz120

Received 6 March 2019; Revised 9 September 2019; Accepted 13 September 2019

Abstract RPGeNet offers researchers a user-friendly queriable tool to visualize the interactome net- work of visual disorder , thus enabling the identification of new potential causative genes and the assignment of novel candidates to specific retinal or cellular pathways. This can be highly relevant for clinical applications as retinal dystrophies affect 1:3000 people worldwide, and the causative genes are still unknown for 30% of the patients. RPGeNet is a refined interaction network interface that limits its skeleton network to the shortest paths between each and every known causative gene of inherited syndromic and non-syndromic retinal dystrophies. RPGeNet integrates interaction information from STRING, BioGRID and PPaxe, along with retina-specific expression data and associated genetic variants, over a Cytoscape.js web interface. For the new version, RPGeNet v2.0, the database engine was migrated to Neo4j graph database manager, which speeds up the initial queries and can handle whole interactome data for new ways to query the network. Further, user facilities have been introduced as the capability of saving and restoring a researcher customized network layout or as novel features to facilitate navigation and data projection on the network explorer interface. Responsiveness has been further improved by transferring some functionality to the client side.

Introduction genes (1). The prevalence of IRDs is 1:3000 worldwide, Inherited retinal dystrophies (IRDs) comprise a highly het- which make these blinding disorders a health relevant tar- erogeneous group of disorders caused by over 200 causative get. The implementation of massive sequencing approaches

© The Author(s) 2019. Published by Oxford University Press. Page 1 of 11 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. (page number not for citation purposes) Page2of11 Database, Vol. 2019, Article ID baz120 has greatly facilitated genetic testing and, as a result, the number of IRD genes and mutations is constantly increas- ing. Nonetheless, a substantial number of cases remain to be accurately diagnosed, as the average yield in IRD genetic diagnosis is roughly 50%. Besides technical limitations, one of the bottlenecks in massive sequence-based molecular diagnosis is that most identified variants are previously unreported missense changes of unknown pathogenicity, either in known causative genes or in previously unreported candidates. These variants may be deemed as pathogenic or probably damaging by in silico predictive programs but end up classified as genetic variants of unknown significance, since there are not functional analyses to support their pathogenicity and their relationship to retinal physiology is yet to be determined. Molecular medicine based on gene and networks is rapidly expanding since most disease-causing genes often work together, either forming a protein complex or par- ticipating in the same signalling pathways. In contrast to the analysis of isolated genes, finding the networks that link disease candidate genes provides supporting data for the identification of new causative genes, functional clues for assaying putative pathogenic novel variants, as well as opens new scenarios to identify key therapeutic targets. Comprehensive tools to navigate through gene func- tions, cellular pathways and pathogenicity begin to emerge, particularly for cancer research. Although a considerable amount of genetic and functional data on IRDs genes and mutations has been gathered, there are not many Figure 1. Data integration workflow to build the RPGeNet core database. user-friendly searchable tools to make a network map of The final graphical web interface (bottom panels) depends on a series of data integration steps that provide the interactions and nodes to gene/ interactions. To fill this gap, a web applica- the main neo4j database engine. Each component of the workflow is tion, RPGeNet (2), was implemented that integrated all the described in detail on the main text. physical and genetic interactions at that time for a subset of IRD genes (retinitis pigmentosa and Leber congenital on the so-called skeleton network). The reduced network amaurosis), obtained from different databases and includ- allows users to better identify genes interconnecting drivers ing additional data such as tissue-specific expression. The that can be directly associated with retinal dystrophies expansion of genetic data, plus all the interactome and by searching through the shortest paths in the skeleton other omics information gathered of late, has prompted us graph (see workflow schema on Figure 1), instead of the to review and expand the initial 100 genes to more than immense number of interactions found in the complete 200 retinal dystrophy genes, to include distilled interaction whole network. RPGeNet does, however, allow users to databases and to implement an improved network genera- expand beyond the skeleton subnetwork if needed. To make tion, management, and visualization interface. searches more practical, RPGeNet expands the network by levels until recreating the entire known interactions graph (whole graph). Each level is an expansion in the parents and Materials and Methods children of the nodes in the previous level (see Figure 2), and RPGeNet is a tool to assist in the search for potential its newly added nodes tend to be either less significant to the candidate genes and pathways associated with retinitis disease the higher the level you go or poorly studied genes pigmentosa and intended to be used by both research with few known interactions. and genetic diagnostic purposes. RPGeNet refines the vast RPGeNet now has three distinct types of queries avail- interaction network by reducing it to the shortest paths able to undergo such disease-specific pathways characteri- between known driver genes of the disease (from now zation. The first query, as in the original RPGeNet, ‘is what Database, Vol. 2019, Article ID baz120 Page 3 of 11

Figure 2. Visual representation of the expansion of the core interaction graph. The graph builder begins with the construction of the skeleton graph, represented by all the nodes on the leftmost panel. The skeleton is created by finding the shortest direct interaction path (the red arrows) between all the known driver genes—drawn here using the same shapes as in RPGeNet Network Explorer (star, square and diamond shapes, based on whether their mutations cause syndromic diseases or not). The graph is then expanded into level one (represented as the orange panel) by adding all of the parents and children (straight lines) of the nodes already found in the skeleton graph. Level-specific interactions are shown as curved connections linking nodes within a given graph level. The expansion is repeated until the highest level is reached and all known genes with known interactions have been connected to the interactions core graph. The remaining genes that do not have any known interaction that connects them with the core graph are included in the whole graph level. Some of those genes may have interactions with other genes found only in the whole graph level (and many of those interactions are self-references). The bottom table from this figure compares the number of nodes and edges added on each level (‘New Nodes’ and ‘New Edges’ rows), as well as the total number of nodes and edges accumulated. interacts directly with the gene of interest?’ The user sets to characterize the shortest paths from a gene of interest, one or more gene of interest, and the output is a graph with not yet related to a disease, to one of the network levels the gene or genes of interest connected with all the genes already defined over the driver genes. The new database that interact directly. This graph can then be expanded, and engine facilitates that kind of query on the whole graph of its layout can be manually curated (Figure 3). The second interactions with respect to any of the nodes on the skeleton query is ‘what are all the shortest paths between two genes path and upper levels. Like in the previous query, a user can of interest?’ Two genes are provided, and a list of all the further explore a retrieved pathway from results listing on pathways between those two genes is returned (Figure 4). Network Explorer. However, as edges in the pathway are directional, only pathways driving from the first requested gene to the second Changes to database one are listed. To find possible pathways in the opposite direction, if any, the user should redo the query just switch- Driver genes ing the order of the query genes. A pathway can then be RPGeNet sources its driver genes from online database transferred to the Network Explorer where the user RetNet (1), which has a collection of over 300 mapped can further explore other nodes extending to the retrieved loci that are clinically validated with known Mendelian shortest pathway. A third query was included: ‘is a given mutations that can cause a retinal disease in . Of the gene connected to any disease associated gene?’ The idea is over 300 mapped loci, only the 276 identified genes, from Page4of11 Database, Vol. 2019, Article ID baz120

Figure 3. The renewed Network Explorer interface of RPGeNet. In this example, the Network Explorer interface shows a subgraph containing all the genes that directly interact with CERKL within the skeleton subnetwork, after expanding nodes for PPM1A, VHL and MICAL3 (those selected four nodes highlighted in green, driver genes border in purple, node colors based on the ‘ABSOLUTE’ gene-expression data). Some improvements to the interface can be appreciated: a ‘search’ gene add-on at the mid-bottom, an expression data set selection drop-down menu at the right-top corner of the network visualization canvas, as well as the ‘undo’/‘redo’ buttons. A more dynamic ‘buttons panel’ on the right facilitates the interaction with the network data. Finally, the coloured-by-type interactions also provide directionality information with arrow heads and have reliability-score proportional widths adjusted to the number of evidences supporting them. This figure can be reproduced on the Network Explorer if users upload the Supplementary File 1. now on referred to as driver genes, were chosen to build the protein-to-protein interactions and genetic interaction data skeleton network. That means that the database now han- for about 61 species. BioGRID includes interactions of dles 166 more driver genes than in the previous version. The artificially induced trans-species. All non- interac- increase of driver genes comes with many changes to the tions, including interactions that had human proteins/genes network including a more connected whole network and that interacted with proteins/genes of another species, an earlier saturation of the interaction subnetworks when were filtered out. The protein-to-protein interactions are considering parent and child nodes growing from the skele- considered physical interactions, while the genetic interac- ton network (see Supplementary Table 1 for a summary of tions may refer to both physical and genetic interactions. graph statistics at different RPGeNet levels). Previously, the Not all interactions had the physical/genetic label, but all subnetworks saturated at level four and now saturate at interactions had an experiment type, hence it was possible level three. That posed some limitations to the previous to deduce the interaction type from the experiment in implementation of the interface and the management of the such cases. Examples of physical interaction experiments network queries, which has been overcome with the new include affinity capture-luminescence, affinity capture-MS, release of RPGeNet. All gene identifiers included on the co-crystal structure, FRET and two hybrid. Examples of whole network were unaliased to the official HUGO Gene genetic interaction experiments are dosage growth defect, Nomenclature Committee (HGNC) reference symbol (3). dosage lethality and dosage rescue. All the filtering and curation were done by means of a Perl script that recovered for the whole network 15 139 nodes and 623 659 Data sources interactions from BioGRID (see Supplementary Tables 2, 3 BioGRID:(4)TheRPGeNet v2.0 database has been and 4, for a comparison of the contribution made by updated with the interaction data from version 3.5.171 each source database integrated into RPGeNet graph). of BioGRID. This database contains a compilation of The database, however, is filled with many undirected Database, Vol. 2019, Article ID baz120 Page 5 of 11

interest. PPaxe uses the random forest classifier algorithm, which is a machine learning method by which large col- lection of decorrelated decision trees are computed. PPaxe uses decision trees that combine different variables about the sentences it reads. One such variable that PPaxe con- siders is whether a verb describes the act of interaction or relationship. PPaxe was used to gather further interactions from scientific literature that were described and detected in published articles referred from PubMed. PPaxe replaces the sparser tool applied on the first release of RPGeNet with the added benefit of using machine learning to gather a larger number of interactions than possible by hand. PPaxe can work on abstracts and full-text articles; the first option gathers interactions solely from the abstract, but can process more articles because abstracts are gener- ally free to read; the second option looks for interactions from entire articles, which implies a smaller set of articles. Each option was used in two separate searches: the first Figure 4. Example of a pathways list returned using the shortest path- search was built to process PubMed papers that contained way RPGeNet query. CERKL and CSPP1 where used to start the pathway any of the 276 driver genes (65 820 abstracts and 29 819 search on the main RPGeNet form. Only the first three pathways of the full papers retrieved); the second search was scanning any 28 retrieved by that query are shown on this figure, all of them at the shortest path length of three (three edges and two nodes between the interaction related to retinitis pigmentosa (1124 abstracts chosen identifiers). By clicking on the corresponding ‘Explore Network’ and 502 full papers retrieved). All four PPaxe outputs were button on any of the listed pathways, users can easily jump to the combined into one set and were then filtered by score and by Network Explorer interface to work on the selected genes for that pathway. the putative gene/proteins found (Supplementary Figure 2). Any interactions not having a gene identifier on the HGNC official nomenclature database (3) were filtered out using interactions. When direction of interaction is unknown a Perl script. PPaxe does not infer yet directionality on or otherwise unstated, we assume bidirectionality. Our retrieved interactions, so that bidirectionality is assumed. interactions graph building program took interactions with A total of 3062 nodes and 13 584 interactions were col- unknown direction and duplicate the interaction in the lected for the RPGeNet core network (further details on reverse direction (AB will become ABandBA). For Supplementary Tables 2, 3 and 4). genetic interactions, the interactions were assumed to be unidirectional. STRING: (5) The network was updated with interac- Database manager tions from version 11.0 of STRING. The top five sources In order to handle a larger driver gene skeleton network for STRING were GRID, INTACT, KEGG, BIOCARTA and and an increased amount of interactions, as well as to REACTOME (Supplementary Figure 1). Not all the interac- facilitate new ways of querying the data elements for the tions from this database have experimental evidence to web interface, we had to resort to a more suitable database back them up and many interactions are predictions of manager. Neo4j (community 3.1.7) was chosen for that possible interactions. Because of this, any interaction not purpose because it is a graph-based database manager that supported by evidence were discarded when building the uses the property graph model to store and access the net- RPGeNet core network. STRING database includes tags work data efficiently using a set of graph function instead stating the directionality of the interaction; in the case that of simply storing information in tables, like those used in there is an interaction where the direction is not known, the traditional relational database managers as MySQL, etc (7). interaction is assumed to be bidirectional. It also includes This database manager is used by other interaction database a large list of non-human protein interactions; those were web applications, such as REACTOME and PlanNET,for filtered out as well. After the processing steps, 13 269 nodes storing and managing large interaction data (8, 9). The and 629 271 interactions were included from this database property graph model uses nodes (the elements to store (further details on Supplementary Tables 2, 3 and 4). attributes/data of an entity) and relationships (relevant con- PPaxe:(6) This text-mining tool can sift through aca- nections between nodes). Its native use of graph functions demic papers to find interactions for the user’s gene(s) of to query graph data makes neo4j an ideal system to Page6of11 Database, Vol. 2019, Article ID baz120 store and manage information for all the network levels of be projected into the network. For this expression set there RPGeNet, speeding up searches either complex or taking are several precomputed analyses available, like retina only larger numbers of nodes into account. absolute expression, retina fold-change versus all other tissues and fold-change with respect to liver on the same Web server improvements microarray experiment. However, the new implementation facilitates the integration of further expression datasets, Queries and performance some of them are already in progress, and we expect to As mentioned in the tool description, RPGeNet now has make them available soon. Another improvement made on three distinct queries available to help users in finding genes the network explorer interface is that charging expression or pathways of interest. Previously, RPGeNet would break data on the current visualized network can be done on the when trying to access interactions at higher subnetwork lev- fly, without having to repeat the query as it happened in els, but the new RPGeNet engine can now handle searching the previous version. and visualizing interactions from genes in the highest level with respect to the gene interest. The new database manager Other improvements on the web interface not only optimizes the searches and access of higher levels RPGeNet network explorer now has an ‘undo’ and a but also makes possible that the new queries implemented ‘redo’ buttons, recording up to five changes made to the in the new RPGeNet web interface were feasible and can be graph. The ‘undo/redo’ buttons facilitate exploring the done in a reasonable amount of time. network with the already available ‘add/remove’ buttons too. Another add-on to the visualization interface is a Data management and visualization ‘search’ bar that can look for any gene(s) and highlight The current RPGeNet upgrade facilitates navigation them in the displayed graph, which is especially helpful through all data available, making it more accessible and when working on large graphs in the network explorer. producing more informative results. Cytoscape.js (10) There were also improvements made to the ‘save image’ and was used to display the interactive graphs in RPGeNet ‘save graph’ buttons to reduce the number of steps required, (see Figure 3). The interactions between genes are now now the user is asked for the saving directory directly. One colour coded depending on what type of interaction main improvement has been introduced to the ‘save graph’ exists between them (blue, for genetic; red, for physical; button, which was initially only saving the nodes identity or black, for unknown edges). If multiple interaction but not the nodes distinct locations within the graph so, types exist between two genes, multiple arrows with when reuploading the graph to RPGeNet, the nodes were the corresponding interaction type colour will be drawn not necessarily laid out in the same way as in the previous between the two genes. The driver genes also have session. When importing a graph, now the nodes are laid distinctive shapes depending on whether they are associated out exactly as in the previous session, facilitating the storage with syndromic or non-syndromic retinitis pigmentosa, of manual rearrangements made by users across different which is particularly useful for genetic diagnosis. Finally, work sessions. clicking on a gene of interest provides users with further information about it from a pop-up panel like the one Discussion shown in Figure 5 (left panel). A basic summary of the gene Using open-source databases, like BioGRID and STRING, is given: all of the known aliases, related expression data, has the advantage that they are free and commonly used functional annotation in , the subnetwork within the scientific community. The problem with these level at which the gene is found within the RPGeNet large databases is that they are usually too large to serve network, the number of known variants and external links the community for specific necessities and need to be further to its GeneCards, UniProt, OMIM and RetNet pages to curated by researchers to distil the relevant biological net- access further information if needed. When users click on work data from noise. BioGRID has experimental evidence a given interaction, a complete information panel is also backing every interaction in their database. STRING,on provided that summarizes all the evidences supporting that the other hand, has many predicted interactions, which can edge as well as links to the external references if available be a good start for researchers interested in finding novel (see Figure 5, right panel). evidences for them. STRING and BioGRID share many of the same interactions; although the raw STRING database Gene expression layer does have more interactions due to the predictions and RPGeNet continues to use the NCBI GEO (11) entry the larger number of species in comparison to BioGRID. GSE7905 (12) as an example of expression data that can Despite STRING not having all of their interactions exper- Database, Vol. 2019, Article ID baz120 Page 7 of 11

Figure 5. An example of node (left) and interaction (right) information panels from the Network Explorer interface. The default behaviour ‘On click’ of the Network Explorer interface is to show ‘node properties’ (see topmost controls on the right panel of that interface on previous figure). From the network example of Figure 3, when clicking at the CERKL node the gene/protein information panel pops up to display a description of the gene, known aliases, a summary of its expression levels and functional annotation and links to external references. On the other hand, by clicking on an edge, VHL to CERKL in this example, the interaction panel pops up, providing information about the type of possible interactions (genetic, blue; physical, red; or ‘unknown’, black), as well as a series of tables containing details about the supporting evidences from the distinct sources, along with the corresponding external links to the reference databases and to the supporting evidences when possible.

imentally backed up, they do offer a much wider range coloured in black to distinguish from the other interac- of information for each interaction than does BioGRID. tions). PPaxe retrieves the PubMed ID (PMID) of the article Once the STRING dataset was processed, all interactions from which the interaction was found; those PMIDs are now that did not have experimental evidence were removed. available on RPGeNet, so that those who may be interested Since the interactions without evidences were excluded, the can figure out whether the ‘unknown’ interaction of interest new RPGeNet has a smaller whole graph than the previous describes a protein or a genetic interaction by jumping to version of RPGeNet that was also considering the predic- the corresponding PubMed entry. Regardless of the cons, tions. We now have 18 542 nodes and 1 218 032 edges— PPaxe is simpler to use and is able to retrieve interactions defining 613 319 non-redundant interactions—versus the without defining any syntactic pattern, unlike the previous 63 139 nodes and 1 688 656 edges in the core network of sparser method. the past version of RPGeNet. In relation to the updated core whole network, shortest On the other hand, using the PPaxe machine learning paths between two retinitis pigmentosa driver genes had software, we managed to find multiple interactions but distances from one to seven, where a distance of one means still required some post-filtering to ensure that all the that there is a direct interaction between two driver genes interactions found were indeed protein/genetic interactions. and any number above one is the number of genes in PPaxe cannot distinguish yet between a genetic and a between the two driver genes that made up the shortest protein interaction, so all PPaxe derived interactions in pathway. The shortest path distance for all 276 driver genes the network were labelled as ‘unknown’ interactions (and fall within three to four edges; in other words, there are two Page8of11 Database, Vol. 2019, Article ID baz120

Figure 6. Analysis of RPGeNet v1 and v2 networks connectivity with Circos (19). The figure compares the connectivity of the driver genes of the old RPGeNet v1.0 (left) and the updated RPGeNet v2.0 (right). The top pair of the plots shows all the shortest paths between driver genes at distance one, meaning direct interaction between each pair of driver genes. The bottom part of the plots provide the comparison at distance three, meaning there are two genes in the shortest path between a pair of driver genes of interest. It is clear from the visualized Circos plots that the updated RPGeNet database has a highly connected network. to three connecting genes between them (Figure 6). The sub- 4.44-fold and 26.24-fold increase from skeleton, respec- networks can be compared by the topology of each of the tively, but accounting for 96.27% of nodes and 76.54% level’s own graph (see Supplementary Table 1). The average of edges from whole graph). Such trend can be observed degree is 17.68 in the skeleton (∼8.84 in-/out-degree), on the corresponding in-/out-degree density graphs (see 104.458 in level one (∼52.23 in−/out-) and 131.49 in Supplementary Figure 3, as well as on the aforementioned level three (∼65.74 in-/out-). The large increase of average Supplementary Table 1). degree for level one mainly results from both adding new With an increased number of driver genes, it was nodes and basically much more interactions; obviously, net- expected for the RPGeNet core whole network to work saturates faster at nodes than at interactions (about swell immensely. Previously, the network saturated at Database, Vol. 2019, Article ID baz120 Page 9 of 11

Figure 7. A control case interaction visualized on RPGeNet showing the connections between NRL, NR2E3 and CRX retinal transcription factors. (A) RPGeNet was queried to display the subnetwork among NRL, NR2E3 for the nodes at distance one at level one; then nodes connected to CRX were added with the ‘Node Addition’ button activated on click. CRX, NRL and NR2E3 are three well-known transcription factors that co-regulate retinal-specific genes, among them RHO. Interestingly, several other genes that cause retinal dystrophies also appear in the network as target genes or other transcriptional regulators (border shown in purple). Nodes color-fill defined by the ‘RET-ALL’ gene-expression data. (B) Further trimming of the subnetwork obtained by omitting the nodes that are not chromatin remodelers or transcription factors provide an overview of relevant co- regulators of retinal genes. After deleting the corresponding nodes, further edges—out of the shortest paths that link the genes left—were shown by clicking on the ‘Connect Genes’ button at the control panel. This visualization allows pinpointing and exploring alternate pathways that connect the initial seeds; for instance, CRX and NRL were already connected, but longer paths are now evident like CRXPRKNHNRNPKNRL,whenPRKN and HNRNPK are linked. Another example can be the pathway found between NRL and NR2E3 on the path NRLSMAD4PIAS3NR2E3,when SMAD4 and PIAS3 are linked. Both panels from this figure can be reproduced on the Network Explorer if users upload the Supplementary Files 2 and 3, respectively.

subnetworks of level four meaning that there were no the fact that clear genetic communication between the more interactions within the network above that level. nucleus and the mitochondria is known (13), and that most The only genes not found within these four subnetwork proteins of the mitochondria are, in fact, encoded in the levels were genes that were unconnected to the rest of the autosomal DNA (14). Yet the proteins that are encoded network. The new RPGeNet network saturates earlier at in mitochondrial DNA are all important for the electron level three. There is a decrease in the number of total nodes transport chain and connect well with each other (15,16). and interactions in relation to the previous version, but There were also few autosomal genes that did not connect there is also a large number of supporting evidences for to the network but that may simply be because they do not interactions that did not exist in the previous network. have any known interactions at the moment or they have Most of the nodes and interactions no longer included are not been characterized at an experimental level in depth the result of stricter filtering of the data from the STRING (see Supplementary Tables 5 and 6 for a list of driver and database and improved anti-aliasing of node identifiers over non-driver genes, respectively, not connected to the core HGNC standard symbols; only interactions with evidence interactions network). were added to the network and predictions were left out. It The RPGeNet network was curated by reducing the is possible that these new nodes and interactions filled up network to the shortest paths between known driver missing gaps in the network, sketching new pathways to be genes, allowing users to better identify genes and pathways discovered. important in the development of retinitis pigmentosa. Even with an earlier saturation at level three, there Using RPGeNet, many potential candidate genes have were still driver genes that did not connect to any other been identified by inspecting the shortest paths found in member of the network. Some of the unconnected driver the skeleton graph. One of the candidate genes identified genes within the network were mitochondrial genes, like in the skeleton, SIRT1, has recently been experimentally MT-ND4, MT-TP and MT-TS2. This may be because there confirmed to interact with CERKL. More importantly, it is not enough research on interactions between mitochon- was found that CERKL regulates autophagy via SIRT1 drial genes/proteins and autosomal genes/proteins despite (17). This discovery supports SIRT1 as a new driver gene Page 10 of 11 Database, Vol. 2019, Article ID baz120 of retinitis pigmentosa and confirms the utility of the Conflict of interest RPGeNet model to identify potential IRD candidate genes. G.M. and R.G-D. are co-founders and assessors of DBGen, a spin-off Furthermore, RPGeNet allows to visualize and to highlight of the Universitat de Barcelona dedicated to the genetic diagnosis of new connections even in known interaction networks. visual disorders. The subnetwork retrieved after querying for three retinal- The authors declare that there is no competing interest. specific transcription factors (shortest path between NRL and NR2E3, plus addition of CRX) allows showing their References connection to other causative retinal dystrophy genes (Figure 7A). In addition, such subnetwork can be easily 1. Daiger,S.P., Sullivan,L.S. and Bowne,S.J. RetNet, the retinal information network. https://sph.uth.edu/retnet/. trimmed by omitting ‘noisy’ nodes to focus on particular 2. Boloc,D., Castillo-Lara,S., Marfany,G. et al. (2015) Distilling interactors. In this case, deletion of the nodes unrelated a visual network of retinitis pigmentosa gene–protein inter- to transcriptional regulation and chromatin remodelers actions to uncover new disease candidates. PLoS One, 10, unveils new regulatory loops between these transcription e0135307. doi:10.1371/journal.pone.0135307. factors that may be relevant for retinal development and 3. Braschi,B., Denny,P., Gray,K. et al. (2019) Genenames.org:the maintenance (Figure 7B). HGNC and VGNC resources in 2019. Nucleic Acids Res., 47, There are plans on the way to create a mouse and D786–D792. zebrafish RPGeNet specific interaction networks, as they 4. Chatr-Aryamontri,A., Oughtred,R., Boucher,L. et al. (2017) The BioGRID interaction database: 2017 update. Nucleic Acids are the two most used model organisms for research on Res., 45, D369–D379. retinal dystrophies, and later on to integrate them with the 5. Szklarczyk,D., Morris,J.H., Cook,H. et al. (2017) The STRING human network currently available. The newer RPGeNet database in 2017: quality-controlled protein–protein associa- graph engine will facilitate clustering gene nodes against a tion networks, made broadly accessible. Nucleic Acids Res., 45, separate network layer based on disease nodes; for instance, D362–D368. as described in Lázaro-Guevara et al.(18). New retinal 6. Castillo-Lara,S. and Abril,J.F. (2019) PPaxe:easyextractionof differential gene-expression data from new RNA-seq and protein occurrence and interactions from the scientific literature. Bioinformatics, 35(14), 2523–2524. proteomic experiments is under analysis and will be added 7. Robinson,I., Webber,J. and Eifrem,E. (2014) Graph Databases: soon. We are also working on an automated pipeline to New Opportunities for Connected Data. O’Reilly Media, Inc., automate the protocol used to create RPGeNet, so it would Sebastopol, CA, USA be easier to keep it up-to-date as well as to expand the 8. Fabregat,A., Jupe,S., Matthews,L. et al. (2018) The procedure to generate specific interaction networks for Reactome pathway knowledgebase. Nucleic Acids Res., 46, other rare diseases. D649–D655. 9. Castillo-Lara,S. and Abril,J.F. (2018) PlanNET: homology- based predicted interactome for multiple planarian transcrip- Availability tomes. Bioinformatics, 34, 1016–1023. 10. Franz,M., Lopes,C.T., Huck,G. et al. (2016) Cytoscape.js:a RPGeNet is an open-source refined interaction network graph theory library for visualisation and analysis. Bioinformat- for retinitis pigmentosa. The RPGeNet website provides ics, 32, 309–311. data description and a complete tutorial. Visit RPGeNet at 11. Barrett,T., Wilhite,S.E., Ledoux,P. et al. (2013) NCBI GEO: https://compgen.bio.ub.edu/RPGeNet. archive for functional genomics data sets—update. Nucleic Acids Res., 41(Database Issue), D991–995. 12. Dezso,Z.,˝ Nikolsky,Y., Sviridov,E. et al. (2008) A comprehensive Supplementary data functional analysis of tissue specificity of human gene expres- sion. BMC Biol., 6, 49. Supplementary Data are available at Database Online. 13. Brandvain,Y. and Wade,M.J. (2009) The functional transfer of genes from the mitochondria to the nucleus: the effects of selection, mutation, population size and rate of self-fertilization. Acknowledgements Genetics, 182, 1129–1139. This work was supported by research grants from Spanish Min- 14. Berg,O.G. and Kurland,C.G. (2000) Why mitochondrial genes istry of Economy (BFU2017-83755P) and Generalitat de Catalunya are most often found in nuclei. Mol. Biol. Evol., 17, (2017-SGR-1455) to J.F.A.; and Ministerio de Economía y Com- 951–961. petitividad/FEDER (SAF2016-80937-R), Generalitat de Catalunya 15. Anderson,S., Bankier,A.T., Barrell,B.G. et al. (1981) Sequence (2017 SGR 738) and La Marató TV3 (Project Marató 201417-30- and organization of the human mitochondrial genome. Nature, 31-32) to G.M. and R.G.D. S.C.-L. is a fellow of the Catalan Gov- 290, 457–465. ernment ‘AGAUR’ (FI-FDR, 2017FI_B_00191). V.T. is a fellow of the 16. Satoh,M. (1991) Organization of multiple nucleoids and DNA MINECO (BES-2014-068639, Ministerio de Economía, Industria y molecules in mitochondria of a human cell. Exp. Cell Res., 196, Competitividad). 137–140. Database, Vol. 2019, Article ID baz120 Page 11 of 11

17. Hu,X., Lu,Z., Yu,S. et al. (2019) CERKL regulates autophagy 19. Krzywinski,M., Schein,J., Birol,I. et al. (2009) Circos: an infor- via the NAD-dependent deacetylase SIRT1. Autophagy, 15, mation aesthetic for comparative genomics. Genome Res., 19, 453–465. 1639–1645. 18. Lázaro-Guevara,J.M., Flores-Robles,B.J., Garrido,K. et al. (2018) Gene’s hubs in retinal diseases: a retinal disease network. Heliyon, 4, e00867. doi:10.1016/j.heliyon.2018.e00867.