www.nature.com/scientificdata

OPEN ACO2 clinicobiological dataset with Data Descriptor extensive phenotype ontology annotation Khadidja Guehlouz1, Thomas Foulonneau2, Patrizia Amati-Bonneau2,3, Majida Charif2,4, Estelle Colin2,3, Céline Bris2,3, Valérie Desquiret-Dumas2,3, Dan Milea5, Philippe Gohier1, Vincent Procaccio2,3, Dominique Bonneau2,3, Johan T. den Dunnen 6, Guy Lenaers 2, Pascal Reynier 2,3 & Marc Ferré 2 ✉

Pathogenic variants of the aconitase 2 (ACO2) are responsible for a broad clinical spectrum involving optic nerve degeneration, ranging from isolated optic neuropathy with recessive or dominant inheritance, to complex neurodegenerative syndromes with recessive transmission. We created the frst public locus-specifc database (LSDB) dedicated to ACO2 within the “Global Variome shared LOVD” using exclusively the Human Phenotype Ontology (HPO), a standard vocabulary for describing phenotypic abnormalities. All the variants and clinical cases listed in the literature were incorporated into the database, from which we produced a dataset. We followed a rational and comprehensive approach based on the HPO thesaurus, demonstrating that ACO2 patients should not be classifed separately between isolated and syndromic cases. Our data highlight that certain syndromic patients do not have optic neuropathy and provide support for the classifcation of the recurrent pathogenic variants c.220C>G and c.336C>G as likely pathogenic. Overall, our data records demonstrate that the clinical spectrum of ACO2 should be considered as a continuum of symptoms and refnes the classifcation of some common variants.

Background & Summary Aconitate hydratase (ACO2; EC# 4.2.1.3) is a ubiquitous human mitochondrial monomeric enzyme composed of 780 amino acids. It catalyses the second reaction of the citric acid cycle by isomerising citrate to isocitrate1. It is encoded by the aconitase 2 gene (ACO2; MIM# 100850) that extends over 35 kb on 22q13 and includes 18 translated exons2. Biallelic pathogenic variants of this gene have been associated with infantile cerebellar-retinal degeneration (ICRD; MIM# 614599), a severe neurodegenerative disorder with optic neuropa- thy beginning in childhood and including retinal dystrophy, cerebellar ataxia, seizure, strabismus, axial hypoto- nia and athetosis3. Isolated optic neuropathies have also been associated with ACO2 pathogenic variants, either with recessive (locus OPA9; MIM# 616289)4 or dominant inheritance, as recently reported5. A recent review of the literature has listed a total of 26 individuals reported from 15 families with the syndromic form since 2012, while only seven individuals from four families have been reported with the recessive form of isolated optic neu- ropathy6. More specifc clinical presentations, possibly without optic neuropathy, have also extended the clinical spectrum, such as a form of hereditary spastic paraplegia reported in two siblings7. Te recent description by our team of 116 additional cases belonging to 94 families brings the total to over a hundred cases listed in the literature to date5. Te high frequency of dominant ACO2 mutations in our molecular diagnosis experience of optic neuropa- thies and the diversity of the neurological involvement led us to develop a reliable database dedicated to ACO2 as part of the Global Variome shared Leiden Open-source Variation Database (LOVD) installation8,9. Indeed,

1Département d’Ophtalmologie, Centre Hospitalier Universitaire d’Angers, Angers, France. 2Unité Mixte de Recherche MITOVASC, CNRS 6015/INSERM 1083, Université d’Angers, Angers, France. 3Département de Biochimie et Génétique, Centre Hospitalier Universitaire d’Angers, Angers, France. 4Genetics, and immuno-cell therapy Team, Mohammed First University, Oujda, Morocco. 5Singapore National Eye Centre, Singapore Eye Research Institute, Duke-NUS, Singapore. 6Human Genetics and Clinical Genetics, Leiden University Medical Centre, Leiden, The Netherlands. ✉e-mail: [email protected]

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 1 www.nature.com/scientificdata/ www.nature.com/scientificdata

we recently developed such a database to list the genetic variants and clinical presentations of OPA1-related dis- orders, the major cause of dominant optic neuropathies with either isolated (80%; MIM# 165500) or syndromic (20%; MIM# 125250) presentations, and with some rare biallelic cases afected by early severe syndromes (MIM# 605390)10. Te interoperability and the use of an international clinical thesaurus11 should, therefore, make it possible to progressively improve understanding of how this increasing number of responsible for optic neuropathies contributes to the diversity of the pathophysiological mechanisms responsible for their common clinical phenotype. In this article, we describe the construction of this ACO2 dataset, listing all the patients referenced in the liter- ature, and drawing statistical information on gene variants and clinical diversity. Methods Nomenclature. All names, symbols, and OMIM numbers were checked for correspondence with current ofcial names indicated by the Organization (HUGO) Gene Nomenclature Committee12 and the Online Mendelian Inheritance in Man database (OMIM)13. Te phenotype descriptions are based on the standardised Human Phenotype Ontology (HPO)11, indicating the HPO term name and identifer. ACO2 var- iants are described according to both the NCBI genomic reference sequence NG_032143.1 and transcript ref- erence sequence NM_001098.2, including 18 exons encoding a of 780 amino acids reference sequence NP_001089.114. Te numbering of the nucleotides refects that of the cDNA, with “+1” corresponding to the “A” of the ATG translation initiation codon in the reference sequence, according to which the initiation codon is codon 1, as recommended by the version 2.0 nomenclature of the Human Genome Variation Society (HGVS; http://varnomen.hgvs.org)15. Information concerning changes in RNA levels has been added from the original papers or predicted from DNA mutations if not experimentally studied. Following the HGVS guidelines, deduced changes are indicated between brackets. Protein domains were predicted according to InterPro version 79.016 and Pfam version 32.017.

Data collection. Since no locus-specifc database (LSDB) dedicated to the ACO2 gene previously existed (http://www.hgvs.org/locus-specifc-mutation-databases, accessed on January 12, 2021), this work was done from scratch. Te causative variants were collected from the literature published to date (January 2021)3–7,18–24 using the NCBI PubMed search tool25 with the keyword “ACO2,” and from the classifcations of diagnostic laboratories in the Netherlands who recently decided to share them publicly (so-called the VKGL initiative)26. Te posi- tions of variants in the reference transcript were determined according to the HGVS nomenclature version 2.015. Correct naming at the nucleotide and amino acid levels were verifed, and reestablished when necessary, using the Mutalyzer 2.0.32 Syntax Checker27. All clinical descriptions of the dataset have been strictly and exhaustively translated using exclusively the HPO vocabulary, i.e. each phenotype mentioned having successfully matched an HPO term (or one of the synonyms associated with).

Integrity of the dataset. Te work was coordinated by a single ophthalmologist to ensure consistency, with the help of our clinical and research team specializing in hereditary optic neuropathies, which performs moni- toring of literature on the subject for several years. No data was rejected, the corresponding molecular biologist or ophthalmologist were contacted when clarifcation was required. Finally, the consistency and integrity of the entire dataset was validated by the curator of the database, who is a referent specialist, before the technical valida- tion that followed. (Please also refer to the section Author contributions.)

Implementation of the dataset. The ACO2 dataset belongs to the Global Variome shared Leiden Open-source Variation Database (LOVD) currently running under LOVD v.3.0 Build 239, following the guide- lines for LSDBs28 and hosted under the responsibility of the Global Variome/Human Variome Project8. Te dataset reviews clinical and molecular data from patients carrying ACO2 variants published in peer-reviewed literature, as well as unpublished contributions that are directly submitted.

Data classifcation. Te criteria of pathogenicity, which depend upon the clinical context and molecular fndings, are stated under the headings: “ClassClinical” for the classifcation of the variant based on standardised criteria, and “Afects function (as reported)” and “Afects function (by curator)” respectively for the pathogenicity reported by the submitter and concluded by the curator (Fig. 1d). As several patients can be registered with difer- ent reported pathogenicity or new patients with existing variants added to the database, the status of the variants is reassessed on the basis of the data submitted and stated in the “SUMMARY record.”

Dataset and analysis. Starting with version ACO2:200608 of the ACO2 LSDB (last updated on June 8, 2020), we produced a dataset. To carry out the statistical analysis, the HPO terms have been checked and pre- pared using the suite of R packages ontologyX29 within R version 4.0.030 to read in the OBO fle version hp/ releases/2020-06-0811. Hierarchical clustering is performed using the hclust function from the R-Core package30. Data Records Te ACO2 dataset is available on the Code Ocean cloud-based computational reproducibility platform as a Code Ocean “compute capsule31”: snapshot of the ACO2 dataset as of June 8, 2020, /data/LOVD_full_down- load_ACO2_2020-06-21_18.03.52.txt (LOVD fat fle format); the Human Phenotype Ontology data-version hp/ releases/2020-06-08, /data/hp.obo.2020-06-08.txt (OBO fat fle format), which is also available in fgshare32. Te updated dataset is accessible on the Global Variome shared LOVD server (https://www.lovd.nl/ACO2; or through the Mitochondrial Dynamics variation portal: https://mitodyn.org). Te data can also be retrieved via an appli- cation programming interface (API), i.e. a web service allowing simple queries and retrieval of basic gene and variant information (documentation available on the web page of the database)9; as well as serving as a public

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 2 www.nature.com/scientificdata/ www.nature.com/scientificdata

Fig. 1 Sample recording for a given patient in the ACO2 dataset. (a) individual items; (b) phenotype items; (c) screening items; and (d) molecular items (some uninformative lines were removed to save space). Abbreviations and legends of the felds are given by following the link “Legend” on the web page of each table; SEQ-NG: next generation sequencing; M: male. Data as of June 8, 2020.

beacon in Te Global Alliance for Genomics and Health Beacon Project33. General information is available on the database home page. Te process for submitting data begins by clicking the “Submit” tab. Te ACO2-LOVD dataset contains four main interconnected tables. Tese tables are visible on a typical web page entry as shown in Fig. 1. Te “Individual” table contains details of the patient examined, including gender, geographic origin, and patient identifcation, if applicable (Fig. 1a). Te “Phenotype” table indicates the clinical phenotypic features, described according to the root of the Phenotypic abnormality subontology (HPO# HP:0000118; Fig. 1b). Te “Screening” table gives details of the methods and techniques used for inves- tigating the structural variants and the tissue analysed (Fig. 1c). Te “Variants” table includes information about the sequence variations at the genomic (DNA) and the transcript variant (cDNA) levels, as well as the reported and concluded status for each variant (Fig. 1d). To date, the dataset records 123 patient records and 126 unique variants (Online-only Table 1). Since the ACO2 dataset is built on the same central platform (to allow interoperability) and on the same model (for ease of handling) as our previous OPA1 gene database, the description of these data records extends our related work10, but the data recorded relates to a new gene and is entirely diferent. Technical Validation Molecular relevance. Te dataset contains 126 unique variants, of which 73% (92) are considered patho- genic sequence variants with almost two thirds in a dominant condition and the last third recessive, 7% (9) of unknown signifcance and 20% (25) benign (Online-only Table 1). Te variants considered pathogenic, which afect the coding sequence and exon-intron boundaries of the gene (Fig. 2a), are particularly overrepresented in the aconitase C-terminal domain of the protein (spanning from end of exon 14 to half of exon 17), highlighting the importance of this domain in ACO2 functions (Fig. 2b), as well as the aconitase domain since it spans more than half of the protein sequence (from beginning of exon 5 to beginning of exon 13). Only one mutation con- sidered pathogenic (intron 1 splicing site) and two benign (exon 2) are localized in the N-terminal presequence that is cleaved upon import of ACO2 into the mitochondrial matrix, predicted in the frst 28 to 35 amino acids34. Among the most frequently observed pathogenic efects on ACO2, 66% are missense variants; 14% are associated with altered splicing, which produces efects that are difcult to assess experimentally; 11% are frameshif variants leading to a premature protein truncation; 8% are nonsense variations; one variant is a deletion of a single amino acid (Fig. 2c). Although only a few mutations are recurrent, two have been signifcantly more frequently reported, both located in exon 3 (Fig. 2a) with a recessive mode of inheritance: the c.220C>G variant, which induces a mis- sense mutation p.(Leu74Val), has been reported 19 times afecting one allele (of which 14 belongs to compound heterozygous genotypes); the c.336C>G variant, which induces a missense mutation p.(Ser112Arg), has been reported 10 times afecting both alleles (homozygous).

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 3 www.nature.com/scientificdata/ www.nature.com/scientificdata

a Aconitase domain Aconitase C-terminal domain 14 1

12

10

8

Count 13 6 + 1 10 1 9 1 4 2 6 5 5 5 5 2 4 2 4 3 133 3 2 1 1 1 0 Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Exonic/ Intronic mutations Location

b c Deletion n i q u e v a r i a n t s 1% U

n A ai c on om i d ta Nonsense l s a ted va e por rian in e ts d 8% R o m r m e t a - i Frameshift 31.5% n C with premature e 27% ino ac s m ids termination a A t i 11% n 17% o

c

A

45%

49%

56% Splicing defect

27%

Missense

14%

66%

28%

c

i

f

i

c

e

p 19.5%

s -

n

o

N

Fig. 2 Distribution of the ACO2 variants classifed as pathogenic or likely pathogenic. (a) Distribution of the 89 unique variants. Exons involved in the variants are shown as blue bars; the variants in the intronic neighborhood of the exons are shown as red bars; the location of the two variants signifcantly more frequently reported being indicated by the “+” symbol. (b) Comparison for each region of its size on the sequence (Amino acids), of the distribution of the variants reports in the dataset (i.e. by counting each of the reported case of the same variant; Reported variants), and of the distribution of the unique variants (i.e. by counting only once several reported cases of the same variant). (c) Distribution of the diferent efects on the protein of the ACO2 variants considered pathogenic. Data as of June 8, 2020.

Te Global Variome shared LOVD server has integrated the data from Te Genome Aggregation Database (gnomAD), which is the aggregation of the high-quality exome (protein-coding region) DNA sequence data for about 150,000 individuals35. It was decided to indicate for each variant its frequency in gnomAD rather than adding it as a new record, in order not to swamp the locus-specifc databases (LSDBs) with data not associated to a phenotype. Tis information is mainly used to assist the curator in estimating the relevance of the classifcation of variants. In total, only two of the unique variants in our dataset (less than 2%) have assigned a “likely patho- genic” status with a frequency slightly below 1% in gnomAD. Interestingly, one of these, the previously mentioned c.220C>G variant, has a frequency from 0.2 to 0.4% (depending on versions of gnomAD) which does not allow its defnitive classifcation as rare or frequent; it is registered in dbSNP (Build 153; dbSNP# rs141772938)36 refer- ring to ClinVar (record last updated May 9, 2020; ClinVar# VCV000189310.8)37 for an unclear interpretation (“Conficting interpretations of pathogenicity”). Tis last-mentioned heterozygous variant has been reported 19 times by at least fve independent sources in our dataset, of which 14 are compound heterozygous disease cases. In addition, variant c.220C>G has so far not been reported as a third variant in a case with two other pathogenic variants. Tese observations provide strong evidence that has allowed us to classify this missense variant as “likely pathogenic (recessive)”, strengthening the importance of the LSDB approach and data sharing to support variant classifcation and DNA diagnostics.

Clinical relevance. Te dataset includes 123 patient records (71 males, 49 females, and three records of unspecifed gender). Among these, 99 patients had isolated optic neuropathy4–6,18, 8 had infantile cerebellar-retinal degeneration (ICRD; MIM# 614559)3, 8 had various forms of encephalopathy (including epileptic encephalopathy

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 4 www.nature.com/scientificdata/ www.nature.com/scientificdata

Autosomal recessive inheritance Autosomal dominant inheritance Syndromic forms Patients reported with an isolated or predominant optic neuropathy SPG00225621 SPG00225632 RDA00225635 NDG0023918 4 ENM00225634 ENS00225650 ENE00226239 ENE00226238 ENE00226240 ENE00176985 ICD00226243 ICD0022625 9 ICD00226263 ICD0022626 1 ICD00226262 ICD00226242 ICD0022624 1 ICD0022626 0 ENC00225648 ENS00225649 SZR00302956 OPN00287982 OPN00302655 OPN00302656 OPN00225645 OPN00225646 OPN00288895 OPN00288891 OPN00288888 OPN00288899 OPN00289037 OPN00289038 OPN00288889 OPN00288887 OPN00287981 OPN00289027 OPN00289024 OPN00289025 OPN00288890 OPN00288897 OPN00225647 OPN00286196 OPN00286191 OPN00276352 OPN00286184 OPN00287366 OPN00275992 OPN00287979 OPN00289036 OPN00289019 OPN00288894 OPN00288893 OPN00289034 OPN00276060 OPN00281823 OPN00286175 OPN00275994 OPN00289031 OPN00289023 OPN00289017 OPN00281821 OPN00276355 OPN00276133 OPN00276132 OPN00275995 OPN00287171 OPN00287364 OPN00286188 OPN00287974 OPN00275996 OPN00276063 OPN00289020 OPN00275993 OPN00276128 OPN00276121 OPN00289029 OPN00276061 OPN00288892 OPN00288886 OPN00287973 OPN00287980 OPN00276131 OPN00287975 OPN00286183 OPN00289022 OPN00276065 OPN00276066 OPN00276354 OPN00275985 OPN00286162 OPN00276356 OPN00286190 OPN00276072 OPN00286177 OPN00287363 OPN00286161 OPN00286185 OPN00275991 OPN00276135 OPN00281820 OPN00288898 OPN00276064 OPN00286182 OPN00276122 OPN00276136 OPN00276130 OPN00276351 OPN00287971 OPN00287172 OPN00287976 OPN00276062 OPN00287972 OPN00288896 Phenotypic abnormality Abnormality of the eye Abnormality of higher mental function Intellectual disability Sensorineural hearing impairment Abnormal ear morphology Functional abnormality of the inner ear Abnormality of the ear Hearing impairment Behavioral abnormality Poor eye contact Peripheral neuropathy Increased blood pressure Hypertension Abnormality of the cardiovascular system Abnormal cardiovascular system physiology Abnormality of the immune system Abnormality of immune system physiology Functional respiratory abnormality Asthma Hypertonia Lower limb spasticity Paraplegia/paraparesis Dysarthria Abnormality of limbs Abnormality of the head Abnormality of the musculature of the limbs Abnormal axial skeleton morphology Hyperreflexia Abnormality of the curvature of the vertebral column

Scoliosis Abnormal eye physiology Reduced consciousness/confusion Vegetative state Hyperglycemia Central apnea Coma Metabolic acidosis Vertigo Cerebral atrophy Cyanosis Bradycardia Spastic paraparesis

Bilateral sensorineural hearing impairment All the phenotypic abnormalities excludin g Anorexia Migraine Increased inflammatory response Abnormality of the digestive system Intellectual disability, severe Inguinal hernia Recurrent otitis media

Abnormality of the upper limb HP:0012373 Abnormality of the lower limb Hyporeflexia of upper limbs Hyporeflexia of lower limbs Generalized hypotonia Neurodegeneration Lower limb muscle weakness Babinski sign Microcephaly

Spastic paraplegia and Equinovarus deformity Limited pronation/supination of forearm Hyperlordosis Developmental regression Abnormal eye morphology Impaired vibration sensation in the lower limbs Syndactyly Intellectual disability, mild Encephalopathy Epileptic encephalopathy Attention deficit hyperactivity disorder Cognitive impairment Memory impairment Mitral valve prolapse Renal cortical microcysts Abnormality of the kidney Renal insufficiency Myoclonus Abnormal facial shape Motor delay Abnormality of the face Submucous cleft soft palate Short stature Limb hypertonia Bifid uvula Delayed speech and language development HP:0012372 Diabetes insipidus Exercise intolerance Abnormal circulating cholesterol concentration Abdominal obesity Hypoplasia of the optic tract Tonsillitis Pituitary gland cyst Seizure Failure to thrive Growth abnormality Abnormality of body weight Abnormality of the cerebral white matter Aplasia/Hypoplasia of the cerebellum Aplasia/Hypoplasia of the cerebrum Hypoplasia of the corpus callosum Muscular hypotonia of the trunk Abnormality of movement Abnormal reflex Reduced tendon reflexes Areflexia Neurodevelopmental delay Athetosis Global developmental delay Involuntary movements Profound global developmental delay Abnormality of the nervous system Abnormal nervous system physiology Abnormal nervous system morphology Morphological central nervous system abnormality Abnormality of brain morphology Abnormal cerebral morphology Aplasia/Hypoplasia involving the central nervous system Muscular hypotonia Abnormal muscle physiology Abnormal muscle tone Abnormal central motor function Ataxia Cerebellar atrophy

Neurodevelopmental abnormality Abnormal eye morpholog y Optic atrophy Temporal optic disc pallor Diffuse optic disc pallor Abnormal retinal morphology Retinal dystrophy Abnormal macular morphology

Macular dystrophy HP:0012372 Peripapillary atrophy Optic disc hypoplasia Abnormal retinal morphology on macular OCT Pseudopapilledema Cystoid macular degeneration Retrobulbar optic neuritis Pigmentary retinopathy Astigmatism Abnormality of optic chiasm morphology Increased cup−to−disc ratio Aplasia/Hypoplasia of the optic nerve Septo−optic dysplasia Abnormality of the optic nerve Optic neuropathy Abnormality of the optic disc Optic disc pallor Strabismus Abnormality of eye movement Nystagmus Progressive visual loss Centrocecal scotoma Tr itanomaly Protanopia Paracentral scotoma Abnormality of binocular vision Amblyopia Horizontal nystagmus Abnormal eye physiology Blindness Heterotropia Exotropia Abnormal visual field test Abnormal saccadic eye movements Glaucoma Slow decrease in visual acuity HP:0012373 Photophobia Visual acuity light perception without projection Esodeviation Esotropia Congenital nystagmus Abnormality of ocular smooth pursuit Horizontal pendular nystagmus High myopia Myopia Astigmatism Ring scotoma Blurred vision Pericentral scotoma Nonprogressive visual loss Ptosis Color vision defect Dyschromatopsia Central scotoma Visual field defect Scotoma Visual loss Abnormal eye physiology Abnormal best corrected visual acuity test Reduced visual acuity Abnormality of vision Visual impairment

Fig. 3 Visualisation of the Phenotypic abnormality subontology (HPO# HP:0000118) annotation in the ACO2 dataset. Describing the 113 symptomatic patients’ reports with an extended set of full clinical description. Rows are clustered using hclust by separating the terms descending from Abnormal eye physiology (HPO# HP:0012373) and Abnormal eye morphology (HPO# HP:0012372); human readable shortened ontological term names were used (where possible). In columns, the identifers of the patients (8 digits) are prefxed by code of the disease reported (3 letters): ICD: degeneration, cerebellar-retinal, infantile; RDA; dystrophy, retinal, with or without extraocular anomalies; ENM: encephalomyopathy, mitochondrial; ENC: encephalopathy; ENE: encephalopathy, epileptic; ENS: encephalopathy, neonatal, severe; NDG: neurodegeneration; OPN: neuropathy, optic; SPG: paraplegia, spastic; SZR: seizures. A red box indicates the presence of the phenotype. Data as of June 8, 2020.

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 5 www.nature.com/scientificdata/ www.nature.com/scientificdata

a Patients reported with an All isolated or predominant e y optic neuropathy (92) e lit urs e Of od henotypic M P norma nheritanc I Ab Clinical Co al l m m a m e e ting n iv nt tes s y yste a emi o s ey cl g S ascular Infancy al Obesity ye m osis earingirment E iolo iov rcula rati In Autoso itance Kidn a Muss coli Onset Autosomaomina H Diabe y ous Anorexi S Ci h Reces D omin Insipidu ard yste perglyc holesterol nher d Imp Ph erv C S y C eat I Inheritance N Immune Syste H oncent D Abl C a y m al em e et t ste e l s y s Blood alv nset t siolog cise Systgy d V e O n phology e c or er rance se ur Ons Renaficiency Bilateral ral f enal crocystCortic me Phy M Ex vous ss olaps ult ensorineurearing e er Tonsilliti e Mit r R Mi S H Intole hysiolo mmunehysiolog Sy P eonatal Ons Ad ediatri Insu Ey Eye N P I P IncreaPr N P Impair t e r e et t o l t Ey t ntal ns t se a y O m on or e a Onse eri ent gy aine n n ension ult Ons e osis lo hm rt Ag Pt Vision ost Ad e Onse P egm Migr Seizur normalit Ast pe le Glauco Refracti S Behaviora Lat orpho CentralFunctio Mo HigherFunctio Me Hy ung uvenile On Eye Movement Ab o hildhood J M Y Midd C m ion y stem is ion y it ve y ia tis ision is al y OnT us tic V V V d s esis a hria Sy r log r niti ment la ractiver ir igma lor Retin Fun Ataxi sart og a holog Myop pholog rpho Spa tentionpe Deficit y C Strabismus Nystagmus Co Defect r y D Ast o Photophobia At H Imp orp Binocu Blurred M acular OCMo Parapa Disord NervousM Visual Impairment l M ous n ia ld ua l s s s e is y y pia op V topsia ar era th y enital a ul log lit mu mu y o ph em otro ced it ac h orizontal ong u M ropa Morphologicalyst Ex odeviatio H stag C Visual Fi ptic Nerve Peri u entral Nerv y Visual Los Defect ed Acu O orp ory Impairment C S norma Es N Nystag High My R M m Ne Dyschrom a e r a Ab e y tic M y si y e y th m a d it ly nd s ssiv a u ia a pa c as y e olog s s ic ion h re mality ro pt Th ness om log s rat g rog st blyop O n t ogressiv cotom e nor an ia/Hypoplasie ys r np T b ptic Di ptic rChipho s lar Dystroph lvi uitary Gla P isual Los S Visual Fiel Blind A Am Trit Protanopia O o la trobulbar Op u gene o m n Morp it C Visual Ac O Neuriti c Cystoide Macula P V No Visual Los ptic Neu M Ap Of TheNerv Re D Aplasia/HypoplInv rai n Test d O Ma entralSyste NervousB I a e r a C a l ion y y uity allo c c ecal om uit P s lasia llar a a a orrect Ac oph edem p ia e C c io e ct y roc central t Ac al t s ed Cup to po om tom al t u s at y Th ra Decrease ent eri isu R papill ptic Di H T epto opti erebroph w cot P cotom ral ScotomParacentraco su es V ho tion ea c O poplasia Of splas C o C S nt S Besi T c ptic Atr r s y S y At isual Acuity S Ring Scot V ght WitPerceptje ptic Di O di H ptic D Sl V Ce Li o O Inc O Pr c Pseudo ptic O r l Opti ra allo o P p m Diffuse Disc Te Disc Pallor y illar y ripap Pe Atroph

b Patients reported with a All syndromic form (21)

e

utosomal rmality A ritance Recessiv Phenotypic Inhe Abno Clinical Course

y nia e wth o mality rator Head Ear Ey r mality Gr Functional adycardiaMetabolic ous System Onset r Acidosis rv Respi B Age Of Death Abno Abnor HyperglycemiaInguinal Her Ne

ive r ancy ring ment phology ysiology o Th ic Onset ace r h eletal t Stature T al Apnea anosis F phology k phology Limbs r y vous System vous System Hea Fundusr S C r ysiology r phologyDeath In ye P h Impai Mo Mor Shor ailure Musculature Childhood ediatr Ear Mor E F Cent Ne P Ne Mor Death In Inf Neonatal OnsetP

y h ous al rv eleton ement k elopmental vior v ral Motor v mality phological ucousalate Cleft mality Of o er Limb vement e mality r ripheral r al Ne P Retinalphology phology Muscle o r mality m Functional r r ysiology ow Seizure Beha Pe Mo acial ShapeBifid Uvula Media Optic Disc Blindness Syndactyly M System fantile Onset F Recurrent Otitis ye M Axial S Upper LimbMusculatureThe Limbs Of Ph L CentFunction HigherFunction Mental Neuropathy Sub Soft Abnor Mo E Mo Encephalopat Abno Abno Centr In The Inner Ear Neurod Abnor Childhood Onset

n ratio hy e e ral y y y allor hy ry y s us r e P h us one ar ia us rineu m T v d Th o ing m o ex elopmental rphology r rtigo r mity v y e Contact ration /Degenng rment e al Column olunta a y Impaired V ements aturer Of TheLimited Ataxia Refl vements elopmental hy SensoHea Pigmenta v orea er Limb Muscle Epileptic UpperNeuron Motor Inv o educe Intellectual ev Vib Nystag Strabismu o Pursuit F eakness EquinDefor Dysarthr R Disability Del er Limbs ffecti stem Optic Atrop Retinopat SaccadicM EOcular Smooth rteb w W Muscle M D Regressionoor E w tropA y rain Mo Impai Optic Disc Retinal Dystrop Curv Of Lo Dysfunction P o A S B Ve Pronation/supination Encephalopath Neurode SensationL In The Central Nerv Consciousness/confusion a si a y xia s , a opl e s Tendon ral ontalmus tonia ve State , Mild /Hyp rvou z lordosis es ed Speech a r oclonu Global y i ving Th ex y Coma Intellectualvere a s l m phology phology Hori Esotropia Scoliosis Muscular Athetosis e velopmental Cerebr r Hyper M Disability Intellectual e Del elopmentMotor Del tral Ne Cerebellum Nystag Hype Hypotonia Babinski Sign Refl Hyperrefle egetati S Delay v AplaInvon Mo Mo Reduced V Disability D And Language Syste De Ce Neurodegeneration

hy ed tonia z aparesis rum ontal r xia Of xia Of z us rali xia y ri m er Limb e ral White h w ound Globaly ral Atrop Ho endular Muscular er Limbs elopmentala P Gene runk Lo w Arefl v Matter CerebellarAtrop Hypotonia T Spasticity o Profe Del Cereb Nystag HyporeflUppere LimbsHyporefleL D Limb Hype Hypotonia Of The raplegia/par Aplasia/HypoplasiaCereb Aplasia/Hypoplasia a Of The Cereb P Of The Cerebellum

Spastic Spastic raparesis raplegia a a Microcephaly P P rpus Callosum 1% 50% 100% Hypoplasia Of The Co

Fig. 4 Visualisation of ontological annotation in the ACO2 dataset by disease. Subgraphs of the mode of inheritance (HPO# HP:0000005), the phenotypic abnormalities (HPO# HP:0000118) and the natural history of the disease (Clinical Course, HPO# HP:0031797), descending from the root of all terms (All; HPO# HP:0000001) in the Human Phenotype Ontology: (a) for the 92 patients reported with an isolated or predominant optic neuropathy; (b) for the 21 remaining patients reported with a syndromic form. Terms which are annotated to exactly the same objects as well as all of their children have been removed, showing only informative terms. Arrows indicate relations between terms in the ontology. Colors correspond to the frequency of the phenotypes, from the least frequent in yellow to the most frequent in blue, the green color corresponding to a term present in half of the patients. Human readable shortened ontological term names were used (where possible). Data as of June 8, 2020.

and neonatal severe encephalopathy)4,19,20, 2 had spastic paraplegia7, together with single cases of autosomal reces- sive spinocerebellar ataxia, neurodegeneration21, retinal dystrophy (RDEOA; MIM# 617175)22, seizures23 and unclassifed diseases24. Of these patients’ reports, 113 have an extended set of full clinical description (the remain- ing ten are either asymptomatic or not described), 90 relating to our Molecular Genetics Laboratory, along with data from 23 retrieved from publications. For the description of all of these phenotypes, use is made exclusively of a standard vocabulary for referencing phenotypic abnormalities, the so-called Human Phenotype Ontology (HPO)11, confrming the maturity of this ontology to describe eye diseases38. Genomic medicine requires the pre- cise and standardized description of phenotypes39–41; these HPO annotations are key elements that make possible the development of algorithms for molecular diagnostics and genetic research. A total of 154 unique HPO terms were used, each assigned from 1 to 92 patients for the most frequent term, optic neuropathy (HPO# HP:0001138); 208 unique HPO terms have been analysed by including the parent terms inferred by the ontological relationships. Figure 3 shows an exhaustive overview of the ontological annotation of the phenotypic abnormalities as a grid (mode of inheritance and natural history of the disease not shown), highlighting that patients reported with an isolated optic neuropathy, especially with a dominant mode of inher- itance, have a similar limited phenotypic profle, diferent from the other patients whose phenotypic variability is

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 6 www.nature.com/scientificdata/ www.nature.com/scientificdata

particularly wide. We carried out the separate study of these two populations by showing the frequency of phe- notypes and removing terms simply linking two terms together to focus the reading on informative phenotypes (Fig. 4): patients reported with an isolated or predominant optic neuropathy almost exclusively have a structural anomaly of the globe of the eye (Abnormal eye morphology; HPO# HP:0012372), in a predominantly dominant but also recessive mode of transmission, with an onset throughout life (Fig. 4a); the other patients reported with a syndromic form have rather a functional anomaly of the eye (Abnormal eye physiology; HPO# HP:0012373), in an exclusively recessive mode of transmission, with a beginning in the frst years of life (Fig. 4b). Overall, the extensive annotation using the phenotype ontology shows that the “isolated” versus “syndromic” separation is actually very relative: it is more likely to be a clinical spectrum with a continuum of symptoms. Figures 3 and 4 show that several cases reported as isolated are, in fact, afected by other symptoms; for the non-recurring symptoms, it is difcult to decide whether they are due to ACO2 variants or whether they are just associated comorbidities. Tese fgures also reveal that certain syndromic patients do not have optic neuropathy, which is therefore not compulsory in the ACO2 phenotype as is the case for the OPA1 gene, with only one excep- tion published42. Usage Notes Te databases recording pathogenic variations, i.e. the so-called LSDBs, have proven to be the most complete because they rely on a curator who is a referent specialist for the gene or disease considered43. However, they are ofen based on isolated initiatives which use various interfaces, to the detriment of intuitiveness, and which are hosted on local servers, preventing their interoperability, unlike other types of databases which are central, i.e. encompassing all the genes of an organism, as in sequence databases44,45 or in databases oriented towards non-pathogenic variations36,46. With the aim to overcome this, the Human Variome Project currently favors the centralisation of LSDBs8,47. Our current objective with the work reported here is to achieve a cluster of LOVD databases integrating the main genes responsible for optic neuropathies, whether isolated or syndromic, and with recessive or dominant transmission. Interoperability between these databases will be useful for molecular biologists analysing such pan- els of genes with, in particular, the possibility of detecting digenism. Te ACO2-dataset, which is, to our knowl- edge, the only clinicobiological dataset dedicated to ACO2, interfaces molecular biology with medicine thanks to a common vocabulary, making it possible to link the phenotypic profles of ACO2 patients with those involving mutations in other genes or clinical presentations. It will also allow a better understanding of the complex and overlapping relationships between isolated optic neuropathies and neurological syndromes involving optic neuropathy. Tus, this open-access dataset should prove useful for molecular biologists, researchers and clinicians. Code availability The source codes written in R programming language are available on the Code Ocean cloud-based computational reproducibility platform as a Code Ocean “compute capsule,” together with the dataset analysed in this article31: snapshot of the ACO2 dataset as of June 8, 2020, /data/LOVD_full_download_ACO2_2020-06- 21_18.03.52.txt (LOVD fat fle format); the Human Phenotype Ontology data-version hp/releases/2020-06-08, / data/hp.obo.2020-06-08.txt (OBO fat fle format). Tus, readers can reproduce and verify the results of this article without having to download or install anything. All the content is available under MIT License.

Received: 21 September 2020; Accepted: 22 June 2021; Published: xx xx xxxx References 1. Martins, C. Über den Abbau der Citronensäure. Hoppe-Seyler’s Zeitschrif für physiologische Chemie 247, 104–110, https://doi. org/10.1515/bchm2.1937.247.3.104 (1937). 2. Mirel, D. B. et al. Characterization of the human mitochondrial aconitase gene (ACO2). Gene 213, 205–218, https://doi.org/10.1016/ s0378-1119(98)00188-7 (1998). 3. Spiegel, R. et al. Infantile cerebellar-retinal degeneration associated with a mutation in mitochondrial aconitase, ACO2. Am J Hum Genet 90, 518–523, https://doi.org/10.1016/j.ajhg.2012.01.009 (2012). 4. Metodiev, M. D. et al. Mutations in the tricarboxylic acid cycle enzyme, aconitase 2, cause either isolated or syndromic optic neuropathy with encephalopathy and cerebellar atrophy. J Med Genet 51, 834–838, https://doi.org/10.1136/jmedgenet-2014-102532 (2014). 5. Charif, M. et al. Dominant ACO2 mutations are a frequent cause of isolated optic atrophy. Brain Communications 3, fcab063, https:// doi.org/10.1093/braincomms/fcab063 (2021). 6. Gibson, S. et al. Recessive ACO2 variants as a cause of isolated ophthalmologic phenotypes. Am J Med Genet A, https://doi. org/10.1002/ajmg.a.61634 (2020). 7. Bouwkamp, C. G. et al. ACO2 homozygous missense mutation associated with complicated hereditary spastic paraplegia. Neurol Genet 4, e223, https://doi.org/10.1212/NXG.0000000000000223 (2018). 8. Cotton, R. G. et al. GENETICS. Te Human Variome Project. Science 322, 861–862, https://doi.org/10.1126/science.1167363 (2008). 9. Fokkema, I. F. et al. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 32, 557–563, https://doi.org/10.1002/ humu.21438 (2011). 10. Le Roux, B. et al. OPA1: 516 unique variants and 831 patients registered in an updated centralized Variome database. Orphanet journal of rare diseases 14, 214, https://doi.org/10.1186/s13023-019-1187-1 (2019). 11. Kohler, S. et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res 47, D1018–D1027, https://doi.org/10.1093/nar/gky1105 (2019). 12. Gray, K. A. et al. Genenames.org: the HGNC resources in 2013. Nucleic Acids Res 41, D545–552, https://doi.org/10.1093/nar/ gks1066 (2013). 13. Hamosh, A., Scott, A. F., Amberger, J., Valle, D. & McKusick, V. A. Online Mendelian Inheritance in Man (OMIM). Hum Mutat 15, 57–61, 10.1002/(SICI)1098-1004(200001)15:1<57::AID-HUMU12>3.0.CO;2-G (2000).

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 7 www.nature.com/scientificdata/ www.nature.com/scientificdata

14. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44, D733–745, https://doi.org/10.1093/nar/gkv1189 (2016). 15. den Dunnen, J. T. et al. HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat 37, 564–569, https://doi.org/10.1002/humu.22981 (2016). 16. Mitchell, A. L. et al. InterPro in 2019: improving coverage, classifcation and access to protein sequence annotations. Nucleic Acids Res 47, D351–D360, https://doi.org/10.1093/nar/gky1100 (2019). 17. El-Gebali, S. et al. Te Pfam protein families database in 2019. Nucleic Acids Res 47, D427–D432, https://doi.org/10.1093/nar/gky995 (2019). 18. Kelman, J. C. et al. A sibling study of isolated optic neuropathy associated with novel variants in the ACO2 gene. Ophthalmic Genet 39, 648–651, https://doi.org/10.1080/13816810.2018.1509353 (2018). 19. Sadat, R. et al. Functional cellular analyses reveal energy metabolism defect and mitochondrial DNA depletion in a case of mitochondrial aconitase defciency. Mol Genet Metab 118, 28–34, https://doi.org/10.1016/j.ymgme.2016.03.004 (2016). 20. Abela, L. et al. Plasma metabolomics reveals a diagnostic metabolic fngerprint for mitochondrial aconitase (ACO2) defciency. PLoS One 12, e0176363, https://doi.org/10.1371/journal.pone.0176363 (2017). 21. Fukada, M. et al. Identifcation of novel compound heterozygous mutations in ACO2 in a patient with progressive cerebral and cerebellar atrophy. Mol Genet Genomic Med 7, e00698, https://doi.org/10.1002/mgg3.698 (2019). 22. Srivastava, S. et al. Increased Survival and Partly Preserved Cognition in a Patient With ACO2-Related Disease Secondary to a Novel Variant. J Child Neurol 32, 840–845, https://doi.org/10.1177/0883073817711527 (2017). 23. Helbig, K. L. et al. Diagnostic exome sequencing provides a molecular diagnosis for a signifcant proportion of patients with epilepsy. Genet Med 18, 898–905, https://doi.org/10.1038/gim.2015.186 (2016). 24. Narang, A. et al. Frequency spectrum of rare and clinically relevant markers in multiethnic Indian populations (ClinIndb): A resource for genomic medicine in India. Hum Mutat 41, 1833–1847, https://doi.org/10.1002/humu.24102 (2020). 25. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 38, D5–16, https:// doi.org/10.1093/nar/gkp967 (2010). 26. Fokkema, I. et al. Dutch genome diagnostic laboratories accelerated and improved variant interpretation and increased accuracy by sharing data. Hum Mutat, https://doi.org/10.1002/humu.23896 (2019). 27. Wildeman, M., van Ophuizen, E., den Dunnen, J. T. & Taschner, P. E. Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat 29, 6–13, https://doi.org/10.1002/ humu.20654 (2008). 28. Vihinen, M., den Dunnen, J. T., Dalgleish, R. & Cotton, R. G. Guidelines for establishing locus specifc databases. Hum Mutat 33, 298–305, https://doi.org/10.1002/humu.21646 (2012). 29. Greene, D., Richardson, S. & Turro, E. ontologyX: a suite of R packages for working with ontological data. Bioinformatics 33, 1104–1106, https://doi.org/10.1093/bioinformatics/btw763 (2017). 30. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2020). 31. ACO2 locus-specifc database with extensive phenotype ontology annotation to unravel clinical spectrum. Code Ocean https://doi. org/10.24433/CO.5810290.V1 (2020). 32. Ferré, M. ACO2 clinicobiological dataset with extensive phenotype ontology annotation. figshare https://doi.org/10.6084/ m9.fgshare.14915682.v1 (2021). 33. Global Alliance for Genomics and Health. GENOMICS. A federated ecosystem for sharing genomic, clinical data. Science 352, 1278–1280, https://doi.org/10.1126/science.aaf6162 (2016). 34. Calvo, S. E. et al. Comparative Analysis of Mitochondrial N-Termini from Mouse, Human, and Yeast. Molecular & cellular proteomics: MCP 16, 512–523, https://doi.org/10.1074/mcp.M116.063818 (2017). 35. Karczewski, K.J., Francioli, L.C., Tiao, G. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 https://doi.org/10.1038/s41586-020-2308-7 (2020). 36. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29, 308–311 (2001). 37. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46, D1062–D1067, https://doi.org/10.1093/nar/gkx1153 (2018). 38. Kohler, S. et al. Te Human Phenotype Ontology in 2017. Nucleic Acids Res 45, D865–D876, https://doi.org/10.1093/nar/gkw1039 (2017). 39. Deans, A. R. et al. Finding our way through phenotypes. PLoS Biol 13, e1002033, https://doi.org/10.1371/journal.pbio.1002033 (2015). 40. Robinson, P. N. Deep phenotyping for precision medicine. Hum Mutat 33, 777–780, https://doi.org/10.1002/humu.22080 (2012). 41. Biesecker, L. G. Phenotype matters. Nat Genet 36, 323–324, https://doi.org/10.1038/ng0404-323 (2004). 42. Nasca, A. et al. Not only dominant, not only optic atrophy: expanding the clinical spectrum associated with OPA1 mutations. Orphanet journal of rare diseases 12, 89, https://doi.org/10.1186/s13023-017-0641-1 (2017). 43. Brookes, A. J. & Robinson, P. N. Human genotype-phenotype databases: aims, challenges and opportunities. Nat Rev Genet 16, 702–715, https://doi.org/10.1038/nrg3932 (2015). 44. Te UniProt, C. UniProt: the universal protein knowledgebase. Nucleic Acids Res 45, D158–D169, https://doi.org/10.1093/nar/ gkw1099 (2017). 45. Benson, D. A. et al. GenBank. Nucleic Acids Res 41, D36–42, https://doi.org/10.1093/nar/gks1195 (2013). 46. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291, https://doi.org/10.1038/ nature19057 (2016). 47. Smith, T. D., Vihinen, M. & Human Variome, P. Standard development at the Human Variome Project. Database (Oxford) 2015, https://doi.org/10.1093/database/bav024 (2015).

Acknowledgements We thankfully acknowledge grants from the following foundations and patients’ associations: ACO2 GENE, Association contre les Maladies mitochondriales, Fondation Visio, Kjer France, Ouvrir les Yeux, Retina France, and Union nationale des Aveugles et Défcients visuels. Author contributions M.F. designed supervised the project. J.T.D. and P.R. participated in the design and supervision. K.G. collected the data with the help from P.A.B., M.C., E.C., C.B., V.D., V.P., D.B., D.M., P.G. and G.L. T.F. participated in the data acquisition; M.F. performed the statistical analysis. K.G., P.R. and M.F. wrote the manuscript with inputs from all authors. All authors have read and approved the fnal manuscript. Competing interests Te authors declare no competing interests.

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 8 www.nature.com/scientificdata/ www.nature.com/scientificdata

Additional information Correspondence and requests for materials should be addressed to M.F. Reprints and permissions information is available at www.nature.com/reprints. Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Cre- ative Commons license, and indicate if changes were made. Te images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not per- mitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Te Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata fles associated with this article.

© Te Author(s) 2021

Scientific Data | (2021)8:205 | https://doi.org/10.1038/s41597-021-00984-x 9