The Transformative Potential of AI for Science: Will AI Write the Scientific Papers of the Future?

Yolanda Gil, Information Sciences Institute and Department of Computer Science, University of Southern California

https://tinyurl.com/yolandagil-aaai2020

This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as the original source is credited. Source: “Will AI Write the Scientific Papers of the Future?”, Yolanda Gil, 2020 Presidential Address, Association for the Advancement of Artificial Intelligence (AAAI), February 2020.

A Personal Perspective on AI

• The AI community has always been
  • Visionary
  • Broad
  • Inclusive
  • Interdisciplinary
  • Determined

• And dare I say
  • Successful

A Personal View of Watershed Moments in AI: 1980s

Scripts, Plans, Goals and Understanding. Roger C. Schank and Robert P. Abelson

A Personal View of Watershed Moments in AI: 1990s

1990: Sphinx shows speaker-independent large-vocabulary continuous speech recognition – used it to write my PhD

1992: Soar flies helicopter teams in simulations and is indistinguishable to commanders from human-controlled aircraft

1992: TD-Gammon autonomously learns to play backgammon at human player levels

1995: SKICAT identifies five new quasars in the Second Palomar Sky Survey

1995: Navlab is the first trained car to drive autonomously on highways to cross the United States

1997: Deep Blue defeats human chess world champion and earns a grandmaster-level rating

1999: RAX flew a spacecraft autonomously, demonstrating planning, monitoring, and fault repair

https://doi.org/10.1609/aimag.v16i1.1121

https://www.cs.cmu.edu/afs/cs/project/alv/www/navlab_video_page_files/Navlab_Kanade_version97.mpg
https://flic.kr/p/9Dpww

A Personal View of Watershed Moments in AI: 2000s

2000: The Gene Ontology is shown to describe over 15,000 gene products for Drosophila, mouse, and yeast

2000: Kismet demonstrates and recognizes emotions

2003: USC ISI’s statistical machine translation prototype beats hand-crafted commercial systems

2004: RDF semantic specifications become W3C recommendations for the GGG (Giant Global Graph)

2005: Stanley wins $2M for first autonomous high-speed off-road driving

2007: First robot soccer team plays against human players in an exhibition game at RoboCup

2009: Pragmatic Chaos ensemble learning wins $1M competition to predict user film ratings

https://commons.wikimedia.org/w/index.php?curid=20798358
https://commons.wikimedia.org/w/index.php?curid=374949

A Personal View of Watershed Moments in AI: 2010s

2010: Siri voice-activated personal assistant is released as a smartphone app

2011: CMDragons team of 10 soccer robots coordinates plays for routine passing, interception, and goal scoring

2011: Watson takes first place in the Jeopardy! Q/A game, defeating two human champions

2012: AlexNet improves the score in the ImageNet visual recognition challenge by 10 percentage points

2013: Cognitive Tutor shows 8 percentile points average improvement in algebra in a 25,000-student study

2016: Knowledge Graph used as the semantic backbone in one third of 100B monthly searches

2019: Wikidata records 8 billion triples and over 880 million edits, surpassing Wikipedia as the most edited Wikimedia site

https://en.wikipedia.org/w/index.php?curid=54391045

https://www.youtube.com/watch?v=4QtBSDSC2pk

Diversity and Breadth of Advances in AI

Creativity, Vision, Perception, Commonsense, Robotics, Ethics, Speech, Sensing, Explanation, Language, Discovery, Representation, Knowledge, Search, Planning, Learning, Cognition, Reasoning, Teamwork, Matching, Prediction, Coordination, Constraints, Dialogue, Understanding

https://commons.wikimedia.org/w/index.php?curid=70268939

AI to Address Major Future Challenges

The Imperative for AI in Science

AI and Scientific Discovery

• [Lenat 1976] • [Lindsay et al 1980] • [Langley 1981] • [Falkenhainer 1985] • [Kulkarni and Simon 1988] • [Cheeseman et al 1989] • [Zytkow et al 1990] • [Simon 1996] • [Valdes-Perez 1997] • [Todorovski et al 2000] • [Schmidt and Lipson 2009]

Human Limitations Curb Scientific Progress [Gil DSJ’17]

• Not systematic (e.g., [Peters et al PLOS 2014])

• Errors (e.g., [Herndon et al CJE 2013])

• Biases (e.g., [Anderson et al ACS 2014])

• Poor reporting (e.g., [Garijo et al PLOS 2013])

https://doi.org/10.1145/3339399
https://doi.org/10.1038/s41586-019-1335-8

https://doi.org/10.1038/s41586-019-1923-7

https://arxiv.org/pdf/1910.06710
http://commons.wikimedia.org/wiki/File:MRI_brain_sagittal_section.jpg
http://commons.wikimedia.org/wiki/File:Earth_Eastern_Hemisphere.jpg
http://www.nasa.gov/mission_pages/swift/bursts/uv_andromeda.html

Capturing Scientific Knowledge

http://www.pihm.psu.edu/pihm_home.html

Supporting Compositionality of Scientific Knowledge

• Data formats
• Physical variables
• Constraints for use
• Adjustable parameters
• Interventions

Capturing Scientific Knowledge

• Crowdsourced vocabularies
• Common entities
• Software and models
• Provenance
• Scientific data repositories
• Data analysis processes
• Collaborative methods
• Open reproducible publications
• Automating discoveries
• Model integration

Capturing Scientific Knowledge

Low-Cost Creation of Scientific Vocabulary Standards [Gil et al ISWC 2017; Khider et al PP 2019; Emile-Geay et al PAGES 2018]
Work with Deborah Khider, Julien Emile-Geay, Daniel Garijo (USC); Nick McKay (NAU)

Problem: Diversity of requirements for metadata

Approach: Semantic technologies used for controlled crowdsourcing facilitate creation of community standards to describe highly heterogeneous scientific data
• Organic growth: as scientists annotate their datasets, they propose new metadata properties

https://commons.wikimedia.org/wiki/File:An_ice_core_segment.jpg
https://commons.wikimedia.org/wiki/File:Gravity-corer_hg.png

• Crowdsourcing: scientists propose properties for reuse and vote on priorities
• Editorial oversight: editors decide which properties will be in future versions

Results: a new standard for paleoclimate (PaCTS 1.0) with one (!) single initial face-to-face meeting
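The controlled-crowdsourcing loop described above (organic growth, crowd voting, editorial oversight) can be sketched in a few lines of Python. This is a hypothetical illustration, not the Linked Earth implementation; the class names and the vote threshold are assumptions.

```python
# Hypothetical sketch of controlled crowdsourcing: scientists propose terms
# while annotating, the crowd votes, and editors promote popular terms into
# the next core-vocabulary release. Names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class CrowdTerm:
    name: str
    proposed_by: str
    votes: int = 0
    uses: int = 0  # number of datasets annotated with this term

class ControlledVocabulary:
    def __init__(self, core_terms):
        self.core = set(core_terms)   # editor-approved standard
        self.crowd = {}               # organically grown crowd vocabulary

    def annotate(self, term, user):
        """Use an existing core term, or propose a crowd term (organic growth)."""
        if term in self.core:
            return "core"
        entry = self.crowd.setdefault(term, CrowdTerm(term, user))
        entry.uses += 1
        return "crowd"

    def vote(self, term):
        if term in self.crowd:
            self.crowd[term].votes += 1

    def editorial_release(self, min_votes=3):
        """Editors decide which crowd terms enter the next core version."""
        promoted = [t for t, e in self.crowd.items() if e.votes >= min_votes]
        for t in promoted:
            self.core.add(t)
            del self.crowd[t]
        return sorted(promoted)

vocab = ControlledVocabulary({"archiveType", "variableName"})
vocab.annotate("speciesName", "alice")   # new crowd term
for _ in range(3):
    vocab.vote("speciesName")
print(vocab.editorial_release())         # ['speciesName']
```

The point of the sketch is the division of labor: the crowd grows the vocabulary freely, but only an editorial decision changes the standard.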

Controlled Crowdsourcing: Continuous Ontology Growth

Figure 4: An overview of the core Linked Earth Ontology and its extensions.

We also took into account relevant standards and widely used models. We reused several vocabularies: Schema.org and Dublin Core Terms (DC) for representing the basic metadata of a dataset and its associated publications (e.g., title, description, authors, contributors, license); the wgs_84 and GeoSPARQL specifications for representing the locations where samples are collected; the Semantic Sensor Network (SSN) ontology for observation-related metadata; the FOAF vocabulary for basic information about contributors; and PROV-O for representing the derivation of models from raw datasets. The ontology is layered and has a modular structure: the existing standards just mentioned provide an upper ontology for basic terms, and the LiPD format was used to develop the LiPD ontology, which contains the main terms useful to describe any paleoclimate dataset (e.g., data tables, variables, instruments used to measure them, calibration, uncertainty). A set of extensions of LiPD covers more specific aspects of the domain: the Proxy Archive extension defines the types of medium in which measurements are taken, such as marine sediments or coral; the Proxy Observation extension describes the types of observations that can be measured (e.g., tree ring width, trace metal ratio); the Proxy Sensor extension describes the types of biological or non-biological components that react to environmental conditions and reflect the climate at the time; the Instrument extension enumerates the instruments used for taking measurements, such as a mass spectrometer; and the Inferred Variable extension describes the types of climate variables that can be inferred from measurements or from other inferred variables (e.g., temperature). The crowd vocabulary builds on these extensions. The core ontology and the crowd vocabulary share a common namespace for all the extensions (http://linked.earth/ontology/), in order to simplify querying.

4.1 Annotation Framework: The Linked Earth Platform

The Linked Earth Platform implements the Annotation Framework as an extension of the Organic Data Science framework, which is built on MediaWiki and Semantic MediaWiki. There are several reasons for this. First, a wiki provides a collaborative environment where multiple users can edit pages, and where the history of edits is automatically tracked. Second, MediaWiki is easily extensible, allowing us to create special types of pages, generate dynamic user input forms, and create many other extensions. Third, because MediaWiki is well maintained and has a strong community, there are numerous plug-ins available. Finally, the Semantic MediaWiki API makes it easy to export content and interoperate with other systems.

Each dataset is a page in the Linked Earth Platform. “Dataset” is a special class, or category in wiki parlance. When a user creates a new page for a dataset, all the properties that apply are shown in a table where the user can fill in their values. For each variable they indicate whether it is observed or inferred, its value, its uncertainty, and how it was measured. The order of the variables as columns in the data file is also specified.

Figure 2: Overview of the metadata annotation interface, with core ontology terms marked with an “L”. The properties under “Extra Information” are part of the crowd vocabulary.

Figure 3: Overview of the main features of the Linked Earth Platform.

When annotating metadata, the system offers in a pull-down menu the possible completions of what the user is typing, based on similar terms proposed by other users. This helps avoid proliferation of unnecessary terms and helps normalize the new terms created. If none represents what the user wants to specify, then a new property will be added. The property becomes part of the crowd vocabulary, and a new wiki page is created for it. The user, or perhaps others, can edit that page to add documentation. As a result, users build the crowd vocabulary while curating their own datasets. Map-based visualizations show datasets already annotated with location metadata. Author pages show their contributions, which helps track credit and create incentives. Other pages are devoted to fostering community discussions and taking polls. The annotation interface is designed to be intuitive, and provides detailed documentation with examples.

4.2 Initial Core Ontology

To ensure that most changes would be crowd extensions that would not cause major redesigns of the core ontology, the initial core ontology was carefully designed. The ontology was developed using a traditional ontology engineering methodology. We started by collecting terms to be included in the ontology in collaboration with a select group of domain scientists. These terms were extracted from examples provided by the community, and from previous workshops where the community had discussed dataset annotation. The ontology development process was also informed by previous efforts to represent basic paleoclimate metadata, and by prior community proposals to unify terminology in the paleoclimate domain.

See http://wiki.linked.earth/Best_Practices, http://wiki.linked.earth/Linked_Paleo_Data, and https://github.com/LinkedEarth/Ontology/tree/master/Example.

Living with a Live Ontology

Challenge: continuous revisions of ontologies + annotated data
• Maintain core + crowd ontology
• Editors decide when to update
• Automated upgrades when changes are monotonic
• Semi-automated upgrades when changes are non-monotonic
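The upgrade policy above can be sketched minimally: monotonic changes (pure additions that affect no prior annotations) are applied automatically, while non-monotonic changes (renames, domain/range changes, deletions) are queued for human review. The change representation below is an illustrative assumption, not the actual platform code.

```python
# Minimal sketch of the live-ontology upgrade policy: monotonic changes are
# upgraded automatically, non-monotonic ones go to semi-automated review.
# The change-record format is a made-up illustration.

def plan_upgrades(changes):
    """Split proposed ontology changes into automatic and review queues."""
    auto, review = [], []
    for change in changes:
        if change["kind"] == "add_term":   # monotonic: no prior data affected
            auto.append(change)
        else:                              # rename, change_range, delete, ...
            review.append(change)
    return auto, review

changes = [
    {"kind": "add_term", "term": "speciesName"},
    {"kind": "rename", "term": "archiveType", "to": "proxyArchiveType"},
]
auto, review = plan_upgrades(changes)
print(len(auto), len(review))  # 1 1
```

In practice the monotonicity test would inspect existing annotations rather than just the change kind, but the editorial split is the same.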

Figure 1: Overview of our approach for controlled crowdsourcing.

Figure 1 highlights the main aspects of the proposed controlled crowdsourcing process. There are three major components: 1) annotation and vocabulary crowdsourcing, shown at the top; 2) editorial revision for ontology extensions, shown at the bottom; and 3) updates to the metadata repository, shown on the right.

An Annotation Framework supports the annotation and crowdsourcing process, shown in the top left of Figure 1. The framework is initialized with a core ontology (A), which represents a standard that the community has agreed to use. Users, who form the crowd (B), interact with a metadata annotation system (C) to select terms from the core ontology for annotating datasets (D). For example, a term such as “archive type” from the core ontology could be used to express that a dataset contains coral. If a term they want to specify is missing, users may propose extensions by simply adding the term, which becomes a property in the crowd vocabulary (E). That new term is immediately available to other users when they are annotating their datasets. Users may also have requests and issues (F), such as requests for changes to the core ontology, comments to discuss new proposed terms, and other issues.

There are two main types of changes to the core ontology that users may request:
a) Monotonic changes: proposed new terms for the core ontology that do not affect any prior metadata descriptions made by others. For example, a user may extend the ontology for coral and add the property “species name”.
b) Non-monotonic changes: changes to existing terms in the core ontology or the crowd vocabulary that have already been used to describe datasets. For example, a request to rename an existing property or change its domain/range.

A Revision Framework supports the ontology revision process, shown at the bottom of Figure 1. A select group of users form an editorial board (G).

The Paleoclimate Community reporTing Standard (PaCTS)

• All cores
• Ice cores
• Chronologies
• Trees
• Marine sediments
• Speleothems

Capturing Scientific Knowledge

Workflows

Semantic Workflows [Gil et al JETAI 2011; Gil et al IEEE IS 2011; Kim et al JETAI 2010; Gil IEMS 2014]

Workflow constituents (steps, input data, results, parameters) are assigned an identifier that can be referenced in constraints
• Input datasets have metadata that can be referenced in the constraints (e.g., -120.931, 37.371, 2011-04-27, MST, 14.523957, 2399)
• Constraints are used to customize a workflow to a given dataset:
  1) Parameter settings (set parameters)
  2) Choice of models (choose steps): Owens-Gibbs Model, O’Connor-Dobbins Model, Churchill Model
  3) Metadata of new results (generate new metadata; validation)

Example: Semantic Constraints on Workflows

• The alignment step (TopHat) and the assembly step (Cufflinks) must be done with the same reference genome
  • Species, build, and version

Example dataset metadata:
  hasDatasetID: COADREAD
  hasGenomeReferenceID: hg19
  hasOmicsType: RNA
  hasParticipantID: TCGA-A6-3807
  hasSpeciesName: HS

Abstractions in Semantic Workflows
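The reference-genome constraint above (TopHat and Cufflinks must use the same reference genome) can be checked mechanically against step metadata. The property name `hasGenomeReferenceID` follows the slide; the dictionary-based workflow representation is an illustrative assumption, not the WINGS data model.

```python
# Minimal sketch of validating the semantic constraint above: all steps in
# the workflow must reference the same genome. The workflow structure is a
# made-up illustration; the property name comes from the slide.

def check_same_reference(workflow):
    """Return a list of constraint violations across workflow steps."""
    refs = {step: meta.get("hasGenomeReferenceID")
            for step, meta in workflow.items()}
    violations = []
    if len(set(refs.values())) > 1:
        violations.append(f"reference genomes differ: {refs}")
    return violations

workflow = {
    "TopHat":    {"hasGenomeReferenceID": "hg19", "hasSpeciesName": "HS"},
    "Cufflinks": {"hasGenomeReferenceID": "hg19", "hasSpeciesName": "HS"},
}
print(check_same_reference(workflow))  # [] -> constraint satisfied
```

In a semantic workflow system such checks run during composition, so an invalid combination of steps and datasets is rejected before execution.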

X!Tandem

MSGF+

Myrimatch (unified)

WINGS: Workflow Composition

well-formed request
Seed workflow from request
seeded workflows

Find input data requirements
binding-ready workflows
Data source selection

(Workflow reasoning algorithms use constraint-based planning to generate executable workflows from high-level workflows.)

bound workflows

Parameter selection

configured workflows

Workflow instantiation

workflow instances

Workflow grounding

ground workflows

Workflow ranking

top-k workflows

Workflow mapping

executable workflows

http://wings-workflows.org

Reproducing Work in Population Genomics [Gil et al 2012]
Work with Christopher Mason (Cornell University)

Workflows for population genomics: Transmission Disequilibrium Test, Association Tests, CNV Detection, Variant Discovery from Resequencing

Learning Reusable Workflow Fragments [Garijo et al FCGS 2017; Garijo et al eScience 2014]
Work with Daniel Garijo (USC) and Oscar Corcho (UPM)

Capturing Scientific Knowledge

The W3C PROV Provenance Standard [Gil and Miles 2013; Groth and Moreau 2013; Moreau et al 2014]
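PROV's core model relates entities, activities, and agents. The tiny sketch below expresses that model as plain triples: the relation names are actual PROV-O terms, but the dataset and agent names are made up for illustration, and a real system would use an RDF store rather than Python tuples.

```python
# A tiny sketch of the W3C PROV core model (Entity, Activity, Agent) as
# plain triples. Relation names are PROV-O terms; the ex: names are made up.

triples = [
    ("ex:derivedDataset", "prov:wasGeneratedBy",   "ex:analysisRun"),
    ("ex:analysisRun",    "prov:used",              "ex:rawDataset"),
    ("ex:derivedDataset", "prov:wasDerivedFrom",    "ex:rawDataset"),
    ("ex:analysisRun",    "prov:wasAssociatedWith", "ex:scientist"),
]

def derived_from(triples, entity):
    """Follow prov:wasDerivedFrom links to find an entity's direct sources."""
    return [o for s, p, o in triples
            if s == entity and p == "prov:wasDerivedFrom"]

print(derived_from(triples, "ex:derivedDataset"))  # ['ex:rawDataset']
```

Queries like `derived_from` are what make provenance useful for science: given any result, one can trace back the data and processes that produced it.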

https://www.w3.org/2011/prov/

Capturing Scientific Knowledge


Box 4 | Guidelines for transparent methods reporting in neuroimaging

The Organization for Human Brain Mapping (OHBM) Committee on Best Practices in Data Analysis and Sharing (COBIDAS) report provides a set of best practices for reporting and conducting studies using MRI. It divides practice into seven categories and provides detailed checklists that can be consulted when planning, analysing and writing up a study. The text below lists these categories with summaries of the topics that are covered in the checklists.

Acquisition reporting
• Subject preparation: mock scanning; special accommodations; experimenter personnel
• MRI system description: scanner; coil; significant hardware modifications; software version
• MRI acquisition: pulse sequence type; imaging type; essential sequence and imaging parameters; phase encoding parameters; parallel imaging method and parameters; multiband parameters; readout parameters; fat suppression; shimming; slice order and timing; slice position procedure; brain coverage; scanner-side pre-processing; scan duration; other non-standard procedures; T1 stabilization; diffusion MRI gradient table; perfusion (arterial spin labelling (ASL) MRI or dynamic susceptibility contrast MRI)
• Preliminary quality control: motion monitoring; incidental findings

Pre-processing reporting
• General: intensity correction; intensity normalization; distortion correction; brain extraction; segmentation; spatial smoothing; artefact and structured noise removal; quality control reports; intersubject registration
• Temporal or dynamic: motion correction
• Functional MRI: T1 stabilization; slice time correction; function–structure (intra-subject) co-registration; volume censoring; resting-state functional MRI features
• Diffusion: gradient distortion correction; diffusion MRI eddy current correction; diffusion estimation; diffusion processing; diffusion tractography
• Perfusion: ASL; dynamic susceptibility contrast MRI

Statistical modelling and inference
• Mass univariate analyses: variable submitted to statistical modelling; spatial region modelled; independent variables; model type; model settings; inference (contrast, search region, statistic type, P-value computation, multiple-testing correction)
• Functional connectivity: confound adjustment and filtering; multivariate method (for example, independent component analysis); dependent variable definition; functional connectivity measure; effective connectivity model; graph analysis algorithm
• Multivariate modelling and predictive analysis: independent variables; feature extraction and dimension reduction; model; learning method; training procedure; evaluation metrics (discrete response, continuous response, representational similarity analysis, significance); fit interpretation

Results reporting
• Mass univariate analysis: effects tested; extracted data; tables of coordinates; thresholded maps; unthresholded maps; other spatial features
• Functional connectivity: independent component analyses; graph analyses (null hypothesis tested)
• Multivariate modelling and predictive analysis: optimized evaluation metrics

Data sharing
• Define data-sharing plan early: material shared; URL (access information); ethics compliance; documentation; data format
• Database for organized data: quality control procedures; ontologies; visualization; de-identification; provenance and history; interoperability; querying; versioning; sustainability plan (funding)

Reproducibility
• Documentation: tools used; infrastructure; workflow; provenance trace; literate programming; English language version
• Archiving: tools availability; virtual appliances
• Citation: data; workflow

Validation methodologies (such as comparing with another existing implementation or using simulated data) should be clearly defined. Custom analysis code should always be shared on manuscript submission (for an example, see REF. 58). It may be unrealistic to expect reviewers to evaluate code in addition to the manuscript itself, although this is standard in some journals such as the Journal of Statistical Software. However, reviewers should request that the code be made available publicly (so others can evaluate it) and, in the case of methodological papers, that the code be accompanied by a set of automated software tests. Finally, researchers need to acquire sufficient training on the implemented analysis methods, particularly so that they understand the default parameter values of the software (such as cluster-forming thresholds and filtering cut-offs), as well as the assumptions on the data and how to verify those assumptions.

Insufficient study reporting. For the reader of a paper to know whether appropriate analyses have been performed, the methods must be reported in sufficient detail. Some time ago, we published an initial set of guidelines for reporting the methods typically used in an fMRI study. Unfortunately, reporting standards in the fMRI literature remain poor. Carp and Guo et al. analysed 241 and 100 fMRI papers, respectively, for the reporting of methodological details, and both found that some important analysis details (such as interpolation methods and smoothness estimates) were rarely described. Consistent with this, in 22 of the 66 papers that we discussed above, it was impossible to identify exactly which multiple-comparison correction technique was used (beyond generic terms such as ‘cluster-based correction’), because no specific method or citation was provided. The OHBM has recently addressed this issue through its 2015–2016 COBIDAS committee, which has issued a new, detailed set of reporting guidelines (BOX 4). Authors should follow accepted standards for reporting methods (such as the COBIDAS standard for MRI studies), and journals should require adherence to these standards. Every major claim in a paper should be directly supported by appropriate statistical evidence, including specific tests for significance across conditions and relevant tests for interactions. Because the computer code is often necessary to understand exactly how a data set has been analysed, releasing the analysis code is particularly useful and should be standard practice.

Lack of independent replications. There are surprisingly few examples of direct replication in the field of neuroimaging, probably reflecting both the expense of fMRI studies and the emphasis of most top journals on novelty rather than informativeness. The neuroimaging community should acknowledge replication reports as scientifically important research outcomes that are essential in advancing knowledge. One effort to acknowledge this is the OHBM Replication Award, awarded for the first time in 2017 for the best neuroimaging replication study in the previous year.

Towards the neuroimaging paper of the future [from Poldrack et al., “Scanning the horizon: towards transparent and reproducible neuroimaging research”, Nature Reviews Neuroscience, 2017]. Planning: the sample size would be determined in advance using formal statistical power analysis, and the entire analysis plan, including exclusion and inclusion criteria, software workflows (including contrasts and multiple-comparison methods) and specific definitions for all planned regions of interest, would be formally pre-registered. Implementation: all code for data collection and analysis would be stored in a version-control system, include software tests to detect common problems, and use a continuous integration system to ensure that each revision of the code passes appropriate software tests.

Scientific Paper of the Future [Gil et al ESS 2016; Essawy et al EMS 2017; Goodman et al PLOS CB 2014]
www.scientificpapersofthefuture.org

• Text: narrative of the method (some data is in tables, figures/plots, and the software used is mentioned)
• Sharing: deposit data and software (and provenance/workflow) in publicly shared repositories
• Open licenses: open source licenses for data and software (and provenance/workflow)
• Data: include data as supplementary materials and pointers to data repositories
• Metadata: structured descriptions of the characteristics of data and software (and provenance/workflow)
• Software: for data preparation, data analysis, and visualization
• Provenance and methods: workflow/scripts specifying dataflow, codes, configuration files, and parameter settings
• Persistent identifiers: for data, software, and authors
• Citations: citations for data and software (and provenance/workflow)

Special Section: Geoscience Papers of the Future
Reproducible Research: Geophysics Papers of the Future
Towards Reproducible Research, Open Science, and Digital Scholarship in AI
automated innoteP a that,ublications workflow although our discussion engine here focuses and on fMRI, runtime.dependencies. Recent years have seen intense interest in the repro- many of the same issues are relevant for other types of iour. Only 1 of the 17 attempts showed stronger evi- ducibilitypackaged of scientific in results a andsoftware the degree to containerwhich neuroimaging, or virtualsuch as structural machine or diffusion MRI. to dence for an effect as large as the original effect size someensure problematic, computational but common, researchOdd Erik Gundersen, Yolanda Gil, David W. Aha practices reproducibility. may All data sets and be responsible for high rates of false findings in the sciOdd- Low Erik statistical Gundersen, power Yolanda Gil, David W. Aha than for a null effect, and 8 out of 17 showed stronger results would be assigned version numbers12 to enable entific literature, particularly within psychology but also The analyses of Button et al. provided a wake-up call 7–9 evidence for a null effect. This suggests thatCorrespondence replica to R.A.P. - moreexplicit generally . trackingThere is growingn Artificial interestof provenance. intelligence,in ‘meta- likeregarding any science, statistical Automated must rely power on reproduciblein neuroscience, quality experiments particu to - validate results. Our [email protected] research’ (REF. 
10) and a correspondingobjective growth is toin givestudies practical larly and by pragmatic highlighting recommendations the point (that forwas how raised to earlierdocument by AI research so that bility of neuroimaging findings (particularly brain– control would assess the analysis 7at each stage to detect doi:10.1038/nrn.2016.167 investigating factors that contributeresults to poor are reproducible.reproduci- OurIoannidis analysis) ofthat the low literature power showsnot only that reduces AI publications the likeli currently- fall short of behaviour correlations) is exceedingly low, Publishedas has online 5been Jan 2017 bility.potential These factors include errors. study designproviding characteristics enough documentation hood of findingto facilitate a true reproducibility. result if it exists Our suggested but also bestraises practices are based on a framework for reproducibility and recommendations for best practices given by scientific 67 demonstrated in other fields, such as cancer biology organizations, scholars, and publishers. We have made a reproducibility checklist based on our 68 investigation and described how every item in the checklist can be documented by authors and and psychology . NATURE REVIEWS | NEUROSCIENCEValidation. For empiricalexamined by reviewers. papers, We encourage all authors exploratory ADVANCE and reviewers ONLINE to use PUBLICATIONresults the suggested | 1 best practices and It is worth noting that, although the cost of con- wouldǟ ƐƎ ƏƗbe ! ,validated(++ - 4 +(2'#12author (,(3#"Ʀ checklist/against 13 .$ /1 (when-%#1  considering 34an1#ƥ ++ 1(independent%'3 2submissions1#2#15#"ƥ for AAAI publicatio validationns and conferences. ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ducting a new fMRI experiment is a factor limiting the data set that wasReproducibility not examined is a cornerstone ofbefore the scientific validation. method. 
The ability For and effort required from other researchers to replicate experiments and explore variations depends heavily on the feasibility of replication studies, there are many find- methodologicalinformation papers, provided the whenapproach the original wouldwork was p ublished.follow Reproducibility best is challenging ings that can be replicated using publicly available data. practices for reducingfor many sciences, overly for optimisticexample when the results variability72 of. Any physical new samples and reagents can 26 significantly affect the outcome (Begley and Ellis 2012; Lithgow, Driscoll, and Phillips 2017). Resources such as the FCP-INDI , the Consortium method would beIn computer validated science, aagainst large portion benchmark of the experiments aredata fully setsconducted on computers, for Reliability and Reproducibility69, OpenfMRI70 and compared withmaking theother experiments state more-of straightforward-the-art tomethods. document (Braun and Ong 2014). Most AI Software container 20 and machine learning research also falls under this category of computational or the HCP provide MRI data that are suitable for experimentation. However, reproducibility in AI is not easily accomplished (Hunold and A self-contained software tool attempts to replicate many previously reported find- Dissemination.Träff All 2013; results Fokkens et wouldal. 2013; Hunold be 2015).clearly This may marked be because AIas research has its own that encompasses all of the unique reproducibility challenges. Ioannidis (2005) suggests that the use of analytical necessary software and ings. These resources can also be used to answer ques- either hypothesismethods-driven which a re(with still a focus a link of active to investigationthe appropriate is one reason it is comparatively difficult to ensure that computational research is reproducible. For example, Henderson et al. 
dependencies to run tions about sensitivity of a particular finding to the data pre-registration)(2017) or show exploratory. that problems due All to non analysesdeterminism performed in standard benchmark environments a particular program. analysis tools used36. However, even in the cases when on the data set an(includingd variance intrinsic those to AI methods analyses require proper that experimental were nottechniques and reporting procedures. Acknowledging these difficulties, computational research should be documented properly so that the experiments and results are clearly described. The AI research community should strive to facilitate reproducible research, following 10 | ADVANCE ONLINE PUBLICATION sound scientific methods and properwww.nature.com/nrn documentation in publications. Concomitant with reproducibility is open science, which involves sharing data, software, and other science resources in public repositories using permissive licenses. Open science is increasingly ǟ ƐƎƏƗ !,(++ - 4 +(2'#12 (,(3#"Ʀ / 13 .$ /1(-%#1  341#ƥ ++ 1(%'32 1#2#15#"ƥ ǟ ƐƎƏƗ !,(++ - 4 +(2'#12 (,(3#"Ʀ / 13 .$ /1(-%#1  341#ƥ ++ 1(%'32 1#2#15#"ƥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ associated with FAIR principles to ensure that science resources have the necessary metadata ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ ɥ to make them findable, accessible, interoperable, and reusable (Wilkinson et al. 2016). Modern digital scholarship promotes proper credit to scientists who document and share
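The "Implementation" practices listed above (software tests plus automated quality control on every revision) can be sketched as a few checks that a continuous-integration job would run against an analysis data set. This is a minimal illustration in Python; the check names, thresholds, and toy data are all hypothetical:

```python
# Hypothetical sanity checks that a continuous-integration job could run on
# every revision of an analysis, in the spirit of the practices above.

def check_sample_size(n_subjects, n_planned):
    """Fail if the data set is smaller than the pre-registered sample size."""
    return n_subjects >= n_planned

def check_no_missing(values):
    """Fail if any measurement is missing (None)."""
    return all(v is not None for v in values)

def run_quality_control(dataset, n_planned):
    """Run all checks and report the names of the ones that failed."""
    checks = {
        "sample_size": check_sample_size(len(dataset), n_planned),
        "no_missing": check_no_missing(dataset),
    }
    return [name for name, passed in checks.items() if not passed]

# Example: a toy data set with one missing value and too few subjects.
failures = run_quality_control([1.2, None, 0.9], n_planned=20)
# failures == ["sample_size", "no_missing"]
```

In a real pipeline these checks would run automatically on each commit, so a revision that silently drops subjects or introduces missing values is caught before any result is reported.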

AI for Systematic Scientific Data Analysis

Automated Discovery: Hypotheses and Lines of Inquiry [Gil et al ACS 2016; Gil et al AAAI 2017; Garijo et al 2017; Srivastava et al PSB 2019; Mallick et al 2020]

With Parag Mallick, Ravali Adusumilli, Hunter Boyce (Stanford); Arunima Srivastava (OSU); Daniel Garijo, Varun Ratnakar, Rajiv Mayani (USC/ISI); Thomas Yu (Sage Bionetworks)

[Cycle: Hypothesis → Formulate line of inquiry (data + method) → Meta-analysis of results → Revised hypothesis → New hypothesis]

Lines of Inquiry specify relevant analytic methods (workflows), the type of data needed, and how to combine results: retrieve data, then run workflows (methods).

[Components: Query to Data Catalog, Retrieve Data, Analytic Workflows, Meta-Workflows, Confidence Estimation, Workflow Library, Benchmarking]
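The hypothesize-test-revise cycle shown above can be sketched as a loop: a line of inquiry pairs a data query with an analytic method, and a meta-analysis of the results attaches a confidence estimate to the revised hypothesis. This is a schematic sketch, not the DISK implementation; the catalog, patient identifier, and threshold are invented:

```python
# Schematic sketch of a DISK-style line of inquiry: retrieve relevant data,
# run an analytic workflow on each data set, and combine the results into a
# confidence estimate for a revised hypothesis. All values are toy examples.

def run_line_of_inquiry(hypothesis, data_catalog, analyze):
    """Retrieve data relevant to the hypothesis and run the analysis on each."""
    data = [d for d in data_catalog if d["patient"] == hypothesis["patient"]]
    return [analyze(d) for d in data]

def meta_analysis(hypothesis, evidence):
    """Combine per-dataset results into a confidence for the hypothesis."""
    if not evidence:
        return dict(hypothesis, confidence=0.0)
    return dict(hypothesis, confidence=sum(evidence) / len(evidence))

catalog = [
    {"patient": "P36", "abundance": 0.9},
    {"patient": "P36", "abundance": 0.7},
    {"patient": "P12", "abundance": 0.1},
]
hyp = {"protein": "PRKCDBP", "patient": "P36"}
evidence = run_line_of_inquiry(hyp, catalog, analyze=lambda d: d["abundance"] > 0.5)
revised = meta_analysis(hyp, evidence)
# revised["confidence"] == 1.0 on this toy catalog
```

The essential idea is that the data query, the method, and the way results are aggregated are all declared up front, so the same line of inquiry can be re-run automatically as new data arrives.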

Automated Discovery in DISK

Hypothesis: protein PRKCDBP is expressed in samples of patient P36 → lines of inquiry → data → revision: PRKCDBP mutation is expressed in P36

[Workflow diagram: Wf#0, Wf#1, Wf#2 → comparison → simMetrics; hypothesis → hypothesisRevision → revisedHyp; workflows and meta-workflows]

Integrating Knowledge through Workflows

TRANSCRIPTOMICS/GENOMICS

ANALYSIS & SYNTHESIS

PROTEOMICS

Representing Hypotheses

[DISK hypothesis ontology example: hypothesis statement HS1 ("Protein EGFR Associated With Colon Cancer") has qualifier HQ1 with confidence level L1, and evidence HE1 wasGeneratedBy Execution 1, which generated confidence report C1. The revised statement HS2 ("Protein EGFR Associated With Colon Cancer SubtypeA") is a revisionOf HS1, with qualifier HQ2, confidence level L2, and evidence HE2 wasGeneratedBy Execution 2: hypothesis evolution.]

DISK Hypothesis Ontology: http://disk-project.org/ontology/disk
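The hypothesis representation above is a graph of statements, qualifiers, evidence, and provenance links. A sketch of those terms as plain subject-predicate-object triples (the identifiers follow the slide; the query helper is an invented illustration, not the DISK API):

```python
# A sketch of the DISK hypothesis ontology terms above as plain
# subject-predicate-object triples; the instance values are illustrative.

triples = [
    ("HS1", "statement",          "Protein EGFR Associated With Colon Cancer"),
    ("HS1", "hasQualifier",       "HQ1"),
    ("HQ1", "hasConfidenceLevel", "L1"),
    ("HS1", "hasEvidence",        "HE1"),
    ("HE1", "wasGeneratedBy",     "Execution 1"),
    ("HS2", "revisionOf",         "HS1"),
]

def objects(triples, subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow the provenance of the revised hypothesis back to its origin.
origin = objects(triples, "HS2", "revisionOf")   # ["HS1"]
```

Because hypotheses, evidence, and executions are all first-class nodes, an agent can traverse the graph to explain why a revised hypothesis has the confidence it does.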

Reproducing a Seminal Cancer Omics Paper

http://www.nature.com/nature/journal/v513/n7518/full/nature13438.html

Patient Proteomic Subtyping

Original [Zhang et al 2014] | Reanalysis [Mallick et al 2020]

[Heatmaps of relative protein abundance (log2, −2 to 2) for the original analysis and the reanalysis, annotated with Proteome, TCGA_mRNA_subtype, Sadanandam_subtype, DeSousa_subtype, Methylation_subtype, cancer (Colon/Rectum), MLH1_silencing, Hypermutation, MSI_high, POLE, BRAF, KRAS, TP53, loss_18q]

Large Differences in Protein Identification

[Legend: 10% difference; 10% to 5% difference]

Multi-Omics: Sensitivity to Methods/Software

[Venn diagram of protein identifications by search engine: X!Tandem, MSGF+, Myrimatch, Pepitome]

35% of protein identifications are not robust to changing just one analysis step

Detection of important proteins in colorectal cancer: proteins detected per search engine

Protein   MSGF+   Myrimatch   X!Tandem
APC       Found   Not found   Found
PIK3CA    Found   Not found   Not found
VTI1A     Found   Not found   Found
PMS1      Found   Not found   Not found
NTHL1     Found   Found       Not found

[Bar charts: number of proteins detected per search engine (MSGF+, Myrimatch, Pepitome, X!Tandem) and by number of search engines that agree (1 to 4)]
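The table above can be queried directly: a protein identification is robust only when every search engine agrees. A small sketch over the table's values:

```python
# Which protein identifications from the table above are robust to the
# choice of search engine? A call is robust only if all engines agree
# (all found it, or none did).

detections = {
    "APC":    {"MSGF+": True,  "Myrimatch": False, "X!Tandem": True},
    "PIK3CA": {"MSGF+": True,  "Myrimatch": False, "X!Tandem": False},
    "VTI1A":  {"MSGF+": True,  "Myrimatch": False, "X!Tandem": True},
    "PMS1":   {"MSGF+": True,  "Myrimatch": False, "X!Tandem": False},
    "NTHL1":  {"MSGF+": True,  "Myrimatch": True,  "X!Tandem": False},
}

def robust(detections):
    """Proteins on which every search engine agrees."""
    return sorted(p for p, calls in detections.items()
                  if len(set(calls.values())) == 1)

disagreements = sorted(set(detections) - set(robust(detections)))
# Here every protein is engine-dependent, echoing the slide's point.
```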

Multi-Omics: Method Sensitivity

Different Parameters: NCBI vs UCSC Reference DB

[Venn diagram: DB1 and DB2 share 4816 proteins, with 240 unique to one database and 1311 unique to the other]

Revising Hypotheses with Additional Data

[Bar chart: counts of proteome subtypes (including new and combined subtypes) across datasets CRC95, CRC100, CRC105, CRC110, CRC115, CRC115_BE]

Analyzing and Comparing Different Datasets

[Bar charts: number of proteins (up to ~12,000) per dataset (CRC95, NCI60, OV, BRCA) and number of SAAVs (0 to 2,000) per dataset (CRC95, OV, BRCA)]

Benchmarking for DREAM Challenges

[Abstract workflow with benchmarked alternatives at each step:
• Align (STAR or TopHat): FASTQ → aligned reads (BAM). Comparison of aligned reads across samples: Spearman correlation of read coverage between STAR and TopHat.
• Quantify (FeatureCount and EdgeR, or Cufflinks suite): aligned reads → quantified and normalized data (TSV, FPKM/RPKM). Comparisons: overlap of genes found (1422 unique to FPKM, 24942 shared, 538 unique to RPKM), gene-wise and sample-wise expression correlations, scatterplots of expression values.
• Predict (generic model or gene-specific model): transcript level → predicted protein levels (TSV). Comparisons: correctness scoring, dynamic range comparison, predicted protein level comparison (linear vs gene-specific).]
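Benchmarking comparisons like the gene-overlap panels above reduce to set arithmetic over the gene lists produced by two workflow components. A sketch with hypothetical gene names (the 1422/24942/538 counts shown in the figure came from real runs):

```python
# The gene-overlap comparison above is set arithmetic over the gene lists
# produced by two quantification components. Gene names here are toy values.

def overlap_counts(genes_a, genes_b):
    """(only in A, in both, only in B): the three regions of a Venn diagram."""
    a, b = set(genes_a), set(genes_b)
    return len(a - b), len(a & b), len(b - a)

fpkm_genes = ["TP53", "EGFR", "BRAF", "KRAS"]
rpkm_genes = ["TP53", "EGFR", "APC"]
counts = overlap_counts(fpkm_genes, rpkm_genes)   # (2, 2, 1)
```

The same pattern, applied at every step of the workflow, is what lets the benchmarking machinery report where two otherwise-equivalent components actually diverge.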

AI for Automating Discovery

Projection of Data Volumes in 2025

Siloed Expertise

COMPUTATIONAL | EXPERIMENTAL

Analytic Complexity for Each Type of Data (doi: 10.1038/nbt.1658)

AI for Interdisciplinary Science Frontiers

MINT: Model INTegration [Gil et al IEMS 2018; Garijo et al eScience 2019]
Collaboration with Daniel Garijo, Deborah Khider, Craig Knoblock, Ewa Deelman, Rafael Ferreira (USC/ISI), Vipin Kumar (UM), Scott Peckham (CU), Chris Duffy & Armen Kemanian (PSU), Kelly Cobourn (VT), Suzanne Pierce (UT)

Food-Energy-Water Nexus

https://commons.wikimedia.org/wiki/File:FEMA_-_43937_-_Flood_Damaged_Tennessee.jpg https://www.pexels.com/photo/bird-birds-blue-sky-drought-1178291/ https://www.publicdomainpictures.net/en/view-image.php?image=256538&picture=agriculture-farming

Integrated Modeling

Economic Models, Agriculture Models, Social Models, Natural Models, Infrastructure Models: mediation at many levels (modeling scope, model set up, variable mapping, data ingestion, spatial gridding)
■ Takes months/years ■ Currently a craft ■ Increased demand

Supporting Compositionality of Scientific Knowledge

• Data formats • Physical variables • Constraints for use • Adjustable parameters • Interventions

http://www.pihm.psu.edu/pihm_home.html

Ontologies for Unambiguous Physical Variables

Work by Scott Peckham and Maria Stoica (CU)
• Ontology of standard scientific names
• E.g., SSN: watershed_outlet_water__volume_outflow_rate is more precise than "streamflow" or "discharge"
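Standard names of this form pair an object with a quantity, separated by a double underscore, so they can be parsed mechanically. A minimal sketch, assuming only that naming convention:

```python
# Standard scientific variable names like the example above pair an object
# with a quantity, separated by a double underscore. A minimal parser:

def parse_standard_name(name):
    """Split an object__quantity standard name into its two parts."""
    object_part, quantity_part = name.split("__")
    return {"object": object_part, "quantity": quantity_part}

parsed = parse_standard_name("watershed_outlet_water__volume_outflow_rate")
# parsed["object"]   == "watershed_outlet_water"
# parsed["quantity"] == "volume_outflow_rate"
```

Because the structure is machine-readable, variables from independently developed models can be matched on their object and quantity parts instead of ambiguous informal labels.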

Spatial Datasets of Varying Quality

Temperature | Precipitation | Crop prices | Production

http://www.scirp.org/journal/PaperInformation.aspx?paperID=76775 http://www.fao.org/giews/countrybrief/country.jsp?code=SSD

Land use | Agricultural potential | Aquifer productivity and recharge

Diao, X. et al. DOI: 10.5772/47938; http://earthwise.bgs.ac.uk/index.php/Hydrogeology_of_Sudan

Automated Data Transformations

Work by Craig Knoblock, Yao-Yi Chiang, Jay Pujara (USC)

Create a saturation file for each mesh cell.

pg.gw file:
Time   Mesh 1     Mesh 2     ...   Mesh n
1440   3.595513   6.534754   ...   3.771523
2880   3.595509   6.534728   ...   3.771488
4320   3.595505   6.534702   ...   3.771453

Cycles.REINIT file:
ROT   DOY   VARIABLE        VALUE
1     1     INFILTRATION    0.0
1     1     SATURATION_L1   0.0
1     1     SATURATION_L2   0.012

Create a data graph for each cell, then write the data graph as a row.

Saturation transformation: the data graph links pihm:SoilDepth_1 (depth, recordedAt meshId) to cycle:Variable_1 (doy, var_name, value, year); e.g., mesh 1 at time 1440 with depth 3.595513 becomes doy 1, SATURATION_L1, value 0.0, year 1. The hydrology model gives depth until ground water; the agriculture model requires saturation.
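The transformation above reshapes the hydrology model's wide table (one column per mesh cell) into the long per-variable rows the agriculture model expects. A schematic sketch; the depth-to-saturation conversion is model-specific and is stubbed here with an invented formula:

```python
# Sketch of the data transformation above: the hydrology model writes one
# column per mesh cell (depth until ground water), while the agriculture
# model wants one row per (time, cell, variable). The depth-to-saturation
# formula below is an invented stand-in for the model-specific conversion.

def depth_to_saturation(depth, soil_depth=4.0):
    """Hypothetical conversion: shallower ground water => higher saturation."""
    return max(0.0, 1.0 - depth / soil_depth) if depth < soil_depth else 0.0

def pgw_to_rows(times, cells):
    """Reshape {cell: [depth per time step]} into long-format rows."""
    rows = []
    for cell, depths in cells.items():
        for time, depth in zip(times, depths):
            rows.append((time, cell, "SATURATION_L1",
                         round(depth_to_saturation(depth), 3)))
    return rows

times = [1440, 2880]
cells = {"Mesh 1": [3.595513, 3.595509], "Mesh n": [3.771523, 3.771488]}
rows = pgw_to_rows(times, cells)   # four rows, one per (time, cell)
```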

Creating Virtual Gauges When No Data Available

Work by Vipin Kumar and Ankush Khandelwal (UM)

Theory-guided data science incorporates biophysical laws into machine learning from remote sensing data

[Quadrant diagram, Use of Data (low to high) vs Use of Scientific Theory (low to high): theory-based models (high theory, low data), data science models (low theory, high data), and theory-guided data science (TGDS) models (high theory, high data). Remote sensing map legend: Corn, Soybean.]
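Theory-guided data science, as in the quadrant above, typically adds a penalty when a model's predictions violate a physical law. A toy illustration; the "law" used here (predictions must be non-negative) is a stand-in for the biophysical constraints in the real work:

```python
# Toy illustration of theory-guided data science: the total loss combines
# data fit with a penalty for violating a physical constraint. The
# constraint here (predictions must be non-negative) is a stand-in for the
# biophysical laws used in the actual research.

def data_loss(predictions, observations):
    """Mean squared error against the observations."""
    return sum((p - o) ** 2 for p, o in zip(predictions, observations)) / len(predictions)

def physics_penalty(predictions):
    """Penalize physically impossible (negative) predictions."""
    return sum(min(0.0, p) ** 2 for p in predictions)

def tgds_loss(predictions, observations, weight=10.0):
    return data_loss(predictions, observations) + weight * physics_penalty(predictions)

consistent = tgds_loss([0.2, 0.1], [0.2, 0.1])      # fits data, obeys the law
inconsistent = tgds_loss([0.2, -0.1], [0.2, 0.1])   # penalized negative value
```

The penalty term is what lets sparse observations go further: even where labels are missing, the model is still constrained by physics, which is exactly what makes "virtual gauges" plausible in ungauged regions.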

Mapping Models to Interventions and Decisions

Work by Armen Kemanian and Yuning Shi (PSU)
Drivers and adjustable parameters: weather, crop types, fertilizer, planting dates, weed factor, soils, land use, solar radiation
Model parameters → response variables: crop yield, nitrogen stress
https://commons.wikimedia.org/wiki/File:Phosphorus_Cycle_copy.jpg

Rich Representations of Models

Work with Daniel Garijo, Deborah Khider, Varun Ratnakar, Maximiliano Osorio (USC)

MINT Model Catalog. Configurations: data formats, processes, physical variables, constraints, adjustable parameters, variables

From Models to Solutions

Work with Deborah Khider (USC); Suzanne Pierce and Lissa Pearson (UT); Chris Duffy and Lele Shu (PSU)

1-2. Identify variables of interest. "The models below generate data that includes the response variables that you selected earlier: European Flooding Index. Other models that are available do not generate that kind of result."
3. Compare models
4. Set up and run model
5. Adjust model to explore interventions, identify problem areas
6. Prepare modeling products for analyst:
• Subsidies for sorghum fertilizer will decrease sesame production
• If sorghum prices fall, sesame and maize production increase
• If sesame prices fall, groundnut production will increase

Model Portability: Data Scarce Regions

Height Above Nearest Drainage (HAND) model
Work with Suzanne Pierce, Daniel Hardesty Lewis, David Arctur, Paola Passalacqua (UT), David Tarboton (Utah State); Misty Porter, Mary Hill (KU)

Ethiopia | Awash Basin

Results provide a first-pass vulnerability assessment for flood risk. Pixels with colors on the right of the scale (brighter red) indicate a higher chance of inundation or flooding.

Model Portability: Data Rich Regions

Work with Suzanne Pierce, Daniel Hardesty Lewis, David Arctur and Paola Passalacqua (UT)

Combining: • Terrain (10 m) • Population • Urban land use
Travis County and Williamson County. Comparison: manually constructed buyout map vs MINT-generated vulnerability map

A Perspective on the Future

Tackling Complex Scientific Phenomena

Single authorship → Co-authorship → Large number of co-authors → Community as author

Evolution of the scientific enterprise from [Barabasi, 2005] extended with the ATLAS Detector Project at the Large Hadron Collider [The ATLAS Collaboration, 2012]. http://commons.wikimedia.org/wiki/File:MRI_brain_sagittal_section.jpg http://commons.wikimedia.org/wiki/File:Earth_Eastern_Hemisphere.jpg http://www.nasa.gov/mission_pages/swift/bursts/uv_andromeda.html

Consider the Atlas Collaboration

The Atlas collaboration:
• Approximately 4,000 authors
• Worked in subgroups that coordinated with one another
• Collaboration lasted many years

Today, large scientific collaborations take significant time and effort, and therefore are not very frequent. How can we change this?

The Importance of Process

"The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and "coaching" their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process."

Freestyle Chess Champion Anson Williams

– Garry Kasparov, 2010
http://www.thechessdrum.net/blog/2007/12/21/anson-williams-king-of-freestyle-chess/

Future AI Systems as Partners for Discovery [Gil DSJ 2017]

Thoughtful AI: Principles for Partnership
• Rationality: Behavior is governed by explicit knowledge structures
• Context: Seek to understand the purpose and scope of tasks
• Initiative: Proactively learn new knowledge relevant to their task
• Networking: Access external sources of knowledge and capabilities
• Articulation: Respond with persuasive justifications and arguments
• Systems: Facilitate integration & collaboration with humans/systems
• Ethics: Behavior that conveys scope and uncertainty
These are important research challenges for AI

Will AI Write the Scientific Papers of the Future?

Crowdsourced vocabularies | Common entities | Software and models | Provenance

Scientific data repositories | Data analysis processes | Collaborative methods

Scientific Paper of the Future: open reproducible publications, automating discoveries, model integration

Geoscience Paper of the Future (Open Science, Modern Paper, Reproducible Publication, Digital Scholarship):
• Text: narrative of the method; some data is in tables, figures/plots, and the software used is mentioned
• Data: include data as supplementary materials and pointers to data repositories (and provenance/workflow)
• Metadata: structured descriptions of the characteristics of data and software (and provenance/workflow)
• Software: for data preparation, data analysis, and visualization
• Provenance and methods: workflow/scripts specifying dataflow, codes, configuration files, parameter settings, and runtime dependencies
• Sharing: deposit data and software (and provenance/workflow) in publicly shared repositories
• Open licenses: open source licenses for data and software (and provenance/workflow)
• Persistent identifiers: for data, software, and authors (and provenance/workflow)
• Citations: citations for data and software (and provenance/workflow)

YOUR IDEAS HERE!

Formulating Grand Challenges

January 2019 Spring 2016

AI reproducing articles | AI as research assistant | AI as co-author

2025: AI can automatically generate new complex scientific analyses using open data

2025: AI detects when it is missing knowledge and can seek and read new scientific papers on target topics

2030: AI can generate and test sophisticated hypotheses about complex physical phenomena

2030: AI can reproduce the results in 80% of the articles in a scientific journal

2035: AI can describe a scientific experiment and discuss sophisticated aspects of it

2035: AI can compare scientific experiments and papers and contrast their merits

2040: AI can teach advanced theories in some scientific domain effectively to students

2040: AI can formulate research questions and generate novel contributions in some scientific domain

AI reproducing articles | AI as research assistant | AI as co-author

2030 | 2035 | 2040

AI to Address Major Future Challenges

Diversity and Breadth of Advances in AI

Perception, Creativity, Commonsense, Vision, Ethics, Robotics, Explanation, Sensing, Language, Discovery

Representation, Knowledge

Learning, Search, Planning, Cognition, Teamwork, Reasoning, Prediction, Generation, Speech, Matching, Dialogue, Understanding
https://commons.wikimedia.org/w/index.php?curid=70268939

Will AI Write the Scientific Papers of the Future?
• The AI community has always been • Visionary • Broad • Inclusive • Interdisciplinary • Determined

• And dare I say • Successful

So… the answer may be yes?

https://www.flickr.com/photos/losalamosnatlab/22043600248 http://www.thechessdrum.net/blog/2007/12/21/anson-williams-king-of-freestyle-chess/

Will AI Write the Scientific Papers of the Future?
• AI researchers have been • Visionary • Broad • Inclusive • Interdisciplinary • Determined
• Humans: • Not systematic • Errors • Biases

• And dare I say • Successful • Poor reporting

So… the answer may be yes? So… the answer is definitely yes?

https://www.flickr.com/photos/losalamosnatlab/22043600248 http://www.thechessdrum.net/blog/2007/12/21/anson-williams-king-of-freestyle-chess/

Will AI Write the Scientific Papers of the Future? Maybe the answer is no:
• The AI community has always been • Visionary • Broad • Inclusive • Interdisciplinary • Determined
• Humans: • Not systematic • Errors • Biases
Penicillin discovery resulted from human error…

• And dare I say • Successful • Poor reporting

So… the answer may be yes? So… the answer is definitely yes?

https://www.flickr.com/photos/losalamosnatlab/22043600248 http://www.thechessdrum.net/blog/2007/12/21/anson-williams-king-of-freestyle-chess/

Will AI Write the Scientific Papers of the Future? Maybe the answer is no:
• The AI community has always been • Visionary • Broad • Inclusive • Interdisciplinary • Determined
• Humans: • Not systematic • Errors • Biases
Penicillin discovery resulted from human error…

• And dare I say • Successful • Poor reporting

So… the answer may be yes? So… the answer is definitely yes? …and humans make unique contributions…
https://www.flickr.com/photos/losalamosnatlab/22043600248 http://www.thechessdrum.net/blog/2007/12/21/anson-williams-king-of-freestyle-chess/

Thank you!

• Varun Ratnakar, Daniel Garijo, Deborah Khider, Maximiliano Osorio, Hernan Vargas (USC)
• Workflows: Jihie Kim, Ewa Deelman, Karan Vahi, Rafael Ferreira, Rajiv Mayani, Hyunjoon Jo, Yan Liu, Dave Kale (USC); Ralph Bergmann (U Trier); William Cheung (HKBU); Oscar Corcho (UPM); Pedro Gonzalez, Gonzalo Castro (UCM); Paul Groth (UA); Ricky Sethi (FSU); (UM); Chris Mattmann, Paul Ramirez, Dan Crichton, Rishi Verma (JPL); Natalia Villanueva (UTEP)
• Linked Earth and Organic Data Science: Julien Emile-Geay, Deborah Khider (USC); Nick McKay (NAU); Felix Michel and Matheus Hauder (TUM); Chris Duffy (PSU); Paul Hanson, Hilary Dugan, Craig Snortheim (U Wisconsin); Jordan Read (USGS); Neda Jahanshad (USC)
• Biomedical workflows: Phil Bourne, Sarah Kinnings (UCSD); Chris Mason (Cornell); Joel Saltz, Tahsin Kurc (Emory U.); Jill Mesirov, Michael Reich (Broad); Shannon McWeeney, Christina Zhang (OHSU); Parag Mallick, Ravali Adusumilli, Hunter Boyce (Stanford U.)
• Geosciences workflows: Paul Hanson (U Wisconsin), Tom Harmon & Sandra Villamizar (U Merced), Tom Jordan & Phil Maechling (USC), Kim Olsen (SDSU); Suzanne Pierce (UT); Chris Duffy & Armen Kemanian (PSU); Scott Peckham & Maria Stoica (CU)
• And many others!