The Transformative Potential of AI for Science
The Transformative Potential of AI for Science: Will AI Write the Scientific Papers of the Future?
Yolanda Gil
Information Sciences Institute and Department of Computer Science
University of Southern California
https://tinyurl.com/yolandagil-aaai2020

This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as the original source is credited.
Source: “Will AI Write the Scientific Papers of the Future?”, Yolanda Gil, 2020 Presidential Address, Association for the Advancement of Artificial Intelligence (AAAI), February 2020.

A Personal Perspective on AI
• The AI community has always been
• Visionary
• Broad
• Inclusive
• Interdisciplinary
• Determined
• And dare I say
• Successful

A Personal View of Watershed Moments in AI: (I) 1980s
Scripts, Plans, Goals and Understanding, Roger C. Schank and Robert P. Abelson

A Personal View of Watershed Moments in AI: 1990s
1990: Sphinx shows speaker-independent large-vocabulary continuous speech recognition (used it to write my PhD thesis)
1992: Soar flies helicopter teams in simulations and is indistinguishable to commanders from human-controlled aircraft
1992: TD-Gammon autonomously learns to play backgammon at human player levels
1995: SKICAT identifies five new quasars in the Second Palomar Sky Survey
1995: Navlab is the first trained car to drive autonomously on highways to cross the United States
1997: Deep Blue defeats human chess world champion and gets grandmaster-level rating
1999: RAX flew a spacecraft autonomously, demonstrating planning, monitoring, and fault repair
https://doi.org/10.1609/aimag.v16i1.1121
https://www.cs.cmu.edu/afs/cs/project/alv/www/navlab_video_page_files/Navlab_Kanade_version97.mpg
https://flic.kr/p/9Dpww

A Personal View of Watershed Moments in AI: 2000s
2000: The Gene Ontology is shown to describe over 15,000 gene products for drosophila, mouse, and yeast
2000: Kismet demonstrates and recognizes emotions
2003: USC ISI’s statistical machine translation prototype beats hand-crafted commercial systems
2004: RDF semantic specifications become W3C recommendations for the GGG (Giant Global Graph)
2007: Stanley wins $1M for first autonomous high-speed off-road driving
2007: First robot soccer team against human players in exhibition game at RoboCup
2009: Pragmatic Chaos ensemble learning wins $1M competition to predict user film ratings
https://commons.wikimedia.org/w/index.php?curid=20798358
https://commons.wikimedia.org/w/index.php?curid=374949

A Personal View of Watershed Moments in AI: 2010s
2010: Siri voice-activated personal assistant is released as a smart phone app
2011: CMDragons team of 10 soccer robots coordinate plays for routine passing, interception, and goal scoring
2011: Watson takes first place in Jeopardy Q/A game defeating two human champions
2012: AlexNet scores improved 10 percentage points in the ImageNet visual recognition challenge
2013: Cognitive Tutor shows 8 percentile points average improvement in algebra in a 25,000-student study
2016: Knowledge Graph used as semantic backbone in one third of 100B monthly searches
2019: Wikidata records 8 billion triples and over 880 billion edits, surpassing Wikipedia as most edited Wikimedia site
https://en.wikipedia.org/w/index.php?curid=54391045
https://www.youtube.com/watch?v=4QtBSDSC2pk

Diversity and Breadth of Advances in AI
Creativity, Vision, Perception, Commonsense, Robotics, Ethics, Speech, Sensing, Explanation, Language, Discovery, Representation, Knowledge, Search, Planning, Learning, Cognition, Reasoning, Teamwork, Matching, Prediction, Coordination, Constraints, Dialogue, Understanding
https://commons.wikimedia.org/w/index.php?curid=70268939

AI to Address Major Future Challenges

The Imperative for AI in Science

AI and Scientific Discovery
• [Lenat 1976]
• [Lindsay et al 1980]
• [Langley 1981]
• [Falkenhainer 1985]
• [Kulkarni and Simon 1988]
• [Cheeseman et al 1989]
• [Zytkow et al 1990]
• [Simon 1996]
• [Valdes-Perez 1997]
• [Todorovski et al 2000]
• [Schmidt and Lipson 2009]

Human Limitations Curb Scientific Progress [Gil DSJ’17]
• Not systematic, e.g., [Peters et al PLOS 2014]
• Errors, e.g., [Herndon et al CJE 2013]
• Biases, e.g., [Anderson et al ACS 2014]
• Poor reporting, e.g., [Garijo et al PLOS 2013]
https://doi.org/10.1145/3339399
https://doi.org/10.1038/s41586-019-1335-8
https://doi.org/10.1038/s41586-019-1923-7
https://arxiv.org/pdf/1910.06710
http://commons.wikimedia.org/wiki/File:MRI_brain_sagittal_section.jpg
http://commons.wikimedia.org/wiki/File:Earth_Eastern_Hemisphere.jpg
http://www.nasa.gov/mission_pages/swift/bursts/uv_andromeda.html

Capturing Scientific Knowledge
Scientific Knowledge
http://www.pihm.psu.edu/pihm_home.html

Supporting Compositionality of Scientific Knowledge
• Data formats
• Physical variables
• Constraints for use
• Adjustable parameters
• Interventions

Capturing Scientific Knowledge
[Diagram labels: Crowdsourced vocabularies; Common entities; Software and models; Provenance; Scientific data repositories; Data analysis methods; Collaborative processes; Open reproducible publications; Automating discoveries; Model integration]

Low-Cost Creation of Scientific Vocabulary Standards
[Gil et al ISWC 2017; Khider et al PP 2019; Emile-Geay et al PAGES 2018]
Work with Deborah Khider, Julien Emile-Geay, Daniel Garijo (USC); Nick McKay (NAU)
Problem: Diversity of requirements for metadata
Approach: Semantic technologies used for controlled crowdsourcing facilitate creation of community standards to describe highly heterogeneous scientific data
• Organic growth: As scientists annotate their datasets, they propose new metadata properties
• Crowdsourcing: Scientists propose properties for reuse, vote on priorities
• Editorial oversight: Editors decide what properties will be in future versions
https://commons.wikimedia.org/wiki/File:An_ice_core_segment.jpg
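
The three steps of controlled crowdsourcing listed above (organic growth, crowdsourcing, editorial oversight) can be sketched roughly as follows. This is only an illustrative sketch, not the Linked Earth implementation: the class `CrowdVocabulary`, its methods, and the property names `archiveType`, `waterDepth`, and `coreLength` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CrowdVocabulary:
    """Hypothetical model of controlled crowdsourcing for metadata properties."""

    core: set[str] = field(default_factory=set)             # properties in the released standard
    proposed: dict[str, int] = field(default_factory=dict)  # crowd property -> vote count

    def annotate(self, prop: str) -> str:
        """Organic growth: using an unknown property while annotating proposes it."""
        key = prop.strip()
        if key not in self.core and key not in self.proposed:
            self.proposed[key] = 0
        return key

    def vote(self, prop: str) -> None:
        """Crowdsourcing: a scientist votes for an already proposed property."""
        key = prop.strip()
        if key not in self.proposed:
            raise KeyError(f"{prop!r} has not been proposed")
        self.proposed[key] += 1

    def priorities(self) -> list[str]:
        """Proposed properties, most-voted first (ties broken alphabetically)."""
        return sorted(self.proposed, key=lambda p: (-self.proposed[p], p))

    def release(self, approved: set[str]) -> set[str]:
        """Editorial oversight: editors choose which proposals enter the next version."""
        accepted = {p for p in approved if p in self.proposed}
        self.core |= accepted
        for p in accepted:
            del self.proposed[p]
        return accepted


# Hypothetical usage: two properties are proposed during annotation, then voted on.
vocab = CrowdVocabulary(core={"archiveType"})
vocab.annotate("waterDepth")
vocab.annotate("coreLength")
vocab.vote("waterDepth")
vocab.vote("waterDepth")
vocab.vote("coreLength")
print(vocab.priorities())       # ['waterDepth', 'coreLength']
vocab.release({"waterDepth"})   # editors promote one property into the standard
print(sorted(vocab.core))       # ['archiveType', 'waterDepth']
```

In this sketch votes only rank the proposals; promotion into the standard is left to an explicit editorial call, mirroring the separation between crowd input and editorial oversight described on the slide.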
https://commons.wikimedia.org/wiki/File:Gravity-corer_hg.png
Results: A new standard for paleoclimate (PaCTS 1.0) with one (!!) single initial face-to-face meeting

Controlled Crowdsourcing: Continuous Ontology Growth

Figure 2: Overview of the metadata annotation interface, with core ontology terms marked with an “L”. The properties under “Extra Information” are part of the crowd vocabulary.

4.1 Annotation Framework: The Linked Earth Platform

The Linked Earth Platform [21] implements the Annotation Framework as an exten...

When annotating metadata, the system offers in a pull-down menu the possible completions of what the user is typing based on similar terms proposed by other users. This helps avoid proliferation of unnecessary terms and helps normalize the new terms created. If none represents what the user wants to specify, then a new property will be added. The property becomes part of the crowd vocabulary, and a new wiki page is created for it. The user, or perhaps others, can edit that page to add documentation. As a result, users build the crowd vocabulary while curating their own datasets. Figure 3 highlights the main features of the Linked Earth Platform. The map-based ...

Figure 3: Overview of the main features of the Linked Earth Platform.

Figure 4: An overview of the core Linked Earth Ontology and its extensions.

We also took into account relevant standards and widely used models. We used several vocabularies: Schema.org and Dublin Core Terms (DC) for representing the basic metadata of a dataset and its associated publications (e.g., title, description, authors, contributors, license, etc.), the wgs_84 and GeoSparql specifications for representing locations where samples are collected, the Semantic Sensor Network (SSN) to represent observation-related metadata, the FOAF vocabulary to represent basic information about contributors, and PROV-O to represent the derivation of models from raw datasets. Figure 4 shows an overview of the ontology, which is layered and has a modular structure. The existing standards just mentioned provide an upper ontology for basic terms. We used the LiPD format, mentioned in Section 2, to develop the LiPD ontology, which contains the main terms useful to describe any paleoclimate dataset (e.g., data tables, variables, instrument used to measure them, calibration, uncertainty, etc.). A set of extensions of LiPD cover more specific aspects of the domain. The Proxy Archive extension defines the types of medium in which measurements are taken, such as marine sediments or coral. The Proxy Observation extension describes the types of observations (e.g., tree ring width, trace metal ratio, etc.) that can be measured. The Proxy Sensor extension describes the types of biological or non-biological components that react to environmental conditions and reflect the climate at the time. The Instrument extension enumerates the instruments used for taking measurements, such as a mass spectrometer. The Inferred Variable extension describes the types of climate variables that can be inferred from measurements or from other inferred variables (e.g., temperature). The crowd vocabulary builds on these extensions. The core ontology and the crowd