Exploiting Parallel News Streams for Unsupervised Event Extraction
Congle Zhang, Stephen Soderland & Daniel S. Weld

Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
{clzhang, soderlan, weld}@cs.washington.edu

Transactions of the Association for Computational Linguistics, vol. 3, pp. 117–129, 2015. Action Editor: Hal Daumé III. Submission batch: 10/2014; Revision batch: 1/2015; Published: 2/2015. © 2015 Association for Computational Linguistics.
Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00127 by guest on 29 September 2021

Abstract

Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and are therefore starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as Freebase rarely cover event relations (e.g., “person travels to location”). Thus, the problem of extracting a wide range of events (e.g., from news streams) is an important, open challenge. This paper introduces NEWSSPIKE-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NEWSSPIKE-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NEWSSPIKE-RE generates high-quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.

1 Introduction

Relation extraction, the process of extracting structured information from natural language text, is increasingly important for Web search and question answering. Traditional supervised approaches can achieve high precision and recall. However, they are limited by the cost of labeling training data and are unlikely to scale to the thousands of relations on the Web. Another approach, distant supervision (Craven and Kumlien, 1999; Wu and Weld, 2007), creates its own training data by matching the ground instances of a knowledge base (KB), e.g., Freebase, to unlabeled text.

Distant supervision can work well in some situations. Unfortunately, the method is limited to relatively static facts (e.g., born-in(person, location) or capital-of(location, location)) for which a corresponding knowledge base exists. But what about dynamic event relations (also known as fluents), such as travel-to(person, location) or fire(organization, person)? Because these time-dependent facts are ephemeral, they are rarely stored in a pre-existing KB. At the same time, knowledge of real-time events is crucial for making informed decisions in fields like finance and politics. Indeed, news stories report events almost exclusively, so learning to extract events is an important open problem.

This paper develops a new unsupervised technique, NEWSSPIKE-RE, that both discovers event relations and extracts them with high precision. The intuition underlying NEWSSPIKE-RE is that articles from two different news sources are not independent, since each is conditioned on the same real-world events. By looking for rarely described entities that suddenly “spike” in popularity on a given date, one can identify paraphrases. Such temporal correspondence (Zhang and Weld, 2013) allows one to cluster diverse sentences, and the resulting clusters can serve as training data for learning event extractors. Furthermore, parallel news can also supply direct negative evidence. To see this, suppose one day the news includes the following:

(a) “Snowden travels to Hong Kong, off southeastern China.”
(b) “Snowden cannot stay in Hong Kong as Chinese officials will not allow ...”

Since news stories are usually coherent, it is highly unlikely that “travel to” and “stay in” (which is negated) are synonymous. By leveraging such direct negative phrases, we can learn extractors that distinguish heavily co-occurring but semantically different phrases, thereby avoiding many extraction errors. Our NEWSSPIKE-RE system encapsulates these intuitions in a novel graphical model. This paper makes the following contributions:

• We develop a method to discover a set of distinct, salient event relations from news streams.
• We describe an algorithm that exploits parallel news streams to cluster sentences belonging to the same event relations. In particular, we propose the temporal negation heuristic to avoid conflating co-occurring but non-synonymous phrases.
• We introduce a probabilistic graphical model that generates training data for a sentential event extractor without requiring any human annotation.
• We present detailed experiments showing that the event extractors learned from the generated training data significantly outperform several competitive baselines. For example, our system more than doubles the area under the micro-averaged precision-recall (PR) curve (0.80 vs. 0.30) compared to Riedel’s Universal Schema (Riedel et al., 2013).

2 Previous Work

Supervised learning approaches have been widely developed for event extraction tasks such as MUC-4 and ACE. They often focus on a hand-crafted ontology and train the extractor with manually created training data. While they can offer high precision and recall, they are often domain-specific (e.g., biological events (Riedel et al., 2011; McClosky et al., 2011) and entertainment events (Benson et al., 2011; Reichart and Barzilay, 2012)). They are also hard to scale to the range of events on the Web.

Open IE systems extract open-domain relations (e.g., Banko et al., 2007; Fader et al., 2011) and events (e.g., Ritter et al., 2012). They often perform self-supervised learning of relation-independent extractions. This allows them to scale, but it also makes them unable to output canonicalized relations.

Distantly supervised approaches learn extractors by exploiting the facts in an existing knowledge base, thus avoiding human annotation. Wu et al. (2007) and Reschke et al. (2014) learned Infobox relations from Wikipedia, while Mintz et al. (2009) heuristically matched Freebase facts to texts. The training data generated by heuristic matching is often imperfect, so multi-instance learning approaches (Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012) have been developed to combat this problem. Unfortunately, most facts in existing KBs are static, such as geographical or biographical data. As a result, these approaches fall short of learning extractors for fluent facts such as sports results or a person’s travels and meetings.

Bootstrapping is another common extraction technique (Brin, 1999; Agichtein and Gravano, 2000; Carlson et al., 2010; Nakashole et al., 2011; Huang and Riloff, 2013). It typically takes a set of seeds as input, which can be ground instances or key phrases. The algorithm then iteratively generates more positive instances and phrases. There are many successful examples of bootstrapping, but the challenge is to avoid semantic drift. Large-scale systems therefore often require extra processing, such as manual validation between iterations or additional negative seeds as input.

Unsupervised approaches have been developed for relation discovery and extraction. These algorithms are usually based on some clustering assumption over a large unlabeled corpus. Common assumptions include the distributional hypothesis (Hasegawa et al., 2004; Shinyama and Sekine, 2006), the latent topic assumption (Yao et al., 2012; Yao et al., 2011), and the low-rank assumption (Takamatsu et al., 2011; Riedel et al., 2013). Because these assumptions rely largely on co-occurrence, previous unsupervised approaches tend to confuse correlated but semantically different phrases during extraction. In contrast, our work largely avoids these errors by exploiting the temporal negation heuristic in parallel news streams. In addition, many unsupervised algorithms require human effort to canonicalize the clusters, whereas our work automatically discovers events with readable names.

Paraphrasing techniques inspire our work. Some techniques, such as DIRT (Lin and Pantel, 2001) and Resolver (Yates and Etzioni, 2009), are based on the distributional hypothesis. Another common approach uses parallel corpora to generate paraphrases. These corpora include news streams (Barzilay and Lee, 2003; Dolan et al., 2004; Zhang and Weld, 2013), multiple translations of the same story (Barzilay and McKeown, 2001), and bilingual sentence pairs (Ganitkevitch et al., 2013). These algorithms create many good paraphrases, but they cannot be used directly to generate enough training data for a relation extractor, for two reasons. First, the semantics of the paraphrases is often context-dependent. Second, the generated paraphrases are often in

news sources typically use different sentences to describe the same event, and that corresponding sentences can be identified when they mention a unique pair of real-world entities. For example, when an unusual entity pair (Selena, Norway) is suddenly seen in three articles on a single day:

Selena traveled to Norway to see her ex-boyfriend.
Selena arrived in Norway for a rendezvous with Justin.
Selena’s trip to Norway was no coincidence.

It is likely that all three refer to the same event re-

[Figure 1 panels: Training Phase (parallel news streams; discover event relations, E = e(t1, t2); group into NewsSpikes with parallel sentences, NS = (a1, a2, d, S); generate training sentences; learn the event extractor) and Testing Phase (test sentences s → extract → extractions E(a1, a2)).]
Figure 1: During its training phase, NEWSSPIKE-RE first groups parallel sentences as NewsSpikes.
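The (Selena, Norway) example describes a grouping step: an unusual entity pair suddenly appears in several articles on one day. The minimal Python sketch below illustrates that step. It is not the authors' implementation, and it rests on several assumptions:

- The `Mention` record, the `find_spikes` function, and the thresholds `min_sentences`, `min_sources` and `max_days` are all hypothetical.
- Entity pairs are assumed to be already extracted from each sentence.
- Each returned key and its sentence list only loosely mirror the NewsSpike tuple NS = (a1, a2, d, S) shown in Figure 1.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    """A sentence mentioning an entity pair on a given day (hypothetical record)."""
    arg1: str
    arg2: str
    day: str      # publication date; the values used below are placeholders
    source: str   # news outlet that published the sentence
    sentence: str


def find_spikes(mentions, min_sentences=3, min_sources=2, max_days=1):
    """Group sentences by (arg1, arg2, day) and keep groups that 'spike'.

    A group survives when its entity pair is rare across the whole stream
    (seen on at most `max_days` distinct days) yet is mentioned that day by
    at least `min_sentences` sentences from at least `min_sources` outlets.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    groups = defaultdict(list)    # (arg1, arg2, day) -> mentions on that day
    days_seen = defaultdict(set)  # (arg1, arg2) -> days on which the pair appears
    for m in mentions:
        groups[(m.arg1, m.arg2, m.day)].append(m)
        days_seen[(m.arg1, m.arg2)].add(m.day)

    spikes = {}
    for (a1, a2, day), group in groups.items():
        rare = len(days_seen[(a1, a2)]) <= max_days
        busy = len(group) >= min_sentences
        diverse = len({m.source for m in group}) >= min_sources
        if rare and busy and diverse:
            # Roughly a NewsSpike (a1, a2, d, S): S holds the parallel sentences.
            spikes[(a1, a2, day)] = [m.sentence for m in group]
    return spikes


if __name__ == "__main__":
    stream = [
        Mention("Selena", "Norway", "2014-01-01", "outlet-a",
                "Selena traveled to Norway to see her ex-boyfriend."),
        Mention("Selena", "Norway", "2014-01-01", "outlet-b",
                "Selena arrived in Norway for a rendezvous with Justin."),
        Mention("Selena", "Norway", "2014-01-01", "outlet-c",
                "Selena's trip to Norway was no coincidence."),
    ]
    for key, sentences in find_spikes(stream).items():
        print(key, len(sentences))
```

Treating any pair seen on more than one day as "background" is a deliberate simplification. A real system would need a more careful notion of what counts as a rarely described entity pair.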
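The temporal negation heuristic from the Snowden example can be sketched in the same spirit. For a single entity pair on a single day, a phrase that is asserted and a phrase that is negated are recorded as a likely non-synonymous pair.

This minimal sketch makes two assumptions:

- An upstream parser has already produced (arg1, arg2, day, phrase, negated) tuples.
- `temporal_negation_pairs` is a hypothetical name. The paper encodes this intuition inside its graphical model, which is not reproduced here.

```python
from collections import defaultdict
from itertools import product


def temporal_negation_pairs(observations):
    """Return phrase pairs that are unlikely to be synonyms.

    `observations` holds (arg1, arg2, day, phrase, negated) tuples; the
    `negated` flag is assumed to come from an upstream parser. Within one
    (arg1, arg2, day) group, every asserted phrase is paired with every
    negated phrase, because coherent same-day news rarely asserts and denies
    the same event about the same entities.
    """
    asserted = defaultdict(set)  # (arg1, arg2, day) -> phrases stated positively
    denied = defaultdict(set)    # (arg1, arg2, day) -> phrases stated as negated
    for arg1, arg2, day, phrase, negated in observations:
        target = denied if negated else asserted
        target[(arg1, arg2, day)].add(phrase)

    negatives = set()
    for key, positive_phrases in asserted.items():
        for p, q in product(positive_phrases, denied.get(key, set())):
            if p != q:  # skip a phrase paired with itself
                negatives.add(frozenset((p, q)))
    return negatives


if __name__ == "__main__":
    news = [
        ("Snowden", "Hong Kong", "day-1", "travel to", False),
        ("Snowden", "Hong Kong", "day-1", "stay in", True),
    ]
    print(temporal_negation_pairs(news))
```

Pairs are stored as unordered `frozenset`s because the heuristic says nothing about direction. The claim is only that two phrases should not be merged into one relation.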
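The introduction's evaluation claim (0.80 vs. 0.30 area under the micro-averaged PR curve) refers to a quantity computed from a ranked list of extractions. The sketch below works as follows:

- It pools extractions from all relations into one list sorted by confidence, which is one common reading of micro-averaging.
- It then adds up precision times the recall gained at each rank.

The function names are hypothetical. Measuring recall against the correct extractions present in the list is also an assumption, since this section does not give the paper's exact recall denominator or interpolation.

```python
def pr_points(labels):
    """Return (recall, precision) after each rank of a confidence-sorted list.

    labels[i] is True when the i-th most confident extraction is correct.
    Recall is measured against the correct extractions in the list (assumption).
    """
    total_correct = sum(labels)
    if total_correct == 0:
        return []
    points, correct = [], 0
    for rank, is_correct in enumerate(labels, start=1):
        correct += int(is_correct)
        points.append((correct / total_correct, correct / rank))
    return points


def area_under_pr(labels):
    """Step-wise area: sum of precision times the recall gained at each rank."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in pr_points(labels):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


if __name__ == "__main__":
    # Pooled ranking: correct, correct, incorrect, correct
    print(round(area_under_pr([True, True, False, True]), 3))  # prints 0.917
```

For the ranking in the example, the area is 1/3 + 1/3 + 0 + (1/3)(3/4) = 11/12. An incorrect extraction adds no recall, so it contributes nothing directly, but it lowers the precision credited to every later correct one.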