Using a Knowledge Graph to Discover Earth Science Information Brian Freitag1, JJ Miller1, Kaylin Bugbee1, Xiaoyi Duan2, Jia Zhang2, Rahul Ramachandran3, Manil Maskey3, Patrick Gatlin3, Sundar Christopher1

1 – University of Alabama in Huntsville; 2 – Carnegie Mellon University; 3 – NASA Marshall Space Flight Center

Introduction AMS Glossary of Results Another approach useful for extracting Knowledge graphs link key entities within a Terms are selected from the American We use the term “” and evaluate the time meaningful information from the AFDs is to look specific domain to other entities via Meteorological Society (AMS) glossary of series from 2007 – 2017 over the southeastern at term usage as a function of time. With sea relationships. Researchers are able to mine meteorology. The AMS glossary is a static U.S. (Figure 2). An anomalous peak can be levels rising in response to global warming and these relationships from numerous sources to database that contains more than 12,000 seen in 2015 with more than 4000 extractions melting land ice, coastal flooding events are infer new knowledge. Text extraction from terms and definitions. for “snow” from AFDs occurring in the month of projected to increase. Analysis from 2007-2017 peer-reviewed papers and scientific reports are February. of the term “flooding” from offices along the untapped resources that can be leveraged by Data Analysis Gulf of Mexico indicates that coastal flooding knowledge graphs to accelerate scientific events are possibly on the rise (Figure 4). In discovery. More than 2M unique AFDs are compiled to 2007, no month exceeded 200 mentions of create a database more than 50 GB in size. flooding whereas each of the past 3 years have Such a database is difficult to parse without seen 7 months exceed 200 and 2 exceed 400. Methods efficient text mining and NLP techniques like GOAL: Demonstrate the feasibility of applying those used in knowledge graph development. natural language processing techniques used A 10-year period from 2007 to 2017 is used for in knowledge graphs to support scientific extracting 20 terms to assess the feasibility of discovery of Earth Science information. using text mining techniques for meaningful Earth Science applications. Knowledge Graph* Figure 2 Total extractions for the term “snow” for 2007 – A knowledge graph with a semantic entity • accumulation, , bomb, bow 2017 for the southeastern U.S. recognition (SER) and a relationship model echo, bright band, derecho, , A closer look at term extractions by day for was developed for Earth Science application. downslope wind, flooding, , freeze, gap February 2015 reveals two distinct peaks in Using the natural language processing (NLP) wind, hard freeze, hurricane, microburst, sea counts of the term “snow” where term usage Figure 4 Counts of extractions with “flooding” from 2007 techniques developed for knowledge graphs, breeze, smoke, snow, , exceeds 250 counts per day (Figure 3). The – 2017 for Gulf Coast NWS offices. how can Earth Science information be peak on the 24th and 25th was associated with a extracted from large data sets? record breaking snowstorm across the Conclusions southeast when places in Alabama received up NWS Area Forecast Discussions We demonstrate the ability of text mining and to 12” of snowfall. The National Service (NWS) issues NLP techniques to perform data analysis on area forecast discussions (AFD) about 4 large amounts of text data that is too Figure 1 Total number of terms extracted for all 122 offices cumbersome for traditional methods. Potential times a day for each from 2007 – 2017. office (WFO). Forecast discussions include applications of this approach include: 1) spatial short-term and long-term forecasts with Trends from specific terms can be used for and temporal patterns for case identification; 2) detailed discussion on upcoming weather several applications climate studies from extended time series; and events. We use the Environmental 1) Case identification from spatial and temporal 3) social science studies for improved Mesonet site hosted by Iowa State University anomalies communication of scientific information. and their archive of NWS text reports to 2) Trend analysis of term usage for extract AFDs from 2001 to 2017 for CONUS climatological application Contact: [email protected] 3) Evaluation of consistency in communication (122 of 126 NWS WFOs). Select terms are Figure 3 Counts of extractions with “snow” by day for extracted from AFDs to study spatial and of scientific information to the general public February 2015 for the southeastern U.S. temporal patterns in the use of terms. * For more information on the NASA science knowledge graph, see presentation IN53E-0669 in this poster hall on Friday afternoon.