
Whitepaper: Horizon Scanning (October 2016)

Horizon Scanning: A novel approach to scanning the scientific, technical and medical horizon for signs of future emergence.

Meta.com · Toronto · Montreal · San Francisco
© 2016 Meta. All rights reserved.

Executive Summary

In the global life sciences marketplace, the ability to rapidly identify and capitalize on emerging technologies that are still in their infancy is a critical marker for success. Anticipating those opportunities, however, requires a deep understanding of the insights buried within the global corpus of scientific, technical, and patent literature. With the rising pace of global output, the scale of literature being generated has become too great to support the horizon scanning approaches that have worked in the past. Too much data is being produced for human efforts alone to interpret, causing the subtle signals that could alert analysts to emerging technical areas to go unnoticed.

Addressing these challenges requires fundamentally new approaches that go beyond the methods and classes of tools being used by academics and life science companies today. It requires advanced predictive intelligence engines capable of bibliome-scale scanning, and the ability to continuously monitor millions of concepts for signs of future emergence.

With the convergence of open-source big data frameworks, advances in machine learning and natural language processing, and the ability to access the growing corpus of closed-access full-text scientific and technical information, the conditions are right to support a new class of product capable of horizon scanning at true scale. This paper examines the conditions required to give teams the ability to spot early technical emergences. It also examines how Meta Horizon Scanning leverages these capabilities to provide innovation-driven companies with a unique view of the future trajectories of science and technology.


The case for Horizon Scanning

For innovation management teams, identifying and assessing emerging technical capabilities is an expensive and time-consuming process. Human-driven approaches may include small-scale reviews, database analyses, expert panels, and conference scouting. These strategies are rapidly becoming unscalable in the face of the growing mountain of global research. As research output continues to rise, so too does the potential for future technical capabilities to emerge from unexpected socioeconomic and geographic areas.1

Innovation management experts need new ways to scale the horizon scanning process and reduce the labor investment involved. Until now, this was not possible. However, as advances in machine learning intersect with the ability to access and mine closed-access scientific and technical articles, novel systems like Meta Horizon Scanning are being used to glean insights from large amounts of unstructured content for the very first time. These insights are helping analysts scan the scientific and technical horizon at bibliome scale and focus their efforts on the areas that offer the most promise. This is reducing overall time and cost, and accelerating early opportunities for in-licensing, acquisition, and strategic partnerships.

China has quadrupled its yearly rate of scientific output in under ten years. It is now the second largest producer of top-cited publications in Materials Science, Chemistry, Engineering, Computer Science, Chemical Engineering, and Mathematics.2

Iran saw an 18.5% increase in patent applications between 2013 and 2014, making it one of the fastest-growing producers in the world.3

Korea spends more on R&D relative to the size of its economy than any other country in the world, investing 4.3% of its GDP into research in 2014.4

Machine-intelligent horizon scanning: why now?

“There’s no point being precise if you don’t know what you’re talking about.”11
John von Neumann, Mathematician, Physicist, Pioneering Computer Scientist

Machine learning is by no means new. Neural networks and natural language processing (NLP) can be traced back to the 1950s,5 statistical AI grew in the 1990s,6 and deep learning has been around for over a decade.7 So why, then, has machine learning seen such a dramatic uptake in almost every global industry over the past few years?

One reason is data. When applied to machine learning systems, large-scale datasets have the power to do everything from informing customer buying patterns, to diagnosing illnesses, to predicting future technologies.8 With over 2.5 quintillion bytes of data being produced every day, the world is overflowing with data, and machine learning is the perfect tool to extract meaningful insights from it at beyond-human scales.9

Another reason is accessibility. Big data processing frameworks like Hadoop and Solr (which are currently at work inside Meta) are open source and freely available. Graphics processing units (GPUs) are now available in the cloud, and companies like Facebook and Google are open-sourcing their machine learning IP.10 As accessibility spreads, adoption grows. This creates a feedback loop in which novel applications for machine learning are created and fed back into the open source code, inspiring more innovative applications and solutions.

While these conditions are creating fertile ground for machine learning opportunities throughout the industrial ecosystem, the key to success lies not just in the power of the algorithms but in access to the content.12 This presents a serious challenge to industries built on the scientific literature, where the vast majority of that content remains behind expensive paywalls. In biomedicine alone, a single academic research article can cost between $30 and $50. These high costs have for years created a large barrier to entry for companies that wished to mine the content and act on the insights buried within it.

Meta has solved this challenge by creating direct relationships with over 35 of the world’s leading scientific, technical and medical publishers. Over the past five years, Meta has compiled the largest scholarly full-text commercial text mining collection on Earth, a collection worth over $850 million. Meta’s collection includes full-text, closed-access, archival, and in some cases, embargoed content.


Meta’s content has been mined, structured and analyzed using advanced analytics, enabling customers to access data and insights without the extensive challenges of content licensing, formatting, training, storing, and processing. Meta users can simply log in, view and explore the insights, or they can choose to incorporate the data into their own systems using Meta’s application programming interface (API).

Meta consumes full-text articles and factors their contents into a large Knowledge Graph, which is conceptually similar to what Google has built for the broader world. This continuously evolving knowledge network feeds the entire Meta platform, including Meta Science, a free AI-enabled literature discovery service that enables researchers and science-focused companies to discover journals and papers that would otherwise remain buried and undiscovered.

In exchange for their content, Meta enables publishers to leverage its artificial intelligence (AI) to expand the reach of publisher journals and articles to even greater audiences around the world. By enabling article discovery across Meta's entire platform, Meta's Publisher Partnership Program is putting more journals and papers into the hands of researchers and innovation-driven companies around the globe.

41M full-text articles | 28M closed-access articles | 40 major STM publishers | 38K serial titles (books & journals)

$850M in peer-reviewed research content

Figure 1: Meta has compiled the largest commercial STM text-mining collection on Earth: an $850M collection that includes over 28M full-text, closed-access articles.


Meta’s unprecedented collection of science and IP content stretches beyond biomedicine to include both U.S. patents (applications and grants) and Chinese patents (applications and grants). This allows Meta to serve a variety of domains in addition to biomedicine, including physics, chemistry, and materials science.

Type | Domain | Language | Volume | Coverage
Closed-Access Full-Text Articles | Science and Technical Literature | English | 28M | 1900-2016
US Patent and Trademark Office | Patent Database | English | 9.7M (app), 4.9M (grant) | 1980-2016
Open-Access Repositories | Science and Technical Literature | English | 26M | 1980-2016
Chinese National Knowledge Infrastructure | Science and Technical Literature | Chinese | 16M | 1994-2001
Chinese State Intellectual Property Office | Patent Database | Chinese | 9.6M (app), 5.4M (grant) | 1980-2016

Figure 2: Meta Horizon Scanning content sources.


A framework created for National Intelligence

The predictive intelligence capabilities that underpin Meta Horizon Scanning were originally developed for the Intelligence Advanced Research Projects Activity (IARPA), a research wing within the Office of the Director of National Intelligence (ODNI). The Foresight and Understanding from Scientific Exposition (FUSE) program was a 5-year, $60 million research project that sought to “develop automated methods that aid in the systematic, continuous, and comprehensive assessment of technical emergence using publicly available information found in published scientific, technical and patent literature.”13 Specifically, the goal of the program was to create a system that could:
• Process the growing, multi-discipline, and unstructured body of full-text scientific, technical, and patent literature from around the world.
• Automatically generate and prioritize concepts within emerging areas, and nominate those that exhibit signs of early technical emergence.
• Provide this capability for literature in English and at least two non-English languages.

SRI International, the Silicon Valley research institute responsible for creating Apple’s Siri, was the prime contractor on the project. The framework they created under the FUSE project is the same framework that now underpins Meta Horizon Scanning.

Emergence: the north star metric for Horizon Scanning

The central hypothesis of Horizon Scanning is that “real-world processes of technical emergence leave discernable traces in the public, scientific, and technical literature.”15 These traces, when extracted from the literature, can be connected and used to build models of future emergence that flag concepts warranting further investigation. So what, then, is technical emergence, how can it be measured, and what features are best suited to form models upon which projections of future emergence can be made? SRI International defines technical emergence as follows:

“Technical emergence is the phase during which a concept or construct is adopted and iterated by an expert community of practice, resulting in a fundamental change in (or significant extension of) human understanding or capability.”16

Since change is a significant signal of emergence, it should follow that those changes can be mapped through semantic and linguistic patterns within the scientific and technical literature.17


[Diagram: a network linking terms, subjects, concepts, topics, methodologies, disciplines, publications, journals, phenomena, instrumentation, experiments, inventions, patent applications, researchers/inventors, teams, projects, communities, networks, alliances, facilities, laboratories, universities, PRIs, firms and organizations.]

Figure 3: Mapping the social nature of emergence.18

Determining technical emergence

There are several key steps that Meta takes to determine the prominence of concepts within current article sets and, ultimately, to predict their future emergence. First, relevant concepts and phrases are identified and extracted from the documents. Next, a holistic analysis is conducted on each concept using over 30 quantitative indicators of prominence and 11 years' worth of content. Finally, this information is aggregated and used to create a global, systematic and complete assessment of the future trajectories of every concept mentioned in the corpus.


Concept mapping

The ability to understand and extract all relevant concepts, phrases and expressions from within a given article set is easily one of the most important steps in the Horizon Scanning process. To enable this, Meta consumes the entire article corpus, relying on the NLP pipeline, which parses and tags the text, identifies nouns, verbs and named entities, and aggregates complex concepts. This allows the system to treat expressions like ‘breast cancer’ as a single concept rather than two individual concepts.

To find concepts that are characteristic of a specific topic, the system compares the frequencies of particular concepts in two sets of documents: the foreground corpus (documents about a single topic) and the background corpus (documents about a mixture of topics). Using several statistical measures, the system then produces a list of concepts, matched to the topic to which they belong.19

Figure 4: Meta relies on the NLP pipeline to identify relevant concepts and expressions from articles within a given corpus.
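The foreground/background comparison can be illustrated with a short sketch. The text says only that "several statistical measures" are used; this example assumes one common choice, Dunning's log-likelihood ratio (G²) computed on document frequencies, and approximates the NLP pipeline's phrase aggregation with simple unigram and bigram extraction so that "breast cancer" is counted as its own concept. The function names, stopword list and toy documents are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real NLP pipeline would rely on part-of-speech tags.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_concepts(text):
    """Return unigram and bigram candidates, skipping any that contain a stopword."""
    words = re.findall(r"[a-z0-9\-]+", text.lower())
    grams = [w for w in words if w not in STOPWORDS]
    for first, second in zip(words, words[1:]):
        if first not in STOPWORDS and second not in STOPWORDS:
            grams.append(f"{first} {second}")  # e.g. "breast cancer" as one concept
    return grams


def document_frequency(docs):
    """Count how many documents mention each candidate concept."""
    counts = Counter()
    for doc in docs:
        counts.update(set(candidate_concepts(doc)))
    return counts


def g2(a, b, n_fg, n_bg):
    """Dunning's log-likelihood ratio for a 2x2 table of document counts.

    a of n_fg foreground documents and b of n_bg background documents mention the concept.
    """
    observed = [[a, n_fg - a], [b, n_bg - b]]
    rows = [n_fg, n_bg]
    cols = [a + b, n_fg + n_bg - a - b]
    total = n_fg + n_bg
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # 0 * log(0) is taken as 0
                expected = rows[i] * cols[j] / total
                score += o * math.log(o / expected)
    return 2 * score


def characteristic_concepts(foreground, background, top_n=5):
    """Rank concepts that are over-represented in the foreground corpus."""
    fg, bg = document_frequency(foreground), document_frequency(background)
    n_fg, n_bg = len(foreground), len(background)
    scored = []
    for concept, a in fg.items():
        b = bg.get(concept, 0)
        if a / n_fg > b / n_bg:  # keep only concepts relatively more common in the foreground
            scored.append((g2(a, b, n_fg, n_bg), concept))
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest score first, then alphabetical
    return [concept for _, concept in scored[:top_n]]


# Toy corpora for illustration only.
foreground_docs = [
    "Breast cancer screening with mammography",
    "Breast cancer risk in young women",
    "Targeted therapy for breast cancer",
]
background_docs = [
    "Mammography equipment pricing",
    "Young women in engineering",
    "Risk models for credit",
    "Therapy dogs in hospitals",
    "Screening tests for diabetes",
]
print(characteristic_concepts(foreground_docs, background_docs, top_n=3))
# ['breast', 'breast cancer', 'cancer']
```

The bigram "breast cancer" scores as highly as its component words here because it appears in every foreground document and no background document; a production pipeline would typically prefer the longer phrase and suppress redundant parts.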


Incorporating third-party concept lists

Often, companies have pre-curated lists of compounds or genes, and they want to understand on which of those concepts they should focus their R&D efforts. For example, a pharmaceutical company might have a list of compounds within its database that are approved within the field of Cardiovascular Disease. By running that list through a Type 2 Diabetes-specific article subset, the results could point to approved drugs within its pipeline that could be repurposed for Type 2 Diabetes.

Meta has comprehensive gold-standard lists of genes and proteins from many organisms that can be input to any scan. This includes a complete set of human gene symbols as well as their alternative names, mouse homologues (human responses to disease can often be recapitulated in mouse models), and FDA-approved compound lists (drug names and active ingredients). Appendix A shows some of the concept sets available within Meta’s growing content catalogue.

With Meta Horizon Scanning, external concept lists can be easily ingested directly into the system. Scans can be done on a particular concept list, or they can be combined with Meta’s Concept Mapping service, allowing analysts to identify novel topics and technologies within their specific field that they might not have known about.
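Ingesting an external list amounts to dictionary matching: each canonical concept is expanded with its alternative names, and every document in the article subset is scanned for whole-word matches. The sketch below assumes a plain Python dictionary as the list format and case-insensitive regular-expression matching. The gene aliases (TP53/p53, ERBB2/HER2) are real, but the documents are made up, and the format does not reflect Meta's actual ingestion interface.

```python
import re
from collections import Counter


def build_lexicon(concept_list):
    """Map every name and alias (lower-cased) to its canonical concept."""
    lexicon = {}
    for canonical, aliases in concept_list.items():
        for name in (canonical, *aliases):
            lexicon[name.lower()] = canonical
    return lexicon


def match_concepts(docs, concept_list):
    """Count how many documents mention each canonical concept (via any alias)."""
    lexicon = build_lexicon(concept_list)
    # Try longer names first so "her2/neu" is preferred over "her2" at the same position.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])",
        re.IGNORECASE,
    )
    counts = Counter()
    for doc in docs:
        found = {lexicon[m.group(1).lower()] for m in pattern.finditer(doc)}
        counts.update(found)  # each concept counted at most once per document
    return counts


# Hypothetical curated list: canonical name -> alternative names.
curated = {
    "TP53": ["p53"],
    "ERBB2": ["HER2", "HER2/neu"],
    "atorvastatin": ["Lipitor"],
}
# Toy article subset standing in for, e.g., a Type 2 Diabetes-specific set.
subset = [
    "Loss of p53 function in pancreatic beta cells",
    "HER2/neu amplification and TP53 status",
    "Atorvastatin and glycaemic control in type 2 diabetes",
    "Lipitor use and incident diabetes",
]
print(match_concepts(subset, curated))
```

Counting each concept once per document gives an article frequency for the curated list, which can then feed the same indicators used for concepts discovered by concept mapping.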


Nominating concepts for technical emergence

Once the concepts have been identified and extracted from the texts, they are analyzed based on a number of semantic and linguistic patterns within the article sets, and within the corpus as a whole.20 Research teams supporting the IARPA FUSE program developed and implemented a large number of these quantitative indicators of emergence; those that contributed to more accurate predictive models were prioritized for use in Meta Horizon Scanning.21

Some indicators are conceptually straightforward and can be computed with little to no intermediate data processing. Article frequency and Maturity are two such examples. Other indicators require extra modeling steps before they can be interpreted.22 For example, an indicator that computes a concept’s Spatial distribution must first compute all researchers associated with a specific concept, then identify their global geographic distribution. Based on this information, it can generate a score reflecting the uniformity or clustering of the spatial distribution of geographic cells represented in an article subset.

Meta Horizon Scanning uses over 30 indicators that have been shown to signal future emergence. These indicators are grouped into four families:
• Baseline indicators: Indicators that are conceptually straightforward and require little to no data processing. Ex. Article frequency in subset, Article frequency in corpora.
• Citation-based indicators: Indicators based on relevant changes to the citation distribution of a concept. Ex. G-index, H-index.
• Geospatial indicators: Indicators that analyze the geographic clustering and distribution of emerging concepts. Ex. Geospatial concept frequency, Geographic cell frequency.
• Predefined forecasts: Computed 3-year prediction scores for extracted concepts. Ex. Three-year prominence, Three-year article frequency.

For a full list of Meta’s indicator set, see Appendix B.
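The citation-based family names the G-index and H-index without defining them. The sketch below assumes the standard bibliometric definitions, applied to the citation counts of the articles that mention a concept: the h-index is the largest h such that h articles have at least h citations each, and the g-index is the largest g such that the top g articles have at least g² citations in total (capped here at the number of articles). How Meta computes these indicators is not described, so treat this as an illustration rather than its implementation; the citation counts are made up.

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g articles have at least g**2 citations in total."""
    ranked = sorted(citations, reverse=True)
    running_total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


# Made-up citation counts for the articles that mention one concept.
citations_for_concept = [25, 8, 5, 3, 3]
print(h_index(citations_for_concept), g_index(citations_for_concept))  # 3 5
```

The g-index rewards a concept whose literature includes a few very highly cited articles, which the h-index ignores once the top articles clear the h threshold.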


Measuring current prominence

With any Horizon Scanning forecasting task, the goal is to predict whether a concept will emerge to become prominent three years into the future.23 To validate this task, a model needs to be created in which a prediction can be made on a year that has already occurred. That way, the prediction can be validated against a current prominence score based on real-world results. To calculate the current prominence of a concept, the following formula is applied:

$$p = \left(1 - \frac{1}{f}\right)\left(1 - \frac{r}{f}\right)$$

where r is the concept's article frequency in the reference window and f is its article frequency in the forecast window.

Figure 5: Meta’s prominence formula is split into two parts. The first, (1 - 1/f), is a prominence correction for low-frequency concepts, designed to correct for concepts with low mentions in current article sets; the second, (1 - r/f), measures prominence strength, designed to detect the level of prominence of a concept within a current article set.

To illustrate how this formula works, consider the following example. A scan is created to uncover new immuno-oncology technologies, and miR-21 is identified as a potentially emerging gene. Further analysis reveals that miR-21 was mentioned in 10 articles published between 2010 and 2012, and 22 articles published between 2013 and 2015. This information is input into the algorithm, with r = 10 (article frequency 2010-2012 inclusive) and f = 22 (article frequency 2013-2015 inclusive), and a prominence score is produced:

$$p = \left(1 - \frac{1}{22}\right)\left(1 - \frac{10}{22}\right) \approx 0.521$$

Suppose now that miR-21 had only been mentioned in two articles between 2010 and 2012 and five articles between 2013 and 2015 (r = 2, f = 5). Although the rate of increase in article mentions is much higher in this case (150% vs 120%), the overall prominence score is lower:

$$p = \left(1 - \frac{1}{5}\right)\left(1 - \frac{2}{5}\right) = 0.48$$

As the above example shows, the formula is designed to correct for concepts with very low mentions within the current article set.

During the original IARPA FUSE program, a score of 0.3 was set as the marker for determining whether or not a concept should qualify as 'currently prominent'.24 The same threshold is used to train the system to calculate future emergence scores in Meta Horizon Scanning.
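The prominence formula and the miR-21 examples can be reproduced in a few lines. This sketch assumes r and f are whole-window article counts, as in the examples, and applies the 0.3 "currently prominent" threshold from the FUSE program; the function names are hypothetical. Note that the formula is undefined for f = 0 and, as written, turns negative whenever r > f (a concept mentioned less often in the forecast window than in the reference window).

```python
PROMINENCE_THRESHOLD = 0.3  # cutoff for "currently prominent" used in IARPA FUSE


def prominence(r, f):
    """p = (1 - 1/f) * (1 - r/f).

    r: article frequency in the reference window (e.g. 2010-2012 inclusive)
    f: article frequency in the forecast window (e.g. 2013-2015 inclusive)
    """
    if f <= 0:
        raise ValueError("f must be positive: the formula is undefined for f = 0")
    low_frequency_correction = 1 - 1 / f  # equals 0 at f = 1, approaches 1 as f grows
    prominence_strength = 1 - r / f       # larger when f is large relative to r
    return low_frequency_correction * prominence_strength


def percent_increase(r, f):
    """Growth in article mentions between the two windows, in percent."""
    return 100 * (f - r) / r


for r, f in [(10, 22), (2, 5)]:
    p = prominence(r, f)
    print(f"r={r:2d} f={f:2d} increase={percent_increase(r, f):.0f}% "
          f"p={p:.3f} prominent={p >= PROMINENCE_THRESHOLD}")
```

Running it shows the point made in the text: the (2, 5) case grows faster (150% vs 120%) yet scores lower, because the first factor discounts concepts with very few mentions.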

Generating a future emergence score

In the final step of Horizon Scanning, a cumulative future emergence score is assigned to concepts extracted from the article set. To generate this score, Meta uses a machine learning technique called random forest classification. Random forest classification creates a large collection of decision trees (called a forest).25

Article Frequency in 2011
  ≥6: Emerging: 8, Not Emerging: 4; split on G-Index in 2012
    ≥4: Emerging: 5, Not Emerging: 4; split on Spatial Distribution in 2013
      ≥6: Emerging: 5, Not Emerging: 0
      <6: Emerging: 0, Not Emerging: 4
    <4: Emerging: 3, Not Emerging: 0
  <6: Emerging: 5, Not Emerging: 6; split on Age in 2009
    ≥3: Emerging: 5, Not Emerging: 0
    <3: Emerging: 0, Not Emerging: 6

Figure 6: An example of a binary decision tree used in Random Forest Classification. The result is a ‘weak classifier’.


Each tree produces a binary prediction: the concept is emerging or it is not emerging. A randomization process conducted during training ensures that each tree learns to focus on a different aspect of the data, resulting in what are known as weak classifiers. By combining the predictions from a random forest of decision trees, a strong indicator of future outcome is produced.

[Figure: individual tree votes (1, 0, 0, ..., 1) are combined into a single score, x = 0.341.]

Figure 7: By combining the results from a random forest of decision trees, a future emergence score is assigned to each concept.
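The vote-combining step in Figure 7 can be sketched as follows. The first function hard-codes the example tree from Figure 6, with each leaf returning the majority class of the training concepts that reached it; the other two trees and all feature values are hypothetical stand-ins. The score is the fraction of trees voting "emerging", one common way to turn a forest's votes into a score. Training the trees (bootstrap sampling and random feature selection, as done by libraries such as scikit-learn's RandomForestClassifier) is not shown, and Meta's exact aggregation is not described in the text.

```python
def figure6_tree(x):
    """Example tree from Figure 6. Returns 1 (emerging) or 0 (not emerging)."""
    if x["article_frequency_2011"] >= 6:
        if x["g_index_2012"] >= 4:
            # Spatial Distribution leaves: >=6 -> 5 emerging / 0 not; <6 -> 0 / 4
            return 1 if x["spatial_distribution_2013"] >= 6 else 0
        return 1  # G-Index < 4 leaf: 3 emerging / 0 not emerging
    # Article frequency < 6; Age leaves: >=3 -> 5 / 0; <3 -> 0 / 6
    return 1 if x["age_2009"] >= 3 else 0


def emergence_score(forest, x):
    """Fraction of trees in the forest that vote 'emerging' for concept x."""
    votes = [tree(x) for tree in forest]
    return sum(votes) / len(votes)


# Hypothetical forest: the Figure 6 tree plus two single-split trees.
forest = [
    figure6_tree,
    lambda x: 1 if x["h_index_2012"] >= 5 else 0,
    lambda x: 1 if x["article_frequency_2011"] >= 10 else 0,
]
concept = {  # made-up indicator values for one concept
    "article_frequency_2011": 8,
    "g_index_2012": 5,
    "spatial_distribution_2013": 7,
    "age_2009": 4,
    "h_index_2012": 6,
}
print(emergence_score(forest, concept))  # two of three trees vote "emerging"
```

A real forest contains hundreds of trees, so the score varies almost continuously between 0 and 1, as with the x = 0.341 shown in Figure 7.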


Training the system

In order for Meta Horizon Scanning to accurately predict three years into the future, the models used to train the algorithms need 11 full years' worth of content. The following breakdown explains the training process in detail.

Suppose a scan was created to make a prediction for 2019. In order to produce that scan, the system needs a complete three-year training window. Since 2016 is not yet complete, the window would be 2012-2015. The goal, then, is to train the system to make a prediction for 2015 using data up to 2012. This allows the results to be validated against known 2015 emergences.

In order to train the model to predict from 2012 to 2015, the system needs to analyze five full years of data. Any less, and the model produces weak results; any more, and it begins to over-emphasize mature concepts. This brings the first articles back to 2008.

To ensure accuracy, Meta trains its models using four separate training windows. This means the above steps are repeated using the following years:
• 2008 - 2012 predicts on 2015
• 2007 - 2011 predicts on 2014
• 2006 - 2010 predicts on 2013
• 2005 - 2009 predicts on 2012

Once the models are trained and the predictions validated, the model is applied to the present day: 2016, predicting for 2019.

[Timeline figure: a data analysis window (2008-2012) and a three-year training window ending at the target scan (2015), with Train, Evaluate and Predict stages spanning 2005-2016 and the final prediction for 2019.]
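The window arithmetic described in this section can be written out explicitly. This sketch hard-codes the parameters stated in the text (a three-year horizon, five years of data per window, four backtesting windows); the function names are hypothetical, and it only generates the year ranges, not the models themselves.

```python
HORIZON_YEARS = 3  # predict three years ahead
DATA_YEARS = 5     # years of data per training window
N_WINDOWS = 4      # number of backtesting windows


def training_windows(last_complete_year, horizon=HORIZON_YEARS,
                     data_years=DATA_YEARS, n_windows=N_WINDOWS):
    """Return [((first_data_year, last_data_year), target_year), ...], newest first.

    The newest target is the last complete year, so every target can be
    checked against emergences that are already known.
    """
    windows = []
    for offset in range(n_windows):
        target = last_complete_year - offset
        data_end = target - horizon
        data_start = data_end - data_years + 1
        windows.append(((data_start, data_end), target))
    return windows


def years_of_content(windows):
    """Total span of years touched by all windows, inclusive."""
    first = min(start for (start, _), _ in windows)
    last = max(target for _, target in windows)
    return last - first + 1


windows = training_windows(2015)
for (start, end), target in windows:
    print(f"{start} - {end} predicts on {target}")
print("years of content needed:", years_of_content(windows))
print("applied to the present day: 2016 predicts on", 2016 + HORIZON_YEARS)
```

The 11-year requirement falls out of the parameters: four windows shift the five-year data range back three extra years, and the three-year horizon extends the span forward to the newest target.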


Validating the predictions

To validate the accuracy of a future emergence score, the model must generate a prediction for a year that has already occurred. That way, it can be evaluated against known emergences.26 For the IARPA FUSE program, an extensive evaluation process was conducted on SRI International’s forecasting platform, which now underpins Meta Horizon Scanning. For each discipline, a random sample of 5,000 concepts was selected and predictions were made for 2007, based on data that went no further than 2004. Appendix C reports the results of each of those forecasts.

[Figure legend: emerging concept, non-emerging concept; regions marked Precision and Recall.]

Precision: Of all the concepts that the system predicted were most likely to emerge, how many did in fact emerge?
Recall: Of all the concepts that emerged within the window, how many did the system predict?27

Figure 8: Mapping precision and recall.
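The two definitions in Figure 8 translate directly into set arithmetic. In this sketch, concepts whose future emergence score meets a threshold count as predicted emergences; the threshold value, concept names and scores are hypothetical, chosen only to show the calculation.

```python
def precision_recall(scores, emerged, threshold=0.5):
    """Compare predicted emergences (score >= threshold) with known emergences.

    scores:  {concept: future emergence score}
    emerged: set of concepts that actually emerged within the window
    """
    predicted = {c for c, s in scores.items() if s >= threshold}
    hits = predicted & set(emerged)
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(emerged) if emerged else 0.0
    return precision, recall


# Hypothetical backtest: scores computed on old data, outcomes observed later.
scores = {"concept_a": 0.72, "concept_b": 0.64, "concept_c": 0.41, "concept_d": 0.15}
emerged = {"concept_a", "concept_c"}
print(precision_recall(scores, emerged))
```

Lowering the threshold typically raises recall at the cost of precision, which is why both are reported when evaluating forecasts.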

Interpreting the results

Once a scan is complete and its predictions generated, focus should turn to translating the results into actionable insights. To address this challenge, Meta visualizes the results of Horizon Scanning queries through a custom interface that is integrated into the Meta Science platform. Through this interface, analysts and innovation teams have a panoramic and global perspective of the future trajectories of every major concept within their industries, systematically organized by future emergence score.

The Meta Horizon Scanning interface allows analysts to filter results in a variety of ways: by current and future prominence scores, by the year a concept was introduced, and by the number of words in the concept name (called N-Grams). This makes it easy to dig through the results, quickly validate hypotheses, and nominate specific areas that warrant further investigation.

Figure 9: Results of a Horizon Scan are displayed in a custom interface on the Meta Science platform.
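The filters described above (future score, year introduced, and N-gram length) can be mimicked over a list of scan results. The record layout, field names and example values below are hypothetical and do not reflect the interface's or the API's actual data model; the sketch only illustrates the filtering logic.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    concept: str
    current_prominence: float
    future_score: float
    year_introduced: int

    @property
    def n_grams(self):
        """Number of words in the concept name."""
        return len(self.concept.split())


def filter_results(results, min_future=0.0, introduced_after=None, n_grams=None):
    """Keep results matching every given criterion, highest future score first."""
    selected = [
        r for r in results
        if r.future_score >= min_future
        and (introduced_after is None or r.year_introduced > introduced_after)
        and (n_grams is None or r.n_grams == n_grams)
    ]
    return sorted(selected, key=lambda r: r.future_score, reverse=True)


# Made-up scan output for illustration.
results = [
    ScanResult("solid state battery", 0.40, 0.80, 2008),
    ScanResult("graphene", 0.55, 0.35, 2004),
    ScanResult("gene drive", 0.20, 0.65, 2011),
]
for r in filter_results(results, min_future=0.5):
    print(r.concept, r.future_score, r.n_grams)
```

Filtering on N-gram length is a quick way to separate broad single-word topics from more specific multi-word concepts.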


Digging deeper

The results of a scan often reveal emergences from unexpected concepts. For this reason, the ability to explore the results from a more granular perspective is paramount. With Meta Horizon Scanning, analysts can home in on any concept within the article set and investigate it from the viewpoint of many of the indicators that contributed to its predicted result.

Through these views, analysts can quickly trace a concept’s past and future trajectories across a number of quantitative indicators, isolate the countries and regions that are producing the early breakthrough research, and even compare multiple emergences side by side to fast-track internal prioritization efforts. These privileged views give innovation teams a unique competitive advantage, allowing them to focus their resources in the right direction much earlier in the process.

Figure 10: Exploring the factors that contributed to the predicted emergence of a biomedical technology using Meta Horizon Scanning.


Getting started with Horizon Scanning

What steps should companies take to extract the most value from Meta Horizon Scanning? The following steps outline how to get started:

1. Determine the challenge.
A proper articulation of the challenge will guide everything that follows throughout the process of running a scan. The value of this step cannot be overstated. What is the overarching goal? Who are the key stakeholders involved in the project? What kind of data is being sought? What are the next steps? Having a clear and shared understanding of how these challenges will be addressed will make the process run smoothly and efficiently.

2. Define the article set.
Ultimately, the challenge articulated in step one will determine the subject matter needed to define the scan. For example, if a funder wanted to identify early-stage biotech companies specializing in oncology, a subset of full-text, oncology-specific biomedical articles would be best suited for that scan. If a company wanted to identify emerging technologies that could lead to more environmentally friendly consumer products, then a materials science subset would need to be defined. Meta has unprecedented volumes of science and IP content to surface emerging technologies in a variety of domains, in both English and Chinese. Figure 2 provides a full list of content options.

3. Choose a concept extraction technique.
Some scans are more exploratory in nature and are best suited for Meta Concept Mapping. For example, a food sciences company that wants to detect new genes to increase food productivity in arid climates would benefit from allowing Meta Concept Mapping to pull relevant concepts from the appropriate article set. By contrast, a pharmaceutical company that wants to find new uses for existing drugs within its portfolio may wish to scan a specific article set against a custom-curated compound list. In some cases, companies might find it useful to apply a combination of the two extraction approaches. This would ensure that nothing important is missed, including any concepts they had not yet prioritized for emergence.

4. Know the next steps.
Once relevant concepts are identified and their three-year predictive impacts calculated, what next steps will be taken? For biomedical scans, Meta Science is a great place to start. Through it, innovation teams can quickly identify the people and labs that are driving future groundbreaking research, as well as adjacent areas of research and technology that they should be paying attention to. Together, these services provide a truly privileged view of the future trajectories of scientific discovery and progress.


Summary

The ability to rapidly identify and capitalize on emerging technologies can create valuable competitive advantages for companies in the life science and technology marketplace. However, with the rise in global research output, the human-driven approaches that innovation management teams relied on in the past simply won’t scale.

Meta Horizon Scanning addresses these challenges by using world-leading machine learning techniques to identify, extract, monitor, and nominate concepts for future emergence from within the global corpus of scientific and technical literature. With Meta, innovators have, for the first time, a complete, continuous and unbiased view of the current and future states of science and technology, and with it, a competitive advantage that can propel their businesses forward.

For more information

To learn more about Meta Horizon Scanning, contact your sales representative, email [email protected], or visit http://solutions.meta.com/horizonscanning.

Appendix A: Available concept sets

Concept Set | Count | Description

Genetic Elements
Human Gene Map | 19,004 | Regularly updated list of all curated human genes.
Human Pseudogenes | 12,965 | Non-protein-coding DNA elements.
Non-coding RNA | 6,031 | RNA elements including lncRNAs, miRNAs, rRNAs, etc.
MicroRNA Targets | 28,645 | Genes that contain a 3'-UTR microRNA binding motif.
Immunologic Signatures | 4,872 | Gene sets that represent cell states and perturbations within the immune system.

Phenotypes & Pathways
Phenotypes | 8,155 | All recorded disease phenotypes, with or without underlying molecular basis, and Mendelian inheritance.
KEGG Pathways | 487 | Molecular pathways interaction/reaction network diagrams.
KEGG Reactions | 10,126 | List of biochemical reactions and metabolic pathway maps.
Gene Ontology | 24,500 | The molecular, cellular and tissue system level classification of genes. Formal ontologies that represent over 40,000 biological concepts.

Compounds
Thomson Reuters Compounds | 700,000 | Nearly 700,000 compounds with targets and bioactivity information.
NIH Clinical Compounds | 726 | A collection of small molecules that have a history of use in human clinical trials.
NCATS Pharmaceutical Collection | 3,500 | Small molecular entities that have been approved for clinical use by U.S., European Union, Japanese and Canadian authorities.
MLSMR Small Molecules | 350,478 | Repository of a chemically diverse collection of small molecules used for probe discovery.
Custom Sets | - | Any custom list derived from combinations of the above, or from privately owned sets.


Appendix B: Horizon Scanning indicators

Family | Name | Description
Baseline | Article Frequency | Total count of articles within the subset that contain the specified concept.
Baseline | Article Frequency in Citations | Total count of articles that cite articles in the subset mentioning the specified concept.
Baseline | Article Frequency in Corpora | Total count of articles within the domain corpus that contain the specified concept.
Baseline | Year Introduced | Year of the concept's first occurrence within the specified article subset.
Baseline | Citation Maturity | Number of years since the first occurrence of the concept within articles cited by the article subset.
Baseline | Age in Corpora | Number of years since the first occurrence of the concept within the entire domain corpus.
Baseline | Current Prominence (based on 3-year rolling window) | Computes the current prominence of a concept over a rolling three-year historical window.
Baseline | Current Prominence (based on 5-year rolling window) | Computes the current prominence of a concept over a rolling five-year historical window.
Citations | G-Index | Calculates a citation-based impact of articles associated with a specific concept.
Citations | H-Index | Incorporates the citation-based impact of articles associated with a specific concept and the count of articles containing the concept.
Citations | Citation Kinetics (total yearly) | Yearly count of citations for articles within the subset that contain the specified concept.
Citations | Citation Kinetics (average yearly) | Average citation count for the specified concept.
Geospatial | Collaboration Networks (affiliations) | Maps the co-authorship network by geographic cell.
Geospatial | Collaboration Networks (countries) | Maps the co-authorship network by country.
Geospatial | Citation Networks (affiliations) | Computes the geographic cell correlation network based on the citation data for articles related to a specific concept.
Geospatial | Citation Networks (countries) | Computes the country-to-country correlation network based on the citation data for articles related to a specific concept.


Geospatial | Quotient by Geographic Cell | Measures the dispersion of research activity around a particular concept within individual geographic cells, relative to all geographic cells.
Geospatial | Concept Frequency | Count of articles that contain the specified concept, broken down by the country of the author.
Geospatial | Quotient by Country | Measures the dispersion of research activity around a particular concept within individual countries, relative to all countries.
Geospatial | Geographic Cell Frequency | Measures geographic knowledge dispersion by computing the number of geographic cells represented by the article set in which the concept appears.
Geospatial | Country Frequency | Total count of unique countries that appear in the organizational fields of articles associated with a specific concept.
Geospatial | Collaboration Network | Total count of geographic cells represented by the collaborating author group of an article associated with a specific concept.
Geospatial | Maximum Geographic Cell G-Index | Computes the G-Index values of papers related to a specific concept, relative to the geographic cell that produced the research.
Geospatial | Maximum Geographic Cell H-Index | Computes the maximum extent to which authors of articles related to a specific concept collaborate with researchers from other geographic cells.
Geospatial | Maximum Geographic Cell Centrality | Measures the geographical diversity of a concept by computing the extent to which researchers are collaborating across geographic cells.
Geospatial | Spatial Distribution | Computes the geographic distribution of authors contributing to a paper referencing the specified concept.
Geospatial | Geographic Cell Diversity | Measures the geographical diversity of a concept by computing the extent to which researchers are collaborating across geographic cells.
Geospatial | Collaboration Spatial Network | Measures the geographic influence of co-authors associated with a specific concept.
Geospatial | Article Frequency (affiliations) | Computes the frequency with which a concept is mentioned within individual geographic cells.
Forecasts | Future Article Frequency | Forecasts the total count of articles matching the defined scan that will contain the specified concept over a three-year horizon.
Forecasts | Future Emergence | Predicted emergence of a concept three years into the future.
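The indicator table names several citation and geographic measures but does not give their formulas. The sketch below rests on three assumptions. The H-Index is taken to be the largest h such that h articles have at least h citations each. The G-Index is taken to be the largest g such that the top g articles together have at least g squared citations. The "Quotient by ..." indicators are assumed to behave like a location quotient. Under those assumptions, the measures could be computed from per-article citation counts and per-region article counts. The function names and the region codes in the example are hypothetical, and this is not a description of Meta's implementation.

```python
from typing import Dict, Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g articles together have at least g**2 citations.

    This sketch caps g at the number of articles supplied.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


def quotient_by_region(
    concept_counts: Dict[str, int], total_counts: Dict[str, int]
) -> Dict[str, float]:
    """Location-quotient reading of the 'Quotient by ...' indicators (an assumption).

    For each region (geographic cell or country), divide the concept's share of
    that region's articles by the concept's share of all articles. Values above
    1.0 mean the region is relatively specialized in the concept.
    """
    concept_overall = sum(concept_counts.values())
    total_overall = sum(total_counts.values())
    if concept_overall == 0 or total_overall == 0:
        raise ValueError("need at least one article overall and one for the concept")
    return {
        # Cross-multiplied form of (c_r / t_r) / (C / T) to limit rounding error.
        region: (concept_counts.get(region, 0) * total_overall)
        / (total * concept_overall)
        for region, total in total_counts.items()
        if total > 0
    }


if __name__ == "__main__":
    citations = [10, 8, 5, 4, 3]  # hypothetical citation counts for one concept
    print(h_index(citations))  # 4
    print(g_index(citations))  # 5
    # Hypothetical regions: 30 of 100 "CA" articles and 70 of 900 "US" articles mention the concept.
    print(quotient_by_region({"CA": 30, "US": 70}, {"CA": 100, "US": 900}))
```

In the example, the hypothetical "CA" region devotes 30% of its articles to the concept against 10% overall, so its quotient is 3.0.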

Appendix C: Precision and recall

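Source | Discipline | Precision | Recall
Scientific Journals | Multidisciplinary | 31% | 50%
Scientific Journals | Chemistry and Chemical Engineering | 34% | 51%
Scientific Journals | Neuroscience | 28% | 44%
Scientific Journals | Physics | 24% | 43%
Scientific Journals | Computer Science: Pattern Recognition & Vision | 29% | 46%
Scientific Journals | | 45% | 65%
Scientific Journals | Materials Science (Minus Nanotechnology) | 31% | 49%
Scientific Journals | Energy | 28% | 48%
Scientific Journals | Mathematics | 27% | 44%
Scientific Journals | Molecular Biology, Biochemistry, and Genetics | 29% | 48%
Scientific Journals | Immunology | 26% | 42%
US Patents | Computer Science | 41% | 58%
US Patents | Communications | 36% | 54%
US Patents | Semiconductors and Memory | 33% | 52%
US Patents | Optical Systems and Components | 35% | 52%
US Patents | Chemical Engineering Processes, Apparatus | 36% | 58%
US Patents | Organic Chemistry and Compounds | 31% | 56%
US Patents | Molecular Biology, Microbiology, and Biotechnology | 31% | 51%
US Patents | Mechanical Engineering and Manufacturing | 27% | 49%
US Patents | Thermal and Combustion Technology | 25% | 44%
US Patents | Electrical Circuits and Electricity | 34% | 51%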
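Appendix C reports precision and recall without stating how they were measured. This is a minimal sketch that assumes the standard definitions over a labeled evaluation set, with true positives (TP), false positives (FP) and false negatives (FN). It shows how the two columns relate. It also adds the F1 score, the harmonic mean of precision and recall, as one way to summarize each row with a single number. F1 is not part of the original evaluation, and the function names are illustrative.

```python
from typing import Dict, Tuple


def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that are correct: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of relevant items that are flagged: TP / (TP + FN)."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (not reported in Appendix C)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# (precision, recall) pairs copied from three rows of the Appendix C table.
APPENDIX_C_SAMPLE: Dict[str, Tuple[float, float]] = {
    "Scientific Journals / Physics": (0.24, 0.43),
    "Scientific Journals / Chemistry and Chemical Engineering": (0.34, 0.51),
    "US Patents / Computer Science": (0.41, 0.58),
}

if __name__ == "__main__":
    for label, (p, r) in APPENDIX_C_SAMPLE.items():
        print(f"{label}: F1 = {f1_score(p, r):.2f}")
```

Under these assumptions, the three sample rows give F1 values of roughly 0.31, 0.41 and 0.48. The harmonic mean penalizes a large gap between precision and recall more than a simple average would.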


References

1. Murdick, D.A. "Foresight and Understanding from Scientific Exposition." Office of the Director of National Intelligence. https://www.iarpa.gov/index.php/research-programs/fuse Retrieved 19 July 2016.
2. "OECD Science, Technology and Industry Scoreboard 2015: Innovation for Growth and Society." OECD Publishing. 2015. p. 65. http://www.keepeek.com/Digital-Asset-Management/oecd/science-and-technology/oecd-science-technology-and-industry-scoreboard-2015_sti_scoreboard-2015-en#page96 Retrieved 31 July 2016.
3. "Global Patent Filings Rise in 2014 for Fifth Straight Year; China Driving Growth." WIPO World Intellectual Property Organization. 14 December 2015. http://www.wipo.int/pressroom/en/articles/2015/article_0016.html Retrieved 31 July 2016.
4. "Main Science and Technology Indicators." OECD Publishing. 2016. p. 16. http://www.keepeek.com/Digital-Asset-Management/oecd/science-and-technology/main-science-and-technology-indicators/volume-2016/issue-1_msti-v2016-1-en#.V57pzdArKGQ#page16 Retrieved 31 July 2016.
5. Hemsoth, Nicole. "Why the Golden Age of Machine Learning is just Beginning." The Next Platform. 20 October 2015. http://www.nextplatform.com/2015/10/20/why-the-golden-age-of-machine-learning-begins-now/ Retrieved 18 July 2016.
6. "What is Machine Learning." Machine Learning in the Netherlands. http://www.mlplatform.nl/what-is-machine-learning/ Retrieved 18 July 2016.
7. Hof, Robert D. "Deep Learning." MIT Technology Review. https://www.technologyreview.com/s/513696/deep-learning/ Retrieved 25 July 2016.
8. Tsidulko, Joseph. "The Race Is On: IBM, Google, Microsoft And AWS Aim To Deliver Machine Learning As A Cloud Service." CRN. 8 July 2016. http://www.crn.com/news/cloud/300081289/the-race-is-on-ibm-google-microsoft-and-aws-aim-to-deliver-machine-learning-as-a-cloud-service.htm Retrieved 17 July 2016.
9. Walker, Ben. "Every Day Big Data Statistics." V&C Cloud News. 5 April 2015. http://www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-data-created-daily/ Retrieved 31 July 2016.
10. Metz, Cade. "Google Open-Sourcing TensorFlow Shows AI's Future is Data." Wired. 16 November 2015. http://www.wired.com/2015/11/google-open-sourcing-tensorflow-shows-ais-future-is-data-not-code/ Retrieved 17 July 2016.
11. Neumann, John von. Qtd. in Alexander, Jeffrey. "A Reasoning-based Framework for the Computation of Technical Emergence." SRI International. GTM 2013. 25 September 2013. p. 5.
12. Wertz, Boris. "Data, not algorithms, is the key to machine learning success." Version One. 6 January 2016. http://versionone.vc/data-not-algorithms-is-key-to-machine-learning-success/ Retrieved 18 July 2016.
13. Murdick, D.A. "Foresight and Understanding from Scientific Exposition." Office of the Director of National Intelligence. https://www.iarpa.gov/index.php/research-programs/fuse Retrieved 19 July 2016.
14. Ibid.
15. Ibid.
16. Alexander, Jeffrey. "A Reasoning-based Framework for the Computation of Technical Emergence." SRI International. GTM 2013. 25 September 2013. p. 8.
17. Ibid.
18. Alexander, Jeffrey. "Emergence is Inherently Social – A Reasoning-based Framework for the Computation of Technical Emergence." SRI International. GTM 2013. 25 September 2013. p. 10. Retrieved 19 July 2016.
19. Meyers, Adam, et al. "The Termolator: Terminology Recognition based on Chunking, Statistical and Search-based Scores." New York University Department of Computer Science. 201X. p. 1.
20. Alexander, Jeffrey. "A Reasoning-based Framework for the Computation of Technical Emergence." SRI International. GTM 2013. 25 September 2013. p. 10.
21. Rahmer, Robert. "FUSE Indicator Catalog." Intelligence Advanced Research Projects Activity (IARPA). 26 April 2016. p. 2.
22. Ibid.
23. Copernicus evaluation. SRI International.
24. Ibid.
25. Malakar, Gopal. "What is Random Forest Algorithm? A graphical tutorial on how Random Forest algorithm works." Online video clip. YouTube. 9 December 2014. Retrieved 18 July 2016.
26. Copernicus evaluation. SRI International.
27. Ibid.

Last modified on October 7, 2016.

About Meta

At Meta, we work on landmark AI challenges that have the power to transform how scientific knowledge is experienced and consumed. Meta provides innovators throughout the scientific ecosystem with powerful views of the current and future trajectories of science and technology, with the goal of making all scientific knowledge computable. Meta's mission is to unlock all of the world's scientific and technical insights using artificial intelligence.


Meta.com Toronto Montreal San Francisco © 2016 Meta. All rights reserved.