Unstructured Data Is a Risky Business

R&D Solutions for Oil & Gas Exploration and Production

Summary

Much of the information being stored by oil and gas companies, such as technical reports, scientific articles, and well reports, is unstructured. As a result, critical information is lost, and E&P teams put themselves at risk because they "don't know what they know." Companies that manage their unstructured data are better positioned to make better decisions, reduce risk, and boost bottom lines.

It is estimated that 80% of existing data is unstructured, meaning that the majority of the data we are generating and storing is unusable. This is understandable, since managing unstructured data takes time, money, effort, and expertise, but the result is that companies waste money by making ill-informed investment decisions. If oil companies want to mitigate risk and improve success and recovery rates, they need to be better at managing their unstructured data. In this paper, Phoebe McMellon, Elsevier's Director of Oil and Gas Strategy & Segment Marketing, discusses how big data technologies, when combined with domain expertise, can help oil and gas companies better manage their unstructured content and improve business outcomes.

Success Rates and Reducing Risk

Every day we create approximately three quintillion (3 × 10^18) bytes of data, the majority of which is unstructured. It is estimated that 80% of existing data is unstructured, meaning that most of the data we store is unusable. In reality, most users either don't know what data they have or simply cannot find it, and the oil and gas industry is no different. In fact, our industry is among the most data-intensive in the world, and while over the last decade it has become very effective and efficient at managing, storing, and sharing structured data, it has failed to find cost-effective ways to manage unstructured information. Unfortunately, it is this unstructured information, often in the form of internal and external PDFs, PowerPoint presentations, technical reports, and scientific articles and publications, that contains clues and answers regarding why and how certain interpretations and decisions are made.

A case in point of the financial risks of not managing unstructured data comes from a customer that drilled a 'dry hole,' costing the company a total of $20 million, only to realize several years later that it had drilled a 'dry hole' two miles away ten years earlier. This discovery was made only when a retiring employee, cleaning out his office, found a paper copy of a drilling report.

Better management of unstructured data helps oil and gas companies mitigate risk by enabling exploration and production teams to learn from their historical information: their successes and failures. Converting unstructured information into structured data can help these teams gain greater insight into subsurface conditions, identify anomalies and patterns that can lead to higher recovery rates, reduce planned and unplanned downtime, reduce HSE incidents, and potentially reduce the number of 'dry holes' a company drills. Deploying other technologies, such as machine learning and natural language processing, together with domain knowledge can further help companies analyze data, identify patterns, and predict outcomes, leading to more informed decisions.

Figure: 80% of the world's data is unstructured and 20% is structured; unstructured data is growing at 15 times the rate of structured data.

The Unstructured Data Challenge

A survey of our oil and gas customers, from the majors to small E&P companies and from Houston to Abu Dhabi, reveals significant internal challenges in managing unstructured data. Some of the common challenges include:

• High variation in information types and sources (e.g., internally vs. externally generated)
• Inconsistency in data quality
• Variation in nomenclature and units to reconcile
• Large numbers of disparate data containers in disparate locations
• High variation in file formats to reconcile
• Diverse data models
• Non-homogeneous ownership with diverse collection strategies
• Uncoordinated access rights and access models
• Unconfirmed, inaccurate, expired, or retracted information
• Duplicate files and lack of version control

In addition to these challenges, the process of mining and structuring unstructured data is not easy. Although difficult to mine, unstructured data can be an incredibly rich source of content and an important asset to oil and gas companies, extending the life and value of previously acquired data and interpretations. Millions of dollars are spent acquiring data from the early stages of exploration to late-stage production, yet vast quantities of that data are stored in unstructured formats, meaning they cannot be easily searched and used in the future. Two phrases echoed across the industry are "I don't know what I don't know" and "if we only knew what we know." At a time when corporate knowledge and expertise are being lost as senior staff leave, it is critical that companies invest in understanding what data and information they have, and in managing that data so it can be discovered and used.
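Three of the challenges listed above, nomenclature variation, unit variation, and duplicate files, are the most mechanical to address. The Python below is a minimal sketch under stated assumptions, not a description of any Elsevier tool. It assumes a hypothetical synonym table (`CANONICAL_TERMS`) and just two length units, maps variant terms to one canonical term, converts depths to metres, and groups documents whose text is identical once whitespace and case are ignored. All names are illustrative. A production pipeline would instead draw on a curated domain taxonomy.

```python
from __future__ import annotations

import hashlib
import re

# Hypothetical synonym table: variant spellings/abbreviations -> canonical term.
# A real system would load this from a curated domain taxonomy.
CANONICAL_TERMS = {
    "tvd": "true vertical depth",
    "true vertical depth": "true vertical depth",
    "md": "measured depth",
    "measured depth": "measured depth",
}

# Conversion factors to metres (the international foot is exactly 0.3048 m).
TO_METRES = {"m": 1.0, "ft": 0.3048}


def normalize_term(term: str) -> str:
    """Map a term to its canonical form; unknown terms come back lower-cased."""
    key = term.strip().lower()
    return CANONICAL_TERMS.get(key, key)


def to_metres(value: float, unit: str) -> float:
    """Convert a length to metres; raises KeyError for units not in TO_METRES."""
    return value * TO_METRES[unit.strip().lower()]


def content_fingerprint(text: str) -> str:
    """Hash text after collapsing whitespace and case, so trivially
    reformatted copies of the same document share one fingerprint."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(docs: dict[str, str]) -> list[list[str]]:
    """Group document names whose normalized text is identical."""
    groups: dict[str, list[str]] = {}
    for name, text in docs.items():
        groups.setdefault(content_fingerprint(text), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    print(normalize_term(" TVD "))
    print(to_metres(100, "ft"))
    print(find_duplicates({
        "well_report_v1.txt": "Dry hole  at 3,000 m",
        "well_report_copy.txt": "dry hole at 3,000 m",
        "survey.txt": "Seismic survey notes",
    }))
```

Exact-hash matching only catches copies that differ in whitespace or case. Detecting near-duplicate versions of a report would require a similarity measure, which this sketch does not attempt.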
Overcoming the challenge

As a company that has led the way in the publishing industry, Elsevier transformed its business by tackling some of these challenges decades ago, taking millions of static paper journals and books and converting them into digital assets that can be searched and downloaded instantaneously. By continually investing in the development of automated technologies and tools, we reduced the time, effort, and cost of digitization, and we improved our metadata extraction and indexing capabilities and processes by refining and expanding our domain-specific taxonomies and ontologies. The expertise we developed and the investments we made paved the way for us to better support our customers by developing domain-specific workflow tools in a variety of industries, including oil and gas. The journey of structuring our own unstructured assets, which began in the 1990s, not only transformed our business but also helped transform the way the industries we serve consumed and used our content.

Today, we continue to be at the forefront of technological service to industry by leveraging machine learning systems to analyze a variety of data types and formats, recognizing that these systems, when deployed with domain expertise, can save money by shrinking the time and effort of mining content manually. More importantly, we can help our customers reduce the time between data generation, analysis (and interpretation), and decision-making. According to the 2016 Accenture and Microsoft oil and gas digital trends survey¹, two-thirds of oil and gas professionals reported that the use of analytics is one of the most important capabilities for transforming their companies. In the same survey, 56 percent of respondents indicated that big data and digital capabilities will enable them to make faster, better decisions.

Not surprisingly, the biggest time and financial sink is not the actual processing but the evaluation and preparation of the content to be processed, and the development and implementation of quality assurance and quality control processes. Overcoming the unstructured data challenge requires devising and implementing an effective and efficient processing pipeline; developing robust taxonomies (and ontologies); accessing extensive content to help 'train' and refine natural language processing and semantic text analytics capabilities; and building tools that facilitate discovery of the newly structured content and that align with, or are integrated into, our customers' workflows.

A colleague of mine wrote several years ago, "as stated by Thomas Edison, genius is one percent inspiration, ninety-nine percent perspiration…Scientists are perpetually looking for meaning in the mountains of data, while those same mountains grow to ever higher levels. Scientists need to be able to rapidly identify relevant data points and relationships of interest during their exploration process."² Although he was writing about data management in the pharmaceutical and medical industries, this observation applies just as readily to geoscientists, engineers, asset managers, and others working in the oil and gas industry.

No time like the present

Management of structured and unstructured data is critical for the oil and gas industry to improve operational efficiency across the value chain. While it is understandable that companies are hesitant to invest in data management initiatives when the price of oil is hovering around $50/bbl, case studies continually emerge that demonstrate the benefits of investing in managing unstructured data, as do examples of the costs incurred by companies that make poor investment decisions merely because they don't know what they know. Management of unstructured data has the potential to enhance both the productivity and competitiveness of a company by enabling exploration and production teams to gain insights through the identification of patterns, relationships, and anomalies. Those insights can help teams make better decisions faster, improving success and recovery rates as well as bottom lines in both the short and long term.

Reference
1. The 2016 Upstream Oil and Gas Digital Trends Survey (Accenture and Microsoft).
