Data Wrangling for Effective Metadata Extraction

Organizations are realizing that there is valuable information hidden within their legacy files (both paper and electronic), which can now be efficiently identified due to improved methods for extracting metadata. Metadata has several forms and can be located within many areas. For example, in the oil and gas industry, metadata contains information pertaining to fields, basins, geological provinces, etc. In addition to the company, third parties may use or need the metadata: contracting companies, vendors, fabricators and partners.

The importance of accurate and accessible metadata should not be undervalued. Accessible metadata attribution provides end users with confidence that the origin, history, and integrity of each attribute comes from a reliable source. Document metadata is also utilized to populate system profiles for easy end-user access and fewer end-user copies.

Organizational challenge

Many organizations struggle to enforce the information management standards, processes or procedural guidance for metadata, often due to cost or resource constraints. However, with the right course of action, companies can unlock the value in previously inaccessible metadata and documents.

There are many metadata challenges that organizations are currently facing, such as:

• Metadata is available but has not been extracted to populate system attribution.
• There is a lack of consistent use of metadata.
• There is no efficient method for quality-checking existing metadata.
• Information is stored in unstructured drives and cannot be instantly located.
• There is too much information to analyze and not enough manpower to do so.
• Documents containing foreign languages have not been translated.
• Existing paper or tape media is aging and deteriorating by the year.
• Information that was inherited during mergers and acquisitions does not contain data consistent with the acquiring company’s standards and systems.
Data wrangling services

Data wrangling services support an organization’s drive to digital by extracting accurate and complete metadata from previously inaccessible repositories of information. Furthermore, metadata associations and relationships between document revisions, attachments or mark-ups/comment sheets must be maintained.

Data wrangling solutions demonstrate an intimate understanding of the inherent challenges of digitizing data, as well as a scalable, structured and phased approach to metadata extraction. Metadata extraction utilizes targeted technologies to accomplish data extraction, such as computer vision, machine learning, and text analysis. Additionally, SMEs’ hands-on experience supports an organization’s drive to digital by maximizing the value of extracted metadata, such as:

• Utilizing extracted data to enable organizations to create structured repositories.
• Transforming extracted metadata to align with taxonomies.
• Aligning extracted metadata with target system attribution requirements.
• Identifying gaps and missing or erroneous data.

Data wrangling approach

Metadata extraction and cleansing activities can be customized by defined organizational rules, such as numbering and coding procedures, target system attribution requirements, etc. Data wrangling services would encompass several metadata extracting techniques, such as:

• Key word trigger: Intended for items such as Title
• Form based: Used for repetitive forms to extract targeted information
• Vision analysis based: Used to dynamically find an area on the document and extract targeted information
• Free form extractions: To extract targeted data
• Exact Phrase: Intended for items like a list of Originators, Country, Basin, Field

By leveraging domain SME engineering IM knowledge, extracted metadata can be processed, mapped and aligned to organizational requirements, such as:

• Identifying gaps and missing or erroneous data
• Reviewing mandatory system metadata field requirements and expected data for each field
• Aligning metadata to numbering Specifications and/or Procedures
• Reworking metadata transformation to include newly aligned codes and ensure no gaps are present
• Identifying value-add opportunities with available attributes for enhancement of data capabilities (e.g. System Numbers, Area Codes, MOC Numbers, etc.)
• Aligning identified value-add opportunities with global taxonomies

The result

The ability to extract rich and searchable data from previously inaccessible repositories can transform the way a company operates. Implementing data wrangling services can lead to:

• Increased data integrity
• Enhanced safety
• Streamlined processes
• Increased data traceability
• Improved decision quality
• Accelerated decision-making
• Increased operational use of information that was previously unreachable

About the author

Janine Murray
Consulting Practice of Energy, Natural Resources, Utilities and Engineering & Construction

Janine Murray is an IM Consultant with over 15 years of experience in the O&G industry. She has extensive FE/Operations and Major Capital Project (MCP) Information Management experience. She also has deep experience with IM brownfield modifications, greenfield enhancements, MCP joint ventures, Closeout, and MCP handover to Operations. Additionally, she is experienced with document cleansing and data extraction techniques for digitizing O&G legacy assets.

She can be reached at: [email protected]

Wipro Limited
Doddakannelli, Sarjapur Road, Bangalore-560 035, India
Tel: +91 (80) 2844 0011
Fax: +91 (80) 2844 0256
wipro.com

Wipro Limited (NYSE: WIT, BSE: 507685, NSE: WIPRO) is a leading global information technology, consulting and business process services company. We harness the power of cognitive computing, hyper-automation, robotics, cloud, analytics and emerging technologies to help our clients adapt to the digital world and make them successful. A company recognized globally for its comprehensive portfolio of services, strong commitment to sustainability and good corporate citizenship, we have over 175,000 dedicated employees serving clients across six continents. Together, we discover ideas and connect the dots to build a better and a bold new future.

For more information, please write to us at [email protected]

IND/BRD/MAR 2019 - FEB 2020.
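
As an illustration of the key word trigger and exact phrase techniques listed under "Data wrangling approach", together with the step of identifying gaps against mandatory system fields, the following is a minimal Python sketch. It is not the tooling described in this paper: the function names, the phrase lists, the mandatory-field list and the sample text are all hypothetical, and plain regular expressions stand in for whatever extraction technology an organization actually uses.

```python
import re

# Hypothetical controlled vocabulary for exact-phrase matching (illustrative values only).
EXACT_PHRASES = {
    "Country": ["Norway", "Brazil"],
    "Basin": ["Permian Basin", "Campos Basin"],
}

# Hypothetical list of mandatory attributes for a target system profile.
MANDATORY_FIELDS = ["Title", "Country", "Basin", "Field"]


def extract_by_keyword_trigger(text, trigger):
    """Key word trigger: return the text after '<trigger>:' on the same line, or None."""
    pattern = re.compile(
        rf"^\s*{re.escape(trigger)}\s*:\s*(.+?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_by_exact_phrase(text, phrases):
    """Exact phrase: return the first listed phrase that occurs in the text, or None."""
    for phrase in phrases:
        if re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE):
            return phrase
    return None


def extract_metadata(text):
    """Combine both techniques into one metadata record."""
    record = {"Title": extract_by_keyword_trigger(text, "Title")}
    for field, phrases in EXACT_PHRASES.items():
        record[field] = extract_by_exact_phrase(text, phrases)
    return record


def find_gaps(record):
    """Return the mandatory fields that are missing or empty in the record."""
    return [field for field in MANDATORY_FIELDS if not record.get(field)]


# Hypothetical document text used only to exercise the sketch.
sample = """Title: Well Completion Report
Prepared for operations in the Campos Basin, Brazil."""

record = extract_metadata(sample)
print(record)
print(find_gaps(record))  # -> ['Field']
```

In this sketch the gap report is what a subject matter expert would review: the missing "Field" attribute has to be sourced elsewhere or flagged before the record is loaded into a target system.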
