WHITE PAPER

Creating value from National Data Repositories

Introduction

As the name suggests, National Data Repositories or NDRs are typically set up by government agencies and regulators as part of a long-term strategy to protect and optimize the value of a nation's natural resources. By collecting, standardizing and making data available quickly and efficiently, they reduce barriers blocking investor entry, eliminate competitive positions based on data holdings and thereby maximize inward investment in exploration, development and production operations.

To achieve this, NDRs must create value for key data generators and users – the operating companies and the service providers. In many respects, the value of NDRs to operating companies, especially in territories that have implemented public data release policies, can be argued to be the data volumes they enable them to access. In a study commissioned by Common Data Access, "The business value case for data management – a study" (Hawtin and Lecore, 2011), it is stated that, since oil company value is directly related to an understanding of the subsurface, data contributes 25-33% of that value. Of course, unless the data can be efficiently accessed, integrated into a company's systems and processes and be considered of recognized quality, that value is diminished.

CGG & NDRs

CGG Data Management Services has implemented and operated NDRs around the world since 2008. In 2013 CGG was awarded the contract to operate Diskos, Norway's National Data Repository. Widely regarded as the first true NDR and referenced as an example of international best practice, Diskos serves over 60 different companies and over 300 regular users. It celebrated its 20th anniversary of operation in 2015. A core team of CGG experts delivers the services together with staff from our project partners, EVRY and Kadme. The initial phase of the contract included migration of a proprietary data model format into the PPDM data model, the development of new functionalities, the design and implementation of the IT environment and the transfer of nearly 1 PB of data. Today the data volume exceeds 5 PB. Each month an average of 150 TB of new data is loaded. The 300+ Diskos users place over 1000 data orders every month, averaging 65 TB of downloads and distribution.

In this white paper we discuss current NDR best practice and share our experience on how best to initiate, implement, operate and create value from NDRs. We reveal how NDRs have developed to provide greater automation and new capabilities, and consider how they must further evolve to meet the future needs of the industry.



Efficient access to trusted data comes at a cost. Although there are different business models for NDRs, with many countries choosing to pay for their NDR from central funds, it is the operating companies that ultimately, directly or indirectly, pay for the service. In addition to paying for the NDR, the operating companies and their service suppliers also carry the cost burden of data submission. Additional copies of data, preparation to required standards, and administration efforts all contribute to this cost.

"We have big expectations for the new partnership agreement with CGG. The goal is to build on 20 years of success and bring Diskos to the next level," says Diskos manager Eric Toogood, who points out several focus areas for the upcoming years.

If the proposed range of 25-33% of company value derived directly from data is accepted, the outcome of any cost/benefit analysis performed on a planned NDR implementation or ongoing operation should be a foregone conclusion. This does not mean, however, that there is no need for a focus on efficiency. Opportunities for NDR operations to realize and deliver further efficiencies are required and need to be pursued. Modern NDR solutions should both reduce cost and save time for their stakeholders, whether that relates to the cost to the regulator of operating the environment or to minimizing the burden on operating companies conducting their data submissions, for example. Additionally, NDRs must evolve to be more than a repository, delivering direct access to the data source. In the future, they should provide a feedstock of data for new and emerging analytics and machine learning capabilities whilst at the same time preserving information security.

"Diskos goes next level", an article in Diskos – 20 years of service for geology: http://www.npd.no/en/Publications/Presentations/DISKOS/Diskos-goes-next-level/

Evolution of NDRs

In many ways, the progression of NDRs parallels changes we have all experienced in consumer banking. Until comparatively recently, depositing or withdrawing money from a bank involved a visit to a "branch" and the performance of manual handling tasks involving physical items. In a relatively short time period this conventional way of doing business has been replaced by online banking, with instructions and transactions carried out digitally.



NDRs are not so different. Submissions or "deposits" of data involved preparing file sets on media and shipping them to a physical location or "branch". Whilst downloads or "withdrawals" of data could be planned and defined online, typically the actual delivery of data required the physical shipment of media.

Such physical transfers of data take time, cost money and are inherently inefficient. When the operation of Diskos was tendered in 2013, the technical and commercial requirements indicated that an evolutionary jump was necessary, marking a transition to a new, highly automated, efficient and lower-cost solution. CGG delivered this, providing an NDR solution that has greater automation and new capabilities. This is what has enabled the significant monthly uploads of data and the rapid, accurate self-service fulfilment of over 99% of the approximately 1000 data orders made each month. With Diskos celebrating 20 years of service, CGG is proud to have enabled this next evolutionary phase.

Automation

When deciding on the services to automate, those accessed most frequently with the greatest urgency should take priority. Exactly what services does an NDR typically offer? Considering different geographies and NDRs, the services can be grouped into the following broad categories:

Data Input: The submission of original or reprocessed data or reporting to satisfy monthly or annual requirements, including validation and QC against defined standards.

Data Orders: Requests for data transfer to authorized parties.

Data Maintenance: Updating meta-data such as entitlement settings that control data access. Correcting, replacing and amending records.

Data & Functional Upgrades: Clean-up and data extraction projects and changes to functionality.

Administration: Information security routines, process improvement and risk management, quality monitoring, user support and updates to standards and guidelines.

Of these tasks, in terms of service demand, data inputs are typically the most frequent activity, closely followed by data orders. However, although Service Level Agreements (SLAs) may govern the rate at which data is to be loaded, there is a significant business driver for fulfilling data orders quickly. For this reason, we will look at the automation of data orders first.



Data Orders

The ability for a user to search, browse and view, both textually and on a map, the data they are entitled to is a pre-requisite. Invariably, this functionality is delivered over a web interface. Automation beyond this point occurs in two ways:

1) Delivery of the data order
2) Access to the data via an Application Programming Interface (API)

A data order actually consists of a user-defined set of data that needs to be located and copied. Since the user defines the order on the web interface, it consists of meta-data describing the data. To fulfil the order it is necessary to use the meta-data to locate, extract, copy and send the data to the user. Where the user has defined part of a seismic volume, the SEG-Y data (pre- or post-stack) for example, there is also a need to 'cut' the data to the spatial limits defined in the order.

CGG's Akon system automates this entire process with an API between the web interface and the NDR 'back-end' managing the process of locating, cutting and copying the data. Since the web interface only allows users to see and order data to which they are entitled, access rights are also honored. To complete the order, the extracted data volume or order is either placed onto an FTP/SFTP site or copied to media and dispatched, according to client instructions.

In addition to the conventional approach of browsing and ordering data via a web interface, CGG's system can be 'exposed' to user applications via an API. This is fully secured and honors data entitlement. The API enables users to integrate details of the data they are entitled to into their own desktop applications. For example, if they have built a GIS application that displays their corporate data availability, they can add a layer that shows data available from the NDR, providing a complete picture of data coverage. It is also possible to construct and apply regular data downloads for well data. This can be useful when an operating company has a particular group of wells of interest and wishes to maintain local file storage of all relevant data for those wells. Whilst it is easy to create an order for all of this data, as new data is being loaded all the time, it quickly becomes necessary to update and submit new data orders to gain access to newly loaded data. Using the API, this update process can be fully automated, ensuring that the latest and most complete data coverage is always available to the company geoscientists.
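To illustrate the kind of client-side automation this makes possible, the sketch below polls a hypothetical NDR web service for newly loaded items on a watched group of wells and places an updated data order when something new appears. The base URL, endpoint paths, field names and token handling are illustrative assumptions only; they do not describe the actual Diskos or Akon interface.

"""Sketch: keep a data order up to date for a watched set of wells.

The endpoints, payloads and auth scheme below are illustrative assumptions;
a real NDR API will differ.
"""
import requests

NDR_API = "https://ndr.example.com/api/v1"          # hypothetical base URL
TOKEN = {"Authorization": "Bearer <api-token>"}     # hypothetical auth header
WATCHED_WELLS = ["35/11-1", "35/11-2", "35/11-5"]   # wells of interest

def list_new_items(well_id: str, since: str) -> list:
    """Return metadata for items loaded for a well after a given date (assumed endpoint)."""
    resp = requests.get(
        f"{NDR_API}/wells/{well_id}/items",
        headers=TOKEN,
        params={"loaded_after": since},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]          # only items the caller is entitled to

def submit_data_order(item_ids: list) -> str:
    """Create a data order for the given items and return its id (assumed endpoint)."""
    resp = requests.post(
        f"{NDR_API}/data-orders",
        headers=TOKEN,
        json={"items": item_ids, "delivery": "sftp"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["order_id"]

def refresh(last_run: str) -> None:
    """One polling cycle: find anything newly loaded for the watched wells and order it."""
    new_items = [i["id"] for w in WATCHED_WELLS for i in list_new_items(w, last_run)]
    if new_items:
        print("Ordered", len(new_items), "new items as order", submit_data_order(new_items))

if __name__ == "__main__":
    refresh(last_run="2017-01-01")

Run on a schedule, a script of this shape keeps a local working store aligned with the repository without any manual browsing of the web interface.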

Data Input

The ease of performing loading or data input to an NDR can be highly variable. In a perfect world, where there are clear submission guidelines, each contractor and company involved in data acquisition and supply uses industry-standard file formats and they are consistently and accurately applied, loading would be a straightforward process. However, the gathering and supply of the data that must eventually be submitted to an NDR involves a multitude of stakeholders acting over a long period of time. Consider also that requirements and values that should be populated in header records do sometimes change over time, and this can therefore lead to genuine data inconsistencies.

This means that, even in mature provinces with well-established NDRs, data preparation for loading remains a significant task. There are, however, various aspects of this process that can be automated.

For simpler data types that do not include associated files, such as consolidated production data reporting, it is possible to use industry data exchange protocols. For the Diskos operation in Norway, the Norwegian Petroleum Directorate participated in an NDR community initiative to create a derivative of the Production Reporting Markup Language (PRODML) from Energistics. This was termed the Monthly Production Reporting Markup Language (MPRML). Using this as the target format, operating companies extract the relevant data from their internal production reporting systems using scripts. Each operating company creates a single XML file using the MPRML format that is automatically submitted. The contents of the file are validated before being published to the NDR. The submission and publication of monthly reporting data are almost entirely automatic for both the supplier and the recipient of the data.

Certain data types, such as well and seismic data, are more complex, consisting of many different files and file types. Full automation of the loading of this type of data is possible, but is not yet desirable as it would inevitably lead to incomplete or inconsistent data.

Production data gathered using MPRML as seen on the Production Data Public Portal: http://www.diskos.no/public-portal

Collaboration Works

MPRML is itself the result of collaborative work coordinated by Energistics. This global, not-for-profit industry consortium serves as an independent body to facilitate and manage open data, information and process standards for the upstream oil and gas industry. The concept of a standardized production data reporting structure that would enable regulators to meet government objectives was conceived at the National Data Repository conference held in Rio de Janeiro, Brazil in 2010. A work group was formed and six different countries contributed alongside several service companies. The results of the work were presented at NDR11, held in Kuala Lumpur, Malaysia, and MPRML, a derivative of PRODML, was completed shortly afterwards. MPRML is proof that collaboration works.
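As a rough illustration of the submission workflow described above, the sketch below assembles a monthly production report as a single XML file and applies a simple completeness check before it would be handed to the automatic submission step. The element and field names are simplified placeholders rather than the published MPRML schema; a real submission would be generated from the production reporting system and validated against the official XSD.

"""Sketch: assemble and sanity-check a monthly production report as XML.

Element names below are simplified placeholders, not the published MPRML
schema; a real report would be validated against the official schema.
"""
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = ("field_name", "month", "oil_sm3", "gas_sm3", "water_sm3")

def build_report(rows: list) -> ET.Element:
    """Build a single XML document from rows exported by the production system."""
    root = ET.Element("monthlyProductionReport")        # placeholder root element
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            raise ValueError(f"Row for {row.get('field_name')} is missing {missing}")
        entry = ET.SubElement(root, "facilityProduction")
        for key in REQUIRED_FIELDS:
            ET.SubElement(entry, key).text = str(row[key])
    return root

rows = [
    {"field_name": "EXAMPLE FIELD", "month": "2017-03",
     "oil_sm3": 125000.0, "gas_sm3": 5.2e8, "water_sm3": 98000.0},
]

tree = ET.ElementTree(build_report(rows))
tree.write("production_2017-03.xml", encoding="utf-8", xml_declaration=True)
print("Report written; ready for automatic submission to the NDR.")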



However, aspects of the quality control process have been automated. For example, industry-standard file types that include header records, such as LAS and SEG-Y, are automatically validated. This takes the form of rule-based checks for each attribute in the header. As an illustration, the spud date of a well can be validated to be consistent with the data held in an external master repository for wells – such as the regulator's well licensing database. It can also be validated as being in the correct format according to the submission standards and to be present if it is a mandatory field.
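A minimal sketch of what such rule-based header validation can look like is shown below. The individual rules, the header fields and the master-repository lookup are illustrative assumptions; a production rule library would cover every attribute defined in the submission standards.

"""Sketch: rule-based checks on well header attributes prior to loading.

The rules and the master-repository lookup are illustrative only.
"""
from datetime import datetime

# Hypothetical extract from the regulator's master well repository.
MASTER_WELLS = {"35/11-1": {"spud_date": "2016-05-14"}}

def check_mandatory(header: dict, field: str) -> list:
    return [] if header.get(field) else [f"{field}: mandatory field is missing"]

def check_date_format(header: dict, field: str) -> list:
    try:
        datetime.strptime(header.get(field, ""), "%Y-%m-%d")
        return []
    except ValueError:
        return [f"{field}: not a valid YYYY-MM-DD date"]

def check_matches_master(header: dict, field: str) -> list:
    master = MASTER_WELLS.get(header.get("well_name", ""), {})
    if field in master and master[field] != header.get(field):
        return [f"{field}: '{header.get(field)}' disagrees with master value '{master[field]}'"]
    return []

RULES = [
    ("well_name", check_mandatory),
    ("spud_date", check_mandatory),
    ("spud_date", check_date_format),
    ("spud_date", check_matches_master),
]

def validate(header: dict) -> list:
    """Run every rule and report back the rules that have not been met."""
    return [msg for field, rule in RULES for msg in rule(header, field)]

if __name__ == "__main__":
    submitted = {"well_name": "35/11-1", "spud_date": "2016-05-15"}
    for problem in validate(submitted) or ["All header rules passed"]:
        print(problem)

Because the rules are data rather than hard-coded logic, the same library can be exposed to operating companies so that files are checked at the point they are received from contractors.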

Integrate to Validate

Integrating the NDR with other associated repositories, such as the regulatory body's licensing or permitting systems, enables validation and consistency of data. It can also provide a 'warning' or planning aid to highlight data that should be submitted. In turn, this enables control over completeness, helping to ensure that all required data are gathered and reported in a timely manner.

By building an extensive library of validation rules it is possible to fully automate the QC and verification of these files, such that a file that passes validation can then be loaded to the system ready for entitlements to be set. Further efficiency is gained by exposing these validation tools to the operating companies and reporting back to them on the rules that have not been met. In this way, the tools are used to validate the files at the point they are received from their contractors. This enables the user to be aware of any specific failure and have them corrected at source.

New Capabilities

Automation is not the only enhancement to have been delivered to the Diskos NDR solution by CGG. In addition to extended capabilities and new functionalities, such as completeness control, CGG was the first NDR provider required to deliver a solution compliant with the ISO 27001 information security standard.

Information security covers not only confidentiality of data but also its availability and integrity.

The information security 'triangle'

This has far-reaching consequences for the design and operation of an NDR, especially when accessibility and ease of use must be balanced with security requirements. Information security is similar to data management in that it relies equally on people, process and technology.


Technologically, security requirements dictate and inform the hardware, communications and application design. For example, the data itself must be completely separate from the user front-end; 'data segregation' must be evident as a control. The user therefore accesses and uses only meta-data. Any access (ordering) of that data creates a request that is made on the data servers via an API.

Processes must be designed, documented, followed and updated to ensure the integrity and confidentiality of the data. Setting entitlement to data is a good example. No matter how secure and well designed the solution is, if an entitlement is set incorrectly someone could gain access to data to which they have no rights. To control and prevent this risk, it is necessary to design a safe process. This needs to be holistic in that it considers how the application will guide and assist the user, for example by asking the operator to confirm the selection they request before applying it, and how fail-safes and checks can be applied. In the case of entitlement setting, it is such a critical process to complete correctly that CGG determined that, to adequately mitigate the risk, one system operator should input the data, while a second must check their work before the change can be applied; 'segregation of duties' must also be evident as a control.

People are also at the center of security, since it is almost always a human action that creates an information security incident. To meet the criteria for ISO 27001 all staff need to be capable and competent in their role. They need to be trained to understand the implications of their actions and to understand information security requirements. Above all, they must be encouraged to be open and honest about any incidents and weaknesses they experience, to report them and to be engaged in investigations and improvement actions.
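The maker-checker principle behind the entitlement control described above can be illustrated with a small sketch. This is a simplified illustration of the 'segregation of duties' idea only, not the actual Diskos entitlement workflow; a real system would persist pending changes and enforce operator roles through its access-management layer.

"""Sketch: a 'segregation of duties' control on entitlement changes.

A deliberately simplified, hypothetical illustration of the maker-checker
principle; not the actual Diskos implementation.
"""
from dataclasses import dataclass
from typing import Optional

@dataclass
class EntitlementChange:
    dataset: str
    company: str
    requested_by: str
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        """Four-eyes check: the reviewer may not be the operator who made the request."""
        if reviewer == self.requested_by:
            raise PermissionError("reviewer must be a different operator from the requester")
        self.approved_by = reviewer

    @property
    def applied(self) -> bool:
        return self.approved_by is not None

change = EntitlementChange(dataset="Survey NVG-2016 full stack",
                           company="Example Operator AS",
                           requested_by="operator_1")

# The change is held pending until a second operator has checked it.
try:
    change.approve("operator_1")          # self-approval is rejected
except PermissionError as err:
    print("Rejected:", err)

change.approve("operator_2")              # independent check succeeds
print("Entitlement applied:", change.applied)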


Data Management Competence

Irrespective of automation, people remain at the center of an NDR operation. The accuracy, integrity and security of the data rely on their competence. How do you measure oil & gas data management competence and staff progression? The answer can be found at www.cdacompetency.com. This data management competency system (available free of charge) lists different aspects of the profession and describes five different levels of competence for each. Competency maps can be created for an individual or for a role. This is a very useful tool for planning career development or for ensuring that staff have the right competencies for the job. Alternatively, staff may wish to become a certified petroleum data manager. Find out how under the Certification tab at www.ppdm.org.

Ready for the Future

As an industry we are experiencing an unprecedented time of change driven by a combination of low oil prices and emerging technologies. The changes are exciting as new technologies and techniques deliver new possibilities.

As data managers of national or corporate solutions and environments, we are responsible for evaluating and understanding the changes and predicting and preparing for the next data management challenges. NDRs often represent a very significant collection of data gathered over a long period of time. Emerging technologies hold the promise of extracting and organising a greater amount and variety of such data.

Gartner's Top 10 Strategic Technology Trends for 2017: http://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/

Analytics & Artificial Intelligence

Predictive analytics and artificial intelligence (AI) are already being used in various ways in the oil & gas industry. Whilst experimental R&D projects are underway in the exploration arena, both techniques are actively being used in projects for producing assets to predict and optimize facilities maintenance and production.

Each of these projects requires 'base-data' to be supplied as complete, accurate and up-to-date information, in a timely and cost-effective manner, and in an acceptable format. This is the challenge faced by data managers. To meet it, the data stored in national and corporate data repositories therefore needs to meet these requirements whilst also remaining accessible and transferable in open-standard formats. CGG's NDR solution meets this challenge via robust processes and automation to ensure accuracy and completeness. Accessibility and transferability of data is ensured by making use of the open and public PPDM data model from the Professional Petroleum Data Management Association. Files are also stored in standard formats, such as the LAS and SEG standards, with consistent nomenclature and use of headers.

Increasingly, access to data needs to be faster and more efficient. As already described, CGG is making use of an Energistics exchange standard for the submission of monthly production data.


In the future, NDR solutions will need to be capable of exchanging a greater variety of subsurface data in exchange standards such as WITSML and RESQML.

Machine Learning

Machine learning aims to replicate how we as humans think and learn. Traditional computer programming is more deterministic, meaning that the inputs and desired outputs have to be known. Non-deterministic programming enables different outcomes depending on 'choice points', but the programmer must be aware of all such choices at the outset. For many routine tasks in data management, variation of the input data, in terms of documents and formats, is so great that it has proved impossible to create programmes that can reliably complete the task. Machine learning offers a chance to automate what are currently manual tasks, such as locating and transferring core analysis data within well files into a corporate or national data repository.

Most national data repositories have been successful in gathering critical well information. However, much of the data, particularly that related to wells, is presented in unstructured document collections or semi-structured files. In these formats, data such as tops, pressures and bottom hole temperatures need significant 'preparation' work before use as the 'base-data' for analytics.

Machine learning promises to transform the speed and cost of accessing this data by changing it from a manual task to one that is largely machine-driven (Blinston and Blondelle, 2017).

The data manager's challenge is to assess the potential of these emerging technologies, consider the data present within the repositories they manage and build the business case for unlocking the full range of data.

A typical machine learning workflow (from Blinston and Blondelle, 2017).
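As a toy illustration of the kind of task described above, the sketch below trains a small text classifier to flag pages of a well file that are likely to contain core analysis data. It is a stand-in for published workflows such as that of Blinston and Blondelle (2017), which combine OCR, layout analysis and entity extraction trained on large labelled corpora; the training snippets here are invented for the example.

"""Sketch: flag likely core-analysis pages in a well file with a text classifier.

A toy stand-in for production machine-learning workflows; the training data
below is invented for illustration.
"""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, hand-labelled training set: 1 = core analysis content, 0 = other well data.
pages = [
    "routine core analysis porosity permeability grain density plug samples",
    "core photograph depth interval slabbed core description",
    "daily drilling report mud weight rate of penetration bit run",
    "final well report summary of operations casing scheme",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(pages, labels)

new_pages = [
    "porosity and permeability measurements on core plugs from 2450-2480 m",
    "weekly geology report biostratigraphy summary",
]
for text, label in zip(new_pages, model.predict(new_pages)):
    print("CORE ANALYSIS" if label == 1 else "other        ", "->", text[:60])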


Pre-stack Interpretation

Until recently, most seismic interpretation and analysis was carried out on post-stack seismic volumes. Pre-stack volumes were often an intermediary step, produced and QCed during processing to enable the production of relevant and useful post-stack volumes.

As seismic acquisition technologies, compute power and our understanding of the seismic response have progressed, it has become more common to conduct interpretation and analysis on pre-stack volumes, for example to analyze rock properties using Amplitude Versus Offset (AVO). This has an impact on the data manager as pre-stack volumes are orders of magnitude larger than the equivalent post-stack volumes. With advances in storage technologies there are many viable Hierarchical Storage Management (HSM) and private/public "cloud" storage options. Storage of larger volumes is not therefore the issue. However, fulfilling multiple data order requests for data volumes into the 10 TBs or even 100 TBs currently requires physical shipment of media.

Pre-stack data conditioning, attribute computation and analysis using CGG's HampsonRussell AVO module: http://www.cgg.com/en/What-We-Do/GeoSoftware/AVO/AVO

Where high-bandwidth communication networks are available, CGG overcomes this problem by enabling replication of 'extracted' volumes direct to the user's local file storage. However, even this is inefficient as the large data volumes are being stored twice. In proprietary applications this is solved by giving direct access to the data and to the applications used to visualize, analyze and interpret it. The NDR of the future requires a similar solution, perhaps co-locating interpretation and analytics applications with the data. In this vision of the future the user will not place an order for data retrieval. Instead, that 'order' will request access to the data and the chosen application.

Conclusions

The business case for NDRs is as strong as, if not stronger than, it ever was. This does not, however, guarantee acceptance and adoption by all stakeholders. To be truly successful, these solutions must become an intrinsic part of the wider community's data management solution and, to do this, they must deliver their services more efficiently and effectively than could be achieved by one party alone.

This is CGG's driver to stay at the forefront of NDR evolution and automation. But we are not content to stop there. We see an exciting transformation taking place in our industry and we are preparing our NDR solutions for this bright future.



Glossary of terms

API (Application Programming Interface): Functionality within a computer application that can be 'exposed' to other applications to enable data and commands to be passed "machine to machine" from one application to another.

HSM (Hierarchical Storage Management): Hardware that enables seamless storage of files and data across different types of disk and/or media such as tape. Commonly combines disk with tapes moved between storage slots and tape drives by a robotic system.

LAS (Log ASCII Standard): Standard file format used to store well log information.

MPRML (Monthly Production Reporting Markup Language): XML exchange standard developed as a derivative of PRODML, used to exchange monthly production figures.

PPDM (The Professional Petroleum Data Management Association): Frequently used to refer to the open, public data model developed and maintained by the association.

PRODML: XML exchange standard developed and maintained by the Energistics Industry Consortium covering production data.

RESQML: XML exchange standard developed and maintained by the Energistics Industry Consortium covering data relating to the reservoir.

SEG-Y: File standard developed and maintained by the Society of Exploration Geophysicists, designed to enable the transfer of seismic data.

SLA (Service Level Agreement): A specified rate, time or other measure for a given service type, commonly contained within a contract.

WITSML: XML exchange standard developed and maintained by the Energistics Industry Consortium covering well, drilling and log data.

References

Blinston, K. and Blondelle, H., 2017, Machine Learning systems open up access to large volumes of valuable information lying dormant in unstructured documents: The Leading Edge, 36, no. 3, 257-261.

Hawtin, S. and Lecore, D., 2011, The business value case for data management – a study: Common Data Access Limited, http://cdal.com/wp-content/uploads/2015/09/Data-Management-Value-Study-Final-Report.pdf



About the authors

Kerry Blinston is the Global Commercial Director for CGG Data Management Services. He has worked for CGG for nine years and been actively engaged within national and corporate data repository projects over that time, working in countries such as Sri Lanka, Azerbaijan and Ethiopia. Currently he is the contract holder for CGG's operation of Norway's NDR, Diskos. He holds a BSc in Geological Sciences from Leeds University and an MSc in Petroleum Geology from the University of Aberdeen.

Karen Blohm is currently the Diskos Programme Manager for CGG Data Management Services. Karen fulfils an operational leadership role with strategic and high-level data management consultancy assignments in National/Governmental Authorities and oil & gas companies. She has had technical roles in subsurface interpretation and petroleum geoscience, followed by technical, consultative and operational leadership roles in data management. Data management engagements have provided opportunities around the globe, including in Norway and Australia, and have focused on scoping and delivery of major data management and service delivery improvement programs. Karen holds a BSc in Geology from the University of Hull, UK.

About Data Management Services

CGG Data Management Services (www.cgg.com/dms) are part of CGG GeoConsulting and offer a full range of services and solutions to answer E&P data challenges, from physical storage and transcription to the implementation of corporate and national data repositories. They were awarded the Diskos NDR contract by the Norwegian Petroleum Directorate in 2013.

About CGG

CGG (www.cgg.com) is a fully integrated Geoscience company providing leading geological, geophysical and reservoir capabilities to its broad base of customers primarily from the global oil and gas industry. Through its three complementary businesses of Equipment, Acquisition and Geology, Geophysics & Reservoir (GGR), CGG brings value across all aspects of natural resource exploration and exploitation.

CGG employs around 5,600 people around the world, all with a Passion for Geoscience and working together to deliver the best solutions to its customers.

CGG is listed on the Euronext Paris SA (ISIN: 0013181864) and the New York Stock Exchange (in the form of American Depositary Shares, NYSE: CGG).

www.cgg.com    Copyright © CGG, 2017