Data Engineering & Data Mining


DATA ENGINEERING & DATA MINING
Exploring Trends and Approaches from Inside and Outside of Our Industry
Data Science Series, Part 3 of 4
21 August 2019
#ILTACON19

DATA SCIENCE SERIES SCHEDULE
• Session 1 (Monday @ 3:30 PM): LAW in the ERA of LEGAL DATA SCIENCE. Ed Walters, Dazza Greenwood
• Session 2 (Tuesday @ 3:30 PM): The RISE of the LEGAL DATA SCIENCE TEAM. Carmin Ballou, Eric J. Felsberg, Mike Klastava
• Session 3 (Wednesday @ 3:30 PM): DATA ENGINEERING and DATA MINING. Lisa Mayo, Shree Bharadwaj
• Session 4 (Thursday @ 11:30 AM): DATA VISUALIZATION in the LEGAL INDUSTRY. Melaina Fireman, Andrew P. Medeiros, Mark Thorogood

SPEAKERS/MODERATOR
• Shreenidhi Bharadwaj: VP, Data & Analytics; Adjunct Professor, University of Chicago
• Lisa Mayo: Director of Data Management, Ballard Spahr LLP
• Andrew Baker (Moderator): Senior Director, Digital Services + Analytics, HBR Consulting

SESSION NEED
Data Management: Data Engineering + Data Mining
• Data Engineering: How we store, stage, prep and ready data for consumption
• Data Mining: How we explore data and begin to derive meaning, insights and value from those assets

DATA ENGINEERING

THE DATA REVOLUTION
• Data is changing the world. Data is to this century what oil was to the last: a driver for growth, change, and success.
Image credit: David Parkins
https://www.youtube.com/watch?v=4ycC0DJqrpc
https://leewardcapitalmgt.com/the-economist-the-worlds-most-valuable-resource-is-no-longer-oil-but-data/

BIG DATA (as of 2017)
• Facebook: stores 400 PB of data, with an incoming daily rate of about 600 TB
• YouTube: 1000 PB video storage, 100 M views/day
• Google: 4 M searches/minute, stores 10 EB of data (estimate)
• AT&T: 1.9 T phone call records, 70,000 calls/second
• US credit cards: 1.4 B cards, 20 B transactions/year
• Your Law Firm: 1 Bazillion Documents

DATA-DRIVEN STUDY IN LEGAL INDUSTRY
• Critical inputs are overlooked, which suggests that many law firms may be missing data-oriented opportunities for growth
  – Expanding client base
  – Billing more hours, etc.
• Are firms missing opportunities to improve the practice of law itself?
https://www.clio.com/resources/legal-trends/2018-report/

DATA LIFECYCLE: ENABLING BUSINESS GROWTH
• Data Lifecycle Management (DLM) is a process that helps organizations manage the flow of data throughout its lifecycle: from creation, to use, to sharing, archiving and deletion.
Diagram labels: Create, Capture, Store, Secure, Curate, Enrich, Aggregate, Analyze, Share, Iterate, Archive

ENTERPRISE DATA MANAGEMENT
• Holistic framework comprising the people, processes and technology that optimizes data from a variety of different sources, then makes it available when and where it's needed (harmonization)
https://www.firstsanfranciscopartners.com/data-management/

ALIGNING BUSINESS STRATEGY & DATA STRATEGY
• A successful data strategy links business goals with technology solutions
https://globaldatastrategy.com/our-services/enterprise-data-strategy/

DATA ENGINEERING
• Aspect of data science that focuses on automation of practical applications of data collection, curation, analysis and delivery, in batch as well as in near real time.
Cycle diagram labels: Ingest/Extract, Prepare/Clean, Store/Organize, Analyze/Deliver

DATA FORMATS
• Numerical
• Text
• Media (audio, video)
• Geospatial

POPULAR DATABASES
Database types:
• Relational
• Key-Value
• Column
• Document
• Graph

OPERATIONAL VERSUS HISTORICAL DATA
• Operational (OLTP) databases: turn the wheels of the organization
• Analytical (OLAP) databases: watch the wheels of the organization

DATA PIPELINE
Diagram labels: Operational Databases, ETL, Data Warehouse, Data Marts, Reporting/BI, Users/Analysts

DATA WAREHOUSE
• The data warehouse is a structured repository of integrated, subject-oriented, enterprise-wide, historical, and time-variant data. The purpose of the data warehouse is the retrieval of analytical information. A data warehouse can store detailed and/or summarized data.
Diagram labels: Informational, Enterprise Data Warehouse (EDW), Analytical, Data Mining

DATA MARTS
• Subset of data from a data warehouse
• Confined to data specific to a single line of business or department, e.g. Finance or Marketing
• Features:
  – Subject oriented
  – Small in size (few tables)
  – Customized by department
  – Source is a departmentally structured data warehouse

EXTRACT, TRANSFORM, LOAD (ETL)
• Creating ETL infrastructure is often the most time- and resource-consuming part of the data warehouse development process
https://dzone.com/

DATA LAKE
• "A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed."
https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/

DATA LAKE ARCHITECTURE
https://dzone.com/

DATA LAKE VS DATA WAREHOUSE
• Schema
  – Data Warehouse: Schema on write (predefined schemas)
  – Data Lake: Schema on read (no predefined schemas)
• Scale
  – Data Warehouse: Scales to large volumes at moderate cost; limited storage and number of server nodes
  – Data Lake: Scales to huge volumes at low cost; tens of thousands of compute nodes
• Access methods
  – Data Warehouse: Accessed through standardized SQL and BI tools
  – Data Lake: Accessed through SQL-like systems, programs, and other methods
• Workloads
  – Data Warehouse: Batch processing, concurrent users performing interactive analytics
  – Data Lake: Batch processing, stream processing, predictive analytics, improved capability over EDWs for interactive queries
• New data
  – Data Warehouse: Time consuming to introduce new content
  – Data Lake: Fast ingestion of new data/content
• Cost/efficiency
  – Data Warehouse: Efficiently uses CPU/IO
  – Data Lake: Efficiently uses storage and processing capabilities at very low cost
• Data retention
  – Data Warehouse: Limited; driven by retention policies
  – Data Lake: Potential to retain all data (subject to retention policies)
• Users
  – Data Warehouse: Reporting, Business Intelligence users
  – Data Lake: Analytics, Data Scientists, Data Engineers
• Key benefits
  – Data Warehouse: Provides a single enterprise-wide view of data from multiple sources
  – Data Lake: Allows usage of raw structured and unstructured data from a centralized low-cost store

EXTRACT, LOAD, TRANSFORM (ELT)
• The extracted data is loaded into a single, centralized data repository, enabling unlimited access to all of the data at any time
https://dzone.com/

DOCUMENT DATABASES
• Data stored as documents (multiple key-value pairs)
• Inherently a subclass of the key-value store
• Stores all information for a given object in a single instance in the database

DATA MODELS
• Tabular (Relational) Data Model: Related data split across multiple records and tables
• Document Data Model: Related data contained in a single, rich document

DISCRETE TO CONNECTED DATA
Spectrum diagram labels:
• RDBMS & EDW/Columnar RDBMS
• Hadoop / Aggregate-Oriented NoSQL
• Graph Database & Graph Compute Engine (Graph Transactions & Analytics)
Illustration by David Somerville based on the original by Hugh McLeod (@gapingvoid)

GRAPH DATABASES
• A graph is just a collection of vertices and edges or, in less intimidating language, a set of nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in which those entities relate to the world as relationships.
• Five graphs in the world of business: social, intent, consumption, interest, and mobile. The ability to leverage these graphs provides a "sustainable competitive advantage." - Gartner
Figure captions: Small network of Twitter users; A small social graph
Source: Graph Databases, 2nd Edition, by Ian Robinson, Jim Webber, and Emil Eifrém

THE LABELED PROPERTY GRAPH MODEL
• A labeled property graph has the following characteristics:
  – Contains nodes and relationships.
  – Nodes contain properties (key-value pairs).
  – Nodes can be labeled with one or more labels.
  – Relationships are named and directed, and always have a start and end node.
  – Relationships can also contain properties.

RELATIONAL VERSUS GRAPH MODELS
• No more tables, no more foreign keys, no more joins
Figure labels: Person, Ratings, Movie (relational model); SHREE, RATED, Terminator, Titanic, Toy Story (graph model)

GRAPH DATABASE (NEO4J)
• Database management system with Create, Read, Update and Delete (CRUD) operations working on a graph data model.
• Generally built for use with online transaction processing (OLTP) systems.
• Relationships are first-class citizens of the graph data model

NOW, LET'S EXPLORE THESE CONCEPTS AS APPLIED TO A LAW FIRM

ADVANCED DATA MANAGEMENT IN PRACTICE
ENTERPRISE DATA WAREHOUSE PROJECT: WHAT… WHY… HOW…

DRIVERS FOR CHANGE/INVESTMENT
Issues:
• Loads of data from disparate systems, each with its own metadata
• No firm-wide tool to connect the dots
• We needed greater agility for system conversions
Realization:
How we manage, analyze, and leverage data to drive quality and add value will:
• Differentiate us in a demanding legal services market
• Improve profitability

GOALS
• Better manage slowly-changing dimensions
• Solidify systems of ownership
• Optimize EDW for build performance and query performance
• Point-in-time reporting with flexible level of granularity (people changes, client/matter changes)
• Determine a mechanism for handling effective-dated and future-dated data
• Ensure all data in NBI system has a home in the EDW
• Enable a more efficient view into GL actuals and budgets
• Establish governance processes for new key entities, data elements, and data marts

METHODOLOGY
• Data Workshop
• Create a centralized framework and include data across all core firm systems
• Start with People, Clients, Matters
  – Current view
  – Effective Dated
  – Point in Time
• Incorporate processes that scrub the data to ensure completeness and accuracy
• Partner with Microsoft for best practices around:
  – Security
  –
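
The GOALS and METHODOLOGY slides name slowly-changing dimensions, effective-dated and future-dated data, and point-in-time reporting for People, Clients and Matters, but do not show a mechanism. One common approach, assumed here and not stated in the deck, is to keep one row per version of a record with effective-from and effective-to dates (often called a Type 2 slowly changing dimension). A minimal Python sketch, with hypothetical names and made-up data:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical example: every name, field, and value below is an
# illustrative assumption, not taken from the presentation.


@dataclass(frozen=True)
class PersonVersion:
    """One effective-dated version of a person record."""
    person_id: int
    title: str
    effective_from: date           # first day this version applies (inclusive)
    effective_to: Optional[date]   # first day it no longer applies; None = open-ended


def as_of(versions: List[PersonVersion], person_id: int, when: date) -> Optional[PersonVersion]:
    """Return the version of a person that was in effect on the given date."""
    for v in versions:
        if v.person_id != person_id:
            continue
        starts_on_or_before = v.effective_from <= when
        still_open = v.effective_to is None or when < v.effective_to
        if starts_on_or_before and still_open:
            return v
    return None


history = [
    PersonVersion(1, "Associate", date(2015, 9, 1), date(2019, 1, 1)),
    PersonVersion(1, "Senior Associate", date(2019, 1, 1), date(2021, 1, 1)),
    # A future-dated row can be loaded before it takes effect.
    PersonVersion(1, "Partner", date(2021, 1, 1), None),
]

point_in_time = as_of(history, 1, date(2018, 6, 30))
current_view = as_of(history, 1, date(2019, 8, 21))
print(point_in_time.title)   # Associate
print(current_view.title)    # Senior Associate
```

The same lookup serves the "current view" (ask for today's date) and the "point in time" view (ask for any earlier date), which is why a single effective-dated table can back both reports in this sketch.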
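
The ETL, ELT and DATA LAKE VS DATA WAREHOUSE slides differ mainly in where the transform step happens: before loading into a predefined schema (schema on write) or when the raw data is read (schema on read). The sketch below illustrates that contrast with Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical, and an in-memory SQLite database stands in for both the warehouse and the raw store; it is not a description of any product named in the deck.

```python
import json
import sqlite3

# Hypothetical raw records from an operational system (illustrative data only).
raw_rows = [
    {"matter": "M-001", "hours": "2.5", "rate": "400"},
    {"matter": "M-001", "hours": "1.0", "rate": "400"},
    {"matter": "M-002", "hours": "bad", "rate": "350"},  # dirty record
]


def clean(row):
    """Transform step: cast types and reject rows that fail validation."""
    try:
        return row["matter"], float(row["hours"]), float(row["rate"])
    except ValueError:
        return None


conn = sqlite3.connect(":memory:")

# ETL: transform first, then load into a table with a predefined schema
# (schema on write).
conn.execute("CREATE TABLE fact_time (matter TEXT, hours REAL, rate REAL)")
cleaned = [r for r in map(clean, raw_rows) if r is not None]
conn.executemany("INSERT INTO fact_time VALUES (?, ?, ?)", cleaned)

# ELT: load the raw payload untouched, then transform when it is read
# (schema on read).
conn.execute("CREATE TABLE raw_time (payload TEXT)")
conn.executemany("INSERT INTO raw_time VALUES (?)",
                 [(json.dumps(r),) for r in raw_rows])


def billed_by_matter_etl():
    """Query the already-clean warehouse table."""
    sql = ("SELECT matter, SUM(hours * rate) FROM fact_time "
           "GROUP BY matter ORDER BY matter")
    return conn.execute(sql).fetchall()


def billed_by_matter_elt():
    """Apply structure and validation while reading the raw payloads."""
    totals = {}
    for (payload,) in conn.execute("SELECT payload FROM raw_time"):
        row = clean(json.loads(payload))
        if row is not None:
            matter, hours, rate = row
            totals[matter] = totals.get(matter, 0.0) + hours * rate
    return sorted(totals.items())


print(billed_by_matter_etl())
print(billed_by_matter_elt())
```

Both functions return the same totals here. The difference is that the ELT path still holds the rejected record in raw form, so a later, more forgiving transform could recover it, whereas the ETL path discarded it at load time.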
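
The characteristics listed on THE LABELED PROPERTY GRAPH MODEL slide can be made concrete with a few lines of plain Python. This is a sketch of the data model only, not of Neo4j's API: the class names are assumptions, and the "stars" values are invented for illustration (the slide shows only SHREE, RATED and the three movie titles).

```python
from dataclasses import dataclass, field

# Illustrative sketch only: class names and the "stars" property values are
# assumptions made for this example.


@dataclass
class Node:
    labels: set                                      # one or more labels, e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    name: str                                        # relationships are named...
    start: Node                                      # ...and directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)   # relationships can hold properties too


shree = Node({"Person"}, {"name": "SHREE"})
movies = [Node({"Movie"}, {"title": t}) for t in ("Terminator", "Titanic", "Toy Story")]

# Hypothetical ratings; the numbers are made up for the example.
ratings = [
    Relationship("RATED", shree, movie, {"stars": stars})
    for movie, stars in zip(movies, (4, 3, 5))
]


def rated_titles(person, relationships):
    """Follow outgoing RATED relationships from a person node."""
    return [
        (r.end.properties["title"], r.properties["stars"])
        for r in relationships
        if r.name == "RATED" and r.start is person
    ]


print(rated_titles(shree, ratings))
```

In the relational version of the same data, the rating would live in a separate Ratings table joined to Person and Movie by foreign keys. Here the rating is a property on the relationship itself, which is the point behind "no more foreign keys, no more joins".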