A company of Daimler AG

ARE DATA LAKES THE NEW CORE DWHS?
ANDREAS BUCKENHOFER, DAIMLER TSS
DOAG BIG DATA, REPORTING, GEODATA DAYS - KASSEL 2017

ABOUT ME

Andreas Buckenhofer, Senior DB Professional
Since 2009 at Daimler TSS
Department: Big Data
Business Unit: Analytics
[email protected]
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.xing.com/profile/Andreas_Buckenhofer2
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

TSS 2020 ALWAYS ON THE MOVE.

DAIMLER TSS. IT EXCELLENCE: COMPREHENSIVE, INNOVATIVE, CLOSE.

We're a specialist and strategic business partner for innovative IT Solutions within Daimler – not just another supplier! As a 100% subsidiary of Daimler, we live the culture of excellence and aspire to take an innovative and technological lead. With our outstanding technological and methodical know-how we are a competent provider of services that help those who benefit from them to stand out from the competition. When it comes to demanding IT questions we create impetus, especially in the core fields car IT and mobility, information security, analytics, shared services and Digital Customer Experience.

Daimler TSS GmbH Are Data Lakes the new Core DWHs? 3 LOCATIONS

Daimler TSS Germany: more than 1000 employees
• Ulm (Headquarters)
• Stuttgart Area: Böblingen, Echterdingen, Leinfelden, Möhringen
• Berlin
• Karlsruhe

Daimler TSS China: Hub Beijing, 6 employees
Daimler TSS Malaysia: Hub Kuala Lumpur, 38 employees
Daimler TSS India: Hub Bangalore, 16 employees

AGENDA

1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary

DIGITIZATION – DATA AS AN ASSET FOR ANALYTICAL DECISIONS

• Software is becoming more and more important
  • 100 million lines of code
• Physical products
  • are significantly enhanced with digital service capabilities, e.g. the value of the car comes increasingly from digital assets
  • become digital services, e.g. car2go
• IoT, robotics, etc.

Source image: https://www.linkedin.com/pulse/20140626152045-3625632-car-software-100m-lines-of-code-and-counting

DWH AS INTEGRATION SYSTEM FOR DIGITAL ASSETS
SOME OF TODAY’S MAIN CHALLENGES

Agility
• Is the organization ready? IT (Dev + Ops) and Business

Flexibility
• Data modeling under pressure, model as you go
• New data formats coming from logs, sensors, etc.

Performance
• Right time
• Scale to high volumes
• Integrate data arriving at high speed

IS THE DWH DEAD? AND ETL, TOO?

Sources:
https://www.linkedin.com/groups/45685/45685-6224210695295168512?trk=hp-feed-group-discussion&_mSplash=1
https://speakerdeck.com/nehanarkhede/etl-is-dead-long-live-streams
https://gcn.com/blogs/reality-check/2014/01/hadoop-vs-data-warehousing.aspx

AGENDA

1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary

REFERENCE DATA WAREHOUSE ARCHITECTURE

[Figure: reference DWH architecture. Internal and external data sources (OLTP) feed the Data Warehouse, which is subject-oriented, integrated, time-variant and non-volatile. Layers from backend to frontend: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer (Reporting Layer), Mart Layer (Output Layer). Cross-cutting components: Metadata Management, Security, DWH Manager.]


Data Library, Data Lake on Spark, Data Repository, Data Reservoir, Data Archive, Data Swamp, Data Lake 3.0, Landing Zone, Data Lake on Hadoop

DATA LAKE REFERENCE ARCHITECTURE
DATA LAKE OVERALL ARCHITECTURE VS DATA LAKE LAYER

[Figure: layers from bottom to top: Landing Zone, Data Lake, Data Reservoir / Presentation. The vertical side labels are not recoverable from the extraction.]

DATA LAKE REFERENCE ARCHITECTURE

[Figure: from top to bottom: clients (ODBC/JDBC, RESTful client), firewall, Data Reservoir / Presentation, Data Lake, Landing Zone, ingestion (Sqoop, Kafka, REST API), firewall, sources. The vertical side labels are not recoverable from the extraction.]

DATA LAKE VS HADOOP

Data Lake
• Architecture, concept

Hadoop, Spark, Elastic Stack
• Tools (that can be used to implement a Lake)

HOW TO STRUCTURE THE DATA LAKE?
SCHEMA-LESS REVOLUTION?

• Data has a structure: schema-less does not exist
• You apply
  • schema-on-read, e.g. copy files (csv, json, html, …) into HDFS
  • schema-on-write, e.g. create tables on data files in HDFS

SCHEMA-ON-READ

Flexibility
• For whom? Writing the data vs reading the data

Simplicity
• For whom? Writing the data vs reading the data
• Human mistakes while trying to read the data

Agility / Model as you go
• Just copy files into the directory
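The "for whom?" question can be illustrated with a minimal Python sketch of schema-on-read: the writer simply drops a raw CSV file, and every reader has to bring its own schema and deal with the rows that do not fit. The file content, the column names and the `READ_SCHEMA` mapping are illustrative assumptions, not taken from a real system.

```python
import csv
import io

# Hypothetical raw file as it was copied into the lake: no types, no checks.
RAW_CSV = (
    "vin,ts,temp_c\n"
    "WDB123,2017-05-01T10:00:00,21.5\n"
    "WDB456,2017-05-01T10:05:00,n/a\n"
)

# The reader's schema (an assumption): column name -> parser function.
READ_SCHEMA = {"vin": str, "ts": str, "temp_c": float}


def read_with_schema(raw_text, schema):
    """Apply the schema while reading and collect rows that do not fit."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            good.append({col: parse(row[col]) for col, parse in schema.items()})
        except (KeyError, ValueError):
            bad.append(row)  # the writer never noticed this problem
    return good, bad


good_rows, bad_rows = read_with_schema(RAW_CSV, READ_SCHEMA)
print(len(good_rows), "rows parsed,", len(bad_rows), "rows rejected at read time")
```

Writing was trivial here; the effort (and the risk of mistakes) moved to each reader, which is the point the slide raises.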

LAMBDA ARCHITECTURE
AN EARLY COMPREHENSIVE BIG DATA ARCHITECTURE

• The complexity of the Lambda architecture is debatable
• More interesting is the author’s view on data
  • Rawness: store the data as it is. No transformations.
  • Immutability: don’t update or delete data, just add more.
  • Graph-like schema recommended
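The rawness and immutability principles can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `AppendOnlyStore`, its methods and the sample facts are hypothetical and do not come from the book or the slides.

```python
from datetime import datetime, timezone


class AppendOnlyStore:
    """Keeps every raw fact; there is deliberately no update or delete method."""

    def __init__(self):
        self._facts = []

    def append(self, entity_id, attribute, value, observed_at):
        # Store the raw observation as an immutable tuple.
        self._facts.append((entity_id, attribute, value, observed_at))

    def history(self, entity_id, attribute):
        """Return all facts for one attribute, oldest first."""
        matches = [f for f in self._facts if f[0] == entity_id and f[1] == attribute]
        return sorted(matches, key=lambda f: f[3])

    def current_value(self, entity_id, attribute):
        """Derive the latest value at read time from the full history."""
        facts = self.history(entity_id, attribute)
        return facts[-1][2] if facts else None


store = AppendOnlyStore()
store.append("robot-7", "firmware", "1.0", datetime(2017, 1, 1, tzinfo=timezone.utc))
store.append("robot-7", "firmware", "1.1", datetime(2017, 3, 1, tzinfo=timezone.utc))
print(store.current_value("robot-7", "firmware"))
```

A correction is expressed as a new fact, so the earlier state remains available for audits and recomputation.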

Source image: Nathan Marz, James Warren: Big Data: Principles and best practices of scalable realtime data systems, Manning Publications 2015

"Many developers go down the path of writing their raw data in a schemaless format like JSON. This is appealing because of how easy it is to get started, but this approach quickly leads to problems. Whether due to bugs or misunderstandings between different developers, data corruption inevitably occurs" (see page 103, Nathan Marz, "Big Data: Principles and best practices of scalable realtime data systems", Manning Publications)

STRUCTURING THE DATA LAKE
DATA SECURITY

Just dumping data into the Lake?

• General Data Protection Regulation, e.g. Privacy by Design
• The vehicle identifier (VIN) is already sensitive data that needs to be protected (anonymized) depending on usage
• Earmarked use of data (purpose limitation)

Schema-on-read: How do you protect data assets if you are not aware that the data exists or where it exists?
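One common way to protect an identifier such as the VIN before it lands in the lake is a keyed hash. The following Python sketch assumes HMAC-SHA256 and a hypothetical `SECRET_KEY`; strictly speaking this is pseudonymization rather than full anonymization, and whether it is sufficient depends on the usage and the legal assessment.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management system.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize_vin(vin, key=SECRET_KEY):
    """Return a stable keyed hash so records stay joinable without exposing the VIN."""
    normalized = vin.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Illustrative record; field names are assumptions.
record = {"vin": "wdb1234567890abcd", "battery_capacity_pct": 87}
safe_record = {**record, "vin": pseudonymize_vin(record["vin"])}
print(safe_record)
```

Such a step only works if the ingestion process knows that a field contains a VIN, which is exactly the schema-on-read concern raised above.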

DATA LAKE REFERENCE ARCHITECTURE

[Figure: annotated layers, from bottom to top:
• Landing Zone: temporary storage
• Data Lake: immutable, modeled data; tool neutral
• Data Presentation: structured data for fast access
Flow labels in the figure: load, structure, transform, access, archive.]

DATA LAKE HAS LAYERS (1)
DATA LAKE AS CONCEPT VS DATA LAKE AS LAYER

Distinguish Data Lake as overall concept vs Data Lake as a layer
• Landing Zone
  • Source data programmatically loaded
  • Data is partitioned for processing
  • Governance includes catalog and ILM (Security, Retention)
• Data Lake
  • Lightly integrated by keys
  • Data accessible via SQL-on-Hadoop or using SerDes on raw data
  • Data is partitioned for access
  • Governance includes catalog, ILM, lightweight model

DATA LAKE HAS LAYERS (2)

• Presentation Zone
  • Data is structured and partitioned/tuned for data access
  • Full governance including e.g. catalog, ILM, model
  • Known schema including metadata about tables and columns
  • Lineage
  • Documented quality

GOVERNANCE BY DAIMLER AG / COE
E.G. SAMPLE HDFS LAYOUT

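/
• Landing_zone / Source_system / scripts, data
• Data_archive / Source_system / scripts, data
• Data_lake / Source_system_object / scripts, data
• Data_reservoir / Use_case / scripts, data

Further directories shown in the figure (their position in the tree is not recoverable):
• Data_science_results / model, data
• Data_science_sandbox / scripts, data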

AGENDA

1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary

USE CASES
WHAT IS THE BUSINESS PROBLEM TO SOLVE?

Source: http://www.azquotes.com/

USE CASE: ANALYSIS BATTERY AGING

• CSV data ingested into HDFS, Hive tables on files
• Identify breaks (“> 8h”) and compute current drain

[Figure: max capacity vs current capacity]
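The break detection can be sketched as follows. This is a minimal Python illustration under assumptions: the readings, their layout as (timestamp, remaining capacity in Ah) and the interpretation of "drain" as the average current over the break are hypothetical; the actual use case ran on Hive tables.

```python
from datetime import datetime, timedelta

BREAK_THRESHOLD = timedelta(hours=8)

# Hypothetical readings: (timestamp, remaining capacity in Ah).
readings = [
    (datetime(2017, 5, 1, 12, 0), 70.0),
    (datetime(2017, 5, 1, 18, 0), 68.0),  # last reading before the vehicle is parked
    (datetime(2017, 5, 2, 6, 0), 67.4),   # first reading after a 12 h break
    (datetime(2017, 5, 2, 7, 0), 67.0),
]


def find_breaks(rows, threshold=BREAK_THRESHOLD):
    """Yield (start, end, average drain in A) for gaps longer than the threshold."""
    ordered = sorted(rows)
    for (t0, c0), (t1, c1) in zip(ordered, ordered[1:]):
        gap = t1 - t0
        if gap > threshold:
            hours = gap.total_seconds() / 3600
            # Ah lost divided by hours gives the average current in A.
            yield t0, t1, (c0 - c1) / hours


breaks = list(find_breaks(readings))
for start, end, drain in breaks:
    print(f"break {start} -> {end}: {drain:.3f} A average drain")
```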

STRUCTURING THE DATA LAKE
NEW DATA SOURCES – SENSOR DATA

• Sensor data formats change without notice
  • Sensors are regularly updated with new versions
  • Names of metrics may change
  • Sensors with various versions in the field
  • Sensors from different suppliers
• Often many fields (>> 100), increasing with new sensor versions
• Easy storing of data in HDFS and applying a schema later
• Data from robots, vehicles, …
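Applying the schema later usually means reconciling metric names across sensor versions. The following Python sketch shows one possible approach; the alias table `METRIC_ALIASES`, the canonical names and the sample messages are illustrative assumptions.

```python
# Hypothetical alias table: names used by different sensor versions -> canonical name.
METRIC_ALIASES = {
    "temp": "temperature_c",
    "temperature": "temperature_c",
    "temp_c": "temperature_c",
    "volt": "voltage_v",
    "voltage": "voltage_v",
}


def normalize(message):
    """Map known metric names to canonical ones; keep unknown fields for later review."""
    known, unknown = {}, {}
    for name, value in message.items():
        canonical = METRIC_ALIASES.get(name.lower())
        if canonical is None:
            unknown[name] = value
        else:
            known[canonical] = value
    return known, unknown


v1 = {"temp": 21.5, "volt": 12.1}
v2 = {"Temperature": 22.0, "voltage": 12.0, "humidity": 40}  # newer version, new field

for msg in (v1, v2):
    print(normalize(msg))
```

Keeping the unknown fields instead of dropping them makes format changes visible, so the alias table can be extended when a new sensor version appears.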

STRUCTURING THE DATA LAKE
“SCHEMA-ON-READ”

• Sensor data formats change without notice
• Time-consuming and error-prone data integration into the Data Lake
• Therefore preparation of data for usage in the Data Reservoir is required: “Data Engineer”

[Figure: csv files arrive in the Landing Zone, are structured into Hive tables in the Data Lake, then sampled / filtered into Hive tables in the Data Reservoir, which are accessed with R and Python.]

USE CASE: OPTIMIZE CYCLE TIME FOR LIGHTWEIGHT ROBOTS

• JSON data from an OrientDB NoSQL database ingested into HDFS, Hive tables on files
• Partially automate the diagnosis of anomalies (e.g. the identification of reasons for idle times)

USE CASE: BOM EXPLOSION
HADOOP COMPUTING POWER

• PLMXML files supplied by source systems
• Compute changes by comparing last BOM with current BOM
• Data Lake contains data across all tiers
• Data Reservoir contains “dedicated, secured” views for tiers
• Transfer changes to local relational DBs
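The comparison of the last BOM with the current BOM can be sketched in Python as a simple snapshot diff. The flattened `{part_number: quantity}` representation and the sample parts are assumptions for illustration; parsing PLMXML and the distributed execution on Hadoop are out of scope here.

```python
def diff_bom(previous, current):
    """Compare two BOM snapshots given as {part_number: quantity} dicts."""
    added = {p: q for p, q in current.items() if p not in previous}
    removed = {p: q for p, q in previous.items() if p not in current}
    changed = {
        p: (previous[p], q)
        for p, q in current.items()
        if p in previous and previous[p] != q
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical, already flattened snapshots.
last_bom = {"A100": 1, "B200": 4, "C300": 2}
current_bom = {"A100": 1, "B200": 6, "D400": 1}

changes = diff_bom(last_bom, current_bom)
print(changes)
```

Only the resulting change set would then need to be transferred to the local relational databases.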

STRUCTURING THE DATA LAKE LAYER
EXISTING INTERNAL DATA FOR ANALYTICS

• Several stakeholders, e.g. different (independent) truck units
• Dumping existing systems (or new data sources like logs) into the Data Lake
• Data is available fast, but
  • Different data models
  • No integration: if ETL is reduced to EL, then the T is performed by Data Scientists many times
• Some lightweight data integration required → Data Vault

STRUCTURING THE DATA LAKE LAYER
DATA VAULT CHALLENGES WITH HADOOP

• Hub and Link tables: how to ensure uniqueness?
  • No unique constraints or indexes like in an RDBMS
  • Use a view with DISTINCT or GROUP BY on the Hub or Link table
  • Don’t create a Hub or Link table. Create a view with DISTINCT or GROUP BY on the original persisted incoming files
  • Use the HBase NoSQL wide-column store for Hub, Link (+ Sat) and Phoenix for SQL access via Hive
  • Hub and Link in RDBMS only
• Data Reservoir needs a different structure, or data is exported into an RDBMS for faster access
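The "view with DISTINCT on the incoming files" option can be emulated in a few lines of Python. This is a sketch under assumptions: the record layout, the `source` field and the use of an MD5 hash of the normalized business key are illustrative choices, not prescribed by the slides.

```python
import hashlib


def hub_key(business_key):
    """Hash of the normalized business key (MD5 is a common but not mandatory choice)."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def derive_hub(incoming_rows, key_field):
    """Emulate a DISTINCT over the business keys of persisted incoming files."""
    hub = {}
    for row in incoming_rows:
        bk = row[key_field].strip().upper()
        # setdefault keeps the first occurrence, so reloading a file adds no duplicates.
        hub.setdefault(hub_key(bk), {"business_key": bk, "record_source": row["source"]})
    return hub


# Hypothetical incoming records from two files.
incoming = [
    {"vin": "wdb111", "source": "plant_a.csv"},
    {"vin": "WDB111 ", "source": "plant_b.csv"},  # same vehicle, different file
    {"vin": "WDB222", "source": "plant_b.csv"},
]

hub_vehicle = derive_hub(incoming, "vin")
print(len(hub_vehicle), "hub rows")
```

Uniqueness is enforced here by the derivation logic rather than by a constraint, which mirrors the situation on Hadoop described above.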

DATA LAKE IN ANALOGY TO AN ENTERPRISE DWH?

• Vision: One central Enterprise DWH
• Reality for many organizations: Many DWHs
  • more flexible
  • acquisition of companies. Merge of systems?
  • units with different (innovation) speeds and different interests, e.g. trucks (Mercedes-Benz Trucks, Freightliner, Fuso, BharatBenz, Western Star, Fleetboard)
  • legal requirements (e.g. data export)
• Vision: One central Data Lake
• Reality: ?

BARRY DEVLIN – LOGICAL DATA WAREHOUSE

“The long-term vision was clear – the data warehouse should not be confined physically to a single database or machine” (09-MAR-2017)

Barry Devlin wrote the first published article describing a data warehouse architecture in 1988 ( http://www.9sight.com/1988/02/art-ibmsj-ebis/ )

Source: https://upside.tdwi.org/articles/2017/03/09/making-the-most-of-a-logical-data-warehouse.aspx

AGENDA

1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary

WHY DATA MODELING?

“Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application.”
Source quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications 2014

• Learn about the data and promote collective data understanding
• Derive security classification and measures
• Design for performance
• Accelerate development
• Improve software quality
• Reduce maintenance costs
• Generate code
• NoSQL schema-on-read: understand model versions after years

DWH AND DATA LAKE

DWH on RDBMS (methods, concepts, techniques, etc.)
• Slowly Changing Dimension
• ELT vs ETL
• 3-Layer vs 2-Layer
• Kimball Approach
• Inmon Definition
• Data Vault
• Anchor Modeling

Data Lake on Hadoop (tools, tools, tools)
• Schema-on-Read
• Agility
• Parquet
• Hive
• HBase
• SQL-on-Hadoop
• Impala
• Oozie
• ZooKeeper

NO DATA INTEGRATION - IS ETL DEAD?
DATA SCIENCE REQUIRES PROPER DATA ENGINEERING

Many ETL problems are home-made, e.g.
• Inefficient: ETL vs ELT / row-based vs set-based
• Expensive: repetitive tasks should be accomplished with generators
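The difference between row-based and set-based processing can be shown with Python's built-in `sqlite3` module. The table names, the column names and the cents-to-euro transformation are made up for this sketch; it only contrasts the two styles and does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (id INTEGER PRIMARY KEY, amount_cents INTEGER);
    CREATE TABLE target_row (id INTEGER PRIMARY KEY, amount_eur REAL);
    CREATE TABLE target_set (id INTEGER PRIMARY KEY, amount_eur REAL);
""")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)", [(i, i * 150) for i in range(1, 1001)]
)

# Row-based: fetch every row into the client, transform it, write it back one by one.
for row_id, cents in conn.execute("SELECT id, amount_cents FROM staging").fetchall():
    conn.execute("INSERT INTO target_row VALUES (?, ?)", (row_id, cents / 100.0))

# Set-based: one statement, the transformation runs inside the database engine.
conn.execute("INSERT INTO target_set SELECT id, amount_cents / 100.0 FROM staging")

row_total = conn.execute("SELECT COUNT(*), SUM(amount_eur) FROM target_row").fetchone()
set_total = conn.execute("SELECT COUNT(*), SUM(amount_eur) FROM target_set").fetchone()
print(row_total, set_total)
```

Both variants produce the same result; the set-based statement avoids one round trip per row, which is the inefficiency the slide refers to.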

“Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms: it’s the data collection and labeling.”

Source: https://medium.com/startup-grind/fueling-the-ai-gold-rush-7ae438505bc2#.ywjvuca6z (Luke de Oliveira)

IS THE CLASSICAL DWH DEAD?
ARE DATA LAKES THE NEW CORE DWHS?

Data Lakes currently focus too much on tools instead of on concepts and methods
• Tools come and go
• Flexibility / schema-on-read: integration is just postponed to the Data Reservoir or, in the worst case, even later to the end user

PoCs vs production-ready implementation
• Many tools, but still low-productivity tools (Oozie, etc.)
• Error handling is a coding nightmare across tools

Data Lakes and Core DWHs will coexist
• Another choice that makes sense for many use cases
• DWH: e.g. the Data Vault 2.0 architecture follows similar ideas: it stores raw data and postpones data cleansing / harmonization in favor of lightweight data integration

THANK YOU

Daimler TSS GmbH Wilhelm-Runge-Straße 11, 89081 Ulm / Telefon +49 731 505-06 / Fax +49 731 505-65 99 [email protected] / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSS Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle

GARTNER DATA LAKE ARCHITECTURE STYLES

• Inflow Lake: accommodates a collection of data ingested from many different sources that are disconnected outside the lake but can be used together by being colocated within a single place
• Outflow Lake: a landing area for freshly arrived data available for immediate access or via streaming. It employs schema-on-read for the downstream data interpretation and refinement.
• Data Science Lab: most suitable for data discovery and for developing new advanced analytics models

Source: http://blogs.gartner.com/nick-heudecker/data-lake-webinar-recap/ and https://www.asug.com/news/gartner-separate-data-lakes-myths-from-facts-before-you-dive-in

IMAGE ATTRIBUTION

Slide 12: Creative Commons Licence, Hernán Piñera, https://www.flickr.com/photos/hernanpc/7175577368/
Slide 47: Creative Commons Licence, James Loesch, https://www.flickr.com/photos/jal33/5182574275/

DWH = inflexible development, bad performance, complex architecture with 3 layers

SCHEMA-ON-READ OR WHY MODELING CAN STILL BE USEFUL

Failure to talk to business to obtain proper requirements

Ingestion of wrong data

Storage of data with errors

Business Keys (independent object) nested into document

Read performance

SCHEMA-ON-READ OR WHICH BUSINESS PROBLEMS ARE SOLVED

Task | Schema-on-read | Remark
Data storage | Yes, flexible | Store data from various systems
Data integration | No | Integrate data from various systems. Has to be done during each access by each user
Data historization | Yes, auditable | Stamp data with timestamp
Information delivery | No | Turn data into valuable information. Has to be done during each access by each user

DATA MODELS IN THE DWH

Staging Layer
• Characteristics: temporary storage; ingest of source data
• Data model: normally a 1:1 copy of the source table structure, usually without constraints and indexes

Core Warehouse Layer
• Characteristics: historization / bitemporal data; integration; tool-independent; non-redundant data storage; historization
• Data model: 3NF with historization; Head and Version modelling; Data Vault; Anchor modeling; dimensional model with historization (possible)

Data Mart Layer
• Characteristics: performance for end user queries required; tool-dependent (ROLAP / MOLAP / HOLAP); lots of joins necessary to answer complex questions
• Data model: flat structures, esp. dimensional model

WHY MODEL?

Understand business requirements

Understand problem space

Design solution space

Think ideas (incl. alternatives) through

MAKE SQL GREAT AGAIN OR WHY SQL ON BIG DATA?

SQL is the universal language to access and manipulate data in an RDBMS

SQL is a language not only for DBAs or developers

SQL is the standard for OLTP and OLAP, especially for BI tools

STRATA 2012 VS 2016

Source: http://www.cazena.com/blog/strata-word-cloud-2012-vs-2016-data-lakes-spark-real-time-and-other-trends

ATLAS FOR METADATA MANAGEMENT

• Architecture with Atlas
• Supports the classical tools:
  • Hive
  • Sqoop
• HDFS?
• Schema-on-read?

NO DATA INTEGRATION NECESSARY OR WHO REALLY UNDERSTANDS DATA MODELS?

• 3NF is inefficient for query processing • 3NF models are difficult to understand • 3NF gets even more complicated with history added

• Many ways from person to order

Source: Corr / Stagnitto: Agile Data Warehouse Design, DecisionOne Press, 2011, page 5

WHY DATA MODELING?

"Expanding your modeling skills enables you to reduce documentation." (Scott Ambler)

• Standard approach in Data Marts in DWH
• Not just for performance reasons
  • Performance is also an issue on Hadoop-based systems, e.g. Hive, Spark
  • Joins!
• But also due to understandability for end users
  • Understandability is also an issue on Hadoop-based systems

IMPORTANCE OF STRONG SCHEMA @GOOGLE

A prime motivation for this evolution towards a more “database-like” system was driven by the experiences of Google developers trying to build on previous “key-value” storage systems. The prototypical example of such a key-value system is Bigtable, which continues to see massive usage at Google for a variety of applications. However, developers of many OLTP applications found it difficult to build these applications without a strong schema system, cross-row transactions, consistent replication and a powerful query language.

Source: https://research.google.com/pubs/pub46103.html

HADOOP VS CLASSIC DWH SQL APPROACH

Feature | Classic DWH | Hadoop
Tables | Yes | Yes
SQL language | Yes | Yes, SQL-on-Hadoop
Query optimizer | Yes | Yes
Indexes, PKs | Yes | No
Data “owner” | Proprietary RDBMS | Open data format; access by many engines like Spark, Hive; many open formats like Parquet, Avro
Metadata dictionary | User data + dictionary in RDBMS | User data and dictionary (“Hive Metastore”) separate

STRUCTURING THE DATA LAKE

New data sources
• Sensors, logs, NoSQL, etc. as data sources
• Schema-on-read is useful because sensor data formats change frequently

Existing internal data
• Dump RDBMS exports into the Data Lake for data analytics
• Schema-on-read does not make any sense, as the data is already in a documented data model
