Data Lake
BUILDING AGILE BIG DATA ANALYTICS PLATFORM
Rama Kattunga
Systems Director, Enterprise Analytics
About Me
. Big Data Strategies
. Analytics
. Playing with petabytes is a passion
. Currently building a unified and unique data platform for healthcare
. Experience: Worked @ [company logos in original slide]
. System Director | Enterprise Analytics

“Culture eats Strategy for …..”
. Culture is today's major performance differentiator
. Culture is the foundation for the strategy

What is a Data Lake?
[Figure labels: Ecosystem; Data Lake]
"A place to store practically unlimited amounts of data of any format, schema and type that is relatively inexpensive and massively scalable."
"If you think of a datamart as a store of bottled water, cleansed and packaged and structured for easy consumption, the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."
- James Dixon, Pentaho CTO

Water packaging
[Figure labels: Business Area 1, Business Area 2, Business Area 3, each with ETL, CRM, ERP, Finance; Single Source of Truth]

Today's Model: Traditional Extract, Transform, Load
[Figure labels: IT Supported; Business; Provision; Analysis; Reports; Dashboards & Scorecards; End Users; Transform & Load; Change Analysis; Data Warehouse; Data Cubes; Specialized Tools; Spreadsheets; Extract; Data Marts; Existing Data; LOB Applications; Files; IT Pro]
. Satisfaction: Low
. Quality
. Timelines: 3-6 Months; 6-9 Months
. $$$
. High cost of rework

Water packaging
[Figure labels: LOB; CORPORATE; Business Area 1, Business Area 2, Business Area 3, each with ETL, CRM, ERP, Finance; Single Source of Truth; Local LOB Data Mart; EDW; Transactional Systems: CRM, ERP, EMR]

Better approach: from ETL to EL, iterate, then T
[Figure labels: Managed Production; Self-Service; Data Stewardship & Governance; Provision; Analysis; Reports; Dashboards & Scorecards; End Users; Business; Ad Hoc Analysis; Transform; Keep; Kill; Data Warehouse; Data Cubes; IT Support; Quality; Pilot; Specialized Tools; Requirements; Spreadsheets; Extract & Load; Iterate; POC; LOB Applications; Data Marts; Files; Rapid Data Lake Experiment; IT Pro / IT Supported; Common Platform]

Where can we use Data Lakes?
. Ingestion challenges with data sources like EMR systems
. No data left behind
. Schema on read
. Scaling
. Reduction of costs due to data movement

Challenges with Data Lakes?
. Not a silver bullet
. Not a replacement for Information Governance
. Frustrating to business users if there is no schema or metadata
. Size of the data
. Security and data privacy

ONESource Reference Architecture
[Architecture diagram; labels recovered in slide order]
. Top level: Source Systems; Portals; Business Intelligence Platform; Information Hub; FIREWALL
. Business Intelligence and portal labels: Dashboards; Advanced Visualization; Predictive Analysis; Standard reporting; Adhoc Reporting; CDR; HIE; Patient Portal; ACO; Physician Portal
. Source and subject-area labels: Clinical APPS; FINANCE APPS; EMR & Revenue Cycle; Clinical; Operational; ERP; Financial; Supply Chain; Decision Support; HR; Accounting; Project; DUMP iT; Sales and Mktg.; ACO; IT Analytics; Population; External and Benchmark Data; PHP; PMG; Quality Data; Patient Experience; Images; Devices; Da Vinci Robots
. DATA NORMALIZATION: Data Quality Management; Business and Data Definitions; Reference Data Management; Business and Data Traceability; Data Quality Rules Engine; Hierarchy Management; Business Rules Engine; Enterprise Data Model; Data Policy Management; Data Enrichment
. Data Lake Platform: SQL in Hadoop; Compute & Storage; Hadoop Distributed File System; Data Quality; Systems Management; Security Management; Data Governance

Thank You!
Rama.Kattunga@gmail