Make Big Data Agile: Improve Reliability and Trustworthiness of Big Data by Aligning Process and Data

Make Big Data Agile: Improve Reliability and Trustworthiness of Big Data by Aligning Process and Data

10/4/2011 MAKE BIG DATA AGILE: IMPROVE RELIABILITY AND TRUSTWORTHINESS OF BIG DATA BY ALIGNING PROCESS AND DATA Ralph Kimball Informatica October 2011 The Data Warehouse is More Urgently Needed Than Ever Operational data increasingly dominant Low latency data trending toward zero delay Customer behavior insights being monetized “Big data” introduces many new challenges Huge volumes, many types of sources Structured semi-structured unstructured Non-relational processing Raw data, often dirty, often sparse 1 10/4/2011 Threats to the Data Warehouse Mission Waterfall approach Compartmentalized responsibilities Formal specifications, handoffs, change bureaucracy Excessive documentation Resistance to course changes Infrequent deliverables Integration overwhelm Untrustworthy, unreliable data “Starting Over” because of Big Data ?? Big Data Extends the Data Warehouse Use cases that drive the bottom line: Targeted ads in real time based on web site activity and/or geo-tracking “Satisfaction crash” intervention based on voice-to-text analysis of support calls Customer sentiment change response based on social media Emergency response planning based on satellite image comparison Fraud detection based on cross-market brokerage transaction comparisons 2 10/4/2011 Adopt the Best Aspects of Agile Development Business driven Sophisticated business sponsor All IT members involved in KPI identification Small teams with end to end responsibility Eliminate non value-added tasks Decompose the project Frequent deliverables Reasonable documentation Expect and encourage mid-course corrections Use the Value Stream Map to Identify Best Agile Opportunities DOIT Corporation: Value Stream Map (AS-IS) for Change Request Process Monday, December 20, 2010 GMNA Applications Data Warehouse Change Request Team Team (Irina) Confirmation Request Telephone Tag Automated Status Request Workflow/Tracking (Cust satisfaction) Status Update Status Request CR Review Committee Status Update Data Dictionary to clarify rqmnts Notify Customer Semi-Weekly Review (13 days) Status Request (Cust Satisfaction) (5 days) Clarify Requirements Status Update Requirements Clarification Bypass Council Approved Changes for simple CR’s Production CR Add CR (26 days) Submission To List Assign Resource Design Approval Daily ETL Batch Run Data Warehouse Team Integration Team Manager Architecture Review Council Test Scheduling Bypass CR Approval Committee for Test Team Manager simple changes Charge Test Results Design Approved Request Change Management Board (8 days) Distribution Document Designs Forward CR Request Automated Regression To Developer Approved CR Design Docs & Design Docs & Schedule Testing (21 days - requires CR Approval & Schedule investment) Requirements Design & Testing Handoff Test Case Test Execution Production Production Review Development Development Deployment Execution Development Team Test Team Automatic Daily ETL Batch Development Team Development Team Test Team Infrastructure Team Run CR’s 1 P1x12 P2x35 P3x124 8.8 Days 13.3 Days 26 Days 1 Day 12.8 Days 8.5 Days 0.3 Days Lead Time = 75.6 Days 30 Minutes 180 Minutes 15 Minutes 180 Minutes 90 Minutes 15 Minutes Work Time = 510 Minutes (8.5 hrs or 0.35 Days) Value Ratio: Work Time / Lead Time = 0.5% Notes: (1) Lead Time includes 5 delay in customer notification (2) Lead Time could be reduced to 24 days with just process changes and using existing tools (3) Lead Time could be reduced to 3 days with a capital investment for automated testing 3 10/4/2011 Kimball Lifecycle Approach is Inherently Agile Business Driven! Technical Product Architecture Selection & Growth Design Installation Program/ ETL Business Dimensional Physical Project Design & Deployment Requirements Modeling Design Planning Definition Development BI BI Application Application Maintenance Design Development Program/Project Management Prepare for an Agile EDW with Data Profiling Data profiling is a necessary foundation of the EDW Qualify or disqualify new data sources at the earliest opportunity Implement data quality triage: Demand source system fix, or Correct data in DW ETL, or Tag data and pass through Embed data profiling throughout pipeline: Source ETL back room BI front room Integrate data profiling tightly with data flow development: discovery spawns in line module 4 10/4/2011 Agile Data Governance Fits Rhythm of EDW Development Recognize, protect data as a corporate asset Continuously manage data quality at all levels Operational sources Data integration pipelines BI delivery Continuously extend EDW integration with conformed dimensions driven by enterprise MDM Continuously add new sources, facts, dimensions, and dimension attributes Dimensional approach minimizes rework, impact Agile Approach: Four Active Steps Manage/plan with the data warehouse bus matrix Develop with enterprise MDM Integrate incrementally Manage data quality incrementally 5 10/4/2011 Agile Approach: Use the 1. Data Warehouse Bus Matrix Decompose and manage data warehouse implementation projects Implement rows of matrix one at a time Business Processes Dimensions Agile Approach: Leverage 2. Development with Enterprise MDM Move entity creation and admin to operations Move away from downstream MDM Reduce repeated ETL demands for same entities Establish single source for entity content Operational systems, social media tracking, and EDW subscribe to MDM Roll out MDM incrementally 6 10/4/2011 Agile Approach: 3. Integrate Incrementally Data warehouse integration is drilling across: Agile advantages of conformed dimensions Install conformed attributes one source at a time Add new conformed attributes one at a time Deliver value with each iteration Agile Approach: 4. Manage Data Quality Incrementally Use data profiler to build data quality screens Column screens Structure screens Business rule screens Manage data quality through event database Agile advantages of this DQ architecture Build screens one at a time Add or subtract screens 7 10/4/2011 Agile Approach: Special Considerations for Big Data Big data fits comfortably with agile approach Agile approach tolerates uncertainty, back tracking, and midcourse corrections Big data often must be analyzed before structure is understood Extend familiar RDBMS experience to non-relational data with Hadoop resources MapReduce, Hbase, Cassandra Data profiling extended to non-relational data Hadoop as a preprocessor for creating structured data Next Steps Integration is your fate! Establish continuous agile process for identifying, vetting, and implementing new data sources Use data virtualization for rapid prototyping and proofs of concept Choose a Big Data source and a Hadoop configuration as a pilot project Build IT skills Introduce new KPIs to business community Use Kimball/Informatica Big Data white paper as guide to growing EDW to support Big Data 8 10/4/2011 The Kimball Group Resource www.kimballgroup.com Best selling data warehouse books NEW BOOK! The Kimball Group Reader In depth data warehouse classes taught by primary authors Dimensional modeling (Ralph/Margy) Data warehouse lifecycle (Margy/Warren) ETL architecture (Ralph/Bob) Dimensional design reviews and consulting by Kimball Group principals Informatica White Paper on Big Data Impact on EDW: http://vip.informatica.com/?elqPURLPage=8808 9 .

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    9 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us