Pentaho & MongoDB Partner to Solve Government Big Data Challenges
Pentaho & MongoDB Partner to Solve Government Big Data Challenges
December 2013
Bob Gourley, Publisher, CTOvision.com
Will LaForest, Director of Federal, MongoDB
Dave Henry, SVP Enterprise Solutions, Pentaho
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Big Data Management: Best Practices for Federal Big Data Projects
Bob Gourley, Publisher, CTOvision.com

Brief Purpose
• Research & Reports
• Intro to top 5 “Best Practices” of Federal Data activities
• A focus on a new discipline of “Big Data Management”
• Invitation to collaborate and refine approaches
• A perpetual draft - your input is requested
• Contribute your thoughts at CTOvision.com

Update

Sources
• Big Data Government Newsletter - reader survey: 2,600 readers, 2% response rate, across Federal agencies
• Review of openly published research by Wikibon, TDWI, IDC, Gartner, Forrester and of course our own CTOvision
• Review of best practices and use cases from the best vendors in Enterprise Big Data
• Engagement of the community at events like Strata and Hadoop World

Planning Assumption
The ability to collect, parse, analyze machine data in real time, whether on premise or in the cloud, will continue to grow

Big Data Management
• Agencies are thinking through the right changes to concepts and technologies
• Old approaches still important, but cannot solve emerging problems
• Big Data Management is an evolved discipline which builds on existing data management approaches to leverage new concepts, technologies and best practices to optimize mission support

Solutions That Require Big Data Management
• Open Source Information: analysis and integration
• Situational Awareness across disparate data sets
• Two use cases: “Connect the Dots” and “Needle in Haystack”
• Cyber Security: rapid real time analysis of all relevant data
• Asset catalog across extensive/dynamic enterprises
• Rapid return of geospatial data
• Location based push of data
• Real time return of relevant search
• Real time suggestion of topics
• Bioinformatics:
    • Human Genome
    • Patient location, treatment, outcomes
• Law Enforcement: Predictive Policing
• Data Hub: Unified storage, governance, security, functionality

Best Practices in Big Data Management

VISION
Start with a mission-focused vision. This will vary by organization. Support to mission will drive everything else.

STRATEGY
Consider that analytics and Big Data go together. Should prioritize and tackle challenges like: changes to governance processes, right mix of skills for workforce, learning new technology, prioritizing which workload types will be handled by which part of the architecture.

KNOW
Know existing infrastructure and process with focus on: understanding of legal/policy dynamics relevant to your agency, understanding of new capabilities available, current and required throughputs/capacities, types of workloads supported by each component in the architecture, available tech choices. Document and continuously improve.

DESIGN
Architect to manage data in its original form. Include right mix of traditional and new in your design. Don’t assume any one platform will be a solution. Architect to insulate applications and users from a variety of disparate big data platforms.

EXECUTE
Avoid custom coding wherever possible. Don’t let new Big Data Platforms become proprietary silos. ETL remains important. Ensure training for all based on job function. Don’t neglect your own training. Serve the analyst.

Next Steps
• Continue your market surveys, stay aware of what new technologies can do for you.
• Revisit your vision. As you do, ponder this: How can you leverage data to support your mission?
• Continue to study use cases and exchange best practices. Dialog with others in and out of your sector. Great lessons are coming from other industries.
• Continue to engage with the broader community. Sign up for our Government Big Data Weekly. Share your lessons learned.

Provide Your Thoughts, Input, Questions
E-mail: [email protected]
Blog: http://ctovision.com
Twitter: http://www.twitter.com/bobgourley
Facebook, LinkedIn, etc: See the blog

The Modern Operational Database for Government
Will LaForest, Director of Federal, MongoDB

The Evolution of Databases
[Diagram: timeline marked 1990, 2000 and 2010 on an axis running from Online to Offline. Online side, labeled Operational & Real-time: RDBMS (shown three times) and NoSQL. Offline side, labeled Datawarehouse: OLAP/BI (shown twice) and Hadoop.]

Relational Database Challenges

Variety
• Unstructured data
• Semi-structured data
• Polymorphic data

Agile Development
• Iterative
• Short development cycles
• New workloads

Volume & Velocity
• Petabytes of data
• Trillions of records
• Millions of queries per second

New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing

MongoDB: The Modern Operational Database
• General Purpose
• Document Oriented
• Open-Source

Fully Featured

MongoDB document:
{ first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980

Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.

Text Search
• Find all the cars described as having leather seats

Aggregation
• Calculate the average value of Paul’s car collection

Native Indexes
• Secondary
• Compound
• Geospatial
• Full Text
• Hash
• Covering

MongoDB and Enterprise IT Stack
[Diagram: enterprise IT stack. Applications: CRM, ERP, Collaboration, Mobile, BI. Data Management: Online Data and Offline Data, with boxes labeled RDBMS, RDBMS, Hadoop and EDW. Infrastructure: OS & Virtualization, Compute, Storage, Network. Side layers: Security & Auditing; Management & Monitoring.]

Variety – Modern Data

Document Data Model
[Diagram: Relational model compared with the MongoDB document below.]

{ first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

Dynamic Schema
MongoDB does not need any defined data schema. Every document could have different data.

{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}
{name: "jeff", eyes: "blue", height: 72, boss: "ben"}
{name: "brendan", aliases: ["el diablo"]}
{name: "ben", hat: "yes"}
{name: "matt", pizza: "DiGiorno", height: 74, boss: 555.555.1212}
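
As a rough illustration of the Dynamic Schema slide, the sketch below holds a few of the slide documents as plain Python dictionaries and runs two of the "Rich Queries" and "Aggregation" examples against them. It is not MongoDB code: no server or driver is used, the helper names (`find`, `owners_with_car_built_between`, `average_car_value`) are hypothetical, and the fields elided with "…" on the slides are simply left out. The point is only that documents with different fields can sit in one collection and still be queried.

```python
# Plain-Python sketch of schema-free documents. No MongoDB server or driver
# is involved; the helper names below are hypothetical, not MongoDB APIs.

# Documents taken from the slides: each one carries a different set of fields.
# Fields elided with "…" on the slides are omitted here.
people = [
    {"name": "jeff", "eyes": "blue", "height": 72, "boss": "ben"},
    {"name": "brendan", "aliases": ["el diablo"]},
    {"name": "ben", "hat": "yes"},
    {
        "first_name": "Paul",
        "surname": "Miller",
        "city": "London",
        "cars": [
            {"model": "Bentley", "year": 1973, "value": 100000},
            {"model": "Rolls Royce", "year": 1965, "value": 330000},
        ],
    },
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion.

    A document that lacks a field simply does not match; nothing fails.
    """
    missing = object()
    return [
        d for d in docs
        if all(d.get(key, missing) == value for key, value in criteria.items())
    ]


def owners_with_car_built_between(docs, city, start, end):
    """Find people in `city` who own a car with start <= year <= end."""
    return [
        d for d in find(docs, city=city)
        if any(start <= car["year"] <= end for car in d.get("cars", []))
    ]


def average_car_value(doc):
    """Average the `value` field over a person's embedded cars, or None."""
    values = [car["value"] for car in doc.get("cars", [])]
    return sum(values) / len(values) if values else None


blue_eyed = [d["name"] for d in find(people, eyes="blue")]
paul = find(people, first_name="Paul")[0]
london_70s = owners_with_car_built_between(people, "London", 1970, 1980)
print(blue_eyed, average_car_value(paul), len(london_70s))
```

In MongoDB itself these lookups would be expressed through its own query and aggregation features, which this sketch does not attempt to reproduce.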

Volume, Velocity, and New Architectures

Automatic Sharding
• Increase or decrease capacity as you go
• Automatic balancing
• Optimized for commodity servers and cloud infrastructure

High Availability
• Automated replication and failover
• 0 down time with hardware failure and upgrades
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency

MongoDB Performance*

Top 5 Marketing Firm
  Data: Key/value
  Queries: Key-based; 1 – 100 docs/query; 80/20 read/write
  Servers: ~250
  Ops/sec: 1,200,000

Government Agency
  Data: 10+ fields, arrays, nested documents
  Queries: Compound queries; Range queries; MapReduce; 20/80 read/write
  Servers: ~50
  Ops/sec: 500,000

Top 5 Investment Bank
  Data: 20+ fields, arrays, nested documents
  Queries: Compound queries; Range queries; 50/50 read/write
  Servers: ~40
  Ops/sec: 30,000

* These figures are provided as examples. Your application governs your performance.

Replication Benefits Operational and Analytical Workloads
• Application interacts with primaries
• Analytical workloads on secondaries
• Workloads are isolated from one another
• Working set appropriate for each application

Global Data Distribution
[Diagram: seven labels, each reading “Real-time”.]

Read Global / Write Local
• Primary: LON, Secondary: NYC, Secondary: SYD
• Primary: NYC, Secondary: LON, Secondary: SYD
• Primary: SYD, Secondary: LON, Secondary: NYC

Solving Big Data Challenges in the Federal Government
Dave Diegtel, Head of Federal Sales, Pentaho