Data Streams, Complex Events, and BI Seth Grimes Alta Plana Corporation +1 301-270-0795
Total Page:16
File Type:pdf, Size:1020Kb
Data Streams, Complex Events, and BI Seth Grimes Alta Plana Corporation +1 301-270-0795 -- http://altaplana.com International Data Warehouse & Business Intelligence Summit 2008 June 11-13, 2008 Agenda BI at the speed of thought → Analytics at the speed of data: the business case. Technologies and Solutions for real-time data and real-time analytics. Evaluating and implementing: Best Practices, strategy, and resources. Data Streams, Complex Events, and BI 3 Data Streams I worked for 3 years for NASA... Programming simulations of satellites that would map the earth‟s gravity field. Data involved simulated orbit altitude, inclination, duration, and the accuracy and measurement rate of instruments. http://media.skyandtelescope.com Instruments captured acceleration with 6 degrees of freedom – 3 spatial dimensions plus “attitude”: roll, pitch & yaw. I also worked on magnetic-field mapping where satellites would be ranged by ground-based lasers. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 4 Data Streams Then I worked for 5 years for the U.S. National Highway Traffic Safety Administration... Vehicle crash tests (at that time) included up to 80-90 sensors on vehicles and dummies, 8000x/second data rate. Data was correlated to video. http://www.nhtsa.dot.gov/ Analyses were off-line. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 5 Data Streams I then worked with economic data, especially time series... Observations are at calendar frequencies: yearly, monthly, weekly, etc. Econometric models link disparate data and enable forecasting. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 6 Data Streams Data streams come from many sources. Data volumes, types, frequencies vary. Processing approach depends on use:Data-stream correlation; analysis techniques. Communications networks and capital markets created the market for new classes of software. Data stream processing (DSP). Complex event processing (CEP). Builds on BI and Operations Research. OR=mathematical modelling for automated decision making. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 7 Data Streams Capital markets involve trades, bids/offers, news, statistics (e.g., P/E ratio, float), competitive intelligence, and economic & market data. Trading data includes symbol, volume, price, time. Many derivatives: Put & call options. Indexes & baskets. ForEx hedging. Monetizations. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 8 Breakthrough Analysis Traditional BI is not enough. Historically focused rather than “operational” and predictive. Designed for reporting and interactive analysis rather than automated, embedded analytics. Issues with real-time (RT) data and real-time response. Tilt toward performance management, financials. http://www.dundasconsulting.com/ On the plus side, BI has: Dashboards. KPIs. Ability to handle time-based data. Growing RT capabilities. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 9 Breakthrough Analysis Traditional DBMSes and DW approaches – Load → Index → Query – do not meet the speed need. Standard SQL and standard SQL engines don‟t support continuous queries. Common APIs (interfaces) – ODBC/JDBC, Analysis Services – are set/record oriented and do not provide adequate real-time query capabilities. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 10 Breakthrough Analysis We need to cope with – Bigger data. Faster data. New, complex information types. Low-latency (“real-time”) processing. Analysis over data streams. Complex, statistically rooted algorithms. Automated decision making. => Complex Event Processing. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 11 Problem Statement More data sources, more & faster data: RFID (radio frequency identifier) chips, especially for inventory and logistics. News and social media. Retail transactions. Communications networks. Sensor networks, security devices. Trading markets. Analytics driver: efficiency, speed, accuracy. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 12 Problem Statement More data types: Surveillance video and audio. Sensor state information (e.g., temperature, humidity). Geolocation data, e.g., from GPS devices. Complex transactional information. Bids and offers. Full (“unstructured”) text. Coded and tagged (“structured”) text. Timestamps. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 13 Problem Statement Challenges: Timely response. Capture of high-frequency data. Correlation of data from multiple sources. Long-running and time-aware queries. Issues: RDBMSes were designed for transaction processing. Load, structuring, and indexing time given traditional datamarts and data warehouses. Analytical styles: Traditional BI is historically focused. Algorithms: Scalability of traditional serial algorithms. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 14 Responses Operational/Real-Time BI? Data Stream Processing (DSP). Complex Event Processing (CEP). ... Adaptive Data Warehousing. Column stores. Massively Parallel (database) Processing (MPP). In-memory analytics. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 15 Operational/Real-time Data/BI Real Time BI: Fast response required. E.g., model scoring. Supports automated decision making. Operational BI on real-time data – What’s happening right now? – involves a bit more. Dashboards provide a great visualization tool. BI differs from monitoring: it contextualizes information. Be careful of “twinkling” data. But cleansing, structuring, indexing add latency. And we’re still not handling events or complexity. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 16 Operational/Real-time Data/BI Most BI dashboards are not for RT data. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 17 http://www.perceptualedge.com/blog/?p=154 Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 18 Events The concepts of events became familiar in the „90s in association with object-oriented (OO) programming. C++, Smalltalk, Java, and other languages. Software “objects” are invoked in response to program events such as the user’s pressing a button. Objects and events are managed in an application server. This conception did not cover data events. We did have DBMS based event handlers: stored procedures, triggers, alerts, etc. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 19 Events Data events: • A shop in Bangkok seeks to authorize a charge on a credit card that was used one hour ago in Nairobi. • Inventory of widget X fall below five days‟ average sales. • The spread between the AU.L price in London and the AUY.F price in Frankfort exceeds 3%. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 20 Data Stream Concepts Ordering: A stream is a sequence of observations over time. Correlation: Multiple streams may be related. Time and observation windows. Continuous/long-running queries. Use of SQL, treating streams like tables. Use of SQL joining streams and historical data. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 21 Data Stream Concepts “Traditional ETL processes are limited to reading data at rest: from databases, mainframes, and files extracted from other operational systems. “Data in flight exists in other formats: messages on message- oriented middleware, web service calls, TCP network packets, and so forth... “A stream is analogous to a table in a relational database; but … contains an infinite sequence of rows that arrive whenever the producer decides to send them.” -- Julian Hyde, SQLstream architect (http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html) Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 22 Data Streams and CEP Extend the concept to event streams. My definition: An “event” involves a data condition that is notable in some context. Those BI dashboards showed events (and more) but don‟t handle streams. Complex Event Processing. Feeds on data and event streams and “clouds.” Transforms input streams – whether data are atomic or complex – into output streams. Supports automated decisions via rules systems. Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 23 BI versus CEP http://www.eventstreamprocessing.com/ Copyright © 2008 Alta Plana Corporation International BI-DW Summit Data Streams, Complex Events, and BI 24 Data Stream Pioneers AT&T Labs created Hancock: A Language for Extracting Signatures from Data Streams, in 1999-2000. “Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or