INFORMS CONFERENCE THE BUSINESS OF BIG DATA

June 22-24, 2014 San Jose, California San Jose Convention Center

Conference Sponsors 2 Online Proceedings View slides for most conference sessions: Conference Committee 2 http://meetings2.informs.org/bigdata2014/ Program Highlights 3 Conference Mobile App Search your App Store (Android or Apple) Conference Mobile App 4 for “INFORMS Mobile” to download Technology Workshops 4 Networking & Social Events 5 Guide to Exhibits 6 Poster Sessions 8 Sunday Events 9 Monday Events/Abstracts 9 Tuesday Events/Abstracts 14 Schedule at a Glance 18 Floor Plans of Meeting Space 20 2 Thanks to Our Sponsors Conference Committee

Margery H. Connor Conference Co-Chair Senior Operations Researcher Advanced Analytics, Chevron

Diego Klabjan Portfolio Pads and Signage Conference Co-Chair Professor, Founding Director Master of Science in Analytics Northwestern University

Paul Bachteal Conference Bag Director, Americas Techology Practice, SAS

Donald L. Buckshaw Chief Operations Research Analyst Badge and Lanyard Online Proceedings Arnold Greenland IBM Distinguished Engineer Office of the Public Sector CTO, IBM

Mani Janakiram Director, Supply Chain Intelligence & Analytics, Intel Welcome Reception Brian Keller Lead Associate,

Natalie Kortum Director of Consumer Analytics Humana

Veena Mendiratta Consulting Member of Technical Staff Monday Refreshment Break Alcatel-Lucent

Jeff Winters Operations Research Division Manager UPS

Tuesday Refreshment Break 3 Program Highlights POSTER PRESENTATIONS WITH DESSERT THOUGHT-PROVOKING KEYNOTE PRESENTATIONS Monday & Tuesday: 2:00pm – 3:00pm Monday, 8:30am – 9:30am, LL20A LL20 Foyer Putting Big Data To Work Bill Franks, Chief Analytics Officer, Each day, we’re serving up a delectable Teradata Corporation afternoon dessert during a high quality poster session. These poster presenta - Bill provides insight on trends in the analytics & big data space and tions offer you an inside look into a full helping clients understand how Teradata and its analytic partners can range of big data analytics, including all support their efforts. In addition, Bill is a faculty member of the Interna - applications areas, industries and tional Institute for Analytics and the author of the book Taming The Big methodologies. Presenters will be available Data Tidal Wave (John Wiley & Sons, Inc., April, 2012). He is also an active during the afternoon poster sessions speaker and blogger. Bill’s focus has always been to help translate from 2:00pm-3:00pm to describe their complex analytics into terms that business users can understand and to work and answer questions. The posters then help an organization implement the results effectively within their processes. His will also be displayed during the entire work has spanned clients in a variety of industries for companies ranging in size from day, so you can browse at your leisure. As Fortune 100 companies to small non-profit organizations. a guide for identifying which posters you want to visit, watch for the color-coded Abstract: signs that indicate general topic areas. Big data is everywhere. You can’t avoid being exposed to discussions around big data, See full poster listing on page 8. and the analysis of it, on a regular basis. The downside of this attention is that there is a lot of hype and misinformation in the marketplace. Many companies are confused about how to get started, what actions to take, and what pitfalls to avoid. Based on content from the popular book Taming The Big Data Tidal Wave , this talk will provide an overview ONLINE PROCEEDINGS of important themes to understand regarding big data. The talk will address technological , Preview Presentation Slides to organizational, and cultural points that must be considered. Most importantly, the talk Help Select Sessions to Attend will aim to provide attendees a solid direction to take their big data initiatives. Our online proceedings, featuring Tuesday, 8:30am – 9:30am, LL20A presentation slides for almost every Moving from Big Data to Big Outcomes on the Journey to ROI session in the conference, is a great Michael Svilar, Managing Director, resource for selecting the talks you want to attend. A quick check provides As a managing director at Accenture Analytics, now part of Accenture a detailed look at what speakers will Digital, Michael Svilar serves as the Delivery Lead and Capability Lead be presenting, and helps ensure a for Accenture Analytics. In these roles, Michael runs a group of more good match with what you want to than 900 data scientists and analytics professionals globally. Michael learn. The online proceedings will oversees Accenture’s network of Analytics Innovation Network, including also be available to attendees after our Advanced Analytics Innovation Centers in Athens, Barcelona, Dublin, the conference. Go to the conference Madrid and Singapore. Michael has been with Accenture for 14 years, and web page and click on “Conference was a founding member of its Marketing Sciences practice. Most recently, Michael led Proceedings.” If you have questions, the Marketing Analytics group in Accenture Interactive. For more than 30 years, Michael please go to the INFORMS has run analytics projects across multiple industries including Retail, Communications, registration desk. Financial Services , Automotive, Consumer Packaged Goods, and Electronics. His pri - mary area of focus is applying analytics to marketing, merchandising and price promo - tions. He holds a patent in forecasting analytics, and has published on a variety of analytics topics. Michael has a BS in Economics from John Carroll University, an MA in Eco - nomics from Miami University , and a Ph.D. from the University of Maryland. Abstract: In this Internet of Everything world, systems, devices and physical objects are “talking” to one another. There are upwards of a trillion connected and instrumented things: cars, appliances, cameras, roadways, pipelines…even pharmaceuticals and livestock. The talk will focus on how organizations can drive positive business outcomes in the connected world using big data analytics. 4 Conference Mobile App Technology Workshops INFORMS Future Meetings

INFORMS is pleased to SUNDAY, JUNE 22 2014 offer a free mobile app for 1:45pm – 3:15pm November 9-12 smart phones and tablet INFORMS Annual Meeting users. Use the app to access FICO 2014 San Francisco conference information on LL21B Hilton San Francisco & Parc 55 Wyndham the go, including the schedule of talks, Turn Big Data into Better Decisions San Francisco, California speaker bios, maps showing meeting http://meetings2.informs.org/ locations, exhibit details and much more. Forio sanfrancisco2014/ As this is a native app, you do not need an LL21C Internet data connection to use it during Creating and Publishing Interactive Online 2015 the conference. Operations Research Applications April 12-14 When you enter the conference program INFORMS Conference on Business in the app, you can either browse through ProbabilityManagement.org Analytics & Operations Research the parallel sessions and special events or LL21D Hyatt Huntington Beach search for a specific speaker or keywords. Unambiguous Uncertainty Stored Huntington Beach, California As you locate the events, sessions and as Big Data presentations you are interested in, click June 14-17 on the star button which bookmarks it 3:30pm – 5:00pm CORS-INFORMS International into the My List itinerary. Use My List for Sheraton Montreal a quick reference tool. Frontline Systems Montreal, Canada LL21B Android Devices Analytics Made Easy: Data Mining, November 1-4 To download, go to the Play Store Simulation and Optimization for and search for INFORMS Mobile. INFORMS Annua l Meeting Excel and the Web 2015 Philadelphia Apple Devices Pennsylvania Convention Center To download on either iPhone or iPad, go SAS & Marriott Philadelphia Downtown to the App Store and search for INFORMS LL21C Mobile. An Overview of with SAS® Enterprise Miner™ Please Provide Your Input We encourage you to use the survey feed - back form available for every presentation on the program. You can respond to the survey as you browse the presentations. A full conference survey is on the conference screen of the main menu. The conference committee greatly appreciates your feed - back on individual sessions and on the conference overall. 5 Networking and Social Events For Your Career SUNDAY 5:30pm – 7:00pm Marriott Hotel & Your Organization 5:30pm – 7:00pm LL20 Foyer General Reception San Jose Ballroom Welcome Reception and Exhibits We welcome you to the conference with Stop by this informal reception where you ANALYTICS this first opportunity to visit the exhibits, can connect with other attendees and CERTIFICATION – CAP® network with colleagues and enjoy heavy unwind from a jam packed day. Members of the Conference Committee will be your Stop by the INFORMS table to hors d’oeuvres, beer, wine, soft drinks and reserve your copy of the official CAP bottled water. hosts. Enjoy a hearty display of heavy hors d’oeuvres and food stations, beer, Study Guide. This brand new resource wine, soft drinks and bottled water. provides an easy and efficient way to MONDAY prepare for CAP. Be sure to pick up a 7:30am – 8:30am LL20BCD TUESDAY copy of the CAP Candidate Handbook . Full Buffet Breakfast This contains all the information you Connect with colleagues over breakfast. 7:30am – 8:30am LL20BCD need to decide if CAP is for you. Full Buffet Breakfast Quantities are limited. 12:00pm – 1:00pm LL20BCD Connect with colleagues over breakfast. Networking Lunch INFORMS ANALYTICS Join your colleagues and meet someone 12:00pm – 1:00pm LL20BCD MATURITY MODEL new to discuss how you are using big data Networking Lunch Join your colleagues and meet someone How well does your organization use in your company. Seating is not assigned analytics? What can you do to progress for this event. new to discuss what is keeping you up at night, as a big data expert or data scientist , from good to great? Attend this and where do you see big data analytics presentation of the new INFORMS 2:00pm – 3:00pm LL20 Foyer analytics maturity model to see how Poster Session, Dessert & Exhibits going in the next 3 years. Seating is not assigned for this event. you can score your organization, set Enjoy dessert, visit the exhibits, and learn target goals, and attain the enormous from poster presentations that showcase 2:00pm – 3:00pm LL20 Foyer advantage of analytics leadership. a full range of big data analytics, including For information about doing your first many applications areas, industries and Poster Session, Dessert & Exhibits Enjoy dessert, visit the exhibits, and learn analytics maturity assessment, stop methodologies. As a guide for identifying by the INFORMS booth for a helpful which posters you want to visit, watch for from poster presentations that showcase full range of big data analytics, including brochure or attend the following in the color-coded signs that indicate general depth presentation. topic areas. Prizes for the top 3 overall all applications areas, industries and methodologies. As a guide for identifying Monday, Track T5 projects will be awarded at 2:50pm. The 9:40am – 10:30am, LL21B criteria are interesting use of big data which posters you want to visit, watch for analytics, effective display of results the color-coded signs that indicate general topic areas. Prizes for the top 3 overall INFORMS CONTINUING and business impact. See page 8 for EDUCATION the full list of posters. projects will be awarded at 2:50pm. The criteria are interesting use of big data INFORMS’ two-day, in-person courses 4:00pm – 5:30pm LL20BCD analytics, effective display of results provide real-world value in skills, tools Birds-of-a-Feather Discussion Groups and business impact. See page 8 for and methods that can be implemented Stop by for beer, soda, snacks and good the full list of posters. in your work. Essential Practice Skills conversation around important issues. for Analytics Professionals was offered Topics will be selected by the attendees. just before the INFORMS Big Data We need your input! Look for the white Conference and a second course is board near registration to suggest topics being offered immediately following or vote for a topic already posted. This is the conference. If you would like to one of the most popular and highly-rated attend the Data Exploration & Visuali - aspects of the conference – don’t miss it. zation course offered June 25-26 off site, please inquire at the registration desk. For the full 2014 schedule and more information, contact Thedra White at [email protected] or visit www.informs.org/continuinged. 6 Guide to Exhibits Exponent®, Inc. Booth #116 Forio Booth #111 www.exponent.com www.forio.com Exponent is a leading engineering and High performance predictive analytics Booz Allen Hamilton Booth #109 scientific consulting firm providing solutions with the Forio analytics platform. Put inter - www.boozallen.com to complex technical problems. Our multi - active analyses within the hands of decision- Booz Allen Hamilton is a leading provider disciplinary organization of engineers, makers with online analytic applications of management consulting, technology, and scientists, physicians, and business that let users change assumptions and engineering services to the US government consultants addresses complicated issues instantly see results through their web in defense, intelligence, and civil markets, facing industry and government today. browsers. With the Forio Analytics Platform , and to major corporations, institutions, Exponent’s staff has extensive experience you can develop models in Julia (a new, and not-for-profit organizations. Booz in advanced analytics techniques and the high-performance technical computing Allen is headquartered in McLean, Virginia , application of these methods to a wide language with capabilities similar to R and employs approximately 23,000 people, range of applied and research studies, MATLAB) or other languages, import data and had revenue of $5.76 billion for the 12 including clinical investigations, epidemi - and formulas from Excel, and enable users months ended March 31, 2013. In 2014, Booz ologic studies, public health evaluations, to share and compare scenarios online. Allen celebrates its 100th anniversary year. health economics analyses, and other out - The Forio Analytics Platform combines To learn more, visit www.boozallen.com. comes research. Exponent has a full-time sophisticated analysis, universal accessi - (NYSE: BAH) staff of over 900 located in 25 international bility via online data visualizations, and offices. Exponent is certified to ISO 9001 centralized model with secure access. Contact Singapore Booth# 104 and is authorized by the GSA to provide www.contactsingapore.sg professional engineering services. Frontline Systems, Inc. Booth #106 Contact Singapore is an alliance of the www.solver.com Singapore Economic Development Board FICO Booth #108 & 110 See the new release of Analytic Solver and Ministry of Manpower. We engage www.fico.com/Xpress Platform, introduced at this conference – overseas Singaporeans and global talent FICO is a leading analytics the surprisingly easy, lightning-fast, sur - to work, invest and live in Singapore. company, helping businesses around the prisingly powerful integrated analytics Contact Singapore actively links Singapore- world make better decisions that drive software from Excel Solver developer based employers with professionals to higher levels of growth, profitability and Frontline Systems, the company that’s support the growth of our key industries. customer satisfaction. FICO’s analytic democratizing analytics. See Solver SDK We work with investors to realise their tools, which include FICO® Model Builder Platform and Solver Server, your fastest, business and investment interests in and FICO® Xpress Optimization Suite, most flexible path to Web deployment of Singapore. For information on working, provide functionality to turn Big Data into your own analytics applications. Learn investing and living in Singapore, please better decisions. FICO's analytics capabil - how little this costs, and how quickly you visit www.contactsingapore.sg or contact ities are now available on the FICO® can create data visualizations, data mining our worldwide offices. Analytic Cloud and the FICO® Decision and forecasting models, simulation and Management Platform, an easy, cost- risk analysis models, and conventional Enrich Consulting Booth# 102 effective way to evaluate, customize, and stochastic optimization models in www.enrichconsulting.com deploy and scale state-of-the-art analytics Excel or your favorite program - Enrich was born out of a restless desire to and decision management solutions. ming language, from C++, C# and Java to help companies do more—create more Learn more at www.fico.com/Xpress. PHP and JavaScript. Take home a CD with game-changing technology, develop more FICO: Make every decision count™. full-power versions of our tools, and free life-saving drugs, find more opportunity. trial licenses. Our emphasis is on building self-sufficient business forecasting software for our clients , based on customized analytic solutions. The Enrich Analytics Platform (EAP) provides the enterprise computing power; our high-touch, one-on-one approach builds the custom solutions that empower clients to make strategic business decisions— and follow through on them. 7

INFORMS Near Registration ProbabilityManagement.org Booth #107 SAS Booth #112 & 114 www.informs.org www.probabilitymanagement.org www.sas.com INFORMS is offering complementary print Non-profit ProbabilityManagement.org What if you could use big data to make issues of our member magazine OR/MS promotes standards for communicating faster, more precise decisions – while Today and our online publication Analytics , uncertainties through arrays of big data your competitors are still running analyses? as well as our 13 scholarly journals known as SIPs (Stochastic Information No matter the size or complexity of your including Interfaces and our online-only Packets). data demands, high-performance analytics journal Service Science . Attendees can SIPs are: simply outperforms traditional computing also purchase books, CDs, tutorials, and • Actionable - may be used directly in methods. SAS provides several high- INFORMS t-shirts. Information is available Excel to perform thousands of Monte performance distributed processing on INFORMS Certification, Continuing Carlo trials before your finger leaves the options: in-memory, in-database and grid Education, Subdivisions, Awards, Meetings , key using only the Data Table computing. Plus, you can seamlessly Publications, and Memberships. Come by command. explore, visualize and model big data the booth to speak to an INFORMS • Additive - may be aggregated across stored in open-source platforms like Hadoop. representative about our programs and platforms across the enterprise. Visit sas.com/bigdata to learn how you can: products. • Auditable - are represented as • Gain faster, more profitable insights unambiguous data with provenance. into your business. Northwestern University Booth #115 Packages such as Crystal Ball, @Risk, Risk • Analyze and model data with domain- Master of Science in Analytics Solver, Matlab, R, and SAS may be leveraged specific analytics. www.analytics.northwestern.edu through shared stochastic libraries using • Deploy a scalable analytics infrastructure. The Master of Science in Analytics program the open SIPmath™ standard. Visit • Optimize the use of your IT resources. at Northwestern University is a full-time, SIPmath.org for videos and Excel demo 15-month professional master’s degree models. UCLA Anderson Master Booth# 100 that immerses students in a comprehen - of Financial Engineering Program sive and applied curriculum exploring the Provalis Research Booth # 113 www.anderson.ucla.edu/mfe underlying data science, information www.provalisresearch.com The UCLA Anderson Master of Financial technology and business of analytics. Provalis Research is a leading developer of Engineering (MFE) Program equips graduates Supplemented by an internship placement text analytics software with ground-break - with skills required to succeed in careers and industry supplied projects, graduates ing qualitative and quantitative analysis in quantitative finance and data analytics. will be exceptionally well equipped to programs, such as QDA Miner, an innovative Companies in financial services and harness and communicate the full value mixed-methods qualitative data analysis technology are increasingly looking for of data to the organizations they serve. software; WordStat, a powerful add-on employees with data science skillset. Our module for computer assisted content one-year specialized degree meets that analysis and text mining; and Simstat, an need by providing our students with a easy yet powerful statistical software. One rigorous educational experience that of the most distinctive features of these blends mathematical and statistical tools is their interoperability, allowing modelling skills, computational expertise, researchers to integrate numerical and and finance theory. Our program is based textual data into a single project file and on the business school paradigm of to seamlessly move back and forth between merging theory and principle with up-to- quantitative and qualitative data analysis, the-minute business practice – so that as well as to easily explore relationships our graduates can immerse themselves between numerical and textual data. Visit in a rewarding career immediately upon www.provalisresearch.com. graduation. 8 Poster Presentations 9 - Detection of Manipulated Movie 6 - Research on Big-Data to Increase Reviewers and its WOM Implications: Wheat Yield and Efficiency of "Granary with dessert Experience in a Movie Porta Site in Korea of Bohai Bay" Jaehyeon Ju, Doctoral Student, KAIST Chupeng Xie, Professor, Shandong MONDAY SESSION Agricultural University 2:00pm – 3:00pm, LL20 Foyer 10 - Organization, Segmentation, and Hierarchies for Large Number 7 - Large Scale Optimization for 1 - Just-in-Time Scheduling to Maximize of Time Series Stochastic Economic Dispatch Weighted Number of Jobs on Flow Michele Trovero, Research Statistician, Harsha Gangammanavar, Visiting Shop Machines SAS Institute Inc. Assistant Professor, University of Muminu Adamu, Senior Lecturer, Southern California University of Lagos 11 - Large Scale Hierarchical Text Classification 8 - Analyzing Historical Exchange Rate 2 - Zero Inflated Transformation Zheng Zhao, SAS Institute, Inc. Ali Alhamdan, Graduate Student, Hazard Modeling for Corporate Marywood University Bankruptcy Prediction 12 - Experiments in Deep Learning Shaonan Tian, Assistant Professor, Patrick Hall, SAS Institute, Inc. 9 - Post-Marketing Safety Monitoring San Jose University of Self-Reported Symptom-Treatment- 13 - Establishing Functionally Similar Outcome Measurement in Web-based 3 - How Do Business Models Evolve Sites Based on Direct Impressions Media Sources through Big-data Analytics Stephen Burning, University of Illinois- Michael Wallis, SAS Institute, Inc. Christoph Breidbach, Postdoctoral Mathematics Department Scientist, University of California, Merced 10 - Sales Prediction for New Product with Kaggle Data 4 - Scalable, Real-time Big Data TUESDAY SESSION Shuanglong Wang, PhD Candidate, Analytics for Connected Cars Univeristy of Illinois Urbana Champaign 2:00pm – 3:00pm, LL20 Foyer Dirk Van den Poel, Professor of Business Analytics/Big Data, Ghent University 11 - A New Approach to Time Series 1 - Business Analytics with Forio Data Preparation in SAS® Epicenter using Julia: Delivery Route 5 - Sensitivity Analysis of the Kenneth P. Sanford, PhD, SAS Institute, Inc. Optimization with a Mobile Interface VPIN Metric Example Kesheng Wu, Staff Computer Scientist, 12 - Using Big Data and Text Analytics Brian Piper, Data Scientist, Forio Lawrence Berkeley National Laboratory to Manage Attrition Vinh Nguyen, Toyota Financial Services 2 - An Application of Principle 6 - Salient Features of TCS' GNDM™ Component Analysis in Clinical Trials – (Global Network Delivery Model) Choosing the ‘Best’ Sites Framework Chao Deng, Sr. Operational Analytics Natarajan Vijayarangan, Senior Scientist, Specialist, Quintiles Tata Consultancy Services Ltd (TCS) 3 - Bridging the Semantic Gap in 7 - Exploration vs. Exploitation in the Multimedia Retrieval with Topic Information Filtering Problem Extraction From User Reviews Xiaoting Zhao, PhD Candidate, Eunjeong Park, Seoul National University Cornell University 4 - Generating Ideas from Hidden 8 - Twitter Analysis Captures Customer Needs in Social Media for Antiretroviral Drug Toxicity New Product Development Process and Community Taehoon Ko, Seoul National University Cosme Adrover, Postdoctoral Scholar in Data Science, Pennsylvania State University 5 - Operationalizing Big Data for the Tactical Edge Arkady Godin, Research Associate, Naval Postgraduate School 9 Sunday June 22 Monday June 23 Track 2: Case Studies LL21E Big Data in Practice - Scientific Information as a Business Asset – 12:00pm – 7:00pm LL20 Foyer 7:00am – 5:00pm LL20 Foyer Driving Productivity at Merck Research Registration Registration Labs Through Novel Approaches to Scientific Information Management 1:45pm – 5:00pm 7:30am – 8:30am LL20BCD John Erik Koch, Director, Informatics, Technology Workshops Continental Breakfast Merck & Co. See page 4 for complete information on sessions and rooms. Workshops held 9:30am – 4:00pm LL20 Foyer BioPharma companies often struggle to during these times: Exhibits Open manage scientific information – study 1:45pm – 3:15pm results, analyses and historical records 3:30pm – 5:00pm 8:30am – 9:30am LL20A are lost due to poor information manage - Plenary Presentation ment practices and failure to steward Putting Big Data to Work information in a way that can be leveraged Bill Franks, Chief Analytics Officer for future purposes. We will share examples Teradata Corporation of our strategy, execution and progress to date for improving information Big data is everywhere. You can’t avoid management through a set of innovative being exposed to discussions around big capabilities focused on information data, and the analysis of it, on a regular Search, Access and Analytics. basis. The downside of this attention is that there is a lot of hype and misinformation Tracks 3: Big Data 101 LL21D in the marketplace. Many companies are Getting Started with Hadoop confused about how to get started, what and Big Data actions to take, and what pitfalls to avoid. Brian Keller, Data Scientist Based on content from the popular book Booz Allen Hamilton Taming The Big Data Tidal Wave, this talk will provide an overview of important themes Getting started with big data analytics to understand regarding big data. The talk can be a daunting task because of the will address technological, organizational, complexities of the technologies and the and cultural points that must be considered . fast evolution of the software ecosystem. Most importantly, the talk will aim to This talk will expose analytics professionals provide attendees a solid direction to to established open source big data tech - take their big data initiatives. nologies and provide them with practical methods to prototype and develop big data 9:40am – 10:30am Tracks analytics without requiring deep knowledge Track 1: Case Studies LL21F o f software development or distributed Optimal Fusion for Predictive Road-way computing. Come learn the secrets that Traffic Speeds hard core software developers don’t want Toby Tennent, HERE, a Nokia Business you to know! Keller will cover: • An overview of Hadoop and Map Reduce , The Estimation of Short Term Future Traffic • Alternatives to writing Java code to Conditions (15 mins - 12 hours ahead) can develop analytics on Hadoop, be Accomplished using a Multitude of • The big data analytic development life - Methods. In this case-study, we discuss our cycle; how to go from data to actions, development of a principled algorithmic • A worked example using the principals approach to fusing different estimates discussed, of future traffic speed conditions for an • How you can get started with big data ‘optimal estimate’. Data sparseness, noise technologies today. and competing model inputs were issues to overcome. In our case, though use of a Minimum Variance Unbiased Estimator (MVUE) approach to data-fusion, we were able to see a 25%+ improvement over our legacy approach. Lessons from this case- study can be applied to many applications involving current and predictive state- modelling.” 10

Track 4: Emerging Trends LL21C 11:10am – 12:00pm Tracks Tracks 3: Big Data 101 LL21D Big Data Combining Large and Track 1: Case Studies LL21F H adoop Beyond Java: Pig, Hive Semantically Disparate Datasets to Big Data in Official Statistics & Other Friends Come to Meaningful Conclusions Cavan Capps, Big Data Lead Diego Klabjan, Professor Chris Hallenbeck, Global Vice President, U.S. Census Bureau Northwestern University SAP Director, Master of Science in Analytics Big Data provides both challenges and While the three Vs of big data (volume, opportunities for the official statistical It takes two to tango, and Hadoop and velocity, and variety) are largely focused community. The difficult issues of privacy , MapReduce are an excellent example of this on the capturing and storage of the data, statistical reliability, and methodological idiom. In order to maximize performance we have learned a lot over the last few years transparency will need to be addressed in of MapReduce, low level java implementa - about the characteristics and challenges order to make full use of Big Data in the tions must be used which requires teams of Big Data. The real challenge of Big Data official statistical community. Improve - of data scientists. Scripting engine Pig of - is combining large and semantically disparate ments in statistical coverage at small fers a nice balance between ease of use datasets to come to meaningful conclusions . geographies, new statistical measures, and performance. Similarly, Hive offers a Even in 2014, it’s hard enough to ‘join’ text more timely data at perhaps lower costs more SQL-like syntax for data stored in to tabular data! How can this data be mined are the potential opportunities. This talk files within Hadoop. This talk will highlight to derive a model that can be scored? will provide an overview of some of the the basic principles of these two tools. It What about semi-structured, Machine or research being done by the Census will focus on their pros and cons and sim - Graph data? The answer to the problem is Bureau as it explores the use of “Big Data” ple examples will be discussed. If you have all about semantics. We have SQL for tabular for statistical agency purposes. to introduce these technologies to your data and SPARQL for RDFs, but how can organization, employ the necessary tal - these meaningfully be combined and Track 2: Case Studies LL21E ent, or are behind the selection decision exposed to solve larger problems? How Improving Efficiency of Google’s making process, this is a must attend do we expose the resulting data to devel - Infrastructure Using Big Data Tools presentation. A substantial portion of the opers in a meaningful way? What is the Behdad Masih, PhD, Quantitative Analyst talk will revolve around the question of best way to present the data to the end-users Google selecting the right tool for the task at hand. for the consumption and application of In this talk, we will review how Google’s the data? big data tools are being used internally to improve the efficiency of Google’s Track 5: Software Tutorials LL21B infrastructure. Every minute, hundreds INFORMS Analytics Maturity Model of thousands of servers across Google’s How Mature Are You? Preview the Beta fleet report performance metrics of hard - Version of the New INFORMS Analytics ware components such as cpu, disk, Maturity Model memory and network bandwidth usage Aaron Burciaga, Senior Manager and these metrics are saved in different Operations Analytics, Accenture Analytics formats, in different databases, across Co-author; Nagaraj Reddi, Director of multiple datacenters. But as consumers Information Technology, INFORMS of the data, Google’s data analysts can When it comes to the way you use analytics , easily merge and aggregate these metrics how mature is your organization? How do and perform data mining techniques to you compare to others in your industry? identify and address inefficiency of the INFORMS is debuting the much anticipated resource deployment. As the first case model, which you can use to assess your study, we will show how we have used analytics needs, set goals, and obtain Google’s application programming inter - guidance from INFORMS. Meet two faces to merge performance metrics and architects of the model who will walk you forecast the required networking capacities through the new website and show how at server, cluster and datacenter levels as you can gain advantage in all your Big a function of other hardware resources such Data projects. as cpu and disk. During the second case study, we share our experience setting up 10:30am – 11:00am LL20 Foyer a bandwidth monitoring dashboard used Refreshment Break with Exhibits to inform flash deployment decisions. 11

Track 4: Emerging Trends LL21C Track 5: Software Tutorials LL21B 1:10pm – 2:00pm Tracks Taming Big Data with Berkeley Data FORIO- Publishing Interactive Data Track 1: Case Studies LL21F Analytics Stack (BDAS) Analytics on the Web Conception to Deployment of Big Data Ion Stoica, Professor, UC Berkeley Michael Bean, Forio Projects at a Large Financial Institute CEO, Databricks, CTO, Conviva Mike Aguiling, Head of Big Data Technology Forio’s web platform makes your model JP Morgan Intelligent Solutions One of the most interesting developments available to hundreds of people within your over the past decade is the rapid increase organization through the browser. We will What does it take to directly impact top in data; we are now deluged by data from start with an introduction to the platform line revenue with Big Data projects? What on-line services (PBs per day), scientific and sample interactive online models. are the typical hurdles to overcome as you instruments (PBs per minute), gene Then we’ll divide the workshop into two move from a research project to integrate sequencing (250GB per person) and parts. In the first part we will teach you how directly into an existing business? We review many other sources. Researchers and to get your analysis on a server so it can the lifecycle (conception to deployment) practitioners collect this massive data with be shared. In the second part we’ll focus on of Big Data projects at a large financial one goal in mind: extract “value” through creating a for your model. institute. The talk will cover topics including sophisticated exploratory analysis, and project concept and design, guidelines to use it as the basis to make decisions as Track 6: Software Tutorials LL20A Big Data project development, a high level varied as personalized treatment and ad SAS - High-performance Data Mining overview of a ‘Big Database’, and end project targeting. Unfortunately, today’s data and Big Data Analytics deployment. The focus will be on the specific analytics tools are slow in answering even Jaren Dean, Senior Director of Research business use case, the analytic methodology simple queries, as they typically require to and Development for Advanced Analytics used, and the measurable outcome. In sift through huge amounts of data stored SAS addition, there will be a review of lessons on disk, and are even less suitable for learned and typical challenges faced In this era of big data, organizations strive complex computations, such as machine working as the “Big Data Team” in a large to make the best use of information. High- learning algorithms. These limitations business. We will also briefly cover regula - performance data mining and big data leave the potential of extracting value of big tory hurdles framing Big Data solutions analytics drive better, faster decisions data unfulfilled. To address this challenge, (specific to financial institutions). based on vast amounts of data. During we are developing BDAS, an open source this tutorial, learn about the evolution of data analytics stack that provides interactive Track 2: Case Studies LL21E hardware platforms; predictive modeling response times for complex computations Big Data in Health methods, including statistical and machine on massive data. To achieve this goal, BDAS Juergen A. Klenk, PhD, Principal Scientist learning; and the benefits of segmentation . supports efficient, large-scale in-memory Exponent®, Inc. Through case studies and straightforward data processing, and allows users and examples, you’ll walk away with new ideas Erik Brynjolfsson of MIT’s Sloan School of applications to trade between query on how to implement these methods and Management found a remarkable correlation accuracy, time, and cost. In this talk, I’ll technologies. between an organization’s performance present the architecture, challenges, early and its data driven decision making (DDD) results, and our experience with developing 12:00pm – 1:00pm LL20BCD ability, thus demonstrating the actual BDAS. Some BDAS components have already Networking Lunch value of data. Fueled by findings like this been released: Mesos, a platform for Join your colleagues and meet someone and other promises, hospital systems, cluster resource management has been new to discuss how you are using big data health insurance providers, pharmaceutical deployed by Twitter on 6,000+ servers, while in your company. Seating is not assigned and biotech firms, and medical devices Spark, an in-memory cluster computing for this event. manufacturers alike now have high expec - frameworks, is already being used by tens tations to generate value from Big Data. of companies and research institutions. Indeed, we are nearing the “Peak of Inflated Expectations” in Gartner’s Hype Cycle, and organizations must quickly learn to develop and implement Big Data strategies. In this talk, we will review key components of a Big Data strategy, and demonstrate how it can be successfully implemented to benefit your organization, whether you are interested in improvement of the quality of your healthcare delivery, the safety and benefits of your drugs and devices, the effective management of your operations, or the control of cost. 12

Tracks 3: Big Data 101 LL21D Track 5: Software Tutorials LL21B 3:00pm – 3:50pm Tracks An Introduction to NOSQL Databases ProbabilityManagement.org- Track 1: Case Studies LL21F and their Analytic Uses The SIPmath™ Modeler Tools Smart Machine Learning to Lend to Aaron Cordova, CTO & Co-Founder Sam L. Savage, Executive Director Small Businesses Koverse ProbabilityManagement.org Pinar Donmez, Chief Data Scientist Kabbage, Inc. NOSQL technology has the potential to The Open SIPmath Standard from non- provide powerful data storage and analytic profit ProbabilityManagement.org allows The way traditional financial institutions capabilities, but the space is just emerging , uncertainties to be communicated as big underwrite loans and manage risk has been crowded with many approaches and data for driving interactive simulation in changing incredibly with the help of emerging difficult to navigate. This talk will expose native Excel and other environments with - technologies. Consumer/business loans analytics professionals to some of the out add-in software. We will demonstrate and other areas of credit scoring have been core principles that make NOSQL unique the SIPmath Modeler Tools, which facilitate disrupted by interdisciplinary technologies and dive into the specifics of several of the generation of such models in Excel, combining risk management with machine the more established efforts. Included and also how to import and export results learning and advanced data analytics. will be sample use cases and lessons from @RISK, Crystal Ball, Risk Solver and The talk will cover how we turn data into learned. Specifically this talk will cover: Matlab to leverage those packages. Visit decisions for underwriting and dynamic • An overview of NOSQL principals and the SIPmath page of SIPmath.org for risk management at Kabbage Inc. history videos and example files. • An overview of core analytic use cases Track 2: Case Studies LL21E for NOSQL Track 6: Software Tutorials LL20A Developing a Lean Data Science Strategy • An overview of leading NOSQL solutions Booz Allen Hamilton - Machine Joel Horwitz, Director of Products and and analytic use cases Learning on Streaming Data with Marketing, Alpine Data Labs • Ideas and tools for getting started with Storm and MOA When developing a data science project NOSQL technology. Paul Yacci, Data Scientist strategy, the scope, tools, and team are Booz Allen Hamilton crucial. In Lean Data Science projects, Track 4: Emerging Trends LL21C Analysis of streaming data enables real-time these considerations require special Watson in the Emerging Era of actions in applications such as network attention prior to kicking off the engage - Cognitive Computing defense and fraud detection. Storm pro - ment. We will present possible paths to Rob High Jr., IBM Corporation vides an open source distributed stream success, by focusing on topics including: In this talk I will describe the role of cogni - processing framework designed to scale • Setting “bite-sized” project goals tive computing in addressing some of the to the demands of big data. Storm can be • Using tools that provide a display of worlds most important business and social extended to include machine learning by process and information problems. I will discuss how Watson integrating the open source library MOA, • Utilizing a cross-functional team leverages human-readable information which provides machine learning algorithms It is important to map out the path to a at massive scale to meet those needs. In capable of online learning. This tutorial minimally viable product such that critical addition, I will provide an update on how will introduce concepts to build a scalable, points of failure are evaluated early when IBM has applied that capability to assist streaming machine learning platform developing a process. clinical decision support for Oncology, using open source software. and for engaging clients in Insurance and Financial Services institutions. 2:00pm – 3:00pm LL20 Foyer Poster Session with Dessert Break Enjoy dessert, visit the exhibits, and learn from poster presentations that showcase a full range of big data analytics, including many applications areas, industries and methodologies. As a guide for identifying which posters you want to visit, watch for the color-coded signs that indicate general topic areas. Prizes for the top 3 overall projects will be awarded at 2:50pm. The criteria are interesting use of big data analytics, effective display of results and business impact. See page 8 for the full list of posters. 13

Tracks 3: Big Data 101 LL21D enough and those just getting started Track 6: Software Tutorials LL20A What is Data Science, and Is a have a lot of trouble knowing where to Contact Signapore “Data Scientist” a Myth? begin. At GraphLab we’ve been working Data Analytics Career Opportunities Steve Mills, Senior Associate hard to make it easy for both developers in Singapore Booz Allen Hamilton and data scientists to build, deploy, and Kelly Tan, Area Director manage their data products. How many Contact Singapore Data Scientist was named the sexiest job times have you heard that things don’t of the 21st Century. Given the rapid increase Interested to find out more about Big scale or have had to completely rewrite in the term “Data Science” over the last Data trends and career opportunities in your applications just to deploy a proof year, many organizations are left with Singapore? Join us at the session by of concept? R, Mahout, sci-kit learn, figuring out who a data scientist is and Contact Singapore where we will cover Hadoop, relational, key-value, and graph what they do, leading many to think that it topics ranging from the latest Big Data databases; Pig, Hive, Flume, and array of is a “purple unicorn”. Many organizations developments in Singapore to career visualization and other tools…. Is there a believe they can simply turn an existing opportunities in data analytics and general better way? We think so. Whether you are analytics team into a “data science team” information on working and living in a data scientist trying to do more, or a or that they can hire a “data scientist”. In Singapore. Come find out more! developer or business owner trying to get addition, many organizations are evaluating started, this talk is for you. I’ll define data Big Data systems and are challenged with 4:00pm – 5:30pm LL20BCD products and how they are different from providing value from these deployments. Birds-of-a-Feather Discussion Groups traditional data analytics. I’ll give some Through client examples, this talk will Stop by for beer, soda, snacks and good examples of the cool applications you can discuss the terminology around OR and conversation around important issues. build, show some code and systems behind data science; discuss the next generation Topics will be selected by the attendees. the scenes, and show how data scientists of data science technology, Big Data, and We need your input! Look for the white can be more productive, developers can subsequent skills; showcase data science board near registration to suggest topics get started faster, and businesses can organizational design, deployment, and or vote for a topic already posted. This define new strategic opportunities. tips for building the team; and present case will be the most popular and highly-rated studies on how businesses and governments aspects of the conference–don’t miss it! Track 5: Software Tutorials LL21B used data science to gain competitive Provalis Research advantage and to better serve citizens. 5:30pm – 7:00pm Marriott San Jose Presentation of Provalis Research General Reception Ballroom Text Analytics Tools Track 4: Emerging Trends LL21C Stop by this informal reception where you Adam Bendriss Alami, Senior Marketing Data Products - Changing Lives, can connect with other attendees and Manager, Provalis Research Disrupting Markets, Hard to Build unwind from a jam packed day. Members Shawn Scully, Graph Lab/UC Berkeley Provalis Research will showcase its integrated of the Conference Committee will be your collection of text analytics software. QDA hosts. Enjoy a hearty display of heavy What the heck is a data product and why Miner is an easy to use qualitative and hors d’oeuvres and food stations, beer, should you care? Applications that drive mixed methods software that meets the wine, soft drinks and bottled water. online product recommendations, predict needs of researchers performing qualitative machine failures, forecast airfare, social data analysis and would like to code more match-make, identify fraud, and predict quickly and more consistently larger churn are all examples of data products. amounts of documents. It offers high level Under the hood, these products consume computer assistance for qualitative coding all that Big Data and apply machine learning with innovative text search tools that help and statistics magic to drive real-time users speed up the coding process as well actions. Data products are quickly as advanced statistical and visualization becoming the most valuable part of the tools. Users with even bigger text data, Big Data pipeline both because many of can also take advantage of WordStat. This them drive a company’s most important add-on module to QDA Miner can be used customer experiences and because these to analyze huge amounts of unstructured can directly affect the bottom line. If you information, quickly extract themes, find don’t have data products driving new trends over time, and automatically iden - business at your company, you should be tify patterns and references to specific asking why. In spite of this value, teams concepts using categorization dictionaries. aren’t developing these products fast 14 Tuesday June 24 viewing records, and heuristics for finding Track 10: Software Tutorials LL21C a near-optimal media buy plan. AMG has FICO been featured on the cover of the Analytics Tools for the Area of Big Data 7:00am – 3:00pm LL20 Foyer Times magazine as well as in Bloomberg, Benjamin Baer, Senior Director, Registration Politico, the Cook Political Report and Product Marketing elsewhere. 7:30am – 8:30am LL20BCD Get a hands-on look at how to use new Continental Breakfast innovations in FICO® Model Builder that Track 8: Big Data 101 LL21E detect predictive signals from massive Panel Discussion: Analytics Talent 9:30am – 3:00pm LL20 Foyer and unstructured data for extracting real, Moderator: Anne Robinson, Director, Exhibits Open actionable insight. Learn how to turn these Chain Strategy & Analytics, Verizon Wireless , predictions into optimized decisions with Panelists: Colin Kessinger, Theresa Kushner, 8:30am – 9:30am LL20A the new robust modeling and optimization Thomas Olavson , Aditya Rastogi Plenary Presentations capabilities of FICO® Xpress Optimization Moving from Big Data to Big Outcomes MIT claims that 67% of companies see Suite. Come see how Model Builder and on the Journey to ROI having analytics capabilities as a driver Xpress can help you make better decisions Michael Svilar, Managing Director for their competitive advantage. However, fueled by Big Data. Advanced Analytics-Accenture Analytics, according to TDWI, 46% of companies Accenture listed inadequate staffing or skills as the 10:30am – 11:00am LL20 Foyer top barrier for realizing value from their Refreshment Break with Exhibits In this Internet of Everything world, systems , big data and analytics investments. What devices and physical objects are “talking” does it take to have a successful big data 11:10am – 12:00pm Tracks to one another. There are upwards of a and analytics capability in an organization? Track 7: Case Studies LL21F trillion connected and instrumented things : How do you attract the quintessential data Real-world Big Data Applications cars, appliances, cameras, roadways, scientist? What are the executive sponsor Jane Uyvova, Teradata pipelines…even pharmaceuticals and and leadership qualities required to be livestock. The talk will focus on how Real-world Big Data Applications: Learn successful? Listen to a panel of industry organizations can drive positive business how customers are utilizing bid data in the expert’s answer these questions and more. outcomes in the connected world using enterprise to uncover valuable insights big data analytics. and operationalize them across large Track 9: Emerging Trends LL21D organizations and complex business The Role of the Big Data Platform in a 9:40am – 10:30am Tracks processes. Topics include data discovery Smart System Track 7: Case Studies LL21F strategies and applications in telecommu - Kaushik Kunal Das Optimizing Media Purchasing Through nications, financial services, life sciences Senior Principal Data Scientist Big Data and manufacturing industries. Use cases Pivotal, Inc. Alan Papir, Software Engineer address solutions for digital marketing, Analytics Media Group (AMG) There is a proliferation of data in today’s fraud loss prevention, customer churn world in terms of quantity, type and prediction, sales force enablement and more. Analytics Media Group (AMG) is using sampling speed. This is matched by the lessons learned from the 2012 Obama growth in tools to store and extract value presidential campaign to bring new, from this data. But what is missing is an data-driven insights into the world of media effective way to bring these together to buying. Using various modeling and data form a smart system. I am going to define mining techniques in conjunction with a smart system and describe how a big large and rich datasets (such as billions of data platform which is based on open set top box records), AMG discovers who standards and compatible with many is most likely to “convert” to a product or analytical and data processing tools is candidate at the person-level. AMG then essential for building such a system. I am takes these desirable targets and uses a also going to describe a few specific use trove of set top box data to produce a near- cases of such systems and the technology , optimal solution to problem of purchasing methodology and algorithms needed to the most valuable placements given a create them. limited budget (a multi-objective variation of the knapsack problem). This presentation will cover some of AMG’s techniques for identifying targets, strategies for efficiently storing and retrieving tens of billions of TV 15

Track 8: Big Data 101 LL21E Track 10: Software Tutorials LL21C Specifically, the discussion will include: Python, R, and SQL in MPP Databases Frontline Systems, Inc. • Background of the aviation industry. Anton J. Mobley Analytic Solver Platform: Integrated • Characteristics of machine diagnostic Kaiser Permanente Data Mining, Simulation and problem. Optimization in Microsoft Excel • Data elements. Python, R, and SQL are some of a data Daniel H. Fylstra, Frontline Systems, Inc. • Algorithms. scientist’s favorite tools for exploring data . • Results of analytics. Massively Parallel Processing (MPP) data - Analytic Solver Platform in Microsoft This problem is analogous to a large class bases provide one way to leverage these Excel has everything you need for data of machine monitoring and control problems , tools at scale. This talk introduces MPP visualization, forecasting and data mining, hence, the solution approach discussed databases and shows how to combine them Monte Carlo simulation and risk analysis, here is applicable to various industrial with analytic tools to start developing models. and conventional and stochastic optimiza - sectors. Topics include: tion – where its solving power actually • A brief introduction to how MPP data - surpasses “enterprise” analytic software Track 8: Big Data 101 LL21E bases work from the data scientist’s costing far more. See how you can use it Big Data Solutions on AWS Platform perspective to build your own analytic expertise and Vikram Garlapati, Manager Solutions • Which type of data science problems teach others, leveraging what you already Architect, Web Services MPP databases are well suited to solve know, build and solve industrial-scale and why they are used even when HDFS Scientists, developers, and many other models with the world’s best Solvers, and is likely cheaper technologists from many different industries effectively communicate business results. • How to use Python and R natively in are taking advantage of Amazon Web MPP databases Services to meet the challenges of the 12:00pm – 1:00pm LL20BCD • Sample problems that leverage Python, increasing volume, variety, and velocity of Lunch and Roundtable Discussion R, and SQL in MPP digital information. Join your colleagues and meet someone offers an end-to-end portfolio of cloud new to discuss what is keeping you up at Track 9: Emerging Trends LL21D computing resources to help you manage night, as a big data expert or data scientist , Big Data and Big Analytics – big data by reducing costs, gaining a and where do you see big data analytics So Much more Gunpowder! competitive advantage, and increasing going in the next 3 years. Seating is not Paul Kent, Vice President the speed of innovation. It feels like every - assigned for this event. SAS thing generates data today, from your customers on social networks to the in - Mathematical Computing is changing for 1:10pm – 2:00pm Tracks stances running your web applications. the better — Modern Analytic Platforms Track 7: Case Studies LL21F AWS makes it easy to provision the storage , such as Hadoop are embracing the Mas - Big Data Analytics Application to computation, and database services you sively Parallel Cluster. Spread your data Jet Engine Diagnostics need to turn that data into information for out over the nodes, and find ways to send Link C. Jaw, Fellow, Intel Corporation your business. AWS also has data transfer the work to the data (instead of the other The “explosion of information” that we services which can move big data into and way around), and you’ll soon be enjoying have witnessed is not just contributed by out of the cloud quickly such as AWS Direct some awesome advances in computing a large number of data sources, but also by Connect and our Import/Export service. firepower. This talk details the transition the large amount of data originating from In this presentation we will talk about to this new style of computing and gives these sources. Together they have created • AWS Direct Connect and our examples of software techniques that were the so-called “big data” environment Import/Export service un-reachable before but are now practical. characterized by variety, volume, and • Redshift - Petabyte-scale data Interactivity and visual data exploration velocity. The challenge of information warehousing in minutes. are available across all your data and now explosion is how to extract the right - • Amazon Elastic MapReduce (EMR) you can access sophisticated programmatic mation at the right time from the big data provides - Easy analytics for everyone model development that was previously environment which we live in. Extracting • Amazon Kinesis - streaming data at computationally intractable! The talk should the right information from the data is an massive scale challenge you to consider the status quo at analytic process, and this process uses your organization – are you doing enough data platforms and analytic algorithms to adapt to the new wave of computing? to spot trends and patterns so as to derive predictive indicators. These indicators are then used to make proper recommendations or to take timely actions. In this case study , an aircraft engine diagnostic problem is described and a solution is discussed. 16

Track 9: Emerging Trends LL21D 2:00pm – 3:00pm LL20 Foyer Track 8: Big Data 101 LL21E Can Machine Learning Really Do That? Poster Session with Dessert Break Babies, Brains, and Buses... and Why Anthony Goldbloom, Founder and CEO Enjoy dessert, visit the exhibits, and learn Stream Computing is the Right Kaggle from poster presentations that showcase Approach to Real-time Predictions and a full range of big data analytics, including Decision Making Big data is one of today’s hot technology many applications areas, industries and Kevin Foster, Big Data Solutions Architect trends. We’re told that our ability to store methodologies. As a guide for identifying IBM petabytes of data and the profusion of which posters you want to visit, watch for sensors will transform modern business. Babies, brains, and buses. What do they the color-coded signs that indicate general One element of the big data story that’s have in common? Each of these have Prizes for the top 3 overall projects will be often overlooked is the development of been the subject for stream computing awarded at 2:50pm. The criteria are inter - machine learning techniques that allow projects that monitor data in real-time esting use of big data analytics, effective us to do more with the data that we’re with decisions made quicker and in higher display of results and business impact. storing. This talk will introduce data mining volume than is possible with any disk or See page 8 for the full list of posters. competitions as a way to solve challenging even memory based store-then-query problems. In doing this, it will also cover architectural approach. Real-time decision 3:00pm – 3:50pm Tracks some of what Kaggle is seeing at the cut - making can often require sub-second Track 7: Case Studies LL21F ting edge of machine learning, including results, but can also just be within seconds A Case Study on Grid Data Analytics some of the applications that are now or even minutes based on the needs of Marina Thottan, Director possible with modern techniques. the project. In this talk we’ll discuss why Bell Laboratories, Alcatel-Lucent real-time is better approached from the Track 10: Software Tutorials LL21C Smart grid evolution involves tremendous perspective of just-in-time, and discuss Enrich Consulting growth in sensor deployment in both the examples of use cases across industries Finding the Strategy Hidden in Your distribution and the transmission grid, including their algorithms, data volumes, New Product Portfolio with the Enrich and an exponential growth in the availabil - latency requirements, project methodology , Analytics Platform ity of both structured and unstructured and implementation architectures that Dan Smith, VP of Consulting data collected from sensors and other ex - include analytics with R, MATLAB, SPSS, Richard Sonnenblick, CEO ternal sources such as demographic and and other technologies. Enrich Consulting environmental data. To realize the return on investment in smart grid technology it Track 9: Emerging Trends LL21D The Enrich Analytics Platform helps R&D is important to maximize the benefit of Simon Zhang, Senior Director, Business companies value, communicate, and refine the substantial amount of data flowing Analytics, LinkedIn tens of billions of dollars in R&D investments into utility control centers. The wealth of annually. Come see for yourself how easy Please visit the conference website for data obtained can be leveraged to extend it can be to build new product forecasts additional information. visibility into the grid from the traditional that include critical uncertainties, create boundaries of the Supervisory Control and compare what-if scenarios, and judge and Data Acquisition (SCADA) network the strategic value of a portfolio of invest - (i.e., substations and transformers) all the ments, using the Enrich Analytics Platform . way down to the end user customer prem - We’ll also introduce a free set of tools ises. Utilities have to develop a data ana - available on our web site to build visualiza - lytics strategy for timely and precise tions specially developed to communicate analysis of the grid data to define novel the trade-offs between portfolio value business and operational applications and portfolio balance. that will greatly enhance grid operations. In this talk we will describe the design of an online grid data analytics system along with lessons learned. 17

November 9-12, 2014 Hilton San Francisco & Parc 55 Wyndham San Francisco, California Registration Now Open

Thanks to our Sponsors:

SAN FRANCISCO 2014 INFORMS ANNUAL MEETING BRIDGING DATA AND DECISIONS

INFORMSFORRRMS CONFERENCECONFFFER ON USINESSUSSINNNESS ANALYTICSANNALN LLYTICSYTTICS & PERPERATIONSRAATIONSTTIONS RESEARCHREESEEEARCHH 18

Sunday, June 22

12:00pm - 7:00pm Registration - LL20 Foyer

1:45pm - 5:00pm Technology Workshops

5:30pm - 7:00pm Welcome Reception and Exhibits - LL20 Foyer

Monday, June 23

7:00am - 5:00pm Registration - LL20 Foyer

7:30am - 8:30am Breakfast - LL20BCD

9:30am - 4:00pm Exhibits Open - LL20 Foyer

8:30am - 9:30am Plenary - Bill Franks, Teradata - LL20A

TRACKS T1 T2 T3 T4 T5 T6 CASE STUDIES CASE STUDIES BIG DATA 101 EMERGING TRENDS SOFTWARE TUTORIALS SOFTWARE TUTORIALS

ROOMS LL21F LL21E LL21D LL21C LL21B LL20A

9:40am - 10:30am Optimal Fusion for Big Data in Practice - Getting Started with Big Data Combining Large INFORMS Predictive Road-way Scientific Information Hadoop and Big Data and Semantically Disparate INFORMS Analytics Traffic Speeds as a Business Asset Brian Keller Datasets to Come to Maturity Model Toby Tennent John Erik Koch Booz Allen Hamilton Meaningful Conclusions Preview the Beta Version HERE Merck & Co., Inc. Chris Hallenbeck SAP

10:30am - 11:00am Refreshment Break with Exhibits - LL20 Foyer

11:10am-12:00pm Big Data in Improving Efficiency of Hadoop Beyond Java: Taming Big Data with Forio SAS Official Statistics Google’s Infrastructure Pig, Hive & Other Friends Berkeley Data Publishing Interactive High-performance Data Cavan Capps Using Big Data Tools Diego Klabjan Analytics Stack (BDAS) Data Analytics on Mining and Big Data U.S. Census Bureau Behdad Masih Northwestern University Ion Stoica The Web Analytics Google UC Berkeley/ Databricks/Conviva

12:00pm-1:00pm Lunch and Roundtable Discussions - LL20 BCD

1:10pm-2:00pm Conception to Deployment Big Data in Health An Introduction to Watson in the ProbabilityManagement.org Booz Allen Hamilton of Big Data Projects at a Juergen A. Klenk NOSQL databases and Emerging Era of The SIPmath™ Machine Learning on Large Financial Institute Exponent®, Inc. their Analytic Uses Cognitive Computing Modeler Tools Streaming Data with Mike Aguiling Aaron Cordova Rob High Jr. Storm and MOA JP Morgan Koverse, Inc. IBM Corporation Intelligent Solutions

2:00pm-3:00pm Poster Session - Dessert Break with Exhibits and Posters - LL20 Foyer

3:00pm-3:50pm Smart Machine Learning Developing a Lean Data What is Data Science, and is Data Products - Changing Provalis Research Contact Singapore to Lend to Science Strategy a “Data Scientist” a Myth? Lives, Disrupting Markets, Presentation of Provalis Data Analytics Career Small Businesses Joel Horwitz Steven Mills Hard to Build Research Text Opportunities Pinar Donmez Alpine Data Labs Booz Allen Hamilton Shawn Scully Analytics Tools in Singapore Kabbage Inc. GraphLab

4:00pm - 5:30pm Birds-of-a-Feather Discussion Groups - LL20BCD

5:30pm - 7:00pm General Reception in Marriott Hotel - San Jose Ballroom 19

Tuesday, June 24

7:30am - 3:00pm Registration - LL20 Foyer

7:30am - 8:30am Breakfast - LL20BCD

9:30am - 3:00pm Exhibits Open - LL20 Foyer

8:30am - 9:30am Plenary - Michael Svilar, Accenture - LL20A

TRACKS T7 T8 T9 T10 CASE STUDIES BIG DATA 101 EMERGING TRENDS SOFTWARE TUTORIALS

ROOMS LL21F LL21E LL21D LL21C

9:40am - 10:30am Optimizing Media Purchasing Panel Discussion: The Role of the Big Data FICO Through Big Data Analytics Talent Platform in a Smart System AnalyticsTools for Alan Papir Anne Robinson Kaushik Kunal Das the Area of Big Data

Analytics Media Group (AMG) Verizon Wireless Pivotal, Inc.

10:30am - 11:00am Refreshment Break with Exhibits - LL20 Foyer

11:10am-12:00pm Real-world Big Data Python, R, and SQL in Big Data and Big Analytics – Frontline Systems, Inc. Applications MPP Databases So Much more Gunpowder! Analytic Solver Platform: Jane Uyvova Anton J. Mobley Paul Kent Integrated Data Mining, Simulation Teradata Kaiser Permanente SAS and Optimization

12:00pm-1:00pm Lunch and Roundtable Discussions - LL20BCD

1:10pm-2:00pm Big Data Analytics Application Big Data Solutions on Can Machine Learning Enrich Consulting to Jet Engine Diagnostics AWS Platform Really Do That? Finding the Strategy Hidden in Link Jaw Vikram Garlapati Anthony Goldbloom Your New Product Portfolio Intel Corporation Amazon Web Services Kaggle with the Enrich Analytics Platform

2:00pm-3:00pm Poster Session - Dessert Break with Exhibits and Posters - LL20 Foyer

3:00pm-3:50pm A Case Study on Babies, Brains and Buses... Simon Zhang Grid Data Analytics and Why Stream Computing is LinkedIn Marina Thottan the Right Approach to Real-Time Bell Laboratories Predictions and Decision Making Kevin Foster IBM 20 San Jose Convention Center Lower Level

Registration, Exhibits Poster Session, Breaks Welcome Reception

LL20A T6 San Jose Plenary LL21B T5 Marriott LL20B LL21C T4

LL21D T3 LL20C

LL21E T2 LL20D LL21F T1

Networking Lunches & Birds of a Feather