AI and Big Data on IBM Power Systems Servers

Total Page:16

File Type:pdf, Size:1020Kb

AI and Big Data on IBM Power Systems Servers Front cover AI and Big Data on IBM Power Systems Servers Ivaylo B. Bozhinov Anto Ajay Raj John Rafael Freitas de Lima Ahmed (Mash) Mashhour James Van Oosten Fernando Vermelho Allison White Redbooks IBM Redbooks AI and Big Data on IBM Power Systems Servers March 2019 SG24-8435-00 Note: Before using this information and the product it supports, read the information in “Notices” on page xv. First Edition (March 2019) This edition applies to IBM Power Systems servers that are based on the POWER9 and POWER8 processor-based technology that is running the software stack that is described in “Software levels” on page 120. © Copyright International Business Machines Corporation 2019. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Figures . vii Tables . xi Examples . xiii Notices . .xv Trademarks . xvi Preface . xvii Authors. xviii Now you can become a published author, too! . xix Comments welcome. xix Stay connected to IBM Redbooks . .xx Chapter 1. Solution overview. 1 1.1 Introduction . 2 1.1.1 Types of AI . 2 1.1.2 What is a data lake . 3 1.2 Artificial intelligence solutions . 4 1.2.1 IBM PowerAI . 4 1.2.2 IBM Watson Machine Learning Accelerator . 5 1.2.3 IBM Watson Studio Local . 7 1.2.4 H2O Driverless AI . 9 1.2.5 IBM PowerAI Vision . 10 1.2.6 Key differences among the popular AI solutions. 11 1.3 Data platforms. 11 1.3.1 Apache Hadoop and Hortonworks . 11 1.3.2 IBM Spectrum Scale . 13 1.3.3 IBM Elastic Storage Server. 15 1.3.4 IBM Spectrum Scale HDFS Transparency Connector . 17 1.4 When to use a GPU . 19 1.5 Hortonworks Data Platform GPU support . 22 1.5.1 Native GPU support . 22 1.5.2 GPU discovery . 23 1.5.3 GPU isolation and monitoring . 23 1.5.4 GPU scheduling . 24 1.5.5 Hortonworks Data Platform 3 and YARN container with GPU support. 24 1.6 Linux on Power . 24 1.7 Client use cases . 25 1.7.1 Asian job-hunting services company. 25 1.7.2 Large bank . 26 1.7.3 South American IT services provider . 26 1.7.4 Governmental agency. 26 1.7.5 European IT services company. 26 Chapter 2. Integration overview. 29 2.1 Architecture overview . 30 2.1.1 Infrastructure stack . 31 © Copyright IBM Corp. 2019. All rights reserved. iii 2.2 System configurations. 33 2.2.1 IBM Watson Machine Learning Accelerator configuration . 33 2.2.2 IBM Watson Studio Local configurations . 35 2.2.3 Configuring an HDP system . 38 2.2.4 Configuring a proof of concept . 38 2.2.5 Conclusion . 39 2.3 Deployment options . 39 2.3.1 Deploying IBM Watson Studio Local in stand-alone mode or with IBM Watson Machine Learning Accelerator . 39 2.3.2 Using the Hadoop Integration service versus using an Apache Livy connector . 40 2.3.3 Deploying H2O Driverless AI in stand-alone mode or within IBM Watson Machine Learning Accelerator. 40 2.3.4 Running Spark jobs. 43 2.4 IBM Watson Machine Learning Accelerator and Hortonworks Data Platform. 43 2.5 IBM Watson Studio Local with Hortonworks Data Platform . 44 2.6 IBM Watson Studio Local with IBM Watson Machine Learning Accelerator . 46 2.7 IBM Spectrum Scale and Hadoop Integration. 47 2.7.1 Information Lifecycle Management . 48 2.8 Security . 50 2.8.1 Datalake security . 51 2.8.2 IBM Watson Machine Learning Accelerator security with Hadoop . 53 2.8.3 IBM Watson Studio Local security . 53 2.8.4 IBM Spectrum Scale security . 55 Chapter 3. Integrating new data. 57 3.1 Data ingestion overview . 58 3.2 Types of data ingestion . 59 3.3 Options for data ingestion . 61 3.4 Using data connectors to work with external data sources . 63 3.5 How integration improves the artificial intelligence models. 64 Chapter 4. Integration details. 65 4.1 Integrating IBM Watson Machine Learning Accelerator with Hortonworks . ..
Recommended publications
  • Java Linksammlung
    JAVA LINKSAMMLUNG LerneProgrammieren.de - 2020 Java einfach lernen (klicke hier) JAVA LINKSAMMLUNG INHALTSVERZEICHNIS Build ........................................................................................................................................................... 4 Caching ....................................................................................................................................................... 4 CLI ............................................................................................................................................................... 4 Cluster-Verwaltung .................................................................................................................................... 5 Code-Analyse ............................................................................................................................................. 5 Code-Generators ........................................................................................................................................ 5 Compiler ..................................................................................................................................................... 6 Konfiguration ............................................................................................................................................. 6 CSV ............................................................................................................................................................. 6 Daten-Strukturen
    [Show full text]
  • Artificial Intelligence in Health Care: the Hope, the Hype, the Promise, the Peril
    Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril Michael Matheny, Sonoo Thadaney Israni, Mahnoor Ahmed, and Danielle Whicher, Editors WASHINGTON, DC NAM.EDU PREPUBLICATION COPY - Uncorrected Proofs NATIONAL ACADEMY OF MEDICINE • 500 Fifth Street, NW • WASHINGTON, DC 20001 NOTICE: This publication has undergone peer review according to procedures established by the National Academy of Medicine (NAM). Publication by the NAM worthy of public attention, but does not constitute endorsement of conclusions and recommendationssignifies that it is the by productthe NAM. of The a carefully views presented considered in processthis publication and is a contributionare those of individual contributors and do not represent formal consensus positions of the authors’ organizations; the NAM; or the National Academies of Sciences, Engineering, and Medicine. Library of Congress Cataloging-in-Publication Data to Come Copyright 2019 by the National Academy of Sciences. All rights reserved. Printed in the United States of America. Suggested citation: Matheny, M., S. Thadaney Israni, M. Ahmed, and D. Whicher, Editors. 2019. Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. NAM Special Publication. Washington, DC: National Academy of Medicine. PREPUBLICATION COPY - Uncorrected Proofs “Knowing is not enough; we must apply. Willing is not enough; we must do.” --GOETHE PREPUBLICATION COPY - Uncorrected Proofs ABOUT THE NATIONAL ACADEMY OF MEDICINE The National Academy of Medicine is one of three Academies constituting the Nation- al Academies of Sciences, Engineering, and Medicine (the National Academies). The Na- tional Academies provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions.
    [Show full text]
  • IBM Power Systems Solution Edition for Scale-Out Cloud
    IBM Systems and Technology Solution Brief IBM Power Systems and Storage Solution Edition for Scale-Out Cloud Open source, Linux solution for scale-out data virtualization and cloud Cloud computing is no longer a question of “if” for IT organizations, but Highlights rather one of when, how and for which workloads. Cloud is widely understood to be an IT delivery model that can improve IT asset utilization, Allows open infrastructures to scale out intelligently, with less hardware, flexibility and responsiveness while reducing complexity and lowering power and cooling requirements costs. With these benefits come many complexities which need to be and better economics, using over considered as organizations define and implement a cloud delivery strategy. twice the bandwidth from previous Some technology options can hinder the efficiency and costs saving generations potential of the cloud by impeding interoperability, hampering workload Built-in data virtualization delivers performance, exposing security vulnerabilities and limiting scalability. seamless storage management from a single control point with no Building on the performance advantaged IBM POWER8™ architecture, the impact to applications Power Systems™ and Storage Solution Edition for Scale-Out Cloud Flexibility, agility and provides a superior cost effective platform with open source PowerKVM interoperability with open source, hypervisor, powerful data virtualized with IBM Storwize ® V7000, and community-driven virtualization and a single pane of glass for OpenStack-based cloud management for a single pane of glass cloud heterogeneous cloud management management. The Solution Edition reduces the timeframe for infrastructure deployments from months to days with integrated infrastructure and automated provisioning of virtualized resources. It ensures optimal cost effectiveness with Power scale out systems, automated Easy Tier® for flash and disk storage and Real-Time Compression™ that can store up to 5x as much data in the same physical space.
    [Show full text]
  • Integration of System for Research in Health with Computation in Cloud Using Artificial Intelligence
    MOJ Proteomics & Bioinformatics Research Article Open Access Integration of system for research in health with computation in cloud using artificial intelligence Abstract Volume 7 Issue 4 - 2018 Currently the use of information technology is often applied in the health area. With regard 1 to scientific research, we have the SINPE© Integrated System of Electronic Protocols. Carlos Henrique Kuretzki, José Simão de 2 3 The purpose of this tool is to support the researcher in this area who until now did not Paula Pinto, José Claudio Vianna have statistical tests. The main objective of this work is to provide SINPE© users with the 1Professor of the Program for Analysis and Development of improvement in health data analysis through the use of cognitive computing technologies. Systems and Computer Engineering, Federal University, Brazil 2Professor of the Information Management Program, Federal Keywords: artificial intelligence, cloud computing, information systems University, Brazil 3Professor of the Computer Engineering, Universidade Positivo, Brazil Correspondence: Carlos Henrique Kuretzki, Professor of the Program for Analysis and Development of Systems and Computer Engineering, Federal University, Brazil, Email [email protected] Received: June 29, 2018 | Published: August 08, 2018 Introduction more quickly.4 IBM Watson is a cognitive computing technology and have been used to investigate the biological and health sciences. The health sector is one of the largest in most countries and can It is based on medical literature, patents, genetic, and chemical and significantly benefit from high‒quality, real‒time and location‒ pharmacological data.5 The IBM’s Watson platform combines some independent data. However, many healthcare professionals are capabilities, building a complete cognitive system, which is being not familiar with information technology solutions, business and called the new era of computing.
    [Show full text]
  • IBM Power® Systems for SAS® Empowers Advanced Analytics Harry Seifert, Laurent Montaron, IBM Corporation
    Paper 4695-2020 IBM Power® Systems for SAS® Empowers Advanced Analytics Harry Seifert, Laurent Montaron, IBM Corporation ABSTRACT For over 40+ years of partnership between IBM and SAS®, clients have been benefiting from the added value brought by IBM’s infrastructure platforms to deploy SAS analytics, and now SAS Viya’s evolution of modern analytics. IBM Power® Systems and IBM Storage empower SAS environments with infrastructure that does not make tradeoffs among performance, cost, and reliability. The unified solution stack, comprising server, storage, and services, reduces the compute time, controls costs, and maximizes resilience of SAS environment with ultra-high bandwidth and highest availability. INTRODUCTION We will explore how to deploy SAS on IBM Power Systems platforms and unleash the full potential of the infrastructure, to reduce deployment risk, maximize flexibility and accelerate insights. We will start by reviewing IBM and SAS’s technology relationship and the current state of SAS products on IBM Power Systems. Then we will look at some of the infrastructure options to deploy SAS 9.4 on IBM Power Systems and IBM Storage, while maximizing resiliency & throughput by leveraging best practices. Next, we will look at SAS Viya, which introduces changes to the underlying infrastructure requirements while remaining able to be deployed alongside a traditional SAS 9.4 operation. We’ll explore the various deployment modes available. Finally, we’ll look at tuning practices and reference materials available for a deeper dive in deploying SAS on IBM platforms. SAS: 40 YEARS OF PARTNERSHIP WITH IBM IBM and SAS have been partners since the founding of SAS.
    [Show full text]
  • POWER® Processor-Based Systems
    IBM® Power® Systems RAS Introduction to IBM® Power® Reliability, Availability, and Serviceability for POWER9® processor-based systems using IBM PowerVM™ With Updates covering the latest 4+ Socket Power10 processor-based systems IBM Systems Group Daniel Henderson, Irving Baysah Trademarks, Copyrights, Notices and Acknowledgements Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Active AIX® POWER® POWER Power Power Systems Memory™ Hypervisor™ Systems™ Software™ Power® POWER POWER7 POWER8™ POWER® PowerLinux™ 7® +™ POWER® PowerHA® POWER6 ® PowerVM System System PowerVC™ POWER Power Architecture™ ® x® z® Hypervisor™ Additional Trademarks may be identified in the body of this document. Other company, product, or service names may be trademarks or service marks of others. Notices The last page of this document contains copyright information, important notices, and other information. Acknowledgements While this whitepaper has two principal authors/editors it is the culmination of the work of a number of different subject matter experts within IBM who contributed ideas, detailed technical information, and the occasional photograph and section of description.
    [Show full text]
  • IBM Power Systems Solution for Mariadb
    IBM Power Systems solution for MariaDB Performance overview of MariaDB Enterprise on Linux on Power featuring the new IBM POWER8 technology Axel Schwenke MariaDB Corporation Hari Reddy IBM Systems and Technology Group ISV Enablement Basu Vaidyanathan IBM Systems and Technology Group Performance Analysis October 2014 © Copyright IBM Corporation, 2014 Table of contents Abstract........................................................................................................................................1 Introduction .................................................................................................................................1 Advantages of MariaDB on Power Systems.............................................................................1 MariaDB architecture ..................................................................................................................2 Sysbench OLTP benchmark.................................................................................................................... 3 MariaDB performance.............................................................................................................................. 3 Relative performance of IBM Power S822L and IBM System x3650 M4 ................................................ 3 Power Systems built with the POWER8 technology................................................................6 Tested configuration details ......................................................................................................7
    [Show full text]
  • IBM Power System E850 the Most Agile 4-Socket System in the Marketplace, Optimized for Performance, Reliability and Expansion
    IBM Systems Data Sheet IBM Power System E850 The most agile 4-socket system in the marketplace, optimized for performance, reliability and expansion Businesses today are demanding faster insights that analyze more data in Highlights new ways. They need to implement applications in days versus months, and they need to achieve all these goals while reducing IT costs. This is ●● ●●Designed for data and analytics, delivers creating new demands on IT infrastructures, requiring new levels of per- secure, reliable performance in a compact, 4-socket system formance and the flexibility to respond to new business opportunities, all at an affordable price. ●● ●●Can flexibly scale to rapidly respond to changing business needs The IBM® Power® System E850 server offers a unique blend of ●● ●●Can reduce IT costs through application enterprise-class capabilities in a space-efficient, 4-socket system with consolidation, higher availability and excellent price performance. With up to 48 IBM POWER8™ processor virtualization to yield over 70 percent utilization cores, advanced IBM PowerVM® virtualization that can yield over 70 percent system utilization and Capacity on Demand (CoD), no other 4-socket system in the industry delivers this combination of performance, efficiency and business agility. These capabilities make the Power E850 server an ideal platform for medium-size businesses and as a departmental server or data center building block for large enterprises. Designed for the demands of big data and analytics Businesses are amassing a wealth of data and IBM Power Systems™, built with innovation to support today’s data demands, can store it, secure it and, most important, extract actionable insight from it.
    [Show full text]
  • Data Transfer Options and Requirements
    Data Transfer Options and Requirements Original date 25 Jan 2005 Revision date 21 Aug 2018 Proprietary and Confidential Data Transfer Options and Requirements Summary of changes Date Short description of changes September 2016 Updated name of submission tool to SDSS. August 2018 Rebranded. Contacting support Support is available through the following link: https://www.ibm.com/software/support/watsonhealth/truven_support.html © Copyright IBM Corporation 2018 IBM Confidential 2 Data Transfer Options and Requirements Contents Summary of changes .................................................................................................................................... 2 Contacting support ........................................................................................................................... 2 Contents ........................................................................................................................................................ 3 Overview ....................................................................................................................................................... 5 Electronic data transfer methods ..................................................................................................... 5 Data suppliers ..................................................................................................................... 5 Data recipients ...................................................................................................................
    [Show full text]
  • Treatment and Differential Diagnosis Insights for the Physician's
    Treatment and differential diagnosis insights for the physician’s consideration in the moments that matter most The role of medical imaging in global health systems is literally fundamental. Like labs, medical images are used at one point or another in almost every high cost, high value episode of care. Echocardiograms, CT scans, mammograms, and x-rays, for example, “atlas” the body and help chart a course forward for a patient’s care team. Imaging precision has improved as a result of technological advancements and breakthroughs in related medical research. Those advancements also bring with them exponential growth in medical imaging data. The capabilities referenced throughout this document are in the research and development phase and are not available for any use, commercial or non-commercial. Any statements and claims related to the capabilities referenced are aspirational only. There were roughly 800 million multi-slice exams performed in the United States in 2015 alone. Those studies generated approximately 60 billion medical images. At those volumes, each of the roughly 31,000 radiologists in the U.S. would have to view an image every two seconds of every working day for an entire year in order to extract potentially life-saving information from a handful of images hidden in a sea of data. 31K 800MM 60B radiologists exams medical images What’s worse, medical images remain largely disconnected from the rest of the relevant data (lab results, patient-similar cases, medical research) inside medical records (and beyond them), making it difficult for physicians to place medical imaging in the context of patient histories that may unlock clues to previously unconsidered treatment pathways.
    [Show full text]
  • Splitting the Load How Separating Compute from Storage Can Transform the Flexibility, Scalability and Maintainability of Big Data Analytics Platforms
    IBM Analytics Engine White paper Splitting the load How separating compute from storage can transform the flexibility, scalability and maintainability of big data analytics platforms 2 Splitting the load Contents Executive summary 2 Executive summary Hadoop is the dominant big data processing system in use today. It is, however, a technology that has been around for 10 3 Challenges of Hadoop design years, and the world of big data has changed dramatically over 4 Limitations of traditional Hadoop clusters that time. 5 How cloud has changed the game 5 Introducing IBM Analytics Engine Hadoop started with a specific focus – a bunch of engineers 6 Overcoming the limitations wanted a way to store and analyze copious amounts of web logs. They knew how to write Java and how to set 7 Exploring the IBM Analytics Engine architecture up infrastructure, and they were hands-on with systems 8 Use cases for IBM Analytics Engine programming. All they really needed was a cost-effective file 10 Benefits of migrating to IBM Analytics Engine system (HDFS) and an execution paradigm (MapReduce)—the 11 Conclusion rest, they could code for themselves. Companies like Google, 11 About the author Facebook and Yahoo built many products and business models using just these two pieces of technology. 11 For more information Today, however, we’re seeing a big shift in the way big data applications are being programmed and deployed in production. Many different user personas, from data scientists and data engineers to business analysts and app developers need access to data. Each of these personas needs to access the data through a different tool and on a different schedule.
    [Show full text]
  • Thesis Artificial Intelligence Method Call Argument Completion Using
    Method Call Argument Completion using Deep Neural Regression Terry van Walen [email protected] August 24, 2018, 40 pages Academic supervisors: dr. C.U. Grelck & dr. M.W. van Someren Host organisation: Info Support B.V., http://infosupport.com Host supervisor: W. Meints Universiteit van Amsterdam Faculteit der Natuurwetenschappen, Wiskunde en Informatica Master Software Engineering http://www.software-engineering-amsterdam.nl Abstract Code completion is extensively used in IDE's. While there has been extensive research into the field of code completion, we identify an unexplored gap. In this thesis we investigate the automatic rec- ommendation of a basic variable to an argument of a method call. We define the set of candidates to recommend as all visible type-compatible variables. To determine which candidate should be recom- mended, we first investigate how code prior to a method call argument can influence a completion. We then identify 45 code features and train a deep neural network to determine how these code features influence the candidate`s likelihood of being the correct argument. After sorting the candidates based on this likelihood value, we recommend the most likely candidate. We compare our approach to the state-of-the-art, a rule-based algorithm implemented in the Parc tool created by Asaduzzaman et al. [ARMS15]. The comparison shows that we outperform Parc, in the percentage of correct recommendations, in 88.7% of tested open source projects. On average our approach recommends 84.9% of arguments correctly while Parc recommends 81.3% correctly. i ii Contents Abstract i 1 Introduction 1 1.1 Previous work........................................
    [Show full text]