
Collaborative Workflow-Driven Science in a Rapidly Evolving Cyberinfrastructure Ecosystem
2018 Symposium of the Center for Network and Storage Enabled Collaborative Computational Science – October 16, 2018

İlkay ALTINTAŞ, Ph.D.
UC San Diego
Chief Data Science Officer & Division Director of Cyberinfrastructure Research, Education and Development, San Diego Supercomputer Center
Fellow and Chair of Cyberinfrastructure, Halıcıoğlu Data Science Institute
Director, Workflows for Data Science Center of Excellence

İlkay ALTINTAŞ, Ph.D. [email protected]

A little about me…

SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego
Providing Cyberinfrastructure for Research and Education
• Established as a national supercomputer resource center in 1985 by NSF
• A world leader in HPC, data-intensive computing, and scientific data management
• Current strategic focus on “Big Data”, “versatile computing”, and “life sciences applications”
Recent Innovative Architectures
• Gordon: First Flash-based Supercomputer for Data-intensive Apps
• Comet: Serving the Long Tail of Science

Halıcıoğlu Data Science Institute
https://datascience.ucsd.edu
An academic unit that provides a home for curating the growth in Data Sciences as a discipline
• Deep expertise organized into clusters
• Manage new engagements to seed, cultivate and grow the practice of Data Sciences

Data Science Hub at SDSC -- Expertise, Systems and Training for Data Science Research and Applications -- http://datascience.sdsc.edu
• Serving as a community hub for collective and lasting innovation
• Leading innovative solutions and applications
• Creating top of the line computing and data platforms
• Educating and establishing a modern data science workforce
• Connecting research initiatives with industry and entrepreneurial ventures
[Figure: DSH diagram labeled Industry, Training, Analytics, Applications, Big Data Platforms, and SDSC Expertise and Strengths]
SDSC Data Science Hub (DSH) since 2015: A collaborative organization and a community hub for collective lasting innovation in data science research, development and education.

Common Theme…
“Big” Data, Computational Science, Data Science, Cyberinfrastructure, and Their Applications

In most applications, utilization of Big Data often needs to be combined with Scalable Computing.
“BIG” DATA and COMPUTING AT DIVERSE SCALES
Enables dynamic data-driven applications
• Computer-Aided Drug Discovery
• Smart Cities
• Disaster Resilience and Response
• Smart Manufacturing
• Personalized Precision Medicine
• Smart Grid and Energy Management

How do we amplify the value of Big Data?

How do we find the connections and answer questions that benefit science and society?
“We are drowning in information and starving for knowledge” – John Naisbitt
Source: Megatrends, 1982

Going from Data to Discovery to Impact
Amplifying the Value of Data Related to X
Benefit Y for Science, Business, Society or Education
We need to focus on the problems to solve.

When MD met sea spray aerosols…
• Large scale PCA and clustering
• GPU enabled molecular dynamics
• Markov State Model Optimization
• Informed Feedback to Experimental Design
• Continuous Data Access, Integration and Transformation
• ~1M++ core hours
• Built-in Scientific Communication and Reproducibility Assurance
Collaborators: Rommie Amaro, Kim Prather, Amarnath Gupta, UC San Diego

Problem solving happens at the application integration level...
[Figure: AMBER GPU MD Workflow, showing the Minimization Actor]
A Kepler Workflow Tool for Reproducible AMBER GPU Molecular Dynamics, Purawat, Ieong, Malmstrom, Chan, Yeung, Walker, Altintas, Amaro. DOI: 10.1016/j.bpj.2017.04.055

Application integration requires the expertise of a collaborative team.
• Multi-disciplinary scientific and technological expertise
• Integration of many scales of computing
• Integration of big and small experimental datasets
  • Can be historical or real time
• Usage of individual or community developed legacy tools
• Methods to manage and interpret data
• Modeling and simulation
• Gateways for visualization and dashboard
• Long term active and passive storage needs

The problems we are solving are …
• data-driven
• heterogeneous
• collaborative
• data systems
• on demand distributed computing
• high speed connectivity
What if the whole world was our computer?

Create an Ecosystem that Enables Needs and Best Practices
• data-driven
• accountable
• scalable
• reproducible
• dynamic
• interactive
• process-driven
• heterogeneous
• collaborative
• includes many different areas of expertise

A Typical Collaborative Data Science Ecosystem

Data-driven problem solving requires:
• Heterogeneous systems
• Data management
• Data-driven methods
• Scalable tools for dynamic coordination and resource optimization
• Skilled interdisciplinary workforce
• Collaborative culture and tools that enable groups to communicate

[Figure: pipeline stages ACQUIRE, PREPARE, ANALYZE, REPORT, ACT, with STORE, PUBLISH, …; stages grouped under Data Engineering and Computational Data Science; each stage labeled Scale; caption: Continuous Iteration, Integration and Programmability]

Systems integration requirements…
Dynamic composability matters.
Systems are only useful if groups can integrate them into applications.
Tools that enhance teamwork need to be coupled with AI systems.

[Figure: layered ecosystem diagram]
• ENABLING INTERFACES: e.g., Gateways and other online tools for research and teaching
• COLLABORATION MANAGEMENT
• WORKFLOW MANAGEMENT: e.g., Application Integration, Coordination, Optimization, Communication, Reporting
• COMPOSABLE SERVICES: e.g., Model and Data Archives, Deep Learning, Analytics, HPC, Training, Notebooks
• RESOURCE MANAGEMENT: e.g., Kubernetes Container Cloud
• COMPOSABLE SYSTEMS: e.g., GPU, CPU, Big Data, Neuromorphic, Networks, Storage, …
• Cross-cutting: REPRODUCIBILITY, SECURITY and PRIVACY, DATA LIFECYCLE MANAGEMENT
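
As a rough illustration of the ACQUIRE, PREPARE, ANALYZE, REPORT, ACT stages and the "programmability" theme from the pipeline slide, the steps could be chained as plain functions. This is only a sketch: the stage bodies, the sample data, and the `run_pipeline` helper are hypothetical and are not taken from the talk or from any workflow system it mentions.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stage functions; only the stage names come from the slide.
def acquire(_: Any) -> List[float]:
    # Stand-in for reading from an instrument, archive, or stream.
    return [3.0, 1.0, 4.0, 1.0, 5.0]

def prepare(raw: List[float]) -> List[float]:
    # Example cleaning step: drop duplicate readings and sort.
    return sorted(set(raw))

def analyze(clean: List[float]) -> dict:
    # Example analysis: count and mean of the cleaned values.
    return {"n": len(clean), "mean": sum(clean) / len(clean)}

def report(stats: dict) -> str:
    return f"n={stats['n']} mean={stats['mean']:.2f}"

def act(summary: str) -> str:
    # Stand-in for a decision or feedback step driven by the report.
    return f"ACTION based on: {summary}"

Stage = Tuple[str, Callable[[Any], Any]]

def run_pipeline(stages: List[Stage], data: Any = None) -> Tuple[Any, List[str]]:
    """Run each stage on the previous stage's output and record the order."""
    trace: List[str] = []
    for name, step in stages:
        data = step(data)
        trace.append(name)
    return data, trace

PIPELINE: List[Stage] = [
    ("ACQUIRE", acquire),
    ("PREPARE", prepare),
    ("ANALYZE", analyze),
    ("REPORT", report),
    ("ACT", act),
]

result, trace = run_pipeline(PIPELINE)
```

Because each stage is an ordinary callable, a stage can be swapped or rerun without touching the others, which is one simple reading of "continuous iteration and integration".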

Smart Composability from Systems to Services
• Composable Systems
  • dynamically measured and resourced
  • used as a resource based on need and availability
• Resource Management
  • Kubernetes integration
  • mapping tools for resource identification per task
• Composable Services
  • runs on composable systems (e.g., containers)
  • exposes a parametric interface for integration
  • continuously measured and profiled
• Workflow Management
  • focused on coordination and resource optimization
  • requires a number of AI-based tools and data processing to function
• Collaboration Management
  • focused on expertise integration and goal setting
  • applies methodologies for effective communication
  • provides tools to measure and validate team activity

The Rest of It
• Enabling Interfaces
  • front end to communicate and explore with data products
  • examples are SUAVE, maps, specialized dashboards, etc.
• Security and Privacy
  • compliance through Sherlock
  • Blockchain research and integration
• Reproducibility
  • containerization and repeatability of work
• Data Lifecycle Management
  • curation and long term storage of research datasets (FAIR data principles)
  • active data management

Dynamic composition requires dynamic network, compute, storage and resource combined with intelligent software for steering applications.
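
The "mapping tools for resource identification per task" item can be pictured with a small first-fit sketch. It is an assumption-laden illustration, not the tooling described in the talk and not the Kubernetes scheduler: the `Resource` and `Task` classes, the node names, and the capacities are hypothetical, and the task names simply echo the molecular dynamics example earlier in the deck.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Resource:
    """A hypothetical composable system with a fixed CPU and GPU capacity."""
    name: str
    cpus: int
    gpus: int

@dataclass
class Task:
    """A hypothetical workflow task with declared resource needs."""
    name: str
    cpus: int
    gpus: int = 0

def map_tasks(tasks: List[Task], resources: List[Resource]) -> Dict[str, Optional[str]]:
    """First-fit mapping: place each task on the first resource with enough free capacity."""
    # Track remaining capacity so several tasks can share one resource.
    free = {r.name: [r.cpus, r.gpus] for r in resources}
    placement: Dict[str, Optional[str]] = {}
    for task in tasks:
        placement[task.name] = None  # stays None if nothing fits
        for r in resources:
            cpus, gpus = free[r.name]
            if task.cpus <= cpus and task.gpus <= gpus:
                free[r.name] = [cpus - task.cpus, gpus - task.gpus]
                placement[task.name] = r.name
                break
    return placement

resources = [
    Resource("cpu-node", cpus=16, gpus=0),
    Resource("gpu-node", cpus=8, gpus=2),
]
tasks = [
    Task("minimization", cpus=4),
    Task("gpu-md", cpus=2, gpus=1),
    Task("pca-clustering", cpus=16),
]
placement = map_tasks(tasks, resources)
```

In this sketch the third task is left unplaced because neither node has 16 free CPUs once the first task is running, which is the kind of situation where dynamically resourced systems would need to grow or a smarter optimizer would reorder the tasks.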

Collaborative Workflow Management using Composable Systems and Services
Process for Practice of Data Science
• PPoDS: Methodology and tools for collaboration
  • Measurable collaboration management
  • Web based metric setting and testing for composable units
  • Ability to design parametric workflows that run on multiple PODs
• SmartFlows: Performance measurement and application integration tools for composable units
  • Web-based container integration interface and deployment of scientific workflows on customizable infrastructure
  • Notebook Workflow Orchestration
  • PODBank for Kubernetes based container management of large