Lecture - 1 “Moving Ahead”

From Clusters and Grids to Cloud computing
Salman Toor

Basic questions
• Why Cloud computing?
• What are the previous technologies?
• What was missing in the previous technologies?
• Will previous technologies be substituted?
• Can legacy applications run on Cloud platforms?

Were supercomputers the only source of large scale computing before Clouds?
ANSWER: NO

Distributed Computing Infrastructures (DCI)
• Cluster Computing
  • Accessible via Local Area Network (LAN)
• Grid Computing
  • Based on Wide Area Network (WAN)
• Cloud Computing
  • Next generation computing model
• Desktop Computing
• Utility Computing
• P2P Computing
• Pervasive Computing
• Ubiquitous Computing
• Mobile Computing

Contribution of large scale computing
• Areas in which large scale computing plays an indispensable role:
  • Particle Physics
  • Bioinformatics
  • Computational Mathematics
  • Quantum Chemistry
  • …

Computing model
• Most large scale applications, from both academia and industry, were designed for batch processing
• Batch processing: a batch is a complete set of instructions, together with the required input data, that accomplishes a given task (often called a job). No user interaction is possible during execution.
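
The batch processing model described above can be illustrated with a minimal sketch. The names `BatchJob` and `run_batch` are hypothetical and chosen for illustration only; they do not come from any particular batch system. Each job bundles its instructions with its input data at submission time, and the queue is then drained without any user interaction.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass
class BatchJob:
    """A job: the instructions plus all input data, fixed at submission."""
    name: str
    task: Callable[[Any], Any]  # the instructions to execute
    input_data: Any             # the required input, supplied up front


def run_batch(jobs: Iterable[BatchJob]) -> Dict[str, Any]:
    """Run every queued job in submission order, with no interaction."""
    queue = deque(jobs)
    results: Dict[str, Any] = {}
    while queue:
        job = queue.popleft()
        # The job cannot ask the user for anything while it runs:
        # everything it needs was provided when it was submitted.
        results[job.name] = job.task(job.input_data)
    return results


jobs = [
    BatchJob("total", sum, [1, 2, 3]),
    BatchJob("ordered", sorted, [3, 1, 2]),
]
results = run_batch(jobs)
```

Real batch systems add queues with priorities, resource requests and output staging, none of which is modelled here.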

Cluster computing
[Figure: computer cluster. Source: http://www.wikid.eu/index.php/Computer_Clustering]

Cluster computing
• A cluster is a type of parallel or distributed computer system that consists of a collection of interconnected stand-alone computers working together as a single integrated computing resource
• First realised in the 1960s, but gained real momentum in the mid-1980s
• The aim is to move away from specialised supercomputing platforms and build a more general-purpose computing environment based on commodity hardware
http://www.cloudbus.org/papers/ic_cluster.pdf

Cluster computing
• The concept of building computing clusters materialised with the tremendous growth in computer hardware
• In a typical scenario, cluster nodes (worker/slave/compute) are dedicated resources with no external peripherals attached
• Specifically designed for batch processing
• Cluster types:
  • Supercomputing clusters
  • Commodity hardware based clusters

Cluster computing: Known software
• HTCondor
• Portable Batch System (PBS)
• Load Sharing Facility (LSF)
• Simple Linux Utility for Resource Management (SLURM)
• Rocks
• …
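
As a rough illustration of how dedicated worker nodes can be presented as a single computing resource, the sketch below dispatches each job to the currently least-loaded worker. This is a toy policy assumed for illustration; the function `schedule` and the job costs are hypothetical, and the systems listed above implement far richer scheduling policies.

```python
import heapq
from typing import Dict, List, Tuple


def schedule(job_costs: Dict[str, int], n_workers: int) -> Tuple[Dict[str, int], List[int]]:
    """Assign each job, in submission order, to the least-loaded worker.

    Returns the job -> worker mapping and the final load per worker.
    """
    # Min-heap of (current load, worker id); ties go to the lower id.
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)

    assignment: Dict[str, int] = {}
    loads = [0] * n_workers
    for job, cost in job_costs.items():
        load, worker = heapq.heappop(heap)
        assignment[job] = worker
        loads[worker] = load + cost
        heapq.heappush(heap, (loads[worker], worker))
    return assignment, loads


# Hypothetical jobs with estimated run times (arbitrary units).
assignment, loads = schedule({"a": 4, "b": 2, "c": 3, "d": 1}, n_workers=2)
```

The user submits to one entry point and never chooses a node, which is the "single integrated computing resource" view from the definition above.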

Cluster computing: Advantages
• Uniform access to available resources
• Load balancing
• Various job scheduling techniques
• Cluster management tools
• User interfaces
  • Single job submission
  • Complex workflow management
• Fundamental level of security (in typical cases)
• Production quality software is available

Cluster computing: Disadvantages
• Applications need to adapt to the way the underlying infrastructure is designed
• Cluster software is non-coherent
• Steep learning curve
• Less secure (improved significantly over the years)
• Tightly coupled with the underlying resources
• Difficult to port new applications
• Applications need to stick with the available tools and libraries
• Non-standard interfaces

Cluster computing: Current status
• Cluster computing is one of the most established ways of accessing a limited amount of interconnected computational resources
• For example, hundreds of organisations in industry, government, and academia have used HTCondor
• Extensions such as the Directed Acyclic Graph Manager (DAGMan) in HTCondor are still used to define complex workflows
https://research.cs.wisc.edu/htcondor/description.html
https://research.cs.wisc.edu/htcondor/dagman/dagman.html

Cluster computing: Shortfalls
• Uniform access to a large number of resources
• A system that can handle complex and large workloads
Possible next steps
• Explore ways to find more resources
• Uniform access to distributed computational resources
• A bigger system for batch processing

Grid computing
• Definition - 1 : (Computational Grid) Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements.
• Definition - 2 : (Computational Power Grids) The computational power grid is analogous to the electric power grid: it couples geographically distributed resources and offers consistent and inexpensive access to them, irrespective of their physical location or access point.
http://toolkit.globus.org/alliance/publications/papers/chapter2.pdf
The anatomy of the grid: Enabling scalable virtual organizations
The Grid 2: Blueprint for a new computing infrastructure
http://www.gridcomputing.com/gridfaq.html

Grid computing: Vision
[Figure]

Grid computing: Actual picture
[Figure. Source: http://kekcc.kek.jp/service/cc/uguide_en/10_1.system_tokutyou.html]

Grid computing: System components
• Application execution tools
• Information extraction
• Multi-level scheduling
• Runtime environments
• Resource discovery
• Security
• Reliability
• Data management
• Quality of Service (QoS)
• Interoperability
• Resource allocation
• Virtual Organisation Management System (VOMS)
• Metadata management
• …

Grid computing: Virtual Organisation Management System (VOMS)
• Virtual Organisation: an abstract entity grouping users, institutions and resources in the same administrative domain.
• Virtual Organisation Management System: VOMS is a system for managing authorisation data within multi-institutional collaborations. VOMS provides a database of user roles and capabilities and a set of tools for accessing and manipulating the database and using the database contents to generate Grid credentials for users when needed.
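
The idea of a role database that is used to generate credentials on demand can be sketched as follows. This is a conceptual toy, not the VOMS API: the class `ToyVOMS`, the `Credential` fields, the VO name `example-vo`, the 12-hour lifetime and the user names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Credential:
    """A short-lived statement of who the user is and what roles they hold."""
    user: str
    vo: str
    roles: Tuple[str, ...]
    expires_at: int  # seconds, on the same clock as `now`


class ToyVOMS:
    """Hypothetical stand-in for a VO membership and role database."""

    def __init__(self) -> None:
        self._roles: Dict[Tuple[str, str], Tuple[str, ...]] = {}

    def add_member(self, vo: str, user: str, roles: Iterable[str]) -> None:
        self._roles[(vo, user)] = tuple(sorted(roles))

    def issue(self, vo: str, user: str, now: int, lifetime: int = 12 * 3600) -> Credential:
        """Generate a credential from the database contents, if the user is a member."""
        try:
            roles = self._roles[(vo, user)]
        except KeyError:
            raise PermissionError(f"{user} is not a member of {vo}") from None
        return Credential(user, vo, roles, now + lifetime)


voms = ToyVOMS()
voms.add_member("example-vo", "alice", ["production", "analysis"])
cred = voms.issue("example-vo", "alice", now=0)
```

A resource site can then authorise a request by inspecting the roles carried in the credential, without maintaining its own per-user list.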
Article: From gridmap-file to VOMS: managing authorization in a Grid environment
http://toolkit.globus.org/grid_software/security/voms.php

LHC Computing Grid (LCG)
[Figure. Source: http://www.isgtw.org/feature/isgtw-feature-mega-grid-mega-science]

Grid computing: Basic workflow
[Figure: job workflow in gLite middleware. Labels include UI, JDL, input and output sandboxes, voms-proxy-init, Resource Broker, Information Service, Job Submission Service, Globus RSL, Storage Element and Computing Element. Source: http://slideplayer.com/slide/2801198/]

Grid computing at CERN
• Large Hadron Collider (LHC) experiment at the European Organisation for Nuclear Research (CERN)
• The Grid runs more than two million jobs per day
• By 2013 the system held 100 PB of data, growing by 27 PB per year
• Expected to generate 400 PB of data by 2023
https://www.youtube.com/watch?v=7k3VnWXOjP4
http://home.web.cern.ch/about/updates/2013/02/cern-data-centre-passes-100-petabytes
http://www.hpcwire.com/2014/11/04/cern-details-openstack-journey/
http://home.web.cern.ch/about/computing

Grid computing: Advantages
• Seamless access to geographically distributed resources
• Provides means to accelerate collaborative science
• The concept of virtual organisations (VO) evolved with Grids
• Each site in the Grid system is fully autonomous
• Transparent access to heterogeneous resources
• Allows large scale batch processing

Grid computing: Disadvantages
• Complex system architecture
• Steep learning curve for the end user
• Allows only batch processing, with no interactivity
• Difficult to attach a comprehensive economic model
• The sites are autonomous, but the software is tightly coupled with the underlying hardware
• Mostly available for academic and research activities
• Lack of a standard interface
• Static availability of resources

Grid computing: Current status
• European Middleware Initiative (EMI)
• Compute resources:
  • gLite Middleware
  • Advanced Resource Connector (ARC)
  • UNICORE
• Storage resources:
  • dCache
  • CASTOR
  • DPM

Grid computing: Current status
• Advanced Resource Connector (ARC)

Grid computing: Current status
• Nordic Data Grid Facility (NDGF)
• Storage/data grid based on the dCache software stack
• Data is distributed over many computing centres across Scandinavia
• Secure data access using a variety of protocols
http://neic.nordforsk.org/about/strategic-areas/tier-1

Grid computing: Shortfalls
• Tight coupling with hardware resources
• User interfaces
• Limited user community
• Weak monitoring and billing system
• Limited user level access
• Complex software stack
• Security model
• User and project management system
Possible next steps
• A system that can address these limitations

Cloud computing: NIST definition
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

Example from Software Engineering
[Figure labels: Waterfall Model, Spiral Model, Unified Modeling Language (UML), Grid Computing, Cloud Computing]

Strength of cloud computing
Cloud computing reduces the gap between the concept and the implementation by defining roles and responsibilities that allow:
• Level of abstraction
• Service Level Agreements (SLA)
• Paradigm shift from servers to *-as-a-service
• Possibility to attach an economic model
• On-demand resource availability

Cloud computing: Roles and responsibilities
• Infrastructure provider
• Platform provider
• Software provider
• Network provider

Cloud computing
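
The NIST definition quoted earlier speaks of a shared pool of resources that is provisioned and released on demand. A minimal sketch of that idea follows; the class `ResourcePool`, the tenant names and the capacity are hypothetical and do not correspond to any cloud provider's API.

```python
from typing import Dict


class ResourcePool:
    """Toy shared pool of identical servers, provisioned and released on demand."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.allocated: Dict[str, int] = {}  # tenant -> servers currently held

    def available(self) -> int:
        return self.capacity - sum(self.allocated.values())

    def provision(self, tenant: str, n: int) -> None:
        """Hand n servers to a tenant, without operator involvement."""
        if n > self.available():
            raise RuntimeError("pool exhausted")
        self.allocated[tenant] = self.allocated.get(tenant, 0) + n

    def release(self, tenant: str, n: int) -> None:
        """Return n servers to the pool so other tenants can use them."""
        held = self.allocated.get(tenant, 0)
        if n > held:
            raise ValueError("cannot release more than is held")
        if held == n:
            self.allocated.pop(tenant, None)
        else:
            self.allocated[tenant] = held - n


pool = ResourcePool(capacity=10)
pool.provision("tenant-a", 6)
pool.provision("tenant-b", 3)
pool.release("tenant-a", 4)       # tenant-a scales down
free_servers = pool.available()   # 10 - (2 + 3)
```

The contrast with the static resource availability listed among the Grid disadvantages is that allocations here change at run time, per tenant, which is also what makes metering and an economic model straightforward to attach.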
