Lecture - 1 “Moving Ahead”

- from Clusters and Grids to Clouds

Salman Toor [email protected]

Basic questions

• Why Cloud computing?

• What are the previous technologies?

• What was missing in the previous technologies?

• Will previous technologies be substituted?

• Can legacy applications run on Cloud platforms?

2 Were supercomputers the only source of large scale computing before Clouds?

ANSWER: NO

3 Distributed Computing Infrastructures (DCI)

• Cluster Computing: accessible via Local Area Network (LAN)
• Grid Computing: based on Wide Area Network (WAN)
• Cloud Computing: next generation computing model

• Desktop Computing • Utility Computing • P2P Computing • Pervasive Computing • Ubiquitous Computing • Mobile Computing

4 Contribution of large scale computing • Areas in which the role of large scale computing is indispensable:

• Particle Physics • Bioinformatics • Computational Mathematics • Quantum Chemistry • … • …

5 Computing model

• Most large scale applications, from both academia and industry, were designed for batch processing

• Batch Processing:

A batch, or group, of instructions together with the required input data to accomplish a given task (often known as a job). No user interaction is possible during execution.
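For example, on a cluster managed by a batch system such as SLURM (introduced later in this lecture), a job is described by a small script and handed to the scheduler. A minimal sketch; the program and data file names are hypothetical:

  job.sh:
    #!/bin/bash
    #SBATCH --job-name=demo         # name shown in the queue
    #SBATCH --ntasks=1              # number of tasks (processes)
    #SBATCH --time=00:10:00         # wall-clock time limit
    ./my_program < input.dat > output.dat   # the actual work, no user interaction

  $ sbatch job.sh                   # submit the job; results appear in output.dat when it finishes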

6 Cluster computing

http://www.wikid.eu/index.php/Computer_Clustering Cluster computing

• A cluster is a type of parallel or distributed computer system, which consists of a collection of interconnected stand-alone computers working together as a single integrated computing resource

• First realised in the 1960s but gained real momentum in the mid-1980s

• The aim is to move away from specialised supercomputing platforms and build a more general-purpose computing environment based on commodity hardware http://www.cloudbus.org/papers/ic_cluster.pdf Cluster computing

• The concept of building computing clusters materialised with tremendous growth in computer hardware

• In a typical scenario (worker/slave/compute) cluster nodes are dedicated resources with no external peripherals attached

• Specifically designed for batch processing

• Cluster Types:

• Supercomputing clusters • Commodity hardware based clusters

9 Cluster computing

• Well-known cluster computing software:

• HTCondor • Portable Batch System (PBS) • Load Sharing Facility (LSF) • Simple Linux Utility for Resource Management (SLURM) • Rocks • …. • ….
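As an illustration of how such systems are used, a minimal HTCondor submit description might look like this (a sketch; the file names are hypothetical):

  hello.sub:
    executable = /bin/echo
    arguments  = "hello from the cluster"
    output     = hello.out          # captured standard output
    error      = hello.err          # captured standard error
    log        = hello.log          # HTCondor event log
    queue                           # enqueue one job

  $ condor_submit hello.sub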

11 Cluster computing Advantages • Uniform access to available resources • Load balancing • Various job scheduling techniques • Cluster management tools • User interfaces for single job submission and complex workflow management • Fundamental-level security (in typical cases) • Production-quality software is available

12 Cluster computing Disadvantages • Applications need to adapt to the way the underlying infrastructure is designed • Cluster software stacks are not coherent with one another • Steep learning curve • Less secure (though improved significantly over the years) • Tightly coupled with the underlying resources • Difficult to port new applications • Applications have to stick with the available tools and libraries • Non-standard interfaces

12 Cluster computing Current status • Cluster computing is one of the most established ways of accessing a limited amount of interconnected computational resources

• For example, hundreds of organisations in industry, government, and academia have used HTCondor

• Extensions like the Directed Acyclic Graph Manager (DAGMan) in HTCondor are still in use to define complex workflows, as sketched below
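A two-step workflow, for instance, is described in a plain-text DAG file (a sketch; the node names and submit files are hypothetical):

  pipeline.dag:
    JOB prepare prepare.sub          # first node
    JOB analyze analyze.sub          # second node
    PARENT prepare CHILD analyze     # run analyze only after prepare has finished

  $ condor_submit_dag pipeline.dag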

https://research.cs.wisc.edu/htcondor/description.html 13 https://research.cs.wisc.edu/htcondor/dagman/dagman.html Cluster computing Shortfalls

• Uniform access to a large number of resources • A system that can handle complex and large workloads

• Possible next steps

• Explore ways to find more resources • Uniform access to distributed computational resources • A bigger system for batch processing

14 Grid computing

• Definition - 1 : (Computational Grid)

Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements.

• Definition - 2 : (Computational Power Grids)

The computational power grid is analogous to the electric power grid: it allows geographically distributed resources to be coupled and offers consistent, inexpensive access to those resources irrespective of their physical location or access point. http://toolkit.globus.org/alliance/publications/papers/chapter2.pdf The anatomy of the grid: Enabling scalable virtual organizations

The Grid 2: Blueprint for a new computing infrastructure 15 http://www.gridcomputing.com/gridfaq.html Grid computing Vision

16 Grid computing Actual picture

http://kekcc.kek.jp/service/cc/uguide_en/10_1.system_tokutyou.html 17 Grid computing System components

• Application execution tools • Information extraction • Multi-level scheduling • Runtime environments • Resource discovery • Security • Reliability • Data management • Quality of Service (QoS) • Interoperability • Resource allocation • Virtual Organisation Management System (VOMS) • Metadata management • …. • …. 18 Grid computing Virtual Organisation Management System (VOMS) • Virtual Organisation

An abstract entity grouping Users, Institutions and Resources in the same administrative domain.

• Virtual Organisation Management System

VOMS is a system for managing authorisation data within multi-institutional collaborations. VOMS provides a database of user roles and capabilities and a set of tools for accessing and manipulating the database and using the database contents to generate Grid credentials for users when needed.
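In practice, a user obtains a VOMS-extended proxy certificate before interacting with Grid services, for example (a sketch; the VO name is a placeholder):

  $ voms-proxy-init --voms myvo.example.org   # create a short-lived proxy carrying VO attributes
  $ voms-proxy-info --all                     # inspect the proxy and its VOMS extensions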

Article: From gridmap-file to VOMS: managing authorization in a Grid environment http://toolkit.globus.org/grid_software/security/voms.php 19 Large Hadron Collider Grid (LCG)

20 http://www.isgtw.org/feature/isgtw-feature-mega-grid-mega-science Grid Computing Basic Workflow
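The workflow shown in the figure below is driven by a job description written in JDL. A minimal sketch, assuming the gLite WMS command-line clients and a valid VOMS proxy:

  hostname.jdl:
    Executable    = "/bin/hostname";
    StdOutput     = "std.out";
    StdError      = "std.err";
    OutputSandbox = {"std.out", "std.err"};

  $ glite-wms-job-submit -a hostname.jdl   # -a delegates the user proxy automatically
  $ glite-wms-job-status <job-id>          # <job-id> is returned by the submit command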

[Figure: gLite job submission workflow. The user creates a proxy with voms-proxy-init and submits a JDL job description from the User Interface (UI) together with an input sandbox; the Resource Broker queries the Information Service for Storage Element (SE) and Computing Element (CE) information, expands the JDL and submits the job (as Globus RSL) to a Computing Element; input and output sandboxes are staged via the Storage Element, and job status events are published back to the UI.]

21 Job workflow in gLite middleware: http://slideplayer.com/slide/2801198/ Grid computing at CERN

• Large Hadron Collider (LHC) experiment at European Organisation for Nuclear Research (CERN)

• The Grid runs more than two million jobs per day

• By 2013, the system had accumulated 100 PB of data and it is growing by 27 PB per year

• Expected to generate 400 PB of data by 2023 https://www.youtube.com/watch?v=7k3VnWXOjP4 http://home.web.cern.ch/about/updates/2013/02/-data-centre-passes-100-petabytes http://www.hpcwire.com/2014/11/04/cern-details-openstack-journey/ 22 http://home.web.cern.ch/about/computing Grid computing Advantages

• Seamless access to geographically distributed resources

• Provide means to accelerate collaborative science

• The concept of virtual organisations (VO) evolved with Grids

• Each site in the Grid system is fully autonomous

• Transparent access to heterogeneous resources

• Allows large scale batch processing capabilities

23 Grid Computing Disadvantages • Complex system architecture

• Steep learning curve for the end user

• Only allows batch processing, with no interactivity

• Difficult to attach a comprehensive economic model

• The sites are autonomous but the software is tightly coupled to the underlying hardware

• Mostly available for academic and research activities

• Lack of standard interface

• Static availability of resources 24 Grid computing Current status • European Middleware Initiative (EMI)

• Compute Resources: • gLite Middleware • Advanced Resource Connector (ARC) • UNICORE

• Storage Resources: • dCache • CASTOR • DPM

25 Grid computing Current status • Advanced Resource Connector (ARC)
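A minimal job submission with the ARC client tools might look like this (a sketch; the cluster endpoint is a placeholder and a valid user certificate is assumed):

  hello.xrsl:
    &(executable="/bin/echo")
     (arguments="hello grid")
     (stdout="stdout.txt")
     (jobname="hello")

  $ arcproxy                               # create a proxy certificate
  $ arcsub -c arc.example.org hello.xrsl   # submit to an ARC compute element
  $ arcstat -a                             # check the status of all jobs
  $ arcget <job-id>                        # retrieve the results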

26 Grid computing Current status • Nordic Data Grid Facility (NDGF)

• Storage/data grid based on the dCache software stack

• Data is distributed over many computing centres across Scandinavia

• Secure data access using a variety of protocols

http://neic.nordforsk.org/about/strategic-areas/tier-1 27 Grid computing Shortfalls • Tight coupling with hardware resources • User interfaces • Limited user community • Weak monitoring and billing system • Limited user-level access • Complex software stack

• Security model • User and project management system

• Possible next steps: a system that can address these limitations

28 Cloud computing NIST definition

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf 29 Example from Software Engineering

Waterfall Model, Spiral Model, Unified Modeling Language (UML)

Grid Computing, Cloud Computing

30 Strength of cloud computing

Cloud computing reduces the gap between the concept and the implementation by defining roles and responsibilities that allow:

• levels of abstraction • Service Level Agreements (SLA) • a paradigm shift from servers to *-as-a-service • the possibility to attach an economic model • on-demand resource availability

31 Cloud computing Roles and responsibilities

• Infrastructure provider

• Platform provider

• Software provider

• Network provider

32 Cloud computing

Why Cloud Computing?

33 Cloud computing

A well-defined economic model

• Driving force behind Cloud concept

• Public Clouds: Amazon, HP Helion Cloud, Intel Cloud, UberCloud • Private or Community Clouds: Smog, ePouta 34 Cloud computing

Complete isolation, direct access and full control of allocated resources

35 Cloud computing

On demand resource allocation No job queues!

• No need for specialised static worker nodes

36 Cloud computing

Loose coupling with the underlying resources

• Live or block based VM migration

37 Cloud computing

“Standard” interface to interact with the cloud resources

• Amazon EC2 and S3 APIs can be used to connect to an OpenStack-based Cloud • REST API based communication
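For example, a client built for the Amazon APIs can talk to an OpenStack cloud simply by pointing it at the cloud's EC2- or S3-compatible endpoint. A sketch; the endpoint URLs are placeholders and valid credentials are assumed to be configured:

  $ aws ec2 describe-instances --endpoint-url https://cloud.example.org:8788/   # EC2-compatible compute endpoint
  $ aws s3 ls --endpoint-url https://object.example.org/                        # S3-compatible object-storage endpoint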

38 Cloud computing

Orchestration of scalable services

• Amazon EC2 (Compute) • Amazon S3 (Storage) • Amazon Elastic MapReduce • OpenStack Sahara (virtual Hadoop cluster) • OpenStack Trove (Database)
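With the OpenStack command-line client such services are provisioned on demand, for example (a sketch; the image, flavor and key names are assumptions about a particular cloud):

  $ openstack server create --image ubuntu-22.04 --flavor m1.small --key-name mykey demo-vm   # boot a VM
  $ openstack volume create --size 10 demo-vol   # create a 10 GB block-storage volume
  $ openstack container create demo-bucket       # create an object-storage container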

39 Cloud computing

Minimal interaction with service providers

40 Cloud computing

Are legacy applications portable to Clouds?

ANSWER: Yes

41 Cloud computing Computing model • Together with batch processing, the Cloud computing model provides interactive processing of complex applications

Wikipedia:

Interactive computing refers to software which accepts input from humans — for example, data or commands.

• Frameworks like IPython or Jupyter notebooks extend web technologies for interactive computing
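For instance, a notebook server running on a cloud VM can be reached from a laptop through an SSH tunnel (a sketch; the key, user and host names are placeholders):

  # on the VM: start the notebook server without opening a browser
  $ jupyter notebook --no-browser --port 8888

  # on the laptop: forward the port, then browse http://localhost:8888
  $ ssh -i ~/.ssh/cloud_key -L 8888:localhost:8888 ubuntu@vm.example.org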

42 Cloud Computing

43 Introduction to SNIC Cloud (IaaS)

http://smog.uppmax.uu.se

44 Security on SNIC Cloud

45 What will the SNIC Cloud provide?

• Resources – Compute – Storage – Network

• Users will have complete control over the allocated resources.

• Power comes with the responsibility!

46 Important

• Users can log in as the super-user root and can install or uninstall whatever they want. • Question: What if, for “convenience”, I create a user account on my VM with the name “XXX” and password “XXX123”… Can I? • The answer is YES, you can. But it may have serious consequences!!!!

47 Consequences

• Since the VMs are reachable over the Internet, systems with weak passwords, and sometimes even with strong passwords, can get hacked.

• The attacker can do various things: – Destroy the data available on the VM – Corrupt the VM so that it is no longer usable – Launch an attack using your VM – Or even much more …

48 What should we do ?

• Don’t use password based logins!

• The convention is to use SSH key-pair login mechanism.

• For this course it is required that all the students always use SSH keys to access resources.
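On the VM itself this usually means switching off password authentication in the SSH daemon configuration (a sketch; assumes a standard OpenSSH setup):

  /etc/ssh/sshd_config:
    PasswordAuthentication no     # refuse password logins
    PubkeyAuthentication yes      # allow key-based logins (default)

  $ sudo systemctl restart sshd   # or 'ssh', depending on the distribution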

49 What is SSH key-pair?

• A public-key based authentication system used to identify users on SSH-enabled servers • Based on a pair of keys – private key (the user's personal, secret key) – public key (world-readable key) • Users can generate RSA or DSA based keys – RSA (Rivest-Shamir-Adleman) keys have a minimum length of 768 bits and the default length is 2048 bits – The key length of DSA (Digital Signature Algorithm) keys is always 1024 bits

https://wiki.archlinux.org/index.php/SSH_keys https://help.ubuntu.com/community/SSH/OpenSSH/Keys 50 Key-Pair generation

• OpenStack-based key generation interface

• Command-line interface:
  $ ssh-keygen
  or
  $ ssh-keygen -t rsa -b 2048

51
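Putting it together, a typical key-based login setup might look like this (a sketch; the key file, key-pair name, user and host names are placeholders):

  $ ssh-keygen -t rsa -b 2048 -f ~/.ssh/cloud_key                       # generate the key pair
  $ openstack keypair create --public-key ~/.ssh/cloud_key.pub mykey   # register the public key with OpenStack
  $ ssh -i ~/.ssh/cloud_key ubuntu@vm.example.org                       # log in to the VM with the private key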