10/9/2013

Presented by: Walid Budgaga
CS 655 – Advanced Topics in Distributed Systems
Computer Science Department, Colorado State University

Outline

 Condor
 The Anatomy of the Grid
 Globus Toolkit

Motivation

 High Throughput Computing (HTC)?
  Large amounts of computing capacity over long periods of time
  Measured: operations per month or per year
 High Performance Computing (HPC)?
  Large amounts of computing capacity for short periods of time
  Measured: FLOPS

Motivation

 HTC is suitable for scientific research
 Example (parameter sweep): testing parameter combinations to keep temp. at a particular level
  op(x,y,z) takes 10 hours, 500 MB memory, 100 MB I/O
  x(100), y(50), z(25) => 100×50×25 = 125,000 runs (~145 years)
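The scale of the sweep above can be checked with a few lines of arithmetic (a sketch using the run time and parameter counts from the example):

```python
# Parameter sweep from the example: op(x, y, z) takes 10 hours per run.
x_values, y_values, z_values = 100, 50, 25
hours_per_run = 10

combinations = x_values * y_values * z_values   # 125,000 parameter combinations
serial_hours = combinations * hours_per_run     # 1,250,000 hours of compute
serial_years = serial_hours / (24 * 365)        # ~143 years on a single machine
```

Running all combinations one after another on a single workstation would take well over a century, which is exactly the kind of workload HTC targets.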

Motivation

 Large amounts of processing capacity?
  Exploiting computers on the network
  Utilizing heterogeneous resources
   Overcoming differences between the platforms
   By building a portable solution, including a resource management framework
 Over long periods of time?
  The system must be reliable and maintainable
  Surviving failures (software & hardware)
  Allowing resources to leave and join at any time
  Upgrading and configuring without significant downtimes

HTC Environment

 Fort Collins Science Center uses Condor for scientific projects
 Source: http://www.fort.usgs.gov/Condor/ComputingTimes.asp


HTC Environment

 Also, the system must meet the needs of:
  Resource owners
   Rights respected
   Policies enforced
  Customers
   Benefit of additional processing capacity must outweigh the complexity of usage
  System administrators
   Real benefit provided to users must outweigh the maintenance cost

HTC

 Other considerations: the distributively owned resources lead to:
  Decentralized maintenance and configuration of resources
  Resource availability
   Applications may be preempted at any time
   Adds an additional degree of resource heterogeneity

Condor Overview

 Open-source high-throughput computing framework for compute-intensive tasks

 Manages distributively owned resources to provide a large amount of capacity

 Developed at the Computer Sciences Department at the University of Wisconsin-Madison

 Name changed to HTCondor in October 2012


Condor Overview

 Customer agent
  Represents the customer's job (application)
  Can state its requirements, for example:
   Needs a Linux/x86 platform
   Wants a machine with high memory capacity
   Prefers a machine in lab 120


Condor Overview

 Resource agent
  Represents the resource
  Can state its offers, for example:
   Platform: a Linux/x86 platform
   Memory: 1 GB
  Can state its requirements, for example:
   Run jobs only when keyboard and mouse have been idle for 15 minutes
   Run jobs only from the computer science department
   Never run jobs belonging to [email protected]

Condor Overview

 Matchmaker
  Matches jobs and resources based on requirements and offers
  Notifies the agents when a match is found

Challenges of an HTC system:

Software Development

System Administration

Software Development

Four primary challenges:
 Utilization of heterogeneous resources
  Requires system portability
 Network protocol flexibility
  Required to cope with the constantly changing needs of resources and customers
  Required for adding new features
 Remote file access
  Required to give applications the ability to access data from any workstation
 Utilization of non-dedicated resources
  Required for preempting and resuming applications

Software Development

Utilization of heterogeneous resources: requires system portability, obtained through a layered system design
 Network API:
  Connection-oriented and connectionless
  Reliable and unreliable interfaces
  Authentication and encryption
 Process management API:
  Create, suspend, resume, and kill a process
 Workstation statistics API:
  Reports information needed to
   Implement resource owner policies
   Verify the validity of applications' requirements


Software Development

Network protocol flexibility:
 To cope with adding new services to the HTC system without frequently updating HTC components, a general-purpose data format may be used
 For example: Condor uses a protocol similar to RPC

Software Development

Remote file access (1):
 Goal: guarantee that HTC applications can access their data from any workstation in the cluster
 Three possible solutions; the first:
  Using an existing distributed file system (NFS)
   Condor authenticates the customer application
   Privileges need to be assigned, or file access permission granted

Software Development

Remote file access (2):
 Redirecting file I/O system calls
  Interposing HTC between the application & the OS by linking the application with an interposition library
  Does not require file storage on the remote workstation
  Reduces performance
  A portable interposition library is difficult to develop & maintain

Software Development

Remote file access (3):
 Implementing data file staging
  Transferring input and output files to the remote workstation specified by the customer
  Requires free disk space on the workstation
  High cost for large data files

Software Development

Utilization of non-dedicated resources:
 Requires the ability to preempt and resume applications
 This can be obtained using checkpoints
 Checkpoint:
  A snapshot of the state of the executing program
  Can be used to restart the program at a later time
  Provides reliability
  Enables preemptive-resume scheduling

Software Development

Checkpoints in Condor (1):
 Used as a migration mechanism
  To migrate jobs from one workstation to another
 Used to resume vacated jobs
 A program has the ability to checkpoint itself
  Using a checkpointing library
 To provide additional reliability
  HTCondor can be configured to write checkpoints periodically


Software Development

Checkpoints in Condor (2):
 When checkpoints are stored:
  Periodically, if HTCondor is configured to do so
  At any time, by the program itself
  When a higher-priority job has to start on the same machine
  When the machine becomes busy

Software Development

Checkpoints in Condor (3):
 Storing of checkpoints
  By default, checkpoints are stored on the local disk of the machine where the job was submitted
  However, HTCondor can be configured to store them on a checkpoint server
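The checkpoint idea can be illustrated with a minimal sketch (this is not Condor's checkpointing library; the file name, state layout, and checkpoint interval below are made up for illustration):

```python
import os
import pickle

CHECKPOINT_FILE = "sweep.ckpt"  # hypothetical checkpoint location

def run_sweep(total_steps):
    # Resume from the last checkpoint if one exists (e.g. after preemption).
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "partial_sum": 0}

    while state["step"] < total_steps:
        state["partial_sum"] += state["step"]  # stand-in for real work
        state["step"] += 1
        # Periodically snapshot the state so the job can be vacated
        # and later resumed on the same or a different machine.
        if state["step"] % 100 == 0:
            with open(CHECKPOINT_FILE, "wb") as f:
                pickle.dump(state, f)
    return state["partial_sum"]
```

If the process is killed mid-run, calling `run_sweep` again picks up from the most recent snapshot instead of restarting from step 0, which is the preemptive-resume behaviour the slides describe.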

System Administration

The administrator has to answer to:

 Resource owners

 By guaranteeing that HTC enforces their policies

 Customers

 By ensuring receipt of valuable services from HTC

 Policy makers

 By demonstrating that HTC is meeting the stated goals.


System Administration

Access Policies
 Specify when and how the resources can be accessed, and by whom

 The policies might be specified using a set of expressions

 For example, in Condor:
  Requirements (true: to start accessing the resources)
  Rank (preference)
  Suspend
  Continue
  Vacate (notification to stop using the resources)
  Kill (immediately stop using the resources)
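A sketch of how these expressions might appear in an HTCondor-style machine configuration (the macro names follow the slide; the thresholds and the Department attribute are illustrative, not taken from a real pool):

```text
# Start a job only when the owner has been away for 15 minutes
START    = KeyboardIdle > 15 * 60
# Prefer jobs from the owner's own department (hypothetical attribute)
RANK     = Department == "CS"
# Suspend the job as soon as the owner returns
SUSPEND  = KeyboardIdle < 60
# Resume once the machine has been idle again for 5 minutes
CONTINUE = KeyboardIdle > 5 * 60
# Ask a long-suspended job to vacate (checkpoint and leave)
VACATE   = $(ActivityTimer) > 10 * 60
# Kill it outright if it still has not left
KILL     = $(ActivityTimer) > 15 * 60
```

Each expression is evaluated by the resource agent against the current machine state and the candidate job's ClassAd, so the owner's policy is enforced without manual intervention.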


System Administration

Reliability
 The HTC system must be prepared for failures and must automate failure recovery for common failures
 This is not an easy job:
  Detect the difference between normal and abnormal termination
  Don't leave running applications unattended
  Choose the correct checkpoint to restart from
  Decide when it is safe to restart the application
  Determine & avoid bad nodes

System Administration

System logs
 The primary tool for diagnosing system failures; they give the ability to reconstruct the events leading up to a failure
 Problems and suggested solutions:
  Log files can grow to unbounded size
   Keep detailed logs for recent events and summaries for old information
  Managing distributed log files
   Store logs centrally on a file server or a customized log server
   Provide a single interface by installing logging agents on each workstation

System Administration

Monitoring and Accounting
 Helps the administrator to:
  Assess the current and historical state of the system
  Track the system usage
 Example: CondorView usage graph

System Administration

Security (1)
 Possible attacks:
  Resource attack
   An unauthorized user gains access to a resource
   An authorized user violates the resource owner's access policy
  Customer attack
   The customer's account or files are put at risk via the HTC environment

System Administration

Security (2)
 To protect against unauthorized resource access:
  The resource owner may specify authorized users in the access policy
  Condor example:
   Requirement = (Customer == "[email protected]") ||
                 (Customer == "[email protected]")


System Administration

Security (3)
 To protect against violations of the resource access policy, the resource agent may:
  Set resource consumption limits using the system API
  Run the application under a "guest" account
  Set the file system root directory to a "sandbox" directory
  Intercept the system calls performed by the application via an OS interposition interface

System Administration

Security (4)
 To protect the customer's account and files:
  HTC must ensure that all resource agents are trustworthy
  Place data files only on trusted hosts
  Use authentication mechanisms
  Encrypt network streams

System Administration

Remote Customers
 Remote access is more convenient than direct access:
  The customer creates an HTC account
  A customer agent can be installed on the customer's workstation
  The administrator allows this agent to access the HTC cluster
  For non-trustworthy customers, extra security procedures may be required

Condor

Condor is suitable for high-throughput computations:
 Running many jobs at the same time on different machines
 Exploiting idle machines
 Running one job on multiple machines
 Surviving hardware and software failures
 Allowing machines to join and leave
 Enforcing your own policy
 Allowing many jobs to be completed over a long period of time
 Useful for researchers concerned with the number of jobs they can complete over a particular length of time

Condor

 Running programs unattended and in the background
 Redirecting console input & output from and to files
 Notifying on completion via email
 Allowing tracking of jobs' progress


Condor as a Distributed Job Scheduler

Condor can be seen as a distributed job scheduler:
 Scheduling submitted jobs on available machines
 Allowing users to assign priorities to their jobs
 Ensuring a fair share of resources by constantly recalculating user priority
  A lower numerical value means higher priority
  Each user starts with the highest priority (0.5)
  Priority improves over time if the number of used machines < priority
  Priority worsens over time if the number of used machines > priority
 Using checkpoints for
  Suspending and resuming jobs
  Rescheduling jobs on different machines
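The fair-share behaviour described above can be sketched as a toy simulation (this is not HTCondor's actual half-life formula; the update rate is invented): the priority value drifts toward the number of machines currently in use, so heavy users sink in priority and idle users recover.

```python
def update_priority(priority, machines_in_use, rate=0.1):
    """Move the user's priority value toward current machine usage.

    Lower values mean higher priority; every user starts at 0.5.
    If usage exceeds the priority value, the value grows (priority
    worsens); if usage falls below it, the value shrinks (priority
    improves) -- a toy version of Condor's fair-share idea.
    """
    return priority + rate * (machines_in_use - priority)

# A heavy user's priority value grows (worsens) while using 10 machines...
p = 0.5
for _ in range(20):
    p = update_priority(p, machines_in_use=10)
heavy = p
# ...and shrinks back toward 0 (improves) once usage stops.
for _ in range(20):
    p = update_priority(p, machines_in_use=0)
```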

Distributed Job Scheduler

ClassAd Language
 Describes jobs, workstations, and other resources
 Same idea as the classified advertising section of a newspaper
 ClassAds are exchanged between processes to schedule jobs
 Provides information about the state of the system

Distributed Job Scheduler

 A machine expresses:
  Attributes
  Conditions
  Preferences
 A job expresses:
  Attributes
  Requirements
  Preferences
 Matchmaker
  Finds matches
  Notifies the matched parties

Distributed Job Scheduler

ClassAd Structure
 A set of attribute-value pairs
 Each value can be:
  Integer
  Floating point
  String
  Logical expression (TRUE, FALSE, or UNDEFINED)
 Attributes from different ClassAds can also be used
  For example: other.size > 3
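A sketch of what such a pair of ClassAds might look like (the attribute names and values are illustrative, not from a real pool; as in the other.size example, `other.` refers to attributes of the candidate match):

```text
// Machine ClassAd (resource offer)
MyType       = "Machine"
Arch         = "INTEL"
OpSys        = "LINUX"
Memory       = 1024              // MB
Requirements = other.Owner != "[email protected]"
Rank         = other.Department == "CS"

// Job ClassAd (customer request)
MyType       = "Job"
Owner        = "[email protected]"
Department   = "CS"
Requirements = other.Arch == "INTEL" && other.Memory > 512
Rank         = other.Memory      // prefer machines with more memory
```

Both ads carry their own Requirements and Rank, so either side can constrain or express preferences about the other.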


Distributed Job Scheduler

Matchmaker:
 Its job is to find a match between two ClassAds (a job's and a machine's)

 A match between two ClassAds occurs if
  The expressions of the Requirements attribute in both ClassAds are true

 If more than one match is found,
  Rank is used to break the tie
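A matchmaker along these lines can be sketched in a few lines of Python (a toy model, not Condor's implementation: ClassAds are dicts, Requirements are predicates over the other ad, and Rank is a scoring function):

```python
def find_match(job, machines):
    """Return the best machine for a job, matchmaker-style.

    A candidate must satisfy BOTH Requirements expressions (the job's
    evaluated against the machine, and the machine's against the job).
    Among multiple candidates, the job's Rank breaks the tie.
    """
    candidates = [
        m for m in machines
        if job["Requirements"](m) and m["Requirements"](job)
    ]
    if not candidates:
        return None
    return max(candidates, key=job["Rank"])

job = {
    "Owner": "[email protected]",
    "Requirements": lambda m: m["Arch"] == "INTEL" and m["Memory"] > 512,
    "Rank": lambda m: m["Memory"],  # prefer the machine with more memory
}
machines = [
    {"Name": "a", "Arch": "INTEL", "Memory": 1024,
     "Requirements": lambda j: j["Owner"] != "[email protected]"},
    {"Name": "b", "Arch": "INTEL", "Memory": 2048,
     "Requirements": lambda j: True},
    {"Name": "c", "Arch": "SPARC", "Memory": 4096,
     "Requirements": lambda j: True},
]

best = find_match(job, machines)  # both Requirements hold for a and b; b wins on Rank
```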

Condor: How to submit the job

Distributed Job Scheduler

Job Submission
 Done by submitting a job description file

Job description file
 A plain ASCII text file used to describe a job or a cluster (several jobs)
 Specifies how many times to run the job
 Specifies the directory of the input and output files
 Specifies how to receive notification when execution completes (email or log)
 Selects a Universe:
  Standard or Vanilla
  PVM
  MPI
  GLOBUS (grid applications)
  Scheduler (meta-schedulers)
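A submit description file of this kind might look as follows (the executable, file names, and notification address are hypothetical; the keywords follow Condor's classic submit-file syntax):

```text
# Run in the standard universe so checkpointing and migration are available
universe     = standard
executable   = op                  # the parameter-sweep binary (hypothetical)
# Per-run I/O; $(Process) expands to 0, 1, 2, ... within the cluster
input        = input.$(Process)
output       = output.$(Process)
error        = error.$(Process)
log          = sweep.log
notification = Complete
notify_user  = [email protected]
# Submit 100 instances of the job as one cluster
queue 100
```

A single `queue N` line is what turns one description into a cluster of N jobs, which suits the parameter-sweep workloads from the motivation section.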

Distributed Job Scheduler

 Description file examples


Distributed Job Scheduler

 Description file example

Distributed Job Scheduler

Standard Universe
 Runs serial jobs
 Checkpointing is not supported at the kernel level
  The source code is relinked with the Condor system call library
 Transparently produces & restarts checkpoints
 Transparently handles migration
 Automatically uses the remote access mechanism
 By default, checkpoints are stored on the local disk of the submit machine
  Configurable: they can be stored on a checkpoint server instead

Distributed Job Scheduler

Standard Universe
 Remote file access

Distributed Job Scheduler

Vanilla Universe
 Runs almost all serial jobs
 Runs any program that can run outside of Condor
 Typically relies on a shared file system between the submit machine and the other nodes
 If there is no shared file system, files will be transferred

Distributed Job Scheduler

MPI Universe
 Manages parallel programs written using MPI
 Uses only dedicated resources

Distributed Job Scheduler

PVM Universe
 Gives the ability to submit PVM applications
 PVM can ask Condor to add new machines


Distributed Job Scheduler  DAGMan Scheduler  Using directed acyclic graph(DAG) to specify dependencies

Condor: dependencies between jobs

61 62
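The dependencies DAGMan manages are declared in a DAG input file; a sketch for a four-job diamond (the submit-file names are hypothetical):

```text
# Each node is an ordinary Condor job with its own submit description file
JOB  A  a.sub
JOB  B  b.sub
JOB  C  c.sub
JOB  D  d.sub
# B and C run only after A completes; D waits for both B and C
PARENT A CHILD B C
PARENT B C CHILD D
```

DAGMan submits each node's job only when all of its parents have finished successfully, so the graph's edges directly encode the execution order.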

Distributed Job Scheduler

 DAGMan Scheduler
  Manages the submission of jobs



Condor Architecture

Condor Pool?
 A pool has a central manager and a collection of jobs and machines
 The central manager serves as a centralized repository of information about the state of the pool

Condor Architecture

Job Startup

Condor Architecture

INFN Condor pool


Grid overview

What is a Grid?
 "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"

Virtual Organization (VO)
 A dynamic set of individuals and institutions, defined by sharing rules, that share their resources to achieve a common goal

Example of a VO
 A crisis management team, together with the databases and simulation systems used to plan a response to an emergency situation


Grid overview

VO requirements
 Flexible sharing relationships
 Control over shared resources
 Usage modes
 Shared infrastructure services
 Interoperability

Since Grid technology provides a general resource-sharing framework, it can be used to address the VO requirements.

Grid Architecture

 The Grid architecture is formed as layers with an hourglass shape
 Each layer contains components sharing the same role
 A component in each layer can use the services of lower layers
 Interaction between components happens through standard protocols

Grid Architecture

Fabric
 Interface to local control
 Implements the local, resource-specific operations
 Implements resource enquiry and resource management mechanisms:
  Computational: monitoring and controlling process execution
  Storage: reading and writing files
  Network: control over network resources
  Code repository: managing versioned source code

Grid Architecture

Connectivity
 Defines the core communication and authentication protocols
  To exchange data between Fabric layer resources
 Authentication solutions must have the capability for:
  Single sign-on: log on once and have access to multiple Grid resources
  Delegation
  User-based trust relationships

Grid Architecture

Resource: Sharing Single Resources
 Defines protocols for secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources
 Two primary classes of Resource layer protocols:
  Information protocols
   Provide information about the structure and state of a resource
  Management protocols
   Negotiate access to a shared resource

Grid Architecture

Collective: Coordinating Multiple Resources
 Defines protocols that capture interactions across collections of resources
 Service examples:
  Directory services
  Co-allocation, scheduling, and brokering services
  Monitoring and diagnostics services
  Data replication services
  Grid-enabled programming systems
  Workload management systems and collaboration frameworks
  Software discovery services
  Community authorization servers


Grid Architecture

Application
 Implements the business logic
 Operates within the VO environment
 Constructed by calling services defined at any layer

Globus Toolkit

Globus
 A community of organizations and individuals developing the fundamental technologies behind the Grid

Globus Toolkit
 An open-source software toolkit providing the basic infrastructure, protocols, and services to build grids and applications

Who is involved in the Globus Alliance?
 Argonne National Laboratory's Mathematics and Computer Science Division
 The University of Southern California's Information Sciences Institute
 The University of Chicago's Distributed Systems Laboratory
 The University of Edinburgh in Scotland
 The Swedish Center for Parallel Computers
 National Computational Science Alliance
 The NASA Information Power Grid project
 …

Globus Toolkit

Projects using the Globus Toolkit
 Computer Science: Condor, DOE e-Services, GridLab, GriPhyN, NMI GridShib, NMI Performance Monitoring, OGCE, OGSA-DAI, SciDAC CoG, SciDAC Data Grid, SciDAC Security, vGRADS
 Physics: FusionGrid, LIGO, Particle Physics Data Grid
 Astronomy: Sloan Digital Sky Survey, National Virtual Observatory
 Chemistry: CMCS
 Infrastructure: ASCI (HPSS), EGEE, Grid3, GRIDS Center, iVDGL, NorduGrid, Open Science Grid, TeraGrid, UK e-Science
 Civil Engineering: NEES
 Climate Studies: LEAD, Earth System Grid
 Collaboration: Access Grid

Globus Toolkit

The Toolkit
 Includes a set of services and software components to support building Grids and their applications
 Includes a set of modules
  Each module provides an interface used by higher-level services to invoke the module's mechanisms
  Each module provides implementations that use low-level operations, making it possible to implement these mechanisms in different environments


Globus Toolkit

Fabric:
 Any resources that can be shared
  For example: a distributed file system, or Condor
 Resources are defined by vendor-supplied interfaces
 Includes enquiry software to detect resource capabilities and deliver this information to higher-level services

Globus Toolkit

Connectivity
 Grid Security Infrastructure (GSI)
 Nexus

Globus Toolkit

Resource
 Grid Resource Access and Management (GRAM)
 Grid Resource Information Protocol (GRIP)
 Grid Resource Registration Protocol (GRRP)
 GridFTP

Globus Toolkit

Collective
 Grid Information Index Servers (GIISs)
  LDAP information protocol
 Dynamically-Updated Request Online Co-allocator (DUROC)

Commonalities & Contrast

Commonalities
 Both use dedicated & non-dedicated resources
 Both provide powerful capacity

Contrast
 Globus provides tools to build grids, while Condor is software that exploits the resources of workstations to perform extensive tasks
 Condor and Globus are complementary technologies
 Condor-G is a Globus-enabled version of Condor


Inefficiencies & Possible Problems in Condor

Problems:
 One central manager
 One checkpoint server

Possible solution:
 For each one, we should have a mirror server that can be used in case the original server crashes