Hong Kong Grid and Grid Research and Deployment

Francis C.M. Lau (劉智滿博士) (with C.L. Wang and Roy Ho)
Department of Computer Science (and Information Systems)
The University of Hong Kong

HKU Systems Research Group

“Small group, big research” (literally: “little thunder, heavy rain”)

“Much shouting, little doing” (Jin Hai)

Agenda

• The Hong Kong Grid
• Grid research at HKU
  - InstantGrid/SLIM
  - JESSICA2
  - G-JavaMPI
  - LOTS DSM
  - G-Pass
• Conclusion

Hong Kong Grid

HK & Grid

• HK as a regional hub (one of Asia's network interconnection hubs)

• Interconnects major cities in Asia Pacific and the US

• Few restrictions: nobody is in charge!

Hong Kong Grid (HKGrid)

• Goals:
  - To construct a grid test bed in HK
    - Grid R&D
    - Institutions, government, industry
    - Partners in and Asia-Pacific
  - To act as a gateway for nearby grids
  - To demonstrate key R&D outcomes
• Supported by grants from the HKSAR government, HKU, etc.

HKGrid - Current Constituents

Institution                                 Computing facilities
City University of HK                       Service gateway
HK Baptist University                       2-way Xeon SMP x 64 (#300 in TOP500, 6/2003)
HK University of Science and Technology     4-way SMP cluster
The HK Institute of HPC                     Service gateway
The HK Polytechnic University               Service gateway
HKU – Computer Centre                       2-way Xeon SMP x 128 (#240 in TOP500, 11/2003)
HKU – CS Department                         Pentium 4 x 300 (#175 in TOP500, 11/2002)

Theoretical maximum computing power: 4 Tflop/s

HKGrid – Network Connections

• Links to China National Grid (CNGrid) and Asia-Pacific Grid (ApGrid) via CERNET and APAN
• Plan to connect to China Grid (if Prof. JH lets us)
• Internet2 connection to the Abilene backbone at Chicago, USA
• Plays the role of a gateway for the other, bigger grids

Performance Monitoring with Ganglia

HKGrid Launched in Cluster2003

Cluster2003 international conference

Main organizers of HKGrid, with Dr. Z.W. Xu of CNGrid

Progress

• Oct-Dec 2003
  - HKGrid Certificate Authority (CA) and middleware (GT and monitoring software) installed
  - HKGrid officially launched at Cluster2003
  - Weather forecasting demo with AIST () and Kasetsart University (Thailand)
  - Climate simulation demo at the 5th PRAGMA
• Jan-Apr 2004
  - HK Supercomputing Directory released
  - SLIM demonstration for HK Science and Technology Parks, HK Linux Industry Association, HK Government
• Ongoing work
  - Deploy our advanced grid platform in HKGrid
  - Interoperability with CNGrid, ApGrid, and other country grids

Research Projects with HKGrid

• HKBU: Knowledge grid, autonomous grid service composition
• HKCU: Agent-based wireless grid computing
• HKPU: Peer-to-peer grid, meta-scheduling, fault tolerance
• HKUST: Resource allocation and scheduling, topology optimization
• HKU
  - Computer Centre: HKU campus grid; scientific applications running across the ApGrid
  - CS: Robust Speech Recognition (Dr. Q. Huo)
  - CS: Simulation for the DNA Shuffling Experiment (Dr. T.W. Lam)
  - CS: Approximate String Matching on DNA Sequences (L.L. Cheng)
  - CS: Whole Genome Alignment via Mutation-Sensitive Sequence Similarity (Dr. T.W. Lam)
  - CS: HKU Grid Point (an 863 Project: China National Grid)
  - CS: Asia-Pacific Grid
  - ETI: Modeling of Air Quality in Hong Kong (E-Business Technology Institute with the Environmental Protection Department, HKSAR)
  - ME: Parallel Simulation of Turbulent Flow Model (Dr. C.H. Liu, Dept. of Mechanical Engineering)

Other Adoptions in Hong Kong

• Local financial institutions model the foreign exchange market and forecast exchange rates
• The Environmental Protection Department has been investigating the inter-connections of the air pollution mosaic through numerical simulation (since 2001)
• The Government plans to harness grid technologies to utilize idle PCs during off-hours
• Applications!!!!

Grid Facilities at HKU

HKU Grid Point: Grid and Cluster Software

Layer                          Software
Remote job submission          Grid middleware (Gatekeeper): Globus Toolkit (GT) 2.0, 2.4, 3.0.1
Local job scheduler            Job scheduling: OpenPBS 2.3.16, Maui 3.2.5
Programming                    HPF, Fortran 90; C, C++, Java with MPI; JESSICA2 (HKU)
IPC / network communication    Communication lib: MPICH-G2 1.2.3

Clusters: Gideon (CS), Ostrich (CS), Srgdell (CS), HPCPower (CC)

Computer Centre - HPCPOWER

IBM 2-way Xeon x 128; ranked #240 in TOP500, 11/2003

CS Department – “Self-Made” Gideon 300 PC cluster

Pentium x 300; ranked #340 (#175) in TOP500, 6/2003 (11/2002)

HKU in CNGrid (863 Project)

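China National Grid Participants

• Shanghai Supercomputer Center
• Institute of Computing Technology, Chinese Academy of Sciences
• The University of Hong Kong (CS)
• Xi'an Jiaotong University
• University of Science and Technology of China
• National University of Defense Technology
• Institute of Applied Physics, Chinese Academy of Sciences
• Tsinghua University

The grid system software developed by the Institute of Computing Technology, Chinese Academy of Sciences has linked the grid nodes of the Institute, Huazhong University of Science and Technology, and the University of Hong Kong via VEGA_GOS …

Supporting software: VEGA GOS: dynamic service deployment, single sign-on, data replication, and performance monitoring. Developed by the Institute of Computing Technology, Chinese Academy of Sciences. V.1.0 released 8

ApGrid Test Bed – Weather Forecasting

[Figure: ApGrid test bed map; numeric labels 80, 160, 32, 16, 32, 40]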

Visits, demos, making noises … (2003)

Grid Research at HKU

Grid Computing: A Refresher

• Computing as “utilities”, like electricity, water, etc.
• Advantages:
  - Cost-effectiveness
  - Platform extensibility
  - Convenience (Plug & Play)

[Diagram: resource providers supply CPU power, memory, network, storage, data, and services to end users through grid computing.]

Access to remote resources via standard protocols for cross-domain collaboration

Potential Applications

• HPC
  - High-energy/particle physics, environmental science, bioinformatics, molecular modeling, drug design, neuroscience, weather forecasting, aerospace design, earthquake simulation, …
• Grid Services
  - Video-conferencing, e-learning, supply chain mgmt., automobile manufacturing, OLTP front ends, CRM, financial analysis… and many others after convergence of OGSA and WS (i.e., WSRF)

Our Position

• Will all these apps be naturally supported by interconnecting the computing resources?

• Spec. of “commodity” grid platforms ≠ ideal execution environments for apps

• Grid middleware to bridge the gap

• Existing middleware provides the mechanisms to access remote resources, but does not address many fundamental problems

• We aim to derive solutions to these problems, and incorporate them to form an advanced grid platform (AGP)

Commodity Grids Difficult to Use

• Heterogeneous & dynamic
  - Load balancing
  - How to distribute the work?
  - Meta-scheduling vs. local autonomy
• Poor programmability
• Inconsistent software configuration
  - OS, library, middleware, …
  - Collective computation?
• Complicated security management
  - O(nm) for n grid points and m users/apps
  - User-to-host authentication?

HKU’s Advanced Grid Platform Goals

Advanced Grid Platform:
• On-demand grid construction
• Flexible execution environments
• “Grid-friendly” programming
• Consolidated security
• Load balancing

Core Components/Projects

• G-JavaMPI: load balancing, work (re-)distribution mechanisms
• JESSICA2 and LOTS: SSI for programmability
  - JESSICA: LAN-based distributed JVMs
  - LOTS: WAN-based DSM for grid
• G-Pass: grid-wide VO-centric security
• InstantGrid: on-demand grid construction
• SLIM: execution environments mgmt., dissemination mechanisms

Legend: production use, experimental, prototype stage

InstantGrid on SLIM

On-demand construction of grid points with customized execution environments

SLIM – Single Linux System Mgmt.

• A network service for constructing and managing execution environments (EEs) and disseminating them to remote computing platforms
• Grid computing decouples computing platforms (resources) from computing logic (applications)
• I.e., a single platform can run completely different applications
• Problem: different applications demand different execution environments (OS, shared libraries, supporting apps, etc.)
• The trouble of managing EEs on the resource provider’s side offsets the benefits of resource sharing

SLIM – System design

“On demand”

“Get what you need”

How does it work?
• A node sends an EE specification across the network to find the Boot Server
• The Boot Server delivers the requested Linux kernel
• The Image Server constructs an EE by collecting shared libraries, user data, etc.
• The Linux kernel boots and contacts the Image Server to “mount” the EE via a file synchronization protocol such as NFS
• Aggressive caching techniques are deployed to optimize performance

InstantGrid: On-Demand Grid Point Construction

• Integrates SLIM with GT, PBS, and Ganglia
• Modular design, supports any grid middleware
• Configure-before-disseminate, simplifies grid point management
• On-demand construction of grid points, virtually effortless on the clients’ and compute nodes’ part
• Customizable EE, consistent across the entire grid
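
The SLIM boot sequence that InstantGrid builds on (a node sends an EE specification, the Boot Server returns a kernel, the Image Server assembles and caches the EE) can be sketched as a small simulation. This is only an illustration under stated assumptions, not SLIM's actual code: the names `EESpec`, `BootServer`, and `ImageServer`, the kernel and package strings, and the in-memory cache are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EESpec:
    """Hypothetical execution-environment (EE) specification sent by a node."""
    kernel: str
    packages: tuple


class BootServer:
    """Hypothetical boot server: maps a requested kernel name to a kernel image."""

    def __init__(self, kernels):
        self.kernels = kernels

    def deliver_kernel(self, spec):
        return self.kernels[spec.kernel]


class ImageServer:
    """Hypothetical image server: assembles an EE and caches it per specification."""

    def __init__(self):
        self.cache = {}
        self.builds = 0

    def construct_ee(self, spec):
        # Build the EE only once; later requests with the same spec hit the cache.
        if spec not in self.cache:
            self.builds += 1
            self.cache[spec] = {"packages": sorted(spec.packages)}
        return self.cache[spec]


def boot_node(spec, boot_server, image_server):
    """Follow the steps above: fetch the kernel, then mount the constructed EE."""
    kernel_image = boot_server.deliver_kernel(spec)
    ee = image_server.construct_ee(spec)
    return {"kernel": kernel_image, "mounted_ee": ee}


boot = BootServer({"linux-2.4": "vmlinuz-2.4"})
images = ImageServer()
spec = EESpec(kernel="linux-2.4", packages=("globus", "openpbs", "ganglia"))
nodes = [boot_node(spec, boot, images) for _ in range(3)]
```

Three nodes requesting the same specification trigger a single EE build here, which is the kind of saving the caching bullet points at; a real deployment would also need PXE, a network file system, and cache invalidation, none of which this sketch models.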

Performance Evaluation

SLIM: 272 PCs in under 5 minutes (Linux only)

InstantGrid:
• Construct a 256-node grid point (Linux+GT+PBS+Ganglia) from scratch (PXE enabled) through Fast Ethernet: 5 minutes
• Generate host certificates for 256 machines: 9 minutes
• Total time: 14 minutes

InstantGrid/SLIM Current Progress

• Released to the general public in April 2004
• >150 downloads, from Mainland China, Hong Kong, Macau, Taiwan, USA, and Singapore
• HKSAR Government bodies, academic institutions and high schools, software development firms, and private companies
• InstantGrid/SLIM has been managing:
  - the HKU-CSIS grid point (350 nodes) for various grid research projects
  - an additional 300+ lab machines for teaching purposes (different courses have different requirements)

InstantGrid/SLIM Future Work

• To overcome the challenges in deploying InstantGrid over broadband links
  - “Pervasive grid computing”
• Standard for EE specification
• Negotiation protocols among grid points to compromise on EE spec.
  - Platform extensibility

InstantGrid/SLIM – Key References

• R.S.C. Ho, C.M. Lee, D.H.F. Hung, C.L. Wang, and F.C.M. Lau, “Managing Execution Environments for Utility Computing,” Network Research Workshop, APAN 2004, July 2004.

• Try it

(LinuxPilot 2004/04)

JESSICA2

A Java-Enabled Single-System Image Computing Architecture

A Form of “Mobile” Computing

• Applications should not be stationary: they should take advantage of the multiplicity of distributed resources to achieve efficiency (e.g., load balancing)
• If threads and processes can be migrated, then applications can, and multi-process/thread apps can execute in real/enhanced parallelism – Amoeba!
• This applies best to certain (many?) grid apps
• Support for dynamic process/thread migration should be built in from the ground up


• A Distributed Java Virtual Machine (DJVM) consisting of a group of extended JVMs running in a distributed environment

• Supports true parallel execution of a multithreaded Java application

• Java threads can freely move across node boundaries and execute in parallel

• Grid as a single machine – Single System Image (SSI)

JESSICA2 Architecture

[Figure: a multithreaded Java program runs on top of the distributed JVM; threads migrate in JIT compiler mode using portable Java frames; a Global Object Space spans the nodes.]

JESSICA2 Main Features

• Transparent Java thread migration
  - Runtime capturing and restoring of thread execution context
  - No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  - Enables dynamic load balancing on clusters
• Full-speed computation
  - JITEE: cluster-aware bytecode execution engine
  - Operated in Just-In-Time (JIT) compilation mode
  - Zero cost if no migration
• Transparent remote object access
  - Global Object Space: a shared global heap spanning all cluster nodes
  - Adaptive migrating home protocol for memory consistency, plus various optimizing schemes
  - I/O redirection

Ray Tracing on JESSICA2 (64 PCs)

Linux 2.4.18-3 kernel (Redhat 7.3)

64 nodes: 108 seconds
1 node: 4402 seconds (about 1.2 hours)
Speedup = 4402/108 ≈ 40.76
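
The speedup figure can be checked, and turned into a parallel efficiency, with a few lines of arithmetic. The timings are the ones quoted on the slide; the function names are illustrative only.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-node time to multi-node time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nodes):
    """Speedup divided by node count: 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / nodes


s = speedup(4402, 108)
e = efficiency(4402, 108, 64)
print(f"speedup = {s:.2f}, efficiency = {e:.1%}")  # speedup = 40.76, efficiency = 63.7%
```

An efficiency of roughly 64% on 64 nodes means about a third of the aggregate CPU time goes to overheads such as communication, migration, and load imbalance.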

JESSICA – Key references

• W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “A Lightweight Solution for Transparent Java Thread Migration in Just-in-Time Compilers,” The 2003 International Conference on Parallel Processing (ICPP-2003), Taiwan, Oct. 6-10, 2003, pp. 465-472.
• W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support,” IEEE Fourth International Conference on Cluster Computing (CLUSTER 2002), Chicago, USA, September 23-26, 2002, pp. 381-388.
• M.J.M. Ma, C.L. Wang, and F.C.M. Lau, “JESSICA: Java-Enabled Single-System-Image Computing Architecture,” Journal of Parallel and Distributed Computing, Vol. 60, No. 10, October 2000, pp. 1194-1222.

G-JavaMPI

A grid-enabled Java-MPI system with dynamic load-balancing via process migration

G-JavaMPI

• Goal: load balancing for grid
  - Grid’s heterogeneity & dynamicity
  - Poor parallelization of programs
• A grid-enabled implementation of the Java binding of MPI
• Transparent Java process migration (through JVMDI)
• Balances both CPU and network loads
• Communication-aware process migration policies based on:
  - the application’s communication pattern
  - available network bandwidth on grid overlays

G-JavaMPI – System design

[Figure: grid points, each with a gatekeeper and a local scheduler (LS), exchange Java-MPI communication over a WAN. A migration module (M) resides in each JVM. Migration restarts a new process through a Globus remote job request with delegated user credentials and job credentials. Steps are labeled (1) to (3) and (1*) to (3*).]

(*) Some legacy messages are redirected during migration.

G-JavaMPI – Ongoing and Future Work

• The migration mechanism has been implemented
• Requests for source code from universities in China, , and Singapore
• Future work targets process migration policies
  - CPU and network heterogeneities cause long “blocking” periods in cooperative processes, thus limiting the system throughput
  - G-JavaMPI aims to detect and eliminate “blocking” through process migration (e.g., migrating a “bottleneck” process to a faster node)
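
The idea of detecting a “bottleneck” process and moving it to a faster node can be illustrated with a toy policy. This is a hypothetical sketch, not G-JavaMPI's policy: the `ProcessStats` fields, the scoring rule (compute time minus blocked time), the `min_gain` threshold, and the node names and speeds are all assumptions, and network bandwidth is ignored even though the slides call for communication-aware policies.

```python
from dataclasses import dataclass


@dataclass
class ProcessStats:
    """Hypothetical per-process measurements over one monitoring interval."""
    rank: int
    node: str
    compute_time: float  # seconds spent computing
    blocked_time: float  # seconds spent waiting on peers


def find_bottleneck(stats):
    """Pick the process others wait on: it computes the most and blocks the least."""
    return max(stats, key=lambda p: p.compute_time - p.blocked_time)


def choose_target(bottleneck, node_speed, busy_nodes, min_gain=1.2):
    """Return the fastest idle node at least min_gain times faster, or None."""
    current = node_speed[bottleneck.node]
    candidates = [
        name for name, speed in node_speed.items()
        if name not in busy_nodes and speed >= min_gain * current
    ]
    return max(candidates, key=node_speed.get) if candidates else None


stats = [
    ProcessStats(rank=0, node="a", compute_time=2.0, blocked_time=8.0),
    ProcessStats(rank=1, node="b", compute_time=9.5, blocked_time=0.5),
    ProcessStats(rank=2, node="c", compute_time=3.0, blocked_time=7.0),
]
node_speed = {"a": 2.0, "b": 1.0, "c": 2.0, "d": 2.5, "e": 1.1}  # relative speeds
slow = find_bottleneck(stats)
target = choose_target(slow, node_speed, busy_nodes={"a", "b", "c"})
```

In this example rank 1 on the slow node `b` keeps the other ranks blocked, and the policy proposes moving it to the idle node `d`. The `min_gain` threshold stands in for the requirement that the expected benefit outweigh the cost of migration.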


Large Object Space (DSM) on Grid

LOTS: Large Object Space on Grid

[Figure: several grid nodes, each a LOTS / OS / H/W stack, share one large global object space across the grid.]

• A software distributed shared memory (DSM) system for the grid
• Provides a large distributed memory space, larger than the process space
• Uses the local hard disk to store objects that have not been used recently
• Scope consistency plus home migration to reduce redundant data traffic

G-Pass

Virtual organization-centric grid security

G-Pass

• Multi-agent and some dynamic grid systems (e.g., G-JavaMPI) demand flexible authentication schemes
• “User-to-host” is too limited
• Identity of “VO”?
• In G-Pass, each process is given a G-Pass credential, which is valid in a pre-defined, grid-wide security context
• Examples of context: names of VO, file access privileges, valid period, etc.
• G-Pass forms a foundation for secure process migration within and across grid points, which provides the needed support for our G-JavaMPI, JESSICA, and LOTS projects
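
A credential tied to a security context rather than to a host can be pictured as follows. This is a hypothetical data model for illustration, not the G-Pass format: the `GPassCredential` fields mirror the context examples listed above (VO name, file access privileges, valid period), while the field names, the check in `is_authorized`, and the sample values are assumptions. Signing and verification, which any real scheme needs, are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class GPassCredential:
    """Hypothetical per-process credential bound to a grid-wide security context."""
    process_id: str
    vo_name: str
    file_privileges: frozenset
    not_before: datetime
    not_after: datetime


def is_authorized(cred, vo_name, privilege, now):
    """Check the context only; the host the process runs on is never consulted."""
    return (
        cred.vo_name == vo_name
        and privilege in cred.file_privileges
        and cred.not_before <= now <= cred.not_after
    )


issued = datetime(2004, 7, 1, 12, 0)
cred = GPassCredential(
    process_id="rank-1",
    vo_name="example-vo",
    file_privileges=frozenset({"read"}),
    not_before=issued,
    not_after=issued + timedelta(hours=1),
)
ok_now = is_authorized(cred, "example-vo", "read", issued + timedelta(minutes=30))
ok_late = is_authorized(cred, "example-vo", "read", issued + timedelta(hours=2))
ok_write = is_authorized(cred, "example-vo", "write", issued + timedelta(minutes=30))
```

Because the check depends only on the context, the same credential remains usable after the process migrates to another grid point, which is what a user-to-host mapping cannot offer.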

Summary

• Performance
  - G-JavaMPI & JESSICA establish extensible grid platforms
  - Process/thread migration enables performance optimization and load balancing
  - LOTS supports a shared memory programming environment on a large object space (data grid applications)

• Reliability
  - G-JavaMPI migrates processes from failed machines
  - InstantGrid/SLIM help construct platforms for failover

• Convenience
  - G-JavaMPI, JESSICA, and LOTS enhance programmability
  - InstantGrid/SLIM simplify grid point management

• Security
  - G-Pass consolidates grid-wide security and enables application mobility

Conclusion

• Grid computing is a relatively new paradigm that deserves further investigation

• We identify and address the fundamental research issues in grid computing

• Our advanced grid computing platform is geared toward deployment in production grid systems

To Find Out More

• Hong Kong Grid
• Grid Computing Research Portal
• The HKU Systems Research Group
• The HK Supercomputing Directory

Thanks!

HKU Systems Research Group (12/2003)