Hong Kong Grid and Grid Research and Deployment

Francis C.M. Lau (劉智滿) (with C.L. Wang and Roy Ho)
Department of Computer Science (and Information Systems), The University of Hong Kong

HKU Systems Research Group

"Small group, big research" (雷声小,雨点大: little thunder, heavy rain)

"Much shouting, little doing" – Jin Hai (金海)

Agenda

• The Hong Kong Grid
• Grid research at HKU
  - InstantGrid/SLIM
  - JESSICA2
  - G-JavaMPI
  - LOTS DSM
  - G-Pass
• Conclusion

Hong Kong Grid

HK & Grid

• HK as a regional hub (one of Asia's network interconnection hubs)
• Interconnecting major cities in Asia Pacific & US
• Few restrictions (nobody is in charge!)

Hong Kong Grid (HKGrid)

• Goals:
  - To construct a grid test bed in HK
    - Grid R&D
    - Institutions, government, industry
    - Partners in and Asia-Pacific
  - To act as a gateway for nearby grids
  - To demonstrate key R&D outcomes
• Supported by grants from the HKSAR government, HKU, etc.
• http://www.hkgrid.org/

HKGrid - Current Constituents

Institution                               Computing facilities
City University of HK                     Service gateway
HK Baptist University                     2-way Xeon SMP x 64 (#300 in TOP500, 6/2003)
HK University of Science and Technology   4-way SMP cluster
The HK Institute of HPC                   Service gateway
The HK Polytechnic University             Service gateway
HKU – Computer Centre                     2-way Xeon SMP x 128 (#240 in TOP500, 11/2003)
HKU – CS Department                       Pentium 4 x 300 (#175 in TOP500, 11/2002)

Combined theoretical maximum computing power: 4 Tflop/s

HKGrid – Network Connections

• Links to China National Grid (CNGrid) and Asia-Pacific Grid (ApGrid) via CERNET and APAN
• Plan to connect to China Grid (if Prof. JH lets us)
• Internet2 connection to the Abilene backbone at Chicago, USA
• Plays the role of a gateway for the other, bigger grids

Performance Monitoring with Ganglia

URL: http://gideon.csis.hku.hk/hkgrid/

HKGrid Launched in Cluster2003

Cluster2003 international conference

Main Organizers of HKGrid; with Dr. Z.W. Xu of CNGrid

Progress

• Oct-Dec 2003
  - HKGrid Certificate Authority (CA) and middleware (GT and monitoring software) installed
  - HKGrid officially launched in Cluster2003
  - Weather forecasting demo with AIST () and Kasetsart University (Thailand)
  - Climate simulation demo at the 5th PRAGMA
• Jan-Apr 2004
  - HK Supercomputing Directory released
  - SLIM demonstration for HK Science and Technology Parks, HK Linux Industry Association, HK Government
• Ongoing work
  - Deploy our advanced grid platform in HKGrid
  - Interoperability with CNGrid, ApGrid, and other country grids

Research Projects with HKGrid

• HKBU: Knowledge grid, autonomous grid service composition
• HKCU: Agent-based wireless grid computing
• HKPU: Peer-to-peer grid, meta-scheduling, fault tolerance
• HKUST: Resource allocation and scheduling, topology optimization
• HKU
  - Computer Centre: HKU campus grid; scientific applications running across the ApGrid
  - CS: Robust Speech Recognition (Dr. Q. Huo)
  - CS: Simulation for the DNA Shuffling Experiment (Dr. T.W. Lam)
  - CS: Approximate String Matching on DNA Sequences (L.L. Cheng)
  - CS: Whole Genome Alignment via Mutation-Sensitive Sequence Similarity (Dr. T.W. Lam)
  - CS: HKU Grid Point (an 863 Project: China National Grid)
  - CS: Asia-Pacific Grid
  - ETI: Modeling of Air Quality in Hong Kong (E-Business Technology Institute with the Environmental Protection Department, HKSAR)
  - ME: Parallel Simulation of Turbulent Flow Model (Dr. C.H. Liu, Dept. of Mechanical Engineering)

Other Adoptions in Hong Kong

• Local financial institutions model the foreign exchange market and forecast exchange rates
• The Environmental Protection Department has been investigating the inter-connections of the air pollution mosaic through numerical simulation (since 2001)
• Government plans to harness grid technologies to utilize idle PCs during off-hours
• Applications!!!!

Grid Facilities at HKU

HKU Grid Point: Grid and Cluster Software

• Remote job submission
  - Gatekeeper: gideon.csis.hku.hk
• Grid middleware
  - Globus Toolkit (GT) 2.0, 2.4, 3.0.1
• Job scheduling (local job scheduler)
  - OpenPBS 2.3.16
  - Maui 3.2.5
• Programming
  - HPF, Fortran 90
  - C, C++, Java with MPI
  - JESSICA2 (HKU)
• Communication library (IPC / network communication)
  - MPICH-G2 1.2.3
• Clusters: Gideon (CS), Ostrich (CS), Srgdell (CS), HPCPower (CC)

Computer Centre - HPCPOWER

IBM 2-way Xeon x 128; ranked #240 in TOP500, 11/2003

CS Department – “Self-Made” Gideon 300 PC cluster

Pentium 4 x 300; ranked #340 in TOP500, 6/2003 (#175 in 11/2002)

HKU in CNGrid (863 Project)

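China National Grid Participants

• Shanghai Supercomputer Center
• Institute of Computing Technology, Chinese Academy of Sciences
• The University of Hong Kong (CS)
• Xi'an Jiaotong University
• University of Science and Technology of China
• National University of Defense Technology
• Institute of Applied Physics, Chinese Academy of Sciences
• Tsinghua University

"The grid system software developed by the Institute of Computing Technology, Chinese Academy of Sciences, has linked together the grid nodes of the Institute, Huazhong University of Science and Technology, and The University of Hong Kong, through VEGA_GOS …"

Supporting software: VEGA (织女星) GOS, providing dynamic service deployment, single sign-on, data replication, and performance monitoring. Developed by the Institute of Computing Technology, Chinese Academy of Sciences. V.1.0 released 8

ApGrid Test Bed – Weather Forecasting

[Figure: ApGrid test bed map with numeric site labels 80, 160, 32, 16, 32, 40]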

Visits, demos, making noises … (2003)

Grid Research at HKU

Grid Computing: A Refresher

• Computing as “utilities”, like electricity, water, etc.
• Advantages:
  - Cost-effectiveness
  - Platform extensibility
  - Convenience (Plug & Play)

[Diagram: resource providers (CPU power, memory, network, storage, data, services) are linked through grid computing to end users; access to remote resources is via standard protocols for cross-domain collaboration]

Potential Applications

• HPC
  - High-energy/particle physics, environmental science, bioinformatics, molecular modeling, drug design, neuroscience, weather forecasting, aerospace design, earthquake simulation, …
• Grid Services
  - Video-conferencing, e-learning, supply chain mgmt., automobile manufacturing, OLTP front ends, CRM, financial analysis… and many others after convergence of OGSA and WS (i.e., WSRF)

Our Position

• Will all these apps be naturally supported by interconnecting the computing resources?
• Spec. of “commodity” grid platforms ≠ ideal execution environments for apps
• Grid middleware to bridge the gap
• Existing middleware provides the mechanisms to access remote resources, but does not address many fundamental problems
• We aim to derive solutions to these problems, and incorporate them to form an advanced grid platform (AGP)

Commodity Grids Difficult to Use

• Heterogeneous & dynamic
  - Load balancing
  - How to distribute the work?
  - Meta-scheduling vs. local autonomy
• Poor programmability
• Inconsistent software configuration
  - OS, library, middleware, …
  - Collective computation?
• Complicated security management
  - O(nm) for n grid points and m users/apps
  - User-to-host authentication?
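The O(nm) security cost can be made concrete with a small count. The sketch below is illustrative only: the function names and the example figures (7 grid points, 100 users) are assumptions, and the VO-based alternative is a simplified model of group-level trust, not a description of any deployed scheme.

```python
def user_to_host_mappings(grid_points: int, users: int) -> int:
    """Every user needs an account mapping at every grid point: O(n*m)."""
    return grid_points * users


def vo_based_mappings(grid_points: int, users: int) -> int:
    """Assumed alternative: each user registers once with a virtual
    organization (VO) and each grid point trusts the VO once: O(n + m)."""
    return grid_points + users


per_host = user_to_host_mappings(7, 100)
per_vo = vo_based_mappings(7, 100)
print(per_host, per_vo)
```

Under these assumptions, adding one grid point costs one new trust relationship in the VO model, but one new mapping per user in the user-to-host model.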

HKU’s Advanced Grid Platform Goals

Advanced Grid Platform:
• On-demand grid construction
• Flexible execution environments
• “Grid-friendly” programming
• Consolidated security
• Load balancing

Core Components/Projects

• G-JavaMPI: load balancing, work (re-)distribution mechanisms
• JESSICA2 and LOTS: SSI for programmability
  - JESSICA: LAN-based distributed JVMs
  - LOTS: WAN-based DSM for grid
• G-Pass: grid-wide VO-centric security
• InstantGrid: on-demand grid construction
• SLIM: execution environments mgmt., dissemination mechanisms

Legend: Production use / Experimental / Prototype stage

InstantGrid on SLIM

On-demand construction of grid points with customized execution environments

SLIM – Single Linux System Mgmt.

• A network service for constructing and managing execution environments (EEs), and disseminating them to remote computing platforms
• Grid computing decouples computing platforms (resources) from computing logic (applications)
• I.e., a single platform can run completely different applications
• Problem: different applications demand different execution environments (OS, shared libraries, supporting apps, etc.)
• The trouble of managing EEs on the resource provider’s side offsets the benefits of resource sharing

SLIM – System design

“On demand”

“Get what you need”

How does it work?
• A node sends an EE specification across the network to find the Boot Server
• The Boot Server delivers the requested Linux kernel
• The Image Server constructs an EE by collecting shared libraries, user data, etc.
• The Linux kernel boots, and contacts the Image Server to “mount” the EE via a network file system protocol such as NFS
• Aggressive caching techniques are deployed to optimize performance

InstantGrid: On-Demand Grid Point Construction

• Integrates SLIM with GT, PBS, and Ganglia
• Modular design, supports any grid middleware
• Configure-before-disseminate, simplifies grid point management
• On-demand construction of grid points, virtually effortless on the clients’ and compute nodes’ part
• Customizable EE, consistent across the entire grid
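The idea of building an EE once from a specification and then disseminating the same image to many nodes can be sketched as follows. This is a toy model, not SLIM's implementation: `EESpec`, `ImageServer`, and `get_image` are hypothetical names, and the kernel and package strings are placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EESpec:
    """Hypothetical execution-environment specification sent by a node."""
    kernel: str
    packages: tuple


class ImageServer:
    """Toy image server: assembles an EE once per distinct spec, then
    serves the cached image to every later request (assumed caching model)."""

    def __init__(self):
        self._cache = {}
        self.builds = 0

    def get_image(self, spec: EESpec) -> dict:
        if spec not in self._cache:
            self.builds += 1  # expensive assembly happens only on a cache miss
            self._cache[spec] = {
                "kernel": spec.kernel,
                "files": sorted(spec.packages),
            }
        return self._cache[spec]


server = ImageServer()
spec = EESpec(kernel="linux-2.4", packages=("globus", "openpbs", "ganglia"))
images = [server.get_image(spec) for _ in range(256)]  # 256 nodes, one spec
print(server.builds, images[0]["files"])
```

Because every node receives the same cached image, the EE stays consistent across the grid point, which is the property the configure-before-disseminate approach relies on.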

Performance Evaluation

SLIM: 272 PCs in under 5 minutes (Linux only)

InstantGrid:
• Construct a 256-node grid point (Linux+GT+PBS+Ganglia) from scratch (PXE enabled) through Fast Ethernet: 5 minutes
• Generate host certificates for 256 machines: 9 minutes
• Total time: 14 minutes

InstantGrid/SLIM Current Progress

• Released to the general public in April 2004
• >150 downloads, from Mainland China, Hong Kong, Macau, Taiwan, USA, and Singapore
• HKSAR Government bodies, academic institutions and high schools, software development firms, and private companies
• InstantGrid/SLIM has been managing:
  - the HKU-CSIS grid point (350 nodes) for various grid research projects
  - an additional 300+ lab machines for teaching purposes (different courses have different requirements)

InstantGrid/SLIM Future Work

• To overcome the challenges in deploying InstantGrid over broadband links
  - “Pervasive grid computing”
• Standard for EE specification
• Negotiation protocols among grid points to compromise on EE spec.
  - Platform extensibility

InstantGrid/SLIM – Key References

• http://slim.csis.hku.hk/
• R.S.C. Ho, C.M. Lee, D.H.F. Hung, C.L. Wang, and F.C.M. Lau, “Managing Execution Environments for Utility Computing,” Network Research Workshop, APAN 2004, July 2004.
• Try it (LinuxPilot 2004/04)

JESSICA2

A Java-Enabled Single-System Image Computing Architecture

A Form of “Mobile” Computing

• Applications should not be stationary: they should move to take advantage of the multiplicity of distributed resources and to achieve efficiency (e.g., load balancing)
• If threads and processes can be migrated, then applications can, and multi-process/thread apps can execute in real/enhanced parallelism – Amoeba!
• This applies best to certain (many?) grid apps
• Support for dynamic process/thread migration should be built in from the ground up

JESSICA2

• A Distributed Java Virtual Machine (DJVM) consisting of a group of extended JVMs running in a distributed environment
• Supports true parallel execution of a multithreaded Java application
• Java threads can freely move across node boundaries and execute in parallel
• Grid as a single machine – Single System Image (SSI)

JESSICA2 Architecture

[Diagram: a multithreaded Java program runs on six JESSICA2 JVMs (one master, five workers) in JIT compiler mode; threads migrate between JVMs as portable Java frames; a Global Object Space spans all JVMs]

JESSICA2 Main Features

• Transparent Java thread migration
  - Runtime capturing and restoring of thread execution context
  - No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  - Enables dynamic load balancing on clusters
• Full-speed computation
  - JITEE: cluster-aware bytecode execution engine
  - Operates in Just-In-Time (JIT) compilation mode
  - Zero cost if no migration
• Transparent remote object access
  - Global Object Space: a shared global heap spanning all cluster nodes
  - Adaptive migrating-home protocol for memory consistency, plus various optimizing schemes
  - I/O redirection
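The general idea behind a migrating-home protocol can be illustrated with a toy policy: an object's home moves to a remote node once that node has written to it several times in a row. The sketch is an assumption-laden illustration, not JESSICA2's actual protocol; the `SharedObject` class, the consecutive-write rule, and the threshold of 3 are all hypothetical.

```python
class SharedObject:
    """Toy shared object whose home node follows its dominant writer."""

    def __init__(self, home: int, threshold: int = 3):
        self.home = home
        self.threshold = threshold  # assumed tuning knob
        self._last_writer = None
        self._streak = 0

    def write(self, node: int) -> None:
        if node == self.home:
            # Writes at the home node are local and reset the remote streak.
            self._last_writer, self._streak = None, 0
            return
        if node == self._last_writer:
            self._streak += 1
        else:
            self._last_writer, self._streak = node, 1
        if self._streak >= self.threshold:
            # Move the home to the node that keeps writing remotely.
            self.home = node
            self._last_writer, self._streak = None, 0


obj = SharedObject(home=0)
for writer in (1, 1, 2, 2, 2):
    obj.write(writer)
print(obj.home)
```

After migration the frequent writer accesses the object locally, which is the kind of redundant traffic reduction a home-migration scheme aims for.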

Ray Tracing on JESSICA2 (64 PCs)

Linux 2.4.18-3 kernel (Redhat 7.3)

64 nodes: 108 seconds
1 node: 4402 seconds (about 1.2 hours)
Speedup = 4402/108 ≈ 40.76
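The speedup figure, and the parallel efficiency it implies, can be checked with a few lines. Only the two timings and the node count come from the slide; the variable names are illustrative.

```python
t_single = 4402   # seconds on 1 node (from the slide)
t_parallel = 108  # seconds on 64 nodes (from the slide)
nodes = 64

speedup = t_single / t_parallel
efficiency = speedup / nodes  # fraction of ideal linear speedup

print(f"speedup = {speedup:.2f}")
print(f"efficiency = {efficiency:.1%}")
```

This prints a speedup of 40.76 and an efficiency of 63.7%, i.e. roughly two thirds of ideal linear scaling on 64 nodes.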

JESSICA – Key references

• W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “A Lightweight Solution for Transparent Java Thread Migration in Just-in-Time Compilers,” The 2003 International Conference on Parallel Processing (ICPP-2003), Taiwan, October 6-10, 2003, pp. 465-472.
• W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support,” IEEE Fourth International Conference on Cluster Computing (CLUSTER 2002), Chicago, USA, September 23-26, 2002, pp. 381-388.
• M.J.M. Ma, C.L. Wang, and F.C.M. Lau, “JESSICA: Java-Enabled Single-System-Image Computing Architecture,” Journal of Parallel and Distributed Computing, Vol. 60, No. 10, October 2000, pp. 1194-1222.

G-JavaMPI

A grid-enabled Java-MPI system with dynamic load-balancing via process migration

• Goal: load balancing for grid
  - Grid’s heterogeneity & dynamicity
  - Poor parallelization of programs
• A grid-enabled implementation of the Java binding of MPI
• Transparent Java process migration (through JVMDI)
• Balances both CPU and network loads
• Communication-aware process migration policies based on:
  - the application’s communication pattern
  - available network bandwidth on grid overlays
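A communication-aware policy of the kind listed above can be sketched as a cost estimate: for each candidate host, estimate the time the process would spend exchanging data with its peers, then pick the cheapest host. This is a minimal sketch under assumed inputs; `comm_cost`, `best_host`, the host names, and the traffic and bandwidth figures are all hypothetical and not taken from G-JavaMPI.

```python
def comm_cost(host, peer_hosts, traffic_mb, bandwidth_mbps):
    """Estimated seconds to exchange traffic with every peer if the
    process ran on `host`. Traffic to a peer on the same host is free."""
    total = 0.0
    for peer, mb in zip(peer_hosts, traffic_mb):
        if peer != host:
            total += mb * 8 / bandwidth_mbps[frozenset((host, peer))]
    return total


def best_host(candidates, peer_hosts, traffic_mb, bandwidth_mbps):
    """Pick the candidate host with the lowest estimated communication cost."""
    return min(
        candidates,
        key=lambda h: comm_cost(h, peer_hosts, traffic_mb, bandwidth_mbps),
    )


# Assumed example: the process exchanges 100 MB with a peer on B
# and 10 MB with a peer on C; the A-B link is the slow one.
bandwidth = {
    frozenset(("A", "B")): 10.0,
    frozenset(("A", "C")): 100.0,
    frozenset(("B", "C")): 100.0,
}
choice = best_host(["A", "B", "C"], ["B", "C"], [100.0, 10.0], bandwidth)
print(choice)
```

In this example the process is placed next to its heaviest communication partner, which avoids the slow link entirely.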

G-JavaMPI – System design

[Diagram: Java-MPI processes at several grid points communicate across the WAN. Each grid point has a Globus gatekeeper and a local scheduler (LS), and a migration module (M) resides in each JVM. A process migrates by restarting as a new process through a Globus remote job request carrying delegated user credentials and Java-MPI job credentials. (*) Some legacy messages are redirected during migration.]

G-JavaMPI – Ongoing and Future Work

• The migration mechanism has been implemented
• Requests for source code from universities in China, , and Singapore
• Future work targets process migration policies
  - CPU and network heterogeneities cause long “blocking” periods in cooperative processes, thus limiting the system throughput
  - G-JavaMPI aims to detect and eliminate “blocking” through process migration (e.g., migrating a “bottleneck” process to a faster node)
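The effect of a "bottleneck" process on cooperative processes can be illustrated with a small model: in a lock-step computation, every process waits for the slowest one at each step. The sketch below is an assumption for illustration only; the function names, process names, and timings are hypothetical and do not describe G-JavaMPI's actual detection logic.

```python
def find_bottleneck(step_seconds: dict) -> str:
    """Return the process with the longest per-step compute time."""
    return max(step_seconds, key=step_seconds.get)


def blocking_time(step_seconds: dict) -> float:
    """Total time per step that processes spend waiting for the slowest one."""
    slowest = max(step_seconds.values())
    return sum(slowest - t for t in step_seconds.values())


# Assumed per-step compute times (seconds) on the current nodes.
steps = {"p0": 2.0, "p1": 2.5, "p2": 6.0}
bottleneck = find_bottleneck(steps)
before = blocking_time(steps)

# Assume migrating the bottleneck to a faster node cuts its step time to 2.25 s.
after = blocking_time({**steps, bottleneck: 2.25})
print(bottleneck, before, after)
```

With these made-up numbers, moving one process removes most of the waiting time across the whole group, which is why a migration policy that targets the bottleneck can raise throughput.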

LOTS

Large Object Space (DSM) on Grid

[Diagram: several nodes, each a stack of LOTS over OS over hardware, connected through the grid and sharing one large global object space]

• A software distributed shared memory (DSM) system for the grid
• Provides a large distributed memory space, larger than the process space
• Uses local hard disk to store recently unused objects
• Scope Consistency + Home Migration to reduce redundant data traffic

G-Pass

Virtual organization-centric grid security

• Multi-agent and some dynamic grid systems (e.g., G-JavaMPI) demand flexible authentication schemes
• “User-to-host” is too limited
• Identity of “VO”?
• In G-Pass, each process is given a G-Pass credential, which is valid in a pre-defined, grid-wide security context
• Examples of context: names of VO, file access privileges, valid period, etc.
• G-Pass forms a foundation for secure process migration within and across grid points, which provides the needed support for our G-JavaMPI, JESSICA, and LOTS projects

Summary

• Performance
  - G-JavaMPI & JESSICA establish extensible grid platforms
  - Process/thread migration enables performance optimization and load balancing
  - LOTS supports a shared memory programming environment on a large object space (data grid applications)
• Reliability
  - G-JavaMPI migrates processes from failed machines
  - InstantGrid/SLIM help construct platforms for failover
• Convenience
  - G-JavaMPI, JESSICA, and LOTS enhance programmability
  - InstantGrid/SLIM simplify grid point management
• Security
  - G-Pass consolidates grid-wide security and enables application mobility

Conclusion

• Grid computing is a relatively new paradigm that deserves further investigation
• We identify and address the fundamental research issues in grid computing
• Our advanced grid computing platform is geared for deployment in production grid systems

To Find Out More

• Hong Kong Grid: http://www.hkgrid.org/
• Grid Computing Research Portal: http://grid.csis.hku.hk/
• The HKU Systems Research Group: http://www.srg.csis.hku.hk/
• The HK Supercomputing Directory: http://www.hkhpc.org/~SuperDir/

Thanks!

HKU Systems Research Group (12/2003)