OpenSSI (Single System Image) Linux Cluster Project (openssi.org)


Jeff Edlund, Senior Principal Solution Architect, NSP
Bruce Walker, Staff Fellow, Office of Strategy & Technology

© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Agenda
• What are today's clustering strategies for Linux?
• Why isn't failover clustering enough?
• What is Single System Image (SSI)?
• Why is SSI so important?
• OpenSSI Cluster Project architecture
• Project status
07/10/03 2

Many Types of Clusters
• High-performance clusters
  ¦ Beowulf; 1000 nodes; parallel programs; MPI
• Load-leveling clusters
  ¦ move processes around to borrow cycles (e.g. Mosix)
• Web-service clusters
  ¦ LVS/Piranha; load-level TCP connections; replicate data
• Storage clusters
  ¦ GFS; parallel filesystems; same view of data from each node
• Database clusters
  ¦ Oracle Parallel Server
• High-availability clusters
  ¦ ServiceGuard, LifeKeeper, FailSafe, heartbeat, failover clusters
• Single System Image clusters

Who Is Doing SSI Clustering?
• Outside Linux:
  ¦ Compaq/HP with VMSclusters, TruClusters, NSK, and NSC
  ¦ Sun had "Full Moon"/Solaris MC (now SunClusters)
  ¦ IBM Sysplex?
• Linux SSI:
  ¦ Scyld: a form of SSI via Bproc
  ¦ Mosix: a form of SSI due to its home-node/process-migration technique; also looking at a single root filesystem
  ¦ PolyServe: a form of SSI via CFS (Cluster File System)
  ¦ Qlusters: SSI through a software/middleware layer
  ¦ RedHat GFS: Global File System (based on Sistina)
  ¦ Hive Computing: declarative programming model for "workers"
  ¦ OpenSSI Cluster Project: an SSI project to bring all attributes together

Scyld - Beowulf
Bproc (used by Scyld):
  ¦ process-related solution
  ¦ master node with slaves
  ¦ initiate a process on the master node and explicitly "move", "rexec" or "rfork" it to a slave node
  ¦ all files are closed when the process is moved
  ¦ the master node can "see" all the processes that were started there
  ¦ moved processes see the process space of the master (some pid mapping)
  ¦ process system calls are shipped back to the master node (including fork)
  ¦ other system calls execute locally, but not SSI

Mosix
Mosix / OpenMosix:
  ¦ home nodes with slaves
  ¦ initiate a process on its home node and transparently migrate it to other nodes
  ¦ the home node can "see" all, and only, the processes started there
  ¦ moved processes see the view of the home node
  ¦ most system calls are actually executed back on the home node
  ¦ DFSA helps to allow I/O to be local to the process

PolyServe
Matrix Server:
  ¦ completely symmetric Cluster File System with DLM (no master/slave relationships)
  ¦ each node must be directly attached to the SAN
  ¦ limited SSI for management
  ¦ no SSI for processes
  ¦ no load balancing

Qlusters
ClusterFrame:
  ¦ based on Mosix; uses home-node SSI
  ¦ centralized policy-based management (ClusterFrame QRM, the Qlusters Resource Manager)
  ¦ reduces overhead
  ¦ pre-determined resource allowances
  ¦ centralized provisioning
  [Product diagram: ClusterFrame XHA (Xtreme High Availability) and ClusterFrame SSI (Single System Image Enterprise Cluster) application components, with ClusterFrame QRM, on the ClusterFrame Platform]
  ¦ stateful application recovery

RedHat GFS - Global File System
RedHat Cluster Suite (GFS):
  ¦ formerly Sistina
  ¦ primarily a parallel physical file system (the only real form of SSI here)
  ¦ used in conjunction with the RedHat cluster manager to provide
    • high availability
    • IP load balancing
  ¦ limited sharing and no process load balancing

Hive Computing - Tsunami
Hive Creator:
  ¦ hives can be made up of any number of IA32 machines
  ¦ hives consist of:
    • client applications
    • the Hive client API
    • workers
    • worker applications
  ¦ databases exist outside of the Hive
  ¦ applications must be modified to run in a Hive
  ¦ no Cluster File System
  ¦ closer to a Grid model than to SSI

Are There Opportunity Gaps in the Current SSI Offerings? YES!!
A full SSI solution is the foundation for simultaneously addressing all the issues in all the cluster solution areas. It is an opportunity to combine:
• high availability
• IP load balancing
• IP failover
• process load balancing
• a cluster filesystem
• a Distributed Lock Manager
• a single namespace
• much more ...

What Is a Full Single System Image Solution?
The complete cluster looks like a single system to:
  ¦ users;
  ¦ administrators;
  ¦ programs and programmers.
Co-operating OS kernels provide transparent access to all OS resources cluster-wide, using a single namespace.
  ¦ A.K.A.: you don't really know it's a cluster! The state of cluster nirvana.

SMP - Symmetrical Multi-Processing Functionality

  Function               SMP
  Manageability          Yes
  Usability              Yes
  Sharing / Utilization  Yes
  High Availability
  Scalability
  Incremental Growth
  Price / Performance

Value Add of HA Clustering to SMP

  Function               SMP   Traditional Clusters
  Manageability          Yes
  Usability              Yes
  Sharing / Utilization  Yes
  High Availability            Yes
  Scalability                  Yes
  Incremental Growth           Yes
  Price / Performance          Yes

SSI Clusters Have the Best of Both!!
  Function               SMP   Traditional Clusters   SSI Clusters
  Manageability          Yes                          Yes
  Usability              Yes                          Yes
  Sharing / Utilization  Yes                          Yes
  High Availability            Yes                    Yes
  Scalability                  Yes                    Yes
  Incremental Growth           Yes                    Yes
  Price / Performance          Yes                    Yes

Common Clustering Goals
One or all of:
• High availability
  ¦ a compute engine is always available to run my workload
• Scalability
  ¦ as I need more resource I can access it, transparently to the end-user application
• Manageability
  ¦ I can guarantee some level of service because I can efficiently monitor, operate and service my compute resources
• Usability
  ¦ compute resources are assembled in such a way as to give me trouble-free, easy operation without requiring knowledge of the cluster

OpenSSI Linux Cluster Project
[Chart: the OpenSSI Linux Cluster Project plotted against the ideal/perfect cluster in all dimensions - availability, scalability (log scale, "really big" to "huge"), manageability and usability - compared with an SMP and a typical HA cluster]

Overview of the OpenSSI Cluster
• Single HA root filesystem
• Consistent OS kernel on each node
• Cluster formation early in boot
• Strong membership
• Single, clusterwide view of files, filesystems, devices, processes and IPC objects
• Single management domain
• Load balancing of connections and processes

OpenSSI Cluster Project - Availability
• No single (or even multiple) point(s) of failure
• Automatic failover/restart of services in the event of hardware or software failure
• Application availability is simpler in an SSI cluster environment; stateful restart is easily done
• An SSI cluster provides a simpler operator and programming environment
• Online software upgrade
• Architected to avoid scheduled downtime

OpenSSI Cluster Project - Price/Performance Scalability
• What is scalability?
  ¦ Environmental scalability and application scalability!
• Environmental (cluster) scalability:
  ¦ more USEABLE processors, memory, I/O, etc.
  ¦ SSI makes these added resources useable

OpenSSI Cluster Project - Price/Performance Scalability
Application scalability:
• SSI makes distributing function very easy
• SSI allows sharing of resources between processes on different nodes (all resources are transparently visible from all nodes):
  ¦ filesystems, IPC, processes, devices*, networking*
• SSI allows replicated instances to co-ordinate (almost as easily as replicated instances on an SMP; in some ways much better)
• Load balancing of connections and processes
• OS version in local memory on each node
• Industry-standard hardware (can mix hardware)
• Distributed OS algorithms written to scale to hundreds of nodes (and successfully demonstrated to 133 blades and 27 Itanium SMP nodes)

OpenSSI Linux Cluster - Manageability
• Single installation
  ¦ joining the cluster is automatic as part of booting and doesn't have to be managed
• Trivial online addition of new nodes
• Use standard single-node tools (SSI administration)
• Visibility of all resources of all nodes from any node
  ¦ applications, utilities, programmers, users and administrators often needn't be aware of the SSI cluster
• Simpler HA (High Availability) management

OpenSSI Linux Cluster - Single System Administration
• Single set of user accounts (not NIS)
• Single set of filesystems (no "network mounts")
• Single set of devices
• Single view of networking
• Single set of services (printing, dumps, networking*, etc.)
• Single root filesystem (lots of admin files there)
• Single set of paging/swap spaces (goal)
• Single install
• Single boot and single copy of the kernel
• Single-machine management tools

OpenSSI Linux Cluster - Ease of Use
• Can run anything anywhere with no setup
• Can see everything from any node
• Service failover/restart is trivial
• Automatic or manual load balancing
  ¦ a powerful environment for application-service provisioning, monitoring and re-arranging as needed

Blades and OpenSSI Clusters
¦ Very simple provisioning of hardware, system and applications
¦ No root filesystem per node
¦ Single install of the system and a single application install
¦ Nodes can netboot
¦ Local disk only needed for swap, but can be shared
¦ Blades don't need FCAL connect, but can use it
¦ Single, highly available IP address for the cluster
¦ Single system update
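Two ideas the deck keeps returning to - least-loaded process placement and automatic restart of work when a node leaves the cluster - can be sketched in a few lines. This is a hypothetical toy model, not OpenSSI code; every class and method name here is invented for illustration, and "load" is simplified to a per-node process count.

```python
class Cluster:
    """Toy model of SSI-style placement, clusterwide visibility and failover."""

    def __init__(self, node_ids):
        # one process list per node; list length stands in for node load
        self.nodes = {n: [] for n in node_ids}

    def spawn(self, proc):
        """Place a new process on the least-loaded node (load balancing)."""
        target = min(self.nodes, key=lambda n: len(self.nodes[n]))
        self.nodes[target].append(proc)
        return target

    def processes(self):
        """Single clusterwide view: every process is visible from any node."""
        return sorted(p for procs in self.nodes.values() for p in procs)

    def node_failed(self, node):
        """Membership change: restart the failed node's work elsewhere."""
        orphaned = self.nodes.pop(node)
        return [self.spawn(p) for p in orphaned]


cluster = Cluster(["node1", "node2", "node3"])
for pid in range(6):
    cluster.spawn(f"proc{pid}")

print(cluster.processes())       # all six processes, seen cluster-wide
cluster.node_failed("node1")
print(len(cluster.processes()))  # still six: the orphans were restarted
```

A real SSI kernel does this transparently and preserves process state across the move; the sketch only shows the scheduling and membership bookkeeping that the "load balancing of connections and processes" and "automatic failover/restart" bullets describe.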