A Hybrid Beowulf Cluster

Escola Tècnica Superior d'Enginyeria Electrònica i Informàtica La Salle
Final Thesis, Electronic Engineering

Student: Gonçal Roch Colom
Tutors: Dr Joan Verdaguer-Codina, Dr Jordi Margalef i Marrugat

MINUTES OF THE FINAL DEGREE THESIS EXAMINATION

The evaluating panel, meeting on this day, heard the student Gonçal Roch Colom present their final thesis on the following subject: A Beowulf Cluster with Intel, AMD and ARM Nodes for Teaching and Research. At the end of the presentation, and upon the student answering the questions of the members of the panel, this thesis was awarded the following grade:

Barcelona,

MEMBER OF THE PANEL          MEMBER OF THE PANEL          PRESIDENT OF THE PANEL

A Hybrid Beowulf Cluster
Gonçal Roch-Colom
ETSEEI La Salle, Universitat Ramon Llull
Tutor: Dr Joan Verdaguer-Codina
Co-Tutor: Dr Jordi Margalef
Year of Presentation: 2013

Abstract

Every day, all over the world, companies, public and private institutions, and households alike discard thousands of old computers. Most are perfectly fine, and some are still quite powerful, but they are being replaced with brand-new, x86-based units, be they PCs or Macs. Corruption in existing Windows installations, minor hardware faults, omissions in manually updating the hardware, or generally being deemed 'too old' often lead to their demise. Recycling those forlorn but fully functional pieces of hardware into nodes of a powerful computer cluster for high-performance distributed computing seems not only a fascinating challenge but a worthy cause as well, especially in the teaching arena. Extra spice shall be thrown in, in the shape of ARM SoCs, a building block to prepare our students for their future role in society.

Sumari

Every day, all over the world, companies, public and private bodies, and private individuals throw away thousands of old computers. Most are perfectly fine, and some are still very powerful, but they are being replaced with new, x86-based machines, whether PCs or Macs. Corruption in the installed Windows systems, minor hardware problems or a lack of hardware updates, or the computers being deemed "already too old" usually lead to their disappearance. Recycling this perfectly valid hardware by making it part of a computer cluster for high-performance distributed computing strikes us not only as a fascinating challenge but also as a good cause, especially as applied to education. We shall add an extra touch by also using ARM-based SoCs, essential building blocks in preparing our students for their future.

Acknowledgements

I wish to thank Joan Verdaguer for supplying ideas, for his ceaseless manuscript follow-up and improvement, and for believing in me over the years; Jordi Margalef for his help and academic guidance; my wife for encouraging me to pursue this endeavour when time was the scarcest resource; my colleague Miquel Soler for helping out and brainstorming with me; my parents for believing in learning; Steven Vickers for his witty Jupiter Ace User's Guide, which shaped my mind in my mid-teens; all involved in the inception of the ARM architecture, which has been a source of fascination to me; and, last but not least, the great Charles Dickens, whose work deeply influenced my life.
Table of Contents

1 Motivation and Objectives
  1.1 Motivation
    1.1.1 Reasoning behind the Motivation
    1.1.2 Personal Motives
  1.2 Objectives
    1.2.1 Using this Project in the Academic World
      1.2.1.1 Students
      1.2.1.2 Teachers
      1.2.1.3 Teacher Training
  1.3 Open Source
2 A Changing Teaching Environment
  2.1 A Revolution in Teaching: Using Computers
  2.2 The Initiatives
    2.2.1 Britain
  2.3 The Everis Poll, STEM
  2.4 Linux, not Windows
3 Beowulf and its Background
  3.1 Definition of a Beowulf Cluster
  3.2 Hybrid and Heterogeneous
4 Brief History of Architectures
  4.1 x86 Dominates the PC Scene
  4.2 The Relevance of ARM
    4.2.1 The ARM in the Pi
  4.3 x86 vs ARM
  4.4 Tablets, Smartphones, Netbooks and Aspiring Desktops
  4.5 1980s – Occam and Transputers
5 Brief History of OSs
  5.1 MS-DOS and Windows
    5.1.1 The Demise of the Home PC
    5.1.2 Apple's Dominance and the Personal Computer
    5.1.3 The Future of the Personal Computer
  5.2 Early 1990s
    5.2.1 MINIX
    5.2.2 Linux
  5.3 1990s
  5.4 2000s: Render Farms
  5.5 Beowulf as a Valid Alternative
6 The Future
  6.1 Mont-Blanc
  6.2 UPC's Scientific, Technical and Educational Training
7 The Raspberry Pi
8 A Heterogeneous Cluster
  8.1 Symmetric Load Balancing
    8.1.1 Finest Grain: the Coimbra Approach
    8.1.2 Benchmarking Nodes: the HINT Benchmark
  8.2 Asymmetric Load Balancing
  8.3 Quibbles
  8.4 Potential for Further Research
9 HBC Overview
  9.1 Network Topology
    9.1.1 Head / Master Node: RaspberryPi
      9.1.1.1 RaspberryPi99
      9.1.1.2 Rest of Nodes
  9.2 Software
    9.2.1 Raspbian OS on Master and Compute Nodes
    9.2.2 Ubuntu 10.04.4 OS on Compute Nodes
    9.2.3 Rest of Software Needed
10 Hardware – Building the System
  10.1 Picking up the Bits
  10.2 The Dell PowerEdge 1500SC Server
  10.3 The 3 HP Kayaks
  10.4 The Webgine 1115XL Laptop Hardware
  10.5 The HP Pavilion AMD64-based Laptop
  10.6 The AMD Athlon-based Brandless Desktop
  10.7 The Raspberry Pi
  10.8 Communications
  10.9 Physical Layout: Positioning the PCs
11 Network – Building the System
  11.1 The Downton Cluster
  11.2 IP Addresses and Node Names
12 Software – Building the System
  12.1 Previous Considerations
  12.2 General Procedure on PCs
    12.2.1 Partitioning