Programming Models to Support Data Science Workflows
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Evaluating DDS, MQTT, and Zeromq Under Different Iot Traffic Conditions
Evaluating DDS, MQTT, and ZeroMQ Under Different IoT Traffic Conditions Zhuangwei Kang Robert Canady Abhishek Dubey Vanderbilt University Vanderbilt University Vanderbilt University Nashville, Tennessee Nashville, Tennessee Nashville, Tennessee [email protected] [email protected] [email protected] Aniruddha Gokhale Shashank Shekhar Matous Sedlacek Vanderbilt University Siemens Technology Siemens Technology Nashville, Tennessee Princeton, New Jersey Munich, Germany [email protected] [email protected] [email protected] Abstract Keywords: Publish/Subscribe Middleware, Benchmark- ing, MQTT, DDS, ZeroMQ, Performance Evaluation Publish/Subscribe (pub/sub) semantics are critical for IoT applications due to their loosely coupled nature. Although OMG DDS, MQTT, and ZeroMQ are mature pub/sub solutions used for IoT, prior studies show that their performance varies significantly under different 1 Introduction load conditions and QoS configurations, which makes Distributed deployment of real-time applications and middleware selection and configuration decisions hard. high-speed dissemination of massive data have been hall- Moreover, the load conditions and role of QoS settings in marks of the Internet of Things (IoT) platforms. IoT prior comparison studies are not comprehensive and well- applications typically adopt publish/subscribe (pub/- documented. To address these limitations, we (1) propose sub) middleware for asynchronous and cross-platform a set of performance-related properties for pub/sub mid- communication. OMG Data Distribution Service (DDS), dleware and investigate their support in DDS, MQTT, ZeroMQ, and MQTT are three representative pub/sub and ZeroMQ; (2) perform systematic experiments under technologies that have entirely different architectures (de- three representative, lab-based real-world IoT use cases; centralized data-centric, decentralized message-centric, and (3) improve DDS performance by applying three and centralized message-centric, respectively). -
This Paper Must Be Cited As
Document downloaded from: http://hdl.handle.net/10251/64607 This paper must be cited as: Luzuriaga Quichimbo, JE.; Pérez, M.; Boronat, P.; Cano Escribá, JC.; Tavares De Araujo Cesariny Calafate, CM.; Manzoni, P. (2015). A comparative evaluation of AMQP and MQTT protocols over unstable and mobile networks. 12th IEEE Consumer Communications and Networking Conference (CCNC 2015). IEEE. doi:10.1109/CCNC.2015.7158101. The final publication is available at http://dx.doi.org/10.1109/CCNC.2015.7158101 Copyright IEEE Additional Information © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. A comparative evaluation of AMQP and MQTT protocols over unstable and mobile networks Jorge E. Luzuriaga∗, Miguel Perezy, Pablo Boronaty, Juan Carlos Cano∗, Carlos Calafate∗, Pietro Manzoni∗ ∗Department of Computer Engineering Universitat Politecnica` de Valencia,` Valencia, SPAIN [email protected], jucano,calafate,[email protected] yUniversitat Jaume I, Castello´ de la Plana, SPAIN [email protected], [email protected] Abstract—Message oriented middleware (MOM) refers to business application [6]. It works like instant messaging or the software infrastructure supporting sending and receiving email, and the difference towards these available -
Communicate the Future PROCEEDINGS HYATT REGENCY ORLANDO 20–23 MAY | ORLANDO, FL and Summit.Stc.Org #STC18
Communicate the Future PROCEEDINGS HYATT REGENCY ORLANDO 20–23 MAY | ORLANDO, FL www.stc.org and summit.stc.org #STC18 SOCIETY FOR TECHNICAL COMMUNICATION 1 Efficiency exemplified Organizations globally use Adobe FrameMaker (2017 release) Request demo to transform the way they create and deliver content Accelerated turnaround time for 70% reduction in printing and paper customized publications material cost Faster creation and delivery of content 50% faster production of PDF for new products across devices documentation 20% faster development of Accelerated publishing across formats course content Increased efficiency and reduced 20% improvement in process translation costs while producing multilingual manuals efficiency For a personalized demo or questions, write to us at [email protected] © 2018 Adobe Systems Incorporated. All rights reserved. The papers published in these proceedings were reproduced from originals furnished by the authors. The authors, not the Society for Technical Communication (STC), are solely responsible for the opinions expressed, the integrity of the information presented, and the attribution of sources. The papers presented in this publication are the works of their respective authors. Minor copyediting changes were made to ensure consistency. STC grants permission to educators and academic libraries to distribute articles from these proceedings for classroom purposes. There is no charge to these institutions, provided they give credit to the author, the proceedings, and STC. All others must request permission. All product and company names herein are the property of their respective owners. © 2018 Society for Technical Communication 9401 Lee Highway, Suite 300 Fairfax, VA 22031 USA +1.703.522.4114 www.stc.org Design and layout by Avon J. -
Advanced Architecture for Java Universal Message Passing (AA-JUMP)
The International Arab Journal of Information Technology, Vol. 15, No. 3, May 2018 429 Advanced Architecture for Java Universal Message Passing (AA-JUMP) Adeel-ur-Rehman1 and Naveed Riaz2 1National Centre for Physics, Pakistan 2School of Electrical Engineering and Computer Science, National University of Science and Technology, Pakistan Abstract: The Architecture for Java Universal Message Passing (A-JUMP) is a Java based message passing framework. A- JUMP offers flexibility for programmers in order to write parallel applications making use of multiple programming languages. There is also a provision to use various network protocols for message communication. The results for standard benchmarks like ping-pong latency, Embarrassingly Parallel (EP) code execution, Java Grande Forum (JGF) Crypt etc. gave us the conclusion that for the cases where the data size is smaller than 256K bytes, the numbers are comparative with some of its predecessor models like Message Passing Interface CHameleon version 2 (MPICH2), Message Passing interface for Java (MPJ) Express etc. But, in case, the packet size exceeds 256K bytes, the performance of the A-JUMP model seems to be severely hampered. Hence, taking that peculiar behaviour into account, this paper talks about a strategy devised to cope up with the performance limitation observed under the base A-JUMP implementation, giving birth to an Advanced A-JUMP (AA- JUMP) methodology while keeping the basic workflow of the original model intact. AA-JUMP addresses to improve performance of A-JUMP by preserving its various traits like portability, simplicity, scalability etc. which are the key features offered by flourishing High Performance Computing (HPC) oriented frameworks of now-a-days. -
Dcamp: Distributed Common Api for Measuring
DCAMP: DISTRIBUTED COMMON API FOR MEASURING PERFORMANCE A Thesis presented to the Faculty of California Polytechnic State University San Luis Obispo In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science by Alexander Paul Sideropoulos December 2014 c 2014 Alexander Paul Sideropoulos ALL RIGHTS RESERVED ii COMMITTEE MEMBERSHIP TITLE: dCAMP: Distributed Common API for Measuring Performance AUTHOR: Alexander Paul Sideropoulos DATE SUBMITTED: December 2014 COMMITTEE CHAIR: Michael Haungs, Ph.D. Associate Professor of Computer Science COMMITTEE MEMBER: Aaron Keen, Ph.D. Assistant Professor of Computer Science COMMITTEE MEMBER: John Bellardo, Ph.D. Associate Professor of Computer Science iii ABSTRACT dCAMP: Distributed Common API for Measuring Performance Alexander Paul Sideropoulos Although the nearing end of Moore's Law has been predicted numerous times in the past [22], it will eventually come to pass. In forethought of this, many modern computing systems have become increasingly complex, distributed, and parallel. As software is developed on and for these complex systems, a common API is necessary for gathering vital performance related metrics while remaining transparent to the user, both in terms of system impact and ease of use. Several distributed performance monitoring and testing systems have been proposed and implemented by both research and commercial institutions. How- ever, most of these systems do not meet several fundamental criterion for a truly useful distributed performance monitoring system: 1) variable data delivery mod- els, 2) security, 3) scalability, 4) transparency, 5) completeness, 6) validity, and 7) portability [30]. This work presents dCAMP: Distributed Common API for Measuring Per- formance, a distributed performance framework built on top of Mark Gabel and Michael Haungs' work with CAMP. -
Efficient Support for Data-Intensive Scientific Workflows on Geo
Efficient support for data-intensive scientific workflows on geo-distributed clouds Luis Eduardo Pineda Morales To cite this version: Luis Eduardo Pineda Morales. Efficient support for data-intensive scientific workflows ongeo- distributed clouds. Computation and Language [cs.CL]. INSA de Rennes, 2017. English. NNT : 2017ISAR0012. tel-01645434v2 HAL Id: tel-01645434 https://tel.archives-ouvertes.fr/tel-01645434v2 Submitted on 13 Dec 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Lucía, que no dejó que me rindiera. Acknowledgements One page is not enough to thank all the people who, in one way or another, have helped, taught, and motivated me along this journey. My deepest gratitude to them all. To my amazing supervisors Gabriel Antoniu and Alexandru Costan, words fall short to thank you for your guidance, support and patience during these 3+ years. Gabriel: ¡Gracias! For your trust and support went often beyond the academic. Alex: it has been my honor to be your first PhD student, I hope I met the expectations. To my family, Lucía, for standing by my side every time, everywhere; you are the driving force in my life, I love you. -
VSI's Open Source Strategy
VSI's Open Source Strategy Plans and schemes for Open Source so9ware on OpenVMS Bre% Cameron / Camiel Vanderhoeven April 2016 AGENDA • Programming languages • Cloud • Integraon technologies • UNIX compability • Databases • Analy;cs • Web • Add-ons • Libraries/u;li;es • Other consideraons • SoDware development • Summary/conclusions tools • Quesons Programming languages • Scrip;ng languages – Lua – Perl (probably in reasonable shape) – Tcl – Python – Ruby – PHP – JavaScript (Node.js and friends) – Also need to consider tools and packages commonly used with these languages • Interpreted languages – Scala (JVM) – Clojure (JVM) – Erlang (poten;ally a good fit with OpenVMS; can get good support from ESL) – All the above are seeing increased adop;on 3 Programming languages • Compiled languages – Go (seeing rapid adop;on) – Rust (relavely new) – Apple Swi • Prerequisites (not all are required in all cases) – LLVM backend – Tweaks to OpenVMS C and C++ compilers – Support for latest language standards (C++) – Support for some GNU C/C++ extensions – Updates to OpenVMS C RTL and threads library 4 Programming languages 1. JavaScript 2. Java 3. PHP 4. Python 5. C# 6. C++ 7. Ruby 8. CSS 9. C 10. Objective-C 11. Perl 12. Shell 13. R 14. Scala 15. Go 16. Haskell 17. Matlab 18. Swift 19. Clojure 20. Groovy 21. Visual Basic 5 See h%p://redmonk.com/sogrady/2015/07/01/language-rankings-6-15/ Programming languages Growing programming languages, June 2015 Steve O’Grady published another edi;on of his great popularity study on programming languages: RedMonk Programming Language Rankings: June 2015. As usual, it is a very valuable piece. There are many take-away from this research. -
Apache Taverna
Apache Taverna hp://taverna.incubator.apache.org/ Donal Fellows San Soiland-Reyes @donalfellows @soilandreyes [email protected] [email protected] hp://orcid.org/0000-0002-9091-5938 hp://orcid.org/0000-0001-9842-9718 Alan R Williams @alanrw [email protected] hp://orcid.org/0000-0003-3156-2105 This work is licensed under a Creave Commons Collaboraons Workshop 2015-03-26 A,ribuIon 3.0 Unported License. Taverna Workflow Ecosystem • Workflow Language — SCUFL2 (and t2flow) • Workflow Engine — Taverna • Used in… – Taverna Command Line Tool – Taverna Server – Taverna WorkBench • Allied services – myExperiment, workflow repository – Service Catalographer, service catalog software • Instantiated as BioCatalogue, BiodiversityCatalogue, … NERSC Workflow Day 2 Map of the Taverna Ecosystem Taverna Ruby client Taverna Online Applicaon-Specific Portals Player UI UI Plugins UI Plugins Plugins Taverna UI Plugins Lite Taverna Taverna Workbench UI Plugins Command UI Plugins Line Tool REST API Components Taverna Other TavernaUI Plugins APIs Server Servers Taverna SOAP API Taverna Core AcIvity Engine UI Ps UI Plugins Plugins Workflow Service many Repository Catalogs services… 3 Users, Scientific Areas, Projects Taverna In Use NERSC Workflow Day 4 Taverna Users Worldwide NERSC Workflow Day 5 Taverna Uses — Scientific Areas • Biodiversity — BioVeL project • Digital Preservation — SCAPE project • Astronomy — AstroTaverna product • Solar Wind Physics — HELIO project • In silico Medicine — VPH-Share project NERSC Workflow Day 6 Biodiversity: BioVeL • Virtual e-LaBoratory for -
A Review of Scalable Bioinformatics Pipelines
Data Sci. Eng. (2017) 2:245–251 https://doi.org/10.1007/s41019-017-0047-z A Review of Scalable Bioinformatics Pipelines 1 1 Bjørn Fjukstad • Lars Ailo Bongo Received: 28 May 2017 / Revised: 29 September 2017 / Accepted: 2 October 2017 / Published online: 23 October 2017 Ó The Author(s) 2017. This article is an open access publication Abstract Scalability is increasingly important for bioin- Scalability is increasingly important for these analysis formatics analysis services, since these must handle larger services, since the cost of instruments such as next-gener- datasets, more jobs, and more users. The pipelines used to ation sequencing machines is rapidly decreasing [1]. The implement analyses must therefore scale with respect to the reduced costs have made the machines more available resources on a single compute node, the number of nodes which has caused an increase in dataset size, the number of on a cluster, and also to cost-performance. Here, we survey datasets, and hence the number of users [2]. The backend several scalable bioinformatics pipelines and compare their executing the analyses must therefore scale up (vertically) design and their use of underlying frameworks and with respect to the resources on a single compute node, infrastructures. We also discuss current trends for bioin- since the resource usage of some analyses increases with formatics pipeline development. dataset size. For example, short sequence read assemblers [3] may require TBs of memory for big datasets and tens of Keywords Pipeline Á Bioinformatics Á Scalable Á CPU cores [4]. The analysis must also scale out (horizon- Infrastructure Á Analysis services tally) to take advantage of compute clusters and clouds. -
Zeromq
ZeroMQ Martin Sústrik <> ØMQ is a messaging system, or "message-oriented middleware", if you will. It's used in environments as diverse as financial services, game development, embedded systems, academic research and aerospace. Messaging systems work basically as instant messaging for applications. An application decides to communicate an event to another application (or multiple applications), it assembles the data to be sent, hits the "send" button and there we go—the messaging system takes care of the rest. Unlike instant messaging, though, messaging systems have no GUI and assume no human beings at the endpoints capable of intelligent intervention when something goes wrong. Messaging systems thus have to be both fault-tolerant and much faster than common instant messaging. ØMQ was originally conceived as an ultra-fast messaging system for stock trading and so the focus was on extreme optimization. The first year of the project was spent devising benchmarking methodology and trying to define an architecture that was as efficient as possible. Later on, approximately in the second year of development, the focus shifted to providing a generic system for building distributed applications and supporting arbitrary messaging patterns, various transport mechanisms, arbitrary language bindings, etc. During the third year the focus was mainly on improving usability and flattening the learning curve. We've adopted the BSD Sockets API, tried to clean up the semantics of individual messaging patterns, and so on. Hopefully, this chapter will give an insight into how the three goals above translated into the internal architecture of ØMQ, and provide some tips for those who are struggling with the same problems. -
Presto: the Definitive Guide
Presto The Definitive Guide SQL at Any Scale, on Any Storage, in Any Environment Compliments of Matt Fuller, Manfred Moser & Martin Traverso Virtual Book Tour Starburst presents Presto: The Definitive Guide Register Now! Starburst is hosting a virtual book tour series where attendees will: Meet the authors: • Meet the authors from the comfort of your own home Matt Fuller • Meet the Presto creators and participate in an Ask Me Anything (AMA) session with the book Manfred Moser authors + Presto creators • Meet special guest speakers from Martin your favorite podcasts who will Traverso moderate the AMA Register here to save your spot. Praise for Presto: The Definitive Guide This book provides a great introduction to Presto and teaches you everything you need to know to start your successful usage of Presto. —Dain Sundstrom and David Phillips, Creators of the Presto Projects and Founders of the Presto Software Foundation Presto plays a key role in enabling analysis at Pinterest. This book covers the Presto essentials, from use cases through how to run Presto at massive scale. —Ashish Kumar Singh, Tech Lead, Bigdata Query Processing Platform, Pinterest Presto has set the bar in both community-building and technical excellence for lightning- fast analytical processing on stored data in modern cloud architectures. This book is a must-read for companies looking to modernize their analytics stack. —Jay Kreps, Cocreator of Apache Kafka, Cofounder and CEO of Confluent Presto has saved us all—both in academia and industry—countless hours of work, allowing us all to avoid having to write code to manage distributed query processing. -
ADMI Cloud Computing Presentation
ECSU/IU NSF EAGER: Remote Sensing Curriculum ADMI Cloud Workshop th th Enhancement using Cloud Computing June 10 – 12 2016 Day 1 Introduction to Cloud Computing with Amazon EC2 and Apache Hadoop Prof. Judy Qiu, Saliya Ekanayake, and Andrew Younge Presented By Saliya Ekanayake 6/10/2016 1 Cloud Computing • What’s Cloud? Defining this is not worth the time Ever heard of The Blind Men and The Elephant? If you still need one, see NIST definition next slide The idea is to consume X as-a-service, where X can be Computing, storage, analytics, etc. X can come from 3 categories Infrastructure-as-a-S, Platform-as-a-Service, Software-as-a-Service Classic Cloud Computing Computing IaaS PaaS SaaS My washer Rent a washer or two or three I tell, Put my clothes in and My bleach My bleach comforter dry clean they magically appear I wash I wash shirts regular clean clean the next day 6/10/2016 2 The Three Categories • Software-as-a-Service Provides web-enabled software Ex: Google Gmail, Docs, etc • Platform-as-a-Service Provides scalable computing environments and runtimes for users to develop large computational and big data applications Ex: Hadoop MapReduce • Infrastructure-as-a-Service Provide virtualized computing and storage resources in a dynamic, on-demand fashion. Ex: Amazon Elastic Compute Cloud 6/10/2016 3 The NIST Definition of Cloud Computing? • “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” On-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf • However, formal definitions may not be very useful.