CHABBI-DOCUMENT-2015.pdf
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
- Petaflops for the People
Thousands of researchers have used facilities of the Advanced Scientific Computing Research (ASCR) program and its Department of Energy (DOE) computing predecessors over the past four decades. Their studies of hurricanes, earthquakes, green-energy technologies and many other basic and applied science problems have, in turn, benefited millions of people. They owe it mainly to the capacity provided by the National Energy Research Scientific Computing Center (NERSC), the Oak Ridge Leadership Computing Facility (OLCF) and the Argonne Leadership Computing Facility (ALCF).

These ASCR installations have helped train the advanced scientific workforce of the future. Postdoctoral scientists, graduate students and early-career researchers have worked there, learning to configure the world's most sophisticated supercomputers for their own various and wide-ranging projects. Cutting-edge supercomputing, once the purview of a small group of experts, has trickled down to the benefit of thousands of investigators in the broader scientific community. Today, NERSC, at Lawrence Berkeley National Laboratory; …

PETAFLOPS SPOTLIGHT: NERSC – EXTREME-WEATHER NUMBER-CRUNCHING
Certain problems lend themselves to solution by computers. Take hurricanes, for instance: They're too big, too dangerous and perhaps too expensive to understand fully without a supercomputer. Using decades of global climate data in a grid comprised of 25-kilometer squares, researchers in Berkeley Lab's Computational Research Division captured the formation of hurricanes and typhoons and the extreme waves that they generate. Those same models, when run at resolutions of about 100 kilometers, missed the tropical cyclones and resulting waves, up to 30 meters high. Their findings, published in Geophysical Research Letters, demonstrated the importance of running climate models at higher resolution.
- ACM SIGACT News Distributed Computing Column 28
ACM SIGACT News Distributed Computing Column 28. Idit Keidar, Dept. of Electrical Engineering, Technion, Haifa 32000, Israel ([email protected]).

Sergio Rajsbaum, who edited this column for seven years and established it as a relevant and popular venue, is stepping down. This issue is my first step in the big shoes he vacated. I would like to take this opportunity to thank Sergio for providing us with seven years' worth of interesting columns. In producing these columns, Sergio has enjoyed the support of the community at-large and obtained material from many authors, who greatly contributed to the column's success. I hope to enjoy a similar level of support; I warmly welcome your feedback and suggestions for material to include in this column!

The main two conferences in the area of principles of distributed computing, PODC and DISC, took place this summer. This issue is centered around these conferences, and more broadly, distributed computing research as reflected therein, ranging from reviews of this year's instantiations, through influential papers in past instantiations, to examining PODC's place within the realm of computer science. I begin with a short review of PODC'07, and highlight some "hot" trends that have taken root in PODC, as reflected in this year's program. Some of the forthcoming columns will be dedicated to these up-and-coming research topics. This is followed by a review of this year's DISC, by Edward (Eddie) Bortnikov. For some perspective on long-running trends in the field, I next include the announcement of this year's Edsger W. …
- Edsger Dijkstra: The Man Who Carried Computer Science on His Shoulders
INFERENCE / Vol. 5, No. 3. Edsger Dijkstra: The Man Who Carried Computer Science on His Shoulders. Krzysztof Apt.

As it turned out, the train I had taken from Nijmegen to Eindhoven arrived late. To make matters worse, I was then unable to find the right office in the university building. When I eventually arrived for my appointment, I was more than half an hour behind schedule. The professor completely ignored my profuse apologies and proceeded to take a full hour for the meeting. It was the first time I met Edsger Wybe Dijkstra. At the time of our meeting in 1975, Dijkstra was 45 years old. The most prestigious award in computer science, the ACM Turing Award, had been conferred on him three years earlier. Almost twenty years his junior, I knew very little about the field—I had only learned what a flowchart was a couple of weeks earlier.

Edsger Dijkstra was born in Rotterdam in 1930. He described his father, at one time the president of the Dutch Chemical Society, as "an excellent chemist," and his mother as "a brilliant mathematician who had no job."1 In 1948, Dijkstra achieved remarkable results when he completed secondary school at the famous Erasmiaans Gymnasium in Rotterdam. His school diploma shows that he earned the highest possible grade in no less than six out of thirteen subjects. He then enrolled at the University of Leiden to study physics. In September 1951, Dijkstra's father suggested he attend a three-week course on programming in Cambridge. It turned out to be an idea with far-reaching consequences.
- Self-Stabilization Through the Lens of Game Theory (arXiv:1901.04709v1 [cs.GT])
Self-Stabilization Through the Lens of Game Theory. Krzysztof R. Apt (CWI, Amsterdam, The Netherlands; MIMUW, University of Warsaw, Warsaw, Poland) and Ehsan Shoja (Sharif University of Technology, Tehran, Iran).

Abstract. In 1974 E.W. Dijkstra introduced the seminal concept of self-stabilization that turned out to be one of the main approaches to fault-tolerant computing. We show here how his three solutions can be formalized and reasoned about using the concepts of game theory. We also determine the precise number of steps needed to reach self-stabilization in his first solution.

1 Introduction

In 1974 Edsger W. Dijkstra introduced in a two-page article [10] the notion of self-stabilization. The paper was completely ignored until 1983, when Leslie Lamport stressed its importance in his invited talk at the ACM Symposium on Principles of Distributed Computing (PODC), published a year later as [21]. Things have changed since then. According to Google Scholar Dijkstra's paper has been by now cited more than 2300 times. It became one of the main approaches to fault tolerant computing. An early survey was published in 1993 as [26], while the research on the subject until 2000 was summarized in the book [13]. In 2002 Dijkstra's paper won the PODC influential paper award (renamed in 2003 to Dijkstra Prize). The literature on the subject initiated by it continues to grow. There are annual Self-Stabilizing Systems Workshops, the 18th edition of which took place in 2016. The idea proposed by Dijkstra is very simple. Consider a distributed system viewed as a network of machines. …
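As a quick illustration of the first of Dijkstra's three solutions mentioned in this excerpt (the K-state token ring under a central daemon), here is a minimal Python simulation. It is a sketch based on the standard textbook description of that algorithm, not code from the paper; the ring size n = 5, the choice K = 7, and the random daemon are illustrative assumptions.

```python
import random

def privileged(x, i, n):
    """Machine 0 holds a privilege iff its value equals machine n's;
    every other machine i holds one iff its value differs from machine i-1's."""
    return x[0] == x[n] if i == 0 else x[i] != x[i - 1]

def fire(x, i, K):
    """Fire machine i: machine 0 increments modulo K, the others copy
    their left neighbour's value."""
    if i == 0:
        x[0] = (x[0] + 1) % K
    else:
        x[i] = x[i - 1]

def stabilize(n=5, K=7, seed=0):
    """Start from a random (possibly illegitimate) configuration of machines
    0..n and let a central daemon fire one privileged machine per step until
    exactly one privilege remains, i.e. a legitimate state is reached."""
    rng = random.Random(seed)
    x = [rng.randrange(K) for _ in range(n + 1)]
    steps = 0
    while True:
        priv = [i for i in range(n + 1) if privileged(x, i, n)]
        assert priv, "at least one machine is always privileged"
        if len(priv) == 1:              # legitimate: exactly one privilege
            return x, steps
        fire(x, rng.choice(priv), K)    # the daemon picks any privileged machine
        steps += 1

if __name__ == "__main__":
    config, steps = stabilize()
    print(f"stabilized after {steps} steps: {config}")
```

Once a single privilege remains it circulates around the ring forever, which is exactly the closure property the paper's game-theoretic analysis formalizes.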
- Distributed Algorithms for Wireless Networks
A THEORETICAL VIEW OF DISTRIBUTED SYSTEMS. Nancy Lynch, MIT EECS, CSAIL. National Science Foundation, April 1, 2021.

Theory for Distributed Systems
• We have worked on theory for distributed systems, trying to understand (mathematically) their capabilities and limitations.
• This has included:
• Defining abstract, mathematical models for problems solved by distributed systems, and for the algorithms used to solve them.
• Developing new algorithms.
• Producing rigorous proofs, of correctness, performance, fault-tolerance.
• Proving impossibility results and lower bounds, expressing inherent limitations of distributed systems for solving problems.
• Developing foundations for modeling, analyzing distributed systems.
• Kinds of systems:
• Distributed data-management systems.
• Wired, wireless communication systems.
• Biological systems: Insect colonies, developmental biology, brains.

This talk:
1. Algorithms for Traditional Distributed Systems
2. Impossibility Results
3. Foundations
4. Algorithms for New Distributed Systems

1. Algorithms for Traditional Distributed Systems
• Mutual exclusion in shared-memory systems, resource allocation: Fischer, Burns, … late 70s and early 80s.
• Dolev, Lynch, Pinter, Stark, Weihl. Reaching approximate agreement in the presence of faults. JACM, 1986.
• Lundelius, Lynch. A new fault-tolerant algorithm for clock synchronization. Information and Computation, 1988.
• Dwork, Lynch, Stockmeyer. Consensus in the presence of partial synchrony. JACM, 1988. Dijkstra Prize, 2007.

1A. Dwork, Lynch, Stockmeyer [DLS]
"This paper introduces a number of practically motivated partial synchrony models that lie between the completely synchronous and the completely asynchronous models and in which consensus is solvable. It gave practitioners the right tool for building fault-tolerant systems and contributed to the understanding that safety can be maintained at all times, despite the impossibility of consensus, and progress is facilitated during periods of stability. …
- 2020 SIGACT Report
2020 SIGACT REPORT. SIGACT EC – Eric Allender, Shuchi Chawla, Nicole Immorlica, Samir Khuller (chair), Bobby Kleinberg. September 14th, 2020.

SIGACT Mission Statement: The primary mission of ACM SIGACT (Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory) is to foster and promote the discovery and dissemination of high quality research in the domain of theoretical computer science. The field of theoretical computer science is the rigorous study of all computational phenomena - natural, artificial or man-made. This includes the diverse areas of algorithms, data structures, complexity theory, distributed computation, parallel computation, VLSI, machine learning, computational biology, computational geometry, information theory, cryptography, quantum computation, computational number theory and algebra, program semantics and verification, automata theory, and the study of randomness. Work in this field is often distinguished by its emphasis on mathematical technique and rigor.

1. Awards
▪ 2020 Gödel Prize: This was awarded to Robin A. Moser and Gábor Tardos for their paper "A constructive proof of the general Lovász Local Lemma", Journal of the ACM, Vol 57 (2), 2010. The Lovász Local Lemma (LLL) is a fundamental tool of the probabilistic method. It enables one to show the existence of certain objects even though they occur with exponentially small probability. The original proof was not algorithmic, and subsequent algorithmic versions had significant losses in parameters. This paper provides a simple, powerful algorithmic paradigm that converts almost all known applications of the LLL into randomized algorithms matching the bounds of the existence proof. The paper further gives a derandomized algorithm, a parallel algorithm, and an extension to the "lopsided" LLL.
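To make the "algorithmic paradigm" concrete, here is a minimal Python sketch of the Moser–Tardos resampling idea applied to k-SAT: start from a random assignment and, while some clause is violated, resample that clause's variables. This is an illustrative sketch, not code from the prize-winning paper; the DIMACS-style clause encoding and the tiny instance are assumptions made for the example.

```python
import random

def violated(clause, assignment):
    """A clause is a list of signed ints (DIMACS-style: +v means v must be
    True, -v means v must be False); it is violated iff every literal fails."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)

def moser_tardos(clauses, num_vars, seed=0):
    """Resampling algorithm: while some clause is violated, re-randomize the
    variables of one violated clause. Under the LLL condition (each k-clause
    sharing variables with fewer than roughly 2**k / e others) this stops
    after an expected linear number of resamplings."""
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    resamplings = 0
    while True:
        bad = [c for c in clauses if violated(c, assignment)]
        if not bad:
            return assignment, resamplings
        for lit in bad[0]:                       # resample one violated clause
            assignment[abs(lit)] = rng.random() < 0.5
        resamplings += 1

if __name__ == "__main__":
    # tiny 3-SAT instance: (x1 v x2 v x3) ^ (!x1 v x2 v !x3) ^ (x1 v !x2 v x3)
    cnf = [[1, 2, 3], [-1, 2, -3], [1, -2, 3]]
    solution, r = moser_tardos(cnf, 3)
    print(f"satisfying assignment after {r} resamplings: {solution}")
```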
- Unlocking the Full Potential of the Cray XK7 Accelerator
Unlocking the Full Potential of the Cray XK7 Accelerator. Mark D. Klein∗ and John E. Stone†. ∗National Center for Supercomputing Application, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. †Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Abstract—The Cray XK7 includes NVIDIA GPUs for acceleration of computing workloads, but the standard XK7 system software inhibits the GPUs from accelerating OpenGL and related graphics-specific functions. We have changed the operating mode of the XK7 GPU firmware, developed a custom X11 stack, and worked with Cray to acquire an alternate driver package from NVIDIA in order to allow users to render and post-process their data directly on Blue Waters. Users are able to use NVIDIA's hardware OpenGL implementation which has many features not available in software rasterizers. By eliminating the transfer of data to external visualization clusters, time-to-solution for users has been improved tremendously. In one case, XK7 OpenGL rendering has cut turnaround time …

… [2], [3], [4], and for high fidelity movie renderings with the HVR volume rendering software.2 In one example, the total turnaround time for an HVR movie rendering of a trillion-cell inertial confinement fusion simulation [5] was reduced from an estimate of over a month for data transfer to and rendering on a conventional visualization cluster down to just one day when rendered locally using 128 XK7 nodes on Blue Waters. The fully-graphics-enabled GPU state is currently considered an unsupported mode of operation by Cray, and to our knowledge Blue Waters is presently the only Cray system currently running in this mode.
LNCS 4308, Pp
On Distributed Verification. Amos Korman and Shay Kutten, Faculty of IE&M, Technion, Haifa 32000, Israel.

Abstract. This paper describes the invited talk given at the 8th International Conference on Distributed Computing and Networking (ICDCN 2006), at the Indian Institute of Technology Guwahati, India. This talk was intended to give a partial survey and to motivate further studies of distributed verification. To serve the purpose of motivating, we allow ourselves to speculate freely on the potential impact of such research. In the context of sequential computing, it is common to assume that the task of verifying a property of an object may be much easier than computing it (consider, for example, solving an NP-Complete problem versus verifying a witness). Extrapolating from the impact the separation of these two notions (computing and verifying) had in the context of sequential computing, the separation may prove to have a profound impact on the field of distributed computing as well. In addition, in the context of distributed computing, the motivation for the separation seems even stronger than in the centralized sequential case. In this paper we explain some motivations for specific definitions, survey some very related notions and their motivations in the literature, survey some examples for problems and solutions, and mention some additional general results such as general algorithmic methods and general lower bounds. Since this paper is mostly intended to "give a taste" rather than be a comprehensive survey, we apologize to authors of additional related papers that we did not mention or detailed.

1 Introduction

This paper addresses the problem of locally verifying global properties.
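To give a flavour of "locally verifying global properties", here is a small Python sketch in the style of a proof-labeling scheme (an illustration of the general idea, not code or a scheme taken from this paper): each node of a connected graph is handed a label (root, distance, parent) certifying a spanning tree, and every node checks the certificate against its neighbours' labels only. If all local checks pass, the parent pointers really do form a spanning tree. The function and variable names are assumptions made for the sketch.

```python
def locally_verify_spanning_tree(adj, labels):
    """adj: node -> set of neighbours of a connected, undirected graph.
    labels: node -> (root, dist, parent), a certificate that the parent
    pointers form a spanning tree rooted at `root`.
    Each node runs only local tests against its neighbours' labels; if every
    node accepts, the parent edges form a spanning tree of the graph."""
    for v, (root, dist, parent) in labels.items():
        if v == root:
            if dist != 0 or parent != v:            # the root certifies distance 0
                return False
            continue
        if parent not in adj[v]:                    # parent must be a real neighbour
            return False
        p_root, p_dist, _ = labels[parent]
        if p_root != root or dist != p_dist + 1:    # agree on root, distance drops toward it
            return False
        if any(labels[u][0] != root for u in adj[v]):  # neighbours must name the same root
            return False
    return True

if __name__ == "__main__":
    # a 4-node cycle with a spanning tree rooted at node 0
    adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
    labels = {0: (0, 0, 0), 1: (0, 1, 0), 2: (0, 2, 1), 3: (0, 1, 0)}
    print(locally_verify_spanning_tree(adj, labels))   # True
```

The point of the example is the asymmetry the abstract describes: computing a spanning tree needs global coordination, whereas verifying a labeled one needs only constant-size certificates and purely local checks.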
- Introduction to GPU/Parallel Computing
Introduction to GPU/Parallel Computing. Ioannis E. Venetis, University of Patras.

Introduction to High Performance Systems. Wait, what? Aren't we here to talk about GPUs? And how to program them with CUDA? Yes, but we need to understand their place and their purpose in modern High Performance Systems. This will make it clear when it is beneficial to use them.

Top 500 (June 2017):

Rank | Site | System | CPU Cores | Accel. Cores | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW)
1 | National Supercomputing Center in Wuxi, China | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway (NRCPC) | 10,649,600 | - | 93,014.6 | 125,435.9 | 15,371
2 | National Super Computer Center in Guangzhou, China | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P (NUDT) | 3,120,000 | 2,736,000 | 33,862.7 | 54,902.4 | 17,808
3 | Swiss National Supercomputing Centre (CSCS) | Piz Daint - Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100 (Cray Inc.) | 361,760 | 297,920 | 19,590.0 | 25,326.3 | 2,272
4 | DOE/SC/Oak Ridge National Laboratory, United States | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x (Cray Inc.) | 560,640 | 261,632 | 17,590.0 | 27,112.5 | 8,209
5 | DOE/NNSA/LLNL, United States | Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM) | 1,572,864 | - | 17,173.2 | 20,132.7 | 7,890

How do …
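To make the Rmax, Rpeak and Power columns above more concrete, the short Python snippet below (not part of the original slides) derives two common figures of merit from those numbers: HPL efficiency (Rmax/Rpeak) and energy efficiency in GFlop/s per watt. The system list is copied from the table; everything else is illustrative.

```python
# Derived metrics for the June 2017 Top 500 entries listed above.
systems = [
    # (name, rmax_tflops, rpeak_tflops, power_kw)
    ("Sunway TaihuLight", 93014.6, 125435.9, 15371),
    ("Tianhe-2",          33862.7,  54902.4, 17808),
    ("Piz Daint",         19590.0,  25326.3,  2272),
    ("Titan (Cray XK7)",  17590.0,  27112.5,  8209),
    ("Sequoia",           17173.2,  20132.7,  7890),
]

for name, rmax, rpeak, power in systems:
    efficiency = rmax / rpeak                 # fraction of theoretical peak sustained on HPL
    gflops_per_watt = rmax / power            # TFlop/s per kW equals GFlop/s per W
    print(f"{name:20s} {efficiency:6.1%} of peak, {gflops_per_watt:5.2f} GFlop/s per watt")
```

Run on these numbers, the GPU-accelerated Piz Daint comes out around 8.6 GFlop/s per watt versus roughly 2.1 for the older Titan, which is the kind of comparison the slides use to motivate accelerators.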
- Stabilization
Chapter 13: Stabilization

A large branch of research in distributed computing deals with fault-tolerance. Being able to tolerate a considerable fraction of failing or even maliciously behaving ("Byzantine") nodes while trying to reach consensus (on e.g. the output of a function) among the nodes that work properly is crucial for building reliable systems. However, consensus protocols require that a majority of the nodes remains non-faulty all the time. Can we design a distributed system that survives transient (short-lived) failures, even if all nodes are temporarily failing? In other words, can we build a distributed system that repairs itself?

13.1 Self-Stabilization

Definition 13.1 (Self-Stabilization). A distributed system is self-stabilizing if, starting from an arbitrary state, it is guaranteed to converge to a legitimate state. If the system is in a legitimate state, it is guaranteed to remain there, provided that no further faults happen. A state is legitimate if the state satisfies the specifications of the distributed system.

Remarks:
• What kind of transient failures can we tolerate? An adversary can crash nodes, or make nodes behave Byzantine. Indeed, temporarily an adversary can do harm in even worse ways, e.g. by corrupting the volatile memory of a node (without the node noticing, not unlike the movie Memento), or by corrupting messages on the fly (without anybody noticing). However, as all failures are transient, eventually all nodes must work correctly again, that is, crashed nodes get resurrected, Byzantine nodes stop being malicious, messages are being delivered reliably, and the memory of the nodes is secure.
- Trends in HPC: Hardware Complexity and Software Challenges
Trends in HPC: hardware complexity and software challenges. Mike Giles, Oxford University Mathematical Institute / Oxford e-Research Centre. ACCU talk, Oxford, June 30, 2015.

Question: How many cores in my Dell M3800 laptop? 1–10? 10–100? 100–1000? Answer: 4 cores in Intel Core i7 CPU + 384 cores in NVIDIA K1100M GPU! Peak power consumption: 45W for CPU + 45W for GPU.

Top500 supercomputers: Really impressive – 300× more capability in 10 years!

My personal experience:
1982: CDC Cyber 205 (NASA Langley)
1985–89: Alliant FX/8 (MIT)
1989–92: Stellar (MIT)
1990: Thinking Machines CM5 (UTRC)
1987–92: Cray X-MP/Y-MP (NASA Ames, Rolls-Royce)
1993–97: IBM SP2 (Oxford)
1998–2002: SGI Origin (Oxford)
2002–today: various x86 clusters (Oxford)
2007–today: various GPU systems/clusters (Oxford)
2011–15: GPU cluster (Emerald @ RAL)
2008–today: Cray XE6/XC30 (HECToR/ARCHER @ EPCC)
2013–today: Cray XK7 with GPUs (Titan @ ORNL)

Top500 supercomputers: Power requirements are raising the question of whether this rate of improvement is sustainable.
- Distributed Systems (Maarten van Steen and Andrew S. Tanenbaum)
Distributed Systems. Maarten van Steen, Andrew S. Tanenbaum | 596 pages | 01 Feb 2017 | CreateSpace Independent Publishing Platform | 9781543057386 | English | United States.

Distributed Systems - The Complete Guide. The hope is that together, the system can maximize resources and information while preventing failures, as if one system fails, it won't affect the availability of the service. Banker's algorithm, Dijkstra's algorithm, DJP algorithm, Prim's algorithm, Dijkstra-Scholten algorithm, Dekker's algorithm generalization, Smoothsort, Shunting-yard algorithm, marking algorithm, concurrent algorithms, distributed algorithms, deadlock prevention algorithms, mutual exclusion algorithms, self-stabilizing systems. For the first time computers would be able to send messages to other systems with a local IP address. The messages passed between machines contain forms of data that the systems want to share, like databases and objects. Also known as distributed computing and distributed databases, a distributed system is a collection of independent components located on different machines that share messages with each other in order to achieve common goals. To prevent infinite loops, running the code requires some amount of Ether. As mentioned in many places, one of which this great article, you cannot have consistency and availability without partition tolerance. Because it works in batches (jobs), a problem arises where if your job fails, you need to restart the whole thing. While in a voting system an attacker need only add nodes to the network (which is easy, as free access to the network is a design target), in a CPU power based scheme an attacker faces a physical limitation: getting access to more and more powerful hardware.