
Olympiads in Informatics, 2016, Vol. 10, 177–194
© 2016 IOI, Vilnius University
DOI: 10.15388/ioi.2016.11

Distributed Tasks: Introducing Distributed Computing to Programming Competitions

Adam KARCZMARZ1, Jakub ŁĄCKI2, Adam POLAK3, Jakub RADOSZEWSKI1,4, Jakub O. WOJTASZCZYK5
1Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
2Department of Computer, Control, and Management Engineering Antonio Ruberti at Sapienza University of Rome, via Ariosto 25, 00185 Roma, Italy
3Department of Theoretical Computer Science, Faculty of Mathematics and Computer Science, Jagiellonian University, ul. Łojasiewicza 6, 30-348 Kraków, Poland
4King's College London, Strand, London WC2R 2LS, UK
5Google Warsaw, Emilii Plater 53, 00-113 Warsaw, Poland
e-mail: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract. In this paper we present distributed tasks, a new task type that can be used at programming competitions. In such tasks, a contestant is supposed to write a program which is then simultaneously executed on multiple computing nodes (machines). The instances of the program may communicate and use the joint computing power to solve the task presented to the contestant. We show a framework for running a contest with distributed tasks that we believe to be accessible to contestants with no previous experience in distributed computing. Moreover, we give examples of distributed tasks that have been used in the last two editions of a Polish programming contest, Algorithmic Engagements, together with their intended solutions. Finally, we discuss the challenges of grading and preparing distributed tasks.

Keywords: programming contests, distributed tasks.

1. Introduction

When looking at major programming competitions, it is easy to notice that a large number of them are very similar in design. To name a few: the IOI, the ACM ICPC, Code Jam, TopCoder's algorithmic track, the CodeForces competitions and many more are all focused on small, self-contained tasks that are mostly algorithmic in nature, with automated judging based on testing a program on a judge-provided set of test cases.

This model seems attractive both to the organizers and the participants. The organizers appreciate the very clear, non-subjective mechanism of judging and the automation of judging, which means the competition can scale out easily to a larger number of competitors. The participants also appreciate the automated judging (which means fast results) and the objective judgment criteria (which makes the competition more fair); additionally, the entry barrier to such contests is pretty low, as the introductory-level tasks can be very simple.

If we consider programming competitions a way of educating future computer scientists and professional programmers, the model has its strengths, but also weaknesses. Some of these weaknesses come from the very nature of the small, self-contained tasks (which are a large part of the model's attractiveness): the participants do not learn about code maintainability and extensibility, and they also do not learn anything about larger-scale system design. These weaknesses seem to be intrinsic to the model itself; and competitions that abandon the model, moving to larger or less clear-cut tasks, are frequently less successful (for example, TopCoder's Marathon Match track has over 10 times fewer registered participants than the Algorithm track).

However, there are also areas of programming expertise that the existing competitions do not teach, which are less intrinsically tied to the model of these competitions. In particular, the focus on algorithmic problems is not crucial to the model of the competition itself. Indeed, multiple authors have already explored extending this classical model of competitions to other fields of computer science, for example visualization and precomputation (Kulczyński et al., 2011), online algorithms (Komm, 2011), computer graphics and cryptography (Forišek, 2013).

1.1. Distributed Programming

Distributed (and cloud) computing has quickly gained importance in recent years. The computing giants of today – Google, Amazon, Facebook and others – do not operate on the enormous mainframes that dominated computing in the past, but on networked farms of smaller servers. This is a model of computing that is not inherently in conflict with the algorithmic programming contest model, but it is not taught by the dominant competitions of today; students who gained most of their programming skills through programming competitions will be totally unfamiliar with even the basic paradigms of distributed computing like the MapReduce framework (Dean and Ghemawat, 2008) or the CAP theorem (Fox and Brewer, 1999).

In this article we present a framework for running a contest focusing on distributed algorithms, developed by engineers in Google's Warsaw office in collaboration with the University of Warsaw, and show sample problems used in the Algorithmic Engagements (Potyczki Algorytmiczne in Polish) contest run by the University of Warsaw. The same framework is used in Google's recently introduced Distributed Code Jam competition.

1.2. Designing a Distributed Programming Contest

The primary focus of our design was simplicity from the contestant's point of view. The introduction of a new programming paradigm, likely unfamiliar to most participants, is clearly a challenge for the contestants, and we aimed at making the transition as smooth as possible. Thus, the basic interface is similar to a programming contest like the IOI. The participant submits a single program, which is compiled and executed by the framework. The same program is run on every node (computer) available to the participant.

Obviously, the nodes need to be able to communicate in order to collaborate in computing. We decided on a protocol based on simple message-passing ("send this array of bytes to this node"). The message-passing methods are available to the program through a library that is common for all problems and provided by the framework.

We also considered using an RPC-style interface. This, however, is more complex on an API level. A standard approach to RPC is that the programmer has to declare an interface (which will contain the remotely-callable methods). The server will just implement the methods of this interface. On the client side, the infrastructure needs to provide a way to generate a "stub" connected to some particular server; this stub will have the remote method calls generated automatically. The language would have to provide some way to annotate that the interface is to be treated as a "remote" interface, increasing the complexity of the infrastructure implementation, and meaning more "magic" happening under the hood, which – in our perception – decreases the comprehensibility of the system. For instance, the "stub implementation" would need to make a choice of deep versus shallow copying of the arguments, each choice potentially leading to confusion for the participants.

Let us now present the functions of our library, together with their declarations in C++. First, the library provides a function that returns the number of nodes M on which the solution is running, and the index (in the range [0, M − 1]) of the node on which the calling process is running.

int NumberOfNodes();
int MyNodeId();

The library maintains in each node a message buffer for each of the M nodes, which holds the message that is being prepared to be sent to that node. Items are appended to the buffer through the Put-methods.

// Append “value” to the message that is being prepared for
// the node with id “target”. The “Int” in PutInt is
// interpreted as 32 bits, regardless of whether the actual
// int type will be 32 or 64 bits.
void PutChar(int target, char value);
void PutInt(int target, int value);
void PutLL(int target, long long value);

There is also a method that sends the message that was accumulated in the buffer for a given node and clears this buffer. This method is non-blocking, that is, it does not wait for the receiver to call Receive, but returns immediately after sending the message.

void Send(int target);

The following function is used for receiving messages.

int Receive(int source);

The library has a receiving buffer for each remote node. Receive is a blocking method: if there is no message to receive, it will wait for a message to arrive. When you call Receive and retrieve a message from a remote node, the buffer tied to this remote node is overwritten; you can then retrieve individual parts of the message through the Get-methods. You can call Receive(-1) to retrieve a message from any source, or set source to a number from [0, M − 1] to retrieve a message from a particular source. Receive returns the number of the node which sent the message, which is equal to source, unless source is -1. Finally, for reading the buffer of incoming messages, the following three methods are provided.

// The “Int” in GetInt is interpreted as 32 bits, regardless
// of whether the actual int type is 32 or 64 bits.
char GetChar(int source);
int GetInt(int source);
long long GetLL(int source);

Each of these methods returns and consumes one item from the buffer of the appropriate node. You must call these methods in the order in which the elements were appended to the message (so, for instance, if the message was created with PutChar, PutChar, PutLL, you must call GetChar, GetChar, GetLL in this order). If you call them in a different order, or you call a Get-method after consuming all the contents of the buffer, the behaviour is undefined.

The serialization we decided to use is very basic compared to models like Java's serialization mechanisms, Python's "pickle" or even Google's protocol buffer language. Again, we preferred to err on the side of simplicity, to minimize the entry barrier – this simple language turns out to be easy enough for the simple concurrency required to solve our problems, and is more straightforward to understand, both in terms of "how big will the serialized message be" and "what is actually serialized" (this, again, is the deep vs shallow copy question).

For correctness of execution, we chose what seemed the most natural model. The backend guarantees failure-less execution on all nodes, and requires all the instances of the program to execute correctly, within the specified time and memory limits. Note that the time used by the program is measured from the moment when the instances start until all instances have finished execution. This assumption is justified with the following example involving two machines. The first one spends one second of CPU time and then sends a message to the second one. The second machine first waits for a message from the first machine and then uses one second of CPU time. The total time used by the solution is then roughly two seconds, although each instance used only one second of CPU time.

In most programming competitions, the programs access the input data by reading from standard input. (TopCoder's Algorithm contest breaks out of this by providing the input as an argument to the function the contestant is supposed to write.) However, a large fraction of the interesting distributed problems admit a solution that runs in O(N/M) time, where N is the size of the input and M is the number of nodes available to the contestant. This implies, in particular, that no node can afford to read the whole input (since that alone would take Ω(N) time). Standard input is only accessible in a linear fashion, so it is not feasible for providing very large input data.

The approach we chose is what is typically done for interactive tasks at competitions like the IOI (see, e.g., Chávez, 2015): each problem with large inputs defines a set of input-access methods that are available to the program, similarly to the message-passing interface. These methods are guaranteed to return the same values on all nodes (so, each node has access to the same view of the input). Additionally, we provide upper bounds on the execution time of a single call to the input-providing methods.

Output is much simpler to handle (as it is often much smaller), so we went with the standard programming contest practice of expecting the output to be provided via standard output. We expect exactly one node to produce the output, while the others should not produce anything. This is a somewhat arbitrary decision (we could equally well have all the nodes output the exact same data to the standard output); however, as many solutions in practice have some sort of a "master" node that aggregates the work of the other nodes, it is convenient for the contestant to have one node output the result of the computation, and for the infrastructure not to prescribe which of the nodes it is.

As for the number of nodes we run the contestants' solutions on, we chose M = 100. This is large enough that a speedup by a factor of M is big enough to offset the extra time needed for inter-node communication, and yet small enough that providing that many nodes for judging is actually feasible.

The final issue that needs to be considered is the amount of data sent during the communication between the nodes, both in terms of the number of individual messages sent and the size of those messages. Obviously, the limits here will be dependent on the infrastructure we run the contest on. Benchmarks on our framework show that a single message will take roughly 3–5 ms (split between the processing time in the sender, the actual network latency and the processing time in the receiver). This number is constant for messages from negligible size to tens of kilobytes, and starts growing linearly when the message size goes into hundreds of kilobytes.

While in the end, to fine-tune a solution, the contestant will need to understand these patterns (for this purpose we provided the results of a few benchmarks to the contestants), we wanted the basic use case not to require dealing with such calculations. To this end, we introduced an upper limit on the total number (around 1000) and size (a few megabytes) of messages a single node can send within the time limit, which roughly corresponds to the total throughput it could achieve if it spent all its time on communication. This is a clear and concise way of informing the participants that if their solution stays well within those bounds, its performance should not be significantly affected by the communication overhead.
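To make the interface concrete, here is a minimal sketch (ours, not taken from the framework's documentation) in which every node sends one char and one long long to node 0, and node 0 reads the items back in exactly the order they were put. The header name message.h is an assumption about how the library is exposed to the program.

#include <cstdio>
#include "message.h"  // assumed name of the header exposing the library above

int main() {
  const int kMaster = 0;
  if (MyNodeId() != kMaster) {
    PutChar(kMaster, 'x');                 // first item of the message
    PutLL(kMaster, 1000LL * MyNodeId());   // second item
    Send(kMaster);                         // non-blocking
  } else {
    for (int i = 1; i < NumberOfNodes(); ++i) {
      int src = Receive(-1);               // block for a message from any node
      char c = GetChar(src);               // read items in Put order
      long long v = GetLL(src);
      std::printf("node %d sent %c %lld\n", src, c, v);
    }
  }
  return 0;
}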

1.3. Distributed Contest Judging Infrastructure

From a research perspective, the most interesting challenge in preparing such a distributed programming contest is defining the exact programming model from the contestants' point of view. However, when doing it in practice, there is also a considerable amount of engineering effort involved in providing the infrastructure to run such a contest.

In the process of preparing problems the most interesting new challenge is writing a library that provides the input data. While in a standard programming contest the input is just a text file, and can be generated offline in an arbitrary fashion and in almost arbitrary time, in the distributed contests the requirements are much more strict. The input-providing library needs to satisfy the following requirements:
● Access to an arbitrary element.
● Consistency across nodes, and across accesses.
● Access times on the order of 100 ns.
● Ability to serve data with total size on the order of 10 GB or more.

The requirements stem from the input model we chose. In particular, if we want to allow O(N/M)-time solutions, with runtimes on the order of 1–5 seconds, we need the input read by one node to be on the order of at least 10^7 items (which means 10^9 items in total; if each item is, say, two 64-bit integers, we get a total of 16 GB), and we need the node to be able to read these 10^7 items within 1 second (so that the input reading does not dominate the computation).

The access times coupled with the data size mean it is infeasible to pregenerate all the input data: 10 GB is too much to conveniently store in memory, while disk access (and even SSD access) is too slow. Thus, the input data is generated on the fly. This requires using pseudo-random generators that generate a sequence of numbers and provide consistent and fast access to each element of the sequence (e.g., the CityHash family of functions); a small sketch of this idea is given at the end of this subsection.

In the judging system, the challenge is the scale. Judging a single test case for a single solution requires, typically, 100 virtual machines. So, as an example, a deployment of 900 virtual machines was used as the backend for the Online Round of Google's Distributed Code Jam. We think it is interesting that cloud computing, which is making distributed computing important as a topic of programming competitions, is also making distributed programming competitions much easier to organize: instead of buying physical hardware to support such a competition, it is easy to rent virtual machines and pay by the minute.

However, the real challenge in setting up a new competition type is in finding attractive problems: ones that challenge the contestants' creativity and problem-solving skill, without requiring significant domain knowledge (in this case, knowledge of distributed programming paradigms), and that are fun, but not tedious, to implement once you have the correct set of ideas. In the rest of this paper, we provide examples of problems that, in our opinion, satisfy these requirements.

We start with a simple task which lets us demonstrate the basics of the framework from the contestant's point of view. The remaining tasks were presented during the Algorithmic Engagements contest; their authors are: Jakub Łącki, Jakub Wojtaszczyk (task "Workshop"); Jakub Łącki (task "Assistant"); Adam Karczmarz (task "Sabotage").
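As an illustration of the on-the-fly generation mentioned above, the following sketch (ours; the authors mention the CityHash family, here a generic splitmix64-style mixer is used instead) shows how the i-th input item can be computed in constant time, identically on every node, without storing any data.

#include <cstdint>

// A stateless 64-bit mixer (splitmix64 finalizer); any strong hash would do.
uint64_t Mix(uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Hypothetical input-access method: the i-th item of a pseudorandom input
// sequence, reproducible on all nodes and computed in a few nanoseconds.
uint64_t InputItem(uint64_t seed, uint64_t i) { return Mix(seed ^ Mix(i)); }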

2. Sample Task “Divisors”

In this task we are to count the number of divisors of a given positive integer n ≤ 10^18.

Input
In this task the input data is provided via the standard input. The only line of the standard input contains n.

Output
Your program should print exactly one line to the standard output containing one integer: the number of divisors of n.

2.1. Solution

The easiest (sequential) solution that we can come up with is to check all candidates for a divisor d up to √n and their counterparts of the form n/d. A sample C++ code follows.

#include <iostream>
using namespace std;

int main() {
  long long n;
  int divisors_num = 0;
  cin >> n;
  for (long long d = 1; d * d <= n; ++d) {
    if (n % d == 0) {
      ++divisors_num;
      if (n / d != d)
        ++divisors_num;
    }
  }
  cout << divisors_num << endl;
}

Probably this is not the fastest sequential solution for this problem. We will, however, focus on how to speed it up by performing the computations using M nodes (machines).

A natural idea is to partition the set of possible divisors and let each of the machines look through one of the parts. One way of performing this partition is as follows: machine 0 gets to check the candidates 1, 1 + M, 1 + 2M, ..., machine 1 gets to check the candidates 2, 2 + M, 2 + 2M, ..., etc. To make it work, it suffices to change the for-loop from the above code to:

for (long long d = 1 + MyNodeId(); d * d <= n; d += NumberOfNodes()) {

In the end we need to aggregate the partial results. Let us select any of the machines (say, machine 0) as an aggregator. We just need to add code to be executed on each of the machines that sends the partial results from machines with positive numbers to the machine number 0.

if (MyNodeId() > 0) {
  PutInt(0, divisors_num);
  Send(0);
} else {  // MyNodeId() == 0
  for (int node = 1; node < NumberOfNodes(); ++node) {
    Receive(node);
    divisors_num += GetInt(node);
  }
  cout << divisors_num << endl;
}
}

The final solution works in O(√n / M) time, i.e., this is the maximum of the time complexities of the instances running on each of the machines.

3. Task “Workshop” (2014)

During an algorithmic workshop students are sitting in a circle and solving problems. Whenever someone comes up with a solution to a problem, he or she shares the idea with his or her two neighbours (which takes exactly one minute), and then they pass it to their neighbours (which also takes exactly one minute), and so on. If Johny comes up with a brilliant solution at quarter to twelve, what time will Chris hear about it? How many minutes does it take for a solution to reach from Kate to Tom? These are the kinds of queries your program has to answer.

Input
The input data is provided by an interactive library. Your program can call six functions from the library:

● int NumberOfStudents(): returns the number n of students participating in the workshop (3 ≤ n ≤ 10^9). The students are numbered with consecutive integers from 1 to n.
● int FirstNeighbour(int i): returns the number of the first neighbour of the i-th student (1 ≤ i ≤ n). The students have a hard time distinguishing left from right, therefore they prefer to call their neighbours in the order of their numbers. That is, the first neighbour of a given student always has a number smaller than his or her second neighbour.
● int SecondNeighbour(int i): returns the number of the second neighbour of the i-th student (1 ≤ i ≤ n).
● int NumberOfQueries(): returns the number q of queries your program has to answer (0 ≤ q ≤ 200). The queries are numbered with consecutive integers from 1 to q.
● int QueryFrom(int i): for the i-th query (1 ≤ i ≤ q), returns the number of the student who came up with a solution.
● int QueryTo(int i): for the i-th query (1 ≤ i ≤ q), returns the number of the student willing to know when he or she is going to hear the solution.

Output
Your program should print exactly q lines to the standard output. The i-th line should contain the answer to the i-th query, i.e., the number of minutes it takes for a solution to reach from one student to the other.

3.1. Solution

In the problem a cycle is specified using an oracle which for a given vertex returns its two neighbours. The neighbours are returned in an order that is not necessarily consistent with the order of the vertices on the cycle. A number of queries are given, each consisting of two vertices, and the task is to compute for each query the length of the shortest path between the two vertices.

The first step of the model solution is to select a subset of vertices, which we call checkpoints. The checkpoints include all the vertices which are part of any query and additionally some number of randomly selected vertices. Later we discuss how to choose this number. After the checkpoints have been selected, each checkpoint is randomly assigned to some machine (node). All the random choices are made with a deterministic pseudorandom number generator so that all machines select the same checkpoints without needing to communicate.

In the second step each machine processes the checkpoints assigned to it. Starting from a checkpoint, the process running on the machine traverses the cycle in both directions until it reaches (at both ends) any other checkpoints (they might be assigned to a different machine). While traversing the cycle, the process counts the number of visited vertices. Finally, it sends to the first machine a list of statements of the following form: the distance between the checkpoints a and b equals d and there is no other checkpoint between them.

The third step is run only on the first machine. First, it receives messages from all the other machines. Using the information from the messages, the machine computes the order of checkpoints on the cycle and the distance between each two neighbouring checkpoints. After linear-time processing of this information, it is easy to answer each query in constant time.

We are left with the problem of choosing the number of checkpoints. Recall that M denotes the number of machines. Each machine processes a random fraction of 1/M of the checkpoints, which makes the expected running time of each machine O(n/M). However, from a theoretical point of view, this kind of statement is worthless. Consider an imaginary situation in which we need to perform computations that take Θ(n) total time, and we pick a random machine to perform all of it. Then each machine spends Θ(n) time with probability 1/M, which gives exactly Θ(n/M) expected time for each machine, just like in the case of our solution. At the same time, we are interested in the running time of the slowest machine, which is clearly still Θ(n).

For this reason, we study the performance of our solution experimentally. Luckily, the running time depends mostly on the size of the input data, not on its structure. Our simulations show that with M randomly selected checkpoints, the longest running machine uses O((n/M) lg M) time, compared to O(n/M) expected time, which is consistent with a theoretical analysis in (David and Nagaraja, 2003, p. 135). It is possible to reduce the variation between machines by increasing the number of checkpoints. In practice, our solution, which always selects 10 000 checkpoints, is about 1.5–2 times faster than the one that selects M checkpoints.
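To illustrate the deterministic randomness used above, here is a small sketch (ours; the names and the fixed seed are illustrative): every machine seeds the same generator, so all of them compute identical checkpoint sets and agree on which machine handles which checkpoint without any communication.

#include <random>
#include <set>
#include <vector>

std::vector<int> ChooseCheckpoints(int n, int extra,
                                   const std::vector<int>& query_vertices) {
  std::set<int> checkpoints(query_vertices.begin(), query_vertices.end());
  std::mt19937_64 rng(123456789);  // the same fixed seed on every machine
  std::uniform_int_distribution<int> dist(1, n);
  while ((int)checkpoints.size() < (int)query_vertices.size() + extra &&
         (int)checkpoints.size() < n)
    checkpoints.insert(dist(rng));
  return std::vector<int>(checkpoints.begin(), checkpoints.end());
}

// A checkpoint v can then be assigned to machine Hash(v) % NumberOfNodes()
// for any fixed hash function, again with no communication needed.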

3.2. Tests

It seems that virtually any nontrivial test is sufficient to distinguish solutions based on an incorrect algorithm. However, a bit more care is required to distinguish solutions that are correct but may be too slow, e.g. a variant of the model solution that selects as the checkpoints only the query endpoints and the vertices 1, 2, ..., M instead of randomly selected vertices. To make such solutions exceed the time limit we need to pay attention to keep large contiguous fragments of the cycle without any query vertex and leave a large fragment of the initial 1, 2, ..., n cycle around the vertex 1 unaltered. The remaining part of the cycle is permuted either by performing a circular shift on the binary representations of vertex numbers or by xor-ing them with some fixed number. This method allows O(1)-time calculation of vertex neighbours and produces a cycle looking sufficiently random to make it difficult to come up with a clever incorrect solution exploiting this particular structure of the test.
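For illustration, a neighbour query under the xor relabelling can be answered as follows (our sketch; in the real tests this is combined with the query vertices and the unaltered fragment around vertex 1).

#include <utility>

// Neighbours of (relabelled) vertex v on a cycle of n vertices, numbered here
// from 0 for simplicity. We assume n is a power of two and mask < n, so that
// xor-ing with mask is a permutation of {0, ..., n-1}.
std::pair<int, int> Neighbours(int v, int n, int mask) {
  int original = v ^ mask;              // position on the underlying cycle
  int prev = (original + n - 1) % n;
  int next = (original + 1) % n;
  return {prev ^ mask, next ^ mask};    // relabel back
}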

4. Task “Assistant” (2014)

The life of an assistant is not easy. Not only did the professor order him to write a terribly long review, she also requested some corrections today.

Pushing keys of a keyboard is very tiring, so the primary goal of the assistant is to push keys as few times as possible while correcting the review. The keyboard that he is using allows him, with a single click, to delete a character in the review, change a character to a different one, or insert one character anywhere in the review.

To make things worse, the assistant has a very peculiar sense of esthetics. He likes letters from the beginning of the alphabet (like a, b or c), but is disgusted by the letters from the end of the alphabet (in particular, y and z). Each time he presses a key and changes a letter that comes earlier in the alphabet to a letter that comes later (for example, m to p), he suffers an esthetic shock, which is devastating for him. Because of that, the secondary goal of the assistant is to minimize the number of such changes.

Input
The first line of the standard input contains two integers k and l (1 ≤ k, l ≤ 100 000) that specify the lengths of the first and the second version of the review. The following two lines contain the two versions themselves. Each review consists only of lowercase letters.

Output
Your program should output a single line containing the minimal number of keyboard presses that the assistant has to perform in order to correct the review, followed by the minimal number of esthetic shocks that he will suffer.

4.1. Solution

The problem considered in this task is a variant of the well-known edit distance problem. Our solution will refer to the classical dynamic programming approach to this problem, which has been described in a number of textbooks (see, e.g., Cormen et al., 2009). The solution that we obtain can be easily extended with minimizing the number of esthetic shocks. In short, it suffices to store, instead of only the edit distances, pairs of integers that describe the edit distance and the number of esthetic shocks, and to compare them lexicographically.
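As a small illustration (ours; the names are not from the model solution), the cell update of the classical edit-distance recurrence extended with such pairs could look as follows; for simplicity we assume that only substitutions of an earlier letter by a later one cause an esthetic shock.

#include <algorithm>
#include <utility>

using Cell = std::pair<int, int>;  // (key presses, esthetic shocks)

Cell Update(Cell del, Cell ins, Cell sub, char a, char b) {
  // Candidate costs of the three edit operations; pairs are compared
  // lexicographically, so presses are minimized first, then shocks.
  Cell by_delete(del.first + 1, del.second);
  Cell by_insert(ins.first + 1, ins.second);
  Cell by_substitute = (a == b) ? sub
      : Cell(sub.first + 1, sub.second + (b > a ? 1 : 0));
  return std::min(std::min(by_delete, by_insert), by_substitute);
}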

Denote by 1      the characters in the first version of the review and by 1      the characters in the second version. Our goal is to compute a two dimen- sional   matrix , where  ( ) contains the minimum number of changes that £ the assistant has to perform in order to change 1      into 1      . Just like in the edit distance problem, the values  (1 ·) and  (· 1) can be computed in a straightforward way, whereas for 2 ≤  ≤  and 2 ≤  ≤ ,  ( ) can be computed in constant time, given  ( − 1 ),  (  − 1) and  ( − 1  − 1). Assume that the topmost row of matrix  contains elements  (1 ·) and the left- most column contains  (· 1). Partition the matrix into  stripes consisting of  consecutive columns (for simplicity, we assume that  is divisible by ), where  is the number of machines available. Each machine is responsible for filling in the entries of  in one stripe. Let the -th machine (for  1      ) be responsible for the -th 2 f g stripe from the left. See Fig. 1 for illustration. Assume that the topmost row of matrix D contains elements D(1, ) and the · leftmost column contains D( , 1). Partition the matrix into M stripes consisting · of l/M consecutive columns (for simplicity, we assume that l is divisible by M), where M is the number of machines available. Each machine is responsible for filling in the entries of D in oneA. stripe.Karczmarz Let the i -thet machineal. (for i 1,...,M ) 188 ∈{ } be responsible for the i-th stripe from the left. See Fig. ?? for illustration.

Fig. 1. The process of computing the matrix D. Stripes assigned to different machines have been marked with different shades of gray. The numbers inside the matrix specify the phase number when the respective part of the matrix is computed (see below).

The first machine fills in the first (leftmost) stripe, starting from the topmost row. Consider the rightmost column in the first stripe. Observe that the contents of this column is everything the second machine needs to know in order to fill in its stripe. The entries in the rightmost column of the first stripe are filled in by the first machine from top to bottom, and as they are being computed, the first machine sends them to the second machine. The second machine then fills in its stripe and sends the contents of the cells in the rightmost column of its stripe to the third machine, and so on.

The correctness of this approach should be clear. However, we need to improve it a little bit in order to make it efficient. Clearly, each machine requires O(kl/M) time to fill in its cells. However, all machines (except for the last one) send k messages, each containing a single number. This may be very inefficient, but can be fixed easily, as a machine may send the contents of cells in batches, each containing b numbers, thus reducing the number of messages to ⌈k/b⌉. This obviously does not impact the running time of each machine.

However, there is one more efficiency aspect that we should take care of. Namely, we need to assure that the machines do not wait long for the numbers they need to have in order to perform their computation. If each machine sends the contents of the rightmost column only after filling in its entire stripe (i.e. sends batches of b = k numbers), then our solution becomes essentially sequential. On the other hand, we know that b = 1 is also not a good choice, for performance reasons.

Let us analyze how to pick a good value of b. For the analysis, assume that all the machines are perfectly synchronized, that is, in each time unit a machine can fill in exactly one cell of its stripe (or wait for the data it needs to continue working). Consider now a phase of exactly bl/M time units. Since each stripe has width l/M, the first machine sends the first message to the second machine exactly after the first phase. In the second phase, the second machine starts working (fills in the first b rows of its stripe) while the first one keeps on working on the following rows. From the second phase on, the second machine no longer needs to wait, as it receives the necessary data exactly the moment when it needs it. In general, the i-th machine starts to work in the i-th phase, after (i − 1) · bl/M = O(bl) time units. Thus, the M-th machine starts after O(bl) time units and then works for O(kl/M) time units. Hence, all the machines finish within O(bl + kl/M) time units and send O(k/b) messages each.

By setting b = ⌊k/M⌋ we assure that the waiting time is dominated by the computing time, which means that our solution parallelizes the single-machine solution in a perfect way. At the same time, the total number of messages sent is moderate (O(M^2)).
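The communication pattern described above can be sketched as follows (our illustration; the DP update itself is elided, and message.h again stands for the assumed library header from Section 1.2).

#include <algorithm>
#include <vector>
#include "message.h"

// One machine's work: process its stripe in batches of b rows, pulling the
// needed cells of the previous stripe's last column from the left neighbour
// and pushing its own last column to the right neighbour.
void FillStripe(int k, int b) {
  int me = MyNodeId(), nodes = NumberOfNodes();
  std::vector<long long> left_col(k + 1), my_col(k + 1);
  for (int row = 1; row <= k; row += b) {
    int rows_now = std::min(b, k - row + 1);
    if (me > 0) {                              // wait for the next batch
      Receive(me - 1);
      for (int r = row; r < row + rows_now; ++r) left_col[r] = GetLL(me - 1);
    }
    // ... fill rows [row, row + rows_now) of this stripe here, reading
    // left_col for the column just to the left of the stripe and storing
    // the stripe's rightmost column in my_col ...
    if (me + 1 < nodes) {                      // forward our rightmost column
      for (int r = row; r < row + rows_now; ++r) PutLL(me + 1, my_col[r]);
      Send(me + 1);
    }
  }
}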

5. Task “Sabotage” (2015)

The city of Megabyteopolis was built upon a large lake and consists of a number of isles connected with bridges. The bridges may run above other bridges.

A group of saboteurs wants the current president Byteasar not to be reelected. They plan to impact the public opinion by exposing Byteasar's administration's helplessness in the case of a major emergency. Specifically, they decided to blow up one of the bridges (they cannot afford blowing up more). The sabotage could be considered successful only if there was no other way between the isles previously connected by the destroyed bridge. Your task is to find the number of bridges that the saboteurs should consider when working out the details.

Input
● int NumberOfIsles(): returns n (1 ≤ n ≤ 200 000) – the number of isles constituting the city of Megabyteopolis. The isles are numbered 0 through n − 1.
● int NumberOfBridges(): returns m (1 ≤ m ≤ 10^8) – the number of bridges in the city. The bridges are numbered 0 through m − 1.
● int BridgeIntA(int i): returns the first isle connected by the bridge i.
● int BridgeIntB(int i): returns the second isle connected by the bridge i.

Output
The output should contain a single integer – the number of bridges whose blowup could result in the sabotage being considered successful.

5.1. Solution

In this task we are asked to solve a basic graph problem: for a given undirected graph G = (V, E) we need to compute the number of bridges. A bridge is defined here as an edge of G whose removal results in an increase of the number of connected components of G. Denote by B(G) the set of bridges of G. A textbook algorithm (e.g., Sedgewick, 2002) for computing B(G) is based on an extension of the depth-first search (DFS) algorithm and runs in O(n + m) time. The number of vertices in our graph is quite small, i.e., the bound on the order of 10^5 is typical for graph tasks even in the traditional, non-distributed setting. The number of edges m in our case can be, however, much larger.

Unfortunately, DFS is not an algorithm that can be parallelized easily. Nevertheless, we do not need to entirely abandon the idea of using DFS: our strategy is to use the multiple machines to reduce our problem instance to an instance with only O(n) edges. In such a reduced instance, we use DFS to find the bridges.

Definition 1. Consider a graph H = (V, E). We define a bridge certificate of H to be a set X ⊆ E such that for any Y ⊆ V × V, B((V, E ∪ Y)) = B((V, X ∪ Y)).

As a result, replacing a subset of edges of G with its bridge certificate does not affect the set of bridges of G. We are going to use bridge certificates to detect and remove edges of G. The following lemma describes a construction of a bridge certificate. For completeness we give its proof in the Appendix.

Lemma 1. Let H = (V, E) and n = |V|, m = |E|. Then there exists a bridge certificate X of H such that |X| ≤ 2n which can be computed in O(n + m) time.
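For illustration, the construction behind Lemma 1 (a union of two edge-disjoint spanning forests, described in the Appendix) can be computed in a single pass over the edges with two union-find structures. This is our sketch, not necessarily the authors' implementation; any pair of forests F and F' as in the proof would do.

#include <numeric>
#include <utility>
#include <vector>

struct DSU {
  std::vector<int> parent;
  explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
  int Find(int x) { return parent[x] == x ? x : parent[x] = Find(parent[x]); }
  bool Union(int a, int b) {        // true iff a and b were in different trees
    a = Find(a); b = Find(b);
    if (a == b) return false;
    parent[a] = b;
    return true;
  }
};

using Edges = std::vector<std::pair<int, int>>;

// X = F ∪ F': an edge joins F if it connects two components of F built so far,
// otherwise it may join F' under the same rule; all other edges are discarded.
Edges BridgeCertificate(int n, const Edges& edges) {
  DSU forest1(n), forest2(n);
  Edges certificate;                // at most 2(n - 1) edges
  for (const auto& e : edges) {
    if (forest1.Union(e.first, e.second) || forest2.Union(e.first, e.second))
      certificate.push_back(e);
  }
  return certificate;
}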

By Lemma 1, we can take any subset E' of edges of G, find a bridge certificate of (V, E') and replace E' in G with X ⊆ E' in O(n + |E'|) time. We call this step a reduction with respect to E'.

It turns out that we can easily perform the reductions in a parallel fashion. For simplicity, first assume that we have only two machines, i.e., M = 2. We partition the edge set E into two sets E_0, E_1 of roughly equal size. The machine i, for i = 0, 1, performs the reduction step on the set E_i, obtaining a certificate X_i of size at most 2n, in O(n + |E_i|) time. Next, machine 1 sends the set X_1 to machine 0. In the last step, machine 0 runs DFS to find the set B((V, X_0 ∪ X_1)). This, however, takes only O(n) time, as |X_0 ∪ X_1| ≤ 4n. By the definition of a certificate,

B(G) = B((V, E_0 ∪ E_1)) = B((V, X_0 ∪ E_1)) = B((V, X_0 ∪ X_1)).

This shows that indeed this approach finds all the bridges of G.

In order to develop a distributed algorithm using M > 2 machines, we perform multiple reduction phases. In the first phase, the set of edges is partitioned among the M machines, and each machine computes a certificate of the edges assigned to it. In the following phases the certificates are merged in pairs: two certificates produced in the previous phase are sent to a machine that takes their union and computes the certificate of the resulting graph.

Let us describe this process formally. Assuming that the machines are numbered 0 through M − 1, we split the input edge set E arbitrarily into M parts E_0, ..., E_{M−1}, each of size O(m/M). Our distributed algorithm runs in K = ⌈log_2 M⌉ + 1 phases numbered 0 through K − 1. In the j-th phase (j = 0, ..., K − 1) only the machines with identifiers i divisible by 2^j are active and actually do perform some work. With each active machine i we associate two sets X_i, Y_i (X_i, Y_i ⊆ E) whose contents depend on the phase number. Before the j-th phase:

● X_i is a bridge certificate of the graph G_{i,j} = (V, E_i ∪ E_{i+1} ∪ ... ∪ E_{i+2^j − 1}). In the above we set E_{i+l} = ∅ if i + l ≥ M.
● If j = 0 then X_i = E_i. Otherwise, |X_i| ≤ 4n.

After the j-th phase the set Y_i is a bridge certificate of G_{i,j} and |Y_i| ≤ 2n. Note that it follows that after the phase K − 1, Y_0 is a bridge certificate of G and |Y_0| = O(n). At that point the machine 0 runs DFS on (V, Y_0) to compute the set of bridges of G in O(n) time.

It remains to show how to implement the phases so that the invariants imposed on the sets X_i, Y_i are satisfied. Before the first phase we set X_i = E_i. Consider now the j-th phase. We perform a reduction of Lemma 1 on the set X_i in order to obtain the set Y_i. This takes O(n + |X_i|) time, which is O(n) for j > 0 and O(n + |E_i|) if j = 0. The last step is to initialize the sets X_i before the next, (j + 1)-th phase. To do that, for each i divisible by 2^{j+1}, we set X_i = Y_i ∪ Y_{i+2^j} if i + 2^j < M and X_i = Y_i otherwise. As for all i we have |Y_i| ≤ 2n, clearly now |X_i| ≤ 4n. To implement this step, the machine i + 2^j sends the entire set Y_{i+2^j} to the machine i. This requires a single message of O(n) bytes.

Fig. 2 depicts the phases of our distributed algorithm. In each phase every machine remains idle, sends O(n) bytes, or receives O(n) bytes. Consequently, the first phase takes O(n + m/M) time on each machine and each of the K − 1 remaining phases runs in O(n) time on each machine. Thus, the time complexity of this solution is O(m/M + n log M).
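The merging phases can be sketched as follows (our illustration; BridgeCertificate is the reduction of Lemma 1, a possible implementation of which is sketched after the lemma above, edge lists are serialized as pairs of ints, and message.h is again an assumed header name).

#include <utility>
#include <vector>
#include "message.h"

using Edges = std::vector<std::pair<int, int>>;
Edges BridgeCertificate(int n, const Edges& edges);  // Lemma 1, O(n + |edges|)

// Returns, on machine 0, an O(n)-size certificate of the whole graph;
// other machines return an empty list once they have shipped their part.
Edges RunPhases(int n, Edges X /* this machine's share of E */) {
  int me = MyNodeId(), machines = NumberOfNodes();
  for (int step = 1; step < machines; step *= 2) {   // step == 2^j in phase j
    if (me % (2 * step) == step) {                   // sender in this phase
      Edges Y = BridgeCertificate(n, X);
      PutInt(me - step, (int)Y.size());
      for (const auto& e : Y) {
        PutInt(me - step, e.first);
        PutInt(me - step, e.second);
      }
      Send(me - step);
      return Edges();                                // idle from now on
    }
    if (me % (2 * step) == 0) {                      // active in this phase
      X = BridgeCertificate(n, X);                   // X_i -> Y_i
      if (me + step < machines) {                    // merge the partner's Y
        Receive(me + step);
        int cnt = GetInt(me + step);
        for (int t = 0; t < cnt; ++t) {
          int a = GetInt(me + step), b = GetInt(me + step);
          X.emplace_back(a, b);
        }
      }
    }
  }
  return BridgeCertificate(n, X);  // the final reduction on machine 0
}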

5.2. Tests

The library providing the test data had to be robust enough to serve graphs with large edge sets and nontrivial 2-edge-connected components (i.e., connected components of G formed after removing all the bridges), given limited time and space. It seems that a hard test case is a graph with a maximum number of edges and a possibly large number of bridges. In such a case at least some of the 2-edge-connected components should be very dense.

Fig. 2. The phases of the distributed algorithm when M = 7. In this case K = 4 phases are performed. The blue arrows illustrate the communication between the machines in the corresponding phases.


The infrastructure for generating test graphs provided a general graph interface along with a few specialized implementations (a vertex, a path, a cycle, a clique, a pseudorandom graph, a set of loops) that could serve the edges in O(1) time with constant space consumption, regardless of the graph size. As an example, a cycle on n vertices numbered 0 through n − 1 can be represented with a single integer n: when asked for the cycle's i-th edge, we just return (i, (i + 1) mod n).

Such graphs could then be combined into larger and more sophisticated graphs by unions and direct sums, and also extended by adding specified edges, which were typically used to ensure the desired structural properties of the served graph. For example, this allowed us to easily generate a tree of size 100 with each vertex replaced with a random 2-edge-connected graph with 1000 vertices and between 100 000 and 500 000 edges. Such a graph had on the order of 10^7 edges in total, 99 bridges and could be represented with a number of bytes on the order of 10^2. At the highest level, the vertices were assigned random identifiers, whereas the list of edges was randomly permuted. The size of the in-memory representation of each test case was O(n) per machine and the sophistication level of the served graphs was limited only by the need to return the requested edge in time on the order of 100 ns.
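A minimal sketch (ours) of such a graph interface follows: a cycle is fully described by a single integer and serves its i-th edge in constant time; unions, direct sums and relabellings can be added as further wrappers that translate edge indices before delegating to their sub-graphs.

#include <utility>

struct Graph {
  virtual ~Graph() = default;
  virtual long long NumVertices() const = 0;
  virtual long long NumEdges() const = 0;
  virtual std::pair<long long, long long> Edge(long long i) const = 0;
};

struct Cycle : Graph {
  long long n;
  explicit Cycle(long long n) : n(n) {}
  long long NumVertices() const override { return n; }
  long long NumEdges() const override { return n; }
  std::pair<long long, long long> Edge(long long i) const override {
    return {i, (i + 1) % n};   // the i-th edge of the cycle 0, 1, ..., n-1
  }
};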

6. Conclusions

We described a novel format of programming competitions, aimed at familiarizing students with an increasingly important area of computer science – the design of distributed algorithms. While there is a considerable engineering effort involved in preparing the backend for such a competition, we hope that an increasing number of competitions (maybe including the IOI in the future) will feature tracks or problems of a distributed nature, to reflect the industry's shift toward cloud-based and distributed computing.

References

Chávez, L.H. (2015). libinteractive: a better way to write interactive tasks. Olympiads in Informatics, 9, 3–14.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2009). Introduction to Algorithms. MIT Press.
David, H.A., Nagaraja, H.N. (2003). Order Statistics (3rd Edition). Wiley.
Dean, J., Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM – 50th anniversary issue: 1958–2008, 51(1), 107–113.
Forišek, M. (2013). Pushing the boundary of programming contests. Olympiads in Informatics, 7, 23–35.
Fox, A., Brewer, E. (1999). Harvest, yield and scalable tolerant systems. In: Proc. 7th Workshop on Hot Topics in Operating Systems (HotOS 99). IEEE CS, 174–178.
Komm, D. (2011). Teaching the concept of online algorithms. Olympiads in Informatics, 5, 58–70.
Kulczyński, T., Łącki, J., Radoszewski, J. (2011). Stimulating students' creativity with tasks solved using precomputation and visualization. Olympiads in Informatics, 5, 71–81.
Sedgewick, R. (2002). Algorithms in C++, Part 5: Graph Algorithms. Addison-Wesley Longman Publishing Co., Inc., Boston.

A. Karczmarz (1990), PhD student at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland, jury member of multiple programming contests organized by the University of Warsaw, coorganizer of Algorithmic Engagements 2015. In his research he focuses on algorithms and data structures.

J. Łącki (1986), postdoctoral researcher at Sapienza University of Rome, Italy, IOI Scientific Committee elected member, appointed chair for 2016, responsible for task selection at Algorithmic Engagements for many years, former head organizer of the Polish Training Camp. In his research he focuses on graph algorithms.

A. Polak (1991), PhD student at the Department of Theoretical Computer Science, Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland, judge at the ACM Central Europe Regional Contest in 2012, 2013, and 2014. His research interests lie in algorithms, complexity theory, and computer vision.

J. Radoszewski (1984), assistant professor at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland, and Newton International Fellow at King's College London, UK, chair of the jury of the Polish Olympiad in Informatics, co-chair of the Scientific Committee of CEOI'2011 in Gdynia, former member of Host Scientific Committees of IOI'2005, CEOI'2004, BOI'2008, and BOI'2015. His research interests focus on text algorithms and combinatorics.

J.O. Wojtaszczyk (1980), Staff Software Engineer at Google, Warsaw, judge at the ACM ICPC World Finals in 2011, 2012, 2013, 2015 and 2016, coorganizer of the Google Code Jam since 2012, main organizer of the Distributed Code Jam. The primary focus of his engineering work is around cluster management.

Appendix: Proof of Lemma 1

Lemma 1. Let H = (V, E) and n = |V|, m = |E|. Then there exists a bridge certificate X of H such that |X| ≤ 2n which can be computed in O(n + m) time.

Proof. We compute the set X in the following way. First, compute some spanning forest F (we identify it with a set of its edges) of H using any graph search algorithm. This takes O(n + m) time. Then, compute some spanning forest F' of H' = (V, E \ F). Finally, set X = F ∪ F'. Clearly, |X| ≤ 2n and X ⊆ E.

We now prove that X is indeed a bridge certificate of H. Let Y ⊆ V × V. First, let us show that the graphs H_1 = (V, E ∪ Y) and H_2 = (V, X ∪ Y) have the same connected components. Clearly, if there exists a path u ⇝ v in H_2 then a path u ⇝ v exists in H_1, as H_2 is a subgraph of H_1. Conversely, if u and v are connected in H_1 by a path P, then any edge (a, b) ∈ P \ Y such that (a, b) ∈ E \ X can be replaced by a path a ⇝ b contained entirely in F (recall that F is a spanning forest of H).

For a graph G, define G − e as G with the edge e removed. Now assume that (u, v) is a bridge in H_1, i.e., (u, v) ∈ B(H_1). Then, there is no path u ⇝ v in H_1 − (u, v). As H_2 − (u, v) is a subgraph of H_1 − (u, v), there is also no path u ⇝ v in H_2 − (u, v). Moreover, H_1 and H_2 have the same connected components and thus (u, v) ∈ X ∪ Y. Thus, (u, v) ∈ B(H_2) and consequently B(H_1) ⊆ B(H_2).

Finally, suppose that (u, v) ∈ B(H_2). Then, u and v are connected in H_2, but no path u ⇝ v exists in H_2 − (u, v). As H_2 is a subgraph of H_1, (u, v) ∈ E ∪ Y. Let us show that no path u ⇝ v exists in H_1 − (u, v). Assume the contrary and let P be such a path. If P ⊆ X ∪ Y, then P would be a u ⇝ v path in H_2 − (u, v), which is not possible. Hence, there exist some edges (a, b) ∈ P such that (a, b) ∈ E \ X \ {(u, v)}. As (a, b) ∉ F and (a, b) ∉ F', there exist paths Q and Q' from a to b in both spanning forests F and F', correspondingly. At least one of these paths, say Q, does not contain (u, v). We may replace the edge (a, b) with the path Q ⊆ X. By replacing all such edges (a, b) with paths contained in X, we obtain a path u ⇝ v in H_2 − (u, v), a contradiction. Thus, (u, v) ∈ B(H_1) and, consequently, B(H_2) ⊆ B(H_1).
Hence, If thereP existX someY ,edge− then edgesuP (woulda,withv b) be theP− asuch pathu H v path(u, v.) in By2 replacing2 − all such edges with paths Let us show thatpath nou pathv inH2 exists(u, v in). Moreover,1 ⊆ .H1 Assumeand H−2 thehave contrary the same connected − ⊆ ∪ (u,contained v) B(H1 in) and,X∈, we consequently, obtain− a pathB(uH2) v inB(HH1). (u, v), a contradiction. Thus, that (a, b) HE2 X(u, v)(,u, which v) . is As notand(a, possible. b) let/ PF beand Hence, such(a, ba there) path./ F∈ exist, there If P some exist−X edges pathsY (,a, then b) PP wouldsuch  be⊆ a u 2 v path in ∈ \− \{ } ∈ components(∈u, v) andB(H thus⊆) (u,∪ v) X ∈ Y . Thus,B(H ()u, vB) (H−B)(H2) and consequently Q and Q fromthata(toa, bb) in bothE X spanningH(u, v)(u, forests. v As), which(a,F band) is/ notF ,and correspondingly.possible.(a, b) 1/ Hence,Fand,, there thereconsequently, At∈ exist exist∪ paths some edges2 (a,∈ b) 1 .P such ∈ \ \{ 2 − } B(H∈1) B(∈H2). ∈ ⊆ ∈ least one ofQ theseand paths,Q from saya Qto,b doesthatin both not(a, b contain spanning) E (Xu, forests v).⊆(u, WeF v) mayand. AsF replace(,a, correspondingly. b) / theF and (a, b) At/ F , there exist paths ∈ \ Finally,\{ suppose} that (u,∈ v) B(H2). Then,∈ u and v are connected in H2, but edge (a, b) withleast the one path of theseQ X paths,.Q Byand replacingsayQQ ,from does alla not suchto b contain edgesin both(a,(u, spanning b) v)with. We paths forestsmay∈ replaceF and theF , correspondingly. At ⊆ no path u  v exists in H2 (u, v). As H2 is a subgraph of H1, (u, v) E Y . contained inedgeX, we(a, obtain b) with a the path pathleastu Qv oneinXH of.2 By these( replacingu, vpaths,), a contradiction. sayall suchQ, does edges not Thus,(a,− contain b) with( pathsu, v). We may replace the ∈ ∪ ⊆ −Let us show that no path u  v exists in H1 (u, v). Assume the contrary (u, v) B(Hcontained1) and, consequently, in X, we obtainedgeB(H2 a()a, path b)Bwith(uH1) the. v in pathH2 Q (u,X v)., Bya contradiction. replacing all such Thus, edges (−a, b) with paths ∈ ⊆ and let P −be⊆ such a path. If P X Y , then P would be a u  v path in (u, v) B(H1) and, consequently,contained inBX(,H we2) obtainB(H1 a). path u v in H ⊆(u, v∪), a contradiction. Thus, ∈ H2 ⊆(u, v), which is not possible.2 Hence, there exist some edges (a, b) P such (u, v) B(H ) − B(H ) B(H− ) ∈ 1 thatand,( consequently,a, b) E X (u,2 v) . As1(a,. b) / F and (a, b) / F , there exist paths ∈ ∈ \ \{ ⊆} ∈ ∈ Q and Q from a to b in both spanning forests F and F , correspondingly. At least one of these paths, say Q, does not contain (u, v). We may replace the edge (a, b) with the path Q 19X. By replacing all such19 edges (a, b) with paths ⊆ contained in X, we obtain a path u v in H (u, v), a contradiction. Thus,  2 − (u, v) B(H ) and, consequently, B(H ) B(H ). ∈ 1 2 ⊆ 1 19

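For concreteness, the certificate construction described in the proof can be implemented in a few lines. The following C++ sketch is ours and is not part of the original contest materials: it builds X = F ∪ F′ using a union-find structure, which gives an almost-linear O(m α(n)) bound rather than the O(n + m) bound obtained with a plain graph search; all identifiers, such as bridgeCertificate, are illustrative.

#include <cstdio>
#include <vector>
#include <numeric>
#include <utility>
using namespace std;

// Union-find with path compression and union by size.
struct DSU {
    vector<int> parent, sz;
    DSU(int n) : parent(n), sz(n, 1) { iota(parent.begin(), parent.end(), 0); }
    int find(int v) { return parent[v] == v ? v : parent[v] = find(parent[v]); }
    // Returns true iff a and b were in different components, i.e. the edge joins two trees.
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (sz[a] < sz[b]) swap(a, b);
        parent[b] = a; sz[a] += sz[b];
        return true;
    }
};

// Computes a bridge certificate X of H = (V, E): the union of a spanning forest F
// of H and a spanning forest F' of (V, E \ F). Then |X| <= 2(n - 1) <= 2n.
vector<pair<int,int>> bridgeCertificate(int n, const vector<pair<int,int>>& edges) {
    DSU dsuF(n), dsuFp(n);
    vector<pair<int,int>> X, rest;          // rest collects the edges of E \ F
    for (auto e : edges)
        (dsuF.unite(e.first, e.second) ? X : rest).push_back(e);   // X now holds F
    for (auto e : rest)
        if (dsuFp.unite(e.first, e.second)) X.push_back(e);        // append F'
    return X;
}

int main() {
    // Two triangles {0,1,2} and {3,4,5} joined by the single bridge (2,3).
    vector<pair<int,int>> edges = {{0,1},{1,2},{0,2},{2,3},{3,4},{4,5},{3,5}};
    for (auto e : bridgeCertificate(6, edges))
        printf("%d %d\n", e.first, e.second);
    return 0;
}

A DFS- or BFS-based forest computation could replace the union-find structure to match the O(n + m) bound stated in the lemma exactly; the certificate produced is the same up to the choice of spanning forests.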