Developments in Dataflow Programming
University of Exeter
Department of Computer Science

Developments in Dataflow Programming

Daniel Julius Maxwell

April 2018

Supervised by Dr Antony Galton & Prof. Jonathan Fieldsend

Submitted by Daniel Julius Maxwell to the University of Exeter as a thesis for the degree of Doctor of Philosophy in Computer Science, April 2018.

This thesis is available for Library use on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement.

I certify that all material in this thesis which is not my own work has been identified and that no material has previously been submitted and approved for the award of a degree by this or any other University.

(signature)

Abstract

Dataflow has historically been motivated either by parallelism or programmability or some combination of the two. This work, rather than being directed primarily at parallelism or programmability, is instead aimed at maximising the overall utility to the programmer of the system at large. This means that it aims to result in a system in which it is easy to create well-constructed, flexible programs that comply with the principles of software engineering and architecture, but also that the proposed system should be capable of performing practical real-life tasks and should be as widely applicable as can be achieved.
With those aims in mind, this project has four goals:

• to argue for a unified global dataflow coordination system, extensible to be able to accommodate components of any form that may exist now or in the future;
• to establish a link between the design of such a system and the principles of software engineering and architecture;
• to design a dataflow coordination system based on those principles, aiming where possible to embed them in the design so that they become easy or unthinking for programmers to apply; and
• to implement and test components of the proposed system, using it to build a set of three sample algorithms.

Taking the best ideas that have been proposed in dataflow programming in the past (those that most effectively embed the principles of software engineering) and extending them with new proposals where necessary, a collection of interactions and functionalities is proposed, including a novel way of using partial evaluation of functions and data dimensionality to represent iteration in an acyclic graph.

The proposed design was implemented as far as necessary to construct three test algorithms: calculating a factorial, generating terms of the Fibonacci sequence and performing a merge-sort. The implementation was successful in representing iteration in acyclic dataflow, and the test algorithms generated correct results, limited only by the numerical representation capabilities of the underlying language. Testing and working with the implemented system revealed the importance to usability of the system being visual, interactive and, in a distributed environment, always-available.

Proposed further work falls into three categories: writing a full specification (in particular, defining the interfaces by which components will interact); developing new features to extend the functionality; and further developing the test implementation.
The conclusion summarises the vision of a unified global dataflow coordination system and makes an appeal for cooperation on its development as an open, non-profit dataflow system run for the good of its community, rather than allowing a proliferation of competing systems run for commercial gain.

To Avocado, for whom I hope computers will be easier to use.

Acknowledgements

I would like to thank my supervisors, Dr Antony Galton for his great attention to detail, being lightning-fast to understand and tease apart my logic, and insistence on clear explanations; and Professor Jonathan Fieldsend for solid competence, deep knowledge of programming languages, high standards and some timely advice on productivity.

Huge thanks to Clare for listening, sharing my excitement about the subject, being brave enough to read sections of my thesis and for her valiant efforts to explain the topic to our friends.

Thank you to my funders, EPSRC (grant number EP/K503046/1), and to the Department of Computer Science for awarding me the funding. And thank you to my examiners, Dr Yulei Wu and Professor Ian Watson, for taking on that mammoth task.

Contents

List of figures

1 Introduction
  1.1 Purpose
  1.2 What Is Dataflow?
  1.3 What Is A Coordination System
  1.4 Document Structure

2 The History of Dataflow
  2.1 Origin and Motivation
  2.2 Partial Evaluation
  2.3 Side-Effects
  2.4 Data-Driven vs. Demand-Driven Execution
  2.5 Iteration
  2.6 Pessimism ... and the Recovery
  2.7 Coordination Languages
  2.8 Dataflow Classification
  2.9 Implementations
  2.10 Open Problems
  2.11 Summary

3 Software Engineering
  3.1 Origins
  3.2 Development Methodologies
  3.3 Software Architecture
  3.4 Architectural Principles
  3.5 Architectural Styles
  3.6 Summary

4 Definition
  4.1 A Coordination System
  4.2 Functional Purity
  4.3 What Are Nodes And Connections?
  4.4 Separation Between Nodes And Resources
  4.5 Visual Representation
  4.6 The Service-Provider Model
  4.7 Triggering Execution
  4.8 Partial Evaluation
  4.9 Expected Inputs
  4.10 Dimensions
  4.11 Iteration
  4.12 The Generalised Iteration Node
  4.13 Function Isolation
  4.14 Notifications and Time-Stamps
  4.15 Subscription Types
  4.16 Synchronisation
  4.17 Testing and Development
  4.18 Example Application
  4.19 Further Work
  4.20 Summary

5 Implementation
  5.1 Feature Implementation
  5.2 Code Structure
  5.3 Test Algorithms
  5.4 Summary

6 Evaluation and Results
  6.1 Programmability
  6.2 Speed
  6.3 Distributability
  6.4 Summary

7 Further Work
  7.1 Full Specification
  7.2 Additional Features
  7.3 Further Work on the Implementation
  7.4 Summary

8 Conclusion

Appendices
  A Nomenclature
  B Visual Notation
  C API Documentation
  D Examples

Bibliography

List of Figures

2.1 Reproduction of Yazdanpanah et al. [2014] Fig 2.
3.1 An example of a stage-wise process model, reproduced from Benington [1956].
3.2 Not Royce’s recommendation. Royce commented about this process that it is “risky and invites failure”. Reproduced from Royce [1970].
3.3 Royce’s recommended process model.
3.4 An Efficiency Chart. The horizontal axis represents work done; the project starts at the left and moves to the right as it progresses.
3.5 Multiple phases of work. A series of phases of work in which each change to the project triggers a new phase.
3.6 Narrowing the gap. Seen on an efficiency chart, software engineering has two goals: to move change points to the left by uncovering knowledge; and to move reversion points to the right by making the system flexible.
3.7 An Efficiency Chart.
4.1 The ‘How Is Bob Feeling?’ node. A node’s computational power could come from any source.
4.2 Bob’s pie-making node. A node could have the job of creating some output or external effect.
4.3 Nodes in shallow and deep notation.
4.4 Inputs, outputs and connections.
4.5 An ID node is a node with one input, which provides its input, unaltered, as an output.
4.6 Partial evaluation.
4.7 Derived inputs.
4.8 Reusing nodes.
4.9 Connecting a node’s content to an upstream node.
4.10 Fully resolved input names.
4.11 Inputs that arrive by multiple routes. Each input can only be inherited once by each node.
4.12 Inheriting the ‘quantity’ input.
4.13 Without connecting the recipe, it would be impossible to set the quantity required.
4.14 Input parameters appear as a label attached to the input in question.
4.15 Transmission of parameters and derived inputs.
4.16 Internal storage of the connected value.
4.17 Nominated inputs. When an input with an ‘expected inputs’ parameter is connected to a node, an input of the incoming function must be nominated as the one to be associated with each expected input.
4.18 Implicit nomination. In cases where the incoming function has only one input, the nomination is not required, and is not always shown in drawings.
4.19 Name-only nomination. In cases where the nominated input name is unique, it can be nominated using its input name alone.
4.20 Where the input name being nominated is not.