Data-Trace Types for Distributed Stream Processing Systems Konstantinos Mamouras Caleb Stanford Rajeev Alur Rice University, USA University of Pennsylvania, USA University of Pennsylvania, USA
[email protected] [email protected] [email protected] Zachary G. Ives Val Tannen University of Pennsylvania, USA University of Pennsylvania, USA
[email protected] [email protected] Abstract guaranteeing correct deployment, and (2) the throughput of Distributed architectures for efficient processing of stream- the automatically compiled distributed code is comparable ing data are increasingly critical to modern information pro- to that of hand-crafted distributed implementations. cessing systems. The goal of this paper is to develop type- CCS Concepts • Information systems → Data streams; based programming abstractions that facilitate correct and Stream management; • Theory of computation → Stream- efficient deployment of a logical specification of the desired ing models; • Software and its engineering → General computation on such architectures. In the proposed model, programming languages. each communication link has an associated type specifying tagged data items along with a dependency relation over tags Keywords distributed data stream processing, types that captures the logical partial ordering constraints over data items. The semantics of a (distributed) stream process- ACM Reference Format: ing system is then a function from input data traces to output Konstantinos Mamouras, Caleb Stanford, Rajeev Alur, Zachary data traces, where a data trace is an equivalence class of se- G. Ives, and Val Tannen. 2019. Data-Trace Types for Distributed quences of data items induced by the dependency relation. Stream Processing Systems. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation This data-trace transduction model generalizes both acyclic (PLDI ’19), June 22–26, 2019, Phoenix, AZ, USA.