Anomaly Detection in Manufacturing Equipment with Apache Flink: Grand Challenge. Yann Busnel, Nicolo Rivetti, Avigdor Gal

Total pages: 16

File type: PDF, size: 1020 KB

FlinkMan: Anomaly Detection in Manufacturing Equipment with Apache Flink: Grand Challenge
Nicolo Rivetti (Technion - Israel Institute of Technology), Yann Busnel (IMT Atlantique / IRISA / UBL), Avigdor Gal (Technion - Israel Institute of Technology)

To cite this version: Yann Busnel, Nicolo Rivetti, Avigdor Gal. FlinkMan: Anomaly Detection in Manufacturing Equipment with Apache Flink: Grand Challenge. DEBS '17: 11th ACM International Conference on Distributed and Event-based Systems, Jun 2017, Barcelona, Spain. pp. 274-279. DOI: 10.1145/3093742.3095099. HAL Id: hal-01644417, https://hal.archives-ouvertes.fr/hal-01644417, submitted on 22 Nov 2017.

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

ABSTRACT
We present a (soft) real-time event-based anomaly detection application for manufacturing equipment, built on top of the general-purpose stream processing framework Apache Flink. The anomaly detection involves multiple CPU- and/or memory-intensive tasks, such as clustering on large time-based windows and parsing input data in RDF format. The main goal is to reduce end-to-end latencies while handling high input throughput and still providing exact results. Given a truly distributed setting, this challenge also entails careful task and/or data parallelization and balancing. We propose FlinkMan, a system that offers a generic and efficient solution, which maximizes the usage of available cores and balances the load among them. We illustrate the accuracy and efficiency of FlinkMan over a 3-step pipelined data stream analysis that includes clustering, modeling and querying.

CCS CONCEPTS
• Information systems → Stream management; • Theory of computation → Distributed algorithms; Unsupervised learning and clustering;

KEYWORDS
Anomaly Detection, Stream Processing, Clustering, Markov Chains, Linked-Data

1 INTRODUCTION
Stream processing management systems (SPMS) and/or Complex Event Processing (CEP) systems are gaining momentum in performing analytics on continuous data streams. Their ability to achieve sub-second latencies, coupled with their scalability, makes them the preferred choice for many big data companies. Supporting this trend, since 2011 the ACM International Conference on Distributed and Event-based Systems (DEBS) has run the Grand Challenge series to increase the focus on these systems as well as to provide common benchmarks to evaluate and compare them. The ACM DEBS 2017 Grand Challenge focuses on (soft) real-time anomaly detection in manufacturing equipment [4]. To handle continuous monitoring, each machine is fitted with a vast array of sensors, either digital or analog. These sensors provide periodic measurements, which are sent to a monitoring base station. The latter then receives a large collection of observations. Analyzing this very-high-rate, and potentially massive, stream of events in an efficient and accurate way is the core of the Grand Challenge. Altogether, the analysis of such a massive amount of sensor readings requires an on-line analytics pipeline that deals with linked data, clustering, as well as Markov model training and querying.

The FlinkMan system proposes a solution to the 2017 Grand Challenge, making use of a publicly available streaming engine and thus offering a generic solution that is not specially tailored for this or that challenge. We offer an efficient solution that maximally utilizes the available cores, balances the load among the cores, and avoids, to the extent possible, tasks such as garbage collection that are only indirectly related to the task at hand.

The rest of the paper is organized as follows. Section 2 presents the query engine pipeline, the data set and the evaluation platform that are provided for this challenge. Section 3 introduces the general architecture of our solution and its rationale. Finally, Section 4 provides details of the implementation as well as the optimizations included in our solution.

2 PROBLEM STATEMENT
The overall goal is to detect anomalies in manufacturing machines based on a stream of measurements produced by the sensors embedded into the monitored equipment. The events produced by each sensor are clustered, and the state transitions between the clusters are used to train a Markov model. In turn, the produced Markov model is used to detect anomalies: a sequence of transitions that follows a low-probability path in the Markov chain is considered abnormal and is flagged as an anomaly.

2.1 Query
The anomaly detection analysis can be modeled as a pipeline with three stages: (i) clustering, (ii) Markov model training and (iii) Markov model querying (i.e., outputting transition sequences with low probability). These three steps are executed continuously on a time-based sliding window, and the whole pipeline is performed independently for each sensor of each machine. The query has 6 parameters: the time-based sliding window size W (in seconds), the initial number of clusters k (not uniform among sensors), the maximum number of iterations M of the clustering algorithm (if convergence has not been reached), the clustering algorithm convergence distance µ, the length N of the Markov model path we consider for computing the anomaly probability, and the probability threshold T below which the path is classified as an anomaly. Each event goes through all the mentioned stages, so a single event may change the clustering, modify the Markov model, and trigger an anomaly detection. It is worth noting that the detected anomalies must be ordered with respect to the ordering in the input stream.
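To make the shape of this per-sensor pipeline concrete, the sketch below keys a Flink DataStream by (machine, sensor) and evaluates the three stages over a sliding window of W observations; since every sensor emits one reading per second, W observations correspond to W seconds in steady state. This is a hypothetical skeleton rather than the FlinkMan code: the SensorReading type, the comma-separated format accepted by parse(), and the detect() stub (whose stages are sketched after Example 2.1 below) are assumptions made only for illustration.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

public class FlinkManSketch {

    // Hypothetical flat representation of one sensor event.
    public static class SensorReading {
        public String machineId;
        public String sensorId;
        public long timestamp;
        public double value;
        public SensorReading() {}
    }

    public static void main(String[] args) throws Exception {
        final int W = 10; // window size: W seconds, i.e. W observations per sensor in steady state

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source; the real input is an RDF/Turtle stream (see Section 2.2).
        DataStream<SensorReading> readings = env
                .socketTextStream("localhost", 9999)
                .map(FlinkManSketch::parse)
                .returns(SensorReading.class);

        readings
                // One logical pipeline per (machine, sensor) pair.
                .keyBy(r -> r.machineId + "/" + r.sensorId)
                // Sliding count window: W readings, advancing by one reading.
                .countWindow(W, 1)
                .process(new ProcessWindowFunction<SensorReading, String, String, GlobalWindow>() {
                    @Override
                    public void process(String key, Context ctx,
                                        Iterable<SensorReading> window, Collector<String> out) {
                        // Stage (i):   cluster the window values (k-means);
                        // Stage (ii):  train the Markov model on cluster transitions;
                        // Stage (iii): score the last N transitions against threshold T.
                        String anomaly = detect(key, window);
                        if (anomaly != null) {
                            out.collect(anomaly);
                        }
                    }
                })
                .print();

        env.execute("FlinkMan sketch");
    }

    // Assumed input format "machine,sensor,timestamp,value" for this sketch only.
    private static SensorReading parse(String line) {
        String[] f = line.split(",");
        SensorReading r = new SensorReading();
        r.machineId = f[0];
        r.sensorId = f[1];
        r.timestamp = Long.parseLong(f[2]);
        r.value = Double.parseDouble(f[3]);
        return r;
    }

    private static String detect(String key, Iterable<SensorReading> window) {
        return null; // clustering and Markov scoring are sketched separately below
    }
}
```

A count window is used because each sensor produces one observation per second (Section 2.2), so a window of W seconds holds exactly W readings in steady state; a time-based assigner such as SlidingEventTimeWindows would express the time-based definition directly over event time.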
Example 2.1. Table 1 contains an input window of size W = 10.

Clustering. First, the clustering algorithm groups all readings (from r0 to r9) into k = 3 clusters. To do so, a k-means algorithm is initialized: the cluster centers are set to the k first values encountered (represented as c0, c1 and c2 in the Init part of Figure 1).

Table 1: Example of an input window of size W = 10.

Event   Physical Timestamp   Logical Timestamp   Value
r0      1485903716000        1155                -0.04
r1      1485903717000        1165                -0.04
r2      1485903718000        1175                +0.02
r3      1485903719000        1185                -0.00
r4      1485903720000        1195                -0.01
r5      1485903721000        1205                -0.04
r6      1485903722000        1215                +0.00
r7      1485903723000        1225                -0.02
r8      1485903724000        1235                +0.00
r9      1485903725000        1245                +0.02

Figure 1: Clustering (k-means) with k = 3. The figure plots the readings on the value axis (from -0.05 to 0.03) and shows the positions of the cluster centers c0, c2 and c1 at initialization (Init) and after the first and second iterations (Iter1, Iter2).
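The clustering stage of the example reduces to a one-dimensional k-means over the W values of the window, with centers initialized to the first k values, at most M iterations and a convergence distance µ. The following self-contained sketch shows one way this stage could be implemented; it is an illustration written from the description above, not the FlinkMan code.

```java
import java.util.Arrays;

/** One-dimensional k-means over a window of sensor values (a sketch, not the FlinkMan code). */
public class WindowKMeans {

    /**
     * Clusters the given values into k clusters and returns, for each value,
     * the index of the cluster it belongs to.
     *
     * @param values window of sensor readings (size W)
     * @param k      initial number of clusters
     * @param M      maximum number of iterations
     * @param mu     convergence distance: stop once no center moves by more than mu
     */
    public static int[] cluster(double[] values, int k, int M, double mu) {
        // Initialization: centers are the first k values of the window, as described above
        // (a real implementation may need to skip duplicate values; treated as an assumption here).
        double[] centers = Arrays.copyOf(values, k);
        int[] assignment = new int[values.length];

        for (int iter = 0; iter < M; iter++) {
            // Assignment step: attach every value to its closest center.
            for (int i = 0; i < values.length; i++) {
                assignment[i] = nearest(centers, values[i]);
            }
            // Update step: move each center to the mean of its members.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < values.length; i++) {
                sum[assignment[i]] += values[i];
                count[assignment[i]]++;
            }
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) continue;        // empty cluster: keep its old center
                double updated = sum[c] / count[c];
                maxShift = Math.max(maxShift, Math.abs(updated - centers[c]));
                centers[c] = updated;
            }
            if (maxShift <= mu) break;              // converged before reaching M iterations
        }
        // Final assignment against the converged centers.
        for (int i = 0; i < values.length; i++) {
            assignment[i] = nearest(centers, values[i]);
        }
        return assignment;
    }

    private static int nearest(double[] centers, double v) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(v - centers[c]) < Math.abs(v - centers[best])) best = c;
        }
        return best;
    }
}
```

Keeping the stage a pure function over the window values makes it straightforward to call from a per-window function such as the Flink sketch shown earlier.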
Figure 2: Trained Markov model and probability of the terminal path of length N = 5, with a threshold T = 0.005. The figure shows the Markov chain trained over the three clusters, with its transition probabilities, together with the following computation of the terminal path probability:

Last N transitions   Probability
r5: 2 → 0            P1 = P(2→0) = 1/5 = 0.2
r6: 0 → 2            P2 = P1 × 1/3 = 1/15 ≈ 0.066
r7: 2 → 2            P3 = P2 × 3/5 = 1/25 = 0.04
r8: 2 → 2            P4 = P3 × 3/5 = 3/125 = 0.024
r9: 2 → 1            P5 = P4 × 1/5 = 3/625 ≈ 0.0048 < T = 1/200
⇒ the path ending at r9 triggers an anomaly

The first transition of this path, from cluster 2 to cluster 0, has a probability of P1 = 1/5 to occur. Following the 5-step path from r4 to r9, the whole sequence has a probability of 3/625 ≈ 0.0048 to happen, which is below the anomaly threshold (set to 0.005 for this toy example).

2.2 Dataset
The molding machines of our dataset are equipped with a large array of sensors, measuring various parameters of their processing, including distance, pressure, time, frequency, volume, temperature, speed, and force. The dataset is encoded as RDF [20] (Resource Description Framework) triples using Turtle [19] and consists of two types of inputs, namely a stream of measurements and a meta-data file. The stream of measurements contains a sequence of observation groups, each a 120-dimensional vector with the events from all sensors for a single time-tick and machine. It is noteworthy that the vector contains a mix of different value types, e.g., text and numerical values. Each observation group is marked with a physical timestamp and has a machine identifier. In addition, each event contains a sensor identifier, a sensor reading and a sensor type. Each machine outputs a (complete) observation group once every second. W is therefore both the time span of the sliding window and, in steady state, the exact number of observation groups it contains.
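To make the arithmetic of Example 2.1 concrete, the following self-contained sketch trains the transition probabilities of a Markov chain from a window of cluster labels and scores the last N transitions against the threshold T. It is an illustration, not the FlinkMan implementation; the cluster sequence in main() is an assumption derived to be consistent with the transition table of Figure 2 (C0 = {r0, r1, r5}, C1 = {r2, r9}, C2 = {r3, r4, r6, r7, r8}).

```java
import java.util.HashMap;
import java.util.Map;

/** Markov-model training and path scoring for the toy window of Example 2.1 (a sketch). */
public class MarkovPathScore {

    /** Counts transitions between consecutive cluster labels and returns P(from -> to). */
    static Map<String, Double> trainTransitionProbabilities(int[] clusters) {
        Map<Integer, Integer> outgoing = new HashMap<>(); // total transitions leaving a state
        Map<String, Integer> edges = new HashMap<>();     // count per (from -> to) edge
        for (int i = 1; i < clusters.length; i++) {
            int from = clusters[i - 1], to = clusters[i];
            outgoing.merge(from, 1, Integer::sum);
            edges.merge(from + "->" + to, 1, Integer::sum);
        }
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : edges.entrySet()) {
            int from = Integer.parseInt(e.getKey().split("->")[0]);
            probs.put(e.getKey(), e.getValue() / (double) outgoing.get(from));
        }
        return probs;
    }

    /** Probability of the last N transitions of the window under the trained model. */
    static double lastPathProbability(int[] clusters, Map<String, Double> probs, int N) {
        double p = 1.0;
        for (int i = clusters.length - N; i < clusters.length; i++) {
            p *= probs.getOrDefault(clusters[i - 1] + "->" + clusters[i], 0.0);
        }
        return p;
    }

    public static void main(String[] args) {
        // Assumed cluster labels of r0..r9; the last five transitions are
        // 2->0, 0->2, 2->2, 2->2, 2->1, as in Figure 2.
        int[] clusters = {0, 0, 1, 2, 2, 0, 2, 2, 2, 1};
        int N = 5;
        double T = 0.005;

        Map<String, Double> probs = trainTransitionProbabilities(clusters);
        double p = lastPathProbability(clusters, probs, N);

        // Expected: p = 1/5 * 1/3 * 3/5 * 3/5 * 1/5 = 3/625 = 0.0048 < T, so an anomaly is reported.
        System.out.printf("path probability = %.4f, anomaly = %b%n", p, p < T);
    }
}
```

Running it prints a path probability of 0.0048, matching the 3/625 derived step by step in Figure 2, and flags the window as anomalous since 0.0048 < T = 0.005.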