When and How Java Developers Give up Static Type Safety

Total Page:16

File Type:pdf, Size:1020Kb

When and How Java Developers Give up Static Type Safety View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by RERO DOC Digital Library When and How Java Developers Give Up Static Type Safety Doctoral Dissertation submitted to the Faculty of Informatics of the Università della Svizzera Italiana in partial fulfillment of the requirements for the degree of Doctor of Philosophy presented by Luis Mastrangelo under the supervision of Prof. Matthias Hauswirth and Prof. Nathaniel Nystrom June 2019 Git Commit d3fcf49 Dissertation Committee Prof. Antonio Carzaniga Università della Svizzera Italiana, Switzerland Prof. Gabriele Bavota Università della Svizzera Italiana, Switzerland Prof. Jan Vitek Northeastern University & Czech Technical University Prof. Hridesh Rajan Iowa State University Dissertation accepted on 13 June 2019 Research Advisor Research Advisor Prof. Matthias Hauswirth Prof. Nathaniel Nystrom Ph.D. Program Director Ph.D. Program Director Prof. Walter Binder Prof. Olaf Schenk i I certify that except where due acknowledgement has been given, the work presented in this thesis is that of the author alone; the work has not been submitted previously, in whole or in part, to qualify for any other academic award; and the content of the thesis is the result of work which has been carried out since the official commencement date of the approved research program. Luis Mastrangelo Lugano, 13 June 2019 ii To Ankur Singhal, In Memoriam iii iv “It is not only the violin that shapes the violinist, we are all shaped by the tools we train ourselves to use, and in this respect programming languages have a devious influence: they shape our thinking habits.” – Edsger W. Dijkstra, 2001, To the members of the Budget Councila ahttp://www.cs.utexas.edu/users/EWD/transcriptions/OtherDocs/Haskell.html v vi Abstract The main goal of a static type system is to prevent certain kinds of er- rors from happening at run time. A type system is formulated as a set of constraints that gives any expression or term in a program a well-defined type. Besides detecting these kinds of errors, a static type system can be an invaluable maintenance tool, can be useful for documentation purposes, and can aid in generating more efficient machine code. However, there are situations when the developer has more information about the program that is too complex to explain in terms of typing constraints. To that end, programming languages often provide mechanisms that make the typing constraints less strict to permit more programs to be valid, at the expense of causing more errors at run time. These mechanisms are essentially two: Unsafe Intrinsics and Reflective Capabilities. We want to understand how and when developers give up these static constraints. This knowledge can be useful as: a) a recommendation for current and future language designers to make informed decisions, b) a reference for tool builders, e.g., by providing more precise or new refac- toring analyses, c) a guide for researchers to test new language features, or to carry out controlled programming experiments, and d) a guide for developers for better practices. In this dissertation, we focus on the Unsafe API and cast operator—a subset of unsafe intrinsics and reflective capabilities respectively—in Java. We report two empirical studies to understand how these mechanisms— Unsafe API and cast operator—are used by Java developers when the static type system becomes too strict. We have devised usage patterns for both the Unsafe API and cast operator. Usage patterns are recurrent programming idioms to solve a specific issue. We believe that having usage patterns can help us to better categorize use cases and thus understand how those features are used. vii viii Acknowledgements First of all, I want to thank both my advisors, Prof. Matthias Hauswirth and Prof. Nate Nystrom. Their expertise helped me to become a better researcher. They taught me how to ask the proper questions when facing a problem, that not only will help me in my professional career, but in everyday life. In our Unsafe study, Luca Ponzanellli, Andrea Mocci and Prof. Michele Lanza collaborated with us to correlate Unsafe comments in the Questions and Answers database, Stack Overflow, mentioned in Chapter 3. Thanks to them. Special thanks to Max Schäfer from Semmle. He executed our QL queries in the entire lgtm database used in Chapter 4. Without his help, this thesis would not have been possible. Prof. Antonio Carzaniga and Prof. Gabriele Bavota made valuable com- ments during the Ph.D. proposal. They contributed to find an appropriate sample size in the cast study. My group colleagues, Mohammad, Dmitri, Andrea, and Igor. With whom we had uncountable talks during my Ph.D. Special mention for Amanj and Nosheen, I learned a lot in our coffee breaks. Thank you guys. On the personal side, Vanesa, Cinzia, Mariano, Esteban, Dino, Olaf, Di- mos, Jeff, Marisa, Rali, Gabi, Tanya, Dilly, Marie-line, Lea, Celine, Stefania, Carmen and Grace who make me feel I found a family in Lugano. My flat mate and dearest friend, Ankur Singhal, I wish you were here to see this moment. The Singhal family, Anil, Anita, and Ankita. I appreciate your support in the toughest moments. And last but not least, I want to thank my family, for their unconditional support, and for always believing in me. ix x Contents Contents xi List of Figures xv List of Tables xvii List of Listings xvii 1 Introduction 1 1.1 Beyond Static Type Checking . .2 1.2 Research Question . .4 1.3 Thesis Outline . .6 2 Literature Review 7 2.1 Benchmarks and Corpora . .8 2.2 Tools for Mining Software Repositories . .9 2.3 Empirical Studies of Large Codebases . 11 2.3.1 Unsafe Intrinsics in Java . 13 2.3.2 Reflective Capabilities . 14 2.4 Conclusions . 17 3 Empirical Study on the Unsafe API 19 3.1 The Risks of Compromising Safety . 21 3.2 Is Unsafe Used? . 24 3.2.1 Which Features of Unsafe Are Actually Used? . 25 3.2.2 Beyond Maven Central . 29 3.3 Finding sun.misc.Unsafe Usage Patterns . 30 3.4 Usage Patterns of sun.misc.Unsafe ............... 32 3.4.1 Allocate an Object without Invoking a Constructor . 35 3.4.2 Process Byte Arrays in Block . 35 xi xii Contents 3.4.3 Atomic Operations . 36 3.4.4 Strongly Consistent Shared Variables . 36 3.4.5 Park/Unpark Threads . 37 3.4.6 Update Final Fields . 37 3.4.7 Non-Lexically-Scoped Monitors . 38 3.4.8 Serialization/Deserialization . 38 3.4.9 Foreign Data Access and Object Marshaling . 39 3.4.10 Throw Checked Exceptions without Being Declared 39 3.4.11 Get the Size of an Object or an Array . 40 3.4.12 Large Arrays and Off-Heap Data Structures . 40 3.4.13 Get Memory Page Size . 40 3.4.14 Load Class without Security Checks . 41 3.5 What is the Unsafe API Used for? . 41 3.6 Conclusions . 45 4 Empirical Study on the Cast Operator 47 4.1 Casts in Java . 48 4.2 Issues Developers have using Cast Operations . 52 4.3 Finding Cast Usage Patterns . 54 4.3.1 Corpus Analysis . 54 4.3.2 Is the Cast Operator used? . 55 4.3.3 Manual Detection of Cast Patterns . 56 4.3.4 Methodology . 56 4.4 Overview of the Sampled Casts . 58 4.5 Cast Usage Patterns . 60 4.5.1 Typecase .......................... 63 4.5.2 Equals ........................... 73 4.5.3 ParserStack ....................... 77 4.5.4 Stash ............................ 78 4.5.5 Factory .......................... 84 4.5.6 KnownReturnType ................... 86 4.5.7 Deserialization ..................... 88 4.5.8 NewDynamicInstance ................. 89 4.5.9 Composite ......................... 92 4.5.10 Family ........................... 93 4.5.11 CovariantReturnType ................. 95 4.5.12 FluentAPI . 97 4.5.13 UseRawType ....................... 99 4.5.14 RemoveWildcard .................... 101 xiii Contents 4.5.15 CovariantGeneric ................... 102 4.5.16 SelectTypeArgument .................. 103 4.5.17 GenericArray ...................... 105 4.5.18 UnoccupiedTypeParameter .............. 106 4.5.19 SelectOverload ..................... 108 4.5.20 SoleSubclassImplementation ............ 110 4.5.21 ImplicitIntersectionType ............... 111 4.5.22 ReflectiveAccessibility ................ 113 4.5.23 AccessSuperclassField ................. 114 4.5.24 Redundant ........................ 116 4.5.25 VariableSupertype ................... 118 4.5.26 ObjectAsArray ..................... 121 4.6 Discussion . 123 4.7 Conclusions . 126 5 Conclusions 127 5.1 Research Questions . 127 5.2 Java’s Evolution . 128 5.3 Limitations . 129 5.4 Future Work . 129 5.5 Lessons Learned . 131 5.6 Artifacts . 131 A Automatic Detection of Patterns using QL 133 A.1 Introduction to QL . 133 A.2 Additional QL Classes and Predicates . 135 B JNIF: Java Native Instrumentation 143 B.1 Introduction . 143 B.2 Related Work . 145 B.3 Using JNIF . 148 B.4 JNIF Design and Implementation . 151 B.5 Validation . 154 B.6 Performance Evaluation . 156 B.7 Limitations . 158 B.8 Conclusions . 159 Bibliography 163 xiv Contents Figures 2.1 Categorization of Unsafe usages from Sandoz’ survey . 15 3.1 sun.misc.Unsafe method usage on Maven Central ...... 26 3.2 sun.misc.Unsafe field usage on Maven Central ........ 27 3.3 com.lmax:disruptor call sites . 31 3.4 org.scala-lang:scala-library call sites . 32 3.5 Classes using off-heap large arrays . 33 4.1 Projects, by fraction of their methods containing casts. Bulk of data summarized by box plot, outliers shown as individual data points. 55 4.2 Cast distribution . 57 4.3 Cast Usage Pattern Occurrences . 62 4.4 Typecase Variant Occurrences . 63 4.5 Equals Variant Occurrences . 74 4.6 Stash Variant Occurrences . 79 B.1 Instrumentation time on DaCapo and Scala benchmarks . 161 xv xvi Figures Tables 1.1 Safe/Unsafe and Statically/Dynamically checked languages2 3.1 Patterns and their occurrences in the Maven Central repository 34 3.2 Patterns and their alternatives .
Recommended publications
  • Log4j-Users-Guide.Pdf
    ...................................................................................................................................... Apache Log4j 2 v. 2.2 User's Guide ...................................................................................................................................... The Apache Software Foundation 2015-02-22 T a b l e o f C o n t e n t s i Table of Contents ....................................................................................................................................... 1. Table of Contents . i 2. Introduction . 1 3. Architecture . 3 4. Log4j 1.x Migration . 10 5. API . 16 6. Configuration . 18 7. Web Applications and JSPs . 48 8. Plugins . 56 9. Lookups . 60 10. Appenders . 66 11. Layouts . 120 12. Filters . 140 13. Async Loggers . 153 14. JMX . 167 15. Logging Separation . 174 16. Extending Log4j . 176 17. Extending Log4j Configuration . 184 18. Custom Log Levels . 187 © 2 0 1 5 , T h e A p a c h e S o f t w a r e F o u n d a t i o n • A L L R I G H T S R E S E R V E D . T a b l e o f C o n t e n t s ii © 2 0 1 5 , T h e A p a c h e S o f t w a r e F o u n d a t i o n • A L L R I G H T S R E S E R V E D . 1 I n t r o d u c t i o n 1 1 Introduction ....................................................................................................................................... 1.1 Welcome to Log4j 2! 1.1.1 Introduction Almost every large application includes its own logging or tracing API. In conformance with this rule, the E.U.
    [Show full text]
  • CDC Runtime Guide
    CDC Runtime Guide for the Sun Java Connected Device Configuration Application Management System Version 1.0 Sun Microsystems, Inc. www.sun.com November 2005 Copyright © 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries. THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SUN MICROSYSTEMS, INC. U.S. Government Rights - Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, Java, J2ME, Java ME, Sun Corporate Logo and Java Logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Products covered by and information contained in this service manual are controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries.
    [Show full text]
  • Developer's Guide Sergey A
    Complex Event Processing with Triceps CEP v1.0 Developer's Guide Sergey A. Babkin Complex Event Processing with Triceps CEP v1.0 : Developer's Guide Sergey A. Babkin Copyright © 2011, 2012 Sergey A. Babkin All rights reserved. This manual is a part of the Triceps project. It is covered by the same Triceps version of the LGPL v3 license as Triceps itself. The author can be contacted by e-mail at <[email protected]> or <[email protected]>. Many of the designations used by the manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this manual, and the author was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this manual, the author assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. Table of Contents 1. The field of CEP .................................................................................................................................. 1 1.1. What is the CEP? ....................................................................................................................... 1 1.2. The uses of CEP ........................................................................................................................ 2 1.3. Surveying the CEP langscape ....................................................................................................... 2 1.4. We're not in 1950s any more,
    [Show full text]
  • Safety Properties Liveness Properties
    Computer security Goal: prevent bad things from happening PLDI’06 Tutorial T1: Clients not paying for services Enforcing and Expressing Security Critical service unavailable with Programming Languages Confidential information leaked Important information damaged System used to violate laws (e.g., copyright) Andrew Myers Conventional security mechanisms aren’t Cornell University up to the challenge http://www.cs.cornell.edu/andru PLDI Tutorial: Enforcing and Expressing Security with Programming Languages - Andrew Myers 2 Harder & more important Language-based security In the ’70s, computing systems were isolated. Conventional security: program is black box software updates done infrequently by an experienced Encryption administrator. Firewalls you trusted the (few) programs you ran. System calls/privileged mode physical access was required. Process-level privilege and permissions-based access control crashes and outages didn’t cost billions. The Internet has changed all of this. Prevents addressing important security issues: Downloaded and mobile code we depend upon the infrastructure for everyday services you have no idea what programs do. Buffer overruns and other safety problems software is constantly updated – sometimes without your knowledge Extensible systems or consent. Application-level security policies a hacker in the Philippines is as close as your neighbor. System-level security validation everything is executable (e.g., web pages, email). Languages and compilers to the rescue! PLDI Tutorial: Enforcing
    [Show full text]
  • The Java Hotspot VM Under the Hood
    The Java HotSpot VM Under the Hood Tobias Hartmann Principal Member of Technical Staff Compiler Group – Java HotSpot Virtual Machine Oracle Corporation August 2017 Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | About me • Joined Oracle in 2014 • Software engineer in the Java HotSpot VM Compiler Team – Based in Baden, Switzerland • German citizen • Master’s degree in Computer Science from ETH Zurich • Worked on various compiler-related projects – Currently working on future Value Type support for Java Copyright © 2017, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 3 Outline • Intro: Why virtual machines? • Part 1: The Java HotSpot VM – JIT compilation in HotSpot – Tiered Compilation • Part 2: Projects – Segmented Code Cache – Compact Strings – Ahead-of-time Compilation – Minimal Value Types Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 4 A typical computing platform User Applications Application Software Java SE Java EE Java Virtual Machine System Software Operating system Hardware Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 5 A typical computing platform User Applications Application Software Java SE Java EE Java Virtual Machine System Software Operating system Hardware Copyright © 2017, Oracle and/or its affiliates.
    [Show full text]
  • Cyclone: a Type-Safe Dialect of C∗
    Cyclone: A Type-Safe Dialect of C∗ Dan Grossman Michael Hicks Trevor Jim Greg Morrisett If any bug has achieved celebrity status, it is the • In C, an array of structs will be laid out contigu- buffer overflow. It made front-page news as early ously in memory, which is good for cache locality. as 1987, as the enabler of the Morris worm, the first In Java, the decision of how to lay out an array worm to spread through the Internet. In recent years, of objects is made by the compiler, and probably attacks exploiting buffer overflows have become more has indirections. frequent, and more virulent. This year, for exam- ple, the Witty worm was released to the wild less • C has data types that match hardware data than 48 hours after a buffer overflow vulnerability types and operations. Java abstracts from the was publicly announced; in 45 minutes, it infected hardware (“write once, run anywhere”). the entire world-wide population of 12,000 machines running the vulnerable programs. • C has manual memory management, whereas Notably, buffer overflows are a problem only for the Java has garbage collection. Garbage collec- C and C++ languages—Java and other “safe” lan- tion is safe and convenient, but places little con- guages have built-in protection against them. More- trol over performance in the hands of the pro- over, buffer overflows appear in C programs written grammer, and indeed encourages an allocation- by expert programmers who are security concious— intensive style. programs such as OpenSSH, Kerberos, and the com- In short, C programmers can see the costs of their mercial intrusion detection programs that were the programs simply by looking at them, and they can target of Witty.
    [Show full text]
  • Unravel Data Systems Version 4.5
    UNRAVEL DATA SYSTEMS VERSION 4.5 Component name Component version name License names jQuery 1.8.2 MIT License Apache Tomcat 5.5.23 Apache License 2.0 Tachyon Project POM 0.8.2 Apache License 2.0 Apache Directory LDAP API Model 1.0.0-M20 Apache License 2.0 apache/incubator-heron 0.16.5.1 Apache License 2.0 Maven Plugin API 3.0.4 Apache License 2.0 ApacheDS Authentication Interceptor 2.0.0-M15 Apache License 2.0 Apache Directory LDAP API Extras ACI 1.0.0-M20 Apache License 2.0 Apache HttpComponents Core 4.3.3 Apache License 2.0 Spark Project Tags 2.0.0-preview Apache License 2.0 Curator Testing 3.3.0 Apache License 2.0 Apache HttpComponents Core 4.4.5 Apache License 2.0 Apache Commons Daemon 1.0.15 Apache License 2.0 classworlds 2.4 Apache License 2.0 abego TreeLayout Core 1.0.1 BSD 3-clause "New" or "Revised" License jackson-core 2.8.6 Apache License 2.0 Lucene Join 6.6.1 Apache License 2.0 Apache Commons CLI 1.3-cloudera-pre-r1439998 Apache License 2.0 hive-apache 0.5 Apache License 2.0 scala-parser-combinators 1.0.4 BSD 3-clause "New" or "Revised" License com.springsource.javax.xml.bind 2.1.7 Common Development and Distribution License 1.0 SnakeYAML 1.15 Apache License 2.0 JUnit 4.12 Common Public License 1.0 ApacheDS Protocol Kerberos 2.0.0-M12 Apache License 2.0 Apache Groovy 2.4.6 Apache License 2.0 JGraphT - Core 1.2.0 (GNU Lesser General Public License v2.1 or later AND Eclipse Public License 1.0) chill-java 0.5.0 Apache License 2.0 Apache Commons Logging 1.2 Apache License 2.0 OpenCensus 0.12.3 Apache License 2.0 ApacheDS Protocol
    [Show full text]
  • LMAX Disruptor
    Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads Martin Thompson Dave Farley Michael Barker Patricia Gee Andrew Stewart May-2011 http://code.google.com/p/disruptor/ 1 Abstract LMAX was established to create a very high performance financial exchange. As part of our work to accomplish this goal we have evaluated several approaches to the design of such a system, but as we began to measure these we ran into some fundamental limits with conventional approaches. Many applications depend on queues to exchange data between processing stages. Our performance testing showed that the latency costs, when using queues in this way, were in the same order of magnitude as the cost of IO operations to disk (RAID or SSD based disk system) – dramatically slow. If there are multiple queues in an end-to-end operation, this will add hundreds of microseconds to the overall latency. There is clearly room for optimisation. Further investigation and a focus on the computer science made us realise that the conflation of concerns inherent in conventional approaches, (e.g. queues and processing nodes) leads to contention in multi-threaded implementations, suggesting that there may be a better approach. Thinking about how modern CPUs work, something we like to call “mechanical sympathy”, using good design practices with a strong focus on teasing apart the concerns, we came up with a data structure and a pattern of use that we have called the Disruptor. Testing has shown that the mean latency using the Disruptor for a three-stage pipeline is 3 orders of magnitude lower than an equivalent queue-based approach.
    [Show full text]
  • High Performance with Distributed Caching
    High Performance with Distributed Caching Key Requirements For Choosing The Right Solution High Performance with Distributed Caching: Key Requirements for Choosing the Right Solution Table of Contents Executive summary 3 Companies are choosing Couchbase for their caching layer, and much more 3 Memory-first 4 Persistence 4 Elastic scalability 4 Replication 5 More than caching 5 About this guide 5 Memcached and Oracle Coherence – two popular caching solutions 6 Oracle Coherence 6 Memcached 6 Why cache? Better performance, lower costs 6 Common caching use cases 7 Key requirements for an effective distributed caching solution 8 Problems with Oracle Coherence: cost, complexity, capabilities 8 Memcached: A simple, powerful open source cache 10 Lack of enterprise support, built-in management, and advanced features 10 Couchbase Server as a high-performance distributed cache 10 General-purpose NoSQL database with Memcached roots 10 Meets key requirements for distributed caching 11 Develop with agility 11 Perform at any scale 11 Manage with ease 12 Benchmarks: Couchbase performance under caching workloads 12 Simple migration from Oracle Coherence or Memcached to Couchbase 13 Drop-in replacement for Memcached: No code changes required 14 Migrating from Oracle Coherence to Couchbase Server 14 Beyond caching: Simplify IT infrastructure, reduce costs with Couchbase 14 About Couchbase 14 Caching has become Executive Summary a de facto technology to boost application For many web, mobile, and Internet of Things (IoT) applications that run in clustered performance as well or cloud environments, distributed caching is a key requirement, for reasons of both as reduce costs. performance and cost. By caching frequently accessed data in memory – rather than making round trips to the backend database – applications can deliver highly responsive experiences that today’s users expect.
    [Show full text]
  • CSI Web Server for Linux
    INSTRUCTION CSI Web Server for Linux Installation Guide Revision: 3/18 MANUAL Copyright © 2006- 2018 Campbell Scientific, Inc. License for Use This software is protected by United States copyright law and international copyright treaty provisions. The installation and use of this software constitutes an agreement to abide by the provisions of this license agreement. Campbell Scientific grants you a non-exclusive license to use this software in accordance with the following: (1) The purchase of this software allows you to install and use a single instance of the software on one physical computer or one virtual machine only. (2) This software cannot be loaded on a network server for the purposes of distribution or for access to the software by multiple operators. If the software can be used from any computer other than the computer on which it is installed, you must license a copy of the software for each additional computer from which the software may be accessed. (3) If this copy of the software is an upgrade from a previous version, you must possess a valid license for the earlier version of software. You may continue to use the earlier copy of software only if the upgrade copy and earlier version are installed and used on the same computer. The earlier version of software may not be installed and used on a separate computer or transferred to another party. (4) This software package is licensed as a single product. Its component parts may not be separated for use on more than one computer. (5) You may make one (1) backup copy of this software onto media similar to the original distribution, to protect your investment in the software in case of damage or loss.
    [Show full text]
  • High Performance & Low Latency Complex Event Processor
    International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Impact Factor (2012): 3.358 Lightning CEP - High Performance & Low Latency Complex Event Processor Vikas Kale1, Kishor Shedge2 1, 2Sir Visvesvaraya Institute of Technology, Chincholi, Nashik, 422101, India Abstract: Number of users and devices connected to internet is growing exponentially. Each of these device and user generate lot of data which organizations want to analyze and use. Hadoop like batch processing are evolved to process such big data. Batch processing systems provide offline data processing capability. There are many businesses which requires real-time or near real-time processing of data for faster decision making. Hadoop and batch processing system are not suitable for this. Stream processing systems are designed to support class of applications which requires fast and timely analysis of high volume data streams. Complex event processing, or CEP, is event/stream processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. In this paper I propose “Lightning - High Performance & Low Latency Complex Event Processor” engine. Lightning is based on open source stream processor WSO2 Siddhi. Lightning retains most of API and Query Processing of WSO2 Siddhi. WSO2 Siddhi core is modified using low latency technique such as Ring Buffer, off heap data store and other techniques. Keywords: CEP, Complex Event Processing, Stream Processing. 1. Introduction of stream applications include automated stock trading, real- time video processing, vital-signs monitoring and geo-spatial Number of users and devices connected to internet is trajectory modification. Results produced by such growing exponentially.
    [Show full text]
  • Getting Started with Apache Avro
    Getting Started with Apache Avro By Reeshu Patel Getting Started with Apache Avro 1 Introduction Apache Avro Apache Avro is a remote procedure call and serialization framework developed with Apache's Hadoop project. This is uses JSON for defining data types and protocols, and tend to serializes data in a compact binary format. In other words, Apache Avro is a data serialization system. Its frist native use is in Apache Hadoop, where it's provide both a serialization format for persistent data, and a correct format for communication between Hadoop nodes, and from client programs to the apache Hadoop services. Avro is a data serialization system.It'sprovides: Rich data structures. A compact, fast, binary data format. A container file, to store persistent data. Remote procedure call . It's easily integration with dynamic languages. Code generation is not mendetory to read or write data files nor to use or implement Remote procedure call protocols. Code generation is as an optional optimization, only worth implementing for statically typewritten languages. Schemas of Apache Avro When Apache avro data is read, the schema use when writing it's always present. This permits every datum to be written in no per-value overheads, creating serialization both fast and small. It also facilitates used dynamic, scripting languages, and data, together with it's schema, is fully itself-describing. 2 Getting Started with Apache Avro When Apache avro data is storein a file, it's schema is store with it, so that files may be processe later by any program. If the program is reading the data expects a different schema this can be simply resolved, since twice schemas are present.
    [Show full text]