DMLA: A DYNAMIC MODEL-BASED LAMBDA ARCHITECTURE FOR LEARNING AND RECOGNITION OF FEATURES IN BIG DATA

A THESIS IN
Computer Science

Presented to the Faculty of the University of Missouri-Kansas City in partial fulfillment of the requirements for the degree

MASTER OF SCIENCE

By
RAVI KIRAN YADAVALLI

B.Tech, Jawaharlal Nehru Technological University – Hyderabad, India, 2013

Kansas City, Missouri
2016

©2016
RAVI KIRAN YADAVALLI
ALL RIGHTS RESERVED

Ravi Kiran Yadavalli, Candidate for the Master of Science Degree
University of Missouri-Kansas City, 2016

ABSTRACT

Real-time event modeling and recognition is a major research area that has yet to reach its full potential. Several big data ecosystems have evolved in the search for systems that can meet the tremendous challenges posed by data growth. These ecosystems currently rely on various architectural models, each aimed at solving a particular real-time problem. There is increasing demand for a dynamic architecture that combines real-time processing and computational intelligence in a single workflow, so that fast-changing business environments can be handled effectively. To the best of our knowledge, there has been no attempt to support a distributed machine-learning paradigm that separates learning and recognition tasks using big data ecosystems. Our study focuses on designing a distributed machine-learning model. To do so, we evaluate various machine-learning algorithms for event-detection learning and predictive analysis with different features in audio domains. We propose an integrated architectural model, called DMLA, for handling real-time problems. DMLA can increase the richness of the information obtained while reducing the overhead of dealing with diverse architectural constraints.
DMLA is a variant of the Lambda Architecture that combines Apache Spark, Apache Storm (Heron), and Apache Kafka to handle massive amounts of data using both streaming and batch processing techniques. The primary aim of this study is to demonstrate how DMLA recognizes real-time, real-world events that require a quick response from users (e.g., fire alarm alerts or babies needing immediate attention). The tasks of detecting contextual information and dynamically selecting the appropriate model are distributed among the components of the DMLA architecture. In the DMLA framework, a predictive model is learned from training data in Spark. The model is then selected according to the context information and loaded dynamically into a Storm topology to recognize or predict possible events. This event-based, context-aware solution was designed for real-time, real-world events. Among the machine-learning models evaluated, the best Spark-based model achieved an accuracy of over 80%, and the Storm topology achieved a best-case recognition rate of 75%. We verify that the proposed architecture is effective for real-time, event-based recognition in audio domains.

APPROVAL PAGE

The faculty listed below, appointed by the Dean of the School of Computing and Engineering, have examined a thesis titled “DMLA: A Dynamic Model-based Lambda Architecture for Learning and Recognition of Features in Big Data” presented by Ravi Kiran Yadavalli, candidate for the Master of Science degree, and certify that, in their opinion, it is worthy of acceptance.

Supervisory Committee

Yugyung Lee, Ph.D., Committee Chair
School of Computing and Engineering

Yongjie Zheng, Ph.D.
School of Computing and Engineering

Sejun Song, Ph.D.
School of Computing and Engineering

TABLE OF CONTENTS

ABSTRACT
ILLUSTRATIONS
TABLES
1. INTRODUCTION
  1.1 Motivation
  1.2 Problem Statement
  1.3 Proposed Solution
2. BACKGROUND AND RELATED WORK
  2.1 Terminology
  2.2 Related Work
    2.2.1 Big Data Streaming Tools and Frameworks
    2.2.2 Evaluation on Current Stream Processing Frameworks
3. PROPOSED FRAMEWORK
  3.1 Overview
  3.2 Dynamic Recognition
  3.3 Feature Extraction Flow
  3.4 Apache Spark Workflow
  3.5 Apache Storm Workflow
  3.6 Apache Kafka and REST API
  3.7 Features on JAudio
  3.8 Context Aware Model
    3.8.1 Home Context
    3.8.2 Classroom Context
    3.8.3 Outdoor Context
    3.8.4 Office Context
    3.8.5 Contextual features
4. RESULTS AND EVALUATION
  4.1 Apache Spark
    4.1.1 Machine Learning Algorithms
  4.2 Evaluation
    4.2.1 Feature Based Analysis
    4.2.2 Audio File VS Feature Data
5. CONCLUSION AND FUTURE WORK
  5.1 Conclusion
  5.2 Limitations
  5.3 Future Scope
REFERENCES
VITA

ILLUSTRATIONS

Figure 1: Hadoop vs Spark Runtime Performance
Figure 2: Storm Topology Architecture
Figure 3: Streaming Applications Workflow
Figure 4: Lambda Architecture