A guide in the Big Data jungle
Thesis, Bachelor of Science
Anna Ohlsson, Dan Öman
Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona Sweden

Contact Information:
Author(s):
Anna Ohlsson, E-mail: [email protected]
Dan Öman, E-mail:
[email protected]
University advisor: Nina Dzamashvili Fogelström, Department of Software Engineering

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona Sweden
Internet: www.bth.se/com
Phone: +46 0455 38 50 00
Fax: +46 0455 38 50 57

Abstract

This bachelor thesis examines the functionality of different frameworks for large-scale data analysis, and its purpose is to serve as a guide to the available tools. The amount of data generated every day keeps growing, and for companies to take advantage of the data they collect, they need to know how to analyze it to get the most out of it. The choice of platform for this analysis plays an important role, and making that choice requires looking into the functionality of the available alternatives. We have created a guide to make this research easier and less time-consuming. To evaluate our work we created a summary and a survey, which we asked a number of current and former IT students to take part in. After analyzing their answers we could see that most of them found our thesis interesting and informative.

Content

Introduction
1.1 Overview
Background and related work
2.1 Background
Figure 1 Overview of Platforms and Frameworks
2.1.1 What is Big Data?
2.1.2 Platforms and frameworks