Analysis of Human Activities on Smart Devices Using Riak TS
Hinduja Dhanasekaran, Siddharth Selvam, Jeongkyu Lee
University of Bridgeport

Abstract – In this paper we have implemented Riak TS, a time series database. It is a key-value database that treats time as an important parameter. During the implementation of the project we worked through the installation process, loading the data, and analyzing the data using Riak TS. By running complex queries we learnt how time plays a crucial role in understanding, analyzing, and visualizing the data.

Index Terms – Big Data, NoSQL database, Motivation, Riak TS features, Riak TS Architecture, Dataset and Implementation, Result, Conclusion

I. INTRODUCTION

The digital world today runs on the effective use of data. Social platforms like Facebook, Instagram, and Snapchat all deal with huge amounts of data on a day-to-day basis. In fact, the amount of data that must be stored and supported by their systems changes every second. Given the number of users and the time spent on social media, we can imagine how much data needs to be managed and processed. In the year 2013 the Oxford dictionary officially added the term “Big Data”. A search for Big Data returns the following definition – “Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”.

We should also understand that big data changes every second at a very fast pace, so the processing rate must be very high in order to keep up. Since big data is huge in terms of volume, we can't process it using traditional tools. Traditional tools are not suitable for processing Big Data because most of them can't handle a huge amount of data at once, and because the format of the data collected from various sources differs from one source to another. Also, when we set out to handle Big Data we must make sure we choose the right database to get the desired results. The database should be able to handle huge data sets. It should also be able to accommodate data of various types and formats, which makes the data unpredictable. Keeping all this in mind, a traditional database is ruled out, and we need a database that supports all of these requirements. Hence, we have to go for a NoSQL database.

NoSQL stands for "not only SQL". The most popular NoSQL databases are MongoDB, OrientDB, Couchbase, Redis, and Neo4j. Notably, all NoSQL databases are designed with Big Data in mind.

NoSQL databases support dynamic schema design, which gives maximum flexibility and makes the data easy to work with. They provide high levels of scalability compared to relational databases. These key features make them the most suitable choice for handling non-uniform data which varies from time to time. They are ideal for social media applications and for web applications as well. NoSQL databases are classified into four types, namely document databases, graph databases, key-value databases, and wide-column databases.

From the above details we can see that a NoSQL database does not depend on tables, columns, and rows. In other words, it does not depend on any predefined structure, and hence paves the way to process unstructured data. The NoSQL database which we have selected is Riak TS. TS in the name stands for “time series”, which indicates the importance of the time factor in our database. Riak TS was developed by Basho Technologies on top of Riak KV. The difference between them is that Riak TS adds the facility to co-locate keys of the same series within the same quantum for faster and more efficient read operations.

II. MOTIVATION –

We all, knowingly or unknowingly, spend time on social media on a day-to-day basis. We therefore wanted to work with something related to social media, and hence we needed a database that handles datasets based on time. Riak TS is one of the databases that supports and manages datasets based on time. It has time as an important factor. It is designed for handling IoT and time series datasets. Riak TS integrates seamlessly with Apache Spark for faster and easier analysis of data.

III. Riak TS Features –

1. Availability – Riak TS ensures high availability of the data because it follows the 3X model: the data is replicated across multiple data centers spanning multiple zones. Hence when one goes down, the system as a whole does not go down, and the data remains available 24x7.

2. Resilient – Riak TS follows a masterless architecture and hence keeps the data available even in the worst cases, such as network failures or hardware issues. This also eliminates the downtime of copying the data to other places in an emergency, as everything here is automatic.

3. Scalability – Scaling is easier than with other databases. Increasing and decreasing the size of the database is simple. Based on our requirements we can scale to meet peak demand and improve performance.

4. Operational Simplicity – As the term suggests, the interface of the database is user friendly and easy to operate and navigate. It is convenient to add nodes to the cluster and to distribute the data uniformly among them. Therefore, the setup and the addition and removal of capacity are very easy.

5. Data co-location – Riak TS locates data together in the same physical part of the cluster based on a time range. This speeds up queries on the data and hence the reading and analysis of time series data.

6. SQL commands – Riak TS stores semi-structured data in a schema with the help of SQL commands. Data co-location and range queries help in reading and analyzing time series data at a faster rate.

7. SQL range queries – SQL commands in Riak TS have a time quantum attached to them, which helps locate the data faster instead of going through the entire data set, which is time consuming.

8. Data Expiry – Riak TS has a data expiry feature. This feature lets you specify explicitly when aged data must be removed from the database. This decreases the amount of data stored in the database and hence improves its performance.

9. Apache Spark – Riak TS integrates seamlessly with Apache Spark to provide faster and more efficient analysis of the data.

10. Multi Cluster Replication – Riak TS follows the 3X architecture, so the data is replicated to multiple locations. This ensures high availability of the data at all times.

11. Robust API and Client Libraries – Riak TS supports various programming languages such as Python, Java, PHP, Node.js, and Erlang. This makes it convenient to build applications for IoT and time series data.

12. Aggregations – Certain built-in aggregations help in faster read and write operations when handling huge amounts of data.

13. Apache Mesos Framework – The Riak Mesos Framework provides cluster management and push-button scale up and scale down for all the nodes in Riak.

14. Time Stamped Data Feeds – All data feeds in Riak TS are associated with time, which makes it very handy for medical, financial, and economic data.

IV. Riak TS Architecture –

Riak TS combines SQL-like structure and functionality with Riak KV storage; this is achieved through Riak TS tables. Riak TS lets you query large amounts of data, which distinguishes it from Riak KV. The key of a Riak TS table consists of –

1. Partition Key – It decides where the data is located in the cluster.
2. Local Key – It decides where the data is written within the partition.

The partition key uses time quantization to group data that will need to be queried together in the same physical part of the cluster. The time quantization has its own parameters. To query time series data, we need the data to be structured using a specific schema. The schema defines what sort of data can be stored in the database and what type each field has. The following is an example of a table in Riak TS –

    CREATE TABLE GeoCheckin
    (
       id           SINT64    NOT NULL,
       region       VARCHAR   NOT NULL,
       state        VARCHAR   NOT NULL,
       time         TIMESTAMP NOT NULL,
       weather      VARCHAR   NOT NULL,
       temperature  DOUBLE,
       PRIMARY KEY (
         (id, QUANTUM(time, 15, 'm')),
         id, time
       )
    )

In the PRIMARY KEY clause, the inner parenthesized group (id, QUANTUM(time, 15, 'm')) is the partition key, and the fields that follow it (id, time) form the local key.

In the above example we are having

The structure of the partition key must be defined and maintained in the following way –

1- The first field (family) is the type or class of the data.
2- The second field (series) identifies the particular instance of the class or type, such as the device ID or user ID.
3- The third field (quantum) sets the time interval by which the data is grouped.

The quantum function, when broken down, has the following three parameters –

1- The name of a field in the table definition of type TIMESTAMP
2- The quantity
3- The unit of time
   I) d – days
   II) h – hours
   III) m – minutes
   IV) s – seconds

The local key, or second key, must contain the same 3 fields in the same order as the partition key.
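To make the effect of the quantum concrete, here is a minimal Python sketch of how a 15-minute quantum groups timestamps. It is an illustration only and not part of Riak TS or any Riak client library: the helper names `quantum_start`, `partition_key`, and `to_ms` are hypothetical, and the sketch assumes that quanta are aligned to the Unix epoch and that timestamps are expressed in milliseconds.

```python
from datetime import datetime, timezone

# Milliseconds per unit, keyed by the unit letters of the quantum function.
UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000}


def quantum_start(ts_ms: int, quantity: int, unit: str) -> int:
    """Return the start (epoch ms) of the quantum that contains ts_ms.

    Assumption: quanta are aligned to the Unix epoch.
    """
    width = quantity * UNIT_MS[unit]
    return ts_ms - (ts_ms % width)


def partition_key(series_id: int, ts_ms: int,
                  quantity: int = 15, unit: str = "m") -> tuple:
    """Hypothetical partition key: (id, start of the time quantum)."""
    return (series_id, quantum_start(ts_ms, quantity, unit))


def to_ms(dt: datetime) -> int:
    """Convert a whole-second, timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp()) * 1000
```

Under these assumptions, two readings from the same id taken at 10:02 and 10:14 UTC produce the same partition key and would be stored together, while a reading at 10:16 falls into the next quantum. A range query that stays inside one quantum therefore touches only one physical part of the cluster.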