International Journal of Applied Engineering Research ISSN 0973-4562 Volume 14, Number 9 (2019) pp. 2238-2243 © Research India Publications. http://www.ripublication.com Associated with Streaming Data Problems

Ramesh Balasubramaniam1*, Dr. K. Nandhini2 1 Research Scholar, 2 Assistant Professor PG and Research Department of Computer Science, Chikkanna Govt. Arts College (Bharathiyar University), Tirupur, India. *Corresponding Author

Abstract: difference between the models is how the input streams are handled based on the underlying signals [1] The constant frequency and variety of high-speed data have resulted in the growth of large data sets in the last two decades.  Time series model: Each ai is equal to A[i] which is Storage of this data in memory is not possible due to storage the underlying signal. They appear in increasing order constraints. As a result, today’s streaming algorithms need to of i. This is useful for data dependent on time series, work with one pass of data. Many streaming algorithms address for example, temperature checks every hour on a day, different problems with data streams. Data problems Stock price note every 5mins. though originated around the ‘70s have gained popularity only  Cash register model: Known as the most popular data in that last 15 years due to the exponential growth of data from stream model ai’s are increments to A[j]’s. 푎푖 = a variety of sources. Today’s researchers aim at providing (푗, 퐼푖), 퐼푖 ≥ 0 mean 퐴푖[푗] = 퐴푖−1[푗] + 퐼푖 where Ai is solutions to specific problems that are found across many the state of the signal after seeing the ith item in the industries. Most of the problems addressed in this paper are the stream. Much as in a cash register, multiple ai‘s could ones researchers have considered as oft faced problems in the increment a given A[j] over time. This is where each industry like heavy-hitters/top k elements or frequency count value of the signal gives a positive increment to the or membership check or cardinality. Algorithms addressing previous value to obtain a new value. Like a cash these problems like , Count-min Sketch, register adding occurrences of element values to a HyperLogLog and such are being continuously improved for counter over time. faster and effective results. Hashing is a technique used by  Turnstile Model: Where increments can also be many algorithms to resolve massive data problems affecting negative. Like a turnstile in an amusement park that today’s applications. Hashing enables streaming data to be able keeps count of people entering and exiting a park, the to store and retrieve data resourcefully. In this paper, we turnstile model allows both positive and negative analyze both the problems in streaming data and the various inputs. Here ai’s are updates to A[j]’s. 푎 = algorithms used to resolve them. 푖 (푗, 푈푖) 푚ean 퐴푖[푗] = 퐴푖−1[푗] + 푈푖 where Ai is the Keywords: Bloom Filter, Count-min Sketch, Data Stream state of the signal after seeing the ith item in the stream Problems, HyperLogLog, Streaming Data and 푈푖maybe positive or negative. Though the turnstile model seems to be the best model it is only from a theoretical purpose; the cash register model is used in 1. INTRODUCTION most algorithms from a practical purpose. [1] A data stream is defined as a continuous sequence of high- Many algorithms have been developed to deal with such frequency data (Volume and Velocity) from different sources massive data problems, and that is what we will be discussing (Variety) which is too big to be stored in memory. There is in our paper. The paper is sectioned as first the Introduction; continuous progress being made on algorithms in recent years the next section lists various streaming problems, and their due to the advent of streaming data. associated algorithms formalized to res