International Journal of Computer Engineering & Technology (IJCET)
Volume 9, Issue 5, September-October 2018, pp. 161–166, Article ID: IJCET_09_05_019
Available online at http://iaeme.com/Home/issue/IJCET?Volume=9&Issue=5
Journal Impact Factor (2016): 9.3590 (Calculated by GISI), www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976-6375
© IAEME Publication

A SURVEY PAPER ON DEEP BELIEF NETWORK FOR BIG DATA

Azhagammal Alagarsamy
Research Scholar, Bharathiyar University, Coimbatore, Tamil Nadu, India
Assistant Professor, Department of Computer Science, Ayya Nadar Janaki Ammal College, Sivakasi, Tamil Nadu, India

Dr. K. Ruba Soundar
Head, Department of CSE, PSR Engineering College, Sivakasi, Tamil Nadu, India

ABSTRACT

The Deep Belief Network (DBN) is a notable deep learning architecture used in many applications such as image processing, speech recognition and text mining. It combines unsupervised and supervised methods to learn features in a hierarchical architecture for classification and pattern recognition. Recent developments in technology have enabled the collection of very large data sets, which pose many challenges in storage and processing due to their large volume, variety, velocity and veracity. In this paper we review the research on DBNs for big data.

Key words: DBN, Big Data, Deep Belief Network, Classification.

Cite this Article: Azhagammal Alagarsamy and Dr. K. Ruba Soundar, A Survey Paper on Deep Belief Network for Big Data. International Journal of Computer Engineering and Technology, 9(5), 2018, pp. 161-166. http://iaeme.com/Home/issue/IJCET?Volume=9&Issue=5

1. INTRODUCTION

Deep learning is also known as deep structured learning or hierarchical learning. It is a family of machine learning methods based on learning data representations, where learning can be supervised, semi-supervised or unsupervised. Deep learning has four principal models: the stacked auto-encoder, the deep belief network, the convolutional neural network and the recurrent neural network, which are also the models most widely used for big data. In this paper we review the research work on the Deep Belief Network for big data feature learning.

Big data is a term used to describe collections of data that are huge in size and growing exponentially with time. Such data are so large and complex that none of the traditional data management tools can store or process them efficiently. Recently, cyber-physical-social systems, together with sensor networks and communication

technologies, have made great progress, enabling the collection of big data [1,2]. Big data can be defined by its four characteristics, i.e., large volume, large variety, large velocity and large veracity, usually called the 4V's model [3-5]. Deep learning plays an important role in big data solutions since it can harvest valuable knowledge from complex systems [6]. In particular, deep learning has become one of the most active research topics in the machine learning community since it was presented in 2006 [7-9].

Actually, deep learning can be traced back to the 1940s. However, traditional training strategies for multi-layer neural networks always result in a locally optimal solution or cannot guarantee convergence. Therefore, multi-layer neural networks did not receive wide application even though it was realized that they could achieve better performance for feature and representation learning. In 2006, Hinton et al. [10] proposed a two-stage strategy, pre-training and fine-tuning, for training deep learning models effectively, leading to the first breakthrough of deep learning. In addition, the increase in computing power and data size has also contributed to the popularity of deep learning. As the era of big data arrives, a large number of samples can be collected to train the parameters of deep learning models. Meanwhile, training a large-scale deep learning model requires high-performance computing systems. Take, for example, the large-scale deep belief network with more than 100 million free parameters and millions of training samples developed by Raina et al. [11]: with a GPU-based framework, the training time for such a model is reduced from several weeks to about one day. Typically, deep learning models use an unsupervised pre-training and a supervised fine-tuning strategy to learn hierarchical features and representations of big data in deep architectures for the tasks of classification and recognition [12]. Deep learning has achieved state-of-the-art performance in a broad range of applications such as computer vision [13,14], speech recognition [15,16] and text understanding [17,18].

In this paper, Section 2 describes the Deep Belief Network architecture and its methodology, Section 3 explains the DBN on real-time data, data parallelism and multimodal data are discussed in Sections 4 and 5 respectively, and Section 6 concludes.

2. DEEP BELIEF NETWORK (DBN)

The first deep learning model to be successfully trained is the deep belief network [10,19]. Different from the stacked auto-encoder, the deep belief network is built by stacking several restricted Boltzmann machines. The restricted Boltzmann machine (RBM) consists of two layers, i.e., a visible layer v and a hidden layer h, as presented in Fig. 1 [20,21].

Figure 1 RBM Structure

A typical restricted Boltzmann machine uses the contrastive divergence algorithm to train its parameters [22]. Specifically, the restricted Boltzmann machine uses the conditional probability P(h|v) to

calculate the value of each unit in the hidden layer, and then uses the conditional probability P(v|h) to calculate the value of each unit in the visible layer. This process is performed repeatedly until convergence.
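To make the alternating use of P(h|v) and P(v|h) concrete, the following is a minimal NumPy sketch of one Gibbs-sampling step in an RBM. The layer sizes, the weight matrix W and the bias vectors a and b are illustrative assumptions, not values from the paper.

```python
# One Gibbs-sampling step in a binary RBM (toy sketch).
import numpy as np

rng = np.random.default_rng(0)

N, M = 6, 4                      # visible and hidden layer sizes (assumed)
W = rng.normal(0, 0.1, (N, M))   # weights between v and h
a = np.zeros(N)                  # visible biases
b = np.zeros(M)                  # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = rng.integers(0, 2, N).astype(float)     # a binary visible vector

# P(h_j = 1 | v): hidden units are conditionally independent given v
p_h = sigmoid(b + v @ W)
h = (rng.random(M) < p_h).astype(float)     # sample hidden states

# P(v_i = 1 | h): visible units are conditionally independent given h
p_v = sigmoid(a + W @ h)
v_new = (rng.random(N) < p_v).astype(float) # reconstructed visible states
```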

Let (0iN) be a binary variable of input neurons, N is the number of input neurons, (0jM) be a binary variable of hidden neuron, where M is the number of hidden neurons. The energy function   for input vector    and hidden vector    [22] is represented as

     -  -   (1) The joint probability distribution of  and  is represented as       exp(-E(v,h)), Z=  exp(-E(v,h)) (2) Where

 are the coefficients for  and  respectively,

Weight  is the parameter between  and  , Z is the partition function, P(v,h) is a probability function, calculates sum over all possible pairs of visible and hidden vectors. DBN copy the joint distribution between input vector v and the h hidden layers as follows          ,…., )= ( p( ( ))  ,…, ) (3) Where    , | ) (4) is a conditional distribution for the visible units conditioned on the hidden units of the RBM at level , and ,) is the visible hidden joint distribution in the RBM[2]. Several restricted Boltzmann machines can be stacked into a deep learning model, called deep belief network, as presented in Fig. 2 [12].
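As a concrete check of Eqs. (1) and (2), the following sketch evaluates the energy function and, for a deliberately tiny model, computes the partition function $Z$ exactly by enumerating all binary $(v, h)$ pairs. Brute-force enumeration is only feasible at toy scale and is given purely for illustration; the weights and the example vectors are assumed values.

```python
# Evaluate E(v, h) and the exact joint probability P(v, h) for a tiny RBM.
from itertools import product
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                      # kept tiny so Z can be enumerated exactly
W = rng.normal(0, 0.1, (N, M))   # weights w_ij between v_i and h_j
a = np.zeros(N)                  # coefficients (biases) for v
b = np.zeros(M)                  # coefficients (biases) for h

def energy(v, h):
    # Eq. (1): E(v,h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} v_i w_ij h_j
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Eq. (2): Z sums exp(-E) over all 2^N * 2^M binary (v, h) pairs
Z = sum(np.exp(-energy(np.array(vb, float), np.array(hb, float)))
        for vb in product([0, 1], repeat=N)
        for hb in product([0, 1], repeat=M))

v = np.array([1.0, 0.0, 1.0, 1.0])
h = np.array([0.0, 1.0, 0.0])
p_vh = np.exp(-energy(v, h)) / Z   # joint probability P(v, h)
```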

Figure 2 DBN Structure

Similar to the stacked auto-encoder, the deep belief network is also trained by a two-stage strategy. The pre-training stage trains the initial parameters in a greedy layer-wise unsupervised manner, while the fine-tuning stage uses a supervised strategy to fine-tune the parameters with respect to the labeled samples by adding a softmax layer on top. Deep belief networks have a wide range of applications, including image classification [23,24], acoustic modeling [25] and others [26-28].
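The two-stage strategy can be sketched as follows. This is a hedged, toy NumPy illustration of greedy layer-wise CD-1 pre-training with the fine-tuning stage only outlined; the layer sizes, learning rate and epoch counts are assumptions, and a practical DBN would backpropagate through all layers during fine-tuning.

```python
# Greedy layer-wise pre-training of stacked RBMs with CD-1 (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=10, lr=0.1):
    """Train one RBM with CD-1 and return (weights, hidden_bias)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(b + v0 @ W)                       # positive phase
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            v1 = sigmoid(a + W @ h0)                         # reconstruction
            p_h1 = sigmoid(b + v1 @ W)                       # negative phase
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
            a += lr * (v0 - v1)
            b += lr * (p_h0 - p_h1)
    return W, b

# Stage 1: greedy layer-wise pre-training; each RBM is trained on the
# hidden activations of the layer below it.
X = rng.integers(0, 2, (200, 16)).astype(float)   # toy binary data
layers, inp = [], X
for n_hidden in (12, 8):
    W, b = pretrain_rbm(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(b + inp @ W)

# Stage 2 (outline only): add a softmax layer on top of `inp` and
# fine-tune all parameters with labeled samples, e.g. by gradient
# descent on the cross-entropy loss.
```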


3. DEEP BELIEF NETWORKS ON REAL-TIME DATA

Real-time data are generated at high speed and need to be processed promptly. Non-stationary data are data whose distribution varies over time. Efficient analysis of such data is valuable in monitoring tasks such as fraud detection [8]. Real-time non-stationary data are frequently collected and present a challenging area in big data analytics: finding an ideal feature set size becomes difficult for large-scale online datasets whose distribution may change over time. Online incremental learning is considered a solution for learning from such data. Incremental feature learning offers an effective way to learn from large datasets by starting with a small set of initial features; the idea is to add new feature mappings to the existing feature set and then merge redundant ones. In recent years, only limited progress has been made on deep online learning, so it is important to adapt deep learning algorithms to handle large-scale online, real-time data streams.
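As a rough illustration of the incremental feature learning idea, the sketch below starts with a small feature set and adds new feature mappings when the reconstruction error on the stream stays high. The threshold, growth step and cap are invented for the example and are not a method specified in this survey.

```python
# Incremental feature growth on a simulated data stream (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 16, 4          # begin with a small feature set
W = rng.normal(0, 0.01, (n_visible, n_hidden))
threshold, grow_by, max_hidden = 3.0, 2, 64

for step in range(1000):             # simulated non-stationary stream
    v = rng.integers(0, 2, n_visible).astype(float)
    h = sigmoid(v @ W)
    v_rec = sigmoid(W @ h)
    err = np.sum((v - v_rec) ** 2)   # per-sample reconstruction error
    if err > threshold and n_hidden < max_hidden:
        # add new feature mappings (columns) to the existing feature set
        W = np.hstack([W, rng.normal(0, 0.01, (n_visible, grow_by))])
        n_hidden += grow_by
    # (an online CD/gradient update of W on `v` would go here)
```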

4. DEEP BELIEF NETWORK ON DATA PARALLELISM

Big data often involves a large number of inputs, high-dimensional attributes, and a great variety of outputs. These properties lead to high running-time and model complexity. Deep learning algorithms running on a single central processor and storage struggle with these properties; instead, distributed frameworks with parallelized machines are desired. The popular mini-batch stochastic gradient techniques of deep learning are well known to be difficult to parallelize over computers, so novel architectures and sound parallel learning algorithms need to be developed to make deep learning techniques scalable to large-scale input data. Parallel implementations utilize clusters of CPUs or GPUs to increase training speed without decreasing the accuracy of the learning algorithms.
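A toy illustration of the data-parallel pattern is given below: each simulated worker computes a gradient on its own shard of a mini-batch, and the averaged gradient drives one synchronized update. The linear model and squared loss stand in for a DBN purely to keep the example short; real systems distribute the workers over CPU/GPU clusters.

```python
# Synchronous data-parallel gradient averaging (toy, in-process "workers").
import numpy as np

rng = np.random.default_rng(0)
n_workers, shard, dim = 4, 32, 8
w = np.zeros(dim)                     # shared model parameters
w_true = rng.normal(size=dim)         # synthetic ground truth

for step in range(200):
    grads = []
    for _ in range(n_workers):        # each worker handles one data shard
        X = rng.normal(size=(shard, dim))
        y = X @ w_true
        grads.append(2 * X.T @ (X @ w - y) / shard)  # local gradient
    w -= 0.05 * np.mean(grads, axis=0)  # synchronized averaged update
```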

5. DEEP BELIEF NETWORK ON MULTIMODAL DATA

Big data is usually collected from multiple modalities. Multimodal data comprise several input modalities that come from different sources, and each modality has a different kind of representation and correlational structure. For example, an image is usually represented by real-valued pixel intensities, while text is usually represented as vectors of discrete, sparse word counts. It is a difficult task to capture the highly non-linear relationships that exist between the low-level features of dissimilar modalities. Deep learning is well suited to heterogeneous data integration due to its capability of learning the factors of variation in data and providing abstract representations of it. However, this capability is currently limited to integrating bi-modal data, in which the data come from only two modalities. Deep learning also cannot yet deal well with the conflicts in fusion that commonly arise from the variation of the data. More progress in deep learning algorithms is needed before they become optimal models for integrating multimodal data.
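The sketch below illustrates the bimodal integration pattern discussed above: modality-specific layers map real-valued image features and sparse text counts into hidden representations, which are concatenated and fed to a shared joint layer. All sizes and the random weights are assumptions made for the example; a real multimodal DBN would pre-train each pathway as RBMs first.

```python
# Bimodal fusion via modality-specific layers and a shared joint layer.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

img = rng.random(64)                        # pixel intensities in [0, 1]
txt = rng.poisson(0.1, 500).astype(float)   # sparse word counts

W_img = rng.normal(0, 0.1, (64, 32))        # image pathway
W_txt = rng.normal(0, 0.1, (500, 32))       # text pathway
W_joint = rng.normal(0, 0.1, (64, 16))      # shared joint layer

h_img = sigmoid(img @ W_img)                # modality-specific features
h_txt = sigmoid(txt @ W_txt)
h_joint = sigmoid(np.concatenate([h_img, h_txt]) @ W_joint)  # fused code
```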

6. CONCLUSIONS

Deep learning algorithms extract significant abstract representations of raw data using a hierarchical, multi-level learning approach. The extracted representations can be regarded as a practical source of knowledge for big data analytics tasks such as data tagging and recognition, information retrieval and natural language processing. However, there are several areas of big data where deep learning needs further exploration, such as learning from streaming data, distributed computing, and dealing with high-dimensional data. We need to address these technical challenges with new ways of thinking and transformative solutions to realize the full potential of big data.


REFERENCES

[1] L. Kuang, L.T. Yang, Y. Liao, An integration framework on cloud for cyber-physical-social systems big data, IEEE Trans. Cloud Comput. (2015). doi:10.1109/TCC.2015.2511766
[2] Y. Li, X. Qi, M. Keally, Z. Ren, G. Zhou, D. Xiao, S. Deng, Communication energy modeling and optimization through joint packet size analysis of BSN and WiFi networks, IEEE Trans. Parallel Distrib. Syst. 24 (9) (2013) 1741–1751.
[3] C.L.P. Chen, C. Zhang, Data-intensive applications, challenges, techniques and technologies: a survey on big data, Inf. Sci. 275 (2014) 314–347.
[4] X. Wu, X. Zhu, G.-Q. Wu, W. Ding, Data mining with big data, IEEE Trans. Knowl. Data Eng. 26 (1) (2014) 97–107.
[5] G.B. Orgaz, J.J. Jung, D. Camacho, Social big data: recent achievements and new challenges, Inf. Fusion 28 (2016) 45–59.
[6] X. Chen, X. Lin, Big data deep learning: challenges and perspectives, IEEE Access 2 (2014) 514–525.
[7] Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1798–1828.
[8] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[9] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117.
[10] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[11] R. Raina, A. Madhavan, A. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the International Conference on Machine Learning, ACM, 2009, pp. 873–880.
[12] V. Sze, Y. Chen, T. Yang, J. Emer, Efficient processing of deep neural networks: a tutorial and survey, 2017, arXiv:1703.09039.
[13] S.M.S. Islam, S. Rahman, M.M. Rahman, E.K. Dey, M. Shoyaib, Application of deep learning to computer vision: a comprehensive study, Proceedings of the International Conference on Informatics, Electronics and Vision, IEEE, 2016, pp. 592–597.
[14] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A.J. Rodriguez-Sanchez, L. Wiskott, Deep hierarchies in the primate visual cortex: what can we learn for computer vision, IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1847–1871.
[15] L. Deng, G. Hinton, B. Kingsbury, New types of deep neural network learning for speech recognition and related applications: an overview, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 26–31.
[16] D. Chen, B.K. Mak, Multitask learning of deep neural networks for low-resource speech recognition, IEEE/ACM Trans. Audio Speech Lang. Process. 23 (7) (2015) 1172–1183.
[17] N. Majumder, S. Poria, A. Gelbukh, E. Cambria, Deep learning-based document modeling for personality detection from text, IEEE Intell. Syst. 32 (2) (2017) 74–79.
[18] Z. Jiang, L. Li, D. Huang, L. Lin, Training word embeddings for deep learning in biomedical text mining tasks, Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, IEEE, 2015, pp. 625–628.
[19] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554.
[20] H. Larochelle, M. Mandel, R. Pascanu, Y. Bengio, Learning algorithms for the classification restricted Boltzmann machine, J. Mach. Learn. Res. 13 (2012) 643–669.
[21] R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, Proceedings of the International Conference on Machine Learning, ACM, 2007, pp. 791–798.


[22] G.E. Hinton, A practical guide to training restricted Boltzmann machines, Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700, Springer, 2012, pp. 599–619.
[23] T. Li, J. Zhang, Y. Zhang, Classification of hyperspectral image based on deep belief networks, Proceedings of the IEEE International Conference on Image Processing, IEEE, 2014, pp. 5132–5136.
[24] J. Niu, X. Bu, Z. Li, Y. Wang, An improved bilinear deep belief network algorithm for image classification, Proceedings of the International Conference on Computational Intelligence and Security, IEEE, 2014, pp. 189–192.
[25] A. Mohamed, G.E. Dahl, G. Hinton, Acoustic modeling using deep belief networks, IEEE Trans. Audio Speech Lang. Process. 20 (1) (2012) 14–22.
[26] X. Zhang, J. Wu, Deep belief networks based voice activity detection, IEEE Trans. Audio Speech Lang. Process. 21 (4) (2013) 697–710.
[27] W. Huang, G. Song, H. Hong, K. Xie, Deep architecture for traffic flow prediction: deep belief networks with multitask learning, IEEE Trans. Intell. Transp. Syst. 15 (5) (2014) 2191–2201.
[28] R. Sarikaya, G.E. Hinton, A. Deoras, Application of deep belief networks for natural language understanding, IEEE/ACM Trans. Audio Speech Lang. Process. 22 (2014) 778–784.
