A Survey Paper on Deep Belief Network for Big Data
International Journal of Computer Engineering & Technology (IJCET)
Volume 9, Issue 5, September-October 2018, pp. 161-166, Article ID: IJCET_09_05_019
Available online at http://iaeme.com/Home/issue/IJCET?Volume=9&Issue=5
Journal Impact Factor (2016): 9.3590 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976-6375
© IAEME Publication

A SURVEY PAPER ON DEEP BELIEF NETWORK FOR BIG DATA

Azhagammal Alagarsamy
Research Scholar, Bharathiyar University, Coimbatore, Tamil Nadu, India
Assistant Professor, Department of Computer Science, Ayya Nadar Janaki Ammal College, Sivakasi, Tamil Nadu

Dr. K. Ruba Soundar
Head, Department of CSE, PSR Engineering College, Sivakasi, Tamil Nadu, India

ABSTRACT
The Deep Belief Network (DBN), one of the deep learning architectures and a notable machine learning technique, is used in many applications such as image processing, speech recognition and text mining. It uses supervised and unsupervised methods to learn features in a hierarchical architecture for classification and pattern recognition. Recent developments in technology have enabled the collection of very large datasets, which also poses many challenges for data mining and processing owing to the data's large volume, variety, velocity and veracity. In this paper we review the research on DBN for big data.

Key words: DBN, Big Data, Deep Belief Network, Classification.

Cite this Article: Azhagammal Alagarsamy and Dr. K. Ruba Soundar, A Survey Paper on Deep Belief Network for Big Data. International Journal of Computer Engineering and Technology, 9(5), 2018, pp. 161-166. http://iaeme.com/Home/issue/IJCET?Volume=9&Issue=5

1. INTRODUCTION
Deep learning is also known as deep structured learning or hierarchical learning. It is a family of machine learning methods based on learning data representations. Learning can be supervised, semi-supervised or unsupervised.
Deep learning methods include four main models: the stacked auto-encoder, the deep belief network, the convolutional neural network and the recurrent neural network, which are also the most widely used methods for big data feature learning. In this paper we review the research work on the Deep Belief Network for big data feature learning.

Big Data is a term used to describe collections of data that are huge in size and yet growing exponentially with time. In short, such data are so large and complex that none of the traditional data management tools can store or process them efficiently. Recently, cyber-physical-social systems, together with sensor networks and communication technologies, have made great progress, enabling the collection of big data [1, 2]. Big data can be defined by its four characteristics, i.e., large volume, large variety, large velocity and large veracity, which is usually called the 4V's model [3-5]. Deep learning is playing an important role in big data solutions since it can harvest valuable knowledge from complex systems [6]. In particular, deep learning has become one of the most active research topics in the machine learning community since it was presented in 2006 [7-9]. Actually, deep learning can be traced back to the 1940s. However, traditional training strategies for multi-layer neural networks always result in a locally optimal solution or cannot guarantee convergence. Therefore, multi-layer neural networks did not receive wide application even though it was realized that they could achieve better performance for feature and representation learning. In 2006, Hinton et al. [10] proposed a two-stage strategy, pre-training and fine-tuning, for training deep networks effectively, leading to the first breakthrough of deep learning.
In addition, the increase in computing power and data size also contributes to the popularity of deep learning. As the era of big data comes, a large number of samples can be collected to train the parameters of deep learning models. Meanwhile, training a large-scale deep learning model requires high-performance computing systems. Take, for example, the large-scale deep belief network with more than 100 million free parameters and millions of training samples developed by Raina et al. [11]. With a GPU-based framework, the training time for such a model is reduced from several weeks to about one day. Typically, deep learning models use an unsupervised pre-training and a supervised fine-tuning strategy to learn hierarchical features and representations of big data in deep architectures for the tasks of classification and recognition [12]. Deep learning has achieved state-of-the-art performance in a broad range of applications such as computer vision [13, 14], speech recognition [15, 16] and text understanding [17, 18].

In this paper, section 2 describes the Deep Belief Network architecture and its methodology, section 3 explains the DBN on real-time data, data parallelism and multimodal data are covered in sections 4 and 5 respectively, and the paper is concluded in section 6.

2. DEEP BELIEF NETWORK (DBN)
The first deep learning model that was successfully trained is the deep belief network [10, 19]. Different from the stacked auto-encoder, the deep belief network is a stack of several restricted Boltzmann machines. The restricted Boltzmann machine consists of two layers, i.e., a visible layer v and a hidden layer h, as presented in Fig. 1 [20, 21].

Figure 1 RBM Structure

A typical restricted Boltzmann machine uses Gibbs sampling to train its parameters. Specifically, the restricted Boltzmann machine uses the conditional probability P(h|v) to calculate the value of each unit in the hidden layer, and then uses the conditional probability P(v|h) to calculate the value of each unit in the visible layer. This process is performed repeatedly until convergence. Let v_i (1 \le i \le N) be a binary variable of the input neurons, where N is the number of input neurons, and let h_j (1 \le j \le M) be a binary variable of the hidden neurons, where M is the number of hidden neurons. The energy function for an input vector v and hidden vector h [22] is represented as

E(v, h) = -\sum_{i=1}^{N} a_i v_i - \sum_{j=1}^{M} b_j h_j - \sum_{i=1}^{N} \sum_{j=1}^{M} v_i w_{ij} h_j   (1)

The joint probability distribution of v and h is represented as

P(v, h) = \frac{1}{Z} \exp(-E(v, h)),   Z = \sum_{v, h} \exp(-E(v, h))   (2)

where a_i and b_j are the bias coefficients for v_i and h_j respectively, the weight w_{ij} is the parameter between v_i and h_j, Z is the partition function, P(v, h) is a probability function, and the sum in Z runs over all possible pairs of visible and hidden vectors. The DBN models the joint distribution between the input vector v and the l hidden layers h^1, ..., h^l as follows:

P(v, h^1, \ldots, h^l) = \left( \prod_{k=0}^{l-2} P(h^k \mid h^{k+1}) \right) P(h^{l-1}, h^l)   (3)

where v = h^0, and

P(h^k \mid h^{k+1}) = \prod_{i} P(h_i^k \mid h^{k+1})   (4)

is the conditional distribution for the visible units conditioned on the hidden units of the RBM at level k, and P(h^{l-1}, h^l) is the visible-hidden joint distribution of the top-level RBM [2]. Several restricted Boltzmann machines can be stacked into a deep learning model, called a deep belief network, as presented in Fig. 2 [12].

Figure 2 DBN Structure

Similar to the stacked auto-encoder, the deep belief network is also trained by a two-stage strategy. The pre-training stage is used to train the initial parameters in a greedy layer-wise unsupervised manner, while the fine-tuning stage uses a supervised strategy to fine-tune the parameters with regard to the labeled samples by adding a softmax layer on top. Deep belief networks have a wide range of applications in image classification [23, 24], acoustic modeling [25] and so on [26-28].
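The alternating use of P(h|v) and P(v|h) described above can be illustrated with a minimal NumPy restricted Boltzmann machine trained by one-step contrastive divergence (CD-1), the standard approximation to the Gibbs-sampling procedure. This is only a sketch under assumed layer sizes and learning rate, not the implementation of any surveyed work:

```python
import numpy as np

# Minimal RBM with one step of contrastive divergence (CD-1).
# Illustrative sketch of the equations in Section 2; layer sizes and
# the learning rate are arbitrary assumptions.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible biases a_i
        self.b = np.zeros(n_hidden)    # hidden biases b_j

    def p_h_given_v(self, v):          # P(h_j = 1 | v)
        return sigmoid(v @ self.W + self.b)

    def p_v_given_h(self, h):          # P(v_i = 1 | h)
        return sigmoid(h @ self.W.T + self.a)

    def energy(self, v, h):            # E(v, h) from Eq. (1)
        return -(v @ self.a) - (h @ self.b) - v @ self.W @ h

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden probabilities and a sample from them
        ph0 = self.p_h_given_v(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back through the visible layer
        pv1 = self.p_v_given_h(h0)
        ph1 = self.p_h_given_v(pv1)
        # Gradient approximation: <v h>_data - <v h>_model
        self.W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
        self.a += lr * (v0 - pv1)
        self.b += lr * (ph0 - ph1)

rbm = RBM(n_visible=6, n_hidden=3)
v = np.array([1., 0., 1., 1., 0., 0.])
for _ in range(100):
    rbm.cd1_update(v)
```

Stacking such machines, with P(h|v) of one RBM feeding the visible layer of the next, gives the greedy layer-wise pre-training stage; a softmax layer and supervised fine-tuning would then be added on top as described above.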
3. DEEP BELIEF NETWORKS ON REAL-TIME DATA
Real-time data refers to data generated at high speed that needs to be processed promptly. Non-stationary data means that the distribution of the data varies over time. Efficient analysis of such data is profitable in monitoring tasks such as fraud detection [8]. Real-time non-stationary data are frequently collected and present a challenging area in big data analytics. Finding an ideal feature set size becomes difficult for large-scale online datasets whose distribution may change over time. Online incremental learning is considered a solution for learning from such data. Incremental feature learning offers an effective way to learn from large datasets by starting with a small set of initial features. The idea of such learning is to add new feature mappings to the existing feature set and then merge redundant ones. In recent years only limited progress has been made on deep online learning, so it is important to adapt deep learning algorithms to handle large-scale online, real-time data streams.

4. DEEP BELIEF NETWORK ON DATA PARALLELISM
Big data often involves a large scale of inputs, high-dimensional attributes and a great variety of outputs. These properties lead to high running-time complexity and complex models. Deep learning algorithms with a central processor and storage face a challenge in dealing with these properties; instead, distributed frameworks with parallelized machines are desired. The popular mini-batch stochastic gradient techniques of deep learning are well known to be difficult to parallelize over computers. Novel architectures and sound parallel learning algorithms need to be further developed to make deep learning techniques scalable to large-scale input data. Parallel implementations utilize clusters of CPUs or GPUs to increase training speed without decreasing the accuracy of the learning algorithms.
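The incremental feature learning idea of Section 3 — grow the feature set when incoming data is poorly represented, then merge redundant features — can be sketched as follows. The linear feature mappings, growth threshold and correlation-based merge rule are illustrative assumptions, not a method from the surveyed literature:

```python
import numpy as np

# Toy sketch of incremental feature learning on a data stream:
# start with a small random feature set, add new feature mappings when
# reconstruction error on an incoming batch is high, and drop features
# that are nearly collinear with existing ones. All thresholds and the
# linear encoder/decoder are hypothetical choices for illustration.

rng = np.random.default_rng(1)

class IncrementalFeatures:
    def __init__(self, n_inputs, n_features=2):
        self.W = rng.standard_normal((n_inputs, n_features))

    def reconstruction_error(self, X):
        H = X @ self.W                      # encode into current features
        Xhat = H @ np.linalg.pinv(self.W)   # best linear reconstruction
        return float(np.mean((X - Xhat) ** 2))

    def maybe_grow(self, X, threshold=0.1, n_new=2):
        # Add new random feature mappings when the batch is poorly fit
        if self.reconstruction_error(X) > threshold:
            new = rng.standard_normal((self.W.shape[0], n_new))
            self.W = np.hstack([self.W, new])

    def merge_redundant(self, corr_threshold=0.99):
        # Keep a feature only if it is not almost collinear with one kept earlier
        keep = []
        for j in range(self.W.shape[1]):
            w = self.W[:, j] / np.linalg.norm(self.W[:, j])
            if all(abs(w @ self.W[:, k] / np.linalg.norm(self.W[:, k]))
                   < corr_threshold for k in keep):
                keep.append(j)
        self.W = self.W[:, keep]

feats = IncrementalFeatures(n_inputs=8, n_features=2)
for _ in range(5):                          # simulated stream batches
    X = rng.standard_normal((32, 8))
    feats.maybe_grow(X)
    feats.merge_redundant()
```

On this synthetic stream the feature set grows until the incoming batches are reconstructed below the threshold, after which it stops expanding.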
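The synchronous data parallelism described in Section 4 can be sketched in a few lines: each worker computes a gradient on its own shard of the data, and the gradients are averaged before one shared parameter update. Workers are simulated here by a loop, and a linear least-squares model stands in for a deep network; all sizes are assumptions:

```python
import numpy as np

# Toy sketch of synchronous data-parallel gradient descent.
# In a real system each shard's gradient would be computed on a
# separate GPU/CPU and the average communicated back; here the
# "workers" run sequentially in a list comprehension.

rng = np.random.default_rng(2)

def gradient(w, X, y):
    # Gradient of the mean squared error (1/2n) * ||Xw - y||^2
    return X.T @ (X @ w - y) / len(y)

# Synthetic regression data with a known true weight vector
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((256, 3))
y = X @ w_true

w = np.zeros(3)
n_workers = 4
for step in range(200):
    shards = np.array_split(np.arange(len(X)), n_workers)
    grads = [gradient(w, X[idx], y[idx]) for idx in shards]  # "in parallel"
    w -= 0.1 * np.mean(grads, axis=0)   # synchronized averaged update
```

Because the shards are equally sized, the averaged gradient equals the full-batch gradient, so this scheme parallelizes the computation without changing the update the model sees.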
5. DEEP BELIEF NETWORK ON MULTIMODAL DATA
Big data is usually collected from multiple modalities. Multimodal data comprises several input modalities coming from different sources, and each modality has a different kind of representation and correlational structure. For example, an image is usually represented by real-valued pixel intensities, while text is usually represented as vectors of discrete, sparse word counts. It is a difficult task to capture the highly non-linear relationships that exist between low-level features across dissimilar modalities.
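The representational mismatch described above can be made concrete with a small sketch: a dense image vector and a sparse word-count vector each pass through their own encoder into a shared code space, and the concatenated codes form the joint representation that upper layers of a multimodal network would consume. All dimensions and the random projections are illustrative assumptions:

```python
import numpy as np

# Toy sketch of multimodal fusion: modality-specific encoders project
# an image (dense real values) and a text (sparse word counts) into a
# common 16-dimensional code space before concatenation. The random,
# untrained projections are placeholders for learned encoder weights.

rng = np.random.default_rng(3)

def encode(x, W, b):
    return np.tanh(x @ W + b)          # simple nonlinear encoder

# Hypothetical encoder parameters (64 pixels, 1000-word vocabulary)
W_img, b_img = 0.1 * rng.standard_normal((64, 16)), np.zeros(16)
W_txt, b_txt = 0.1 * rng.standard_normal((1000, 16)), np.zeros(16)

image = rng.random(64)                         # dense pixel intensities
text = np.zeros(1000)                          # sparse word-count vector
text[rng.integers(0, 1000, size=8)] += 1.0     # a few word occurrences

joint = np.concatenate([encode(image, W_img, b_img),
                        encode(text, W_txt, b_txt)])
# 'joint' would feed the shared upper layers of a multimodal model
```

Giving each modality its own encoder lets the dense and sparse statistics be handled separately before the shared layers model the cross-modal, non-linear relationships.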