CS855 Pattern Recognition and Machine Learning Homework 3 A.Aziz Altowayan

Problem Find three recent (2010 or newer) journal articles or conference papers on pattern recognition applications using feed-forward neural networks with learning that clearly describe the design of the neural network – number of layers and number of units in each layer – and the rationale for the design. For each paper, describe the neural network, the reasoning behind the design, and include images of the neural network when available.

Answer¹ The theme of this answer is the Deep Neural Network² (or multi-layer deep architecture). The reason is that in recent years, “Deep learning technology and related algorithms have dramatically broken landmark records for a broad range of learning problems in vision, speech, audio, and text processing.” [1] Deep learning models are a class of machines that can learn a hierarchy of features by building high-level features from low-level ones, thereby automating the process of feature construction [2].

The following are three papers on this topic.

Paper 1: D. C. Ciresan, U. Meier, J. Schmidhuber. “Multi-column Deep Neural Networks for Image Classification”. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2012. Feb 2012. arxiv “Work from the Swiss AI Lab IDSIA”

This method is the first to achieve near-human performance on the MNIST handwritten digits dataset. It also outperforms humans by a factor of two on the traffic sign recognition benchmark. In this paper, the network model is the Deep Convolutional Neural Network. The number of layers in their NNs is comparable to the number of layers found between the retina and the visual cortex of “macaque monkeys”. These NNs, inspired by the Neocognitron (a hierarchical multilayered artificial neural network proposed by Professor Kunihiko Fukushima for handwritten character recognition), are deep and have hundreds of maps per layer, with many (6-10) layers of non-linear neurons stacked on top of each other. They train the nets by simple online back-propagation, where weight updates occur after each error back-propagation step. Why multiple columns? Inspired by the micro-columns of neurons in the cerebral cortex, the authors “also show how combining several DNN columns into a Multi-column DNN (MCDNN) further decreases the error rate by 30–40%.” Their DNNs (Figure 1) have 2D layers of “winner-take-all neurons” that use a simple max-pooling technique to select the most active neuron of each region as the winner. Thus, only winning neurons are trained, and the non-winning neurons do not forget what they have learnt. Eventually, as down-sampling automatically leads to the first 1D layer, the top part of the hierarchy becomes a standard multi-layer perceptron (MLP). (See Figure 2 for their model in the MNIST experiment.)
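To make the column design concrete, below is a minimal illustrative sketch (my own, written in PyTorch; the authors used their own GPU implementation, and the layer sizes here are not their exact configuration) of one DNN column of stacked convolution and max-pooling stages topped by an MLP, together with the MCDNN step of averaging the predictions of several columns.

```python
# Sketch of one DNN column: convolution + max-pooling stages followed by an MLP.
# Illustrative only; not the authors' exact configuration.
import torch
import torch.nn as nn

class DNNColumn(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=4), nn.Tanh(),  # 2D layer of convolutional maps
            nn.MaxPool2d(2),                             # max pooling picks the "winner" of each region
            nn.Conv2d(20, 40, kernel_size=5), nn.Tanh(),
            nn.MaxPool2d(3),
        )
        self.classifier = nn.Sequential(                 # top of the hierarchy: a standard MLP
            nn.Flatten(),
            nn.LazyLinear(150), nn.Tanh(),
            nn.Linear(150, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def mcdnn_predict(columns, x):
    """MCDNN: average the per-column predictions, each column having been
    trained on inputs preprocessed/distorted in a different way."""
    probs = [torch.softmax(col(x), dim=1) for col in columns]
    return torch.stack(probs).mean(dim=0)
```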

¹ This answer is available online at: http://goo.gl/ih6O9Z
² A DNN is a feed-forward, artificial neural network that has more than one layer of hidden units between its inputs and its outputs [Paper 3].

Figure 1: (a) DNN architecture. (b) MCDNN architecture. The input image can be preprocessed by P0 to Pn−1 blocks. An arbitrary number of columns can be trained on inputs preprocessed in different ways. The final predictions are obtained by averaging individual predictions of each DNN. (c) Training a DNN. The dataset is preprocessed before training, then, at the beginning of every epoch, the images are distorted (D block). See text for more explanations.

Figure 2: (a) Handwritten digits from the training set (top row) and their distorted versions after each epoch (second to fifth row). (b) The 23 errors of the MCDNN, with correct label (up right) and first and second best predictions (down left and right). (c) DNN architecture for MNIST. Output layer not drawn to scale; weights of fully connected layers not displayed.

Paper 2: Sarikaya, R.; Hinton, G. E.; Ramabhadran, B., “Deep belief nets for natural language call-routing,” Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 5680–5683, 22-27 May 2011. pdf “Work from IBM Watson Research Center (Yorktown) and the University of Toronto”

In this paper, the network model is the Deep Belief Net (DBN), built from the probabilistic Restricted Boltzmann Machine (RBM) model. The call-routing task requires two statistical models: one performs speech transcription (maps the caller's voice to text), and the other is the Action Classification (AC) model, which maps the transcription generated by the speech recognizer to the call-types of the appropriate action. “As the complexity of the task increases the amount of training data required for reasonable performance can become large. This increases the cost and time taken to deploy the natural language understanding system.” To overcome this, “DBNs learn a multi-layer generative model from unlabeled data and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation.” From their experimental results, they state that hidden layers of 500 → 500 → 500 units provided better results than the other sizes they tried. See Figure 3.

Figure 3: Stacking RBMs to create a deep network. This architecture is used in their experiments.
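As a rough sketch of that recipe (my own illustration in Python/NumPy; the paper publishes no code, and all hyperparameters apart from the 500-500-500 layer sizes are assumptions), the stack is built by greedily training one RBM per layer with one-step contrastive divergence and feeding each layer's hidden activations into the next layer; the resulting weights would then initialize a feed-forward network that is fine-tuned with backpropagation.

```python
# Greedy layer-wise pretraining of an RBM stack (DBN), sketched with CD-1.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, seed=0):
    """Train one binary RBM on `data` with 1-step contrastive divergence."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W + b_h)                         # positive phase
        h_sample = (h0 > rng.random(h0.shape)).astype(float)
        v1 = sigmoid(h_sample @ W.T + b_v)                 # reconstruction
        h1 = sigmoid(v1 @ W + b_h)                         # negative phase
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)      # CD-1 update
        b_v += lr * (v0 - v1).mean(axis=0)
        b_h += lr * (h0 - h1).mean(axis=0)
    return W, b_h

def pretrain_dbn(data, layer_sizes=(500, 500, 500)):
    """Each RBM's hidden activations become the training data for the next RBM.
    The returned weights would initialize a feed-forward net for fine-tuning."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        weights.append((W, b_h))
        x = sigmoid(x @ W + b_h)
    return weights
```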

They also compare the DBN-initialized neural network to three popular text classification algorithms: SVM, Boosting, and Maximum Entropy. The results are shown in Figure 4:

Figure 4: PACKAGE SHIPMENT TASK: AC accuracy for various classifiers.

Paper 3: Hinton, G.; et al. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82–97, Nov. 2012. pdf

Most speech recognition systems use HMMs with GMMs to determine how well each HMM state fits a frame of the acoustic input. Another approach is to use a feed-forward neural network: training a neural network with many hidden layers (a DNN) using new methods has been shown to outperform GMMs, sometimes by a large margin. In this article, they present the shared views of four research groups (the University of Toronto, Microsoft Research, Google, and IBM Research) who have all had recent successes in using DNNs for acoustic modeling. See Figures 5, 6, and 7.

Figure 5: The sequence of operations used to create a DBN with three hidden layers and to convert it to a pretrained DBN-DNN. First, a GRBM is trained to model a window of frames of real-valued acoustic coefficients. Then the states of the binary hidden units of the GRBM are used as data for training an RBM. This is repeated to create as many hidden layers as desired. Then the stack of RBMs is converted to a single generative model, a DBN, by replacing the undirected connections of the lower level RBMs by top-down, directed connections. Finally, a pretrained DBN-DNN is created by adding a “softmax” output layer that contains one unit for each possible state of each HMM. The DBN-DNN is then discriminatively trained to predict the HMM state corresponding to the central frame of the input window in a forced alignment.
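The conversion described in the caption above can be sketched roughly as follows (my own illustration, assuming PyTorch and pretrained weights stored as NumPy (W, b) pairs such as those produced by an RBM stack; the 512-unit layers, 39-dimensional input frames, and 2,000 HMM states are placeholder numbers, not the paper's): copy the pretrained weights into a feed-forward network, append an output layer with one unit per HMM state, and fine-tune it discriminatively against forced-alignment state labels.

```python
# Turning a pretrained RBM stack into a DBN-DNN with a softmax output layer.
import numpy as np
import torch
import torch.nn as nn

def build_dbn_dnn(pretrained, n_hmm_states):
    """`pretrained` is a list of (W, b) pairs with W of shape (n_in, n_out)."""
    layers = []
    for W, b in pretrained:
        lin = nn.Linear(W.shape[0], W.shape[1])
        with torch.no_grad():                      # copy pretrained weights in
            lin.weight.copy_(torch.tensor(W.T, dtype=torch.float32))
            lin.bias.copy_(torch.tensor(b, dtype=torch.float32))
        layers += [lin, nn.Sigmoid()]
    # Output layer: one unit per HMM state (softmax applied by the loss below).
    layers.append(nn.Linear(pretrained[-1][0].shape[1], n_hmm_states))
    return nn.Sequential(*layers)

# Placeholder weights standing in for a trained RBM stack (39-dim acoustic
# frames -> three 512-unit hidden layers), just to show the wiring.
rng = np.random.default_rng(0)
dummy = [(rng.normal(0, 0.01, (39, 512)), np.zeros(512)),
         (rng.normal(0, 0.01, (512, 512)), np.zeros(512)),
         (rng.normal(0, 0.01, (512, 512)), np.zeros(512))]
model = build_dbn_dnn(dummy, n_hmm_states=2000)
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax; targets are HMM state labels
```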

Figure 6: Comparisons among the reported speaker-independent (SI) phonetic recognition accuracy results on the TIMIT core test set with 192 sentences.

Figure 7: Comparing five different DBN-DNN acoustic models with two strong GMM-HMM baseline systems that are discriminatively trained. SI training on 309 h of data and single-pass decoding were used for all models except for the GMM-HMM system shown on the last row, which used SA training with 2,000 h of data and multipass decoding including hypotheses combination. In the table, “40 mix” means a mixture of 40 Gaussians per HMM state and “15.2 NZ” means 15.2 million nonzero weights. WERs in % are shown for two separate test sets, Hub5'00-SWB and RT03S-FSH.

References

[1] Ni, K.; Prenger, R., “Learning features in deep architectures with unsupervised kernel k-means,” Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pp. 981–984, 3-5 Dec. 2013. Link

[2] Shuiwang Ji; Wei Xu; Ming Yang; Kai Yu, “3D Convolutional Neural Networks for Human Action Recognition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 1, pp. 221–231, Jan. 2013

[3] Wand, Michael; Schultz, Tanja, “Pattern learning with deep neural networks in EMG-based speech recognition,” Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE, pp. 4200–4203, 26-30 Aug. 2014. pdf

Deep learning resources:
Recent Developments in Deep Learning, ‘video’ by Geoffrey Hinton, 2010 at Google
A tutorial on Deep Learning, ‘video’ by Geoffrey Hinton
Learning Deep Architectures for AI, 2010 paper by Yoshua Bengio, ppt

Deep Learning in media:

NYT Science: “Scientists See Promise in Deep-Learning Programs”, 11/24/2012
Fun article on “Deep Learning”
