Computer Vision: Filtering

Raquel Urtasun, TTI Chicago. Jan 10, 2013.

Today's lecture
- Image formation
- Image filtering

Readings
Chapters 2 and 3 of Rich Szeliski's book, available online.

How is an image created?
The image formation process that produced a particular image depends on:
- lighting conditions
- scene geometry
- surface properties
- camera optics
[Source: R. Szeliski]

What is an image?
[Figure: a digital camera and the human eye; "We'll focus on these in this class". Source: A. Efros]

From photons to RGB values
- Sample the 2D space on a regular grid.
- Quantize each sample, i.e., the photons arriving at each active cell are integrated and then digitized.
[Source: D. Hoiem]

What is an image?
A grid (matrix) of intensity values.
[Figure: an image displayed as its matrix of 8-bit intensity values. Source: N. Snavely]
It is common to use one byte per value: 0 = black, 255 = white.

What is an image?
We can think of a (grayscale) image as a function $f : \mathbb{R}^2 \to \mathbb{R}$ giving the intensity at position $(x, y)$. A digital image is a discrete (sampled, quantized) version of this function. [Source: N. Snavely]

Image transformations
As with any function, we can apply operators to an image, e.g., $g(x, y) = f(x, y) + 20$ (brightening) or $g(x, y) = f(-x, y)$ (mirroring). We'll talk about special kinds of operators: correlation and convolution (linear filtering). [Source: N. Snavely]

Filtering

Noise reduction
Question: given a camera and a still scene, how can you reduce noise? Take lots of images and average them! What's the next best thing? [Source: S. Seitz]
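Before turning to single-image filtering, here is a minimal sketch of the multi-image averaging idea, assuming the shots are aligned grayscale arrays of the same still scene; the variable names are hypothetical.

```python
import numpy as np

def average_images(shots):
    """Average aligned noisy shots of a still scene.

    For zero-mean noise that is independent across shots, averaging
    N images reduces the noise standard deviation by a factor sqrt(N).
    """
    # Accumulate in float to avoid 8-bit overflow while summing.
    acc = np.zeros_like(shots[0], dtype=np.float64)
    for img in shots:
        acc += img
    return acc / len(shots)

# Usage sketch: 'shots' would be a list of HxW uint8 arrays.
# denoised = average_images(shots).astype(np.uint8)
```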
Image filtering
Modify the pixels in an image based on some function of a local neighborhood of each pixel.
[Figure: local image data, passed through some function, yields modified image data. Source: L. Zhang]

Applications of filtering
- Enhance an image, e.g., denoise, resize.
- Extract information, e.g., texture, edges.
- Detect patterns, e.g., template matching.

Noise reduction
Simplest thing: replace each pixel by the average of its neighbors. This assumes that neighboring pixels are similar and that the noise is independent from pixel to pixel. Moving average in 1D: [1, 1, 1, 1, 1]/5. [Source: S. Marschner]

We can also use non-uniform weights so that closer neighbors contribute more, e.g., [1, 4, 6, 4, 1]/16. [Source: S. Marschner]

Moving average in 2D
[Figure: a 2D moving average computed by sliding a window over the image. Source: S. Seitz]
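A short sketch of the two 1D smoothing kernels above (the example signal values are hypothetical); since both kernels are symmetric, `np.convolve` computes exactly the moving average described.

```python
import numpy as np

# A noisy 1D signal (hypothetical values).
x = np.array([10., 50., 60., 10., 20., 40., 30.])

# Uniform moving average: every neighbor contributes equally.
box = np.ones(5) / 5.0

# Non-uniform weights: closer neighbors count more. These binomial
# weights are a discrete approximation to a Gaussian.
binomial = np.array([1., 4., 6., 4., 1.]) / 16.0

# Both kernels are symmetric, so convolution and correlation coincide.
smooth_box = np.convolve(x, box, mode='same')
smooth_binomial = np.convolve(x, binomial, mode='same')
```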
Linear filtering: correlation
Involves weighted combinations of pixels in small neighborhoods. The output pixel value is determined as a weighted sum of input pixel values:

$$g(i, j) = \sum_{k,l} f(i + k,\, j + l)\, h(k, l)$$

The entries of the weight kernel or mask $h(k, l)$ are often called the filter coefficients. This operator is the correlation operator, written $g = f \otimes h$.

Smoothing by averaging
What if the filter size were 5 x 5 instead of 3 x 3? [Source: K. Grauman]

Gaussian filter
What if we want the nearest neighboring pixels to have the most influence on the output? Use a Gaussian kernel. It removes high-frequency components from the image (a low-pass filter). [Source: S. Seitz]

Smoothing with a Gaussian
[Figure: an image smoothed with a Gaussian kernel. Source: K. Grauman]

Mean vs Gaussian
[Figure: mean (box) smoothing compared with Gaussian smoothing.]

Gaussian filter: parameters
- Size of the kernel or mask: the Gaussian function has infinite support, but discrete filters use finite kernels. [Source: K. Grauman]
- Variance of the Gaussian: determines the extent of the smoothing. [Source: K. Grauman]

Is this the most general Gaussian?
No. The most general form, for $x \in \mathbb{R}^d$, is

$$\mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$

but the simplified version is typically used for filtering.

Properties of the smoothing
- All values are positive.
- They all sum to 1.
- The amount of smoothing is proportional to the mask size.
- It removes high-frequency components; it is a low-pass filter.
[Source: K. Grauman]
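To tie these pieces together, here is a sketch that builds a finite, normalized Gaussian mask from its two parameters (kernel size and variance) and applies it with the correlation formula above. The truncation rule and zero-padding at the borders are assumptions for illustration, not something the slides prescribe.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Finite size x size mask sampling the isotropic 2D Gaussian.

    The Gaussian has infinite support, so we truncate it to a finite
    window and renormalize so the coefficients sum to 1 (preserving
    overall image brightness).
    """
    assert size % 2 == 1, "use an odd size so the kernel has a center"
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    h = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return h / h.sum()

def correlate(f, h):
    """Direct implementation of g(i,j) = sum_{k,l} f(i+k, j+l) h(k,l).

    Border handling is an assumption: the image is zero-padded here.
    """
    r = h.shape[0] // 2
    fp = np.pad(f.astype(np.float64), r, mode='constant')
    g = np.zeros_like(f, dtype=np.float64)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            g[i, j] = np.sum(fp[i:i + h.shape[0], j:j + h.shape[1]] * h)
    return g

# Usage sketch: a larger sigma (with a correspondingly larger mask)
# smooths more.
# smoothed = correlate(image, gaussian_kernel(size=5, sigma=1.0))
```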