Enabling machines to perceive the world
TERADEEP
Eugenio Culurciello
Purdue University, Teradeep Inc.
Eugenio Culurciello © 2014
team
Clement Farabet, Alfredo Canziani, Berin Martini, Aysegul Dundar, Yann LeCun (NYU), Vinayak Gokhale, Jonghoon Jin
Our Goal: to enable machines to perceive the world like we do!
Applications: robotics, food inspection, UAV, parts inspection, home automation, autonomous cars, parking management, building security, biometrics
enabling technology
70k$
~100$ + TeraDeep HW
enabling low-cost autonomous driving!
enabling technology
hardware on sale: 199$
enabling always-on real-time tagging of videos on mobile devices!
enabling technology
enabling low-power servers for parsing big data, images, videos!
Electrical Engineering: neuromorphic engineering, synthetic vision
Computer Science: computer architecture, computer vision, machine learning
Neuroscience: vision, learning, neural networks
our vision system
palm sized!
demos
state-of-the-art
how did we do this?
1: deep neural networks
2: streaming processor architecture
3: assemble vision systems
4: training deep networks
1: deep neural networks
Model
(Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007)
Animal vs. non-animal
[Figure: cortical areas along the dorsal stream ('where' pathway) and the ventral stream ('what' pathway), from V1 through V2, V3, V4, MT, MST and PIT, AIT up to Prefrontal Cortex]
✦ V1:
• Simple and complex cells tuning properties (Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982)
• MAX operation in subset of complex cells (Lampl et al 2004)
✦ V4:
• Tuning for two-bar stimuli (Reynolds, Chelazzi & Desimone 1999)
• MAX operation (Gawne et al 2002)
• Two-spot interaction (Freiwald et al 2005)
• Tuning for boundary conformation (Pasupathy & Connor 2001)
• Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)
✦ IT:
• Tuning and invariance properties (Logothetis et al 1995)
• Differential role of IT and PFC in categorization (Freedman et al 2001, 2002, 2003)
• Read out data (Hung, Kreiman, Poggio & DiCarlo 2005)
• Average effect in IT (Zoccolan, Cox & DiCarlo 2005; Zoccolan, Kouh, Poggio & DiCarlo in press)
✦ Human behavior:
• Rapid animal categorization (Serre, Oliva & Poggio 2007)
deep networks
input image → layer 1 (N1 maps; 1 to N1 convolutions) → layer 2 (N2 maps; N1/M1 to N2 convolutions) → layer 3 (N3 maps; N2/M2 to N3 convolutions) → output classifier (MLP) → output class map
deep, invariant, trained on data, model of visual system, state-of-the-art
deep networks
one layer
input image → 1 to N1 convolutions (or correlation, or template matching) → N1 to N1 non-linearity (tanh etc...) → subsampling (or pooling, or L norm pool)
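The three stages of one layer can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used on nn-X: the function names, the two edge filters, the toy 6x6 image, and the choice of 2x2 max pooling are all assumptions made for the example.

```python
import math

def conv2d_valid(image, kernel):
    """Correlate one image (list of rows) with one kernel, no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

def tanh_map(fmap):
    """Point-wise tanh non-linearity (one map in, one map out)."""
    return [[math.tanh(v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[y * size + i][x * size + j]
                 for i in range(size) for j in range(size))
             for x in range(out_w)]
            for y in range(out_h)]

def layer_forward(image, filters, pool=2):
    """1 to N1 convolutions -> N1 to N1 non-linearity -> subsampling."""
    return [max_pool(tanh_map(conv2d_valid(image, k)), pool) for k in filters]

# Hypothetical inputs: a 6x6 toy image and N1 = 2 hand-written 3x3 filters.
image = [[float((x + y) % 3) for x in range(6)] for y in range(6)]
filters = [
    [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]],   # vertical edges
    [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]],   # horizontal edges
]
maps = layer_forward(image, filters)
print(len(maps), len(maps[0]), len(maps[0][0]))
```

A 6x6 input with 3x3 filters gives 4x4 responses, which 2x2 pooling reduces to one 2x2 map per filter.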
deep network training
dataset stochastic gradient descent
more on training later...
we use:
Deep Learning algorithms: multi-layer neural networks trained on large amounts of data
Our workhorse: deep learning algorithms
2: streaming processor architecture
conventional hardware
our hardware: FPGA
nn-X
nn-X block diagram:
• ARM processors, memory controller, external memory
• memory interconnect, memory router
• parameterizable co-processor (nn-X): routers, MAC units, f(x) units
• buses: 32 bit off-chip bus, 64 bit on-chip bus, 32/64 bit on-chip high throughput bus, 32-bit config bus
streaming architecture
Parallel
no flow control
even with a slow clock
3: assemble vision systems
nn-X
• > 360 G-ops/s on FPGA
• parallel: 8x operators collections
• includes neural classifier
nn-X performance
Performance and power consumption computed on a 16x10x10 filter-bank over a 500x500 input image
nn-X ASIC
• IBM 45 nm high-density technology, 2.5x5mm
• 400MHz system clock, 200MHz CPU clock
• 4 convolvers, combiners, mappers
• Performance 12x better than FPGA (G-ops/Watt)
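To get a feel for the size of the benchmark workload (a 16x10x10 filter-bank over a 500x500 image), the operation count per frame can be estimated as below. The slides do not say how operations were counted, so this sketch assumes a single input channel, no padding, stride 1, and two operations (one multiply, one add) per multiply-accumulate; `filter_bank_ops` is a hypothetical helper name.

```python
def filter_bank_ops(n_filters, kernel_h, kernel_w, image_h, image_w, ops_per_mac=2):
    """Estimate operations for one pass of a filter-bank over one image.

    Assumes 'valid' convolution (no padding, stride 1) on a single input channel.
    """
    out_h = image_h - kernel_h + 1
    out_w = image_w - kernel_w + 1
    macs = n_filters * kernel_h * kernel_w * out_h * out_w
    return macs * ops_per_mac

ops = filter_bank_ops(16, 10, 10, 500, 500)
print(ops)  # 771459200, i.e. roughly 0.77 G-ops per frame under these assumptions
```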
nn-X performance
4: training deep networks
dataset stochastic gradient descent
expensive, not scalable for videos, imprecise, not “portable” to new apps
deep networks training
one layer
input image → 1 to N1 convolutions (or correlation, or template matching) → N1 to N1 non-linearity (tanh etc...) → subsampling (or pooling, or L norm pool)
It is all about: [filters], connections!!!
Hebbian Learning: fire together, wire together
Clustering Learning: respond together, cluster together
compatible with STDP learning
clustering learning
random patches of images → clustered means → use filters as parameters for this layer
a 5 mins technique!
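The idea of turning clustered means of random image patches into filters can be sketched with a small k-means routine. This is an illustrative sketch only: the helper names, patch size, patch count, and iteration count are assumptions, and any patch normalization or whitening a real pipeline may apply is left out.

```python
import random

def extract_patches(image, size, count, rng):
    """Sample `count` random size x size patches, each flattened to a vector."""
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(count):
        y = rng.randrange(h - size + 1)
        x = rng.randrange(w - size + 1)
        patches.append([image[y + i][x + j]
                        for i in range(size) for j in range(size)])
    return patches

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters, rng):
    """Plain k-means; centers are initialized from k randomly chosen points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def learn_filters(image, n_filters, size=3, n_patches=200, iters=10, seed=0):
    """Cluster random patches and reshape each clustered mean into a filter."""
    rng = random.Random(seed)
    patches = extract_patches(image, size, n_patches, rng)
    centers = kmeans(patches, n_filters, iters, rng)
    return [[c[r * size:(r + 1) * size] for r in range(size)] for c in centers]

# Hypothetical input: a 16x16 image of uniform noise.
noise = random.Random(1)
image = [[noise.random() for _ in range(16)] for _ in range(16)]
filters = learn_filters(image, n_filters=4)
print(len(filters), len(filters[0]), len(filters[0][0]))
```

No labels and no gradient computation are involved, which is consistent with the short training time the slide advertises.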
layer N: feature maps (N1)
layer N+1: feature maps (N2)
receptive field (RF)
RF filters ...
main idea: learn to CLUSTER inputs based on co-occurrence of features
deep networks
one layer
input image → 1 to N1 convolutions (or correlation, or template matching) → N1 to N1 non-linearity (tanh etc...) → subsampling (or pooling, or L norm pool)
Multiple layers of deep network:
Repeat for each layer:
1- sample output of previous layer (new input)
2- cluster these inputs = filters
3- use filters to generate outputs
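The per-layer recipe above can be written as a loop. The sketch below is one possible reading under stated assumptions: filters span all feature maps of the previous layer, the non-linearity is tanh, subsampling is 2x2 max pooling, and all function names and sizes are hypothetical.

```python
import math
import random

def sample_patches(maps, size, count, rng):
    """Step 1: sample patches that span all feature maps of the previous layer."""
    h, w = len(maps[0]), len(maps[0][0])
    patches = []
    for _ in range(count):
        y, x = rng.randrange(h - size + 1), rng.randrange(w - size + 1)
        patches.append([m[y + i][x + j] for m in maps
                        for i in range(size) for j in range(size)])
    return patches

def kmeans(points, k, iters, rng):
    """Step 2: cluster the sampled inputs; the cluster means become the filters."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers

def apply_filters(maps, filters, size):
    """Step 3: correlate with each filter, apply tanh, then 2x2 max pooling."""
    h, w = len(maps[0]) - size + 1, len(maps[0][0]) - size + 1
    outputs = []
    for f in filters:
        resp = [[math.tanh(sum(wgt * val for wgt, val in zip(
                    f, (m[y + i][x + j] for m in maps
                        for i in range(size) for j in range(size)))))
                 for x in range(w)] for y in range(h)]
        outputs.append([[max(resp[2 * y][2 * x], resp[2 * y][2 * x + 1],
                             resp[2 * y + 1][2 * x], resp[2 * y + 1][2 * x + 1])
                         for x in range(w // 2)] for y in range(h // 2)])
    return outputs

def train_layers(image, filters_per_layer, size=3, n_patches=100, iters=5, seed=0):
    """Repeat sample -> cluster -> apply for each layer, feeding outputs forward."""
    rng = random.Random(seed)
    maps, learned = [image], []
    for k in filters_per_layer:
        patches = sample_patches(maps, size, n_patches, rng)
        filters = kmeans(patches, k, iters, rng)
        maps = apply_filters(maps, filters, size)
        learned.append(filters)
    return learned, maps

# Hypothetical input: a 20x20 noise image, two layers with 4 and 6 filters.
noise = random.Random(2)
image = [[noise.random() for _ in range(20)] for _ in range(20)]
learned, maps = train_layers(image, filters_per_layer=[4, 6])
print(len(learned[0][0]), len(learned[1][0]), len(maps), len(maps[0]))
```

Each second-layer filter has 4 x 3 x 3 = 36 weights because it is clustered from patches that cover all four first-layer maps.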
clustering learning
as well as standard convnet/CNN!!!
CIFAR10
clustering learning: motion filters
same patch location for multiple frames
run k-means on group of patches
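Gathering the same patch location over several frames amounts to building one long vector per sample. The sketch below covers only that sampling step; the resulting vectors could then be passed to any k-means routine. The function name, window size, and toy video are assumptions for illustration.

```python
import random

def sample_patch_groups(frames, size, n_frames, count, seed=0):
    """Return `count` vectors, each the same size x size window taken from
    n_frames consecutive frames and concatenated in time order."""
    rng = random.Random(seed)
    h, w = len(frames[0]), len(frames[0][0])
    groups = []
    for _ in range(count):
        t = rng.randrange(len(frames) - n_frames + 1)
        y = rng.randrange(h - size + 1)
        x = rng.randrange(w - size + 1)
        groups.append([frames[t + dt][y + i][x + j]
                       for dt in range(n_frames)
                       for i in range(size) for j in range(size)])
    return groups

# Hypothetical toy video: a bright vertical bar moving one pixel right per frame.
frames = [[[1.0 if x == t else 0.0 for x in range(8)] for y in range(8)]
          for t in range(5)]
groups = sample_patch_groups(frames, size=3, n_frames=2, count=50)
print(len(groups), len(groups[0]))
```

Because each vector holds the window at two time steps, cluster means computed from these vectors can capture how the content of a location changes between frames.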
clustering learning: topographic filters
Summary of results
• state-of-the-art architecture for complex understanding of the environment
• fully programmable
• customized for neural networks, extendable to any state-of-the-art algorithm
• scalable to larger networks and applications
• To do: extend it to learning from videos, unsupervised multi-layer learning
Thanksgiving
Past members: Zhengming Fu Pujitha Weerakoon E-Lab current members: Shoushun Chen Aysegul Dundar Farah Laiwalla Vinayak Gokhale Huang Chenxi Jonhoon Jin Dongsoo Kim Andrew Kunil Choe Alfredo Canziani Selcuk Talay Berin Martini Polina Askelrod Faye Zhao Ifigeneia Derekli Hazael Montanaro and with: Darko Jelaca Visitors: Clement Farabet Jonathan McMillan Jose Carrasco Evan JoonHuyk Park Angelo Rottigni Wei Tang Alejandro Linares-Barranco Phi-Hung Pham Rafael Paz Jordan Bates