
Enabling machines to perceive the world


TeraDeep

Eugenio Culurciello, TeraDeep Inc.

Eugenio Culurciello © 2014

Team

Clement Farabet, Alfredo Canziani, Berin Martini, Aysegul Dundar, Vinayak Gokhale, Jonghoon Jin

Yann LeCun (NYU)

Our goal: to enable machines to perceive the world like we do!

Applications: robotics, UAVs, food inspection, parts inspection, home automation, autonomous cars, parking management, building security, biometrics.

Enabling technology

$70k → ~$100 with TeraDeep HW: enabling low-cost autonomous driving!

Enabling technology

On sale: $199

Enabling always-on, real-time hardware for tagging videos on mobile devices!

Enabling technology

Enabling low-power servers for parsing big data: images and videos!

Electrical engineering: neuromorphic engineering, synthetic vision, computer architecture

Neuroscience: vision, learning, neural networks

Our vision system

Palm-sized!

Demos

State-of-the-art

How did we do this?

1: deep neural networks

2: streaming architecture

3: assemble vision systems

4: training deep networks

1: deep neural networks

Model

[Figure: model of the visual cortex for animal vs. non-animal categorization, showing the dorsal 'where' pathway and the ventral 'what' pathway through V1, V2, V3, V4, MT, MST, PIT, AIT and prefrontal cortex.]

Supporting evidence:
• V1: simple and complex cell tuning properties (Schiller et al. 1976; Hubel & Wiesel 1965; De Valois et al. 1982); MAX operation in a subset of complex cells (Lampl et al. 2004)
• V4: tuning for two-bar stimuli (Reynolds, Chelazzi & Desimone 1999); MAX operation (Gawne et al. 2002); two-spot interaction (Freiwald et al. 2005); tuning for boundary conformation (Pasupathy & Connor 2001); tuning for Cartesian and non-Cartesian gratings (Gallant et al. 1996)
• IT: tuning and invariance properties (Logothetis et al. 1995); differential role of IT and PFC in categorization (Freedman et al. 2001, 2002, 2003); read-out data (Hung, Kreiman, Poggio & DiCarlo 2005); average effect in IT (Zoccolan, Cox & DiCarlo 2005; Zoccolan, Kouh, Poggio & DiCarlo, in press)
• Human behavior: rapid animal categorization (Serre, Oliva & Poggio 2007)

(Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007)

Deep networks

[Figure: input image → layer 1: 1 to N1 map convolutions (N1 maps) → layer 2: N1/M1 to N2 convolutions (N2 maps) → layer 3: N2/M2 to N3 convolutions (N3 maps) → MLP output classifier → output class.]

Deep, invariant, trained on data, modeled on the visual system, state-of-the-art.

Deep networks

one layer

Input image → convolutions, 1 to N1 (or correlation, or template matching) → non-linearity, N1 to N1 (tanh, etc.) → subsampling (pooling, or L-norm pooling)
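A minimal NumPy sketch of the one-layer pipeline above, assuming a single-channel input, "valid" correlation for the convolution stage, tanh as the non-linearity, and non-overlapping max pooling for subsampling (one of the listed options; L-norm pooling would slot in at the same place). The filter count and sizes are arbitrary illustration values, not those of the actual networks.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """2D 'valid' correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(fmap, size):
    """Non-overlapping max pooling; rows/cols that don't fill a window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


def one_layer(image, filters, pool=2):
    """1 -> N1 convolutions, tanh non-linearity, then subsampling by pooling."""
    maps = [np.tanh(conv2d_valid(image, f)) for f in filters]
    return np.stack([max_pool(m, pool) for m in maps])


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
filters = rng.standard_normal((4, 5, 5))  # N1 = 4 hypothetical 5x5 filters
out = one_layer(image, filters)
print(out.shape)  # (4, 14, 14)
```

Stacking this function, with each layer's output maps as the next layer's input, gives the multi-layer network in the previous diagram.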

Deep network training

Dataset → stochastic gradient descent

More on training later...

We use:

Deep learning algorithms: multi-layer neural networks trained on large amounts of data

Our workhorse: deep learning algorithms

2: streaming processor architecture

Conventional hardware

Our hardware: FPGA

nn-X

[Figure: nn-X block diagram. ARM processors, memory controller and external memory; memory interconnect and memory router; routers feeding a parameterizable co-processor of MAC and f(x) units. Buses: 32-bit off-chip, 64-bit on-chip, 32/64-bit on-chip high-throughput, and a 32-bit config bus.]
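The streaming idea behind the MAC and f(x) operators can be modeled in software: pixels arrive once, in raster order, and each operator keeps only a small line buffer, so it never has to fetch data out of order. The sketch below assumes a single k × k kernel, a MAC stage (multiply-accumulate over the window) followed by an f(x) stage (here tanh). The function names are hypothetical, and this is a behavioral model, not the nn-X hardware design.

```python
from collections import deque

import numpy as np


def stream_pixels(image):
    """Emit pixels one at a time in raster order, as a camera or memory router would."""
    for row in image:
        for px in row:
            yield px


def streaming_conv(pixels, width, kernel, f=np.tanh):
    """Behavioral model of a streaming convolver (hypothetical, not nn-X RTL).

    Consumes a raster-order pixel stream and yields f(MAC(window)) for every
    'valid' output position, in raster order, holding only the k-1 previous
    rows plus the current partial row in memory.
    """
    k = kernel.shape[0]
    prev_rows = deque(maxlen=k - 1)  # line buffer: the k-1 most recent full rows
    current = []                     # the row currently streaming in
    for px in pixels:
        current.append(px)
        c = len(current)
        if len(prev_rows) == k - 1 and c >= k:
            # Window = last k columns of the buffered rows plus the current row.
            window = np.array([r[c - k:c] for r in prev_rows] + [current[c - k:c]])
            yield f(np.sum(window * kernel))  # MAC stage, then f(x) stage
        if c == width:               # row complete: push it into the line buffer
            prev_rows.append(current)
            current = []


rng = np.random.default_rng(1)
img = rng.standard_normal((8, 10))
ker = rng.standard_normal((3, 3))
streamed = np.array(list(streaming_conv(stream_pixels(img), img.shape[1], ker)))
print(streamed.shape)  # (48,): a 6 x 8 'valid' output, emitted as pixels arrive
```

In this model each output appears as soon as its last input pixel arrives, so every stage can run in lockstep with the input stream. Real hardware would also pipeline the MAC across clock cycles; the model ignores timing.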

Streaming architecture

Parallel, no flow control: high throughput even with a slow clock

3: assemble vision systems

nn-X

• > 360 G-ops/s on FPGA
• parallel: 8× operator collections
• includes a neural classifier

nn-X performance

Performance and power consumption computed on a 16×10×10 filter bank over a 500×500 input image.

nn-X ASIC
• IBM 45 nm high-density technology, 2.5 × 5 mm
• 400 MHz system clock, 200 MHz CPU clock
• 4 convolvers, combiners, mappers
• performance 12× better than FPGA (G-ops/W)
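As a rough check of what the benchmark workload costs, the arithmetic below assumes that the 16×10×10 filter bank means 16 filters of 10 × 10 applied to a single-channel 500 × 500 image with "valid" output (491 × 491), and counts each multiply-accumulate as 2 operations. The slides do not state the counting convention, so this is an estimate, not a reported figure.

```python
def conv_ops(n_filters, k, height, width, ops_per_mac=2):
    """Operations for a bank of n_filters k x k filters over a single-channel
    image, 'valid' output size, counting each multiply-accumulate as ops_per_mac ops."""
    out_h, out_w = height - k + 1, width - k + 1
    return n_filters * out_h * out_w * k * k * ops_per_mac


ops = conv_ops(16, 10, 500, 500)
throughput = 360e9  # ops/s: the "> 360 G-ops/s" FPGA figure, used as a lower bound
seconds = ops / throughput
print(f"{ops / 1e9:.3f} G-ops per frame, {seconds * 1e3:.2f} ms per frame at 360 G-ops/s")
# prints: 0.771 G-ops per frame, 2.14 ms per frame at 360 G-ops/s
```

Under these assumptions one frame costs about 0.77 G-ops, so a sustained 360 G-ops/s would process it in roughly 2.1 ms. Counting a MAC as a single operation would halve both numbers.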


4: training deep networks

Dataset → stochastic gradient descent
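The dataset → SGD loop can be illustrated with a tiny logistic-regression model trained by minibatch stochastic gradient descent in NumPy. This is a stand-in: a deep network backpropagates the same kind of gradient through every layer, and the learning rate, batch size, epoch count and toy dataset here are hypothetical.

```python
import numpy as np


def sgd_train(X, y, lr=0.1, epochs=20, batch=16, seed=0):
    """Minibatch SGD on logistic regression: the 'dataset -> SGD' loop in miniature."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(X))  # stochastic: reshuffle the dataset every epoch
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))  # predicted probabilities
            grad = p - y[idx]                             # d(cross-entropy)/d(logit)
            w -= lr * X[idx].T @ grad / len(idx)
            b -= lr * grad.mean()
        # Full-dataset cross-entropy after each epoch, to monitor progress.
        p_all = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        eps = 1e-12
        losses.append(-np.mean(y * np.log(p_all + eps) + (1 - y) * np.log(1 - p_all + eps)))
    return w, b, losses


rng = np.random.default_rng(42)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy, linearly separable labels
w, b, losses = sgd_train(X, y)
```

Every update needs labeled data and a pass of gradient computation, which is the cost the next slide objects to.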

Expensive, not scalable to videos, imprecise, not “portable” to new applications


Deep network training: one layer


It is all about the filters and the connections!

Hebbian learning vs. clustering learning

Hebbian learning: neurons that fire together wire together.
Clustering learning: inputs that respond together cluster together.

Compatible with STDP (spike-timing-dependent plasticity) learning.
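In equation form, a minimal comparison using standard textbook rules (not taken from the slides): with input x, output y and learning rate η, the plain Hebbian rule strengthens weights between co-active units, while winner-take-all competitive learning, the online form of k-means, moves only the winning centroid toward the input. Writing y_k = 1 for the winner and 0 otherwise shows that the clustering update is a Hebbian term plus a decay term that keeps the weights bounded.

```latex
\begin{aligned}
\text{Hebbian:} \quad & \Delta w_i = \eta \, x_i \, y \\
\text{online $k$-means:} \quad & k^{*} = \arg\min_{k} \lVert x - c_k \rVert, \qquad
  y_k = \begin{cases} 1 & k = k^{*} \\ 0 & \text{otherwise} \end{cases} \\
& \Delta c_k = \eta \, y_k \, (x - c_k)
  = \underbrace{\eta \, y_k \, x}_{\text{Hebbian}} - \underbrace{\eta \, y_k \, c_k}_{\text{decay}}
\end{aligned}
```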

Clustering learning

Take random patches of images (the input of this layer), cluster them, and use the cluster means as the filters (parameters) of this layer: a 5-minute technique!
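A minimal sketch of clustering learning for one layer, assuming grayscale images, plain Lloyd's k-means, and per-patch mean removal and contrast normalization (a common preprocessing choice for k-means feature learning that the slides do not specify). The helper names and all sizes are hypothetical.

```python
import numpy as np


def random_patches(images, size, n, rng):
    """Sample n random size x size patches from a (N, H, W) stack, flattened to rows."""
    idx = rng.integers(0, len(images), n)
    rows = rng.integers(0, images.shape[1] - size + 1, n)
    cols = rng.integers(0, images.shape[2] - size + 1, n)
    return np.stack([images[i, r:r + size, c:c + size].ravel()
                     for i, r, c in zip(idx, rows, cols)])


def kmeans(data, k, iters, rng):
    """Plain Lloyd's k-means; returns the k cluster means."""
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def clustering_learning_filters(images, n_filters=16, size=5, n_patches=2000, iters=10, seed=0):
    """Cluster random patches; the cluster means become this layer's filters."""
    rng = np.random.default_rng(seed)
    patches = random_patches(images, size, n_patches, rng)
    patches -= patches.mean(axis=1, keepdims=True)        # remove patch DC (assumption)
    patches /= patches.std(axis=1, keepdims=True) + 1e-8  # contrast-normalize (assumption)
    centers = kmeans(patches, n_filters, iters, rng)
    return centers.reshape(n_filters, size, size)


rng = np.random.default_rng(0)
images = rng.random((10, 32, 32))  # stand-in for a real image dataset
filters = clustering_learning_filters(images)
print(filters.shape)  # (16, 5, 5)
```

No labels and no gradients are involved: the only learning step is k-means, which is why the technique runs in minutes rather than hours.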

[Figure: feature maps (N1) of layer N connect to feature maps (N2) of layer N+1 through receptive fields (RF) and RF filters.]

Main idea: learn to CLUSTER inputs based on co-occurrence of features.

Deep networks: one layer


Multiple layers of a deep network. Repeat for each layer:
1. Sample the output of the previous layer (the new input).
2. Cluster these inputs: the cluster means are the filters.
3. Use the filters to generate the outputs.
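The layer-by-layer recipe can be sketched as a greedy loop. Assumptions: each layer uses 5 × 5 receptive fields spanning all input maps, tanh and 2 × 2 max pooling as in the one-layer pipeline, plain k-means without patch normalization, and hypothetical filter counts of 8 and 16. No labels or gradients are used at any point.

```python
import numpy as np


def kmeans(data, k, iters, rng):
    """Plain Lloyd's k-means; returns the k cluster means."""
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((data[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers


def sample_patches(maps, size, n, rng):
    """maps: (n_images, channels, H, W). Returns n flattened channels*size*size patches."""
    N, C, H, W = maps.shape
    i = rng.integers(0, N, n)
    r = rng.integers(0, H - size + 1, n)
    c = rng.integers(0, W - size + 1, n)
    return np.stack([maps[a, :, y:y + size, x:x + size].ravel() for a, y, x in zip(i, r, c)])


def apply_layer(maps, filters):
    """Valid multi-channel correlation with each filter, tanh, then 2x2 max pooling."""
    N, C, H, W = maps.shape
    F, _, s, _ = filters.shape
    oh, ow = H - s + 1, W - s + 1
    out = np.zeros((N, F, oh, ow))
    flat_filters = filters.reshape(F, -1)  # same (C, s, s) ordering as the patches
    for y in range(oh):
        for x in range(ow):
            window = maps[:, :, y:y + s, x:x + s].reshape(N, -1)  # (N, C*s*s)
            out[:, :, y, x] = window @ flat_filters.T
    out = np.tanh(out)
    ph, pw = oh // 2, ow // 2
    return out[:, :, :ph * 2, :pw * 2].reshape(N, F, ph, 2, pw, 2).max(axis=(3, 5))


def greedy_clustering_learning(images, layer_sizes=(8, 16), size=5, n_patches=1000, seed=0):
    rng = np.random.default_rng(seed)
    maps = images[:, None]  # add a channel axis: (N, 1, H, W)
    all_filters = []
    for n_filters in layer_sizes:
        channels = maps.shape[1]
        patches = sample_patches(maps, size, n_patches, rng)   # 1. sample previous output
        filters = kmeans(patches, n_filters, 10, rng).reshape(
            n_filters, channels, size, size)                   # 2. cluster means = filters
        maps = apply_layer(maps, filters)                      # 3. filters generate outputs
        all_filters.append(filters)
    return all_filters, maps


rng = np.random.default_rng(0)
images = rng.random((6, 32, 32))  # stand-in for a real image dataset
all_filters, maps = greedy_clustering_learning(images)
print([f.shape for f in all_filters], maps.shape)
# prints: [(8, 1, 5, 5), (16, 8, 5, 5)] (6, 16, 5, 5)
```

Each layer is trained once and then frozen, so the cost grows linearly with depth instead of requiring repeated passes through the whole network.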

Clustering learning performs as well as a standard convnet (CNN)!

CIFAR-10

Clustering learning: motion filters

Take patches at the same location across multiple frames, then run k-means on the groups of patches.
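For motion filters, a sketch under stated assumptions: a grayscale video stored as a (frames, height, width) array, groups of n_frames consecutive frames taken at one fixed spatial location and flattened into a single vector, and plain k-means over those vectors, so each cluster mean is a small spatiotemporal filter. Function names and sizes are hypothetical.

```python
import numpy as np


def spatiotemporal_patches(video, size, n_frames, n, rng):
    """video: (T, H, W). Each sample is the SAME spatial patch over n_frames consecutive frames."""
    T, H, W = video.shape
    t = rng.integers(0, T - n_frames + 1, n)
    r = rng.integers(0, H - size + 1, n)
    c = rng.integers(0, W - size + 1, n)
    return np.stack([video[a:a + n_frames, y:y + size, x:x + size].ravel()
                     for a, y, x in zip(t, r, c)])


def kmeans(data, k, iters, rng):
    """Plain Lloyd's k-means; returns the k cluster means."""
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((data[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = data[labels == j].mean(axis=0)
    return centers


def motion_filters(video, n_filters=8, size=7, n_frames=3, n_patches=1000, seed=0):
    rng = np.random.default_rng(seed)
    groups = spatiotemporal_patches(video, size, n_frames, n_patches, rng)
    centers = kmeans(groups, n_filters, 10, rng)
    return centers.reshape(n_filters, n_frames, size, size)  # one (frames, h, w) filter each


# Toy video: a vertical bar moving one pixel to the right per frame.
video = np.zeros((10, 16, 16))
for t in range(10):
    video[t, :, t + 3] = 1.0
filters = motion_filters(video)
print(filters.shape)  # (8, 3, 7, 7)
```

Because each sample spans several frames, a cluster mean that contains the bar shows it shifted from frame to frame, which is what makes it a motion-selective filter.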

Clustering learning: topographic filters

Summary of results

• state-of-the-art architecture for complex understanding of the environment

• fully programmable

• customized for neural networks, but extendable to any state-of-the-art algorithm

• scalable to larger networks and applications

• To do: extend to learning from videos and to unsupervised multi-layer learning

Thanks

Past members: Zhengming Fu, Pujitha Weerakoon. E-Lab current members: Shoushun Chen, Aysegul Dundar, Farah Laiwalla, Vinayak Gokhale, Huang Chenxi, Jonghoon Jin, Dongsoo Kim, Andrew Kunil Choe, Alfredo Canziani, Selcuk Talay, Berin Martini, Polina Akselrod, Faye Zhao, Ifigeneia Derekli, Hazael Montanaro. And with: Darko Jelaca. Visitors: Clement Farabet, Jonathan McMillan, Jose Carrasco, Evan JoonHuyk Park, Angelo Rottigni, Wei Tang, Alejandro Linares-Barranco, Phi-Hung Pham, Rafael Paz, Jordan Bates.
