TERADEEP Eugenio Culurciello Purdue University Teradeep Inc


Enabling machines to perceive the world


Clement Farabet, Alfredo Canziani, Berin Martini, Aysegul Dundar, Vinayak Gokhale, Jonghoon Jin

Yann LeCun (NYU)

Perceive the world like we do!

Applications: UAV, parts inspection, home automation, autonomous cars, parking management

$70k

+ TeraDeep HW (~$100): enabling low-cost autonomous driving!

on sale: 199$

enabling always-on real-time hardware tagging videos on mobile devices!

enabling low-power servers for parsing big data, images, videos!

Neuroscience vision learning neural networks

palm sized!

1: deep neural networks

2: streaming processor architecture

3: assemble vision systems

4: training deep networks

Animal vs. non-animal

[Figure: cortical areas of the dorsal stream ('where' pathway) and ventral stream ('what' pathway); labels include V1, V2, V3, V4, MT, MST, PIT, AIT, LIP, VIP, DP, 7a, STP, TE, TG, PG and prefrontal cortex]

✦ V1:
• Simple and complex cells tuning properties (Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982)
• MAX operation in subset of complex cells (Lampl et al 2004)

✦ V4:
• Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999)
• MAX operation (Gawne et al 2002)
• Two-spot interaction (Freiwald et al 2005)
• Tuning for boundary conformation (Pasupathy & Connor 2001)
• Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)

✦ IT:
• Tuning and invariance properties (Logothetis et al 1995)
• Differential role of IT and PFC in categorization (Freedman et al 2001 2002 2003)
• Read out data (Hung Kreiman Poggio & DiCarlo 2005)
• Average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo in press)

✦ Human behavior:
• Rapid animal categorization (Serre Oliva Poggio 2007)

(Riesenhuber & Poggio 1999 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva & Poggio 2007)

deep networks

input image → layer 1 (N1 maps) → layer 2 (N2 maps) → layer 3 (N3 maps) → output classifier (MLP) → output class map

layer 1: 1 to N1 convolutions; layer 2: N1/M1 to N2 convolutions; layer 3: N2/M2 to N3 convolutions

deep, invariant, trained on data, model of visual system, state-of-the-art

one layer

input image → 1 to N1 convolutions (or correlation, or template matching) → non-linearity (tanh etc...) → N1 to N1 subsampling (or pooling, or L norm pool)
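The three stages of one layer (convolution, non-linearity, subsampling) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the TeraDeep implementation: the function names, the 32x32 input, the 5x5 filter size, the choice of tanh and of 2x2 max pooling are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, filters):
    """Correlate one (H, W) image with (N1, k, k) filters -> (N1, H-k+1, W-k+1)."""
    k = filters.shape[-1]
    windows = sliding_window_view(image, (k, k))  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,nkl->nij", windows, filters)


def max_pool(maps, p):
    """Non-overlapping p x p max pooling on (N, H, W) maps (edges are cropped)."""
    n, h, w = maps.shape
    h, w = h - h % p, w - w % p
    return maps[:, :h, :w].reshape(n, h // p, p, w // p, p).max(axis=(2, 4))


def one_layer(image, filters, pool=2):
    """1 to N1 convolutions -> tanh non-linearity -> N1 to N1 subsampling."""
    return max_pool(np.tanh(correlate_valid(image, filters)), pool)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))      # hypothetical input image
filters = rng.standard_normal((8, 5, 5))   # N1 = 8 hypothetical 5x5 filters
maps = one_layer(image, filters)
print(maps.shape)
```

Stacking several such layers and feeding the last maps to a classifier gives the deep network outlined on the previous slide.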

dataset stochastic gradient descent
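As a reminder of what stochastic gradient descent does with a dataset, here is a minimal sketch on a toy linear model rather than a deep network. The synthetic data, learning rate, batch size and number of epochs are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free dataset: targets are a linear function of the inputs.
X = rng.standard_normal((256, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)          # parameters to learn
lr, batch = 0.05, 16     # assumed learning rate and mini-batch size

for epoch in range(100):
    order = rng.permutation(len(X))          # reshuffle the dataset every epoch
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]
        err = X[idx] @ w - y[idx]            # prediction error on the mini-batch
        grad = X[idx].T @ err / len(idx)     # gradient of 0.5 * mean squared error
        w -= lr * grad                       # SGD update

print(np.round(w, 3))
```

In a deep network the same update is applied to every filter and classifier weight, with the gradient obtained by backpropagation.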

Deep learning algorithms: multi-layer neural networks trained on large amounts of data

[Figure: nn-X system block diagram. Blocks: ARM processors, memory controller, external memory, memory interconnect, memory router, three routers, three MAC units, three f(x) units, parameterizable co-processor (nn-X). Buses: 32-bit off-chip bus, 64-bit on-chip bus, 32/64-bit on-chip high-throughput bus, 32-bit config bus]

Parallel no flow control

nn-X

• > 360 G-ops/s on FPGA
• parallel: 8x operators collections
• includes neural classifier

nn-X ASIC

Performance and power consumption computed on a 16x10x10 filter-bank over a 500x500 input image

• IBM 45 nm high-density technology, 2.5x5mm
• 400MHz system clock, 200MHz CPU clock
• 4 convolvers, combiners, mappers
• Performance 12x better than FPGA (G-ops/Watt)

Eugenio Culurciello © 2014
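To give a sense of scale for the benchmark workload, the operation count of a 16x10x10 filter-bank over a 500x500 image can be estimated as below. This is a back-of-the-envelope sketch, not the measurement method used for the slide: it assumes a single input plane, stride 1, no padding, and counts one multiply-accumulate as two operations.

```python
# Workload quoted on the slide: 16 filters of 10x10 over a 500x500 image.
N_FILTERS, K, H, W = 16, 10, 500, 500

# Assumption: "valid" convolution (no padding), single input plane, stride 1.
out_h, out_w = H - K + 1, W - K + 1
macs = N_FILTERS * out_h * out_w * K * K
ops = 2 * macs  # assumption: one multiply-accumulate counts as 2 operations

PEAK_OPS_PER_S = 360e9  # the "> 360 G-ops/s on FPGA" figure quoted for nn-X
frames_per_s = PEAK_OPS_PER_S / ops  # upper bound if the peak rate were sustained

print(f"{macs:,} MACs, {ops / 1e9:.2f} G-ops per frame")
print(f"upper bound at peak rate: {frames_per_s:.0f} frames/s")
```

The frame rate is only an upper bound under these assumptions; memory bandwidth and the rest of the network would lower it in practice.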


It is all about: [filters], connections!!!

fire together, wire together

respond together, cluster together

compatible with STDP learning

random patches of images → clustered means → use filters as parameters for this layer

a 5 mins technique!

receptive field (RF)

RF filters ...

...

main idea: learn to CLUSTER inputs based on


Multiple layers of deep network:
Repeat for each layer:
1- sample output of previous layer (new input)
2- cluster these inputs = filters
3- use filters to generate outputs
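The three steps above can be sketched with plain k-means in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper names, the random 32x32 input, the 5x5 patch size, the number of patches and clusters, the mean-subtraction of patches, and the tanh plus 2x2 max pooling used to generate outputs are all hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_patches(maps, k, n, rng):
    """Step 1: sample n random k x k patches from (C, H, W) maps, flattened."""
    c, h, w = maps.shape
    ys = rng.integers(0, h - k + 1, size=n)
    xs = rng.integers(0, w - k + 1, size=n)
    return np.stack([maps[:, y:y + k, x:x + k].ravel() for y, x in zip(ys, xs)])


def kmeans(data, n_clusters, n_iter, rng):
    """Step 2: plain k-means; the cluster means become the filters."""
    centroids = data[rng.choice(len(data), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((data[:, None, :] - centroids[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(n_clusters):
            members = data[labels == j]
            if len(members):                 # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids


def apply_layer(maps, filters, pool=2):
    """Step 3: correlate with the filters, apply tanh, then max-pool."""
    k = filters.shape[-1]
    win = sliding_window_view(maps, (k, k), axis=(1, 2))   # (C, H', W', k, k)
    out = np.tanh(np.einsum("cijkl,nckl->nij", win, filters))
    n, h, w = out.shape
    h, w = h - h % pool, w - w % pool
    return out[:, :h, :w].reshape(n, h // pool, pool, w // pool, pool).max(axis=(2, 4))


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32, 32))         # hypothetical single-channel input
for n_filters in (8, 16):                    # two layers, assumed sizes
    patches = sample_patches(x, 5, 500, rng)
    patches -= patches.mean(1, keepdims=True)    # assumed normalization
    filters = kmeans(patches, n_filters, 10, rng).reshape(n_filters, x.shape[0], 5, 5)
    x = apply_layer(x, filters)
print(x.shape)
```

No labels and no gradients are involved: each layer's filters come directly from clustering the outputs of the layer below.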

CIFAR10

same patch location for multiple frames

run k-means on group of patches

• state-of-the-art architecture for complex understanding of the environment

• fully programmable

• customized for neural networks, extendable to any state-of-the-art algorithm

• scalable to larger networks and applications

• To do: extend it to learning from videos, unsupervised multi-layer learning

Past members: Zhengming Fu Pujitha Weerakoon E-Lab current members: Shoushun Chen Aysegul Dundar Farah Laiwalla Vinayak Gokhale Huang Chenxi Jonghoon Jin Dongsoo Kim Andrew Kunil Choe Alfredo Canziani Selcuk Talay Berin Martini Polina Askelrod Faye Zhao Ifigeneia Derekli Hazael Montanaro and with: Darko Jelaca Visitors: Clement Farabet Jonathan McMillan Jose Carrasco Evan JoonHuyk Park Angelo Rottigni Wei Tang Alejandro Linares-Barranco Phi-Hung Pham Rafael Paz Jordan Bates