Building Intelligent Systems with Large Scale Deep Learning

Jeff Dean
Google Brain team
g.co/brain

Presenting the work of many people at Google

Google Brain Team Mission: Make Machines Intelligent. Improve People’s Lives.

How do we do this?

● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)
  ○ […] of cats, Inception, […], seq2seq, DeepDream, image captioning, neural translation, Magenta, ML for robotics control, healthcare, …

● Build and open-source systems like TensorFlow (see tensorflow.org and https://github.com/tensorflow/tensorflow)

● Collaborate with others at Google and Alphabet to get our work into the hands of billions of people (e.g., RankBrain for Google Search, GMail Smart Reply, Google Photos, Google speech recognition, Google Translate, Waymo, …)

● Train new researchers through internships and the Google Brain Residency program

Main Research Areas

● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation

research.googleblog.com/2017/01/the-google-brain-team-looking-back-on.html

1980s and 1990s

[Figure: accuracy versus scale (data size, model size) for neural networks and other approaches; a second build of the slide adds a "more compute" annotation]

Now

[Figure: accuracy versus scale (data size, model size) for neural networks and other approaches, with a "more compute" annotation]

Growth of Deep Learning at Google

[Figure: directories containing model description files; product labels "and many more . . ."]

Experiment Turnaround Time and Research Productivity

● Minutes, Hours:
  ○ Interactive research! Instant gratification!
● 1-4 days
  ○ Tolerable
  ○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks
  ○ High value experiments only
  ○ Progress stalls
● >1 month
  ○ Don’t even try

Build the right tools

Google Confidential + Proprietary (permission granted to share within NIST)

Open, standard software for general machine learning

Great for Deep Learning in particular

http://tensorflow.org/ and https://github.com/tensorflow/tensorflow

First released Nov 2015

Apache 2.0 license

TensorFlow Goals

Establish common platform for expressing machine learning ideas and systems

Make this platform the best in the world for both research and production use

Open source it so that it becomes a platform for everyone, not just Google

TensorFlow Scaling

Near-linear performance gains with each additional 8x NVIDIA® Tesla® K80 server added to the cluster

TensorFlow supports many platforms

CPU, GPU, iOS, Android, Raspberry Pi, 1st-gen TPU, Cloud TPU

TensorFlow supports many languages

Java

[Figure: chart with year labels 2013, 2011, 2013, 2013, 2010, and late 2015; other labels lost in extraction]

ML is done in many places

TensorFlow GitHub stars by GitHub user profiles w/ public locations
Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow

TensorFlow: A Vibrant Open-Source Community

● Rapid development, many outside contributors
  ○ 475+ non-Google contributors to TensorFlow 1.0
  ○ 15,000+ commits in 15 months
  ○ Many community created tutorials, models, translations, and projects
    ■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title

● Direct engagement between community and TensorFlow team
  ○ 5000+ Stack Overflow questions answered
  ○ 80+ community-submitted GitHub issues responded to weekly

● Growing use in ML classes: Toronto, Berkeley, Stanford, ...

Google Photos

[glacier]

Reuse same model for completely different problems

Same basic model structure trained on different data, useful in completely different contexts

Example: given image → predict interesting pixels

www.google.com/sunroof

We have tons of vision problems

Image search, StreetView, Satellite Imagery, Translation, Robotics, Self-driving Cars, …

Computers can now see

Large implications for healthcare

MEDICAL IMAGING

Using similar model for detecting diabetic retinopathy in retinal images

Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91).

http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html

Computers can now see

Large implications for robotics

Combining Vision with Robotics

“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March, 2016

“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199

Self-Supervised and End-to-end Pose Estimation

TCN + Self-Supervision (No Labels!)

Scientific Applications of ML

Predicting Properties of Molecules

[Figure: Aspirin → Message Passing Neural Net → Toxic? Bind with a given protein? Quantum properties.]

● Chemical space is too big, so chemists often rely on virtual screening.
● Machine Learning can help search this large space.
● Molecules are graphs, nodes=atoms and edges=bonds (and other stuff)
● Message Passing Neural Nets unify and extend many neural net models that are invariant to graph symmetries
● State of the art results predicting output of expensive quantum chemistry calculations, but ~300,000 times faster

https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html and https://arxiv.org/abs/1702.05532 and https://arxiv.org/abs/1704.01212 (latter to appear in ICML 2017)

Measuring live cells with image to image regression

“Seeing More”

Enabling technology: Image to image regression

[Figure: Input | True Depth | Predicted Depth] Depth prediction on portrait data

[Figure: Input | Saturation | Defocus] Applications for camera effects

Predict cellular markers from transmission microscopy?

[Figure: Human cancer cells / DIC / nuclei (blue) and cell mask (green)]
[Figure: Human iPSC neurons / phase contrast / nuclei (blue), dendrites (green), and axons (red)]

Scaling language understanding models

Sequence-to-Sequence Model

[Sutskever & Vinyals & Le NIPS 2014]

[Figure: a Deep LSTM reads the input sequence A B C D into a vector v and emits the target sequence X Y Z Q, with the decoder fed __ X Y Z as its inputs]

Sequence-to-Sequence Model: Machine Translation

[Sutskever & Vinyals & Le NIPS 2014]

Input sentence: Quelle est votre taille?

Target sentence, built up one word per step with the words so far fed back as input:
How
How tall
How tall are
How tall are you?

Sequence-to-Sequence Model: Machine Translation

At inference time: Beam search to choose most probable over possible output sequences

[Sutskever & Vinyals & Le NIPS 2014]

Input sentence: Quelle est votre taille?
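The beam search step named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoder used in the work cited: `TOY_MODEL` is a hypothetical table of next-word probabilities standing in for a trained decoder, and the probabilities, the `EOS` marker, and the function names are all invented for the example.

```python
import math

EOS = "</s>"  # hypothetical end-of-sentence marker

# Hypothetical next-word probabilities keyed by the words emitted so far.
# A real system would query the decoder network here; these numbers are invented.
TOY_MODEL = {
    (): {"How": 0.6, "What": 0.4},
    ("How",): {"tall": 0.7, "big": 0.3},
    ("What",): {EOS: 1.0},
    ("How", "big"): {EOS: 1.0},
    ("How", "tall"): {"are": 1.0},
    ("How", "tall", "are"): {"you?": 1.0},
    ("How", "tall", "are", "you?"): {EOS: 1.0},
}


def beam_search(next_word_probs, beam_width=2, max_len=10):
    """Return (log_prob, words) for the most probable sequence found."""
    beams = [(0.0, ())]  # unfinished hypotheses: (log probability, words)
    finished = []        # hypotheses that have emitted EOS
    for _ in range(max_len):
        # Extend every unfinished hypothesis by every possible next word.
        candidates = [
            (score + math.log(p), words + (word,))
            for score, words in beams
            for word, p in next_word_probs(words).items()
        ]
        # Keep only the beam_width most probable extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:beam_width]:
            if words[-1] == EOS:
                finished.append((score, words[:-1]))
            else:
                beams.append((score, words))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[0])


best_score, best_words = beam_search(lambda words: TOY_MODEL[words])
print(" ".join(best_words), math.exp(best_score))
```

With a beam width of 2, the short hypothesis "What" finishes early but is outscored by the longer "How tall are you?", which is the kind of comparison over whole output sequences that beam search makes possible.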

Smart Reply

Google Research Blog - Nov 2015

[Figure: Incoming Email → Small Feed-Forward Neural Network ("Activate Smart Reply?" yes/no) → Deep … → Generated Replies]

April 1, 2009: April Fool’s Day joke

Nov 5, 2015: Launched Real Product

Feb 1, 2016: >10% of mobile Inbox replies

Sequence to Sequence model applied to Google Translate

https://arxiv.org/abs/1609.08144

Google Neural Machine Translation Model

[Figure: one model replica on one machine w/ 8 GPUs: 8 layers of Encoder LSTMs and Decoder LSTMs placed on Gpu1, Gpu2, Gpu3, ... Gpu8, joined by Attention and followed by a SoftMax; inputs X2, X3 and outputs Y1, Y2, Y3]

Model + Data Parallelism

[Figure: parameters distributed across many parameter server machines (Params, Params, Params, ...); many replicas]

Neural Machine Translation

[Figure: translation quality (0 to 6, where 6 = perfect translation) by translation model, comparing human, neural (GNMT), and phrase-based (PBMT) translation for English > Spanish, English > French, English > Chinese, Spanish > English, French > English, and Chinese > English]

Closes gap between old system and human-quality translation by 58% to 87%

Enables better communication across the world

research.googleblog.com/2016/09/a-neural-network-for-machine.html

BACKTRANSLATION FROM JAPANESE (en->ja->en)

Phrase-Based Machine Translation (old system):
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.

Google Neural Machine Translation (new system):
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” God ‘s house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.

Automated machine learning (“learning to learn”)

Current: Solution = ML expertise + data + computation

Can we turn this into: Solution = data + 100X computation

???

Early encouraging signs

Trying multiple different approaches:

(1) RL-based architecture search
(2) Model architecture evolution
(3) Learn how to optimize

Appeared in ICLR 2017

Idea: model-generating model trained via RL

(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as signal
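The three steps above can be sketched as a toy loop. This is only an illustration of the idea, assuming a simple REINFORCE update with a batch-mean baseline; it is not the controller from the cited paper. The search space, the `PENALTY` table, and `proxy_loss` (which replaces hours of training with an invented, instantly computable loss) are all hypothetical.

```python
import math
import random

# Hypothetical search space: one categorical decision per entry.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "filter_size": [1, 3, 5],
    "activation": ["relu", "tanh"],
}

# Invented per-choice penalties so that proxy_loss is cheap and deterministic.
PENALTY = {
    "num_layers": [0.3, 0.1, 0.0],
    "filter_size": [0.2, 0.0, 0.1],
    "activation": [0.0, 0.15],
}


def proxy_loss(arch):
    """Stand-in for 'train the model for a few hours and measure its loss'."""
    return 0.5 + sum(PENALTY[name][choice] for name, choice in arch.items())


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(steps=200, batch_size=10, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Controller parameters: one logit per option of each decision.
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    best_loss, best_arch = float("inf"), None
    for _ in range(steps):
        probs = {name: softmax(values) for name, values in logits.items()}
        # (1) Generate ten models by sampling each decision from the controller.
        batch = [
            {name: rng.choices(range(len(p)), weights=p)[0] for name, p in probs.items()}
            for _ in range(batch_size)
        ]
        # (2) "Train" them: here we only evaluate the invented proxy loss.
        losses = [proxy_loss(arch) for arch in batch]
        # (3) Use the loss as the signal: reward is the negative loss, with the
        # batch mean as a baseline (REINFORCE policy-gradient update).
        baseline = -sum(losses) / batch_size
        for arch, loss in zip(batch, losses):
            advantage = -loss - baseline
            for name, choice in arch.items():
                for i, p in enumerate(probs[name]):
                    indicator = 1.0 if i == choice else 0.0
                    logits[name][i] += learning_rate * advantage * (indicator - p) / batch_size
            if loss < best_loss:
                best_loss, best_arch = loss, arch
    return logits, best_loss, best_arch


final_logits, best_loss, best_arch = search()
decoded = {name: SEARCH_SPACE[name][choice] for name, choice in best_arch.items()}
print(decoded, best_loss)
```

The controller's logits drift toward choices that produced lower loss than the batch average, so later batches are sampled from a distribution concentrated on better architectures.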

arxiv.org/abs/1611.01578

CIFAR-10 Image Recognition Task

Penn Tree Bank Language Modeling Task

[Figure: “Normal” LSTM cell compared with the cell discovered by architecture search]

Learn2Learn: Learn the Optimization Update Rule

Neural Optimizer Search using Reinforcement Learning, Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017

More computational power needed

Deep learning is transforming how we design computers

Special computation properties

reduced precision ok:
about 1.2 × about 0.6 = about 0.7
NOT 1.21042 × 0.61127 = 0.73989343
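The arithmetic on this slide can be checked with a short sketch. The slide does not name a number format, so the use of IEEE 754 half precision (16-bit) here is an assumption chosen only because Python's standard `struct` module can round-trip it; `to_float16` is a helper name introduced for the example.

```python
import struct


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and convert back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


a, b = 1.21042, 0.61127

# Full (double) precision product, as on the "NOT" line of the slide.
exact = a * b

# Reduced precision: round both inputs and the result to 16 bits.
reduced = to_float16(to_float16(a) * to_float16(b))

relative_error = abs(reduced - exact) / exact
print(exact, reduced, relative_error)
```

Both products round to 0.7 at one decimal place, which is the sense in which "about 1.2 × about 0.6 = about 0.7" is good enough for this kind of workload.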

handful of specific operations [Figure: × =]

Tensor Processing Unit v2

Revealed in May at Google I/O

Google-designed device for neural net training and inference

● 180 teraflops of computation, 64 GB of memory
● Designed to be connected together

TPU Pod

64 2nd-gen TPUs
11.5 petaflops
4 terabytes of memory

Programmed via TensorFlow

Same program will run with only minor modifications on CPUs, GPUs, & TPUs

Will be Available through Google Cloud

Cloud TPU - virtual machine w/180 TFLOPS TPUv2 device attached

Making 1000 Cloud TPUs available for free to top researchers who are committed to open machine learning research

We’re excited to see what researchers will do with much more computation!

g.co/tpusignup

Machine Learning in Google Cloud

Custom ML models: TensorFlow, Machine Learning Engine

Pre-trained ML models: Vision API, Speech API, Jobs API, Natural Language API, Translation API, Video Intelligence API

Machine Learning for Higher Performance Machine Learning Models

Device Placement with Reinforcement Learning

Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node

Measured time per step gives RL reward signal

+19.3% faster vs. expert human for NMT model
+19.7% faster vs. expert human for InceptionV3
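The reward signal described on this slide (time per step for a given placement) can be illustrated on a toy graph. Everything below is hypothetical: the four-node graph, the costs, and the cost model in `simulated_step_time`, which stands in for timing a real training step. The sketch also swaps the learned RL placement policy for exhaustive enumeration, which is only feasible because the graph is tiny.

```python
from itertools import product

# Hypothetical computation graph: per-node compute cost and per-edge transfer cost.
NODE_COST = {"embed": 4.0, "encoder": 6.0, "decoder": 6.0, "softmax": 4.0}
EDGES = [
    ("embed", "encoder", 1.0),
    ("encoder", "decoder", 1.0),
    ("decoder", "softmax", 1.0),
]
NUM_DEVICES = 2


def simulated_step_time(placement):
    """Invented cost model: busiest device's compute plus cross-device transfers."""
    load = [0.0] * NUM_DEVICES
    for node, cost in NODE_COST.items():
        load[placement[node]] += cost
    transfer = sum(c for src, dst, c in EDGES if placement[src] != placement[dst])
    return max(load) + transfer


def reward(placement):
    # Faster steps earn a higher reward.
    return -simulated_step_time(placement)


nodes = list(NODE_COST)
# Enumerate every assignment of nodes to devices (stand-in for the RL policy).
placements = [
    dict(zip(nodes, devices))
    for devices in product(range(NUM_DEVICES), repeat=len(nodes))
]
best = max(placements, key=reward)
single_device = {node: 0 for node in nodes}
print(best, simulated_step_time(best), simulated_step_time(single_device))
```

For graphs with thousands of nodes the number of placements grows exponentially, which is why a trained policy that proposes placements and learns from the measured reward is attractive.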

Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou, Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972

Now

[Figure: accuracy versus scale (data size, model size) for neural networks and other approaches, with a "more compute" annotation]

Future

[Figure: accuracy versus scale (data size, model size) for neural networks and other approaches, with a "more compute" annotation]

Example queries of the future

Which of these eye images shows symptoms of diabetic retinopathy?

Describe this video in Spanish

Find me documents related to reinforcement learning for robotics and summarize them in German

Please fetch me a cup of tea from the kitchen

Conclusions

Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …

If you’re not considering how to use deep neural nets to solve your problems, you almost certainly should be

More info about our work: g.co/brain

Thanks!