Building Intelligent Systems with Large Scale Deep Learning
Jeff Dean, Google Brain team
g.co/brain
Presenting the work of many people at Google
Google Brain Team Mission: Make Machines Intelligent. Improve People’s Lives.
How do we do this?
● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)
  ○ Unsupervised learning of cats, Inception, word2vec, seq2seq, DeepDream, image captioning, neural translation, Magenta, ML for robotics control, healthcare, …
● Build and open-source systems like TensorFlow (see tensorflow.org and https://github.com/tensorflow/tensorflow)
● Collaborate with others at Google and Alphabet to get our work into the hands of billions of people (e.g., RankBrain for Google Search, GMail Smart Reply, Google Photos, Google speech recognition, Google Translate, Waymo, …)
● Train new researchers through internships and the Google Brain Residency program
Main Research Areas
● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation
research.googleblog.com/2017/01/the-google-brain-team-looking-back-on.html
1980s and 1990s
[Figure: accuracy plotted against scale (data size, model size) for neural networks and for other approaches; the build repeats for "1980s and 1990s" and "Now", adding "more compute"]
Growth of Deep Learning at Google
[Figure: directories containing model description files, "and many more . . ."]
Experiment Turnaround Time and Research Productivity
● Minutes, Hours:
  ○ Interactive research! Instant gratification!
● 1-4 days
  ○ Tolerable
  ○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks
  ○ High value experiments only
  ○ Progress stalls
● >1 month
  ○ Don’t even try
Build the right tools
Google Confidential + Proprietary (permission granted to share within NIST)
Open, standard software for general machine learning
Great for Deep Learning in particular
First released Nov 2015, Apache 2.0 license
http://tensorflow.org/ and https://github.com/tensorflow/tensorflow
TensorFlow Goals
Establish common platform for expressing machine learning ideas and systems
Make this platform the best in the world for both research and production use
Open source it so that it becomes a platform for everyone, not just Google
TensorFlow Scaling
Near-linear performance gains with each additional 8x NVIDIA® Tesla® K80 server added to the cluster
TensorFlow supports many platforms
CPU, GPU, iOS, Android, Raspberry Pi, 1st-gen TPU, Cloud TPU
TensorFlow supports many languages
Java
[Figure: timeline with entries labeled 2010, 2011, 2013 and late 2015]
ML is done in many places
[Figure: TensorFlow GitHub stars by GitHub user profiles w/ public locations]
Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow
TensorFlow: A Vibrant Open-Source Community
● Rapid development, many outside contributors
  ○ 475+ non-Google contributors to TensorFlow 1.0
  ○ 15,000+ commits in 15 months
  ○ Many community created tutorials, models, translations, and projects
    ■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title
● Direct engagement between community and TensorFlow team
  ○ 5000+ Stack Overflow questions answered
  ○ 80+ community-submitted GitHub issues responded to weekly
● Growing use in ML classes: Toronto, Berkeley, Stanford, ...
Google Photos
[glacier]
Reuse same model for completely different problems
Same basic model structure trained on different data, useful in completely different contexts
Example: given image → predict interesting pixels
www.google.com/sunroof
We have tons of vision problems
Image search, StreetView, Satellite Imagery, Translation, Robotics, Self-driving Cars, …
Computers can now see
Large implications for healthcare
MEDICAL IMAGING
Using similar model for detecting diabetic retinopathy in retinal images
Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91).
http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html
Computers can now see
Large implications for robotics
Combining Vision with Robotics
“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March, 2016
“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199
Self-Supervised and End-to-end Pose Estimation
TCN + Self-Supervision (No Labels!)
Scientific Applications of ML
Predicting Properties of Molecules
[Figure: Aspirin → Message Passing Neural Net → Toxic? Bind with a given protein? Quantum properties.]
● Chemical space is too big, so chemists often rely on virtual screening.
● Machine Learning can help search this large space.
● Molecules are graphs, nodes=atoms and edges=bonds (and other stuff)
● Message Passing Neural Nets unify and extend many neural net models that are invariant to graph symmetries
● State of the art results predicting output of expensive quantum chemistry calculations, but ~300,000 times faster
https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html and https://arxiv.org/abs/1702.05532 and https://arxiv.org/abs/1704.01212 (latter to appear in ICML 2017)
Measuring live cells with image to image regression
“Seeing More” Enabling technology: Image to image regression
[Figure: Input, True Depth, Predicted Depth]
Depth prediction on portrait data
Applications for camera effects
[Figure: Input, Saturation, Defocus]
Predict cellular markers from transmission microscopy?
Human cancer cells / DIC / nuclei (blue) and cell mask (green)
Human iPSC neurons / phase contrast / nuclei (blue), dendrites (green), and axons (red)
Scaling language understanding models
Sequence-to-Sequence Model
[Sutskever & Vinyals & Le NIPS 2014]
[Figure: a deep LSTM reads the input sequence "A B C D" and then produces the target sequence "X Y Z Q", with "X Y Z" also appearing on the input side]
Sequence-to-Sequence Model: Machine Translation
[Sutskever & Vinyals & Le NIPS 2014]
[Figure: the input sentence "Quelle est votre taille?" is turned into the target sentence one word per slide: "How", "How tall", "How tall are", "How tall are you?"]
At inference time: beam search to choose the most probable of the possible output sequences
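The beam search step mentioned for inference can be sketched in a few lines. This is a minimal illustration, not the decoder used in the cited work: the probability table `TOY_MODEL`, its tokens and numbers, and the helper names `next_token_probs` and `beam_search` are all made up for the example. A real system would take the next-token probabilities from the decoder's softmax.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
# The numbers are invented for illustration only.
TOY_MODEL = {
    "<s>": {"How": 0.6, "What": 0.4},
    "How": {"tall": 0.55, "big": 0.45},
    "What": {"height": 0.9, "size": 0.1},
    "tall": {"</s>": 1.0},
    "big": {"</s>": 1.0},
    "height": {"</s>": 1.0},
    "size": {"</s>": 1.0},
}


def next_token_probs(tokens):
    """Return P(next token | prefix); this toy only looks at the last token."""
    return TOY_MODEL[tokens[-1]]


def beam_search(step_fn, start="<s>", end="</s>", beam_width=2, max_len=10):
    """Keep the beam_width most probable partial outputs at every step."""
    beams = [(0.0, [start])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            # Hypotheses that emitted the end token stop growing.
            (finished if tokens[-1] == end else beams).append((score, tokens))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[0])


greedy_score, greedy_tokens = beam_search(next_token_probs, beam_width=1)
beam_score, beam_tokens = beam_search(next_token_probs, beam_width=2)
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(beam_tokens, round(math.exp(beam_score), 2))
```

In this toy table a beam of width 1 (greedy decoding) commits to the locally best first word and ends with a less probable sequence than a beam of width 2, which is the reason to search over several candidates at once.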
Smart Reply
Google Research Blog - Nov 2015
[Figure: an incoming email goes to a small feed-forward neural network that answers "Activate Smart Reply?" (yes/no); a deep recurrent neural network produces the generated replies]
Smart Reply
April 1, 2009: April Fool’s Day joke
Nov 5, 2015: Launched Real Product
Feb 1, 2016: >10% of mobile Inbox replies
Sequence to Sequence model applied to Google Translate
Google Neural Machine Translation Model
https://arxiv.org/abs/1609.08144
[Figure: encoder LSTMs and decoder LSTMs (8 layers) joined by an attention module and followed by a softmax, with the layers placed on Gpu1 through Gpu8. One model replica: one machine w/ 8 GPUs.]
Model + Data Parallelism
[Figure: many replicas; parameters distributed across many parameter server machines]
Neural Machine Translation
[Figure: translation quality on a scale from 0 to 6 (6 = perfect translation) by translation model, comparing human, neural (GNMT) and phrase-based (PBMT) translation for English > Spanish, English > French, English > Chinese, Spanish > English, French > English and Chinese > English]
Closes gap between old system and human-quality translation by 58% to 87%
Enables better communication across the world
research.googleblog.com/2016/09/a-neural-network-for-machine.html
BACKTRANSLATION FROM JAPANESE (en->ja->en)
Phrase-Based Machine Translation (old system): Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
Google Neural Machine Translation (new system): Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” God ‘s house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.
Automated machine learning (“learning to learn”)
Current: Solution = ML expertise + data + computation
Can we turn this into: Solution = data + 100X computation
???
Early encouraging signs
Trying multiple different approaches:
(1) RL-based architecture search
(2) Model architecture evolution
(3) Learn how to optimize
Appeared in ICLR 2017
Idea: model-generating model trained via RL
(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as reinforcement learning signal
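The three steps above can be illustrated with a toy loop. This is only a sketch under strong assumptions: the "controller" here is a plain table of logits rather than a model-generating neural network, the search space `LAYER_CHOICES` is invented, and `train_and_evaluate` is a hypothetical stand-in that returns a made-up loss instead of training anything. The update is a standard REINFORCE step with the batch mean as baseline.

```python
import math
import random

LAYER_CHOICES = [1, 2, 3, 4]  # hypothetical search space: number of layers


def train_and_evaluate(num_layers):
    """Hypothetical stand-in for training a generated model for a few hours.

    Returns a made-up validation loss that is lowest at 3 layers.
    """
    return 0.5 + 0.3 * (num_layers - 3) ** 2


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(rounds=400, models_per_round=10, learning_rate=1.0, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(LAYER_CHOICES)  # the whole "controller" in this toy
    for _ in range(rounds):
        probs = softmax(logits)
        # (1) Generate ten models by sampling from the controller.
        picks = rng.choices(range(len(LAYER_CHOICES)), weights=probs, k=models_per_round)
        # (2) "Train" them; (3) use the negated loss as the reward.
        rewards = [-train_and_evaluate(LAYER_CHOICES[i]) for i in picks]
        baseline = sum(rewards) / len(rewards)
        # REINFORCE: move logits along (reward - baseline) * grad log pi(pick).
        grads = [0.0] * len(logits)
        for pick, reward in zip(picks, rewards):
            for k in range(len(logits)):
                indicator = 1.0 if k == pick else 0.0
                grads[k] += (reward - baseline) * (indicator - probs[k])
        logits = [v + learning_rate * g / models_per_round for v, g in zip(logits, grads)]
    return softmax(logits)


final_probs = search()
best_layers = LAYER_CHOICES[final_probs.index(max(final_probs))]
print(best_layers, [round(p, 3) for p in final_probs])
```

Because the reward is the negated loss, choices that lead to lower loss are sampled more often over time, so the controller's probability mass drifts toward the best option in the toy search space.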
arxiv.org/abs/1611.01578
CIFAR-10 Image Recognition Task
Penn Tree Bank Language Modeling Task
[Figure: “Normal” LSTM cell compared with the cell discovered by architecture search]
Learn2Learn: Learn the Optimization Update Rule
Neural Optimizer Search using Reinforcement Learning, Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017
More computational power needed
Deep learning is transforming how we design computers
Special computation properties
reduced precision ok
about 1.2 × about 0.6 = about 0.7
NOT 1.21042 × 0.61127 = 0.73989343
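The reduced-precision point can be checked numerically with the slide's own numbers. The sketch below assumes IEEE half precision (16-bit floats) as the reduced format; the slide does not name a format, so that choice and the helper `to_float16` are illustrative only.

```python
import struct


def to_float16(x):
    """Round a Python float (64-bit) to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


a, b = 1.21042, 0.61127
exact = a * b  # 64-bit product
# Inputs and result each rounded to 16 bits (assumed reduced format).
reduced = to_float16(to_float16(a) * to_float16(b))
relative_error = abs(reduced - exact) / exact

print(f"64-bit:  {exact:.8f}")
print(f"16-bit:  {reduced:.8f}")
print(f"relative error: {relative_error:.2e}")
```

The 16-bit result differs from the 64-bit one only by a small relative error, which is the kind of loss the slide describes as acceptable for neural net computation.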
handful of specific operations
[Figure: × =]
Tensor Processing Unit v2
Revealed in May at Google I/O
Google-designed device for neural net training and inference
● 180 teraflops of computation, 64 GB of memory
● Designed to be connected together
TPU Pod
64 2nd-gen TPUs
11.5 petaflops
4 terabytes of memory
Programmed via TensorFlow
Same program will run with only minor modifications on CPUs, GPUs, & TPUs
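The TPU Pod figures quoted earlier follow from the per-device numbers, which a short calculation makes explicit. The constant names are made up for this example, and reading 1 TB as 1024 GB is an assumption.

```python
TFLOPS_PER_DEVICE = 180    # per 2nd-gen TPU device, from the slide
MEMORY_GB_PER_DEVICE = 64  # per device, from the slide
DEVICES_PER_POD = 64       # from the slide

# 1 petaflop = 1000 teraflops
pod_petaflops = DEVICES_PER_POD * TFLOPS_PER_DEVICE / 1000
# Assumption: 1 TB = 1024 GB
pod_memory_tb = DEVICES_PER_POD * MEMORY_GB_PER_DEVICE / 1024

print(pod_petaflops)  # 11.52, quoted on the slide as 11.5 petaflops
print(pod_memory_tb)  # 4.0, quoted on the slide as 4 terabytes
```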
Will be Available through Google Cloud
Cloud TPU - virtual machine w/180 TFLOPS TPUv2 device attached
Making 1000 Cloud TPUs available for free to top researchers who are committed to open machine learning research
We’re excited to see what researchers will do with much more computation!
g.co/tpusignup
Machine Learning in Google Cloud
Custom ML models: TensorFlow, Machine Learning Engine
Pre-trained ML models: Vision API, Speech API, Jobs API, Natural Language API, Translation API, Video Intelligence API
Machine Learning for Higher Performance Machine Learning Models
Device Placement with Reinforcement Learning
Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node
Measured time per step gives RL reward signal
+19.3% faster vs. expert human for NMT model
+19.7% faster vs. expert human for InceptionV3
Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou, Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972
Now
[Figure: accuracy plotted against scale (data size, model size) for neural networks and for other approaches, with more compute]
Future
[Figure: the same plot]
Example queries of the future
Which of these eye images shows symptoms of diabetic retinopathy?
Describe this video in Spanish
Find me documents related to reinforcement learning for robotics and summarize them in German
Please fetch me a cup of tea from the kitchen
Conclusions
Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …
If you’re not considering how to use deep neural nets to solve your problems, you almost certainly should be
More info about our work: g.co/brain
Thanks!