
Deep Learning and Humanity’s Grand Challenges

Jeff Dean, Google Brain team, g.co/brain
Presenting the work of many people at Google

What Are Machine Learning and AI?

Artificial Intelligence: the science of making things smart.
Machine Learning: making machines that learn to be smart.

Deep learning is causing a machine learning revolution.
(Chart: ML arXiv papers per year.)

Deep Learning: Modern Reincarnation of Artificial Neural Networks
A collection of simple trainable mathematical units, organized in layers, that work together to solve complicated tasks.

What’s New: new network architectures, new training math, scale.
Key Benefit: learns features from raw, heterogeneous, noisy data; no explicit feature engineering required.
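The "simple trainable mathematical units, organized in layers" can be made concrete with a tiny network that learns a function from examples. This sketch is not from the talk; every size, seed, and learning rate is an invented illustrative choice. It fits XOR, a function no single linear unit can represent:

```python
import numpy as np

# A minimal two-layer network trained by gradient descent to fit XOR.
# All hyperparameters here are illustrative choices, not from the talk.
rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # layer 1: 2 inputs -> 8 units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # layer 2: 8 units -> 1 output

for _ in range(8000):
    h = np.tanh(X @ W1 + b1)                     # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # output probability
    dz = (p - y) / len(X)                        # cross-entropy gradient
    dh = (dz @ W2.T) * (1.0 - h ** 2)            # backprop through tanh
    W2 -= h.T @ dz;  b2 -= dz.sum(axis=0)
    W1 -= X.T @ dh;  b1 -= dh.sum(axis=0)

predictions = (p > 0.5).astype(int).ravel()
print(predictions)
```

The same training loop, scaled up in layers and data, is what learns the pixel-to-label and audio-to-text functions on the following slides.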

Functions a Deep Neural Network Can Learn (input → output):
● Pixels → “cat” (ConvNets)
● Pixels → “lion”
● Audio → “How cold is it outside?”
● “Hello, how are you?” → “Bonjour, comment allez-vous?”
● Pixels → “A blue and yellow train travelling down the tracks”

But why now?

(Chart: accuracy vs. scale (data size, model size), for neural networks and other approaches. In the 1980s and 1990s, neural networks lagged other approaches at the scale then available; now, with more compute and data, neural networks pull ahead and keep improving.)

(Chart: image recognition error rates: 26% errors in 2011, 3% errors in 2016; humans: about 5% errors.)

2008: NAE Grand Engineering Challenges for 21st Century
● Make solar energy affordable
● Engineer better medicines
● Provide energy from fusion
● Reverse-engineer the brain
● Develop carbon sequestration methods
● Prevent nuclear terror
● Manage the nitrogen cycle
● Secure cyberspace
● Provide access to clean water
● Enhance virtual reality
● Restore & improve urban infrastructure
● Advance personalized learning
● Advance health informatics
● Engineer the tools for scientific discovery

www.engineeringchallenges.org/challenges.aspx

I would personally add two others:
● Enable universal communication
● Build flexible general purpose AI systems

Restore & improve urban infrastructure: https://waymo.com/tech/

Advance health informatics

Hemorrhages

(Retinal images: healthy vs. diseased.)

Diabetic retinopathy grades 1 to 5: No DR, Mild DR, Moderate DR, Severe DR, Proliferative DR.
F-score: algorithm 0.95; ophthalmologist (median) 0.91.

“The study by Gulshan and colleagues truly represents the brave new world in medicine.”

Dr. Andrew Beam and Dr. Isaac Kohane, Harvard Medical School

“Google just published this paper in JAMA (impact factor 37) [...] It actually lives up to the hype.”
Dr. Luke Oakden-Rayner, University of Adelaide

Pathology and Radiology

“The network performed similarly to senior orthopedic surgeons when presented with images at the same resolution as the network.”
www.tandfonline.com/doi/full/10.1080/17453674.2017.1344459

Tumor localization score (FROC): model 0.89, pathologist 0.73
arxiv.org/abs/1703.02442

Predictive tasks for healthcare
Given a patient’s electronic medical record data, can we predict the future?

Deep learning methods for sequential prediction are becoming extremely good

e.g. recent improvements in Google Translate.

Sequence-to-Sequence Model: Machine Translation

Input sentence: “Quelle est votre taille?” → Target sentence: “What size are you?”
[Sutskever, Vinyals & Le, NIPS 2014]

Neural Machine Translation
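The encoder-decoder idea can be sketched with a toy, untrained RNN: the encoder folds the input tokens into a fixed vector, and the decoder emits output tokens one at a time from that vector. All sizes and weights below are illustrative; a real system learns them from parallel text:

```python
import numpy as np

# Toy sequence-to-sequence sketch (hypothetical, untrained weights).
rng = np.random.default_rng(0)
VOCAB, HIDDEN = 32, 16

E = rng.normal(size=(VOCAB, HIDDEN))            # token embeddings
W_enc = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1  # encoder recurrence
W_dec = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1  # decoder recurrence
W_out = rng.normal(size=(HIDDEN, VOCAB)) * 0.1   # hidden -> vocab logits

def encode(token_ids):
    """Fold the whole input sequence into one hidden state vector."""
    h = np.zeros(HIDDEN)
    for t in token_ids:
        h = np.tanh(h @ W_enc + E[t])
    return h

def decode(h, max_len=5):
    """Greedily emit output tokens, feeding each one back in."""
    out, tok = [], 0  # token 0 acts as the start symbol
    for _ in range(max_len):
        h = np.tanh(h @ W_dec + E[tok])
        tok = int(np.argmax(h @ W_out))
        out.append(tok)
    return out

# e.g. token ids standing in for "Quelle est votre taille ?"
translation = decode(encode([3, 7, 7, 1]))
print(translation)  # 5 token ids, each in [0, VOCAB)
```

With trained weights, the same loop maps a French sentence to an English one; the only change at scale is larger matrices and learned parameters.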

(Chart: translation quality from 0 to 6, where 6 is a perfect translation, comparing phrase-based (PBMT), neural (GNMT), and human translations for English > Spanish, English > French, English > Chinese and the reverse directions.)

The neural system closes the gap between the old system and human-quality translation by 58% to 87%, enabling better communication across the world.
research.googleblog.com/2016/09/a-neural-network-for-machine.html

Back-translation from Japanese (en > ja > en)

Phrase-Based Machine Translation (old system):
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.

Google Neural Machine Translation (new system):
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai”, God’s house, in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.

Predictive tasks for healthcare
Given a large corpus of training data of de-identified medical records, can we predict interesting aspects of the future for a patient not in the training set?

● will patient be readmitted to hospital in next N days?
● what is the likely length of hospital stay for patient checking in?
● what are the most likely diagnoses for the patient right now? and why?
● what medications should a doctor consider prescribing?
● what tests should be considered for this patient?
● which patients are at highest risk for X in next month?
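The first question, readmission risk, can be framed as a toy risk score. This is a plain logistic-model stand-in for the deep sequential models described, and every feature name, weight, and the bias below is invented for illustration (the actual work is unpublished):

```python
import math

# Toy framing of readmission prediction as a logistic risk score.
# All feature names, weights, and the bias are invented stand-ins.
def readmission_risk(features, weights, bias=-2.0):
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))   # probability in (0, 1)

weights = {"num_prior_admissions": 0.6,
           "length_of_stay_days": 0.1,
           "num_active_diagnoses": 0.3}
patient = {"num_prior_admissions": 3,
           "length_of_stay_days": 5,
           "num_active_diagnoses": 2}

risk = readmission_risk(patient, weights)
print(round(risk, 2))  # 0.71
```

A deep sequential model replaces the hand-picked features with the full ordered record of visits, diagnoses, and medications, learning its own representation of the patient's history.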

Collaborating with several healthcare organizations, including UCSF, Stanford, and Univ. of Chicago. Early promising results (no public paper yet).

Engineer better medicines

... and maybe:
● Make solar energy affordable
● Develop carbon sequestration methods
● Manage the nitrogen cycle

Predicting Properties of Molecules
A DFT (density functional theory) simulator predicts: Toxic? Bind with a given protein? Quantum properties (E, ω₀, ...). Can a neural network learn to predict the simulator’s output directly?
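The idea of replacing an expensive simulator with a fast learned predictor can be illustrated with a toy stand-in. The "simulator" below is an invented cheap function, not real DFT, and the "surrogate" is a lookup table with linear interpolation rather than a neural net; the structure of the trade (precompute/train once, then answer queries fast) is the point:

```python
import math

# Toy surrogate-model sketch: approximate an "expensive" simulator
# with a cheap precomputed model. Invented stand-ins, not real DFT.
def expensive_simulator(x):
    """Stand-in for a costly quantum chemistry calculation."""
    return math.sin(x) + 0.1 * x

# "Training": evaluate the simulator once on a grid of inputs.
grid = [i * 0.1 for i in range(0, 101)]
table = [expensive_simulator(x) for x in grid]

def surrogate(x):
    """Fast approximation: linear interpolation between grid points."""
    i = min(int(x / 0.1), len(grid) - 2)
    frac = (x - grid[i]) / 0.1
    return table[i] * (1 - frac) + table[i + 1] * frac

x = 3.14159
error = abs(surrogate(x) - expensive_simulator(x))
print(error < 1e-3)  # True: close to the simulator, at a fraction of the cost
```

The neural-net version generalizes this: instead of a grid over one input, it learns from simulator outputs over molecular graphs, which is what makes the ~300,000× speedup on the next slide possible.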

● State-of-the-art results predicting the output of expensive quantum chemistry calculations, but ~300,000 times faster

https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html, https://arxiv.org/abs/1702.05532, and https://arxiv.org/abs/1704.01212 (the latter appeared in ICML 2017)

Engineer the tools for scientific discovery

TensorFlow: open, standard software for general machine learning; great for deep learning in particular.
http://tensorflow.org/
First released Nov 2015, Apache 2.0 license: https://github.com/tensorflow/tensorflow

TensorFlow Goals

Establish common platform for expressing machine learning ideas and systems

Open source it so that it becomes a platform for everyone, not just Google

Make this platform excellent for both research and production use
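A core idea behind such a platform is expressing computations as graphs of simple operations that support automatic differentiation. Here is a toy, TensorFlow-free sketch of that idea in plain Python; the class and function names are invented for illustration:

```python
# Toy sketch of the computation-graph + reverse-mode autodiff idea
# that platforms like TensorFlow are built on (names invented).
class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value, self.parents, self.grad_fns = value, parents, grad_fns
        self.grad = 0.0
    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (lambda g: g * other.value, lambda g: g * self.value))
    def __add__(self, other):
        return Node(self.value + other.value, (self, other),
                    (lambda g: g, lambda g: g))

def backward(node, grad=1.0):
    """Reverse-mode autodiff: push gradients back through the graph."""
    node.grad += grad
    for parent, grad_fn in zip(node.parents, node.grad_fns):
        backward(parent, grad_fn(grad))

x, w, b = Node(2.0), Node(3.0), Node(1.0)
y = w * x + b          # builds the graph for y = w*x + b = 7
backward(y)            # computes dy/dw, dy/dx, dy/db
print(y.value, w.grad)  # 7.0 2.0  (dy/dw = x = 2)
```

Real systems add tensors, many more operations, and execution on CPUs, GPUs, and TPUs, but the graph-plus-gradients structure is the same.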

TensorFlow: A Vibrant Open-Source Community
● Rapid development, many outside contributors
  ○ ~1,000+ non-Google contributors to TensorFlow
  ○ 24,000+ commits in 24 months
  ○ Many community-created tutorials, models, translations, and projects
    ■ ~23,000 GitHub repositories with ‘TensorFlow’ in the title
  ○ Many millions of downloads
● Direct engagement between community and TensorFlow team
  ○ 5,000+ Stack Overflow questions answered
  ○ 80+ community-submitted GitHub issues responded to weekly
● Growing use in ML classes: Toronto, Berkeley, Stanford, ...

More computational power needed

Deep learning is transforming how we design computers

Google Confidential + Proprietary (permission granted to share within NIST)

Special computation properties

Reduced precision is OK:
  about 1.2 × about 0.6 ≈ about 0.7 is fine; you do not need 1.21042 × 0.61127 = 0.73989343.
A handful of specific operations dominate.

Tensor Processing Unit v1
Google-designed chip for neural net inference
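The reduced-precision point can be illustrated by quantizing the example numbers above to 8-bit integers; this is a hypothetical int8 scheme with an arbitrary scale, not any particular TPU's format:

```python
import numpy as np

# Hypothetical int8 quantization: round floats to 8-bit integers,
# multiply in integers, then rescale. Inference tolerates the error.
def quantize(x, scale):
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

scale = 0.01
a, b = 1.21042, 0.61127
qa, qb = quantize(np.array(a), scale), quantize(np.array(b), scale)

# widen to Python ints before multiplying so int8 cannot overflow
approx = int(qa) * int(qb) * scale * scale
exact = a * b
print(round(exact, 5), round(approx, 5))  # 0.73989 vs 0.7381
```

The answer is off by about 0.002, which is immaterial to a neural net's output, and integer multipliers are far cheaper in silicon than floating-point ones.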

In production use for >30 months: used for search queries, neural machine translation, the AlphaGo match, ...

In-Datacenter Performance Analysis of a Tensor Processing Unit, Jouppi, Young, Patil, Patterson, et al., ISCA 2017, arxiv.org/abs/1704.04760

TPUv1 is a huge help for inference

But what about training?

Speeding up training is hugely important: for researcher productivity, and for increasing the scale of problems that can be tackled.

Tensor Processing Unit v2
Google-designed device for neural net training and inference
● 180 teraflops of computation, 64 GB of memory
● Designed to be connected together

TPU Pod: 64 2nd-gen TPUs, 11.5 petaflops, 4 terabytes of memory.
For comparison, the #10 supercomputer in the world has an Rpeak of 11 petaflops (at higher precision, though).

Programmed via TensorFlow
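The pod figures follow directly from the per-device specs; a quick check (assuming 64 GB per device and 1 TB = 1024 GB):

```python
# Sanity-checking the TPU Pod numbers quoted above
# (assumes 64 GB of memory per device and 1 TB = 1024 GB).
tflops_per_device = 180
gb_per_device = 64
devices = 64

pod_petaflops = tflops_per_device * devices / 1000   # 11.52 petaflops
pod_memory_tb = gb_per_device * devices / 1024       # 4.0 terabytes
print(pod_petaflops, pod_memory_tb)  # 11.52 4.0
```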

Same program will run with only minor modifications on CPUs, GPUs, & TPUs

Will be available through Google Cloud

Cloud TPU: a virtual machine with a 180-TFLOPS TPU device attached

Some TPU Success Stories

ResNet-50 to >76% accuracy:
● 1,402 minutes (23 hours 22 minutes) on a single TPUv2 device
● 45 minutes on a half pod (32 TPUv2 devices): a 31.2× speedup

ResNet-50 to 75% accuracy (same code, no special tricks):
● 22 minutes on a full pod (64 TPUv2 devices)

Speedup When Using Multiple TPU Devices: more than just ImageNet.
Transformer model from “Attention is All You Need” (A. Vaswani et al., NIPS 2017), WMT’14 English-German translation task, Adam optimizer with the same learning rate schedule across configurations.

  batch size (i/o tokens)  # TPUs  time to PPL=4.8
  16k / 16k                1       17.9 hours
  32k / 32k                4       3.5 hours
  256k / 256k              16      1.1 hours
  1M / 1M                  64      0.5 hours

Making 1,000 Cloud TPUs available for free to top researchers who are committed to open machine learning research.
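The ResNet-50 speedup quoted above can be checked from the raw minutes:

```python
# Reproducing the ResNet-50 speedup and scaling efficiency quoted above.
single_device_minutes = 1402          # 1 TPUv2 device, to >76% accuracy
half_pod_minutes = 45                 # 32 TPUv2 devices

speedup = single_device_minutes / half_pod_minutes
scaling_efficiency = speedup / 32     # fraction of perfect linear scaling
print(round(speedup, 1), round(scaling_efficiency, 2))  # 31.2 0.97
```

97% scaling efficiency at 32 devices is the headline: the interconnect, not just the chips, makes the pod useful.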

We’re excited to see what researchers will do with much more computation! g.co/tpusignup

Machine Learning for Finding Planets

www.nasa.gov/press-release/artificial-intelligence-nasa-data-used-to-discover-eighth-planet-circling-distant-star
Blog: www.blog.google/topics/machine-learning/hunting-planets-machine-learning/
Paper: [Shallue & Vanderburg], www.cfa.harvard.edu/~avanderb/kepler90i.pdf

Automated machine learning (“learning to learn”)

Current: Solution = ML expertise + data + computation

Can we turn this into: Solution = data + 100× computation?

Neural Architecture Search
Idea: a model-generating model, trained via reinforcement learning
(1) Generate ten models
(2) Train them for a few hours
(3) Use the loss of the generated models as the reinforcement learning signal
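The loop above can be sketched with a toy stand-in. This uses random search rather than the actual RL-trained controller, and `train_and_score` is a hypothetical proxy for hours of training; the generate/evaluate/select structure is what carries over:

```python
import random

# Toy architecture-search loop (random search stand-in for the RL
# controller; search space and scoring are invented for illustration).
random.seed(0)
SEARCH_SPACE = {"layers": [2, 4, 8], "width": [32, 64, 128], "lr": [0.1, 0.01]}

def sample_architecture():
    """Stand-in for the controller proposing a candidate model."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def train_and_score(arch):
    """Invented proxy score; a real system trains the candidate for a
    few hours and uses validation accuracy as the reward signal."""
    return arch["layers"] * 0.1 + arch["width"] * 0.001 - arch["lr"]

candidates = [sample_architecture() for _ in range(10)]  # (1) generate ten models
scores = [train_and_score(a) for a in candidates]        # (2) "train" them
best = candidates[scores.index(max(scores))]             # (3) reward drives search
print(best)
```

In the actual system the sampler is itself a neural network, updated with the reward so that better-scoring architectures become more likely to be proposed.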

Neural Architecture Search with Reinforcement Learning, Zoph & Le, ICLR 2017, arxiv.org/abs/1611.01578

CIFAR-10 Image Recognition Task

AutoML outperforms handcrafted models
(Chart: accuracy (precision@1) vs. computational cost. AutoML-discovered architectures beat handcrafted models such as Inception-ResNet-v2, which represents years of effort by top ML researchers in the world.)
Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. 2017, https://arxiv.org/abs/1707.07012

Early encouraging signs that we can build flexible systems that can solve new problems automatically...

Deep neural networks and machine learning are producing significant breakthroughs that are solving some of the world’s grand challenges.

Many open areas of research:

● Better theoretical understanding of neural nets
● Better non-convex optimization techniques
● Unsupervised learning, multi-task learning
● Learning-to-learn, ...

g.co/brain and tensorflow.org

Thank you!