NEWS FEATURE

THE LEARNING MACHINES

Using massive amounts of data to recognize photos and speech, deep-learning computers are taking a big step towards true artificial intelligence.

BY NICOLA JONES

146 | NATURE | VOL 505 | 9 JANUARY 2014 © 2014 Macmillan Publishers Limited. All rights reserved

Three years ago, researchers at the secretive Google X lab in Mountain View, California, extracted some 10 million still images from YouTube videos and fed them into Google Brain — a network of 1,000 computers programmed to soak up the world much as a human toddler does. After three days looking for recurring patterns, Google Brain decided, all on its own, that there were certain repeating categories it could identify: human faces, human bodies and … cats (ref. 1).

Google Brain’s discovery that the Internet is full of cat videos provoked a flurry of jokes from journalists. But it was also a landmark in the resurgence of deep learning: a three-decade-old technique in which massive amounts of data and processing power help computers to crack messy problems that humans solve almost intuitively, from recognizing faces to understanding language.

Deep learning itself is a revival of an even older idea for computing: neural networks. These systems, loosely inspired by the densely interconnected neurons of the brain, mimic human learning by changing the strength of simulated neural connections on the basis of experience. Google Brain, with about 1 million simulated neurons and 1 billion simulated connections, was ten times larger than any deep neural network before it. Project founder Andrew Ng, now director of the Artificial Intelligence Laboratory at Stanford University in California, has gone on to make deep-learning systems ten times larger again.

Such advances make for exciting times in artificial intelligence (AI) — the often-frustrating attempt to get computers to think like humans. In the past few years, companies such as Google, Apple and IBM have been aggressively snapping up start-up companies and researchers with deep-learning expertise. For everyday consumers, the results include software better able to sort through photos, understand spoken commands and translate text from foreign languages. For scientists and industry, deep-learning computers can search for potential drug candidates, map real neural networks in the brain or predict the functions of proteins.

“AI has gone from failure to failure, with bits of progress. This could be another leapfrog,” says Yann LeCun, director of the Center for Data Science at New York University and a deep-learning pioneer.

“Over the next few years we’ll see a feeding frenzy. Lots of people will jump on the deep-learning bandwagon,” agrees Jitendra Malik, who studies computer image recognition at the University of California, Berkeley. But in the long term, deep learning may not win the day; some researchers are pursuing other techniques that show promise. “I’m agnostic,” says Malik. “Over time people will decide what works best in different domains.”

INSPIRED BY THE BRAIN

Back in the 1950s, when computers were new, the first generation of AI researchers eagerly predicted that fully fledged AI was right around the corner. But that optimism faded as researchers began to grasp the vast complexity of real-world knowledge — particularly when it came to perceptual problems such as what makes a face a human face, rather than a mask or a monkey face. Hundreds of researchers and graduate students spent decades hand-coding rules about all the different features that computers needed to identify objects. “Coming up with features is difficult, time consuming and requires expert knowledge,” says Ng. “You have to ask if there’s a better way.”

In the 1980s, one better way seemed to be deep learning in neural networks. These systems promised to learn their own rules from scratch, and offered the pleasing symmetry of using brain-inspired mechanics to achieve brain-like function. The strategy called for simulated neurons to be organized into several layers. Give such a system a picture and the first layer of learning will simply notice all the dark and light pixels. The next layer might realize that some of these pixels form edges; the next might distinguish between horizontal and vertical lines. Eventually, a layer might recognize eyes, and might realize that two eyes are usually present in a human face (see ‘Facial recognition’).

NATURE.COM: Learn about another approach to brain-like computers: go.nature.com/fktnso

The first deep-learning programs did not perform any better than simpler systems, says Malik. Plus, they were tricky to work with. “Neural nets were always a delicate art to manage. There is some black magic involved,” he says. The networks needed a rich stream of examples to learn from — like a baby gathering information about the world. In the 1980s and 1990s, there was not much digital information available, and it took too long for computers to crunch through what did exist. Applications were rare. One of the few was a technique — developed by LeCun — that is now used by banks to read handwritten cheques.

By the 2000s, however, advocates such as LeCun and his former supervisor, computer scientist Geoffrey Hinton of the University of Toronto in Canada, were convinced that increases in computing power and an explosion of digital data meant that it was time for a renewed push. “We wanted to show the world that these deep neural networks were really useful and could really help,” says George Dahl, a current student of Hinton’s.

As a start, Hinton, Dahl and several others tackled the difficult but commercially important task of speech recognition. In 2009, the researchers reported (ref. 2) that after training on a classic data set — three hours of taped and transcribed speech — their deep-learning neural network had broken the record for accuracy in turning the spoken word into typed text, a record that had not shifted much in a decade with the standard, rules-based approach. The achievement caught the attention of major players in the smartphone market, says Dahl, who took the technique to Microsoft during an internship. “In a couple of years they all switched to deep learning.” For example, the iPhone’s voice-activated digital assistant, Siri, relies on deep learning.

GIANT LEAP

When Google adopted deep-learning-based speech recognition in its Android smartphone operating system, it achieved a 25% reduction in word errors. “That’s the kind of drop you expect to take ten years to achieve,” says Hinton — a reflection of just how difficult it has been to make progress in this area. “That’s like ten breakthroughs all together.”

Meanwhile, Ng had convinced Google to let him use its data and computers on what became Google Brain. The project’s ability to spot cats was a compelling (but not, on its own, commercially viable) demonstration of unsupervised learning — the most difficult learning task, because the input comes without any explanatory information such as names, titles or categories. But Ng soon became troubled that few researchers outside Google had the tools to work on deep learning. “After many of my talks,” he says, “depressed graduate students would come up to me and say: ‘I don’t have 1,000 computers lying around, can I even research this?’”

So back at Stanford, Ng started developing bigger, cheaper deep-learning networks using graphics processing units (GPUs) — the super-fast chips developed for home-computer gaming (ref. 3). Others were doing the same. “For about US$100,000 in hardware, we can build an 11-billion-connection network, with 64 GPUs,” says Ng.

VICTORIOUS MACHINE

But winning over computer-vision scientists would take more: they wanted to see gains on standardized tests. Malik remembers that Hinton asked him: “You’re a sceptic. What would convince you?” Malik replied that a victory in the internationally renowned ImageNet competition might do the trick.

In that competition, teams train computer programs on a data set of about 1 million images that have each been manually labelled with a category. After training, the programs are tested by getting them to suggest labels for similar images that they have never seen before. They are given five guesses for each test image; if the right answer is not one of those five, the test counts as an error. Past winners had typically erred about 25% of the time. In 2012, Hinton’s lab entered the first ever competitor to use deep learning. It had an error rate of just 15% (ref. 4).

“Deep learning stomped on everything else,” says LeCun, who was not part of that team. The win landed Hinton a part-time job at Google, and the company used the program to update its Google+ photo-search software in May 2013.

Malik was won over. “In science you have to be swayed by empirical evidence, and this was clear evidence,” he says. Since then, he has adapted the technique to beat the record in another visual-recognition competition (ref. 5). Many others have followed: in 2013, all entrants to the ImageNet competition used deep learning.

With triumphs in hand for image and speech recognition, there is now increasing interest in applying deep learning to natural-language understanding — comprehending human discourse well enough to rephrase or answer questions, for example — and to translation from one language to another. Again, these are currently done using hand-coded rules and statistical analysis of known text. The state-of-the-art of such techniques can be seen in software such as Google Translate, which can produce results that are comprehensible (if