by Randy Barrett, illustration by Mike Perry

The new Janelia computing cluster puts a premium on expandability and speed.

Computational biologists have a need for speed. The computing cluster at HHMI’s Janelia Farm Research Campus delivers the performance they require, at a mind-boggling 36 trillion operations per second.

In the course of their work, Janelia researchers generate millions of digitized images and gigabytes of data files, and they run algorithms daily that demand robust computational horsepower. Geneticists, molecular biologists, biophysicists, physiologists, and even electrical engineers pursue some of the most challenging problems in neuroscience, chief among them how individual neuronal circuits process information. Their discoveries depend, now more than ever, on the seamless interplay of scientists and computers.

Humming nonstop in Janelia’s compact computing center are 4,000 processors, 500 servers, and storage machines holding half a petabyte of data, about 50 Libraries of Congress worth of information.

Though there are many larger clusters around the world, this particular one is just right for Janelia Farm. “Beautifully conceived, ruthlessly efficient, and extraordinarily well run by the high-performance computing team,” according to Janelia researcher Sean Eddy, the system is designed to make digital images available lightning fast while muscling through the monster calculations required to help investigators conduct genome searches and catalog the inner workings and structures of the brain.

Faster Answers

A group leader at Janelia Farm, Eddy deals in the realm of millions of computations daily as he compares sequences of DNA. He is a rare breed, both biologist and code jockey. “I’m asking biological questions, and designing technologies for other people to ask biological questions,” he says.

Eddy writes algorithms to help researchers extract information from DNA sequences. It’s a gargantuan matching game where a biological sequence (DNA, RNA, or protein) is treated as a string of letters and compared with other sequences. “From a computer science standpoint, it’s similar to voice recognition and data mining,” he says. “You’re comparing one piece against another. We look for a signal in what looks like random noise.”

Eddy looks for the hand of evolution in DNA by comparing different organisms’ genomes. He’s searching for strings of DNA sequences that match, more than random chance would dictate.

“It’s a lot like recognizing words from different languages that have a common ancestry, thus probably the same meaning,” he explains. “In two closely related languages, Italian and Spanish, for example, it’s pretty obvious to anyone which words are basically the same. That would be like two genes from humans and apes.” But in organisms that are more divergent, Eddy needs to understand how DNA sequences tend to change over time. “And it becomes a difficult specialty, with serious statistical analysis,” he says.

From a computational standpoint, that means churning through a lot of operations. Comparing two typical-sized protein sequences, to take a simple example, would require a whopping 10^200 operations. Classic algorithms, available since the 1960s, can trim that search to 160,000 computations, a task that would take only a millisecond or so on any modern processor. But in the genome business, people routinely do enormous numbers of these sequence comparisons, trillions and trillions of them. These “routine” calculations could take years if they had to be done on a single computer.

That’s where the Janelia cluster comes in. Because a different part of the workload can easily be doled out to each of its 4,000 processors, researchers can get their answers 4,000 times faster, in hours instead of years. The solutions don’t tend to lead to eureka moments; rather, they provide reference data for genome researchers as they delve into the complexities of different organisms. “These computational tools are infrastructural, a foundation for many things,” Eddy says.

While that may not sound dramatic, Eddy’s protein-matching algorithms are an industry standard, used by researchers as the search tool for a reference library called the Protein Families database, or Pfam. There are roughly 10 million proteins in the database. Luckily, about 80 percent of those sequences fall into a much smaller set of families, and Eddy has designed the analysis software to query for matches in this data set.
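The drop from 10^200 operations to 160,000 is the kind of saving that dynamic programming delivers: instead of enumerating every possible alignment, the computer fills in a grid with one cell per pair of positions. The sketch below is a minimal global-alignment scorer of the Needleman-Wunsch type, offered as an illustration rather than as the algorithm Eddy's software uses. The scoring values are arbitrary, and reading 160,000 as two sequences of 400 residues each (400 x 400) is an assumption made for this example; the article gives neither.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best global alignment score, number of grid cells filled).

    The match, mismatch, and gap values are illustrative assumptions,
    not parameters taken from the article or from any real tool.
    """
    cols = len(b) + 1
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(cols)]
    cells = 0
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
            cells += 1
        prev = curr
    return prev[-1], cells


if __name__ == "__main__":
    # Two hypothetical 400-residue sequences: the grid has 400 * 400 cells.
    score, cells = global_alignment_score("A" * 400, "A" * 400)
    print(score, cells)  # 400 160000
```

The work grows with the product of the two lengths rather than with the number of possible alignments, which is why a single comparison is cheap and the real cost comes from doing trillions of them.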

30 hhmi bulletin | May 2010

TOP: A mosaic of fly neurons. MIDDLE: Roian Egnor. BOTTOM: Sean Eddy, Elena Rivas, Lou Scheffer. Photos: Paul Fetters

“When a new sequence comes in, [Pfam] is like a dictionary; it’s always being added to,” he says. The database currently identifies 12,000 protein families.

There is also an RNA database for which Eddy and his Janelia team have software design and upkeep responsibilities. Eddy has to keep one step ahead of his users, which means stressing his analysis tools to the failure point so he can improve them. “We set up experiments and try to break the software and push the envelope,” he says.

The Janelia computing system is referred to as a “cluster” of processors by both its overseers and its users. The cluster serves 350 researchers and support staff and can scale up to serve many more if requirements demand it. Its design puts a premium on expandability, flexibility, and fast response, particularly since the scientific needs may change and evolve rapidly.

The computer cluster was recently upgraded as part of a regular four-year technology refresh. The system is made up of commercially available hardware components built by Intel, Dell, and Arista Networks; Eddy calls it a “working class supercomputer.” Janelia is the first customer for this particular design; in fact, some of the components have serial number 1 or 2 and are signed by the engineers who built them. The new system is up to 10 times faster than the old one and has six times more memory.

“Janelia’s new computing cluster provides a platform that is an order of magnitude more responsive than the previous system and can be easily grown to accommodate changing requirements,” says Vijay Samalam, Janelia’s director of information technology and scientific computing.

That expanded capacity is a big help to Janelia fellow Louis Scheffer, an electrical engineer and chip designer by training. He uses the cluster to help researchers map the brain wiring of the common fruit fly Drosophila melanogaster. Essentially, it’s a massive three-dimensional image-manipulation challenge. First, slices of brain 1/1,000th the thickness of a human hair are digitally photographed with an electron microscope and stored. In each layer, the computer assigns colors to the neurons so researchers can trace their path. As an example, the medulla of the fly, part of the brain responsible for vision, requires more than 150,000 individual images to create the full mosaic, which is 1,700 layers (slices) deep.

But all these pictures must be knitted together so scientists can follow neural paths and see where they lead. Think Google Earth. As you pan across the globe, data are fed onto the screen so you can “fly” from one location to another, and more images are required as you drill down to examine surface topography. Making the transitions smooth in between images requires fine-tuned alignment. “It’s not completely simple; there are a whole bunch of distortions to deal with,” says Scheffer. Some are caused by the electron microscope itself as it dries out the target specimen during imaging.

To align one image to its neighbor takes about one minute of computer time. But once matched, the resulting checkerboard must be stacked and aligned with the mosaic of images above and below. “You need to make about one million comparisons,” Scheffer says. “It would take [a personal] computer four years.” With Janelia’s parallel processors on the task, the job is done in a few hours.

But Scheffer’s matching is just the first step in the image-manipulation process. Janelia software engineer Philip Winston takes the processed pictures and does the unthinkable: he chops them up again. He creates smaller “tiles” of the photos, which can be more easily added and subtracted from a computer screen as a researcher pans across an image. “To open a single image would take five minutes if you didn’t tile them,” says Winston. Only 20 tiles are required on the screen at any one time. Currently, Winston is working with four million tiles as part of the Janelia Fly Electron Microscope project to map the entire brain of the fruit fly.

Humans proofread the final fly-brain image for accuracy, to trace the neural paths and make sure the computer has identified structures correctly. “[People] are an important step,” says Winston. “Without them, the computer segmentation would be 95 percent right and we wouldn’t know about the other 5 percent.”

Scheffer and Winston’s ultimate goal is to completely automate the mapping process and to teach the computer to identify the inner structures of the fruit fly brain, in particular the different types of neurons, and the axons and dendrites branching out from them. “To do the whole fly brain we have to improve the automated segmentation,” says Winston. Scheffer hopes to achieve the computer-generated, and accurate, mapping of the brain within the next five years as more pieces are imaged and processed.

Easier Communications

The increased speed of the new computing system will make that effort easier going forward. The cluster is faster than its predecessor for two reasons: it has more than four times as many individual processors, and it’s using a networking technology that speeds up the communication between processors. Those familiar with office networks know it well: Ethernet, the popular standard for moving electronic data from point A to B. While it has been a long-standing protocol for slower connections, 10-gigabit Ethernet has not traditionally been the choice of makers and engineers of top supercomputers, who until recently, when best performance was a must, used specialized networking technology called InfiniBand.

“Now Ethernet switches are as efficient as, or very close to, InfiniBand and you don’t need a different [networking] skill set,” says Spartaco Cicerchia, manager of network infrastructure at Janelia Farm. The bottom-line advantage is that Ethernet is easier to work with, familiar to more networking engineers, and tends to be cheaper to use.

Lower latency (the time it takes to move data across a network connection) is now possible via Ethernet due to a relatively new networking standard called iWarp. Traditionally, computers’ processors must manage the flow of information packets as they pass between them. In the new systems, those packets are handled by a separate piece of hardware made by the chip manufacturer Intel.

“Traditionally, [central processing units handle] network packets. However, when network interface speeds went from 1 to 10 gigabits per second, the load on CPUs increased by an order of magnitude,” says Goran Ceric, Janelia’s manager of scientific computing systems.

The creation of iWarp helped alleviate this issue and reduce latency and overhead. iWarp helps in three ways, according to Ceric: by processing network packets using specialized hardware instead of CPUs; by placing data directly into application buffers, thus eliminating intermediate packet copies; and by reducing a need for “context switching,”

Janelia’s IT team (Vijay Samalam, Spartaco Cicerchia, and Goran Ceric) recently updated its computing cluster, building it from commercially available hardware components by Intel, Dell, and Arista Networks, and using popular and now faster Ethernet networking.

Photo: Paul Fetters

in which a processor must pass commands back and forth between an application and an operating system. “For many parallel applications,” he says, “if you can lower communications time between processors in different systems over the network, the better your performance is.”

The new network infrastructure has dropped the communications lag inside the Janelia cluster from 60 to 10 microseconds, a sixfold improvement.

Mouse Talk

Janelia Farm fellow Roian Egnor isn’t a computer scientist or network engineer, but her research on the vocalizations of mice (and the neural pathways required) depends partly on heavy computation power. Though famed for their quiet ways, it turns out mice are chatterboxes. All their communications, unfortunately, happen at frequencies between 30 and 100 kilohertz, far above the range of human hearing.

“There’s a secret world up there,” says Egnor, who records hours of mouse talk daily. “If you take those vocalizations and computationally lower frequencies, they sound remarkably like bird songs.”

Egnor is trying to better understand the elements of these mouse whispers. “When you look at mouse vocalizations, there appears to be some acoustic structure to them. What is it for?” she says. To find out, she is collaborating with another Janelia fellow, Elena Rivas, who is starting to process the communications using a statistical analysis tool called a hidden Markov model. The software is similar to that which Sean Eddy uses to compare millions of DNA strands.

“The cool thing about hidden Markov models is, you can tell them ‘Look, here’s what I think are good examples of what I want you to characterize. Learn them, and then I’m going to give you unlabeled vocalizations and I want you to see which match and which don’t,’” Egnor says.

Rivas has reworked a standard protein analysis program called HMMER3 to handle Egnor’s data, which comes in one-terabyte chunks. “The core of the programs is identical,” Rivas says. “We’re going to try to determine the types of [mouse] vocalizations and try to model each one.”

Once those models, or “families,” are delineated, researchers can then test new mouse vocalizations against these templates. “Then we’ll try to catalog everything the mouse says,” Rivas explains. The analysis takes the computing cluster only a few moments to run.

“The beautiful thing about Janelia is that I stream that [information] to the data share, and Elena picks it up and starts working on it,” Egnor says.

Making that transfer possible is another hidden attribute of the Janelia research complex: its internal network. It’s the pipeline that carries huge image or auditory files without clogging or slowing down the system. In the startup phase, that meant overbuilding the fiber infrastructure as much as possible and designing it to handle unpredictable loads through 10-gigabit ports.

(continued on page 48)
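Egnor's description of hidden Markov models maps onto a simple recipe: build one model per call type, then score each unlabeled recording under every model and keep the best match. The sketch below shows only the scoring half, using the standard forward algorithm on two tiny hand-set models. The model names, states, probabilities, and the "low"/"high" symbols are all hypothetical, made up for illustration. This code is not HMMER3 or Rivas's adaptation of it, and a real analysis would learn the parameters from labeled examples rather than typing them in.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-probability of a non-empty `obs` under a discrete HMM.

    Uses the forward algorithm with per-step rescaling to avoid underflow.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # this model cannot produce the sequence
        log_like += math.log(scale)
        alpha = [x / scale for x in alpha]
    return log_like


# Hypothetical two-state, left-to-right models of two made-up call types.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "rising": dict(
        start=[1.0, 0.0],
        trans=LEFT_TO_RIGHT,
        emit=[{"low": 0.8, "high": 0.2}, {"low": 0.2, "high": 0.8}],
    ),
    "falling": dict(
        start=[1.0, 0.0],
        trans=LEFT_TO_RIGHT,
        emit=[{"low": 0.2, "high": 0.8}, {"low": 0.8, "high": 0.2}],
    ),
}


def classify(obs):
    """Return the name of the model that gives `obs` the highest likelihood."""
    return max(MODELS, key=lambda name: forward_log_likelihood(obs, **MODELS[name]))


if __name__ == "__main__":
    print(classify(["low", "low", "high", "high"]))  # rising
    print(classify(["high", "high", "low"]))         # falling
```

Comparing log-likelihoods across models is the "see which match and which don't" step; a practical system would also need a threshold so that a recording resembling none of the families is not forced into one.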

continued from page 15 (light moves)

Cell Sculpting

Lim, at UCSF, is applying optogenetic methods to illuminate the localized, protein–protein interactions that underlie everything from turning genes on and off, to making cells more or less sensitive to stimuli, to cytoskeletal remodeling that alters a cell’s shape or influences its movements.

Phytochrome B is a light-sensitive receptor in the mustard plant Arabidopsis thaliana that Lim is developing as a versatile molecular tool. In its normal role, the phytochrome enables the plants to respond to shade. When bathed in red light, for example, the phytochrome undergoes a shape change that leads to the alteration of gene expression in ways that cause the plant to grow toward sunnier patches of space.

In one audacious show of experimental control, Lim and his colleagues combine the phytochrome with an enzymatic component into modules so they can use light to trigger the polymerization of actin protein molecules in a cell. This results in localized changes in the cell’s cytoskeletal framework, which determines the cell’s shape. Using precision optics, the researchers can induce localized shape changes with enough finesse that Lim refers to the process as cell sculpting.

Lim can imagine using light to orchestrate new organizations of cells, perhaps even for making neuron-based logic components for biological computers or to help reconstruct damaged nerve tissue.

To demonstrate the utility of the approach in fine cell sculpting, Lim’s group used a digital micromirror array device to project a minuscule “game of life” movie onto mammalian cells containing the phytochrome module. Each movie frame displays a pattern of dark and light boxes. The pattern evolves in a systematic way from frame to frame: dark boxes become light and vice versa, according to simple mathematical rules. By projecting these changing patterns of light and dark boxes (pixels) onto a cell, the researchers induced the cell surface to embody the same morphing patterns.

In a paper published September 13, 2009, Lim and several UCSF colleagues at the Cell Propulsion Lab say they should be able to link the phytochrome light switch to many other cell signaling pathways that involve the recruitment of protein players. Lim refers to the system as a “universal remote control” for experimentally dictating when and where in a cell to activate a pathway of interest. He can also imagine expanding the toolkit (see Web Extra, “Beyond Light,” www.hhmi.org/bulletin/may2010).

“We are learning how to dissect biological systems the way electronics engineers dissect circuits,” Lim says. Elegant, precise interventions in neural circuitry, the kind that optogenetics researchers are exploring, stand a chance of eventually taking the place of blunt instruments like surgery, electrodes, and the present generation of pharmaceuticals.

WEB EXTRA: To learn about specific studies using optogenetics and how researchers are looking beyond light to manipulate neurons, go to www.hhmi.org/bulletin/may2010.

WEB EXTRA: Listen to Karl Deisseroth talk about the clinical potential of optogenetics. www.hhmi.org/bulletin/may2010

continued from page 33 (the silicon marvel)

Janelia’s network is fully meshed and runs at “line rate,” meaning that the 40-gigabit/second data-center backbone is available to every user at all times, rather than being designed to serve only a small percentage of researchers as they need it while the rest ponder their research or go to lunch.

The computing cluster communicates with the rest of the campus through 450 miles of fiber optic cable, operating at 1 gigabit/second to users’ desktops.

The updated cluster also runs at an impressive 84 percent efficiency, based on the global standard, called the Linpack Benchmark, traditionally used to measure performance and rank top supercomputers. Right now, the Janelia system would rank roughly in the top 200 of existing computing clusters, says Ceric. Janelia plans to enter its cluster in the next edition of the Linpack ranking system this summer.

The installation’s increased efficiency is also better for the planet, since it gobbles less electricity. The old cluster ran at 25 million operations per second per watt. Now it can produce 200 million operations per second on the same amount of power. And it throws off less heat that ultimately must be air conditioned away. “It takes less power and we produce fewer BTUs,” says Cicerchia.

As Janelia researchers go about their day thinking up novel ways to explore neural networks, few contemplate the silicon marvel that quietly makes much of their work possible. But ask any of them to consider their research without the cluster and you quickly enter the realm of the unthinkable.

“In a single day at Janelia we can do something that would take 11 years on a single-processor workstation,” says Eddy. “We breathe CPU cycles like air.”
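Two of the figures in this closing section can be checked with a few lines of arithmetic. The sketch below computes the power-efficiency gain from the quoted operations-per-watt numbers and tests Eddy's "11 years in a day" remark against the 4,000 processors cited earlier in the article. It assumes the workload splits perfectly across all processors, which is an idealization, and the function names are invented for this example.

```python
DAYS_PER_YEAR = 365.25


def efficiency_gain(old_ops_per_watt, new_ops_per_watt):
    """How many times more operations per second the new cluster gets per watt."""
    return new_ops_per_watt / old_ops_per_watt


def single_processor_years(processors, cluster_days=1.0):
    """Years one processor would need to match `cluster_days` of cluster time.

    Assumes perfect parallel speedup (an idealization, not a measured figure).
    """
    return processors * cluster_days / DAYS_PER_YEAR


if __name__ == "__main__":
    print(efficiency_gain(25e6, 200e6))               # 8.0
    print(round(single_processor_years(4000), 1))     # 11.0
```

Under that idealized assumption, one day on 4,000 processors equals roughly 11 processor-years, which lines up with Eddy's quote, and the move from 25 million to 200 million operations per second per watt is an eightfold gain.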

