Cambridge Healthtech Media Group www.bio-itworld.com

Bonus Edition: Data Management and the Cloud®

Contents
• Inaugural Gathering of Lab IT Forum Wins Big Pharma Interest
• BitSpeed Pushes Software Solutions for High-Speed Data Transfer
• Cycle Computing CTO James Cuff on Clouds, On-Demand Computing and Package Holidays
• Courtagen Leverages Level 3 to Provide Direct Access to Amazon Cloud
• NetApp Eyes Opportunities in Health Care Data Storage

Editorial Director: Allison Proffitt, (617) 233-8280, [email protected]
Account Manager, Media: Jay Mulhern, (781) 972-1359, [email protected]
Lead Generation Account Manager, Companies A-K: Katelin Fitzgerald, (781) 972-5458, [email protected]
Lead Generation Account Manager, Companies L-Z: Tim McLucas, (781) 972-1342, [email protected]
Corporate Marketing Communications Director: Lisa Scimemi, (781) 972-5446, [email protected]
Marketing Assistant: Lisa Hecht, (781) 972-1351, [email protected]
Contributing Editors: Deborah Janssen, John Russell, Ann Neuer
Cambridge Healthtech Institute President: Phillips Kuhl

Contact Information: [email protected], 250 First Avenue, Suite 300, Needham, MA 02494

Follow us on Twitter, LinkedIn, Facebook, Google Plus, YouTube and Xing. This index is provided as an additional service. The publisher does not assume any liability for errors or omissions.

Subscriptions: Address inquiries to Bio•IT World, 250 First Avenue, Suite 300, Needham, MA 02494, 888-999-6288, or e-mail [email protected].

Reprints: Copyright © 2013 by Bio•IT World. All rights reserved. Reproduction of material printed in Bio•IT World is forbidden without written permission. For reprints and/or copyright permission, please contact Jay Mulhern, (781) 972-1359, [email protected].

Inaugural Gathering of Lab IT Forum Wins Big Pharma Interest

By Kevin Davies | March 6, 2013

The chief architects of a fledgling coalition of IT firms, consultancies and biopharma representatives declared their first meeting last week a promising success.

The two-day gathering—at AstraZeneca's research center in Waltham—was organized by Mike Santimaw (head of specialist computing at AstraZeneca), Kevin Granfield (director of R&D IT support services at Biogen Idec), Jay Paghdal (head of regional service delivery at Novartis Institute of Biomedical Research), and Merck's Alec Anuka, with support from Tom Arneman (president of Ceiba Solutions, a Boston-based IT managed services, products and information analytics provider).

In the absence of a catchier name, the group is calling itself the Lab IT Forum. Other pharma companies represented in the group of some 25 representatives included Pfizer, Johnson & Johnson, Sanofi, and Alkermes.

In addition to Ceiba Solutions, the IT community was represented by executives from Dell, Intel, Thermo Scientific and Microsoft. Representatives from Harvard Medical School and Cognizant were also in attendance.

The vision of the group is to build a "peer-to-peer, pre-competitive network," said Santimaw. "We do a good job with our customers, but we do what they want, not what they need." The goal, he said, was to learn and deploy best practices to help R&D colleagues "do more of what they do best: science, quality, and manufacturing."

"We're all competitors," he says, "but it doesn't mean we can't share best practices… the vision is to make that easy… It's about helping customers reach strategic goals through delivery of 'value-add' IT services… It's about doing the right things right, not once but all the time."

"This forum has great potential," adds Granfield. "IT professionals and key vendors are collaborating to enhance the scientists' experience in the lab today—and we have the right people in the room to drive innovation for the lab of the future."

"This is very focused on helping scientists do more science at an operational level," says Arneman. "It's about moving data, improving the quality of service, virus detection, backup, etc. It's about the operational layer within the lab."

In that regard, the Lab IT Forum differs from the Pistoia Alliance, which focuses more on informatics, and the Allotrope Foundation, which deals with instrument standards. Meanwhile, Microsoft spun off the BioIT Alliance, founded by Don Rule in 2006, as a translational medicine standards organization almost three years ago.

New Alliance

"Over the past two years, we've all been suffering from common woes—scientists needing better service, managing applications, security, and so on," Arneman recalls. After several informal discussions over the past two years about forging collaborations between industry researchers and IT groups, Santimaw finally pushed him: "When are you going to connect us?!" he asked.

"Following Mike's lead, I facilitated connections between pharma and owners of IT support. Mike and others took over from there," says Arneman. "It was out of their passion for the end users that this [meeting] came about."

The first gathering of the Lab IT Forum was in Arneman's view an experiment, but one that worked better than he expected. Opening day was about sharing concerns. Several breakout groups were convened on subjects such as lab IT support, client management, validation approaches, and lab/manufacturing security.

Organizers listed a host of areas ripe for improved information sharing, including:

• Operating systems and software upgrade management
• Data security and antivirus protection
• Operational support
• Instrument/equipment management, scheduling and utilization
• Information collection and sharing for decision making and predictive analytics
• Packaging/manufacturing best practices
• Lab design—layout, enabling devices, software

Various participants spoke of the need for more seamless, efficient cooperation between IT support staff and R&D users. One academic manager said her colleagues were "very demanding" and needed solutions fast. "Scientists just want it to work," she said.

Another academic IT support professional said that his group is just starting to seriously examine instrument life cycle and asset management. "Our goal is getting scientists to do what they need to do. Our support stops when the instrument connects to the computer," he said. He described an instance when a new instrument sat in its box unused for a month. "We could have got this running much faster if we'd known about it," he said.

Pharma IT services staff also shared their perspectives. One discussed his headaches following a merger in sharing data between four global research sites. Another raised the issue of staffing models across global sites and the need for better forecasting and tracking systems, as well as more proactive data analytics.

The consensus highlight of the first day was a first-hand perspective on data processing. Liping Zhou, a scientist from NIBR, "provided an elegant, compelling description of why she needs support," says Arneman, highlighting three major areas of frustration: difficulty in obtaining the information she wants, processing it, and communicating the information produced.

On the second day, the group discussed the concept of a "lab of the future," covering issues such as mobility, data security and the ideal laboratory layout.

The meeting also included presentations from some of the industrial strategic partners. For example, a Dell representative discussed investments in new mobile devices and WiGig (the Wireless Gigabit Alliance). Another interesting development is Intel's acquisition of McAfee and the notion of integrating security measures at the level of the microprocessor.

"It was important that this group understands how companies like Dell, Microsoft, and Intel go into life sciences. They have a dedicated practice on life science mobility and how that can be supported," says Arneman. For example, the deployment of Intel tablet devices in the lab has saved Merck about $1 million per year by improving data management within a compliant environment, says Arneman.

A Thermo Scientific executive discussed resources for remote instrumentation management to improve productivity. Unity Lab Services, a division of Thermo, allows scientists to focus on science by providing a menu of lab support services from instrument management to data collection.

Ceiba's goal is to help R&D teams innovate and better utilize their information assets. The firm started by offering services, but now offers "end-to-end responsibility for IT requirements from the scientists' perspective, including the network, PCs, software, processes, systems upgrades, etc."

Arneman says Ceiba prides itself on reducing resolution times from weeks to about a day. "The trouble with PC software is that it can take 15-20 days to close [a technical issue]. We're the technical experts to solve it or the concierge to get it solved. The result is scientists get their day back."

Ceiba also offers implementation and/or support for open-source or third-party applications. For example, the company won a contract from Merck to support more than 60 Rosetta Biosoftware customers. Ceiba continues to partner with Microsoft (which acquired the Rosetta assets) to enhance those product sets, Arneman says.

Following a deal with GSK, Ceiba is also the distributor for Helium, a cross-source data reporting tool that won a 2011 Bio-IT World Best Practices award. A community edition will be available shortly.

Next Steps

One of the future objectives of the Lab IT Forum is to create a training program and certification process for help desk personnel to deliver differentiated lab support. Arneman envisions that several white papers will be published in the coming months before the group meets again in six months' time. The Lab IT Forum welcomes new members—membership is open.

Arneman emphasizes that it is early days and the group currently lacks structure. "It's driven by the passion of individuals to better support science," he says. "I don't want to lose that. For good governance, we'll work with any other groups in the space. We must balance passion with process."

"This group needs a bit of advocacy within their own [organizations], between scientists and vendors," he says. "More than one organization would like to see security embedded in an instrument. That's where they can use these things internally and educate their own organization. It's about letting them know: 'Here's why this is hard, and why we can do it better.'"

BitSpeed Pushes Software Solutions for High-Speed Data Transfer

By Kevin Davies | February 7, 2013

Imagine a piece of software that could contemporaneously write the same data file in Los Angeles while it is actually streaming off a next-generation sequencing (NGS) instrument in New York, essentially reducing the time to transport said data from hours or minutes to virtually zero.

It sounds a little far-fetched, but that is the promised performance of Concurrency, the latest software product from Los Angeles-based BitSpeed. Currently being tested in a research lab at the University of Southern California (USC), BitSpeed executives believe it will warrant a close look by many life science organizations struggling to manage big data or balking at the cost or ease of use of existing commercial or open-source solutions for data transport.

Concurrency updates BitSpeed's Velocity software, which expedites the transfer of large data files. Although based on a different protocol, BitSpeed hopes to offer a compelling alternative to Aspera, which over the past few years has become the dominant commercial provider of data transport protocols, gaining strong traction within the life sciences community.

Moving Data

BitSpeed was founded in 2008 by Davis and Allan Ignatin, who previously founded Tape Laboratories, a developer of back-up technologies and virtual tape libraries. The company developed a close relationship with Hewlett Packard (its back-up technology still exists as part of HP's widely used NonStop series) until the company was sold in 2006. Later, Ignatin reconnected with Davis, a former CEO of Tape Laboratories, and hatched the idea of BitSpeed. "We noticed problems in transferring data outside buildings," says Davis. "But what did we know? We were just storage guys—we thought latency was just a necessary evil."

Initially BitSpeed focused on local area network (LAN) optimizations, but the founders soon recognized a much bigger opportunity. Launched in 2010, Velocity gained a foothold in the video and entertainment sector as well as other verticals. Some health care centers such as the Mayo Clinic also signed on, but the medical space wasn't the initial focus.

Velocity is a peer-to-peer software package that does three things, says Davis: "Accelerate. Ensure. Secure." It's about enhancing the speed, integrity, and security of the data, he says. The product works on LANs as well as within organizations and between storage nodes. "No other solution does that," says Davis.

The software installs within a few minutes, says Davis, and configures automatically. Because of a modular architecture, it is embeddable in other solutions. There are two licensing models—either point-to-point or multitenant. "You can put a big license in the cloud deployment or data center, and all clients are free of charge. It's a compelling model," says Davis.

The BioTeam consultant Chris Dwan, who is currently working with the New York Genome Center, says the bandwidth problem addressed by companies like Aspera and BitSpeed, new tools such as EMC Isilon's SyncIQ, and GridFTP from Globus Online is critical. "There are a lot of underutilized 1 Gb/sec connections out there in the world," says Dwan.

"Aspera's done a good job," BitSpeed co-founder Doug Davis conceded in an interview with Bio-IT World, before laying out why he thinks his software is superior in cost effectiveness, ease of configuration, features and performance.

Protocol Preferences

As reported in a Bio-IT World cover story in 2010, Aspera's patented fasp data transfer protocol makes use of UDP (user datagram protocol), which was originally developed more than 30 years ago, as a means to provide a fast way of accelerating data movement.

In Davis' opinion, however, UDP is like "throwing mud on a wall, then picking up what falls off with a shovel, and repeating the process until all the mud is on the wall." A transmission might be reported as successful even as packets of data are still being sent to complete the transmission, he says.

BitSpeed, by contrast, is based on TCP (transmission control protocol). "We're the only company with accelerated TCP," says Davis. "We can perform better, provide more security and order than UDP-based solutions."

TCP is an ordered protocol, which Davis argues is important for data integrity. "We grab the data, mixed up in some cases, and lay the data down at the destination in the same sequence. This is important – if the data are jumbled, you might need a third software package to re-order the data."

UDP and TCP have their respective advocates, of course, but as Cycle Computing CEO Jason Stowe points out, both also have tradeoffs and there is only so much that can be deduced by evaluating algorithms theoretically. "TCP inherently gets better reliability and is used by many protocols, including HTTP, at the cost of lower throughput and overhead," says Stowe. He also points out that "noisy networks aren't friendly to UDP either."

But the only true test, says Stowe, is a benchmark with real-world data between real endpoints, ideally also including open protocols such as FDT and Tsunami.

Another potential advantage of TCP is that it is being studied extensively by standards organizations. While modest improvements are being made to UDP, according to Ignatin, "TCP has thousands of organizations working on it, all of which we can take advantage of. They're distributed and converted to each operating system fairly transparently. So when a new congestion-control algorithm [is released], we get it automatically. We've added MD5 checksums on every block, so all data received are 100 percent intact." Stowe notes, however, that checksums are "commonly used to verify that transfers occurred without error in many systems."

As the name suggests, one of the potential virtues of Velocity is speed of data transfer, which emerges from a complex multi-level buffering scheme that "gulps data off storage and puts it back on," says Ignatin. "We take the connection between two points, and use all the available bandwidth. Most connections use 30-40 percent efficiency, so we get more bang for the buck. We take a single TCP connection, break it into multiple parallel streams, then re-assemble [the data] on the other end."
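The multi-stream approach Ignatin describes can be approximated with standard sockets. The sketch below is a minimal illustration, not BitSpeed's implementation: it splits a file into fixed-size blocks, pushes them over several TCP connections in parallel, and frames each block so the receiver can reassemble the file in order and verify an MD5 checksum per block. The hostname, port, block size and stream count are arbitrary assumptions.

```python
import hashlib
import socket
import struct
import threading

BLOCK = 4 * 1024 * 1024          # 4 MB blocks (arbitrary choice)
STREAMS = 8                      # number of parallel TCP connections (assumption)
DEST = ("receiver.example.org", 9000)   # hypothetical endpoint

def send_blocks(path: str, stream_id: int) -> None:
    """Send every STREAMS-th block of `path` over one TCP connection.

    Each block is framed as: block index, payload length, MD5 digest, payload.
    The receiver writes each payload at offset index * BLOCK and checks the
    digest, so the file is reassembled in order regardless of which stream
    delivered which block.
    """
    with socket.create_connection(DEST) as sock, open(path, "rb") as fh:
        index = stream_id
        while True:
            fh.seek(index * BLOCK)
            payload = fh.read(BLOCK)
            if not payload:
                break
            digest = hashlib.md5(payload).digest()
            header = struct.pack("!QI16s", index, len(payload), digest)
            sock.sendall(header + payload)
            index += STREAMS

def transfer(path: str) -> None:
    # One thread per TCP stream; blocks are interleaved across the streams.
    threads = [threading.Thread(target=send_blocks, args=(path, i))
               for i in range(STREAMS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    transfer("run_001.fastq.gz")   # example file name
```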

"The bigger the bandwidth, the bigger the acceleration," says Davis. In one benchmarking test, he says, Velocity took 1 minute 43 seconds to move 10 gigabytes (GB) of data to four sites in New York, Tokyo, Rome and Sydney—regardless of distance.

Thus far, the only significant deployment within a life sciences organization is in the lab of neuroscientist James Knowles at the USC Keck Medical Center. (The introduction was made by Ignatin's wife, who is a USC faculty member.) At the time of Velocity's installation, the Knowles lab had three Illumina sequencers sending data to a Windows server and a Solaris server, writing at about 4 MB/sec. The Solaris server transfers data to the HPC computing center six miles away.

In the capable hands of system administrator Andrew Clark, Velocity has expedited the transport of about 1 terabyte of NGS data daily to the HPC computing center. What formerly crawled along at 5-7 megabytes (MB)/second was upgraded to nearly 80 MB/sec without configuration, and 112 MB/sec with configuration. Typical transport times of 20 hours were slashed to less than two.

When Clark's team added compression, he found no benefit at first—until it became apparent that the storage I/O of the disk array in the HPC center wasn't fast enough. "This is a pretty common result for the software," says Davis. "Marketing geniuses that we are, it never occurred to us that we could do this."

Following the installation of a faster disk array, transfer speeds doubled to nearly 235 MB/sec. Clark said Velocity "has proved absolutely invaluable to speeding up our data transfers."

Active Replication

As promising as Velocity looks, BitSpeed has particularly high hopes for its latest software, Concurrency—a patent-pending technology that does active file replication. The product was unveiled in May 2012 at the National Association of Broadcasters convention.

Explains Ignatin: "Concurrency senses the beginning of a file and writes it in multiple locations at the same time. As data are created at source, it's being created at the destination. The destination, in turn, can be transferring it simultaneously to another location. It's called 'chain multi-casting' and saves a lot of time."

"We've made it virtually automatic," Ignatin continues. "We watch those folders for creation of files that match a specific description—in name, or suffix, time, whatever. There is no limit to the number of watch folders we can handle. It's not like a Dropbox. None of the SysAdmins at server B, C, or D have to do anything."

With Concurrency, Davis says, data written to the local servers are also written to a center miles away. "When the sequencers have finished, it's already there." In theory, hours of transport time are reduced to essentially zero. At USC, Clark has been experimenting with Concurrency, but he told Bio-IT World that the product was still being evaluated and he had no further comment.
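The watch-folder idea Ignatin describes maps onto a simple polling loop. The following sketch is an illustration, not Concurrency's code: it scans a directory for files whose names match a pattern and hands each new file to a replication callback. True active replication would begin streaming while a file is still being written, which this simplified version does not attempt; the directory, pattern and polling interval are invented.

```python
import fnmatch
import os
import time

WATCH_DIR = "/data/sequencers/runs"   # hypothetical watch folder
PATTERN = "*.fastq.gz"                # match on name/suffix, as Ignatin describes
POLL_SECONDS = 5

def replicate(path: str) -> None:
    # Placeholder: a real deployment would stream the file to the remote
    # site(s) as it grows; here we only record that replication would start.
    print(f"would replicate {path}")

def watch() -> None:
    seen = set()
    while True:
        for name in os.listdir(WATCH_DIR):
            if name in seen or not fnmatch.fnmatch(name, PATTERN):
                continue
            seen.add(name)
            replicate(os.path.join(WATCH_DIR, name))
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    watch()
```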

BitSpeed has also developed faster algorithms for data compression and encryption. The compression algorithms run as the data are in flight, which in principle provides further performance advantages. A pair of encryption algorithms optimizes security, including a proprietary algorithm called ASC (Advanced Symmetric Cipher). "It's a robust algorithm… with very little CPU usage," says Ignatin.

The ability to have data encrypted during flight should prove attractive for patient and other data requiring HIPAA compatibility and other forms of compliance. "How do [users] get the big data to/from the cloud? How do they ensure it is secure?" asks Davis. It may expand use of the cloud, as a cloud provider's security is of little use if the data aren't secured en route, he says.

Davis says that BitSpeed's software is attractively priced and interested parties can register online for a 15-day free trial.

But while rival protocols duke it out in the marketplace, Dwan from The BioTeam says they are still missing the bigger issue, namely "the question of making the data scientifically useful and usable. None of these tools address that question at all."

Cycle Computing CTO James Cuff on Clouds, On-Demand Computing and Package Holidays

By Kevin Davies | February 6, 2013

The new Chief Technology Officer at Cycle Computing, James Cuff, spent the past seven years as Director of Research Computing and Chief Technology Architect for Harvard University's Faculty of Arts and Sciences. His team worked "at the interface of science and advanced computing technologies," providing a breadth of high-performance computing, storage and software expertise, all the while striving to manage a monstrous surge in data. Cuff previously led the construction of the Ensembl project at the Wellcome Trust Sanger Institute, before moving to the U.S., where he managed production systems at the Broad Institute, while his wife, fellow Brit Michelle Clamp, joined the lab of Broad director Eric Lander.

In his new position, Cuff aims to apply some of his insights and ideas to an even bigger canvas. Cycle has made headlines over the past 2-3 years by spinning up virtual supercomputers for academic and industry clients, as well as creating the Big Science Challenge, donating more than $10,000 in cloud compute time. CEO Jason Stowe says Cuff brings a wealth of knowledge and contacts, and could bring some managerial discipline to Cycle's patent portfolio. He adds that Cuff will remain in the Boston/Cambridge area, which could impact Cycle's local presence down the road. (Meanwhile Clamp, who moved to Harvard from the BioTeam last year, will fill Cuff's shoes on an interim basis while the search for his replacement continues.)

Cuff spoke to Bio-IT World editor Kevin Davies and shared his views about big data, cloud computing, and the future of research computing.

Bio-IT World: James, before we talk about your new gig, what were your chief responsibilities during your tenure at Harvard?

Cuff: It started in the life sciences, due to the complexity of the data, but rapidly expanded to include earth and planetary sciences, particle physics, astrophysics, even economics and financial modeling. Simulation is an exploding field in all domains, so we had to be agile enough to help with all fields. We learned about throughput and performance in the life sciences, and were able to apply that to other areas.

What is your visceral reaction to the phrase "big data"? Did you encounter that in all areas?

It's everywhere you look at this point. From an historical perspective, when I started at Harvard in 2006, we had 200 CPUs and a state-of-the-art, 30-Terabyte (TB) local NAS [network-attached storage] array. As I'm leaving, we're at 25,000 processors and 10 Petabytes (PB). And that's just a small, university-wide research computing offering.

In comparison, the breakdown of those data is exploding in all areas, even places like the museums of comparative zoology. People are taking high-quality, high-resolution images, particularly of things like the Giza Archives; there are forces at play where our artifacts may unfortunately only be the digital records of some of these areas. Everyone is collecting "big data," but this collection phase is a prelude to a second phase—namely, once collected, trying to work out what we ought to do with it so history informs the future. "Big data" is a very hyped term, but it's real.

The bigger question I think is one of data provenance: not only creating the data but being able to find it. The data retention policies of the National Science Foundation and others—it's a headache… We've seen this in the ENCODE Project, where the team even encapsulated their virtual machine along with their data. We're going to see more of this—to be able to have that frozen report of the science done, as it was.

Many people think the data storage problem per se has been solved. Is that fair?

I'm inclined to agree. The component parts of storage and processing are very much solved problems—from a cent/capability measure. The amount of Terabytes I can buy per given spend or CPU horsepower I can buy is now trivial. The complexity is that we're doing this at much larger orders of scale and magnitude.

The difficulty in the spending for smart, motivated researchers and organizations is around how to orchestrate those events. If you look at a top-tier storage array, the price per Terabyte isn't where the complexity is; it's orchestrating Petabytes of this into a single namespace, or being able to make it act at a performance level. I can build a 4-TB storage array that will perform very differently than a 4-TB disk drive.

Are you looking at data management solutions such as open-source iRODS or commercial equivalents?

To keep religion out of the conversation here, the art of finding and annotating metadata at a massive scale is currently unsolved. One of the technology challenges I see ahead is how to accelerate those to the point where the metadata analytics is of sufficient caliber to match the robust parallel file systems that Lustre, WhamCloud, and now Intel can build. These are also non-trivial and not necessarily a solved problem either.

From the point of view of being able to find your data, or more importantly, what happened to it—how did it get to that state?—that is a bigger issue. We're starting to see that, in order to publish, you have to have publishable provenance. What did my grad student do to these data and how reproducible are these scientific results? That's going to be a big headache going forward.
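Cuff's point about publishable provenance can be made concrete with a small record-keeping habit. The sketch below is an illustration, not a Cycle or Harvard tool: it captures checksums, a timestamp, the host and tool versions for one analysis step, which is roughly the minimum needed to answer "what did my grad student do to these data?" The command and tool version shown are hypothetical.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

def md5sum(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def provenance_record(inputs, outputs, command, tools):
    """Build a JSON-serializable provenance entry for one analysis step."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
        "command": command,
        "tools": tools,                                # e.g. {"bwa": "0.7.17"}
        "inputs": {p: md5sum(p) for p in inputs},
        "outputs": {p: md5sum(p) for p in outputs},
    }

if __name__ == "__main__":
    record = provenance_record(
        inputs=["sample.fastq"], outputs=["sample.bam"],
        command="bwa mem ref.fa sample.fastq > sample.sam",  # hypothetical step
        tools={"bwa": "0.7.17"},
    )
    print(json.dumps(record, indent=2))
```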
I want to bring to bear many years’ I love Dag dearly, but I’m not inclined to They were seven really exciting years. experience at being an interface between necessarily agree with him. We go where We basically built a start-up organization a brilliant faculty at Harvard and turn their the customers are. Today, the bulk of our

[8] customers are within AWS [Amazon Web have difficult portability issues, security “watch this space,” I’ve been dealing with Services]. To discount any player in this issues, compliance issues. There’s a set the last mile challenge for a while—how space is a dangerous game. As long as we of things we want to do to help new to get people’s computing off the desk- keep following our customers and the sci- customers get that work done. In the top. That’s what we’ve been doing in a ence, I think everyone will be successful. financial services areas, there are a lot of university setting for a long time and As to the crystal ball—who wins that race? ‘just-in-time’ computing challenges of I want to apply some of those lessons I don’t want to bet on that one..! the order of the size of Sequoia or Titan learned in anger here, with an amazing or the National Center for Computational engineering team who can actually turn I think of researchers like folks picking Sciences. Those big clients will always some of my dreams into reality. holiday destinations. They go where the be available on a national level. There’s weather’s warm, right? Even if the cost to no way a university should be building a How critical are technologies that facili- get down to Florida is a bit high, it’s bet- 20- or 30-Megawatt [machine] in a local tate the transport of big data and how ter than staying in Massachusetts in the computing facility to solve their comput- do you interact with them? winter. ing challenges. The Aspera technology is amazing and those protocols work incredibly well at Where do you see growth opportunities What new technologies will most impact the national centers—if you’re Ewan in your offerings for the life sciences? your services in the near future? Birney or the head of the NCBI and you To build credibility that you can leverage Not to pick any particular technology, but can license those technologies centrally,

and use these resources at a high scale is the ability to do high-performance paral- where it’s one to many, where many is BONUS EDITION: what some of our recent press has been lel file systems with the ability to retain millions, there’s great benefit. around. This week, we proved we have the some control of your metadata in remote technical capability to do 10,600 processors. computing environments is of consider- In terms of on-wire capability—back to But that’s not the business we’re in. We can able interest to me. the Florida analogy—we go where the show massive scale. I had similar conversa- weather is warm and our customers are. tions at Harvard—the astrophysicists would I’m also aware of the challenges of the We’re all going to have to be smarter happily consume tens of millions of CPUs ‘last mile’—you can build national high- about how we move data around. The Da

if they could get their paws on it. Museum speed 100-200-Gigabit/sec networking cheapest way is never to move it in the t collections had data challenges but didn’t infrastructure, but if your last mile is a first place. There are techniques and a Ma n aga nd ement need 1 million CPU-hours. much slower connection, you have to ideas I have in terms of where repositories be clever about dealing with the type actually need to be. Does you ultimate Your compute challenge is of the order of of technology you need on premises to repository need to be local? We’re going 1,000-40,000 processors, which we now be able to get in and out of these amaz- to have lots of fun there. glibly consider as small clusters. We’ll ing resources. So other than a teaser to Cl t h e oud

Courtagen Leverages Level 3 to Provide Direct Access to Amazon Cloud

By Kevin Davies | February 4, 2013

Although it didn’t require digging up any local and in the future, is how to process patient any reason, we’d have a pile-up. All the GPPs roads in the end, a small biotech company has genomic data as efficiently and securely as for the following week couldn’t get processed. struck a partnership in life sciences with Level 3 possible. The McKernans needed a data From a scaling standpoint, we had to change,” Communications to create a seamless and se- processing approach that was both scalable— says McKernan. cure data link that pipes genomic data directly throughput is expected to grow sharply in from its laboratory just outside Boston to the the next 1-2 years—and yet conservative and (While the data processed in the AWS cloud Amazon Web Services (AWS) cloud facility in secure, something that could withstand HIPAA are de-identified, Courtagen stores and deliv-

Ashburn, Northern Virginia. regulations regarding the privacy protection ers patient records in a private patient portal BONUS EDITION: of patient data. hosted by NetSuite, a new emerging ERP “We have a dedicated EPL [Ethernet Private system or through Courtagen’s ZiPhyr iPad Line] that carries terabytes of genetic data into Selecting Level 3’s network and the on-de- application. The physician portal is managed their servers and back again,” says Courtagen mand Amazon cloud was an obvious choice. in facilities that are both HIPAA- and SAS-700- Life Sciences President and co-founder, Bren- “Amazon has the scale,” says McKernan. “Our Type II compliant.) dan McKernan. expertise will be in interpreting scientific data

to enable researchers and clinicians to make On the Level Da

Although the system only went live late last year, better decisions regarding patient care and t the early results could hardly be better. “Our in- drug development. We outsource everything McKernan began investigating the idea of a Ma n aga nd ement formatics team is thrilled,” says McKernan. “Data else that’s a non-core competency. We don’t a private line—off the public Internet—to is flowing and we’re getting patient results in a have any IT infrastructure in our facility. The transport data to AWS. In addition to avoiding matter of minutes. It’s seamless; it’s perfect!” data comes off the sequencers and goes pile-ups, it should provide additional security. right to Amazon via the Level 3 network for Courtagen, founded by Brendan along with processing, where we utilize our ZiPhyr bioin- McKernan turned to Level 3 Communications, his brothers Kevin (Chief Technology Officer) formatics pipeline, which leverages standard owner of an international fiber-optic network, and Brian (CEO), is a small firm of about 25 industry algorithms in conjunction with our and what he calls a “carrier of carriers.” Many employees with a clinical laboratory gener- unique analysis workflows to generate results.” of the major telecommunications firms run ating patient genomic data for diagnostic off Level 3. “Eventually everyone hits a Level 3 Cl t h e purposes. Brendan’s forte is the implementa- “Amazon is one of the largest clouds in the gateway,” says McKernan. “From there, it goes tion of world-class manufacturing concepts world, so from a strategic standpoint, I don’t up to the cloud.” oud in running a laboratory, ideas and strategies want to invest capital in something we’re not honed over the past 15 years at the McKernan going to be number one at. The Amazon- Level 3 is one of the few global partners of brothers’ previous company, Agencourt, and Level 3 partnership gives us the ability to have Amazon’s that has “Direct Connect” capability, shared with partners such as the Broad Insti- global infrastructure that is scalable, cost ef- allowing clients to bypass the public Internet tute’s sequencing lab. fective, and extremely secure.” and go directly into the AWS servers.

At Courtagen’s offices in Woburn, Mass., How to push the data into the cloud? Until The challenge was not so much how to the CLIA-certified laboratory contains half- last year, Courtagen had two options, neither transfer the data down to Virginia, but how to a-dozen Illumina MiSeq sequencers, but no one ideal. One was to ship hard drives to transmit it the 15 miles or so from Courtagen’s trace of a data center. The incoming saliva (or Amazon’s facility in Virginia, but that took two offices in Woburn to Level 3’s gateway on Bent blood and tissue) samples, referred by a grow- days. Courtagen’s average sample-to-report Street in Cambridge, just behind the Broad ing network of physicians, are bar-coded and cycle time is fast—just 12 days. “But adding Institute. “Eric Lander [Broad Institute director] given a Genomic Profiling Project (GPP) num- two days for shipping is unacceptable. Our must have been thinking about this 20 years ber. “Once samples are accessioned and a GPP Informatics team wanted data processing in a ago, that’s smart. That’s one of the gateways to is assigned, no-one in the lab can see the Pro- matter of minutes,” says McKernan. the Internet!” says McKernan. tected Health Information (PHI). PHI includes any information according to HIPAA laws that The other method was to use traditional Inter- As discussions with Level 3 progressed, McK- can identify a person,” says McKernan. net delivery through an “old, slow pipe” but ernan was contemplating signing a purchase delivery often stalled. “It would take days to order to dig up roads and lay some new One of the key issues facing Courtagen today, move data up to the cloud, and if it failed for fiber-optic cable. “It was going to take a long

[10] time and cost a fair amount of money,” says for the end-to-end solution. In this instance, Although in the early days, McKernan says McKernan. Sidera reports to Level 3. his colleagues are delighted with the way the network is working. Raw genome sequence At the last minute, another company entered Once the data connect from the Sidera pipe data go in; what emerges is a rich analysis of the mix, providing the pipe for “the last mile.” to the Level 3 gateway in Cambridge—one a patient’s data with variant conservation and Sidera—one of a number of companies that of 350 data centers Level 3 has across the mutation prediction scores, which in many work with Level 3 to provide that local transmis- world—it travels on a private line down to instances is helping Courtagen’s scientists and sion—already had fiber in the Courtagen office Ashburn. Courtagen pays Level 3 a month- physicians identify deleterious mutations. building, with the all-important DWDM (Dense ly subscription fee for a minimum data Wavelength Division Multiplexing) technology commitment. McKernan says Courtagen takes advantage of for scalability. This means that for Courtagen Amazon’s EC2 instances for sequencing analy- to upgrade the network from 10-gigabit to In addition to Sidera, Level 3, and AWS, sis, primer design, and hosting of web servers. 100GE down the road, McKernan says it will Courtagen had to work with Amazon’s hosts, In addition, Courtagen utilizes StarCluster to only require changing a couple of cards. “Our the Equinix facility in Virginia, as well as Check dynamically start EC2 instances and stores network is now scalable to move [data on] Point (a leader in securing the internet). “These their sequencing data in S3 buckets. Courta- 2,000 patients or more,” says McKernan. relationships allowed us to combine fast net- gen is also beginning to migrate long-term working technologies with the highest level of storage to Amazon’s Glacier platform to Courtagen insisted on working with Level 3 security for our employees and patient data,” save money, and is evaluating AWS Elastic

as the carrier, so in the event of any network says McKernan. Beanstalk to deploy custom applications. BONUS EDITION: problems, Level 3 alone would be responsible
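The S3-to-Glacier migration McKernan mentions is typically handled with bucket lifecycle rules rather than manual moves. The sketch below uses the current boto3 SDK for illustration (not Courtagen's pipeline); the bucket name, key prefix and 90-day threshold are invented. It uploads a de-identified result file and asks S3 to transition older objects under that prefix to Glacier automatically.

```python
import boto3

BUCKET = "courtagen-example-results"   # hypothetical bucket name
PREFIX = "gpp-results/"                # hypothetical key prefix

s3 = boto3.client("s3")

# Upload a de-identified analysis result to S3.
s3.upload_file("GPP-0001.variants.vcf.gz", BUCKET, PREFIX + "GPP-0001.variants.vcf.gz")

# Transition objects under the prefix to Glacier after 90 days (invented policy).
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-results",
                "Status": "Enabled",
                "Filter": {"Prefix": PREFIX},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```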


NetApp Eyes Opportunities in Health Care Data Storage

By Kevin Davies | February 1, 2013

Whatever happened to NetApp?

When Bio-IT World launched in 2002, NetApp was one of the big names in big data storage in the biotech and life sciences arena. But over the past decade, while brand names such as Isilon, EMC, BlueArc, Quantum, Panasas, DDN and many others have cashed in on the data deluge, NetApp kept at best a very low profile in the space. That is not to say that it was not in use or that the technology does not have its supporters: on the contrary, many data center managers could point to trusted NetApp installations. NetApp storage is used at Genentech and several other major biotech firms headquartered in California and beyond. For some, however, it was less of a pain to integrate their old NetApp systems than replace them with new.

But there are strong signs that NetApp is turning things around. For example, the company has introduced flash-based storage solutions (such as FlashCache and SSD-based architectures) to meet extreme performance requirements. These technologies have also been integrated with NetApp's Virtual Storage Tiering solution in order to help customers leverage flash to improve performance while still utilizing cost-effective storage.

Bio-IT World reached out to Dave Nesvisky, who joined NetApp in September 2010 as senior director of health care sales, for an update on NetApp's new technology and rekindled interest in the health care and life sciences sector.

Bio-IT World: Dave, what's been going on at NetApp over the past few years as the life sciences has been swamped with data?

Dave Nesvisky: There's been significant change over the past couple of years and it continues to evolve. Health care's obviously a very broad area and includes a lot of different segments—you've got providers, research, regulatory, device manufacturers, health insurance, and distributors. There's almost no limit to what you could include in the health care segment.

When I joined NetApp a couple of years ago, NetApp had several thousand customers in health care. Customers were using our products for the kind of things that every NetApp customer uses our products for: Exchange and SharePoint and virtualized data centers, general IT, not anything specific to health care. But many of those clients, especially hospitals, clinics, providers, were very interested in solving bigger problems. They were enjoying the benefits that NetApp brings in terms of storage efficiency and total cost of ownership and operational efficiency. They said, 'You're solving that problem for us at a small level because the data you're managing represents a fraction of our overall data problem. Our bigger data storage expense is around diagnostic imaging, electronic medical records. Can you help us with that?'

A couple of years ago, NetApp was not fully prepared to help our customers in that market… We did not necessarily have the skill set around the applications that health care customers were running. My first step in joining the company was to start building a team—bring in people that had come from big application companies that serve the provider market, companies like McKesson and Siemens—and bring in a former health system CIO to help us better understand the market. We're now in a much better position to support our customers around their bigger data problems.

Last year, we pulled together the payers and providers and a select number of software vendors and created the health care vertical that I lead today. That includes all stripes of providers—for-profit, not-for-profit, academic medical centers; all of that falls under our domain. Pharma and biotech is largely run out of a dedicated district that's part of our Americas group, not part of the health care group today. As I said, different companies define health care differently. We've defined it around payers, providers, and some ISVs… It remains to be seen what's going to make the most sense for NetApp, whether the existing structure is good, or whether it should have an expanded definition. But that's our definition today.

What are the shifts in medicine, the impetus driving this growth in volume? And how is NetApp meeting that demand?

Nesvisky: One element is in the basic research itself. They're mapping more and more genomes and it's obviously driving much greater data requirements within that industry itself. But we're seeing effects on the rest of health care… Today medicine is delivered reactively and episodically. You get sick. You go to the doctor. They treat you. That's a very expensive way to treat people.

The push under the Affordable Care Act and ACOs (Accountable Care Organizations) is more in preventive medicine—the promotion of wellness rather than treating sickness. If you've got people with asthma or diabetes or high blood pressure, it's really about proactively getting these people into programs to maintain their wellness so that they don't get into the health care system any deeper than they're already in.

Where the future and the alignment with bio-IT lies is in predictive medicine—the opportunity to look at somebody's genetic makeup and be able to predict with some level of accuracy that you have the markers that indicate that in 20 years you're likely to get these things. What can we do now? And then, in line with the pharma companies that are starting to be able to create custom-made pharmaceuticals for individuals, to treat them more effectively and target their disease more accurately. That's where the convergence is…

What is NetApp doing in the space?

We acquired a company called Engenio from LSI a year or so ago to create a cost-effective and dense storage platform ideal for high-throughput workloads, for object content repositories, for unstructured data, and for other use cases where you've got either high volumes or very large databases or very large object containers.

Actually, that was a part of the portfolio that we didn't previously have. We had a broad product portfolio that could essentially do that function, but this is a platform that took it to the next level. It had very high throughput and very dense storage—obviously when you talk about very large data sets there are physical constraints to the data center before you have to expand it, so you want to be able to pack as much storage into the smallest possible space. We've been very successful with that E-Series product. It's a product that we work into the space as well as being a very large OEM product for us.

What was it about that technology that particularly appealed to NetApp?

Nesvisky: The footprint density. It's a very dense storage platform and it had very high throughput for use cases like full-motion video, where typical SAN or NAS was not built to handle that effectively. It's finding its way into a lot of different application areas. From the health care perspective, the two most interesting things are big data analytics and also very large object content repositories in the multiple-petabyte range.

In terms of the actual data that you're supplying solutions for, what are you seeing?

Nesvisky: There may be a future application in telemedicine with video and image data. But that's a little bit of a future state for us, not top-of-mind right now. Another emerging area is digital pathology. Today, the typical imaging modalities that you see—X-ray, CT, PET, MRI—as those modalities become more powerful and the images are more refined, they're requiring more storage themselves. 3-D mammography was approved by the FDA last year. It uses almost ten times more storage per image than 2-D. The typical modalities are taking up a tremendous amount of storage. In digital pathology, some of these things can run into a terabyte per study, which is an incredible amount of storage.

But we also see, on the genomics side, it's taking up a lot of space and it requires high bandwidth. We have clients who moved to NetApp because they're getting a lot of efficiency out of a capability in our FAS product line called flexible volumes, or FlexVol. That allows a lot of researchers to be allocated a lot of storage—say, several terabytes. The administrator is really only carving up a smaller amount, but it gives the appearance to the user that they have all they need.

In a typical environment without NetApp, you would have to physically provision that amount of storage to each user. If ten researchers each need 10 terabytes, you would physically have to provision 100 terabytes to those people, even though those guys might only be using one or two terabytes at any given time. With flexible volumes, you can tell them that they have access to ten but you know they're not going to use that. You're able to physically provision a lot less, which saves a lot on the storage.

The other part that people are finding with NetApp is that it's just easier to manage. We consistently find that our administrators can manage a lot more volumes of data, a lot larger arrays, with a lot fewer people.
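The saving Nesvisky describes is easy to quantify. A minimal back-of-the-envelope sketch follows; the figures are simply those from his example, not a NetApp sizing tool.

```python
# Thin provisioning, using the figures from Nesvisky's example.
researchers = 10
logical_tb_each = 10        # what each user is told they can use
actual_tb_each = 2          # roughly what each user consumes at any one time

logical_total = researchers * logical_tb_each     # 100 TB promised to users
physical_needed = researchers * actual_tb_each    # ~20 TB actually backed by disk

print(f"logical capacity promised: {logical_total} TB")
print(f"physical capacity provisioned: {physical_needed} TB")
print(f"oversubscription ratio: {logical_total / physical_needed:.1f}x")
```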

Are there a couple of installations in the last 12 months in your health care arena that you can point to as good examples?

Nesvisky: One that comes to mind is the Duke Institute for Genomic Sciences, which is a NetApp FAS customer. They were getting more and more grants and research projects and it was stressing their systems because they had more and more researchers on it. The way they were adding people and trying to manage things, it was just runaway data growth, and they needed a new platform that was more efficient, that could work into their environment.

The two things they found with NetApp: NetApp works very well in a virtualized environment. The way of doing it before is you'd get a grant and you'd stand up a new system, so you've got tons and tons of really underutilized servers and storage. And this is not a unique thing to genomics… They made an architecture decision to move to NetApp in a heavily virtualized environment and it gave them several tremendous advantages. It allowed them to reduce the footprint on the floor, which enabled them to extend how long they could stay in their data center—if you can compress into a smaller footprint, that means your data center's got more room to grow over time. That was really good. With fewer physical devices running, you can run it with a much more efficient staff… They were able to continue with the current staff and handle bigger workloads efficiently. And they were getting tremendous throughput from the system. Some really good benefits from making a move to NetApp.

What's your competitive advantage these days?

Nesvisky: There are a couple of areas. Clearly there are very successful top-tier players in the space, but the features of NetApp software—the flexible volume, the ability to provision virtually a lot more storage to the users than they had to physically provision—were very efficient for them, as was the ease of management compared to other solutions.

Every other vendor tends to offer a portfolio of storage solutions—a particular solution for backup, another for production. And they have families of systems, so when you outgrow one of them you have to forklift-upgrade to the next bigger series of equipment and it has a different operating system. And so you've got to do a data migration, you've got to literally physically remove the system that was in there, put in the new system, migrate the data, retrain the staff, all that. And that comes into account.

When people assess the long-term impact of their storage decision, NetApp runs one operating system. We have an 'agile data infrastructure.' This is important: our agile data infrastructure means that from a single operating environment, Data ONTAP, we can offer people non-disruptive operation, which means that literally any upgrade of systems, software, adding equipment, retiring disk out of warranty or out of service or for whatever reason—anything you need to do in the maintenance of an array is done non-disruptively. You don't have to schedule downtime, which is a huge advantage. Nobody wants to schedule downtime!

We have non-disruptive operation. We have intelligence, which means we have different types of disks with different performance profiles, so we can put the data where it makes the most sense for the performance you need… In the agile data infrastructure you can build into the multiple tens of petabytes in single volumes. If you have to store large volumes of genomic data, you're really never going to run out of steam.

The agile data infrastructure is something unique that no other company can offer. They all offer a family of products that require you to literally retire one, forklift it out, migrate the data. It's an expensive, complex process that's time-consuming and costly. We eliminate that. And when people recognize what NetApp is doing, it's absolutely revolutionary. You can literally build a storage architecture now with us, with a single operating environment, that goes from the smallest array—whatever you want to start with, just some very small system—and scale literally infinitely without ever having to forklift anything out. That's a big advantage.

What other innovative and exciting developments do you foresee?

Nesvisky: Wow, a lot of things! For me, on the health care part of the business, the future is really around analytics, whether it's to pave the way for predictive medicine or to manage populations of people. I think the Hadoop/E-Series combination is going to be very powerful.

There are a lot of companies in the space taking a lot of interest in how to go about doing analytics for health care in various areas. Some of them are taking very broad approaches, some narrow approaches. Being able to do analytics in hospitals around the outbreak of, say, sepsis; they want to track that. Sepsis is very expensive to a hospital… Analytics around prediction: is somebody likely to get that, or are they showing the indications, and can we treat it early before it fully evolves? That's a big one for us.

We're seeing more private clouds—organizations operating clouds on behalf of the rest of their organization or other organizations that they're affiliated with. We are also working with some public cloud providers that have some great solutions out there.

Aren't there fundamental issues with putting patient data in the public cloud?

Nesvisky: Once you explain how the system is architected, it's really not an issue. Frankly, in a professionally managed, well-architected cloud data center, the patient information is much more secure than paper files lying around in a hospital. Once people understand how the data are encrypted at rest and in motion, and how the physical environment is secured, that really becomes a non-issue.

What challenges do you face in finding new customers?

Nesvisky: As you might imagine, health care is a fairly conservative business in terms of its decisions, because they're entrusted with protecting patients' lives. And so, our biggest challenge is just the status quo: hey, we've always bought from these guys, why would we change? We just need to be in front of people.

One of my favorite quotes is from Woody Allen: "Eighty percent of success is showing up." When we get our opportunity to present, people get very comfortable with us. We win our share of business. I think we have an absolutely superior solution for our industry… This vertical is a very new venture for NetApp. We just have to tell our story and effectively message and let them know what we have. Our biggest challenge is just really inertia.

Cambridge Healthtech Media Group

www.bio-itworld.com 250 First Avenue, Suite 300 Needham, MA 02494