CS@CU

NEWSLETTER OF THE DEPARTMENT OF COMPUTER SCIENCE AT COLUMBIA UNIVERSITY VOL.5 NO.2 SPRING 2009

An Interview with Computer Science Department Chair Henning Schulzrinne

Outgoing Computer Science Department Chair Henning Schulzrinne

After serving as the How did you get into the Were there any surprises students, 30 other faculty, Computer Science role of department chair? on your first day of work? and many more Masters I think the way it works in There weren’t too many students, undergraduates, Department Chair for etc—all who have needs that 5 1/2 years, Professor most departments is that surprises on the first day, but some of the senior faculty eventually there were surprises need to be taken care of. Henning Schulzrinne put their heads together and in terms of things that you What I found out is that the is stepping down at the decide who can we draft, who don’t really see as a normal department chair tends to be end of his term in July. might be willing, who probably faculty member. Usually it’s the last stop, at least within won’t do too much damage, kind of a black box “paper the department, for just about T.C. Chang Professor Shree etc. For some reason some of goes in, signature comes out” everything. One of the things Nayar will be the next chair the senior faculty picked me, type of model; what you don’t that you notice very quickly is of the department. Professor so I was invited to a breakfast see is all the activities that that no issue is too small to Schulzrinne gave CS@CU at one of the local eateries go on on a daily basis. Or as make it to the email inbox of Newsletter Roving Reporter and it was sprung on me that a student, you probably think the chair—missing toilet paper, Carlos-Rene Perez some some people think that “you of the front office as a group somebody ordered the wrong candid insights about his should be next.” Serving as of people that sit there all day pencils, somebody can’t find experiences and perspectives chair was something that I and you don’t really know what their mail—you name it, sooner as Department Chair. hadn’t necessarily planned on they do—you hope they do or later you’ll get a e-mail with at that particular moment, but something useful! You might somebody stopping by. That’s I decided it was a challenge only bother them 10 minutes not a bad thing, because I was interested in, and I was during the year, but of course ultimately most organizations willing to take it on. there are 120 other Ph.D. only work if somebody takes

CS@CU SPRING 2009 1 Cover Story (continued) responsibility for everything didn’t scale very well to a $10 I think the staff here, particularly with the school on some budget in the end. You want people to million dollar research, 30+ the academic staff, has done a issues. Even during a single day, feel that there is a person faculty department. So one much better job than we used to you end up constantly switching who eventually will listen and of the things that I definitely do at integrating things through between different roles. hopefully make the right things wanted to accomplish was social activities. I’ve set up a happen. Of course, as chair having better infrastructure number of social events, like the Do you have any tips for you usually don’t do it yourself, that allows the faculty and family day and the graduation your successor? you end up delegating. So staff and student volunteers ceremony that we do for Ph.D. If I were to do this again, chair management is often to run things a little bit more students, to help a little bit more I think it’s important to set what I call management by efficiently. with this; also we now have our expectations appropriately. forwarding; as chair you try to department-wide coffee hours I think I have quite a reputation decide who is best qualified What are some goals that you pretty regularly. But I think we for answering email relatively to deal with whatever idea or feel still need more work? can still do better. quickly at all hours, and people suggestion or complaint or I do believe that there are two Hopefully this department is a have started to expect that. problem you’re confronted areas where the department can community—a community that That is not an expectation with. Speaking as a networking do better. One is a byproduct of has common technical and which you necessarily want to person, you act kind of like the success: I think we have moved intellectual interests, but also a foster! So it’s a good idea to central router: you provide a well up in our external ranking and social community. Most people set clear expectations: if you known address to everybody in visibility, and so we need to spend an inordinate fraction of do not get an answer to an the department, and hopefully start “acting our age.” I don’t their week here in these build- e-mail until Monday morning, in many cases you can provide mean our biological age as a ings; many people probably you’re going to survive, and some assistance. 25+ year department, but we spend more time with their you don’t have to panic as to have now acquired a bit of colleagues here than they why you didn’t get an answer What is your proudest external reputation. We used spend with their their children to your email on Saturday. The accomplishment as chair? to be kind of a “mid-20’s” type and spouse, besides sleeping. other piece of advice is to take One thing you learn as a chair, of department, but now we So treating it simply as a place advantage of our great staff. We hopefully quickly, is that the are easily a top-20 department with a bit of concrete and are lucky now to have a very Oath of Hippocrates applies: by our impact. But once you bricks where you get a network strong office staff on all sides— First do no harm. In any reach that level, you are connection and a desk doesn’t business, financial, academic— department the really important suddenly in the same league quite cut it; I think we need to that can take on many roles. work that the University does— as departments which have a treat it as a place where people It’s a situation where if we can teaching, research, service— number of activities, in terms can connect socially as human grow the professionalism and gets done by the faculty and of outreach and interaction beings. So having people that responsibility of our staff even the Ph.D. students and to some with the community, that we volunteer here, be it for a further, I think they will step extent by the other students in haven’t quite reached. Working newsletter or be it for cookies up to the plate. This means the department and the research with alumni, working with or be it for being a big brother, many things can be done well staff. So what a department industry, working with the is very much an important part that otherwise may not get chair can do and staff can do rest of the university—more is of that. I very much try to done at all, if the chair tries is really make sure that those expected of you when you’re encourage activities along to do it all. who do this real work of the a more visible department. these lines; that’s why we How about faculty recruitment? University have the resources The second area is a long term have new awards for students Could you see the department they need so that they can challenge. The way we are set who do community service. growing by ten more faculty? do their work unencumbered up physically, we are in two— by bureaucracy, by silly walls, if you count the InterChurch How do you balance all the Every department wants to by resource limitations (to the group, three—buildings, and things that a department grow, and growth is a good extent possible). so we are very distributed. chair has to do? way to do new things and One of the issues that I It’s quite possible to graduate, Well, I’m not claiming that my do existing things better. discovered relatively early is if you were in CEPSR, having schedule is any worse than But simply relying on faculty that the department had grown never set foot into the CS any other faculty, but certainly growth is not the only way significantly since I joined in building, except maybe on the number of hats that a we can be doing things better. 1996. Back then we were graduation day. This means you department chair has to wear We’ve reached a size that probably roughly half the size may not have met many of the makes Imelda Marcos’s shoe makes us now one of the that we are now, a little larger, people whose offices are in collection seem small. You largest computer science and many of our ways of doing the CS building. So facilitating have to time-slice to an extent departments in the Ivy League, things were “small department” contact between students that I didn’t experience before. certainly, and we are probably ways. For example, we had is an ongoing goal: between Today is a good example: you mid-sized for general computer very little electronic support, students in CEPSR and students go from a meeting at the science departments. So we computing support for our who are in the CS building, school level with other chairs, are now competitive in terms administrative functions, every- between students in, say, the to a meeting with an external of size and we have pretty thing was done on spread- theory group or the graphics research sponsor, to a meeting good coverage of areas. sheets, word documents in file group and students who are with a department administrator, I would certainly like us to see stores and so on. This worked more on the systems side. to meeting with a Ph.D. student, growth and I would hope that OK for a small department, but This is a challenge. followed by another meeting

2 CS@CU SPRING 2009 the current financial situation they came from—-and staying improves to such an extent, in touch once they leave the from a University perspective, University. I’m not necessarily that the school allows us to talking about a $20 million grow to some extent. But I donation for a new building think we also want to make (though that would be highly the best use of the faculty welcome), but simply staying in assets that we already have. In touch, providing us with input, the last few years we’ve hired providing us with contacts, extremely strong junior faculty, providing us with a means of and as they grow they become engagement with companies more senior. We have strengths and other entities. This is in areas that we never used to something where I think alumni have strength in. This means can certainly help, and students we can now do things that can help with outreach to we were never able to do— wherever they came from earlier. instead of everybody doing Students are our life blood; we things by themselves, we wouldn’t get any research done can operate in larger scales, without them. And attracting with larger groups doing large good students to Columbia is proposals and centers and probably the best way to make other efforts that are just more Columbia Computer Science visible than five little efforts. even better. It helps everybody: I think this is how good students get good colleagues departments and very highly to work with, we all get good ranked departments operate. research done, and the depart- ment over time will become What are you going to miss even better. We’ve already about being department chair? seen this when we have the Well, one thing I’ve missed while Ph.D. visit day—many students being chair is the opportunity to already help out with this, do more intense research effort and it has been tremendously that requires more hands-on helpful in attracting the best work. Right now I’ve been doing students to come here. a lot through the Ph.D. student group that I have, but maybe Professor & Chair I haven’t always been able to Henning provide as much day-to-day Schulzrinne supervision or interaction with them as I’ve had before. I also hope that I can continue to do work on the external professional side; I have a fair amount of ACM involvement now through ACM SIGCOMM and other parties, and that is something that I think is an important part of being a somewhat more senior faculty. You can help shape the general direction of ideas—in my case, a general direction for the network research community.

In closing, do you have any suggestions for how the student community can help out in outreach or other ways? We have a great student community at both the under- graduate and graduate levels. I hope we can draw on their resources, both reaching out to their alma maters—wherever

CS@CU SPRING 2009 3 Department Activities

APAM Ph.D. student Tianzhi Yang practices a presentation.

If you walked through the CS conference and journal papers lounge at some point during or for their dissertations. Academic Writing & the first three weeks of the fall Students analyzed the elements semester, you may have been of a scientific paper, including Great Presentations there when it was transformed content, form, and language, temporarily into a writers’ with particular attention given space. For three weeks in to reader-oriented writing and Workshops September 2008, 45 engineering clarity. While a three week doctoral students from session goes by quickly, Computer Science, Biomedical we were able to focus on a for CS Grad Students Engineering, Electrical number of important genres, Engineering, Earth and including introductions, data Starting in 2007, the Department of Computer Environmental Engineering, commentaries, results and Science invited Janet Kayfetz to teach writing Applied Physics/Applied Math, discussions, and abstracts. and presentation skills to our graduate IEOR, and Chemical Engineering During class, we converted participated in an Academic the CS lounge into a digital students in short courses taking place during Writing Workshop organized by the early weeks of the semester. classroom which allowed us to the Department of Computer access the workshop website, Science. Students met choose a student document, These short courses were built on her experience twice a week for 2 hours project it on the screen in front teaching writing and presentation skills at UC Santa each meeting in one of three of the group, edit the text in Barbara. We asked her to reflect on her experiences. workshop sections. real time, and then email the The goals of the writing work- edited text to the student for shop were threefold: to create further review. In addition to a discourse community of the details of language use scholarly student writers who and clarity of written expression, would share a commitment to we turned our attention to the excellence in writing, to apply verbal articulation of ideas and basic skills in composing and the appropriate and helpful editing to produce clear, logical, ways to critique peer writing. precise, and readable scientific Students worked in pairs on texts, and to practice techniques the peer editing of their texts, that would help student writers and also on timed, unedited, produce more—and qualitatively journal-type writing that builds better—writing, either for fluency, speed, and confidence.

4 CS@CU SPRING 2009 The Academic Writing of the research space and the on a site where students could accept the suggestions that Workshops were first offered specific problem to be discussed review them as often as they come our way when we throw to CS Ph.D. students in in the text. We were able to liked. In addition to the formal our text on a screen in front September 2007. The 2008 take things to the next level presentation, we also spent of a group of sharp thinkers or sessions were a continuation and look at the subtleties time talking about job talks and ask “What did you think of my of our 2007 efforts, with the of the construction of logical interviews, and the art of asking presentation?” Likewise, many added feature of opening the arguments, the idea of and responding to questions. students, even those who are workshops to Ph.D. students consistency of tone, and the A huge challenge faced by skilled at analyzing a text and from other departments in the notion of rhythm in a text, using the workshop participants who understand how it might be School of Engineering and the digital classroom approach are not native speakers of improved, are not comfortable Applied Science. As interest in for group analysis and editing. English is that in addition to with the process of offering the first semester 2008 sessions The Great Presentations the central issues of argument feedback. But the workshop seemed to be strong, the Workshops were a pleasant development, organization, flow, students did not hold back. Department organized a second change of focus, and allowed us word choice, precision, and so They reached a level of analytical round of workshops this past to dive into the world of talking on, nonnative speakers have the ability in three weeks that usually January-February 2009—with a about research to an audience language acquisition challenge takes much longer to achieve, new twist. Instead of a repeat of specialists. We used the and difficulty as well. While the and I watched them learn how offering of the writing work- acronym “STORIES” to highlight workshops were not conducted to be willing to receive some shops, the most recent sessions the intertwined variables of with any specific orientation to very intense and detailed offered an Advanced Writing excellent presentations. We second language issues, I was suggestions from group Workshop for students who had talked about the most important able to incorporate work with the members, myself included. attended one of the previous part of a presentation, the nonnative speakers on specific We hope these workshop writing sessions, as well as development of a clear and language patterns, pronunciation, experiences take the students two Great Presentations logical story (S), as well as and other areas of difficulty. a long way to being better Workshops, a response to the elements of timing (T); I want to say that it was a collaborators with their student requests for help with organization (O); rehearsal (R) and professors and their student presentation skills. wonderful experience to have practicing aloud; interacting (I) the chance to work with colleagues, and that their writing The advanced writing group with both the audience as well Columbia students. From the and presentations reflect a new worked only on their own writing as with the visuals if appropriate; very first minute, students were level in their communication -in-progress, documents that the details of verbal expression free with their observations and expertise. I want to extend my varied from conference papers (E) including volume, pitch, comments and were ready to thanks to Swapneel Sheth for and journal articles to technical intonation, choppiness and participate. It is no easy thing creating the website for the reports. As students were fluency; and the development to create a cooperative group Great Presentations groups and already familiar with the of clear and useful visuals of writers and speakers with a for coordinating the uploading approaches introduced in the and slides (S). Students gave shared commitment to being onto the site of all of the video first writing workshop, we had a presentations of varying lengths open and vulnerable about their clips of student presentations shared view of the importance of which were videotaped and writing efforts and presentations. over the three-week sessions. audience and the significance of critiqued by me, and all of the All of us know that it is not easy developing a precise description presentation clips were posted to take in the questions and

Computer Science Ph.D. students Arezu Moghadam (L) and Ariel Elbaz (R) at work on peer editing. Computer Science Ph.D. student Ryan Overbeck (L) and APAM Ph.D. student Abby Shaw-Krauss (R) discuss a writing sample.

CS@CU SPRING 2009 5 New Faculty

What type of research computation as a linear set of interests you? instructions executed in the order Problems that are at the dictated by a program counter. boundary of architecture hold In dataflow computation, the a special interest to me. At the program is represented as a software boundary there are graph, where the nodes of the Interview with many problems relating to how graph represent instructions, and the programming systems the edges connecting the nodes New Faculty interact with the machine on represent data dependencies, which they are running. How producer-consumer relationships, can architects harness the between instructions. Under Member raw computational resources this model, an instruction can be afforded by Moore’s Law to executed at any point once all of Martha Kim help existing and emerging its input operands have become applications run as efficiently available. This is called the dataflow firing rule, and takes Martha Kim joined the Computer Science as possible? At the device- level boundary, questions arise the role of the program counter Department as an Assistant Professor in as to how to adapt architectures dictating when instructions can the Spring 2009 semester after completing and microarchitectures to execute. Because the dataflow her Ph.D. at the University of Washington. changing device characteristics. model of computation does not My research has focused on introduce artificial or unnecessary serialization, it exposes all CS@CU Newsletter Roving Reporter Carlos-Rene Perez those boundaries. theoretical instruction level caught up with Martha and talked to her about spatial I have explored algorithms for parallelism. architectures, legacy code, and more. mapping software to highly distributed or decentralized Going back to Wavescalar, architectural designs. These what did this architecture architectures consist of many, bring to the field? many processing cores. The One significant contribution of challenge lies in mapping code the WaveScalar project was (typically seen as serial) onto that, unlike previous dataflow a highly parallel computational machines, it could be Tell us about your field substrate. of research and why you programmed in traditional, got into it. In 2006 you delved into spatial well-known languages. Previous dataflow machines Computer architecture is a architectures, specifically the had to be programmed with broad and cutting-edge field at Wavescalar architecture. Can special, dataflow languages. the moment. It encompasses you guide us through this? This was due to differences all manner of technical problem WaveScalar* is a research between the language and the from the device level right up project at the University of machines regarding the order to programming systems. This Washington that I contributed in which memory operations breadth is what makes it so to as a graduate student. were to be applied to memory. interesting and extremely rich. The project explored several Another significant research For many years, architects aspects of spatial architectures contribution is in the micro- were able to reap performance and pushed the limits of very architectural space. Traditional by scaling processors to higher small, simple processing cores in processor designs were and higher clock frequencies a design where communication designed under the assumption while leaving the of is a first order citizen. that data could be sent any- the design unchanged. For a How do you build this architec- where on a chip within a single number of reasons, this old ture? How would you program clock cycle. Over the years, technique will no longer work, such a machine? The design as clocks have become faster so the field is faced with a host ended up as dataflow machine, and faster, that has ceased of challenges, which, quite which is a different computa- to be true. In some sense, frankly, we do not know how tional paradigm from what is other regions of the chip have to solve. While the problems traditionally used. Dataflow has become more remote because are quite real and quite difficult, the benefit of exposing a lot of of this. At the microarchitectural this makes it an exciting time instruction level parallelism. level, WaveScalar was focused to be in architecture. on these non-uniform commu- This is part of the reason I What is a dataflow machine? nication characteristics from the decided to pursue research Dataflow is an alternative to start. We acknowledged that in the field. There are many von Neumann computation. some parts of the chip were unknowns and unsolved Von Neumann computation closer than others and designed problems. is the more familiar model of the microarchitecture accordingly.

6 CS@CU SPRING 2009 Out of this came several to do. If we succeed in working engineer effort. Then, once then physically assemble them innovations in the design and with the programming and soft- you’re convinced that it works into the particular hardware study of distributed micro- ware system community, we and is ready to go, you still system they need. This gets architectural components. may be able to produce parallel spend a couple of million dollars around the high initial cost by These results apply to less codes that can profit from all for lithographic masks, which sharing those high initial costs aggressively parallel chip these cores. I think also, in the are essentially templates that across many different custom designs as well. A set of face of power constraints, we are used to physically fabricate designs. The questions on the processing cores of any size really have no other choice that the circuit. technical side are, how do we has this problem. For example, will keep the power consumption So it’s an enormously expensive make these chips perform you can consider a quad-core of these chips within our process, in time and money, reasonably well? How do we size multiprocessor as a distributed budgets. So I do see multi- before you ever even produce a the components? What sort of architecture with just four core as being the future in the chip. If you are somebody such communication infrastructure locations. next 5-20 years. as Intel, building Larrabee, you’ve should we provide for them to got a lot of engineers and a lot of communicate between bricks? This sounds similar to Intel's What are your thoughts on capital. You can afford to spend Imagine a multiprocessor sys- Larrabee, so can you make a legacy support? Should we that time and money, because tem, in which you could go to quick comparison? scrap the X86 ISA? you’ve got reasonable confidence Intel’s website in the same way Sure. One way to view these The need to support legacy that you’ll sell a fairly large that you go to Dell’s website architectures is sitting on a code was always a funny one number of these chips. The today and customize your chip. continuum. At one end you’ve to me. People in the hardware problem is, if I need only a You could request the particular got a single-core processor. community often think of soft- couple of dozen of my own mix of processing cores you Then you can apply Moore’s ware as being much easier to design, or even a couple want, the particular size and law, where you say, in the write and much more changeable thousand, I really can’t afford to organization of the caches, and same sized footprint I’m going than hardware. In software, for spend those tens of millions of so on. You could apply brick and to put two half-sized cores, example, you can patch your dollars. So, my research was mortar techniques to multi- or four quarter-sized cores. code and you don’t need to focused on trying to reduce the processors, as an example Larrabee is out in this direction. commit implementations to initial cost of designing a circuit. domain, to make it possible to You can keep pushing towards lithographic masks. But the The goal was to allow people to have made-to-order processors. more and more, smaller and reality is that software is even actually fabricate hardware in smaller cores. At the end of more complex. There is much scenarios where today it is not the spectrum you run into so- more of it, partially because economically feasible to do so. * The WaveScalar Architecture, called tiled architectures, of it’s so easy to write. While S.Swanson, M. Mercaldi, A.Petersen, The way I did this was to A.Putnam, A.Schwerin, K.Michelson, which WaveScalar is one. The supporting legacy code does observe that a lot of chips are M.Oskin, S.Eggers. ACM Transactions cores on these machines are create an enormous burden, the built out of the same underlying on Computer Systems (TOCS). small, lightweight computation consensus seems to be that components. The example I Volume 25 Issue 2. May 2007. engines, but there are many re-writing it would be an even like to use is a camera phone. more of them. greater burden. So we choose If I were building your iPhone I to continue to provide that Comparing WaveScalar and would not implement the JPEG Larrabee directly, I see them as support. Personally I believe encoder from scratch. Other heading in the same direction that there is too much code people have implemented the out there that we depend on but with Larrabee a little JPEG encoder and I could save closer to what we have today. too much. There is no practical time and effort by licensing that WaveScalar is unlike anything way that it can all go away or encoder for use in my design. we have today, but if we be rewritten. The essential idea behind my take the trend that produced research, named Brick and Can you tell us more Larrabee to its extreme, we Mortar, is reuse—to share not about your recent research might find something that only the design effort (as is on circuits? looks like WaveScalar. done now) but also the physical My focus for the last three We are unquestionably headed implementation effort. The years has been very close to in the direction of multicore. proposal is to prefabricate a large the physical device level. I was I don’t know how to take a volume of small chips containing motivated by the fact that it processor design and make it commonly used hardware is very expensive to fabricate twice as fast, but I can very modules. These are called bricks. a chip nowadays. In order readily shrink it and give you For example, I’d build a JPEG to design something such two of them. In doing this we encoder brick, and a micro- as Larrabee, with billions of are foisting the problem onto processor brick, and and an extremely tiny transistors, there the programmers, leaving it up MPEG codec brick, and so on. is an enormous engineering to them to extract improved I’d prefabricate these system-on- cost: a lot of time and money performance out of those two chip-type components, so that all goes into producing that design, parallel cores. I think we are different chip designers need to validating and verifying that it headed in that direction because do is buy these mass-produced does what it is supposed to. it’s the only thing we know how components off the shelf at This takes a lot of time and relatively low cost to them and

CS@CU SPRING 2009 7 Research Focus

Yell if you see a buffer overflow coming this way!

Application Communities

Professor diversity, by introducing Community (AC), a collection monitoring by each member of Angelos “controlled uncertainty” in one of of identical instances of the the AC, and for fault mitigation Keromytis the system parameters that the same application running once the group identifies a attacker must know (and control) autonomously across a network. new fault. Members of the AC in order to carry out a successful Members of an AC collaborate in emulate different “slices” of the attack. Such parameters include, identifying previously unknown application, monitoring for low- but are not limited to, the (zero day) flaws/attacks and level failures (such as buffer instruction set, the high-level exchange information so that overflows, illegal memory Software monocultures have implementation, the memory such failures are prevented accesses) as well as deviation been identified as a major source layout, the operating system from re-occurring. Individual from normal application of problems in today’s networked interface and others, with varying members may succumb to behavior (e.g., unusual function computing environments. levels of success. However, new flaws; however, over time arguments or function call Monocultures act as force running different systems in a the AC should converge to a sequences). This same low- amplifiers for attackers, allowing network creates its own set of state of immunity against that level information produced by them to exploit the same problems involving configuration, specific fault. The system “normal” runs (those that are not vulnerability across thousands management, and certification learns new faults and adapts to deemed anomalous) is correlated or millions of instances of the of each new platform. In them, exploiting the size of an with information from other same application across the certain cases, running such AC to achieve both coverage (in instances of the application, to network. Such attacks have the multiplatform environments detecting faults) and fairness build behavior models. When a potential to rapidly cause wide- can even decrease the overall (in minimizing the amount of fault or deviation from normal spread disruption, as evidenced security of the network. additional work at each member). behavior is detected by a by several incidents over the last At the core of the system lies member, the relevant information few years. The severity of the Given the difficulties associated our work on selective transaction is broadcast to the rest of the problem has fueled research with artificial diversity and the emulation (STEM). STEM offers a AC. Members may verify the behind introducing diversity in pervasive nature of homoge- new model for partial emulation fault and apply STEM on the software systems. However, neous software systems, the of selected segments of legacy identified vulnerable code slices, creating a large enough number Network Security Laboratory applications, in order to detect possibly combining this with of truly different systems from has been working on software divergences from expected input filtering or other mitigation scratch not only presents self-healing techniques whereby behavior (e.g., compared to techniques. Because of the use practical challenges but can a large, homogeneous software another instance of the appli- of STEM, it is possible to wrap result in systems that are not base can be used to improve cation) or fault conditions (e.g., the necessary functionality diverse enough. security and reliability for the whole community of participating buffer overflow, illegal memory around existing applications, As a result, recent research has users. To that end, we introduced reference). STEM can be used without requiring source code focused on creating artificial the concept of an Application both for data collection/fault modifications. In fault-mitigation

8 CS@CU SPRING 2009 New Courses New Computer Science Courses

Sebastian Lahaie and Sergei management issues for mobile Vassilvitskii taught a new devices, wireless mobile course, COMS 6998-3: networking, thin clients and Introduction to Algorithmic mobile Web, location-aware Game Theory, in the Fall 2008 and other context-aware semester. Algorithmic game services, and virtualization. theory is an emerging area at the A course programming project intersection of computer science is required. The course was and microeconomics. Motivated mentioned in a Forbes.com by the rise of the internet and piece about mobile computing electronic commerce, computer courses around the country, and scientists have turned to models had a Spring 2009 enrollment where problem inputs are held of more than 70 students. by distributed, selfish agents (as opposed to the classical Professor model where the inputs are Kenneth Ross chosen adversarially). This new taught a new perspective leads to a host of course, COMS fascinating questions on the 4112: Database interplay between computation Systems and incentives. Implementation, This course provides a broad Professor in the Spring survey of topics in algorithmic Kenneth Ross 2009 semester. game theory, such as: algorith- This new course mic mechanism design; focuses on how to build a data- mode, STEM operates transac- combinatorial and competitive base engine. Topics covered tionally at function-call granularity, auctions; congestion and include storage structures, query i.e., it can rollback the current potential games; computation of processing and optimization, execution, and then force the equilibria; network games and concurrency control methods, application to take a different selfish routing; and sponsored recovery methods, and parallel execution path. We identify such search. No prior knowledge of and distributed databases. At paths by using the models of game theory is necessary; the the same time, the introductory application behavior built from most important prerequisite is database systems course (4111) “normal” runs. An alternative mathematical maturity. has been revamped, focusing on mechanism for implementing how to effectively use a data- this rollback functionality is Professor base system, and including new based on the process check- Jason Nieh material on object-relational pointing work done at the taught a new databases and XML. Network Computing Laboratory. course, COMS Professor Rocco Application Communities offer E6998: Mobile Computing Servedio taught a new paradigm in software a new course, security and reliability, by with Iphone and Android, in COMS 6998: exploiting the “wisdom of the Professor Advanced Jason Nieh the Spring 2009 masses” to conduct continuous, Topics in collaborative fault and attack semester. The course is an intensive study of Computational detection and mitigation. In an Complexity, environment where attackers mobile computing on smart- Professor phones with an emphasis on Rocco Servedio in the Spring routinely control and use 2009 semester. thousands of compromised applications. These handheld Internet devices are poised to The course is an intensive study systems to attack users and of concrete lower bounds for networks, the natural response become the future dominant software platform as a result various computational problems is for the defense to pool its in simple models of computation resources and information in of the rapid convergence of computers and mobile phones. such as decision trees, Boolean order to be effective. This work formulas, restricted classes of is currently underway under Topics covered include mobile operating systems and devel- Boolean circuits, and the like. In support from the U.S. Air this line of research unconditional Force and the National Science opment environments, input modalities and user interfaces lower bounds are established Foundation, with several which rely on no unproven patents filed. for mobile devices, power

CS@CU SPRING 2009 9 New Courses (continued) assumptions such as P vs NP. Professor understand the errors in There has been steady progress Junfeng Yang multi-threaded programs (e.g. made over the years using a taught a new races and deadlocks), students range of techniques from combi- course, COMS study their characteristics, natorics, algebra, analysis, and 6998-2: Topics and will also study effective other branches of mathematics. in Computer techniques to find these The course gives self-contained Science: concurrency errors. proofs of a wide range of Professor How to Make • Other reliability techniques. unconditional lower bounds for Junfeng Yang Reliable For example, students learn interesting and important models Software, in the the underlying mechanisms of computation, covering many Fall 2008 semester. Despite our of a popular dynamic tool of the “gems” of the field that increasing reliance on computing Valgrind and a commercial have been discovered over the platforms, making reliable virtual machine VMware. past several decades, right up to software systems remains difficult. Software errors have In addition, students have results from the last year or two. the opportunity to hear guest Students present papers and been reported to take lives and cost billions of dollars annually. speakers who have built large work on a research project as software systems or effective the main activities in the class. Making reliable software is one of the most important problems checking tools talking about in computer science. In their first-hand experiences. Professor Simha recent years, this problem has Sethumadhavan drawn huge attention from taught a researchers in systems, software new course, engineering, and programming language communities. A COMS 6998-1: Advanced number of automated techniques Computer have been developed to Professor Simha increase software reliability. Sethumadhavan Architecture: Parallel This course covers the most Systems and Programming, practical and most important in the Fall 2008 semester. of these reliability techniques. The predominant question for Specifically, students in the the Computer Architecture course study: community, and arguably for all • Practical bug-finding of the CS community, is how techniques. Students learn do we exploit the highly parallel effective techniques that have hardware that is available found thousands of serious to us today. Researchers errors in large systems, as have proposed pure software large as an entire Linux kernel. solutions, pure hardware • Fault isolation and recovery. solutions and some hardware/ Students learn techniques to software solutions, but there prevent a single component is no silver bullet. Yet, the failure from crashing an future IT successes completely entire system and to depend on the ability to extract completely recover from performance from parallel such component failures. hardware. This class teaches students techniques that may • Automated debugging. help them tackle this challenge. Debugging (i.e. finding the root cause of a bug) is usually The class is both for students a painful manual process; and practitioners and covers students learn techniques machine organization and design to make debugging more of parallel systems, the three automatic and less painful. dominant parallel programming models, performance analysis • Concurrency. CPUs are and optimizations. Students getting more cores; the also read and analyze recent implication is that multi- research on parallel systems. threaded programs will be the mainstream. To better

10 CS@CU SPRING 2009 Featured Lectures 2008-9 Distinguished Lecture Series

The Computer Science Department enjoyed Dr. Peter G. Neumann Professor Herbert another successful Distinguished Lecture of SRI International Edelsbrunner of Duke Series during the 2008-9 academic year. This visited on October 6 University visited on yearly series brings world-renowned scientists and spoke about November 10 and spoke and pioneers in academia and industry from Integrity of Elections. about Measuring Shape across the country to Columbia's campus. Before Simplification. Dr. Neumann is Principal These distinguished speakers give talks about their Scientist at SRI, where he has Professor Edelsbrunner is research, field questions from the audience, and meet been since 1971; his research also Founder and Principal at with faculty and students. The lectures are open to the deals with computer systems Geomagic, a software company public and announced and documented at our website and networks, trustworthiness in the field of Digital Shape (http://www.cs.columbia.edu/lectures), and ensure that with respect to security, Sampling and Processing. He members of the department and the community at large reliability, survivability, and safety, is a recipient of the Alan T. see first-hand the latest and greatest computer science and risks-related issues such Waterman Award from the research emerging outside the Columbia campus. This as voting-system integrity, National Science Foundation, year, six lecturers visited us and discussed a broad and crypto policy, social implications, a member of the American exciting range of topics. and privacy. Academy of Arts and Sciences, Elections demand end-to-end and a member of the German These speakers helped make this year an exciting and integrity of voting processes, Academy of Sciences, inspirational one for computer science at Columbia. with additional trustworthiness the Leopoldina. Professor Stay tuned next year, when more of the world’s greatest requirements such as system Edelsbrunner specializes in researchers will visit our department and tell us about security, privacy, usability, and the combination of computing their work. accessibility. The overall system and advanced mathematics to aspects present a paradigmatic solve problems in applications. hard problem. In today’s His methodology is to search systems and procedures, out the mathematical roots essentially everything is of application problems and a potential weak link. The to combine mathematical with pervasive nature of the risks is computational structure to astounding, with a serious lack get working solutions. of system architecture, good Nature is inherently multi-scalar; software engineering practice, Professor Edelsbrunner’s and understanding of security talk presented an attempt problems. Dr. Neumann’s talk to measure this aspect discussed limitations in existing mathematically. Rooted in systems, processes, standards, algebraic topology, this idea has and evaluations. It also ramifications inside and outside considered some possible mathematics. A particularly alternatives—including important application is coping nontechnological approaches, with noise in scientific data. As computer-based systems, and suggested by the title, the talk possible roles for cryptography. advocated measuring noise, but not necessarily removing it from the data, because doing so has side-effects.

CS@CU SPRING 2009 11 Featured Lectures (continued)

Professor Alberto • Access devices such as PDAs, Dr. Sangiovanni-Vincentelli’s talk Professor Bonnie John Sangiovanni-Vincentelli cell phones, and laptops, presented directions, challenges, of Carnegie Mellon which allow leveraging the and potential solutions to of the University of immense capabilities of the the design of future systems, University visited California, Berkeley, infrastructure to users that for which heterogeneous on December 1 and visited on November 19 can be humans or intelligent subsystems such as mechanical spoke about Cognitive and spoke about Quo physical systems; and electrical components Crash Test Dummies: Vadis System Design? • A swarm of sensors, actua- must be designed concurrently. Where We Are and tors and local computing The possible scenarios pose fundamental questions to the Where We’re Going. Professor Sangiovanni-Vincentelli capabilities “immersed in all engineering and scientific worlds holds the Buttner Chair of kinds of physical systems regarding how to deal with the Professor John has more than Electrical Engineering and that offer a wide variety design and management of 25 years experience in usability Computer Sciences at of personal or broad-use global systems with such huge analysis and design. She Berkeley. He was a cofounder services.” complexity. A unified design heads the Masters Program in of Cadence and Synopsys, Most refer to these swarms as methodology that can extend Human-Computer Interaction the two leading companies in embedded systems. Recently from cyber physical systems at Carnegie Mellon, where the area of electronic design there has been a growing (CPS) all the way down to she researches both human automation. He is the recipient interest in Cyber Physical chips, boards, and mechanical performance modeling and of numerous awards, an Systems (CPS) where the components with general software engineering and author of more than 800 interaction between the environments capable of hosting consults regularly in government papers and 15 books in the computing and electronic specific design flows for the and industry. Professor John is area of design tools and elements with the physical industry segments is the an ACM CHI Academy member, methodologies, a Fellow of systems they are immersed ultimate enabling technology. recognized for her contributions the IEEE, and a member of into is emphasized. CPS will The talk presented a potential to HCI through her work the National Academy of allow developing a wide span approach to such a unified in cognitive modeling and Engineering. In 1995 Professor of applications because of the design methodology, called the implications of usability Sangiovanni-Vincentelli availability of a new generation platform-based design (PBD), concerns on the design of received the worldwide of sensors, actuators, and and some examples of its use. software architecture. Graduate Teaching Award of local computing that leverage Crash dummies in the auto the IEEE for “inspirational novel interconnect capabilities teaching of graduate students.” industry save lives by testing the and centralized computation. physical safety of automobiles The electronics industry Dealing with system-level before they are brought to ecosystem is undergoing problems requires more than market. “Cognitive crash a radical change driven by simply developing new tools, dummies” save time, money, an emerging three-layered although of course they are and potentially even lives, by architecture characterized by: essential to advancing the state allowing computer-based system • Computing and communica- of the art in design. Rather, the designers to test their design tion infrastructure that will focus must be on understanding ideas before implementing those offer increasingly faster data the principles of system design, ideas in products and processes. transfer and manipulation the necessary changes to In her talk, Professor John via powerful data centers, design methodologies, and the reviewed the uses of cognitive compute farms and wired dynamics of the supply chain. models in system design and interconnection; Developing this understanding the current state of research and is necessary to define a sound practice. She also presented approach that meets the needs some exciting new research of the system and component directions that promise to make industries as they try to serve predictive human performance their customers better and modeling even more useful. develop their products more Along the way, her talk quickly and with higher quality. discussed the role of applica- tions in science and validity versus useful approximation.

12 CS@CU SPRING 2009 Professor Gail Murphy To combat this overload, Finally, Professor In Professor Moore’s talk, she of the University of Professor Murphy’s research Johanna Moore of the evaluated two recent approaches group has been building to information presentation in British Columbia approaches rooted in structure University of Edinburgh SDS: (1) the Refiner approach visited on March 2 and and inspired by memory models. visited on April 22 and (Polifroni et al., 2003) which spoke about Attacking As an example, the Mylyn spoke about Language generates summaries by Information Overload in project packages and makes Generation for Spoken clustering the options to Software Development. available the structure Dialogue Systems. maximize coverage of the that emerges from how a domain, and (2) the user-model programmer works in an based summarize and refine Professor Murphy works Professor Moore is Director of episodic-memory inspired (UMSR) approach (Demberg & primarily on building simpler the Human Communication interface. Programmers working Moore, 2006) which clusters and more effective tools to Research Centre, and Deputy with Mylyn see only the options to maximize utility with help developers manage Head of the School of information they need for a respect to a user model, and software evolution tasks. Informatics at the University task and can recall past task uses linguistic devices (e.g., She is a recipient of the UBC of Edinburgh. She is a Fellow of information with a simple click. discourse cues, adverbials) to Killam Research Fellowship, the Royal Society of Edinburgh Professor Murphy’s group has highlight the trade-offs among the AITO Dahl-Nygaard Junior and of the British Computing shown in a field study that the presented items. Prize, an NSERC Steacie Society, and has served as Mylyn makes programmers Fellowship, the CRA-W Anita President of the Association for To evaluate these strategies, more productive; the half a Borg Early Career Award, and Computational Linguistics and Professor Moore has gone million programmers now the University of Washington Chair of the Cognitive Science beyond the typical “overhearer” using Mylyn seem to agree. College of Engineering Society. Prof. Moore’s research evaluation methodology, in In her talk, Professor Murphy Diamond Early Career Award. brings together theory and which participants read or listen described the overload faced Professor Murphy is also a practice from computer science, to pre-prepared dialogues, which by programmers today and co-founder and CEO of Tasktop linguistics and cognitive science limits the evaluation criteria discussed several approaches Technologies Inc. to inform the development of to users’ perceptions (e.g., her group has developed to informativeness, ease of The productivity of software natural language technologies attack the problem, some comprehension). Using a Wizard- developers is continually for use in a wide variety of which may also pertain of-Oz methodology to evaluate degrading due to an inundation of applications, including beyond the domain of soft- the approaches in an interactive of information: source code is spoken dialogue systems, ware development. setting, Professor Moore has easier and easier to traverse and recommender systems, and shown that in addition to being to find, email inboxes are stuffed information retrieval from preferred by users, the UMSR to capacity, RSS feeds provide a multimodal archives. approach is superior to the continual stream of technology The goal of spoken dialogue Refiner approach in terms of updates, and so on. To enable systems (SDS) is to offer both task success and dialogue software developers to work efficient and natural access to efficiency, even when the user more effectively, software applications and services. A is performing a demanding engineering researchers often common task for SDS is to help secondary task. Finally, Professor deliver tools which produce even users select a suitable option Moore hypothesized that UMSR more information. The effect of (e.g., flight, hotel, restaurant) is more effective because it more and more tools producing from the set of options available. uses linguistic devices to more and more information is When the number of options highlight relations (e.g., trade- placing developers into overload. is small, they can simply be offs) between options and presented sequentially. However, attributes, and described as the number of options experimental results on users increases, the system must which supported this hypothesis. have strategies for summarizing the options to enable the user to browse the option space. Further details of the lecture series are available online at: http://www.cs.columbia.edu/lectures

CS@CU SPRING 2009 13 Department News & Awards

Ph.D. student Miklos Bergou Eitan Grinspun and Ravi called opportunistic controls, Multimodal Biometrics for received an Autodesk Research Ramamoorthi. His research in which naturally occurring Maritime Domain, between Fellowship Award. Miklos is a focuses on finding principled physical artifacts in a task domain Columbia University, University Ph.D. candidate in the Columbia representations and efficient are used to provide input to a of Maryland, University of Computer Graphics Group, algorithms that operate well user interface through simple Colorado at Colorado Springs, advised by Professor Eitan across a wide range of visual vision-based processing. Tactile and University of California at Grinspun. Miklos investigates scales. In work presented feedback from an opportunistic San Diego. the simulation and direction of at SIGGRAPH 2007, Charles control can make possible Professor Steven Feiner physical systems, numerically presented a solution to the long- eyes-free interaction. For served as General Chair of the modeling real-world objects and standing problem of normal map example, a ridged surface can 2008 International Conference the interactions between them. filtering. By reinterpreting normal be used as a slider or a spinning on Intelligent Technologies for His research seeks out principled mapping in the frequency-domain washer as a rotary pot. Interactive Entertainment. He and efficient discrete models as a convolution of geometry Ph.D. student Snehit Prabhu also served as General Co-chair that mirror the key geometric and BRDF, this work has enabled was awarded a Microsoft for the 2008 ACM Symposium properties of the physical accurate multiscale rendering Research and Live Labs Ph.D. on Virtual Reality Software and system. Miklos is also interested of normal maps at speeds Fellowship. Snehit is advised Technology, and as chair of the in developing intuitive tools orders of magnitude faster by Professor Itsik Pe’er; he best paper awards committee that can be used to control the than previously possible. transitioned from the MS to the for IEEE Virtual Reality 2008. behavior of these systems, with Sahar Hasan, a junior in the Ph.D. track in January 2008. Professor Feiner gave invited applications in engineering and Department of Computer Snehit’s research involves the talks on his research at the entertainment. His work on thin Science, and Ian Vo, a senior, application of computational Universidade Nova de Lisboa, shell simulations is currently were selected for Honorable models to high throughput DNA the University of Florida, the used by special-effects studios. Mention in the 2008 sequencing. He developed a International Symposium Ph.D. student Rebecca Outstanding Undergraduate method to analyze DNA from on Ubiquitous Virtual Reality Collins received an IBM Ph.D. Award competition held by pooled sets of individuals, using in Gwangju, Korea, the VTT Fellowship. Rebecca began the Computing Research error-correcting codes to identify Research Technical Research her Ph.D. studies at Columbia in Association (CRA). The CRA each person. This work has been Center in Espoo, Finland, the fall of 2005 and is advised honored a total of 22 female accepted to RECOMB 09 and and the opening of the by Professor Luca Carloni. and 44 male students in this selected as one of four papers Experimental Media and Rebecca’s research explores year’s competition. Sahar Hasan considered for the Genome Performing Arts Center at ways to harness the potential worked with Professor Gail Research special issue for the Rensselaer Polytechnic power of multi-core processor Kaiser on a project which was conference. Snehit has also Institute. Professor Feiner was systems for general purpose accepted for presentation been involved in applied analysis also invited speaker at the 2008 programming. Rebecca at SIGCSE 2009, the annual of sequenced organisms, IBM Research Symposium on has developed a tool that conference of the ACM Special and is a joint-first-author on a Human-Computer Interaction automatically generates parallel Interest Group on Computer publication describing the first at TJ Watson Research Center code which integrates distributed Science Education. The paper animal mutant whose genome and at Visual Computing Trends scheduling and adaptive memory is entitled Retina: Helping was assembled—a worm with 2009 in Vienna, Austria, and management into SPMD-like Students and Instructors Based two right “brains” (Nature spoke at invited panels at threads running on the cores. on Observed Programming Methods 2008). the 2008 IEEE International Rebecca is also developing a Activities. Ian Vo wrote a paper Symposium on Wearable Sean White method that combines static titled Quality Assurance of Ph.D. student Computers and the 2008 will speak on behalf of his task deployment with dynamic Software Applications Using the IEEE and ACM International fellow Ph.D. candidates at runtime schedules to flexibly In Vivo Testing Approach, which Symposium on Mixed and the convocation for doctoral balance the computational load was accepted for presentation Augmented Reality. candidates in the schools among unbalanced stream at 2009 ICST, the 2nd IEEE of Architecture, Business, Professor Steven Feiner is a tasks for stream programs, and International Conference on Engineering, Journalism, member of the new Games For increase the overall processing Software Testing, Verification Law, Nursing, Physicians Learning Center at NYU, a multi- throughput. and Validation. and Surgeons, Public Health, disciplinary, multi-institutional Ph.D. student Charles Han Ph.D. student Steve Henderson and Teachers College. The games research alliance was awarded an ATI 2008-2009 received the Best Paper Award convocation is held on Monday, with support from Microsoft Fellowship. ATI’s highly selective at ACM VRST 2008, held in May 18, 2009, at 3:30 PM Research that is working to panel awards between four and Bordeaux, France. Steve is in the Chapel. Congratulations provide fundamental scientific six fellowships each year to advised by Professor Steven to Sean on this honor! evidence to support games outstanding doctoral students Feiner. The award-winning as learning tools for math Professors Peter Belhumeur studying a broad range of topics paper, Opportunistic controls: and science subjects among and Shree Nayar have received spanning computer graphics, Leveraging natural affordances middle-school students. a 5 year Multidisciplinary multimedia, chip or system as tangible user interfaces University Research Initiative To ny Jebara was design, or related research. for augmented reality, was Professor (MURI) from the Office of featured in a Business Week Charles is a Ph.D. student in coauthored by Steve Henderson Naval Research. The total article on Mapping a New the Computer Graphics Group, and Professor Feiner. It presents award is for $7.5 million and is Mobile Internet. co-advised by Professors a class of interaction techniques, for a joint project on Remote

14 CS@CU SPRING 2009 Professors Angelos Keromytis Professors Tal Malkin, Sal is driven by a novel approach Professor Itsik Pe’er received a and Sal Stolfo received a Stolfo, To ny Jebara, Steve for detecting relatedness of grant from the National Institute research gift from Symantec Bellovin, Vishal Misra, and individuals from high throughput of Health (NIH) to support a to aid their studies and investi- Dan Rubenstein. The team data of DNA variation across project investigating the genetic gations on techniques for will develop a next-generation millions of genetic markers and basis of schizophrenia with scalable program whitelisting, network-trace anonymization tool tens of thousands of individuals. computational analysis. The for common software that preserves individual and The approach is based on a study is led by Pablo Gejman vulnerability discovery across organizational privacy while still new linear-time algorithm that (Northwestern) and Douglas large software installations, allowing cross-trace correlation stores words of marker data in Levinson (Stanford). The PIs and for vulnerability-oriented for detection, understanding, and a dictionary of genetic variants. are conducting a genomewide analytics for risk assessment. prevention of complex attacks This framework will handle study for the association and other network behavior. The the diversity of human genetic between genetic variants Professors Angelos Keromytis tool will rely on three techniques data in terms of populations and schizophrenia, a highly and Sal Stolfo received a 30- that will be developed at and experimental platforms heritable disease, but without month grant titled Automated Columbia University: (1) Hidden culminating in a complete map clear common genetic factors. Creation of Network and Markov Model (HMM)-based of human genetic genealogy. The research project started Content Traffic For the National clustering will be used to divide in May 2008. from DARPA, Itsik Pe’er received Cyber Range raw network traces into groups Professor as part of a team led by BAE. a four-year National Institute of Itsik Pe’er won for which statistical and other Professor A team of researchers led by Health (NIH) R01 award titled second prize, among 200 properties can be preserved BAE won a DARPA contract Genomewide survey of allele- entries, in a genome sequencing across the anonymized to design and build a National specific somatic copy number award competition from equivalents; (2) more aggressive Cyber Test Range, a facility to changes, together with Tom Applied Biosystems. The application- and definition-specific test and evaluate new security LaFramboise of Case Western company will sequence 30 anonymization will prevent technologies to defend the University and Matthew billion “bases”, i.e., letters of recovery and attribution of private Internet from cyber attack. Freedman (Harvard) as a sub- DNA. Professor Pe’er and his topology, flow, and content contractor. Cancer is a genetic collaborators will generate Columbia University spin-off information under attack; and disease in two levels: First, the genomic sequence for company, StackSafe, Inc., is (3) efficient and application- like many diseases inherited individuals from an isolated the winner of the 2008 National specific secure computation mutations at the individual level Pacific population, aiming University Start-Up Competition. will allow this clustering and may contribute to disease risk to provide insights on the “Beating out more than 400 anonymization without central- and susceptibility in families. genetics of obesity, lipid levels entrants from across the country, izing or revealing the contents Second, and more specific to and diabetes. StackSafe was awarded first of individual raw traces during cancer, it is a genetic disease prize after a rigorous assessment the clustering stage. Professor Ravi Ramamoorthi of cells, that acquire mutations by an online panel of over won a 2007 Presidential Early Professors Vishal Misra and during the lifetime of the patient, 300 venture capitalists, angel Career Award for Scientists Dan Rubenstein co-authored transforming the cells from investors, and university judges.” and Engineers. Professor the paper Opportunistic Use of normal tissue into tumors. The company employs patent- Ramamoorthi was cited for Client Repeaters to Improve Professor Pe’er proposes a pending network security and “investigating foundations of Performance of WLANs, which novel computational method to intrusion detection technologies the visual appearance of objects received the Best Paper Award process signals from tumor DNA developed by Professor Angelos and developing mathematical at ACM CoNext 2008 in Madrid. to detect interaction between and Salvatore Stolfo. Keromytis The paper was co-authored these two kinds of mutations. and computational models for StackSafe, based in Vienna, with Victor Bahl (Microsoft recreating complex scenes for Virginia, leverages virtualization Professor Itsik Pe’er has been Research), Ranveer Chandra automated image understanding, technology to reduce costly participating in the Severe (Microsoft Research), Patrick P. and for mentoring undergraduate IT production problems and Adverse Effects Consortium, an C. Lee (Columbia U.), Jitendra and graduate students.” The downtime. StackSafe’s flagship industry-academia partnership to Padhye (Microsoft Research), awards are the nation’s highest product, Test Center, enables investigate the genetic basis to and Yan Yu (Google). honor for faculty members that IT departments to stage their fatal or near-fatal adverse reaction are beginning their independent existing software infrastructure, Professor Steve Nowick was to commonly used medications. research careers. conduct pre-deployment tests, named an IEEE Fellow for his The consortium brings together Professors Simha and predict IT complications in contributions to asynchronous major pharmaceuticals (Abbott, Sethumadhavan, Angelos a secure environment. and mixed-timing integrated Daiichi Sankyo, GlaxoSmithKline, Keromytis, and Sal Stolfo were circuits and systems. Johnson & Johnson, Novartis, The Department of Homeland Pfizer, Roche, Sanofi Aventis, awarded a $650,000 equipment Security (DHS) is supporting a Professor Itsik Pe’er received Takeda, Wellcome Trust and grant from the Air Force Office new three-year project titled a three-year grant from the Wyeth) with joint interest in for Scientific Research (AFOSR), Privacy Preserving Sharing National Science Foundation safely prescribing drugs by titled SCOPS: Secure Cyber of Network Trace Data, in within the Program for Emerging means of personalized medicine. Operations and Parallelization cooperation with BAE Systems Models and Technologies to Columbia is serving as the infor- Studies Cluster. The funds will be National Security Solutions Inc. reconstruct a genealogy of the matics center, with Aris Floratos used to deploy a new compute The project will be lead by human species. The project (C2B2) and Professor Pe’er. cluster capable of continuous

CS@CU SPRING 2009 15 Department News & Awards line-speed capture and near- online analysis of network traffic The Department at Play and for storage and analysis of large traces generated from run-time profiling of (legacy) Computer Science department students applications. The computational and faculty participate in a range of capabilities provided by this cluster will allow detailed organized social activities. Here are some modeling and in depth analysis photos from recent hiking and ski trips. of real world scenarios. Professor Joseph Traub, Edwin Howard Armstrong Hike on the Hudson Professor of Computer Science, gave a College of Computing Distinguished Lecture at Georgia Tech. The title of his lecture was Exponential Improvement in Qubit Complexity. Together with Corinna Cortes, Professor Vladimir Vapnik has been awarded the Paris Kanellakis Theory and Practice Award for their revolutionary development of a highly effective algorithm known as Support Vector Machines (SVM). The Kanellakis Award honors specific theoretical accomplishments that significantly affect the practice of computing. Support Vector Machines are a set of related supervised learning methods used for data classification and regression common in the field Ski Trip to Hunter Mountain of artificial intelligence. As a result of this work, SVM is one of the most frequently used algorithms in machine learning, and is used in medical diagnosis, weather forecasting, and intrusion detection among many other practical applications.

16 CS@CU SPRING 2009 CS@CU SPRING 2009 17 Alumni News

Hrvoje Benko (Ph.D. ‘07) was on four released titles: FIFA 06 Ta run Kapoor (M.S. ‘01) will be Akira Tsukamoto (M.S. ‘03) pictured in a feature article on and FIFA 07 for , PS2, joining Colibria, a Norwegian- writes, “I moved to Sony the front page of the New York Nintendo GameCube, and PC; based company specializing in Computer Entertainment in Times Business section. The and FIFA 08 and FIFA 09 for IM & Presence Services for 2006 and have been working on article, Microsoft Mapping and PS3. Mobile Operators and Service Cell Broadband Engine (Cell/B.E.) Course to a Jetsons-Style It’s been a rollercoaster Providers, as Product Director processor projects jointly with Future, is about technology ride where learning and of e2e solutions for SIP-based IBM and Toshiba since then. that is being developed at contributions are at a maximum solutions, where he will focus Currently I am leading an open Microsoft, including Hrvoje’s point in every development cycle. on helping with the SIP strategy source project called MARS to gesture recognition system. First working on the demos & solutions for the operators make programming easier and and service providers market. more efficient on Cell/B.E., which Matthew Blaschko (SEAS ‘01) to gather my first This position will be based out is used on PLAYSTATION 3 writes, “I’ve just finished my and console development experience, then leading an area, of Oslo, Norway. and IBM Cell Blades (also used Ph.D. at the Max Planck Institute on the first 1 petaflops super for Biological Cybernetics in and lately being responsible for Han Liang (M.S. ’06) writes, computer, IBM Roadrunner). Tuebingen, Germany, and am the design, implementation, and “A f ter I graduated, I worked for MARS is a multi-tasking execution of innovative features IBM for a year and half as a soft- now at the University of Oxford programming framework for as a postdoctoral research fellow like Be A Pro (conceived in FIFA ware engineer in Massachusetts. and adopted by all EA Sports Cell/B.E. processors which where I work on computer Then I joined TripAdvisor simplifies maximizing the vision and machine learning. titles) and Player Reactions, (www.tripadvisor.com) as a usage of SPE cores on the If anyone wants to get in among others. senior software engineer in 2008. Cell/B.E. MARS is SPE centric, touch, send an email to Columbia University has been Last April, TripAdvisor began in that each SPE runs a micro [email protected].” invaluable for preparing me for to build a Chinese version of kernel that provides light the challenges I faced and the their site, and I was working Casey Callendrello (SEAS ‘07) weight context switching with ones to come. I’m proud of my on a project team to help them writes, “I’m currently a Senior synchronization with other work and I encourage everyone launch their site. Currently I Network Operations Engineer SPEs and generally runs to pursue your inner dreams am permanently relocated to at Akamai, and I’ve been having independent of the PPE core. and accomplish those goals. the newly established Beijing a great time there. It certainly Detailed documentation is at office, and switched my role justified the long hours I spent in You can see more about http://ftp.uk.linux.org/pub/ to manage the search engine the windowless confines of the what I’ve been doing at linux/SonyPS3/mars/1.0.1/ marketing (SEM), search engine Columbia NOC and the Network www.sebastianenrique.com.ar.” mars-docs-1.0.1/html/ optimization (SEO) efforts here, and a download is available at Lab. I’m also volunteering at a Eleazar Eskin and some product management. Boston-based charity called Bikes (Ph.D. ‘02) has http://ftp.uk.linux.org/pub/ This office will expand to roughly linux/Sony-PS3/mars/.” not Bombs, working on their been named 15 headcounts this year, and I website and other technology. a 2009 Sloan am enjoying this startup feeling Bikes not Bombs takes donated Foundation very much.” bicycles, repairs them, and Research sends them to needy people Fellow. The Ramiro Ordonez (SEAS ‘04) in developing countries. Our Sloan Research writes, “I am working for most recent shipment went Fellowships seek to stimulate Charles River in Boston (after to a group called Maya Pedal, fundamental research by early- the fall of Lehman Brothers, a Guatemalan organization career scientists and scholars of which forced me to leave NYC). that turns bicycles into pedal- outstanding promise. These two- Things are really good for me powered farm machinery.” year fellowships are awarded here considering the economy, and I am looking forward to Sebastian Enrique (M.S. ‘05) yearly to 118 researchers in being more involved with the writes, “When I left Columbia recognition of distinguished faculty and alumni.” I moved from New York to performance and a unique Vancouver to fulfill one of my potential to make substantial Peter Ottomanelli (SEAS ‘03, dreams: to work in the game contributions to their field. M.S. ‘05) and his wife Lisa industry. It’s been more than Eleazar works in the area of (CC ‘05) welcomed their first 3 years and I can say that I’ve molecular biology and is an son on Christmas morning successfully accomplished it assistant professor at UCLA. (12/25/2008) this past year. by working on the FIFA video He was advised by Professor Kundan Singh (Ph.D. ‘06) was game franchise at Electronic Sal Stolfo. married to Mamta Singh, also Arts Canada. In all that period a computer professional, on I have worked as a Software November 29th in Kanpur. Engineer and Game Designer

18 CS@CU SPRING 2009 Recent & Upcoming PhD Defenses

Paul Blaer John This thesis examines the perform- Frank Enos Advisor: Professor Cieslewicz ance of several core database Advisor: Peter Allen Advisor: Professor operations on different chip Professor Julia Kenneth Ross multiprocessor architectures. Hirschberg View Planning Performance challenges are for Automated Architecture- identified and architecture- Detecting Site Modeling Sensitive sensitive algorithms designed Deception Abstract: Database Query specifically for chip multi- in Speech Constructing highly detailed 3-D Processing processors are proposed and This dissertation describes work models of large complex sites on Chip Multiprocessors verified. Three core challenges on the detection of deception in using range scanners can be a Abstract: Microprocessor design impact database performance speech using the techniques of time consuming manual process. is currently in the midst of a on chip multiprocessors: (1) spoken language processing. The One of the main drawbacks sea change, which will have a identifying sufficient parallelism accurate detection of deception in is determining where to place significant impact on all computer with database operations to take human interactions has long been the scanner to obtain complete applications including database advantage of on-chip parallelism, of interest across a broad array coverage of a site. We have management systems, the (2) managing shared on-chip of contexts and has been studied developed a system for automatic focus of this thesis. For the past resources including memory in a number of fields, including view planning called VuePlan. several decades, microprocessor cache, memory bandwidth, and psychology, communication, and The system proceeds in two performance has been driven execution units, and (3) balancing law enforcement. The detection distinct stages. In the initial primarily by higher clock rates and the aforementioned resource of deception is well-known to be phase, the system is given a 2-D the subsequent faster execution sharing with inter-thread a challenging problem: people are site footprint with which it plans of a single stream of instructions communication overhead. notoriously bad lie detectors, and a minimal set of sufficient and (a single thread). Faster chip The first part of this thesis no verified approach yet exists properly constrained covering speeds require more power and examines the use of simultaneous that can reliably and consistently views. We then use a 3-D laser produce more heat. A single multithreading technology to catch liars. To date, the speech scanner to take scans at each of thread also has difficulty providing help overcome performance signal itself has been largely these views. When this planning enough work to keep a chip busy bottlenecks due to high memory neglected by researchers as a system is combined with our because applications often have latency in common database source of cues to deception. Prior mobile robot, it automatically to wait for high latency operations join algorithms. Then a highly to the work presented here, no computes and executes a tour such as loading data from memory, parallel supercomputer is used comprehensive attempt has been of these viewing locations and an operation that becomes more to demonstrate the benefits of made by speech scientists to acquires them with the robot’s expensive in terms of processor massively parallel architectures apply state-of-the-art speech onboard laser scanner. These cycles the faster a chip is. for data processing tasks. processing techniques to the initial scans serve as an approxi- For these reasons, processor Finally, the last sections examine study of deception. This work mate 3-D model of the site. The architects have recently begun to core database operations on a uses a set of features new planning software then enters a improve processor performance commodity chip multiprocessor to the deception domain in second phase in which it updates by increasing the degree of on- with the highest currently available classification experiments, this model by using a voxel- chip thread-level parallelism degree of on-chip parallelism. statistical analyses, and speaker- based occupancy procedure to rather than the chip speed. The Chip multiprocessors are shown and group-dependent modeling plan the next best view (NBV). processor designs featuring to have significant benefits for approaches, all designed to This next best view is acquired, on-chip parallelism are termed database performance, but identify and employ potential and further NBVs are sequentially chip multiprocessors and include achieving optimal performance cues to deception in speech. This computed and acquired until an multi-core and simultaneous multi- requires database algorithms to be dissertation shows that speech accurate and complete 3-D threading processors, both of crafted with on-chip parallelism as processing techniques are model is obtained. A simulator which are examined in this thesis. a design parameter from the start. relevant to the deception domain tool which we developed has The move toward on-chip paral- by demonstrating significant allowed us to test our entire view lelism has important implications statistical effects for deception planning algorithm on simulated for computer applications. on a number of features, both sites. We have also successfully Database management systems in corpus-wide and subject- used our two-phase system to are a core computer application dependent analyses. Results also construct precise 3-D models of used across disciplines within show that deceptive speech can real world sites located in New computer science and in almost be automatically classified with York City: Uris Hall on the campus every sector of the economy. some success: accuracy is better of Columbia University and Fort This means that database than chance and considerably Jay on Governors Island. performance has a big impact better than human hearers on most data related activities. performing an analogous task.

CS@CU SPRING 2009 19 Recent & Upcoming PhD Defenses (continued)

The work also examines speaker to produce a short utterance To reason about the RAS capa- impact on system operation. Third, and group differences with conveying continued attention. bilities of the systems of today that combining practical fault- respect to deceptive speech, and Additionally, we describe a or the self-* systems of tomorrow, injection tools with mathematical we report a number of findings in series of studies of affirmative there are three evaluation- modeling techniques based on this regard. We provide a context cue words— a family of cue related challenges to address. Markov Chains, Markov Reward for our work via a perception study words such as ‘okay’ or ‘alright’ First, developing (or identifying) Networks and Control Theory can in which human hearers attempted that speakers use frequently in practical fault-injection tools that be used to develop a benchmark- to identify deception in our corpus. conversation for several purposes: can be used to study the failure ing methodology for evaluating Through this perception study we for acknowledging what the behavior of computing systems and comparing the reliability, identify a number of previously interlocutor has said, or for cueing and exercise any (remediation) availability and serviceability unreported effects that relate the start of a new topic, among mechanisms the system has (RAS) characteristics of the personality of the hearer to others. We find differences in available for mitigating or resolving computing systems. deception detection ability. An the acoustic/prosodic realization problems. Second, identifying This thesis demonstrates how the additional product of this work is of such functions, but observe techniques that can be used 7U Evaluation Method can be used the CSC Corpus, a new corpus of that contextual information to quantify RAS deficiencies in to evaluate the RAS capabilities deceptive speech. figures prominently in human computing systems and reason of real-world computing systems disambiguation of these words. about the efficacy of individual and in so doing makes three We also conduct machine learning or combined RAS-enhancing contributions. First, a suite of run- Agustin mechanisms (at design-time Gravano experiments to explore the auto- time fault-injection tools (Kheiron matic classification of affirmative or after system deployment). tools) able to work in a variety Advisor: Professor cue words. Finally, we examine Third, developing an evaluation of execution environments is Julia Hirschberg a novel measure of speaker methodology that can be used developed. Second, analytical to objectively compare systems Turn-Taking entrainment related to the usage tools that can be used to construct based on the (expected or and Affirmative of these words, showing its mathematical models (RAS actual) benefits of RAS-enhancing Cue Words association with task success models) to evaluate and quantify mechanisms. in Task-Oriented Dialogue and dialogue coordination. RAS capabilities using appropriate This thesis addresses these three metrics are discussed. Finally, Abstract: As interactive voice challenges by introducing the the results and insights gained response systems spread at Rean Griffith 7U Evaluation Methodology, from conducting fault-injection a rapid pace, providing an a complementary approach to experiments on real-world increasingly more complex Advisor: Professor traditional performance-centric systems and modeling the system functionality, it is becoming clear Gail Kaiser evaluations that identifies criteria responses (or lack thereof) using that the challenges of such The 7U for comparing and analyzing RAS models are presented. In systems are not solely associated Evaluation existing (or yet-to-be-added) conducting 7U Evaluations of to their synthesis and recognition Method: RAS-enhancing mechanisms, is real-world systems, this thesis capabilities. Rather, issues such as Evaluating able to evaluate and reason about highlights the similarities and the coordination of turn exchanges Software Systems via Runtime combinations of mechanisms, differences between traditional between system and user, or the Fault-Injection and Reliability, exposes under-performing performance-oriented evaluations correct generation and under- Availability and Serviceability mechanisms and highlights the and RAS-oriented evaluations standing of words that may convey (RAS) Metrics and Models lack of mechanisms in a rigorous, and outlines a general framework multiple meanings, appear to Abstract: Renewed interest in objective and quantitative manner. for conducting RAS evaluations. play an important role in system developing computing systems The development of the 7U usability. This thesis explores that meet additional non-functional Evaluation Methodology is based those two issues in the Columbia requirements such as reliability, Andrew on the following three hypotheses. Games Corpus, a collection high availability and ease-of- Howard First, that runtime adaptation of spontaneous task-oriented management/self-management provides a platform for imple- Advisor: Professor dialogues in Standard American (serviceability) has fueled menting efficient and flexible Tony Jebara English. research into developing systems fault-injection tools capable of in- We provide evidence of the that exhibit enhanced reliability, Large Margin situ and in-vivo interactions with existence of seven turn-yielding availability and serviceability (RAS) Transformation computing systems. Second, that cues—prosodic, acoustic capabilities. This research focus Learning mathematical models such as and syntactic events strongly on enhancing the RAS capabilities Abstract: With the current Markov chains, Markov reward associated with conversational of computing systems impacts not explosion of data coming from networks and Control theory turn endings—and show that the only the legacy/existing systems many scientific fields and industry, models can successfully be likelihood of a turn-taking attempt we have today, but also has machine learning algorithms are used to create simple, reusable from the interlocutor increases implications for the design and more important than ever to help templates for describing specific linearly with the number of development of next generation make sense of this data in an failure scenarios and scoring the cues conjointly displayed (self-managing/self-*) systems, automated manner. The support system’s responses, i.e., studying by the speaker. We present which are expected to meet these vector machine (SVM) has the failure-behavior of systems, similar results related to six non-functional requirements with been a very successful learning and the various facets of its backchannel-inviting cues— minimal human intervention. algorithm for many applied remediation mechanisms and their events that invite the interlocutor settings. However, the support

20 CS@CU SPRING 2009 vector machine only finds introduced to solve larger that they should capture, hence models help predict the efficiency linear classifiers so data often problems and extend the basic hurting output accuracy and and output quality of a variety of needs to be preprocessed with framework to a multitask setting. completeness. At the same time, query execution strategies. Finally, appropriately chosen nonlinear Another cost function based query processing decisions, such we also consider the common mappings in order to find a model on kernel alignment is explored as the choice of information scenario where information with good predictive properties. to learn matrix mixture of extraction systems or document extraction systems report the These mappings can either transformations. Maximizing retrieval strategies, also impact extracted tuple together with take the form of an explicit the alignment with a cardinality the output quality. Finally, scores that reflect the confidence transformation or be defined constraint on the mixture weights depending on the nature of the in the correctness of the extracted implicitly with a kernel function. gives rise to approximation information need, users may have tuples. Specifically, we present Automatically choosing these algorithms with constant factor varying preferences regarding the query processing algorithms that mappings has been studied under approximations similar to execution efficiency and quality leverage these confidence scores the name of kernel learning. These the Max-Cut problem. These expected from the querying and efficiently produce the methods typically optimize a cost methods are then applied to process. This dissertation builds high-confidence tuples, in turn function to find a kernel made up the task of learning monotonic on the critical observation that, discarding extracted tuples that of a combination of base kernels, transformations which are built in addition to efficiency, which are likely to be incorrect. thus implicitly learning mappings. from a mixture of truncated ramp is important just as in traditional In summary, this thesis presents This dissertation investigates functions. Experimental results for relational query optimization, the a principled query optimization methods for choosing explicit synthetic data, image histogram output quality of an execution approach for processing transformations automatically. classification, text classification is critical. structured queries over text data- This setting differs from the and gender recognition In our extraction-based scenario, bases, taking into consideration kernel learning framework by demonstrate the utility of these query processing can be both the efficiency and the output learning a combination of base learned transformations. decomposed into a sequence of quality of the query execution transformations rather than basic steps: retrieving relevant text strategies. Our hope is that the base kernels. This allows prior Alpa Jain documents, extracting relations contributions of this work will help knowledge to be exploited from the documents, and joining shrink the gap between structured in the functional form of the Advisor: Professor extracted relations for queries databases and text databases, transformations which may not Luis Gravano involving multiple relations. Each by enabling the seamless and be easily encoded as kernels. Query of these steps presents different expressive querying of all Additionally, kernel based SVMs Processing alternatives and together they available information, regardless are often hard to interpret because over Relations form a space of possible query of whether it is in structured they lead to complex decision Extracted from execution strategies. Our goal databases or embedded in boundaries which are only linear Text Databases is to consider the user-specified natural language text. in the implicitly defined space. requirements for execution Abstract: Text documents often However, by working with explicit efficiency and quality, and choose embed data that is structured in mappings, models with an intuitive an execution strategy for each Anshul nature, and this structured data meaning can be learned. The query based on a principled, Kundaje is increasingly exposed using learned transformations can be cost-based comparison of the information extraction systems. Advisor: Professor visualized to lend insight into the alternative execution strategies. Information extraction systems Itsik Pe’er problem, and the hyperplane We first introduce a simple, generate structured relations Predictive weights indicate the importance integrative optimization approach from documents, thus enabling Models of Gene of transformed features. for processing queries involving expressive, structured queries Regulation The two basic models that will single as well as multiple over text databases. This Abstract:Inferring gene regulatory be studied are the mixture of extracted relations. This approach dissertation studies the problem networks and signal transduction transformations and the matrix considers each execution strategy of processing structured queries pathways from high-throughput mixture of transformations. The as a whole and exploits database- over relations extracted from text genomic and proteomic data is matrix mixture reduces to mixture specific statistics to predict the databases. one of the central problems in of transformation learning when execution strategy characteristics. To process structured queries over computational biology. There has the matrix M is rank 1 and mix- We then move towards an text databases, we face multiple been an explosion in the quantity ture of kernel learning when M is in-depth understanding of the challenges. One key challenge is and types of high-throughput a diagonal matrix. First, greedy impact of each component of an efficiency: information extraction biological data. These datasets algorithms are proposed to execution strategy on the overall is a time-consuming process, tend to be noisy, incomplete, high- simultaneously learn a mixture execution. With this in mind, we so query processing strategies dimensional and undersampled. of transformations and a large rigorously analyze the critical should minimize the number of Due to the complexity of the margin hyperplane classifier. Then, components of an execution documents that they process. biological systems and a constant a convex semidefinite algorithm is strategy and build statistically Another key challenge is output evolution of our understanding of derived to find a matrix mixture of robust representations for quality: information extraction these systems, building flexible transformations and large margin information extraction systems, systems are often far from perfect, and robust models that aid hyperplane. More efficient as well as statistical models for and might output erroneous automatic hypothesis generation algorithms based on the document retrieval strategies and information or miss information remains a challenge. extragradient method are join processing algorithms. These

CS@CU SPRING 2009 21 Recent & Upcoming PhD Defenses (continued)

We present a new predictive Sujit to achieve a desired scene Patrick Lee modeling framework for decipher- Kuthirummal composition in a single image. The Advisor: Professor ing gene regulation. The core Advisor: Professor proposed system consists of a Vishal Misra computational approach is based Shree Nayar conventional camera that images on the GeneClass and MEDUSA the scene reflected in a flexible Achieving algorithms. MEDUSA integrates Flexible Imaging mirror sheet. By deforming the Network promoter sequence, mRNA for Capturing mirror we can generate a wide Robustness expression, and transcription Depth and and continuous range of smoothly with factor occupancy (ChIP chip) data Controlling Field of View and curved mirror shapes, each of Diversity Overlay Routing to learn regulatory programs that Depth of Field which results in a new field of Abstract: Network systems, in predict the differential expression Abstract: Over the past few view. This system enables us to both wired and wireless domains, of target genes. MEDUSA is centuries cameras have greatly realize a wide range of scene- traditionally forward data along able to learn context-specific evolved to better capture our to-image mappings, in contrast a single best routing path for regulatory networks and discover visual world. However, the funda- to conventional imaging systems communication. Given that regulatory sequence motifs ab- mental principle has remained that yield a fixed or a fixed set of network communication has initio. We design new techniques the same—the camera obscura. scene-to-image mappings. become a major part of today’s for extracting biologically and Consequently, though cameras All imaging systems that use economy, it is important to statistically significant information today can capture incredible curved mirrors (including the ones guarantee the robustness of from the learned regulatory photographs, they still have certain above) suffer from the problem of network communication toward models. We show the utility of this limitations. Recent years have defocus due to mirror curvature; general failures or malicious approach in analyzing several seen several efforts to overcome due to local curvature effects attacks. However, there are interesting biological contexts these limitations and extend the the entire image is usually not fundamental constraints on (environmental stress responses, capabilities of cameras through in focus. We use the known re-engineering existing network DNA-damage, hypoxia) in the the paradigm of computational mirror shape and camera and systems to use robustness budding yeast Saccharomyces imaging— capture the scene in lens parameters to numerically mechanisms. First, network cerevisiae. We present wet- a coded fashion, which is then compute the spatially varying systems are not amenable to lab experiments to validate decoded computationally in soft- defocus blur kernel and then centralized solutions because hypotheses generated by our ware. This thesis subscribes to explore how we can use spatially they usually span multiple models. Experimental results this philosophy. In particular, we varying deconvolution techniques administrative domains. Also, it demonstrate that MEDUSA is present several imaging systems to computationally “stop-up” is non-trivial to modify low-layer able to achieve high prediction that enable us to overcome limi- the lens—capture all scene hardware and protocols of legacy accuracy and reveal context- tations of conventional cameras elements with sharpness while network systems to support new specific regulatory mechanisms and provide us with flexibility in using larger apertures than what robustness mechanisms. even on small datasets. how we capture scenes. is usually required in curved To overcome the fundamental We present extensions to the First, we present a family of mirror imaging systems. robustness challenges, we model for integrating interactome imaging systems called radial Finally, we present an imaging propose to adapt existing network data such as signaling and imaging systems that capture a system with flexible depth of systems into robust architectures protein-protein interaction scene from a large number of field. We propose to translate the using the concept of overlays. data. We use a graph-based viewpoints, instantly, in a single image detector along the optical Layered atop the physical net- representation of the interactome image. These systems consist of axis during the integration of work, overlays provide a flexible data for reconstructing active a conventional camera looking a single image. We show that platform for building robustness signaling pathways. We through a hollow conical mirror by controlling the motion of the mechanisms. In particular, demonstrate that our methods whose reflective side is the detector—its starting position, we seek to design distributed scale well for predictive model- inside. By varying the parameters speed, and acceleration—we solutions that are amenable to ing in higher eukaryotes such as of the cone we get a continuous can manipulate the depth of field nodes that can communicate C. elegans and humans. We also family of imaging systems. We in new and interesting ways. We with one another, but are present an off-the-shelf software demonstrate the flexibility of this demonstrate capturing scenes independently operated and tool for learning, interpreting and family—different members of this with large depths of field, while prefer to make local decisions visualizing predictive models of family can be used for different using large apertures to maintain on link costs and route choices. gene regulation. In this thesis, applications. One member is well high signal-to-noise ratio. We we introduce algorithms and This thesis proposes to use suited for reconstructing objects also show how we can capture diversity overlay routing to software that make powerful with fine geometry such as 3D scenes with discontinuous, tilted new techniques for modeling address two robustness problems: textures, while another is apt for or non-planar depths of field. (1) resilient multipath routing, in gene regulatory networks broadly reconstructing larger objects accessible to the biological All the imaging systems presented which distributed algorithms are such as faces. Other members of proposed to route data along community. this family can be used to capture here subscribe to the philosophy of computational imaging. This multiple available network paths texture maps and estimate the in order to minimize the loss of BRDFs of isotropic materials. approach is particularly attractive as with Moore’s law computations throughput due to node or link We then present an imaging become increasingly cheaper, failures, and (2) network fault system with a flexible field of enabling us to push the limits of correction, in which an end- view—the size and shape of how cameras can capture scenes. to-end inference approach is the field of view can be varied proposed to aggregate information

22 CS@CU SPRING 2009 from multiple paths in order to failure probability given a network For mobile visualization, we minimize the cost of diagnosing with faulty nodes. By taking into present results from a field study and repairing network failures. account the cost of correcting and task analysis of botanists First, we develop a distributed faulty nodes, we prove that an performing species identification resilient multipath solution that optimal strategy should start with in the field. We develop an routes data across available checking one of the candidate iteratively designed, extensible network paths to improve routing nodes, which are identified based Electronic Field Guide (EFG) resilience in wired overlay net- on a potential function that we system and architecture, a works. Our solution allows routing develop. We propose several conceptual data model, and decisions to be made locally by efficient heuristics for inferring the interface (LeafView) for mobile network nodes without requiring best node to check in large-scale visualization. Field experiments centralized information, such as networks. By extensive simulation, show improved identification the entire network topology. We we show that checking first a speed and interaction efficacy. devise two distributed algorithms candidate node decreases the Next, we explore spatially- and termed the Bound-Control cost of correcting all faulty nodes semantically-driven situated algorithm and the Lex-Control when compared with checking visualization using objects as algorithm, with an objective of first the most likely faulty node. context in head-worn augmented minimizing the loss of throughput This thesis seeks to provide reality (AR). We develop and due to a single-link attack. We insights into the solutions of evaluate tangible AR and formally prove that both algorithms overcoming the fundamental head-movement controlled converge to their respective robustness challenges of existing AR interaction techniques. optimal solutions. Using simulation, network systems. We consider In interviews following lab we show that our resilient multi- one possible solution, i.e., using experiments, participants reported path solution effectively mitigates diversity routing on top of an improved speed for inspection and the loss of throughput due to overlay architecture, and show comparison and a preference for single-link attacks as well as that how this solution improves tangible AR. Building on this work, multi-link attacks. the robustness of network we design, develop, and evaluate We next consider the use of communication toward general visualization and activation opportunistic routing to boost failures or malicious attacks. techniques for discovering and the throughput gain of routing in learning gestures for these user interfaces (Visual Hints). Lab multi-hop wireless networks. Our Sean White goals are to overlay the 802.11 experiments show preference for MAC layer, exploit the broadcast Advisor: Professor visual hints that combine overlaid nature of wireless communication, Steven Feiner graphics and animation. In addition, we investigate menu and utilize the spatial diversity of Interaction and techniques, activated by shaking a network. We consider a novel Presentation an object, for interacting with routing scheme called Stable Techniques visualizations (Shake Menus). Opportunistic Routing (SOR), for Situated We compare display-, object-, which integrates the techniques of Visualization forwarder scheduling and network and world-referenced coordinate Abstract: Every day we move coding to improve the throughput systems for presentation of menu through a sea of invisible informa- gain. Using nsclick simulation, options. Lab experiments show tion. As computation, sensing, and we show that SOR improves increased speed and accuracy display become more mobile and throughput when compared with when using the display-referenced distributed, interaction shifts from traditional single-path routing and coordinate system. our desktop to our environment. previous opportunistic routing Finally, we present techniques This shift changes how we interact schemes under various settings, for spatially-driven visualization with our surroundings, creating including the scenarios where in the context of physical scenes. the opportunity for situated visual- link-level measurements are Based on our field study of site ization—visual representation of inaccurate. We also verified visits by urban designers, a data presented in its spatial and the benefits of SOR using hand-held AR visualization tool semantic context. This dissertation testbed experiments. (SiteLens) embodying these tech- contributes novel interaction niques enables interaction with Finally, we consider an end-to- and presentation techniques invisible aspects of urban sites, end approach of inferring network for situated visualization in such as georeferenced sensor faults in an overlay system, with three research areas: mobile data. Field experiments provide an optimization goal of minimizing visualization, objects as context, evidence for new insights the expected cost of correcting and scenes as context. In support derived from situated visualiza- (i.e., diagnosing and repairing) all of this research, we make tion, preference for specific faulty nodes. Conventional fault further contributions developing representations, and improved localization problems first check algorithms, architectures, and interaction with data using a the most likely fault node, i.e., the working artifacts. node with the highest conditional novel stabilization algorithm.

CS@CU SPRING 2009 23 CUCS Resources Stay in Touch!

Visit the CUCS Alumni Portal at You can subscribe to the CUCS news mailing list at https://mice.cs.columbia.edu/alum to: http://lists.cs.columbia.edu/mailman/listinfo/cucs-news • Update your contact information CUCS colloquium information is available at • Look at recent job postings http://www.cs.columbia.edu/lectures • Get departmental news Read the CUCS Newsletter online at If you know another alumni who may not be http://www.cs.columbia.edu/resources/newsletters receiving the newsletter, please forward the Alumni Portal link to them. In the spirit of environmental responsibility, we are moving towards more electronic (and less hard-copy) distribution of the newsletter. If you’d like to continue receiving paper copies of the newsletter, please visit the Alumni Portal to sign up.

Columbia University in the City of New York Department of Computer Science NON-PROFIT ORG. 1214 Amsterdam Avenue U.S. POSTAGE Mailcode: 0401 PAID New York, NY 10027-7003 NEW YORK, NY PERMIT 3593 ADDRESS SERVICE REQUESTED

CS@CU NEWSLETTER OF THE DEPARTMENT OF COMPUTER SCIENCE AT COLUMBIA UNIVERSITY WWW.CS.COLUMBIA.EDU