Select Research Projects

CEWIT | Center of Excellence in Wireless and Information Technology

Table of Contents

Leadership

Yacov Shamash, PhD, Vice President of Economic Development and Dean of the College of Engineering and Applied Sciences
Satya Sharma, PhD, MBA, Executive Director
Samir R. Das, PhD, Director, Network Technologies
Shmuel Einav, PhD, Director, Medical Technologies
Sangjin Hong, PhD, Director, Globalization
Arie Kaufman, PhD, Chief Scientist
Yuanyuan Yang, PhD, Director, Communications & Devices
Bin Zhang, Associate Director, Computing Services
Rong Zhao, PhD, Director, Software Systems

© 2014 Stony Brook University

Security

Securing Mobile Apps and Platforms
National Security Institute
Cloud Computing Security
Security Policy Mining
Security Policy Analysis
Fast, Architecture-neutral Static Binary Instrumentation for Security Applications
Security and Privacy in Geo-Social Networks
NFS4Sec: An Extensible Security Layer for Network Storage
uID: A Strongly-Secure Usable Identity Ecosystem with Privacy
Cloudtracker: Transparent, Secure Provenance Tracking and Security Policy Enforcement in Clouds
Development of Security Threat Control System with Multi-Sensor Integration and Image Analysis
Internet Censorship Lab: Illuminating Internet Censorship with Principled Network Measurements
Detecting and Characterizing Internet Traffic Interception based on BGP Hijacking

Networks

Belief Evolutions in Networks of Bayesian Agents
Energy Efficient Reliable Data Transmissions in a Generalized Power Line Monitoring Network
Development of Security Threat Control System with Multi-Sensor Integration and Image Analysis
Internet Censorship Lab: Illuminating Internet Censorship with Principled Network Measurements
Detecting and Characterizing Internet Traffic Interception based on BGP Hijacking
Cognitive and Efficient Spectrum Access in Autonomous Wireless Networks
Novel Interconnection Networks for Data Centers
Exploiting Control and Communications for Robust and Flexible Multi-Robot Coordination
Multicast in Data Center Networks
Modeling and Understanding Complex Influence in Social Networks
Research on Wireless Rechargeable Sensor Networks
RFID Sense-a-tags for the Internet of Things
Algorithmic Foundations for Hybrid Mobile Sensor Networks

Big Data

Center for Dynamic Data Analytics (CDDA): A National Science Foundation Industry/University Cooperative Research Center
Scalable Multilingual NLP through Deep Learning
An Ontology and Reasoning System for Access Policies to Software Services
The ND-Scope: Visual Analytics for High-Dimensional Data
Capturing Word Meanings by Language, Place and Time
Application of Parallel Computing to the Analysis of Next-generation Sequencing Data
Develop Novel Statistical Methods for Multi-loci Genetic Mapping
Efficient Resource-oblivious Parallel Algorithms
Cache-efficient Parallel Algorithms for Fast Flexible Docking and Molecular Dynamics
FTFS: A Read/Write-optimized Fractal Tree File System
Eliminating the Data Ingestion Bottleneck in Big Data Applications
High-performance Rule Engine for Intelligent Web Information Systems
Visual Correlation Analysis of Numerical and Categorical Multivariate Data
High Performance Big Data Analytics with the User in the Loop
Automatic Generation of Virtual Troubleshooting Systems Using Ontologies
Reality Deck - Immersive Gigapixel Display
NYRISE Visualization for Climate Simulation Data
Gigapixel Video Streaming
Fusion of GIS and Procedural Modeling for Driving Simulations in New York
Ambienizer: Turning Digital Photos into Ambient Visualizations
Virtual Dressing Room
An Interactive User Interface for the Smart Grid
Performance Analysis and Optimization for Logic Rule Engines
An Efficient, Versatile, Scalable, and Portable Storage System for Scientific Data Containers
Workload-aware Storage Architectures for Optimal Performance and Energy Efficiency
General-Purpose Computation on Graphics Processing Units

Imaging and Visualization

The ND-Scope: Visual Analytics for High-Dimensional Data
Visual Correlation Analysis of Numerical and Categorical Multivariate Data
High Performance Computing for Medical Imaging
A Visual Framework for Health Care Analytics
Detecting Data Visualization Preferences Using Games
Visual Analytics for Open-Ended Educational Multiplayer Games
Energy Choices: Using an Agent-based Modeling Simulation and Game to Teach Socio-Scientific Topics
Compressive Sensing Approach to Intraoperative Mass Spectrometry for Tumor Margin
Reality Deck - Immersive Gigapixel Display
NYRISE Visualization for Climate Simulation Data
Gigapixel Video Streaming
Natural Interaction with VR Environments
Immersive Virtual Colonoscopy
Interactive Immersive Visualization of Computed Microtomography Data
Fusion of GIS and Procedural Modeling for Driving Simulations in New York
Medical Volume Rendering on the Gigapixel Reality Deck
Lattice Simulations and Rendering
Plume Modeling Simulation and Visualization
Visual Simulation of Thermal Fluid Dynamics in a Water Reactor
Volumetric Segmentation of Computed Tomography Angiography
Automatic Spleen Segmentation for Non-Invasive Lymphoma Diagnosis
Mobile-based Volume Rendering Pipeline for m-Health
Video Streaming for Interactive Visualization on Mobile Devices
Volumetric Mesh Mapping
Conformal Mapping for Medical Imaging
Automatic Detection of Colon Cancer
Reconstruction and Registration Framework for Endoscopic Videos
Registration of Volumetric Prostate Scan using Curvature Flow
Brain Parcellation
Multimodal and Multivariate Visualization of the Brain
Meshless Point Cloud Registration by Conformal Mapping
Volumetric Shape Analysis
Saliency-aware Compression of Volumetric Data
Ambienizer: Turning Digital Photos into Ambient Visualizations
Virtual Dressing Room
3D Facial Recognition
Geometric Manifold Theory for Higher-Dimensional Data Modeling, Analysis, and Visualization
Multivariate Spline Theory, Algorithms, and Computational Techniques for Shape Modeling and Graphics Applications
Wireless Sensor Network Routing
Human Cortical Surface Morphological Study
Volumetric Modeling and Shape Design in Virtual Environments
EmoteControl: An Interactive Multimedia Stress Management System
Shape Analysis with Teichmüller Shape Space
Conformal Wasserstein Shape Space
Shelterware: An Integrated Multi-Platform System to Automate the Services of the Smithtown Animal Shelter
BrainRank: A Multimedia Game to Increase Fluid IQ
Integrating Humans and Computers for Image and Video Understanding
Machine Learning for the Analysis of fMRI Images
Learning Models for Illumination, Shadows and Shading
An Interactive User Interface for the Smart Grid


Healthcare and Biomedical Applications

Application of Parallel Computing to the Analysis of Next-generation Sequencing Data
Cache-efficient Parallel Algorithms for Fast Flexible Docking and Molecular Dynamics
High Performance Computing for Medical Imaging
A Visual Framework for Health Care Analytics
Compressive Sensing Approach to Intraoperative Mass Spectrometry for Tumor Margin
Immersive Virtual Colonoscopy
Interactive Immersive Visualization of Computed Microtomography Data
Medical Volume Rendering on the Gigapixel Reality Deck
Volumetric Segmentation of Computed Tomography Angiography
Automatic Spleen Segmentation for Non-Invasive Lymphoma Diagnosis
Mobile-based Volume Rendering Pipeline for m-Health
Conformal Mapping for Medical Imaging
Automatic Detection of Colon Cancer
Reconstruction and Registration Framework for Endoscopic Videos
Registration of Volumetric Prostate Scan using Curvature Flow
Brain Parcellation
Multimodal and Multivariate Visualization of the Brain
Platform Comparison via Errors in Variables Models with or without Replicates
Biomarker Agreement and Integration Analyses across Measurement Platforms
Comparing the RNA-seq and Microarray Platforms with Generalized Linear EIV Model
Bootstrap-based RANOVA for microRNA Panel Data Analysis
Integrative Multi-Scale Biomedical Image Analysis
Feature-based Exploration of Extremely Large Spatio-Temporal Scientific Datasets
Unified Statistical Models for Integrative Omics
Home Sleep Monitoring Using Bluetooth LE Biosensors to Facilitate Sleep Health
Developing Computer Games and Virtual Reality Environments for Use in Physical Rehabilitation
Mobile Healthcare Device Platform
Study of How Information Technology Affects Patient Access, Quality of Care, and Administrative Costs in a Newly Formed ACO

Internet of Things

Research on Wireless Rechargeable Sensor Networks
Mobile Healthcare Device Platform
RFID Sense-a-tags for the Internet of Things
Design of Audio Interfaces in Adverse Acoustic Environments
Non-isotropic Networked Sensor Deployment for Smart Buildings
Self-powered Wireless Hybrid Langasite Sensor for Pressure/Temperature Monitoring of Nuclear Reactors
Algorithmic Foundations for Hybrid Mobile Sensor Networks

Smart Energy and Smart Urban Systems

Non-isotropic Networked Sensor Deployment for Smart Buildings
Smart Grid Regional Demonstration – Long Island Smart Energy Corridor
Smart Composites for Energy Harvesting
Thin Film Solar Cells with Tunable Transparency
An Interactive User Interface for the Smart Grid
Enhanced Power System Operation and Control
Smart Grid Android Manager


Algorithms

Security Policy Mining
Security Policy Analysis
Modeling and Understanding Complex Influence in Social Networks
An Ontology and Reasoning System for Access Policies to Software Services
Scalable Multilingual NLP through Deep Learning
Capturing Word Meanings by Language, Place and Time
Application of Parallel Computing to the Analysis of Next-generation Sequencing Data
Develop Novel Statistical Methods for Multi-loci Genetic Mapping
FTFS: A Read/Write-optimized Fractal Tree File System
Eliminating the Data Ingestion Bottleneck in Big Data Applications
High-performance Rule Engine for Intelligent Web Information Systems
Visual Correlation Analysis of Numerical and Categorical Multivariate Data
Automatic Generation of Virtual Troubleshooting Systems Using Ontologies
Lattice Simulations and Rendering
Volumetric Mesh Mapping
Meshless Point Cloud Registration by Conformal Mapping
Volumetric Shape Analysis
Saliency-aware Compression of Volumetric Data
Design of Audio Interfaces in Adverse Acoustic Environments
Non-Visual Skimming: Improving the Usability of Web Access for Blind People
Divisible Load Scheduling
Approximation Algorithms for Geometric Optimization
Geometric Networks
Algorithms in Support of Flight Trajectory Analysis
Using Evolutionary Computations to Understand the Design and Evolution of Gene and Cell Regulatory Networks
Adaptive Runtime Verification and Recovery for Mission-Critical Software
Computational Modeling and Analysis for Complex Systems
Algorithmic Foundations for Hybrid Mobile Sensor Networks
From Clarity to Efficiency for Distributed Algorithms
Demand-driven Incremental Object Queries
Performance Analysis and Optimization for Logic Rule Engines
Building Metaknowledge Representations in Circuit Design: Symbolic Data Mining for Systematic Modeling of Analog Circuits

Advanced Computing Systems

Cache-efficient Parallel Algorithms for Fast Flexible Docking and Molecular Dynamics
Efficient Resource-oblivious Parallel Algorithms
FTFS: A Read/Write-optimized Fractal Tree File System
High Performance Big Data Analytics with the User in the Loop
Video Streaming for Interactive Visualization on Mobile Devices
Adaptive Runtime Verification and Recovery for Mission-Critical Software
Computational Modeling and Analysis for Complex Systems
Eco Hadoop: Cost and Energy-Aware Cloud and HPC Computing
Secure Provenance in High-End Computing Systems
An Efficient, Versatile, Scalable, and Portable Storage System for Scientific Data Containers
Workload-aware Storage Architectures for Optimal Performance and Energy Efficiency
Pochoir: A Stencil Computation Compiler for Modern Multicore Machines
Energy-Efficient Superconductor SFQ Processor Design
Building Metaknowledge Representations in Circuit Design: Symbolic Data Mining for Systematic Modeling of Analog Circuits
Leveraging Three-Dimensional Integration Technology for Highly Heterogeneous Systems-on-Chip
General-Purpose Computation on Graphics Processing Units

Security

Securing Mobile Apps and Platforms
Long Lu
[email protected]

Smartphones, tablets, and other mobile devices have outgrown personal computers to become the dominant tools supporting users' day-to-day needs for communication, computation, and entertainment. While enjoying the mobility and versatility of these popular devices, users are facing an emerging wave of security and privacy threats that have not been seen on conventional computers or systematically studied. The current research projects in the RiS3 Lab (Research in Systems and Software Security) aim at understanding and mitigating the two general causes of security threats on mobile devices: (1) vulnerable applications resulting from flawed designs or poor development practices, and (2) inadequate and unsuited OS-level security mechanisms originally designed for non-mobile platforms. As the first step towards automatic detection of vulnerabilities in apps, we undertook the challenging task of building the first comprehensive static program analysis framework for Android apps. This framework addresses several open problems in analyzing mobile code, such as modeling asynchronous execution of app components and discovery of implicit code entry points. Next, based on this framework, we will design scalable and accurate vulnerability detectors desirable to app stores, enterprise IT, and security service providers. At the operating system level, our previous studies revealed major security problems in both the iOS and Android platforms. Recognizing the distinct security requirements of mobile applications, we revisited traditional OS design assumptions that may not hold on mobile devices. Our ongoing projects are now retrofitting mobile OSes with new process and memory management methods that better safeguard user privacy even in the presence of powerful attackers. (NSF)

National Security Institute
Radu Sion
[email protected]

Stony Brook University, the flagship state university of New York, is establishing the National Security Institute (NSI). The NSI vision and its core mission are bold: to secure our homeland by researching and developing technologies and insights for secure, trustworthy, and available communications and computing platforms. NSI's goal is to become a world leader in research, the education of professionals, security technology, business and policy, and raising awareness. NSI will span multiple disciplines and establish public-private partnerships to develop new holistic socio-technological solutions for securing our highly digital societies; it will engage not only in research but also in the education of professionals in defense, national and cyber-security, assurance, healthcare, and policy. A comprehensive assurance education program will be established, to train not only Stony Brook students but also the broader corporate and academic community. NSI will leverage the team's strengths to spawn a steady stream of successful security-centric technology startups. (CEWIT)

Cloud Computing Security
Radu Sion
[email protected]

As computing becomes embedded in the very fabric of our society, the exponential growth and advances in cheap, high-speed communication infrastructures allow for unprecedented levels of global information exchange and interaction. As a result, new market forces emerge that propel toward a fundamental, cost-efficient paradigm shift in the way computing is deployed and delivered: computing outsourcing. Outsourcing has the potential to minimize client-side management overheads and to benefit from a service provider's global expertise consolidation and bulk pricing. Companies such as Google, Amazon, and Microsoft are rushing to offer increasingly complex storage and computation outsourcing services supported by globally distributed "cloud" infrastructures. Yet significant challenges lie in the path to successful large-scale adoption. In business, healthcare, and government frameworks, clients are reluctant to place sensitive data under the control of a remote, third-party provider without practical assurances of privacy and confidentiality. Today's solutions, however, do not offer such assurances and are thus fundamentally insecure and vulnerable to illicit behavior. Existing research addresses several aspects of this problem, but advancing the state of the art to practical realms will require a fundamental leap. This project addresses these challenges by designing, implementing, and analyzing practical data outsourcing protocols with strong assurances of privacy and confidentiality. It will also initiate the exploration of the cost and energy footprints of outsourcing mechanisms. This is essential, as the main raison d'etre of outsourcing paradigms lies in their assumed end-to-end cost savings and expertise consolidation; yet research has yet to explore and validate the magnitudes of these savings and their underlying assumptions. (NSF, ARO, Microsoft Research)
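To make the outsourcing privacy gap concrete, the sketch below shows the baseline idea of keeping data encrypted before it ever reaches a provider. It is only an illustration of the general principle, not the project's protocols (which target stronger assurances, such as privacy of queries and computation, that simple encryption at rest cannot provide); the in-memory cloud_store dictionary is a hypothetical stand-in for a provider's storage API.

```python
# Minimal sketch: client-side encryption before outsourcing storage.
# Requires the "cryptography" package. The provider (here a plain dict
# standing in for a real cloud API) only ever sees ciphertext; the key
# never leaves the client.
from cryptography.fernet import Fernet

cloud_store = {}  # hypothetical stand-in for a remote put/get interface

def outsource(key: bytes, name: str, plaintext: bytes) -> None:
    """Encrypt locally, then hand only the ciphertext to the provider."""
    cloud_store[name] = Fernet(key).encrypt(plaintext)

def retrieve(key: bytes, name: str) -> bytes:
    """Fetch the ciphertext back and decrypt it locally."""
    return Fernet(key).decrypt(cloud_store[name])

key = Fernet.generate_key()       # generated and kept on the client
outsource(key, "record-1", b"sensitive business data")
print(retrieve(key, "record-1"))  # b'sensitive business data'
```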


Security Policy Mining
Scott D. Stoller
[email protected]

The most widely used representation of access control policy is still the humble access control list (ACL). Expressing policies in a higher-level language, such as a role-based access control (RBAC) policy language or an attribute-based access control (ABAC) policy language, can make the policies much easier, and hence cheaper, to understand, analyze, and maintain. For example, consider the policy "A user working on a project can read and request to work on a non-proprietary task whose required areas of expertise are among his/her areas of expertise." In ABAC, this policy can be expressed with a single rule, regardless of the number of users, projects, and tasks. In RBAC, this policy requires creation of a role for each task, creation of user-role and permission-role assignments for each of those roles, and updates to those assignments when relevant attribute values change (for example, a user gains an area of expertise). In an ACL policy, a separate entry is needed for each combination of a user and a task for which the user has permissions. The RBAC and ACL policies require significantly more management effort, and are more prone to management errors, than the ABAC policy.

Policy mining is the problem of constructing a higher-level (typically role-based or attribute-based) policy that is equivalent, or approximately equivalent (if noise is present), to a given policy represented as ACLs. Policy mining can significantly reduce the cost of migrating to a higher-level policy language in large organizations. We are developing new algorithms for mining RBAC policies and ABAC policies. Our RBAC policy mining algorithm can easily be used to optimize a variety of policy quality metrics, including metrics based on policy size, metrics based on interpretability of the roles with respect to user attribute data, and compound metrics that consider both size and interpretability. In experiments with real access control policies released by HP Labs, our algorithm achieves significantly better results than previous algorithms. Our first ABAC policy mining algorithm mines the policy from an ACL policy and data about user attributes and resource attributes. Our second ABAC policy mining algorithm mines the policy from operation logs and attribute data. It allows a controlled trade-off between policy quality (reflecting how well the resulting permissions correlate with attribute data) and the number of speculative permissions (i.e., permissions granted by the generated policy even though they do not appear in the log). (NSF, ONR)

Security Policy Analysis
Scott D. Stoller
[email protected]

In large organizations, access control policies are managed by multiple users (administrators). An administrative policy specifies how each user may change the access control policy. Fully understanding the implications of an administrative policy in an enterprise system can be difficult, because of the scale and complexity of the access control policy and the administrative policy, and because sequences of changes by different administrators may interact in unexpected ways. Administrative policy analysis helps by answering questions such as user-permission reachability, which asks whether specified users can together change the policy in a way that achieves a specified goal, namely, granting a specified permission to a specified user. We are developing algorithms for administrative policy analysis for role-based and attribute-based policy frameworks.

For attribute-based frameworks, the problem is particularly challenging, because administrators may change policy rules as well as facts (e.g., facts about attribute data). To provide more broadly applicable results, our algorithm performs abductive analysis, which means that it can answer user-permission reachability queries given initial policy rules, even in the absence of specific data about initial user attributes and resource attributes. It does this by identifying minimal conditions on the initial attribute data that enable specified users to together change the policy in a way that grants the specified permission to the specified user. (NSF, ONR)
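To see why a single ABAC rule can replace a large ACL, as the policy mining project above argues, consider this toy encoding of the example policy; the users, tasks, and attribute names are hypothetical. Policy mining runs in the opposite direction: given the acl set and the attribute data, it reconstructs a compact rule like abac_permits.

```python
# Toy illustration (hypothetical attribute data): one ABAC rule covers what
# an ACL must enumerate entry by entry. The rule encodes the example policy:
# a user working on a project may access a non-proprietary task of that
# project whose required expertise is among the user's areas of expertise.
users = {
    "alice": {"projects": {"apollo"}, "expertise": {"crypto", "networks"}},
    "bob":   {"projects": {"apollo"}, "expertise": {"databases"}},
}
tasks = {
    "t1": {"project": "apollo", "proprietary": False, "expertise": {"crypto"}},
    "t2": {"project": "apollo", "proprietary": True,  "expertise": {"crypto"}},
}

def abac_permits(user: str, task: str) -> bool:
    u, t = users[user], tasks[task]
    return (t["project"] in u["projects"]
            and not t["proprietary"]
            and t["expertise"] <= u["expertise"])  # required skills covered

# The equivalent ACL needs one entry per permitted (user, task) pair,
# and must be updated whenever the underlying attributes change:
acl = {(u, t) for u in users for t in tasks if abac_permits(u, t)}
print(acl)  # {('alice', 't1')}
```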

Fast, Architecture-Neutral Static Binary Instrumentation
R. Sekar
[email protected]

Program instrumentation techniques form the basis of many recent software security defenses, including defenses against common exploits, security policy enforcement, and application monitoring and debugging. Compared to source-code instrumentation, binary instrumentation is easier to use and more broadly applicable due to the ready availability of binary code. Moreover, source-code based instrumentation may be incomplete because some of it may be eliminated by compiler optimizations, and because some low-level code added by linkers (or compilers) is not instrumented.

One of the major challenges in binary instrumentation is the complexity of modern instruction sets. Accurate instrumentation requires the semantics of all instructions to be captured, since all of the analyses and transformations performed by the instrumentor are based on this semantics. Any errors in modeling instructions will likely cause instrumented programs to fail. Clearly, this is a daunting task even for a single architecture: the Intel manual describing the x86 instruction set runs to over 1500 pages describing over 1100 instructions. When this task is multiplied across different architectures such as ARM, PowerPC, SPARC, and MIPS, the effort involved becomes impractically large. We are therefore developing a novel approach that avoids the need for modeling instructions by leveraging knowledge embedded in the retargetable code generators of today's compilers, such as GCC. This approach not only simplifies the development of instrumentation, but also makes it applicable to all architectures for which a code generator is available.

Another important advance made by our approach is that it enables a rich set of optimizations to be performed on binary instrumentations, thereby significantly improving performance over today's techniques. Moreover, our approach enables the use of today's compiler backends for generating and optimizing instrumentations, thereby achieving architecture-independent instrumentation.

Today's binary instrumentation techniques have largely been based on dynamic (i.e., runtime) binary instrumentation (DBI). DBI techniques provide two key features needed for security instrumentation: (a) instrumentation is applied to all application code, including code contained in various system and application libraries, and (b) instrumentation is non-bypassable. Previous static binary instrumentation (SBI) techniques have lacked these features. However, DBI techniques can incur very high overheads in several common usage scenarios, such as application startups, system calls, and many real-world applications. We have therefore developed a new platform for secure static binary instrumentation (PSI) that overcomes these drawbacks of DBI techniques while retaining their security, robustness, and ease-of-use features. Our experimental results have demonstrated an order of magnitude improvement in performance over DBI techniques on many real-world applications. (NSF)

Security and Privacy in Geo-Social Networks
Radu Sion
[email protected]

Location-based social or geosocial networks (GSNs) have recently emerged as a natural combination of location-based services with online social networks: users register their location and activities, share it with friends, and achieve special status (e.g., "mayorship" badges) based on aggregate location predicates. Boasting millions of users and tens of millions of daily check-ins, such services pose significant privacy threats: user location information may be tracked and leaked to third parties. Conversely, a solution enabling location privacy may provide cheating capabilities to users wanting to claim special location status. In this project we introduce new mechanisms that allow users to (inter)act privately in today's geosocial networks while simultaneously ensuring honest behavior. An Android implementation is provided. The Google Nexus One smartphone is shown to be able to perform tens of badge proofs per minute, and providers can support hundreds of millions of check-ins and status verifications per day. (NSF, ONR)

NFS4Sec: An Extensible Security Layer for Network Storage
Radu Sion and Erez Zadok
[email protected], [email protected]

The Network File System (NFS) is a popular method for computers to access files across networks. The latest major version of this IETF protocol, version 4, is widely accepted and includes numerous new features to improve security, performance, and usability over wide-area networks. However, NFSv4's security focus is on network-wide encryption (ensuring that user data cannot be intercepted) and user authentication (ensuring that only legitimate users can access their files); it does not address end-to-end security of persistently stored data, data integrity (against malicious or benign corruptions), and more. This project extends NFSv4 with a security layer that allows one to develop multiple, composable plugin modules to enhance the protocol's security. This layer allows for interception of protocol requests between clients and servers to perform various useful security functions: logging access to files by users and hosts, useful for regulatory compliance reports and audits; inspecting files for malware patterns and automatically quarantining them; verifying the integrity of long-lived files against malicious changes (e.g., Trojan intrusions) and benign but serious ones (e.g., storage media degradation and hardware corruptions); detecting denial-of-service attempts and ensuring quality of service to legitimate users through load balancing and redirection; and automatic snapshotting and logging to allow for forensic analysis and recovery from failures and intrusions. In a cloud-based era where more data lives longer and is accessed over wide-area, insecure networks, this project helps elevate the level of security of every user's data files. (NSF)
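The layer's composable-plugin design is essentially an interceptor pipeline. The sketch below shows that shape in miniature; the plugin classes and operation names are hypothetical illustrations, not the project's actual NFSv4 code, which interposes on real protocol messages.

```python
# Minimal sketch of a composable security layer: every intercepted request
# passes through each plugin, and any plugin may raise to block it.
class AccessLogger:
    """Logs every access, the kind of trail compliance audits need."""
    def on_request(self, op, path, user, data=b""):
        print(f"AUDIT: {user} {op} {path}")

class MalwareScanner:
    """Vetoes writes whose payload matches a known signature."""
    SIGNATURES = (b"EICAR",)  # toy signature list
    def on_request(self, op, path, user, data=b""):
        if op == "WRITE" and any(s in data for s in self.SIGNATURES):
            raise PermissionError(f"quarantined: suspicious write to {path}")

class SecurityLayer:
    """Interposes between clients and the file server."""
    def __init__(self, plugins):
        self.plugins = plugins
    def handle(self, op, path, user, data=b""):
        for plugin in self.plugins:       # plugins compose in order
            plugin.on_request(op, path, user, data)
        return "forwarded to server"      # request passed every check

layer = SecurityLayer([AccessLogger(), MalwareScanner()])
print(layer.handle("READ", "/export/report.pdf", "alice"))
# layer.handle("WRITE", "/export/a.exe", "bob", b"..EICAR..")  # would raise
```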


uID: A Strongly-Secure Usable Identity Ecosystem with Privacy
Radu Sion
[email protected]

uID is a secure, usable, privacy-enabling digital identity ecosystem, able to integrate and synergize with existing governmental, commercial, and open-source identity and authentication solutions.

Designing tomorrow's digital identity solution faces unique challenges. Identity mechanisms overwhelmingly refer to and are used by people. They need to be usable and affordable, and address individual concerns of privacy and confidentiality. At the same time, to ensure trust they need to provide accountability and be strongly secure.

Further, it is important to realize that no one platform can be a sole provider: a viable ecosystem will have standards with well-specified APIs and conduits for interoperability that naturally foster a healthy market. Finally, it is essential that these mechanisms interoperate and are efficient, so as not to constitute a bottleneck when deployed.

While addressing all of the above challenges, uID will focus on two key goals: privacy protection and transaction unlinkability. These properties are unfortunately conflicting and require a complex multi-layer research and development approach calling on multi-disciplinary expertise across all the layers of today's digital transactions. Simple "browser plugin" or "email-based" mechanisms alone are bound to fail by not considering the multiple cross-layer security challenges. (CEWIT)

Development of Security Threat Control System with Multi-Sensor Integration and Image Analysis
Sangjin Hong
[email protected]

The purpose of this project is to develop a heterogeneous sensor network platform for intelligent surveillance applications. Multiple image sensors, such as hyperspectral image sensors, are utilized to capture various otherwise undetectable images for discovering hidden information. Different sensors collaborate to create consistent overall real-time information that can be used to infer abnormal surrounding situations. The project conducts research on multiple-object association through estimation and prediction, low-complexity embedded system design, large-scale system modeling, multimedia database access strategies, and stochastic collaborative signal processing. (Korea MKE)

Cloudtracker: Transparent, Secure Provenance Tracking and Security Policy Enforcement in Clouds
Radu Sion and Don Porter
[email protected], [email protected]

As companies, governments, and individual users adopt increasingly diverse computing platforms, from outsourced cloud computations to personal laptops and mobile devices, enforcing uniform security policies across these platforms becomes unwieldy.

Similarly, regulatory compliance and business auditing require tracking the history of data in a comprehensive, secure, and platform-independent manner. Unfortunately, technology has not kept pace with these practical concerns, and several systems and security research challenges must be addressed to make this vision a reality.

There is a natural and under-explored connection between understanding the origins of data and using that data's history to enforce security policies. To leverage this connection, this project is developing a comprehensive, general framework for automatically tracking the history of data and enforcing associated security policies in cloud computing environments. The research focuses on three key challenges.

First, the project investigates novel applications of virtualization technologies to transparently infer data provenance by inspecting a guest operating system (OS) and applications. Second, the project is developing techniques to securely store, manage, and query provenance data at cloud scale. Finally, the project combines the first two technologies to transparently and collaboratively enforce security policies throughout the cloud and end-user systems.

The prototype system is designed to allow individual users and organizations to rapidly adopt new technology platforms, from clouds to novel end-user systems, without having to worry about the interaction of these new systems with security policies and regulatory compliance concerns. (NSF)
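The "track history, then enforce" idea can be pictured as a query over a lineage graph. The toy below records derivation edges explicitly and flags anything descended from a restricted source; the file names and the policy are hypothetical, and the actual project infers such edges transparently at the virtualization layer rather than from explicit calls.

```python
# Toy provenance tracker: record which items derive from which, then
# enforce a policy over each item's full ancestry.
from collections import defaultdict

derived_from = defaultdict(set)          # item -> its direct parents

def record(child, *parents):
    derived_from[child].update(parents)

def lineage(item):
    """All ancestors of an item in the provenance graph (iterative DFS)."""
    seen, stack = set(), [item]
    while stack:
        for parent in derived_from[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

record("patient_table", "hospital_db")
record("summary.csv", "patient_table")
record("slides.pdf", "summary.csv", "public_logo.png")

RESTRICTED = {"hospital_db"}             # hypothetical taint-source policy
print(lineage("slides.pdf"))             # ancestry includes hospital_db
print(bool(RESTRICTED & lineage("slides.pdf")))  # True: policy applies
```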

Networks

Internet Censorship Lab: Illuminating Internet Censorship with Principled Network Measurements
Phillipa Gill
[email protected]

The Internet was not designed with information controls, such as censorship or surveillance, in mind. However, its importance has led many nations to repurpose Internet protocols (e.g., the Domain Name System (DNS) and Border Gateway Protocol (BGP)) and network management products (e.g., Web proxies, traffic shapers) for information control. This unintended use of networking technologies can lead to unintended international impacts of censorship, and it raises many ethical issues when network management products are exported to countries that use them to violate human rights.

To address these challenges, this project: (1) develops a platform which enables repeatable measurements of Internet censorship while mitigating risks to the individuals performing the measurements, (2) designs techniques to analyze data from the platform to detect different types of censorship and even specific network management products used for censorship, and (3) uses the platform to quantify instances of national censorship which have unintended international impact on the Internet.

The use of technology to restrict freedom of speech and persecute dissidents around the globe is a human rights concern, and this project provides high-fidelity technical data to inform policy discussions. Further, the technical analysis provides insights into the global impacts of national censorship on the Internet, and into how proposed improvements to existing protocols (e.g., DNSSEC, BGPSEC) can mitigate these issues. (NSF, Google)

Belief Evolutions in Networks of Bayesian Agents
Petar M. Djuric
[email protected]

The research is on understanding the processes of belief evolution about the state of nature, in time and/or space, in networks of Bayesian agents and in a wide variety of settings. We study the evolution of beliefs in static networks (with a fixed number of agents and fixed topology) and dynamic networks (with varying numbers of agents and social ties). The latter includes networks that allow new agents to join and current agents to disappear and reappear or to change their social ties. Special attention is given to problems of belief evolution in networks where agents communicate correlated information. The evolution of beliefs in networks whose agents make random decisions, and in the presence of malicious agents, is also studied. (NSF)

Detecting and Characterizing Internet Traffic Interception based on BGP Hijacking
Phillipa Gill
[email protected]

Recent reports have highlighted incidents of massive Internet traffic interception executed by re-routing BGP paths across the globe (affecting banks, governments, entire network service providers, etc.). The potential impact of these attacks ranges from massive eavesdropping to identity spoofing and selective content modification. In addition, executing such attacks does not require access or proximity to the affected links and networks, posing increasing risks to national security. Worse yet, the impact of traffic interception on the Internet is practically unknown, with even large-scale and long-lasting events apparently going unnoticed by the victims.

As reported by Renesys Corporation in November of last year, there is evidence that traffic interception events are growing more frequent, but there are no validated methods to immediately detect them or evaluate their impact. The architectural innovation that would mitigate the inherent protocol design flaw exploited by such attacks is slow to take off, suggesting that this vulnerability will persist, leaving our critical communication infrastructure exposed.

Because of their complex dynamics, and the number of different actors involved on a global scale, devising effective methodologies for the detection and characterization of traffic interception events requires empirical and timely data (e.g., acquired while the event is still ongoing). Leveraging our experience in measuring and investigating events affecting inter-domain communication, and our measurement and data processing infrastructure, this research project is working toward the following goals: (i) investigate, develop, and experimentally evaluate novel methodologies to automatically detect traffic interception events and to characterize their extent, frequency, and impact; (ii) extend our measurement infrastructure to detect, in near real time, and report episodes of traffic interception based on BGP hijacking; and (iii) document such events, providing datasets to researchers as well as informing operators, emergency-response teams, law-enforcement agencies, and policy makers. In characterizing their impact, we will quantify the increased latency along observed paths, the magnitude of an incident in terms of the number of ASes and prefixes intercepted, and the social and political implications of interceptions that take traffic across national borders. We will augment our active measurement framework with algorithmic simulations of BGP routing policies and qualitative analysis of the organizations involved, to better understand both the technical and political effects of hijacks. (CEWIT)
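One of the simplest signals such detection can build on is a change in the AS-level path observed toward a monitored prefix. The sketch below is a schematic toy with made-up AS numbers, not the project's methodology, which must correlate many BGP vantage points and feeds to separate hijacks from routine routing changes.

```python
# Toy interception flag: alert when a monitored prefix's announced AS path
# shows a new origin or a never-before-seen transit AS. AS numbers are
# made up; real detection correlates many vantage points over time.
baseline_paths = {
    "203.0.113.0/24": {(64500, 64510, 64520)},   # known-good AS paths
}

def check_update(prefix, as_path):
    known = baseline_paths[prefix]
    known_ases = {asn for path in known for asn in path}
    origin_changed = all(as_path[-1] != path[-1] for path in known)
    new_transit = set(as_path[:-1]) - known_ases
    if origin_changed or new_transit:
        return (f"ALERT {prefix}: origin_changed={origin_changed}, "
                f"new_transit={sorted(new_transit)}")
    return "ok"

print(check_update("203.0.113.0/24", (64500, 64510, 64520)))  # ok
print(check_update("203.0.113.0/24", (64500, 64999, 64520)))  # detour via 64999
```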

Energy Efficient Reliable Data Transmissions in a Generalized Power Line Monitoring Network
Xin Wang
[email protected]

Efficient power line monitoring is essential for reliable operation of the Smart Grid. A Power Line Monitoring Network (PLMN) based on wireless sensor nodes can provide the necessary infrastructure to deliver data from across the power grid to one or several control centers. However, the restricted physical topology of the power lines constrains the data paths and has a great impact on reporting performance. In this work, we present a comprehensive design to guide efficient and flexible relay selection in PLMNs, ensuring reliable and energy-efficient transmissions while taking into account the restricted topology of power lines. Specifically, our design applies probabilistic power control along with flexible transmission scheduling to combat the poor channel conditions around power lines while maintaining the energy level of transmission nodes. We consider the impact of different channel conditions, non-uniform topologies of a power line corridor, and the effect of reporting events. Our performance results demonstrate that our data forwarding scheme can control energy consumption and delay well while ensuring reliability and extended lifetime. (CEWIT)

Cognitive and Efficient Spectrum Access in Autonomous Wireless Networks
Xin Wang
[email protected]

The exponential growth of wireless traffic calls for novel and efficient spectrum access techniques and wireless network infrastructures. The recent introduction of the autonomous network concept, of which the femtocell is an application, presents a paradigm shift from traditional cellular networks, with planned deployment and centralized management, to more autonomous, uncoordinated, and intelligent rollouts of small base stations deployed by end users and overlaying wide-area cellular networks. Fundamentally different from conventional cellular networks or heterogeneous networks with nodes operating in different spectrum bands, femtocells are deployed in an ad hoc manner and share the same spectrum band as the cellular networks, to increase spectrum usage efficiency and allow a terminal to seamlessly operate in macrocells and femtocells. This network infrastructure, however, creates strong cross-tier and intra-tier interference. The objective of this project is to enable more efficient and reliable operation of autonomous femtocell networks with agile spectrum access, autonomous interference control, and intelligent network self-organization and self-optimization. The project comprises four interacting thrusts: 1) incorporate cognition into femtocell networks to cognitively reuse the sensed available spectrum; 2) develop distributed, dynamic, and cooperative interference management schemes exploiting antenna techniques and based on sensed environmental conditions; 3) investigate the scenarios and schemes in which femtocells can be exploited to facilitate macrocell transmissions, and the potential gains in capacity, coverage, and reliability; and 4) incorporate interference cancellation for data multicast, and develop techniques to support multiuser video streaming. The project also develops a testbed with open-source programmable wireless platforms for prototyping and evaluating the effectiveness of the various techniques developed. The proposed research has the potential to significantly increase the capacity and resilience of existing and future wireless networks. The agility and resilience of the system will also make it instrumental in supporting communications and applications that are important for national security and the economy. (NSF)

Novel Interconnection Networks for Data Centers
Yuanyuan Yang
[email protected]

Driven by technology advances, massive data centers consisting of tens or even hundreds of thousands of servers have been built as infrastructure by large online service providers. Designing a cost-effective network topology for data centers that can deliver sufficient bandwidth and consistent latency to a large number of servers is an important and challenging problem. A good network topology should meet the following requirements: (1) expandability, meaning that expanding an existing network should not incur huge extra cost in either manpower or device replacement; (2) plenty of parallel paths between any pair of servers, to guarantee sufficient bandwidth and graceful degradation; (3) small network diameter, so that a task can be assigned to any part of the network as required by cloud computing; and (4) low cost of the interconnection structure. As data centers grow larger and larger, the cost of the interconnection structure becomes a key factor in practical applications. Many data center network topologies have been proposed recently; they fall into two basic categories: switch-centric networks and server-centric networks. We conduct extensive research in both categories. In this project, we mainly focus on server-centric networks and propose a novel server-centric topology called BCube Connected Crossbars (BCCC), which meets all the requirements described above. We also propose routing algorithms for different communication patterns in the BCCC. It is shown that the BCCC outperforms currently popular data center networks. (NSF)
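Requirements (2) and (3) above are graph properties that can be checked mechanically on any candidate topology. The snippet below runs those checks with networkx on a small hypercube, used here purely as a stand-in; it does not construct the actual BCCC structure.

```python
# Check a candidate topology against two of the stated requirements:
# path diversity between server pairs and overall network diameter.
# A 3-dimensional hypercube stands in for a real server-centric design.
import networkx as nx

g = nx.hypercube_graph(3)                  # 8 nodes, each of degree 3

print("diameter:", nx.diameter(g))         # requirement (3): small diameter
u, v = (0, 0, 0), (1, 1, 1)                # an antipodal server pair
paths = list(nx.edge_disjoint_paths(g, u, v))
print("edge-disjoint paths:", len(paths))  # requirement (2): parallel paths
```

Expandability and cost, the other two requirements, are design-time properties of the construction rule itself rather than of any one graph instance, which is why they are argued analytically in the project.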

Exploiting Control and Communications for Robust and Flexible Multi-Robot Coordination
Xin Wang
[email protected]

The objective of this project is to investigate a set of control and communication techniques for establishing and maintaining communication connections among multiple collaborating mobile robots, in response to the varying communication conditions encountered in practice, to ensure robust and flexible multi-robot coordination. The approach is to exploit the features of wireless communications and the mobility of robots to significantly increase the coverage and reliability of communications and the chance of forming communication links among mobile robots. This research addresses the critical challenge in establishing and maintaining communication connections among collaborative mobile robots that is imposed by unstable communication conditions and the dynamic nature of robots and environments. The contribution of this work lies in the transformative development and integration of novel distributed control and communication techniques for controlled communications, multi-robot motion, and wireless signal search under unstable communication conditions and robot/task constraints. The integrated application of these techniques will lead to robust and efficient communication networking, high-freedom task operation and exploration, and thus highly robust and flexible coordination among multiple collaborative robots. (NSF)

Multicast in Data Center Networks
Yuanyuan Yang
[email protected]

Data center networks (DCNs) interconnect tens of thousands of servers to form the infrastructure for today's ubiquitous collaborative computing, and they are the backbone of the clouds. Most of today's DCNs provide high bandwidth using inexpensive commodity switches. Many online applications (e.g., web search) and backend infrastructural computations (e.g., distributed file systems and databases) hosted by data centers require one-to-many group communication. Network-level multicast can greatly benefit such group communication by reducing network traffic and relieving the sender of duplicated transmission tasks, thus significantly improving application throughput and increasing network capacity. Several unique features of data centers facilitate the implementation of multicast. First, virtualization of machines and networks provides plenty of flexibility: for computing tasks submitted to data centers, physical resources can be logically divided, and each task can be provided a virtual network according to its scale and requirements. We exploit this freedom of resource allocation to meet several goals: (1) to deliver a satisfying user experience, i.e., communication between members of a virtual network is nonblocking for multicast; and (2) to achieve efficient resource utilization, i.e., the hardware cost of the data center network can be reduced. The second valuable feature of data centers is server redundancy. Server redundancy was originally proposed to deliver high availability and fault tolerance; nevertheless, it can be utilized to improve the design of data center networks and reduce network cost. For example, the number of core switches in a fat-tree data center network can be dramatically reduced if server redundancy is taken into consideration in the overall network design. Third, the recently developed OpenFlow framework gives endless possibilities for designing various network control strategies based on different needs. We develop several online multicast scheduling algorithms to balance traffic load in data center networks. These algorithms not only improve the utilization of network bandwidth, but also ensure that no link in the network is oversubscribed. (NSF)

Modeling and Understanding Complex Influence in Social Networks
Jie Gao
[email protected]

Social interactions constitute a crucial part of everyday life. Behavior changes, like rumors or viruses, spread in the social network and become a contagion. Some of these contagions are beneficial (e.g., adopting healthy lifestyles) or profitable (e.g., viral marketing), and we would like to encourage or promote them. Others are destructive (such as teenage smoking, obesity, or alcohol abuse), and we would like to discourage or stop them. It is therefore of great importance and urgency to understand how these contagions naturally spread in social networks and how to effectively encourage or discourage a contagion with our available (limited) resources.

Diseases and information can spread through a single contact; thus they spread fast in social networks with the small-world property. However, in most realistic settings where agents' actions and behavioral changes are involved, it often takes multiple activated neighbors to spread a contagion. We call this type of contagion a complex contagion. The requirement of synergy between neighbors intuitively makes the spreading of a complex contagion more unlikely, slower, and more delicate. Enabling the successful spreading of a complex contagion requires special graph structures.

This project will answer both the scientific question of what factors enable a complex contagion to spread in a social network and the engineering question of how to design intervention schemes to guide complex contagions. It will provide a fundamental understanding of the interplay between structural properties of the social network and the behaviors of social processes operating on the network. Specific questions to be addressed include: how fast do complex contagions spread, on both model networks and real-world networks; how does the choice of initial seeds matter; how do the degree distribution of the network, the "rich club" property, or community structure relate to the spread of complex contagions; how do we stop complex contagions from spreading; and, finally, how do external environmental factors affect complex contagions? We will conduct rigorous theoretical analysis of the behavior of complex contagions under various models of the social network and the diffusion process, as well as simulations on real-world networks to calibrate our model construction.

The results of this project will have deep and long-lasting social impact. Rigorous understanding of complex contagions will provide much-needed theoretical guidance for real-world applications, ranging from healthcare to word-of-mouth advertising, and from influencing cultural trends to political campaigns. The technical content of this project is inherently interdisciplinary and will have direct applications to related fields such as probability, economics, sociology, and statistical physics. (NSF, DARPA)
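The gap between simple and complex contagion is easy to reproduce in simulation. The toy below runs a threshold-2 cascade on a clustered ring lattice; the graph, seeds, and threshold are arbitrary illustrations rather than the project's models, but they show why adjacent seeds spread while scattered seeds stall.

```python
# Toy complex contagion: a node activates only once at least `threshold`
# of its neighbors are active (threshold=1 recovers simple contagion).
import networkx as nx

def cascade(g, seeds, threshold=2):
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node in g:
            if node not in active and \
               sum(nb in active for nb in g[node]) >= threshold:
                active.add(node)
                changed = True
    return active

ring = nx.watts_strogatz_graph(20, 4, p=0.0)  # clustered ring lattice
print(len(cascade(ring, seeds=[0, 1])))   # adjacent seeds: full cascade (20)
print(len(cascade(ring, seeds=[0, 10])))  # scattered seeds: stalls at 2
```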

Research on Wireless Rechargeable Sensor Networks
Yuanyuan Yang
[email protected]

In this project, we introduce novel wireless charging technology to power wireless sensor networks. The objective is to achieve perpetual network operation as well as to improve network performance. Traditional battery-powered sensor networks usually have a limited lifetime, which poses great challenges in meeting the demands of a variety of energy-hungry applications. Energy harvesting from environmental sources can sustain network operations, but dynamics in the energy sources may interrupt network services and degrade performance greatly. Novel wireless charging technology has opened up a new dimension: replenishing energy in sensor networks without wires or plugs. A charging vehicle equipped with resonant coils can move around the field to conveniently recharge nodes. Several important issues are studied in this project.

The first question is how to gather nodes' real-time energy information in a scalable manner. To achieve this, we propose an NDN (Named Data Networking)-based real-time communication protocol for gathering energy status that divides the network into hierarchies. We leverage concepts and mechanisms from NDN to design a set of protocols that continuously gather and deliver energy information to the vehicle, to cope with a variety of recharge requests. Analytical results based on the energy-neutral conditions that give rise to perpetual operation are also derived.

The second question is how to schedule the vehicle(s) to achieve perpetual operation. For a single vehicle, we formulate the problem as an Orienteering problem, which is NP-hard; it aims to maximize the total energy recharged in a given time. Since the problem is NP-hard, we take reasonable approximations to simplify it into a Knapsack problem so we can develop polynomial-time solutions. The problem of scheduling multiple vehicles, which offers more scalability and robustness, immediately follows. Our focus there is to minimize the vehicles' total traveling cost while ensuring all nodes remain functional. We formulate this problem as a Multiple Traveling Salesman Problem with Deadlines (m-TSP with Deadlines), which is also NP-hard. To accommodate energy dynamics and reduce computational overhead, we develop an online algorithm that selects the node with the minimum weighted sum of traveling time and residual lifetime. Our scheme not only improves network scalability but also guarantees the perpetual operation of the network.

The third problem is how to integrate wireless charging with traditional sensing applications such as mobile data gathering. We can combine wireless charging and data gathering utilities on a single vehicle to improve spatial-temporal efficiency. Our objective is to maximize the network utility. First, a set of sensor nodes with minimum residual energy is selected for the vehicle to recharge; the algorithm ensures a bounded traveling time under a given threshold. Upon traversing each node for recharging, the vehicle collects data messages in the neighborhood by multi-hop transmissions. To maximize network utility, we divide the original problem into several sub-problems that find the optimal data rates, flow routing, and vehicle stopping times. Distributed algorithms to solve these problems are proposed and their convergence properties are examined. (NSF)
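The online rule described above, sending the vehicle to the node with the minimum weighted sum of traveling time and residual lifetime, fits in a few lines. The weights, speed, and node data below are hypothetical stand-ins for illustration.

```python
# Toy version of the online recharge rule: pick the node minimizing a
# weighted sum of travel time and residual lifetime, so nearby and
# nearly-depleted nodes are served first. All numbers are illustrative.
import math

nodes = {                       # node -> ((x, y), residual lifetime in s)
    "a": ((0.0, 40.0), 300.0),
    "b": ((30.0, 5.0), 120.0),
    "c": ((60.0, 60.0), 500.0),
}
SPEED, W_TRAVEL, W_LIFE = 1.0, 0.5, 0.5   # hypothetical weights

def next_recharge_stop(vehicle_pos):
    def score(item):
        (pos, lifetime) = item[1]
        travel_time = math.dist(vehicle_pos, pos) / SPEED
        return W_TRAVEL * travel_time + W_LIFE * lifetime
    return min(nodes.items(), key=score)[0]

print(next_recharge_stop((0.0, 0.0)))   # 'b': close by and nearly depleted
```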

Big Data

Center for Dynamic Data Analytics (CDDA): A National Science Foundation Industry/University Cooperative Research Center
Arie Kaufman, Rong Zhao
[email protected]

The Center for Dynamic Data Analytics (CDDA), an NSF-supported Industry/University Cooperative Research Center (I/UCRC), was established in 2011 to conduct pre-competitive research to manage, analyze, and visualize massive, complex, multidimensional and multi-scale dynamic data. The CDDA is a multi-institution consortium located at Rutgers University and Stony Brook University. Its mission is to turn chaos into knowledge and to unleash the transformative potential of big data in a wide range of application domains, such as information technology, healthcare, pharmaceutical, biotechnology, commerce, retail, finance, insurance, media, entertainment, transportation, logistics, manufacturing, defense, security, education, and public administration. Dynamic data pose new challenges in algorithm design for analysis and visualization that traditionally have not been addressed. Types of data that are considered dynamic in nature include data captured from unmanned vehicles; sensor networks; telecommunications networks; utility grids and other critical infrastructure; mobile and interactive Web 2.0 applications; robotics and intelligent agents; biomedical and healthcare informatics; computer simulations and modeling; real-time financial applications; and social media and entertainment, just to name a few. Our research will strategically focus on design and evaluation methods, algorithms, architectures, software, visualization techniques, mathematical and statistical foundations, and benchmarking of complex systems that facilitate large-scale, dynamic data analytics. The ultimate goal of the CDDA is to develop new technologies that can be applied by its industry partners to create value across a wide range of sectors. For more information please visit cdda.cs.stonybrook.edu. (NSF)

An Ontology and Reasoning System for Access Policies to Software Services
Michael Kifer and Paul Fodor
[email protected], [email protected]

Rule-based models are commonly used to govern access rights to software services and systems. As administrators attempt to define increasingly sophisticated rules that take into account company policies, user context (e.g., mobile vs. fixed devices and geo-location), and the characteristics of the system, service or files, the rule sets can become complex. Moreover, administrators increasingly want to use available data analytics about users and the usage of systems, services, and files. A security-domain-based reasoning system can help administrators design more robust, consistent rules and can help applications interpret those rules as intended by the administrators. The development and management of the basic building blocks for security mechanisms, which can be integrated into a wide variety of mission-critical information systems, also needs such a domain-specific knowledge base system.

In this project, we develop a knowledge base infrastructure to help security domain experts build ontologies of security knowledge components. This knowledge base infrastructure is intended to not only provide an interactive knowledge entry facility, but also to support reasoning capabilities to answer various types of queries about rule sets. It is also intended to help administrators construct more robust policy rules (Figure 1). (Center for Dynamic Data Analytics and CA Technologies)

Scalable Multilingual NLP through Deep Learning
Steven Skiena
[email protected]

Accurate interpretation of natural language, including search queries, requires an understanding of how word meanings change with time and group. The word "gay" meant something quite different thirty years ago than it does today, a cycle of reinterpretation that can happen within days or even hours due to external events (storm names like Sandy or Katrina) or people in the news. Twitter hashtags are constantly recycled, the same string now carrying a different meaning. Regional slang implies that certain words mean different things in different places.

The key with all of these examples is capturing the semantics of what words actually mean. Recently re-introduced techniques in unsupervised feature learning make this possible, by acquiring common features for a specific language vocabulary from unlabeled text. These features, also known as distributed word representations (embeddings), have been used by us and other groups to build a unified NLP architecture that solves multiple tasks: part of speech (POS) tagging, named entity recognition (NER), semantic role labeling and chunking. We have built word embeddings for one hundred of the world's most frequently spoken languages, using neural networks (auto-encoders) trained on each language's Wikipedia in an unsupervised setting, and shown that they capture surprisingly subtle features of language usage. We seek to build better systems for large-scale analysis of text streams regardless of language, and to explore new methods for training word embeddings and their applications. (NSF)
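The project's embeddings come from a custom neural architecture trained on full Wikipedias. Purely to illustrate the underlying idea of learning distributed word representations from unlabeled text, here is a minimal sketch using the off-the-shelf gensim Word2Vec implementation on a toy corpus; gensim and the toy sentences are assumptions of this example, not the project's code:

    from gensim.models import Word2Vec

    # Toy corpus standing in for a language's tokenized Wikipedia dump.
    sentences = [
        ["the", "storm", "flooded", "the", "city"],
        ["the", "hurricane", "flooded", "the", "coast"],
        ["markets", "rallied", "after", "the", "election"],
    ]

    # 64-dimensional embeddings; real models train on billions of tokens.
    model = Word2Vec(sentences, vector_size=64, window=5, min_count=1, epochs=50)

    print(model.wv["storm"][:5])                      # a word's learned feature vector
    print(model.wv.similarity("storm", "hurricane"))  # semantic similarity score

Because no labels are required, the same recipe can be rerun per language, which is what makes the approach scale across one hundred Wikipedias.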

The ND-Scope: Visual Analytics for High-Dimensional Data
Klaus Mueller
[email protected]

The growth of digital data is tremendous. Any aspect of life and matter is being recorded and stored on cheap disks, either in the cloud, in businesses, or in research labs. We can now afford to explore very complex relationships with many variables playing a part. But for this data-driven research we need powerful tools that allow us to be creative, to sculpt this intricate insight from the raw block of data. High-quality visual feedback plays a decisive role here. And the more people participate in the effort, the more progress we can make. We have created a framework and software package, called the ND-Scope, which incorporates various facilities for exploring and reasoning with high-dimensional data. It couples powerful data analysis with artistic illustration techniques to help users show only those aspects of the data they deem relevant. This clears the clutter of current data displays and so fosters better insight. (NSF)

Capturing Word Meanings by Language, Place and Time
Steven Skiena
[email protected]

Building multilingual processing systems remains challenging, particularly for resource-poor languages. But recent advancements in unsupervised feature learning present a promising alternative. Instead of relying on expert knowledge, these approaches employ automatically generated task-independent features, acquired for a specific language vocabulary by training on unlabeled text corpora. These features, known as distributed word representations or embeddings, have been used by our lab and other groups to build unified NLP architectures that solve multiple tasks, including part of speech (POS) tagging, named entity recognition (NER), semantic role labeling and chunking. We have built word embeddings for over one hundred of the world's most widely-used languages, and shown that they capture surprisingly subtle features of language usage.

We will improve and generalize word embeddings to new problems and domains. Our research here includes:

(1) Better Word Embeddings — We will construct improved language representations based on our experiences building the first generation of Polyglot embeddings. We will develop effective techniques to capture representative lexicons in the face of diverse language morphology, phrase co-locations, and major sources of out-of-vocabulary (OOV) words such as names/numbers/locations. The trick here is to do these things within an unsupervised learning framework, so we can build embeddings without labeled training data and domain experts.

(2) Applications of Word Embeddings to Multilingual NLP — We seek to apply our improved embeddings to construct a full stack of NLP processing tools — including POS tagging, named entity recognition and classification, anaphora resolution, sentiment analysis, parsing, and even gross language translation — for all 100+ languages. We anticipate these resources will continue to interest a broad research community. We also will use them ourselves, to establish a sentiment/volume trend analysis system to measure changes in the world condition, in its native tongues.

(3) New Domains for Word Embeddings — We believe that word embeddings have considerable promise in scientific and cultural domains far removed from traditional language processing. By appropriately training embeddings on scientific corpora like PubMed/Medline abstracts, we can reduce the "bibliome" to concise numerical features about diverse entities like drugs and diseases, suitable for building models for a wide variety of applications. We can even use embeddings to study philosophical issues of what words really mean, by relating embeddings to traditional definitions found in dictionaries. (NSF)

Application of Parallel Computing to the Analysis of Next-generation Sequencing Data
Song Wu
[email protected]

The focus of my lab is on developing and implementing novel statistical/bioinformatics methodologies for the analysis of large-scale genetic and genomic data.

In the past few years, high-throughput next-generation sequencing (NGS) technology has quickly emerged as the dominating method in biomedical research, replacing the once-prevalent microarray technology. With NGS, many problems traditionally considered arduous or impossible have now become feasible. However, along with the rapid development of NGS technology, the data magnitude and complexity of NGS analyses far exceed the capacity and capability of traditional small-scale computing facilities, such as standalone workstations. On a separate note, massively parallel processing (MPP) systems have undergone dramatic development in the past decades. In spite of significant advances in both fields of NGS and MPP, cutting-edge research in applying supercomputing to NGS analysis is still in its infancy. Therefore, the enormous computational demand of NGS data analysis, along with the huge supply of processing power by MPP systems, presents a unique and substantial opportunity for developing highly efficient analysis strategies for NGS data. Specifically, in this project we aim to address the two most fundamental problems in NGS data processing: (1) how to quickly align billions of short reads to a reference genome, and (2) how to assemble billions of short reads into a genome in a timely manner. We are currently developing a scalable hierarchical multitasking algorithm for porting classical sequencing algorithms for these tasks to modern parallel computers.

The bioinformatics and parallel computing schemes developed in this project will be a valuable contribution to the genomics community for comparative genomics based on ultra-fast computing, and could potentially have a much broader impact, especially on public health and clinical research. By tackling these challenges in handling the massive NGS data generated, our parallel software tool will offer unparalleled opportunities for targeted biological discovery and more accurate clinical diagnostics to accelerate personalized medication. (Stony Brook University)
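The embarrassingly parallel core of read alignment, distributing independent reads across workers, can be conveyed with a toy sketch; the exact-match "aligner" and four-read workload below are illustrative stand-ins and not the project's hierarchical multitasking algorithm:

    from multiprocessing import Pool

    REFERENCE = "ACGTACGTTTGACCGTAGGCTAACGT"  # toy stand-in for a genome

    def align_read(read):
        """Toy exact-match 'aligner': report the first offset at which the
        short read occurs in the reference, or -1 if it does not occur."""
        return read, REFERENCE.find(read)

    if __name__ == "__main__":
        reads = ["ACGT", "TTGA", "GGCT", "AAAA"]  # billions, in the real setting
        with Pool(processes=4) as pool:           # one worker per core
            for read, pos in pool.map(align_read, reads):
                print(read, pos)

Real aligners replace the linear scan with index structures (suffix arrays, FM-indexes) and layer this intra-node parallelism under inter-node distribution.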

Develop Novel Statistical Methods for Multi-loci Genetic Mapping
Song Wu
[email protected]

The overall goal of this project is to develop novel statistical methods for detecting genetic risk factors by modeling multiple linked genetic loci in genome-wide association studies (GWAS). GWAS have achieved great success in moving forward our understanding of genetic contributions to many complex traits; however, current GWAS analyses experience the bottleneck of so-called "missing heritability", a phenomenon commonly observed in many GWAS whereby the detected genetic variants can explain only a small proportion of phenotypic heritability, while the majority remains mysterious. Part of the reason is that traditional GWAS analyses are mainly based on single-marker association, which is often inefficient in detecting the small effect sizes and complicated gene interactions involved in complex traits. Therefore, developing more powerful genetic mapping methods has become increasingly important. Capitalizing on the observation that the effect of a causal genetic locus may be carried by its neighboring marker loci due to the structure of linkage disequilibrium blocks, methods that integrate multiple linked markers are presumably more powerful for detecting the causal locus. However, the main difficulty for multi-loci genetic mapping is that when the number of linked SNPs is large, say 100 or more, high collinearity and model validity become serious issues. We aim to address these issues, extend current genetic mapping practice to incorporate the linkage information of multiple linked loci, and develop a series of statistical methods for multi-loci genetic mapping with a block of SNPs.

Most economically, biologically and clinically important traits, such as grain yield, reproductive behavior and cancer risk, are inherently complex. This project will greatly advance the discovery of novel genes and their interactions to facilitate the identification of drug targets to enhance public health, or to help animal and plant breeders improve trait quality. (CEWIT)

Cache-efficient Parallel Algorithms for Fast Flexible Docking and Molecular Dynamics
Rezaul Chowdhury
[email protected]

Proteins are one of the major structural and functional building blocks of our cells, and they often realize their functions through mutual interactions. The problem of computationally determining the relative transformation and conformation of two proteins that form a stable complex, reproducible in nature, is known as "protein-protein docking". Docking is an important step towards understanding protein-protein interactions, and has applications in drug design, structure-function analysis, and the study of molecular assemblies. In spite of recent advancements, the imaging of macromolecular complexes remains a difficult task, and the need for fast and robust computational approaches to predicting the structures of protein-protein interactions is growing.

We have already developed F2Dock, a rigid-body protein-protein docking program based on cache-efficient multicore algorithms and data structures. F2Dock is also the first docking program to use tunable approximations, and it partly builds on our work on the fast estimation of the total free energy of bio-molecules in almost linear work and linear space. While F2Dock performs rigid-body docking using static octree-based data structures, the goal of this project is to develop cache- and communication-efficient dynamic data structures and algorithms for fast docking of flexible molecules and for performing molecular dynamics on modern parallel machines. (CS/IACS Startup)

Efficient Resource-oblivious Parallel Algorithms
Rezaul Chowdhury
[email protected]

Parallelism is now ubiquitous. From the tiny multicore smartphones in our pockets to the gigantic supercomputers with thousands of multicore and manycore compute nodes, parallel processing is supported at all scales. Unfortunately, however, writing correct parallel programs is hard, and making them efficient is even harder, because of two apparently conflicting reasons: (1) abundance of resources, and (2) constraints on resources. To understand resource abundance, consider a cluster of nodes, each containing multicore processors and manycore coprocessors. In order to fully utilize these resources one must exploit internode distributed-memory parallelism as well as intranode shared-memory parallelism and, even inside a single node, keep both the processors and coprocessors busy with task-parallel and data-parallel computations, respectively. On the other hand, each level of the cluster comes with its own set of resource constraints, e.g., the number of available compute nodes and the internode communication bandwidth at the network level; the number of available processing cores and coprocessors and the cache/RAM/external-memory sizes inside a compute node; and the number of cores and the cache/memory sizes inside a coprocessor. The structures of the internode network and the intranode memory hierarchies also impose significant resource constraints. Finally, there could be a limited energy budget for the computation. Thus writing efficient parallel code for state-of-the-art machines remains the job of a few experts, who must also work hard to keep their skills up-to-date as the state of the art changes so frequently. As a result, being able to design efficient "resource-oblivious" algorithms, i.e., algorithms that do not use knowledge of resource parameters but still perform with reasonable efficiency across machines, has become a highly desirable goal. We have already shown how to design efficient core- and cache-oblivious parallel algorithms for the multilevel cache hierarchy of a multicore machine. The objective of this project is to improve known results for multicores, and to extend the notion of resource-obliviousness to real-world hybrid computing environments. If successful, this research will enable programmers to easily produce highly efficient code for state-of-the-art parallel computing platforms. A wide variety of scientific applications — ranging across physics, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas — will become easier to develop and maintain, benefiting these application areas as well as society at large. (CS/IACS Startup)
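The textbook example of this design style is cache-oblivious divide and conquer: recursion adapts to every cache level without naming any cache parameter. The sketch below (a standard classroom illustration, not code from this project) transposes a matrix by recursively halving the larger dimension:

    def co_transpose(A, B, r0, r1, c0, c1):
        """Cache-obliviously transpose A[r0:r1, c0:c1] into B (B[j][i] = A[i][j]).
        The recursion eventually fits every cache level, yet the code never
        mentions a cache size."""
        rows, cols = r1 - r0, c1 - c0
        if rows <= 16 and cols <= 16:            # small base case: copy directly
            for i in range(r0, r1):
                for j in range(c0, c1):
                    B[j][i] = A[i][j]
        elif rows >= cols:                       # split the taller dimension
            mid = (r0 + r1) // 2
            co_transpose(A, B, r0, mid, c0, c1)
            co_transpose(A, B, mid, r1, c0, c1)
        else:                                    # split the wider dimension
            mid = (c0 + c1) // 2
            co_transpose(A, B, r0, r1, c0, mid)
            co_transpose(A, B, r0, r1, mid, c1)

    A = [[i * 100 + j for j in range(50)] for i in range(30)]
    B = [[0] * 30 for _ in range(50)]
    co_transpose(A, B, 0, 30, 0, 50)
    assert all(B[j][i] == A[i][j] for i in range(30) for j in range(50))

Resource-obliviousness generalizes this idea beyond caches, to core counts, coprocessors, and network structure.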
Eliminating the Data Ingestion Bottleneck in Big Data Applications
Michael Bender and Rob Johnson
[email protected]

"Old-school" tools, such as RDBMSes and file systems, are unable to keep up with the data ingestion rates of big data applications. The result has been the proliferation of specialty solutions that make compromises on generality, data freshness, result accuracy, or hardware cost. RDBMSes and file systems have been orphaned at the largest scales by a mismatch between their underlying algorithms and hardware trends. At this point they run up to three orders of magnitude behind the capabilities of the hardware.

This proposal aims to deploy write-optimization techniques to yield a disruptive improvement in general-purpose analytical tools. By recouping two of the orders of magnitude available on modern hardware, this project will enable RDBMSes and file systems to scale to orders-of-magnitude larger speeds and data sizes. This research will bring large-scale data analytics to a much broader, less specialized audience.

A key component of this proposal is to rethink how storage-system hardware gets deployed for large data. By shifting the indexing bottleneck away from I/Os per second to bandwidth, it is possible to replace small disks with large disks, at a savings of more than an order of magnitude per byte. Similarly, write-optimization can be used to reduce the number of relatively expensive solid-state disks (SSDs), and to use them more effectively, for further cost savings. A twin goal of this proposal is to increase the scale of data amenable to analysis while cutting storage-hardware costs.

To achieve these goals, this project will redesign storage mechanisms to avoid crypto-searches that can throttle write-optimized data structures down to the speed of B-trees, develop SSD-resident approximate membership query (AMQ) data structures, explore AMQs for range queries, design write-optimized data structures that minimize write-amplification when used on SSDs, and investigate I/O models for capacitated maintenance and online index creation. (NSF)

FTFS: A Read/Write-optimized Fractal Tree File System
Michael Bender, Rob Johnson, and Don Porter
[email protected]

Today's general-purpose file systems offer poor performance on microdata operations, such as file creation and destruction, small writes to large files, and metadata updates, such as inode or atime updates. Microdata operations are pervasive in file system workloads. This performance artifact is the result of a tacit, but demonstrably incorrect, assumption underlying file system designs: that one must trade off between data locality on disk and small-write performance.

The proposed research observes that a recently-discovered class of data structures called write-read-optimized (WRO) data structures, such as Bε-trees and fractal tree indexes, can bridge this gap. WRO data structures give asymptotic behavior comparable to a B-tree for queries and bulk updates, and support small updates with performance close to logging. Prior work demonstrates that these asymptotic benefits translate to real performance improvements — up to two orders of magnitude faster than a traditional B-tree for some operations.

The organizing principle of this research is the creation of FTFS, a general-purpose Linux kernel implementation of a file system designed around WRO data structures. Unfortunately, operating systems have ossified certain assumptions about how file systems are designed, such as inducing extra lookups during update operations (called cryptoreads). Cryptoreads cause update operations to block on lookups, thus throttling the faster updates that WRO data structures provide. The proposed work will investigate OS support for WRO data structures, as well as redesigning WRO data structures to support the operations of a fully-featured file system. (CEWIT)

High-performance Rule Engine for Intelligent Web Information Systems
Michael Kifer
[email protected]

Rule languages have been a popular choice for research on intelligent information systems for over three decades now. However, until recently, the lack of necessary computing power and of suitable networking infrastructure prevented wide adoption of this paradigm in software development. With the advent of the Semantic Web, and following wide recognition of the limitations of its current standards, such as RDF and OWL, rule-based systems were put firmly back on the agenda. As proof of this renewed interest, a host of projects are expected to release new rule-based engines in the near future, and the World Wide Web Consortium is busy standardizing the necessary infrastructure, the Rule Interchange Format.

Most of the aforesaid projects, however, aim at low-hanging fruit by augmenting existing Semantic Web languages with simple rule-based capabilities. Their aim is to provide for the most basic of today's needs of the developers of semantic content on the Web. In contrast, our project targets future intelligent information systems, which require very expressive and high-performance rule-based knowledge programming languages. In this work, we develop new technologies and integrate existing ones, ranging from higher-order, frame-based, and defeasible knowledge representation, to reasoning about processes, truth maintenance, databases, and logic programming. (Vulcan Inc., NSF)

Potential markets: financial regulations, medical informatics, security
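Returning to the two storage projects above, the buffering idea behind write-optimized structures can be conveyed with a toy in-memory model; real Bε-trees and fractal trees instead flush buffers down an on-disk tree, so this sketch is a simplification, not their implementation:

    import bisect

    class BufferedIndex:
        """Toy write-optimized index: updates land in a small buffer (cheap,
        like logging) and are merged into the sorted 'on-disk' store in bulk.
        Lookups must check both places, which is exactly the query/update
        tension that WRO designs and cryptoread avoidance are about."""

        def __init__(self, buffer_limit=4):
            self.buffer = {}           # pending upserts
            self.store = []            # sorted (key, value) pairs, "on disk"
            self.buffer_limit = buffer_limit

        def upsert(self, key, value):
            self.buffer[key] = value   # no lookup needed on the update path
            if len(self.buffer) >= self.buffer_limit:
                self._flush()          # one bulk merge amortizes many updates

        def _flush(self):
            merged = dict(self.store)
            merged.update(self.buffer)
            self.store = sorted(merged.items())
            self.buffer.clear()

        def lookup(self, key):
            if key in self.buffer:
                return self.buffer[key]
            i = bisect.bisect_left(self.store, (key,))
            if i < len(self.store) and self.store[i][0] == key:
                return self.store[i][1]
            return None

    idx = BufferedIndex()
    for k in range(10):
        idx.upsert(k, k * k)
    print(idx.lookup(7))  # -> 49

The point of the toy: updates never wait on a search, which is the property a cryptoread-inducing OS interface would destroy.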

High Performance Big Data Analytics with the User in the Loop
Klaus Mueller
[email protected]

In data mining, and especially in big data, preprocessing consumes a large portion of the workflow, as much as 80-90%. Preprocessing includes data preparation, integration, cleaning, reduction, and transformation. As big data analysis can often be mission critical, preprocessing should be done expediently. The massively parallel architecture of GPUs offers an effective platform to accomplish high-speed data processing. However, as GPU technology has been developing steadily and rapidly, users have trouble keeping up. And even if they do, the largeness of big data requires not just one GPU but multiple GPUs in conjunction with large memory. These needs are best addressed in a cloud-based platform. Our framework utilizes both a single GPU and a multi-GPU cloud server, and it supports clustering, redundancy-based data decimation, outlier detection, data fusion, and so on. Using a progressive refinement scheme, users are given immediate visual feedback as partial results become available, allowing them to focus and steer the data processing. (NSF, DOE)

Automatic Generation of Virtual Troubleshooting Systems Using Ontologies
Paul Fodor, Michael Kifer, and Steve Greenspan (CA Technologies)
[email protected], [email protected]

Development of troubleshooting software is an attractive area of research for agent-based system developers. In this project, we use ontologies extracted from textual representations to automatically construct a troubleshooting virtual expert. In our solution, we verify the information about the structure of the system extracted from the textual document, then generate a conversation with the user in order to identify the problem and recommend appropriate remedies. To illustrate the approach, we have built a knowledge base for a simple use case and developed a special parser to generate conversations that can help the user solve software configuration problems (Figure 2). (Center for Dynamic Data Analytics and CA Technologies)

Fig. 2: Diagnosis system development process

Visual Correlation Analysis of Numerical and Categorical Multivariate Data
Klaus Mueller
[email protected]

Correlation analysis can reveal complex relationships in multivariate data. It touches a diverse set of application domains: science, medicine, finance, business, and many more. Knowledge about correlations can enable the discovery of causal links and enables predictive analysis. However, as the number of variables grows, it can be difficult to gain a good understanding of the correlation landscape, and important intricate relationships might be missed.

To address this significant problem, we have devised an interactive visual interface to assist in correlation analysis. Our interface visualizes the data in two linked spaces: data space and variable (attribute) space. For the data space, a data-centric plot, such as parallel coordinates or a low-dimensional embedding, visualizes correlations among the variables in terms of their patterns. For the variable space, a network-based correlation map directly visualizes the relationships among all variables; a multi-scale semantic zooming method provides scalability for high-dimensional and large data. (NSF, DOE)
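The numerical backbone of such a correlation map is a thresholded pairwise correlation matrix; the minimal sketch below shows only that backbone on synthetic data (the actual system adds categorical variables, layout, semantic zooming, and interaction):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    data = {                                   # toy multivariate data set
        "age":    rng.normal(50, 10, n),
        "income": rng.normal(60, 15, n),
    }
    data["spending"] = 0.7 * data["income"] + rng.normal(0, 5, n)
    data["fitness"]  = -0.5 * data["age"] + rng.normal(0, 5, n)

    names = list(data)
    X = np.column_stack([data[k] for k in names])
    R = np.corrcoef(X, rowvar=False)           # pairwise Pearson correlations

    # Keep only the strong relationships, as edges of a correlation map.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) > 0.4:
                print(f"{names[i]} -- {names[j]}: r = {R[i, j]:+.2f}")

Each surviving pair would become an edge in the network-based map, with semantic zooming revealing or hiding weaker edges at different scales.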

Imaging and Visualization

High Performance Computing for Medical Imaging
Klaus Mueller
[email protected]

We have devised solutions based on massively parallel commodity graphics hardware (GPUs) to accelerate compute-intensive applications in medical imaging, such as the iterative algorithms used for low-dose X-ray Computed Tomography (CT). For example, we have pioneered a novel streaming CT framework that conceptualizes the reconstruction process as a steady flow of data across a computing pipeline, updating the reconstruction result immediately after the projections have been acquired. Using a single PC equipped with a single high-end commodity graphics board, our system is able to process clinically-sized projection data at speeds meeting and exceeding typical flat-panel detector data production rates, enabling throughput of 100 projections/s for the reconstruction of clinically sized data volumes. Apart from enabling fast patient throughput and diagnosis, the streaming CT framework also represents an excellent platform for image-guided surgery and diagnostic imaging of transient phenomena. Multi-GPU solutions can achieve even higher throughput rates. A further, non-medical application is inside rapid security scanners for luggage and freight. (NIH)
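The iterative schemes referenced here repeatedly correct an image estimate against incoming projections. As a rough NumPy illustration of that family (a generic SIRT-style iteration on a toy system, not the project's GPU pipeline), assuming a dense projection matrix as a stand-in for the real forward projector:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.random((80, 40))        # toy projection matrix (one row per ray)
    x_true = rng.random(40)         # toy image (flattened)
    y = A @ x_true                  # measured projections

    # SIRT-style update: x <- x + C * A^T * R * (y - A x), using inverse
    # row/column sums as the usual normalization weights.
    R = 1.0 / A.sum(axis=1)         # inverse row sums
    C = 1.0 / A.sum(axis=0)         # inverse column sums
    x = np.zeros(40)
    for _ in range(200):            # a streaming framework applies such
        x += C * (A.T @ (R * (y - A @ x)))   # corrections as projections arrive
    print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

The streaming insight is that each correction needs only the projections seen so far, so reconstruction can overlap with acquisition instead of waiting for a full scan.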
A Visual Framework for Health Care Analytics
Klaus Mueller, IV Ramakrishnan, Rong Zhao, Asa Vicellio

High costs, lack of speed, non-intuitive interfaces, and inefficient, fragmented display of patient information have hindered the adoption of the Electronic Health Record (EHR, EMR). Critical factors inhibiting adoption of the EMR include the time spent by health care providers in accessing and also documenting patient information during clinical encounters. We have devised a visual analytics system that unifies all EMR information fragments, such as current symptoms, history of present illness, previous treatments, available data, current medications, past history, family history, and others, into a single interactive visual framework. Based on this information the physician can then follow through a medical diagnostics chain that includes requests for further data, diagnosis, treatment, follow-up, and eventually a report of treatment outcome. As patients often have rather complex medical histories, this visualization and visual analytics framework can offer large benefits for navigating and reasoning with this information. (CEWIT)

Visual Analytics for Open-Ended Educational Multiplayer Games
Lori Scarlatos
[email protected]

Open-ended multiplayer games have the potential to promote collaborative, active, and inquiry-based learning. Yet the data gathered by such games can be massive, with a great number of variables to consider. Assessing the learning from this activity, without knowing which variables to focus on, can be a daunting task. We address this problem by presenting guidelines for visualizing data collected from an open-ended educational multiplayer game. An implementation of these guidelines, resulting in a system for visualizing data from a socio-scientific multiplayer game, was created. The system was pilot tested to determine how changes to the gameplay could provide more instructive feedback to the students and increase learning. Students playing the game also used the visual analytics system to better understand how the collective players' choices affect the outcome of the simulation. We are continuing this research by investigating other open-ended educational games, and how their data can be effectively visualized. (CEWIT)

Detecting Data Visualization Preferences Using Games
Lori Scarlatos
[email protected]

In visualizations of large multivariate data sets, discrete data can be effectively represented using glyphs. Glyphs have the advantage of allowing for rapid visual comparison, using differing visual dimensions to represent the different variables in the data. Some types of glyphs accommodate even more variables by using shape to represent the data. Yet the characteristics of these shapes may have underlying perceptual meanings. The purpose of this study was to determine whether certain shape characteristics are commonly viewed as good or bad. We conducted a study using two methods to gather data: a traditional survey, and a casual game. The results of this study strongly suggest that there are certain shape characteristics that are generally perceived as positive/negative, although they are not necessarily what might be expected. Research is continuing on how to effectively use games as survey instruments. (CEWIT)

Energy Choices: Using an Agent-Based Modeling Simulation and Game to Teach Socio-Scientific Topics
Lori Scarlatos
[email protected]

In our modern world, where science, technology and society are tightly interwoven, it is essential that all students be able to evaluate scientific evidence and make informed decisions. Energy Choices, an agent-based simulation with a multiplayer game interface, was developed as a learning tool that models the interdependencies between the energy choices that are made, growth in local economies, and climate change on a global scale. We have pilot tested Energy Choices in two different settings, using two different modes of delivery. In our research, we are continuing development of the simulation, to increase the number of parameter choices, and of the game, to make it more engaging for student players. We are also investigating the creation of a general interface framework that can be applied to other games built upon socio-scientific agent-based simulations. (CEWIT)

Reality Deck – Immersive Gigapixel Display
Arie Kaufman (PI) and Klaus Mueller, Dimitris Samaras, Hong Qin (co-PIs)
[email protected]

Large, interactive, high-resolution displays have been demonstrated to be a valuable tool for the exploration of massive amounts of data. Due to their size, they allow for physical navigation (walking around in space) rather than virtual navigation (manipulating a virtual camera with a controller). Until now, such displays were limited to 300 megapixels in aggregate resolution. Additionally, they presented themselves as a single planar surface, reducing the potential physical-navigation benefits that users can enjoy. We have built the Reality Deck, the next-generation immersive, large, interactive, super-resolution display. It is a unique 416-panel tiled display visualization environment that offers a total resolution of 1.5 billion pixels at 100 dpi in a 4-wall horizontally immersive layout, while providing 20/20 visual acuity for the visualization space. It is the first facility of its kind and improves on the resolution of the next-largest tiled display wall by a factor of 5, and on other immersive environments, such as the CAVE, by a factor of 15. The high-resolution tiled LCD displays are driven by an efficient 20-node visualization cluster that utilizes four AMD FirePro V9800 GPUs per node, with 6 displays connected to each GPU. The cluster provides an aggregate 2.3 teraflops of CPU performance, 220 teraflops of GPU performance and 1.2 TB of memory. The Reality Deck is a one-of-a-kind facility, which serves as a platform for core visualization and usability research, for systems-level research enabling the visualization of new types of data (such as gigapixel video), and finally as an exploration platform for real-world visualization problems. We have implemented a number of interactive applications that leverage the high resolution and deal with a variety of large datasets, including gigapixel panoramic images, global GIS data, molecular models, medical data and art collections. (NSF)

Compressive Sensing Approach to Intraoperative Mass Spectrometry for Tumor Margin Delineation
Allen Tannenbaum
[email protected]

Tumor resection is a key procedure for the treatment of a number of tumors, including those of the brain and breast. The accurate determination of tumor boundaries is of crucial clinical importance, since the surgical resection of tumors requires a delicate balance between maximal resection of diseased tissue and minimal injury to the surrounding healthy tissue. Medical imaging techniques such as computed tomography and magnetic resonance imaging (CT and MR) are currently used in diagnosis as well as in image-guided surgical procedures, but studies show that these methods fail to accurately identify the full extent of malignant tumors and their microscopic infiltration. This highlights the need for a procedure that allows microscopic inspection of the tissue in real-time, and for a framework to collectively analyze a limited set of local measurements and assist the surgeon in delineating the tumor margin intraoperatively.

Accordingly, at CEWIT we are developing the use of compressive sensing methods for the reliable and robust detection of tumor margins using a small number of measurements, in particular for the breast and the brain. These measurements consist of desorption electrospray ionization mass spectrometry (DESI-MS) imaging data. Compressive sampling/sensing has been utilized to show that a very large class of signals can be accurately (or in some cases exactly) reconstructed from far fewer samples than suggested by conventional sampling theory. Classical signal processing techniques arrive at sufficient sampling by employing the band-limitedness of signals. In the compressive sensing approach, one defines sufficient sampling conditions based on the compressibility of a signal relative to a given dictionary designed for the problem at hand. Via models based upon the tumor cell concentrations derived from compressive sampling, our framework requires no assumptions about tumor shape or tumor cell density concentration. Further, we are developing certain statistical filtering techniques (e.g., particle and Kalman) that may be performed in the compressed domain to better approximate tumor boundaries.

Compared with scalar CT/MR imagery, where each voxel holds a single value, and diffusion-weighted MR imagery, where about 100 values are associated with a sample location, in DESI-MS imaging roughly 10^4 numbers are obtained at a single voxel. The treatment of such big data provides a great opportunity for data analysis and classification, but also poses a challenge for carrying out the analysis in a near real-time intraoperative fashion. Hence the necessity of selecting and extracting the important information from such big data sets, and of performing the computations in a sparse manner. In addition to reducing the computational load, identifying a sparse subset in the rich spectrum of DESI-MS naturally links with the exploration of the biochemical significance of the MS data. Indeed, among the roughly 10^4 values at a single voxel, only a small fraction correspond to key chemical substances that differentiate normal and cancerous tissue. Thus, determining the sparse spectral features indicates which mass/charge ratios are significantly different among various tissues. (NIH)
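Sparse recovery from few measurements is commonly solved with l1-regularized methods; below is a minimal sketch of the classical ISTA (iterative soft-thresholding) algorithm on synthetic data, a generic illustration of compressive sensing rather than the project's particular models or filters:

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, k = 60, 200, 5            # far fewer measurements than unknowns
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)  # sparse signal
    y = A @ x_true                  # compressive measurements

    # ISTA: gradient step on ||Ax - y||^2, then soft-thresholding, which
    # promotes sparsity (the l1-regularized recovery problem).
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    lam = 0.01
    x = np.zeros(n)
    for _ in range(500):
        z = x - step * A.T @ (A @ x - y)
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

    print("recovered support:", np.flatnonzero(np.abs(x) > 0.05))
    print("true support:     ", np.flatnonzero(x_true))

In the DESI-MS setting, the recovered sparse support plays the role of the few mass/charge ratios that actually discriminate tissue types.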


NYRISE Visualization for Climate Simulation Data
Arie Kaufman
[email protected]

The New York Resiliency Institute for Storms & Emergencies (NYRISE) is a cross-institutional and cross-disciplinary effort to enable the state of New York to be one step ahead of superstorm-Sandy-like disasters. The goal of the Institute is to bring together the expertise of various researchers at Stony Brook (led by researchers at the School of Marine and Atmospheric Sciences) and other universities in the fields of marine sciences, climatology, road network simulation, emergency response planning and visualization. We are focusing on the visualization aspect of NYRISE and are developing novel and scalable visualization techniques for the climate simulation data that will be utilized in the work of the Institute. Such visualizations merge the simulation data with underlying road networks, elevation and other GIS sources, enabling emergency planners to be better prepared for future storms and emergencies. These technologies will be deployable on traditional desktop computers but will also scale up to gigapixel-resolution facilities, such as the Reality Deck. (NYS)

Immersive Virtual Colonoscopy
Arie Kaufman
[email protected]

Virtual Colonoscopy (VC) is an established alternative to optical colonoscopy for colorectal cancer screening. A major advantage compared to the traditional procedure is that it is non-invasive and may be easier on the patient. There are also significant advantages in terms of completeness of the examination, as VC provides complete coverage of the surface of the colon. We have developed an Immersive Virtual Colonoscopy (IVC) visualization system that further improves the efficiency of the examination through the use of virtual reality technologies in the Immersive Cabin (IC), also known as the CAVE, at Stony Brook University. The IC is a 5-sided enclosed visualization environment that uses pairs of high-resolution projectors to create stereoscopic images around the user. Stereoscopic rendering provides the radiologist with enhanced shape and depth perception, which improves the visual detection of polyps. The immersion further gives the ability to look around obstacles in the data, such as the haustral folds or sharp bends in the colon, simply by looking around in the IC. We have also developed a conformal visualization technique to support facilities that are only partially immersive (e.g., with a missing ceiling surface, as in the IC) while preserving important 3D shapes in the data, such as the polyps. The combined effect of IVC is reduced examination time and improved sensitivity for the screening procedure. (NIH, NSF)

Natural Interaction with VR Environments
Arie Kaufman
[email protected]

Traditional user interface devices, such as mice and keyboards, impose a "tethering" constraint on the user, anchoring him to a stationary working surface. Consequently, these interfaces do not lend themselves to a facility such as the Reality Deck, which encourages users to physically navigate the data by walking within the space. This physical navigation enables users to leverage their spatial memory when attacking data analysis tasks within an immersive visualization system, and should not be discouraged by using tethered interaction devices. We are developing the next generation of user interfaces, fit for usage within large immersive environments. These user interfaces are hand-driven and gesture-centric, allowing users to explore the visualization from any point within the facility, simply by wearing a pair of low-cost tracked gloves. (NSF, Samsung)

Gigapixel Video Streaming
Arie Kaufman
[email protected]

Gigapixel-resolution image capture is an active research area, with proof-of-concept devices already in the field. The AWARE family of cameras, developed at the Duke Imaging and Spectroscopy Program, can capture color gigapixel-resolution images in a single "shot" by utilizing arrays of microcameras. The natural next step is the acquisition of gigapixel-resolution video, and the Reality Deck facility is a perfect fit for displaying and inspecting such content. However, the sheer bandwidth requirement for the delivery of such content from a remote camera to the visualization facility poses a challenging problem. Additionally, the microcamera data needs to be preprocessed (effectively "stitched" together) before the final gigapixel image is available, a process that currently is not real-time. We are developing solutions to this challenging technical problem. First, techniques for optimizing data transfers between the camera and the Reality Deck facility are being developed. Specifically, the position of a user within the Reality Deck can be used to adaptively select the image quality that should be delivered to each display, reducing in this way the overall bandwidth requirement for data streaming. Additionally, novel techniques for the reconstruction of these gigapixel images are being investigated. These techniques will leverage the parallel processing abilities of the graphics pipeline on modern GPUs, with the goal of achieving real-time reconstruction of the gigapixel image. (NSF, internal funding)
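The position-based quality selection just mentioned can be conveyed with a small sketch; the distance thresholds, display names, and quality levels below are purely hypothetical choices for illustration:

    import math

    def quality_for_displays(user_xy, displays, levels=(1080, 720, 480)):
        """Pick a stream quality per display tile from the user's position:
        nearby tiles get the full-resolution stream, distant ones a cheaper
        one, cutting aggregate streaming bandwidth."""
        plan = {}
        for name, xy in displays.items():
            d = math.dist(user_xy, xy)
            if d < 2.0:
                plan[name] = levels[0]   # within the user's focus
            elif d < 6.0:
                plan[name] = levels[1]
            else:
                plan[name] = levels[2]   # peripheral wall
        return plan

    walls = {"north-3": (0, 4), "east-1": (5, 0), "south-7": (-2, -8)}
    print(quality_for_displays((0.5, 3.0), walls))

In the actual facility the user position would come from the tracking system and the plan would be re-evaluated continuously as the user walks.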

Interactive Immersive Visualization of Computed Microtomography Data
Arie Kaufman
[email protected]

Computed microtomography (CMT) is widely used at synchrotron facilities for the characterization of samples in many different fields, such as energy, environment, materials, biomedicine, and plants. The experiments produce 3D data sets of several GB that need to be analyzed and visualized. With the BNL NSLS-II facility that will become operational shortly, the resolution and data sizes will be significantly larger due to the increased x-ray intensities. This will require novel visualization tools to allow for the efficient analysis of the data. We have developed interactive visualization techniques for CMT data that leverage the immersive visualization facilities at Stony Brook University. The Immersive Cabin (IC), also known as the CAVE, is a 5-sided enclosed visualization environment that uses pairs of high-resolution projectors to create stereoscopic images around the user. This allows for interactive virtual flight through the pore structure of the sample, which can be used as a stage in the analysis framework. While the stereoscopic rendering is particularly suitable for studying intricate 3D structures, the pixel density in the IC (and in CAVEs in general) is too low to handle larger datasets. We plan to implement the novel visualization techniques recently developed for the Reality Deck, which uses 416 high-resolution professional LCDs to provide more than 1.5 billion pixels of aggregate resolution in an immersive 4-sided and significantly larger working space, with an order of magnitude increase in pixel density compared to the IC. (BNL)

Medical Volume Rendering on the Gigapixel Reality Deck
Arie Kaufman
[email protected]

We have developed a novel visualization system based on the reconstruction of high-resolution and high frame-rate images from a multi-tiered stream of samples that are rendered framelessly, as opposed to the traditional rendering on a pixel grid. The sample generator can be a GPU cluster that is separate from the display cluster, or even a cloud-based service. This decoupling of the rendering system from the display system is particularly suitable when dealing with very high resolution displays or expensive rendering algorithms, where the latency of generating complete frames may be prohibitively high for interactive applications. We specifically address the application of medical volumetric rendering on a gigapixel display, such as the Reality Deck at Stony Brook University, where the traditional visualization pipeline cannot produce complete images at interactive frame-rates. (NSF)

Fusion of GIS and Procedural Modeling for Driving Simulation in New York
Arie Kaufman
[email protected]

We have developed a mixed modeling pipeline for the creation of virtual 3D urban environments that approximate real-world environments with sufficient fidelity to be suitable for driving simulations. One of the target applications is the simulation of eco-driving on the Manhattan-to-JFK route, where the area is too large to be modeled effectively by traditional modeling tools. Our approach combines accurate GIS data for large areas with procedural tools for the semi-automatic generation of the 3D data based on mathematical models. From sources such as elevation and aerial maps, transportation networks and building footprints, our pipeline produces compelling and varied models that approximate the target location. We have further developed visualization tools for the exploration of these models, leveraging our virtual reality expertise and the Immersive Cabin and Reality Deck facilities at Stony Brook University. (Hyundai Motors, SUNY)

Lattice Simulation and Rendering
Arie Kaufman and Klaus Mueller
[email protected]

In this project we have developed innovative lattices for the computation and rendering domain, such as the BCC (Body Centered Cubic) and FCC (Face Centered Cubic) lattices, for a variety of 3D and 4D wave phenomena simulations, such as lighting, acoustics and tomography. The hexagonal lattice discretizes the domain into lattice sites, where every site is connected to 12 neighbors, in contrast with the cubic cell of Cartesian grids. Photons, phonons, wavelets, or wavefronts are traced on the lattice links and interact at the lattice sites (e.g., scattering, absorption) using cellular-automata-like processes. The proposed work investigates and develops structures, constructions, sampling, algorithms and applications of the hexagonal lattices. The hexagonal lattice can be applied in, and has the potential to revolutionize, a variety of applications, such as simulation and rendering of light, acoustics and tomography, and it has the potential for very broad impact by extending the framework to many other particle-based wave phenomena, such as those in particle physics. (NSF, CEWIT)
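The 12-neighbor connectivity mentioned above can be made concrete: on an FCC lattice the nearest neighbors of a site are the distinct permutations of (±1, ±1, 0). The short sketch below enumerates them and performs one cellular-automaton-style propagation step; it is a generic illustration of the lattice structure, not the project's simulation code:

    from itertools import permutations

    # The 12 nearest neighbors of an FCC lattice site: all distinct
    # permutations of (±1, ±1, 0). (A simple-cubic site has only 6.)
    FCC_NEIGHBORS = sorted({
        p for s1 in (1, -1) for s2 in (1, -1)
        for p in permutations((s1, s2, 0))
    })
    assert len(FCC_NEIGHBORS) == 12

    def propagate(occupied):
        """One cellular-automaton-style step: every occupied site sends a
        'particle' along each of its 12 lattice links."""
        return {(x + dx, y + dy, z + dz)
                for (x, y, z) in occupied
                for (dx, dy, dz) in FCC_NEIGHBORS}

    print(sorted(propagate({(0, 0, 0)})))   # the 12 first-shell sites

The denser, more isotropic neighborhood is what makes such lattices attractive for tracing wavefronts compared to the 6-connected Cartesian grid.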

Plume Modeling Simulation and Visualization
Arie Kaufman and Klaus Mueller
[email protected]

We have adopted a numerical method from computational fluid dynamics, the Lattice Boltzmann Method (LBM), for real-time simulation and visualization of flow and amorphous phenomena. Unlike other approaches, LBM discretizes the micro-physics of local interactions and can handle very complex boundary conditions, such as deep urban canyons, curved walls, indoors, and the dynamic boundaries of moving objects. Due to its discrete nature, LBM lends itself to multi-resolution approaches, and its computational pattern is easily parallelizable. We have accelerated LBM on commodity graphics processing units (GPUs), achieving real-time or even accelerated real-time performance on a single GPU or on a GPU cluster. Another key innovation of LBM is its extension to support input from pervasive sensors, influencing the simulation so as to maintain its faithfulness to real-time live sensor readings. We have implemented a 3D urban navigation system and have tested it with a 10-block GIS in the West Village of New York City, and with an 851-building area in Times Square of NYC. In addition to a pivotal application in the simulation of airborne contaminants in urban environments, our approach will enable the development of other superior prediction simulation capabilities for physically accurate environmental modeling and disaster management, as well as visual simulations for computer graphics, and it has the potential to revolutionize the way scientists and engineers conduct their simulations. (NSF)

Visual Simulation of Thermal Fluid Dynamics in a Water Reactor
Arie Kaufman
[email protected]

We have developed a simulation and visualization system for the critical application of analyzing the thermal fluid dynamics inside the pressurized water reactor of a nuclear power plant when cold water is injected into the reactor vessel, posing a possible thermal shock to the vessel. We employ a hybrid thermal lattice Boltzmann method (HTLBM), which has the advantages of ease of parallelization and ease of handling complex simulation boundaries. For efficient computation and storage of the irregular-shaped simulation domain, we classify the domain into nonempty and empty cells and apply a novel packing technique to organize the nonempty cells. We further implement this method on a GPU (graphics processing unit) cluster for acceleration. The system demonstrates the formation of cold-water plumes in the reactor vessel. We also develop a set of interactive visualization tools, such as side-view slices, 3D volume rendering, thermal-layers rendering, and panorama rendering, which are provided to collectively visualize the structure and dynamics of the temperature field in the vessel. To the best of our knowledge, this is the first system that combines 3D simulation and visualization for analyzing thermal shock risk in a pressurized water reactor. (NRC, ISL)
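The packing of nonempty cells can be illustrated with a minimal NumPy sketch: store state only for fluid cells, plus an index map from grid coordinates into the packed array. The flag array and fill fraction below are toy stand-ins, not the project's data:

    import numpy as np

    # Toy irregular domain: a 3D flag array marking fluid (nonempty) cells.
    shape = (8, 8, 8)
    rng = np.random.default_rng(3)
    fluid = rng.random(shape) < 0.2            # ~20% of the vessel is fluid

    # Pack: keep state only for nonempty cells, with an index map so that
    # neighbor lookups can go from grid coordinates to the packed array.
    coords = np.argwhere(fluid)                # (n_fluid, 3) coordinates
    index_map = -np.ones(shape, dtype=np.int64)
    index_map[tuple(coords.T)] = np.arange(len(coords))
    state = np.zeros(len(coords))              # e.g., packed temperatures

    print(f"dense cells: {fluid.size}, packed cells: {len(coords)}")
    x, y, z = coords[0]
    print("packed slot of first fluid cell:", index_map[x, y, z])

On a GPU the same layout keeps memory compact and accesses coalesced, which is the point of packing an irregular reactor-vessel domain.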
algorithms of the whole heart envelope, handle very complex boundary conditions, For efficient computation and storage of as well as the coronary arteries for the such as deep urban canyons, curved the irregular-shaped simulation domain, purpose of data size reduction. Seg- walls, indoors, and dynamic boundaries of we classify the domain into nonempty and mented data is significantly smaller in size moving objects. Due to its discrete nature, empty cells and apply a novel packing and hence can be delivered, preferably LBM lends itself to multi-resolution ap- technique to organize the nonempty cells. compressed using either lossless or lossy proaches, and its computational pattern is We further implement this method on a compression, to the doctor at any loca- easily parallelizable. We have accelerated GPU (graphics processing unit) cluster for tion far away from the hospital for rapid LBM on commodity graphics processing acceleration. preliminary diagnosis in the cases of acute units (GPUs), achieving real-time or even chest pain, which is ultimately a lifesaving accelerated real-time on a single GPU or scenario. (CEWIT) on a GPU cluster. Another key innovation of LBM is its extension to support input from pervasive sensors, influencing the simulation so as to maintain its faithfulness to real-time live sensor readings. We have implemented a 3D urban navigation sys-


Automatic Spleen Mobile-based Volume Video Streaming for Segmentation for Non- Rendering Pipeline for Interactive Visualization Invasive Lymphoma m-Health on Mobile Devices Diagnosis Arie Kaufman Arie Kaufman Arie Kaufman [email protected] [email protected] [email protected] Current mobile devices such as smart- We have developed a client-server stream- phones or tablets have a number of unique ing architecture to address the computa- Diagnosis of spleen disorders, especially characteristics that make them suitable tional challenge of large data visualization lymphoma, is a common and challenging platforms for medical applications. Their on modern mobile devices, such as in issue in the clinical practice. Changes in portability and always-on connectivity allows medical imaging cases. In particular, the size of the spleen under the therapy a medical doctor or health care provider to tablets are used extensively in many fields, provide a good evidence to assess the conduct the diagnostic process and follow including the medical field, however, they up without being constrained to the work- course of lymphoma malignancy. The do not provide sufficient performance to station computer in the hospital facility. initial step for the diagnosis is automatic run applications such as volume rendering spleen segmentation. In clinical setting it that traditionally require a workstation-class We develop a pipeline for visualization is still often performed manually, which computer. The problem is further compli- of medical imaging, such as Computed is time consuming and highly observer- cated by the ever-growing dataset sizes Tomography (CT) and Magnetic Reso- dependent. We focus on developing in medicine, driven by advancements in nance Imaging (MRI) data that is com- algorithms for robust automatic spleen scanner and sensor technologies. In our mon for a variety of applications, such as segmentation and evaluate them against system, the user’s (e.g., doctor’s) input CT angiography, virtual colonoscopy, and available ground truth results. Additionally, on the tablet device is sent to a dedicated brain imaging. In our work we concentrate we automatically evaluate the volume of GPU cluster that executes the visualization on two main architectures for volumetric the spleen that is to be compared across algorithms, encodes the resulting video us- rendering of medical data: rendering of the repeated scans of the patient for evaluat- ing high quality hardware video encoders, data fully on the mobile device, when the ing changes and lymphoma diagnosis. and streams the video back to the tablet. data is already transmitted to the device, Full resolution high quality results can be (Stony Brook University) and a thin-client architecture, where the achieved with minimal latency over a range entire data resides on the remote server of wireless technologies, enabling faster and the image is rendered on it and then and more efficient diagnosis. (Samsung) streamed to the client mobile device. As mobile devices have been establishing new ways of interaction, we explore and develop 3D User Interfaces for interacting with the volume rendered visualization. These in- clude touch-based interaction for improved exploration of the data. (Samsung)


Volumetric Mesh Mapping
Arie Kaufman and Xianfeng David Gu
[email protected]

With the rapid development of volumetric acquisition and computational technologies, vast quantities of volumetric datasets exist in numerous applications, such as industrial inspection and medical imaging. The demands for processing such volumes to analyze their topology and geometry are pressing: volumetric mapping to canonical structures, volumetric registration, volumetric feature extraction, geometric database indexing, volumetric parameterization, and so on. This project focuses on developing rigorous algorithms for computing the topology and geometry of general mesh volumes. Specifically, we have been developing computational algorithms for Ricci flows. On the other hand, it is highly desirable to map one or more volumes to a canonical domain, to support database indexing and volume registration.

We have built a concrete set of software tools for computing and visualizing the topological and geometric structures of mesh volumes, including volumetric parameterization, volumetric registration, volumetric mapping to canonical structures, fundamental group computation, and topological and geometric feature extraction. Engineering, science, medicine, computer graphics, vision, scientific computing, and mathematics will directly benefit from these tools, the research, and the education. These tools can be further used in: (1) industry, for CAD/CAM/CFD simulation and analysis, non-destructive testing of scanned parts, reverse engineering, and large geometric database indexing; (2) medical imaging, for volumetric registration and fusion, comparison, shape analysis, and abnormality and cancer detection; (3) computational fields, for weather prediction, air flow around vehicles, and toxin prediction, using volumetrically computed datasets; and (4) other fields, from confocal volume microscopy for cellular research, to seismology for earthquake prediction and gas and oil exploration, to radar and underwater sonography for terrain mapping and object detection, both civilian and military. (NSF)

Conformal Mapping for Medical Imaging
Arie Kaufman, Xianfeng David Gu, Dimitris Samaras, and Wei Zhu
[email protected]

It is paramount in medical imaging to measure, compare, calibrate, register and analyze potentially deformed organ shapes with high accuracy and fidelity. However, this is extremely difficult due to the complicated shapes of human organs. Different organs have different topologies and curvature distributions, and furthermore, the shape may deform due to disease progression, movement, imaging, surgery and treatment. We have used conformal geometry, a theoretically rigorous and practically efficient and robust method, to tackle this challenge. The broad objective of this project is to develop conformal geometry as a primary tool in the vast biomedical applications of medical imaging. We have been developing the application of conformal surface flattening to a variety of organs, the use of conformal mapping for volumetric mesh parameterization, registration and fusion using conformal geometry, and statistical analysis and feature extraction using conformal geometry. The research design and methodology include developing and validating techniques to conformally flatten 3D organ surfaces to canonical parametric surfaces for colonic polyp detection, bladder cancer screening, and endovascular surgical planning for aortic aneurysm. We have further extended flattening to implement volumetric parameterization based on Ricci flow and applied it to brain and colon structure segmentation, tumor evaluation, and diffusion tensor field study. In addition, we have implemented shape registration and data fusion using a common canonical parameter domain. Brain data sets have been fused between and within subjects and modalities, and colon supine and prone scans have been registered for cancer screening. Finally, we have conducted statistical analysis and feature extraction using conformal geometry for drug addiction and Alzheimer's disease, where Fourier analysis on the canonical domains has transformed the data to the frequency domain. (NIH)

Automatic Detection of Colon Cancer
Arie Kaufman
[email protected]

This project involves a method for computer-aided detection (CAD) of colorectal cancer. The CAD pipeline automatically detects polyps while reducing the number of false positives (FPs). It integrates volume rendering techniques and conformal (i.e., angle-preserving) colon flattening with texture and shape analysis. Using our colon flattening method, the CAD problem is greatly simplified by converting it from 3D into 2D. The colon is first digitally cleansed, segmented, and extracted from the CT dataset of the abdomen. The colon surface is then mapped to a 2D rectangle using conformal mapping. This flattened image is colored using a direct volume rendering of the 3D colon dataset with a translucent transfer function. Suspicious polyps are detected by applying a clustering method on the 2D flattened image to locate regions with irregularity and high density. The FPs are reduced by analyzing shape and texture features of the suspicious areas detected by the clustering step. Compared with shape-based methods, ours is much faster and much more efficient, as it avoids computing curvature and other shape parameters for the whole colon wall. We tested our method with 88 datasets from NIH and Stony Brook University Hospital and found it to be 100% sensitive to adenomatous polyps with a low rate of FPs. The CAD results are seamlessly integrated into a virtual colonoscopy system, providing the physician with visual cues and likelihood indicators of areas likely to contain polyps. This, serving as a second reader, allows the physician to quickly inspect the suspicious areas and exploit the flattened colon view for easy navigation and bookmark placement. (NIH)
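Once the colon is flattened, finding high-density candidate regions reduces to 2D image clustering. A minimal sketch of that step, using thresholding plus connected-component labeling on a synthetic "flattened" image (the project's actual clustering and feature analysis are more sophisticated):

    import numpy as np
    from scipy import ndimage

    # Toy 'flattened colon' density image with two bright blobs.
    img = np.zeros((64, 256))
    img[20:24, 40:45] = 1.0
    img[50:53, 200:204] = 1.0
    img += np.random.default_rng(4).normal(0, 0.05, img.shape)

    mask = img > 0.5                         # high-density pixels
    labels, n = ndimage.label(mask)          # group them into connected regions
    for region in range(1, n + 1):
        ys, xs = np.nonzero(labels == region)
        if len(ys) >= 10:                    # drop tiny speckle clusters
            print(f"candidate at row={ys.mean():.0f}, col={xs.mean():.0f}, "
                  f"size={len(ys)} px")

Each surviving region would then go through the shape and texture analysis that prunes false positives before the results are shown in the virtual colonoscopy system.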

Reconstruction and Registration Framework for Endoscopic Videos

Arie Kaufman
[email protected]

We develop a general reconstruction and registration framework (RnR) for endoscopic videos. We focus on and demonstrate RnR for optical colonoscopy (OC) videos, and we introduce a novel way of visualizing endoscopic images and videos by using high-quality depth maps. To recover these depth maps from any given image captured from an endoscopic video, we employ a non-parametric machine learning technique which uses a dictionary of RGB images captured from virtual colonoscopy (VC) datasets and their corresponding depth patches. Given a video sequence, we can use structure-from-motion (SfM) and these inferred depth maps to recover a complete 3D structure defined in that sequence, along with the camera parameters. This allows us to register the recovered structure from the OC video with the one captured in the corresponding VC model, using quasi-conformal mapping. As a result, our framework can document the OC procedure using reconstruction and can localize the polyps found in VC via registration. The framework can also be used for registration of two OCs taken at different times. In addition, we can study the quality of our reconstruction using the registration process, keeping the VC model as ground truth. Due to the non-rigidity of the colon, some components (e.g., folds) can sometimes have large deformations in successive frames which cannot be handled by SfM. To deal with these cases, our framework can also reconstruct a complete colon segment from a single colonoscopy image, assuming the intrinsic parameters of the camera are known. Our general RnR framework may be used for other endoscopic procedures (e.g., bronchoscopy, cystoscopy, etc.). (NIH)

Registration of Volumetric Prostate Scans Using Curvature Flow

Arie Kaufman
[email protected]

Radiological imaging of the prostate is becoming more popular among researchers and clinicians searching for diseases, primarily cancer. Scans might be acquired with different equipment or at different times for prognosis monitoring, with patient movement between scans, resulting in multiple datasets that need to be registered. For these cases, we introduce a method for volumetric registration using curvature flow. Multiple prostate datasets are mapped to canonical solid spheres, which are in turn aligned and registered through the use of identified landmarks on or within the gland. Theoretical proof and experimental results show that our method produces homeomorphisms with feature constraints. We provide thorough validation of our method by registering prostate scans of the same patient in different orientations, from different days, and using different modes of MRI. Our method also provides the foundation for a general group-wise registration using a standard reference, defined on the complex plane, for any input. In the present context, it can be used to register as many scans as needed for a single patient, or for different patients grouped by age, weight, or even malignant versus non-malignant attributes, to study differences in the general population. Though we present this technique with a specific application to the prostate, it is generally applicable to volumetric registration problems. (NIH)

Brain Parcellation

Arie Kaufman
[email protected]

Establishing correspondences across structural and functional brain images via labeling, or parcellation, is an important and challenging task in clinical neuroscience and cognitive psychology. A limitation of existing approaches is that they (i) are based on heuristic manual feature engineering, and (ii) assume the validity of the designed feature model. In contrast, we advocate a machine learning approach to automate brain parcellation. We are developing a novel shape-based approach for automatically labeling the anatomical features (folds, gyri, fundi) of the brain. We show that this approach can be considered a viable alternative to the atlas-based labeling approach. This method can also be used as an initialization step for the atlas-based approach to provide more accurate labeling of anatomical features. We create a machine learning framework to learn from the manually-labeled anatomical dataset Mindboggle. This framework takes into account multiple features and shape measures to achieve up to 91% accuracy, without taking into account any spatial information. Moreover, this work is the first in the domain of 3D anatomical labeling of the brain. (internal funding)
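A minimal sketch of the learning setup in the brain parcellation project above, assuming shape measures are already computed per surface patch. The features and labels below are random placeholders standing in for the Mindboggle-derived data; the point is the purely feature-based (non-spatial) classification.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder feature matrix: one row per surface patch, columns are
# shape measures (e.g., curvature statistics, depth, area); labels are
# anatomical feature classes (fold / gyrus / fundus), encoded 0..2.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))       # hypothetical shape features
y = rng.integers(0, 3, size=1000)    # hypothetical anatomical labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# No spatial information is used: each patch is classified from its
# shape measures alone, mirroring the setup reported above.
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```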


Multimodal and Multivariate Visualization of the Brain

Arie Kaufman
[email protected]

Current connectivity diagrams of human brain image data are either overly complex or overly simplistic. In this work we introduce simple yet accurate interactive visual representations of multiple brain image structures and the connectivity among them. We map cortical surfaces extracted from human brain magnetic resonance imaging (MRI) data onto 2D surfaces that preserve shape (angle), extent (area) and spatial (neighborhood) information for 2D (circular disc or square) mappings, or optimal angle preservation for 3D (spherical) mappings; break up these surfaces into separate patches; overlay shape information on these patches; and cluster functional and diffusion tractography MRI connections between pairs of these patches. The resulting visualizations are computationally easier to compute on and more visually intuitive to interact with than the original data, and they facilitate simultaneous exploration of multiple datasets, modalities, and statistical maps. Additional specific contributions include two new techniques to map the brain surface onto the sphere, and a novel insight regarding shape-based delineation of brain feature boundaries. (internal funding)

Meshless Point Cloud Registration by Conformal Mapping

Arie Kaufman and Xianfeng David Gu
[email protected]

With the mass production of home-use 3D sensors, such as the Microsoft Kinect, 3D scanning is becoming more popular. However, due to the limitations of these scanners (e.g., low resolution, 2.5D), advanced registration is required to generate higher quality models, such as the user's avatar. These scanners create 3D point cloud models, which we need to register in order to build a high quality model or to search for a match in a database of point clouds. The classical Iterative Closest Point (ICP) method assumes that there is a large percentage of overlap between the two models to be registered. On the other hand, traditional conformal mapping requires meshes for boundary estimation and does not extend well to point clouds. We introduce a meshless conformal mapping framework capable of registering noisy point clouds captured by Kinect without any mesh data, using model component segmentation and skeleton tracking. (internal funding)

Volumetric Shape Analysis

Arie Kaufman
[email protected]

This project involves the development of novel algorithms and techniques for geometric shape analysis in 3D volumetric data. Unlike in the case of 3D surfaces, shape analysis in volumes is still in a nascent stage due to several challenges, such as immense data sizes and increased complexity. This project aims to analyze shapes in 3D volumes, including laying down theoretical foundations, designing computational algorithms, implementing software systems, and applying them in computer graphics and visualization. The aim is to use both deterministic methods, such as the heat kernel and the Laplace-Beltrami operator, and stochastic methods, such as random walks, Markov Random Fields, and Markov Chain Monte Carlo methods. These methods are used to generate unique shape signatures and spectra, which in turn are used in various applications such as matching, registration, tracking and analysis. The focus is also on developing fast and efficient computational methods using GPUs in order to facilitate real-time shape analysis. The results obtained using all the aforementioned methods are compared to provide an evaluative survey. In addition to computer graphics and visualization, the project results can be applied to other fields such as computer vision, geometric modeling, medical imaging, manufacturing and architecture. Finally, as a byproduct, a general purpose volumetric geometric database is built up, with a set of geometric analysis tools that allow volumetric geometric processing, searching, registration, fusion, comparison and classification. (NSF)
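For context on the meshless registration project above, here is the classical SVD-based rigid ICP baseline it improves upon (a textbook sketch, not the project's conformal method):

```python
import numpy as np

def icp(src, dst, iters=50):
    """Rigid ICP: align point cloud src (Nx3) to dst (Mx3).

    Classical baseline: assumes substantial overlap between the clouds,
    which is exactly the assumption the meshless conformal approach relaxes.
    """
    src = src.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Nearest-neighbor correspondences (brute force for clarity).
        d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        # Best rigid transform via SVD (Kabsch algorithm).
        mu_s, mu_d = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:   # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        src = src @ R.T + t
        # Accumulate the overall transform applied so far.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```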


Ambienizer: Turning Digital Photos into Ambient Visualizations

Arie Kaufman and Klaus Mueller
[email protected]

Although traditional information visualization systems are of great aid to experts, they are typically not appropriate for common citizens, because these mainstream users are often not familiar even with the most traditional techniques, such as bar charts or line charts, let alone advanced paradigms such as parallel coordinates. In this project we have developed a framework that allows users to turn any digital photo into an ambient visualization, which seeks to convey information in a casual, non-technical way using metaphors accessible to any common-sense viewer. In our ambient visualization, called Ambienizer, the levels of the variable(s) to be monitored are mapped to suitable image processing operations whose effects reflect the levels of these variables, or to retrieved images whose features correspond to the values of the data. Our approach is attractive since users can either choose any image of their preference and map any of the available visual effects to the variables they wish to monitor, or view an image with preferred image features from their photo archive or the web. Although the system can show various types of data, such as bank accounts, stock portfolios, weather information, and so on, we demonstrate its use in the context of energy use monitoring in a household. (DOE)

Saliency-Aware Compression of Volumetric Data

Arie Kaufman
[email protected]

With the constantly growing size of volumetric data, the demand is increasing for its efficient compression, storage, transmission and visualization. Often only a certain part of a large volume is of interest to the user, and a significant amount of the data contains no relevant information. In this project, we have developed a framework designed for processing and visualizing a large amount of volumetric data on a target device, possibly a mobile device with limited computational resources. Our system involves a 3D block-based, saliency- and transfer-function-guided compression scheme for volumetric data that achieves content and spatial scalability. The saliency of each volumetric region is computed from the coefficients of the 3D discrete cosine transform of the volume. Identifying the salient volumetric blocks and weighting them by the transfer function helps to schedule the ordered transmission of the regions in the volume. Additionally, our method is integrated into a resolution-scalable coding scheme with an integer wavelet transform of the image, so it allows the rendering of each significant region at a different resolution, and even lossless reconstruction can be achieved. In addition, in order to further reduce the compressed data, our method provides an option to remove less salient blocks, obtained by 3D seam carving from a 3D minimal cut, before compressing the data. At the target device the received data is rendered progressively based on its saliency. Prioritized streaming of the data helps to achieve data reduction by rendering regions based on their saliency and disregarding less essential components. (CEWIT)

Virtual Dressing Room Project

Arie Kaufman
[email protected]

The development of virtual reality head-mounted display kits, such as the Oculus Rift, and their popularity among consumers call for their integration into daily routines, for example shopping experiences. In our scenario the user can shop in any store in a fully immersive way, all from the comfort of his/her home. In this project we integrate an immersive 3D headset to create a fully immersive shopping environment that can be explored by the user like any real-life department store. We structure the shopping environment as a physical store, with which the user can interact by walking around, browsing items, and trying them on, all in the first-person view. The user is provided with an opportunity to import an avatar which resembles his/her body shape. Creation of this avatar is an intricate process and utilizes state-of-the-art algorithms developed in our lab. Alternatively, we provide a set of synthetic avatars with body measurements that correspond to traditional clothing sizes. To facilitate virtual fitting in our environment, we develop custom geometric clothing models and methods of virtual fitting. The user is provided with feedback from trying on such clothing, as it is simulated according to its physical properties. The project is implemented on the basis of a 3D game engine to provide real-time interaction and feedback to the user. (Omni-Scient)
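The block-saliency step of the compression project above can be sketched as follows. This is a simplified illustration that scores each 8x8x8 block by the energy of its AC coefficients from the 3D DCT; the actual system additionally weights blocks by the transfer function.

```python
import numpy as np
from scipy.fft import dctn

def block_saliency(volume, bs=8):
    """Per-block saliency of a 3D volume from 3D DCT coefficients.

    Illustration only: saliency = AC-coefficient energy of each bs^3
    block (the DC term is excluded), so uniform regions score low and
    can be transmitted late or dropped.
    """
    zs, ys, xs = (s // bs for s in volume.shape)
    sal = np.zeros((zs, ys, xs))
    for i in range(zs):
        for j in range(ys):
            for k in range(xs):
                block = volume[i*bs:(i+1)*bs, j*bs:(j+1)*bs, k*bs:(k+1)*bs]
                c = dctn(block, norm='ortho')
                c[0, 0, 0] = 0.0              # remove DC (mean) term
                sal[i, j, k] = np.sum(c * c)  # AC energy
    return sal

vol = np.random.rand(32, 32, 32)
print(block_saliency(vol).shape)   # (4, 4, 4) block saliency map
```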



3D Facial Recognition

Xianfeng Gu
[email protected]

3D facial recognition has fundamental importance for homeland security. This project focuses on 3D facial surface recognition based on modern geometry and machine learning methods. The human facial surfaces are captured in real time with high resolution and high accuracy, using a dynamic 3D camera based on the phase shifting principle. The 3D facial surfaces are mapped onto the planar unit disk via Riemann mapping. The Riemannian metric on the original surface is encoded by the conformal factor on the disk. Prominent geometric features are then automatically selected by a machine learning method. A diffeomorphism of the planar disk is computed by optimizing a special functional, which describes the elastic deformation and bending of the shape. This optimal registration also induces a distance between two 3D facial surfaces. Using the distances among faces, we can compare their similarities. Different geometric features are weighted in order to improve the recognition rate; the weights are obtained automatically by a machine learning method. The current system beats the state of the art. Figure 3. (NSF)

Geometric Manifold Theory for Higher-Dimensional Data Modeling, Analysis, and Visualization

Hong Qin
[email protected]

Continuous functions defined over arbitrary manifolds, such as piecewise polynomial-centric splines, enable compact data representation, analysis (especially quantitative analysis), simulation, and digital prototyping. They are relevant throughout the entire higher-dimensional, multi-attribute data processing pipeline, including raw data acquisition and organization, visual data modeling and interactive manipulation, synthesis, analysis, and visualization. The main thrust of this project is to promote manifold functions as a powerful new data modeling, analysis, and simulation tool that not only continues to enjoy popularity in conventional application fields such as geometric and shape representation, but also gains widespread acceptance in a general higher-dimensional data modeling and analysis framework. Consequently, our goal is to transcend the traditional and current boundary of splines' application domains and realize their full scientific potential. We systematically trailblaze a novel geometric manifold theory founded upon continuous polynomial representations, and apply this mathematically rigorous theory to both shape geometry and higher-dimensional, multi-attribute data modeling, analysis and visualization, with a special emphasis on visual computing applications. We explore new geometric manifold theory at the interface of differential geometry, numerical approximation theory, computational topology, and linear algebra. We conduct a comprehensive study of new and important geometric manifold theory that can enable the accurate and effective modeling of higher-dimensional, multi-attribute volumetric datasets of complicated geometry, arbitrary topology, and rich geometric features. This research is expected to significantly improve the current theory and practice of information integration by potentially enabling a more accurate, more efficient, and easier-to-use data representation and structure for processing geometric and higher-dimensional scientific datasets. (NSF)
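The matching stage of the 3D facial recognition project above reduces, at retrieval time, to a weighted nearest-neighbor search over per-face geometric descriptors. A hypothetical sketch (descriptors, weights and data are placeholders; in the project the weights come from the machine learning step):

```python
import numpy as np

def recognize(probe_feats, gallery_feats, gallery_ids, w):
    """Nearest-neighbor face identification with learned feature weights.

    Each face is summarized by a geometric feature vector; a weighted
    distance compares the probe to the gallery, and the closest gallery
    face wins. `w` would come from a learning procedure; here it is given.
    """
    diffs = gallery_feats - probe_feats          # (N, d) differences
    d = np.sqrt((w * diffs ** 2).sum(axis=1))    # weighted L2 distance
    return gallery_ids[int(np.argmin(d))]

rng = np.random.default_rng(1)
gallery = rng.normal(size=(50, 16))   # hypothetical face descriptors
ids = np.arange(50)
weights = np.ones(16)                 # placeholder learned weights
print(recognize(gallery[7] + 0.01, gallery, ids, weights))  # -> 7
```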



Wireless Sensor Network Routing

Xianfeng Gu
[email protected]

This project focuses on designing efficient, reliable and secure routing strategies for wireless sensor networks. Given a sensor network, one can build a planar graph according to the distances between sensors. By a graph embedding method, the graph can be assigned geometric coordinates such that all cells are convex. The convexity of all cells guarantees delivery using the greedy routing method (sketched below). The embedding can be transformed to achieve load balancing. Furthermore, the curvature of the network can be designed to avoid congestion. Figure 4. (NSF)

Multivariate Spline Theory, Algorithms, and Computational Techniques for Shape Modeling and Graphics Applications

Hong Qin
[email protected]

Conventional multivariate splines (defined as piecewise polynomials satisfying certain continuity requirements) found their mathematical root in approximation theory. In this digital era, splines are becoming ubiquitous, with widespread and deep penetration into many scientific and engineering fields, including functional data analysis, reverse engineering, medical image analysis (e.g., image registration of non-rigid shapes), finite element analysis and simulation (e.g., numerical solutions of PDEs), data visualization (e.g., digital ocean), and data manipulation and deformation (e.g., scalar fields and heterogeneous volumetric datasets for solid modeling). In visual computing, traditional multivariate splines have been extremely popular in geometric modeling, graphics, and visualization, and they are commonly defined over an open, planar domain. Nevertheless, real-world volumetric objects are oftentimes of arbitrarily complicated topology. In addition, modern surface and volumetric scanning technologies enable data acquisition not only of geometric shapes but also of multi-scale, multi-attribute material properties. In order to bridge the large gap between conventional spline formulations and the strong demand to accurately and efficiently model acquired datasets for quantitative analysis and finite element simulation, our research efforts center on the unexplored mathematical theory of manifold splines, which will enable popular spline schemes to effectively represent objects of arbitrary topology. We conduct a comprehensive study of new and important theoretical foundations that can transform spline-centric representations into accurate and effective models of surfaces of arbitrary topology. The primary and long-term thrust is to promote splines as a powerful data modeling, analysis, and simulation tool that not only continues to enjoy popularity in conventional application fields such as geometric and shape modeling, but also gains widespread acceptance in a general data modeling and analysis framework. The flexible and effective construction of splines defined over arbitrary manifolds has immediate impact on geometric design, visual information processing, and graphics. It will provide a sound theoretical foundation for rapid product design and data analysis, and it has the potential to streamline the entire virtual prototyping process for reverse engineering by expediting data conversion from discrete samples to continuous geometry and spline-centric finite element models. (NSF)
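A minimal sketch of the greedy forwarding rule referenced in the routing project above, assuming node coordinates come from the convex embedding (the toy graph below is illustrative):

```python
import numpy as np

def greedy_route(coords, adj, src, dst):
    """Greedy geographic routing on an embedded graph.

    coords: dict node -> (x, y) from the convex embedding.
    adj:    dict node -> list of neighbor nodes.
    With all cells convex, greedy forwarding never reaches a local
    minimum, so the packet always arrives (the delivery guarantee).
    """
    dist = lambda a, b: np.hypot(*np.subtract(coords[a], coords[b]))
    path, cur = [src], src
    while cur != dst:
        # Forward to the neighbor strictly closest to the destination.
        nxt = min(adj[cur], key=lambda v: dist(v, dst))
        if dist(nxt, dst) >= dist(cur, dst):
            raise RuntimeError("stuck at local minimum (non-convex cell)")
        path.append(nxt)
        cur = nxt
    return path

coords = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (2, 1)}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_route(coords, adj, 0, 3))   # [0, 1, 2, 3]
```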

Human Cortical Surface Morphological Study

Xianfeng Gu
[email protected]

This project develops a general approach that uses conformal geometry to parameterize anatomical surfaces with complex (possibly branching) topology. Rather than evolve the surface geometry to a plane or sphere, we instead use the fact that all orientable surfaces are Riemann surfaces and admit conformal structures, which induce special curvilinear coordinate systems on the surfaces. Based on the Riemann surface structure, we can then canonically partition the surface into patches. Each of these patches can be conformally mapped to a parallelogram. The resulting surface subdivision and the parameterizations of the components are intrinsic and stable. To illustrate the technique, we computed conformal structures for several types of anatomical surfaces in MRI scans of the brain, including the cortex, hippocampus, and lateral ventricles. We found that the resulting parameterizations were consistent across subjects, even for branching structures such as the ventricles, which are otherwise difficult to parameterize. Compared with other variational approaches based on surface inflation, our technique works on surfaces of arbitrary complexity while guaranteeing minimal distortion in the parameterization. It also offers a way to explicitly match landmark curves in anatomical surfaces such as the cortex, providing a surface-based framework to compare anatomy statistically and to generate grids on surfaces for PDE-based signal processing. Figure 5. (NIH)

Volumetric Modeling and Shape Design in Virtual Environments

Hong Qin
[email protected]

IT-enabled engineering design is an innovative and iterative practice that consists of a variety of complex, challenging, and creative processes, ranging from conceptual design, interactive shape modeling, quantitative test/evaluation, rapid prototyping, manufacturing, and assembly to production. The essence of design is rapid, effective, and creative change towards optimal solutions, which can be uniquely accomplished through the iterative modification of numerous design parameters spanning multiple dimensions. To ameliorate the entire CAE process and enhance designers' creativity through IT-supported tools, we develop a creativity-enhancing virtual environment that can greatly facilitate human-computer interaction through physics-based modeling of real-world objects and force-enabled haptic sculpting. Our technical approach aims to broaden the accessibility of volumetric modeling of real-world objects by combining haptic sculpting with computational physics, thus offering novel interactive methodologies towards more creative and intuitive engineering design. Our research activities concentrate on: (1) bridging the large gap between physical objects and digital models, and (2) improving the communication and interaction between human beings and computerized virtual environments (VEs), with a special emphasis on creativity enhancement. In particular, our strategy aims to develop a haptics-enabled platform for volumetric data modeling, design, analysis, and relevant applications, which includes a suite of new solid models for representing volumetric datasets as well as haptics-based physical interactions for manipulating and managing such representations. We take a unique and integrated approach that aims to incorporate within a single IT system disparate research thrusts spanning volumetric, physical, and material modeling for novel data representation; geometric and physics-based algorithms for interaction, analysis, and visualization; and software tools for various applications. The confluence of modeling, visualization, and haptics is both imperative and valuable, not only for advancing the state of knowledge in distinct disciplines but also for collectively enhancing current technologies towards creativity enhancement. Ultimately, the anticipated outcome is a new IT platform for modeling, visualizing, and interacting with CAD-based volumetric datasets that have arbitrary topology, volumetric structure, heterogeneous material, and dynamic behavior.



Shape Analysis with Teichmüller Shape Space

Xianfeng Gu
[email protected]

Shape indexing, classification, and retrieval are fundamental problems in computer graphics. This work introduces a novel method for surface indexing and classification based on Teichmüller theory. Two surfaces are conformally equivalent if there exists a bijective angle-preserving map between them. The Teichmüller space for surfaces with the same topology is a finite-dimensional manifold, where each point represents a conformal equivalence class and the conformal map is homotopic to the identity. A curve in the Teichmüller space represents a deformation process from one class to another. In this work, we apply Teichmüller space coordinates as shape descriptors, which are succinct, discriminating and intrinsic; invariant under rigid motions and scaling; and insensitive to resolution. Furthermore, the method has a solid theoretic foundation, and the computation of Teichmüller coordinates is practical, stable and efficient. This work develops algorithms for the Teichmüller coordinates of surfaces with arbitrary topologies. The coordinates we compute are conformal modules, represented as the lengths of a special set of geodesics under a special metric. The metric can be obtained by the curvature flow algorithm, and the geodesics can be calculated using algebraic topological methods. We tested our method extensively for indexing and comparison of large surface databases with various topologies, geometries and resolutions. The experimental results show the efficacy and efficiency of the length coordinates of the Teichmüller space. Figure 6. (NSF)

EmoteControl: An Interactive Multimedia Stress Management System

Tony Scarlatos
[email protected]

EmoteControl is an interactive multimedia application, developed in collaboration with Stony Brook University's Center for Prevention and Outreach, that provides exercises and simulations to relieve stress, while measuring brain wave activity for subsequent analysis and assessment. EmoteControl uses a consumer-grade electroencephalogram (EEG) headset, called the MindWave, to measure a user's level of concentration, anxiety, and overall brain activity. Four activities, including doodling and creating music sequences, provide exercises to help users relax. Three guided meditations, using video and narration, instruct users on breathing and meditation techniques. Users see their relative level of concentration or anxiety as a color-coded icon onscreen in real time. Their brain activity is recorded over several sessions for subsequent analysis and assessment. The purpose of this prototype is to study whether real-time feedback helps users to manage their stress levels over time, and to examine which exercises or simulations have the greatest efficacy for stress relief. (CEWIT)
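Because the Teichmüller coordinates above are finite-dimensional vectors invariant to rigid motion and scale, surface retrieval reduces to nearest-neighbor search in coordinate space. A minimal sketch with placeholder module vectors:

```python
import numpy as np

def retrieve(query_coord, db_coords, k=5):
    """Rank database surfaces by Teichmuller-coordinate distance.

    Each surface is reduced to its vector of conformal modules
    (geodesic lengths); plain Euclidean distance between these
    vectors then compares shape classes directly.
    """
    d = np.linalg.norm(db_coords - query_coord, axis=1)
    return np.argsort(d)[:k]   # indices of the k closest surfaces

rng = np.random.default_rng(2)
db = rng.normal(size=(100, 6))      # hypothetical 6-dim module vectors
print(retrieve(db[42] + 0.01, db))  # 42 should rank first
```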



Shelterware: An Integrated Multi-Platform System to Automate the Services of the Smithtown Animal Shelter

Tony Scarlatos [email protected]

Shelterware was created to assist the Smithtown Animal Shelter (SAS), a publicly funded agency, in updating its operations from an antiquated paper-based system to a modern digital system, so that the shelter's limited staff resources can be better utilized. Presently, the shelter houses 120 to 150 ready-to-adopt animals at any given time. The Shelterware system will allow SAS staff to manage adoption and volunteer information through a web-based interface, and to find and identify lost animals. The system melds various state-of-the-art client-server technologies from both the web and mobile devices to automate the services provided by SAS. We have developed an iPad application that allows volunteers and staff to process incoming animals at the shelter by taking photographs of the animals, entering data about them, and automatically generating QR codes for identification, all of which are uploaded to a database. This database will generate a catalog of pets for adoption, which can be accessed by potential adopters on the web. Furthermore, potential adopters will now be able to fill out an online application form. Additionally, we have developed a field application, which allows staff to use iPhones for the identification of lost animals. Staff can scan QR codes attached to the collars of lost animals, or photograph the animals. This data will then be uploaded to a database with a time and date stamp, along with the lost animals' location data. (CEWIT)

Conformal Wasserstein Shape Space

Xianfeng Gu
[email protected]

Surface-based 3D shape analysis plays a fundamental role in computer vision and medical imaging. This project uses optimal mass transportation maps for modeling shape space. The computation of the optimal mass transport map is based on Monge-Brenier theory; in comparison to the conventional method based on Monge-Kantorovich theory, this method significantly improves efficiency by reducing computational complexity. This project develops the framework of the conformal Wasserstein shape space. Given a Riemannian manifold, the space of all probability measures on it is the Wasserstein space. The cost of an optimal mass transportation between two measures is the Wasserstein distance, which endows the Wasserstein space with a Riemannian metric. In this work, all metric surfaces with disk topology are mapped to the unit planar disk by a conformal mapping, which pushes the area element on the surface to a probability measure on the disk. This gives a map from the shape space of all topological disks with metrics to the Wasserstein space of the disk, and the pull-back Wasserstein metric equips the shape space with a Riemannian metric. The conformal Wasserstein shape space framework is applied to expression classification. 3D human faces with dynamic expressions are captured using a real-time structured light 3D camera based on the phase shifting principle. The facial surfaces with expressions form a shape space. Using the Wasserstein distance, they can be clustered and classified, with each cluster representing an expression. This technique is applied to expression analysis and autism diagnosis. Figure 7. (US Air Force)
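For intuition about the Wasserstein metric used above, the sketch below compares two probability measures with scipy's 1D Wasserstein distance, reducing each (hypothetical) disk measure to its radial distribution. The actual method solves the full transport problem on the disk via Monge-Brenier theory; this is only a simplified stand-in.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Each surface's conformal factor pushes area onto the disk; here we
# reduce that measure to samples of point radii and compare two such
# radial distributions (a 1D simplification of the shape-space metric).
rng = np.random.default_rng(3)
radii_a = np.sqrt(rng.uniform(size=2000))     # uniform area measure
radii_b = np.sqrt(rng.beta(2, 2, size=2000))  # deformed measure
print(wasserstein_distance(radii_a, radii_b))
```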


BrainRank: A Multimedia Game to Increase Fluid IQ

Tony Scarlatos
[email protected]

The Dual N-Back game is a memory game of recalling sequences of audio and visual stimuli (Dual), with the increasing difficulty of remembering sequences from several steps ago (N-Back). Recent studies have confirmed that persistent practice with Dual N-Back games increases working memory capacity and improves fluid IQ. But most versions of the game are tedious for users, and they quit playing before much benefit to their working memory has occurred. BrainRank will study the impact of a kinesthetic version of the Dual N-Back game, using a dance pad as the input device for an application that records performance and time on task. Users can then take a version of Raven's Progressive Matrices test to gauge any improvement in their fluid IQ scores over time. The study, to be conducted through ES BOCES, will measure the impact of physical activity on cognitive function in elementary school students. Additionally, BrainRank introduces "gamification" techniques to the design of the software, such as leader boards and accomplishment badges, to evaluate whether such motivators increase users' time on task. (CEWIT)

Integrating Humans and Computers for Image and Video Understanding

Dimitris Samaras
[email protected]

Visual imagery is powerful; it is a transformative force on a global scale. It has fueled the social reforms that are sweeping across the Middle East and Africa, and it is sparking public debate in this country over questions of personal privacy. It is also creating unparalleled opportunities for people across the planet to exchange information, communicate ideas, and collaborate in ways that would not have been possible even a few short years ago. Part of the power of visual imagery comes from the staggering number of images and videos available to the public via social media and community photo websites. With this explosive growth, however, comes the problem of searching for relevant content. Most search engines largely ignore the actual visual content of images, relying almost exclusively on associated text, which is often insufficient. In addition to the growing ubiquity of web imagery, we also notice another kind of visual phenomenon: the proliferation of cameras viewing the user, from the ever-present webcams peering out at us from our laptops to the cell phone cameras carried in our pockets wherever we go. This record of a user's viewing behavior, particularly their eye, head, and body movements or descriptions, could provide enormous insight into how people interact with images or video, and inform applications like image retrieval. In this project our research goals are: 1) behavioral experiments to better understand the relationship between human viewers and imagery; 2) development of a human-computer collaborative system for image and video understanding, including subject-verb-object annotations; and 3) implementation of retrieval, collection organization, and real-world applications using our collaborative models. People are usually the end consumers of visual imagery. Understanding what people recognize, attend to, or describe about an image is therefore a necessary goal for general image understanding. Toward this goal, we first address how people view and narrate images through behavioral experiments aimed at discovering the relationships between gaze, description, and imagery. Second, our methods integrate human input cues (gaze or description) with object and action recognition algorithms from computer vision to better align models of image understanding with how people interpret imagery. Our underlying belief is that humans and computers provide complementary sources of information regarding the content of images and video. Computer vision algorithms can provide automatic indications of content through detection and recognition algorithms. These methods can inform estimates of "what" might be "where" in visual imagery, but will always be noisy. In contrast, humans can provide passive indications of content through gaze patterns (where people look in images or video) or active indications of content through annotations. Finally, we expect that our proposed methods for human-computer collaborative applications will enable improved systems for search, organization, and interaction. (NSF)
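One simple, hypothetical way to fuse the passive gaze cue with a detector, in the spirit of the project above (not its actual algorithm): reweight a detector's confidence map by a smoothed fixation-density map.

```python
import numpy as np
from scipy import ndimage

def gaze_reweight(score_map, fixations, sigma=15.0, alpha=0.5):
    """Combine a detector's score map with human gaze (illustration only).

    score_map: 2D array of per-pixel object-detector confidences.
    fixations: list of (row, col) gaze fixation points for the image.
    A smoothed fixation-density map boosts detections people looked at,
    one straightforward way to fuse the passive human cue with noisy
    computer vision output.
    """
    density = np.zeros_like(score_map, dtype=float)
    for r, c in fixations:
        density[int(r), int(c)] += 1.0
    density = ndimage.gaussian_filter(density, sigma)
    if density.max() > 0:
        density /= density.max()
    return (1 - alpha) * score_map + alpha * score_map * density

scores = np.random.rand(240, 320)
print(gaze_reweight(scores, [(120, 160), (100, 200)]).shape)
```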


Machine Learning for the Analysis of fMRI Images

Dimitris Samaras
[email protected]

Analysis of functional Magnetic Resonance Images (fMRI) has allowed numerous insights into the way the brain functions. A large amount of data is being collected in numerous studies; however, significant challenges remain for automatic analysis of such data. These challenges include high levels of noise in the acquired signal, difficulties in registering scans between different subjects and different modalities, the relatively small number of subjects per study, and differences in protocol design and scanning equipment between studies. We hypothesize that unique patterns of variability in brain function can assist in the identification of brain mechanisms rooted in neuroscientific prior knowledge. Such patterns will increase our understanding of brain connectivity and circuitry as we move iteratively between a-priori and exploratory means of describing functional brain circuits. We propose an integrated machine learning framework for the joint exploration of spatial, temporal and functional information for the analysis of fMRI signals, thus allowing the testing of hypotheses and the development of applications that are not supported by traditional analysis methods. While our major focus is drug addiction and disorders of inhibitory control, we are testing our methods on datasets for other disorders such as major depression and autism. In recent work in my group with fMRI data, we have demonstrated that it is possible to classify different groups of human subjects performing the same tasks based on the observed 3D fMRI BOLD images, through discovery of the appropriate features that capture the most discriminative differences in activation levels. We have validated our technique on multiple fMRI datasets. In predicting behavior based on neuroimaging data, we search for the features which can unambiguously relate brain function to behavioral measurements. At the same time, we have been exploring functional connectivity between brain regions by searching for conditional probabilistic dependencies between such regions, described by Gaussian Graphical Models suitable for high-dimensional datasets. The method has guaranteed global minima and does not require a-priori brain segmentation or selection of active regions. As a structure learning technique, the effect of confounding variables on brain region activations is removed, giving a clearer interpretation of possible dependencies between brain regions. Structure learning approaches will be useful in our quest to integrate spatiotemporally heterogeneous datasets such as MRI and PET. In immediate future work we plan to explore how domain knowledge and assumptions about common underlying neuropsychological processes can be used as constraints that allow us to combine similar (but not equal) features or dependencies across related datasets in multitask learning frameworks. Leveraging such related datasets will improve the stability of machine learning methods that are often encumbered by the high dimensionality of brain imaging datasets. In order to study larger cohorts of subjects, we are also investigating resting state data for a number of disorders. (NIH)

Learning Models for Illumination, Shadows and Shading

Dimitris Samaras
[email protected]

Pictures result from the complex interactions between the geometry, illumination and materials present in a scene. Estimating any of these factors individually from a single image leads to severely under-constrained problems. Having a rough geometric model of the scene has allowed us to jointly estimate the illumination and the cast shadows of the scene. We solve the joint estimation using graphical models and novel shadow cues such as the bright channel. We further develop our method to perform inference not only on illumination and cast shadows but also on the scene's geometry, thus refining the initial rough geometric model. Scene understanding methods that account for illumination effects require reasonable shadow estimates. General shadow detection from a single image is a challenging problem, especially when dealing with consumer-grade pictures (or pictures from the internet). We investigate methods for single-image shadow detection based on learning the appearance of shadows from labelled datasets. Our methods are the state of the art in shadow detection. In our work we take advantage of domain knowledge of illumination and image formation for feature design, as well as extensive use of machine learning techniques such as graphical models (MRFs, CRFs) and classifiers (SVMs). (NSF)
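The functional-connectivity analysis described in the fMRI project above (conditional dependencies between regions under a Gaussian Graphical Model) can be sketched with scikit-learn's sparse inverse covariance estimator; the region time series below are random placeholders.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Placeholder data: rows are time points, columns are brain regions
# (e.g., mean BOLD signal per region). Zeros in the estimated precision
# matrix indicate conditional independence: no direct functional edge.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 15))   # hypothetical region time series

model = GraphicalLassoCV().fit(X)
precision = model.precision_
edges = np.argwhere(np.triu(np.abs(precision) > 1e-4, k=1))
print(f"{len(edges)} direct region-region dependencies found")
```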


Platform Comparison via Errors-in-Variables Models with or without Replicates

Wei Zhu and Jinmiao Fu
[email protected]

Platform or instrument comparison is a critical task in many lines of research and industry. In biotechnology, for example, with the rapid development of gene sequencing platforms, we now have first- and next-generation sequencing platforms, each with several brands manufactured by different companies, co-sharing the market; at the same time, the third-generation sequencing method, built upon the sequencing of a single molecule of DNA, has already emerged (http://genomena.com/technology-how-we-look-at-dna/third-generation-genome-sequencing/). An accompanying critical task for statisticians is how to best compare and combine results from different platforms. Previously, we demonstrated the advantages of the errors-in-variables (EIV) model as an optimal instrument comparison and calibration device. However, one limitation of the traditional EIV modeling approach is its reliance on the availability of repeated measures of the same sample. Such replicates can be expensive and at times unattainable. Two methods, by Wald (1940) and Kukush (2005) respectively, are applicable for estimating the EIV model in the absence of replicates, both relying heavily on pre-clustering of the data points. In this work, we aim to combine and improve these two methods through a better clustering strategy. (Simons Foundation, NIH)

Biomarker Agreement and Integration Analyses across Measurement Platforms

Wei Zhu and Jinmiao Fu
[email protected]

With the rapid development of biotechnology, an increasing number of platforms for measuring gene expression levels are co-sharing the market, with older technology such as the gene microarray being phased out while newer ones based on RNA sequencing (RNA-seq) are brought in, generation after generation. A crucial question for the entire biomedical community is whether biomarkers detected through these different platforms are consistent. In this work, we present a theoretical framework for biomarker consistency studies based on the errors-in-variables (EIV) model. First, we calibrate the measurements between two platforms through an EIV model featuring indices for the constant bias and the proportional bias. Subsequently, we demonstrate how different biomarker detection algorithms, including the fundamental fold change, Z-test, and T-test, will be influenced by such biases. Finally, we discuss strategies to combine measurements from different platforms for better biomarker detection. (Simons Foundation, NIH)

Comparing the RNA-seq and Microarray Platforms with a Generalized Linear EIV Model

Ellen Li, Wei Zhu, and Yuanhao Zhang
[email protected]; [email protected]

The discovery of gene biomarkers and related pathways is critical in the study of disease and therapeutic treatment. The rapid evolution of modern biotechnology has generated several waves of measurement platforms, with some phasing out while others co-share the market. Two front runners are the more traditional gene microarray technology and the newly arrived RNA sequencing (RNA-seq) platform. An intensive literature review revealed that the prevalent statistical method for comparing the microarray and RNA-seq platforms is the Pearson product-moment correlation coefficient. However, the Pearson correlation is unable to provide a calibration formula to convert expression levels between the platforms. It also fails to account for the fixed and proportional biases between the two platforms. To fill this void, we have developed a method based on the generalized linear errors-in-variables (EIV) model that provides both a calibration equation and a statistical measure of the two biases. (Simons Foundation, NIH)
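The constant/proportional bias calibration common to the three projects above can be illustrated with the classical special case of the EIV model, Deming regression (the generalized linear EIV model in the third project extends well beyond this sketch):

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Errors-in-variables line fit y = b0 + b1*x (Deming regression).

    Both platforms are measured with error, so ordinary least squares
    is biased; delta is the (assumed) ratio of the y- to x-error
    variances. b0 estimates the constant bias between platforms and
    b1 the proportional bias, yielding a calibration equation.
    """
    mx, my = x.mean(), y.mean()
    sxx = ((x - mx) ** 2).mean()
    syy = ((y - my) ** 2).mean()
    sxy = ((x - mx) * (y - my)).mean()
    b1 = (syy - delta * sxx +
          np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    b0 = my - b1 * mx
    return b0, b1

# Two platforms measuring the same true expression levels with noise,
# plus a constant bias (0.5) and a proportional bias (1.2).
rng = np.random.default_rng(5)
truth = rng.uniform(2, 10, size=500)
x = truth + rng.normal(0, 0.3, 500)
y = 0.5 + 1.2 * truth + rng.normal(0, 0.3, 500)
print(deming(x, y))   # approximately (0.5, 1.2)
```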


Bootstrap-based RANOVA for microRNA Panel Data Analysis

Ellen Li, Wei Zhu, Yuanhao Zhang, and Jennie Williams
[email protected]; [email protected]

Gene microarrays are widely adopted to identify differentially expressed microRNAs (miRNAs) across different levels of potential prognostic factors such as phenotype, genotype, race, gender, etc. The significance analysis of microarrays package (SAM) is arguably the most common analysis procedure, and it has been included as part of the Agilent miRNA analysis pipeline. However, SAM can only analyze data with one prognostic factor. When the experimental design is factorial with multiple factors, Zhou and Wong (2011) provided a nice non-parametric bootstrap ANOVA pipeline. Both this and SAM are nonparametric in nature, as miRNA data are often not normal. When the experimental design is in a panel data format containing both a within-subject factor (repeated measures) and between-subject factors, however, the most suitable method to compare group means is the repeated measures ANOVA (RANOVA) rather than the factorial ANOVA. To our knowledge, the available methods are either parametric (Li, 2004; Conesa et al., 2006) or support only one-way repeated measures analysis without between-subject factors (Elbakry, 2012). To rectify the lack of a non-parametric RANOVA with multiple-test correction via FDR for microarray panel data analysis, we have developed a novel analysis pipeline combining the modern RANOVA (Seco et al., 2006) and the empirical FDR (Zhou and Wong, 2011) for multiple-test comparison, and implemented it in an efficient computational paradigm. In addition, our method is also applicable to traditional messenger RNA (mRNA) panel data analysis. (Simons Foundation, NIH)

Integrative Multi-Scale Biomedical Image Analysis

Joel Saltz, Tahsin Kurc, and Allen Tannenbaum
[email protected]

Understanding the interplay between morphology and molecular mechanisms is central to the success of research studies in practically every major disease. At disease onset and through disease progression, changes occur in tissue structure and function at multiple spatial and temporal scales. Quantitative study of disease through synthesis of multi-scale integrated structure, function, and molecular information has tremendous potential to significantly enhance disease diagnosis and therapeutic strategies in a wide range of disease types. The need for a deep, quantitative understanding of biomedical systems is crucial to unravel the underlying mechanisms and pathophysiology exhibited in cancer biology. Our goal is to develop detailed models of multiscale tissue morphology and molecular characterization to 1) enable high-precision predictions of tumor change and evolution, and 2) develop an ability to anticipate treatment response to combined regimens consisting of radiation therapy, traditional chemotherapy and targeted therapies. This research is an interplay between systems biology, multi-scale image analysis and population health, combining "omics", Pathology, Radiology and patient outcomes. We leverage data from integrative patient studies including the Cancer Genome Atlas study, from the Stony Brook Cancer Center, and from several other collaborating New York area cancer centers including Yale and Rutgers. Analyses of brain tumors using whole slide tissue images, for example, have shown that correlations exist between the morphological characterization of micro-anatomic structures and clinical outcome and genomic profiles. Our research and development is in three inter-related areas: (1) analysis methods capable of extracting descriptions of tissue morphology from massive 2-dimensional, 3-dimensional, and temporal whole slide tissue imaging datasets for multi-scale morphological characterization of tissue; (2) analytical methods and tools to explore combined imaging, molecular, and clinical features to reveal the interplay between tissue structure and genomic and clinical profiles; and (3) a high-performance middleware platform, which can take advantage of high-end computing systems and emerging architectures with multi-core CPUs and hardware accelerators, to process and manage very large volumes of high-resolution microscopy imaging datasets. (NCI, NLM, NIH)

Feature-based Exploration of Extremely Large Spatio-Temporal Scientific Datasets

Tahsin Kurc and Joel Saltz
[email protected]

Advances in sensor technologies make it possible to rapidly collect vast quantities of low-dimensional, spatio-temporal datasets, in which data elements are associated with coordinates in a multi-dimensional space of low dimensionality and are potentially obtained at multiple time steps. Analysis and characterization of features (e.g., spatial structures, their properties, and the behavior of those properties over space and time) in these datasets are important in many scientific domains, including weather prediction and climate modeling, earth systems science, biomedicine, and materials science. In order to fully exploit the potential of spatio-temporal sensor datasets in scientific research, high-performance computing capabilities are needed to rapidly extract and classify various features from large volumes of data, ranging from multiple terabytes to petabytes, using data- and computation-intensive analysis pipelines. The primary objective of our research is to develop and evaluate a suite of novel data and processing abstractions and optimizations within an integrated framework to enable analysis of extremely large low-dimensional spatio-temporal data for scientific, biomedical, and clinical studies. The methods and software systems developed in this research support novel data representations and runtime optimizations to 1) ingest and manage large volumes of diverse datasets, 2) stage datasets using resources in the data path, such as clusters and GPU accelerators, and 3) rapidly process datasets using a repertoire of analysis operations. The research also investigates the interplay between spatio-temporal data analysis applications, middleware software, and hardware configurations, targeting high-end parallel machines/clusters containing hybrid multicore CPU-GPU nodes; extreme-scale machines consisting of hundreds of thousands of CPU cores; and machines with deep memory and storage hierarchies linked to machines that acquire and aggregate data. (NLM, DOE, NIH)
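To make the resampling-based testing in the RANOVA project above concrete, here is a deliberately simplified one-way, permutation-flavored variant (no repeated-measures structure, unlike the actual pipeline):

```python
import numpy as np
from scipy.stats import f_oneway

def resampled_anova_pvals(data, groups, n_resamples=1000, seed=0):
    """Per-feature one-way ANOVA with an empirical null (simplified).

    data: (n_features, n_samples) expression matrix (e.g., miRNAs).
    groups: length n_samples array of group labels.
    Group labels are resampled to build an empirical null distribution
    of F statistics, avoiding the normality assumption on the data.
    """
    rng = np.random.default_rng(seed)
    labels = np.unique(groups)

    def fstats(g):
        return np.array([f_oneway(*(row[g == k] for k in labels)).statistic
                         for row in data])

    obs = fstats(groups)
    null = np.concatenate([fstats(rng.permutation(groups))
                           for _ in range(n_resamples)])
    # Empirical p-value: fraction of null F values >= observed F.
    return np.array([(null >= f).mean() for f in obs])

rng = np.random.default_rng(6)
expr = rng.normal(size=(20, 30))
expr[0, 15:] += 2.0                  # one truly differential miRNA
grp = np.repeat([0, 1], 15)
print(resampled_anova_pvals(expr, grp)[:3])
```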


Home Sleep Monitoring Using Bluetooth LE Biosensors to Facilitate Sleep Health

Helmut H. Strey
[email protected]

An estimated 50–70 million adults in the United States have chronic sleep and wakefulness disorders. Sleep difficulties, some of which are preventable, are associated with chronic diseases, mental disorders, health-risk behaviors, limitations of daily functioning, injury, and mortality. The goal of this project is to enable patients with sleep disorders to improve their sleep health by providing long-term sleep quality and quantity feedback while maintaining the accuracy of a clinical sleep study. We are developing a Bluetooth low-energy wireless sensor suite, equivalent to a Type-3 sleep study, that is suitable for home use. The quality of our measurements is verified by side-by-side testing in a clinical sleep laboratory. Furthermore, we are developing mobile device (iPhone, iPad, iPod) software that captures and analyzes the sensor data to provide feedback to the user about sleep quantity and quality. Combined with professional questionnaires before and after sleep, patients will be able to correlate their lifestyle with sleep quality and thereby achieve long-term changes in behavior to improve their sleep health. Optionally, patients can link their data to social media (Facebook, Twitter) to improve their motivation and to add a competitive edge to their efforts. Similar approaches in the fitness area (Flexbit, Fitbit, Polar Heart Rate Monitor, etc.) have been tremendously successful in improving user motivation and ultimately leading to positive behavioral changes. Figure 8.

Unified Statistical Models for Integrative Omics

Pei Fen Kuan
[email protected]

Large-scale public epigenomics resources such as the Cancer Genome Atlas (TCGA) and the NIH Roadmap Epigenomics Mapping Consortium offer an exciting research direction: integrating analysis results from multiple omics experiments generated by different technologies/platforms (tiling arrays, next-generation sequencing). This is an important step toward unlocking the secrets of complex biological systems. Examples include studying gene-gene and gene-transcription factor interactions via co-expression and regulatory network analysis, and integrating mutation, copy number and gene expression data to identify candidate driver genes in cancer. Such analyses usually require large datasets and implicitly assume independence among the different datasets. However, in most cases, these datasets are generated from biologically dependent experiments (e.g., multiple time-series gene expression datasets), and ignoring this fact could lead to incorrect inferences about biologically significant pathways. In addition, most existing software works by first analyzing each data source separately, then declaring a set of candidate regions based on a user-specified error rate threshold, and finally constructing a regulatory network from these reduced data. These approaches have multiple pitfalls, including difficulties in overall error rate control and information loss due to data reduction. Our lab aims to develop unified statistical methodologies for integrating omics data to identify functionally associated regulators and improve error rate control by incorporating the dependence among these experiments at the raw data level. Such data integration encompasses combining multiple omics data (1) of similar types across studies and (2) of different types across and within studies. Software implementing our proposed methodologies will be developed as Bioconductor/R packages. (CEWIT)
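As a reference point for the error-rate-control discussion above, the standard Benjamini-Hochberg FDR procedure is sketched below; it assumes independent tests, which is precisely the assumption the lab's unified models aim to relax for dependent omics data:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR control: return a boolean reject mask.

    Standard multiple-testing procedure: reject the hypotheses with
    the k smallest p-values, where k is the largest i such that
    p_(i) <= q * i / m.
    """
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = len(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()   # largest index passing the step-up rule
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))   # first two hypotheses rejected
```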


Games and Virtual Reality Environments for Use in Physical Rehabilitation
Rong Zhao
[email protected]

Computer augmented rehabilitation combines interactive computer technology with various devices and apparatus to enhance physical rehabilitation. The degree of interaction can range from simple motion biofeedback to virtual reality (VR) simulations. We propose applications that are rich with rehabilitation strategies and protocols. The two areas we focus on with the greatest market potential are 1) balance training and fall prevention solutions, and 2) gait training solutions. Innovative computer games and virtual reality environments are developed and integrated into rehabilitation processes to deliver high quality care to the aging population and injured veterans returning from the two wars. We partner with Biodex Medical Systems, a leading provider of a wide range of physical rehabilitation and training systems, whose products for balance rehabilitation provide an excellent platform for interactive gaming. With these products, the patient's motion provides a joystick output that can "work" or navigate through an appropriate program. Biodex also produces instrumented treadmills that could interface with "walking" simulations, as well as a computer-based robotic system for upper and lower extremities that would benefit from virtual, interactive computer games. Various competition-type training programs are developed to increase motivation and further engage patients in the interactive rehabilitation process. Data analytics and pattern recognition techniques are incorporated into such systems to assist clinical evaluation and decision-support processes. (Biodex Medical Systems)

Study of How IT Affects Patient Access, Quality of Care and Administrative Costs in a Newly Formed ACO
Eugene Feinberg
[email protected]

This project studies the efficiency of implementing particular information technology solutions for medical offices and hospitals in order to achieve the following goals: 1) enable secure, digital patient-to-provider and provider-to-provider communications that are traceable and reportable; 2) reduce phone call volume in medical offices and hospitals; and 3) achieve cost savings and productivity improvements. (ClipboardMD)

Mobile Healthcare Device Developing Computer Platform
Wei Lin
[email protected]

Our research focuses on the development of wearable healthcare devices. We have developed a hybrid platform that uses an ARM Cortex-M series processor as the embedded system and the open-source FreeRTOS as its real-time operating system. The ARM processors are designed for mobile applications with low power consumption, and FreeRTOS provides a small-footprint kernel for real-time, multitask applications. To integrate FreeRTOS with the selected processor, a set of drivers was designed to bridge FreeRTOS and the mobile processor. They provide a uniform software interface that allows application code running on FreeRTOS to easily control hardware resources such as the analog-to-digital converter, the I2C bus, and the universal asynchronous receiver/transmitter (UART). Application code can be divided into small modules called tasks for design simplicity. The software architecture maximizes code reusability and enables quick changes of hardware design with little impact on the existing code. Its flexibility is very attractive for the development of mobile healthcare applications. The platform has been adopted in a mobile infant monitoring device for the prevention of sudden infant death syndrome, in collaboration with the School of Health Technology and Management at Stony Brook University. The device collects vital signals from infants and performs on-site analysis, sending instant warnings to the mobile devices of parents and caregivers if abnormalities are observed. The platform has significantly reduced the development time of the prototype. Figure 9.
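The driver-layer design above lends itself to a simple sketch. The following Python sketch mirrors the pattern for illustration only (the actual platform is C code on FreeRTOS, and the class and method names here are our own, not the project's API): a uniform driver interface hides peripheral details, so task code survives hardware changes untouched.

    from abc import ABC, abstractmethod

    class Driver(ABC):
        """Uniform interface bridging task code and hardware resources."""
        @abstractmethod
        def read(self) -> bytes: ...
        @abstractmethod
        def write(self, data: bytes) -> None: ...

    class ADCDriver(Driver):
        """Stand-in for the analog-to-digital converter driver."""
        def read(self) -> bytes:
            return b"\x01\x42"          # hypothetical sampled reading
        def write(self, data: bytes) -> None:
            raise NotImplementedError("ADC is read-only")

    class UARTDriver(Driver):
        """Stand-in for the UART driver used to send warnings."""
        def read(self) -> bytes:
            return b""
        def write(self, data: bytes) -> None:
            print(f"UART out: {data!r}")   # placeholder for serial transmission

    def vital_signs_task(adc: Driver, uart: Driver, threshold: int = 0x40) -> None:
        """One FreeRTOS-style task: sample, analyze on-site, warn on anomaly."""
        sample = adc.read()
        if sample[-1] > threshold:         # toy abnormality test
            uart.write(b"ALERT: abnormal vital sign")

    vital_signs_task(ADCDriver(), UARTDriver())

In this pattern, swapping a development board for production hardware replaces only the concrete driver classes; the task module is untouched.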

Figure 9

Internet of Things

RFID Sense-a-tags for the Internet of Things
Petar M. Djuric
[email protected]

We address the question of the feasibility of designing backscattering devices that can communicate independently. We investigate a novel component for the Internet of Things, which we refer to as a sense-a-tag. The tag may be passive or semipassive, and it has the following functionalities: (a) initiation and generation of query signals for the tags in its proximity, (b) passive detection and decoding of backscatter signals from RFID tags in its vicinity, (c) processing of information that it acquires during its operation, and (d) communication of the collected information by backscattering. The research also involves the development of protocols for the new tags. (NSF)

Non-isotropic Networked Sensor Deployment for Smart Buildings
Jie Gao
[email protected]

The recent advancement of ubiquitous computing, sensing, distributed control and embedded systems has enabled a wide spectrum of smart building applications, such as surveillance for public safety, power monitoring and control for energy saving, patient monitoring for health care, and hazard detection for disaster response. Deployments of such sensing and control systems have been tested in a number of small-scale houses and labs. However, due to the use of simplified isotropic sensing models, initial deployments usually do not yield satisfactory sensing quality in real-world settings. As a result, adjustments of the initial deployment, or even multiple redeployments, are typically required, which introduces high overhead and decreases the quality of applications. The inefficiency and inaccuracy of existing sensor deployment call for new research solutions that systematically guide deployment through the integration of sensing and communication considerations. To accomplish this, the work aims at (i) addressing a fundamental challenge tied to the heterogeneous nature of sensors: non-isotropic sensing models; (ii) proposing a set of deployment solutions suitable for a wide range of sensor systems; and (iii) designing and implementing an open test-bed and simulation tools to evaluate real-world deployments for the community.

This work investigates non-isotropic networked sensor deployment for the localization and tracking of people and activities in smart buildings, with special attention to a new set of problems arising from the deployment of four types of sensors: cameras, tripwires, motion sensors and microphones. The deployment problems strive for full sensor coverage and wireless connectivity within a complex floor plan, and involve one or more of the following constraints: (i) non-isotropic models using visibility (cameras and tripwires), (ii) non-overlapping sensing ranges, and (iii) robust 2-coverage for accurate localization. The sensor deployment problem heavily involves the geometric properties of the environment (the building layout) as well as the geometric properties of the sensors (non-isotropic sensing ranges). Ideas from computational geometry will be applied to tackle these problems. Some of the problems are not hard to show NP-hard; we will therefore focus on approximation algorithms with worst-case guarantees for theoretical rigor, as well as practical algorithms suitable for implementation. (NSF, DARPA)

Self-powered Wireless Hybrid Langasite Sensor for Pressure/Temperature Monitoring of Nuclear Reactors
Lei Zuo (SBU), Haifeng Zhang (UNT) and Jie Lian (RPI)
[email protected]

The objective of this proposal is to develop a novel self-powered wireless hybrid sensor that can accurately monitor both pressure and temperature with a single device, without requiring external electricity, even in the extremely harsh environments of severe nuclear accidents: temperatures up to 1400°C, pressures up to 10,000 psi, and excessive radiation. This objective is achieved through three innovations: the first is to design a dual-mode langasite (La3Ga5SiO14) resonant sensor to detect extremely high temperature and high pressure simultaneously; the second is to create a multi-source energy harvester that harnesses the intrinsic heat of the reactor and the kinetic energy of reactor components (such as pump vibration or seismic motion) to provide the electric power needed for the sensors; and the third is to design a self-protected sensor package that integrates radiation shielding and mechanical support to mitigate severe environmental impacts.

The novel concept of a self-powered wireless langasite-based P/T sensor that operates under high temperature, high pressure, and excessive radiation provides a pathway to significantly improve monitoring technology for current nuclear reactors, and unquestionably supports the Nuclear Reactor Technologies program on Monitoring Technologies for Severe Accident Conditions. Upon success, the technology can also be used during normal operating conditions to provide enhanced monitoring of critical components in a standoff and energy-efficient manner. (DOE)

Design of Audio Interfaces in Adverse Acoustic Environments
Milutin Stanacevic
[email protected]

Achieving high speech intelligibility and recognition in noisy and reverberant environments is a hard problem, especially when the size of the microphone array is limited. The time delays between the observed signals are a fraction of the sampling interval, while reverberation adds attenuated and delayed copies of the source signals to the mixture. We devised a new set of blind source localization and separation algorithms in a unique framework that combines spatial sampling, sub-band processing and independent component analysis to improve localization and separation performance. The advancement in separation performance leads to better intelligibility of the speech signals and improved speech recognition in audio interfaces operating in adverse acoustic environments. Our design of miniature sensor arrays, with sensory information processing implemented in mixed-signal VLSI, enables a wide range of new applications of miniature microphone arrays where the power consumption and size of a digital implementation are the limiting factors. (NSF)
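The independent-component-analysis ingredient of the framework above can be illustrated in isolation. The toy Python sketch below separates two synthetic sources from two instantaneous mixtures with scikit-learn's FastICA; it omits the spatial-sampling and sub-band stages, and the real algorithms handle delayed, reverberant mixtures in mixed-signal hardware.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 4000)
    s1 = np.sin(2 * np.pi * 5 * t)              # tonal source
    s2 = np.sign(np.sin(2 * np.pi * 3 * t))     # square-wave source
    S = np.c_[s1, s2] + 0.02 * rng.standard_normal((t.size, 2))

    A = np.array([[1.0, 0.6],                   # toy instantaneous mixing matrix
                  [0.4, 1.0]])                  # (one row per microphone)
    X = S @ A.T                                 # observed microphone signals

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)                # recovered sources, up to scale/order
    print("estimated mixing matrix:\n", ica.mixing_)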

Smart Energy and Smart Urban Systems

Smart Grid Regional Demonstration: Long Island Smart Energy Corridor
Eugene Feinberg, Samir Das, Rob Johnson, Ari Kaufman, Klaus Mueller, Erez Zadok
[email protected]

This is a joint project with the Long Island Power Authority and Farmingdale College. The goal is to create a Smart Energy Corridor along Route 110 on Long Island. The project demonstrates the integration of a suite of Smart Grid technologies on the distribution and consumer systems, such as smart meters, distribution automation, distributed energy resources, and electric vehicle charging stations.

The project also includes testing cybersecurity systems, using Smart Grid technologies to enhance the efficiency and reliability of the distribution network, identifying optimal combinations of features to encourage consumer participation, and educating the public about the tools and techniques available with the Smart Grid. Figure 10. (DOE)

Figure 10

Smart Composites for Energy Harvesting
T. A. Venkatesh
[email protected]

Smart piezoelectric materials, by virtue of their coupled electromechanical characteristics, have been recognized for their potential utility in many applications as sensors and actuators, from medical ultrasound devices for prenatal care, micro/nano-positioners for atomic force microscopes, and sonar hydrophones, to non-destructive testers and inkjet print heads. Considerable research efforts in past years have resulted in the development of several monolithic piezoelectric materials, such as lead zirconate titanate (PZT) and barium titanate, with enhanced coupled properties. However, despite the enhancement in their piezoelectric properties, monolithic piezoelectric materials generally exhibit certain limitations. For example, they are mechanically brittle, as most piezoelectric materials are ceramic-type materials, and their functionality is generally unidirectional, as the poling characteristics of the piezoelectric material allow them to sense or actuate in one direction (i.e., the dominant poled direction) only. Because of these limitations, the range of applicability of monolithic piezoelectric materials is limited. A composites approach to piezoelectric materials can potentially overcome the limitations of monolithic piezoelectric materials. The overall objectives of our research efforts are: (i) to obtain a comprehensive understanding of the fundamental properties of smart piezoelectric composites; and (ii) to design novel smart-material-based devices and structures for sensing and actuating functions, as well as for energy harvesting applications. (CEWIT)

Thin Film Solar Cells with Tunable Transparency
T. A. Venkatesh (SBU) and M. Cotlet (BNL)
[email protected]

Solar technologies are currently based on conventional solar cells made of inorganic materials like silicon, which are generally opaque. Organic semiconducting materials, including conjugated polymers, have been investigated as potential replacement materials for solar cells. Conjugated polymer-based organic photovoltaic solar cells have several attractive features: they are made of inexpensive materials using cost-effective processing methods such as printing, dip casting or spin casting; they require small amounts of active material (a layer around 100 nm thick); and they are lightweight and mechanically flexible. However, even at thicknesses of around 100 nm, these continuous thin films tend to be quite opaque, restricting their application in integrated photovoltaics such as power-generating windows. To obtain transparent solar cells, recent efforts have focused on reducing the thickness of the active layer to less than 100 nm, which improves transparency but drastically reduces the conversion efficiency of the solar cells. Hence, the overall objective of our research effort is to develop conjugated polymer-based thin films with a novel micro-porous structure as active materials for solar cells with tunable transparency and good photovoltaic properties. (DOE)


Enhanced Power System Operation and Control
Eugene Feinberg
[email protected]

This project investigates and implements numerical algorithms and high-performance computing software for solving power-flow, state-estimation, and system-stability control problems for electric transmission grids. The main objective is to develop solution methods that are 10 to 100 times faster than the solution times achieved in current control system implementations. (NYSERDA, NYPA)

Smart Grid Android Manager
Radu Sion
[email protected]

The Mobile Smart Energy Manager is an Android/iPhone-based energy manager that runs on smartphones and tablets. It connects to local autonomous micro-grid information gateways and components (such as solar panel relays), the macro-grid operators' and utility companies' portals, smart home components, and the internet, to provide real-time energy-related information, debit and consumption data, billing, and the ability to manage smart-home devices in real time, on demand.

The Mobile Smart Energy Manager will also provide a unified user control platform for the integration of various external smart grid data processing and visualization plugins. Examples of such plugins include: (1) big data analytics visualization of micro-grid and macro-grid energy data, (2) connectivity conduits to external data sources, (3) visual devices such as the Reality Deck display and the SCADA smart grid control center, and (4) networking plugins to interface with any additional custom wireless protocols designed as part of the SGRID3 project. (DOE)

An Interactive User Interface for the Smart Grid
Arie Kaufman and Klaus Mueller
[email protected]

In the traditional system, customers simply purchase energy from suppliers and consume it. The smart grid, however, is a two-way communication channel between suppliers and consumers. The roles of consumers are to reduce their total consumption and to shift their usage to off-peak times, but it is difficult to encourage consumers to change their behavior with simple visualization. In this project, we have developed an interactive system to help customers gain a better understanding of their energy consumption. Since customers hardly know the energy consumption of their own electric devices, our system lets them configure a virtual house with electric devices to estimate their current energy consumption. Customers choose the devices they own by dragging and dropping each one into the virtual house, and can easily select the specific model of each device. The system also provides a tool to analyze their energy consumption patterns in order to control their energy usage efficiently. Given the total energy consumption from their smart meters, it shows daily, weekly, and monthly energy usage patterns. In addition, it enables customers to predict their energy consumption if they replace their current electric devices with new ones. (DOE)
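The virtual-house estimate itself reduces to simple arithmetic over whatever devices the customer drags in. A minimal sketch (the device list and wattages are illustrative placeholders, not data from the project):

    # (device, rated power in watts, hours used per day) -- illustrative values
    virtual_house = [
        ("refrigerator", 150, 24.0),
        ("air conditioner", 1200, 6.0),
        ("LED lighting", 60, 5.0),
        ("television", 100, 4.0),
    ]

    def daily_kwh(devices):
        """Energy per day: sum of watts x hours, converted to kWh."""
        return sum(watts * hours / 1000.0 for _, watts, hours in devices)

    day = daily_kwh(virtual_house)
    print(f"daily: {day:.1f} kWh, weekly: {7 * day:.1f} kWh, monthly: {30 * day:.1f} kWh")

    # Predicting a device upgrade: swap the model and re-estimate.
    upgraded = [(n, 300 if n == "air conditioner" else w, h)
                for n, w, h in virtual_house]
    print(f"after AC upgrade: {daily_kwh(upgraded):.1f} kWh/day")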

Algorithms

Non-Visual Skimming: Improving the Usability of Web Access for Blind People
Yevgen Borodin and I.V. Ramakrishnan
[email protected]

In our information-driven, web-based society, we are all gradually falling "victims" to information overload. However, while sighted people are finding ways to sift through information faster, visually impaired computer users are experiencing an even greater information overload. These users access computers and the Web using screen readers, software that narrates the content on a computer screen using computer-generated speech and allows users to navigate with keyboard shortcuts or gestures.

While sighted people can learn how to quickly skim web content to get the gist of the information and find what they need, people who are blind have to process volumes of content narrated through a serial audio interface, which does not let them find out which content is important before they listen to it. So they either listen to all of the content, or listen to the first part of each sentence or paragraph before skipping to the next one. Dr. Borodin and Dr. Ramakrishnan are developing novel interfaces and algorithmic techniques for non-visual skimming that will empower people with vision impairments to access digitized information significantly faster than they currently can with state-of-the-art screen readers. In this way, skimming will help reduce the cognitive load associated with non-visual browsing.

In the process of skimming, sighted people quickly look through content while picking out the words and phrases that are emphasized visually and/or carry the most meaning. Dr. Borodin and Dr. Ramakrishnan are exploring ways to emulate this process and enable a computer-assisted skimming experience for screen-reader users by designing interfaces and algorithms for non-visual skimming. The research and development of non-visual skimming interfaces in this project can lead to drastic improvements in Web accessibility for blind people. Non-visual skimming will not only enhance webpage navigation, but may also improve the usability of non-visual browsing interfaces by reducing information overload. (NSF)

Approximation Algorithms for Geometric Optimization
Esther Arkin and Joe Mitchell
[email protected]

We apply the methodologies of computational geometry to design, analyze, implement, and test algorithms for problems that arise in several application areas, including geometric network optimization, air traffic management, sensor networks, robotics, geometric modeling, and manufacturing. The main goal is the development of fundamental advances in approximation algorithms for geometric problems. Additionally, the project strives to foster and deepen collaborations with researchers and domain experts in application areas and industry, in order to formulate their algorithmic needs precisely and to make available algorithmic tools, insights from theoretical results, and software from experimental investigations. The specific repertoire of problems includes geometric network optimization (optimal routing and network design in geometric contexts, including TSP variants, vehicle routing, constrained spanning trees, minimum-weight subdivisions, optimal route planning with various constraints, and survivable network design); air traffic management (optimal use of airspace in the face of dynamic and uncertain constraints induced by weather and traffic congestion, sectorization (load balancing), and optimization of flow management structures for the National Airspace System); and sensor networks and coverage (sensor deployment, localization, data field monitoring, and coverage for stationary or mobile (robotic) sensors). (NSF)

Divisible Load Scheduling
Thomas Robertazzi
[email protected]

Ongoing work looks at novel ways in which very large partitionable data loads can be processed in a minimal amount of time with the greatest parallel processing advantage. Divisible, or partitionable, loads can be chopped arbitrarily into smaller fragments for processing on a networked collection of processors. One would like to make the best temporal choice and assignment of fragments to processors and links, so as to minimize solution time and maximize parallel processing gain (known as speedup). A number of uses are being investigated. One is processing huge volumes of data in search of characteristic patterns (i.e., signatures) that signify the presence of an interesting event or object; applications include network security, sensors, bio-medical problems and image processing. Work to date has examined the performance of systems doing such processing. A second use is in systems where there is a monetary cost for processing and transmission; the goal is to trade off solution time against the cost of problem solution, which is relevant in an era of cloud computing. Finally, using divide-and-conquer strategies to speed problem solving has been examined; such techniques can yield impressive savings in time. Since the original work on analytical solutions for partitionable loads by Cheng and Robertazzi, and also by researchers at Bell Labs, almost 150 journal papers have been written on divisible load scheduling. It remains an active and relevant field, especially in an age of big data. (CEWIT)
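For the simplest star ("single-level tree") model in this literature, where a master sequentially transmits fragment i to worker i and an optimal schedule makes all workers finish simultaneously, the fractions follow a closed-form recursion. A minimal sketch, with z[i] the time to ship the entire load over link i and w[i] the time for worker i to process the entire load (the numbers are illustrative):

    def optimal_fractions(z, w):
        """Load fractions for a star network where all workers finish together.

        Equating worker i's and worker i+1's finish times gives
        alpha[i+1] = alpha[i] * w[i] / (z[i+1] + w[i+1]);
        normalizing the fractions to sum to 1 fixes the scale.
        """
        alpha = [1.0]
        for i in range(len(w) - 1):
            alpha.append(alpha[i] * w[i] / (z[i + 1] + w[i + 1]))
        total = sum(alpha)
        return [a / total for a in alpha]

    z = [0.1, 0.2, 0.4]   # communication time per unit load, per link
    w = [1.0, 1.0, 2.0]   # computation time per unit load, per worker
    fracs = optimal_fractions(z, w)
    print([round(f, 3) for f in fracs])           # faster nodes get larger fragments
    print("makespan:", fracs[0] * (z[0] + w[0]))  # worker 1 finishes when everyone does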

Geometric Networks
Esther Arkin and Joe Mitchell
[email protected]

We study geometric networks, which represent interconnections between entities that arise in physical domains or geometric spaces. Networks are all around us and are an important part of the technology in our daily lives. Examples of geometric networks include wired/wireless communication networks, transportation systems, power grids, sensor networks, and geometric graphs that arise in information visualization. Geometric networks often have special structure that allows their analysis and optimization to be done more efficiently than is possible in general (non-geometric) networks. We study the design and analysis of energy-efficient wireless communication networks; in particular, we investigate wireless communication networks involving directional antennas and/or network improvement methods. We also study several related optimization problems in geometric networks that arise in other applications, including sensor networks, transportation science, air traffic management, vehicle routing in robotics, covering tours, and exploration/mapping. The primary objective of the project is to develop new algorithmic solutions to a cohesive collection of geometric optimization problems that fall into the common category of network problems. The goal is to develop theoretically sound, provable methods, but with a strong awareness of practicality and implementation. (US-Israel Binational Science Foundation)

Algorithms in Support of Flight Trajectory Analysis
Joe Mitchell
[email protected]

This project applies geometric algorithms to the analysis of large databases of trajectory data for moving agents. Given the exploding amount of time-trajectory data collected, there is a need to process, analyze, mine, and understand trajectories that arise from vehicles (cars, buses, trains, aircraft), pedestrians, animals, autonomous vehicles/sensors, etc. Using methods of computational geometry and geometric data structures, we design methods and tools to process trajectory data in the space-time domain, allowing a high level of understanding of structures and patterns, and rapid query access to large databases. (Sandia National Labs)

Using Evolutionary Computations to Understand the Design and Evolution of Gene and Cell Regulatory Networks
Alexander V. Spirov
[email protected]

A central challenge in understanding gene expression is to recreate the regulatory networks controlling expression, or at least to generate models of these networks that capture essential characteristics of their connectivity and control and that can be quantitatively analyzed. By developing such a quantitative theory of gene control, we can hope to develop far more powerful experimental tests and applications. A great deal of diverse work has gone into developing and testing models of gene regulatory networks (GRNs) in recent decades. As GRN models become more developed, allowing us greater understanding of the particular "wiring" involved in particular cases, it becomes natural to ask how these GRNs evolved, and to uncover general principles of evolutionary processes. Such questions can be approached computationally by evolution in silico. Complementary to these biologically focused approaches, a now well-established field of computer science is Evolutionary Computation (EC), in which highly efficient optimization techniques are inspired by evolutionary principles.

In our project we focused on the considerations that must be taken into account with respect to the level of detail to model (from solely gene-gene interactions down to the DNA sequence level) and the efficiency of computation. With respect to the latter, we argue that developments in EC offer the means to perform more complete simulation searches, and will lead to more comprehensive model predictions. While some of the projects on GRN evolution in silico begin to account for multiple binding of transcription factors, they do not address the architecture of genes' cis-regulatory regions. Beginning to model and understand the role of this architecture in gene regulation requires what we call a "mid-grained" level of approach: treating functional regions of the cis-regulatory modules (CRMs) as discrete units subject to evolution. The choice of fitness function is crucial for GRN evolutionary design. In particular, a solution can be decomposed into a number of building blocks (BBs; e.g., biological cis-regulatory modules), which can be searched for independently and afterwards combined to obtain good or optimal solutions. To implement BBs in GRN evolution, we have developed a crossover recombination operator that maintains meaningful BB sequences, such as CRMs. Systematic comparisons of the cost/benefit between coarse-grained and finer-grained approaches are needed on more test cases, to better understand the appropriate level for addressing particular questions. Our conclusion at this stage is that the mid-grained level of GRN modeling (the CRM level) is the best tradeoff between highly expensive calculations (which limit the extent of computational evolution that can be performed) and biologically reasonable simplification of the gene regulatory organization.

The biological and computational approaches to evolution have operated somewhat independently, but have much to offer each other with increased crossover of ideas. We feel, especially, that simulations of GRN evolution, which are computationally intensive, could benefit from increased use of EC optimization techniques and the analytical tools that come with them. Progress in modeling biological evolution will be greatly enhanced by incorporating recent developments in EC optimization techniques, many of which have already been developed on "test case" biological-type problems. At the same time, computer science stands to benefit greatly from increased appreciation and use of the complexity of biological evolutionary dynamics. (CEWIT)
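The building-block idea is easy to make concrete: restrict crossover cuts to block boundaries, so that no CRM-like unit is ever split. The Python toy below illustrates the principle only; it is not the project's actual operator.

    import random

    def bb_crossover(parent_a, parent_b, rng=random.Random(1)):
        """One-point crossover restricted to building-block boundaries.

        Parents are lists of intact blocks (e.g., CRM-like units). Cutting
        only between blocks means meaningful substructures always survive.
        """
        cut = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
        return (parent_a[:cut] + parent_b[cut:],
                parent_b[:cut] + parent_a[cut:])

    # Each block is an indivisible unit, e.g., the binding sites of one CRM.
    a = [["site1", "site2"], ["site3"], ["site4", "site5"]]
    b = [["siteX"], ["siteY", "siteZ"], ["siteW"]]
    child1, child2 = bb_crossover(a, b)
    print(child1)   # blocks are recombined but never broken apart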


Computational Modeling and Analysis for Complex Systems
Radu Grosu, James Glimm, Scott A. Smolka
[email protected]

The CMACS project is a five-year, $10M multi-institution NSF Expeditions in Computing project focused on far-reaching and transformative research into techniques based on Model Checking and Abstract Interpretation (MCAI) for analyzing the behavior of complex embedded and dynamical systems. Model checking and abstract interpretation have a 30-year record of success at automatically verifying properties of the behavior of discrete systems. The techniques have been fruitfully used, both independently and in combination, to establish properties of systems containing thousands of variables and inputs and several hundred thousand lines of code (for example, the Airbus 380 flight control software), and to detect subtle bugs in a variety of hardware and software applications, ranging from microprocessor designs and communication protocols to railway-switching systems and satellite-control software.

The purpose of the CMACS project is to extend the MCAI paradigm to reasoning about the behavior of models of physical systems that include continuous and stochastic behavior, such as those found in the biological and embedded-control areas. Specific research is being undertaken in model discovery/system identification for stochastic and nonlinear hybrid systems; methods for generating sound model abstractions to simplify the reasoning process; and next-generation algorithms for analyzing the behavior of these models. Challenge problems in the areas of pancreatic-cancer modeling, atrial-fibrillation detection, distributed automotive control, and aerospace control software are being used as technology drivers and testbeds for the results obtained in the course of the project. (NSF)

Adaptive Runtime Verification and Recovery for Mission-Critical Software
Radu Grosu, Scott A. Smolka, Scott D. Stoller, Erez Zadok
[email protected]

Runtime verification (RV) refers to the use of lightweight yet powerful formal techniques to monitor, analyze, and guide the execution of programs at run time. RV is becoming increasingly important for at least two reasons. First, software systems are becoming more complex and more adaptive, making it difficult to statically understand all of their possible behaviors; this is especially true of mission-critical software on autonomous unmanned vehicles, where completion of mission goals depends on adaptive responses to changing conditions. RV thus plays a valuable role in helping users monitor and understand system behavior during testing, debugging, and deployment. Second, to increase the reliability of complex adaptive software, RV must monitor and analyze the behavior of the software, its environment, and their interaction, in order to trigger adaptive responses when necessary.

To fill these needs, runtime verification itself must become more flexible and adaptive, and it must be equipped with a recovery framework that helps ensure mission completion in the face of runtime violations. We propose to develop Adaptive Runtime Verification and Recovery (Arrive), a novel extension of runtime verification in which the runtime verification itself is adaptive. Arrive will dynamically allocate more runtime-verification resources to high-criticality monitored objects, thereby increasing the probability of detecting property violations within a given overhead budget. Moreover, when a violation is imminent, Arrive will take adaptive and possibly preemptive action in response, thereby ensuring recovery.

We are investigating three related aspects of Arrive: overhead control, incomplete monitoring, and predictive analysis. We are also developing a Simplex architecture for cyber-physical systems by extending Simplex to monitor the software state as well as the physical-plant state. We will evaluate the performance and utility of the Arrive framework through significant case studies, including the runtime monitoring of the command-and-control and energy-management infrastructure of a fleet of UAVs. It is anticipated that this project will be funded by the Air Force Office of Scientific Research (AFOSR) starting in the first quarter of FY 2014.
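The overhead-control idea of spending a fixed monitoring budget preferentially on high-criticality objects can be sketched in a few lines. The event model, criticality weights, and invariant below are invented for illustration; Arrive itself couples such sampling with state estimation for the events it skips.

    import random

    class BudgetedMonitor:
        """Check invariants on a sampled subset of events, within a budget."""

        def __init__(self, checks_per_window, rng=random.Random(0)):
            self.budget = checks_per_window
            self.used = 0
            self.rng = rng

        def maybe_check(self, criticality, invariant, state):
            # Higher-criticality objects get a proportionally higher
            # probability of being verified before the budget runs out.
            if self.used < self.budget and self.rng.random() < criticality:
                self.used += 1
                if not invariant(state):
                    return "violation: trigger recovery"
            return "ok (possibly unchecked)"

    monitor = BudgetedMonitor(checks_per_window=100)
    altitude_ok = lambda s: s["altitude_m"] > 50.0   # toy UAV safety invariant
    print(monitor.maybe_check(0.9, altitude_ok, {"altitude_m": 32.0}))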


From Clarity to Efficiency for Distributed Algorithms
Y. Annie Liu and Scott D. Stoller
[email protected]

Distributed algorithms are at the core of distributed systems, which are increasingly indispensable in our daily lives. Yet developing practical implementations of distributed algorithms with correctness and efficiency assurances remains a challenging, recurring task. This project develops formal semantics and powerful optimizations for a very high-level language for distributed algorithms. The language, DistAlgo, allows algorithms to be expressed easily and clearly, making it easier to understand them precisely and to generate executable implementations. The semantics provide an exact understanding of the language. The optimizations generate efficient implementations, without manually coding the algorithms and applying ad hoc optimizations. We evaluate the language and the translation and optimization methods by applying them to important and difficult distributed algorithms. Based on our experience and success teaching concurrent and distributed algorithms in the past four years, we also propose to systematically apply our methods to the major distributed algorithms in textbooks. We will make the results publicly available on the Web. (NSF)

Algorithmic Foundations for Hybrid Mobile Sensor Networks
Jie Gao
[email protected]

The objective of this project is to develop efficient distributed algorithms for integrating static, pervasive sensor nodes and mobile, autonomous agents that together monitor and act on the environment. The project leverages the recent advances and maturity of static wireless sensor networks for environment monitoring, and projects the vision of seamlessly integrating physical users and/or autonomous robots into a hybrid intelligent framework. The focus is on the joint information processing, optimization and coordination of both static and mobile nodes to comprehend and act on the environment. The project covers the following application categories: mobile robots as aids to sensor network operation; tracking, searching and planning for mobile entities; online resource management and allocation; and maintaining group communication and coordination of mobile agents. A common theme in these problems is the tight coupling of, and the frequent information flow between, static monitoring nodes and mobile actionable agents.

To enable mobile agent coordination and ensure smooth and frequent information flow between sensor nodes and agents, we ask how sensor data should be stored and processed in the network, and what type of distributed structures should be maintained, to best serve mobile interactive users. There are two central intellectual questions that transcend the algorithm and system design. (NSF, DARPA)

• How can the data stored in the network be used to best serve user requests? The biggest challenge is the information brokerage problem, in which sensors detecting interesting local events and the mobile users seeking such information are not aware of one another. Non-trivial brokerage methods need to be designed to "match" the data suppliers and data consumers, without communication-expensive routines such as flooding. (A classic flooding-free scheme is sketched after this list.)

• How can the continuity and coherence of mobility best be exploited, both for mobility-enabled data (e.g., continuous trajectories of mobile targets) and for the locations of mobile users? We are looking for solutions that smoothly adapt to the system configuration with low worst-case or amortized update cost. We want to avoid sudden changes or any large-scale reconstruction; smoothly deforming communication structures will be helpful for continuous, steady data flow from the network to mobile users and between multiple mobile users.

The embedded sensors, integrated smoothly with other mobile embedded devices, would provide real-time data collection, knowledge extraction and environment understanding, and eventually evolve into an intelligent component of a smart environment. (CEWIT)
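One classic flooding-free brokerage scheme is grid rendezvous (sometimes called double rulings): producers replicate an event along their row of a virtual grid, consumers query along their column, and a row and a column always intersect. The Python sketch below is a standard illustration of that idea, not necessarily this project's construction.

    class GridBroker:
        """Row/column rendezvous on an n-by-n virtual grid of sensor nodes."""

        def __init__(self, n):
            self.n = n
            self.store = {}   # (row, col) -> list of events held at that node

        def publish(self, row, col, event):
            # Producer replicates along its whole row: n messages,
            # versus n*n to flood the entire network.
            for c in range(self.n):
                self.store.setdefault((row, c), []).append(event)

        def query(self, row, col):
            # Consumer probes along its column, which crosses every row once,
            # so it is guaranteed to meet any published event.
            hits = []
            for r in range(self.n):
                hits.extend(self.store.get((r, col), []))
            return hits

    grid = GridBroker(8)
    grid.publish(2, 5, "intruder detected in zone (2,5)")
    print(grid.query(6, 0))   # finds the event: column 0 meets row 2 at (2,0)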


Performance Analysis and Optimization for Logic Rule Engines
Michael Kifer, Y. Annie Liu, and David Warren
[email protected]

As the range of applications of rule-based systems has expanded and the W3C finalizes standards for rules on the Semantic Web, understanding the execution performance of such systems has become more critical. Rule-based systems are now widely applied to the Semantic Web, system security and access control, program analysis and verification, cognitive radio, disruption-tolerant networking, sensor networks, and a number of other domains, each with its own demands for efficiency and scalability. Various implementations of rule systems have been developed and shown to have widely varying performance characteristics. While some systems generally perform better than others, no single system dominates for all types of rule programs. This project aims to take one successful rule system, the XSB Logic Programming System, and use techniques from other implementations that perform better on some programs to guide the development of a program optimization strategy for XSB that will make it perform uniformly well.

The approach will be through global program transformations, drawing on techniques from relational database query optimization, compiler optimization and recursive query optimization. Our approach is to 1) explore and collect techniques for generating (possibly more efficient) logically equivalent programs; 2) develop cost models and analysis techniques to generate parameterized cost formulas that approximate the running time (and space) of given rule programs; 3) explore and develop techniques to estimate the parameters required to evaluate the cost formulas; 4) develop heuristic search techniques to approximately search an exponentially large space of rule programs logically equivalent to a given one, to find the one with the least predicted running time (and/or space); and 5) implement the strategy as a program optimizer for the open-source, widely used XSB Logic Programming System. (NSF)

Demand-Driven Incremental Object Queries
Y. Annie Liu and Scott D. Stoller
[email protected]

High-level queries such as relational database queries have provided significant productivity improvements in the development of database applications. Relational queries are supported by powerful query optimization engines and are significantly easier to write, understand, and maintain than efficient low-level programs. Similar queries, in the form of comprehensions, have also been used increasingly in programming languages, from SETL to Python to Haskell to C# with LINQ, and many more.

Object queries can provide even greater productivity improvements, because they allow easier and more natural modeling of real-world data and easier and clearer expression of complex queries. Object-oriented modeling and programming have indeed been widely adopted for improved productivity in application development. However, efficient implementation of object queries remains challenging, especially to support queries on demand, for dynamically determined parameter values, and to make the queries incremental under updates to the values that the queries depend on, while ensuring the desired correctness.

We study a systematic method, based on relational invariants, for efficient implementation of demand-driven incremental object queries. The method translates demand-driven object queries into relational queries, and incrementalizes and optimizes the resulting relational queries by exploiting constraints from objects and demands, over arbitrarily nested objects and sets of objects, for any dynamically demanded parameter values, under all possible updates to the values that the query depends on. Transformation into relational queries allows clean and uniform treatment of objects, sets, and demands, all of which are captured precisely by constraints on relations, resulting in systematic consideration of the trade-offs in incrementalization and optimization. This in turn allows our method to provide precise complexity guarantees for the generated implementations. Our transformations that exploit demands systematically reduce not only time complexity drastically, but also the space needed significantly. Our prototype implementations have been applied successfully to complex query problems in a variety of application domains, including social network analysis, role-based access control, query-based debugging, and distributed algorithms. (NSF and ONR)
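The payoff of incrementalization is visible even on a toy query: instead of re-running a comprehension over the whole collection after every update, maintain its result under each insertion and deletion. The sketch below hand-codes what the project's transformations derive automatically, with precise guarantees.

    class IncrementalFilter:
        """Maintain {x for x in base if pred(x)} under single-element updates."""

        def __init__(self, pred, base=()):
            self.pred = pred
            self.result = {x for x in base if pred(x)}   # computed once

        def add(self, x):
            if self.pred(x):          # O(1) per update instead of O(|base|)
                self.result.add(x)

        def remove(self, x):
            self.result.discard(x)

    admins = IncrementalFilter(lambda user: user[1] == "admin")
    admins.add(("ann", "admin"))
    admins.add(("bob", "user"))
    admins.remove(("ann", "admin"))
    print(admins.result)              # set(), kept current without a rescan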

Advanced Computing Systems

Secure Provenance in High-End Computing Systems
Radu Sion and Erez Zadok
[email protected], [email protected]

Data provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins. The generated evidence supports essential forensic activities such as data-dependency analysis, error/compromise detection and recovery, and auditing and compliance analysis.

This collaborative project is focused on theory and systems supporting practical end-to-end provenance in high-end computing systems. Systems are investigated in which provenance authorities accept host-level provenance data from validated provenance monitors, to assemble a trustworthy provenance record. Provenance monitors externally observe systems or applications and securely record the evolution of the data they manipulate. The provenance record is shared across the distributed environment.

In support of this vision, the project explores tools and systems that identify policy (what provenance data to record), trusted authorities (which entities may assert provenance information), and infrastructure (where to record provenance data). Moreover, provenance has the potential to hurt system performance: collecting too much provenance information, or doing so in an inefficient or invasive way, can introduce unacceptable overheads. In response, the project is further focused on ways to understand and reduce the costs of provenance collection. (NSF)

Eco Hadoop: Cost and Energy-Aware Cloud HPC Computing
Radu Sion
[email protected]

The project examines novel services built on top of public cloud infrastructure to enable cost-effective high-performance computing. Eco Hadoop explores the on-demand, elastic, and configurable features of cloud computing to complement traditional supercomputer/cluster platforms. More specifically, this research aims to assess the efficacy of building dynamic cloud-based clusters that leverage the configurability and tiered pricing model of cloud instances. The scientific value of this proposal lies in the novel use of untapped features of cloud computing for HPC, and in the strategic adoption of small, cloud-based clusters for developing and tuning applications for large supercomputers.

Through this research, we expect to answer key research questions regarding: (1) automatic workload-specific cloud cluster configuration, (2) cost-aware and contention-aware data and task co-scheduling, and (3) adaptive, integrated cloud instance provisioning and job scheduling, plus workload aggregation for cloud instance rental cost reduction. If successful, this research will result in tools that adaptively aggregate, configure, and re-configure cloud resources for different HPC needs, with the purpose of offering low-cost R&D environments for scalable parallel applications. (NSF)

An Efficient, Versatile, Scalable, and Portable Storage System for Scientific Data Containers
Erez Zadok
[email protected]

Scientific big data sets are becoming too large and complex to fit in RAM, forcing scientific applications to perform a lot of slow disk and network I/O. This growth also makes scientific data more vulnerable to corruption due to crashes and human errors. This project uses recent results from algorithms, database, and storage research to improve the performance and reliability of standard scientific data formats. This will make scientific research cheaper, faster, more reliable, and more reproducible.

The Hierarchical Data Format (HDF5) standard is a container format for scientific data. It allows scientists to define and store complex data structures inside HDF5 files. Unfortunately, the current standard forces users to store all data objects and their metadata properties inside one large physical file; this mix hinders metadata-specific optimizations. The current storage also uses data structures that scale poorly for large data. Lastly, the current model lacks snapshot support, which is important for recovery from errors.

A new HDF5 release allows users to create more versatile storage plugins to control storage policies for each object and attribute. This project is developing support for snapshots in HDF5, and designing new data structures and algorithms to scale HDF5 data access to big data on modern storage devices. The project is designing several new HDF5 drivers: mapping objects to a Linux file system; storing objects in a database; and accessing data objects on remote Web servers. These improvements are evaluated using large-scale visualization applications with big data stemming from real-world scientific computations. (CEWIT)


Workload-Aware Storage Architectures for Optimal Performance and Energy Efficiency
Erez Zadok
[email protected]

The most significant performance and energy bottlenecks in a computer are often caused by the storage system, because the gap between storage device and CPU speeds is greater than in any other part of the machine. Big data and new storage media only make things worse, because today's systems are still optimized for legacy workloads and hard disks. The team at Stony Brook University, Harvard University, and Harvey Mudd College has shown that large systems are poorly optimized, resulting in waste that increases computing costs, slows scientific progress, and jeopardizes the nation's energy independence.

First, the team is examining modern workloads running on a variety of platforms, including individual computers, large compute farms, and next-generation infrastructure such as Stony Brook's Reality Deck, a massive gigapixel visualization facility. These workloads produce combined performance and energy traces that are being released to the community.

Second, the team is applying techniques such as statistical feature extraction, Hidden Markov Modeling, data mining, and conditional likelihood maximization to analyze these data sets and traces. The Reality Deck is used to visualize the resulting multi-dimensional performance/energy data sets. The team's analyses reveal fundamental phenomena and principles that inform future designs.

Third, the findings from the first two efforts are being combined to develop new storage architectures that best balance performance and energy under different workloads when used with modern devices, such as solid-state drives (SSDs), phase-change memories, etc. The designs leverage the team's work on storage-optimized algorithms, multi-tier storage, and new optimized data structures. (CEWIT)

Pochoir: A Stencil Computation Compiler for Modern Multicore Machines
Rezaul Chowdhury
[email protected]

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. These computations arise in diverse application areas including physics, biology, chemistry, finance, mechanical and electrical engineering, energy, climate, and recreational mathematics. Such computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance. Real stencil applications often exhibit complex irregularities and dependencies, which make it difficult for programmers to produce efficient multicore code or to migrate the code to other modern hardware platforms. Even simple stencils are hard to code for performance. This project aims to solve the difficult problem of generating high-efficiency, cache-oblivious code for stencil computations that makes good use of the memory hierarchy and processor pipelines, starting from simple-to-write linguistic specifications. We are developing a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. This research will enable ordinary programmers of stencil computations to enjoy the benefits of multicore technology without having to write code any more complex than naive nested loops. (NSF)
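The "naive nested loops" baseline that the generated cache-oblivious code outperforms looks like the following for a 2D five-point heat stencil (a generic example written in Python for readability; Pochoir itself is embedded in C++):

    import numpy as np

    def heat_step(grid, alpha=0.1):
        """One timestep: update each interior point from itself and its
        four neighbors. Easy to write, but each step streams the whole
        grid through the cache, which is what hurts performance."""
        new = grid.copy()
        for i in range(1, grid.shape[0] - 1):
            for j in range(1, grid.shape[1] - 1):
                new[i, j] = grid[i, j] + alpha * (
                    grid[i - 1, j] + grid[i + 1, j] +
                    grid[i, j - 1] + grid[i, j + 1] - 4 * grid[i, j]
                )
        return new

    grid = np.zeros((64, 64))
    grid[32, 32] = 100.0          # point heat source
    for _ in range(10):           # T timesteps = T full sweeps of the grid
        grid = heat_step(grid)
    print(grid[31:34, 31:34].round(3))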


Energy-Efficient Superconductor SFQ Processor Design
Mikhail Dorojevets
[email protected]

New SFQ logics with extremely low power consumption at frequencies of 20-50 GHz make superconductor technology one of the candidates for use in future energy-efficient systems for critical national applications. In the first phase of the project, the PI's SBU team developed viable design techniques, including asynchronous wave pipelining, and a set of design tools and cell libraries for VLSI-scale superconductor design. Several RSFQ chips with complexities of up to 10K Josephson junctions (JJs) were successfully designed using those techniques and demonstrated operation at 20 GHz. The objective of the current phase of the project (2012-2015) is to develop and evaluate an ultra-low-power design of 10-20 GHz 32-bit superconductor processors using a novel Reciprocal Quantum Logic (RQL) with zero static power consumption. The cell-level design and energy-efficiency evaluation of a complete set of 32-bit processing and storage components is to be done using SBU RQL VHDL-based design and evaluation tools. The goal of the work is a quantitative assessment of how successfully superconductor technology, in its latest form, will be able to compete in terms of energy efficiency with future 10 nm CMOS-based designs. (ARO)

Building Metaknowledge Representations in Circuit Design: Symbolic Data Mining for Systematic Modeling of Analog Circuits
Alex Doboli
[email protected]

Analog circuits are important building blocks in many modern electronic systems used in manufacturing automation, telecommunication, healthcare, defense, infrastructure management, and many more. Analog circuit design tasks are often conducted manually, as circuit design has been difficult to automate. Efficient EDA tools have emerged over the years for circuit simulation, transistor sizing and layout generation, but other design activities, like circuit topology generation, refinement and selection, and circuit modeling and verification, are still performed mainly by skilled experts. Our recent research addresses automatically extracting and characterizing the design knowledge embedded in analog circuits, and then using the mined knowledge to produce new circuit topologies through steps like aligning and combining design features, abstracting and generalizing concepts, and inferring new features by similarity and induction. As research in cognitive psychology shows, these are the main steps in human reasoning. We think that an automated, knowledge-based reasoning flow can produce circuit designs that are more similar to those created by human experts, offer superior performance, and are easier to verify. It can also enable new kinds of EDA tools that potentially express (all) knowledge in circuit design, produce innovative designs for new applications, and exploit unstudied opportunities. (NSF)

Leveraging Three-Dimensional Integration Technology for Highly Heterogeneous Systems-on-Chip
Emre Salman
[email protected]

Three-dimensional (3D) integration is an emerging integrated circuit (IC) technology that maintains the benefits of scaling by vertically stacking multiple wafers rather than decreasing the size of the transistors in two dimensions. 3D technology not only increases integration density, but also reduces the length and number of global interconnects. This reduction enhances chip performance (due to reduced interconnect delay) while lowering power consumption (due to reduced switching capacitance). These advantages have attracted significant attention in the past decade for developing high-performance computing systems such as many-core processors with embedded memory.

This research focuses on 3D heterogeneous integration to expand the application domain of 3D integrated circuits from high-performance computing to relatively lower-power systems-on-chip (SoCs) consisting of, for example, sensors, analog interface circuits, memory, digital processing blocks, and RF wireless transmission circuitry. The primary objective is to develop a reliable framework to facilitate highly heterogeneous and tightly coupled 3D SoCs for substantially complex applications, where plug-and-play based approaches are not sufficient. The proposed circuit- and physical-level design and analysis methodologies serve as a novel framework to produce robust 3D SoCs consisting of diverse electrical planes, such as analog/RF, digital, and memory. At a time when the fundamental limits of traditional scaling are approaching, the synergy between 3D integration and heterogeneous SoCs brings new opportunities to a variety of applications, such as mobile computing, life sciences, and environmental control. Figure 11. (NSF, CAREER)

Fig. 11: 3D heterogeneous IC with four stacked dies


General-Purpose Computation on Graphics Processing Units
Arie Kaufman and Klaus Mueller
[email protected]

A commodity graphics processing unit (GPU) can perform general-purpose computations, not only the specific built-in graphics operations. The GPU is a multi-core coprocessor that supports high data parallelism. Its performance has been growing at a rate of squared Moore's law, and its peak performance exceeds that of the CPU by an order of magnitude. We further advocate the use of GPU clusters for large-scale problems and for high-performance computing, and we have built several large GPU clusters and implemented several applications on them. The most recent GPU cluster, for the Reality Deck, is the largest, with an aggregate 2.3 teraflops of CPU performance, 220 teraflops of GPU performance, and 1.2 TB of memory. However, programming a GPU cluster is difficult; we have therefore developed the Zippy framework to simplify GPU cluster programming. Zippy is based on global arrays and stream programming and hides the low-level details. GPU clusters have the potential to become the peta-scale supercomputers of the near future. We have done much groundbreaking computational science work on GPUs and GPU clusters, such as computational fluid dynamics using the Lattice Boltzmann Model (LBM), medical image reconstruction (RapidCT), large-scale complex volume visualization, and hashing and compression on the GPU. A specific example of using the GPU cluster is real-time plume dispersion simulation in complex urban environments, such as midtown Manhattan, where nearly 1,000 skyscraper geometries serve as boundary conditions that substantially influence the flow. Example applications using Zippy include LBM, volume visualization, and isosurface extraction. (NSF)
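For a feel of the data-parallel programming model on a single GPU (cluster-level programming with Zippy is not sketched here), numpy-style array code can be run on the GPU with the CuPy library; this toy example is ours, not project code.

    import cupy as cp   # requires a CUDA-capable GPU

    n = 10_000_000
    x = cp.random.random(n).astype(cp.float32)   # arrays live in GPU memory
    y = cp.random.random(n).astype(cp.float32)

    # A single elementwise expression launches a kernel that updates
    # all n entries in parallel across the GPU's cores.
    z = 2.5 * x + y
    print(float(z.sum()))        # reduction on the GPU; scalar copied back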


www.cewit.org

Center of Excellence in Wireless and Information Technology (CEWIT)
Stony Brook University Research and Development Park
1500 Stony Brook Road
Stony Brook, NY 11794-6040
Phone: (631) 216-7000
[email protected]