
XSEDE: the Extreme Science and Engineering Discovery Environment

PY1-3 Comprehensive Report July 1, 2011 through June 30, 2014

PY1-3 Comprehensive Report Page i

XSEDE Senior Management Team (SMT)

John Towns (NCSA), PI and Project Director
Kelly Gaither (TACC), Interim Co-PI
Ralph Roskies (PSC), Co-PI and Extended Collaborative Support Service-Projects Director
Nancy Wilkins-Diehr (SDSC), Co-PI and Extended Collaborative Support Service-Communities Director
Greg Peterson (UT-NICS), Co-PI and Operations Director
Chris Hempel (TACC), Interim Users Services Director
Scott Lathrop (Shodor), Education & Outreach Director
Justin Whitt (UT/NICS), Manager of Project Management
Janet Brown (PSC), Manager of Systems and Software Engineering
JP Navarro (UChicago), Manager of Software Development & Integration
Tom Cheatham (Utah), Chair of the User Advisory Committee (voting member of the SMT)
Richard Moore (SDSC), Chair of the XD Service Providers Forum (voting member of the SMT)


Table of Contents

Preamble
1. Executive Summary
1.1. Strategic Goals
1.2. Progress toward Strategic Goals
1.3. Summary & Project Highlights
2. Science and Engineering Highlights
2.1. Environmental Biology/Behavioral Genomics: Using Next-Generation Sequencing to Identify Genetic Differences in the Social Lives of Mammals (Eileen Lacey and Matthew MacManes, University of California, Berkeley)
2.2. Biomass Research: Cellulase Enzyme Structure-Function Relationships (Gregg T. Beckham, National Renewable Energy Laboratory)
2.3. Biochemistry: Evaluating and improving the performance of RNA force fields in molecular dynamics simulations (Thomas Cheatham, University of Utah College of Pharmacy)
2.4. Aeronautics and astronautics – Turbulence Modeling: Droplet-laden Isotropic Turbulence (Antonino Ferrante, William E. Boeing Department of Aeronautics and Astronautics, University of Washington, Seattle)
2.5. Economics: Fast Construction of Nanosecond Level Snapshots of Financial Markets (Mao Ye, University of Illinois College of Business)
2.6. Chemistry: Clean Air Technologies: XSEDE resources help chemists develop new tools for cleaner air, energy production (Brian Space, University of South Florida)
2.7. Multimedia Literacy: Analysis of video content (Virginia Kuhn, University of Southern California)
2.8. Seismology: Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers (Yifeng Cui, San Diego Supercomputer Center, UC San Diego)
2.9. A HPC “Cyber Wind Facility” Incorporating Fully-Coupled CFD/CSD for Turbine-Platform-Wake Interactions with the Atmosphere and Ocean (PI: James G. Brasseur, Pennsylvania State University)
2.10. Materials Research: Atomistic Characterization of Stable and Metastable Alumina Surfaces (Douglas Spearot, University of Arkansas)
2.11. Mechanics and Materials: An Inverse Stiffness Mapping Approach to the Early Detection of Breast Cancer (Lorraine Olson, Rose-Hulman Institute of Technology)
2.12. Materials Research: “Morphology and flow properties of micelle-nanoparticle solutions from Molecular Dynamics simulations” (Subas Dhakal, Syracuse University)
3. Discussion of Goals and Key Performance Indicators
3.1. Deepen and Extend Use
3.1.1. Deepening use to Existing Communities
3.1.2. Extending Use to New Communities
3.1.3. Prepare the Current and Next Generation


3.1.4. Raising Awareness
3.2. Advance the Ecosystem
3.2.1. Create an Open and Evolving e-Infrastructure
3.2.2. Enhance the Array of Technical Expertise and Support Services
3.3. Sustain the Ecosystem
3.3.1. Provide Reliable, Efficient, and Secure Infrastructure
3.3.2. Provide Excellent User Support
3.3.3. Effective and Productive Virtual Organization
3.3.4. Innovative Virtual Organization
4. Supporting and Expanding the Community
4.1. Extended Collaborative Support Service - ECSS (WBS 1.4 and 1.5)
4.2. ECSS – Projects (WBS 1.4)
4.2.1. Extended Support for Research Teams (WBS 1.4.1)
4.2.2. Novel and Innovative Projects (WBS 1.4.2)
4.3. ECSS – Communities (WBS 1.5)
4.3.1. Extended Support for Community Codes (WBS 1.5.1)
4.3.2. Extended Support for Science Gateways (WBS 1.5.2)
4.3.3. ECSS Communities – Extended Collaborative Support for Training, Education and Outreach (WBS 1.5.3)
4.4. User Services (WBS 1.3)
4.4.1. Training (WBS 1.3.1)
4.4.2. User Information & Interfaces (WBS 1.3.2)
4.4.3. User Engagement (WBS 1.3.3)
4.4.4. Allocations (WBS 1.3.4)
4.5. Education and Outreach (WBS 1.6)
4.5.1. Education (WBS 1.6.1)
4.5.2. Campus Bridging (WBS 1.6.5)
4.5.3. Champions Program (WBS 1.6.6)
4.5.4. Under-represented Community Engagement (WBS 1.6.7)
4.5.5. Student Engagement Programs (WBS 1.6.8)
4.5.6. Evaluation (WBS 1.6.9)
4.5.7. Coordination and Administration (WBS 1.6.10)
5. Delivering & Supporting New Capabilities in an Evolving Environment
5.1. Architecture for a National Distributed Cyberinfrastructure Ecosystem (Architecture & Design, WBS 1.1.3)
5.2. Realizing the XSEDE Architecture (Software Development & Integration, WBS 1.1.6)
5.3. Technology Investigation Service (WBS 1.7)


5.3.1. TIS - Technology Identification (WBS 1.7.1)
5.3.2. Technology Evaluation (WBS 1.7.2)
5.4. XSEDE Operations (WBS 1.2)
5.4.1. Cybersecurity (WBS 1.2.1)
5.4.2. Data Services (WBS 1.2.2)
5.4.3. XSEDEnet (WBS 1.2.3)
5.4.4. Software Testing & Deployment (WBS 1.2.4)
5.4.5. Accounting & Account Management (WBS 1.2.5)
5.4.6. Systems Operational Support (WBS 1.2.6)
6. Managing the Program
6.1. XSEDE Project Office (WBS 1.1)
6.1.1. Project Management & Reporting (WBS 1.1.1)
6.1.2. Systems & Software Engineering (WBS 1.1.2)
6.1.3. External Relations (WBS 1.1.4)
6.1.4. Industry Relations (WBS 1.1.5)
A Glossary and List of Acronyms
B Project Metrics
B.1 Project-wide Metrics
B.1.1 User community metrics
B.1.2 Project and allocation metrics
B.1.3 Resource and usage metrics
B.1.4 Data Services
B.1.5 User Information & Interfaces
B.2 Area Metrics
B.2.1 XSEDE Project Office (WBS 1.1) (John Towns)
B.2.1.1. Project Management and Reporting (WBS 1.1.1) (Justin Whitt)
B.2.1.2. Systems & Software Engineering (WBS 1.1.2) (Janet Brown)
B.2.1.3. Architecture and Design (WBS 1.1.3) (David Lifka)
B.2.1.4. External Relations (WBS 1.1.4) (Travis Benjamin Tate)
B.2.1.5. Industry Relations (WBS 1.1.5) (Dave Hudak)
B.2.1.6. Software Development and Integration (WBS 1.1.6) (JP Navarro)
B.2.1.7. Project Administration (WBS 1.1.7) (John Towns)
B.2.2 XSEDE Operations (WBS 1.2) (Greg Peterson)
B.2.2.1. Cybersecurity (WBS 1.2.1) (Randy Butler and Jim Marsteller)
B.2.2.2. Data Services (WBS 1.2.2) (Chris Jordan)
B.2.2.3. XSEDEnet (WBS 1.2.3) (Joe Lappa)


B.2.2.4. Software Testing & Deployment (WBS 1.2.4) (Tabitha Samuel)
B.2.2.5. Accounting & Account Management (WBS 1.2.5) (Amy Schuele)
B.2.2.6. Systems Operational Support (WBS 1.2.6) (Stephen McNally)
B.2.2.7. User Services (WBS 1.3) (Chris Hempel)
B.2.2.8. Training (WBS 1.3.1) (Vince Betro)
B.2.2.9. User Information & Interfaces (WBS 1.3.2) (Maytal Dahan)
B.2.2.10. User Engagement (WBS 1.3.3) (Bobby Whitten)
B.2.2.11. Allocations (WBS 1.3.4) (Ken Hackworth)
B.2.2.12. Extended Collaborative Support Service - ECSS (WBS 1.4 and 1.5)
B.2.2.13. ECSS – Projects (WBS 1.4) (Ralph Roskies)
B.2.2.13.1. Extended Support for Research Teams (WBS 1.4.1) (Mark Fahey)
B.2.2.13.2. Novel and Innovative Projects (WBS 1.4.2) (Sergiu Sanielevici)
B.2.2.14. ECSS – Communities (WBS 1.5) (Nancy Wilkins-Diehr)
B.2.2.14.1. Extended Support for Community Codes (WBS 1.5.1) (John Cazes)
B.2.2.14.2. Extended Science Gateways Support (WBS 1.5.2) (Suresh Marru)
B.2.2.14.3. ECSS Communities – Extended Collaborative Support for Training, Education and Outreach (WBS 1.5.3) (Jay Alameda)
B.2.3 Education and Outreach (WBS 1.6) (Scott Lathrop)
B.2.3.1. Education (WBS 1.6.1) (Steve Gordon)
B.2.3.2. Campus Bridging (WBS 1.6.5) (Craig Stewart)
B.2.3.3. Champions Program (WBS 1.6.6) (Kay Hunt)
B.2.3.4. Under-represented Community Engagement (WBS 1.6.7) (Linda Akli)
B.2.3.5. Student Engagement Programs (WBS 1.6.8) (Jim Ferguson)
B.2.3.6. Coordination and Administration (WBS 1.6.10) (Scott Lathrop)
B.2.4 Technology Investigation Service (WBS 1.7) (J. Ray Scott)
B.2.4.1. TIS - Technology Identification (WBS 1.7.1) (Maytal Dahan)
B.2.4.2. Technology Evaluation (WBS 1.7.2) (Dan Lapine)
C Publications Listing
C.1 XSEDE Staff Publications
C.2 Publications from XSEDE Users
D The Value of XSEDE: Value-Add and Cost-Avoidance
D.1 Cost avoidance
D.2 Value added
D.3 Funding for Service Provider activities
D.4 A vignette on value added funding for Service Provider activities
E SWOT Analysis of XSEDE


F Evolving XSEDE and Organizational Improvement
F.1 Process Improvements
F.1.1 Formal Identification of Improvements
F.1.1.1. The Baldrige Criteria
F.1.1.2. Staff Climate Study
F.1.1.3. Engineering Process Improvement
F.2 Organizational Changes
F.3 Budget Savings
F.4 Project Change Requests
G Increasing Diversity
H Sustainability Planning
H.1 XRAS Pilot to Product
H.1.1 Roadmap Development for XRAS
H.1.2 XRAS Pricing
H.2 Expanding the Portfolio of Offerings
I Indicators of Impact: Developing Impact Metrics
I.1 Publication and Citation Metrics
I.2 Longitudinal Tracking Proof of Concept Study (July 2011 – July 2014)
I.2.1 Findings
I.2.1.1. All XSEDE Portal Users
I.2.1.2. Training
I.2.1.3. Campus Champions
I.2.1.4. Student Programs
I.3 Conclusions
J XD Service Provider Forum Report
J.1 Summary
J.2 Context
J.3 Major Activities
J.4 Membership
J.5 Leadership
K TAS Summary


List of Tables

Table 1-1: Summary of key performance indicators (KPIs) for XSEDE (as of PY3).
Table 3-1: KPIs for the sub-goal of deepening use to existing communities.
Table 3-2: KPIs for the sub-goal of extending use to new communities.
Table 3-3: KPIs for the sub-goal of preparing the current and next generation.
Table 3-4: KPIs for the sub-goal of raising the awareness of XSEDE.
Table 3-5: KPIs for the sub-goal of creating an open and evolving e-infrastructure.
Table 3-6: KPIs for the sub-goal of enhancing the array of technical expertise and support services.
Table 3-7: KPIs for the sub-goal of providing reliable, efficient, and secure infrastructure.
Table 3-8: KPIs for the sub-goal of providing excellent user support.
Table 3-9: KPIs for the sub-goal of providing an effective and productive virtual organization.
Table 4-1: KPIs for ECSS.
Table 4-2: KPIs for ECSS-ESRT.
Table 4-3: KPIs for ECSS-NIP.
Table 4-4: KPIs for ECSS-ESCC.
Table 4-5: KPIs for ECSS-ESSGW.
Table 4-6: KPIs for ECSS-ESTEO.
Table 4-7: KPIs for User Services.
Table 4-8: KPIs for User Services-Training.
Table 4-9: KPIs for User Services-User Information & Interfaces.
Table 4-10: KPIs for User Services-User Engagement.
Table 4-11: KPIs for User Services-Allocations.
Table 4-12: KPIs for E&O.
Table 4-13: KPIs for E&O-Education.
Table 4-14: KPIs for E&O-Campus Bridging.
Table 4-15: KPIs for E&O-Champions Program.
Table 4-16: KPIs for E&O-Under-represented Community Engagement.
Table 4-17: KPIs for E&O-Student Engagement.
Table 4-18: KPIs for E&O-Evaluation.
Table 4-19: KPIs for E&O-Coordination and Administration.
Table 5-1: KPIs for Architecture & Design.
Table 5-2: KPIs for Software Development & Integration.
Table 5-3: KPIs for TIS.
Table 5-4: KPIs for TIS-Technology Identification.
Table 5-5: KPIs for TIS-Technology Evaluation.
Table 5-6: KPIs for XSEDE Operations.
Table 5-7: KPIs for XSEDE Operations-Cybersecurity.
Table 5-8: KPIs for XSEDE Operations-Data Services.
Table 5-9: KPIs for XSEDE Operations-XSEDEnet.
Table 5-10: KPIs for XSEDE Operations-Software Testing & Deployment.
Table 5-11: KPIs for XSEDE Operations-Accounting & Account Management.
Table 5-12: KPIs for XSEDE Operations-Systems Operational Support.
Table 6-1: KPIs for Project Office.
Table 6-2: KPIs for Project management & Reporting.
Table 6-3: KPIs for Systems & Software Engineering.
Table 6-4: KPIs for External Relations.
Table 6-5: KPIs for Industry Relations.


Preamble

This report is the result of a still-active process of developing crisp statements of XSEDE’s vision, mission, goals, and associated key performance indicators (KPIs), along with reporting on the progress in delivering our mission and realizing our goals. This is most certainly a work in progress. For a large, complex, highly distributed project such as XSEDE, this has been a considerable undertaking. This process has helped XSEDE improve as an organization and a provider of services to the compute- and data-enabled science and engineering research and education community. This process has also resulted in a significant change in how XSEDE reports on its activities and progress using a metrics-based approach.

The Executive Summary (§1) is intended to effectively and concisely communicate the status of the project toward delivery of the mission and realization of the vision by reaching the three strategic goals. Stoplight indicators (§1.2) are used to visually provide a quick understanding of our assessment of overall project progress with respect to the strategic goals in light of our KPIs.

The Science and Engineering Highlights (§2) provide a small selection of a continuing series of scientific and engineering research and education successes XSEDE has enabled. These successes are an ongoing testament to the importance of our services to the research community.

The Discussion of Goals and Key Performance Indicators (§3) provides the next level of detail in understanding project progress. It decomposes the strategic goals into sub-goals and discusses progress toward each of the sub-goals using KPIs that represent measures of impact to the scientific community where possible. As noted in the report, in some cases metrics for impact, outcome, or both are not currently being collected, and in these cases the best available metric is used as the KPI. These first three sections take a project-wide view.
A more detailed analysis of progress from the view of the areas responsible for supporting each of the sub-goals (and contributing toward the KPIs associated with those sub-goals) is provided by looking at each of the areas of the project in the remaining sections (§4, §5, and §6). These sections also contain area highlights and additional KPIs that the area’s staff has deemed important.

It is anticipated that this report is read in electronic form (PDF) using Adobe Reader®. There is extensive cross linking to facilitate referencing content across the document. In general, all text that has blue underlining (e.g., §1.1) is clickable. Clicking on the underlined text will take you to the referenced section. These are set up to facilitate moving back and forth between the high level discussions in §1 and §3, to more detailed discussions regarding specific project areas in §4, §5, and §6.

As noted, this is a work in progress and we welcome comments on how to improve any and all aspects of our reporting process.


1. Executive Summary

Computing in science and engineering is now ubiquitous: digital technologies underpin, accelerate, and enable new, even transformational, research in all domains. Researchers continue to integrate an increasingly diverse set of distributed resources and instruments directly into their research and educational pursuits. Access to an array of integrated and well-supported high-end digital services is critical for the advancement of knowledge.

XSEDE is a socio-technical platform that integrates and coordinates advanced digital services within the national ecosystem to support contemporary science. This ecosystem involves a highly-distributed yet integrated and coordinated assemblage of software, supercomputers, visualization systems, storage systems, networks, portals and gateways, collections of data, instruments and personnel with specific expertise. XSEDE fulfills the need for an advanced digital services ecosystem distributed beyond the scope of a single institution and provides a long-term platform to empower modern science and engineering research and education.

As a significant contributor to this ecosystem, driven by the needs of the open research community, XSEDE (the Extreme Science and Engineering Discovery Environment) substantially enhances the productivity of a growing community of scholars, researchers, and engineers. XSEDE federates with other high-end facilities and with campus-based resources, serving as the foundation for a national e-science infrastructure with tremendous potential for enabling new advancements in research and education.

Our vision is a world of digitally-enabled scholars, researchers, and engineers participating in multidisciplinary collaborations while seamlessly accessing computing resources and sharing data to tackle society’s grand challenges. Researchers use advanced digital resources and services every day to expand their understanding of our world.
More pointedly, research now requires more than just supercomputers, and XSEDE represents a step toward a more comprehensive and cohesive set of advanced digital services through our mission: to substantially enhance the productivity of a growing community of scholars, researchers, and engineers through access to advanced digital services that support open research; and to coordinate and add significant value to the leading cyberinfrastructure resources funded by the NSF and other agencies.

XSEDE has developed its strategic goals in a manner consistent with NSF’s strategies stated broadly in the Cyberinfrastructure Framework for 21st Century Science and Engineering [1] vision document, and the more specifically relevant Advanced Computing Infrastructure: Vision and Strategic Plan [2] document.

1.1. Strategic Goals

To support our mission and to guide the project’s activities toward the realization of our vision, three strategic goals are defined:

Deepen and Extend Use: XSEDE will deepen the use (make more effective use) of the advanced digital services ecosystem by existing scholars, researchers, and engineers, and extend the use to new communities. We will contribute to preparation (workforce development) of the current and next generation of scholars, researchers, and engineers in the use of advanced digital services via training, education, and outreach; and we will raise the general awareness of the value of advanced digital services.

Advance the Ecosystem: Exploiting its internal efforts and drawing on those of others, XSEDE will advance the broader ecosystem of advanced digital services by creating an open and evolving e-infrastructure, and by enhancing the array of technical expertise and support services offered.

[1] http://www.nsf.gov/cise/aci/reorgLandingPage.jsp?p=/od/oci/cif21/CIF21Vision2012current.pdf
[2] http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051


Sustain the Ecosystem: XSEDE will sustain the advanced digital services ecosystem by assuring and maintaining a reliable, efficient, and secure infrastructure, and providing excellent user support services. XSEDE will further operate an effective, productive, and innovative virtual organization.

The strategic goals of XSEDE cover a considerable scope. To assure we are delivering our mission and to assess progress toward our vision, we have identified key metrics to measure our progress toward meeting each sub-goal. These key performance indicators (KPIs) are a high-level encapsulation of our project metrics that measure how well we are meeting each sub-goal. Our planning is driven by our vision, mission, goals and these metrics, which are in turn rooted in the needs and requirements of the communities we serve.

1.2. Progress toward Strategic Goals

Table 1-1 below shows the project’s progress toward the three strategic goals and associated sub-goals. Status icons are used in the table as follows:

• A green status is defined as a strategic goal for which at least 90% of the targets for all KPIs are met.
• A yellow status is defined as a strategic goal within which at least 60% of the targets for all KPIs are met.
• A red status is a strategic goal with less than 60% of the KPI targets met.
• A grey status indicates there are currently no metrics being tracked for this sub-goal or there is not complete data for any of the metrics being tracked.

Multiple indicators represent a strategic goal that has sub-goals for which there is incomplete data or that have metrics not currently being tracked. In these cases, the second indicator is a qualitative assessment of the status provided in lieu of sufficient data or a formal metric being in place.

Given that the metrics-based approach to reporting is new to XSEDE in PY3, we have provided the assessments for PY3.
As can be seen, there is certainly room for improvement in some areas, though this must be considered in light of aggressive targets being set in many areas. While these aggressive targets are more challenging to meet, and result in not always having green indicators, we believe they help us strive toward excellence. It is not surprising that we are struggling to meet the metrics in the more challenging areas of extending use to new communities, preparing the current and next generation workforce, and in particular in advancing the advanced digital services ecosystem. While great strides have been made in these areas, we are working to improve.

1.3. Summary & Project Highlights

This report is the most comprehensive summary of XSEDE’s activities, progress, and impact to date, covering the first three years of the project. Having recently introduced metrics-based reporting, it also represents a significant effort in attempting to retroactively measure and apply metrics and key performance indicators to measure our progress. As reflected in Table 1-1, XSEDE is doing well with respect to nearly all metrics that we have been measuring; details of this are in the sections below. However, we are still working toward both developing and measuring metrics and KPIs in some areas (e.g., innovative virtual organization). While we are beginning to use these metrics and KPIs to drive us toward excellence, we are also still gaining experience in this and continuing to refine the metrics and KPIs we have developed. To do this we continue to solicit external input, particularly via the XSEDE Advisory Board, the User Advisory Committee, the Service Providers Forum, the NSF, and the community we support.


Table 1-1: Summary of key performance indicators (KPIs) for XSEDE (as of PY3).

Strategic Goal: Deepen and Extend Use (§3.1)

  Sub-goal: Deepening use (existing communities) (§3.1.1)
    • Number of completed ECSS projects
    • Average ECSS impact score

  Sub-goal: Extending use (new communities) (§3.1.2)
    • Number of projects from new communities
    • Number of users on new community projects

  Sub-goal: Prepare the current and next generation (§3.1.3)
    • Number of individuals reached
    • Average impact assessment by individuals reached

  Sub-goal: Raise awareness of the value of advanced digital services (§3.1.4)
    • XSEDE online exposure
    • XSEDE media presence

Strategic Goal: Advance the Ecosystem (§3.2)

  Sub-goal: Create an open and evolving e-infrastructure (§3.2.1)
    • # of software components deployed (beta + production)
    • # of users who use one or more Campus Bridging tools
    • Overall Satisfaction with XSEDE Services

  Sub-goal: Enhance the array of technical expertise and support services (§3.2.2)
    • Number of staff participating in professional development activities
    • Average rating for training from Staff Climate Survey

Strategic Goal: Sustain the Ecosystem (§3.3)

  Sub-goal: Provide reliable, efficient, and secure infrastructure (§3.3.1)
    • Average composite availability of core services

  Sub-goal: Provide excellent user support (§3.3.2)
    • Mean time to resolution for XOC service requests
    • Average satisfaction ratings for support services

  Sub-goal: Effective and productive virtual organization (§3.3.3)
    • Number of improvement actions taken

  Sub-goal: Innovative virtual organization (§3.3.4)
    • Qualitative evidence

NOTE: Multiple indicators represent a strategic goal that has sub-goals for which there is incomplete data or that are not currently being measured. Secondary indicators provide a qualitative assessment of the status in lieu of sufficient data or a formal metric being in place.

Indicators: green, ≥ 90% of KPI targets are being met; yellow, ≥ 60% of KPI targets are being met; red, < 60% of KPI targets are being met; grey, no KPIs are currently being tracked or there is not complete data for any KPIs.
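The stoplight thresholds defined in §1.2 and in the note to Table 1-1 can be sketched as a small function. This is only an illustration, not XSEDE's reporting tooling: the function name `stoplight_status` and the representation of KPI results as a list of booleans (target met or not) are assumptions made for the example, and the secondary qualitative indicator is not modeled.

```python
from typing import Optional, Sequence


def stoplight_status(targets_met: Optional[Sequence[bool]]) -> str:
    """Map KPI results to a status color using the report's thresholds.

    `targets_met` is a hypothetical input: one boolean per KPI, True when
    that KPI's target was met. None or an empty sequence stands for
    "no KPIs tracked or no complete data".
    """
    if not targets_met:
        return "grey"  # nothing tracked, or no complete data

    met = sum(1 for ok in targets_met if ok)
    total = len(targets_met)

    # Integer comparisons avoid floating-point edge cases at exactly 90%/60%.
    if 10 * met >= 9 * total:
        return "green"   # at least 90% of KPI targets met
    if 10 * met >= 6 * total:
        return "yellow"  # at least 60% of KPI targets met
    return "red"         # less than 60% of KPI targets met


if __name__ == "__main__":
    # Example: three of four KPI targets met is 75%, which falls in yellow.
    print(stoplight_status([True, True, True, False]))
```

Because the thresholds are inclusive ("at least"), a goal meeting exactly 90% of its KPI targets is green and one meeting exactly 60% is yellow.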


Given the recent development of these KPIs, in the majority of cases initial targets were not set until late in Program Year 3 as part of preparing the Program Year 4 Program Plan. As a result, the targets we have given in the discussions of KPIs below indicate our PY4 targets to provide a consistent set of targets.

In the midst of all of this, and as summarized in §2 of this report, XSEDE has supported and enabled an ongoing series of scientific and engineering research and education successes. They are an ongoing testament to the importance of XSEDE to the research community, supporting greater productivity and making many advances practical on an accelerated time scale.


2. Science and Engineering Highlights

By supporting greater productivity and making many advances practical on an accelerated time scale, XSEDE has enabled a continuing series of scientific and engineering research and education successes. These successes are an ongoing testament to the importance of our services to the research community. Here we provide a small selection of the many examples of these successes during the first three years of XSEDE.

2.1. Environmental Biology/Behavioral Genomics: Using Next-Generation Sequencing to Identify Genetic Differences in the Social Lives of Mammals (Eileen Lacey and Matthew MacManes, University of California, Berkeley)

Figure 2-1: Phylogeny of the 16S rRNA sequences used in this study. Gray branches correspond to the bacterial sequences recovered from the monogamous P. californicus, while the black branches relate to the promiscuous P. maniculatus. Phylotypes recovered in both host species were relegated to the host in which they were more common.

In the foothills of the Santa Cruz Mountains, two closely related species of mice share a habitat and a genetic lineage, but have very different social lives. The California mouse (Peromyscus californicus) is characterized by a lifetime of monogamy; the deer mouse (Peromyscus maniculatus) is sexually promiscuous. Researchers from the University of California, Berkeley examined the differences between these two species of mice on a microscopic and molecular level. They discovered that the lifestyles of the two mice had a direct impact on the bacterial communities that reside within the female reproductive tract of the species. These differences correlate with enhanced diversifying selection on genes related to immunity against bacterial diseases. The results were published in the May 2012 edition of PLoS One. Post-doctoral researcher Matthew MacManes performed a genetic analysis on the variety of DNA present in each species, revealing hundreds of different types of bacteria.
He found that the promiscuous deer mouse had twice the bacterial diversity of the monogamous California mouse. Since many bacteria cause sexually transmitted infections (like chlamydia or gonorrhea), he used the diversity of bacteria as a proxy for risk of disease. The researchers next studied how the bacterial diversity in the promiscuous mice might translate into changes to the genes involved in immune function. MacManes hypothesized that selective pressures caused by generations of bacterial warfare had fortified the genomes of the promiscuous deer mouse against the array of bacteria it hosts. Based on a comparison of the two species' genotypes, he confirmed that the promiscuous mice had much more diversity in the genes related to their immune system. The results match findings in humans and other species with differential mating habits. They show that differences in social behavior can lead to changes in selection pressures and to gene-level evolutionary changes in a species.

To analyze datasets too big for their university laboratory clusters, the researchers used NSF supercomputers allocated through the Extreme Science and Engineering Discovery Environment (XSEDE), including Ranger at TACC. The alignment and analysis that would have taken years with his local resources took MacManes a few weeks on Ranger, allowing him to rapidly find insights about the relationship between genes and behavior.

For related genomics work, MacManes used Blacklight at the Pittsburgh Supercomputing Center (PSC), and received consulting support from Phil Blood at PSC. In that study, MacManes looked at differences in gene expression related to the transition from social to solitary living in the colonial tuco-tuco, a species of burrowing rodent. The species presents a unique case for studying social behavior: some tucos live in groups while others leave their burrow system to live in solitary conditions. MacManes used messenger RNA extracted from the hippocampus (a brain region implicated by prior research in social behavior) of captive tucos housed as two control groups, one in social and one in solitary conditions. The extracted tissue was sequenced and yielded 56 billion base pairs of raw data. XSEDE consultant Phil Blood of PSC installed pre-compiled modules of about 20 programs typically used in genomics assembly and analysis, which, says MacManes, were extremely helpful in identifying a number of genes that are differentially expressed according to tuco social behavior. Using 640 gigabytes of RAM (80 cores) on Blacklight, the assembly required 14 days of computing, with subsequent analysis extending for months. The researchers have a manuscript reporting these findings in preparation.

Figure 2-2: The colonial tuco-tuco (Ctenomys sociabilis) in its native Patagonian environment (volcanic ash). The tuco-tuco is a species of subterranean rodent, related to the common guinea pig, of interest in behavioral genomics because it exhibits within one species extremes between social and solitary living conditions.

2.2. Biomass Research: Cellulase Enzyme Structure-Function Relationships (Gregg T. Beckham, National Renewable Energy Laboratory)

Plant cell wall polysaccharides such as cellulose and hemicellulose offer a vast renewable resource for the sustainable production of transportation biofuels and commodity chemicals through industrial processing, but plants have evolved these structural polymers as natural defense mechanisms to resist infection and degradation. As plants represent a vast source of energy for organisms from all kingdoms of life on Earth, many biological paradigms have evolved to deconstruct these polymers to sugars. These natural biological degradation strategies offer an obvious starting point for harnessing their capability in an industrial context to produce biofuels. To that end, Beckham and colleagues use a variety of computational methods at varying resolution, from quantum mechanical calculations to large, atomistic molecular dynamics simulations, up to coarse-grained resolution models, to understand the mechanisms that Nature employs to break down cellulose and hemicellulose into their component sugars. Understanding the various steps required for biomass degradation at different length and time scales requires a broad range of
computer architectures, which makes XSEDE resources ideal for this type of work. In particular, Beckham and colleagues conduct quantum mechanical calculations to examine the electronic structure of plant cell wall carbohydrates and to understand the enzymatic reactions that break polysaccharides down into their component sugars. These types of calculations require significant memory, for which PSC and SDSC resources are ideal. Classical molecular dynamics simulations of enzymes and large crystalline substrates, as would be found when an enzyme is degrading cellulose in the plant cell wall, require computer architectures with fast interconnects for parallelization and fast compute nodes. These types of simulations are well suited to resources at NICS and TACC. Overall, the variety of XSEDE resources available is enabling the group’s multiple types of projects to occur simultaneously and efficiently, with the overall aim of understanding and engineering enzymes for more efficient conversion of lignocellulosic biomass to renewable biofuels.

To date, the NREL group has predicted new roles for glycosylation on cellulase enzymes, elucidated new roles for the individual sub-domains in cellulases, and revealed new understanding about inhibition of enzymes by their disaccharide product, cellobiose. More recently, they have used quantum mechanical calculations and hybrid quantum mechanical/molecular mechanics methods to elucidate various aspects of cellulose hydrolysis.

Figure 2-3: The Family 7 cellobiohydrolase from T. reesei.

2.3. Biochemistry: Evaluating and improving the performance of RNA force fields in molecular dynamics simulations (Thomas Cheatham, University of Utah College of Pharmacy)

Thomas Cheatham’s group focuses on adjusting the parameters of RNA force fields to better fit either experimentally determined data (NMR, X-ray crystallography, and others) or highly accurate quantum mechanical data. Making such comparisons, the group has identified potentially anomalous simulated conformations of the RNA molecule’s sugar ring.
Adjusting the force field parameters provided a better match with the quantum mechanical results and thus improved simulation accuracy, enhancing understanding of this biologically pivotal molecule and offering the potential for future therapeutic targets. These quantum calculations are extremely memory intensive, and the Blacklight machine at PSC was critical for performing them due to its large shared memory capability.

The group’s current work, testing these results in realistic molecular dynamics simulations of RNA, requires simulating on timescales long enough for the RNA to convert between various conformations. Simulations of several microseconds of real time are necessary. To mitigate the months of traditional CPU hardware time that would otherwise be needed, the investigators have performed GPU-accelerated molecular dynamics on the NICS Keeneland machine. An alternative to long-timescale simulations, enhanced sampling via replica exchange molecular dynamics, employs an ensemble of simulations (typically 10-50) run simultaneously at different temperatures and exchanging at defined intervals. They found that a special variant of replica exchange, called "reservoir replica exchange", works best for studying a small oligonucleotide. In this variant, a high-temperature reservoir of structures is pre-generated using conventional MD on a GPU. This reservoir is then coupled to the replica exchange procedure and drives convergence much faster than "regular" replica exchange. They are now using the method to test new force fields and solvation conditions. The computational demands of this work would not have been achievable without allocations on Kraken at NICS, Ranger at TACC, and Gordon at SDSC.

Figure 2-4: The Cheatham group’s work revealed that some simulations inaccurately predicted which of two conformations the sugar moiety in RNA would take: C3'-endo (top) or C2'-endo.

2.4. Aeronautics and astronautics – Turbulence Modeling: Droplet-laden Isotropic Turbulence (Antonino Ferrante, William E. Boeing Department of Aeronautics and Astronautics, University of Washington, Seattle)

While most people think of turbulence as the source of unsettling bouts of chaotic airflow during a flight on an airplane, physicists have a much deeper perspective. In fact, Nobel Laureate and theoretical physicist Richard Feynman once said, “Turbulence is the most important unsolved problem of classical physics.” Probing the puzzling nature of turbulence is essential because of its impact on the flow physics of liquids and gases, and, by extension, the influence of those fundamental states of matter flowing outside and inside things, whether the setting is the atmosphere (pollution and volcanic ash) or a jet engine, a combustor, or a nuclear reactor, for example.

Antonino Ferrante of the University of Washington, Seattle, is leading a research team employing turbulence simulations and modeling to investigate fluid dynamics (both gases and liquids in motion). Resolving the wide range of scales of motion requires fine computational grids with billions of points. “Without high-performance computing (HPC), turbulence simulation would just not be feasible, and our understanding of turbulent flows would not progress much,” he says. Moreover, as HPC becomes more robust, he adds, it will allow researchers to address the challenges of reducing fuel consumption and CO2 emissions.
XSEDE’s Extended Collaborative Support Services (ECSS) provided an avenue for Ferrante and his team to access the HPC resources and expertise offered by the University of Tennessee’s National Institute for Computational Sciences (NICS) and by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana–Champaign. Ferrante worked with Vince Betro, a NICS computational scientist and a member of ECSS, to make it possible to simulate turbulent flows as well as turbulence coupled with other phenomena (including chemically reactive and multiphase turbulent flows) while getting results in a reasonable time. The team worked closely with David Bock, an NCSA Visualization Programmer, and Darren Adams, an NCSA Research Programmer, both members of ECSS. They provided petascale scaling and deployment, development of a high-level HDF5 parallel I/O layer (H5DNS), and custom visualization of simulation results. Variables such as velocity and pressure had to be advanced in time at billions of grid points; minimum resolution was 1024³ data points. Kraken enabled the researchers to address the challenge. Ferrante says: “For the first time, we are simulating the effects of droplets on turbulence. In practice, we are running a real-life experiment on a supercomputer rather than a wind-tunnel.” The result was a turbulence model that can be coupled with flow-solver computer applications to better understand fluid flow and design all manner of objects affected by fluids (e.g., air and water).

Figure 2-5: Fully resolved droplet-laden decaying isotropic turbulence. The plotted cube is a 512th of the entire cubic box simulated using a mesh of 1024³ grid points and 7,000 droplets.
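To give a rough sense of the scale involved, the short sketch below works through the arithmetic for a cubic grid with 1,024 points per side (1024³ points in total). It assumes 8-byte double-precision values and four stored fields (three velocity components plus pressure); neither figure is stated in this report, and the function names are illustrative only.

```python
"""Back-of-the-envelope sizing for a cubic simulation grid.

Assumptions (not taken from the report): values are stored as 8-byte
doubles, and four scalar fields are kept (three velocity components
plus pressure).
"""

BYTES_PER_DOUBLE = 8  # assumed precision
ASSUMED_FIELDS = 4    # assumed: u, v, w, and pressure


def grid_points(points_per_side: int) -> int:
    """Total number of points in a cubic grid."""
    return points_per_side ** 3


def field_size_gib(points_per_side: int,
                   bytes_per_value: int = BYTES_PER_DOUBLE) -> float:
    """Memory needed for one scalar field over the whole grid, in GiB."""
    return grid_points(points_per_side) * bytes_per_value / 2 ** 30


def subcube_side(points_per_side: int, volume_fraction_denominator: int) -> int:
    """Side length, in grid points, of a cube holding 1/denominator of the box."""
    splits = round(volume_fraction_denominator ** (1 / 3))
    if splits ** 3 != volume_fraction_denominator:
        raise ValueError("denominator must be a perfect cube")
    if points_per_side % splits != 0:
        raise ValueError("grid side must divide evenly into sub-cubes")
    return points_per_side // splits


if __name__ == "__main__":
    n = 1024
    print("grid points:", grid_points(n))
    print("GiB per field:", field_size_gib(n))
    print("GiB for assumed fields:", ASSUMED_FIELDS * field_size_gib(n))
    print("side of a 1/512 sub-cube:", subcube_side(n, 512))
```

Under these assumptions the grid holds 1,073,741,824 points, each field occupies 8 GiB, and the plotted cube (a 512th of the box) is 128 grid points on a side. Time-stepping schemes usually keep several copies of each field, so the true memory footprint would be larger than this estimate.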


2.5. Economics: Fast Construction of Nanosecond Level Snapshots of Financial Markets (Mao Ye, University of Illinois College of Business)

Most economists agree that the millisecond stock trades enabled by automated trading have improved the efficiency and fairness of the markets. Subsequent advances in technology, which reduced the time necessary for a trade to the nanosecond scale, have however posed potential problems not balanced by further improvements in efficiency. In particular, the use of massive, fast trades to manipulate the market, as well as the profound effects of inadvertent software bugs such as the “Flash Crash” of May 6, 2010, suggest that ever-faster trades may actually be destabilizing the markets.

With help from the Novel & Innovative Projects (NIP) program, Mao Ye, Jiading Gai, Chen Yao, and Maureen O'Hara of Cornell University have employed multiple XSEDE resources to study these phenomena. In earlier work, they used PSC Blacklight to show that 20 percent of NASDAQ trades (and more than half of those in some major stocks such as Google) are done automatically in “odd lots” that have not previously been reportable to the moment-by-moment TAQ “ticker tape.” These findings contributed to a change in NASDAQ and New York Stock Exchange rules, such that all trades are reportable and visible moment-by-moment. Following up on that retrospective study of the markets, and with the help of SDSC ECSS staff, the collaborators used SDSC Gordon, TACC Stampede, and PSC Blacklight to optimize the code for analyzing market data, speeding the analyses by as much as 126-fold.

Figure 2-6: The May 6, 2010, “Flash Crash” revealed how profoundly even inadvertent flaws in trading software can destabilize markets. DJIA: Dow Jones Industrial Average; S&P 500: Standard & Poor’s 500.
Future work, particularly with SDSC Gordon and PSC Blacklight (the former for real-time analysis of individual stocks, the latter for memory-intensive calculations involving large groups of stocks), holds the promise of speeding analyses of multiple stocks such that researchers and regulators can follow and understand automated trades over whole markets in near-real-time.

2.6. Chemistry: Clean Air Technologies: XSEDE resources help chemists develop new tools for cleaner air, energy production (Brian Space, University of South Florida)

Chemists at the University of South Florida and King Abdullah University of Science and Technology have discovered a more efficient, less expensive, and reusable material for carbon dioxide (CO2) capture and separation. The breakthrough could have implications for a new generation of clean-air technologies and offers new tools for confronting the world's challenges in controlling carbon. The international group of scientists has identified a previously underused material, known as SIFSIX-1-Cu, that offers a highly efficient mechanism for capturing CO2. The discovery addresses one of the biggest challenges of capturing CO2 before it enters the atmosphere, namely energy cost: the separation and purification of industrial commodities currently consumes 15 percent of global energy production. The demand for such commodities is projected to triple by 2050.


To confirm their findings, the researchers used supercomputing simulations enabled by the National Science Foundation's XSEDE cyberinfrastructure: Ranger (TACC), Blacklight (PSC) and Trestles (SDSC). They initially used Blacklight to simulate the interactions of small numbers of gas molecules with each other and with metal-organic materials (MOMs). Predicting the exact behavior of even small numbers of molecules requires a huge amount of computer memory, for which Blacklight is particularly well suited. The researchers then used the results to simulate, at a coarser level, the behavior of the gases and the MOMs in bulk on Ranger and Trestles.

The scientists believe SIFSIX-1-Cu has three significant applications: carbon-capture for coal-burning energy plants; purification of methane in natural gas wells; and the advancement of clean-coal technology. The next step is to collaborate with engineers to determine how the materials can be manufactured and implemented for real-world uses. These research results were published online in the journal Nature in February 2013 with the title: "Porous materials with optimal adsorption thermodynamics and kinetics for CO2 separations."

Figure 2-7: The metal-organic framework material at the center of a new discovery by chemists at the USF and KAUST is shown under a microscope. It is a promising breakthrough in developing better carbon-control technologies.

2.7. Multimedia Literacy: Analysis of video content (Virginia Kuhn, University of Southern California)

Virginia Kuhn, associate director of the Institute for Multimedia Literacy and an associate professor of cinema practice at the University of Southern California, believes that understanding the impact of film and video is “a matter of large-scale public literacy. Understanding how these screens impact us is crucial.
It’s just vital to an educated citizenry.” Kuhn’s work is computationally intensive, but she had shied away from using supercomputers because her large file sizes and need for interactivity were ill-suited to the traditional HPC model. XSEDE digital humanities specialist Alan Craig worked with SDSC to enable Kuhn to use the XSEDE-allocated Gordon system for interactive database querying rather than traditional batch processing. SDSC dedicated one head node and four compute nodes on Gordon to Kuhn’s project.

Kuhn is working with films in the Internet Archive, which contains more than a million digital movies, ranging from classic full-length films to news broadcasts, concerts, and cartoons. Thanks to the assistance she has received through XSEDE’s ECSS, Kuhn is able to use computationally intensive algorithms to do real-time, sophisticated queries to determine which frames in which film or video to return. The proof-of-concept search returned 300 videos from an archive containing over one million videos, a collection that no person could search in an entire lifetime. She is also experimenting with automatic metadata extraction from the multimedia resources, as well as with image and video analytics.

In addition, the ECSS team is assisting Kuhn by developing tools to make film and video images easily available to scholars to study. Luigi Marini and Liana Diesendruck at NCSA customized the Medici multimedia content management system for Kuhn’s project, making it easy to upload and tag films and video and adapting the software to identify key features of interest to Kuhn such as color, camera angle, and texture. They also made it possible for Kuhn to upload an image and ask Medici to find films and video with similar images. NCSA visualization expert Dave Bock assisted Kuhn with data visualization, taking 2D frames from select films and turning them on their sides to make 3D “slices” in a time and space configuration.
This temporal and spatial simultaneity lets the researchers see motion across time in a single image. “You can immediately ascertain camera shots, movement, and other cinematic elements. The only way to find them otherwise is to watch the entire movie, and nobody could ever watch all the movies. It’s impossible,” he said.

2.8. Seismology: Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers (Yifeng Cui, San Diego Supercomputer Center, UC San Diego)

A team of researchers at the San Diego Supercomputer Center (SDSC) and the Department of Electronic and Computer Engineering at UC San Diego has developed a highly scalable computer code that promises to dramatically cut both research times and energy costs in simulating seismic hazards. The team, led by SDSC computational scientist Yifeng Cui, performed GPU-based benchmark simulations of the 5.4 magnitude earthquake that occurred in July 2008 below Chino Hills, near Los Angeles. Compute systems used in the project included Keeneland, an XSEDE resource managed by Georgia Tech, Oak Ridge National Laboratory (ORNL), and the National Institute for Computational Sciences (NICS).

The simulation, performed on Keeneland utilizing 128 NVIDIA M2090 GPUs, indicates that small-scale heterogeneities (causing the highly irregular pattern of shaking in the image) may significantly affect ground motion in geologic basins. The work on Keeneland was instrumental to the team’s ultimately achieving a 5-fold speedup over the heavily optimized CPU code on Cray’s XK7, and a sustained performance of two petaflops on the tested system. A previous benchmark of the AWP-ODC code reached only 200 teraflops (trillions of calculations per second) of sustained performance. With a significantly higher level of computational power, researchers can provide more accurate earthquake predictions with increased physical realism and resolution, with the potential of saving lives and minimizing property damage.
“The increased capability of GPUs, combined with the high-level GPU programming language CUDA, has provided tremendous horsepower required for acceleration of numerically intensive 3D simulation of earthquake ground motions,” said Cui, who presented the team’s work at the NVIDIA 2013 GPU Technology Conference (GTC) in San Jose, Calif.

Figure 2-8: The image shows a snapshot of ground motion of the 2008 magnitude-5.4 Chino Hills earthquake in an east-to-west direction; the red-yellow and green-blue colors depict the amplitude of shaking. The simulation indicates that small-scale heterogeneities (causing the highly irregular pattern of shaking in the image) may significantly affect ground motion in geologic basins. Simulation by Efecan Poyraz/UC San Diego and Kim Olsen/San Diego State University. Visualization by Efecan Poyraz; map image courtesy of Google.


The project is part of a larger computational effort coordinated by the Southern California Earthquake Center (SCEC). A technical paper based on this work was presented June 5-7 at the 2013 International Conference on Computational Science in Barcelona, Spain.

2.9. A HPC “Cyber Wind Facility” Incorporating Fully-Coupled CFD/CSD for Turbine-Platform-Wake Interactions with the Atmosphere and Ocean (PI: James G. Brasseur, Pennsylvania State University)

While wind turbines have proved a practical renewable energy source, the cost of energy depends both on power production and on longevity of reliable operation. Wind plant developers base their financial return estimates on the assumption that turbines operate for 20 years with minimal repair. Replacing even a bearing in a 150 m commercial wind turbine is costly; the failure of small numbers of components can eliminate the profit margin. James Brasseur and colleagues at Penn State have created the “Cyber Wind Facility” (CWF), a wind turbine “field facility” within the HPC environment, to predict with high space-time resolution and fidelity the interactions between an operating commercial wind turbine and the most energetic turbulence eddies that impact the rotor. The CWF is used to analyze the nature of damaging blade and shaft deformations caused by atmospheric turbulence, together with degradation in power generation and hardware longevity. With this understanding, damage mitigation and power enhancement strategies can be proposed. In addition, the CWF is designed to compare directly with lower-order wind park design tools by replacing blades with “actuator line” and “blade-element momentum” representations to improve these industry-level design tools.

The Penn State team has found PSC Blacklight’s large shared memory capability to be essential in carrying out the massively parallel decomposition and mappings of the large complex grids necessary for the CWF. Further, Blacklight has been used for many of the validation and early simulation studies used in CWF development. Initial simulations indicate that the changing direction of the wind as daytime turbulent eddies pass through the wind turbine rotor disk forces large changes in blade angle-of-attack and, consequently, large changes in boundary layer separation topography that may underlie large load transients. This discovery has led to recent fully blade-resolved simulations on TACC Stampede of the interactions between direct forcing by atmospheric turbulence and space-time-dependent blade aerodynamics.

Figure 2-9: The Cyber Wind Facility models turbulence around wind turbines at multiple scales. CAD: computer-aided design; ABL: atmospheric boundary layer; NREL: U.S. Department of Energy's National Renewable Energy Laboratory; URANS/LES: unsteady Reynolds-averaged Navier-Stokes/large eddy simulation; openFOAM: the standard fluid dynamics software toolbox.


2.10. Materials Research: Atomistic Characterization of Stable and Metastable Alumina Surfaces (Douglas Spearot, University of Arkansas)

“Structure influences every property of a material”, says engineering professor Douglas Spearot. And it is the structural details at the atomic level that can have the greatest influence but about which we often know the least. That is why Spearot, who teaches materials science and mechanics in the Department of Mechanical Engineering at the University of Arkansas and also conducts research in UA’s Institute for Nanoscale Materials Science and Engineering, and PhD student Shawn Coleman explored new techniques to learn more about the atomic structure of materials. The result is a unique algorithm, integrated with the LAMMPS simulation package, that uses simulation data to produce a visualization of both x-ray diffraction line profiles and selected-area electron diffraction patterns that is directly comparable to what experimentalists would observe in a lab. Their work was first published in the journal Modeling and Simulation in Materials Science and Engineering in 2013, and more recently in the March 2014 issue of JOM, the journal of The Minerals, Metals & Materials Society.

Spearot and Coleman are quick to point out that this success would not have been possible without XSEDE’s compute resources and Extended Collaborative Support Services (ECSS). An NCSA expert in workflows worked with Coleman and an NCSA visualization expert to automate simulation and visualization techniques, allowing simulations to be launched from a desktop computer, with the data and visualizations returned without further interaction. Once launched, a simulation runs on Stampede at TACC to do the atomistic simulations; when complete, it automatically launches on Gordon at SDSC to do the virtual diffraction calculations as well as the visualizations. Another ECSS expert at PSC has been working to improve the parallelization of the algorithm and thus increase the scalability, and a Campus Champion Fellow also assisted in improving the scalability of the code.

Figure 2-10: High-resolution visualization of two electron diffraction spots with the relrod structure produced by the α-alumina surface.

2.11. Mechanics and Materials: An Inverse Stiffness Mapping Approach to the Early Detection of Breast Cancer (Lorraine Olson, Rose-Hulman Institute of Technology)

One in eight women in the United States will develop breast cancer over the course of her lifetime. Lorraine G. Olson, professor of mechanical engineering at the Rose-Hulman Institute of Technology in Terre Haute, Ind., was diagnosed in 2005 at the age of 45. Fortunately, her breast cancer was caught early from a routine mammogram. Few cancer survivors are in the position to change the way cancer is diagnosed, but Olson and her husband Robert Throne, also a cancer survivor, are doing just that for breast cancer. Together, they have created a mathematical model to improve early detection efforts.

Cancerous tissues are as much as 10 times stiffer than healthy tissues, but mammograms use X-rays, which are sensitive only to tissue density and not to stiffness, for detection. Manual breast exams look at tissue stiffness, but, currently, there is not a good way to record the results of these exams. The ultimate implementation of Olson and Throne’s system would use a robotic device to mimic manual breast palpations, enabling doctors to record accurate data about the underlying tissue. Mathematical techniques would then interpret the palpation in terms of stiffness. The device would not replace mammography, but would act as an affordable, effective, and less-invasive complementary tool. To run these simulations quickly and to generate accurate results, Olson and Throne are applying the computational power of XSEDE through resources at the
Texas Advanced Computing Center. They also are working with XSEDE staff as part of the Extended Collaborative Support Service (ECSS) in an effort to improve the speed of their algorithms. For breast cancer research, speed of execution is very important, as a typical genetic algorithm must be iterated many times to produce a usable result for a complex problem. Carlos Rosales-Fernandez, an ECSS staff member based at TACC, helped Olson and Throne increase the speed of their code by replacing the solver, optimizing the displacement patterns in 2D, extrapolating those optimizations to their 3D simulations, and exploring alternative fitness functions for the genetic algorithm. With their code optimized for XSEDE’s advanced computing resources, the researchers can parallelize their stiffness mapping jobs and finish simulations in minutes. Results of the research were published in the July 2012 edition of Inverse Problems in Science and Engineering.

Figure 2-11: A typical result from the stiffness-mapping algorithm for a simulated breast tumor. The predicted tumor location is shown in red while green shows the true location. Computed using TACC's Ranger supercomputer.

2.12. Materials Research: “Morphology and flow properties of micelle-nanoparticle solutions from Molecular Dynamics simulations” (Subas Dhakal, Syracuse University)

Surfactants (short for surface-active agents) are in products people use every day, from soap to toothpaste to a variety of industrial substances. Naturally occurring in the human body, surfactants ensure the proper function of vital organs such as the lungs, and they are applied in many medical therapies. As ubiquitous as surfactants are, however, they nonetheless have properties that remain mysterious despite hundreds of years of investigations, said Subas Dhakal of Professor Radhakrishna Sureshkumar’s group in the Department of Biomedical and Chemical Engineering at Syracuse University.
XSEDE resources have helped the researchers probe the relationship between aggregate particles called micelles in surfactants and the property of viscosity, a measure of a fluid’s resistance to flow. What is learned could enable the fine-tuning of surfactants to make them more beneficial to society, Dhakal said. He explained that flow properties of micellar solutions change dramatically when surfactant chemistry, ionic environment, temperature, and chemical concentrations are altered, and that detailed large-scale molecular dynamics simulations can help to elucidate the interplay between micellar shapes and rheology (fluid flow). Using Kraken at the National Institute for Computational Sciences, Blacklight at the Pittsburgh Supercomputing Center, and Lonestar and Stampede at the Texas Advanced Computing Center, the group has simulated the trajectory of a micellar solution system over a few hundred nanoseconds, looked at different emerging shapes of micelles as a function of surfactant and salt concentration, deduced various scaling laws for different length scales, and discovered that micelle length distribution is not universal and strongly depends on the electrostatic interactions within the system. The research has also revealed that salt concentrations have an effect on the thickening and thinning rates in surfactant solutions. Dhakal said the simulation techniques developed so far have created a foundation for comprehensive understanding of structure–dynamics–rheology relationships in micellar solutions. He explained that by having access to multiple XSEDE resources, the research team was able to test their simulations on different available computing architectures and pick the one that was most suitable to the project’s needs. “Performance of some of the machines for a particular type of simulation was better as compared with others,” he said.
In particular, the highly efficient GPU nodes available on Stampede and Lonestar were very helpful in simulating the system over a prolonged time, which is essential to observe the morphological changes as well as to quantify various dynamical properties. Having different machines therefore allowed the researchers to take advantage of the unique specialty of each one. Dhakal also noted, “We were able to finish our simulations on time by avoiding long wait time due to overload in a particular machine, since we were able to run our simulations switching back and forth between different resources.” The researchers have given talks on their work at various professional society meetings in recent months and plan to submit papers for publication in scientific journals soon.


3. Discussion of Goals and Key Performance Indicators

The strategic goals of XSEDE (§1.1) cover a considerable scope. Additionally, the specific activities within our scope are often very detailed; therefore, to ensure that this significant and detailed scope will ultimately deliver our mission and realize our vision, we consider each component or sub-goal individually. In determining the best measures of progress toward each of the sub-goals, metrics that represent measures of impact to the scientific community are used where possible. These are often paired with measures of outcome to provide a sense of the scope of the supporting activities. In some cases metrics for impact, outcome, or both are not currently being collected. In these cases the best available metric is used and an implementation plan for more significant metrics is included.

3.1. Deepen and Extend Use

XSEDE will 1) deepen the use (make more effective use) of the advanced digital services ecosystem by existing scholars, researchers, and engineers and 2) extend the use to new communities. We will 3) contribute to the preparation (workforce development) of scholars, researchers, and engineers in the use of advanced digital technologies via training, education, and outreach; and we will 4) raise the general awareness of the value of advanced digital research services.

3.1.1. Deepening use to Existing Communities

Table 3-1 shows the key performance indicators for this sub-goal. Usage in XSEDE can be defined in many ways. A possible minimum threshold for usage is the existence of a portal account, while more substantial usage is the consumption of a service, such as training, support, or allocations.
While each of these potentially impacts a researcher’s ability to produce new science, the collaborative, yearlong work done through XSEDE’s Extended Collaborative Support Services (ECSS) program to help research teams use the ecosystem more effectively and broadly most directly impacts the delivery of new science. “Existing communities” are defined as communities that represent more than 1% of all XRAC allocations. Therefore, the project has chosen two metrics that together represent the KPI: the total number of projects completed by the ECSS team through work with research teams, community codes, and gateways, and the average PI rating of satisfaction with the ECSS experience for these projects (out of a maximum score of 5), obtained through interviews following project completion.

Table 3-1: KPIs for the sub-goal of deepening use to existing communities.

KPI                                             PY4 Target  PY1   PY2   PY3   Owner(s)
# of completed projects (ESRT + ESCC + ESSGW)   55          8†    51    59    ECSS (§4.1)
Average rating of satisfaction with the ECSS    4 of 5      *     4.13  4.56  ECSS (§4.1)
  experience from PI follow-up interviews

† Incomplete data
* Metric not tracked during this period

We did not track impact ratings by the PIs of supported projects in the first project year, and the data for the number of supported projects is incomplete. In the third project year we exceeded the previous target for the number of supported projects (45 in PY3) and for the impact these projects had on the related research as assessed by the PI. As noted in the table, the target for PY4 has been raised to 55 for the number of completed projects. In addition, late in PY3 we began asking PIs for an additional rating of the impact of ECSS support on their project in the follow-up interviews. This will become an additional KPI. Our target for that will also be 4 out of 5.

3.1.2. Extending Use to New Communities

Table 3-2 shows the key performance indicators for this sub-goal. Again, usage in XSEDE can be defined in many ways. A possible minimum threshold for usage is the existence of a portal account, while more substantial usage is the consumption of a service, such as training, support, or allocations. While each of these potentially impacts a researcher’s ability to produce new science, the collaborative work done with new communities to effectively use the ecosystem most directly impacts the delivery of science from new communities. New communities are defined as new fields of science, industry, and under-represented communities. New fields of science are those that represent less than 1% of XRAC allocations. Therefore, the project has chosen two metrics that together represent the KPI: the number of new research projects started within new communities and the number of users associated with these projects. The Industry Challenge program, the Novel & Innovative Projects team, and the Under-represented Community Engagement team all support collaborative projects with new communities.

Table 3-2: KPIs for the sub-goal of extending use to new communities.
KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
# of new projects with new communities = | 38 | * | * | 35 |
  # of Industry Challenge projects + |  |  |  |  | Industry Relations (§6.1.4)
  # of NIP startup projects + |  |  |  |  | ECSS – NIP (§4.2.2)
  # of NIP XRAC projects + |  |  |  |  | ECSS – NIP (§4.2.2)
  # projects from under-represented communities and minority serving institutions |  |  |  |  | E&O – Under-represented Communities (§4.5.4)
# of users on new communities projects = | 116 | * | * | 120 |
  # of Industry Challenge users + |  |  |  |  | Industry Relations (§6.1.4)
  # of NIP first-time users + |  |  |  |  | ECSS – NIP (§4.2.2)
  # of NIP first-time XRAC users + |  |  |  |  | ECSS – NIP (§4.2.2)
  # of new users with allocations from under-represented communities and MSIs |  |  |  |  | E&O – Under-represented Communities (§4.5.4)

* metric not tracked during this period

The Industry Challenge program debuted in the third project year, and data was not previously tracked for under-represented communities, minority-serving institutions, or the Novel and Innovative Projects (NIP) program. In the third project year, we were slightly under target for the number of projects with new communities, but had slightly more users on these projects than expected.

3.1.3. Prepare the Current and Next Generation

Table 3-3 shows the two metrics that together represent the KPI for the sub-goal of preparing the current and next generation of computationally savvy researchers. Part of XSEDE’s mission is to provide a wide community of such existing and future researchers with access to, and training to use, advanced digital services. Current researchers, both seasoned and novice, are potential XSEDE training targets, as are future researchers (students). Fundamentally, the metrics below measure the number of these individuals we have reached, along with a measure of impact as indicated by those same individuals. Again these are composite metrics, the elements of which are owned by various parts of the XSEDE organization.


Table 3-3: KPIs for the sub-goal of preparing the current and next generation.

KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
# of participants = | 6,000 | 1,723† | 3,220† | 5,568 |
  # of industry registrants for training + |  |  |  |  | Industry Relations (§6.1.4)
  # of registrants through User Services + |  |  |  |  | Training (§4.4.1)
  # of Campus Champion Fellows + |  |  |  |  | ESTEO (§4.3.3)
  # of students engaged + |  |  |  |  | Student Engagement (§4.5.5)
  # of Champions |  |  |  |  | Champions Program (§4.5.3)
Average of impact assessments across all activities | 4 of 5 | 4.21† | 4.32† | 4.13† | Industry Relations (§6.1.4); Training (§4.4.1); ESTEO (§4.3.3); Student Engagement (§4.5.5); Champions Program (§4.5.3)

† Incomplete data

The only assessment data captured for each of the first three years of the project is for training given through User Services. Assessments of the Campus Champions Fellows program were added in the second project year, and assessments for industry training and Campus Champions will begin in the fourth project year. However, impact assessments for training through User Services and the Fellows program have exceeded the target.

3.1.4. Raising Awareness

Table 3-4 shows the key performance indicators for this sub-goal. Measuring our ability to raise the general awareness of the value of advanced digital research services includes the use of two outcome metrics: a measure of the use of XSEDE’s online presence and a composite that measures how well we are communicating the value of XSEDE. Perhaps the best measure of our efforts to raise awareness might be a market-based assessment of the XSEDE brand, but this is not part of the XSEDE charter.

Table 3-4: KPIs for the sub-goal of raising the general awareness of the value of advanced digital research services.
KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
Increase XSEDE’s online presence (growth percentage of people engaged) = # website visits + # Twitter followers + # Facebook likes compared to total numbers from previous quarter | 4% | * | * | † | External Relations (§6.1.3)
XSEDE’s media presence (# of communications) = # of stories + # of press releases | 16 | * | * | † | External Relations (§6.1.3)

* metric not tracked during this period
† Incomplete data

Establishing these metrics, and in particular setting their targets, has been challenging. Given the lack of previous tracking, and the nature of them being a year-over-year percentage increase, we lack trends in the data to inform the targets. These targets will be refined as we instrument the system and collect data.
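
As an illustration only, the online presence KPI in Table 3-4 could be computed along the following lines. This is a minimal sketch, not the project's actual instrumentation: it assumes that "growth percentage" means the relative change of the combined count (website visits + Twitter followers + Facebook likes) against the previous period's combined count, and all function names and input numbers are hypothetical.

```python
def engagement_total(website_visits: int, twitter_followers: int, facebook_likes: int) -> int:
    """Combined count of people engaged, as listed in Table 3-4."""
    return website_visits + twitter_followers + facebook_likes


def growth_percentage(current_total: float, previous_total: float) -> float:
    """Relative change of the combined count versus the previous period, in percent.

    Assumption: growth is measured relative to the previous period's total.
    """
    if previous_total <= 0:
        raise ValueError("previous_total must be positive to compute growth")
    return 100.0 * (current_total - previous_total) / previous_total


# Hypothetical counts for two consecutive periods (not taken from the report).
previous = engagement_total(50_000, 2_000, 1_000)
current = engagement_total(52_000, 2_150, 1_100)

growth = growth_percentage(current, previous)
meets_target = growth >= 4.0  # PY4 target from Table 3-4 is 4%

print(f"growth: {growth:.2f}%  meets target: {meets_target}")
```

Summing the three channels before computing growth means that the largest channel (typically website visits) dominates the result; computing growth per channel and then averaging would be a different metric.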


3.2. Advance the Ecosystem

Exploiting its internal efforts and drawing on those of others, XSEDE will advance the broader ecosystem of advanced digital services by 1) creating an open and evolving e-infrastructure, and by 2) enhancing the array of technical expertise and support services offered.

3.2.1. Create an Open and Evolving e-Infrastructure

Three KPIs have been identified for the sub-goal of creating an open and evolving e-infrastructure: the number of new software components delivered to the community, deployment of tools on campuses via the Campus Bridging efforts, and the overall satisfaction with XSEDE services as measured by the annual user survey.

Table 3-5: KPIs for the sub-goal of creating an open and evolving e-infrastructure.

KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
Use Case Performance = average of | > 0.9 | * | 0.85† | 0.83† |
  - Use Case Specification Clearance Ratio (# of specifications completed / # of specifications started) |  |  |  |  | Systems & Software Engineering (§6.1.2)
  - Average Relative Effectiveness (estimated task duration / actual task duration) in A&D’s completion of Use Cases |  |  |  |  | Architecture & Design (§5.1)
  - Fraction of software components deployed within target deployment time |  |  |  |  | Software Testing & Deployment (§5.4.4)
# of users who use one or more CB tools | 1,000 | 91 | 170 | 235 | Campus Bridging (§4.5.2)
Overall satisfaction with XSEDE services | 4 of 5 | * | 4.28 | 4.36 | User Engagement (§4.4.3)

* metric not tracked during this period
† Incomplete data

As can be seen in Table 3-5, we have not met the target for the composite KPI of use case performance, at least based on the incomplete data. We are still challenged in having adequate means to measure the uptake of our Campus Bridging tools. Our current methodology indicates lower uptake than we would like, but we are working to improve our ability to track this metric. The overall satisfaction with our service offerings, as gauged by the annual user survey, continues to be high.

3.2.2. Enhance the Array of Technical Expertise and Support Services

Table 3-6 shows the key performance indicators for this sub-goal. We have defined two KPIs for the sub-goal of enhancing the array of technical expertise and support services offered by XSEDE: the total number of staff trained and their evaluation of the effectiveness of staff training provided by XSEDE. This training may come from the courses associated with an XSEDE certification program or from domain experts through workshops, symposia, and training events hosted by Extended Collaborative Support Services or Service Providers. As a composite, the number of staff trained reflects the sum of staff participation in these myriad training events.


Table 3-6: KPIs for the sub-goal of enhancing the array of technical expertise and support services.

KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
# of staff trained = | 455 | 80† | 128† | 439 |
  # of staff receiving certification + |  |  |  |  | Training (§4.4.1)
  # of staff at ECSS in depth events + |  |  |  |  | ECSS (§4.3.3)
  # of staff at ECSS symposia + |  |  |  |  | ECSS (§4.3.3)
  # of staff registrants for XSEDE Training |  |  |  |  | Training (§4.4.1)
Average rating for training from Staff Climate Survey | 3.5 of 5 | * | 2.76 | 2.91 | Training (§4.4.1)

* metric not tracked during this period
† Incomplete data

Since we currently are not able to collect all the data for the metric of number of staff trained, the target was set based on the available data for the third project year, which is the number of staff training registrants and the number of staff attending ECSS symposia. The average satisfaction rating by staff for the training that is available has remained low since we began measuring it in the second project year. Work has begun on a revised training plan, with implementation to be completed by Q4 of CY2014. Staff training certification is a task in the plan, and a Certification Committee will be formed to define and implement the certification process. Additionally, a Staff Training Committee will be formed to further evaluate and respond to the staff climate results.

3.3. Sustain the Ecosystem

XSEDE will sustain the advanced digital services ecosystem by 1) assuring and maintaining a reliable, efficient, and secure infrastructure, and 2) providing excellent user support services. Furthermore, XSEDE will operate an 3) effective, 4) productive, and 5) innovative virtual organization.

3.3.1. Provide Reliable, Efficient, and Secure Infrastructure

Table 3-7 shows the key performance indicators for this sub-goal. Perhaps the truest measure of an infrastructure’s reliability is its robustness as reflected by sustained availability. Thus the KPI for this sub-goal is a composite measure of the availability of XSEDE infrastructure components.
This composite averages the percentage availability of enterprise services, XSEDEnet, the XSEDE-wide file system (XWFS), and the XSEDE resource allocation service (XRAS).

Table 3-7: KPIs for the sub-goal of providing reliable, efficient, and secure infrastructure.

KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
Average composite uptime % = average of (% enterprise services uptime, % XSEDEnet uptime, % XWFS uptime, % POPS/XRAS uptime) [1] | 97% | 99.6 | 99.8 | 99.6 | Operations (§5.4)

[1] Composite uptime includes XSEDEnet beginning in Q1 of PY3 and XWFS beginning in Q2 of PY3.

The availability of these core services has been very high in the first three project years. This is in no small part due to a focus on redundancy and failover procedures by the Operations group.
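
The composite uptime in Table 3-7 is a plain average of component availabilities. The following Python sketch illustrates one way to compute it; it is not the Operations group's actual tooling. It assumes that a component not yet included in the composite (as the table footnote notes for XSEDEnet and XWFS in earlier periods) is simply left out of the average, and the component names and percentages shown are hypothetical.

```python
from statistics import mean
from typing import Mapping, Optional


def composite_uptime(component_uptimes: Mapping[str, Optional[float]]) -> float:
    """Average the uptime percentages of all components tracked in a period.

    Components whose value is None are treated as not yet part of the
    composite and are excluded (an assumption of this sketch).
    """
    tracked = [value for value in component_uptimes.values() if value is not None]
    if not tracked:
        raise ValueError("no component uptime was tracked for this period")
    return mean(tracked)


# Hypothetical quarter in which XWFS is not yet included in the composite.
quarter = {
    "enterprise_services": 99.9,
    "xsedenet": 99.7,
    "xwfs": None,
    "xras": 99.5,
}

value = composite_uptime(quarter)
meets_target = value >= 97.0  # PY4 target from Table 3-7

print(f"composite uptime: {value:.1f}%  meets target: {meets_target}")
```

An unweighted average treats every component as equally important; a single component with poor availability can therefore be masked by the others, which is worth keeping in mind when reading the composite.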


3.3.2. Provide Excellent User Support

Table 3-8 shows the key performance indicators for our ability to provide excellent user support. We use two KPIs to measure our success: the mean time to resolution on support tickets that are resolved by the XSEDE Operations Center (XOC) and the average satisfaction rating across user services. The XOC is the frontline 24x7 centralized support group that either resolves tickets or escalates them to the appropriate resolution center, depending on the request. The average satisfaction rating across services includes satisfaction ratings of the user portal, of the timeliness and effectiveness of support at the federated resource providers, and of the effectiveness of the allocations process. These are measured from the annual user survey and serve as a quality metric for XSEDE user support.

Table 3-8: KPIs for the sub-goal of providing excellent user support.

KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
Mean time (hours) to ticket resolution by XOC | < 24 | * | * | 8.9 | Systems Operational Support (§5.4.6)
Average of satisfaction ratings across services = | 4 of 5 | 4.31† | 4.13† | 4.21† |
  Overall user portal satisfaction + |  |  |  |  | User Information and Interfaces (§4.4.2)
  Overall support response time satisfaction + |  |  |  |  | User Engagement (§4.4.3)
  Overall support effectiveness satisfaction + |  |  |  |  | User Engagement (§4.4.3)
  Overall satisfaction with allocations process |  |  |  |  | Allocations (§4.4.4)

* metric not tracked during this period
† Incomplete data

While we are doing quite well with respect to responsiveness to issues reported by researchers, we do not as yet have a complete set of measures of satisfaction because we have not yet initiated our micro-surveys to assess satisfaction with the User Portal and allocations process. The response time and support effectiveness satisfaction measures are based solely upon the annual user survey, but show that satisfaction with those services is on target.

3.3.3. Effective and Productive Virtual Organization

Table 3-9 shows the single composite KPI for the sub-goal of providing an effective virtual organization: the number of process improvements made in XSEDE plus the number of actions taken as a result of the staff climate survey.

Table 3-9: KPIs for the sub-goal of providing an effective and productive virtual organization.

KPI | PY4 Target | PY1 | PY2 | PY3 | Owner(s)
# of improvements implemented = | 40 | 17† | 36 | 37 | Project Management & Reporting (§6.1.1)
  # of process improvements + |  |  |  |  |
  # of actions from staff climate survey |  |  |  |  |

† Incomplete data

The number of process improvements implemented has increased each year of the project, with the largest increase between the first year and the latter two years. This is due to the introduction of a staff climate survey and the implementation of the first recommendations from the Baldrige criteria in the second project year, both of which have led to greater awareness of process and measurement in the project.


3.3.4. Innovative Virtual Organization

Measuring innovation for an organization like XSEDE (or for organizations in general) is difficult and represents an area of open research. A partial measure is the number of staff publications produced, since this shows that XSEDE staff are involved in activities that achieve peer-reviewed publication. This will continue to be an open conversation within the project as we decide how to provide additional evidence to qualify the innovation of XSEDE. Of particular note is §F.1 (in Appendix F), which discusses process improvements in the organization but touches on something that we may be able to develop as a KPI in this area.


4. Supporting and Expanding the Community

4.1. Extended Collaborative Support Service - ECSS (WBS 1.4 and 1.5)

The Extended Collaborative Support Service improves the productivity of the XSEDE user community through both successful, meaningful collaborations and well-planned training activities. The goal is to optimize applications, improve work and data flows, increase effective use of the XSEDE digital infrastructure, and broadly expand the XSEDE user base by engaging members of underrepresented communities and domain areas.

Highlights from individual projects will be presented in the appropriate L2 section. Across the years, for the L2 directors, the highlight is the great enthusiasm that users profess for the ECSS program and the support that they have received. Among the kudos that we have heard were:

• “It’s great to have people who understand the discipline and speak the scientists’ language.”
• “This is the best support I’ve ever had for any computing. We pay for support at Amazon and it’s nowhere as good as ECSS support.”
• “Sinkovits sped up his part of the code by 167x.”
• “I have no one else on campus to interact with on these matters.”
• “Pekurovsky found at least 3 important errors in my original code.”

The two L2-wide KPIs are the number of projects we complete and the overall satisfaction rating that PIs express in the interviews that the L2 directors carry out with the PI of each completed project. These measure both the quantity and the quality of what we do. Since the number of interviewees is manageable, the L2 directors can talk to each of them and do not have to rely on a survey, which may have low participation. These ratings and any further feedback are shared with the ECSS expert. If there are concerns with the assistance given, the results are discussed with the ECSS expert, the L2 lead, and the L3 manager.

The targets for the KPIs were determined by early estimation, based on our FTE count. We did not track impact ratings by the PIs of supported projects in the first project year, and the data for the number of supported projects is incomplete. In cases where metrics have been consistently surpassed, the targets have been raised, and this will continue to be the case. In addition, late in PY3 we began asking PIs for an additional rating of the impact of ECSS support on their project in the follow-up interviews. This will become an additional KPI. Our target for that will also be 4 out of 5.

Table 4-1: KPIs for ECSS.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
# of completed ECSS projects (ESRT + ESCC + ESSGW) | 55/year | 8† | 51 | 59 | Deepen/Extend – Deepening use to existing communities (§3.1.1)
Average Satisfaction with ECSS support from PI follow-up interviews | 4 of 5 | * | 4.13 | 4.53 | Deepen/Extend – Deepening use to existing communities (§3.1.1)
Staff Climate Survey results for staff satisfaction with training | 4 of 5 | * | 2.76 | 2.91 | Advance – Enhance the array of technical expertise and support services (§3.2.2)
# of ECSS hosted symposia | 10/year | 8 | 10 | 9 | Advance – Enhance the array of technical expertise and support services (§3.2.2)
# of Campus Champion Fellows | 4 | 0 | 4 | 6 | Deepen/Extend – Preparing the current and next generation (§3.1.3)

† Incomplete data
* metric was not tracked in this period

4.2. ECSS – Projects (WBS 1.4)

4.2.1. Extended Support for Research Teams (WBS 1.4.1)

Extended Support for Research Teams accelerates scientific discovery by collaborating with researchers, engineers, and scholars in order to optimize their application codes, improve their work and data flows, and increase the effectiveness of their use of XSEDE digital infrastructure. ESRT projects are initiated as a result of requests for assistance during the allocation process and almost always involve home-grown codes, as community codes fall under ESCC (below).

At the beginning of the XSEDE project, it was hard to gauge how many projects could be completed in a year, so a goal of 20 was chosen. The goal was exceeded in PY1 and PY2, so it was raised to 25 in PY3. We intend to raise it to 35 in PY4. To date, no requests for ESRT support have been turned down because of inadequate staffing levels (~13 FTEs). For PY3, all ESRT staff have been part of at least one project. The projects are tracked from beginning (positive recommendation from the XRAC and creation of workplan) to end (Final Reports), with quarterly reports along the way. A completed project is one that had a workplan on which the ECSS expert(s) and the PI’s team worked throughout the time period specified in the workplan.

Table 4-2: KPIs for ECSS-ESRT.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
# of completed projects | 35 | 5* | 37 | 46 | Deepen/Extend – Deepening use to existing communities (§3.1.1)
Average value of satisfaction with ESRT support from PI follow-up interviews | 4 of 5 | * | 4.13 | 4.53 | Deepen/Extend – Deepening use to existing communities (§3.1.1)

* metric was not tracked during this period

The number of completed projects in PY1 is a lower bound, as there were no formal tracking mechanisms in place for the projects inherited from the TeraGrid ASTA program. Final reports were requested from the ASTA projects, but only 5 were received. For PY2 and PY3, the target was exceeded.

For the quarters where the level of satisfaction with ESRT was tracked, this KPI exceeded the goal. These numbers indicate high satisfaction with the support the ESRT experts have provided. We believe that the workplans generated for each project contribute significantly to the satisfaction rating, as the PI and the ECSS staff work together to set expectations for the project at the outset. The rating is also high because the ECSS experts produce results that the project teams benefit from.


4.2.2. Novel and Innovative Projects (WBS 1.4.2)

Novel and Innovative Projects accelerates research, scholarship, and education provided by new communities that can strongly benefit from the use of XSEDE’s ecosystem of advanced digital services. Working closely with the XSEDE outreach team, the NIP team (~5 FTEs) identifies a subset of scientists, scholars, and educators from new communities, i.e. from disciplines or demographics that have not yet made significant use of advanced computing infrastructure, who are now committed to projects that appear to require XSEDE services and are in a good position to use them efficiently. Our staff then provides personal mentoring to these projects, helping them to obtain XSEDE allocations and to use them successfully.

The KPIs for this effort monitor the sustainability of the projects from new communities that are mentored by the NIP team. Sustainability is defined in terms of usage (at least 10% of the allocation on at least one of the resources that were awarded to a project) and/or, for startups, by the project obtaining its first XRAC grant or, for first-time XRAC projects, by a successful renewal. We believe that sustainability of projects is the most important goal of NIP, which is why we chose measures of this success as KPIs. We set the same targets for sustained projects as for generated projects (see Appendix B). Since these KPIs were only adopted in PY3, they cannot be reliably measured for PY1 and PY2.

Table 4-3: KPIs for ECSS-NIP.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
# of startup projects from new communities that use 10% of their allocation on at least one resource within 1 year of initial award, or successful XRAC proposal | 20 | * | * | 21 | Deepen/Extend – Extending use to new communities (§3.1.2)
# of XRAC projects from new communities that use 10% of their allocation on at least one resource within 1 year of initial award, or successful renewal | 10 | * | * | 8 | Deepen/Extend – Extending use to new communities (§3.1.2)

* metric not tracked during this period

We conclude that our new KPIs are realistic and give us actionable information we did not have before: it seems easier to sustain startups than first-time XRAC projects from new communities. In PY4, the NIP team will pay close attention to detecting and overcoming the difficulties that user groups face when they tackle the full complexity of production projects on XSEDE resources.

4.3. ECSS – Communities (WBS 1.5)

The Communities area of ECSS undertakes projects that benefit larger communities rather than independent research groups. Our successes are amplified significantly as a result.

Extended Support for Community Codes functions much like ESRT, but the codes we optimize are used by communities of users rather than individual research groups. Improvements made here have wide ramifications. For example, in PY1 an MIT team using GADGET, an open-source community code used to run large-scale cosmology simulations, had trouble when moving to TACC’s Lonestar system. They requested ECSS support. Through a close collaboration with the developer, the ECSS team uncovered a fundamental flaw in the newest version of GADGET that would have caused problems going forward on all but the oldest of clusters. The resulting fixes were incorporated back into the open-source code base and now benefit all those using GADGET on clusters.


A collaboration with the Broad Institute focused on adapting the genomics community codes Trinity and Allpaths-LG for use on HPC systems. Supporters at PSC worked with Trinity developers at the Broad Institute and the National Center for Genome Analysis Support on code improvements leading to the largest Trinity analysis ever performed. The team assisted over 20 research groups during the optimization phase, and the codes are now generally available for all users. The strong relationships developed during this project will be a useful stepping stone to installations on other SPs.

Science Gateways are community-designed, web-based interfaces to advanced cyberinfrastructure. Typically these are tools that serve thousands of users who do not access XSEDE from the UNIX command line, but still benefit from the extraordinary resources. The ECSS science gateway team works with development teams to incorporate XSEDE resources into existing science gateways. As such, they are also well-positioned to contribute to gateway-related use cases for XSEDE’s Architecture team and to perform evaluations for XSEDE’s Software Development and Integration team. In Q3PY3, gateway users (2739) surpassed active command line users (2631) for the first time in the XSEDE program, indicating the tremendous growth in this user community. We describe here a few highlights from the first 3 years of the XSEDE gateway program.

The Ultrascan science gateway supports researchers using analytical ultracentrifugation (AUC) to study the size and shape of molecules in solution. Using both instruments (an ultracentrifuge) and software (Ultrascan) through the gateway, remote researchers can study the properties of macromolecules connected to disease. This gateway serves a world-wide audience and uses supercomputers in both the US and Germany. PI Borries Demeler at UT Health Science Center at San Antonio comments that the existence of the gateway is changing the field and that the number of trained analysts in AUC has doubled in two years. Having worked in other countries, he feels XSEDE’s ECSS program gives US researchers an important and unique advantage.

Twenty-five percent of all XSEDE users come in through the CIPRES science gateway. CIPRES recently served its 10,000th user since the gateway went live in 2009. Over 1000 scholarly publications reference CIPRES. Original CIPRES member Brent Mishler at UC Berkeley says “CIPRES has moved the needle of what phylogenetic scientists think is possible”. And, in a story picked up by the Huffington Post, a Massachusetts high school student won his state science fair by using CIPRES, and therefore XSEDE supercomputers, with absolutely no assistance from either CIPRES or XSEDE staff. By putting cutting-edge tools in the hands of young researchers, we drive interest in scientific careers.

The ECSS Science Gateways Team collaborated with PI David Tarboton, Utah State University, to optimize TauDEM, an open source software tool for hydrological information analysis based on massive high-resolution digital elevation model (DEM) data such as the LIDAR-based terrain data provided by the NSF OpenTopography data facility. The ECSS consultants have been conducting computational performance analysis and performance profiling experiments using XSEDE resources, identifying computational bottlenecks, and developing solutions to improve the scalability and efficiency of TauDEM for data-intensive analysis.

Finally, the ECSS Extended Support for Education, Outreach and Training provides the people power behind XSEDE’s training and outreach events. This team is largely composed of small time slices from over 40 unique ECSS staff members. Each is qualified to deliver training on technologies they use in their ECSS projects, as well as to attend campus outreach events and present on their own scholarly work at conferences, further spreading the word about XSEDE. ESTEO staff were engaged in 193 activities in PY1, 169 activities in PY2 (when the tutorial count was lower), and 192 activities reaching 14,000 participants in PY3 alone. By activities we mean tutorials, conference presentations, development of asynchronous training modules, mentoring activities, and more.


Unique activities include support (including homework problem tests) for a 300-person hands-on massive open online course (MOOC) given by Jim Demmel at UC Berkeley entitled “Applications of Parallel Computers.”

ECSS Communities also runs several programs that benefit all of ECSS. These include the Campus Champions Fellows program, the ECSS Symposium series, and the Workflow Community Applications Team. The Campus Champions Fellows program pairs XSEDE Campus Champions with ECSS mentors for a one-year period to work side by side on real world science and engineering problems. Fellows are paid a stipend to support a 400-hour time commitment and take the skills that they learn back to their campuses to help others. Four Fellows were supported in the program’s inaugural year (PY2), six in PY3, and four more are planned in PY4.

The ECSS symposium series is a monthly event highlighting work done by ECSS staff on their projects. In this way we can share expertise both with each other and with the larger user community. These talks are open to all and are also recorded for asynchronous viewing on YouTube.

The small workflow team assists researchers conducting complex computations involving parameter sweeps, multiple applications combined in dependency chains, tightly coupled applications, and more. The workflow team accomplishes its mission through the use of third party workflow software in collaboration with the workflow developers, service providers, and XSEDE Extended Collaborative Support Services.

4.3.1. Extended Support for Community Codes (WBS 1.5.1)

Extended Collaborative Support for Community Codes (~6.5 FTEs) extends the use of XSEDE resources by collaborating with researchers and community code developers to deploy, harden, and optimize software systems necessary for research communities to create new knowledge. ESCC supports users via requested projects and staff-initiated projects. Most ESCC projects are initiated as a result of requests for assistance during the allocation process. These projects are similar in nature to ESRT projects but involve community codes rather than codes developed for and by individual research groups. ESCC projects may also be initiated by staff to support a community’s needs. Past examples of this include efforts to optimize Trinity (an important bioinformatics code) for HPC platforms and to port and configure AMBER (a molecular dynamics code) for XSEDE resources.

The goal of 10 completed projects per year was established in the first year of XSEDE and appears to be a reasonable choice that corresponds well to the average number of requests per year and is sustainable with current staffing levels. The satisfaction rating is an attempt to track the satisfaction with the support given by ECSS experts. These interviews are carried out by the L2 lead for ECSS communities. These ratings and feedback are shared with the ECSS expert. If there are concerns with the assistance given, the results are discussed with the ECSS expert, the L2 lead, and the L3 manager.

Table 4-4: KPIs for ECSS-ESCC.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
# of completed projects | 10 | * | 10 | 11 | Deepen/Extend – Deepening use to existing communities (§3.1.1)
Average value of satisfaction with ESCC support from PI follow-up interviews | 4 of 5 | * | * | 4.58 | Deepen/Extend – Deepening use to existing communities (§3.1.1)

* metric not tracked during this period


Since the ESCC effort began with XSEDE in Q3 2011, no ESCC projects carried over from TeraGrid. ESCC projects were not initiated until after the first quarter of PY1 and are usually six months to a year in length. Hence, only 2 projects were completed before the end of PY1. However, by the end of PY2 the pipeline was full, with 10 projects completed in PY2. This included two projects, Collaboration with the Broad Institute: Genomics Community Capabilities and the Amber Sustainability Project, that were initiated internally to satisfy user requirements. In PY3, the number of completed projects (11) slightly exceeded the target value of 10. This number is expected to vary from year to year, as most ESCC projects are the result of PI requests. With current staffing levels, we are able to complete ~10 projects per year, which also allows us to initiate one or two internal projects per year.

Although PI interviews have been performed for past projects, PIs were not asked for a numerical rating until PY3. Going forward, all interviews will include a request for a satisfaction rating.

4.3.2. Extended Support for Science Gateways (WBS 1.5.2)

Extended Support for Science Gateways (ESSGW, ~7.5 FTEs) broadens the science impact and accelerates scientific discovery by collaborating in the development and enhancement of Science Centric Gateway Interfaces and by fostering a Science Gateway community ecosystem. ESSGW projects primarily begin through user requests from the XSEDE allocation process. As with ESRT and ESCC, ESSGW projects progress through three phases. First, the project is assigned to an ECSS expert. Second, the project is quantified with the formation of a workplan document in collaboration with the research group. Third, when the project is completed, the ECSS expert produces a final report with input from the research group. Each of these phases must be completed to produce a successful project.
Each stage of the progression is measured to provide an assessment of progress. Submission of workplans within 45 days of initial contact, 90% of projects with workplans completed, and 85% of completed projects with final reports within 3 months are additional criteria for success.

In addition to reacting to user requests, ESSGW management initiates proactive projects. ESSGW staff can be assigned to assist gateway developers in complying with XSEDE policies (for example, gateway end user reporting and SHA2 security compliance). Staff also help gateway developers take advantage of XSEDE’s digital capabilities (for example, cross-site executions using workflow tools). Projects are also initiated when needs are recognized that benefit the gateway community at large. Examples of these (in development) include large file uploads from users to XSEDE brokered through gateway community accounts, resource scheduling based on batch queue predictions, and avoiding resources in scheduled and unscheduled maintenance.

ESSGW staff also proactively reach out to community leaders and motivate them to establish gateways. Examples include an ongoing effort to introduce command-line users to gateways via a lightweight portal. General Automated Atomic Model Parameterization (GAAMP) is a gateway designed for PI Benoit Roux, and a Large Scale Visualization Gateway is under development for PI Virginia Kuhn. Finally, the ESSGW team bridges the gap between the XSEDE Architecture and Design team and the gateway developer community to provide use cases, validate software installation, and provide gateway test case descriptions.

Table 4-5: KPIs for ECSS-ESSGW.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of completed projects | 10 | 2 | 9 | 18 | Deepen/Extend – Deepening use to existing communities (§3.1.1) |
| Average value of satisfaction with ESSGW support from PI follow-up interviews | 4 of 5 | * | * | 4.5 | Deepen/Extend – Deepening use to existing communities (§3.1.1) |

* metric not tracked during this period

In the first project year, ESSGW carried forward Science Gateway projects from TeraGrid; these were closed in the second project year. Gateway projects tend to be longer than an average ECSS project because the gateway team must establish a sustainable user support service, which is a harder challenge. It should be noted that gateways are externally funded and operated entities, but they largely support users by lowering the barriers to computational usage through XSEDE. The ECSS team collaboratively explores gateway backend architectures (which interface with the XSEDE architecture). A few ESSGW projects require only minimal assistance, primarily in setting up the XSEDE gateway-hosting virtual machine to operate the gateway.

A measurable goal of 10 completed projects per year was established in the first year of XSEDE. It appears to be a reasonable choice: it corresponds well to the average number of requests per year and is sustainable with current staffing levels. The satisfaction rating is an attempt to track the quality of support given by ESSGW experts. It is collected in an interview carried out by the L2 lead for ECSS communities. These ratings and feedback are shared with the ECSS expert. If there are concerns with the assistance given, the results are discussed with the ECSS expert, the L2 lead, and the L3 manager.

4.3.3. ECSS Communities – Extended Collaborative Support for Training, Education and Outreach (WBS 1.5.3)

Extended Collaborative Support for Training, Education and Outreach (ESTEO, ~4.3 FTEs) prepares the current and next generation of researchers, engineers, and scholars in the use of advanced digital technologies by providing the technical support for activities planned by the Training area and the Education and Outreach area. ESTEO staff provide the person-power for activities described in the User Services Training area and in the Education and Outreach areas.
These areas coordinate training and outreach activities across all XSEDE sites to avoid duplication, and the goal of 50 activities annually is dictated by these two areas. ESTEO involves many ECSS staff in training and outreach activities, helping to spread lessons learned through the various ECSS projects. Typical events include train-the-trainers events, onsite classes requested by Campus Champions, XSEDE regional workshops, representation of XSEDE at conferences, and staffing for summer schools, including the International HPC Summer School and extreme scalability workshops (in collaboration with Blue Waters). Staff also create and review online documentation and training modules. This just-in-time training is increasingly popular with the user community when both time and travel budgets are limited.

A continuing ESTEO focus is supporting staff training to identify and train those who will become experts in newly requested areas such as genomics, bioinformatics, high throughput computing (Condor and Open Science Grid), data analysis, and digital humanities. In addition, several new aspects of the XSEDE architecture, supporting driving use cases, are becoming available, and ESTEO will need to develop training and perform outreach to help foster successful adoption of the architecture elements by our user community.

Another focus area is to continue to support large online courses such as Jim Demmel’s “Applications of Parallel Computers” course, hosted by XSEDE in the spring semesters of 2013 and 2014. We have supported courses like this with staff and with XSEDE computational and data resources. ESTEO staff continue to review education proposals, especially in light of service-provider-specific capabilities and policies. The experience with education proposal reviews has already resulted in policy and process improvements for education allocations, and we expect continued improvement of support for large courses, as well as the small courses we have traditionally enabled.

Finally, we are adding survey components to assess the quality of the services we provide. Many ESTEO-staffed events are also evaluated by the TEOS evaluators. Other metrics are provided to give insight into the breadth and depth of activities supported by ESTEO.

Table 4-6: KPIs for ECSS-ESTEO.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of Campus Champion Fellows | 4 | 0 | 4 | 6 | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| Average score of Fellows assessment of the program | 4 of 5 | * | 4.3 | 4.25 | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| # of live training events staffed | 50 | 88 | 33 | 71 | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| Assessment of ESTEO trainer by attendees for live events | 4 of 5 | * | * | * | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| # of in depth training events offered for staff | 2 | * | * | * | Advance – Enhance the array of technical expertise and support services (§3.2.2) |
| # of staff at ECSS in depth events | 10 | * | * | * | Advance – Enhance the array of technical expertise and support services (§3.2.2) |
| # of staff at ECSS symposia | 300 | * | * | 302 | Advance – Enhance the array of technical expertise and support services (§3.2.2) |

* metric not tracked during this period

Of the above metrics, historically only the Campus Champion Fellows statistics and the number of tutorials have been tracked. The other metrics were not tracked for PY1-3, but we are putting mechanisms in place to track them going forward.

4.4. User Services (WBS 1.3)

User Services helps users to effectively utilize the capabilities of the XSEDE advanced digital services ecosystem to conduct research and education activities, and attracts and prepares future users to do the same.

The User Services User Interfaces and Information team (UII) continues to develop and integrate new XSEDE User Portal (XUP) features in support of the sub-goal Advance – Create an open and evolving e-infrastructure.
As the XUP is a user’s initial XSEDE experience, it is important that it 1) provide easy access to the information required to select the appropriate resources, 2) enable PIs to apply for and manage allocations, 3) enable users to make effective and efficient use of the resources and services, and 4) provide users with the tools to record their research accomplishments. Key performance indicators are the number of new XUP features and average user satisfaction with each feature. Targeted and micro surveys will be used to gather this information, with a target average satisfaction rating of 4 (scale of 1-5, with 5 highest).

In support of the sub-goal Sustain – Provide excellent user support, User Services will use targeted and micro surveys to assess the consulting process and quality of documentation. Timely response to user requests for assistance via the XSEDE ticket system is of utmost importance in ensuring that researchers are achieving their research objectives. Providing comprehensive and accurate documentation minimizes the number of requests for assistance from the community, freeing up XSEDE support staff to focus on more complicated problems such as code performance and optimization. The target is an average satisfaction rating of 4 (scale of 1-5, with 5 as highest score).

The user community’s effective and efficient use of XSEDE resources is critical to the success of the XSEDE project. In support of the sub-goal Deepen/Extend – Preparing the current and next generation, the XSEDE training team will increase the number of online training modules, in-person training sessions, and courses presented via webcast, and will work with the ER group to increase the availability and visibility of training events. The group aims to increase the number of attendees by 20% over PY3 attendance. The XSEDE training and event registration tool will be used to measure the number of students who register for each event. The tool also sends surveys to each registrant following event completion. Registrants rate the training in several areas using a scale of 1-5 (5 being most favorable). An average rating of 4 across all surveys submitted will indicate success of the training program.

Table 4-7: KPIs for User Services.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of new XUP features released | 4 | 4 | 11 | 4 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| Average satisfaction of new XUP features as determined by targeted user surveys of new features | 4 of 5 | * | * | * | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| Targeted user survey to assess the consulting process and the quality of documentation | 4 of 5 | * | * | * | Sustain – Provide excellent user support (§3.3.2) |
| Targeted user survey to assess the allocations process | 4 of 5 | * | * | * | Sustain – Provide excellent user support (§3.3.2) |
| # of total trainees | 10% increase | 1489 | 2910 | 5221 | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| # of user inputs that lead to valuable suggestions | 8 | * | * | 8 | Sustain – Provide excellent user support (§3.3.2) |
| Average impact rating of training events by trainees | 4 of 5 | 4.21 | 4.34 | 4.01 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |

* metric not tracked during this period

The XSEDE User Services activities continue to mature in quality and scope, supporting a growing and increasingly diverse XSEDE user community. In-person training workshops attracted 9,620 participants, meeting the goal of increasing attendance by 10% annually. A strategic training plan was approved, and an effort has been underway to develop an overall training plan that integrates the training components in multiple XSEDE areas to assess and address the training needs of users, Campus Champions, and XSEDE staff. Trainees have consistently rated the impact of training events over 4 on a scale of 1-5.

User Interfaces & Information (UII) rolled out the XSEDE User Portal at project start. To meet the needs of the user community and XSEDE staff, 19 new features have been deployed to date, including the training and event registration and administration interface, user profile management, allocations management, and an interface that enables PIs and users to enter their publications manually and to import them from other sources. The latest feature rollout provides staff with the ability to conduct micro surveys: surveys of 3-4 questions that let staff gather metrics on user satisfaction with XSEDE resources and services. The UII team also designed user guide templates to present users with information that has a look and feel that is consistent across all resources and service providers. UII continues to oversee the editing, entry, and maintenance of all technical documentation on the XSEDE web site, in addition to working with the User Engagement team to adapt to the ever-increasing needs of the community.

User Engagement (UE) activities included planning, scheduling, and conducting all feedback activities (conference BoFs, focus groups, PI/user interviews, etc.), routinely contacting startup and XRAC new/renewal PIs, and analyzing old tickets to identify the areas of operations and support that need to be addressed to resolve issues encountered by the XSEDE user community. UE led the project effort to identify and evaluate options for the XSEDE ticket system and coordinated with the Operations team in configuring and deploying the system using Request Tracker (RT). UE also provided XSEDE staff with the training required to use the new system to support the user community. The PY3 annual survey indicated an overall user satisfaction rating of 4.3 out of 5 for the consulting services provided by XSEDE.
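The attendance figures quoted above can be cross-checked against the "# of total trainees" row of Table 4-7. The following is a minimal sketch of that arithmetic, assuming that the 9,620 figure is the sum of the three yearly counts (this matches the table values but is not stated explicitly in the report). The function and variable names are illustrative only.

```python
# Trainee counts per project year, copied from Table 4-7 ("# of total trainees").
trainees = {"PY1": 1489, "PY2": 2910, "PY3": 5221}
TARGET_GROWTH_PCT = 10.0  # annual growth target stated in the report


def percent_growth(previous: int, current: int) -> float:
    """Return the percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Compare each project year with the one before it.
years = list(trainees)
growth = {
    f"{prev}->{curr}": percent_growth(trainees[prev], trainees[curr])
    for prev, curr in zip(years, years[1:])
}
total = sum(trainees.values())

for span, pct in growth.items():
    status = "meets" if pct >= TARGET_GROWTH_PCT else "misses"
    print(f"{span}: {pct:.1f}% ({status} the {TARGET_GROWTH_PCT:.0f}% target)")
print(f"Total trainees, PY1-PY3: {total}")
```

Under that assumption, the yearly counts sum to 9,620, and both year-over-year increases (roughly 95% and 79%) are well above the 10% target.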
The Allocations team facilitated 12 successful XRAC review meetings and presented 24 “How to Write a Successful Proposal” workshops, as well as addressing the user community at each XSEDE Conference. Storage resources were added to the set of allocable resources, and the Allocations team played a significant role in the planning and implementation. The team has also contributed to the design and deployment of the new XSEDE Resource Allocation System, which will replace POPS and is scheduled to begin production in Q3 of 2014.

Several of the area KPIs are dependent on the execution of targeted or micro surveys and will be reported in future reports.

4.4.1. Training (WBS 1.3.1)

The Training group develops and enhances the skills of the national open science community for the effective conduct of research and education activities utilizing XSEDE services.

In support of Deepen/Extend – Prepare the current and next generation – there are 3 key performance indicators. By increasing the number of online training modules, in-person training sessions, and courses presented via webcast, and by working with the ER group to increase the availability and visibility of training events, the group has increased the number of attendees by more than 10% each year during PY1 to PY3. The XSEDE training and event registration tool was used to measure the number of students who registered for each event. The tool also sent surveys to each registrant following event completion. Registrants rated the training in several areas using a scale of 1-5 (5 being most favorable). An average rating of 4 across all surveys submitted indicated success of the training program.

In support of Advance – Enhance the array of technical expertise and support services – two key performance indicators are defined. The staff climate survey was used in PY3 to gauge the overall satisfaction of XSEDE staff with the training provided by the Training group.
A low value (1-3) triggered efforts to identify what was missing and where existing training was falling short. We hope that by adding more asynchronous training modules that fit staff schedules better and by adding 2 domain-specific trainings each year, we can reverse this trend. An average of 4 across all respondents would indicate that the staff training program is moving in the right direction. As per the XSEDE Training Strategic Plan, a Certification Committee and a Staff Training Committee were formed at the end of PY3 to create a certificate program that includes the certification of XSEDE staff trainers. The goal is to complete the certification process and issue 10 “Trainer” certificates to XSEDE staff in PY4. Also, through a partnership with ImprovScience and Raquell Holmes, XSEDE plans to offer mentorship training to help staff at all levels engage better with each other, users, and students.

Table 4-8: KPIs for User Services-Training.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of total registrants | 10% increase | 1489 | 2910 | 5221 | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| Average impact rating of training events | 4 of 5 | 4.21 | 4.34 | 4.01 | Deepen/Extend – Preparing the current and next generation (§3.1.3) |
| # of staff registrants for training | 145 | 80 | 128 | 137 | Advance – Enhance the array of technical expertise and support services (§3.2.2) |
| Average rating from Staff Climate Survey regarding training | 3.5 of 5 | * | 2.76 | 2.91 | Advance – Enhance the array of technical expertise and support services (§3.2.2) |
| # of certificates to staff | 10 | ** | ** | ** | Advance – Enhance the array of technical expertise and support services (§3.2.2) |
| # of views and downloads of HPC University modules | 21,000 | * | * | 20,708‡ | Deepen/Extend – Prepare the current and next generation (§3.1.3) |

* metric not tracked during this period
** Training certificate program to commence in Q1, 2015
‡ This number is based on our first implementation of a count of visits to the resources site at HPC University and is expected to grow substantially as the number of materials indexed on the site is increased.

The training group is on target for 2 of the 5 KPIs. No training certificates have been awarded to XSEDE staff, though work has begun on a Mozilla-style badging system to allow users and staff to track their certifications. Also below target is the satisfaction rating by staff on available training. Work has begun on an XSEDE training plan, implementation plan, and execution timeline, with the timeline to be completed by Q4 of 2014. User and staff training certification will be a task in the plan, with a Certification Committee formed to define and implement the certification process. It should also be noted that the number of staff registrations for training has been higher than anticipated, and based on staff climate surveys a fifth KPI has been added to assure that an adequate number of staff are getting the training they desire; a Staff Training Committee has been formed to lead the process. Due to the effects of web bots on page hit counters, the numbers for training views have been revised using data from the User Portal only.

Lastly, the user satisfaction metric was generated by taking the average of six questions from the survey given at the end of all XSEDE registered trainings, each answered on a scale of 1 (Strongly Disagree) to 5 (Strongly Agree):

• The training session fulfilled my expectations.
• The trainer stimulated my interest.
• I have a better understanding of this topic as a result of this experience.
• The training session was well-organized.
• I was able to easily access this training session.
• Overall I would rate my experience as successful.

The HPCU site has very active participation. Our count of materials viewed and downloaded from the site stands at 20,708, which provides a good base against which to compare future activity on the site as we add additional materials. The targets will be re-evaluated moving forward.

4.4.2. User Information & Interfaces (WBS 1.3.2)

User Information & Interfaces enables the discovery, understanding, and effective utilization of XSEDE’s powerful capabilities and services by the national open science community.

In support of Advance – Create an open and evolving e-infrastructure – there are 2 key performance indicators. Using targeted micro surveys, UII will measure the impact and effectiveness of new features requested by users and staff that are implemented in the XUP. The target is a satisfaction rating of 4 (scale of 1-5, with 5 the highest). The UII team will also use metrics gathered internally and with tools such as Google Analytics to measure the number of visits per new feature implemented, and expects to see the number of visits increase by 5% each quarter.

In support of Sustain – Provide excellent user support – there are 5 key performance indicators. To measure the effectiveness of UII, the annual user survey will be used to gauge overall user satisfaction with the XUP, online documents, and the knowledge base. The target is an average rating of 4 (scale of 1-5, with 5 being the highest). Other important measures of UII effectiveness are the number of hits on the portal and on knowledge base documents, which will be measured with analytics software. UII hopes to see a 5% increase in hits on both from quarter to quarter.

Table 4-9: KPIs for User Services-User Information & Interfaces.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| User satisfaction micro survey targeting new features | 4 of 5 | * | * | * | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| # of visits per new feature | 22% increase | * | +180% | +68% | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| Average overall user portal satisfaction from annual user survey | 4 of 5 | 4.24 | 4.12 | 4.27 | Sustain – Provide excellent user support (§3.3.2) |
| Average overall satisfaction with online documents from regular user survey | 4 of 5 | 4.21 | 4.34 | 4.29 | Sustain – Provide excellent user support (§3.3.2) |
| Quarterly increase of hits on user portal | 22% increase | * | +70% | +82% | Sustain – Provide excellent user support (§3.3.2) |
| Average overall knowledgebase satisfaction from annual user survey | 4 of 5 | * | * | * | Sustain – Provide excellent user support (§3.3.2) |
| Quarterly increase of hits on KB documents | 22% increase | * | * | * | Sustain – Provide excellent user support (§3.3.2) |

* metric not tracked during this period

The XSEDE User Portal was available and in production from day 1 of the XSEDE project and has released many valuable features and capabilities to make users more productive on XSEDE. One of the first new deliverables was the unified XSEDE training registration and administrative interface. This has been an extremely valuable asset of XSEDE which did not exist previously. Users can easily come to XSEDE, create an account, and register for a variety of training courses, both in person and via webcast. This gave users one-click access to training and gave service providers a single location to create and manage their training courses. Additional features such as managing user profiles and tracking publications have been added, along with robust and easy-to-use allocation management. Documentation was also vastly improved during the project: comprehensive user guides and templates were created, which have been shared with organizations outside XSEDE as well. The team has also developed getting started guides, resource listings, allocation documentation, and more. All of these new features and capabilities impact both new and experienced users and have resulted in increased visits to the XSEDE User Portal and web site.

At this time only a subset of the metrics used to measure the success and progress of the UII team is available. Targeted user surveys and micro surveys will be used to gather the additional metrics. Micro surveys were initiated in PY3 and will be used to gather metrics throughout the project. The metrics for total hits for the portal and increase in visits for XUP features both far exceeded expectations. This is largely due to the release of many new features in the XSEDE User Portal. As per the annual survey, the average overall user satisfaction with the XUP and XSEDE user documentation met the target rating of 4 out of a possible 5. The knowledge base was not part of the annual survey but is planned for inclusion in the 2014 survey.

4.4.3. User Engagement (WBS 1.3.3)

User Engagement (UE) engages the users of XSEDE services to gauge their productivity and satisfaction, and to provide a mechanism for collecting and reporting on emerging needs and requirements from the user community.
In support of Sustain – Provide excellent user support – there are 2 key performance indicators. For the percentage of feedback responses within one week, a target of 90% was selected to encourage rapid response to feedback suggestions. A response is an acknowledgement of receipt of a suggestion and not necessarily a guarantee of action. The second KPI is the overall annual user satisfaction with support response time and overall support effectiveness. This KPI will be measured by surveying XSEDE users via the annual, targeted, or micro surveys, with a target of 4 (ratings are 1-5, with 5 being the highest).

In support of Advance – Create an open and evolving e-infrastructure – the number of user inputs that lead to valuable suggestions was chosen as a measure of the success of eliciting feedback from the user community. A target is difficult to define since there are no direct means for ensuring actionable feedback from the community. However, to support the targeted sub-goal, this KPI should be included to encourage User Engagement staff to be vigilant in soliciting actionable items whenever possible. Empirical analysis of gathered feedback items will allow for creation of a realistic target for this KPI.

Table 4-10: KPIs for User Services-User Engagement.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| % of feedback responses within 1 week | 90 | * | * | 90 | Sustain – Provide excellent user support (§3.3.2) |
| Overall annual satisfaction for support response time and support effectiveness | 4 of 5 | * | 4.2 | 4.27 | Sustain – Provide excellent user support (§3.3.2) |
| # of user inputs that lead to valuable suggestions | 8 | * | * | 8 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |

* metric not tracked during this period

The selected KPIs were chosen to provide insight into how effectively users are being engaged and their feedback collected. The first KPI was chosen as a measure of the performance of UE staff at acknowledging receipt of user suggestions. During the reporting period all feedback issues were responded to within a week of user submission, surpassing the target of 90%. The second KPI was chosen as a quantitative measure of user satisfaction with consulting services provided by XSEDE. The annual survey results indicate a score of 4.275 out of 5. The third KPI was chosen as an indicator of how effectively UE staff are soliciting feedback from the user community. There were 8 user feedback items or tickets submitted within the reporting period. One was rated a valuable suggestion warranting follow-up. This feedback item related to COTS software on XSEDE resources, and the SP forum was contacted with the request. Combined, these KPIs give an overall view of UE’s main objectives: to solicit actionable feedback and to provide excellent consulting services.

4.4.4. Allocations (WBS 1.3.4)

Allocations enables the national open science community to easily gain access to XSEDE’s advanced digital resources, enabling them to achieve their research and education goals.

In support of Sustain – Provide excellent user support – there are 2 key performance indicators. Average user satisfaction with the allocations process will be measured by a targeted user survey following each of the quarterly XRAC review meetings and after award notifications have been sent. The second KPI is average user satisfaction with the allocations process as determined by two questions in the XSEDE annual survey.
The team’s goal is to achieve an average of 4 on a scale from 1-5. An average of 4 would indicate that a large majority of the user community is satisfied with the process. This also provides the opportunity to identify and improve any areas that receive an unsatisfactory rating.

Table 4-11: KPIs for User Services-Allocations.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Targeted user survey to assess the allocations process | 4 of 5 | * | * | * | Sustain – Provide excellent user support (§3.3.2) |
| Average satisfaction of allocations process from annual user survey | 4 of 5 | 4.13 | 4.02 | 4.05 | Sustain – Provide excellent user support (§3.3.2) |

At this time only one of the KPIs, the XSEDE annual survey, is active. It is on target, with an average user satisfaction rating of just over 4 on a scale of 1-5, 1 being very dissatisfied and 5 very satisfied. Development of the targeted user survey will begin in Q2 CY2014 with the goal of completing the survey in the same quarter.

4.5. Education and Outreach (WBS 1.6)

The Education and Outreach mission is to be the national resource center for education and outreach for computational science and engineering and the broker for XSEDE’s digital resources. One goal is to expand the base of users of XSEDE resources and services, including users of the XSEDE User Portal, participants at training and education sessions, the number of users with allocations on XSEDE resources, and the number of people who attend XSEDE events (e.g. training, workshops, institutes) including the annual XSEDE Conference. Another goal is to increase the number of under-represented people using XSEDE resources and services. This includes women, minorities, and people with disabilities, as well as people at institutions in EPSCoR jurisdictions and in fields of study that have not traditionally used HPC resources.

The targets are based on the past history of success engaging new users and our proactive goals for engaging a larger and more diverse community. The metrics are gathered from the XSEDE User Portal, the XSEDE allocations database, and data that each activity collects. Our goal is to have everyone who attends any XSEDE event (training session, campus visit, conference, summer school) register through the XSEDE User Portal, so that we may track their level of engagement over time. XSEDE is now collecting demographics information (a voluntary option) to allow us to report on participation by women and minorities going forward.

Table 4-12: KPIs for E&O.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Increased # and % of new users/participants, especially from among underrepresented groups over time | 10% | * | † | †780 new accounts (10% increase of user base) | Deepen/Extend – Extending use to new communities (§3.1.1) |
| Increased use of XSEDE resources and services on as many campuses as possible across the country | 10% increase | * | † | †2,522 active users (2% increase), 1,813 active portal users (22% increase), 311 new users in EPSCoR jurisdictions (5% increase) | Advance – Create an open and evolving e-infrastructure (§3.2.1) |

* metric not tracked during this period or through entire period
† Incomplete data

These results are in line with historical measures of growth among new users, along with the increased outreach and community engagement efforts of the XSEDE project and the growth of the number of collaborating institutions who help to further engage the community.
We are continuing to institute better methods for gathering and analyzing data to further distinguish growth among different community sectors including the number of minorities, the number of women, the number of people at institutions in EPSCoR jurisdictions, and the number of people within communities that have not traditionally used HPC resources (e.g. humanities, arts and social sciences).

4.5.1. Education (WBS 1.6.1)

XSEDE Education prepares a diverse community of the current and next generation of researchers, scholars, educators, and practitioners in the use of data analysis and management, modeling, simulation, and visualization techniques. In the three years of the program, we have reached almost 5,000 students and faculty in workshops, campus visits, online and through boot camps.

One major goal of the education program is to encourage universities to adopt formal programs in computational science and education in the form of certificate and degree programs. These could be certificates outside the major or degree programs such as minors at the undergraduate or graduate level or tagged graduate degrees. Thus, two of our KPIs will measure the number of such programs that are created. We are actively engaged with a number of institutions investigating those possibilities and will track their progress on creating such programs through the academic approval process as well as the final implementation of those programs.


Another of our efforts is aimed at faculty who are interested in integrating computational science materials into their current or future courses. Our summer workshops for faculty focus on the pedagogy, tools, and example materials that have been used in courses across a wide range of science and engineering disciplines. Through an annual survey of the faculty who are involved in workshops, we can measure the number of courses that have been modified or updated to include computational modeling examples.

A third major activity of the education program is to review and index quality education and training materials that can be integrated into the academic curriculum or used by students and faculty interested in advancing their own computational science skills. On the HPC University site (hpcuniversity.org), we have recently updated a more complete index of some materials and created a process for reviewing and indexing additional materials. Over the next year, we expect to add a large number of additional materials to the site and will track the number of resources indexed over time.

Finally, we track the number of views and downloads of available materials from the HPC University website as a measure of the utility of the indexed materials and our ability to engage the community in the use of computational science education materials in their own work. We also measure the number of students who participate in online course offerings, using the materials created for those courses.

Table 4-13: KPIs for E&O-Education.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of CS&E/HPC certificate programs created | 3 | * | * | 3 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |
| # of CS&E/HPC degrees created | 3 | * | * | 0 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |
| # of courses modified/created that integrate CS&E/HPC concepts and methods | 40 | * | ** | 55 * | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |
| # of HPC University modules reviewed | 50 | * | * | 5 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |

* metric not tracked during this period
** this metric is collected from survey of the community on an annual basis

As noted in Table 4-13, metrics were not collected prior to the third project year; thus targets have been set based on very little data and will be revisited in the fourth project year.

We have been working directly with faculty and administrators at twenty institutions over the past several years that are in various stages of creating formal computational science education programs. Because of the slow processes for proposing and approving new programs at academic institutions, we are only now able to report that a few programs have been approved. These include undergraduate minor programs at University of Mary Washington, Doane College, and The Ohio State University. We expect several others to be approved soon and have targeted three degree programs and three certificate programs to be implemented in PY4 of the XSEDE project.

Our first annual survey of faculty who have attended our summer workshops has put us ahead of our target for classes that have integrated computational science content. Our target for PY4 is 40

such courses but our survey of last summer’s workshop participants has indicated that 55 courses have been changed during PY3.

Since we have just completed the tools to review and add computational science education materials to the HPC University site, we have only added a few new modules in the past quarter. However, we expect to be able to meet or exceed our target of adding 50 modules in PY4.

4.5.2. Campus Bridging (WBS 1.6.5)

XSEDE Campus Bridging programs 1) make it possible for researchers, educators, and students to access cyberinfrastructure resources located from their lab to XSEDE individually or in concert as simply as if they were peripherals attached to a personal laptop computer; 2) disseminate software, training materials, and information that enables the US research community to leverage the nation's aggregate cyberinfrastructure (XSEDE, other federally-funded non-classified systems, state, and campus) to maximize US innovation and global competitiveness.

KPI #1: Number of campuses on which one or more Campus Bridging tools have been adopted. This number tracks campuses employing Campus Bridging tools in order to manage data or jobs between resources, including the basic XSEDE compatible cluster stack. This KPI tracks the number of campuses actively engaged in bridging activities. We measure this KPI based on reports from Campus Champion, Minority Research Council, and Campus Bridging meetings as well as requests for help on [email protected].

KPI #2: Number of users who use one or more CB tools. Since campus deployments are only part of the measure of use of Campus Bridging tools, this metric also captures users who make use of campus tools as well as users who install the tools on their own laptops or workstations. This number measures the number of users of bridged campus resources as well as individual user installations reported by campus and collected via help requests to [email protected].
KPI #3: Number of rocks roll downloads. This metric describes the number of downloads of the basic XSEDE compatible cluster rocks roll software, for trial or for installation on campus resources. This number is collected from logs on software.xsede.org, the download site for the campus bridging rocks roll.

Table 4-14: KPIs for E&O-Campus Bridging.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of campuses on which one or more CB tools have been adopted | 100 | 17 | 40 | 52 | Advance-Create an open and evolving e-infrastructure (§3.2.1) |
| # of users who use one or more CB tools | 1000 | 91 | 170 | 235 | Advance-Create an open and evolving e-infrastructure (§3.2.1) |
| # of rocks roll downloads | 100 | 0 | 6 | 38 | Advance-Create an open and evolving e-infrastructure (§3.2.1) |

We are below target for CB tool implementation on campuses, as well as for CB tool users. We are investigating the use of an application like the Linux Counter Project (http://linuxcounter.net), but this relies on the user’s willingness to agree to reporting. We plan on increasing reporting between the Globus Online Project and XSEDE Campus Bridging in order to better capture users making use of those Campus Bridging tools. We will examine these KPIs for full validity, and if new ones are necessary, they will be decided on by the December Quarterly Management meeting. This may include, for example, changing the “Number of Campuses using CB tools” to “Number of consults and questions from campuses using CB tools”, which measures the impact of XSEDE staff working with campus personnel. Rocks roll downloads are well above the target, as we believe that the BOF at SC13 helped to spur a number of downloads. Across all KPIs we are seeing a gradual increase as more tools and more information about Campus Bridging are disseminated among the community.

The Campus Bridging team is pro-actively working to engage potential users, system administrators, research leaders, and CIOs. The team has been doing this by participating in campus visits, meetings, workshops, and conferences including the SC conference, Internet2, EDUCAUSE, regional IT consortium meetings, and the XSEDE conference. The team has developed a Campus Bridging Business Card in order to quickly summarize the tools and services available. Campus Bridging has also developed a set of outreach materials and is working with Training in order to develop additional modules for campus bridging tools. A tutorial for the Campus Bridging XSEDE Compatible Basic Cluster Stack is complete and was presented at XSEDE 14. Additional training modules are planned to be ready by the end of September 2014. Campus Bridging continues to work with the Under-represented Community Engagement team in order to identify potential partners for leveraging XSEDE resources on campuses across the country.

4.5.3. Champions Program (WBS 1.6.6)

The Champions programs engage a larger, more diverse community of current and potential users of e-science infrastructure, and build campus capacity to provide digital resource support to researchers, educators, staff and students on campuses across the country.

While we track several outcome metrics related to Champion engagement with the community and their interaction with the XSEDE digital ecosystem, the project has chosen the number of participants in the Champions programs as the best available measure of reaching new communities of researchers.
The other KPIs were chosen based on the directives that are given to champions when they join the program: provide outreach to their communities and assist their communities in gaining access to advanced digital resources and services. To assess these indicators, we will in the future count the number of outreach events hosted by the Champions, and the number of users/stakeholders engaged. Both of these metrics will be self-reported by the Champions. We will also track the number of users on a Champion allocation, which will be reported from the XSEDE database.

Table 4-15: KPIs for E&O-Champions Program.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of Campus Champion institutions | 175 | 118 | 147 | 165 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |
| # of Champions (campus, regional, domain, student) | 253 | 159 | 204 | 237 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |
| # of outreach events hosted by Champions | 20 | * | * | * | Deepen/Extend – Deepening use to existing communities (§3.1.1) |
| # of users who were on a Champion allocation | 20 | * | * | * | Deepen/Extend – Deepening use to existing communities (§3.1.1) |
| Impact assessment of the Campus Champions Program | 4.9 of 5 | * | * | * | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |

* metric not tracked during this period


The number of Champions is continually growing, and we target a 10% growth annually, which we have consistently met. The data collection for the number of hosted outreach events, the number of stakeholders engaged, the number of users on a Champion allocation, and assessments of the program will begin in the fourth project year.

4.5.4. Under-represented Community Engagement (WBS 1.6.7)

Under-represented Community Engagement (URCE) programs seek to pro-actively engage a significantly larger number of faculty and students from underrepresented groups in utilizing XSEDE resources and services and as contributors to the advanced digital research services ecosystem.

Through the four main program components (conference exhibiting, campus visits, regional workshops, and the minority research community calls), URCE is an entry point for under-represented minorities (URM) and Minority Serving Institution (MSI) faculty to join the XSEDE community and benefit from the resources and services provided. URCE leads three to four regional workshops per year hosted by institutions with large URM faculty and student populations such as MSIs or Primarily White Institutions (PWIs) with strong diversity programs and collaborations with MSIs. These multi-day multi-track workshops provide introductory and advanced training. All workshops are developed with a local planning committee and the workshop content is customized for the audience’s needs. URCE conducts campus visits that include meetings with senior administration, and XSEDE new user training and education workshops for faculty, staff and students. The Minority Research Community (MRC) is a community building forum for URM and MSI faculty.
This forum has been successful in engaging MSIs in the Campus Bridging Rocks Rolls pilot, submitting projects to the summer research experience for undergraduate and graduate students, the creation of new research collaborations between institutions, and the submission of white papers to several CI, HPC, and big data conferences.

To date, eight regional workshops have been conducted. The workshops at the University of Texas at El Paso, Florida International University, Florida A&M University-Florida State University College of Engineering, and Arizona State University each had over one hundred registered participants. The Arizona State University and California State University workshops were the first university-system multi-campus engagements. California State University has 23 campuses, 15 of which are MSIs, and Arizona State has three campuses, which have significant minority enrollment.

URCE team members visited twenty-eight campuses, eight of which were visited at least twice. Follow-up visits are important to help sustain the momentum of the initial visit, to garner support from additional faculty and administrators, and to provide additional training. The goal is to engage the institution so that participation is sustained and grows over time.

XSEDE team members from across the XSEDE project including ESTEO, E&O, and URCE, with the assistance of MRC faculty and XSEDE scholars, presented at twenty-one professional conferences for URM in STEM. Conference exhibiting contributed to a robust pool of student applications for SRE and XSEDE Scholars.

The number of new user accounts from Minority Serving Institutions (MSIs) and under-represented groups is the first Key Performance Indicator for URCE. The primary goal in URCE activities is to make first contact with prospective users who have not been engaged with XSEDE, its predecessor TeraGrid, or other large-scale cyberinfrastructure initiatives.
These users typically have no or limited knowledge of XSEDE and no experience with other similar advanced digital services, supercomputing or high performance computing. Many of the potential users in this community do not have the basic computational thinking, computational science and engineering, parallel computing, Linux/Unix, or other programming knowledge. The first step in beginning their engagement with XSEDE is therefore the establishment of an XSEDE User Portal account so they can receive announcements for training and professional development opportunities, and sign up for webinars and local in-person training. The target number for this KPI is based on the anticipated number of new contacts made at campus visits, regional workshops and conferences.

The second KPI for URCE is the number of new users with allocations from under-represented communities and MSIs. This represents users who have sufficient training from XSEDE training events or other sources to request and receive a startup, education, or research allocation. For PY1 through PY3, eight hundred thirty-eight MSI faculty and students participated in non-URCE led XSEDE training events. This is a strong indicator that MSI faculty and students are actively engaged in developing the skills necessary to use advanced digital resources. The MRC has proven to be an effective mechanism to assist faculty wanting to access XSEDE resources or participate in XSEDE programs. This KPI is estimated to be between five and 10 percent of the number who create new portal accounts.

The third KPI is the number of new projects from underrepresented groups and MSIs. This is the number of MSI and underrepresented PIs requesting assistance from the Extended Collaborative Support Services (ECSS) team. Based on the user population as described above, it is estimated that 50% of the new allocations from underrepresented groups and MSIs will need ECSS support.

At this time all the data below is by institution type.
The institution type used is from the Carnegie Classification™, which is consistent with the federal agencies’ designations for Minority Serving Institutions that include: Historically Black Colleges and Universities (HBCUs), Hispanic Serving Institutions (HSIs), and Tribal Colleges and Universities (TCUs). Some demographic information is optionally available if the user provides it when creating their profile or registering for a training event. The development of queries for accounts, allocations and projects by ethnicity/race and gender is in process. This data will be reported starting in the first quarter of PY4 and will include the data for the last two quarters of PY3.

Table 4-16: KPIs for E&O-Under-represented Community Engagement.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of new users with allocations from under-represented communities and MSIs | 12 | 19 | 14 | 21 | Deepen/Extend – Extending use to new communities (§3.1.2) |
| # of new projects from under-represented communities and MSIs | 6 | * | 1 | 4 | Deepen/Extend – Extending use to new communities (§3.1.2) |
| # of new portal accounts from under-represented communities and MSIs | 160 | 196 | 369 | 362 | Deepen/Extend – Extending use to new communities (§3.1.2) |

* metric not tracked during this period.

The number of new user accounts from Minority Serving Institutions, which include HBCUs and HSIs, is 362 for PY3 and is 47% of the 780 new user accounts reported for all of E&O. The large increases in PY2 and PY3 reflect the high number of first time participants in the URCE-led regional workshops and the high number of participant registrations for the workshops held at University of Texas at El Paso, Florida International University, Florida A&M University-Florida State University College of Engineering, and Arizona State University. Nine hundred thirty-eight MSI faculty and students participated in non-URCE led training events PY1 – PY3, which is a good indicator that additional interaction with XSEDE is occurring after first contact.


Overall, the number of MSI PIs with allocations is higher than the original target, with startup allocations dominating the allocation types for MSIs. The number of MSI PIs with projects was 4, less than the target by a third. The consulting sessions at California State University at San Bernardino and Clark Atlanta University-Spelman workshops resulted in 4 startup allocations submitted during the new user training session. The Humanities and Social Sciences (HASS) consulting sessions at Morehouse College resulted in a Morehouse faculty member being added to the Large Scale Video Analysis NIPS project. Based on these successes, the URCE team in collaboration with ESTEO will emphasize this service in new user training and recommend ECSS support when assisting users in submitting allocation requests during consulting sessions in all future regional workshops and campus visits.

A more detailed discussion of the impact of XSEDE programs on diversity is presented in Appendix G Increasing Diversity.

4.5.5. Student Engagement Programs (WBS 1.6.8)

The Student Engagement Programs involve students, with a strong emphasis on engaging under-represented individuals, to create a vibrant, diverse, and collaborative community that is empowered through mentoring, education and training to accelerate scientific discovery.

The KPI of “# of students engaged” is the number of students involved with the XSEDE Scholars, Summer Research Experience, Student Champions, and the annual XSEDE Conference. The KPI target is a conservative number since we have been receiving supplemental funding each year from NSF to bring an additional 50-70 students to the annual conference, and that funding is subject to availability of NSF funds each year. The students are measured by counting them in the XSEDE sponsored programs, plus the registration statistics from the XSEDE Conference.

Table 4-17: KPIs for E&O-Student Engagement.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # of students engaged = # Summer Research Experience participants + # XSEDE conference attendees | 60 | 75 | 102 | 102 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |
| Assessment of student experiences | 4.9 of 5 | * | * | * | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |

* metric not tracked during this period

The first KPI is exceeding expectations due to successful supplemental funding for student travel to XSEDE conferences and includes the current XSEDE Scholar cohort, the Summer Research Experience students, the students given travel support for the XSEDE conferences, plus the student Campus Champions. PY3 was the first year for student Champions. Since the supplemental funding is not guaranteed from year to year, the target of engaging 60 students is based on baseline funding for these activities.

The KPI for assessing the student experiences is a new metric and will be measured for the first time in PY4. The evaluators will be assessing the experience of each student in the program for the duration of their involvement. Perceived experience, professional growth, and skills enhancement will be among the criteria evaluated.

Preparing the next generation of computational scientists, and also those who would have a career supporting them, starts with deep exposure to this community of technical staff and computational

scientists. At the very least, the students counted in the tables above get a whole week of exposure to the HPC computational science community at the annual XSEDE conference, with some events tailored especially for students and leaving time for mentoring and attending other events alongside working professionals. The Scholars and Summer Research Experience (SRE) students get much more exposure, including training seminars and workshops, and work experience in the case of the SRE cohort.

Recruitment for these opportunities takes place online, at events where XSEDE exhibits, via student contact lists used by partners around the country, and by student names XSEDE itself has collected at schools visited by other Education & Outreach efforts, including URCE and Education. Recruitment for the Year 4 cohort of XSEDE Scholars and SRE opportunities yielded the largest, most diverse list of applicants thus far in the project.

4.5.6. Evaluation (WBS 1.6.9)

The external evaluation assists the TEOS team in sustaining an effective virtual organization by conducting formative evaluation of implementation and effectiveness as well as a summative evaluation of longitudinal impact and institutionalization.

The evaluation reports allow us to track our full range of metrics, conduct a formative analysis, and guide us in altering our path forward based on the analysis. The external evaluation team has accomplished its evaluation plans through successful implementation and reporting of all activities listed in the evaluation plan. During PY4 and at the project completion, the evaluation team will provide summative reports. Quarterly targets for the team include the submission of formative interim reports for each activity listed in the evaluation plan that occurred during the current or previous quarter. The evaluation plan is regularly updated and posted on the wiki.

Table 4-18: KPIs for E&O-Evaluation.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| # evaluation reports produced | 20 | 12 | 12 | 19 | Deepen/Extend - Prepare the Current and Next Generation (§3.1.3) |

During PY1 - PY3 the evaluation team published reports in the following areas of TEOS: 9 XSEDE Scholars Program reports in PY1, 1 URCE report in PY1, 2 SRE reports in PY1, 6 Education reports in PY2, 2 Campus Champion reports in PY2, 3 SRE reports in PY2, 1 URCE report in PY2, 1 Training report in PY3, 3 Education reports in PY3, 3 Champion reports in PY3, 2 Quarterly Outreach Event reports in PY3, 6 URCE reports in PY3, 1 Campus Bridging report in PY3, and 3 Longitudinal Tracking Dashboard reports in PY3. We are on target in achieving our KPIs: our reports have been submitted on schedule and as planned, and all of the aforementioned reports detail activities occurring in the previous or current quarter. The TEOS teams have taken these formative reports into consideration to refine and improve the resources and services provided to the community.

4.5.7. Coordination and Administration (WBS 1.6.10)

The mission of Coordination and Administration is to provide a well-integrated set of cohesive activities that achieve the E&O goals and best serve the needs and requirements of the community.

The KPI is the success of the Education and Outreach team in achieving its collective KPIs. This KPI was chosen to demonstrate that the team is working in an integrated and cohesive manner to support one another in achieving overall team success. We will assess the overall team measures and strive to exceed 80% success.

Table 4-19: KPIs for E&O-Coordination and Administration.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Success of E&O team achieving its collective KPIs. | 80% | * | * | 90% | Sustain – Effective virtual organization (§3.3.3) |

* metric not tracked during this period

Collectively, the Education and Outreach team is meeting the majority of the KPIs that have been set. The team has a number of regular conference calls to plan activities, to identify support staff from ESTEO and other XSEDE teams, to review formative and summative evaluation reports, to review evolving community requirements, and to identify strategies for improving the activities. The team continues to work together to improve communications, coordination, the staff climate, and service to the community.


5. Delivering & Supporting New Capabilities in an Evolving Environment

5.1. Architecture for a National Distributed Cyberinfrastructure Ecosystem (Architecture & Design, WBS 1.1.3)

Architecture & Design's mission is to advance the XSEDE infrastructure by designing an open and evolving standards-based architecture for the XSEDE infrastructure that satisfies stakeholder requirements.

The primary output of the Architecture and Design (A&D) team is the delivery of Level 3 decompositions of use cases to Software Development & Integration (SD&I) by passing an Active Design Review conducted by SD&I and acceptance of the product provided by A&D. As may be evident from this description, A&D primarily supports the overall XSEDE goal of creating an open and evolving e-infrastructure.

The KPIs for A&D are relative effectiveness per use case and clearance ratio of preparing Level 3 architectural decompositions. These are computed by:

    relative effectiveness = (estimated task duration) / (actual task duration)

and

    clearance ratio = (number Level 3 decompositions completed) / (number Level 3 decompositions planned)

Regarding the relative effectiveness KPI, we are defining a measure of how well we process use cases as opposed to how many use cases we process because the number of use cases presented is not something we can directly control. In defining this metric, we need to account for the fact that some use cases are easier and some are harder. Accordingly, we define a metric that measures how much time is spent on a use case relative to how much time is expected to be spent on it. For determining how much time is expected to be spent, we make engineering estimates based on our experience in systems and software engineering practice. As we gain more experience at this, our ability to make accurate estimates will improve.
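The two A&D ratios defined above are simple to compute once task durations are recorded. The following is a minimal sketch, not the project's actual tooling: the `UseCaseTask` record, its field names, and the sample durations are hypothetical and chosen only for illustration. The 4-of-10 figure in the last line is the PY3 count reported for Level 3 decompositions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UseCaseTask:
    """One Level 3 decomposition task (hypothetical record layout)."""
    name: str
    estimated_days: float  # engineering estimate made before work starts
    actual_days: float     # time actually spent on the task


def relative_effectiveness(task: UseCaseTask) -> float:
    """(estimated task duration) / (actual task duration) for one use case."""
    if task.actual_days <= 0:
        raise ValueError("actual duration must be positive")
    return task.estimated_days / task.actual_days


def average_relative_effectiveness(tasks: Sequence[UseCaseTask]) -> float:
    """Mean of the per-use-case relative effectiveness values."""
    if not tasks:
        raise ValueError("at least one task is required")
    return sum(relative_effectiveness(t) for t in tasks) / len(tasks)


def completion_ratio(completed: int, planned: int) -> float:
    """(# Level 3 decompositions completed) / (# planned)."""
    if planned <= 0:
        raise ValueError("planned count must be positive")
    return completed / planned


# Made-up durations, for illustration only.
tasks = [
    UseCaseTask("use-case-A", estimated_days=10.0, actual_days=12.5),
    UseCaseTask("use-case-B", estimated_days=8.0, actual_days=8.0),
]
avg = average_relative_effectiveness(tasks)

# PY3 figures from the report: 4 decompositions completed of 10 planned.
py3_ratio = completion_ratio(4, 10)
```

A value below 1.0 for relative effectiveness means a task took longer than estimated; both KPIs carry a PY4 target of greater than 0.8.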
There are three levels of architectural decomposition in order of increasing detail:

Level 1: three layer architectural structure (access, services & resources)
Level 2: defines major functions of each layer
Level 3: details of each function, its components, and how they interact with each other

Level 3 is the level of detail necessary for an Active Design Review (ADR) led by the SD&I Team. It is in the ADR that SD&I ensures that it has all the information it needs to develop its list of activities to develop and integrate these new software components into the XSEDE environment.

There are several steps involved in delivering a Level 3 architectural decomposition from a use case. These include:

1. Use case and associated quality attributes (quantifiable measures that define how the server or resource must perform) are fully documented and accepted by the architects as candidates for an architectural response.
2. Level 3 decomposition documentation completed and approved by stakeholders
3. Use cases and Level 3 decompositions that have been completed and passed an SD&I ADR.

The first step, use case documentation, is driven by the users/stakeholders and therefore is subject to their ability and time to develop the documentation. Altaf Hossain has been a resource to stakeholders who need help with the process. Since this step is out of the control of the systems and software engineering team (S&SE, A&D and SD&I), it is not part of the measured KPI for A&D.


Level 3 decompositions and ADRs are activities within the scope of A&D and SD&I activities and are part of the KPI for A&D because we can estimate the amount of time it will take for each step and then measure how much time it will actually take.

Table 5-1: KPIs for Architecture & Design.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Average relative effectiveness = (estimated task duration) / (actual task duration) | >0.8 | * | * | * | Advance-Create an open and evolving e-infrastructure (§3.2.1) |
| Completion Ratio = (# Level 3 decompositions completed) / (# Level 3 decompositions planned) | >0.8 | * | * | 0.4 | Advance-Create an open and evolving e-infrastructure (§3.2.1) |

* metric not tracked during this period

We began measuring data for relative effectiveness during the fourth quarter of PY3 so we have no results to report at this time.

For the completion ratio of preparing Level 3 architectural decompositions, as of the end of PY3, we prepared decompositions for 4 use cases although we had planned to do 10 use cases, so the completion ratio is 0.4. Of the remaining 6, 4 are expected to be completed by the end of July 2014. Active Design Reviews for the final 2 canonical use cases are still in process. These remaining 2 canonical use cases are closely tied to identity management planning efforts and therefore are delayed until those plans are solidified. Our target completion date for these is fall of 2014.

5.2. Realizing the XSEDE Architecture (Software Development & Integration, WBS 1.1.6)

Software Development and Integration’s mission is to: deliver deployment-ready software and services that satisfy stakeholder requirements and implement XSEDE’s architecture, provide software and service maintenance and support, and enable the extended community of XSEDE software providers to use SD&I's software engineering documentation and related software and services.

Table 5-2: KPIs for Software Development & Integration.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| % of delivered components rated 4 out of 5 or higher by operators | >90% | * | 100% (6) | 100% (5) | Advance-Create an open and evolving e-infrastructure (§3.2.1) |
| % of known defects repaired in delivered components | >90% | * | 100% (6) | 100% (5) | Sustain-Provide reliable, and secure infrastructure (§3.3.1) |
| % availability of engineering and information services | >99% | 99.96% | 99.83% | 99.99% | Sustain-Provide reliable, and secure infrastructure (§3.3.1) |

* metric not tracked during this period

SD&I’s KPIs aim to measure 1) how our customers rate the quality of the software components we deliver (i.e., pass acceptance tests, easy to install and administer, and quality of documentation), 2) how promptly we respond to and resolve component defects, and 3) the availability and reliability of engineering services, software distribution services, and information services operated by SD&I.

We chose a high component acceptance rating by operators (4 out of 5) to minimize the risk of rework and ensure that quality software is delivered to XSEDE operators and users.


In PY1 SD&I was organized as a new function not in the original XSEDE proposal, developed an initial engineering process based on planning, design, development, and testing cycles or increments, and implemented and piloted, ahead of an architecture, the Genesis II/GFFS, UNICORE 6/EMS, and Globus Transfer components. Due to several design and security issues, delivery of these three components was delayed until PY2. During PY2 and PY3, SD&I delivered 6 and 7 components (respectively) to Operations. Because Operations conducts its own acceptance testing of new components or significant upgrades, by the end of PY2 and PY3 6 and 5 components (respectively) had been accepted by Operations. The two remaining components were deferred to early PY4. All of these components were accepted by Operations with a rating of 4 or higher; therefore the value is 100% in PY2 and PY3.

We chose a target repaired defect rate of 90% or higher. Because the first components were delivered early in PY2, the first fix to previously delivered components did not occur until Q3 PY2. All outstanding defects in components delivered in PY2 and PY3 were fixed, resulting in a defect fix KPI of 100%. Because SD&I does not have a good defect tracking process, in PY4 we plan to use JIRA to start tracking defects, map defects to tickets, components, and use cases, and identify slow defect-response areas to improve.

The target availability KPI for our engineering and information services was chosen at greater than 99% to ensure that our tools facilitate the engineering teams in meeting their goals. During PY1-PY3, our average availability for these services was 99.83% or higher. Note that we had availability data for the information services for all quarters whereas availability data for the other services (JIRA, use case registry, software source management, and software distribution service) was only available starting on 5/14/13.

5.3. Technology Investigation Service (WBS 1.7)

The Technology Investigation Service (TIS) identifies and evaluates potential technologies to close the gap between user needs and the XSEDE service offerings. The TIS organization has changed over the reporting period and, as of the end of XSEDE PY3, was made up of two groups: Technology Identification and Technology Evaluation.

Two major TIS goals are evaluating technologies and recommending new technologies to XSEDE for consideration of adoption, and raising awareness of TIS within XSEDE and the eScience community to solicit their input. These goals map to the “Advance the Ecosystem” XSEDE goal. The following paragraphs describe the TIS KPIs as indicated in Table 5-3. Note that the NSF award for TIS started one year earlier than XSEDE, so PY1 for XSEDE is actually PY2 for TIS.

Technologies TIS recommended to XSEDE: TIS intends to support the overall XSEDE program by discovering technologies that are of use to XSEDE and should be considered by XSEDE for adoption. The decision of technology adoption is up to the XSEDE engineering process. Since TIS does not control what is selected, the KPI was selected to specify how many technologies were recommended. A KPI target of two technology recommendations per year was selected based on the seven evaluations that the Technology Evaluation Team performs a year. Contributing to these results, TIS made two recommendations in XSEDE PY1: to use Globus Online for Reliable File Transfer (RFT) and to use GridFTP for UNICORE RFT instead of the native UNICORE file transfer protocol, because UNICORE’s native transfer speed is slower than GridFTP. Both of these technologies were accepted by XSEDE and are now in production. In PY2 TIS recommended that Pegasus be used as a Workflow Management System (WMS) for large-scale workflows, and that Genesis II (GFFS) not be used for RFT. Pegasus has started the engineering process and Genesis II is going through the process.
In PY3 TIS recommended UNICORE for small-scale WMS, as it was user friendly and provided a useful GUI to submit small numbers of jobs, and TIS recommended DUO’s OTP system for use in XSEDE as the user authentication system due to its variety of available authentication methods and open architecture.

TIS marketing efforts: It is important that there be a general awareness of TIS within the XSEDE project and the eScience community in general in order to further the goals of TIS. Marketing events performed by TIS staff have conveyed messages about TIS with presentations, posters, prepared texts, etc. A KPI target of four TIS marketing events per quarter starting in PY3 was set, based on one per month plus one for the quarter. This is the maximum that the TIS management can sustain and will allow for predictable, periodic messages into the community.

Table 5-3: KPIs for TIS.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Technology Recommendations to XSEDE | 2 per year | 2 | 2 | 2 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| TIS Marketing Efforts | 4 per year | N/A | N/A | 4 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |

TIS continues to engage the SD&I group and Operations group in order to be in sync with their activities. In doing so, TIS will be able to better apply its resources to identifying and evaluating technologies that will be useful to the XSEDE project and community. During PY2, TIS began organizing and hosting quarterly meetings with SD&I and Operations, which has led to closer coordination of evaluation and testing across these groups. As a result of these meetings we have identified “software categorization and inventory” across the XSEDE project to be an important need. In PY3 TIS began coordinating with SD&I to accelerate the activities in this area.

5.3.1. TIS - Technology Identification (WBS 1.7.1)

The Technology Identification team has actively engaged the user community to identify candidate technologies that are relevant to XSEDE as well as the eScience community at large. In order to create an open and evolving infrastructure, the Technology Identification team is focused on measuring the number of TIS visits, unique TIS visitors, the number of entries in XTED, and the new entries added every quarter. The team will be able to use tools such as internal metric gathering and Google Analytics to measure the impact of TIS on the wider community. By increasing the entries in XTED we had hoped to accelerate the impact of the XTED tool on the community and help inform the XSEDE community and wider eScience community of technologies that can help meet their user environment needs. Quantifying the number of entries in XTED and the visitors to TIS, and comparing them to previous quarters, helps measure progress toward creating an open infrastructure from which the eScience community can benefit.

Table 5-4: KPIs for TIS-Technology Identification.
| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Visits to TIS website | 5% increase per quarter | N/A | ** | +290% | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| Unique TIS website visitors | 5% increase per quarter | N/A | ** | +280% | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| Entries in the XTED | 5% increase per quarter | N/A | +200% | +1022% | Advance – Create an open and evolving e-infrastructure (§3.2.1) |
| New Entries in XTED | +15 entries per quarter | +6 | +12 | +184 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |

** metrics not fully tracked for this period, therefore this information is unavailable

At this time only a subset of the metrics used to measure the success and progress of Technology Identification is available. TIS released XTED in late PY1 after developing a robust technology database enabling staff and technology teams to enter their candidates into XTED. The focus of PY3 was to increase the number of technologies in the database and thereby increase the visibility of the project as well. Both of these goals were accomplished in PY3. The goals for both the number of new XTED entries and the percentage increase were exceeded. The metrics reflect the increased effort the TIS team as a whole has been putting into gathering additional technologies for inclusion in the database instead of relying on outside sources to populate the information.

5.3.2. Technology Evaluation (WBS 1.7.2)

The Technology Evaluation team evaluated and recommended technologies to improve the utility, functionality, availability, performance, capacity, responsiveness, security, and reliability of the XSEDE user environment.

The “Evaluations Completed Annually” metric refers to the number of evaluations completed in a fiscal year by members of the Technology Evaluation group. This metric was chosen as it best represents the actual work output of the group. The metric is based on the number of FTEs assigned to the group, the historical evaluation rate, and the overhead in FTE used for non-evaluation duties. This year's metric was adjusted to reflect the shift in focus to reduce non-evaluation duties, and each year's metric will be adjusted based on funded and staffed FTE positions.
A completed evaluation will have approved internal and final reports on the team's evaluation results and recommendation, as well as an entry in the XTED database with a publicly available copy of the final report. Deferred evaluations will not count towards the accomplished metric, but may be noted when discussing progress, or lack thereof.

The term evaluation is taken to mean “the process of testing the specified technology against a set of Use Cases and Requirements to show whether that technology is capable of meeting those needs in a repeatable way, across a useful set of environments”. The results will be positive, positive with reservations, or negative. An evaluation need not have positive results to be completed; we consider it a valuable service to note where a technology fails to perform as required.

Deferred evaluations are those which have been started but may not be completed for some reason. If and when the issue has been resolved, the evaluation will be re-attempted.

Table 5-5: KPIs for TIS-Technology Evaluation.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Evaluations completed | 7 | 1 | 2 | 9 | Advance – Create an open and evolving e-infrastructure (§3.2.1) |

PY1 was used to develop and validate the evaluation process itself. PY2 included efforts to develop training materials for new team members and to enable some local resources to act as test systems for further testing. PY3 saw the start, but not completion, of 4 additional evaluations. One technology was selected for evaluation twice, but was deferred both times, and those efforts are not reflected here. Note that PY4 targets are for a limited increase in funding, and apply only for that year. WBS 1.7.2 efforts to support XTED entries were only slated for PY3/4, and are listed only in the appendix.

5.4. XSEDE Operations (WBS 1.2)

XSEDE Operations provisions and supports the wide range of digital capabilities that comprise XSEDE's evolving, integrated cyberinfrastructure. The Operations group consists of ~30 FTEs and is responsible for implementing, delivering, maintaining, and evolving an integrated cyberinfrastructure of unprecedented scale that incorporates a wide range of digital capabilities to support the national scientific and engineering research effort. Operations staff is subdivided into six teams based on the WBS: 1.2.1 Cybersecurity, 1.2.2 Data Services, 1.2.3 XSEDEnet (Networking), 1.2.4 Software Testing and Deployment (ST&D), 1.2.5 Accounting and Accounts Management, and 1.2.6 Systems Operational Support. The Operations management team meets weekly and individual Operations groups meet approximately bi-weekly, with all meeting minutes posted to the XSEDE wiki.

Maintaining and evolving an integrated cyberinfrastructure requires the availability and reliability of digital resources. To that end, XSEDE Operations is charged with monitoring myriad hardware and software services. The composite availability of these resources is thus a key measure of the health of the entire cyberinfrastructure. Each of the components of this composite (enterprise services, XSEDEnet, XSEDE-Wide File System, and the POPS/XRAS account management service) has its own availability target. When the targets for each of these services are met, the average composite availability percentage is at least 97. As shown in Table 5-6, XSEDE Operations met this KPI for each program year to date.
Along with monitoring and maintaining resources, the XSEDE Operations Center (XOC) provides XSEDE with effective and efficient user support by responding to service requests from end users and staff. Because the center is staffed 24 hours per day, the measure of success is a resolution time by the XOC of 24 hours or less for service requests. For the complete list of Operations metrics, see Appendix B.

Table 5-6: KPIs for XSEDE Operations.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Average composite availability % = average of (% enterprise services availability, % XSEDEnet availability, % XWFS availability, % POPS/XRAS availability) | 97 | 99.61 | 99.81 | 99.62 | Sustain-Provide reliable, and secure infrastructure (§3.3.1) |
| Mean time to ticket resolution by XOC (hrs) | < 24 | * | * | 8.9 | Sustain – Provide excellent user support (§3.3.2) |

* metric not tracked during this period
1 Includes only POPS and enterprise services availability
2 Includes XSEDEnet beginning in Q1 PY3 and XWFS beginning in Q2 PY3

Page footnote 1: Composite availability includes XSEDEnet beginning in Q3 2013 and XWFS beginning this quarter.
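The composite availability KPI in Table 5-6 is described as an average over the per-service availability percentages, with some services only entering the composite once they were instrumented. A minimal sketch of that calculation follows. The per-service values are hypothetical (the report publishes only the composite figures), and the unweighted mean and the use of `None` for untracked services are assumptions made for illustration.

```python
from statistics import mean

COMPOSITE_TARGET = 97.0  # composite availability target (%) from Table 5-6

# Hypothetical per-service availability percentages for one program year.
# These are illustrative inputs, not values taken from the report.
availability = {
    "enterprise_services": 99.8,
    "xsedenet": 99.9,
    "xwfs": 98.5,
    "pops_xras": 99.7,
}


def composite_availability(values):
    """Unweighted mean of the tracked per-service availability percentages.

    Services mapped to None are treated as not yet tracked and are skipped,
    mirroring the table notes about services joining the composite later.
    """
    tracked = [v for v in values.values() if v is not None]
    if not tracked:
        raise ValueError("no tracked services to average")
    return mean(tracked)


composite = composite_availability(availability)
meets_target = composite >= COMPOSITE_TARGET
print(f"composite availability: {composite:.3f}% (target met: {meets_target})")
```

Because the mean is unweighted, a single service with low availability pulls the composite down by only a quarter of its shortfall, which is one reason each service also carries its own target.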


Operations highlights for the first three project years, by WBS:

Cybersecurity (WBS 1.2.1)

Security Incidents: There have been no major security incidents across XSEDE for the first three years of the project. This is reflective of the security policies and implementation performed by the team made up of the service providers led by the XSEDE Operations Cybersecurity group. There have been local incidents at the service providers but none to date have spread across XSEDE.

Risk Assessment: Starting in PY1 the security team conducted a risk assessment on XSEDE’s core services and infrastructure to identify and mitigate risks. The outcome of this risk assessment has been used to prioritize the efforts of the security team in the following years.

Information Security Training Program: In the past, compromised accounts have been the most common security issue. Generally they are compromised through systems operated by XSEDE users. In order to improve security awareness among our users, the security team has conducted annual information security training at the XSEDE annual conferences. While it is difficult to establish a direct cause and effect for this training, we have seen a decrease in account compromises related to users.

Vulnerability Scanning Program: In order to assist XSEDE SPs in identifying security vulnerabilities in XSEDE’s core services, we have implemented an automated vulnerability scanning service. All of XSEDE’s most critical resources are scanned on a weekly basis for potential vulnerabilities. This service has helped SPs to keep systems properly patched and protected from attackers.

XSEDE Two Factor Authentication Service: In PY3 a second factor authentication service was offered to SPs and required for select XSEDE core services. The purpose of this service is to ensure that credentials cannot be reused if obtained by an attacker. The use of this service is planned to expand in the coming year to include federating with local SP second factor services.
Data Services (WBS 1.2.2)

XSEDE-Wide File System: Data Services deployed a production wide-area file system, the XSEDE-Wide File System (XWFS), using IBM’s General Parallel File System (GPFS) technology. Operations and SD&I staff from five service providers worked in a coordinated effort over multiple program years to assess technology choices, develop an acquisition and deployment plan, install and configure systems at each site, and successfully operate and maintain the system as an allocated resource for projects and users with multi-site workflows.

Globus Online: Data Services provided widespread availability and adoption of Globus Online/Globus Transfer as a mechanism for accessing the GridFTP infrastructure inherited from the TeraGrid.

GridFTP Infrastructure: The GridFTP software infrastructure was enhanced by version upgrades of the GridFTP services. Additionally, significant effort was made in PY3 to generate, collect and analyze detailed logs of all GridFTP transfer activities at major XSEDE endpoints.

Global Federated File System: Significant efforts were made in the first three years to evaluate, harden, document and improve the Genesis II software to provide a Global Federated File System (GFFS) and other capabilities. This software was deployed as optional software for XSEDE service providers at the end of PY3.


XSEDEnet (WBS 1.2.3)

XSEDEnet infrastructure: XSEDEnet, the high-performance wide area network interconnecting the XSEDE service providers, was migrated from the TeraGrid infrastructure to the National Lambda Rail (NLR) national infrastructure at the beginning of XSEDE. Subsequently, XSEDEnet was migrated in March 2013 from NLR to Internet2’s Advanced Layer-2 Service (AL2S), which is provided over a 100GE national backbone network.

PerfSONAR: A full 10GbE perfSONAR mesh was deployed between XSEDE service provider sites for end-to-end network diagnostic testing and performance measurement, a major upgrade from the 1Gb netmon test systems of TeraGrid. This system is now coordinated by XSEDEnet and maintained by the service providers.

XSEDEnet reporting: Starting after the migration of XSEDEnet to Internet2’s AL2S, metrics were defined for XSEDEnet and XSEDE began working with Internet2 on XSEDEnet reporting capabilities regarding XSEDEnet metrics, availability and utilization.

Software Testing and Deployment (ST&D, WBS 1.2.4)

Operational reviews: There were 16 Operational reviews completed between PY1 and PY3, out of which 15 activities were accepted and 1 was rejected. Apart from these activities, which introduced new software to XSEDE, Operational reviews were conducted on upgraded versions of existing software in XSEDE. Some of the activities that passed Operational review and introduced upgraded versions of software were Globus Online (SDIACT-100), GridFTP update (SDIACT-031), EMS/GFFS update (SDIACT-123/126), GridFTP Server Update (SDIACT-150) and XDUsage update (SDIACT-154). Major activities that introduced new software to XSEDE and passed Operational testing were as follows.

SDIACT-097: Basic EMS: The Basic EMS activity introduced UNICORE, a Grid middleware offering services for program execution and data movement on remote computers via the Internet. UNICORE has subsequently been deployed successfully at all level 1 Service Provider sites at XSEDE.
SDIACT-084: Initial XWFS Deployment: The XWFS Initial Deployment activity encompassed the phase 1 deployment of the XSEDE-Wide File System, using IBM's GPFS, resulting in a production resource which would provide a high-performance file system mountable on all XSEDE compute and visualization resources.

SDIACT-070: Single Sign-On Hub: The Single Sign-On Hub activity (SDIACT-070) introduced a centralized single sign-on hub for XSEDE that would provide command line access to XSEDE Service Provider login nodes using gsissh.

SDIACT-108: Globus Connect Multi-User: The Globus Connect Multi-User activity (SDIACT-108) packaged GridFTP servers and MyProxy Online CA, creating a simple way to install and configure a Globus Online endpoint. This activity was aimed at campuses, to support the campus bridging use cases for transferring data to and from campuses, including transfers between a campus and XSEDE systems.

SDIACT-102: XDUsage: The XDUsage activity (SDIACT-102) delivered an XSEDE usage utility program, xdusage, which gives researchers and their collaborators a command line tool to view their allocation information in the XSEDE central database (XDCDB). This activity upgraded and replaced the legacy TeraGrid tgusage tool.


Accounting and Account Management (A&AM, WBS 1.2.5)

Account Management: The A&AM team has continually upgraded and maintained key services with the aim of providing uninterrupted service to the user community of XSEDE. Throughout the first three years of the XSEDE project, account creation times have continued to decrease, going from 5 working days to less than an hour (0.03 days). The XSEDE User Portal is now able to automatically create the user accounts in the XSEDE Central Database, thus reducing staff data entry delays and allowing users to see their account creation immediately on the XSEDE User Portal. Service Providers are then provided the account data within an hour for processing locally.

POPS/XRAS: While the A&AM team has remained attentive to the account creation process, the majority of the effort has been in the redevelopment of the POPS system into the XRAS system. This new and vastly improved system was developed across 5 XSEDE institutions and was released to the XSEDE community in the first quarter of PY4. While the survey results for the POPS system have been consistently high since the beginning of XSEDE, preliminary feedback received indicates the new XRAS system will be a significant enhancement to what is often the first experience a researcher has with the resources offered through the XSEDE Allocation process.

Allocation Proposals and Awards: Through the first three years of XSEDE, 8,844 proposals have been submitted, resulting in 7,699 awards. The new XRAS system will provide researchers with an interface integrated within the XSEDE User Portal as well as increased flexibility for the administrators and reviewers involved in the XSEDE Allocations Process. See section 3.4.4 Allocations above for more information about XSEDE allocation efforts.

System Operations Support (SysOps, WBS 1.2.6)

Transition to XSEDE: Smooth and efficient transition from TeraGrid branded services to XSEDE.
XSEDE Ticket System: The XSEDE ticket system was transitioned from a locally developed ticket system to the RT ticket system on May 20, 2013.

Systems Monitoring: SysOps deployed, configured and maintains a Nagios system for monitoring XSEDE services.

Enterprise Services: At the end of program year 3 SysOps was operating and maintaining 47 primary and backup enterprise services. SysOps deployed approximately 28 new central services from TeraGrid and decommissioned 7.

Enterprise Services Availability: Delivered no less than 99% aggregate uptime for Enterprise services.

XSEDE Central Database: As a representative example of the many upgrades performed over the three years of XSEDE, the XSEDE Central Database (XDCDB) PostgreSQL software was upgraded for feature support and added high-availability (HA) functionality. A backup XDCDB was installed at PSC; failover testing has been performed and actual failovers have occurred seamlessly.

XSEDE Operations Center: The XSEDE Operations Center (XOC) fielded and routed over 35,000 tickets. A backup XOC was instantiated at Indiana University and was tested and verified.

5.4.1. Cybersecurity (WBS 1.2.1)

The Cybersecurity (Ops-Sec) group protects the confidentiality, integrity, and availability of XSEDE resources and services. The KPI for the Cybersecurity group is the percentage of time that XSEDE resources are unavailable due to a security incident.


Downtime resulting from security incidents has a direct impact on the availability of XSEDE resources and is the key evaluative measurement of the Cybersecurity group’s efforts. The goal is to have no service interruptions as a result of a security incident and thus a 0% availability reduction was chosen as the KPI target. Also, this supports the XSEDE goal of providing a reliable and secure infrastructure.

Table 5-7: KPIs for XSEDE Operations-Cybersecurity.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Availability reduction resulting from security incidents (%) | 0 | 0 | 0 | 0 | Sustain - Provide reliable, and secure infrastructure (§3.3.1) |

Over the past three years there have been no XSEDE security incidents that have resulted in a reduction in availability of resources or services. Though the security team has responded to various threats over the past three project years of XSEDE, including compromised user accounts, login nodes, and critical vulnerabilities, none have had an impact on the availability of XSEDE resources. This reality is one that all projects aim to achieve but is difficult to maintain over a period of time. Having a 0% reduction in resource availability over a three-year period is exceptional and demonstrates our commitment to resource security. We realize past performance does not always predict the future but we will continue to anticipate and mitigate threats to XSEDE to the best of our ability.

During the last quarter of PY3 a major security vulnerability named Heartbleed was discovered. The XSEDE security and operations teams quickly responded to assess the possible effects of Heartbleed and to deploy countermeasures as soon as possible. As a result, XSEDE systems that were found to be vulnerable were patched and new server host certificates were issued. With the combined effort of the portal, system operations, and security teams, it was determined that roughly 5,000 XSEDE accounts may have been exposed. To eliminate the possibility of any unauthorized use that may have occurred from this exposure, all affected accounts were ‘reset’, requiring users to select new passwords. This was the largest number of possibly compromised accounts that XSEDE has handled to date. The efforts of many XSEDE staff to develop a well-organized response and clear communication plan can be credited for the very minimal user support that was needed as the response was executed. Affected users were informed via email about this issue and information was posted on the XSEDE website to ensure all users were informed about XSEDE’s response to Heartbleed.
To date, there have been no known security issues related to this vulnerability.

5.4.2. Data Services (WBS 1.2.2)

The Data Services group facilitates data movement and data management for the community by maintaining and continuously evolving XSEDE data services and resources. The KPIs for the Data Services group are the performance (Gbps) of intra-XSEDE GridFTP transfers > 10MB and the percentage of time that the XSEDE-Wide File System (XWFS) is available.

The GridFTP performance KPI represents the ability of the GridFTP server configurations at the service providers to take advantage of the XSEDE network capabilities and the performance of the underlying file systems. The number is generated by calculating the average performance of all transfers larger than 10MB between XSEDE endpoints. This eliminates the highly variable performance associated with very small transfers and provides a highly accurate reflection of the overall user experience in transferring data within XSEDE. The initial target value is 1Gbps, or 1/10 the XSEDE network capacity, based on the expected capabilities of the XSEDE endpoints.

The XSEDE-Wide File System effort has one associated KPI, the availability metric, which reflects both the overall architecture of the file system and its resiliency in situations of partial hardware or network failure, as well as the stability of the underlying XSEDE infrastructure elements on which the XWFS relies, most importantly the network. Thus, the XWFS availability target is coupled with the availability target of XSEDE’s network, XSEDEnet, which is 99%.

Table 5-8: KPIs for XSEDE Operations-Data Services.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Performance (Gbps) of intra-XSEDE GridFTP transfers > 10MB | 1 Gbps | * | * | 0.827 | Sustain - Provide reliable, and secure infrastructure (§3.3.1) |
| Availability of XWFS (%) | 99 % | * | * | 98.6 | Sustain - Provide reliable, and secure infrastructure (§3.3.1) |

* metric not tracked during this period

The two KPIs for Data Services were identified and instrumented beginning in PY3; thus there is no KPI data for the first two project years. Because GridFTP reporting was introduced during PY3, the quantity of GridFTP transfer data available on a per-quarter basis was inconsistent, leading to significant quarterly variation in results for both the KPI value and the overall metrics. The GridFTP logging effort was rolled out over the course of PY3, resulting in only four service providers providing data by the final quarter. GridFTP log analysis was the basis of an activity to provide evidence-based recommendations for improving GridFTP performance overall. These recommendations will be available from the beginning of PY4, and detailed metrics and the KPI value will be tracked by additional service providers.
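The GridFTP performance KPI described above (average performance of transfers larger than 10MB) can be sketched as follows. This is an illustration only: the `Transfer` record and its fields are a hypothetical log schema, the sample transfers are invented, the threshold is read as 10 decimal megabytes, and "average" is taken to mean the unweighted mean of per-transfer rates. The report does not specify any of these details.

```python
from dataclasses import dataclass
from statistics import mean

MIN_BYTES = 10 * 10**6  # 10MB threshold (decimal megabytes assumed)
TARGET_GBPS = 1.0       # initial KPI target from Table 5-8


@dataclass
class Transfer:
    """One GridFTP transfer record (hypothetical log schema)."""
    nbytes: int
    seconds: float

    @property
    def gbps(self) -> float:
        # bytes -> bits, divided by duration, expressed in gigabits per second
        return self.nbytes * 8 / self.seconds / 1e9


def gridftp_kpi(transfers):
    """Mean throughput in Gbps over transfers larger than the threshold."""
    rates = [t.gbps for t in transfers if t.nbytes > MIN_BYTES]
    if not rates:
        raise ValueError("no transfers above the size threshold")
    return mean(rates)


# Invented sample data for illustration.
transfers = [
    Transfer(nbytes=5 * 10**6, seconds=2.0),     # below threshold, ignored
    Transfer(nbytes=50 * 10**9, seconds=400.0),  # 1.0 Gbps
    Transfer(nbytes=10 * 10**9, seconds=160.0),  # 0.5 Gbps
]

kpi = gridftp_kpi(transfers)
print(f"GridFTP KPI: {kpi:.3f} Gbps (target met: {kpi >= TARGET_GBPS})")
```

A byte-weighted average (total bits divided by total seconds) would be an equally plausible reading of the KPI and would weight large transfers more heavily than the per-transfer mean used here.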
XWFS availability showed an upward trajectory over the course of PY3, with the annual average reflecting a quarter-to-quarter increase as incremental improvements to the configuration resulted in a system that was increasingly tolerant of minor hardware and network issues. Additionally, operations staff have become more adept at managing the system and utilizing failover capabilities where available, resulting in a significant reduction of downtime as a result of hardware maintenance.

5.4.3. XSEDEnet (WBS 1.2.3)

The Networking group monitors, maintains, and improves XSEDEnet and other networking related capabilities and services. The KPI for the XSEDEnet group is the percentage of time that the XSEDE network is available.

XSEDEnet is a private network, built upon the Internet2 AL2S service, which provides a controlled network environment that maximizes file transfer performance between XSEDE SP resources. XSEDEnet is dependent upon the availability of the Internet2 national, redundant AL2S service. If Internet2’s AL2S service is down, or a site’s connection is down, connectivity and file transfer performance are negatively affected. An availability target of 99% was chosen because it represents the typical Service Level Agreement goal for many networks.
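To make an availability target concrete, it can be converted into the amount of downtime it permits. The short sketch below does this conversion. It is an illustration rather than a calculation taken from the report, and it assumes a 365-day year with availability measured against total elapsed time.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year assumed


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability percentage over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


for pct in (99.0, 99.9):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% availability allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions a 99% target corresponds to roughly 87.6 hours of downtime per year, while 99.9% corresponds to roughly 8.76 hours.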


Table 5-9: KPIs for XSEDE Operations-XSEDEnet.

| KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported |
|---|---|---|---|---|---|
| Availability of XSEDEnet (%) | 99 | * | * | 99.9 | Sustain - Provide reliable, and secure infrastructure (§3.3.1) |

* metric not tracked during this period

Prior to PY3, XSEDEnet was built upon the National Lambda Rail (NLR) network. The availability percentages for the KPI are not available for that time period; however, NLR-based XSEDEnet was built in a redundant fashion and there were no significant network-wide outages observed during PY1 or PY2. Availability of XSEDEnet was on target for every quarter during PY3, with an overall average availability of 99.9% for the project year.

The metrics for XSEDEnet were defined in PY3 and, to provide more complete reporting of metrics, the XSEDEnet group has been in discussion with Internet2 staff to provide a solution whereby Internet2 would collect and report per-protocol usage data for each site, thus eliminating the requirement for each site to collect and report on these metrics individually. This will streamline the reporting process and ensure more complete data for these metrics. The new mechanism for reporting utilization should not be subject to the same metrics collection problems as experienced during prior quarters. We expect to begin reporting usage data instrumented by Internet2 beginning in PY4.

5.4.4. Software Testing & Deployment (WBS 1.2.4)

The Software Testing & Deployment (ST&D) group performs operational testing as part of the XSEDE engineering process and supports and coordinates the deployment of XSEDE software at the XSEDE Service Providers, to XSEDE Central Services, and for campus bridging. The KPI for the ST&D group is the percentage of software components deployed within the targeted deployment time.

Each activity received from SD&I is evaluated by the ST&D group and subsequently given a target deployment date.
If each activity that passes operational testing is deployed by its target date, then the KPI target has been met. These activities include both new software components and upgraded versions of previously deployed software. A target of 100% of accepted components deployed on time was chosen for this KPI as a measure of the overall success for ST&D. Because this KPI is a new metric for the group beginning in the second quarter of PY3, it was not instrumented prior to that quarter.

Table 5-10: KPIs for XSEDE Operations-Software Testing & Deployment.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
% of software components deployed within target deployment time | 100% | * | * | 66.7 | Advance – Create an open and evolving e-infrastructure (§3.2.1)
* metric not tracked during this period

ST&D met its KPI target of 100% for PY3Q2 and PY3Q3. Only one component was reviewed by the ST&D group during PY3Q4: the EMS/GFFS update. This activity missed the targeted deployment date by 78 days due to several factors, primarily the bundling of two major software packages together for testing, inadequate and incorrect documentation, poor quality of the software being tested, and inadequate support from developers. Because the EMS/GFFS update was the only activity for PY3Q4 and it was not deployed within its target time, the average KPI for all of PY3 was reduced to 66.7%.
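The 66.7% figure is consistent with an unweighted average of the three instrumented quarters. The sketch below reproduces that arithmetic; it assumes the annual KPI is a simple mean of quarterly on-time percentages, which the report does not state explicitly, and the variable and function names are illustrative.

```python
from statistics import mean

# Quarterly on-time deployment percentages for PY3 as described in the text.
# The KPI was not instrumented in PY3Q1, so that quarter is omitted.
quarterly_on_time_pct = {"PY3Q2": 100.0, "PY3Q3": 100.0, "PY3Q4": 0.0}


def annual_kpi(quarterly: dict) -> float:
    """Unweighted mean of the quarterly percentages, rounded to one decimal place."""
    return round(mean(quarterly.values()), 1)


print(annual_kpi(quarterly_on_time_pct))
```

Because each quarter carries equal weight under this assumption, a single late component in a quarter with only one activity lowers the annual figure by a full third.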


5.4.5. Accounting & Account Management (WBS 1.2.5) The Accounting & Account Management (A&AM) group maintains and improves the interfaces, databases, and data transfer mechanisms for XSEDE-wide resource allocation and resource usage accounting. The KPIs for A&AM are satisfaction results from surveys on the allocation process (using POPS and XRAS) and the aggregated availability of key A&AM services. The XSEDE Resource Allocation Service (XRAS) is a reengineering of the legacy allocations system, called Partnerships Online Proposal System (POPS). Our goal is to better meet stakeholder requirements, including the XSEDE program, XSEDE users, and allocation review panel members. In order to gauge the effectiveness of this development effort, representative groups of users are surveyed regarding their experience with POPS/XRAS. These surveys provide one of the key performance indicators (KPIs) for the Accounting and Account Management Group. The surveys rank user experience on a scale of 1 to 5, with 5 indicating the highest user satisfaction. The target value is 4.0. Accounting and Account Management also provides critical services to the XSEDE program through the XSEDE central database (XDCDB) and the Account Management Information Exchange (AMIE), on which the POPS/XRAS system depends. The aggregate availability of these two critical systems is an additional KPI for the group and has a target of 95%, matching the target KPI value for aggregated availability of all enterprise central services in XSEDE. Table 5-11: KPIs for XSEDE Operations-Accounting & Account Management. PY4 Sub-goal KPI PY1 PY2 PY3 Target Supported Survey results on overall POPS/XRAS 4 of 5 4.5 4.5 4.7 Sustain - Provide reliable, and satisfaction secure infrastructure (§3.3.1) Aggregate availability of key A&AM 95% 99.6 99.9 99.9 Sustain - Provide reliable, and services – POPS/XRAS (%) secure infrastructure (§3.3.1)

Survey data has enabled us to provide historical metrics for user satisfaction with POPS. It will also help determine any change in user satisfaction when the XRAS system is deployed in PY4. Results of the user surveys indicate a very high level of satisfaction with the current POPS system. Survey results have exceeded our target of 4.0 each project year and have remained consistent since the survey’s inception. The POPS system has been continually updated and maintained, focusing on requests from users and Allocations staff (see Appendix B for additional POPS metrics). This attention to the needs and desires of the user community has contributed to the high overall satisfaction with the POPS system, and the expectation is that this will remain unchanged with the implementation of XRAS. The availability of the key A&AM services remains at nearly 100% and has, in fact, slightly increased each project year.

5.4.6. Systems Operational Support (WBS 1.2.6)

Systems Operational Support (SysOps) provides front-line support via the XSEDE Operations Center and system administration for all XSEDE Enterprise services. The KPIs for the SysOps group are the percentage of time that enterprise services (cumulatively) are available and the mean time to ticket resolution by the XSEDE Operations Center (XOC), measured in hours. Providing a reliable infrastructure requires that central service resources are functioning and dependable. The aggregate availability percentage KPI of 95% for enterprise services was chosen

because this is the typical NSF uptime requirement for deployed resources. This percentage was also explicitly stated in the original XSEDE proposal. The XOC is necessary to monitor and maintain resources as well as to provide XSEDE with effective and efficient user support. Given that the XOC is staffed 24x7, a mean time to resolution for tickets in the XOC ticket queue of less than 24 hours is a reasonable target, both from feasibility and effective operational standpoints.

Table 5-12: KPIs for XSEDE Operations-Systems Operational Support.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
Aggregate availability of enterprise services (%) | 95 | 99.6 | 99.7 | 99.9 | Sustain - Provide reliable, and secure infrastructure (§3.3.1)
Mean time (hrs) to ticket resolution by XOC | < 24 hours | * | * | 8.9 | Sustain – Provide excellent user support (§3.3.2)
* metric not tracked during this period

Overall availability of enterprise services has remained above 99%. The XOC provided good service in initial triage of tickets, closing a significant portion within 24 hours or less with a mean time significantly less than the target time. In project years 1 and 2 the ability to measure mean time to ticket resolution did not exist. The new ticket system, which came online at the end of PY2, provided more functionality and visibility into these ticket metrics. The number of enterprise services has grown as a result of a more integrated partnership and workflow between SysOps and SD&I. This trend is upward and should remain that way until the project is completed. For the enterprise services availability metric, the SysOps group has delivered an extremely high aggregate availability across the 55 enterprise services monitored. This metric is key for SysOps as it shows the reliability of XSEDE’s enterprise (central) systems.
The SysOps team continues to transition core services from site-specific installations to cross-provider, high availability offerings (see Appendix B.2.2 XSEDE Operations (WBS 1.2) (Greg Peterson)).


6. Managing the Program

6.1. XSEDE Project Office (WBS 1.1)

The XSEDE Project Office assures excellence in the management and operation of the project and supports functions that span XSEDE’s organization, such as project management, external relationships, and software engineering and architecture design.

Table 6-1: KPIs for Project Office.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
Use Case Performance = average of: (a) Use Case Specification Clearance Ratio (# of specifications completed / # of specifications started); (b) Average Relative Effectiveness (estimated task duration / actual task duration) in A&D’s completion of Use Cases; (c) Fraction of software components deployed within target deployment time | > 0.9 | * | 0.85† | 0.83† | Advance – Create an open and evolving e-infrastructure (§3.2.1)
Increase XSEDE’s online presence (growth percentage of people engaged) = total website visits + total Twitter followers + total Facebook likes / total numbers from previous quarter | 1% growth per quarter | * | * | * | Deepen/Extend - Raise the Awareness (§3.1.4)
# of improvements implemented | 40 | 17† | 36 | 37 | Sustain - Effective virtual organization (§3.3.3)
* metric not tracked during this period
† Incomplete data

6.1.1. Project Management & Reporting (WBS 1.1.1)

Project Management & Reporting enables an effective virtual organization through application of project management principles; provides visibility to project progress, successes, and challenges; brings new ideas and management practices into the project; and disseminates lessons learned in XSEDE to other virtual organizations. To deliver the most advanced and useful digital services for researchers, XSEDE must constantly evolve. This makes continuous evaluation and improvement of processes and policies critical to the effectiveness of XSEDE as an organization; therefore, the project’s KPI for sustaining an effective virtual organization is the number of improvements made.
Potential improvements consist of recommendations based on the staff climate survey; efforts by area managers to increase efficiency or quality; adaptations in response to an evolving project; or improvements to overall execution through implementation of widely validated methodologies. Another focus for the PM&R area is communication of the project status to NSF. Thus, we have chosen the timeliness and quality of reporting to the funding agency as area KPIs. We are targeting to deliver each report to NSF within 30 days of the close of the reporting period and are planning to ask for external evaluation of reports to rate their usefulness. The intent is that we increase the

readability of the report and the ability of stakeholders to quickly glean the project’s status and challenges.

Table 6-2: KPIs for Project Management & Reporting.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
# of process improvements + # of actions from staff climate survey | 40 | 17† | 36 | 37 | Sustain - Effective virtual organization (§3.3.3)
# of reports delivered within 30 days of quarter close | 4 of 4 | 4 | 3 | 1 | Sustain - Effective virtual organization (§3.3.3)
Usefulness rating of reports from sponsor assessment | 4 of 5 | * | * | * | Sustain - Effective virtual organization (§3.3.3)
† Incomplete data
* metric not tracked during this period

The staff climate survey did not start until PY2, and process improvements were not formally tracked until PY3, so for this report we queried management across the projects about improvements implemented in the first two years. The target for the number of annual improvements for PY3 was based on the number of recommendations from the first staff climate survey, the number of recommendations related to alignment with the Baldrige methodology for strategic planning, and a rough estimate of the capacity of project management to identify and implement improvements based on need and best practices. The target for PY3 was eight improvements, which failed to consider the many improvements that are implemented within management areas without project-wide coordination. The target for PY4 is 50 improvements and will include the formal tracking of improvements originating within an area of management.

The third project year saw the most improvements implemented. These improvements included stakeholder alignment activities relating to the project’s vision, mission, and strategic goals. This was followed by a review of project activities that ensured alignment with the strategic goals and the development of key performance indicators for each goal.
A new reporting format that reflects the move to performance toward goals as a measure of progress was also implemented in the third project year. The new format is intended to increase the usefulness of the report in expressing project status and to reduce the effort required to produce reports. A rating by the sponsor was identified as the best way to measure its impact. At the time of writing, discussions continue with the sponsor on the best way to implement this metric, but qualitative evidence exists in two forms: 1) project-wide feedback has been positive, with many managers saying that their contributions took significantly less time, and 2) initial feedback from the sponsor was very good as well, with the cognizant program officer stating, “This is much easier to follow and makes it easy to see how you [XSEDE] are doing in different areas. It is easy to see how activities within the project support the mission.”

Despite the reduction in time spent to produce reporting, we did not meet the goal of delivering three quarterly reports and an annual report to NSF within 30 days of the close of a reporting cycle in the third project year. This is a result of the additional effort required to implement project-wide KPIs and the new reporting format, which caused a backlog in reporting that has been cleared as of this writing.

6.1.2. Systems & Software Engineering (WBS 1.1.2)

The Systems & Software Engineering group’s mission is to: 1) identify new user needs and capabilities to guide the growth and development of XSEDE's infrastructure; 2) participate in developing these needs and capabilities into effective use cases; 3) document and track the

progress in implementation of these use cases; and 4) prioritize the implementation of these use cases through the activities of the User Requirements Evaluation and Prioritization committee.

One of the main activities for SSE is to identify new needs and capabilities desired by the XSEDE user community. SSE then works with subject matter experts to develop these needs and capabilities into use case specifications that accurately describe the desired functionality, which can take some time to research and complete. Accordingly, the KPI for SSE measures the number of use case specifications that have been fully developed and sent on to Architecture and Design for review and inclusion in the XSEDE architecture, relative to the number of use case specifications that have been started. This KPI can be calculated by simple arithmetic and is measurable today.

Table 6-3: KPIs for Systems & Software Engineering.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
Non-Canonical Use Case Specification Clearance Ratio = (# of specifications completed) / (# of specifications started) | 1.0 | * | 0.85 | 1.0 | Advance – Create an open and evolving e-infrastructure (§3.2.1)
* metric not tracked during this period

The KPI for PY2 is below expectations because of the very large number of use cases that were started during this project year. This is the year that we switched to a use case approach for capturing user needs and capabilities, which resulted in an explosion of ideas for new use case specifications. As the XSEDE architecture matures, we do expect the number of new use case specifications that need to be developed to decrease, which it did in PY3.

6.1.3. External Relations (WBS 1.1.4)

External Relations’ mission is to properly communicate the value and importance of the XSEDE project to all identified stakeholders with a creative and strategic communications methodology.
External Relations’ primary mechanisms for communicating our message to communities are our online presence and the issuing of communications, typically in the form of science success stories and press releases. The KPI for “Increase XSEDE’s online presence” was chosen because a consistent increase in the number of users engaged indicates that we are deepening and extending the awareness of XSEDE and the value of XSEDE in the community. This number represents the percentage increase in users we wish to see each quarter and is in line with industry standards. Historically this information has not been tracked. For future tracking we have implemented the use of Google Analytics as well as tracking mechanisms for both Twitter and Facebook. The KPI for “Communicate the value of XSEDE” was chosen because, in addition to increasing the number of users engaged, we must consistently provide them with information that conveys the purpose of XSEDE as well as the value being added to the science community through the use of XSEDE resources. The KPI selected represents the average number of communications on a quarterly basis presented to the community conveying this message. Historically this information has not been tracked. For future tracking we have implemented a manual tracking process.
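As a rough illustration of the “Increase XSEDE’s online presence” KPI, the sketch below computes a quarter-over-quarter growth percentage from combined website and social media counts. It reads the KPI formula as (current total − previous total) / previous total, which is one interpretation rather than a definition confirmed by the report. All input figures are hypothetical, chosen only to be of the same order as the roughly 40,000 website hits and under 2,000 social media followers cited for this KPI.

```python
def engaged_total(website_visits: int, twitter_followers: int, facebook_likes: int) -> int:
    """Combined count of people engaged across the tracked channels."""
    return website_visits + twitter_followers + facebook_likes


def growth_pct(current_total: int, previous_total: int) -> float:
    """Quarter-over-quarter growth as a percentage of the previous quarter's total."""
    if previous_total <= 0:
        raise ValueError("previous_total must be positive")
    return 100.0 * (current_total - previous_total) / previous_total


# Hypothetical previous quarter: 40,000 visits, 1,000 followers, 800 likes.
previous = engaged_total(40_000, 1_000, 800)

# Hypothetical case 1: social media grows 20% while website traffic is flat.
social_growth_only = engaged_total(40_000, 1_200, 960)

# Hypothetical case 2: website traffic dips 2% while social media is flat.
web_dip_only = engaged_total(39_200, 1_000, 800)

print(f"Social media +20%: {growth_pct(social_growth_only, previous):.2f}% overall")
print(f"Website -2%:       {growth_pct(web_dip_only, previous):.2f}% overall")
```

With these made-up inputs, a 20% gain in social media moves the combined figure by less than the 1% quarterly target, while a 2% dip in website traffic outweighs it. This is the sensitivity to web traffic that a combined metric of this kind would be expected to show.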


Table 6-4: KPIs for External Relations.

KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
Increase XSEDE’s online presence (growth percentage of people engaged) = total website sessions + # of referrals + total Twitter followers + total Facebook likes / total numbers from previous quarter | 1% growth per quarter | * | * | † | Deepen/Extend - Raise the Awareness (§3.1.4)
XSEDE’s media presence (# of communications) = # of stories + # of press releases | 16 | * | * | † | Deepen/Extend - Raise the Awareness (§3.1.4)
* metric not tracked during this period
† Incomplete data

The KPI for “increase XSEDE’s online presence” is an essential factor in determining how well we are communicating the value of XSEDE not only to the regular, well-established computing community, but also to those who may be new to science, engineering and research. It gives us a baseline number for how many members of the public we are reaching through our website and social media arms. Our social media reach is expected to keep growing at an aggressive rate, while we hope that website hits (which in raw numbers far outnumber our “likes” and “follows”) will at the very least maintain a steady rate of regular visitors. We hope to grow our website hits by using the social media arms as a gateway to the XSEDE.org website, which would help our growth rate increase more quickly. So far, our KPI has been on target, but it may vary over time. Because the KPI combines website hits (around 40,000) with much smaller social media numbers (under 2,000 combined as of July 2014), the growth percentage is swayed by small increases or decreases in web hits, while the strong growth in social media has little effect. The KPI for “Communicate the value of XSEDE” is just as important as our outreach noted above, because it determines the content we produce for the public sphere.
The ER team must not simply barrage our users, the media and the public with stories and press releases that will be ignored, but we also must maintain a presence in the minds of those mentioned. By ensuring we produce a certain number of science stories and a certain number of press releases, we can hopefully keep XSEDE on the minds of all of our stakeholders. As noted above, we don’t truly know the “impact” of whether 10, 20 or 50 combined stories and releases is the “right” number – each story is not born equal in terms of importance to a large number of readers, but so far, it is safe to say that our users and other stakeholders get the information they need. Discussions have taken place in regards to a more wide-spread media buy, such as Wired or the New York Times – it is unclear if there is value in such a strategy, but XSEDE is well-established with niche publications HPCwire and ISGTW. 6.1.4. Industry Relations (WBS 1.1.5) Industry Relations' mission is to demonstrate the value of advanced computational science and engineering resources, services and expertise to industry. Industry Relations has two current activities: Industry Challenge and Workforce Development. The Industry Challenge program works to address computationally intensive challenges facing industry that could be addressed with access to XSEDE expertise and resources. The program establishes a supportive environment for innovation and cooperative sharing to support future economic and societal benefits within both the industrial and academic worlds. The Workforce Development

project seeks to apply the significant expertise and experience of XSEDE staff to support the workforce development needs of industry. The project has three phases. First, we will conduct a skill assessment to measure industry need for HPC skills in the workforce. Second, the assessment will be used to identify desirable XSEDE training and education offerings for industry. Third, industry-focused education and training pilots will be conducted and analyzed for effectiveness.

We are measuring progress on the projects via the following KPIs:

Number of Industry Challenge projects: Two Industry Challenge pilot projects were launched in PY3. We would like both teams to continue, so the KPI target is 2. This KPI will be measured by counting annual renewal responses made by the Industry Challenge teams.

Number of identified Workforce Development skills deemed desirable for HPC: Industry Relations is conducting its first HPC “Skill Gap” survey in PY4. The survey is under design and the response rate is unknown at this point, so the KPI target of 5 is a placeholder. This KPI will be measured by counting the number of skills with a 60% or greater response rate from industry.

Number of Industry Challenge users: This is the number of XSEDE users participating in Industry Challenge projects. The target was blank since we did not know the size of the project teams. This KPI will be measured through project reporting from the XSEDE user portal.

Number of industry people trained: This is the number of industry representatives attending XSEDE training. We are in the assessment phase and have not begun offering XSEDE training to industry personnel.

Average impact assessment of industry people trained: This is the impact rating from the attendee surveys conducted at the end of training workshops.

Table 6-5: KPIs for Industry Relations.
KPI | PY4 Target | PY1 | PY2 | PY3 | Sub-goal Supported
# of Industry Challenge Projects | 2 | 0 | 0 | 2 | Deepen/Extend – Extending use to new communities (§3.1.2)
# of Industry Challenge Users | 4 | 0 | 0 | 6 | Deepen/Extend – Extending use to new communities (§3.1.2)
# of industry people trained | 120 | 19 | 100 | 111 | Deepen/Extend – Preparing the current and next generation (§3.1.3)
# of identified Workforce Development skills deemed desirable for HPC | 5 | 0 | 0 | 0 | Deepen/Extend – Preparing the current and next generation (§3.1.3)
Average impact assessment of industry people trained | 0 | 0 | 0 | 0 | Deepen/Extend – Preparing the current and next generation (§3.1.3)
* metric not tracked during this period

Two active Industry Challenge projects have been established, so we report a KPI of 2. This is on target. Six user accounts have been used for these projects, so we report a KPI of 6. This is also on target. We lack the ability to track industry representation in training. The IR committee will work with Education, Outreach and Training to implement this reporting. To identify Workforce Development skills for HPC, Industry Relations has created an “HPC Skill Gap” survey. The survey will be conducted later in 2014, so no skills have been identified and we report a KPI of 0. This is on target. Finally, since no completed training has been tracked, we report a KPI of 0 for training assessment. This is on target.


A Glossary and List of Acronyms ACRONYM DESCRIPTION Notes A&AM Accounting & Account Management A&D Architecture & Design ADR Architecture Design Review AL2S Advanced Layer 2 Service Allows Internet2 users to create point-to-point VLANs AMIE Account Management Information Exchange API Application Interface CB Campus Bridging Infrastructure to make XSEDE resources appear to be proximal to the researcher's desktop CC Campus Champion co-Pi Co-Principal Investigator CRM Customer Relationship Management CS&E Computational Science & Engineering DNS Domain Name Service DNSKEY Domain Name Service Key DNSSEC DNS Security E&O Education & Outreach ECSS Extended Collaborative Service e-infrastructure the integration of networks, grids, data centers and collaborative environments, and are intended to include supporting operation centers, service registries, credential delegation services. ER External Relations ESRT Extended Support for Research Teams ESSGW Extended Collaborative Support for Science Gateways ESTEO Extended Support for Training, Education, and Outreach FTE Full Technical Equivalent GAAMP General Automated Atomic Model Parameterization GFFS Globus Federated File System GridFTP Grid File Transfer Protocol HPC High Performance Computing HPCU HPC University HSM Hardware Security Models I2 Internet2 IC Industry Challenge IdM Identity Management


ACRONYM DESCRIPTION Notes INCA/Nagios an service monitoring tool IR Incident Reports JIRA an activity tracking tool KB KB documents KPI Key Performance Indicators L2 WBS Level 2 L3 WBS Level 3 MFC Minority Faculty Council MS Microsoft MSI Minority Serving Institution NCAR National Center for Atmospheric Research NCSA National Center for Supercomputing Applications NICS National Institute of Computational Science NIP Novel & Innovative Projects OPS-SEC Operations - Cybersecurity OSG Open Science Grid OTP One Time Password PEP Program Execution Plan perfSONAR PERFormance Service Oriented Network monitoring Architecture PI Principal Investigator PM Project Management POPS Partnerships Online Proposal System this is no longer an acronym and POPS is just the name for the allocation submission system; being supplanted by XRAS

PSC Pittsburgh Supercomputing Center PY Program Year REST Representational state transfer: a software architectural style RESTful Conforming to the REST constraints rocks roll an open source cluster distribution solution that simplifies the processes of deploying, managing, upgrading, and scaling high- performance parallel computing clusters.

RSIG Recordset Signature RT Request Tracker Ticketing System SACNAS Society for Advancement of Chicanos and Native Americans SC14 Super Computing Conference in 2014 SD&I Software Development & Integration


ACRONYM DESCRIPTION Notes SDIACT Software Development & Integration Activity SDSC San Diego Supercomputer Center SH2 SH2 Security Shodor A National Resource for Computational Science Education SP Service Providers A supercomputer center STEM Science Technology Engineering Mathematics SysOPS Systems Operations TACC Texas Advanced Computing Center TEOS Training, Education & Outreach Service TeraGrid an e-Science grid computing infrastructure combining resources at eleven partner sites. TIS Technology Investigation Service UCCAN Use Case Canonical Use Case UCCB Campus Bridging Use Case UCDA Data Analytics Use Case UCDM Data Management Use Case UCF Federation and Interoperation Use Case UCFC First Connecting Instrumentation Use Case UCHPC High Performance Computing Use Case UCHTC High Throughput Computing Use Case UCSGW Science Gateway Use Case UCSW Scientific Workflow Use Case UCVIS Visualization Use Case UE User Services - User Engagement UII User Services User Interfaces and Information URC Under-represented communities URCE Under-Represented Community Engagement UREP User Requirement Evaluation & Prioritization URM Under-Represented Minority US United States WBS Work Breakdown Structure Numerical code for each group within XSEDE XCDB XSEDE Central Database The XDCDB contains 24 schemas, notably the accounting, resource repository, portal, and AMIE databases. XDCDB XSEDE Central Database XOC XSEDE Operation Center XRAC XSEDE Resource Allocation Committee XRAS XSEDE Resource Allocations Service


ACRONYM DESCRIPTION Notes XSEDE eXtreme Science and Engineering Discovery Environment XSEDE CA XSEDE Certificate Authorities Entity responsible for certifying encryption keys for identity management XSEDE KDC XSEDE Kerberos XSEDE14 XSEDE Conference in 2014 XSEDEnet an XSEDE-only network XSP XSEDE Scholars Program XTED XSEDE Technology Evaluation Database XUP XSEDE User Portal The XSEDE web pages at http://xsede.org XWFS XSEDE Wide File System


B Project Metrics

B.1 Project-wide Metrics

To demonstrate its success and help focus management attention on areas in need of improvement, XSEDE monitors a wide range of metrics in support of different aspects of “success” for the program. The metrics presented in this section provide a view into XSEDE’s user community, including its success at expanding that community, the projects and allocations through which XSEDE manages access to resources, and the subsequent use of the resources by the community.

Table 6 summarizes a few key measures of the user community, the projects and allocations, and resource utilization. Expanded information and five-year historical trends are shown in three corresponding subsections. From PY1 to PY3, the XSEDE user community continued to grow in terms of open user accounts, active individuals, engaged institutions, and other metrics. More details are in §B.1.1.

Project and allocation activity held strong, with resource requests growing from 2x the available resources in PY1 to 3.7x what was available in PY3, with XRAC recommending 1.7x what was available in the later quarters of PY3. More details are in §B.1.2.

Across program years, XSEDE resource metrics increased in most areas, but the PY3 figures mask quarterly declines at the end of the program year due to the decommissioning of Kraken. Total capacity declined to 11.36 Pflops (peak) after exceeding 12 Pflops in mid-PY3. The central accounting system showed a decreasing number of compute resources reporting activity. More details are in §B.1.3.

Table 6. Quarterly activity summary

User Community | PY1 | PY2 | PY3
Open user accounts | 9,352 | 10,695 | 11,910
Active individuals | 3,810 | 4,395 | 4,938
New user accounts | 2,801 | 3,716 | 3,975
Active fields of science | 29 | 30 | 30
Active institutions | 523 | 524 | 564
Projects and Allocations
NUs available at XRAC | 65.494B | 119.155B | 90.954B
NUs requested at XRAC | 135.674B | 253.475B | 337.896B
NUs recommended by XRAC | 81.953B | 117.038B | 157.202B
NUs awarded at XRAC | 65.973B | 105.389B | 91.195B
Open projects | 2,263 | 2,339 | 2,528
Active projects | 1,616 | 1,763 | 1,929
New projects | 838 | 923 | 988
Closed projects | 933 | 973 | 1,033
Resources and Usage
Resources open (all types) | 36 | 32 | 27
Total peak petaflops | 2.97 | 6.82 | 11.37
Resources reporting use | 22 | 16 | 15
Jobs reported | 4.89M | 5.45M | 4.47M
NUs delivered | 61.635B | 83.238B | 107.480B
Avg wtd run time (hrs) | 22.1 | 21.0 | 20.9
Avg wtd wait time (hrs) | 27.6 | 24.0 | 17.4


Avg wtd slow down | 3.6 | 3.1 | 2.5

B.1.1 User community metrics

Figure 12 shows the five-year trend in the XSEDE user community, including open user accounts, total active XSEDE users, active individual accounts, active gateway users, the number of new accounts, and the total number of unvetted user accounts (i.e., portal-only accounts) at the end of each quarter. Portal-only user accounts can be used for training course registration and other non-HPC services. The last quarter of PY3 saw 7,555 open accounts and a record 2,679 users charging jobs, continuing the strong growth in the XSEDE user community. In the third and fourth quarters of PY3, the number of active gateway users reported, 2,739 and 2,874, surpassed the number of active traditional users.

Figure 13 shows the activity on XSEDE resources according to field of science across program years, including the relative fraction of PIs, open accounts, active users, allocations, and NUs used according to discipline. For consistency across years, we show the ten fields of science that typically consume the most delivered NUs per quarter. PIs and users are counted more than once if they are associated with projects in different fields of science. The program year totals show that the percentage of PIs and user accounts associated with the “other” disciplines grew from 28% of all PIs and 35% of user accounts in PY1 to more than 35% of PIs and almost half of open user accounts in PY3. The number of users active in this “long tail” of science averaged nearly 40% of active traditional users in all three years, while collectively the “other” fields of science represented nearly 10% of total quarterly usage in each year.

Figure 14 shows the number of publications, conference papers, and presentations reported by XSEDE users each quarter over the past five years, showing dramatic and (to date) sustained growth since late 2012, corresponding primarily to the increasing number of quarterly allocation requests.


Figure 12. XSEDE user census, excluding XSEDE staff.

Figure 13. Quarterly XSEDE user, allocation, and usage summary by field of science, in order by usage, excluding staff projects. Note: PIs and users may appear under more than one field of science.


Figure 14. Publications, conference papers, and presentations reported by XSEDE users.

Table 7 and Table 8 highlight aspects of the broader impact of XSEDE. The former shows that graduate students, post-doctoral researchers, and undergraduates make up 65% of the XSEDE user base. The latter table shows XSEDE’s reach into targeted institutional communities. Institutions with Campus Champions represent a large portion of XSEDE’s usage because this table shows all users at Campus Champion institutions, not just those on the champion’s project. The table also shows XSEDE’s reach into EPSCoR states, the MSI community, and countries outside the U.S.

Table 7. End of quarter XSEDE open user accounts by type, excluding XSEDE staff.

Category | PY1 | PY2 | PY3
Graduate Student | 3,561 | 4,308 | 4,948
Faculty | 1,987 | 2,014 | 2,113
Postdoctorate | 1,377 | 1,514 | 1,596
Undergraduate Student | 1,007 | 1,237 | 1,465
University Research Staff (excl. postdocs) | 762 | 797 | 833
High School | 22 | 31 | 57
Others | 636 | 794 | 898
Totals | 9,352 | 10,695 | 11,910
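As a rough cross-check of the 65% figure quoted for Table 7, the share of open accounts held by graduate students, postdoctoral researchers, and undergraduates can be recomputed from the table's counts. The Python sketch below is illustrative only; the dictionary layout and the function name `trainee_share` are assumptions introduced here and are not part of any XSEDE reporting tool.

```python
# Open user accounts by category, copied from Table 7 (PY1, PY2, PY3).
TABLE_7 = {
    "Graduate Student": (3561, 4308, 4948),
    "Faculty": (1987, 2014, 2113),
    "Postdoctorate": (1377, 1514, 1596),
    "Undergraduate Student": (1007, 1237, 1465),
    "University Research Staff (excl. postdocs)": (762, 797, 833),
    "High School": (22, 31, 57),
    "Others": (636, 794, 898),
}

# Categories counted as students and postdocs (an assumption about the grouping).
TRAINEES = ("Graduate Student", "Postdoctorate", "Undergraduate Student")


def trainee_share(year_index):
    """Return the percentage of open accounts held by students and postdocs."""
    total = sum(counts[year_index] for counts in TABLE_7.values())
    trainees = sum(TABLE_7[name][year_index] for name in TRAINEES)
    return 100.0 * trainees / total


for i, label in enumerate(("PY1", "PY2", "PY3")):
    print(f"{label}: {trainee_share(i):.1f}% of open accounts")
```

Under these counts the share stays in the mid-60% range in each program year, which is consistent with the figure cited in the text.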

Table 8. Active institutions in selected categories. Institutions may be in more than one category.

Category | Metric | PY1 | PY2 | PY3
Campus Champions | Sites | 88 | 91 | 96
Campus Champions | Users | 1,807 | 2,157 | 2,428
Campus Champions | % total NUs | 47% | 49% | 48%
EPSCoR states | Sites | 89 | 80 | 88
EPSCoR states | Users | 504 | 594 | 715
EPSCoR states | % total NUs | 17% | 14% | 15%
MSIs | Sites | 19 | 19 | 23
MSIs | Users | 59 | 77 | 100
MSIs | % total NUs | 0.8% | 0.5% | 0.5%
International | Sites | 112 | 113 | 135
International | Users | 186 | 198 | 221
International | % total NUs | 5% | 3% | 2%
Total | Sites | 523 | 524 | 564
Total | Users | 3,810 | 4,395 | 4,938

B.1.2 Project and allocation metrics

Figure 15 shows the five-year trend for requests and awards at XSEDE quarterly allocation meetings. The figure shows the continued strong demand even as available resources decreased due to the upcoming decommissioning of Kraken at NICS. As of the end of PY3, the amount of XSEDE allocation awards decreased for the fourth consecutive quarter. NUs requested were regularly 3x or 4x greater than the NUs available, and the XRAC recommendations were 2x the NUs available. XSEDE awarded fewer NUs than recommended due to lack of available resources.

Table 9 presents a summary of overall project activity, and Table 10 shows projects and activity in key project categories as reflected in allocation board type, including the number of open and active user accounts with each type of project. (Note that Science Gateways may appear under any board, and users may be associated with more than one project.) Across program years, the number of active projects (those making use of resources) continued its steady increase. As a special class of projects, science gateway activity is detailed in Figure 16, showing continued high levels of usage and users from these projects.

Figure 15. Five-year allocation history, showing NUs requested, awarded, available, and recommended. Since September 2008, all meetings have considered any request that exceeded “small” limits.

Table 9. Project summary metrics

Project metric | PY1 | PY2 | PY3
XRAC requests | 565 | 666 | 765
XRAC request success | 91% | 84% | 80%
XRAC new awards | 183 | 215 | 228
Startup requests | 640 | 749 | 701
Startup request success | 87% | 80% | 90%
Projects open | 2,263 | 2,339 | 2,528
Projects new | 838 | 923 | 988
Projects active | 1,616 | 1,763 | 1,929
Projects closed | 933 | 973 | 1,033

Table 10. Project activity by allocation board type.

Board type | PY1 open projects | PY1 active projects | PY1 % NUs | PY2 open projects | PY2 active projects | PY2 % NUs | PY3 open projects | PY3 active projects | PY3 % NUs
Campus Champions | 81 | 44 | 0.2% | 126 | 70 | 0.4% | 150 | 89 | 0.4%
Discretionary | 10 | 8 | 0.5% | 1 | 2 | 0.0% | 1 | 1 | 0.1%
Educational | 92 | 59 | 0.1% | 109 | 81 | 0.4% | 122 | 94 | 0.3%
Staff | 51 | 30 | 0.3% | 765 | 20 | 0.4% | 876 | 18 | 0.1%
Startup | 1,298 | 799 | 3.4% | 29 | 828 | 3.2% | 30 | 913 | 3.7%
XRAC | 731 | 676 | 95.5% | 1,309 | 762 | 95.5% | 1,349 | 814 | 95.4%
Totals | 2,263 | 1,616 | 100% | 2,339 | 1,763 | 100% | 2,528 | 1,929 | 100%

Figure 16. Quarterly gateway usage (NUs), jobs submitted, users (reported by ECSS), registered gateways, and active gateways.

B.1.3 Resource and usage metrics

While the yearly totals showed continued increases in NUs delivered, SP systems delivered 24.7 billion NUs in Q2 2014, a decrease of about 15.6% from the previous quarter, and slightly less than the year-ago quarter. Table 11 breaks out the resource activity according to different resource types. Figure 17 shows the total NUs delivered by XSEDE computing systems, as reported to the central accounting system over the past five years.
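The quarter-over-quarter comparison above can be inverted to estimate the previous quarter's delivery. This is a minimal Python sketch; the function name `previous_value` is hypothetical, and the result is only as precise as the rounded 15.6% figure quoted in the text.

```python
def previous_value(current, pct_decrease):
    """Return the earlier value implied by a percentage decrease to `current`."""
    if not 0 <= pct_decrease < 100:
        raise ValueError("pct_decrease must be in [0, 100)")
    return current / (1.0 - pct_decrease / 100.0)


q2_2014_nus = 24.7e9   # NUs delivered in Q2 2014 (from the text)
decrease_pct = 15.6    # approximate decrease from the previous quarter (from the text)

prior = previous_value(q2_2014_nus, decrease_pct)
print(f"Implied previous-quarter delivery: about {prior / 1e9:.1f} billion NUs")
```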


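Table 11. Resource activity, by type of resource, excluding staff projects. Note: A user will be counted for each type of resource used.

Resource type | Metric | PY1 | PY2 | PY3
High-performance computing | Resources | 12 | 9 | 8
High-performance computing | Jobs | 4,254,451 | 4,852,251 | 3,966,378
High-performance computing | Users | 3,112 | 3,577 | 4,098
High-performance computing | NUs | 59,201,254,432 | 73,819,704,877 | 97,771,892,943
Data-intensive computing | Resources | 5 | 2 | 2
Data-intensive computing | Jobs | 207,708 | 485,871 | 451,542
Data-intensive computing | Users | 887 | 1,217 | 1,298
Data-intensive computing | NUs | 2,306,625,154 | 8,658,055,853 | 10,328,428,095
High-throughput computing | Resources | 2 | 2 | 2
High-throughput computing | Jobs | 1,545,229 | 971,168 | 52,606
High-throughput computing | Users | 42 | 38 | 30
High-throughput computing | NUs | 61,027,019 | 289,470,322 | 612,548,255
Visualization system | Resources | 3 | 3 | 3
Visualization system | Jobs | 55,289 | 32,300 | 23,987
Visualization system | Users | 273 | 239 | 126
Visualization system | NUs | 265,871,493 | 249,897,404 | 79,774,886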

Figure 17. Total XSEDE resource usage in NUs.

Finally, Table 12 presents some summary metrics to reflect aggregate “usage satisfaction,” including the average run time, wait time, response time (run + wait), and slow down (or expansion factor). These values are presented as unweighted averages, which show the impact of small jobs, and as averages weighted by each job’s portion of the workload (in core-hours), which show responsiveness to the jobs responsible for most of the delivered NUs. The weighted averages eliminate the skew due to small jobs and show a much more realistic average perceived slowdown for the work delivered.
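To illustrate why the two kinds of averages in Table 12 can differ so sharply, the sketch below computes slowdown both ways for a small, made-up set of jobs. It assumes the common definition of slowdown (expansion factor) as response time divided by run time, and it weights each job by its core-hours; the job records and all names in the code are hypothetical and do not come from XSEDE accounting data.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cores: int
    run_hours: float
    wait_hours: float

    @property
    def core_hours(self):
        return self.cores * self.run_hours

    @property
    def slowdown(self):
        # Assumed definition: response time (wait + run) divided by run time.
        return (self.wait_hours + self.run_hours) / self.run_hours


def unweighted_mean_slowdown(jobs):
    """Every job counts equally, so short jobs with long waits dominate."""
    return sum(j.slowdown for j in jobs) / len(jobs)


def weighted_mean_slowdown(jobs):
    """Each job counts in proportion to the core-hours it consumed."""
    total = sum(j.core_hours for j in jobs)
    return sum(j.slowdown * j.core_hours for j in jobs) / total


# Hypothetical workload: two small jobs and one large job.
jobs = [
    Job(cores=1, run_hours=0.01, wait_hours=4.0),
    Job(cores=16, run_hours=0.5, wait_hours=2.0),
    Job(cores=2048, run_hours=24.0, wait_hours=30.0),
]

print(f"Unweighted mean slowdown: {unweighted_mean_slowdown(jobs):.1f}")
print(f"Core-hour-weighted mean slowdown: {weighted_mean_slowdown(jobs):.2f}")
```

In this toy workload the single very short job inflates the unweighted mean, while the weighted mean stays close to the slowdown of the large job that accounts for nearly all of the core-hours.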


Table 12. Usage satisfaction metrics, for HPC and data-intensive computing resources only, excluding staff projects.

Average type | Job attribute | PY1 | PY2 | PY3
Unweighted average | Run time (hrs) | 3.0 | 2.0 | 2.0
Unweighted average | Wait time (hrs) | 4.4 | 4.2 | 3.5
Unweighted average | Response time (hrs) | 8.2 | 7.3 | 6.4
Unweighted average | Slow down | 397.2 | 475.4 | 322.6
Weighted average | Wtd run time (hrs) | 22.1 | 21.0 | 20.9
Weighted average | Wtd wait time (hrs) | 27.6 | 24.0 | 17.4
Weighted average | Wtd response time (hrs) | 49.7 | 45.0 | 38.2
Weighted average | Wtd slow down | 3.6 | 3.1 | 2.5

B.1.4 Data Services

XSEDE supports monitoring for the Globus data transfer service for connecting XSEDE service providers and external sites. Table 13 shows summary metrics and increasing Globus adoption over the past three years. Figure 18 shows the trends in Globus data transfer activity from the start of the XSEDE program, with the high points labeled for each metric.

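Table 13. Globus Online activity to and from XSEDE endpoints, excluding GO XSEDE speed page user.

Group | Metric | PY1 | PY2 | PY3
To/from XSEDE endpoint | Files into XSEDE endpoints (millions) | 83.1 | 147 | 405.1
To/from XSEDE endpoint | TB into XSEDE endpoints | 1,478 | 2,260 | 4,928.9
To/from XSEDE endpoint | Files from XSEDE (millions) | 77 | 147.3 | 364.6
To/from XSEDE endpoint | TB from XSEDE | 613 | 1,953 | 4,790
To/from XSEDE endpoint | Users (max. quarterly #) | 187 | 258 | 392
To/from XSEDE via Globus Connect | Files to XSEDE (millions) | 12.5 | 32.2 | 70.03
To/from XSEDE via Globus Connect | TB to XSEDE | 123 | 207 | 234.5
To/from XSEDE via Globus Connect | Files from XSEDE (millions) | 5.9 | 28.3 | 52.62
To/from XSEDE via Globus Connect | TB from XSEDE | 74 | 160 | 289.1
To/from XSEDE via Globus Connect | Users (max. quarterly #) | 97 | 141 | 247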


Figure 18. Aggregate Globus data transfer activity into XSEDE endpoints by quarter.

B.1.5 User Information & Interfaces

The User Information and Interfaces group provides XSEDE users with central information and services via the XSEDE User Portal (XUP), web site, XUP mobile, and knowledgebase. Table 14 shows activity on the primary user information interfaces, as well as increases in the numbers of logged-in users accessing these interfaces. The sharp drop in web and portal hits in PY3 is due to a change in measurement tools. The UII team switched to Google Analytics, which provides more in-depth information and more accurate hit counts because it counts hits to specific pages, not just server hits. For annual metrics, most quarterly values were summed, but metrics marked with an asterisk represent the largest quarterly value during the program year.

Table 14. XSEDE web site, user portal and XUP Mobile activity. (Note: “Users” indicates logged-in users.)

UII Activity | PY1 | PY2 | PY3
Web hits | 8,382,658 | 12,771,356 | 479,705
Web visitors | 107,765 | 174,516 | 84,908
XUP hits | 4,722,761 | 7,584,658 | 904,582
XUP visitors | 45,336 | 80,946 | 73,909
XUP accounts* | n/a | 14,848 | 21,587
XUP users* | 3,976 | 5,722 | 6,426
XUP users running jobs* | n/a | 1,818 | 2,732
XUP Mobile hits | 5,588 | 19,010 | 44,346
XUP Mobile users* | 48 | 21 | 46
KB docs retrieved | 207,888 | 662,182 | 1,236,068
Total KB docs* | 497 | 586 | 584
New KB docs | 141 | 124 | 13
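The annual roll-up rule used for Table 14 (sum the quarterly values, except for asterisked metrics, which take the largest quarterly value) can be sketched as follows. The quarterly numbers below are invented for illustration and are not XSEDE data, and the names `PEAK_METRICS` and `annual_value` are assumptions introduced here.

```python
# Metrics reported as the largest quarterly value (asterisked in Table 14).
PEAK_METRICS = {
    "XUP accounts",
    "XUP users",
    "XUP users running jobs",
    "XUP Mobile users",
    "Total KB docs",
}


def annual_value(metric, quarterly_values):
    """Aggregate four quarterly values into one program-year value."""
    if metric in PEAK_METRICS:
        return max(quarterly_values)
    return sum(quarterly_values)


# Hypothetical quarterly inputs, for illustration only.
quarterly = {
    "Web hits": [100, 120, 110, 130],
    "XUP users": [50, 55, 61, 58],
}

annual = {metric: annual_value(metric, values) for metric, values in quarterly.items()}
print(annual)
```

Keeping the list of peak-valued metrics in one set makes it easy to see which rows are not additive across quarters.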


B.2 Area Metrics

B.2.1 XSEDE Project Office (WBS 1.1) (John Towns)

B.2.1.1. Project Management and Reporting (WBS 1.1.1) (Justin Whitt)

Metric | PY4 Target | PY1 | PY2 | PY3
# total XSEDE conference attendees | 664 | 616 | 720 | 632
ratio of new/previously attendees | TBD | * | * | *
Increase in total XSEDE conference attendees (KPI) | 5% | * | +17% | -12%
# PM best practices training events | 4 | * | 3 | 6
# improvements implemented | 50 | 19† | 36† | 39
# reports delivered within 30 days of quarter close | 4 | 4 | 3 | 1
Usefulness of reports from sponsor feedback | 4.5 | * | * | *
# of total risks | TBD | 92 | 160 | 188
# of new risks | TBD | * | 83 | 48
# of retired risks | TBD | 19 | 34 | 54
# of triggered risks | TBD | 0 | 1 | 4

B.2.1.2. Systems & Software Engineering (WBS 1.1.2) (Janet Brown)

Metric | Target | PY1 | PY2 | PY3
# of use case specifications started | TBD | * | 39 | 7
# of use case specifications in process | TBD | * | 6 | 6
# of use case specifications completed | TBD | * | 33 | 7
# of documents archived | TBD | * | 84 | 62
# of hits on archived documents | TBD | * | 445 | 2947

B.2.1.3. Architecture and Design (WBS 1.1.3) (David Lifka)

Metric | Target | PY1 | PY2 | PY3
# of canonical use cases completed (through ADR) | 10 | * | 0 | 4
# of use cases fully documented | 10 | * | 65# | 71#
# of use cases in development | >15 | 59 | 6 | 5
# of use case L3 decompositions completed | ≥10 | * | 0 | 4
# L3 decompositions in progress | ≥10 | * | 8 | 8
# of use cases submitted for Active Design Reviews | ≥10 | * | 0 | 4
% of use cases submitted for ADR that are returned for significant corrections/clarifications | <0.75 | * | * | *
Per use case A&D relative effectiveness = (estimated task duration) / (actual task duration) (KPI) | >0.8 | * | * | *
(number of L3 decompositions completed)/(number of L3 decompositions planned) (KPI) | >0.8 | * | * | 0.4

# Actuals are much greater than target because the Architects focused on the targeted 10 canonical use cases while other XSEDE staff people worked on additional use cases.
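The two ratio KPIs at the end of this table are simple quotients. The Python sketch below shows how they might be evaluated against the >0.8 target. The planned count of 10 decompositions is an assumption chosen to reproduce the reported value of 0.4, the task durations are hypothetical, and the function names are introduced here for illustration.

```python
TARGET = 0.8  # both ratio KPIs share a target of > 0.8


def ratio_kpi(numerator, denominator):
    """Return numerator / denominator, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def meets_target(value, target=TARGET):
    return value > target


# L3 decompositions: 4 completed in PY3; 10 planned is an assumption.
l3_ratio = ratio_kpi(4, 10)

# Relative effectiveness: estimated vs. actual task duration (hypothetical days).
effectiveness = ratio_kpi(30.0, 45.0)

print(f"L3 completed/planned: {l3_ratio:.2f}, meets target: {meets_target(l3_ratio)}")
print(f"Relative effectiveness: {effectiveness:.2f}, meets target: {meets_target(effectiveness)}")
```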

B.2.1.4. External Relations (WBS 1.1.4) (Travis Benjamin Tate)

Metric | Target | PY1 | PY2 | PY3
# website page hits | TBD | * | * | 481,472
# of sessions | 40,000/quarter | * | * | 173,240
# unique visitors | TBD | * | * | 75,174
# time spent on page | TBD | * | * | 2:51
# referrals | TBD | | |
# new Twitter followers | 5% increase quarterly | * | * | *
# new Facebook likes | 5% increase quarterly | * | * | *
# of science highlights | 12 per quarter | * | * | *
# of media hits | TBD | | |
# of press releases | 4 per quarter | * | * | *

B.2.1.5. Industry Relations (WBS 1.1.5) (Dave Hudak)

Metric | Target | PY1 | PY2 | PY3
# of Industry Challenge presentations made at XSEDE Conference | 2 | 0 | 0 | 2
# of Industry Challenge projects renewed | 2 | 0 | 0 | 0
# of skills deemed desirable | 5 | 0 | 0 | 0
# of skills covered in XSEDE training | 5 | 0 | 0 | 0
# of certificate opportunities for desirable skills | 2 | 0 | 0 | 0

B.2.1.6. Software Development and Integration (WBS 1.1.6) (JP Navarro)

Metric | Target | PY1 | PY2 | PY3
% Active Design Reviews (ADRs) completed within 2 months (# ADRs) | >80% | * | * | 50% (4)
% of new or improved components traceable to prioritized use cases or devoted to process improvement | >90% | * | 100% | 100%
% of planned components delivered to operators within 30 days of planned time | >75% | * | 0% | 0%
# of components delivered to operators | TBD | 0 | 6 | 7
% of activities proposed for increment worked on or completed thru this quarter (increment and ending quarter) | TBD | 50% | 35.5% | 33%
% of delivered components rated 4 out of 5 or higher by operators (KPI) | >90% | * | 100% (6) | 100% (5)
% availability of engineering and information services (KPI) | >99% | 99.96% | 99.83% | 99.99%
Average time from escalated support to known solution communicated to user (non-defects) | <30 days | * | 3 (2) | 25 (43)
# component defects reported | TBD | * | 10 | 6
% of known defects repaired in delivered components (KPI) | >90% | * | 100% (6) | 100% (5)
Average time from defect report to known solution communicated to user | <60 days | * | 62 (5) | 134 (11)

B.2.1.7. Project Administration (WBS 1.1.7) (John Towns)

Metric | Target | PY1 | PY2 | PY3
# MOU’s | TBD | | |
# joint activities | TBD | | |
# engagement projects | TBD | | |
amount of funding generated | TBD | | |
% of penetration of SEI, Baldrige, Logic Model | TBD | | |
assessment results/findings | TBD | | |
Effort contributed/money, | TBD | | |
# of customers | TBD | | |
# of services offered, hit rate, | TBD | | |
# external partnerships (e.g. PRACE, RIKEN, Blue Waters, HPC Wales, SIGHPC, SIGAPP) | TBD | | |
# innovations | TBD | | |

B.2.2 XSEDE Operations (WBS 1.2) (Greg Peterson) B.2.2.1. Cybersecurity (WBS 1.2.1) (Randy Butler and Jim Marsteller) Metric Target PY1 PY2 PY3

Number of SD&I reviews completed TBD NA NA 7 Number of potential account credential exposures TBD 2 ~5,0001 Number of XSEDE security incidents TBD 0 0 0 % Downtime resulting from compromised accts or security <24 <24 <24 TBD incidents hours hours hours Time to randomize/disable XSEDE accounts TBD 1 0 1 Number security training developed TBD 1 1 1 Number security training events TBD NA NA 7 - Denotes metrics that were not being recorded during these project years. 1. See the Cybersecurity section (4.4.1) in the body of the report.


B.2.2.2. Data Services (WBS 1.2.2) (Chris Jordan)

Metrics | Target | PY1 | PY2 | PY3
New services added | TBD | - | 2 | 2
Services retired | TBD | - | 0 | 0
Total XWFS users¹ | TBD | - | 11 | 79
Total new XWFS users | TBD | - | 11 | 68
Number of XWFS allocations requested | TBD | - | 5 | 53
Number of XWFS allocations granted | TBD | - | 3 | 20
XWFS total allocation size requested (TBs) | TBD | - | 35 | 813
XWFS total allocations size granted (TBs) | TBD | - | 18 | 161
Total XWFS storage used (TBs) | TBD | - | - | 132
Total XWFS storage available (TBs) | TBD | - | 503 | 503
Total Globus Online users | TBD | - | - | 48
Total new Globus Online users | TBD | - | - | 241
Total transfers (Million) inbound | TBD | - | - | 280
Total transfers (Million) outbound | TBD | - | - | 2,282
Size of transfers (TBs) inbound | TBD | - | - | 2,628
Size of transfers (TBs) outbound | TBD | - | 2 | 2
Intra-XSEDE GridFTP transfer information:
Number of SPs reporting/total SPs² | TBD | - | 1/12 | 4/12²
Total number of transfers (Millions) | TBD | - | 646 | 723
Total data transferred in TBs | TBD | - | 58 | 767.7
Average file size per transfer in GBs | TBD | - | 0.089 | 1.06
Average transfer rate in Mbps | TBD | - | 137 | 326

- Denotes metrics that were not being recorded during these project years. 1. Because the XWFS was not added as a service until Q2 of 2013, there is no metric data for the service prior to that quarter. 2. Note that while 4 sites contributed some data to the total reporting for the project year, for 3 out of the 4 quarters only 2 or 3 sites had data available to contribute, as the GridFTP logging effort was rolled out over the course of the year.

B.2.2.3. XSEDEnet (WBS 1.2.3) (Joe Lappa)

Metrics | Target | PY1 | PY2 | PY3
Total number of days in which any Network Interface error occurred | TBD | - | - | 47
XSEDEnet maximum bandwidth used (Gbps) | TBD | - | - | 3.771
XSEDEnet data transfer by traffic type (TBs):
Number of SPs reporting/total SPs | TBD | - | - | 4/12
SSH (TBs) | TBD | - | - | 573.4
GridFTP (TBs) | TBD | - | - | 2,724.4
PerfSonar (TBs) | TBD | - | - | 298.1
Also see Data Services intra-XSEDE transfer information.

- Denotes metrics that were not being recorded during these project years.
1. Internet2 experienced errors in collecting metrics for the last quarter of PY3; their data is incomplete and potentially inaccurate and thus is not being reported. This value represents an average bandwidth used during first three quarters of PY3.

B.2.2.4. Software Testing & Deployment (WBS 1.2.4) (Tabitha Samuel)

Metric | Target | PY1 | PY2 | PY3
Operational readiness reviews (ORR) initiated | TBD | 3 | 6 | 8
Operational readiness reviews (ORR) completed | TBD | 3 | 6 | 7
Activities accepted | TBD | 1 | 7 | 7
Activities rejected | TBD | 1 | 0 | 0
Activities released for production | TBD | 0 | 1 | 3
Activities released for beta | TBD | 1 | 1 | 0
Activities released for campus bridging | TBD | 0 | 0 | 1
Activities released for enterprise | TBD | 2 | 1 | 0
Activities upgraded (production, beta, campus bridging, enterprise) | TBD | 3 | 2 | 0

B.2.2.5. Accounting & Account Management (WBS 1.2.5) (Amy Schuele)

Metric | Target | PY1 | PY2 | PY3
Account Creation time (days) | TBD | 2.4 | .03 | .03
POPS releases | TBD | 9 | 19 | 6
POPS new features | TBD | - | - | 8
AMIE/XDCDB releases | TBD | 26 | 52 | 34
AMIE/XDCDB new features | TBD | 29 | 56 | 37

- Denotes metrics that were not being recorded during these project years.

B.2.2.6. Systems Operational Support (WBS 1.2.6) (Stephen McNally)

Metric | Target | PY1 | PY2 | PY3
Total enterprise services | TBD | 34 | 41 | 55
Services added | TBD | - | 14 | 14
Services removed | TBD | - | 7 | 0
Enterprise services % availability | TBD | 99.6 | 99.7 | 99.9

- Denotes metrics that were not being recorded during these project years.

Metric | Target | PY1 | PY2 | PY3
Support tickets opened for XOC | TBD | 1758 | 826 | 1905
Support tickets closed by XOC | TBD | 1277 | 589 | 1882
Mean time to first response by XOC (hrs) | TBD | * | * | 1.5
Mean time to first resolution by XOC (hrs) | TBD | * | * | 4.7

* Metrics not available from previous ticket system

B.2.2.7. User Services (WBS 1.3) (Chris Hempel)

Metric | Target | PY1 | PY2 | PY3
# of feedback activities, # of user inputs gathered, # of inputs addressed | TBD | *,*,* | *,*,* | ,33,8
# of new XUP features released, average satisfaction (as determined by targeted user surveys) of new features (KPI) | 3 | 4,*** | 11,*** | 4,***
Targeted user survey to assess the consulting process, and quality of documentation; improve time-to-solution of tickets (KPI) | 4 | *** | *** | ***

*Not available ***Targeted user surveys not conducted


B.2.2.8. Training (WBS 1.3.1) (Vince Betro)

Metric | Target | PY1 | PY2 | PY3
# live training sessions (Goal: 50 live training sessions) | 50 | 70 | 56 | 98
# of new training modules (goal: 5) | 5 | 36 | 44 | 57
# total trainees (KPI) | 2,800 (10% increase) | 1489 | 2910 | 5221
% of trainees who are new to XSEDE training | 20% | 100% | 66% | 64%
# of new tailored training materials for discipline specific communities delivered | 2 | 3 | 2 | 7
Average course satisfaction rating | 4 out of 5 | 4.21 | 4.34 | 4.01

B.2.2.9. User Information & Interfaces (WBS 1.3.2) (Maytal Dahan)

Metric | Target | PY1 | PY2 | PY3
# of new features added | 3 | 4 | 11 | 4
# of new KB documents | 5 | 36 | 10 | 16
# XUP Hits | | | |

B.2.2.10. User Engagement (WBS 1.3.3) (Bobby Whitten)

Metric | Target | PY1 | PY2 | PY3
# feedback emails/tickets | TBD | * | * | 24
# feedback actions | TBD | * | * | 8
% feedback responses within 1 week (KPI) | 90 | * | * | 98
# of monthly/quarterly PI contact emails sent | TBD | 631 | 1253 | 1317
# stale tickets by age (30, 60, 90 day) | TBD | * | * | 237,215,618
# stale tickets addressed | TBD | * | * | 8
Overall annual satisfaction for support response time and support effectiveness (KPI) | 4.0 | * | * | 4.275
#/% survey responses | 15% | * | * | 27%
# interviews | TBD | * | * | 30
# participants in focus groups | TBD | * | * | 70
# total user inputs gathered | TBD | * | * | 33
# user inputs that lead to actionable items (KPI) | TBD | * | * | 22

* Not available due to data not being retrievable from old ticket system

B.2.2.11. Allocations (WBS 1.3.4) (Ken Hackworth)

Metric | Target | PY1 | PY2 | PY3
# of XRAC meetings (Complete 4 XRAC meeting) | 4 | 4 | 4 | 4
# of “How to Write a Successful Proposal” workshops conducted (goal: 8) | 8 | 8 | 8 | 8
Targeted user survey to assess the allocations process (KPI) | 4 out of 5 | | |
Annual User Survey Overall Allocations Process Satisfaction (KPI) | 4 out of 5 | 4.13 | 4.02 | 4.05
For XRAC: | TBD | | |
# proposals by type: New | TBD | 221 | 246 | 303
# proposals by type: Prog. Report | TBD | 15 | 7 | 5
# proposals by type: Renewal | TBD | 306 | 365 | 423
# proposals by type: Advance | TBD | 109 | 134 | 186
# proposals by type: Justification | TBD | 25 | 17 | 12
# proposals by type: Supplement | TBD | 102 | 104 | 150
# proposals by type: Transfer | TBD | 498 | 366 | 266
# proposals by type: Extension | TBD | 246 | 260 | 263
# NUs requested (overall) | TBD | 141.5B | 207B | 305.5B
# NUs offered (overall) | TBD | 67B | 89B | 117B
# NUs recommended (overall) | TBD | 79B | 103B | 144B
# NUs allocated (overall, system) | TBD | 65.8B | 84.5B | 107.6B
For non-XRAC allocations: | TBD | | |
# proposals by type: New | TBD | 648 | 745 | 870
# proposals by type: Renewal | TBD | 150 | 126 | 144
# proposals by type: Supplement | TBD | 128 | 108 | 122
# proposals by type: Transfer | TBD | 287 | 159 | 150
# proposals by type: Extension | TBD | 204 | 171 | 203
# proposals approved: New | TBD | 572 | 645 | 711
# proposals approved: Renewal | TBD | 138 | 106 | 118
# proposals approved: Supplement | TBD | 108 | 92 | 87
# proposals approved: Transfer | TBD | 266 | 150 | 124
# proposals approved: Extension | TBD | 190 | 162 | 183
# NUs allocated | TBD | 6.014B | 7.836B | 9.577B

B.2.2.12. Extended Collaborative Support Service - ECSS (WBS 1.4 and 1.5) Metric Target PY1 PY2 PY3


# of ECSS staff intensive workshops/year (Begins in PY4) 2 per year 0 0 0 # of attendees at ECSS staff intensive workshops TBD 0 0 0 assessment of training impact of staff TBD workshops n/a n/a n/a Staff Climate Survey Results (are they satisfied with the level of training over time) (KPI) 4-5 rating 2.76 2.91 # of contract hires active during the quarter TBD 1.75 # of new projects (startup or XRAC) that were assisted by the contract hire TBD n/a n/a 25 # of symposiums given 10 8 10 9 # attending symposiums TBD n/a n/a 302 # of views of presentation pages (including TBD XSEDE YouTube channel) n/a 664 1166 # startup projects from new communities 20 n/a n/a 28 # XRAC projects from new communities 10 n/a n/a 12 # completed ECSS projects (Goal: sum of all goals(ESRT, ESCC, ESSGW)) (KPI) 45 8* 51 59 Avg. value of effort rating from PI across programs follow-up interview (Goal: 4 of 5) (KPI) 4 out of 5 n/a 4.13 4.56 # of adaptive proposals reviewed TBD n/a n/a 261 # of adaptive reviews completed on time (goal: 1 week before each XRAC meeting) TBD n/a n/a n/a # of ECSS staff for adaptive reviews TBD n/a n/a 38.5 average # of reviews/staff member TBD n/a n/a 2.39 # startup projects from new communities that use 10% of their allocation on at least one resource within 1 year of initial award or successful XRAC proposal (KPI) 20 n/a n/a 21 # XRAC projects from new communities that use 10% of their allocation on at least one resource within 1 year of initial award or successful renewal (KPI) 10 n/a n/a 8

B.2.2.13. ECSS – Projects (WBS 1.4) (Ralph Roskies)

B.2.2.13.1. Extended Support for Research Teams (WBS 1.4.1) (Mark Fahey)

Metric | Target | PY1 | PY2 | PY3
# of projects initiated | TBD | 86 | 79 | 70
# of projects discontinued | TBD | 28 | 33 | 21
# of project with workplans (Goal: 25 projects with workplans every year, workplans within 45 days of initial PI contact) | 25 | 34 | 43 | 35
# projects with workplans completed (Goal: 90%) | 23 | 2* | 37 | 46
# of final reports (goal: 85% of completed projects to have final reports within 3 months) | 22 | 5* | 25 | 24
# of PI interviews (goal: 100% completed projects with interviews) | 25 | NA | 11 | 28
Quality Assessment (goal: avg 4 of 5 for value of effort) | 4 | NA | 4.13 | 4.53
# of completed projects (KPI) | 25 | 5* | 37 | 46

* These numbers are lower bounds. Many projects inherited from the TG ASTA program did not provide final reports and did not have workplans.
** The 45 day target is not reflected in these numbers.

B.2.2.13.2. Novel and Innovative Projects (WBS 1.4.2) (Sergiu Sanielevici)

Metric | Target | PY1 | PY2 | PY3
# of new startup projects from new communities (goal: 20) | 20 | NA | NA | 28
# of first-time XRAC projects from communities (goal: 10) | 10 | NA | NA | 12
# of users on startup projects from new communities (include count of users of project, users that submit jobs) | TBD | NA | NA | 54,28
# of users on 1st time XRAC projects for new communities (include count of users of project, users that submit jobs) | TBD | NA | NA | 39,11
# startup projects from new communities that use 10% of their allocation on at least one resource within 1 year of initial award or successful XRAC proposal (KPI) | 20 | NA | NA | 21
# XRAC projects from new communities that use 10% of their allocation on at least one resource within 1 year of initial award or successful renewal (KPI) | 10 | NA | NA | 8

B.2.2.14. ECSS – Communities (WBS 1.5) (Nancy Wilkins-Diehr)

B.2.2.14.1. Extended Support for Community Codes (WBS 1.5.1) (John Cazes)

Metric | Target | PY1 | PY2 | PY3
# of projects initiated | TBD | 29* | 23 | 29
# of projects discontinued | TBD | 6 | 14 | 13
# of project with workplans (Goal: 10 projects with workplans every year, workplans within 45 days of initial PI contact) | 10 | 10 | 9 | 12
# projects completed (Goal: 90%) (KPI) | 10 | 1** | 10 | 11
# of final reports (goal: 85% of completed projects to have final reports within 3 months) | 10 | 1** | 10 | 5
# of applicable XSEDE community code documentation (how tos) | TBD | 0 | 2 | 1
# of PI interviews | TBD | 0 | 0 | 7
Quality Assessment (goal: avg 4 of 5 for value of effort for XRAC projects) | TBD | NA | NA | 4.58

* Since many requests concerned the use of the Trinity code, they were combined into one internal project.
** Most projects were 9-12 months long and completed in PY2.
** Initial interviews did not include a quality assessment request with a 5 point scale

B.2.2.14.2. Extended Science Gateways Support (WBS 1.5.2) (Suresh Marru)

Metric | Target | PY1 | PY2 | PY3
# of projects initiated | TBD | 17 | 12 | 16
# of projects discontinued | TBD | 5 | 2 | 0
# of project with workplans (Goal: 10 projects with workplans every year, workplans within 45 days of initial PI contact) | 10 | 4 | 4 | 1
# projects completed (Goal: 90%) (KPI) | 10 | 2 | 9 | 18
# of final reports (goal: 85% of completed projects to have final reports within 3 months) | 10 | 2 | 6 | 16
# of PI interviews | TBD | 0 | 1 | 8
Quality Assessment (goal: avg 4 of 5 for value of effort) | TBD | NA | NA | 4.5
# of presentations given | TBD | 11 | 14 | 21
# attending | TBD | 187 | 213 | 308
# of views of presentation pages | TBD | NA | NA | NA
# of science gateway architectural use cases derived from engaging with larger Science Gateway Community | TBD | 0 | 5 | 0
# of activities and components reviewed by ESSGW staff for applicability, relevance and viability of use by science gateways | TBD | 0 | 8 | 20
# of new e-infrastructure feature requirements provided or solutions verified by science gateway community (KPI) | TBD | NA | NA | NA

** Initial interviews did not include a quality assessment request with a 5 point scale

B.2.2.14.3. ECSS Communities – Extended Collaborative Support for Training, Education and Outreach (WBS 1.5.3) (Jay Alameda)

Metric | Target | PY1 | PY2 | PY3
# of live training events staffed (Goal: Staff a total of 50 live training events) | 50 | 88 | 33 | 71
# Requests for Service | TBD | 8 | 4 | 36
# of training modules reviewed | TBD | 36 | 24 | 9
# of new training modules produced | TBD | 7 | 7 | 18
# retired | TBD | NA | NA | 0
Assessment of ESTEO trainer by attendees for live events (goal: 4 out of 5 on satisfaction) | 4 out of 5 | NA | NA | NA
# of CC fellows (KPI) | TBD | 0 | 4 | 6
Average score of Fellows assessment of the program (goal: 4 out of 5 on satisfaction) (KPI) | 4 out of 5 | NA | 4.3 | 4.25
# Meetings and BOFs | TBD | 32 | 45 | 35
# Mentoring engagements | TBD | 20 | 20 | 17
# Talks and Presentations | TBD | 10 | 40 | 42
# Education Proposals reviewed | TBD | NA | 12 | 48
# Education Course Support | TBD | NA | NA | 6
# of staff training events | TBD | NA | NA | 0
# of attendees | TBD | NA | NA | 0
Average assessment of staff training impact | TBD | NA | NA | NA

B.2.3 Education and Outreach (WBS 1.6) (Scott Lathrop)

No additional metrics.

B.2.3.1. Education (WBS 1.6.1) (Steve Gordon)

Metric | PY4 Target | PY1 | PY2 | PY3
# of new HPC University modules | 10 | * | * | 5
# of HPC University module downloads | 14,000 | * | * | 13,783
# of HPC University module views | 7,000 | * | * | 6,925
# of institutional partners & contributors to development of HPC University modules | 20 | 20 | 20 | 20
# of HPC University modules reviewed (KPI) | 60 | * | * | *
# of CS&E/HPC certificate programs created (KPI) | 3 | 0 | 0 | 3
# of CS&E/HPC degrees created (KPI) | 3 | 0 | 0 | 0
# of courses offered | 2 | 0 | 0 | 1
# external partnerships (e.g. PRACE, RIKEN, Blue Waters, HPC Wales, SIGHPC, SIGAPP) | 7 | 6 | 6 | 6

* Site initiation and measurement started in PY3 for these elements

B.2.3.2. Campus Bridging (WBS 1.6.5) (Craig Stewart)

Metric | Target | PY1 | PY2 | PY3
# campuses with validated installs of Campus Bridging Tools (KPI) | 100 | 17 | 40 | 52
# of Campus Bridging Tools downloads | 100 | 6 | 33 | 86
# Sites deploying Globus Online | 1000 | 17 | 39 | 48
PB of data moved using Campus Bridging tools | 10 | 1.8 | 3.3 | 5.3
# jobs submitted to XSEDE using CB tools | 100 | 0 | 256 | 0
# clusters built with CB Rocks Roll | 50 | 0 | 2 | 3
# clusters with XSEDE rpms installed | 100 | 0 | 4 | 9
# users of campus systems with XSEDE-compatible software | 1000 | 136 | 154 | 166
# Short-term/extended consults | 25 | 9/12 | 3/3 | 14/3
# Live outreach events | 50 | 6 | 10 | 13
# attendees | 1000 | 158 | 294 | 380
# virtual workshops created/accessed/#attendees | 3/600 | 0 | 0 | 0
# of users who use one or more Campus Bridging tools (KPI) | 1000 | 0 | 4 | 8

B.2.3.3. Champions Program (WBS 1.6.6) (Kay Hunt)

Metric | PY4 Target | PY1 | PY2 | PY3
# of campus champions | 253 | 159 | 204 | 230
# of new campus champions | 10% growth | 36 (26%) | 45 (25%) | 26 (12%)
# of domain champions | TBD | 0 | 0 | 5
# of new domain champions | +2/year | 0 | 0 | 5
# of regional champions | TBD | 0 | 0 | 0
# of new regional champions | +2/year | 0 | 0 | 0
# of users requesting support from a Champion during the allocations process | 100 | 62 | 917 | 770
# of Campus Champion Institutions | 175 | 118 | 147 | 165
# of outreach events hosted by Champions | 20 | * | * | *
# of users who were on a Champion allocation | 20 | * | * | *
# of MSI institutions with a champion | TBD | 9 | 12 | 12
# of EPSCoR institutions with a champion | TBD | 43 | 47 | 53

B.2.3.4. Under-represented Community Engagement (WBS 1.6.7) (Linda Akli)

Metric | Target | PY1 | PY2 | PY3
# URM and MSI faculty with new portal accounts (KPI) | 160 | 196 | 369 | 362
# URM and MSI participants in non-URCE led training | 80 | | |
# new URM and MSI PIs with allocations (KPI) | 12 | 19 | 14 | 21
# new URM and MSI students faculty with access to allocations (account added to an allocation) | 60 | 228 | 85 | 123
# new projects from URM and MSI faculty (KPI) | 6 | * | 1 | 4
# URM and MSI faculty with new portal accounts (KPI) | 160 | 196 | 369 | 362

* metrics not collected in this period

The dramatic drop in access to allocations is due to training account management changes from PY1 to PY3 for regional workshops. During PY2 and PY3, local institutional resources were used for training events at the workshops held at Florida International University and Florida A&M University-Florida State University College of Engineering. In PY3, XSEDE trainers used training accounts not associated with an individual participant for workshop sessions. There is an increase from PY2 to PY3, indicating an increase in access for URM and MSI faculty and students.

B.2.3.5. Student Engagement Programs (WBS 1.6.8) (Jim Ferguson)

Metric | Target | PY1 | PY2 | PY3
# of students engaged | 60 | 70 | 102 | 102
% of population with evidence of progression from training/education/etc. to other services i.e. allocations, local resource usage, formal courses/programs | TBD | * | * | *
# of XSEDE Scholars | 40 | 40 | 40 | 40
# of Summer Research Experience Participants (SRE) | 10 | 0 | 17 | 15
# of student champions | +2/year | 0 | 0 | 2
# of participants in XSEDE Conference student program, above Scholars and SRE participants | Depends on yearly supplemental proposal | 30 | 45 | 45
Quality Assessments of Programs (KPI) | High quality program as defined by criteria | * | * | *

* metrics not collected in this period

B.2.3.6. Coordination and Administration (WBS 1.6.10) (Scott Lathrop) Metric Target PY1 PY2 PY3 # of presentations 4 4 –Feedback from TEOS Quarterly meetings, and yes yes Yes Advisory Committee annual report from TEOS AC # campuses visited 4 6 Success of E&O team achieving 80% 90% its collective KPIs (KPI)

B.2.4 Technology Investigation Service (WBS 1.7) (J. Ray Scott)

Metric | Target | XSEDE PY1 | XSEDE PY2 | XSEDE PY3
# of technologies adopted/avoided by XSEDE | 7 per year | 1 | 2 | 9
# of technologies recommended to XSEDE (KPI) | 2 per year | N/A | N/A | 0
# of marketing efforts per quarter (KPI) | 4 per quarter | N/A | N/A | 4


B.2.4.1. TIS - Technology Identification (WBS 1.7.1) (Maytal Dahan)

Metric | Target | XSEDE PY1 | XSEDE PY2 | XSEDE PY3
# hits to XTED (guest and logged in user hits) (KPI) | 5% increase per quarter | N/A | N/A | +2%
# unique visitors (see above) (KPI) | 5% increase per quarter | N/A | N/A | +20%
# of entries in XTED (KPI) | 5% increase per quarter | +50% | +300% | +51%
# new entries in XTED (KPI) | +15 entries per quarter | +6 | +73 | +46

B.2.4.2. Technology Evaluation (WBS 1.7.2) (Dan Lapine)

Metric | Target | XSEDE PY1 | XSEDE PY2 | XSEDE PY3
# of identified technologies | +15 entries per quarter | N/A | +12 | +184
# of evaluations completed annually (KPI) | 7 per year | 1 | 2 | 9


C Publications Listing

In its quarterly reports, XSEDE includes listings of the publications gathered from Research submissions to the quarter’s XSEDE Resource Allocations Committee (XRAC) meeting. Renewal submissions are required to provide a file specifically to identify publications resulting from the work conducted in the prior year. Altogether, over the first three program years of XSEDE, users have contributed 10,670 publications, which we estimate would add more than 500 pages to this three-year report. While the full set of publications is available upon request, we are including here only the publications for Q2 2014, organized by the proposal with which they were associated. For Q2 2014, 116 allocation requests identified 1,169 publications and other papers that were published, in press, accepted, submitted, or in preparation. Those publications are listed in this appendix.

A number of XSEDE activities are under way that will improve the collection and reporting effort, the metrics that can be reported on XSEDE user publications, and the description of the scientific impact resulting from these publications.

• With the deployment of the XSEDE Resource Allocation Service (XRAS), publications associated with allocation requests will be entered into the XSEDE User Portal’s (XUP’s) publications database, associated with user profiles, ensuring a more accurate and “cleaner” data set. In parallel, the development of XRAS will integrate the user publications from XUP profiles with the allocation submission process, providing further motivation and opportunities for users to identify relevant publications for tracking and metrics purposes.
• The XUP and XD-TAS teams are working together to “seed” user profiles in the XUP with relevant publications, including those that have been reported in prior quarters as well as those harvested from other sources. Users will verify which publications have benefitted from access to XSEDE resources and services.
• The XD-TAS team is currently investigating methods to automate the collection of relevant publications as well as provide meaningful summary metrics of publication impact. These metrics analyses and services are described in the “Impact of XSEDE” appendix.
• Future efforts include integrating the user publications from XUP profiles with the XDMoD reporting service.

C.1 XSEDE Staff Publications

The following staff publications were published in Q2 2014 and reported via the XSEDE User Portal user profiles. Since the start of XUP publications collections in September 2012, 70 XSEDE Staff Publications have been added to the collection.

1. Mark R. Fahey. 2014. A leap forward with UTK's Cray XC30. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE '14). ACM, New York, NY, USA, Article 30, 8 pages. DOI=10.1145/2616498.2616546 http://doi.acm.org/10.1145/2616498.2616546
2. Jim Basney, Jeff Gaynor, Suresh Marru, Marlon Pierce, Thejaka Amila Kanewala, Rion Dooley, and Joe Stubbs. 2014. Integrating Science Gateways with XSEDE Security: A Survey of Credential Management Approaches. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE '14). ACM, New York, NY, USA, Article 58, 2 pages. DOI=10.1145/2616498.2616559 http://doi.acm.org/10.1145/2616498.2616559
3. David L. Hart, Amy Schuele, Ester Soriano, Maytal Dahan, and Matthew Hanlon. 2014. XRAS: Allocations software as a service in XSEDE. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE '14). ACM, New York, NY, USA, Article 65, 8 pages. DOI=10.1145/2616498.2616562 http://doi.acm.org/10.1145/2616498.2616562
4. Galen Arnold, Manisha Gajbe, Seid Koric, and John Urbanic. 2014. XSEDE OpenACC workshop enables Blue Waters Researchers to Accelerate Key Algorithms. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE '14). ACM, New York, NY, USA, Article 28, 6 pages. DOI=10.1145/2616498.2616530 http://doi.acm.org/10.1145/2616498.2616530

C.2 Publications from XSEDE Users

The following publications were gathered from Research submissions to the June 2014 XSEDE Resource Allocations Committee (XRAC) meeting. Renewal submissions are required to provide a file specifically to identify publications resulting from the work conducted in the prior year. The publications are organized by the proposal with which they were associated. This quarter, 116 requests identified 1,169 publications and other papers that were published, in press, accepted, submitted, or in preparation.

1. ASC050025
   1. Abhishek Gupta, Laxmikant V. Kalé, Dejan S. Milojicic, Paolo Faraboschi, Richard Kaufmann, Verdi March, Filippo Gioachin, Chun Hui Suen, and Bu-Sung Lee. The who, what, why and how of high performance computing applications in the cloud. In Proceedings of the 5th IEEE International Conference on Cloud Computing Technology and Science, CloudCom ’13, 2013.
   2. Laxmikant V. Kale and Abhinav Bhatele, editors. Parallel Science and Engineering Applications: The Charm++ Approach. Taylor & Francis Group, CRC Press, November 2013.
   3. Esteban Meneses. Scalable Message-Logging Techniques for Effective Fault Tolerance in HPC Applications. PhD thesis, Dept. of Computer Science, University of Illinois, 2013. http://charm.cs.uiuc.edu/papers/13-17/.
   4. Esteban Meneses, Xiang Ni, Gengbin Zheng, Celso L. Mendes, and Laxmikant V. Kalé. Using migratable objects to enhance fault tolerance schemes in supercomputers. IEEE Transactions on Parallel and Distributed Systems, 2014. Submitted for publication.
   5. Rafael K. Tesser, Laercio L. Pilla, Fabrice Dupros, Philippe O. A. Navaux, Jean-Francois Méhaut, and Celso Mendes. Improving the performance of seismic wave simulations with dynamic load balancing. In Proceedings of the 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP’2014), Turin, Italy, February 2014.
   6. Ehsan Totoni, Nikhil Jain, and Laxmikant V. Kale. Toward runtime power management of exascale networks by on/off control of links. In Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2013 IEEE International Symposium on, 2013.

2. ASC110028
   7. S. Singh and D. You (2013). A Dynamic Global-Coefficient Mixed Subgrid-Scale Model for Large-Eddy Simulation of Turbulent Flows. International Journal of Heat and Fluid Flow, Vol. 42, pp. 94-104.
   8. S. Singh, P. Adams, A. Misquitta, K. J. Lee, E. M. Lipsky, and A. L. Robinson (2014). Computational Analysis of Particle Nucleation in Dilution Tunnels: Effects of Flow Configuration and Tunnel Geometry. Journal of Aerosol Science and Technology, In press.

3. ASC130004
   9. Y. C. See and M. Ihme. LES investigation of flow field sensitivity in a gas turbine model combustor. In 52nd Aerospace Science Meeting, AIAA, pages 1-15, 2014.
   10. Y. C. See and M. Ihme. Large eddy simulation of a partially-premixed gas turbine model combustor. In 35th Symposium of Combustion, accepted, pages 1-12, 2014.
   11. Y. C. See and M. Ihme. Effects of finite-rate chemistry and detailed transport on the instability of jet diffusion flames. J. Fluid Mech., 745:647-681, 2014.
   12. H. Wu and M. Ihme. Effects of flow-field and mixture inhomogeneities on the ignition dynamics in continuous flow reactors. Combust. Flame, pages 1-10, 2014. In press.

4. ASC130023
   13. George Teodoro, Tahsin Kurc, Jun Kong, Lee Cooper, Joel Saltz, “Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU: A Case Study from Microscopy Image Analysis”, the IEEE 28th International Symposium on Parallel and Distributed Processing (IPDPS), 2014.
   14. George Teodoro, Tony Pan, Tahsin M. Kurc, Jun Kong, Lee A. D. Cooper, Scott Klasky, Joel H. Saltz. “Region Templates: Data Representation and Management for High-Throughput Image Analysis”, the Parallel Computing (ParCo), 2014 (Under Review).
   15. George Teodoro, Tony Pan, Michael Nalisnik, Tahsin Kurc, Lee Cooper, Jun Kong, Joel Saltz, “Scalable Analysis of Large Datasets of Whole Slide Tissue Images using High Performance Computing Systems”, in preparation for submission to BMC Bioinformatics Journal.


   16. Thiago S. F. X. Teixeira, George Teodoro, Eduardo Valle, Joel H. Saltz, “Bringing Locality-Sensitive Hashing to Web-Scale Multimedia Similarity Search”, the 20th International Conference Euro-Par 2014 (Under Review).
   17. Jun Kong, Fusheng Wang, George Teodoro, Lee Cooper, Tahsin Kurc, Tony Pan, Joel Saltz, and Daniel J. Brat, “High-Performance Computational Analysis of Glioblastoma Pathology Images with Database Support Identifies Molecular and Survival Correlates”, IEEE International Conference of bioinformatics and biomedicine (BIBM), pp 229-236, 2013.
   18. Joel H. Saltz, George Teodoro, Tony Pan, Lee A. D. Cooper, Jun Kong, Scott Klasky, Tahsin M. Kurc. “Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures”, the International Journal of High-Performance Computing Applications (IJHPCA), 27(3), pp. 263-272, 2013.

5. AST080005
   19. “Accurate Modeling of Galaxy Clustering on Small Scales” Sinha, M., Berlind, A. A., McBride, C. K., & Scoccimarro, R., 2014, in preparation
   20. “The Large Suite of Dark Matter Simulations (LasDamas) Project: Mock Galaxy Catalogs for the Sloan Digital Sky Survey” McBride, C. K., Berlind, A. A., Scoccimarro, R., Piscionere, J. A., Busha, M., Gardner, J., Manera, M., Wechsler, R. H., & Van den Bosch, F., 2014, in preparation
   21. “The Distribution of Galaxies on the Smallest Scales from their Angular Clustering” Piscionere, J., Berlind, A. A., & McBride, C. K., 2014, in preparation
   22. “Constraining Primordial Non-Gaussianity with Moments of the Large-Scale Density Field” Mao, Q., Berlind, A. A., McBride, C. K., Scherrer, R. J., Scoccimarro, R., & Manera, M., 2014, Monthly Notices of the Royal Astronomical Society, submitted
   23. “Nonlocal Lagrangian bias” Sheth, R. K., Chan, K. C., & Scoccimarro, R., 2013, Physical Review D, 87, 083002
   24. “The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: a large sample of mock galaxy catalogues” Manera, M., Scoccimarro, R., Percival, W. J., Samushia, L., McBride, C. K., Ross, A. J., Sheth, R. K., White, M., Reid, B. A., Sanchez, A. G., de Putter, R., Xu, X., Berlind, A. A., Brinkmann, J., Maraston, C., Nichol, B., Montesano, F., Padmanabhan, N., Skibba, R. A., Tojeiro, R., & Weaver, B. A., 2013, Monthly Notices of the Royal Astronomical Society, 428, 1036

6. AST080028
   25. Semi-Implicit Scheme for Treating Radiation Under M1 Closure in General Relativistic Conservative Fluid Dynamics Codes, Sadowski, A., Narayan, R., Tchekhovskoy, A., Zhu, Y., MNRAS, 429, 3533 (2013)
   26. Radio Light Curves During the Passage of Cloud G2 Near Sgr A*, Sadowski, A., Sironi, L., Abarca, D., Guo, X., Özel, F., Narayan, R., MNRAS, 432, 478 (2013)
   27. Location of the Bow Shock Ahead of Cloud G2 at the Galactic Center, Sadowski, A., Narayan, R., Sironi, L., Özel, F., MNRAS, 433, 2165 (2013)
   28. Energy, momentum and mass outflows and feedback from thick accretion discs around rotating black holes, Sadowski, A., Narayan, R., Penna, R., & Zhu, Y. 2013, MNRAS, 436, 3856 (2013)
   29. General relativistic magnetohydrodynamic simulations of Blandford-Znajek jets and the membrane paradigm, Penna, R. F., Narayan, R., & Sądowski, A. MNRAS, 436, 3741 (2013)
   30. Numerical simulations of super-critical black hole accretion flows in general relativity, Sądowski, A., Narayan, R., McKinney, J. C., & Tchekhovskoy, A. MNRAS, 439, 503 (2014)
   31. Extracting black-hole rotational energy: The generalized Penrose process, Lasota, J.-P., Gourgoulhon, E., Abramowicz, M., Tchekhovskoy, A., & Narayan, R. Phys. Rev. D89, 024041 (2014)
   32. Simulating the effect of the Sgr A* accretion flow on the appearance of G2 after pericenter, Abarca, D., Sadowski, A., & Sironi, L. MNRAS, in press (2014, arXiv:1309.2313)
   33. Three-dimensional general relativistic radiation magnetohydrodynamical simulation of super-Eddington accretion, using a new code HARMRAD with M1 closure, McKinney, J. C., Tchekhovskoy, A., Sądowski, A., & Narayan, R., MNRAS, in press (2014, arXiv:1312.6127)
   34. A high-frequency Doppler feature in the power spectra of simulated GRMHD black hole accretion disks, Wellons, S., Zhu, Y., Psaltis, D., Narayan, R., & McClintock, J. E. MNRAS, in press (2014, arXiv:1312.2333)

7. AST090107
   35. Mathur, S., et al. (2013a) Study of HD 169392A observed by CoRoT and HARPS. Astronomy and Astrophysics, 549:A12.
   36. Doğan, G., et al. (2013) Characterizing Two Solar-type Kepler Subgiants with Asteroseismology: KIC 10920273 and KIC 11395018. Astrophysical Journal, 763:49.
   37. Mathur, S., et al. (2013b) Constraining magnetic-activity modulations in three solar-like stars observed by CoRoT and NARVAL. Astronomy & Astrophysics, 550:A32.
   38. Barclay, T., et al. (2013) A sub-Mercury-sized exoplanet. Nature, 494:452.


   39. Gilliland, R. L., et al. (2013) Kepler-68: Three Planets, One with a Density between that of Earth and Ice Giants. Astrophysical Journal, 766:40.
   40. Chaplin, W. J., et al. (2013) Asteroseismic Determination of Obliquities of the Exoplanet Systems Kepler-50 and Kepler-65. Astrophysical Journal, 766:101.
   41. Huber, D., et al. (2013) Fundamental Properties of Kepler Planet-candidate Host Stars using Asteroseismology. Astrophysical Journal, 767:127.
   42. Silva Aguirre, V., et al. (2013) Stellar Ages and Convective Cores in Field Main-Sequence Stars: First Asteroseismic Application to Two Kepler Targets. Astrophysical Journal, 769:141.
   43. Chaplin, W. J., et al. (2014) Asteroseismic Fundamental Properties of Solar-type Stars Observed by the NASA Kepler Mission. Astrophysical Journal Supplement, 210:1.
   44. Mazumdar, A., et al. (2014) Measurement of Acoustic Glitches in Solar-type Stars from Oscillation Frequencies Observed by Kepler. Astrophysical Journal, 782:18.
   45. Mathur, S., et al. (2014) Magnetic activity of F stars observed by Kepler. Astronomy and Astrophysics, 562:A124.
   46. Marcy, G. W., et al. (2014) Masses, Radii, and Orbits of Small Kepler Planets: The Transition from Gaseous to Rocky Planets. Astrophysical Journal Supplement Series, 210:20.
   47. Metcalfe, T. S., et al. (2014) Properties of 42 solar-type Kepler targets from the Asteroseismic Modeling Portal. Astrophysical Journal, submitted (arXiv:1402.3614).
   48. Ballard, S., et al. (2014) Kepler-93b: a Terrestrial World Measured to within 120 km, and a Test Case for a New Spitzer Observing Mode. Astrophysical Journal, submitted.
   49. Davies, G.R., et al. (2014) Asteroseismic inference on rotation, gyrochronology and planetary system dynamics of 16 Cygni. Monthly Notices of the Royal Astronomical Society Letters, submitted.
   50. Guzik, J., et al. (2014) Discovery of Solar-Like Oscillations, Observational Constraints, and Stellar Models for θ Cyg, the Brightest Star Observable by the Kepler Spacecraft. Astrophysical Journal, in preparation.
   51. Brandão, I.M., et al. (2014) An in-depth asteroseismic study of the Kepler target KIC 10273246. Astronomy & Astrophysics, in preparation.

8. AST120002
   52. Choi, J.-H., Shlosman, I. & Begelman, M.C. 2013, Astrophys. J., 774, 149
   53. Long, S., Shlosman, I. & Heller, C.H. 2014, Astrophys. J. Letters, 783, L18
   54. Sadoun, R., Shlosman, I., Romano-Diaz, E. & Choi, J.-H. 2014, Astrophys. J., submitted
   55. Shlosman, I. 2013, Secular Evolution of Galaxies, by Jesús Falcón-Barroso & Johan H. Knapen, Cambridge, UK: Cambridge University Press, p.555

9. AST120022
   56. “Statistical constraints on the formation history of the Milky Way”, Gómez, F.A.; O’Shea, B.W.; Griffen, B.; Frebel, A.; 2014, in prep.
   57. “The Caterpillar project”, Griffen, B.; Ji, A.; Dooley, G.; Gómez, F.A.; Vogelsberger, M.; O'Shea, B.W.; Frebel, A.; 2014, in prep.
   58. “Abundance of dark matter substructure in Milky Way like halos: The missing satellite problem revisited”, Griffen, B.; Gómez, F.A.; Ji, A.; Dooley, G.; Vogelsberger, M.; Frebel, A.; O'Shea, B.W.; 2014, in prep.
   59. “Characterizing the diversity in the chemical composition of Milky Way-like stellar halos”, O'Shea, B.W.; Gómez, F.A.; Griffen, B.; Frebel, A.; Ji, A.; Dooley, G.; Vogelsberger, M.; 2014, in prep.

10. AST120024
   60. Jeon, M., Pawlik, A. H., Bromm, V., & Milosavljević, M. 2014, “Radiative feedback from high-mass X-ray binaries on the formation of the first galaxies and early reionization”, MNRAS, in press (arXiv:1310.7944)
   61. Safranek-Shrader, C., Milosavljević, M., & Bromm, V. 2014b, “Formation of the first low-mass stars from cosmological conditions”, MNRAS, 440, L76
   62. Safranek-Shrader, C., Milosavljević, M., & Bromm, V. 2014a, “Star formation in the first galaxies–II. Clustered star formation and the influence of metal line cooling”, MNRAS, 438, 1669
   63. Pawlik, A. H., Milosavljević, M., & Bromm, V. 2013, “The First Galaxies: Assembly under Radiative Feedback from the First Stars”, ApJ, 767, 59

11. AST120067
   64. New views of stellar explosions: The Supernova Spectropolarimetry Project, Hoffman, J.L., invited talk presented at American Physical Society Four Corners Meeting, Denver, CO, October 2013


   65. Polarization of circumstellar bow shocks due to electron scattering, Shrestha, M., Hoffman, J.L., Neilson, H.R., & Ignace, R. 2014, poster presented at American Astronomical Society Meeting #223, 154.24, Washington, DC, January 2014
   66. Understanding the relation of progenitors and supernovae through the study of circumstellar material (CSM), Shrestha, M., Hoffman, J.L., Neilson, H.R., & Ignace, R. 2013, contributed talk presented at American Physical Society Four Corners Meeting, Denver, CO, October 2013
   67. Modelling near-IR polarization to constrain stellar wind bow shocks, Neilson, H.R., Ignace, R., Shrestha, M., Hoffman, J.L., & Mackey, J., poster presented at Massive Stars: From α to Ω, Rhodes, Greece, June 2013
   68. Variety of polarized line profiles in interacting supernovae, Hoffman, J.L., Huk, L.N., & Peters, C.L., poster presented at American Astronomical Society Meeting #221, 253.30, Long Beach, CA, January 2013
   69. The evolution of SN 1997eg through spectropolarimetric model comparisons, Huk, L.N., Peters, C.L., & Hoffman, J.L., poster presented at American Astronomical Society Meeting #221, 253.16, Long Beach, CA, January 2013

12. AST130002
   70. Zhu, Z., Stone, J. M., & Rafikov, R. R., “Low-mass Planets in Protoplanetary Disks with Net Vertical Magnetic Fields: The Planetary Wake and Gap Opening”, 2013, ApJ, 768, 143
   71. Lubow, S. H., & Zhu, Z., “An Analytic Model for Buoyancy Resonances in Protoplanetary Disks”, 2014, ApJ, 785, 32
   72. Zhu, Z., Stone, J. M., Rafikov, R. R., & Bai, X.-N., “Particle Concentration at Planet-induced Gap Edges and Vortices. I. Inviscid Three-dimensional Hydro Disks”, 2014, ApJ, 785, 122
   73. M. W. Kunz, A. A. Schekochihin, J. M. Stone, “Firehose and Mirror Instabilities in a Shearing Collisional Plasma”, 2014, PRL, submitted
   74. Zhu, Z., & Stone, J. M., “Settling of Dust Particles in Vortices: Three-dimensional Global Models”, 2014, ApJ, submitted
   75. Zhu, Z., Stone, J. M., & Bai, X.-N., “Dust Transport in Global MRI Turbulent Disks: Ideal MHD and Non-ideal MHD with Ambipolar Diffusion”, 2014, ApJ, submitted
   76. Zhu, Z., & Stone, J. M., “Dust Trapping by Vortices in Transitional Disks: Evidence for Non-ideal MHD Effects in Protoplanetary Disks”, 2014, ApJ, submitted
   77. Bai, X.-N. & Stone, J. M., “Hall-effect controlled Gas Dynamics in Protoplanetary Disks: II Outer Disk”, 2014, in prep
   78. Jiang, Y.-F., Stone, J. M., & Davis, S., “Global radiation MHD simulations of super-Eddington accretion disks”, in prep

13. AST130020
   79. On the operation of the chemo-thermal instability in primordial star-forming clouds, Thomas H. Greif, Volker Springel, Volker Bromm, 2013, MNRAS, 434, 3408
   80. Multi-frequency radiation hydrodynamics simulations of H2 line emission in primordial, star-forming clouds, Thomas H. Greif, 2014, MNRAS, submitted
   81. The numerical frontier of the high-redshift Universe, Thomas H. Greif, 2014, CompAC, submitted
   82. Binary supermassive black hole seeds in massive primordial halos, Thomas H. Greif, Fernando Becerra, Volker Springel, Lars Hernquist, 2014, MNRAS, in prep
   83. A new opacity limit for star formation in atomic cooling halos, Thomas H. Greif, Fernando Becerra, Volker Springel, Lars Hernquist, 2014, Nature, in prep
   84. Formation of supermassive black hole seeds in atomic cooling halos, Fernando Becerra, Thomas H. Greif, Volker Springel, Lars Hernquist, 2014, MNRAS, in prep

14. AST130036
   85. Carroll, J.J., Frank, A., & Heitsch, F. 2014, eprint arXiv:1304.1367. 2014, ApJ, accepted
   86. Li, S., Frank, A., & Blackman, E. G. 2014, "Triggered Star Formation and Its Consequences", submitted to MNRAS
   87. Kaminski, E., Frank, A., Carroll, J., Myers, P. 2014, "On the role of ambient environments in the collapse of Bonnor-Ebert spheres" eprint arXiv:1401.5064

15. AST140047
   88. Manukian, H., Guillochon, J., Ramirez-Ruiz, E., & O’Leary, R. M. 2013, ApJ, 771, L28
   89. McCourt, M., Quataert, E., Parrish, I. J., MNRAS, 432, 1:404-416 (2013)

16. ATM060012
   90. McFarquhar, G. M. and J. Um, 2014: The impact of natural variations of ice crystal aspect ratios on single-scattering properties. 14th Conference on Atmospheric Radiation, Boston, MA, Submitted.
   91. McFarquhar, G.M., and J. Um, 2014: The impact of natural variations of ice crystal aspect ratios on single-scattering radiative properties. 14th Amer. Meteor. Soc. Conference on Atmos. Rad., Boston, MA.


   92. Um, J., and G. M. McFarquhar, 2013: Efficient numerical orientation average for calculations of single-scattering properties of small atmospheric ice crystals. Proc. ASR Science Team Meeting, U.S. Department of Energy, Potomac, MD.
   93. Um, J., and G.M. McFarquhar, 2013: Optimal numerical methods for determining the orientation average of single-scattering properties of ice crystals. J. Quant. Spect. Rad. Trans., 127, doi:10.1016/j.jqsrt.2013.05.020.
   94. Um, J. and G. M. McFarquhar, 2014: Formation of atmospheric halos by hexagonal ice crystals. 14th Conference on Atmospheric Radiation, Boston, MA, Submitted.
   95. Um, J. and G. M. McFarquhar, 2014: Single-scattering properties of hexagonal ice crystals: Implications for atmospheric halo phenomena. Atmos. Chem. Phys., In preparation.
   96. Um, J., and G. M. McFarquhar, 2014: Impacts of shapes and concentrations of small ice crystals on bulk scattering properties of tropical cirrus. J. Geophy. Res., In preparation.
   97. Um, J., and G. M. McFarquhar, 2014: Accuracy of discrete-dipole approximation and T-matrix method in calculations of single-scattering properties of hexagonal particles. J. Quant. Spectrosc. Radiat. Transfer, In preparation.

17. ATM090042
   98. Zhang, F., and D. Tao, 2013: Effects of vertical wind shear on the predictability of tropical cyclones. Journal of the Atmospheric Sciences, 70, 975-983.
   99. Stern, D. P., and F. Zhang, 2013: How does the eye warm? Part I: A potential temperature budget analysis of an idealized tropical cyclone. Journal of the Atmospheric Sciences, 70, 73-90.
   100. Stern, D. P., and F. Zhang, 2013: How does the eye warm? Part II: Sensitivity to vertical wind shear, and a trajectory analysis. Journal of the Atmospheric Sciences, 70, 1849-1873.
   101. Munsell, E. B., F. Zhang, and D. P. Stern, 2013: Predictability and dynamics of a nonintensifying tropical storm: Erika (2009). Journal of the Atmospheric Sciences, 70, 2505-2524.
   102. Zhang, F., M. Zhang, and J. Poterjoy, 2013: E3DVar: Coupling an ensemble Kalman filter with three-dimensional variational data assimilation in a limited-area weather prediction model and comparison to E4DVar. Monthly Weather Review, 141, 900-917.
   103. Xie, B., F. Zhang, Q. Zhang, J. Poterjoy, and Y. Weng, 2013: Observing strategy and observation targeting for tropical cyclones using ensemble-based sensitivity analysis and data assimilation. Monthly Weather Review, 141, 1437-1453.
   104. Green, B. W., and F. Zhang, 2013: Impacts of air-sea flux parameterizations on the intensity and structure of tropical cyclones. Monthly Weather Review, 141, 2308-2324.
   105. Zhang, F., M. Zhang, J. Wei, and S. Wang, 2013: Month-long simulations of gravity waves over North America and North Atlantic in comparison with satellite observations. Acta Meteorologica Sinica, 27, 446-454.
   106. Bao, X., and F. Zhang, 2013: Evaluation of NCEP/CFSR, NCEP/NCAR, ERA-Interim and ERA-40 Reanalysis datasets against independent sounding observations over the Tibetan Plateau. Journal of Climate, 26, 206–214.
   107. Bao, X., and F. Zhang, 2013: Impacts of the Mountain-Plains Solenoid and cold pool dynamics on the diurnal variation of warm-season precipitation over Northern China. Atmospheric Chemistry and Physics, 13, 6965-6982.
   108. Hu, X.-M., P. M. Klein, M. Xue, F. Zhang, D. C. Doughty, and J. D. Fuentes, 2013: Impact of the vertical mixing induced by low-level jets on boundary layer ozone concentration. Atmospheric Environment, 70, 123-130.
   109. Hu, X.-M., P. M. Klein, M. Xue, J. K. Lundquist, F. Zhang, and Y. Qi, 2013: Impact of low-level jets on the nocturnal urban heat island intensity in Oklahoma City. Journal of Applied Meteorology and Climatology, 52, 1779-1801.
   110. Zhang, Y. J.*, Z. Meng, Y. Weng*, and F. Zhang, 2014: Predictability of Tropical Cyclone Intensity Evaluated through 5-year Forecasts with a Convection-permitting Regional-scale Model in the Atlantic Basin. Weather and Forecasting, accepted subject to revisions.
   111. Zhang, F., and Y. Weng*, 2014: Modernizing the Prediction of Hurricane Intensity and Associated Hazards: A Five-year Real-time Forecast Experiment Concluded by Superstorm Sandy (2012). Bulletin of the American Meteorological Society, in review.
   112. Chi, Y.*, F. Zhang, W. Li, J. He, and Z. Guan, 2014: Influences of Madden-Julian Oscillation on the Onset of East Asian Subtropical Summer Monsoon. Journal of the Atmospheric Sciences, submitted.
   113. Poterjoy, J.*, F. Zhang, 2014: Inter-comparison and coupling of ensemble and variational data assimilation approaches for mesoscale analysis and forecasting. Monthly Weather Review, accepted subject to minor revision.
   114. Tao, D. and F. Zhang, 2014: Effect of environmental shear, sea-surface temperature and ambient moisture on the formation and predictability of tropical cyclones: an ensemble-mean perspective. Journal of Advances in Modeling Earth Systems (JAMES), accepted.
   115. Wei, J., and F. Zhang, 2014: Mesoscale gravity waves in moist baroclinic jet-front systems. Journal of the Atmospheric Sciences, 71, 929-952.
   116. Poterjoy, J., F. Zhang, and Y. Weng, 2014: The effects of sampling error on the EnKF assimilation of inner-core hurricane observations. Monthly Weather Review, 142, 1609-1630.
   117. Poterjoy, J., and F. Zhang, 2014: Predictability and genesis of Hurricane Karl (2010) examined through the EnKF assimilation of field observations collected during PREDICT. Journal of the Atmospheric Sciences, 71, 1260-1275.


   118. Melhauser, C., and F. Zhang, 2014: Diurnal radiation cycle impact on the genesis of Hurricane Karl (2010). Journal of the Atmospheric Sciences, 71, 1241-1259.
   119. Green, B. W., and F. Zhang, 2014: Sensitivity of tropical cyclone simulations to parametric uncertainties in air-sea fluxes and implications for parameter estimation. Monthly Weather Review, accepted. doi: http://dx.doi.org/10.1175/MWR-D-13-00208.1
   120. Munsell, E. B., and F. Zhang, 2014: Prediction and uncertainty of Hurricane Sandy (2012) explored through a real-time cloud-permitting ensemble analysis and forecast system. Journal of Advances in Modeling Earth Systems (JAMES), 6, 1-20.
   121. 2013 Plenary speaker on regional scale ensemble based data assimilation, South China Regional Conference on Numerical Weather Prediction, Guangzhou, June 2013
   122. 2013 Plenary speaker on Advances and Challenges in Atmospheric Modeling, NSF EarthCube Workshop on Modeling Workshop for the Geosciences, April 2013
   123. 2013 Invited speaker on weather significant gravity waves and spontaneous balance adjustment, Lance Bosart Symposium, University of Albany, Albany, New York, April 2013
   124. 2013 Plenary speaker on Ensemble-based Data Assimilation: Inter-comparison, Hybrid and Coupling with Variational Methods at Mesoscales, 9th International Conference on Mesoscale Convective Systems, Beijing, China
   125. 2013 Invited speaker on Real-time Cloud-Permitting Hurricane Prediction with Assimilation of Inner-core Airborne Doppler Observations, BIRS Workshop on Probabilistic Approaches to Data Assimilation for Earth Systems, Banff, Alberta, Canada
   126. 2014 Keynote speaker on uncertainties in data assimilation and ensemble forecasting, International Symposium on Data Assimilation, Munich, Germany, February 2014
   127. Department of Atmospheric Sciences, Colorado State University, Ft Collins, Colorado, 2013
   128. Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China, July 2013
   129. Meteorological Center, Civil Aviation Administration of China, Beijing, China, July 2013
   130. Anhui Meteorological Bureau, Hefei, China, July 2013
   131. University of Science and Technology of China, Hefei, China, July 2013
   132. China National Meteorological Center, Beijing, China, July 2013
   133. Nanjing University, Nanjing, China, July 2013
   134. Chinese Academy of Meteorological Sciences, Beijing, China, July 2013
   135. National Center for Atmospheric Research, Boulder, Colorado, May 2013
   136. North Carolina State University, Raleigh, North Carolina, April 2013
   137. China National Meteorological Center, Beijing, China, March 2013
   138. Rossby Meteorological Institute, Stockholm, Sweden, June 2014
   139. University of Gothenburg, Gothenburg, Sweden, June 2014
   140. Naval Research Laboratory, Monterey, California, May 2014
   141. Lawrence Livermore National Laboratory (LLNL, DOE), California, April 2014
   142. Department of Atmospheric and Oceanic Sciences, UCLA, Los Angeles, California, March 2014
   143. University of Mainz, Mainz, Germany, March 2014
   144. Institut fuer Atmosphaere und Umwelt, Johann Wolfgang Goethe-Universitaet Frankfurt, March 2014
   145. Swiss Federal Institute of Technology in Zurich (ETHZ), Zurich, Switzerland, March 2014
   146. Institut für Physik der Atmosphäre, DLR (German NASA), Oberpfaffenhofen, Germany, February 2014
   147. Department of Meteorology, Pennsylvania State University, University Park, Pennsylvania 2014

18. ATM100017
   148. McFarquhar, G. M. and J. Um, 2014: The impact of natural variations of ice crystal aspect ratios on single-scattering properties. 14th Conference on Atmospheric Radiation, Boston, MA, Submitted.
   149. Um, J., and G. M. McFarquhar, 2013: Efficient numerical orientation average for calculations of single-scattering properties of small atmospheric ice crystals. Proc. ASR Science Team Meeting, U.S. Department of Energy, Potomac, MD.
   150. Um, J., and G. M. McFarquhar, 2013: Optimal numerical methods for determining the orientation averages of single-scattering properties of atmospheric ice crystals. J. Quant. Spectrosc. Radiat. Transfer, 127, 207-223.
   151. Um, J. and G. M. McFarquhar, 2014: Formation of atmospheric halos by hexagonal ice crystals. 14th Conference on Atmospheric Radiation, Boston, MA, Submitted.
   152. Um, J. and G. M. McFarquhar, 2014: Single-scattering properties of hexagonal ice crystals: Implications for atmospheric halo phenomena. Atmos. Chem. Phys., In preparation.
   153. Um, J., and G. M. McFarquhar, 2014: Impacts of shapes and concentrations of small ice crystals on bulk scattering properties of tropical cirrus. J. Geophy. Res., In preparation.

154. Um, J., and G. M. McFarquhar, 2014: Accuracy of discrete-dipole approximation and T-matrix method in calculations of single-scattering properties of hexagonal particles. J. Quant. Spectrosc. Radiat. Transfer, In preparation. 155. Um, J., 2013: Light scattering by atmospheric ice crystals. University of Illinois, Urbana, IL, USA. 156. Um, J., 2013: Uncertainties in microphysical and scattering properties of ice clouds. Yonsei University, Seoul, Republic of Korea (Invited). 157. Um, J., 2013: Scattering properties of atmospheric ice crystals. Gangneung-Wonju National University, Gangneung, Republic of Korea (Invited). 158. Um, J., 2013: Light scattering by atmospheric particles. Ewha Womans University, Seoul, Republic of Korea (Invited). 19. ATM110007 159. Ganesh Vijayakumar, Adam W. Lavely, Balaji Jayaraman, Brent Craven, and James Brasseur. Blade Boundary Layer Response to Atmospheric Boundary Layer Turbulence on a NREL 5MW Wind Turbine Blade with Hybrid URANS-LES. In 52nd Aerospace Sciences Meeting and Exhibit. January 2014. AIAA-2014-0867 160. Adam W. Lavely, Ganesh Vijayakumar, Brent Craven, Balaji Jayaraman, Eric G. Paterson, Tarak N. Nandi, and James Brasseur. Towards a Blade-Resolved Hybrid URANS-LES of the NREL 5-MW Wind Turbine Rotor within Large Eddy Simulation of the Atmospheric Boundary Layer. In 52nd Aerospace Sciences Meeting and Exhibit. January 2014 161. Balaji Jayaraman and James Brasseur. Transition in Atmospheric Turbulence Structure from Neutral to Convective Stability States. In 52nd Aerospace Sciences Meeting and Exhibit. American Institute of Aeronautics and Astronautics, January 2014 20. ATM140036 162. Johnson, A.* and X. Wang, 2013: Object-based evaluation of a storm scale ensemble during the 2009 NOAA Hazardous Weather Testbed Spring Experiment. Mon. Wea. Rev., 141, 1079-1098 163. Johnson, A.*, X. Wang, F. Kong and M. Xue, 2013: Object based evaluation of the impact of horizontal grid spacing on storm-scale forecasts. Mon. Wea. 
Rev., 141,3413-3425 164. Johnson, A.*, X. Wang, M. Xue, F. Kong, G. Zhao, Y. Wang, K. Thomas, K. Brewster and J. Gao, 2014: Multiscale characteristics and evolution of perturbations for warm season convection-allowing precipitation forecasts: Dependence on background flow and method of perturbation. Mon. Wea. Rev., 142, 1053-1073 165. Duda, J.*, X. Wang, F. Kong and M. Xue, 2014: Using Varied Microphysics to Account for Uncertainty in Warm- Season QPF in a Convection-Allowing Ensemble. Mon. Wea. Rev., in press. 166. Wang, X. and T. Lei*, 2014: GSI-based four dimensional ensemble variational data assimilation (4DEnsVar): formulation and single resolution experiments with real data for NCEP GFS. Mon. Wea. Rev., in press. 21. CDA100010 167. K. Min, A. Barati Farimani and N. R. Aluru, “Mechanical behavior of water filled C60”, Applied Physics Letters, Vol. 103, No. 26, Art. No. 263112, 2013. 168. A. Barati Farimani, Y. Wu and N. R. Aluru, “Rotational motion of a single water molecule in a buckyball”, Physical Chemistry Chemical Physics, Vol. 15, No. 41, pp. 17993-18000, 2013. 169. A. Alwan and N. R. Aluru, “Improved statistical models for limited datasets in uncertainty quantification using stochastic collocation”, Journal of Computational Physics, Vol. 255, pp. 521-539, 2013. 170. K. Kunal and N. R. Aluru, “Phonon mediated loss in a graphene nanoribbon”, Journal of Applied Physics, Vol. 114, No. 8, Art. No. 084302, 2013. 171. R. Bhadauria and N. R. Aluru, “A quasi-continuum hydrodynamic model for slit shaped nanochannel flow”, Journal of Chemical Physics, Vol. 139, No. 7, Art. No. 074109, 2013. 172. Y. Wu and N. R. Aluru, “Graphitic carbon-water nonbonded interaction parameters”, Journal of Physical Chemistry B, Vol. 117, No. 29, pp. 8802-8813, 2013. 173. V. V. R. Nandigana and N. R. Aluru, “Characterization of electrochemical properties of a micro-nanochannel integrated system using computational impedance spectroscopy (CIS)”, Electrochimica Acta, Vol. 105, pp. 
514-523, 2013. 174. M. N. Silberstein, K. Min, L. D. Cremar, C. M. Degen, T. M. Martinez, N. R. Aluru, S. R. White and N. R. Sottos, “Modeling mechanophore activation within a crosslinked glassy matrix”, Journal of Applied Physics, Vol. 114, No. 2, Art. No. 023504, 2013. 175. K. Kunal and N. R. Aluru, “Intrinsic loss due to unstable modes in graphene”, Nanotechnology, Vol. 24, No. 27, Art. No. 275701, 2013. 176. M. E. Suk and N. R. Aluru, “Molecular and continuum hydrodynamics in graphene nanopores”, RSC Advances, Vol. 3, No. 24, pp. 9365-9372, 2013. 177. C. Cheng, K. Y. Ng, N. R. Aluru and A. H. W. Ngan, “Simulation and experiment of substrate aluminum grain orientation dependent self-ordering in anodic porous alumina”, Journal of Applied Physics, Vol. 113, No. 20, Art. No. 204903, 2013.

178. B. Kumar, K. Min, M. Bashirzadeh, A. Barati Farimani, M.-H. Bae, D. Estrada, Y. D. Kim, P. Yassei, Y. D. Park, E. Pop, N. R. Aluru and A. Salehi-Khojin, “The role of external defects in chemical sensing of graphene field-effect transistors”, Nano Letters, Vol. 13, No. 5, pp. 1962-1968, 2013. 179. Y. Jing, L. Guo, Y. Sun, J. Shen and N. R. Aluru, “Mechanical properties of a silicon nanofilm covered with defective graphene”, Surface Science, Vol. 611, pp. 80-85, 2013. 180. T. Sanghi and N. R. Aluru, “A combined quasi-continuum/Langevin equation approach to study the self-diffusion dynamics of confined fluids”, Journal of Chemical Physics, Vol. 138, No. 12, Art. No. 124109, 2013. 181. W. Ye, K. Min, P. M. Martin, A. Rockett, N. R. Aluru and J. W. Lyding, “Scanning tunneling spectroscopy and density functional calculation of silicon dangling bonds on the Si(100)-2x1:H surface”, Surface Science, Vol. 609, pp. 147-151, 2013. 182. V. V. R. Nandigana and N. R. Aluru, “Nonlinear electrokinetic transport under combined AC and DC fields in micro/nanofluidic interface devices”, Journal of Fluids Engineering, Vol. 135, No. 2, pp. Art. No. 021201, 2013. 183. J. Lee and N. R. Aluru, “Water-solubility-driven separation of gases using graphene membrane”, Journal of Membrane Science, Vol. 428, pp. 546-553, 2013. 184. S. Banerjee, J. Shim, J. Rivera, X. Jin, D. Estrada, V. Solovyeva, X. You, J. Pak, E. Pop, N. R. Aluru and R. Bashir, “Electrochemistry at the edge of a single graphene layer in a nanopore”, ACS Nano, Vol. 7, No. 1, pp. 834-843, 2013. 22. CHE030089 185. Harrison, J. G.; Tantillo, D. J. J. Molec. Model. 2013, in press: "Role of Gold in a Complex Cascade Reaction Involving Two Electrocyclization Steps," invited for special issue on Quitel 2011. 186. Lebold, T. P.; Wood, J. L.; Deitch, J.; Lodewyk, M. W.; Tantillo, D. J.; Sarpong, R. Nature Chem. 2013, 5, 126-131: "A Divergent Approach to the Synthesis of the Yohimbinoid Alkaloids Venenatine and Alstovenine" 187. 
Ogawa, Y.; Painter, P. P.; Tantillo, D. J.; Wender, P. A. J. Org. Chem. 2013, 78, 104-115: "Mechanistic and Computational Studies of Exocyclic Stereocontrol in the Synthesis of Bryostatinlike cis-2,6-Disubstituted-4-Alkylidene Tetrahydropyrans by Prins Cyclization," part of Robert Ireland Memorial Issue. 188. Hong, Y. J.; Giner, J.-L.; Tantillo, D. J. J. Org. Chem. 2013, 78, 935-941: "Triple-Shifts and Thioether Assistance in Rearrangements Associated with an Unusual Biomethylation of the Sterol Side Chain" 189. Nguyen, Q. N. N.; Tantillo, D. J. Beistein J. Org. Chem. 2013, 9, 323-331: "Caryolene-forming Carbocation Rearrangements," part of thematic issue on "new reactive intermediates in organic chemistry." 190. Vaughan, M. M.; Webster, F. X.; Kiemle, D.; Hong, Y. J.; Tantillo, D. J.; Wang, Q.; Coates, R. M.; Wray, A. T.; Askew, W.; O'Donnell, C.; Tokuhisa, J. G.; Boland, W.; Tholl, D. Plant Cell, 2013, 25, 1108-1125: "Formation of the Unusual Semivolatile Diterpene Rhizathalene by the Arabidopsis Class I Terpene Synthase TPS08 in the Root Stele is Involved in Defense Against Belowground Herbivory" 191. Davis, R. L.; Lodewyk, M. W.; Siebert, M. R.; Tantillo, D. J. J. Organomet. Chem. 2013, 748, 68-74: "Complex Consequences: Substituent Effects on Metal---Arylmethylium Interactions," part of special issue on "Theory and Mechanistic Studies in Metal-Organic Chemistry." 192. Pemberton, R. P.; Hong, Y. J.; Tantillo, D. J. Pure Appl. Chem. 2013, 85, 1949-1957: "Inherent Dynamical Preferences in Carbocation Rearrangements Leading to Terpene Natural Products," part of the ICPOC-21 collection. 193. Hong, Y. J.; Tantillo, D. J. Chem. Sci. 2013, 4, 2512-2518: "C-H•••π Interactions as Modulators of Carbocation Structure - Implications for Terpene Biosynthesis" 194. Tantillo, D. J.; Schleyer, P. v. R. Org. Lett. 2013, 15, 1725-1727: "Nonamethylcyclopentyl Cation Rearrangement Mysteries Solved" 195. Hudson, B. M.; Harrison, J. G.; Tantillo, D. J. Tetrahedron Lett. 
2013, 54, 2952-2955: "Assessing the Viability of Biosynthetic Pathways for Calophyline A Formation - Are Pericyclic Reactions Involved?" 196. Coffman, K. C.; Palazzo, T. A.; Hartley, T. P.; Fettinger, J. C.; Tantillo, D. J.; Kurth, M. J. Org. Lett. 2013, 15, 2062- 2065: "Heterocycle–Heterocycle Strategies: (2-Nitrophenyl)isoxazole Precursors to 4-Aminoquinolines, 1H-Indoles, and Quinolin-4(1H)-ones" 197. Felix, R. J.; Gutierrez, O.; Tantillo, D. J., Gagne, M. R. J. Org. Chem. 2013, 78, 5685-5690: "Gold(I)-Catalyzed Formation of Bicyclo[4.2.0]oct-1-enes" 198. Tantillo, D. J. Nat. Prod. Rep. 2013, 30, 1079-1086: "Walking in the Woods with Quantum Chemistry - Applications of Quantum Chemical Calculations in Natural Product Research," featured on the cover. 199. Hong, Y. J.; Irmisch, S.; Wang, S. C.; Garms, S.; Gershenzon, J.; Zu, L.; Kollber, T. G.; Tantillo, D. J. Chem. Eur. J. 2013, 19, 13590-13600: "Theoretical and Experimental Analysis of the Reaction Mechanism of MrTPS2, a Triquinane- forming Sesquiterpene Synthase from Chamomile" 200. Gutierrez, O.; Harrison, J. G.; Felix, R. J.; Guzman, F. C.; Gagne, M. R.; Tantillo, D. J. Chem. Sci. 2013, 4, 3894-3898: "Carbonium vs. Carbenium Ion-like Transition State Geometries for Carbocation Cyclization - How Strain Associated with Bridging Affects 5-exo vs. 6-endo Selectivity" 201. Gutierrez, O.; Strick, B. F.; Thomson, R. J.; Tantillo, D. J. Chem. Sci. 2013, 4, 3997-4003: "Mechanism of Triflimide- Catalyzed [3,3]-Sigmatropic Rearrangements of N-Allylhydrazones - Predictions and Experimental Validation"

202. Maity, P.; Pemberton, R. P.; Tantillo, D. J.; Tambar, U. K. J. Am. Chem. Soc. 2013, 135, 16380- 16383: "Bronsted Acid Catalyzed Enantioselective Indole Aza-Claisen Rearrangement Mediated by an Arene CH-O Interaction" 203. Chooi, Y.-H.; Hong, Y. J.; Cacho, R. A.; Tantillo, D. J.; Tang, Y. J. Am. Chem. Soc. 2013, 135, 16805-16808: "A Cytochrome P450 Serves as an Unexpected Terpene Cyclase during Fungal Meroterpenoid Biosynthesis" 204. Nguyen, Q. N. N.; Tantillo, D. J. Chem. Asian J., in press, DOI: 10.1002/asia.201301452: "The Many Roles of Quantum Chemical Predictions in Synthetic Organic Chemistry," invited Focus Review. 205. Painter, P. P.; Pemberton, R. P; Wong, B. M.; Ho, K. C; Tantillo, D. J. J. Org. Chem., in press, DOI: 10.1021/jo402487d: "The Viability of Nitrone-Alkene (3+2) Cycloadditions in Alkaloid Biosynthesis" 206. Lodewyk, M. W.; Willenbring, D.; Tantillo, D. J. Org. Biomol. Chem., in press, DOI: 10.1039/C3OB42005A: "Pentalenene Formation Mechanisms Redux" 207. Hong, Y. J.; Tantillo, D. J. Nature Chem., in press: "Biosynthetic Consequences of Multiple Sequential Post-Transition State Bifurcations" 23. CHE040014 208. Aaron G. Green, Peng Liu, Craig A. Merlic, and K. N. Houk: "Distortion/Interaction Analysis Reveals the Origins of Selectivities in Iridium-Catalyzed C–H Borylation of Substituted Arenes and 5-Membered Heterocycles" J. Am. Chem. Soc. 2014, 136, 4575–4583 (2014). 209. Arne Dieckmann and K. N. Houk: "Analysis of Supramolecular Complex Energetics in Artifical Replicators," Chem. Sci., 4, 3591-3600 (2013). 210. Arne Dieckmann, Martin Breugst, and K. N. Houk: "Zwitterions and Unobserved Intermediates in Organocatalytic Diels-Alder Reactions of Linear and Cross-Conjugated Trienamines," J. Am. Chem. Soc., 135, 3237-3242 (2013). 211. Arne Dieckmann, Matthew T. Richers, Alena Yu Platonova, Chen Zhang, Daniel Seidel, and K. N. 
Houk: "Metal-Free alpha-Amination of Secondary Amines: Computational and Experimental Evidence for Azaquinone Methide and Azomethine Ylide Intermediates," J. Org. Chem., 78, 4132-4144 (2013). 212. Ashay Patel, Gregg A. Barcan, Ohyun Kwon, and K. N. Houk: "Origins of 1,6-Stereoinduction in Torquoselective 6 pi Electrocyclizations," J. Am. Chem. Soc., 135, 4878-4883 (2013). 213. Caroline Souris, Frederic Frebault, Ashay Patel, David Audisio, K. N. Houk, and Nuno Maulide: "Stereoselective Synthesis of Dienyl-Carboxylate Building Blocks: Formal Synthesis of Inthomycin C," Org. Lett. 15, 3242-3245 (2013). 214. Changxia Yuan, Yong Liang, Taylor Hernandez, Adrian Berriochoa, Kendall N. Houk, and Dionicio Siegel: "Metal-Free Oxidation of Aromatic Carbon-Hydrogen Bonds Through a Reverse-Rebound Mechanism," Nature, 499, 192-196 (2013). 215. David N. Kamber, Lidia A. Nazarova, Yong Liang, Steven A. Lopez, David M. Patterson, Hui-Wen Shih, K. N. Houk, and Jennifer A. Prescher: "Isomeric Cyclopropenes Exhibit Unique Bioorthogonal Reactivities," J. Am. Chem. Soc., 135, 13680-13683 (2013). 216. Fang Liu, Robert S. Paton, Seonah Kim, Yong Liang, and K. N. Houk: "Diels-Alder Reactivities of Strained and Unstrained Cycloalkenes with Normal and Inverse-Electron-Demand Dienes: Activation Barriers and Distortion/Interaction Analysis," J. Am. Chem. Soc., 135, 15642-15649 (2013). 217. Gonzalo Jiménez-Osés, Peng Liu, Ricardo A. Matute and K. N. Houk: “Competition Between Concerted and Stepwise Dynamics in the Triplet Di-π-Methane Rearrangement,” Angew. Chem. Int. Ed., DOI: 10.1002/anie.201310237 218. Gonzalo Jiménez-Osés, Anthony J. Brockway, Jared T. Shaw, and K. N. Houk: "Mechanism of Alkoxy Groups Substitution by Grignard Reagents on Aromatic Rings and Experimental Verification of Theoretical Predictions of Anomalous Reactions," J. Am. Chem. Soc., 135, 6633-6642 (2013). 219. Gui-Juan Cheng, Yun-Fang Yang, Peng Liu, Ping Chen, Tian-Yu Sun, Gang Li, Xinhao Zhang, K. N. 
Houk, Jin-Quan Yu, and Yun-Dong Wu: " Role of N-Acyl Amino Acid Ligands on Pd(II)-Catalyzed Remote C–H Activation of Tethered Arenes," J. Am. Chem. Soc., 136, 894-897 (2014). 220. Hiroshi Miyazaki, Myles B. Herbert, Peng Liu, Xiaofei Dong, Xiufang Xu, Benjamin K. Keitz, Thay Ung, Garik Mkrtumya, K. N. Houk, and Robert H. Grubbs: "Z-Selective Ethenolysis with a Ruthenium Metathesis Catalyst: Experiment and Theory," J. Am. Chem. Soc., 135, 5848-5858 (2013). 221. Hung V. Pham, Robert S. Paton, Audrey G. Ross, Samuel J. Danishefsky, and K. N. Houk: "Intramolecular Diels-Alder Reactions of Cycloalkenones: Stereoselectivity, Lewis Acid Acceleration, and Halogen Substituent Effects," J. Am. Chem. Soc., 136, 2397-2403 (2014). 222. J. Yang, Yong Liang, J. seckute, K. N. Houk, Neal Devaraj “Synthesis and Reactivity Comparisons of 1-Methyl-3- Substituted Cyclopropene Mini-Tags for Tetrazine Bioorthogonal Reactions” Chem–Eur. J., 20, 3365-3375 (2014). 223. Jeffrey C. Holder, Lufeng Zou, Alexander N. Marziale, Peng Liu, Yu Lan, Michele Gatti, Kotaro Kikushima, K. N. Houk, and Brian M. Stoltz: "Mechanism and Enantioselectivity in Palladium-Catalyzed Conjugate Addition of Arylboronic Acids to beta-substituted Cyclic Enones: Insights from Computation and Experiment," J. Am. Chem. Soc., 135, 14996-15007 (2013). 224. Jeffrey S. Cannon, Lufeng Zou, Peng Liu, Yu Lan, Daniel J. O’Leary, K. N. Houk, and Robert H. Grubbs: "Carboxylate- Assisted C(sp3)–H Activation in Olefin Metathesis-Relevant Ruthenium Complexes" J. Am. Chem. Soc. accepted.

225. Lufeng Zou, Robert S. Paton, Albert Eschenmoser, Timothy R. Newhouse, Phil S. Baran, and K. N. Houk: "Enhanced Reactivity in Dioxirane C-H Oxidations via Strain Release: A Computational and Experimental Study," J. Org. Chem., 78, 4037-4048 (2013). 226. Martin Breugst, Albert Eschenmoser, and K. N. Houk: "Theoretical Exploration of the Mechanism of Riboflavin Formation from 6,7-Dimethyl-8-ribityllumazine: Nucleophilic Catalysis, Hydride Transfer, Hydrogen Atom Transfer, or Nucleophilic Addition?," J. Am. Chem. Soc., 135, 6658-6668 (2013). 227. Matthew F. L. Parker, Silvia Osuna, Guillaume Bollot, Shivaiah Vaddypally, Michael J. Zdilla, K. N. Houk, and Christian E. Schafmeister: "Acceleration of an Aromatic Claisen Rearrangement via a Designed Spiroligozyme Catalyst that Mimics the Ketosteroid Isomerase Catalytic Dyad," J. Am. Chem. Soc. ASAP Article (2014). 228. Natalie C. James, Joann M. Um, Anne B. Padias, H. K. Hall Jr., and K. N. Houk: "Computational Investigation of the Competition Between the Concerted Diels-Alder Reaction and Formation of Diradicals in Reactions of Acrylonitrile with Nonpolar Dienes," J. Org. Chem., 78, 6582- 6592 (2013). 229. Shiyu Zhang, Nihan Celebi-Olcun, Marie M. Melzer, K. N. Houk, and Timothy H. Warren: "Copper(I) Nitrosyls from Reacion of Copper (II) Thiolates with S-Nitrosothiols: Mechanism of NO Release from RSNOs at Cu," J. Am. Chem. Soc., 135, 16746-16749 (2013). 230. Shudan Bian, Amy M. Scott, Yang Cao, Yong Liang, Silvia Osuna, K. N. Houk, and Adam B. Braunschweig: "Covalently Patterned Graphene Surfaces by a Force Accelerated Diels-Alder Reaction," J. Am. Chem. Soc., 135, 9240- 9243 (2013). 231. Silvia Osuna, Seonah Kim, Guillaume Bollot, and Kendall N. Houk: "Aromatic Claisen Rearrangements of O- Prenylated Tyrosine and Model Prenyl Aryl Ethers: Computational Study of the Role of Water on Acceleration of Claisen Rearrangements," Eur. J. Org. Chem., 2823-2831 (2013). 232. Steven A. Lopez and K. N. 
Houk: "Alkene Distortion Energies and Torsional Effects Control Reactivities and Stereoselectivities of Azide Cycloadditions to Norbornene and Substituted Norbornenes," J. Org. Chem., 78, 1778-1783 (2013). 233. Steven A. Lopez, Morton E. Munk, and K. N. Houk: "Mechanisms and Transition States of 1,3-Dipolar Cycloadditions of Phenyl Azide with Enamines: A Computational Analysis," J. Org. Chem., 78, 1576-1582 (2013). 234. Xin Hong, Barry M. Trost, and K. N. Houk: "Mechanism and Origins of Selectivity in Ru(II)-Catalyzed Intramolecular (5+2) Cycloadditions and Ene Reactions of Vinylcyclopropanes and Alkynes from Density Functional Theory," J. Am. Chem. Soc., 135, 6588-6600 (2013). 235. Xin Hong, Peng Liu, and K. N. Houk: "Mechanism and Origins of Ligand-Controlled Selectivities in [Ni(NHC)]- Catalyzed Intramolecular (5+2) Cycloadditions and Homo-Ene Reactions: A Theoretical Study," J. Am. Chem. Soc., 135, 1456-1462 (2013). 236. Xin Hong, Yong Liang, Allison K. Griffith, Tristan H. Lambert, and K. N. Houk: "Distortion-Accelerated Cycloadditions and Strain-Release-Promoted Cycloreversions in the Organocatalytic Carbonyl-Olefin Metathesis," Chem. Sci. 5, 471 (2014). 237. Xin Hong, Yong Liang, K. N. Houk, “Mechanisms and Origins of Switchable Chemoselectivity of Ni-Catalyzed C(aryl)-O and C(acyl)-O Activation of Aryl Esters with Phosphine Ligands” J. Am. Chem. Soc., 136, 2017-2025 (2014). 238. Xiufang Xu, Peng Liu, Xing-Zhong Shu, Weiping Tang, and K. N. Houk: "Rh-Catalyzed (5+2) Cycloadditions of 3- Acyloxy-1,4-enynes and Alkynes: Computational Study of Mechanism, Reactivity, and Regioselectivity," J. Am. Chem. Soc., 135, 9271-9274 (2013). 239. Xue Gao, Wei Jiang, Gonzalo Jimenez-Oses, Moon Seok Choi, Kendall N. Houk, Yi Tang, and Christopher T. Walsh: "An Iterative, Bimodular Nonribosomal Peptide Synthetase that Converts Anthranilate and Tryptophan into Tetracyclic Asperlicins," Chem. & Biol., 20, 870-878 (2013). 240. Yang Cao, Silvia Osune, Yong Liang, Robert C. 
Haddon, and K. N. Houk: "Diels-Alder Reactions of Graphene: Computational Predictions of Products and Sites of Reaction," J. Am. Chem. Soc., 135, 17643-17649 (2013). 241. Yun-Fang Yang, Gui-Juan Cheng, Peng Liu, Dasheng Leow, Tian-Yu Sun, Ping Chen, Xinhao Zhang, Jin-Quan Yu, Yun-Dong Wu, and K. N. Houk: "Palladium-Catalyzed Meta-Selective C-H Bond Activation with a Nitrile-Containing Template: Computational Study on Mechanism and Origins of Selectivity," J. Am. Chem. Soc., 136, 344-355 (2014). 242. B. Martin, H.-Y. Chen, Y. Yang, and K. N. Houk: “Influence of Side Chains on Intermolecular Interactions in Cyclopentadithiophene- and Silacyclopentadithiophene-based Polymers,” In preparation. 243. B. Martin, J. S. Fell, and K. N. Houk: “Why 1-aza heterocyclopentadienes are generally less reactive than 2-aza heterocyclopentadienes in Diels-Alder cycloadditions,” in preparation. 244. B. Martin, J. S. Fell, and K. N. Houk: “Why 1-aza-substituted pentaheterocycles are generally less reactive than 2-aza- substituted pentaheterocycles in Diels-Alder cycloadditions,” Poster presentation at the Seaborg Symposium, Los Angeles, CA (10/26/2013). 24. CHE080076 245. Lombardi, R. L. and Birke, R. L., “The theory of surface-Enhanced Raman Scattering in Semiconductors” , J. Phys. Chem. C, submitted (2014)

25. CHE100121 246. V. Seshadri, P. J. Fahey, X. Geng, P. R. Westmoreland, “Pyrolysis Kinetics for Small-Molecule Intermediates of Cellulose Pyrolysis,” AIChE Annual Meeting, San Francisco CA, November 3-8, 2013, paper 27a. 247. P.R. Westmoreland. "Entering a Golden Age of Chemical Engineering," AIChE National Capital Local Section Meeting, October 30, 2013. [Invited] 248. P.R. Westmoreland. "Fundamental Kinetics For Pyrolysis of Lignocellulosic Biomass to Bio-oil," 2nd Middle East Process Engineering Conference, 9/29-10/2/2013. [Invited.] 249. P.R. Westmoreland. "Molecular Kinetics of Making Bio-oils: Opportunities and Challenges for a Golden Age of ChE,”Jahrestreffen der Fachgemeinschaft Fluiddynamik und Trenntechnik 2013, Würzburg, Germany, September 24- 27, 2013. [Invited plenary address.] 250. P.R. Westmoreland. "Entering a Golden Age of Chemical Engineering," AIChE Eastern North Carolina Local Section Meeting, September 18, 2013. [Invited] 251. P.R. Westmoreland. "Fundamental Kinetics For Pyrolysis of Lignocellulosic Biomass to Bio-oil," World Congress of Chemical Engineering, incorporating 15th Asian Pacific Confederation of Chemical Engineering Congress, Seoul, Korea, August 19-23, 2013. [Invited.] 252. P.R. Westmoreland. "Entering a Golden Age of Chemical Engineering," AIChE Baton Rouge Local Section Meeting, April 8, 2013. [Invited] 253. P.R. Westmoreland. "Entering a Golden Age of Chemical Engineering," AIChE Virtual Local Section Meeting, January 24, 2013. [Invited] 254. University of Canterbury, Department of Christchurch, New Zealand, scheduled July 2014. 255. North Carolina State University, Department of Chemical and Biomolecular Engineering, Raleigh, NC, January 2014. 256. Peking University, Center for Water Research and College of Engineering, Beijing, China, October 2013. 257. Tsinghua University, Department of Chemical and Biomolecular Engineering, Beijing, China, October 2013. 258. 
Tianjin University, School of Chemical Engineering & Technology, Tianjin, China, October 2013. 259. Zhejiang University, Department of Chemical and Biological Engineering, Hangzhou, China, October 2013. 260. Nanjing University of Technology, Department of Chemical Engineering and State Key Laboratory of Materials- Oriented Chemical Engineering, Nanjing, China, October 2013. 261. Hong Kong University of Science and Technology, Department of Chemical and Biomolecular Engineering, Hong Kong, China, October 2013. 262. City University of New York, Department of Chemical Engineering, New York, NY, October 2013. 26. CHE110041 263. Christopher D. Von Bargen, Christopher M. MacDermaid, One-Sun Lee, Pravas Deria, Michael J. Therien, and Jeffery G. Saven. "Origins of the Helical Wrapping of Phenyleneethynylene Polymers about Single-Walled Carbon Nanotubes." J. Phys. Chem. B, 2013, 117 (42), pp 12953-12965 264. Pravas Deria, Christopher D. Von Bargen, Jean-Hubert Olivier, Amar S. Kumbhar, Jeffery G. Saven, and Michael J. Therien. "Single-Handed Helical Wrapping of Single-Walled Carbon Nanotubes by Chiral, Ionic, Semiconducting Polymers." J. Am. Chem. Soc., 2013, 135 (43), pp 16220-16234. 265. Michael Reca, Christopher D. Von Bargen, One-Sun Lee, Michael J. Therien, and Jeffery G. Saven. "Alternate Helical Wrapping Modes of Naphthylene Ethynylene Polymers about Single-Walled Carbon Nanotubes." Manuscript in preparation. 266. Lu Gao, Wenhao Liu, One-Sun Lee, Ivan J. Dmochowski, and Jeffery G. Saven. "Molecular basis of Xe binding affinities of water soluble cryptophanes" Manuscript in preparation. 267. Lu Gao, Christopher D. Von Bargen, Ivan J. Dmochowski, and Jeffery G. Saven. "Xe Accessibility to the Interiors of Water Soluble Cryptophanes: Affinities, Free Energy Barriers and Cage Fluctuations." Manuscript in preparation. 268. Sameer Sathaye, Huixi Zhang, Cem Sonmez, Joel P. Schneider, Christopher M. MacDermaid, Christopher D. Von Bargen, Jeffery G. Saven and Darrin J. 
Pochan "Engineering complementary hydrophobic interactions to control β-hairpin peptide self-assembly, network branching, and hydrogel properties". Manuscript in preparation. 269. Wenhao Liu, Lu Gao, One-Sun Lee, Jeffery G. Saven. "Molecular basis of cesium cation binding affinity for water soluble cryptophane" Manuscript in preparation. 270. Jianghai Ho, Lu Gao, Jose Manuel Perez-Aguilar, Jeffery G. Saven, Hiroaki Matsunami, Roderic Eckenhoff. "Molecular recognition of ketamine by discrete olfactory G protein coupled receptors." Manuscript in preparation. 27. CHE110064 271. K. Nanda and G. Beran. "What governs the proton-ordering in ice XV?" J. Phys. Chem. Lett. 4, 3165-3169 (2013). DOI: 10.1021/jz401625w 272. K. Theel, S. Wen, and G. Beran. "Communication: Constructing an implicit quantum mechanical/molecular mechanics solvent model by coarse-graining explicit solvent." J. Chem. Phys. 139, 081103 (2013). DOI: 10.1063/1.4819774

273. Y. Huang, Y. Shao, and G. Beran. "Accelerating MP2C dispersion corrections for dimers and molecular crystals." J. Chem. Phys. 138, 224112 (2013). DOI: 10.1063/1.4809981 274. J. Liu, S. Wen, Y. Hou, F. Zuo, G. Beran, and P. Feng. "Boron carbides as efficient, metal-free visible-light-responsive photocatalysts." Angew. Chemie Int. Ed. 52, 3241-3245 (2013). DOI: 10.1002/anie.201209363 Chosen as a "Hot Article" by the journal editors. 275. G. Beran, S. Wen, K. Nanda, Y. Huang, and Y. Heit. "Accurate molecular crystal modeling with fragment-based electronic structure methods" in Prediction and Calculation of Crystal Structures: Methods and Applications, edited by A. Aspuru-Guzik and S. Atahan-Evrenk, in press (2014). ISBN 978-3-319-05773-6 276. Y. Huang and G. Beran. "Achieving high-accuracy intermolecular interactions by combining Coulomb-attenuated second-order Møller-Plesset perturbation theory with a long-range dispersion correction" J. Chem. Theory Comput. submitted (2014). 28. CHE120052 277. M. Z. Chen, O. Gutierrez and A. B. Smith, III. “Through-Bond and Through-Space Anion Relay Chemistry with Vinylepoxide Linchpins” Angew. Chem. Int. Ed. 2014, 53, 1279-1282. 278. Dickstein, J. S.; Curto, J. M.; Gutierrez, O.; Mulrooney, C. A.; Kozlowski, M. C. "Mild Aromatic Palladium-Catalyzed Protodecarboxylation: Kinetic Assessment of the Decarboxylative Palladation and the Protodepalladation Steps" J. Org. Chem. 2013, 78, 4744-4761. 279. Allen, S. E.; Mahatthananchai, J.; Bode, J. W.; Kozlowski, M. C. “Confluence of Electronic and Steric Effects in the Enantioselective N-Heterocyclic Carbene-Catalyzed Hetero-Diels-Alder Cycloaddition” Submitted. 280. Gutierrez, O.; Kozlowski, M. C. “Divergent Mode of Activation and Diastereoselectivity in Palladium-Catalyzed [3,3]-Sigmatropic Rearrangement of Allyloxy- and Propargyloxyindoles Revealed by Density Functional Theory” In preparation. 281. Allen, S. E.; Binanzer, M.; Hsieh, S.-Y.; Bode, J. W.; Kozlowski, M. 
C. “Determination of Reaction Path and Origins of Selectivity in the Kinetic Resolution of Cyclic Amines via NHC and hydroxamide co-catalyzed Acyl Transfer” Submitted. 282. Raffier, L.; Gutierrez, O.; Stanton, G. R.; Kozlowski, M. C.; Walsh, P. J. “Chelation-Controlled Additions to rac-β,γ-Unsaturated Ketones” Submitted. 283. Allen, S. E.; Mahatthananchai, J.; Bode, J. W.; Kozlowski, M. C. “Confluence of Electronic and Steric Effects in the Enantioselective N-Heterocyclic Carbene-Catalyzed Hetero-Diels-Alder Cycloaddition” Submitted. 284. Gutierrez, O.; Bode, J. W.; Kozlowski, M. C. “Computational Analysis on the Mechanism of NHC-Catalyzed Asymmetric Redox Acylation of α-Substituted-α,β-Unsaturated Aldehydes” In preparation. 285. Osvaldo Gutierrez, Jeffrey Bode, Marisa C. Kozlowski “Towards the Computational Design of NHC-Catalyst for Enantioselective Peptide Synthesis” National Autonomous University of Mexico (UNAM), Mexico City, Mexico. March 27, 2013. 287. Osvaldo Gutierrez, Jeffrey Bode, Marisa C. Kozlowski “Towards the Computational Design of NHC-Catalyst for Enantioselective Peptide Synthesis” The Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), Mexico City, Mexico. March 28, 2013. 288. Walvoord, R. R.; Huynh, P. N. H.; Kozlowski, M. C. “Rapid Quantification of the LUMO-Lowering Ability of Hydrogen-Bonding Catalysts with a UV-VIS Sensor” Biennial POCC Industrial Award and Poster Session, April 18, 2013. 289. Kozlowski, M.C.; Allen, S. E.; Gutierrez, O. "Using Transition Structure Calculations in the Development of New Organic Reactions" Gettysburg College, Musselman Distinguished Visiting Scientist, February 7, 2013 29. CHE120056 290. Gozem, S., Krylov, A. I., & Olivucci, M. Conical Intersection and Potential Energy Surface Features of a Model Retinal Chromophore: Comparison of EOM-CC and Multireference Methods. J. Chem. Theor. Comput. 2013, 9, 284-292. 291. 
Huix-Rotllant, M.; Filatov, M.; Gozem, S.; Schapiro, I.; Olivucci, M.; and Ferre, N. Assessment of Density Functional Theory for Describing the Correlation Effects on the Ground and Excited State Potential Energy Surfaces of a Retinal Chromophore Model Journal of Chemical Theory and Computation. 2013, 9, 3917-3932. 292. Huntress, M.M. Gozem, S., Malley, K. M., Jailaubekov, A. E., Vasileiou, C., Vengris, M., Geiger, J. H., Borhan, B. Schapiro, I., Larsen, D. S. & Olivucci, M. Towards an Understanding of the Retinal Chromophore in Rhodopsin Mimics. J. Phys. Chem. B 2013, 117, 10053-10070. 293. Gozem, S.; Melaccio, F.; Lindh, R.; Krylov, A. I.; Granovsky, A. A.; Angeli, C.; and Olivucci, M. Mapping the Excited State Potential Energy Surface of a Retinal Chromophore Model with Multireference and Equation-of-Motion Coupled- Cluster Methods Journal of Chemical Theory and Computation. 2013, 9, 4495-4506. 294. El-Khoury, P. Z.; Joseph, S.; Schapiro, I.; Gozem, S.; Olivucci, M.; and Tarnovsky, A. N. Probing Vibrationally Mediated Ultrafast Excited-state Reaction Dynamics with Multireference (CASPT2) Trajectories. J Phys Chem A. 2013, 117, 11271-11275.

295. Rinaldi, S., Melaccio, F., Gozem, S., Fanelli, F. and Olivucci, M. Comparison of the isomerization mechanisms of human melanopsin and invertebrate and vertebrate rhodopsins. Proc. Natl. Acad. Sci. U.S.A. 2014, 111, 1714-1719. 296. Luk, H. L., Melaccio, F., Rinaldi, S., and Olivucci, M. On the Dynamics of the Unidirectional Photoisomerization of Anabaena Sensory Rhodopsin, in preparation. 30. CHE130009 297. Chang, C-E. A.*, and Huang, Y-M. M. Atomistic Modeling of Phosphopeptide Recognition of Modular Domains, Ralph A. Wheeler, editors: Annual Reports in Computational Chemistry, 2013, Vol. 9, 61-84. 298. Zeng, S., Huang, Y-M. M., Chang, C-E. A.*, and Zhong, W*. Protein Binding for Detection of Small Changes on Nanoparticle Surface, Analyst, 2014, 139, 1364-1371. 299. Miyata, N., Tang, Z., Johnson, M. E., Douglas, C. J., Hasson, S. A., Damoiseaux, R., Chang, C-E. A., and Koehler, C. M., Adaptation of a Genetic Screen Reveals an Inhibitor for mitochondrial of Protein import Component, submitted to Molecular Cell. 31. CHE140071 300. Norton, Jack; Spataru, Tudor; Camaioni, Donald; Lee, Suh-Jane; Franz, James; Li, Gang; Choi, Jongwook, "Kinetics and Mechanism of the Hydrogenation of the CpCr(CO)3•/[CpCr(CO)3]2 Equilibrium to CpCr(CO)3H", (accepted by Organometallics). 301. Arthur Han; Tudor Spataru; John Hartung; Jack R. Norton, “The H-radicals cyclization rate constants and their electronic structure”, Org. Chem., 2014, 79 (5), pp 1938–1946. 32. CHE140077 302. Lim, C.-H.; Holder, A. M.; Hynes, J. T.; Musgrave, C. B., Role of the Lewis Acid and Base in the Chemical Reduction of CO2 catalyzed by Frustrated Lewis Pairs. Inorganic Chemistry 2013, 52, 10062 (Grant number OCI-1053575) 303. Aguirre, A.; Lim, C.-H.; Hwang, A.; Musgrave, C. B; Stansbury, J. “Visible-light Organic Photocatalysis for Latent Radical-initiated Polymer Synthesis via 2e-/1H+ transfers”, JACS. 2014, Just accepted. (Grant number: CHE130075) 304. Lim, C.-H.; Holder, A. M.; Hynes, J. T.; Musgrave, C. 
B. “Role of Pyridine as a Biomimetic Organo-hydride for Homogeneous Reduction of CO2 to Methanol”,To be submitted. (Grant number: CHE130075) 33. CTS030025 305. Vahidkhah, K., Diamond, S.L., & Bagchi, P. 2014. Platelet dynamics in three-dimensional simulation of whole blood. Biophysical Journal (submitted). 306. Cordasco, D., Yazdani, A., and Bagchi, P. 2014. Comparison of Erythrocyte Dynamics in Shear Flow under Different Stress-free Configurations. Physics of Fluids (Accepted). 307. Cordasco, D., and Bagchi, P. 2013. Orbital drift of capsules and red blood cells in shear flow. Physics of Fluids, 25 (9), 091902 (24 pages). 308. Vahidkhah, K. Diamond, S.L, and Bagchi, P. 2013. Hydrodynamic interaction between a platelet and erythrocyte: Effect of erythrocyte deformation, dynamics, and wall proximity. Journal of Biomechanical Engineering 135 (5), 051002. 309. Yazdani, A.Z.K., & Bagchi, P., 2013. Influence of membrane viscosity on capsule dynamics in shear flow. Journal of Fluid Mechanics, 718, 569—595. 310. Bagchi, P., Vahidkhah, K., Balogh, P., & Cordasco, D. 2014. Multiscale 3D simulation of whole blood in complex geometry. 2014 SIAM Annual Meeting. Chicago, IL. 311. Vahidkhah, K., Diamond, S.L., & Bagchi, P. 2014. Transport of platelets and other nonspherical particles in whole blood simulation in 3D. 15th USNCTAM 2014. 312. Cordasco, D., & Bagchi, P., 2014. Orbital dynamics of red blood cells and the effect of stress-free shape. 15th USNCTAM 2014. 313. Bagchi, P., Cordasco, D., Yazdani, A., 2013. “The effect of stress-free shapes on the red blood cell dynamics”. 66th Annual Meeting of the APS Division of Fluid Dynamics, November 2013. Pittsburgh, PA. 314. Bagchi, P., 2013. “Particle pressure and normal stress differences in red blood cell and capsule suspensions”. Society of Rheology 85th Annual Meeting, Montreal, QC, October, 2013. 34. CTS070002 315. 
Ghodke, CD, Skitka J., Apte SV., 2013, Characterization of Oscillatory Boundary Layer over a Closely Packed Bed of Sediment Particles, Special Issue on Computational Modelling of Environmental and Biological Multiphase Flows, under review. 316. Ghodke, CD and Apte, SV. DNS of oscillatory boundary layer over a closely packed layer of sediment particles, American Physical Society Division of Fluid Dynamics Annual Meeting, Pittsburgh, November 2013. 317. Drost, K.J., Apte S.V., Haggerty, R., and Jackson, T., 2014, Parameterization of mean residence times in idealized rectangular dead zones representative of natural streams, Journal of Hydraulic Engineering, accepted for publication.

318. J.R. Finn and S.V. Apte, "Integrated computation of finite-time Lyapunov exponent fields during direct numerical simulation of unsteady flows", Chaos: An Interdisciplinary Journal of Nonlinear Science, 23(1):0131451:14, 2013. 319. S.V. Apte and J.R. Finn, "A Variable-Density Fictitious Domain Method for Particulate Flows with Broad Range of Particle-Fluid Density Ratios", Journal of Computational Physics, 213: 109-129, 2013. 320. Drost, K.J., Apte S.V., Haggerty, R., and Jackson, T., "Parametric modelling of residence times in natural streams using RANS simulations", Journal of Hydraulic Engineering, under review, 2013. 321. Jackson, T., Haggerty, R., and Apte S.V., "A Predictive relationship for the mean residence time of lateral cavities in gravel-bed rivers and streams: Incorporating streambed roughness and cavity shape," Water Resources Research, accepted for publication, 2013. 322. Cihonski, A., Finn, J., and Apte S.V., "Volume displacement effects during bubble entrainment in a travelling vortex ring," Journal of Fluid Mechanics, 721:225-267, 2013. 323. Skitka, J., and Apte S.V., "Characterization of Oscillatory Boundary Layer Over a Closely Packed Bed of Sediment Particles", International Conference on Multiphase Flow, May 2013, Jeju Korea. 35. CTS080032 324. Wang, S., Vorotnikov, V., Sutton, J. E. & Vlachos, D. G. BEP and TSS Relations for furanic chemistry on Pd(111). (in preparation). 325. Vorotnikov, V. & Vlachos, D. G. Adsorption of furanic biomass derivatives on metal surfaces: a DFT study. (in preparation). 326. Vorotnikov, V. & Vlachos, D. G. DFT-parameterized group additivity extended to furanic compounds. (in preparation). 327. Gu, G. H. & Vlachos, D. G. Reaction mechanism for p-cresol hydrogenation and dehydroxylation on Pt(111). (in preparation). 328. Christiansen, M. A. & Vlachos, D. G. DFT-driven Multi-Site Microkinetic Modeling of Ethanol Conversion to Ethylene and Diethyl Ether on γ-Al2O3 (in preparation). 329. Christiansen, M. A. 
& Vlachos, D. G. Ethanol Dehydration and Etherification on γ-Al2O3(110): Mechanisms and Insights. (in preparation). 330. Wang, S., Vorotnikov, V. & Vlachos, D. G. A DFT Study Of Furan Hydrogenation and Ring Opening on Pd(111) Green Chem. Accepted, doi:10.1039/C3GC41183D (2013). 331. Nikbin, N. et al. A DFT study of the acid-catalyzed conversion of 2,5-dimethylfuran and ethylene to p-xylene. J. Catal. 297, 35-43, doi:10.1016/j.jcat.2012.09.017 (2013). 332. Guo, W. & Vlachos, D. G. Effect of local metal microstructure on adsorption on bimetallic surfaces: Atomic nitrogen on Ni/Pt(111). J. Chem. Phys. 138, doi:10.1063/1.4803128 (2013). 36. CTS090004 333. Kim, J., Bodony, D. J., Freund, J. (2014) “Adjoint-based control of loud events in a turbulent jet,” Journal of Fluid Mechanics, Vol. (741), pp. 28–59. 334. Sucheendran, M. M., Bodony, D. J., and Geubelle, P. H. (2014) “Coupled structural-acoustic response of a duct- mounted elastic plate with grazing flow,” AIAA Journal, Vol. 52(1), pp. 178–194. 335. Ostoich, C., Bodony, D. J., and Geubelle, P. H. (2013) “Interaction of a Mach 2.25 turbulent boundary layer with a fluttering panel using direct numerical simulation,” Physics of Fluids, Vol. 25(11), 110806, 27 pages. 336. Zhang, Q., and Bodony, D. J. “Numerical Investigation of 3-D Honeycomb Liner with Turbulent Boundary Layer. Part 1. Sound-orifice-boundary layer Interaction,” submitted to the Journal of Fluid Mechanics. 337. Zhang, Q., and Bodony, D. J. “Numerical Investigation of 3-D Honeycomb Liner with Turbulent Boundary Layer. Part 2. Impedance Prediction and Modeling,” submitted to the Journal of Fluid Mechanics. 338. Jambunathan, R., and Bodony, D. J. “Effect of heating on sound radiated from a two-dimensional mixing layer,” submitted to the International Journal of Aeroacoustics. 339. Bodony, D. J. 
(2014) “Foundational Shifts in Computing-Enabled Design & Engineering,” Plenary Panel on Emerging Technologies of Importance to Aerospace Engineering, Presented at the 52nd AIAA Aerospace Sciences Meeting, National Harbor, MD. 340. Zhang, W., and Larson, J. L., and Bodony, D. J., and Wilson, L. A. (2014) “ECSS Experience: Optimizing a Production Code in Preparation for Intel Xeon Phi,” XSEDE14. 341. Ostoich, C., Bodony, D. J., and Geubelle, P. H. (2013) “Direct numerical simulation of the aeroelastic response of a panel under high speed turbulent boundary layers,” AIAA Paper 2013-3200, Presented at the 43rd Fluid Dynamics Conference, San Diego, CA, 24-27 June, 2013. 342. Zhang, Q. and Bodony, D. J. (2013) “Impedance Prediction of Three-Dimensional Honeycomb Liners with Laminar/Turbulent Boundary Layers using DNS,” AIAA Paper 2013-2268, Presented at the 19th AIAA/CEAS Aeroacoustics Conference (34th AIAA Aeroacoustics Conference), Berlin, Germany, May 27-29, 2013.

343. Natarajan, M., Freund, J. B., and Bodony, D. J. (2013) “Structural Sensitivity for Estimating Actuator and Sensor Placement for Flow Control,” Presented at the 66th Annual Meeting of the APS Division of Fluid Dynamics, Epitome, Vol. 58(18). 344. Vishnampet, R., Bodony, D. J., Freund, J. B. (2013) “Adjoint-field errors in high fidelity compressible turbulence simulations for sound control,” Presented at the 66th Annual Meeting of the APS Division of Fluid Dynamics, Epitome, Vol. 58(18). 345. Saurabh, S., Faber, J., and Bodony, D. J. (2013) “Study of dynamic fluid-structure coupling with application to human phonation,” Presented at the 66th Annual Meeting of the APS Division of Fluid Dynamics, Epitome, Vol. 58(18). 346. Bodony, D. J., Ostoich, C., and Geubelle, P. H. (2013) “Interaction of a Mach 2.25 turbulent boundary layer with a fluttering panel using direct numerical simulation,” Presented at the 66th Annual Meeting of the APS Division of Fluid Dynamics, Epitome, Vol. 58(18). 347. Zhang, Q., and Bodony, D. J. (2013) “Interaction of a turbulent boundary layer with a cavity-backed circular orifice and tonal acoustic excitation,” Presented at the 66th Annual Meeting of the APS Division of Fluid Dynamics, Epitome, Vol. 58(18). 37. CTS100062 348. B. Ray and L. R. Collins. Investigation of sub-Kolmogorov inertial particle pair dynamics in turbulence using novel satellite particle simulations. J. Fluid Mech., 720:192-211, 2013. 349. B. Ray. New Subgrid Models For Inertial Particles In Large-Eddy Simulations Of Turbulent Flows. PhD thesis, Cornell University, 2013. 350. B. Ray and L. R. Collins. New subgrid models for inertial particles in large-eddy simulations of turbulent flows. J. Fluid Mech., 2014. in press. 351. P. S. Sukheswalla, T. Vaithianathan, and L. R. Collins. Simulation of homogeneous turbulent shear flows at higher Reynolds numbers: numerical issues and resolution. Journal of Turbulence, 14:60-97, 2013. 352. P. S. Sukheswalla and L. R. Collins. 
Anisotropy of inertial-particle clustering in homogeneous turbulent shear flow. In Bull. APS Div. Fluid Dyn., volume 58, 2013. 353. P. S. Sukheswalla and L. R. Collins. Anisotropic clustering of inertial particles in homogeneous turbulent shear flow: effects of the shear parameter, Reynolds number, and gravity. Journal of Turbulence. 2014. In Preparation. 354. P. S. Sukheswalla and L. R. Collins. Relative velocity anisotropy of inertial particles in homogeneous turbulent shear flow: effects of the shear parameter, Reynolds number, and gravity. Journal of Turbulence. 2014. In Preparation. 38. CTS110029 355. Donzis, D. A. and Aditya, K. (2014) Asynchronous finite-difference schemes for partial differential equations. J. Comp. Phys., (under review). 356. Donzis D.A., Aditya K., Yeung P.K. and Sreenivasan K.R. (2014) “The turbulent Schmidt number” J. Fluid Eng.. doi:10.1115/1.4026619. 357. Donzis, D.A., and Jagannathan, S. (2013) Fluctuations of thermodynamic variables in compressible turbulence. J. Fluid Mech. 733, 221-244. 358. Donzis, D. A. and Jagannathan, S. (2013) On the relation between small-scale intermittency and shocks in turbulent flows. Procedia IUTAM, 9, 3-15. 359. Donzis, D.A., Gibbon, J.D., Kerr, R.M., Pandit, R., Gupta, A., and Vincenzi, D. (2013) Vorticity moments in four numerical simulations of the 3D Navier-Stokes equations. J. Fluid Mech. 732, 316-331. 360. Donzis, D.A. and Jagannathan S. (2013) Reynolds-number scaling of compressible turbulence using high-resolutions direct numerical simulations. J. Fluid Mech. (under preparation). 361. Donzis, D.A. and Maqui A. (2013) Dynamics of non-local interactions in isotropic turbulence. Phys. Fluids (under preparation). 362. Aditya, K. and Donzis, D. A. (2013) Asynchronous Computing for Partial Differential Equations at Extreme Scales. Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, IEEE Computer Society, 2012. 363. 
American Physical Society, 66th Fluid Dynamics Meeting (Nov 2013): (See Bull. Am. Phys. Soc. 58(18).) Jagannathan A. & Donzis D.A. (2013) “The role of compressions and expansions in stationary compressible isotropic turbulence” 364. American Physical Society, 66th Fluid Dynamics Meeting (Nov 2013): (See Bull. Am. Phys. Soc. 58(18).) Maqui A. & Donzis D.A. (2013) “Energy dynamics in turbulence generated through concentrated regions of intense kinetic energy” 365. American Physical Society, 66th Fluid Dynamics Meeting (Nov 2013): (See Bull. Am. Phys. Soc. 58(18).) Konduri A. & Donzis D.A. (2013) “Asynchronous schemes for CFD at extreme scales” 366. American Physical Society, 66th Fluid Dynamics Meeting (Nov 2013): (See Bull. Am. Phys. Soc. 58(18).) Donzis D.A. (2013) Francois N. Frenkiel Award Talk: “Shock Structure in Shock-Turbulence Interactions”

39. CTS110039 367. N.Y.C. Lin, S. Goyal, X. Cheng, R.N. Zia, F. A. Escobedo, and I. Cohen, "Far-from-equilibrium sheared colloidal liquids: Disentangling relaxation, advection, and shear-induced diffusion", Phys. Rev. E 88, 062309 (2013). 368. S. Turgman-Cohen, J.C. Araque, E. Hoek, and F.A. Escobedo, “Molecular Dynamics of Equilibrium and Pressure-Driven Transport Properties of Water through LTA-type Zeolites”, Langmuir 29, 12389-12399 (2013). 369. P. Padmanabhan, M. Chavis, C.K. Ober, and F.A. Escobedo “Phase behavior of PMMA-b-PHEMA with solvents methanol and THF: Modeling and comparison to experiment”, submitted to Soft Matter (April 2014). 370. S. Goyal and F.A. Escobedo “Coarse-grained modeling of blends of oligomers and oligomer-grafted nanoparticles, structural anomalies and miscibility behavior”, in preparation. 40. CTS130004 371. Agrawal, B., A. Rosenberg, and A. Sharma, 2014: On assessing the impact of wake turbulence on wind turbine noise. In 2nd Symposium on OpenFOAM in Wind Energy, Boulder, Colorado, USA, abstract accepted. 372. Agrawal, B. and A. Sharma, 2014a: Aerodynamic noise prediction for a rod-airfoil configuration using large eddy simulations. In 20th AIAA/CEAS Aeroacoustics Conference, Atlanta, Georgia, USA, abstract accepted. 373. Agrawal, B. and A. Sharma, 2014b: Modeling fan broadband noise: Rod-airfoil configuration. Poster at the Research and Design Exposition, Iowa State University, Ames, IA. 374. Rosenberg, A., S. Selvaraj, and A. Sharma, 2014a: A novel dual-rotor turbine for increased wind energy capture. In Proceedings of the Science of Making Torque from Wind, DTU, Denmark, full paper submitted. 375. Rosenberg, A., S. Selvaraj, and A. Sharma, 2014b: A novel dual-rotor turbine for increased wind energy capture. Poster at the Research and Design Exposition, Iowa State University, Ames, IA. 376. Selvaraj, S., A. Chaves, E. Takle, and A. Sharma, 2013: Numerical prediction of surface flow convergence phenomenon in windfarms. 
In Proceedings of the Conference on Wind Energy Science and Technology, RUZGEM, Middle Eastern Technical University, METU, Ankara, Turkey. 377. Selvaraj, S. and A. Sharma, 2014: On predicting the phenomenon of surface flow convergence in windfarms. In Proceedings of the ASME 2014 International Gas Turbine Institute ASME/IGTI 2014 , American Society of Mechanical Engineers, Dusseldorf, Germany. 378. Selvaraj, S., E. Takle, D. Rajewski, and A. Sharma, 2014: Numerical investigation of surface flow convergence in wind farms. Poster at the Research and Design Exposition, Iowa State University, Ames, IA. 379. Sharma, A., 2013: Large eddy simulations for predicting aerodynamic noise due to rod wake-airfoil interaction. In Proceedings of the Conference on Wind Energy Science and Technology, RUZGEM, Middle Eastern Technical University, METU, Ankara, Turkey. 380. Takle, E., D. Rajewski, J. K. Lundquist, W. A. Gallus, and A. Sharma, 2014: Measurements in support of wind farm simulations and power forecasts: The crop/wind-energy experiments (cwex). In Proceedings of the Science of Making Torque from Wind, DTU, Denmark, full paper submitted. 41. CTS130016 381. Vishal Acharya and Tim Lieuwen, “Role of Azimuthal Flow Fluctuations in Global Response of Axisymmetric Swirling Flames”, accepted for presentation at 52nd AIAA Aerospace Sciences Meeting and Exhibit (AIAA SciTech 2014), National Harbor, MD, USA, Jan. 2014, Paper# AIAA-2014-0654. 382. Vishal Acharya and Tim Lieuwen, “Response of Non-axisymmetric Premixed, Swirl Flames to Helical Disturbances”, accepted for presentation at ASME IGTI Turbo Expo 2014, Dusseldorf, Germany, Jun. 2014, Paper# GT2014-27059. 383. Vishal Acharya and Tim Lieuwen, “Sound Generation from Premixed Flames Excited by Helical Flow Disturbances”, under review at Journal of Sound and Vibration. 42. CTS130030 384. 
Zhi Liang, William Evans, Tapan Desai and Pawel Keblinski, “Improvement of heat transfer efficiency at solid-gas interfaces by self-assembled monolayers” Applied Physics Letters 102, 061907 (2013). 385. Zhi Liang, William Evans, and Pawel Keblinski, “Equilibrium and nonequilibrium molecular dynamics simulations of thermal conductance at solid-gas interfaces” Physical Review E 87, 022119 (2013). 386. Zhi Liang, and Pawel Keblinski, “Parametric Studies of the Thermal and Momentum Accommodation of Monoatomic and Diatomic Gases on Solid Surfaces” International Journal of Heat and Mass Transfer (in review). 387. Zhi Liang, William Evans and Pawel Keblinski, “Improvement of Heat Transfer Efficiency at Solid-gas Interfaces by Self-assembled Monolayers”, Materials Research Society Spring Meeting & Exhibit, April 2013, San Francisco.

43. CTS140036 388. P. Albal T. Montidoro O. Dur P. Menon Gopalakrishna. Statistical Atlases and Computational Models of the Heart. Imaging and Modelling Challenges. Lecture Notes in Computer Science, Statistical Atlases and Computational Models of the Heart. Imaging and Modelling Challenges. Volume 8330. 2014. 389. P. Menon Gopalakrishna J. F. Antaki A. Undar et al. Aortic Outflow Cannula Tip Design and Orientation Impacts Cerebral Perfusion During Pediatric Cardiopulmonary Bypass Procedures. Annals of Biomedical Engineering. 2013. 390. P. Menon Gopalakrishna N. Teslovich C. Chen et al. Characterization of neonatal aortic cannula jet flow regimes for improved cardiopulmonary bypass. Journal of Biomechanics. 46 / 2. 2013. 391. H. Hong P. Menon Gopalakrishna H. Zhang et al. Postsurgical Comparison of Pulsatile Hemodynamics in Five Unique Total Cavopulmonary Connections: Identifying Ideal Connection Strategies. The Annals of Thoracic Surgery. 2013. 392. P. Menon Gopalakrishna M. Yoshida K. Pekkan Presurgical Evaluation of Fontan Connection Options for Patients With Apicocaval Juxtaposition Using Computational Fluid Dynamics. Artificial Organs. 37 / 1. 2013. 393. P. Menon Gopalakrishna K. Pekkan S. Madan Statistical Atlases and Computational Models of the Heart. Imaging and Modelling Challenges. Springer-Verlag Berlin Heidelberg. Berlin. 2013. 394. M. Yoshida P. Menon Gopalakrishna C. Chrysostomou et al. Total cavopulmonary connection in patients with apicocaval juxtaposition: optimal conduit route using preoperative angiogram and flow simulation. European Journal of Cardio-Thoracic Surgery. 44 / 1. 2013. 44. DEB090011 395. Borba E, Salazar GA, Mazzoni-Viveiros S, Batista Phylogenetic position and floral morphology of the Brazilian endemic, monospecific genus Cotylolabium (Orchidaceae, Spiranthinae). Botanical Journal of the Linnean Society. In press. 396. Chang and Graham. 
Patterns of clade support across the major lineages of moss phylogeny. Cladistics In press. 397. Toussaint, E.F.A., Condamine, F.L., Hawlitschek, O., Watts, C., Hendrich, L. & Balke, M. Unveiling the diversification dynamics of Australasian predaceous diving in the Cenozoic. Systematic Biology. In press. (IF: 12.169) 398. Graf, D.L., A.J. Geneva, J.M. Pfeiffer III & A.D. Chilala. 2013. Phylogenetic analysis of Prisodontopsis Tomlin, 1928 and Mweruella Haas, 1936 (Bivalvia: Unionidae) from Lake Mweru (Congo Basin) reveals endemic Quaternary radiation in the Zambian Congo. Journal of Molluscan Studies, In press. 399. Hametner, C., Stocker-Wörgötter E., Grube M. & Rindi F. (2014) Phylogenetic position and morphology of lichenized Trentepohliales (Ulvophyceae, Chlorophyta) from selected species of Graphidaceae. Phycological Research In press. 400. Hendrix et al. : First comprehensive insights into nuclear and mitochondrial DNA based population structure of Near East mountain brook newts (Salamandridae: genus Neurergus) suggest the resurrection of Neurergus derjugini. Amphibia-Reptilia In press. 401. Sharma, B. and E. M. Kramer (2014). The MADS box gene family of the lower Eudicot Aquilegia coerulea E. James. Annals of the Missouri Botanical Garden, In press. 402. Kraichak, E. S. Parnmen, R. Lücking, T. Lumbsch. 2014. Gintarasia and Xalocoa, two new genera to accommodate temperate to subtropical species in the predominantly tropical Graphidaceae (Ostropales, ). Australian Systematic Botany. In press. 403. H. Thorsten Lumbsch, Sittiporn Parnmen, Ekaphan Kraichak, Khwanruan Butsatorn Papong & Robert Lucking. 2014. High frequency of character transformations is phylogenetically structured within the lichenized fungal family Graphidaceae (Ascomycota: Ostropales). Systematics and Biodiversity. 2014. In press. 404. Li, F.W., J.C. Villarreal, S. Kelly, C.J. Rothfels, M. Melkonian, E. Frangedakis, M. Ruhsam, E. M. Sigel, J.P. Der, J. Pittermann, D.O. Burge, L. Pokorny, A. 
Larsson, T. Chen, S. Weststrand, P. Thomas, E. Carpenter, Y. Zhang, Z. Tian, L. Chen, Z. Yan, Y. Zhu, X. Sun, J. Wang, D.W. Stevenson, B.J. Crandall-Stotler, A.J. Shaw, M.K. Deyholos, D.E. Soltis, S.W. Graham, M.D. Windham, J.A. Langdale, G.K.S. Wong, S. Mathews & K.M. Pryer. Horizontal transfer of an adaptive chimeric photoreceptor from bryophytes to ferns. Proceedings of the National Academy of Sciences USA, In press. 405. Malay MDC and Michonneau F. Phylogenetics and Morphological Evolution of Coral Dwelling Barnacles (Balanomorpha: Pyrgomatidae). Biological Journal of the Linnean Society, In press. 406. Malm, T. & Nyman, T. 2014. Phylogeny of the symphytan grade of : new pieces into the old jigsaw(fly) puzzle. Cladistics, In press. 407. Moura M., Carine M., Malécot V., Lourenço L., Schaefer H. & L. Silva. A taxonomic reassessment of Viburnum (Adoxaceae) in the Azores. Phytotaxa In press. 408. Orange, A. (2014) Porpidia irrigua, a new species related to P. contraponenda. Lichenologist 46 In press. 409. Orange, A. (2014) Two new or misunderstood species related to Verrucaria praetermissa (Verrucariaceae, lichenised Ascomycota). Lichenologist 46 In press.

410. Pedron M., Buzatto C. R., Ramalho A. J., Carvalho B. M., Radins J. A., Rodrigo B. Singer R. B., Batista J. A. N. Molecular phylogenetics and taxonomic revision of Habenaria sect. Pentadactylae (Orchidaceae, Orchidinae). Botanical Journal of the Linnean Society In press. 411. Pick, C., Scherbaum, S., Hegedüs, E., Meyer, A., Saur, M., Neumann, R., Markl, J., and Burmester, T. (2014) Structure, diversity and evolution of myriapod hemocyanins. FEBS J. In press. 412. Phylogeny, identification and pathogenicity of the Botryosphaeriaceae associated with collar and root rot of the biofuel plant Jatropha curcas in Brazil, with a description of new species of Lasiodiplodia. Fungal Diversity 2014 In press. 413. Humphreys, A. M., and T. G. Barraclough. The evolutionary reality of higher taxa in mammals. Proceedings of the Royal Society B. In press. 414. Spinks, PQ., Thomson, RC., Gidis, M, Shaffer, HB. Multilocus phylogeny of the New-World mud turtles (Kinosternidae) supports the traditional classification of the group. Molecular Phylogenetics and Evolution. In press 415. Tanganelli V, Fanciulli PP, Nardi F, Frati F. Molecular phylogenetic analysis of a novel strain from Neelipleona enriches Wolbachia diversity in soil biota. Pedobiologia, In press. I.F.: 1,690 (online 10/10/13) 416. Nardi F, Liò P, Carapelli A, Frati F. MtPAN3: site-class specific amino acid replacement matrices for mitochondrial proteins of Pancrustacea and Collembola. Molecular Phylogenetics and Evolution, In press. I.F.: 3,889 (online 10/02/14) 417. Unmack, P. J., T. E. Dowling, N. J. Laitinen, C. L. Secor, R. L. Mayden, D. K. Shiozawa, and G. R. Smith. 2014. Influence of introgression and geological processes on phylogenetic relationships of western North American mountain suckers (Pantosteus, Catostomidae). PLoS One, In press 418. Valderrama, E., Perez-Eman, J.E., Cuervo, A.M., Brumfield, R.T. Cadena, C.D. 2014. 
The influence of the complex topography and dynamic history of the montane Neotropics on the evolutionary differentiation of a cloud forest bird (Premnoplex brunnescens, Furnariidae). Journal of Biogeography, In press. 419. Yangbo Fan, Ying Pan, Jie Huang, Xiaofeng Lin, Xiaozhong Hu, Alan Warren. Molecular phylogeny and of two novel brackish water hypotrich , with the establishment of a new genus, Antiokeronopsis gen. n. (Ciliophora, ). The Journal of Eukaryotic Microbiology. In press 420. Tuomi PA, Murray MJ, Garner MM, Goertz CEC, Nordhausen RW, Burek-Huntington KA, Getzy DM, Nielsen O, Archer LL, Maness HTD, Wellehan JFX, Waltzek TB. Novel poxvirus infection in northern and southern sea otters (Enhydra lutris kenyoni and Enhydra lutris nereis). Journal of Wildlife Diseases 2014 421. Dueñas, L. F., Alderslade, P., and Sánchez, J. A. (2014) Molecular systematics of the deepsea bamboo corals (Octocorallia: Isididae: Keratoisidinae) from New Zealand with descriptions of two new species of Keratoisis. Molecular Phylogenetics and Evolution 74, 15-28 422. Santoferrara, L. F., Grattepanche, J.-D., Katz, L. A., and McManus, G. B. (2014) Pyrosequencing for assessing diversity of eukaryotic microbes: analysis of data on marine planktonic ciliates and comparison with traditional methods. Environmental Microbiology, n/a-n/a 423. Boeger, W. A., Kritsky, D. C., Domingues, M. V., and Bueno-Silva, M. (2014) The phylogenetic position of the Loimoidae Price, 1936 (Monogenoidea: Monocotylidea) based on analyses of partial rDNA sequences and morphological data. Parasitol. Int. 63, 492-499 424. Boeger, W. A., Ferreira, R. C., Vianna, R. T., and Patella, L. (2014) Neotropical Monogenoidea. 59. Polyonchoineans from Characidium spp. (Characiformes: Crenuchidae) from southern Brazil. Folia Parasitologica 425. Bueno-Silva, M., and Boeger, W. A. (2014) Neotropical Monogenoidea. 58. Three new species of Gyrodactylus (Gyrodactylidae) from Scleromystax spp. 
(Callichthyidae) and the proposal of COII gene as an additional fragment for barcoding gyrodactylids. . Folia Parasitologica (Praha) 426. Batalha-Filho, H., Fjeldså, J., Fabre, P.-H., and Miyaki, C. (2013) Connections between the Atlantic and the Amazonian forest avifaunas represent distinct historical events. Journal of Ornithology 154, 41-50 427. Bertrand, S., Iwema, T., and Escriva, H. (2014) FGF Signaling Emerged Concomitantly with the Origin of Eumetazoans. Molecular Biology and Evolution 31, 310-318 428. Neumann, G., Macken, C. A., and Kawaoka, Y. (2014) Identification of amino acid changes that may have been critical for the genesis of A(H7N9) influenza viruses. J Virol 429. Yamayoshi, S., Yamada, S., Fukuyama, S., Murakami, S., Zhao, D., Uraki, R., Watanabe, T., Tomita, Y., Macken, C., Neumann, G., and Kawaoka, Y. (2014) Virulence-Affecting Amino Acid Changes in the PA Protein of H7N9 Influenza A Viruses. J Virol 88, 3127-3134. 430. Higginbotham, S., Wong, W. R., Linington, R. G., Spadafora, C., Iturrado, L., and Arnold, A. E. (2014) Sloth Hair as a Novel Source of Fungi with Potent Anti-Parasitic, Anti-Cancer and Anti-Bacterial Bioactivity. PLoS ONE 9, e84549 431. Aggerbeck, M., Fjeldså, J., Christidis, L., Fabre, P.-H., and Jønsson, K. A. (2014) Resolving deep lineage divergences in core corvoid passerine birds supports a proto-Papuan island origin. Molecular Phylogenetics and Evolution 70, 272-285 432. Atkinson, G., Kuzmenko, A., Chicherin, I., Soosaar, A., Tenson, T., Carr, M., Kamenski, P., and Hauryliuk, V. (2014) An evolutionary ratchet leading to loss of elongation factors in eukaryotes. BMC Evolutionary Biology 14, 35 433. Hartikainen, H., Ashford, O. S., Berney, C., Okamura, B., Feist, S. W., Baker-Austin, C., Stentiford, G. D., and Bass, D. (2014) Lineage-specific molecular probing reveals novel diversity and ecological partitioning of haplosporidians. ISME J 8, 177-186

434. Chen, W.-J., Koch, M., Mallatt, J. M., and Luan, Y.-X. (2014) Comparative Analysis of Mitochondrial Genomes in Diplura (Hexapoda, Arthropoda): Taxon Sampling Is Crucial for Phylogenetic Inferences. Genome Biology and Evolution 6, 105-120 435. Cooper, J. A. (2014) New species and combinations of some New Zealand agarics belonging to Clitopilus, Lyophyllum, Gerhardtia, Clitocybe, Hydnangium, Mycena, Rhodocollybia and Gerronema. Mycosphere 5, 263–288 436. Linde, C. C., Phillips, R. D., Crisp, M. D., and Peakall, R. (2014) Congruent species delineation of Tulasnella using multiple loci and methods. New Phytologist 201, 6-12 437. Drew, B. T., Ruhfel, B. R., Smith, S. A., Moore, M. J., Briggs, B. G., Gitzendanner, M. A., Soltis, P. S., and Soltis, D. E. (2014) Another Look at the Root of the Angiosperms Reveals a Familiar Tale. Systematic Biology 438. Dumont, E. R., Samadevam, K., Grosse, I., Warsi, O. M., Baird, B., and Davalos, L. M. (2014) Selection for mechanical advantage underlies multiple cranial optima in new world leaf-nosed bats. Evolution. 439. Ellingson, R. A., Swift, C. C., Findley, L. T., and Jacobs, D. K. (2014) Convergent evolution of ecomorphological adaptations in geographically isolated Bay gobies (Teleostei: Gobionellidae) of the temperate North Pacific. Molecular Phylogenetics and Evolution 70, 464-477 440. Engel, J. I., Byamana, K., Kahindo, C., Bates, J. M., and Fjeldså, J. (2014) Genetic structure offers insights into the evolution of migration and the taxonomy of the Barred Long-tailed Cuckoo Cercococcyx montanus species complex. Ibis 156, 330-340. 441. Evangelista, D. A., Bourne, G., and Ware, J. L. (2014) Species richness estimates of Blattodea s.s. (Insecta: Dictyoptera) from northern Guyana vary depending upon methods of species delimitation. Systematic Entomology 39, 150-158 442. Huang, J., Chen, Z., Song, W., and Berger, H. 
(2014) Three-gene based phylogeny of the Urostyloidea (Protista, Ciliophora, Hypotricha), with notes on classification of some core taxa. Molecular Phylogenetics and Evolution 70, 337- 347. 443. Gottschling, M., Luebert, F., Hilger, H. H., and Miller, J. S. (2014) Molecular delimitations in the Ehretiaceae (Boraginales). Molecular Phylogenetics and Evolution 72, 1-6. 444. He, D., Fiz-Palacios, O., Fu, C.-J., Fehling, J., Tsai, C.-C., and Baldauf, Sandra L. (2014) An Alternative Root for the Eukaryote Tree of Life. Current biology : CB 24, 465-470. 445. Whitten, W. M., Neubig, K. M., and Williams, N. H. (2014) Generic and Subtribal relationships in neotropical Cymbidieae (orchidaceae) based on matK/ycf1 plastid data. Lankesteriana 13, 375—392. 446. Hu, Y., Łukasik, P., Moreau, C. S., and Russell, J. A. (2014) Correlates of gut community composition across an species ( varians) elucidate causes and consequences of symbiotic variability. Molecular Ecology 23, 1284- 1300. 447. Joseph, L., Toon, A., Nyári, Á. S., Longmore, N. W., Rowe, K. M. C., Haryoko, T., Trueman, J., and Gardner, J. L. (2014) A new synthesis of the molecular systematics and biogeography of honeyeaters (Passeriformes: Meliphagidae) highlights biogeographical and ecological complexity of a spectacular avian radiation. Zoologica Scripta, n/a-n/a. 448. Kamilari, M., Klossa-Kilia, E., Kilias, G., and Sfenthourakis, S. (2014) Old Aegean palaeoevents driving the diversification of an endemic isopod species (Oniscidea, Trachelipodidae). Zoologica Scripta. 449. Kanzaki, N., Ragsdale, E. J., Herrmann, M., and Sommer, R. J. (2014) Two New and Two Recharacterized Species from a Radiation of Pristionchus (Nematoda: Diplogastridae) in Europe. Journal of Nematology 46, 60-74 450. Hughes, J. M., Schmidt, D. J., Macdonald, J. I., Huey, J. A., and Crook, D. A. (2014) Low interbasin connectivity in a facultatively diadromous fish: evidence from genetics and otolith chemistry. Molecular Ecology 23, 1000-1013. 451. 
Luna, J. A., Demissew, S., Darbyshire, I., and Carine, M. A. (2014) The significance of one versus two styles: the return of Seddera section Socotroseddera to Convolvulus. 452. Magain, N., and Sérusiaux, E. (2014) Do Photobiont Switch and Cephalodia Emancipation Act as Evolutionary Drivers in the Lichen Symbiosis? A Case Study in the Pannariaceae (Peltigerales). PLoS ONE 9, e89876. 453. Mahmood, M. T., McLenachan, P. A., Gibb, G. C., and Penny, D. (2014) Phylogenetic position of avian nocturnal and diurnal raptors. Genome Biology and Evolution. 454. Di Domenico, M., Martínez, A., Lana, P., and Worsaae, K. Molecular and morphological phylogeny of Saccocirridae (Annelida) reveals two cosmopolitan clades with specific habitat preferences. Molecular Phylogenetics and Evolution. 455. McHugh, A., Yablonsky, C., Binford, G., and Agnarsson, I. (2014) Molecular phylogenetics of Caribbean Micrathena (Araneae: Araneidae) suggests multiple colonization events and single island endemism Invertebrate Systematics. 456. Mottern, J. L., and Heraty, J. M. (2014) Revision of the Cales noacki species complex (Hymenoptera, Chalcidoidea, Aphelinidae). Systematic Entomology 39, 354-379. 457. Fernandes, A. M., Wink, M., Sardelli, C. H., and Aleixo, A. (2014) Multiple speciation across the Andes and throughout Amazonia: the case of the spot-backed antbird species complex (Hylophylax naevius/Hylophylax naevioides). Journal of Biogeography. 458. Mantelatto, F. L., Pezzuto, P. R., Masello, A., Wongtschowskid, C. L. B. R., Hilsdorfe, A. W. S., and Rossi, N. (2014) Molecular analysis of the commercial deep-sea crabs Chaceon ramosae and Chaceon notialis (Brachyura, Geryonidae) reveals possible cryptic species in the South Atlantic. Deep-Sea Research Part I - Oceanographic Research Papers 84, 29–37.


459. Trimbos, K. B., Doorenweerd, C., Kraaijeveld, K., Musters, C. J. M., Groen, N. M., de Knijff, P., Piersma, T., and de Snoo, G. R. (2014) Patterns in Nuclear and Mitochondrial DNA Reveal Historical and Recent Isolation in the Black- Tailed Godwit (Limosa limosa). PLoS ONE 9, e8394 460. Penney, H. D., Hassall, C., Skevington, J. H., Lamborn, B., and Sherratt, T. N. (2014) The relationship between morphological and behavioural mimicry in hover flies (Diptera: Syrphidae). The American Naturalist. 183, 281-289. 461. Pérez-Ortega, S., Suija, A., Crespo, A., and Ríos, A. (2014) Lichenicolous fungi of the genus Abrothallus (Dothideomycetes: Abrothallales ordo nov.) are sister to the predominantly aquatic Janhulales. Fungal Diversity 64, 295- 304. 462. Lopes, U. P., Zambolim, L., Pinho, D. B., Barros, A. V., Costa, H., and Pereira, O. L. (2014) Postharvest rot and mummification of strawberry fruits caused by Neofusicoccum parvum and N. kwambonambiense in Brazil Tropical Plant Pathology 39, 191. 463. Alström, P., Hooper, D. M., Liu, Y., Olsson, U., Mohan, D., Gelang, M., Le Manh, H., Zhao, J., Lei, F., and Price, T. D. (2014) Discovery of a relict lineage and monotypic family of passerine birds. Biology Letters 10. 464. Reynolds, H. T., and Barton, H. A. (2014) Comparison of the White-Nose Syndrome Agent Pseudogymnoascus destructans to Cave-Dwelling Relatives Suggests Reduced Saprotrophic Enzyme Activity. PLoS ONE 9, e86437. 465. Richart, C. H. (2014) Taxonomic Revision of the Opilionid Genus Acuclavella Shear, 1986 (Opiliones, Ischyropsalidoidea) based on Molecular Phylogenetics and Morphometrics with Description of Four New Species. in Biology, San Diego State University. 466. Bradler, S., Robertson, J. A., and Whiting, M. F. (2014) A molecular phylogeny of Phasmatodea with emphasis on Necrosciinae, the most species-rich subfamily of stick . Systematic Entomology 39, 205-222. 467. Tkach, N., Ree, R. H., Kuss, P., Röser, M., and Hoffmann, M. H. 
(2014) High mountain origin, phylogenetics, evolution, and niche conservatism of arctic lineages in the hemiparasitic genus Pedicularis (Orobanchaceae). Molecular Phylogenetics and Evolution 76, 75-92. 468. Vďačný, P., Breiner, H.-W., Yashchenko, V., Dunthorn, M., Stoeck, T., and Foissner, W. (2014) The Chaos Prevails: Molecular Phylogeny of the Haptoria (Ciliophora, Litostomatea). Protist 165, 93-111. 469. Xiang, X.-G., Jin, W.-T., Li, D.-Z., Schuiteman, A., Huang, W.-C., Li, J.-W., Jin, X.-H., and Li, Z.-Y. (2014) Phylogenetics of Tribe Collabieae (Orchidaceae, Epidendroideae) Based on Four Chloroplast Genes with Morphological Appraisal. PLoS ONE 9, e87625. 470. Xie, W., and Luan, Y.-X. (2014) Evolutionary implications of dipluran hexamerins. Biochemistry and Molecular Biology 46, 17-24. 471. Denton, J. S. S. Seven-locus molecular phylogeny of Myctophiformes (Teleostei; Scopelomorpha) highlights the utility of the order for studies of deep-sea evolution. Molecular Phylogenetics and Evolution. 472. Haines, W. P., Schmitz, P., and Rubinoff, D. (2014) Ancient diversification of Hyposmocoma moths in Hawaii. Nat Commun . 473. Ferrari, B., and Melo, G. R. (2014) Deceiving colors: recognition of color morphs as separate species in orchid bees is not supported by molecular evidence. Apidologie, 1-12. 474. Scheel, B. M., and Hausdorf, B. (2014) Dynamic evolution of mitochondrial ribosomal proteins in Holozoa. Molecular Phylogenetics and Evolution 76, 67-74. 475. Hustad, V., Kučera, V., Rybáriková, N., Lizoň, P., Gaisler, J., Baroni, T., and Miller, A. (2014) Geoglossum simile of North America and Europe: distribution of a widespread earth tongue species and designation of an epitype. Mycological Progress, 1-10. 476. Cressey, E. R., Measey, G. J., and Tolley, K. A. (2014) Fading out of view: the enigmatic decline of Rose's mountain toad Capensibufo rosei. Oryx FirstView, 1-8. 477. Ripma, L. A. 
(2014) From Pectinate to Divaricate: The Promise of NGS Genome Skimming for Phylogenetic Systematics of Oreocarya (Boraginaceae). in Biology, San Diego State University, San Diego, CA. 478. Senatore, G. L., Alexander, E. A., Adler, P. H., and Moulton, J. K. (2014) Molecular systematics of the Simulium jenningsi species group (Diptera: Simuliidae), with three new fast-evolving nuclear genes for phylogenetic inference. Molecular Phylogenetics and Evolution 75, 138-148. 479. Rosenberg, M. I., Brent, A. E., Payre, F., Desplan, C., and Pan, D. (2014) Dual mode of embryonic development is highlighted by expression and function of Nasonia pair-rule genes. eLife 3. 480. Grabowski, M. K., Lessler, J., Redd, A. D., Kagaayi, J., Laeyendecker, O., Ndyanabo, A., Nelson, M. I., Cummings, D. A. T., Bwanika, J. B., Mueller, A. C., Reynolds, S. J., Munshaw, S., Ray, S. C., Lutalo, T., Manucci, J., Tobian, A. A. R., Chang, L. W., Beyrer, C., Jennings, J. M., Nalugoda, F., Serwadda, D., Wawer, M. J., Quinn, T. C., Gray, R. H., and the Rakai Health Sciences Program (2014) The Role of Viral Introductions in Sustaining Community-Based HIV Epidemics in Rural Uganda: Evidence from Spatial Clustering, Phylogenetics, and Egocentric Transmission Models. PLoS Med 11, e1001610. 481. Gwiazdowski, R. A., and Normark, B. B. (2014) An Unidentified Parasitoid Community (Chalcidoidea) is Associated with Pine-Feeding Chionaspis Scale Insects (Hemiptera: Diaspididae). Annals of the Entomological Society of America 107, 356-363.


482. Udayanga, D., Castlebury, L. A., Rossman, A. Y., and Hyde, K. D. (2014) Species limits in Diaporthe: molecular re- assessment of D. citri, D. cytosporella, D. foeniculina and D. rudis. Persoonia 32, 83. 483. Reid, C., Carter, R., and Urbatsch, L. (2014) Phylogenetic insights into New World Cyperus (Cyperaceae) using nuclear ITS sequences. Brittonia, 1-14. 484. Paiva, T. d. S., de Albuquerque, A. F. C., Borges, B. d. N., and Harada, M. L. (2014) Description and Phylogeny of Tetrakeronopsis silvanetoi gen. nov., sp. nov. (Hypotricha, Pseudokeronopsidae), a New Benthic Marine from Brazil. PLoS ONE 9, e88954. 485. Barlow, L. D., Dacks, J. B., and Wideman, J. G. (2014) From all to (nearly) none: Tracing adaptin evolution in Fungi. Cellular Logistics 4, e28114. 486. Lemer, S., Buge, B., Bemis, A., and Giribet, G. (2014) First molecular phylogeny of the circumtropical bivalve family Pinnidae (Mollusca, Bivalvia): Evidence for high levels of cryptic species diversity. Molecular Phylogenetics and Evolution 75, 11-23. 487. Pastore, A. I., Prather, C. M., Gornish, E. S., Ryan, W. H., Ellis, R. D., and Miller, T. E. (2014) Testing the competition– colonization trade-off with a 32-year study of a saxicolous lichen community. Ecology 95, 306-315. 488. Pick, C., Scherbaum, S., Hegedüs, E., Meyer, A., Saur, M., Neumann, R., Markl, J., and Burmester, T. (2014) Structure, diversity and evolution of myriapod hemocyanins. FEBS Journal. 489. Chen, Z.-F., Zhang, H., Wang, H., Matsumura, K., Wong, Y. H., Ravasi, T., and Qian, P.-Y. (2014) Quantitative Proteomics Study of Larval Settlement in the Barnacle Balanus amphitrite. PLoS ONE 9, e88744 490. Borba, E. L., Salazar, G. A., Mazzoni-Viveiros, S., and Batista, J. A. N. (2014) Phylogenetic position and floral morphology of the Brazilian endemic, monospecific genus Cotylolabium: a sister group for the remaining Spiranthinae (Orchidaceae). Botanical Journal of the Linnean Society, n/a-n/a 491. 
Nardi, F., Liò, P., Carapelli, A., and Frati, F. MtPAN3: Site-class specific amino acid replacement matrices for mitochondrial proteins of Pancrustacea and Collembola. Molecular Phylogenetics and Evolution. 492. Hujslová, M., Kubátová, A., Kostovčík, M., Blanchette, R., Beer, Z. W., Chudíčková, M., and Kolařík, M. (2014) Three new genera of fungi from extremely acidic soils. Mycological Progress, 1-13. 493. Nazaire, M., and Hufford, L. (2014) Phylogenetic Systematics of the Genus Mertensia (Boraginaceae). Systematic Botany 39, 268-303 494. Hoppe, B., Kahl, T., Karasch, P., Wubet, T., Bauhus, J., Buscot, F., and Krüger, D. (2014) Network Analysis Reveals Ecological Links between N-Fixing Bacteria and Wood-Decaying Fungi. PLoS ONE 9, e88141. 495. Huang, C., W., Y., Z., X., Y., Q., Chen, M., B., Q., Motokawa, M., M., H., Li, Y., and Wu, Y. (2014) A Cryptic Species of the Tylonycteris pachypus Complex (Chiroptera: Vespertilionidae) and Its Population Genetic Structure in Southern China and nearby Regions. . Int J Biol Sci 10, 200-211. 496. Sirichamorn, Y., Thomas, D. C., Adema, F. A. C. B., and van Welzen, P. C. (2014) Historical biogeography of Aganope, Brachypterum and Derris (Fabaceae, tribe Millettieae): insights into the origins of Palaeotropical intercontinental disjunctions and general biogeographical patterns in Southeast Asia. Journal of Biogeography, n/a-n/a. 497. Leger, T., Landry, B., Nuss, M., and Mally, R. (2014) Systematics of the Neotropical genus Catharylla Zeller (Lepidoptera, Pyralidae s. l., Crambinae). ZooKeys 375, 15-73. 498. Schwarzberg, K., Le, R., Bharti, B., Lindsay, S., Casaburi, G., Salvatore, F., Saber, M. H., Alonaizan, F., Slots, J., Gottlieb, R. A., Caporaso, J. G., and Kelley, S. T. (2014) The Personal Human Oral Microbiome Obscures the Effects of Treatment on Periodontal Disease. PLoS ONE 9, e86708. 499. JD, G., LF, S., J, A., AM, O., GB, M., and LA, K. 
(2014) Distribution and diversity of and choreotrich ciliates assessed by morphology and DGGE in temperate coastal waters. Aquatic Microbial Ecology 71, 211-221. 500. Lopez-Osorio, F., Pickett, K. M., Carpenter, J. M., Ballif, B. A., and Agnarsson, I. (2014) Phylogenetic relationships of yellowjackets inferred from nine loci (Hymenoptera: Vespidae, Vespinae, Vespula and Dolichovespula). Molecular Phylogenetics and Evolution 73, 190- 201. 501. Goldberg, J., Knapp, M., Emberson, R. M., Townsend, J. I., and Trewick, S. A. (2014) Species Radiation of Carabid Beetles (Broscini: Mecodema) in New Zealand. PLoS ONE 9, e86185. 502. El-Elimat, T., Raja, H. A., Graf, T. N., Faeth, S. H., Cech, N. B., and Oberlies, N. H. (2014) Flavonolignans from iizukae, a Fungal Endophyte of Milk Thistle (Silybum marianum). Journal of Natural Products 77, 193-199. 503. Parreira, D., Silva, M., Pereira, O., Soares, D., and Barreto, R. (2014) Cercosporoid hyphomycetes associated with Tibouchina herbacea (Melastomataceae) in Brazil. Mycological Progress, 1-12. 504. Price, S. L., Powell, S., Kronauer, D. J. C., Tran, L. A. P., Pierce, N. E., and Wayne, R. K. (2014) Renewed diversification is associated with new ecological opportunity in the Neotropical turtle . Journal of Evolutionary Biology 27, 242-258. 505. lves, K. L., and López-Fernández, H. (2014) A targeted next-generation sequencing toolkit for exon-based cichlid phylogenomics. Molecular Ecology Resources, n/a-n/a. 506. Dauphin, B., Vieu, J., and Grant, J. R. (2014) Molecular phylogenetics supports widespread cryptic species in moonworts (Botrychium s.s., Ophioglossaceae). American Journal of Botany 101, 128-140. 507. Huey, J. A., Cook, B. D., Unmack, P. J., and Hughes, J. M. (2014) Broadscale phylogeographic structure of five freshwater fishes across the Australian Monsoonal Tropics. Freshwater Science 33, 273-287.


508. Jones, S., Burke, S., and Duvall, M. (2014) Phylogenomics, molecular evolution, and estimated ages of lineages from the deep phylogeny of Poaceae. Plant Syst Evol, 1-16. 509. Heiniger, H., and Adlard, R. (2014) Relatedness of novel species of Myxidium Bütschli, 1882, Zschokkella Auerbach, 1910 and Ellipsomyxa Køie, 2003 (Myxosporea: Bivalvulida) from the gall bladders of marine fishes (Teleostei) from Australian waters. Syst Parasitol 87, 47-72. 510. Machado, A., Pinho, D., and Pereira, O. (2014) Phylogeny, identification and pathogenicity of the Botryosphaeriaceae associated with collar and root rot of the biofuel plant Jatropha curcas in Brazil, with a description of new species of Lasiodiplodia. Fungal Diversity, 1-17. 511. Burke, G. R., and Strand, M. R. (2014) Systematic analysis of a wasp parasitism arsenal. Molecular Ecology 23, 890- 901. 512. Lewis, N. S., Anderson, T. K., Kitikoon, P., Skepner, E., Burke, D. F., and Vincent, A. L. (2014) Substitutions near the hemagglutinin receptor-binding site determine the antigenic evolution of influenza A H3N2 viruses in U.S. swine. J Virol. 513. Chopra, N., Saumitra, Pathak, A., Bhatnagar, R., and Bhatnagar, S. (2013) Linkage, Mobility, and Selfishness in the MazF Family of Bacterial Toxins: A Snapshot of Bacterial Evolution. Genome Biology and Evolution 5, 2268-2284. 514. Fan, X., Pan, H., Li, L., Jiang, J., Al‐Rasheid, K. A., and Gu, F. (2013) Phylogeny of the Poorly Known Ciliates, Microthoracida, a Systematically Confused Taxon (Ciliophora), with Morphological Reports of Three Species. Journal of Eukaryotic Microbiology 515. Kanther, M., Tomkovich, S., Xiaolun, S., Grosser, M. R., Koo, J., Flynn, E. J., Jobin, C., and Rawls, J. F. (2014) Commensal microbiota stimulate systemic neutrophil migration through induction of Serum amyloid A. Cellular microbiology. 516. Toome, M., Ohm, R. A., Riley, R. W., James, T. Y., Lazarus, K. L., Henrissat, B., Albu, S., Boyd, A., Chow, J., and Clum, A. 
(2013) Genome sequencing provides insight into the reproductive biology, nutritional mode and ploidy of the fern pathogen Mixia osmundae. New Phytologist. 517. Scott, C. H., Zaspel, J. M., Chialvo, P., and Weller, S. J. (2014) A preliminary molecular phylogenetic assessment of the lichen moths (Lepidoptera: Erebidae: Arctiinae: Lithosiini) with comments on palatability and chemical sequestration. Systematic Entomology 39, 286- 303. 518. Kundrata, R., Bocakova, M., and Bocak, L. (2013) The phylogenetic position of (Coleoptera: ), with description of the first two Eurypogon species from China. Contributions to Zoology 82, 199-208. 519. Thoma, B. P., Guinot, D., and Felder, D. L. (2014) Evolutionary relationships among American mud crabs (Crustacea: Decapoda: Brachyura: Xanthoidea) inferred from nuclear and mitochondrial markers, with comments on adult morphology. Zoological Journal of the Linnean Society 170, 86-109. 520. Vizzini, A., Ercole, E., and Voyron, S. (2014) Lepiota sanguineofracta (Basidiomycota, Agaricales), a new species with a hymeniform pileus covering from Italy. Mycological Progress, 1-8. 521. Lücking, R., Lawrey, J. D., Gillevet, P. M., Sikaroodi, M., Dal-Forno, M., and Berger, S. A. (2013) Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact? J Mol Evol, 1-15. 522. Shukla, V., Habib, F., Kulkarni, A., and Ratnaparkhi, G. S. (2013) Gene Duplication, Lineage Specific Expansion and Sub-functionalization in the MADF-BESS Family Patterns the Drosophila Wing-Hinge. Genetics, genetics. 113.160531. 523. Weeks, S. C., Brantner, J. S., Astrop, T. I., Ott, D. W., and Rabet, N. (2013) The Evolution of Hermaphroditism from Dioecy in Crustaceans: Selfing Hermaphroditism Described in a Fourth Spinicaudatan Genus. Evol Biol, 1-11. 524. Paiva, T. d. S., Borges, B. d. N., and Silva-Neto, I. D. d. (2013) Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data. 
Genetics and molecular biology 36, 571-585. 525. Olatinwo, R., Allison, J., Meeker, J., Johnson, W., Streett, D., Aime, M. C., and Carlton, C. (2013) Detection and Identification of Amylostereum areolatum (Russulales: Amylostereaceae) in the Mycangia of Sirex nigricornis (Hymenoptera: Siricidae) in Central Louisiana. Environmental Entomology 42, 1246-1256. 526. Muñoz, G., Caballero, A., Contu, M., Ercole, E., and Vizzini, A. Leucoagaricus croceobasis (Agaricales, Agaricaceae), a new species of section Piloselli from Spain. Mycological Progress, 1-7. 527. Crous, P., Wingfield, M., Guarro, J., Cheewangkoon, R., van der Bank, M., Swart, W., Stchigel, A., Cano-Lira, J., Roux, J., and Madrid, H. (2013) Fungal Planet description sheets: 154–213. Persoonia-Molecular Phylogeny and Evolution of Fungi 31, 188-296. 528. Mattio, L., Zubia, M., Loveday, B., Crochelet, E., Duong, N., Payri, C. E., Bhagooli, R., and Bolton, J. J. (2013) Sargassum (Fucales, Phaeophyceae) in Mauritius and Réunion, western Indian Ocean: taxonomic revision and biogeography using hydrodynamic dispersal models. Phycologia 52, 578-594. 529. Viljoen, J.-A., Muasya, A. M., Barrett, R. L., Bruhl, J. J., Gibbs, A. K., Slingsby, J. A., Wilson, K. L., and Verboom, G. A. (2013) Radiation and repeated transoceanic dispersal of Schoeneae (Cyperaceae) through the southern hemisphere. American journal of botany 100, 2494-2508. 530. Hong-Wa, C., and Besnard, G. (2014) Species limits and diversification in the Madagascar olive (Noronhia, Oleaceae). Botanical Journal of the Linnean Society 174, 141- 161.


531. Malpica, A., and Ornelas, J. F. (2014) Postglacial northward expansion and genetic differentiation between migratory and sedentary populations of the broad-tailed hummingbird (Selasphorus platycercus). Molecular ecology 23, 435-452. 532. Jiang, J., Huang, J., Li, J., Al-Rasheid, K. A., Al-Farraj, S. A., Lin, X., and Hu, X. (2013) Morphology of two marine euplotids (Ciliophora: Euplotida), Aspidisca fusca Kahl, 1928 and A. hexeris Quennerstedt, 1869, with notes on their small subunit rRNA gene sequences. European Journal of Protistology 49, 634-643. 533. Powell, A. F., Barker, F. K., Lanyon, S. M., Burns, K. J., Klicka, J., and Lovette, I. J. (2014) A comprehensive species-level molecular phylogeny of the New World blackbirds (Icteridae). Molecular phylogenetics and evolution 71, 94-112. 534. Schmidt, D. J., Grund, R., Williams, M. R., and Hughes, J. M. (2014) Australian parasitic Ogyris butterflies: east–west divergence of highly-specialized relicts. Biological Journal of the Linnean Society 111, 473-484. 535. Amaro, R. C., Nunes, I., Canedo, C., Napoli, M. F., and Junca, F. A. (2013) A molecular phylogeny recovers Strabomantis aramunha Cassimiro, Verdade and Rodrigues, 2008 and Haddadus binotatus (Spix, 1824) (Anura: Terrarana) as sister taxa. Zootaxa 3741, 569–582. 536. Barrett, C. F., Specht, C. D., Leebens-Mack, J., Stevenson, D. W., Zomlefer, W. B., and Davis, J. I. (2014) Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)? Annals of botany 113, 119-133. 537. Miller, A., Makowsky, R., Formanowicz, D., Prendini, L., and Cox, C. (2014) Cryptic genetic diversity and complex phylogeography of the boreal North American scorpion, Paruroctonus boreus (Vaejovidae). Molecular phylogenetics and evolution 71, 298-307. 538. Dunowska, M., Munday, J. S., Laurie, R. E., and Hills, S. F.
(2013) Genomic characterisation of Felis catus papillomavirus 4, a novel papillomavirus detected in the oral cavity of a domestic cat. Virus Genes, 1-9. 539. Hodkinson, B. P., Moncada, B., and Lücking, R. (2014) Lepidostromatales, a new order of lichenized fungi (Basidiomycota, Agaricomycetes), with two new genera, Ertzia and Sulzbacheromyces, and one new species, Lepidostroma winklerianum. Fungal Diversity 64, 165-179. 540. Anderson, T. K., Nelson, M. I., Kitikoon, P., Swenson, S. L., Korslund, J. A., and Vincent, A. L. (2013) Population dynamics of cocirculating swine influenza A viruses in the United States from 2009 to 2012. Influenza and other respiratory viruses 7, 42-51. 541. Liebrand, T. W., van den Burg, H. A., and Joosten, M. H. (2013) Two for all: receptorassociated kinases SOBIR1 and BAK1. Trends in plant science. 542. Rodríguez‐Gómez, F., and Ornelas, J. F. (2013) Genetic divergence of the Mesoamerican azure‐crowned hummingbird (Amazilia cyanocephala, Trochilidae) across the Motagua‐Polochic‐Jocotán fault system. Journal of Zoological Systematics and Evolutionary Research. 543. Cáceres, M. E., Aptroot, A., Nelsen, M. P., and Lücking, R. (2013) Pyrenula sanguinea (lichenized Ascomycota: Pyrenulaceae), a new species with unique, trypethelioid ascomata and complex pigment chemistry. The Bryologist 116, 350-357. 544. Mood, J. D., Prince, L. M., Veldkamp, J., and Dey, S. The history and identity of Boesenbergia longiflora (Zingiberaceae) and descriptions of five related new taxa.. 125. Nelsen, M. P., Thell, A., Leavitt, S. D., Hampton-Miller, C. J., and Lumbsch, H. T. (2013) A reappraisal of Masonhalea (Parmeliaceae, Lecanorales) based on molecular and morphological data. The Lichenologist 45, 729-738. 545. Abe, K. T., Mariguela, T. C., Avelino, G. S., Castro, R. M. C., and Oliveira, C. Multilocus molecular phylogeny of Gasteropelecidae (Ostariophysi: Characiformes) reveals the existence of an unsuspected diversity. 
Molecular Phylogenetics and Evolution 546. Rodrigues, M. T., Teixeira Jr, M., Vechio, F. D., Amaro, R. C., NiSA, C., Guerrero, A. C., Damasceno, R., Roscito, J. G., Nunes, P. M. S., and Recoder, R. S. (2013) Rediscovery of the Earless Microteiid Lizard Anotosaura collaris Amaral, 1933 (Squamata: Gymnophthalmidae): A redescription complemented by osteological, hemipenial, molecular, karyological, physiological and ecological data. Zootaxa 3731, 345–370. 547. Qi, Z., Cameron, K. M., Li, P., Zhao, Y., Chen, S., Chen, G., and Fu, C. (2013) Phylogenetics, character evolution, and distribution patterns of the greenbriers, Smilacaceae (Liliales), a near‐cosmopolitan family of monocots. Botanical Journal of the Linnean Society 173, 535-548. 548. Pinzón-Latorre, D., and Deyholos, M. K. (2013) Characterization and transcript profiling of the pectin methylesterase (PME) and pectin methylesterase inhibitor (PMEI) gene families in flax (Linum usitatissimum). BMC genomics 14, 742. 549. Mccranie, J. R., and Hedges, S. B. (2013) A review of the Cnemidophorus lemniscatus group in Central America (Squamata: Teiidae), with comments on other species in the group. Zootaxa 3722, 301–316. 550. Vesterholt, J., Eberhardt, U., and Beker, H. J. (2013) Epitypification of Hebeloma crustuliniforme. Mycological Progress, 1-10. 551. Feller, K. D., Cronin, T. W., Ahyong, S. T., and Porter, M. L. (2013) Morphological and molecular description of the late-stage larvae of Alima Leach, 1817 (Crustacea: Stomatopoda) from Lizard Island, Australia. Zootaxa 3722, 22–32. 552. Fernández-Mazuecos, M., Blanco-Pastor, J. L., Gómez, J. M., and Vargas, P. (2013) Corolla morphology influences diversification rates in bifid toadflaxes (Linaria sect. Versicolores). Annals of botany 112, 1705-1722. 553. He, K., Shinohara, A., Jiang, X.-L., and Campbell, K. L. (2014) Multilocus phylogeny of talpine moles (Talpini, Talpidae, Eulipotyphla) and its implications for systematics. 
Molecular phylogenetics and evolution 70, 513-521.


554. Aguirre‐von‐Wobeser, E., Soberón‐Chávez, G., Eguiarte, L. E., Ponce‐Soto, G. Y., Vázquez‐Rosas‐Landa, M., and Souza, V. (2013) Two‐role model of an interaction network of free‐living γ‐proteobacteria from an oligotrophic environment. Environmental microbiology. 555. Paulsen, M. (2013) A new genus for the Neotropical species of Aesalus Fabricius, with descriptions of eight new species (Coleoptera: Lucanidae: Aesalinae). 556. Chatzimanolis, S. (2014) Phylogeny of xanthopygine rove beetles (Coleoptera) based on six molecular loci. Systematic Entomology 39, 141-145. 557. , เ. ศ. บ. ล., and , ธ. (2013) เ เล ล เบ ข เ ไล เค ศ เล . Thai Journal of Genetics 6, 303-307. 558. Frajman, B., Carlon, L., Kosachev, P., Pedraja, Ó. S., Schneeweiss, G. M., and Schoenswetter, P. (2013) Phylogenetic position and taxonomy of the enigmatic Orobanche krylowii (Orobanchaceae), a predominantly Asian species newly found in Albania (SE Europe). Phytotaxa 137. 559. Mathis, V. L., Hafner, M. S., Hafner, D. J., and Demastes, J. W. (2013) Thomomys nayarensis, a new species of pocket gopher from the Sierra del Nayar, Nayarit, Mexico. Journal of Mammalogy 94, 983-994. 560. Gao, F., and Katz, L. A. (2014) Phylogenomic analyses support the bifurcation of ciliates into two major clades that differ in properties of nuclear division. Molecular phylogenetics and evolution 70, 240-243. 561. Melo, B. F., Sidlauskas, B. L., Hoekzema, K., Vari, R. P., and Oliveira, C. (2014) The first molecular phylogeny of Chilodontidae (Teleostei: Ostariophysi: Characiformes) reveals cryptic biodiversity and taxonomic uncertainty. Molecular phylogenetics and evolution 70, 286-295. 562. Tanganelli, V., Fanciulli, P. P., Nardi, F., and Frati, F. (2013) Molecular phylogenetic analysis of a novel strain from Neelipleona enriches Wolbachia diversity in soil biota. Pedobiologia. 563. Hrbek, T., da Silva, V. M. F., Dutra, N., Gravena, W., Martin, A. R., and Farias, I. P.
(2014) A New Species of River Dolphin from Brazil or: How Little Do We Know Our Biodiversity. PLOS ONE 9, e83623. 564. Isler, M. L., Bravo, G. A., and Brumfield, R. T. (2013) Taxonomic revision of Myrmeciza (Aves: Passeriformes: Thamnophilidae) into 12 genera based on phylogenetic, morphological, behavioral, and ecological data. Zootaxa 3717, 469-497. 565. Verstraete, B., Janssens, S., Lemaire, B., Smets, E., and Dessein, S. (2013) Phylogenetic lineages in Vanguerieae (Rubiaceae) associated with Burkholderia bacteria in sub-Saharan Africa. American journal of botany 100, 2380-2387. 566. Massoni, J., Forest, F., and Sauquet, H. (2014) Increased sampling of both genes and taxa improves resolution of phylogenetic relationships within Magnoliidae, a large and early diverging clade of angiosperms. Molecular phylogenetics and evolution 70, 84-93. 567. Heiniger, H., and Adlard, R. D. (2013) Molecular identification of cryptic species of Ceratomyxa Thélohan, 1892 (Myxosporea: Bivalvulida) including the description of eight novel species from apogonid fishes (Perciformes: Apogonidae) from Australian waters. Acta Parasit. 58, 342-360. 568. Aowphol, A., Rujirawan, A., Taksintum, W., Arsirapot, S., and McLeod, D. S. (2013) Re-evaluating the taxonomic status of Chiromantis in Thailand using multiple lines of evidence (Amphibia: Anura: Rhacophoridae). Zootaxa 3702, 101-123. 569. Singh, G., Divakar, P. K., Dal Grande, F., Otte, J., Parnmen, S., Wedin, M., Crespo, A., Lumbsch, H. T., and Schmitt, I. (2013) The sister-group relationships of the largest family of lichenized fungi, Parmeliaceae (Lecanorales, Ascomycota). Fungal Biology 117, 715-721. 570. Langone, J. A. (2013) Filogeografía de Pseudopaludicola falcipes (HENSEL, 1867) (Amphibia, Anura). in Zoología, Universidad de la República, Montevideo, Uruguay. 571. Freeman, S., Sharon, M., Maymon, M., Mendel, Z., Protasov, A., Aoki, T., Eskalen, A., and O’Donnell, K. (2013) Fusarium euwallaceae sp. 
nov.—a symbiotic fungus of Euwallacea sp., an invasive ambrosia beetle in Israel and California. Mycologia 105, 1595-1606. 572. Agorreta, A., San Mauro, D., Schliewen, U., Van Tassell, J. L., Kovačić, M., Zardoya, R., and Rüber, L. (2013) Molecular phylogenetics of Gobioidei and phylogenetic placement of European gobies. Molecular Phylogenetics and Evolution 69, 619-633. 45. DMR000003 573. Kuriakose S. & Dimitrakopoulos P. “Deformation of an elastic capsule in a rectangular microfluidic channel”, Soft Matter, 9(16) 4284–4296 (2013). 574. Park Sun-Young & Dimitrakopoulos P. “Transient dynamics of an elastic capsule in a microfluidic constriction”, Soft Matter, 9(37) 8844–8855 (2013). 575. Dimitrakopoulos P. “Effects of membrane stiffness and scaling analysis for capsules in planar extensional flows”, J. Fluid Mech., 745 487–508 (2014). 576. Dissanayake I. D. & Dimitrakopoulos P. “Coil-helix-rod transition of an initially flexible bead-rod polymer after rapid stiffening: configuration and stress evolution”, J. Chem. Phys., under review (2014).


46. DMR060011 577. K. Sun, K. Park, J. L. Xie, J. Y. Luo, J. K. Yuan, Z. H. Xiong, J.-Z. Wang, and Q. K. Xue, “Direct Observation of Molecular Orbitals in an Individual Single-Molecule Magnet Mn12 on Bi(111),” ACS Nano 7, 6825 (2013). 578. K. Park, C. De Beule, and B. Partoens, “The ageing effect in topological insulators: evolution of the surface electronic structure of Bi2Se3 upon K adsorption,” New J. Phys. 15, 113031 (2013). 579. J. H. Atkinson, K. Park, C. C. Beedle, D. N. Hendrickson, Y. Myasoedov, E. Zeldov, and J. R. Friedman, “The Effect of Uniaxial Pressure on the Magnetic Anisotropy of the Mn12-Ac Single-Molecule Magnet,” Europhys. Lett. 102, 47008 (2013). 580. E. Burzuri, Y. Yamamoto, M. Warnock, X. Zhong, K. Park, A. Cornia, and H. S. J. van der Zant, “Franck-Condon Blockade in a Single-Molecule Transistor,” under review in Nano Lett. since Feb 2014. 581. Y.-T. Hsu, M. Fisher, T. L. Hughes, K. Park, and E.-A. Kim, “Effects of surface-bulk hybridization in 3D topological metals,” arXiv: 1401.3378 (under review in Phys. Rev. B). 582. P. Rivero, V. M. Garcia-Suarez, Y. Yang, L. Bellaiche, K. Park, J. Ferrer, and S. Barraza-Lopez, “Optimal pseudopotentials and numerical atomic orbital bases for arbitrary novel materials,” under review in Phys. Rev. B. 583. A. McCaskey, Y. Yamamoto, M. Warnock, E. Burzuri, H. S. J. van der Zant, and K. Park, “An Effect of electron-vibron coupling on electron transport through a single-molecule magnet Fe4,” (in preparation). 584. K. Park and B. Partoens, “Magnetic proximity effect of thin Bi2Se3 films upon Cr adsorption,” (in preparation). 47. DMR090071 585. Effect of Geometrical Orientation on the Charge-Transfer Energetics of Supramolecular (tetraphenyl)-porphyrin/C60 Dyads, Marco Olguin, Rajendra R. Zope, and Tunna Baruah, Journal of Chemical Physics 138, 074306 (2013). 586.
Substituent-level Tuning of Frontier Orbital Energy Levels in Phthalocyanine/C60 Donor-Acceptor Charge Transfer Pairs, Marco Olguin, Luis Basurto, Rajendra R. Zope, Tunna Baruah (http://arxiv.org/abs/1308.2001).(To be submitted). 587. The Effect of Structural Conformational Changes on Charge Transfer States in a Light-Harvesting Carotenoid-diaryl- Porphyrin-C60 Molecular Triad, Marco Olguin, Rajendra R. Zope, Tunna Baruah (http://arxiv.org/abs/1308.2000). (Communicated). 588. Electronic structure and charge transfer excited states of a multichromophoric antenna Luis Basurto, Rajendra R. Zope, Tunna Baruah (http://arxiv.org/abs/1306.2627). (Communicated). 48. DMR090073 589. Botelho AL, Shin YW, Liu JK, and Lin X. “Structure and Optical Bandgap Relationship of π-Conjugated Systems” PLoS One 9: e86370 JAN 2014 590. Shin YW, Liu JK and Lin X. “Side chain effect on high efficient semiconducting polymer for bandgap designing” Submitted 2014 591. Shin YW, Liu JK, Quigley JJ, Luo H and Lin X. “Combinatorial Designing of Copolymer Donor Materials for Bulk Heterojunction Solar Cells” Submitted 2014. 592. Shin YW, Liu JK and Lin X. “Breaking charge conjugation symmetry for pi-conjugated electron acceptor materials design” Submitted 2014 593. Yu Y, Cetin D, Luo H, Lin X, Ludwig K, Pal U, Gopalan S, Basu S. “Effect of atmospheric carbon dioxide on surface segregation and phase formation in La0.6Sr0.4Co0.2Fe0.8O3-δ thin films” MRS proceedings 1647 accepted 2014 594. Cao PH, Park HS and Lin X. “Strain-rate and temperature-driven transition in the shear transformation zone for two- dimensional amorphous solids” Physical Review E 88(4) OCT 2013 595. Shin YW, and Lin X. “Modeling Photoinduced Charge Transfer Across π-Conjugated Heterojunctions” The Journal of Physical Chemistry C 117(24) MAY 22 2013 596. Du P, Lin X and Zhang X. “Tunable electrical and mechanical responses of PDMS and polypyrrole nanowire composites” Journal of Physics D: Applied Physics 46(19) MAY 2013 597. 
Du P, Lin X and Zhang X. “Tunable dielectric constants and mechanical properties of PDMS and conducting polymer nanowire composites” Solid-State Sensors, Actuators and Microsystems (TRANSDUCERS & EUROSENSORS XXVII) pp 434-437 JUNE 2013 49. DMR100029 598. Kutana, A., Penev, E. S. & Yakobson, B. I. Engineering electronic properties of layered transition-metal dichalcogenide compounds through alloying. Nanoscale (2014), DOI: 10.1039/C4NR00177J 599. Penev, E. S., Artyukhov, V. I. & Yakobson, B. I. Extensive energy landscape sampling of nanotube end-caps reveals no chiral-angle bias for their nucleation. ACS Nano 8, 1899–1906 (2014). 600. Penev, E. S., Artyukhov, V. I. & Yakobson, B. I. Tensile failure of a basic structural unit in carbon fibers from atomistic simulations. Carbon (2014). Submitted.

601. Hao, Y., Bharathi, M. S., Wang, L., Liu, Y., Chen, H., Nie, S., Wang, X., Chou, H., Tan, C., Fallahazad, B., Ramanarayan, H., Magnuson, C. W., Tutuc, E., Yakobson, B. I., McCarty, K. F., Zhang, Y.-W., Kim, P., Hone, J., Colombo, L. & Ruoff, R. S. The role of surface oxygen in the growth of large single-crystal graphene on copper. Science 342, 720–723 (2013). 50. DMR100033 602. F.D. Ma, Y.M. Jin, Y.U. Wang, S.L. Kampe, S. Dong, “Effect of Magnetic Domain Structure on Longitudinal and Transverse Magnetoelectric Response of Particulate Magnetostrictive-Piezoelectric Composites,” Appl. Phys. Lett., 104, 112903-1-4, 2014. 603. F.D. Ma, Y.M. Jin, Y.U. Wang, S.L. Kampe, S. Dong, “Phase Field Modeling and Simulation of Particulate Magnetoelectric Composites: Effects of Connectivity, Conductivity, Poling and Bias Field,” Acta Mater., 70, 45-55, 2014. 604. Y.M. Jin, “Phase Field Modeling of Current Density Distribution and Effective Electrical Conductivity in Complex Microstructures,” Appl. Phys. Lett., 103, 021906-1-4, 2013. 605. T.L. Cheng, Y.U. Wang, “Shape-Anisotropic Particles at Curved Fluid Interfaces and Role of Laplace Pressure: A Computational Study,” J. Colloid Interface Sci., 402, 267-278, 2013. 606. J.E. Zhou, Y.U. Wang, “Effects of Crystallographic Texture and Grain Shape Anisotropy on Dielectric and Piezoelectric Properties of Ferroelectric Polycrystals,” Appl. Phys. Lett. (under preparation) 607. J.E. Zhou, Y. Yan, S. Priya, Y.U. Wang, “Phase Field Modeling of Textured Ferroelectric Polycrystals: Texture Development during Templated Grain Growth,” J. Appl. Phys. (under preparation) 608. F.D. Ma, Y.U. Wang, “Computer Modeling and Simulation of Random Packing of Arbitrary-Shaped Particles with Friction: A Diffuse Interface Field Approach,” Acta Mater. (under preparation) 609. L.D. Geng, J.E. Zhou, Y.M. Jin, Y.U. Wang, “Deformation Dependence of Incompletely Softened Phonons in Martensitic Precursor: First Principles Density Functional Theory Study,” Phys. 
Rev. B (under preparation) 610. L.D. Geng, T.L. Cheng, Y.M. Jin, Y.U. Wang, “Charge Compensation and Aging Mechanisms in Iron-Doped Ferroelectric Titanate Perovskites: First Principles Density Functional Theory Study,” Phys. Rev. B (under preparation) 611. L.D. Geng, Y.M. Jin, “Current-Driven Domain Wall Depinning in Magnetic Nanowires,” Appl. Phys. Lett. (under preparation) 612. (Invited Talk) Y.U. Wang, “Phase Field Modeling and Simulation of Processing, Microstructure and Property of Ferroelectric Materials,” 12th U.S. National Congress on Computational Mechanics, July 22-25, 2013, Raleigh, North Carolina. 613. (Invited Talk) Y.M. Jin, Y.U. Wang, Y. Ren, “Broken Dynamic Symmetry and Phase Transition Precursor,” Society of Engineering Science (SES) 50th Annual Technical Meeting, Providence, Rhode Island, July 28-31, 2013. 614. Y.M. Jin, Y.U. Wang, S.L. Kampe, S. Dong, “Phase Field Modeling and Computer Simulation of Multiferroic Materials,” Society of Engineering Science (SES) 50th Annual Technical Meeting, Providence, Rhode Island, July 28-31, 2013. 615. F.D. Ma, Y.M. Jin, Y.U. Wang, S.L. Kampe, “Phase Field Modeling and Simulation of Particulate Magnetoelectric Composites,” TMS Annual Meeting, San Diego, California, February 16-20, 2014. 616. F.D. Ma, Y.U. Wang, “Diffuse Interface Field Approach to Modeling and Simulation of Packing of Arbitrarily Shaped Particles with Friction,” TMS Annual Meeting, San Diego, California, February 16-20, 2014. 617. J.E. Zhou, Y.U. Wang, “Computational Study of Templated Grain Growth and Texture Development in Polycrystals,” TMS Annual Meeting, San Diego, California, February 16-20, 2014. 618. J.E. Zhou, Y.U. Wang, “Computational Study of Microstructure and Property Relations in Ferroelectric Polycrystals,” TMS Annual Meeting, San Diego, California, February 16-20, 2014. 619. B.L. Wang, Y.M. 
Jin, “Effects of Elastic Interactions on Domain Structures in Terfenol-D,” TMS Annual Meeting, San Diego, California, February 16-20, 2014. 620. (Invited Lecture) Y.M. Jin, “Phase Field Modeling of Magnetoelectric Composites,” Workshop on Materials Genomics of Phase Transforming Materials, Texas A&M University, College Station, Texas, June 11-12, 2014. 621. L.D. Geng, Y.M. Jin, Y.U. Wang, “Origin of Monoclinic Distortion in Nanodomained Ferroelectric Materials,” MS&T Meeting, Pittsburgh, PA, October 12-16, 2014 (accepted). 622. L.D. Geng, Y.M. Jin, “Micromagnetic Simulation Study of Domain Wall Behaviors in Patterned Magnetic Nanowires,” MS&T Meeting, Pittsburgh, PA, October 12-16, 2014 (accepted). 623. Z.J. Morgan, Y.M. Jin, S.L. Kampe, “Damping Mechanisms in Ferroelectric Ceramic-Reinforced Metal Matrix Composites,” MS&T Meeting, Pittsburgh, PA, October 12-16, 2014 (accepted). 51. DMR110035 624. A. Selem, C. Herdman and K. B. Whaley, “Entanglement Entropy at Generalized Rokhsar-Kivelson Points of Quantum Dimer Models”, Phys. Rev. B 87, 125105 (2013).

625. C. M. Herdman and K. B. Whaley, “Geometric properties of loop condensed phases on the square lattice”, Phys. Rev. B 87, 155150 (2013). 626. A. Selem, C. Herdman and K. B. Whaley, “Topological entanglement entropy in the triangular lattice quantum dimer model”, in preparation. 52. DMR120047 627. “Collective and single-chain correlations in disordered melts of symmetric diblock copolymer: Quantitative comparisons of simulations and theory”, J. Glaser, J. Qin, P. Medapuram, and D. Morse, Macromolecules 47, 851 (2014) 628. “Universality of block copolymer melts”, J. Glaser, P. Medapuram, T.M. Beardsley, M.W. Matsen and D.C. Morse (submitted to Physical Review Letters, 2014). 629. “Identifying the order-disorder transition of block copolymers using metadynamics”, Glaser, J., Qin J. and Morse, D. (in preparation) 630. “Universal phenomenology of symmetric diblock copolymer melts near the order disorder transition”, P. Medapuram, J. Glaser and D.C. Morse (in preparation) 631. “Strong scaling GPU molecular dynamics with HOOMD-blue”, J. Glaser, T. Nguyen, J. Anderson, P. Lui, D.C. Morse, and S. Glotzer (in preparation) 53. DMR130005 632. M. S. Miao, Caesium in high oxidation states and as a p block element, Nature Chemistry, 5, 846 (2013). 633. M. S. Miao and R. Hoffmann, High pressure electrides: A predictive chemical and physical theory, Accounts of Chemical Research, published as Editor’s choice, http://pubs.acs.org/doi/full/10.1021/ar4002922 634. M. S. Miao, Q. M. Yan, and C. G. Van de Walle, Electronic structure of a single-layer InN quantum well in a GaN matrix, Applied Physics Letters, 102, 102103 (2013). 635. M.-S. Miao, S. Yarbro, P. T. Barton, and R. Seshadri, Electron affinities and ionization energies of Cu and Ag delafossite compounds: A hybrid functional study, Phys. Rev. B. 89 (2014) 045306(1–8). 636. A. Krishnapriyan, P. T. Barton, M. Miao, and R. Seshadri, First-principles study of band alignments in the p-type hosts BaM2X2 (M = Cu, Ag; X = S, Se), J. 
Phys. Condens. Matter 26 (2014) 155802(1–6). 54. DMR130007 637. Coleman, S.P., Pamidighantam, S., Van Moer, M., Wang, Y., Koesterke, L., Spearot, D.E. (2014) Performance improvement and workflow development of virtual diffraction calculations, XSEDE 2014 Conference Proceedings, in press. 638. Coleman, S.P., Sichani, M.M., Spearot, D.E. (2014) A computational algorithm to produce virtual x-ray and electron diffraction patterns from atomistic simulations, JOM, 66, 408-416. 639. Coleman, S.P., Spearot, D.E., Capolungo, L. (2013) Virtual diffraction analysis of Ni [010] symmetric tilt grain boundaries, Modeling and Simulation in Materials Science and Engineering, 21, 055020. 640. Coleman, S.P., Weinberger, C., and Spearot, D.E. Characterizing Complex Metal-Oxide Interfaces Via Virtual Diffraction. Presented at the 2014 TMS Annual Conference and Exhibition, San Diego, CA (2014 February 18). 641. Coleman, S.P., Mirzaei, M.M., Spearot, D.E. A Computational Algorithm to Produce Virtual X-Ray and Electron Diffraction Patterns of Interfaces from Atomistic Simulations. Presented at the 2014 TMS Annual Conference and Exhibition, San Diego, CA (2014 February 17). 642. Spearot, D.E. and Coleman, S.P. Virtual Diffraction Analysis of Grain Boundaries in FCC Metallic Materials. Presented at the International Symposium on Plasticity, Freeport, Bahamas (2014 January 3). 643. Coleman, S.P. and Spearot, D.E. Atomistic Simulations and Virtual Diffraction Characterization of Al2O3 During Ion Bombardment. Presented at the Materials Science & Technology 2013 Conference and Exhibition, Montreal, Canada (2013 October 30). 644. Coleman, S.P. and Spearot, D.E. Grain Boundary Structure Analyzed via Atomistic Simulation and Virtual Diffraction. Presented at the Materials Science & Technology 2013 Conference and Exhibition, Montreal, Canada (2013 October 29). 645. Spearot, D.E. and Coleman, S.P., Linking Between Atomistic Simulation and Experiment Via Virtual Diffraction Computation. 
Presented at the 50th Annual Meeting of the Society of Engineering Science, Providence, RI (2013 July 29). 646. Coleman, S.P., Barrows, W.A., Spearot, D.E. In Situ Virtual Diffraction Analysis of Alumina During Ion Bombardment. Presented at the 2013 TMS Annual Conference and Exhibition, San Antonio, TX (2013 March 4). 55. DMR130009 647. D. Le, D. Sun, W. Lu, M. Aminpour, C. Wang, Q. Ma, T.S. Rahman, and L. Bartels, "Growth of aligned Mo6S6 nanowires on Cu(111)," Surface Science 611, 1-4 (2013).

648. Q. Ma, P.M. Odenthal, J. Mann, D. Le, C.S. Wang, Y. Zhu, T. Chen, D. Sun, K. Yamaguchi, T. Tran, M. Wurch, J.L. McKinley, J. Wyrick, K. Magnone, T.F. Heinz, T.S. Rahman, R. Kawakami, and L. Bartels, "Controlled Argon Beam-Induced Desulfurization of Monolayer Molybdenum Disulfide," Journal of Physics: Condensed Matter 25, 252201-1–5 (2013). 649. D. Le and T.S. Rahman, "Joined edges in MoS2: metallic and half-metallic wires," Journal of Physics: Condensed Matter 25, 312201 (2013). 650. J. Mann, Q. Ma, P.M. Odenthal, M. Isarraraz, D. Le, E. Preciado, D. Barroso, K. Yamaguchi, G. von Son Palacio, A. Nguyen, T. Tran, M. Wurch, A. Nguyen, V. Klee, S. Bobek, D. Sun, T.F. Heinz, T.S. Rahman, R. Kawakami, and L. Bartels, "2-Dimensional Transition Metal Dichalcogenides with Tunable Direct Band Gaps: MoS2(1-x) Se2x Monolayers," Advanced Materials 26, 1399-404 (2014). 651. D. Le, T.B. Rawal, and T.S. Rahman, "Single-Layer MoS2 with Sulfur Vacancies: Structure and Catalytic Application," The Journal of Physical Chemistry C 118, 5346-5351 (2014). 652. Q. Ma, M. Isarraraz, E. Preciado, V. Klee, S. Bobek, K. Yamaguchi, E. Li, P.M. Odenthal, A. Nguyen, D. Barroso, D. Sun, G.v. Son, M. Gomez, A. Nguyen, D. Le, G. Pawin, J. Mann, T.F. Heinz, T.S. Rahman, and L. Bartels, "Post-Growth Tuning of the Bandgap of Single-Layer MoS2 Films by Sulfur/Selenium Exchange," Submitted (2014). 56. DMR130011 653. Lang, R. J., and Simmons, D. S. “Interfacial Dynamic Length Scales in the Glass Transition of a Model Freestanding Polymer Film and Their Connection to Cooperative Motion.” Macromolecules 46, no. 24 (2013): 9818–25. doi:10.1021/ma401525q. 654. Mackura, M. E., and Simmons, D. S. “Enhancing Heterogenous Crystallization Resistance in a Bead-Spring Polymer Model by Modifying Bond Length.” Journal of Polymer Science Part B: Polymer Physics 52, no. 2 (2014): 134–40. doi:10.1002/polb.23398. 655. Lang, R. J., Merling W. L., and Simmons, D. S. 
“Combined dependence of nanoconfined Tg on interfacial energy and softness of confinement.” In revision at ACS Macro Letters. 57. DMR130036 656. G. Cohen, E. Gull, D. R. Reichman, and A. J. Millis, Physical Review Letters 112, 146802 (2014), URL http://link.aps.org/doi/10.1103/PhysRevLett.112.146802. 657. G. Cohen, E. Gull, D. R. Reichman, A. J. Millis, and E. Rabani, Physical Review B 87, 195108 (2013), URL http://link.aps.org/doi/10.1103/PhysRevB.87.195108. 658. G. Cohen, D. R. Reichman, A. J. Millis, and E. Gull, Physical Review B 89, 115139 (2014), URL http://link.aps.org/doi/10.1103/PhysRevB.89.115139. 58. DMR130039 659. Huang, J., Valenzano, L., Singh, T.V., Pandey, R. and Sant, G., ‘The Influence of (Al, Fe, Mg) Impurities on Triclinic Ca3SiO5: Interpretations from DFT Calculations’, Crystal Growth & Design (accepted), DOI: 10.1021/cg401647f, 2014. 660. Huang, J., Valenzano, L., and Sant, G., ‘Surface Reactivity and Water Adsorption on Triclinic Ca3SiO5 (100) surface’, (in preparation), pp.10, 2014 661. Guo, P., Huang, J., Valenzano, L., and Sant, G., ‘The Influence of Impurities on Ca2SiO4: Electronic Structure and Implications on Reactivity’, (in preparation), pp.8, 2014 662. Valenzano, L. ‘Energetic, Nanoporous, and Cement-based Materials: when Quantum Chemistry meets Practical Applications’, Seminar on Selected Topics in Civil & Environmental Engineering UCLA, LA, CA, (October 17, 2013). 663. Sant, G., ‘Quantum to Continuum: New Directions in Designing, Controlling and Enhancing The Performance of Cementitious Materials’. Invited Presentation to the Technical Leadership of TECNALIA Research and Innovation, Derio, Spain, (November 8, 2013). 664. Huang, J., Valenzano, L., and Sant, G. ‘Influence of (Mg, Al, Fe) impurities on triclinic Ca3SiO5: Interpretations from DFT calculations’. ACS 247th National Meeting, Dallas, TX, (March 18, 2014). 59. DMR130046 665. V. R. Saranam and P. A. 
Greaney, “Surprising Behavior During Dissipation and Collision of Flexural Waves in Carbon Nanotubes”, Journal of Physics D, 2013, vol. 46, 485502. 666. Laura de Sousa Oliveira and P. A. Greaney, “Mapping thermal resistance around vacancy defects in graphite”, Materials Research Society Proceedings, 2013, vol. 1543, 65-70. 667. Luping Han, Makenzie Budge, and P. Alex Greaney, “Relationship between thermal conductivity and framework architecture in MOF-5”, Computational Materials Science, Accepted. 668. Congcong Hu and P. Alex Greaney, “Role of seta angle and flexibility in the gecko adhesion mechanism”, Journal of Applied Physics, Submitted.

669. Wen Qian, P. Alex Greaney, Simon Fowler, Sheng-Kuei Chiu, Andrea M. Goforth, and Jun Jiao, “Low-temperature nitrogen doping in liquid ammonia for production of N-doped TiO2 hybridized graphene as a highly efficient photocatalyst”, ACS Applied Materials & Interfaces, Submitted. 60. DMR130077 670. Tsyshevsky, R. V.; Sharia, O.; Kuklja, M. M. Energies of Electronic Transitions of Pentaerythritol Tetranitrate Molecules and Crystals. Status: Has recently been accepted for publication in J. Phys. Chem. C. 671. Tsyshevsky, R. V.; Sharia, O.; Kuklja, M. M. Electronic Structure of PETN Crystals Containing Molecular Defects. Status: Will be soon submitted to the journal. 672. Tsyshevsky, R. V.; Rashkeev, S.N.; Kuklja, M. M. Electronic Interaction of Oxide Defects and Energetic Materials: Pentaerythritol Tetranitrate on MgO Surface. Status: In preparation. 61. DMR130080 673. A. K. Kuhlman and A. T. Zayak, “Revealing Interaction of Organic Adsorbates with Semiconductor Surfaces Using Chemically Enhanced Raman”, J. Phys. Chem. Lett. 5 (6) 964-968 (2014) 62. DMR130081 674. Wickramaratne, Darshana, Ferdows Zahid, and Roger K. Lake. "Electronic and thermoelectric properties of few-layer transition metal dichalcogenides." The Journal of Chemical Physics 140.12 (2014): 124710. [1] 675. Renteria, J., Samnakay, R., Jiang, C., Pope, T. R., Goli, P., Yan, Z., Wickramaratne, D., Salguero T.T, Khitun A.G, Lake, R.K, Balandin, A. A. "All-metallic electrically gated 2H-TaSe2 thin-film switches and logic circuits." Journal of Applied Physics 115.3 (2014): 034305. [2] 676. Wickramaratne, Darshana, Ferdows Zahid, and Roger Lake. "Thermoelectric performance in ultra-thin transition metal dichalcogenides." Bulletin of the American Physical Society (2014). [3] 677. Neupane, M., Wickramaratne, D., Ge, S., Yin, G., and Lake, R. 
"Fermi velocity renormalization in misoriented graphene on hexagonal boron nitride." Bulletin of the American Physical Society (2014). [4] 678. Su, S., Yin, G., Wickramaratne, D., Neupane, M., and Lake, R. "A study of magnetic proximity effect in two-dimensional heterostructure." Bulletin of the American Physical Society (2014). [5] 679. Ge, S., Wickramaratne, D., Neupane, M., Su, S., and Lake, R. "Electronic properties of misoriented bilayer transition metal dichalcogenides." Bulletin of the American Physical Society (2014). [6] 680. Grant review meeting: Charge Density Wave Computational Fabric On-site review, UC Riverside, Riverside, CA July 2013 681. Grant review meeting: FAME Annual Review Meeting Task 4.2: Switching by Voltage Control of the Quantum Phase in Misoriented Layers of van der Waal Materials Los Angeles, CA, October 2013 682. Y. Alaskar, S. Arafin, D. Wickramaratne, M. Zurbuchen, L. He, R.K. Lake, K.L. Wang "Towards van der Waals Epitaxial Growth of GaAs on Si Using a Graphene Buffer Layer" Under Review 683. Z. Mutlu, H. Bay, S. Turkyilmaz, R. Ionescu, Z. Favors, D. Wickramaratne, R.K Lake, M. Ozkan, C. Ozkan "Large-Area CVD Growth of Ultrathin WS2 Sheets: Cooperative Nucleation of Triangular Domains and Electronic Structure", Under Review 684. Z. Mutlu, D. Wickramaratne, H.H. Bay, Z. Favors, D. Humphrey, M. Ozkan, R.K. Lake, C.S. Ozkan "Synthesis, characterization and electronic structure of few layer MoSe2 thin films", Under review 63. DMR130087 685. Turner, D. and S.R. Kalidindi, 3D reconstruction of microstructure from oblique sections [working title]. In preparation. 686. Patel, D.K., H.F. Al-Harbi, and S.R. Kalidindi, Extracting single crystal elastic constants from polycrystalline samples using spherical nanoindentation and orientation measurements. Submitted. 687. Al-Harbi, H.F. and S.R. Kalidindi, Crystal plasticity finite element simulations using a database of discrete Fourier transforms. Accepted, International Journal of Plasticity. 
64. DMR140068 688. L. Cao and T. Mueller, “The effects of atomic order on the oxygen reduction reaction on the Pt3Ni (111) surface” In preparation (2014) 65. DMS110016 689. Arrival Time Estimation in a Sparsely Sampled Hemispheric Transducer Array. J. C. Tillett, J. P. Astheimer, R. C. Waag. AIUM Annual Convention, 6-10 April 2013, New York, NY, J. Ultras. Med., 32(42S) ('13).

690. Large-scale acoustic scattering by breast tissue models derived from magnetic resonance images. Andrew J. Hesford, Jason C. Tillett, Jeffrey P. Astheimer, and Robert C. Waag, Fellow, IEEE, Accepted to the Journal of the Acoustical Society of America. 66. DPP130002 691. C. Burstedde, D. Calhoun, K. Mandli, and A. R. Terrel. ForestClaw: Hybrid forest-of-octrees AMR for hyperbolic conservation laws. In Proceedings of ParCo '13, August 2013. http://arxiv.org/abs/1308.1472. 692. A. Gholaminejad, D. Malhotra, H. Sundar, and G. Biros. FFT, FMM, or Multigrid? A comparative study of state-of-the-art Poisson solvers. SIAM Journal on Scientific Computing (to be submitted), 2014. http://users.ices.utexas.edu/~hari/files/pubs/sisc14.pdf. 693. T. Isaac, C. Burstedde, L. C. Wilcox, and O. Ghattas. Recursive algorithms for distributed forests of octrees. In preparation, 2014. 694. T. Isaac, N. Petra, G. Stadler, and O. Ghattas. From data to prediction with quantified uncertainties: Scalable parallel algorithms and applications to the dynamics of the Antarctic ice sheet. In SC14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (to be submitted). http://users.ices.utexas.edu/~georgst/papers/IsaacPetraStadlerEtAl14.pdf. 695. T. Isaac, N. Petra, G. Stadler, and O. Ghattas. A parallel, scalable, adaptive-mesh, spectrally-accurate, forward and inverse three-dimensional nonlinear full Stokes ice sheet model. Journal of Computational Physics (in preparation for submission), 2014. 696. T. Isaac, G. Stadler, and O. Ghattas. Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics. SIAM Journal on Scientific Computing (to be submitted), 2014. http://users.ices.utexas.edu/~georgst/papers/IsaacStadlerGhattas14.pdf. 697. J. Kelly, H. Sundar, and O. Ghattas. 
Solution of the time-dependent acoustic-elastic wave equation on a heterogeneous, coprocessor-enabled supercomputer. Poster at SC13, 2013. 698. H. Sundar, D. Malhotra, and K. W. Schulz. Algorithms for high-throughput disk-to-disk sorting. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, page 93. ACM, 2013. 699. J. Worthen, G. Stadler, N. Petra, M. Gurnis, and O. Ghattas. Towards adjoint-based inversion for rheological parameters in nonlinear viscous mantle flow. Physics of the Earth and Planetary Interiors (submitted), 2013. 67. EAR100012 700. Mehnert, E., J. Damico, S. Frailey, H. Leetaru, Y.-F. Lin, R. Okwen, N. Adams, B. Storsved, A. Valocchi, 2013. Development of a basin-scale model for CO2 sequestration in the basal sandstone reservoir of the Illinois Basin—issues, approach and preliminary results. Energy Procedia 37: 3850-3858. http://www.sciencedirect.com/science/article/pii/S1876610213005250 701. Roy, W.R., E. Mehnert, P.M. Berger, J.R. Damico and R.T. Okwen, accepted. Transport modeling at multiple scales for the Illinois Basin – Decatur Project, Greenhouse Gases: Science and Technology. 702. Mehnert, E. and R.T. Okwen, 2013. Near-Well Pressure Distribution of CO2-Injection in a Partially Penetrating Well, Proceedings of the TOUGH Symposium 2012, Lawrence Berkeley National Laboratory, Berkeley California, September 17-19, 2012. 703. http://esd.lbl.gov/files/research/projects/tough/events/symposia/toughsymposium12/Proceedings_TOUGH-Symposium-2012.pdf 704. Mehnert, E., N. Adams, Z. Askari-Khorasgani, S.M. Benson, P.M. Berger, S.K. Butler, M. D’Alessio, J.T. Freiburg, K.C. Hackley, W.R. Kelly, N.C. Krothe, J. Krothe, Y.-F. Lin, S.V. Panno, C. Ray, R.J. Rice, W.R. Roy, B.A. Storsved, C.W. Strandli, and L.E. Yoksoulian, 2013. 
Protecting Drinking Water by Reducing Uncertainties Associated with Geologic Carbon Sequestration in Deep Saline Aquifers, Illinois State Geological Survey Contract Report to U.S. Environmental Protection Agency (Washington, D.C.), 219 p. + appendices. 705. Adams, N., 2013. Coupling TOUGH2 and SEAWAT Variable-Density Models to Understand Potential Geologic Carbon Sequestration Impacts on Groundwater Resources. Master of Science in Civil and Environmental Engineering. Urbana, IL, University of Illinois at Urbana-Champaign: 107. 706. Adams, N., Y.-F. Lin, and E. Mehnert, 2013. Coupling SEAWAT and TOUGH2 Models to Protect Regional Groundwater Quality, MODFLOW and More 2013, Conference Abstract, June 2-5, 2013, Golden, CO. Paper Presented [presenter in bold] 707. Adams, N., Y.-F. Lin, E. Mehnert, and A.J. Valocchi, 2013. Coupling SEAWAT TOUGH2 to analyze potential effects of geologic carbon sequestration on underground drinking water sources, 2013 Geological Society of America Annual Meeting, Denver, CO October 27-30. Paper Presented [presenter in bold] 708. W. Roy, P. Berger, S. Butler, L. Yoksoulian, Y.-F. Lin, E. Mehnert, S. Panno, K. Hackley, N. Adams, D. Beach, P. Weberling & M. Schuh, S. Benson, C. Strandli, C. Ray and M. D’Alessio, 2013. Protecting Drinking Water by Reducing Uncertainties Associated with Geologic Carbon Sequestration in Deep Saline Aquifers, USEPA STAR Project Review 2013, USEPA HQ, Washington, DC, January 7. Paper Presented [presenter in bold]

709. Mehnert, E., J. Damico, S. Frailey, H. Leetaru, Y.-F. Lin, R. Okwen, N. Adams, B. Storsved, A. Valocchi, 2013. Basin-scale Modeling of CO2 Sequestration in the Illinois Basin—Status Report, 2013 Midwest Carbon Sequestration Science Conference, October 8, 2013 Champaign, Illinois. Paper Presented [presenter in bold] 68. MCA01S026 710. L. Horvathova, M. Dubecky, L. Mitas, and I. Stich, QMC study of energetics and structure of neutral and cationic vanadium-benzene half-sandwich, J. Chem. Theory Comput. 9, 390 (2013) 711. M. Dubecky, P. Jurecka, R. Derian, P. Hobza, M. Otyepka, and L. Mitas, Quantum Monte Carlo describe noncovalent interactions with subchemical accuracy, J. Chem. Theor. Comp., 9, 4287 (2013) 712. Zhu, M. and Mitas, L. Quantum Monte Carlo study of effective core potentials accuracy for transition elements, Chem. Phys. Lett., 572, 136 (2013) 713. S. Guo, L. Mitas, and P.J. Reynolds, Study of dipole moments of LiSr and KRb molecules by quantum Monte Carlo methods, Molec. Phys., 111, 1744 (2013). 714. A. Kulahlioglu, K. M. Rasch, S. Hu, and L. Mitas, Density dependence of fixed-node error in quantum Monte Carlo for spin-polarized systems: triplet correlations, Chem. Phys. Lett., 591, 170 (2014) 715. K. M. Rasch, S. Hu and L. Mitas, Fixed-node errors in quantum Monte Carlo: Interplay of electron density and node nonlinearities, J. Chem. Phys. 140, 041102 (2014) 716. M. Dubecky, R. Derian, P. Jurecka, L. Mitas, P. Hobza, and M. Otyepka, Quantum Monte Carlo protocols for noncovalent interactions, submitted to Phys. Chem. Chem. Phys., in review. 717. L. Horvathova, R. Derian, L. Mitas, I. Stich, Spin transport through one-dimensional transition metal organometallic cluster systems, submitted to New J. of Physics, in review. 718. Q. Niu, J. Dinan, S. Tirukkovalur, L. Wagner, L. Mitas, and P. Sadayappan, A global address space approach to automated data management for parallel quantum Monte Carlo applications, Concurrency and Computation, in review. 719. 
M. Zhu, S. Guo, S. Hu, and L. Mitas, Spins as quantum variables in diffusion Monte Carlo method, in preparation, 2014. 720. A. Kulahlioglu and L. Mitas, Singlet and triplet excitations in Zn-porphyrin, in preparation, 2014. 69. MCA02N014 721. I. Vega, B. Wardell, P. Diener, S. Cupp, and R. Haas, Scalar self-force for eccentric orbits around a Schwarzschild black hole, Phys. Rev. D 88, 084021 (2013), 1307.3476. 722. T. Damour, F. Guercilena, I. Hinder, S. Hopper, A. Nagar, et al., Strong-Field Scattering of Two Black Holes: Numerics Versus Analytics (2014), 1402.7307. 723. R. De Pietri, A. Feo, L. Franci, and F. Löffler, Stiffness effects on the dynamics of the bar-mode instability of Neutron Stars in full General Relativity (2014), arXiv:gr-qc/1403.8066. 70. MCA03S012 724. Böse, M., Graves, R., Gill, D., Callaghan, S., Maechling, P. (2014, in revision). CyberShake-Derived Ground-Motion Prediction Models for the Los Angeles Region with Application to Earthquake Early Warning, Geophys. J. Int. 725. Cui, Y., Poyraz, E., Olsen, K., Zhou, J., Withers, K., Callaghan, S., Larkin, J., Guest, C., Choi, D., Chourasia, A., Shi, Z., Day, S., Maechling, P. and Jordan, T., Physics-based Seismic Hazard Analysis on Petascale Heterogeneous Supercomputers, SC13, Denver, Nov 17-22, 2013-1. 726. Cui, Y., Poyraz, E., Callaghan, S., Maechling, P., Chen, P. and Jordan, T., Accelerating CyberShake Calculations on XE6/XK7 Platforms of Blue Waters, Extreme Scaling Workshop 2013, August 15-16, Boulder, 2013-2. 727. Field, E.H., Biasi, G.P., Bird, P., Dawson, T.E., Felzer, K.R., Jackson, D.D., Johnson, K.M., Jordan, T.H., Madden, C., Michael, A.J., Milner, K.R., Page, M.T., Parsons, T., Powers, P.M., Shaw, B.E., Thatcher, W.R., Weldon, R.J., II, and Zeng, Y., 2013, Uniform California earthquake rupture forecast, version 3 (UCERF3)—The time-independent model: U.S. 
Geological Survey Open-File Report 2013–1165, 97 p., California Geological Survey Special Report 228, and Southern California Earthquake Center Publication 1792, http://pubs.usgs.gov/of/2013/1165/ 728. Isbiliroglu, Y., R. Taborda, and J. Bielak (2013). Coupled soil-structure interaction effects of building clusters during earthquakes, Earthquake Spectra, in press, doi 10.1193/102412EQS315M. 729. Lee, E. & Chen, P. (2013). Automating Seismic Waveform Analysis for Full-3D Waveform Inversions. Geophys. J. Int. 194 (1): 572-589. 730. Lee, E., Huang, H., Dennis, J. M., Chen, P., & Wang, L. (2013) An optimized parallel LSQR algorithm for large-scale seismic tomography (submitted to Computers & Geosciences). 731. Roten, D., K.B. Olsen, S.M. Day, Y. Cui, and D. Fah (2014). Expected seismic shaking in Los Angeles reduced by San Andreas fault zone plasticity, Geophys. Res. Lett., in press. 732. Olsen, K. B., W. Savran, B. H. Jacobsen (2013), Ground motion prediction from low-velocity sediments including statistical models of inhomogeneities in Southern California basins, Seismol. Res. Lett., 84:2, 334. 733. Poyraz, E., Xu, Heming and Cui, Y. (2014). I/O Optimizations for High Performance Scientific Applications. ICCS’14, Cairns, June 10-12 (accepted).

734. Restrepo, D. (2013). Effects of Topography on 3D Seismic Ground Motion Simulation with an Application to the Valley of Aburra in Antioquia, Colombia. Ph.D. Thesis, Carnegie Mellon University, October, Pittsburgh, PA. 735. Roten, D., K.B. Olsen, S.M. Day, Y. Cui, and D. Fah (2014), Expected seismic shaking in Los Angeles reduced by San Andreas fault zone plasticity, Geophysical Research Letters, in revision. 736. Taborda, R. and J. Bielak (2013). Ground-motion simulation and validation of the 2008 Chino Hills, California, earthquake, Bull. Seismol. Soc. Am. 103, no. 1, 131–156, doi 10.1785/0120110325. 737. Taborda, R. and J. Bielak (2014). Ground-motion simulation and validation of the 2008 Chino Hills, California, earthquake using different velocity models, Bull. Seismol. Soc. Am., accepted for publication. 738. Shi, Z., and S. M. Day (2013), Rupture dynamics and ground motion from 3-D rough-fault simulations, J. Geophys. Res. Solid Earth, 118, 1122–1141, doi:10.1002/jgrb.50094. 739. Wang, F., and Jordan, T. H. (2013), Comparison of probabilistic seismic hazard models using averaging-based factorization, Bull. Seismol. Soc. Am., 84 pp., submitted 09/27/13. 740. Withers, K., K. B. Olsen, S. Shi, S. M. Day, and R. Takedatsu (2013). Deterministic high-frequency ground motions from simulations of dynamic rupture along rough faults, Seismol. Res. Lett., 84:2, 335. 741. Zhou, J., Y. Cui, E. Poyraz, D. Choi, and C. Guest, "Multi-GPU implementation of a 3D finite difference time domain earthquake code on heterogeneous supercomputers," Proceedings of International Conference on Computational Science, Vol. 18, 1255-1264, Elsevier, ICCS 2013, Barcelona, June 5-7, 2013. 71. MCA04N012 742. Battaglia, N.; Trac, H.; Cen, R.; Loeb, A., “Reionization on Large Scales. I. A Parametric Model Constructed from Radiation-hydrodynamic Simulations”, 2013, ApJ, 776, 81 743. Battaglia, N.; Natarajan, A.; Trac, H.; Cen, R.; Loeb, A., “Reionization on Large Scales. III. 
Predictions for Low-l Cosmic Microwave Background Polarization and High-l Kinetic Sunyaev-Zel'dovich Observables”, 2013, ApJ, 776, 83 744. La Plante, Paul; Battaglia, Nicholas; Natarajan, Aravind; Peterson, Jeffrey B.; Trac, Hy; Cen, Renyue; Loeb, Abraham, “Reionization on Large Scales IV: Predictions for the 21 cm signal incorporating the light cone effect”, 2014, ApJ, in press 72. MCA06N043 745. A. Uzun, R. Kumar, M. Y. Hussaini and F. S. Alvi, “Simulation of Tonal Noise Generation by Supersonic Impinging Jets,” AIAA Journal, Vol. 51, No. 7, July 2013, pp. 1593 – 1611. 746. A. Uzun, J. T. Solomon, C. H. Foster, W. S. Oates, M. Y. Hussaini and F. S. Alvi, “Flow Physics of a Pulsed Microjet Actuator for High-Speed Flow Control,” AIAA Journal, Vol. 51, No. 12, December 2013, pp. 2894 – 2918. 747. A. Uzun, F. S. Alvi, T. Colonius and M. Y. Hussaini, “Spatial Stability Analysis of Subsonic Jets Modified for Low-Frequency Noise Reduction,” in preparation for submission to AIAA Journal. 748. A. Uzun, F. S. Alvi and M. Y. Hussaini, “Optimally-Growing Boundary Layer Disturbances in a Convergent Nozzle Preceded by a Circular Pipe,” in preparation for submission to Journal of Fluid Mechanics. 73. MCA07S014 749. Schmidt, W.; Collins, D.; Kritsuk, A. G., “Support Against Gravity in Magnetoturbulent Fluids.” Mon. Not. Roy. Astron. Soc. 431, 3196-3215 (2013). 750. Kritsuk, A. G.; Wagner, R.; Norman, M. L., “Energy Cascade and Scaling in Supersonic Isothermal Turbulence.” Journal of Fluid Mechanics, 713, R1, 11 pp. (2013). 751. Kritsuk, A. G.; Lee, C. T.; Norman, M. L., “A Supersonic Turbulence Origin of Larson's Laws.” Mon. Not. Roy. Astron. Soc., 436, 3247-3261 (2013). 752. 
Bryan, Greg L.; Norman, Michael L.; O'Shea, Brian W.; Abel, Tom; Wise, John H.; Turk, Matthew J.; Reynolds, Daniel R.; Collins, David C.; Wang, Peng; Skillman, Samuel W.; Smith, Britton; Harkness, Robert P.; Bordner, James; Kim, Ji- hoon; Kuhlen, Michael; Xu, Hao; Goldbaum, Nathan; Hummels, Cameron; Kritsuk, Alexei G.; Tasker, Elizabeth; Skory, Stephen; Simpson, Christine M.; Hahn, Oliver; Oishi, Jeffrey S.; So, Geoffrey C.; Zhao, Fen; Cen, Renyue; Li, Yuan; The Enzo Collaboration, \ENZO: An Adaptive Mesh Refinement Code for Astrophysics." Astrophys. J. Supplement, 211, 19, 52 pp. (2014). 753. Invited talk, \Towards ab initio Simulations of Star Formation in Turbulent Molecular Clouds" (Presenter: Alexei Kritsuk), Astronomy Colloquium, The University of Chicago, Illinois, February 2013. 754. Seminar talk, \Energy Cascade and Scaling in Supersonic Turbulence" (Presenter: Alexei Kritsuk), Plasma Physics Seminar, UCSD, La Jolla, California, February 2013. 755. Contributed talk, \Energy Cascade and Scaling in Supersonic Turbulence" (Presenter: Alexei Kritsuk), 14th European Turbulence Conference, Lyon, France, September 2013. 756. Invited talk, \Interstellar Turbulence: A perspective towards Exascale" (Presenter: Alexei Kritsuk), International Conference on Exascale Computing in Astrophysics, Ascona, Swtzerland, September 2013.

757. Invited talk, “High-Resolution Compressible MHD Simulations on a Cluster of GPUs” (Presenter: Pierre Kestener), International Conference on Exascale Computing in Astrophysics, Ascona, Switzerland, September 2013.
758. Invited talk, “New Exact Relations for Compressible Turbulence” (Presenter: Alexei Kritsuk), International Workshop on Turbulence and Amorphous Materials, Eilat, Israel, November, 2013.
759. Contributed talk, “High Resolution Astrophysical Fluid Dynamics Simulations on a GPU Cluster” (Presenter: Pierre Kestener), GPU Technology Conference 2014, San Jose, California, March 2014.
760. Invited talk, “Simulating Star Formation in Turbulent Molecular Clouds” (Presenter: Alexei Kritsuk), Astronomy Colloquium, UIUC, Urbana, Illinois, March, 2014.
74. MCA07S033
761. Borovikov, S. N., Heerikhuisen, J., Pogorelov, N. V. (2013) Hybrid parallelization of adaptive MHD-kinetic module in the Multi-Scale Fluid-Kinetic Simulation Suite, in Numerical Modeling of Space Plasma Flows: ASTRONUM-2012, ed. N.V. Pogorelov, E. Audit, & G.P. Zank, 219–224, Astronomical Society of the Pacific Conf. Ser. 474 (ASP: San Francisco)
762. Borovikov, S. N.; Pogorelov, N. V. (2014), Voyager 1 near the Heliopause, Astrophys. J., 783, L16
763. Gamayunov, K., Zhang, M., Pogorelov, N., Heerikhuisen, J., Rassoul, H. (2013), Alfvénic turbulence generated by the interstellar pickup protons in the outer heliosphere, in AIP Conf. Proc. 1539, ed. G.P. Zank et al., SOLARWIND 13: Proc. Thirteenth International Solar Wind Conference, 171–174 (AIP: New York)
764. Heerikhuisen, J., Pogorelov, N., Zank, G. (2013), Simulating the heliosphere with kinetic hydrogen and dynamic MHD source terms, in Numerical Modeling of Space Plasma Flows: ASTRONUM-2012, ed. N.V. Pogorelov, E. Audit, & G.P. Zank, 195–200, Astronomical Society of the Pacific Conf. Ser. 474 (ASP: San Francisco)
765. Heerikhuisen, J., Zirnstein, E. J., Funsten, H. O., Pogorelov, N. V., Zank, G. P. (2014), The effect of new interstellar medium parameters on the heliosphere and energetic neutral atoms from the interstellar boundary, Astrophys. J., 784, 73
766. Kim, T. K., Pogorelov, N. V., Borovikov, S. N., Jackson, B. V., Yu, H.-S., Tokumaru, M. (2014), MHD heliosphere with boundary conditions from the UCSD tomographic reconstruction using interplanetary scintillation data, J. Geophys. Res., in press
767. Kucharek, H., Fuselier, S. A., Wurz, P., Pogorelov, N., Borovikov, S., Lee, M. A., Moebius, E., Reisenfeld, D., Funsten, H., Schwadron, N., McComas, D. (2013), The solar wind as a possible source of fast temporal variations of the heliospheric ribbon, Astrophys. J., 776, 109
768. Pogorelov, N. V., Suess, S. T., Borovikov, S. N., Ebert, R. W., McComas, D. J., Zank, G. P. (2013a), Three dimensional features of the outer heliosphere due to coupling between the interstellar and interplanetary magnetic fields. IV. Solar Cycle Model Based on Ulysses Observations, Astrophys. J., 772, 2
769. Pogorelov, N. V., Borovikov, S. N., Bedford, M. C., Heerikhuisen, J., Kim, T. K., Kryukov, I. A., Zank, G. P. (2013) Modeling solar wind flow with the Multi-Scale Fluid-Kinetic Simulation Suite, in Numerical Modeling of Space Plasma Flows: ASTRONUM-2012, ed. N.V. Pogorelov, E. Audit, & G.P. Zank, 165–170, Astronomical Society of the Pacific Conf. Ser. 474 (ASP: San Francisco)
770. Pogorelov, N. V., Borovikov, S. N., Burlaga, L. F., Ebert, R. W., Kryukov, I. A., Suess, S. T., Zank, G. P., Bedford, M. C. (2013c), Unsteady processes in the vicinity of the heliopause: Are we in the LISM yet? In AIP Conf. Proc. 1539, ed. G.P. Zank et al., SOLARWIND 13: Proc. Thirteenth International Solar Wind Conference, 352–355 (AIP: New York)
771. Zirnstein, E. J., Heerikhuisen, J., Zank, G. P., Pogorelov, N. V., McComas, D. J., Desai, M. I. (2014), Charge-exchange coupling between pickup ions across the Heliopause and its Effect on Energetic Neutral Hydrogen Flux, Astrophys. J., 783, 129
772. Zank, G. P., Heerikhuisen, J., Wood, B. E., Pogorelov, N. V., Zirnstein, E., McComas, D. J. (2013), Heliospheric structure: The bow wave and the hydrogen wall, Astrophys. J., 763, 20
75. MCA08X002
773. J. Mondal, J. A. Morrone, and B. J. Berne, How hydrophobic drying forces impact the kinetics of molecular recognition, Proc. Natl. Acad. Sci., 110, 13277, (2013).
774. J. Mondal, G. Stirnemann, and B. J. Berne, When Does Trimethylamine N-Oxide Fold a Polymer Chain and Urea Unfold It?, J. Phys. Chem. B, 117, 8723, (2013).
775. R. A. Nistor, T. E. Markland, and B. J. Berne, Interface limited growth of heterogeneously nucleated ice in supercooled water, J. Phys. Chem. B, 118, 752 (2014).
776. R. A. Nistor, T. E. Markland, and B. J. Berne, Interplay of solute hydrophobic and hydrogen bonding interactions on the crystallization rate of heterogeneously nucleated ice. To be submitted to PCCP (2014)
777. J. Mondal, R. A. Friesner, B. J. Berne, How water contributes to free energy barriers in ligand binding: Insights from kinase-ligand binding, To be submitted (2014)
778. J. Mondal, G. Stirnemann, and B. J. Berne, Role of osmolytes on a synthetic polymer, In preparation (2014).

76. MCA08X013
779. JC Dietrich, CN Dawson, JM Proft, MT Howard, G Wells, JG Fleming, RA Luettich Jr, JJ Westerink, Z Cobell, M Vitse, H Lander, BO Blanton, CM Szpilka, JH Atkinson. (2013). Real-Time Forecasting and Visualization of Hurricane Waves and Storm Surge using SWAN+ADCIRC and FigureGen. Computational Challenges in the Geosciences, CN Dawson and M Gerritsen, eds., The IMA Volumes in Mathematics and Its Applications, Volume 156, Springer, 2013.
780. C Dawson, A local time-stepping Runge-Kutta discontinuous Galerkin method for hurricane storm surge modeling, in Recent Developments in Discontinuous Galerkin Methods for Partial Differential Equations, X Feng, O Karakashian and Y Xing, editors, Springer, to appear.
781. RC Martyr, JC Dietrich, JJ Westerink, PC Kerr, CN Dawson, JM Smith, H Pourtaheri, N Powell, M van Ledden, S Tanaka, HJ Roberts, LG Westerink, HJ Westerink (2013). Simulating Hurricane Storm Surge in the Lower Mississippi River under Varying Flow Conditions. Journal of Hydraulic Engineering, 139, pp. 492-501, 2013.
782. JC Dietrich, M Zijlema, P-E Allier, LH Holthuijsen, N Booij, JD Meixner, JK Proft, CN Dawson, CJ Bender, A Naimaster, JM Smith, JJ Westerink (2013). Limiters for Spectral Propagation Velocities in SWAN. Ocean Modelling, 70, pp. 85-102, 2013.
783. TJ Povich, CN Dawson, CE Kees, and MW Farthing, Finite element methods for variable density flow and solute transport, Computational Geosciences, 17, pp. 529-550, 2013.
784. C Dawson, CJ Trajan, EJ Kubatko and JJ Westerink, A parallel local time-stepping Runge-Kutta discontinuous Galerkin method with applications to coastal ocean modeling, Computer Methods in Applied Mechanics and Engineering, 259, pp. 154-165, 2013.
785. D Wirasaet, EJ Kubatko, CE Michoski, S Tanaka, JJ Westerink, C Dawson, Discontinuous Galerkin Methods with Nodal and Hybrid Modal/Nodal Triangular, Quadrilateral, and Polygonal Elements for Nonlinear Shallow Water Flow, Computer Methods in Applied Mechanics and Engineering, to appear.
786. ME Hope, JJ Westerink, AB Kerr, JC Dietrich, C Dawson, C Bender, JM Smith, RM Jensen, M. Zijlema, LH Holthuijsen, RA Luettich, Jr., MD Cardone, VJ Cox, AT Pourtaheri, H Roberts, HJ Atkinson, S Tanaka, HJ Westerink, LG Westerink, Hindcast and validation of Hurricane Ike (2008) waves, forerunner, and storm surge, Journal of Geophysical Research Oceans, to appear.
77. MCA08X034
787. Electron impact excitation of the lowest doublet and quartet core-excited autoionizing states in Rb atoms. A. Borovik, V. Roman, O. Zatsarinny, and K. Bartschat, J. Phys. B 46 (2013) 015203
788. B-spline R-matrix-with-pseudostates calculations for electron-impact excitation and ionization of carbon. Y. Wang, O. Zatsarinny, and K. Bartschat, Phys. Rev. A 87 (2013) 012704
789. Fine-structure-resolved electron collisions from chlorine atoms in the $(3p^5)\,{}^2P^o_{3/2,1/2}$ states. Y. Wang, O. Zatsarinny, K. Bartschat, and J.-P. Booth, Phys. Rev. A 87 (2013) 022703
790. Polarization correlations for electron-impact excitation of the resonant transitions of Ne and Ar at low incident energies. L. R. Hargreaves, C. Campbell, M. A. Khakoo, J. W. McConkey, O. Zatsarinny, K. Bartschat, A. D. Stauffer, and R. P. McEachran, Phys. Rev. A 87 (2013) 022711.
791. Electron-impact ionization of neon at low projectile energy: An internormalized experiment for a complex target. T. Pflüger, O. Zatsarinny, and K. Bartschat, A. Senftleben, X. Ren, J. Ullrich, and A. Dorn, Phys. Rev. Lett. 110 (2013) 153202
792. Relativistic B-spline R-matrix calculations for electron collisions with lead atoms: differential cross sections and spin asymmetries. O. Zatsarinny, Y. Wang, and K. Bartschat, J. Phys. B 46 (2013) 035202
793. Polarization correlations for electron-impact excitation of the resonant transitions of Ne and Ar at low incident energies. L. R. Hargreaves, C. Campbell, M. A. Khakoo, J. W. McConkey, O. Zatsarinny, K. Bartschat, A. D. Stauffer, and R. P. McEachran, Phys. Rev. A 87 (2013) 022711
794. Carrier-envelope phase effects in above-threshold ionization of atomic hydrogen. W. C. Wallace, M. G. Pullen, D. E. Laban, A. J. Palmer, G. F. Hanne, A. N. Grum-Grzhimailo, K. Bartschat, H. M. Quiney, I. V. Litvinyuk, R. T. Sang, and D. Kielpinski, New J. of Phys. 15 (2013) 033002
795. Anomalously large low-energy elastic cross sections for electron scattering from the CF3 radical. J. R. Brunton, L. R. Hargreaves, S. J. Buckman, G. García, F. Blanco, O. Zatsarinny, K. Bartschat, and M. J. Brunger, Chem. Phys. Lett. 568-569 (2013) 55.
796. Experimental and theoretical investigation of (e,2e) ionization of Ar(3p) in asymmetric kinematics at 200 eV. M. Ulu, Z. Nur Ozer, M. Yavuz, O. Zatsarinny, K. Bartschat, M. Dogan, and A. Crowe, J. Phys. B 46 (2013) 115004
797. Photoionization of helium by attosecond pulses: extraction of spectra from correlated wavefunctions. L. Argenti, R. Pazourek, J. Feist, S. Nagele, M. Liertzer, E. Persson, J. Burgdörfer, and E. Lindroth, Phys. Rev. A 87 (2013) 053405

798. Photoionization of the H$_2^+$ ion by ultrashort elliptically polarized laser pulses. X. Guan, R. C. DuToit, and K. Bartschat, Phys. Rev. A 87 (2013) 053410
799. Measurement of laser intensities approaching $10^{15}$ W/cm$^2$ with an accuracy of 1%. M. G. Pullen, W. C. Wallace, D. E. Laban, A. J. Palmer, G. F. Hanne, A. N. Grum-Grzhimailo, K. Bartschat, I. Ivanov, A. Kheifets, D. Wells, H. M. Quiney, X. M. Tong, I. V. Litvinyuk, R. T. Sang, and D. Kielpinski, Phys. Rev. A 87 (2013) 053411
800. Time-resolved photoemission on the attosecond scale: opportunities and challenges. R. Pazourek, S. Nagele, and J. Burgdörfer, Faraday Discussions 163 (2013) 353
801. Comparisons of sets of electron-neutral scattering cross sections and swarm parameters in noble gases I. Argon. L. C. Pitchford, L. L. Alves, K. Bartschat, S. F. Biagi, M. C. Bordage, A. V. Phelps, C. M. Ferreira, G. J. M. Hagelaar, W. L. Morgan, S. Pancheshnyi, V. Puech, A. Stauffer, and O. Zatsarinny, J. Phys. D 46 (2013) 334001
802. Comparisons of sets of electron-neutral scattering cross sections and swarm parameters II. Helium and Neon. L. L. Alves, M. C. Bordage, S. F. Biagi, L. C. Pitchford, O. Zatsarinny, K. Bartschat, G. J. M. Hagelaar, S. Pancheshnyi, C. M. Ferreira, V. Puech, W. L. Morgan, and A. V. Phelps, J. Phys. D 46 (2013) 334002
803. Comparisons of sets of electron-neutral scattering cross sections and swarm parameters in noble gases III. Krypton and Xenon. M. C. Bordage, S. F. Biagi, L. C. Pitchford, K. Bartschat, S. Chowdhury, G. J. M. Hagelaar, W. L. Morgan, V. Puech, and O. Zatsarinny, J. Phys. D 46 (2013) 334003
804. Computational methods for electron-atom collisions in plasma applications. K. Bartschat, J. Phys. D 46 (2013) 334004
805. Coherence in multistate resonance-enhanced four-photon ionization of lithium atoms. M. Schuricke, K. Bartschat, A. N. Grum-Grzhimailo, G. Zhu, J. Steinmann, R. Moshammer, J. Ullrich, and A. Dorn, Phys. Rev. A 88 (2013) 023427
806. TOPICAL REVIEW: The B-spline R-matrix Method for Atomic Processes: Application to Atomic Structure, Electron Collisions, and Photoionization. O. Zatsarinny and K. Bartschat, J. Phys. B 46 (2013) 112001
807. Differential cross sections for low-energy elastic electron scattering from the CF3 radical. J. R. Brunton, L. R. Hargreaves, T. M. Maddern, S. J. Buckman, G. García, F. Blanco, O. Zatsarinny, K. Bartschat, D. B. Jones, G. B. da Silva, and M. J. Brunger, J. Phys. B 46 (2013) 245303
808. Threshold alignment reversal and circularly polarized fluorescence in rotationally resolved H2. J. W. Maseberg, K. Bartschat, and T. J. Gay, Phys. Rev. Lett. 111 (2013) 253201
809. Resonance Effects in Two-Photon double ionization of H2 by femtosecond XUV laser pulses. X. Guan, K. Bartschat, B. I. Schneider, and L. Koesterke, Phys. Rev. A 88 (2013) 043402
810. Effects of numerical approximations in the treatment of short-pulse strong-field ionization of atomic hydrogen. A. N. Grum-Grzhimailo, M. N. Khaerdinov, and K. Bartschat, Phys. Rev. A 88 (2013) 055401
811. Energy dependence of the (e,2e) recoil/binary peak ratio across He $(2p^2)\,{}^1D$ and $(2s2p)\,{}^1P$ autoionizing levels. B. A. deHarak, K. Bartschat, and N. L. S. Martin, Phys. Rev. A 89 (2014) 012702
812. Benchmark calculation of total cross sections for ionization-excitation of helium. O. Zatsarinny and K. Bartschat, (Fast Track Communication) J. Phys. B 47 (2014) 061001
813. Electron-impact excitation of argon at intermediate energies. O. Zatsarinny, Y. Wang, and K. Bartschat, Phys. Rev. A 89 (2014) 022706
814. Time delays for attosecond streaking in photoionization of neon. J. Feist, O. Zatsarinny, S. Nagele, R. Pazourek, J. Burgdörfer, X. Guan, K. Bartschat, and B. I. Schneider, Phys. Rev. A 89 (2014) 033417
815. Complementary imaging of the nuclear dynamics in laser-excited diatomic molecular ions in the time and frequency domains. M. Magrakvelidze, A. Kramer, K. Bartschat, and U. Thumm, J. Phys. B 47 (2014), accepted Feb. 2014
816. Electron collisions with cesium atoms – benchmark calculations and application to modeling an excimer-pumped alkali laser. O. Zatsarinny, K. Bartschat, N. Y. Babaeva, and M. J. Kushner, submitted to Plasma Sources Science and Technology (2014)
817. Spin-exchange cross sections in low-energy electron collisions with one-electron ions. K. Bartschat and H. R. Sadeghpour, submitted to Astrophysical Journal (2014)
818. Time-dependent computational methods for matter under extreme conditions. B. I. Schneider, K. Bartschat, X. Guan, D. Feder, and L. A. Collins, submitted to Advances in Chemical Physics (2013)
819. Probing time-ordering in two-photon double ionization of helium on the attosecond time scale. R. Pazourek, S. Nagele, and J. Burgdörfer, in preparation for Phys. Rev. Lett. (2014).
820. Time-resolved photoemission using attosecond streaking. S. Nagele, R. Pazourek, M. Wais, G. Wachter, and J. Burgdörfer, accepted for J. Phys.: Conf. Ser. (2014).
821. Attosecond Chronoscopy of Photoemission. R. Pazourek, S. Nagele, and J. Burgdörfer, in preparation for Rev. Mod. Phys. (2014).

822. Ionization of helium by slow antiproton impact: total and differential cross sections. S. Borbély, J. Feist, K. Tőkési, S. Nagele, L. Nagy, and J. Burgdörfer, in preparation for Phys. Rev. A (2014).
78. MCA93S028
823. Co-translational folding of nascent protein, Julian Deeng, Kwok-Yan Chan, Birgitta Beatrix, Wei Han, James Gumbart, Klaus Schulten and Roland Beckmann. Dynamic behavior of trigger factor on the ribosome. 2014. Submitted.
824. Molecular mechanism of EttA-regulated ribosomal translation, Shanmugapriya Sothiselvam, Bo Liu, Wei Han, Dorota Klepacki, Gemma C. Atkinson, Age Brauer, Maido Remm, Tanel Tenson, Klaus Schulten, Nora Vázquez-Laslop, and Alexander S. Mankin. Macrolide antibiotics allosterically predispose the ribosome for translation arrest. 2014. Submitted.
825. Hexameric Molecular Motors, Wen Ma and Klaus Schulten. Molecular mechanism of RNA translocation coupled to the ATPase cycle of a hexameric helicase. 2014. Submitted.
826. Membrane bending by synaptotagmin I, Zhe Wu and Klaus Schulten. Synaptotagmin’s role in neurotransmitter release likely involves Ca2+-induced conformational transition. 2014. Submitted.
827. Mortaza Aghtar, Johan Strümpfer, Carsten Olbrich, Klaus Schulten, and Ulrich Kleinekathoefer. The FMO complex in a glycerol-water mixture. Journal of Physical Chemistry B, 117:7157-7163, 2013.
828. Gongpu Zhao, Juan R. Perilla, Ernest L. Yufenyuy, Xin Meng, Bo Chen, Jiying Ning, Jinwoo Ahn, Angela M. Gronenborn, Klaus Schulten, Christopher Aiken, and Peijun Zhang. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature, 497:643-646, 2013.
829. Kai Zhang, Li Wang, Yanxin Liu, Kwok-Yan Chan, Xiaoyun Pang, Klaus Schulten, Zhiyang Dong, and Fei Sun. Flexible interwoven termini determine the thermal stability of thermosomes. Protein & Cell, 4:432-444, 2013.
830. Wei Han and Klaus Schulten. Characterization of folding mechanisms of Trp-cage and WW-domain by network analysis of simulations with a hybrid-resolution model. Journal of Physical Chemistry B, 117:13367-13377, 2013.
831. Ivan Teo and Klaus Schulten. A computational kinetic model of diffusion for molecular systems. Journal of Chemical Physics, 139:121929, 2013.
832. Boon Chong Goh, Michael J. Rynkiewicz, Tanya R. Cafarella, Mitchell R. White, Kevan L. Hartshorn, Kimberly Allen, Erika C. Crouch, Oliviana Calin, Peter H. Seeberger, Klaus Schulten, and Barbara A. Seaton. Molecular mechanisms of inhibition of influenza by surfactant protein D revealed by large-scale molecular dynamics simulation. Biochemistry, 52:8527-8538, 2013.
833. Christopher G. Mayne, Jan Saam, Klaus Schulten, Emad Tajkhorshid, and James C. Gumbart. Rapid parameterization of small molecules using the Force Field Toolkit. Journal of Computational Chemistry, 34:2757-2770, 2013.
834. Anuj Girdhar, Chaitanya Sathe, Klaus Schulten, and Jean-Pierre Leburton. Graphene quantum point contact transistor for DNA sensing. Proceedings of the National Academy of Sciences, USA, 110:16748-16753, 2013.
835. Ramya Gamini, Wei Han, John E. Stone, and Klaus Schulten. Assembly of Nsp1 nucleoporins provides insight into nuclear pore complex gating. PLoS Computational Biology, 10:e1003488, 2014.
836. Yong Wang, Yanxin Liu, Hannah A. DeBerg, Takeshi Nomura, Melinda Tonks Hoffman, Paul R. Rohde, Klaus Schulten, Boris Martinac, and Paul R. Selvin. Single molecule FRET reveals pore size and opening mechanism of MscL. eLife, 3:e01834, 2014.
837. Ilia Solov’yov, Tatiana Domratcheva, and Klaus Schulten. Separation of photo-induced radical pair in cryptochrome to a functionally critical distance. Scientific Reports, 4:3845, 2014.
838. Yanxin Liu, Maxim B. Prigozhin, Klaus Schulten, and Martin Gruebele. Observation of complete pressure-jump protein refolding in molecular dynamics simulation and experiment. Journal of the American Chemical Society, 136:4265-4272, 2014.
839. Wei Jiang, James Phillips, Lei Huang, Mikolai Fajer, Yilin Meng, James Gumbart, Yun Luo, Klaus Schulten, and Benoit Roux. Generalized scalable multiple copy algorithms for molecular dynamics simulations in NAMD. Computer Physics Communications, 185:908-916, 2014.
840. Ilia A. Solov’yov, P. J. Hore, Thorsten Ritz, and Klaus Schulten. A chemical compass for bird navigation. In Masoud Mohseni, Yasser Omar, Greg Engel, and Martin B. Plenio, editors, Quantum Effects in Biology, chapter 10, pp. 218-236. Cambridge University Press, 2014.
841. Qufei Li, Sherry Wanderling, Marcin Paduch, David Medovoy, Abhishek Singharoy, Ryan McGreevy, Carlos Villalba-Galea, Raymond E. Hulse, Benoit Roux, Klaus Schulten, Anthony Kossiakoff, and Eduardo Perozo. Structural mechanism of voltage-dependent gating in an isolated voltage-sensing domain. Nature Structural & Molecular Biology, 21:244-252, 2014.
842. Qiangjun Zhou, Jiangmei Li, Hang Yu, Yujia Zhai, Zhen Gao, Yanxin Liu, Xiaoyun Pang, Lunfeng Zhang, Klaus Schulten, Fei Sun, and Chang Chen. Molecular insights into the membrane associated phosphatidylinositol 4-kinase IIα. Nature Communications, 2014. In press.
843. Michael T. Englander, Joshua L. Avins, Rachel C. Fleisher, Bo Liu, Philip R. Effraim, Jiangning Wang, Klaus Schulten, Thomas S. Leyh, Ruben L. Gonzalez Jr., and Virginia W. Cornish. The ribosome can discriminate the chirality of amino acids within its peptidyl-transferase center. 2014. Submitted.

844. Johan Strümpfer, Mortaza Aghtar, Carsten Olbrich, Klaus Schulten, and Ulrich Kleinekathoefer. Environmental Effects in the Cryptophyte PE545 Antenna System. 2014. Submitted.
845. Anuj Girdhar, Chaitanya Sathe, Klaus Schulten and Jean-Pierre Leburton. Gate-Modulated Graphene Quantum Point Contact Device for DNA Sensing. 2014. Submitted.
846. Rafael C. Bernardi, Isaac Cann, and Klaus Schulten. Molecular dynamics study of enhanced Man5B enzymatic activity. 2014. Submitted.
847. Kai Zhang, Li Wang, Yanxin Liu, Gang Ji, Kwok-Yan Chan, Klaus Schulten, Zhiyang Dong, and Fei Sun. Domain electrostatic interactions modulate the conformation propensity of group II chaperonin. 2014. Submitted.
848. Stephan Wickles, Abhishek Singharoy, Jessica Andreani, Stefan Seemayer, Lukas Bischoff, Otto Berninghausen, Johannes Soeding, Klaus Schulten, Eli van der Sluis, and Roland Beckmann. A structural model of the active ribosome-bound membrane protein insertase YidC. 2014. Submitted.
849. Wei Han and Klaus Schulten. Insight into the dynamics of fibril elongation for amyloid-beta(17-42): a kinetic network analysis of simulations with a hybrid-resolution model. 2014. Submitted.
850. Ryan McGreevy, Abhishek Singharoy, Qufei Li, Jingfen Zhang, Eduardo Perozo, Klaus Schulten. xMDFF: Molecular Dynamics Flexible Fitting of Low Resolution X-Ray Structures. 2014. Submitted.
79. MCA95C006
851. Cheng, J. and M. Xue, 2014: Assimilation of attenuated data from an X-band radar network using ensemble Kalman filter: Simulated data experiments with a quasi-linear convective system. J. Atmos. Ocean. Tech., In preparation.
852. Cintineo, R., J. A. Otkin, M. Xue, and F. Kong, 2014: Evaluating the performance of planetary boundary layer and cloud microphysical parameterization schemes in a convection-permitting ensemble using synthetic GOES-13 satellite observations. Mon. Wea. Rev., 142, 163-182.
853. Clark, A.J., J.S. Kain, P.T. Marsh, J. James Correia, M. Xue, and F. Kong, 2014: Application of a 3-dimensional object algorithm to convection-allowing forecasts. Wea. Forecasting, accepted.
854. Clark, A. J., J. Gao, P. T. Marsh, T. Smith, J. S. Kain, J. Correia, Jr., M. Xue, and F. Kong, 2013: Tornado path length forecasts from 2010-2011 using ensemble updraft helicity. Wea. Forecasting, Accepted.
855. Coniglio, M. C., J. Correia, Jr., P. T. Marsh, and F. Kong, 2013: Evaluation of WRF-ARW forecasts of the planetary boundary layer using sounding observations. Wea. Forecasting, Accepted.
856. Dahl, N., H. Xue, X. Hu, and M. Xue, 2014: Coupled fire-atmosphere modeling of wildfire spread using DEVS-FIRE and ARPS. Int. J. Wildland Fire, Conditionally accepted.
857. Dawson, D. T., II, M. Xue, A. Shapiro, and J. A. Milbrandt, 2014: Impact of multi-moment microphysics on real-data simulations of the 3 May 1999 Oklahoma City tornadic supercell and associated tornadoes. Mon. Wea. Rev., To be submitted.
858. Dawson, D. T., II, E. R. Mansell, Y. Jung, L. J. Wicker, M. R. Kumjian, and M. Xue, 2013: Low-level ZDR signatures in supercell forward flanks: The role of size sorting and melting of hail. J. Atmos. Sci., In press.
859. Dong, J., and M. Xue, 2014: Airborne radar radial wind and dropsonde assimilation with Ensemble Kalman Filter for cloud-resolving analysis and forecast of Hurricane Earl (2010). In preparation.
860. Dong, J., and M. Xue, 2013: Assimilation of radial velocity and reflectivity data from coastal WSR-88D radars using ensemble Kalman filter for the analysis and forecast of landfalling hurricane Ike (2008). Quart. J. Roy. Meteor. Soc., 139, 467-487.
861. Dong, J. and M. Xue, 2013: The impact of assimilating satellite-derived winds, airborne Doppler radial velocity and dropsonde data on the analysis and prediction of Hurricane Earl (2010) using an ensemble Kalman filter. Poster presentations at the 6th WMO Data Assimilation Symposium. World Meter. Org., College Park, Maryland.
862. Dong, J. and M. Xue, 2013: The impact of assimilating airborne Doppler radar radial velocity and dropsonde data on the analysis and prediction of Hurricane Earl (2010) using the ARPS EnKF system and WRF prediction model. Poster presentations at the 93rd AMS annual meeting: Special Symposium on the Next Level of Predictions in Tropical Meteorology: Techniques, Usage, Support, and Impacts. Amer. Meter. Soc., Austin, Texas.
863. Duda, J. D., X. Wang, F. Kong, and M. Xue, 2014: Using varied microphysics to account for uncertainty in warm-season QPF in a convection-allowing ensemble. Mon. Wea. Rev., Conditionally accepted.
864. Gagne, D. J., II, A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm scale ensemble probabilistic quantitative precipitation forecasts. Mon. Wea. Rev., in review.
865. Gagne, D. J., A. McGovern, J. Brotzge, and M. Xue, 2013: Severe Hail Prediction within a Spatiotemporal Relational Data Mining Framework. Preprints, 2013 International Conference on Data Mining Workshops, Dallas, TX, IEEE, 994-1001.
866. Gasperoni, N. A., M. Xue, R. D. Palmer, and J. Gao, 2013: Sensitivity of convective initiation prediction to near-surface moisture when assimilating radar refractivity: Impact tests using OSSEs. J. Atmos. Ocean. Tech., Conditionally accepted.
867. Ge, G., J. Gao, and M. Xue, 2013: Impacts of assimilating measurements of different state variables on the analysis and forecast of a supercell storm using three dimensional variational method. Mon. Wea. Rev., Accepted.

868. Hall, J. D. and M. Xue, 2014: Structure and evolution of vortex Rossby waves and asymmetries in numerically simulated typhoon Morakot (2009) with and without terrain. J. Atmos. Sci., To be submitted.
869. Hall, J.D., M. Xue, L. Ran and Lance M. Leslie, 2013: High-resolution modeling of typhoon Morakot (2009): Vortex Rossby waves and their role in extreme precipitation over Taiwan. J. Atmos. Sci., 70, 163-186.
870. Hu, X.-M., P. Klein, M. Xue, F. Zhang, D. Doughty, R. Forkel, E. Joseph, and J. D. Fuentes, 2013a: Impact of the Vertical Mixing Induced by Low-level Jets on Boundary Layer Ozone Concentration, Atmos. Environ., 70, 123-130.
871. Hu, X.-M., P. M. Klein, M. Xue, A. Shapiro, and A. Nallapareddy, 2013b: Enhanced vertical mixing associated with a nocturnal cold front passage and its impact on near-surface temperature and ozone concentration, J. Geophys. Res., 118, 2714–2728, doi:10.1002/jgrd.50309.
872. Hu, X.-M., P. M. Klein, M. Xue, J. K. Lundquist, F. Zhang, and Y. Qi, 2013c: Impact of low-level jets on the nocturnal urban heat island intensity in Oklahoma City. J. Appl. Meteor. Climatol., doi:10.1175/JAMC-D-12-0256.1.
873. Hu, X.-M., P. M. Klein, and M. Xue, 2013d: Evaluation of the updated YSU Planetary Boundary Layer Scheme within WRF for Wind Resource and Air Quality Assessments, J. Geophys. Res., 118, doi:10.1002/jgrd.50823.
874. Hu, X.-M., J. D. Fuentes, D. Toohey, and D. Wang, 2013e: Chemical processing within and above a Loblolly pine forest in North Carolina, USA, J. of Atmos. Chem., doi:10.1007/s10874-013-9276-3.
875. Johnson, A., X. Wang, M. Xue, F. Kong, G. Zhao, Y. Wang, K. Thomas, K. Brewster, and J. Gao, 2014: Multiscale characteristics and evolution of perturbations for warm season convection-allowing precipitation forecasts: Dependence on background flow and method of perturbation. Mon. Wea. Rev., Accepted.
876. Klein, P. M., X.-M. Hu, M. Xue, 2013: Mixing processes in the nocturnal atmospheric boundary layer and their impacts on urban ozone concentrations, Bound.-Layer Meteor., conditionally accepted.
877. Liu, C. and M. Xue, 2013: A unified framework for four-dimensional ensemble-variational hybrid data assimilation. Quart. J. Roy. Meteor. Soc., Under review.
878. Putnam, B. J., M. Xue, Y. Jung, N. A. Snook, and G. Zhang, 2014: The analysis and prediction of microphysical states and polarimetric variables in a mesoscale convective system using double-moment microphysics, multi-network radar data, and the ensemble Kalman filter. Mon. Wea. Rev., 142, 141–162.
879. Putnam, Bryan J., Ming Xue, Guifu Zhang, Youngsun Jung, and Fanyou Kong, 2013: The Analysis and Prediction of Microphysical States and Polarimetric Radar Variables in a Mesoscale Convective System and Real Time Storm Scale Ensemble Forecast Using Advanced Multi-Moment Microphysics Schemes. 3rd International Symposium on Earth Science-Challenges, Kyoto, ID O20.
880. Ran, L., M. Xue, and F. Kong, 2014: Impact of assimilating Taiwan radar data using 3DVAR and cloud analysis on the prediction of typhoon Morakot near its landfall. To be submitted.
881. Roberts, B.R., 2013: Tornadogenesis in High-Resolution Idealized Numerical Simulations. VORTEX2 Workshop, Marble Falls, TX.
882. Schenkman, A. D., M. Xue, and M. Hu, 2014: Tornadogenesis in a high-resolution simulation of the 8 May 2003 Oklahoma City Supercell. J. Atmos. Sci., 71, 130-154.
883. Schumacher, R. S., A. J. Clark, M. Xue and F. Kong, 2013: Factors influencing the development and maintenance of nocturnal heavy-rain-producing convective systems in a storm-scale ensemble. Mon. Wea. Rev., Accepted.
884. Snook, N. A., M. Xue, and Y. Jung, 2014: Multi-scale EnKF assimilation of radar and conventional observations and ensemble forecasting for a tornadic mesoscale convective system. Mon. Wea. Rev., Conditionally accepted.
885. Supinie, Timothy A., Youngsun Jung, Ming Xue, David J. Stensrud, Michael M. French, and Howard B. Bluestein, “Impact of VORTEX2 Observations on Analyses and Forecasts of the 5 June 2009 Goshen County, Wyoming Supercell.” In preparation.
886. Tanamachi, R.L., L.J. Wicker, D.C. Dowell, H.B. Bluestein, and M. Xue, 2013: EnKF assimilation of high-resolution, mobile Doppler radar data of the 4 May 2007 Greensburg, Kansas supercell into a numerical cloud model. Mon. Wea. Rev., accepted.
887. Tiwary, A., A. Namdeo, J. Fuentes, A. Dore, X.-M. Hu, M. Bell, 2013: Systems scale assessment of the sustainability implications of emerging green initiatives, Environ. Pollution, 183, 213–223.
888. Wang, M., M. Xue, K. Zhao, and J. Dong, 2014: Assimilation of T-TREC-retrieved winds from single-Doppler radar with an EnKF for the forecast of Typhoon Jangmi (2008), Mon. Wea. Rev., Accepted.
889. Wang, S., M. Xue, A.D. Schenkman, and J. Min, 2013: An iterative ensemble square root filter and tests with simulated radar data for storm scale data assimilation. Quart. J. Roy. Meteor. Soc., in press.
890. performance analysis for an ensemble square root filter suitable for dense observations. J. Atmos. Ocean. Tech., doi: http://dx.doi.org/10.1175/JTECH-D-12-00165.1.
891.
892. Xue, M., and J. Dong, 2013: Impact of assimilating best track minimum sea level pressure data together with coastal Doppler radar data on hurricane analysis and prediction at a cloud-resolving resolution, Acta Meteorologica Sinica, 27, 379-399.
893. Xue, M., J. Schleif, F. Kong, K. K. Thomas, Y. Wang, K. Zhu, and J. S. Whitaker, 2013: Track and intensity forecasting of Hurricanes: Impact of cloud-resolving resolution and ensemble Kalman filter data assimilation on 2010 Atlantic season forecasts. Wea. Forecasting, 28, 1366-1384.


894. Xue, M., M. Hu, and A. Schenkman, 2014: Numerical prediction of 8 May 2003 Oklahoma City tornadic supercell and embedded tornado using ARPS with assimilation of WSR-88D radar data. Wea. Forecasting, 29, 39-62. 80. MCA98N020 895. Heating of the IGM by X-Rays from Population III Binaries in High Redshift Galaxies, H. Xu, K. Ahn, J. Wise, M. L. Norman, & B. W. O’Shea, Astrophysical J. submitted (2014) 896. Spatially Extended 21 cm Signal from Strongly Clustered UV and X-ray Sources in the Early Universe, K. Ahn, H. Xu, M. L. Norman, M. Alvarez, & J. Wise, Astrophysical J. submitted (2014) 897. *Direct Numerical Simulation of Reionization in Large Cosmological Volumes I. Numerical Methods and Tests, D. Reynolds, M. Norman, G. So & R. Harkness, Astrophys. J., submitted, arXiv:1306.0645 898. *Direct Numerical Simulation of Reionization in Large Cosmological Volumes II. Clumping Factor Evolution and The Photon Budget for Reionization , G. So, M. Norman & D. Reynolds Astrophys. J., submitted, arXiv:1311.2152 899. *Direct Numerical Simulation of Reionization in Large Cosmological Volumes III. Suppression of Star Formation in Low Mass Halos by Radiative Feedback , G. So, M. Norman & D. Reynolds & R. Harkness, in preparation 900. Direct Numerical Simulation of Cosmic Reionization. Geoffrey So Ph. D. thesis, University of California, San Diego (2013) 81. MCB020003 901. Elshrif MM, Cherry EM. A quantitative comparison of the behavior of human ventricular cardiac electrophysiology models in tissue. PLoS One 2014; 9: e84401. 902. Bragard J, Elorza J, Cherry EM, Fenton FH. Validation of a computational model of cardiac defibrillation. Computing in Cardiology 2013; 40, 851-854. 903. Bragard J, Marin S, Cherry EM, Fenton FH. Study of cardiac defibrillation through numerical simulations. In: Rubio RG et al. (eds.), Without Bounds: A Scientific Canvas of Nonlinearity and Complex Dynamics, Understanding Complex Systems, Springer-Verlag, Berlin, 2013, p. 649-658. 904. 
Elshrif M, Cherry EM. Modeling human heart failure. In preparation. 905. Bueno-Orovio A, Cherry EM, Evans SJ, Fenton FH. Basis for the induction of phase-2 reentry in the Brugada syndrome in humans: Insights from computer simulations. In preparation. 906. Cantalapiedra I, Fenton FH, Cherry EM. A comparison of rabbit ventricular cell models in tissue. In preparation. 82. MCB070059 907. Lapelosa, M., and C. F. Abrams, A computational study of water and CO migration sites and channels inside myoglobin. J Chem Theory Comput, 2013;9:1265-1271. 908. Vashisth, H., and C. F. Abrams, All-atom structural models of insulin binding to the insulin receptor in the presence of a tandem hormone-binding element. Proteins, 2013;81:1017-1030. 909. Emileh, E., F. Tuzer, H. Yeh, D. Moreira, J. LaLonde, C. Bewley, C. F. Abrams, and I. Chaiken, A model of the peptide triazole entry inhibitor binding to HIV-1 gp120 and mechanism of bridging sheet disruption. Biochemistry 2013;52:2245-2261. 910. Lapelosa, M. and C. F. Abrams, Transition-path theory calculations on non-uniform meshes in two and three dimensions using finite elements. Comp Phys Comm, 2013;184:2310-2315. 911. *Abrams, C. F. and Giovanni Bussi, Enhanced sampling in molecular dynamics using metadynamics, replica-exchange, and temperature-acceleration. Entropy, 2014;16:163-199. 912. *Baker, M., Gangupomo, V., and C. F. Abrams, Characterization of the water defect at the HIV-1 gp41 membrane spanning domain in bilayers with and without cholesterol using molecular simulations. BBA-Biomembranes, 2014;1838:1396-1405. 913. *Bucci, A. and C. F. Abrams, Oxygen pathways and allostery in monomeric sarcosine oxidase via single-sweep free-energy reconstruction. J Chem Theory Comput, 2014; accepted. 914. Lapelosa, M., Yu, T.-Q., Vanden-Eijnden, E., and C. F. Abrams, “Markovian milestoning for computing diffusion rates of ligands in proteins,” Annual Meeting of the American Institute of Chemical Engineers, November 2013. 915. 
Abrams, C. F., “Exploring and mapping free-energy surfaces using temperature-acceleration, the string method and single sweep,” Snowmass Free Energy Workshop, July 2013, invited. 916. Abrams, C. F., “Understanding and Harnessing Biomolecular Metastabilities,” UC-Berkeley, Dept. Chemical and Biomolecular Engineering, 9 January 2014, invited. 917. Yu, T.-Q., Lapelosa, M., Vanden-Eijnden, E., and C. F. Abrams, “Markovian milestoning for computing diffusion rates of ligands in proteins,” 2014 March Meeting of the American Physical Society, 7 March 2014, invited. 918. Abrams, C. F., “A Workshop on Molecular Dynamics,” ExxonMobil Research and Engineering, 31 March 2014, invited. 919. Abrams, C. F., “Understanding and Harnessing Biomolecular Metastabilities,” University of Washington Molecular Engineering Symposium (upcoming), 22 April 2014, invited.


83. MCB070068 920. Cheung MS. Where soft matter meets living matter -- protein structure, stability and folding in the cell. Current Opinion in Structural Biology. 2013;23:212-7. 921. Waxham MN, Cheung MS. Model of Calmodulin. In: Jaeger D, Jung R, editors. Encyclopedia of Computational Neuroscience: Springer; 2013. 922. Wang Q, Zhang P, Hoffman L, Tripathi S, Homouz D, Liu Y, et al. Protein recognition and selection through conformational and mutually induced fit. Proc. Natl. Acad. Sci. U. S. A, 110, 20545-20550, 2013. 923. D. Balamurugan, A. J. A. Aquino, F. De Dios, L. Flores Jr., H. Lischka, M. S. Cheung. Multiscale simulation of the ground and photo-induced charge-separated states of molecular triad in polar organic solvent: exploring the conformations, fluctuations and the free energy landscapes. Journal of Physical Chemistry, B, 117, 12065-12075 2013. 84. MCB080077 924. Johnston, J.M. & Filizola, M. “Differential stability of the crystallographic interfaces of mu- and kappa-opioid receptors” PLOS One (2014) DOI: 10.1371/journal.pone.0090694. 925. Mondal, S., Johnston, J., Wang, H., Khelashvili, G., Filizola, M., Weinstein, H. “Membrane Driven Spatial Organization of GPCRs” Nature Scientific Reports (2013) 3 (2909): 1-9 [PMID: 24105260]. 926. Jiang, J.-K. McCoy, J.G., Shen, M., LeClair, C., Huang, W., Negri, A., Li, J., Blue, R., Harrington, A., Naini, S., David, G. III, Choi, W.-S., Volpi, E., Fernandez, J., Babayeva, M., Nedelman, M.A., Filizola, M., Coller, B.S., Thomas, C.G. “A novel class of ion displacement ligands as antagonists of the αIIbβ3 receptor that limit conformational reorganization of the receptor” Bioorganic & Medicinal Chemistry Letters (2014) 24(4):1148-53 [PMID: 24461295] 927. Provasi, D., Negri, A., Coller, B.S., Filizola, M. 
“Talin-driven inside-out activation mechanism of platelet αIIbβ3 integrin probed by multi-microsecond, all-atom molecular dynamics simulations” PROTEINS: Structure, Function, and Bioinformatics (2014) DOI: 10.1002/prot.24540. 928. Li, J., Vootukuri, S., Shang, Y., Negri, A., Jiang, J.-K., Nedelman, M., Diacovo, T.G., Filizola, M., Thomas, C.J., Coller, B.S. “RUC-4: A Novel αIIbβ3 Antagonist for Pre-hospital Therapy of Myocardial Infarction” (2014) submitted 929. Provasi, D., Boz, M.B., Johnston, J.M., Filizola, M. “Self-association of Opioid Receptors In an Explicit Lipid-Water Environment.” in preparation. 930. Provasi, D., Filizola, M. “Molecular Determinants of G Protein-Biased Agonism at the Kappa-Opioid Receptor.” in preparation. 85. MCB090174 931. S. Jha, J. Qiu, A. Luckow, P. Mantha, and G. Fox, “A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures,” (Accepted for IEEE Congress on BigData 2014, Anchorage, Alaska), 2014, http://arxiv.org/abs/1403.1528. 932. G Fox, J Qiu and S Jha, High Performance High Functionality Big Data Software Stack, in Big Data and Extreme-scale Computing (BDEC). 2014. Fukuoka, Japan. http://grids.ucs.indiana.edu/ptliupages/publications/HPCandApacheBigDataFinal.pdf. 933. S.-H. Ko, N. Kim, S. Jha, D. Nikitopoulos, and D. Moldovan, “Numerical methodologies for investigation of moderate-velocity flow using a hybrid computational fluid dynamics molecular dynamics simulation approach,” Journal of Mechanical Science and Technology, vol. 28, no. 1, pp. 245–253, 2014. [Online]. Available: http://dx.doi.org/10.1007/s12206-013-0962-5 934. A. Raghotham, S. S, N. Kim, S. Jha, and J. Kim, “Developing ethread pipeline using saga-pilot abstraction for large-scale structural bioinformatics,” (accepted for) High-Performance Computing and Big Data in Omics-based Medicine, 2014. 935. H. S. C. Martin, S. Jha, and P. V. 
Coveney, “Comparative analysis of nucleotide translocation through protein nanopores using steered molecular dynamics and an adaptive biasing force,” Journal of Computational Chemistry, pp. 692–702, 2014, http://dx.doi.org/10.1002/jcc.23525. 936. D. W. Wright, B. A. Hall, O. A. Kenway, S. Jha, and P. V. Coveney, “Computing clinically relevant binding free energies of HIV-1 protease inhibitors,” Journal of Chemical Theory and Computation, 2014, http://dx.doi.org/10.1021/ct4007037. [Online]. Available: http://pubs.acs.org/doi/abs/10.1021/ct4007037 937. A. Luckow, M. Santcroos, A. Zebrowski, and S. Jha, “Pilot-Data: An Abstraction for Distributed Data,” accepted in Journal of Parallel and Distributed Computing, Special Issue on Big Data, 2014, http://arxiv.org/abs/1301.6228. 938. S. Jha, A. Luckow, A. Merzky, M. Santcroos, M. Turilli, and O. Weidner, “A Fresh Perspective on Pilot-Jobs,” Not yet published, 2014, https://www.github.com/saga-project/radical.wp/raw/master/publications/pdf/review pilot review 2013 draft.pdf. 939. A. Zebrowsky and A. L. S. Jha, “BigJob Experimentation,” Not yet published, 2013, https://www.github.com/saga-project/radical.wp/raw/master/publications/pdf/review 2013 bj _experimentation draft.pdf.


940. A. Merzky, C. Straube, S. Jha, and D. Kranzlmueller, “Toward Integrated Modeling of Distributed Applications and Infrastructure,” In preparation, 2014, https://www.github.com/sagaproject/radical.wp/raw/master/publications/pdf/review ccgrid maii 2013 draft.pdf. 941. “Towards Effective Selection of Collective XSEDE Resources”, Vishal Shah, Antons Trekalis and Shantenu Jha, submitted to Annual Conference XSEDE’14. 942. “Computers select personal medicine (2014),” http://www.bbc.co.uk/news/science-environment-26213522. 86. MCB100065 943. Perilla J.R., Leahy, D.J. and Woolf T.B. (2013) Molecular dynamics simulations of transitions for ECD epidermal growth factor receptors show key differences between human and drosophila forms of the receptors. Proteins 81(7):1113-1126. 944. Nutanong S., Carey N., Ahmad Y., Szalay A.S. and Woolf T.B. (2013) Adaptive exploration for large-scale protein analysis in the molecular dynamics database. Proceedings of the 25th International Conference on Scientific and Statistical Database Management (ACM) 945. Braiterman L.T., Murthy A., Jayakanthan S., Nyassae L., Tzeng E., Gromadzka G., Woolf T.B., Lutsenko S., and Hubbard A.L. (2014) Distinct phenotype of a Wilson disease mutation reveals a novel trafficking determinant in the copper transporter ATP7B. Proc. Natl. Acad. Sci. USA 111(14) E1364-E1373. 946. Nagarajan A. and Woolf T.B. (2014) Comparing all-atom and coarse-grained simulations for optimal sampling in SERCA 947. Denning L., Perilla J., Beckstein O., and Woolf T.B. (2014) Holo AdK transitions can be inferred from apo AdK: Simulations of transition events in DIMS. 948. Woolf T.B. (2014) Simulations of apo and holo cAMP binding domain from EPAC show the molecular determinants of an allosteric network. 949. Environmental Influences on States: Molecular Dynamics Simulations of SERCA. Nagarajan A. and Woolf T.B., Biophysical Journal 106(2), 584a 950. 
Molecular Dynamics Simulations helps to rationalize CopB mutations and their relationships to Wilson Disease. Jayakanthan S., McEvoy M.M. and Woolf T.B. Biophysical Journal 106(2), 583a. 951. Database Guided Exploration to Determine Native Ligands for Orphaned Odorant Receptors. Nutanong S., Wong K., Ahmad Y., Pluznick J., Shepard B. and Woolf T.B. Biophysical Journal 106(2), 609a. 952. Learning about Transitions: Adaptive Controls for the Molecular Dynamics Database. Nutanong S.Y., Ahmad Y., Wang I.J. and Woolf T.B. Biophysical Journal 106(2) 406a. 87. MCB100111 953. Jack Smith, Melissa Romanus, Pradeep Kumar Mantha, Yaakoub El Khamra, T.C. Bishop, Shantenu Jha. Scalable Online Comparative Genomics of Mononucleosomes: A BigJob. Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE '13), ACM, New York, NY, USA, Article 23, 8 pages. DOI: 10.1145/2484762.2484819 954. Mithun Biswas, Jörg Langowski, T. C. Bishop. Atomistic simulations of nucleosomes. Wiley Interdisciplinary Reviews: Computational Molecular Science, DOI: 10.1002/wcms.1139. 88. MCB120005 955. Nedialkov, Y.A., Opron, K., Assaf, F., Artsimovitch, I., Kireeva, M.L., Kashlev, M., Cukier, R.I., Nudler, E., and Burton, Z.F. (2013) The RNA polymerase bridge helix YFI motif in catalysis, fidelity and translocation. BBA Gene Regulatory Mechanisms 1829, 187-198. DOI: 10.1016/j.bbagrm.2012.11.005 89. MCB120014 956. C. Miller*, G. Zerze*, and J. Mittal, “Molecular Simulation indicate marked differences in structure of amylin mutants, correlated with known aggregation propensity”, J. Phys. Chem. B 117, 16066 (2013). *equal contribution 957. D. Roxbury, S.-Q. Zhang, J. Mittal, W. F. DeGrado, and A. Jagota, “Structural Stability and Binding Strength of a Designed Peptide-Carbon Nanotube Hybrid” J. Phys. Chem. C 117, 26255 (2013). 958. H. Zerze, J. Mittal, and A. J. 
McHugh, “Ab-Initio Crystallization of n-Eicosane: Structure and Kinetics of Crystal Formation”, Macromolecules 46, 9151 (2013). 959. D. de Sancho, J. Mittal, and R. B. Best, “Folding kinetics and unfolded state dynamics of the GB1 hairpin from molecular simulation”, J. Chem. Theory Comput. 9, 1743 (2013). 960. D. Roxbury, A. Jagota, and J. Mittal, “Structural characteristics of oligomeric DNA strands adsorbed onto single-walled carbon nanotubes” J. Phys. Chem. B 117, 132 (2013). 961. Y. C. Kim and J. Mittal, “Crowding induced entropy-enthalpy compensation in protein association equilibria”, Physical Review Letters 110, 208102 (2013). 962. G. Zerze, R. B. Best, and J. Mittal, “Do unfolded protein properties depend on dyes in single-molecule FRET experiments?” Biophysical Journal (submitted).


963. G. Zerze, B. Schuler, R. B. Best, and J. Mittal, “Role of water exemplified by temperature dependent change in unfolded protein behavior” (submitted). 964. M. J. Boyer, Y. Ding, and J. Mittal, “Coarse-grained DNA model that includes non-Watson-Crick pairing and DNA-nanomaterial interactions”, J. Chem. Phys. (to be submitted). 965. Y. Ding and J. Mittal, “A coarse-grained model for DNA-mediated colloidal assembly” J. Chem. Phys. (to be submitted). 966. A. Bhattacharya, Y. C. Kim, and J. Mittal, “Effects of macromolecular crowding on binding-induced folding and binding affinity” Biophys. J. (to be submitted). 90. MCB120113 967. Blood, P.D., Marcus, S., Schatz M.C. Large-scale sequencing and assembly of cereal genomes using Blacklight. In Proceedings of the Conference on Extreme Science and Engineering Discovery Environment (XSEDE’14). ACM, New York, NY, USA, 2014 (accepted). 91. MCB120160 968. Coulomb soup of bioenergetics: Electron transfer in a bacterial bc1 complex, D. R. Martin, D. N. LeBard, D. V. Matyushov, J. Phys. Chem. Lett., 4, 3602-3606, DOI: 10.1021/jz401910e (2013). 969. Advancing simulations of biological materials: applications of coarse-grained models on graphics processing unit hardware, D. N. LeBard, Molecular Simulation, In Press (Special Issue: Advances in Molecular Biology), DOI: 10.1080/08927022.2014.899700 970. (poster) A Replica Exchange Molecular Dynamics Study of Copper(ii) Binding to a Model Peptide of the Human Copper Transport Protein N-Terminus, J. Skootsky**, Y. Shenberger, S. Ruthstein, D. N. LeBard, ACS Meeting, San Francisco, 2014, **undergraduate student poster 971. (oral) Binding of copper to a model peptide of the human CTR1 protein from atomistic replica exchange molecular dynamics simulations as a function of copper(ii) concentration, J. Skootsky, Y. Shenberger, S. Ruthstein, D. N. LeBard, ACS Meeting, San Francisco, 2014 972. 
(oral) A large-scale computational study of the distance-dependent properties of the Qa- to Qb electron transfer reaction from a membrane-bound reaction center protein, D. N. LeBard, ACS Meeting, San Francisco, 2014 92. MCB130010 973. Ackerman, D. G.; Heberle, F. A.; Feigenson, G. W. Limited Perturbation of a DPPC Bilayer by Fluorescent Lipid Probes: A Molecular Dynamics Study. The Journal of Physical Chemistry B 2013, 117, 4844–4852. 974. Ackerman, D. G.; Feigenson, G. W. Multiscale Modeling Of Four Component Lipid Mixtures: Coarse Grained And Atomistic Simulations Reveal Trends In Phase Separation. (in preparation). 975. Heberle, F. A.; Petruzielo, R. S.; Goh, S. L.; Konyakhina, T. M.; Ackerman, D. G.; Amazon, J. J.; Feigenson, G. W. 2014. Liposome-Based Models for Membrane Rafts: Methodology and Applications. In Liposomes, Lipid Bilayers and Model Membranes: From Basic Research to Application; Katsaras, J., Ed.; CRC Press: Boca Raton, FL. 976. Ackerman, D. G.; Feigenson G. W. Multiscale Modeling Of Four Component Lipid Mixtures: Coarse Grained And Atomistic Simulations Reveal Trends In Phase Separation. Biophysical Society 58th Annual Meeting (February 2014). 977. We have created a Youtube channel entitled Feigenson Lab in which we share movies and descriptions of our simulation results. This information is also included in our lab webpage. Youtube channel: http://www.youtube.com/channel/UCYwBI21R3Q52h0ut1kJU9XA Lab Webpage: http://feigensonlab.mbg.cornell.edu/methods.html 93. MCB130046 978. Bykhovskaia M, A Jagota, A Gonzalez, A Vasin, and JT Littleton. 2013. Interaction of the complexin accessory helix with the C-terminus of the SNARE complex: Molecular dynamics model of the fusion clamp. Biophys. J. 6:679-90. 979. Guan, Z., Jorquera, R.A., Sutton, R.B., Bykhovskaia, M. & Littleton, J.T. 2014 Conformational dynamics of Synaptotagmin multimerization and C2A-C2B intradomain interactions regulate synaptic vesicle fusion. In preparation. 980. Bykhovskaia. 
2014. Molecular dynamics study of neuronal calcium sensor synaptotagmin. Biophys. Soc. Annual Meeting, San Francisco, February 2014. 94. MCB130112 981. Pogorelov, T.V., Vermaas, J. V., Arcario, M.J., and Tajkhorshid, E., “Partitioning of amino acids into a model membrane: capturing the interface,” J. Phys. Chem. B., 118, 1481–1492, 2014. PMID: 24451004 982. Goodman, J.S., Chao, S.-H., Pogorelov*, T.V., and Gruebele*, M., “Filling up the heme pocket stabilizes apomyoglobin and speeds up its folding,” J. Phys. Chem. B., in press. 983. Pogorelov, T. V., Arcario, M. J. and Tajkhorshid, E. “Lipid clustering in presence of ions,” (Oral Presentation) Illinois Hemostasis Consortium Seminar, March 2014, Urbana, IL.


984. Pogorelov, T. V., Arcario, M. J. and Tajkhorshid, E. “Lipid clustering in presence of ions,” (Oral Presentation) Illinois Hemostasis Consortium Seminar, September 2013, Urbana, IL. 985. Arcario, M. J., Pogorelov, T. V., Ohkubo, Y. Z. and Tajkhorshid, E. “The HMMM Model and Membrane–Binding of Talin,” (Oral Presentation) Membrane Protein Interest Group Seminar, 2013, Urbana, IL. 986. Vermaas, J. V., Pogorelov, T. V., and Tajkhorshid, E., “Developing Novel in silico Solvents to Mimic Model Membranes,” (Oral Presentation) Mini-Symposium on Computational Approaches, Membrane Protein Structural Dynamics Consortium, May 2013, Chicago, IL. 987. Vermaas, J. V., Pogorelov, T. V., and Tajkhorshid, E., “Applications and Energetics of a Novel Atomistic Membrane Representation,” Missouri Symposium in Biophysics II: Membrane Proteins, March 2013, Columbia, MO. 988. Goodman, J.S., Pogorelov*, T.V., Prigozhin, M.V., and Gruebele*, M., “Filling up the heme pocket stabilizes apomyoglobin and speeds up its folding,” 8th Annual Midwest Conference on Protein Folding, Assembly, and Molecular Motions, May 2013, Notre Dame, IN. 989. Vermaas, J. V., Pogorelov, T. V., and Tajkhorshid, E., “Design and Development of a Special-Purpose in silico Solvent for Use in a Highly Mobile Membrane Mimetic,” DOE CSGF Annual Conference, 2013, Washington, D.C. 990. Pogorelov, T.V., “Computational chemistry for organic chemists,” Organic chemistry research class, March 2014, Chemistry, University of Illinois, Urbana, IL. 991. Pogorelov, T.V., “Capturing membrane surface and protein-membrane interactions with molecular dynamics simulations,” Membrane protein biology and biotechnology Ph.D. Course, August 2013, Stockholm University, Sweden. 992. Pogorelov, T.V., “Computational chemistry overview,” Visiting chemistry class, University Laboratory High School (Uni High), May 2013, Urbana, IL. 95. MCB130115 993. Bernardi, R.C., Cann, I., and Schulten, K. 
Molecular dynamics study of enhanced Man5B enzymatic activity. Biotechnology for Biofuels (submitted - under re-review) 994. Schoeler, C., Malinowska, K.H., Bernardi, R.C., Miles, L.F., Jobst, M.A., Durner, E., Ott, W., Fried, D.B., Bayer, E.A., Schulten, K., Gaub, H.E., Nash, M.A. Ultrastable Mechanically Shielded Protein Complex (to be submitted soon) 995. Bernardi, R.C., Dodd, D., Schulten, K.S., Cann, I. Mechanism of synergy in the bifunctional xylanase-ferulic acid esterase revealed by biochemical experiments coupled to molecular dynamics simulations. (in preparation) 96. MCB140110 996. L. Ingber, M. Pappalepore, and R.R. Stesiak, “Electroencephalographic field influence on calcium momentum waves,” Journal of Theoretical Biology 343, 138-153 (2014). [ URL http://www.ingber.com/smni14_eeg_ca.pdf and http://dx.doi.org/10.1016/j.jtbi.2013.11.002 ] 997. L. Ingber, “Electroencephalographic field influence on calcium momentum waves,” Journal of Theoretical Biology 343, 138-153 (2013). [ URL http://www.ingber.com/smni14_eeg_ca_slides.pdf and http://dx.doi.org/10.1016/j.jtbi.2013.11.002 ] 998. P.L. Nunez, R. Srinivasan, and L. Ingber, “Theoretical and experimental electrophysiology in human neocortex: Multiscale correlates of conscious experience,” in Multiscale Analysis and Nonlinear Dynamics: From genes to the brain, edited by M.M. Pesenson (Wiley, New York, 2013), p. 149-178. 999. L. Ingber, “Electroencephalographic (EEG) influence on Ca2+ waves,” Report 2013:LEFI, Lester Ingber Research, Ashland, OR, (2013). [ 2nd World Neuroscience Online Conference 18 June 2013. URL http://www.ingber.com/smni13_eeg_ca_lect.pptx ] 1000. L. Ingber, “AI and ideas by multiple scales of statistical mechanics (ISM),” International Journal of Robotics Applications and Technologies (IJRAT) (to be published)(2014). [ Invited paper based on 2008 contribution to Encyclopedia of Artificial Intelligence. URL http://www.ingber.com/smni14_ismai.pdf ] 1001. L. 
Ingber, “Influences on consciousness from multiple scales of neocortical interactions,” in Nonchemical distant cellular interactions: experimental evidences, theoretical modeling, and physiological significance, edited by M. Cifra, V. Salari, D. Fels, and F. Scholkmann (2014), p. (in preparation). [ Invited Paper. URL http://www.ingber.com/smni14_conscious_scales.pdf ] 97. MCB140130 1002. Ngo, V., D. Stefanovski, S. Haas, and R.A. Farley. 2014. Non-equilibrium dynamics contribute to ion selectivity in the KcsA channel. PLoS One. 9:e86079. 1003. AMINO ACID SUBSTITUTIONS FOR T75 IN KCSA ALTER ION SELECTIVITY. 1Melia Tabbakhian, 2Van Ngo, 2Stephan Haas, and 1Robert A. Farley. 1Department of Physiology & Biophysics, and 2Department of Physics & Astronomy, University of Southern California, Los Angeles, CA. Biophysical Society Annual Meeting, San Francisco, CA. February 15-19, 2014.


1004. MECHANISM OF NON-SELECTIVITY IN NAK CHANNEL. Van Ngo1, Haina Wu1, 1Stephan Haas, and Robert A. Farley2. 1Department of Physics and Astronomy, and 2Department of Physiology and Biophysics, University of Southern California, Los Angeles, CA. Biophysical Society Annual Meeting, San Francisco, CA. February 15-19, 2014. 1005. THERMAL EFFECTS AND FREE-ENERGY BARRIER DIFFERENCES IN THE ION SELECTIVITY MECHANISM OF KCSA. 1Van Ngo, 3Darko Stefanovski, 2Robert A. Farley, and 1Stephan Haas. 1Department of Physics and Astronomy, and 2Department of Physiology and Biophysics, University of Southern California, Los Angeles, CA, 3Cedars Sinai Medical Center, Los Angeles, CA. Biophysical Society Annual Meeting, Philadelphia, PA. February 2-6, 2013. 98. MCB140138 1006. Mayya Tokman, Jane HyoJin Lee, Zachary A. Levine, Ming-Chak Ho, Michael E. Colvin, and P. Thomas Vernier. “Electric Field-Driven Water Dipoles: Nanoscale Architecture of Electroporation.” PLoS ONE, 8(4):e61111, April 2013. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0061111 1007. Mayya Tokman, Jane HyoJin Lee, and Michael E. Colvin. “Electrostatic Stretching of Water Nanodroplets.” to be submitted by May 2014. 99. MCB140143 1008. G. Interlandi, W.E. Thomas; The allosteric regulation of the mannose-binding protein FimH investigated by molecular dynamics simulations; In preparation 1009. V.B. Rodriguez, B.A. Kidd, G. Interlandi, V. Tchesnokova, E.V. Sokurenko, W.E. Thomas; Allosteric coupling in the bacterial adhesive protein FimH; J. Biol. Chem.; 288(33), 24128-24139 (2013) 1010. Y. Liu, L. Esser, G. Interlandi, D.I. Kisiela, V. Tchesnokova, W.E. Thomas, S.J. Savarino, E.V. Sokurenko and D. Xia; Tight Conformational Coupling between the Domains of the Enterotoxigenic Escherichia coli Fimbrial Adhesin CfaE Regulates Binding State Transition; J. Biol. Chem.; 288(14), 9993-10001 (2013) 100. MCB140144 1011. 
Bio-inspired resin acid-derived materials as anti-bacterial resistance agents with unexpected activities, Mitra S. Ganewatta,a Yung Pin Chen,ab Jifu Wang,a Jihua Zhou,c Jerry Ebalunode,d Mitzi Nagarkatti,c Alan W. Decho*b and Chuanbing Tang*a Chem. Sci., 2014,5, 2011-2016 101. MSS120006 1012. Golman A.J., Danelson, K.A., Stitzel, J.D. “Robust Human Body Model Injury Prediction in Simulated Side Impact Crashes.” Computer Methods in Biomechanics and Biomedical Engineering. Submitted March 2014. 1013. Danelson, K.A, Golman, A.J., Kemper, A.R., Gayzik, F.S., Gabler, H.C., Duma, S.M, Stitzel, J.D. “Evaluation of Chest Injury Metrics in Human Body Model and Hybrid III Model Analyses.” In preparation for Accident Analysis and Prevention. 1014. Gaewsky, J.P., Danelson, K.A., Weaver, C.M., Weaver, A.A., Stitzel, J.D. “Development of a Generic Vehicle Model for Crash Simulations.” In preparation for Computer Methods in Biomechanics and Biomedical Engineering. 1015. Gaewsky, J.P., Danelson, K.A., Weaver, A.A., Stitzel, J.D. “Implementation and Validation of Frontal Impact Injury Prediction Metrics in a Human Body Model.” In preparation for Traffic Injury Prevention. 1016. Gaewsky, J.P., Danelson, K.A., Weaver, A.A., Stitzel, J.D. “Injury Prediction in a Frontal Impact Crash Using Human Body Model Simulation.” In preparation for Accident Analysis and Prevention. 1017. Danelson, K.A., Golman, A.J., Weaver, C.M., Stitzel, J.D. “Variation in Real World Crash Simulation Injury Metrics Given Crash Parameter Perturbations,” Digital Human Modeling Conference, Ann Arbor, MI, 2013. 1018. Danelson, K.A., Gaewsky, J.P., Golman, A.J., Stitzel, J.D. “Finite Element Human Body Model Thorax Response Validation with Matched Experimental Tests,” ASME Summer Bioengineering Conference, Sun River, OR, 2013. 102. MSS130007 1019. Ruizhi, Li and Huck Beng, Chew. Nanoscale Ductility Mechanisms in Cu/Ag Nanolayered Metals. 
Eighth Pacific Rim International Congress for Advanced Materials and Processing (PRICM-8), 2013, Waikoloa, Hawaii. 1020. Ruizhi, Li and Huck Beng, Chew. Deformation Twinning and Plastic Recovery in Cu/Ag Nanolayers under Uniaxial Tensile Straining. Philosophical Magazine Letters, 2014. In Press. http://dx.doi.org/10.1080/09500839.2014.893063 1021. Ruizhi, Li and Huck Beng, Chew. Ductility Induced by Interface Slip in Cu/Ag Nanolayers: Wavy Interlayers Formed by Interface Migration. In Preparation. 103. OCE030007 1022. I. Alonso-Gonzalez, J. Aristegui, C. Lee, A. Sanchez-Vidal, A. Calafat, J. Fabres, P. Sangra, and E. Mason. Carbon dynamics within cyclonic eddies: Insights from a biomarker study. PLOS ONE, 8:e82447, 2013.


1023. A. Benazzouz, J.L. Pelegri, H. Demarcq, F. Machin, E. Mason, A. Orbi, J. Pena-Izquierdo, and M. Soumia. On the temporal memory of coastal upwelling off NW Africa. J. Geophys. Res., 2014. Submitted. 1024. F. Colas, X. Capet, and J.C. McWilliams. Mesoscale eddy buoyancy flux and eddy-induced circulation in eastern-boundary upwelling systems. J. Phys. Ocean., 43:1073–1095, 2013. 1025. F. Colas, J. Molemaker, X. Capet, and J.C. McWilliams. Seasonal and geographical variability of surface-layer submesoscale circulations. 2014. In preparation. 1026. F. Colas, J. Molemaker, and J.C. McWilliams. Filamentary spiral vortices. 2014. In preparation. 1027. F. Colas, X. Wang, X. Capet, Y. Chao, and J.C. McWilliams. Untangling the roles of wind, run-off, and tides in Prince William Sound, Alaska. Cont. Shelf Res., 63:S79–S89, 2013. 1028. C. Deutsch, H. Brix, H. Frenzel, A. Booth, E.Y. Kwon, and J.C. McWilliams. Variability and trends in hypoxia in the California Current. Global Biogeochem. Cycles, 2014. In preparation. 1029. T. DeVries, J.-H. Liang, and C. Deutsch. A mechanistic particle flux model applied to the oceanic phosphorus cycle. Biogeosciences Discussion, 11:3653–3699, 2014. 1030. W.K. Dewar, J.C. McWilliams, and M.J. Molemaker. Centrifugal instability and mixing in the California Undercurrent. J. Phys. Ocean., 2014. Submitted. 1031. J. Farrara, Y. Chao, Z. Li, X. Wang, X. Jin, H. Zhang, P. Li, Q. Vu, P. Olsson, C. Schoch, M. Halverson, M. Moline, C. Ohlmann, M. Johnson, and J.C. McWilliams. A data-assimilative ocean forecasting system for the Prince William Sound and an evaluation of its performance during Sound Predictions 2009. Cont. Shelf Res., 63:S193–S208, 2013. 1032. J. Gula, M.J. Molemaker, and J.C. McWilliams. Dynamics of a Gulf Stream slope eddy propagating along the continental shelf. 2014. In preparation. 1033. J. Gula, M.J. Molemaker, and J.C. McWilliams. Gulf Stream dynamics along the Southeast U.S. 2014. In preparation. 1034. J. Gula, M.J. 
Molemaker, and J.C. McWilliams. Submesoscale cold filaments in the Gulf Stream. J. Phys. Ocean., 2014. Submitted. 1035. J. Gula, M.J. Molemaker, and J.C. McWilliams. Submesoscale instabilities on the North Wall of the Gulf Stream. 2014. In preparation. 1036. H. Hristova, W.S. Kessler, J.C. McWilliams, and M.J. Molemaker. Eddy distribution and seasonality in the Solomon and Coral Seas. J. Geophys. Res., 2014. Submitted. 1037. A. Jousse, L. Renault, J.C. McWilliams, and A. Hall. Droplet concentration impact on the Liquid Water Path and the shortwave radiation over the U. S. West Coast. 2014. In preparation. 1038. J. Klymak et al. Observations and modeling of submesoscale instabilities on the North Wall of the Gulf Stream. 2014. In preparation. 1039. E.Y. Kwon, H. Frenzel, and C. Deutsch. Climate control on the variability of acidic and hypoxic waters in the eastern boundary of the North Pacific. Global Biogeochem. Cycles, 2014. In preparation. 1040. F. Lemarie, F. Colas, and J.C. McWilliams. The combined effects of sea surface temperature and oceanic currents on eddy-scale air-sea interactions. J. Geophys. Res., 2014. In preparation. 1041. Z. Li, Y. Chao, J. Farrara, and J.C. McWilliams. Impacts of distinct observations during the 2009 Prince William Sound field experiment: A data assimilation study. Cont. Shelf Res., 63:S209–S222, 2013. 1042. Z. Li, Y. Chao, J.C. McWilliams, K. Ide, and J.D. Farrara. Coastal ocean data assimilation using a multi-scale three- dimensional variational scheme. Q. J. R. Meteor. Soc., 2014. Submitted. 1043. Z. Li, J. C. McWilliams, and K. Ide. A multi-scale data assimilation scheme: Formulation and illustration. Month. Weath. Rev., 2014. Submitted. 1044. J.-H. Liang, C. Deutsch, H. Frenzel, and J.C. McWilliams. Biogeochemical cycles in Pacific eastern boundary upwelling systems: A model study. Global Biogeochem. Cycles, 2014. In preparation. 1045. J.-H. Liang, C. Deutsch, H. Frenzel, L. Renault, and J.C. McWilliams. 
Wind-driven shifts in plankton community composition of the California Current. Global Biogeochem. Cycles, 2014. In preparation. 1046. J.-H. Liang, C. Deutsch, and J.C. McWilliams. Iron limitation and the iron cycle in the California Current ecosystem. Global Biogeochem. Cycles, 2014. In preparation. 1047. J.-H. Liang, C.A. Deutsch, J.C. McWilliams, B. Baschek, P.P. Sullivan, and D. Chiba. Parameterizing bubble-mediated air-sea gas exchange and its effect on ocean ventilation. Global Biogeochem. Cycles, 27:894–905, 2013. 1048. J.-H. Liang, C.A. Deutsch, J.C. McWilliams, and H. Frenzel. Submesoscale nutrient supply and biological productivity in the California Current. Global Biogeochem. Cycles, 2014. In preparation. 1049. J.-H. Liang, J. McWilliams, C. Deutsch, P. Sullivan, E. D’Asaro, C. McNiel, L. Romero, and K. Melville. Significance of gas bubbles in air-sea gas transfer under hurricanes: A comparison between observations and Large Eddy Simulations. 2014. In preparation. 1050. E. Mason, F. Colas, M.J. Molemaker, A.F. Shchepetkin, P. Sangra, and J.C. McWilliams. The Canary Island wake: A numerical study. 2014. In preparation. 1051. J.C. McWilliams. Filamentogenesis by vertical mixing. 2014. In preparation. 1052. J.C. McWilliams and B. Fox-Kemper. Wave-balanced surface fronts and filaments. J. Fluid Mech., 730:464–490, 2013.


1053. J.C. McWilliams, E. Huckle, J.H. Liang, and P.P. Sullivan. Langmuir turbulence in swell. J. Phys. Ocean., 44:870–890, 2014. 1054. C.R. Mechoso et al. Ocean-Cloud-Atmosphere-Land interactions in the Southeastern Pacific: The VOCALS program. Bull. Am. Meteor. Soc., 2014. In press. 1055. M.J. Molemaker, J. Gula, and J.C. McWilliams. On the use of offline Lagrangian particle tracking for the diagnosis of submesoscale flows. 2014. In preparation. 1056. M.J. Molemaker, J. Gula, and J.C. McWilliams. Sustained submesoscale instability in a Gulf Stream Ring. 2014. In preparation. 1057. M.J. Molemaker, J.C. McWilliams, and W.K. Dewar. Submesoscale generation of mesoscale in the California Undercurrent. J. Phys. Ocean., 2014. Submitted. 1058. L. Renault, F. Lemarie, M.J. Molemaker, and J.C. McWilliams. Revisiting the U.S. West Coast upwelling system using interannual high resolution atmospheric and oceanic simulations. 2014. In preparation. 1059. L. Renault, M.J. Molemaker, C. Deutsch, J.-H. Liang, and J.C. McWilliams. How atmospheric forcing controls the ecosystem variability over the U.S. West Coast upwelling system. 2014. In preparation. 1060. L. E. Romero, Y. Uchiyama, J.C. McWilliams, and D.A. Siegel. Dispersion and dilution of stormwater runoff. 2014. In preparation. 1061. L. E. Romero, Y. Uchiyama, D. Siegel, J.C. McWilliams, and C. Ohlmann. Particle-pair dispersion in the Southern California coastal zone. J. Phys. Ocean., 43:1862–1879, 2013. 1062. A. Shchepetkin. An adaptive, Courant-number-dependent implicit scheme for vertical advection in oceanic models. 2014. In preparation. 1063. A. Shcherbina, E. D’Asaro, C. Lee, J. Klymak, M.J. Molemaker, and J.C. McWilliams. Statistics of vertical vorticity, divergence, and strain in a developed submesoscale turbulence field. Geophys. Res. Lett., 40:4706–4711, 2013. 1064. Y. Uchiyama, E. Idica, J.C. McWilliams, and K.D. Stolzenbach. Wastewater effluent dispersal in Southern California Bays. Cont. 
Shelf Res., 76:36–52, 2014. 1065. Y. Uchiyama, H. Kaida, and J.C. McWilliams. On the interaction between surfzone littoral flow and inner-shelf circulations. 2014. In preparation. 1066. X. Wang, Y. Chao, H. Zhang, J. Farrara, Z. Li, X. Jin, K. Park, F. Colas, J.C. McWilliams, C. Paternostro, C.K. Shum, Y. Yi, C. Schoch, and P. Olsson. Modeling tides and their influence on circulation in Prince William Sound, Alaska. Cont. Shelf Res., 63:S126–S137, 2013. 104. OCE110011 1067. Radko, T., 2014: Applicability and Failure of the Flux-Gradient Laws in Double-Diffusive Convection. J. Fluid Mech. Submitted. 1068. Flanagan, J.D., T. Radko, T. Stanton and W. Shaw, 2014: Dynamic and double-diffusive instabilities in a weak pycnocline, Part II: Direct numerical simulations and flux laws. J. Phys. Oceanogr., submitted. 1069. Radko, T., D. Peixoto de Carvalho, and J.D. Flanagan, 2014: Equilibrium eddy-induced transport in baroclinically unstable flows. J. Phys. Oceanogr., submitted. 1070. Radko, T., A. Bulters, J.D. Flanagan and J-M. Campin, 2014: Double-Diffusive Recipes. Part 1: Large-Scale Dynamics of Thermohaline Staircases. J. Phys. Oceanogr. doi: 10.1175/JPO-D-13-0155.1, in press. 1071. Radko, T., J.D. Flanagan, S. Stellmach and M-L. Timmermans, 2014: Double-Diffusive Recipes. Part 2: Layer- Merging Events. J. Phys. Oceanogr. doi: 10.1175/JPO-D-13-0156.1, in press. 1072. Double-Diffusive Convection. Publisher: Cambridge University Press. Format: Hardback, 342pp. ISBN: 978- 0521880749. Publication date: October 31, 2013. 1073. Flanagan, J.D., A.S. Lefler and T. Radko, 2013: Heat Transport through Diffusive Interfaces, Geophys. Res. Lett., 40, 2466-2470, doi:10.1002/grl.50440. 105. OCE110013 1074. Chen, Ru, J. L. McClean, S. Gille and A. Griesel, 2014 Isopycnal eddy diffusivities and critical layers in the Kuroshio Extension from an eddying ocean model, J. Phys. Oceanogr., in revision. 1075. Delman, A.S., J.L. McClean, J. Sprintall, L.D. Talley, E. Yulaeva, and S. 
Jayne, 2014 Effects of Eddy Vorticity Forcing on the Mean State of the Kuroshio Extension. J. Phys. Oceanogr., in revision. 1076. Griesel, A., J. L. McClean, S.T. Gille, J. Sprintall, C. Eden: Eulerian and Lagrangian isopycnal diffusivities in the Southern Ocean of an eddying model. J. Phys. Oceanogr. 44, 466-661, 2014. 1077. Xu, L, S.-P. Xie, J.L. McClean, Q. Liu, H. Sasaki: Eddy effects on subduction of North Pacific Mode waters, J. Geophys. Res., in revision. 1078. Griesel, A.; Eden, C.; McClean, J. L.; Gille, S. T.; Sprintall, J.: Lagrangian eddy diffusivities and linear stability analysis in the Southern Ocean 1079. Griesel, A.; Eden, C.; McClean, J. L.; Gille, S. T.; Sprintall, J.: A global map of Lagrangian eddy diffusivities


1080. Chen, Ru, J. L. McClean, S. Gille and A. Griesel, Isopycnal eddy diffusivities and critical layers in the Kuroshio Extension from an eddying ocean model, Ocean Sciences Meeting, Honolulu, HI, USA, February 2014. 1081. Chen, Ru, J. L. McClean, S. Gille and A. Griesel, Isopycnal Mixing across a Strong Jet: A Case Study for the Kuroshio Extension, Caltech, March 2014 1082. Delman, A. S. McClean, J. L.; Sprintall, J.; Talley, L. D.; Jayne, S. R Eddy-mean flow interaction and its Internal Variability in the Kuroshio Extension. Ocean Sciences Meeting, Honolulu, HI, USA, 25 February 2014. 1083. Griesel, A.; Eden, C.; McClean, J. L.; Gille, S. T.; Sprintall, J.: Lagrangian Eddy Diffusivities and their Relation to Mean Jets, Ocean Sciences Meeting, Honolulu, HI, USA, February 2014. 1084. Macdonald, A.M, L.D. Talley, J. L. McClean, X.J. Davis and T. Capuano, Toward understanding the role of the Northeast Monsoon Circulation in the Indian Ocean Carbon Budget, Ocean Sciences Meeting, Honolulu, HI, USA, February 2014. 1085. McClean, J. L.; Yulaeva, E. V.; Delman, A. S.: Eddy-Mean Flow Interactions in the Low Latitude Kuroshio Current in an Eddying Global Ocean Model, Ocean Sciences Meeting, Honolulu, HI, USA, February 2014. 106. OCE130015 1086. Pan, C, L.Y. Zheng, R.H. Weisberg, Y. Liu, C. Lembke, 2014. Comparisons of different Ensemble schemes for glider data assimilation on west Florida shelf. Ocean Modeling, in revision. 1087. Weisberg, R.H., L.Y. Zheng, C. Lembke, J. Lenes, J.J. Walsh, 2014a. Why no red tide was observed on the West Florida Shelf in 2010. Harmful Algae, in press. 1088. Weisberg, R.H., L.Y. Zheng, Y. Liu, S. Murawski, C. Hu, J. Paul, 2014b. Did Deepwater Horizon hydrocarbons transit to the west Florida continental shelf? Deep-Sea Research, http://dx.doi.org/10.1016/j.dsr2.2014.02.002. 1089. Weisberg, R.H., L.Y. Zheng, E. Peebles, 2014c. Gag Grouper larvae pathways on the west Florida shelf. Continental Shelf Research, revised. 1090. Zhu, J., R.H. 
Weisberg, L.Y. Zheng, S. Han, 2014a. Influences of Channel Deepening and Widening on the Tidal and Non-Tidal Circulations of Tampa Bay. Estuaries and Coasts. doi 10.1007/s12237-014-9815-4. 1091. Zhu, J., R.H. Weisberg, L.Y. Zheng, S. Han, 2014b. On the Flushing of Tampa Bay. Estuaries and Coasts. doi 10.1007/s12237-014-9793-6. 107. PHY080005 1092. “Λb → p ℓ− ν̄ℓ form factors from lattice QCD with static b quarks,” W. Detmold, C.-J.D. Lin, S. Meinel, and M. Wingate, Phys. Rev. D 88, 014512 (2013) 1093. “Flavor physics with Λb baryons,” S. Meinel, PoS Lattice2013 (2013) 024 arXiv:1401.2685 108. PHY090002 1094. “Graphene's morphology and electronic properties from discrete differential geometry.” A. A. Pacheco Sanjuan, Z. Wang, H. Pour Imani, M. Vanevic, and S. Barraza-Lopez. Phys. Rev. B 89, 121403(R) (2014). [Rapid Communication.] 1095. “Quantitative chemistry and the discrete geometry of conformal atom-thin crystals.” A. A. Pacheco Sanjuan, M. Mehboudi, E. O. Harriss, H. Terrones, and S. Barraza-Lopez. ACS Nano 8, 1136 (2014). [2 citations, ACS Nano Website]. 1096. “Discrete gauge fields for graphene membranes under mechanical strain.” J. V. Sloan, A. A. Pacheco Sanjuan, Z. Wang, C. M. Horvath, and S. Barraza-Lopez. MRS Online Proceedings Library. Volume 1549 (2013). doi:10.1557/opl.2013.1030. 1097. “Strain-engineering of graphene's electronic structure beyond continuum elasticity.” S. Barraza-Lopez, A. A. Pacheco-Sanjuan, Z. Wang, and M. Vanevic. Solid State Comm. 166, 70 (2013). [Fast Track Communication.] [5 citations, ISI]. 1098. “Strain gauge fields for rippled graphene membranes under central mechanical load: an approach beyond first-order continuum elasticity.” J. V. Sloan, A. A. Pacheco-Sanjuan, Z. Wang, C. M. Horvath, and S. Barraza-Lopez. Phys. Rev. B 87, 155436 (2013). [9 citations, ISI Web of Knowledge]. 1099. “Coherent electron transport through freestanding graphene junctions with metal contacts: A materials approach.” S. Barraza-Lopez. J. Comput.
Elect 12, 145 (2013). [3 citations, ISI Web of Knowledge]. 1100. “Optimal pseudopotentials and numerical atomic orbital bases for arbitrary materials.” P. Rivero, V. Garcia-Suarez, Y. Yang, L. Bellaiche, K. Park, J. Ferrer and S. Barraza-Lopez. Under review at Physical Review B (submitted on 4/3/2014). 1101. “Polarity compensation in ultra-thin films of complex oxides: The case of a perovskite nickelate.” S. Middey, P. Rivero, D. Meyers, M. Kareev, X. Liu, Y. Cao, J. W. Freeland, S. Barraza-Lopez, and J. Chakhalian. Under review at ACS Nano (submitted on 2/24/2014). 1102. “The discrete geometry of two-dimensional materials under non-harmonic deformations.” K. Utt, P. Rivero, H. Pour Imani, A. A. Pacheco Sanjuan, H. Terrones, E. O. Harriss, and S. Barraza-Lopez. 1103. “Rigid band shifts, charge pinning, and charge transport through graphene junctions with wetting metal contacts.” T. B. Bothwell and S. Barraza-Lopez.


109. PHY090084 1104. Collisionless Reconnection in the Large Guide Field Regime: Gyrokinetic Versus Particle-in-Cell Simulations, TenBarge, J. M., Daughton, W., Karimabadi, H., Howes, G. G., and Dorland, W. Phys. Plasmas, 21, 020708 [5 pages] (2014). 1105. An Oscillating Langevin Antenna for Driving Plasma Turbulence Simulations, TenBarge, J. M., Howes, G. G., Dorland, W., and Hammett, G. W. Comp. Phys. Comm., 185, 578 [12 pages] (2014). 1106. Collisionless Damping at Electron Scales in Solar Wind Turbulence, TenBarge, J. M., Howes, G. G. and Dorland, W. 1107. Astrophys. J., 774, 139 [10 pages] (2013). 1108. Alfven Wave Collisions, The Fundamental Building Block of Plasma Turbulence I: Asymptotic Solution, Howes, G. G. and Nielson, K. D. Phys. Plasmas, 20, 072302 [19 pages] (2013). 1109. Alfven Wave Collisions, The Fundamental Building Block of Plasma Turbulence II: Numerical Solution, Nielson, K. D., Howes, G. G. and Dorland, W. Phys. Plasmas, 20, 072303 [9 pages] (2013). 1110. Alfven Wave Collisions, The Fundamental Building Block of Plasma Turbulence III: Theory for Experimental Design, Howes, G. G., Nielson, K. D., Drake, D. J., Schroeder, J. W. R., Skiff, F., Kletzing, C. A., and Carter, T. A. Phys. Plasmas, 20, 072304 [12 pages] (2013). 1111. Current Sheets and Collisionless Damping in Kinetic Plasma Turbulence TenBarge, J. M. and Howes, G. G. Astrophys. J. Lett., 771, L27 [6 pages] (2013). 110. PHY100030 1112. S. Toal, D. Meral, D. Verbaro, B. Urbanc, and R. Schweitzer-Stenner, The pH-Independence of Trialanine and the Effects of Termini Blocking in Short Peptides: A Combined Vibrational, NMR, UVCD, and Molecular Dynamics Study, J. Phys. Chem. B 117, 3689-3706 (2013). [Times Cited: 8] 1113. D. Meral and B. Urbanc, Discrete Molecular Dynamics Study of Oligomer Formation by N-Terminally Truncated Amyloid β-Protein, J. Mol. Biol. 425, 2260-2275 (2013). [Times Cited: 7] 1114. B. Barz and B. 
Urbanc, Minimal Model of Self-Assembly: Emergence of Diversity and Complexity, J. Phys. Chem. B 118, 3761-3770 (2014). 1115. M. Henderson, B. Urbanc, and L. Cruz, A Computational Model for the Loss of Neuronal Organization in Microcolumns, Biophys. J., In Press (2014). 1116. T.L. Williams, D.M. Fox, B. Barz, R. Roychaudhuri, M.M. Condron, D.B. Teplow, and B. Urbanc, N-terminal Substitution in Amyloid β-Protein Results in Oligomer Compaction That Protects Against Peptide-Induced Membrane Disruption, Submitted. 1117. N. Kruczek and B. Urbanc, The Effect of Amino Acid Substitutions K16A and K28A on Amyloid β-Protein Self-Assembly, B.S. Thesis (Drexel University, Department of Physics), In Preparation. 1118. D. M. Fox, B. Barz, and B. Urbanc, The Effect of Covalent Cross-Linking on Amyloid β-Protein Assembly, In Preparation. 1119. M. Voelker, M. Betnel, and B. Urbanc, Discrete Molecular Dynamics Study of α-Synuclein Folding and Assembly, In Preparation. 1120. M. Voelker and B. Urbanc, Oligomerization of Amyloid β-Protein in Crowded Environments: A Discrete Molecular Dynamics Study, In Preparation. 1121. M. Žganec, E. Žerovnik, and B. Urbanc, Discrete Molecular Dynamics Simulation of Folding and Early Stages of Oligomerization of Stefins, In Preparation. 1122. A. Tomida, D. Meral, and B. Urbanc, The Effect of Amino Acid Substitutions at Position 2 on Aβ Folding and Oligomer Formation, In Preparation. 111. PHY100033 1123. J. D. Kaplan, C. D. Ott, E. P. O’Connor, K. Kiuchi, L. Roberts, and M. Duez, The Influence of Thermal Pressure on Hypermassive Neutron Star Merger Remnants, Submitted to the Astrophys. J.; arXiv:1306.4034 (2013). 1124. J. D. Kaplan, Where tori fear to tread: hypermassive neutron star remnants and absolute event horizons or topics in computational general relativity, Ph.D. thesis, California Institute of Technology, Pasadena, CA, USA (2013). 1125. S.M. Couch and C. D.
Ott, Revival of The Stalled Core-Collapse Supernova Shock Triggered by Precollapse Asphericity in the Progenitor Star, Astrophys. J. Lett. 778, L7 (2013). 1126. E. Abdikamalov, C. D. Ott, R. Haas, C. Reisswig, P. Mösta, L. Roberts, H. Klion, and E. Schnetter, Neutrino-Driven Convection and Standing Accretion Shock Instability in Three-Dimensional Core-Collapse Supernovae, to be submitted (2014), URL http://www.tapir.caltech.edu/~cott/CC3dResStudy.pdf. 1127. P. Mösta, S. Richers, C. D. Ott, R. Haas, A. L. Piro, K. Boydstun, E. Abdikamalov, C. Reisswig, and E. Schnetter, Magnetorotational Core-Collapse Supernovae in Three Dimensions, Astrophys. J. Lett. 785, L29 (2014). 1128. C. Reisswig, C. D. Ott, E. Abdikamalov, R. Haas, P. Mösta, and E. Schnetter, Formation and Coalescence of Cosmological Supermassive-Black-Hole Binaries in Supermassive-Star Collapse, Phys. Rev. Lett. 111, 151101 (2013).


1129. P. Mösta, B. C. Mundim, J. A. Faber, R. Haas, S. C. Noble, T. Bode, F. Löffler, C. D. Ott, C. Reisswig, and E. Schnetter, GRHydro: a new open-source general-relativistic magnetohydrodynamics code for the Einstein toolkit, Class. Quantum Grav. 31, 015005 (2014). 1130. E. Abdikamalov, S. Gossan, A. M. DeMaio, and C. D. Ott, Measuring the Angular Momentum Distribution in Core-Collapse Supernova Progenitors with Gravitational Waves, submitted to Phys. Rev. D.; arXiv:1311.3678 (2013). 1131. W. Engels, R. Frey, and C. D. Ott, A parametric multivariate statistical model for gravitational waveforms from rotating core collapse, to be submitted (2014), URL http://www.tapir.caltech.edu/~cott/CCSNMultivar.pdf.

112. PHY110015 1132. S. Akula and P. Nath, “Gluino-driven radiative breaking, Higgs boson mass, muon g-2, and the Higgs diphoton decay in supergravity unification,” Phys. Rev. D 87, no. 11, 115022 (2013) [arXiv:1304.5526 [hep-ph]]. 1133. “SUGRA Grand Unification, LHC and Dark Matter,” P. Nath, J. Phys. Conf. Ser. 485, 012012 (2014) [arXiv:1207.5501 [hep-ph]]. 1134. “Probe of New Physics using Precision Measurement of the Electron Magnetic Moment,” A. Aboubrahim, T. Ibrahim and P. Nath, arXiv:1403.6448 [hep-ph]. 113. PHY110016 1135. Boldyrev, S., Perez, J. C., Mason, J., Cattaneo, F., Recent results on magnetic plasma turbulence, SOLAR WIND 13: Proceedings of the Thirteenth International Solar Wind Conference. AIP Conference Proceedings, Volume 1539, pp. 135-138 (2013). 1136. Mason, J., Boldyrev, S., Cattaneo, F., & Perez, J., Statistics of Passive Scalars in Incompressible Field-Guided MHD Turbulence (2014) in preparation. 114. PHY120002 1137. A. Cheng, A. Hasenfratz, G. Petropoulos and D. Schaich, JHEP 1307, 061 (2013) [arXiv:1301.1355 [hep-lat]]. 1138. A. Cheng, A. Hasenfratz, G. Petropoulos and D. Schaich, arXiv:1311.1287 [hep-lat], accepted for the proceedings "Lattice'13", Mainz, July 2013. 1139. A. Cheng, A. Hasenfratz, G. Petropoulos and D. Schaich, in preparation. 1140. A. Hasenfratz, A. Cheng, G. Petropoulos and D. Schaich, arXiv:1310.1124 [hep-lat], accepted for the proceedings "Lattice'13", Mainz, July 2013. 1141. A. Cheng, A. Hasenfratz, Y. Liu, G. Petropoulos and D. Schaich, arXiv:1401.0195 [hep-lat], submitted for publication to Phys. Rev. Lett. 1142. G. Petropoulos, A. Cheng, A. Hasenfratz and D. Schaich, arXiv:1311.2679 [hep-lat], accepted for the proceedings "Lattice'13", Mainz, July 2013. 1143. A. Cheng, A. Hasenfratz, Y. Liu, G. Petropoulos and D. Schaich, arXiv:1404.0984 [hep-lat], submitted for publication to JHEP. 1144. A. Cheng, A. Hasenfratz, G. Petropoulos and D. Schaich, in preparation. 1145. D.
Schaich [USBSM Collaboration], [arXiv:1310.7006 [hep-lat]]. 1146. A. Hasenfratz, A. Cheng, G. Petropoulos and D. Schaich, arXiv:1303.7129 [hep-lat], proceedings of SCGT12-Mini Workshop, Nagoya. 1147. T. Appelquist, R. Brower, S. Catterall, G. Fleming, J. Giedt, A. Hasenfratz, J. Kuti and E. Neil et al., arXiv:1309.1206 [hep-lat]. 115. PHY130014 1148. T. Vogel, Y.W. Li, T. Wüst, D.P. Landau, Generic, Hierarchical Framework for Massively Parallel Wang-Landau Sampling, Phys. Rev. Lett 110, 210603 (2013) 1149. T. Vogel, Y.-W. Li, T. Wüst, and D.P. Landau, Fast, replica-exchange framework for Wang- Landau sampling, submitted to Phys. Rev. E 1150. L. Gai, T. Vogel, K. Maerzke, C. Iacovella, D.P. Landau, P.T. Cummings, and C. McCabe, Examining the Phase Transition Behavior of Amphiphilic Lipids in Solution Using Statistical Temperature Molecular Dynamics and Replica- Exchange Wang-Landau Methods, J. Chem. Phys. 139, 054505 (2013) 1151. G. Shi, T. Vogel, Y.-W. Li, T. Wüst, and D.P. Landau, Effect of single-site mutation on HP lattice proteins, submitted to Phys. Rev. E 1152. Y.-W. Li, T. Vogel, T. Wüst, and D.P. Landau, A new paradigm for petascale Monte Carlo simulation: Replica exchange Wang-Landau sampling, J. Phys.: Conf. Ser. (2014) in press; arXiv:1402.2328 1153. T. Vogel, Y.-W. Li, T. Wüst, and D.P. Landau, Exploring new frontiers in statistical physics with a new, parallel Wang-Landau framework, J. Phys.: Conf. Ser. 487, 012001 (10 pages) (2014)


1154. Thomas Vogel, Amphiphilic lipids in solution: a simulational study of lipid bilayer formation, March Meeting of the American Physical Society, Baltimore, MD 2013 1155. Guangjie Shi, Effect of Mutations on HP Lattice Proteins, March Meeting of the American Physical Society, Baltimore, MD 2013 (Poster) 1156. Ying Wai Li (with: Thomas Vogel and David Landau), Generic parallel Wang-Landau sampling for complex systems, March Meeting of the American Physical Society, Baltimore, MD 2013 1157. Thomas Vogel, Application of parallel Wang-Landau sampling, CSP Workshop, Athens, GA 2013 1158. Lili Gai (with: Thomas Vogel and David Landau), Phase Behavior Study by Statistical Temperature Molecular Dynamics (STDM), CSP Workshop, Athens, GA 2013 1159. David Landau, A New Paradigm for Petascale Monte Carlo: Replica Exchange Wang-Landau Sampling, Conference on Computational Physics, Moscow, Russia 2013 1160. David Landau, Exploring new frontiers in statistical physics with a new, parallel Wang-Landau framework, VIIth Brazilian Meeting on Simulational Physics, João Pessoa, Brazil 2013 1161. Ying Wai Li (with: Thomas Vogel and David Landau), Scalability and Performance Analysis of Replica-Exchange Wang-Landau Sampling, CSP Workshop, Athens, GA 2014 1162. Guangjie Shi, How Single-site Mutation Affects HP Lattice Proteins, CSP Workshop, Athens, GA 2014 (Poster) 1163. Guangjie Shi, How Single-site Mutation Affects HP Lattice Proteins, March Meeting of the American Physical Society, Denver, CO 2014 1164. Ying Wai Li (with: Thomas Vogel and David Landau), Performance of replica-exchange Wang-Landau sampling for the study of spin systems, March Meeting of the American Physical Society, Denver, CO 2014 1165. Guangjie Shi, Self-Assembly of Coarse-grained Amphiphilic Molecules Using Parallel Wang-Landau Sampling, XSEDE conference, Atlanta, GA 2014 (Poster, abstract submitted) 116. SES120001 1166. Chen Yao, Mao Ye (2014). 
Tick Size Constraints, Market Structure, and Liquidity, American Finance Association Annual Meeting. Philadelphia, PA. 1167. Robert Sinkovits, Tao Feng, Mao Ye (2014). Fast, Low-Memory Algorithm for Construction of Nanosecond Level Snapshots of Financial Market. Proceedings of XSEDE14, Extreme Science and Engineering Discovery Environment. Atlanta, GA. 1168. Gai, J., Choi, D. J., O'Neal, D., Ye, M. and Sinkovits, R. S. (2013) Fast construction of nanosecond level snapshots of financial markets. Proceedings of XSEDE13, San Diego, CA.



D The Value of XSEDE: Value-Add and Cost-Avoidance

The NSF has implemented a variety of approaches to the organization and delivery of what are now called advanced cyberinfrastructure services. In the 1980s the NSF funded five distinct supercomputer centers, reduced to four in the 1990s. In 1997, the NSF established the PACI (Partnerships for Advanced Computational Infrastructure) program. From 2004 to 2011, TeraGrid was the dominant organizing structure for NSF-funded advanced cyberinfrastructure. In 2011 a new structure was introduced, in which the XSEDE project plus a set of individually funded Service Providers (SPs) work together to implement the objectives of the NSF XD program. XSEDE’s charge is to play a supporting and organizing role; the NSF funds multiple Service Providers, and calls for proposals to deliver services (as an XD Program SP) are issued roughly annually.

The mission of NSF-funded advanced computing facilities has also changed over time. In the 1980s the supercomputer center program was intended to act as a time machine: to provide a small group of researchers with access to extreme-scale computing capabilities that would not otherwise be available for several years. Today, the mission for NSF ACI facilities has expanded. It now encompasses enabling cutting-edge research that would not be possible in the absence of the NSF-funded facilities, accelerating research generally, and helping develop a technically savvy and diverse 21st century workforce. The XD program aims to provide a broad range of computational services that support many researchers with different types and levels of need, not just the few researchers with extreme-scale computing needs.

From the user’s standpoint, XSEDE constitutes the primary user interface to NSF ACI resources and an organizing and supporting function. From the standpoint of Service Providers, XSEDE provides an organizing, integrating, standard-setting, and supporting entity.
(XSEDE both supports SPs in executing their role and aids SPs in supporting users.) There is a great deal of overlap between involvement in XSEDE and involvement in the XD program as an SP. Thus, institutions that participate in the XD program both collaborate to present a coordinated user interface and experience and compete in response to new solicitations for SP awards.

A critical question in evaluating XSEDE must be the cost-effectiveness and the overall effectiveness of its services. Are the community’s needs and NSF priorities being met by the “XSEDE+SPs” model that is currently funded through the XD program? Are those needs and priorities being served in a way that makes good use of the taxpayer monies that fund them? This appendix attempts to address this second question of cost-effectiveness. It also addresses, in a limited way, the value that XSEDE adds to the national research community through execution of NSF priorities and its service to the national open research community.

This analysis is done under the assumption that the NSF funds at least two national cyberinfrastructure facilities (SPs) that each operate resources with an initial system investment of approximately $30M. While a number of hypothetical scenarios could be used to depict the landscape, we use this basis as a lower bound for an alternative to the current cyberinfrastructure ecosystem.

For the purpose of this analysis, we define the cost avoidance of XSEDE as the difference between the cost of delivering a service under the current XSEDE model and the cost of delivering the same service under alternative models of two or four different national cyberinfrastructure centers. These alternatives are hypothetical in some ways, but they also represent structures that the NSF has supported in the past. We define value added as the benefits that accrue from having one coordinated and consolidated organization supporting a national cyberinfrastructure that supports, integrates, and interacts with multiple SPs and the staff members at those SP sites, and which would not be possible in a model of multiple independent national cyberinfrastructure centers.

The summary of our analysis is this: compared to a model of two independent national centers, XSEDE achieves a cost avoidance of $1,430,000 per year. Compared to a model of four national centers, XSEDE achieves a cost avoidance of $18,240,000 per year.

In addition to the financial efficiencies that XSEDE provides, there are significant benefits that result from XSEDE serving as a single coordinating and user interface function for the entire NSF-funded XD program. As one national, unified virtual organization (VO) supporting open science and engineering research and education, XSEDE provides services and creates value that in some cases simply would not be possible in a scenario of multiple national centers, and in other cases are better than what multiple national centers could deliver. These added values include enhanced quality and breadth of services delivered to XSEDE users. In addition, XSEDE provides technical leadership and a national infrastructure foundation for the NSF-funded national CI in ways that multiple, distinct national centers could not.

D.1 Cost avoidance

A major consideration when providing certain essential functions is the scaling of staff and workload. For example, at least one staff member is required to manage account services. As we learned under TeraGrid, having one person trained and authorized to do this function is insufficient. Delivering account management services to a national user group requires at least two people. A center providing 7 x 24 coverage of basic operations (including a person available to respond in real time to a security emergency) would require no fewer than five people, each working 40 hours per week, to cover the 168 hours in a week.
As a first data point, we list in Table 15 the minimum number of staff required to support a major, national SP operating a resource valued initially at $30M or more.

Table 15. Minimum staff required to support an independent national CI center, by functional area

Function                                          Minimum number of people
Allocations                                       1.25
Accounts & Account Management                     1.25
Authentication Services                           1.5
24x7 Operations                                   5.0
Ticket Support                                    1.5
User Survey                                       0.5
Project Management
  Overall project leadership, project management  3.0
  Project Managers & Finance                      1.0
User Information Services                         3.0
Training                                          2.0
Education                                         2.0
Outreach                                          1.0
Networking                                        2.0
Central Services                                  5.0
Systems Engineering & Deployment                  4.0
Advanced Support (MPI & IO)                       6.0
Total                                             40
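The staffing arithmetic above can be checked in a few lines. The following is a minimal Python sketch, not part of the original analysis: the names `min_staff_for_coverage` and `TABLE_15_MIN_FTE` are illustrative and introduced only for this example. The FTE values are copied from Table 15, and the 40-hour work week is the assumption stated in the text. The coverage calculation ignores leave, shift overlap, and on-call arrangements, so it gives a floor rather than a staffing plan.

```python
import math

HOURS_PER_WEEK = 7 * 24   # 168 hours of continuous coverage
HOURS_PER_PERSON = 40     # work week assumed in the text


def min_staff_for_coverage(hours_per_person: int = HOURS_PER_PERSON) -> int:
    """Smallest whole number of people whose combined hours cover a full week."""
    return math.ceil(HOURS_PER_WEEK / hours_per_person)


# Minimum FTEs per functional area, copied from Table 15 (illustrative name).
TABLE_15_MIN_FTE = {
    "Allocations": 1.25,
    "Accounts & Account Management": 1.25,
    "Authentication Services": 1.5,
    "24x7 Operations": 5.0,
    "Ticket Support": 1.5,
    "User Survey": 0.5,
    "Overall project leadership, project management": 3.0,
    "Project Managers & Finance": 1.0,
    "User Information Services": 3.0,
    "Training": 2.0,
    "Education": 2.0,
    "Outreach": 1.0,
    "Networking": 2.0,
    "Central Services": 5.0,
    "Systems Engineering & Deployment": 4.0,
    "Advanced Support (MPI & IO)": 6.0,
}

ops_staff = min_staff_for_coverage()
total_fte = sum(TABLE_15_MIN_FTE.values())

print(ops_staff)  # 5
print(total_fte)  # 40.0
```

Under these assumptions, 168 hours divided by 40 hours per person rounds up to five people, and the sixteen functional areas sum to the 40 FTEs shown in the table's total row.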


We have calculated the cost avoidance achieved by one unified national cyberinfrastructure center versus two organizations (e.g., the PACI program at the turn of this century) and four organizations (e.g., the NSF supercomputer center program of the late ’80s and early ’90s). Calculations assume that one full-time equivalent (FTE) of staff effort costs $200,000 per year for salary, benefits, essential travel, and facilities & administration costs. This assumption has worked for XSEDE thus far (although we are seeing some signs that, as the economy improves, this funding level will be insufficient to attract and retain top talent). Running some of the most powerful supercomputers in the US and supporting some of the most important scientific research cannot be done for less.

In some cases, usually in response to need or load, XSEDE currently employs more than the minimum staff required to perform a given function. In this circumstance we calculate the cost avoidance by comparing the minimum number of staff required for a center with the actual cost budgeted for XSEDE. This seems the most conservative way to calculate such costs. There are also per-center costs not associated with personnel, such as the cost of networking and travel costs for allocation review panel members.

Table 16 describes the actual cost of XSEDE critical functions as compared to the cost of the minimal staff required to run 2 and 4 cyberinfrastructure centers. The center activities in Table 16 describe the set of responsibilities that make up a comprehensive cyberinfrastructure center, including the standard set of operations functions, user access to resources and training, and management. “Advanced support (MPI & I/O)” corresponds to support and development for projects that require considerable adaptation to maximize computational capabilities.
In the XSEDE context, this represents the efforts of Extended Collaborative Support Services, while for individual centers, this describes the efforts of center staff to adapt and optimize codes for local systems.

Table 16. Costs for critical functional services for XSEDE and models of 2 and 4 national cyberinfrastructure centers. All FTE costs are estimated at a full cost of $200,000 per year per FTE (including salary, benefits, F&A, travel, etc.). A "-" marks a function with no non-personnel cost listed.

                                      Actual costs for critical functions       Staffing and costs at the minimal levels
                                      offered by XSEDE                          defined in Table 15
Center Activities                     XSEDE   XSEDE non-      XSEDE total       Fewest    Per-center       Minimal 2-center  Minimal 4-center
                                      FTEs    personnel costs cost              FTEs per  non-personnel    annual cost       annual cost
                                                                                center    cost
Allocations                           2.25    $100,000        $550,000          1.25      $100,000         $700,000          $1,400,000
A & AM                                2.50    -               $500,000          1.25      -                $500,000          $1,000,000
Authentication Services               4.30    -               $860,000          1.5       -                $600,000          $1,200,000
24 x 7 Operations                     5.00    -               $1,000,000        5.00      -                $2,000,000        $4,000,000
Ticket Support                        2.00    -               $400,000          1.50      -                $600,000          $1,200,000
User Survey                           1.25    -               $250,000          0.50      -                $200,000          $400,000
Project Management
  Leadership and Project Management   3.50    -               $700,000          3.00      -                $1,200,000        $2,400,000
  Project Managers & Finance          5.00    $30,000         $1,030,000        1.00      $15,000          $430,000          $860,000
User Information Services             6.00    $20,000         $1,220,000        3.00      $20,000          $1,240,000        $2,480,000
Training                              7.50    -               $1,500,000        2.00      -                $800,000          $1,600,000
Education                             5.00    -               $1,000,000        2.00      -                $800,000          $1,600,000
Outreach                              3.50    -               $700,000          1.00      -                $400,000          $800,000
Network                               3.50    $250,000        $950,000          2.00      $250,000         $1,300,000        $2,600,000
Central Services (web portal, etc.)   5.00    $20,000         $1,020,000        5.00      $20,000          $2,040,000        $4,080,000
Systems Engineering & Deployment      12.50   -               $2,500,000        4.00      -                $1,600,000        $3,200,000
Advanced Support (MPI & IO)           6.00    -               $1,200,000        6.00      -                $2,400,000        $4,800,000
Totals                                74.8    $420,000/year   $15,380,000/year  40        -                $16,810,000/year  $33,620,000/year
Cost avoidance of XSEDE vs. multiple center costs                                                          $1,530,000/year   $18,340,000/year
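The multi-center totals in Table 16 follow from simple arithmetic: each function's minimum FTE count is multiplied by the per-FTE rate, the per-center non-personnel cost is added, and the result is scaled by the number of centers. The following is a minimal sketch of that calculation, assuming the figures as transcribed from Tables 15 and 16; the function and variable names are illustrative and are not part of any XSEDE tooling.

```python
# Hypothetical sketch: recompute the multi-center totals in Table 16.
# Names are illustrative; figures are transcribed from Tables 15 and 16.

FTE_COST = 200_000  # full annual cost per FTE assumed in the text

# function -> (minimum FTEs per center, non-personnel cost per center)
MINIMAL_CENTER = {
    "Allocations": (1.25, 100_000),
    "Accounts & Account Management": (1.25, 0),
    "Authentication Services": (1.5, 0),
    "24x7 Operations": (5.0, 0),
    "Ticket Support": (1.5, 0),
    "User Survey": (0.5, 0),
    "Leadership and Project Management": (3.0, 0),
    "Project Managers & Finance": (1.0, 15_000),
    "User Information Services": (3.0, 20_000),
    "Training": (2.0, 0),
    "Education": (2.0, 0),
    "Outreach": (1.0, 0),
    "Network": (2.0, 250_000),
    "Central Services": (5.0, 20_000),
    "Systems Engineering & Deployment": (4.0, 0),
    "Advanced Support (MPI & IO)": (6.0, 0),
}


def center_cost(staffing, fte_cost=FTE_COST):
    """Annual cost of one center: staff cost plus non-personnel cost."""
    return sum(ftes * fte_cost + other for ftes, other in staffing.values())


def multi_center_cost(n_centers, staffing=MINIMAL_CENTER):
    """Annual cost of n independent centers, each staffed at the minimum."""
    return n_centers * center_cost(staffing)


for n in (2, 4):
    print(f"{n} centers: ${multi_center_cost(n):,.0f} per year")
```

Run as written, the sketch reproduces the two-center and four-center totals shown in Table 16 ($16,810,000 and $33,620,000 per year). Changing FTE_COST shows how sensitive the comparison is to the per-FTE assumption.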

To be clear about several of the assumptions and factors underlying these calculations:
• We assume that XSEDE and the hypothetical multiple national centers are funded by the NSF.
• It is implicit in this assumption that current funding for SPs is either adequate, or that the inadequacies in funding SPs are at least not well corrected by removing funding for the essential services tallied up in Table 15 and Table 16 above. (More text on this point is included in Section D.3 below.)
• The alternate models of 2 and 4 national centers are hypothetical, but at the same time represent approaches that have been in practice in the past (the original four functioning NSF Supercomputing Centers during the late 80s and early 90s, and the PACI program funding for two collaborative organizations in the form of NPACI and NCSA in the late 90s and early 2000s). The 2 and 4 center costs represent minimal costs to operate centers, based on operations and maintenance costs at 25% of hardware expenditures.
• Operations and maintenance costs for supercomputer centers are recognized to scale linearly only for utility, colocation, maintenance and some software licenses. For small to modest resources, there remains an “FTE floor” that is a constant, per-system cost. As systems increase in scale, a number of center leaders and managers have reported that the number of FTEs required scales with the size and sophistication of the user community, rather than the core count. Again, our scenario is intended to describe a conservative estimate of what the emerging landscape may look like.
• These calculations are not precise, and there is no rational way to make them more precise without making them clearly incorrect.
• The cost avoidance calculations do not include the entire cost of XSEDE or the entire potential costs of alternate approaches. This cost avoidance exercise focuses on the cost of essential services that any CI center or collaboration serving a national CI must offer, comparing the actual costs for XSEDE for these services with hypothetical costs based on what we believe to be the minimal number of staff required for any national CI center.

With these assumptions, clarifications, and caveats, and using the approach of estimating the costs of the critical functions any national CI center must provide, we estimate the cost avoidance XSEDE achieves under two scenarios:
• $1,530,000 per year as compared to a hypothetical model in which there are two different and independent national CI centers
• $18,340,000 per year as compared to a hypothetical model in which there are four different and independent national CI centers

XSEDE achieves these savings by avoiding duplication of effort in a number of areas, including operations; creation of educational material and lectures; definition of documentation templates; APIs; organizational and user interfaces; definition and execution of processes for development, operation and security; user services; communication and external interactions; and software implementations.

D.2 Value added

As one national, unified, virtual organization (VO) supporting open science and engineering research and education, XSEDE provides services and creates value that in some cases simply would not be possible in a scenario of multiple national centers, and that in other cases is better than what multiple national centers could deliver. We discuss these aspects of ‘value added’ services next.

Higher quality user experience, leadership, and national service. As a significant unified VO integrating NSF-funded national CI across a broad range of disciplines, XSEDE can represent the national research community more effectively than multiple centers. The following XSEDE functions illustrate this:
• XSEDE provides improved user experience uniformity and a one-stop shop:
  o Users experience greater uniformity of computing environments, commands, and integration tools across multiple sites, important for researchers who don’t have time to learn N different sets of standards for N different national CI centers, and essential for multi-site projects and for using multiple sites over time.
  o Users can apply for multiple resources on all sites via one application process and request transfers of allocations between sites.
  o Users can get expert and consistent help in selecting the right resource from XSEDE’s entire suite of resources, rather than have resource selection driven by a mix of user needs and the value to centers of having and retaining high profile researchers.
  o XSEDE provides a unified set of tools for data management, featuring uniform use of Globus Online and unified file systems.
  o XSEDE.org offers a single portal to Computational and Data-intensive Science and Engineering information and advanced digital services.
  o XSEDE works with the staff of multiple SPs, and as a single organization integrates and promulgates the best work by the best staff available across the US, rather than splitting the best staff across multiple different centers that invariably end up competing.
• XSEDE makes a greater impact across the nation and internationally thanks to its function as a national coordinating effort; XSEDE becomes an entity with which other CI projects may align. When one organization defines formats for processes, documentation, APIs, organizational interfaces, and user interfaces, the result is an advantage of scale. In addition, as a national organization XSEDE is in a better position to negotiate agreements and standards for national digital services, and MOUs with other organizations. This is not just the advantage of size. Some agreements are only possible because XSEDE is a national cyberinfrastructure organization.
• Campus Bridging. Campus bridging is an important effort to create a coordinated national cyberinfrastructure with XSEDE as an organizing and aligning influence. This effort will extend the value of software created by XSEDE, and, more broadly, the value of software created by others in the community and integrated and supported by XSEDE. In addition, it will allow the NSF and national research community to leverage non-federal investments in campus cyberinfrastructure. One example of this is the Basic XSEDE-compatible cluster build, a distribution of software that allows any entity operating a computing cluster to create a software environment that mimics the open source software within the least esoteric of the XSEDE clusters. This XSEDE-compatible basic cluster (XCBC) build includes all of the components needed to create a cluster from scratch, or alternatively allows a cluster administrator to selectively install desired XSEDE-compatible tools onto an existing cluster. This distribution also includes important integration tools such as Globus Online. The XSEDE-compatible basic cluster build makes it easier for campus systems administrators to build clusters, makes it easier to implement on such clusters software that integrates campus clusters and a national CI system, and makes it possible to reuse much of the documentation developed for XSEDE in documentation for local campuses. Campus bridging is effective because there is a national entity with which to align. A nationally integrated and effective cyberinfrastructure needs a foundation, just like a mall needs an anchor store. The campus bridging effort would either be impossible, or fundamentally more limited in value, if there were two or more independent national CI centers.
• Broader geographic reach and greater diversity of participants, reaching a larger and more diverse national audience.
  o As a result of the service providers’ coordinated effort, XSEDE is “local” to more places in the US than any small number of national CI centers could be. This leads to a greater number and diversity of faculty who work at the same institution as, and know, XSEDE staff members. This leads automatically to better information dissemination (again, information dissemination about the nationally coordinated infrastructure with broader geographic reach and a larger, more diverse user community).
  o The scale and scope of XSEDE make it possible to provide small amounts of funding to involve individuals from small colleges and MSIs that would not be able to compete to run a national CI center. Based on the data we have at hand, we believe that the number of such institutions collaborating with XSEDE is greater than the number of schools collaborating formally with the two PACI partnerships, combined.
• Better decision making. Decisions are based on information across the VO:
  o Ticket handling drawing from incident history at all VO sites
  o Better projection of overall computing needs across the VO
  o Better mapping of computational problems to best resources
  o Better implementations, used ubiquitously, leading to higher productivity
  o Better balancing of human and digital resources
• Better technical expertise, including sharing, complementing, and combining expertise. XSEDE can justly claim to have staff that are among the best of the best nationally, so an excellent group of the best national staff is available via the coordinated user service. Two or four different centers could not in a practical sense individually span such a breadth of top-notch expertise. The integration across sites means that we can keep more “integer full bodies” devoted to XSEDE, even though not all sites need all expertise all the time.

Unified national platform supporting computational science. A unified national platform that helps anchor and align the national cyberinfrastructure provides great value in ways that could not be delivered without the national research community having one leading organization with which to align and use. These include:
• Software quality and implementation. Software architecture, software development and integration, and test and deployment. Examples of this benefit include:
  o Globus Online. File transfer was once the bane of TeraGrid, and was at best a challenge in the PACI days. Globus Online implemented in XSEDE provides a truly excellent interface to file movement services supporting file movement within XSEDE, and between campus resources and XSEDE. Relative to the best possible solutions given the state of the art at any given time, Globus is the best solution for data transfer software that a national CI facility has ever implemented.
  o Unified Authentication. Like the Globus transfer service, the new Globus Nexus authentication service provides an excellent solution to a challenge that has dogged the national CI since the late 1990s: secure and user-friendly identity and authentication services providing access to all systems in a distributed national cyberinfrastructure.
  o UNICORE implementation. XSEDE has added a unified job management interface through its implementation of the Execution Management Services based on UNICORE. As with Globus, this implementation creates a mechanism for uniform job submission and management across heterogeneous and diverse SP resources.
• Science gateways. XSEDE provides a platform for multiple science gateways that deliver advanced scientific analysis capabilities nationally. XSEDE delivers these more efficiently than multiple centers. Programming a science gateway can be complex; it is much more straightforward with one unified environment. Examples of widely used and valuable science gateways using XSEDE resources include CyberGIS, CIPRES (an evolutionary biology gateway), nanoHUB (a well known nanotechnology gateway), and the SCEC (Southern California Earthquake Center) Earthworks system.
• Computing and data analysis platforms for major scientific projects. Major national scientific initiatives are coming to depend on XSEDE as a computing and data analysis environment. LIGO (Laser Interferometer Gravitational-Wave Observatory, http://www.ligo.caltech.edu) is one example of a national project making a significant migration toward use of major shared national CI facilities. NSF funding for the National Center for Genome Analysis Support, which leverages and supports particular uses of XSEDE to support the biology research community, thus helping both biologists and XSEDE be more effective, is another example. This improves the quality and security of the overall national cyberinfrastructure, fosters interdisciplinary research, and avoids service duplication.
• Optimization and improvement of community codes. XSEDE takes a unified approach to optimizing widely used community codes, prioritizing and coordinating effort. Generally someone masters a code and optimizes it for multiple architectures. With a multi-center environment, there is no guarantee that improvements made to a code would be portable to other architectures and centers, and no coordinated selection of codes in the direction of global optimization.

Security effectiveness. One unified security team has proven highly effective in responding to security incidents. TeraGrid experienced two security incidents in which exposures involved more than one TeraGrid center. So far under XSEDE no incidents have propagated from one XSEDE center to another.

Value of the research XSEDE supports per year and disaster resilience. XSEDE provides resilience in critical core services such as authentication and 24x7 phone support by operating a main and backup site in two distinct geographical regions. We have tested failovers of critical core services including portal access and 24x7 operations from NCSA to IU and TACC to NCSA. A conservative estimate of the annual value of all federally-funded research (including significant investments by NIH, DoE, DoD, and NASA) supported by the XSEDE infrastructure is $765,000,000 (CY2013). This is based on allocations made through the XSEDE allocations process. This means that the value of the research supported by XSEDE is approximately $2,100,000 per day. While not all progress would stop if the national NSF-funded cyberinfrastructure were inoperable for a day, the per-day value of research supported by XSEDE demonstrates the need for a robust and disaster-resilient service infrastructure such as XSEDE provides.

Disciplinary breadth of expertise. As a large organization XSEDE can support one expert for less-used fields in NSF-funded CI, providing better value than one-half of a non-expert FTE. For example, XSEDE devotes full-time FTEs to the following fields: digital humanities, some sub-disciplines of biology, digital arts applications in the sciences, and relationships with industrial partners.
• Novel and Innovative Projects. The NIP group identifies emerging and innovative scientific research and research needs, and dedicates an interdisciplinary team to supporting them. Here again, a single team provides more efficient, effective service than two or four competing teams and avoids the pitfalls experienced under the PACI program.
• Training and education, including public education. As above, a single organization such as XSEDE provides training materials that are authoritative. A single organization has the advantage over multiple, competing centers in capturing national attention and conducting national public education and training activities because it integrates across the best materials created by XSEDE staff and by the best staff at multiple SPs.
• Campus Champions. As a bridge from the national community to a national CI support organization, the XSEDE Campus Champions play a unique role in training, outreach, and research education. This effectiveness would not be possible if the group had to interact with multiple national organizations.
• XSEDE conference. The annual XSEDE conference has become unusual in the balance of computational and domain scientists who attend, interact, and collaborate. This fosters new initiatives in CI-enabled research that would otherwise not take place, such as campus champions finding new ways to empower researchers at their home institution as a result of hearing talks at XSEDE.

History shows the counterproductive impact of independent, competing centers. Our collective experience confirms that effectiveness, national economies of scale, and the creation of a unified national open CI are better achieved with one central organization than with multiple national centers. Think back to the days when the PACI program split the national community into “friends of SDSC” and “friends of NCSA” with the former using Legion and the latter using Globus as grid computing tools. The mismatch between the best interests of researchers and the best interests of centers seeking to sustain themselves was even more strongly pronounced during several phases of the TeraGrid, where there were multiple examples of a center that was part of the TeraGrid providing a solution that met the center’s competitive best interests when a different center would have offered the researcher better or faster service.

D.3 Funding for Service Provider activities

At present, Service Providers are allowed to submit proposals for management and operation of SP resources that are capped at 20 to 25% of the acquisition cost (depending on solicitation). There are expense factors that might scale somewhat linearly with the cost of a cluster: within a class of machine, at least, things like electricity and air conditioning might scale more or less linearly with system cost. Costs placed on M&O budgets more often involve software implementation and user support, things that may well not scale with the size of the hardware acquisition cost.
Evidence to support this includes the fact that the winners of recent modest ($10M and under) system awards are including primarily personnel as part of the institution-provided “facilities” budget, and not including costs such as electricity and cooling. Electricity and cooling are real and often significant costs; if the M&O formula were really sufficient these costs would appear in M&O budgets. Further evidence that the linear scaling function is incorrect is found in the fact that there were no proposals submitted to the NSF in response to NSF 14-536 at the minimum $2M level, in spite of clear encouragement in the solicitation to do so. We are able to confirm that at least three entities considered a $2M proposal and decided not to submit such a proposal due to the perceived impossibility of offering good services to the national community with an M&O budget limited to a maximum of $400,000 per year.

Table 17 shows the Management and Operations costs for the XSEDE award as well as for the Service Providers expected to be in operation as of July 1, 2015 (Wrangler, Stampede, Comet, and the Track II G). XSEDE M&O is taken from Table 16, above. M&O figures for the SP sites are based on NSF calculations of 25% of overall awards, with the assumption that the Track II G system is awarded. This represents the total cost of the national cyberinfrastructure ecosystem in terms of Management and Operations.

Table 17. Management & Operations costs for current and planned SP systems assuming that M&O budgets are 25% of purchase cost per year

Award                                                              Management and Operations
XSEDE                                                              $15,280,000
Wrangler                                                           $1,598,583
Stampede                                                           $12,875,000
Comet                                                              $3,000,000
Track II G systems (assume $12M total investment in two awards)    $3,000,000
TOTAL                                                              $35,753,583
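The M&O cap discussed above is proportional arithmetic, and the Table 17 total is a straight sum. The sketch below makes both explicit, assuming the figures quoted in the text and in Table 17; the function name mo_cap and the dictionary are hypothetical, introduced only for illustration.

```python
# Hypothetical sketch of the M&O cap rule and the Table 17 total.
# Names are illustrative; figures are taken from the text and Table 17.


def mo_cap(acquisition_cost, percent):
    """Annual M&O ceiling as a whole-number percentage of acquisition cost."""
    return acquisition_cost * percent // 100


# A $2M award capped at 20% leaves at most $400,000 per year for M&O.
print(mo_cap(2_000_000, 20))

TABLE_17 = {
    "XSEDE": 15_280_000,
    "Wrangler": 1_598_583,
    "Stampede": 12_875_000,
    "Comet": 3_000_000,
    # Assumed $12M total investment at 25%, as stated in the table.
    "Track II G systems": mo_cap(12_000_000, 25),
}

# Total annual M&O across XSEDE and the listed SP systems.
print(sum(TABLE_17.values()))
```

Because the cap is a fixed fraction of acquisition cost, it shrinks in step with the award, while the staffing floor described in the text does not. That gap is the scaling problem this section describes.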

There is strong and general agreement that the funding formulas for Service Providers are not correct, and that in particular the formula scales down incorrectly for smaller installations. The fact that the current budget guidance in XD program hardware solicitations is not adequate to provide for the staffing and operational needs of systems funded at recent budget levels is a different issue from the questions “does XSEDE deliver essential services for less cost than the cost of these services delivered in a model of multiple independent services?” and “does XSEDE offer value to the national research community that could not be delivered in a model of multiple independent services?” While acknowledging the shortcomings of the current SP funding formula for management and operations, it is these last two questions that we attempt to address in this appendix.

D.4 A vignette on value added funding for Service Provider activities

XSEDE’s support for the Douglas Spearot project “Atomistic Characterization of Stable and Metastable Alumina Surfaces” shows how a multi-person team assembled from various sites, working closely with the PI and his co-PI, can accomplish more because XSEDE exists. The goals for this project were to (1) make performance and scalability enhancements of a user-defined X-ray diffraction (XRD) algorithm in LAMMPS and (2) implement a workflow to submit LAMMPS XRD compute and visualization computations on multiple XSEDE resources (Stampede and Gordon) from any lab (U. Arkansas) and obtain the results back at the starting point. The ECSS team consisted of Sudhakar Pamidighantam and Mark Van Moer from NCSA and Yang Wang from PSC, with participation from TACC and from both the XSEDE Student Engagement program, via Paula Romero Bermudez, and the XSEDE Campus-Champion Fellow Luis Cueva-Parra. Through the ECSS collaboration, this project improved both the speed and the efficiency of the diffraction calculation, resulting in a third generation of the code that has improved memory utilization, a second level of OpenMP parallelization, and Intel Xeon Phi offloading, together yielding at least a 3.4X speedup. A workflow to run LAMMPS XRD and visualizations using VisIt on Stampede and Gordon was implemented in the SEAGrid Science gateway, both independently on each system and as a chained run across both systems. Without the strong, multi-organization, coordinated team, the work could not have been completed in the one-year timeframe with such great success.


E SWOT Analysis of XSEDE

A SWOT Analysis is a situation analysis in which the internal strengths and weaknesses of an organization, and the external opportunities and threats faced by it, are closely examined to chart a strategy. SWOT stands for Strengths, Weaknesses, Opportunities, and Threats. As additional supporting information, we felt it would be beneficial to the review panel to provide the results of the XSEDE internal SWOT analysis.

The XSEDE project-level SWOT Analysis was produced using a bottom-up approach. Referring to the WBS structure of XSEDE, the Level 3 Managers were each asked to work with their team in producing a Level 3 SWOT Analysis. Given the Level 3 SWOT Analyses as input, the Level 2 Managers were asked to produce a Level 2 SWOT Analysis. At this point the PIs and Level 2 Managers met and came up with a project-level SWOT Analysis. As a final checkpoint, the XSEDE Advisory Board (XAB) reviewed the project-level SWOT and provided their insights. This resulted in the XSEDE SWOT Analysis.


Strengths (Internal)
1. Researcher-needs driven and adaptable to changing needs
   Researcher needs-driven organization with a deep commitment to helping facilitate science: attentive and adaptable to changing needs and technologies
2. Staff expertise and experience
   Exceptional staff across XSEDE with access to a breadth of expertise and programs at XSEDE partner and Service Provider institutions, and the regions in which they are embedded
3. Ability to coordinate virtually
   An effective virtual organization accustomed to remote communications and consensus driven solutions
4. Professional management and engineering practices
   Dedication to improve practices to maintain and improve the effectiveness of XSEDE as a virtual organization

Weaknesses (Internal)
1. Campus Integration
   XSEDE supporters and promoters are not well integrated on campuses hindering our ability to facilitate use of the XSEDE environment.
2. Communicating our vision
   Difficulty explaining what XSEDE is to those outside established set of supported researchers: traditionally non-HPC disciplines, new researchers, industry, government, etc.
3. Setting Staff Priorities
   Defining staff priorities within budgetary constraints within XSEDE who are matrix-ed into XSEDE from Service Provider sites often with supervisors not participating or familiar with XSEDE.
4. Staff Diversity
   Lack of diversity among the XSEDE staff limits innovation and precludes representation from a significant portion of the population (women, minorities, and people with disabilities).

Opportunities (External)
1. Outsourcing
   Capitalize on trend to outsource all manner of IT services; XSEDE capabilities and services could be offered as an external services
2. Big Data, Cloud, SaaS
   Significant expansion of capabilities offered to users (including big data, cloud, and advanced SaaS) by leveraging software, services, and practices of other infrastructures, providers, and software projects.
3. Model CI Organization
   Become a recognized leader among similar highly distributed, virtual organizations by sharing our knowledge and tools (particularly with NSF-funded CI projects).
4. Research Innovation and Curriculum Development
   Many institutions have a strong desire to integrate computational and data analytics into their research programs and curricula. XSEDE could be a pioneer in the development of this curriculum.
5. Inter-agency CI Coordinator
   Coordinate CI efforts (NSF, DoE, DoD, Internationally) to understand larger ecosystem of requirements and opportunities to align and leverage efforts, and to diversify funding opportunities for the benefit of the community.

Threats (External)
1. Coopetition
   Maintaining a balance between collaboration within the context of XSEDE versus competition amongst those same partners for limited funding could limit innovative ideas.
2. Sponsor Direction
   Lack of clear direction or guidance on the availability of future resources from the NSF make it difficult to support a rapidly growing and diversifying set of communities and requirements.
3. Talent
   The loss of innovative, high-performing, key personnel due to growing needs in academia, government, industry and society, as well as loss of or reduced funding leading to the decrease or loss of FTEs.
4. Innovative World
   The emerging and rapidly changing, diverse set of research communities with evolving needs requiring the use of ever-changing technologies with new critical capabilities.
5. Target
   High-profile target to ongoing security attacks.


F Evolving XSEDE and Organizational Improvement

As a large-scale, distributed project, XSEDE is constantly evolving in response to external factors such as the available technologies and the needs of the research community, and as part of internal efforts to continually improve the delivery of our portfolio of digital services. In fact, the ability to quickly adapt to changing requirements is rooted in the project's DNA. The original project execution plan identifies several characteristics that support this agility:
• services designed and implemented with widely validated system engineering principles
• requirements clearly tied to the needs of the science and engineering research community
• implementation that permits the architecture to evolve in response to changing needs
• governance that includes key stakeholders

Performance as a virtual organization is critical to the evolution and delivery of these services, and thus the organization and the supporting processes and policies must also evolve. This evolution is best understood by considering four datasets: process improvements, organizational changes, budget reprioritization, and project change requests. While there is inherently some overlap within these datasets, together they best show the evolution of XSEDE.

As is shown below, our process improvement activities have shifted significantly over the first three years from reactive corrections toward strategic improvements and innovation within the organization. Over the past three years we have also executed four substantive organizational changes to better support our work and to communicate more precisely to our stakeholders what that work is.
In PY2, having identified funds made available primarily as an artifact of spending down carry-over frund from the preceding TeraGrid project and the delayed spending of some funds as we ramped up staffing, we executed a process to identify high-priority activities that could be supported by these funds enabling a number of activities in XSEDE (e.g. the XRAS development was solely funded on this). Many of these changes are reflected in our formal documentation of major project changes via Project Change Requests. There have been 13 change requests within the first three years of the project; 12 of these have been processed with 9 of the 12 being approved. It is notable that not all requests are approved and, to date, 3 have been declined. F.1 Process Improvements Process improvements are identified in several ways: recommendations based on the staff climate survey, recommendations made by our partners at the Software Engineering Institute (SEI), identification by area managers, adaptations to external factors, or improvements to overall execution through implementation of widely validated improvement methodologies (e.g., the Baldrige Criteria). In the first three years of the project we have implemented 90 process improvements that have been identified through the above avenues. Seventeen of these improvements were implemented within Program Year 1 (PY1) while PY2 and PY3 had 36 and 37 improvements implemented respectively. These implementations range from big (aligning all project activities with strategic goals) to small (tweaks to frequencies of certain communications). The improvements can be considered in the context of the Baldrige guidance on organizational learning, which assesses organizational maturity by identifying five catalysts for change: 1. Reacting to the Problems 2. Making General Improvements 3. Systematic Evaluation and Improvement

4. Learning and Strategic Improvement
5. Organizational Analysis & Innovation

This list of improvement categories progresses from reactive actions to strategic and innovative actions. Figure 1 illustrates how the process improvements from the first three project years fall into these categories.

[Figure 1 chart. Vertical axis: Number of Improvements (0 to 40). Horizontal axis: PY1, PY2, PY3. Legend: Reaction to Problem; General Improvements; Systematic Evaluation; Learning and Strategic Improvement; Organizational Analysis and Innovation.]

Figure 1: PY1 - PY3 process improvements grouped by catalyst.

Over time the project has spent less time reacting to problems and more time on systematic evaluation, organizational learning, and strategic improvements. This maturation of our organization is an important part of its evolution.

F.1.1 Formal Identification of Improvements

F.1.1.1. The Baldrige Criteria

The Baldrige Criteria for Performance Excellence are published by the National Institute of Standards and Technology (NIST) and provide a framework for organizational improvement. Although every organization is unique, this framework can be tailored to provide a useful guide for continuous improvement. In general, the criteria guide an organization to reach goals and improve results by aligning plans and processes with people and actions. The criteria review seven aspects of an organization: Leadership; Strategic planning; Customer focus; Measurement, analysis, and knowledge management; Workforce focus; Operations focus; and Results.

In 2012 the NSF Review Panel made the following comments about XSEDE project management:
• Issues of strategy may have been relegated to “strategic planning” for the project and thus an opportunity to align each activity of the project with the overall strategy is being missed.
• XSEDE management needs to make the distinction between strategic planning and strategic management. Strategic planning involves “what will we do?” while strategic management involves “why are we doing it?” Should we do it differently? Is this the best use of our resources? Should we do something else? Strategic management involves a heavy focus on stakeholder (and user) segmentation; competitor analysis and evaluation of alternatives; and trends and extrapolation. Strategic management involves the continuous monitoring of all activities of an organization in order to inform and adjust the vision and its tasks, and to monitor progress not just at the task level, but in relation to the overall vision.
• The panel suggests using the Baldrige Criteria as a proactive management tool (rather than reactive audit mechanism).

The committee expressed these further expectations of XSEDE:
1) “Provide a two and five year vision – touch all major activities of the project,”
2) “Identify goals, objectives and appropriate mechanisms to gauge progress toward these visions,”
3) “Identify at least one metric for each goal and justify appropriate “target” values for these metrics,”
4) “Actively monitor metrics for trends,”
5) “Use information from metrics to feed back into the vision, goals and subsequent metrics,”
6) “Present progress succinctly,”
7) “Future review teams should not have to wade through a bunch of documents to get information needed.”

In response, the project chose Leadership and Strategic Planning as the first areas of the Baldrige Criteria to be applied. As a result, the project team developed Vision, Mission and Goals Statements. Following this, the project team created a set of Key Performance Indicators (KPIs) to reflect goal achievement in each project area. These KPIs have been actively monitored, and trend analysis is just getting underway now that data is becoming available. The project’s reporting format has been redesigned to allow stakeholders to more easily assess progress against the project goals. In 2013, workforce development became the project’s next focus area.
The XSEDE workforce is highly distributed geographically, and all the participants must balance local tasks with those for XSEDE. It was speculated that understanding the workforce is critical to the success of this large-scale, highly distributed workforce. The Baldrige Workforce Focus encompasses three sub-areas:
• Workforce environment – How do you build an effective and supportive workforce environment?
• Workforce engagement – How do you engage your workforce to achieve organizational and personal success?
• Knowledge management – How do you manage your organizational knowledge assets, information, and information technology?

An annual staff climate survey was established in 2013 to engage the workforce and gauge the workforce environment. The ISTEM Education Initiative at the University of Illinois conducts this survey. In 2014 XSEDE turned its attention to the Customer Focus area of the Baldrige Criteria. The initial assessment includes customer/stakeholder-focused product and process results and work process effectiveness, which includes emergency preparedness and process efficiency.

F.1.1.2. Staff Climate Study

The 2013 National Science Foundation Panel Report to XSEDE indicated, “Human resource related issues are notoriously problematic for the HPC domain, but the XSEDE team can benefit from more explicit plans to attend to this issue.” Project management is particularly concerned with identifying, developing, onboarding, and retaining the appropriate talent in order to meet the project’s goals and objectives. The purpose of the initial XSEDE Staff Climate Study conducted in PY2 was to establish a baseline assessment of attitudes and concerns within the project so that management can appropriately respond and assess change over time. Constructs investigated included communication, staff interactions, resource sufficiency, personal value, satisfaction, and working effectiveness. In PY2, 129 of the 267 (48%) XSEDE staff members responded to the climate study. In PY3, additional constructs were incorporated, specifically belonging and organization management. In PY3, the response rate increased by 14 percentage points, with 163 of the 264 (62%) XSEDE staff members responding to the study. Following the initial PY2 climate study, Level 2 managers were asked to develop action plans to address the results of the 2013 study. Positive changes were seen in Communication, Value, Resources, Belonging and Satisfaction in relation to those actions. In PY3 the Climate Study was modified to include measures of the Baldrige criteria and to enable disaggregation by site. In addition, specific modules were added at the request of Level 2 managers. Preliminary results of the PY3 Climate Study indicate these areas of considerable improvement:
• Continue to prominently display/discuss the XSEDE vision, mission and goals
• Continue to focus workforce on achieving KPIs
• Expand internal training opportunities
• Make concentrated effort to improve internal XSEDE communications
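The climate study response rates quoted above can be reproduced from the reported head counts. The short Python sketch below is illustrative only (the function and variable names are not part of the report); it shows the difference between a change in percentage points and a relative change. The 14-point figure in the text follows from the rounded rates (62% minus 48%); the unrounded difference is slightly smaller.

```python
def response_rate(respondents: int, staff: int) -> float:
    """Return the survey response rate as a percentage."""
    return 100.0 * respondents / staff


# Head counts as reported for the PY2 and PY3 climate studies.
py2_rate = response_rate(129, 267)
py3_rate = response_rate(163, 264)

# Change expressed in percentage points (simple difference of the rates).
point_change = py3_rate - py2_rate

# Change expressed relative to the PY2 rate (a different, larger number).
relative_change = 100.0 * point_change / py2_rate

print(f"PY2 response rate: {py2_rate:.1f}%")
print(f"PY3 response rate: {py3_rate:.1f}%")
print(f"Change: {point_change:.1f} percentage points "
      f"({relative_change:.1f}% relative to PY2)")
```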

Preliminary results were presented at a plenary session at the quarterly meeting in Minneapolis (June 2014), after which Level 2 and 3 managers received targeted reports specific to their areas. They have been asked to develop action plans to drive organizational improvements during PY4 based on the findings for their areas.

F.1.1.3. Engineering Process Improvement

During the planning grant phase of XSEDE, the Software Engineering Institute (SEI) developed an engineering approach that addressed the twin challenges of: a) establishing sound engineering practices that enable systematic, measured improvement in products and services; and b) introducing novel engineering practices that address novel challenges posed by XSEDE, namely those challenges that arise from XSEDE’s status as an NSF/OCI socio-technical ecosystem. Initially, the XSEDE workforce was unfamiliar with an architecture-centric approach to engineering systems. Consequently, in PY1 the team focused on establishing a core set of engineering practices within the XSEDE program, such as institutionalizing use case development to improve the quality of the XSEDE infrastructure and establishing management practices to improve visibility into software development progress and issues. During PY1 SEI saw clear indications that various socio-technical ecosystem issues were becoming more prominent to XSEDE (e.g., how to engage stakeholders to set priorities, chart technical direction, contribute engineering services, etc.).
Based on these observations, in PY2 SEI guided implementation of these process improvements:
• Refined and documented software engineering processes to enable existing and new developers to effectively follow processes for iterative and incremental development
• Designed and implemented Active Design Reviews for the architecture
• Analyzed existing engineering practices through artifacts and interviews
• Developed algorithms and conducted initial experiments to determine what forms of social structures can be deduced from probabilistic topic models
• Developed the theoretical underpinnings for stochastic mixed membership block modeling
• Defined an initial structure for the information space to enable targeted notification and effective navigation

In PY3, SEI’s focus was to improve XSEDE’s efficiency, transparency, and accessibility by transitioning to more viable practices for conducting engineering at ecosystem scale through these activities:
• Conducted interviews with key individuals about engineering activities that were potentially in need of improvement
• Created information flow diagrams of XSEDE engineering processes as executed
• Made recommendations for streamlining the engineering cycle without losing the engineering value
• Introduced IdeaScale as a way to potentially provide automated support for collaboration needs
• Analyzed data from use case registry, wiki, tickets, use cases risk registry, discussion threads, and meeting minutes
• Developed software for extracting and visualizing XSEDE data
• Created visualizations of the history of four design/security reviews that were viewed as “complex and protracted,” and analyzed the reasons behind the complexity and protraction
• Made recommendations for improving the design/security reviews and the design process

F.2 Organizational Changes

Another lens for examining the evolution of XSEDE is how the organizational structure has changed over time. The project has undergone three changes to the organizational structure in the first three years of the project. In PY1, in response to feedback from the User Advisory Board, Advanced User Support Services was officially changed to Extended Collaborative Support Services to more effectively convey the extended collaborative nature of the support and that it was available to any allocated user, not just advanced users. The breakdown of effort and the management structure were also divided into two focus areas: projects and communities. The projects area focuses on support for research teams and novel or innovative projects, while the communities area focuses on support for community codes and scientific gateways. In PY2, the Technology Investigation Service, which is funded through a separate award from NSF, was subsumed into XSEDE to improve coordination and communication. The scope of work and management structure were added while the budget necessarily remained separate. In PY3, the Education and Outreach group was refactored both in scope and structure. There were primarily two reasons for this: to more effectively coordinate closely related activities and to formalize evaluation activities for education and training services.

F.3 Budget Savings

At the end of PY1, the project was significantly underspent. This was caused by several factors, including the time required to staff the new project, variance between planned effort and actual effort, and frontloading of capital for equipment. At this point the project allocated a portion of these savings to provide more effort to some high-priority activities in the program plan for the second year. Additionally, some activities that had been delayed until the third project year were added to the program plan for the second year as a result of the savings.


An annual review and re-prioritization of budget has been added to the yearly planning process. Now budget savings can be directly applied to the current program plan. Table 1 shows the effect of savings on the project’s scope in the first three years.

Table 1: Summary of savings applied to scope in the second and third year of the project.

WBS activities listed in the table: 1.1.1 Project Management & Reporting; 1.1.6 Software Development & Integration; 1.2.4 Software Testing & Deployment; 1.2.6 Systems Operational Support; 1.3.1 Training; 1.3.2 User Information and Interfaces; 1.3.3 User Engagement; 1.3.4 Allocations; 1.6.2 Outreach; 1.6.4 Education and Outreach Infrastructure; 1.6.5 Campus Bridging.

Description | FTE Cost | Non-FTE Cost | Total Cost
Operations project management assistance | $200,000 | | $200,000
Security technical lead participation in active design reviews of A&D Use Cases | $45,000 | | $45,000
Data technical lead participation in active design reviews of A&D Use Cases | $45,000 | | $45,000
Resources technical lead participation in active design reviews of A&D Use Cases | $45,000 | | $45,000
Testing technical lead participation in active design reviews of A&D Use Cases | $45,000 | | $45,000
Transitioning of TeraGrid Science Gateway Credential with Attributes capability to XSEDE for individual user tracking of Science Gateway users | $32,000 | | $32,000
Deliver GridFTP configuration and infrastructure for efficient use of bandwidth for data transfers between XSEDE endpoints | $32,000 | | $32,000
Construction of InCommon-aware secure token service (STS) for Genesis II integration | $64,000 | | $64,000
Improved and customized GRAMS packaging | $24,000 | | $24,000
Subject expert reviewers for A&D and SD&I reviews | $200,000 | | $200,000
Software Testing Engineer for testing, deploying, and maintaining the production XSEDE resources | $200,000 | | $200,000
Service Provider Coordination for all SP activities related to installing, maintaining, and updating required software services | $100,000 | | $100,000
Application Support for the backends of XSEDE centralized applications | $300,000 | | $300,000
Hardware for IU to leverage their VM system that hosts a significant percentage of XSEDE central services and Science Gateways | | $25,000 | $25,000
Virtual Workshop functionality enhancement for online training environment improvement | $100,000 | $5,000 | $105,000
Content building for additional "OpenACC" multi-site courses | $75,000 | | $75,000
Videoconferencing software to support "OpenACC" multi-site course activities described above | | $10,000 | $10,000
CI-Tutor technical support for deeper integration with XSEDE | $25,000 | | $25,000
Phase 1 improvements to POPS User interface | $50,000 | | $50,000
Multi-platform, mobile web application development for XUP | $75,000 | | $75,000
Native mobile application development for iOS and Android | $150,000 | | $150,000
XUP interface to RT ticketing system | $50,000 | | $50,000
Update documentation and system monitoring for XSEDE storage allocations | $50,000 | | $50,000
Modification of training sources to support training assessment and certification | $50,000 | | $50,000
Development of a REST API for XSEDE services | $75,000 | | $75,000
Development of targeted, quarterly user surveys based on feedback from the annual user survey | $25,000 | | $25,000
SurveyMonkey Pro Platinum Subscription | | $4,000 | $4,000
Addition of FTE % at TACC (for XUP development) and relocation of .5 FTE to NICS | $50,000 | | $50,000
Customize RT to meet specific needs for the XSEDE ticket system | $50,000 | | $50,000
Implementation of microsurveys to support feedback campaigns | $25,000 | | $25,000
Documentation updates for Allocations and implementation of procedure updates for ticket management | $75,000 | | $75,000
Post-renewal interviewing of PIs, who listed XSEDE-based publications in support of their proposal, for generation of Science Nuggets | $75,000 | | $75,000
Targeted Gateway Development for Community Codes | $200,000 | | $200,000
Additional staff support for logistics and tracking of the Campus Champion program | $75,000 | | $75,000
Expand and enhance the content and effectiveness of HPC University, including the HPCU website and the TEOS website | $60,000 | $12,000 | $72,000
Support and outreach for campus bridging | $66,667 | | $66,667
Technical support for planning, documentation, and development of training for cluster build rollout | $50,000 | | $50,000
Leadership activities focused on campus bridging | $36,667 | | $36,667
Support use of Rockhopper as a level 3 "Cluster as a service" resource for XSEDE | $33,333 | | $33,333
Grand Totals | $2,853,667 | $56,000 | $2,909,667
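As a consistency check on Table 1, the line items can be summed and compared with the reported grand totals. The Python sketch below is illustrative only. It assumes that the five amounts for hardware, software, subscription, and website items are the non-FTE costs, which is the assignment under which both column totals match the reported values; the variable names are hypothetical.

```python
# FTE cost line items from Table 1, in table order (US dollars).
fte_costs = [
    200_000,                                 # operations project management
    45_000, 45_000, 45_000, 45_000,          # technical lead design reviews
    32_000, 32_000, 64_000, 24_000,          # software development items
    200_000,                                 # subject expert reviewers
    200_000, 100_000,                        # testing engineer, SP coordination
    300_000,                                 # application support
    100_000, 75_000, 25_000,                 # training items
    50_000,                                  # POPS user interface
    75_000, 150_000, 50_000, 50_000, 50_000, 75_000,  # XUP, docs, REST API
    25_000, 50_000, 50_000, 25_000,          # surveys, FTE addition, RT
    75_000, 75_000,                          # allocations documentation, interviews
    200_000,                                 # targeted gateway development
    75_000,                                  # Campus Champion staff support
    60_000,                                  # HPC University content
    66_667, 50_000, 36_667, 33_333,          # campus bridging items
]

# Non-FTE line items (assumed assignment): hardware, Virtual Workshop,
# videoconferencing software, survey subscription, HPCU/TEOS websites.
non_fte_costs = [25_000, 5_000, 10_000, 4_000, 12_000]

fte_total = sum(fte_costs)
non_fte_total = sum(non_fte_costs)
grand_total = fte_total + non_fte_total

print(f"FTE total:     ${fte_total:,}")
print(f"Non-FTE total: ${non_fte_total:,}")
print(f"Grand total:   ${grand_total:,}")
```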


F.4 Project Change Requests

Changes to scope, schedule, budget, and management personnel are tracked through project change requests. There have been 13 change requests within the first three years of the project; 12 of these have been processed, with 9 of the 12 being approved (Table 2).

Table 2: Summary of project change requests that were submitted within the first three years of the project.

Submission Date | Date Recorded | Change Type | Affected Area | Description | Submitter | Approval Level | Approver | Approved (Y/N)
4/30/12 | 6/25/12 | Budget/Scope | 1.1 | Request for additional budget to provide development support to extend XSEDE authentication to the staff SharePoint. | Tim Cockerill | Project | John Towns | no
5/12/12 | 6/27/12 | Budget | 1.6.5 | Request of additional 1.25 FTE in Campus Bridging. | Craig Stewart | Project | John Towns | yes
5/12/12 | 6/27/12 | Budget | 1.6.5 | Request of additional $875,000 for campus bridging efforts. | Craig Stewart | Project | John Towns | no
6/1/12 | 8/31/12 | Scope | All | Submission of PY2 Program Plan | Kathlyn Boudwin | Sponsor | Barry Schneider | yes
6/14/12 | 7/21/12 | Budget | 1.1.1 | Request for additional budget to purchase project management software from Psnext. | Tim Cockerill | Project | John Towns | yes
7/30/12 | 8/14/12 | Budget/Scope | 1.1 | Request for budget for partial support of US desk editor for International Science Grid This Week. | Craig Stewart | Project | John Towns | no
1/15/13 | 3/9/13 | Budget | All | Re-prioritization of unspent PY1 budget. | Martin Biernat | Project | John Towns | yes
6/1/13 | 8/1/13 | Scope | All | Submission of PY3 Program Plan | Justin Whitt | Sponsor | Barry Schneider | yes
1/3/14 | 4/3/14 | Organization/Scope | 1.6 | Re-alignment of WBS for more effective execution and to formalize evaluation efforts. | Scott Lathrop | Project | John Towns | yes
3/1/14 | - | Scope/Budget/Organization | 1.1 | Request for addition of a project administration group within 1.1 and an associated addition to project scope. | Justin | Project | John Towns | Pending
3/15/14 | 4/9/14 | Budget | All | Re-prioritization of unspent PY2 budget. | Martin Biernat | Project | John Towns | yes
6/1/14 | 7/15/14 | Scope | All | Submission of PY4 Program Plan | Justin Whitt | Sponsor | Rudi Eigenmann | yes
9/15/12 | 11/29/12 | Budget/Organization/Scope | 1.7 | Adding the budget and scope from the TIS award to XSEDE. | Jay Ray Scott | Project | John Towns | yes
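The counts quoted in the text (13 submitted, 12 processed, 9 approved, 3 declined) can be checked against the "Approved (Y/N)" column of Table 2. The Python sketch below is a minimal illustration; the list simply transcribes that column in table order, and the variable names are hypothetical.

```python
from collections import Counter

# "Approved (Y/N)" column of Table 2, transcribed in table order.
decisions = [
    "no", "yes", "no", "yes", "yes", "no", "yes",
    "yes", "yes", "Pending", "yes", "yes", "yes",
]

# Count each outcome, ignoring capitalization.
tally = Counter(decision.lower() for decision in decisions)

# A request counts as processed once it has been approved or declined.
processed = tally["yes"] + tally["no"]

print(f"Submitted: {len(decisions)}")
print(f"Processed: {processed}")
print(f"Approved:  {tally['yes']}")
print(f"Declined:  {tally['no']}")
print(f"Pending:   {tally['pending']}")
```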


G Increasing Diversity

Extending the use of the advanced digital research services ecosystem to communities who are not traditional users of this ecosystem (Sec. 2.6) is a strategic sub-goal of the XSEDE project. Critical to this sub-goal is increasing the demographic, geographic, and domain diversity of the people engaged in this ecosystem. Increasing diversity is vital to the nation’s future and is a foundation for two of XSEDE’s strategic sub-goals: preparing the current and next generation of scholars, researchers, practitioners, and engineers in the use of advanced digital technologies via training, education, outreach, and consultations; and raising the general awareness of the value of advanced digital research services. The experiences gained through TeraGrid and XSEDE, along with the insights gained from interacting with other projects, indicate that while change is slow, it is indeed happening. We are able to provide metrics of impact to demonstrate successful sustained community involvement. Since we recognize that the long-term benefits are expected to have profound impacts that we will not be able to adequately assess during the funding period of XSEDE, XSEDE is building a longitudinal database that will be available to future projects for analysis of the long-term impact.

The XSEDE leadership and all XSEDE staff are committed to, and participate in, our efforts to enhance diversity. XSEDE employs proactive efforts to increase diversity, led by specific groups within XSEDE. Underrepresented Community Engagement (URCE) (WBS 1.6.7), Education (WBS 1.6.1), Student Engagement (WBS 1.6.8) and the Champions Program (WBS 1.6.6) are responsible for organizing and running specific programs that establish and maintain working relationships with under-represented stakeholders.
The Novel and Innovative Projects (NIP) program (WBS 1.4.2) in the Extended Collaborative Support Service (ECSS) identifies and mentors specific projects whose success in using XSEDE and other advanced digital resources can catalyze increased adoption by under-represented communities. Summarized below are the results of these programs’ efforts in PY1-3 that led to the measured impact gains documented here, and that provide useful insights for future years.
• The population of active XSEDE community members has become more diverse compared with that of TeraGrid, as well as compared to the general population of US faculty with federal grant support (see Table G-1).
• Students increasingly predominate among XSEDE users, with the sustained increase in undergraduate student users directly attributable to the XSEDE student programs (see Table G-2).
• XSEDE programs and allocations now reach people in all 50 states, including 30 of the 31 campuses in EPSCoR jurisdictions (see Figure G-1), Puerto Rico and the Virgin Islands.
• Participation as users of XSEDE services and contributors among Under-Represented Minorities (URMs), Minority Serving Institutions (MSIs), people with disabilities, institutions in EPSCoR jurisdictions, and women is increasing and is becoming more sustained, with training and consulting as important enablers. The increasing participation by women and people from MSIs in XSEDE training events over project years is an important indicator of progress (Table G-3).
• Compared to TeraGrid, XSEDE resource usage has become more diverse in terms of fields of science and institutions (Table G-4).
• Growing discipline diversity can be expected to also entail increased demographic diversity, since women and minorities are better represented in fields that are not among the Top 20 in Table G-4, such as psychology, the social sciences and the humanities: see e.g. “Doctorates awarded to U.S. citizens and permanent residents, by sex, field, and race or ethnicity: 2012” http://www.nsf.gov/statistics/wmpd/2013/pdf/tab7-8_updated_2014_05.pdf
• XSEDE continues to address accessibility issues through 1) self-paced instruction primarily for sighted individuals to train them to "read braille with their eyes" so as to help those who read braille because they are blind; and 2) the popular on-line signs dictionary for math and science for deaf and hard-of-hearing; and signs for STEM concepts including entire lesson plans with signed instructions.
• XSEDE’s staff diversity has improved, and this continues to be a priority for the project overall. XSEDE now has two women as co-PIs, one level-2 woman director, and six level-3 women managers. While XSEDE does not currently collect demographic information about XSEDE funded staff, we know that over 25% of them are women. The various advisory boards to XSEDE include eight women, more than a dozen people from minorities, and a woman with disabilities.

Table G-1 shows the demographic distribution of respondents to the annual user surveys administered in the last year of TeraGrid and in each XSEDE project year. Survey response is an indication of active participation in the user community. The latest available data on the demographic distribution of faculty supported by federal research grants puts participation in XSEDE and TeraGrid in a broader perspective.

Table G-1: Gender and demographic distribution (percent) of annual survey respondents

Group | Male | Female | White | Asian | Black/African-American | Hispanic | American Indian/Alaska Native | Native Hawaiian/Pacific Islander
Faculty Federal Grant Support (2009)* | 71.4 | 28.6 | 76.7 | 14.9 | 2.9 | 4.2 | 0.2 | 0.1
TG User Survey 2010 | 84.0# | 16.0# | 72.0 | 23.0 | 2.0 | 3.0 | 0.7 | 0.3
XSEDE User Survey PY1 | 71.9 | 19.8 | 65.9 | 22.3 | 1.4 | 5.4 | 0.7 | 0.4
XSEDE User Survey PY2 | 72.0 | 18.0 | 64.5 | 33.3 | 2.8 | 6.0 | 0.7 | 0.3
XSEDE User Survey PY3 | 71.7 | 18.4 | 61.4 | 37.1 | 2.3 | 5.0 | 0.5 | 0.4

*Source: http://www.nsf.gov/statistics/wmpd/2013/pdf/tab9-30.pdf.
#In the 2010 TeraGrid user survey, respondents were not given the explicit option to not specify their gender. They do have this option in the XSEDE surveys.

Table G-2 shows the number and % distribution of users with active accounts on XSEDE resources during each project year, by NSF status.

Table G-2: Distribution of XSEDE users with active accounts, by NSF status


NSF Status | PY1 Users | PY1 % | PY2 Users | PY2 % | PY3 Users | PY3 %
Faculty | 1,987 | 21.2% | 2,014 | 18.8% | 2,113 | 17.7%
Graduate Students | 3,561 | 38.1% | 4,308 | 40.3% | 4,948 | 41.5%
Postdoctoral researchers | 1,377 | 14.7% | 1,514 | 14.2% | 1,596 | 13.4%
Undergraduate Students | 1,007 | 10.8% | 1,237 | 11.6% | 1,465 | 12.3%
University research staff* | 762 | 8.1% | 797 | 7.5% | 833 | 7.0%
Industry | 18 | 0.2% | 32 | 0.3% | 47 | 0.4%
Non-Profit | 39 | 0.4% | 43 | 0.4% | 40 | 0.3%
Government | 99 | 1.1% | 100 | 0.9% | 95 | 0.8%
Other categories | 502 | 5.4% | 650 | 6.1% | 773 | 6.5%
Total | 9,352 | | 10,695 | | 11,910 |

*Excludes postdoctoral researchers.

Figure G-1 shows the approximately 20% per year growth of the Campus Champions program from the last month of the TeraGrid program with 93 member institutions to the end of XSEDE PY3 with 165 member institutions and over 200 representatives on those campuses. By the end of PY3, member campuses include 53 within EPSCoR jurisdictions, 12 that are MSIs, and 9 that are both MSIs and within EPSCoR jurisdictions. Since the Champions program serves as the platform for all diversity programs to interact with current and potential users, this is a good measure of our ongoing efforts to effect national-scale geographic and institutional impact.
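The "approximately 20% per year" growth figure can be reproduced as a compound annual growth rate from the two membership counts. The Python sketch below is illustrative only; it assumes a three-year span (June 2011 to the end of PY3 in June 2014), and the function name is hypothetical.

```python
def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Return the compound annual growth rate as a fraction."""
    return (end / start) ** (1.0 / years) - 1.0


# Member institutions at the end of TeraGrid and at the end of XSEDE PY3.
# The three-year span is an assumption based on the dates given in the text.
rate = annual_growth_rate(start=93, end=165, years=3.0)

print(f"Compound annual growth: {100.0 * rate:.1f}% per year")
```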


Figure G-1: Expansion of the Campus Champions Program from June 2011 (upper) to May 2014 (lower)

Table G-3 shows various indicators of participation of people from minority serving institutions, and of women, in the XSEDE community. People who obtain an XSEDE User Portal account automatically become members of the XSEDE community; their active follow-up is indicated by participation in training and outreach events, the annual XSEDE Conferences, and by obtaining allocations of computer time.


Table G-3: Indicators of participation in XSEDE

Participation in XSEDE | PY0** | PY1 | PY2 | PY3
Women who participated in XSEDE training | n/a | 167 | 357 | 852
MSI* institutions | 23 | 28 | 25 | 25
MSI first-time XSEDE Portal accounts | n/a | 534 | 711 | 635
MSI first-time XSEDE training registrations | n/a | 99 | 110 | 289
MSI allocations | 46 | 51 | 48 | 52
MSI users | 186 | 153 | 132 | 168
EPSCoR jurisdictions | 28 | 29 | 29 | 28
EPSCoR allocations | 286 | 316 | 307 | 329
EPSCoR users | 1,293 | 1,486 | 1,718 | 1,678

*MSI = all HBCU, HSI, MSI, and TCU institutions aggregated
**PY0 is the last year of TeraGrid

Table G-4 shows the percentages of normalized computing units (NUs) used by fields of science (FOS) other than the top 20, as well as by PI institutions other than the top 30, measured annually since July 1, 2010. These cutoffs were chosen to capture usage by any individual FOS or institution below ~1% in all time periods.

Table G-4: Diversity of NUs used by primary field of science (FOS) and by PI Institution, since July 2010

Period | Percentage of NUs used by FOS Other than the Top 20 | Percentage of NUs used by Institutions Other than the Top 30
TeraGrid: 7/1/2010 – 6/30/2011 | 13.0 | 27.7
XSEDE PY1 | 14.3 | 35.0
XSEDE PY2 | 17.6 | 37.3
XSEDE PY3 | 18.0 | 34.4

Figure G-2 shows the growth of projects mentored by the NIP team (WBS 1.4.2). These projects represent fields such as genetics and nucleic acids; behavioral and neural sciences; social and economic sciences; and the humanities, which in PY3 still ranked between 39 and 107 in NU usage among the 115 primary fields of science.


Figure G-2: Growth of projects mentored by NIP staff.


H Sustainability Planning

XSEDE is a major provider of advanced digital services to many research communities, primarily targeted at direct support of end researchers. XSEDE faces a major challenge as an NSF-funded activity: a single typical NSF award, particularly one funded out of the NSF Research and Related Activities funds, is limited to a maximum lifetime of 10 years. While there is certainly merit in the open re-competition of awards, it was also evident that the transition from TeraGrid to XSEDE was very disruptive to the community. XSEDE is seeking to develop a sustainability model that extends beyond a potential second five years of funding, that is, beyond PY10. XSEDE must create something that is driven by the needs of the community and provides the services that effectively support science and engineering research and education. It must also be convincing enough for NSF to continue funding a set of core services, and it must be able to survive a re-competition for those services. XSEDE would like to ameliorate this problem and work to provide for sustained availability of critical services to the community without disruption.

In addition, XSEDE is developing many services usable by other projects and infrastructure providers. These include MREFC (Major Research Equipment and Facilities Construction) projects and NSF-funded infrastructure investments such as NCAR (National Center for Atmospheric Research), OSG (Open Science Grid), NRAO (National Radio Astronomy Observatory), DataNet projects and so on. In fact, most of these projects and infrastructure providers are “re-inventing the wheel” by replicating a number of core services that XSEDE has created at scale. This replication is sub-optimal on a global level, though it clearly allows optimization on a “local” level.
Given the non-discipline-specific nature of XSEDE, the services we have and are developing generally have broader applicability, again speaking to their potential utility to other projects and infrastructure providers. XSEDE would like to become a key provider of advanced digital services to projects, programs and other infrastructure providers. While this is not specifically restricted to NSF-funded projects, XSEDE will be focusing on such projects initially. If such services can be customized (if needed) and provided to other projects, programs and infrastructure providers at an incremental cost to them, a balance can be struck along a number of axes. The “local” customization can be realized, but a global optimization of costs across these efforts can also be gained. We wish to note here that should we discover that another provider offers services that XSEDE could use more effectively and that are more cost-efficient, we are completely open to those services.

XSEDE has embarked on a first pilot test case of offering a service: the newly developed XRAS (XSEDE Resource Allocation Service) is being offered to NCAR for their use in operating Yellowstone and other resources. Though this is an activity still in process, preliminary estimates are that this service can be provided to NCAR at an approximate annual cost of less than $10,000. This allows NCAR to leverage the approximately $600,000 investment XSEDE has made in initially developing this service for its own needs. We are currently awaiting an estimate of current annual staff costs incurred by NCAR to support their allocations using a manual process. XRAS is expected to provide a much better service for them at an anticipated lower cost. The XRAS pilot is intended to be a first test case in which we work out the numerous issues of one project purchasing services from another. Nonetheless, it demonstrates the potential for providing higher quality services to a broader community with a global optimization of costs.
A number of other XSEDE services are under consideration for packaging for offering to others, though we plan to proceed cautiously and to determine interest and ability to satisfy demands before moving forward.


H.1 XRAS Pilot to Product

The XRAS service has been developed as a “clean slate” redesign of the previous allocations support system in use by XSEDE (and previously TeraGrid) for over 10 years. Though that system has been quite successful, it has been under ongoing incremental development for that entire period as needs have arisen to support the evolving allocations process. As the diversity of resources XSEDE allocates has grown, it became clear that a redesign of the system was necessary. With the previous encouragement by NSF review panels to develop a sustainability plan, we took this opportunity to specifically design the service to be extensible, not only to support the anticipated but unknown future needs of XSEDE, but also the needs of other infrastructure providers in allocating resources. XRAS has the distinction of being the first cost-recovery pay service that XSEDE has scoped and developed. As noted, a unique feature of XRAS is that it can be set up to allow other “projects” (programs, infrastructure providers, projects, etc.) outside XSEDE to utilize this service. XRAS was constructed such that it supports the allocation of arbitrary “resources,” be they more traditional resources such as compute time or storage, or other resources such as telescope time, beam line time, or even money. Examples of projects that could utilize XRAS range from local university research computing resources to national resources like those operated by NCAR. To consider this a complete product offering, XSEDE must look at this product as more than just the XRAS service itself: training will be offered to complete the customer experience, with webinars for users and full training for administrators and developers.
XRAS adds value for potential customers by supporting:
• Administration of the resource allocation process: saving time through collaborative cloud-like technologies.
• Increased operational efficiency: using the XRAS ecosystem of technologies and collaborative software reduces coordination costs, improves access to data and brings insight to the point of customer need.
• Leveraging data to provide insights: providing unique perspectives regarding how resources have been allocated to activities.
• Management of the ecosystem: the XRAS cloud-like technology allows coordinated allocation of all resources a project might allocate, and access to the allocation data to manage those resources in a coherent and cohesive manner.

As XSEDE engages with additional projects to provide the XRAS service, it will support product and service innovation by leveraging XSEDE XRAS and Training and incorporating the needs of a growing set of customers to create richer offerings to projects outside XSEDE. XSEDE is developing pricing around a cost-recovery model covering the development, operation, equipment and training resources that will be utilized.

H.1.1 Roadmap Development for XRAS

This pilot activity is being used to develop a process for developing products for external customers. The flexibility of the XRAS service design allows for a high degree of customization (which could be done entirely by the customer), but XSEDE must provide for potential changes to the core XRAS system, using standardized methodology and nomenclature, to accommodate any unanticipated requirements of particular customers. Great care has been taken to segregate the features of XRAS that might require customization from a core set of capabilities, so that the core can support arbitrary customizations for the specific allocations process and policy needs of potential customers.


Process Methodology: The XRAS product will be organized around the feature requests made by its user base. Those features will be weighed and the feasibility of development will be analyzed. Once this analysis is complete, the XRAS product team will assign a development cost to each feature.

Nomenclature: The XRAS initial release will be R1.0.0. XRAS will use the standardized numbering scheme 1 (Major Release) . 0 (Minor/Bug Release) . 0 (Customer-specific Release). This nomenclature is important and relevant to the way the XRAS team will develop the upcoming roadmap. Major releases will have valuable features that most customers will need and utilize. Minor/Bug releases will be made available, usually at no cost to the customer base. Customer- or project-specific releases will be utilized only when a customer has asked to have a feature developed only for them. As a cloud-like service offering, the XSEDE XRAS team can have multiple instances available, making this system work easily and showing value to the work of the team. Each release will be archived and be made available to the customer base if the XRAS project is discontinued. The front-end interface used by XSEDE (in which it is anticipated all customization necessary for a customer would be made) is already publicly available on GitHub.

H.1.2 XRAS Pricing

Service pricing for our customer base will be based on a usage model. The usage model will be based on the allocations that the target project processes per year, and the training price points will be set on a per-student basis. Pricing will be set around proposals processed per year, with price points at 0–99, 100–499, 500–999 and greater than 1000 proposals/year. A continuing customer will have a good understanding of what their allocation rate was in the previous year; we will work with new projects to ascertain and set their price points. For example, we understand that the NCAR team processes approximately 250 proposals/year.
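The proposal-volume tiers described above can be sketched as a simple lookup. This is a minimal illustration and not XSEDE's pricing implementation: the function name `price_tier` and the tier labels are hypothetical, no dollar amounts are given in the text so none appear here, and placing exactly 1000 proposals/year in the top tier is an assumption, since the text lists "500–999" and "greater than 1000" without covering 1000 itself.

```python
from bisect import bisect_right

# Lower bound of each tier, in proposals processed per year (taken from the text).
TIER_STARTS = [0, 100, 500, 1000]
# Labels are illustrative; "1000+" assumes that exactly 1000 joins the top tier.
TIER_LABELS = ["0-99", "100-499", "500-999", "1000+"]


def price_tier(proposals_per_year: int) -> str:
    """Return the label of the pricing tier for a yearly proposal count."""
    if proposals_per_year < 0:
        raise ValueError("proposal count cannot be negative")
    # bisect_right counts how many tier lower bounds are <= the proposal count.
    index = bisect_right(TIER_STARTS, proposals_per_year) - 1
    return TIER_LABELS[index]


if __name__ == "__main__":
    # NCAR's estimated volume from the text: about 250 proposals/year.
    print(price_tier(250))
```

With the roughly 250 proposals/year cited for NCAR, the sketch selects the 100–499 tier.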
Agreements will be renewed annually and proposal rates will be reviewed during those renewals. Within the pricing model, the original development and annual operating costs will also be partially recovered. Costs for training of customer staff are being determined based on XSEDE personnel time. Current estimates for development and delivery of training are that webinar-based training will require approximately 2.9 person-months of effort, XRAS administrator training approximately 4.2 person-months, and XRAS developer training approximately 4.2 person-months.

H.2 Expanding the Portfolio of Offerings

With the successful first launch of XRAS as an offering outside XSEDE, we have looked at other possible candidates within the XSEDE suite of tools and services. Possible candidates include:
• Helpdesk services
  o 24/7 monitoring, ticket triage and routing, …
• ECSS-style support
  o additional effort funded in ECSS to support specific project/programs/communities
• Customized training workshops
• Cluster builds
  o part of Campus Bridging
• Identity/group management services
• Cybersecurity assessments
• Custom User Portals


  o replicate XUP with customizations and “skinned” for other projects
• Technology Evaluations on demand
  o evaluate based on stated requirements


I Indicators of Impact: Developing Impact Metrics XSEDE has faced a significant challenge in trying to articulate its impact on the computation- and data-enabled science and engineering community. It faces one clear, hard fact: NSF fundamentally wishes to have measures of this impact when XSEDE itself does not conduct science and engineering research but plays an important supporting role. This challenge is not unique to XSEDE, but we have undertaken the challenge to develop better indicators—not absolute measures—of impact. In this appendix we will discuss two directions we are pursuing to develop indicators of impact: publication citation metrics and longitudinal studies. Both of these are in development and we will continue to work on them, but we wanted to relay the status of these efforts and solicit feedback on them. Publication lists have traditionally been used as one indicator of impact, but this is not satisfying in that it is unknown if those publications were of value to the community. XSEDE has been working with the Technology Audit Service (PI: Tom Furlani) to explore the development of citation indices and other bibliometrics based on publications supported by XSEDE. The raw number of publications supported is impressive—over 18,000 publications supported by XSEDE and TeraGrid since 2005 and more than 10,000 publications in just the past three years supported by XSEDE alone—but more meaningful metrics are in development. This analysis needs to be considered in the light of an increasing publication rate supported by XSEDE. In the last year of TeraGrid, 2,385 publications were supported. Three years later, in XSEDE’s Program Year 3, 4,376 publications were supported—almost a doubling of the publications rate in that period. 
Analysis of the past five years of publications shows XSEDE’s publication impact is comparable to the 20th ranked journal in the top research publication venues for the fields of “Engineering & Computer Science.” In addition to the publications by members of the communities we support, we have identified 70 publications from XSEDE staff since we began collecting this information in September of 2012.

Independent of the previous activities, we have established the User Portal as the path to access a number of XSEDE services, such as registering for training, providing evaluation of attended training events, submitting allocation requests, and others. This capability gives us a means to track the various activities that researchers engage in over time (something we have not had the ability to do previously) and has been the linchpin in being able to conduct a proof of concept longitudinal study. As an example scenario, we can now track an individual who:
• initially has access to resources via a Campus Champion’s allocation
• participates in one or more training options
• shows usage of the allocation to which they have been given access
• becomes associated with a PI’s allocation and shows usage there
• becomes an independent PI with their own allocation
Such a sequence of events indicates that XSEDE has had an impact on developing a computationally savvy member of the workforce through the linked efforts of multiple XSEDE programs (Champions program, Training program, and the XSEDE allocations process). The data captured from the user portal is giving us the ability to conduct longitudinal studies of the progression of new XSEDE portal users to becoming independent PIs. This analysis is provided below. A couple of interesting points include:
• 44% of those participating in XSEDE training are not making use of XSEDE resources, indicating XSEDE is having an impact in workforce development in the broader community


• 42% of new researchers provided access to XSEDE resources via a Campus Champions allocation proceed to obtaining their own start-up allocation, indicating the efficacy of the Campus Champions program

As noted above, both of these efforts are still works in progress but are showing promise in measuring impact and driving XSEDE’s planning to have greater impact.

I.1 Publication and Citation Metrics

As presented at XSEDE’14 in the paper “Towards a Scientific Impact Measuring Framework for Large Computing Facilities - a Case Study on XSEDE” (see paper attached at the end of this appendix), the Technology Audit Service (TAS) has been working with XSEDE. The team has devised a process to obtain and manage publication and citation data from various sources for the XSEDE community. At this time we have identified about 18,000 publications from the XSEDE user portal and past XSEDE and TeraGrid quarterly reports dating back to 2005. This set of publications is a valid and useful collection that can be used for impact analysis. However, the TAS team continues to strive to find more refined methods to ensure that the publication and citation datasets used for impact analysis include those with the highest direct relationship to support received from XSEDE. Having a user confirm that a publication is directly related to work supported by XSEDE considerably reduces the possibility that these data points are invalid for analysis. To that end, about 1,000 of the collection of 18,000 publications have been vetted directly by XSEDE users through the XSEDE portal. This number will grow as XSEDE users continue vetting relevant publications. As it grows, it will increase the value of the analysis gained from the user portal data sources beyond the good level that is presently available in the larger dataset. In addition, the TAS team also provides a much larger comprehensive database that includes more than 142,000 publications for over 22,000 XSEDE users. While this dataset is comprehensive and very useful, it can contain publications from an XSEDE user that were not directly supported by XSEDE.

The TAS team conducted an impact study from the collection of 18,000 publications. Common metrics analyzed by the impact study include the number of publications, i10-index, citation count, h-index, and g-index. Analysis of one of the preliminary results is summarized in Table 1, which provides simplified insights about the relative measure of XSEDE’s scientific impact. The table compares the metrics of XSEDE’s impact to that of John von Neumann, whose work has without question impacted science and is highly cited throughout scientific literature. If the XSEDE supported publication and citation metrics from the XSEDE User Portal publication dataset were considered to be from an individual researcher, this researcher’s work would have a higher h/g-index than John von Neumann. It cannot be concluded that XSEDE’s impact is greater than the work of von Neumann, but it does serve to illustrate a high level indicator that XSEDE is having a measurable impact on science as evidenced by the publication and citation metrics.

i10-Index: The i10-index indicates the number of academic publications an author has written that have at least ten citations from others.

h-Index: A scholar with an index of h has published h papers each of which has been cited in other papers at least h times. Thus, the h-index reflects both the number of publications and the number of citations per publication.

g-Index: Given a set of articles ranked in decreasing order of the number of citations that they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g² citations.
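The three index definitions quoted in this section can be made concrete with a short sketch. It takes a plain list of per-publication citation counts; the function names and the sample counts are illustrative assumptions, not the TAS team's code or data, and the g-index here is capped at the number of publications (a common convention that the definition above does not spell out).

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g publications together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


def i10_index(citations):
    """Number of publications with at least ten citations."""
    return sum(1 for count in citations if count >= 10)


if __name__ == "__main__":
    # Hypothetical citation counts for seven publications.
    sample = [15, 10, 4, 2, 1, 0, 0]
    print(h_index(sample), g_index(sample), i10_index(sample))
```

For the hypothetical sample, three publications have at least three citations each (h = 3), the top five together have 32 citations, which is at least 25 (g = 5), and two publications reach ten citations (i10 = 2).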


               XSEDE/TeraGrid     John von Neumann   TeraGrid/XSEDE   John von Neumann
               All publications   All publications   since 2009       since 2009
Citations      85144              80330              53902            25005
h-index        107                81                 85               52
i10-index      1772               206                1277             125

Table 1: Preliminary analysis of citations for 18,000 publications supported by XSEDE/TeraGrid in 2005-2014.

To focus more closely on the publications supported by XSEDE, and as seen in Figure 1 below, another indicator of XSEDE’s impact is a comparison of XSEDE’s h5-index metric (the h-index for articles published in the last 5 complete years, i.e. 2009-2011) to the top research publication venues for the fields of “Engineering & Computer Science.” For this period of time there were 15,441 publications, of which 10,670 were supported by XSEDE. Using this comparison, we see that the h5-index is close to the 20th spot in this field. While not an absolute measure of impact, this does give a good indicator of the overall positive impact XSEDE is making for the advancement of science. Further, looking at other relevant fields represented by Google Scholar, the composite XSEDE h5-index is also very close to the #20 ranked publication venues in the fields of “Chemical & Material Sciences,” “Life Sciences & Earth Sciences,” and “Physics & Mathematics.”

Figure 1: Comparison of the h5-index of XSEDE with that of the top 20 Journals in Engineering and Computer Science, obtained from Google scholar. http://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng The analysis of XSEDE’s relative impact to that of John von Neumann when combined with analysis of XSEDE’s relative impact to the top 20 journals in Engineering and Computer Science provides two very good indicators that XSEDE is having a noticeably positive impact on advancing science. There is still considerable work underway to provide deeper insight over time when more data has

been made available to the TAS team by the users through the user portal. Together this data will increase the opportunities to understand and highlight the great effect XSEDE is having on research. One glimpse of the potential insight that can be gained is discussed below. The example is covered more completely in the XSEDE14 paper contained at the end of the appendix, but is summarized here. Figure 2 below shows the Service Units (SU) allocated vs the h-index produced for each Field of Science (FOS). The circle size is proportional to the number of projects in a given FOS. When the fitted trend is removed, the number of SUs received begins to diverge from the expected value that would give higher h-index ratings. This could imply that certain FOSs are producing a given impact more efficiently (requiring fewer resources than expected), while some others require more SUs than expected to produce the same impact. This is something that many in the community have understood anecdotally (that various fields of science need varying amounts of resources to have an impact), but this provides the quantitative evidence for the first time. Similar analysis can be pursued to ensure the SUs are used in a more efficient manner to gain the maximum return on our investments in national high performance computing assets.

Figure 2: Analysis of allocation amounts (SUs) against h-index by field of science.
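One way to read the "fitted trend is removed" step discussed for Figure 2 is as a regression residual: fit a trend of h-index against allocated SUs across fields of science, then look at how far each field sits above or below that trend. The sketch below assumes a straight-line least-squares fit against log10(SUs); the report does not state the form of the fit, so that choice, the function names, and the four data points (which are made up) are all assumptions for illustration only.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def detrended_h_index(fos_data):
    """Map each field to (observed h-index) minus (h-index predicted from its SUs).

    fos_data maps a field name to a (SUs allocated, h-index) pair.
    """
    names = list(fos_data)
    xs = [math.log10(fos_data[name][0]) for name in names]  # assumed log scale
    ys = [fos_data[name][1] for name in names]
    slope, intercept = fit_line(xs, ys)
    return {name: y - (slope * x + intercept) for name, x, y in zip(names, xs, ys)}


if __name__ == "__main__":
    # Made-up (SUs allocated, h-index) pairs; not taken from the report.
    example = {
        "Field A": (1e6, 20),
        "Field B": (1e7, 45),
        "Field C": (1e8, 50),
        "Field D": (1e9, 85),
    }
    for name, residual in sorted(detrended_h_index(example).items()):
        print(f"{name}: {residual:+.1f}")
```

Under these assumptions, a positive residual marks a field that shows a higher h-index than the trend predicts for its allocation, and a negative residual marks one that shows a lower h-index.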

I.2 Longitudinal Tracking Proof of Concept Study (July 2011 – July 2014) A major goal of the Training, Education Outreach and Services (TEOS) component of XSEDE is to cultivate and expand the user group through its various activities. To assess the extent to which training, education and outreach activities result in increased capacity and use of XSEDE resources, the external evaluation team conducted a proof of concept longitudinal tracking study to

understand how targeted user groups progress through typical pathways of engagement with XSEDE resources via the user portal. Data available from the portal for conducting the analysis included allocations, training registration, and resource usage (namely job submission and execution). Declined allocation requests and help desk tickets were linked with portal data in order to conduct a deeper analysis. Groups targeted in the evaluation include: XSEDE Portal users, Training registrants, Campus Champions, Users of Campus Champion allocations, XSEDE Scholars, and Summer Research Experience participants. Results from the Longitudinal Tracking Study were provided to Level 2 and 3 managers to inform target setting and continuous improvement efforts.

I.2.1 Findings

I.2.1.1. All XSEDE Portal Users

Over 33,900 people created an XSEDE portal account and received a user ID in PY1-3. Seventy-nine percent (N=26,536) had access to an allocation. Slightly less than half (N=12,662; 48%) of those with allocation access executed at least one job in PY1-3. Of the 14,363 jobs executed, 55% used research allocations, 20% used startup allocations, 23% used education allocations and 3% used Champion allocations.

[Figure: XSEDE Portal Users: 2008 – July 2014, N = 33,904. Bar chart values: XUPID created PY1-3, 33,904; Allocation Access, 26,536; Job Executors, 12,662.]


[Figure: Jobs by Allocation Type for XSEDE Portal Users: 2008 – July 2014, N = 14,363. Pie chart; legend: Research, Education, Startup, Champion; segment values as extracted: 55%, 20%, 23%, 3%.]

These findings indicate that while more than three-fourths of those registered in the XSEDE Portal have access to an allocation, many do not execute jobs. It may be useful to conduct a needs assessment with 1) XSEDE portal users who do not have allocation access (N=7,368) and 2) those with allocation access who have not executed jobs (N=13,874) to better understand their circumstances and improve support. The data also suggest that good use is being made of the various types of allocations (e.g. research, education, startup). Future longitudinal studies should disaggregate use by type of allocation to gain insight into different patterns of XSEDE resource use.

I.2.1.2. Training

Training Registrants: July 2011 – July 2014, N = 6,405

[Bar chart values: Training Registrants, 6,405; Allocation Access, 3,585; Job Executors, 2,098.]


[Figure: Jobs by Allocation Type for Training Registrants: July 2011 – July 2014, N = 2,955. Pie chart; legend: Research, Education, Startup, Champion; segment values as extracted: 34%, 34%, 26%, 7%.]

More than 6,400 individuals registered for training through the XSEDE portal in PY1-3. Of those, 56% (N=3,585) had allocation access (77% had access prior to their first training; 23% gained access after training). More than half (58%) of training participants executed at least one job in PY1-3. Of those, 64% executed a job prior to their first training; 37% executed after training. The 2,955 jobs executed were fairly evenly distributed across research, education and startup allocations, with 7% running on Champion Allocations.

These data suggest that patterns of engagement of current training registrants are diverse. They engage in training at various points in the allocation process and a significant number (44%) have no access to an allocation before or after training. This diversity presents challenges to the design and delivery of training and raises questions regarding the extent to which training content should be more aligned to enable allocation and job execution or remain broad to engage a wide array of users and potential users. Training registrants make use of a full array of allocation types.

I.2.1.3. Campus Champions

The Campus Champion Program is one mechanism for promoting use of XSEDE resources on campuses across the nation. Champions are given a Champion Allocation for their own use or as a means of allowing others on their campuses to easily experiment with XSEDE resources.


[Figure: Campus Champions: July 2011 – July 2014, N = 226. Bar chart values: Champions, 226; XUPID, 214; Allocation Access, 184; Job Executors, 97.]

Of the 226 Campus Champions, 95% have an XSEDE portal ID and 81% have access to a Champion Allocation. About 40% of Champions have executed a job on some XSEDE allocation. Given these data, Level 3 managers have begun to monitor and promote use of Champion Allocations with a goal of 100% of Campus Champions having access to and executing at least one job on a Champion Allocation in PY4.

Other Users of Campus Champions Accounts: July 2011 – July 2014, N = 667

[Bar chart values: New Champion Allocation Access, 667; Job Executors on Champion Account, 255; Job Executors on Other Accounts, 100.]

More than 660 additional users have accessed Champion Allocations, with 38% actually executing jobs on the account. Of those who executed a job on the Champion Allocation, 39% (N=100) went on to execute a job on another account. Most frequently (42%), they progressed to their own start up account.
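The step-to-step percentages quoted for users of Champion Allocations follow directly from the three counts in the chart (667, 255, 100). As a minimal sketch, the hypothetical helper `funnel_rates` below expresses each stage as a percentage of the stage before it; the function name and stage labels are illustrative and are not part of the evaluation team's tooling.

```python
def funnel_rates(stages):
    """Return (label, percent of previous stage) for every stage after the first.

    stages is an ordered list of (label, count) pairs.
    """
    rates = []
    for (_, previous), (label, current) in zip(stages, stages[1:]):
        rates.append((label, 100.0 * current / previous))
    return rates


if __name__ == "__main__":
    # Counts reported for other users of Champion Allocations, July 2011 to July 2014.
    champion_users = [
        ("New Champion Allocation access", 667),
        ("Job executors on Champion account", 255),
        ("Job executors on other accounts", 100),
    ]
    for label, pct in funnel_rates(champion_users):
        print(f"{label}: {pct:.0f}% of previous stage")
```

The two computed rates round to 38% and 39%, matching the figures in the paragraph above.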


[Figure: Jobs by Allocation Type of Other Users: July 2011 – July 2014, N = 180. Pie chart; legend: Research, Education, Startup; segment values as extracted: 31%, 27%, 42%.]

These findings demonstrate the utility of the Champion Allocation as a means of exposing new users to XSEDE allocations.

I.2.1.4. Student Programs

XSEDE operates two student programs, Summer Research Experience (SRE) and XSEDE Scholars, with the goal of introducing more students to high performance computing and increasing the diversity of the XSEDE user base. One indicator of impact for these programs is the extent to which student participants execute jobs on XSEDE allocations.

Summer Research Experience Cohorts 1 – 3: July 2011 – July 2014, N = 43

[Bar chart values: SRE Participants, 43; XUPID, 43; Allocation Access, 39; Training, 18; Job Executors, 23.]


[Figure: Jobs by Allocation Types for Summer Research Experience Cohorts 1 – 3: July 2011 – July 2014, N = 32. Pie chart; legend: Research, Education, Startup, Champion; segment values as extracted: 9%, 63%, 19%, 9%.]

All SRE participants had a portal ID and 91% had access to an allocation. Forty-two percent participated in training. More than half (53%) of the students in the SRE executed jobs (32 jobs total) as part of their program, taking advantage of all allocation types.

XSEDE Scholars Cohorts 1 – 3: July 2011 – July 2014, N = 135

[Bar chart values: XSEDE Scholars, 135; XUPID, 108; Allocation Access, 63; Training Participants, 30; Job Executors, 9.]


[Figure: Jobs by Allocation Type for XSEDE Scholars Cohorts 1 – 3: July 2011 – July 2014, N = 12. Pie chart; legend: Research, Education, Startup, Champion; segment values as extracted: 25%, 42%, 33%, 0%.]

XSEDE Scholars exhibit lower levels of engagement, with 80% having a portal ID and 47% having access to an allocation. Fewer than a fourth (22%) of the Scholars took part in XSEDE training outside of that delivered by the Scholars program. Approximately 7% of the Scholars executed a job (12 jobs total) in PY1-3. To enhance program impact, steps should be taken to increase Scholars’ portal registration, account access and active participation in training opportunities.

I.3 Conclusions

Longitudinal tracking using the XSEDE portal appears to be a viable means to assess patterns of use and impact of TEOS and other XSEDE resources. Implementation of new XRAS evaluation surveys and greater integration of the XSEDE User Survey into the portal will enhance our capacity to refine and extend future analyses. Level 2 and 3 managers found the data useful in gaining greater understanding of their programs, setting metrics, and promoting integration across programs.


Towards a Scientific Impact Measuring Framework for Large Computing Facilities - a Case Study on XSEDE

∗ Fugang Wang, Gregor von Laszewski, Geoffrey C. Fox
Indiana University
2719 10th Street
Bloomington, Indiana, U.S.A.

Thomas R. Furlani, Robert L. DeLeon, Steven M. Gallo
Center for Computational Research
University at Buffalo, SUNY
701 Ellicott Street
Buffalo, New York, 14203

ABSTRACT

We present a framework that (a) integrates publication and citation data retrieval, (b) allows scientific impact metrics generation at different aggregation levels, and (c) provides correlation analysis of impact metrics based on publication and citation data with resource allocation for a computing facility. Furthermore, we use this framework to conduct a scientific impact metrics evaluation of XSEDE. We carry out an extensive statistical analysis correlating XSEDE allocation size to the impact metrics aggregated by project and field of science. This analysis not only helps to provide an indication of XSEDE’s scientific impact, but also provides insight regarding maximizing the return on investment in terms of allocation by taking into account the field of science or project based impact metrics. The findings from this analysis can be utilized by the XSEDE resource allocation committee to help assess and identify projects with higher scientific impact. It can also help provide metrics regarding the return on investment for XSEDE resources, or other institutional or campus resources for which an analysis of impact based on publications is important.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics - complexity measures, performance measures

General Terms

Theory, Measurement

Keywords

Scientific impact, bibliometric, h-index, Technology Audit Service, XDMoD, XSEDE

∗ Corresponding Author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
XSEDE ’14, July 13 - 18 2014, Atlanta, GA, USA
Copyright 2014 ACM 978-1-4503-2893-7/14/07...$15.00.
http://dx.doi.org/10.1145/2616498.2616507

1. INTRODUCTION

It is a well-known fact that many science and engineering innovations and discoveries are increasingly dependent on access to high performance computing resources. For many researchers, this demand is met by large-scale compute resources that cannot typically be supported by any single research group. Accordingly, dedicated large-scale computing facilities play an important role in scientific research, in which resources are shared among groups of researchers, while the facilities themselves are managed by dedicated staff. Indeed, the National Science Foundation and the Department of Energy have supported such facilities for many years. One such facility is the Extreme Science and Engineering Discovery Environment (XSEDE) [10]. XSEDE allocates resources to approved projects, which represent a substantial financial investment by NSF. Thus, justification for their use is warranted and questions regarding the scientific impact of these resources naturally arise, including:

1. Is there a way to measure the impact that such facilities provide to scientific research?
2. Is there a correlation between the size of a given allocation and the scientific impact of an individual user, a given project, or a field of science?
3. When evaluating a proposal request, what are the criteria to judge whether the proposal has the potential to lead to impactful research, and how does one obtain metrics to substantiate this?

To answer these questions, first we need a process to quantify the scientific outcome for the individual researchers. Secondly, we need to define and generate metrics to measure the scientific impact for individual researchers and higher level aggregated entities. Finally, we correlate the impact metrics to the consumed resources, to provide insight on how the computing facility benefits and impacts the science conducted utilizing its resources. In this paper, we present a framework that addresses these questions. It is important to point out that measuring scientific impact can be quite controversial and that the presented results do not necessarily represent an absolute measure of the impact of a scientific project, but rather the results we present represent one of many factors that together define the scientific impact.

Furthermore, while we have restricted our analysis of scientific impact as it relates to XSEDE, the work presented here has general applicability to not only HPC resources, but to other resources including campus based HPC centers, beam lines, and other expensive equipment.

In particular, we focus our effort to identify impact based on scientific publications as the base unit of the research productivity and obtain data, as well as derive various metrics based on publication data to measure the impact of individual users, projects, Field of Science (FOS), and XSEDE itself as a whole.

In the following sections we briefly review related work (Section 2), then present our designed framework (Section 3) and implementation details (Section 4). Then, we discuss results and their impact (Section 5). Finally, we outline ongoing activities, indicate future plans (Section 6), and provide a summary (Section 7).

2. RELATED WORK

Our choice of using publication as the basic unit to measure the scientific impact is supported by the fact that bibliometrics based criteria are one of the de-facto standards to measure the impact of research. For example, publication derived metrics are broadly used in faculty recruitment/promotion, and institutional rankings [19]. While usage based metrics are proposed by some [12, 14, 13], citation based metrics are a widely accepted measure. For instance, nanoHub uses publication and citation derived metrics to measure the impact of their project [7].

In addition to the intuitive measures like number of publications and number of citations, h-index [17] and g-index [15] are two other popular metrics. The publication count is often related to a measure of the productivity, while citation counts are often related to the quality, or impact of the work published. As h-index and g-index calculate the metric by combining this data, they measure both the productivity and the quality, thus providing a general measure for impact.

There are existing tools to measure the metrics for individual users, e.g. Scholarometer [18] and Publish or Perish [8]. These could be potentially leveraged to analyze a relatively small group of users, e.g., the work [11] showing TeraGrid’s impact based on limited data from one resource allocation meeting consisting of only 112 selected PIs. However, neither of the tools provides a scalable solution to the large community we are concerned with here, namely the over 20,000 users who have utilized TeraGrid/XSEDE resources.

While more formal publication based metrics, either based on citation or usage, are still the most widely employed criteria, other non publication based metrics have been proposed by altmetrics [1] while also considering measures for datasets and code; as well as mentioning of a snippet of work via social networking. Furthermore, often we do not have reliable data available.

3. SYSTEM DESIGN

We have designed a software framework to support measuring scientific impact via a publication and citation based approach. The framework is based on a distributed set of services. The service-oriented system consists of components for (a) publication and citation data retrieval (e.g., from NSF award database, Google Scholar, and ISI Web of Science), (b) parsing and processing while correlating data from various databases and services, such as the XSEDE central database (XDcDB), which stores all usage data for jobs run on XSEDE resources, and (c) the Partnerships Online Proposal System (POPS) database, which stores publication and grant funding information for PIs applying for XSEDE allocations. The system also includes components for metrics generation and an analysis system for different aggregation levels (users, projects, organization, Field of Science), as well as a presentation layer using a lightweight portal in addition to exposing some data via RESTful API.

Fig 1 shows the layered system architecture, with an emphasis on the relationships between related components, especially those integrating with databases. On the core App layer we have the database mining and publication mashup components. The database mining component generates XSEDE user specific publication data as well as user, project, and Field of Science (FOS) views. The publication mashup component aggregates the publication data mined from the previous component, in addition to those from XDcDB, and from other available external services. It also retrieves citation data for each publication from external services. Another essential task of this component is to generate metrics for users, projects, and FOS, in which the POPS database is involved to get proposal and project data. This data will be stored into the mashup database, which can then be integrated into the XDMoD [16] system at our partner site at University at Buffalo. We also expose some data and analysis results via RESTful API and a portal as denoted on the Service/GUI layer. The Data layer illustrates the databases involved. The External Resource/Services layer lists the third party resources and services that we are currently using, have experimented with, or plan to investigate.

To conduct the analysis, the general workflow includes obtaining the publication data for each XSEDE user, and then retrieving the citation data for each publication. Hence, the data is originally collected on a per user and per publication basis. As part of processing the data we are aggregating it based on organization, XSEDE project/account, and FOS. By correlating the data (for example the Service Units (SU) awarded by XSEDE) our intention is to identify if the analysis may reveal patterns and trends of how XSEDE can impact the sciences and possibly help to achieve a better measure of return on investment (ROI) for NSF. While we are using the system to analyze the scientific impact of XSEDE, the framework itself is flexible enough so that it could be easily adapted to other similar systems for impact measure and analyses.
We acknowledge these efforts willl be 4. IMPLEMENTATION useful as they correlate other usage, however at this time we We have implemented the system following best practices still lack standards and a well-established way to objectively and leveraging popular tools and frameworks. The core sys- derive scientific impact from the multitude of data sources. tem is developed in Python and various libraries are utilized Service / GUI approach is to obtain the publication data directly from the IU RESTful Portal API Presentation users. This is desirable since user curated data tends to be
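As a rough illustration of the workflow described in Section 3 (collect citation counts per user and per publication, then aggregate by project or FOS), combined with the h-index and g-index from Section 2, the following Python sketch may help. The record fields (`pub_id`, `user`, `project`, `fos`, `citations`), the sample data, and all function names are assumptions made for this example; they are not taken from the actual framework.

```python
from collections import defaultdict


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g publications together have >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


def aggregate(records, key):
    """Group per-publication citation counts by `key` and compute metrics.

    A publication is counted once per group, even when several users in
    the same group are co-authors of it (hypothetical record layout).
    """
    groups = defaultdict(dict)
    for rec in records:
        groups[rec[key]][rec["pub_id"]] = rec["citations"]
    result = {}
    for name, pubs in groups.items():
        counts = list(pubs.values())
        result[name] = {
            "n_pubs": len(counts),
            "n_cites": sum(counts),
            "h_index": h_index(counts),
            "g_index": g_index(counts),
        }
    return result


# Hypothetical per-user, per-publication records.
records = [
    {"pub_id": "p1", "user": "alice", "project": "TG-A", "fos": "Chemistry", "citations": 10},
    {"pub_id": "p1", "user": "bob", "project": "TG-A", "fos": "Chemistry", "citations": 10},
    {"pub_id": "p2", "user": "alice", "project": "TG-A", "fos": "Chemistry", "citations": 3},
    {"pub_id": "p3", "user": "bob", "project": "TG-B", "fos": "Physics", "citations": 1},
]

by_user = aggregate(records, "user")
by_project = aggregate(records, "project")
by_fos = aggregate(records, "fos")
```

Keeping the raw records at the per-user, per-publication level and aggregating on demand means the same data can serve every aggregation level without recomputing citation counts.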

[Figure 1: The Architecture of the Framework. Layers shown: Service/GUI (portal, RESTful API, presentation); App (TAS publication services, queries to 3rd party services, NSF award search, publication DB mining, mashup); Data (XDMoD data warehouse, NSF award publication data for XD people, TAS pubs mashup db, XDcDB mirror, POPS proposals and accounts, portal publications, XD entity views); External Resource/Service (Google Scholar, Microsoft Academic Search, Mendeley, ISI Web of Science, CiteSeerX, PubMed, ACM, IEEE).]

The portal/web tier follows the Ajax approach, which provides a better experience to users while viewing the presented data and charts. Publication and citation data retrieval was a complex but essential part of our study, so we provide details of this process next.

4.1 Publication Data Acquisition

Given the size of the XSEDE user database, which as of Jan 2014 was over 20,000 users, we needed to employ an automated approach to obtain publication data on behalf of each user. Publication citation data are available via subscribed resources such as ISI Web of Science [4] or open access resources such as Google Scholar [2], Microsoft Academic Search [6], and Mendeley [5]. Unfortunately, these usually do not provide unlimited access, making automated publication retrieval impractical for some services.

Another approach is to obtain the publication data directly from the users. This is desirable since user curated data tends to be more accurate than automated publication mining. Additionally, it can provide extra information regarding a publication’s association with the system, e.g., with which project a given publication is associated. XSEDE provides such functionality via the user portal. However, this framework was introduced only recently. Thus, the collected data is small and insufficient for our in-depth analyses. The vetting and gathering of data by users through Web forms is also conducted by other projects, such as the nanoHub citation analysis [7].

Our framework supports pluggable data sources that allow for the mining of databases and/or accessing 3rd party service APIs for publication data. In this study, we focus on only two of the data sources: the user-submitted publication data from the XSEDE portal, and the extensive NSF award database for automated mining. The former source has user curated data with project affiliation information, and thus in principle gives a measure of the direct impact of XSEDE. However, it has limited data entries to date. On the other hand, the NSF award database contains an extensive compilation of publications that can be automatically mined to pull out all publications for a given XSEDE user. While we cannot directly correlate the publications obtained in this way with XSEDE resources (since the NSF database contains all NSF-related publications for a given user regardless of whether the publication was associated with XSEDE use), it nonetheless provides a measure of the general or indirect impact of XSEDE. As a given XSEDE user is affiliated with accounts/projects, and the projects are part of one or more FOS, we can tag a publication as being related to the projects and a FOS based on these indirect correlations. Although not ideal, this makes it possible to analyze an indirect impact.

Based on this technique, we have been able to obtain over 142,000 publication entries for over 20,000 XSEDE users as of Jan 2014. This by itself is a substantial accomplishment, as we know of no other database with this level of detail that can be correlated to researchers participating in XSEDE. To provide a quick overview of the data analyzed, we refer to Figure 2, which shows the yearly distribution of the publications (histogram in (a)), and the distribution of the number of publications by project (left boxplot in (b)) and per user for each project (right boxplot in (b)).

[Figure 2: Distribution of the Publication Data. (a) Number of Publications Retrieved by Year (# pubs vs year, 1990 to 2010). (b) Distribution of number of publications (log10(# pubs), by project and per user avg by project).]

4.2 Citation Data Retrieval

While the publication data from the user curated source might be more ideal, we need to conduct an automated search to identify the subsequent citations of the publications recovered from the NSF award database to help provide an indication of the quality of the research. Due to the size of this publication data (over 142,000 publications), the only realistic way to accomplish this is with an automated process. We have used Google Scholar and ISI Web of Science as sources for the citation data. In order to compare the two methods of obtaining citations, we explored Google Scholar and ISI data for a subset of the publication data and compared the results. While a similar comparison has been attempted [20], it was restricted to a very small sample size: 2 people and about 100 publications. In comparison, our study included 33,861 publications and 1,462 users; moreover, the authors involved were identified as having an XSEDE account.
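One simple way to compare two citation sources for the same set of publications is to fit a straight line through the paired counts and compare it with the reference line y = x. The sketch below uses an ordinary least-squares fit in plain Python; the function name and the sample counts are hypothetical and only illustrate the idea, not the code used in the study.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical citation counts for the same five publications.
isi = [0, 4, 10, 25, 60]   # counts from ISI Web of Science
gs = [1, 6, 12, 30, 70]    # counts from Google Scholar

intercept, slope = linear_fit(isi, gs)
print(f"gs = {intercept:.2f} + {slope:.2f} * isi")
```

A slope above 1 would indicate that the second source tends to report more citations than the first for the same publication; a slope near 1 with a small intercept would indicate that the two sources are nearly interchangeable.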

[Figure 3: Comparison of metrics derived from GS vs ISI. (a) Citation counts comparison for a subset of our publication data (citation count from GS vs citation count from ISI; r: 0.8411; linear fit gs = 7.02 + 1.14 isi; reference line y = x). (b) H-index comparison for a subset of XSEDE PIs (h-index based on GS vs h-index based on ISI; r: 0.9669; linear fit gs = 0.09 + 1.14 isi; reference line y = x).]

The result of this activity is depicted in Figure 3. Part (a) shows the correlation of the citation data from Google Scholar (GS) with that from the ISI Web of Science (ISI). Part (b) shows the h-index derived from Google Scholar citation data correlated with that calculated from ISI citation data. In both cases a high positive correlation is observed. The Pearson correlation coefficients (r) are 0.84 and 0.97 respectively. The very strong correlation of the h-index values is mostly due to the fact that one of the two factors determining the h-index, the number of publications, stays the same for a particular user.

Based on this study, while being aware of the limitations, we were able to use the ISI citation data to get very similar measures for most of the data, especially for the h-index metric. This is especially useful considering that, because of restricted access rights, we have difficulty retrieving a complete citation data set from Google Scholar for each of our relevant users and publications. Thus, the following analyses use only citations from ISI.

5. RESULTS AND ANALYSES

The previous section described the method used to extract publication and citation data for XSEDE users. With this data now in hand, we discuss the metrics derived from it with the goal of providing a measure of scientific impact. We also conduct analyses to determine if a correlation exists between the data and various categories such as the field of science.

5.1 Direct impact of XSEDE

By using only the user-vetted submitted publications, we were able to show the direct scientific impact of XSEDE. As of Jan 27, 2014, there are 837 publications registered, involving 882 XSEDE users as authors, 220 organizations, 331 XSEDE projects, and a total of 11,258 citations to date. Please note that these values are based on continuously growing publication data. So far, not all users have associated their publications with XSEDE projects, or they have not uploaded them to the portal yet. We are working with the XSEDE portal team to provide a publication discovery service in which a user would be presented with a list of publications to curate. This is expected to ease the publication acquisition process so that more user curated publication data will be available for our future analyses.

Based on the currently available data, we calculated a series of metrics aggregated by user, organization, project, and FOS. For each entity, we include the number of publications (as header # of Pubs), number of citations (as header Cited by), h-index and g-index. We also include the m factor of the h-index, which indicates the slope of the h-index over the years spanned by the publications. This can be used to compare the efficiency between peers if they have the same h-index. Another metric we compute is the i10-index [3], which was first introduced in Google Scholar and counts the publications that have received at least 10 citations each. For all the metrics excluding the m factor of the h-index, we also compute a recent version using only the publications from the last 5 years. The time-limited comparison helps to compare peers based on recent work by eliminating effects from older publications.

5.2 Project metrics vs SUs allocation

[Figure 4: Impact Metrics (number of publications, number of citations, h-index, g-index) vs SUs for all projects. Four panels: log10(# pubs), log10(# cited), log10(h-index), and log10(g-index), each against log10(SUs allocation).]

Table 1: Correlation between SUs allocated vs the impact metrics for each project

Project type               Metric    r (Pearson's)   df     p-value
All Projects               # pubs    0.242           6278   < 2.2e-16
                           # cites   0.243                  < 2.2e-16
                           h-index   0.228                  < 2.2e-16
                           g-index   0.220                  < 2.2e-16
Research Projects          # pubs    0.381           1677   < 2.2e-16
                           # cites   0.377                  < 2.2e-16
                           h-index   0.319                  < 2.2e-16
                           g-index   0.305                  < 2.2e-16
Campus Champion Projects   # pubs    0.335           86     0.001
                           # cites   0.315                  0.003
                           h-index   0.344                  0.001
                           g-index   0.325                  0.002
Startup Projects           # pubs    0.025           3944   0.118
                           # cites   0.027                  0.091
                           h-index   0.031                  0.048
                           g-index   0.035                  0.029
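The quantities reported in Table 1 (Pearson's r, degrees of freedom, and a p-value) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name and the sample data are hypothetical, the p-value uses a normal approximation to the t distribution that is only reasonable for large df, and whether the table's coefficients were computed on raw or log10-transformed values is not stated, so any transform is left to the caller.

```python
import math


def pearson_test(x, y):
    """Return (r, df, t, p_approx) for the Pearson correlation of x and y.

    df is n - 2 and t = r * sqrt(df / (1 - r^2)). p_approx is a two-sided
    p-value from the normal approximation, adequate only for large df.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    if 1.0 - r * r <= 0.0:
        # Perfect linear relationship: t is unbounded.
        return r, df, math.copysign(math.inf, r), 0.0
    t = r * math.sqrt(df / (1.0 - r * r))
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return r, df, t, p_approx


# Hypothetical per-project values: allocations and publication counts.
sus = [1, 2, 3, 4, 5, 6]
npubs = [1, 3, 2, 5, 4, 6]
r, df, t, p_approx = pearson_test(sus, npubs)
```

With thousands of projects, even a small r can yield a very small p-value, which is why a coefficient and its significance need to be read together.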

Figure 4 shows the correlation analysis of impact metrics (number of publications, number of citations, h-index, and g-index) versus XSEDE resource allocation (number of SUs) for an individual project (research, start-up, campus champion, etc.). Previous work showed a stronger correlation between citations and SUs [11] using a much smaller sample taken from a specific XSEDE resource allocation meeting. However, we observed a weaker correlation, if any. When the projects are categorized by type (research, startup, campus champion, etc.), each category other than the startup projects/allocations shows a slightly stronger correlation, although still not a strong one. Figure 5 shows the analysis for research projects only. Table 1 lists the correlation coefficient values as well as the p-values showing the significance of the test. Please note that in Figures 4 and 5 we included a regression line showing the upper trend of the correlation, i.e., higher SU allocations correlating with higher impact metrics, without suggesting a linear relationship. This correlation analysis does not show causality, especially since funding and impact are expected to be related in a feedback loop.

[Figure 5: Impact Metrics (number of publications, number of citations, h-index, g-index) vs SUs for research projects. Four panels: log10(# pubs), log10(# cited), log10(h-index), and log10(g-index), each against log10(SUs allocation).]

5.3 Metrics vs SUs allocation on FOS level

While for individual projects we do not observe strong correlations between impact metrics and resource allocations, Figure 6 shows a stronger positive correlation on the FOS level (132 FOS involved). The Pearson correlation coefficients (r) are 0.704, 0.712, 0.651, and 0.648 respectively for the four impact metrics: number of publications, number of citations, h-index and g-index. This result is statistically significant and shows a strong relationship between these variables.

[Figure 6: Impact Metrics (number of publications, number of citations, h-index, g-index) vs SUs for FOS’s. Four panels: # pubs, # cites, h-index, and g-index, each against SUs allocation, on logarithmic axes.]

The stronger correlations are most likely caused by the effect of the different sizes of the FOS’s. However, this does not diminish the conclusion of the analysis, which shows how XSEDE impacts science in different disciplines, e.g., by approving more projects and granting more allocations for certain FOS’s.

Figure 7 shows the SUs allocated (transformed to a logarithmic scale) vs the h-index produced for each FOS, with the circle size proportional to the size (number of projects) of the FOS. It also shows that, after removing the fitted trend, we can see a divergence of the SUs received from the SUs expected to produce the given impact as judged by the h-index. This could imply that certain FOS’s produce a given impact more efficiently (requiring fewer resources than expected), while others require more SUs than expected to produce the same impact. An interactive version of this plot is available via the portal interface [9].

[Figure 7: SUs vs h-index for each FOS with trend (above) and residual analysis (bottom). Top panel: log10(SUs) vs h-index. Bottom panel: fit residual of log10(SUs) after removing h-index, vs h-index.]
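The residual analysis described for Figure 7 can be sketched as follows. The form of the fitted trend is not given in the text, so this example assumes a straight line fitted to log10(SUs) as a function of h-index; the function names and the FOS rows are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sus_residuals(fos_rows):
    """Residual of log10(SUs) after removing a linear trend in h-index.

    fos_rows is a list of (name, h_index, total_sus) tuples. A positive
    residual means the FOS received more SUs than the trend predicts for
    its h-index; a negative residual means it received fewer.
    """
    xs = [h for _, h, _ in fos_rows]
    ys = [math.log10(sus) for _, _, sus in fos_rows]
    intercept, slope = fit_line(xs, ys)
    return {
        name: y - (intercept + slope * x)
        for (name, _, _), x, y in zip(fos_rows, xs, ys)
    }


# Hypothetical FOS-level data: (name, h-index, SUs allocated).
rows = [("A", 10, 1e4), ("B", 20, 1e7), ("C", 30, 1e6)]
residuals = sus_residuals(rows)
```

In this made-up data, the FOS named "B" sits above the trend line, which is the kind of divergence the bottom panel of Figure 7 is meant to expose.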

As we see, the size of a FOS significantly affects the impact as well as the allocations (for h-index as in Figure ??). We can eliminate this effect by comparing the average values within each FOS, dividing by the number of projects, as shown in Figure 8, while Table 2 has the values. It shows a weak correlation of the per-project metrics vs SUs for the number of publications and citations, which is actually not significantly different from the result presented in Table 1. We did not observe any correlation between allocation and h-index or g-index. This is probably caused by the fact that these two metrics do not work well when averaged, as they are not cumulative or additive values.

[Figure 8: Impact Metrics (number of publications, number of citations, h-index, g-index) vs SUs for FOS (avg by project). Four panels: log(avg of npubs per project), log(avg of ncites per project), log(avg of h-index by project), and log(avg of gindex per project), each against log(avg of SUs per project).]

Table 2: Correlation between average SUs allocated vs the average impact metrics (by projects) for each FOS

Group                              Metric    r (Pearson's)   df    p-value
Average per project for each FOS   # pubs     0.221          132   0.010
                                   # cites    0.222                0.010
                                   h-index   -0.043                0.620
                                   g-index   -0.035                0.688

However, as shown in Figure 9, within each FOS the project-level correlations of metrics vs SUs are typically higher, especially for large FOS’s. With increasing size of the FOS (n=10, n=50, and n=100 are denoted as vertical lines), the correlation appears more strongly positive and more significant. Figure 10 shows the distribution of correlation coefficients (r) between number of publications and allocations for each project within the same FOS, grouped by size of FOS (number of projects). Note the general trend that the extremes and ranges narrow, and the medians increase (above 0.4 for groups of FOS with more than 50 projects), as the FOS size increases. This suggests that for the majority of the FOS, impact metrics for a project do have a positive correlation with SUs allocated to the project. By investigating the individual data points, we would be able to find in which FOS this correlation appears much stronger, and in which it is weak. This could potentially be used during resource allocation to help determine which projects should be preferred when resources are limited but demands are high.

[Figure 9: Correlation coefficient (r) of impact metrics vs SUs on project level for each FOS. Four panels: corr. coef. of project npubs, ncites, h-index, and g-index vs SUs within each FOS, with FOS sorted by number of projects in decreasing order and markers at n>=100, n>=50, and n>=10.]

[Figure 10: Distribution of r grouped by size of FOS. Groups: nprojs<10, 10<=nprojs<50, 50<=nprojs<100, nprojs>=100.]

5.4 Scientific Impact Produced per SU Allocation Unit

The publications database acquired from the NSF awards database includes all publications from XSEDE users rather than just those relevant to XSEDE. As such, these publications present only an indirect measure of the scientific impact of XSEDE, diluted by the presence of many publications that are not related to the XSEDE resources. A more ideal and direct measurement of XSEDE’s scientific impact is obtained from the user curated publication database. We have shown in the previous section that these scientific impact metrics can be used to measure the scientific impact of XSEDE in general, as well as to compare individual users, projects, and FOS with their peers. As these metrics are obtained from the publications that are tagged as results from an XSEDE project, we can also analyze the scientific impact produced per SU allocation unit.
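A per-SU impact analysis of this kind can be sketched as a log-log least-squares fit. In the sketch below, each metric is scaled to impact per million SUs and regressed on log10(SUs); if the fitted slope is b, total impact grows like SUs^(1+b), so doubling the allocation multiplies the impact by 2^(1+b). The function names and the synthetic data are assumptions for illustration, and projects with a zero metric would have to be excluded before taking logarithms.

```python
import math


def loglog_fit(sus, metric):
    """Fit log10(metric per million SUs) = intercept + slope * log10(SUs)."""
    xs = [math.log10(s) for s in sus]
    ys = [math.log10(m / (s / 1e6)) for m, s in zip(metric, sus)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def doubling_factor(slope):
    """Factor by which total impact changes when the SU allocation doubles.

    From log10(I / S) = a + b * log10(S) it follows that I is proportional
    to S ** (1 + b), hence the factor 2 ** (1 + b).
    """
    return 2.0 ** (1.0 + slope)


# Synthetic projects whose impact grows only as SUs ** 0.1.
sus = [1e4, 1e5, 1e6, 1e7]
metric = [s ** 0.1 for s in sus]
intercept, slope = loglog_fit(sus, metric)
factor = doubling_factor(slope)
```

For slopes like those shown in Figure 11 (roughly between -0.85 and -0.92), the doubling factor is only slightly above 1, which is another way of saying that doubling the allocation is far from doubling the measured impact.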

We have calculated the scientific impact for the projects involved (302 out of more than 6,000 in total) based on the direct metrics obtained earlier and the SUs allocated to them (in million SUs). Figure 11 shows a series of four log-log plots in which four different scientific impact metrics for each project are scaled by the SU allocation and then plotted against the total allocation. Previously we have demonstrated the positive correlation between scientific impact metrics and the resource allocation of projects within each FOS. Figure 11 suggests that, based on these metrics of scientific impact, that is, number of papers, citations, h-index, and g-index scaled by SUs, sponsoring a larger number of smaller-scale projects could actually produce a higher scientific impact than a smaller number of very large projects. In other words, we cannot expect a project that received double the amount of SUs of another project to produce double the impact, as measured by number of publications, citation counts, h-index, and g-index.

[Figure 11: Four different measures of scientific impact per SU allocated. Note that the Y-axis gives scientific impact scaled by SU allocation. Therefore, these plots indicate that as the allocation size grows there is a diminishing scientific impact per SU allocated. Panels plot log10(npubs/million SUs), log10(ncites/million SUs), log10(h-index/million SUs), and log10(g-index/million SUs) against log10(SUs), with least-squares lines y = −0.879x + 5.539, y = −0.852x + 5.98, y = −0.924x + 5.717, and y = −0.89x + 5.626 respectively; the R^2 values shown are 0.5309 and 0.8444 (first pair of panels) and 0.9146 and 0.8599 (second pair).]

Unfortunately, to date, the number of user curated publications is still too small. With more such data available over time, we anticipate repeating our analysis in order to derive the scientific impact of XSEDE and demonstrate the relationship between XSEDE-funded allocations and a variety of scientific impact metrics.

6. ONGOING AND FUTURE WORK

This paper does not address the researcher name ambiguity issue, which deserves dedicated research. The root cause of this issue is that the metadata of the publications simply does not include enough information to distinguish similar names so that they can be uniquely associated with XSEDE user names. This is not a problem specific to our study but one for automated bibliometrics analysis in general. In the future, we plan to tackle the problem using other available data such as field of science, organization, funding data, and co-author relationships, applying machine learning techniques as well as social-network-based analyses.

A useful compromise is to let users curate their publication list. We will include processes assisting in the curation of data in the workflow of vetting the papers. One pathway we currently pursue is to work with the XSEDE portal team and provide the publication data we have collected as a publication discovery service, in the hope of providing a more convenient way for users to quickly populate the vetted publications library.

We have also started another similar activity, in which we are attempting to extract and parse the publication data from past TeraGrid/XSEDE quarterly reports. This data, while not curated on a per-user basis, does have project-level association information and thus can serve quite well for most of our analyses.

As for the resource allocation, we currently consider only the Service Units (SUs), or cpu-hours, as this is the dominant factor thus far for measuring resource allocation in XSEDE. With the increasing importance of, and growing need for, storage allocations from big-data applications and Virtual Machine (VM) based allocations for those interested in cloud computing, we will need to bring these into the equation as well to cover more forms of resources in addition to SUs.

Finally, we are conducting social-networking-related analyses among publications, users, projects, FOS’s, etc. based on citation and co-authorship relations. Mining social networking media such as Twitter and Facebook is also planned to obtain usage data, among other altmetrics, to complement the publication-based scientific impact studies.

7. CONCLUSION

This paper presents a framework to facilitate the measurement of scientific impact and the evaluation of ROI for large computing facilities. We have used this framework to conduct an evaluation of the scientific impact of XSEDE by deriving various metrics and carrying out extensive statistical analyses. The major accomplishments include:

1. We have devised a process to obtain and manage publication and citation data from various sources for a given group of people. We have followed this workflow to obtain over 142,000 publications as well as the citation count data for over 20,000 XSEDE users.

2. Based on the consolidated relevant bibliometrics data, various scientific impact metrics are derived for users and other aggregated levels such as projects and field of science.

3. The results are presented via a lightweight portal, and are also exposed via database integration or RESTful services to other portals, including the XDMoD portal and the XSEDE portal. For example, we expose the publication data via a RESTful service API to the XSEDE portal team as a publication discovery service. This will help facilitate the identification and curation of XSEDE-enabled publications by XSEDE users.

4. Statistical analyses were carried out correlating the impact metrics with projects/proposals, field of science, and allocation data to help provide metrics that can be used to quantify the impact of XSEDE resources on scientific research. These analyses do show a positive correlation between XSEDE-funded allocations and various scientific impact metrics. When more data directly related to XSEDE are available, we expect to provide much more insight into the scientific impact of the XSEDE program.

5. We have conducted preliminary analyses on scientific impact produced per SU allocation unit based on user curated publication data with a limited sample size. This may provide a way to measure the ROI of XSEDE. We will conduct similar analyses when more user curated publications are available to further solidify the results.

[9] TAS scientific impact metrics and analysis dev/testing portal. URL: http://fgdev.pti.indiana.edu:8088/xdportalpub/.
[10] XSEDE. Web Page. URL: https://www.xsede.org/.
[11] J. Bollen, G. Fox, and P. R. Singhal. How and where the TeraGrid supercomputing infrastructure benefits science. Journal of Informetrics, 5(1):114–121, 2011.
[12] J. Bollen, M. A. Rodriguez, and H. Van de Sompel. MESUR: Usage-based Metrics of Scholarly Impact. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’07, pages 474–474, New York, NY, USA, 2007. ACM. URL: http://doi.acm.org/10.1145/1255175.1255273, doi:10.1145/1255175.1255273.
[13] J. Bollen, H. Van de Sompel, A. Hagberg, and R. Chute. A principal component analysis of 39 scientific impact measures.
PloS one, 4(6):e6022, 2009. It is obvious that continuous work is crucial to conduct [14] J. Bollen, H. Van de Sompel, and M. A. Rodriguez. longitudinal tracking of the data and deal with the issues Towards Usage-based Impact Metrics: First Results that XSEDE has so far provided limited amount of publica- from the Mesur Project. In Proceedings of the 8th tion data. Important is to note that this work has pioneered ACM/IEEE-CS Joint Conference on Digital Libraries, the workflow and the analysis capability on how to achieve JCDL ’08, pages 231–240, New York, NY, USA, 2008. the data gathering. This framework can be reused by vari- ACM. URL: ous groups enabling different services as part of XSEDE to http://doi.acm.org/10.1145/1378889.1378928, assist the portal and auditing teams. Moreover, this frame- doi:10.1145/1378889.1378928. work and its service oriented model makes it possible to ex- [15] L. Egghe. Theory and practise of the g-index. pand its usage beyond those targeted by XSEDE resources. Scientometrics, 69(1):131–152, 2006. It could be employed within other organizations such as De- [16] T. R. Furlani, B. L. Schneider, M. D. Jones, J. Towns, partment of Energy (DOE) or even a department of a uni- D. L. Hart, S. M. Gallo, R. L. DeLeon, C.-D. Lu, versity. Those that would like to consult with us on such A. Ghadersohi, R. J. Gentner, A. K. Patra, G. von specializations, can contact us for further details. Laszewski, F. Wang, J. T. Palmer, and N. Simakov. Using xdmod to facilitate xsede operations, planning 8. ACKNOWLEDGMENTS and analysis. In Proceedings of the Conference on Extreme Science and Engineering Discovery This work is part of the Technology Auditing Service (TAS) Environment: Gateway to Discovery, XSEDE ’13, project sponsored by NSF under grant number OCI 1025159. pages 46:1–46:8, New York, NY, USA, 2013. ACM. Lessons learned from FutureGrid have significantly influ- URL: enced this work. 
Gathering publications has first been pi- http://doi.acm.org/10.1145/2484762.2484763, oneered by FutureGrid influencing the development in the doi:10.1145/2484762.2484763. XSEDE portal. We would like to thank Matt Hanlon and Maytal Dahan for their efforts to integrate this framework [17] J. E. Hirsch. An index to quantify an individual’s into the XSEDE portal. We would like to thank Barbara scientific research output. Proceedings of the National O’Leary for her help in reviewing this paper. academy of Sciences of the United States of America, 102(46):16569–16572, 2005. [18] J. Kaur, D. T. Hoang, X. Sun, L. Possamai, 9. REFERENCES M. JafariAsbagh, S. Patil, and F. Menczer. [1] altmetrics. Web Page. URL: Scholarometer: A social framework for analyzing http://altmetrics.org/manifesto/. impact across disciplines. PloS one, 7(9):e43235, 2012. [2] Google Scholar. Web Page. URL: [19] P. Thomas and D. Watkins. Institutional research http://scholar.google.com/. rankings via bibliometric analysis and direct peer [3] i-10 index | google scholar citations open to all. URL: review: A comparative case study with policy http://googlescholar.blogspot.com/2011/11/ implications. Scientometrics, 41(3):335–355, 1998. google-scholar-citations-open-to-all.html. [20] K. Yang and L. I. Meho. Citation analysis: a [4] ISI Web of Science. Web Page. URL: comparison of Google Scholar, Scopus, and Web of http://wokinfo.com/. Science. Proceedings of the American Society for [5] Mendeley. Web Page. URL: Information Science and Technology, 43(1):1–15, 2006. http://www.mendeley.com/. [6] Microsoft Academic Search. Web Page. URL: http://academic.research.microsoft.com/. [7] nanoHUB.org - Citations. Web Page. URL: https://nanohub.org/citations. [8] Publish or Perish. Web Page. URL: http://www.harzing.com/pop.htm. J XD Service Provider Forum Report

Program Years 1-3: July 1, 2011 through June 30, 2014

J.1 Summary

Service Providers (SPs) are independently funded projects or organizations that provide cyberinfrastructure (CI) services to the science and engineering community. In the US academic community, there is a rich diversity of Service Providers, spanning centers that are funded by NSF to operate large-scale resources for the national research community to universities that provide resources and services to their local researchers. The Service Provider Forum is intended to facilitate this ecosystem of Service Providers, thereby advancing the science and engineering researchers that rely on these cyberinfrastructure services.

The Service Provider (SP) Forum provides:
• An open forum for discussion of topics of interest to the SP community.
• A formal communication channel between the SP Forum members and the XSEDE project.

The Service Provider (SP) Forum is a key element of the governance structure for the XSEDE project, in that it is the formal interface point between the XSEDE project and the group of autonomous Service Providers (SPs) whose services are integrated by the XSEDE project. The SP Forum was initially conceived to serve this purpose – the interface between SPs and the XSEDE project – and this must remain a required mandate for the SP Forum. However, the SP Forum members recognize the potential value of this forum for playing a broader role in the national cyberinfrastructure community beyond the interface to the XSEDE project.

In the summer of 2011, the group of Service Providers that were then providing allocated resources convened, selected an interim set of officers, and developed the initial charter, bylaws and governance procedures for the SP Forum. The first SP Forum Charter document was formally approved in February 2012. This document was amended in July 2013 to expand the pool of eligible candidates for the officer positions of the SP Forum. A second round of officers was then elected in October 2013. In early 2014, a significant effort was made to update the Charter document in parallel with updates to a companion document describing the XSEDE Federation, for the purposes of clarifying the relationships between these two entities and streamlining the process for new members to join either or both groups. This update was approved in June 2014.

Membership in the SP Forum has evolved from an initial set of 16 Service Providers, all of whom provided allocated resources to the NSF user community (Level 1 XSEDE Federation members), to 18 current members representing a more diverse mix of Level 1, 2 and 3 XSEDE Federation members. The SP Forum has addressed both ongoing operational issues at the XSEDE interface (e.g.
software, security, operations, allocations, networking) and specific priority topics such as establishing storage as a resource class, responding to the OCI/ACI re-organization at NSF, mutually developing important capabilities with the XD-TAS team via XDMOD, developing policies for large educational classes, and coordinating the collection and analysis of publications and financial awards data across XSEDE and its Service Providers. These activities are discussed in more detail below.

J.2 Context

The Service Provider Forum was originally proposed as part of the governance structure for the XSEDE project, which recognized the need for a forum to serve as the formal interface point between the XSEDE project and the group of autonomous Service Providers (SPs). The implementation is facilitated by cooperation and various degrees of commonality between SPs, XSEDE and the broader stakeholder community that utilizes these services. In this context, the SP Forum is the forum for interaction between XSEDE and SPs, discussing and mutually resolving issues where the interests of SPs and XSEDE overlap (e.g. networking, operations, security, allocations, accounting, user support, documentation, software environments, etc.). The SP Forum charter, membership and governance structures have always reflected the mutual relationships and responsibilities in this crucial governance body that must constructively support the XSEDE-SP interactions.

While the initial motivation to create the Service Provider Forum was to facilitate the interface between the XSEDE project and those SPs with which it coordinates/integrates, the original members of the SP Forum recognized that this body could have a broader role beyond just the interface to XSEDE. In particular, the SP Forum includes a large number of members with considerable expertise and experience in operating diverse cyberinfrastructure resources and services. The SP Forum could itself become an active element in the national cyberinfrastructure ecosystem, with its members sharing best practices, leveraging mutual experiences with various technologies and systems, or coordinating training or outreach activities. This forms the basis for the SP Forum’s second purpose – to be an open forum for discussion of topics of interest to the SP community, whether or not these are directly related to XSEDE. This element was written into the initial Charter document of the Service Provider Forum and was clarified and strengthened further in the recent update to the Charter in June 2014. The specifics of this element of the Charter are being discussed both within the SP Forum as well as broader venues including XSEDE’14, the Campus Champions, and CASC.
It is likely to focus on sharing best practices with respect to procuring and operating

J.3 Major Activities

The SP Forum members had extensive discussions regarding the nature of the SP Forum, e.g. whether it would mainly serve as a platform to interact with the XSEDE project or it would be open to a broader community of service providers beyond the XSEDE project. The members ultimately decided to establish a forum that is open to any service provider that sees value in being part of this community. This vision of inclusiveness has the potential to more positively impact cyberinfrastructure development on the campuses and nationwide.

In practice, the majority of the SP Forum activities have focused on the interaction between Service Providers and the XSEDE project. Some of the topics addressed include:

• Ongoing feedback to the XSEDE project on governance, priorities and activities through joint discussions in the SP Forum, the SP Forum Chair’s participation in the XSEDE Quarterly meetings and Annual Review, representation in the XSEDE Advisory Board, recommendations to the XSEDE User Advisory Council, SP participation in the XSEDE User Requirements Evaluation and Prioritization (UREP) process, etc.

• Until recently, all quarterly reports for Service Providers were included in the XSEDE report. The SP Forum established a common template for all these reports to improve readability, and worked closely with the XD-TAS XDMOD team at U Buffalo to produce standardized reports across all resources. More recently, the SP Forum has worked with XSEDE User Support to generate standard reports for SP-related tickets processed through XSEDE.

• In the summer of 2012, in response to the NSF reorganization in which the Office of Cyberinfrastructure (OCI) was merged back into the CISE directorate, the SP Forum wrote a letter to the NSF Advisory Committee for Cyberinfrastructure outlining the concerns and recommendations of its members.

• Ongoing and systematic interactions with the XSEDE team, particularly the Operations, Allocations and Software groups where most of the XSEDE-SP interfaces occur. Representatives from these teams routinely participate in SP Forum meetings and frequently lead agenda items. There are a number of documents and policies which define requirements, policies and procedures for being an XSEDE Service Provider (e.g. see “Production Environment and Operations Documents” at https://www.xsede.org/web/guest/project-documents-archive). These are developed by XSEDE in close collaboration with SPs, particularly the baseline software and security documents which very much impact SP operations.

• Development of allocation policies for storage as resources independent of compute allocations, and implementation of accounting procedures to support allocated storage. This opened the door to a new class of allocated resources, including both tape archives and disk-based persistent storage. It also recognizes storage as a finite resource with value and allows SPs to offer well-defined, constrained resources, including archival storage which had never been accounted for or formally constrained.

• The SP Forum made a number of efforts in the allocations arena, such as the XSEDE award letter to NSF Program Directors, citing the value of resources offered. This process was updated in late 2013 to reflect more granularity in the value of various compute and storage resources. In addition, potential policies regarding tighter control of allocation burn rates on resources were discussed to avoid the loss of cycles or inability of the SP to deliver; it was decided that this would be handled at an individual level by SPs.

• In early 2013, XSEDEnet transitioned from NLR to Internet2 to improve the backhaul network to support 100G traffic and reduce bottleneck issues with the NLR backbone for XSEDE. All SPs affected by this transition approved the change in late January, and technical staff at each of the SPs worked with XSEDE to successfully execute the transition.

• The issue of class accounts and allocations was brought into focus by a request for accounts for a large class with 300 students (from UC Berkeley). While the SPs have long supported classes, these have typically been small ones. Issues related to allocation and support came up due to the scale of the class. The SP Forum is working with XSEDE to develop a policy and process to support classes more effectively and sustainably. This will include defining a threshold for class size above which additional information will need to be collected from the instructors. This work has just begun and will continue into the next quarter.

• The SP Forum constructively engaged with XD-TAS staff on several topics, including feedback on the compliance reports developed by XD-TAS, regular reporting by XDMOD for quarterly/annual reports, potential use of the SUPREMM monitoring capability at SPs, and more recently an audit of the allocations process conducted by XD-TAS. On the final topic, SP Forum members individually provided feedback on the report to XD-TAS staff and NSF, as well as comments on the audit’s recommendations to NSF.

• The SP Forum is working with XSEDE to coordinate the compilation of publications from the use of resources and XSEDE services, and the access to these data for appropriate reporting of publications by both XSEDE and individual SPs. This should streamline requests to users and allow improved reporting of user publications enabled by various individual awards. A similar process is just being initiated with XSEDE and XD-TAS staff to access the financial grant award information for allocated users which is collected by POPS/XRAS and other sources, and correlate this information with Service Providers that supported those allocations and hence awards.

J.4 Membership

The Service Provider Forum was initially constituted amongst the Service Providers delivering allocated resources to NSF users in the fall of 2011, listed in the table below.

Name | Institution | Resource/Service
Carol Song, Interim Chair | Purdue | Steele/Condor/Wispy
Sean Ahern | Univ of Tennessee-Knoxville/NICS | RDAV/Nautilus
Jay Boisseau | Univ of Texas-Austin/TACC | Ranger/Spur/Ranch
Jay Boisseau | Univ of Texas-Austin/TACC | Lonestar
Mark Fahey | Univ of Tennessee-Knoxville/NICS | Kraken
Geoffrey Fox | Indiana Univ | FutureGrid
Kelly Gaither | Univ of Texas-Austin/TACC | Longhorn
Craig Stewart | Indiana Univ | Quarry/HPSS/Data Capacitor/RockHopper
Mike Levine | PSC | Blacklight
Miron Livny | Univ of Wisconsin | OSG
Honggao Liu | LSU | QueenBee
Rich Loft | NCAR | Frost
Richard Moore | UCSD/SDSC | Trestles
Michael Norman | UCSD/SDSC | Gordon
John Towns | Univ of Illinois/NCSA | Forge/Ember/MSS
Jeffrey Vetter | Georgia Tech | Keeneland

Over the reporting period, there were a number of changes in the resources represented in the Service Provider Forum reflecting the addition or decommissioning of allocated resources, and the expansion to new members (including Level 2 and Level 3 XSEDE Federation members). In all cases of decommissioning, representatives are welcome to continue as members in the Service Provider Forum, even if their XSEDE Federation level changes. The following table reflects the SP Forum membership as of June 2014.

XSEDE SP Resource/Service | Fed Level | Institution | PI | Alternate(s)
Blacklight | 1 | PSC | Mike Levine | Bob Stock
Darter/Nautilus | 1 | Univ of Tennessee-Knoxville/NICS | Greg Peterson | Mark Fahey, Justin Whitt
FutureGrid | 1 | Indiana Univ | Geoffrey Fox | Gary Miksik
Gordon | 1 | UCSD/SDSC | Michael Norman | Robert Sinkovits
Keeneland | 1 | Georgia Tech | Jeff Vetter | Jeff Young
Lonestar | 1 | Univ of Texas-Austin/TACC | Chris Hempel | Tommy Minyard
Maverick | 1 | Univ of Texas-Austin/TACC | Kelly Gaither | Chris Hempel
Stampede | 1 | Univ of Texas-Austin/TACC | Dan Stanzione | Chris Hempel
Trestles | 1 | UCSD/SDSC | Richard Moore | Rick Wagner
Blue Waters | 2 | UIUC/NCSA | William Kramer | Kjellrun Olson, Mike Showerman
OSG | 2 | Univ of Wisconsin | Miron Livny | Dan Fraser
Quarry/Mason | 2 | Indiana Univ | Craig Stewart | David Hancock, Therese Miller
Rosen Ctr | 2 | Purdue | Carol Song | Preston Smith
SuperMIC | 2 | LSU | Honggao Liu | Sam White, Jim Lupo
XSEDEnet, Globus Online/GLADE | 2 | NCAR | Rich Loft | Dave Hart
Minnesota Supercomputing Institute (MSI) | 3 | U of Minnesota | Jeffrey McDonald | TBD
Rutgers | 3 | Rutgers University | Manish Parashar | Prentice Bisbal
NCSA | | Univ of Illinois Urbana-Champaign | John Towns | TBD

J.5 Leadership

As the Service Provider Forum was being established, the membership selected Carol Song as the interim chair of the SP Forum. After the initial Charter document was completed, a formal election of all officers was held in April 2012, resulting in Carol Song (Purdue) as the Chair, Dave Hancock (IU) as Vice Chair, and Dick Glassbrook (GA Tech) and Miron Livny (U Wisconsin) as SP Forum representatives to the XSEDE Advisory Board (XAB). With these officers graciously serving well beyond their nominal one-year terms, a new slate of officers was elected in October 2013, with Richard Moore (SDSC) as Chair, Kelly Gaither (TACC) as Vice Chair, and Carol Song and Miron Livny as XAB representatives. When Dr. Gaither became an XSEDE co-PI in February 2014, she was no longer eligible to be an SP Forum officer and Dan Stanzione (TACC) was elected in her stead.


K TAS Summary

[For completeness of our reporting we have included the Technology Audit Service summary report for the first three years of that project.]


OCI – 1025159: Technology Audit Service for TeraGrid SUNY Buffalo (Furlani) 1

ANNUAL REPORT TECHNOLOGY AUDIT SERVICES FOR TERAGRID/XSEDE (OCI 1025159)

REPORTING PERIOD: JULY 1, 2010 – APRIL 30, 2013

SUBMITTED BY THOMAS R. FURLANI, PI
CENTER FOR COMPUTATIONAL RESEARCH
UNIVERSITY AT BUFFALO, STATE UNIVERSITY OF NEW YORK

EXECUTIVE SUMMARY

While individual tools to measure the utilization, performance, and to a lesser extent the scientific impact of high-end cyberinfrastructure (CI) have been developed over the years, a comprehensive CI management framework that includes all three of these important measures of the efficacy and operational efficiency and return on investment of CI was lacking. The National Science Foundation recognized the value of this capability and through the Technology Audit Service (TAS) of XSEDE made a significant investment in developing tools and infrastructure to make this capability easily accessible to a broad range of stakeholders. In this context, the XDMoD (XSEDE Metrics on Demand) tool is being developed to provide a comprehensive management framework for auditing the utilization and performance of high-end cyberinfrastructure, with an initial focus on XSEDE and academic and industrial high-performance computing centers. As shown schematically in the figure below, XDMoD takes input from a variety of data sources and provides the various stakeholders with a wide variety of analytical functionality. TAS is designed to meet the following objectives:

1. provide the end-user community with a tool to more effectively and efficiently use their allocations and optimize their use of CI resources,
2. provide operational staff with the ability to monitor and tune performance to ensure optimal resource performance,
3. provide management with a diagnostic tool to facilitate CI planning and analysis as well as monitor resource utilization and performance,
4. provide metrics such as publications, awards, and citations to help measure scientific impact and determine return on investment, and
5. audit XSEDE operations and processes.

In addition to the development of the analytic auditing tool XDMoD, an important component of the Technology Audit Service is the auditing of XSEDE processes and policies. One example would be auditing the XRAC process for allocations. The early efforts of the TAS team have been mainly focused on developing the XDMoD tool and its supporting data sources, in order to provide the various stakeholders with powerful analytic tools to assist in many aspects of HPC operations. TAS is now at a point in the program where we can begin adding other important auditing mechanisms, such as the XRAC allocations process.

The XDMoD portal (https://xdmod.ccr.buffalo.edu) provides a rich set of features accessible through an intuitive graphical interface, which is tailored to the role of the user. Currently six roles are supported: Public, User, Principal Investigator, Campus Champion, Center Director and Program Officer. Metrics provided by XDMoD include: number of jobs, service units charged, CPUs used, wait time, and wall time, with minimum, maximum and the average of these metrics, in addition to many others. These metrics can be broken down by: field of science, institution, job size, job wall time, NSF directorate, NSF user status, parent science, person, principal investigator, and by resource. A context-sensitive drill-down capability has been added to many charts allowing users to access additional related information simply by clicking inside a plot and then selecting the desired metric. Another key feature is the Usage Data Explorer that allows the user to make a custom plot of any metric or combination of metrics filtered or aggregated as desired.
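The kind of breakdown described above (a metric summarized by minimum, maximum and average, grouped by a dimension such as field of science or resource) can be illustrated with a minimal sketch. The job records, field names, and the `summarize` helper below are hypothetical and chosen only for illustration; they are not XDMoD's schema or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job records; the field names are illustrative assumptions,
# not the actual XDMoD data model.
JOBS = [
    {"field_of_science": "Chemistry", "resource": "Trestles", "sus": 120.0, "wait_hours": 1.5},
    {"field_of_science": "Chemistry", "resource": "Gordon", "sus": 80.0, "wait_hours": 0.5},
    {"field_of_science": "Physics", "resource": "Trestles", "sus": 300.0, "wait_hours": 4.0},
]


def summarize(jobs, dimension, metric):
    """Group jobs by `dimension` and report count, total, min, max and average of `metric`."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job[dimension]].append(job[metric])
    return {
        key: {
            "jobs": len(values),
            "total": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
        }
        for key, values in groups.items()
    }


if __name__ == "__main__":
    # Service units charged, broken down by field of science.
    print(summarize(JOBS, "field_of_science", "sus"))
    # Wait time, broken down by resource.
    print(summarize(JOBS, "resource", "wait_hours"))
```

Passing a different `dimension` or `metric` name gives the other breakdowns; a drill-down corresponds to filtering the records to one group and summarizing again by a second dimension.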

The XDMoD framework is also designed to preemptively identify potential bottlenecks from user applications by deploying customized, computationally lightweight “application kernels” that continuously monitor CI system performance and reliability from the application users’ point of view. Accordingly, through XDMoD, system managers have the ability to proactively monitor system performance as opposed to having to rely on users to report failures or underperforming hardware and software. The detection of anomalous application kernel performance is being automated through the implementation of process control techniques. In addition, through this framework, new users can determine which of the available systems are best suited to address their computational needs.
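As an illustration of how process control can flag anomalous application kernel runs, the sketch below applies a simple Shewhart-style control chart: control limits are set at three standard deviations around the mean of a baseline of in-control run times, and later runs outside those limits are flagged. The report does not say which process control technique TAS uses, so this choice, the function names, and the sample run times are all assumptions made for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits from in-control baseline run times."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - sigmas * spread, center + sigmas * spread


def flag_anomalies(baseline, new_runs, sigmas=3.0):
    """Return (index, run_time) pairs for runs that fall outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [(i, t) for i, t in enumerate(new_runs) if t < lower or t > upper]


# Hypothetical wall-clock times (seconds) of one daily application kernel.
baseline_seconds = [100.0, 102.0, 98.0, 101.0, 99.0]
recent_seconds = [100.5, 97.5, 700.0, 103.0]

if __name__ == "__main__":
    print(control_limits(baseline_seconds))
    print(flag_anomalies(baseline_seconds, recent_seconds))
```

In this made-up data, the third recent run takes about seven times the baseline and is the only one flagged. A production system would also need to treat failed runs (no run time at all) as anomalies and to re-baseline after known system changes.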

XDMoD CI Management Framework Schematic

The TAS team has taken the approach that a fully functional version of XDMoD would be launched early in the program and additional features would be incorporated in upgrades as they are developed. Therefore, since 2011, one year after the TAS award, two versions of XDMoD have been released every year: one at the annual XSEDE summer conference and a second at the annual SC conference. As important features have been added and the tool has become more powerful, utilization of XDMoD has grown steadily, to 1376 unique visitors and an average of over 300 visits each month.

Although XDMoD is still under development, the TAS program has nonetheless met with considerable early success. For example, early implementations of application kernels have already proven invaluable in identifying underperforming and sporadically failing infrastructure that would likely have gone unnoticed, resulting in diminished productivity (CPU cycles, failed jobs) on HPC systems that are oversubscribed. As one example, the figure below shows the execution time over the course of several months for an application kernel based on NWChem, a widely used quantum chemistry program that is run daily on the large production cluster at the University at Buffalo and on many other XSEDE resources. While the behavior for 8 cores (one node) is as expected, calculations on 16 or more cores (multiple nodes) in May showed widely sporadic behavior, with some jobs failing outright and others taking as much as seven times longer to run. The source of performance degradation was eventually traced to a software bug in the I/O stack of a commercial parallel file system, which was subsequently fixed by the vendor, as evidenced by the normal behavior in the application kernel after June 4, 2012. Indeed, the software patch to fix this problem is now part of the vendor’s standard operating system release. It is important to note that this error had been going on unnoticed at least as far back as May 2011, when we first ran the NWChem-based application kernel, and was only uncovered as a result of analyzing the data from the TAS application kernels. As shown in the body of the report, similar results have been uncovered on other XSEDE resources.

In another example of how XDMoD can contribute to CI management, SDSC staff utilized XDMoD to carry out a detailed analysis of utilization and job throughput on Trestles compared to Ranger and Kraken. The Trestles system is targeted to modest-scale and gateway users, and is designed to enhance users’ productivity by maintaining good turnaround time as well as other user-friendly features such as long run times and user reservations. However, the goal of maintaining good throughput competes with the goal of high system utilization. Through XDMoD, SDSC was able to properly fine-tune the maximum utilization on Trestles compatible with maintaining optimum turnaround. This work was recognized at the XSEDE12 conference with the Best Technology Paper award.

While perhaps not as glamorous as some of the other early successes discussed in the body of this report, XDMoD is being used by all XSEDE service providers to provide usage plots for the quarterly and annual reports. XDMoD’s Custom Report Builder, with its predefined templates for the quarterly and annual usage plots, allows the service providers to replace a manual, labor-intensive procedure with an automated process that literally requires two mouse clicks to complete. Benefits include increased productivity (staff are not tied up for several days each quarter generating the requisite plots) and uniformity of the format of the data presented from provider to provider.

With the addition of metrics such as funding, publications and citations, XDMoD can provide a direct link between scientific impact and infrastructure use and cost. As such it provides the first example of a well documented and truly data driven CI planning tool. With the release of the open source version of XDMoD, this CI management framework will be available to all HPC users in the public, private or academic sectors. This is important, because despite the clear and crucial role that national, private and campus-based HPC centers play in supporting research and scholarship in the U.S., it is safe to say that all too often, given today’s financial pressures, administrative support for such centers is difficult to garner and budgetary pressure is constantly applied, especially for academic-based centers where the competition for funding by other campus programs can be intense. This is partly due to the lack of quantitative metrics that clearly demonstrate the utility, service, competitive advantage, and return on investment that these centers provide. Since resources are limited, it is incumbent on HPC centers to provide such quantitative evidence to justify the continued investment. For the first time, HPC center directors, through XDMoD, have a tool to unambiguously link publication, citation, and funding data to the activities supported by the HPC center and in so doing strengthen their case for continued support.
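To make the link between impact metrics and resource use concrete, the sketch below computes two citation-based metrics named elsewhere in this document, the h-index and the g-index, and scales them by an allocation size in millions of SUs. The citation counts, the allocation size, and the function names are hypothetical; this is an illustrative calculation, not the procedure used by XDMoD or the TAS team.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g (at most the number of papers) whose top g papers total at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


def impact_per_million_sus(citations, sus_millions):
    """Scale simple impact metrics by allocation size (in millions of SUs)."""
    return {
        "papers_per_msu": len(citations) / sus_millions,
        "citations_per_msu": sum(citations) / sus_millions,
        "h_index_per_msu": h_index(citations) / sus_millions,
        "g_index_per_msu": g_index(citations) / sus_millions,
    }


if __name__ == "__main__":
    # Hypothetical citation counts for one project's publications,
    # and a hypothetical allocation of 2.0 million SUs.
    project_citations = [10, 8, 5, 4, 3]
    print(impact_per_million_sus(project_citations, 2.0))
```

Comparing these scaled values across projects of different allocation sizes is one simple way to examine how impact per SU varies with the size of an award.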

The XDMoD resource management framework is an extremely powerful tool to display usage, determine operational efficiency, tune performance, and provide metrics to help measure return on investment for XSEDE systems. However, by virtue of its modular structure it is also very flexible and can be adapted to other applications that are not closely related to its original purpose of auditing XSEDE. In particular, we are excited about the future extension of XDMoD to other large-scale advanced cyberinfrastructure programs, such as the National Ecological Observatory Network (NEON). NEON is a complex combination of sensor networks, instrumentation, experimental infrastructure, natural history archive facilities, remote sensing, and computational resources, and it is therefore difficult, if not impossible, to know how well it is operating and what its scientific impact is without a comprehensive CI management framework. Accordingly, we envision a role for XDMoD to help provide oversight and metrics with respect to the operational efficiency and scientific impact of cyberinfrastructure programs such as NEON. Indeed, the importance of cyberinfrastructure for advancing scientific research in the U.S. will only continue to grow and accordingly so will the need to provide a robust CI management framework.

OCI – 1025159: Technology Audit Service for TeraGrid SUNY Buffalo (Furlani)

TABLE OF CONTENTS

EXECUTIVE SUMMARY
1.0 TECHNOLOGY AUDIT SERVICES PROGRAM OVERVIEW
2.0 TECHNOLOGY AUDIT SERVICES PROGRESS
    2.1 XDMOD
        2.1.1 XDMoD Architecture
        2.1.2 Data Sources
        2.1.3 XDMoD User Interface
        2.1.4 XDMoD Usage
    2.2 APPLICATION KERNEL FRAMEWORK
        2.2.1 Application Kernels
        2.2.2 Application Kernel Toolkit
    2.3 XDMOD DASHBOARD
    2.4 SCIENCE IMPACT METRICS
    2.5 USABILITY ANALYSIS
    2.6 SUPREMM
    2.7 PEAK
    2.8 BROADER IMPACT – OPEN SOURCE XDMOD
    2.9 PROJECT REQUIREMENTS AND MANAGEMENT
3.0 AUDITING CHALLENGES
    3.1 XDCDB
    3.2 POPS
    3.3 NSF AWARD DATA
    3.4 ENSURING TAS RECOMMENDATIONS ARE IMPLEMENTED
4.0 TECHNOLOGY AUDIT SERVICES IMPACT
    4.1 USE CASES
        4.1.1 Early Application Kernel Success Stories
        4.1.2 An Analysis of the Tradeoff between Throughput and Utilization using XDMoD
        4.1.3 XSEDE Planning and Analysis Case Studies using XDMoD
    4.2 XSEDE REPORTING TEMPLATES
    4.3 XSEDE COMPLIANCE TAB
    4.4 XSEDE CENTRAL DATABASE (XDCDB) DATA RECORDING RECOMMENDATIONS
6.0 EDUCATION, OUTREACH, AND TRAINING
    6.1 SUMMARY REPORT FOR REU SUPPLEMENTS TO OCI 1025159
    6.2 TAS WORKSHOPS
7.0 TAS MEETINGS, EVENTS, PRESENTATIONS, WORKSHOPS, AND PUBLICATIONS
    7.1 MEETINGS
    7.2 EVENTS, PRESENTATIONS AND WORKSHOPS
8.0 TEAM MEMBERS
    8.1 UNIVERSITY AT BUFFALO
    8.2 INDIANA UNIVERSITY (IU) SUBCONTRACT
    8.3 UNIVERSITY OF MICHIGAN (UM) SUBCONTRACT
    8.4 NATIONAL INSTITUTE FOR COMPUTATIONAL SCIENCES (NICS) SUBCONTRACT
    8.5 TECHNICAL ADVISORY COMMITTEE
9.0 REFERENCES
10.0 APPENDIX
    10.1 REQUIREMENTS TRACEABILITY MATRIX
    10.2 XDMOD FEATURE HISTORY
    10.3 UBMOD: INITIAL RELEASE OF UNIVERSITY-BASED METRICS ON DEMAND
    10.4 SAMPLE XDMOD AUDIT OF THE XSEDE SYSTEM FOR 2012
    10.5 XSEDE CENTRAL DATABASE (XDCDB) DATA RECORDING RECOMMENDATIONS

1.0 TECHNOLOGY AUDIT SERVICES PROGRAM OVERVIEW

While individual tools to measure the utilization, performance, and to a lesser extent the scientific impact of high-end cyberinfrastructure (CI) have been developed over the years [1-28], a comprehensive CI management framework that includes all three of these important measures of the efficacy, operational efficiency, and return on investment of CI was lacking. The National Science Foundation recognized the value of this capability and through the Technology Audit Service (TAS) of XSEDE made a significant investment in developing tools and infrastructure to make this capability easily accessible to a broad range of stakeholders. In this context, the XDMoD (XSEDE Metrics on Demand) tool [29] is being developed to provide a comprehensive management framework for auditing the utilization and performance of high-end cyberinfrastructure, with an initial focus on XSEDE and academic and industrial high-performance computing centers. As shown schematically in the figure below, XDMoD takes input from a variety of data sources and provides the various stakeholders with a wide variety of analytical functionality. TAS is designed to meet the following objectives: (1) provide the end-user community with a tool to more effectively and efficiently use their allocations and optimize their use of CI resources, (2) provide operational staff with the ability to monitor and tune performance to ensure optimal resource performance, (3) provide management with a diagnostic tool to facilitate CI planning and analysis as well as monitor resource utilization and performance, (4) provide metrics such as publications, awards, and citations to help measure scientific impact and determine return on investment, and (5) audit XSEDE operations and processes.

Auditing processes require data and analysis on operations, usage and policy. Our initial efforts resulted in the development of the XDMoD tool to process the usage data collected by XSEDE into useful reporting for use by users, XSEDE management and NSF. Having established the XSEDE analytics platform XDMoD, we are now in a position to audit and evaluate specific processes and policies, as described in the Future Work section of this report.

The XDMoD portal (https://xdmod.ccr.buffalo.edu) provides a rich set of features accessible through an intuitive graphical interface, which is tailored to the role of the user. Currently six roles are supported: Public, User, Principal Investigator, Campus Champion, Center Director and Program Officer. Metrics provided by XDMoD include: number of jobs, service units charged, CPUs used, wait time, and wall time, with minimum, maximum and the average of these metrics, in addition to many others. These metrics can be broken down by: field of science, institution, job size, job wall time, NSF directorate, NSF user status, parent science, person, principal investigator, and by resource. A context-sensitive drill-down capability has been added to many charts allowing users to access additional related information simply by clicking inside a plot and then selecting the desired metric. Another key feature is the Usage Data Explorer that allows the user to make a custom plot of any metric or combination of metrics filtered or aggregated as desired.

The XDMoD framework is also designed to preemptively identify potential bottlenecks from user applications by deploying customized, computationally lightweight “application kernels” that continuously monitor CI system performance and reliability from the application users’ point of view. Accordingly, through XDMoD, system managers have the ability to proactively monitor system performance as opposed to having to rely on users to report failures or underperforming hardware and software. The detection of anomalous application kernel performance is being automated through the implementation of process control techniques. In addition, through this framework, new users can determine which of the available systems are best suited to address their computational needs.
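The report does not say which process control technique is used, so the following is only a minimal sketch of the general idea. It assumes a Shewhart-style control chart whose limits sit three standard deviations either side of the mean of a set of baseline runs; the function names and the sample wall times are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, upper) control limits derived from baseline run times."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - k * spread, center + k * spread


def flag_anomalies(baseline, new_runs, k=3.0):
    """Return the indices of new runs that fall outside the control limits."""
    lower, upper = control_limits(baseline, k)
    return [i for i, value in enumerate(new_runs) if not lower <= value <= upper]


# Hypothetical wall times (seconds) for one application kernel on one resource.
baseline_runs = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]
recent_runs = [101.0, 99.5, 130.0, 100.2]
print(flag_anomalies(baseline_runs, recent_runs))  # prints [2]
```

A run flagged this way would prompt a closer look at the resource rather than prove a fault; in practice the baseline would need to be refreshed after hardware or software changes.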

In addition to measuring utilization and monitoring system performance and quality of service for the XSEDE resources, it is important to measure metrics that provide a more direct indication of XSEDE’s impact on research, with the most obvious examples being publications, citations, presentations, and grant proposals submitted and awarded for investigators utilizing XSEDE’s infrastructure. Taken together, these provide a useful measure of scientific impact and return on investment.

The TAS framework will be highly beneficial to the overall productivity and utilization of XSEDE resources and accordingly will have a broad impact across the scientific community in the U.S. In addition, the TAS framework may also be used to help improve performance and the delivery of resources in activities outside of XSEDE such as university-based HPC centers and other advanced cyberinfrastructure. To this end, we will extend the TAS framework to address cyberinfrastructure other than XSEDE, including the newly commissioned Blue Waters. The applicability of the TAS framework to academic HPC centers is fairly straightforward. For example, the information utilized by XDMoD to describe the XSEDE structure, resources, and operation is very similar in nature to the information that describes resources available at a typical HPC center. In fact, a typical HPC center can be treated conceptually as XSEDE with a single resource provider.

In addition to the more straightforward application to HPC, we are excited about the extension of XDMoD to other large-scale advanced cyberinfrastructure programs, such as the National Ecological Observatory Network (NEON). NEON is a continental-scale research instrument consisting of geographically distributed infrastructure, networked via cybertechnology into an integrated research platform for regional- to continental-scale ecological research. Sensor networks, instrumentation, experimental infrastructure, natural history archive facilities, and remote sensing are linked via the internet to computational, analytical, and modeling capabilities to create NEON's integrated infrastructure. Accordingly, we envision a role for XDMoD to help provide oversight and metrics with respect to the operational efficiency and scientific impact of cyberinfrastructure programs such as NEON. Indeed, the importance of cyberinfrastructure for advancing scientific research in the U.S. will only continue to grow and accordingly so will the need to provide a robust CI management framework.

This report is organized as follows. The overall progress achieved to date in the Technology Audit Service grant is described in detail in Section 2.0 Technology Audit Services Progress, including the XDMoD and Application Kernel frameworks, development of scientific impact metrics, improving the XDMoD user interface through a usability analysis, and two initiatives focused on identifying and improving the performance of scientific applications running on XSEDE resources. The challenges experienced in the auditing process, including deficiencies in XSEDE-wide databases, are discussed in Section 3.0 Auditing Challenges. The following section, Section 4.0 Technology Audit Services Impact, describes early successes realized through the TAS program, including identifying underperforming hardware and software. Section 5.0 Future Work describes remaining work that will be carried out in the final years of the initial award as well as a discussion of forward-looking opportunities that could be realized in a grant renewal. Section 6.0 Education, Outreach, and Training describes EOT activities carried out under TAS. A summary of meetings, presentations, publications and workshops is contained in Section 7.0. The makeup of the TAS team, which includes members from Indiana, Michigan, and NICS, is described in Section 8.0 Team Members. References are contained in Section 9.0. Finally, an Appendix is included to provide additional detail on some of the topics covered in the main body of the report. Perhaps most importantly, it includes an annotated Requirements Traceability Matrix from the original TAS proposal to provide specific detail on the progress achieved to date in meeting the objectives and goals outlined in the original proposal.

In addition, where appropriate, we have included excerpts from the original TAS proposal to help demonstrate how closely the development efforts have mirrored the proposal.

2.0 TECHNOLOGY AUDIT SERVICES PROGRESS

Here we provide a detailed description of our progress along with an in-depth description of the functionality available through the XDMoD portal. The first two subsections describe progress on the two primary components of TAS as outlined in the original proposal, namely the XDMoD portal and the Application Kernel Framework (Subsections 2.1 and 2.2). XDMoD is a complex application that draws its information from multiple sources and runs several internal processes to support collection and aggregation of data from these sources. Subsection 2.3 describes the XDMoD operational dashboard designed to help ensure the continued operation of these processes. Subsection 2.4 describes progress to date generating metrics to measure scientific impact through the inclusion of publication, grant funding, and citation data. Subsection 2.5 provides a summary of the usability analysis carried out on the XDMoD user interface. Subsections 2.6 and 2.7 provide summaries of SUPReMM and PEAK, two initiatives, not part of the original proposal, that are targeted at improving end-user application performance based on the detailed analysis of application performance data. The final subsection (2.8) provides a summary of the broader impact of the TAS program that has been achieved to date, with particular emphasis on the development of an open source version of XDMoD for use by academic and industrial HPC centers.

2.1 XDMoD

From the original TAS proposal, “We will deploy a tool called TeraGrid Metrics on Demand (TGMoD) that performs continual monitoring and testing of TG:XD infrastructure and provides a role-based web portal for mining this data and performing statistical analysis. This tool will capitalize on and augment existing tools to provide the functionality needed to deploy monitoring tools, populate data warehouses with auditing information, and provide analysis capability.”

The XDMoD auditing framework (https://xdmod.ccr.buffalo.edu) provides a role-based web portal for mining HPC metrics data and performing statistical analysis of XSEDE usage and quality of service data [29]. XDMoD is a highly customizable tool designed to provide a means of graphically viewing and comparing traditional HPC job metrics as well as preemptively identifying underperforming hardware and software through the use of a series of application kernels. The general Public, Users, PIs, Center Directors and Administrators, Campus Champions, and XSEDE/NSF management are all able to easily view usage metrics appropriate for their role, generate recurring reports, and monitor system performance. In this section we describe the XDMoD architecture, the data sources used to populate XDMoD, and the XDMoD User Interface. More in-depth descriptions of each of these are contained in [29]. We begin with a description of the architecture.

2.1.1 XDMoD Architecture

Figure 2.1-1 shows the high-level schematic of the XDMoD architecture. The system comprises three major components: the XDMoD Data Warehouse, the XDMoD Portal, and the XDMoD REST Service API, which are described in detail below.

Figure 2.1-1 XDMoD high-level architecture schematic. External data from a variety of sources including the XSEDE central database are ingested into the XDMoD data warehouse. The XDMoD Portal, through a series of tabs, provides users with extensive analysis capabilities. A REST Service API provides third party developers with access to the data warehouse.

XDMoD Data Warehouse: The data warehouse sits at the core of XDMoD and is the central repository where all aggregated data is stored. The data warehouse comprises not only a set of databases where raw and aggregated data is stored, but also the infrastructure required to bring information in from external data sources such as the XSEDE Central Database (XDCDB), the Partnerships Online Proposal System database (POPS), and others. The process of importing data from external sources, called ingestion, retrieves the information, parses it when necessary, aggregates it across multiple dimensions for faster access, and stores it in a local data warehouse, de-normalizing it where appropriate for performance reasons. To facilitate the speedy ingestion of XSEDE job data and present a negligible load to the XSEDE reporting infrastructure production system, we maintain a local mirror of the XDCDB that is updated daily. Currently, the ingestion process is run every night and requires approximately 30 minutes to complete. During this process, relationships between the various datasets are established so that information can be easily cross-referenced to the XDCDB. An API simplifies the deployment of ingesters for new data sources and also provides access to the information stored in the data warehouse. The current implementation of the API supports time series queries, which aggregate data over time based on an aggregation unit (day, week, month, year), as well as simple aggregation queries that sum or average over a user-defined period of time. Query results may be limited or restricted to certain dimensions and statistics in the data warehouse depending on the user’s role.
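The two query types described above can be sketched as follows. This is an illustration only, not the XDMoD API: the function names, the record layout, and the choice to bucket jobs by end date are all assumptions.

```python
from collections import defaultdict
from datetime import date


def period_key(day, unit):
    """Map a date to the label of the aggregation unit that contains it."""
    if unit == "day":
        return day.isoformat()
    if unit == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if unit == "month":
        return f"{day.year}-{day.month:02d}"
    if unit == "year":
        return str(day.year)
    raise ValueError(f"unsupported aggregation unit: {unit}")


def time_series(jobs, metric, unit):
    """Time series query: total a metric per aggregation unit."""
    totals = defaultdict(float)
    for job in jobs:
        totals[period_key(job["end_date"], unit)] += job[metric]
    return dict(sorted(totals.items()))


def aggregate(jobs, metric, start, end, how="sum"):
    """Simple aggregation query: sum or average over an inclusive date range."""
    values = [job[metric] for job in jobs if start <= job["end_date"] <= end]
    if how == "sum":
        return sum(values)
    if how == "avg":
        return sum(values) / len(values) if values else 0.0
    raise ValueError(f"unsupported aggregation: {how}")


# Hypothetical job records.
jobs = [
    {"end_date": date(2012, 1, 30), "cpu_hours": 10.0},
    {"end_date": date(2012, 1, 31), "cpu_hours": 20.0},
    {"end_date": date(2012, 2, 1), "cpu_hours": 30.0},
]
print(time_series(jobs, "cpu_hours", "month"))
print(aggregate(jobs, "cpu_hours", date(2012, 1, 1), date(2012, 1, 31), "avg"))
```

A role-based restriction of the kind the paragraph mentions could be layered on top by filtering `jobs` before either function is called.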

The API also provides functionality for visualization of the results as charts or datasets (i.e., spreadsheets). Time series datasets can be visualized as area, line, or bar charts, while aggregate datasets can be plotted as bar, line, or pie charts. Moreover, the API provides multiple export formats: PNG, CSV, XML and JavaScript Object Notation (JSON).

The XDMoD data warehouse employs a dimensional starflake model for storage of the ingested data. Most data warehouse designs use dimensional models, such as the star schema, snowflake schema, and starflake schema [30,31]. A star schema is a dimensional model with fully de-normalized hierarchies, whereas a snowflake schema is a dimensional model with fully normalized hierarchies [32]. A starflake schema, a combination of a star schema and a snowflake schema, provides the best solution as it allows for a balance between the two extremes. For example, while some of the dimensions having a very long list of attributes that may be used in a query perform better by being ‘snowflaked’, other dimensions perform better by using the star schema where the dimension is de-normalized. Therefore, the decision whether or not to normalize a dimension is based on the properties of the dimension. Upon ingestion, the transactional data of various data sources (e.g., XDCDB) are partitioned into “facts” or “dimensions”. Dimensions are the reference information that gives context to facts. For example, an XD job transaction can be broken up into facts such as the job_id and total CPU time consumed, and into dimensions such as resource provider, field of science, funding allocation, and the principal investigator of the project.
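The split between facts and dimensions can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module rather than the MySQL database XDMoD runs on, and the table names, column names, and sample rows are hypothetical, not the XDMoD schema.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory warehouse with one fact table and two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_resource (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_field_of_science (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_job (
            job_id INTEGER PRIMARY KEY,
            cpu_hours REAL,
            resource_id INTEGER REFERENCES dim_resource(id),
            field_id INTEGER REFERENCES dim_field_of_science(id)
        );
        """
    )
    # Dimensions: reference information that gives context to the facts.
    conn.executemany("INSERT INTO dim_resource VALUES (?, ?)",
                     [(1, "resource_a"), (2, "resource_b")])
    conn.executemany("INSERT INTO dim_field_of_science VALUES (?, ?)",
                     [(1, "Chemistry"), (2, "Physics")])
    # Facts: one row per job, holding measurements plus keys into the dimensions.
    conn.executemany("INSERT INTO fact_job VALUES (?, ?, ?, ?)",
                     [(101, 5.0, 1, 1), (102, 7.5, 1, 2), (103, 2.5, 2, 1)])
    return conn


def cpu_hours_by_field(conn):
    """Join the fact table to one dimension and total CPU hours per field."""
    return conn.execute(
        """
        SELECT d.name, SUM(f.cpu_hours)
        FROM fact_job AS f
        JOIN dim_field_of_science AS d ON d.id = f.field_id
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()


warehouse = build_star_schema()
print(cpu_hours_by_field(warehouse))
```

Both dimensions here are flat, as in a star schema. Snowflaking one of them would mean splitting it into further normalized tables, for example a separate table of parent sciences referenced by the field-of-science table.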

While a starflake design alleviates certain performance issues, the data warehouse still needs to address the performance of aggregating data over dynamic timeframes. For this purpose, the data warehouse fact tables are aggregated one more time over units of time as a new dimension. In this phase, we aggregate data over its complete duration by day, week, month and quarter. While this does increase the data storage requirements, the added cost in disk space is negligible, and keeping multiple resolutions of data allows for faster queries when aggregating or querying data over long periods of time.

The data from the fact table is summarized into aggregates and stored physically in a different table than the fact table. Aggregates can significantly increase the performance of a query since the query reads less data from the aggregates than it would from the fact table. Since database read time is the major factor in query execution time, reducing the read time improves query performance.
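A minimal sketch of a pre-computed aggregate table, again using Python's `sqlite3` module for illustration. The table and column names are hypothetical, and only a monthly aggregate is shown; the same pattern would apply to the other time units.

```python
import sqlite3


def build_warehouse():
    """Create a fact table and a monthly aggregate table derived from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_job (job_id INTEGER PRIMARY KEY, end_date TEXT, cpu_hours REAL)"
    )
    conn.executemany(
        "INSERT INTO fact_job VALUES (?, ?, ?)",
        [
            (1, "2012-01-05", 4.0),
            (2, "2012-01-20", 6.0),
            (3, "2012-02-03", 5.0),
            (4, "2012-02-10", 1.0),
            (5, "2012-02-28", 2.0),
        ],
    )
    # Computed once at ingestion time and stored separately from the fact table.
    conn.execute(
        """
        CREATE TABLE agg_job_by_month AS
        SELECT substr(end_date, 1, 7) AS month,
               COUNT(*) AS job_count,
               SUM(cpu_hours) AS cpu_hours
        FROM fact_job
        GROUP BY substr(end_date, 1, 7)
        """
    )
    return conn


def monthly_totals(conn):
    """Answer a monthly usage query from the aggregate table alone."""
    return conn.execute(
        "SELECT month, job_count, cpu_hours FROM agg_job_by_month ORDER BY month"
    ).fetchall()


def row_count(conn, table):
    """Count rows in one of the two known tables."""
    if table not in ("fact_job", "agg_job_by_month"):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = build_warehouse()
print(monthly_totals(conn))
print(row_count(conn, "fact_job"), row_count(conn, "agg_job_by_month"))
```

With five job rows collapsing to two monthly rows the saving is trivial, but the ratio grows with job volume, which is where the reduced read time comes from.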

XDMoD Portal: The XDMoD portal is a role-based web application developed using open source tools including the LAMP software stack (Linux, Apache, MySQL, PHP) and the ExtJS [33] user interface toolkit. XDMoD is developed using the Model-View-Controller [34,35] software design pattern where multiple data views are implemented entirely on the client side using the ExtJS user interface toolkit to provide the look-and-feel and advanced features of a desktop application within a web browser. A routing service (part of the XDMoD API) receives requests and directs them to the controller for the appropriate model (e.g., charting, report building, application kernels, etc.) for handling. Multiple controllers have been created to handle specific types of requests such as the generation of charts, creation of customized reports, and queries for information such as available resources to feed the user interface views. We have chosen to use authenticated RESTful services as the interface to all backend data services (controllers and models) and to completely de-couple the user interface (view) from the backend services. This de-coupling allows us to immediately achieve the goal of providing a service API that can be utilized by third party data ingestion or display tools such as Google gadgets or other portal infrastructures. ExtJS is an advanced cross-browser JavaScript user interface toolkit that provides a rich set of functionality that one would typically expect to find in a desktop application including drag-and-drop functionality, event handling, callbacks, and customizable widgets. This provides XDMoD developers with a wide variety of features to build into the portal such as: tree views for organizing hierarchical data, tabbed display panels for organizing and displaying different types of information, a mechanism for easily utilizing RESTful services, and components for automatically parsing data returned from those services.

RESTful API: XDMoD has been designed to provide a RESTful API for all interactions with its core infrastructure (data warehouse, report generator, etc.). This provides an abstraction and insulates components from any necessary changes to the infrastructure. Any third party interaction with XDMoD must use this RESTful API, which ensures that each request is authenticated and authorized. In fact, the XDMoD user interface uses the same mechanism for obtaining data. To assist third party developers (as well as internal developers needing to interface with XDMoD components), we have implemented a self-documenting feature. The XDMoD RESTful API Catalog provides a mechanism for API developers to include documentation within the API implementation that is automatically formatted into a description document and presented to third party developers to assist them in utilizing the API.
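The self-documenting idea can be sketched as a registry that collects documentation written alongside each handler and formats it into a catalog. XDMoD itself is written in PHP and the report does not show the catalog's implementation, so this Python version, including the decorator name and the endpoint paths, is purely hypothetical.

```python
import inspect

# Maps (HTTP method, path) to the handler and the documentation written with it.
CATALOG = {}


def api_endpoint(path, method="GET"):
    """Register a handler and capture its docstring for the catalog."""
    def register(handler):
        CATALOG[(method, path)] = {
            "handler": handler,
            "doc": inspect.getdoc(handler) or "(undocumented)",
        }
        return handler
    return register


@api_endpoint("/jobs/count")
def job_count(resource):
    """Return the number of jobs run on a resource."""
    return {"resource": resource, "jobs": 0}  # placeholder payload


@api_endpoint("/resources")
def list_resources():
    """List available resources."""
    return []  # placeholder payload


def describe_catalog():
    """Format the registered endpoints into a plain-text description document."""
    lines = [
        f"{method} {path}: {entry['doc']}"
        for (method, path), entry in sorted(CATALOG.items())
    ]
    return "\n".join(lines)


print(describe_catalog())
```

Because the description is generated from the same code that serves the request, the catalog cannot drift out of step with the set of registered endpoints.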

2.1.2 Data Sources

The data addressed by XDMoD has been expanded to include more than processor utilization data (CPU cycles, jobs, wait times, etc.). This has led to adding support for a number of new data realms to the XDMoD data warehouse. Realms within XDMoD are used to group distinct types of data and metrics. Each realm consists of a set of data and a number of metrics derived mainly from that data. With the recent addition of the Performance realm, six realms are currently supported as shown in Table 2.1-1 below. XDMoD populates these realms with ingested data from the XSEDE central database (XDCDB), the POPS database, the XSEDE Resource Description Repository (RDR), application kernels performance data, quarterly data dumps from the NSF Award Search database, and most recently from the SUPReMM project, all of which are described more fully below.

Table 2.1-1. Data realm descriptions

Realm           Description
Jobs¹           Provides data and associated metrics for jobs run on XSEDE resources. Metrics provide details on many aspects of user jobs including wait times, job sizes, SUs consumed, resource utilization, and queue expansion factor. Data can be aggregated by user, PI, institution, resource, service provider, field of science, NSF directorate, gateway, queue, and allocation.
Allocations¹    Provides data and associated metrics for active and expired allocations across XSEDE. Metrics provide details on burn rate, usage rate, total SUs allocated and total SUs used. Data can be aggregated by PI, resource, field of science, NSF directorate, and allocation type.
Accounts¹       Provides general account metrics such as number of accounts created, opened, and closed for any given time period.
Grants²         Provides general information regarding grants awarded in support of XSEDE allocations including number of awards and total amount awarded. Data can be aggregated by field of science, funding agency, and NSF directorate.
Proposals²      Provides basic information regarding awarded XSEDE projects and proposals. Metrics associated with the proposals realm can be aggregated by field of science and NSF directorate.
Performance³    Uses application kernel data to provide metrics on CPU, I/O, memory and network performance as a function of time and across different resources.

¹ Data for this realm comes from the XSEDE Central Database (XDCDB)
² Data for this realm comes from the POPS database
³ Data for this realm comes from the Application Kernel Runner

XDCDB: The XSEDE Central Database (XDCDB) is an extensive, unified database containing a wealth of information on the state of users, accounts, and jobs across XSEDE. It contains job statistics such as number of jobs, job size, CPU hours consumed, job wait times, etc. for all jobs run on XSEDE resources and also stores allocation and user account information. The TAS team maintains a local copy of the XDCDB (mirrored nightly from Pittsburgh Supercomputing Center) that is used to ingest this data without affecting performance of the XSEDE production systems.

POPS Database: The Partnerships Online Proposal System (POPS) is used to manage XSEDE allocation requests and facilitates the process of reviewing and approving resource allocations. POPS contains a record of resource allocation requests, abstracts, and supporting funding associated with each request. The TAS team has begun to examine this data to provide metrics on the approved allocations and the external funding supporting requests, and to monitor the accuracy of information entered during the application process. In addition, members of the TAS team are involved with the re-design of POPS (POPS 2.0) to ensure that data is more accurately collected and better organized to facilitate ingestion into XDMoD and subsequent analysis.

Application Kernel Runner (ARR): ARR is a TAS-developed software package that manages the process of organizing, submitting, running, and, importantly, collecting the results of application kernels run across all XSEDE resources. Application kernel performance data collected by ARR is ingested nightly into the XDMoD data warehouse where it is used to generate data for the Performance realm.

SUPReMM: Detailed application performance data at the node and job level from the SUPReMM program (Section 2.6) is ingested into the XDMoD data warehouse and exposed as a realm within the XDMoD portal. The server-level data is extensive, including core-level CPU usage (user time, system time, idle, etc.), socket-level memory usage (free, used, cached, etc.), swapping/paging activities, system load and process statistics, network and block device counters, interprocess communications (SysV IPC), software/hardware interrupt request (IRQ) count, filesystem usage (NFS, Lustre, Panasas), interconnect fabric traffic (InfiniBand and Myrinet), and CPU hardware performance counters. By periodically sampling node and system state values and a wide spectrum of hardware counters, SUPReMM provides a rich (and extensive) dataset of performance data for every application run on each node in a given resource. This information is made available as a new realm in the Usage Explorer where it can be examined alongside user and resource metrics.
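Periodically sampled counters are usually turned into rates before they are analyzed. The report does not describe how SUPReMM processes its samples, so the following is only a sketch under the assumption that each counter is cumulative and sampled with a timestamp; the function name and sample values are hypothetical.

```python
def counter_rates(samples):
    """Convert (timestamp_seconds, cumulative_count) samples into per-interval rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("samples must have strictly increasing timestamps")
        rates.append((c1 - c0) / elapsed)
    return rates


# Hypothetical cumulative byte counter sampled every 600 seconds on one node.
samples = [(0, 1000), (600, 4000), (1200, 4600)]
print(counter_rates(samples))  # prints [5.0, 1.0]
```

Attributing these node-level rates to a job would additionally require the job's start time, end time, and node list, which is why the data is ingested alongside the job records.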

Scientific Impact: While it is clearly important to monitor the utilization of XSEDE resources, utilization in and of itself is insufficient to provide an objective measure of scientific impact or return on investment for XSEDE. Rather, it is important to measure metrics that provide an indication of XSEDE’s impact on research and scholarship within the scientific community, with the most obvious examples being publications, citations, presentations, and grant proposals submitted and awarded for investigators utilizing the infrastructure. Taken together, these provide a useful measure of scientific impact or, if you will, return on investment for the National Science Foundation (NSF). To this end, metrics that focus on scientific impact, such as publications, citations and external funding, are being examined and incorporated into XDMoD where appropriate. During the XSEDE allocations process, investigators are required to provide a list of scientific publications as well as external funding in support of their request. We are extracting this information from the POPS database and are able to cross-reference it with publicly available data from external funding agencies such as the NSF award database. Using this data we are then able to compare XSEDE resource usage with external funding sources including funding agency, directorate, and field of science for a particular user or project. The XSEDE User Portal has recently implemented a mechanism for users to more easily curate their scientific publications related to XSEDE projects. While our work in this area is still under active development, this allows us to more easily gather information, along with conference and journal publications reported by NSF, and use this data as a basis for establishing a measure of scientific impact. We are currently pursuing several additional sources of publication citation information such as Google Scholar, Mendeley, and ResearchGate; however, the inability to uniquely identify authors (author name disambiguation) presents a serious challenge. A registry of unique researcher identifiers (such as an ORCID iD, for example) would make this problem much more tractable.
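The disambiguation problem can be seen in a small sketch. The records below are invented for illustration, and the identifiers are placeholders written in the four-group style of an ORCID iD, not real ones.

```python
from collections import defaultdict


def group_publications(records, key):
    """Group publication titles by the chosen author key (name or identifier)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


# Hypothetical records: two different researchers publish as "J. Smith",
# and one of them also publishes as "Jane Smith".
records = [
    {"name": "J. Smith", "orcid": "0000-0000-0000-0001", "title": "Paper A"},
    {"name": "J. Smith", "orcid": "0000-0000-0000-0002", "title": "Paper B"},
    {"name": "Jane Smith", "orcid": "0000-0000-0000-0001", "title": "Paper C"},
]

print(group_publications(records, "name"))
print(group_publications(records, "orcid"))
```

Grouping by name merges two people and splits one person in two; grouping by a unique identifier attributes all three papers correctly, provided the identifier is recorded with each publication.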

Future: Additional data sources to be examined in the future include help tickets, which cannot be ingested until an XSEDE-wide help desk ticketing system has been implemented.

2.1.3 XDMoD User Interface

Arguably, the XDMoD user interface is the most important component of the technology audit service as it is the means by which the outside world accesses and analyzes audit-related data. Accordingly, substantial work has gone into making it easy to use while also giving it powerful analytical capabilities, two goals that at times can oppose each other. The XDMoD user interface, shown in Figure 2.1-2, is organized into multiple tabs that provide functionality tied to the role of the user. Supported roles include a public (unauthenticated) user, XSEDE user, XSEDE PI, Campus Champion, Center Director, and Program Officer/XSEDE Management. A detailed description of each role as it applies to data access is as follows.

Figure 2.1-2 XDMoD Summary Tab. The Summary tab contains a series of user-configurable summary plots of usage. In this case the summary plots are for all of XSEDE in 2012.

Public Role: With no user account required, the public role provides non-authenticated users with access to overall XSEDE utilization broken down by resource, field of science, allocation time, etc.; utilization at specific service providers; and the ability to view specific time periods and export images. Quality of service data via the application kernel suite is not publicly available, nor is specific user data.

XSEDE User: A user with an active XSEDE allocation can log into XDMoD using their XSEDE User Portal credentials. Users are able to view all data available to the Public Role as well as their personal utilization information. They are also able to view information regarding their allocations, quality of service data via the Application Kernel Explorer, and generate custom reports.

XSEDE Principal Investigator (PI): A principal investigator is a user who is listed as a PI on one or more XSEDE allocations. A PI has access to all data available to an XSEDE user, as well as detailed information for any users included on their allocations.

Campus Champion: The campus champion role is still a work in progress. A typical campus champion will include one or more users on their allocations while assisting them, giving the champion the same access to detailed information about those users that a PI would have. We will be collaborating with the Campus Champion working group to develop other tools to facilitate their work in the future.

Service Provider/Center Director: The director of a service provider or center will have access to detailed usage information for any user that has run jobs at their center. In addition, they will have access to detailed application kernel results for runs at their center. Directors are also able to delegate access to their center’s data to other users who assist in the operations of their center. Directors also have access to data reporting compliance information for their center to ensure that the proper information is being consistently reported to the XDCDB and other monitored data feeds.

Program Officer/XSEDE Management: Program Officers and XSEDE management have no restrictions on the information available to them. They may view data across all service providers, including data reporting compliance reports and custom queries (both described below).
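The role descriptions above amount to a layered access model. The following is only an illustrative sketch of that layering, not XDMoD's actual implementation: the `Role` enum, the capability names, and the `can_view` helper are all hypothetical, and the assumption that directors also see public data is ours.

```python
from enum import Enum


class Role(Enum):
    PUBLIC = "public"
    USER = "xsede_user"
    PI = "principal_investigator"
    CAMPUS_CHAMPION = "campus_champion"
    CENTER_DIRECTOR = "center_director"
    PROGRAM_OFFICER = "program_officer"


# Hypothetical capability names paraphrasing the role descriptions.
PUBLIC_DATA = {"overall_utilization", "provider_utilization", "export_images"}
USER_DATA = PUBLIC_DATA | {
    "own_utilization", "own_allocations", "app_kernel_qos", "custom_reports",
}
PI_DATA = USER_DATA | {"allocation_member_details"}
# Assumption: directors see public data plus data scoped to their own center.
DIRECTOR_DATA = PUBLIC_DATA | {
    "center_user_details", "center_app_kernel_details",
    "center_compliance", "delegate_center_access",
}
# Program officers and XSEDE management have no restrictions.
ALL_DATA = PI_DATA | DIRECTOR_DATA | {"all_provider_data", "custom_queries"}

ACCESS = {
    Role.PUBLIC: PUBLIC_DATA,
    Role.USER: USER_DATA,
    Role.PI: PI_DATA,
    # A champion who adds users to an allocation gets PI-like access to them.
    Role.CAMPUS_CHAMPION: PI_DATA,
    Role.CENTER_DIRECTOR: DIRECTOR_DATA,
    Role.PROGRAM_OFFICER: ALL_DATA,
}


def can_view(role, item):
    """Return True if the given role may see the named class of data."""
    return item in ACCESS[role]
```

For instance, `can_view(Role.PUBLIC, "app_kernel_qos")` is false in this sketch, mirroring the statement that quality of service data is not publicly available.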

The XDMoD UI contains a wealth of information and has been organized into tabs to compartmentalize the data without overwhelming the user. For illustrative purposes, we will focus on the Program Officer role. The Summary tab (Figure 2.1-2) provides a snapshot overview of XSEDE, with several small summary charts that can be expanded to full-size charts with a single mouse click. Clicking on the Role button (below the highlighted Summary tab) displays a drop-down menu that allows the user to 1) change roles (assuming the user has multiple role permissions), and 2) narrow the scope of the metrics displayed to a particular service provider as opposed to all of XSEDE. The default is to show utilization over the previous month, but the user may select from a number of preset date ranges (week, month, quarter, year to date, etc.) or choose a custom date range.


The Usage tab, shown in Figure 2.1-3, provides access to an expansive set of XSEDE-wide metrics that are accessible through the tree structure on the left-hand side of the portal window, including summaries of usage, allocations, accounts, and performance data. A user logged in under the User role rather than the Program Officer role initially sees details specific to their own utilization as opposed to all of XSEDE, although a User may still access XSEDE-wide metrics. Usage metrics provided by XDMoD include: number of jobs, SUs (service units) charged, NUs (normalized units) provided, CPUs used, wait time, wall time, minimum, maximum and average job size, average SUs charged, average NUs provided, average CPU used, average wall time, average wait time, and user expansion factor. These metrics can be broken down by field of science, gateway, institution, job size, job wall time, NSF directorate, NSF user status, parent science, person, principal investigator, and resource. Many of the plots are context sensitive and allow users to click on a data element within the plot to further analyze the data. For example, in Figure 2.1-3, which shows the distribution of total CPU hours by job size in 2012 for all of XSEDE, one can click on any of the columns to obtain a more detailed analysis for the selected job size range. The plot can also be made available to the custom report generator by clicking the box that reads “Available For Report”. It can also be exported in either PNG (portable network graphics) or SVG (scalable vector graphics) format. The data itself can be exported in either CSV (comma separated values) or XML (extensible markup language) format.

Figure 2.1-3 XDMoD Usage Tab. A Usage plot showing the total CPU hours broken out by job size on all XSEDE resources for 2012.

The Usage Explorer tab provides a powerful tool for organizing and comparing the XSEDE data from a wide variety of metrics. The Usage Explorer tab, which also provides access to all of the metrics available through the Usage tab, facilitates comparison among the various metrics by allowing multi-axis plots, as shown in Figure 2.1-4. Displayed in the window is a plot that shows the number of jobs (left-hand axis) and average core count (right-hand axis) broken down by NSF Directorate in 2012 for all XSEDE resources. Biological Sciences rival Mathematical and Physical Sciences in total number of jobs, but the biological-based jobs tend to use a smaller number of processors on average. As shown in Figure 2.1-5, the data can be filtered in a variety of ways to display only a desired subset of the data. For example, the plot shown in Figure 2.1-5 was generated from Figure 2.1-4 by applying a filter to display only the “NICS-KRAKEN” data. It is interesting to note that on Kraken, Geosciences has surpassed Biological Sciences in terms of the total number of jobs run and has a much higher average core count. A notable feature is the ability to open a metric that is provided on the Usage tab directly in the Usage Explorer by clicking on the gear icon on the top right of the plot. This allows one to use an existing plot as a starting point and easily customize it, configure additional data series for comparison, and save it for use in a report. Taken in its entirety, the Usage Explorer provides a powerful and flexible interface to facilitate analysis of the XSEDE data.

Figure 2.1-4 The XDMoD Usage Explorer Tab. Plot shows the number of jobs run for each NSF Directorate vs average core count for 2012 on all XSEDE resources. Number of jobs is shown on the primary axis (left-hand axis) and average core count (weighted by each job’s total SUs charged) is shown on the secondary axis (right-hand axis).


Figure 2.1-5. The XDMoD Usage Explorer Tab. A Usage Explorer generated plot showing number of jobs for each NSF Directorate vs average core count on NICS-Kraken for 2012. Number of jobs is shown on the primary axis (left-hand axis) and average core count (weighted by each job’s total SUs charged) is shown on the secondary axis (right-hand axis). Plot was generated from Figure 2.1-4 by applying a filter that limited the analysis to Kraken data.
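The figure captions describe the average core count as weighted by each job's total SUs charged. A minimal sketch of such a weighted average follows; the function name and the `(core_count, su_charged)` record layout are assumptions made for illustration, not XDMoD's internal representation.

```python
def weighted_average_core_count(jobs):
    """Average core count, weighting each job by the SUs it was charged.

    `jobs` is an iterable of (core_count, su_charged) pairs (hypothetical layout).
    """
    jobs = list(jobs)  # allow generators, since the data is traversed twice
    total_su = sum(su for _, su in jobs)
    if total_su == 0:
        raise ValueError("total SUs charged must be non-zero")
    return sum(cores * su for cores, su in jobs) / total_su


# Illustrative values only: one small job and one large job.
example_jobs = [(16, 100.0), (1024, 900.0)]
weighted = weighted_average_core_count(example_jobs)
unweighted = sum(cores for cores, _ in example_jobs) / len(example_jobs)
```

With this weighting, a directorate whose charged time is dominated by a few large jobs shows a high average core count even when it also runs many small jobs.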

The Allocations tab allows a user to view details of their active and expired allocations, including the SUs remaining for each resource available to an allocation. The App Kernels tab provides information on application kernel performance and quality of service on XSEDE resources. Through this tab, users can view historical performance for all application kernels run on all XSEDE resources. For example, Figure 2.1-6 shows the performance of the MPI-Tile-IO benchmark run on Trestles in January 2013. Note that the plot window contains a description pane that provides information on the application kernel. The volume of data generated by the application kernels is substantial, which makes exploring it challenging. Therefore, to facilitate analysis of the application kernel performance data, we developed the App Kernel Explorer tab. Here the user can easily select a specific application kernel or suite of application kernels, a specific set of resources, and a range of job sizes for which to view performance. It allows users to directly compare application kernel performance across multiple XSEDE resources. For example, Figure 2.1-7 shows a comparison between Kraken and Ranger for the HPC Challenge benchmark run on 32 and 64 processors in January 2012.


Figure 2.1-6 App Kernels Tab. Plot showing wallclock time for MPI-Tile-IO Benchmark on Trestles for the month of January in 2013. The Description pane in the plot window provides more detailed information on the application kernel and the resource.

Figure 2.1-7 XDMoD App Kernel Explorer Tab. Plot generated using the App Kernel Explorer tab showing a comparison between HPCC benchmark run on Kraken and Ranger for 32 and 64 processor jobs.


The Report Generator tab (Figure 2.1-8) gives the user access to the Custom Report Builder, which allows a user to create and save custom reports. For example, a user may wish to have specific plots and data summarized in a concise report that they can download for offline viewing. The user can also choose to have custom reports generated at a user-specified interval (daily, weekly, quarterly, etc.) and automatically sent to them via email, without the need to log into the portal.

Figure 2.1-8 XDMoD Custom Report Generator tab. A user can create custom reports by selecting plots from the pool of plots (right hand side) that were generated previously using the XDMoD portal.

On occasion, NSF program officers and XSEDE management have made requests for reports or comparisons that do not fit into the existing data realms or plotting tools provided by XDMoD. The Custom Query tab, shown in Figure 2.1-9 and available only to users with the Program Officer role, has been designed as a mechanism for fulfilling these requests without requiring substantial modification of existing XDMoD internals. Servicing these requests often requires custom back-end programming by the XDMoD team, the incorporation of potentially inconsistent data sources, and substantial data sanitization work. While these queries may be incorporated into XDMoD in the future, the Custom Query tab provides a more efficient way to quickly view the requested data while providing familiar user interface elements such as the date selector, plot export tools, and inclusion of plots into custom reports. Examples of custom queries include research funding supported by XSEDE (directly and indirectly) and NSF research funding supported by XSEDE (overall as well as restricted to MPS). In fact, discoveries made while generating these queries have resulted in requirements to improve the quality of data collected by the XSEDE allocations process.

Figure 2.1-9 Custom Query tab showing number of SUs delivered in 2011 by supporting funding agency

A Compliance tab was added to the XDMoD framework to provide service providers, NSF Program Officers and XSEDE leadership with a tool to quickly assess service provider compliance with XSEDE operational reporting requirements and TAS recommendations. The new compliance tab tracks whether or not each service provider is supplying required reporting metrics and data. Section 4.3 provides a detailed description of its capability.

User’s Guide: Substantial work has gone into making the XDMoD user interface easy to use. However, given its powerful analytical capabilities, it can nonetheless be challenging for users to fully exploit them. To help remedy this, a User’s Guide and FAQ have been developed and are available through the Help button on the top right-hand side of the XDMoD portal window. The user guide provides detailed instructions for using all aspects of XDMoD, from obtaining an account to explaining common user interface elements and getting the most out of powerful tools such as the Usage Explorer. Access to the guide is context sensitive, providing direct access to the most relevant portions of the guide based on what the user was viewing when they requested help. For example, a user who opens the guide while viewing the Usage Explorer tab is brought directly to the section of the manual describing that tab. In addition to the user manual, a “quick start” guide is provided directly within XDMoD for several tools, including the Usage Explorer, Application Kernel Explorer, and Report Generator. As shown in Figure 2.1-10, when a user views the Usage Explorer tab and has not created a query, visual assistance is provided to help them get started quickly.


Figure 2.1-10 XDMoD Help. The XDMoD Quick Start Guide for the Usage Explorer tab

2.1.4 XDMoD Usage

XDMoD 2.6 was released in early February 2013. As shown in Figure 2.1-11, since XDMoD’s first release in July 2011, usage has been steadily climbing as more users become aware of its availability and capability. We expect this trend to continue.

Figure 2.1-11 XDMoD Usage

2.2 Application Kernel Framework

The Application Kernel framework is designed to provide an overall quality of service (QoS) metric and to preemptively identify potential bottlenecks by deploying customized, computationally lightweight “application kernels” that continuously monitor CI system performance and reliability from the application users’ point of view. Accordingly, through the application kernels, system managers have the ability to proactively monitor system performance as opposed to having to rely on users to report failures or underperforming hardware and software. The detection of anomalous application kernel performance is being automated through the implementation of process control techniques. In addition, through this framework, new users can determine which of the available systems are best suited to address their computational needs.

2.2.1 Application Kernels

From the original TAS proposal, “We will develop a lightweight application kernel framework that measures the performance of the TeraGrid infrastructure with respect to existing applications. It will include an active and reactive set of services accessible through protocols, services, and development tools to monitor the advanced cyberinfrastructure the TeraGrid. This will allow continuous auditing of TG assets to measure all aspects of system performance, including global filesystem performance, local processor-memory bandwidth, allocatable shared memory, processing speed, and network latency and bandwidth.”

Most modern, multipurpose HPC centers rely mainly on system-related diagnostics, such as network bandwidth, processing loads, number of jobs run, and local usage statistics, to characterize their workloads and audit the performance of their infrastructure. However, this is quite different from having the means to determine how well the computing infrastructure is operating with respect to the actual scientific and engineering applications for which these HPC platforms are designed and operated. Some of this is discernible by running benchmarks; in practice, however, benchmarks are so intrusive that they are not run very often (see, for example, Reference [25], in which the application performance suite is run on a quarterly basis), and in many cases only when the HPC platform is initially deployed. Today’s HPC infrastructure is a complex combination of resources and environments that are continuously evolving, so it is difficult at any one time to know whether optimal performance of the infrastructure is being realized. The key to a successful and robust science and engineering-based HPC technology audit capability is the development of a diverse set of computationally lightweight application kernels that run frequently on HPC resources to monitor and measure system performance, including critical components such as global filesystem performance, local processor-memory bandwidth, processing speed, and network latency and bandwidth. The application kernels are designed to address this deficiency, and to do so from the perspective of the end-user applications. As noted previously, the term “application kernel” is used to represent micro and standard benchmarks that represent key performance features of modern scientific and engineering applications, as well as small but representative calculations done with popular open-source high-performance scientific and engineering software packages.

The initial set of application kernels has been drawn from community open-source scientific and engineering codes like NWChem [36,37], GAMESS [38-40], NAMD [41,42], WRF [43], and CESM [44], as well as established low-level and micro-benchmark tools [45,46], MPI benchmarks [47], high-performance LINPACK [48], and GRAPH500 [49]. The current set of kernels has been categorized by application area, but as the kernel population grows we anticipate also using a more functional approach similar to the Berkeley “dwarfs” [50]. A recent Department of Energy study of Exascale computing needs [51] exemplifies this overlap between application areas and the kernel “dwarf” classification of the computationally intensive algorithms involved. Table 2.2-1 shows our current sampling of the dwarfs by application kernels (in terms of the original seven dwarfs and the application areas we have sampled thus far).

Table 2.2-1. Correspondence between currently deployed application kernels and the Berkeley dwarfs (the original seven dwarfs have been considered thus far of the extended list of thirteen).

Columns: Kernel Class; Structured Grids; Unstructured Grids; FFT; Dense Linear Algebra; Sparse Linear Algebra; N-body; Monte Carlo
Biomolecular: √ √ √ √
Computational Chemistry: √ √ √ √
Fluid Dynamics: √ √
Climate/Weather Modeling: √ √

Our initial set of application kernels largely falls into the application areas of molecular physics, computational chemistry, weather/climate modeling, and nanoscale science; in addition, we have lower-level micro-benchmarks and benchmarks specifically for input/output (I/O). The GRAPH500 benchmark samples a different, expanded view of the dwarf classification [52] in data-intensive graph-oriented combinatorial kernels. The list of application kernels will continually be expanded to ensure a thoroughly representative sampling of application areas, even as new applications are developed.

We have distilled “lightweight” benchmarking kernels from these that are designed to run quickly, with an initially targeted wall-clock time of less than 10 minutes. However, we also anticipate a need for more demanding kernels to stress larger computing systems, subject to the needs of HPC resource providers to conduct more extensive testing. The goal is to touch all major performance issues found in typical HPC compute and data environments, including (1) local scratch, (2) global filesystems for file staging and parallel I/O, (3) local processor-memory bandwidth, (4) processing speed, and (5) network latency and bandwidth. For some aspects of performance it will become necessary to develop customized kernels, but we are initially leveraging existing applications as much as possible, provided that they can be kept lightweight and therefore low-impact.

This preliminary list of application kernels is not meant to be exhaustive; indeed, we intend to use an open-source framework that will easily allow end users to donate their own application kernels to the service. Application kernels will be preferred that satisfy the following profiling and usability criteria: 1) the application needs to provide reliable low-level performance data that can be used for measuring the metric of choice (e.g., NWChem provides I/O statistics on scratch space used to store temporary files that can be used to identify under- and non-performing disk space, while STREAM can similarly be used for processor-memory bandwidth); and 2) essential pieces of the system (processor, memory, and input/output) can all be tested from the application’s point of view. While a single application kernel will not simultaneously test all of these aspects of machine performance, the full suite of kernels will stress all of the important performance-limiting attributes.

It is important to note, however, that the application kernels run from the user’s perspective: a regular user account is employed to run the kernels at both planned and requested intervals on XSEDE resources. Even if the application kernels are light enough that they require only a few minutes each, it is likely that there will be enough of them that only a subset is run at any given time. The resulting performance data (including overall run time, run time of various sub-systems, and other data characteristic of the various application kernels) will be archived within the XDMoD framework and used to provide immediate as well as long-term feedback on the performance profile of the HPC resources. An automated schedule can also be put in place for running the kernels. The intervals can easily be adjusted, and we further anticipate implementing a responsive unit in which system support personnel at various HPC resources can place requests for custom scheduling of the application kernel test suites, as well as quantitative measures of the application kernel coverage of given resources.

Application Kernel Process Control: Running the application kernels continuously on the XSEDE (or other) resources generates a tremendous amount of performance data that makes manual oversight of the data to identify underperforming hardware or software impractical. Accordingly, we have been exploring the development of automated processes to monitor application kernel performance. A preliminary model is being developed to use the application kernel data to assess quality of service. Figure 2.2-1 shows a representative plot of this effort for MPI-Tile-IO and IOR application kernel data. A region is automatically selected as the control region. The control region is by definition considered to be the normal operating envelope of the given application kernel. The process, in this case the performance of a given application kernel, is assumed to be nominally in control in this region. If the 5-point running average at a given point beyond the control region exceeds a specified tolerance (based upon the data range in the control region) the process is flagged as out of control. In Figure 2.2-1 the region where the write aggregate throughput bandwidth of the MPI-Tile-IO and IOR application kernels drops substantially is automatically evaluated to be out of control. The details of when to automatically readjust the control region to account for system upgrades or normal changes, when to flag a process as out of control, etc., are still being finalized. However, note that the automatic process control system successfully flagged the deterioration of the write throughput performance of Lonestar4. This usage case will be discussed in more detail in Section 4.1 below.
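The control-region test described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the TAS implementation: the control region is taken to be the first `control_size` runs (the report says only that it is selected automatically), the tolerance is a fraction of the control region's data range, and the function and parameter names are hypothetical.

```python
from statistics import mean


def flag_out_of_control(values, control_size=10, window=5, tolerance=0.5):
    """Flag runs whose running average leaves the control envelope.

    values       -- chronological metric values for one application kernel
    control_size -- number of leading runs treated as the control region (assumed)
    window       -- length of the trailing running average (5 points in the text)
    tolerance    -- allowed excursion, as a fraction of the control data range (assumed)
    Returns a list of (index, "below" | "above") for out-of-control runs.
    """
    control = values[:control_size]
    low, high = min(control), max(control)
    margin = tolerance * (high - low)
    flags = []
    for i in range(control_size, len(values)):
        start = max(0, i - window + 1)
        running_avg = mean(values[start:i + 1])
        if running_avg < low - margin:
            flags.append((i, "below"))
        elif running_avg > high + margin:
            flags.append((i, "above"))
    return flags


# Illustrative data only: stable write throughput followed by a sharp drop.
throughput = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101,
              100, 99, 60, 55, 50, 52, 51]
flags = flag_out_of_control(throughput)
```

Whether "below" or "above" is the unfavorable direction depends on the metric: a drop is bad for throughput, while a rise is bad for wall-clock time. Because the test uses a running average, a single outlier run does not necessarily trigger a flag.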


Figure 2.2-1. Application Kernel process control. Time histories for two IO based application kernels selected to demonstrate the application kernel control process for two underperforming application kernels. The solid purple line is the data, the dashed black line is a 5-point average, the blue shading indicates the control zone range. The red zones indicate that the process is out of control in an unfavorable sense while the green zones indicate superior performance compared to the in control performance.

Using application kernel process control, site administrators can easily monitor application kernel run failures for troubleshooting performance issues at their site.

2.2.2 Application Kernel Toolkit

From the original TAS proposal, “A further important component of application kernel based auditing will be the development of an application kernel toolkit that will facilitate the development of novel application kernels by other groups interested in devising metrics to help evaluate their computing infrastructure or that of the TG:XD. The toolkit will also allow other computational centers to access the application kernels developed under the TAS and install and run them on their own clusters to improve performance and the delivery of resources.”

A further important component of application kernel based auditing will be the development of an Application Kernel Toolkit that will facilitate the development of novel application kernels by other groups interested in devising metrics to help evaluate their computing infrastructure. The toolkit will also allow other HPC centers and providers to access the application kernels developed under XDMoD and install and run them on their own clusters to improve performance and the delivery of resources. The toolkit will initially consist of documentation and examples of existing application kernels, together with the scripts needed for integration with the monitoring framework. As the application kernels evolve, a custodial process will be developed through which user-developed and supplied kernels can be adopted into the mainstream collection. We anticipate that growth in the pool of application kernels will necessitate organizing the collection into application areas focused around the Berkeley “dwarfs” that characterize the computation and communication demands of scientific and engineering applications [50,52].

2.3 XDMoD Dashboard

XDMoD is a complex application that draws its information from multiple sources and runs several internal processes to support ETL and aggregation of data from these sources. While some of these data sources are managed directly by the XDMoD team (e.g., the application kernels), many others, such as the XDCDB and the POPS database, are not. In order to provide a reliable service, it is important that the XDMoD team be able to monitor the health of these data sources, the ETL process, aggregation of the data, and several other internal XDMoD processes. To support this task, we have developed a new operational dashboard that will allow staff to quickly assess the current state of processes within the XDMoD application and take action to rectify any issues. Each internal process has been instrumented to send log data into the XDMoD data warehouse, in addition to the traditional log file, so that a summary can be provided and potential issues more easily identified from a central location.

This dashboard, initially available only to the XDMoD development team, provides an operational summary of all periodic processes executed by XDMoD. This includes: ETL from multiple sources including the XDCDB, POPS, and RDR; the deployment of the application kernels and the ingestion of their data; and the process of mirroring a local copy of the XDCDB from Pittsburgh. Failure of any of these processes may also help to identify issues with the data source itself. For example, a failure in the process of ingesting XDCDB data from our local mirror led to the discovery and subsequent correction of an issue in the mirroring process between SDSC and PSC.

The operations dashboard consists of a summary screen that displays an overview of the most recent invocation of each internal process. These currently include a report on active users, data warehouse ETL, report generation, compliance, application kernel ETL, and application kernel deployment. Clicking on any summary directs the user to detailed information on that process. For most ingestion processes, this detail consists of the log entries for that process, which can be sorted and filtered by log priority (e.g., information, warning, error, etc.) and by date. By default, only the log entries pertaining to the last invocation of the process are displayed, but this can be easily expanded by the user. Log entries indicating an error state are highlighted in red and also displayed directly on the summary page. For example, the application kernel deployment engine (ARR), in addition to providing log entries, maintains the state of all application kernels scheduled to run at the various resource providers, including complete logs of run success/failure rates and application output. The dashboard makes use of this information and allows operations personnel to quickly identify failed application kernels and immediately drill down to view error messages and inspect the complete output of the failed kernel. Figure 2.3-1 shows a sample set of application kernel results from Alamo, Gordon, and Edge (a local CCR resource). We immediately identify that WRF is failing on Alamo and Gordon while GAMESS, AMBER, and WRF are running reasonably well on other platforms. We can also drill down to details on individual runs to view complete deployment and application output. In the future we will examine data returned by failed application kernels in an attempt to automatically identify potential causes of failure, such as unmounted filesystems or compute nodes common to several failed kernels.
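The kind of per-resource failure summary described here can be sketched as a simple aggregation over run records. The record layout `(resource, kernel, succeeded)` and the function name are hypothetical, and the sample records are illustrative rather than actual dashboard data.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return {(resource, kernel): fraction of failed runs}.

    `runs` is an iterable of (resource, kernel, succeeded) records (assumed layout).
    """
    counts = defaultdict(lambda: [0, 0])  # [failures, total runs]
    for resource, kernel, succeeded in runs:
        entry = counts[(resource, kernel)]
        entry[1] += 1
        if not succeeded:
            entry[0] += 1
    return {key: failures / total for key, (failures, total) in counts.items()}


# Illustrative records only.
sample_runs = [
    ("Alamo", "WRF", False), ("Alamo", "WRF", False),
    ("Alamo", "GAMESS", True),
    ("Edge", "WRF", True), ("Edge", "WRF", True),
    ("Edge", "AMBER", True), ("Edge", "AMBER", False),
]
rates = failure_rates(sample_runs)
# Kernels that failed on every recorded run are the first candidates to inspect.
always_failing = sorted(key for key, rate in rates.items() if rate == 1.0)
```

Grouping the same records by compute node or filesystem instead of by kernel would be one way to look for the common causes of failure mentioned above.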

It is anticipated that certain portions of this dashboard will be made available to a broader audience as appropriate so that service providers can utilize data such as the application kernels to assess quality of service and proactively identify potential hardware and configuration issues.

Figure 2.3-1 Application Kernel Deployment Details and QoS

2.4 Science Impact Metrics

From the original TAS proposal, “Furthermore, this auditing may go beyond the typical compute, network and data utilization and include other metrics such as journal publications, grant awards, portal usage, or other secondary artifacts that may serve as measures of project productivity to external funders.”

While it is clearly important to monitor the utilization of XSEDE resources, utilization in and of itself is insufficient to provide an objective measure of scientific impact or return on investment for XSEDE. Rather, it is important to measure metrics that provide an indication of XSEDE’s impact on research and scholarship within the scientific community, with the most obvious examples being publications, citations, presentations, and grant proposals submitted and awarded for investigators utilizing the infrastructure. Taken together, these provide a useful measure of scientific impact or, if you will, return on investment for the National Science Foundation (NSF).

Efforts to date by the TAS team, led by Gregor von Laszewski, have focused on publication data ingested from three data sources: the NSF award database, the XSEDE POPS database, and the XSEDE User Portal (XUP). The NSF award database is extensive and contains (as of February 2013) 911,147 publication records, of which 446,890 are unique, and 376,722 NSF award records. A problem common to all of these databases is that they are not well curated and are not integrated as well as they could be into an overall NSF database similar to PubMed, which is organized by NIH.

The POPS (Partnerships Online Proposal System) is used to manage XSEDE allocation requests and contains a record of XSEDE projects, resource allocation requests, abstracts, and supporting funding associated with each request. The data is self-reported by investigators and requires careful checking to ensure accuracy. Future versions of POPS will enforce stricter data validation and attempt to automate some of the data collection to reduce the amount of post-processing required as much as possible. The XUP provides XSEDE users with the ability to upload publication information relevant to their allocation, and in so doing creates a repository of publication data that can be ingested and subsequently mined by XDMoD. The NSF award database, which is currently obtained from NSF via physical media as it is not accessible online, was converted from Microsoft Access to a database that can be more easily cross-referenced with information from POPS and the XSEDE central database. From the latter we have obtained information about users and projects and correlated them with the publication data. This will allow us to derive statistics in terms of the number of publications for an XSEDE user, a project, a field of science, etc. It is worth noting that the XDCDB contains information on over 7000 projects and almost 20,000 users from over 1000 organizations.

In terms of the correlation between the XDCDB and the NSF award database, we found that:

• 116,412 out of the 446,890 unique NSF publications involved XD users
• 28,309 out of the 376,722 NSF awards involved XD users
• 12,958 out of 19,975 XD users have at least one publication in the database (maximum is 919, average is 59, and median is 10)
• 6,348 out of 7,027 XD accounts have at least one publication associated with them (maximum is 10,703, average is 93, and median is 32)
• 858 out of 1,075 XD organizations have at least one publication (maximum is 37,085, average is 765, and median is 46)
• 136 out of 138 NSF Fields of Science have at least one publication (maximum is 143,848, average is 11,111, and median is 2,625)
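To put the counts above on a common scale, the following minimal sketch converts each matched/total pair into a coverage percentage. The counts are taken directly from the list above; the dictionary labels and the helper name `coverage_percent` are illustrative choices, not part of the TAS tooling.

```python
# Counts quoted in the list above: (matched, total) per category.
coverage = {
    "NSF publications involving XD users": (116_412, 446_890),
    "NSF awards involving XD users": (28_309, 376_722),
    "XD users with a publication": (12_958, 19_975),
    "XD accounts with a publication": (6_348, 7_027),
    "XD organizations with a publication": (858, 1_075),
    "NSF Fields of Science with a publication": (136, 138),
}


def coverage_percent(matched: int, total: int) -> float:
    """Return matched/total as a percentage rounded to one decimal place."""
    return round(100.0 * matched / total, 1)


percentages = {
    label: coverage_percent(matched, total)
    for label, (matched, total) in coverage.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Read this way, roughly a quarter of the unique NSF publication records involve XD users, while about nine in ten XD accounts have at least one associated publication.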

Please note that the sum of publications for each Field of Science (FOS) is not the number of unique publications, but a sum over the projects/accounts identified as belonging to that FOS. A user’s publication could be double counted if the user belongs to multiple projects, and a given publication from one project could also be double counted when the project is classified into multiple FOS (this is the case for some projects). We will introduce another set of statistics for FOS based on users and unique publications that is not aggregated at the project/account level. Table 2.4-1 shows the 15 Fields of Science with the highest number of publications.

Table 2.4-1. The 15 Fields of Science with the Highest Number of Publications

Field of Science                                     Sum of Publications   Number of Accounts   Publications per Account
Materials Research                                   143848                596                  241.3557
Molecular Biosciences                                117353                567                  206.9718
Chemistry                                            116431                585                  199.0274
Computer and Computation Research                    79071                 250                  316.284
Biophysics                                           65314                 293                  222.9147
Advanced Scientific Computing                        63136                 278                  227.1079
Biochemistry and Molecular Structure and Function    61933                 326                  189.9785
Atmospheric Sciences                                 53386                 217                  246.0184
Physical Chemistry                                   52939                 246                  215.1992
Center Systems Staff                                 50020                 89                   562.0225
Chemical, Thermal Systems                            49225                 287                  171.5157
Training                                             44957                 131                  343.1832
Physics                                              41314                 318                  129.9182
Astronomical Sciences                                39490                 310                  127.3871
Fluid, Particulate, and Hydraulic Systems            32902                 177                  185.887
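The double counting described before the table can be illustrated with a small sketch. The project names, fields, and publication identifiers below are hypothetical toy data, not records from the XDCDB; the point is only to contrast a project-level sum (as used in Table 2.4-1) with a count of unique publications per field.

```python
from collections import defaultdict

# Hypothetical toy data: project -> (fields of science, publication ids).
projects = {
    "proj-A": ({"Chemistry", "Materials Research"}, {"pub1", "pub2"}),
    "proj-B": ({"Chemistry"}, {"pub2", "pub3"}),  # pub2 is shared with proj-A
}

summed = defaultdict(int)   # project-level sum, the convention of Table 2.4-1
unique = defaultdict(set)   # distinct publications per field of science

for fields, pubs in projects.values():
    for fos in fields:
        summed[fos] += len(pubs)   # counts pub2 once per project
        unique[fos] |= pubs        # set union counts pub2 only once

for fos in sorted(summed):
    print(f"{fos}: summed={summed[fos]}, unique={len(unique[fos])}")
```

In this toy case Chemistry reports four publications under the project-level sum but only three unique ones, which is the kind of gap the planned user-based statistics are meant to remove.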

We have conducted the current data mining and analysis in a relatively independent framework consisting of Python programs, a database backend, and related prototype services, and we plan to integrate this data into the XDMoD framework. A prototype will be in place for the XSEDE13 conference, where we will display, within XDMoD, detailed publication information for the top 20 XSEDE PIs by SUs consumed.

2.5 Usability Analysis

From the original TAS proposal: “The success of the TG:XD monitoring and auditing effort will be a joint product of the technical quality of tools and services and their fit to user needs. To assess user needs we will engage in a rigorous multi-method approach to gather user requirement data and then analyze these data to inform subsequent software development and deployment. The guiding philosophy of this effort will be based on user-centered design with each software release representing an occasion to assess performance and usability in order to inform the next iteration of development.”

Our strategy utilizes mixed methods to collect data, including user surveys of randomly sampled respondents, interviews (e.g., of resource users and providers), and direct observation (e.g., on-site observation of users under operational conditions). As a first measure of the usability of the XDMoD interface, a team of researchers at the University of Michigan (UM) conducted an in-person interview study of local Michigan XSEDE users (n=10) and potential XSEDE users (n=13). Even though the sample size was small, analysis of the interview data showed a number of areas of the XDMoD interface that could be improved, including: (1) developing a less complex default view for some of XDMoD’s tabs to prevent information overload, (2) improving the ability to locate and adjust the time period for graphs in the Report Generator, (3) making it easier to add charts to the Report Generator, (4) improving navigation in the Usage tab, (5) making help resources easier to find, and (6) implementing “right click” capability for some functions. This analysis provided useful feedback to the early XDMoD development efforts, and many of the areas identified as needing improvement were addressed in subsequent releases.

In order to explore XDMoD usability in a broader context, the UM team chose to administer a web-based questionnaire to a random sample of XSEDE users. Survey respondents were sampled from the population of individual XSEDE users who ran a minimum of 25 jobs between March 1, 2012 and September 1, 2012. Information on jobs came from XSEDE logs for this period. Jobs run from group, staff, or other non-individual accounts were excluded. A minimum of 25 jobs run on XSEDE was selected as evidence of significant use, as opposed to occasional or test use. These criteria produced a sampling frame of 1,914 account holders, from which 20% (N=398) were randomly selected. Respondents were invited via a personalized email to complete a web-based questionnaire. The invitation included a pre-incentive in the form of a $5 Amazon.com gift code, which respondents were free to use regardless of whether they opted to complete the survey. Of those invited to participate, a usable response rate of 28.7% (n=110) was obtained. Of the 110 respondents, only 5 had logged into XDMoD prior to the survey, meaning that 105 of the respondents were new XDMoD users who had not previously been exposed to the interface.

In addition to the standard multiple choice questions and free format answers found in most survey tools, the survey also included a novel technology that can track where survey participants click in a given window. This capability is particularly useful for interface development since it allows the investigator to determine where in a given interface screenshot users go to find the answer to the posed question. For example, Figure 2.5-1 shows all locations on a given window of the XDMoD interface where users clicked in response to a particular survey question. The resulting image can be thought of as a heat map of the response space, with areas of higher click frequency represented by progressively “warmer” colors (red indicating the highest frequency of clicks). In Figure 2.5-1 users predominantly clicked in one of two areas, namely on a tab along the top of the window or on a summary plot in the bottom center. This provides useful information regarding how intuitive and easy to use the interface is with respect to a given query.
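The idea behind such a click heat map can be sketched as a simple binning of click coordinates into a grid. The survey tool's actual method is not described here, so the function name, the cell size, and the sample clicks below are all assumptions made for illustration.

```python
from collections import Counter


def click_heatmap(clicks, width, height, cell=50):
    """Bin (x, y) pixel clicks into a grid of cell-by-cell pixel squares.

    Returns a list of rows; each entry is the number of clicks in that cell.
    Clicks that fall outside the image are ignored.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    counts = Counter(
        (y // cell, x // cell)
        for x, y in clicks
        if 0 <= x < width and 0 <= y < height
    )
    return [[counts[(r, c)] for c in range(cols)] for r in range(rows)]


# Hypothetical clicks on a 400x400 screenshot: a cluster near a tab at the
# top, a smaller cluster at the bottom right, and one stray click off-image.
clicks = [(120, 10), (130, 20), (125, 15), (300, 380), (310, 390), (999, 999)]
grid = click_heatmap(clicks, width=400, height=400, cell=100)

# The "warmest" cell is the one with the highest click count.
hottest = max((n, r, c) for r, row in enumerate(grid) for c, n in enumerate(row))
print(hottest)  # (count, row, column)
```

A rendering step would then map each cell count to a color, with the largest counts drawn in the warmest colors.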


Figure 2.5-1. Aggregate heat map of the clicks by respondents for question 20 in the web-based survey. The question read: “Usability Study - Summary Page The XSEDE XDMoD (Metrics on Demand) Usage Summary Tab is shown below. Click on the image below to indicate where you would find awarded grants by field of science.”

Results of the survey showed that some users are overwhelmed by the amount of data presented to them, that users are confused when information is presented in a format that is not relevant to them, and that tutorials are useful in orienting users but other help resources would also be beneficial. These findings suggest a set of priorities for future evolution of the XDMoD interface, including: simplifying the interface for the Usage tab to present less text in the tree navigation and switching to a drop-down navigation more similar to the Report Generator, concentrating on making sure the data presented to the user is in the format most useful for their need (role), and making sure the help manual is indexed on the web. Specific findings and recommendations based on the survey results are as follows:

1. Some default graphs make little sense as representations of the data, for example, pie charts for a single value as displayed on the Summary page when in the User role. Accordingly, the information presented should be tailored to be relevant to a particular user (role).

2. Some users feel overloaded by the amount of default information presented through the interface when they first approach XDMoD. In some respects this is not a particularly surprising result given the powerful features built into the interface along with the rapid injection of new features during the first three years of the award. However, we expect this situation to improve substantially over the next two years as the XDMoD user interface matures and as more attention can be devoted to making the interface intuitive. In addition, we will continue to investigate alternative designs that withhold information until a user’s action requests it. For example, when a user lands on the XDMoD Summary page for the first time, it may be beneficial to direct the user to select the metrics they are interested in seeing. This would limit the amount of information first presented and would also show new users that the summary page can be customized.

3. Users turn to search engines for help, and accordingly the documentation needs to be indexed by public search engines. The same sentiment was expressed by users in the previous XDMoD survey.

4. Tutorials and other help tips for new and returning users are desired. The report generator instructions that are shown when no charts are present appear to be effective in helping users understand how to begin creating a custom chart. This approach can be expanded to other areas.

2.6 SUPReMM

SUPReMM is a separately funded NSF project that is formally not part of the TAS project but is closely related, in that one of its main goals is to display detailed job-level performance data for XSEDE resources in XDMoD. SUPReMM (Integrated HPC Systems Usage and Performance of Resources Monitoring and Modeling) leverages the open source software package TACC_Stats to periodically sample node and system state values and a wide spectrum of hardware counters, providing a rich (and extensive) dataset of performance data for every application run on each node in a given HPC cluster. Initially the focus has been on the TACC systems Lonestar4 and Ranger. Ultimately, however, the goal is to extend this data collection to all XSEDE resources. When fully integrated with XDMoD, SUPReMM will give users and, importantly, system personnel the ability to identify underperforming applications and, based on the collected SUPReMM data, suggest methods that may be employed to improve end-user application performance, thereby freeing up valuable CPU cycles on these already oversubscribed platforms.

TACC_Stats enhances the UNIX sysstat/sar utilities, a useful and often underutilized collection of performance monitoring utilities for UNIX/Linux-based HPC environments [53]. Currently TACC_Stats can gather core-level CPU usage (user time, system time, idle, etc.), socket-level memory usage (free, used, cached, etc.), swapping/paging activities, system load and process statistics, network and block device counters, inter-process communications (SysV IPC), software/hardware interrupt request (IRQ) counts, filesystem usage (NFS, Lustre, Panasas), interconnect fabric traffic (InfiniBand and Myrinet), and CPU hardware performance counters. Currently, TACC_Stats utilizes CPU performance counters as follows (note that performance counters are processor-specific and have many restrictions on simultaneous sampling). On AMD Opteron, the events are FLOPS, memory accesses, data cache fills, and SMP/NUMA traffic. On Intel Nehalem/Westmere, the events are FLOPS, SMP/NUMA traffic, and L1 data cache hits. To avoid interfering with a user's own profiling and instrumentation activities, TACC_Stats only reads values from the performance registers at its periodic invocations (currently every 10 minutes).

The node-level TACC_Stats data is then processed to produce job-level data. The flow of data from SUPReMM to XDMoD is shown schematically in Figure 2.6-1.

Figure 2.6-1 Schematic showing the collection and flow of data within SUPReMM and XDMoD
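The reduction from node-level samples to job-level totals can be sketched as follows. This is a minimal illustration under stated assumptions, not the SUPReMM pipeline: the sample layout (job, node, timestamp, cumulative counters) is hypothetical, counters are assumed to be cumulative and never to wrap, and the job-level value is taken as the sum over nodes of the last reading minus the first.

```python
from collections import defaultdict

# Hypothetical samples, one per periodic read (every 600 s here):
# (job_id, node, timestamp_s, cumulative user CPU s, cumulative idle CPU s)
samples = [
    ("job1", "n1", 0, 1000.0, 50.0),
    ("job1", "n1", 600, 1580.0, 70.0),
    ("job1", "n2", 0, 2000.0, 10.0),
    ("job1", "n2", 600, 2590.0, 20.0),
]


def job_level_cpu(samples):
    """Sum per-node counter deltas (last sample minus first) for each job."""
    per_node = defaultdict(list)
    for job, node, ts, user, idle in samples:
        per_node[(job, node)].append((ts, user, idle))

    totals = defaultdict(lambda: {"user": 0.0, "idle": 0.0})
    for (job, _node), rows in per_node.items():
        rows.sort()  # order each node's samples by timestamp
        first, last = rows[0], rows[-1]
        totals[job]["user"] += last[1] - first[1]
        totals[job]["idle"] += last[2] - first[2]
    return dict(totals)


job_totals = job_level_cpu(samples)
print(job_totals)
```

Because only counter differences are used, the monitor never needs to reset or reprogram a register, which is consistent with the read-only sampling described above.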

To date, the TAS team has ingested job-level Ranger TACC_Stats data into XDMoD to provide users with the ability to easily generate plots showing the performance of Ranger on a fine-grained (job) level, as indicated in Figures 2.6-2 to 2.6-4 (all for the last quarter of 2012, Ranger’s last full quarter in production). In particular, we consider CPU usage, FLOPS, and memory usage. Figure 2.6-2 shows the CPU usage compared with the CPU idle time for all XSEDE jobs run on Ranger during the final quarter of 2012. Note that the CPU usage is very efficient, with most jobs showing very low CPU idle time. Figure 2.6-3 shows the daily FLOPS output of Ranger, which is somewhat surprisingly small considering that the peak value for Ranger has been benchmarked as 579 TFLOPS. The daily value depends on the job mix running on any specific day. Figure 2.6-4 shows the memory usage per core for Ranger during the same period. Note that each core is equipped with 2 GB of memory, so even on peak usage days only about 40% of the memory is used on average.


Figure 2.6-2 Total user CPU hours (red), total idle CPU hours (blue), and system time (black) for all XSEDE jobs run on Ranger during the last quarter of 2012

Figure 2.6-3 Total FLOPS for all XSEDE jobs run on Ranger during the last quarter of 2012.


Figure 2.6-4 Memory usage per core for all XSEDE jobs run on Ranger during the last quarter of 2012.

2.7 PEAK

It is well known that scientific application performance depends on what can be a myriad of compiler options and numerical libraries to select from when creating the executable for a given scientific application. In most cases, users and developers are looking for a systematic way of selecting the configuration of available options that will result in the optimal application performance on a given computing system. The Performance Environment Autoconfiguration frameworK (PEAK) is being developed with just this purpose in mind. It is designed to select the optimal configuration for an application on a given platform and to update that configuration when changes in the underlying hardware and systems software occur [54]. The configuration options considered for performance optimization include the compiler and its compilation options, the numerical libraries and their parameter settings, and the settings of other environment variables. Based on a presentation at XSEDE12 by Dr. Haihang You describing the implementation of PEAK on Kraken, the TAS team reached out to Dr. You to inquire about its applicability to other XSEDE resources and HPC cyberinfrastructure in general. From these conversations, it was decided to incorporate PEAK into the XDMoD auditing framework. Specifically, a PEAK tab will be implemented in XDMoD to give users direct access to the PEAK technology. Implementing the PEAK technology will help XSEDE service providers and HPC centers produce optimal application builds and ensure that they remain optimal through system upgrades.

Figure 2.7-1 illustrates the PEAK framework. HPC users enter usage information, such as a function in a numerical library, the problem size, etc. The driver generator produces a program for benchmark purposes, while the compilation process automatically builds all possible versions of the executable with different compilers and numerical libraries. The job script generator creates job submission files so that benchmark tests can be executed after the compilation process is completed. Finally, the benchmark results are harvested and stored in a performance database. This information is also returned to the users through the interface so they can compare the performance of all versions and select the optimal build for their application.

Figure 2.7-1. The Framework Design of PEAK

Although work under the TAS program began only in December 2012, substantial progress has been realized. The PEAK framework on Kraken has been extended to include scientific applications, as opposed to kernels with BLAS/LAPACK functions. The basic idea remains the same; that is, build an application with different compilers and determine the optimal combination of compiler and libraries for each application under varying processor counts. In addition, if an application uses numerical libraries, available alternatives on Kraken (i.e., ACML from AMD and MKL from Intel) can be linked instead of the default LibSci from Cray. The FFTW version is another possible option for performance tuning (i.e., Cray FFTW2 and Cray FFTW3).

These three factors (i.e., compiler, numerical library, and FFTW version) enable us to build as many as 18 versions for each application. The actual number of versions of a given application is subject to the compatibility of its code with different compilers, versions of FFTW, and its use of a numerical library.
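The count of 18 follows from three compilers, three numerical libraries, and two FFTW versions (3 x 3 x 2). A minimal sketch of enumerating that build matrix is shown below; the compiler names are taken from the performance tables in this section, while the function name and the exclusion parameter are illustrative assumptions rather than part of PEAK itself.

```python
from itertools import product

compilers = ["Intel", "PGI", "GNU"]
libraries = ["libsci", "acml", "mkl"]
fftw_versions = ["fftw2", "fftw3"]


def build_matrix(excluded_compilers=()):
    """List every compiler-library-FFTW combination as a build label.

    Compilers that cannot build a given application are skipped.
    """
    return [
        "-".join(combo)
        for combo in product(compilers, libraries, fftw_versions)
        if combo[0] not in excluded_compilers
    ]


all_builds = build_matrix()
# Dropping one incompatible compiler removes 3 x 2 = 6 candidate builds.
without_gnu = build_matrix(excluded_compilers={"GNU"})

print(len(all_builds), len(without_gnu))
```

An application that is tied to a single FFTW version, or that does not use a numerical library at all, shrinks the matrix in the same way along the corresponding axis.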

In this phase of the project, the most heavily used applications on Kraken were initially selected, namely, NAMD, LAMMPS, GROMACS, CPMD, and AMBER. Kraken currently provides only one version for each of these five applications. However, the main objective was to produce as many versions as possible and then conduct a performance comparison in order to find the optimized configuration. Sample highlights of our PEAK findings to date on Kraken are as follows.

CPMD: CPMD is a typical application that can be optimized by varying a number of configuration options: compiler, numerical library, and FFTW version. Theoretically, we are able to produce 18 versions of CPMD on Kraken. However, the latest source code version of CPMD is not compatible with the version of the GNU compiler on Kraken. Consequently, only 12 of 18 possible versions of CPMD were evaluated. Initially, the SI512 benchmark test was run for CPMD. However, due to its memory-intensive nature, it would not run when the number of nodes involved was one or two. As a result, the C120 benchmark test was also adopted.

Table 2.7-1. Performance data of CPMD-SI512 (unit: second)

Build                48-core   96-core   192-core   384-core   768-core
Intel-libsci-fftw2   257.25    155.06    88.68      90.50      144.72
Intel-libsci-fftw3   249.26    163.64    89.93      109.64     100.85
Intel-acml-fftw2     236.24    124.74    67.40      69.24      38.00
Intel-acml-fftw3     228.87    118.58    66.12      40.10      34.41
Intel-mkl-fftw2      155.34    95.53     54.52      54.47      63.18
Intel-mkl-fftw3      152.21    89.94     53.98      49.43      51.71
PGI-libsci-fftw2     239.91    285.96    234.12     512.95     127.33
PGI-libsci-fftw3     231.98    289.82    226.53     143.74     132.31
PGI-acml-fftw2       235.96    290.20    204.43     130.77     167.86
PGI-acml-fftw3       230.73    285.73    193.29     129.62     149.68
PGI-mkl-fftw2        160.96    290.71    193.24     139.33     155.72
PGI-mkl-fftw3        157.28    288.34    182.21     172.41     165.06

Figure 2.7-1. Performance data of CPMD-SI512.

From the performance results for CPMD-SI512, shown in Table 2.7-1 and Figure 2.7-1, it is clear that the choice of compiler can make a substantial difference. The Intel compiler is better in most cases; it can improve speed by a factor of up to about 4 (e.g., intel-acml-fftw3 vs. pgi-acml-fftw3). For the PGI versions of CPMD, the performance degrades significantly when the number of cores increases from 48 to 96, and in some cases this remains true even at 192 cores. On the other hand, the right choice of FFTW version can benefit the performance to some extent. For CPMD, the numerical library does matter. For example, if built with the Intel compiler, the MKL library usually produces better performance. The FFTW3 version outperforms its FFTW2 counterpart in most cases. Also, it is striking to see that with 384 cores, Intel-mkl-FFTW3 is more than 10 times faster than PGI-LibSci-FFTW2 (the default version on Kraken). Finally, it is interesting to note that for CPMD, a larger number of cores does not necessarily produce better performance.
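The comparisons quoted above can be reproduced directly from the Table 2.7-1 timings. The sketch below is not part of PEAK; the helper names `best_build` and `speedup` are illustrative, and the data are simply the table values in seconds.

```python
CORES = [48, 96, 192, 384, 768]

# Wall-clock seconds from Table 2.7-1 (CPMD-SI512), one list entry per core count.
timings = {
    "Intel-libsci-fftw2": [257.25, 155.06, 88.68, 90.50, 144.72],
    "Intel-libsci-fftw3": [249.26, 163.64, 89.93, 109.64, 100.85],
    "Intel-acml-fftw2": [236.24, 124.74, 67.40, 69.24, 38.00],
    "Intel-acml-fftw3": [228.87, 118.58, 66.12, 40.10, 34.41],
    "Intel-mkl-fftw2": [155.34, 95.53, 54.52, 54.47, 63.18],
    "Intel-mkl-fftw3": [152.21, 89.94, 53.98, 49.43, 51.71],
    "PGI-libsci-fftw2": [239.91, 285.96, 234.12, 512.95, 127.33],
    "PGI-libsci-fftw3": [231.98, 289.82, 226.53, 143.74, 132.31],
    "PGI-acml-fftw2": [235.96, 290.20, 204.43, 130.77, 167.86],
    "PGI-acml-fftw3": [230.73, 285.73, 193.29, 129.62, 149.68],
    "PGI-mkl-fftw2": [160.96, 290.71, 193.24, 139.33, 155.72],
    "PGI-mkl-fftw3": [157.28, 288.34, 182.21, 172.41, 165.06],
}


def best_build(cores):
    """Return the build with the shortest run time at the given core count."""
    i = CORES.index(cores)
    return min(timings, key=lambda name: timings[name][i])


def speedup(slow, fast, cores):
    """Ratio of run times (slow / fast) at the given core count."""
    i = CORES.index(cores)
    return timings[slow][i] / timings[fast][i]


for n in CORES:
    print(n, best_build(n))

# Default Kraken build versus Intel-mkl-fftw3 at 384 cores.
print(round(speedup("PGI-libsci-fftw2", "Intel-mkl-fftw3", 384), 2))
# Compiler effect alone (same library and FFTW version) at 768 cores.
print(round(speedup("PGI-acml-fftw3", "Intel-acml-fftw3", 768), 2))
```

The fastest build switches from Intel-mkl-fftw3 at 48 to 192 cores to Intel-acml-fftw3 at 384 and 768 cores, so the best configuration depends on the core count as well as on the application.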

AMBER: AMBER uses BLAS/LAPACK functions, so there are three options for the numerical library. Although it needs to be linked with FFTW, only FFTW3 is compatible. Therefore, 9 versions of AMBER can in principle be produced and evaluated. However, the Intel versions were found not to work properly on Kraken, so only 6 versions were considered.

Table 2.7-2. Performance data of AMBER-JAC (unit: second)

Build        24-core   48-core   96-core   192-core   384-core   768-core
GNU-libsci   140.12    70.89     36.62     20.05      11.78      8.54
GNU-acml     139.93    70.99     36.67     19.95      12.95      8.59
GNU-mkl      139.92    70.87     36.89     19.78      12.32      10.96
PGI-libsci   107.21    55.04     29.18     16.65      10.92      8.79
PGI-acml     107.29    55.06     28.96     16.73      10.52      8.53
PGI-mkl      107.26    55.06     28.94     16.82      10.84      9.77

As shown in Table 2.7-2, AMBER built by the PGI compiler appears to run much faster than that built by the GNU compiler. Unlike CPMD, the selection of numerical library does not make any difference, except that the MKL versions are slower than their LibSci and ACML counterparts when running on 768 cores.

Conclusions: Utilizing the PEAK framework to analyze all possible configurations of five of the most heavily used scientific applications on Kraken, it was determined that:

1. In many cases, application performance is sensitive to the chosen compiler, sometimes dramatically so (as much as a factor of 4 for CPMD).

2. Selection of numerical libraries and FFTW version can influence the execution speed in some cases. CPMD represents a typical example in this study.

3. For HPC users, the optimized configuration of an application is not always the same. The best choice can vary based on the number of cores an application runs on.

Future Work: PEAK will be extended to additional XSEDE platforms and integrated into the XDMoD framework.

2.8 Broader Impact – Open Source XDMoD

While the current TAS framework has been designed specifically to meet the needs of the XSEDE community, much of its functionality is applicable to other HPC centers and may be used to improve the performance of cyberinfrastructure outside of XSEDE. Since the most straightforward application of XDMoD is to academic and industrial HPC centers, our initial efforts have been focused on providing an open-source version of XDMoD for installation at these sites. The organizational structure of XSEDE is very similar in nature to that of a typical HPC center. In fact, a typical HPC center can be treated conceptually as XSEDE with a single resource provider. Both XSEDE and HPC centers employ the concepts of users, compute jobs, project groups, help desk tickets, scientific publications, and one or more HPC resources (e.g., compute cluster, storage pool, etc.). Users also fall into a similar hierarchical organizational structure, differing only in the terms used to describe nodes in the hierarchy. On XSEDE, users are associated with one or more projects or allocations, each with its own principal investigator. An individual project is associated with an NSF directorate and field of science; an academic HPC center may group users by decanal unit and department; an industrial HPC center may group users by project and department. Similarly, the job-level information collected can include a job name, wall clock time, start and end dates, memory utilization, and so on.

Preliminary work has begun to prepare for the release of the open source version of XDMoD. Initially, as a proof of concept, HPC utilization data from the University at Buffalo’s Center for Computational Research (CCR) has been ingested into the XDMoD data warehouse structure. The various XSEDE roles were then mapped onto academic equivalents such as Dean, Department Chair, and PI. The net effect is to adapt the XDMoD framework, which was written to serve the XSEDE cyberinfrastructure, to serve cyberinfrastructure typical of a university setting. Ultimately, we will provide an open source XDMoD template that will be suitable to support academic and commercial HPC systems. The response to XDMoD and its predecessor UBMoD has been so strong that a number of academic and commercial HPC centers have requested to act as beta test sites for the open source version of XDMoD. We anticipate beta testing to start later in 2013.

Figure 2.8-1 shows a summary page with output data from CCR’s clusters obtained during this open source development effort. The upper left chart shows the CPU hours delivered on the various CCR clusters. The upper right chart breaks down CPU hours by job size (number of processors). The lower left chart breaks the usage down by academic division. Finally, the lower right chart shows the average daily core count. This is only a small sample of the many metrics that will be available through the open source version of XDMoD.


Figure 2.8-1. Open source version of XDMoD showing a summary page of usage from SUNY at Buffalo computers.

2.9 Project Requirements and Management

In the original proposal, information about the system requirements was compiled into a Requirements Traceability Matrix in order to help guide and coordinate development efforts and ensure that the project requirements were being met. It was anticipated at the outset of the project that the needs of the scientific research community as well as XSEDE management would continue to evolve over the five years of the award. The project requirements as well as the Agile design methodology were designed to be flexible enough to evolve in order to meet these changing needs. As shown in Appendix 10.1, which contains the Requirements Traceability Matrix exactly as it appeared in the proposal, we have met or exceeded almost all requirements.

3.0 AUDITING CHALLENGES

3.1 XDCDB

The XSEDE Central Database (XDCDB) is the repository for, among other things, XSEDE usage data including job and user accounting information. The job accounting data collected makes up the base properties as specified in the Open Grid Forum (OGF) Usage Record Format Recommendation (GFD.98). While these base properties are those items that all or most sites deem critical for accurately recording the usage on their resources, such as job identifier, user name, job name, duration, and charges, additional attributes are desirable. Additional job attributes include the job exit status, local disk usage, and peak memory usage. Information on the usage of storage resources, including shared filesystem description, capacity, and usage, is also desirable and is not currently recorded. In addition to job-level metrics, it is desirable to track up-to-date resource configuration descriptions that include a time-stamped description of the resource (the total number of nodes, cores, CPU speed, GPU capability, Rpeak, Rmax, etc.) as well as the portion of the resource that is available at any given time. This information could be utilized to more accurately calculate utilization of the resource at any given time. Our recommendations, which were provided to XSEDE leadership, are included in Appendix 10.5: XSEDE Central Database (XDCDB) Data Recording Recommendations.
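To show why a time-stamped record of the available portion of a resource matters, the following sketch compares utilization computed against a fixed nominal core count with utilization computed against the cores actually available over time. The configuration history, the numbers, and the function names are hypothetical and are not drawn from the XDCDB.

```python
def available_core_hours(config_history, start, end):
    """Integrate available cores over the window [start, end), in hours.

    config_history is a time-sorted list of (effective_from_hour, cores),
    whose first entry takes effect at or before `start`.
    """
    total = 0.0
    for i, (t0, cores) in enumerate(config_history):
        # Each configuration lasts until the next one takes effect.
        t1 = config_history[i + 1][0] if i + 1 < len(config_history) else end
        lo, hi = max(t0, start), min(t1, end)
        if hi > lo:
            total += cores * (hi - lo)
    return total


def utilization(used_core_hours, config_history, start, end):
    """Fraction of the available core hours that were actually used."""
    return used_core_hours / available_core_hours(config_history, start, end)


# Hypothetical day: 1000 cores available, 200 taken offline at hour 12.
history = [(0, 1000), (12, 800)]
used = 15_000  # core hours consumed by jobs during the day

util = utilization(used, history, 0, 24)
naive = used / (1000 * 24)  # ignores the partial outage

print(f"time-aware: {util:.3f}, nominal: {naive:.3f}")
```

In this toy case the nominal calculation understates utilization, because it charges the resource for core hours that were never available to users.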

3.2 POPS

The POPS database is used to manage XSEDE allocation requests and facilitates the process of reviewing and approving resource allocations. POPS is built on the Sybase RDBMS platform and is currently separate from the Postgres infrastructure running the XDCDB. It has grown organically over the course of many years and has been patched as additional functionality was needed. The data relating to allocation proposals has been user-provided, much of it in free-form text format prior to 2008, making it a challenge to extract accurate and useful information for reporting. In particular, publications and external funding in support of allocation requests are often of questionable accuracy, both on their own and when compared to NSF sources. Publications are not currently stored in a manner allowing programmatic access, and supporting grant information often has incorrect award numbers. It does not help that many funding agencies do not provide programmatic access to award information, requiring manual searches to be performed. Currently, data from POPS must be cleansed, using a process developed in conjunction with XSEDE management and data provided by NSF via physical media, before the information can be incorporated into reports.

The TAS team is taking part in the complete redesign of POPS, called POPS 2.0, that XSEDE is currently undertaking. This redesign will examine all aspects of the allocations process, develop use cases to drive the design of the new system, and verify as much information as possible before recording it. External data sources will be utilized to verify user-entered information where possible.

3.3 NSF Award Data

The NSF Award Search tool provides users with an easy way to obtain information about current and historical NSF awards. This includes the award number, NSF organization, PI/Co-PIs, award dates, amount awarded to date, and other information. The usefulness of this tool for reporting is hampered by the lack of programmatic access to the data (e.g., a Web Services or RESTful search interface) such as that provided by NIH. In order to obtain a copy of the Award Search data, an NSF staffer currently dumps a snapshot into a Microsoft Access database on a physical DVD and mails it to the TAS team. This data is then extracted from Access into the XDMoD data warehouse, where award numbers and other award details can be cross-referenced with XSEDE data, a process that typically occurs quarterly. Programmatic access to the NSF Award Search data would greatly enhance our ability to provide up-to-date reporting on the amount of support that NSF has provided to XSEDE projects and would also improve the accuracy of data collected during the allocations process.

3.4 Ensuring TAS Recommendations are Implemented

One aspect of UB’s Technology Audit Services is to identify deficiencies in XSEDE operations and services and make recommendations where appropriate for their improvement. Thus the role of TAS is advisory and does not include enforcing recommendations. While in general this has not proven to be problematic, there are instances where recommendations have not been implemented. For example, as shown in Section 4.4 (and Appendix 10.5), based upon examination of the existing records in the XDCDB, we have carefully formulated a series of recommendations for additional job, resource, and user related data to be included in the XDCDB that would allow greater auditing and analysis capabilities and lead to improved quality of service metrics. However, none of these recommendations has yet been implemented. As a partial response to this, we have implemented the Compliance tab in XDMoD (see Section 4.3), which monitors the data reported by each XSEDE service provider, including both data fields required by XSEDE and data fields that are recommended by TAS. The Compliance tab is visible to each service provider, XSEDE leadership, and NSF leadership, and it is our expectation that it will be used by XSEDE leadership to help ensure that recommendations are implemented. In addition to viewing the Compliance tab directly through the XDMoD portal, it is possible to have a compliance report sent weekly through email via the Custom Report Builder.
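The kind of check a compliance view performs can be sketched as a per-field completeness report over a provider's job records. This is an illustration only, not the XDMoD Compliance tab implementation: the field names are hypothetical, chosen to mirror the required base properties (job identifier, user name, job name, duration, charges) and the recommended additions (exit status, local disk usage, peak memory usage) discussed in Section 3.1.

```python
# Hypothetical field names standing in for required and recommended data.
REQUIRED = ("job_id", "user_name", "job_name", "duration", "charge")
RECOMMENDED = ("exit_status", "local_disk_usage", "peak_memory_usage")


def compliance_report(records):
    """Return, per field, the fraction of records that report a value."""
    n = len(records)
    report = {}
    for field in REQUIRED + RECOMMENDED:
        present = sum(1 for r in records if r.get(field) is not None)
        report[field] = present / n if n else 0.0
    return report


# Two hypothetical job records from one service provider.
records = [
    {"job_id": 1, "user_name": "a", "job_name": "x", "duration": 10,
     "charge": 1.0, "exit_status": 0},
    {"job_id": 2, "user_name": "b", "job_name": "y", "duration": 5,
     "charge": 0.5},
]

report = compliance_report(records)
incomplete = [field for field, frac in report.items() if frac < 1.0]
print(incomplete)
```

A weekly report could then list, for each provider, the fields in `incomplete` together with their completeness fractions.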

4.0 TECHNOLOGY AUDIT SERVICES IMPACT

4.1 Use Cases Here, we demonstrate, through several case studies, XDMoD’s utility for providing important metrics regarding resource utilization and performance of XSEDE that can be used for detailed analysis and planning as well as improving operational efficiency and performance of resources. We begin with a description of several early success stories regarding the use of application kernels to detect underperforming hardware and software.

4.1.1 Early Application Kernel Success Stories Modern HPC infrastructure is a complex combination of hardware and software environments that is continuously evolving, so it is difficult at any one time to know if optimal performance of the infrastructure is being realized. Early implementations of application kernels have already proven invaluable in identifying underperforming and sporadically failing infrastructure that would likely have gone unnoticed, resulting in diminished productivity (CPU cycles, failed jobs) on HPC systems that are oversubscribed.

While some of the cases presented here are the result of the application kernels run on the large production cluster at the Center for Computational Research (CCR) at the University at Buffalo, SUNY, the suite of application kernels is currently running on most XSEDE resources and will soon be running on all XSEDE resources as part of the Technology Audit Service of XSEDE. Application kernels have already successfully detected runtime errors on popular codes that are frequently run on XSEDE resources. For example, Figure 4.1-1 shows the execution time over the course of several months for an application kernel based on NWChem, a widely used quantum chemistry program that is run daily on the large production cluster at CCR. While the behavior for 8 cores is as expected, calculations on 16 cores (multiple nodes) in May showed widely sporadic behavior, with some jobs failing outright and others taking as much as seven times longer to run. The source of the performance degradation was eventually traced to a software bug in the I/O stack of a commercial parallel file system, which was subsequently fixed by the vendor, as evidenced by the normal behavior in the application kernel after June 4th. Indeed, the software patch to fix this problem is now part of the vendor’s standard operating system release. It is important to note that this error had likely gone unnoticed by the administrators and user community for some time and was only uncovered as a result of the suite of application kernels run at CCR.
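The kind of check described above, in which a kernel's execution time for a fixed job size should stay roughly constant, can be illustrated with a minimal Python sketch. It is not the XDMoD implementation: the function name, the window size, and the factor-of-two threshold are all assumptions chosen for illustration, and a failed run is represented here by None.

```python
from statistics import median

def flag_anomalous_runs(wall_times, window=10, threshold=2.0):
    """Return indices of suspicious runs of one application kernel.

    wall_times: execution times in seconds, in chronological order;
                None marks a run that failed outright (an assumption
                of this sketch, not an XDMoD convention).
    A run is flagged if it failed, or if it took more than `threshold`
    times the median of up to `window` preceding normal runs.
    """
    flagged = []
    history = []  # recent runs considered normal
    for index, seconds in enumerate(wall_times):
        if seconds is None:
            flagged.append(index)
            continue
        # Require a few normal runs before judging against a baseline.
        if len(history) >= 3 and seconds > threshold * median(history[-window:]):
            flagged.append(index)
        else:
            # Only normal runs feed the baseline, so a burst of slow
            # runs does not raise the baseline and mask itself.
            history.append(seconds)
    return flagged

# Hypothetical daily timings: steady, then one 7x slowdown and one failure.
timings = [100, 102, 98, 101, 700, None, 99, 103]
print(flag_anomalous_runs(timings))
```

This sketch only looks for slowdowns and failures. Detecting an improvement, such as the library upgrade discussed later in this section, would need a symmetric lower bound and a way to reset the baseline after a confirmed change.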

Figure 4.1-1 Plot of execution time of the NWChem application kernel on 8 and 16 cores over a several month time period on CCR's production cluster. The execution time for a given job size should be relatively constant over time. Calculations on 16 cores show sporadic performance degradation until early June, when a patch to a bug in the commercial parallel filesystem was installed.

As a further indication of the utility of application kernels, consider Figure 4.1-2, which shows a performance increase of a factor of two in MPI Tile IO after a system wide library upgrade from Intel MPI 4.0 to Intel MPI 4.0.3, which supports Panasas file system MPI I/O file hints. Since CCR employs a Panasas file system for its scratch file system, this particular application kernel alerted center staff to rebuild scientific applications that can utilize MPI file hints to improve performance. This would have gone unnoticed without the performance monitoring provided by the application kernels.


Figure 4.1-2 Application kernels detect I/O performance increase of a factor of 2 for MPI Tile IO in a library upgrade from Intel MPI 4.0 to Intel MPI 4.0.3, which supported PanFS MPI I/O file hints (in this case for concurrent writes).

Figure 4.1-3 Application kernel data for IMB (blue), IOR (red) and MPI-Tile-IO (black) on Lonestar4. The IOR and MPI-Tile-IO data show a sudden drop of aggregate write throughput bandwidth and the IMB data shows a sudden increase in latency starting on 7/24-25/2012.

Figure 4.1-3 shows a sudden decrease in file system performance on Lonestar4 as measured by 3 different application kernels (IOR, MPI-Tile-IO, and IMB). IOR and MPI-Tile-IO both show a sudden decrease in the aggregate write throughput bandwidth, while IMB, which measures latency, shows an equally sudden increase in latency. Once again, without application kernels periodically surveying this space, the loss in performance would have gone unnoticed.

We are currently working with TACC to understand this file system performance deterioration.

Figure 4.1-4 shows the unanticipated results brought to light by a periodically running application kernel based on the popular NAMD molecular dynamics package. The application kernel detected a 25% degradation in NAMD’s baseline performance that was the unanticipated result of a routine system-wide upgrade of the application version (note that the change indicator symbols mark when the source code for NAMD was upgraded). Pre-upgrade performance was restored through the application of more aggressive compiler options. Once again, without application kernels periodically surveying this space, valuable compute cycles would have been lost.

Figure 4.1-4. Application kernels detect a performance degradation of as much as 25% in the molecular dynamics application NAMD caused by a routine system-wide upgrade in the application version (note change indicator symbols indicate when application was upgraded).

One of the most problematic scenarios entails a single node posing a critical slowdown in which the cumulative resources for a job (possibly running on thousands of processing elements) are practically idled due to an unexpected load imbalance. It is very difficult for system support personnel to preemptively catch such problems, with the result that the end-users are the “canaries” that report damaged or underperforming resources, often after investigations that are very expensive both in terms of computational resources and personnel time. An active monitoring capability designed to automatically detect such problems is therefore highly desirable. For example, Figure 4.1-5 shows the results of a log file analysis of CCR’s large production cluster consisting of more than 1000 nodes. By examining only the size of the log files generated on each node we were able to detect a loose cable on one node and a job scheduler error on another node, both of which resulted in failed jobs (large log file size is indicative of errors). Without such analysis, the loose cable or job scheduler error would likely have gone undetected, resulting in many failed jobs, frustrated users, and underperformance of the resource. While analysis of system log files is not currently included within the XDMoD framework, it is anticipated that future versions will include it, given its utility in identifying faulty hardware.
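The log-size screening described above can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis code used at CCR: the function name, the node naming scheme, and the factor of 100 are assumptions, and the sketch takes the per-node log sizes as given rather than showing how they are collected.

```python
from statistics import median

def oversized_log_nodes(log_sizes, factor=100.0):
    """Return the nodes whose log size is far above the cluster norm.

    log_sizes: dict mapping node name -> total log size in bytes.
    factor:    how many times larger than the cluster median a node's
               logs must be before the node is reported (assumed value).
    """
    if not log_sizes:
        return []
    # The median is a robust "typical" size: a handful of nodes with
    # huge logs barely moves it, unlike the mean.
    typical = max(median(log_sizes.values()), 1)
    return sorted(node for node, size in log_sizes.items()
                  if size > factor * typical)

# Hypothetical cluster: 1000 nodes with ~1 kB of logs, two misbehaving.
sizes = {f"node{i:04d}": 1000 + i for i in range(1000)}
sizes["node0007"] = 1_500_000
sizes["node0420"] = 2_000_000
print(oversized_log_nodes(sizes))  # ['node0007', 'node0420']
```

A size-only check says nothing about the cause; it merely narrows the search to a few nodes whose logs a person can then read.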


Figure 4.1-5 Plot of log file analysis for each node in CCR's production cluster. Large log file size can be indicative of an error. Two nodes produce log files that are 1000 times larger than normal. One node was found to have a loose cable and the other had a job scheduler error, both resulting in failed jobs.

4.1.2 An Analysis of the Tradeoff between Throughput and Utilization using XDMoD TAS collaborated with SDSC researchers to use XDMoD to analyze utilization and throughput on Trestles compared to Ranger and Kraken. Throughput was measured by the user expansion factor, that is, the sum of the time in queue and the run time, divided by the run time. Based on this work, SDSC was able to fine-tune the maximum utilization on Trestles compatible with maintaining optimum turnaround. This work won the award for the Best Technology Paper at XSEDE12 and is summarized below.
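As a concrete reading of the expansion factor defined above, the following Python sketch computes it for a few jobs. The function name and the sample queue and run times are hypothetical; only the formula, (queue time + run time) / run time, comes from the text.

```python
def expansion_factor(queue_seconds, run_seconds):
    """Expansion factor of one job: (queue time + run time) / run time.

    A value of 1.0 means the job started immediately; 2.0 means it
    waited in the queue for as long as it ran.
    """
    if run_seconds <= 0:
        raise ValueError("run time must be positive")
    return (queue_seconds + run_seconds) / run_seconds

# Hypothetical jobs as (time in queue, run time), both in seconds.
jobs = [(0, 3600), (3600, 3600), (7200, 600)]
print([expansion_factor(q, r) for q, r in jobs])  # [1.0, 2.0, 13.0]
```

The third job shows why short jobs are so sensitive to this metric: a two-hour wait in front of a ten-minute run gives an expansion factor of 13.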

The Trestles system is targeted to modest-scale and gateway users, and is designed to enhance users’ productivity by maintaining good turnaround time as well as other user-friendly features such as long run times and user reservations. However, the goal of maintaining good throughput competes with the goal of high system utilization. Using XDMoD, one year of Trestles operations was analyzed to characterize the empirical relationship between utilization and throughput, with the objectives of understanding their relationship, and informing allocations and scheduling policies to optimize their tradeoff. There is considerable scatter in the correlation between utilization and throughput, as measured by expansion factor. There are periods of good throughput at both low and high utilizations, while there are other periods when throughput degrades significantly not only at high utilization but even at low utilization. However, throughput consistently degrades above ~90% utilization, as shown in Figure 4.1-6. User behavior clearly impacts the expansion factor metrics: the great majority of jobs with extreme expansion factors are associated with a very small fraction of users who either (1) flood the queue with many jobs or (2) request job run times far in excess of actual run times. While the former is a user workflow choice, the latter clearly demonstrates the benefit of matching requested time to actual run time. Utilization and throughput metrics derived from XDMoD are compared for Trestles with two other XSEDE systems, Ranger and Kraken, with different sizes and allocation/scheduling policies. Both Ranger and Kraken have generally higher utilization and, not surprisingly, higher expansion factors than Trestles over the analysis period. 
As a result of this analysis, we intend to increase the target allocation fraction from the current 70% to ~75-80%, and strongly advise users to reasonably match requested run times to actual run times. (Ref: R. L. Moore, L. Carson, A. Ghadersohi, A. Jundt, K. Yoshimoto, and W. Young, “Analyzing Throughput and Utilization on Trestles,” XSEDE12 2012 eXtreme Science and Engineering Discovery Environment 2012, Chicago, IL, USA – July 16 – 20, 2012. DOI: 10.1145/2335755.2335802).

Figure 4.1-6 - Trestles Daily Utilization and 95% Threshold User Expansion Factors for Period 4/2011 to 3/2012.

4.1.3 XSEDE Planning and Analysis Case Studies using XDMoD Measuring the utilization of high-end cyberinfrastructure such as XSEDE is important because it provides a basic understanding of how CI resources are being utilized and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics. Indeed, as described by Katz et al. [19], the ability to readily measure usage modalities for cyberinfrastructure leads to a greater understanding of the objectives of end users and, accordingly, insight into the changes in CI needed to better support their usage. Furthermore, given the rapid pace of hardware upgrades, access to reliable, extensive data from past usage is also essential for planning purposes.

XSEDE is the most advanced, powerful, and robust collection of integrated advanced digital resources and services in the world [55]. It is a single virtual system that scientists can use to interactively share computing resources, data, and expertise. XDMoD, through the XSEDE central database, provides a rich repository of usage data. Here we demonstrate, through several examples, the extent of the data as well as its utility for planning. In what follows, the terminology Service Units (SUs) is liberally used. It should be understood as core hours with the caveat that an SU is defined locally in the context of a particular machine. Thus, the value of an SU varies across resources utilizing varying technologies and, by implication, varies over time as technology advances. We begin with a historical look at utilization. The data displayed in Figure 4.1-7 shows the total number of service units (SUs) delivered to the community on a year-by-year basis from 2005 through 2012.

Figure 4.1-7 Total XSEDE usage in billions (G) of service units (SUs) for the years 2005 - 2012. Note: For the purpose of this paper, service units should be understood as core hours with the caveat that the value of SU varies across resources utilizing varying technologies and, by implication, varies over time as technology advances.

The large increase in the number of delivered SUs beginning in 2008 is not surprising since it was during that period that the NSF funded two very large computational resources, Ranger at TACC and Kraken at UTK/ORNL, which provided more cycles than the previous set of resources combined. Figure 4.1-8 is a plot which is designed to provide an indication of the largest, average and total usage on XSEDE resources, showing for example that the largest XSEDE allocation has increased by more than an order of magnitude since 2005 to more than 100M SUs. Remarkably, the largest allocation of a single user today exceeds the total usage of all users in 2005 and 2006.

Figure 4.1-8 Largest, average and total SU allocations over time. Note that the average and largest allocations have increased by more than a factor of 10 over the time period above.

The TeraGrid/XSEDE usage by parent science is shown in Figure 4.1-9. Parent Science is an aggregation of fields of science defined by a previous (ca. 1995) organizational structure of the NSF and corresponds to NSF divisions (or previous divisions). This aggregation is used to categorize the TeraGrid/XSEDE allocations and usage. Given the modest number of organizational changes at NSF at the divisional level, the classifications in Figure 4.1-9 and Figure 4.1-10 can easily be related to current NSF divisions. Physics and molecular biosciences are the top consumer science fields using between 600M-700M SUs per year after the Ranger and Kraken resources were deployed. Usage by the molecular biosciences has become comparable to physics in recent years as the bioscientists become more dependent on simulation as a part of their scientific arsenal. Materials research is also a significant and growing consumer of CPU time.

Figure 4.1-9 Total SU Usage by Parent Science


Figure 4.1-10 Average Core Count (weighted by SUs) by Parent Science

However, as Figure 4.1-10 shows, the average core count by parent science varies widely. Note, as shown in [20], [21], that when examining the average core counts run on XSEDE resources, it can be misleading to report only the average core count for a particular metric or resource. Accordingly, we find it more informative to compute the average core count by weighting each job by the total SUs it consumes. Traditionally, fields in the Mathematical and Physical Sciences (MPS) directorate of NSF have been thought to be the largest users of XSEDE computational resources. While MPS users are still significant, it is clear from Figure 4.1-9 that the molecular biosciences community, which falls predominantly within the Biological Sciences Directorate, has been on the rise for some time and has harnessed the capabilities of these resources to advance the field. Researchers in this area have passed their colleagues in all of the divisions within MPS with the exception of Physics, with which they are clearly on par at this point. However, from Figure 4.1-10 it is also clear that the type of jobs that are typical of the molecular biosciences use a relatively small number of compute nodes. Physics and fluid dynamics (which dominates Chemical Thermal Systems), fields long characterized by the need to solve complex partial differential equations, typically require careful attention to parallelization and, by default, large core count jobs. Many of the biological applications are dominated by complex workflows, involving many jobs but relatively few cores, often with large memory per core. In general, the average number of cores used is moderate in size. It is interesting to speculate on why that is the case. Certainly, it could be algorithmic. As we know, the development of effective software is an extremely time consuming and human intensive problem. Also, there are practical issues of turnaround. Many users have learned to structure their jobs for optimal turnaround, and that often can be in conflict with optimal core count use. In addition, the use of average core count as a measure of the need for machines with many processors can be misleading. The job mix submitted by most users ranges over core count. Often it is necessary to run a significant number of smaller core count jobs as a preliminary to the single large core count run. These all contribute to lowering the average core count number.
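The difference between a plain average core count and one weighted by SUs consumed can be made concrete with a short Python sketch. The function names and the job mix are hypothetical and chosen only to illustrate the weighting described above.

```python
def simple_average_cores(jobs):
    """Unweighted mean core count; every job counts equally."""
    return sum(cores for cores, _ in jobs) / len(jobs)

def su_weighted_average_cores(jobs):
    """Mean core count with each job weighted by the SUs it consumed."""
    total_sus = sum(sus for _, sus in jobs)
    if total_sus == 0:
        raise ValueError("jobs consumed no SUs")
    return sum(cores * sus for cores, sus in jobs) / total_sus

# Hypothetical job mix as (cores, SUs consumed): 99 single-core
# preliminary jobs of 1 SU each, then one 1024-core production run
# lasting 10 hours (1024 * 10 = 10240 SUs).
jobs = [(1, 1)] * 99 + [(1024, 10240)]
print(round(simple_average_cores(jobs), 2))       # 11.23
print(round(su_weighted_average_cores(jobs), 1))  # 1014.2
```

In this made-up mix nearly all of the SUs are spent at 1024 cores, yet the unweighted average reports about 11 cores, which is the effect of many small preliminary jobs noted in the text.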


Based in part on similar planning data provided by TAS through XDMoD, Barry Schneider produced the following paper: B.I. Schneider, A Data History of TeraGrid/XSEDE Usage: Defining a Strategy for Advanced CyberInfrastructure (ACI), Office of Cyberinfrastructure, National Science Foundation, Arlington, Virginia 22230 (dated March 11, 2012).

4.2 XSEDE Reporting Templates The XDMoD Custom Report Builder allows the user to design and publish reports from XDMoD charts. A powerful new report template feature has been added that facilitates the generation of charts by providing users with a series of templates upon which a custom report can easily be assembled. For example, a given service provider can generate the entire series of XSEDE quarterly report charts for each resource with a single mouse click on the “Template: SP Quarterly Report” button, as shown in Figure 4.2-1. The user now has the option to export the report as a PDF or MS Word file. While perhaps not as glamorous as some of the other early successes discussed in the body of this report, XDMoD is being used by all XSEDE service providers to provide usage plots for the quarterly and annual reports. Benefits include increased productivity (staff are not tied up for several days each quarter generating the requisite plots) and uniformity of the format of the data presented from provider to provider.

Figure 4.2-1. The XDMoD Custom Report Builder. Note that this series of 9 quarterly reports (left hand side) for TACC was generated by a single mouse click on the SP Quarterly Report template in the Custom report builder.

4.3 XSEDE Compliance Tab A Compliance tab was added to the XDMoD framework to provide service providers, NSF Program Officers and XSEDE leadership with a tool to quickly assess service provider compliance with XSEDE operational reporting requirements. Figure 4.3-1 shows a snapshot of this feature. The new Compliance tab tracks whether or not each service provider is supplying required reporting metrics and data. The data are shown in the first column of Figure 4.3-1. Required data (as determined by XSEDE) are shown using a bold face font, while additional data that TAS strongly recommends should be collected are in the lighter (grey) font. The TAS recommended data (see Section 4.4) would allow greater auditing and analysis capabilities and lead to improved quality of service metrics. The time frame of the compliance is user selectable, providing a compliance history for the service providers. Like the charts generated with XDMoD, compliance data can be output as a report in the Report Generator using an XDMoD-provided template. Therefore, the service provider, XSEDE leadership, and NSF leadership can choose to have the Compliance Report emailed to them daily, weekly, or monthly without having to log into the XDMoD portal to manually check on compliance.

Figure 4.3-1 XDMoD Compliance tab

4.4 XSEDE Central Database (XDCDB) Data Recording Recommendations: Based upon TAS’s examination of the existing records in the TGcDB, we have carefully formulated a series of recommendations for improved record-keeping. These recommendations include improved job metrics, resource metrics and user metrics. Many of these metrics are included in the Compliance Tab in XDMoD (see Section 4.3), which is viewable by NSF and XSEDE leadership and tracks data reporting by the various service providers for both required data fields and our recommended data fields. The specific recommendations are contained in Appendix 10.5.

6.0 EDUCATION, OUTREACH, and TRAINING

6.1 Summary Report for REU Supplements to OCI 1025159

Research Experience for Undergraduates: REU funds from the TAS award were leveraged to continue the Center for Computational Research’s long-standing program in STEM training for undergraduates as well as K-12 and graduate level programs. Students not only gain valuable experience working in a state-of-the-art research environment as part of a team, often applying knowledge in their chosen fields, but also utilize new technologies, including advanced software engineering, database design, parallel computing, and user interface development. Over the summer, students also take part in a series of specialized workshops designed specifically for them by CCR staff, including topics such as basic Unix command line tools, debugging techniques, scientific visualization, scripting, and database concepts. These topics (see Table 6.1-1), often outside of the students’ area of expertise, are designed to introduce them to advanced computational methods. The summer culminates with a short presentation by each student describing the research project they worked on and their role in it.

Table 6.1-1. Undergraduate Training Sessions for the Summer 2012

CCR Summer Training for Undergraduate and Graduate Students

CCR Resources and Overview
CLI/Shells/Utilities - awk, cut, sed
More CLI/editors
Perl
Debugging for fun and profit
Scientific Visualization - from data to pretty pictures
Advanced Visualization
Databases

With funds from our 2011 and 2012 REU Supplements to OCI 1025159, we supported 4 undergraduates: Amanda Ruby, Katrina Schlum, Sam Steffan, and Thomas Yearke. Amanda Ruby is a senior biology major at the University at Buffalo; Katrina Schlum is a junior Bioinformatics major at St. Bonaventure University; Sam Steffan is a sophomore biology major at North Carolina State University; and Thomas Yearke is a senior computer science major at Rochester Institute of Technology. During 2012, they worked in the general scientific computing environment at the Center for Computational Research as well as on the advanced cyberinfrastructure aspects of our Technology Audit Service for XSEDE project, which provided them with hands-on research experience, including developing computationally lightweight application kernels and user portals. In all 4 cases, however, we also provided them with experience directly related to their chosen field of study, namely bioinformatics for Schlum, biology for Ruby and Steffan, and computer science for Yearke.

One measure of the success of our program is that several of our undergraduate students have gone on to pursue advanced degrees in science and engineering, including Mark Cianchetti (Cornell University – doctoral program), Lucas Hoffmann (Emory University – doctoral program), Hans Baumgaertner (University at Buffalo – masters), Kyle Markus (University at Buffalo – PhD), Molly Hannigan (Case Western, PhD), and Dan McSkimming (University at Buffalo – masters).

Table 6.1-2 provides a summary of the undergraduates (and one HS student) who have worked on various research projects at CCR over the past 3 years. As you will note, the students come from a mix of 2- and 4-year colleges as well as from non-traditional areas, including English, Media Study, and Digital Arts and Animation.

Table 6.1-2: Undergraduates Participating in Research Projects at CCR since 2007

Student | Affiliation | Period | Major
Baratta, Dominic | University at Buffalo | 8/09 - 5/10 | Computer Science
Baumgaertner, Hans | University at Buffalo | 7/07 - 8/10 | Media Study/Production
Brodfuehrer, Sean | University at Buffalo | 2/07 - 12/07 | Urban Planning
Brubaker, Jacob | University at Buffalo | 1/12 - current | Geography
Dziadaszek, Michael | Empire State College | 6/09 - current | Information Technology
Hammill, Lucas | University at Buffalo | 7/07 - 8/08 | English
Haney, Samantha | Cleveland Inst. of Art | 5/07 - 3/11 | T.I.M.E. Digital Arts/Animation
Hannigan, Molly | University at Buffalo | 5/12 - current | Biochemistry and Mathematics
Hoffmann, Lukas | University of Pittsburgh | 6/07 - 8/08 | Scientific Computing
Houser, William | Rensselaer Polytechnic Institute | 5/09 - 7/09 | Gaming & Simulation/Business Mgmt
Lehner, Matthew | University at Buffalo | 2/07 - 8/09 | Business Administration
Marcus, Kyle | University at Buffalo | 7/08 - 5/12 | Computer Science
Milman, Noah | University at Buffalo | 8/06 - 5/07 | Communications
Pappalardo, Benjamin | Ohio State University | 6/08 - 7/10 | Visual Communication Design
Pitman, Mark | Canisius High School | 7/09 - current |
Ruby, Amanda | University at Buffalo | 5/12 - current | Biological Sciences
Schlum, Katrina | St. Bonaventure University | 5/12 - current | Bioinformatics
Steffan, Sam | North Carolina State | 5/12 - current | Biology
Volo, Robert | Erie Community College | 6/09 - 8/11 | Computer Science
Yearke, Jillian | University at Buffalo | 6/12 - current | Civil Engineering
Yearke, Thomas | Rochester Institute of Technology | 5/10 - current | Software Engineering
Zhou, Jason | Nichols High School | 6/12 - current |

In terms of K-12 outreach, each year CCR runs the Eric Pitman Annual Summer Workshop in Computational Science (http://ccr.buffalo.edu/outreach/k-12-outreach/summer-workshop.html). Every summer, high school sophomores, juniors, and seniors spend two weeks learning computer programming and its application to problems in chemistry, visualization, and most recently bioinformatics. High school students have not been the only beneficiaries of this program. Through NSF REU grants, CCR has supported many undergraduates over the years to help with the development of the workshops, including course material in Unix, Perl, MySQL, C++, and CGI. The program has been particularly effective in reaching out to under-represented groups. This year (2012), almost 50% of the participants were female; the year before, more than 50% were. We have also routinely had students from inner city schools participating.

6.2 TAS Workshops The TAS group has been active in outreach and training for XDMoD, offering the following workshops/BOFs.


1. T. R. Furlani, et al., Technology Audit Service for TG:XD, BOF presentation made on 11/16/2010 at SC10, SC ’10 International Conference on High Performance Computing, Networking, Storage, and Analysis, New Orleans, LA, USA – November 13 – 19, 2010.
2. T. R. Furlani, et al., Technology Audit Service for XSEDE, BOF presentation made on 11/15/2011 at SC11, SC ’11 International Conference on High Performance Computing, Networking, Storage, and Analysis, Seattle, WA, USA – November 12 – 18, 2011.
3. T. R. Furlani, et al., Technology Audit Service for XSEDE, BOF presentation made on 11/14/2012 at SC12, SC ’12 International Conference on High Performance Computing, Networking, Storage, and Analysis, Salt Lake City, Utah, USA – November 10 – 16, 2012.
4. T. R. Furlani, et al., XSEDE Metrics on Demand (XDMoD) Technology Auditing Framework: A Guide for the XSEDE Community, BOF presentation made at XSEDE13: Gateway to Discovery, San Diego, CA, USA – July 22 – 25, 2013.

The brief summary of the XSEDE13 BOF is as follows. This BOF will provide a brief overview of XDMoD and its utility for analyzing XSEDE usage data as well as exploring system performance via analysis of application kernel data. This will be followed by a series of case studies carried out in real-time using XDMoD to provide the audience with a better understanding of how to navigate the user interface to achieve desired analyses. Particular emphasis will be given to case studies of interest to Campus Champions. The session will conclude with a feedback-driven discussion of current XDMoD capabilities and future feature requests.

7.0 TAS MEETINGS, EVENTS, PRESENTATIONS, WORKSHOPS, AND PUBLICATIONS

7.1 Meetings: Weekly meetings of the TAS Working Group are held every Wednesday afternoon. CCR staff members attend in person, with University of Michigan and Indiana University subcontractors attending by web and audio conference. The TAS working group includes:
• University at Buffalo: Dr. Thomas R. Furlani (PI), Dr. Matthew D. Jones (Co-PI and Technical Project Manager), Mr. Steven Gallo (lead software engineer), Mr. Andrew Bruno (scientific programmer), Dr. Charng-Da Lu (computational scientist), Mr. Ryan Gentner (scientific programmer), Mr. Amin Ghadersohi (scientific programmer), Mr. Jeffery T. Palmer (scientific programmer), Dr. Nikolay Simakov (computational scientist), Dr. Abani Patra, and Dr. Robert L. DeLeon (Project Manager)
• Indiana University: Dr. Gregor von Laszewski and Dr. Fugang Wang
• University of Michigan: Dr. Thomas Finholt and Mr. Jonathan Brier
• NICS: Dr. Haihang You and Ziliang Zhao

7.2 Events, Presentations and Workshops: The TAS group has been very active in disseminating the progress made on TAS as well as closely related work. We have published in the primary scientific literature. In addition, we attended and presented papers, posters and BOF workshops at TG11, XSEDE12, XSEDE13, SC10, SC11, and SC12. A listing of TAS publications, presentations, and workshops is as follows.

1. T. R. Furlani, B. I. Schneider, M. D. Jones, J. Towns, D. L. Hart, S. M. Gallo, R. L. DeLeon, C. Lu, A. Ghadersohi, R. J. Gentner, A. Patra, G. von Laszewski, F. Wang, J. T. Palmer, and N. Simakov, “Using XDMoD to Facilitate XSEDE Operations, Planning and Analysis,” submitted to XSEDE13: Gateway to Discovery, San Diego, CA, July 22-25, 2013.
2. C. Lu, J. Browne, R. L. DeLeon, J. Hammond, B. Barth, T. R. Furlani, S. M. Gallo, M. D. Jones, and A. K. Patra, “Comprehensive Job Level Resource Measurement and Analysis for XSEDE HPC Systems,” submitted to XSEDE13: Gateway to Discovery, San Diego, CA, July 22-25, 2013.
3. Z. Zhao, C. Lu, F. Xing, and H. You, “Achieve Best Utilization across XSEDE Platforms,” submitted to XSEDE13: Gateway to Discovery, San Diego, CA, July 22-25, 2013.
4. T. R. Furlani, M. D. Jones, S. M. Gallo, A. E. Bruno, C. Lu, A. Ghadersohi, R. J. Gentner, A. Patra, R. L. DeLeon, G. von Laszewski, L. Wang and A. Zimmerman, “Performance Metrics and Auditing Framework Using Applications Kernels for High Performance Computer Systems,” Concurrency and Computation: Practice and Experience, Vol. 25, Issue 7, p. 918 (2013). DOI: 10.1002/cpe.2871
5. C. Lu, M. D. Jones, T. R. Furlani, “Automatically Mining Program Build Information via Signature Matching,” TG 11 Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, Salt Lake City, UT, July 18-21, 2011. DOI: 10.1145/2016741.2016766
6. R. L. Moore, L. Carson, A. Ghadersohi, A. Jundt, K. Yoshimoto, and W. Young, “Analyzing Throughput and Utilization on Trestles,” XSEDE12 2012 eXtreme Science and Engineering Discovery Environment, Chicago, IL, USA – July 16 – 20, 2012. DOI: 10.1145/2335755.2335802
7. T. R. Furlani, et al., Technology Audit Service for TG:XD, BOF presentation made on 11/16/2010 at SC10, SC ’10 International Conference on High Performance Computing, Networking, Storage, and Analysis, New Orleans, LA, USA – November 13 – 19, 2010.
8. T. R. Furlani, et al., Technology Audit Service for XSEDE, BOF presentation made on 11/15/2011 at SC11, SC ’11 International Conference on High Performance Computing, Networking, Storage, and Analysis, Seattle, WA, USA – November 12 – 18, 2011.
9. T. R. Furlani, et al., Technology Audit Service for XSEDE, BOF presentation made on 11/14/2012 at SC12, SC ’12 International Conference on High Performance Computing, Networking, Storage, and Analysis, Salt Lake City, Utah, USA – November 10 – 16, 2012.
10. Charng-Da Lu, James C. Browne, Robert L. DeLeon, John Hammond, Bill Barth, Abani K. Patra, Thomas R. Furlani, Matthew D. Jones, Steve M. Gallo, “Comprehensive Resource Measurement and Analysis for HPC Systems,” poster presented at XSEDE12 2012 eXtreme Science and Engineering Discovery Environment, Chicago, IL, USA – July 16 – 20, 2012.
11. A. Ghadersohi, R. J. Gentner, T. R. Furlani, M. D. Jones, S. M. Gallo, C. Lu, A. K. Patra, R. L. DeLeon, G. von Laszewski, F. Wang, “XDMoD 2.0 Performance Metrics and Auditing Framework for High Performance Computer Systems,” poster presented at XSEDE12 2012 eXtreme Science and Engineering Discovery Environment, Chicago, IL, USA – July 16 – 20, 2012.

8.0 TEAM MEMBERS

8.1 University at Buffalo: We are pleased to report that most of the TAS staff has remained together for the last three years, including: Dr. Thomas R. Furlani, who serves as the PI and oversees the entire TAS program; Dr. Matthew D. Jones, who oversees the technical aspects of the program as well as the application kernel development; Mr. Steven M. Gallo, who leads development and implementation of the XDMoD portal and backend infrastructure; Mr. Andrew Bruno, developer of the UB Metrics on Demand (UBMoD) portal, who is working on the development of the XDMoD portal; Mr. Ryan J. Gentner, scientific programmer, who works on the XDMoD portal development; Mr. Amin Ghadersohi, scientific programmer, who focuses on the XDMoD portal development and the XDMoD Data Warehouse; Mr. Jeffery T. Palmer, scientific programmer, who is working on the portal development and an open source version of XDMoD; and Dr. Robert L. DeLeon, who serves as the TAS Project Manager. Dr. Nikolay Simakov, computational scientist, has replaced Dr. Lu on application kernel development.

8.2 Indiana University (IU) Subcontract: The Indiana University subcontract is led by Dr. Gregor von Laszewski and supported by Dr. Fugang Wang. Efforts included analyzing the POPS allocation data, working on publications and scientific impact, and bringing all of these into XDMoD.

8.3 University of Michigan (UM) Subcontract: The University of Michigan subcontract effort is led by Dr. Thomas Finholt and supported initially by Dr. Ann Zimmerman and later Mr. Jonathan Brier. UM is responsible for determining user assessment of XDMoD and obtaining critical user feedback to optimize XDMoD to best meet user needs.

8.4 National Institute for Computational Sciences (NICS) Subcontract: The NICS effort is led by Dr. Haihang You. The NICS team is developing the Performance Environment Autoconfiguration frameworK (PEAK), a technology that helps developers and users of scientific applications select the optimal configuration for their application on a given platform. A PEAK tab will ultimately be incorporated into XDMoD to bring this resource directly to the users.

8.5 Technical Advisory Committee: As outlined in the proposal, the TAS has assembled a Technical Advisory Committee (TAC) composed of computational scientists with expertise in high performance computing, scientific computing, and XSEDE. The function of the TAC is to provide guidance to the TAS project to help ensure that the auditing services are both of high quality and useful to the various XSEDE stakeholders. A listing of the TAC membership is given below. This group meets in person with the TAS team each year at the annual SC meeting. Typically these meetings are also attended by J. Towns, the XSEDE PI, and by B. Schneider, the TAS and XSEDE NSF program officer. To date these meetings have been convened at SC10, SC11, and SC12. In addition, the TAC meets informally by conference call or e-mail between formal meetings. Summaries of the meeting minutes and committee recommendations are included in the TAS Annual Reports.

TAS Technical Advisory Committee:

1. Dr. Jerzy Bernholc (Atomistic QM, Physics, NC State) http://chips.ncsu.edu/~bernholc/

2. Dr. P.K. Yeung (CFD, Georgia Tech) http://www.ae.gatech.edu/community/staff/bio/yeung-p?

3. Dr. Martin Berzins (Chair Computer Science at Utah) http://www.cs.utah.edu/personnel/mb/

4. Dr. Craig Stewart (Executive Director, Pervasive Technology Institute, Indiana), http://www.indiana.edu/~ovpit/bios/cstewart.html

5. Dr. Richard Brower (Lattice QCD, Boston University) http://physics.bu.edu/people/show/brower

6. Dr. Sara Graves (Director, Information Technology and Systems Center; Univ of Alabama, Huntsville), http://www.itsc.uah.edu

7. Dr. Abani Patra (Committee Chair, University at Buffalo) http://www.mae.buffalo.edu/people/full_time/a_patra.php


9.0 REFERENCES

1. Nagios: The Industry Standard. IT Infrastructure Monitoring: http://www.nagios.org/ [August 19, 2011].

2. Massie ML, Chun BN, Culler DE. The ganglia distributed monitoring system: design, implementation, and experience. Parallel Computing 2004; 30(8) 817-840.

3. Cacti: The Complete RRDTool-based Graphing Solution. http://www.cacti.net/ [August 19, 2011].

4. Smallen S, Olschanowsky C, Ericson K, Beckman P, Schopf J. The Inca Test Harness and Reporting Framework. Proceedings of Supercomputing 2004. See also: Inca. http://inca.sdsc.edu [September 1, 2011].

5. Hawkeye: A Monitoring and Management Tool for Distributed Systems. http://www.cs.wisc.edu/condor/hawkeye/ [August 19, 2011].

6. Laszewski, v. G., Grid Usage Sensor Service. http://128.150.4.107/awardsearch/showAward.do?AwardNumber=0414407 [August 19, 2011].

7. Martin S, Lane P, Foster I, Christie M. TeraGrid's GRAM Auditing & Accounting, & Its Integration with the LEAD Science Gateway, TeraGrid Workshop. 2007 http://www.globus.org/alliance/publications/papers/TG_GRAM_auditing_and_LEAD_Gateway_final_2.pdf [August 19, 2011].

8. GPIR. Available from: http://www.tacc.utexas.edu/projects/gpir.php [September 8, 2011].

9. TeraGrid Knowledge Base: How can I check my TeraGrid account balance?. [October 28, 2008].

10. TeraGrid Knowledge Base: How can I tell if a TeraGrid resource is available?. [May 16, 2008].

11. TeraGrid Traffic Map. https://network.teragrid.org/trafmap/ [August 19, 2011].

12. TeraGrid Information Services. Available from: [September 8, 2011].

13. TeraGrid Knowledge Base: On the TeraGrid, how do I convert service units on one platform to the equivalent amount on another platform?. [October 28, 2008].

14. TeraGrid Knowledge Base: What is Inca?. [May 22, 2008].

15. TeraGrid Knowledge Base: How does Inca work?. [July 21, 2008].

16. Reporters and Reporter Repositories. [August 19, 2011].

17. Available from: http://inca.sdsc.edu/repository/latest/cgi-bin/list_reporters.cgi. [September 8, 2011].

18. TeraGrid Knowledge Base: What is pfmon, and can I run it on the TeraGrid?. [September 18, 2008].

19. Katz DS, Hart D, Jordon C, Majumdar A, Navarro JP, Smith W, Towns J, Welch V, Wilkins-Diehr N. Cyberinfrastructure Usage Modalities on the TeraGrid. IEEE International Parallel & Distributed Processing Symposium, 932 (2011). DOI 10.1109/IPDPS.2011.239

20. D. L. Hart, Measuring TeraGrid: Workload Characterization for an HPC Federation. IJHPCA, November 2011, 25(4): 451-465. Advance online publication, Feb. 10, 2011. doi:10.1177/1094342010394382.

21. D. L. Hart, Deep and Wide Metrics for HPC Resource Capability and Project Usage, SC11 State of the Practice Reports (Seattle, WA), 2011. doi:10.1145/2063348.2063350

22. TeraGrid Knowledge Base: What is PAPI, and where is it installed on the TeraGrid?. [October 17, 2008].

23. Canal P, Green C. GRATIA, A resource accounting system for OSG. CHEP 2007.

24. DOD HPC modernization program metrics: http://www.hpcmo.hpc.mil/Htdocs/HPCMETRIC/index.html [December 16, 2011].

25. Paul M. Bennett. 2011. Sustained systems performance monitoring at the U. S. Department of Defense high performance computing modernization program. In State of the Practice Reports (SC '11). ACM, New York, NY, USA, Article 3, 11 pages. DOI=10.1145/2063348.2063352 http://doi.acm.org/10.1145/2063348.2063352

26. NERSC performance monitoring tools: https://www.nersc.gov/research-and-development/performance-and-monitoring-tools/ [December 16, 2011].

27. DOE “operational assessment” metrics for various HPC sites, for example ORNL: http://info.ornl.gov/sites/publications/files/Pub32006.pdf [December 16, 2011].

28. University at Buffalo Metrics on Demand (UBMoD): Open source data warehouse and web portal for mining statistical data from resource managers in high-performance computing environments. Developed at the Center for Computational Research at the University at Buffalo, SUNY. Freely available under an open source license at SourceForge at http://ubmod.sourceforge.net/

29. Furlani, T.R., Jones, M.D., Gallo, S.M., Bruno, A.E., Lu, C.-D., Ghadersohi, A., Gentner, R.J., Patra, A., DeLeon, R.L., von Laszewski, G., Wang, F., and Zimmerman, A., “Performance metrics and auditing framework using application kernels for high performance computer systems,” Concurrency and Computation: Practice and Experience, pp.n/a–n/a, 2012. [Online]. Available: http://dx.doi.org/10.1002/cpe.2871

30. Levene M, Loizou G. Why is the snowflake schema a good data warehouse design?. Inf. Syst. 28, 3 (May 2003), 225-240. DOI=10.1016/S0306-4379(02)00021-2 http://dx.doi.org/10.1016/S0306-4379(02)00021-2

31. Moody DL, and Kortink MAR. From ER Models to Dimensional Models: Bridging the Gap between OLTP and OLAP Design, Journal of Business Intelligence, 8, 3, Summer (2003).

32. Lin Beixin (Betsy), Hong Yu, Lee Zu-Hsu. Data Warehouse Performance (pages 318-322) in Encyclopedia of Data Warehousing and Mining, Second Edition, editor Wang John (2009).

33. ExtJS Website. http://www.sencha.com/products/extjs/

34. Krasner GE, Pope ST. A cookbook for using the model-view controller user interface paradigm in smalltalk-80. J. Object Oriented Program., 1(3):26–49, 1988.

35. Buschmann F, Meunier R, Rohnert H, Sommerlad P, Stal M. Pattern-Oriented Software Architecture, A System of Patterns, John Wiley & Sons, (1996).

36. Valiev M, Bylaska EJ, Govind N, Kowalski K, Straatsma TP, van Dam HJJ, Wang D, Nieplocha J, Apra E, Windus TL, de Jong WA. NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations. Comput. Phys. Commun. 181, 1477 (2010)

37. http://www.nwchem-sw.org [December 16, 2011].

38. Schmidt MW, Baldridge KK, Boatz JA, Elbert TS, Gordon MS, Jensen JH, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA. General Atomic and Molecular Electronic Structure System. J. Comput. Chem., 14, 1347-1363 (1993).

39. Gordon MS, Schmidt MW. Advances in electronic structure theory: GAMESS a decade later pp. 1167-1189, in Dykstra CE, Frenking G, Kim KS, Scuseria GE (editors), Theory and Applications of Computational Chemistry: the first forty years. Elsevier, Amsterdam, 2005.

40. http://www.msg.ameslab.gov/gamess [December 16, 2011].

41. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26:1781-1802, 2005; DOI 10.1002/jcc.20289

42. http://www.ks.uiuc.edu/Research/namd [December 16, 2011].

43. Skamarock WC, Klemp JB, Dudhia J, Gill DO, Barker DM, Duda MG, Huang XY, Wang W, and Powers JG. A description of the advanced research WRF version 3. Technical report, National Center for Atmospheric Research, NCAR/TN–475+STR, 2008. http://www.wrf-model.org

44. http://www.cesm.ucar.edu

45. McCalpin JD. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.

46. http://www.cs.virginia.edu/stream

47. OSU Micro Benchmarks 3.3. http://mvapich.cse.ohio-state.edu/benchmarks/ [September 1, 2011]. See also: Intel MPI Benchmarks 3.2.2. http://software.intel.com/en-us/articles/intel-mpi-benchmarks [September 1, 2011].

48. Petitet A, Whaley RC, Dongarra J, Cleary A. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers http://netlib.org/benchmark/hpl [September 1, 2011].

49. Murphy RC, Wheeler KB, Barrett BW, Ang JA. Introducing the Graph 500. Cray User’s Group (CUG), (May 5, 2010). http://www.graph500.org/

50. Asanovic K, Bodik R, Catanzaro B, Gebis J, Husbands P, Keutzer K, Patterson D, Plishker W, Shalf J, Williams S, Yelick K. The Landscape of Parallel Computing Research: A View from Berkeley. Technical report UCB/EECS-2006-183, EECS Department, University of California at Berkeley, (December 2006).

51. S. D. Ahern, S. R. Alam, M. R. Fahey, R. J. Hartman-Baker, R. F. Barrett, R. A. Kendall, D. B. Kothe, O. E. B. Messer, R. T. Mills, R. Sankaran, A. N. Tharrington, J. B. White III. "Scientific Application Requirements for Leadership Computing at the Exascale". Proceedings of the 2008 Cray User Group Meeting, May 5-8, 2008, Helsinki, Finland. Also available as ORNL/TM-2007/238.

52. Dwarf Mine. http://view.eecs.berkeley.edu/wiki/Dwarf_Mine [September 1, 2011].

53. Hammond, J. "TACC_stats: I/O performance monitoring for the intransigent" In 2011 Workshop for Interfaces and Architectures for Scientific Data Storage (IASDS 2011)

54. Hadri, B., You, H., Moore, S.: Achieve better performance with PEAK on XSEDE resources. Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond, pp. 1-8. ACM, Chicago, Illinois (2012)

55. “XSEDE Overview.” [Online]. Available: https://www.xsede.org/overview

56. Wang, L., von Laszewski, G., Dayal J., and Furlani, T.R., “Thermal Aware Workload Scheduling with Backfilling for Green Data Centers”, IEEE IPCCC 2009. DOI: 10.1109/PCCC.2009.5403821

57. Wang, L., von Laszewski, G., Dayal, J., He, X., Younge A.J., and Furlani, T.R., Towards Thermal Aware Workload Scheduling in a Data Center. Accepted by the 10th International Symposium on Pervasive Systems (i-SPAN'09), Algorithms and Networks, Taiwan, Dec. 14-16 2009. DOI: 10.1109/I-SPAN.2009.22

58. von Laszewski, G., Lee, H., Diaz, J., Wang, F., Tanaka, K., Karavinkoppa, S., Fox, G.C., Furlani, T.R.; “Design of an Accounting and Metric-based Cloud-shifting and Cloud-seeding framework for FederatedClouds and Bare-metal Environments,” FederatedClouds ’12 Proceedings of the 2012 workshop on Cloud services, federation, and the 8th open cirrus summit; pages 25-32 (2012), DOI: 2378975.2378982


10.0 APPENDIX

The Appendix is organized as follows. Appendix 10.1 contains the annotated Requirements Traceability Matrix from the original TAS proposal to provide the reader with an indication of progress to date in achieving the objectives and goals outlined in the original proposal. New versions of XDMoD are released twice a year: one at the annual XSEDE summer conference and one at the annual supercomputing (SC) conference in November. Appendix 10.2 presents the history of XDMoD functionality for each release. As described in the body of this report, an open source version for academic and industrial HPC centers is being developed. However, as described in Appendix 10.3, in order to meet the immediate demand for HPC center auditing functionality, we carried out a substantial rewrite of UBMoD, our in-house open-source auditing tool, and made it available to the user community through SourceForge. Appendix 10.4 contains a sample brief audit of XSEDE for 2012 in order to demonstrate the capabilities of the TAS auditing framework and to indicate some of the metrics available through XDMoD. Appendix 10.5 contains the TAS recommendations for additional job- and resource-centric data that are not currently recorded but that we strongly recommend be collected in order to improve XDMoD’s analytical capability. These data collection recommendations are also contained in XDMoD’s Compliance tab and are therefore readily available to XSEDE and NSF leadership.

10.1 Requirements Traceability Matrix

In the original proposal, information about the system requirements was compiled into a Requirements Traceability Matrix in order to help guide and coordinate development efforts. We recognized early on that the needs of the scientific research community would continue to evolve over the five years of the award, and accordingly the project requirements were designed to be flexible enough to evolve in order to continue to meet those needs. The Requirements Traceability Matrix below appears exactly as it did in the proposal, except that we have annotated it to include the current status of each requirement. As it shows, we have met or exceeded almost all requirements. Please note that the tables below refer to TGMoD (TeraGrid Metrics on Demand) as opposed to XDMoD because at the time of the proposal the NSF program that TAS targeted was known as TeraGrid. We chose not to modify the entries in the tables below to reflect XSEDE but rather to keep the table exactly as it appeared (minus the annotation) in the proposal.

The Requirements Traceability Matrix contains the following information:
• Requirement Category
• Requirement Description
• Component that will fulfill the requirement
• Optional comment about the requirement
• Source of the requirement

Table 10.1-1. The requirement sources are grouped into three categories:

TD: Technical direction provided by NSF
TGES: Report from the TeraGrid Evaluation Study [1]
UC: User Community, Use Cases

Table 10.1-2. Acronyms and Abbreviations

API: Application Programmer Interface
App Kernel: Application Kernel
App Kernel TK: Application Kernel Toolkit
CD: Center Director
EU: End User
PO: Program Officer
REST: Representational State Transfer
SA: System Administrator
TG: TeraGrid
TGMoD: TeraGrid Metrics on Demand
TGMoD Gatherer: TGMoD user information gathering module
TGMoD Report: TGMoD reporting infrastructure
TGMoD UI: TGMoD User Interface
TGMoD Warehouse: TGMoD data warehouse component

Table 10.1-3 Requirements Traceability Matrix. The requirements traceability matrix was taken directly from the original proposal and only annotated (in red text) to indicate our progress to date in each requirement. Each entry lists the Requirement Description, the Component, the Comment/Status (shown in red color), and the Source, grouped by Requirement Category.

Category: Monitor

Requirement: Continually test user environment
Component: Monitoring Framework
Comment/Status: As per solicitation. Done routinely with Application Kernels.
Source: TD

Requirement: Deploy a monitoring framework
Component: Monitoring Framework
Comment/Status: Deploy a framework for running monitoring tests and application kernels. Initial deployment of XDMoD has been accomplished. Although XDMoD currently provides an extensive set of metrics, the framework is being continually refined to be more responsive to the XSEDE user base.
Source: TD

Requirement: Deploy application kernels
Component: Monitoring Framework
Comment/Status: Application kernels will be specialized tests deployed by the monitoring framework. Initial deployment has been accomplished; the application kernels are being continually refined and improved. To date over 18 application kernels have been deployed to test all aspects of system performance and reliability.
Source: TD

Requirement: Acquire and store monitoring data
Component: Monitoring Framework
Comment/Status: Cache raw monitoring data from recent tests for ingestion into the TGMoD data warehouse. This is done routinely.
Source: TD

Requirement: Provide mechanism for export/extraction of monitoring data
Component: Monitoring Framework
Comment/Status: A simple RESTful or Web Service will allow TGMoD to extract monitoring results. Initial deployment accomplished; API is continually being refined.
Source: TD

Requirement: Implement a framework independent set of tests
Component: Monitoring Framework
Comment/Status: The monitoring tests and application kernels will be kept as independent as possible from the monitoring framework. This will allow a different framework to be deployed if necessary. A completely independent system has been written to run the Application Kernels and the output is ingested into the XDMoD framework.
Source: TD

Category: App Kernel

Requirement: Computational performance kernels
Component: App Kernel
Comment/Status: Linpack, OpenFOAM, GAMESS, NAMD, PETSc, etc. Specific application kernels have been deployed to test computational performance. These include NPB, HPCC, Graph500 and NWChem.
Source: TD

Requirement: Test filesystem performance
Component: App Kernel
Comment/Status: Performance of local and global filesystems. Specific application kernels have been deployed to test the file systems. IOR and MPI-Tile-IO are routinely run on all resources.
Source: TD

Requirement: Test uniform file access and availability
Component: App Kernel
Comment/Status: Test uniform file system interface. Specific application kernels have been deployed to test the file systems. IOR and MPI-Tile-IO are routinely run on all resources.
Source: TGES

Requirement: Test local processor-memory bandwidth
Component: App Kernel
Comment/Status: A number of application kernels probe the memory performance. HPCC is the one we have deployed and have relied upon the most for measuring memory performance.
Source: TD

Requirement: Test allocatable shared memory
Component: App Kernel
Comment/Status: A number of application kernels probe the memory performance. HPCC is the one we have deployed and have relied upon the most for measuring memory performance.
Source: TD

Requirement: Test processor speed
Component: App Kernel
Comment/Status: Many application kernels test the CPU speed. Specific application kernels have been deployed to test computational performance. These include NPB, HPCC, Graph500 and NWChem.
Source: TD

Requirement: Test network latency
Component: App Kernel
Comment/Status: Several application kernels test network latency. These include: HPCC, NPB, MPI-Tile-IO and Graph500.
Source: TD

Requirement: Test network bandwidth
Component: App Kernel
Comment/Status: HPCC is the primary test of network bandwidth.
Source: TD

Requirement: Test computational templates
Component: App Kernel
Comment/Status: Test computational templates for various scientific domains. We have initially concentrated on computational Chemistry/BioChemistry (for example NAMD, GAMESS, etc.) and meteorology application kernels (WRF and CESM) since these, coupled with the benchmark type kernels, seem to span the widest range of system performance.
Source: TGES

Requirement: Report queue turnaround times
Component: App Kernel
Comment/Status: Test wait times in queue for various job configurations. XDMoD currently reports the user expansion factor metric so users can gauge turnaround for their jobs.
Source: TGES

Requirement: Application kernel toolkit for 3rd party developers
Component: App Kernel TK
Comment/Status: Allow 3rd parties to develop and submit their own application kernels. TAS has been responsive to requests for custom features from XSEDE stakeholders. To date these have primarily taken the form of XDMoD metric requests. There have been no requests for custom application kernels or the implementation of an application kernel toolkit as of this time. The toolkit will be deployed during years 4 and 5.
Source: Novel

Category: TGMoD

Requirement: Implement role-based web user interface
Component: TGMoD UI
Comment/Status: Web access is ubiquitous. XDMoD has been implemented as a web-based tool. It is accessible to the public in general with special access for XSEDE users as per specific NSF direction.
Source: UC

Requirement: Customize various aspects of the user interface based on role of authenticated user.
Component: TGMoD UI
Comment/Status: The amount, type, and organization of information displayed is based on the role of the current user. XDMoD has been implemented with a full role based system. Currently the roles include: Program Officer, Center Director, PI, User, Campus Champion, and Public. The role-based system is completely flexible in that roles can be added, deleted or changed as desired.
Source: UC

Requirement: Authenticate/Authorize users
Component: TGMoD UI
Comment/Status: All users must be authenticated and further authorized to see particular information. XDMoD is accessible to the public in general with special access for authorized XSEDE users as per specific NSF direction.
Source: UC

Requirement: Associate one or more roles to each TGMoD user
Component: TGMoD UI
Comment/Status: Each user is assigned one or more roles to tailor the experience and access. The XDMoD role-based system both restricts data access as per NSF direction and customizes the data display to suit each role.
Source: UC

Requirement: Provide customizable user profiles
Component: TGMoD Report
Comment/Status: Users may customize the way information is presented to them. In addition to the role system that matches the data displayed to the role of the user, XDMoD has provided the ability to customize the information presented to the user both in the display, such as the customizable Summary tab, and in information provided for output using the Custom Report Generator.
Source: UC

Requirement: Provide a default set of reports for:
Comment/Status: The XDMoD framework provides a very detailed set of metrics for instant display. In addition TAS has provided a Custom Report Generator that allows the user to output data reports on any set of XDMoD metrics for any desired timeframe. The Custom Report Generator goes far beyond simply providing reports for specific roles; it allows the user the flexibility to prepare specific reports for any specific purpose. This transcends the requirements listed below.
Source: UC

Requirement: • Program Officer (PO)
Component: TGMoD Report
Comment/Status: Have the obligation to determine how efficiently and effectively the supported programs are meeting the needs of the research community. See Custom Report Generator discussion above.
Source: UC

Requirement: • Center Director (CD)
Component: TGMoD Report
Comment/Status: Have the obligation to report on results to the funding agency to justify their operational and hardware costs. See Custom Report Generator discussion above.
Source: UC

Requirement: • System Administrator (SA)
Component: TGMoD Report
Comment/Status: Need to identify issues with a particular site's resources. See Custom Report Generator discussion above.
Source: UC

Requirement: • End User (EU)
Component: TGMoD Report
Comment/Status: Needs detailed feedback to improve throughput and utilization. See Custom Report Generator discussion above.
Source: UC

Requirement: Provide reports for:
Comment/Status: The Custom Report Generator allows the user to prepare a report on any metrics available in XDMoD. Comments below will therefore only be limited to whether or not the metrics are yet available in XDMoD rather than whether or not reports on the metrics are available.
Source: UC

Requirement: • TG Utilization
Component: TGMoD Report
Comment/Status: Display resource utilization for TG (PO, CD), Site (SA), or personal utilization (EU). Fully implemented.
Source: UC

Requirement: • TG Application Performance
Component: TGMoD Report
Comment/Status: Display application kernel performance metrics for TG (PO, CD), Site (SA), or personal utilization (EU). Fully implemented.
Source: UC

Requirement: • Science Gateway Usage
Component: TGMoD Report
Comment/Status: Display science gateway job throughput (PO, CD). Partially implemented. All data available has been displayed and can be reported; however, Gateway usage reporting still has not been unified.
Source: UC

Requirement: • Help Desk Tickets
Component: TGMoD Report
Comment/Status: Track help desk tickets submitted and resolved and average time to resolve for TG (PO, CD) or their site (SA). Not yet implemented. TAS has been directed by XSEDE leadership to delay implementation of Help Desk Tickets until after XSEDE has deployed its central Help Desk Ticket System.
Source: UC

Requirement: • Publications
Component: TGMoD Report
Comment/Status: Track publications acknowledging TG research. As described in this report, this task is currently still in progress.
Source: TGES

Requirement: • Grant Submissions and Awards
Component: TGMoD Report
Comment/Status: Track grant submissions and awards acknowledging TG. As described in this report, this task is currently still in progress.
Source: TGES

Requirement: • Comparing performance of different architectures
Component: TGMoD Report
Comment/Status: Compare the performance of an application across different architectures. TAS has launched a special initiative funding NICS to implement their PEAK software that not only compares performance of various important applications across different platforms but also optimizes compilation options as well.
Source: TGES

Requirement: • Computational Templates
Component: TGMoD Report
Comment/Status: Group application kernel results into computational templates for various scientific domains. As described above, PEAK uses application kernels to assess performance across platforms. This accomplishes the goal of guiding the various scientific domains in platform and even compiler optimization.
Source: TGES

Requirement: Provide drill-down capability on select reports.
Component: TGMoD Report
Comment/Status: Clicking on a particular chart will allow drill-down to more detailed information. This has been fully implemented on the XDMoD framework.
Source: UC

Requirement: Provide user-customizable auto-generated report summaries
Component: TGMoD Report
Comment/Status: Allow users to generate custom reports that provide summary information for quick viewing. This has been fully implemented on the XDMoD framework; also see the Custom Report Generator discussion above.
Source: UC

Requirement: Examine journal publications
Component: TGMoD Gatherer
Comment/Status: Users enter their publications related to TeraGrid. This is a difficult task that is currently still in progress. The solution will span not only XDMoD/TAS but also the User Portal and POPS.
Source: UC

Requirement: Examine grant awards
Component: TGMoD Gatherer
Comment/Status: Users enter Grant submissions and awards referencing TeraGrid. This is a difficult task that is currently still in progress. The solution will span not only XDMoD/TAS but also the User Portal and POPS.
Source: UC

Requirement: Examine portal/gateway usage
Component: TGMoD Gatherer
Comment/Status: Track statistics on scientific gateway usage and availability. Partially implemented. All data available has been displayed and can be reported; however, Gateway usage reporting still has not been unified.
Source: UC

Requirement: Export data as Excel spreadsheet
Component: TGMoD Export
Comment/Status: Allow export of report data to an Excel spreadsheet for offline analysis. This has been fully implemented. The user has several options to export either finished charts and reports or raw data to Excel.
Source: UC

Requirement: Save report as PDF document
Component: TGMoD Export
Comment/Status: Save reports as PDF for inclusion into reports. This has been fully implemented on the XDMoD framework; also see the Custom Report Generator discussion above.
Source: UC

Requirement: Ingest monitoring data
Component: TGMoD Warehouse
Comment/Status: Ingest test and application kernel data from the monitoring framework. This has been fully implemented.
Source: TD

Requirement: Store real-time and historical data
Component: TGMoD Warehouse
Comment/Status: Store and aggregate real-time and historical data in the data warehouse. This has been fully implemented.
Source: UC

Requirement: Provide data mining capability
Component: TGMoD Warehouse
Comment/Status: Provide data mining and trend analysis capabilities. This has been fully implemented. The extensive metrics available in the XDMoD framework provide the basis for detailed usage analysis.
Source: UC

Requirement: Provide authenticated API
Component: TGMoD API
Comment/Status: An interface provides platform independent access to TGMoD data and functionality. This has been fully implemented.
Source: TD

Requirement: Provide access to raw monitoring data
Component: TGMoD API
Comment/Status: Authenticated 3rd parties can access raw monitoring data in XML format so they can implement custom display methods. An XML format export option has been provided for XDMoD data reports.
Source: UC

Requirement: Provide access to report data
Component: TGMoD API
Comment/Status: Authenticated 3rd parties can access the raw data used to generate reports. This has been fully implemented on the XDMoD framework; also see the Custom Report Generator discussion above.
Source: UC

Requirement: Provide search restriction tools
Component: TGMoD API
Comment/Status: The API will allow search parameters to be specified. A functional API has been provided to interested parties such as the XSEDE User Portal team.
Source: UC

Requirement: Provide full API documentation
Component: TGMoD API
Comment/Status: Full documentation of all API functionality will be provided. Documentation on the API has been provided to the XSEDE User Portal Team.
Source: UC
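The matrix above notes that XDMoD reports a user expansion factor metric so that users can gauge job turnaround, but this section does not spell out the formula. The following is a minimal sketch in Python, assuming the commonly used definition (queue wait time plus wall time, divided by wall time). The function name, argument names, and sample job values are hypothetical and are not taken from XDMoD itself.

```python
def user_expansion_factor(wait_seconds: float, wall_seconds: float) -> float:
    """Return (wait + wall) / wall for a single job.

    Assumed definition, not taken from this report: a value of 1.0 means the
    job started immediately; larger values mean the job spent proportionally
    more time waiting in the queue.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    if wait_seconds < 0:
        raise ValueError("wait_seconds must not be negative")
    return (wait_seconds + wall_seconds) / wall_seconds


# Hypothetical jobs as (queue wait in seconds, wall time in seconds).
jobs = [(600.0, 3600.0), (7200.0, 3600.0), (0.0, 1800.0)]
for wait, wall in jobs:
    print(f"wait={wait:.0f}s wall={wall:.0f}s "
          f"expansion factor={user_expansion_factor(wait, wall):.2f}")
```

Under this assumed definition, a job that waits as long as it runs has an expansion factor of 2, so a site-wide average well above 1 would suggest that queue wait is a significant share of turnaround time.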


10.2 XDMoD Feature History

Table 10.2-1 presents the history of XDMoD functionality. New versions of XDMoD are released twice a year: one at the annual XSEDE summer conference and one at the annual supercomputing (SC) conference in November.

Table 10.2-1 XDMoD Feature History

XDMoD | 1.0 (7/11) | 1.5 (11/11) | 2.0 (7/12) | 2.5 (11/12) | 2.6 (2/12)

Data
  TGCDB Jobs X X X X X
  TGCDB Allocations, Accounts X X X X
  (POPS) Grants, Proposals X X X
  Derived Performance Metrics X X
  Resource RDR X X

Roles
  PO, CD X X X X X
  PI, User X X X X
  Campus Champion, Public X X X

Summary Tab X X X X X
  Client Side Charting (JavaScript) X X
  Edit/augment the summary charts X X

Usage Tab X X X X X
  Bar, Line, Area, Spline, Area Spline, Scatter, Pie X X X X X
  Drill down X X X X
  PNG, CSV, XML, EPS export formats X X X X X
  SVG export format in lieu of EPS X
  Client Side Charting (JavaScript) X

Usage Explorer Tab X X X
  Multiple datasets on one chart X X X
  Client Side Charting (JavaScript) X X X
  Bar, Line, Area, Spline, Area Spline, Scatter X X X
  Pie Charts X X
  Advanced Filtering X X X
  Trend Line for time series X X
  Invert X Axis X X
  PNG, CSV, XML, SVG export formats X X
  Save queries for future edit/use X X X
  Edit/augment the summary charts X X

App Kernels X X X X X
  Change Indicators X X X X X
  PNG, CSV, XML, EPS export formats X X X X X
  SVG export format in lieu of EPS X
  App Kernel Explorer X X X
  Quality of Service (Control Plot, Control Zones, Running Average, Control Bands) X X
  Generated plots can be added to report X

Report Generator X X X X X
  PDF Format X X X X X
  Word Format X X
  SP Quarterly Report Templates X X
  Monthly Compliance Report Template X
  Server-side chart caching X

Compliance Tab X X
  Job Exit Status, Job Memory and Disk Usage, Shared Storage Usage X X

Custom Queries X
  Generated plots can be added to report X


10.3 UBMoD: Initial Release of University-based Metrics on Demand

Academic and industrial HPC centers have shown an interest in using XDMoD to audit the delivery of their services in the same manner that XDMoD currently does for XSEDE. As discussed in the body of this report, we are in the process of modifying XDMoD to provide this capability. However, to meet the immediate demand for HPC center auditing functionality, we have carried out a substantial rewrite of UBMoD (http://ccr.buffalo.edu/services/hpc-metrics-on-demand.html), our in-house open-source auditing tool, and made it available to the user community through SourceForge [28].

UBMoD (UB Metrics on Demand), the model upon which XDMoD was based, was designed with many of the same objectives as the Technology Audit Service, including: (1) providing the user community with an easy-to-use tool to manage their accounts and optimize their use of resources, (2) providing staff with a diagnostic tool to monitor and tune resource performance for the benefit of the users, and (3) providing senior management with a tool to easily monitor utilization, user base, and distribution of resources among decanal units, and, most importantly, to help ensure that the resources are effectively enabling research.

UBMoD is an open source tool for collecting and mining statistical data from cluster resource managers (such as Torque, OpenPBS, and SGE) commonly found in high-performance computing environments. It presents resource utilization including CPU cycles consumed, total jobs, average wait time, etc. for individual users, research groups, departments, and decanal units. The web-based user interface provides a dashboard for displaying resource consumption along with fine-grained control over the time period and resources displayed. The data warehouse can easily be customized to support new resource managers. The information is presented in easy-to-understand charts and tables and provides system administrators, users, and directors of HPC centers with a rich set of metrics to better understand how their resources are being utilized. The current release, which was completely re-written in PHP, adds the ability to apply custom tags to users and jobs and to then filter all reports using those tags. This provides complete flexibility for organizing users into departments, projects, and groups. For example, users can be tagged as members of one or more projects and reports can be dynamically generated for those projects. UBMoD can be downloaded from http://ubmod.sf.net/.
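The tag-based filtering described above can be illustrated with a minimal sketch. This is not UBMoD's implementation (the release described here is written in PHP); the record fields, tag names, and function name are hypothetical and chosen only to show how tagging users and jobs allows reports to be grouped dynamically.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Job:
    """One accounting record; the field names are hypothetical."""
    user: str
    cpu_hours: float
    wait_seconds: float
    tags: set = field(default_factory=set)


def summarize_by_tag(jobs, user_tags):
    """Aggregate total jobs, CPU hours, and average wait time per tag.

    A job counts toward every tag attached to the job itself or to its user.
    """
    totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0, "wait": 0.0})
    for job in jobs:
        for tag in job.tags | user_tags.get(job.user, set()):
            entry = totals[tag]
            entry["jobs"] += 1
            entry["cpu_hours"] += job.cpu_hours
            entry["wait"] += job.wait_seconds
    return {
        tag: {
            "jobs": e["jobs"],
            "cpu_hours": e["cpu_hours"],
            "avg_wait_seconds": e["wait"] / e["jobs"],
        }
        for tag, e in totals.items()
    }


# Hypothetical example data: users carry tags, and jobs may carry extra tags.
user_tags = {"alice": {"chemistry", "project-x"}, "bob": {"physics"}}
jobs = [
    Job("alice", cpu_hours=10.0, wait_seconds=60.0),
    Job("alice", cpu_hours=5.0, wait_seconds=120.0, tags={"benchmark"}),
    Job("bob", cpu_hours=20.0, wait_seconds=30.0, tags={"project-x"}),
]
report = summarize_by_tag(jobs, user_tags)
```

Because a user or job can carry several tags, the same job may appear in more than one group; per-tag totals are therefore not expected to sum to the overall total.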

Figure 10.3-1 shows UBMoD’s download history over a one-year period (February 2012 – February 2013). During this time over 500 downloads occurred, demonstrating the interest in the HPC community for this capability. Table 10.3-1 lists users who have contacted UB staff with a question regarding implementation of UBMoD and therefore represents a lower bound for the number of installations. We are currently establishing a mailing list to track utilization more accurately. The table includes a number of industrial users in addition to the academic centers.


Figure 10.3-1 Download history of UBMoD from February 2012 through February 2013.

Table 10.3-1 XDMoD users who have submitted a UBMoD help desk ticket. Since we currently do not track actual installations, this represents a lower bound to the number of installations.

Institution | Individual
Brookhaven National Laboratory | Nicholas D'Imperio
Case Western | Roger Bielefeld
Case Western | Hadrian Djohari
Case Western | Sanjaya Gajurel
Clemson University | [email protected]
CUNY | Florian Lengyel
Diamond Light Source Ltd | Tina Friedrich
Garvan Institute | Derrick LIN
Georgia Tech | Paul Manno
IBM | Nathan Harper
Instituto de Física de Cantabria | Álvaro López García
Johns Hopkins University | Scott Roberts
Mississippi State University | Trey Breckenridge
NCSA | John Towns
Nemzeti Információs Infrastruktúra Fejlesztési Intézet | Gábor Rőczei
Nemzeti Információs Infrastruktúra Fejlesztési Intézet | Ivan Marton
Northwestern University | Joe Paris
Notre Dame | Jaroslaw Nabrzyski
NYU | Joseph Hargitai
NYU | Ted Malliaris
RENCI | John McGee
RENCI UNC-CH | Jonathan Mills
Rice University | Chandler Wilkerson
Rice University | Roger Moye
Rolls-Royce PLC | Dumal Welikala
Stanford University | Jason Bishop
The Dow Chemical Company | Dee Dickerson
University of Alabama at Birmingham | John-Paul Robinson
University of Birmingham | Alan Reed
University of Canterbury | [email protected]
University of Florida | Erik Deumens
University of Florida | Paul Avery
University of Illinois | Nada J. Cagle
University of Indiana | Gregor von Laszewski
University of North Carolina | James Glasson
University of North Carolina | Larry Mason
University of North Carolina | Ruth Marinshaw
University of Oklahoma | Henry Neeman
University of South Carolina | Jerry Ebalunode


10.4 Sample XDMoD Audit of the XSEDE System for 2012

To demonstrate the capabilities of the XDMoD auditing framework and to indicate some of the metrics available to the XDMoD user, we have compiled a brief example audit of XSEDE for 2012. This audit is for illustrative purposes; many other metrics can easily be displayed.

Figures 10.4-1 and 10.4-2 illustrate the growth of the TG/XD system from 2005 to the present in terms of Service Units (SUs) supplied and jobs run. An SU should be understood as a core hour, with the caveat that an SU is defined locally in the context of a particular machine. Thus, the value of an SU varies across resources built on different technologies and, by implication, varies over time as technology advances.

Figure 10.4-1 SUs supplied (millions) from 2005-2012.

Figure 10.4-2 Number of quarterly XSEDE/TG jobs run (millions) from 2005-2012.


Figures 10.4-3, 10.4-4, and 10.4-5 show the SUs supplied, jobs run, and overall percent utilization of XSEDE resources in terms of jobs run during 2012.

Figure 10.4-3 SUs supplied (millions) in 2012 broken down by XSEDE resource.

Figure 10.4-4 Number of jobs run (thousands) in 2012 broken down by XSEDE resource.


Figure 10.4-5. Percent Utilization of XSEDE resources in terms of number of jobs run in 2012.

Figures 10.4-6 and 10.4-7 show the XSEDE SUs and jobs run during 2012 broken down by parent field of science.

Figure 10.4-6. XD SUs consumed (millions) during 2012 broken down by parent field of science.


Figure 10.4-7. Number of XD jobs run during 2012 broken down by parent field of science.

Figures 10.4-8 and 10.4-9 show XD SUs supplied and jobs run during 2012 broken down by NSF status.

Figure 10.4-8. Number of XD SUs supplied (millions) by XD jobs run during 2012 broken down by NSF status.


Figure 10.4-9. Number of jobs run (thousands) during 2012 broken down by NSF status.

Figures 10.4-10 and 10.4-11 show XD SUs supplied and jobs run during 2012 broken down by institution.

Figure 10.4-10 XD SUs supplied (millions) during 2012 broken down by user institution.


Figure 10.4-11 Number of jobs run (thousands) during 2012 broken down by user institution.

Figures 10.4-12 and 10.4-13 show XD SUs supplied and jobs run during 2012 for the top 15 principal investigators recording the heaviest usage.

Figure 10.4-12 XD SU consumption (millions) during 2012 for the top 15 principal investigators.


Figure 10.4-13 Number of jobs run (thousands) during 2012 for the top 15 principal investigators.

Figures 10.4-14 and 10.4-15 show XD SUs supplied and number of jobs run during 2012 broken down by job size.

Figure 10.4-14 XD SUs consumed (millions) during 2012 broken down by job size.


Figure 10.4-15 Number of jobs run (thousands) during 2012 broken down by job size.


10.5 XSEDE Central Database (XDCDB) Data Recording Recommendations

Based upon TAS’s examination of the existing records in the XDCDB, we have formulated a series of recommendations for improved record-keeping, covering job metrics, resource metrics, and user metrics. Many of these metrics are included in the Compliance Tab in XDMoD (see Section 4.3), which we implemented as a partial response. The tab is viewable by NSF and XSEDE leadership and tracks data reporting by the various service providers, both for required data fields and for our recommended data fields. The recommendations are as follows.

Job Metrics  Number and type of cores charged (including GPUs). Note that this should reference node types reported with resource metrics (e.g., RDR)  Exit status/return code  Local disk usage  Memory usage  SU/NU consumed  Ability to flag jobs as development or production (mostly done by queue now?)  Eligible queue time – time from when the job is eligible to run until it starts executing.  Policy expansion factor – (requested wall time + eligible queue time)/ (requested wall time)  Application usage, e.g., SUPReMM and Automatic Library Tracking Database implemented by ORNL (c.f., Hadry, Fahey, and Jones, TG10 presentation, following URL may be broken by Wiki formatting but clear in the source text, link)

Resource Metrics (e.g., RDR)
The following resource data are either not being recorded or are being recorded inconsistently. We recommend both implementing new record keeping on these metrics and correcting historical records. Moving forward, both current and historical data should be maintained with appropriate time stamps when any data is modified to indicate record validity. For heterogeneous systems with multiple sets of node types, node and GPU data should be recorded for each set.
• Node configuration (one entry for each set of node types in a heterogeneous system)
  o Number of nodes
  o Number of available nodes
  o Number of processors per node
  o Processor type and speed
  o Processor memory (shared or dedicated)
• GPU configuration (one entry for each set of node types in a heterogeneous system)
  o Number of GPU/Accelerator nodes
  o Number of GPU/Accelerator processors per node
  o GPU/Accelerator Processor type and speed
  o GPU/Accelerator Processor memory
• Rmax and Rpeak values
• Percentage of resource allocated to XSEDE
• Local SU computation algorithm for each resource
• XD conversion factor
• Interconnect

For storage resources

• Shared filesystems description and retention policy
• Capacity
• Shared filesystems usage
• Keep information for storage jobs separately from compute jobs so that information collected can be specific to storage
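One way to keep both current and historical resource data with validity time stamps, as recommended under Resource Metrics, is to close an old record rather than overwrite it. The sketch below assumes a simple record layout; the class, field names, and example values are hypothetical and do not describe the RDR or XDCDB schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NodeConfig:
    """One set of node types on a resource; field names are hypothetical."""
    resource: str
    node_type: str
    nodes: int
    processors_per_node: int
    gpus_per_node: int
    valid_from: date
    valid_to: Optional[date] = None  # None means the record is still current

    def is_valid_on(self, day: date) -> bool:
        """True if this record describes the resource on the given day."""
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def processors_on(records, resource: str, day: date) -> int:
    """Total processors for a resource, summed over the node sets valid on `day`."""
    return sum(
        r.nodes * r.processors_per_node
        for r in records
        if r.resource == resource and r.is_valid_on(day)
    )


# Hypothetical history: the CPU partition grows on 2012-07-01. The old record
# is closed with valid_to instead of being overwritten, so jobs that ran
# earlier can still be interpreted against the configuration of their time.
history = [
    NodeConfig("example", "cpu", 100, 16, 0, date(2011, 1, 1), date(2012, 7, 1)),
    NodeConfig("example", "cpu", 150, 16, 0, date(2012, 7, 1)),
    NodeConfig("example", "gpu", 10, 12, 2, date(2011, 1, 1)),
]
before = processors_on(history, "example", date(2012, 1, 1))
after = processors_on(history, "example", date(2012, 8, 1))
```

Keeping one entry per node set also covers the heterogeneous-system case: the CPU and GPU partitions above are recorded separately and combined only at query time.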

Gateway Usage  Consistent reporting of gateways Reporting of gateway end-user attributes

User Metrics  During the allocation process, list PI’s other funding sources for the work that the allocation will be used for (POPS db)  percent credit for parent sciences for multi-disciplinary projects (at allocation process?)

Database Mirroring  Allow mirroring of the data from the primary to more than one secondary site

TGCDB Schema Documentation  Maintain a schema diagram/description of key tables and relationships o acct.accounts, acct.allocations, acct.allocation_adjustments, acct.allocation_on_resources, acct.fields_of_science, acct.jobs, acct.organizations, acct.people, acct.principal_investigators, acct.requests, acct.resources, acct.system_accounts, acct.transactions

Document formulas, triggers, and other logical elements. For example, NU/SU normalization and calculation process.
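As an illustration of the kind of formula that should be documented, the sketch below shows one plausible shape for an SU calculation followed by normalization. The report does not give the actual algorithms, so everything here is an assumption: that a local SU is cores multiplied by wall-clock hours and an optional charge factor, that normalization multiplies local SUs by a per-resource conversion factor, and the resource names and factor values themselves.

```python
# Hypothetical per-resource conversion factors (normalized units per local SU).
# Real factors are defined per resource and are not given in this report.
CONVERSION_FACTORS = {"resource_a": 1.0, "resource_b": 2.5}


def local_sus(cores: int, wall_hours: float, charge_factor: float = 1.0) -> float:
    """Local SUs charged, assuming SU = cores x wall-clock hours x charge factor."""
    return cores * wall_hours * charge_factor


def to_normalized_units(resource: str, sus: float) -> float:
    """Convert local SUs to normalized units using the resource's factor."""
    try:
        return sus * CONVERSION_FACTORS[resource]
    except KeyError:
        raise KeyError(f"no conversion factor documented for {resource!r}") from None


sus = local_sus(cores=64, wall_hours=2.0)    # 128.0 local SUs
nus = to_normalized_units("resource_b", sus)  # 320.0 normalized units
```

Writing the calculation down in this form makes explicit which inputs (core count, wall time, charge factor, conversion factor) must be recorded for each job and resource in order to reproduce a reported value.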