ENAR

2015 SPRING MEETING With IMS & Sections of ASA
MARCH 15–18 | Hyatt Regency Miami | Miami, FL
FINAL PROGRAM & ABSTRACTS

4 Welcome and Overview
6 Acknowledgements
13 Special Thanks
15 Miami Highlights
18 Presidential Invited Speaker
19 IMS Medallion Lecture
20 Short Courses
23 Tutorials
26 Roundtables
29 Program Summary
44 Scientific Program
148 Abstracts & Poster Presentations
383 Index
397 Floor Plan

WELCOME

¡Bienvenidos a Miami! It is my great pleasure to introduce the 2015 ENAR Spring Meeting to be held at the Hyatt Regency Miami, in Miami, FL, from March 15–18. The ENAR Spring Meeting brings together researchers and practitioners from academia, industry and government, connected through a common interest in Biometry. It offers a unique opportunity for learning new exciting methods and software, hearing about interesting and impactful applications, meeting new people (including prospective employers and job candidates), reconnecting with friends, and, this year, getting a break from the cold and snowy winter. The ENAR Spring Meeting only happens through the diligent work of a large number of people who organize and contribute to the program, plan and oversee the meeting logistics, and help with sponsorship – my heartfelt gratitude to all of them.

Scientific Program: Through the leadership of Program Chair Mithat Gönen, of Memorial Sloan-Kettering Cancer Center, and Associate Chair Brisa Sánchez, of the University of Michigan School of Public Health, and with contributions from many of you, the Program Committee (with representatives from different ASA sections) has created an outstanding invited program. The sessions cover a wide range of topics of great interest to both researchers and practitioners, such as data sciences (big data), clinical trials, neuroimaging, biomarkers, health policy, electronic health records, ecology, and epidemiology. The IMS invited program, assembled under the leadership of IMS Program Chair Lurdes Inoue, of the University of Washington, also features an exciting array of sessions that nicely complement and promote synergies with the ENAR invited program.

Poster presentations will, once again, be a vibrant part of the scientific program. In addition to contributed and invited posters (the latter first featured in the 2014 meeting), the 2015 ENAR Spring Meeting introduces a novelty: contributed oral poster sessions, in which presenters will be able to give a two-minute elevator speech on the highlights of their posters. The contributed oral sessions, to be held on Monday, will be organized by themes, will feature two invited posters from well-known researchers, and will run parallel to the rest of the sessions in the scientific program. As in previous years, the regular contributed and invited posters will be presented Sunday evening, during the Opening Mixer. The highly popular ENAR Regional Advisory Board (RAB) poster competition will include contributed posters from the Sunday session only.

Educational Program: Be sure to take advantage of the unique and varied learning opportunities that the 2015 ENAR Spring Meeting has to offer through its superb program of short courses, tutorials and roundtables, assembled by the Educational Advisory Committee. Presented by well-known experts in their respective fields, the short courses and tutorials will cover a variety of topics of great interest to meeting attendees, including: Bayesian methods in drug development, personalized medicine trial designs, analysis of brain imaging data, data sciences and high performance statistical computing, early phase clinical trials, statistical leadership and influence, graphics for clinical trial data, and software applications for group sequential and adaptive designs, Bayesian modeling and analysis, and multiplicity problems. A favorite of many who come to the meeting, roundtable luncheons will also be featured in the program. Distinguished statisticians from academia, government, and industry will lead the luncheon discussions on topics ranging from how to publish without perishing to innovations in drug development to Bayesian evidence synthesis.

Keynote Lectures: The Presidential Invited Address and the IMS Medallion Lecture are two of the high points of the ENAR Spring Meeting program, delivered by highly accomplished thought leaders in Biometry. I am honored to introduce Dr. David L. DeMets, Max Halperin Professor of Biostatistics and former Chair of the Department of Biostatistics and Medical Informatics at the University of Wisconsin–Madison, as the 2015 Presidential Invited Speaker. His lecture will be on "Big Data, Big Opportunities, Big Challenges." Prof. DeMets has been an inspirational role model for more than a generation of biostatisticians working in clinical research across academia, government, and industry. His pioneering and highly impactful research in group sequential designs during his tenure at the National Heart, Lung and Blood Institute, at NIH, the creation of the Department of Biostatistics at the University of Wisconsin, and his seminal work in establishing statistical leadership in drug regulatory sciences and practice (including, literally, writing the book on Data Monitoring Committees), are just a few of his many achievements. He is a past-president of ENAR and the Society for Clinical Trials, as well as an Elected Fellow of the International Statistics Institute, the American Statistical Association, the Association for the Advancement of Science, the Society for Clinical Trials and the American Medical Informatics Association. In 2013, he was elected as a member of the Institute of Medicine.

The IMS Medallion Lecture, entitled "Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling," will be presented by Dr. Tilmann Gneiting, Group Leader at the Heidelberg Institute for Theoretical Studies (HITS) and Professor of Computational Statistics at the Karlsruhe Institute of Technology (KIT) in Germany. Prof. Gneiting has held faculty positions in the Department of Statistics at the University of Washington, where he remains affiliate faculty, and at the Institute for Applied Mathematics at Heidelberg University. He serves as Editor for Physical Science, Computing, Engineering, and the Environment at the Annals of Applied Statistics.

Additional Meeting Activities: The 2015 ENAR Spring Meeting will feature a host of other activities in addition to the scientific and educational programs. On Saturday, March 14, there will be the Junior Researchers Workshop, organized under the leadership of Kimberly Drews, George Washington University. The Fostering Diversity in Biostatistics Workshop, organized by Simone Gray, of the Centers for Disease Control and Prevention, and Sean Simpson, of Wake Forest School of Medicine, will be held on Sunday, March 15.

The Student Mixer on Monday evening and the Tuesday luncheon event organized by the Council of Emerging and New Statisticians (CENS) will provide ample networking opportunities for students and recent graduates. Meeting attendees seeking employment and prospective employers will benefit from the vibrant Career Placement Center. Be sure to visit the exhibitors' area to browse the latest books and software in your field.

A perennial favorite among many attendees, the Tuesday night social event for the 2015 meeting will take place at sea: a dinner cruise aboard the Biscayne Lady yacht. We will be picked up by boat at the Riverwalk in front of the hotel and will enjoy a memorable evening of breathtaking views of the Miami skyline, great food, nice conversation, music and dancing. Boat cruises have sold out quickly in previous ENAR meetings held in Florida, so sound statistical inference suggests that you should get your tickets early.

Meeting Venue: The conference will be held at the Hyatt Regency Miami hotel, located by the Miami Riverwalk in the downtown area. The hotel is within walking distance of the bustling Mary Brickell district, with its shops, restaurants, and nightlife. South Beach, showcasing beautiful Art Deco architecture, is a short cab ride away, and so is Calle Ocho, in the heart of Little Havana.

Acknowledgements: This meeting would not happen without the dedication and leadership of Kathy Hoskins, the ENAR Executive Director. Kathy is the institutional memory of ENAR and each year patiently guides incoming presidents-elect, like myself, on the how-to’s of organizing the Spring Meeting. My heartfelt thanks to Kathy and the ENAR team, Challee Blackwelder and Katie Earley, for all the great work they have put into the meeting organization.

I am also very grateful to the Local Arrangements Committee, led (for a second time) by Tulay Koru-Sengul, of the University of Miami Miller School of Medicine, for their critical work towards the success of the ENAR meeting.

Welcome to the 2015 ENAR Spring Meeting!

Sincerely,

José Pinheiro, 2015 ENAR President
Kathy Hoskins, ENAR Executive Director

ACKNOWLEDGEMENTS

ENAR would like to Acknowledge the Generous Support of the 2015 Local Arrangements Committee, chaired by Tulay Koru-Sengul, University of Miami, and our Student Volunteers.

We Gratefully Acknowledge NIH, and in Particular the:
National Cancer Institute
National Heart, Lung, & Blood Institute
National Institute of Environmental Health Sciences
National Institute of Allergy and Infectious Diseases
For their Generous Support of the ENAR Junior Researchers Workshop

ENAR Junior Researchers' Workshop Coalition Members
Columbia University
Emory University
ENAR
Harvard University
North Carolina State University
The University of Michigan
The University of Minnesota
The University of North Carolina at Chapel Hill
The University of Pennsylvania
The University of Pittsburgh
The University of Wisconsin-Madison
Virginia Commonwealth University
Texas A&M University – Department of Statistics

We Gratefully Acknowledge the Invaluable Support and Generosity of Our Sponsors and Exhibitors.

Sponsors
AbbVie
Alexion
Biogen Idec Inc.
Cytel Inc.
Eli Lilly and Company
JANSSEN Pharmaceutical Companies of Johnson & Johnson
Novartis Oncology
Parexel
The Procter & Gamble Co.
Quintiles – Center for Statistics in Drug Development
SAS Institute Inc.
The Statistics Collaborative, Inc.
Statistics in Medicine/Wiley

Exhibitors
Cambridge University Press
CRC Press
Minitab
Oxford University Press
SAS Institute Inc.
Springer
Wiley

Officers and Committees | January – December 2015

Executive Committee | Officers
President | José Pinheiro
Past President | DuBois Bowman
President-Elect | Jianwen Cai
Secretary | Brent Coull (2015-2016)
Treasurer | Sarah Ratcliffe (2014-2015)

Regional Committee (RECOM)
President/Chair | José Pinheiro
Eight Ordinary Members (elected to 3-year terms), plus Philip Reiss (RAB Chair):

2013-2015 | Sudipto Banerjee, Jeffrey Morris, Dionne Price
2014-2016 | Michael Daniels, Michelle Dunn, Xihong Lin
2015-2017 | Paul Albert, Reneé Moore, Mary Sammel

Regional Members of the International Biometric Society | Executive Board Karen Bandeen-Roche Joel Greenhouse Sharon-Lise Normand José Pinheiro

Regional Members of the Council of the International Biometric Society Scarlett Bellamy Brad Carlin Timothy Johnson KyungMann Kim

Appointed Members of Regional Advisory Board (3-year terms)
Chair | Philip Reiss
2013-2015 | Richard Cook, Lynn Eberly, Zhezhen Jin, Clara Kim, Mi-Ok Kim, Monnie McGee, Peter Thall, Sharon Xie, Elizabeth Zell
2014-2016 | Hongyuan Cao, Susmita Datta, Martin Lindquist, Qi Long, Brian Millen, Alison Motsinger-Reif, Todd Ogden, Sean Simpson, Abdus Wahed, Menggang Yu
2015-2017 | Sean Devlin, Susan Halabi, Telba Irony, Sheng Lou, Olga Marchenko, David Ohlssen, Limin Peng, Elena Polverejan, Arindam RoyChoudhury, Ronglai Shen

Programs

2015 Spring Meeting | Miami, FL
Program Chair | Mithat Gönen

Associate Chair | Brisa Sánchez
Local Arrangements | Tulay Koru-Sengul


2016 Spring Meeting | Austin, TX
Program Chair | Wei Sun
Associate Chair | Laura Hatfield
Local Arrangements | Mike Daniels

Joint Statistical Meetings | 2015 | Olga Marchenko | 2016 | Bin Nan

Biometrics Executive Editor | Marie Davidian
Biometrics Co-Editors | Jeanine Houwing-Duistermaat, Yi-Hau Chen, Michael Daniels
Biometric Bulletin Editor | Dimitris Karlis
JABES Editor | Montserrat Fuentes
ENAR Correspondent for the Biometric Bulletin | Leslie McClure
ENAR Executive Director | Kathy Hoskins

Visit the ENAR website www.enar.org as a resource for all ENAR activities.

International Biometric Society Executive Director | Dee Ann Walker

Representatives

Committee of Presidents of Statistical Societies (COPSS) ENAR Representatives
President | José Pinheiro
Past President | DuBois Bowman
President-Elect | Jianwen Cai

ENAR Standing/Continuing Committees

Nominating Committee (2015)
Chair | DuBois Bowman
Members | Dan Heitjan (2015), Maura Stokes (2014-2015), Stacy Lindborg (2014-2015)

Sponsorship Committee (2015)
Diane Catellier, Rhonda Szczesniak, Bo Yang, Névine Zariffa

Webinar Committee (2015) Chair | Lynn Eberly Mi-Ok Kim Philip Reiss Peter Thall

ENAR Representative on the ASA Committee on Meetings | Laura Meyerson

Distinguished Student Paper Awards Committee Chair | Daniel Heitjan, Southern Methodist University Veera Baladandayuthapani, MD Anderson Cancer Center Veronica Berrocal, University of Michigan Min Chen, University of Texas, Dallas Jason Fine, University of North Carolina - Chapel Hill Debashis Ghosh, Penn State University Joseph Hogan, Brown University Mingyao Li, University of Pennsylvania Martin Lindquist, Johns Hopkins Bloomberg School of Public Health Brian Neelon, Duke University Michael Newton, University of Wisconsin Brian Reich, North Carolina State University Taki Shinohara, University of Pennsylvania

Alisa Stephens, University of Pennsylvania Matthew White, Children's Hospital Boston Julian Wolfson, University of Minnesota Gui-Shuang Ying, University of Pennsylvania

Van Ryzin Award Winner Jean-Philippe Fortin, Johns Hopkins University Bloomberg School of Public Health

Distinguished Student Paper Award Winners Joseph Antonelli, Harvard School of Public Health Guanhua Chen, Vanderbilt University and University of North Carolina, Chapel Hill Chuan Hong, The University of Texas School of Public Health Peijie Hou, University of South Carolina Yue Hu, Rice University Lei Huang, Johns Hopkins University Runchao Jiang, North Carolina State University Edward Kennedy, University of Pennsylvania SungHwan Kim, University of Pittsburgh Eunjee Lee, University of North Carolina Ying Liu, Columbia University Xiaoye Ma, University of Minnesota Lu Mao, University of North Carolina, Chapel Hill Christine Mauro, Columbia University Peibei Shi, University of Illinois, Urbana-Champaign Thomas Stewart, University of North Carolina Yichi Zhang, North Carolina State University Yi Zhao, Brown University Yan Zhou, University of Michigan

2015 Fostering Diversity in Biostatistics Workshop Co-Chair | Simone Gray, Centers for Disease Control Co-Chair | Sean Simpson, Wake Forest School of Medicine Knashawn H. Morales, University of Pennsylvania, Perelman School of Medicine Scarlett Bellamy, University of Pennsylvania, Perelman School of Medicine DuBois Bowman, Columbia University, Mailman School of Public Health Amita Manatunga, Emory University, Rollins School of Public Health Reneé H. Moore, North Carolina State University Sastry Pantula, North Carolina State University Adriana Perez, The University of Texas Health Science Center at Houston Dionne Price, Food and Drug Administration DeJuran Richardson, Lake Forest College Louise Ryan, University of Technology Sydney Keith Soper, Merck Research Laboratories Alisa J. Stephens, University of Pennsylvania, Perelman School of Medicine Lance Waller, Emory University, Rollins School of Public Health

2015 RAB Poster Award Competition Committee Chair | Philip Reiss, New York University Maitreyee Bose, University of Minnesota Erica Dawson, University of Alabama at Birmingham Dominque Williams, Eli Lilly Pei-Shien Wu, New York University Yuting Xu, Johns Hopkins University

2015 Council for Emerging and New Statisticians (CENS)
Chair | Clara Kim
RAB Liaisons | Elizabeth Zell, Brian Millen, Philip Reiss
Members | Jarcy Zee, Victoria Liublinska, Tapan Mehta, Naomi Brownstein, Vivian Shih, Diana Hall, Michael McIsaac, Erica Billig, Chanmin Kim, Jami Jackson, Kaitlin Woo

American Association for the Advancement of Science (Joint with WNAR) Section E | Geology and Geography | Dr. Michael Emch Section N | Medical Sciences | Dr. Abdus S. Wahed Section G | Biological Sciences | Dr. Andrea S. Foulkes Section U | Statistics | Dr. Jessica Utts Section O | Agriculture | Dr. Andrew O. Finley

National Institute of Statistical Sciences | Board of Trustees (The ENAR President is also an ex-officio member) Member | Donna Brogan


SPECIAL THANKS

Program Committee
Program Chair | Mithat Gönen, Memorial Sloan-Kettering Cancer Center
Program Co-Chair | Brisa N. Sánchez, University of Michigan
IMS Program Chair | Lurdes Inoue, University of Washington

ASA Section Representatives
Lihong Qi, University of California at Davis | ASA Biometrics Section
Feng Dai, Yale University | ASA Statistical Education Section
Edward Boone, Virginia Commonwealth University | ASA Statistics and the Environment Section
Mazumdar, Mt. Sinai School of Medicine | ASA Statistics in … Section
Haipeng Shen, University of North Carolina at Chapel Hill | ASA Statistics in Imaging Section
TingTing Zhang, University of Virginia | ASA Statistical Learning and Data Mining Section

Gary Aras, Amgen | ASA Biopharmaceutical Section
Rebecca Andridge, Ohio State University | ASA Survey Research Methods Section

Nick Horton, Amherst College | ASA Mental Health Section
Nancy Petersen, US Department of Veterans Affairs | ASA Statistical Programmers Section
Laura Freeman, Institute for Defense Analysis | ASA Statistics in Defense and National Security Section

ENAR Student Awards 2015
Chair | Daniel F. Heitjan, University of Pennsylvania

ENAR Diversity Workshop 2015
Co-Chair | Simone Gray, Centers for Disease Control and Prevention
Co-Chair | Sean L. Simpson, Wake Forest School of Medicine

ENAR Workshop for Junior Biostatisticians in Health Research
Kimberly Drews, George Washington University

ENAR Executive Team
Kathy Hoskins, Executive Director
Katie Earley, Program Manager
Challee Blackwelder, Administrator

2015 ENAR Program Committee
At-large Members
Rima Izem, Food and Drug Administration
Yevgen Tymofyeyev, Johnson & Johnson
Ronglai Shen, Memorial Sloan-Kettering Cancer Center
Laura White, Boston University
Advising Members
José Pinheiro, 2015 ENAR President, Johnson & Johnson
Rick Chappell, University of Wisconsin, Madison
Bhramar Mukherjee, University of Michigan
Gordon Lan, Johnson & Johnson

2015 ENAR Local Arrangements
Chair | Tulay Koru-Sengul, University of Miami

Education Advisory Committee
José Pinheiro, 2015 ENAR President, Johnson & Johnson
Rick Chappell, University of Wisconsin, Madison
Telba Irony, Food and Drug Administration
Dionne Price, Food and Drug Administration

MIAMI HIGHLIGHTS

Welcome to Miami! Miami is a global metropolis with booming international business, vibrant culture, and some of the best beaches in the world.

Much of Miami’s appeal is due to its diverse neighborhoods, which range from the towering skyscrapers of downtown Miami to the Cuban community of Little Havana or to the trendy Miami Beach neighborhood of South Beach. People from all over the world come to enjoy the sunny weather, spicy nightlife and fine dining!

Miami has a cuisine that is uniquely its own. With the diversity of its people comes a blend of flavors – Latin, Caribbean and US – known as Floribean. Miami also has outstanding restaurants of every kind, from Italian to Thai. If you're craving barbecue, try a nostalgic and delicious landmark next door to the Datran Center skyscrapers, Shorty's Bar-B-Q. When touring South Beach, take a rest and people-watch for a while at the News Café while enjoying their twenty-four hour breakfast and decadent desserts. If you're willing to wait (no reservations!), satisfy your seafood desire by going to Joe's Stone Crab, a restaurant famous for stone crab claws that claims to be the place where this tasty treat was first discovered. Can't decide? Then take a Miami Culinary Tour – a Miami food tour adventure tasting delicious foods around the city's historic neighborhoods.

No matter what kind of entertainment grabs your interest, Miami has it covered. Fulfill your desire for cultural programs at the Adrienne Arsht Center for the Performing Arts; Broadway shows, dance productions and concerts are on the schedule at this beautiful facility located less than two miles from the Hyatt Regency Miami. Head across the Bay to South Beach to experience the Art Deco District, where the largest collection of Art Deco architecture in the world can be found. Experience a different kind of pool at Coral Gables Venetian Pool, the only swimming pool on the National Register of Historic Places that's chlorine-free and fed with cool spring water.

Of course, there is always the beach – take an afternoon and find a spot along Miami’s gorgeous shoreline and soak up the sun!

Blessed with a warm climate and unrivaled ocean access, America's southernmost resort city is also a sought-after international recreation destination. Miami caters to action-oriented visitors from around the globe with some of the world's top golf, tennis and sporting facilities. Add sparkling waters that are a magnet for boating enthusiasts, fishermen, divers and water sports aficionados to the equation and it is easy to see why Miami is a number one choice for active travelers of all ages and skill levels. So to really experience South Florida, you must get out on the water! Rent a boat, see manatees in the wild at Coral Gables, swim with dolphins, or take a windsurfing lesson — it's all here!

Brickell
Miami's financial district, just south of Downtown, offers some of the best nightlife and dining the city has to offer. Visit some of the neighborhood's best restaurants like Perricone's, Brickell Burger and Beer, and La Lupita. See the high rises and condo complexes of Miami's young professionals or check out Blue Martini or Fado Irish Pub for drinks and dancing. You don't want to miss the 'Manhattan of the South'.

Bayside Market
Experience the best food, fun, and shopping Miami has to offer! You will certainly enjoy the open-air feeling of this Miami shopping mall, a short walk or Metromover ride from the Hyatt, with over 150 stores, all while walking under the palm trees.

One of the most interesting features of this Downtown Miami location is probably the Biscayne Bay and Miami skyline view you will get, so even if your purpose is not to spend a big amount of money, go by and check it out. It's also ideal for finding boat tour operators, enjoying live night entertainment, and taking tours to Miami's celebrities' homes.

Jungle Queen Riverboat Cruise
For more than 50 years, visitors have traveled on this stately riverboat. Tours sail past estates while an entertaining monologue by the captain points out the homes of the famous and the infamous. On three-hour day tours or four-hour dinner voyages you can sail to an island where you will dine amid tropical foliage. Evening cruises feature a dinner of barbecued ribs and shrimp, with a variety revue and sing-along cruising back. There may even be a sighting of macaws and rare birds from all over the world, alligator wrestling, and Seminole Indians. The 550-passenger riverboat also includes a stop off at the Jungle Queen Indian Village, a beautiful tropical island.

Vizcaya Museum and Gardens
Vizcaya is one of South Florida's leading attractions. Built by agricultural industrialist James Deering, Vizcaya Museum & Gardens features a main house, ten acres of formal gardens, a rockland hammock (native forest), and a soon-to-be-restored historic village. Its art and furnishings portray 400 years of European history and provide a window to both the history of Miami, graced by the villa since its completion in 1916, and to the Italian Renaissance, represented in the Museum's architecture. First, you'll pass through Vizcaya's lush subtropical forest and approach the Main House along a walkway lined with fountains and foliage. The inside of the house is filled with treasures from around the world. Hear Vizcaya's 1917 pipe organ played Monday through Friday from 12 noon to 12:30 pm. Outside, you'll enjoy spectacular views of Biscayne Bay, colorful orchids in the David A. Klein Orchidarium, and the serene gardens and the statues that inhabit them. Located in the southern side of Miami in Coconut Grove, Vizcaya welcomes visitors every day except Tuesdays from 9:30 am to 4:30 pm.

Coral Castle Museum
Located near Homestead, the castle is comprised of numerous coral stones, each weighing several tons. Many of the castle structures are notable, including machines to tell time, home-made air conditioners, and a nine-ton revolving door. To this day, no one knows how Edward Leedskalnin created the Coral Castle. Built under the cover of night and in secret, at a time when there were no modern construction conveniences, Ed would only say that he knew "the secret of the pyramids." Visit this site and try to figure out the mystery.

Everglades National Park
This national park protects the southern 25 percent of the original Everglades and has a subtropical climate, a broad, shallow river, and a variety of plant and animal life that makes this a must visit. Wildlife species include the Florida Panther, American Crocodile, and West Indian Manatee. If you enter through the Flamingo Main Entrance, make sure to stop and take the 45-minute walk around the Anhinga Trail, a partially paved trail with a boardwalk that stretches out over the water. Or, rent a bike and take the 15-mile trail around Shark Valley. These are both good spots to see alligators in their natural habitat!
ENAR 2015 Presidential Invited Speaker

David L. DeMets, Ph.D.
Max Halperin Professor of Biostatistics
University of Wisconsin-Madison

Big Data, Big Opportunities, Big Challenges

Since the 1950's, biostatisticians have been successfully engaged in biomedical research, from laboratory experiments to observational studies to randomized clinical trials. We owe some of that success to the early pioneers, especially those biostatisticians who were present at the National Institutes of Health (NIH). They created a culture of scientific collaboration, working on the methodology as needed to solve the biomedical research problems in design, conduct and analysis. Over the past 5 decades, we have experienced a tremendous increase in computational power, data storage capability and multidimensionality of data, or "big data". Some of this expansion has been driven by genomics.

At present, we have the opportunity to contribute to the design and analysis of genomic data, data stored in the electronic health record and continued needs of clinical trials for greater efficiency. However, with these opportunities, we have serious challenges starting with the fact that we need to develop new methodology to design and analyze the "big data" bases. The demand for quantitative scientists exceeds the supply and there is no strategic national plan to meet these demands.

Federal funding for biomedical research has been flat and likely to remain so for several years, impacting both the ability to train additional quantitative scientists and provide them with research funding for new methodologies. We face new or more public scrutiny, demanding that our data and analysis be shared earlier and earlier, even as the data are being gathered such as in clinical trials. Litigation is now part of our research environment. We will examine some of these issues and speculate on ways forward.

Biography
David L. DeMets, PhD is currently the Max Halperin Professor of Biostatistics and former Chair of the Department of Biostatistics and Medical Informatics at the University of Wisconsin - Madison. He received his PhD in biostatistics in 1970 from the University of Minnesota. Following a postdoctoral appointment at the National Institutes of Health (1970-72), he spent ten years (1972-1982) at the National Heart, Lung and Blood Institute at the National Institutes of Health where he was a member of and later became chief of the Biostatistics Branch. In 1982, he joined the University of Wisconsin and founded the Department of Biostatistics and Medical Informatics which he chaired until 2009. He has co-authored four texts, Fundamentals of Clinical Trials, Data Monitoring in Clinical Trials: A Case Studies Approach, Data Monitoring Committees in Clinical Trials: A Practical Perspective, and Statistical Methods for Clinical Trials. He has served on numerous NIH and industry-sponsored Data Safety and Monitoring Committees for clinical trials in diverse disciplines. He served on the Board of Directors of the American Statistical Association, as well as having been President of the Society for Clinical Trials and President of the Eastern North American Region (ENAR) of the Biometric Society. In addition he was Elected Fellow of the International Statistics Institute in 1984, the American Statistical Association in 1986, the Association for the Advancement of Science in 1998, the Society for Clinical Trials in 2006 and the American Medical Informatics Association in 2008. In 2013, he was elected as a member of the Institute of Medicine. His research interests include the design, data monitoring and analysis of clinical trials, especially large Phase III randomized clinical trials. He is well known for his work on sequential statistical methods for monitoring interim data for early evidence of intervention benefit or possible harm.

IMS Medallion Lecture

Tilmann Gneiting, Ph.D.
Heidelberg Institute for Theoretical Studies (HITS)
Karlsruhe Institute of Technology (KIT)

Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling

Critical decisions frequently rely on high-dimensional output from complex computer simulation models that show intricate cross-variable, spatial and/or temporal dependence structures, with weather and climate predictions being examples. There is a strongly increasing recognition of the need for uncertainty quantification in such settings, for which we propose and review a general multi-stage procedure called ensemble copula coupling (ECC), proceeding as follows.

1. Generate a raw ensemble, consisting of multiple runs of the computer model that differ in the inputs or model parameters in suitable ways.
2. Apply statistical postprocessing techniques, such as Bayesian model averaging or nonhomogeneous regression, to correct for systematic errors in the raw ensemble, to obtain calibrated and sharp predictive distributions for each univariate output variable individually.
3. Draw a sample from each postprocessed predictive distribution.
4. Rearrange the sampled values in the rank order structure of the raw ensemble, to obtain the ECC postprocessed ensemble.

The use of ensembles and statistical postprocessing have become routine in weather forecasting over the past decade. We show that seemingly unrelated, recent advances can be interpreted, fused and consolidated within the framework of ECC, the common thread being the adoption of the empirical copula of the raw ensemble. In some settings, the adoption of the empirical copula of historical data offers an attractive alternative. In a case study, the ECC approach is applied to predictions of temperature, pressure, precipitation, and wind over Germany, based on the 50-member European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble. This is joint work with Roman Schefzik and Thordis Thorarinsdottir.

Biography
Tilmann Gneiting is Group Leader at Heidelberg Institute for Theoretical Studies (HITS) and Professor of Computational Statistics at Karlsruhe Institute of Technology (KIT) in Germany. In 1997, he obtained a PhD in Mathematics at Bayreuth University with Peter Huber as supervisor. Subsequently, he held faculty positions in the Department of Statistics at the University of Washington (1997-2009), where he remains affiliate faculty, and at the Institute for Applied Mathematics at Heidelberg University (2009-2013). Tilmann's research focuses on the theory and practice of forecasting, and spatial and spatio-temporal statistics, with applications to meteorological, hydrologic, and economic problems, among others. His work on probabilistic forecasting is supported by an Advanced Grant from the European Research Council. Tilmann also serves as Editor for Physical Science, Computing, Engineering, and the Environment at the Annals of Applied Statistics (2011-2014).
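The reordering in step 4 is simple enough to sketch in a few lines of R. The function name and the toy data below are illustrative assumptions rather than code from the lecture: for each output variable, the draws from the postprocessed predictive distribution are sorted and then placed in the rank order of the corresponding raw ensemble members.

```r
# Minimal illustrative sketch of the ECC reordering step (assumed toy data).
ecc_rearrange <- function(raw, post) {
  # raw:  m x d matrix of raw ensemble values (m members, d output variables)
  # post: m x d matrix of draws from the postprocessed predictive distributions
  out <- post
  for (j in seq_len(ncol(raw))) {
    r <- rank(raw[, j], ties.method = "first")  # rank of each raw member in margin j
    out[, j] <- sort(post[, j])[r]              # sorted draws placed in the raw rank order
  }
  out
}

set.seed(1)
raw  <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)          # hypothetical 50-member raw ensemble
post <- matrix(rnorm(50 * 3, sd = 2), nrow = 50, ncol = 3)  # hypothetical postprocessed draws
ecc_ens <- ecc_rearrange(raw, post)                          # ECC postprocessed ensemble
```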


Short Courses

SC1: Bayesian Clinical Trials
FULL DAY | 8:00 am to 5:00 pm
Tuttle (Terrace Level)

David Draper
University of California, Santa Cruz

Overview
Experiments that would today be recognized as clinical trials have been performed at least since the 1740s (with James Lind's demonstration that citrus fruits cure scurvy). From the late 19th century through the 1990s, sound inferential design and analysis of clinical trials has largely been based on the frequentist probability paradigm, but there has been a recent recognition that Bayesian methods can offer significant advantages in both design and analysis.

The course
– Optimal Bayesian design of clinical trials: sequential designs, adaptive designs; the use of Bayesian decision theory for optimal design
– Optimal Bayesian analysis of clinical trial outcomes: what optimal analysis is, when it can be achieved, and how to achieve it when it's possible
– Well-calibrated Bayesian clinical trial analyses; appropriate use of prior distributions
– Drawing valid causal conclusions with Bayesian analyses of observational clinical studies
– Bayesian meta-analysis for combining information

SC2: Statistical Methods for fMRI and EEG Data Analysis
FULL DAY | 8:00 am to 5:00 pm
Brickell (Terrace Level)

Martin Lindquist
Johns Hopkins School of Public Health
Hernando Ombao
University of California, Irvine

Overview
This course will cover the state-of-the-art techniques and statistical approaches for analyzing fMRI and EEG data. Though there are many types of brain imaging modalities, these two are the most common. This course will be scheduled for 4 hours and will be divided into 2 parts: the first devoted to analyzing fMRI data and the second to EEG data.

The topics in the fMRI section include:
(a) an overview of the acquisition and reconstruction of fMRI data
(b) overview of the physiological basis of the fMRI signal
(c) common experimental designs
(d) pre-processing steps
(e) methods for localizing areas activated by a task
(f) connectivity analysis
(g) prediction and brain decoding.

The topics for the EEG section are:
(a) overview of the physiological basis of the EEG signal
(b) common experimental designs
(c) pre-processing steps including artifact rejection and filtering
(d) spectral analysis
(e) coherence and connectivity analysis
(f) statistical approaches to modeling variation across trials and subjects
(g) source localization.

SC3: Design Considerations in Early Phase Clinical Trials: Phase I, Phase I/II Trials
FULL DAY | 8:00 am to 5:00 pm
Flagler (Terrace Level)

Ken Cheung
Columbia University
Alexia Iasonos
Memorial Sloan Kettering Cancer Center

Overview
This course will cover design considerations specific to Phase I and Phase I/II clinical trials, dose finding studies in humans (not in healthy volunteers), in various disease settings. The topic is receiving increased attention in the statistical literature and as a result there exist several new designs that can be made use of in any given situation. The workshop will start with a review of the aims of Phase I trials, Phase I trials with expansion cohorts, and Phase I/II trials and provide a link between the aims, designs, and methods of analysis. The workshop will focus on more advanced statistical topics such as studies involving more than one drug or schedule, patient heterogeneity, and bridging studies. Monitoring safety and efficacy simultaneously in dose expansion cohorts or as part of a Phase I/II trial will also be discussed, as Phase I trials are increasingly aiming to further characterize the toxicity and efficacy profile. Illustrations on how to use model based designs, implement and carry out a model based Phase I trial in practice will be provided based on actual studies from oncology. Computational considerations and available software will also be discussed.

The course
– Overview of Phase I designs
– Basic theory of model based designs
– How good can a design be? Defining optimal performance
– Approaches to non-binary outcomes
– More complex problems: drug combinations, patient heterogeneity
– Dose expansion cohorts
– Phase I/II; estimating toxicity and efficacy in the presence of bivariate endpoints
– Statistical Theory (retrospective vs. prospective analysis, convergence, model robustness)
– Protocol development, review of available software

SC4: Personalized Medicine and Dynamic Treatment Regimes
HALF DAY | 8:00 am to 12:00 noon
Monroe (Terrace Level)

Marie Davidian
North Carolina State University
Butch Tsiatis
North Carolina State University

Overview
Personalized medicine is focused on making treatment decisions for an individual patient based on his/her genetic/genomic, clinical, and other characteristics. Traditional approaches to this goal seek to develop new treatments that are tailored to specific subgroups of patients with unique characteristics. An alternative objective is to determine the best treatment for each patient, not only those in a small subgroup, to the benefit of the entire patient population.

This course will take this point of view and introduce basic concepts and methods for discovery of dynamic treatment regimes based on data. In the simplest case of a single treatment decision, a dynamic treatment regime is a rule that assigns treatment to patients based on their own characteristics, and the goal is to find the optimal regime, that leading to the greatest benefit if followed by all patients. In chronic diseases and disorders such as cancer, treatment decisions may be made at multiple time points. In this setting, a dynamic treatment regime is a set of sequential such decision rules corresponding to each decision point, and the optimal regime is the set of rules that would lead to greatest benefit if followed over the entire course of decision making by all patients.

SC5: Data Science and High-Performance Statistical Computing
HALF DAY | 1:00 pm to 5:00 pm
Monroe (Terrace Level)

Marc A. Suchard
UCLA School of Public Health
Martijn J. Schuemie
Johnson & Johnson

Overview
Healthcare data are a prime research target for the Data Sciences because most databases are not only massive in size, but also very highly complex due to issues in sampling, the recording process, dependency through time and across individuals, and privacy in biomedicine. The size and complexity of these data present challenges to traditional statistical analysis that require novel method development and high-performance computing for scalability.

This course explores recent advances in large-scale statistical inference in healthcare as an example of Big Data in the Data Sciences. The course takes 4 hours and is divided into didactic lectures and hands-on, computing tutorials. Topics include massive observational healthcare databasing and wrangling, scaling inference tools that incorporate complex data structure, and high-performance implementation using emerging computing technology. To this end, participants will use and develop open-source R packages, learn important design patterns for statistical computing, and discuss delegation of performance-dependent hot-spots to C/C++ with multi-core and many-core parallelization (including on graphics processing units).


Tutorials

Monday, March 16

T1: Group Sequential Designs Using the gsDesign R Package and Web Interface
8:30 am – 10:15 am
Flagler (Terrace Level)

Keaven Anderson
Merck Research Laboratories

Description
Group sequential design is the most widely-used and well-accepted form of adaptive design for confirmatory clinical trials. It controls Type I error for multiple analyses of a primary endpoint during the course of a clinical trial and allows early, well-controlled evaluation of stopping for strong efficacy results or futility. This tutorial will review the basics of group sequential theory and demonstrate common applications of the method. The R package gsDesign and its graphical user interface will be demonstrated to provide the user with an easy-to-use, open source option for designing group sequential clinical trials. The user should leave the tutorial with an ability to propose effective group sequential design solutions to confirmatory clinical trial design. Topics covered include: application of spending functions for selection of appropriate timing and levels of evidence for early stopping; confidence intervals; conditional power, predictive power and prediction intervals; time-to-event endpoints, including stratified populations and power for meta-analyses; binomial endpoints; superiority and non-inferiority designs; information-based sample size re-estimation and conditional power designs for sample size re-estimation; generation of publication-quality tables, figures and documents describing designs. (A minimal, illustrative gsDesign call is sketched after the T2 description below.)

T2: Graphics for Clinical Trials
10:30 am to 12:15 pm
Flagler (Terrace Level)

Frank E. Harrell Jr.
Vanderbilt University School of Medicine

Description
This tutorial deals with some of the graphical displays that are useful for reporting clinical trial results and for data monitoring committee reports. Emphasis is placed on replacing tables with graphics, new graphical displays for adverse events, longitudinal data, subject enrollment and exclusions, and reproducible reporting using R, LaTeX, and knitr. The philosophy of the approach is that tables should only support graphics, and they should be hyperlinked to graphics rather than appearing in the main report. Information that supports graphics such as definitions and sample sizes are pop-ups in the pdf report. More details are available at biostat.mc.vanderbilt.edu/Greport.
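To give a concrete flavor of the package discussed in T1, here is a minimal, hypothetical gsDesign call; the number of analyses, error rates, and spending functions below are illustrative assumptions, not settings prescribed by the tutorial.

```r
# Illustrative only: a three-analysis group sequential design with
# Lan-DeMets (O'Brien-Fleming-like) spending for efficacy and futility.
library(gsDesign)

x <- gsDesign(k = 3,           # number of planned analyses (2 interim + final)
              test.type = 4,   # efficacy bound plus a non-binding futility bound
              alpha = 0.025,   # one-sided Type I error
              beta = 0.10,     # Type II error (90% power)
              sfu = sfLDOF,    # Lan-DeMets O'Brien-Fleming-like efficacy spending
              sfl = sfLDOF)    # same spending family for the futility bound

x        # print boundaries and sample size inflation relative to a fixed design
plot(x)  # boundary plot
```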

T3: Statistical Leadership in Research and the Important Role of Influence
1:45 pm – 3:30 pm
Flagler (Terrace Level)

Bill Sollecito
University of North Carolina, Chapel Hill
Lisa LaVange
Food and Drug Administration

Description
This tutorial will first define leadership and its importance for statisticians; various leadership styles and skills will be introduced. The concept of emergent leadership will be illustrated using the research team environment as an example of how statisticians can develop leadership skills. The important role of influence as a leadership skill will be given special emphasis as a way to develop leadership abilities and as a way to have a greater impact on the teams and organizations in which statisticians work.

T4: A Tutorial for Multisequence Clinical Structural Brain MRI
3:45 pm – 5:30 pm
Flagler (Terrace Level)

Ciprian Crainiceanu, Ani Eloyan, Elizabeth Sweeney, and John Muschelli
Johns Hopkins University

Description
High resolution structural magnetic resonance imaging (sMRI) is used extensively in clinical practice, as it provides detailed anatomical information of the living organism, is sensitive to many pathologies, and assists in the diagnosis of disease. Applications of sMRI cover essentially every part of the human body from toes to brain and a wide variety of diseases from stroke, cancer, and multiple sclerosis (MS), to internal bleeding and torn ligaments. Since the introduction of MRI in the 1980s, the noninvasive nature of the technique, the continuously improving resolution of images, and the wide availability of MRI scanners have made sMRI instantly recognizable in the popular literature. Indeed, when one is asked to have an MRI in a clinical context it is almost certainly an sMRI. These images are fundamentally different from functional MRI (fMRI) in size, complexity, measurement target, type of measurement, and intended use. While fMRI aims to study brain activity, sMRI reveals anatomical information. This distinction is important as the scientific problems and statistical techniques for fMRI and sMRI analysis differ greatly, yet confusion between the two continues to exist in the statistical literature and among reviewers. Despite the enormous practical importance of sMRI, few biostatisticians have made research contributions in this field. This may be due to the subtle aspects of sMRI, the relatively flat learning curve, and the lack of contact between biostatisticians and the scientists working in clinical neuroimaging. Our goal is to reduce the price of entry, accelerate learning, and provide the information required to progress from novice to initiated sMRI researcher. This tutorial will provide a gentle introduction to high resolution multisequence structural MRI (sMRI) using several data sets. The tutorial will provide hands-on training in a variety of image processing techniques including: data structure and visualization, data storage and management, inhomogeneity correction, spatial interpolation, skull stripping, spatial registration, intensity normalization, lesion segmentation and mapping, and cross-sectional and longitudinal analysis of images. The tutorial will use R and several other free specialized brain imaging software.

Tuesday, March 17

T5: Bayesian Computation using PROC MCMC
1:45 pm – 3:30 pm
Jasmine (Terrace Level)

Fang Chen
SAS Institute Inc.

Description
The MCMC procedure is a general purpose Markov chain Monte Carlo simulation tool designed to fit a wide range of Bayesian models, including linear or nonlinear models, multi-level hierarchical models, models with nonstandard likelihood function or prior distributions, and missing data problems. This tutorial provides a quick and gentle introduction to PROC MCMC and demonstrates its use with a series of applications, such as Monte Carlo simulation, various regression models, sensitivity analysis, random-effects models, and predictions.

Increasingly, Bayesian methods are being used by statisticians in the pharmaceutical field to handle industry-specific problems. This tutorial will also present a number of pharma-related data analysis examples and case studies, including network meta-analysis, power prior, and missing data analysis. This tutorial is intended for statisticians who are interested in Bayesian computation. Attendees should have a basic understanding of Bayesian methods (the tutorial does not allocate time covering basic concepts of Bayesian inference) and experience using the SAS language. This tutorial is based on SAS/STAT 13.2.

T6: Graphical Approaches to Multiple Test Problems
3:45 pm – 5:30 pm
Jasmine (Terrace Level)

Dong Xi
Novartis Pharmaceuticals

Description
Methods for addressing multiplicity are becoming increasingly more important in clinical trials and other applications. In the recent past, several multiple test procedures have been developed that allow one to map the relative importance of different study objectives as well as their relation onto an appropriately tailored multiple test procedure, such as fixed-sequence, fallback, and gatekeeping procedures. In this tutorial we focus on graphical approaches that can be applied to common multiple test problems, such as comparing several treatments with a control, assessing the benefit of a new drug for more than one endpoint, and combined non-inferiority and superiority testing. Using graphical approaches, one can easily construct and explore different test strategies and thus tailor the test procedure to the given study objectives. The resulting multiple test procedures are represented by directed, weighted graphs, where each node corresponds to an elementary hypothesis, together with a simple algorithm to generate such graphs while sequentially testing the individual hypotheses. We also present several case studies to illustrate how the approach can be used in clinical practice. In addition, we briefly consider power and sample size calculation to optimize a multiple test procedure for given study objectives. The presented methods will be illustrated using the graphical user interface from the gMCP package in R, which is freely available on CRAN.
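As a concrete illustration of the gMCP workflow mentioned in T6, the following minimal sketch tests three hypotheses with one of the package's predefined graphs; the graph choice, p-values, and significance level are assumptions made for the example, not a case study from the tutorial.

```r
# Illustrative only: a weighted-graph multiple test carried out with gMCP.
library(gMCP)

g <- BonferroniHolm(3)   # predefined three-hypothesis example graph shipped with gMCP
gMCP(g, pvalues = c(0.01, 0.07, 0.02), alpha = 0.025)  # reports which hypotheses are rejected
```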


Roundtables

Monday, March 16 | 12:15 pm – 1:30 pm
Monroe (Terrace Level)

R1: Survival Strategies for Junior Researchers: Can You Have It All?
Bhramar Mukherjee
University of Michigan School of Public Health

Description
As soon as you get a "real job" after completing your doctoral or post-doctoral training, the expectations and responsibilities from your employer increase dramatically. Unfortunately, this critical time window of establishing yourself in the profession also coincides with the phase when demands from your personal life escalate. I will share some useful strategies for time management, carefully selecting research problems as a junior researcher, establishing independence from your advisor, prioritizing in terms of teaching, research, collaboration and professional service opportunities, and ultimately for trying to strike a work-life balance.

It is a complex multi-dimensional optimization problem with non-linear constraints, and while there is no uniform and obvious solution that works for everybody, we can take advantage of shared experiences and existing resources to maximize our chance of success, in both personal and professional terms. This discussion will be relevant for senior graduate students, post-doctoral researchers, and junior researchers in both industry and academia who are planning to enter/have recently entered the work force.

R2: New Trends and Innovations in Science and Practice of Clinical Trials
Olga Marchenko
Quintiles

Description
The intent of this roundtable discussion is to highlight, share, and discuss the views on some new trends and innovations in science and practice of clinical trials. Specific topics of this discussion will include:
– Innovative designs (e.g., adaptive designs, biomarker-driven designs) – where are we today?
– Statistical and PK/PD applications on smart phones to collect data (e.g., patient diary), to adjust doses (e.g., a dose for diabetes patients), to analyze data (e.g., simple summaries and graphics) – just an idea or the reality?
– Statistical and operational simulations – why do we need them?
– Predictive analytics to improve operational support – should we statisticians step up?

R3: The Role of Statisticians at the FDA
Dionne L. Price
Food and Drug Administration

Description
The Food and Drug Administration (FDA) is composed of seven centers which collectively employ over 250 statisticians. Statisticians at the FDA are an integral part of multidisciplinary teams dedicated to assuring the safety and efficacy of human and veterinary drugs, biological products, medical devices, our nation's food supply, cosmetics, and products that emit radiation. Statisticians analyze and evaluate data, provide leadership, promote innovation in study designs and statistical techniques, and conduct methodological research aimed at addressing the many complex issues that arise in a regulatory environment. FDA statisticians utilize their statistical training and knowledge to directly impact the public health. Roundtable participants will learn the role of statisticians at the FDA and potential paths to successful careers with the Agency.

R4: Applying Bayesian Evidence Synthesis in Comparative Effectiveness Research
David Ohlssen
Novartis Pharmaceuticals

Description
Motivated by the use of evidence based medicine to evaluate health technology, there has been an enormous increase in the use of quantitative techniques that allow data to be combined from a variety of sources. In a drug development setting, there have been a number of recent key works: the recommendations on the use and application of network meta-analysis were recently presented by the ISPOR task force; from a regulatory perspective, the work of the Canadian Agency (Indirect Evidence: Indirect Treatment Comparisons in Meta-Analysis) and the UK NICE Evidence synthesis series have recently been published; further, the FDA also started a number of recent projects on comparative effectiveness research as part of a plan to enhance regulatory science. By drawing on examples from a drug development setting, this roundtable aims to discuss these recent advances.

R5: Survival Skills for Biostatisticians in Academic Medical Centers
Mithat Gönen
Memorial Sloan-Kettering Cancer Center

Description
Biostatisticians in academic medical centers face different challenges than their counterparts in universities and academia. This will be an informal discussion of these challenges. Possible topics to be covered include the double-edged nature of collaborative work, managing the collaborations to sustain funding, finding intellectual fulfillment and stimulation for one's own methodological work, avoiding being overwhelmed and demotivated by the amount and nature of collaborations, gaining acceptance as an intellectual contributor (as opposed to being a p-value generator) from one's collaborators, and striking work-life balance.

R6: Working as a Statistician at the Center for Devices at the FDA
Telba Irony
Food and Drug Administration

Description
In this round table, I will discuss the life of a statistician at the Center for Devices and Radiological Health, highlighting the fact that the statistician is a problem solver, who must be interested in science and teaching, and could aspire to leadership positions.

R7: Writing Collaborative Grant Applications: Tips and Strategies
Brisa Sánchez
University of Michigan School of Public Health

Description
One of the key aspects of a biostatistics career in academia undoubtedly includes participation in collaborative research and writing grant proposals to support that research. In this round table we will discuss the range of contributions statisticians make to the grant writing process, share tips and strategies to make the process more efficient, and discuss how participation in collaborative grant proposals can enhance the biostatistician's methodological research.

R8: Interplay Between Adaptive Design Features and Complex Study Objectives, Case Studies and Tools
Yevgen Tymofyeyev
Janssen Research & Development

Description
The current state of available commercial implementations of adaptive designs software covers substantial practical needs. On the other hand, there are also practical situations where a need exists for custom-made programming to satisfy requirements and special features of a particular study or program. Such cases are hard to envision up-front in order to warrant a commercial off-the-shelf tool. An example could be a study with multiple doses of the active drug, multiple comparators and several primary endpoints, where the corresponding multiple tests can be organized into some logical structure resolved by the application of a gatekeeping-type procedure, to address the multiple testing problem. This roundtable is intended to share experiences of interesting case studies addressing not only statistical design and simulation components, but also logistical implementation issues and interactions with regulatory agencies.

R9: Publishing Without Perishing: Strategies for Success in Publishing in (Bio)statistical Journals
Marie Davidian
North Carolina State University

Description
Contributing to the advance of our discipline through publication of articles in peer-reviewed journals is a fundamental expectation for both junior and not-so-junior biostatistical researchers alike. Success in publishing one's work ensures that it will be widely disseminated to researchers and practitioners who stand to benefit. In addition, funding agencies and academic institutions place considerable importance on a successful record of publication. Accordingly, understanding the peer review and editorial processes of top journals and mastering the art of writing an effective journal article are keys to success in publishing. How does one determine the best outlet for one's work? What are the essential elements of a successful journal article? How does one maximize the chance of acceptance? What strategies can ensure that a published paper is read and cited? How does one make optimal use of limited space and additional supplementary material in conveying the message? What are the roles of the editor, associate editor, and referees? What considerations do editors use when evaluating a paper? This roundtable will provide a forum for candid discussion of these and other questions.

ENAR 2015

Program Summary

SATURDAY, MARCH 14

9:00 am – 9:00 pm WORKSHOP FOR JUNIOR RESEARCHERS Hibiscus B (Terrace Level)

3:30 pm – 5:30 pm CONFERENCE REGISTRATION Lower Promenade (Terrace Level)

SUNDAY, MARCH 15

7:30 am – 6:30 pm CONFERENCE REGISTRATION Lower Promenade (Terrace Level)

8:00 am – 12:00 pm SHORT COURSES

SC4: Personalized Medicine and Dynamic Treatment Regimes Monroe (Terrace Level)

8:00 am – 5:00 pm SHORT COURSES

SC1: Bayesian Clinical Trials Tuttle (Terrace Level)

SC2: Statistical Methods for fMRI and EEG Data Analysis Brickell (Terrace Level)

SC3: Design Considerations in Early Phase Clinical Trials: Phase I, Phase I/II Trials Flagler (Terrace Level)

12:30 pm – 5:30 pm DIVERSITY WORKSHOP Orchid CD (Terrace Level)

1:00 pm – 5:00 pm SHORT COURSES

SC5: Data Science and High-Performance Statistical Computing Monroe (Terrace Level)

3:00 pm – 6:00 pm EXHIBITS OPEN Lower Promenade (Terrace Level)

4:30 pm – 7:00 pm ENAR EXECUTIVE COMMITTEE MEETING Orchid A (Terrace Level) (by Invitation Only)

4:00 pm – 6:30 pm PLACEMENT SERVICE Hibiscus A (Terrace Level)

7:30 pm – 8:00 pm NEW MEMBER RECEPTION Riverfront Ballroom (Terrace Level)

8:00 pm – 11:00 pm SOCIAL MIXER AND POSTER SESSION Riverfront Ballroom (2nd Floor)

1. Posters: Latent Variable and Mixture Models

2. Posters: Imaging Methods and Applications

3. Posters: Clinical Trials, Adaptive Designs and Applications

4. Posters: Survival Analysis

5. Posters: Causal Inference

6. Posters: Statistical Genetics, GWAS, and ‘omics Data

7. Posters: Methodology and Applications in Epidemiology, Environment, and Ecology

8. Posters: Variable Selection and Methods for High Dimensional Data

9. Posters: Bayesian Methods and Computational Algorithms

MONDAY, MARCH 16

7:30 am – 5:00 pm CONFERENCE REGISTRATION Lower Promenade (Terrace Level)

7:30 am – 5:00 pm SPEAKER READY ROOM Azalea B (Terrace Level)

9:00 am – 5:00 pm PLACEMENT SERVICE Hibiscus A (Terrace Level)

8:30 am – 5:30 pm EXHIBITS OPEN Lower Promenade (Terrace Level)

8:30 am – 10:15 am TUTORIAL

T1: Group Sequential Designs Using the gsDesign R Package and Web Interface Flagler (Terrace Level)

SCIENTIFIC PROGRAM

10. Advances in Patient-Centered Outcomes (PCOR) Methodology Ashe Auditorium (3rd Floor)

11. Looking Under the Hood: Assumptions, Methods and Applications of Microsimulation Models to Inform Health Policy Brickell (Terrace Level)

12. Optimal Inference for High Dimensional Problems Miami Lecture Hall (3rd Floor)

13. Lifetime Data Analysis Highlights Johnson (3rd Floor)

14. Recent Advances and Challenges in the Design of Early Stage Cancer Trials Foster (3rd Floor)

15. Large Scale Data Science for Observational Healthcare Studies Tuttle (Terrace Level)

16. Contributed Papers: Competing Risks Ibis (3rd floor)

17. Contributed Papers: Applications and Methods in Environmental Health Pearson I (3rd Floor)

18. Contributed Papers: Statistical Methods for Genomics Orchid C (Terrace Level)

19. Contributed Papers: Spatial and Spatio-Temporal Methods and Applications Merrick II (3rd Floor)

20. Contributed Papers: Case Studies in Longitudinal Data Analysis Pearson II (3rd Floor)

21. Contributed Papers: Meta Analysis Gautier (3rd Floor)

22. Contributed Papers: Semi-Parametric Methods Stanford (3rd Floor)

9:30 am – 4:30 pm PLACEMENT SERVICE Hibiscus A (Terrace Level)

10:15 am – 10:30 am REFRESHMENT BREAK WITH OUR EXHIBITORS Lower Promenade (Terrace Level)

10:30 am – 12:15 pm TUTORIAL

T2: Graphics for Clinical Trials Flagler (Terrace Level)

SCIENTIFIC PROGRAM

23. Trends and Innovations in Clinical Trial Statistics: “The Future ain’t What it Used to be” Tuttle (Terrace Level)

24. Causal Inference in HIV/AIDS Research Foster (3rd Floor)

25. Open Problems and New Directions in Neuroimaging Research Merrick II (3rd Floor)

26. Statistical Methods for Understanding Whole Genome Sequencing Johnson (3rd Floor)

27. Doing Data Science: Straight Talk from the Frontline Brickell (Terrace Level)

28. IMS Medallion Lecture Ashe Auditorium (3rd Floor)

29. In Memory of Marvin Zelen: Past, Present and Future of Clinical Trials and Cancer Research Miami Lecture Hall (3rd Floor)

30. Contributed Papers: Methods for Clustered Data and Applications Pearson I (3rd Floor)

31. Contributed Papers: GWAS Ibis (3rd Floor)

32. Contributed Papers: Applications, Simulations and Methods in Causal Inference Pearson II (3rd Floor)

33. Contributed Papers: Adaptive Designs and Dynamic Treatment Regimes Gautier (3rd Floor)

34. Contributed Papers: Survival Analysis and Cancer Applications Stanford (3rd Floor)

INVITED AND CONTRIBUTED ORAL POSTERS

35. Oral Posters: Methods and Applications in High Dimensional Data and Machine Learning Jasmine (Terrace Level)

12:15 pm – 1:30 pm ROUNDTABLE LUNCHEONS Monroe (Terrace Level)

12:30 pm – 4:30 pm REGIONAL ADVISORY BOARD (RAB) LUNCHEON MEETING Hibiscus B (Terrace Level) (by Invitation Only)

1:45 pm – 3:30 pm TUTORIAL

T3: Statistical Leadership in Research and the Important Role of Influence Flagler (Terrace Level)

SCIENTIFIC PROGRAM

36. Recent Research in Adaptive Randomized Trials with the Goal of Addressing Challenges in Regulatory Science Ashe Auditorium (3rd Floor)

37. Statistical Innovations in Functional Genomics and Population Health Johnson (3rd Floor)

38. Big Data: Issues in Biosciences Miami Lecture Hall (3rd Floor)

39. Recent Advances in Statistical Ecology Foster (3rd Floor)

40. New Analytical Issues in Current Epidemiology Studies of HIV and Other Sexually Transmitted Infections Brickell (Terrace Level)

41. Statistical Advances and Challenges in Mobile Health Tuttle (Terrace Level)

42. Contributed Papers: Survey Research Pearson I (3rd Floor)

43. Contributed Papers: Graphical Models Pearson II (3rd Floor)

44. Contributed Papers: Joint Models for Longitudinal and Survival Data Merrick II (3rd Floor)

45. Contributed Papers: Functional Data Analysis Gautier (3rd Floor)

46. Contributed Papers: Methods in Causal Inference: Instrumental Variable, Propensity Scores and Matching Ibis (3rd Floor)

47. Contributed Papers: Covariates Measured with Error Stanford (3rd Floor)

INVITED AND CONTRIBUTED ORAL POSTERS

48. Oral Posters: Clinical Trials Jasmine (Terrace Level)

3:30 pm – 3:45 pm REFRESHMENT BREAK WITH OUR EXHIBITORS Lower Promenade (Terrace Level)

3:45 pm – 5:30 pm TUTORIAL

T4: A Tutorial for Multisequence Clinical Structural Brain MRI Flagler (Terrace Level)

SCIENTIFIC PROGRAM

49. CENS Invited Session — Careers in Statistics: Skills for Success Ashe Auditorium

50. Analysis Methods for Data Obtained from Electronic Health Records Tuttle (Terrace Level)

51. Statistical Challenges of Survey and Surveillance Data in US Government Foster (3rd Floor)

52. Reconstructing the Genomic Landscape from High-Throughput Data Johnson (3rd Floor)

53. Statistical Methods for Single Molecule Experiments Miami Lecture Hall (3rd Floor)

54. Subgroup Analysis and Adaptive Trials Brickell (Terrace Level)

55. Contributed Papers: Methods to Assess Agreement Pearson I (3rd Floor)

56. Contributed Papers: Methylation and RNA Data Analysis Stanford (3rd Floor)

57. Contributed Papers: New Developments in Imaging Ibis (3rd Floor)

58. Contributed Papers: Latent Variable and Principal Component Models Pearson II (3rd Floor)

59. Contributed Papers: Developments and Applications of Clustering, Classification, and Dimension Reduction Methods Gautier (3rd Floor)

60. Contributed Papers: Survival Analysis: Methods Development and Applications Merrick II (3rd Floor)

INVITED AND CONTRIBUTED ORAL POSTERS

61. Oral Posters: GWAS and Meta Analysis of Genetic Studies Jasmine (Terrace Level)

5:30 pm – 6:30 pm CENS STUDENT MIXER Monroe (Terrace Level)

6:30 pm – 7:30 pm PRESIDENT’S RECEPTION (by Invitation Only) Riverwalk Outdoor Terrace

TUESDAY, MARCH 17

7:30 am – 5:00 pm CONFERENCE REGISTRATION Lower Promenade (Terrace Level)

7:30 am – 5:00 pm SPEAKER READY ROOM Azalea B (Terrace Level)

8:30 am – 5:30 pm EXHIBITS OPEN Lower Promenade (Terrace Level)

9:30 am – 3:30 pm PLACEMENT SERVICE Hibiscus A (Terrace Level)

8:30 am – 10:15 am SCIENTIFIC PROGRAM

62. Statistical Inference with Random Forests and Related Ensemble Methods Hibiscus B (Terrace Level)

63. Mediation and Interaction: Theory, Practice and Future Directions Ashe Auditorium (3rd Floor)

64. Motivation and Analysis Strategies for Joint Modeling of High Dimensional Data in Genetic Association Studies Orchid C (Terrace Level)

65. Recent Developments on Inference for Possibly Time-Dependent Treatment Effects with Survival Data Johnson (3rd Floor)

66. Journal of Agricultural, Biological and Environmental Statistics (JABES) Highlights Foster (3rd Floor)

67. Estimation and Inference for High Dimensional and Data Adaptive Problems Miami Lecture Hall (3rd Floor)

68. Contributed Papers: Novel Methods for Bioassay Data Merrick I (3rd Floor)

69. Contributed Papers: Infectious Disease Pearson I (3rd Floor)

70. Contributed Papers: Variable Selection Pearson II (3rd Floor)

71. Contributed Papers: Modeling Health Data with Spatial or Temporal Features Gautier (3rd Floor)

72. Contributed Papers: Advances in Longitudinal Modeling Merrick II (3rd Floor)

73. Contributed Papers: Causal Inference: Average and Mediated Effects Ibis (3rd Floor)

74. Contributed Papers: Variable Selection with High Dimensional Data Stanford (3rd Floor)

10:15 am – 10:30 am REFRESHMENT BREAK WITH OUR EXHIBITORS Lower Promenade (Terrace Level)

10:30 am – 12:15 pm 75. PRESIDENTIAL INVITED ADDRESS Regency Ballroom (Terrace Level)

12:30 pm – 4:30 pm REGIONAL COMMITTEE LUNCHEON MEETING Hibiscus B (Terrace Level) (by Invitation Only)

1:45 pm – 3:30 pm TUTORIAL

T5: Bayesian Computation Using Proc MCMC Jasmine (Terrace Level)

SCIENTIFIC PROGRAM

76. Recent Advances in Dynamic Treatment Regimes Ashe Auditorium (3rd Floor)

77. Predictive Models for Precision Medicine Miami Lecture Hall (3rd Floor)

78. Electronic Health Records: Challenges and Opportunities Orchid C (Terrace Level)

79. Cost-Effective Study Designs for Observational Data Tuttle (Terrace Level)

80. Advanced Machine Learning Methods Johnson (3rd Floor)

81. Statistical Analysis for Deep Sequencing Data in Cancer Research: Methods and Applications Foster (3rd Floor)

82. Spatial and Spatio-Temporal Modeling Merrick II (3rd Floor)

83. Contributed Papers: Study Design and Power Stanford (3rd Floor)

84. Contributed Papers: Missing Data Gautier (3rd Floor)

85. Contributed Papers: Innovative Methods for Clustered Data Ibis (3rd Floor)

86. Contributed Papers: Biopharmaceutical Applications and Survival Analysis Pearson II (3rd Floor)

87. Contributed Papers: Computational Methods Pearson I (3rd Floor)

3:30 pm – 3:45 pm REFRESHMENT BREAK WITH OUR EXHIBITORS Lower Promenade (Terrace Level)

3:45 pm – 5:30 pm TUTORIAL

T6: Graphical Approaches to Multiple Test Problems Jasmine (Terrace Level)

SCIENTIFIC PROGRAM

88. Biostatistical Methods for Heterogeneous Genomic Data Tuttle (Terrace Level)

89. Innovative Approaches in Competing Risk Analysis Orchid C (Terrace Level)

90. Biomarker Evaluation in Diagnostic Studies with Longitudinal Data Johnson (3rd Floor)

91. Solving Clinical Trial Problems by Using Novel Designs Foster (3rd Floor)

92. Ensuring Biostatistical Competence Using Novel Methods Miami Lecture Hall (3rd Floor)

93. Methodological Frontiers in the Analysis of Panel Observed Data Ashe Auditorium (3rd Floor)

94. Contributed Papers: Ordinal and Categorical Data Stanford (3rd Floor)

95. Contributed Papers: Statistical Genetics Merrick II (3rd Floor)

96. Contributed Papers: Ecology and Forestry Applications Pearson I (3rd Floor)

97. Contributed Papers: Pooled Biospecimens and Diagnostic Biomarkers Pearson II (3rd Floor)

98. Contributed Papers: Multiple Testing and Variable Selection Ibis (3rd Floor)

99. Contributed Papers: Parameter Estimation in Hierarchical and Non Linear Models Gautier (3rd Floor)

5:30 pm – 6:30 pm ENAR BUSINESS MEETING Orchid C (Terrace Level) (Open to all ENAR Members)

6:30 pm – 9:30 pm TUESDAY NIGHT EVENT Dinner Cruise on the Biscayne Lady

WEDNESDAY, MARCH 18

7:30 am – 12:00 noon SPEAKER READY ROOM Azalea B (Terrace Level)

7:30 am – 9:00 am PLANNING COMMITTEE BREAKFAST MEETING Orchid A (Terrace Level) (by Invitation Only)

8:00 am – 12:30 pm CONFERENCE REGISTRATION Lower Promenade (Terrace Level)

8:00 am – 12:00 pm EXHIBITS OPEN Lower Promenade (Terrace Level)

8:30 am – 10:15 am SCIENTIFIC PROGRAM

100. New Statistical Methods in the Environmental Health Sciences Miami Lecture Hall (3rd Floor)

101. Novel Phase II and III Clinical Trial Designs for Cancer Research that Incorporate Biomarkers and Nonstandard Endpoints Pearson (3rd Floor)

102. Novel Statistical Methods to Decipher Gene Regulation Using Sequence Data Jasmine (Terrace Level)

103. Flow Cytometry: Data Collection and Statistical Analysis Foster (3rd Floor)

104. Statistical Methods in Chronic Kidney Disease Johnson (3rd Floor)

105. Challenging Statistical Issues in Imaging Merrick I (3rd Floor)

106. Statistical Methods for Predicting Subgroup Level Treatment Response Ashe Auditorium (3rd Floor)

107. Contributed Papers: ROC Curves Ibis (3rd Floor)

108. Contributed Papers: Personalized Medicine and Biomarkers Merrick II (3rd Floor)

109. Contributed Papers: Time Series Analysis and Methods Stanford (3rd Floor)

10:15 am – 10:30 am REFRESHMENT BREAK WITH OUR EXHIBITORS Lower Promenade (Terrace Level)

10:30 am – 12:15 pm SCIENTIFIC PROGRAM

110. Incorporating Biological Information in Statistical Modeling of Genome-Scale Data with Complex Structures Jasmine (3rd Floor)

111. Emerging Issues in Clinical Trials and High Dimensional Data Ashe Auditorium (3rd Floor)

112. Advances in Repeated Measures and Longitudinal Data Analysis Pearson (3rd Floor)

113. Advances in Modeling Zero-Inflated Data Johnson (3rd Floor)

114. New Developments in Missing Data Analysis: from Theory to Practice Merrick II (3rd Floor)

115. Environmental Methods with Deterministic and Stochastic Components Foster (3rd Floor)

116. Bayesian and non-parametric Bayesian Approaches to Causal Inference Miami Lecture Hall (3rd Floor)

117. Design of Multiregional Clinical Trials: Theory and Practice Merrick I (3rd Floor)

118. Contributed Papers: Multivariate Survival Analysis Ibis (3rd Floor)

119. Contributed Papers: Constrained Inference Stanford (3rd Floor)

120. Contributed Papers: Nonparametric Methods Gautier (3rd Floor)

ENAR 2015

Scientific Program

SUNDAY, MARCH 15

8:00 pm – 11:00 pm POSTER PRESENTATIONS Riverfront Ballroom (2nd Floor)

1. POSTERS: Latent Variable and Mixture Models Sponsor: ENAR 1a. INVITED POSTER: Assessment of Dimensionality Can Be Distorted by Too Many Zeroes: An Example from Psychiatry and a Solution Using Mixture Models Melanie M. Wall*, Columbia University Irini Moustaki, London School of Economics 1b. Local Influence Diagnostics for Hierarchical Count Data Models with Overdispersion and Excess Zeros Trias Wahyuni Rakhmawati*, Universiteit Hasselt Geert Molenberghs, Universiteit Hasselt and Katholieke Universiteit Leuven Geert Verbeke, Katholieke Universiteit Leuven and Universiteit Hasselt Christel Faes, Universiteit Hasselt and Katholieke Universiteit Leuven 1c. Finite Multivariate Mixtures of Skew-t Distributions with Collapse Clusters with Application in Forestry Josef Hoefler* and Donna Pauler Ankerst, Technical University Munich 1d. Weibull Mixture Regression for Zero-Heavy Continuous Substance Use Outcomes Mulugeta Gebregziabher, Delia Voronca* and Abeba Teklehaimanot, Medical University of South Carolina Elizabeth J. Santa Ana, Ralph H. Johnson Department of Veterans Affairs Medical Center 1e. Model-Free Estimation of Time-Varying Correlation Coefficients and their Confidence Intervals with an Application to fMRI Data Maria A. Kudela* and Jaroslaw Harezlak, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis Martin Lindquist, Johns Hopkins Bloomberg School of Public Health

44 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 1f. Zero-and-One Inflated Beta Regression with Mixed Effects for Modeling Relative Frequency of Condom Use in Men Who Have Sex with Men (MSM) in Ghana Nanhua Zhang*, Cincinnati Children’s Hospital Medical Center Yue Zhang, University of Cincinnati LaRon E. Nelson, University of Rochester 1g. Inference for the Number of Topics in the Latent Dirichlet Allocation Model via a Pseudo-Marginal Metropolis-Hastings Algorithm Zhe Chen* and Hani Doss, University of Florida 1h. Applying a Stochastic Volatility Model to US Stock Markets with a UMM Undergraduate Student Jong-Min Kim* and Li Qin, University of Minnesota, Morris 1i. A Mixture Model of Heterogeneity in Treatment Response Hongbo Lin* and Changyu Shen, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health, Indianapolis 1j. Bayesian Random Graph Mixture Model for Community Detection in Weighted Networks Christopher Bryant*, Mihye Ahn, Hongtu Zhu and Joseph Ibrahim, University of North Carolina, Chapel Hill 1k. Time Series Forecasting Using Model-Based Clustering and Model Averaging Fan Tang* and Joseph Cavanaugh, University of Iowa 1l. Multilevel Functional Principal Components Analysis of Surfaces with Application to CT Image Data of Pediatric Thoracic Shape Lucy F. Robinson*, Jonathan Harris and Sriram Balasubramanian, Drexel University 1m. A New Approach for Treatment Noncompliance with Structural Zero Data Pan Wu*, Christiana Care Health System

* = Presenter n = Student Award Winner Program & Abstracts 45 2. POSTERS: Imaging Methods and Applications Sponsor: ENAR 2a. INVITED POSTER: Determining Multimodal Neuroimaging Markers of Parkinson’s Disease DuBois Bowman*, Columbia University Weingiong Xue, Boehringer Ingelheim Daniel Drake, Columbia University 2b. Segmentation of Intracerebral Hemorrhage in CT Scans Using Logistic Regression John Muschelli*, Johns Hopkins Bloomberg School of Public Health Natalie Ullman and Daniel Hanley, Johns Hopkins School of Medicine Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health 2c. Relating Multi-Sequence Longitudinal Data from MS Lesions on Structural MRI to Clinical Covariates and Outcomes Elizabeth Sweeney*, Johns Hopkins Bloomberg School of Public Health Blake Dewey and Daniel Reich, National Institute of Neurological Disease and Stroke, National Institutes of Health Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health Russell Shinohara, University of Pennsylvania Ani Eloyan, Johns Hopkins Bloomberg School of Public Health 2d. Using Multiple Imputation to Efficiently Correct Magnetic Resonance Imaging Data in Multiple Sclerosis Alicia S. Chua*, Svetlana Egorova, Mark C. Anderson, Mariann Polgar-Turcsanyi, Tanuja Chitnis, Howard L. Weiner, Charles R. Guttmann, Rohit Bakshi and Brian C. Healy, Brigham and Women’s Hospital, Boston 2e. Background Adjustment and Voxelwise Inference for Template-Based Gaussian Mixture Models Meng Li* and Armin Schwartzman, North Carolina State University 2f. Fast, Fully Bayesian Spatiotemporal Inference for fMRI Donald R. Musgrove*, John Hughes and Lynn E. Eberly, University of Minnesota 2g. Bayesian Spatial Variable Selection for Ultra-High Dimensional Neuroimaging Data: A Multiresolution Approach Yize Zhao*, Statistical and Applied Mathematical Sciences Institute Jian Kang and Qi Long, Emory University 2h. Analysis of High Dimensional Brain Signals in Designed Experiments Using Penalized Threshold Vector Autoregression Lechuan Hu* and Hernando Ombao, University of California, Irvine

46 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 2i. Spatially Weighted Reduced-Rank Framework for Neuroimaging Data with Application to Alzheimer’s Disease Mihye Ahn*, University of Nevada, Reno Haipeng Shen and Chao Huang, University of North Carolina, Chapel Hill Yong Fan, University of Pennsylvania Hongtu Zhu, University of North Carolina, Chapel Hill 2j. Highly Adaptive Test for Group Differences in Brain Functional Connectivity Junghi Kim* and Wei Pan, University of Minnesota 2k. Pre-Surgical fMRI Data Analysis Using a Spatially Adaptive Conditionally Autoregressive Model Zhuqing Liu* and Veronica J. Berrocal, University of Michigan Andreas J. Bartsch, University of Heidelberg Timothy D. Johnson, University of Michigan 2l. Semiparametric Bayesian Models for Longitudinal MR Imaging Data with Multiple Continuous Outcomes Xiao Wu*, University of Florida Michael J. Daniels, University of Texas, Austin 2m. Improving Reliability of Subject-Level Resting-State Brain Parcellation with Empirical Bayes Shrinkage Amanda F. Mejia*, Mary Beth Nebel and Haochang Shou, Johns Hopkins University Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health James J. Pekar, Johns Hopkins University School of Medicine Stewart Mostofsky, Brian Caffo and Martin Lindquist, Johns Hopkins University

3. POSTERS: Clinical Trials, Adaptive Designs and Applications Sponsor: ENAR 3a. INVITED POSTER: The Role of Statisticians in Regulatory Drug Safety Evaluation Clara Kim* and Mark Levenson, U.S. Food and Drug Administration 3b. Analyzing Multiple Endpoints in a Confirmatory Randomized Clinical Trial: An Approach that Addresses Stratification, Missing Values, Baseline Imbalance and Multiplicity for Strictly Ordinal Outcomes Hengrui Sun*, University of North Carolina, Chapel Hill Atsushi Kawaguchi, Kyoto University, Japan Gary Koch, University of North Carolina, Chapel Hill 3c. Comparing the Statistical Power of Analysis of Covariance after Multiple Imputation and the Mixed Model in Testing the Treatment Effect for Pre-Post Studies with Loss to Follow-Up Wenna Xi*, Michael L. Pennell, Rebecca R. Andridge and Electra D. Paskett, The Ohio State University

* = Presenter n = Student Award Winner Program & Abstracts 47 3d. Extending Logistic Regression Likelihood Ratio Test Analysis to Detect Signals of Vaccine-Vaccine Interactions in Vaccine Safety Surveillance Kijoeng Nam*, U.S. Food and Drug Administration Nicholas C. Henderson, University of Wisconsin, Madison Patricia Rohan, Emily Jane Woo and Estelle Russek-Cohen, U.S. Food and Drug Administration 3e. Dose-Finding Approach Based on Efficacy and Toxicity Outcomes in Phase I Oncology Trials for Molecularly Targeted Agents Hiroyuki Sato*, Pharmaceuticals and Medical Devices Agency Akihiro Hirakawa, Nagoya University Graduate School of Medicine Chikuma Hamada, Tokyo University of Science 3f. Effect Size Measures and Meta-Analysis for Alternating Treatment Single Case Design Data D Leann Long*, Mathew Bruckner, Regina A. Carroll and George A. Kelley, West Virginia University 3g. Clinical Trials with Exclusions Based on Race, Ethnicity, and English Fluency Brian L. Egleston*, Omar Pedraza, Yu-Ning Wong, Roland L. Dunbrack Jr., Eric A. Ross and J. Robert Beck, Fox Chase Cancer Center, Temple University 3h. Comparing Four Methods for Estimating Optimal Tree-Based Treatment Regimes Aniek Sies* and Iven Van Mechelen, Katholieke Universiteit Leuven 3i. Comparing Methods of Adjusting for Center Effects Using Pediatric ICU Glycemic Control Data Samantha Shepler*, Scott Gillespie and Traci Leong, Emory University 3j. Bayesian Dose Finding Procedure Based on Information Criterion Lei Gao*, Sanofi William F. Rosenberger, George Mason University Zorayr Manukyan, Pfizer Inc. 3k. The Relationship among Toxicity, Response, and Survival Profiles Ultimately Influence Calling a Beneficial Experimental Drug Favorable Under Standard Phase I, II, and III Clinical Trial Designs Amy S. Ruppert* and Abigail B. Shoben, The Ohio State University 3l. Dose-Finding Using Hierarchical Modeling for Multiple Subgroups Kristen May Cunanan* and Joseph S. Koopmeiners, University of Minnesota 3m. Detecting Outlying Trials in Network Meta-Analysis Jing Zhang*, University of Maryland Haoda Fu, Eli Lilly and Company Bradley P. Carlin, University of Minnesota 3n. INVITED POSTER: Subgroup Analysis in Confirmatory Clinical Trials Brian Millen*, Eli Lilly and Company

48 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 4. POSTERS: Survival Analyses Sponsor: ENAR 4a. INVITED POSTER: Time Dependent Covariates in the Presence of Left Truncation Rebecca A. Betensky*, Harvard School of Public Health 4b. On the Estimators and Tests for the Semiparametric Hazards Regression Model Seung-Hwan Lee*, Illinois Wesleyan University 4c. A Martingale Approach to Estimating Confidence Band with Censored Data Eun-Joo Lee*, Millikin University 4d. Novel Image Markers for Non-Small Cell Lung Cancer Classification and Survival Prediction Hongyuan Wang*, University of Kentucky Fuyong Xing and Hai Su, University of Florida Arnold Stromberg, University of Kentucky Lin Yang, University of Florida 4e. Generalized Estimating Equations for Modeling Restricted Mean Survival Time Under General Censoring Mechanisms Xin Wang* and Douglas E. Schaubel, University of Michigan 4f. Generalized Accelerated Failure Time Spatial Frailty Model Haiming Zhou*, Timothy Hanson and Jiajia Zhang, University of South Carolina 4g. Penalized Variable Selection in Competing Risks Regression Zhixuan Fu*, Yale University Chirag R. Parikh, Yale University School of Medicine Bingqing Zhou, Yale University 4h. Statistical Modeling of Gap Times in Presence of Panel Count Data with Intermittent Examination Times: An Application to Spontaneous Labor in Women Ling Ma* and Rajeshwari Sundaram, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 4i. Competing Risks Model of Screening and Symptoms Diagnosis for Prostate Cancer Sheng Qiu* and Alexander Tsodikov, University of Michigan 4j. Joint Modeling of Recurrent Event Processes and Intermittently Observed Time-Varying Binary Covariate Processes Shanshan Li*, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis 4k. Composite Outcomes Versus Competing Risks Paul Kolm*, Christiana Care Health Systems

* = Presenter n = Student Award Winner Program & Abstracts 49 4l. Quantile Regression Models for Interval-Censored Failure Time Data Fang-Shu Ou*, Donglin Zeng and Jianwen Cai, University of North Carolina, Chapel Hill 4m. Empirical Likelihood Confidence Bands for the Difference of Survival Functions Under the Proportional Hazards Model Mai Zhou and Shihong Zhu*, University of Kentucky

5. POSTERS: Causal Inference Sponsor: ENAR 5a. INVITED POSTER: A Causal Framework for Meta Analyses Michael E. Sobel*, David Madigan and Wei Wang*, Columbia University 5b. The Principal Direction of Mediation Oliver Chen*, Elizabeth Ogburn, Ciprian Crainiceanu, Brian Caffo and Martin Lindquist, Johns Hopkins Bloomberg School of Public Health 5c. Dynamic Marginal Structural Models to Test the Benefit of Lung Transplantation Treatment Regimes Jeffrey A. Boatman* and David M. Vock, University of Minnesota 5d. A Model Based Approach for Predicting Principal Stratum Membership in Environmental Interventions Katherine E. Freeland*, Johns Hopkins Bloomberg School of Public Health 5e. Propensity Score Approach to Modeling Medical Cost Using Observational Data Jiaqi Li* and Nandita Mitra, University of Philadelphia Elizabeth Handorf, Fox Chase Cancer Center Justin Bekelman, University of Philadelphia 5f. Generalizing Evidence from Randomized Trials Using Inverse Probability of Selection Weights Ashley L. Buchanan*, Michael G. Hudgens and Stephen R. Cole, University of North Carolina, Chapel Hill 5g. Racial Disparities in Cancer Survival: A Causal Inference Perspective Linda Valeri*, Jarvis Chen, Nancy Krieger, Tyler J. VanderWeele and Brent A. Coull, Harvard School of Public Health

50 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 6. POSTERS: Statistical Genetics, GWAS, and ‘Omics Data Sponsor: ENAR 6a. A Data-Adaptive SNP-Set-Based Association Test of Longitudinal Traits Yang Yang* and Peng Wei, University of Texas Health Science Center at Houston Wei Pan, University of Minnesota 6b. Genetic Analysis of Data from Structured Populations Yogasudha * and Gustavo de los Campos, University of Alabama at Birmingham 6c. Mapping Disease Susceptibility Loci for Multiple Complex Traits with U-Statistics Ming Li*, University of Arkansas for Medical Sciences Changshuai Wei, University of North Texas Qing Lu, Michigan State University 6d. Permutation-Based Test Statistics for Intermediate Phenotypes in Genome-Wide Association Studies Wei Xue* and Eric Bair, University of North Carolina, Chapel Hill 6e. Statistics for Genetic Association in the Presence of Covariates — Genome Scanning Considerations Hui-Min Lin*, Eleanor Feingold and Yan Lin, University of Pittsburgh 6f. Power and Sample Size Determination for Time Course Microarray Differential Expression Studies: A False Discovery Rate and Permutation-Based Simulation Method Joanne C. Beer*, University of Pittsburgh Thuan Nguyen, Kemal Sonmez and Dongseok Choi, Oregon Health & Science University 6g. Functional Random Field Models for Association Analysis of Sequencing Data Xiaoxi Shen*, Michigan State University Ming Li, University of Arkansas for Medical Sciences Zihuai He, University of Michigan Qing Lu, Michigan State University 6h. Quantifying Uncertainty in the Identification of Proteins, Post-Translational Modifications (PTMs) and Proteoforms Naomi C. Brownstein* and Xibei Dang, Florida State University National High Magnetic Field Lab Eric Bair, University of North Carolina, Chapel Hill Nicolas L. Young, Florida State University National High Magnetic Field Lab 6i. A Statistical Pipeline for Studying Co-Regulated Genes Using Single-Cell RNA-Seq Data Ning Leng* and Li-Fang Chu, Morgridge Institute for Research Yuan Li, University of Wisconsin, Madison Peng Jiang, Chris Barry, Ron Stewart and James Thomson, Morgridge Institute for Research Christina Kendziorski, University of Wisconsin, Madison

* = Presenter n = Student Award Winner Program & Abstracts 51 6j. Outlier Detection for Quality Control in Flow Cytometry Using Compositional Data Analysis Kipper Fletez-Brant*, Johns Hopkins University Josef Spidlen and Ryan Brinkman, BC Cancer Agency Pratip Chattopadhyay, National Institutes of Health 6k. Power Analysis for Genome-Wide Association Study in Biomarker Discovery Wenfei Zhang*, Yuefeng Lu, Yang Zhao, Vincent Thuillier, Jeffrey Palmer, Sherry Cao, Jike Cui, Stephen Madden and Srinivas Shankara, Sanofi 6l. Differential Dynamics in Single-Cell RNA-Seq Experiments Keegan D. Korthauer* and Christina Kendziorski, University of Wisconsin, Madison 6m. Experimental Design for Bulk Single-Cell RNA-Seq Studies Rhonda L. Bacher* and Christina Kendziorski, University of Wisconsin, Madison 6n. A Hierarchical Mixture Model for Joint Prioritization of GWAS Results from Multiple Related Phenotypes Cong Li*, Yale University Can Yang, Baptist University Hongyu Zhao, Yale School of Public Health 6o. Nonparametric Tests for Differential Enrichment Analysis with Multi-Sample ChIP-Seq Data Qian Wu*, BioStat Solution Kyoung-Jae Won and Hongzhe Li, University of Pennsylvania 6p. Analysis of Mass Spectrometry Data and Preproccesing Methods for Metabolomics Leslie Myint* and Kasper Hansen, Johns Hopkins University 6q. INVITED POSTER: Accounting for Measurement Error in Genomic Data and Misclassification of Subtypes in the Analysis of Heterogeneous Tumor Data Daniel Nevo, Hebrew University, Jerusalem, Israel David Zucker*, Hebrew University, Jerusalem, Israel Molin Wang, Harvard School of Public Health Donna Spiegelman, Harvard School of Public Health

52 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 7. POSTERS: Methodology and Applications in Epidemiology, Environment, and Ecology Sponsor: ENAR 7a. INVITED POSTER: Carpe Diem! Biostatisticians Impacting the Conducting and Reporting of Clinical Studies Sally Morton*, University of Pittsburgh 7b. On Stratified Bivariate Ranked Set Sampling with Optimal Allocation for Naive and Ratio Estimators Lili Yu, Hani Samawi, Daniel Linder, Arpita Chatterjee, Yisong Huang* and Robert Vogel, Georgia Southern University 7c. Comparisons of the Cancer Risk Estimates between Excess Relative Risk and Relative Risk Models: A Case Study Shu-Yi Lin*, Taipei City Hospital, Taiwan 7d. A Regression Based Spatial Capture-Recapture Model for Estimating Species Density Purna S. Gamage*, Souparno Ghosh, Philip S. Gipson and Gregory Pavur, Texas Tech University 7e. Application of the Use of Percentage Difference from Median BMI to Overcome Ceiling Effects in Adiposity Change in Children Christa Lilly* and Lesley Cottrell, West Virginia University Karen Northrup and Richard Wittberg, Wood County School System 7f. A Multi-Pathogen Hierarchical Bayesian Model for Spatio-Temporal Transmission of Hand, Foot and Mouth Disease Xueying Tang*, Nikolay Bliznyuk, Yang Yang and Ira Longini, University of Florida 7g. Evaluating Risk-Prediction Models Using Data from Electronic Health Records Le Wang*, Pamela A. Shaw, Hansie Mathelier, Stephen E. Kimmel and Benjamin French, University of Pennsylvania 7h. A Bayesian Model for Identifying and Predicting the Spatio-Temporal Dynamics of Re-Emerging Urban Insect Infestations Erica Billig*, Michael Levy, Michelle Ross and Jason Roy, University of Pennsylvania 7i. Semi-Markov Models for Interval Censored Transient Cognitive States with Back Transitions and a Competing Risk Shaoceng Wei* and Richard Kryscio, University of Kentucky

* = Presenter n = Student Award Winner Program & Abstracts 53 7j. Growth Curves for Cystic Fibrosis Infants Vary in the Ability to Predict Lung Function Yumei Cao* and Raymond G. Hoffmann, Medical College of Wisconsin Evans M. Machogu, Indiana University School of Medicine Praveen S. Goday and Pippa M. Simpson, Medical College of Wisconsin 7k. An Examination of the Concept of Frailty in the Elderly Felicia R. Griffin*, Daniel L. McGee and Elizabeth H. Slate, Florida State University 7l. Efficiencies from Using Entire United States Responses in Predicting County Level Smoking Rates for West Virginia Using Publicly Available Data Dustin M. Long* and Emily A. Sasala, West Virginia University 7m. Optimally Combined Estimation for Tail Quantile Regression Kehui Wang*, North Carolina State University Huixia Judy Wang, The George Washington University

8. POSTERS: Variable Selection and Methods for High Dimensional Data Sponsor: ENAR 8a. Bayes Factor Consistency Under g-prior Linear Model with Growing Model Size Ruoxuan Xiang*, Malay Ghosh and Kshitij Khare, University of Florida 8b. Variable Selection for Cox Proportional Hazard Frailty Model Ioanna Pelagia* and Jianxin Pan, The University of Manchester, United Kingdom 8c. Fused Lasso Approach to Assessing Data Comparability with Applications in Missing Data Imputation Lu Tang* and Peter X.K. Song, University of Michigan 8d. Multiple Imputation Using Sparse PCA for High-Dimensional Data Domonique Watson Hodge* and Qi Long, Emory University 8e. Topic Modeling for Signal Detection of Safety Data from Adverse Event Reporting System Database Weizhong Zhao*, Wen Zou and James J. Chen, U.S. Food and Drug Administration 8f. Building Risk Models with Calibrated Margins Paige Maas*, National Cancer Institute, National Institutes of Health Yi-Hau Chen, Academia Sinica Raymond Carroll, Texas A&M University Nilanjan Chatterjee, National Cancer Institute, National Institutes of Health

54 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 8g. Categorical Predictors and Pairwise Comparisons in Logistic Regression via Penalization and Bregman Methods Tian Chen* and Howard Bondell, North Carolina State University 8h. Comparison of Step-Wise Variable Selection, BlmmLasso, and GMMBoost for Identification of Predictor Interactions Associated with Disease Outcome Yunyun Jiang* and Bethany Wolf, Medical University of South Carolina 8i. Shrinkage Priors for Bayesian Learning from High Dimesional Genetics Data Anjishnu Banerjee*, Medical College of Wisconsin 8j. Functional Principal Component Analysis to Fifty-Eight Most Traded Currencies Based on Euro Jong-Min Kim, University of Minnesota, Morris Ali H. AL-Marshadi, King Abdulaziz University Junho Lim*, University of Minnesota, Morris

9. POSTERS: Bayesian Methods and Computational Algorithms Sponsor: ENAR 9a. INVITED POSTER: Nonparametric Bayes Models for Modeling Longitudinal Change in Association among Categorical Variables Tsuyoshi Kunihama, Duke University Amy Herring*, University of North Carolina, Chapel Hill David Dunson, Duke University Carolyn Halpern, University of North Carolina, Chapel Hill 9b. Regression Model Estimation and Prediction Incorporating Coefficients Information Wenting Cheng*, Jeremy M.G. Taylor and Bhramar Mukherjee, University of Michigan 9c. Cross-Correlation of Change Point Problem Congjian Liu*, Georgia Southern University 9d. Bayesian Network Models for Subject-Level Inference Sayantan Banerjee*, Han Liang and Veerabhadran Baladandayuthapani, University of Texas MD Anderson Cancer Center 9e. Algorithms for Constrained Generalized Eigenvalue Problem Eun Jeong Min* and Hua Zhou, North Carolina State University

* = Presenter n = Student Award Winner Program & Abstracts 55 9f. CycloPs: A Cyclostationary Algorithm for Automatic Walking Recognition Jacek K. Urbanek* and Vadim Zipunnikov, Johns Hopkins Bloomberg School of Public Health Tamara B. Harris, National Institute on Aging, National Institutes of Health Nancy W. Glynn, University of Pittsburgh Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health Jaroslaw Harezlak, Indiana University School of Medicine 9g. Simulation-Based Estimation of Mean and Variance for Meta-Analysis via Approximate Bayesian Computation (ABC) Deukwoo Kwon* and Isildinha M. Reis, University of Miami 9h. The Effects of Sparsity Constraints on Inference of Biological Processes in Stochastic Non-Negative Matrix Factorization of Expression Data Wai S. Lee*, Alexander V. Favorov and Elana J. Fertig, Johns Hopkins University Michael F. Ochs, The College of New Jersey 9i. Bayesian Sample Size Determination for Hurdle Models Joyce Cheng*, David Kahle and John W. Seaman, Baylor University 9j. Fast Covariance Estimation for Sparse Functional/Longitudinal Data Luo Xiao*, Johns Hopkins University David Ruppert, Cornell University Vadim Zipunnikov and Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health 9k. Prior Elicitation for Logistic Regression with Data Exhibiting Markov Dependency Michelle S. Marcovitz* and John Seaman Jr., Baylor University


MONDAY, MARCH 16

8:30 am – 10:15 am

10. Advances in Patient-Centered Outcomes (PCOR) Ashe Auditorium Methodology (3rd Floor) Sponsors: ENAR, ASA Biometrics Section, ASA Section on Statistics in Epidemiology Organizers: Qi Long, Emory University and Jason Gerson, Patient-Centered Outcomes Research Institute Chair: Qi Long, Emory University 8:30 PCORI Funding Opportunities for Biostatisticians Jason Gerson*, Patient-Centered Outcomes Research Institute (PCORI) 8:55 Causal Inference for Effectiveness Research in Using Secondary Data Sebastian Schneeweiss*, Harvard University 9:20 Optimal, Two Stage, Adaptive Enrichment Designs for Randomized Trials, Using Sparse Linear Programming Michael Rosenblum*, Johns Hopkins Bloomberg School of Public Health Xingyuan Fang and Han Liu, Princeton University 9:45 Treatment Effect Inferences Using Observational Data when Treatments Effects are Heterogeneous Across Outcomes: Simulation Evidence John M. Brooks* and Cole G. Chapman, University of South Carolina

10:10 Floor Discussion

58 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 11. Looking Under the Hood: Assumptions, Methods Brickell (Terrace Level) and Applications of Microsimulation Models to Inform Health Policy Sponsors: ENAR, ASA Section on Statistics in Epidemiology Organizer: Ann Zauber, Memorial Sloan Kettering Cancer Center Chair: Eric (Rocky) Feuer, National Cancer Institute, National Institutes of Health 8:30 Introduction to the CISNET Program and Population Comparative Modeling Eric J. Feuer*, National Cancer Institute, National Institutes of Health 8:50 Microsimulation Modeling to Inform Health Policy Decisions on Age to Begin, Age to End, and Intervals of Colorectal Cancer Screening Ann G. Zauber*, Memorial Sloan Kettering Cancer Center 9:10 Role of Calibration and Validation in Developing Microsimulation Models Carolyn M. Rutter*, RAND Corporation 9:30 Using Microsimulation to Assess the Relative Contributions of Screening and Treatment in Observed Reductions in Breast Cancer Mortality in the United States Donald A. Berry*, University of Texas MD Anderson Cancer Center 9:50 Synthesis of Randomized Controlled Trials of Prostate Cancer Screening to Assess Impact of PSA Testing Using Microsimulations Ruth Etzioni* and Roman Gulati, Fred Hutchinson Cancer Research Center Alex Tsodikov, University of Michigan Eveline Heijnsdijk and Harry de Koning, Erasmus University

10:10 Floor Discussion

* = Presenter n = Student Award Winner Program 59 12. Optimal Inference for HighDimensional Problems Miami Lecture Hall Sponsors: ENAR, ASA Biometrics Section (3rd Floor) Organizer: Jelena Bradic, University of California, San Diego Chair: Jelena Bradic, University of California, San Diego 8:30 A Non-Parametric Natural Image for Decoding Visual Stimuli from the Brain Yuval Benjamini*, Stanford University Bin Yu, University of California, Berkeley 8:55 Does lq Minimization Outperform l1 Minimization? Arian Maleki*, Columbia University 9:20 Inference in High-Dimensional Varying Coefficient Models Mladen Kolar*, University of Chicago Damian Kozbur, ETH, Zurich 9:45 Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification Jianqing Fan, Princeton University Yang Feng, Columbia University Jiancheng Jiang, University of North Carolina, Charlotte Xin Tong*, University of Southern California

10:10 Floor Discussion

13. Lifetime Data Analysis Highlights Johnson (3rd Floor) Sponsors: ENAR, ASA Biometrics Section, Lifetime Data Analysis Organizer: Mei-Ling Ting Lee, University of Maryland Chair: Ruth Pfeiffer, National Cancer Institute, National Institutes of Health 8:30 Modeling the “Win Ratio” in Clinical Trials with Multiple Outcomes David Oakes*, University of Rochester 8:55 A Model for Time to Fracture with a Shock Stream Superimposed on Progressive Degradation: The Study of Osteoporotic Fractures Xin He*, University of Maryland, College Park G. A. Whitmore, McGill University Geok Yan Loo, University of Maryland, College Park Marc C. Hochberg, University of Maryland, Baltimore Mei-Ling Ting Lee, University of Maryland, College Park

60 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 9:20 Joint Rate Models for Bivariate Recurrent Events with Frailty Processes Mei-Cheng Wang*, Johns Hopkins University 9:45 Efficient Estimation of the Cox Model with Auxiliary Landmark Time Survival Information Chiung-Yu Huang*, Johns Hopkins University Jing Qin, National Institute of Allergy and Infectious Diseases, National Institutes of Health Huei-Ting Tsai, Georgetown University

10:10 Floor Discussion

14. Recent Advances and Challenges in the Design Foster (3rd Floor) of Early Stage Cancer Trials Sponsors: ENAR, ASA Biopharmaceutical Section Organizer: Ken Cheung, Columbia University Chair: Ken Cheung, Columbia University 8:30 Motivating Sample Sizes in One- and Two-Agent Phase I Designs via Bayesian Posterior Credible Intervals Thomas M. Braun*, University of Michigan 8:55 Beyond the MTD: Personalized Medicine and Clinical Trial Design Daniel Normolle*, Brenda Diergaarde and Julie Bauman, University of Pittsburgh 9:20 Understanding the Toxicity Profile of Novel Anticancer Therapies Shing M. Lee*, Columbia University 9:45 Simple Benchmark for Planning and Evaluating Complex Dose Finding Designs Ken Cheung*, Columbia University

10:10 Floor Discussion

* = Presenter n = Student Award Winner Program & Abstracts 61 15. Large Scale Data Science for Observational Tuttle (Terrace Level) Healthcare Studies Sponsor: IMS Organizers: Marc Suchard, University of California, Los Angeles and David Madigan, Columbia University Chair: Martijn J. Schuemie, Johnson & Johnson 8:30 Honest Inference from Observational Database Studies David Madigan*, Columbia University 8:55 Interpretable Feature Creation and Model Uncertainty in Observational Medical Data Tyler McCormick*, and Rebecca Ferrell, University of Washington 9:20 Beyond Crude Cohort Designs: Pharmacoepidemiology at Scale Marc A. Suchard*, University of California, Los Angeles 9:45 Safety Analysis Strategies for Comparing Two Cohorts Selected from Healthcare Data using Propensity Scores William DuMouchel* and Rave Harpaz, Oracle Health Sciences

10:10 Floor Discussion

16. CONTRIBUTED PAPERS: Ibis (3rd Floor) Competing Risks Sponsor: ENAR Chair: Domonique Watson Hodge, Emory University 8:30 Extending Fine and Gray’s Model: General Approach for Competing Risks Analysis Anna Bellach*, University of Copenhagen and University of North Carolina, Chapel Hill Jason Peter Fine, University of North Carolina, Chapel Hill Ludger Rüschendorf, Albert Ludwigs University of Freiburg im Breisgau Michael R. Kosorok, University of North Carolina, Chapel Hill 8:45 Non-Parametric Cumulative Incidence Estimation Under Misclassification in the Cause of Failure Giorgos Bakoyannis*, Indiana University Menggang Yu, University of Wisconsin Constantin T. Yiannoutsos, Indiana University Constantine Frangakis, Johns Hopkins University

62 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 9:00 Efficient Estimation of Semiparametric Transformation Models for the Cumulative Incidence of Competing Risks Lu Mao n and Danyu Lin, University of North Carolina, Chapel Hill 9:15 Joint Dynamic Modeling of Recurrent Competing Risks and a Terminal Event Piaomu Liu* and Edsel Peña, University of South Carolina, Columbia 9:30 Dynamic Prediction of Subdistribution Functions for Data with Competing Risks Qing Liu* and Chung-Chou H. Chang, University of Pittsburgh 9:45 Competing Risks Regression using Pseudo-Values Under Random Signs Censoring Tianxiu Wang* and Chung-Chou H. Chang, University of Pittsburgh 10:00 Kernel Score Test for Progression Free Survival Matey Neykov* and Tianxi Cai, Harvard University

17. CONTRIBUTED PAPERS: Pearson I (3rd Level) Applications and Methods in Environmental Health Sponsor: ENAR Chair: Yang Yang, University of Texas Health Science Center at Houston 8:30 Methodology for Quantifying the Change in Mortality Associated with Future Ozone Exposures Under Climate Change Stacey E. Alexeeff*, Gabriele G. Pfister and Doug Nychka, National Center for Atmospheric Research 8:45 Estimation of Environmental Exposure Distribution Adjusting for Dependence between Exposure Level and Detection Limit Yuchen Yang*, Brent Shelton and Tom Tucker, University of Kentucky Li Li, Case Western Reserve University Richard Kryscio and Li Chen, University of Kentucky 9:00 Spatial Confounding, Spatial Scale and the Chronic Health Effects of Coarse Thoracic Particulate Matter Helen Powell* and Roger D. Peng, Johns Hopkins Bloomberg School of Public Health 9:15 Estimating the Causal Effect of Coal Burning Power Plants on CO2 Emissions Georgia Papadogeorgou*, Corwin Zigler and Francesca Dominici, Harvard School of Public Health 9:30 Temporal Aspects of Air Pollutant Measures in Epidemiologic Analysis: A Simulation Study Laura F. White* and Jeffrey Yu, Boston University Bernardo Beckerman and Michael Jerrett, University of California, Berkeley Patricia Coogan, Boston University

* = Presenter n = Student Award Winner Program & Abstracts 63 9:45 Bayesian Models for Multiple Outcomes in Domains with Application to the Seychelles Child Development Study Luo Xiao, Johns Hopkins Bloomberg School of Public Health Sally W. Thurston*, University of Rochester David Ruppert, Cornell University Tanzy M.T. Love and Philip W. Davidson, University of Rochester 10:00 Analysis of 26 Million Area VOC Observations for the Prediction of Personal THC Exposure Using Bayesian Modeling Caroline P. Groth*, University of Minnesota Sudipto Banerjee, University of California, Los Angeles Gurumurthy Ramachandran and Ian Reagen, University of Minnesota Richard Kwok, National Institute of Environmental Health Sciences, National Institutes of Health Aaron Blair, National Cancer Institute, National Institutes of Health Dale Sandler and Lawrence Engel, National Institute of Environmental Health Sciences, National Institutes of Health Mark Stenzel and Patricia Stewart, Stewart Exposure Assessments, LLC

18. CONTRIBUTED PAPERS: Orchid C (Terrace Level) Statistical Methods for Genomics Sponsor: ENAR Chair: Wenna Xi, The Ohio State University 8:30 Identification of Consistent Functional Modules Xiwei Chen*, David L. Tritchler, Jeffrey C. Miecznikowski and Daniel P. Gaile, State University of New York at Buffalo 8:45 A Mediation-Based Integrative Genomic Analysis of Lung Cancer Sheila Gaynor* and Xihong Lin, Harvard University 9:00 Nonparametric Failure Time Analysis with Genomic Applications Cheng Cheng*, St. Jude Children’s Research Hospital 9:15 An Omnibus Test for Differential Abundance Analysis of Microbiome Data Jun Chen*, Mayo Clinic, Rochester Emily King, Iowa State University Diane Grill and Karla Ballman, Mayo Clinic, Rochester

64 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 9:30 Sparse Analysis for High Dimensional Data with Application to Data Integration Sandra Addo Safo*, Emory University Jeongyoun Ahn, University of Georgia 9:45 Robust Inference of Chromosome 3D Structure Using Hi-C Chromatin Interaction Data Kai Wang* and Kai Tan, University of Iowa

10:00 Floor Discussion

19. CONTRIBUTED PAPERS: Merrick II (3rd Floor) Spatial and Spatio-Temporal Methods and Applications Sponsor: ENAR Chair: Anjana Grandhi, New Jersey Institute of Technology 8:30 A Semiparametric Approach for Spatial Point Process with Geocoding Error in Case-Control studies Kun Xu* and Yongtao Guan, University of Miami 8:45 Semiparametric Nonseparable Spatial-Temporal Single Index Model Hamdy Fayez Farahat Mahmoud* and Inyoung Kim, Virginia Tech 9:00 Statistical Analysis of Feed-Forward Loops Arising from Aging Physiological Systems Jonathan (JJ) H. Diah*, Feiran Zhong and Arindam RoyChoudhury, Columbia University 9:15 Bayesian Computation for Log-Gaussian Cox Processes: A Comparative Analysis of Methods Ming Teng*, University of Michigan Farouk S. Nathoo, University of Victoria Timothy D. Johnson, University of Michigan 9:30 The Joint Asymptotics for Estimating the Smoothness Parameters of Bivariate Gaussian Random Process Yuzhen Zhou* and Yimin Xiao, Michigan State University 9:45 Covariance Tapering for Anisotropic Nonstationary Gaussian Random Fields with Application to Large Scale Spatial Data Sets Abolfazl Safikhani* and Yimin Xiao, Michigan State University 10:00 Dynamic Nearest Neighbor Gaussian Process Models for Large Spatio-Temporal Datasets Abhirup Datta*, University of Minnesota Sudipto Banerjee, University of California, Los Angeles Andrew O. Finley, Michigan State University

* = Presenter n = Student Award Winner Program & Abstracts 65 20. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Case Studies in Longitudinal Data Analysis Sponsor: ENAR Chair: Yajuan Si, University of Wisconsin-Madison 8:30 Using the Sigmoid Mixed Models for Longitudinal Cognitive Decline Ana W. Capuano*, Robert S. Wilson and Sue E. Leurgans, Rush University Medical Center Jeffrey D. Dawson, University of Iowa Donald Hedeker, University of Chicago 8:45 Short-Term Blood Pressure Variability over 24 hours Using Mixed-Effects Models Jamie M. Madden*, Xia Lee, Patricia M. Kearney and Anthony P. Fitzgerald, University College Cork, Ireland 9:00 A Longitudinal Modelling Case Study in Renal Medicine and an Associated R Package Ozgur Asar*, Lancaster University Peter J. Diggle, Lancaster University and University of Liverpool James Ritchie and Philip A. Kalra, University of Manchester 9:15 A Likelihood Ratio Test for Nested Proportions Yi-Fan Chen*, University of Illinois, Chicago Jonathan Yabes and Maria Brooks, University of Pittsburgh Sonia Singh, Royal Children’s Hospital Lisa Weissfeld, Statistics Collaborative Inc. 9:30 Bayesian Nonparametric Quantile Regression Models: An Application to a Fetal Growth Study with Ultrasound Measurements Sungduk Kim* and Paul S. Albert, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 9:45 Modeling Repeated Labor Curves in Consecutive Pregnancies: Individualized Prediction of Labor Progression from Previous Pregnancy Data Olive D. Buhule* and Paul S. Albert, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Alexander C. McLain, University of South Carolina Katherine Grantz, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 10:00 An Example of Unconstrained Model for Covariance Structure for Multivariate Longitudinal Data: Major League Baseball Batter’s Salary with the Weighted Offensive Average Chulmin Kim*, University of West Georgia

66 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 21. CONTRIBUTED PAPERS: Gautier (3rd Floor) Meta Analysis Sponsor: ENAR Chair: Joanne C. Beer, University of Pittsburgh 8:30 Meta-Analysis Sparse K-Means Framework for Disease Subtype Discovery When Combining Multiple Transcriptomic Studies Zhiguang Huo* and George Tseng, University of Pittsburgh 8:45 Meta Analysis: A Causal Framework, with Application to Randomized Studies of Vioxx Michael E. Sobel*, David Madigan and Wei Wang, Columbia University 9:00 A Bayesian Hierarchical Model for Network Meta-Analysis of Diagnostic Tests Xiaoye Ma n and Haitao Chu, University of Minnesota Yong Chen, University of Texas Health Science Center, Houston Joseph Ibrahim, University of North Carolina, Chapel Hill 9:15 Inference for Correlated Effect Sizes Using Multiple Univariate Meta-Analyses Yong Chen, Yi Cai* and Chuan Hong, University of Texas Health Science Center, Houston Dan Jackson, Cambridge Institute of Public Health 9:30 Detecting Outlying Studies in Meta-Regression Models Using a Forward Search Algorithm Dimitris Mavridis, University of Ioannina Irini Moustaki*, London School of Economics Melanie Wall, Columbia University Georgia Salanti, University of Ioannina 9:45 Comparing Multiple Imputation Methods for Systematically Missing Subject-Level Data David M. Kline*, Eloise E. Kaizar and Rebecca R. Andridge, The Ohio State University

10:00 Floor Discussion

* = Presenter n = Student Award Winner Program & Abstracts 67 22. CONTRIBUTED PAPERS: Stanford (3rd Floor) Semi-Parametric Methods Sponsor: ENAR Chair: Laura H. Gunn, Stetson University 8:30 Understanding Gaussian Process Fits Using an Approximate Form of the Restricted Likelihood Maitreyee Bose* and James S. Hodges, University of Minnesota 8:45 Mitigating Bias in Generalized Linear Mixed Models: The Case for Bayesian Nonparametrics Joseph L. Antonelli n, Sebastien Haneuse and Lorenzo Trippa, Harvard School of Public Health 9:00 An Estimated Likelihood Estimator by Extracting Auxiliary Information under Outcome Dependent Sample Design Wansuk Choi* and Haibo Zhou, University of North Carolina, Chapel Hill 9:15 Estimation, IID Representation and Inference for the Average Outcome Under Stochastic Intervention on Dependent Data Oleg Sofrygin* and Mark J. van der Laan, University of California, Berkeley 9:30 Empirical Likelihood-Based Inference for Partially Linear Models Haiyan Su*, Montclair State University

9:45 Bayesian Nonparametric Methods for Testing Shape Constraint for Longitudinal Data Yifang Li*, North Carolina State University Sujit Ghosh, North Carolina State University & Statistical and Applied Mathematical Sciences Institute 10:00 Hypothesis Testing in Semi-Parametric Discrete Choice Model Yifan Yang* and Mai Zhou, University of Kentucky

MONDAY, MARCH 16

Lower Promenade 10:15 – 10:30 am — Refreshment Break with Our Exhibitors (Terrace Level)

10:30 am – 12:15 pm 23. Trends and Innovations in Clinical Trial Statistics: Tuttle (Terrace Level) “The Future ain’t What it Used to be” Sponsors: ENAR, ASA Biopharmaceutical Section Organizer: Olga Marchenko, Quintiles Chair: Olga Marchenko, Quintiles 10:30 “The Future Ain’t What it Used to be” (Yogi Berra). Have Statisticians Received the Memo? Nevine Zariffa*, AstraZeneca Pharmaceuticals 11:00 Panelists: Sara Hughes, GlaxoSmithKline Dominic Labriola, Bristol-Myers Squibb Lisa LaVange, U.S. Food and Drug Administration Shiferaw Mariam, Janssen R&D Jerry Schindler, Merck Venkat Sethuraman, Bristol-Myers Squibb Frank Shen, AbbVie Anastasios (Butch) Tsiatis, North Carolina State University

12:00 Floor Discussion

* = Presenter n = Student Award Winner Program 69 24. Causal Inference in HIV/AIDS Research Foster (3rd Floor) Sponsors: ENAR, ASA Section on Statistics in Epidemiology Organizer: Michael Hudgens, University of North Carolina, Chapel Hill Chair: Michael Hudgens, University of North Carolina, Chapel Hill 10:30 Representing Unmeasured Confounding in Causal Models for Observational Data Joseph W. Hogan*, Brown University Dylan Small, University of Pennsylvania 10:55 Inverse Probability of Censoring Weights under Missing Not at Random with Application to CD4 Outcomes in HIV-Positive Patients in Kenya Judith J. Lok*, Harvard School of Public Health Constantin T. Yiannoutsos, Indiana University Fairbanks School of Public Health Agnes Kiragga, Infectious Diseases Institute, Kampala, Uganda Ronald J. Bosch, Harvard School of Public Health 11:20 Doubly Robust Instrumental Variable Estimation for Outcome Missing Not at Random BaoLuo Sun*, Lan Liu, James Robins and Eric Tchetgen Tchetgen, Harvard School of Public Health 11:45 Estimating Prevention Efficacy Among Compliers in HIV Pre- Exposure Prophylaxis (PrEP) Trials James Dai* and Elizabeth Brown, Fred Hutchinson Cancer Research Center and University of Washington

12:10 Floor Discussion

25. Open Problems and New Directions Merrick II (3rd Floor) in Neuroimaging Research Sponsors: ENAR, ASA Mental Health Statistics Section, ASA Section on Statistics in Imaging Organizers: Hernando Ombao, University of California, Irvine and Martin Lindquist, Johns Hopkins University Chair: Timothy Johnson, University of Michigan 10:30 Problems in Structural Brain Imaging: Wavelets and Regressions on Non-Euclidean Manifolds Moo K. Chung*, University of Wisconsin-Madison 10:55 Open Problems and New Directions in Modeling Electroencephalograms Hernando Ombao*, University of California, Irvine

70 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 11:20 Open Problems and New Directions in functional Magnetic Resonance Imaging (fMRI) Martin A. Lindquist*, Johns Hopkins University 11:45 Empirical Bayes Methods Leveraging Heritability for Imaging Genetics Wesley Kurt Thompson*, University of California, San Diego

12:10 Floor Discussion

26. Statistical Methods for Understanding Whole Johnson (3rd Floor) Genome Sequencing Sponsors: ENAR, ASA Biometrics Section Organizer: Jeffrey Leek, Johns Hopkins University Chair: Ingo Ruczinski, Johns Hopkins University 10:30 Group Association Test Using a Hidden Markov Model for Sequencing Data Charles Kooperberg*, Yichen Cheng and James Y. Dai, Fred Hutchinson Cancer Research Center 10:55 Variant Calling and Batch Effects in Deep Whole-Genome Sequencing Data Margaret A. Taub*, Johns Hopkins University Suyash S. Shringarpure, Stanford University Rasika A. Mathias and Ingo Ruczinski, Johns Hopkins University Kathleen C. Barnes, Johns Hopkins University and The CAAPA Consortium 11:20 Flexible Probabilistic Modeling of Genetic Variation in Global Human Studies John Storey*, Princeton University 11:45 Allele Specific Expression to Identify Causal Functional QTLs Barbara Engelhardt*, Princeton University

12:10 Floor Discussion

* = Presenter n = Student Award Winner Program 71 27. Doing Data Science: Straight Talk Brickell (3rd Floor) from the Frontline Sponsors: ENAR, ASA Statistical Programmers Section Organizer: Bhramar Mukherjee, University of Michigan Chair: Bhramar Mukherjee, University of Michigan 10:30 Doing Data Science Rachel Schutt*, Newscorp 11:00 Discussant: Sumanta Basu, University of California, Berkeley 11:30 Discussant: Beka Steorts, Carnegie Mellon University

12:00 Floor Discussion

28. IMS Medallion Lecture Ashe Auditorium Sponsor: IMS (3rd Floor) Organizer: Lurdes Y.T. Inoue, University of Washington Chair: Lurdes Y.T. Inoue, University of Washington 10:30 Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling Tilmann Gneiting*, Heidelberg Institute for Theoretical Studies (HITS) and Karlsruhe Institute of Technology (KIT) Roman Schefzik, Heidelberg University Thordis L. Thorarinsdottir, Norwegian Computing Center

29. Panel Discussion: In Memory of Marvin Zelen: Miami Lecture Hall Past, Present and Future of Clinical Trials and (3rd Floor) Cancer Research Sponsor: ENAR Organizer: Xihong Lin, Harvard University Chair: Xihong Lin, Harvard University 10:30 Colin Begg, Memorial Sloan Kettering Cancer Center Ross Prentice, Fred Hutchinson Cancer Center Victor De Gruttola, Harvard Chan School of Public Health

12:00 Floor Discussion

30. CONTRIBUTED PAPERS: Pearson I (3rd Floor) Methods for Clustered Data and Applications Sponsor: ENAR Chair: Sung Won Han, New York University 10:30 Multivariate Modality Inference with Application on Flow Cytometry Yansong Cheng*, GlaxoSmithKline Surajit Ray, University of Glasgow 10:45 Estimation of the Prevalence of Disease Among Clusters Using Random Partial-Cluster Sampling Sarah J. Marks*, John S. Preisser, Anne E. Sanders and James D. Beck, University of North Carolina, Chapel Hill 11:00 Testing Homogeneity in a Contaminated Normal Model with Correlated Data Meng Qi* and Richard Charnigo, University of Kentucky 11:15 On the Use of Between-within Models to Adjust for Confounding due to Unmeasured Cluster-Level Covariates Babette A. Brumback* and Zhuangyu Cai, University of Florida 11:30 Estimating the Effects of Center Characteristics on Center Outcomes: A Symbolic Data Approach Jennifer Le-Rademacher*, Medical College of Wisconsin

* = Presenter n = Student Award Winner Program 73 11:45 A Robust and Flexible Method to Estimate Association for Sparse Clustered Data Lijia Wang* and John J. Hanfelt, Emory University

12:00 Floor Discussion

31. CONTRIBUTED PAPERS: Ibis (3rd Floor) GWAS Sponsor: ENAR Chair: Luis G. León-Novelo, University of Louisiana at Lafayette 10:30 Gene-Disease Associations via Sparse Simultaneous Signal Detection Sihai Dave Zhao*, University of Illinois at Urbana-Champaign Tony Cai and Hongzhe Li, University of Pennsylvania 10:45 Statistical Tests for the Detection of Shared Common Genetic Variants between Heterogeneous Diseases Based on GWAS Julie Kobie*, University of Pennsylvania Sihai Dave Zhao, University of Illinois at Urbana-Champaign Yun R. Li, Hakon Hakonarson and Hongzhe Li, University of Pennsylvania 11:00 Testing Class-Level Genetic Associations Using Single-Element Summary Statistics Jing Qian*, Eric Reed and Sara Nunez, University of Massachusetts, Amherst Rachel Ballentyne, Liming Qu and Muredach P. Reilly, University of Pennsylvania Andrea S. Foulkes, Mount Holyoke College 11:15 Set-Based Tests for Genetic Association in Longitudinal Studies Zihuai He*, Min Zhang, Seunggeun Lee and Jennifer A. Smith, University of Michigan Xiuqing Guo, Harbor-UCLA Medical Center Walter Palmas, Columbia University Sharon L.R. Kardia, Ana V. Diez Roux and Bhramar Mukherjee, University of Michigan

74 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 11:30 GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation Dongjun Chung*, Medical University of South Carolina Can Yang, Hong Kong Baptist University Cong Li, Joel Gelernter and Hongyu Zhao, Yale University 11:45 Optimum Study Design for Detecting Imprinting and Maternal Effects Based on Partial Likelihood Fangyuan Zhang*, The Ohio State University Abbas Khalili, McGill University Shili Lin, The Ohio State University 12:00 Analysis of Genomic Data via Likelihood Ratio Test in Composite Kernel Machine Regression Ni Zhao* and Michael C. Wu, Fred Hutchinson Cancer Research Center

32. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Applications, Simulations and Methods in Causal Inference Sponsor: ENAR Chair: Luojun Wang, The Pennsylvania State University 10:30 Estimating the Fraction who Benefit from a Treatment, Using Randomized Trial Data Emily J. Huang* and Michael A. Rosenblum, Johns Hopkins University 10:45 Sensitivity Analyses in the Presence of Effect Modification in Observational Studies Jesse Y. Hsu*, Dylan S. Small and Paul R. Rosenbaum, University of Pennsylvania 11:00 The Causal Effect of Gene and Percentage of Trunk Fat Interaction on Physical Activity Taraneh Abarin*, Memorial University 11:15 A Simulation Study of a Multiply-Robust Approach for Causal Inference with Binary or Continuous Missing Covariates Jia Zhan*, Changyu Shen and Xiaochun Li, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health Lingling Li, Harvard Medical School and Harvard Pilgrim Health Care Institute

* = Presenter n = Student Award Winner Program & Abstracts 75 11:30 The Impact of Unmeasured Confounding in Observational Studies Zugui Zhang* and Paul Kolm, Christiana Care Health System 11:45 Flexible Models for Estimating Optimal Treatment Initiation Time for Survival Endpoints: Application to Timing of cART Initiation in HIV/TB Co-Infection Liangyuan Hu* and Joseph W. Hogan, Brown University 12:00 Double Robust Goodness-of-Fit Test of Coarse Structural Nested Mean Models with Application to Initiating HAART in HIV-Positive Patients Shu Yang* and Judith Lok, Harvard School of Public Health

33. CONTRIBUTED PAPERS: Gautier (3rd Floor) Adaptive Designs and Dynamic Treatment Regimes Sponsor: ENAR Chair: Xiaoqing Zhu, Michigan State University 10:30 A Bayesian Optimal Design in Two-Arm, Randomized Phase II Clinical Trials with Endpoints from Exponential Families Wei Jiang*, Jo A. Wick, Jianghua He, Jonathan D. Mahnken and Matthew S. Mayo, University of Kansas Medical Center 10:45 A Novel Method for Estimating Optimal Tree-Based Treatment Regimes in Randomized Clinical Trials Lisa L. Doove*, Katholieke Universiteit Leuven Elise Dusseldorp, Leiden University Katrijn Van Deun, Tilburg University Iven Van Mechelen, Katholieke Universiteit Leuven 11:00 Longitudinal Bayesian Adaptive Designs for the Promotion of more Ethical Two Armed Randomized Controlled Trials: A Novel Evaluation of Optimality Jo Wick*, University of Kansas Medical Center Scott M. Berry, Berry Consultants Byron Gajewski, Hung-Wen Yeh, Won Choi, Christina M. Pacheco and Christine Daley, University of Kansas Medical Center 11:15 Identifying a Set that Contains the Best Dynamic Treatment Regimes Ashkan Ertefaie*, University of Pennsylvania Tianshuang Wu and Inbal Nahum-Shani, University of Michigan Kevin Lynch, University of Pennsylvania 11:30 Optimal Dynamic Treatment Regimes for Treatment Initiation with Continuous Random Decision Points Yebin Tao* and Lu Wang, University of Michigan Haoda Fu, Eli Lilly and Company

76 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 11:45 Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy Alexander R. Luedtke* and Mark J. van der Laan, University of California, Berkeley 12:00 Sequential Advantage Selection for Optimal Treatment Regime Ailin Fan*, Wenbin Lu and Rui Song, North Carolina State University

34. CONTRIBUTED PAPERS: Stanford (3rd Floor) Survival Analysis and Cancer Applications Sponsor: ENAR Chair: James Lymp, Genentech 10:30 Regression Analysis of Informative Current Status Data Under Cure Rate Model Yeqian Liu*, University of Missouri, Columbia Tao Hu, Capital Normal University, Jianguo Sun, University of Missouri, Columbia 10:45 The Historical Cox Model Jonathan E. Gellar*, Johns Hopkins Bloomberg School of Public Health Fabian Scheipl, LMU Munich Mei-Cheng Wang, Johns Hopkins Bloomberg School of Public Health Dale M. Needham, Johns Hopkins School of Medicine Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health 11:00 Bayesian Analysis of Survival Data Under Generalized Extreme Value Distribution with Application in Cure Rate Model Dooti Roy*, University of Connecticut Vivekananda Roy, Iowa State University Dipak Dey, University of Connecticut 11:15 Joint Semiparametric Time-to-Event Modeling of Cancer Onset and Diagnosis When Onset is Unobserved John D. Rice* and Alex Tsodikov, University of Michigan 11:30 A Multiple Imputation Approach for Semiparametric Cure Model with Interval Censored Data Jie Zhou*, Jiajia Zhang, Alexander C. McLain and Bo Cai, University of South Carolina, Columbia 11:45 A Flexible Parametric Cure Rate Model with Known Cure Time Paul W. Bernhardt*, Villanova University 12:00 Change-Point Proportional Hazards Model for Clustered Event Data Yu Deng*, Jianwen Cai and Donglin Zeng, University of North Carolina, Chapel Hill Jinying Zhao, Tulane University

* = Presenter n = Student Award Winner Program & Abstracts 77 35. INVITED AND CONTRIBUTED ORAL POSTERS: Jasmine (Terrace Level) Methods and Applications in High Dimensional Data and Machine Learning Sponsor: ENAR Chair: Sarah Ratcliff, University of Pennsylvania 35a. INVITED POSTER: Machine Learning Methods for Constructing Real-Time Treatment Policies in Mobile Health Susan Murphy* and Yanzhen Deng*, University of Michigan 35b. INVITED POSTER: Predicting Strokes Using Relational Random Forests Zach Shahn, Patrick Ryan and David Madigan*, Columbia University 35c. Network-Constrained Group LASSO for High Dimensional Multinomial Classification with Application to Cancer Subtype Prediction Xinyu Tian*, Stony Brook University Jun Chen, Mayo Clinic Xuefeng Wang, Stony Brook University 35d. Two Sample Mean Test in High Dimensional Compositional Data Yuanpei Cao*, University of Pennsylvania Wei Lin, Peking University Hongzhe Li, University of Pennsylvania 35e. Classifications Based on Active Set Selections Wen Zhou*, Colorado State University Stephen Vardeman, Huaiqing Wu and Max Morris, Iowa State University 35f. Application of a Graph Theory Algorithm in Soft Clustering Wenzhu Mowrey*, Albert Einstein College of Medicine George C. Tseng, University of Pittsburgh Lisa A. Weissfeld, Statistics Collaborative, Inc. 35g. Testing for the Presence of Clustering Erika S. Helgeson* and Eric Bair, University of North Carolina, Chapel Hill 35h. Variable Selection and Sufficient Dimension Reduction for High Dimensional Data Yeonhee Park* and Zhihua Su, University of Florida 35i. Variable Selection for Treatment Decisions with Scalar and Functional Covariates Adam Ciarleglio*, New York University School of Medicine Eva Petkova, New York University School of Medicine and Nathan S. Kline Institute for Psychiatric Research R. Todd Ogden, Columbia University Thaddeus Tarpey, Wright State University

78 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 35j. MOPM: Multi-Operator Prediction Model Based on High-Dimensional Features Hojin Yang*, Hongtu Zhu and Joseph G. Ibrahim, University of North Carolina, Chapel Hill 35k. Structured Sparse CCA for High Dimensional Data Integration Sandra Safo* and Qi Long, Emory University 35l. SPARC: Optimal Estimation and Asymptotic Inference Under Semiparametric Sparsity Yang Ning* and Han Liu, Princeton University 35m. Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging Yue Hu n, Rice University Genevera I. Allen, Rice University, Baylor College of Medicine and Texas Children’s Hospital 35n. Residual Weighted Learning for Estimating Individualized Treatment Rules Xin Zhou* and Michael R. Kosorok, University of North Carolina, Chapel Hill 35o. Integrative Multi-Omics Clustering for Disease Subtype Discovery by Sparse Overlapping Group Lasso and Tight Clustering SungHwan Kim n, YongSeok Park and George Tseng, University of Pittsburgh 35p. Identifying Predictive Markers for Personalized Treatment Selection Yuanyuan Shen* and Tianxi Cai, Harvard University

MONDAY, MARCH 16

12:15 – 1:30 pm — Roundtable Luncheons Monroe (Terrace Level)

1:45 – 3:30 pm 36. Recent Research in Adaptive Randomized Ashe Auditorium Trials with the Goal of Addressing Challenges (3rd Floor) in Regulatory Science Sponsors: ENAR, ASA Biopharmaceutical Section Organizer: Michael Rosenblum, Johns Hopkins University Chair: Michael Rosenblum, Johns Hopkins University 1:45 Adaptive Enrichment with Subpopulation Selection at Interim Sue-Jane Wang* and Hsien-Ming James Hung, U.S. Food and Drug Administration 2:10 Post-Trial Simulation of Type I Error for Demonstration of Control of Type I Error Scott M. Berry*, Berry Consultants 2:35 Bayesian Commensurate Prior Approaches for Pediatric and Rare Disease Clinical Trials Bradley P. Carlin* and Cynthia Basu, University of Minnesota Brian Hobbs, University of Texas MD Anderson Cancer Center 3:00 Identifying Subpopulations with the Largest Treatment Effect Iván Díaz* and Michael Rosenblum, Johns Hopkins Bloomberg School of Public Health

3:25 Floor Discussion

37. Statistical Innovations in Functional Genomics Johnson (3rd Floor) and Population Health Sponsor: ENAR Organizers: Hua Tang, Stanford University and Lihong Qi, University of California, Davis Chair: Marc Coram, Stanford University 1:45 Quality Preserving Databases: Statistically Sound and Efficient Use of Public Databases for an Infinite Sequence of Tests Saharon Rosset*, Tel Aviv University Ehud Aharoni and Hani Neuvirth, IBM Research 2:05 Fused Lasso Additive Model Ashley Petersen*, Daniela Witten and Noah Simon, University of Washington 2:25 Imputing Transcriptome in Inaccessible Tissues in and Beyond the GTEx Project via RIMEE Jiebiao Wang, Dan Nicolae, Nancy Cox and Lin S. Chen*, University of Chicago 2:45 A Bayesian Method for the Detection of Long-Range Chromosomal Interactions in Hi-C Data  Xu and Guosheng Zhang, University of North Carolina, Chapel Hill Fulai Jin, Ludwig Institute for Cancer Research Mengjie Chen and Patrick F. Sullivan, University of North Carolina, Chapel Hill Zhaohui Qin, Emory University Terrence S. Furey, University of North Carolina, Chapel Hill Ming Hu, New York University Yun Li*, University of North Carolina, Chapel Hill 3:05 Fine Mapping of Complex Trait Loci with Coalescent Methods in Large Case-Control Studies Ziqan Geng, University of Michigan Paul Scheet, University of Texas MD Anderson Cancer Center Sebastian Zöllner*, University of Michigan

3:25 Floor Discussion

38. Big Data: Issues in Biosciences Miami Lecture Hall Sponsors: ENAR, ICSA (3rd Floor) Organizers: Charmaine Dean, University of Western Ontario, Zhezhen Jin, Columbia University and Hongyu Zhao, Yale University Chair: Charmaine Dean, University of Western Ontario 1:45 Big Genomics Data Analytics Haiyan Huang* and Bin Yu, University of California, Berkeley 2:15 Recalculating the Relative Risks of Air Pollution to Account for Preferential Site Selection James V. Zidek*, University of British Columbia Gavin Shaddick, University of Bath 2:45 Functional Data Analysis for Quantifying Brain Connectivity Hans-Georg Mueller* and Alexander Petersen, University of California, Davis Owen Carmichael, Louisiana State University

3:15 Floor Discussion

39. Recent Advances in Statistical Ecology Foster (3rd Floor) Sponsor: ENAR Organizer: Mahlet Tadesse, Georgetown University Chair: Mahlet Tadesse, Georgetown University 1:45 Efficient Spatial and Spatio-Temporal False Discovery Rate Control Ali Arab*, Georgetown University 2:10 Mixture of Inhomogeneous Matrix Models for Species-Rich Ecosystems Frederic Mortier*, CIRAD — Tropical Forest Goods and Ecosystem Services Unit 2:35 Spatio-Temporal Modeling of Multiple Species Migration Flow Trevor Oswald* and Christopher K. Wikle, University of Missouri, Columbia 3:00 Statistical Modeling of Spatial Discrete and Continuous Data in Ecology Jun Zhu*, University of Wisconsin, Madison

3:25 Floor Discussion

82 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 40. New Analytical Issues in Current Brickell (Terrace Level) Epidemiology Studies of HIV and Other Sexually Transmitted Infections Sponsor: ENAR Organizer: Xiangrong Kong, Johns Hopkins University Chair: Kellie Archer, Virginia Commonwealth University 1:45 A Framework for Quantifying Risk Stratification from Diagnostic Tests: Application to HPV Testing in Cervical Cancer Screening Hormuzd Katki*, National Cancer Institute, National Institutes of Health 2:05 Combining Information to Estimate Adherence in Trials of Pre-Exposure Prophylaxis for HIV Prevention James Hughes*, University of Washington 2:25 Analysis of Longitudinal Multivariate Outcome Data from Couples Cohort Studies: Application to HPV Transmission Dynamics Xiangrong Kong*, Johns Hopkins University 2:45 Sample Size Methods for Estimating HIV Incidence from Cross-Sectional Surveys Jacob Moss Konikoff* and Ron Brookmeyer, University of California, Los Angeles 3:05 Development of Accurate Methods to Estimate HIV Incidence in Cross-Sectional Surveys Oliver B. Laeyendecker*, National Institute of Allergy and Infectious Diseases, National Institutes of Health

3:25 Floor Discussion

41. Statistical Advances and Challenges Tuttle (Terrace Level) in Mobile Health Sponsor: IMS Organizer: Susan Murphy, University of Michigan Chair: Elizabeth Sweeney, Johns Hopkins University 1:45 Supporting Health Management in Everyday Life with Mobile Technology Predrag Klasnja*, Susan A. Murphy and Ambuj Tewari, University of Michigan 2:10 Measuring Stress and Addictive Behaviors from Mobile Physiological Sensors Santosh Kumar*, University of Memphis Emre Ertin, The Ohio State University Mustafa al’Absi, University of Minnesota David Epstein and Preston, National Institute on Drug Abuse, National Institutes of Health Annie Umbricht, Johns Hopkins University 2:35 Not Everybody, but Some People Move Like You Ciprian M. Crainiceanu*, Johns Hopkins Bloomberg School of Public Health 3:00 Micro-Randomized Trials and mHealth Peng Liao, Pedja Klasnja, Ambuj Tewari and Susan Murphy*, University of Michigan

3:25 Floor Discussion

42. CONTRIBUTED PAPERS: Pearson I (3rd Floor) Survey Research Sponsor: ENAR Chair: Stacey E Alexeeff, National Center for Atmospheric Research 1:45 Ordinal Bayesian Instrument Development: New Kid on the Patient Reported Outcome Measures Block Lili Garrard*, University of Kansas Medical Center Larry R. Price, Texas State University Marjorie J. Bott, University of Kansas Byron J. Gajewski, University of Kansas Medical Center 2:00 Quantifying Parental History in Survey Data Rengyi Xu*, Sara B. DeMauro and Rui Feng, University of Pennsylvania 2:15 Bayesian Nonparametric Weighted Sampling Inference Yajuan Si*, University of Wisconsin, Madison Natesh S. Pillai, Harvard University Andrew Gelman, Columbia University

84 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 2:30 How to Best Compute Propensity Scores in Complex Samples in Relation to Survey Weights Keith W. Zirkle* and Adam P. Sima, Virginia Commonwealth University 2:45 Multiple Imputation of the Accelerometer Data in the National Health and Nutrition Examination Survey Benmei Liu*, Mandi Yu, Barry I. Graubard and Richard Troiano, National Cancer Institute, National Institutes of Health Nathaniel Schenker, National Center for Health Statistics, Centers for Disease Control and Prevention 3:00 Split Questionnaire Survey Design in the Longitudinal Setting Paul M. Imbriano* and Trivellore E. Raghunathan, University of Michigan

3:15 Floor Discussion

43. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Graphical Models Sponsor: ENAR Chair: Sheila Gaynor, Harvard University 1:45 Regression Analysis of Networked Data Yan Zhou n and Peter X.K. Song, University of Michigan 2:00 Integrative Analysis of Genetical Genomics Data Incorporating Network Structure Bin Gao* and Yuehua Cui, Michigan State University 2:15 Estimating a Graphical Intra-Class Correlation Coefficient (GICC) Using Multivariate Probit-Linear Mixed Models Chen Yue*, Shaojie Chen, Haris Sair, Raag Airan and Brian Caffo, Johns Hopkins University 2:30 Estimation of Directed Subnetworks in Ultra High Dimensional Data for Gene Network Problem Sung Won Han* and Hua (Judy) Zhong, New York University 2:45 Longitudinal Graphical Models: Optimal Estimation and Asymptotic Inference Quanquan Gu*, Yuan Cao, Yang Ning and Han Liu, Princeton University 3:00 Jointly Estimating Gaussian Graphical Models for Spatial and Temporal Data Zhixiang Lin* and Tao Wang, Yale University Can Yang, Hong Kong Baptist University Hongyu Zhao, Yale University

3:15 Floor Discussion

* = Presenter n = Student Award Winner Program & Abstracts 85 44. CONTRIBUTED PAPERS: Merrick II (3rd Floor) Joint Models for Longitudinal and Survival Data Sponsor: ENAR Chair: Kun Xu, University of Miami 1:45 Joint Modeling of Bivariate Longitudinal and Bivariate Survival Data in Spouse Pairs Jia-Yuh Chen* and Stewart J. Anderson, University of Pittsburgh 2:00 Joint Model of Bivariate Survival Times and Longitudinal Data Ke Liu* and Ying Zhang, University of Iowa 2:15 Dynamic Prediction of Acute Graft-versus-Host Disease with Time-Dependent Covariates Yumeng Li* and Thomas M. Braun, University of Michigan 2:30 The Joint Modelling of Recurrent Events and Other Failure Time Events Luojun Wang* and Vernon M. Chinchilli, The Pennsylvania State University 2:45 A Bayesian Approach for Joint Modeling of Longitudinal Menstrual Cycle Length and Fecundity Kirsten J. Lum*, Johns Hopkins University and Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Rajeshwari Sundaram and Germaine M. Buck Louis, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Thomas A. Louis, Johns Hopkins University and U.S. Census Bureau 3:00 Joint Analysis of Multiple Longitudinal Processes and Survival Data Measured on Nested Time-Scales Using Shared Parameter Models: An Application to Fecundity Data Rajeshwari Sundaram*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Somak Chatterjee, George Washington University

3:15 Floor Discussion

86 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 45. CONTRIBUTED PAPERS: Gautier (3rd Floor) Functional Data Analysis Sponsor: ENAR Chair: Ana W. Capuano, Rush University Medical Center 1:45 Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis Jeff Goldsmith*, Columbia University Vadim Zipunnikov and Jennifer Schrack, Johns Hopkins University 2:00 Inference on Fixed Effects in Complex Functional Mixed Models So Young Park* and Ana-Maria Staicu, North Carolina State University Luo Xiao and Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health 2:15 Generalized Function-on-Function Regression Janet S. Kim*, Ana-Maria Staicu and Arnab Maity, North Carolina State University 2:30 Variable Selection in Function-on-Scalar Regression Yakuan Chen*, Todd Ogden and Jeff Goldsmith, Columbia University 2:45 Bayesian Adaptive Functional Models with Applications to Copy Number Data Bruce D. Bugbee*, Veera Baladandayuthapani and Jeffrey S. Morris, University of Texas MD Anderson Cancer Center 3:00 Functional Bilinear Regression with Matrix Covariates via Reproducing Kernel Hilbert Space with Applications in Neuroimaging Data Analysis Dong Wang, University of North Carolina, Chapel Hill Dan Yang*, Rutgers University Haipeng Shen and Hongtu Zhu, University of North Carolina, Chapel Hill 3:15 Simultaneous Confidence Bands for Derivatives of Dependent Functional Data Guanqun Cao*, Auburn University

* = Presenter n = Student Award Winner Program & Abstracts 87 46. CONTRIBUTED PAPERS: Ibis (3rd Floor) Methods in Causal Inference: Instrumental Variable, Propensity Scores and Matching Sponsor: ENAR Chair: Ozgur Asar, Lancaster University 1:45 Methods to Overcome Violations of an Instrumental Variable Assumption: Converting a Confounder into an Instrument Michelle Shardell*, National Institute on Aging, National Institutes of Health 2:00 Assessing Treatment Effect of Thiopurines on Crohn’s Disease from a UK Population-Based Study Using Propensity Score Matching Laura H. Gunn*, Stetson University Sukhdev Chatu, St. George’s University Hospital London Sonia Saxena and Azeem Majeed, Imperial College London Richard Pollok, St. George’s University Hospital London 2:15 Semiparametric Causal Inference in Matched Cohort Studies Edward H. Kennedy n and Dylan S. Small, University of Pennsylvania 2:30 Revisiting the Comparison of Covariate Adjusted Logistic Regression versus Propensity Score Methods with Few Events per Covariate Fang Xia*, Phillip J. Schulte and Laine Thomas, Duke University School of Medicine 2:45 Bayesian Latent Propensity Score Approach for Average Causal Effect Estimation Allowing Covariate Measurement Error Elande Baro*, Yi Huang and Anindya Roy, University of Maryland Baltimore County 3:00 Comparative Performance of Multivariate Matching Methods that Select a Subset of Observations Maria de los Angeles Resa* and Jose R. Zubizarreta, Columbia University 3:15 Improving Treatment Effect Estimation in the Presence of Treatment Delay through Triplet Matching Erinn M. Hade* and Bo Lu, The Ohio State University Hong Zhu, University of Texas Southwestern Medical Center

88 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 47. CONTRIBUTED PAPERS: Stanford (3rd Floor) Covariates Measured with Error Sponsor: ENAR Chair: Xiaoye Ma, University of Minnesota 1:45 Locally Efficient Semiparametric Estimators for Proportional Hazards Models with Measurement Error Yuhang Xu* and Yehua Li, Iowa State University Xiao Song, University of Georgia 2:00 Separating Variability in Practice Patterns from Statistical Error: An Opportunity for Quality Improvement Laine Thomas* and Phillip J. Schulte, Duke University 2:15 Goodness-of-Fit Testing of Error Distribution in Linear Errors-in-Variables Model Xiaoqing Zhu*, Michigan State University 2:30 Estimating Recurrence and Incidence of Preterm Birth in Consecutive Pregnancies Subject to Measurement Error in Gestation: A Novel Application of Hidden Markov Models Paul S. Albert*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 2:45 Multi-State Model with Missing Continuous Covariate Wenjie Lou*, Richard J. Kryscio and Erin Abner, University of Kentucky 3:00 Weighted l1-Penalized Corrected Quantile Regression for High Dimensional Measurement Error Models Abhishek Kaul* and Hira L. Koul, Michigan State University

3:15 Floor Discussion

48. INVITED AND CONTRIBUTED ORAL POSTERS: Jasmine (Terrace Level) Clinical Trials Sponsor: ENAR Chair: Renée Moore, North Carolina State University 48a. INVITED POSTER: Split-Sample Based and Multiple Imputation Estimation and Computation Methods for Meta-Analysis of Clinical Trial Data and Otherwise Hierarchical Data Geert Molenberghs*, Universiteit Hasselt Geert Verbeke, Katholieke Universiteit Leuven Michael G. Kenward, London School of Hygiene and Tropical Medicine Wim Van der Elst and Lisa Hermans, Universiteit Hasselt Vahid Nassiri, Katholieke Universiteit Leuven 48b. INVITED POSTER: Over-Parameterization in Adaptive Dose-Finding Studies John O’Quigley, Universite Pierre et Marie Curie Nolan A. Wages and Mark R. Conaway, University of Virginia Ken Cheung, Columbia University Ying Yuan, University of Texas MD Anderson Cancer Center Alexia Iasonos*, Memorial Sloan Kettering Cancer Center 48c. Improving Some Clinical Trials Inference by Using Ranked Auxiliary Covariate Hani Samawi*, Rajai Jabrah, Robert Vogel and Daniel Linder, Georgia Southern University 48d. Direct Estimation of the Mean Outcome on Treatment when Treatment Assignment and Discontinuation Compete Xin Lu*, Emory University Brent A. Johnson, University of Rochester 48e. Bayesian Interim Analysis Methods for Phase Ib Expansion Trials Enable Earlier Go/No-Go Decisions in Oncology Drug Development James Lymp*, Jane Fridlyand and Hsin-Ju Hsieh, Genentech Daniel Sabanes Bove and Somnath Sarkar, F. Hoffmann-La Roche 48f. Unified Additional Requirement in Consideration of Regional Approval for Multi-Regional Clinical Trials Zhaoyang Teng*, Boston University Yeh-Fong Chen, The George Washington University Mark Chang, AMAG Pharmaceuticals and Boston University 48g. Efficiencies of Bayesian Adaptive Platform Clinical Trials Ben Saville* and Scott Berry, Berry Consultants 48h. A Bayesian Semiparametric Model for Interval Censored Data with Monotone Splines Bin Zhang, Cincinnati Children’s Hospital Medical Center Yue Zhang*, University of Cincinnati

90 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 48i. Comprehensive Evaluation of Adaptive Designs for Phase I Oncology Clinical Trials Sheau-Chiann Chen*, Vanderbilt University Yunchan Chi, National Cheng Kung University Yu Shyr, Vanderbilt University 48j. Statistical Inference for Composite Outcomes Based on Prioritized Components Ionut Bebu* and John M. Lachin, The George Washington University 48k. The Impact of Covariate Misclassification Using Generalized Linear Regression Under Covariate-Adaptive Randomization Liqiong Fan* and Sharon D. Yeatts, Medical University of South Carolina 48l. Non-Inferiority Test Based on Transformations Santu Ghosh*, Wayne State University Arpita Chatterjee, Georgia Southern University Samiran Ghosh, Wayne State University 48m. Methods Accounting for Mortality and Missing Data in Randomized Trials with Longitudinal Outcomes Elizabeth A. Colantuoni*, Johns Hopkins Bloomberg School of Public Health Chenguang Wang, Johns Hopkins School of Medicine Daniel O. Scharfstein, Johns Hopkins Bloomberg School of Public Health 48n. A Semiparametric Bayesian Approach Using Historical Control Data for Assessing Non-Inferiority in Three Arm Trials Arpita Chatterjee*, Georgia Southern University Santu Ghosh and Samiran Ghosh, Wayne State University 48o. Design Parameters and Effect of the Delayed-Start Design in Alzheimer’s Disease Guoqiao Wang* and Richard E. Kennedy, University of Alabama, Birmingham Lon S. Schneider, University of Southern California Gary R. Cutter, University of Alabama, Birmingham

MONDAY, MARCH 16

Lower Promenade 3:30 – 3:45 pm — Refreshment Break with Our Exhibitors (Terrace Level)

3:45 – 5:30 pm 49. CENS Invited Session — Careers in Statistics: Ashe Auditorium (3rd Floor) Skills for Success Sponsor: ENAR Organizer: Vivian Shih, AstraZeneca Chair: Michael McIsaac, Queen’s University 3:45 How to be Successful in Oral and Written Communications as a Biostatistician Peter Grant Mesenbrink*, Novartis Pharmaceuticals Corporation 4:15 Navigating the Academic Jungle Without Going Bananas Amy H. Herring*, University of North Carolina, Chapel Hill 4:45 What am I Going to be When I Grow Up? Evolving as a Statistician Nancy L. Geller*, National Heart, Lung and Blood Institute, National Institutes of Health

5:15 Floor Discussion

50. Analysis Methods for Data Obtained from Tuttle (Terrace Level) Electronic Health Records Sponsors: ENAR, ASA Biometrics Section, ASA Section on Statistics in Epidemiology Organizer: Sebastian Haneuse, Harvard University Chair: Sebastian Haneuse, Harvard University 3:45 Improving the Power of Genetic Association Tests with Imperfect Phenotype Derived from Electronic Medical Records Jennifer A. Sinnott* and Wei Dai, Harvard School of Public Health Katherine P. Liao and Elizabeth W. Karlson, Brigham and Women’s Hospital Isaac Kohane, Harvard Medical School Robert Plenge, Merck Research Laboratories Tianxi Cai, Harvard School of Public Health

4:15 Nonparametric Estimation of Patient Prognosis with Application to Electronic Health Records Patrick J. Heagerty* and Alison E. Kosel, University of Washington 4:45 Mining EHR Narratives for Clinical Research Eneida Mendonca*, University of Wisconsin, Madison

5:15 Floor Discussion

51. Statistical Challenges of Survey and Surveillance Foster (3rd Floor) Data in US Government Sponsors: ENAR, ASA Section on Statistics in Defense and National Security, ASA Survey Research and Methodology Section Organizer: Simone Gray, Centers for Disease Control and Prevention Chair: Simone Gray, Centers for Disease Control and Prevention 3:45 Using Venue-Based Sampling to Recruit Hard-to-Reach Populations Maria Corazon B. Mendoza*, Chris Johnson, Brooke Hoots and Teresa Finlayson, Centers for Disease Control and Prevention 4:10 Development of Guidelines for the Presentation of Data from the National Health and Nutrition Examination Survey Margaret Devers Carroll*, National Health and Nutrition Examination Survey, Centers for Disease Control and Prevention 4:35 Data Swapping Methods for Statistical Disclosure Limitation Guangyu Zhang*, Joe Fred Gonzalez, Anna Oganyan and Alena Maze, National Center for Health Statistics, Centers for Disease Control and Prevention 5:00 Practical Approaches to Design and Inference Through the Integration of Complex Survey Data and Non-Survey Information Sources John L. Eltinge*, U.S. Bureau of Labor Statistics Rachel M. Harter, RTI International

5:25 Floor Discussion

* = Presenter n = Student Award Winner Program & Abstracts 93 52. Reconstructing the Genomic Landscape Johnson (3rd Floor) from High-Throughput Data Sponsors: ENAR, ASA Biometric Section Organizers: Adam Olshen, University of California, San Francisco and Ronglai Shen, Memorial Sloan Kettering Cancer Center Chair: Adam Olshen, University of California, San Francisco 3:45 Copy Numbers in Circulating Tumor Cells (CTCs) Using DNA-Seq Henrik Bengtsson*, University of California, San Francisco 4:10 DNA Copy Number Analyses for Family Based Designs Ingo Ruczinski*, Johns Hopkins University 4:35 Reconstructing 3-D Genome Configurations: How and Why Mark Robert Segal*, University of California, San Francisco 5:00 A Latent Variable Approach for Integrative Clustering of Multiple Genomic Data Types Ronglai Shen*, Memorial Sloan-Kettering Cancer Center

5:25 Floor Discussion

53. Statistical Methods for Single Molecule Miami Lecture Hall (3rd Floor) Experiments Sponsors: ENAR, ASA Biometric Section Organizer: Ying Hung, Rutgers University Chair: Ying Hung, Rutgers University 3:45 Walking, Sliding, and Detaching: Time Series Analysis for Cellular Transport in Axons John Fricks*, Jason Bernstein and William Hancock, The Pennsylvania State University 4:10 Analyzing Single-Molecule Protein-Targeting Experiments via Hierarchical Models Samuel Kou* and Yang Chen, Harvard University 4:35 Bimolecular Reaction, Data Types, and an Alternative Model to the Smoluchowski Theory Hong Qian*, University of Washington 5:00 Hidden Markov Models with Applications in Cell Adhesion Experiments Jeff C. F. Wu, Georgia Institute of Technology Ying Hung*, Rutgers University

5:25 Floor Discussion

94 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 54. Subgroup Analysis and Adaptive Trials Brickell (Terrace Level) Sponsor: IMS Organizer: Donatello Telesca, University of California, Los Angeles Chair: Donatello Telesca, University of California, Los Angeles 3:45 A Bayes Rule for Subgroup Reporting — Bayesian Adaptive Enrichment Designs Peter Mueller*, University of Texas, Austin 4:15 Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials Yanxun Xu, University of Texas, Austin Lorenzo Trippa, Harvard University Peter Mueller, University of Texas, Austin Yuan Ji*, NorthShore University HealthSystem and University of Chicago 4:45 Detection of Cancer Subgroup Associated Alternative Splicing Jianhua Hu*, University of Texas MD Anderson Cancer Center Xuming He, University of Michigan

5:15 Floor Discussion

55. CONTRIBUTED PAPERS: Pearson I (3rd Floor) Methods to Assess Agreement Sponsor: ENAR Chair: Yansong Cheng, GlaxoSmithKline 3:45 Kappa Statistics for Correlated Matched-Pair Categorical Data Zhao Yang*, University of Tennessee Health Science Center Ming Zhou, Bristol-Myers Squibb 4:00 Sample Size Methods for Constructing Confidence Intervals for the Intra-Class Correlation Coefficient Kevin K. Dobbin* and Alexei C. Ionan, University of Georgia 4:15 Statistical Methods for Assessing Reproducibility in Multicenter Neuroimaging Studies Tian Dai* and Ying Guo, Emory University 4:30 Nonparametric Regression of Agreement Measure Between Ordinal and Continuous Outcomes AKM F. Rahman*, Limin Peng, Ying Guo and Amita Manatunga, Emory University

4:45 Inter-Observer Agreement for a Mixture of Data Types Shasha Bai*, University of Arkansas for Medical Sciences Marcelo A. Lopetegui, The Ohio State University 5:00 Assessing Reproducibility of Discrete and Truncated Rank Lists in High-Throughput Studies Qunhua Li*, The Pennsylvania State University 5:15 Exponentiated Lindley Poisson Distribution Mavis Pararai* and Gayan Liyanage, Indiana University of Pennsylvania Broderick Oluyede, Georgia Southern University

56. CONTRIBUTED PAPERS: Stanford (3rd Floor) Methylation and RNA Data Analysis Sponsor: ENAR Chair: Babette A Brumback, University of Florida 3:45 Identify Differential Alternative Splicing Events from Paired RNA-Seq Data Cheng Jia* and Mingyao Li, University of Pennsylvania 4:00 Functional Normalization of 450k Methylation Array Data Improves Replication in Large Cancer Studies Jean-Philippe Fortin n, Johns Hopkins Bloomberg School of Public Health Aurelie Labbe, McGill University Mathieu Lemire, Ontario Institute of Cancer Research Brent W. Zanke, Ottawa Hospital Research Institute Thomas J. Hudson, Ontario Institute of Cancer Research Elana J. Fertig, Johns Hopkins School of Medicine Celia M. T. Greenwood, Jewish General Hospital Montreal Kasper D. Hansen, Johns Hopkins Bloomberg School of Public Health 4:15 Detecting Differentially Methylated Regions (DMRs) by Mixed-Effect Logistic Model Fengjiao Hu* and Hongyan Xu, Georgia Regents University 4:30 Penalized Modeling for Variable Selection and Association Study of High-Dimensional MicroRNA Data with Repeated Measures Zhe Fei*, University of Michigan Yinan Zheng, Northwestern University Wei Zhang, University of Illinois, Chicago Justin B. Starren and Lei Liu, Northwestern University Andrea A. Baccarelli, Harvard School of Public Health Yi Li, University of Michigan Lifang Hou, Northwestern University

96 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 4:45 Comparison of Paired Tumor-Normal Methods for Differential Expression Analysis of RNA-Seq Data Janelle R. Noel*, Alice Wang, Rama Raghavan and Prabhakar Chalise, University of Kansas Medical Center Byunggil Yoo, Childrens Mercy Hospital Kansas City Sumedha Gunewardena, Kansas Intellectual and Developmental Disabilities Research Center Jeremy Chien and Brooke L. Fridley, University of Kansas Medical Center 5:00 Detecting Differential Alternative Splicing with Biological Replicates between Two Groups from RNA-Seq Data Yu Hu*, Cheng Jia, Dwight Stambolian and Mingyao Li, University of Pennsylvania 5:15 Functional Region-Based Test for DNA Methylation Kuan-Chieh Huang* and Yun Li, University of North Carolina, Chapel Hill

57. CONTRIBUTED PAPERS: Ibis (3rd Floor) New Developments in Imaging Sponsor: ENAR Chair: Sihai Dave Zhao, University of Illinois 3:45 Estimating Dynamics of Whole-Brain Functional Connectivity in Resting-State fMRI by Factor Stochastic Volatility Model Chee-Ming Ting*, Universiti Teknologi Malaysia, Malaysia Hernando Ombao, University of California, Irvine Sh-Hussain Salleh, Universiti Teknologi Malaysia, Malaysia 4:00 Kernel Smoothing GEE for Longitudinal fMRI Studies Yu Chen*, Min Zhang and Timothy D. Johnson, University of Michigan 4:15 A Hierarchical Bayesian Model for Studying the Impact of Stroke on Brain Motor Function Zhe Yu*, University of California, Irvine Raquel Prado, University of California, Santa Cruz Erin Burke Quinlan, Steven C. Cramer and Hernando Ombao, University of California, Irvine 4:30 Source Estimation for Multi-Trial Multi-Channel EEG Signals: A Statistical Approach Yuxiao Wang* and Hernando Ombao, University of California, Irvine Raquel Prado, University of California, Santa Cruz 4:45 An Exploratory Data Analysis of EEGs Time Series: A Functional Boxplots Approach Duy Ngo* and Hernando Ombao, University of California, Irvine Marc G. Genton and Ying Sun, King Abdullah University of Science and Technology

* = Presenter n = Student Award Winner Program & Abstracts 97 5:00 A Bayesian Functional Linear Cox Regression Model (BFLCRM) for Predicting Time to Conversion to Alzheimer’s Disease Eunjee Lee n, Hongtu Zhu and Dehan Kong, University of North Carolina, Chapel Hill Yalin Wang, Arizona State University Kelly Sullivan Giovanello and Joseph Ibrahim, University of North Carolina, Chapel Hill

5:15 Floor Discussion

58. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Latent Variable and Principal Component Models Sponsor: ENAR Chair: Jesse Y Hsu, University of Pennsylvania 3:45 A Latent Variable Model for Analyzing Correlated Ordered Categorical Data Ali Reza Fotouhi*, University of The Fraser Valley 4:00 Estimation of Branching Curves in the Presence of Subject Specific Random Effects Angelo Elmi*, The George Washington University Sarah J. Ratcliffe and Wensheng Guo, University of Pennsylvania 4:15 Composite Large Margin Classifiers with Latent Subclasses for Heterogeneous Biomedical Data Guanhua Chen n, Vanderbilt University Yufeng Liu and Michael R. Kosorok, University of North Carolina, Chapel Hill 4:30 Evaluation of Covariate-Specific Accuracy of Biomarkers without a Gold Standard Zheyu Wang*, Johns Hopkins University Xiao-Hua Zhou, University of Washington 4:45 Linear Mixed Model with Unobserved Informative Cluster Size: Application to a Repeated Pregnancy Study Ashok K. Chaurasia*, Danping Liu and Paul S. Albert, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 5:00 A Semiparametric Model of Estimating Non-Constant Factor Loadings Zhenzhen Zhang* and Brisa Sanchez, University of Michigan 5:15 Nested Partially-Latent Class Models (npLCM) for Estimating Disease Etiology in Case-Control Studies Zhenke Wu* and Scott L. Zeger, Johns Hopkins University

98 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 59. CONTRIBUTED PAPERS: Gautier (3rd Floor) Developments and Applications of Clustering, Classification, and Dimension Reduction Methods Sponsor: ENAR Chair: Taraneh Abarin, Memorial University 3:45 Separable Spatio-Temporal Principal Component Analysis Lei Huang n, Johns Hopkins University Philip T. Reiss, New York University School of Medicine Luo Xiao, Vadim Zipunnikov, Martin A. Lindquist and Ciprian Crainiceanu, Johns Hopkins University 4:00 Penalized Clustering Using a Hidden Markov Random Field Model: Detecting State-Related Changes in Brain Connectivity Yuting Xu* and Martin Lindquist, Johns Hopkins University 4:15 Clustering of Brain Signals Using the Total Variation Distance Carolina Euán*, Centro de Investigación en Matemáticas (CIMAT), A.C. Hernando Ombao, University of California, Irvine Joaquin Ortega, Centro de Investigación en Matemáticas (CIMAT), A.C. Pedro Alvarez-Esteban, Universidad de Valladolid, Spain 4:30 Impact of Data Reduction on Accelerometer Data in Children Daniela Sotres-Alvarez* and Yu Deng, University of North Carolina, Chapel Hill Guadalupe X. Ayala, San Diego State University Mercedes Carnethon, Northwestern University Alan M. Delamater, University of Miami Carmen R. Isasi, Albert Einstein College of Medicine Sonia Davis and Kelly R. Evenson, University of North Carolina, Chapel Hill 4:45 Learning Logic Rules for Disease Classification: With an Application to Developing Criteria Sets for the Diagnostic and Statistical Manual of Mental Disorders Christine M. Mauro n, Columbia University Donglin Zeng, University of North Carolina, Chapel Hill M. Katherine Shear and Yuanjia Wang, Columbia University 5:00 Characterizing Types of Physical Activity: An Unsupervised Way Jiawei Bai*, Luo Xiao, Vadim Zipunnikov and Ciprian M. Crainiceanu, Johns Hopkins University 5:15 Simultaneous Model-Based Clustering and Variable Selection: Extension to Mixed-Distribution Data Katie Evans, Dupont Tanzy M.T. Love* and Sally W. Thurston, University of Rochester

* = Presenter n = Student Award Winner Program & Abstracts 99 60. CONTRIBUTED PAPERS: Merrick II (3rd Floor) Survival Analysis: Methods Development and Applications Sponsor: ENAR Chair: Jo Wick, University of Kansas Medical Center 3:45 Predictive Model and Dynamic Prediction for Recurrent Events with Dependent Termination Li-An Lin*, Sheng Luo and Barry Davis, University of Texas Health Sciences Center at Houston 4:00 An Extended Self-Triggering Model for Recurrent Event Data Jung In Kim*, Feng-Chang Lin and Jason Fine, University of North Carolina, Chapel Hill 4:15 A Pairwise-Likelihood Augmented Estimator for the Cox Model Under Left-Truncation Fan Wu* and Sehee Kim, University of Michigan Jing Qin, National Institute of Allergy and Infectious Diseases, National Institutes of Health Yi Li, University of Michigan 4:30 Rank-Based Testing Based on Cross-Sectional Survival Data with or without Prospective Follow-Up Kwun Chuen Gary Chan*, University of Washington Jing Qin, National Institute of Allergy and Infectious Diseases, National Institutes of Health 4:45 Computation Efficient Models for Fitting Large-Scale Survival Data Kevin He*, Yanming Li, Ji Zhu and Yi Li, University of Michigan 5:00 Multiple Imputation for Interval Censored Data with Time-Dependent Auxiliary Variables Using Incident and Prevalent Cohort Data Wen Ye* and Douglas Schaubel, University of Michigan 5:15 Model Flexibility for Regression Analysis of Survival Data with Informative Interval Censoring Tyler Cook* and Jianguo Sun, University of Missouri, Columbia

100 ENAR 2015 | Spring Meeting | March 15–18 * = Presenter n = Student Award Winner 61. INVITED AND CONTRIBUTED ORAL POSTERS: Jasmine (Terrace Level) GWAS and Meta Analysis of Genetic Studies Sponsor: ENAR Chair: Mary Sammel, University of Pennsylvania 61a. INVITED POSTER: Hypothesis Testing for Sparse Signals in Genetic Association Studies Xihong Lin*, Harvard University 61b. INVITED POSTER: Meta-Analysis of Gene-Environment Interaction in Case-Control Studies by Adaptively Using Gene-Environment Correlation Bhramar Mukherjee*, Shi Li, John D. Rice, Jeremy MG Taylor, Heather Stringham and Michael L. Boehnke, University of Michigan 61c. Partial Linear Varying Index Coefficient Model for Gene-Environment Interactions Xu Liu* and Yuehua Cui, Michigan State University 61d. Tree-Based Model Averaging Approaches for Modeling Rare Variant Association in Case-Control Studies Brandon J. Coombes* and Saonli Basu, University of Minnesota Sharmistha Guha, Fair Isaac Corporation Nicholas Schork, J. Craig Venter Institute 61e. A Functional Approach to Association Testing of Multiple Phenotypes in Sequencing Studies Sneha Jadhav* and Qing Lu, Michigan State University 61f. Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling Ran Tao*, Donglin Zeng, Nora Franceschini and Kari E. North, University of North Carolina, Chapel Hill Eric Boerwinkle, University of Texas Health Science Center Dan-Yu Lin, University of North Carolina, Chapel Hill 61g. Meta-Analysis of Complex Diseases at Gene Level by Generalized Functional Linear Models Ruzong Fan* and Yifan Wang, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Haobo Ren, Regeneron Pharmaceuticals, Inc. Yun Li, University of North Carolina, Chapel Hill Christopher Amos, Dartmouth Medical School Wei Chen, University of Pittsburgh Momiao Xiong, University of Texas, Houston Jason Moore, Dartmouth Medical School

61h. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models Yifan Wang* and Ruzong Fan, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Michael Boehnke, University of Michigan Wei Chen, University of Pittsburgh Yun Li, University of North Carolina, Chapel Hill Momiao Xiong, University of Texas, Houston 61i. A New Estimating Equation Approach for Secondary Trait Analyses in Genetic Case-Control Studies Xiaoyu Song*, Iuliana Ionita-Laza and Ying Wei, Columbia University 61j. Novel Statistical Model for GWAS Meta-Analysis and Its Application to Trans-Ethnic Meta-Analysis Jingchunzi Shi* and Seunggeun Lee, University of Michigan 61k. Multiple Phenotype Association Testing Based on Summary Statistics in Genome-Wide Association Studies Zhonghua Liu* and Xihong Lin, Harvard School of Public Health 61l. A New Approach for Detecting Gene-by-Gene Interactions Through Meta-Analyses Yulun Liu*, University of Texas, Health Science Center at Houston Paul Scheet, University of Texas MD Anderson Cancer Center Yong Chen, University of Texas, Health Science Center at Houston 61m. Genome-Wide Association Studies for Functional Valued Traits Han Hao* and Rongling Wu, The Pennsylvania State University 61n. Kernel-Based Testing for Nonlinear Effect of a SNP-Set under Multiple Candidate Kernels Tao He*, Ping-Shou Zhong and Yuehua Cui, Michigan State University 61o. A General Framework of Gene-Based Association Tests for Correlated Case-Control Samples Han Chen*, Chaolong Wang and Xihong Lin, Harvard School of Public Health 61p. Algorithm to Compute the Identity Coefficients at a Particular Locus Given the Marker Information J Concepcion Loredo-Osti* and Haiyan Yang, Memorial University 61q. Estimating the Empirical Null Distribution of Maxmean Statistics in Gene Set Analysis Xing Ren* and Jeffrey Miecznikowski, University at Buffalo, SUNY Song Liu and Jianmin Wang, Roswell Park Cancer Institute 61r. USAT: A Unified Score-Based Association Test for Multiple Phenotype-Genotype Analysis Debashree Ray* and Saonli Basu, University of Minnesota

TUESDAY, MARCH 17

8:30 – 10:15 am 62. Statistical Inference with Random Forests and Hibiscus B (Terrace Level) Related Ensemble Methods Sponsor: ENAR Organizer: Giles Hooker, Cornell University Chair: Giles Hooker, Cornell University 8:30 Consistency of Random Forests Gerard Biau*, Erwan Scornet and Jean-Philippe Vert, Pierre and Marie Curie University 8:55 Asymptotic Theory for Random Forests Stefan Wager*, Stanford University 9:20 Detecting Feature Interactions in Bagged Trees and Random Forests Lucas K. Mentch* and Giles Hooker, Cornell University 9:45 Variable Selection with Bayesian Additive Regression Trees Shane T. Jensen*, Justin Bleich, Adam Kapelner and Edward I. George, University of Pennsylvania

10:10 Floor Discussion

63. Mediation and Interaction: Ashe Auditorium (3rd Floor) Theory, Practice and Future Directions Sponsors: ENAR, ASA Biometrics Section, ASA Section on Statistics in Epidemiology Organizers: Brisa Sanchez, University of Michigan and Melody Goodman, Washington University in St. Louis Chair: Brisa Sanchez, University of Michigan 8:30 A Unification of Mediation and Interaction: A Four-Way Decomposition Tyler J. VanderWeele*, Harvard University 9:00 Partial Identification of the Pure Direct Effect Under Exposure-Induced Confounding Caleb Miles* and Eric Tchetgen Tchetgen*, Harvard University 9:30 Integrative Analysis of Complex Genetic, Genomic and Environmental Data Using Mediation Analysis Xihong Lin*, Harvard University 10:00 Discussant: Bhramar Mukherjee, University of Michigan


64. Motivation and Analysis Strategies for Joint Orchid C (Terrace Level) Modeling of High Dimensional Data in Genetic Association Studies Sponsors: ENAR, ASA Biometrics Section Organizer: Saonli Basu, University of Minnesota Chair: Weihua Guan, University of Minnesota 8:30 Region-Based Test for Gene-Environment Interactions in Longitudinal Studies Zihuai He, Min Zhang*, Seunggeun Lee and Jennifer Smith, University of Michigan Xiuqing Guo, Harbor-UCLA Medical Center Walter Palmas, Columbia University Sharon L.R. Kardia, Ana V. Diez Roux and Bhramar Mukherjee, University of Michigan 8:55 Strategies to Improve the Power of Pathway Analysis in Genetic Association Studies Kai Yu*, Han Zhang, Jianxin Shi and Nilanjan Chatterjee, National Cancer Institute, National Institutes of Health 9:20 A Unified Test for Population-Based Multiple Correlated Phenotype- Genotype Association Analysis Saonli Basu* and Debashree Ray, University of Minnesota 9:45 Modelling Multiple Correlated Genetic Variants Sharon R. Browning*, University of Washington

10:10 Floor Discussion

65. Recent Developments on Inference for Johnson (3rd Floor) Possibly Time-Dependent Treatment Effects with Survival Data Sponsors: ENAR, ASA Biometrics Section Organizer: Song Yang, National Heart, Lung and Blood Institute, National Institutes of Health Chair: Song Yang, National Heart, Lung and Blood Institute, National Institutes of Health 8:30 Threshold Regression for Lifetime Data Mei-Ling Ting Lee*, University of Maryland, College Park George A. Whitmore, McGill University, Canada 8:55 Hypothesis Testing for an Extended Cox Model with Time-Varying Coefficients Ying Q. Chen*, Fred Hutchinson Cancer Research Center 9:20 Time-Dependent Cut Point Selection for Biomarkers in Censored Survival Data Zhezhen Jin*, Columbia University 9:45 Inference on the Summary Measures of Treatment Effect with Survival Data When There is Possibly Treatment by Time Interaction Song Yang*, National Heart, Lung and Blood Institute, National Institutes of Health

10:10 Floor Discussion

66. Journal of Agricultural, Biological and Foster (3rd Floor) Environmental Statistics (JABES) Highlights Sponsors: ENAR, JABES Organizer: Montserrat Fuentes, North Carolina State University Chair: Murali Haran, The Pennsylvania State University 8:30 Limited-Information Modeling of Loggerhead Turtle Population Size John M. Grego* and David B. Hitchcock, University of South Carolina 8:55 Nonlinear Varying-Coefficient Models with Applications to a Photosynthesis Study Damla Senturk*, University of California, Los Angeles Esra Kurum, Medeniyet University Runze Li, The Pennsylvania State University Yang Wang, China Vanke 9:20 Multilevel Latent Gaussian Process Model for Mixed Discrete and Continuous Multivariate Response Data Erin M. Schliep*, Duke University Jennifer A. Hoeting, Colorado State University 9:45 Analysis of Variance of Integro-Differential Equations with Application to Population Dynamics of Cotton Aphids Jianhua Huang*, Texas A&M University

10:10 Floor Discussion

67. Estimation and Inference for High Dimensional Miami Lecture Hall and Data Adaptive Problems (3rd Floor) Sponsor: IMS Organizer: Noah Simon, University of Washington Chair: Michael Wu, Fred Hutchinson Cancer Research Center 8:30 False Discovery Rate Control for Spatial Data Alexandra Chouldechova*, Carnegie Mellon University 8:55 Conditional or Fixed? Different Philosophies in Adaptive Inference Max Grazier-G’sell* and Ryan Tibshirani, Carnegie Mellon University 9:20 Inference for Regression Quantiles After Model Selection Jelena Bradic*, University of California, San Diego Mladen Kolar, University of Chicago 9:45 A Flexible Framework for Sparse Additive Modeling Noah Simon*, University of Washington

10:10 Floor Discussion

68. CONTRIBUTED PAPERS: Merrick I (3rd Floor) Novel Methods for Bioassay Data Sponsor: ENAR Chair: Wen Yu, University of Michigan 8:30 drLumi: Tools for the Analysis of the Multiplex Immunoassays in R Hector Sanz* and John Aponte, Universitat de Barcelona, Spain Jaroslaw Harezlak and Magdalena Murawska, Indiana University Fairbanks School of Public Health, Indianapolis Ruth Aguilar, Gemma Moncunill and Carlota Dobaño, Universitat de Barcelona, Spain Clarissa Valim, Harvard School of Public Health 8:45 A Bayesian Analysis of Bioassay Experiments Luis G. Leon-Novelo*, University of Louisiana at Lafayette Andrew Womack, Indiana University Hongxiao Zhu and Xiaowei Wu, Virginia Polytechnic Institute and State University 9:00 Compound Ranking Based on a New Mathematical Measure of Effectiveness Using Time Course Data from Cell-Based Assays Francisco J. Diaz*, University of Kansas Medical Center 9:15 Nonparametric Classification of Chemicals using Quantitative High Throughput Screening (qHTS) Assays Shuva Gupta*, National Institute of Environmental Health Sciences, National Institutes of Health Soumendra Lahiri, North Carolina State University Shyamal Peddada, National Institute of Environmental Health Sciences, National Institutes of Health 9:30 Robust Bayesian Methods for the Inverse Regression with an Application to Immunoassay Experiments Magdalena Murawska, Indiana University Fairbanks School of Public Health, Indianapolis Hector Sanz, Ruth Aguilar, Gemma Moncunill, Carlota Dobaño and John Aponte, Universitat de Barcelona, Spain Clarissa Valim, Harvard School of Public Health Jaroslaw Harezlak*, Indiana University Fairbanks School of Public Health, Indianapolis 9:45 Estimating the Prevalence of Multiple Diseases via Two-Stage Hierarchical Pooling Md S. Warasi* and Joshua M. Tebbs, University of South Carolina Christopher McMahan, Clemson University 10:00 A Ballooned Beta Regression Model and Its Application to Bioassay Data Min Yi* and Nancy Flournoy, University of Missouri, Columbia

69. CONTRIBUTED PAPERS: Pearson I (3rd Floor) Infectious Disease Sponsor: ENAR Chair: Jean-Philippe Fortin, Johns Hopkins Bloomberg School of Public Health 8:30 Viral Genetic Linkage Analysis in the Presence of Missing Data Shelley Han Liu* and Gabriel Erion, Harvard University Vladimir Novitsky and Victor DeGruttola, Harvard School of Public Health 8:45 A Bayesian Approach to Estimating Causal Vaccine Effects on Binary Post-Infection Outcomes Jincheng Zhou*, Minneapolis Medical Research Foundation, University of Minnesota Haitao Chu, University of Minnesota Michael G. Hudgens, University of North Carolina, Chapel Hill M. Elizabeth Halloran, Fred Hutchinson Cancer Research Center and University of Washington 9:00 Exploring Bayesian Latent Class Models as a Potential Statistical Tool to Estimate Sensitivity and Specificity in Presence of an Imperfect or No Gold Standard Jay Mandrekar*, Mayo Clinic 9:15 Modeling and Inference for Rotavirus Dynamics in Niger Joshua Goldstein*, Murali Haran and Matthew Ferrari, The Pennsylvania State University 9:30 Comparison of Group Testing Algorithms for Case Identification in the Presence of Dilution Effect Dewei Wang*, University of South Carolina Christopher S. McMahan and Colin M. Gallagher, Clemson University 9:45 Cholera Transmission in Ouest Region of Haiti: Dynamic Modeling and Prediction Alexander Kirpich*, Alex Weppelmann, Yang Yang and Ira Longini, University of Florida

10:00 Floor Discussion

70. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Variable Selection Sponsor: ENAR Chair: Angelo Elmi, The George Washington University 8:30 Weak Signal Identification and Inference in Penalized Model Selection Peibei Shi n and Annie Qu, University of Illinois, Urbana-Champaign 8:45 Feature Screening for Time-Varying Coefficient Models with Ultra-High Dimensional Longitudinal Data Wanghuan Chu*, Runze Li and Matthew Reimherr, The Pennsylvania State University 9:00 A Regularized Approach for Simultaneous Estimation and Model Selection for Single Index Models Longjie Cheng*, Purdue University Peng Zeng, Auburn University Yu Zhu, Purdue University 9:15 Multi-Step LASSO Haileab Hilafu*, University of Tennessee 9:30 Bayesian Hierarchical Variable Selection Incorporating Multi-Level Structural Information Changgee Chang*, Emory University Yize Zhao, Statistical and Applied Mathematical Sciences Institute Qi Long, Emory University 9:45 Model Selection for Protein Copy Numbers in Populations of Microorganism Burcin Simsek*, Hanna Salman and Satish Iyengar, University of Pittsburgh 10:00 Globally Adaptive Quantile Regression with Ultra-High Dimensional Data Qi Zheng* and Limin Peng, Emory University Xuming He, University of Michigan

71. CONTRIBUTED PAPERS: Gautier (3rd Floor) Modeling Health Data with Spatial or Temporal Features Sponsor: ENAR Chair: Guanhua Chen, Vanderbilt University 8:30 Modeling of Correlated Objects with Application to Detection of Metastatic Cancer Using Functional CT Imaging Yuan Wang*, Brian Hobbs, Jianhua Hu and Kim-Anh Do, University of Texas MD Anderson Cancer Center 8:45 A Spatially Varying Coefficient Model with Partially Unknown Proximity Matrix for the Detection of Glaucoma Progression Using Visual Field Data Joshua L. Warren*, Yale School of Public Health Jean-Claude Mwanza, University of North Carolina, Chapel Hill Angelo P. Tanna, Northwestern University Donald L. Budenz, University of North Carolina, Chapel Hill 9:00 Mapping and Measuring the Effect of Privatization on Alcohol and Violence: Does it Really Matter? Loni Philip Tabb* and Tony H. Grubesic, Drexel University 9:15 Modeling Adolescent Health Data Using a Binary Spatial-Temporal Generalized Method of Moments Approach Kimberly Kaufeld*, Statistical and Applied Mathematical Sciences Institute and North Carolina State University 9:30 A Piecewise Exponential Survival Model with Change Points for Evaluating the Temporal Association of World Trade Center Exposure with Incident Obstructive Airway Disease Charles B. Hall*, Albert Einstein College of Medicine Xiaoxue Liu, Rachel Zeig-Owens, Mayris P. Webber, Jessica Weakley and Theresa M. Schwartz, Montefiore Medical Center David J. Prezant, Fire Department of the City of New York 9:45 Distributed Lag Models: Examining Associations between the Built Environment and Health Jonggyu Baek*, Brisa N. Sanchez and Veronica J. Berrocal, University of Michigan Emma V. Sanchez-Vaznaugh, San Francisco State University 10:00 Cluster Detection Test in Spatial Scan Statistics: ADHD Application Ahmad Reza Soltani* and Suja Aboukhamseen, Kuwait University

72. CONTRIBUTED PAPERS: Merrick II (3rd Floor) Advances in Longitudinal Modeling Sponsor: ENAR Chair: Li-An Lin, University of Texas Health Science Center, Houston 8:30 Conditional Modeling of Longitudinal Data with Terminal Event Shengchun Kong*, Purdue University Bin Nan and Jack Kalbfleisch, University of Michigan 8:45 A Marginalized Multilevel Model for Bivariate Longitudinal Binary Data Gul Inan* and Ozlem Ilk Dag, Middle East Technical University, Turkey 9:00 Augmented Beta Rectangular Regression Models: A Bayesian Perspective Jue Wang* and Sheng Luo, University of Texas Health Science Center, Houston 9:15 Rank-Based Regression Models for Longitudinal Data Rui Chen, Tian Chen* and Xin Tu, University of Rochester 9:30 Markov Chains and Continuous Time Multi-State Markov Models Comparisons in Longitudinal Clinical Analysis Lijie Wan*, Richard J. Kryscio and Erin Abner, University of Kentucky 9:45 Applications of Multiple Outputation for the Analysis of Longitudinal Data Subject to Irregular Observation Eleanor M. Pullenayegum*, Hospital for Sick Children 10:00 A Hidden Markov Model Approach to Analyze Longitudinal Ternary Outcome Disease Stage Change Subject to Misclassification Julia Benoit*, University of Houston Wenyaw Chan, University of Texas Health Science Center School of Public Health

73. CONTRIBUTED PAPERS: Ibis (3rd Floor) Causal Inference: Average and Mediated Effects Sponsor: ENAR Chair: Jeff Goldsmith, Columbia University 8:30 Instrumental Variable Estimation of the Marginal Average Effect of Treatment on the Treated Lan Liu*, Baoluo Sun, James Robins and Eric Tchetgen Tchetgen, Harvard University 8:45 Within-Subject Designs for Causal Mediation Analysis Yenny Webb-Vargas*, Martin A. Lindquist and Elizabeth A. Stuart, Johns Hopkins Bloomberg School of Public Health Michael E. Sobel, Columbia University 9:00 Mediation Analysis of a Set of Correlated Predictors Using Weighted Quantile Sum Regression Method Bhanu Murthy Evani* and Robert A. Perera, Virginia Commonwealth University Chris Gennings, Icahn School of Medicine at Mount Sinai 9:15 Bayesian Semiparametric Latent Mediation Model Chanmin Kim*, Harvard University Michael J. Daniels, University of Texas, Austin Yisheng Li, University of Texas MD Anderson Cancer Center 9:30 Accounting for Uncertainty in Confounder Selection when Estimating Average Causal Effects in Generalized Linear Models Chi Wang*, University of Kentucky Corwin Matthew Zigler, Harvard School of Public Health Giovanni Parmigiani, Dana-Farber Cancer Institute and Harvard School of Public Health Francesca Dominici, Harvard School of Public Health 9:45 Variable Selection for Estimating Average Causal Effects Douglas Galagate*, U.S. Census Bureau 10:00 Estimating Mediation Effects Under Correlated Errors with an Application to fMRI Yi Zhao n and Xi Luo, Brown University

74. CONTRIBUTED PAPERS: Stanford (3rd Floor) Variable Selection with High Dimensional Data Sponsor: ENAR Chair: Tanujit Dey, Cleveland Clinic 8:30 Empirical Likelihood Tests for Coefficients in High Dimensional Linear Models Honglang Wang*, Ping-Shou Zhong and Yuehua Cui, Michigan State University 8:45 TPRM: Tensor Partition Regression Models with Applications in Imaging Biomarker Detection Michelle F. Miranda*, Hongtu Zhu and Joseph G. Ibrahim, University of North Carolina, Chapel Hill 9:00 A Boosting-Based Variable Selection Method for Survival Prediction with Genome-Wide Gene Expression Data Yanming Li*, Kevin He, Yi Li and Ji Zhu, University of Michigan 9:15 Statistical Inference in High-Dimensional M-Estimation Hao Chai* and Shuangge Ma, Yale University 9:30 Augmented Weighted Support Vector Machines for Missing Covariates Thomas G. Stewart n, Michael C. Wu and Donglin Zeng, University of North Carolina, Chapel Hill 9:45 Variable Selection on Model Spaces Constrained by Heredity Conditions Andrew Womack, Indiana University, Bloomington Daniel Taylor-Rodriguez*, Statistical and Applied Mathematical Sciences Institute and Duke University Claudio Fuentes, Oregon State University

10:00 Floor Discussion

TUESDAY, MARCH 17

10:15 – 10:30 am — Refreshment Break with Our Exhibitors, Lower Promenade (Terrace Level)

10:30 am – 12:15 pm 75. Presidential Invited Address Regency Ballroom Sponsor: ENAR (Terrace Level) Organizer/Chair: José Pinheiro, Johnson & Johnson PRD 10:30 Introduction 10:35 Distinguished Student Paper Awards 10:45 Big Data, Big Opportunities, Big Challenges David L. DeMets, Ph.D., Max Halperin Professor of Biostatistics, University of Wisconsin, Madison

1:45 – 3:30 pm 76. Recent Advances in Dynamic Treatment Regimes Ashe Auditorium Sponsors: ENAR, ASA Biometrics Section (3rd Floor) Organizer: Yingqi Zhao, University of Wisconsin, Madison Chair: Yingqi Zhao, University of Wisconsin, Madison 1:45 The LIBERTI Trial for Discovering a Dynamic Treatment Regimen in Burn Scar Repair Jonathan Hibbard and Michael R. Kosorok*, University of North Carolina, Chapel Hill 2:10 From Idealized to Realized: Estimating Dynamic Treatment Regimens from Electronic Medical Records Erica EM Moodie* and David A. Stephens, McGill University 2:35 Adaptive Treatment and Robust Control Robin Henderson*, Newcastle University, UK 3:00 Methods to Increase Efficiency of Estimation When a Test Used to Decide Treatment Has No Direct Effect on the Outcome James M. Robins*, Harvard University

3:25 Floor Discussion

77. Predictive Models for Precision Medicine Miami Lecture Hall Sponsors: ENAR, ASA Biometrics Section, ASA Mental Health Statistics (3rd Floor) Section, ASA Statistical Programmers Section Organizers: Suchi Saria, Johns Hopkins University and Peter Mueller, University of Texas, Austin Chair: Peter Mueller, University of Texas, Austin 1:45 The Power of Electronic Medical Records as Data-Gathering Tools for the Creation of (a) Longitudinal Personalized Near-Real-Time Predictions of Adverse Outcomes and (b) Data-Driven Advice Systems for Medical Decision-Making David Draper*, University of California, Santa Cruz and eBay Research Labs 2:10 Assessing Illness Severity from Electronic Health Data Suchi Saria*, Johns Hopkins University 2:35 Toward Individualizing Health Care: Statistical Opportunities Yates Coley, Zhenke Wu and Scott L. Zeger*, Johns Hopkins University 3:00 Dancing with Black Swans: A Computational Perspective on Suicide Risk Detection Truyen Tran*, Deakin University and Curtin University, Australia Santu Rana, Wei Luo, Dinh Phung and Svetha , Deakin University, Australia Richard Harvey, Barwon Health, Australia

3:25 Floor Discussion

78. Electronic Health Records: Challenges Orchid C (Terrace Level) and Opportunities Sponsors: ENAR, ASA Biometrics Section, ASA Section on Statistics in Epidemiology Organizer: Paramita Saha Chaudhuri, Duke University Chair: Paramita Saha Chaudhuri, Duke University 1:45 Trials and Tribulations in Trials Using EHR Data Meredith Nahm Zozus*, Duke University 2:10 Statistical Methods for Dealing with Non-Random Observation of Laboratory Data in EHRs Jason A. Roy*, University of Pennsylvania

2:35 Extending Bayesian Networks to Estimate Conditional Survival Probability Using Electronic Health Data David M. Vock*, Julian Wolfson, Sunayan Bandyopadhyay, Gediminas Adomavicius and Paul E. Johnson, University of Minnesota Gabriela Vazquez-Benitez and Patrick J. O’Connor, HealthPartners Institute for Education and Research 3:00 Tracking and Predicting Disease from the Electronic Medical Record Joseph Edward Lucas*, Duke University

3:25 Floor Discussion

79. Cost-Effective Study Designs Tuttle (Terrace Level) for Observational Data Sponsor: ENAR Organizer: Patrick Heagerty, University of Washington Chair: Patrick Heagerty, University of Washington 1:45 Design and Analysis of Retrospective Studies for Longitudinal Outcome Data Jonathan S. Schildcrout* and Nathaniel D. Mercaldo, Vanderbilt University School of Medicine 2:15 On the Analysis of Hybrid Designs that Combine Group- and Individual-Level Data Sebastien Haneuse* and Elizabeth Smoot, Harvard School of Public Health 2:45 Test-Dependent Sampling Design and Semi-Parametric Inference for the ROC Curve Haibo Zhou*, University of North Carolina, Chapel Hill Beth Horton, University of Virginia 3:15 Discussant: Paul Rathouz, University of Wisconsin, Madison

80. Advanced Machine Learning Methods Johnson (3rd Floor) Sponsors: ENAR, ASA Statistical Learning and Data Mining Section Organizer: Peiyong (Annie) Qu, University of Illinois, Champaign-Urbana Chair: Peiyong (Annie) Qu, University of Illinois, Champaign-Urbana 1:45 A New Approach to Variable Selection via Algorithmic Regularization Paths Yue Hu, Rice University Genevera I. Allen*, Rice University and Baylor College of Medicine 2:10 Link Prediction for Partially Observed Networks Yunpeng Zhao, George Mason University Yun-Jhong Wu, Elizaveta Levina and Ji Zhu*, University of Michigan 2:35 Graphical Regression Hsin-Cheng Huang, Academia Sinica, Taiwan Xiaotong Shen* and Wei Pan, University of Minnesota 3:00 Penalized Maximum Likelihood Estimation on a Two-Layered Network George Michailidis*, University of Michigan

3:25 Floor Discussion

81. Statistical Analysis for Deep Sequencing Data in Foster (3rd Floor) Cancer Research: Methods and Applications Sponsor: ENAR Organizer: Li-Xuan Qin, Memorial Sloan Kettering Cancer Center Chair: Yen-Tsung Huang, Brown University 1:45 A Statistical Method for Detecting Differentially Expressed Mutations Based on Next-Generation RNAseq Data Pei Wang*, Icahn School of Medicine at Mount Sinai Rong Fu, University of Washington Ziding Feng, University of Texas MD Anderson Cancer Center 2:10 Accounting for Differential Coverage in Comparing Mutation Prevalence George W. Wright*, National Cancer Institute, National Institutes of Health

2:35 Scalable Bayesian Nonparametric Learning for High-Dimensional Lung Cancer Genomics Data Chiyu Gu and Subharup Guha*, University of Missouri Veerabhadran Baladandayuthapani, University of Texas MD Anderson Cancer Center 3:00 Understanding MicroRNA Sequencing Data Distribution Li-Xuan Qin*, Memorial Sloan Kettering Cancer Center Tom Tuschl, Rockefeller University Sam Singer, Memorial Sloan Kettering Cancer Center

3:25 Floor Discussion

82. Spatial and Spatio-Temporal Modeling Merrick II (3rd Floor) Sponsor: IMS Organizer: Jonathan Stroud, The George Washington University Chair: Jonathan Stroud, The George Washington University 1:45 Multivariate Spatial Modeling of Conditional Dependence in Microscale Soil Elemental Composition Data Joseph Guinness*, Montserrat Fuentes, Dean Hesterberg and Matthew Polizzotto, North Carolina State University 2:10 Spatial Local Gradient Models of Biological Invasions Joshua Goldstein, Murali Haran* and Ottar N. Bjornstad, The Pennsylvania State University Andrew M. Liebhold, U.S. Forest Services 2:35 A Generalized Conditionally Autoregressive (CAR) Model Veronica J. Berrocal*, University of Michigan Alan E. Gelfand, Duke University 3:00 Gaussian Process Models for Emulating Spatial Computer Model Output Dave M. Higdon*, Los Alamos National Laboratory and Virginia Tech Mengyang Gu, Duke University

3:15 Floor Discussion

83. CONTRIBUTED PAPERS: Stanford (3rd Floor) Study Design and Power Sponsor: ENAR Chair: Shelley Han Liu, Harvard University 1:45 Comparison of Risk Estimates Derived from Full Cohort, Sub-Sample, and Nested Case-Cohort Methodologies Kathleen A. Jablonski* and Madeline M. Rice, The George Washington University 2:00 Power Estimation for Ordinal Categorical Data in the Presence of Non Proportional Odds Roy N. Tamura* and Xiang Liu, University of South Florida 2:15 Single Arm Phase II Cancer Survival Trial Designs Jianrong John Wu*, St. Jude Children’s Research Hospital 2:30 Empirical Determination of Statistical Power and Sample Size for RNA-Seq Studies Milan Bimali*, Jonathan D. Mahnken and Brooke L. Fridley, University of Kansas Medical Center 2:45 Functional Signal-to-Noise Ratio Analysis with Applications in Quantitative Ultrasound Yeonjoo Park* and Douglas G. Simpson, University of Illinois, Urbana-Champaign 3:00 Analysis of a Non-Mortality Outcome in Clinical Trial of a Potentially Lethal Disease Roland A. Matsouaka*, Duke University Rebecca Betensky, Harvard University 3:15 Sample Size Determination Based on Quantile Residual Life Jong Hyeon Jeong*, University of Pittsburgh

84. CONTRIBUTED PAPERS: Gautier (3rd Floor) Missing Data Sponsor: ENAR Chair: Shengchun Kong, Purdue University 1:45 A Mixed Effects Model for Incomplete Data with Experiment-Level Abundance-Dependent Missing-Data Mechanism Lin S. Chen and Jiebiao Wang*, University of Chicago Xianlong Wang, Fred Hutchinson Cancer Research Center Pei Wang, Icahn Medical School at Mount Sinai 2:00 Multiple Imputation for General Missing Patterns in the Presence of High-Dimensional Data Yi Deng* and Qi Long, Emory University 2:15 A Mixed-Effects Model for Nonignorable Missing Longitudinal Data Xuan Bi* and Annie Qu, University of Illinois, Urbana-Champaign 2:30 EM Algorithm in Gaussian Copula with Missing Data Wei Ding* and Peter X.K. Song, University of Michigan 2:45 On Identification Issues with Binary Outcomes Missing Not at Random Jiwei Zhao*, University at Buffalo, SUNY 3:00 Kenward-Roger Approximation for Linear Mixed Models with Missing Covariates Akshita Chawla* and Tapabrata Maiti, Michigan State University Samiran Sinha, Texas A&M University 3:15 Nonparametric Sequential Multiple Imputation for Survival Analysis with Missing Covariates Paul Hsu, University of Arizona Mandi Yu*, National Cancer Institute, National Institutes of Health

85. CONTRIBUTED PAPERS: Ibis (3rd Floor) Innovative Methods for Clustered Data Sponsor: ENAR Chair: Jonggyu Baek, University of Michigan 1:45 Correlation Structure Selection Penalties for Improved Inference with Generalized Estimating Equations Philip M. Westgate* and Woodrow W. Burchett, University of Kentucky 2:00 Handling Negative Correlation and/or Overdispersion in Gaussian and Non-Gaussian Hierarchical Data Geert Molenberghs*, Hasselt University and Leuven University 2:15 Reflecting the Orientation of Teeth in Random Effects Models for Periodontal Outcomes Rong Xia*, Thomas M. Braun and William V. Giannobile, University of Michigan 2:30 Detecting Heterogeneity Based on Effect Size of Response Measures Xin Tong*, University of South Carolina, Columbia 2:45 Statistical Methods for Manifold-Valued Data from Longitudinal Studies Emil A. Cornea*, Hongtu T. Zhu and Joseph G. Ibrahim, University of North Carolina, Chapel Hill 3:00 Analyzing Dependent Data using Empirical Likelihood and Quadratic Inference Function Chih-Da Wu*, University of North Carolina, Chapel Hill Naisyin Wang, University of Michigan 3:15 Fast Estimation of Regression Parameters in a Broken Stick Model for Longitudinal Data Ritabrata Das*, Moulinath Banerjee and Bin Nan, University of Michigan

86. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Biopharmaceutical Applications and Survival Analysis Sponsor: ENAR Chair: Chanmin Kim, Harvard University 1:45 Pseudo-Value Approach for Testing Conditional Residual Lifetime for Dependent Survival and Competing Risks Data Kwang Woo Ahn* and Brent R. Logan, Medical College of Wisconsin 2:00 Fallback Type FDR Controlling Procedures for Testing a Priori Ordered Hypotheses Anjana Grandhi*, Gavin Lynch and Wenge Guo, New Jersey Institute of Technology 2:15 Parametric Inference on Quantile Residual Life Kidane B. Ghebrehawariat*, Ying Ding and Jong-Hyeon Jeong, University of Pittsburgh 2:30 Study Design Issues in Precision Study for Optical Coherence Tomography Device Haiwen Shi*, U.S. Food and Drug Administration 2:45 Modeling Gap Times between Recurrent Infections after Hematopoietic Cell Transplant Chi Hyun Lee* and Xianghua Luo, University of Minnesota Chiung-Yu Huang, Johns Hopkins University 3:00 Assessing Treatment Effects with Surrogate Survival Outcomes Using an Internal Validation Subsample Jarcy Zee*, Arbor Research Collaborative for Health Sharon X. Xie, University of Pennsylvania 3:15 Inference Concerning the Difference between Two Treatments in Clinical Trials K. Saha*, Central Connecticut State University

87. CONTRIBUTED PAPERS: Pearson I (3rd Floor) Computational Methods Sponsor: ENAR Chair: Sonja Grill, Technische Universität München 1:45 DNase2TF: An Efficient Algorithm for Footprint Detection Songjoon Baek*, Myong-Hee Sung and Gordon L. Hager, National Cancer Institute, National Institutes of Health 2:00 Spectral Properties of MCMC Algorithms for Bayesian Linear Regression with Generalized Hyperbolic Errors Yeun Ji Jung* and James P. Hobert, University of Florida 2:15 Group Fused Multinomial Regression Brad Price*, University of Miami Charles J. Geyer and Adam J. Rothman, University of Minnesota 2:30 Analysis of MCMC Algorithms for Bayesian Linear Regression with Laplace Errors Hee Min Choi*, University of California, Davis 2:45 On the Use of Cauchy Prior Distributions for Bayesian Binary Regression Joyee Ghosh*, University of Iowa Yingbo Li, Clemson University Robin Mitra, University of Southampton 3:00 Fast, Exact Bootstrap Principal Component Analysis for p > 1 million Aaron Fisher*, Brian Caffo, Brian Schwartz and Vadim Zipunnikov, Johns Hopkins University

3:15 Floor Discussion

TUESDAY, MARCH 17

3:30 – 3:45 pm — Refreshment Break with Our Exhibitors, Lower Promenade (Terrace Level)

3:45 – 5:30 pm 88. Biostatistical Methods for Heterogeneous Tuttle (Terrace Level) Genomic Data Sponsor: ENAR Organizer: Wei Sun, University of North Carolina, Chapel Hill Chair: Wei Sun, University of North Carolina, Chapel Hill 3:45 Investigating Tumor Heterogeneity to Identify Etiologically Distinct Sub-Types Colin B. Begg*, Memorial Sloan Kettering Cancer Center 4:10 Statistical Challenges in Cancer Research: Heterogeneity in Functional Imaging and Multi-Dimensional Omics Data Kim-Anh Do*, Thierry Chekouo, Francesco Stingo, Brian Hobbs, Yuan Wang and Jianhua Hu, University of Texas MD Anderson Cancer Center James Doecke, CSIRO, Australian e-Health Research Centre, Brisbane, Australia 4:35 Accounting for Cellular Heterogeneity is Critical in Epigenome-Wide Association Studies Rafael Irizarry*, Harvard University 5:00 Modelling Sources of Variability in Single-Cell Transcriptomics Data Sylvia Richardson*, MRC Biostatistics Unit Cambridge, UK Catalina Vallejos, MRC Biostatistics Unit Cambridge and European Bioinformatics Institute, Hinxton, UK John Marioni, European Bioinformatics Institute, Hinxton, UK

5:25 Floor Discussion

89. Innovative Approaches in Competing Orchid C (Terrace Level) Risk Analysis Sponsors: ENAR, ASA Biometrics Section Organizer: Xu Zhang, University of Mississippi Medical Center Chair: Xu Zhang, University of Mississippi Medical Center 3:45 Flexible Modeling of Competing Risks and Cure Rate Qi Jiang and Sanjib Basu*, Northern Illinois University 4:15 Competing Risks Prediction in Two Time Scales Jason Fine*, University of North Carolina, Chapel Hill 4:45 Checking Fine and Gray’s Subdistribution Hazards Model with Cumulative Sums of Residuals Jianing Li, Medical College of Wisconsin Thomas H. Scheike, University of Copenhagen Mei-Jie Zhang*, Medical College of Wisconsin

5:15 Floor Discussion

90. Biomarker Evaluation in Diagnostics Studies Johnson (3rd Floor) with Longitudinal Data Sponsors: ENAR, ASA Biometrics Section, ASA Mental Health Statistics Section, ASA Statistical Programmers Section Organizer: Zheyu Wang, Johns Hopkins University Chair: Zheyu Wang, Johns Hopkins University 3:45 Combination of Longitudinal Biomarkers with Missing Data Danping Liu*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 4:05 Measures to Evaluate Biomarkers as Predictors of Incident Cases Chao-Kang Jason Liang* and Patrick J. Heagerty, University of Washington 4:25 Prediction Accuracy of Longitudinal Marker Measurement Paramita Saha Chaudhuri*, McGill University Patrick Heagerty, University of Washington 4:45 Estimating Time-Dependent Accuracy Measures for Survival Outcome Under Two-Phase Sampling Designs Dandan Liu*, Vanderbilt University Tianxi Cai, Harvard University Anna Lok, University of Michigan Yingye Zheng, Fred Hutchinson Cancer Research Center

5:05 Compression of Longitudinal Genomic Biomarkers for Diagnosis Study Le Bao* and Xiaoyue Niu, The Pennsylvania State University Kayee Yeung, University of Washington

5:25 Floor Discussion

91. Solving Clinical Trial Problems by Using Foster (3rd Floor) Novel Designs Sponsors: ENAR, ASA Biopharmaceutical Section Organizer: Anastasia Ivanova, University of North Carolina, Chapel Hill Chair: Gheorghe Doros, Boston University 3:45 Some Design Approaches to Address Missing Data Due to Early Discontinuation in Clinical Trials Sonia M. Davis*, University of North Carolina, Chapel Hill 4:15 Introduction to the Sequential Enriched Design Yeh-Fong Chen*, U.S. Food and Drug Administration Roy Tamura, University of South Florida 4:45 Integrity and Efficiency of Enrichment and Adaptive Trial Design and Analysis Options to Enable Accurate and Precise Signal Detection Marc L. de Somer*, PPD 5:15 Discussant: Anastasia Ivanova, University of North Carolina, Chapel Hill

92. Ensuring Biostatistical Competence Using Miami Lecture Hall (3rd Floor) Novel Methods Sponsor: ENAR Organizer: Lisa Sullivan, Boston University Chair: Lisa Sullivan, Boston University 3:45 What do Non-Biostatistics Concentrators Need from the Introductory Biostatistics Course? Jacqueline N. Milton*, Boston University 4:15 Creating the Integrated Biostatistics-Epidemiology Core Course: Challenges and Opportunities Melissa D. Begg*, Roger D. Vaughan and Dana March, Columbia University 4:45 Meeting Public Health Career Goals: Course Options in Biostatistics and Epidemiology Marie Diener-West*, Johns Hopkins Bloomberg School of Public Health 5:15 Discussant: Lisa Sullivan, Boston University

93. Methodological Frontiers in the Analysis Ashe Auditorium of Panel Observed Data (3rd Floor) Sponsor: IMS Organizer: Rebecca Hubbard, University of Pennsylvania Chair: Rebecca Hubbard, University of Pennsylvania 3:45 Multi-State Models: A Variety of Uses Vern Farewell*, MRC Biostatistics Unit, Cambridge, UK 4:10 Modeling Cognitive States in the Elderly: The Analysis of Panel Data Using Multi-State Markov and Semi-Markov Processes Richard J. Kryscio*, University of Kentucky 4:35 Second-Order Models of within-Family Association in Censored Disease Onset Times Yujie Zhong* and Richard J. Cook, University of Waterloo 5:00 Computationally Simple State Occupancy Probability Estimates for Multi-State Models Under Panel Observation Andrew Titman*, Lancaster University

5:25 Floor Discussion

94. CONTRIBUTED PAPERS: Stanford (3rd Floor) Ordinal and Categorical Data Sponsor: ENAR Chair: Haileab Hilafu, University of Tennessee 3:45 Explicit Estimates for Cell Counts and Modeling the Missing Data Indicators in Three-Way Contingency Table by Log-Linear Models Haresh D. Rochani*, Robert L. Vogel, Hani M. Samawi and Daniel F. Linder, Georgia Southern University 4:00 Additive Interactions and the Metabolic Syndrome Matthew J. Gurka* and Baqiyyah N. Conway, West Virginia University Michael E. Andrew and Cecil M. Burchfiel, National Institute for Occupational Safety and Health (NIOSH) Mark D. DeBoer, University of Virginia 4:15 Flexible Link Functions in Nonparametric Binary Regression with Gaussian Process Priors Dan Li* and Xia Wang, University of Cincinnati Lizhen Lin, University of Texas, Austin Dipak K. Dey, University of Connecticut 4:30 Penalized Non-Linear Principal Components Analysis for Ordinal Variables Jan Gertheiss*, Georg August University, Germany 4:45 Covariance Estimation of Proportion for Missing Dichotomous and Ordinal Data in Randomized Longitudinal Clinical Trial Siying Li* and Gary Koch, University of North Carolina, Chapel Hill 5:00 Bayesian Nonparametric Multivariate Ordinal Regression Junshu Bao* and Timothy E. Hanson, University of South Carolina

5:15 Floor Discussion

95. CONTRIBUTED PAPERS: Merrick II (3rd Floor) Statistical Genetics Sponsor: ENAR Chair: Chi Wang, University of Kentucky 3:45 Testing Calibration of Risk Models at Extremes of Disease Risk Minsun Song*, National Cancer Institute, National Institutes of Health Peter Kraft and Amit D. Joshi, Harvard School of Public Health Myrto Barrdahl, German Cancer Research Center (DKFZ) Nilanjan Chatterjee, National Cancer Institute, National Institutes of Health 4:00 PLEMT: A Novel Pseudolikelihood Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models Chuan Hong n and Yong Chen, University of Texas School of Public Health, Houston Yang Ning, Princeton University Shuang Wang, Columbia University Hao Wu, Emory University Raymond J. Carroll, Texas A&M University 4:15 Regression-Based Methods to Map Quantitative Trait Loci Underlying Function-Valued Phenotypes Il Youp Kwak*, University of Minnesota Karl W. Broman, University of Wisconsin, Madison 4:30 A Framework for Classifying Relationships Using Dense SNP Data and Putative Pedigree Information Zhen Zeng* and Daniel E. Weeks, University of Pittsburgh Wei Chen, Children’s Hospital of Pittsburgh of UPMC Nandita Mukhopadhyay and Eleanor Feingold, University of Pittsburgh 4:45 A Negative Binomial Model-Based Method for Differential Expression Analysis Based on NanoString nCounter Data Hong Wang*, Arnold Stromberg and Chi Wang, University of Kentucky 5:00 Two-Stage Bayesian Regional Fine Mapping of a Quantitative Trait Shelley B. Bull*, University of Toronto and Lunenfeld-Tanenbaum Research Institute Zhijian Chen, Lunenfeld-Tanenbaum Research Institute Radu V. Craiu, University of Toronto 5:15 Optimal Ranking Procedures in Large-Scale Inference: Thresholding Families and the r-value Nicholas C. Henderson* and Michael A. Newton, University of Wisconsin, Madison

96. CONTRIBUTED PAPERS: Pearson I (3rd Floor) Ecology and Forestry Applications Sponsor: ENAR Chair: Min Wang, Michigan Technological University 3:45 A Statistical Framework for the Genetic Dissection of Evolution Induced by Ecological Interactions Cong Xu*, The Pennsylvania State University Libo Jiang and Meixia Ye, Beijing Forestry University Rongling Wu, The Pennsylvania State University 4:00 Analysis of Variance of Integro-Differential Equations with Application to Population Dynamics of Cotton Aphids Xueying Wang, Washington State University Jiguo Cao*, Simon Fraser University Jianhua Huang, Texas A&M University 4:15 New Insights into the Usefulness of Robust Singular Value Decomposition in Statistical Genetics: Robust AMMI and GGE Models Paulo Canas Rodrigues*, Federal University of Bahia, Brazil Andreia Monteiro and Vanda M. Lourenço, Nova University of Lisbon, Portugal 4:30 A Robust Mixed Linear Model for Heritability Estimation in Plant Studies Vanda M. Lourenço*, Nova University of Lisbon, Portugal Paulo C. Rodrigues, Federal University of Bahia, Brazil Miguel S. Fonseca and Ana M. Pires, University of Lisbon, Portugal 4:45 Cancer Incidence and Superfund Sites in Florida Emily Leary*, University of Missouri Alexander Kirpich, University of Florida

5:00 Floor Discussion

97. CONTRIBUTED PAPERS: Pearson II (3rd Floor) Pooled Biospecimens and Diagnostic Biomarkers Sponsor: ENAR Chair: Qingning Zhou, University of Missouri 3:45 Hierarchical Group Testing for Multiple Infections Peijie Hou n and Joshua M. Tebbs, University of South Carolina Christopher R. Bilder, University of Nebraska, Lincoln 4:00 Keeping Risk Calculators Current Donna Pauler Ankerst*, Technical University Munich and University of Texas Health Science Center at San Antonio Andreas Strobl, Technical University Munich 4:15 Evaluation of Multiple Biomarkers in a Two-Stage Group Sequential Design with Early Termination for Futility Nabihah Tayob*, Kim-Anh Do and Ziding Feng, University of Texas MD Anderson Cancer Center 4:30 Flexible and Accessible Semi-Parametric Methods for Analyzing Pooled Biospecimens Emily M. Mitchell*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health Robert H. Lyles and Amita K. Manatunga, Emory University Enrique F. Schisterman, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 4:45 Estimating Individualized Diagnostic Rules in the Era of Personalized Medicine Ying Liu n and Yuanjia Wang, Columbia University Chaorui Huang, Cornell University Donglin Zeng, University of North Carolina, Chapel Hill 5:00 Analysis of Unmatched Pooled Case-Control Data Neil J. Perkins*, Emily M. Mitchell and Enrique F. Schisterman, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 5:15 Estimating TP53 Mutation Carrier Probability in Families with Li-Fraumeni Syndrome Using LFSpro Gang Peng* and Jasmina Bojadzieva, University of Texas MD Anderson Cancer Center Mandy L. Ballinger, Peter MacCallum Cancer Centre, Melbourne, Australia David M. Thomas, The Kinghorn Cancer Centre and Garvan Institute, Sydney, Australia Louise C. Strong and Wenyi Wang, University of Texas MD Anderson Cancer Center

98. CONTRIBUTED PAPERS: Ibis (3rd Floor) Multiple Testing and Variable Selection Sponsor: ENAR Chair: Lee H Dicker, Rutgers University 3:45 Bayes Factor Approaches for Hypothesis Testing in ANOVA Models Min Wang*, Michigan Technological University 4:00 A Multifunctional Bayesian Procedure for Detecting Copy Number Variations from Sequencing Read Depths Yu-Chung Wei*, U.S. Food and Drug Administration and National Chiao Tung University, Taiwan Guan-Hua Huang, National Chiao Tung University, Taiwan 4:15 Inferring the Global Genetic Architecture of Gene Transcripts from Ultrahigh-Dimensional Molecular Data Kirk Gosik* and Rongling Wu, The Pennsylvania State University 4:30 Statistical Inference for High Dimensional Linear Regression with Linear Constraints and Application to Microbiome Study Pixu Shi*, Anru Zhang and Hongzhe Li, University of Pennsylvania 4:45 Taking into Account Overrepresented Patterns in Gene Expression Analysis Megan Orr* and Ekua Bentil, North Dakota State University 5:00 Bayesian Screening for Group Differences in Methylation Array Data Eric F. Lock*, University of Minnesota 5:15 Incorporating ENCODE Information into SNP-Based Phenotype Prediction Yue-Ming Chen* and Peng Wei, University of Texas School of Public Health, Houston

99. CONTRIBUTED PAPERS: Gautier (3rd Floor) Parameter Estimation in Hierarchical and Non-Linear Models Sponsor: ENAR Chair: Jingjing Yin, Georgia Southern University 3:45 A Hierarchical Bayesian Method for Well-Mixed and Two-Zone Models in Industrial Hygiene Xiaoyue Zhao*, Susan Arnold, Dipankar Bandyopadhyay and Gurumurthy Ramachandran, University of Minnesota Sudipto Banerjee, University of California, Los Angeles 4:00 Parameter Estimation: A Bayesian Inference Approach Romarie Morales*, Arizona State University 4:15 Bias and Confidence Interval Correction in Four Parameter Logistic Models Bronlyn Wassink* and Tapabrata Maiti, Michigan State University 4:30 Robust Mixed-Effects Model for Clustered Failure Time Data: Application to Huntington’s Disease Event Measures Tanya P. Garcia*, Texas A&M University Yanyuan Ma, University of South Carolina Yuanjia Wang and Karen Marder, Columbia University 4:45 Stacked Survival Models for Censored Quantile Regression Kyle Rudser*, University of Minnesota Andrew Wey, University of Hawaii John Connett, University of Minnesota 5:00 The CoGaussian Distribution: A Model for Right Skewed Data Govind S. Mudholkar and Ziji Yu*, University of Rochester Saria S. Awadalla, University of Chicago

5:15 Floor Discussion

WEDNESDAY, MARCH 18

8:30 – 10:15 am 100. New Statistical Methods in the Environmental Miami Lecture Hall Health Sciences (3rd Floor) Sponsors: ENAR, ASA Biometrics Section Organizers: Brisa Sanchez and Peter X.K. Song, University of Michigan Chair: Rong Xia, University of Michigan 8:30 New Statistical Models to Detect Vulnerable Prenatal Window to Carcinogenic Polycyclic Aromatic Hydrocarbons on Fetal Growth Lu Wang*, University of Michigan 8:55 Dimension Reduction for Spatially Misaligned Multivariate Air Pollution Data Adam Szpiro*, University of Washington 9:20 Evaluating Alterations in Regression Coefficients Directed by Toxicant Mixtures Peter X.K. Song*, University of Michigan Shujie Ma, University of California, Riverside

9:45 Floor Discussion

101. Novel Phase II and III Clinical Trial Designs for Pearson (3rd Floor) Cancer Research that Incorporate Biomarkers and Nonstandard Endpoints Sponsor: ENAR Organizer: Sujata Patil, Memorial Sloan Kettering Cancer Center Chair: Nichole Carlson, University of Colorado, Denver 8:30 Novel Phase II and III Designs for Oncology Clinical Trials, with a Focus on Biomarker Validation Daniel J. Sargent*, Mayo Clinic 8:55 Stratified Single Arm Phase 2 Design for Finding a Biomarker Group that Benefits from Treatment Irina Ostrovnaya* and Emily Zabor, Memorial Sloan Kettering Cancer Center

9:20 Lung-MAP: A Phase II/III Biomarker-Driven Master Protocol for Second Line Therapy of Squamous Cell Lung Cancer Mary W. Redman*, Fred Hutchinson Cancer Research Center 9:45 Randomized Phase II Design to Study Therapies Designed to Control Growth of Brain Metastases in Cancer Patients Sujata M. Patil*, Memorial Sloan-Kettering Cancer Center

10:10 Floor Discussion

102. Novel Statistical Methods to Decipher Gene Jasmine (Terrace Level) Regulation using Sequence Data Sponsor: ENAR Organizer: Hongyu Zhao, Yale University Chair: Hongyu Zhao, Yale University 8:30 On the Detection of Nonlinear and Interactive Relationships in Genomic Data Bo Jiang and Jun Liu*, Harvard University 8:55 Statistical Analysis of Differential Alternative Splicing Using RNA-Seq Data Mingyao Li*, Yu Hu and Cheng Jia, University of Pennsylvania 9:20 A Case Study of RNA-Seq Data in Breast Cancer Patients Wei Sun*, University of North Carolina, Chapel Hill 9:45 Unit-Free and Robust Detection of Differential Expression from RNA-Seq Data Hui Jiang*, University of Michigan

10:10 Floor Discussion

103. Flow Cytometry: Data Collection and Foster (3rd Floor) Statistical Analysis Sponsor: ENAR Organizer: Monnie McGee, Southern Methodist University Chair: Monnie McGee, Southern Methodist University 8:30 Flow, Mass and Imaging Cytometry for Single Cell Analysis: A Fertile Field for Biostatistics Research Richard H. Scheuermann*, J. Craig Venter Institute and University of California, San Diego Yu Qian, J. Craig Venter Institute Chiaowen Hsiao, University of Maryland, College Park Monnie McGee, Southern Methodist University 8:55 Computational Identification of Cell Populations from Cytometry Data: Methods, Applications, and Infrastructure Yu Qian* and Hyunsoo Kim, J. Craig Venter Institute Shweta Purawat, University of California, San Diego Rick Stanton, J. Craig Venter Institute Ilkay Altintas, University of California, San Diego Richard H. Scheuermann, J. Craig Venter Institute 9:20 Mapping Cell Populations in Flow Cytometry Data for Cross-Sample Comparison Using the Friedman-Rafsky Test Chiaowen Joyce Hsiao*, University of Maryland, College Park Mengya Liu, Southern Methodist University Rick Stanton, J. Craig Venter Institute Monnie McGee, Southern Methodist University Yu Qian, J. Craig Venter Institute Richard H. Scheuermann, J. Craig Venter Institute and University of California, San Diego 9:45 A Novel Approach to Modeling Immunology Data Derived from Flow Cytometry Jacob A. Turner*, Baylor Institute for Immunology Research 10:10 Discussant: Monnie McGee, Southern Methodist University

104. Statistical Methods in Chronic Kidney Disease Johnson (3rd Floor) Sponsor: ENAR Organizer: Dawei Xie, University of Pennsylvania Chair: Jesse Y. Hsu, University of Pennsylvania 8:30 Joint Modeling of Kidney Function Decline, End Stage Kidney Disease (ESRD), and Death with Special Consideration of Competing Risks Dawei Xie* and Wensheng Guo, University of Pennsylvania Wei Yang, Merrill Lynch Qiang Pan, University of Pennsylvania

9:00 Joint Multiple Imputation for Longitudinal Outcomes and Clinical Events which Truncate Longitudinal Follow-Up Bo Hu*, Cleveland Clinic Liang Li, University of Texas MD Anderson Cancer Center Tom Greene, University of Utah 9:30 Modeling the Effect of Blood Pressure on Disease Progression in Chronic Kidney Disease Using Multistate Marginal Structural Models Alisa J. Stephens*, Wei Peter Yang and Marshall M. Joffe, University of Pennsylvania Tom H. Greene, University of Utah 10:00 Dynamic Prediction of Clinical Events Using Longitudinal Biomarkers in a Cohort Study of Chronic Renal Disease Liang Li*, University of Texas MD Anderson Cancer Center

105. Challenging Statistical Issues in Imaging Merrick I (3rd Floor) Sponsors: ENAR, ASA Section on Statistics in Imaging, ASA Statistical Learning and Data Mining Section Organizers: Haipeng Shen and Hongtu Zhu, University of North Carolina, Chapel Hill Chair: Hongtu Zhu, University of North Carolina, Chapel Hill 8:30 Relating Developmental Transcription Factors Based on Drosophila Embryonic Gene Expression Images Siqi Wu*, University of California, Berkeley 8:55 Analysis of Point Pattern Imaging Data using Log Gaussian Cox Processes with Spatially Varying Coefficients Timothy D. Johnson*, University of Michigan Thomas E. Nichols, University of Warwick 9:20 Fiber Direction Estimation in Diffusion MRI Raymond Wong*, Iowa State University Thomas C. M. Lee, Debashis Paul and Jie Peng, University of California, Davis 9:45 FVGWAS: Fast Voxelwise Genome Wide Association Analysis of Large-Scale Imaging Genetic Data Hongtu Zhu* and Meiyang Chen, University of North Carolina, Chapel Hill Thomas Nichols, University of Warwick Chao Huang, Yu Yang and Zhaohua Lu, University of North Carolina, Chapel Hill Qianjing Feng, Southern Medical University Rebecca C. Knickmeyer, University of North Carolina, Chapel Hill

10:10 Floor Discussion

106. Statistical Methods for Predicting Subgroup Ashe Auditorium (3rd Floor) Level Treatment Response Sponsor: IMS Organizer: Tianxi Cai, Harvard University Chair: Jennifer Anne Sinnot, Harvard School of Public Health 8:30 A Regression Tree Approach to Identifying Subgroups with Differential Treatment Effects Wei-Yin Loh*, University of Wisconsin, Madison 8:55 Feature Elimination for Reinforcement Learning Methods Sayan Dasgupta*, Fred Hutchinson Cancer Research Center Michael R. Kosorok, University of North Carolina, Chapel Hill 9:20 Increasing Efficiency for Estimating Treatment-Biomarker Interactions with Historical Data Jeremy MG Taylor*, Philip S. Boonstra and Bhramar Mukherjee, University of Michigan 9:45 Adaptive Designs for Developing and Validating Predictive Biomarkers Noah Simon, University of Washington Richard M. Simon*, National Cancer Institute, National Institutes of Health

10:10 Floor Discussion

107. CONTRIBUTED PAPERS: Ibis (3rd Floor) ROC Curves Sponsor: ENAR Chair: Philip M Westgate, University of Kentucky 8:30 Improved Estimation of Diagnostic Cut-Off Point Associated with Youden Index Using Ranked Set Sampling Jingjing Yin*, Hani Samawi, Chen Mo and Daniel Linder, Georgia Southern University 8:45 A Better Confidence Interval for the Sensitivity at a Fixed Level of Specificity for Diagnostic Tests with Continuous Endpoints Guogen Shan*, University of Nevada Las Vegas 9:00 Simpson’s Paradox in the IDI Jonathan Chipman*, Vanderbilt University Danielle Braun, Dana-Farber Cancer Institute 9:15 A Nonparametric Test Based on t-Distribution for Comparing Two Correlated C Indices with Right-Censored Survival Outcome or AUCs with Dichotomous Outcome Le Kang* and Shumei Sun, Virginia Commonwealth University

9:30 Latent Mixture Models for Ordered ROC Curves Using the Scale Mixture of Normal Distributions Zhen Chen* and Sungduk Kim, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health 9:45 Least Squares ROC Method for Tests with the Absence of the Gold Standard Larry Tang*, George Mason University and National Institutes of Health Clinical Center Minh Huynh, Department of Labor and National Institutes of Health Clinical Center Xuan Che and Elizabeth K. Rasch, Epidemiology and Biostatistics, National Institutes of Health Clinical Center Ao Yuan, Georgetown University

10:00 Floor Discussion

108. CONTRIBUTED PAPERS: Merrick II (3rd Floor) Personalized Medicine and Biomarkers Sponsor: ENAR Chair: Zhenzhen Zhang, University of Michigan 8:30 Using Decision Lists to Construct Interpretable and Parsimonious Treatment Regimes Yichi Zhang n, Eric Laber, Anastasios Tsiatis and Marie Davidian, North Carolina State University 8:45 Synthesizing Genetic Markers for Incorporation into Clinical Risk Prediction Tools Sonja Grill*, Technical University Munich, Germany Donna P. Ankerst, Technical University Munich, Germany and University of Texas Health Science Center at San Antonio 9:00 A PRIM Approach to Predictive-Signature Development for Patient Stratification Gong Chen*, Roche TCRC, Inc. Hua Zhong, New York University School of Medicine Anton Belousov, Roche Diagnostics GmbH Viswanath Devanarayan, AbbVie, Inc. 9:15 On Estimation of Optimal Treatment Regimes for Maximizing t-Year Survival Probability Runchao Jiang n, Wenbin Lu, Rui Song and Marie Davidian, North Carolina State University

9:30 Evaluation of Novel Biomarkers when Limited by Small Sample Size Bethany J. Wolf*, John Christian Spainhour and Jim C. Oates, Medical University of South Carolina 9:45 Calibrate Variations in Biomarker Measures for Improving Prediction Cheng Zheng*, University of Wisconsin, Milwaukee Yingye Zheng, Fred Hutchinson Cancer Research Center 10:00 Building Small, Robust Gene Signatures to Predict Prognosis Prasad Patil* and Jeffrey T. Leek, Johns Hopkins University

109. CONTRIBUTED PAPERS: Stanford (3rd Floor) Time Series Analysis and Methods Sponsor: ENAR Chair: Haiwen Shi, U.S. Food and Drug Administration 8:30 Robust Portfolio Optimization Under High Dimensional Heavy-Tailed Time Series Huitong Qiu* and Fang Han, Johns Hopkins University Han Liu, Princeton University Brian Caffo, Johns Hopkins University 8:45 Change-Point Detection in EEG Spectra for Informed Frequency Band Selection Anna Louise Schroeder*, London School of Economics Hernando Ombao, University of California, Irvine 9:00 Time Series Analysis for Symbolic-Valued Data S. Yaser Samadi*, Southern Illinois University Lynne Billard, University of Georgia 9:15 High Dimensional State Space Model with L-1 and L-2 Penalties Shaojie Chen* and Joshua Vogelstein, Johns Hopkins University Seonjoo Lee, Columbia University Martin Lindquist and Brian Caffo, Johns Hopkins University 9:30 Autoregressive Models for Spherical Data with Applications in Protein Structure Analysis Daniel Hernandez-Stumpfhauser*, University of North Carolina, Chapel Hill F. Jay Breidt and Mark van der Woerd, Colorado State University 9:45 Modeling Serial Covariance Structure in Semiparametric Linear Mixed-Effects Regression for Longitudinal Data Changming Xia*, University of Rochester Medical Center Hua Liang, The George Washington University Sally W. Thurston, University of Rochester Medical Center

10:00 Floor Discussion

WEDNESDAY, MARCH 18

10:15 – 10:30 am — Refreshment Break with Our Exhibitors, Lower Promenade (Terrace Level)

10:30 am – 12:15 pm 110. Incorporating Biological Information in Jasmine (Terrace Level) Statistical Modeling of Genome-Scale Data with Complex Structures Sponsor: ENAR Organizer: Mingyao Li, University of Pennsylvania Chair: Mingyao Li, University of Pennsylvania 10:30 Prioritizing GWAS Results by Integrating Pleiotropy and Annotation Hongyu Zhao*, Yale School of Public Health Dongjun Chung, Medical University of South Carolina Can Yang, Hong Kong Baptist University Cong Li and Qian Wang, Yale University Joel Gelernter, Yale School of Medicine 10:55 Challenges and Solutions for Whole Exome Sequence Analysis for Pedigree and External Control Data Daniel J. Schaid*, Mayo Clinic 11:20 Big Data Methods for Dissecting Variations in High-Throughput Genomic Data Fang Du, Bing He and Hongkai Ji*, Johns Hopkins Bloomberg School of Public Health 11:45 Model-Based Approach for Species Quantification and Differential Abundance Analysis Based on Shotgun Metagenomic Data Hongzhe Li*, University of Pennsylvania

12:10 Floor Discussion

111. Emerging Issues in Clinical Trials and High Dimensional Data
Ashe Auditorium (3rd Floor)
Sponsors: ENAR, ASA Biopharmaceutical Section
Organizer: Qingxia (Cindy) Chen, Vanderbilt University
Chair: Qingxia (Cindy) Chen, Vanderbilt University

10:30 Assessing Covariate Effects with the Monotone Partial Likelihood Using Jeffreys' Prior in the Cox Model
Ming-Hui Chen*, University of Connecticut; Mario de Castro, Universidade de Sao Paulo; Jing Wu and Elizabeth D. Schifano, University of Connecticut

10:55 Assessing Temporal Agreement between Central and Local Progression-Free Survival Times
Donglin Zeng* and Emil Cornea, University of North Carolina, Chapel Hill; Jun Dong and Jean Pan, Amgen Inc.; Joseph Ibrahim, University of North Carolina, Chapel Hill

11:20 Statistical Design of Non-Inferiority Multiple Region Clinical Trials to Assess Global and Consistent Treatment Effects
Guoqing Diao*, George Mason University; Donglin Zeng and Joseph G. Ibrahim, University of North Carolina, Chapel Hill; Alan Rong, Oliver Lee and Kathy Zhang, Amgen Inc.; Qingxia Chen, Vanderbilt University

11:45 Bayesian Shrinkage Methods for High Dimensional Data
Joseph G. Ibrahim* and Hongtu Zhu, University of North Carolina, Chapel Hill; Zakaria Khondker, Medivation, Inc.; Zhaohua Lu, University of North Carolina, Chapel Hill

12:10 Floor Discussion

112. Advances in Repeated Measures and Longitudinal Data Analysis
Pearson (3rd Floor)
Sponsor: ENAR
Organizer: Sanjoy Sinha, Carleton University
Chair: Sanjoy Sinha, Carleton University

10:30 Joint Modelling of Different Types of Longitudinal Data with Outliers and Censoring
Lang Wu*, University of British Columbia

10:55 A Hidden Markov Model for Non-Ignorable Non-Monotone Missing Longitudinal Data for Medical Studies of Quality of Life
Kaijun Liao, Hisun Pharmaceuticals USA; Qiang Zhang, Radiation Therapy Oncology Group; Andrea B. Troxel*, University of Pennsylvania Perelman School of Medicine

11:20 Inverse Weighted Estimating Equations for Repeated Measures in Transfusion Medicine
Richard Cook*, University of Waterloo

11:45 Joint Modelling of Nonignorable Missing Longitudinal Outcomes and Time-to-Event Data
Sanjoy Sinha*, Carleton University

12:10 Floor Discussion

113. Advances in Modeling Zero-Inflated Data
Johnson (3rd Floor)
Sponsors: ENAR, ASA Mental Health Statistics Section
Organizer: Brian Neelon, Duke University
Chair: James O'Malley, Dartmouth University

10:30 Bayesian Two-Part Spatial Models for Semicontinuous Data
Brian Neelon*, Duke University; Li Zhu, University of Pittsburgh; Sara Benjamin, Duke University

10:55 Zero-Inflated Frailty Model for Recurrent Event Data
Lei Liu*, Northwestern University; Xuelin Huang, University of Texas MD Anderson Cancer Center; Alex Yaroshinsky, Vital Systems Inc.

11:20 Two-Part Models for Rolling Admission Group Therapy Data
Lane F. Burgette* and Susan M. Paddock, RAND Corporation

11:45 A Marginalized Two-Part Model for Semicontinuous Data
Valerie A. Smith*, Center for Health Services Research in Primary Care, Durham VAMC and University of North Carolina, Chapel Hill; John S. Preisser, University of North Carolina, Chapel Hill; Brian Neelon, Duke University; Matthew L. Maciejewski, Center for Health Services Research in Primary Care, Durham VAMC

12:10 Floor Discussion

114. New Developments in Missing Data Analysis: From Theory to Practice
Merrick II (3rd Floor)
Sponsors: ENAR, ASA Survey Research and Methodology Section
Organizer: Lihong Qi, University of California, Davis
Chair: Yi Li, University of Michigan

10:30 Competing Risks Regression with Missing Data in the Prognostic Factors
Federico Ambrogi*, University of Milan; Thomas H. Scheike, University of Copenhagen

10:55 Comparison of Multiple Imputation via Chained Equations and General Location Model for Accelerated Failure Time Models with Missing Covariates
Lihong Qi*, University of California, Davis; Yulei He, Centers for Disease Control and Prevention; Rongqi Chen, Ying-Fang Wang and Xiaowei Yang, University of California, Davis

11:20 The Effect of Data Clustering on the Multiple Imputation Variance Estimator
Yulei He*, Iris Shimizu, Susan Schappert, Nathaniel Schenker, Vladislav Beresovsky, Diba Khan and Roberto Valverde, Centers for Disease Control and Prevention

11:45 Fractional Hot Deck Imputation for Multivariate Missing Data in Survey Sampling
Jae Kwang Kim* and Wayne A. Fuller, Iowa State University

12:10 Floor Discussion

115. Environmental Methods with Deterministic and Stochastic Components
Foster (3rd Floor)
Sponsor: ENAR
Organizer: Ed Boone, Virginia Commonwealth University
Chair: Edward L. Boone, Virginia Commonwealth University

10:30 High Resolution Nonstationary Random Field Simulation
William Kleiber*, University of Colorado, Boulder

10:50 Estimating Parameters in Delay Differential Equation Models
Liangliang Wang* and Jiguo Cao, Simon Fraser University

11:10 Zero-Inflated Spatial Temporal Models for Exploring Trend in Comandra Blister Rust Infection in Lodge Pole Pine Trees
Cindy Feng*, University of Saskatchewan

11:30 A Spatio-Temporal Approach to Modeling Spatial Covariance
Ephraim M. Hanks*, The Pennsylvania State University

11:50 Incorporating Covariates in Deterministic Environmental Models
Edward L. Boone*, Virginia Commonwealth University; Ben Stewart-Koster, Australian Rivers Institute at Griffith University

12:10 Floor Discussion

116. Bayesian and Non-Parametric Bayesian Approaches to Causal Inference
Miami Lecture Hall (3rd Floor)
Sponsor: IMS
Organizer: Peter Mueller, University of Texas, Austin
Chair: Peter Mueller, University of Texas, Austin

10:30 A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
George Karabatsos*, University of Illinois, Chicago; Stephen G. Walker, University of Texas, Austin

10:55 Evaluating the Effect of University Grants on Student Dropout: Evidence from a Regression Discontinuity Design Using Bayesian Principal Stratification Analysis
Fan Li*, Duke University; Alessandra Mattei and Fabrizia Mealli, University of Florence

11:20 Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times
Yanxun Xu* and Peter Mueller, University of Texas, Austin; Abdus S. Wahed, University of Pittsburgh; Peter F. Thall, University of Texas MD Anderson Cancer Center

11:45 A Framework for Bayesian Nonparametric Inference for Causal Effects of Mediation
Chanmin Kim, Harvard University; Michael J. Daniels*, University of Texas, Austin; Jason Roy, University of Pennsylvania

12:10 Floor Discussion

117. Design of Multiregional Clinical Trials: Theory and Practice
Merrick I (3rd Floor)
Sponsor: ENAR
Organizer: Gordon Lan, Janssen Research & Development
Chair: Gordon Lan, Janssen Research & Development

10:30 Random Effects Models for Multiregional Clinical Trial Design and Analysis
Gordon Lan*, Janssen Research & Development

11:15 Consistency of Treatment Effect in Multiregional Clinical Trials
Joshua Chen*, Sanofi Pasteur

11:50 Discussant: Fei Chen, Janssen R&D, Johnson & Johnson

12:05 Floor Discussion

118. CONTRIBUTED PAPERS: Multivariate Survival Analysis
Ibis (3rd Floor)
Sponsor: ENAR
Chair: Minsun Song, National Cancer Institute, National Institutes of Health

10:30 A Sieve Semiparametric Maximum Likelihood Approach for Regression Analysis of Bivariate Interval-Censored Failure Time Data
Qingning Zhou*, University of Missouri; Tao Hu, Capital Normal University; Jianguo Sun, University of Missouri

10:45 Methods for Contrasting Gap Time Hazard Functions
Xu Shu* and Douglas E. Schaubel, University of Michigan

11:00 Using Full Cohort Information to Improve the Efficiency of Multivariate Marginal Hazard Model for Case-Cohort Studies
Hongtao Zhang*, Jianwen Cai, Haibo Zhou and David Couper, University of North Carolina, Chapel Hill

11:15 Marginal Models for Restricted Mean Survival with Clustered Time to Event Data Using Pseudo-Values
Brent R. Logan* and Kwang Woo Ahn, Medical College of Wisconsin

11:30 Semi-Parametric Modeling of Bivariate Recurrent Events
Jing Yang* and Limin Peng, Emory University

11:45 Analysis of a Composite Endpoint Under Different Censoring Schemes for Component Events via Multiple Imputation
Yuqi Chen*, University of California, Santa Barbara; Chunlei Ke, Amgen Inc.; Jianming Wang, Celgene Corporation

12:00 Quantile Regression for Survival Data with Delayed Entry
Boqin Sun* and Jing Qian, University of Massachusetts, Amherst

119. CONTRIBUTED PAPERS: Constrained Inference
Stanford (3rd Floor)
Sponsor: ENAR
Chair: Emily Leary, University of Missouri

10:30 Order Statistics from Lindley Distribution and their Applications
Khalaf S. Sultan* and Wafaa S. AL-Thubyani, College of Science, King Saud University, Saudi Arabia

10:45 CLME: A Tool for Inference in Linear Mixed Effects Models Under Inequality Constraints
Casey M. Jelsema* and Shyamal D. Peddada, National Institute of Environmental Health Sciences, National Institutes of Health

11:00 Order-Constrained Bayesian Nonparametric Modeling of Correlated Three-Way ROC Surfaces
Beomseuk Hwang* and Zhen Chen, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

11:15 Partial Likelihood Estimation of Isotonic Proportional Hazards Models
Yunro Chung*, Anastasia Ivanova, Michael Hudgens and Jason Fine, University of North Carolina, Chapel Hill

11:30 Nonparametric Tests of Uniform Stochastic Ordering
Chuan-Fa Tang*, Joshua M. Tebbs and Dewei Wang, University of South Carolina

11:45 Covariate Balanced Restricted Randomization: Optimal Designs, Exact Tests, and Asymptotic Properties
Jingjing Zou* and Jose R. Zubizarreta, Columbia University

12:00 Floor Discussion

120. CONTRIBUTED PAPERS: Nonparametric Methods
Gautier (3rd Floor)
Sponsor: ENAR
Chair: Nabihah Tayob, University of Texas MD Anderson Cancer Center

10:30 Nonparametric and Semiparametric Estimation in Multiple Covariates
Richard Charnigo*, University of Kentucky; Limin Feng, Intel Corporation; Cidambi Srinivasan, University of Kentucky

10:45 Nonparametric Empirical Bayes via Maximum Likelihood for High-Dimensional Classification
Lee H. Dicker, Rutgers University; Sihai D. Zhao, University of Illinois, Urbana-Champaign; Long Feng*, Rutgers University

11:00 Nonparametric Inference for an Inverse-Probability-Weighted Estimator with Doubly Truncated Data
Xu Zhang*, University of Mississippi Medical Center

11:15 A Test for Directional Departure from Loewe Additivity
Mingyu Xi*, University of Maryland, Baltimore County

11:30 Estimation and Confidence Bands for Nonparametric Regression with Functional Responses and Multiple Scalar Covariates
Andrada E. Ivanescu*, Montclair State University

11:45 Nonparametric Bayesian Analysis of the Two-Sample Problem with Censoring
Kan Shang* and Cavan Sheerin Reilly, University of Minnesota

Abstracts & Poster Presentations

1. POSTERS: Latent Variable and Mixture Models

1a. ASSESSMENT OF DIMENSIONALITY CAN BE DISTORTED BY TOO MANY ZEROES: AN EXAMPLE FROM PSYCHIATRY AND A SOLUTION USING MIXTURE MODELS
Melanie M. Wall*, Columbia University; Irini Moustaki, London School of Economics

Common methods for determining the number of latent dimensions underlying an item set include eigenvalue analysis and examination of fit statistics for factor analysis models with varying numbers of factors. Given a set of dichotomous items, we will demonstrate that these empirical assessments of dimensionality are likely to underestimate the number of dimensions when there is a preponderance of individuals in the sample with all zeros as their responses, i.e. all incorrect answers. A simulated data experiment is conducted to demonstrate this phenomenon. An example is shown from psychiatry assessing the dimensionality of a social anxiety disorder battery where only one latent dimension is found if the full sample is used, while three latent dimensions are found if the excess zeroes are accounted for correctly. A mixture model, i.e. a hybrid latent class latent factor model, is used to assess the dimensionality of the underlying subgroup corresponding to those who come from the part of the population with some measurable trait. Implications of the findings are discussed, in particular regarding the potential for different findings in community versus patient populations.

email: [email protected]

1b. LOCAL INFLUENCE DIAGNOSTICS FOR HIERARCHICAL COUNT DATA MODELS WITH OVERDISPERSION AND EXCESS ZEROS
Trias Wahyuni Rakhmawati*, Universiteit Hasselt; Geert Molenberghs, Universiteit Hasselt and Katholieke Universiteit Leuven; Geert Verbeke, Katholieke Universiteit Leuven and Universiteit Hasselt; Christel Faes, Universiteit Hasselt and Katholieke Universiteit Leuven

We consider models for hierarchically observed and possibly overdispersed count data that in addition allow for excess zeros. The model extends the Poisson-normal generalized linear mixed model by including gamma random effects to accommodate overdispersion. Excess zeros are handled using either a zero-inflation or a hurdle component. These models were studied by Kassahun et al. (2014). While flexible, the model is quite elaborate in parametric specification and therefore model assessment is imperative. We derive local influence measures to detect influential subjects, i.e., subjects who have undue influence on either the fit of the model as a whole, or on specific important sub-vectors. The latter include the fixed effects for the Poisson component and for the excess zeros component, the variance components for the normal random effects, and the parameters describing the gamma random effects. Interpretable influence components are derived. The methods are illustrated using data from a longitudinal clinical trial in patients with dermatophyte onychomycosis.

email: triaswahyuni.rakhmawati@uhass
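As a point of reference for the zero-inflation and hurdle components mentioned in abstract 1b (and again in abstract 1d below), the two formulations can be written, for a count outcome Y with baseline count density f(y) (e.g., Poisson) and excess-zero probability pi, roughly as:

\[
\text{Zero-inflated: } P(Y=0)=\pi+(1-\pi)f(0),\qquad P(Y=y)=(1-\pi)\,f(y),\ y\ge 1,
\]
\[
\text{Hurdle: } P(Y=0)=\pi,\qquad P(Y=y)=(1-\pi)\,\frac{f(y)}{1-f(0)},\ y\ge 1.
\]

In the hierarchical setting of abstract 1b, f would additionally carry normal and gamma random effects; the display above is only the generic two-component skeleton, not the authors' full specification.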

1c. FINITE MULTIVARIATE MIXTURES OF SKEW-T DISTRIBUTIONS WITH COLLAPSE CLUSTERS WITH APPLICATION IN FORESTRY
Josef Hoefler*, Technical University Munich; Donna Pauler Ankerst, Technical University Munich

Finite mixtures of skew-t distributions offer a flexible framework for modeling non-Normal data, in particular data that possess skewness, multiple clusters and/or outliers. Such models have been extended to multivariate data, with detailed Expectation-Maximization (EM) algorithms for fitting. A practical problem that arises with these models is the existence of collapsed clusters, which reside on smaller-dimensional planes than the remaining clusters. This occurs for certain competition indices measured in trees - all trees in a particular plot or forest may experience zero lateral competition, for example. To handle this problem we have developed an R package [fitmixst4] that accommodates collapsed clusters by constraining the variance of the cluster to be below a fixed upper bound. We apply the R package to retrospectively describe clusters of trees that died and remained alive over series of five-year follow-up periods from 9,292 beech trees in a Bavarian long-term forest research plot network. Heat maps of the mixture density ratios corresponding to dead versus alive trees are assessed as a potential prediction tool for forest mortality for future forest management.

email: [email protected]

1d. WEIBULL MIXTURE REGRESSION FOR ZERO-HEAVY CONTINUOUS SUBSTANCE USE OUTCOMES
Mulugeta Gebregziabher, Medical University of South Carolina; Delia Voronca*, Medical University of South Carolina; Abeba Teklehaimanot, Medical University of South Carolina; Elizabeth J. Santa Ana, Ralph H. Johnson Department of Veterans Affairs Medical Center

Outcomes with a preponderance of zero values are ubiquitous in data that arise from studies of addictive disorders. This is known to lead to violation of standard assumptions in parametric inference and enhances the risk of misleading conclusions unless managed properly. Two of the most popular models used to handle this issue for count outcomes are hurdle and zero-inflated models. Both models can be expressed as two-component mixtures. In a hurdle model, the second component follows a zero-truncated distribution, while in a zero-inflated model it follows a count distribution with positive probability of generating zeroes. However, models that deal with this problem are not well developed for zero-heavy continuous outcomes. Thus, in this paper, we propose and evaluate a two-component Weibull mixture model for effectively dealing with the problem. We use simulated and real data from a randomized controlled trial (RCT) to demonstrate its application and make comparisons with other methods via statistical information and mean squared error criteria. Our results show that the two-component Weibull mixture model is superior for modeling zero-heavy continuous data.

email: [email protected]

1e. MODEL-FREE ESTIMATION OF TIME-VARYING CORRELATION COEFFICIENTS AND THEIR CONFIDENCE INTERVALS WITH AN APPLICATION TO fMRI DATA
Maria A. Kudela*, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis; Jaroslaw Harezlak, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis; Martin Lindquist, Johns Hopkins Bloomberg School of Public Health

One of the main interests in fMRI (functional magnetic resonance imaging) research is the study of associations between time series from different brain regions, so-called functional connectivity (FC). Recently, it has become increasingly important to assess dynamic changes in FC, both during resting state and task-based fMRI experiments, as this is thought to provide the information needed to better understand the brain's inner workings. Currently, the most common approach to estimate these dynamic changes is by computing the correlation coefficient between time series within a sliding window. However, one of the disadvantages of this method is that it tends to overestimate the association between the time series obtained from different brain regions (Lindquist et al. 2014). Here we propose a new approach for estimating time-varying FC using the correlation between two time series and provide valid confidence bands for this estimator. We propose an algorithm based on the sliding-window approach which utilizes the multivariate linear process bootstrap. Both numerical results and an application to an fMRI study of alcoholism risk factors will be presented.

email: [email protected]
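A minimal sketch of the baseline sliding-window correlation estimate that abstract 1e starts from (the data and window length here are simulated placeholders; the bootstrap confidence bands described in the abstract are not reproduced):

import numpy as np

def sliding_window_corr(x, y, width):
    """Correlation between two series within each sliding window of length `width`."""
    n = len(x)
    return np.array([np.corrcoef(x[t:t + width], y[t:t + width])[0, 1]
                     for t in range(n - width + 1)])

# Illustrative use on simulated "regional" time series
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = 0.5 * x + rng.standard_normal(300)
tv_corr = sliding_window_corr(x, y, width=30)   # one correlation estimate per window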

1f. ZERO-AND-ONE INFLATED BETA REGRESSION WITH MIXED EFFECTS FOR MODELING RELATIVE FREQUENCY OF CONDOM USE IN MEN WHO HAVE SEX WITH MEN (MSM) IN GHANA
Nanhua Zhang*, Cincinnati Children's Hospital Medical Center; Yue Zhang, University of Cincinnati; LaRon E. Nelson, University of Rochester

Zero-and-one inflated proportion data are common in behavioral studies. Motivated by a study of the effect of psycho-social variables on the relative frequencies of condom use among men who have sex with men (MSM) in 22 social networks in Ghana, we consider a zero-and-one inflated beta regression with mixed effects. The proposed model addresses the issue of abundance of zeros and ones in the relative frequency of condom use, and also the dependence of the relative frequencies of MSM from the same social network. We achieve this by linking the mixed-effects regression to the beta distribution mean through a logit link function while keeping the other positive parameter constant. We also discuss extension of the proposed model by also relating the positive parameter to the mixed effects through a log link.

email: [email protected]

1g. INFERENCE FOR THE NUMBER OF TOPICS IN THE LATENT DIRICHLET ALLOCATION MODEL VIA A PSEUDO-MARGINAL METROPOLIS-HASTINGS ALGORITHM
Zhe Chen*, University of Florida; Hani Doss, University of Florida

Latent Dirichlet Allocation (LDA) is a model that is used for automatically organizing, understanding, searching, and summarizing a corpus of documents. Let V be the set of all words that appear at least once in at least one document. By definition, a topic is a distribution on V. LDA posits that for every word in every document, there is a latent topic from which that word is drawn. In standard LDA, the number of topics, T, must be specified in advance. A prior distribution is placed on the T topics, and also on the latent word-specific topic-indicator variables that are associated with each word in the corpus. The need to specify T in advance is problematic. If we put a prior on T, then the distribution on the latent variables is a mixture of distributions on spaces of different dimensions, and estimating this mixture distribution by MCMC is very challenging. We present a variant of the Metropolis-Hastings algorithm that can be used to effectively estimate this mixture distribution and in particular the posterior distribution on the number of topics. We also provide theory to justify that our algorithm can correctly estimate the true posterior distribution of T given the words. We evaluate the performance of our algorithm on synthetic data, with a comparison with the existing method. We also give an illustration on a collection of articles from Wikipedia.

email: [email protected]

1h. APPLYING A STOCHASTIC VOLATILITY MODEL TO US STOCK MARKETS WITH A UMM UNDERGRADUATE STUDENT
Jong-Min Kim*, University of Minnesota, Morris; Li Qin, University of Minnesota, Morris

This research is for a University of Minnesota, Morris statistics undergraduate senior project. We use the log-normal stochastic volatility model as a basis for investigating the absolute returns and the squared returns as two measures of latent volatility in financial markets. We also use linear regression with time-varying parameters for investigating the relationships in financial markets. Furthermore, we show the log volatility forecasts by using the log-normal stochastic volatility model.

email: [email protected]
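For reference, the canonical log-normal stochastic volatility model underlying abstract 1h can be written, in one standard parameterization (the authors' exact specification may differ), as

\[
r_t = \exp(h_t/2)\,\varepsilon_t, \qquad
h_t = \mu + \phi\,(h_{t-1}-\mu) + \sigma_\eta\,\eta_t, \qquad
\varepsilon_t,\ \eta_t \overset{iid}{\sim} N(0,1),
\]

where r_t is the return and h_t the latent log-volatility; the absolute and squared returns mentioned in the abstract are the observable proxies for the latent volatility exp(h_t).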

1i. A MIXTURE MODEL OF HETEROGENEITY IN TREATMENT RESPONSE
Hongbo Lin*, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health, Indianapolis; Changyu Shen, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health, Indianapolis

In clinical trials, randomized studies are designed to estimate the average treatment effect. However, it is widely accepted that heterogeneity may exist in treatment effects; some patients may benefit from a medical intervention, while others may not. We propose a mixture-model based approach to study the heterogeneity of the treatment effect. We consider a latent binary variable that indicates whether or not a subject will benefit from an intervention. Our mixture model combines a logistic formulation for the probability of a patient benefitting from an intervention with proportional hazards models conditional on the status of the latent variable. An EM algorithm is used to estimate the parameters in the model. Standard errors are calculated using Louis's method. Simulations are performed to study the properties of the estimators. The method is also applied to a real randomized study that compared the Implantable Cardioverter Defibrillator (ICD) with conventional medical therapy in reducing total mortality in a low ejection fraction population.

email: [email protected]

1j. BAYESIAN RANDOM GRAPH MIXTURE MODEL FOR COMMUNITY DETECTION IN WEIGHTED NETWORKS
Christopher Bryant*, University of North Carolina, Chapel Hill; Mihye Ahn, University of North Carolina, Chapel Hill; Hongtu Zhu, University of North Carolina, Chapel Hill; Joseph Ibrahim, University of North Carolina, Chapel Hill

The network paradigm has become a popular approach for modeling complex systems, with applications ranging from social sciences to genetics to neuroscience and beyond. Often the individual connections between network nodes are of less interest than network characteristics such as its community structure - the tendency in many real-data networks for nodes to be naturally organized in groups with dense connections between nodes in the same (unobserved) group but sparse connections between nodes in different groups. Most community detection algorithms involve optimization of various connectedness measures in order to achieve this structure, rather than an explicit probabilistic framework. Random graph mixture models utilize such a framework and can accurately capture latent communities in either binary or weighted networks. We fit a Bayesian hierarchical model to Gaussian-weighted networks via Gibbs sampling, which allows for community detection across multiple subjects and even for small graphs or sub-graphs. We show results from simulated networks and apply the method to estimate the community structures in the functional resting brain networks of 185 subjects from the ADHD-200 sample.

email: [email protected]

1k. TIME SERIES FORECASTING USING MODEL-BASED CLUSTERING AND MODEL AVERAGING
Fan Tang*, University of Iowa; Joseph Cavanaugh, University of Iowa

Time series forecasting is an important practical problem. By incorporating information from series that exhibit similar long-term and transitory behaviors, we can potentially improve the forecasting accuracy for a particular series of interest. However, identifying and appropriately utilizing a cluster of related series from a larger pool presents a daunting challenge. This paper introduces a time-series forecasting procedure that relies on model-based clustering and model averaging. The clustering algorithm employs a state-space model comprised of three latent structures: a long-term trend component; a seasonal component, to capture recurring global patterns; and an anomaly component, to reflect local perturbations. A two-step clustering algorithm is applied to identify series that are both globally and locally correlated, based on corresponding smoothed latent structures. For each series in a particular cluster, a set of forecasting models are fit, using covariate series from the same cluster. To fully utilize the cluster information and to improve forecasting for a series of interest, multi-model averaging is employed. The proposed technique is applied to a collection of monthly disease incidence series. Our approach yields both clinically and statistically meaningful clusters, and through model averaging, produces more accurate forecasts than any individual model.

email: [email protected]
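The mixture structure described in abstract 1i can be summarized in the usual notation for latent-class survival mixtures (an illustration of the general idea, not the authors' exact formulation):

\[
P(B_i=1 \mid x_i) = \frac{\exp(x_i^\top\gamma)}{1+\exp(x_i^\top\gamma)}, \qquad
\lambda(t \mid x_i, B_i=b) = \lambda_{0b}(t)\exp(x_i^\top\beta_b), \quad b\in\{0,1\},
\]

where B_i is the latent indicator of benefit from the intervention, so the observed-data likelihood is a two-component mixture of proportional hazards contributions, maximized via EM with Louis's method giving the standard errors.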

1l. MULTILEVEL FUNCTIONAL PRINCIPAL COMPONENTS ANALYSIS OF SURFACES WITH APPLICATION TO CT IMAGE DATA OF PEDIATRIC THORACIC SHAPE
Lucy F. Robinson*, Drexel University; Jonathan Harris, Drexel University; Sriram Balasubramanian, Drexel University

We propose a multilevel functional principal components model for multivariate spatial functional data with one- and two-dimensional arguments. As a motivating example, we consider 3D CT image-based reconstructions of rib cages from normative pediatric subjects and those with skeletal deformities. Variation and symmetry in the shape of individual ribs and of the exterior surface of each side of the rib cage are of interest. Existing analysis methods for thoracic shape typically focus on a priori defined simple geometric summaries. The data have a multilevel structure with multiple functional objects (ribs, sides) observed within each subject's image, and variation is modeled at the levels of subject, rib, and side. We propose a multilevel functional principal components model for the shape of the ribs and the ribcage surface, using a spherical harmonic basis to represent the functional objects. Derived components are used to assess lateral symmetry within each subject, to model the relationship between shape and covariates of interest, and to identify latent clinical subpopulations in patients with thoracic deformities.

email: [email protected]

1m. A NEW APPROACH FOR TREATMENT NONCOMPLIANCE WITH STRUCTURAL ZERO DATA
Pan Wu*, Christiana Care Health System

In randomized controlled trials, the confounding of non-compliance after initial treatment assignment is a serious problem that could lead to biased estimation of the treatment effect and cause implausible interpretation of study results. Further, it is usually inappropriate to assume that the variable measuring post-treatment non-compliance follows a single-mode distribution such as the normal or Poisson, especially for mental health studies, since such a non-compliance variable, i.e., the amount of treatment participation, reflects the attitude of acceptance of such treatment by patients, which can be quite heterogeneous across patients. Existing approaches are unable to address such non-compliance patterns that are described by models for structural zero data. In the talk, we propose a new framework of Structural Equation Models with robust inference to estimate the causal effect between two active treatment arms with non-compliance in each group. The proposed models are able to address the patient heterogeneity in acceptance of treatment. Instead of using likelihood based inference, the proposed methods require no assumption of a parametric distribution and offer consistent estimation of model parameters with asymptotically normal distributions under mild regularity conditions.

email: [email protected]

2. POSTERS: Imaging Methods and Applications

2a. DETERMINING MULTIMODAL NEUROIMAGING MARKERS OF PARKINSON'S DISEASE
DuBois Bowman*, Columbia University; Weingiong Xue, Boehringer Ingelheim; Daniel Drake, Columbia University

Advances in biomedical imaging technology have led to an increase in health research studies that collect large-scale, multimodal data sets, frequently in conjunction with genomic data, and biologic and clinical measures. Such studies provide an unprecedented opportunity for cross-cutting investigations that stand to gain a deeper understanding of the pathophysiology associated with major diseases. We develop a Bayesian hierarchical model for the analysis of multimodal neuroimaging data to investigate neural markers of Parkinson's disease (PD). Our model incorporates imaging data, reflecting both functional and structural characteristics of the brain, incorporates spatial correlations between distinct brain regions, and yields classifications of subjects as either PD patients or healthy controls (HCs). Applying the model to multimodal magnetic resonance-based images, we demonstrate the ability to isolate neural characteristics that reflect accurate signatures of PD and that hold promise for serving as useful early stage PD biomarkers.

email: [email protected]

2b. SEGMENTATION OF INTRACEREBRAL HEMORRHAGE IN CT SCANS USING LOGISTIC REGRESSION
John Muschelli*, Johns Hopkins Bloomberg School of Public Health; Natalie Ullman, Johns Hopkins School of Medicine; Daniel Hanley, Johns Hopkins School of Medicine; Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health

Intracranial hemorrhage (ICH) is a neurological condition that results from a blood vessel rupturing into tissues and possibly the ventricles of the brain, and is often fatal. X-ray computed tomography (CT) scanning is the most commonly used diagnostic tool in patients with ICH and allows quantitative description of ICH in Hounsfield units (HU). The size of the ICH is highly predictive of good functional outcome in patients. The gold standard measurement of ICH is manual segmentation of CT scans, which is time-consuming and subject to intra- and inter-observer variability. We present a regression modeling framework for estimating the probability of ICH in a voxel. We estimated this model from 10 patient scans, out of 112 scans, and estimated model performance on the left-out 92 scans. The area under the curve (AUC) for a receiver operating characteristic (ROC) curve, partial AUC (pAUC), and total accuracy (% correctly classified) were estimated to determine which segmentation methods performed well. We achieved greater than 90% accuracy in all left-out scans. This model represents the first automated segmentation procedure for ICH using regression methods. As such, we can infer the predictive power for each explanatory variable.

email: [email protected]

2c. RELATING MULTI-SEQUENCE LONGITUDINAL DATA FROM MS LESIONS ON STRUCTURAL MRI TO CLINICAL COVARIATES AND OUTCOMES
Elizabeth Sweeney*, Johns Hopkins Bloomberg School of Public Health; Blake Dewey, National Institute of Neurological Disease and Stroke, National Institutes of Health; Daniel Reich, National Institute of Neurological Disease and Stroke, National Institutes of Health; Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health; Russell Shinohara, University of Pennsylvania; Ani Eloyan, Johns Hopkins Bloomberg School of Public Health

Structural magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients. The formation of these lesions is a complex sequence of inflammation, degeneration, and repair that MRI has been shown to be sensitive to. We characterize the lesion formation process with multi-sequence structural MRI. We have longitudinal MRI from 60 MS patients, each with between 10 and 40 studies consisting of a T1-weighted, T2-weighted, fluid attenuated inversion recovery (FLAIR) and proton density (PD) volume. We extract the multi-sequence longitudinal voxel intensities from the four volumes using SuBLIME, a method for detection of incident and enlarging MS lesion voxels. Next we spatially and temporally smooth the volumes and use functional principal component analysis to identify voxels that contain permanent damage and repair. We then investigate this repair and permanent damage in relation to clinical covariates such as disease duration, MS subtype, Expanded Disability Status Scale (EDSS), and treatment.

email: [email protected]
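A minimal sketch of the kind of voxel-level logistic regression and ROC evaluation described in abstract 2b (the features and labels here are simulated placeholders, not the authors' imaging pipeline):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Hypothetical per-voxel predictors (e.g., HU intensity, local mean, distance features)
X_train = rng.normal(size=(5000, 3))
y_train = rng.binomial(1, 1 / (1 + np.exp(-X_train[:, 0])))   # simulated ICH labels
X_test = rng.normal(size=(2000, 3))
y_test = rng.binomial(1, 1 / (1 + np.exp(-X_test[:, 0])))

model = LogisticRegression().fit(X_train, y_train)            # voxelwise P(ICH)
p_hat = model.predict_proba(X_test)[:, 1]
print("AUC on held-out voxels:", roc_auc_score(y_test, p_hat))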

2d. USING MULTIPLE IMPUTATION TO EFFICIENTLY CORRECT MAGNETIC RESONANCE IMAGING DATA IN MULTIPLE SCLEROSIS
Alicia S. Chua*, Brigham and Women's Hospital, Boston; Svetlana Egorova, Brigham and Women's Hospital, Boston; Mark C. Anderson, Brigham and Women's Hospital, Boston; Mariann Polgar-Turcsanyi, Brigham and Women's Hospital, Boston; Tanuja Chitnis, Brigham and Women's Hospital, Boston; Howard L. Weiner, Brigham and Women's Hospital, Boston; Charles R. Guttmann, Brigham and Women's Hospital, Boston; Rohit Bakshi, Brigham and Women's Hospital, Boston; Brian C. Healy, Brigham and Women's Hospital, Boston

Current technologies have enabled fully/semi-automated segmentation of MRI scans for the assessment of multiple sclerosis (MS). However, manual correction of these images by an expert reader remains desirable. Since automated segmentation data awaiting manual correction is "missing", we proposed to use multiple imputation (MI) to fill-in the missing manually-corrected MRI data for measures of brain atrophy and lesion burden. Scans from 1370 patients enrolled in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB) study were identified. Simulation studies were conducted to assess the performance of MI with missing data both missing completely at random and missing at random. An imputation model including the semi-automated data explained a very high proportion of the variance in the manually corrected data for both outcome measures (R2 > .90 for each), demonstrating the potential to accurately impute the missing data. Further, our results demonstrate that MI allows for the accurate estimation of group differences with little to no bias and with similar precision compared to an analysis with no missing data. We believe that our findings provide important insights for efficient correction of automated or semi-automated MRI measures to reduce the burden of manual correction.

email: [email protected]

2e. BACKGROUND ADJUSTMENT AND VOXELWISE INFERENCE FOR TEMPLATE-BASED GAUSSIAN MIXTURE MODELS
Meng Li*, North Carolina State University; Armin Schwartzman, North Carolina State University

In brain oncology, it is routine to analyze the progression or remission of the disease based on the differences between a pre-treatment and a post-treatment Positron Emission Tomography (PET) scan. The analysis is challenging because differences between two scans are expected even in the regions that are not affected by the disease. To overcome this problem, it has been previously proposed to segment the images using a template-based Gaussian mixture model (GMM) and then adjust the background of the two scans within each class, making the differences in the disease regions stand out. However, in spite of the anatomical guidance provided by the spatial template, the voxelwise mixture probabilities are typically not accurately estimated, making the background adjustment and inference difficult. In this paper, we propose a statistical testing procedure to detect localized differences between the images using a template-based GMM approach. We show that the voxelwise test statistic produced by background adjustment is very close to the standard Gaussian in a wide range of scenarios, making it suitable for statistical inference. In particular, the standard Gaussian approximation is stable even when the mixture probabilities are not accurately estimated, and it tends to be conservative at the tails, assuring the test's validity. We confirm the good performance of the proposed approach by simulations and phantom experiments. The proposed approach can be applied directly by practitioners since the resulting p-value can provide immediate reference when making conclusions and decisions with respect to the change of the disease status.

email: [email protected]

2f. FAST, FULLY BAYESIAN SPATIOTEMPORAL INFERENCE FOR fMRI
Donald R. Musgrove*, University of Minnesota; John Hughes, University of Minnesota; Lynn E. Eberly, University of Minnesota

We propose a sparse spatial Bayesian variable selection method for functional magnetic resonance imaging (fMRI). Typical fMRI experiments generate huge datasets with complex spatiotemporal dependence structures. To ease the computational burden, we separate the brain into three-dimensional parcels whereby inference occurs parcel-wise in parallel. Volume element (voxel) activation within parcels is modeled as a series of autocorrelated regressions on a lattice. Regressors represent change in blood oxygenation in response to stimuli while indicator variables capture the nonzero change. Via a reparameterized Gaussian Markov random field prior, a sparse spatial generalized linear mixed model (SSGLMM) is used to model spatial dependence among indicator variables within a parcel for a given stimulus. In particular, the SSGLMM accounts for both large-scale and small-scale spatial variation. Simulations show that the parcellation performs well under varying assumptions. Indicators on parcel boundaries do not suffer edge effects and maintain a low false discovery rate. With an event related experiment, we show that the model is easy to implement and offers certain advantages over whole brain modeling.

email: [email protected]

2g. BAYESIAN SPATIAL VARIABLE SELECTION FOR ULTRA-HIGH DIMENSIONAL NEUROIMAGING DATA: A MULTIRESOLUTION APPROACH
Yize Zhao*, Statistical and Applied Mathematical Sciences Institute; Jian Kang, Emory University; Qi Long, Emory University

Ultra-high dimensional variable selection has become increasingly important in analysis of neuroimaging data. For example, in the Autism Brain Imaging Data Exchange (ABIDE) study, neuroscientists are interested in identifying important biomarkers for early detection of the autism spectrum disorder (ASD) using high resolution brain images that include hundreds of thousands of voxels. However, most existing methods are not feasible for solving this problem due to their extensive computational costs. In this work, we propose a novel multiresolution variable selection procedure under a Bayesian probit regression framework. It recursively uses posterior samples for coarser-scale variable selection to guide the posterior inference on finer-scale variable selection, leading to very efficient Markov chain Monte Carlo (MCMC) algorithms. The proposed algorithms are computationally feasible for ultra-high dimensional data. Also, our model incorporates two levels of structural information into variable selection using Ising priors: the spatial dependence between voxels and the functional connectivity between anatomical brain regions. Applied to the resting state functional magnetic resonance imaging (R-fMRI) data in the ABIDE study, our methods identify voxel-level imaging biomarkers highly predictive of the ASD, which are biologically meaningful and interpretable. Extensive simulations also show that our methods achieve better performance in variable selection compared to existing methods.

email: [email protected]

2h. ANALYSIS OF HIGH DIMENSIONAL BRAIN SIGNALS IN DESIGNED EXPERIMENTS USING PENALIZED THRESHOLD VECTOR AUTOREGRESSION
Lechuan Hu*, University of California, Irvine; Hernando Ombao, University of California, Irvine

One way to measure cortical activity is by electroencephalograms (EEG), which are recorded across many channels on the entire surface of the scalp. Using EEG, one can infer the nature of neuronal activity in the cortex and the cross-regional interactions. In designed experiments there are many high-dimensional EEG traces recorded across many trials. Our goal is to infer the brain dynamics using all the high dimensional time series data using threshold vector autoregressive models (T-VAR). Due to the sheer size and dimension of the data, it is difficult to estimate the parameters in the VAR model. Here we will develop a complexity-penalized T-VAR to infer connectivity between brain regions. We will use the proposed model to study connectivity in resting state using 1-second multichannel EEG traces recorded for over 3 minutes. This work has been in collaboration with the Space-Time Modeling Group at UC Irvine.

email: [email protected]
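As background for abstract 2h, a threshold VAR can be written in a generic two-regime form (the regime variable and the penalization used by the authors are not specified here):

\[
X_t \;=\; \begin{cases}
A_1 X_{t-1} + \varepsilon_t, & z_{t-1} \le \tau,\\
A_2 X_{t-1} + \varepsilon_t, & z_{t-1} > \tau,
\end{cases}
\qquad \varepsilon_t \sim N(0,\Sigma),
\]

where X_t collects the EEG channels at time t, z_{t-1} is a threshold variable and tau the threshold; sparsity penalties on the transition matrices A_1 and A_2 are what keep the high-dimensional fit tractable.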

2i. SPATIALLY WEIGHTED REDUCED-RANK FRAMEWORK FOR NEUROIMAGING DATA WITH APPLICATION TO ALZHEIMER'S DISEASE
Mihye Ahn*, University of Nevada, Reno; Haipeng Shen, University of North Carolina, Chapel Hill; Chao Huang, University of North Carolina, Chapel Hill; Yong Fan, University of Pennsylvania; Hongtu Zhu, University of North Carolina, Chapel Hill

In neuroimaging studies, it is challenging to incorporate multiple subjects for group inference due to spatial-temporal functional variation. In this paper, we propose a new modelling framework for analyzing functional connectivity patterns across subjects by considering spatial and temporal similarity on whole brain images. To reduce noise sensitivity, we conduct the analysis in the frequency domain, and also impose sparsity on the frequency basis function for better interpretation. From the framework, we also extract the subject-specific spatial factors, which enable group comparison. We discuss optimization strategies to avoid lack of memory in practice. Numerical results show that the spatially weighted framework has lower variability regardless of poor alignment and high inter-subject variability. Finally, we apply the proposed method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data.

email: [email protected]

2j. HIGHLY ADAPTIVE TEST FOR GROUP DIFFERENCES IN BRAIN FUNCTIONAL CONNECTIVITY
Junghi Kim*, University of Minnesota; Wei Pan, University of Minnesota

Resting-state functional magnetic resonance imaging (rs-fMRI) and other technologies have been offering increasing evidence showing that altered brain functional networks are associated with neurological illnesses such as Alzheimer's disease. However, group-level network analysis is both challenging and necessary due to the high dimensionality of network models and high noise levels in neuroimaging data. Varoquaux and Craddock (2013) highlighted that "there is currently no unique solution, but a spectrum of related methods and analytical strategies" to learn and compare brain connectivity. An important issue is how to choose several critical parameters in estimating a network, such as what association measure to use and what is the sparsity of the estimated network. In particular, an optimal choice of a parameter for network estimation may not be optimal for testing. On the other hand, mis-specified values of these parameters may lead to extremely low-powered tests. Here we present highly adaptive tests for group differences in brain connectivity, which automatically combine statistical evidence against a null hypothesis from multiple sources across a wide range of the plausible parameter values. These highly adaptive tests are not only easy to use, but also high-powered robustly across various scenarios. The advantages of these novel tests are demonstrated on realistically simulated data and an Alzheimer's disease dataset.

email: [email protected]

2k. PRE-SURGICAL fMRI DATA ANALYSIS USING A SPATIALLY ADAPTIVE CONDITIONALLY AUTOREGRESSIVE MODEL
Zhuqing Liu*, University of Michigan; Veronica J. Berrocal, University of Michigan; Andreas J. Bartsch, University of Heidelberg; Timothy D. Johnson, University of Michigan

Spatial smoothing is an essential step in the analysis of functional magnetic resonance imaging (fMRI) data. One standard smoothing method is to convolve the image data with a three-dimensional Gaussian kernel that applies a fixed amount of smoothing to the entire image. In pre-surgical brain image analysis where spatial accuracy is paramount, this method, however, is not reasonable as it can blur the boundaries between activated and deactivated regions of the brain. Moreover, while in a standard fMRI analysis strict false positive control is desired, for pre-surgical planning false negatives are of greater concern. To this end, we propose a novel spatially adaptive conditionally autoregressive model with smoothing variances that are proportional to error variances, allowing the degree of smoothing to vary across the brain, and present a new loss function that allows for the asymmetric treatment of false positives and false negatives. We compare our proposed model with two existing spatially adaptive smoothing models. Simulation studies show that our model outperforms these other models; as a real application, we apply the proposed model to the pre-surgical fMRI data of a patient to assess peri- and intra-tumoral brain activity.

email: [email protected]

2l. SEMIPARAMETRIC BAYESIAN MODELS FOR LONGITUDINAL MR IMAGING DATA WITH MULTIPLE CONTINUOUS OUTCOMES
Xiao Wu*, University of Florida; Michael J. Daniels, University of Texas, Austin

This research is motivated by data from a Duchenne Muscular Dystrophy study on changes in muscle imaging data to capture disease progression over time. We develop a semiparametric Bayesian modeling approach for MR imaging analysis that involves combining multiple measures of multiple muscles over time. Dependence among different outcomes is induced through latent variables, and nonparametric priors are used for the random effects distribution. A Markov chain Monte Carlo algorithm is proposed for estimating the posterior distributions of the parameters and latent variables.

email: [email protected]

2m. IMPROVING RELIABILITY OF SUBJECT-LEVEL RESTING-STATE BRAIN PARCELLATION WITH EMPIRICAL BAYES SHRINKAGE
Amanda F. Mejia*, Johns Hopkins University; Mary Beth Nebel, Johns Hopkins University; Haochang Shou, Johns Hopkins University; Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health; James J. Pekar, Johns Hopkins University School of Medicine; Stewart Mostofsky, Johns Hopkins University; Brian Caffo, Johns Hopkins University; Martin Lindquist, Johns Hopkins University

A recent interest in resting state functional magnetic resonance imaging (rsfMRI) lies in subdividing the human brain into functionally distinct regions of interest. One common parcellation technique is clustering, which begins with a measure of similarity between voxels. The goal of this work is to improve the reproducibility of single-subject parcellation using shrinkage-based estimators of such measures, allowing the noisy subject-specific estimator to borrow strength from a larger population of subjects. We present several shrinkage estimators and outline methods for estimating the within-subject variance when multiple scans are not available for each subject. We perform shrinkage on raw inter-voxel correlation estimates and use both raw and shrinkage estimates to produce parcellations by performing clustering on the voxels. Using two datasets - a simulated dataset where the true parcellation is known, and a test-retest dataset consisting of two 7-minute resting-state fMRI scans from 20 subjects - we show that parcellations produced from shrinkage correlation estimates have higher reliability and validity than those produced from raw correlation estimates. Application to test-retest rsfMRI data shows that using shrinkage estimators increases the reproducibility of subject-specific parcellations of the motor cortex by up to 30 percent.

email: [email protected]
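The shrinkage idea in abstract 2m can be illustrated with the generic empirical Bayes form (an illustration of the general principle, not the authors' specific estimators): for a subject-specific similarity estimate \hat{r}_i with within-subject variance \sigma^2_w and a population mean \bar{r} with between-subject variance \sigma^2_b,

\[
\tilde{r}_i \;=\; \lambda\,\bar{r} + (1-\lambda)\,\hat{r}_i,
\qquad
\lambda \;=\; \frac{\sigma^2_w}{\sigma^2_w + \sigma^2_b},
\]

so that noisier subject-level estimates (large within-subject variance) are pulled more strongly toward the group average before clustering.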

3. POSTERS: Clinical Trials, Adaptive Designs and Applications

3a. THE ROLE OF STATISTICIANS IN REGULATORY DRUG SAFETY EVALUATION
Clara Kim*, U.S. Food and Drug Administration; Mark Levenson, U.S. Food and Drug Administration

The Food and Drug Administration Amendments Act of 2007 granted the FDA the authority to require post-marketing safety studies and labeling changes to include new safety information. Since then, FDA has substantially strengthened its safety program for marketed drugs. Major actions to advance drug safety monitoring include enhanced capabilities of statistical analysis. The Division of Biometrics 7 (DB7) of the Office of Biostatistics in the Center for Drug Evaluation and Research of FDA is dedicated to full-cycle drug safety evaluation. This division is responsible for meta-analyses, evaluating clinical trials designed primarily to study safety outcomes, and observational studies submitted as a post-marketing requirement. Additionally, DB7 has expertise in the design and statistical methods used in studies that utilize surveillance systems, and registry or health care databases, such as Sentinel and FDA-initiated pharmacoepidemiological studies. This poster will describe these activities with examples that exemplify DB7 contributions that resulted in regulatory actions.

email: [email protected]

3b. ANALYZING MULTIPLE ENDPOINTS IN A CONFIRMATORY RANDOMIZED CLINICAL TRIAL: AN APPROACH THAT ADDRESSES STRATIFICATION, MISSING VALUES, BASELINE IMBALANCE AND MULTIPLICITY FOR STRICTLY ORDINAL OUTCOMES
Hengrui Sun*, University of North Carolina, Chapel Hill; Atsushi Kawaguchi, Kyoto University, Japan; Gary Koch, University of North Carolina, Chapel Hill

Background: Many confirmatory randomized clinical trials that compare two treatments have strictly ordinal response outcomes with a stratified design. Multiple endpoints are often collected when one single endpoint does not represent the overall efficacy of the treatment. Baseline imbalance and missing values add another layer of difficulty to the analysis plan. Therefore, the development of an approach that provides a consolidated solution is essential. Methods: Multivariate Mann-Whitney estimators with stratification adjustment were used to handle the strictly ordinal responses. Randomization based nonparametric analysis of covariance was applied to account for the possible baseline imbalances. Several approaches that handle missing values were compared. A global test followed by a closed testing procedure was conducted so that the family wise error rate (FWER) was controlled in the strong sense in the analysis of multiple endpoints. Results: The data analyzed are from a clinical trial that compares a test treatment and a control for pain management for patients with osteoarthritis. Four outcomes indicating joint pain, stiffness and functional status were analyzed collectively and individually through the procedures. Treatment efficacy was observed in the combined endpoint as well as in the individual endpoints. Conclusions: The proposed approach is effective in solving the aforementioned problems simultaneously.

email: [email protected]

3c. COMPARING THE STATISTICAL POWER OF ANALYSIS OF COVARIANCE AFTER MULTIPLE IMPUTATION AND THE MIXED MODEL IN TESTING THE TREATMENT EFFECT FOR PRE-POST STUDIES WITH LOSS TO FOLLOW-UP
Wenna Xi*, The Ohio State University; Michael L. Pennell, The Ohio State University; Rebecca R. Andridge, The Ohio State University; Electra D. Paskett, The Ohio State University

In pre-post studies with complete follow-up, previous studies have shown that analysis of covariance (ANCOVA) is more powerful than the change-score analysis in comparing the intervention group to control. However, there have been no comparisons of power under missing post-test values. The goal of this study was to compare the power of two methods, ANCOVA after multiple imputation (MI) and the mixed model, in testing the treatment effect when post-test values are missing. To do so, we analyzed the BePHIT data and performed simulation studies. Four methods were used and compared: ANCOVA after MI, complete-case ANCOVA, the all-available data mixed model, and the complete-case mixed model. Simulation studies were conducted under various sample sizes, missingness rates, and missingness scenarios. In the analysis of the BePHIT study data, ANCOVA after MI had the smallest p-value. The simulation results demonstrated that ANCOVA after MI was usually more powerful than the all-available data mixed model when the missingness percentage was moderate (20% and 30%). However, the power of ANCOVA after MI dropped the fastest as the missingness rate increased and, in most simulated scenarios, was the least powerful method when 50% of the post-test outcomes were missing.

email: [email protected]

158 ENAR 2015 | Spring Meeting | March 15–18 group to control. However, there have 3d. EXTENDING LOGISTIC REGRES- 3e. DOSE-FINDING APPROACH been no comparisons of power under SION LIKELIHOOD RATIO TEST BASED ON EFFICACY AND TOX- missing post-test values. The goal of ANALYSIS TO DETECT SIGNALS ICITY OUTCOMES IN PHASE I this study was to compare the power of OF VACCINE-VACCINE INTER- ONCOLOGY TRIALS FOR MOLEC- two methods: ANCOVA after multiple ACTIONS IN VACCINE SAFETY ULARLY TARGETED AGENTS imputation (MI) and the mixed model, in SURVEILLANCE Hiroyuki Sato*, Pharmaceuticals and testing the treatment effect when post- Kijoeng Nam*, U.S. Food and Drug Medical Devices Agency test values are missing. To do so, we Administration analyzed the BePHIT data and performed Akihiro Hirakawa, Nagoya University simulation studies. Four methods were Nicholas C. Henderson, University of Graduate School of Medicine Wisconsin, Madison used and compared: ANCOVA after MI, Chikuma Hamada, Tokyo University complete-case ANCOVA, the all-available Patricia Rohan, U.S. Food and Drug of Science data mixed model, and the complete- Administration The paradigm of oncology drug develop- case mixed model. Simulation studies Emily Jane Woo, U.S. Food and Drug ment is expanding from cytotoxic agents were conducted under various sample Administration to biological or molecularly targeted sizes, missingness rates, and missing- agents. It is common for cytotoxic agents ness scenarios. In the analysis of the Estelle Russek-Cohen, U.S. Food that the efficacy and toxicity mono- BePHIT study data, ANCOVA after MI and Drug Administration tonically increase with dose escalation. had the smallest p-value. The simulation Adverse vaccine effects (AVEs) might However, for some molecularly tar- results demonstrated that ANCOVA after arise from vaccine interactions in addi- geted agents, the efficacy may exhibit MI was usually more powerful than the tion to AVEs from individual vaccines and non-monotonic patterns in their dose- all-available data mixed model when the may not be detected until the postmar- response relationships. Many existing missingness percentage was moderate ket stage. The Vaccine Adverse Event dose-finding approaches form non-mono- (20% and 30%). However, the power of Reporting System (VAERS) is a national tonic patterns in dose-efficacy curve by ANCOVA after MI dropped the fastest as vaccine safety surveillance program using specific models, such as Quadratic the missingness rate increased and, in co-sponsored by the Centers for Disease model. In this study, we propose a novel most simulated scenarios, was the least Control and Prevention (CDC) and the Bayesian adaptive dose-finding approach powerful method when 50% of the post- Food and Drug Administration (FDA). based on binary efficacy and toxicity test outcomes were missing. The VAERS database contains reports of outcomes. We develop a dose-efficacy email: [email protected] adverse events associated with immuni- model whose parameters are allowed to zation and disproportionality analysis can change before and after the change point be used to explore vaccine interaction of dose in order to take into consideration adverse effects (VIAEs). In this paper, the non-monotonic pattern of the dose- we develop a logistic regression based efficacy relationship. 
3e. DOSE-FINDING APPROACH BASED ON EFFICACY AND TOXICITY OUTCOMES IN PHASE I ONCOLOGY TRIALS FOR MOLECULARLY TARGETED AGENTS
Hiroyuki Sato*, Pharmaceuticals and Medical Devices Agency
Akihiro Hirakawa, Nagoya University Graduate School of Medicine
Chikuma Hamada, Tokyo University of Science

The paradigm of oncology drug development is expanding from cytotoxic agents to biological or molecularly targeted agents. It is common for cytotoxic agents that efficacy and toxicity monotonically increase with dose escalation. However, for some molecularly targeted agents, the efficacy may exhibit non-monotonic patterns in their dose-response relationships. Many existing dose-finding approaches form non-monotonic patterns in the dose-efficacy curve by using specific models, such as a quadratic model. In this study, we propose a novel Bayesian adaptive dose-finding approach based on binary efficacy and toxicity outcomes. We develop a dose-efficacy model whose parameters are allowed to change before and after the change point of dose, in order to take into consideration the non-monotonic pattern of the dose-efficacy relationship. The change point is obtained as the study dose maximizing the log likelihood given the model parameter estimates. These model parameters are estimated using Markov chain Monte Carlo methods under the assumption that each study dose is the change point. During the trial, we continuously estimate the posterior probabilities of efficacy and toxicity and assign patients to the most appropriate dose based on the decision rules we defined. We evaluate the operating characteristics of the proposed approach through simulation studies under various scenarios.

email: [email protected]
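The change-point mechanic described above can be illustrated, in a simplified non-Bayesian form, by a profile-likelihood scan: for each candidate change-point dose, fit a broken-stick logistic dose-efficacy model and keep the dose that maximizes the log likelihood. Dose levels and data below are hypothetical, and the authors' MCMC machinery is not reproduced.

```python
# Sketch: choose the efficacy change point as the dose maximizing the fitted
# log likelihood of a piecewise (broken-stick) logistic dose-efficacy model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
doses = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # hypothetical dose levels
dose = rng.choice(doses, size=300)
true_eff = 1 / (1 + np.exp(-(-2 + 1.2 * np.minimum(dose, 3.0))))  # plateau after dose 3
eff = rng.binomial(1, true_eff)

best = None
for cp in doses[1:-1]:                                # interior candidate change points
    X = sm.add_constant(np.column_stack([dose, np.maximum(dose - cp, 0.0)]))
    llf = sm.Logit(eff, X).fit(disp=False).llf
    if best is None or llf > best[1]:
        best = (cp, llf)
print("estimated change-point dose:", best[0])
```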

Program & Abstracts 159 Carlo methods under the assumption in these scenarios focus on treatment trials has received attention from advo- that each study dose is the change point. effects within subjects rather than the cacy organizations, there has not been During the trial, we continuously estimate examination of treatment effects across much research documenting the degree the posterior probabilities of efficacy and several cases. For alternating treat- to which such exclusions continue to be toxicity and assign patients to the most ment SCD data, we investigate various widespread. We hypothesize that clinical appropriate dose based on the deci- measures of effect size within each case trials with sponsoring institutions located sion rules we defined. We evaluate the and apply well-established meta-analytic in more racially and ethnically diverse operating characteristics of the proposed methods for examination of treatment areas are more likely to have racial, eth- approach through simulation studies effects across cases. Our motivat- nic, or English fluency-related eligibility under various scenarios. ing example arises from the behavior criteria. We are using data from the Clini- analysis literature, where researchers calTrials.gov database linked to United email: [email protected] are interested in assessing educational States Census and American Community interventions on children with autism Survey data. Our preliminary findings with 3f. EFFECT SIZE MEASURES AND spectrum disorders. respect to English language exclusions suggest that trials located at institutions META-ANALYSIS FOR ALTERNAT- email: [email protected] ING TREATMENT SINGLE CASE in ZIP codes with more residents self- DESIGN DATA identifying as Black/African American or 3g. CLINICAL TRIALS WITH Asian are more likely to require that par- D Leann Long*, West Virginia University EXCLUSIONS BASED ON ticipants be fluent in English. Conversely, Mathew Bruckner, West Virginia RACE, ETHNICITY, AND clinical trials located in areas with more University ENGLISH FLUENCY residents self-identifying as Hispanic are less likely to have English fluency require- Regina A. Carroll, West Virginia Brian L. Egleston*, Fox Chase Cancer ments. Clinical trial statisticians may University Center, Temple University have an opportunity to address inclusion George A. Kelley, West Virginia Omar Pedraza, Fox Chase Cancer Cen- concerns when designing trials. University ter, Temple University email: [email protected] Single case designs (SCD) are employed Yu-Ning Wong, Fox Chase Cancer Cen- in several fields of research where the ter, Temple University treatment and outcome measures of 3h. COMPARING FOUR Roland L. Dunbrack Jr., Fox Chase Can- interest require a high degree of tailor- METHODS FOR ESTIMATING cer Center, Temple University ing to individual cases, as well as when OPTIMAL TREE-BASED TREAT- the study conditions are in short supply. Eric A. Ross, Fox Chase Cancer Center, MENT REGIMES Alternating treatment SCD are charac- Temple University Aniek Sies*, Katholieke Universiteit terized by the swift alternation between J. Robert Beck, Fox Chase Cancer Cen- Leuven different treatments or conditions within ter, Temple University a case, each associated with a distinct Iven Van Mechelen, Katholieke stimulus. 
Unique statistical challenges Recruiting diverse populations to clinical Universiteit Leuven trials helps ensure that study results are arise for SCD, particularly due to smaller When multiple treatment alternatives generalizable to the population at large. sample sizes than desired for traditional are available for a certain disease, an We are currently examining the charac- statistical theory and the repeated nature important challenge is to find an opti- teristics of clinical trials that have explicit of the treatments within a subject. The mal treatment regime, which specifies inclusion criteria related to race, ethnicity, statistical methods generally conducted for each patient the preferred treatment or English fluency. While explicit exclu- sion of African Americans from clinical

3h. COMPARING FOUR METHODS FOR ESTIMATING OPTIMAL TREE-BASED TREATMENT REGIMES
Aniek Sies*, Katholieke Universiteit Leuven
Iven Van Mechelen, Katholieke Universiteit Leuven

When multiple treatment alternatives are available for a certain disease, an important challenge is to find an optimal treatment regime, which specifies for each patient the preferred treatment alternative given his or her pretreatment characteristics. An interesting class of treatment regimes is that of the tree-based ones, because they provide a straightforward and most insightful representation of the decision structure. Recently, several methods for the construction of tree-based regimes have been proposed. Up to now, however, only partial information is available concerning their absolute and relative performance. Our paper addresses this issue by reporting the results of an extensive simulation study to evaluate four tree-based methods, namely "Interaction Trees", "Model-based recursive partitioning", "Quint", and an approach developed by Zhang et al. (2012). The main evaluation criterion is the expected potential outcome if the entire population were subjected to the optimal treatment regime resulting from each method under study.

email: [email protected]

3i. COMPARING METHODS OF ADJUSTING FOR CENTER EFFECTS USING PEDIATRIC ICU GLYCEMIC CONTROL DATA
Samantha Shepler*, Emory University
Scott Gillespie, Emory University
Traci Leong, Emory University

In multi-site randomized clinical trials, randomization is balanced by institution to minimize any confounding center effects. However, in a multi-site, non-randomized clinical trial with a single intervention, this balancing is not possible. In this presentation, we test for center effects in the estimation of length of stay by known prognostic factors. We propose three methods to adjust for these effects, (1) clustering, (2) hierarchical Bayesian, and (3) latent variable, and will explain their practical interpretation. Through simulation, the bias of each method will be calculated. These approaches will be applied to the analysis of glycemic control data in six pediatric Intensive Care Units (ICU) from around the country, utilizing ICU length of stay as our primary endpoint.

email: [email protected]

3j. BAYESIAN DOSE FINDING PROCEDURE BASED ON INFORMATION CRITERION
Lei Gao*, Sanofi
William F. Rosenberger, George Mason University
Zorayr Manukyan, Pfizer Inc.

In dose-finding studies with toxicity-efficacy responses, penalty functions and Bayesian procedures are used to find a single optimal dose with ethical toxicity-efficacy trade-offs. It has been widely seen that such designs can select the wrong dose when the working prior is wrong, largely due to a "stickiness" property of miring at a single dose. As one possible remedy, we present a family of compound optimal designs that involve both efficiency of estimation for updating prior parameters and ethical criteria that minimize highly toxic or ineffective doses. It is shown that most Bayesian sequential designs for dose finding in the literature can be thought of as special cases of this family of designs. We conduct simulations using Markov chain Monte Carlo (MCMC) algorithms to examine the convergence of Bayesian dose finding designs and investigate their operating characteristics.

email: [email protected]
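A heavily simplified illustration of Bayesian toxicity-efficacy trade-offs in dose finding is sketched below, using independent Beta-Binomial posteriors per dose and an ad hoc utility. It is not the compound optimal design of the abstract; the accrued counts, toxicity cap, and utility weights are assumptions.

```python
# Sketch: generic Bayesian toxicity-efficacy trade-off for dose selection using
# independent Beta-Binomial posteriors per dose (purely illustrative).
import numpy as np

# Hypothetical accrued data per dose: (patients, toxicities, responses)
data = {1: (9, 0, 1), 2: (9, 1, 3), 3: (9, 2, 5), 4: (6, 4, 4)}
tox_cap, n_draws = 0.30, 10000
rng = np.random.default_rng(3)

scores = {}
for d, (n, tox, resp) in data.items():
    p_tox = rng.beta(1 + tox, 1 + n - tox, n_draws)
    p_eff = rng.beta(1 + resp, 1 + n - resp, n_draws)
    admissible = (p_tox < tox_cap).mean()        # posterior Pr(toxicity below cap)
    utility = (p_eff - 1.5 * p_tox).mean()       # simple efficacy-toxicity trade-off
    scores[d] = utility if admissible > 0.7 else -np.inf
print("recommended dose:", max(scores, key=scores.get))
```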

Program & Abstracts 161 and two with increasing efficacy) were trials, subpopulations can be defined by trials in evidence synthesis may lead to evaluated using simulation. Results: With different standards of care or histologies. bias in estimation. We call such trials low constant toxicity, standard designs Current designs are inefficient or poten- trial-level outliers. To the best of our performed well regardless of response tially deceiving. We propose a phase knowledge, while heterogeneity and and survival patterns (experimental agent I Bayesian design that shares dose- inconsistency in NMA have been exten- favorable in 87.6% of simulations). Under response information across subgroups sively discussed and well addressed, few stepped toxicity and increasing response/ to improve and quicken dose finding previous papers have considered the survival profiles that were strongly dose- within a subgroup, while allowing the proper detection and handling of trial- dependent, favorable decisions occurred flexibility to drop subgroups that find all level outliers. In this paper (poster), we less frequently (64.0% of simulations). doses overly toxic. Traditionally, patients propose several Bayesian outlier detec- Understanding the performance of stan- are enrolled in cohorts and treated at the tion measures, which are then applied to dard design methods under a variety of updated MTD. However to account for a diabetes data set, and whose perfor- toxicity, response and survival profiles will staggered enrollment between sub- mance is evaluated through simulation be crucial when comparing performance groups, we propose multiple guidelines studies. for dose escalation within each subgroup. with non-standard designs. email: [email protected] In a simulation study, we investigate email: [email protected] three dose-response hierarchical models. For comparison, we investigate three 3n. SUBGROUP ANALYSIS IN CON- 3l. DOSE-FINDING USING corresponding saturated dose-response FIRMATORY CLINICAL TRIALS models that model each subgroup & HIERARCHICAL MODELING Brian Millen*, Eli Lilly and Company FOR MULTIPLE SUBGROUPS dose-response independently. The advent of personalized medicine has email: [email protected] Kristen May Cunanan*, University brought increased attention to the study of Minnesota of subpopulations in confirmatory clinical Joseph S. Koopmeiners, University 3m. DETECTING OUTLYING TRIALS trials. Increasingly, exploratory subgroup of Minnesota IN NETWORK META-ANALYSIS analyses have the potential to influence regulatory recommendations on appro- Primarily, phase I clinical trials determine Jing Zhang*, University of Maryland priate populations for treatment with a new treatment’s highest dosage with Haoda Fu, Eli Lilly and Company medicines in review. The recent EMA draft an acceptable toxicity rate, defined as guideline on Subgroup Analyses in Con- the maximum tolerated dose (MTD), Bradley P. Carlin, University of firmatory Trials focuses on this issue. In via a dose-finding study. In clinical Minnesota addition, confirmatory subgroup analysis trials, we have seen hierarchical model- Network meta-analysis (NMA) expands approaches are increasingly employed ing (HM) used in many applications to the scope of a conventional pairwise by sponsors interested in develop- improve estimation and power, such as meta-analysis to simultaneously handle ing tailored therapies or personalized adaptive drug screening trials. Given multiple treatment comparisons. How- medicines. 
In this talk, we will discuss the success of HM and the success ever, some trials may appear to deviate statistical and inferential considerations of dose-finding methods such as the markedly from the others, and thus be in these settings, along with thoughts on continual reassessment method, we inappropriate to be synthesized in the implications for drug development in the consider a design combining the two NMA. In addition, the inclusion of these future methods to better motivate dose finding for a heterogeneous disease population, email: [email protected] i.e. subpopulations. In oncology clinical

4. POSTERS: Survival Analyses

4a. TIME DEPENDENT COVARIATES IN THE PRESENCE OF LEFT TRUNCATION
Rebecca A. Betensky*, Harvard School of Public Health

A time-varying marker process that is measured at study entry is problematic in the presence of left truncation. In this talk, we describe possible approaches to this problem and explain their drawbacks. We propose methods to appropriately handle this problem based on residuals of the marker process and alternative modeling of it. We present simulations and apply the methods to a longitudinal Alzheimer's disease study.

email: [email protected]

4b. ON THE ESTIMATORS AND TESTS FOR THE SEMIPARAMETRIC HAZARDS REGRESSION MODEL
Seung-Hwan Lee*, Illinois Wesleyan University

In the accelerated hazards regression model with censored data, estimation of the covariance matrices of the regression parameters is difficult, since it involves the unknown baseline hazard function and its derivative. This work provides simple but reliable procedures that yield asymptotically normal estimators whose covariance matrices can be easily estimated. For the leukemia cancer data, the issue of interest is a comparison of two groups of patients that had two different kinds of bone marrow transplants. It is found that the differences between the two groups are well described by a time-scale change in hazard functions, i.e., the accelerated hazards model.

email: [email protected]

4c. A MARTINGALE APPROACH TO ESTIMATING CONFIDENCE BAND WITH CENSORED DATA
Eun-Joo Lee*, Millikin University

Some non-parametric simultaneous confidence bands for the survival function are developed when data are randomly censored on the right. To construct the confidence bands, a computer-assisted method is utilized; this approach requires no distributional assumptions, so the confidence bands can be easily estimated. To improve the estimation procedures for finite sample sizes, the log-minus-log transformation is employed.

email: [email protected]
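As an illustration of the log-minus-log transformation mentioned in abstract 4c, the sketch below computes a Kaplan-Meier estimate with pointwise log(-log)-based confidence limits. It is a standard pointwise interval, not the simultaneous band developed in the poster, and the toy data are invented.

```python
# Sketch: Kaplan-Meier estimate with pointwise log(-log) confidence limits.
import numpy as np

def km_loglog_ci(time, event, z=1.96):
    order = np.argsort(time)
    time, event = np.asarray(time)[order], np.asarray(event)[order]
    n = len(time)
    at_risk = n - np.arange(n)          # assumes untied event times
    surv, var_term, out = 1.0, 0.0, []
    for t, d, r in zip(time, event, at_risk):
        if d:
            surv *= 1 - 1 / r
            var_term += 1 / (r * (r - 1)) if r > 1 else 0.0
            se_log = np.sqrt(var_term) / abs(np.log(surv))   # Greenwood, log(-log) scale
            lo = surv ** np.exp(z * se_log)
            hi = surv ** np.exp(-z * se_log)
            out.append((t, surv, lo, hi))
    return out

t = [5, 8, 12, 16, 23, 27, 30, 33, 41, 47]
e = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
for row in km_loglog_ci(t, e):
    print("t=%g  S=%.3f  (%.3f, %.3f)" % row)
```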
4d. NOVEL IMAGE MARKERS FOR NON-SMALL CELL LUNG CANCER CLASSIFICATION AND SURVIVAL PREDICTION
Hongyuan Wang*, University of Kentucky
Fuyong Xing, University of Florida
Hai Su, University of Florida
Arnold Stromberg, University of Kentucky
Lin Yang, University of Florida

Non-small cell lung cancer (NSCLC), the most common type of lung cancer, is one of the serious diseases causing death for both men and women. Computer-aided diagnosis and survival prediction of NSCLC is of great importance in providing assistance to diagnosis and personalized therapy planning for lung cancer patients. In this presentation we propose an integrated framework for NSCLC computer-aided diagnosis and survival analysis using novel image markers. The entire biomedical imaging informatics framework consists of cell detection, segmentation, classification, discovery of image markers, and survival analysis. After the extraction of a set of extensive cellular morphological features using efficient feature descriptors, eight different classification techniques that can handle high-dimensional data have been evaluated and then compared for computer-aided diagnosis. Moreover, a Cox proportional hazards model is fitted by component-wise likelihood based boosting. Significant image markers have been discovered using the bootstrap analysis, and the survival prediction performance of the model is also evaluated. The proposed framework has been applied to a lung cancer dataset that contains 122 cases with complete clinical information. The classification performance exhibits high correlations between the discovered image markers and NSCLC subtypes. The survival analysis demonstrates strong prediction power of the discovered image markers.

email: [email protected]

Program & Abstracts 163 4e. GENERALIZED ESTIMATING 4f. GENERALIZED ACCELERATED 4g. PENALIZED VARIABLE EQUATIONS FOR MODELING FAILURE TIME SPATIAL SELECTION IN COMPETING RESTRICTED MEAN SURVIVAL FRAILTY MODEL RISKS REGRESSION TIME UNDER GENERAL Haiming Zhou*, University of South Zhixuan Fu*, Yale University CENSORING MECHANISMS Carolina Chirag R. Parikh, Yale University Xin Wang*, University of Michigan Timothy Hanson, University of South School of Medicine Carolina Douglas E. Schaubel, University of Bingqing Zhou, Yale University Michigan Jiajia Zhang, University of South Carolina The penalized variable selection methods Restricted mean lifetime is often of great Flexible incorporation of both geographi- have been extensively studied for stan- clinical interest in practice. Several exist- cal patterning and risk effects in cancer dard time-to-event data. Such methods, ing methods involve explicitly projecting survival models is becoming increas- cannot be directly applied when subjects out patient-specific survival curves using ingly important, due in part to the recent are at risk of several mutually exclusive parameters estimated through Cox availability of large cancer registries. events, known as competing risks. The regression. However, it would often be Most spatial survival models stochasti- proportional subdistribution hazard preferable to directly model the restricted cally order survival curves from different (PSH) model proposed by Fine and Gray mean, to yield more clinically meaning- subpopulations. However, it is common has become a popular semi-parametric ful treatment and covariate effects. We for survival curves from two subpopula- model for time-to-event data with compet- propose generalized estimating equation tions to cross in epidemiological cancer ing risks. It allows for direct assessment methods to relate restricted mean lifetime studies and thus interpretable standard of covariate effects on the cumulative inci- to baseline covariates. The proposed survival models cannot be used without dence function. In this paper, we propose methods avoid potentially problematic some modification. Common fixes are a general penalized variable selection distributional assumptions pertaining the inclusion of time-varying regression strategy that simultaneously handles a restricted survival time, and allow for effects in the proportional hazards model variable selection and parameter estima- censoring to depend on time-dependent or fully nonparametric modeling, either tion in the PSH model. We rigorously factors. Our methods are motivated by of which destroys any easy interpret- establish general asymptotic properties the desire to quantify the impact on pre- ability from the fitted model. To address for the proposed penalized estimators transplant survival of characteristics of this issue, we develop a generalized and present a numerical algorithm for end-stage liver disease (ESLD) patients accelerated failure time model which is implementing the variable selection pro- wait listed for liver transplantation. This interpretable in terms of median regres- cedure. Simulation studies are conducted analysis requires accommodation for sion and able to capture crossing survival to demonstrate the good performance of dependent censoring since pre-transplant curves in the presence of spatial correla- the proposed method. Diseased donor survival is dependently censored by time- tion. 
4f. GENERALIZED ACCELERATED FAILURE TIME SPATIAL FRAILTY MODEL
Haiming Zhou*, University of South Carolina
Timothy Hanson, University of South Carolina
Jiajia Zhang, University of South Carolina

Flexible incorporation of both geographical patterning and risk effects in cancer survival models is becoming increasingly important, due in part to the recent availability of large cancer registries. Most spatial survival models stochastically order survival curves from different subpopulations. However, it is common for survival curves from two subpopulations to cross in epidemiological cancer studies, and thus interpretable standard survival models cannot be used without some modification. Common fixes are the inclusion of time-varying regression effects in the proportional hazards model or fully nonparametric modeling, either of which destroys any easy interpretability from the fitted model. To address this issue, we develop a generalized accelerated failure time model which is interpretable in terms of median regression and able to capture crossing survival curves in the presence of spatial correlation. An efficient Markov chain Monte Carlo algorithm is presented for posterior computation, and an R package is developed to fit the model using compiled C++. We apply our approach to a subset of the prostate cancer data gathered for Louisiana by the Surveillance, Epidemiology, and End Results program of the National Cancer Institute.

email: [email protected]

4g. PENALIZED VARIABLE SELECTION IN COMPETING RISKS REGRESSION
Zhixuan Fu*, Yale University
Chirag R. Parikh, Yale University School of Medicine
Bingqing Zhou, Yale University

Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of several mutually exclusive events, known as competing risks. The proportional subdistribution hazards (PSH) model proposed by Fine and Gray has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish general asymptotic properties for the proposed penalized estimators and present a numerical algorithm for implementing the variable selection procedure. Simulation studies are conducted to demonstrate the good performance of the proposed method. Deceased donor kidney transplant data from the United Network for Organ Sharing illustrate the utility of the proposed method.

email: [email protected]

4h. STATISTICAL MODELING OF GAP TIMES IN PRESENCE OF PANEL COUNT DATA WITH INTERMITTENT EXAMINATION TIMES: AN APPLICATION TO SPONTANEOUS LABOR IN WOMEN
Ling Ma*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Rajeshwari Sundaram, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

In longitudinal studies of serial events where each subject may be observed only at several distinct and random observation times, only the numbers of occurrences of the events are known at the observation times. Data of this type are commonly referred to as panel count data. Most of the existing methods for panel count data focus on statistical inference for the point process, while it is also of interest to make inference for the gap times between events. The application of interest in this project is to provide a framework for modeling gap-time distributions between cervical dilations in the first stage labor process (e.g. 3 cm dilation to 4 cm dilation). One well-known problem in obstetrics is that the start of labor is not clearly defined, and thus the benchmark reference time is often chosen as the end of the process (full dilation at 10 cm) and time is run backwards. We propose a parametric model for the gap times and use random effects to capture the correlation among gap times of the same subject, and we present a maximum likelihood approach for parameter estimation. We will investigate our proposed approach through simulations and show an analysis of Collaborative Perinatal Project study data.

email: [email protected]

4i. COMPETING RISKS MODEL OF SCREENING AND SYMPTOMS DIAGNOSIS FOR PROSTATE CANCER
Sheng Qiu*, University of Michigan
Alexander Tsodikov, University of Michigan

Introduction of screening for prostate cancer using the prostate-specific antigen (PSA) marker of the disease around 1989 led to remarkable dynamics of the incidence of the disease observed in European countries. A competing risks model for cancer screening diagnosis and diagnosis due to symptoms is developed. The risks are driven by a latent process modeling tumor onset. The intensity of screening and the hazard driving prostate cancer diagnosis in the absence of screening are estimated jointly and semiparametrically using estimating equations and the NPMLE method. Examples using data from European cancer registries (EUREG) are illustrated.

email: [email protected]

4j. JOINT MODELING OF RECURRENT EVENT PROCESSES AND INTERMITTENTLY OBSERVED TIME-VARYING BINARY COVARIATE PROCESSES
Shanshan Li*, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis

When conducting recurrent event data analysis, it is common to assume that the covariate processes are observed throughout the follow-up period. In most applications, however, the values of time-varying covariates are only observed periodically rather than continuously. A popular ad-hoc approach is to carry forward the last observed covariate value until it is measured again. This simple approach, however, usually leads to biased estimation. To tackle this problem, we propose to model the covariate effect on the risk of the recurrent events through jointly modeling the recurrent event process and the longitudinal measures. Despite its popularity, estimation of the joint model with binary longitudinal measurements remains a challenge, because the standard linear mixed effects model approach is not appropriate for binary measures. In this work, we postulate a Markov model for the binary covariate process and a random-effect proportional intensity model for the recurrent event process. We use a Markov chain Monte Carlo algorithm to estimate all the unknown parameters. The performance of the proposed estimator is evaluated via simulations. The methodology is applied to an observational study designed to evaluate the effect of Group A streptococcus (GAS) on pharyngitis among school children in India.

email: [email protected]

Program & Abstracts 165 4k. COMPOSITE OUTCOMES Interval-censored case 2 failure time how the treatment effect evolves over VERSUS COMPETING RISKS data arise frequently in longitudinal time. In this manuscript, we propose a studies where the exact failure time method to construct the simultaneous Paul Kolm*, Christiana Care Health cannot be determined but is known confidence band associated with the Systems only to have occurred between two predicted difference by converting an Many randomized trials as well as random observation times. We propose Empirical Likelihood ratio test statistic. observational comparative effectiveness a quantile regression model to analyze Simulation studies are conducted to dem- studies analyze a composite outcome interval-censored data since it relaxes onstrate the superior coverage accuracy that includes several singular outcomes the requirements on the error term and of the proposed confidence band over its of interest. An example of a composite the coefficients are interpretable as direct existing competitor. outcome often used in cardiovascular regression effects on the failure time. It is email: [email protected] research includes death due to cardio- assumed that the conditional quantile of vascular causes / myocardial infarction failure time is a linear function of covari- / stroke / rehospitalization for revascu- ates and the failure time and observation 5. POSTERS: larization. The composite outcome is time are conditional independent. An Causal Inference coded “yes” if any one of the outcomes M-estimator is developed for parameter occurs for a given patient. Although the estimation and the asymptotic distribution study is usually powered on the com- for the estimator is derived. The estimator 5a. A CAUSAL FRAMEWORK posite outcome, separate analyses of the is computed using the convex-concave FOR META ANALYSES single outcomes are often made with the procedure and its confidence intervals Michael E. Sobel*, Columbia University intent of determining the one that exerts are constructed using a subsampling David Madigan, Columbia University the major influence on the composite method. The small sample performance outcome. This study compares analysis of the proposed method is demonstrated Wei Wang*, Columbia University of a composite outcome with an analysis via simulation studies. Finally, we apply We construct a framework for meta-anal- of the outcomes from a competing risks the proposed method to analyze data ysis that helps to clarify and empirically perspective with respect to regression from the Atherosclerosis Risk in Commu- examine the sources of between study coefficient estimates, standard errors, nities Study. heterogeneity in treatment effects. The power and conclusions. email: [email protected] key idea is to consider, for each of the email: [email protected] treatments under investigation, the subject’s potential outcome in each study 4m. EMPIRICAL LIKELIHOOD were he to receive that treatment. We 4l. 
4m. EMPIRICAL LIKELIHOOD CONFIDENCE BANDS FOR THE DIFFERENCE OF SURVIVAL FUNCTIONS UNDER THE PROPORTIONAL HAZARDS MODEL
Mai Zhou, University of Kentucky
Shihong Zhu*, University of Kentucky

When comparing two treatments giving rise to censored time-to-event outcomes, the difference of two predicted individualized survival functions provides valuable information at the individual level about how the treatment effect evolves over time. In this manuscript, we propose a method to construct the simultaneous confidence band associated with the predicted difference by inverting an Empirical Likelihood ratio test statistic. Simulation studies are conducted to demonstrate the superior coverage accuracy of the proposed confidence band over its existing competitor.

email: [email protected]

5. POSTERS: Causal Inference

5a. A CAUSAL FRAMEWORK FOR META ANALYSES
Michael E. Sobel*, Columbia University
David Madigan, Columbia University
Wei Wang*, Columbia University

We construct a framework for meta-analysis that helps to clarify and empirically examine the sources of between-study heterogeneity in treatment effects. The key idea is to consider, for each of the treatments under investigation, the subject's potential outcome in each study were he to receive that treatment. We consider four sources of heterogeneity: 1) response inconsistency, whereby a subject's response to a given treatment varies across different studies, 2) the grouping of non-equivalent treatments, where two or more treatments are grouped and treated as a single treatment under the incorrect assumption that a subject's responses to the different treatments would be identical, 3) non-ignorable treatment assignment, and 4) response-related variability in the composition of subjects in different studies. We then examine the implications of these assumptions for heterogeneity/homogeneity of conditional and unconditional treatment effects. To illustrate the utility of our approach, we re-analyze individual patient data from 29 randomized placebo-controlled studies of the cardiovascular risk of Vioxx, a Cox-2 selective non-steroidal anti-inflammatory drug approved by the FDA in 1999 for the management of pain and withdrawn from the market in 2004.

email: [email protected]

5b. THE PRINCIPAL DIRECTION OF MEDIATION
Oliver Chen*, Johns Hopkins Bloomberg School of Public Health
Elizabeth Ogburn, Johns Hopkins Bloomberg School of Public Health
Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health
Brian Caffo, Johns Hopkins Bloomberg School of Public Health
Martin Lindquist, Johns Hopkins Bloomberg School of Public Health

Mediation analysis is often used in the behavioral sciences to investigate the role of intermediate variables that lie in the causal path between a randomized treatment and an outcome variable. However, little is known about mediation analysis when the intermediate variable (mediator) is a high dimensional vector. For example, in a functional magnetic resonance imaging (fMRI) study of thermal pain we are interested in determining whether brain measurements (over hundreds of thousands of voxels) mediate the relationship between the application of thermal pain and the reported amount of perceived pain. To address the problem of high dimensional mediators, we propose a framework called the principal direction of mediation (PDM). This framework is philosophically similar to principal component analysis (PCA), but addresses a fundamentally different problem: the first principal direction of mediation is the linear combination of high dimensional potential mediators that is simultaneously most strongly predicted by the treatment and predictive of the outcome. We study this method using simulation and an application to data from an fMRI study of thermal pain.

email: [email protected]

5c. DYNAMIC MARGINAL STRUCTURAL MODELS TO TEST THE BENEFIT OF LUNG TRANSPLANTATION TREATMENT REGIMES
Jeffrey A. Boatman*, University of Minnesota
David M. Vock, University of Minnesota

Patients awaiting lung transplantation may confront a difficult decision if offered a low-quality organ: accept the organ or remain on the waiting list with the hope of receiving a better organ. Patients may have multiple opportunities to accept or decline a transplant, and organ assignment is not independent across subjects, but previous statistical methods to infer optimal treatment strategies do not fully account for these problems. To overcome these issues, we extend dynamic marginal structural models, estimated by inverse-of-probability-of-compliance weighted estimators, to estimate the survival benefit if a patient were to follow a certain rule on whether or not to accept an offered organ. Specifically, we are interested in testing the survival benefit of declining organs below a certain threshold of donor quality. We developed an organ quality score based on a Cox regression model of post-transplant survival using donor characteristics as predictors, and we implement our proposed method using data from the United Network for Organ Sharing (UNOS) national registry of lung transplants. Our work may be easily extended to allow for time-varying strategies that accommodate patient condition and the prevalence of donor organs at a particular center.

email: [email protected]

5d. A MODEL BASED APPROACH FOR PREDICTING PRINCIPAL STRATUM MEMBERSHIP IN ENVIRONMENTAL INTERVENTIONS
Katherine E. Freeland*, Johns Hopkins Bloomberg School of Public Health

Environmental interventions targeted at reducing indoor air pollution have shown promise as a method for improving respiratory health outcomes in children by reducing particulate matter (PM) in the home. However, in these interventions, it is difficult to determine the effect of reduced PM, a post-randomization variable, on the respiratory outcomes. Using principal stratification, a framework for calculating principal effects (i.e. effects within a stratum), we are able to measure the effect of reduced PM on respiratory outcomes. These principal effects allow for the comparison of treatment effects for those who would and would not have seen a reduction in PM levels. With the PM reduction variable, we can identify principal strata membership for some individuals in the control and treatment groups. However, the observed data only allow us partial identification of strata for other individuals. We explore the use of various models to predict the partially identified individuals' strata and calculate "principal effects" based on predicted membership. The customary statistical uncertainty of these estimates was explored, along with additional variability introduced by the process of model selection and strata classification. A resampling based estimator of the principal effects was developed, accounting for the major sources of variability in this process.

email: [email protected]

Program & Abstracts 167 outcomes. These principal effects allow so an individual’s total medical costs 5f. GENERALIZING EVIDENCE for the comparison of treatment effects are subject to non-independent right FROM RANDOMIZED TRIALS for those who would and would not have censoring. Moreover, medical costs USING INVERSE PROBABILITY seen a reduction in PM levels. With the are commonly right skewed. Therefore, OF SELECTION WEIGHTS PM reduction variable, we can identify standard regression models and survival Ashley L. Buchanan*, University of principal strata membership for some analysis techniques are inadequate. Lin North Carolina, Chapel Hill individuals in the control and treatment (2000) and Robins and Rotnitzky (1992) groups. However, the observed data have developed linear regression and Michael G. Hudgens, University of North only allow us partial identification of weighting techniques to model medical Carolina, Chapel Hill strata for other individuals. We explore cost from trial data. Since medical costs Stephen R. Cole, University of North the use of various models to predict the are often collected in observational data, Carolina, Chapel Hill partially identified individuals’ strata and we develop propensity score (PS) meth- Results obtained in randomized trials calculate “principal effects” based on ods to estimate costs that are adjusted may not generalize to a target popula- predicted membership. The customary for potential confounding inherent to tion. In a randomized trial, the treatment statistical uncertainty of these estimates observational studies. We compare com- assignment mechanism is always known, was explored, along with additional mon PS methods including generalized but assuming participants are a random variability introduced by the process of linear regression with gamma variances, sample from the target population may model selection and strata classification. stratification, inverse probability weighting be dubious. Lack of generalizability can A resampling based estimator of the (IPW) and doubly robust weighting. Spe- arise when the distribution of treatment principal effects was developed, account- cifically, for the IPW method, we develop effect modifiers in trial participants is dif- ing for the major sources of variability in a joint model with subject-specific ferent from the distribution in the target this process. random effects to account for possible population. We consider an inverse correlation of PS estimates and the email: [email protected] probability of selection weighted (IPSW) probability of observing complete costs. estimator for generalizing trial results to Large sample variances are derived using a target population. The IPSW estimator 5e. PROPENSITY SCORE APPROACH a general estimation equation (GEE) is shown to be consistent and asymp- TO MODELING MEDICAL COST framework. These modeling approaches totically normal. Expressions for the USING OBSERVATIONAL DATA are applied to a cost analysis of two blad- asymptotic variance and a consistent der cancer treatments, cystectomy versus Jiaqi Li*, University of Philadelphia sandwich-type estimator of the variance bladder preservation therapy, using are derived. Simulation results compar- Nandita Mitra, University of Philadelphia SEER-Medicare data. 
5f. GENERALIZING EVIDENCE FROM RANDOMIZED TRIALS USING INVERSE PROBABILITY OF SELECTION WEIGHTS
Ashley L. Buchanan*, University of North Carolina, Chapel Hill
Michael G. Hudgens, University of North Carolina, Chapel Hill
Stephen R. Cole, University of North Carolina, Chapel Hill

Results obtained in randomized trials may not generalize to a target population. In a randomized trial, the treatment assignment mechanism is always known, but assuming participants are a random sample from the target population may be dubious. Lack of generalizability can arise when the distribution of treatment effect modifiers in trial participants is different from the distribution in the target population. We consider an inverse probability of selection weighted (IPSW) estimator for generalizing trial results to a target population. The IPSW estimator is shown to be consistent and asymptotically normal. Expressions for the asymptotic variance and a consistent sandwich-type estimator of the variance are derived. Simulation results comparing the IPSW estimator and a previously proposed stratified estimator show that the estimators perform similarly when the propensity score model includes a binary covariate. However, with a continuous covariate in the propensity score model, the IPSW estimator is less biased and the corresponding Wald confidence intervals had better coverage. The IPSW estimator is employed to generalize results from the AIDS Clinical Trials Group to all people currently living with HIV in the U.S.

email: [email protected]
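The IPSW idea can be sketched as follows: model the probability of trial participation from a stacked trial-plus-target sample, then reweight trial subjects by the inverse of that probability. This is a simplified illustration with made-up data and no variance estimation; it is not the authors' estimator or their ACTG analysis.

```python
# Sketch: inverse probability of selection weighting (IPSW) to transport a trial
# result to a target population (hypothetical data, simplified estimator).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_trial, n_pop = 1000, 5000
# The effect modifier x is distributed differently in trial and target population
x_trial = rng.normal(0.5, 1, n_trial)
x_pop = rng.normal(0.0, 1, n_pop)
z = rng.integers(0, 2, n_trial)                      # randomized treatment in the trial
y = 1.0 + (0.5 + 0.4 * x_trial) * z + rng.normal(0, 1, n_trial)

# Model probability of being in the trial given x, using the stacked sample
x_all = np.concatenate([x_trial, x_pop])
s = np.concatenate([np.ones(n_trial), np.zeros(n_pop)])
X = sm.add_constant(x_all)
p_sel = sm.Logit(s, X).fit(disp=False).predict(sm.add_constant(x_trial))

w = 1 / p_sel                                        # weight trial subjects toward the target
mu1 = np.sum(w * z * y) / np.sum(w * z)
mu0 = np.sum(w * (1 - z) * y) / np.sum(w * (1 - z))
print("IPSW-weighted treatment effect (target population):", mu1 - mu0)
print("naive trial-only effect:", y[z == 1].mean() - y[z == 0].mean())
```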

5g. RACIAL DISPARITIES IN CANCER SURVIVAL: A CAUSAL INFERENCE PERSPECTIVE
Linda Valeri*, Harvard School of Public Health
Jarvis Chen, Harvard School of Public Health
Nancy Krieger, Harvard School of Public Health
Tyler J. VanderWeele, Harvard School of Public Health
Brent A. Coull, Harvard School of Public Health

The National Cancer Institute has identified the elimination of cancer health disparities as one of the most urgent goals for reducing disease burden in the US. Recent research has highlighted that disparities across racial/ethnic groups involve cancer etiology, incidence, screening, diagnosis, treatment, and survival. Quantifying the interplay of mediating factors across this continuum and informing targeted interventions is therefore a priority. In the present study we propose to estimate the disparity in cancer survival between Black and White individuals that would remain if the mediator distribution of the black population were set equal to that of the white population. We identify this causal estimand under the assumption of no unmeasured confounders of the mediator-survival relationship. We then develop sensitivity analysis techniques for violation of the unmeasured confounding assumption and for selection bias due to the mediator missing not at random. The approaches are applied to SEER cancer registry data from 1992-2010. This work illustrates how a causal inference perspective aids in identifying and formalizing relevant hypotheses in health disparities research that can inform policy decisions.

email: [email protected]

6. POSTERS: Statistical Genetics, GWAS, and 'Omics Data

6a. A DATA-ADAPTIVE SNP-SET-BASED ASSOCIATION TEST OF LONGITUDINAL TRAITS
Yang Yang*, University of Texas Health Science Center at Houston
Peng Wei, University of Texas Health Science Center at Houston
Wei Pan, University of Minnesota

The current practice of single trait-single SNP analysis in genome-wide association studies (GWAS) is underpowered to detect the medium-to-small effect sizes typically expected for common diseases. When multiple measurements of a trait at different time points are available, longitudinal trait-multiple SNP analysis becomes a promising alternative. A longitudinal study may have greater power than a cross-sectional study, given the same or even smaller sample size. Multiple SNPs tend to reveal more information and render more robust signals than a single SNP. We extended an adaptive test, called the adaptive sum of powered score (aSPU) test (Pan et al., Genetics 2014), and its variants (aSPU-weighted and aSPU-score) from cross-sectional trait analysis to longitudinal trait analysis in the framework of generalized estimating equations (GEE). We investigated the performance of the aSPU test family in different scenarios, including different sample sizes, varying numbers of null SNPs, and the presence of opposite directions of causal SNP effects. Through extensive simulation studies, we showed that the aSPU family was generally more powerful than several other commonly used methods, especially in the presence of many null SNPs. We demonstrated the utility and statistical efficiency gains of the proposed aSPU tests using the Atherosclerosis Risk in Communities (ARIC) data.

email: [email protected]
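A simplified, cross-sectional rendering of the SPU/aSPU family is sketched below, using score-type components from a marginal linear model and a permutation null; the abstract's method embeds these tests in GEE for longitudinal traits, which is not reproduced here. All simulated data are hypothetical.

```python
# Sketch: sum of powered score (SPU) tests and their adaptive combination (aSPU)
# for a SNP set, in a simplified cross-sectional form with a permutation null.
import numpy as np

rng = np.random.default_rng(6)
n, p = 500, 20
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # SNP genotypes (0/1/2)
y = 0.25 * G[:, 0] - 0.2 * G[:, 1] + rng.normal(0, 1, n)

def spu_stats(y, G, gammas=(1, 2, 3, 4)):
    u = G.T @ (y - y.mean())                           # score-type components per SNP
    return np.array([np.sum(u ** g) for g in gammas])

obs = spu_stats(y, G)
B = 2000
null = np.array([spu_stats(rng.permutation(y), G) for _ in range(B)])
p_spu = (np.abs(null) >= np.abs(obs)).mean(axis=0)     # one p-value per power gamma
# aSPU: take the minimum SPU p-value and calibrate it against the permutation null
null_p = np.array([(np.abs(null) >= np.abs(row)).mean(axis=0) for row in null])
p_aspu = (null_p.min(axis=1) <= p_spu.min()).mean()
print("SPU p-values:", p_spu, " aSPU p-value:", p_aspu)
```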
6b. GENETIC ANALYSIS OF DATA FROM STRUCTURED POPULATIONS
Yogasudha Veturi*, University of Alabama at Birmingham
Gustavo de los Campos, University of Alabama at Birmingham

Human populations exhibit various degrees of stratification and admixture. In the analysis of genomic data, population stratification is usually treated as a nuisance. Consequently, both in Genome Wide Association Studies (GWAS) and Whole Genome Regression (WGR), a common approach has been to "correct" for population structure by adding marker-derived principal components as fixed effects. However, this approach induces a mean correction that does not consider the possibility that marker effects vary across sub-populations. In this study, we propose ways of dealing with stratification that incorporate heterogeneity explicitly using interaction models and the bivariate Genomic Best Linear Unbiased Predictor (G-BLUP). These approaches allow for the analysis of data from two or more groups jointly, and provide group-specific marker effects, estimates of variances, and between-group correlations. We applied the proposed methods to study genomic differences/similarities between clusters obtained from a multi-racial human population (Multi-Ethnic Study of Atherosclerosis) for height, high-density lipoprotein (HDL) and low-density lipoprotein (LDL). Our estimates of genomic heritability varied not only across traits but also across groups, and our estimates of genomic correlations ranged from low (0.3-0.4) to moderately high (0.5-0.6), providing evidence of great extents of genetic heterogeneity.

email: [email protected]

Program & Abstracts 169 this study, we propose ways of dealing jointly used. Testing these phenotypic phenotypes that are correlated with case with stratification that incorporate hetero- traits simultaneously is advantageous status. In such cases, naive regression geneity explicitly using interaction models to take the disease heterogeneity into methods that ignore case-control design and the bivariate Genomic Best Linear account, and improve the discovery will produce biased estimates. This may Unbiased Predictor (G-BLUP). These process of identifying causal genetic vari- be corrected by using methods such as approaches allow for the analysis of data ants, especially those pleiotropic variants inverse probability weighting (IPW), which from two or more groups jointly, provide associated with multiple traits. Further- assigns weights to the observations to group-specific marker effects, estimates more, complex diseases are caused by correct for the fact cases are overrepre- of variances and between-group correla- the interplay of multiple genetic vari- sented in a case-control study. However, tions. We applied the proposed methods ants through complicated mechanisms. IPW regression coefficient estimates to study genomic differences/similarities Multi-locus-based approaches, which may be unreliable when evaluating the between clusters obtained from a multi- take the possible genetic interactions into association between genetic markers racial human population (Multi-Ethnic account, are highly desired in genetic and intermediate phenotypes that are Study of Atherosclerosis) for height, association studies. The existing multi- strongly associated with case status. In a high-density lipoprotein (HDL) and low- trait-based approaches are commonly case-control study of temporomandibular density lipoprotein (LDL). Our estimates single-locus-based, and are proposed disorder (TMD), we may wish to identify of genomic heritability varied not only for family-based association studies. In markers associated with the severity of across traits but also across groups, and this article, we propose a multi-locus, orofacial pain. Nearly all controls will our estimates of genomic correlations multi-trait approach for population-based report no orofacial pain, which causes ranged from low (0.3-0.4) to moderate association studies. Through simulations, IPW regression to produce inaccurate high (0.5-0.6) providing evidence of great we demonstrated that testing multiple results. We propose a novel permutation- extents of genetic heterogeneity. traits simultaneously was more powerful based method and compared it with than testing one single trait at a time. We IPW. Simulations indicate that whereas email: [email protected] also illustrated the proposed approach IPW produces inflated type I error rates, with an application to Nicotine Depen- our method produces correct type I 6c. MAPPING DISEASE dence. The joint analysis of three traits error rates with no loss in power. We SUSCEPTIBILITY LOCI FOR simultaneously identified SNPs with a sig- then apply this method to identify SNPs MULTIPLE COMPLEX TRAITS nificant association, which was replicable associated with the severity of orofacial WITH U-STATISTICS across studies. pain using data from OPPERA study, a large-scale case-control study of TMD. Ming Li*, University of Arkansas for email: [email protected] We identify two novel SNPs strongly Medical Sciences associated with pain severity. Changshuai Wei, University 6d. 
6d. PERMUTATION-BASED TEST STATISTICS FOR INTERMEDIATE PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES
Wei Xue*, University of North Carolina, Chapel Hill
Eric Bair, University of North Carolina, Chapel Hill

In case-control genome-wide association studies, one may wish to identify genetic markers associated with intermediate phenotypes that are correlated with case status. In such cases, naive regression methods that ignore the case-control design will produce biased estimates. This may be corrected by using methods such as inverse probability weighting (IPW), which assigns weights to the observations to correct for the fact that cases are overrepresented in a case-control study. However, IPW regression coefficient estimates may be unreliable when evaluating the association between genetic markers and intermediate phenotypes that are strongly associated with case status. In a case-control study of temporomandibular disorder (TMD), we may wish to identify markers associated with the severity of orofacial pain. Nearly all controls will report no orofacial pain, which causes IPW regression to produce inaccurate results. We propose a novel permutation-based method and compare it with IPW. Simulations indicate that whereas IPW produces inflated type I error rates, our method produces correct type I error rates with no loss in power. We then apply this method to identify SNPs associated with the severity of orofacial pain using data from the OPPERA study, a large-scale case-control study of TMD. We identify two novel SNPs strongly associated with pain severity.

email: [email protected]

6e. STATISTICS FOR GENETIC ASSOCIATION IN THE PRESENCE OF COVARIATES–GENOME SCANNING CONSIDERATIONS
Hui-Min Lin*, University of Pittsburgh
Eleanor Feingold, University of Pittsburgh
Yan Lin, University of Pittsburgh

A number of different statistics are available for genetic association analysis in the presence of covariates. In the context of a genome-wide association study, hundreds of thousands to millions of SNPs are tested, and whatever covariate model we specify is likely to be imperfect. In addition, the results of the study often focus on the list of SNPs ordered according to the statistics rather than on certain p-value cutoffs. Therefore, it is important to investigate the behavior of extreme values of the statistics rather than the behavior of the expected values. Gail et al. (2008) discussed this issue and proposed "detection probability" and "proportion positive" to measure the success (power) of a genomic study when ranked lists are the primary outcome. In theory, the ranked lists can be dominated by SNPs with misfit models rather than by true positive results. We are conducting a comprehensive comparative study to investigate the behavior of different association statistics that model covariates. We evaluate the statistics from the perspective of which statistics can provide robust ranked lists of "top hits." These are not necessarily the same statistics that have the highest power in a conventional single-test context.

email: [email protected]
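A schematic version of a ranked-list evaluation criterion of this kind, in the spirit of a detection-probability metric, is sketched below; it is not the exact definition of Gail et al. (2008), and the simulation settings are invented.

```python
# Sketch: evaluating an association statistic by the ranked list it produces,
# rather than by a fixed p-value cutoff. "Detection probability" here is the
# chance that truly associated SNPs land in the top k (schematic version only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, k, n_sim = 400, 1000, 50, 20
causal = np.arange(10)                         # the first 10 SNPs are truly associated

hits = []
for _ in range(n_sim):
    G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    y = G[:, causal].sum(axis=1) * 0.15 + rng.normal(0, 1, n)
    # marginal association statistic per SNP (covariates could be residualized out first)
    r = np.array([stats.pearsonr(G[:, j], y)[0] for j in range(p)])
    top_k = np.argsort(-np.abs(r))[:k]
    hits.append(np.isin(causal, top_k).mean())
print("mean share of causal SNPs in the top %d: %.2f" % (k, float(np.mean(hits))))
```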
6f. POWER AND SAMPLE SIZE DETERMINATION FOR TIME COURSE MICROARRAY DIFFERENTIAL EXPRESSION STUDIES: A FALSE DISCOVERY RATE AND PERMUTATION-BASED SIMULATION METHOD
Joanne C. Beer*, University of Pittsburgh
Thuan Nguyen, Oregon Health & Science University
Kemal Sonmez, Oregon Health & Science University
Dongseok Choi, Oregon Health & Science University

Microarray experiments allow researchers to assess levels of gene expression for thousands of genes at a time. A frequent goal of microarray experiments is to identify genes which are differentially expressed across various biological conditions. Several methods have been developed for determining sample size for differential expression microarray experiments, but few methods have been extended to time course experiments in which gene expression is measured over a series of time points. We propose a flexible method for sample size and power analysis of time course microarray experiments using positive false discovery rate type I error control. Because microarray data often deviate from the assumption of normality underlying the use of parametric t-tests and F-tests, and since it has been increasingly recognized that accounting for the correlation structure of gene expression data is important for accurately estimating error rate and sample size, the method relies on a permutation-based null distribution for the test statistics. We compare results of simulation-based sample size and power calculations to those of other published sample size methods for both static and time course microarray experiments.

email: [email protected]

6g. FUNCTIONAL RANDOM FIELD MODELS FOR ASSOCIATION ANALYSIS OF SEQUENCING DATA
Xiaoxi Shen*, Michigan State University
Ming Li, University of Arkansas for Medical Sciences
Zihuai He, University of Michigan
Qing Lu, Michigan State University

The Generalized Genetic Random Field (GGRF) model holds many nice properties for small-sample-size sequencing studies, and has well-controlled type I error and high power as compared with existing methods. It, however, needs to specify a weight function to consider rare variants, and models only pairwise linkage disequilibrium (LD). To further improve the method, we propose a functional random field (FRF) model for association analysis of sequencing data. By fitting a functional curve to the genotypes of a genetic region for each individual, we are able to incorporate high-order LD information into the association analysis. Moreover, because it models sequencing data on the individual level rather than the population level, it does not require a weight function for considering rare variants. We compare the type I error and power of FRF with those of GGRF, SKAT and the Burden test. Our preliminary findings show that FRF outperforms the other methods, especially when the weight function is misspecified. Additional findings also suggest FRF has an advantage over existing methods when genetic effects are bi-directional and when missing genotype data are present.

email: [email protected]

Program & Abstracts 171 Our preliminary findings show that FRF human histone H4 and the corresponding dropout and heterogeneity, for example) outperform the other methods, especially proteoforms. Results show that related prohibit direct application to the single- when the weight function is misspecified. proteoforms may be statistically difficult cell setting. We here propose a statistical Additional findings also suggest FRF has to differentiate. pipeline for studying co-regulated genes the advantage over existing methods using single cell RNA-seq data. We email: [email protected] when genetic effects are bi-direction and applied our pipeline on expression when missing genotype data is present. profiles from 73 undifferentiated human embryonic stem cells (hESCs), and com- email: [email protected] 6i. A STATISTICAL PIPELINE FOR STUDYING CO-REGULATED pared it with a naïve approach based on GENES USING SINGLE-CELL correlation analysis. Results demonstrate 6h. QUANTIFYING UNCERTAINTY RNA-seq DATA that the naïve approach is largely affected IN THE IDENTIFICATION OF by technical noise and is unable to Ning Leng*, Morgridge Institute PROTEINS, POST-TRANSLATIONAL identify much of the biological heteroge- for Research MODIFICATIONS (PTMs) AND neity that is present. On the other hand, PROTEOFORMS Li-Fang Chu, Morgridge Institute our pipeline was able to accommodate for Research technical noise and in so doing reveal co- Naomi C. Brownstein*, Florida regulation features of cell cycle markers State University National High Magnetic Yuan Li, University of Wisconsin, in the undifferentiated hESC population. Field Lab Madison email: [email protected] Xibei Dang, Florida State University Peng Jiang, Morgridge Institute National High Magnetic Field Lab for Research Eric Bair, University of North Carolina, Chris Barry, Morgridge Institute 6j. OUTLIER DETECTION FOR Chapel Hill for Research QUALITY CONTROL IN FLOW CYTOMETRY USING COMPOSI- Nicolas L. Young, Florida State Ron Stewart, Morgridge Institute TIONAL DATA ANALYSIS University National High Magnetic for Research Field Lab James Thomson, Morgridge Institute Kipper Fletez-Brant*, Johns Hopkins University The traditional goals of top-down pro- for Research teomics are protein identification and Christina Kendziorski, University Josef Spidlen, BC Cancer Agency quantitation. However, the presence of of Wisconsin, Madison Ryan Brinkman, BC Cancer Agency additional sources of variability, such as Recent advances in single-cell RNA- Pratip Chattopadhyay, National post-translational modifications (PTMs) seq technology enable investigators Institutes of Health and genomic variants, complicates the to conduct transcriptome-wide gene problem of identification. Recent interest Flow cytometry experiments collect expression studies at the single-cell level. in the proteomics community has begun observations for N variables on C cells, Such studies serve as a revolutionary to shift from the relatively narrow problem with C >> N for a single experiment. tool to understand cell-to-cell variation of protein identification to consideration Current technology allows for hundreds within and among cell populations. A of these additional sources of variabil- of experiments to be performed per day, number of robust statistical methods are ity. Combining these factors results in and each experiment can have errors available for quality control and analysis a unique exhaustively defined chemi- in sample acquisition, measurement or of bulk RNA-seq data. 
6j. OUTLIER DETECTION FOR QUALITY CONTROL IN FLOW CYTOMETRY USING COMPOSITIONAL DATA ANALYSIS
Kipper Fletez-Brant*, Johns Hopkins University
Josef Spidlen, BC Cancer Agency
Ryan Brinkman, BC Cancer Agency
Pratip Chattopadhyay, National Institutes of Health

Flow cytometry experiments collect observations for N variables on C cells, with C >> N for a single experiment. Current technology allows for hundreds of experiments to be performed per day, and each experiment can have errors in sample acquisition, measurement or machine malfunction. This can result in inaccurate observations for some, but not all, cells in an experiment. Trying to perform manual quality control on flow cytometry is not possible for more than a handful of experiments. We have developed a method to automate quality control that uses compositional data analysis. We model a flow cytometry experiment as a set of compositions of cell populations observed over time. Each population is defined as a cell having observations above or below some threshold for each of the N variables. We derive a summary statistic for each composition which reflects the distribution of cell populations represented in it. We use this statistic to partition the data in a flow experiment into "good" and "bad" data using changepoint analysis. This statistic allows our method to take advantage of the multivariate nature of flow cytometry data, and is reasonably fast.

email: [email protected]
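The composition-plus-changepoint idea can be sketched with a centered log-ratio summary per time slice followed by a simple single-changepoint mean-shift scan; this stands in for, and is not, the authors' pipeline, and the counts and thresholds below are hypothetical.

```python
# Sketch: quality-control flavored changepoint scan on a per-time-slice summary of
# cell-population composition (single mean-shift changepoint, illustrative only).
import numpy as np

rng = np.random.default_rng(8)
T, K = 120, 4                                   # time slices, gated cell populations
counts = rng.multinomial(500, [0.4, 0.3, 0.2, 0.1], size=T).astype(float)
counts[80:] = rng.multinomial(500, [0.1, 0.3, 0.2, 0.4], size=T - 80)  # acquisition drift

comp = counts / counts.sum(axis=1, keepdims=True)
clr = np.log(comp + 1e-6) - np.log(comp + 1e-6).mean(axis=1, keepdims=True)  # centered log-ratio
stat = np.linalg.norm(clr - clr.mean(axis=0), axis=1)   # distance of each slice from the average

def best_split(x):
    """Return the split index minimizing the within-segment sum of squares."""
    best, arg = np.inf, None
    for c in range(5, len(x) - 5):
        ss = ((x[:c] - x[:c].mean()) ** 2).sum() + ((x[c:] - x[c:].mean()) ** 2).sum()
        if ss < best:
            best, arg = ss, c
    return arg

cp = best_split(stat)
print("suspected change in acquisition quality at time slice:", cp)
```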

6k. POWER ANALYSIS FOR GENOME-WIDE ASSOCIATION STUDY IN BIOMARKER DISCOVERY
Wenfei Zhang*, Sanofi
Yuefeng Lu, Sanofi
Yang Zhao, Sanofi
Vincent Thuillier, Sanofi
Jeffrey Palmer, Sanofi
Sherry Cao, Sanofi
Jike Cui, Sanofi
Stephen Madden, Sanofi
Srinivas Shankara, Sanofi

Biomarker discovery is important for disease diagnosis, prognosis, and risk prediction in drug discovery and development. Genetic markers are considered promising biomarkers and have drawn great attention in discovery research. With recent technological innovations, genome-wide association studies (GWAS) have become possible and have revolutionized the research for genetic markers underlying human complex diseases. Power analysis plays a significant role in GWAS by optimizing both practical costs and statistical validity. However, existing methods usually fail to consider the complicated correlation patterns among genetic markers. Therefore, we propose a re-sampling based power analysis method to properly address the correlations among genetic markers. Our analysis shows the results on statistical power under various odds ratios and allele frequencies.

email: [email protected]

6l. DIFFERENTIAL DYNAMICS IN SINGLE-CELL RNA-Seq EXPERIMENTS
Keegan D. Korthauer*, University of Wisconsin, Madison
Christina Kendziorski, University of Wisconsin, Madison

Measurements of genome-wide RNA transcript abundance at the single-cell level allow us to answer scientific questions that were elusive with traditional bulk data, which only provided averages across large pools of cells. Specifically, it is now clear that transcription often occurs in a bursty manner, resulting in multi-modal distributions within gene (with individual cells that are off, on at a low level, and on at a high level, for example). Identifying such genes and using them to characterize subgroups within and across biological conditions is an important first step in many single-cell RNA-seq experiments. Toward this end, we have developed a Dirichlet process mixture model based approach. The approach facilitates the identification of multi-modal genes, uses these genes to identify subgroups, and allows for the identification of differential dynamics (differential expression, differential dropout, differential proportions within modal groups) across multiple biological conditions. Advantages are demonstrated via simulation and case studies.

email: [email protected]

6m. EXPERIMENTAL DESIGN FOR BULK SINGLE-CELL RNA-Seq STUDIES
Rhonda L. Bacher*, University of Wisconsin, Madison
Christina Kendziorski, University of Wisconsin, Madison

Studies of isoform expression are critical to understanding phenotypic complexity as they potentially reveal information not detectable using gene level estimates alone. With sequencing costs continually decreasing, utilizing this information has become popular in bulk RNA-Seq experiments and, we expect, will become popular in single-cell RNA-seq experiments as well. A few studies have investigated the trade-off between sequencing depth and sample size both for bulk and single-cell RNA-seq experiments probing gene level expression, but no guidelines are available when isoform expression is of interest. To address this, we have developed an approach for simulating bulk and single-cell RNA-seq data at the gene and isoform level. The approach, called ReadSim, is used to assess the question of depth vs. sample size for a number of RNA-seq experimental designs. General guidelines are proposed.

email: [email protected]

6n. A HIERARCHICAL MIXTURE MODEL FOR JOINT PRIORITIZATION OF GWAS RESULTS FROM MULTIPLE RELATED PHENOTYPES

Cong Li*, Yale University
Can Yang, Hong Kong Baptist University
Hongyu Zhao, Yale School of Public Health

The past ten years have witnessed a grand wave of endeavors hunting single nucleotide polymorphisms (SNPs) affecting various human complex traits through genome-wide association studies (GWAS). Disappointingly, the significant SNPs identified through GWAS can only explain a small fraction of the genetic contributions to complex traits. Many lines of evidence suggest that the major reason is the existence of numerous weak-effect SNPs that are difficult to identify under current GWAS sample sizes. In our previous work, a statistical method called "GPA" was developed to improve our power to detect these weak-effect SNPs by borrowing information from GWAS results of genetically related traits and from genomic functional annotations. Despite its success, it is challenging for GPA to handle more than three phenotypes simultaneously in its current form, limiting its applications in practice. To address this limitation, we have reformulated the GPA model by adding a hierarchical prior on the association status matrix. A low-rank structure is imposed on the logit transformation of the prior matrix to encourage correlation across multiple phenotypes. Through both simulations and real data applications, we have shown that our method can effectively integrate multiple related phenotypes and boost the power of detecting associated SNPs.

email: [email protected]

6o. NONPARAMETRIC TESTS FOR DIFFERENTIAL ENRICHMENT ANALYSIS WITH MULTI-SAMPLE ChIP-Seq DATA

Qian Wu*, BioStat Solution
Kyoung-Jae Won, University of Pennsylvania
Hongzhe Li, University of Pennsylvania

Chromatin immunoprecipitation sequencing (ChIP-seq) technology is a powerful tool for analyzing protein interactions with DNA. Genes with differential binding regions under two or more conditions are important and can be used to predict gene expression changes. In a previous study, we proposed a kernel-based nonparametric method, ChIPtest, to solve this problem under two conditions. In this article, we develop a new nonparametric testing method, NonpChIP, that does not rely on any smoothing, so its test statistics do not depend on the choice of bandwidth as ChIPtest does. In addition, both methods are limited to detecting genes with differential binding regions between no more than two conditions, whereas settings such as multiple time-course ChIP-Seq data require comparisons across more conditions. We therefore further investigate the time-course changes of genes between four time points by defining multivariate test statistics as the mean (TSmean) or maximum (TSmax) of the three adjacent pair-wise ChIPtest statistics. The new method provides variance estimation under the assumption of equal or unequal error variance. We compared the performance of ChIPtest and NonpChIP via ROC curves and True Positive Rate (TPR) curves for both two conditions and multiple conditions. Both real data and simulation results show that TSmax with kernel smoothing dominates the other methods. All these results indicate that the identified differential binding regions are indeed biologically meaningful. We demonstrate the method using ChIP-Seq data from a comparative epigenomic profiling of adipogenesis of murine adipose stromal cells. Our method detects many genes with differential binding for the histone modification mark H3K27ac in gene promoter regions between proliferating preadipocytes and mature adipocytes in murine 3T3-L1 cells. The test statistics also correlate well with gene expression changes and are predictive of gene expression changes, further indicating that the identified differential binding regions are biologically meaningful.

email: [email protected]
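In symbols (our notation, not the authors'): writing T_{g,k} for the pairwise ChIPtest statistic of gene g comparing time points k and k+1, the multivariate summaries used in abstract 6o for four time points are simply

```latex
\mathrm{TSmean}_g \;=\; \frac{1}{3}\sum_{k=1}^{3} T_{g,k},
\qquad
\mathrm{TSmax}_g \;=\; \max_{k\in\{1,2,3\}} T_{g,k}.
```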

6p. ANALYSIS OF MASS SPECTROMETRY DATA AND PREPROCESSING METHODS FOR METABOLOMICS

Leslie Myint*, Johns Hopkins University
Kasper Hansen, Johns Hopkins University

The genetic basis of disease has been a popular source of scientific inquiry in recent years due to the rapid increase in efficiency of sequencing technologies. However, disruptions in biochemical pathways and metabolite concentrations are also key factors in disease etiology and progression, and growing numbers of researchers have started performing metabolic profiling to shed light on their biological condition of interest. Metabolomics analysis typically involves separating the compounds present in biological samples through some form of chromatography and fragmenting the compounds into distinctive patterns using a mass spectrometer. This process creates noisy and complex data, which must be processed in several ways before subsequent analysis. This processing should ideally produce a list of metabolites and their abundances for all samples, which will ultimately be used to identify the metabolites that are differentially abundant between groups. However, this preprocessing stage is a difficult task that has not been extensively explored. We identified interesting features of mass spectrometry data and investigated their relationship with, and impact on, the results of the widely used preprocessing software package XCMS. We evaluate the impact of the change in results on differential abundance analysis.

email: [email protected]

6q. ACCOUNTING FOR MEASUREMENT ERROR IN GENOMIC DATA AND MISCLASSIFICATION OF SUBTYPES IN THE ANALYSIS OF HETEROGENEOUS TUMOR DATA

Daniel Nevo, Hebrew University, Jerusalem, Israel
David Zucker*, Hebrew University, Jerusalem, Israel
Molin Wang, Harvard School of Public Health
Donna Spiegelman, Harvard School of Public Health

A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors according to subtypes using gene expression data on the tumor and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and a specific subtype and to utilize the results for personalized treatment. This task is usually carried out in two steps: clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the error in the gene expression measurements. The second is the misclassification error that occurs when observations are inaccurately assigned to clusters. We consider the case with a specified set of relevant genes and propose a unified single-likelihood approach for normally distributed gene expressions. As an alternative, we consider a two-step procedure with the tumor type misclassification error taken into account in the second-step risk factor analysis. We describe our method for multinomial data and also for survival analysis data using a modified version of the Cox model. The results of a simulation study indicate that our methods significantly lower the bias, with a small price being paid in terms of variance. We also analyze breast cancer data from the Nurses' Health Study to demonstrate the utility of our method.

email: [email protected]

7. POSTERS: Methodology and Applications in Epidemiology, Environment, and Ecology

7a. CARPE DIEM! BIOSTATISTICIANS IMPACTING THE CONDUCTING AND REPORTING OF CLINICAL STUDIES

Sally Morton*, University of Pittsburgh

Standards for conducting and guidelines for reporting clinical studies have evolved and proliferated. Inconsistency, and in some cases disagreement, may be due to differences in whether the standard or guideline is meant as a best practice recommendation or a mandatory requirement. In addition, the targeted study design may play a role (for example, randomized trials versus observational studies). In this poster, we compare and contrast standards and guidelines using, as examples, the Patient-Centered Outcomes Research Institute (PCORI) standards for conducting patient-centered comparative effectiveness research, as well as standards and guidelines for systematic reviews. We will consider the potential impact on innovation and propose that standards and guidelines provide potential tools of influence and education for the discipline. Biostatisticians can, and should, play an important role in the process of constructing, validating, and disseminating standards and guidelines.

email: [email protected]

7b. ON STRATIFIED BIVARIATE RANKED SET SAMPLING WITH OPTIMAL ALLOCATION FOR NAIVE AND RATIO ESTIMATORS

Lili Yu, Georgia Southern University
Hani Samawi, Georgia Southern University
Daniel Linder, Georgia Southern University
Arpita Chatterjee, Georgia Southern University
Yisong Huang*, Georgia Southern University
Robert Vogel, Georgia Southern University

The purpose of the current work is to introduce stratified bivariate ranked set sampling (SBVRSS) and investigate its performance for estimating the population means using naive and ratio methods. The properties of the proposed estimator are derived along with the optimal allocation with respect to stratification. We conduct a simulation study to demonstrate the relative efficiency of SBVRSS as compared to stratified bivariate simple random sampling (SBVSRS) for ratio estimation. Data consisting of weights and bilirubin levels in the blood of 120 babies are used to illustrate the procedure on a real data set, with our results indicating that SBVRSS for ratio estimation is more efficient than SBVSRS in all cases presented in the simulations.

email: [email protected]

7c. COMPARISONS OF THE CANCER RISK ESTIMATES BETWEEN EXCESS RELATIVE RISK AND RELATIVE RISK MODELS: A CASE STUDY

Shu-Yi Lin*, Taipei City Hospital, Taiwan

Relative risk (RR) models are commonly used in cancer risk assessment in public health. However, in radiation research, the linear relative risk model is often used to estimate the excess relative risk (ERR), as in the studies of Japanese atomic bomb survivors. The purpose of this study is to compare the estimates of cancer risks between the ERR models and the RR models using the Taiwan radiation-contaminated buildings cohort follow-up data from 1983 to 2005. The analyses were based on 6,242 subjects who had ever lived in radio-contaminated buildings, among whom 117 cancer cases were identified. The study compares and assesses the estimates of cancer risks from the Cox proportional hazards model, the Poisson log-linear model and the ERR model. The study verifies that the excess relative risks estimated by the ERR model are equivalent to the relative risks minus one estimated by the Poisson models. Our analysis shows that the results from the Cox model (hazard ratios) are more conservative than those from the ERR model and the Poisson model (rate ratios). The relative risks estimated by the models using attained age or using time from exposure to event as the time scale were similar. Adjusting for different covariates does not change the estimates substantially.

email: [email protected]

7d. A REGRESSION BASED SPATIAL CAPTURE-RECAPTURE MODEL FOR ESTIMATING SPECIES DENSITY

Purna S. Gamage*, Texas Tech University
Souparno Ghosh, Texas Tech University
Philip S. Gipson, Texas Tech University
Gregory Pavur, Texas Tech University

Data obtained from capture-recapture studies are essentially spatial in nature. The spatial proximity between the activity centre of an animal and the trap location determines how likely the concerned individual is to be captured. In order to incorporate this spatial information into inference about the relative abundance of a species in the study region, Borchers and Efford (2008) proposed the spatially explicit capture-recapture (SECR) model. In its original form, SECR allowed the state-space of the activity centers of the individuals to arise from a non-homogeneous Poisson process (NHPP). In practice, however, complete spatial randomness (CSR) is generally assumed for the distribution of the activity centers. In many situations, covariates such as vegetation characteristics heavily influence the location of these activity centers. To accommodate such information, we envision an NHPP, with covariate-dependent intensity function, driving the location of the activity centers in the study region. We perform simulation studies to compare the robustness of the CSR and NHPP specifications of the state-space, particularly under model misspecification. We then illustrate our methodology on abundance data obtained during a survey of the meso-carnivores in north-west Texas.

email: [email protected]
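For orientation, the SECR modification described in abstract 7d can be sketched as follows (our notation; the half-normal encounter model and the single covariate are illustrative choices, not necessarily the authors'): the encounter rate between an individual with activity center s and a trap at x decays with distance, while the activity centers follow a point process whose intensity is tied to spatial covariates,

```latex
\lambda(x \mid s) \;=\; \lambda_0 \exp\!\Big(-\tfrac{\lVert x - s\rVert^{2}}{2\sigma^{2}}\Big),
\qquad
s \sim \mathrm{NHPP}\big(\mu(\cdot)\big), \quad
\log \mu(s) \;=\; \beta_0 + \beta_1\, z(s),
```

where z(s) is a spatial covariate such as vegetation density; under the usual CSR assumption, the intensity μ(s) is constant over the study region.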

7e. APPLICATION OF THE USE OF PERCENTAGE DIFFERENCE FROM MEDIAN BMI TO OVERCOME CEILING EFFECTS IN ADIPOSITY CHANGE IN CHILDREN

Christa Lilly*, West Virginia University
Lesley Cottrell, West Virginia University
Karen Northrup, Wood County School System
Richard Wittberg, Wood County School System

Researchers are alert to the needs of morbidly obese children given the increasing incidence of obesity. However, concern arose over the sensitivity and power of selected adiposity measures and analytic approaches for examining change in the top percentiles of BMI. In the present study, standard statistical techniques found no differences when assessing 4,058 students using both weight and BMI percentile outcomes in morbidly obese children with Acanthosis Nigricans (AN). The children's sex- and age-specific median BMI was used to calculate the percentage difference from median BMI (BMI%). A Kruskal-Wallis one-way ANOVA by ranks test determined differences in change scores among 4 groups of students (those who gained, lost, maintained or never had AN). Significant effects were found with BMI% but not with the weight or BMI percentile outcomes. Findings provide evidence supporting the use of BMI% rather than BMI and weight percentiles when examining adiposity change among children with morbid obesity.

email: [email protected]

7f. A MULTI-PATHOGEN HIERARCHICAL BAYESIAN MODEL FOR SPATIO-TEMPORAL TRANSMISSION OF HAND, FOOT AND MOUTH DISEASE

Xueying Tang*, University of Florida
Nikolay Bliznyuk, University of Florida
Yang Yang, University of Florida
Ira Longini, University of Florida

Mathematical modeling of infectious diseases plays an important role in the development and evaluation of intervention plans. These plans, such as the development of vaccines, are usually pathogen-specific, but laboratory confirmation of all pathogen-specific infections is rarely available. If an epidemic is a consequence of co-circulation of several pathogens, it is desirable to jointly model these pathogens in order to study the transmissibility of the disease. Our work is motivated by the hand, foot and mouth disease (HFMD) surveillance data in China from 2008 to 2009. The data set consists of counts of reported cases in 334 prefectures over 53 consecutive weeks, and laboratory test data for a small subset of the reported cases. We build a hierarchical Bayesian multi-pathogen model by using a latent process to link the disease counts and the lab test data. Our model explicitly accounts for spatio-temporal disease patterns. Inference and prediction are carried out by a computationally tractable MCMC algorithm. We study the operating characteristics of the algorithm on simulated data and apply it to the HFMD in China data set.

email: [email protected]

7g. EVALUATING RISK-PREDICTION MODELS USING DATA FROM ELECTRONIC HEALTH RECORDS

Le Wang*, University of Pennsylvania
Pamela A. Shaw, University of Pennsylvania
Hansie Mathelier, University of Pennsylvania
Stephen E. Kimmel, University of Pennsylvania
Benjamin French, University of Pennsylvania

Currently, there is particular clinical and economic interest in developing and evaluating models that predict adverse events (e.g., short-term hospital readmission) among patients with chronic diseases (e.g., heart failure). Accurate risk-prediction models can be used to inform personalized treatment strategies for individual patients. As interest in individualized prediction has grown, so too has the availability of large-scale clinical information systems. The increasing availability of data from electronic health records facilitates the development of prediction models, but estimation of prediction accuracy could be limited by outcome misclassification, which can arise if events are not captured by the electronic system. In simulation studies, we evaluate the performance of receiver operating characteristic curves and risk-reclassification methods in the presence of outcome misclassification. We consider situations in which events are not included in the electronic health record, with and without dependence on covariate values. We illustrate the impact of outcome misclassification on estimation of prediction accuracy using data from the University of Pennsylvania Health System electronic health record to evaluate alternative prognostic models for 30-day readmission among patients with a diagnosis of heart failure.

email: [email protected]
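To see why uncaptured events matter for apparent discrimination (abstract 7g), here is a toy simulation of our own, not the authors' study: a fraction of true events is silently missing from the "EHR" outcome, and the rank-based AUC is computed against both versions.

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(score, y):
    """Rank-based AUC: P(random event outranks random non-event)."""
    pos, neg = score[y == 1], score[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

n = 5000
x = rng.normal(size=n)                       # risk score (e.g., a linear predictor)
p = 1 / (1 + np.exp(-(-2.0 + 1.0 * x)))      # true event probability
y_true = rng.binomial(1, p)

sensitivity = 0.7                            # share of true events the EHR captures (assumed)
captured = rng.binomial(1, sensitivity, size=n)
y_ehr = y_true * captured                    # missed events look like non-events

print("AUC against true outcomes        :", round(auc(x, y_true), 3))
print("AUC against EHR-observed outcomes:", round(auc(x, y_ehr), 3))
```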

7h. A BAYESIAN MODEL FOR IDENTIFYING AND PREDICTING THE SPATIO-TEMPORAL DYNAMICS OF RE-EMERGING URBAN INSECT INFESTATIONS

Erica Billig*, University of Pennsylvania
Michael Levy, University of Pennsylvania
Michelle Ross, University of Pennsylvania
Jason Roy, University of Pennsylvania

Analyses of epidemics are complicated by several factors, including the fact that the true dispersal mechanism of disease agents and the precise infection times of patients are often unknown. Instead, we often observe the infection state of each unit at discrete time intervals. For example, consider a recent study of the Chagas disease vector Triatoma infestans in Arequipa, Peru. The fact that the epidemic is slow-moving and the counts of infested houses are small leads to analytic challenges. The data are limited to observed vector presence at each household at three time points over several years. In addition, streets are major barriers to T. infestans movement, resulting in a complex spatial structure of the epidemic. To address these challenges, we propose a susceptible-infected-observed-removed model that uses informative priors and a novel spatial function that incorporates the complex dispersal dynamics observed in Arequipa. The fully Bayesian method is used to augment the data, estimate the dispersal parameters, and determine posterior infestation risk probabilities of households for future treatment. We investigate the properties of the model with simulation studies. Finally, the proposed methods are illustrated with an analysis of the Chagas disease vector data.

email: [email protected]

7i. SEMI-MARKOV MODELS FOR INTERVAL CENSORED TRANSIENT COGNITIVE STATES WITH BACK TRANSITIONS AND A COMPETING RISK

Shaoceng Wei*, University of Kentucky
Richard Kryscio, University of Kentucky

Continuous-time multi-state stochastic processes are useful for modeling the flow of subjects from intact cognition to dementia, with mild cognitive impairment and global impairment as intervening transient cognitive states and death as a competing risk (Figure 1). Each subject's cognition is assessed periodically, resulting in interval censoring for the cognitive states, while death without dementia is not interval censored. We apply a semi-Markov process in which we assume that the waiting times are Weibull distributed, except for transitions from the baseline state, which are exponentially distributed, and in which we assume no additional changes in cognition occur between two assessments. We apply our model to the Nun Study.

email: [email protected]

7j. GROWTH CURVES FOR CYSTIC FIBROSIS INFANTS VARY IN THE ABILITY TO PREDICT LUNG FUNCTION

Yumei Cao*, Medical College of Wisconsin
Raymond G. Hoffmann, Medical College of Wisconsin
Evans M. Machogu, Indiana University School of Medicine
Praveen S. Goday, Medical College of Wisconsin
Pippa M. Simpson, Medical College of Wisconsin

Introduction. Monitoring children routinely is especially important in the early years of life and for patients with chronic illness. The Centers for Disease Control and Prevention (CDC) growth charts have typically been used for this purpose, but now the World Health Organization (WHO) charts are recommended for infants 0-24 months and the CDC charts after that age. However, the charts do not match at 24 months. Studies have been conducted setting goals using the CDC charts for all ages. Our aim was to show (1) how the charts differed, (2) how they might be reconciled to track cystic fibrosis patients, and (3) how previous CDC growth goals for CF patients needed modification to account for the differences from WHO parameters. Methods. CF registry data for patients born 2001-2004 were used. Bland-Altman plots were used to compare CDC and WHO growth parameters. In addition, the ability to predict good lung function at 6 years based on the different growth measures at 2 years of age was compared using generalized linear models. Results. There is a considerable difference among the different measures for the CF patients. However, the ability to predict is adequate for all measures.

email: [email protected]
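For readers unfamiliar with the comparison technique named in abstract 7j, a Bland-Altman analysis is simply the paired difference plotted against the paired mean, with limits of agreement; a minimal sketch with hypothetical z-scores (the numbers below are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# weight-for-age z-scores for the same infants under the two references
# (hypothetical values for illustration only)
z_cdc = np.array([1.8, 2.1, 0.9, 1.5, 2.4, 1.1])
z_who = np.array([1.6, 2.3, 0.8, 1.7, 2.2, 1.3])

mean = (z_cdc + z_who) / 2
diff = z_cdc - z_who
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)        # 95% limits of agreement

plt.scatter(mean, diff)
plt.axhline(bias, linestyle="--")
plt.axhline(bias + loa, linestyle=":")
plt.axhline(bias - loa, linestyle=":")
plt.xlabel("Mean of CDC and WHO z-scores")
plt.ylabel("CDC minus WHO")
plt.show()
```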

7k. AN EXAMINATION OF THE CONCEPT OF FRAILTY IN THE ELDERLY

Felicia R. Griffin*, Florida State University
Daniel L. McGee, Florida State University
Elizabeth H. Slate, Florida State University

Frailty has been defined as a state of increased vulnerability to adverse outcomes. The concept of frailty has centered around counting the number of deficits in health, which can be diseases, disabilities, or symptoms. However, there is no consensus on how it should be quantified. Frailty has been considered synonymous with functional status and comorbidity, but these may be distinct concepts requiring different management. We compared two methods of defining a frailty phenotype: a count of deficits, and a weighted score of health deficits incorporating the strength of association between each deficit and mortality. The strength of association was estimated using proportional hazards coefficients. The study uses data from NHANES III. We compared the two methodologies: frailty was associated with age, gender, ethnicity, and having comorbid chronic diseases. This study introduces a weighted score for defining a frailty phenotype that is more strongly predictive of mortality and has the potential to improve the targeting and care of today's elderly.

email: [email protected]

7l. EFFICIENCIES FROM USING ENTIRE UNITED STATES RESPONSES IN PREDICTING COUNTY LEVEL SMOKING RATES IN WEST VIRGINIA USING PUBLICLY AVAILABLE DATA

Dustin M. Long*, West Virginia University
Emily A. Sasala, West Virginia University

Smoking rates, as well as other risk factors, tend to vary geographically, specifically by county within each state. The Behavioral Risk Factor Surveillance System (BRFSS) collects data from across the United States on different risk factors, including smoking. However, many counties are not represented in BRFSS reports due to small populations or low numbers of responses. Thus, missing counties' smoking rates must be predicted using some modeling scheme. West Virginia's between-county smoking rates are highly variable, and a high percentage (20%) of its counties are missing. Using other publicly available county-level variables from 2010 as covariates, two modeling frameworks, a generalized linear model and a Bayesian model, were constructed to predict smoking rates for counties without 2010 BRFSS estimates in West Virginia. These models used only West Virginia data to predict the missing values, in addition to using the entire US data. We found that using the entire US data was more efficient, i.e., it gave stronger prediction in both types of model and better overall convergence in the Bayesian model.

email: [email protected]

7m. OPTIMALLY COMBINED ESTIMATION FOR TAIL QUANTILE REGRESSION

Kehui Wang*, North Carolina State University
Huixia Judy Wang, The George Washington University

Quantile regression offers a convenient tool to assess the relationship between a response and covariates in a comprehensive way, and it is especially appealing in applications where interest lies in the tails of the response distribution. However, due to data sparsity, finite-sample estimation at tail quantiles often suffers from high variability. To improve tail estimation efficiency, we consider modeling multiple quantiles jointly for cases where the quantile slope coefficients tend to be constant at the tails. We propose two estimators: the weighted composite estimator, which minimizes the weighted combined quantile objective function across quantiles, and the weighted quantile average estimator, which is the weighted average of quantile-specific slope estimators. Using extreme value theory, we establish the asymptotic distributions of the two estimators at the tails and propose a procedure for estimating the optimal weights. We show that the optimally weighted estimators improve efficiency over equally weighted estimators, and that the efficiency gain depends on the heaviness of the tail distribution. The performance of the proposed estimators is assessed through a simulation study and the analysis of a precipitation downscaling data set.

email: [email protected]

8. POSTERS: Variable Selection and Methods for High Dimensional Data

8a. BAYES FACTOR CONSISTENCY UNDER G-PRIOR LINEAR MODEL WITH GROWING MODEL SIZE

Ruoxuan Xiang*, University of Florida
Malay Ghosh, University of Florida
Kshitij Khare, University of Florida

In this paper, we examine Bayes factor consistency in the context of Bayesian variable selection for normal linear regression models. We take a hierarchical approach using a hyper-g prior (Liang et al., 2008, J. Amer. Statist. Assoc.). There are two regimes for computing Bayes factors, which differ in the choice of the base model. We study conditions under which Bayes factors are consistent for both regimes when the number of potential regressors grows with the sample size. This situation is not fully understood in the current literature, but has gained increasing importance recently. In the present case, Bayes factors are not analytically tractable and are calculated via Laplace approximation. Results for other priors on g (e.g., the Zellner-Siow prior) can be obtained in a similar manner.

email: [email protected]

8b. VARIABLE SELECTION FOR COX PROPORTIONAL HAZARD FRAILTY MODEL

Ioanna Pelagia*, The University of Manchester, United Kingdom
Jianxin Pan, The University of Manchester, United Kingdom

Extending the Cox proportional hazards (PH) model to the Cox PH frailty model may increase the dimension of the variable components and make the assessment of significance and the estimation of the parameter coefficients very challenging. Variable selection, meanwhile, has always been one of the fundamental problems in statistical modelling with high-dimensional variables and has attracted remarkable attention. Various variable selection techniques have been proposed, such as best subset selection and stepwise elimination, but these suffer from several drawbacks in contrast with penalty functions. The method proposed here overcomes the problem of high dimension under the Cox PH frailty model by considering simultaneous variable selection of both fixed effects and frailty components through penalty functions such as LASSO and SCAD. Simulation studies show that the proposed procedure works well in selecting and estimating significant fixed and frailty terms. The proposed method is also applied to a real data analysis of Type 2 diabetes.

email: [email protected]

8c. FUSED LASSO APPROACH TO ASSESSING DATA COMPARABILITY WITH APPLICATIONS IN MISSING DATA IMPUTATION

Lu Tang*, University of Michigan
Peter X. K. Song, University of Michigan

Missing data imputation is a highly sought-after approach for missing data problems under big-data settings due to the curse of computing burden. Popular imputation methods include single imputation, such as mean imputation, regression imputation, stochastic imputation and hot-deck imputation, as well as multiple imputation, which averages the outcomes from multiple imputed data sets. It is known that most model-based imputation methods are sensitive to the distributions assumed to generate plausible values for the replacement of missing data. In effect, when the assumed model is misspecified, the imputation method may yield "consistently biased" imputation values, which will cause misleading statistical estimation and inference. In this paper, we propose a method to evaluate the discrepancy between the distribution of the original data set and that of the imputed data set, which will provide guidance on the selection of appropriate imputation techniques, for example, making a choice between model-based imputation and nearest neighbor imputation. Our evaluation is built upon the combination of fused lasso and bootstrap resampling techniques, for both of which statistical software is readily available. We use extensive simulation studies to demonstrate the performance of the new method under different situations. A real data application example is also provided.

email: [email protected]
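For orientation, the generic fused lasso criterion referred to in abstract 8c couples an l1 penalty on the coefficients with an l1 penalty on differences of adjacent coefficients,

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
\frac{1}{2}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}
\;+\; \lambda_1 \sum_{j=1}^{p} \lvert\beta_j\rvert
\;+\; \lambda_2 \sum_{j=2}^{p} \lvert\beta_j - \beta_{j-1}\rvert,
```

so that estimates are shrunk both toward zero and toward their neighbors. How the authors combine this with bootstrap resampling into a comparability measure for imputed data is specific to their method and is not spelled out in the abstract.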

8d. MULTIPLE IMPUTATION USING SPARSE PCA FOR HIGH-DIMENSIONAL DATA

Domonique Watson Hodge*, Emory University
Qi Long, Emory University

Missing data present challenges in the statistical analysis phase of research. Common naive analyses such as complete-case and available-case analysis may introduce bias, lose efficiency, and produce unreliable results. Multiple imputation is one of the most widely used methods for handling missing data, which can be attributed to its ease of use. However, more research needs to be conducted to determine the best strategy for conducting multiple imputation (MI) in the presence of high-dimensional data. To address this concern, we evaluate several approaches for MI based on sparse principal component analysis (SPCA). The performance of these methods is assessed through numerical studies.

email: [email protected]

8e. TOPIC MODELING FOR SIGNAL DETECTION OF SAFETY DATA FROM ADVERSE EVENT REPORTING SYSTEM DATABASE

Weizhong Zhao*, U.S. Food and Drug Administration
Wen Zou, U.S. Food and Drug Administration
James J. Chen, U.S. Food and Drug Administration

The FDA centers receive reports from consumers, health care professionals, manufacturers, and others regarding the safety of various regulated products, such as drugs, vaccines, artificial hearts, surgical lasers, and nutritional supplements. It is a challenge to extract the information in these reports for better assessment of product safety and rapid detection of adverse event signals. In this study, we applied a topic modeling approach, which is a hierarchical Bayesian model, to analyzing FDA adverse event databases. Topic models can reveal "hidden" patterns between adverse events and products, such as drugs, vaccines, or consumption products, to detect potential safety signals. Drug groups inducing similar adverse events, for example, can be identified, and adverse events reported simultaneously with similar drugs are clustered into groups as well. The proposed method was evaluated through an analysis of an FDA drug adverse event dataset, which included 193 cardiovascular drugs with 8,453 adverse events. The results identified some new signals as well as signals clarified by other commonly used approaches.

email: [email protected]

8f. BUILDING RISK MODELS WITH CALIBRATED MARGINS

Paige Maas*, National Cancer Institute, National Institutes of Health
Yi-Hau Chen, Academia Sinica
Raymond Carroll, Texas A&M University
Nilanjan Chatterjee, National Cancer Institute, National Institutes of Health

Risk models are used to weigh the risks and benefits of preventative interventions in clinical and public health settings. For many diseases, established risk models have been developed based on data from large representative cohorts and thoroughly validated in independent studies. As new risk factors are identified, there is a need to update existing risk models to include up-to-date information in predicting disease risk. It is often not reasonable to conduct an entirely new cohort study to collect the few additional risk factors needed to refit a given risk model. In fact, a more efficient method would add new risk factors while incorporating information from existing models as much as possible. We investigate two approaches for using existing models to calibrate a new model. First, we explore the use of a regression calibration approach, utilizing a method from the sample-survey literature that is traditionally used to increase the efficiency of parameter estimation from a survey by leveraging information from external data sources. Second, we investigate a constrained maximum likelihood approach, leveraging a key constraint identified in our work with the regression calibration method. We present analytic and numerical results that reveal the performance of these approaches in various relevant scenarios.

email: [email protected]

8g. CATEGORICAL PREDICTORS AND PAIRWISE COMPARISONS IN LOGISTIC REGRESSION VIA PENALIZATION AND BREGMAN METHODS

Tian Chen*, North Carolina State University
Howard Bondell, North Carolina State University

Logistic regression is widely used to study the relationship between a binary response and a set of covariates. When the covariates in the logistic regression are categorical, two goals are determining the important factors and detecting differences among the levels of these important categorical factors. In this paper, we propose a penalization-based approach to conduct these pairwise comparisons among the levels. Within a single procedure, the irrelevant factors can be removed, while the levels within the important factors can be collapsed into groups. We propose an algorithm based on Split Bregman iterations, which transforms the constrained problem into a series of simple unconstrained problems. Because of the logistic structure, Iteratively Reweighted Least Squares (IRLS) is applied for optimization. It is shown that the method has the oracle property, indicating that asymptotically it performs as well as if the true structure were known in advance. Simulation studies show the superiority of this procedure over traditional post hoc multiple comparison hypothesis tests. The utility of the method itself, as well as the computational approach, are also examined via a real data analysis.

email: [email protected]

8h. COMPARISON OF STEP-WISE VARIABLE SELECTION, glmmLasso, AND GMMBoost FOR IDENTIFICATION OF PREDICTOR INTERACTIONS ASSOCIATED WITH DISEASE OUTCOME

Yunyun Jiang*, Medical University of South Carolina
Bethany Wolf, Medical University of South Carolina

Predicting patients' disease risk, severity, or response to treatment often necessitates modeling complex interactions among genetic and environmental variables measured over time. Generalized linear mixed models (GLMM) can model interactions in data with repeated measures; however, without an a priori hypothesis, identification of higher-order interactions can be cumbersome. Predictors can be selected using a step-wise variable selection technique, comparing models using statistics such as Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC). However, such variable selection techniques are known to produce unstable estimates. Lasso regression is an alternative predictor selection algorithm that yields sparse estimators by including a "shrinkage" penalty parameter. Lasso regression has been adapted for GLMMs, a method referred to as glmmLasso. GMMBoost, a boosted generalized linear mixed modeling algorithm, is another variable selection technique that yields a sparse solution through reweighting of model residuals. Both glmmLasso and GMMBoost effectively handle large numbers of predictors and achieve sparse prediction models. We conducted a simulation study comparing the ability of step-wise selection in GLMM, glmmLasso, and GMMBoost to correctly identify variables and interactions associated with a disease outcome. We apply these techniques to identify variables and variable interactions associated with treatment response in patients with lupus nephritis.

email: [email protected]

8i. SHRINKAGE PRIORS FOR BAYESIAN LEARNING FROM HIGH DIMENSIONAL GENETICS DATA

Anjishnu Banerjee*, Medical College of Wisconsin

Shrinkage priors are widely used in high dimensional settings for variable selection, prediction and learning. There are currently two generic flavors of shrinkage priors: the first uses a global shrinkage parameter, and the second uses an individual shrinkage parameter for each of the variables in question. There has been a lot of interest of late, notably with the horseshoe prior of Scott and Polson (2010), which belongs to the second category and has been shown to outperform the global shrinkage parameter paradigm in experiments and theoretical settings. We argue in this article that neither approach is optimal, from both theoretical and computational perspectives. We propose a new variant of shrinkage priors, which is a "middle path" between the global and local approaches, and show superior empirical performance and significant gains in computational efficiency. We apply our proposed algorithm to the setting of high dimensional genetic data and compare it against competing approaches.

email: [email protected]
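For context (our summary, not the author's formulation), the two "flavors" and the horseshoe prior mentioned in abstract 8i are commonly written as

```latex
\text{global: } \beta_j \mid \tau \sim \mathcal{N}(0, \tau^{2}),
\qquad
\text{local: } \beta_j \mid \lambda_j \sim \mathcal{N}(0, \lambda_j^{2}),
\qquad
\text{horseshoe: } \beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^{2}\tau^{2}),\;
\lambda_j \sim \mathrm{C}^{+}(0,1),\; \tau \sim \mathrm{C}^{+}(0,1),
```

where C+(0,1) denotes a half-Cauchy distribution; the horseshoe thus combines a global scale with variable-specific local scales.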

8j. FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS OF THE FIFTY-EIGHT MOST TRADED CURRENCIES BASED ON THE EURO

Jong-Min Kim, University of Minnesota, Morris
Ali H. AL-Marshadi, King Abdulaziz University
Junho Lim*, University of Minnesota, Morris

This research investigates the recent trend of the fifty-eight most traded currencies against the Euro using functional principal component analysis of data since January 2013. We also performed a functional linear regression of Brent crude oil on the currencies selected by Bayesian variable selection.

email: [email protected]

9. POSTERS: Bayesian Methods and Computational Algorithms

9a. NONPARAMETRIC BAYES MODELS FOR MODELING LONGITUDINAL CHANGE IN ASSOCIATION AMONG CATEGORICAL VARIABLES

Tsuyoshi Kunihama, Duke University
Amy Herring*, University of North Carolina, Chapel Hill
David Dunson, Duke University
Carolyn Halpern, University of North Carolina, Chapel Hill

Modeling and computation for multivariate longitudinal data has proven challenging, particularly when data are not all continuous but contain discrete measurements. Approaches based on generalized linear mixed modeling, and related exponential family hierarchical models, have been criticized due to a lack of robustness. In particular, problems arise due to the dual role of the random effects structure in controlling the dependence and the shape of the marginal distributions. Motivated by an interesting application to sexual preference data, we propose a novel approach based on a Dirichlet process mixture of Gaussian latent factor models. The proposed model uses a rounded kernel method to allow data to be of mixed scale, with a longitudinal factor structure incorporating within-subject dependence in the repeated measurements. Survey weights are incorporated into the model to facilitate generalizability. Parameter interpretation is considered, and an efficient Markov chain Monte Carlo algorithm is proposed. The methods are assessed through simulation studies and applied to the National Longitudinal Study of Adolescent Health.

email: [email protected]

9b. REGRESSION MODEL ESTIMATION AND PREDICTION INCORPORATING COEFFICIENT INFORMATION

Wenting Cheng*, University of Michigan
Jeremy M. G. Taylor, University of Michigan
Bhramar Mukherjee, University of Michigan

We consider a situation where there is a rich amount of historical data available for the coefficients and their standard errors in a regression model of E(Y|X) from large studies, and we would like to utilize this summary information to improve inference in an expanded model of interest, say E(Y|X, B). The additional variables B could be thought of as a set of new biomarkers, measured on a modest number of subjects in a new dataset. We formulate the problem in an inferential framework where the historical information is translated into non-linear constraints on the parameter space. We propose several frequentist and Bayes solutions to this problem. In particular, we show that the transformation approach proposed in Gunn and Dunson (2005) is a simple and effective computational method for conducting Bayesian inference in this constrained parameter situation. Our simulation results comparing the methods indicate that historical information on E(Y|X) can indeed boost the efficiency of estimation and enhance predictive power in the regression model of interest, E(Y|X, B).

email: [email protected]

9c. CROSS-CORRELATION OF

Congjian Liu*, Georgia Southern University

In general, at a given location or time, the observations or data follow two different models before and after it; this point is a change point. Change point problems involve chronologically ordered data collected over a period of time during which there is known (or suspected) to have been a change in the underlying data generation process. Interest then lies in retrospectively making inferences about the time or position in the sequence at which the change occurred (Everitt, 2010). See Fig. 1, where many change points are shown in the plot of the data. In change point problems, we have a series of observations or samples. In most cases, these observations appear in chronological order. On the other hand, samples with a spatial distribution are also possible, in which the change point is in space and marks the position of an interface; in one-dimensional space this reduces to the time-ordered setting described above.

email: [email protected]

9d. BAYESIAN NETWORK MODELS FOR SUBJECT-LEVEL INFERENCE

Sayantan Banerjee*, University of Texas MD Anderson Cancer Center
Han Liang, University of Texas MD Anderson Cancer Center
Veerabhadran Baladandayuthapani, University of Texas MD Anderson Cancer Center

We develop Bayesian models to analyze proteomic networks in different cancer types. Our primary aim is to predict patient-specific network structure leveraging multi-domain genomic data using Directed Acyclic Graphical (DAG) models. We infer the prior DAG network based on the training data and obtain the corresponding posterior network based on sparse Bayesian regression methods on each of the nodes, incorporating gene-level information (mRNA, miRNA and methylation) for each of the proteins. Bayesian model averaging is used to predict the responses for patients in the test data, along with obtaining the predictive density for each of the proteins corresponding to each test patient. A network score is proposed for each patient as a measure of activation of the patient-specific network, based on probabilities of protein activation for each of the proteins. The network scores are used to fit a survival model for the patients. The methods are motivated by and applied to Reverse Phase Protein Array (RPPA) data for two different cancer types, namely Kidney Renal Clear Cell Carcinoma (KIRC) and Lung Squamous Cell Carcinoma (LUSC), for prediction of the proteins in the PI3K/AKT pathway.

email: [email protected]

9e. ALGORITHMS FOR CONSTRAINED GENERALIZED EIGENVALUE PROBLEM

Eun Jeong Min*, North Carolina State University
Hua Zhou, North Carolina State University

The generalized Rayleigh quotient R(x) = (x^T A x) / (x^T B x), for symmetric and positive semi-definite matrices A and B, appears as the objective function in many multivariate statistics problems such as principal component analysis, canonical correlation analysis, and partial least squares. Maximizing or minimizing the Rayleigh quotient yields the generalized eigenvector corresponding to the maximal or minimal generalized eigenvalue, respectively. In many applications, parameter constraints such as non-negativity and sparsity are necessary for better interpretability and conditioning. We investigate three classes of algorithms for the constrained generalized eigenvector problem: gradient-based methods, coordinate descent, and the alternating direction method of multipliers. Their numerical efficiency and convergence properties are evaluated by simulation studies and real data arising from imaging genetics.

email: [email protected]
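As a toy illustration of the gradient-based class of algorithms mentioned in abstract 9e (our sketch under a non-negativity constraint, not the authors' implementation), a projected gradient ascent on the generalized Rayleigh quotient might look like:

```python
import numpy as np

def constrained_rayleigh(A, B, n_iter=500, step=0.1, seed=0):
    """Maximize R(x) = (x'Ax)/(x'Bx) subject to x >= 0, ||x|| = 1,
    via projected gradient ascent (toy illustration only)."""
    rng = np.random.default_rng(seed)
    x = np.abs(rng.normal(size=A.shape[0]))
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        Bx = B @ x
        r = (x @ A @ x) / (x @ Bx)            # current Rayleigh quotient
        grad = 2 * (A @ x - r * Bx) / (x @ Bx)
        x = np.maximum(x + step * grad, 0.0)  # gradient step, project onto x >= 0
        nrm = np.linalg.norm(x)
        if nrm == 0.0:                        # restart if the iterate collapses
            x = np.abs(rng.normal(size=A.shape[0]))
            nrm = np.linalg.norm(x)
        x /= nrm
    return x, (x @ A @ x) / (x @ B @ x)

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5)); A = M @ M.T                  # toy symmetric PSD matrices
N = rng.normal(size=(5, 5)); B = N @ N.T + 0.1 * np.eye(5)
x_hat, r_hat = constrained_rayleigh(A, B)
print(np.round(x_hat, 3), round(r_hat, 3))
```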

9f. CycloPs: A CYCLOSTATIONARY ALGORITHM FOR AUTOMATIC WALKING RECOGNITION

Jacek K. Urbanek*, Johns Hopkins Bloomberg School of Public Health
Vadim Zipunnikov, Johns Hopkins Bloomberg School of Public Health
Tamara B. Harris, National Institute on Aging, National Institutes of Health
Nancy W. Glynn, University of Pittsburgh
Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health
Jaroslaw Harezlak, Indiana University School of Medicine

We develop an algorithm (CycloPs) for automatic recognition of walking periods based on modeling of local cyclostationarity in high-frequency time series obtained from wearable accelerometers. The algorithm uses advanced spectral analysis to recognize walking and to describe its properties at a sub-second level, such as walking instantaneous energy (WalE), expressed in earth's gravity units, and instantaneous walking frequency (IWF), expressed in steps per second. CycloPs is robust against within- and between-subject variability, automatically adapts to the length of recording, type of device, and configuration set-up, and can be applied to data collected by wrist-, hip- and ankle-worn accelerometers. CycloPs uses a one-pass exhaustive search algorithm that can process a week of data (~150M measurements) in an hour, allowing for efficient processing of very large datasets. We apply our algorithm to free-living data obtained from the Developmental Cohort Study (DECOS), in which 50 elderly subjects were monitored for one week (~300 GB of data). Our results show that both WalE and IWF are strongly associated with subjects' gender and age.

email: [email protected]

9g. SIMULATION-BASED ESTIMATION OF MEAN AND VARIANCE FOR META-ANALYSIS VIA APPROXIMATE BAYESIAN COMPUTATION (ABC)

Deukwoo Kwon*, University of Miami
Isildinha M. Reis, University of Miami

In meta-analysis, the crucial inputs are the mean effect size and its corresponding variance from each study, needed in order to obtain a pooled estimate. Hozo et al. (2005) proposed sample standard deviation formulas using the median, the low and high ends of the range, and the sample size. Wan et al. (2014) proposed a new estimation method for the standard deviation using the same descriptive statistics as Hozo et al. along with the inter-quartile range (IQR). These summary statistics are commonly reported in most studies. However, some of the literature provides descriptive statistics (and/or summary statistics) other than the median, range, and IQR, such as a 95% confidence interval or just a mean and p-value. In longitudinal meta-analysis, we are often given the mean and standard deviation at baseline and mean differences with corresponding standard deviations for specific time points relative to baseline. In this study we propose a simulation-based estimation approach using the Approximate Bayesian Computation (ABC) technique for estimating the mean and variance based on any type of summary statistics found in the published studies. We conduct a simulation study to compare the existing methods with the proposed method. We also include an illustrative example of a longitudinal meta-analysis of quality-of-life (QoL) data in prostate cancer patients.

email: [email protected]

9h. THE EFFECTS OF SPARSITY CONSTRAINTS ON INFERENCE OF BIOLOGICAL PROCESSES IN STOCHASTIC NON-NEGATIVE MATRIX FACTORIZATION OF EXPRESSION DATA

Wai S. Lee*, Johns Hopkins University
Alexander V. Favorov, Johns Hopkins University
Elana J. Fertig, Johns Hopkins University
Michael F. Ochs, The College of New Jersey

Non-negative matrix factorization (NMF) and related methods, such as PCA, ICA, and factor analysis, model gene expression data as a mixture of underlying expression patterns. It has been established that sparsity is a powerful constraint for recovering biological information from these analyses. For example, the CoGAPS matrix factorization algorithm uses a Markov chain Monte Carlo approach that incorporates a prior distribution which enforces both non-negativity and sparsity constraints. We present results using our new CoGAPS R/C++ Bioconductor package to explore the effects of different levels of sparsity on the recovery of biological information from a well-studied cancer data set. As expected, we observe increased chi-squared values as sparsity is increased, along with reduced structure in the estimated matrix decomposition. However, in terms of recovery of previously validated biological processes, we find that there is an optimal range of sparsity that provides more reliable estimation of biological process activity.

email: [email protected]
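To make the ABC idea of abstract 9g concrete, here is a toy rejection-ABC version of our own (not the authors' algorithm; the priors, tolerances, and reported summaries are invented) that recovers a normal mean and SD from a published median and range:

```python
import numpy as np

rng = np.random.default_rng(2)

# summaries reported by a hypothetical study
obs_median, obs_min, obs_max, n = 50.0, 30.0, 75.0, 40

accepted = []
for _ in range(100_000):
    mu = rng.uniform(20, 90)            # candidate parameters from vague priors
    sigma = rng.uniform(0.5, 30)
    sample = rng.normal(mu, sigma, n)   # simulate a study of the same size
    # keep draws whose simulated summaries are close to the reported ones
    if (abs(np.median(sample) - obs_median) < 1.5
            and abs(sample.min() - obs_min) < 2.5
            and abs(sample.max() - obs_max) < 2.5):
        accepted.append((mu, sigma))

accepted = np.array(accepted)
print("accepted draws:", len(accepted))
print("ABC estimate of the mean:", accepted[:, 0].mean().round(2))
print("ABC estimate of the SD  :", accepted[:, 1].mean().round(2))
```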

9i. BAYESIAN SAMPLE SIZE DETERMINATION FOR HURDLE MODELS

Joyce Cheng*, Baylor University
David Kahle, Baylor University
John W. Seaman, Baylor University

In many areas of research, count data containing a large number of zero outcomes are common. Hurdle models are often presented as an alternative to zero-inflated models for such data. Hurdle models consist of two parts: a binary model indicating a positive response (the 'hurdle') and a zero-truncated count model. One or both sides of the model can depend on covariates, which may or may not overlap. Sample size determination is an important aspect of experimental design for clinical trials. This is not a new problem in the realm of zero-inflated count data and has been addressed in the literature in a frequentist context. We consider a Bayesian approach to sample size determination for hurdle models and show its application to a hypothetical sleep disorder study.

email: [email protected]

9j. FAST COVARIANCE ESTIMATION FOR SPARSE FUNCTIONAL/LONGITUDINAL DATA

Luo Xiao*, Johns Hopkins University
David Ruppert, Cornell University
Vadim Zipunnikov, Johns Hopkins Bloomberg School of Public Health
Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health

Covariance function estimation is essential in functional/longitudinal data analysis. While covariance estimation is a bivariate smoothing problem, no bivariate smoother has been tailored to it. In this work, we propose a fast bivariate penalized spline smoother for estimating covariance functions from sparsely observed data. We select the smoothing parameter through leave-one-subject-out cross-validation and derive a fast algorithm to overcome computational difficulties. Simulation results show that the proposed method works well. We illustrate the method with an application to children's growth data.

email: [email protected]

9k. PRIOR ELICITATION FOR LOGISTIC REGRESSION WITH DATA EXHIBITING MARKOV DEPENDENCY

Michelle S. Marcovitz*, Baylor University
John Seaman Jr., Baylor University

We model data from a questionnaire with three binary questions, some of which are sensitive. The three questions exhibit first-order Markov dependency, so that the answer to the second question depends on the answer to the first, and the answer to the third question depends on the answer to the second. For example, in a population of female sex workers admitted for treatment of STDs, the questions might be (1) "Do you engage in unprotected sex?", (2) "Have you been arrested for prostitution?", and (3) "Do you have dependents living with you?" Participants are randomized to different versions of the questionnaire to protect privacy. We offer a Bayesian logistic regression model for analyzing such data where the parameters of interest are marginal and conditional probabilities of answering "yes". We construct power priors and conditional means priors. We consider the issue of induced priors for the marginal and conditional probabilities of "yes" answers when priors are elicited on the regression parameters. Finally, we implement a Bayesian sample size determination method based on the two-priors approach for the logistic regression model.

email: [email protected]
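In symbols (one simple parameterization consistent with abstract 9k, ours rather than the authors'), the first-order Markov dependency means the joint probability of the three binary answers factors as

```latex
P(Y_1, Y_2, Y_3) \;=\; P(Y_1)\,P(Y_2 \mid Y_1)\,P(Y_3 \mid Y_2),
\qquad
\operatorname{logit} P(Y_1 = 1) = \alpha_1,\quad
\operatorname{logit} P(Y_k = 1 \mid Y_{k-1}) = \alpha_k + \beta_k Y_{k-1},\;\; k = 2, 3,
```

so that priors placed on the regression parameters (α, β) induce priors on the marginal and conditional "yes" probabilities that the authors study.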

10. Advances in Patient-Centered Outcomes (PCOR) Methodology

PCORI FUNDING OPPORTUNITIES FOR BIOSTATISTICIANS

Jason Gerson*, Patient-Centered Outcomes Research Institute (PCORI)

This talk will provide an overview of the PCORI funding opportunities for biostatisticians and the methodology projects currently funded by PCORI.

email: [email protected]

CAUSAL INFERENCE FOR EFFECTIVENESS RESEARCH USING SECONDARY DATA

Sebastian Schneeweiss*, Harvard University

The routine operation of the US healthcare system produces an abundance of electronically stored data that capture the care of patients as it is provided in settings outside of controlled research environments. The potential for utilizing these data to inform future treatment choices and improve patient care and outcomes for all patients in the very system that generates the data is widely acknowledged. Particularly for elderly multi-morbid patients and most other vulnerable patient groups, who are often excluded from randomized trials, these data, properly analyzed, are key to improving care. Further, such secondary data reflect health outcomes as they occur in routine care, a main goal of effectiveness research. Given these key properties of secondary data and the abundance of electronic healthcare databases covering millions of patients, it is critical to strengthen the rigor of analyses of such data. Highly innovative analytic approaches have recently been developed that (1) are solidly grounded in the principles of science and (2) are made to best fit any electronic healthcare data source. With the involvement of top researchers, patients, doctors, and other decision makers, we plan to evaluate how much better these new methods perform. To prove this, we use several large databases of electronic medical records and health insurance records. We will test the relationship between two newer and frequently used cardiovascular therapies. We will also use computer-generated artificial data in which we can impose a known association. In such simulation studies, we can further understand and improve the performance of these new analytic methods.

email: [email protected]

OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS, USING SPARSE LINEAR PROGRAMMING

Michael Rosenblum*, Johns Hopkins Bloomberg School of Public Health
Xingyuan Fang, Princeton University
Han Liu, Princeton University

Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. These designs can be useful when it is suspected that treatment effects may differ in certain subpopulations, such as those defined by a biomarker or risk factor at baseline. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment and the multiple testing procedure. We provide a general method for simultaneously optimizing both of these components for two stage, adaptive enrichment designs. The optimality criteria are defined in terms of expected sample size and power, under the constraint that the familywise Type I error rate is strongly controlled. It is infeasible to solve this optimization problem directly since it is not convex. The key to our approach is a novel representation of a discretized version of this optimization problem as a sparse linear program. We apply advanced optimization tools to solve this problem to high accuracy, revealing new, optimal designs.

email: [email protected]

TREATMENT EFFECT INFERENCES USING OBSERVATIONAL DATA WHEN TREATMENT EFFECTS ARE HETEROGENEOUS ACROSS OUTCOMES: SIMULATION EVIDENCE

John M. Brooks*, University of South Carolina
Cole G. Chapman, University of South Carolina

Helping patients make patient-centered treatment decisions requires treatment effect evidence aligned to the circumstances of individual patients. If treatment effects are heterogeneous across patients, randomized controlled trials are impractical for this purpose, and many have recognized the necessity of using observational data to generate evidence more closely aligned to individual patients. However, the treatments observed in observational databases are real-world treatment decisions that often involve complex assessments of treatment effects across multiple outcomes valued by patients. Risk adjustment (RA) estimators and instrumental variable (IV) estimators are available to estimate treatment effectiveness using observational data. When treatment effects are heterogeneous across patients, though, it is critical to understand that these estimators yield parameter estimates applicable to distinct patient subsets, and improper interpretation could lead to dramatic policy mistakes. Here we further conjecture that these interpretive distinctions are less clear when real-world treatment decisions are complex and affect more than one outcome. Additional methodological research is needed to understand the proper interpretations of RA and IV estimates in complex treatment scenarios to avoid treatment and policy mistakes. In this study we use simulation modeling to assess the properties of the parameters produced by RA and IV estimators under various relationships between treatment benefits and risks across outcomes.

email: [email protected]

Program & Abstracts 187 across patients, randomized control tri- 11. Looking Under MICROSIMULATION MODELING als are impractical for this purpose and the Hood: TO INFORM HEALTH POLICY many have recognized the necessity of Assumptions, Methods DECISIONS ON AGE TO BEGIN, using observational data to generate and Applications of AGE TO END, AND INTERVALS OF evidence more closely aligned to indi- Microsimulation Models COLORECTAL CANCER SCREENING vidual patients. However, the treatments to Inform Health Policy Ann G. Zauber*, Memorial Sloan Ketter- observed in observational databases are ing Cancer Center real world treatment decisions that often involve complex assessments of treat- INTRODUCTION TO THE CISNET Microsimulation modeling is increasingly ment effects across multiple outcomes PROGRAM AND POPULATION being used to inform health policy deci- valued by patients. Risk adjustment (RA) COMPARATIVE MODELING sions but there is a lack of understanding in the public health and statistical com- estimators and instrumental variable Eric J. Feuer*, National Cancer munity regarding when these models (IV) estimators are available to estimate Institute, National Institutes of Health treatment effectiveness using observa- are needed, what kind of questions they CISNET is a consortium of NCI-spon- tional data. When treatment effects are can uniquely address, and what are their sored investigators that use simulation heterogeneous across patients, though, it strengths and weaknesses. We provide modeling to understand past trends in is critical to understand that these estima- an example of microsimulation modeling cancer incidence and mortality, and to tors yield parameter estimates applicable to inform a health policy decision when guide public health research and priori- to distinct patient subsets and improper randomized controlled trials could not ties. In this talk we describe the role of interpretation could lead to dramatic be conducted for the number of options simulation modeling in understanding the policy mistakes. Here we further conjec- under consideration. The colorectal population impact of interventions (i.e. ture that these interpretive distinctions are cancer models from the Cancer Interven- screening, treatment and prevention). less clear when real world treatment deci- tion and Surveillance Modeling Network We review some of the unique aspects of sions are complex and affect more than (CISNET) were used to assess the age to modeling as carried out by CISNET, i.e. one outcome. Additional methodologi- begin screening (ages 40, 50, or 60), age the development of flexible broad-based cal research is needed to understand to end screening (75 or 85) and intervals disease models, the ability to model mul- the proper interpretations of RA and IV of repeat screening ( 5, 10, or 20 years tiple birth cohorts, comparative modeling, estimates in complex treatment scenarios for endoscopic tests and 1, 2 or 3 years transparency in model structure and to avoid treatment and policy mistakes. In for fecal occult blood tests). We used a assumptions, and outreach to partners to this study we use simulation modeling to natural history model of the adenoma car- make the modeling relevant. Finally, we assess the properties of the parameters cinoma sequence for colorectal cancer summarize some of the major accom- produced by RA and IV estimators under and overlaid screening interventions on a plishments of CISNET. various relationships between treatment large simulated population. 
The recom- benefits and risks across outcomes. email: [email protected] mended screening strategy was to begin at age 50 and stop at age 75 provided the email: [email protected] patient had been consistently screened with negative findings. This was the best strategy to balance life years gained with the resources required and complications associated with screening. email: [email protected]

188 ENAR 2015 | Spring Meeting | March 15–18 ROLE OF CALIBRATION AND USING MICROSIMULATION TO SYNTHESIS OF RANDOMIZED VALIDATION IN DEVELOPING ASSESS THE RELATIVE CON- CONTROLLED TRIALS OF PROSTATE MICROSIMULATION MODELS TRIBUTIONS OF SCREENING CANCER SCREENING TO ASSESS AND TREATMENT IN OBSERVED IMPACT OF PSA TESTING USING Carolyn M. Rutter*, RAND Corporation REDUCTIONS IN BREAST CANCER MICROSIMULATIONS Microsimulation models are an important MORTALITY IN THE UNITED STATES Ruth Etzioni*, Fred Hutchinson Cancer tool for informing health policy. Models Donald A. Berry*, University of Texas MD Research Center provide a structure for combining a wide Anderson Cancer Center range of evidence that represents the cur- Roman Gulati, Fred Hutchinson Cancer rent understanding of both disease and More randomized trials have addressed Research Center improvements in treating and screen- interventions to prevent or treat disease. Alex Tsodikov, University of Michigan This structure includes a descrip- ing for breast cancer over the past 30 tion of heath states that describe key years than in any other cancer. Over time Eveline Heijnsdijk, Erasmus University events in a disease processes and rules since the 1980s these interventions have Harry de Koning, Erasmus University describing transitions between states. been incorporated into clinical practice Randomized trials are the gold standard Parameter associated with transitions in the US; breast cancer mortality has for evidence regarding the efficacy of rules are selected to achieve good fit to since dropped by 30%, with comparable cancer screening tests. In the case of observed statistics through a process decreases in many European countries. PSA screening for prostate cancer, two of model calibration. Once calibrated, Are the decreases due to treatment or trials conducted in the US and Europe models are used to predict population- screening or both? The 7 (now 6) Breast produced apparently conflicting results, level outcomes under different policy CISNET models specifically address this with the European (ERSPC) trial indi- scenarios. Model validation, evaluation question and attribute relative benefits cating a significant benefit and the US of model predictions for data not used to the two types of interventions. Having (PLCO) trial showing no benefit. Rec- for calibration, is critical for developing 7 modeling teams addressing the same ognizing that the two trials had different confidence in model predictions. This question using common data sources populations, protocols and compliance presentation focuses on issues related is unique. It allows for addressing the rates we used simulation modeling to microsimulation model validation, variability of conclusions across model- to replicate the trials as conducted to using three models for colorectal cancer ing approaches. In the present case it determine whether a common screen- screening as an example. We evaluated enables robust conclusions regarding ing efficacy could be identified. Three the accuracy of model predictions across a fundamental scientific and medical different models of disease natural his- a range of natural history landmarks to question. A New York Times editorial put tory, screening and mortality were used. gain insight into the accuracy of model it thusly: “What seems most important is The models showed that under efficacy assumptions. 
Models generally provided that each team found at least some bene- similar to that in the ERSPC trial, the null good predictions of observed data, and fit from mammograms. The likelihood that result produced by US trial result would between-model comparisons supported they are beneficial seems a lot more solid not be unexpected (15-28% probability longer preclinical duration assumptions. today than it did four years ago, although across the models) given the extreme Validation is important, but complicated, the size of the benefit remains in dispute.” contamination observed on the control especially when evaluating fit to multiple I will describe various improvements in arm of the US trial. Further, by modeling targets, when using observed data that Breast CISNET models, especially how differences between the trials one at a may be prone to bias, and when translat- they address heterogeneity of the molec- ing models to different target populations. ular characteristics of breast cancer. email: [email protected] email: [email protected]

Program & Abstracts 189 time we were able to identify the main regularity in the images and the abun- We establish the scope of validity of these factors that explain the different results. dance of image databases to estimate folklore theorems. Starting with message We conclude that differences in imple- a patch-wise density prior. As we show, passing algorithm as a heuristic method mentation explain much if not most of explicitly accounting for the modeling for solving $\ell_q$ penalized least the reported differences in screening error in the prior improves the achieved squares, we study the following questions efficacy across the trials, but note that our reconstructions. This work is a collabora- in the asymptotic settings: (i) What is results are subject to a large degree of tion with the Gallant lab in UC Berkeley. the impact of initialization on the perfor- uncertainty. mance of the algorithm? (ii) When does email: [email protected] the algorithm converge to the sparsest email: [email protected] solution regardless of the initialization? DOES lq MINIMIZATION Studying these questions leads us to the 12. Optimal Inference OUTPERFORM l1 MINIMIZATION answer of the first folklore theorem, i.e., the performance of the global minimizer for High Dimensional Arian Maleki*, Columbia University Problems $\hat \beta(\lambda,q)$. In many application areas ranging email: [email protected] from bioinformatics to imaging we are A NON-PARAMETRIC NATURAL faced with the following question: Can IMAGE FOR DECODING VISUAL we recover a sparse vector $\beta_o \ INFERENCE IN HIGH-DIMENSIONAL STIMULI FROM THE BRAIN in \mathbb{R}^p$ from its unders- VARYING COEFFICIENT MODELS ampled set of noisy observations $y\in\ Yuval Benjamini*, Stanford University Mladen Kolar*, University of Chicago mathbb{R}^n$, $y=X\beta+\epsilon$? Bin Yu, University of California, Berkeley The last decade has witnessed a surge Damian Kozbur, ETH, Zurich of algorithms and theories to address Brain decoding refers to extracting the Varying coefficient models have been this question. One of the most popular experimental stimulus - in our case a successfully applied in a number of algorithms is the $\ell_q$-penalized least natural image (photo) or video - from scientific areas ranging from economics squares given by the following formula- brain activity. For a subject that watches and finance to biological and medical sci- tion: $\hat \beta(\lambda,q)=\arg\min_\ images or video while being scanned ence. Varying coefficient models allow for beta \| y - X\beta\|_2^2+\lambda \| \ in an function MRI, the goal is to recon- flexible, yet interpretable, modeling when beta \|_q^q.$ Despite the non-convexity struct what they saw from their brain traditional parametric models are too of these optimization problems for $0 \leq scans. In other words, can we display rigid to explain heterogeneity of sub-pop- q<1$, they are still appealing because what they were seeing? For this inverse ulations collected. Currently, as a result of the following folklores in the high- problem, we consider a so-called Bayes- of technological advances, scientists dimensional statistics: (i) $\hat \beta(\ ian decoder combining three sources are collecting large amounts of high- lambda,q )$ is closer to $\beta_o$ than $\ of information: (a) a forward model, dimensional data from complex systems hat{\beta}(\lambda,1)$. (ii) If we employ using training data, relating the image which require new analysis techniques. 
iterative methods that converge to a or video to the evoked brain activity, (b) We focus on the high-dimensional linear local minima of $ \| y - X\beta\|_2^2 + the estimated multivariate distribution varying-coefficient model and develop \lambda \| \beta \|_q^q$, then under of the prediction errors derived from the a novel procedure for estimating the good initialization these algorithms con- regressions, and (c) a prior for the natural coefficient functions in the model based verge to a solution that is still closer to $\ stimuli to constrain the inverse operation. on penalized local linear smoothing. Our beta_o$ than $\hat{\beta}(\lambda,1)$. In the talk, we will focus on the problem procedure works for regimes which allow of determining a non-parametric multi- the number of explanatory variables to variate prior for natural stimuli that will be be much larger than the sample size, best suited for reconstruction. We use

190 ENAR 2015 | Spring Meeting | March 15–18 under arbitrary heteroscedasticity in original feature measurements. Subse- Pocock et al., 2012). This approach is residuals, and is robust to model mis- quently, penalized logistic regression based on pairwise comparisons between specification as long as the model can is invoked, taking as input the newly patients in the treatment and control be approximated by a sparse model. We transformed or augmented features. This groups using a primary outcome (say, further derive an asymptotic distribution procedure trains models equipped with for example, mortality) with ties broken for the normalized maximum deviation local complexity and global simplicity, using a secondary outcome (say, occur- of the estimated coefficient function from thereby avoiding the curse of dimension- rence of a cardiac event) when a ranking the true coefficient function. This result ality while creating a flexible nonlinear based on the primary outcome cannot can be used to test hypotheses about a decision boundary. The resulting method be determined. In interpreting such particular coefficient function of inter- is called Feature Augmentation via analyses for studies involving prolonged est, for example, whether the coefficient Nonparametrics and Selection (FANS). follow-up it is important to recognize that function is constant, as well as construct We motivate FANS by generalizing the the observed pairwise preferences and confidence bands for covering the true Naive Bayes model, writing the log ratio the weight they attach to the component coefficient function. Construction of the of joint densities as a linear combination rankings will change over time. We study uniform confidence bands relies on a of those of marginal densities. It is related some properties of this procedure under double selection technique that guards to generalized additive models, but has various models for the treatment effect against omitted variable bias arising from better interpretability and computability. on each outcome and the dependence potential model selection mistakes. We Risk bounds are developed for FANS. In between them. demonstrate how these results can be numerical analysis, FANS is compared email: [email protected] used to make inference in high-dimen- with competing methods, so as to pro- sional dynamic graphical models. vide a guideline on its best application domain. Real data analysis demonstrates e-mail: [email protected] A MODEL FOR TIME TO FRAC- that FANS performs very competitively TURE WITH A SHOCK STREAM on benchmark email spam and gene SUPERIMPOSED ON PROGRESSIVE FEATURE AUGMENTATION VIA expression data sets. Moreover, FANS is DEGRADATION: THE STUDY OF NONPARAMETRICS AND SELEC- implemented by an extremely fast algo- OSTEOPOROTIC FRACTURES TION (FANS) IN HIGH DIMENSIONAL rithm through parallel computing. Xin He*, University of Maryland, CLASSIFICATION email: [email protected] College Park Jianqing Fan, Princeton University G. A. Whitmore, McGill University Yang Feng, Columbia University 13. Lifetime Data Geok Yan Loo, University of Maryland, Jiancheng Jiang, University of North Analysis Highlights College Park Carolina, Charlotte Marc C. 
Hochberg, University Xin Tong*, University of Southern MODELING THE “WIN RATIO” IN of Maryland, Baltimore California CLINICAL TRIALS WITH MULTIPLE Mei-Ling Ting Lee, University We propose a high dimensional OUTCOMES of Maryland, College Park classification method that involves David Oakes*, University of Rochester Osteoporotic hip fractures in the elderly nonparametric feature augmentation. Recently the “win ratio” has been popu- are associated with a high mortality in Knowing that marginal density ratios are larized as a simple method of statistical the first year following fracture and a high most powerful univariate classifiers, we analysis for controlled clinical trials with incidence of disability among survivors. use the ratio estimates to transform the multiple endpoints (see for example We study first and second fractures of Finkelstein and Schoenfeld, 1999 and

Program & Abstracts 191 elderly women using data from the Study ventional models for bivariate recurrent information at a landmark time. We of Osteoporotic Fractures (SOF). We events where the association is charac- propose a double-empirical likelihood present a new conceptual framework, terised solely by baseline frailty variables. method to combine published landmark stochastic model and statistical method- A composite likelihood approach is devel- survival information obtained from differ- ology for time to fracture. Our approach oped to estimate parameters in the joint ent sources such as disease registers. gives additional insights into the patterns rate models in semiparametric setting. We also propose an empirical likelihood for first and second fractures and the The proposed model and method can be ratio test to examine whether the aggre- concomitant risk factors. Our modeling used to identify biomarkers or risk factors gate information is consistent with the perspective involves a novel time-to-event for recurrent events that could be used to individual-level data. Simulation studies methodology called threshold regression tailor preventive strategies and treatment show that the proposed estimator yields which is based on the plausible idea that plans. To illustrate the applicability of a substantial gain in efficiency over the many events occur when an underlying the methods, the proposed approaches conventional partial likelihood approach. process describing the health or condi- are applied to data arising from a youth A data analysis illustrates the methods tion of a person or system encounters violence study. and theory. (Joint work with Jing Qin and a critical boundary or threshold for the Huei-Ting Tsai). email: [email protected] first time. In the parlance of stochastic email: [email protected] processes, this time to event is a first hit- ting time of the threshold. The underlying EFFICIENT ESTIMATION OF THE COX process in our model is a composite of a MODEL WITH AUXILIARY LANDMARK 14. Recent Advances chronic degradation process for skeletal TIME SURVIVAL INFORMATION and Challenges in the health combined with a random stream Chiung-Yu Huang*, Johns Hopkins Design of Early Stage of shocks from external traumas, which University Cancer Trials taken together trigger fracture events. Jing Qin, National Institute of Allergy email: [email protected] and Infectious Diseases, National Insti- MOTIVATING SAMPLE SIZES IN tutes of Health ONE- AND TWO-AGENT PHASE I JOINT RATE MODELS FOR Huei-Ting Tsai, Georgetown University DESIGNS VIA BAYESIAN POSTERIOR BIVARIATE RECURRENT EVENTS CREDIBLE INTERVALS Assessing heterogeneity of treatment WITH FRAILTY PROCESSES effects is of great importance in patient- Thomas M. Braun*, University centered outcomes research, as it is of Michigan Mei-Cheng Wang*, Johns Hopkins critical to identify subgroups of patients Simulation remains the primary method University who are likely to benefit from the treat- for which sample sizes are derived for ment. However, clinical trials are usually early-phase Bayesian adaptive clinical Bivariate or multivariate recurrent event not powered to detect the interaction trials, which is unappealing both due the data are often collected in longitudinal between treatment and patient charac- time needed to program the simulations, studies as the primary outcome mea- teristics. In this research, we propose a as well as the subjective means by which surements for research. 
We consider novel approach to improve efficiency in the final sample size is determined. We statistical modeling for bivariate recurrent estimating the survival time distribution apply the idea of Bayesian posterior cred- events, where the association between by synthesizing information from the ible intervals as a way to quickly generate two types of recurrent events is charac- individual-level data in clinical studies a sample size for both one- and two- terised by frailty processes and hence with that from the aggregate survival agent trials that is determined through allows for time-dependent association. an objective decision rule. Our methods This forms a contrast with those con- are also useful for examining the sensitiv-

192 ENAR 2015 | Spring Meeting | March 15–18 ity of any design to the prior distribution UNDERSTANDING THE TOXICITY SIMPLE BENCHMARK FOR PLAN- selected for the model parameter(s) and PROFILE OF NOVEL ANTICANCER NING AND EVALUATING COMPLEX the operational values assigned to doses, THERAPIES DOSE FINDING DESIGNS i.e the “skeleton.” We compare our Shing M. Lee*, Columbia University Ken Cheung*, Columbia University approach to that proposed by Cheng for the CRM, and we also use our methods The methods developed for estimat- While a general goal of early phase clini- to compare the sample sizes necessary ing the maximum tolerated dose for cal studies is to identify an acceptable for several models that have been pro- chemotherapeutic agents may not be dose for further investigation, modern posed for two-agent designs. appropriate for novel targeted therapies dose finding studies and designs are and immunotherapies. While toxicities highly specific to individual clinical set- email: [email protected] from chemotherapy generally arises soon tings. In addition, as outcome-adaptive after treatment, there is increasing litera- methods often involve complex algorithm, ture to suggest that this may not be true it is crucial to have diagnostic tools at the BEYOND THE MTD: PERSONALIZED for novel anticancer therapies. Moreover, planning stage to evaluate the plausibility MEDICINE AND CLINICAL TRIAL toxicities may also be cumulative, with of a method’s simulated performance and DESIGN patients experiencing mild toxicities in the adequacy of the algorithm. In this talk, Daniel Normolle*, University earlier cycles and progressing into more I will introduce a simple technique that of Pittsburgh severe ones in later cycles. Before, we provides an upper limit, or a benchmark, Brenda Diergaarde, University can suggest good designs for these ther- of accuracy for dose finding methods for of Pittsburgh apies it is necessary to better understand a given design objective. The proposed the toxicity profile of these newer thera- benchmark is nonparametric optimal, Julie Bauman, University of Pittsburgh pies. We analyzed the data from several and is demonstrated by examples to be Cancer therapy is arriving at a cross- phase I trials on targeted therapies to a practical accuracy upper bound for roads where the half-century paradigm illustrate the toxicity profile and propose model-based dose finding methods. We of cytotoxic therapy development will methods to address the complexities of illustrate the implementation of the tech- become irrelevant. Recent discoveries these data. The methods are compared nique in the context of phase I trials that based on high-throughput sequencing to standard approaches to illustrate the consider multiple toxicities and phase indicate that the genomics of metastatic deficiencies of conventional methods I/II trials where dosing decisions are disease are an order of magnitude more and the need for better designs for novel based on both toxicity and efficacy, and complex than that of primary disease, targeted therapies and immunotherapies. apply the benchmark to several clinical implying that treatments for metastatic examples considered in the literature. By email: [email protected] disease will require combinations of comparing the operating characteristics targeted therapies that will be unique to of a dose finding method to that of the each patient. 
Validation and optimization benchmark, we can form quick initial of therapy strategies will be, accordingly, assessments of whether the method much more complex than the design is adequately calibrated and evalu- and assessment of monotherapies. I ate its sensitivity to the dose-outcome will discuss recent advances in can- relationships. cer genomics that affect the design of email: [email protected] personalized therapies, and speculate on trial designs and endpoints that will be required to move beyond n=1 analyses. email: [email protected]

Program & Abstracts 193 15. Large Scale incidence rates of in-patient gastroin- SAFETY ANALYSIS STRATEGIES Data Science for testinal bleeding among atrial fibrillation FOR COMPARING TWO COHORTS Observational patients taking dabigatran or warfarin in a SELECTED FROM HEALTHCARE Healthcare Studies database that covers over 227M patient- DATA USING PROPENSITY SCORES years.[joint work with the Observational William DuMouchel*, Oracle Health Healthcare Data Sciences and Informat- Sciences BEYOND CRUDE COHORT DESIGNS: ics program] PHARMACOEPIDEMIOLOGY Rave Harpaz, Oracle Health Sciences email: [email protected] AT SCALE Propensity scores provide a way to select Marc A. Suchard*, University of Califor- two cohorts from a longitudinal health- nia, Los Angeles HONEST INFERENCE FROM OBSER- care database that are matched by their VATIONAL DATABASE STUDIES estimated probability of exposure to two Massive longitudinal healthcare therapies. This is designed to minimize databases enable development of surveil- David Madigan*, Columbia University potential biases caused by the non- lance solutions to identify and evaluate Observational healthcare data, such randomized treatment assignment. This drug risk at unprecedented scale. Recent as administrative claims and electronic balance theoretically protects against comparative drug safety analyses using health records, play an increasingly bias when comparing all outcomes administrative claims data continue to prominent role in healthcare. Pharma- observed after treatment assignment, rely on unadjusted incidence rate ratios. coepidemiologic studies in particular so that, for example, the two cohorts We develop a large-scale regularized routinely estimate temporal associations can be compared across a wide variety regression framework to control for between medical product exposure and of safety risks. We focus on the use of drug exposure-assignment and esti- subsequent health outcomes of interest the high dimensional propensity score mate adjusted incidence rate ratios at and such studies influence prescribing method, and discuss and illustrate how scale. Our framework uses advancing patterns and healthcare policy more gen- longitudinal data from the two cohorts computing technology for Big Data to fit erally. Some authors have questioned the can be imported into a general purpose statistical models involving 1,000,000s of reliability and accuracy of such studies, tool for comparisons across many safety patients and enables automatic adjust- but few previous efforts have attempted outcomes. Fecundity is defined as the ment via stratification, propensity score to measure their performance. We have biologic potential of men and women for matching and doubly-robust estima- conducted a series of experiments to reproduction, and is often measured by tors. These models involve conditioned empirically measure the performance estimating the probability of pregnancy likelihoods that were previously com- of various observational study designs in each menstrual cycle among couples putationally impractical in observational with regard to predictive accuracy for having regular unprotected intercourse. healthcare. In our framework, we include discriminating between true drug effects Estimating fecundity is challenging, in all clinical information available about and negative controls. 
I describe this part, given the effect that varying pat- patients up to their time of indication work, explore opportunities to expand the terns of sexual intercourse may have on diagnosis and treatment exposure, use of observational data to further our the length of pregnancy attempt. Clinical such as all possible drug prescriptions, understanding of medical products, and guidance is sometimes sought to aid medical conditions, procedures and other highlight areas for future research and couples in timing intercourse acts around demographics. The number of covari- development. ovulation to minimize the time needed to ates stands in the 10,000s, regularization email: [email protected] achieve pregnancy. Empirical evidence helps us avoid overfitting and algorithmic delineating the timing of intercourse optimization provides estimates in real- time. We apply our method to examine

194 ENAR 2015 | Spring Meeting | March 15–18 INTERPRETABLE FEATURE CRE- 16. CONTRIBUTED PAPERS: ATION AND MODEL UNCERTAINTY Competing Risks IN OBSERVATIONAL MEDICAL DATA Tyler McCormick*, University EXTENDING FINE AND GRAY’S of Washington MODEL: GENERAL APPROACH Rebecca Ferrell, University FOR COMPETING RISKS ANALYSIS of Washington Anna Bellach*, University Large-scale observational health data- of Copenhagen and University bases (such as electronic medical of North Carolina, Chapel Hill records or administrative claims data) Jason Peter Fine, University capture continuous-time, unsolicited of North Carolina, Chapel Hill recordings of patient experiences. As with many emerging data sources without Ludger Rüschendorf, Albert Ludwigs relative to ovulation are few, resulting in a a formal sampling design, these data University of Freiburg im Breisgau generalized clinical recommendation to require substantial pre-processing before Michael R. Kosorok, University have intercourse every other day (Prac- using standard statistical tools. For obser- of North Carolina, Chapel Hill tice Committee of the American Society vational health databases, pre-processing We introduce a pseudo likelihood func- for Reproductive Medicine, 2013). Under- often involves coding for characteristics tion that can be used to derive estimators standing the relation between fecundity, present at a designated baseline period for the hazard rate of the subdistribution intercourse behavior and other relevant through discretization of the temporal in competing risks settings for a broad covariates is increasingly relevant given element of the records, e.g. coarsening class of semiparametric regression population level changes in the sociode- the health event timelines over a specified models. Two important special cases mographic characteristics of reproductive “lookback period” into a binary or count of our approach are the Fine and Gray aged couples such as an increase in age feature to capture prior disease history. model and the proportional odds model at first pregnancy. This may be associ- Though there is rich literature examining for the hazard rate of the subdistribution. ated with reduced intercourse activity, model selection, very little work examines For a general class of semiparametric longer time-to-pregnancy, an increased these pre-processing, “feature creation” transformation models we prove the prevalence of infertility or a combination choices. We propose a model uncertainty consistency and asymptotic normality of of all these factors. Our main objective is framework to address this problem in the estimators. Our estimates are directly to jointly model intercourse behavior, a the context of medical event prediction. interpretable as we target on the hazard binary longitudinal process (measured on Through simulations and an application rate of the subdistribution. Our model day level), menstrual cycle characteristic to health claims data, we demonstrate the is efficient for administrative censored (measured on monthly level and TTP, effect of decisions to encode time-varying data. In simulation studies we show that a survival outcome (on monthly times- information as static baseline covariates also for right censored data our model cale), with a view towards prediction of on predictive performance and discuss performs well with respect to the vari- both longitudinal processes on differing approaches to account for uncertainty in ances even for very small sample sizes. timescales and time to pregnancy. 
This defining lookback periods. is achieved using an empirical bayes We apply the method to a bone marrow email: [email protected] approach of joint modeling of multivariate longitunidal processes and time to event. email: [email protected]

Program & Abstracts 195 transplant data set to demonstrate its diagnostic methods. Consistency and We establish the consistency, asymptotic practical utility. We illustrate how our asymptotic normality of the estimator is normality, and semiparametric efficiency proposed method improves the precision established and finite-sample properties of the NPMLEs. In addition, we construct of prediction for the individual event types are studied through simulation experi- graphical and numerical procedures if the appropriate link function is selected ments. The method is illustrated using to evaluate and select models. Finally, by the Akaike information criterion. data from HIV-1 seropositive individuals we demonstrate the advantages of the in sub-Saharan Africa, where serious proposed methods over the existing ones email: [email protected] death under-reporting (with deceased through extensive simulation studies and patients being misclassified as dropouts) an application to a major study on bone NON-PARAMETRIC CUMULATIVE affects the estimates of the cumulative marrow transplantation. incidence of mortality and of non-reten- INCIDENCE ESTIMATION UNDER email: [email protected] MISCLASSIFICATION IN THE tion in HIV care. CAUSE OF FAILURE email: [email protected] JOINT DYNAMIC MODELING OF Giorgos Bakoyannis*, Indiana University RECURRENT COMPETING RISKS Menggang Yu, University of Wisconsin EFFICIENT ESTIMATION OF SEMI- AND A TERMINAL EVENT PARAMETRIC TRANSFORMATION Constantin T. Yiannoutsos, Indiana Piaomu Liu*, University of South Caro- MODELS FOR THE CUMULATIVE University lina, Columbia INCIDENCE OF COMPETING RISKS Constantine Frangakis, Johns Hopkins Edsel Peña, University of South Lu Mao*, University of North Carolina, University Carolina, Columbia Chapel Hill The fundamental identifiable quantities in Recurrent events and terminal events Danyu Lin, University of North Carolina, cohort studies and clinical trials with com- occur in many areas in the biomedical Chapel Hill peting risks are the cause-specific hazard and public health settings. In this talk a and the cumulative incidence function. For analysis of competing risks data, joint model for recurrent competing risks However, in many clinical settings, the interest has centered on the cumula- and a terminal event will be described. cause of failure is diagnosed with error. tive incidence because of its practical Associations among the recurrent This type of misclassification is expected relevance and direct interpretation.
A semiparametric regression model frailty variable and the impact of previ- estimates. In this work we evaluate the proposed by Fine and Gray (1999) ous recurrent event occurrences. The effect of cause of failure misclassification has become the method of choice for recurrent competing risks also impact in cumulative incidence estimates and formulating the effects of covariates on the occurrence of the terminal event. In propose a weighted version of the Aalen- the cumulative incidence. Its estima- addition, further association between the Johansen non-parametric estimator to tion, however, requires modeling of the terminal event and the recurrent compet- adjust for such a misclassification. The censoring distribution and is not statisti- ing risks is induced by a frailty variable. weights are functions of the misclassifica- cally efficient. In this article, we present To dynamically model the impact of tion probabilities, which can be estimated a broad class of semiparametric trans- interventions after each event occur- through double-sampling techniques of formation models which extends the rence on the recurrent competing risks, a random sample of subjects whose true Fine and Gray model, and we derive the an effective age process is introduced. cause of failure is unequivocally ascer- nonparametric maximum likelihood esti- The impact of the increasing number of tained through possibly more expensive mators (NPMLEs). We develop a simple and fast algorithm for computing the NPMLEs through the profile likelihood.

196 ENAR 2015 | Spring Meeting | March 15–18 recurrent event occurrences and covari- the PSH assumption. Second, the land- generalized linear regression model. The ate processes are also incorporated into mark PSH supermodel enables users to proposed estimator is easy to implement the semiparametric model. Estimators of make predictions with a set of landmark and it also has desirable asymptotic the parameters of the proposed model points in one step. Third, the proposed properties. We evaluated the finite- will be described. Some finite-sample and models can incorporate various types sample performance of the estimator via large-sample properties of estimators will of time-varying information. Finally, our simulation studies. In the application, we be presented. models are not computationally intensive applied the proposed method to identify and can be easily implemented with exist- potential risk factors for 90-day mortal- email: [email protected] ing statistical software. We assessed the ity without transplantation for pediatric performance of our models via simula- patients with end-stage liver diseases. tions and applied the proposed models DYNAMIC PREDICTION OF email: [email protected] SUBDISTRIBUTION FUNCTIONS to a data set from a multicenter clinical FOR DATA WITH COMPETING RISKS trial for breast cancer patients. KERNEL SCORE TEST FOR Qing Liu*, University of Pittsburgh email: [email protected] PROGRESSION FREE SURVIVAL Chung-Chou H. Chang, University Matey Neykov*, Harvard University of Pittsburgh COMPETING RISKS REGRESSION Tianxi Cai, Harvard University To be able to dynamically predict a USING PSEUDO-VALUES UNDER patient’s prognosis based on the dis- RANDOM SIGNS CENSORING Recently papers have emerged, for ease progression is very helpful to the Tianxiu Wang*, University of Pittsburgh testing whether certain genetic informa- physician for medical decision making. tion has effect on disease progression. Chung-Chou H. Chang, University Landmark Cox models have great poten- The kernel methods, that these papers of Pittsburgh tial for serving the purpose of dynamic use are highly flexible, allowing for the prediction but the use of such models In medical studies, investigators are often genetic information to have nonlinear and becomes much more challenging when interested in estimating marginal survival non-additive effects on the disease pro- competing risks are present. Several distributions of latent failure times when gression. In this paper we are interested studies have extended the landmark competing risks exist. Without further in utilizing these methods to obtain a test method to competing risks regression assumption, marginal survival functions in a semi-competing risk situation. The models, however, the resulting models are not identifiable (Tsiatis, 1975). In this main advantage of using such methods, are either sensitive to the proportional study, we incorporate the random signs is that we can capture of the informative subdistribution hazards (PSH) assump- censoring principle (Cooke, 1993; Yabes, censoring which might be lost by simply tion, or somewhat difficult to handle 2012) in estimating marginal survival using a progression free survival (PFS) time-dependent covariate values and functions. The random signs censor- model, which is typically used as a gold- time-varying covariate effects simulta- ing (RSC) principle is verifiable from the standard in the literature. However, our neously. In this study, we developed a observed data. 
We propose an estimator simulations demonstrate that the PFS landmark PSH model and a more com- of the effect of a covariate on marginal can have a really poor performance in prehensive landmark PSH supermodel. survival functions. The proposed esti- some situations. mator is based on pseudo-values for Our proposed models have four advan- email: [email protected] tages over other dynamic predictive inverse-probability-censoring-weighted models in addressing competing risks. (IPCW) Kaplan-Meier functions and the First, they are robust against violations of corresponding marginal survival function can be written in the form of a standard

Program & Abstracts 197 17. CONTRIBUTED PAPERS: information is reported. We also derive for the data subject to DLs. However, Applications and an analytic expression for the integral the RKM estimator requires the indepen- Methods in with respect to the mortality parameter, dence assumption between the exposure Environmental Health which is useful to reduce the Monte level and DL and can lead to biased Carlo computational burden associated results when this assumption is violated. with this parameter. Using our proposed We propose a kernel-based nonparamet- METHODOLOGY FOR QUANTIFYING approach, we quantify the expected ric estimator for the exposure distribution THE CHANGE IN MORTALITY change in ozone-related summertime without imposing any independence ASSOCIATED WITH FUTURE mortality in the contiguous United States assumption between the exposure level OZONE EXPOSURES UNDER between 2000 and 2050 under a chang- and DL. We show the proposed estimator CLIMATE CHANGE ing climate. We also illustrate the results is consistent and asymptotically normal. Stacey E. Alexeeff*, National Center when using a common technique in pre- Simulation studies demonstrate that the for Atmospheric Research vious work that averages ozone to reduce proposed estimator performs well in the size of the data, and contrast these practical situations. A colon cancer study Gabriele G. Pfister, National Center findings with our own. is provided for illustration. for Atmospheric Research email: [email protected] email: [email protected] Doug Nychka, National Center for Atmospheric Research Climate change is expected to have many ESTIMATION OF ENVIRONMENTAL SPATIAL CONFOUNDING, SPATIAL impacts on the environment, including EXPOSURE DISTRIBUTION ADJUST- SCALE AND THE CHRONIC HEALTH changes in ozone concentrations at the ING FOR DEPENDENCE BETWEEN EFFECTS OF COARSE THORACIC surface level. A key public health concern EXPOSURE LEVEL AND DETECTION PARTICULATE MATTER LIMIT is the potential increase in ozone-related Helen Powell*, Johns Hopkins Bloom- summertime mortality if surface ozone Yuchen Yang*, University of Kentucky berg School of Public Health concentrations rise in response to climate Brent Shelton, University of Kentucky Roger D. Peng, Johns Hopkins Bloom- change. Previous health impact studies berg School of Public Health have not incorporated the variability of Tom Tucker, University of Kentucky Spatial confounding occurs when unmea- ozone into their prediction models. We Li Li, Case Western Reserve University propose a Bayesian posterior analysis sured spatially varying confounders make Richard Kryscio, University of Kentucky and Monte Carlo estimation method for it difficult to distinguish the effect of the quantifying health effects of future ozone. Li Chen, University of Kentucky exposure from residual spatial variation in the outcome. Studies which aim to The key features of our methodology In environmental exposure studies, investigate the long-term health effects are (i) the propagation of uncertainty it is common to observe a portion of of air pollution typically include a spatial in both the health effect and the ozone exposure measurements to fall below term in the regression model to account projections and (ii) use of the empirical experimentally determined detection for the inherent bias associated with distribution of the daily ozone projections limits (DLs). The reverse Kaplan-Meier spatial confounding. However, this bias to account for their variation. 
The use of (RKM) estimator, which mimics the can only be reduced if the spatial scale interpolation to improve the accuracy of well-known Kaplan-Meier estimator for of the exposure is smaller than that of averaging over irregular shaped regions right-censored survival data with the helps to derive average exposure for the scale reversed, has been recommended regions where mortality and demographic for estimating the exposure distribution

198 ENAR 2015 | Spring Meeting | March 15–18 the unmeasured confounder, a concept units primary fuel from coal to other fuels Background. Numerous observational which has thus far not been consid- (primarily natural gas). Using the monthly studies have assessed the associa- ered by air pollution studies. We aim to aggregated ARCP data, we identify tion between ambient air pollution and investigate the long-term health effects important confounders and estimate the chronic disease incidence. There is of coarse thoracic particulate matter monthly causal effect of coal on CO2 however no uniform approach to cre- (PM), a metric of PM which is currently emissions in 2012 using three statistical ate an exposure metric that captures unregulated by the US Environmental procedures: Coarsened Exact match- the variability in air pollution through Protection Agency, taking into account ing, Nearest Neighbor Propensity Score time and determines the most relevant the spatial scale of the exposure. By matching, and Nearest Neighbor Dis- exposure window for determining risk of allowing the spatial scale of the pollutant tance Adjusted Propensity Score (DAPS) chronic illness. Methods. In this study we to inform spatial terms in the regression matching. The first two methods are use simulation to assess nine exposure we aim to reduce bias in the estimated well-established methods of estimating metrics that incorporate the time trends health effects resulting from spatial the causal effect of an intervention using in ambient air pollution and make use of confounding. observational data, and we propose different exposure windows. We simu- DAPS as a way to further adjust for late observational data based on the email: [email protected] confounding by incorporating the dis- characteristics of the Black Women’s tance between power plant locations to Health Study and use observed values ESTIMATING THE CAUSAL control for spatially varying, unobserved for particulate matter < 2.5 microns EFFECT OF COAL BURNING POWER confounding. For each method we fit 3 (PM2.5) for this cohort to create the nine PLANTS ON CO2 EMISSIONS different models to estimate the effect. exposure metrics. Results. When we fit All models give similar estimates of the Cox proportional hazards models using Georgia Papadogeorgou*, Harvard causal effect of coal (~31,500-40,000 the nine different metrics, we observe School of Public Health tons of CO2 emissions per month of that time-invariant metrics perform poorly Corwin Zigler, Harvard School 2012) and revealed similar patterns of and tend to underestimate the true of Public Health seasonality. hazard ratio. Time varying metrics that average previous values tend to perform Francesca Dominici, Harvard School email: [email protected] of Public Health well. Conclusions. Our simulation study indicates that the use of averaged time- Prior literature on evaluating the effect TEMPORAL ASPECTS OF varying exposure metrics provide the of policy regulations on power plants AIR POLLUTANT MEASURES least biased results. through the Acid Rain Control Program IN EPIDEMIOLOGIC ANALYSIS: (ARCP) is scarce and has been restricted e-mail: [email protected] A SIMULATION STUDY to simple methodological and statisti- cal approaches. Potential flaws in these Laura F. 
White*, Boston University approaches could be a failure to con- Jeffrey Yu, Boston University trol for confounders of the relationship Bernardo Beckerman, University between regulations and emissions or of California, Berkeley health outcomes, unobserved confound- ing, misinterpretation of estimates, and Michael Jerrett, University of California, comparison of implausible interventions. Berkeley One compliance strategy following these Patricia Coogan, Boston University regulations is the switch of power plant

Program & Abstracts 199 BAYESIAN MODELS FOR MULTIPLE power to detect overall, domain-specific Environmental health researchers often OUTCOMES IN DOMAINS WITH and outcome-specific exposure and develop estimates of personal exposure APPLICATION TO THE SEYCHELLES covariate effects than separate models. to chemicals where exposure data are CHILD DEVELOPMENT STUDY When fit to the Seychelles data, sev- limited. In the NIEHS Gulf STUDY, an epi- eral outcomes were classified as partly demiologic study of the health of workers Luo Xiao, Johns Hopkins Bloomberg belonging to several domains. Checks of who participated in the 2010 Deepwater School of Public Health model misspecification were improved Horizon oil spill clean-up, researchers Sally W. Thurston*, University relative to a model that assumes each are using monitoring data from the time of Rochester outcome is in a single domain. of the spill to develop task, time, and location-specific exposure estimates for David Ruppert, Cornell University e-mail: [email protected] several oil-related chemicals. One data Tanzy M.T. Love, University of Rochester set contains 4300 full work shift measure- Philip W. Davidson, University ANALYSIS OF 26 MILLION AREA VOC ments of personal total hydrocarbon of Rochester OBSERVATIONS FOR THE PREDIC- (THC) measurements and another has The Seychelles Child Development Study TION OF PERSONAL THC EXPOSURE over 26,000,000 short-term area mea- examines the effects of prenatal meth- USING BAYESIAN MODELING surements of volatile organic compounds (VOC) collected on 38 vessels assisting ylmercury exposure on central nervous Caroline P. Groth*, University in the clean-up. We present a Bayes- system functioning. The data include of Minnesota 20 outcomes measured on 9-year old ian model and framework for estimating Sudipto Banerjee, University children that can be classified into four personal THC airborne exposures from of California, Los Angeles “domains”: cognition, memory, motor, area VOC data when personal THC and social behavior. Previous analyses Gurumurthy Ramachandran, University data are missing. We first summarize and scientific theory suggest that some of Minnesota the 26,000,000 area VOC observations in hourly averages by vessel. Then, outcomes may belong to more than Ian Reagen, University of Minnesota one domain. We develop a framework we correlate the VOC hourly averages in which each domain is defined by a Richard Kwok, National Institute that overlap with the time of each THC sentinel outcome preassigned to that of Environmental Health Sciences, sample. From the relationship between domain only, while all other outcomes National Institutes of Health VOC and THC, we develop a model may belong to multiple domains and are Aaron Blair, National Cancer Institute, for predicting personal THC exposure. not preassigned. Our model allows us to National Institutes of Health Throughout this analysis, we employ learn about assignment of outcomes to methods to account for values below the Dale Sandler, National Institute domains, while allowing exposure and limit of detection in both VOC and THC of Environmental Health Sciences, covariate effects to differ across domains measurements. Using this framework we National Institutes of Health and across outcomes within domains. present preliminary findings on a subset We take a Bayesian MCMC approach. Lawrence Engel, National Institute of this analysis. 
of Environmental Health Sciences, Results from the Seychelles study and e-mail: [email protected] from extensive simulations show that our National Institutes of Health model can effectively determine sparse Mark Stenzel, Stewart Exposure Assess- domain assignment, and give increased ments, LLC Patricia Stewart, Stewart Exposure Assessments, LLC

200 ENAR 2015 | Spring Meeting | March 15–18 18. CONTRIBUTED PAPERS: A MEDIATION-BASED INTEGRA- NONPARAMETRIC FAILURE Statistical Methods TIVE GENOMIC ANALYSIS OF LUNG TIME ANALYSIS WITH GENOMIC for Genomics CANCER APPLICATIONS Sheila Gaynor*, Harvard University Cheng Cheng*, St. Jude Children’s Research Hospital IDENTIFICATION OF CONSISTENT Xihong Lin, Harvard University Genome-wide Association Study FUNCTIONAL MODULES Genetic association methods have (GWAS) has become routine in cancer Xiwei Chen*, State University traditionally been used to analyze the genomic translational research, which of New York at Buffalo relationship between SNP or sequenc- often requires genome-wide screening ing data and disease outcomes. This David L. Tritchler, State University to identify ordinal genomic features that standard approach often fails to explain a of New York at Buffalo are associated with treatment outcome, significant proportion of disease and elu- for example, single nucleotide polymor- Jeffrey C. Miecznikowski, State Univer- cidate the complete relationship between phisms associated with time to relapse. sity of New York at Buffalo SNPs and complex diseases. It has thus The estimated coefficient of a hazard rate Daniel P. Gaile, State University been suggested that studies may be regression model (HRRM) is often used of New York at Buffalo improved by jointly analyzing SNP and as the association test statistic. It will be gene expression data to analyze phe- It is often of scientific interest to find a set demonstrated in this talk that in certain notypes of complex diseases. Huang, of genes that may represent an inde- cases the HRRM approach is problem- VanderWeele and Lin (2014) proposed pendent functional module or network, atic. A robust, completely nonparametric that this can be approached using a such as a functional gene expression alternative using rank correlation is then mediation model, where the association module causing a biological response, proposed. This method, called correla- between SNP sets or whole genome a transcription regulatory network, or a tion profile test (CPT), consists of the sequences and a disease outcome is constellation of mutations jointly causing correlation profile statistic and a very mediated by gene expression or epigen- a disease. In this paper we are specifi- efficient hybrid permutation test combin- etic data. In our analysis we explore this cally interested in identifying modules ing permutation and asymptotic theory. framework, leveraging the lung adeno- that control a particular outcome variable Statistical performances are compared carcinoma and squamous cell data sets such as a disease biomarker. We discuss with several established methods, by (n=1025) from The Cancer Genome the statistical properties that functional a simulation study and analysis of real Atlas. We utilize the whole genome SNP networks should possess and introduce genomics data. It is shown that CPT data, gene expression and methylation the concept of network consistency performs much better than the HRRM data, and clinical data on disease phe- which should be satisfied by real func- approach in terms of maintaining the notypes such as survival. We show the tional networks of cooperating genes, power and nominal significance level, application of such a mediation frame- and directly use the concept in the path- especially in cases where the propor- work can quantitatively and qualitatively way discovery method we present. Our tional hazard model does not hold. 
supplement a traditional genetic associa- method gives superior performance for all tion study by providing explanation of the e-mail: [email protected] but the simplest functional networks. mechanisms leading to disease phe- e-mail: [email protected] notypes. Further, we provide empirical evidence suggesting for which genomic settings this framework is useful. e-mail: [email protected]

Program & Abstracts 201 AN OMNIBUS TEST FOR DIFFER- SPARSE ANALYSIS FOR HIGH ROBUST INFERENCE OF CHROMO- ENTIAL ABUNDANCE ANALYSIS OF DIMENSIONAL DATA WITH APPLICA- SOME 3D STRUCTURE USING HI-C MICROBIOME DATA TION TO DATA INTEGRATION CHROMATIN INTERACTION DATA Jun Chen*, Mayo Clinic, Rochester Sandra Addo Safo*, Emory University Kai Wang*, University of Iowa Emily King, Iowa State University Jeongyoun Ahn, University of Georgia Kai Tan, University of Iowa Diane Grill, Mayo Clinic, Rochester A core idea of most multivariate data DNA-DNA spacial contact counts analysis methods is to project higher revealed by chromosome conformation Karla Ballman, Mayo Clinic, Rochester dimensional data vectors on to a lower capture (3C) techniques contain valuable One central goal of microbiome studies dimensional subspace spanned by a few information regarding chromosome 3D is to identify taxa that show differentiation meaningful directions. Many multivariate structure. Popular consensus approaches between sample groups. The identified methods, such as canonical correlation for inferring chromosome 3D structure taxa can provide insights into disease analysis (CCA), multivariate analysis of include multidimensional scaling (MDS) etiology as well as be used as biomark- variance (MANOVA), and linear discrimi- and likelihood-based modeling (LM). ers for disease diagnosis and prevention. nant analysis (LDA), solve a generalized MDS method employs a pre-determined Many methods have been developed to eigenvalue problem. We propose a contact-to-distance transfer function. address this statistical problem ranging general framework, called substitution However, there are mounting evidences from simple adaptation of the t test (Meta- method, with which one can easily obtain against the existence of a universal stat) to more sophisticated statistical test a sparse estimate for a solution vector transfer function. Although LM does not based on zero-inflated Gaussian model of a generalized eigenvalue problem. require specification of a transfer function, (metagenomeSeq) and count based We employ the idea of direct estimation it needs a distribution for contact counts methods (DESeq2). However, none of in high dimensional data analysis and which is typically assumed to be Poisson the statistical methods have taken into suggests a flexible framework for sparse or negative binomial neither of which is account all the features of the taxa data, estimation in all statistical methods that empirically justified. Most importantly, which are zero-inflated overdispersed use generalized eigenvectors to find spatial coordinates in these methods count data. Moreover, most of the meth- interesting low-dimensional projections do not seem to be uniquely identifiable ods focus on detecting the change of the in high dimensional space. We illustrate as they are not invariant to rotation or mean of the taxa abundance. In real situ- the framework with sparse CCA for joint shifting. Hence the effort to search for the ation, disease could affect not only the analysis of two high dimensional data- global optimal solution is severely com- abundance mean but also the prevalence sets- gene expression measurements promised. We propose a novel variation and the variance. Both dysbiosis and and copy number variations, to study the of the MDS method by focusing on the disease heterogeneity can lead to dif- idea that changes in expression profiles topological similarity between the inferred ferential variance. 
We therefore develop may be associated with copy number spatial 3D structure and the structure of an omnibus test based on a zero-inflated variations and that copy number varia- contact counts. Unlike a recent MDS- count model that jointly tests the equality tions may be related to gene expression based variant, our method allows for a of mean, zero probability and variance measurements. more general transfer function. In addi- between two sample groups. Both tion, among the 3n spatial coordinates for e-mail: [email protected] simulations and real data applications the n loci, 3n-7 of them can be uniquely demonstrated the increased power of the identified after fixing the other 7 coordi- omnibus test as well better control of the nates. The usefulness of the proposed type I error than existing methods. method is demonstrated by simulation e-mail: [email protected] studies and an empirical study. e-mail: [email protected]

202 ENAR 2015 | Spring Meeting | March 15–18 19. CONTRIBUTED PAPERS: SEMIPARAMETRIC NONSEPARABLE STATISTICAL ANALYSIS OF FEED- Spatial and Spatio- SPATIAL-TEMPORAL SINGLE INDEX FORWARD LOOPS ARISING FROM Temporal Methods MODEL AGING PHYSIOLOGICAL SYSTEMS and Applications Hamdy Fayez Farahat Mahmoud*, Jonathan (JJ) H. Diah*, Columbia Virginia Tech University

A SEMIPARAMETRIC APPROACH Inyoung Kim, Virginia Tech Feiran Zhong, Columbia University FOR SPATIAL POINT PROCESS In this paper, we propose two semi- Arindam RoyChoudhury, Columbia WITH GEOCODING ERROR IN CASE- parametric single index models for University CONTROL STUDIES spatially-temporally correlated data. One We define a feed-forward loop as a mul- model has the nonparametric function Kun Xu*, University of Miami tivariate continuous-valued discrete-time separable from spatially correlated ran- Yongtao Guan, University of Miami stochastic process where one variable dom effects and time effects. We call thus influences another variable, and in turn When conducting risk estimation in model semiparametric spatio-temporal the former variable is influenced by the epidemiological studies, residence of separable single index model (SSTS- latter variable at a later time. Specifically, subjects is commonly geocoded in SIM), while the other does not separate a bivariate feed-forward loop is defined as geographic information systems software the nonparametric function and spatially a stochastic process {Xt, Yt}, where Xt is by converting residential addresses to correlated random effects but separates associated with Y(t+1), after taking into geographic coordinates. The ignorance the time effects, we call it semiparamet- account the effects Yt, and Yt is associ- of geocoding error in spatial analysis ric spatio-temporal nonseparable single ated with X(t+1), after taking into account usually results in biased parameter index model (SSTN-SIM). Two algorithms the effects Xt. A trivariate feed-forward estimates, inflated standard errors and based on Markov Chain Expectation loop is similarly defined for three variables. reduced statistical power to detect spatial Maximization algorithm are introduced to One way of performing inference on cluster and trends. In this article, we estimate the models parameters, spatial feed-forward loops is structural equation propose a novel bias-correction method effects and times effects. The proposed modeling. Application of a feed-forward for such data, where only a small portion models are applied to the mortality data loop process can come from any branch of of true case and control locations are set of six major cities in South Korea. science with multiple interacting systems; observed. We construct score vector at The data covers the period from January, the aging physiological system is one such each location without any distribution 2000 to December, 2007. It is found that example. We have modeled aging physi- assumptions on error distribution. We Busan city has the highest mortality and ological measures data from two major study spatial correlation of those score Seoul and Daejeon have the lowest mor- aging cohorts and concluded that there are vectors and establish our estimating tality. SSTS-SIM enforces the unknown significant evidences of feed-forward loop equation. We show consistency and mortality functions of all cities to have relationships between certain variables. In asymptotical normality of our estimator. the same shape but SSTN-SIM is more particular, we found evidence that physical We illustrate our method through simula- flexible. In terms of estimation, SSTN- functionality, lean muscle mass, and physi- tion and Iowa Carroll County childhood SIM is better than SSTSSIM. In terms of cal performance measures interact in a asthma data. prediction, in case we have enough data, feed-forward loop. Such results are impor- SSTN-SIM is better. 
email: [email protected] tant for understanding how physiological email: [email protected] systems interrelate with each other and lead to aging. Thus, our novel modeling of the feed-forward loop has far reaching applications in biostatistics of geriatrics. email: [email protected]

Program & Abstracts 203 Integrated Nested Laplace Approxima- COVARIANCE TAPERING FOR tion (INLA) and Variational Bayes (VB). ANISOTROPIC NONSTATIONARY Taylor and Diggle (2012) compared GAUSSIAN RANDOM FIELDS WITH MCMC based on the Metropolis Adjusted APPLICATION TO LARGE SCALE Langevin algorithm (MALA) and INLA SPATIAL DATA SETS for this model; however, comparisons Abolfazl Safikhani*, Michigan between HMC, INLA, and VB have not State University been considered previously. In this talk we describe these comparisons in terms Yimin Xiao, Michigan State University of accuracy and computational efficiency Estimating the covariance structure of using simulation studies as well as spatial random processes is an important BAYESIAN COMPUTATION FOR through applications to ecology and step in spatial data analysis. Maximum LOG-GAUSSIAN COX PROCESSES: brain imaging. likelihood estimation is a popular method A COMPARATIVE ANALYSIS OF email: [email protected] in spatial models based on Gaussian ran- METHODS dom fields. But calculating the likelihood Ming Teng*, University of Michigan in large scale data sets is computationally THE JOINT ASYMPTOTICS FOR infeasible due to the heavy computa- Farouk S. Nathoo, University of Victoria ESTIMATING THE SMOOTHNESS tion of the precision matrix. One way to Timothy D. Johnson, University PARAMETERS OF BIVARIATE mitigate this issue, which is due to Furrer of Michigan GAUSSIAN RANDOM PROCESS et al. (2006), is to “taper” the covariance matrix. While most of the results in the The Log-Gaussian Cox Process (LGCP) Yuzhen Zhou*, Michigan State University current literature focus on isotropic taper- is a commonly used model for the Yimin Xiao, Michigan State University ing for stationary Gaussian processes, analysis of spatial point pattern data. Dif- there are many cases in application that ferent methods have been proposed for Characterizing the dependence structure require modeling of anisotropy and/or inference including traditional likelihood- of the multivariate random field plays a nonstationarity. In this article, we propose based approaches as well as methods key role in multivariate spatial model set- a nonstationary parametric model, in based on the Bayesian framework. ting. Usually, the covariance structure for which the underlying Gaussian random The computation of such inference is each component of the multivariate pro- field may have different regularities in intensive due to the doubly stochastic cess is highly related to the smoothness different directions, thus can be applied property, i.e., the model is a hierarchi- of the surface. The estimation of smooth- to model anisotropy. Using the theory cal combination of a Poisson process ness parameters in univariate model has of equivalence of Gaussian measures and a Gaussian Process (GP), which been studied extensively. Yet, there is few under nonstationary assumption, strong leads to an intractable integral over an work in the multivariate case. In this paper, consistency of the tapered likelihood infinite-dimensional random function. As we first propose an estimation procedure based estimation of the variance com- a result of these challenges a number for the smoothness parameters of bivariate ponent under fixed domain asymptotics of computational techniques have been Gaussian process. Then we investigate are derived by putting mild conditions proposed for Bayesian inference. 
These the joint asymptotics of the estimators and on the spectral behavior of the tapering include Hamiltonian Monte Carlo (HMC), study how the cross dependence struc- ture would affect the performance of the covariance function. The procedure is estimators. illustrated with numerical simulation. email: [email protected] email: [email protected]

204 ENAR 2015 | Spring Meeting | March 15–18 DYNAMIC NEAREST NEIGHBOR 20. CONTRIBUTED PAPERS: to cognitive data from deceased partici- GAUSSIAN PROCESS MODELS Case Studies in pants in two longitudinal studies: Rush FOR LARGE SPATIO-TEMPORAL Longitudinal Data Religious Order Study, and Rush Memory DATASETS Analysis and Aging Project. We present simulation results that illustrate the model empirical Abhirup Datta*, University of Minnesota properties. Sudipto Banerjee, University USING THE SIGMOID MIXED MOD- email: [email protected] of California, Los Angeles ELS FOR LONGITUDINAL COGNITIVE Andrew O. Finley, Michigan DECLINE State University Ana W. Capuano*, Rush University Medi- SHORT-TERM BLOOD PRESSURE VARIABILITY OVER 24 HOURS USING Gaussian process models for analyzing cal Center MIXED-EFFECTS MODELS large spatial or spatio-temporal datasets Robert S. Wilson, Rush University Medi- involve large dense matrix computations cal Center Jamie M. Madden*, University College rendering them infeasible. Nearest Neigh- Cork, Ireland Sue E. Leurgans, Rush University Medi- bor Gaussian Process (NNGP) models cal Center Xia Lee, University College Cork, Ireland based on local neighborhoods provide a scalable alternative for large spatial Jeffrey D. Dawson, University of Iowa Patricia M. Kearney, University College Cork, Ireland datasets. We extend this idea to con- Donald Hedeker, University of Chicago struct dynamic local neighborhoods in a Anthony P. Fitzgerald, University College Random-effects linear mixed models are continuous spatio-temporal domain using Cork, Ireland widely used to analyze longitudinal cogni- strength of a correlation function as a tive decline. Often, however, trajectories The benefits of using ambulatory blood proxy for distance. We develop a dynamic are non-linear. For example, terminal pressure measurements (ABPM) in Nearest Neighbor Gaussian Process cognitive decline is characterized by a addition to clinic measurements in the which yields finite dimensional Gaussian decline that is faster proximate to death. management of hypertension are well densities with sparse precision matrices. Adding a quadratic term for time, or con- established. As well as mean day, night We use the dynamic NNGP as a sparsity sidering two linear slopes (e.g. random and dip values, measures of short-term inducing prior in a hierarchical spatio- change point model) may not properly blood pressure variability (BPV) can also temporal setup. We provide an algorithm characterize the trajectories. We describe be obtained from ABPM. Long term BPV for fast updates of the dynamic neighbor- a random-effects non-linear mixed model has been associated with cardiovascular hoods and show that the total storage with covariates for such longitudinal data, events but the prognostic significance of and computation costs of a Markov Chain based on sigmoidal logistic curves. The short-term BPV remains uncertain. The Monte Carlo (MCMC) iteration for this most general of the models include five majority of studies have focused on sum- model are proportional to the size of the parameters, representing: level at death, mary measures of BPV such as standard dataset thereby ensuring massive scal- level before decline, rate of decline, deviation but there is uncertainty in ability. We demonstrate the computational decline midpoint, and asymmetry. To how accurately these indexes capture and inferential benefits of the dynamic illustrate the applicability of the approach, the true variability. 
We obtained data NNGP over other competing methods we fit a random-effects sigmoid model from the Mitchelstown Study, a cross- using real and synthetic datasets. sectional study of Irish adults aged 47-73 email: [email protected] years (n=2,047). A subsample (1,207) underwent 24-h ABPM. In addition to using traditional measures of variability

Program & Abstracts 205 such as standard deviation this analysis mixed model with three random com- tions by using the product of conditional makes full use of the longitudinal and ponents, namely random intercept, a probabilities. This test accommodates the circadian nature of ABPM data by apply- non-stationary Gaussian process and conditionality, subject dependencies and ing mixed-effects models to determine measurement error. We also provide an cluster effects and can be implemented in subject-specific trajectories over time. R package, lmenssp, to fit this class of SAS PROC NLMIXED easily. We evaluate The variation about subject-specific models. The case-study uses data from the properties of our approach and com- trajectories was taken as a measure of an the Chronic Renal Insufficiency Standards pare it with the two-sample proportion individual’s BPV. Additionally, the associa- Implementation Study, an ongoing cohort z-test and the Cochran–Mantel–Haenszel tion between this measure of variability study based at Salford Royal Hospital, test via simulations. An example based and subclinical target organ damage Greater Manchester. on readmission rates through an emer- (documented by microalbuminuria and gency department is used to illustrate the email: [email protected] ECG left ventricular hypertrophy) was proposed method. then examined using logistic regression. email: [email protected] Results will be presented and the findings A LIKELIHOOD RATIO TEST FOR will be discussed. NESTED PROPORTIONS BAYESIAN NONPARAMETRIC email: [email protected] Yi-Fan Chen*, University of Illinois, QUANTILE REGRESSION Chicago MODELS: AN APPLICATION TO A LONGITUDINAL MODELLING CASE Jonathan Yabes, University of Pittsburgh A FETAL GROWTH STUDY WITH STUDY IN RENAL MEDICINE AND AN Maria Brooks, University of Pittsburgh ULTRASOUND MEASUREMENTS ASSOCIATED R PACKAGE Sonia Singh, Royal Children’s Hospital Sungduk Kim*, Eunice Kennedy Shriver Ozgur Asar*, Lancaster University National Institute of Child Health and Lisa Weissfeld, Statistics Collaborative Inc. Human Development, National Institutes Peter J. Diggle, Lancaster University For policy and medical issues, it is impor- of Health and University of Liverpool tant to know if the proportion of an event Paul S. Albert, Eunice Kennedy Shriver James Ritchie, University of Manchester changes after an intervention. When the National Institute of Child Health and later proportion can only be calculated in Philip A. Kalra, University of Manchester Human Development, National Institutes a portion of the sample used to compute Kidney health is monitored by blood of Health the previous proportion, the two propor- biomarkers, principally serum creatinine tions are nested. The motivating example The appropriate interpretation of moni- level. An increase in creatinine level is is to test whether admission rates in tored fetal growth throughout pregnancy indicative of worsening kidney func- emergency departments are different in individual fetus and population is tion. Acute kidney injury (AKI) is defined between the first and a return visit. Here, dependent on the availability of adequate as a sudden fall in kidney function. For relatively small subjects who contribute standards. The focus of this paper is instance, stage 1 AKI is defined as a to the admission rate at the return visit on developing Bayesian nonparametric 1.5-fold increase in creatinine level within must be included in the first rate and also quantile regression models to develop 48 hours. 
The influence of AKI occur- return, but not vice versa. This condition- contemporary U.S. fetal growth stan- rence on subsequent kidney health is ality makes existing methods, such as dards for racial/ethnic groups of pregnant still an open research area. In this study, longitudinal data analysis, not directly women. The proposed method relies on our main aim is to compare the level and applicable, and researchers can only assuming the asymmetric Laplace distri- slope of kidney function regarding pre explore this question by using descrip- bution as auxiliary error distribution. We and post an AKI event. For this purpose, tive statistics. We propose a likelihood also consider the covariates-dependent we developed a continuous-time linear ratio test to compare two nested propor-

206 ENAR 2015 | Spring Meeting | March 15–18 random partition models that the prob- ogy has been developed to address the covariance structure for the multivariate ability of any particular partition is allowed analytical challenges for such data when longitudinal data, and then to model its to depend on covariates. This leads to only a single labor curve is observed on parameters parsimoniously. Kim (2013) random clustering models indexed by each woman (McLain and Albert, 2014, also introduced the weighted offensive covariates, i.e., quantile regression mod- Biometrics). These challenges include average (WOA) as a variation of on base els with the outcome being a partition conducting valid inference and prediction plus slugging (OPS) which explains of the experimental units. Markov chain when there is not a time zero (i.e., when not only a batter’s hitting performance Monte Carlo sampling is used to carry out women enter the hospital at different but also his non-hitting performance to Bayesian posterior computation. Several stages of their labor). Motivated by the generate runs for his team such as stolen variations of the proposed model are NICHD Consecutive Pregnancy Study bases, walks, and etc. We adopt Kim’s considered and compared via the devi- (CPS), a unique cohort study that col- unconstrained model for the covariance ance information criterion. The proposed lected repeat labor data on over 50,000 structure for Major League Baseball bat- methodology is motivated by and applied women, we propose new methodology ter’s salary with the Weighted Offensive to a longitudinal fetal growth study. for analyzing labor curves across multiple Average. pregnancies. Our focus is on using the email: [email protected] email: [email protected] cervical dilation data from prior pregnan- cies to predict subsequent labor curves. MODELING REPEATED LABOR We propose a hierarchical random effects 21. CONTRIBUTED PAPERS: CURVES IN CONSECUTIVE PREGNAN- model with random change points that Meta Analysis CIES: INDIVIDUALIZED PREDICTION characterizes repeated labor curves OF LABOR PROGRESSION FROM within and between women. We employ PREVIOUS PREGNANCY DATA Bayesian methodology (MCMC) for META-ANALYSIS SPARSE K-MEANS parameter estimation and prediction. The FRAMEWORK FOR DISEASE Olive D. Buhule*, Eunice Kennedy methodology was used in analyzing the SUBTYPE DISCOVERY WHEN Shriver National Institute of Child Health CPS data, and in developing a predictor COMBINING MULTIPLE TRANSCRIP- and Human Development, National for labor progression that can be used in TOMIC STUDIES Institutes of Health clinical practice. Zhiguang Huo*, University of Pittsburgh Paul S. Albert, Eunice Kennedy Shriver email: [email protected] National Institute of Child Health and George Tseng, University of Pittsburgh Human Development, National Institutes Disease phenotyping by omics data of Health AN EXAMPLE OF UNCONSTRAINED has become a popular approach that Alexander C. McLain, University of MODEL FOR COVARIANCE potentially can lead to better personalized South Carolina STRUCTURE FOR MULTIVARIATE LON- treatment. Identifying disease subtypes GITUDINAL DATA: MAJOR LEAGUE via unsupervised machine learning is the Katherine Grantz, Eunice Kennedy BASEBALL BATTER’S SALARY WITH first step towards this goal. 
In this paper, Shriver National Institute of Child Health THE WEIGHTED OFFENSIVE AVERAGE we extend a sparse $K$-means method and Human Development, National towards a meta-analytic framework to Institutes of Health Chulmin Kim*, University of West identify novel disease subtypes when Georgia Measuring cervical dilation in the late expression profiles of multiple cohorts stage of pregnancy is a commonly used The positive-definiteness requirement are available. The lasso regularization technique for monitoring the progression for the covariance matrix may impose and meta-analysis identify a unique set of labor. Recent statistical methodol- complicated nonlinear constraints on the parameters. Kim (2012) proposed an unconstrained parameterization for the

Program & Abstracts 207 of gene features for subtype character- treatment under the incorrect assump- designs being used have generated the ization. An additional pattern matching tion that a subject’s responses to the need to develop efficient and flexible reward function guarantees consistent different treatments would be identical, meta-analysis framework to combine all subtype signatures across studies. The 3) non-ignorable treatment assignment, designs for simultaneous inference. In method was evaluated by leukemia and and 4) response related variability in the this paper, we develop a missing data breast cancer data sets. The identified composition of subjects in different stud- framework and a Bayesian hierarchical disease subtypes from meta-analysis ies. We then examine the implications model for network meta-analysis of diag- were characterized with improved accu- of these assumptions for heterogeneity/ nostic tests (NMA-DT) and offer important racy and stability compared to single homogeneity of conditional and uncon- promises over the traditional MA-DT: 1) it study analysis. The breast cancer model ditional treatment effects. To illustrate the combines studies using all three designs; was applied to an independent META- utility of our approach, we re-analyze indi- 2) it pools both studies with or without a BRIC dataset and generated improved vidual patient data from 29 randomized gold standard; 3) it combines studies with survival difference between subtypes. placebo controlled studies of Vioxx on different sets of candidate tests; and 4) it These results provide a basis for diag- the cardio-vascular risk of Vioxx, a Cox-2 accounts for heterogeneity across studies nosis and development of targeted selective non- steroidal anti-inflammatory and complex correlation structure among treatments for disease subgroups. drug approved by the FDA in 1999 for the multiple tests. We illustrate our method management of pain and withdrawn from through a NMA of deep vein thrombosis email: [email protected] the market in 2004. tests. Finally, we evaluate the perfor- mance of the proposed method through email: [email protected] META ANALYSIS: A CAUSAL FRAME- simulation studies. WORK, WITH APPLICATION TO email: [email protected] RANDOMIZED STUDIES OF VIOXX A BAYESIAN HIERARCHICAL MODEL FOR NETWORK META-ANALYSIS OF Michael E. Sobel*, Columbia University DIAGNOSTIC TESTS INFERENCE FOR CORRELATED David Madigan, Columbia University EFFECT SIZES USING MULTIPLE Xiaoye Ma*, University of Minnesota Wei Wang, Columbia University UNIVARIATE META-ANALYSES Haitao Chu, University of Minnesota We construct a framework for meta-anal- Yong Chen, University of Texas Health Yong Chen, University of Texas Health ysis that helps to clarify and empirically Science Center, Houston Science Center, Houston examine the sources of between study Yi Cai*, University of Texas Health Sci- heterogeneity in treatment effects. The Joseph Ibrahim, University of North ence Center, Houston key idea is to consider, for each of the Carolina, Chapel Hill Chuan Hong, University of Texas Health treatments under investigation, the To compare the accuracy of multiple Science Center, Houston subject’s potential outcome in each study tests in a single study, three designs were he to receive that treatment. 
We Dan Jackson, Cambridge Institute of are commonly used: 1) the multiple consider four sources of heterogene- Public Health test comparison design; 2) the random- ity: 1) response inconsistency, whereby ized design and 3) the non-comparative Multivariate meta-analysis, which involves a subject’s response to a given treat- design. Existing meta-analysis methods jointly analyzing multiple and correlated ment varies across different studies, 2) of diagnostic tests (MA-DT) have been outcomes from separate studies, has the grouping of non-equivalent treat- focused on evaluating the performance received a great deal of attention. One ments, where two or more treatments of a single test by comparing it with a reason to prefer the multivariate approach are grouped and treated as a single reference test. The increasing number is because of its ability to account for the of available diagnostic instruments for a disease condition and the different study

208 ENAR 2015 | Spring Meeting | March 15–18 dependence between multiple estimates Meta-analysis summarizes evidence COMPARING MULTIPLE IMPUTATION from the same study. However, nearly from many studies addressing the same METHODS FOR SYSTEMATICALLY all the existing methods for analyzing research hypothesis and is one of the MISSING SUBJECT-LEVEL DATA multivariate meta-analytic data require most influential and powerful techniques David M. Kline*, The Ohio State the knowledge of the within-study cor- underpinning evidence-based practice. University relations, which are usually unavailable When considering data from many trials, in practice. We propose a simple non- it is likely that some of them present a Eloise E. Kaizar, The Ohio State iterative method that can be used for markedly different intervention effect or University the analysis of multivariate meta- analy- exert an undue influence on the sum- Rebecca R. Andridge, The Ohio State sis datasets that has no convergence mary results and, subsequently, on policy University problems and does not require the use of decision-making. The Cochrane Collabo- When conducting research synthesis, the within-study correlations. Our approach ration recommends the application of a collection of studies that will be com- uses standard univariate methods for random-effects meta-analysis both with bined often do not measure the same set the marginal effects but also provides and without outlying studies. Here, we of variables, which creates missing data. valid joint inference for multiple param- develop a forward search algorithm for Traditionally, the focus of missing data eters. The proposed method can directly identifying outlying studies in meta-analy- methods for longitudinal data has been handle missing outcomes under missing sis models. The forward search algorithm on missing observation-level (time-vary- completely at random assumption. Simu- starts by fitting the hypothesized model ing) variables. In this paper, we focus on lation studies show that the proposed to a small subset of studies and proceeds missing subject-level (non-time-varying) method provides unbiased estimates, by adding studies that are determined to variables and compare two multiple well-estimated standard errors and be close to the fitted model. We monitor imputation approaches, a joint modeling confidence intervals with good coverage estimated parameters, measures of fit approach and a sequential conditional probability. Furthermore, the proposed and Cook’s distances and identify outliers modeling approach, for modeling missing method is found to maintain high relative by sharp changes in their forward plots. data of this type. We find the joint model- efficiency compared to conventional The suggested methodology allows us to ing approach to be preferable to the multivariate meta-analyses where the test if a change in a statistic being moni- sequential conditional approach except within-study correlations are known. We tored is caused by the study entering the when the covariance structure of the illustrate the proposed method through search or can be attributed to random repeated outcome for each individual has two real meta-analyses where functions variation. We apply the method to a homogenous variance and exchangeable of the estimated effects are of interest. meta-analysis that examines the effect of correlation. 
Specifically, the regression writing-to-learn interventions on academic email: [email protected] coefficient estimates from an analysis achievement adjusting for three possible incorporating imputed values based on effect modifiers and compare results to the sequential conditional method are DETECTING OUTLYING STUDIES IN other outlier detection strategies and to attenuated and less efficient than those META-REGRESSION MODELS USING data from medical research. from the joint method. Remarkably, the A FORWARD SEARCH ALGORITHM email: [email protected] estimates from the sequential conditional Dimitris Mavridis, University of Ioannina method are often less efficient than a complete case analysis, which, in the Irini Moustaki*, London School of context of research synthesis, implies that Economics we lose efficiency by combining studies. Melanie Wall, Columbia University email: [email protected] Georgia Salanti, University of Ioannina

Program & Abstracts 209 22. CONTRIBUTED PAPERS: MITIGATING BIAS IN GENERALIZED AN ESTIMATED LIKELIHOOD ESTI- Semi-Parametric LINEAR MIXED MODELS: THE CASE MATOR BY EXTRACTING AUXILIARY Methods FOR BAYESIAN NONPARAMETRICS INFORMATION UNDER OUTCOME DEPENDENT SAMPLE DESIGN Joseph L. Antonelli*, Harvard School of Public Health Wansuk Choi*, University of North Caro- UNDERSTANDING GAUSSIAN lina, Chapel Hill PROCESS FITS USING AN APPROXI- Sebastien Haneuse, Harvard School of MATE FORM OF THE RESTRICTED Public Health Haibo Zhou, University of North Carolina, Chapel Hill LIKELIHOOD Lorenzo Trippa, Harvard School of Pub- Maitreyee Bose*, University of lic Health Outcome dependent sampling(ODS) has been studied by many research- Minnesota Generalized linear mixed models ers because it is a cost effective design. James S. Hodges, University of (GLMMs) use random effects to account In case of easily obtainable outcome, Minnesota for correlation in clustered or longitudinal researchers can have responses of every data. The random effects follow some Gaussian processes (GPs) are widely member in a study-population. However, unknown distribution, G, and in practice used in statistical modeling. A GP is often it can be difficult to have all covariate this is taken to be a Normal distribution. If used as the random effect in a linear of interest information from a study- this assumption does not hold, however, mixed model, with its unknowns estimated population. In this situation, even though the model is misspecified and estimation/ by maximizing the log restricted likelihood missing data in covariates problem inference may be invalid. An alternative is or using a Bayesian analysis, which are exists, researchers can obtain auxiliary to adopt a Dirichlet process (DP) prior for closely related. However, it is unclear how covariates information from all members G in a Bayesian analysis. Conventional the process variance, range, and error in population. Weaver and Zhou(2005) wisdom suggests that the increased variance are fit to features in the data. In showed that, rather than simple random flexibility reduces bias, although this has this paper, we aim to gain a better under- sampling, ODS design could improve of not been thoroughly examined. Further- standing of how GP parameters are fit to estimator’s property in terms of efficiency. more, the extent to which the increased data. To do so, we need a simple, inter- And ODS design provided unbiaseness flexibility confers a bias-variance trade-off pretable, and fast-computing form of the and consistency to estimators. In this has not been examined. Under a range restricted likelihood. This is achieved by article, we propose a method under the of `true’ random effects distributions, applying the spectral approximation to the situation that we can only have ODS we examine operating characteristics for GP and representing it as a linear mixed samples but not a whole population. We estimation of fixed and random effects in model. The log restricted likelihood from assume that SRS part of ODS has miss- a GLMM using a DP prior for G. Strate- this approximate model has a scalarized ing covariate and supplemental sample gies for the specification of the precision form and is identical to the log likelihood part of ODS has missing covariate. In parameter in the DP prior are also inves- arising from a gamma-errors generalized addition, every member of ODS sample tigated. We conclude that while no single linear model (GLM) with the identity link. 
has a binary auxiliary variable, which is model is likely to work well in all set- We use this GLM representation to make related to covariates of interest. The pro- tings, the use of the DP prior in a GLMM conjectures about how GP parameters are posed method use auxiliary information mitigates much if not all of the bias that fit to data, and investigate our conjectures to estimate nonparametric parts in the arises when one incorrectly assumes a by introducing features in simulated data, likelihood to derive an estimated likeli- Normal distribution, with little-to-no pen- like outliers and mean-shifts, and observ- hood. The finite sample performance of alty paid in terms of efficiency. ing how introduction of these features the proposed method is studied, com- affects the GP parameter estimates. email: [email protected] pared to other existing methods. email: [email protected] email: [email protected]

210 ENAR 2015 | Spring Meeting | March 15–18 ESTIMATION, IID REPRESENTATION EMPIRICAL LIKELIHOOD-BASED the variability across subjects or sites AND INFERENCE FOR THE AVER- INFERENCE FOR PARTIALLY or lack of experimental scientific evi- AGE OUTCOME UNDER STOCHASTIC LINEAR MODELS dence, it may not be obvious to detect INTERVENTION ON DEPENDENT DATA a specific shape of the population level Haiyan Su*, Montclair State University trend based on sparsely observed data. Oleg Sofrygin*, University of California, We propose an empirical likelihood (EL)- For example, it is widely believed and Berkeley based inference for the linear component debated that global temperature might Mark J. van der Laan, University coefficient in partially linear models and be on rise over the last century based on of California, Berkeley partially linear mixed-effect models. The observations taken at various locations proposed method combines the projec- We describe targeted minimum loss- around the globe, but a definitive answer tion method with the EL method. The based estimation (TMLE) for the mean is still lacking. Mixed-effect model is a project method is used to remove the outcome under joint stochastic inter- commonly used tool to account for varia- nuisance parameter in the model and vention in dependent data. Suppose tions across different subjects or sites in then EL method is used to construct data on N units is observed, each unit i longitudinal analysis. This paper devel- confidence intervals for the linear compo- is O_i=(F_i,W_i,A_i,Y_i), where F_i – set ops a nonparametric Bayesian method nent. Bartlett correction method is used of units making up i’s “network”, W_i to test various shape constraint of the to correct the EL-based confidence inter- - baseline covariates, A_i - binary expo- population level mean trend based on vals. The test statistic is shown to follow sure, Y_i - binary outcome. We assume approximating a Gaussian process using regular chi-square distribution asymptoti- A_i is a function of W_i; Y_i is a function a sequence of penalized splines whose cally. The numerical performance of the of (A_i,W_i). Dependence between units coefficients are allowed with vary with method under normal and non-normal is added by assuming A_i depends on subjects or sites. Posterior consistency of error terms is evaluated through simula- covariates of units in F_i; Y_i depends on the test procedure is established under a tion studies. covariates and exposures of units in F_i. set of regularity conditions and numeri- We propose a semi-parametric model for email: [email protected] cal illustrations are presented based on the observed data and focus on estimating simulated and real data sets. the expected value of the average outcome email: [email protected] \bar{Y}, where \bar{Y} is the average of BAYESIAN NONPARAMETRIC Y_i over N units and expectation is taken METHODS FOR TESTING SHAPE CON- wrt some known joint stochastic interven- STRAINT FOR LONGITUDINAL DATA HYPOTHESIS TESTING IN tion on exposures. We describe TMLE for Yifang Li*, North Carolina State SEMI-PARAMETRIC DISCRETE above quantity and demonstrate it is a University CHOICE MODEL doubly robust, asymptotically efficient and Sujit Ghosh, North Carolina State Yifan Yang*, University of Kentucky asymptotically linear estimator. 
We also University & Statistical and Applied demonstrate how our statistical quantity of Mai Zhou, University of Kentucky Mathematical Sciences Institute interest can be represented as a mapping Discrete choice model is widely used in from certain IID data distribution, which In various applications of longitudinal social sciences and economics. (Wang happens to be a function of the true dis- data analysis, we often have subject and Zhou, 1995) proposed a least square tribution of O and stochastic intervention. matter knowledge about the population type of estimation, which is difficult to This insight leads to a simplified estimator that may suggest a specific shape of derive the confidence interval or perform of the asymptotic variance for above TMLE, the unknown mean curve over a given a hypothesis test on the regression coef- performance of which is then evaluated in a time period of interest. However, due to ficients. In this paper, a semi-parametric simulation study. approach was described to solve these email: [email protected]

Program & Abstracts 211 problems. The proposed method was Shiferaw Mariam, Janssen R&D ing and quantifying potential biases. We based on empirical likelihood method review, compare, and draw connections Jerry Schindler, Merck (Own, 2001) and used the interpretation between two distinct approaches. The of the Expectation and Maximization (EM) Venkat Sethuraman, Bristol-Myers first represents unmeasured confound- principle (Zhou, 2002). Simulation work Squibb ing as a latent random variable that has shows that the log likelihood ratio is stan- Frank Shen, AbbVie a specific dependence structure with dard chi-square distributed hence can be the outcome and exposure. The second Anastasios (Butch) Tsiatis, North used to construct confidence interval for treats the unobserved potential outcome Carolina State University regression coefficients. as the sole unmeasured confounder. Each approach gives rise to specific email: [email protected] 24. Causal Inference in methods for sensitivity analysis and bias HIV/AIDS Research quantification; these will be illustrated 23. Trends and Innovations and compared using both simulation and in Clinical Trial Statis- data analysis. We also examine whether tics: “The Future ain’t REPRESENTING UNMEASURED and how estimated sensitivity to bias from what it Used to be” CONFOUNDING IN CAUSAL MODELS unmeasured confounding depends on FOR OBSERVATIONAL DATA the set of measured confounders being Joseph W. Hogan*, Brown University used for adjustment. “THE FUTURE AIN’T WHAT IT USED email: [email protected] TO BE” (YOGI BERRA). HAVE STAT- Dylan Small, University of Pennsylvania ISTICIANS RECEIVED THE MEMO? Arguably the most important assump- Nevine Zariffa*, AstraZeneca tion needed for drawing causal inference INVERSE PROBABILITY OF CENSOR- Pharmaceuticals from observational data is “ignor- ING WEIGHTS UNDER MISSING NOT able treatment assignment” or “no AT RANDOM WITH APPLICATION TO We will review key contributions from stat- unmeasured confounding.” In potential CD4 OUTCOMES IN HIV-POSITIVE isticians in healthcare advancement: past, outcomes formulations of causal effect, PATIENTS IN KENYA present and future. A brief tour of history “no unmeasured confounders” states and an analysis of the current situation Judith J. Lok*, Harvard School of Public that, conditionally on a specific set of will allow us to consider future directions Health measured covariates (confounders), the and challenges. We will explore oppor- potential outcomes are independent of Constantin T. Yiannoutsos, Indiana Uni- tunities to fundamentally transform the exposure. When the assumption holds, versity Fairbanks School of Public Health healthcare system, and how our disci- valid inferences about causal effect can Agnes Kiragga, Infectious Diseases pline best evolves. Key topics will include: be obtained using methods such as Institute, Kampala, Uganda the patient perspective, quality in all its inverse probability weighting, propen- incarnations, the evolving data eviron- Ronald J. Bosch, Harvard School of sity score methods, and g-estimation. ment, and effective communication. 
Public Health However, because “no unmeasured con- email: [email protected] founding” is untestable, researchers in Right-censoring is Missing Not At Random (MNAR) when the prognosis PANELISTS: statistics, epidemiology, and economet- rics have developed a variety of methods of patients after censoring is different Sara Hughes, GlaxoSmithKline for representing unmeasured confound- from the prognosis of patients in follow- Dominic Labriola, Bristol-Myers Squibb up, even given observed characteristics just prior to the dropout time. Analyz- Lisa LaVange, U.S. Food and Drug ing MNAR data is complicated. Often, Administration

212 ENAR 2015 | Spring Meeting | March 15–18 bounds and sensitivity analyses are the sample. The outcome is said to be miss- we investigate several analytical strate- only option. We propose a method to ing not at random (MNAR) if, conditional gies to infer prevention efficacy among obtain point estimates for the trajectory of on the observed variables, it is still a group of women who used study a mean/median over time when dropout dependent on the unobserved outcomes. product based on the plasma TFV is MNAR, if so-called outreach data are Under such settings, identification is detection, including causal potential available: additional data on a subsample generally not possible without imposing outcomes methods and confounding of patients lost-to-follow-up, successfully additional assumptions. Identification adjustment. Merits and limitations of located afterwards. We propose an exten- is sometimes possible, however, if an these approaches will be discussed. In sion of Inverse Probability of Censoring exogeneous instrumental variable (IV) particular, we show that when adher- Weighting to this setting. We illustrate is observed for all subjects such that it ence is moderate or low, the power of the our method by estimating the response satisfies exclusion restriction, and that potential outcomes approach can be low. to antiretroviral therapy (ART) among the IV affects the missing data process Using the exclusion restriction assump- HIV-positive patients in Kenya. The avail- without directly influencing the outcome. tion, we show that the conventional able data are MNAR: more patients in In this presentation, we propose a confounding-adjustment approach can the outreach sample died shortly after nonparametric method for identification, be useful in assessing whether selection dropout than expected on the basis of followed by inverse probability weighted bias has been adequately removed. (IPW) and regression estimators, which their initially observed covariates, and email: [email protected] more patients in the outreach sample give consistent estimates for the average were off treatment, in part because of lim- response if either the propensity score or ited access to ART outside the program the outcome regression model is correct, 25. Open Problems and evaluated. Taking MNAR into account respectively. Lastly, we propose a doubly New Directions in leads to a substantial downward adjust- robust estimator that remains consistent if Neuroimaging Research ment of the response to ART. either of the above models is true. email: [email protected] email: [email protected] OPEN PROBLEMS IN STRUCTURAL BRAIN IMAGING: WAVELETS AND DOUBLY ROBUST INSTRUMENTAL ESTIMATING PREVENTION EFFI- REGRESSIONS ON NON-EUCLIDEAN VARIABLE ESTIMATION FOR OUT- CACY AMONG COMPLIERS IN HIV MANIFOLDS COME MISSING NOT AT RANDOM PRE-EXPOSURE PROPHYLAXIS Moo K. Chung*, University of Wisconsin, (PrEP) TRIALS BaoLuo Sun*, Harvard School of Public Madison Health James Dai*, Fred Hutchinson Can- Structural brain images such as MRI and cer Research Center and University of Lan Liu, Harvard School of Public Health DTI are inherently geometric in nature. Washington However, many processing and analysis James Robins, Harvard School of Public Elizabeth Brown, Fred Hutchinson techniques used in the field are Euclid- Health Cancer Research Center and University ean and do not explicitly incorporate the Eric Tchetgen Tchetgen, Harvard School of Washington non-Euclidean nature of data and space. 
of Public Health We present a unified differential geo- Adherence is vital to success of PrEP metric framework for performing kernel Missing data occurs frequently in epi- trials. Pharmacological measures of smoothing, regressions and wavelet demiologic practice, compromising our adherence have been widely used, transforms on non-Euclidean space such ability to make accurate inferences. In typically in case-control or case-cohort particular, the outcome of interest may samples in the active product arms. not be observed for a subset of the Using the MTN-003 trial as an example,

Program & Abstracts 213 as the space of positive definite symmet- across subjects and differences across EMPIRICAL BAYES METHODS ric matrices, curved 3D curved surfaces conditions and patient groups. Moreover, LEVERAGING HERITABILITY FOR and 4D hypersphere. The framework is it is important to model and estimate IMAGING GENETICS applied in modeling brain networks and the latent neuronal sources by combin- Wesley Kurt Thompson*, University of subcortical brain surfaces. ing information across several trials for California, San Diego each subject. Finally, since EEG data email: [email protected] is massive, it is important to develop Brain imaging and genetics both produce computationally efficient algorithms for very high dimensional data. Associat- ing brain phenotypes with genetics data OPEN PROBLEMS AND NEW estimation and inference. thus leads to the curse of dimensionality DIRECTIONS IN MODELING email: [email protected] squared. We tackle this problem from ELECTROENCEPHALOGRAMS two directions. First, using data from Hernando Ombao*, University of OPEN PROBLEMS AND NEW DIREC- twin imaging studies, we reduce the California, Irvine TIONS IN FUNCTIONAL MAGNETIC dimensionality of the brain phenotype by Quantitative neuroscience is a flourishing RESONANCE IMAGING (fMRI) clustering voxels that share genetic influ- discipline where statistics plays a critical ences. Second, for associating genetic Martin A. Lindquist*, Johns Hopkins role. With studies collecting repeated variation to these new phenotypes we University measurements on thousands of subjects propose a novel resampling-based over multiple years, the size of data sets Functional Magnetic Resonance Imag- methodology that obtains non-parametric are becoming larger and more com- ing (fMRI) is a non-invasive technique for estimates of replication effect sizes from plex. Given the complexity and size of studying brain activity. During the past Genome-Wide Association (GWA) data. neuroimaging data, simple reproducible two decades fMRI has provided research- From replication effect sizes we can com- data-analytic methods, data explora- ers with an unprecedented access to pute a number of parameters of interest, tion and careful design of experiments the inner workings of the brain, leading including the local false discovery rate, will become increasingly important and to countless new insights into how the the tagged heritability, and polygenic will require the expertise of statisticians. brain processes information. The field estimates of predicted phenotypic values This is creating significant new demand that has grown around the acquisition in de novo subjects. Low locfdr indicates and unmatched opportunity for statisti- and analysis of fMRI data has experi- high probability of being non-null. We cians working in the field. This talk will enced a rapid growth in the past several also develop an extension of this meth- focus on open problems for analyzing years and found applications in a wide odology, termed “covariate modulated electroencephalograms (EEG) which variety of fields. In this talk we will discuss local false discovery rate” (cmlocfdr),that are non-invasive measurements of brain several new directions in the analysis of incorporates functional annotations and electrical activity. These are projections of fMRI data. These include open problems pleiotropic relationships of one pheno- the unobserved neuronal electrical activ- in brain connectivity, brain decoding and type with another to leverage information ity on the scalp. 
With excellent temporal multi-modal data analysis. from multiple GWAS. We demonstrate resolution (approximately 1000 samples these new methodologies on a GWAS email: [email protected] per second), they capture both oscilla- of several thousand subject using brain tory activity at the cortex and connectivity morphology phenotypes (cortical thick- between the cortical regions. Under ness, surface area, and subcortical designed experiments, there is a need volumes). to develop flexible models that correctly email: [email protected] capture variation in brain responses

214 ENAR 2015 | Spring Meeting | March 15–18 26. Statistical Methods for additional penalty term, for which we individuals, as a gold standard training Understanding Whole derive the asymptotic distribution under set to improve calibration of variant calls. Genome Sequencing the null hypothesis. One finding of interest is that there is little difference in overall quality between email: [email protected] single-sample and multi-sample calling GROUP ASSOCIATION TEST USING methods at the depth of coverage in our A HIDDEN MARKOV MODEL FOR VARIANT CALLING AND BATCH data set. In addition, we examined our SEQUENCING DATA EFFECTS IN DEEP WHOLE-GENOME data set for evidence of batch effects to Charles Kooperberg*, Fred Hutchinson SEQUENCING DATA detect possible confounders in our down- stream analysis. Cancer Research Center Margaret A. Taub*, Johns Hopkins Yichen Cheng, Fred Hutchinson Cancer University email: [email protected] Research Center Suyash S. Shringarpure, Stanford James Y. Dai, Fred Hutchinson Cancer University FLEXIBLE PROBABILISTIC MOD- Research Center Rasika A. Mathias, Johns Hopkins ELING OF GENETIC VARIATION IN With next generation sequencing data, University GLOBAL HUMAN STUDIES group association tests are of great Ingo Ruczinski, Johns Hopkins John Storey*, Princeton University interest, because the power of testing for University email: [email protected] association of a single genomic feature Kathleen C. Barnes, Johns Hopkins at a time is often very small, as well as University and The CAAPA Consortium the small effect sizes, due to the over- ALLELE SPECIFIC whelming number of individual genomic EXPRESSION TO IDENTIFY features. Many methods have been Whole genome sequencing studies CAUSAL FUNCTIONAL QTLs proposed to test association of a trait with with thousands of samples are currently Barbara Englehardt*, Princeton a group of features, e.g. all variants in a underway. Statistical challenges in work- University gene, yet few of these methods account ing with these massive data sets arise at email: [email protected] for the fact that a substantial proportion all phases of data analysis, from initial of the features are not associated with the measurement of data quality to develop- trait. We propose to model the associa- ment of appropriate methods for testing 27. Doing Data Science: tion for each feature in the group as a genetic hypotheses and interpreting Straight Talk from mixture of no association or a constant observed patterns of genetic varia- the Frontline non-zero association to account for the tion. Here, we present work using 642 fact that a fraction of features may not samples from the Consortium on Asthma be associated with the trait even if other among African-ancestry Populations in DOING DATA SCIENCE features in the group are. The observed the Americas (CAAPA) who were whole- Rachel Schutt*, Newscorp individual associations are first esti- genome sequenced to an average depth mated by generalized linear models; of 30x. We first performed a comparison In this talk, I will explore the question the sequence of these estimated asso- of different variant calling algorithms, “what is data science?” Many statisticians ciations are then modeled by a hidden focusing on characteristics of variants have understandably asked, “isn’t statis- Markov chain. To test for association, we called by different subsets of callers. 
We tics the science of data?” which suggests use a modified likelihood ratio test based then developed a quality-control classi- that data science is just a rebranding on an independence log-likelihood with fier which uses genotyping array data, of the discipline of statistics. Yet data typically collected for all sequenced science is clearly emerging in job titles

and academic programs, and doesn’t seem to be going away any time soon. We’ll discuss possible definitions of data science, and some important concepts that suggest that data science is a new and distinct discipline in its own right. I’ll describe more about my role at News Corp as the Chief Data Scientist and what that job involves, and the opportunities I think exist for statisticians from both career and research problem perspectives.

email: [email protected]

28. IMS Medallion Lecture

UNCERTAINTY QUANTIFICATION IN COMPLEX SIMULATION MODELS USING ENSEMBLE COPULA COUPLING

Tilmann Gneiting*, Heidelberg Institute for Theoretical Studies (HITS) and Karlsruhe Institute of Technology (KIT)
Roman Schefzik, Heidelberg University
Thordis L. Thorarinsdottir, Norwegian Computing Center

Critical decisions frequently rely on high-dimensional output from complex computer simulation models that show intricate cross-variable, spatial and/or temporal dependence structures, with weather and climate predictions being key examples. There is a strongly increasing recognition of the need for uncertainty quantification in such settings, for which we propose and review a general multi-stage procedure called ensemble copula coupling (ECC), proceeding as follows. 1. Generate a raw ensemble, consisting of multiple runs of the computer model that differ in the inputs or model parameters in suitable ways. 2. Apply statistical postprocessing techniques, such as Bayesian model averaging or nonhomogeneous regression, to correct for systematic errors in the raw ensemble, to obtain calibrated and sharp predictive distributions for each univariate output variable individually. 3. Draw a sample from each postprocessed predictive distribution. 4. Rearrange the sampled values in the rank order structure of the raw ensemble, to obtain the ECC postprocessed ensemble. The use of ensembles and statistical postprocessing has become routine in weather forecasting over the past decade. We show that seemingly unrelated, recent advances can be interpreted, fused and consolidated within the framework of ECC, the common thread being the adoption of the empirical copula of the raw ensemble. In some settings, the adoption of the empirical copula of historical data offers an attractive alternative. In a case study, the ECC approach is applied to predictions of temperature, pressure, precipitation, and wind over Germany, based on the 50-member European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble. Joint work with Roman Schefzik and Thordis Thorarinsdottir.

29. Panel Discussion: In Memory of Marvin Zelen: Past, Present and Future of Clinical Trials and Cancer Research

PANELISTS:
Colin Begg, Memorial Sloan Kettering Cancer Center
Dave DeMets, University of Wisconsin, Madison
Ross Prentice, Fred Hutchinson Cancer Research Center
Victor De Gruttola, Harvard School of Public Health

30. CONTRIBUTED PAPERS: Methods for Clustered Data and Applications

MULTIVARIATE MODALITY INFERENCE WITH APPLICATION ON FLOW CYTOMETRY

Yansong Cheng*, GlaxoSmithKline
Surajit Ray, University of Glasgow

The number of modes (also known as the modality) of a kernel density estimator (KDE) draws considerable interest and is important in practice. In this presentation, we develop an inference framework for the modality of a KDE in the multivariate setting using a Gaussian kernel. We apply the modal clustering method proposed by Li et al. (2007) for mode hunting. A test statistic and its asymptotic distribution are derived to assess the significance of each mode. The inference procedure is applied to real flow cytometry data.

email: [email protected]
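The four-step ECC recipe described in the IMS Medallion Lecture abstract above reduces, in its final step, to a rank-reordering operation. The following minimal sketch is my own illustration under that reading (the array names and toy data are hypothetical, not taken from the lecture); it uses only NumPy.

import numpy as np

def ecc_reorder(raw_ensemble, postprocessed_samples):
    # Step 4 of the ECC recipe: impose the raw ensemble's rank structure
    # on samples drawn independently from each postprocessed margin.
    # Both inputs are (m members) x (d output variables) arrays.
    raw = np.asarray(raw_ensemble, dtype=float)
    post = np.asarray(postprocessed_samples, dtype=float)
    ecc = np.empty_like(post)
    for j in range(raw.shape[1]):
        ranks = np.argsort(np.argsort(raw[:, j]))   # rank of each raw member for variable j
        ecc[:, j] = np.sort(post[:, j])[ranks]      # sorted postprocessed draws placed by those ranks
    return ecc

# toy example: a 5-member ensemble with 3 output variables
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 3))
post = rng.normal(loc=1.0, size=(5, 3))   # stand-in for calibrated draws from step 3
print(ecc_reorder(raw, post))

Because each variable's postprocessed sample is merely re-sorted to follow the raw members' ranks, the marginal calibration obtained in step 2 is preserved while the raw ensemble's cross-variable dependence (its empirical copula) is re-imposed.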

ESTIMATION OF THE PREVALENCE OF DISEASE AMONG CLUSTERS USING RANDOM PARTIAL-CLUSTER SAMPLING

Sarah J. Marks*, University of North Carolina, Chapel Hill
John S. Preisser, University of North Carolina, Chapel Hill
Anne E. Sanders, University of North Carolina, Chapel Hill
James D. Beck, University of North Carolina, Chapel Hill

The gold standard for estimating population prevalence of dental conditions, such as periodontitis, relies on a full mouth exam for each individual, requiring examination of up to 168 tooth sites per person. Due to time constraints, partial mouth exams that select only a subset of sites have been used in epidemiological studies; these exams, however, may underestimate the population prevalence when disease is defined entirely in terms of the selected sites. We propose a model-based approach for estimating prevalence based on the conditional linear family for correlated binary data. For random site selection, our simple estimator requires specification of only two parameters in a working model: the marginal mean, assumed constant across sites, and an exchangeable correlation for pairs of sites within clusters (mouths). Using oral examination data from 6,793 participants in the Atherosclerosis Risk in Communities Study, our proposed partial sampling method estimator produces estimates of periodontitis prevalence that are very similar to those from full mouth exams. These estimates give good precision/reproducibility for a range of cluster sizes. Our method is applicable to many areas of health research where the use of partial cluster sampling is resource-preserving, such as estimating community-level prevalence of an infectious disease.

email: [email protected]

TESTING HOMOGENEITY IN A CONTAMINATED NORMAL MODEL WITH CORRELATED DATA

Meng Qi*, University of Kentucky
Richard Charnigo, University of Kentucky

In this talk, we consider the problem of testing homogeneity in a contaminated normal model when the data are correlated under some known covariance structure. To address this problem, we developed a moment-based homogeneity test, assuming the data have a known compound symmetric covariance structure, and designed the weights for the test statistics to increase power. We performed simulations to assess the size and power of the test and established asymptotic properties. In a case study, we applied our test to microarray data on Down's syndrome, which is caused by an extra copy of chromosome 21. By assuming different covariance parameters, we obtained a contour plot of p-values from our test, showing that failing to take correlation into account may massively understate the p-value.

email: [email protected]

ON THE USE OF BETWEEN-WITHIN MODELS TO ADJUST FOR CONFOUNDING DUE TO UNMEASURED CLUSTER-LEVEL COVARIATES

Babette A. Brumback*, University of Florida
Zhuangyu Cai, University of Florida

Between-within models are generalized linear mixed effects models for clustered data that incorporate a random intercept as well as fixed effects for both a within-cluster covariate and a between-cluster covariate, wherein the between-cluster covariate represents the cluster means of the within-cluster covariate. One popular use of these models is to adjust for confounding of the effect of within-cluster covariates due to unmeasured between-cluster covariates. Previous research has shown via simulations that using this approach can yield an inconsistent estimator. We investigate this problem further.

email: [email protected]

ESTIMATING THE EFFECTS OF CENTER CHARACTERISTICS ON CENTER OUTCOMES: A SYMBOLIC DATA APPROACH

Jennifer Le-Rademacher*, Medical College of Wisconsin

This paper introduces a symbolic data approach to evaluate the effects of center-level characteristics on center outcomes. The proposed method treats centers rather than patients as the units of observation when estimating the effects of center characteristics, since centers are the entities of interest. To adjust for the differences in outcomes among
centers caused by varying patient load, the effects of patient-level characteristics are first modelled treating patients as the units of observation. The outcomes (adjusted for patient-level effects) of patients from the same center are then combined into a distribution of outcomes representing that center. The outcome distributions are symbolic-valued responses on which the effects of center-level characteristics are modelled. The proposed method provides an alternative framework to analyze clustered data. This method distinguishes the effects of center characteristics from the effects of patient characteristics. It can be used to model the effects of center characteristics on the mean as well as on the consistency of center outcomes, which classical methods such as the fixed-effect model and the random-effect model cannot. This method performs well even under scenarios where the data come from a fixed-effect model or a random-effect model. The proposed approach is illustrated using a bone marrow transplant example.

email: [email protected]

A ROBUST AND FLEXIBLE METHOD TO ESTIMATE ASSOCIATION FOR SPARSE CLUSTERED DATA

Lijia Wang*, Emory University
John J. Hanfelt, Emory University

It is challenging to conduct robust inference on sparse clustered data in heterogeneous populations. For example, in a study of drinking water, researchers wanted to know whether highly credible gastrointestinal illness (HCGI) episodes tended to aggregate within households, after adjustment for demographic variables and fine stratification by geographic area. Motivated by this study, we present a composite conditional likelihood approach that yields valid inference on the intracluster pairwise association along with the effects of covariates on the marginal responses. We use the general odds ratio function to measure the intracluster pairwise associations, which accommodates responses of any type, is invariant under prospective or retrospective study design, and is unconstrained by the marginal univariate distributions of the responses. Theoretical and simulation results demonstrate the validity of our proposed method. We apply the method to investigate whether HCGI episodes tended to aggregate within households.

email: [email protected]

31. CONTRIBUTED PAPERS: GWAS

GENE-DISEASE ASSOCIATIONS VIA SPARSE SIMULTANEOUS SIGNAL DETECTION

Sihai Dave Zhao*, University of Illinois at Urbana-Champaign
Tony Cai, University of Pennsylvania
Hongzhe Li, University of Pennsylvania

It is of great interest to identify genes whose expression levels are regulated by disease-associated variants, as these genes may be important in the functional mechanisms underlying the disease. One promising approach is to integrate genome-wide association and genetical genomics studies to test, for a given gene, whether there are SNPs that are simultaneously associated with its expression and with disease. In this paper a method is proposed to detect such simultaneous associations. The method allows the SNP-expression and SNP-disease associations to be calculated in independent datasets and is easy to implement and quick to compute. In addition, it is shown that the proposed method is asymptotically optimal under certain conditions, and a procedure for calculating p-values in finite samples is also provided. In simulations it is shown that the proposed procedure is more powerful than standard enrichment approaches, and in data analysis the procedure is used to identify genes whose regulation may play a role in Crohn's disease.

email: [email protected]

STATISTICAL TESTS FOR THE DETECTION OF SHARED COMMON GENETIC VARIANTS BETWEEN HETEROGENEOUS DISEASES BASED ON GWAS

Julie Kobie*, University of Pennsylvania
Sihai Dave Zhao, University of Illinois at Urbana-Champaign
Yun R. Li, University of Pennsylvania
Hakon Hakonarson, University of Pennsylvania
Hongzhe Li, University of Pennsylvania

Studying complex diseases, such as autoimmune diseases or psychiatric disorders, can lead to the detection of pleiotropic loci with otherwise small effects. Through the detection of pleiotropic loci, the genetic architecture of these
related but clinically-distinct diseases can be better defined, allowing for subsequent improvements in their treatment and prevention efforts. We investigate the genetic relatedness of complex diseases through the detection of shared common genetic variants, utilizing data from readily available genome-wide association studies (GWAS). GWAS have the potential to identify additional SNPs associated with complex diseases with increased sample sizes, but standard meta-analysis approaches are not optimal for the study of these diseases. We present two tests for the detection of shared genetic variants between two diseases, including the global test proposed by Zhao et al. (2014), originally for the analysis of expression quantitative trait loci (eQTL), and a modified global test with an added level of dependency on the direction of the association signals. A procedure for obtaining an analytical p-value for the modified global test is proposed and validated using simulations. Both global tests identify pairs of related but clinically-distinct pediatric autoimmune diseases (pAIDs) that share at least one common genetic variant.

e-mail: [email protected]

TESTING CLASS-LEVEL GENETIC ASSOCIATIONS USING SINGLE-ELEMENT SUMMARY STATISTICS

Jing Qian*, University of Massachusetts, Amherst
Eric Reed, University of Massachusetts, Amherst
Sara Nunez, University of Massachusetts, Amherst
Rachel Ballentyne, University of Pennsylvania
Liming Qu, University of Pennsylvania
Muredach P. Reilly, University of Pennsylvania
Andrea S. Foulkes, Mount Holyoke College

Characterization of the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenomic marks, enhancer elements and non-coding RNAs. In this manuscript, we derive two class-level test statistics and their theoretical distributions, and evaluate them in two publicly available summary-level datasets derived from genome-wide association study meta-analyses -- namely, the CARDIoGRAM and GIANT consortium meta-analysis data. The proposed Genetic Class Association Testing (genCAT) approach is intended to complement post-hoc characterization of class effects (e.g. genes) based on the minimum single element-level (e.g. single SNP level) p-value in the class. Additionally, we address high degrees of redundancy in the genotype data. A simulation study is presented to characterize the overall performance of this approach.

e-mail: [email protected]

SET-BASED TESTS FOR GENETIC ASSOCIATION IN LONGITUDINAL STUDIES

Zihuai He*, University of Michigan
Min Zhang, University of Michigan
Seunggeun Lee, University of Michigan
Jennifer A. Smith, University of Michigan
Xiuqing Guo, Harbor-UCLA Medical Center
Walter Palmas, Columbia University
Sharon L.R. Kardia, University of Michigan
Ana V. Diez Roux, University of Michigan
Bhramar Mukherjee, University of Michigan

Genetic association studies with longitudinal markers of chronic diseases provide a valuable opportunity to explore how genetic variants affect traits over time by utilizing the full trajectory of longitudinal outcomes. Since these traits are likely influenced by the joint effect of multiple variants in a gene, a joint analysis of these variants may help to explain additional phenotypic variation. We propose a longitudinal genetic random field model (LGRF) to test the association between a phenotype measured repeatedly during the course of an observational study and a set of genetic variants. Generalized score type tests are developed, which we show are robust to misspecification of within-subject correlation, a feature that is desirable for longitudinal analysis. A joint test incorporating gene-time interaction is further proposed. Computational advancement is made for scalable implementation of the proposed methods
in large-scale genome-wide association studies (GWAS). The methods are evaluated through extensive simulation studies and illustrated using data from the Multi-Ethnic Study of Atherosclerosis (MESA). Our simulation results indicate a substantial gain in power using LGRF when compared with two commonly used existing alternatives: (i) single marker tests using the longitudinal outcome and (ii) existing gene-based tests using the average value of repeated measurements as the outcome.

e-mail: [email protected]

GPA: A STATISTICAL APPROACH TO PRIORITIZING GWAS RESULTS BY INTEGRATING PLEIOTROPY AND ANNOTATION

Dongjun Chung*, Medical University of South Carolina
Can Yang, Hong Kong Baptist University
Cong Li, Yale University
Joel Gelernter, Yale University
Hongyu Zhao, Yale University

Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identification of these risk variants remains a very challenging problem. Hence, there is a need to develop more powerful statistical methods to leverage available information and to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this presentation, I will discuss our novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information. Our approach is motivated by the observations that (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to identify association signals, and it can also perform hypothesis testing for the presence of pleiotropy and enrichment of functional annotation. I will discuss the power of GPA with its application to real GWAS data with various functional annotations and the simulation studies.

email: [email protected]

OPTIMUM STUDY DESIGN FOR DETECTING IMPRINTING AND MATERNAL EFFECTS BASED ON PARTIAL LIKELIHOOD

Fangyuan Zhang*, The Ohio State University
Abbas Khalili, McGill University
Shili Lin, The Ohio State University

Despite spectacular advances in molecular genomic technologies in the past two decades, resources, especially those for family-based genomic studies, are still finite. To maximally utilize limited resources to increase statistical power, an important study-design question is whether to genotype siblings of probands or to recruit more independent families. Numerous studies have attempted to address this issue for detecting imprinting and maternal effects. However, the question is far from settled, mainly because results and recommendations in the literature are based on anecdotal evidence from simulation studies rather than on a rigorous statistical analysis. In this paper, we propose a systematic approach to study various designs for simultaneous detection of imprinting and maternal effects based on a partial likelihood formulation. We derive the asymptotic properties and obtain closed-form formulas for computing the information contents of study designs. Our results show that, for a common disease, recruiting additional siblings is preferred, whereas if a disease is rare, additional families will be a better choice with a fixed amount of resources. Our work thus offers a practical strategy for investigators to select the optimum study design within a case-control family scheme before data collection.

email: [email protected]

ANALYSIS OF GENOMIC DATA VIA LIKELIHOOD RATIO TEST IN COMPOSITE KERNEL MACHINE REGRESSION

Ni Zhao*, Fred Hutchinson Cancer Research Center
Michael C. Wu, Fred Hutchinson Cancer Research Center

Semiparametric kernel machine regression has emerged as a powerful and flexible tool in genomic studies in which genetic variants are grouped into biologically meaningful entities for association
testing. Recent advances have expanded the method to test for the effect of multiple groups of genomic features via a composite kernel that is constructed as a weighted average of multiple kernels. Variance component testing is used to evaluate the significance but requires fixing the weighting parameters or perturbation. In this paper, we focus on the (restricted) likelihood ratio test for kernel machine regression with composite kernels, where instead of fixing the weighting parameter, we estimate it by maximizing the likelihood through the linear mixed model with multiple variance components. We derive the spectral representation of the (R)LRT in linear mixed models with multiple variance components to obtain its finite sample distribution. We conduct extensive simulations to evaluate the power and type I error. Finally, we apply the proposed (R)LRT method to a real study to illustrate our methodology.

email: [email protected]

32. CONTRIBUTED PAPERS: Applications, Simulations and Methods in Causal Inference

ESTIMATING THE FRACTION WHO BENEFIT FROM A TREATMENT, USING RANDOMIZED TRIAL DATA

Emily J. Huang*, Johns Hopkins University
Michael A. Rosenblum, Johns Hopkins University

Most analyses of randomized trials focus on the average treatment effect. The result is that, even when there is strong evidence of a positive average treatment effect for a population, analyses leave unanswered whether treatment benefits are widespread or limited to a select few. This problem affects many disease areas, since it stems from how randomized trials, often the gold standard for evaluating treatments, are designed and analyzed. The goal of this work is to estimate the fraction who benefit from a given experimental treatment, based on randomized trial data, when the primary outcome is continuous or ordinal. Because the fraction who benefit is non-identifiable, we develop a method to estimate sharp lower and upper bounds on it. A novel application of linear programming is used to compute the bounds, which allows fast, flexible, and easy implementation. The method allows incorporation of baseline data, and the user may impose restrictions based on subject matter knowledge. The method is applied to estimate lower and upper bounds on the fraction who benefit from a new surgical intervention for stroke, based on the MISTIE II randomized trial.

email: [email protected]

SENSITIVITY ANALYSES IN THE PRESENCE OF EFFECT MODIFICATION IN OBSERVATIONAL STUDIES

Jesse Y. Hsu*, University of Pennsylvania
Dylan S. Small, University of Pennsylvania
Paul R. Rosenbaum, University of Pennsylvania

In observational studies of treatment effects, subjects are not randomly assigned to treatments, so differing outcomes in treated and control groups may reflect a bias from nonrandom assignment rather than a treatment effect. After adjusting for measured pretreatment covariates by matching, a sensitivity analysis determines the magnitude of bias from an unmeasured covariate that would need to be present to alter the conclusions of the naive analysis that presumes adjustment eliminated all bias. Other things being equal, larger effects tend to be less sensitive to bias than smaller effects. Effect modification is an interaction between a treatment and a pretreatment covariate controlled by matching, so that the treatment effect is larger at some values of the covariate than at others. In the presence of effect modification, it is possible that results are less sensitive to bias in subgroups experiencing larger effects. In this talk, we will consider (i) an a priori grouping into a few categories based on covariates controlled by matching; and (ii) a grouping discovered empirically in the data at hand. A sensitivity analysis for a test of the global null hypothesis of no effect is converted to sensitivity analyses for subgroup analyses using closed testing.

email: [email protected]

THE CAUSAL EFFECT OF GENE AND PERCENTAGE OF TRUNK FAT INTERACTION ON PHYSICAL ACTIVITY

Taraneh Abarin*, Memorial University

The literature has shown that a high level of physical activity reduces obesity-related traits, such as BMI and percentage of trunk fat. More recent research has also paid attention to the interaction between
physical activity and obesity-associated genes. However, exploring whether or not these associations, in part, reflect reverse causation remains a challenge. More importantly, whether or not carriers of the risk allele modify the causality is yet to be discovered. Using data from the Complex Diseases in the Newfoundland Population: Environment and Genetics study, we aim to assess whether or not different levels of BMI and percentage of trunk fat in the population of adults in Newfoundland and Labrador causally influence different levels of physical activity. Using Mendelian randomization analysis, we also aim to assess potential genetic effects on these causal relations.

email: [email protected]

A SIMULATION STUDY OF A MULTIPLY-ROBUST APPROACH FOR CAUSAL INFERENCE WITH BINARY OR CONTINUOUS MISSING COVARIATES

Jia Zhan*, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health
Changyu Shen, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health
Xiaochun Li, Indiana University School of Medicine and Richard M. Fairbanks School of Public Health
Lingling Li, Harvard Medical School and Harvard Pilgrim Health Care Institute

Confounding bias and missing data are two major barriers to valid comparative effectiveness studies using observational data. Each respective problem has been extensively studied. Lately, a principled approach to causal inference on data with binary or continuous missing covariates has been developed. It is a unified multiply-robust (MR) methodology which simultaneously handles both issues. The MR method builds upon the well-established doubly-robust theory and is 4-fold robust in that it is consistent and asymptotically normal if at least one of four sets of modeling assumptions holds. In this simulation study, we assess the finite sample performance of MR under various realistic scenarios with both binary and continuous missing covariates. We use the plug-in approach to deal with the continuous missing covariates, while with binary missing covariates we can compute the expectation explicitly. For comparison we also include results from the full data likelihood and the complete case approaches. Our simulation results show that the MR approach has reasonable finite-sample performance and is 4-fold robust in most considered settings. It is much more robust to model misspecification than the complete-case approach and the likelihood based approach. The coverage probability based on an asymptotic approximation is around the nominal level with realistic sample sizes.

email: [email protected]

THE IMPACT OF UNMEASURED CONFOUNDING IN OBSERVATIONAL STUDIES

Zugui Zhang*, Christiana Care Health System
Paul Kolm, Christiana Care Health System

Unmeasured confounding has been a major source of bias in observational studies. In this study, we assessed the impact of unmeasured confounding factors on the cost-effectiveness of coronary-artery bypass grafting (CABG) versus percutaneous coronary intervention (PCI), using data from the Society of Thoracic Surgeons Database and the American College of Cardiology Foundation National Cardiovascular Data Registry in ASCERT from the years 2004 to 2008. Patients (86,244 in the CABG group and 103,549 in the PCI group) at least 65 years old with two- or three-vessel coronary artery disease were included. Cost-effectiveness is expressed as the incremental cost-effectiveness ratio (ICER), the difference in costs of the two groups divided by the difference in effectiveness (events prevented, life years gained or QALYs). The unmeasured confounder was allowed to affect the cost difference by between a 20% decrease and a 10% increase. If the prevalence of the confounder in PCI was 50%, the ICER of CABG vs. PCI would change from $24,000, to $28,000, to $33,000, to $38,000 and $41,000 per QALY gained as the prevalence of the confounder in CABG took the values 5%, 10%, 20%, 30%, and 40%, respectively.

email: [email protected]
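In symbols, the ICER referred to in the preceding abstract is just the ratio of the between-group differences in mean cost and mean effectiveness; one standard way to write it (notation mine) is

\[ \mathrm{ICER} \;=\; \frac{\bar{C}_{\mathrm{CABG}} - \bar{C}_{\mathrm{PCI}}}{\bar{E}_{\mathrm{CABG}} - \bar{E}_{\mathrm{PCI}}}, \]

where \(\bar{C}\) and \(\bar{E}\) denote the mean cost and mean effectiveness (e.g., QALYs gained) in each group, so that sensitivity to an unmeasured confounder can be explored by perturbing the cost difference in the numerator and the effectiveness difference in the denominator separately.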

FLEXIBLE MODELS FOR ESTIMATING OPTIMAL TREATMENT INITIATION TIME FOR SURVIVAL ENDPOINTS: APPLICATION TO TIMING OF cART INITIATION IN HIV/TB CO-INFECTION

Liangyuan Hu*, Brown University
Joseph W. Hogan, Brown University

The timing of combination antiretroviral therapy (cART) initiation is important in HIV/TB co-infection. Early initiation during TB treatment increases drug toxicity, the risk of inflammatory immune reconstitution, and cost burden; late initiation increases the risk of morbidity and mortality associated with HIV/AIDS. Evidence from recent RCTs generally supports early initiation. However, the RCT studies do not give specifics about optimal initiation time or precise recommendations for those with CD4>100. We use data from a large observational cohort to gain more detailed information about treatment effects in practical settings. We formulate a causal structural model that flexibly captures the joint effects of treatment initiation time and treatment duration using smoothing splines, and develop methods for fitting the model to observational data wherein both mortality and cART initiation times are subject to censoring. We fit the model to data from 4903 individuals in a large HIV treatment program in Kenya, and use it to estimate optimal initiation times by CD4 subgroups. Additionally, we derive rules that are consistent with RCTs but have higher resolution, in the sense of generating CD4-specific rules that can be used to complement current treatment guidelines.

email: [email protected]

DOUBLE ROBUST GOODNESS-OF-FIT TEST OF COARSE STRUCTURAL NESTED MEAN MODELS WITH APPLICATION TO INITIATING HAART IN HIV-POSITIVE PATIENTS

Shu Yang*, Harvard School of Public Health
Judith Lok, Harvard School of Public Health

Coarse Structural Nested Mean Models (SNMMs) provide useful tools to estimate treatment effects from longitudinal observational data in the presence of time-dependent confounders. Coarse SNMMs lead to a large class of estimators. An optimal estimator was derived within the class of coarse SNMMs under the conditions of well-specified models for the treatment effect, for treatment initiation, and for nuisance regression outcomes (Lok et al., 2014). The key assumption lies in a well-specified model for the treatment effect; however, there is no existing guidance on how to specify the treatment effect model. Researchers often use simple models, mainly because they do not want to estimate too many parameters. Misspecification of the treatment effect model leads to biased estimators, preventing valid inference. To test whether the treatment effect model matches the data well, we derive a goodness-of-fit (GOF) test procedure based on overidentification restrictions tests (Sargan, 1958; Hansen, 1982). Overidentification restrictions tests are widely used in the economics literature; they however seem to have been previously unnoticed in the statistics and biostatistics literature. We show that our GOF statistic is double-robust in the sense that, with a correct treatment effect model, if either the treatment initiation model or the nuisance regression outcome model is correctly specified, the GOF statistic has a chi-squared limiting distribution with degrees of freedom equal to the number of overidentification restrictions. Moreover, we show that the testing procedure is consistent. We demonstrate the empirical relevance of our methods using simulation designs based on an actual data set. Our simulations show that the asymptotic distribution of the GOF statistic derived in this article provides an accurate approximation to its finite sample behavior, and that the GOF statistic is extremely powerful in detecting misspecification of the treatment effect model. In addition, we apply the GOF test procedure to study the effect of the initiation timing of highly active antiretroviral treatment (HAART) after infection on the one-year treatment effect in HIV-positive patients with acute and early infection.

email: [email protected]
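For readers unfamiliar with overidentification restrictions tests, the classical Hansen (1982) form cited in the abstract above conveys the flavor of such a GOF statistic; the following is only a generic GMM-style sketch in my own notation, not the authors' exact construction:

\[ J \;=\; n\,\bar{g}_n(\hat\theta)^{\top}\,\hat{W}\,\bar{g}_n(\hat\theta) \;\xrightarrow{\;d\;}\; \chi^2_{q-p}, \]

where \(\bar{g}_n(\theta)\) stacks \(q\) estimating functions averaged over the \(n\) subjects, \(\hat\theta\) is the \(p\)-dimensional parameter estimate, \(\hat{W}\) is a consistent estimate of the inverse variance of the estimating functions, and \(q-p\) counts the overidentification restrictions that supply the degrees of freedom.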

33. CONTRIBUTED PAPERS: Adaptive Designs and Dynamic Treatment Regimes

A BAYESIAN OPTIMAL DESIGN IN TWO-ARM, RANDOMIZED PHASE II CLINICAL TRIALS WITH ENDPOINTS FROM EXPONENTIAL FAMILIES

Wei Jiang*, University of Kansas Medical Center
Jo A. Wick, University of Kansas Medical Center
Jianghua He, University of Kansas Medical Center
Jonathan D. Mahnken, University of Kansas Medical Center
Matthew S. Mayo, University of Kansas Medical Center

Frequentist optimal designs for two-arm randomized phase II clinical trials with outcomes from exponential dispersion families were proposed by Jiang et al. (2014), where the total sample size is minimized under multiple constraints on the standard error of the estimated group means. This design was generalized from approaches developed in Mayo et al. (2010) for dichotomous outcomes. Compared to frequentist methods, Bayesian approaches offer a flexible way to incorporate uncertainty in the parameters of interest. In this paper, a Bayesian optimal design for Phase II clinical trials with endpoints from the exponential families is developed from the two previous frequentist approaches. The proposed optimal design minimizes the total sample size under pre-specified constraints on the expected length of posterior credible intervals for both group means and their difference. Examples of method implementation are provided for different types of endpoints in the exponential families.

email: [email protected]

A NOVEL METHOD FOR ESTIMATING OPTIMAL TREE-BASED TREATMENT REGIMES IN RANDOMIZED CLINICAL TRIALS

Lisa L. Doove*, Katholieke Universiteit Leuven
Elise Dusseldorp, Leiden University
Katrijn Van Deun, Tilburg University
Iven Van Mechelen, Katholieke Universiteit Leuven

For many medical problems, multiple treatment alternatives are available. A major challenge in such cases pertains to identifying optimal treatment regimes that specify for each individual client the preferable treatment alternative, with the optimal regime being the one leading to the greatest expected potential outcome for the population under study. Estimating optimal regimes comes down to an unsupervised learning problem, with the goal being to find a set of unknown subgroups of patients, each of which is associated with a preferable treatment alternative. Of particular interest for this problem are methods to construct tree-based treatment regimes, in which the subgroups that constitute the basis of the regimes are the leaves of a decision tree. However, the majority of methods for estimating tree-based treatment regimes either do not formally optimize an estimate of expected potential outcome, or use supervised learning techniques. In this paper we propose a novel unsupervised tree-based approach for estimating optimal treatment regimes in RCTs that directly maximizes an estimator of the overall expected outcome for the tree-based regimes under study. The performance of the proposed approach is assessed through simulation studies, and the approach is illustrated using data from an RCT on early-stage breast cancer.

email: [email protected]

LONGITUDINAL BAYESIAN ADAPTIVE DESIGNS FOR THE PROMOTION OF MORE ETHICAL TWO ARMED RANDOMIZED CONTROLLED TRIALS: A NOVEL EVALUATION OF OPTIMALITY

Jo Wick*, University of Kansas Medical Center
Scott M. Berry, Berry Consultants
Byron J. Gajewski, University of Kansas Medical Center
Hung-Wen Yeh, University of Kansas Medical Center
Won Choi, University of Kansas Medical Center
Christina M. Pacheco, University of Kansas Medical Center
Christine Daley, University of Kansas Medical Center

Classical clinical trial designs focus on statistical power but pay little attention to optimizing other operating characteristics. This primary focus on statistical power results in suboptimal trial designs
that lead to a lower percentage of trial participants placed in the better intervention, fewer trial responders, a bigger sample size, and a longer duration. For example, the balanced two-armed design has optimal power, but we show this design has suboptimal performance on other operating characteristics. Bayesian adaptive designs (BAD) are known for their flexibility in clinical trial design, allowing for modification of the design based on knowledge gained during the study. However, since BAD are often less powerful than traditional fixed designs, we consider a longitudinal variant that uses interim results to adapt the randomization of subjects to treatment (BADL) to improve statistical power. A novel approach evaluates the designs based on both traditional operating characteristics and other subject-focused trial features, leading us to an unbalanced two-armed design as the optimal design.

email: [email protected]

IDENTIFYING A SET THAT CONTAINS THE BEST DYNAMIC TREATMENT REGIMES

Ashkan Ertefaie*, University of Pennsylvania
Tianshuang Wu, University of Michigan
Inbal Nahum-Shani, University of Michigan
Kevin Lynch, University of Pennsylvania

A dynamic treatment regime (DTR) is a treatment design that seeks to accommodate patient heterogeneity in response to treatment. DTRs can be operationalized by a sequence of decision rules that map patient information to treatment options at specific decision points. The Sequential Multiple Assignment Randomized Trial (SMART) is a trial design that was developed specifically for the purpose of obtaining data that informs the construction of good (i.e., efficacious) decision rules. One of the scientific questions motivating a SMART concerns the comparison of multiple DTRs that are embedded in the design. Typical approaches for identifying the best DTRs involve all possible comparisons between DTRs that are embedded in a SMART, at the cost of greatly reduced power to the extent that the number of embedded DTRs increases. Here, we propose a method that will enable investigators to use SMART study data more efficiently to identify the set that contains the most efficacious embedded DTRs. Our method ensures that the true best embedded DTRs are included in this set with at least a given probability. Simulation results are presented to evaluate the proposed method, and the Extending Treatment Effectiveness of Naltrexone SMART study data are analyzed to illustrate its application.

email: [email protected]

OPTIMAL DYNAMIC TREATMENT REGIMES FOR TREATMENT INITIATION WITH CONTINUOUS RANDOM DECISION POINTS

Yebin Tao*, University of Michigan
Lu Wang, University of Michigan
Haoda Fu, Eli Lilly and Company

Identifying optimal dynamic treatment regimes (DTRs) allows patients to receive the best treatment prescription given their own evolving disease status and medical history. We consider estimating the optimal DTRs for treatment initiation using observational data, where key biomarkers of disease severity are monitored continuously during follow-up and a decision of whether or not to initiate a specific treatment is made each time the biomarkers are measured. The goal is to find the optimal DTR for initiating the treatment given the biomarker history. Instead of considering multiple fixed decision stages as in most of the DTR literature, our study undertakes the task of dealing with continuous random decision points for treatment modification given patients' up-to-date clinical records. Under each DTR, we employ a flexible survival model with splines for time-varying covariates to estimate the probability of adherence to that regime for all patients, given their own covariate history. We then use the estimated probability to construct inverse probability weighted estimators of the counterfactual mean utility, the prespecified criterion for assessing each DTR. We conduct simulations to demonstrate the performance of our method and further illustrate the application procedure with the example of type 2 diabetes patients enrolled to initiate insulin therapy.

email: [email protected]
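As a rough illustration of the kind of inverse probability weighted estimator described in the preceding abstract (generic notation of mine, not the authors' exact estimator), the counterfactual mean utility under a regime \(g\) can be estimated by

\[ \hat{\mu}(g) \;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{\mathbf{1}\{\text{subject } i \text{ adhered to } g\}}{\hat{\pi}_{g}(\bar{X}_i)}\, U_i , \]

where \(U_i\) is the observed utility, \(\bar{X}_i\) the covariate history, and \(\hat{\pi}_{g}(\bar{X}_i)\) the spline-based survival-model estimate of the probability of adhering to \(g\) given that history; regimes are then ranked by their estimated means.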

STATISTICAL INFERENCE FOR THE MEAN OUTCOME UNDER A POSSIBLY NON-UNIQUE OPTIMAL TREATMENT STRATEGY

Alexander R. Luedtke*, University of California, Berkeley
Mark J. van der Laan, University of California, Berkeley

We consider challenges that arise in the estimation of the mean outcome under an optimal individualized treatment strategy, defined as the treatment rule that maximizes the mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. Previous works have established regular root-n rate inference for this quantity in a large semiparametric model provided that the treatment has some effect, either beneficial or detrimental, in each stratum of the measured covariates. Here we prove a necessary and sufficient condition for the pathwise differentiability of the mean outcome under an optimal rule, a key condition needed for such regular root-n rate inference, that is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain confidence intervals for the mean outcome under the optimal individualized treatment strategy even when regular inference is not possible. This procedure requires that one be able to (at least asymptotically) bound the mean-squared error between an estimate of the strata-specific treatment effects and the true underlying strata-specific treatment effects. We show that bounding such a quantity is straightforward in a parametric model, and suggest an extension to the nonparametric case.

email: [email protected]

SEQUENTIAL ADVANTAGE SELECTION FOR OPTIMAL TREATMENT REGIME

Ailin Fan*, North Carolina State University
Wenbin Lu, North Carolina State University
Rui Song, North Carolina State University

Variable selection plays an important role in deriving practical and reliable optimal treatment regimes for personalized medicine, especially when there are a large number of predictors, and it is receiving more attention. Most existing variable selection techniques focus on selecting variables that are important for prediction; therefore, some variables that are poor in prediction but critical for decision-making may be ignored. A qualitative interaction of a variable with treatment arises when the treatment effect changes direction as the value of this variable varies. Variables that have qualitative interactions with treatment are of clinical importance for decision-making. Gunter et al. proposed the S-score to characterize the magnitude of the qualitative interaction of an individual variable with treatment. In this article, we develop a sequential advantage selection method based on a modified S-score. Our method selects qualitatively interacting variables sequentially and allows multiple decision points. We also propose a BIC-type criterion based on the sequential advantage to select the best candidate subset of variables for decision-making. The empirical performance of the proposed method is evaluated by simulation and an application to depression data from a clinical trial.

email: [email protected]

34. CONTRIBUTED PAPERS: Survival Analysis and Cancer Applications

REGRESSION ANALYSIS OF INFORMATIVE CURRENT STATUS DATA UNDER CURE RATE MODEL

Yeqian Liu*, University of Missouri, Columbia
Tao Hu, Capital Normal University, China
Jianguo Sun, University of Missouri, Columbia

Current status data arise when a single follow-up inspection is made for each individual and the occurrence of events is only detected at inspection times. We are motivated by medical studies in which patients may be cured and no longer susceptible to the disease. We consider current status data under the cure rate model and assume a generalized linear model with log link function for the cure probability. Cox proportional hazards models are used for both the failure times and the censoring times. To avoid the inference bias that results from ignoring informative censoring, we
propose a log-normal frailty to characterize the correlation between the censoring time and the failure time. An EM algorithm combined with a sieve method using Bernstein polynomials is developed for parameter estimation. Simulation studies are performed to evaluate the proposed estimates and suggest that the approach works well in finite sample situations. An illustrative example is provided.

email: [email protected]

THE HISTORICAL COX MODEL

Jonathan E. Gellar*, Johns Hopkins Bloomberg School of Public Health
Fabian Scheipl, LMU Munich
Mei-Cheng Wang, Johns Hopkins Bloomberg School of Public Health
Dale M. Needham, Johns Hopkins School of Medicine
Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School of Public Health

In this paper, we extend the Cox proportional hazards model to account for densely sampled time-varying covariates as historical functional terms. This approach allows the hazard function at any time t to depend not only on the current value of the time-varying covariate, but also on all previous values. The fundamental idea is to assume a bivariate coefficient function β(s, t) that serves as a weight function applied to the full or partial covariate history up to t, and is allowed to change with t. Estimation is performed by maximizing the penalized partial likelihood, using a likelihood-based information criterion to optimize the smoothing parameter. Methods are applied to a study of in-hospital mortality among patients with acute respiratory distress syndrome in the intensive care unit.

email: [email protected]

BAYESIAN ANALYSIS OF SURVIVAL DATA UNDER GENERALIZED EXTREME VALUE DISTRIBUTION WITH APPLICATION IN CURE RATE MODEL

Dooti Roy*, University of Connecticut
Vivekananda Roy, Iowa State University
Dipak Dey, University of Connecticut

This paper introduces both maxima and minima generalized extreme value (GEV) distributions to analyze right-censored survival data. We also use GEV distributions to construct flexible models for populations with a surviving fraction. Our proposed GEV model leads to extremely flexible hazard functions. We show that our Bayesian model has several nice properties. For example, we prove that even when improper priors are used, the resulting posterior distribution can still be proper under some weak conditions. We further provide theoretical and numerical results showing that our GEV models offer a richer class of models than the widely used Weibull models. Finally, a glioblastoma multiforme cancer data set with a cure rate is analyzed to illustrate the proposed GEV model.

email: [email protected]

JOINT SEMIPARAMETRIC TIME-TO-EVENT MODELING OF CANCER ONSET AND DIAGNOSIS WHEN ONSET IS UNOBSERVED

John D. Rice*, University of Michigan
Alex Tsodikov, University of Michigan

In cancer research, interest frequently centers on factors influencing a latent event that must precede a terminal event. In practice it is often impossible to observe the latent event precisely, making inference about this process difficult. To address this problem, we propose a joint model for the unobserved time to the latent and terminal events, with the two events linked by the baseline hazard. Covariates enter the model parametrically as linear combinations that multiply, respectively, the hazard for the latent event and the hazard for the terminal event conditional on the latent one. The baseline hazard is estimated nonparametrically using the EM algorithm, which allows for closed-form Breslow-type estimators at each iteration, drastically reducing computational time compared with maximizing the marginal likelihood directly. The parametric part of the model is estimated by maximizing the profile likelihood. We derive asymptotic properties for the model, while simulation studies are presented to illustrate the finite-sample properties of the method. Its use in practice is demonstrated in the analysis of a prostate cancer data set.

email: [email protected]
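A common way to write down a hazard of the kind described in the historical Cox model abstract earlier on this page is the following (my own notation; a sketch rather than the authors' exact specification), with a baseline covariate vector z_i and a densely observed time-varying covariate X_i(·):

\[ \lambda_i(t) \;=\; \lambda_0(t)\, \exp\!\Big\{ z_i^{\top}\gamma \;+\; \int_{0}^{t} X_i(s)\,\beta(s,t)\,\mathrm{d}s \Big\}, \]

so the bivariate coefficient surface β(s, t) weights the entire covariate history up to time t, and penalized estimation of the partial likelihood controls its smoothness.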

A MULTIPLE IMPUTATION APPROACH FOR SEMIPARAMETRIC CURE MODEL WITH INTERVAL CENSORED DATA

Jie Zhou*, University of South Carolina, Columbia
Jiajia Zhang, University of South Carolina, Columbia
Alexander C. McLain, University of South Carolina, Columbia
Bo Cai, University of South Carolina, Columbia

Interval censored data, where the exact event time is only known to lie in an observed time interval, are commonly encountered in practice. Such data may be analyzed under the setting where a fraction of patients can be considered as fully recovered and will never experience the event of interest. We propose a semiparametric estimation method for the proportional hazards mixture cure model which is easy to implement and computationally efficient. A multiple imputation approach based on asymptotic normal data augmentation is used to obtain parameter and variance estimates for both the cure probability and the survival probability of uncured patients. A simulation study is performed to evaluate the proposed method, and the results are compared with a fully parametric approach. The proposed method is applied to the 2000-2010 Greater Georgia breast cancer dataset from the Surveillance, Epidemiology, and End Results Program.

email: [email protected]

CHANGE-POINT PROPORTIONAL HAZARDS MODEL FOR CLUSTERED EVENT DATA

Yu Deng*, University of North Carolina, Chapel Hill
Jianwen Cai, University of North Carolina, Chapel Hill
Donglin Zeng, University of North Carolina, Chapel Hill
Jinying Zhao, Tulane University

In the analysis of clustered time-to-event data, some continuous variable may possess a “change point”, which violates the assumption of linear effects on the disease incidence in the standard Cox model. In this work, we propose a change-point proportional hazards model for clustered event data. The model incorporates the unknown threshold of the threshold variable as a change point in the regression. Pseudo-partial likelihood functions are maximized for estimating both the regression coefficients and the change point in the model. Furthermore, we use the supremum test based on robust score statistics to check the existence of a change point. The m out of n bootstrap method is applied to make inference for the estimator of the change point, where m is determined by an extension of the Bickel and Sakov (2008) method. We establish the consistency and asymptotic distributions of the proposed estimators. The small-sample performance of the proposed method is demonstrated via simulation studies. Finally, the Strong Heart Study dataset is analyzed to illustrate the method.

email: [email protected]

A FLEXIBLE PARAMETRIC CURE RATE MODEL WITH KNOWN CURE TIME

Paul W. Bernhardt*, Villanova University

Models for survival data usually assume that all individuals will eventually experience the event of interest. However, in many applications, the event will never occur for a subset of individuals. Cure rate models have long been used for handling this type of data by modeling the probability of being “cured” as well as the probability of survival among those who are not cured. We propose an extension of standard parametric mixture cure rate models that allows for the incorporation of both a fixed, known cure time and different censoring distributions for the cured and uncured subgroups. We show through simulations that the proposed model performs well in a variety of scenarios.

email: [email protected]
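The mixture cure structure referred to in the last abstract above is usually written as follows (standard textbook form in my notation, not the authors' specific extension):

\[ S_{\mathrm{pop}}(t \mid x, z) \;=\; \pi(z) \;+\; \bigl(1 - \pi(z)\bigr)\, S_u(t \mid x), \]

where \(\pi(z)\) is the probability of being cured and \(S_u(t \mid x)\) is the survival function of the uncured; the abstract's extension additionally builds in a fixed, known cure time and allows the censoring distributions to differ between the cured and uncured subgroups.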

35. INVITED AND CONTRIBUTED ORAL POSTERS: Methods and Applications in High Dimensional Data and Machine Learning

35a. INVITED POSTER: MACHINE LEARNING METHODS FOR CONSTRUCTING REAL-TIME TREATMENT POLICIES IN MOBILE HEALTH

Susan Murphy*, University of Michigan
Yanzhen Deng*, University of Michigan

Mobile devices are being increasingly used by health researchers to, in real time, both collect symptoms and other information and provide interventions. These interventions are often provided via treatment policies. The policies specify how patient information should be used to determine when, where and which intervention to provide. Here we present generalizations of “Actor-Critic” learning methods from the field of Reinforcement Learning for use, with existing data sets, in constructing treatment policies.

email: [email protected]

35b. INVITED POSTER: PREDICTING STROKES USING RELATIONAL RANDOM FORESTS

Zach Shahn, Columbia University
Patrick Ryan, Columbia University
David Madigan*, Columbia University

With increasingly widespread use of Electronic Health Records (EHRs), predicting health outcomes from high dimensional, longitudinal health histories is of central importance to healthcare. The medical literature has formalized such prediction problems in a few instances, and the resulting “risk calculators” attract widespread use. We have adapted a predictive modeling method originally developed in the context of speech recognition to the context of large-scale EHR data. “Relational Random Forests” (RRF) greedily construct informative labeled graphs representing temporal relations between multiple health events at the nodes of randomized decision trees. We have applied RRFs to the problem of predicting strokes in patients newly diagnosed with atrial fibrillation. Our approach compares favorably with the widely used “CHADS2” risk calculator.

email: [email protected]

35c. NETWORK-CONSTRAINED GROUP LASSO FOR HIGH DIMENSIONAL MULTINOMIAL CLASSIFICATION WITH APPLICATION TO CANCER SUBTYPE PREDICTION

Xinyu Tian*, Stony Brook University
Jun Chen, Mayo Clinic
Xuefeng Wang, Stony Brook University

The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a small number of predictors and pays no regard to the relationships among variables. Its usage is limited for genomic data, where the number of genomic features far exceeds the sample size. Also, genomic features such as gene expressions are usually related by an underlying biological network. Making use of the network information is crucial to improve classification performance as well as biological interpretability. We propose a penalized multinomial logit model that is capable of adjusting for both the high dimensionality of the predictors and the underlying network information. A group LASSO penalty is used to induce model sparsity, and a network constraint is used to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-convexity of the objective function in parameter estimation, we developed a proximal-gradient-based algorithm for efficient computation. The proposed models were compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real data, and outperformed the traditional ones in both cases.

email: [email protected]

35d. TWO SAMPLE MEAN TEST IN HIGH DIMENSIONAL COMPOSITIONAL DATA

Yuanpei Cao*, University of Pennsylvania
Wei Lin, Peking University
Hongzhe Li, University of Pennsylvania

Compositional data arise naturally in many scientific applications; for example, in microbiome studies, only the composition of the bacterial taxa is observed. Recently, studies have also found that differences in the composition of the microbiome are associated with disease
or treatment outcomes. Thus, detecting differences in composition is a potentially important issue in microbiome studies. However, the performance of the canonical generalized likelihood ratio test (Aitchison, 2003) on the additive log-ratio (alr) transformation of the composition is unsatisfactory in the high-dimensional setting. In this article, we introduce a global test based on the centered log-ratio (clr) transformation to detect differences in the compositions. Under the assumption that the basis covariance matrix is sparse, we show that the limiting null distribution of the test statistic and the power of the test based on the clr are the same as for the test on the log transformation of the basis. Simulation studies demonstrate that such tests based on the clr outperform naive tests that ignore the unique features of compositional data. We apply the proposed test to an analysis of microbiome data that compares normal and Crohn’s disease gut microbiomes.

email: [email protected]

35e. CLASSIFICATIONS BASED ON ACTIVE SET SELECTIONS

Wen Zhou*, Colorado State University
Stephen Vardeman, Iowa State University
Huaiqing Wu, Iowa State University
Max Morris, Iowa State University

Dataset shift is the phenomenon in predictive analytics where the distributions of training and predicting (or test) data are different. It is encountered in developing classification methods and is drawing growing attention, as many practical applications of classification must cope with some degree of shift, and the performance of theoretically well-behaved methods can suffer substantial degradation when it is present. We consider the covariate shift problem, a particular type of dataset shift where the distributions of feature vectors are possibly different between the training and predicting (test) sets. Inspired by kernel density estimation, we propose a classification method that involves the weighted bootstrap and ensemble learning. This procedure trains classifiers using subsets of the training data that are in some sense like the predicting (test) cases, thereby dealing with the covariate shift problem. The resulting method is called Active Set Selection Classification (ASSC). The basic procedure is flexible and can be used with existing methods of classification, such as support vector machines (SVMs), linear discriminant analysis (LDA), and classification trees, to improve their prediction accuracy. ASSC performs well on both simulated and real data sets. We preface application of ASSC with a preliminary screening step to deal with situations where the number of features is larger than the training set size.

email: [email protected]

35f. APPLICATION OF A GRAPH THEORY ALGORITHM IN SOFT CLUSTERING

Wenzhu Mowrey*, Albert Einstein College of Medicine
George C. Tseng, University of Pittsburgh
Lisa A. Weissfeld, Statistics Collaborative, Inc.

Clustering methods usually assign a hard cluster membership to indicate whether or not an observation belongs to a cluster. In situations where the underlying subgroups overlap with each other, or there are outliers or noisy observations that may influence clustering results, soft clustering methods may be desirable, since these methods allow for the assignment of a cluster membership probability to indicate the likelihood that an observation belongs to a cluster. These methods often involve resampling the dataset, where cluster memberships are summarized by a comembership matrix for each resampling run. The consensus matrix is then computed as the average of the comembership matrices from all resampling runs. In this work we propose using the Bron-Kerbosch algorithm from graph theory to obtain clusters from the consensus matrix. This algorithm is ideal since obtaining clusters from the consensus matrix can be viewed as equivalent to the maximum clique problem in graph theory, where the goal is to find the largest complete subgraph within a graph, “complete” meaning that any two nodes of the subgraph are connected.

email: [email protected]

35g. TESTING FOR THE PRESENCE OF CLUSTERING

Erika S. Helgeson*, University of North Carolina, Chapel Hill
Eric Bair, University of North Carolina, Chapel Hill

Cluster analysis is an unsupervised learning strategy that can be employed to identify groups of observations in data sets of unknown structure. This strategy
is particularly useful for analyzing high-dimensional data such as microarray gene expression data. Many methods are available which can identify the number of clusters present in the data and/or group the observations into their appropriate clusters based on feature characteristics, but there are only a few methods that can determine whether clusters are actually present in the data. We propose a novel method for testing the null hypothesis that no clusters are present in a given data set by comparing the number of features associated with the clusters to the expected number of features under an appropriate null distribution. We apply this method to a variety of simulated data sets and compare the results of our method to those of previously published methods. Overall, our method has comparable predictive accuracy and much shorter computing time, indicating that it is a useful tool for determining whether clusters are present in a data set.

email: [email protected]

35h. VARIABLE SELECTION AND SUFFICIENT DIMENSION REDUCTION FOR HIGH DIMENSIONAL DATA

Yeonhee Park*, University of Florida
Zhihua Su, University of Florida

Contemporary data sets often involve a large number of predictor variables, and such high-dimensional data bring challenges for traditional data analysis methods. In this paper, we propose a two-stage procedure: (1) identify the relevant predictor variables and discard irrelevant variables with the help of a screening method, and (2) estimate the central subspace containing all useful information for the relevant predictors. The two-stage procedure is also applicable to multivariate responses, and it can handle the situation when the number of relevant predictors is larger than the number of observations. Moreover, theoretical results are established for this procedure. We applied our methodology to simulated data and to real data sets, including leukemia data and prostate cancer data. The results show that our method works well; in particular, our method does not lose any important information in the application to the prostate cancer data.

email: [email protected]

35i. VARIABLE SELECTION FOR TREATMENT DECISIONS WITH SCALAR AND FUNCTIONAL COVARIATES

Adam Ciarleglio*, New York University School of Medicine
Eva Petkova, New York University School of Medicine and Nathan S. Kline Institute for Psychiatric Research
R. Todd Ogden, Columbia University
Thaddeus Tarpey, Wright State University

The amount and complexity of patient-level data being collected in randomized controlled trials offers both opportunities and challenges for developing personalized rules for assigning treatment for a given condition. For example, trials examining treatments for major depressive disorder (MDD) are not only collecting a large number of typical baseline data such as age, gender, or scores on various tests, but also data that measure the structure and function of the brain via magnetic resonance imaging (MRI), functional MRI (fMRI), or electroencephalography (EEG). These latter types of data have an inherent structure and may be considered as functional data. Unfortunately, there is often little clinical guidance about which, if any, of these many baseline covariates are prescriptive of treatment. We propose an approach that both selects important prescriptive covariates and estimates a treatment decision rule when there are many candidate covariates consisting of both scalar and functional data. We describe our method and how to implement it using existing software. Performance is evaluated on simulated data in a variety of settings, and we apply our method to data arising from a study of patients suffering from MDD for which baseline scalar and functional data are available.

email: [email protected]

35j. MOPM: MULTI-OPERATOR PREDICTION MODEL BASED ON HIGH-DIMENSIONAL FEATURES

Hojin Yang*, University of North Carolina, Chapel Hill
Hongtu Zhu, University of North Carolina, Chapel Hill
Joseph G. Ibrahim, University of North Carolina, Chapel Hill

We consider the problem of integrating and identifying important genomic, imaging, and biological markers to accurately predict low-dimensional outcome variables, such as disease status or behavioral scores. Such prediction problems can have a great impact on
Program & Abstracts 231 public health from disease prevention, a fraction of the variables in addition to hood ratio tests under the SPARC model. to detection, to treatment selection. The studying association. Several methods The asymptotic properties of these tests aim of this paper is to develop a multi- for sparse CCA have been proposed in are established in high dimensions. Our operator prediction modeling (MOPM) the literature. Although these methods theory does not require the irrepresent- framework to perform a supervised have proven useful in various applica- able condition or any assumption on dimension reduction and then build an tions, their main drawbacks are failure to the minimal signal strength. We further accurate prediction model. We formulate account for prior biological knowledge, examine the impact of model misspecifi- the problem of supervised dimension and assumption of independence of cation on parameter estimation and reduction as a variable selection problem the underlying covariance structures, hypothesis tests. We also show that the and propose an independent screen- the latter of which can be overly restric- proposed methods are applicable to the ing method to select a set of informative tive. In this paper, we propose a novel challenging settings with missing values features, which may have a complex structured sparse CCA method that over- or selection bias. Finally, we establish the nonlinear relationship with outcome vari- comes these limitations by incorporating corresponding theory for the multitask ables. Moreover, we develop a novel local biological information and making no learning problem to handle the data with projection method to use multiple linear assumptions on the underlying covari- heterogeneity. In particular, under the operators to project all informative pre- ance structures. We compare our method semiparametric group sparsity assump- dictors into multiple local subspaces in to existing sparse CCA approaches via tion, we demonstrate that the resulting order to capture reliable and informative simulation studies and real data analysis estimator can achieve an improvement covariate information. Theoretically, we using gene expression and metabolomics in the estimation errors as compared systematically investigate some theoreti- data from a cardiovascular disease study. to the L1-regularized estimator. These cal properties of MOPM. Our simulation theoretical results are illustrated through email: [email protected] results and real data analysis show that simulation studies, and a real data MOPM outperforms many state-of-theart example. methods in terms of prediction accuracy. 35l. SPARC: OPTIMAL ESTIMATION email: [email protected] email: [email protected] AND ASYMPTOTIC INFERENCE UNDER SEMIPARAMETRIC SPARSITY 35m. LOCAL-AGGREGATE 35k. STRUCTURED SPARSE CCA MODELING FOR BIG-DATA Yang Ning*, Princeton University FOR HIGH DIMENSIONAL VIA DISTRIBUTED OPTIMI- DATA INTEGRATION Han Liu, Princeton University ZATION: APPLICATIONS TO NEUROIMAGING Sandra Safo*, Emory University We propose a new inferential framework called semiparametric regression via Yue Hu*, Rice University Qi Long, Emory University chromatography (SPARC) to handle the Genevera I. Allen, Rice University, Canonical Correlation Analysis (CCA) is challenges of complex data analysis Baylor College of Medicine and a classical multivariate analysis tool that featured by high dimensionality and Texas Children’s Hospital aims at studying association between two heterogeneity. 
Under a semiparamet- sets of variables by finding linear com- ric sparsity assumption, we develop a Technological advances have led to binations of all available variables with regularized statistical chromatographic a proliferation of structured big-data maximum correlation. CCA has limita- estimation method, and establish the that is often collected and stored in a tions in the high dimensional framework nearly optimal parameter estimation error distributed manner. We are specifically as it is usually of interest to select only bounds under Lq norms. Furthermore, we motivated to build predictive models for propose a unified framework for statistical multi-subject neuroimaging data based inference including score, Wald and likeli- on each subject’s brain imaging scans.

232 ENAR 2015 | Spring Meeting | March 15–18 This is an ultra-high-dimensional problem ners. A major component of personalized 35o. INTEGRATIVE MULTI-OMICS that consists of a matrix of covariates medicine is to estimate individualized CLUSTERING FOR DISEASE (brain locations by time points) for each treatment rules. Recently, Zhao et al. SUBTYPE DISCOVERY BY subject; few methods currently exist to fit (2012) proposed the outcome weighted SPARSE OVERLAPPING supervised models directly to this tensor learning (OWL) to construct individualized GROUP LASSO AND TIGHT data. We propose a novel modeling and treatment rules that directly optimize the CLUSTERING algorithmic strategy to apply generalized clinical outcome. However, the individual- SungHwan Kim*, University of Pittsburgh linear models (GLMs) to this massive ized treatment rule estimated by OWL tensor data in which one set of variables would keep, if possible, the treatment YongSeok Park, University of Pittsburgh is associated with locations. Our method assignments that the subjects actually George Tseng, University of Pittsburgh begins by fitting GLMs to each location received. This behavior of OWL weakens With the rapid advances in technologies separately, and then builds an ensemble the finite sample performance. In this of microarray and massively paral- by blending information across locations article, we propose a new method, called lel sequencing, data of multiple omics through regularization with what we term Residual Weighted Learning (RWL), to sources from a large sample cohort are an aggregating penalty. Our so called, alleviate this problem, and to improve the now frequently seen in many consor- Local-Aggregate Model, can be fit in a finite sample performance. Not like OWL tium studies. Effective multi-omics data completely distributed manner over the which weights the misclassification errors integration has brought new statistical locations using an Alternating Direction by the clinical outcomes, the RWL weights challenges. One of important biological Method of Multipliers (ADMM) strategy, the errors by the residuals of outcome objective of the data analysis is clustering and thus greatly reduces the computa- from a regression fit on clinical covariates patients in order to identify meaningful tional burden. Furthermore, we propose other than the treatment assignment. We disease subtypes, which is the funda- to select the appropriate model through utilize the truncated hinge loss function mental basis for tailored treatment and a novel sequence of faster algorithmic in the RWL, and provide a difference of personalized medicine. Several methods solutions that is similar to regulariza- convex (d.c.) algorithm to solve the non- have been proposed in the literature to tion paths. We will demonstrate both the convex optimization problem. We show accommodate this purpose, including the computational and predictive modeling that the resulting estimator of the treat- popular iCluster in many cancer applica- advantages of our methods via simula- ment rule is universally consistent. We tions. However, all of those methods fail tions and an EEG classification problem. further obtain a finite sample bound for the to properly incorporate the information difference between the expected outcome email: [email protected] from inter-omics regulation flow and using the estimated individualized treat- do not allow outlier samples scatter- ment rule and that of the optimal treatment ing away from the tight clusters. In this 35n. 
RESIDUAL WEIGHTED rule. The performance of our proposed paper, we propose a group structured LEARNING FOR ESTIMATING RWL method is illustrated in simulation iCluster method to incorporate a sparse INDIVIDUALIZED TREATMENT studies and an analysis of chronic depres- overlapping group lasso technique and RULES sion data. a tight clustering concept via regulariza- Xin Zhou*, University of North Carolina, email: [email protected] tion to circumvent the aforementioned Chapel Hill pitfalls. We show by two real examples and simulated data that our proposed Michael R. Kosorok, University of North methods improve the original iCluster in Carolina, Chapel Hill clustering accuracy, biological interpreta- Personalized medicine has received tion, and are able to generate coherent increasing attention among statisticians, tight clusters. computer scientists, and clinical practitio- email: [email protected]

Program & Abstracts 233 35p. IDENTIFYING PREDICTIVE 36. Recent Research in POST-TRIAL SIMULATION OF TYPE MARKERS FOR PERSONALIZED Adaptive Randomized I ERROR FOR DEMONSTRATION OF TREATMENT SELECTION Trials with the Goal of CONTROL OF TYPE I ERROR Yuanyuan Shen*, Harvard University Addressing Challenges Scott M. Berry*, Berry Consultants in Regulatory Science Tianxi Cai, Harvard University The ability to demonstrate type I error of innovative adaptive trials through Many illness show heterogeneous simulation allows a whole new world response to treatment, which motivates ADAPTIVE ENRICHMENTWITH of innovative trial design. Simulation to researchers to advocate the individu- SUBPOPULATION SELECTIONAT demonstrate control can never explore alization of treatment to each patient. INTERIM the entire “null space”— but extensive Many Individualized Treatment Rules Sue-Jane Wang*, U.S. Food and Drug pre-simulation can be done. Frequently (ITR) have been developed but not many Administration the type I error can depend upon ancil- approaches on identifying markers that Hsien-Ming James Hung, U.S. Food and lary aspects of a trial — the rate under can guide treatment selection have been Drug Administration control, the shape of the sitribution, the studied. Traditional Wald test of interac- accrual rate, and even the drop-out rate. tion between treatment and markers has There is growing interest in pursuing We discuss the ability to do prospectively two major limitations: the validity of test- adaptive enrichment for drug develop- defined post-trial simulations to addition- ing for interaction in terms of identifying ment because of its potential to achieve ally demonstrate control of type I error important treatment selection markers is the goal of personalized medicine. There with a bootstrapping type approach. We scale-dependent; and it doesn’t consider are many versions of adaptive enrichment present several examples and discus- potential non-linearity among the predic- proposed across many disease indica- sions of the regulatory impact. tors. We propose a scale-independent tions. Some are exploratory adaptive score statistic to test and detect impor- enrichment and others aim at confirma- email: [email protected] tant baseline predictors that can guide tory adaptive enrichment. In this paper treatment selection. Kernel machine presentation, we give a brief overview framework is also incorporated to handle on adaptive enrichment and the meth- BAYESIAN COMMENSURATE PRIOR the non-linearity among predictors. Simu- odologies that are growing in statistical APPROACHES FOR PEDIATRIC AND lation studies show that our proposed literature. A case example that was RARE DISEASE CLINICAL TRIALS kernel machine based score test is more planned to adapt two design elements, Bradley P. Carlin*, University powerful than the Wald test when there is i.e., population adaptation and statistical of Minnesota non-linear effect among the predictors as information adaptation, will be given. We Cynthia Basu, University of Minnesota well as when the outcome is binary and articulate the challenges in the imple- the link function is non-linear. Further- mentation of a confirmatory adaptive Brian Hobbs, University of Texas more, when there is high-correlation enrichment trial. We also assess the con- MD Anderson Cancer Center among predictors and when the number sistency of treatment effect before and Rare diseases are difficult to study, since of predictors is big, our method over- after adaptation. 
We also discuss and the numbers of persons who can be performs Wald test due to the limitations articulate design considerations for adap- enrolled in a traditional clinical trial is of Wald test under such scenarios. tive enrichment among a dual-composite typically insufficient to demonstrate a null hypothesis, a flexible dual-inde- email: [email protected] statistically significant treatment effect. pendent null hypothesis and a rigorous dual-independent null hypothesis. email: [email protected]

234 ENAR 2015 | Spring Meeting | March 15–18 Pediatric disease researchers face similar IDENTIFYING SUBPOPULATIONS 37. Statistical Innovations challenges. Here, drugs successfully WITH THE LARGEST TREATMENT in Functional Genomics tested on adults are sometimes available, EFFECT and Population Health but we still lack information on dosing, Iván Díaz*, Johns Hopkins Bloomberg safety, and efficacy of these drugs in School of Public Health children. Full or partial extrapolation of QUALITY PRESERVING DATABASES: existing adult data to the pediatric case is Michael Rosenblum, Johns Hopkins STATISTICALLY SOUND AND EFFI- sometimes justified, but current methods Bloomberg School of Public Health CIENT USE OF PUBLIC DATABASES are often ad hoc and depend crucially In the presence of effect modifiers, overall FOR AN INFINITE SEQUENCE on knowing the appropriate amount of population effects often mask the pres- OF TESTS information to borrow from the adult data. ence of subpopulations with large and Saharon Rosset*, Tel Aviv University This talk considers a collection of novel small treatment effects. Knowledge of Ehud Aharoni, IBM Research Bayesian statistical methods and soft- such subpopulations is of high impor- ware tools for more efficient and effective tance in personalized medicine as it Hani Neuvirth, IBM Research orphan and pediatric drug trials. Bayes- allows physicians to assign the most ben- Large databases whose usage is open ian methods offer a formal statistical eficial treatment according to the patient’s to the scientific community to facilitate framework for incorporating all sources characteristics, potentially reducing research are becoming commonplace, of knowledge (structural constraints, costs, increasing efficacy, and improv- especially in Biology and Genetics. The expert opinion, and both historical and ing the system overall. In this paper we emerging scenario in which a commu- experimental data), thus offering the present a method for classifying individu- nity of researchers sequentially conduct possibility of substantially reduced trial als according to their treatment effect, multiple statistical tests on one shared sizes, thanks to their more efficient use of conditional on baseline variables. Existing database gives rise to major multiple information. This in turn typically leads to methods rely on classification criteria that hypothesis testing issues. We suggest increases in statistical power and reduc- optimize the average treatment effect, but a scheme we term Quality Preserving tions in cost and ethical hazard, the latter fail to account for the uncertainty in the Database (QPD) for controlling false since fewer patients need be exposed estimates. We propose a classification discovery without any power loss by to inferior treatments. Our methods use criterion that optimizes the signal to noise adding new samples for each use of the commensurate priors where possible to ratio, ensuring optimal power of a hypoth- database and charging the user with combine relevant auxiliary information, esis test of no effect. Our motivating the expenses. The crux of the scheme and we check our procedures to ensure application is the phase II MISTIE trial on is a carefully crafted pricing system adequate Type I error performance. We minimally invasive surgery after Intrace- that fairly prices different user requests illustrate in the context of and using real rebral Hemorrhage (ICH). 
We present the based on their demands while controlling data from ongoing clinical trials at the results of the analysis of the MISTIE II trial false discovery. The statistical problem University of Minnesota and the University as well as simulations showing the prop- encountered is one of defining appropri- of Texas M.D. Anderson Cancer Center, erties of the method in finite samples. ate measures of false discovery that can and in disease areas such as adrenoleu- email: [email protected] be controlled sequentially, and designing kodystrophy (ALD), Gaucher’s Disease, methodologies that can control them in epilepsy, Parkinson’s Disease, and cer- the context of QPD. We describe a simple tain rare cancers or cancer subtypes. QPD implementation based on control- email: [email protected] ling the family-wise error rate using a method called alpha-spending, and a more involved implementation based on controlling a measure called mFDR,

Program & Abstracts 235 using an approach we term generalized IMPUTING TRANSCRIPTOME A BAYESIAN METHOD FOR THE alpha investing. We derive the favorable IN INACCESSIBLE TISSUES IN DETECTION OF LONG-RANGE statistical properties of generalized alpha AND BEYOND THE GTEx PROJECT CHROMOSOMAL INTERACTIONS investing variants in general, and in the VIA RIMEE IN Hi-C DATA context of QPD in particular. The variant Jiebiao Wang, University of Chicago Zheng Xu, University of North Carolina, we implement can guarantee infinite use Chapel Hill of a public database while preserving Dan Nicolae, University of Chicago Guosheng Zhang, University of North power, with very low costs, or even no Nancy Cox, University of Chicago costs under some realistic assumptions. Carolina, Chapel Hill Lin S. Chen*, University of Chicago We demonstrate this idea in simulations Fulai Jin, Ludwig Institute for Cancer and describe its potential application to In order to synthesize new knowl- Research several real life setups. edge about the organization of gene Mengjie Chen, University of North expression across human tissues, the email: [email protected] Carolina, Chapel Hill Genotype-Tissue Expression (GTEx) project collected the transcriptome data Patrick F. Sullivan, University of North FUSED LASSO ADDITIVE MODEL in a wide variety of tissues from organ Carolina, Chapel Hill donors. Some human tissues are hard Ashley Petersen*, University Zhaohui Qin, Emory University to access and transcriptome information of Washington in those tissues have only limited avail- Terrence S. Furey, University of North Daniela Witten, University of Washington ability. We show that those transcriptome Carolina, Chapel Hill Noah Simon, University of Washington data can be imputed by harnessing Ming Hu, New York University rich information in the GTEx data, and Yun Li*, University of North Carolina, We consider the problem of predicting an furthermore it is feasible to use GTEx data Chapel Hill outcome variable using covariates that as reference and impute inaccessible tis- are measured on independent obser- sues for future studies. Here we propose Advances in chromosome conformation vations, in the setting in which flexible an approach -- Robust Imputation of capture and next-generation sequencing and interpretable fits are desirable. We Multi-tissue Expression incorporating technologies are enabling genome-wide propose the fused lasso additive model EQTLs (RIMEE) to impute transcriptome investigation of dynamic chromatin inter- (FLAM), in which each additive function in missing tissues by taking advantage actions. For example, Hi-C experiments is estimated to be piecewise constant of information in related tissues, related generate genome-wide contact frequen- with a small number of adaptively-chosen genes, and moreover, eQTLs. Based cies between pairs of loci by sequencing knots. FLAM is the solution to a convex on cross-validation analyses of the nine DNA segments ligated from loci in close optimization problem, for which a simple tissues in the GTEx data, we evaluate the spatial proximity. One essential task in algorithm with guaranteed convergence performance of imputing GTEx missing such studies is peak calling, that is, the to the global optimum is provided. 
FLAM tissues, assess the contributions of cis- identification of non-random interactions is shown to be consistent in high dimen- and trans-eQTLs in imputation, examine between loci from the two-dimensional sions, and an unbiased estimator of its the feasibility of using GTEx as refer- contact frequency matrix. Successful degrees of freedom is proposed. We ence and imputing based on accessible fulfillment of this task has many impor- evaluate the performance of FLAM in a tissues for future studies, and provide tant implications including identifying simulation study and on two data sets. insights into the inter-tissue predictive- long-range interactions that assist in inter- email: [email protected] ness and relatedness. preting a sizable fraction of the results email: [email protected]

236 ENAR 2015 | Spring Meeting | March 15–18 from genome-wide association studies location of risk variants. However, existing clinical and cellular data. An Integrative (GWAS). The task - distinguishing biologi- coalescent-based methods are computa- approach to the analysis of these data cally meaningful chromatin interactions tionally very challenging and can only be can push the statistical inference in and from massive numbers of random inter- applied to samples below 200 individuals. understanding of systems biology onto actions - poses great challenges both Here, we propose a novel approach to another level, beyond where traditional statistically and computationally. Model overcome this limitation. First, we infer a research stops. In this talk, we envision based methods to address this challenge set of clusters from the sampled haplo- an integrative genomics data analysis are still lacking. In particular, no statistical types. Then, we apply coalescent-based that is built on the big data analytics model exists that takes the underlying approaches to approximate the geneal- platform such as SPARK, which respects dependency structure into consideration. ogy of the clusters. Hence, the dimension computational and memory efficien- We propose a hidden Markov random of external nodes in coalescent models cies, in addition to statistical power and field (HMRF) based Bayesian method to is reduced from the total sample size to robustness. rigorously model interaction probabilities the number of clusters. We evaluate the email: [email protected] in the two-dimensional space based on cluster genealogy and their descendants the contact frequency matrix. By bor- phenotype distribution, to integrate over rowing information from neighboring loci all positions and establish CRs where RECALCULATING THE RELATIVE pairs, our method demonstrates superior risk variants are most likely to occur. In RISKS OF AIR POLLUTIONTO reproducibility and statistical power in simulation studies, our method correctly ACCOUNT FOR PREFERENTIAL both simulations and real data. localizes short segments around true SITE SELECTION risk positions in datasets with thousands email: [email protected] James V. Zidek*, University of British of individuals. Thus we have developed Columbia a novel approach to estimate the gene- FINE MAPPING OF COMPLEX TRAIT alogy of sequenced regions that can Gavin Shaddick, University of Bath LOCI WITH COALESCENT METHODS be applied to very large case-control In the 1960s, over 2000 sites in the UK IN LARGE CASE-CONTROL STUDIES datasets. monitored black smoke (BS) air pollution Ziqan Geng, University of Michigan email: [email protected] due to concerns about its effect on public health that were clearly demonstrated by Paul Scheet, University of Texas the famous London fog of 1952. Abate- MD Andersen Cancer Center 38. Big Data: Issues in ment measures led to a decline in the Sebastian Zöllner*, University Biosciences levels of BS and hence a reduction in the of Michigan number of monitoring sites to less than 200 by 1996. A case study demonstrates Identifying the risk variants underly- BIG GENOMICS DATA ANALYTICS that sites were removed preferentially, ing association signals for complex Haiyan Huang*, University of California, causing exaggerated estimates of diseases is challenging due to the Berkeley pollution levels. This talk will describe complicated dependence structure of methods for mitigating the effects of linkage disequilibrium. 
By modeling the Bin Yu, University of California, Berkeley that overestimation and show through hereditary process of a target region, Explosive high-throughput technologies the case study that the relative risk of coalescent-based approaches improve in the last decade demand developments environmental health outcomes has been this identification and model the prob- of useful high dimensional statistical tools underestimated. The large number of ability of carrying risk variants at all for systematic analyses of large genom- monitoring sites and their associated high loci jointly. These probabilities provide ics data such as gene expression, SNP, dimensional data vectors rule out naive Bayesian credible regions (CR) for the use of classical geostatistical and Bayes-

Program & Abstracts 237 ian hierarchical methods. Hence there is by functional principal components. We proposed method also allows flexibility a need for novel approaches in analysis, apply these approaches to investigate the for the researcher to focus the analysis in which will be described. The work has relations between brain connectivity and hypothesized activation areas rather than important general implications for the subject characteristics. the whole domain. setting of regulatory standards and the email: [email protected] email: [email protected] design of monitoring networks. email: [email protected] 39. Recent Advances MIXTURE OF INHOMOGENEOUS in Statistical Ecology MATRIX MODELS FOR SPECIES- FUNCTIONAL DATA ANALYSIS FOR RICH ECOSYSTEMS QUANTIFYING BRAIN CONNECTIVITY Frederic Mortier*, CIRAD - Tropical For- EFFICIENT SPATIAL AND SPATIO- Hans-Georg Mueller*, University est Goods and Ecosystem Services Unit TEMPORAL FALSE DISCOVERY RATE of California, Davis CONTROL Understanding how environmental factors Alexander Petersen, University of could impact population dynamics is of Ali Arab*, Georgetown University California, Davis primary importance for species conserva- Analysis of spatial and spatio-temporal tion. Matrix population models are widely Owen Carmichael, Louisiana State data often requires a large number of used to predict population dynamics. University hypothesis tests. For example, one may However, in species-rich ecosystems with Functional data analysis provides a test spatial data to detect activation areas many rare species, the small population toolbox for the analysis of data samples or identify changes in the environment. sizes hinder a good fit of species-specific that can be viewed as being generated Given the large number of hypothesis models. In addition, classical matrix by repeated realizations of an underly- tests in these settings, type I error control models do not take into account environ- ing (and often latent) stochastic process. cannot be effectively achieved using mental variability. We propose a mixture The application of this methodology to conventional multiple testing procedures of regression models with variable selec- paired processes (X,Y) will be illustrated such as Bonferroni. Alternatively, one tion allowing the simultaneous clustering by the problem of quantifying connec- may use the False Discovery Rate (FDR) of species into groups according to vital tivity for resting state fMRI data, where control. However, FDR for applications rate information (recruitment, growth, for each subject and each brain voxel, with large number of tests may be ineffi- and mortality) and the identification of a BOLD time signal is recorded. The cient due to low statistical power. For data group-specific explicative environmen- functional data analysis approach leads with spatial or spatio-temporal structure, tal variables. We develop an inference to various measures of functional cor- we may benefit from the spatial (and method coupling the R packages flexmix relation between X and Y. The resulting temporal) characteristics of the data in and glmnet. We first highlight the effec- correlations between brain hubs provide order to conduct tests in a multiresolution tiveness of the method on simulated a basis for the construction of subject- fashion (large scale to fine scale). In this datasets. Next, we apply it to data from a specific networks. 
A second application paper, we provide an overview of exist- tropical rain forest in the Central African of functional data analysis is based on a ing methods and propose a hierarchical Republic. We demonstrate the accuracy novel construction of network functions multiresolution approach to conduct FDR of the inhomogeneous mixture matrix that reflect inter- and intra-hub connectiv- control for spatial and spatio-temporal model in successfully reproducing stand ity during resting state of the brain for signals. The proposed method results in dynamics and classifying tree species each subject. A sample of networks is higher efficiency (i.e., improved power) into well-differentiated groups with clear then represented by a sample of network compared to existing FDR methods. The ecological interpretations. functions, which may be represented email: [email protected]

238 ENAR 2015 | Spring Meeting | March 15–18 SPATIO-TEMPORAL MODELING possibly discrete and/or the sample size disease. We propose a general frame- OF MULTIPLE SPECIES MIGRATION is big. The motivating examples include work for risk stratification by introducing FLOW data derived from the land surveys in the risk stratification distribution, which various regions of the United States, is the distribution of the changes in Trevor J. Oswald*, University of Missouri which require statistical methods for disease risk indicated by each possible There are many complex interactions data analysis that balance modeling test result. The mean of this distribu- between species, including competition, complexity, statistical efficiency, and com- tion, the mean risk stratification (MRS) predation, and mutualism, to name a putational feasibility. In this talk, some is the average amount of extra disease few of the most commonly understood of the existing methodology for spatial (or deficit of disease) that a test reveals interactions. These interactions between discrete/continuous data is reviewed for an individual patient. The MRS is a species can lead to variations in their and new approaches are proposed. function of not only the risk difference, migratory patterns. Historically, much of In particular, models for spatial count, but also marker positivity, demonstrating the previous research has focused on ordinal, nominal, and proportional data that a big risk difference does not imply migratory patterns of a single species are considered. Comparisons and con- good risk stratification for markers that between a small number of segregated nections will be drawn between different are rarely positive. The MRS is also a regions (i.e., metapopulation analysis). data types and modeling approaches. function of Youden’s index and disease Drawing concepts from metapopula- For illustration, the methods are applied prevalence, demonstrating that a large tion analysis, social demography, to analyze land cover data for mapping Youden’s index does not imply good spatial econometrics, and dynamical and inferring about forest landscape risk stratification if disease is too rare. spatio-temporal modeling, we extend the structures. We demonstrate that the net expected previous work by developing a methodol- benefit of a diagnostic test is a function email: [email protected] ogy that can predict the migratory flows of test characteristics solely through the of several populations simultaneously. MRS. Reasoning based on MRS enforces More generally, this model can be used 40. New Analytical Issues rational decision-making based on the for multivariate population flows in other in Current Epidemiology principle of “equal management of equal areas of science. The model we propose Studies of HIV and other risks”. We discuss examples from the makes use of spatio-temporal dynam- Sexually Transmitted presenter’s experience serving on the ics of the system, while accounting for Infections guidelines committee for HPV testing in uncertainty exhibited in the sampling, as cervical cancer screening. well as the process underlying the migra- email: [email protected] tion flows within a Bayesian hierarachical A FRAMEWORK FOR QUANTIFYING modeling framework. 
RISK STRATIFICATION FROM e-mail: [email protected] DIAGNOSTIC TESTS: APPLICATION COMBINING INFORMATION TO TO HPV TESTING IN CERVICAL ESTIMATE ADHERENCE IN TRIALS CANCER SCREENING OF PRE-EXPOSURE PROPHYLAXIS STATISTICAL MODELING OF Hormuzd Katki*, National Cancer Insti- FOR HIV PREVENTION SPATIAL DISCRETE AND tute, National Institutes of Health James Hughes*, University CONTINUOUS DATA IN ECOLOGY A test or biomarker that stratifies disease of Washington Jun Zhu*, University of Wisconsin, risks allows clinicians to only intervene In trials of pre-exposure prophylaxis Madison on only those who have or will develop (PrEP) for HIV prevention understand- Modeling spatial data in ecology and ing the effect of adherence on treatment drawing statistical inference is chal- efficacy is of great interest. However, lenging especially when response is even though most PrEP studies collect

Program & Abstracts 239 multiple measures of adherence (e.g., pairwise likelihood for analysis of longitu- of people who are uninfected, is used self-report, pill counts, plasma drug dinal HPV couple cohort data to identify to approximate the incidence rate. Our levels), no rigorous approach for combin- risk factors associated with HPV transmis- methods account for uncertainty in the ing these various sources of information sion, estimate difference in risk between durations of time spent in the biomarker has been developed. We develop novel male-to-female and female-to-male HPV defined early disease stage. We find that methods for combining these measures transmission, and compare genotype- failure to account for these uncertain- to estimate adherence using a latent vari- specific transmission risks within couples. ties when designing surveys can lead able model in a Bayesian framework. The The method was applied on the motivat- to imprecise estimates of incidence and approach is applied to data from a trial ing HPV couple cohort data from the underpowered studies. We evaluated of intermittent PrEP use to understand male circumcision (MC) trial in Uganda to our sample size methods in simulations variability in levels and patterns of adher- assess the effect of MC on HPV trans- and found that they performed well in a ence. We show how the methods can mission. Age stratified analysis was also variety of underlying epidemics. Code for also provide insights about the utility of conducted to understand the natural implementing our methods in R is avail- each measure for estimating adherence history of HPV infection and explain the able from the authors upon request. mechanisms through which MC reduced email: [email protected] email: [email protected] HPV detections in men and women. email: [email protected] ANALYSIS OF LONGITUDINAL DEVELOPMENT OF ACCURATE MULTIVARIATE OUTCOME DATA METHODS TO ESTIMATE HIV FROM COUPLES COHORT STUDIES: SAMPLE SIZE METHODS FOR INCIDENCE IN CROSS-SECTIONAL APPLICATION TO HPV TRANSMIS- ESTIMATING HIV INCIDENCE FROM SURVEYS CROSS-SECTIONAL SURVEYS SION DYNAMICS Oliver B. Laeyendecker*, National Xiangrong Kong*, Johns Hopkins Jacob Moss Konikoff*, University Institute of Allergy and Infectious University of California, Los Angeles Diseases, National Institutes of Health HPV is a common STI with 14 known Ron Brookmeyer, University of Accurate methods of estimating HIV oncogenic genotypes causing ano- California, Los Angeles incidence from cross-sectional surveys genital carcinoma. While gender-specific are critical to monitoring the epidemic Understanding HIV incidence, the infections have been well studied, one and determining the population level rate at which new infections occur in remaining uncertainty in HPV epidemiol- impact of prevention efforts. Using 1000s populations, is critical for tracking and ogy is HPV transmission within couples. of samples from individuals with known surveillance of the epidemic. 
In this Understanding transmission in couples duration of infection we combined anti- paper we derive methods for determining however is complicated by the multi- body titer and avidity assays with CD4 sample sizes for cross-sectional surveys plicity of genital HPV genotypes and and viral load testing in a multi assay to estimate incidence with precision and sexual partnership structures that lead algorithm (MAA) where the mean duration to detect changes in incidence from to complex multi-faceted correlations in that individuals appear recently infected two successive cross-sectional surveys data generated from HPV couple cohorts, was 159 days (95% CI 134, 186). We with adequate power. In these surveys including inter-genotype, intra-couple, validated the method in three longitudinal biomarkers such as CD4 cell count, viral and temporal correlations. We devel- cohorts, where the incidence estimated load and recently developed serologi- oped a hybrid modeling approach using using the MAA was nearly identical to the cal assays are used to determine which Markov transition model and composite observed incidence in the cohorts. For individuals are in an early disease stage of infection. The total number of individu- als in this stage, divided by the number

240 ENAR 2015 | Spring Meeting | March 15–18 a low incidence cohort of women at risk this data and for providing momentary analysis approaches that can deal with for HIV infection (HPTN064) the observed interventions. We discuss the design and the data complexity, describe its structure incidence was 0.24% (95% CI 0.07, 0.62) analysis of these types of trials. and its association with health outcomes. compared to 0.26% (95% CI 0.03, 0.95) In particular, I will discuss results for two email: [email protected] using our MAA. In a moderate incidence motivating studies: 1) the association population of individuals from a vaccine between age and the circadian rhythm of preparedness cohort (HIVNET001) the NOT EVERYBODY, BUT SOME activity; and 2) the association between observed incidence was 1.04% (95% PEOPLE MOVE LIKE YOU mental health disorders and activity CI 0.70, 1.55) compared to 1.09% (95% patterns. Ciprian M. Crainiceanu*, Johns Hopkins CI 0.60, 1.84) using our MAA. In a high Bloomberg School of Public Health email: [email protected] incidence cohort of African American MSM (HPTN061), the observed incidence Accelerometers are now used extensively was 3.02% (95% CI 2.01, 4.37) compared in health studies, where they increas- SUPPORTING HEALTH MANAGE- to 3.44% (95% CI 1.75, 6.20) using our ingly replace self-report questionnaires. MENT IN EVERYDAY LIFE WITH MAA. A MAA provides a powerful tool to The sudden success of accelerometers MOBILE TECHNOLOGY estimate population level HIV incidence. in these studies is due to the fact that Predrag Klasnja*, University of Michigan they are cheap, easy to wear, collect email: [email protected] millions of data points at high frequency Susan A. Murphy, University of Michigan (10-100Hz or more), store months worth Ambuj Tewari, University of Michigan 41. Statistical Advances of data, and can be paired with other and Challenges devices, such as heart, gps, or skin in Mobile Health temperature sensors. I will discuss the Mobile phones are becoming an increas- multi-resolution structure of the data and ingly important platform for the delivery will introduce methods for movement rec- of health interventions. Phones have MICRO-RANDOMIZED TRIALS ognition both for in-the-lab and in-the-wild been used to encourage physical activity AND mHEALTH data using second- and sub-second level and healthy diets, to monitor symptoms data. I will introduce movelets, a powerful Peng Liao, University of Michigan of asthma, heart disease, and chemo- dictionary learning approach, designed therapy side effects, and to send patients Pedja Klasjna, University of Michigan for quick identification of movement pat- reminders to take medications and to Ambuj Tewari, University of Michigan terns. At the minute level I will describe attend appointments. In this talk, I will activity intensity measures (activity discuss why mobile phones are particu- Susan Murphy*, University of Michigan counts, vector magnitude, and activity larly well suited for creation of innovative Micro-randomized trials are trials in intensity) and introduce functional data and effective health interventions, I will which individuals are randomized 100’s approaches for characterizing the circa- review our work on using mobile phones or 1000’s of times over the course of the dian rhythm of activity and its association to encourage physical activity, and I will study. The goal of these trials is to assess with health. 
The natural data structure highlight some of the challenges involved the impact of momentary interventions, induced by such observational studies is in developing and evaluating mHealth e.g. interventions that are intended to that of multilevel functional data (activity technologies. impact behavior over small time intervals. intensity measured at every minute for email: [email protected] A fast growing area of mHealth con- multiple days observed within each sub- cerns the use of mobile devices for both ject.) I will introduce fast functional data collecting real-time data, for processing

Program & Abstracts 241 MEASURING STRESS AND ADDIC- 42. CONTRIBUTED PAPERS: QUANTIFYING PARENTAL HISTORY TIVE BEHAVIORS FROM MOBILE Survey Research IN SURVEY DATA PHYSIOLOGICAL SENSORS Rengyi Xu*, University of Pennsylvania Santosh Kumar*, University of Memphis ORDINAL BAYESIAN INSTRUMENT Sara B. DeMauro, University of Emre Ertin, The Ohio State University DEVELOPMENT: NEW KID ON THE Pennsylvania Mustafa al’Absi, University of Minnesota PATIENT REPORTED OUTCOME Rui Feng, University of Pennsylvania MEASURES BLOCK David Epstein, National Institute on Drug Parental history has been identified as Abuse, National Institutes of Health Lili Garrard*, University of Kansas an important risk factor for the incidence Medical Center of many diseases in their offspring. Most Kenzie Preston, National Institute on existing literatures use a binary indicator Drug Abuse, National Institutes of Health Larry R. Price, Texas State University to quantify parental history. However, in Marjorie J. Bott, University of Kansas Annie Umbricht, Johns Hopkins some diseases, such as asthma, parent’s University Byron J. Gajewski, University of Kansas age at onset of the disease might increase Recent advances in the sensing and Medical Center children’s risk. Therefore, an estimator computational capacity of mobile devices Traditional instrument development is that incorporates parent’s age at onset have opened up enormous opportunities often challenged by psychometric difficul- is desirable. When the data are collected to improve patients’ health and well-being. ties when the target audience represents from national household surveys, the They can quantify dynamic changes in small populations or rare diseases. We complex survey sampling design needs to an individual’s health state as well as key propose an innovative Ordinal Bayesian be taken into consideration. In this study, physical, biological, behavioral, social, Instrument Development (OBID) method we develop a continuous standardized and environmental factors that contrib- that seamlessly integrates expert and score, the so-called log-rank risk score, ute to health and disease risk, anytime participant data in a Bayesian factor to quantify parental history that incor- and anywhere. Such real-time monitor- analysis framework, while utilizing fewer porates both the occurrence of disease ing can accelerate health research and subjects than classical approaches and and the age at onset for survey data. The optimize care delivery, e.g., via just-in-time maintaining coherent validity evidence. proposed method is evaluated using personalized interventions. In this talk, I When the instrument consists of all the third National Health and Nutrition will present a computational model for ordinal items, the ordinal factor analysis Examination Survey data to examine the automatically detecting stress, smoking, model is equivalent to a two parameter separate effects of maternal and paternal and cocaine use from mobile physiologi- item response theory (IRT) model with history on the onset of asthma in children cal sensors in the AutoSense suite. a probit link. Prior distributions obtained and to evaluate the relationship between age of asthma onset in parents and risk email: [email protected] from expert data are imposed on the IRT parameters and are updated with par- of asthma in their children. Using our ticipants’ data. 
Simulation data are used new risk scores leads to smaller standard to demonstrate the efficiency of OBID errors and thus more precise estimates by comparing its performance to classi- than using a binary indicator. Our results cal instrument development with exact also show children whose mother has an estimate procedures. earlier age at onset have an increased risk. email: [email protected] email: [email protected]

242 ENAR 2015 | Spring Meeting | March 15–18 BAYESIAN NONPARAMETRIC HOW TO BEST COMPUTE PRO- weights. An application of this method WEIGHTED SAMPLING INFERENCE PENSITY SCORES IN COMPLEX was demonstrated using the NHANES SAMPLES IN RELATION TO dataset in assessing racial differences Yajuan Si*, University of Wisconsin, SURVEY WEIGHTS in prostate cancer screening. Madison Keith W. Zirkle*, Virginia Commonwealth email: [email protected] Natesh S. Pillai, Harvard University University Andrew Gelman, Columbia University Adam P. Sima, Virginia Commonwealth MULTIPLE IMPUTATION OF THE It has historically been a challenge to University ACCELEROMETER DATA IN THE perform Bayesian inference in a design- Survey data are usually not collected NATIONAL HEALTH AND NUTRITION based survey context. The present via simple random sampling so that EXAMINATION SURVEY paper develops a Bayesian model for inferences upon an intended popula- sampling inference in the presence of Benmei Liu*, National Cancer Institute, tion require the use of survey weights. inverse-probability weights. We use National Institutes of Health However, similar to any observational a hierarchical approach in which we study, confounding from a number of Mandi Yu, National Cancer Institute, model the distribution of the weights of covariates can bias the results if not National Institutes of Health the nonsampled units in the population properly taken into account. Propensity and simultaneously include them as Barry I. Graubard, National Cancer score weighting is a popular method to predictors in a nonparametric Gaussian Institute, National Institutes of Health address confounding in a sample. There process regression. We use simulation Richard Troiano, National Cancer is considerable interest in how to best studies to evaluate the performance of Institute, National Institutes of Health combine survey weight and propen- our procedure and compare it to the clas- sity score information when analyzing Nathaniel Schenker, National Center sical design-based estimator. We apply data. The literature suggests that when for Health Statistics, Centers for Disease our method to the Fragile Family Child computing propensity scores within a Control and Prevention Wellbeing Study. Our studies find the complex sample, survey weights should Bayesian nonparametric finite popula- The Physical Activity Monitor (PAM) be treated as an effect that contributes to tion estimator to be more robust than the component was introduced to the the propensity score. Alternatively, survey classical design-based estimator without 2003-2004 National Health and Nutrition weights could be used in their proper loss in efficiency, which works because Examination Survey (NHANES) to collect purpose when calculating the propensity we induce regularization for small cells objective information on physical activ- scores. This study assesses the optimal and thus this is a way of automatically ity including both movement intensity use for survey weights in the calculation smoothing the highly variable weights. counts and steps. Due to an error in of propensity scores when the goal of the accelerometer device initialization email: [email protected] the research is to draw inferences upon process, the steps data was missing the intended population. 
A Monte Carlo for all participants in several primary simulation showed that the proper use sampling units (PSUs), typically a single of the survey weights never performed county or group of contiguous counties, worse than the methods recommended who had intensity count data from their by the literature and outperformed these accelerometers. To avoid potential bias methods when there was a strong level and loss in efficiency in estimation and of confounding and a strong contribution of confounding variables to the survey

Program & Abstracts 243 inference involving the steps data, we in the longitudinal setting. Our findings conceptually simple and computationally considered methods to accurately impute suggest that the optimal design depends fast, and more importantly has appeal- the missing values for steps collected in on the data structure, specifically on ing large-sample properties in both the 2003-2004 NHANES. We proposed a both the within-wave and between-wave estimation and inference. This new meth- multiple imputation approach based on variable correlations, and there exists odology is evaluated through simulation a semi-parametric regression technique a trade-off between the complexity and studies and illustrated by a motivating where the Alternating Conditional Expec- robustness of the design. These factors example of neuroimaging data about an tation model is utilized to improve the should be taken into account when con- association study of iron deficiency on imputation accuracy. This paper describes structing a longitudinal study with planned infant’s auditory recognition memory. the approaches used in this imputation missing data. email: [email protected] and evaluates the methods by comparing email: [email protected] the distributions of the original and the imputed data. A simulation study using the INTEGRATIVE ANALYSIS OF observed data is also conducted as part of 43. CONTRIBUTED PAPERS: GENETICAL GENOMICS DATA the model diagnostics. Finally some real Graphical Models INCORPORATING NETWORK data analyses are performed to compare STRUCTURE the before and after imputation results. Bin Gao*, Michigan State University REGRESSION ANALYSIS email: [email protected] OF NETWORKED DATA Yuehua Cui, Michigan State University Yan Zhou*, University of Michigan Genetical genomics analysis provides SPLIT QUESTIONNAIRE SURVEY promising opportunities to infer gene Peter X. K. Song, University of Michigan DESIGN IN THE LONGITUDINAL regulations and predict phenotypic SETTING This paper concerns the development of variation by combining genetic vari- a new regression analysis methodology Paul M. Imbriano*, University ants, gene expressions, and phenotype to assess relationships between multi- of Michigan data together. In this work, we use dimensional response variables and gene expressions to predict phenotypic Trivellore E. Raghunathan, University covariates that are correlated through response while considering the graphi- of Michigan networks. To address analytic challenges cal structures on gene networks. Given Advancements in survey administration pertaining to the integration of network that gene expressions are intermediate methodology and multiple imputation topology into the regression analysis, we phenotypes between a trait and genetic software now make it possible for planned propose a method of hybrid quadratic variants, we follow the instrumental vari- missing data designs to be implemented inference functions (HQIF) that utilizes able regression framework proposed by for improving data quality through the both prior and data-driven correlations Lin et al. (2014) and treat genetic variants reduction in survey length. Many papers among network nodes into statistical as instrumental variables to improve the have discussed implementing a cross estimation and inference. Moreover, prediction. 
In addition, we adopt a covari- sectional study with planned missing data a Godambe information based tuning ate-adjusted graph learning approach using a split-questionnaire design, but the strategy is proposed to allocate weights to improve the graphical structure of research in applying these methods to a between the prior and data-driven gene expression network. We propose a longitudinal study has been limited. Using pieces of network knowledge, so that 2-step estimation procedure. In step 1, simulations and data from the Health and the resulting estimation achieves desir- we apply the glasso algorithm to learn Retirement Study (HRS), we compared able efficiency. The proposed method is graphical structures on gene expressions the performance of several methods for while adjusting for the effects of genetic administering a split-questionnaire design variants. In step 2, we use the predicted

244 ENAR 2015 | Spring Meeting | March 15–18 expressions obtained from the first step ESTIMATION OF DIRECTED LONGITUDINAL GRAPHICAL as predictors while adopting a network SUBNETWORKS IN ULTRA HIGH MODELS: OPTIMAL ESTIMATION constrained regularization method based DIMENSIONAL DATA FOR GENE AND ASYMPTOTIC INFERENCE on the updated graphical structures NETWORK PROBLEM Quanquan Gu*, Princeton University obtained from the first step to obtain bet- Sung Won Han*, New York University ter coefficients estimates. We establish Yuan Cao, Princeton University Hua (Judy) Zhong, New York University the selection and estimation consistency Yang Ning, Princeton University of the 2-step estimation procedure. The Pathway analysis or gene-gene interac- Han Liu, Princeton University utility of the method is demonstrated with tion study is very important to find the extensive simulations and application to a genome-wide mechanism of the cancer In this paper, we propose a new semi- mouse obesity dataset. development. The directed acyclic graph parametric graphical model, namely Longitudinal Graphical Model, for con- email: [email protected] is a commonly used model to estimate a network with gene regulatory. However, tinuous longitudinal data with irregular due to the advent and the diminish- followup. It models the joint distribution ESTIMATING A GRAPHICAL INTRA- ing cost of high-throughput data, ultra of $d$ covariates at $T$ time stamps, CLASS CORRELATION COEFFICIENT high dimensional datasets are currently that are normally distributed with time (GICC) USING MULTIVARIATE generated. The estimation of DAGs is a varying mean and a common covari- PROBIT-LINEAR MIXED MODELS NP-hard problem, and the computational ance matrix. Our goal is to estimate the common covariance structure shared Chen Yue*, Johns Hopkins University time is large even in middle-sized data. The estimation of whole gene network by different time stamps. We present a Shaojie Chen, Johns Hopkins University based on gene expressions such as novel parameter estimation algorithm based on conditioning and QR decom- Haris Sair, Johns Hopkins University mRNAs is the problem of estimating DAGs with ultra high dimensional data. position, which treats the time varying Raag Airan, Johns Hopkins University Thus, it is computationally infeasible mean vectors as nuisance parameters. Brian Caffo, Johns Hopkins University with reasonable time to estimate causal The proposed graphical model includes Gaussian graphical model as a special Data reproducibility is a critical issue in relationship by using existing well-known example. Under certain mild assump- all scientific experiments. In this manu- methods. In most problem, they estimate tions, we establish the parameter script, the problem of quantifying the the network with a few gene expres- estimation error bounds in terms of reproducibility of graphical measurements sions, which known as a group with an different norms, which are of optimal is considered. The concept of image important role. Thus, finding directed rate. Furthermore, we study hypothesis intra-class correlation coefficient (I2C2) is interactions within a group of similar func- test of the sparse precision matrix in the generalized and the concept of the graphi- tional genes is a reasonable approach if proposed graphical model. We propose cal intra-class correlation coeffiecient finding entire directed interactions is not score, Wald and likelihood ratio tests for (GICC) is proposed for such purpose. The feasible. 
ESTIMATING A GRAPHICAL INTRA-CLASS CORRELATION COEFFICIENT (GICC) USING MULTIVARIATE PROBIT-LINEAR MIXED MODELS
Chen Yue*, Johns Hopkins University
Shaojie Chen, Johns Hopkins University
Haris Sair, Johns Hopkins University
Raag Airan, Johns Hopkins University
Brian Caffo, Johns Hopkins University

Data reproducibility is a critical issue in all scientific experiments. In this manuscript, the problem of quantifying the reproducibility of graphical measurements is considered. The concept of the image intra-class correlation coefficient (I2C2) is generalized and the graphical intra-class correlation coefficient (GICC) is proposed for this purpose. The GICC is based on multivariate probit-linear mixed effect models. A Markov Chain EM (MCEM) algorithm is used for estimating the GICC. Simulation results with varied settings are demonstrated and our method is applied to the KIRBY21 test-retest dataset.

email: [email protected]

ESTIMATION OF DIRECTED SUBNETWORKS IN ULTRA HIGH DIMENSIONAL DATA FOR GENE NETWORK PROBLEM
Sung Won Han*, New York University
Hua (Judy) Zhong, New York University

Pathway analysis, or the study of gene-gene interactions, is important for uncovering the genome-wide mechanisms of cancer development. The directed acyclic graph (DAG) is a commonly used model for estimating a gene regulatory network. However, with the advent and the diminishing cost of high-throughput technologies, ultra high dimensional datasets are now routinely generated. The estimation of DAGs is an NP-hard problem, and the computational time is large even for moderately sized data. Estimating the whole gene network from gene expressions such as mRNAs is therefore a problem of estimating DAGs with ultra high dimensional data, and it is computationally infeasible to estimate causal relationships in reasonable time using existing well-known methods. In most problems, the network is instead estimated for a small set of gene expressions known to form a group with an important role. Thus, finding directed interactions within a group of similar functional genes is a reasonable approach when finding all directed interactions is not feasible. In this presentation, we discuss how to estimate directed subnetworks in ultra high dimensional data by using some techniques used in social network problems.

email: [email protected]

LONGITUDINAL GRAPHICAL MODELS: OPTIMAL ESTIMATION AND ASYMPTOTIC INFERENCE
Quanquan Gu*, Princeton University
Yuan Cao, Princeton University
Yang Ning, Princeton University
Han Liu, Princeton University

In this paper, we propose a new semiparametric graphical model, namely the Longitudinal Graphical Model, for continuous longitudinal data with irregular followup. It models the joint distribution of $d$ covariates at $T$ time stamps, which are normally distributed with time varying means and a common covariance matrix. Our goal is to estimate the common covariance structure shared by the different time stamps. We present a novel parameter estimation algorithm based on conditioning and QR decomposition, which treats the time varying mean vectors as nuisance parameters. The proposed graphical model includes the Gaussian graphical model as a special example. Under certain mild assumptions, we establish parameter estimation error bounds in terms of different norms, which are of optimal rate. Furthermore, we study hypothesis testing for the sparse precision matrix in the proposed graphical model. We propose score, Wald and likelihood ratio tests for elements of the sparse precision matrix, and the asymptotic properties of these hypothesis tests are analyzed. In addition, we study a multiple hypothesis test of the sparse precision matrix in the proposed graphical model. It is based on the multiplier bootstrap technique and is able to test multiple elements (i.e., a subgraph) of the precision matrix. The asymptotic property of the multiple hypothesis test is also established. We verify these appealing theoretical properties of the proposed graphical model through both simulations on synthetic datasets and a real-world microarray dataset.

email: [email protected]

JOINTLY ESTIMATING GAUSSIAN GRAPHICAL MODELS FOR SPATIAL AND TEMPORAL DATA
Zhixiang Lin*, Yale University
Tao Wang, Yale University
Can Yang, Hong Kong Baptist University
Hongyu Zhao, Yale University

In this paper, we first propose a Bayesian neighborhood selection procedure to estimate Gaussian Graphical Models (GGMs). Our procedure is then extended to jointly estimate GGMs in multiple groups of data with complex structure, including spatial data, temporal data and data with both spatial and temporal structures. In our approach, Markov random field models are used to efficiently utilize the information embedded in the spatial and temporal structure. For the estimation of a single GGM, we show the graph selection consistency of the proposed method in the sense that the posterior probability of the true model converges to one. We develop and implement an efficient algorithm for statistical inference. For 1,000 iterations of Gibbs sampling, the computational time is about 30 seconds for one graph with 100 nodes. Simulation studies suggest that our approach achieves better accuracy in network estimation compared with models not incorporating spatial and temporal dependency. Finally, we illustrate our method on the human brain gene expression microarray dataset, where the expression levels of genes are measured in different brain regions across multiple time periods.

email: [email protected]
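For readers unfamiliar with the term, "neighborhood selection" can be illustrated in its simplest non-Bayesian form: regress each node on all the others with an l1 penalty and take the nonzero coefficients as that node's neighbors. The sketch below shows only this basic idea on simulated data; the Bayesian procedure, the Markov random field extension and the joint estimation across groups described above are not reproduced, and the penalty level is arbitrary.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 300, 20
X = rng.normal(size=(n, p))
X[:, 1] += 0.8 * X[:, 0]                     # plant one true edge, (0, 1)

edges = set()
for j in range(p):
    others = np.delete(np.arange(p), j)
    fit = Lasso(alpha=0.1).fit(X[:, others], X[:, j])
    for k, coef in zip(others, fit.coef_):
        if abs(coef) > 1e-6:
            edges.add(tuple(sorted((j, int(k)))))  # symmetrize with an "OR" rule

print(sorted(edges))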
44. CONTRIBUTED PAPERS: Joint Models for Longitudinal and Survival Data

JOINT MODELING OF BIVARIATE LONGITUDINAL AND BIVARIATE SURVIVAL DATA IN SPOUSE PAIRS
Jia-Yuh Chen*, University of Pittsburgh
Stewart J. Anderson, University of Pittsburgh

Joint modeling of longitudinal and survival data has become increasingly useful for analyzing clinical trials data. Recent multivariate joint models relate one or more longitudinal outcomes to one or more failure times (e.g., competing risks) in the same subject. We consider a case where longitudinal and survival outcomes are measured in subject pairs (e.g., married couples). We propose a bivariate joint model incorporating within-pair correlations, both in the longitudinal and survival processes. We use a bivariate linear mixed-effects model for the longitudinal process, where the random effects are used to model the temporal correlation among longitudinal outcomes and the correlation between different outcomes. For the survival process, we use a gamma frailty in a Weibull model to account for the correlation between survival times within pairs. The sub-models are then linked through shared, latent random effects, where the longitudinal and survival processes are conditionally independent given the random effects. Parameter estimates are obtained by maximizing the joint likelihood for the bivariate longitudinal and bivariate survival data using the EM algorithm.

email: [email protected]

JOINT MODEL OF BIVARIATE SURVIVAL TIMES AND LONGITUDINAL DATA
Ke Liu*, University of Iowa
Ying Zhang, University of Iowa

Motivated by a study of muscular dystrophy in MD STARnet, a joint model of bivariate survival times and longitudinal data is developed. We propose to analyze correlated bivariate survival responses associated with a longitudinal biomarker in the frequentist paradigm. A gamma frailty variable is used to account for the correlation between the two correlated survival outcomes, in addition to the random variables that account for the correlation between the survival times and the longitudinal marker. The EM algorithm is adopted to compute the maximum profile likelihood estimate. The bootstrap method is applied to estimate the standard errors of the estimated model parameters. A simulation study is conducted to demonstrate the validity of the proposed methodology. Finally, the method is applied to MD STARnet data for illustration.

email: [email protected]

THE JOINT MODELLING OF RECURRENT EVENTS AND OTHER FAILURE TIME EVENTS
Luojun Wang*, The Pennsylvania State University
Vernon M. Chinchilli, The Pennsylvania State University

In many biomedical studies and clinical trials, recurrent events are commonly encountered, indicating progression in treatment or disease. When recurrent events are correlated with another failure event, such as death, we should no longer assume an independent censoring mechanism for the failure event. Huang and Wang (2004) proposed a joint model of a recurrent event process and a failure time in which a common, subject-specific, latent variable is used to model the association between the intensity of the recurrent event process and the hazard function of the failure time. However, in this setting, the correlation between the number of recurrent events occurring before the failure time or censoring time needs to be positive. Another model to consider is to construct a Farlie-Gumbel-Morgenstern (FGM) bivariate density function for the recurrent events and the failure time, in which the correlation between the recurrent events and the failure time or censoring time could be either positive or negative. The drawback to the FGM bivariate density is that it can only accommodate a weak level of correlation, i.e., the correlation cannot approach the boundaries of -1 or +1, as desired. In this work, we propose an alternative to the FGM bivariate density for the recurrent events and the failure time that can account for a stronger correlation. We illustrate the model and analysis using data in which the recurrent event is the occurrence of acute kidney injury (AKI) and the failure event is death.

email: [email protected]
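For context on the limitation noted above, the Farlie-Gumbel-Morgenstern family has copula and density
$C_\theta(u,v) = uv\{1 + \theta(1-u)(1-v)\}$, $c_\theta(u,v) = 1 + \theta(1-2u)(1-2v)$, with $\theta \in [-1, 1]$,
and its dependence range is narrow: Spearman's rho is $\rho_S = \theta/3$ and Kendall's tau is $\tau = 2\theta/9$, so $|\rho_S| \le 1/3$ and $|\tau| \le 2/9$ no matter how $\theta$ is chosen. These are standard properties of the FGM family, recalled here only to make precise why it "can accommodate a weak level of correlation"; the authors' proposed alternative density is not reproduced.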
DYNAMIC PREDICTION OF ACUTE GRAFT-VERSUS-HOST DISEASE WITH TIME-DEPENDENT COVARIATES
Yumeng Li*, University of Michigan
Thomas M. Braun, University of Michigan

Acute Graft-versus-Host Disease (aGVHD) is a side-effect of hematopoietic cell transplantation (HCT) and is a leading cause of death in patients receiving HCTs. Thus, investigators would like to have models that accurately predict those most likely to suffer from aGVHD in order to minimize over-treatment of patients as well as reduce mortality. To this end, we propose using biomarkers (that are collected weekly) to predict future biomarker values and the time-to-aGVHD through both joint modeling and multi-state model methods. We consider settings in which the biomarkers are continuous or binary (above or below a threshold), and aGVHD is treated as binary or as a time-to-event (and possibly censored) outcome. We present simulation results for various models using settings based upon actual data collected at the University of Michigan Blood and Marrow Transplant Program.

email: [email protected]

A BAYESIAN APPROACH FOR JOINT MODELING OF LONGITUDINAL MENSTRUAL CYCLE LENGTH AND FECUNDITY
Kirsten J. Lum*, Johns Hopkins University and Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Rajeshwari Sundaram, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Germaine M. Buck Louis, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Thomas A. Louis, Johns Hopkins University and U.S. Census Bureau

Female menstrual cycle length is thought to play an important role in couple fecundity, or the biologic capacity for reproduction irrespective of pregnancy intentions. A complete assessment of the association between menstrual cycle length and fecundity requires a model that accounts for multiple risk factors (both male and female) and the couple's intercourse pattern relative to ovulation. We develop a Bayesian joint model consisting of a mixed effects accelerated failure time model for longitudinal menstrual cycle lengths and a hierarchical model for the conditional probability of pregnancy in a menstrual cycle given no pregnancy in previous cycles of trying, in which we include covariates for the male and the female and a flexible spline function of intercourse timing. Using our joint modeling approach to analyze data from the Longitudinal Investigation of Fertility and the Environment Study, a couple-based prospective pregnancy study, we found a significant quadratic relation between menstrual cycle length and the probability of pregnancy even with adjustment for other risk factors, including male semen quality, age, and smoking status.

email: [email protected]
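One way to make the "conditional probability of pregnancy given no pregnancy in previous cycles" concrete is the standard discrete-time survival identity. If $p_j(\mathbf{x})$ denotes the cycle-$j$ conditional probability of pregnancy for a couple with covariates $\mathbf{x}$ (in the model above, a function of cycle length, intercourse timing, and male and female covariates), then the time to pregnancy $T$ satisfies
$P(T = t \mid \mathbf{x}) = p_t(\mathbf{x}) \prod_{j=1}^{t-1} \{1 - p_j(\mathbf{x})\}$.
The identity is generic; the accelerated failure time and spline components of the authors' joint model are not reproduced here.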
JOINT ANALYSIS OF MULTIPLE LONGITUDINAL PROCESSES AND SURVIVAL DATA MEASURED ON NESTED TIME-SCALES USING SHARED PARAMETER MODELS: AN APPLICATION TO FECUNDITY DATA
Rajeshwari Sundaram*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Somak Chatterjee, George Washington University

Fecundity is defined as the biologic potential of men and women for reproduction, and is often measured by estimating the probability of pregnancy in each menstrual cycle among couples having regular unprotected intercourse. Estimating fecundity is challenging, in part, given the effect that varying patterns of sexual intercourse may have on the length of the pregnancy attempt. Clinical guidance is sometimes sought to aid couples in timing intercourse acts around ovulation to minimize the time needed to achieve pregnancy. Empirical evidence delineating the timing of intercourse relative to ovulation is scarce, resulting in a generalized clinical recommendation to have intercourse every other day (Practice Committee of the American Society for Reproductive Medicine, 2013). Understanding the relation between fecundity, intercourse behavior and other relevant covariates is increasingly relevant given population level changes in the sociodemographic characteristics of reproductive aged couples, such as an increase in age at first pregnancy. This may be associated with reduced intercourse activity, longer time-to-pregnancy, an increased prevalence of infertility, or a combination of all these factors. Our main objective is to jointly model intercourse behavior, a binary longitudinal process (measured at the day level), a menstrual cycle characteristic (measured at the monthly level), and time to pregnancy (TTP), a survival outcome (on the monthly timescale), with a view towards prediction of both longitudinal processes on differing timescales and time to pregnancy. This is achieved using an empirical Bayes approach for joint modeling of multivariate longitudinal processes and time to event.

email: [email protected]

45. CONTRIBUTED PAPERS: Functional Data Analysis

GENERALIZED MULTILEVEL FUNCTION-ON-SCALAR REGRESSION AND PRINCIPAL COMPONENT ANALYSIS
Jeff Goldsmith*, Columbia University
Vadim Zipunnikov, Johns Hopkins University
Jennifer Schrack, Johns Hopkins University

We consider regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly six hundred subjects over five days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in the probability of being active over a twenty-four hour period.

email: [email protected]

INFERENCE ON FIXED EFFECTS IN COMPLEX FUNCTIONAL MIXED MODELS
So Young Park*, North Carolina State University
Ana-Maria Staicu, North Carolina State University
Luo Xiao, Johns Hopkins Bloomberg School of Public Health
Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health

We discuss statistical inference in regression models involving complex-correlated functional responses and scalar or vector covariates. Current inferential procedures are developed for independently sampled functional responses and are not directly applicable to functional data that are correlated because of a longitudinal or spatial design, for example. We use bootstrap methodology to construct confidence bands for the covariate effects on the mean response. Additionally, we introduce a procedure for testing a scalar covariate effect. Our methods are illustrated in a thorough simulation experiment and on the motivating application, the Baltimore Longitudinal Study on Aging (BLSA), where daily physical activity is recorded repeatedly for several hundred subjects of various ages.

email: [email protected]

GENERALIZED FUNCTION-ON-FUNCTION REGRESSION
Janet S. Kim*, North Carolina State University
Ana-Maria Staicu, North Carolina State University
Arnab Maity, North Carolina State University

We consider non-linear regression models for functional responses and functional predictors observed on possibly different domains. We introduce a flexible model that relates the value of the response at a particular time point to the covariate over its entire domain as well as to the time point of the response. There are two innovations in this paper. First, we develop an inferential procedure that reduces the dimension of the model parameters by orthogonal projection of the functional response. Second, the proposed method accommodates realistic settings such as a correlated error structure as well as sparse and/or irregular designs. We investigate our methodology in finite sample sizes through simulations and real data applications.

email: [email protected]
VARIABLE SELECTION IN FUNCTION-ON-SCALAR REGRESSION
Yakuan Chen*, Columbia University
Todd Ogden, Columbia University
Jeff Goldsmith, Columbia University

The problem of variable selection often arises in the context of models with functional responses and scalar predictors. In comparison with traditional regression models, this setting is complicated by the dimensionality of the response and coefficient curves and by the correlation structure of the residuals. By expanding the coefficient functions using a B-spline basis, we pose the function-on-scalar model as a multivariate multiple regression problem. Spline coefficients are grouped within coefficient function, and the group minimax concave penalty (MCP) is used for variable selection. We adapt techniques from generalized least squares to account for residual covariance by "pre-whitening" using an estimate of the covariance matrix, and develop an iterative algorithm that alternately updates the spline coefficients and the covariance. Simulation results indicate that this iterative algorithm often performs as well as pre-whitening using the true covariance. We apply our method to two-dimensional planar reaching motions in a study of the effects of stroke severity on motor control, and find that our method provides lower prediction errors than competing methods.

email: [email protected]
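A bare-bones version of the basis-expansion and pre-whitening steps described above is sketched below: expand each coefficient function in a B-spline basis, estimate the spline coefficients by generalized least squares given a working residual covariance, and alternate the two updates. The group-MCP selection step is omitted, and the basis dimension, toy data and number of iterations are illustrative assumptions, not the authors' choices.

import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
n, T, q, K, k = 100, 50, 3, 8, 3             # subjects, grid points, scalars, basis size, degree
tgrid = np.linspace(0, 1, T)
knots = np.concatenate(([0.0]*k, np.linspace(0, 1, K - k + 1), [1.0]*k))
B = np.column_stack([BSpline(knots, np.eye(K)[j], k)(tgrid) for j in range(K)])  # T x K basis

X = rng.normal(size=(n, q))
true_beta = np.outer(np.array([1.0, -0.5, 0.0]), np.sin(2*np.pi*tgrid))          # q x T
Y = X @ true_beta + rng.normal(scale=0.5, size=(n, T))

Sigma = np.eye(T)
for _ in range(5):                            # alternate coefficients and covariance
    Si = np.linalg.inv(Sigma)
    C = np.linalg.inv(X.T @ X) @ X.T @ Y @ Si @ B @ np.linalg.inv(B.T @ Si @ B)  # q x K
    resid = Y - X @ C @ B.T
    Sigma = resid.T @ resid / n + 1e-6*np.eye(T)   # working residual covariance

beta_hat = C @ B.T                            # fitted coefficient functions, q x T
print(np.round(beta_hat[:, :5], 2))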

BAYESIAN ADAPTIVE FUNCTIONAL MODELS WITH APPLICATIONS TO COPY NUMBER DATA
Bruce D. Bugbee*, University of Texas MD Anderson Cancer Center
Veera Baladandayuthapani, University of Texas MD Anderson Cancer Center
Jeffrey S. Morris, University of Texas MD Anderson Cancer Center

We present a Bayesian framework for the analysis of functional covariates in a regression context. This is done with both global and spatially-adaptive regularization schemes for the population level weight function. To accomplish this we develop both MCMC and variational Bayes approximation methods. In addition to highlighting the need for these models, we showcase the usefulness of variational approximations for exploring complex, high-dimensional data. Finally, we present investigations of a motivating example based on exploring relationships between copy number aberrations and leukemia.

email: [email protected]

FUNCTIONAL BILINEAR REGRESSION WITH MATRIX COVARIATES VIA REPRODUCING KERNEL HILBERT SPACE WITH APPLICATIONS IN NEUROIMAGING DATA ANALYSIS
Dong Wang, University of North Carolina, Chapel Hill
Dan Yang*, Rutgers University
Haipeng Shen, University of North Carolina, Chapel Hill
Hongtu Zhu, University of North Carolina, Chapel Hill

Traditional functional linear regression usually takes a one-dimensional functional predictor as input and estimates a continuous coefficient function. Modern applications often generate two-dimensional covariates, which when observed at grid points are matrices. To avoid the inefficiency of the classical method involving estimation of a two-dimensional coefficient function, we propose a bilinear regression model and obtain estimates via a smoothness regularization method. The proposed estimator exhibits a minimax optimal property for prediction under the framework of Reproducing Kernel Hilbert Space. The merits of the method are further demonstrated by numerical experiments and an application to real MRI imaging data.

email: [email protected]
SIMULTANEOUS CONFIDENCE BANDS FOR DERIVATIVES OF DEPENDENT FUNCTIONAL DATA
Guanqun Cao*, Auburn University

In this work, consistent estimators and simultaneous confidence bands for the derivatives of mean functions are proposed when curves are repeatedly recorded for each subject. The within-curve correlation of trajectories is accounted for, while the proposed novel confidence bands still enjoy semiparametric efficiency. The proposed methods lead to a straightforward extension to the two-sample case, in which we compare the derivatives of mean functions from two populations. We demonstrate in simulations that the proposed confidence bands are superior to existing approaches which ignore the within-curve dependence. The proposed methods are applied to investigate the derivatives of mortality rates from period lifetables that are repeatedly collected over many years for various countries.

email: [email protected]
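The generic bootstrap construction behind a simultaneous band for the derivative of a mean curve is sketched below: whole subjects are resampled (so within-curve dependence travels with each curve), the maximal deviation of the resampled derivative estimate is recorded, and its 95th percentile gives the half-width of the band. The smoother, grid and number of replicates are illustrative assumptions, and the efficiency and two-sample refinements described above are not reproduced.

import numpy as np

rng = np.random.default_rng(3)
n, T = 80, 101
t = np.linspace(0, 1, T)
Y = np.sin(2*np.pi*t) + rng.normal(scale=0.3, size=(n, T))    # noisy repeated curves

def deriv_of_mean(curves):
    mu = curves.mean(axis=0)
    mu_smooth = np.convolve(mu, np.ones(7)/7, mode="same")    # crude moving-average smoother
    return np.gradient(mu_smooth, t)

est = deriv_of_mean(Y)
boot_max = []
for _ in range(500):
    idx = rng.integers(0, n, n)                               # resample whole subjects
    boot_max.append(np.max(np.abs(deriv_of_mean(Y[idx]) - est)))
half_width = np.quantile(boot_max, 0.95)
band = (est - half_width, est + half_width)
print(round(float(half_width), 3))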

46. CONTRIBUTED PAPERS: Methods in Causal Inference: Instrumental Variable, Propensity Scores and Matching

METHODS TO OVERCOME VIOLATIONS OF AN INSTRUMENTAL VARIABLE ASSUMPTION: CONVERTING A CONFOUNDER INTO AN INSTRUMENT
Michelle Shardell*, National Institute on Aging, National Institutes of Health

Instrumental variable (IV) methods are a powerful tool for consistently estimating causal effects in the presence of unmeasured confounding. However, the validity of IV methods relies on strong assumptions, some of which cannot be conclusively empirically verified. One such assumption is that the effect of the proposed instrument on the outcome is completely mediated by the exposure of interest. We consider the situation where this assumption is violated, but a weaker assumption holds in which the effect of the proposed instrument on the outcome is completely mediated by measured variables, including the exposure of interest. In this case, the proposed instrument is actually a confounder. We review some conventional IV methods and propose easy-to-use adaptations of these methods for use when the usual IV assumption is violated, but the weaker assumption holds. The proposed methods involve first "converting" the confounder into an IV, then applying conventional IV methods. Potential applications of the proposed methods in epidemiology include studies where the exposure and outcome are known to exhibit seasonal variation, and Mendelian randomization studies with genetic variants that are known to affect multiple phenotypes that may affect the outcome.

email: [email protected]

ASSESSING TREATMENT EFFECT OF THIOPURINES ON CROHN'S DISEASE FROM A UK POPULATION-BASED STUDY USING PROPENSITY SCORE MATCHING
Laura H. Gunn*, Stetson University
Sukhdev Chatu, St. George's University Hospital London
Sonia Saxena, Imperial College London
Azeem Majeed, Imperial College London
Richard Pollok, St. George's University Hospital London

Randomized controlled trials (RCTs) are a 'gold standard' for estimating minimally unbiased treatment effects on health outcomes; however, RCTs are not always feasible, and population-based observational studies can be more appropriate. The Clinical Practice Research Datalink (CPRD) contains clinical and prescribing data for over 13 million patients in the United Kingdom; participating primary care practices are subject to regular audit to ensure data accuracy and completeness, making epidemiological studies of these data feasible. Since RCTs evaluating the impact of Thiopurine treatment on Crohn's disease patients are not practical, we used CPRD data to identify 5,640 patients with incident Crohn's disease diagnosed between 1989 and 2005, with at least an additional 5-year follow-up to 2010. Propensity score matching (PSM) is used to reduce bias in estimates of treatment effects that results from confounding between baseline factors and exposure group status. This presentation describes the PSM process, and applies optimal PSM, with a sensitivity analysis implementing additional matching techniques, using data collected from this nationally representative UK population-based study, where the impact of duration and timing of Thiopurine treatment on the likelihood of surgery is assessed using a Cox proportional hazards model and PSM.

email: [email protected]
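For readers less familiar with the propensity score matching step referred to above, a minimal version is sketched below: a logistic propensity model, greedy 1:1 nearest-neighbor matching on the logit with a caliper, and a simple matched-pair comparison. The toy data, the 0.2-standard-deviation caliper and the variable names are illustrative assumptions; optimal matching and the Cox model used in the CPRD analysis are not shown.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 4))                          # baseline covariates
treat = rng.binomial(1, 1/(1 + np.exp(-X[:, 0])))    # confounded treatment assignment
y = 1.0*treat + X[:, 0] + rng.normal(size=n)         # outcome affected by the confounder

ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
logit = np.log(ps/(1 - ps))
caliper = 0.2 * logit.std()

controls = set(np.where(treat == 0)[0])
pairs = []
for i in np.where(treat == 1)[0]:                    # greedy nearest neighbor
    if not controls:
        break
    j = min(controls, key=lambda c: abs(logit[i] - logit[c]))
    if abs(logit[i] - logit[j]) <= caliper:
        pairs.append((i, j))
        controls.remove(j)

att = np.mean([y[i] - y[j] for i, j in pairs])       # matched estimate of effect on treated
print(len(pairs), round(att, 2))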
SEMIPARAMETRIC CAUSAL INFERENCE IN MATCHED COHORT STUDIES
Edward H. Kennedy*, University of Pennsylvania
Dylan S. Small, University of Pennsylvania

Famously, odds ratios can be estimated in case-control studies using standard logistic regression, ignoring the outcome-dependent sampling. In this paper we prove an analogous result for treatment effects on the treated in cohort studies. Specifically, in studies where a sample of treated subjects is observed along with a separate sample of possibly matched controls, we show that efficient and doubly robust estimators of effects on the treated are computationally equivalent to standard estimators, which ignore the matching and exposure-based sampling. This is not the case for general average effects. With respect to issues of efficiency and study design, we show that matched cohort studies are often more efficient than random sampling for estimating effects on the treated, and we derive the optimal number of matches in such studies for a given set of matching variables. We illustrate our results via simulation and in an evaluation of the National Supported Work training program.

email: [email protected]

REVISITING THE COMPARISON OF COVARIATE ADJUSTED LOGISTIC REGRESSION VERSUS PROPENSITY SCORE METHODS WITH FEW EVENTS PER COVARIATE
Fang Xia*, Duke University School of Medicine
Phillip J. Schulte, Duke University School of Medicine
Laine Thomas, Duke University School of Medicine

When treatments are compared in observational data sources, adjustment for measured confounding is often achieved by covariate adjustment or by propensity score methods, including inverse propensity weighting (IPW) and stratification by quintiles of the propensity score distribution. With binary outcomes, over-fitting may be a serious concern when there are fewer than 10 events per covariate. This is often cited as a reason to prefer propensity methods to logistic regression adjustment in the case of a rare outcome but common treatment. The recommendation is based on the median bias observed in a single simulation study of propensity-stratified versus covariate-adjusted logistic regression under conditions where the total number of events was typically less than 16. It is unclear whether this result would generalize to well-powered studies with a large number of covariates, or to IPW as opposed to stratification. In order to clarify the relative performance of these methods, we conducted a simulation study across a range of conditions with at least 20 total events. All three methods demonstrated minimal bias and similar performance, even with as few as 2 events per confounder. Our results suggest that all of these techniques remain an option for many adjustment applications.

email: [email protected]

BAYESIAN LATENT PROPENSITY SCORE APPROACH FOR AVERAGE CAUSAL EFFECT ESTIMATION ALLOWING COVARIATE MEASUREMENT ERROR
Elande Baro*, University of Maryland Baltimore County
Yi Huang, University of Maryland Baltimore County
Anindya Roy, University of Maryland Baltimore County

In observational studies, it is often the case that covariates are measured with error. The naive approach is to ignore the error and use naive propensity score methods with the observed covariates to estimate the average causal effect (ACE). It has been shown that the naive approach might bias the ACE inference. Dr. Yi Huang developed a set of causal assumptions allowing covariate measurement error and extended standard propensity score theory (without measurement error) by proving the consistency of ACE estimation using the proposed latent propensity scores. She proposed a joint likelihood approach in finite mixture model form for ACE estimation with a continuous outcome. In that work, an EM algorithm is used, whose numerical performance is not ideal due to the large number of unknown parameters. We extend this work and use a Bayesian estimation method under the latent propensity score model in finite mixture model form. The method captures the uncertainty in propensity score subclassification arising from the unobserved measurement error. Simulation studies are presented to show the performance of this newly developed Bayesian approach compared to the existing EM algorithm and to the naive approach ignoring the error. They show that the Bayesian method provides more stable inference with good standard error estimates compared to EM. In addition, we investigate the case of ACE estimation with a binary outcome and discuss its identifiability. This is joint work with Dr. Yi Huang and Dr. Anindya Roy.

email: [email protected]
COMPARATIVE PERFORMANCE OF MULTIVARIATE MATCHING METHODS THAT SELECT A SUBSET OF OBSERVATIONS
Maria de los Angeles Resa*, Columbia University
Jose R. Zubizarreta, Columbia University

This paper presents a Monte Carlo simulation study of the comparative performance of multivariate matching methods that select a subset of observations from typically larger samples of treated units and controls. The methods considered are the widespread method of greedy nearest neighbor matching with propensity score calipers, optimal matching of an optimally chosen subset, and the recent method of optimal cardinality matching. The main findings are: (i) covariate balance, as measured by differences in means, variance ratios, the Kolmogorov-Smirnov distance, and the cross-match test statistic, is better with cardinality matching, as by design it satisfies the balance requirements; (ii) for a given level of balance, the resulting sample sizes are larger with cardinality matching than with the other methods; (iii) in terms of distances, optimal subset matching performs best; (iv) estimates from cardinality matching have lower RMSEs, provided tight requirements for balance; (v) specifically, matching with fine balance for all the covariates plus strong requirements for mean balance has the lowest RMSEs. In statistical practice, an extensively used rule of thumb is to balance covariates so that their absolute standardized differences in means are not greater than 0.1. The simulation results suggest that stronger forms of balance should be pursued in practice.

email: [email protected]
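The 0.1 rule of thumb mentioned above is straightforward to check for any matched sample. The sketch below computes absolute standardized differences in means before and after matching; the matched index sets are placeholders to be replaced by the output of whichever matching method is used.

import numpy as np

def std_mean_diff(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3))
treat = rng.binomial(1, 1/(1 + np.exp(-X[:, 0])))
matched_t = np.where(treat == 1)[0][:100]            # placeholder matched treated units
matched_c = np.where(treat == 0)[0][:100]            # placeholder matched controls

for j in range(X.shape[1]):
    before = std_mean_diff(X[treat == 1, j], X[treat == 0, j])
    after = std_mean_diff(X[matched_t, j], X[matched_c, j])
    print(f"covariate {j}: before={before:.3f}, after={after:.3f}, balanced={after <= 0.1}")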

IMPROVING TREATMENT EFFECT ESTIMATION IN THE PRESENCE OF TREATMENT DELAY THROUGH TRIPLET MATCHING
Erinn M. Hade*, The Ohio State University
Bo Lu, The Ohio State University
Hong Zhu, University of Texas Southwestern Medical Center

In health related studies with a longitudinal cohort, it is common that patients initiate treatment at different time points. In observational studies, multiple factors may contribute to why treatment is not administered at the desired time. Patients who are delayed in receiving treatment are often substantially different from those who receive timely treatment. Therefore, ignoring information on treatment delay may lead to biased estimation of treatment effects. To take advantage of this information, we propose to estimate the intended effect of having treatment on time, versus delayed treatment, versus never being treated. Balancing scores are first created to summarize the covariate information related to treatment initiation. Using these estimated scores, we create matched groups of three observations (triplets with one observation from each of the treatment groups) and compare the treatment effect between groups. Further, we compare different matching algorithms to evaluate the matching quality. We apply these methods to data investigating the timing of adjuvant surgery for breast cancer.

email: [email protected]

47. CONTRIBUTED PAPERS: Covariates Measured with Error

LOCALLY EFFICIENT SEMIPARAMETRIC ESTIMATORS FOR PROPORTIONAL HAZARDS MODELS WITH MEASUREMENT ERROR
Yuhang Xu*, Iowa State University
Yehua Li, Iowa State University
Xiao Song, University of Georgia

We propose a new class of semiparametric estimators for proportional hazards models in the presence of measurement error in the covariates, where the baseline hazard function, the hazard function for the censoring time, and the distribution of the true covariates are treated as unknown infinite dimensional parameters. We estimate the model components by solving a system of estimating equations based on the semiparametric efficient scores under a sequence of restricted models in which the logarithms of the hazard functions are approximated by reduced rank regression splines. By slowly increasing the rank of the restricted model with the sample size, we show that the proposed estimators are locally efficient in the sense that the estimators are semiparametrically efficient if the distribution of the error-prone covariates is specified correctly, and are still consistent and asymptotically normal if this distribution is misspecified. Our simulation studies show that the proposed estimators have smaller variances than competing methods. We further illustrate the new method with a real application in an HIV clinical trial.

email: [email protected]
SEPARATING VARIABILITY IN PRACTICE PATTERNS FROM STATISTICAL ERROR; AN OPPORTUNITY FOR QUALITY IMPROVEMENT
Laine Thomas*, Duke University
Phillip J. Schulte, Duke University

Quality improvement studies seek to establish the degree of variability in practice patterns and outcomes across different providers. Wide variation suggests that institutional factors play a role in affecting outcomes, and high performing institutions should be studied and emulated. Therefore, the magnitude of variation is a key parameter of interest. Despite the extensive literature on methods for hospital monitoring and profiling, variability across providers is usually displayed in figures and histograms using techniques that either over-estimate or under-estimate the actual degree of variation. As a result, conclusions regarding the extent of variation based on these figures may be wrong. A simple correction can be derived from the hierarchical model, but is rarely used in the medical literature. Although beneficial in many cases, this relies on the hierarchical model assumptions, such as normally distributed variation across providers. When features of the distribution, such as bi-modality or skewness, are of particular interest, this approach is not adequate. To achieve greater flexibility, we cast this as a measurement error problem and apply recently developed methods for density estimation in the presence of measurement error. We compare the alternative approaches by simulation and interpret the results in the context of a motivating example.

email: [email protected]
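The "simple correction derived from the hierarchical model" mentioned above can be illustrated with a method-of-moments calculation: the spread of observed provider rates mixes true between-provider variation with within-provider sampling noise, so subtracting the average sampling variance gives a less inflated estimate of the former. The sketch below uses simulated binomial data and relies on the normality and independence approximations the abstract cautions about.

import numpy as np

rng = np.random.default_rng(6)
n_providers, n_per = 100, 50
true_rates = rng.normal(0.3, 0.05, n_providers)            # true between-provider sd is 0.05
obs_rates = rng.binomial(n_per, np.clip(true_rates, 0, 1)) / n_per

sampling_var = obs_rates * (1 - obs_rates) / n_per         # within-provider (binomial) variance
between_var = obs_rates.var(ddof=1) - sampling_var.mean()  # moment correction
print(round(obs_rates.std(ddof=1), 3),                     # naive spread, inflated
      round(float(np.sqrt(max(between_var, 0.0))), 3))     # corrected estimate, near 0.05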

GOODNESS-OF-FIT TESTING OF ERROR DISTRIBUTION IN LINEAR ERRORS-IN-VARIABLES MODEL
Xiaoqing Zhu*, Michigan State University

The paper discusses a goodness-of-fit test for the error density function in linear errors-in-variables regression models using the deconvolution kernel density estimator. The test statistic is an analog of the Bickel and Rosenblatt type of statistic, based on the integrated squared difference between the deconvolution kernel estimator and a smoothed version of the parametric fit of the density. Under the null hypothesis, the asymptotic distribution of the proposed test statistic is derived for the ordinary smooth and supersmooth deconvolution problems. A simulation study also shows the efficiency of this test.

email: [email protected]

ESTIMATING RECURRENCE AND INCIDENCE OF PRETERM BIRTH IN CONSECUTIVE PREGNANCIES SUBJECT TO MEASUREMENT ERROR IN GESTATION: A NOVEL APPLICATION OF HIDDEN MARKOV MODELS
Paul S. Albert*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

Prediction of preterm birth, as well as characterizing the etiological factors affecting both the recurrence and incidence of preterm birth, are important problems in obstetrics. The NICHD consecutive pregnancy study (CPS) recently examined this question by collecting data on a cohort of women with at least two pregnancies over a fixed time interval of eight years. Unfortunately, measurement error due to the dating required for calculating gestational age may misclassify preterm births and bias results obtained with standard longitudinal techniques. This article proposes a flexible approach that accounts for measurement error in gestational age when making inference. We propose a hidden Markov modeling approach that allows for measurement error in gestational age by exploiting the relationship between gestational age and birthweight. We use this novel methodology to estimate the effect of important covariates on the risk of preterm birth in repeated pregnancies, focusing on the incidence and recurrence of preterm birth in the CPS cohort.

email: [email protected]
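The hidden Markov machinery referred to above rests on the forward algorithm for the likelihood of an observed, possibly misclassified, sequence. A generic two-state version (term versus preterm) is sketched below; all initial, transition and emission probabilities are hypothetical placeholders rather than estimates from the CPS data, and the birthweight-based emission model is not reproduced.

import numpy as np

pi = np.array([0.85, 0.15])              # initial P(term), P(preterm)
A = np.array([[0.90, 0.10],              # transition probabilities between pregnancies
              [0.60, 0.40]])
E = np.array([[0.95, 0.05],              # emission: P(observed class | true state),
              [0.20, 0.80]])             # rows are true states, columns observed classes

def forward_loglik(obs):
    """Scaled forward algorithm for a sequence of observed preterm indicators."""
    alpha = pi * E[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * E[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

print(round(forward_loglik([0, 1, 1]), 3))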
MULTI-STATE MODEL WITH MISSING CONTINUOUS COVARIATE
Wenjie Lou*, University of Kentucky
Richard J. Kryscio, University of Kentucky
Erin Abner, University of Kentucky

Multi-state models are very useful tools to model chronic disease processes in which patients might go through several different states (e.g., preclinical, mild disease, severe disease, death). For example, in studies of dementia, patients might experience an intermittent state called mild cognitive impairment (MCI) before they become demented or die. The common way to account for patient characteristics in the disease process in the multi-state model is to add covariates to the transition intensity functions. In most applications, covariates have to be fully observed; yet clinical data are almost always incomplete in practice. In this paper, we propose a maximum simulated likelihood method to handle missing continuous covariates in multi-state models. Our simulation study shows that the proposed method works quite well in most MAR cases. We also apply the method to a real dataset, a longitudinal dementia study cohort of 5,404 subjects.

email: [email protected]

WEIGHTED L1-PENALIZED CORRECTED QUANTILE REGRESSION FOR HIGH DIMENSIONAL MEASUREMENT ERROR MODELS
Abhishek Kaul*, Michigan State University
Hira L. Koul, Michigan State University

Standard formulations of prediction problems in high dimensional regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose a weighted penalized corrected quantile estimator for the regression parameters in linear regression models with additive measurement errors, where the unobservable covariate is nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in a high dimensional sparse setup where the dimensionality can grow exponentially with the sample size. We provide bounds for the statistical error associated with the estimation, holding with asymptotic probability 1, thereby providing the $\ell_1$-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in the paper.

email: [email protected]

48. ORAL POSTERS: Clinical Trials

48a. SPLIT-SAMPLE BASED AND MULTIPLE IMPUTATION ESTIMATION AND COMPUTATION METHODS FOR META-ANALYSIS OF CLINICAL TRIAL DATA AND OTHERWISE HIERARCHICAL DATA
Geert Molenberghs*, Universiteit Hasselt
Geert Verbeke, Katholieke Universiteit Leuven
Michael G. Kenward, London School of Hygiene and Tropical Medicine
Wim Van der Elst, Universiteit Hasselt
Lisa Hermans, Universiteit Hasselt
Vahid Nassiri, Katholieke Universiteit Leuven

Analyzing hierarchical data can be challenging. Such data occur in longitudinal and/or multi-centric trials, regular meta-analyses, surrogate endpoint evaluation (Burzykowski, Molenberghs, and Buyse 2005), etc. Further, these issues occur in a wide variety of settings: small samples (e.g., orphan diseases) on the one hand and big data on the other. In between, we have meta-analyses based on just a few large trials. Similar issues occur in small-area epidemiology. It has been found empirically that lack of balance is an important, though not the only, contributor to the difficulties. Another aspect is the lack of so-called complete sufficient statistics (Molenberghs et al. 2014). A broad paradigm is to use the following three-step approach: (a) apply a method to render the data balanced; (b) analyze the so-obtained data; (c) apply a combination rule to render a single, valid inference. Two viable candidates for step (a) are: (a1) multiple imputation (Little and Rubin 2002, Carpenter and Kenward 2013) and (a2) so-called split sampling, a pseudo-likelihood based approach. Method (a2) provides a formal basis for such methods as a two-stage approach for linear mixed models, a fixed-effects approach for meta-analyses, etc. Under (a1), step (c) uses the classical combination rules; under (a2), an information-sandwich estimator is used.

email: [email protected]
48b. OVER-PARAMETERIZATION IN ADAPTIVE DOSE-FINDING STUDIES
John O'Quigley, Universite Pierre et Marie Curie
Nolan A. Wages, University of Virginia
Mark R. Conaway, University of Virginia
Ken Cheung, Columbia University
Ying Yuan, University of Texas MD Anderson Cancer Center
Alexia Iasonos*, Memorial Sloan Kettering Cancer Center

Adaptive, model-based, dose-finding methods, such as the continual reassessment method, have been shown to have good operating characteristics. One school of thought argues in favor of the use of parsimonious models using a strict minimum number of parameters. In particular, for the standard situation of a single homogeneous group, the usual approach is to appeal to a one-parameter model. Other authors argue that richer models lead to improved performance. Here, we show that increasing the dimension of the parameter space, in the context of adaptive dose-finding studies, is usually counter-productive and, rather than leading to improvements in operating characteristics, the added dimensionality is likely to result in problems. Among these are inconsistencies of sample estimates, lack of coherency of escalation or de-escalation, erratic behavior, getting stuck at the wrong level and, in general, poorer performance in terms of correct identification of the targeted dose. Our conclusions are based on both theoretical results and simulations.

email: [email protected]
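As a point of reference for the one-parameter model discussed above, the sketch below shows a minimal continual reassessment method update: a power model $p_i = \pi_i^{\exp(a)}$ for the toxicity probability at dose $i$, a normal prior on $a$, a grid-based posterior, and selection of the dose whose posterior mean toxicity is closest to the target. The skeleton, prior standard deviation, target and toy data are illustrative assumptions, not recommendations.

import numpy as np

skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])    # prior guesses of toxicity by dose
target = 0.25
doses_given = np.array([2, 2, 3])                      # dose levels (0-based) tried so far
tox = np.array([0, 0, 1])                              # observed toxicity indicators

a_grid = np.linspace(-4, 4, 2001)
prior = np.exp(-0.5 * (a_grid / 1.34)**2)              # N(0, 1.34^2), unnormalized
p = skeleton[doses_given][:, None] ** np.exp(a_grid)[None, :]
lik = np.prod(np.where(tox[:, None] == 1, p, 1 - p), axis=0)
post = prior * lik
post /= np.trapz(post, a_grid)

p_hat = np.array([np.trapz(skeleton[i] ** np.exp(a_grid) * post, a_grid)
                  for i in range(len(skeleton))])      # posterior mean toxicity per dose
next_dose = int(np.argmin(np.abs(p_hat - target)))
print(np.round(p_hat, 3), next_dose)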

48c. IMPROVING SOME CLINICAL TRIALS INFERENCE BY USING RANKED AUXILIARY COVARIATE
Hani Samawi*, Georgia Southern University
Rajai Jabrah, Georgia Southern University
Robert Vogel, Georgia Southern University
Daniel Linder, Georgia Southern University

The main objective in a randomized clinical trial, in studies such as cancer, AIDS, etc., is to compare the outcome of interest between two or more groups. Clinical trials are considered the "gold standard" of biomedical research, and among their strengths is the ability to measure change and evaluate treatments over time while maximizing statistical power and validity. Clinical trials are expensive, and the costs to public health investigators of trials of medical treatments and devices increase with each phase and continue to escalate, especially in phase III. The idea proposed in this project is to use auxiliary covariates, by adopting a Ranked Set Sampling (RSS) technique to select the subjects for each treatment arm, so as to bring inexpensive auxiliary covariate information into a randomized clinical trial. The goal is to provide a more precise estimator of the population mean of the outcome of interest and to recover difficult-to-obtain information, without making any additional assumptions other than those already necessary for RSS and for the ordinary least squares estimators from a regression model to hold.

email: [email protected]
48d. DIRECT ESTIMATION OF THE MEAN OUTCOME ON TREATMENT WHEN TREATMENT ASSIGNMENT AND DISCONTINUATION COMPETE
Xin Lu*, Emory University
Brent A. Johnson, University of Rochester

Several authors have investigated the challenges of statistical analyses and inference amidst early treatment termination, including a loss of efficiency in randomized controlled trials and its connection to dynamic regimes in observational studies. Popular estimation strategies for causal estimands in dynamic regimes lend themselves to studies where treatment is assigned at a finite number of points; the extension to continuous treatment assignment is non-trivial and introduces other caveats. We re-examine this particular problem from a different perspective and propose a new direct estimator for the mean outcome of a target treatment length policy that does not model the propensity score. Because this strategy does not include a model for treatment selection, the estimator works well in both discrete and continuous time and avoids the finite sample bias associated with squeezing continuous time data into intervals. We show how the competition of treatment assignment and the terminating event through time leads to an intriguing type of competing risks problem. We exemplify the direct estimator through small sample numerical studies and the analyses of two real datasets. When all model assumptions are correct, our simulation studies show that the direct estimator is more precise than the inverse weighted estimator.

email: [email protected]

48e. BAYESIAN INTERIM ANALYSIS METHODS FOR PHASE IB EXPANSION TRIALS ENABLE EARLIER GO/NO-GO DECISIONS IN ONCOLOGY DRUG DEVELOPMENT
James Lymp*, Genentech
Jane Fridlyand, Genentech
Hsin-Ju Hsieh, Genentech
Daniel Sabanes Bove, F. Hoffmann-La Roche
Somnath Sarkar, F. Hoffmann-La Roche

Phase Ib expansion trials are often used in oncology drug development to obtain preliminary safety and efficacy information for making decisions in the next phase of development. Such trials are typically single arm and depend on comparison to reliable external information which can be challenging to obtain, especially in a combination setting. We describe two Bayesian approaches to interim analysis that enable earlier decisions based on a binary primary efficacy endpoint. The posterior probability approach is based on the probability that the new therapy is effective using the current evidence at the time of interim analysis. The predictive probability approach is based on the probability that the trial would conclude that the new therapy is effective if carried out to completion. We motivate each of these methods, illustrate their use and interpretation, and demonstrate the impact on decisions of various parameters including the prior distribution. Simulations show that operating characteristics, such as the probability of making a correct decision, can be effectively controlled. Bayesian methods for interim analysis can help earlier decision making for a drug development program because they are flexible with the number and timing of interim analyses and naturally incorporate historical and concurrent controls.

email: [email protected]
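With a binary endpoint and a conjugate Beta prior, the two interim quantities described above reduce to simple calculations, sketched below. The prior, thresholds, maximum sample size and interim data are illustrative assumptions, not the design parameters used by the authors.

from scipy import stats

a, b = 1, 1                      # Beta prior on the response rate
p0 = 0.20                        # reference (external control) response rate
n_max, n_interim, x_interim = 40, 20, 7
success_cut = 0.95               # final success: P(p > p0 | all data) > 0.95

post = stats.beta(a + x_interim, b + n_interim - x_interim)
posterior_prob = 1 - post.cdf(p0)               # posterior probability approach

n_rest = n_max - n_interim
pred_prob = 0.0
for y in range(n_rest + 1):                     # future responders among remaining patients
    w = stats.betabinom(n_rest, a + x_interim, b + n_interim - x_interim).pmf(y)
    final = stats.beta(a + x_interim + y, b + n_max - x_interim - y)
    if 1 - final.cdf(p0) > success_cut:
        pred_prob += w                          # predictive probability approach

print(round(float(posterior_prob), 3), round(float(pred_prob), 3))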
48f. UNIFIED ADDITIONAL REQUIREMENT IN CONSIDERATION OF REGIONAL APPROVAL FOR MULTI-REGIONAL CLINICAL TRIALS
Zhaoyang Teng*, Boston University
Yeh-Fong Chen, The George Washington University
Mark Chang, AMAG Pharmaceuticals and Boston University

To speed up the process of bringing a new drug to the market, more and more clinical trials are conducted simultaneously in multiple regions. After demonstrating the overall efficacy of the drug across regions, the regulator and the drug sponsor may also want to assess the drug's effect in specific region(s). Most recent approaches impose a common criterion to assess the consistency of treatment effects between the region(s) of interest and the entire study population, regardless of the number of regions included in a Multi-Regional Clinical Trial (MRCT). As a result, the sample size needed to achieve the desired probability of satisfying the regional requirement can be huge and implausible for trial sponsors to implement. In this paper, we propose a unified additional requirement for regional approval by allowing the parameters in the additional requirement to depend on the number of regions. In particular, the values of the parameters are determined by considering a reasonable sample size increase together with the desired probability of satisfying the additional requirement. Considering the practicality of the global trial and of the sample size increase, we suggest values of the parameters for different numbers of regions. We also introduce the assurance probability curve to evaluate the performance of different regional requirements.

email: [email protected]

48g. EFFICIENCIES OF BAYESIAN ADAPTIVE PLATFORM CLINICAL TRIALS
Ben Saville*, Berry Consultants
Scott Berry, Berry Consultants

A "platform trial" is a clinical trial in which multiple treatments for the same indication are tested simultaneously. Bayesian adaptive platform designs offer attractive features such as dropping treatments for futility, declaring one or more treatments superior, or adding new treatments to be tested during the course of a trial. Such designs can be more efficient at finding beneficial treatments relative to traditional two group designs. We quantify these efficiencies via simulation to show that platform trials on average can find beneficial treatments with fewer patients, fewer deaths or poor outcomes, less time, and with greater probabilities of success than traditional designs.

email: [email protected]

48h. A BAYESIAN SEMIPARAMETRIC MODEL FOR INTERVAL CENSORED DATA WITH MONOTONE SPLINES
Bin Zhang, Cincinnati Children's Hospital Medical Center
Yue Zhang*, University of Cincinnati

Generalized odds ratio hazard (GOPH) models are general enough to include several commonly used models, such as the proportional hazards model and the proportional odds model. However, their application to interval censored data, for which the partial likelihood does not exist, remains largely undeveloped. In this paper, we propose a novel Bayesian approach to analyze interval censored data with the GOPH model. The baseline cumulative hazard function is modelled with finite dimensional monotone splines. The Gibbs sampler is easy to implement, and the performance of the proposed method was evaluated by extensive simulation studies. A real-life data set was analyzed using the proposed method as an illustration.

email: [email protected]

48i. COMPREHENSIVE EVALUATION OF ADAPTIVE DESIGNS FOR PHASE I ONCOLOGY CLINICAL TRIALS
Sheau-Chiann Chen*, Vanderbilt University
Yunchan Chi, National Cheng Kung University
Yu Shyr, Vanderbilt University

To provide valuable recommendations for selecting a phase I trial design, a comprehensive evaluation measure is proposed. This measure is used to evaluate the performance of the 3+3 design and three Bayesian adaptive designs, i.e., the continual reassessment method design (CRM), the Bayesian continual reassessment method design (B-CRM), and the modified toxicity probability intervals design (mTPI). A simulation study determined that none of the designs is uniformly better than the others based on overall scores. We use the gap size, which is the distance between the true toxicity probability at the MTD and the true toxicity probability at the next higher dose, to provide some insight into the comparison results. When both the target toxicity rate and the gap size increase, the performance of the 3+3 and mTPI designs tends to be better than that of the CRM design. In practice, since the investigators may have some knowledge about the target toxicity rate, the CRM design is best applied for a very small target toxicity rate, while the mTPI design is recommended for a target toxicity rate between 0.1 and 0.35. When the target toxicity rate and toxicity probability patterns are unknown, the mTPI design is recommended.

email: [email protected]

48j. STATISTICAL INFERENCE FOR COMPOSITE OUTCOMES BASED ON PRIORITIZED COMPONENTS
Ionut Bebu*, The George Washington University
John M. Lachin, The George Washington University

Composite endpoints are common in cardiovascular (CV) trials, and the time-to-first-event analysis is the standard approach for testing the treatment effect. This approach treats all individual components as of equal relevance, although they may not be of equal importance clinically or to the patients. To address this issue, several authors have proposed to rank the individual outcomes based on their importance, and then to combine them based on their ranks. Two such approaches include the proportion in favor of treatment parameter and the win ratio parameter. This talk will describe tests and confidence intervals for composite outcomes based on prioritized components using the large sample distribution of certain multivariate multi-sample U-statistics. This nonparametric approach provides a general solution for both the proportion in favor of treatment parameter and the win ratio parameter, and it can be extended to stratified studies and to the comparison of more than two groups. The proposed results are illustrated using data from the Prevention of Events with Angiotensin Converting Enzyme Inhibition (PEACE) Trial.

email: [email protected]
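The two parameters named above have simple pairwise definitions, illustrated by the toy computation below: every treated subject is compared with every control on two prioritized components, with component 1 taking priority and component 2 breaking ties, and wins, losses and ties are counted. Censoring, which matters in real cardiovascular composites, is ignored here, and the data are simulated placeholders.

import numpy as np

rng = np.random.default_rng(7)
trt = rng.normal(loc=[0.3, 0.2], size=(50, 2))     # higher is better on both components
ctl = rng.normal(loc=[0.0, 0.0], size=(50, 2))

wins = losses = ties = 0
for a in trt:
    for b in ctl:
        for k in range(2):                         # compare in priority order
            if a[k] > b[k]:
                wins += 1
                break
            if a[k] < b[k]:
                losses += 1
                break
        else:
            ties += 1

total = wins + losses + ties
print("win ratio:", round(wins / losses, 2),
      "proportion in favor of treatment:", round((wins - losses) / total, 2))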
48k. THE IMPACT OF COVARIATE MISCLASSIFICATION USING GENERALIZED LINEAR REGRESSION UNDER COVARIATE-ADAPTIVE RANDOMIZATION
Liqiong Fan*, Medical University of South Carolina
Sharon D. Yeatts, Medical University of South Carolina

Background: Covariate misclassification may lead to bias in the treatment effect estimate, impact the power and type I error, and affect trial validity and reliability. When covariate-adaptive randomization is used to control the imbalance between treatment arms, misclassification may also impact the intended treatment assignment. It is unclear whether the appropriate analysis strategy should adjust for the misclassified covariate or the corrected covariate. We provide computational simulation results and an asymptotic result to explore the impact of such misclassification on the statistical operating characteristics. Methods: Binary and frequency outcomes were simulated with treatment and one error-prone dichotomized prognostic covariate. Randomization schemes were stratified within the covariate. Simulation scenarios were created based on the misclassification rate and the covariate effect on the outcome. Models that were unadjusted, adjusted for the misclassified covariate, and adjusted for the corrected covariate were compared. Results: Under covariate-adaptive randomization with logistic regression, type I error can be maintained in the adjusted model with either the misclassified covariate or the corrected covariate. The randomization procedure does not have an additional impact on the power loss and bias caused by covariate misclassification. The magnitude of power loss and bias depends on the covariate effect on the outcome and the misclassification rate. With a Poisson log-linear model, type I error is inflated under the misclassified model. Conclusion: Correction for covariate misclassification should be taken into consideration during trial design and later analysis.

email: [email protected]

48l. NON-INFERIORITY TEST BASED ON TRANSFORMATIONS
Santu Ghosh*, Wayne State University
Arpita Chatterjee, Georgia Southern University
Samiran Ghosh, Wayne State University

Non-inferiority trials are becoming very popular for comparative effectiveness research. These trials are required to show that an experimental drug is not inferior to a known reference drug by more than a small pre-specified amount. Hence, non-inferiority trials are of great importance to pharmaceutical companies when superiority cannot be claimed. In this paper we consider a three-arm non-inferiority trial consisting of a placebo, a reference treatment, and an experimental treatment. However, unlike the traditional choices, we assume that the distributions of the endpoints corresponding to these treatments are unknown, and we suggest a test procedure for a three-arm non-inferiority trial based on transformations in conjunction with a normal approximation. Theoretical properties of our method are investigated. An alternative test procedure based on the bootstrap percentile-$t$ method is discussed. We compare the performance of these test procedures in simulated data sets. These methods are further illustrated in a study on mildly asthmatic patients.

email: [email protected]

48m. METHODS ACCOUNTING FOR MORTALITY AND MISSING DATA IN RANDOMIZED TRIALS WITH LONGITUDINAL OUTCOMES
Elizabeth A. Colantuoni*, Johns Hopkins Bloomberg School of Public Health
Chenguang Wang, Johns Hopkins School of Medicine
Daniel O. Scharfstein, Johns Hopkins Bloomberg School of Public Health

Randomized trials are the standard for establishing evidence favoring a treatment over standard care in clinical settings. We will consider randomized trials where the outcome is a clinical characteristic of the patient and is measured at a single time or at multiple fixed times after randomization. Defining and estimating the treatment effect is complicated when the population being studied is expected to have high mortality rates during the trial, due to either the age of the patients or the underlying conditions for which the patients are being treated. In such trials, clinical outcomes that would be measured are "truncated due to death". In addition, among survivors, there is often missing data. Using data from a randomized trial among patients with acute respiratory distress syndrome (ARDS), we describe and demonstrate methods for accounting for mortality and missing data within randomized trials, with special attention to sensitivity analysis to the untestable assumptions required in the analyses.

email: [email protected]
with acute respiratory distress syndrome Finally we apply the proposed methodol- (ARDS), we describe and demonstrate ogy to a real data set. email: [email protected] methods for accounting for mortality and email: missing data within randomized trials with [email protected] special attention to sensitivity analysis to the untestable assumptions required in the analyses. email: [email protected]

Program & Abstracts 259 48o. DESIGN PARAMETERS AND verified that the optimal time of treatment NAVIGATING THE ACADEMIC EFFECT OF THE DELAYED- switch is in the middle of the study for JUNGLE WITHOUT GOING BANANAS START DESIGN IN ALZHEIMER’S evenly spaced measurements. We will Amy H. Herring*, University of North DISEASE also provide power estimation for given Carolina, Chapel Hill sample sizes. Guoqiao Wang*, University of Alabama, We consider survival strategies for the Birmingham email: [email protected] academic jungle. Some individuals Richard E. Kennedy, University of appear to exibit immunity from hazards, Alabama, Birmingham 49. CENS Invited Session occupational or otherwise. However, we argue the feeling that one has escaped Lon S. Schneider, University of Southern – Careers in Statistics: the dreaded jaguar only to disappear California Skills for Success into an unanticipated sinkhole while Gary R. Cutter, University of Alabama, dealing with a pesky bot fly, all the while Birmingham HOW TO BE SUCCESSFUL IN ORAL entertaining a group of howler monkeys, Purpose and methods. The delayed-start AND WRITTEN COMMUNICATIONS is much more normative. We’ll discuss design, in which patients are randomly AS A BIOSTATISTICIAN risks, perceived and real, and share strategies employed by successful spe- assigned to placebo or treatment for Peter Grant Mesenbrink*, Novartis Phar- cies navigating jungles of all types. As a pre-specified frame of time and then maceuticals Corporation those (or a randomized portion of those) with any journey, a map is often critical in the placebo group are also given As statisticians, often it is required to successful navigation, and we will the treatment, was recommended for that technical information needs to be conclude by discussing useful features of AD. Critical design parameters such as explained to non-statisticians both as academic cartography. oral and written communications. The sample size and the timing of treatment email: [email protected] switch have been proposed, the purpose variability of knowledge of biostatistics of this study is to extend the existing the- for the audience receiving the informa- ory and to verify those design parameters tion is often quite large. Thus, it is often WHAT AM I GOINGTOBE through simulation based on a meta-data- required for biostatisticians to be able to WHEN I GROW UP? EVOLVING base of previous trials in AD. Conclusion. adapt their style of communication while AS A STATISTICIAN When only a randomized portion of still providing key scientific and strate- Nancy L. Geller*, National Heart, the patients originally on placebo late gic input to projects in which they are Lung and Blood Institute, National will receive the treatment, the optimal involved. Best practices for excelling in Institutes of Health sample size allocation ratio between oral and written communicators across the treatment group, the continuing different disciplines will be discussed The speaker will describe her own profes- placebo group, and the delayed-start when giving formal presentations as well sional evolution, both intellectually (to a treatment group is 1:1:1. 
The weight on as how to address questions and their clinical trials biostatistician) and in terms estimators from the 3 groups depends answers when communicating with audi- of leadership (to director of a biostatistics on the correlation among the slopes in ences who were not formally trained as group and former president of the Ameri- the delayed-start group; however, the biostatisticians will be discussed. can Statistical Association). Recognizing correlation is relatively small. We also email: [email protected] that we all will have setbacks and devel- oping the resiliance to continue pursuing your goals despite setbacks is important. Finding the right mentor(s) is very helpful, especially when your own path is not

260 ENAR 2015 | Spring Meeting | March 15–18 crystal clear. Developing a “yes I can” attitude, especially when opportunities arise to do something you have never done before, is a big plus. With many opportunities for intellectual pursuits and leadership, individuals in our field can have a rich and satisfying career. email: [email protected]

50. Analysis Methods for Data Obtained to genotypes assessed in routinely col- NONPARAMETRIC ESTIMATION from Electronic lected medical samples. It can be difficult OF PATIENT PROGNOSIS WITH Health Records to extract accurate information about APPLICATION TO ELECTRONIC disease outcomes from large numbers of HEALTH RECORDS EMRs, but recently numerous algorithms IMPROVING THE POWER OF Patrick J. Heagerty*, University have been developed to infer phenotypes. GENETIC ASSOCIATION TESTS WITH of Washington Although these algorithms are quite IMPERFECT PHENOTYPE DERIVED accurate, they typically do not provide Alison E. Kosel, University of FROM ELECTRONIC MEDICAL perfect classification due to the difficulty Washington RECORDS in inferring meaning from the text. Some We develop an algorithm and associ- Jennifer A. Sinnott*, Harvard School algorithms can produce for each patient ated inference for creating local patient of Public Health a probability that the patient is a disease outcome predictions. The intended goal Wei Dai, Harvard School of Public Health case, which can be thresholded to define of the methods is to provide an estimate case-control status, and this estimated of the full outcome distribution for a given Katherine P. Liao, Brigham and case-control status has been used to subject by providing summary data for a Women’s Hospital replicate known genetic associations specific axis-parallel neighborhood with Elizabeth W. Karlson, Brigham in EMR-based studies. However, using a fixed subset size. We develop inference and Women’s Hospital the estimated disease status in place of for the local predictions and implement Isaac Kohane, Harvard Medical School true disease status results in outcome the methods using a dynamic computa- misclassification, which can diminish test tional interface. We illustrate the methods Robert Plenge, Merck Research power and bias odds ratio estimates. We with a large electronic health records Laboratories propose to instead directly model the based back pain cohort, and comment Tianxi Cai, Harvard School algorithm-derived probability of being a on extensions of the methods to com- of Public Health case. We demonstrate how our approach parative estimation. improves test power and effect estimation. To reduce costs and improve clinical email: [email protected] Our work provides an easily implemented relevance of genetic studies, such studies solution to a major practical challenge that could be performed in hospital-based arises in the use of EMR data. cohorts by linking phenotypes extracted MINING EHR NARRATIVES from electronic medical records (EMRs) email: [email protected] FOR CLINICAL RESEARCH Enedia Mendonca*, University of Wisconsin, Madison email: [email protected]

Program & Abstracts 261 51. Statistical Challenges impact on point and variance estimation sample design can impact variance of Survey and and with respect to how they can be used estimates substantially. Therefore, there Surveillance Data to study the effectiveness of the sampling is a need to develop guidelines for the in US Government design. Participant counts at different presentation of NHANES data. In this talk, stages of the sampling procedure will be statistics used in formulating the most shown to illustrate how the final sample recent set of proposed guidelines, includ- USING VENUE-BASED SAMPLING is achieved and to give a sense of the ing degrees of freedom, design effect TO RECRUIT HARD-TO-REACH potential yield of individuals in a VBS and relative confidence interval width, will POPULATIONS design. be discussed. Maria Corazon B. Mendoza*, Centers email: [email protected] email: [email protected] for Disease Control and Prevention Chris Johnson, Centers for Disease DEVELOPMENT OF GUIDELINES FOR DATA SWAPPING METHODS Control and Prevention THE PRESENTATION OF DATA FROM FOR STATISTICAL DISCLOSURE Brooke Hoots, Centers for Disease THE NATIONAL HEALTH AND NUTRI- LIMITATION Control and Prevention TION EXAMINATION SURVEY Guangyu Zhang*, National Center Teresa Finlayson, Centers for Disease Margaret Devers Carroll*, National for Health Statistics, Centers for Control and Prevention Health and Nutrition Examination Disease Control and Prevention Survey, Centers for Disease Control Courses in the design of sample surveys Joe Fred Gonzalez, National Center and Prevention typically cover several types of sam- for Health Statistics, Centers for pling designs that are used to survey a Using objectively measured National Disease Control and Prevention population of interest. Examples in these Health and Nutrition Examination Sur- Anna Oganyan, National Center for courses assume a sampling frame is veys (NHANES) data, the prevalence Health Statistics, Centers for Disease readily identifiable. But what happens of important health characteristics can Control and Prevention if no easily identifiable sampling frame be estimated, e.g. the percent of obese exists? This talk explores this ques- adults ages 20 years and over. Since Alena Maze, National Center for tion by introducing one of the sampling 1999 data from NHANES has been Health Statistics, Centers for Disease designs that the Centers for Disease released in two year cycles. Similar to Control and Prevention Control and Prevention uses to survey previous NHANES surveys the sampling Protection of confidentiality and privacy of hard-to-reach populations: venue-based plan for each 2 year survey follows a survey participants’ (individuals or estab- sampling (VBS). Using the National HIV highly stratified, multistage probability lishments) data is of primary importance Behavioral Surveillance Study System design which involves the selection of to federal agencies when releasing micro- as an example, VBS will be described primary sampling units (PSUs, counties data to the public. Records that have and the logistics of creating the sampling or groups of contiguous counties), seg- unique combinations of key variables are frame and of sampling individuals will be ments (groups of dwelling units) within particularly vulnerable to disclosure. Sta- discussed. In addition, issues such as PSUs, dwelling units within segments and tistical disclosure limitation techniques, obtaining accurate counts (estimates of sample persons within dwelling units. 
such as random data swapping and individuals attending an event) and mul- Although the sample size of each 2 year recoding, have been used as disclosure tiplicity (an individual belonging to more cycle is approximately 10,000, the num- protection strategies. To protect confi- than one sampling unit), will be intro- ber of first stage units is much smaller (at dentiality and to maintain the accuracy duced and examined because of their most 17). Consequently design based of statistical inferences, in this study we estimates of variance can be extremely unstable. Furthermore, the complex

262 ENAR 2015 | Spring Meeting | March 15–18 create clusters of subjects by Euclidean evaluation of error characteristics for both sequencing (DNA-Seq) has entered the distances of observed variables and traditional survey data sources, and the arena. Due to the very limited amount of apply random data swapping methods abovementioned non-survey sources. DNA available from each capture, which within homogenous clusters. We conduct This framework in turn suggests some results in very low signal-to-noise ratios simulations and apply our methods to a practical methods for integration of these (SNR), it is critical to optimize the assay National Health Interview Survey (NHIS) data sources. The primary concepts are and the analytical procedures in order to public-use data set, available from the illustrated with applications to two large- succeed. We propose statistical methods National Center for Health Statistics. scale surveys. for quantitatively assessing the perfor- mance of these different approaches. email: [email protected] email: [email protected] In essences these methods capture the effective SNR for a particular setup and PRACTICAL APPROACHES TO 52. Reconstructing the its power to detect aberrations, which DESIGN AND INFERENCE THROUGH Genomic Landscape makes it possible to objectively decide THE INTEGRATION OF COMPLEX from High-Throughput which methods are better than others. SURVEY DATA AND NON-SURVEY Data They allow us to compare whether DNA- INFORMATION SOURCES Seq or DNA microarrays are better suited for the challenge or not, what DNA-Seq John L. Eltinge*, U.S. Bureau of COPY NUMBERS IN coverage is needed for detecting events Labor Statistics CIRCULATING TUMOR CELLS of allelic imbalance such as copy-neutral Rachel M. Harter, RTI International (CTCs) USING DNA-Seq loss of heterozygosity (LOH), and more. Practical sample survey work is currently Henrik Bengtsson*, University of email: [email protected] encountering important opportunities and California, San Francisco challenges arising from the increased Detecting and characterizing circulat- availability of data from alternative DNA COPY NUMBER ANALYSES ing tumor cells (CTCs) in the blood of (non-survey) sources. These sources FOR FAMILY BASED DESIGNS cancer patients could bring additional potentially can provide very rich infor- and crucial understanding on how the Ingo Ruczinski*, Johns Hopkins mation that could be integrated with cancers spread, how they circumvent University traditional sample survey processes therapy, and how to better target them. We present novel methods and software through, e.g., targeting of subpopula- It has been shown that CTCs resemble for DNA copy number analyses in family tions; enhancement of sample frames; genomic aberrations of the primary based designs with sequencing or array improvement of unit contact and other tumor, also when long time has passed, data. In the first example, we consider the dimensions of fieldwork; direct replace- meaning they could also be used in evidence that a rare copy number variant ment of survey items that are especially patient follow ups. Different techniques may be causal when only a few affected burdensome, expensive or error-prone; have been proposed to isolate, capture subjects in a multiplex extended pedigree and improvement of editing, imputation and enrich CTCs, to remove contamina- are sequenced or typed, by quantifying or weight construction. 
However, these tion from non-malignant epithelial cells or the probability of allele sharing by all non-survey data are subject to important leukocytes, and to amplify DNA obtained affected relatives given it was seen in any quality issues, including unit coverage; from these small counts of cells. Origi- one family member under the null hypoth- item missingness; definitional and aggre- nally DNA copy-number (CN) microarrays esis of complete absence of linkage and gation issues; use of proxy variables; was used for genomic profiling of CTCs association. In the second example, we imputation errors; and recording errors. but recently high-throughput DNA present a new method to infer de novo This paper reviews these quality issues copy number variants in trios by defining and suggests a unified framework for “minimum distance” statistic to capture

Program & Abstracts 263 differences in copy numbers between since no genome-wide reconstructions cells. Genetic diversity within a tumor is offspring and parents which reduces have been inferred for these organisms increasingly recognized as a driver of technical variation from probe effects and because of computational bottlenecks and rapid disease progression, resistance to genomic waves, a major source of false organismal complexity. Here we propose targeted therapies, and poor survival out- positive identifications in copy number a two-stage algorithm, deploying multi- come. I will present a statistical approach analyses. Following segmentation of dimensional scaling, that overcomes these to reconstruct clonal composition from the minimum distance by circular binary barriers. After showcasing 3D architec- high-throughput sequencing of bulk segmentation, final inference regarding tures for mouse embryonic stem cells and tumor samples. human lymphoblastoid cells we discuss de novo copy number events is based on email: [email protected] a posterior calling step. methods for evaluating these solutions and, time permitting, downstream applica- email: [email protected] tions thereof. 53. Statistical Methods email: [email protected] for Single Molecule RECONSTRUCTING 3-D GENOME Experiments CONFIGURATIONS: HOW AND WHY A LATENT VARIABLE APPROACH Mark Robert Segal*, University of FOR INTEGRATIVE CLUSTERING OF WALKING, SLIDING, AND DETACH- California, San Francisco MULTIPLE GENOMIC DATA TYPES ING: TIME SERIES ANALYSIS FOR The three-dimensional (3D) configuration CELLULAR TRANSPORT IN AXONS Ronglai Shen*, Memorial Sloan-Kettering of chromosomes within the eukaryote Cancer Center John Fricks*, The Pennsylvania State nucleus is consequential for several cel- University lular functions including gene expression Large-scale integrated cancer genome regulation and is also strongly associated characterizationefforts including the Jason Bernstein, The Pennsylvania with cancer-causing translocation events. cancer genome atlas have created State University While visualization of such architecture unprecedented opportunities to study William Hancock, The Pennsylvania remains limited to low resolutions (due cancer biology in the context of knowing State University to compaction, dynamics and scale), the the entire catalog of genetic altera- Kinesin is a molecular motor that, along ability to infer structures at high resolution tions. A clinically important challenge is with dynein, moves cargo such as organ- has been enabled by recently-devised to discover cancer subtypes and their elles and vesicles along microtubules chromosome conformation capture (3C) molecular drivers in a comprehensive through axons. Studying these transport techniques. In particular, when coupled genetic context. I will present a latent process is vital, since non-functioning with next generation sequencing, such variable framework for joint modeling of kinesin has been implicated in a number methods yield an unbiased inventory of discrete and continuous data types that of neurodegenerative diseases, such genome-wide chromatin interactions. arise from integrated genomic, epig- as Alzheimer’s disease. Over the last Various algorithms have been advanced to enomic, and transcriptomic profiling. We twenty years, these motors have been operate on such data to produce recon- show application of the method to the extensively studied through in vitro exper- structed 3D configurations. 
Several studies TCGA pan-cancer cohort with whole- iments of single molecular motors using have shown that such reconstructions pro- exome DNA sequencing, SNP6.0 array, laser traps and fluorescence techniques. vide added value over raw interaction data mRNA sequencing data in 3,000 patient However, an open challenge has been to with respect to downstream biological samples spanning 12 cancer types. In explain in vivo behavior of these systems insights. However, such added value has addition, I will introduce a topic on intratu- when incorporating the data from in vitro yet to be realized for higher eukaryotes mor heterogeneity which is characterized by the presence of genetically and phe- notypically distinct subclones of tumor

264 ENAR 2015 | Spring Meeting | March 15–18 experiments into straightforward models. system). We introduce a Bayesian hierar- HIDDEN MARKOV MODELS WITH In this talk, I will discuss recent work chical model on top of a hidden Markov APPLICATIONS IN CELL ADHESION with my experimental collaborator, Will model (HMM) to analyze these data and EXPERIMENTS Hancock (Penn State), to understand use the statistical results to answer the Jeff C. F. Wu, Georgia Institute of more subtle behavior of a single kinesin biological questions. We will discuss Technology than has previously been studied, such model selection, the construction of the as sliding and detachment and how such hierarchical model, their biological mean- Ying Hung*, Rutgers University behavior can contribute to our under- ing as well as our new understanding of Cell adhesion experiments refer to biome- standing of in vivo transport. Data from the detailed mechanism behind protein chanical experiments that study protein, these experiments include time series transportation. DNA, and RNA at the level of single taken from fluorescence experiments for email: [email protected] molecules. The study of cell adhesion kinesin. In particular, we will use novel plays a key role in many physiological applications of switching time series and pathological processes, especially models to explain the shifts between dif- BIMOLECULAR REACTION, DATA in tumor metastasis in cancer research. ferent modes of transport. TYPES, AND AN ALTERNATIVE Motivated by the analysis of a specific email: [email protected] MODEL TO THE SMOLUCHOWSKI type of cell adhesion experiments, a new THEORY framework based on hidden Markov model is proposed. A double penalized Hong Qian*, University of Washington ANALYZING SINGLE-MOLECULE order selection procedure is introduced PROTEIN-TARGETING EXPERIMENTS Bimolecular reaction is fundamental in and shown to be consistent in estimating VIA HIERARCHICAL MODELS biology and biochemistry; it is crucial in the number of hidden states in hidden all forms of life. The classical descrip- Markov models. Simulations show that Samuel Kou*, Harvard University tion of a bimolecular reaction assumes the proposed framework outperforms Yang Chen, Harvard University that the reaction is largely driven by the existing methods. Applications of the Brownian motion of the two reactant Recent technological advances allow proposed methodology to real data molecules. The rate of bimolecular reac- scientists to follow a biological pro- demonstrate the accuracy of estimating tion, fundamental to the understanding cess on a single-molecule basis. These receptor-ligand bond lifetimes and wait- of biological processes, is approximated advances also raise many interesting ing times, which are essential in kinetic by a classical diffusion theory, due to data-analysis problems. In this talk we will parameter estimation. M. Smoluchowski (1917). We recently focus on recent single-molecule experi- identify that, the key issue, unlike in the email: [email protected] ments on protein targeting. To maintain classical theory, is intimately related to proper cellular function, proteins often what type of data is collected for quantify- need to be transported inside or out of a ing a bimolecular reaction. An alternative cell. The detailed molecular mechanism model, based on coupled diffusion with behind such a process (often referred to Markov switching, will be provided. as protein targeting) is not well under- We will discuss its connection with an stood. 
Single-molecule experiments are experimental data collection technique, designed to unveil the detailed mecha- fluorescence correlation spectroscopy nism and reveal the functions of different (FCS), and the application on nonlinear organelles involved in the process. The reversible bimolecular reaction experimental data consist of hundreds of stochastic time traces (from the fluo- e-mail: [email protected] rescence recording of the experimental

Program & Abstracts 265 54. Subgroup Analysis SUBGROUP-BASED ADAPTIVE based on posterior predictive prob- and Adaptive Trials (SUBA) DESIGNS FOR MULTI-ARM abilities. We compare the SUBA design BIOMARKER TRIALS with three alternative designs including equal randomization, outcome-adaptive Yanxun Xu, University of Texas, Austin A BAYES RULE FOR SUBGROUP randomization and a design based on a REPORTING – BAYESIAN ADAPTIVE Lorenzo Trippa, Harvard University probit regression. In simulation studies ENRICHMENT DESIGNS Peter Mueller, University of Texas, Austin we find that SUBA compares favorably against the alternatives. Peter Mueller*, University of Texas, Yuan Ji*, NorthShore University Austin HealthSystem and University of Chicago email: [email protected] We discuss Bayesian inference for Targeted therapies based on biomarker subgroups in clinical trials. We start with profiling are becoming a mainstream DETECTION OF CANCER a decision theoretic approach, based direction of cancer research and treat- SUBGROUP ASSOCIATED on a straightforward extension of a 0/c ment. Depending on the expression of ALTERNATIVE SPLICING utility function and a probability model specific prognostic biomarkers, targeted Jianhua Hu*, University of Texas across all possible subgroup models. We therapies assign different cancer drugs MD Anderson Cancer Center show that the resulting rule is essentially to subgroups of patients even if they are determined by the odds of subgroup diagnosed with the same type of can- Xuming He, University of Michigan models relative to the overall null hypoth- cer by traditional means, such as tumor esis M0 of no treatment effects and location. For example, Herceptin is only relative to the overall alternative M1 of indicated for the subgroup of patients Alternative splicing is known to be a a common treatment effect in the entire with HER2+ breast cancer, but not other critical factor in cancer formation and patient population. This greatly simplifies types of breast cancer. However, sub- progression. In real experiments, high posterior inference. We then generalize groups like HER2+ breast cancer with heterogeneity is often observed among the approach to allow for subgroups that effective targeted therapies are rare and cancer patients. Specifically, alterna- are characterized by arbitrary interactions most cancer drugs are still being applied tive splicing variants may show different of covariates. The two key elements of to large patient populations that include degrees among or only occur to sub- the generalization are a flexible nonpara- many patients who might not respond or groups of cancer patients. We propose a metric Bayesian response function and benefit. Also, the response to targeted penalized mixture statistical model inte- a separate description of the subgroup agents in human is usually unpredictable. grated with dimension reduction of the report that is not linked to the parametri- To address these issues, we propose interaction space based on ANOVA-type zation of the response model. We discuss SUBA, subgroup-based adaptive designs model and a sequential testing proce- an application to an adaptive enrichment that simultaneously search for prog- dure to detect genes with such cancer design for targeted therapy. nostic subgroups and allocate patients subgroup structure. email: [email protected] adaptively to the best subgroup-specific email: [email protected] treatments throughout the course of the trial. 
The main features of SUBA include the continuous reclassification of patient subgroups based on a random parti- tion model and the adaptive allocation of patients to the best treatment arm

266 ENAR 2015 | Spring Meeting | March 15–18 55. CONTRIBUTED PAPERS: the existing bootstrap-based methods. tion techniques are effective. Software Methods to Assess The results from extensive Monte Carlo is made available that implements the Agreement simulation suggest the proposed meth- methods in R. Future extensions of these ods perform reasonably well for at least results are discussed. a moderately large number of clusters. email: [email protected] KAPPA STATISTICS FOR Real collections of data are analyzed to CORRELATED MATCHED-PAIR illustrate the application. CATEGORICAL DATA email: [email protected] STATISTICAL METHODS FOR Zhao Yang*, University of Tennessee ASSESSING REPRODUCIBILITY Health Science Center IN MULTICENTER NEUROIMAGING SAMPLE SIZE METHODS FOR STUDIES Ming Zhou, Bristol-Myers Squibb CONSTRUCTING CONFIDENCE Tian Dai*, Emory University Kappa statistic is widely used to assess INTERVALS FOR THE INTRA-CLASS the agreement in the independent CORRELATION COEFFICIENT Ying Guo, Emory University matched-pair data encountered in psy- Kevin K. Dobbin*, University of Georgia Recently in the neuroimaging community, chometrics, educational measurement, there is an increasing trend of conducting Alexei C. Ionan, University of Georgia epidemiology, diagnostic imaging, and multi-center studies. One major challenge etc. However, the correlated matched-pair The intraclass correlation coefficient (ICC) arises when combining data from differ- (clustered matched-pair and physician- in a two-way analysis of variance is a ratio ent centers is that the properties of brain patients) data which are more commonly involving three variance components. images of the same person can vary expected from the medical practice, like Two recently developed methods for con- considerably across centers since they the dental and ophthalmological care, the structing confidence intervals (CI’s) for are acquired using different scanners and shared-decision making between general the ICC are the Generalized Confidence protocols. Thus, it is crucial to effectively practice physician and patients. The tradi- Interval (GCI) and Modified Large Sample measure the reproducibility of images tional method, ignoring the dependence (MLS) methods. The resulting intervals acquired from various sites. However within a cluster, is generally inappropriate have been shown to maintain nominal simply assessing reproducibility based to handle the correlated matched-pair coverage. But methods for determining on raw brain images suffers from high data. For clustered matched-pair data, sample size for GCI and MLS intervals are dimensionality of the data and can be a non-parametric variance estimator is lacking. This paper presents sample size inefficient. In this work, we propose a two- proposed for the kappa statistic without methods that guarantee control of the stage network-based agreement method within-cluster correlation structure or mean width for GCI and MLS intervals. for fMRI data. In the first stage, we use a distributional assumptions. For the physi- In the process, two variance reduction blind signal separation method to extract cian-patients data, relying on a plausible methods are employed, which we term functional networks from fMRI data and assumption of conditional independence dependent conditioning and inverse estimate the temporal dynamics of these (responses from patients of the same Rao-Blackwellization. Asymptotic results networks under experimental conditions. 
physician are conditionally independent provide lower bounds for mean CI widths, In the second stage, we propose agree- given their physician’s responses), a and show that MLS and GCI widths ment indices for functional data to assess semi-parametric variance estimator is are asymptotically equivalent. Simula- the agreement of the brain network developed for the kappa statistic. The tion studies are used to investigate the temporal dynamics estimated from the proposed estimators provide convenient new methods. It is shown that the new same subjects brain images acquired tools for efficient computations and methods result in adequate sample size across different centers. We develop non-simulation-based alternatives to estimates, that the asymptotic estimates are accurate, and that the variance reduc-

Program & Abstracts 267 nonparametric estimation methods for INTER-OBSERVER AGREEMENT FOR often involves characterizing the con- the proposed indices and establish their A MIXTURE OF DATA TYPES cordance of ranked candidate lists from asymptotic properties. The proposed replicate experiments. Recently Li et Shasha Bai*, University of Arkansas methods are applied to the Functional al (2011) developed a copula mixture for Medical Sciences Bioinformatics Research Network (fBIRN) model, called IDR, to assess the repro- Phase I Traveling Subject study to investi- Marcelo A. Lopetegui, The Ohio ducibility of candidates on two rank gate the reproducibility of fMRI images in State University lists. This method allows one to select multi-center studies. Assessment of observer agreement candidates according to a reproducibil- ity-based criterion, and is particularly email: [email protected] has numerous applications in medical studies. A great amount of effort has convenient when the selection thresholds been applied to further the advancement are difficult to determine on the original NONPARAMETRIC REGRESSION OF of measuring observer agreement for lists. However, it is not applicable when a AGREEMENT MEASURE BETWEEN continuous and categorical data, in both large number of ties ranks are present or ORDINAL AND CONTINUOUS univariate and multivariate cases. How- when some candidates are unobserved OUTCOMES ever, there has been a lack of research in one replicate, for example, being trun- cated by a significance threshold. Here AKM F. Rahman*, Emory University in this area for data containing a mix- ture of different data types. We present we generalize this method to handle dis- Limin Peng, Emory University a scenario in clinical workflow studies creteness and incompleteness in the rank Ying Guo, Emory University where the assessment of inter-observer lists using a latent variable approach. The agreement is needed for a set of data generalized approach not only allows ties Amita Manatunga, Emory University containing a mixture of different types. and partially replicated candidates, but The effect of covariates on the agreement We review the problems and limitations also maintains the interpretability of the measure between ordinal and continuous of using a single agreement statistic original model. Using simulation studies, outcomes is considered in a nonpara- for these data, and provide a solution we showed that, when discreteness or metric framework. Peng et al. (2011) to these issues by way of a composite truncation is present, our method is able introduced a nonparametric broad sense method of agreement statistics. This to identify substantially more real signals agreement (BSA) measure between composite method offers meaningful than the original model and produce ordinal and continuous outcomes without and comprehensive evaluation of inter- better calibrated error rates than exist- considering covariates inormation. The observer agreement to clinicians. ing methods. We illustrated this method inhomogeneity in BSA of heterogeneous using a ChIP-seq dataset and a radiology email: [email protected] population explained by covaraites is dataset with highly discrete diagnosis of considerable interest in many set- ratings. Our method shows superior tings including mental health study. We ASSESSING REPRODUCIBILITY performance over existing methods in propose a nonparametric kernel type OF DISCRETE AND TRUNCATED both cases. 
estimator accommodating the effect of RANK LISTS IN HIGH-THROUGHPUT email: [email protected] covariates on the agreement measure. STUDIES Simulation studies demonstrates the effectiveness of the proposed method. Qunhua Li*, The Pennsylvania We illustrate our methodologies via an State University application to a mental health study. Reproducibility is essential for reliable email: [email protected] scientific discovery in high-throughput studies. Assessment of reproducibility

268 ENAR 2015 | Spring Meeting | March 15–18 EXPONENTIATED LINDLEY POISSON quantify DAS events from RNA-Seq data Kasper D. Hansen, Johns Hopkins DISTRIBUTION suffer from a spectrum of issues. The Bloomberg School of Public Health Bayesian method with MCMC simula- Mavis Pararai*, Indiana University We propose an extension to quantile tion is inefficient and lacks power as a of Pennsylvania normalization which removes unwanted result of considering only junction reads. technical variation using control probes. Gayan Liyanag, Indiana University Moreover, existing likelihood-based We adapt our algorithm, functional of Pennsylvania methods rely on independent assump- normalization, to the Illumina 450k tions between samples, which limits their Broderick Oluyede, Georgia methylation array and address the open applicability in experiments with matched Southern University problem of normalizing methylation data samples or repeated measurements. We A new lifetime distribution called the with global epigenetic changes, such as have devised a novel approach to model exponentiated power lindley Poisson human cancers. Using datasets from The the changes in the exon-inclusion levels distribution is proposed. All it’s properties Cancer Genome Atlas and a large case- using Hotelling’s T-squared distribution, will be explored including momentgen- control study, we show that our algorithm taking consideration of possible correla- erating function, order statistics and outperforms all existing normalization tions between the paired samples. In entropy measures. Real data is applied methods with respect to replication of addition, inspired by Fisher’s method, we to the proposed model. results between experiments, and yields have implemented a p-value aggregation robust results even in the presence of email: [email protected] algorithm to generate gene-level p-val- batch effects. Functional normaliza- ues, which greatly simplifies the ensuing tion can be applied to any microarray steps of data analyses. 56. CONTRIBUTED PAPERS: platform,provided suitable control probes Methylation and RNA email: [email protected] are available. Data Analysis email: [email protected] FUNCTIONAL NORMALIZATION OF 450K METHYLATION ARRAY DATA IDENTIFY DIFFERENTIAL ALTER- DETECTING DIFFERENTIALLY IMPROVES REPLICATION IN LARGE NATIVE SPLICING EVENTS FROM METHYLATED REGIONS (DMRs) BY CANCER STUDIES PAIRED RNA-Seq DATA MIXED-EFFECT LOGISTIC MODEL Jean-Philippe Fortin*, Johns Hopkins Cheng Jia*, University of Pennsylvania Fengjiao Hu*, Georgia Regents Bloomberg School of Public Health Mingyao Li, University of Pennsylvania University Aurelie Labbe, McGill University RNA-Seq has become indispensable Hongyan Xu, Georgia Regents University Mathieu Lemire, Ontario Institute for whole-transcriptome profiling due to Cancer is among the leading causes of Cancer Research its superiority over predecessor tech- of death worldwide, and DNA methyla- nologies in dynamic range and overall Brent W. Zanke, Ottawa Hospital tion at CpG loci in genomic regions has resolution. One of the areas in which Research Institute important implications in cancer. A lot of RNA-Seq shines is detecting compo- Thomas J. Hudson, Ontario Institute statistical methods have been proposed sitional differences among multiple of Cancer Research by using proportion of methylated allele isoforms transcribed from the same (methylation rate) to detect the associa- genetic locus between different condi- Elana J. 
Fertig, Johns Hopkins School tion of cancer and methylation at single tions, i.e., differential alternative splicing of Medicine CpG locus. However considering the (DAS). Current tools to discover and Celia MT Greenwood, Jewish General correlation among the methylation rates Hospital Montreal

Program & Abstracts 269 of close-by CpG sites, we propose mixed- expression. Owning to the dynamic Rama Raghavan, University of Kansas effect logistic regression model in this nature of miRNA and reduced microar- Medical Center study to detect differentially methylated ray and sequencing costs, a growing Prabhakar Chalise, University of Kansas regions (DMRs), while treating methyla- number of researchers are now measur- Medical Center tion rates from each subject as a cluster, ing high-dimensional miRNA expression and proportions of methylated molecules data using repeated or multiple measures Byunggil Yoo, Childrens Mercy Hospital for each site as observations. Age and in which each individual has more than Kansas City gender are included in the model as the one sample collected and measured Sumedha Gunewardena, Kansas covariates. Simulations were performed over time. However, the commonly used Intellectual and Developmental to show that the mixed-effect logistic univariate association testing or the site- Disabilities Research Center regression was robust to detect DMRs by-site (SBS) testing may underutilize the Jeremy Chien, University of Kansas after adjusting covariates, with Type I longitudinal feature of the data, leading Medical Center error well-controlled and good power. to underpowered results and less bio- The results indicating that this mixed- logically meaningful results. Results: We Brooke L. Fridley, University of Kansas effect logistic regression method is a propose a penalized regression model Medical Center promising approach for detecting DMRs incorporated with grid search method Discovery of differentially expressed (DE) with methylation data from next-genera- (PGS), for analyzing associations of genes is imperative for the understand- tion sequencing. high-dimensional miRNA expression data ing of the genomic basis of complex with repeated measures as well as vari- email: [email protected] diseases and phenotypes. Concurrently, able selection of significant miRNAs. The there is a lack of RNA-seq methods that development of this analytical framework can account for dependency in paired PENALIZED MODELING FOR was motivated by a real-world miRNA designs. In this study, we applied 8 DE VARIABLE SELECTION AND dataset. Comparisons between PGS analysis methods to an RNA-seq study ASSOCIATION STUDY OF HIGH- and the SBS testing revealed that PGS involving paired ovarian tumor samples DIMENSIONAL MicroRNA DATA provided smaller phenotype prediction pre- and post- treatment with carboplatin WITH REPEATED MEASURES errors and higher enrichment of pheno- taken from 11 ovarian cancer patients. type-related biological pathways than Zhe Fei*, University of Michigan Supplementary to the empirical compari- the SBS testing. Our extensive simula- son with real data, we simulated 1,000 Yinan Zheng, Northwestern University tions showed that PGS provided more paired and unpaired datasets under Wei Zhang, University of Illinois, Chicago accurate estimates and higher sensitivity numerous scenarios (i.e. varying level than the SBS testing with comparable of dependency, number of subjects, Justin B. Starren, Northwestern specificities. and effect size). To assess the type I University email: [email protected] error rates for the paired and unpaired Lei Liu, Northwestern University datasets, paired t-tests and two-sample Andrea A. Baccarelli, Harvard School t-tests were conducted. 
Results showed of Public Health COMPARISON OF PAIRED 35 common genes detected by six DE TUMOR-NORMAL METHODS analysis methods (p<0.05). Under the Yi Li, University of Michigan FOR DIFFERENTIAL EXPRESSION null hypothesis, the type I error rate was Lifang Hou, Northwestern University ANALYSIS OF RNA-Seq DATA conservative for paired designs that were Motivation: MicroRNAs (miRNAs) are Janelle R. Noel*, University of Kansas analyzed incorrectly using two-sample short single-stranded non-coding mol- Medical Center t-tests (0.001

270 ENAR 2015 | Spring Meeting | March 15–18 differences in DE was not affected by able reads of the gene and extracts more naturally defined regions (e.g., gene or either paired (rho=0.3, 0.5) or unpaired information on alternative splicing than regulatory), it is conceptually straight- designs (rho=0). This study demon- methods based on junction reads alone. forward to imagine achieving enhanced strates the differences in DE analysis Additionally, alternatively spliced exons statistical power by performing region- methods when selecting “associated” reflecting the same isoform(s) can be based test especially when there are genes, and the importance of using aggregated together, thus reducing the small or moderate signals in the region. proper statistical tests for RNA-seq data. number of subsequent tests. To detect Here, we propose FunMethyl, a functional DAS, we assume the exon-inclusion regression framework to perform associa- email: [email protected] levels follow a beta or an inflated-beta tion testing between multiple DNAm sites distribution, and test DAS by comparing in a region and a quantitative outcome. DETECTING DIFFERENTIAL ALTER- the parameters of the beta or inflated- Instead of collapsing DNAm variants or NATIVE SPLICING WITH BIOLOGICAL beta distributions between the two building a kernel matrix, we consider REPLICATES BETWEEN TWO groups through the use of a likelihood every individual’s DNAm levels in a region GROUPS FROM RNA-Seq DATA ratio test. Results based on simulated as a stochastic process and further data and the analysis of a real RNA-seq estimate the DNAm function in that region Yu Hu*, University of Pennsylvania dataset on human eyes demonstrate the using the proposed smoothing tech- Cheng Jia, University of Pennsylvania superior performance of our method as niques. Our results from both real data Dwight Stambolian, University of compared to several existing methods, based simulations and real data analysis Pennsylvania including Cuffdiff, DEXSeq, MATS, and clearly show that FunMethyl outperforms DSGSeq. single-site analysis across a wide spec- Mingyao Li, University of Pennsylvania trum of realistic scenarios. email: [email protected] Alternative splicing, a post-transcriptional email: [email protected] process that allows multiple messenger RNA (mRNA) isoforms to be produced FUNCTIONAL REGION-BASED by a single gene, is a regulated process TEST FOR DNA METHYLATION 57. CONTRIBUTED PAPERS: and a major mechanism for generating Kuan-Chieh Huang*, University New Developments in protein diversity. Detecting differen- of North Carolina, Chapel Hill Imaging tial alternative splicing (DAS) between two groups of samples (e.g., cases vs. Yun Li, University of North Carolina, controls) could provide an effective way Chapel Hill ESTIMATING DYNAMICS OF WHOLE- to discover disease susceptibility genes. Recent technological advances have BRAIN FUNCTIONAL CONNECTIVITY To detect DAS from RNA-Seq data, we allowed us to conduct large-scale IN RESTING-STATE fMRI BY FACTOR make use of information on known gene epigenome-wide association studies STOCHASTIC VOLATILITY MODEL structures and pre-estimated isoform rela- (EWASs). DNA methylation (DNAm) is Chee-Ming Ting*, Universiti Teknologi tive abundances. For each alternatively of particular interest because it is highly Malaysia, Malaysia spliced exon of a gene, we divide iso- dynamic and has been shown to be asso- Hernando Ombao, University forms into two categories depending on ciated with many complex human traits. 
of California, Irvine whether the exon is included or not. The Typically, DNAm level at hundreds of inclusion level of the alternatively spliced thousands of sites is measured and each Sh-Hussain Salleh, Universiti Teknologi exon is then estimated as the total relative of these sites is examined separately (i.e., Malaysia, Malaysia abundances of all isoforms with the exon single-site analysis). However, because Most studies of resting-state fMRI assume included. Our estimation utilizes all avail- of the correlation structure among the temporal stationarity of functional con- sites and because many of them fall in nectivity (FC) between distinct brain

Program & Abstracts 271 regions, identified using a constant KERNEL SMOOTHING GEE FOR A HIERARCHICAL BAYESIAN covariance model fitted on the entire LONGITUDINAL fMRI STUDIES MODEL FOR STUDYING THE time course. However, emerging evi- IMPACT OF STROKE ON BRAIN Yu Chen*, University of Michigan dence suggests that FC may exhibit MOTOR FUNCTION dynamic changes over time, arguably Min Zhang, University of Michigan Zhe Yu*, University of California, Irvine even more prominent during rest when Timothy D. Johnson, University Raquel Prado, University of California, mental activities are unconstrained. We of Michigan consider the problem of quantifying these Santa Cruz Longitudinal fMRI studies are beginning changes which may provide insights Erin Burke Quinlan, University to play an important role in understand- into the fundamental properties of brain of California, Irvine ing the development of the human brain. networks. Recent studies employed the In this setting random effects models Steven C. Cramer, University conventional sliding-window technique have convergence issues and, typically, of California, Irvine assuming locally stationary covariances generalized estimating equations (GEE) over short-time segments. In this work, Hernando Ombao, University are employed. However, due to the large we use multivariate stochastic volatility of California, Irvine number of multiple comparisons, GEE (MSV) model that allows a time-varying methods suffer from a lack of statistical Stroke is a disturbance in the blood covariance process to better capture power. To increase power, we propose supply to the brain which results in the the non-stationary dynamics of FC in a kernel smoothing generalized estimat- loss of brain functions, in particular resting-state fMRI. We further incorporate ing equation (KernGEE) method with a motor function. A study was conducted a latent factor model to achieve a reliable locally adaptive bandwidth to study the by neuroscientists to investigate the and computationally efficient estimation temporal trend of fMRI measurements impact of stroke on the motor-related of large-dimensional covariance matrices for each brain voxel. In order to address brain regions. In the study, functional MRI for analyzing evolving full-brain networks the spatial correlation among voxels (fMRI) data were collected from stroke with a large number of nodes. The and to increase power, we use a kernel patients and healthy controls while the stochastic volatility analysis is performed function that borrows information across subjects performed a simple hand motor on a lower-dimensional common factor neighboring voxels, spatially smoothing task. To explore the changes in the brain series instead of on the observations parameter estimates. The kernel band- due to stroke, we developed a hierarchi- directly. We propose a robust two-step width at each voxel is determined by cal Bayesian approach for modeling the estimation procedure by first estimat- leave-one-out cross validation. Therefore, multi-subject fMRI data. Our approach ing the factor model using the principal our method can provide a set of spatially simultaneously estimates activation component (PC) methods followed by smoothed estimators for each voxel with and connectivity at the group level, and the MSV model with quasi maximum increased efficiency. 
Meanwhile, correc- provides estimates for region/subject- likelihood (QML) using Kalman filter and tion for multiple comparisons is obtained specific hemodynamic response function expectation maximization (EM) algorithm. using Efron’s empirical null distribution (HRF) and condition-specific connectiv- The proposed method was evaluated on method. We apply our KernGEE method ity. Moreover, the use of spike and slab a resting-state fMRI dataset of 25 healthy to a longitudinal dataset studying brain priors allows for direct posterior inference subjects. mechanisms of risk for alcoholism and on the connectivity network. Using our email: [email protected] other substance abuse. We will also model, we observed several potential investigate the relationships between effects of stroke on the motor system activated brain regions and several function: executing the simple motor task covariates including IQ, age, gender, behavioral and personality variables. email: [email protected]

272 ENAR 2015 | Spring Meeting | March 15–18 requires more involvement of the higher ing the processes that drive the source frequency domain (via periodograms). level motor control regions in both the -- instead of merely recovering the source Our goal here is to develop a systematic stroke affected and unaffected hemi- signals. Moreover, we develop metrics for procedure for analyzing periodograms sphere compared to healthy subjects. connectivity between channels through collected across many trials (which con- We also noted increased communication latent sources by studying the properties sists of 1 second traces) during the entire within and between these secondary of the estimated mixing matrix. Our esti- resting state period. In particular, we use motor regions. These findings provide mation procedure pulls information from functional boxplots to extract information insight into different neural correlates of all trials using a two-stage approach: first, from the many trials [1]. First, we formed movement after stroke versus healthy we apply the second order blind identi- consistent estimators for the spectrum individuals. fication (SOBI) method to estimate the by smoothing the periodograms using a mixing matrix and second, we estimate bandwidth selected using the generalized email: [email protected] the parameters for latent sources using cross-validation of the Gamma deviance. maximum likelihood. Our methods will We then obtained descriptive statistics SOURCE ESTIMATION FOR also impose regularization to ensure from the smoothed periodograms using MULTI-TRIAL MULTI-CHANNEL sparsity. Our proposed methods have functional box plots which provide the EEG SIGNALS: A STATISTICAL been evaluated on both simulated data median and outlying curves. The perfor- APPROACH and EEG data obtained from a motor mance of functional boxplot is compared learning study. This project is in col- with the classical point-wise boxplots Yuxiao Wang*, University of California, laboration with the Space-Time Modeling and functional mixed effects models in a Irvine group at UC Irvine. simulation study and the EEG data. More- Hernando Ombao, University over, we explored the spatial variation e-mail: [email protected] of California, Irvine of the spectral power for the alpha and Raquel Prado, University of California, beta frequency bands by applying the Santa Cruz AN EXPLORATORY DATA ANALYSIS surface boxplot method on periodograms OF EEGs TIME SERIES: A FUNC- computed from the many resting-state Electroencephalography (EEG) has been TIONAL BOXPLOTS APPROACH EEG traces. This work is in collaboration widely used in studying the dynam- with the Space-Time Group at UC Irvine. ics in human brains due to its relatively Duy Ngo*, University of California, Irvine [1] Sun, Y., and Genton, M.G. (2011), high temporal resolution (in millisec- Hernando Ombao, University “Functional Boxplots,” Journal of Com- ond). EEGs are indirect measurements of California, Irvine putational and Graphical Statistics, 20, of neuronal sources. Estimation of the Marc G. Genton, University of Science 316-334. underlying sources is challenging due and Technology to the ill-posed inverse problem. EEGs e-mail: [email protected] are typically modeled as a linear mix- Ying Sun, King Abdullah University ing of the underlying sources. Here, we of Science and Technology consider source modeling and estimation We conduct exploratory data analysis on for multi-channel EEG data recorded over electroencephalograms (EEG) data to multiple trials. 
We propose parametric study the brain’s electrical activity during models to characterize the latent source resting state. The standard approaches signals and develop methods for estimat- to analyzing EEG are classified either into the time domain (ARIMA modeling) or the

A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL (BFLCRM) FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER'S DISEASE
Eunjee Lee*, University of North Carolina, Chapel Hill
Hongtu Zhu, University of North Carolina, Chapel Hill
Dehan Kong, University of North Carolina, Chapel Hill
Yalin Wang, Arizona State University
Kelly Sullivan Giovanello, University of North Carolina, Chapel Hill
Joseph Ibrahim, University of North Carolina, Chapel Hill

The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI1) and the optimal early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that the conversion time to AD can be accurately predicted by functional covariates, including hippocampus surface morphology, and scalar covariates, including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.

email: [email protected]

58. CONTRIBUTED PAPERS: Latent Variable and Principal Component Models

ESTIMATION OF BRANCHING CURVES IN THE PRESENCE OF SUBJECT SPECIFIC RANDOM EFFECTS
Angelo Elmi*, The George Washington University
Sarah J. Ratcliffe, University of Pennsylvania
Wensheng Guo, University of Pennsylvania

Branching curves are a technique for modeling curves that change trajectory at a change (branching) point. Currently, the estimation framework is limited to independent data, and smoothing splines are used for estimation. Here, extension of the branching curve framework to the longitudinal setting, where the branching point varies by subject, will be discussed. If the branching point is modeled as a random effect, then the longitudinal branching curve framework is a Semiparametric Nonlinear Mixed Effects Model. Given existing issues with using random effects within a smoothing spline, we express the model as a B-spline Based Semiparametric Nonlinear Mixed Effects Model. Simple, clever smoothness constraints are enforced on the B-splines at the change point. The method is applied to Women's Health data where we model the shape of the labor curve (cervical dilation measured longitudinally) before and after treatment with oxytocin (a labor stimulant).

email: [email protected]

A LATENT VARIABLE MODEL FOR ANALYZING CORRELATED ORDERED CATEGORICAL DATA
Ali Reza Fotouhi*, University of The Fraser Valley

In many statistical studies in medicine, clinical trials, and agriculture the responses are recorded on an ordinal scale. The ordered responses may be clustered, and the subjects within the clusters may be positively correlated. A commonly used method to accommodate this correlation is to add a random component to the linear predictor of each clustered response. Moreover, some unobservable characteristics may have a significant effect on the categorization of the responses. In this article we introduce a latent variable model for analyzing ordered categorical data in which the random effects are not a nuisance but are of explanatory interest. We introduce two correlated random effects in the latent variable model to control for cluster and category variability. Four commonly used models for correlated ordered categorical data (probit, logistic, complementary log-log, and log-log) are special cases of this latent-variable model. We validate the proposed model through a simulation study and use it to analyze the data obtained from twelve populations of strawberries in a randomized block experiment.

email: [email protected]
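As an illustration of the latent-variable construction described in the preceding abstract, the short Python sketch below simulates clustered ordinal responses by thresholding a latent continuous variable that carries two correlated cluster-level random effects; with a standard normal latent error this reduces to the probit special case. All parameter values, cutpoints and the random-effect structure here are hypothetical and are not the authors' exact specification.

    import numpy as np

    rng = np.random.default_rng(0)
    n_clusters, n_per = 50, 10
    cutpoints = np.array([-1.0, 0.0, 1.2])             # thresholds defining 4 ordered categories

    # two correlated cluster-level random effects (e.g., intercept and slope variability)
    cov = np.array([[1.0, 0.5], [0.5, 0.8]])
    u = rng.multivariate_normal([0.0, 0.0], cov, size=n_clusters)

    x = rng.normal(size=(n_clusters, n_per))            # a covariate
    eta = 0.7 * x + u[:, [0]] + u[:, [1]] * x           # linear predictor with the random effects
    latent = eta + rng.normal(size=eta.shape)           # standard normal error: probit special case
    y = np.digitize(latent, cutpoints)                  # ordinal response in {0, 1, 2, 3}

    print(np.bincount(y.ravel(), minlength=4))          # category frequencies

Replacing the normal latent error with a logistic or extreme-value error gives the logistic, complementary log-log and log-log special cases mentioned above.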

COMPOSITE LARGE MARGIN CLASSIFIERS WITH LATENT SUBCLASSES FOR HETEROGENEOUS BIOMEDICAL DATA
Guanhua Chen*, Vanderbilt University
Yufeng Liu, University of North Carolina, Chapel Hill
Michael R. Kosorok, University of North Carolina, Chapel Hill

High dimensional classification problems are prevalent in a wide range of modern scientific applications. Despite a large number of candidate classification techniques available to use, practitioners often face a dilemma of the choice between linear and general nonlinear classifiers. Specifically, simple linear classifiers have good interpretability, but may have limitations in handling data with complex structures. In contrast, general nonlinear kernel classifiers are more flexible but may lose interpretability and have a higher tendency for overfitting. In this paper, we consider data with potential latent subgroups in the classes of interest. We propose a new method, namely the Composite Large Margin Classifier (CLM), to address the issue of classification with latent subclasses. The CLM aims to find three linear functions simultaneously: one linear function to split the data into two parts, with each part being classified by a different linear classifier. Our method has comparable prediction accuracy to a general nonlinear kernel classifier and it maintains the interpretability of traditional linear classifiers. We demonstrate the competitive performance of the CLM through comparisons with several existing linear and nonlinear classifiers by Monte Carlo experiments. Analyzing an Alzheimer's disease classification problem using CLM not only provides lower classification error in discriminating cases and controls, but also identifies subclasses in controls which are more likely to develop into disease in the future.

email: [email protected]

EVALUATION OF COVARIATE-SPECIFIC ACCURACY OF BIOMARKERS WITHOUT A GOLD STANDARD
Zheyu Wang*, Johns Hopkins University
Xiao-Hua Zhou, University of Washington

In recent years, advances in biomarker discovery have re-energized the field of diagnostic medicine, as researchers continuously strive to obtain more convenient, economical, accurate, and/or timely diagnoses by adding or combining various biomarkers to create novel diagnostic procedures. Accordingly, it is important to assess the accuracy of biomarkers. There are three major issues in biomarker evaluation: 1) The underlying medical condition, or the gold standard, can be unknown due to time and cost constraints, lack of biotechnology, or concerns over the invasive nature of a diagnostic procedure. This issue is becoming more common and pressing with the growing interest and emphasis on preclinical diagnosis and prevention. 2) Compared with traditional diagnostic tests, biomarker levels are more easily affected by patients' characteristics. Therefore diagnosis based on biomarkers needs to be personalized. 3) With the improvement in clinical practice, there is a need to go beyond traditional binary disease status and incorporate an ordinal gold standard. In this talk, we will propose a finite mixture model approach to address these issues.

email: [email protected]

LINEAR MIXED MODEL WITH UNOBSERVED INFORMATIVE CLUSTER SIZE: APPLICATION TO A REPEATED PREGNANCY STUDY
Ashok K. Chaurasia*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Danping Liu, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Paul S. Albert, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

Modeling with informative cluster size is a common issue in analyzing clustered data, where the outcome is associated with cluster size. This paper addresses the informative cluster size problem in linear mixed models when the cluster size is censored on all subjects. This problem is motivated by the NICHD Consecutive Pregnancy Study, where the objective is to study the relationship between birthweight and parity. It is hypothesized that the birthweight profile is associated with the number of births over a woman's lifetime, resulting in an informative cluster size. However, in this study, a woman's lifetime number of births is not observed (censored at the end of the study window). In this paper we develop a pattern mixture model to account for informative cluster size by treating the unobserved cluster size (lifetime number of births) as a latent variable. We compare this approach with the simple alternatives where we use the observed number of births at the end of the study as the cluster size. For estimating the population mean trajectory, we show theoretically, with simulations and in the real data application, that the simple approach can serve as a reliable approximation to the latent variable approach.

email: [email protected]
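The bias that informative cluster size can induce, and why the choice of analysis matters, can be seen in a few lines. The Python sketch below is purely illustrative (made-up parameter values, no censoring of the cluster size, and not the authors' pattern mixture model): the number of births per woman depends on her random effect, so a naive mean over all observations is pulled away from the population mean, while a one-summary-per-woman estimate is not.

    import numpy as np

    rng = np.random.default_rng(1)
    n_women = 2000
    b = rng.normal(0.0, 1.0, n_women)                   # woman-specific random effect
    size = 1 + rng.poisson(np.exp(0.5 - 0.6 * b))       # cluster size depends on the random effect
    y = [3.3 + b[i] + rng.normal(0, 0.5, size[i]) for i in range(n_women)]   # outcomes per woman

    naive_mean = np.concatenate(y).mean()               # implicitly weights women by cluster size
    cluster_mean = np.mean([yi.mean() for yi in y])     # one summary per woman
    print(round(naive_mean, 3), round(cluster_mean, 3), "target = 3.3")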

NESTED PARTIALLY-LATENT CLASS MODELS (npLCM) FOR ESTIMATING DISEASE ETIOLOGY IN CASE-CONTROL STUDIES
Zhenke Wu*, Johns Hopkins University
Scott L. Zeger, Johns Hopkins University

The Pneumonia Etiology Research for Child Health (PERCH) study attempts to infer the distribution of pneumonia-causing bacterial or viral pathogens in developing countries from measurements outside of the lung. Recent developments in test standardization make it possible to collect multiple specimens to detect a large number of pathogens at once, with varying degrees of etiologic relevance and measurement precision. With these data, researchers seek to estimate the population fraction of cases caused by each pathogen, and to develop algorithms to assist clinical diagnosis when presented with complex data on an individual case. We describe a latent variable model to address these two analytic goals using data from a case-control design. We assume each observation is a draw from a mixture model in which each component represents one pathogen. Conditional dependence among multivariate binary measurements on a single subject is induced by nesting subclasses within each disease class. Measurement precision can be estimated using the control sample, for whom the etiologic class is known. We use stick-breaking priors on the subclass weights to estimate the population and individual etiologic distributions, which are averaged across models indexed by different numbers of subclasses. Assessment of model fit and individual diagnosis is done using posterior samples drawn by Gibbs sampling. We demonstrate the method's operating characteristics via a simulation study tailored to the motivating scientific problem and illustrate the model with a detailed analysis of PERCH study data.

email: [email protected]

59. CONTRIBUTED PAPERS: Developments and Applications of Clustering, Classification, and Dimension Reduction Methods

A SEMIPARAMETRIC MODEL OF ESTIMATING NON-CONSTANT FACTOR LOADINGS
Zhenzhen Zhang*, University of Michigan
Brisa Sanchez, University of Michigan

Factor analysis is a commonly used method in modeling multivariate exposure data. Typically, the measurement model is assumed to have constant factor loadings. We propose models that relax this assumption by using penalized splines to estimate factor loadings that change with other covariates. We implement two different approaches to penalizing the smoothing splines, the generalized cross validation criterion and the random effects approach, and incorporate them into the EM algorithm through Newton-Raphson and Monte Carlo methods. The likelihood ratio test is used to test whether a factor loading is constant.

email: [email protected]

SEPARABLE SPATIO-TEMPORAL PRINCIPAL COMPONENT ANALYSIS
Lei Huang*, Johns Hopkins University
Philip T. Reiss, New York University School of Medicine
Luo Xiao, Johns Hopkins University
Vadim Zipunnikov, Johns Hopkins University
Martin A. Lindquist, Johns Hopkins University
Ciprian Crainiceanu, Johns Hopkins University

Current brain imaging studies often acquire large images that are observed over time. Examples of such studies include BOLD fMRI, DCE-MRI and dynamic PET. To model such data we introduce a class of separable spatio-temporal processes using explicit latent process modeling. To account for the size and spatio-temporal structure of the data, we extend principal component analysis to achieve dimensionality reduction at the individual process level. We introduce necessary identifiability conditions for each model and develop scalable estimation procedures. The method is motivated by and applied to an fMRI study designed to analyze the relationship between pain and brain activity.

email: [email protected]
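One common starting point for exploiting separability of space and time is a singular value decomposition of a subject's unfolded voxel-by-time matrix, which yields paired spatial and temporal components. The numpy sketch below shows only this elementary step on simulated data with toy dimensions; the latent-process formulation, identifiability conditions and scalable estimation described above go well beyond it.

    import numpy as np

    rng = np.random.default_rng(2)
    V, T = 500, 120                                      # voxels x time points (toy sizes)
    spatial = rng.normal(size=(V, 2))                    # two latent spatial maps
    temporal = rng.normal(size=(2, T))                   # two latent time courses
    Y = spatial @ temporal + 0.5 * rng.normal(size=(V, T))   # separable signal plus noise

    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    spatial_pcs, temporal_pcs = U[:, :2], Vt[:2, :]      # leading spatial and temporal components
    print(round((s[:2] ** 2).sum() / (s ** 2).sum(), 3))     # share of variation captured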

PENALIZED CLUSTERING USING A HIDDEN MARKOV RANDOM FIELD MODEL: DETECTING STATE-RELATED CHANGES IN BRAIN CONNECTIVITY
Yuting Xu*, Johns Hopkins University
Martin Lindquist, Johns Hopkins University

In statistical analyses of task-based fMRI time series, the activity in a set of regions of interest (ROIs) changes with the experimental process. We assume that the activity in the ROIs can be modeled as Gaussian random vectors, with a mean and correlation matrix that change between different states as the task progresses. In this work, we introduce a penalized Gaussian hidden Markov random field model to detect changes in brain connectivity and simultaneously achieve shrinkage parameter estimation. The clustering assignment, or the hidden state, is obtained via MRF-MAP estimation, which takes into account the time-dependent structure within and across subjects. We incorporate several popular sparse precision matrix estimation algorithms to achieve better variable selection and parameter estimation. The method is applied to various simulation data as well as fMRI data from an anxiety study, illustrating the efficacy of the proposed method compared to alternative methods.

email: [email protected]

CLUSTERING OF BRAIN SIGNALS USING THE TOTAL VARIATION DISTANCE
Carolina Euán*, Centro de Investigación en Matemáticas (CIMAT), A.C.
Hernando Ombao, University of California, Irvine
Joaquin Ortega, Centro de Investigación en Matemáticas (CIMAT), A.C.
Pedro Alvarez-Esteban, Universidad de Valladolid, Spain

We are interested in studying the spatial structure of brain signals during a learning motor task. Our research is based on the spectral analysis of the electroencephalogram (EEG) traces recorded by our neurologist collaborator. The EEG data were collected across three different phases of rest and practice. Our goal is to develop a procedure for detecting and characterizing differences in spatial variation of the EEG power between the different learning phases. At each channel we estimate the spectrum using smoothed periodograms. These indicate the distribution of power across different frequency bands in each channel. Our principal tool is the Total Variation (TV) distance, which is a similarity measure in the clustering algorithm. Our procedure essentially clusters time series at different channels that share similar spectral structures (i.e., similar smoothed periodograms). Using our proposed procedure we will be able to cluster channels that behave similarly within each learning phase of the experiment. This work has been in collaboration with the Space-Time Modeling group at UC Irvine.

email: [email protected]

IMPACT OF DATA REDUCTION ON ACCELEROMETER DATA IN CHILDREN
Daniela Sotres-Alvarez*, University of North Carolina, Chapel Hill
Yu Deng, University of North Carolina, Chapel Hill
Guadalupe X. Ayala, San Diego State University
Mercedes Carnethon, Northwestern University
Alan M. Delamater, University of Miami
Carmen R. Isasi, Albert Einstein College of Medicine
Sonia Davis, University of North Carolina, Chapel Hill
Kelly R. Evenson, University of North Carolina, Chapel Hill

Accelerometry, typically worn by participants for one week, provides an indicator of physical activity and sedentary behavior through measured accelerations. A challenge with accelerometer data is the difficulty of distinguishing non-wear from sedentary behavior, since theoretically both can register 0 counts per epoch (e.g. 15 seconds). Current practice defines non-wear (which becomes missing data) by a certain number of consecutive zero counts, and summarizes accelerometer data only for individuals with a minimum number of days each with a minimum number of hours of wear (e.g. at least 3 days of 10 hours/day). This approach does not make full use of the rich information contained in the data and might not effectively handle the missing data on incomplete days. We compared various methods to account for missing data using data from the SOL Youth Study (1,400 children ages 8-16 y). The percentage of complete data from the standard approach varied from 70.6% of children with 4+ wear days (10 hrs/day) including a weekend day, to 89.8% with 3+ wear days (8 hrs/day) without necessarily including a weekend day. We compared the bias and precision of estimates for physical activity and sedentary behavior between the standard approach, imputation methods, and Wavelet-based functional mixed models.

email: [email protected]
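The standard non-wear rule referred to above is easy to state in code: flag any sufficiently long run of consecutive zero counts as non-wear, then count the remaining wear time per day. The Python sketch below uses illustrative thresholds (60 consecutive zero one-minute epochs, a 10-hour valid-day rule) and simulated counts; it is not the SOL Youth Study algorithm.

    import numpy as np

    def nonwear_mask(counts, min_zero_run=60):
        """Flag epochs that belong to runs of at least min_zero_run consecutive zero counts."""
        mask = np.zeros(len(counts), dtype=bool)
        run_start = None
        for i, c in enumerate(list(counts) + [1]):       # sentinel closes a trailing run
            if c == 0 and run_start is None:
                run_start = i
            elif c != 0 and run_start is not None:
                if i - run_start >= min_zero_run:
                    mask[run_start:i] = True
                run_start = None
        return mask

    rng = np.random.default_rng(3)
    counts = rng.poisson(5, size=1440)                   # one day of 1-minute epochs
    counts[300:800] = 0                                   # a long block of zeros (device likely off)
    wear_hours = (~nonwear_mask(counts)).sum() / 60
    print(round(wear_hours, 1), "wear hours; valid day (>= 10h):", wear_hours >= 10)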

LEARNING LOGIC RULES FOR DISEASE CLASSIFICATION: WITH AN APPLICATION TO DEVELOPING CRITERIA SETS FOR THE DIAGNOSTIC AND STATISTICAL MANUAL OF MENTAL DISORDERS
Christine M. Mauro*, Columbia University
Donglin Zeng, University of North Carolina, Chapel Hill
M. Katherine Shear, Columbia University
Yuanjia Wang, Columbia University

In psychiatry, clinicians rely on criteria sets from the Diagnostic and Statistical Manual of Mental Disorders to make diagnoses. Each criteria set has several symptom domains. In order to be diagnosed, an individual must meet the minimum number of symptoms required for each domain. Several approaches to determine these minimum values are proposed. In simple scenarios, an exhaustive search is feasible. For more complicated scenarios, another approach is necessary. Given disease status and the count of symptoms present in each domain, a linear discriminant function is fit within each domain. Since one must meet the criteria for all domains, a positive diagnosis is only issued if the prediction in each domain is positive. The overall decision rule is therefore the intersection of all the domain-specific rules. We propose two algorithms, SVM Iterative and Logistic Iterative, to fit this model. The proposed methods are flexible enough to be adapted to complicated settings including high-dimensional data, other logic structures, or non-linear discriminant functions. In simulations, the Exhaustive Search (when applicable), SVM Iterative, and Logistic Iterative perform well when compared with the oracle rule. The methods are then applied to construct a criteria set for Complicated Grief, a new psychiatric disorder.

email: [email protected]

CHARACTERIZING TYPES OF PHYSICAL ACTIVITY: AN UNSUPERVISED WAY
Jiawei Bai*, Johns Hopkins University
Luo Xiao, Johns Hopkins University
Vadim Zipunnikov, Johns Hopkins University
Ciprian M. Crainiceanu, Johns Hopkins University

Predicting the type of activity performed by human subjects using accelerometry data is crucial to many different areas of research. Currently, supervised learning methods dominate this field since they provide high prediction accuracy when the activity types of interest are known. However, in free-living circumstances the activity types of interest are often unclear. We proposed an unsupervised learning method to extract the key basic components (movelets) of the acceleration time series. These key movelets acted like building blocks which constructed the whole signal of activity. We further investigated the interpretation of these key movelets and found that most of them have signal patterns very close to some important basic activity types, such as standing, sitting, lying and walking. Using this method, we could avoid manually defining types or categories of activity, and build subject-specific dictionaries of the key components for the subjects. This allows for better and deeper comparison of the physical activity status of different subjects.

email: [email protected]

SIMULTANEOUS MODEL-BASED CLUSTERING AND VARIABLE SELECTION: EXTENSION TO MIXED-DISTRIBUTION DATA
Katie Evans, Dupont
Tanzy M. T. Love*, University of Rochester
Sally W. Thurston, University of Rochester

Current model-based clustering methods, such as LatentGold (Vermunt & Magidson, 2005) and MultiMix (Hunt & Jorgensen, 1999), can accommodate data with variables of mixed-distributional forms. In these methods, statistical criteria guide the manual selection of relationships between clustering variables, but not the selection of variables important to clustering. Clustering variable selection procedures, such as Raftery & Dean (2006) and Maugis et al. (2009), are limited to data consisting of normally distributed variables. Our new framework for model-based clustering on data with continuous and discrete variables extends the cluster variance structure framework set forth by Fraley and Raftery (1999). In modeling how each variable contributes to cluster determination, we allow for relations within and between the continuous and discrete variables (termed mixClust). We also modify and extend existing likelihood-based variable selection procedures to accommodate data with variables of mixed-distributional forms (ESR), requiring only at least one continuous variable. Simulation study results show desirable properties of our method when applied to data with variables of mixed-distributional forms, and improved performance over existing methods when applied to only normally distributed data. Applying mixClust and ESR to prostate cancer data generates subgroups with different responses to treatment.

email: [email protected]
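A minimal version of model-based clustering with one continuous and one binary variable can be written as a two-component mixture fit by EM, assuming conditional independence of the two variables within a component. The Python sketch below does only that, with made-up data; the mixClust/ESR framework above additionally models relations among variables and selects which variables matter for clustering.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    n = 600
    z = rng.random(n) < 0.4                              # true component labels
    x = np.where(z, rng.normal(2.0, 1.0, n), rng.normal(-1.0, 1.0, n))      # continuous variable
    b = np.where(z, rng.random(n) < 0.8, rng.random(n) < 0.2).astype(float)  # binary variable

    pi, mu, sd, p = 0.5, np.array([-0.5, 0.5]), np.array([1.0, 1.0]), np.array([0.3, 0.7])
    for _ in range(100):
        # E-step: responsibilities under within-component conditional independence
        l1 = pi * norm.pdf(x, mu[1], sd[1]) * p[1] ** b * (1 - p[1]) ** (1 - b)
        l0 = (1 - pi) * norm.pdf(x, mu[0], sd[0]) * p[0] ** b * (1 - p[0]) ** (1 - b)
        r = l1 / (l1 + l0)
        # M-step: weighted parameter updates
        pi = r.mean()
        for k, w in enumerate([1 - r, r]):
            mu[k] = np.sum(w * x) / w.sum()
            sd[k] = np.sqrt(np.sum(w * (x - mu[k]) ** 2) / w.sum())
            p[k] = np.sum(w * b) / w.sum()

    print(np.round(mu, 2), np.round(p, 2), round(pi, 2))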

60. CONTRIBUTED PAPERS: Survival Analysis: Methods Development and Applications

PREDICTIVE MODEL AND DYNAMIC PREDICTION FOR RECURRENT EVENTS WITH DEPENDENT TERMINATION
Li-An Lin*, University of Texas Health Sciences Center at Houston
Sheng Luo, University of Texas Health Sciences Center at Houston
Barry Davis, University of Texas Health Sciences Center at Houston

In clinical trials of hypertension medications, cardiovascular disease events frequently recur over the study follow-up times. Recently, predictive models have been routinely used to assess risk in clinical trials. Patients are interested in knowing their risk of disease recurrence and death as their conditions, such as risk factors and event history, change. However, modeling event history in a way that would facilitate prediction is still underdeveloped. In this article, we propose a predictive model based on generalized renewal processes and a joint frailty model in which the impact of past events on further events and individual heterogeneity are accounted for. The proposed model is assessed by the receiver operating characteristic curve, which is derived using a Monte Carlo approach. Simulation studies demonstrate that the proposed methods perform well in practical situations. After the model has been fitted to the training dataset, one can estimate a new patient's future survival probability for the next disease event and death conditional on current information. Finally, the proposed tools are applied to the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT).

email: [email protected]

AN EXTENDED SELF-TRIGGERING MODEL FOR RECURRENT EVENT DATA
Jung In Kim*, University of North Carolina, Chapel Hill
Feng-Chang Lin, University of North Carolina, Chapel Hill
Jason Fine, University of North Carolina, Chapel Hill

Recurrent event data frequently appear in longitudinal studies when study subjects experience more than one event during the observation period. In reality, one may observe subsequent events influenced by previous events; hence, the triggering scheme of event occurrence should be considered when modeling such data. In this paper, we extend the Cox proportional hazards model with time-varying information on previous events to enhance model fit and prediction. Parameter estimation and statistical inference can be easily achieved via a partial likelihood function. A joint statistical test is provided to assess the existence of effects from previous events. We demonstrate our approach via comprehensive simulation studies and cystic fibrosis registry data on chronic pseudomonas infections. Notably, our model provides better prediction than currently existing ones.

email: [email protected]

A PAIRWISE-LIKELIHOOD AUGMENTED ESTIMATOR FOR THE COX MODEL UNDER LEFT-TRUNCATION
Fan Wu*, University of Michigan
Sehee Kim, University of Michigan
Jing Qin, National Institute of Allergy and Infectious Diseases, National Institutes of Health
Yi Li, University of Michigan

Survival data collected from prevalent cohorts are subject to left-truncation. The conventional conditional approach using the Cox model disregards the information in the marginal likelihood of the truncation time and thus can be inefficient. On the other hand, the stationary assumption under length-biased sampling (LBS) methods to incorporate the marginal information can lead to biased estimation when it is violated. In this paper, we propose a semiparametric estimation method by augmenting the Cox partial likelihood with a pairwise likelihood, by which we eliminate the unspecified truncation distribution in the marginal likelihood, yet retain the information about the regression coefficients and the baseline hazard. Exploring self-consistency of the estimator, we give a fast algorithm to solve for the regression coefficients and the cumulative hazard simultaneously. The proposed estimator is shown to be consistent and asymptotically normal with a sandwich-type consistent variance estimator. Simulation studies show a substantial efficiency gain in both the regression coefficients and the cumulative hazard over Cox model estimators, and that the gain is comparable to LBS methods when the stationary assumption holds. For illustration, we apply the proposed method to the RRI-CKD data.

email: [email protected]
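For context, the conventional conditional approach that the abstract seeks to improve upon only requires that each subject enter the risk set at its truncation time. The Python/scipy sketch below maximizes that conditional Cox partial likelihood on simulated left-truncated data with a single covariate and no censoring; it is a baseline illustration, not the proposed pairwise-likelihood augmented estimator.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    n = 400
    x = rng.normal(size=n)
    t = rng.exponential(np.exp(-0.5 * x))                # true log hazard ratio = 0.5
    entry = rng.uniform(0, 0.5, n)                        # left-truncation (entry) times
    keep = t > entry                                      # only subjects surviving past entry are sampled
    x, t, entry = x[keep], t[keep], entry[keep]

    def neg_log_partial_likelihood(beta):
        eta = beta[0] * x
        ll = 0.0
        for i in range(len(x)):                           # every retained subject has an observed event
            at_risk = (entry < t[i]) & (t >= t[i])        # delayed entry defines the risk set
            ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
        return -ll

    print(round(minimize(neg_log_partial_likelihood, x0=[0.0]).x[0], 3))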

RANK-BASED TESTING BASED ON CROSS-SECTIONAL SURVIVAL DATA WITH OR WITHOUT PROSPECTIVE FOLLOW-UP
Kwun Chuen Gary Chan*, University of Washington
Jing Qin, National Institute of Allergy and Infectious Diseases, National Institutes of Health

Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information is available from backward recurrence times, and is frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared to conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up.

email: [email protected]

COMPUTATION EFFICIENT MODELS FOR FITTING LARGE-SCALE SURVIVAL DATA
Kevin He*, University of Michigan
Yanming Li, University of Michigan
Ji Zhu, University of Michigan
Yi Li, University of Michigan

The time-varying effects model is a flexible and powerful tool for modeling the dynamic changes of covariate effects. In survival analysis, however, time-varying effects are often difficult to model since the modifying variable, time, is part of the response instead of just being a covariate. The computational burden increases quickly as the sample size grows, and analyses with relatively large sample sizes overwhelm existing statistical methods and software. We propose a novel application of a Quasi-Newton method with an inexact line search procedure to model the dynamic changes of regression coefficients in survival analysis. The algorithm converges superlinearly and is computationally efficient. Numerical examples show that the computational cost of our algorithm remains low even for large data sets. Thus, the proposed methods are applicable to large-scale data for which the application of existing methods is impractical or fails completely. The methods are applied to national kidney transplant data to study the impact of potential risk factors on post-transplant survival.

email: [email protected]

MULTIPLE IMPUTATION FOR INTERVAL CENSORED DATA WITH TIME-DEPENDENT AUXILIARY VARIABLES USING INCIDENT AND PREVALENT COHORT DATA
Wen Ye*, University of Michigan
Douglas Schaubel, University of Michigan

Due to the rarity of the disease and the high transplant rate in biliary atresia patients, the true incidence of portal hypertension (PHT) related complications in the absence of liver transplantation is unknown. Motivated by the need to understand the true clinical burden of PHT in these patients, we developed a risk score based multiple imputation method to combine data from incident and prevalent cohort studies to overcome data scarcity, and to recover interval-censored and dependently right-censored times of clinical PHT onset. The risk scores used for imputation are calculated using BLUP estimates of the random effects, derived from prognostic factors and based on longitudinal models for the time-varying prognostic factors. To incorporate the full uncertainty in the imputes, we include a bootstrap procedure and replace the BLUP estimates with draws from their posterior distributions. In addition, we use a weighted Kaplan-Meier estimator to adjust for survival-selection in the prevalent component of the sample.

email: [email protected]
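Whatever imputation engine is used, the final step of a multiple imputation analysis is the familiar combining rule. The short Python sketch below applies Rubin's rules to hypothetical point estimates and variances from M completed-data analyses; the risk-score imputation and bootstrap steps described above are not shown.

    import numpy as np

    est = np.array([0.42, 0.37, 0.45, 0.40, 0.39])        # hypothetical completed-data estimates
    var = np.array([0.010, 0.012, 0.011, 0.009, 0.010])   # their within-imputation variances
    M = len(est)

    qbar = est.mean()                                     # combined point estimate
    ubar = var.mean()                                     # average within-imputation variance
    bvar = est.var(ddof=1)                                # between-imputation variance
    total_var = ubar + (1 + 1 / M) * bvar                 # Rubin's total variance
    print(round(qbar, 3), round(np.sqrt(total_var), 3))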

MODEL FLEXIBILITY FOR REGRESSION ANALYSIS OF SURVIVAL DATA WITH INFORMATIVE INTERVAL CENSORING
Tyler Cook*, University of Missouri, Columbia
Jianguo Sun, University of Missouri, Columbia

One problem that researchers face when analyzing survival data is how to handle the censoring distribution. It is often assumed that the observation process generating the censoring is independent of the event time of interest and can then effectively be ignored, but this assumption is clearly not always realistic. Unfortunately, one cannot generally test for independent censoring without additional assumptions or information. Therefore, the researcher is faced with a choice between using methods designed for informative or noninformative censoring. This project investigates the effectiveness of two methods developed for the analysis of informative case I and case II interval censored data under both types of censoring. Extensive simulation studies indicate that the methods produce unbiased results in the presence of both informative and noninformative censoring. The efficiency of the informative censoring methods is then compared with approaches created to handle noninformative censoring. The results of these simulation studies can provide guidelines for deciding between models when facing a practical problem where one is unsure about the dependence of the censoring distribution.

email: [email protected]

61. ORAL POSTERS: GWAS and Meta Analysis of Genetic Studies

61a. HYPOTHESIS TESTING FOR SPARSE SIGNALS IN GENETIC ASSOCIATION STUDIES
Xihong Lin*, Harvard University

email: [email protected]

61b. META-ANALYSIS OF GENE-ENVIRONMENT INTERACTION IN CASE-CONTROL STUDIES BY ADAPTIVELY USING GENE-ENVIRONMENT CORRELATION
Bhramar Mukherjee*, University of Michigan
Shi Li, University of Michigan
John D. Rice, University of Michigan
Jeremy M. G. Taylor, University of Michigan
Heather Stringham, University of Michigan
Michael L. Boehnke, University of Michigan

There has been a significant volume of literature on using gene-environment (G-E) independence to enhance power for testing gene-environment interaction (GEI) in case-control studies. However, there is little work thus far studying the role of G-E independence in a meta-analysis setting where the assumption could vary across studies. In this paper, we propose an appropriate adaptation of the empirical-Bayes (EB) type shrinkage estimator previously proposed by Mukherjee and Chatterjee (2008) to a meta-analysis context. The retrospective likelihood framework for inference is used to derive an adaptive combination of estimators obtained under the constrained model (assuming G-E independence) and the unconstrained model (without any assumptions), with weights determined by using information on G-E association parameters derived from multiple studies/cohorts. Our simulation studies indicate that this newly proposed estimator has improved mean-squared-error (MSE) properties compared with the standard alternative of using the inverse variance weighted estimator that combines study-specific constrained, unconstrained or EB estimators. The results are illustrated by analyzing data from a study of Type 2 diabetes, with six different case-control studies contributing to the meta-analysis. We considered the interaction between genetic markers on the obesity-related FTO gene and environmental factors, with Type 2 diabetes as the case-control outcome of interest.

email: [email protected]
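The shrinkage idea behind the estimator described above can be sketched generically: move the unconstrained interaction estimate toward the constrained (G-E independence) estimate, with a weight that grows with the evidence of G-E association. The Python function below shows one common form of such a weight with hypothetical inputs; the Mukherjee-Chatterjee estimator and its meta-analytic adaptation are more elaborate than this.

    import numpy as np

    def eb_combine(beta_uml, beta_cml, ge_logor, ge_var):
        """Shrink the unconstrained (UML) estimate toward the constrained (CML) one."""
        w = ge_logor ** 2 / (ge_logor ** 2 + ge_var)      # weight grows with evidence of G-E association
        return w * beta_uml + (1 - w) * beta_cml

    print(round(eb_combine(0.30, 0.18, ge_logor=0.05, ge_var=0.02), 3))   # shrinks toward 0.18
    print(round(eb_combine(0.30, 0.18, ge_logor=0.60, ge_var=0.02), 3))   # stays near 0.30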

61c. PARTIAL LINEAR VARYING INDEX COEFFICIENT MODEL FOR GENE-ENVIRONMENT INTERACTIONS
Xu Liu*, Michigan State University
Yuehua Cui, Michigan State University

Gene-environment interactions play key roles in many complex diseases. In this paper, we propose a partial linear varying index coefficient model (PLVICM) to assess how multiple environmental factors act jointly to modify individual genetic risk on complex disease. Our model is generalized from the varying index coefficient model, while discrete variables are admitted as the linear part. Therefore, PLVICM allows us to study the nonlinear interaction between grouped continuous environments and genes as well as the interaction between the linear form of discrete environments and genes simultaneously. We derive a profile method to estimate the parametric parameters and a B-spline backfitted kernel method to estimate the nonlinear functions. The consistency and asymptotic normality properties of the parametric and nonparametric estimates are established under some regularity conditions. Hypothesis tests for the parametric coefficients and nonparametric functions are conducted. Results show that the statistics for testing parametric coefficients are asymptotically Chi-squared distributed, and the statistics for testing nonparametric functions approximately follow a Chi-squared distribution. The utility of the method is demonstrated through extensive simulations and a case study.

email: [email protected]

61d. TREE-BASED MODEL AVERAGING APPROACHES FOR MODELING RARE VARIANT ASSOCIATION IN CASE-CONTROL STUDIES
Brandon J. Coombes*, University of Minnesota
Saonli Basu, University of Minnesota
Sharmistha Guha, Fair Isaac Corporation
Nicholas Schork, J. Craig Venter Institute

Multi-locus effect modeling is a powerful approach for detection of genes influencing a complex disease. Especially for rare variants, we need to analyze multiple variants together to achieve adequate power for detection. In this paper, we propose a parsimonious tree model and several branching model mechanisms to assess the joint effect of a group of rare variants on a binary trait in a case-control study. The tree model implements a data reduction strategy within a likelihood framework, and all approaches use a weighted score test to assess the statistical significance of the effect of the group of variants on the disease. The primary advantage of the proposed model averaging approaches is that they perform model averaging over a substantially smaller set of models supported by the data and thus gain power to detect multi-locus effects. We illustrate the proposed model on simulated and real data and study the performance of these model-averaging approaches compared to the model selection method proposed by Basu and Pan (2011). Extensive simulations and a real data application demonstrate the advantage of the proposed approach in the presence of a moderate number of null variants and linkage equilibrium among the variants.

email: [email protected]

61e. A FUNCTIONAL APPROACH TO ASSOCIATION TESTING OF MULTIPLE PHENOTYPES IN SEQUENCING STUDIES
Sneha Jadhav*, Michigan State University
Qing Lu, Michigan State University

Sequencing-based association studies are proving to be increasingly useful in genetic research of complex diseases. In many of these studies, multiple phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple diseases for studying common genetic mechanisms (e.g., pleiotropic effects). Multiple phenotypes and the high-dimensionality of the sequencing data pose challenges for their association studies. To address these challenges, we propose a non-parametric method, which first constructs smooth functions from individuals' sequencing data, and then uses these to construct a U statistic for testing of association. The proposed method has the advantages of providing a general framework for analyzing various types of phenotypes, considering linkage disequilibrium between genetic markers, and allowing for different directions and magnitudes of effects. Through a preliminary simulation study, we found it had comparable performance to existing methods when the distribution assumptions of existing methods hold. Nevertheless, it outperformed the existing methods when their distribution assumptions were violated.

email: [email protected]
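The two ingredients named in the abstract, smoothing genotypes over position and forming a pairwise statistic, can be caricatured in a few lines of numpy. The sketch below kernel-smooths each individual's rare-variant genotypes along the region and then sums phenotype cross-products weighted by genotype-function similarity over all pairs (a U-statistic-type quantity). It is a toy construction with an arbitrary bandwidth and coding, not the authors' method or its null distribution.

    import numpy as np

    rng = np.random.default_rng(6)
    n, p, bw = 200, 30, 0.1
    pos = np.sort(rng.uniform(0, 1, p))                  # variant positions within the region
    G = (rng.random((n, p)) < 0.03).astype(float)        # rare-variant genotypes (toy 0/1 coding)
    y = rng.normal(size=n)                               # centered phenotype

    K = np.exp(-0.5 * ((pos[:, None] - pos[None, :]) / bw) ** 2)
    K /= K.sum(axis=1, keepdims=True)                    # row-normalized smoothing weights
    G_smooth = G @ K.T                                   # each row: a smooth genotype "function"

    S = G_smooth @ G_smooth.T                            # genotype-function similarity
    iu = np.triu_indices(n, k=1)                         # all pairs i < j
    U = np.sum(S[iu] * (y[:, None] * y[None, :])[iu]) / len(iu[0])
    print(round(U, 4))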

61f. ANALYSIS OF SEQUENCE DATA UNDER MULTIVARIATE TRAIT-DEPENDENT SAMPLING
Ran Tao*, University of North Carolina, Chapel Hill
Donglin Zeng, University of North Carolina, Chapel Hill
Nora Franceschini, University of North Carolina, Chapel Hill
Kari E. North, University of North Carolina, Chapel Hill
Eric Boerwinkle, University of Texas Health Science Center
Dan-Yu Lin, University of North Carolina, Chapel Hill

High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

email: [email protected]

61g. META-ANALYSIS OF COMPLEX DISEASES AT GENE LEVEL BY GENERALIZED FUNCTIONAL LINEAR MODELS
Ruzong Fan*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Yifan Wang, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Haobo Ren, Regeneron Pharmaceuticals, Inc.
Yun Li, University of North Carolina, Chapel Hill
Christopher Amos, Dartmouth Medical School
Wei Chen, University of Pittsburgh
Momiao Xiong, University of Texas, Houston
Jason Moore, Dartmouth Medical School

Generalized functional linear models (GFLMs) are developed to perform a meta-analysis of multiple case-control studies to connect genetic data to dichotomous traits, adjusting for covariates. Based on the GFLMs, $\chi^2$-distributed Rao's efficient score test and likelihood ratio test (LRT) statistics are introduced to test for an association between a complex trait and multiple genetic variants in one genetic region. Extensive simulations are performed to evaluate the empirical type I error rates and power performance of the proposed models and tests. The proposed Rao's efficient score test statistics control the type I error very well and have higher power than the existing methods of MetaSKAT when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics have similar or slightly lower power than MetaSKAT. The LRT statistics generate accurate type I error rates for homogeneous genetic effect models, may inflate type I error rates for heterogeneous genetic models due to large degrees of freedom, and have similar or slightly higher power than the Rao's efficient score test statistics. The proposed methods were applied to analyze type 2 diabetes data from a meta-analysis of eight European studies, and detected significant association for genes APB, APOE, FTO, and LPL, while MetaSKAT detected none. The models and related test statistics can analyze rare variants or common variants or a combination of the two, and can be useful in whole genome-wide and whole exome association studies.

email: [email protected]
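Written out for a single study, the generalized functional linear model underlying the tests above takes roughly the following schematic form (the notation is illustrative; the meta-analysis combines such models across studies):

    \[
      \operatorname{logit} P(y_i = 1)
        \;=\; \alpha_0 + Z_i^{\top}\alpha + \int_{\mathcal{R}} G_i(t)\,\beta(t)\,dt,
      \qquad
      \beta(t) \;=\; \sum_{k=1}^{K} \theta_k B_k(t),
    \]

so that the null hypothesis of no association, $\beta(t) \equiv 0$, reduces to $\theta = 0$ and can be tested with Rao's efficient score statistic or the LRT against a $\chi^2$ reference with $K$ degrees of freedom.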

61h. GENE LEVEL META-ANALYSIS OF QUANTITATIVE TRAITS BY FUNCTIONAL LINEAR MODELS
Yifan Wang*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Ruzong Fan, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Michael Boehnke, University of Michigan
Wei Chen, University of Pittsburgh
Yun Li, University of North Carolina, Chapel Hill
Momiao Xiong, University of Texas, Houston

Functional linear models are developed for meta-analysis of multiple studies to connect genetic data to quantitative traits, adjusting for covariates. The models can analyze rare variants or common variants or combinations of the two. Both likelihood ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple genetic variants in one genetic region. Extensive simulations are performed to evaluate the empirical type I error rates and power performance of the proposed models and tests. We show that the proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of MetaSKAT. The proposed methods were applied to analyze four blood lipid levels in data from a meta-analysis of eight European studies. It was found that the proposed methods detect more significant associations than MetaSKAT, and the p-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole genome-wide and whole exome association studies.

email: [email protected]

61i. A NEW ESTIMATING EQUATION APPROACH FOR SECONDARY TRAIT ANALYSES IN GENETIC CASE-CONTROL STUDIES
Xiaoyu Song*, Columbia University
Iuliana Ionita-Laza, Columbia University
Ying Wei, Columbia University

In this manuscript, we propose a new estimating equation based approach that provides unbiased secondary trait analysis in genetic case-control studies. In genetic studies, analysis of secondary traits is an important way to discover potential disease pathways. When data are collected from case-control designs, direct analyses are often biased. Several methods have been proposed to address this issue, including the inverse-probability-of-sampling-weighted (IPW) approach, the maximum likelihood (ML) approach, the adaptive weighted approach and the bias correction approach. Compared with the existing ones, the proposed estimating equation based approach enjoys the following properties. First, it creates a general framework that is applicable to a wide range of genetic models. It can be used to model various types of phenotypes (continuous or binary) and SNPs (additive or dominant, single or multiple), and it is also easy to incorporate covariates. Second, it is computationally simple and straightforward. We compared our method with the existing ones in both numerical studies and a stroke GWAS data set. The proposed method was shown to be less sensitive to the sampling scheme and the underlying disease model. For these reasons, we believe that our new methods complement the existing approaches and are useful for analyzing secondary traits.

email: [email protected]
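Among the existing approaches listed above, the inverse-probability-of-sampling-weighted (IPW) analysis is the simplest to write down: weight each sampled subject by the inverse of its case or control sampling fraction and solve the weighted normal equations for the secondary-trait regression. The Python sketch below simulates a case-control sample with assumed known sampling fractions; it illustrates IPW only, not the proposed estimating equations.

    import numpy as np

    def ipw_secondary_effect(y2, g, d, p_case, p_ctrl):
        """Weighted least squares of secondary trait y2 on genotype g under case-control sampling."""
        w = np.where(d == 1, 1.0 / p_case, 1.0 / p_ctrl)
        X = np.column_stack([np.ones(len(g)), g])
        WX = X * w[:, None]
        return np.linalg.solve(X.T @ WX, WX.T @ y2)[1]    # genotype effect on the secondary trait

    rng = np.random.default_rng(7)
    N = 20000
    g = rng.binomial(2, 0.3, N)
    y2 = 0.2 * g + rng.normal(size=N)                     # secondary trait; true effect 0.2
    d = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.5 * g + 0.8 * y2))))        # primary disease status
    p_case, p_ctrl = 0.9, 0.05                            # sampling fractions (assumed known)
    take = rng.random(N) < np.where(d == 1, p_case, p_ctrl)
    print(round(ipw_secondary_effect(y2[take], g[take], d[take], p_case, p_ctrl), 3))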

61j. NOVEL STATISTICAL MODEL FOR GWAS META-ANALYSIS AND ITS APPLICATION TO TRANS-ETHNIC META-ANALYSIS
Jingchunzi Shi*, University of Michigan
Seunggeun Lee, University of Michigan

Trans-ethnic genome-wide association study (GWAS) meta-analysis has proven to be a practical and profitable approach for identifying loci which contribute to the missing heritability of complex traits. However, the expected heterogeneity of genetic effects cannot be easily accommodated through existing approaches. In response, we propose a novel trans-ethnic meta-analysis methodology that flexibly models the expected heterogeneity of genetic effects across diverse populations. Specifically, we consider a modified random effects model in which the genetic effect coefficients are random variables whose correlation structure across ancestry groups reflects the expected heterogeneity (or homogeneity) among ancestry groups. To test for associations, we derive a data-adaptive variance component test with adaptive selection of the correlation structure to increase power. Simulations demonstrate that our proposed method performs with substantial improvements in comparison to the traditional meta-analysis methods. Furthermore, our proposed method provides scalable computing time for genome-wide data. For real data analysis, we re-analyzed the published type 2 diabetes GWAS meta-analyses from Consortium et al. (2014), and successfully identified one additional SNP which clearly exhibits genetic effect heterogeneity among different ancestry groups but could not be detected by the traditional meta-analysis methods.

email: [email protected]

61k. MULTIPLE PHENOTYPE ASSOCIATION TESTING BASED ON SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES
Zhonghua Liu*, Harvard School of Public Health
Xihong Lin, Harvard School of Public Health

Multiple correlated phenotypes might share a common genetic basis, referred to as pleiotropy in genetics. However, currently available methods for identifying genetic variants with pleiotropic effects on multiple phenotypes are limited. In this paper, we present a toolkit of statistical methods that harness the correlation structures among the multiple phenotypes to boost statistical power to detect such genetic variants based on summary statistics. We conduct extensive simulation studies to show that our methods maintain correct type I error rates, and their statistical powers are compared in a wide range of situations. We further apply these methods to a genome-wide association study of plasma lipid levels and identify hundreds of novel genetic variants that conventional single-trait analysis approaches failed to discover. We also develop an R package, MPAT, available for public use.

email: [email protected]

61l. A NEW APPROACH FOR DETECTING GENE-BY-GENE INTERACTIONS THROUGH META-ANALYSES
Yulun Liu*, University of Texas Health Science Center at Houston
Paul Scheet, University of Texas MD Anderson Cancer Center
Yong Chen, University of Texas Health Science Center at Houston

There is increasing interest in detecting gene-by-gene interactions for complex traits, with varying, but substantial, proportions of heritability remaining unexplained by surveys of single-SNP genetic variation. The major challenges for traditional regression-based methods are the large number of possible pairs under investigation, with a requisite need to correct for multiple testing, and the restrictive assumptions of large marginal effects made to reduce the search space and limit the number of tests. Both of these challenges may limit power, especially when the marginal effects are in fact modest. In this talk, we propose a new procedure for detecting gene-by-gene interactions through meta-analyses. Our approach is pragmatic when data-sharing limitations restrict mega-analyses. It is also computationally efficient in that it applies a dimension reduction procedure and thus may scale to higher-order interactions as well. We compare the type I error and power of our proposed procedure relative to existing methods and evaluate their strengths and limitations.

email: [email protected]
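The meta-analytic backbone of any such procedure is the combination of per-study interaction estimates; the dimension reduction step proposed above is not shown here. The Python sketch below applies the standard fixed-effect inverse-variance combination and Wald test to hypothetical study-level estimates of an interaction log-odds ratio.

    import numpy as np
    from scipy.stats import norm

    est = np.array([0.21, 0.35, 0.10, 0.28])              # hypothetical per-study interaction estimates
    se = np.array([0.15, 0.20, 0.12, 0.18])               # their standard errors

    w = 1.0 / se ** 2
    pooled = np.sum(w * est) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    pval = 2 * norm.sf(abs(pooled / pooled_se))
    print(round(pooled, 3), round(pooled_se, 3), round(pval, 4))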

61m. GENOME-WIDE ASSOCIATION STUDIES FOR FUNCTIONAL VALUED TRAITS
Han Hao*, The Pennsylvania State University
Rongling Wu, The Pennsylvania State University

Genome-wide association studies (GWAS) have been widely used to detect the association between genetic variation and phenotypic variation. A great number of GWAS approaches have been developed in the past decade, but few of them were designed for functional trait values. Functional trait values are widely seen in biological shape analysis, dynamic progressions and clinical trials, and it is crucial to integrate the functional feature with GWAS to achieve high statistical power. We here propose a model-free approach to address a GWAS problem with functional trait values. There is no assumption on the functional form, but a parametric form can be involved to account for a specific biological mechanism. The method is applied to a real dataset and verified to be quite powerful.

email: [email protected]

61n. KERNEL-BASED TESTING FOR NONLINEAR EFFECT OF A SNP-SET UNDER MULTIPLE CANDIDATE KERNELS
Tao He*, Michigan State University
Ping-Shou Zhong, Michigan State University
Yuehua Cui, Michigan State University

The kernel-based testing framework has proved very powerful in SNP-set association analysis, measuring the similarity between genotypes through a fixed kernel function and comparing it to the phenotype similarity. However, given a set of candidate kernels, there is no general criterion to construct a weighted kernel, which has more flexibility than a single kernel. Based on asymptotic results, we propose a weighted kernel strategy in which the weights are optimized to maximize the signal-to-noise ratio of the weighted kernel. The proposed method is demonstrated through simulations and real data applications.

email: [email protected]

61o. A GENERAL FRAMEWORK OF GENE-BASED ASSOCIATION TESTS FOR CORRELATED CASE-CONTROL SAMPLES
Han Chen*, Harvard School of Public Health
Chaolong Wang, Harvard School of Public Health
Xihong Lin, Harvard School of Public Health

In genetic association studies, gene-based tests have been widely used to test association with a set of genetic variants, genes or pathways. However, existing gene-based tests such as burden tests and the sequence kernel association test require the critical assumption that observations are independent, which is violated in the presence of population stratification and cryptic relatedness. We observe inflated type I error rates when using these tests to analyze correlated samples. Here we propose a general framework of gene-based tests for correlated case-control samples, which degenerates into the corresponding tests for independent samples in the absence of population structure or relatedness. We fit a generalized linear mixed model under the null hypothesis and derive the test statistics and their asymptotic distributions. We show in simulation studies that our tests have correct type I error rates in correlated samples, in contrast to those tests assuming independence. We compare the power of our tests in various scenarios and illustrate how they could be used to test different scientific hypotheses. We also apply our tests to a real data example.

email: [email protected]

61p. ALGORITHM TO COMPUTE THE IDENTITY COEFFICIENTS AT A PARTICULAR LOCUS GIVEN THE MARKER INFORMATION
J. Concepcion Loredo-Osti*, Memorial University
Haiyan Yang, Memorial University

There are some problems in modern genetics where inferring the identity coefficients, or a linear combination of them, at a particular locus given the data on a set of markers may play an important role. For example, if the identity coefficients at a given chromosomal location have already been estimated using the marker information spanning the region of interest, the identity by descent status used in gene mapping problems can be easily obtained. It would also be possible to compute these identity by descent coefficients conditional on a particular model, which would be useful in addressing gene genealogy problems, modelling linkage and linkage disequilibrium jointly, genetic counselling and other forensic applications. In this presentation, an extension of the Karigl (1981), Abney (2009) and Cheng-Ozsoyoglu (2014) algorithms for computing the identity coefficients that incorporates the marker information is introduced. A comparison with other procedures in the context of identity by descent estimation is also discussed.

email: [email protected]
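Full generalized identity coefficients are beyond a short example, but the pedigree recursion they build on can be shown for the kinship coefficient, the simplest member of that family. The Python sketch below computes prior (pedigree-only) kinship on a toy pedigree; marker-conditional versions of the kind described above replace these prior probabilities with probabilities conditional on the observed marker data.

    from functools import lru_cache

    # toy pedigree: individual -> (father, mother); founders have (None, None)
    PED = {"gp": (None, None), "gm": (None, None), "mom": (None, None),
           "dad": ("gp", "gm"), "kid1": ("dad", "mom"), "kid2": ("dad", "mom")}

    def depth(i):
        f, m = PED[i]
        return 0 if f is None else 1 + max(depth(f), depth(m))

    @lru_cache(maxsize=None)
    def kinship(a, b):
        """P(two alleles drawn at random from a and b are identical by descent)."""
        if a is None or b is None:
            return 0.0
        if a == b:
            f, m = PED[a]
            return 0.5 * (1.0 + kinship(f, m))            # 1/2 * (1 + inbreeding coefficient of a)
        if depth(a) < depth(b):
            a, b = b, a                                   # always expand the later-generation member
        f, m = PED[a]
        return 0.5 * (kinship(f, b) + kinship(m, b))

    print(kinship("kid1", "kid2"), kinship("dad", "kid1"))   # full sibs 0.25, parent-offspring 0.25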

61q. ESTIMATING THE EMPIRICAL NULL DISTRIBUTION OF GWAS MAXMEAN STATISTICS IN GENE SET ANALYSIS
Xing Ren*, University at Buffalo, SUNY
Jeffrey Miecznikowski, University at Buffalo, SUNY
Song Liu, Roswell Park Cancer Institute
Jianmin Wang, Roswell Park Cancer Institute

Gene set analysis is a widely used framework for testing enrichment of differentially expressed genes in a set of genes. The method involves computing a maxmean statistic and estimating the null distribution of the maxmean statistics via a restandardization procedure. We derive an asymptotic null distribution of the maxmean statistic and propose an empirical method to estimate the empirical null distribution. We show that our method is more accurate in controlling the type I error when testing a large number of gene sets.

email: [email protected]

61r. USAT: A UNIFIED SCORE-BASED ASSOCIATION TEST FOR MULTIPLE PHENOTYPE-GENOTYPE ANALYSIS
Debashree Ray*, University of Minnesota
Saonli Basu, University of Minnesota

Genome-wide association studies (GWASs) for complex diseases often collect data on multiple correlated endophenotypes. Multivariate analysis of these correlated phenotypes can improve the power to detect genetic variants. Multivariate analysis of variance (MANOVA) can perform such association analysis at a gene level, but the behavior of MANOVA under different trait models has not been carefully investigated. In this paper, we show that MANOVA is generally very powerful for detecting association, but there are situations where MANOVA may not have any detection power. We investigate the behavior of MANOVA, both theoretically and using simulations, and derive conditions where MANOVA loses power. Based on our findings, we propose a unified score-based test, USAT, that can perform better than MANOVA in such situations and does almost as well as MANOVA elsewhere. USAT reports an approximate asymptotic p-value for association and is computationally efficient at the GWAS level. We have studied, through extensive simulations, the performance of USAT, MANOVA and other existing approaches and demonstrated the advantage of using USAT in detecting association between a genetic variant and multivariate phenotypes. We applied USAT to ARIC type 2 diabetes data with five correlated traits on 5,819 Caucasians and detected some significantly associated novel genetic variants.

email: [email protected]

62. Statistical Inference with Random Forests and Related Ensemble Methods

CONSISTENCY OF RANDOM FORESTS
Gerard Biau*, Pierre and Marie Curie University
Erwan Scornet, Pierre and Marie Curie University
Jean-Philippe Vert, Pierre and Marie Curie University

Random forests are a learning algorithm proposed by L. Breiman in 2001 which combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty of simultaneously analyzing both the randomization process and the highly data-dependent tree structure. In this talk, we take a step forward in forest exploration by proving a consistency result for Breiman's original algorithm in the context of additive models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity in high-dimensional settings.

email: [email protected]
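The sparsity-adaptation point lends itself to a quick empirical check with scikit-learn: generate an additive signal that depends on only two of ten features, fit a forest, and look at where the impurity-based importances concentrate. The sketch below does just that and nothing more; it does not reproduce the consistency analysis, and all settings are arbitrary.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(8)
    n, p = 2000, 10
    X = rng.uniform(-1, 1, size=(n, p))
    y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)   # additive, sparse signal

    rf = RandomForestRegressor(n_estimators=300, max_features="sqrt", random_state=0)
    rf.fit(X, y)
    print(np.round(rf.feature_importances_, 3))            # mass concentrates on the first two features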

ASYMPTOTIC THEORY FOR RANDOM FORESTS
Stefan Wager*, Stanford University

Random forests have proven themselves to be reliable predictive algorithms in many application areas. Not much is known, however, about the statistical properties of random forests. Several authors have established conditions under which their predictions are consistent, but these results do not provide practical estimates of the scale of random forest errors. In this paper, we analyze a random forest model based on subsampling, and show that random forest predictions are asymptotically normal provided that the subsample size s scales as s(n)/n = o(log(n)^{-d}), where n is the number of training examples and d is the number of features. Moreover, we show that the asymptotic variance can consistently be estimated using an infinitesimal jackknife for bagged ensembles recently proposed by Efron (2013). In other words, our results let us both characterize and estimate the error distribution of random forest predictions. Thus, random forests need not be treated only as black-box predictive algorithms, but can also be used for statistical inference.
email: [email protected]

DETECTING FEATURE INTERACTIONS IN BAGGED TREES AND RANDOM FORESTS
Lucas K. Mentch*, Cornell University
Giles Hooker, Cornell University

Additive models remain popular statistical tools due to their ease of interpretation and, as a result, hypothesis tests for additivity have been developed to determine the appropriateness of such models. However, as data grows in size and complexity, practitioners are relying more heavily on learning algorithms because of their predictive superiority. Due to the black-box nature of these learning methods, the increase in predictive power is assumed to come at the cost of interpretability and understanding. In this talk, we discuss our recent work that demonstrates that many popular learning algorithms, such as bagged trees and random forests, have desirable asymptotic properties. In particular, we produce a central limit theorem for predictions when base learners are built with subsamples, thereby allowing for statistical inference. In addition to producing confidence intervals and hypothesis tests for feature significance, we show that by enforcing a grid structure on the test set, we can formally test the plausibility of various additive structures. We develop notions of total and partial additivity and demonstrate that both tests can be carried out at no additional computational cost to the original ensemble.
email: [email protected]

VARIABLE SELECTION WITH BAYESIAN ADDITIVE REGRESSION TREES
Shane T. Jensen*, University of Pennsylvania
Justin Bleich, University of Pennsylvania
Adam Kapelner, University of Pennsylvania
Edward I. George, University of Pennsylvania

There is a crucial need for effective variable selection procedures in high dimensional data, where it is difficult to detect subtle individual effects and interactions between factors. Bayesian Additive Regression Trees (BART) are a promising alternative to more parametric regression approaches, such as the lasso or Bayesian latent indicator models. BART constructs an ensemble of decision trees from the set of possible predictors of an outcome variable. We develop principled methodology that adapts BART to variable selection as well as incorporating additional data as prior information. We evaluate the performance of our BART-based approach in simulation settings as well as an application to the gene regulatory network in yeast.
email: [email protected]

63. Mediation and Interaction: Theory, Practice and Future Directions

A UNIFICATION OF MEDIATION AND INTERACTION: A FOUR-WAY DECOMPOSITION
Tyler J. VanderWeele*, Harvard University

It is shown that the overall effect of an exposure on an outcome, in the presence of a mediator with which the exposure may interact, can be decomposed into four components: (i) the effect of the exposure in the absence of the mediator, (ii) the interactive effect when the mediator is left to what it would be in the absence of exposure, (iii) a mediated interaction, and (iv) a pure mediated effect. These four components, respectively, correspond to the portion of the effect that is due to neither mediation nor interaction, to just interaction (but not mediation), to both mediation and interaction, and to just mediation (but not interaction). This four-way decomposition unites methods that attribute effects to interactions and methods that assess mediation. Certain combinations of these four components correspond to measures for mediation, while other combinations correspond to measures of interaction previously proposed in the literature. Prior decompositions in the literature are in essence special cases of this four-way decomposition. The four-way decomposition can be carried out using standard statistical models, and software is provided to estimate each of the four components. The four-way decomposition provides maximum insight into how much of an effect is mediated, how much is due to interaction, how much is due to both mediation and interaction together, and how much is due to neither.
email: [email protected]
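The decomposition itself is not written out in the abstract above. As a sketch, in standard counterfactual notation and under the common simplification of a binary exposure A and binary mediator M (potential outcomes Y_{am}, potential mediator values M_a, and Y_a = Y_{a M_a}), it is usually displayed as

\[
\mathrm{E}[Y_1 - Y_0] \;=\; \mathrm{E}[Y_{10} - Y_{00}] \;+\; \mathrm{E}\!\left[(Y_{11} - Y_{10} - Y_{01} + Y_{00})\, M_0\right] \;+\; \mathrm{E}\!\left[(Y_{11} - Y_{10} - Y_{01} + Y_{00})\,(M_1 - M_0)\right] \;+\; \mathrm{E}\!\left[(Y_{01} - Y_{00})\,(M_1 - M_0)\right],
\]

where the four terms correspond, in order, to components (i) through (iv) of the abstract.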
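Relating to the two random forest abstracts earlier in this session (Wager; Mentch and Hooker): neither abstract spells out how the infinitesimal jackknife variance estimate is computed. The following is a minimal sketch of the basic estimator for a bagged or subsampled ensemble, assuming per-tree predictions at a test point and per-tree in-bag counts are available; the function name is illustrative and finite-ensemble bias corrections discussed in the cited literature are omitted.

```python
import numpy as np

def infinitesimal_jackknife_variance(inbag_counts, tree_preds):
    """Basic infinitesimal-jackknife variance estimate at one test point.

    inbag_counts : (B, n) array; inbag_counts[b, i] is how many times training
        observation i was used to build base learner b.
    tree_preds   : (B,) array of base-learner predictions at the test point.

    Returns sum_i Cov_b(N_{b,i}, t_b)^2, i.e. the squared covariance between
    each observation's inclusion count and the per-tree predictions.
    """
    preds_centered = tree_preds - tree_preds.mean()
    counts_centered = inbag_counts - inbag_counts.mean(axis=0)
    B = tree_preds.shape[0]
    cov_i = counts_centered.T @ preds_centered / B  # length-n vector of covariances
    return float(np.sum(cov_i ** 2))
```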

PARTIAL IDENTIFICATION OF THE PURE DIRECT EFFECT UNDER EXPOSURE-INDUCED CONFOUNDING
Caleb Miles*, Harvard University
Eric Tchetgen Tchetgen, Harvard University

In causal mediation analysis, nonparametric identification of the pure (natural) direct effect typically relies on fundamental assumptions of (i) so-called "cross-world-counterfactuals" independence and (ii) no exposure-induced confounding. When the mediator is binary, bounds for partial identification have been given when neither assumption is made, or alternatively when assuming only (ii). We extend these bounds to the case of a polytomous mediator, and provide bounds for the case assuming only (i). We apply these bounds as well as point estimates under other fully-identifying model assumptions to data from the Harvard PEPFAR program in Nigeria, where we evaluate the extent to which the effects of antiretroviral therapy on virological failure are mediated by a patient's adherence, and show that inference on this effect is somewhat sensitive to model assumptions.
email: [email protected]

64. Motivation and Analysis Strategies for Joint Modeling of High Dimensional Data in Genetic Association Studies

INTEGRATIVE ANALYSIS OF COMPLEX GENETIC, GENOMIC AND ENVIRONMENTAL DATA USING MEDIATION ANALYSIS
Xihong Lin*, Harvard University

Mediation analysis provides a useful framework for integrative analysis of multiple types of genetic, genomic and environmental data to understand disease causing mechanisms. Genetic and genomic data include SNP data, such as GWAS or sequencing data, and gene expression data. We discuss in this talk mediation analysis in several complex settings, including the presence of missing data and network analysis. Specifically, GWAS data are often collected on all individuals enrolled in a study. However, genomic data, such as gene expressions and DNA methylations, are often collected in a subset of study subjects. We propose a mediation analysis method using all the data by leveraging the information from the individuals with only the SNP data. We show that, using all available data, we gain more efficient estimators of the direct effects of SNPs and the indirect effects of SNPs mediated through gene expressions/DNA methylations on a phenotype with varying levels of missingness. We also consider mediation network analysis, where the mediator consists of network data. We applied our method to several existing datasets.
email: [email protected]

REGION-BASED TEST FOR GENE-ENVIRONMENT INTERACTIONS IN LONGITUDINAL STUDIES
Zihuai He, University of Michigan
Min Zhang*, University of Michigan
Seunggeun Lee, University of Michigan
Jennifer Smith, University of Michigan
Xiuqing Guo, Harbor-UCLA Medical Center
Walter Palmas, Columbia University
Sharon L.R. Kardia, University of Michigan
Ana V. Diez Roux, University of Michigan
Bhramar Mukherjee, University of Michigan

There has been tremendous emphasis on searching for interactions between genetic factors and environmental exposures. Gene-environment interaction (G×E) analyses are typically based on testing the interaction between each single-nucleotide polymorphism (SNP) and an environmental variable separately, with adjustment for multiple testing. However, the interaction process is probably far more complex than a "single locus vs. environment factor" analysis. We propose a novel statistical approach to test for gene-environment interaction between an environmental factor and a set of genetic variants in longitudinal studies, with consideration of potential time dependency and correlation in the outcomes measured on the same subject. The method integrates the entire genotype-environment-phenotype information contained in a longitudinal study through a region-based test. Nonparametric modeling of the environmental exposure is incorporated to alleviate the problem of misspecification of the main or interaction effect, leading to a more robust type-I error rate and superior power. As the number of SNPs in a target region can be very large, a dimension reduction method is further proposed, which adaptively selects and adjusts for the main effects of genetic variants to achieve numerical feasibility, controlled type I error probability and improvement in power. The performance of the method will be evaluated through simulation studies and illustrated by real data analysis.
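A brief gloss on the estimands in the two mediation abstracts above (Miles and Tchetgen Tchetgen; Lin), using notation the abstracts themselves do not display: with exposure A, mediator M, potential mediator values M_a and potential outcomes Y_{am}, the pure (natural) direct effect and natural indirect effect are usually defined as

\[
\mathrm{NDE} = \mathrm{E}\!\left[\,Y_{1 M_0} - Y_{0 M_0}\,\right], \qquad \mathrm{NIE} = \mathrm{E}\!\left[\,Y_{1 M_1} - Y_{1 M_0}\,\right],
\]

and the "cross-world counterfactuals" independence in assumption (i) is, loosely, Y_{am} independent of M_{a*} given measured confounders.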

Survival data collected from prevalent cohorts are subject to left-truncation. The conventional conditional approach using the Cox model disregards the information in the marginal likelihood of the truncation time and thus can be inefficient. On the other hand, the stationary assumption underlying length-biased sampling (LBS) methods that incorporate the marginal information can lead to biased estimation when it is violated. In this paper, we propose a semiparametric estimation method by augmenting the Cox partial likelihood with a pairwise likelihood, by which we eliminate the unspecified truncation distribution in the marginal likelihood, yet retain the information about the regression coefficients and the baseline hazard. Exploring self-consistency of the estimator, we give a fast algorithm to solve for the regression coefficients and the cumulative hazard simultaneously. The proposed estimator is shown to be consistent and asymptotically normal with a sandwich-type consistent variance estimator. Simulation studies show a substantial efficiency gain in both the regression coefficients and the cumulative hazard over Cox model estimators, and that the gain is comparable to LBS methods when the stationary assumption holds. For illustration, we apply the proposed method to the RRI-CKD data.
email: [email protected]

STRATEGIES TO IMPROVE THE POWER OF PATHWAY ANALYSIS IN GENETIC ASSOCIATION STUDIES
Kai Yu*, National Cancer Institute, National Institutes of Health
Han Zhang, National Cancer Institute, National Institutes of Health
Jianxin Shi, National Cancer Institute, National Institutes of Health
Nilanjan Chatterjee, National Cancer Institute, National Institutes of Health

It is increasingly recognized that pathway analyses, a joint test of association between the outcome and a group of single nucleotide polymorphisms (SNPs) within a biological pathway, could complement single-SNP analysis and provide additional insights into the genetic architecture of complex diseases. In this talk, we will explore several strategies for enhancing the power of pathway analysis, including the improvement of the signal-to-noise ratio through more informative selection of SNPs based on their potential functional impact, and the increase of sample size by integrating summary statistics on individual SNPs from existing large-scale meta-analyses into the pathway analysis. We will use numerical simulations and real data applications to evaluate the proposed procedures.
email: [email protected]

A UNIFIED TEST FOR POPULATION-BASED MULTIPLE CORRELATED PHENOTYPE-GENOTYPE ASSOCIATION ANALYSIS
Saonli Basu*, University of Minnesota
Debashree Ray, University of Minnesota

Joint modeling of multiple correlated disease-related traits may improve the power to detect association between a genetic variant and the disease. Moreover, this joint analysis can reveal pleiotropic genes involved in the biological development of the disease. The standard multivariate analysis of variance (MANOVA) is very powerful when a genetic variant is associated with a subset of the traits or when the effect of the causal variant is in different directions with respect to the correlated traits, but it loses significant power when the variant is associated with all the traits and the effect direction is the same as the direction of dependence between the traits. We propose a powerful, computationally efficient unified test that maximizes power by adaptively using the data to optimally combine MANOVA and a test that potentially ignores the correlation between the traits. We illustrate our proposed test through simulation studies and real data applications and compare the performance of different multivariate approaches under various alternative models.
email: [email protected]
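For the prevalent-cohort survival abstract above, the "conventional conditional approach" it improves upon is not written out there; a sketch of that risk-set-adjusted partial likelihood, with entry (truncation) time A_i, observed time X_i, event indicator delta_i and covariates Z_i, is

\[
L(\beta) \;=\; \prod_{i:\,\delta_i = 1} \frac{\exp(\beta^\top Z_i)}{\sum_{j:\, A_j \le X_i \le X_j} \exp(\beta^\top Z_j)},
\]

which conditions away the truncation-time distribution and hence discards the marginal information that the pairwise-likelihood augmentation is designed to recover.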

MODELLING MULTIPLE CORRELATED GENETIC VARIANTS
Sharon R. Browning*, University of Washington

Many statistical analyses of genetic data rely on being able to model the correlation between genetic variants that are located close together on a chromosome. The processes that create the correlation are complex, and include mutation, recombination, selection, and drift. These factors have variable effects across the genome, so the strength and patterns of correlation are also variable from one genomic region to another. Hence successful modeling efforts need to be data driven, as well as incorporating key elements of genetic processes such as recombination. I will describe the Beagle model, which has proved to be useful for a variety of statistical analyses, including haplotype phase inference and imputation of untyped variants. I will also present recent work applying this model to identity by descent tract detection.
email: [email protected]

65. Recent Developments on Inference for Possibly Time-Dependent Treatment Effects with Survival Data

THRESHOLD REGRESSION FOR LIFETIME DATA
Mei-Ling Ting Lee*, University of Maryland, College Park
George A. Whitmore, McGill University, Canada

Cox regression methods are well known. They carry, however, a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the latent health level first degrades to a failure threshold. I will present the Threshold Regression (TR) model for the health process, which requires few assumptions and, hence, is quite general in its potential application. I will begin with the Wiener diffusion process based TR model and the regression methods using inverse-Gaussian distributions. Distribution-free methods for estimation and prediction using the TR models will also be derived. I will demonstrate the methodology and its practical use. Comparisons with the Cox model will also be discussed.
email: [email protected]

HYPOTHESIS TESTING FOR AN EXTENDED COX MODEL WITH TIME-VARYING COEFFICIENTS
Ying Q. Chen*, Fred Hutchinson Cancer Research Center

The log-rank test has been widely used to test treatment effects under the Cox model for censored time-to-event outcomes, though it may lose power substantially when the model's proportionality assumption does not hold. In this paper, we consider an extended Cox model that uses B-splines or smoothing splines to model a time-varying treatment effect and propose score test statistics for detecting the treatment effect. The new methods are applied to a randomized clinical trial assessing the efficacy of single-dose Nevirapine against mother-to-child HIV transmission that was conducted by the HIV Prevention Trials Network.
email: [email protected]

TIME-DEPENDENT CUT POINT SELECTION FOR BIOMARKERS IN CENSORED SURVIVAL DATA
Zhezhen Jin*, Columbia University

In biomedical research and practice, continuous biomarkers are often used for diagnosis and prognosis, with cut points being established to monitor treatment effect on survival or time to an event. We will study a non-parametric procedure for the selection of time-dependent cut points with censored survival data. Numerical studies will be presented along with real applications.
email: [email protected]
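For the extended Cox model in the Chen abstract above, the model is not displayed there; a sketch of the usual time-varying-coefficient form, with treatment indicator Z and a spline basis B_1, ..., B_K, is

\[
\lambda(t \mid Z) = \lambda_0(t)\, \exp\{\beta(t)\, Z\}, \qquad \beta(t) = \sum_{k=1}^{K} \gamma_k B_k(t),
\]

so the proportional hazards null corresponds to beta(t) being constant in t, and the proposed score tests target departures from that null.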

INFERENCE ON THE SUMMARY MEASURES OF TREATMENT EFFECT WITH SURVIVAL DATA WHEN THERE IS POSSIBLY TREATMENT BY TIME INTERACTION
Song Yang*, National Heart, Lung and Blood Institute, National Institutes of Health

For clinical trials with survival data, the hazard ratio has been the most widely used measure for describing the treatment effect. When there is possibly a treatment by time interaction, summary measures such as the average hazard ratio and the restricted mean survival difference have been proposed in the literature. We investigate various old and new summary measures, and study their nonparametric and semiparametric estimates, with or without covariate adjustment. Hypothesis testing and confidence intervals for the measures are established. We illustrate these measures and discuss their merits and limitations in applications to clinical trials including the Women's Health Initiative.
email: [email protected]

66. Journal of Agricultural, Biological and Environmental Statistics (JABES) Highlights

LIMITED-INFORMATION MODELING OF LOGGERHEAD TURTLE POPULATION SIZE
John M. Grego*, University of South Carolina
David B. Hitchcock, University of South Carolina

In traditional capture-recapture experiments to estimate the size of an animal population, individual animals are tagged and the information about which individuals are captured repeatedly is crucial. We apply these approaches to data in which information about individual identity is not available, specifically nesting data for loggerhead turtles. Rather, we observe only the counts of successful and failed nestings at a location over a series of days. We view the turtles' nesting behavior as an alternating renewal process, model it using parametric distributions, and then derive probability distributions that describe the behavior of the turtles during the days under study. We adopt a Bayesian approach, formulating our model in terms of parameters about which strong prior information is available. We use a Gibbs sampling algorithm to sample from the posterior distribution of our random quantities, the most crucial of which is the number of turtles remaining offshore during the entire sampling period. We illustrate the method using data sets from loggerhead turtle sites along the South Carolina coast.
email: [email protected]

NONLINEAR VARYING-COEFFICIENT MODELS WITH APPLICATIONS TO A PHOTOSYNTHESIS STUDY
Damla Senturk*, University of California, Los Angeles
Esra Kurum, Medeniyet University
Runze Li, The Pennsylvania State University
Yang Wang, China Vanke

Motivated by a study on factors affecting the level of photosynthetic activity in a natural ecosystem, we propose nonlinear varying-coefficient models, in which the relationship between the predictors and the response variable is allowed to be nonlinear. One-step local linear estimators are developed for the nonlinear varying-coefficient models and their asymptotic normality is established, leading to point-wise asymptotic confidence bands for the coefficient functions. Two-step local linear estimators are also proposed for cases where the varying-coefficient functions admit different degrees of smoothness; bootstrap confidence intervals are utilized for inference based on the two-step estimators. We further propose a generalized F-test to study whether the coefficient functions vary over a covariate. We illustrate the proposed methodology via an application to an ecology data set and study the finite sample performance by Monte Carlo simulation studies.
email: [email protected]
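As background for the Senturk et al. abstract above (details not given there): the linear varying-coefficient model that their nonlinear version generalizes is typically written

\[
Y_i = \sum_{j=1}^{p} \beta_j(U_i)\, X_{ij} + \varepsilon_i,
\]

with the coefficient functions beta_j estimated by local linear smoothing in the index variable U; the abstract's models relax the assumption that the response depends linearly on the predictors X.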

MULTILEVEL LATENT GAUSSIAN PROCESS MODEL FOR MIXED DISCRETE AND CONTINUOUS MULTIVARIATE RESPONSE DATA
Erin M. Schliep*, Duke University
Jennifer A. Hoeting, Colorado State University

We propose a Bayesian model for mixed ordinal and continuous multivariate data to evaluate a latent spatial Gaussian process. Our proposed model can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement. In our example, the latent, or unobservable, measurement is wetland condition. While predicted values of the latent wetland condition variable produced by the model at each location do not hold any intrinsic value, the relative magnitudes of the wetland condition values are of interest. In addition, by including point-referenced covariates in the model, we are able to make predictions at new locations for both the latent random variable and the multivariate response. Lastly, the model produces ranks of the multivariate responses in relation to the unobserved latent random field. This is an important result as it allows us to determine which response variables are most closely correlated with the latent variable. Our approach offers an alternative to traditional indices based on best professional judgment that are frequently used in ecology. We apply our model to assess wetland condition in the North Platte and Rio Grande River Basins in Colorado. The model facilitates a comparison of wetland condition at multiple locations and ranks the importance of in-field measurements.
email: [email protected]

ANALYSIS OF VARIANCE OF INTEGRO-DIFFERENTIAL EQUATIONS WITH APPLICATION TO POPULATION DYNAMICS OF COTTON APHIDS
Jianhua Huang*, Texas A&M University

The population dynamics of cotton aphids is usually described by mechanistic models, in the form of integro-differential equations (IDEs) with parameters representing some key properties of the dynamics. Investigation of treatment effects on the population dynamics is a central issue in developing successful chemical and biological controls for cotton aphids. Motivated by this important agricultural problem, we propose a framework of ANOVA for IDEs.
email: [email protected]

67. Estimation and Inference for High Dimensional and Data Adaptive Problems

A FLEXIBLE FRAMEWORK FOR SPARSE ADDITIVE MODELING
Noah Simon*, University of Washington

In high dimensional modeling problems there is a tradeoff between adding flexibility (to decrease the bias) and removing flexibility (to decrease the variance). Often L1 penalized linear models are used, as they give parsimonious fits with few variables and relatively low bias. However, sometimes a linear model is not a good approximation to the true underlying signal. To combat this, authors have begun to consider sparse additive models. In most proposals these models have been constructed using a group lasso penalty and an explicit basis. We take a different route and provide a framework to construct sparse additive models using either an explicit basis expansion, or structure induced by a penalty or constraint. This allows us to build more data-adaptive additive models, e.g., piecewise constant or linear models with knots chosen adaptively, isotonic regression models, spline models, etc. We give an efficient algorithm and show that in many cases, fitting these models requires only the same order of computation as the usual linear lasso.
email: [email protected]

INFERENCE FOR REGRESSION QUANTILES AFTER MODEL SELECTION
Jelena Bradic*, University of California, San Diego
Mladen Kolar, University of Chicago

Regression quantiles have been a topic of interest for a long period of time. The last two decades have seen extensive research devoted to this problem in high dimensional regimes where the dimension of the parameters overcomes the sample size. With modern data being collected at a fast pace, developing methodology for inference after model selection becomes ever more important. We address the issue of optimal confidence intervals and Bahadur expansions of the regression quantiles. We introduce penalized regression rank scores and propose a novel estimator of the density of the regression noise as a score statistic. Hence, our method is independent of the error distribution and is nonparametric in nature. Our results are non-asymptotic and reflect the delicate interplay of the signal strength and sample size.
email: [email protected]
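For orientation on the abstract immediately above (and the penalized quantile regression work elsewhere in the program), the penalized regression quantile estimator at level tau, which the abstract does not write out, commonly takes the form

\[
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^\top \beta\right) + \lambda \|\beta\|_1, \qquad \rho_\tau(u) = u\,\{\tau - \mathbb{1}(u < 0)\},
\]

and the inferential question is how to obtain valid confidence statements for components of the fitted coefficient vector after this data-driven selection step.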

FALSE DISCOVERY RATE CONTROL FOR SPATIAL DATA
Alexandra Chouldechova*, Carnegie Mellon University

In many modern applications the aim of the statistical analysis is to identify "interesting" or "differentially behaved" regions from noisy spatial measurements. From a statistical standpoint the task is both to identify a collection of regions which are likely to be non-null, and to associate to this collection a measure of uncertainty. Viewing this task as a large scale multiple testing problem, and borrowing ideas from the Poisson clumping heuristic literature, we present methods for controlling the clusterwise false discovery rate (cFDR), defined as the expected fraction of reported regions that are in truth null. We show that the widely used approach of applying an FDR controlling procedure pointwise to the measurement locations in general fails to control the cFDR. We also describe how the proposed cFDR procedure can be used to incorporate into the analysis quantities such as cluster size and slope at upcrossing.

CONDITIONAL OR FIXED? DIFFERENT PHILOSOPHIES IN ADAPTIVE INFERENCE
Max Grazier-G'sell*, Carnegie Mellon University
Ryan Joseph Tibshirani, Carnegie Mellon University

Inferential approaches in adaptive, non-classical estimation settings can be divided, roughly speaking, into two camps. The first camp conditions on the model selected by the adaptive procedure and performs a (conditional) hypothesis test accordingly. The second camp uses the adaptive procedure as a stepping stone to perform marginal (fixed) hypothesis tests and then defines the model of interest according to the results of these tests. There is much recent and exciting work that has been done in both categories. We discuss specific examples of such advances in the literature, and the advantages and disadvantages of the two general approaches.
email: [email protected]

68. CONTRIBUTED PAPERS: Novel Methods for Bioassay Data

drLUMI: TOOLS FOR THE ANALYSIS OF THE MULTIPLEX IMMUNOASSAYS IN R
Hector Sanz*, Universitat de Barcelona, Spain
John Aponte, Universitat de Barcelona, Spain
Jaroslaw Harezlak, Indiana University Fairbanks School of Public Health, Indianapolis
Magdalena Murawska, Indiana University Fairbanks School of Public Health, Indianapolis
Ruth Aguilar, Universitat de Barcelona, Spain
Gemma Moncunill, Universitat de Barcelona, Spain
Carlota Dobaño, Universitat de Barcelona, Spain
Clarissa Valim, Harvard School of Public Health

Multiplex immunoassays are used to measure concentrations of several analytes simultaneously and are important for biomarker discovery. In addition to the biological samples, assays include control standard curves to calibrate between-plate variability and quantify analyte concentrations. However, their range might result in suboptimal calibration and decrease assay sensitivity, i.e., the number of samples with analyte concentrations within limits of quantification (LOQ). To optimize the assay, we used alternative approaches to fit the standard curves, treat background noise, and estimate LOQ. We developed a comprehensive R package with functions for managing data, calibrating assays and performing quality control (QC). Dose-response five-parameter logistic regression and other parametric functions for standard curves are implemented. Several approaches for treating background noise and estimating LOQ are available to maximize the number of quantifiable concentrations. The package automates QC metrics and includes analysis of residuals and reliability estimates. Using data from a correlates of protection study of a malaria vaccine candidate, we show the importance and exemplify the functionality of drLumi.
email: [email protected]
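The drLumi abstract above mentions five-parameter logistic (5PL) standard curves but does not show them, and the package's actual interface is not reproduced here. A minimal sketch of the generic 5PL curve and its inverse (used for back-calculating concentrations from MFI), with hypothetical function and parameter names, is:

```python
import numpy as np

def five_pl(conc, a, b, c, d, g):
    """Expected MFI at a given concentration under a 5PL curve.
    a and d are the two asymptotes, c is a mid-point parameter,
    b is the slope, and g is the asymmetry parameter."""
    return d + (a - d) / (1.0 + (conc / c) ** b) ** g

def inverse_five_pl(mfi, a, b, c, d, g):
    """Back-calculate concentration from an observed MFI.
    Only valid for MFI values strictly between the two asymptotes."""
    return c * (((a - d) / (mfi - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)
```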

A BAYESIAN ANALYSIS OF BIOASSAY EXPERIMENTS
Luis G. Leon-Novelo*, University of Louisiana at Lafayette
Andrew Womack, Indiana University
Hongxiao Zhu, Virginia Polytechnic Institute and State University
Xiaowei Wu, Virginia Polytechnic Institute and State University

We address model-based statistical Bayesian inference to analyze data arising from bioassay experiments. These experiments consist in assigning increasing doses of a chemical substance to different groups of individuals (usually lab rats or mice) while retaining a control group unexposed to the substance. For every individual a 0-1 response is observed according to whether the individual exhibits the adverse event of interest. The objective of the experiment is to conclude if there is an association between the adverse event and the substance. A decision will be made based on the Bayes factor comparing two probit models: the model that assumes increasing dose effects vs. the model that assumes no dose effect. Moreover, the proposed approach incorporates information on (historical) control groups from previous studies and is able to handle data with very few occurrences of the adverse event. The proposed method is compared to a variation of the Peddada test (Peddada et al. 2007) via simulation and is shown to have higher power.
email: [email protected]

COMPOUND RANKING BASED ON A NEW MATHEMATICAL MEASURE OF EFFECTIVENESS USING TIME COURSE DATA FROM CELL-BASED ASSAYS
Francisco J. Diaz*, University of Kansas Medical Center

The IC50 concentration has limitations that make it unsuitable for examining a large number of compounds in cytotoxicity studies, particularly when multiple exposure periods are tested. A new approach to measure drug effectiveness is presented, which ranks compounds according to their toxic effects on live cells. This effectiveness measure combines all exposure times tested, compares the growth rates of a cell line in the presence of the compound with its growth rate in the presence of DMSO alone, measures a wider spectrum of toxicity than IC50, and allows automatic analyses of large numbers of compounds. It is easily implemented in linear regression software, provides a comparable measure of effectiveness for each investigated compound, and tests the null hypothesis that a compound is non-toxic versus the alternative that it is toxic. Our approach allows defining an automated decision rule for deciding whether a compound is significantly toxic. We illustrate with a cell-based study of the cytotoxicity of 24 analogs of novobiocin; the compounds were ranked in order of cytotoxicity to a panel of 18 cancer cell lines and 1 normal cell line. Our approach may also be a good alternative to computing the EC50.
email: [email protected]

NONPARAMETRIC CLASSIFICATION OF CHEMICALS USING QUANTITATIVE HIGH THROUGHPUT SCREENING (qHTS) ASSAYS
Shuva Gupta*, National Institute of Environmental Health Sciences, National Institutes of Health
Soumendra Lahiri, North Carolina State University
Shyamal Peddada, National Institute of Environmental Health Sciences, National Institutes of Health

Toxicologists (and regulatory agencies) are often interested in identifying toxins and carcinogens humans are exposed to. While the standard 2-year rodent cancer bioassay conducted by the US National Toxicology Program (NTP) is often considered the "gold standard" to evaluate chemicals, it is typically slow and expensive. Consequently, the NTP, the US Environmental Protection Agency (EPA) and others have begun exploring quantitative high throughput screening (qHTS) assays where thousands of chemicals can be processed in each run of the assay. These are cost effective and take considerably less time than the standard 2-year cancer bioassay. For each chemical, the data obtained from a qHTS assay consist of responses to several doses (e.g. 10 to 14 doses). Typically, chemicals with a sigmoidal shaped dose-response may be regarded as potentially active (i.e. potential toxins). Otherwise they are regarded as potentially inactive (i.e. perhaps not a toxin). Due to various characteristics of the data and the design, these distinctions are not clear cut and hence some chemicals are declared inconclusive. Recently, Lim, Sen and Peddada (2013) developed a robust parametric methodology for classifying chemicals from qHTS assays. As commonly done by toxicologists, Lim et al. (2013) used the Hill function to model the sigmoidal dose-response. While parameters of the Hill function provide important interpretations to toxicologists, and hence it is a preferred model, there are instances where such a parametric function can be too rigid for qHTS data. Lim et al. (2013) describe several challenges and open problems in the analysis of data from qHTS assays. We overcome some of those challenges by taking a nonparametric approach to the problem. In this talk we describe a methodology based on nonparametric monotone and convex functions that can be used for classifying chemicals as active, inactive or inconclusive. The resulting methodology is illustrated using data obtained from the NTP. Operating characteristics of the proposed methodology are discussed using an extensive simulation study that mimics real qHTS assay data.
email: [email protected]
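The Hill function referenced in the Gupta et al. abstract above is not displayed there; in one common parameterization the expected response at dose x is

\[
f(x) = E_0 + \left(E_{\max} - E_0\right)\, \frac{x^{h}}{x^{h} + \mathrm{EC}_{50}^{\,h}},
\]

with baseline E_0, maximal effect E_max, Hill slope h, and half-maximal dose EC_50. The abstract's point is that this sigmoidal parametric shape can be too rigid for some qHTS dose-response data, motivating their shape-constrained nonparametric alternative.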

ROBUST BAYESIAN METHODS FOR THE INVERSE REGRESSION WITH AN APPLICATION TO IMMUNOASSAY EXPERIMENTS
Magdalena Murawska, Indiana University Fairbanks School of Public Health, Indianapolis
Hector Sanz, Universitat de Barcelona, Spain
Ruth Aguilar, Universitat de Barcelona, Spain
Gemma Moncunill, Universitat de Barcelona, Spain
Carlota Dobaño, Universitat de Barcelona, Spain
John Aponte, Universitat de Barcelona, Spain
Clarissa Valim, Harvard School of Public Health
Jaroslaw Harezlak*, Indiana University Fairbanks School of Public Health, Indianapolis

Immunoassays are a common diagnostic and research tool in medical experiments. The use of such assays requires a calibration method that involves estimation of a standard curve reflecting the functional relationship between the concentration of the analytes and the median fluorescence intensity (MFI). Using the inverse standard curve, an unknown concentration can be estimated based on the given MFI. The most commonly used calibration methods rely on per-plate and per-analyte standard curve estimation. Such methods do not use the underlying biological properties and are not robust to the presence of outliers. Therefore we employ and expand the alternative Bayesian robust methods proposed by Fong et al. (2012), which allow the specification of correlated non-normal errors. We extend that approach to functions other than the 5-parameter logistic, such as exponential and power functions, to account for the setting where no upper asymptote is reached. The developed methods are applied in a malaria vaccine study using the cytokine Luminex platform. The preliminary results of the performed simulations indicate greater robustness of the Bayesian approach compared to mixed model or plate-by-plate approaches, especially when informative prior information is incorporated.
email: [email protected]

ESTIMATING THE PREVALENCE OF MULTIPLE DISEASES VIA TWO-STAGE HIERARCHICAL POOLING
Md S. Warasi*, University of South Carolina
Joshua M. Tebbs, University of South Carolina
Christopher McMahan, Clemson University

Testing protocols in large-scale disease screening applications often involve pooling biospecimens (e.g., blood, urine, swabs, etc.) to lower costs and/or to increase the number of individuals who are screened. Motivated by the recent development of assays that detect multiple diseases, it is now common to test biospecimen pools for multiple infections simultaneously. In a recent article, Tebbs, McMahan, and Bilder (Biometrics, 2013) developed an expectation-maximization algorithm to estimate the prevalence of two infections using a two-stage, Dorfman-type testing protocol motivated by current screening practices for chlamydia and gonorrhea in the United States. In this article, we have the same goal, but instead we take a more flexible Bayesian approach. This allows us to incorporate information about assay uncertainty during the screening process (which involves testing both pools and individuals) and also to update information as more individuals are screened. Overall, our approach provides reliable inference for disease probabilities and accurately estimates assay sensitivity and specificity even when little or no information is supplied in the prior distributions. We illustrate our approach using chlamydia and gonorrhea screening data from the Infertility Prevention Project. Extensions to more than two infections are also possible.
email: [email protected]
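As context for the Warasi, Tebbs and McMahan abstract above: the cost saving that motivates Dorfman-type pooling is easy to quantify in the idealized single-disease, error-free case. This back-of-the-envelope calculation is not part of the abstract, which handles assay error and multiple infections; the function name is illustrative.

```python
def dorfman_expected_tests_per_person(p, k):
    """Expected number of tests per individual under two-stage Dorfman pooling
    with pool size k, prevalence p, and a perfect assay: one master pool test,
    plus k individual retests whenever the pool tests positive."""
    pool_positive = 1.0 - (1.0 - p) ** k
    return 1.0 / k + pool_positive

# Example: p = 0.05 and k = 5 gives about 0.43 tests per person,
# versus 1.0 test per person under individual testing.
```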

A BALLOONED BETA REGRESSION MODEL AND ITS APPLICATION TO BIOASSAY DATA
Min Yi*, University of Missouri, Columbia
Nancy Flournoy, University of Missouri, Columbia

The beta distribution provides a simple and flexible model in which the response is naturally confined to a finite interval. The parameters of the distribution can be related to covariates such as dose and gender through a regression model. However, the beta distribution is naturally restricted between known boundaries, 0 and 1. A ballooned beta regression model with expected responses equal to the four-parameter logistic model is developed that expands the response boundaries from (0, 1) to (L, U), where L and U are unknown parameters. The ballooned beta regression function differs from the typical four-parameter logistic model, which has positive probability of responses from negative infinity to positive infinity. Given multiple ELISA plates of bioassay data from different laboratories, the motivating problem was to ascertain whether they all had equivalent boundaries. For these data, we find MLEs using a combination of grid searches and the Newton-Raphson method. We first test equivalence of the boundaries among plates. We do this under the ballooned beta model. Then we use a bivariate normal approximation to test the equivalence of the slopes and inflection points, considering L and U to be nuisance parameters. A 95 percent confidence ellipsoid is drawn to detect plates with outlying slopes and inflection points.
email: [email protected]

69. CONTRIBUTED PAPERS: Infectious Disease

VIRAL GENETIC LINKAGE ANALYSIS IN THE PRESENCE OF MISSING DATA
Shelley Han Liu*, Harvard University
Gabriel Erion, Harvard University
Vladimir Novitsky, Harvard School of Public Health
Victor DeGruttola, Harvard School of Public Health

Phylogenetic linkage, based on viral sequencing data from HIV prevention trials at the community level, can provide insight into HIV transmission dynamics and the impact of prevention interventions. Specifically, phylogenetic linkage has the potential to inform whether recently-infected individuals have acquired viruses circulating within or outside a community. Characteristics and patterns of HIV clustering can help to trace transmission dynamics of viruses circulating across communities. Specifics of HIV clustering can be related to the potential of some individuals to contribute disproportionally to the spread of the virus. However, assessment of the extent to which individual (incident or prevalent) viruses are clustered within a community is biased if only a subset of subjects is observed, especially if that subset is not representative of the entire HIV infected population. To address this concern, we develop a multiple imputation framework in which missing viral sequences are imputed based on a biological model for the diversification of viral genomes. The imputation method decreases the bias in clustering that arises from informative missingness. Data from a household survey conducted in a village in Botswana are used to illustrate these methods. We demonstrate that the multiple imputation approach effectively corrects for bias in the overall proportion of clustering due to informative missingness of individuals from certain demographic groups, and that we can recreate the entire sample of the population by viewing the observed dataset as a biased sample from the population.
email: [email protected]

A BAYESIAN APPROACH TO ESTIMATING CAUSAL VACCINE EFFECTS ON BINARY POST-INFECTION OUTCOMES
Jincheng Zhou*, University of Minnesota
Haitao Chu, University of Minnesota
Michael G. Hudgens, University of North Carolina, Chapel Hill
M. Elizabeth Halloran, Fred Hutchinson Cancer Research Center and University of Washington

To estimate causal effects of vaccine on post-infection outcomes, Hudgens and Halloran (2006) defined a post-infection causal vaccine efficacy estimand VE_I based on the principal stratification framework, using the maximum likelihood estimation method. Extending their research, we propose a Bayesian approach to estimate the causal vaccine effects on binary post-infection outcomes. The identifiability of the causal vaccine effect VE_I is discussed under different assumptions on selection bias. The performance of the proposed Bayesian method is compared with the maximum likelihood method through simulation studies and two case studies: a clinical trial of a rotavirus vaccine candidate and a field study of a pertussis vaccine. For both case studies, the Bayesian approach provided similar inference as the frequentist analysis. However, simulation studies with small sample sizes suggest that the Bayesian approach provides smaller bias and shorter confidence interval length.
email: [email protected]

EXPLORING BAYESIAN LATENT CLASS MODELS AS A POTENTIAL STATISTICAL TOOL TO ESTIMATE SENSITIVITY AND SPECIFICITY IN PRESENCE OF AN IMPERFECT OR NO GOLD STANDARD
Jay Mandrekar*, Mayo Clinic

Assessment of a new assay or diagnostic test is generally performed using statistical measures such as sensitivity, specificity, negative predictive value, positive predictive value and area under the curve when an established gold standard exists. However, in some cases, the gold standard may be imperfect or may not exist. In such situations, Bayesian latent class models (BLCM) are proposed as one of the possible alternatives. The LCM does not assume any gold standard, and the true disease state (present/absent) for each individual is unknown. Bayesian methodology for LCM will be illustrated using a simple example of a real-life dataset from a clinical microbiology research study. This approach is increasingly used to validate diagnostic tests for infectious diseases and in clinical microbiology research without assuming a gold standard.
email: [email protected]

MODELING AND INFERENCE FOR ROTAVIRUS DYNAMICS IN NIGER
Joshua Goldstein*, The Pennsylvania State University
Murali Haran, The Pennsylvania State University
Matthew Ferrari, The Pennsylvania State University

Recently developed vaccines provide a new way of controlling rotavirus in sub-Saharan Africa. Models for the transmission dynamics of rotavirus are critical for assessing the effects of vaccination and guiding intervention strategies. We examine rotavirus infection in the Maradi area in southern Niger, using hospital surveillance data provided by Médecins Sans Frontières collected over two years. Additionally, a cluster survey of households in the region allows us to estimate the proportion of children with diarrhea who consulted at a health structure. We compare our results across several variants of Susceptible-Infectious-Recovered (SIR) compartmental models to quantify the impact of modeling assumptions on our estimates. Model parameters are estimated by Bayesian inference using Markov chain Monte Carlo. Our approach allows us to quantify the burden of infection in the region, and to explore the impact of vaccination on both the short-term dynamics and the long-term reduction of rotavirus incidence under varying levels of coverage. Additionally, we investigate two-strain dynamic models to gain insight into a shift in the observed dominant genotype of rotavirus, consistent with the effects of strain replacement.
email: [email protected]
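The SIR-type compartmental skeleton underlying the rotavirus models in the Goldstein et al. abstract above (their variants add vaccination, waning immunity, and strain structure) is, in its simplest deterministic form,

\[
\frac{dS}{dt} = -\beta \frac{S I}{N}, \qquad \frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I,
\]

with transmission rate beta, recovery rate gamma, and population size N = S + I + R.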

COMPARISON OF GROUP TESTING ALGORITHMS FOR CASE IDENTIFICATION IN THE PRESENCE OF DILUTION EFFECT
Dewei Wang*, University of South Carolina
Christopher S. McMahan, Clemson University
Colin M. Gallagher, Clemson University

Group testing, through the use of pooling, has been widely implemented as a more efficient means to screen individuals for infectious diseases. Various testing strategies, such as hierarchical and square array-based testing algorithms, have been proposed. In this talk, I will present a comparison of the operating characteristics, including testing efficiency and classification accuracy, of these algorithms for the purpose of case identification. The difference between our approach and previous ones lies in the assumptions regarding testing error rates. We relax previously made assumptions by acknowledging the mechanistic structure of the diagnostic assays. By doing this, we are able to account for the dilution effect; i.e., truly positive specimens could be diluted when they are pooled together with many truly negative ones, and thus cannot be detected. This methodology is illustrated by comparing different testing algorithms via HIV, HBV and HCV data collected from a study involving Irish prisoners.
email: [email protected]

CHOLERA TRANSMISSION IN OUEST REGION OF HAITI: DYNAMIC MODELING AND PREDICTION
Alexander Kirpich*, University of Florida
Alex Weppelmann, University of Florida
Yang Yang, University of Florida
Ira Longini, University of Florida

We present a stochastic compartmental model for cholera transmission that combines an SIRS framework for human hosts with an environmental reservoir of the bacteria, to account for both direct human-to-human and indirect human-to-environment-to-human transmission routes. In addition, we consider the effect of environmental conditions such as temperature and precipitation on modulating the dynamics. The model distinguishes between symptomatic and asymptomatic infections, each with its own disease course and infectivity level. The asymptomatic subpopulation is not observable, and we perform sensitivity analysis on the related parameters. We apply our model to surveillance data in the Ouest region of Haiti during 2010-2014. We found that the transmission dynamics in Haiti were shaped jointly by the transmission among human hosts and the environmental reservoir, the waning of immunity in human hosts, the natural life cycle of the bacteria, and the potential effects of other external factors such as phage that infects the bacteria.
email: [email protected]

70. CONTRIBUTED PAPERS: Variable Selection

WEAK SIGNAL IDENTIFICATION AND INFERENCE IN PENALIZED MODEL SELECTION
Peibei Shi*, University of Illinois, Urbana-Champaign
Annie Qu, University of Illinois, Urbana-Champaign

Penalized model selection methods are developed to select variables and estimate coefficients simultaneously, which is useful in high-dimensional variable selection. However, identification and inference for weak signals are still quite challenging and are not well studied. Existing inference procedures for penalized estimators are mainly focused on strong signals. This motivates us to investigate finite sample behavior for weak signal inference. We propose an identification procedure for weak signals in finite samples, and provide a transition phase in between noise and strong signal strengths. A new two-step inferential method is introduced to construct better inference for the weak signals being identified. Our simulation studies show that the proposed method leads to better confidence coverage for weak signals, compared with approaches using asymptotic inference, perturbation, and bootstrap resampling. We also illustrate our method on HIV antiretroviral drug susceptibility data to identify genetic mutations associated with HIV drug resistance.
email: [email protected]

FEATURE SCREENING FOR TIME-VARYING COEFFICIENT MODELS WITH ULTRAHIGH DIMENSIONAL LONGITUDINAL DATA
Wanghuan Chu*, The Pennsylvania State University
Runze Li, The Pennsylvania State University
Matthew Reimherr, The Pennsylvania State University

This paper is concerned with feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data. We propose a new screening method that identifies important predictors after accounting for within-subject correlation and time-varying variance of the longitudinal response. We examine its finite sample performance by comparing with other existing methods via Monte Carlo simulations. In the real data example, Childhood Asthma Management Program (CAMP) datasets are analyzed, where SNPs of genes that affect children's asthma measurements are selected after accounting for baseline predictors. We advocate a two-stage approach that first reduces the ultrahigh dimensionality to a moderate size using the proposed procedure, and then applies model selection techniques to make statistical inference on the coefficient functions and covariance structure. To compare models selected by our screening procedure and other methods, we evaluate the prediction performance through leave-one-out cross validation. Finally, we discuss the joint and individual heritability of SNPs estimated from the best models selected.
email: [email protected]

A REGULARIZED APPROACH FOR SIMULTANEOUS ESTIMATION AND MODEL SELECTION FOR SINGLE INDEX MODELS
Longjie Cheng*, Purdue University
Peng Zeng, Auburn University
Yu Zhu, Purdue University

The single index model generalizes the linear regression model by incorporating a non-parametric component. It has become increasingly popular due to its flexibility in modelling. Similar to the linear regression model, the set of predictors for the single index model can contain a large number of irrelevant variables. In this work, we propose a new method for simultaneous estimation and model selection for the single index model. We develop a coordinate descent algorithm to efficiently implement our method for both low and high dimensional cases. We show that under certain conditions, the proposed method can consistently estimate the true index and select the true model. Simulations with various settings and a real data analysis are conducted to demonstrate the estimation accuracy, the selection consistency and the computational efficiency of our proposed method.
email: [email protected]

MULTI-STEP LASSO
Haileab Hilafu*, University of Tennessee

The traditional linear regression model remains one of the most popular statistical inference tools in a diverse range of applications due to its simplicity and intuitive interpretability. Under this model, the LASSO (Tibshirani, 1996) is an attractive penalized least squares approach that provides simultaneous estimation and variable selection. However, the LASSO is known to have many limitations, especially when the number of non-zero coefficients exceeds the available sample size and the variables corresponding to the non-zero coefficients are highly correlated. In this talk, we will present a novel algorithm, the multi-step LASSO, which shields the LASSO from these limitations. The algorithm exploits the correlation structure among the predictors to improve estimation. Extensive simulation studies and an application to publicly available gene expression data on Diffuse Large B-Cell Lymphoma show that the proposed method yields superior results under different modeling scenarios.
email: [email protected]

BAYESIAN HIERARCHICAL VARIABLE SELECTION INCORPORATING MULTI-LEVEL STRUCTURAL INFORMATION
Changgee Chang*, Emory University
Yize Zhao, Statistical and Applied Mathematical Sciences Institute
Qi Long, Emory University

Recently, considerable effort has been made to incorporate structural or biological information among covariates into variable selection. In this work, we propose a Bayesian approach for hierarchical variable selection in Gaussian process models while incorporating multi-level structural/biological information. We develop efficient MCMC algorithms for posterior computation. We examine the performance of our proposed method by simulation studies and we apply it to a colorectal cancer study for assessing treatment effects on multiple functional biomarkers.
email: [email protected]

MODEL SELECTION FOR PROTEIN COPY NUMBERS IN POPULATIONS OF MICROORGANISMS
Burcin Simsek*, University of Pittsburgh
Hanna Salman, University of Pittsburgh
Satish Iyengar, University of Pittsburgh

Recent biophysical studies have raised questions about the possible universality of protein copy number fluctuations. We are interested in comparing the fits of several models to those fluctuations. These models include the lognormal, generalized inverse Gaussian, and Frechet, using closeness as measured by the Kullback-Leibler divergence. The lognormal results from a large number of multiplicative processes, or exponential growth; the generalized inverse Gaussian arises as a first passage time for diffusions; and the Frechet is an extreme value distribution. In this study, we show that the lognormal gives the best fit, and we discuss implications for underlying biophysical processes.
email: [email protected]

GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA
Qi Zheng*, Emory University
Limin Peng, Emory University
Xuming He, University of Michigan

Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to conceptual difficulties in identifying relevant variables of interest. We propose a new penalization framework for quantile regression in the high dimensional setting. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal.
email: [email protected]

71. CONTRIBUTED PAPERS: Modeling Health Data with Spatial or Temporal Features

MODELING OF CORRELATED OBJECTS WITH APPLICATION TO DETECTION OF METASTATIC CANCER USING FUNCTIONAL CT IMAGING
Yuan Wang*, University of Texas MD Anderson Cancer Center
Brian Hobbs, University of Texas MD Anderson Cancer Center
Jianhua Hu, University of Texas MD Anderson Cancer Center
Kim-Anh Do, University of Texas MD Anderson Cancer Center

Perfusion computed tomography (CTp) is an emerging functional imaging modality that uses physiological models to quantify characteristics pertaining to the passage of fluid through blood vessels. Perfusion characteristics provide physiological correlates for neovascularization induced by tumor angiogenesis. Thus CTp offers promise as a non-invasive quantitative functional imaging tool for cancer detection, prognostication, and treatment monitoring. We first developed a Bayesian probabilistic framework for simultaneous supervised classification of multivariate correlated regions. We demonstrate that simultaneous Bayesian classification yields dramatic improvements in performance in the presence of strong correlation, yet remains competitive with classical methods in the presence of weak or no correlation. A semi-parametric model is further implemented for estimation and prediction of sparse spatiotemporally correlated CTp characteristics derived from multiple intra-patient metastatic sites. We considered weighted kernel smoothing and joint prediction of curves arising from multiple ROIs within the same patient to improve characterizations of contrast absorption over time. The methodology builds a foundation for probabilistic segmentation of regions of the liver that exhibit perfusion characteristics indicative of metastatic sites, using CTp maps acquired over the entire liver.
email: [email protected]

A SPATIALLY VARYING COEFFICIENT MODEL WITH PARTIALLY UNKNOWN PROXIMITY MATRIX FOR THE DETECTION OF GLAUCOMA PROGRESSION USING VISUAL FIELD DATA
Joshua L. Warren*, Yale School of Public Health
Jean-Claude Mwanza, University of North Carolina, Chapel Hill
Angelo P. Tanna, Northwestern University
Donald L. Budenz, University of North Carolina, Chapel Hill

Glaucoma is a leading cause of irreversible blindness worldwide. Once a diagnosis is made, careful monitoring of the disease is required to prevent vision loss. However, determining if the disease is progressing remains the most difficult task in the clinical setting. We introduce new spatial methodology in the Bayesian setting in order to properly model the progression status of a patient, as determined by expert clinicians, as a function of changes in sensitivities at each visual field (VF) location jointly. Past modeling attempts include the analysis of global VF measures over time or the separate analyses of sensitivities at individual VF locations over time. The first set of methods ignores valuable spatial information while the second set is inefficient and fails to account for spatial similarities in vision loss across the VF. Our spatial probit model jointly incorporates all VF changes in a single framework while accounting for structural similarities between neighboring VF regions. Results indicate that our method provides improved model fit when compared with previously developed methods. The model is also shown to provide improved predictions of progression status in a validation dataset. This model may be clinically useful for detecting the glaucoma progression status of an individual.
email: [email protected]

MAPPING AND MEASURING THE EFFECT OF PRIVATIZATION ON ALCOHOL AND VIOLENCE: DOES IT REALLY MATTER?
Loni Philip Tabb*, Drexel University
Tony H. Grubesic, Drexel University

The increasing presence of alcohol outlets has long been linked to violent crime, particularly in urban areas. Privatization removes state control of alcohol sales, and the effect of privatization has been shown to alter the relationship between alcohol outlets and various alcohol-related public health issues. Research on alcohol outlets, though, commonly involves a cross-sectional setting, thereby limiting the possibility of investigating temporal trends, especially in the presence of policy change. The purpose of this research is to examine the spatio-temporal distribution of alcohol outlets before and after privatization in Seattle, Washington, 2010-2014. This natural experiment allows us to see the effect of privatization on the already well-documented positive relationship between alcohol outlets and violence. Using census block groups, we are able to analyze the patterns of alcohol outlets, as well as characterize the census block group variation via geovisualization methods.
email: [email protected]

MODELING ADOLESCENT HEALTH DATA USING A BINARY SPATIAL-TEMPORAL GENERALIZED METHOD OF MOMENTS APPROACH
Kimberly Kaufeld*, Statistical and Applied Mathematical Sciences Institute and North Carolina State University

Health applications generally contain binary data that are correlated across space and time. A model that accounts for the spatial and temporal dependence is the centered spatial-temporal autologistic regression model. Statistical inference for the autologistic model has been based upon pseudolikelihood, Monte Carlo maximum likelihood, or Monte Carlo expectation-maximization. However, these methods require the full conditional distribution to be defined, which can be computationally expensive given the complexity of the spatial and temporal dependence. We propose an alternative approach to likelihood based methods for binary spatial-temporal data using the generalized method of moments. The approach is based on a set of moment conditions constructed with respect to spatial neighborhoods and time, accounting for the spatial and temporal dependence of the data. Comparisons of the estimation methods are demonstrated in a simulation and with the Add Health data to assess the effect of peers at multiple levels (i.e. grade and school) on drug and alcohol use.
email: [email protected]
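The centered spatial-temporal autologistic model mentioned in the Kaufeld abstract above is not written out there; a sketch of its purely spatial version, following the centered parameterization, models a binary response Y_i with neighborhood N_i through the conditional specification

\[
\mathrm{logit}\, P(Y_i = 1 \mid y_{N_i}) = x_i^\top \beta + \eta \sum_{j \in N_i} \left(y_j - \mu_j\right), \qquad \mu_j = \mathrm{expit}(x_j^\top \beta),
\]

where eta measures spatial dependence; the abstract's generalized-method-of-moments approach is designed to avoid working with these full conditionals directly.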

302 ENAR 2015 | Spring Meeting | March 15–18 binary spatial-temporal data using gener- firefighters compared the timing and Built environment factors have received alized method of moments. The approach incidence of physician-diagnosed OAD heightened attention in recent years as is based on a set of moment conditions relative to WTC-exposure. Exposure was potential contributors to health, given constructed with respect to spatial neigh- categorized by WTC arrival time: high that the built environment can constrain borhoods and time, accounting for the (9/11/2001 AM); moderate (9/11/2001 individual-level choices and behaviors. spatial and temporal dependence of the PM or 9/12/2001); or low (9/13/-24/2001). For example, food outlets around schools data. Comparisons of the estimation meth- Piecewise exponential survival models may affect children dietary choices ods are demonstrated in a simulation and with change points were used to model both through direct access to junk food with the Add Health data to assess the the relative rates (RR) and 95% confi- and exposure to advertisement thereby effect of peers at multiple levels (i.e. grade dence intervals (CI) of OAD incidence influencing body weight. Although and school) on drug and alcohol use. by exposure over the first ten years some research has observed significant post-9/11/2001, estimating the time(s) associations between the availability of email: [email protected] of change in the RR with change point food outlets near schools and childhood models. We observed change points at obesity, other studies have not. Traditional A PIECEWISE EXPONENTIAL SUR- 15 and 84 months post-9/11/2001. Before regression methods have been widely VIVAL MODEL WITH CHANGE POINTS 15 months the RR for the high versus low used to examine said associations, but FOR EVALUATING THE TEMPORAL exposure group was 4.23 (95% CI 2.71- they often rely on measures of the built ASSOCIATION OF WORLD TRADE 6.60), from 15 to 84 months 1.94 (95% environment (e.g., number of food out- CENTER EXPOSURE WITH INCIDENT CI 1.51-2.51) and thereafter, 1.01 (95% lets) within pre-specified distances from OBSTRUCTIVE AIRWAY DISEASE CI 0.76-1.36). Incidence of physician- schools. We propose using distributed lag diagnosed OAD increased in all exposure models (DLMs) to describe the association Charles B. Hall*, Albert Einstein groups starting in the sixth year post between built environment features and College of Medicine 9/11/2001 as the program started covering health as a function of distance from the Xiaoxue Liu, Montefiore Medical Center OAD medications for free. This difference study locations. We demonstrate through Rachel Zeig-Owens, Montefiore in RR by exposure occurred despite full simulation studies that traditional regres- Medical Center and free access to healthcare for all WTC- sion models can produce severely biased exposed firefighters, demonstrating the associations when there is spatial correla- Mayris P. Webber, Montefiore persistence of WTC-associated OAD risk tion among the built environment features. Medical Center for up to seven years. In contrast, inference based on DLMs is Jessica Weakley, Montefiore Medical robust under various conditions of the built email: [email protected] Center environment. We use this innovative appli- Theresa M. Schwartz, Montefiore cation of DLMs to examine the association Medical Center DISTRIBUTED LAG MODELS: between the presence of convenience EXAMINING ASSOCIATIONS stores around California public schools David J. 
Prezant, Fire Department BETWEEN THE BUILT ENVIRONMENT and children’s body mass index z-score. of the City of New York AND HEALTH email: [email protected] The World Trade Center (WTC) disaster Jonggyu Baek*, University of Michigan presents a unique opportunity to describe the latency period for obstructive airway Brisa N. Sanchez, University of Michigan disease (OAD) diagnoses. This pro- Veronica J. Berrocal, University of spective cohort study of New York City Michigan Emma V. Sanchez-Vaznaugh, San Francisco State University

Program & Abstracts 303 CLUSTER DETECTION TEST occurs or not. This assumption, however, sion model. The covariance matrix of IN SPATIAL SCAN STATISTICS: is not reasonable for many longitudinal multiple correlated time-specific random ADHD APPLICATION studies. Therefore we directly model intercepts for each subject is assumed to event time as a covariate, which provides represent the within-subject association. Ahmad Reza Soltani*, Kuwait University intuitive interpretation. When the terminal The subject-specific random effects cova- Suja Aboukhamseen, Kuwait University event times are right-censored, a semi- riance matrix is further decomposed into We establish hypotheses testing for spa- parametric likelihood-based approach is its dependence and variance components tial scan testing hypotheses for cluster proposed for the parameter estimation, through modified Cholesky decomposi- detection, then provide a transparent test where the Cox regression model is used tion method and then the unconstrained statistics procedure for cluster detection for the censored terminal event time. We version of resulting parameters are in a spatial settings. We also specify the consider a two-stage estimation proce- modelled in terms of covariates with limiting distribution of the test statistics. dure, where the conditional distribution low-dimensional regression parameters. We apply our method to the special of the right-censored terminal event time This provides better explanations related needs school students in Kuwait suffering given other variables is estimated prior to dependence and variance parameters from Attention Deficit Hyper Active Disor- to maximizing the likelihood function for and a reduction in the number of param- der, using real data. We do detects same the regression parameters. The proposed eters to be estimated in random effects primary and secondary clusters among method outperforms the complete case covariance matrix to avoid possible iden- districts of the students residential areas. analysis in simulation studies, which sim- tifiability problems. Marginal correlations ply eliminates the subjects with censored between responses of subjects and within email: [email protected] terminal event times. Desirable asymptotic the responses of a subject are derived properties are provided. through a Taylor series-based approxima- tion. Data cloning computational algorithm 72. CONTRIBUTED PAPERS: email: [email protected] Advances in is used to compute the maximum likeli- Longitudinal Modeling hood estimates and their standard errors A MARGINALIZED MULTILEVEL of the parameters in the proposed model. MODEL FOR BIVARIATE The proposed model is illustrated through CONDITIONAL MODELING LONGITUDINAL BINARY DATA Mother’s Stress and Children’s Morbidity OF LONGITUDINAL DATA WITH study data, where both population-aver- Gul Inan*, Middle East Technical TERMINAL EVENT aged and subject-specific interpretations University, Turkey Shengchun Kong*, Purdue University are drawn through Emprical Bayes estima- Ozlem Ilk Dag, Middle East Technical tion of random effects. Bin Nan, University of Michigan University, Turkey email: [email protected] Jack Kalbfleisch, University of Michigan This study considers analysis of bivariate We consider longitudinal data analysis longitudinal binary data. We propose a with a terminal event where the terminal model based on marginalized multilevel AUGMENTED BETA RECTANGULAR event time is informative. Existing methods model framework. 
The proposed model REGRESSION MODELS: A BAYESIAN include the joint modeling approach using consists of two levels such that the first PERSPECTIVE latent frailty and the marginal estimating level associates the marginal mean of Jue Wang*, University of Texas Health equation approach using inverse probabil- responses with covariates through a Science Center, Houston ity weighting approach, and both assume logistic regression model and the sec- Sheng Luo, University of Texas Health that the relationship between the response ond level includes subject/time specific Science Center, Houston variable and a set of covariates is the random intercepts within a probit regres- same no matter whether the terminal event

304 ENAR 2015 | Spring Meeting | March 15–18 Mixed effects Beta regression models robustness of inference over parametric intervals and with no unobserved transi- based on Beta distributions have been models. However, such models are not tion between two contiguous assessment widely used to analyze longitudinal robust against outlying observations. time points, these two types of models percentage or proportional data ranging Rank regression (RR), a lesser-known work equally well. When the data are not between zero and one. However, Beta model based on the Wilcoxon score for equally spaced, or there are possible distributions are not flexible to extreme the Mann-Whitney-Wilcoxon (MWW) test, unobserved transitions between two outliers or excessive events around tail provides more robust estimates over GEE. contiguous assessment time points (e.g., areas, and they do not account for the Unfortunately, RR does not sufficiently a patient dies without being observed first presence of the boundary values zeros address missing data arising in longitudi- passing through severe clinical disease), and ones because these values are not nal studies. We discuss a new approach the continuous-time multi-state Markov in the support of the Beta distributions. to address outliers in longitudinal study model is preferred. We also apply our To address these issues, we propose a data. This robust alternative not only effec- model to a real dataset, the Nun Study, mixed effects model using Beta rectan- tively addresses missing data, but has a cohort of 461 participants who were gular distribution and augment it with also been applied to extend the MWW to cognitively normal at study baseline and the probabilities of zero and one. We provide causal inference for observational followed to autopsy. studies. The approach is illustrated with conduct extensive simulation studies to email: [email protected] assess the performance of mixed effects both real and simulated data. models based on both the Beta and Beta email: [email protected] rectangular distributions under various APPLICATIONS OF MULTIPLE scenarios. The simulation studies sug- OUTPUTATION FOR THE ANALYSIS gest that the regression models based on MARKOV CHAINS AND CONTINU- OF LONGITUDINAL DATA SUBJECT Beta rectangular distributions improves OUS TIME MULTI-STATE MARKOV TO IRREGULAR OBSERVATION the accuracy of parameter estimates in MODELS COMPARISONS IN LONGI- Eleanor M. Pullenayegum*, Hospital the presence of outliers and heavy tails. TUDINAL CLINICAL ANALYSIS for Sick Children The proposed models are applied to the Lijie Wan*, University of Kentucky motivating Neuroprotection Exploratory Observational cohort studies often feature Trial in PD Long-term Study-1 (LS-1 study, Richard J. Kryscio, University longitudinal data subject to irregular n=1741), developed by The National Insti- of Kentucky observation. Moreover, the timings of observations are often associated with tute of Neurological Disorders and Stroke Erin Abner, University of Kentucky Exploratory Trials in Parkinson’s Disease the underlying disease process, and Multi-state Markov models are widely used (NINDS NET-PD) network. must thus be accounted for when analys- to analyze longitudinal data describing ing the data. Multiple outputation, which email: [email protected] the progression of a chronic disease or consists of repeatedly discarding excess condition, like dementia. 
Several studies observations, can be a helpful way of have focused on modeling true disease RANK-BASED REGRESSION approaching the problem. In particular, progression as a discrete time Markov MODELS FOR LONGITUDINAL DATA we show that multiple outputation enables chain, which requires certain assumptions. doubly robust inference within standard Rui Chen, University of Rochester Recently, continuous-time multi-state statistical software, and widens the scope Tian Chen*, University of Rochester Markov models have also become very of semi-parametric joint models for the popular. In this paper, we discuss the outcome and visit processes to include Xin Tu, University of Rochester relationship as well as differences between cases where the visit process includes a Popular mean-based semi-parametric these two modeling techniques. Our time-varying endogenous covariate. regression models such as the general- simulation study shows that when longitu- email: [email protected] ized estimating equations (GEE) improve dinal data are arise from equally spaced

Program & Abstracts 305 A HIDDEN MARKOV MODEL 73. CONTRIBUTED PAPERS: special case of a binary IV, we show that APPROACH TO ANALYZE LON- Causal Inference: ETT is non-parametrically identified under GITUDINAL TERNARY OUTCOME Average and a straightforward assumption that the IV DISEASE STAGE CHANGE SUBJECT Mediated Effects does not interact with an unmeasured TO MISCLASSIFICATION confounder in a logistic propensity score model for treatment. For inference, we Julia Benoit*, University of Houston INSTRUMENTAL VARIABLE propose three different semiparametric Wenyaw Chan, University of Texas ESTIMATION OF THE MARGINAL strategies (i) inverse probability weight- Health Science Center School of AVERAGE EFFECT OF TREATMENT ing (IPW), (ii) outcome regression and Public Health ON THE TREATED (iii) doubly robust (DR) estimation which Understanding the dynamic disease pro- Lan Liu*, Harvard University combines (i) and (ii) and is more robust cess is vital in early detection, diagnosis, than either strategies. Specifically, the Baoluo Sun, Harvard University and measuring progression. Continuous- DR estimator is shown to be consistent if time Markov chain (CTMC) methods James Robins, Harvard University either strategy (i) or (ii) is consistent. An extensive simulation study is carried out have been used to estimate state change Eric Tchetgen Tchetgen, Harvard to investigate the finite sample perfor- intensities but challenges arise when University stages are potentially misclassified. We mance of the proposed estimators. The The objective of many studies in health present an analytical likelihood approach methods are further illustrated in a well and social sciences is to evaluate the where the hidden state is modeled as known application of the impact of partici- causal effect of a treatment or exposure a three-state CTMC model using the pation in a 401(k) retirement programs on on a specific outcome using observa- possibly misclassified observed values. savings. tional data. In such studies the exposure Covariate effects of the hidden process email: [email protected] is typically not randomized and therefore and misclassification probabilities of confounding bias can rarely be ruled the hidden state are estimated without out with certainty. The instrumental information from a ‘gold standard’ as WITHIN-SUBJECT DESIGNS FOR variable (IV) design plays the role of a comparison. Parameter estimates are CAUSAL MEDIATION ANALYSIS quasi-experimental handle since the IV is obtained using a modified EM algorithm associated with the treatment, indepen- Yenny Webb-Vargas*, Johns Hopkins and identifiability of CTMC estimation is dent of potential outcomes conditional Bloomberg School of Public Health addressed. Simulation studies and an on observed covariates and it affects the application studying Alzheimer Dis- Martin A. Lindquist, Johns Hopkins outcome only through treatment. In this ease progression are presented. The Bloomberg School of Public Health paper, we present a novel framework for method was highly sensitive to detect- Elizabeth A. Stuart, Johns Hopkins identification and estimation using an IV, ing true misclassification and did not Bloomberg School of Public Health of the marginal average causal effect of falsely identify error in the absence of treatment amongst the treated (ETT) in Michael E. Sobel, Columbia University misclassification. In conclusion, we have the presence of unmeasured confound- developed a robust longitudinal method- Making causal statements about media- ing. 
We show that access to an IV allows ology for categorical outcome data where tion always poses a problem because, for partial identification of the association the researcher is unsure of the level of even if we can randomize the interven- between exposure and the potential out- uncertainty in the classification of disease tion, it is often difficult - or impossible - to come under no exposure, which encodes severity stage if the purpose is to look at randomize the assignment of the media- the magnitude of selection bias due to the process’ transition behavior without a tor. However, there are cases in which confounding of the treatment. In the gold standard. a different design of experiment can be applied, opening new possibilities for email: [email protected] identification of the mediation counterfac-

306 ENAR 2015 | Spring Meeting | March 15–18 tuals. Within-subject designs, common is challenging, due to Multicolinearity and clusters estimated based on a set of in experiments using functional Magnetic the Reversal paradox. We propose using potential effect modifiers. The cluster Resonance Imaging, allow for the obser- Weighted Quantile Sum (WQS) regres- specific direct and indirect effects can vation of an individual’s response under sion to analyze mediation effects of a set be estimated through a set of regression both treated and control conditions, often of correlated predictors on the outcome. models whose coefficients differ by clus- with replicates. We explored this design The WQS method is a constrained, ter. We construct a nonparametric model under the potential outcomes framework, non-linear optimization algorithm which with a stick breaking prior to identify the and established identifiability conditions estimates the regression parameters, clusters based on a large set of potential for estimating causal direct and indirect and the empirical weights of individual effect modifiers. We use this approach effects. Furthermore, we developed an predictors (ranked as quartiles), simulta- to estimate the cluster specific causal estimation procedure that is robust to neously. The result is WQS index, which effects of an expressive writing interven- confounding of the mediator-outcome, represents the set of correlated predic- tion for patients with renal cell carcinoma and we tested it using simulations. We tors as a single entity, having a composite (Milbury et al., 2014). effect on the outcome. Next, by applying applied this method to a trial analyzing email: [email protected] how the brain processes thermal pain. the traditional mediation analysis method In this case, the mediator is the hemody- using the WQS index, enables finding namic response function, conceptualized the significant mediation effect of the ACCOUNTING FOR UNCERTAINTY as a continuous function in time, and predictors acting through a hypothesized IN CONFOUNDER SELECTION WHEN our method is able to estimate the cor- mediation pathway, on the outcome of ESTIMATING AVERAGE CAUSAL responding functional parameter of a interest. While other constrained optimi- EFFECTS IN GENERALIZED LINEAR causal indirect effect. zation methods focus on dimensionality MODELS reduction, WQS attempts to pick out the email: [email protected] Chi Wang*, University of Kentucky predictors amongst a correlated predictor set, that have a significant effect on the Corwin Matthew Zigler, Harvard School MEDIATION ANALYSIS OF A SET outcome. Preliminary simulation results of Public Health OF CORRELATED PREDICTORS of WQS applied to a set of two correlated Giovanni Parmigiani, Dana-Farber USING WEIGHTED QUANTILE SUM predictors compared to the traditional Cancer Institute and Harvard School REGRESSION METHOD Multiple Regression Mediation Analysis of Public Health are presented. Bhanu Murthy Evani*, Virginia Francesca Dominici, Harvard School Commonwealth University email: [email protected] of Public Health Robert A. Perera, Virginia Confounder selection and adjustment Commonwealth University BAYESIAN SEMIPARAMETRIC are essential elements of assessing the Chris Gennings, Icahn School LATENT MEDIATION MODEL causal effect of an exposure or treat- ment in observational studies. Building of Medicine at Mount Sinai Chanmin Kim*, Harvard University upon work by Wang et al. (2012) and Traditional mediation analysis uses the Michael J. 
Daniels, University Lefebvre et al. (2014), we propose and single predictor, multiple regression of Texas, Austin evaluate a Bayesian method to estimate method; first proposed by Baron & Kenny Yisheng Li, University of Texas average causal effects in studies with a (1986) and since then advanced by MD Anderson Cancer Center large number of potential confounders, authors Hayes, MacKinnon and Vander- likely interactions between these con- We propose a Bayesian semiparametric Weele. The application of Mediation founders and the exposure of interest, analysis to a set of correlated predictors method to estimate natural direct and indirect effects (causal effects) within

Program & Abstracts 307 and uncertainty on which confound- goal of gaining some awareness of how mixed effects SEM with an innovation to ers should be included. Our method variable selection affects the balance of estimate the unknown correlation param- is applicable across all exposures and covariates, prediction ability, and accu- eter in the first layer. Using extensive outcomes that can be handled through racy of the estimate. Optimizing all three simulated data and a real fMRI dataset, generalized linear models. In this setting, metrics does not always coincide in the we demonstrate the improvement of our models coefficients are not collapsible simulations conducted. approach over existing methods. across different models for confounding email: [email protected] email: [email protected] adjustment. We implement a Bayesian bootstrap procedure to estimate causal effects while acknowledging uncertainty ESTIMATING MEDIATION EFFECTS 74. CONTRIBUTED PAPERS: in the population covariate distribution. UNDER CORRELATED ERRORS WITH Variable Selection with Our method permits estimation of both AN APPLICATION TO fMRI High Dimensional Data the overall population causal effect and Yi Zhao*, Brown University effects in specified subpopulations, providing clear characterization of hetero- Xi Luo, Brown University EMPIRICAL LIKELIHOOD TESTS geneous exposure effects that may vary Mediation analysis assesses the effect FOR COEFFICIENTS IN HIGH considerably across different covariate passing through a intermediate variable DIMENSIONAL LINEAR MODELS profiles. Simulation studies demonstrate (mediator) in a causal pathway from Honglang Wang*, Michigan State that the proposed method performs well the treatment variable to the outcome University in small sample size situations with 100 to variable. Structure equation models Ping-Shou Zhong, Michigan State 150 observations and 50 covariates. The (SEMs) is a popular approach to esti- University method is applied to evaluate the effect mate the mediation effect. However, of surgery on reducing thirty-day hospital causal interpretation usually requires Yuehua Cui, Michigan State University readmissions among 15060 US Medicare strong assumptions which may not hold We consider hypothesis testing prob- beneficiaries diagnosed with a brain in many social and scientific studies. In lems for low-dimensional regression tumor between 2000 and 2009. this paper, we use mediation analysis in coefficients in a high dimensional linear email: [email protected] an fMRI experiment to assess the effect model with Gaussian designs. We of randomized binary stimuli passing propose empirical likelihood based test through a brain pathway of two brain procedures. The empirical likelihood is VARIABLE SELECTION FOR ESTI- regions. We propose a two-layer SEM constructed based on asymptotically MATING AVERAGE CAUSAL EFFECTS framework for mediation analysis that unbiased estimating equations. This Douglas Galagate*, U.S. Census Bureau provides valid inference even if correlated method is flexible in incorporating auxil- additive errors are present. In the first iary information to improve the power of Determining which variables to include layer, we use a linear SEM to model the testing and it is robust to heterogeneous at each stage of the modeling process subject level fMRI data, where the con- random errors. Some simulation studies when estimating an average causal tinuous mediator and outcome variables and real data analyses are conducted to effect (ACE) is a topic of debate. 
In this may contain correlated additive errors. demonstrate the proposed methods. simulation study, different subsets of the We propose a constrained optimization predictor variables are used to estimate email: [email protected] approach to estimate the model coef- the ACE. We estimate the ACE using ficients, analyze its asymptotic properties, outcome-focused, treatment-focused, and characterize the nonidentifiability and double-robust models with the issue due to the correlation parameter. To address this issue, we introduce a linear

308 ENAR 2015 | Spring Meeting | March 15–18 TPRM: TENSOR PARTITION not only reduced to a manageable level, through extensive simulations and is REGRESSION MODELS WITH resulting in efficient estimation, but also applied to the breast cancer NKI data set APPLICATIONS IN IMAGING prediction accuracy is optimized to search to predict breast cancer patients’ survival. for informative sub-tensors. We apply BIOMARKER DETECTION email: [email protected] TPRM to a structural magnetic resonance Michelle F. Miranda*, University of North imaging data, to predict diagnostic status Carolina, Chapel Hill of individuals with Attention Deficit Hyper- STATISTICAL INFERENCE IN Hongtu Zhu, University of North Carolina, activity Disorder. HIGH-DIMENSIONAL M-ESTIMATION Chapel Hill email: [email protected] Hao Chai*, Yale University Joseph G. Ibrahim, University of North Carolina, Chapel Hill Shuangge Ma, Yale University A BOOSTING-BASED VARIABLE Many neuroimaging studies have col- This paper studies the asymptotic proper- SELECTION METHOD FOR SURVIVAL lected ultra-high dimensional imaging ties of some low-dimensional parameters PREDICTION WITH GENOME-WIDE data in order to identify imaging biomark- under the high-dimensional M-estimation GENE EXPRESSION DATA ers that are related to normal biological framework. We consider a general processes, diseases, and the response Yanming Li*, University of Michigan M-estimation problem in which penal- ization is used to select the variables. to treatment, among many others. These Kevin He, University of Michigan imaging data are often represented in Based on the low-dimensional penaliza- the form of a multi-dimensional tensor. Yi Li, University of Michigan tion projection method, we propose a two-stage estimator for selected low- Existing statistical methods are insufficient Ji Zhu, University of Michigan for analysis of these tensor data due to dimensional parameters. Our framework Motivated by a study using genome-wide their ultra-high dimensionality as well as includes linear and generalized linear gene expression data to predict breast complex structure. The aim of this paper models as special cases. Under rea- cancer patients’ survival, we proposed is develop a tensor partition regression sonable conditions, we show that the a Gateaux differential-based boost- modeling framework to establish an asso- proposed estimator is consistent and ing (GDBoosting) method for variable ciation between low-dimensional clinical has an asymptotic normal distribution. selection in the ultra-high dimensional outcomes and ultra-high dimensional ten- We find that a stronger requirement on predictor and survival outcome setting. sor covariates. Our TPRM is a hierarchical the sample size is needed in order to The proposed method can simultane- model with four components: (i) a partition obtain the asymptotic normality than the ously select important variables via an model that divides the high-dimensional consistency. The numerical performance early-stopping criterion and provide tensor covariates into sub-tensor of our estimator is evaluated through consistent estimates for the effects of covariates; (ii) a canonical polyadic simulation studies and a high-dimen- selected variables provided the selec- decomposition model to reduce the sub- sional data example is used to illustrate tion were as good as had been told by tensor covariates to a low-dimensional its application. an oracle. 
The GDboosting algorithm is feature vectors; and (iii) a generalized email: [email protected] more computationally efficient compared linear model that uses the feature vectors to the lasso, and can be easily adapted to predict clinical outcomes; (iv) a sparse to the case when the genome-wide gene inducing normal mixture prior. Under this expression data assume a grouping framework, ultra-high dimensionality is structure, such as biological pathways. The proposed method is evaluated

Program & Abstracts 309 AUGMENTED WEIGHTED SUPPORT VARIABLE SELECTION ON 75. PRESIDENTIAL INVITED VECTOR MACHINES FOR MISSING MODEL SPACES CONSTRAINED ADDRESS COVARIATES BY HEREDITY CONDITIONS Thomas G. Stewart*, University Andrew Womack, Indiana University, BIG DATA, BIG OPPORTUNITIES, of North Carolina, Chapel Hill Bloomington BIG CHALLENGES Michael C. Wu, University of North Daniel Taylor-Rodriguez*, Statistical David L. DeMets, Ph.D., Max Halperin Carolina, Chapel Hill and Applied Mathematics Institute Professor of Biostatistics, University and Duke University Donglin Zeng, University of North of Wisconsin, Madison Carolina, Chapel Hill Claudio Fuentes, Oregon State Since the 1950’s, biostatisticians have University In recent years, support vector machine been successfully engaged in biomedical (SVM) classifiers have demonstrated Often having a linear additive regression research, from laboratory experiments to utility for a wide variety of classifica- model in the predictors is not sufficient to observational studies to randomized clini- tion tasks. A key feature of SVMs is that adequately predict the response. Consid- cal trials. We owe some of that success they allow for construction of both linear ering higher order polynomial terms and to the early pioneers, especially those and non-linear decision rules which can interactions between the predictors might biostatisticians who were present at the yield better prediction when the data substantially improve the results. In this National Institutes of Health (NIH). They are complex, as is frequently the case in setting, respecting the polynomial hierar- created a culture of scientific collabora- biomedical studies. A practical challenge chy of the terms is necessary to ensure tion, working on the methodology as for SVMs, as with many other classifica- that the ensuing model is invariant under needed to solve the biomedical research tion methods, lies in the accommodation location and scale transformations, espe- problems in design, conduct and analy- of missing data which commonly occur cially when using shrinkage methods. sis. Over the past 5 decades, we have in real data applications due to imperfect With this in mind, we propose a Bayes- experienced a tremendous increase in data collection. Currently, many research- ian procedure using adaptive shrinkage computational power, data storage capa- ers rely on complete-case or imputation estimators that respects the polynomial bility and multidimensionality of data, or solutions which may introduce bias. Addi- hierarchy between predictors by enforc- “big data”. Some of this expansion has tional systematic approaches exist, but ing the strong heredity principle. We run been driven by genomics. At present, these alternative approaches require non- simulations on a variety of scenarios to we have the opportunity to contribute standard algorithms which have slowed test the performance of our approachand to the design and analysis of genomic their adoption. Therefore, we propose an compare with other popular methods data, data stored in the electronic health EM-motivated solution to the incomplete found in the literature. record and continued needs of clinical data problem for SVMs which maintains email: [email protected] trials for greater efficiency. However, with the convex objective function and which these opportunities, we have serious allows the researcher to use the same challenges starting with the fact that we software as the complete case solution. 
need to develop new methodology to Simulations show that the proposed design and analyze the “big data” bases. method often yields classification rules The demand for quantitative scientists with higher accuracy than existing meth- exceeds the supply and there is no strate- ods. We apply the approach to analyze gic national plan to meet these demands. data from HCV-TARGET, a longitudinal Federal funding for biomedical research study of Hepatitis C patients. has been flat and likely to remain so for email: [email protected] several years, impacting both the ability

310 ENAR 2015 | Spring Meeting | March 15–18 to train additional quantitative scientists FROM IDEALIZED TO REALIZED: treatment that maximises some target. and provide them with research funding ESTIMATING DYNAMIC TREATMENT This is the same fundamental problem for new methodologies. We face new or REGIMENS FROM ELECTRONIC that underpins control methodology in more public scrutiny, demanding that our MEDICAL RECORDS applications (primarily engineering) or data and analysis be shared earlier and theory (often mathematical analysis). This Erica EM Moodie*, McGill University earlier, even as the data are being gath- talk looks at similarities and differences ered such as in clinical trials. Litigation David A. Stephens, McGill University between the problems considered by the is now part of our research environment. Due to the cost and complexity of con- two schools and the methods that are We will examine some of these issues ducting a sequential multiple assignment used for their solutions. We examine how and speculate on ways forward. randomized trial, it is often desirable established control methods might be adapted for statistical adaptive treatment e-mail: [email protected] to estimate optimal strategies via other means which may then be trialed in a problems, and how statistical thinking confirmatory study. Finding good candi- might bring fresh ideas to the control 76. RECENT ADVANCES IN date regimes can be done via simulation literature, especially as control methods DYNAMIC TREATMENT or using large non-experimental datasets are now increasingly being used in bio- REGIMES in which treatment allocation were not medical applications. randomized, such as electronic medical email: [email protected] records (EMRs). Unfortunately, EMRs are THE LIBERTI TRIAL FOR DISCOV- subject to a variety of limitations. In this ERING A DYNAMIC TREATMENT presentation, I will present a simulation METHODS TO INCREASE REGIMEN IN BURN SCAR REPAIR design for a complex, continuous dosing EFFICIENCY OF ESTIMATION Jonathan Hibbard, University of North problem, and discuss ongoing work in WHEN A TEST USED TO DECIDE Carolina, Chapel Hill which we relax idealized assumptions TREATMENT HAS NO DIRECT and move towards more realistic sce- EFFECT ON THE OUTCOME Michael R. Kosorok*, University of North narios. I will then present an analysis of Carolina, Chapel Hill James M. Robins*, Harvard University EMR data from a London anticoagulation In this talk we describe the design and clinic. A CAT scan of the lung has no effect analysis plan for the LIBERTI (Laser on mortality from lung cancer except email: [email protected] Induced, Biologically Engineered Remod- through its role in deciding what che- eling of Thermally Injured) Skin Trial. motherapeutic agent to treat with next. If one is trying to estimate the optimal This is a SMART (Sequential Multiple ADAPTIVE TREATMENT CAT- scan frequency as a function of a Assignment Randomized Trial) design to AND ROBUST CONTROL discover the best sequence of treatments patients evolving laboratory and clinical over three time intervals to improve Robin Henderson*, Newcastle measures, one should be able to use outcomes for patients with severe burn University, UK the fact that the scan has no direct effect scaring as a function of baseline and There has been steadily increasing on mortality except through treatment historical tailoring variables. 
In addition, statistical interest over the last ten to increase the efficiency of estima- a simple randomized comparison of the years in the data-based development of tion. In this talk I show how that can be three treatments under consideration optimal dynamic treatment rules. Given accomplished. (standard of care plus two different laser a sequence of observations the aim email: [email protected] treatments) using a surrogate outcome is is to choose at each decision time the embedded within the SMART design. email: [email protected]

Program & Abstracts 311 77. Predictive Models work, of both types (a) and (b), that I TOWARD INDIVIDUALIZING for Precision Medicine have been doing recently with colleagues HEALTH CARE; STATISTICAL in the Division of Research of the North- OPPORTUNITIES ern California Kaiser Permanente hospital Yates Coley, Johns Hopkins University THE POWER OF ELECTRONIC chain. MEDICAL RECORDS AS DATA- Zhenke Wu, Johns Hopkins University e-mail: [email protected] GATHERING TOOLS FOR THE Scott L. Zeger*, Johns Hopkins CREATION OF (a) LONGITUDINAL University PERSONALIZED NEAR-REAL-TIME ASSESSING ILLNESS SEVERITY A century ago, William Osler, the first PREDICTIONS OF ADVERSE OUT- FROM ELECTRONIC HEALTH DATA Johns Hopkins Chief of Medicine, said: COMES AND (b) DATA-DRIVEN Suchi Saria*, Johns Hopkins University “Variability is the law of life, and as no two ADVICE SYSTEMS FOR MEDICAL faces are the same, so no two bodies are DECISION-MAKING Nearly one in five patients are harmed by iatrogenic errors during a hospitalization; alike, and no two individuals react alike David Draper*, University of California, types of harm include sepsis, central- and behave alike under the abnormal Santa Cruz and eBay Research Labs line associated blood stream infection conditions which we know as disease”. The recent rapid increase in the use of (CLABSI), urinary tract infection, and Biohealth has dramatically shifted in the electronic medical records (EMRs) for ventilator associated pneumonia. These last century when that statement was near-real-time clinical documentation has harms result in prolonged length of stay, made. Revolutions in biologic and infor- created a singular new opportunity for and increased risk of morbidity including mation technologies have unleashed a longitudinal statistical predictive model- death. Early detection can allow early torrent of new data that make possible the ing, of at least two kinds: (a) vital signs, treatment. However, early signs and kind of individual-specific medicine Osler symptoms and laboratory results now symptoms are often subtle, and therefore envisioned. The Institute of Medicine calls appear in the EMR in an almost real-time difficult to detect by the “naked eye”. on health systems to continuously learn manner, making possible much more With the HITECH act of 2009, much of from these complex data how to optimally accurate assessments of the risk that an individuals health data — continuous care for each individual. This talk will dis- a given hospitalized patient will experi- physiologic streaming data, hundreds cuss statistical opportunities to promote ence an adverse outcome (such as an of laboratory test results, demographic individualized healthcare. A hierarchical unplanned transfer from the general data, personal and family medical his- model will be introduced with compo- medical wards to the intensive care unit); tory, treatments, imaging results, and so nents that represent (1) the trajectory of and (b) each hospitalization, when fin- on — are now available for automated an individual’s health status over time; ished, becomes another row in a growing analysis. In this talk, I present early work (2) the effect of exogenous covariates clinically-rich data base, permitting the for integrating these diverse measure- and possibly endogenous interventions development of systems in which physi- ments collected in the inpatient setting on health status; (3) the measurement cians can make queries of the following for assessing illness severity. 
These are of health status; and (4) the embedding form: among all of the patients in the past useful for triage and early intervention of the individual in a relevant population. with this clinical profile, at this stage of for adverse events. This is joint work Bayesian methods will be used to esti- the hospitalization, what clinical courses with collaborators at the Johns Hopkins mate a person’s health state or trajectory of action did physicians in the past under- School of Engineering and the Johns using a multivariate discrete state space take, and what were the success rates of Hopkins School of Medicine. and discrete time. Method for checking those actions? In this talk I will describe model predictions will be described. The email: [email protected] approach and methods will be illustrated with recent clinical applications. email: [email protected]

312 ENAR 2015 | Spring Meeting | March 15–18 DANCING WITH BLACK SWANS: by medical practitioners. The second loss and degradation. Each part will be A COMPUTATIONAL PERSPECTIVE part of the talk focuses on the challenges explored using case studies of actual ON SUICIDE RISK DETECTION to adoption into clinician practice. We data quality problems encountered in discuss the challenges, our solutions and using EHR data. Examples of data quality Truyen Tran*, Deakin University finally outlines the implementation in clini- problems will be reviewed along with and Curtin University, Australia cal practice. assessment methods for identifying simi- Santu Rana, Deakin University, Australia lar data quality problems. email: [email protected] Wei Luo, Deakin University, Australia email: [email protected] Dinh Phung, Deakin University, Australia

Svetha Venkatesh, Deakin University, STATISTICAL METHODS FOR Australia 78. Electronic Health Records: Challenges DEALING WITH NON-RANDOM Richard Harvey, Barwon Health, and Opportunities OBSERVATION OF LABORATORY Australia DATA IN EHRs Despite great attention paid to suicide Jason A. Roy*, University of TRIALS AND TRIBULATIONS prevention with substantive medico-legal Pennsylvania IN TRIALS USING EHR DATA implications, there has been no satisfying EHRs are potentially a rich source of method that can reliably predict future Meredith Nahm Zozus*, Duke University observational data for comparative attempted or completed suicides. Is this NIH funded researchers and others effectiveness research. Laboratory data, impossible we ask? Are these (baby) are using EHR data for pragmatic trials which are increasingly available in many black swans? This talk takes an alternate in healthcare settings and other clini- EHRs, can be used to either improve view when faced with such challenging cal studies. The Health Care Systems confounder control or as outcomes. outlier detection in big data problems. Can Research Collaboratory (www.nihcollabo- However, which laboratory values are we focus on moderate and high risk with ratory.org) works with pragmatic clinical observed when are likely related to the minimal error? We present an integrated trials conducted in health systems and underlying health of the subjects. Further, machine learning framework to tackle this using EHR data. The literature is well some subjects will have no values of challenge. Our proposed framework con- peppered with empirical reports of data particular laboratory variables available sists of a novel feature extraction scheme, quality problems encountered in using at all. We develop methods for dealing an embedded feature selection process, EHR data for purposes other than those with this type of informative missing data. a set of risk classifiers and finally, a risk for the data were originally collected, Two sets of assumptions are considered calibration procedure. We perform com- however, there is little systematic knowl- -- one which is more appropriate for out- prehensive study on data collected for the edge about the topic. The collaboratory, mental health cohort, and the experiments other initiatives, and studies using EHR validate that our proposed framework data provide opportunities to observe outperforms risk assessment instruments data quality problems in EHR data and to build our knowledge. This talk casts data quality problems in two parts, repre- sentational inadequacy and information

Program & Abstracts 313 come models and another that is more heterogeneous, and contemporaneous sometimes subject to abrupt changes in appropriate for propensity score-based patient populations. The unique fea- structure and content. We present a sta- methods. We compare the methods tures and challenges of EHD, including tistical approach to jointly analyzing text, using simulation studies. The methods missing and high-dimensional risk factor categorical and continuous data to model are illustrated using data from several information, non-linear relationships a patient’s disease state. The approach EHR-based comparative effectiveness between risk factors and cardiovascular allows the collection of multiple sources studies. event outcomes, and differing effects of evidence into a single coherent picture in different patient subgroups, demand of disease. We put this together with email: [email protected] novel machine learning approaches a parametric family of curves used to to risk model development. However, describe the progression of health and EXTENDING BAYESIAN NETWORKS many machine learning methods are disease through time (as recorded in the TO ESTIMATE CONDITIONAL not well-suited to handle right-censored medical record). This model allows us SURVIVAL PROBABILITY USING outcomes. We describe how to efficiently to tie together different health outcomes ELECTRONIC HEALTH DATA extend Bayesian networks to estimate such as the onset of Alzheimer’s disease the conditional survival distribution. We and future admission into skilled nursing David M. Vock*, University of Minnesota show that our approach can lead to bet- facilities into a single picture of health. We Julian Wolfson, University of Minnesota ter predictive performance than the Cox demonstrate our approach on collection Sunayan Bandyopadhyay, University proportional hazards model while more of 3.55 million patient visits occurring of Minnesota naturally handling the challenges posed over a seven year period. by EHD. Our techniques are motivated by Gediminas Adomavicius, University email: [email protected] and illustrated on data from a large U.S. of Minnesota Midwestern health care system. Paul E. Johnson, University of Minnesota email: [email protected] 79. Cost-Effective Study Gabriela Vazquez-Benitez, Designs for HealthPartners Institute for Education Observational Data and Research TRACKING AND PREDICTING DISEASE FROM THE ELECTRONIC Patrick J. O’Connor, HealthPartners MEDICAL RECORD DESIGN AND ANALYSIS OF RETRO- Institute for Education and Research SPECTIVE STUDIES FOR Joseph Edward Lucas*, Duke University Models for predicting the risk of car- LONGITUDINAL OUTCOME DATA The systematic collection of electronic diovascular events based on individual Jonathan S. Schildcrout*, Vanderbilt medical records is creating new opportu- patient characteristics are important tools University School of Medicine for managing patient care. Most cur- nities throughout healthcare. Applications Nathaniel D. Mercaldo, Vanderbilt rent and commonly used risk prediction based on EHR data have the potential University School of Medicine models have been built from carefully to disrupt they way that physicians selected epidemiological cohorts. 
How- and health care systems interact with We discuss approaches to examine the ever, the homogeneity and limited size patients, the way that clinical research modifying effects of single nucleotide of such cohorts restricts the predictive is conducted, and the ability of regula- polymorphisms on lowering LDL-choles- power and generalizability of these risk tors to encourage medical practice that terol among patients on simvastatin. We models to other populations. Electronic is focused on patient health. However, consider the setting where longitudinal health data (EHD) from large health care medical records are complicated and LDL and covariate data are available prior systems provide access to data on large, messy. They incorporate many dif- to study conception; however genotyp- ferent types of data, those data often contain mistakes or holes, and they are

314 ENAR 2015 | Spring Meeting | March 15–18 ing stored blood samples is expensive, tions are proposed that drastically reduce screening test measure. We propose and it can only be done on a fraction computational burden. A comprehensive semi-parametric empirical likelihood of patients. We consider methods for simulation shows that, in a broad range estimators for the AUC, partial AUC, and determining which subjects should be of scenarios, estimators based on the the covariate-specific ROC curve to avoid sampled, and methods for analyzing the approximate hybrid likelihood exhibit the making distributional assumptions about data once the biased sample is identified. same operating characteristics as the the data. These methods are especially We compare several designs that sample exact hybrid likelihood, without any pen- beneficial in situations where the disease based on features of the longitudinal alty in terms of increased bias or reduced status is costly to ascertain or when the outcome and covariate data. Further, we efficiency. Third, in settings where the length of time between the screening test compare several approaches to analyses approximations may not hold, a prag- and the outcome of interest is long. This that are of varying degree of complexity matic estimation and inference strategy cost-effective sampling design allows including univariate and longitudinal data is developed that uses the approximate for a more powerful study on the same analyses. Parameter interpretations and form for some likelihood contributions budget. and the exact form for others. The estimation precision will be the focus. email: [email protected] strategy gives researchers the ability to email: balance computational tractability with [email protected] accuracy in their own settings. Finally, as 80. Advanced Machine a by-product of the development, we pro- Learning Methods ON THE ANALYSIS OF HYBRID vide the first explicit characterization of DESIGNS THAT COMBINE GROUP- the hybrid aggregate data design which AND INDIVIDUAL-LEVEL DATA combines data from an aggregate data A NEW APPROACH TO VARIABLE study (Prentice and Sheppard, 1995) with SELECTION VIA ALGORITHMIC Sebastien Haneuse*, Harvard School case-control samples. The methods are REGULARIZATION PATHS of Public Health illustrated using data from North Carolina Yue Hu, Rice University Elizabeth Smoot, Harvard School on births between 2007 and 2009. of Public Health Genevera I. Allen*, Rice University email: [email protected] and Baylor College of Medicine Ecological studies that make use of data on groups of individuals, rather than on Variable selection for high-dimensional the individuals themselves, are subject to TEST-DEPENDENT SAMPLING problems yields a trade-off between numerous biases that cannot be resolved DESIGN AND SEMI-PARAMETRIC statistical and computational efficiency. without some individual-level data. In the INFERENCE FOR THE ROC CURVE Penalization methods such as the LASSO solve a relaxation of the best subsets context of a rare outcome, the hybrid Haibo Zhou*, University of North problem that run in polynomial time, but design for ecological inference efficiently Carolina, Chapel Hill combines group-level data with individ- only achieve provable statistical guar- Beth Horton, University of Virginia ual-level case-control data. Unfortunately, antees for selection in limited settings. 
except in relatively simple settings, use The receiver operating characteristic Is possible to achieve better statistical of the design in practice is limited since (ROC) curve and area under the ROC performance for variable selection in evaluation of the hybrid likelihood is curve (AUC) are used to describe the a computationally faster manner? We computationally prohibitive expensive. In ability of a screening test to discriminate answer in the affirmative, introducing a this paper we first propose and develop between diseased and non-diseased new approach to variable selection that an alternative representation of the hybrid subjects. Evaluating the true disease we term Algorithmic Regularization Paths. likelihood. Second, based on this new status can be costly. We develop a test Our method quickly finds a sequence of representation, a series of approxima- dependent sampling (TDS) design where sparse models, similar in spirit to regular- TDS inclusion depends on a continuous ization paths, by solving a series of linked

Program & Abstracts 315 subproblems associated with the Alter- GRAPHICAL REGRESSION of nodes of a particular layer on another. nating Direction Methods of Multipliers We propose a new unified approach for Hsin-Cheng Huang, Academia Sinica, (ADMM) algorithm. Here, we introduce estimating a two-layered network. The Taiwan this novel method, provide intuition for proposed method offers an efficient way where the algorithm originates, study its Xiaotong Shen*, University of Minnesota of estimating edges between and across theoretical properties, and draw connec- Wei Pan, University of Minnesota layers iteratively, by constructing an tions to existing regularization methods. objective function based on the penal- Graphical models have proven useful in Empirical results show that our Algorithm ized joint maximum likelihood function describing relations among interacting Paths yield better performance in terms (under a Gaussianity assumption), then units. In this paper, we propose graphical of variable selection and prediction error using block co-ordinate descent to do models to link the inverse of covariance and are moreover computationally faster the optimization. Our method decouples matrix, called the precision matrix, to than all existing approaches. the estimation of undirected and directed covariates, to locate the structural change edges within each iteration, however the email: [email protected] of a graph as a function of covariates. updated estimates are integrated in the For instance, in neuroimaging, func- next iteration. The performance of the tional connectivity of regions of interest LINK PREDICTION FOR methodology is illustrated via simulations. concerns the relationship between brain PARTIALLY OBSERVED NETWORKS Applications to Omics problems are also activity and specific mental functions. briefly discussed. Yunpeng Zhao, George Mason University To investigate the structural change of a graph based on regression coefficients, email: [email protected] Yun-Jhong Wu, University of Michigan we construct a constrained likelihood, Elizaveta Levina, University of Michigan where sparsity constraints are imposed to 81. Statistical Analysis Ji Zhu*, University of Michigan seek low rank representations. Com- putational aspects will be discussed in for Deep Sequencing Link prediction is one of the fundamen- addition to some theoretical aspects. Data in Cancer tal problems in network analysis. In Research: Methods many applications, notably in genetics, email: [email protected] and Applications a partially observed network may not contain any negative examples of absent edges, which creates a difficulty for many PENALIZED MAXIMUM LIKELIHOOD A STATISTICAL METHOD FOR existing supervised learning approaches. ESTIMATION ON A TWO-LAYERED DETECTING DIFFERENTIALLY We develop a new method which treats NETWORK EXPRESSED MUTATIONS BASED ON the observed network as a sample of the George Michailidis*, University NEXT-GENERATION RNAseq DATA true network with different sampling rates of Michigan Pei Wang*, Icahn School of Medicine for positive and negative examples. We Networks are one of the most popu- at Mount Sinai obtain a relative ranking of potential links lar tools for capturing the interactions Rong Fu, University of Washington by their probabilities, utilizing information between nodes, which are used to rep- on node covariates as well as on network Ziding Feng, University of Texas resent the underlying random variables. topology. 
Empirically, the method per- MD Anderson Cancer Center In particular, constructing and analyzing forms well under many settings, including a layered structure provides insight into We propose a new statistical method when the observed network is sparse. understanding the conditional relation- – MutRSeq – for detecting single We apply the method to a protein-protein ships among nodes within layers while nucleotide variants (SNVs) differentially interaction network and a school friend- adjusting for and quantifying the effects expressed in samples with different ship network. disease status based on RNA-seq data. email: [email protected]

316 ENAR 2015 | Spring Meeting | March 15–18 Specifically, we employ a hierarchical it is possible that differences between which include finite mixtures, finite and likelihood approach to jointly model subsets in observed mutation frequency infinite hidden Markov models, Dirich- observed mutation events and read count may be due to differences in expression let processes, and zero and first order measurements from RNAseq experi- rather than actual differences in preva- PDPs that cover a broad range of data ments. We then introduce a likelihood lence. We present an empirical model correlation structures. An outstanding ratio based test statistic, which detects for detection frequency as a function of feature of g-PDP models is their ability changes not only in overall expression coverage, which we use to estimate the to infer in an unsupervised manner and levels, but also in allele specific expres- true mutation prevalence within a given with high accuracy, latent clusters of sion patterns. In addition, this method subset of cases. Further, we provide a genes that may represent joint regulatory can jointly test multiple mutations in one test for differential prevalence which takes mechanisms. Simulation studies dem- gene/pathway. The simulation stud- into account the possibility of differential onstrate that g-PDPs outperform many ies suggest that the proposed method coverage between the subsets. existing techniques in terms of accuracy achieves better power than a few compet- of signature identification. The detected email: [email protected] itors under a range of different settings. In disease genomic signatures in RNA-seq the end, we apply this method to a non- lung cancer TCGA data highlight the abil- smoker lung cancer data set and identify SCALABLE BAYESIAN ity of g-PDP models to handle multiple potential disease genes. NONPARAMETRIC LEARNING conditions without Bonferroni-type adjust- ments. The pathway analysis identified email: [email protected] FOR HIGH-DIMENSIONAL LUNG CANCER GENOMICS DATA upstream regulators of many genes that are common genetic markers in multiple Chiyu Gu, University of Missouri ACCOUNTING FOR DIFFERENTIAL tumor cells. Subharup Guha*, University of Missouri COVERAGE IN COMPARING email: [email protected] MUTATION PREVALENCE Veerabhadran Baladandayuthapani, George W. Wright*, National Cancer University of Texas MD Anderson UNDERSTANDING MicroRNA Institute, National Institutes of Health Cancer Center SEQUENCING DATA DISTRIBUTION RNA-seq is a powerful tool in under- Through array-based and next-generation Li-Xuan Qin*, Memorial Sloan Kettering standing the biological mechanisms sequencing, ‘omics datasets involve Cancer Center underlying cancer and other diseases. intrinsically different sizes and scales of By comparing the prevalence of a given high-throughput data, providing genome- Tom Tuschl, Rockefeller University wide, high-resolution information about mutation between subsets of samples, Sam Singer, Memorial Sloan Kettering the biology of lung cancer. A common it is possible to gain insight into the Cancer Center molecular basis for observed pheno- goal is the identification of differential typic differences. 
However, the ability to genomic signatures between samples MicroRNAs (miRNAs) are a prevalent detect mutations in samples at a given that correspond to different treatments class of small single-stranded non- location will depend on their coverage or biological conditions, e.g., treat- coding RNAs that negatively regulate at that location, so that simply counting ment arms, tumor (sub)types, or cancer gene expression. MiRNAs are involved the proportion of samples in which the stages. We construct an encompassing in a wide variety of cellular functions mutation is observed may underesti- class of nonparametric models called such as proliferation, differentiation, and mate the true mutation prevalence in generalized Poisson-Dirichlet processes apoptosis. Their roles in carcinogenesis locations of low coverage. In RNA-seq, (g-PDPs) that are applicable to mixed, are being increasingly studied with the coverage depends on gene expression, heterogeneously scaled datasets. advent of the deep sequencing technol- and since compared subtypes may have Each platform can choose from diverse ogy. Comparing with mRNA sequencing, very different gene expression profiles, parametric and nonparametric models, miRNA sequencing has the advantages

Program & Abstracts 317 of relatively homogeneous gene length and to understand factors that affect adopted choice for the proximities is to and more straightforward gene map- invasion speed. We also need to iden- assume that they are binary and based ping and assembly; at the same time, it tify geographic patterns of spread, for upon some notion of adjacency. In this faces the unique challenges that miRNA instance distinguishing highly directional paper, we propose an extension of the abundance has a broad range among “wave-like” spread versus spread that is binary adjacency proximities CAR model genes and its distribution can be highly radially outward, indicating the source where the proximities, defined through a skewed among samples. In order to of the invasion or the site of a sudden suitable transformation of a latent Gauss- better understand miRNA sequencing long-range dispersal. We describe meth- ian process, are random and directional, data distribution, we carried out a miRNA odology that addresses these questions, thus allowing for varying strength of sequencing study at Memorial Sloan building largely on Gaussian process association among an areal unit and Kettering Cancer Center using tumor models for observations of the time of its neighbors. Our specification of the samples including both biological and first infestation. Our methods allow us proximities allows us to derive distribu- technical replicates. We examined the to estimate local speed and direction of tional properties of the proximities and of distributional patterns of technical and spread while also identifying key invasion the spatial random effects, and leads to biological variations and we proposed features in rigorous fashion. We illustrate tractable Bayesian inference with closed a novel statistical model for miRNA the application of our methods to two form full conditionals. In the case of large sequencing data. biological invasions, the gypsy moth and dimensional datasets, it is possible to the emerald ash borer. This is joint work introduce dimension reduction for the email: [email protected] with Joshua Goldstein, Ottar Bjornstad proximities to alleviate computational and Andrew Liebhold. burden. 82. Spatial and email: [email protected] email: [email protected] Spatio-Temporal Modeling A GENERALIZED CONDITIONALLY MULTIVARIATE SPATIAL MODELING AUTOREGRESSIVE (CAR) MODEL OF CONDITIONAL DEPENDENCE SPATIAL LOCAL GRADIENT MODELS IN MICROSCALE SOIL ELEMENTAL Veronica J. Berrocal*, University OF BIOLOGICAL INVASIONS COMPOSITION DATA of Michigan Joshua Goldstein, The Pennsylvania Joseph Guinness*, North Carolina Alan E. Gelfand, Duke University State University State University Spatial areal data is encountered in a Murali Haran*, The Pennsylvania Montserrat Fuentes, North Carolina broad range of applications. Within the State University State University Bayesian framework, this type of data is Ottar N. Bjornstad, The Pennsylvania most commonly analyzed by introduc- Dean Hesterberg, North Carolina State University ing, in the second stage of a hierarchical State University model, spatial random effects modeled Andrew M. Liebhold, U.S. Forest Matthew Polizzotto, North Carolina as a Gaussian Markov random field, Services State University specified locally through a condition- Understanding the processes that influ- ally autoregressive (CAR) model (Besag Elevated concentrations of toxic trace ence the spread of biological invasions 1974). 
The key specification in a CAR elements pose threats to human health is key to developing effective strategies model is the proximity matrix, W, with through contamination of food and drink- for intervention. Of particular interest entries wij , proximities that encode ing water. We describe an experiment is the ability to quantify spread rates the strength of association among the that maps the composition of elements various areal units. The most frequently

318 ENAR 2015 | Spring Meeting | March 15–18 on a sand grain using X-ray fluorescence computer codes are essentially noiseless estimates from three sampling designs microprobe analyses, before and after and respond very smoothly to changes that were utilized in a completed trial; the the grain is treated with arsenic solutions, in input settings. In a typical setting, full cohort, a subsample, and a nested resulting in multivariate spatial lattice the simulation output is a function of a case-cohort design. The subsample maps of elemental abundance. To under- p-dimensional input vector x. In some represents participants that provided stand the behavior of arsenic in soils, it applications, p may be as large as 60, samples for ancillary studies. Sub-cohort is important to disentangle the complex however for the application of interest, controls were weighted by the inverse of multivariate relationships among the the output is typically affected by a much the sampling fraction. We used pro- elements in the sample. The abundance smaller subset of these p inputs. After a portional hazard models, stratified by of most elements, including arsenic, fixed number of simulations are carried treatment group and adjusted for known correlates strongly with that of iron, but out, a GP model can be used to predict confounders, to examine the association conditional on the amount of iron, some the simulation output at untried settings. between baseline eGFR and morality in elements may mitigate or potentiate the Recently we have encountered situations each sampling design. Demographic accumulation of arsenic. This problem where the computer model output is a and baseline characteristics were similar motivates our work to define conditional spatial field. The high-dimensionality of between the full cohort and subsample. correlation in spatial lattice models and this output, along with its spatial depen- We show that estimates from all three give general conditions under which two dence, leads to a number of interesting sampling designs agree reasonably well. components are conditionally uncor- issues in computer model emulation These results lend a measure of confi- related given the rest. We describe how and calibration. We compare a number dence in these designs. to enforce that two components are of different approaches for GP-based email: [email protected] conditionally uncorrelated given a third emulation for spatial fields, using two in parametric models, which provides a applications – one in nuclear physics, basis for likelihood ratio tests of condi- and one in vulcanology. POWER ESTIMATION FOR tional correlation between arsenic and email: [email protected] ORDINAL CATEGORICAL DATA chromium given iron. We show how to IN THE PRESENCE OF NON apply our results to big datasets using PROPORTIONAL ODDS the Whittle likelihood, and we demon- 83. CONTRIBUTED PAPERS: Roy N. Tamura*, University of strate through simulation that tapering Study Design and Power South Florida improves Whittle likelihood parameter estimates governing cross covariance. Xiang Liu, University of South Florida COMPARISON OF RISK ESTIMATES email: [email protected] Ordinal categorical data are extremely DERIVED FROM FULL COHORT, common in clinical trials where subjective SUB-SAMPLE, AND NESTED outcomes are being measured. The most CASE-COHORT METHODOLOGIES GAUSSIAN PROCESS MODELS FOR common analyses for ordinal categori- EMULATING SPATIAL COMPUTER Kathleen A. 
Jablonski*, The George cal data are based on the proportional MODEL OUTPUT Washington University odds model. However, the assumption of proportional odds is an assumption of Dave M. Higdon*, Los Alamos National Madeline M. Rice, The George the model and not necessarily a property Laboratory and Virginia Tech Washington University of the data. In many situations, clinical Mengyang Gu, Duke University Drawing subsamples from a large study researchers may suspect that an effect Gaussian process models have proven is often done to measure biomarkers as a on an ordinal scale will not satisfy the to be very useful in modeling computer cost savings measure. We compared risk proportional odds assumption. In these simulation output. This is because many situations, statisticians need to be able

Program & Abstracts 319 to investigate alternative models and EMPIRICAL DETERMINATION OF Poisson based methods and larger than estimate the power for these alternative STATISTICAL POWER AND SAMPLE an approach based on the data follow- models. We examine the trend odds SIZE FOR RNA-Seq STUDIES ing Gaussian distribution. As expected, model and the saturated model and empirical based sample size estimates Milan Bimali*, University of Kansas compare these models to the propor- based on Negative-Binomial distribution Medical Center tional odds model under a wide class of were larger than the estimates computed alternative hypotheses. We also examine Jonathan D. Mahnken, University from Poisson distribution thus accounting the exemplary dataset (ED) approach of Kansas Medical Center for the over-dispersion observed in RNA- for power estimation for these models Brooke L. Fridley, University of Kansas Seq data. and determine how well this approach Medical Center email: [email protected] approximates the actual power. A recently developed ordinal scale for assessing nausea in the pediatric population illus- RNA-Seq studies produce count- FUNCTIONAL SIGNAL-TO-NOISE trates the issues and recommendations. based data that do not follow normal RATIO ANALYSIS WITH APPLI- distribution, but rather Poisson or Neg- email: [email protected] CATIONS IN QUANTITATIVE ative-Binomial distribution. In designing ULTRASOUND RNA-Seq studies, the estimate of sample Yeonjoo Park*, University of Illinois, size to achieve a desired statistical power SINGLE ARM PHASE II CANCER Urbana-Champaign SURVIVAL TRIAL DESIGNS is important. Several methods have been proposed in the literatures that Douglas G. Simpson, University Jianrong John Wu*, St. Jude Children’s are designed specifically for RNA-Seq of Illinois, Urbana-Champaign Research Hospitial studies, however none have attracted Motivated by research on diagnostic In this talk, a modified one-sample log- consensus yet. The mean-variance ultrasound to evaluate tissue regions of rank test statistic is proposed. In general, relationship in Poisson and Negative- interest such as tumors and cysts via the proposed test can be used to design Binomial distribution has been one of their ultrasound backscatter properties, single-arm phase II survival trials under the challenges in deriving exact analytic this paper develops an approach to any parametric survival distribution. Simu- form. Proposed methods have been functional effect size estimation, testing lation results showed that it preserves based on large sample approximation and visualization. Extending methods type I error well and provides adequate thereby making the use of such esti- from functional analysis of variance we power for phase II cancer survival trial mates for small scale RNA-Seq studies introduce the functional signal-to-noise designs. questionable. We propose a simulation ratio (fSNR), discuss its use for visual- email: [email protected] based approach for estimating power izing the magnitude of effects over for given sample size under the Poisson the domain of interest, and develop and Negative-Binomial assumption. Our bootstrap inferences based on global method does not assume large sample summary functions of the fSNR. The approximation. The sample sizes based approach allows for irregular functional on our approach are compared with existing methods. The simulation based sample size estimates based on the Poisson distribution were smaller than the estimates computed from large sample

320 ENAR 2015 | Spring Meeting | March 15–18 data in which the ranges of the curves ate the power of the test through Monte Xianlong Wang, Fred Hutchinson may vary, as long as the full ensemble Carlo simulation studies and apply our Cancer Research Center of curves covers the domain of inter- method to a clinical trial of the effects of Pei Wang, Icahn Medical School est. We also develop simulation based normobaric oxygen therapy on patients at Mount Sinai power analysis for the global fSNR based who had an acute ischemic stroke. tests. The methods are illustrated in the Nonignorable missing data exist in the email: [email protected] analysis of irregular functional data from iTRAQ (isobaric tag for relative and abso- inter-laboratory quantitative ultrasound lute quantitation) proteomic experiment. measurements. SAMPLE SIZE DETERMINATION The missing mechanism in the data is defined as experiment-level abundance- email: [email protected] BASED ON QUANTILE RESIDUAL LIFE dependent missing-data mechanism (EADMM). We propose a new method, Jong Hyeon Jeong*, University ANALYSIS OF A NON-MORTALITY mixEMM, to explicitly model the miss- of Pittsburgh OUTCOME IN CLINICAL TRIAL OF ing mechanism, using the expectation A POTENTIALLY LETHAL DISEASE In the analysis of time-to-event data, the conditional maximization (ECM) algo- concept of residual life provides straight- rithm under the setting of mixed effects Roland A. Matsouaka*, Duke University forward interpretation, and has recently model. The performance of the proposed Rebecca Betensky, Harvard University drawn much attention in the literature. method is evaluated in simulation studies Clinical studies are usually designed and in a proteomic data illustration. We We consider a randomized clinical trial based on the hazard rates under the also discuss some extensions of this of a potentially lethal disease where proportional hazards model. In this approach in the end. patients are treated, followed for a fixed presentation, we consider sample size period of time, and assessed on a non- email: [email protected] determination to detect a difference in mortality outcome. For patients who die quantile residual lifetimes between two before the end of follow-up, the outcome groups, given operating characteristics. of interest is not assessed. Any statisti- MULTIPLE IMPUTATION FOR GEN- The results are compared to ones calcu- cal analysis based solely on patients ERAL MISSING PATTERNS IN THE lated from the hazard rate approach. An who survived can be misleading since PRESENCE OF HIGH-DIMENSIONAL extension to a competing risks case is survivors may substantially differ from DATA also considered. those who died. An alternative approach Yi Deng*, Emory University to assess the treatment effect is to create email: [email protected] Qi Long, Emory University a composite outcome including death and the non-mortality outcome. For this Zhao and Long (2013) investigated talk, we examine the use of Wilcoxon— 84. CONTRIBUTED PAPERS: several approaches of using regularized Mann—Whitney test on such worst-rank Missing Data regression and Bayesian lasso regres- composite outcomes, where the patients sion to conduct multiple imputation (MI) who survived are ranked based on the in the presence of high-dimensional data, A MIXED EFFECTS MODEL magnitude of their responses while those though their methods are not directly FOR INCOMPLETE DATA WITH who died are assigned “worst-rank” applicable to the case of general miss- EXPERIMENT-LEVEL ABUNDANCE- scores. 
These scores are (chosen) worse ing patterns. Based on the technique DEPENDENT MISSING-DATA than any observed responses: they are of chained equations, we extend their MECHANISM either (1) set to a single value or (2) rank methods to handle general missing pat- based the time of death, an earlier death Lin S. Chen, University of Chicago terns in the presence of high-dimensional data and implement our methods in an R being worse than a later death. We evalu- Jiebiao Wang*, University of Chicago

Program & Abstracts 321 package. The proposed MI methods are EM ALGORITHM IN GAUSSIAN ON IDENTIFICATION ISSUES WITH evaluated in extensive simulation studies COPULA WITH MISSING DATA BINARY OUTCOMES MISSING NOT and are further illustrated using a data set AT RANDOM Wei Ding*, University of Michigan from the Georgia Coverdell Acute Stroke Jiwei Zhao*, University at Buffalo, SUNY Registry. Peter X. K. Song, University of Michigan In epidemiology, regression models email: [email protected] Rank-based correlation is widely used to with binary outcomes are often used to measure dependence between variables investigate the relation between disease when their marginal distributions are status and other exposures or covariates A MIXED-EFFECTS MODEL FOR skewed. Estimation of such correlation of interest, especially for case control NONIGNORABLE MISSING LONGITU- is challenged by both the pres ence of studies. Clinically, these studies usu- DINAL DATA missing data and the need for adjusting ally encounter the problem of missing for confounding factors. In this paper, we Xuan Bi*, University of Illinois, disease status and the missing data consider a unified framework of Gauss- Urbana-Champaign mechanism is highly suspected to be ian copula regression that enables us to nonignorable (Little and Rubin, 2002). Annie Qu, University of Illinois, estimate either Pearson cor- relation or Therefore, we have to be careful resolv- Urbana-Champaign rank-based correlation (e.g. Kendall’s tau ing the identifiability conditions of the Nonignorable missing data occurs fre- or Spearman’s rho), depending on the unknown parameters for each method quently in longitudinal studies. Estimation types of marginal distributions. To adjust we use (Robins, 1997). In this paper, we bias may arise if a missing mechanism for confounding covariates, we utilise systemically study the identifiability con- is misspecified. To address this issue, marginal regression models with univari- ditions with missing response data when we introduce a mixed-effects estimat- ate location-scale family distributions. We the mechanism is nonignorable. Although ing equation approach, which enables establish the EM algorithm for estima- we focus on logistic regression and probit one to recover missing information tion of both correlation and regression regression, the theory can be extended to from the measurement process and the parameters with missing values. For other generalized linear models. Com- missing process simultaneously. The implementation, we propose an effective prehensive simulation studies and a real proposed method proves consistency peeling procedure to carry out itera- data analysis are conducted to illustrate and asymptotic normality of the fixed- tions required by the EM algorithm. We our theory and method. effect estimation under shared-parameter compare the performance of the EM algo- models and an extended shared-param- rithm method to the traditional multiple email: [email protected] eter model. In simulation studies, we imputation approach through simulation show the effectiveness of the proposed studies. For structured types of correla- KENWARD-ROGER APPROXIMATION method under different missing patterns tions, such as exchangeable or first-order FOR LINEAR MIXED MODELS in conjunction with robustness against auto-regressive (AR-1) correlation, the WITH MISSING COVARIATES model assumption violation. 
In addition, it EM algorithm outperforms the multiple is applied to the election poll survey data imputation approach in terms of both esti- Akshita Chawla*, Michigan from 2007-2008 Associated Press-Yahoo! mation bias and efficiency. State University News which involves multiple refreshment email: [email protected] Tapabrata Maiti, Michigan samples. State University email: [email protected] Samiran Sinha, Texas A&M University Partially observed variables are com- mon in scientific research. Ignoring the subjects with partial information may lead

322 ENAR 2015 | Spring Meeting | March 15–18 to biased and or inefficient estimators, could cause efficiency losses in estimat- 85. CONTRIBUTED PAPERS: and consequently any test based only on ing survival statistics and pose potential Innovative Methods for the completely observed subjects may risks for bias if censoring depends upon Clustered Data inflate the error probabilities. Missing data survival time or the missing covariates are issue has been extensively considered not ignorable. This presentation proposes in the regression model, especially in the a nonparametric approach to impute CORRELATION STRUCTURE independently identically (IID) data setup. censored survival or missing covariates SELECTION PENALTIES FOR Relatively less attention has been paid by substituting with information from a IMPROVED INFERENCE WITH for handling missing covariate data in the matched observation with complete data GENERALIZED ESTIMATING linear mixed effect model-- a dependent (called donor). This procedure is applied EQUATIONS data scenario. In case of complete data, sequentially for one variable at a time for Philip M. Westgate*, University Kenward-Roger’s F test is a well-estab- multiple rounds. The completed data can of Kentucky lished method for testing of fixed effects be analyzed as if there were no miss- Woodrow W. Burchett, University in a linear mixed model. In this paper, ing or censoring data. This approach is of Kentucky we present a modified Kenward-Roger appealing because it utilizes two working type test for testing fixed effects in a models, one predicts the variable subject- Generalized estimating equations (GEE) linear mixed model when the covariates ing to missing or censoring and the other are often used for the marginal analysis are missing at random. In the proposed predicts the missingness of that variable, of correlated data. With GEE, a working method, we attempt to reduce bias from to define the donor pool, such that the correlation structure must be selected. three sources, the small sample bias, the imputed value is double robust against Accurate modeling of this structure can bias due to missing values, and the bias possible incorrect assumptions about the improve efficiency. However, estimation due to estimation of variance compo- missing mechanisms. Multiple imputation of correlation parameters can inflate nents. The operating characteristics of is used to propagate imputation uncer- the covariance matrix of the regression the method is judged and compared with tainty. The approach is applied to cancer parameter estimates, which should be two existing approaches, listwise dele- registry data for assessing the survival accounted for, or penalized by, criteria tion and mean imputation, via simulation and incidence of breast cancer by HER2 that are used to select a working struc- studies. status. The performance of the proposed ture. We therefore discuss different method is assessed by a simulation penalties, and give practical consider- email: [email protected] study. ations for data analysts on how these email: [email protected] penalties may influence regression NONPARAMETRIC SEQUENTIAL parameter estimation. 
MULTIPLE IMPUTATION FOR email: [email protected] SURVIVAL ANALYSIS WITH MISSING COVARIATES Paul Hsu, University of Arizona HANDLING NEGATIVE CORRELA- TION AND/OR OVERDISPERSION Mandi Yu*, National Cancer Institute, IN GAUSSIAN AND NON-GAUSSIAN National Institutes of Health HIERARCHICAL DATA Cancer registry data has been the cor- Geert Molenberghs*, Hasselt University nerstone for monitoring cancer survival at and Leuven University the population level. However, the pres- Non-negative correlation and overdisper- ence of censoring and missing covariates sion are phenomena that received a lot of study, separately and in conjunction.

Program & Abstracts 323 As a result, a large suite of modeling sion model to mouth-level averages or (2) fat tailed error distribution. Specifically, approaches has been proposed, for using generalized estimating equations the variables are clustered based on the about three decades now. That said, (GEE) with simple correlation structures. agreement of relationships (unknown) also negative correlation is perfectly As an alternative, we propose two linear between variable measures and covari- possible in real-life biometric and other mixed models with random effects that ates of interest. A Bayesian method is experiments. The same is true for quantify the within-mouth correlation of proposed for this purpose, in which a underdispersion. Allowing for these in a teeth and their shared functionality. Via semi-parametric model is used to evalu- flexible and elegant modeling approach, simulation, we compare the bias and ate any unknown relationship between preferably with hierarchical intepretation, efficiency of fixed effect estimates com- variables and covariates of interest, and a is less than straightforward. We sketch puted with our models to corresponding Dirichlet process is utilized in the pro- the problem, offer modeling approach, results produced with t-tests and GEE. cess of clustering. Simulation studies are and discuss implications for such tasks We demonstrate that our mixed models used to examine the performance and as: model formulation, parameter and give estimates that are unbiased and efficiency of the proposed method. The precision estimation, hypothesis testing more efficient than other methods that method is then applied to a population (in a marginal and hierarchical fashion). fail to accurately model the within-mouth of patients evaluated for laser refractive Results synthetized encompass both correlation of teeth. We also evaluate surgery to further understand the effect of historic findings as well as very recent the performance of the approaches aging on measurements of higher-order results. when data are missing under different aberrations. biologically plausible mechanisms of email: [email protected] email: [email protected] missingness. email: [email protected] REFLECTING THE ORIENTATION OF STATISTICAL METHODS FOR TEETH IN RANDOM EFFECTS MOD- MANIFOLD-VALUED DATA FROM ELS FOR PERIODONTAL OUTCOMES DETECTING HETEROGENEITY LONGITUDINAL STUDIES BASED ON EFFECT SIZE OF Rong Xia*, University of Michigan Emil A. Cornea*, University of North RESPONSE MEASURES Carolina, Chapel Hill Thomas M. Braun, University Xin Tong*, University of South Carolina, of Michigan Hongtu T. Zhu, University of North Columbia Carolina, Chapel Hill William V. Giannobile, University This study was motivated by the potential of Michigan Joseph G. Ibrahim, University of heterogeneity in clusters identified by pre- North Carolina, Chapel Hill Clinical attachment level (CAL) is a tooth- existing methods which are designed to level measure that quantifies the severity separate heterogeneous data into groups The aim of the paper is to present a gen- of periodontal disease. The within-mouth of similar objects such that objects within eral regression framework for the analysis correlation of tooth-level measures of a group are similar. Zernike aberration of manifold-valued data from longitudinal CAL is difficult to model because it must polynomials have been commonly used studies. 
We develop semi-parametric reflect the three-dimensional spatial as the standard method of describing the intrinsic random effect regression models geography of teeth and their functional shape of an aberrated wavefront of the for analyzing manifold longitudinal data. similarity. Thus, traditional approaches human eye. Knowledge on the homo- We focus on directional data, symmetric have included (1) applying a t-test/regres- geneity among the Zernike coefficients positive definite (SPD) matrices, and land- can potentially help us to improve eye mark-based planar shapes rising from disease diagnosis. To improve the qual- manifold-valued imaging data to illustrate ity of clusters, we propose a clustering our methodological development. We method which can cover skewed and

324 ENAR 2015 | Spring Meeting | March 15–18 apply our semi-parametric models to the In literature, such combination of empiri- change-point, is shown via simulations shape analysis of the corpus callosum cal likelihood and quadratic inference to be computationally more viable than from longitudinal studies and investigate function already exist. However, only existing methods that rely on search whether the corpus callosum shape a few type of correlation structure are procedures, with dramatic gains in the information is a potential biomarker for mentioned. Such limitation is built-in in multiple change-point case. The pro- the diagnosis of Alzheimer’s disease and the quadratic inference function method posed estimates
are shown to attention deficit hyperactivity disorder. because it requires the inverse of cor- have n-consistency and asymptotic relation matrix of specific format can be normality; in particular,they are asymp- email: [email protected] decomposed into linear combination totically efficient in the cross-sectional of basis matrix. Such decomposition is setting allowing us to provide meaning- ANALYZING DEPENDENT DATA not always viable. In this paper, we will ful statistical inference. As our primary USING EMPIRICAL LIKELIHOOD AND investigate a general stretagy of apply- and motivating (longitudinal) application, QUADRATIC INFERENCE FUNCTION ing quadratic inference function method we study the Michigan Bone Health and when the assumed structure are not Metabolism Study cohort data to describe Chih-Da Wu*, University of North limited to compound symmetry or AR(1). patterns of change in log estradiol levels, Carolina, Chapel Hill Simulation study will be provided to before and after the final menstrual Naisyin Wang, University of Michigan comparing the performance of several period, for which a two change-point bro- Dependent data can be modeled under methods. And a functional data consisted ken stick model appears to be a good fit. of diffusino tensor tract statistics, preo- generalized estimating equation frame- email: [email protected] work. However there are always same cessed from neuroimaging dataset will be number of estimating functions as the analyzed to show the application of our number of parameters and the correla- proposed strategy. 86. CONTRIBUTED PAPERS: tion matrix to describing the dependent email: [email protected] Biopharmaceutical struction is often unknown. Even though Applications and the sandwich estimator method shows Survival Analysis that the parameter estimation using FAST ESTIMATION OF REGRESSION a working correlation structure is still PARAMETERS IN A BROKEN STICK PSEUDO-VALUE APPROACH FOR consistent. In practice, people still want MODEL FOR LONGITUDINAL DATA TESTING CONDITIONAL RESIDUAL to cooperate the partial information Ritabrata Das*, University of Michigan LIFETIME FOR DEPENDENT SUR- about dependent structure by assuming Moulinath Banerjee, University of VIVAL AND COMPETING RISKS DATA the covariance matrix to have specific Michigan simpler format that can be represented Kwang Woo Ahn*, Medical College by not so many argumanet. For example, Bin Nan, University of Michigan of Wisconsin compound symmetry stucture needs only Estimation of change-point(s) in the Brent R. Logan, Medical College one argument and auto-regressive needs broken-stick model has significant of Wisconsin two. Quadratic inference function is the applications in modeling important method to cooperate such information Quantile residual lifetime analysis is often biological phenomena. In this article we by creating multiple numbers of estimat- performed to evaluate the distributions present a computationally economical ing function, more than the number of of remaining lifetimes for survival and likelihood-based approach for estimat- parameters. For estimating equation competing risks data. Residual lifetimes ing change-point(s) efficiently in both models with more estimating functions may depend on some patients’ char- cross-sectional and longitudinal settings. than parameters, empirical likelihood can acteristics. In addition, the event times Our method, based on local smooth- provide a robust parameter estimation. 
and the censoring times of survival and ing in a shrinking neighborhood of each competing risks data are often clus-

Program & Abstracts 325 tered or correlated. Thus, it is crucial to ery Rate (FDR) while making real time hazard based or mean based) to time- develop statistical methods for assess- decisions about rejecting or accepting a to-event analyses. Most of the proposed ing conditional residual lifetimes for hypothesis. The existing FDR controlling inference procedures for quantile residual dependent survival and competing risks procedures, such as Benjamini Hoch- life in the literature are non-parametric data. The current literature on quantile berg procedure, are not applicable here or semi-parametric. However, para- residual lifetime analysis is restricted to as they are unable to make real time metric approaches are expected to be independent survival data. We propose a decisions based on incomplete informa- asymptotically efficient under a correct pseudo-value approach to compare con- tion pertaining to the p-values available. specification of the model. Furthermore, ditional residual lifetimes for independent/ In this talk, I present a powerful fallback the parametric approach does not require dependent survival and competing risks type procedure for controlling FDR that nonparametric estimation of the prob- data. The pseudo-values are obtained awards the critical constants on rejec- ability density function of the underlying based on jackknife using the Kaplan- tion of a hypothesis and penalizes on distribution under informative or noninfor- Meier estimates for survival data and acceptance. This procedure overcomes mative censoring to evaluate the variance the cumulative incidence estimates for the drawback of the conventional FDR of the quantile estimator. In this presenta- competing risks data. Statistical infer- controlling procedures by making real tion we develop parametric inferences ence for comparing conditional residual time decisions based on partial infor- on the quantile residual lifetimes for lifetimes is made by relying on general- mation available when a hypothesis is one-sample and two sample cases both ized estimating equations to account for tested and allowing testing of each a under competing and non-competing correlation among patients. The statistical priori ordered hypothesis. The procedure risks settings. Simulation results indi- properties of the proposed method are is shown to strongly control FDR under cate that the methods perform well. The studied. Simulation studies show that positive regression dependence (PRDS) proposed methods will be illustrated with the proposed method controls Type I of p-values. FDR control under arbitrary a real dataset. dependence can also be achieved by errors very well. The proposed method is email: [email protected] illustrated by a bone marrow transplant applying a correction factor to the critical data set. constants. A Simulation study demon- strates effectiveness of the procedure in email: [email protected] STUDY DESIGN ISSUES IN PRE- terms of FDR control and average power. CISION STUDY FOR OPTICAL A real data analysis is presented that sup- COHERENCE TOMOGRAPHY DEVICE ports our findings. FALLBACK TYPE FDR CONTROLLING Haiwen Shi*, U.S. Food and PROCEDURES FOR TESTING email: [email protected] Drug Administration A PRIORI ORDERED HYPOTHESES Optical Coherence Tomography (OCT) Anjana Grandhi*, New Jersey PARAMETRIC INFERENCE is a new medical imaging technology Institute of Technology ON QUANTILE RESIDUAL LIFE and more eye specialists are relying on Gavin Lynch, New Jersey Institute it to diagnose the Ocular diseases, such Kidane B. 
Ghebrehawariat*, University of Technology as Diabetic Retinopathy. It is critical for of Pittsburgh OCT to accurately monitor the thickness Wenge Guo, New Jersey Institute Ying Ding, University of Pittsburgh of eye anatomy such as retina, which is of Technology the indication of some ocular disease. Jong-Hyeon Jeong, University In large scale multiple testing problems Hence, the OCT device needs to have of Pittsburgh in applications such as stream data, good precision in the measurement of statistical process control, etc., the tested Statistical inference via quantile residual the thickness. A typical precision study hypotheses are ordered a priori by time life enjoys some practical advantages for OCT device uses either a nested or and it is desired to control False Discov- over other existing approaches (e.g., crossed design. For the economic rea-

326 ENAR 2015 | Spring Meeting | March 15–18 son, sponsors prefer the nested design, often of scientific interest. However, there ate of interest. Our proposed estimator especially when the study involves are certain difficulties in modeling the is consistent and asymptotically normal. multiple sites. In this talk, I will discuss recurrent gap time data of infections after Through numerical studies, we showed my investigation of the consequence transplant. First, there are multiple types that our proposed method for estimating if the random effect model that is used of infections, hence multivariate recurrent a covariate effect is unbiased compared to obtain the variance component is event processes, which could be cor- to the naive estimator that uses only mis-specified. Specifically, I investigate related due to the shared, compromised surrogate endpoints and is more efficient the consequences of fitting the nested host immunity. Second, the effects of with moderate missingness compared design data by a crossed model and the covariates on different episodes of gap to the complete-case estimator that uses consequences of treating some random times of the same event type could be only true endpoints. We also illustrated effect factors as fixed effect using simula- different. Hence, a model which can the use of our proposed method by tions and a data example. In addition, I handle episode-specific effect is neces- estimating the effect of gender on time will discuss some other issues related to sary. Third, the observation of recurrent to detection of Alzheimer’s disease the precision study for the OCT device. infection processes may be terminated by using data from the Alzheimer’s Disease This includes whether an interaction term events such as death and a second trans- Neuroimaging Initiative. The proposed should be included in a crossed ANOVA plant. Ignoring the possible dependence method is able to account for the uncer- random effect model and the problem of of recurrent event processes with the tainty of surrogate outcomes by using a negative variance component estimates terminal events could lead to incorrect validation subsample of true outcomes and imbalanced data. inferential results. In this paper, we will in estimating a binary covariate effect. present the analysis of post-HCT infection The proposed estimator can outper- email: [email protected] data prospectively collected from patients form standard semiparametric survival who received HCT between 2000 and analysis methods, and can therefore save MODELING GAP TIMES BETWEEN 2010 at the University of Minnesota. on costs of a trial or improve power in detecting treatment effects. RECURRENT INFECTIONS email: [email protected] AFTER HEMATOPOIETIC CELL email: [email protected] TRANSPLANT ASSESSING TREATMENT EFFECTS Chi Hyun Lee*, University of Minnesota WITH SURROGATE SURVIVAL INFERENCE CONCERNING Xianghua Luo, University of Minnesota OUTCOMES USING AN INTERNAL THE DIFFERENCE BETWEEN TWO Chiung-Yu Huang, Johns Hopkins VALIDATION SUBSAMPLE TREATMENTS IN CLINICAL TRIALS University Jarcy Zee*, Arbor Research Krishna K. Saha*, Central Connecticut Recurrent infections after transplanta- Collaborative for Health State University tion can cause significant morbidity in Sharon X. Xie, University of Pennsylvania This article focuses on confidence interval hematopoietic cell transplantation (HCT) construction for the difference between In studies with surrogate outcomes avail- recipients. 
Patients who received HCT two treatment means in clinical trials able for all subjects and true outcomes at the University of Minnesota Fairview and other similar fields. In this study, the available for only a subsample, survival Hospital had been monitored for vari- interval methods based on the general- analysis methods are needed that incor- ous types of infectious complications, ized estimating equation (GEE) approach porate both endpoints in order to assess including bacterial, viral, and fungal and the ratio estimator approach are treatment effects. We develop a semi- infections. The effects of patient- and developed. The three other interval meth- parametric estimated likelihood method transplant-related characteristics on ods following the procedures studied for for the proportional hazards model with the interoccurrence times or gap times proportions are also developed. Monte discrete time data and a binary covari- between infections of each type are Carlo simulations indicate that all the

Program & Abstracts 327 procedures have reasonably well cover- methods, DNase2TF is faster and more GROUP FUSED MULTINOMIAL age properties. However, the GEE-based accurate in predicting actual transcrip- REGRESSION interval procedure outperforms other tion factor binding sites. We also assess Brad Price*, University of Miami interval procedures in terms of all three a limitation of using footprints for binding confidence interval criteria. An example in prediction that may be caused by insuf- Charles J. Geyer, University clinical trials is also presented to illus- ficient sequencing and/or certain binding of Minnesota trate the proposed confidence interval events that do not produce footprints. Adam J. Rothman, University procedures. DNase2TF allows rapid identification of of Minnesota footprint candidates, but care should be email: [email protected] We propose a penalized likelihood taken when inferring transcription factor method to reduce the number of binding through footprints. The MATLAB response categories in multinomial source code and C code are provided as 87. CONTRIBUTED PAPERS: logistic regression. An l2 rusion peanlty is Supplementary Material and updated in Computational Methods used to introduce shrinkage and exploit http://sourceforge.net. vectorwise simialrity of the regression email: [email protected] coefficieints. An ADMM algortihm is used DNase2TF: AN EFFICIENT ALGO- for optimization, and tuning parameter RITHM FOR FOOTPRINT DETECTION selection is also adressed. Songjoon Baek*, National Cancer SPECTRAL PROPERTIES OF MCMC email: [email protected] Institute, National Institutes of Health ALGORITHMS FOR BAYESIAN LINEAR REGRESSION WITH GENER- Myong-Hee Sung, National Cancer ALIZED HYPERBOLIC ERRORS Institute, National Institutes of Health ANALYSIS OF MCMC ALGORITHMS Yeun Ji Jung*, University of Florida FOR BAYESIAN LINEAR REGRES- Gordon L. Hager, National Cancer SION WITH LAPLACE ERRORS Institute, National Institutes of Health James P. Hobert, University of Florida Hee Min Choi*, University of California, By deep sequencing of DNase-seq data We study MCMC algorithms for Bayes- Davis and analyzing the nucleotide-resolution ian analysis of a linear regression model DNase cleavage profiles, it is possible with generalized hyperbolic errors. The Let pi denote the intractable posterior to achieve digital footprinting of tran- Markov operators associated with the density that results when the standard scription factors. The DNA regions that standard data augmentation algorithm default prior is placed on the parameters are bound by proteins and relatively and a sandwich variant of that algorithm in a linear regression model with iid protected from enzymatic cutting are are shown to be trace-class. This means Laplace errors. We analyze the Markov termed footprints. Decreasing costs and the Markov chains underlying the DA and chains underlying two different Markov higher yields of improved sequencing sandwich algorithms are geometrically chain Monte Carlo algorithms for explor- methods make digital footprinting more ergodic; that is, the Markov chains con- ing pi. In particular, it is shown that the feasible, making de novo discovery of verge to the target posterior distribution Markov operators associated with the relevant transcription factors possible. at a geometric rate. This result is highly data augmentation (DA) algorithm and However, reliable and fast computational important because geometric ergodic- a sandwich variant are both trace-class. 
methods must be widely available to ity of the Markov chain guarantees the Consequently, both Markov chains are enable footprint detection from DNase- existence of CLT (Central Limit Theorem) geometrically ergodic. It is also estab- seq data. Here we present DNase2TF, s which allow for the computation of valid lished that for each i=1,2,3,..., the ith a new detection algorithm that scans asymptotic standard errors for MCMC- largest eigenvalue of the sandwich DNase I hypersensitive sites for putative based estimates. operator is less than or equal to the footprints. When compared to previous email: [email protected]

328 ENAR 2015 | Spring Meeting | March 15–18 corresponding eigenvalue of the DA posterior distribution of the parameters. same n-dimensional subspace and can operator. It follows that the sandwich Based on our theoretical and simula- be efficiently represented by their low algorithm converges at least as fast as tion results, our recommendation is that dimensional coordinates in that sub- the DA algorithm. (Joint work with Dr. a Cauchy prior needs to be used with space. Several uncertainty metrics can be Jim Hobert.) care because the existence of posterior computed solely based on the bootstrap moments is not always guaranteed. As distribution of these low dimensional email: [email protected] a result, for a full Bayesian analysis of a coordinates, without calculating or storing binary regression model using Markov the p-dimensional bootstrap compo- ON THE USE OF CAUCHY PRIOR chain Monte Carlo, Student t priors with nents. We apply fast bootstrap PCA to DISTRIBUTIONS FOR BAYESIAN somewhat larger degrees of freedom a dataset of brain magnetic resonance BINARY REGRESSION could serve as a safer choice. images (MRIs) (p= approx 3 million, n=352). Our method allows for standard Joyee Ghosh*, University of Iowa email: [email protected] errors for the first 3 principal components Yingbo Li, Clemson University based on 1000 bootstrap samples to be Robin Mitra, University of Southampton FAST, EXACT BOOTSTRAP calculated on a standard laptop in 47 PRINCIPAL COMPONENT ANALYSIS minutes, as opposed to approximately 4 Cauchy prior distributions for regression FOR p > 1 MILLION days with standard methods. parameters have been popular in the Bayesian linear regression literature for Aaron Fisher*, Johns Hopkins University email: [email protected] a long time. One of the main advantages Brian Caffo, Johns Hopkins University of these prior distributions is that they Brian Schwartz, Johns Hopkins 88. Biostatistical Methods are heavy tailed and hence much more University for Heterogeneous robust compared to normal prior distri- Genomic Data butions. More recently a Cauchy prior Vadim Zipunnikov, Johns Hopkins distribution has been recommended as University the default choice for logistic regression Many have suggested a bootstrap INVESTIGATING TUMOR HETEROGE- by Gelman et al. (2008). It is known that procedure for estimating the sampling NEITY TO IDENTIFY ETIOLOGICALLY the mean does not exist for the Cauchy variability of principal component analysis DISTINCT SUB-TYPES distribution and a natural question is if (PCA) results. However, when the number Colin B. Begg*, Memorial Sloan there are some scenarios under which of measurements per subject (p) is much Kettering Cancer Center the posterior mean of the regression larger than the number of subjects (n), parameters may not exist either. In this calculating and storing the leading prin- Many investigators have conducted stud- talk we focus on complete separation cipal components from each bootstrap ies that examine the molecular profiles of in logistic regression, a scenario that is sample can be computationally infea- tumors to identify sub-types with distinc- not that uncommon with many binary sible. To address this, we outline methods tive patterns based on gene expression, covariates. We provide some theoretical for fast, exact calculation of bootstrap copy numbers changes, mutations, or justification to show that the posterior principal components, eigenvalues, and other somatic or epigenetic events. Ide- mean will not exist for a Cauchy prior scores. 
Our methods leverage the fact ally the sub-types so identified display distribution when there is complete that all bootstrap samples occupy the distinct clinical phenotypes. However, separation, under certain conditions. same n-dimensional subspace as the molecular profiles can also be examined We also use simulation studies and real original sample. As a result, all bootstrap with the goal of identifying etiologically data analysis to illustrate the effect of this principal components are limited to the distinct sub-types. In this talk a general property on the behavior of the Markov strategy for establishing and optimizing chain that is used to sample from the such etiologic heterogeneity is presented.

Program & Abstracts 329 A scalar measure of etiologic heterogene- sion computed tomography. The aim that, in these studies, cellular compo- ity that can be used to characterize a set is to distinguish between biologically sition explains much of the observed of sub-types is defined. It is shown how distinct tissue types, metastatic versus variability in DNAm. Furthermore, we this can then be used to direct a cluster- normal liver, through the evaluation of find high levels of confounding between ing strategy to identify the sub-types that vasculature heterogeneity; (ii) Differential age-related variability and cellular compo- are most clearly etiologically distinct. The networks between treatment groups, sition at the CpG level. We also present ideas are illustrated using data from on- cancer subtypes, or prognostic features, data from brain samples were the cell going studies of cancer epidemiology. using different modalities of genomic composition issue also arises. data (mRNA expression, copy number, email: [email protected] email: [email protected] microRNA), in association with hetero- geneous survival times in glioblastoma STATISTICAL CHALLENGES IN patients from the Cancer Genome Atlas MODELLING SOURCES OF CANCER RESEARCH: HETEROGENE- (TCGA) study. I will discuss the develop- VARIABILITY IN SINGLE-CELL ITY IN FUNCTIONAL IMAGING AND ment of computer-intensive statistical TRANSCRIPTOMICS DATA models,simulation studies conducted, MULTI-DIEMNSIONAL OMICS DATA Sylvia Richardson*, MRC Biostatistics and inferential results. This is joint work Kim-Anh Do*, University of Texas Unit Cambridge, UK with Francesco Stingo, Thierry Chekouo, MD Anderson Cancer Center James Doecke, Yuan Wang, Brian Hobbs, Catalina Vallejos, MRC Biostatistics Unit Thierry Chekouo, University of Texas Jianhua Hu. Cambridge and European Bioinformatics MD Anderson Cancer Center Institute, Hinxton, UK email: [email protected] Francesco Stingo, University of Texas John Marioni, European Bioinformatics MD Anderson Cancer Center Institute, Hinxton, UK ACCOUNTING FOR CELLULAR Brian Hobbs, University of Texas Current technology has made possible HETEROGENEITY IS CRITICAL IN MD Anderson Cancer Center the analysis of gene expression levels EPIGENOME-WIDE ASSOCIATION with high resolution. Instead of measur- Yuan Wang, University of Texas STUDIES ing overall expression across groups of MD Anderson Cancer Center Rafael Irizzary*, Harvard University cells, scientists are now able to report Jianhua Hu, University of Texas Epigenome-wide association studies measures at a single-cell level, with typi- MD Anderson Cancer Center (EWAS) of human disease and other cal data represented by a matrix which James Doecke, CSIRO, Australian quantitative traits are becoming increas- entries correspond to the observed e-Health Research Centre, Brisbane, ingly common. A series of papers expression counts for each gene across Australia reporting age-related changes in DNA cells. It is known that high levels of tech- nical noise are usually observed when Understanding different types of het- methylation (DNAm) profiles in peripheral dealing with small amount of genetic erogeneity is one of the challenges that blood have already been published. How- material. This creates new challenges for needs to be addressed to drive cancer ever, blood is a heterogeneous collection identifying genes, which show genuine research forward. 
I will describe, at of different cell types, each with a very within-tissue heterogeneity beyond that a high level, the statistical questions different DNA methylation profile. Using a induced by technical noise. An additional posed by cancer research at MD Ander- statistical method that permits estimating challenge in this context is the normaliza- son that involve the different facets of the relative proportion of cell types from tion of the expression counts. This is due heterogeneity. I will focus on two main DNAm profiles, we examine data from to cells having different amounts genetic research projects: (i) Simultaneous five previously published studies, and material and technical aspects such as supervised classification ofmultivariate find strong evidence of cell composition sequencing depth. In this talk, statisti- correlated objects collected from perfu- change across age. We also demonstrate

330 ENAR 2015 | Spring Meeting | March 15–18 cal approaches to model the different COMPETING RISKS PREDICTION posed two time scale methods perform sources of variability of such data will IN TWO TIME SCALES well, outperforming the single time scale be presented. As opposed to previous predictions when the time scale is mis- Jason Fine*, University of North literature, we implement a self-normal- specified. The methods are illustrated Carolina, Chapel Hill ization procedure, where normalizing with stage III colon cancer data obtained constants are treated as unknown model In the standard analysis of competing from the Surveillance, Epidemiology, and parameters. risks data, proportional hazards mod- End Results (SEER) program of National els are fit to the cause-specific hazard Cancer Institute. e-mail: functions for all causes on the same time [email protected] e-mail: [email protected] scale. These regression analyses are the foundation for predictions of cause- 89. Innovative Approaches specific cumulative incidence functions CHECKING FINE AND GRAY’S in Competing Risk based on combining the estimated SUBDISTRIBUTION HAZARDS Analysis cause-specific hazard functions. How- MODEL WITH CUMULATIVE ever, in predictions arising from disease SUMS OF RESIDUALS registries, where only subjects with dis- Jianing Li, Medical College of Wisconsin FLEXIBLE MODELING OF ease enter the database, disease related COMPETING RISKS AND CURE RATE mortality may be more naturally modelled Thomas H. Scheike, University on the time since diagnosis time scale of Copenhagen Qi Jiang, Northern Illinois University while death from other causes may be Mei-Jie Zhang*, Medical College Sanjib Basu*, Northern Illinois University more naturally modelled on the age time of Wisconsin scale. The single time scale methodol- ogy may be biased if an incorrect time Recently, Fine and Gray (1999) pro- The cumulative incidence functions scale is employed for one of the causes posed a semi-parametric proportional based approach to competing risks and alternative methodology is not regression model for the subdistribution modeling has the advantage of providing available. We propose inferences for the hazard function which has been used direct inference on the survival probabili- cumulative incidence function in which extensively for analyzing competing risks ties from each risk. A unified competing regression models for the cause-specific data. However, failure of model adequacy risks cure rate model is proposed in this hazard functions may be specified on could lead to severe bias in parameter work where the cumulative incidence different time scales. Using the disease estimation, and only a limited contribu- functions of the competing risks are registry data, the analysis of other cause tion has been made to check the model directly modeled. The proposed model mortality on the age scale requires left assumptions. In this talk, we present a further accounts for the possibility of cure truncating the event time at the age of class of graphical and analytical methods from one or more of the competing risks. disease diagnosis, complicating the for checking the assumptions of Fine and Bayesian analyses of these models are analysis. In addition, standard martingale Gray’s model. The proposed goodness- explored, and conceptual, methodologi- theory is not applicable when combin- of-fit test procedures are based on the cal and computational issues related to ing regression models on different time cumulative sums of residuals. 
We validate Bayesian model fitting and model selec- scales. We establish that the covariate the model in three aspects: (1) propor- tion are discussed. The performance of conditional predictions are consistent tionality of hazard ratio, (2) the linear the proposed model is investigated in and asymptotically normal using empiri- functional form and (3) the link function. simulation studies. The unified model is cal process techniques and propose For each proposed test, we provided used to analyze cancer survival data from consistent variance estimators which may a visualized plot and a testing p-value SEER and a clinical study. be used to construct confidence intervals. e-mail: [email protected] Simulation studies show that the pro-

Program & Abstracts 331 against the null hypothesis using a simu- missing mechanism model. The likeli- PREDICTION ACCURACY OF LONGI- lation-based approach. We also consider hood function is then reweighted by TUDINAL MARKER MEASUREMENT an omnibus test for overall evaluation the missing probability and the disease Paramita Saha Chaudhuri*, McGill against any model misspecification. The probability. The double robust estimation University proposed test methods performed well in allows either the disease model or the simulation studies and are illustrated with missing mechanism model to be incor- Patrick Heagerty, University of real data example. rect. We examine the performance of Washington double robust PMM approach in exten- e-mail: [email protected] Longitudinal marker measurements sive simulation studies. The proposed serves several purposes, from disease methods are motivated from and applied screening to treatment management. 90. Biomarker Evaluation to a fetal growth study, in which neonatal With the progress of medicine in the last in Diagnostics Studies outcomes are predicted by the longitudi- couple of decades, there is now consid- with Longitudinal Data nal ultrasound measurements. erable focus on early detection of disease e-mail: [email protected] via population screening and longitu- dinal measurements, giving the patient COMBINATION OF LONGITUDINAL more treatment options if the disease is BIOMARKERS WITH MISSING DATA MEASURES TO EVALUATE found early and consequently a better Danping Liu*, Eunice Kennedy Shriver BIOMARKERS AS PREDICTORS prognostic outlook. However, dispute still National Institute of Child Health OF INCIDENT CASES remains as to whether multiple marker measurement is predictive over and and Human Development, National Chao-Kang Jason Liang*, University Institutes of Health of Washington above baseline or less frequent mea- surements. Sometimes, it is not clear if a A common practice in disease prediction Patrick J. Heagerty, University relaxed marker measurement frequency is to measure a biomarker repeatedly of Washington can be used without compromising the over time, where the trajectory informa- In many biomedical applications a benefits (e.g., annual versus biennial). In tion of the biomarker greatly improves the primary goal is to predict incident or this talk, I will introduce a summary from prediction accuracy. Missingness occurs future cases, and appropriate measures time-dependent ROC that can be used in most longitudinal studies, resulting in that characterize a biomarker’s predic- to compare the accuracy of longitudinal incomplete observations of both biomark- tive potential or incremental value are marker measurements. I will demonstrate ers and the disease condition. Liu and needed. We first review existing non-para- the approach with simulated and real Albert (2014) proposed a pattern mixture metric methods proposed for incident data. model (PMM) to combine the longitudinal time-dependent accuracy (Zheng and biomarkers, but their approach cannot e-mail: Heagerty, 2005; Saha and Heagerty, handle missing data. We extend their paramita.sahachaudhuri.work@gmail. 2013) and then overview extensions results by allowing both the longitudinal com of integrated discrimination index (IDI) biomarkers and the disease outcome that are appropriate for hazard models. to be missing at random. We develop a The proposed new methods are also doubly robust procedure with PMM for connected to information theory based prediction. 
Under the PMM framework, criteria for model choice. We outline the missingness in the biomarkers can estimation methods and applications be handled by using a likelihood-based to benchmark data sets to illustrate the inference. To account for the missing- methodology. ness in the disease status, we assume a disease model given covariates and a e-mail: [email protected]

332 ENAR 2015 | Spring Meeting | March 15–18 ESTIMATING TIME-DEPENDENT illustrate new procedures using a two- 91. Solving Clinical Trial ACCURACY MEASURES FOR phase biomarker study aiming to evaluate Problems by Using SURVIVAL OUTCOME UNDER the accuracy of a novel biomarker, des- Novel Designs TWO-PHASE SAMPLING DESIGNS γ-carboxy prothrombin, for the early detection of hepatocellular carci- Dandan Liu*, Vanderbilt University noma (Lok et al., 2010). SOME DESIGN APPROACHES Tianxi Cai, Harvard University TO ADDRESS MISSING DATA DUE e-mail: [email protected] Anna Lok, University of Michigan TO EARLY DISCONTINUATION IN CLINICAL TRIALS Yingye Zheng, Fred Hutchinson COMPRESSION OF LONGITUDINAL Cancer Research Center Sonia M. Davis*, University of North GENOMIC BIOMARKERS Carolina, Chapel Hill Large prospective cohort studies of rare FOR DIAGNOSIS STUDY chronic diseases such as cancer often Clinical trials of some indications such as Le Bao*, The Pennsylvania State require thoughtful planning in study neuroscience often have markedly high University designs, especially for biomarker study rates of early patient discontinuation. when measurement are based on stored Xiaoyue Niu, The Pennsylvania Statistical methods to compare treatment tissue or blood specimens. Two phase State University groups in the face of missing data have become more developed in the past designs, including nested case control Kayee Yeung, University of Washington (Thomas, 1977) and case-cohort (Pren- decade. However, high rates of missing tice, 1986) sampling designs, provide In the genome-wise association study data due to patient discontinuation pose cost-effective tool in the context of (GWAS), one of the objectives is to substantial complications for addressing biomarker evaluation, especially when establish the associations between genes bias and interpretation of efficacy and the clinical condition of interest is rare. and diseases. Besides traditional gene safety results. In these settings, stud- Existing literature for biomarker assess- expression data that are collected in ies designed to reduce the likelihood of ment under two phase designs has been the form of a two dimensional gene by participant drop-out are warranted. This based on simple inverse probability individual matrix, with new technology, talk identifies and discusses clinical trial weighting (IPW) estimators (Cai and individual gene expression levels are also design elements aimed at (1) minimiz- Zheng, 2011; Liu et al., 2012). Drawing followed longitudinally for a series of time ing the rate of early discontinuation, (2) on recent theoretical development on the points. The resulting data form a three minimizing the impact of discontinua- maximum likelihood estimators for two- dimensional gene by individual by time tion on the assessment of the outcome phase studies (Scheike and Martinussen, array. In order to infer associations from measure, and (3) maximizing the ability 2004; Zeng et al., 2006), we propose these newly available data, we propose to gather patient status information after statistical methods to evaluate accuracy a novel two-step approach that uses treatment discontinuation. Topics include and predictiveness of a risk prediction model-based clustering and contingency recommendations by the National Acad- biomarker, with censored time-to-event tables to analyze the array data. 
The pro- emy of Science panel on missing data, a outcome under both types of two-phase posed method is computational efficient discussion of 3-arm 2-period cross-over designs. Hybrid estimators that combine and suites the analysis goal. designs, and use of time to treatment IPW estimators and MLE procedures are e-mail: [email protected] discontinuation as an outcome measure. proposed to improve efficiency and allevi- e-mail: [email protected] ate computational burden. We derive large sample properties of proposed esti- mators and evaluate their finite sample performance using numerical studies. We

Program & Abstracts 333 INTRODUCTION TO THE INTEGRITY AND EFFICIENCY OF 92. Ensuring Biostatistical SEQUENTIAL ENRICHED DESIGN ENRICHMENT AND ADAPTIVE TRIAL Competence Using DESIGN AND ANALYSIS OPTIONS TO Yeh-Fong Chen*, U.S. Food and Novel Methods ENABLE ACCURATE AND PRECISE Drug Administration SIGNAL DETECTION Roy Tamura, University of South Florida WHAT DO NON-BIOSTATISTICS Marc L. de Somer*, PPD For many disease areas, the placebo CONCENTRATORS NEED FROM Traditional clinical trial design and analy- response can be high in clinical trials. THE INTRODUCTORY BIOSTATIS- sis options require large sample sizes Enrichment designs are thought of as TICS COURSE? to detect the true signal when noise is being able to provide a way to address Jacqueline N. Milton*, Boston University high due to excessive placebo response this issue. Some of the interesting and variability. Increasing sample size Successful careers in public health enrichment designs such as sequen- is self-defeating because it can increase require practitioners to have an under- tial parallel design (Fava et al., 2003), in variability, due to the multiplication standing of biostatistical concepts, two way enriched design (Ivanova and of investigational sites, countries and applications and techniques. Developing Tamura, 2011) and sequential enriched subjects. Several partial enrichment trial a curriculum that ensures that all students design (SED) (Chen et al., 2014) were design solutions have been recently understand and can apply biostatistical proposed recently in the literature. The proposed: the sequential parallel compar- techniques requires an understand- SED was devised not only to reduce ison design (SPCD: Fava M, 2003), the ing of which concepts in biostatistics placebo response but also to enhance two-way enriched design (TED: Ivanova are most critical and how they will be the capability of detecting a targeted A, 2012), and the sequential enriched applied in various disciplines. Here we treatment effect. As it is a new design, design (SED: Chen YF, 2014). Their aim is sought to understand the learning goals the SED’s implementation in real clinical to filter excessive placebo response and of graduate students in public health in trial settings and its advantage over other variability, and identify a target responder the introductory biostatistics course. We enrichment designs need careful evalu- subject set in which signal detection is surveyed professors and alumni to deter- ation. In this presentation, I will describe enhanced. The present research evalu- mine which biostatistics concepts and the SED and evaluate the selection of ates their performance in terms of type applications were most critical. We used required design parameters in terms I error control, accuracy, precision and this information to inform and further of power optimization and sample size power, using multiple analysis models develop our curriculum so that it could be planning. I will also discuss the issues and missing data handling methods tailored more towards students’ interests of missing data and considerations of (ANCOVA.LOCF and MMRM). The reli- and career needs. interim analysis implementation related to ability of inference based on a linear its applicability. e-mail: [email protected] combination of stage-wise statistics is e-mail: [email protected] evaluated. Finally, the three enrichment design options are compared in terms of trial integrity, efficiency, feasibility and economics. 
Simulation across a wide range of assumptions reveals the optimal choice of the trial design, analysis and missing data handling method in each specific context. e-mail: [email protected]

334 ENAR 2015 | Spring Meeting | March 15–18 CREATING THE INTEGRATED graduate students motivated our devel- typically based on models for correlated BIOSTATISTICS-EPIDEMIOLOGY opment of several different introductory binary data. This approach ignores the CORE COURSE: CHALLENGES biostatistics course sequence options. age of family members at the time of AND OPPORTUNITIES The basic understanding of statistical assessment. We consider likelihood and principles and applications is paramount composite likelihood based methods for Melissa D. Begg*, Columbia University in each of these sequences but they vary modeling within-family dependence in the Roger D. Vaughan, Columbia University by level of achieved skills in performing disease onset times using copula models Dana March, Columbia University data analysis as well as understand- for settings in which data from non-pro- ing of statistical theory. In addition, our bands are subject to right censoring and Biostatistics and Epidemiology are epidemiology course options beyond current status observation schemes. The considered the “basic sciences” of public fundamental concepts have evolved to advantages of pairwise and partial pair- health, yet many students fear taking focus on either epidemiologic research wise composite likelihoods are discussed these courses due to their technical methods or professional methods for in terms of computation speed and nature. One of the challenges for educa- public health practice. We have recog- statistical efficiency. These models and tors is to persuade students of the power nized the ability to coordinate between methods are also used to examine the and utility of quantitative methods, and the varying biostatistics and epidemiol- factors influencing the commonly used deliver the skills required to interpret the ogy course sequences to offer students measures of within-family dependence literature and identify effective public cohesive curricular options that match based on binary responses. health interventions. One approach to this their diverse needs. problem is to integrate the teaching of e-mail: [email protected] these 2 disciplines into one core course, e-mail: [email protected] with the goal of helping students to see MODELING COGNITIVE STATES IN more immediately how these skill sets THE ELDERLY: THE ANALYSIS OF can be applied together to better under- 93. Methodological Fron- PANEL DATA USING MULTI-STATE stand the factors that hinder or promote tiers in the Analysis of MARKOV AND SEMI-MARKOV health. However, one must take care Panel Observed Data PROCESSES when delivering an integrated course Richard J. Kryscio*, University to ensure that students have adequate SECOND-ORDER MODELS OF of Kentucky exposure to both disciplines, and that WITHIN-FAMILY ASSOCIATION IN both are capably taught. We share some CENSORED DISEASE ONSET TIMES Continuous-time multi-state models are insights and recommendations after 3 commonly used to describe the move- Yujie Zhong*, University of Waterloo offerings of a combined core course as ment of elderly subjects among various part of the MPH curriculum. Richard John Cook, University cognitive states in dementia studies. 
The e-mail: [email protected] of Waterloo cognition of each subjects is periodically In preliminary studies of the genetic basis assessed leading to interval-censoring for chronic conditions, interest routinely for the cognitive states: intact cognition, MEETING PUBLIC HEALTH CAREER lies in examining the within-family depen- Mild Cognitive Impairment (MCI), and GOALS: COURSE OPTIONS IN BIO- dence in disease status. When probands Dementia. In these studies death without STATISTICS AND EPIDEMIOLOGY are selected from disease registries and dementia is a competing risk which is Marie Diener-West*, Johns Hopkins their respective families are recruited, a not interval censored. We discuss two Bloomberg School of Public Health variety of methods are available which approaches to modeling this type of correct for the selection bias, which are panel data: Markov chains and semi-Mar- The heterogeneity of previous back- kov processes in terms of computational grounds and experiences, as well as issues that arise when some cognitive future professional goals, of public health

Program & Abstracts 335 states such as MCI involve back transi- estimates of state occupancy probabili- 94. CONTRIBUTED PAPERS: tions and when parametric models are ties with some non-parametric estimate. Ordinal and used to model time spent in states. Existing non-parametric estimates of Categorical Data Data from the Statistical Modeling of state occupancy either make crude Risk Transitions project, a consortium of assumptions, leading to bias, or else six longitudinal studies of cognition in are more computationally intensive to EXPLICIT ESTIMATES FOR CELL the elderly, will be used to illustrate the implement than the original parametric COUNTS AND MODELING THE results. model. A computationally simple method MISSING DATA INDICATORS IN for obtaining non-parametric estimates THREE-WAY CONTINGENCY TABLE e-mail: [email protected] of the state occupation probabilities is BY LOG-LINEAR MODELS proposed for progressive multi-state Haresh D. Rochani*, Georgia Southern MULTI-STATE MODELS: A VARIETY models where transition times between University OF USES intermediate states are subject to interval censoring. The method separates estima- Robert L. Vogel, Georgia Southern Vern Farewell*, MRC Biostatistics Unit, tion of overall survival, using standard University Cambridge, UK methods for survival data, and estimation Hani M. Samawi, Georgia Southern Recent uses of multi-state models in the of the conditional cumulative incidence of University analysis of longitudinal data reflect their progression to a series of subsets of the Daniel F. Linder, Georgia Southern usefulness in the specification of data state space performed using methods for University structures and their flexibility. The use of current status competing risks data. The multi-state models for a variety of prob- resulting estimates of state occupancy Missing observations in cross-classified lems will be illustrated to demonstrate are unbiased, without requiring a Markov data are an extremely common prob- these characteristics. These problems will assumption, when the disease process lem in the process of research in public involve the challenges of panel data, time and examination times are indepen- health, clinical sciences and social sci- to event analyses for events defined only dent. An inverse visit-intensity weighted ences. Ignorance of missing values in by prolonged observation and correlated estimator is proposed for cases where the analysis can produce biased results processes. The application of causal the time to next examination depends and low statistical power. The purpose reasoning in the context of multi-state on the last observed state. The method of this research was to expand Baker, models will also be briefly discussed. can also be extended to provide approxi- Rosenberger and Dersimonian (BRD) model approach to compute the explicit e-mail: mate estimates of the marginal transition maximum likelihood estimates for cell [email protected] probabilities. counts for three-way cross-classified e-mail: [email protected] data. Derivation of explicit cell counts COMPUTATIONALLY SIMPLE STATE for three-way table with supplementary OCCUPANCY PROBABILITY ESTI- margins can be obtained by controlling MATES FOR MULTI-STATE MODELS the missingness in third variable and by UNDER PANEL OBSERVATION modeling the missing-data indicators using homogeneous log-linear models. 
Andrew Titman*, Lancaster University Previous methods for contingency tables A desirable way of assessing the appro- with supplementary margins required an priateness of a parametric multi-state iterative algorithms, however, expected model is to compare the model-based cell counts can be obtained by simple algebraic formula. Simulation study with source of knowledge of cancer data

336 ENAR 2015 | Spring Meeting | March 15–18 illustrate that how well the explicit maxi- miologic principles in studying additive of skewness. To address this limitation, mum likelihood estimates can produce interactions in determining whether or we propose a Bayesian nonparametric consistent results in idyllic circumstances. not synergy amongst the traditional MetS model which combines a generalized Application of the BRD model approach components exists in predicting future extreme value link function with a Gauss- to Slovenian public opinion survey data disease. Specifically, we calculated the ian process prior on the latent structure reveals the effect of smaller sample size relative excess risk due to interaction for flexible modeling. The efficiency to the validity of the method. (RERI) in estimating the additive interac- and gains of our proposed model are tions across the five MetS components. illustrated through the analysis of two real e-mail: [email protected] This study of RERI measures across five data examples with one collected in an components, and the resulting com- experimental paradigm that studies the ADDITIVE INTERACTIONS parisons among racial/ethnic groups, monkey attention and the other studying AND THE METABOLIC SYNDROME provided interesting methodologic chal- the course of development of pneumoco- lenges that helped us ultimately address niosis among coal miners when exposed Matthew J. Gurka*, West Virginia the question of whether “the sum is truly to certain mining conditions. University greater than its parts.” e-mail: [email protected] Baqiyyah N. Conway, West Virginia e-mail: [email protected] University Michael E. Andrew, National Institute for PENALIZED NON-LINEAR Occupational Safety and Health (NIOSH) FLEXIBLE LINK FUNCTIONS IN NON- PRINCIPAL COMPONENTS ANALYSIS PARAMETRIC BINARY REGRESSION FOR ORDINAL VARIABLES Cecil M. Burchfiel, National Institute for WITH GAUSSIAN PROCESS PRIORS Occupational Safety and Health (NIOSH) Jan Gertheiss*, Georg August Dan Li*, University of Cincinnati University, Germany Mark D. DeBoer, University of Virginia Xia Wang, University of Cincinnati Nonlinear principal components analysis The metabolic syndrome (MetS) is gener- (PCA) for categorical data constructs new ally defined as a cluster of cardiovascular Lizhen Lin, University of Texas, Austin variables by assigning numerical values risk factors, including obesity, high blood Dipak K. Dey, University of Connecticut to categories such that the proportion pressure, elevated triglycerides, low HDL, In many scientific fields, a sequence of of variance in those new variables that and elevated fasting glucose, that has 0-1 measurements is frequently collected is explained by a predefined number of been observed to be associated with from a subject across time, space, or a principal components is maximized. We future disease (diabetes, cardiovascular collection of covariates. Researchers are propose a penalized version of nonlin- disease). MetS has been argued to be a interested in finding out how the expected ear PCA for ordinal variables that is an stronger risk factor for future disease than binary outcome is related to covariates, intermediate between standard PCA on the individual components that comprise and aim at better prediction in the future category labels and nonlinear PCA as it, but this assertion is hotly debated 0-1 outcomes. Gaussian processes have used so far. Our approach offers both amongst clinicians and researchers alike. 
been used to model the latent structure better interpretability of the nonlinear In addition, questions remain regard- in a binary regression model, but little is transformation of the category labels as ing whether such additive interactions known about the adequacy on the choice well as better performance on valida- are similar across racial/ethnic groups. of link functions and its resulting effects tion data than unpenalized nonlinear Utilizing data from large cardiovascular on model fitting, predictive power and PCA. The new method is applied to the cohort studies (Jackson Heart Study, flexibility. Commonly used link func- International Classification of Functioning, Atherosclerosis Risk in Communities tions such as probit and logit links have Disability and Health (ICF). Study), we systematically apply epide- fixed skewness and lack the flexibility to e-mail: [email protected] allow the data to determine the degree

Program & Abstracts 337 COVARIANCE ESTIMATION OF probit models are first developed for Risk-prediction models need care- PROPORTION FOR MISSING ordinal data, then generalized to finite ful calibration to ensure they produce DICHOTOMOUS AND ORDINAL DATA mixtures of multivariate probit models. unbiased estimates of risk for subjects IN RANDOMIZED LONGITUDINAL Specific recommendations for prior set- in the underlying population given their CLINICAL TRIAL tings are carefully reasoned and found risk-factor profiles. As subjects with to work well in simulations and data extreme high or low risk may be the Siying Li*, University of North Carolina, analyses. Interpretation of the model is most affected by knowledge of their risk Chapel Hill carried out by examining aspects of the estimates, checking the adequacy of risk Gary Koch, University of North Carolina, mixture components as well as through models at the extremes of risk is very Chapel Hill averaged effects focusing on the mean important for clinical applications. We This paper presents a closed form responses. A simulation verifies that the propose a new approach to test model method for sensitivity analysis of a nonparametric model is capable to model calibration targeted toward extremes of randomized multi-visit multi-center clini- bivariate ordinal data with latent variables disease risk distribution where standard cal trial that possibly has missing not at generated from different distributions. goodness-of-fit tests may lack power random (MNAR) dichotomous data and In all simulations, nonparametric mod- due to sparseness of data. We construct an extension to ordinal data. Counts of els perform better than the parametric a test statistic based on model residu- missing data are redistributed to the each models in terms of LPML (log pseduo als summed over only those individuals category of the outcome probabilistically marginal likelihood) and MED (maximal who pass high and/or low risk thresholds to adjust for possibly informative missing; expected discrepancy in probability). An and then maximize the test statistic over adjusted proportion estimates as well as analysis of alcohol drinking behavior data different risk thresholds. We derive an their closed form covariance estimates illustrates the usefulness of the proposed asymptotic distribution for the max-test are provided. Treatment comparisons model. statistic based on analytic derivation of the variance-covariance function of over time are addressed with adjustment e-mail: [email protected] for a stratification factor and/or baseline the underlying Gaussian process. The covariates. The parameter estimates method is applied to a large case-control are computed via weighted least square 95. CONTRIBUTED PAPERS: study of breast cancer to examine joint asymptotic regression through random- Statistical Genetics effects of common single nucleotide poly- ization based methods. Application of morphisms (SNPs) discovered through such sensitivity analyses are illustrated recent genome-wide association studies. with an example. TESTING CALIBRATION OF The analysis clearly indicates a non- RISK MODELS AT EXTREMES additive effect of the SNPs on the scale of e-mail: [email protected] OF DISEASE RISK absolute risk, but an excellent fit for the Minsun Song*, National Cancer linear-logistic model even at the extremes BAYESIAN NONPARAMETRIC MULTI- Institute, National Institutes of Health of risks. 
VARIATE ORDINAL REGRESSION Peter Kraft, Harvard School e-mail: [email protected] Junshu Bao*, University of of Public Health South Carolina Amit D. Joshi, Harvard School Timothy E. Hanson, University of of Public Health South Carolina Myrto Barrdahl, German Cancer Multivariate ordinal data are modeled as Research Center (DKFZ) a stick-breaking mixture of multivariate Nilanjan Chatterjee, National Cancer probit models. Parametric multivariate Institute, National Institutes of Health

338 ENAR 2015 | Spring Meeting | March 15–18 PLEMT: A NOVEL PSEUDOLIKE- to the current tests. In particular, the A FRAMEWORK FOR CLASSIFYING LIHOOD BASED EM TEST FOR proposed test outperforms the commonly RELATIONSHIPS USING DENSE HOMOGENEITY IN GENERAL- used tests under all simulation settings SNP DATA AND PUTATIVE IZED EXPONENTIAL TILT MIXTURE considered, especially when there is vari- PEDIGREE INFORMATION ance difference between two groups. The MODELS Zhen Zeng*, University of Pittsburgh proposed test is illustrated by an example Chuan Hong*, University of Texas of identifying differentially methylated sites Daniel E. Weeks, University of Pittsburgh School of Public Health, Houston between ovarian cancer subjects and Wei Chen, Children’s Hospital Yong Chen, University of Texas School normal subjects. of Pittsburgh of UPMC of Public Health, Houston e-mail: [email protected] Nandita Mukhopadhyay, University Yang Ning, Princeton University of Pittsburgh Shuang Wang, Columbia University REGRESSION-BASED METHODS Eleanor Feingold, University Hao Wu, Emory University TO MAP QUANTITATIVE TRAIT LOCI of Pittsburgh Raymond J. Carroll, Texas A&M UNDERLYING FUNCTION-VALUED When genome-wide association stud- University PHENOTYPES ies (GWAS) or sequencing studies are performed on family-based datasets, Motivated by analyses of DNA methyla- Il Youp Kwak*, University of Minnesota the genetic marker data can be used to tion data, we propose a semiparametric Karl W. Broman, University of Wisconsin, check the structure of putative pedigrees. mixture model, namely the generalized Madison Even in datasets of putatively unrelated exponential tilt mixture model, to account Genetic loci that contribute to variation in people, close relationships can often be for heterogeneity between differentially a quantitative trait are called quantitative detected using dense single-nucleotide methylated and non-differentially methyl- trait loci (QTL). We are developing simple polymorphism (SNP) data. A number of ated subjects in the cancer group, and regression-based methods to map QTL methods for finding relationships using capture the differences in higher order that influence a function-valued outcome genetic data exist, but they all have cer- moments (e.g. mean and variance) (such as body weight measured over tain limitations, such as being intended between subjects in the cancer and nor- time) in an experimental cross. In order for uncorrelated genetic markers or for mal groups. A pairwise pseudolikelihood to handle noisy trait measurements and correctly-phased genotype data. Also, a is constructed to eliminate the unknown to account for the correlation structure common limitation of existing methods nuisance function. To circumvent bound- among time points, we apply an initial is that they use average genetic sharing, ary and non-identifiability problems as in smoothing followed by functional princi- which is only a subset of the available parametric mixture models, we modify pal component analysis. Functional PCA information. In this paper we present a the pseudolikelihood by adding a penalty reduces the functional data a small num- set of approaches for classifying relation- function. In addition, as epigenetic and ber of principal components without much ships in GWAS datasets or large-scale genetic data are usually high dimensional, loss of information. We then consider sequencing datasets. 
We first propose an computationally efficient tests have great multiple methods for QTL analysis with empirical method for detecting identity- advantages over permutation based tests. these dimension-reduced traits, including by-descent segments in close relative To this end, we propose a pseudolikeli- a multi-trait mapping method proposed pairs using dense SNP data, and then hood based expectation--maximization by Knott and Haley (2000), and simple demonstrate how that information can be test, and show the proposed test follows combinations of the single-trait analysis used to build a relationship classifier. We a simple chi-squared limiting distribution. results. All of these methods have been then develop a strategy to take advan- Simulation studies show that the proposed implemented in an R package, funqtl. tage of putative pedigree information to test performs well in controlling Type I errors and has superior power compared e-mail: [email protected]

Program & Abstracts 339 enhance classification accuracy. Finally, TWO-STAGE BAYESIAN REGIONAL confirmation studies. This approach also we propose classification pipelines for FINE MAPPING OF A QUANTITATIVE permits additional stratification according checking and identifying relationships in TRAIT to the quantitative trait value. datasets containing a large number of Shelley B. Bull*, University e-mail: [email protected] small pedigrees. of Toronto and Lunenfeld-Tanenbaum e-mail: [email protected] Research Institute OPTIMAL RANKING PROCEDURES Zhijian Chen, Lunenfeld-Tanenbaum IN LARGE-SCALE INFERENCE: A NEGATIVE BINOMIAL MODEL- Research Institute THRESHOLDING FAMILIES AND BASED METHOD FOR DIFFERENTIAL Radu V. Craiu, University of Toronto THE R-VALUE EXPRESSION ANALYSIS BASED ON In focused studies designed to follow up Nicholas C. Henderson*, University NANOSTRING nCOUNTER DATA associations detected in a genome-wide of Wisconsin, Madison Hong Wang*, University of Kentucky association study (GWAS), investigators Michael A. Newton, University Arnold Stromberg, University can proceed to fine-map a genomic region of Wisconsin, Madison of Kentucky by targeted sequencing or dense genotyp- ing of all variants in the region, aiming to Identifying leading measurement units from Chi Wang, University of Kentucky identify a functional sequence variant. For a large collection is a common inference task in various domains of large-scale The NanoString nCounter system is analysis of a quantitative trait, we consider inference. Testing approaches, which a new and promising technology that a Bayesian approach to fine-mapping measure evidence against a null hypoth- enables the digital quantification of study design that incorporates stratifica- esis rather than effect magnitude, tend multiplexed target RNA molecules. In this tion according to a promising GWAS to overpopulate lists of leading units with talk, we present a novel bioinformatics tag SNP in the same region. Improved those associated with low measurement method to identify differential expression cost-efficiency can be achieved when error. By contrast, local maximum likelihood between two different groups based on the fine-mapping phase incorporates a (ML) approaches tend to favor units with NanoString nCounter data. Our negative two-stage design, with identification of a high measurement error. Available Bayes- binomial model-based method is specifi- smaller set of more promising variants in ian and empirical Bayesian approaches cally designed for this type of count data, a subsample taken in stage 1, followed by rely on specialized loss functions that result which fully utilizes positive control, nega- their evaluation in an independent stage 2 in similar deficiencies. We describe and tive control and housekeeping probes subsample. To avoid the potential negative evaluate a novel empirical Bayesian rank- for data normalization. We propose an impact of genetic model misspecification ing procedure that populates the list of top empirical Bayes shrinkage approach to on inference we incorporate genetic model units in a way that maximizes the expected estimate the dispersion parameter and a selection based on posterior probabilities overlap between the true and reported top likelihood ratio test to identify differential for each competing model. Our simulation lists for all list sizes. The procedure relates expression. 
Our simulation results show study shows that, compared to simple collections of unit-specific posterior upper competitive performance of our method random sampling which ignores genetic tail probabilities with their empirical distribu- versus existing methods. information from GWAS, tag-SNP-based stratified sample allocation methods tion to yield a ranking variable. It discounts e-mail: [email protected] reduce the number of variants continuing high-variance units less than popular non- to stage 2, and are more likely to pro- ML methods and thus achieves improved mote the functional sequence variant into operating characteristics in the models considered. e-mail: [email protected]

340 ENAR 2015 | Spring Meeting | March 15–18 96. CONTRIBUTED PAPERS: other to govern species interactions and parameters, are inputs in a nonlinear Ecology and Forestry their ecological dynamics. The kinetic least squares criterion, which in turn is Applications integration of QTL mapping and eco- minimized by a Gauss-Newton algo- logical experiments, which could not be rithm. The proposed method is illustrated made by previous theory, provides an using simulation and an observed cotton A STATISTICAL FRAMEWORK innovative incentive to understand the aphids data set. FOR THE GENETIC DISSECTION intrinsic complexity of ecosystems and e-mail: [email protected] OF EVOLUTION INDUCED BY their evolutionary mechanisms. ECOLOGICAL INTERACTIONS e-mail: [email protected] Cong Xu*, The Pennsylvania State NEW INSIGHTS INTO THE University USEFULNESS OF ROBUST SINGU- ANALYSIS OF VARIANCE OF INTE- LAR VALUE DECOMPOSITION IN Libo Jiang, Beijing Forestry University GRO-DIFFERENTIAL EQUATIONS STATISTICAL GENETICS: ROBUST Meixia Ye, Beijing Forestry University WITH APPLICATION TO POPULATION AMMI AND GGE MODELS DYNAMICS OF COTTON APHIDS Rongling Wu, The Pennsylvania State Paulo Canas Rodrigues*, Federal University Xueying Wang, Washington State University of Bahia, Brazil University An increasing body of research has Andreia Monteiro, Nova University suggested that genes play a pivotal role Jiguo Cao*, Simon Fraser University of Lisbon, Portugal in determining the spatial and temporal Jianhua Huang, Texas A&M University Vanda M. Lourenço, Nova University changes of ecological interactions in of Lisbon, Portugal response to environmental perturbations. The population dynamics of cot- Here, we develop novel theory that can ton aphids are usually described by Two of the most widely used models map genes, known as quantitative trait mechanistic models, in the form of to analyse genotype-by-environment loci (QTLs), and their epistasis that affect integro-differential equations (IDEs), with data are the genotype main effects and intra- or interspecifc interactions involved the IDE parameters representing some genotype-by-environment interaction in an ecological process. We derive key properties of the dynamics. Investiga- (GGE) model and the additive main a statistical framework to synthesize tion of treatment effects on the population effects and multiplicative interaction genetic mapping, an approach widely dynamics of cotton aphids is a central (AMMI) model. The GGE and AMMI mod- used in the field of quantitative genetics, issue in developing successful chemical els apply singular value decomposition and ecological experiments of species and biological controls for cotton aphids. (SVD) to the residuals of a specific linear competition through mathematical equa- Motivated by this important agricultural model, to decompose the genotype-by- tions. We quantify ecological competition problem, we propose a framework of environment interaction (GEI) into a sum between species using a system of analysis of variance (ANOVA) of IDEs. of multiplicative terms. 
However, SVD is ordinary differential equations (ODE) and The main challenge in estimating the IDE- highly sensitive to contamination and the further determine the genetic architecture based ANOVA model is that IDEs usually presence of outliers may result in mis- of species interactions via estimating and have no analytic solution, and repeat- interpretations and, in turn, lead to bad testing QTL genotype-dependent differ- edly solving IDEs numerically leads to a practical decisions. Since, as in many ences in ODE parameters that specify a high computational cost. We propose a other real life studies, the distribution of web of ecological interactions singly or penalized spline method in which spline these data is usually not normal due to jointly. The new framework is particularly functions are used to estimate the IDE the presence of outlying observations, equipped to characterize how genomes solutions and the penalty function is robust SVD methods have been sug- from different species interact with each defined by the IDEs. The estimated IDE gested to help overcome this handicap. solutions, as implicit functions of the Therefore, a new approach, where robust

Program & Abstracts 341 statistical methods replace the classic model, which is a linear mixed model of water located around Superfund site, ones to model and analyse GEI in the (LMM) with a specific layout. However, and aggregated exposure score have context of multi-location plant breeding when the normality assumption is vio- potential positive association with cancer trials, is presented. The performance of lated, as other likelihood-based models, incidence from 1986 to 2010 in Florida. the proposed robust extensions of the this model may provide biased results in Additionally, results indicate evidence of AMMI and GGE models is assessed the association analysis and greatly affect heterogeneity among cancer incidence through a Monte Carlo study where the classical R2. Therefore, a robust ver- rates. sion of the REML estimates for the LMM several contamination schemes are con- e-mail: [email protected] sidered. An application to a real data set to be used in this context is proposed, is also presented to illustrate the benefits as well as a robust version of a recently of the methodology. proposed R2. The performance of both 97. CONTRIBUTED PAPERS: classical and robust approaches for the e-mail: [email protected] Pooled Biospecimens estimation of H2 is thus evaluated via and Diagnostic simulation and an example of application Biomarkers A ROBUST MIXED LINEAR MODEL with a maize data set is presented. FOR HERITABILITY ESTIMATION e-mail: [email protected] IN PLANT STUDIES HIERARCHICAL GROUP TESTING FOR MULTIPLE INFECTIONS Vanda M. Lourenço*, Nova University CANCER INCIDENCE AND of Lisbon, Portugal Peijie Hou*, University of South Carolina SUPERFUND SITES IN FLORIDA Paulo C. Rodrigues, Federal University Joshua M. Tebbs, University of Emily Leary*, University of Missouri of Bahia, Brazil South Carolina Alexander Kirpich, University of Florida Miguel S. Fonseca, University of Lisbon, Christopher R. Bilder, University Portugal Uncontrolled hazardous waste sites have of Nebraska, Lincoln the potential to adversely impact human Ana M. Pires, University of Lisbon, Group testing, where individuals are health and damage or disrupt ecological Portugal tested initially in pools, is often used systems and the greater environment. to screen a large number of individu- Heritability (H2) refers to the extent of Decades have passed since the Super- als for rare diseases. Triggered by the how much a certain phenotype is geneti- fund law was enacted, allowing increased recent development of assays that detect cally determined. Knowledge of H2 is exposure time to these potential health multiple infections, large-scale screening crucial in plant studies to help perform hazards but also allowing advancement programs now involve testing individuals effective selection. Once a trait is known of analysis techniques. In this study, in pools for multiple infections simulta- to be high heritable, association studies statewide cancer incidence in Florida is neously. Tebbs, McMahan, and Bilder are performed so that the SNPs underly- analyzed to determine if differences in (2013, Biometrics) recently evaluated the ing those traits’ variation may be found. incidence exist in counties containing performance of a two-stage hierarchical Here, regression models are used to test Superfund sites compared to coun- algorithm used to screen for chlamydia for associations between phenotype and ties that do not. Spatial and non-spatial and gonorrhea as part of the Infertility candidate SNPs. 
SNP imputation ensures analyses models are utilized and results Prevention Project in the United States. that marker information is complete, compared. Preliminary results indicate In this article, we generalize this work to so both the coefficient of determination evidence that level of hazard, proportion accommodate a larger number of stages. (R2) and H2 are equivalent. One popular To derive the operating characteristics model used in these studies is the animal of higher-stage hierarchical algorithms with more than one infection, we view the

342 ENAR 2015 | Spring Meeting | March 15–18 pool decoding process as a finite-state over time. We apply the methods to the to provide unbiased estimates of the Markov chain. Taking this conceptualiza- Prostate Cancer Prevention Trial Risk Cal- biomarker combination rule and the tion enables us to derive closed-form culator (PCPTRC) using yearly data from sensitivity of the panel corresponding to expressions for the expected number of the Prostate Biopsy Collaborative Group specificity of 1-t on the receiver operating tests and classification accuracy rates in (PBCG) comprising 25,772 prostate biop- characteristic curve (ROC). The Copas & terms of transition probability matrices. sies from five international cohorts. We Corbett (2002) correction, for bias result- When disease probabilities are small, we evaluate the annual discrimination and ing from using the same data to derive offer compelling evidence that higher- calibration performance characteristics of the combination rule and estimate the stage algorithms can provide significant the multiple revision alternatives relative ROC, was also evaluated and a modified savings when screening a population for to static use of the PCPTRC. version was incorporated. An extensive multiple infections. We also demonstrate simulation study was conducted to e-mail: [email protected] that if prevalence estimation is an addi- evaluate finite sample performance and tional goal, two-stage algorithms provide propose guidelines for designing studies most of the benefits in terms of estimation EVALUATION OF MULTIPLE BIO- of this type. efficiency. MARKERS IN A TWO-STAGE GROUP e-mail: [email protected] e-mail: [email protected] SEQUENTIAL DESIGN WITH EARLY TERMINATION FOR FUTILITY FLEXIBLE AND ACCESSIBLE Nabihah Tayob*, University of Texas KEEPING RISK CALCULATORS SEMI-PARAMETRIC METHODS MD Anderson Cancer Center CURRENT FOR ANALYZING Kim-Anh Do, University of Texas POOLED BIOSPECIMENS Donna Pauler Ankerst*, Technical MD Anderson Cancer Center University Munich and University of Emily M. Mitchell*, Eunice Kennedy Health Science Center at San Antonio Ziding Feng, University of Texas Shriver National Institute of Child Health MD Anderson Cancer Center and Human Development, National Andreas Strobl, Technical University Institutes of Health Munich Motivated by an ongoing study to develop a screening test able to identify Robert H. Lyles, Emory University As clinical practice increasingly focuses patients with undiagnosed Sjogren’s on personalized medicine and more Amita K. Manatunga, Emory University Syndrome in a symptomatic population, contemporary large scale data from we propose methodology to combine Enrique F. Schisterman, Eunice research consortiums become publicly multiple biomarkers and evaluate their Kennedy Shriver National Institute available, so too must commonly-used performance in a two-stage group of Child Health and Human clinical risk prediction tools evolve. 
As a sequential design that proceeds as fol- Development, National Institutes particular example, the prostate cancer lows: biomarker data is collected from of Health clinical landscape has undergone several first stage samples; the biomarker panel changes over the past decade, and these Pooling involves the physical combi- is built and evaluated; if the panel meets likely necessitate that existing prostate nation of biospecimens into a single pre-specified performance criteria the cancer risk calculators be re-calibrated in composite sample prior to performing study continues to the second stage and order to remain accurate. In this talk we lab assays. While pooling has various the remaining samples are assayed. The outline revision combined with shrinkage benefits, researchers may be hesitant to design allows us to conserve valuable methods proposed by Steyerberg (2010) adopt a pooling strategy since appropri- specimens in the case of inadequate for periodically updating an existing risk ate statistical methods are still being biomarker performance. We propose calculator to account for serial changes developed, and existing methods for a nonparametric conditional bootstrap algorithm that uses all the study data

Program & Abstracts 343 individually-measured specimens may ized risk-based strategies. For example, DTI-MRI imaging modalities with paired not directly apply to pools. This is par- in breast cancer screening, advanced and unpaired designs, where a personal- ticularly true when a biomarker is treated imaging technologies have made it ized modality assignment is estimated to as the outcome in a regression model, possible to move away from “one-size- improve empirical AUC significantly com- since measurements are often positive fits-all” screening guidelines to targeted pared to a “one-size-fits-all” assignment. risk-based screening for those who and right-skewed. Current methods for e-mail: [email protected] analyzing this type of data are either are in need. Similarly, for neurological computationally expensive or limited disorders, multiple imaging modalities to specific pool types. In this study, we may be measured and their diagnostic ANALYSIS OF UNMATCHED propose a novel, flexible and accessible performances vary across subjects so POOLED CASE-CONTROL DATA estimation technique for a right-skewed that applying the most accurate modal- Neil J. Perkins*, Eunice Kennedy outcome subject to pooling, regard- ity to the patients who would benefit the Shriver National Institute of Child Health less of pool type. We use simulations to most requires personalized strategy. To and Human Development, National demonstrate the efficacy of our proposed address these needs, we propose novel Institutes of Health method compared with existing methods. machine learning methods to estimate Our simulations, along with analysis of personalized decision rules for medical Emily M. Mitchell, Eunice Kennedy data from the Collaborative Perinatal screening or diagnosis to maximize a Shriver National Institute of Child Project (CPP), demonstrate that when weighted combination of sensitivity and Health and Human Development, appropriate estimation techniques are specificity for subgroups of subjects. National Institutes of Health applied to strategically-formed pools, Specifically, we frame the optimization as Enrique F. Schisterman, Eunice Kennedy valid and efficient estimation can be a weighted classification problem where Shriver National Institute of Child Health achieved. This novel method contributes we use a weighted supportive vector and Human Development, National to the base of available statistical tools to machine to obtain the solutions. First, we Institutes of Health analyze pooled specimens and will help develop methods that can be applied to When studying new biomarkers, pooled empower researchers to more confidently estimate personalized diagnostic rules study designs, where individual bio- consider pooling as a potential study where competing modalities or screening specimens are combined and assayed, design. strategies are observed on each subject can minimize cost while maintaining (paired design). Second, we also develop e-mail: [email protected] statistical efficiency. Logistic regression a kernel-based method for studies where and maximum likelihood methods have not all subjects receive both modalities been developed to estimate the associa- ESTIMATING INDIVIDUALIZED (unpaired design). We study theoretical tion between a dichotomous outcome DIAGNOSTIC RULES IN THE properties including consistency and risk and a pooled exposure, where pools are ERA OF PERSONALIZED MEDICINE bound of the personalized diagnostic rule matched on disease status. Exploiting under the causal inference framework. 
Ying Liu*, Columbia University characteristics of the gamma distribu- We conduct extensive simulation studies tion, we have developed a more flexible Yuanjia Wang, Columbia University for both paired and unpaired design to model for analyzing data containing demonstrate that our proposed method Chaorui Huang, Cornell University pooled measurements, where pools can can significantly improve the empiri- Donglin Zeng, University of North be of mixed disease status. In studies cal area under the receiver operating Carolina, Chapel Hill employing pooling strategies, such a curve (AUC). Lastly, we analyze data flexible approach will be essential to Recent trend in disease screening collected from a brain imaging study of analyzing secondary and conditional calls for shifting from population-based Parkinson’s disease using FDG-PET and outcomes when pools are formed based screening strategies to more personal- on the primary outcome. We use simula-

344 ENAR 2015 | Spring Meeting | March 15–18 tion studies to assess consistency and posed LFSpro that is built on a Mendelian ANOVA model with constraints for efficiency of risk effect estimates by model and estimates TP53 mutation uniqueness into a classical linear regres- comparing maximum likelihood estimates probability through the Elston-Stewart sion model without constraints. Such of odds ratios using all pools to the more algorithm, incorporating de novo muta- reparameterization allows us to consider standard approaches that only allow for tion rates. With independent validation mixtures of g-priors on the regression matched pools. In a conditional analysis data from 765 families (19,530 individu- coefficients with a hyperprior to g. With a of pregnancy outcomes and pooled cyto- als in the United States [pediatric-onset special choice of prior specifications, we kine measurements, we demonstrate the sarcoma] and Australia [adult-onset propose an explicit closed-form expres- efficacy of our method when application sarcoma]), we compared estimations sion of Bayes factor, which is easy to of the standard approaches is precluded using LFSpro versus classic LFS and apply in practice, and also easy to teach by an insufficient number of matched Chompret clinical criteria. LFSpro out- in undergraduate statistics with empha- pools. performed Chompret and classic criteria size on Bayesian thinking. In particular, in the pediatric sarcoma cohort and was we present asymptotic properties of e-mail: [email protected] comparable to Chompret criteria in the Bayes factors with various choices of adult sarcoma cohort. Sensitivity analy- g for ANOVA models with divergence ESTIMATING TP53 MUTATION sis on de novo mutation rates showed dimensionality under different asymptotic CARRIER PROBABILITY IN that in both cohorts, a rate of 5e-4 gave scenarios. We show that Bayes factor FAMILIES WITH LI-FRAUMENI LFSpro the best prediction performance. under the mixture g-priors have very SYNDROME USING LFSpro We developed and validated a clinically similar asymptotic properties: they are accessible tool that incorporates de novo always consistent under the null and are Gang Peng*, University of Texas mutation rates to accurately estimate consistent under the alternative except MD Anderson Cancer Center TP53 mutation carriers. Family history of for a small region around the null model. Jasmina Bojadzieva, University cancer evolves, and LFSpro is sensitive Applications to two real-data sets are of Texas MD Anderson Cancer Center to mutation carriers in families newly pre- analyzed for illustrative purposes. senting in high-risk clinics and in families Mandy L. Ballinger, Peter MacCallum e-mail: [email protected] Cancer Centre, Melbourne, Australia followed for years. It is more broadly applicable than the clinical criteria. David M. Thomas, The Kinghorn A MULTIFUNCTIONAL BAYESIAN Cancer Centre and Garvan Institute, e-mail: [email protected] PROCEDURE FOR DETECTING Sydney, Australia COPY NUMBER VARIATIONS FROM Louise C. Strong, University of Texas 98. CONTRIBUTED PAPERS: SEQUENCING READ DEPTHS MD Anderson Cancer Center Multiple Testing Yu-Chung Wei*, U.S. 
Food and Wenyi Wang, University of Texas and Variable Selection Drug Administration and National MD Anderson Cancer Center Chiao Tung University, Taiwan Given the cancer spectrum and onset BAYES FACTOR APPROACHES Guan-Hua Huang, National Chiao in Li-Fraumeni syndrome (LFS) and FOR HYPOTHESIS TESTING IN Tung University, Taiwan limitations of the clinical criteria, accurate ANOVA MODELS Copy number variations (CNVs) are identification of candidates for prospec- Min Wang*, Michigan Technological genomic structural mutations with abnor- tive TP53 mutation testing has been University mal gene fragment copies. Read depths difficult. A more efficient prediction tool signal mirrors the variants directly from is needed for LFS identification, man- We examine the issue of hypothesis the next generation sequencing data. agement and screening, which should testing in analysis-of-variance (ANOVA) ultimately decrease mortality. We pro- designs. We first reparameterize the

Program & Abstracts 345 Some tools have been published to pre- Traditional eQTL mapping is to associ- regression coefficients and for select- dict CNVs by depths, but most of them ate one transcript with a single marker ing variables to account for the linear just apply to a specific data type. Provid- at a time, thereby limiting our inference constraints. We also propose a method ing a multifunctional detection algorithm about a complete picture of the genetic to obtain de-biased estimates that are that can easily make use of a variety of architecture of gene expression. In this asymptotically unbiased and derive its data types is difficult but valuable. We talk, I present an innovative applica- joint asymptotic distribution. The results develop a multifunctional COpy Number tion of variable selection approaches to provide valid confidence intervals of the variation detection tool by a BaYesian systematically detect main effects and regression coefficients and can be used procedure, CONY, which adopts an effi- interaction effects among all possible loci to obtain the p-values. Simulation results cient reversible jump Markov chain Monte on differentiation and function of gene show that variable selection based on the Carlo inference algorithm for analyzing expression. Forward-selection-based pro- confidence intervals or the p-values can sequencing read depths. CONY is suit- cedures were particularly implemented improve those from Lasso regression. able for reads from both whole genome to tackle complex covariance structures Application to a gut microbiome data and targeted exome sequencing. Addi- of gene-gene interactions. We reana- set has identified three bacterial genera tionally, CONY can be applied not only to lyzed a published genetic and genomic that are associated with the body mass an individual for estimating the absolute data collected in a mapping population index, which can explain about 24% of number of copies but also to case-control of Caenorhabditis elegans, gaining new the variance. discoveries on the genetic origin of gene samples for detecting patient specific e-mail: [email protected] variations. We demonstrate this prag- expression differentiation, which could matic approach with targeted region not be detected by a traditional one- exome sequencing data from National locus/one-transcript analysis approach. TAKING INTO ACCOUNT Taiwan University Hospital. We also e-mail: [email protected] OVERREPRESENTED PATTERNS IN evaluate the performance of CONY and GENE EXPRESSION ANALYSIS compare it with competing approaches Megan Orr*, North Dakota using both simulations and real data from STATISTICAL INFERENCE FOR State University the 1000 Genomes Project. HIGH DIMENSIONAL LINEAR Ekua Bentil, North Dakota e-mail: [email protected] REGRESSION WITH LINEAR CONSTRAINTS AND APPLICATION State University TO MICROBIOME STUDY Gene expression technologies allow INFERRING THE GLOBAL GENETIC Pixu Shi*, University of Pennsylvania expression levels to be compared across ARCHITECTURE OF GENE treatments for thousands of genes Anru Zhang, University of Pennsylvania TRANSCRIPTS FROM ULTRAHIGH- simultaneously. Many methods exist DIMENSIONAL MOLECULAR DATA Hongzhe Li, University of Pennsylvania for identifying differentially expressed genes while controlling multiple testing Kirk Gosik*, The Pennsylvania We consider the statistical inference error. 
However, most methods do not State University problem for high-dimensional linear take into account the overrepresentation regression models with linear con- Rongling Wu, The Pennsylvania of observed patterns (compared to the straints on the regression coefficients. State University expected patterns under the null hypoth- Such models include the log-contrast Knowledge about how changes in gene esis) across groups if such patterns model for compositional covariates as expression are encoded by expression exist. An example of an overrepresented a special case. We develop a penalized quantitative trait loci (eQTLs) is a key pattern includes a large majority of estimation procedure for estimating the to construct the genotype-phenotype genes being up-regulated compared to map for complex traits or diseases. down-regulated in a two-sample study.

346 ENAR 2015 | Spring Meeting | March 15–18 Another example includes a high propor- an implementation that scales well for information: the adaptive lasso (Zou 2006) tion of genes exhibiting monotonicity in high-dimensional data. We provide some and the weighted false discovery rate the sample mean expression levels as theoretical results, compare with existing (Genovese et al 2006). Using simulations treatment dose levels increase in a dose- frequentist and Bayesian nonparametric and real data analysis, we compared and response experiment with more than two testing methods, and describe an appli- contrasted our methods to a benchmark treatments. We propose new methods cation to breast cancer methylation data best linear unbiased prediction (BLUP) that take into account these overrepre- from the Cancer Genome Atlas. method that did not consider prior biologi- sented patterns and identify differentially cal information (Speed and Balding 2014). e-mail: [email protected] expressed genes while controlling false e-mail: [email protected] discovery rate. The proposed meth- ods are compared to traditional gene INCORPORATING ENCODE expression analysis procedures through INFORMATION INTO SNP-BASED 99. CONTRIBUTED PAPERS: simulation studies. Real gene expression PHENOTYPE PREDICTION Parameter Estimation data sets are analyzed to illustrate the Yue-Ming Chen*, University of Texas in Hierarchical and usefulness of the proposed methods. School of Public Health, Houston Non Linear Models e-mail: [email protected] Peng Wei, University of Texas School of Public Health, Houston A HIERARCHICAL BAYESIAN BAYESIAN SCREENING FOR Recent studies show that most genome- METHOD FOR WELL-MIXED GROUP DIFFERENCES IN wide association studies (GWAS) identified AND TWO-ZONE MODELS IN METHYLATION ARRAY DATA single nucleotide polymorphisms (SNPs) INDUSTRIAL HYGIENE Eric F. Lock*, University of Minnesota fall outside of the protein-coding regions Xiaoyue Zhao*, University of Minnesota (Hindorff et al 2009) and these trait-asso- In modern biomedical research, it is Susan Arnold, University of Minnesota ciated SNPs may play a role in regulatory common to screen for differences networks and tend to be highly corre- Dipankar Bandyopadhyay, University between groups in many variables that lated, i.e., in high linkage disequilibrium of Minnesota are measured using the same technol- (LD), with the functional SNPs (Maurano Gurumurthy Ramachandran, University ogy. Motivated by DNA methylation data, et al 2012, Schaub et al 2012). Tools for of Minnesota this talk focuses on screening for equality annotating functional variation in human of group distributions for many variables Sudipto Banerjee, University genome, such as RegulomeDB (Boyle with shared distributional features such of California, Los Angeles et al 2012), integrate enriched regulatory as common support, common modes information from multiple resources includ- In industrial hygiene and occupational and common patterns of skewness. ing the Encyclopedia of DNA Elements exposure assessment, a worker’s We propose a Bayesian nonparametric (The ENCODE Project Consortium 2012). 
exposure to chemical, physical and testing methodology, which improves Using ENCODE information, we annotated biological agents is customarily modeled performance by borrowing information all SNPs by assigning scores which reflect using deterministic physical models that across the different variables and groups their regulatory function in the human study exposures based on the distance through shared kernels and a common genome. Based on these functional to a contaminant source. When field probability of group differences. The annotations, we constructed a weighted or experimental observations are avail- inclusion of shared kernels in a finite genetic prediction framework for complex able, non-linear stochastic regression mixture, with Dirichlet priors on the dif- traits. Two frameworks were considered as models have been employed for filtering ferent weight vectors, leads to a simple prototypes for incorporating the ENCODE the noise and estimating the physical framework for testing and we describe parameters. This, however, has been shown to be inefficient. Here, we develop

Program & Abstracts 347 a likelihood-based model using a discrete BIAS AND CONFIDENCE INTERVAL ROBUST MIXED-EFFECTS MODEL version of the underlying differential CORRECTION IN FOUR PARAMETER FOR CLUSTERED FAILURE TIME equations. We cast this within a Bayes- LOGISTIC MODELS DATA: APPLICATION TO HUNTING- ian dynamic linear model framework and TON’S DISEASE EVENT MEASURES Bronlyn Wassink*, Michigan State use posterior predictive measures to University Tanya P. Garcia*, Texas A&M University predict future exposure concentrations and estimate the underlying physical Tapabrata Maiti, Michigan State Yanyuan Ma, University of South Carolina University parameters. We show that this method Yuanjia Wang, Columbia University excels over simple non-linear regression Using a four parameter logistic model Karen Marder, Columbia University methods by providing more accurate is a commonly used method to model predictions, while it offers more reliable dose-response data. The four parameters An important goal in clinical and statisti- physical parameter estimates than Bayes- are the minimum expected response, cal research is describing the distribution ian melding approaches using Gaussian the maximum expected response, the for clustered failure times, which have a processes. We implement our method EC50, and the Hill parameter. The EC50, natural intra-class dependency and are entirely within the RJAGS software a method of quantifying a drug’s potency, subject to censoring. We propose to han- environment using two well-established refers to the dose that produces an dle these inherent challenges with a novel physical models: (i) a well-mixed model, expected response halfway between approach that does not impose a pro- and (ii) a two-zone model. baseline and the maximal expected portional hazards assumption nor treat the dependency with a random effect e-mail: [email protected] response, and the Hill parameter is a measure of the steepness of the expected having a prespecified distribution. Rather, response when the dose is near the using a logit transformation, we relate PARAMETER ESTIMATION: EC50. In the small sample setting, such the distribution for clustered failure times A BAYESIAN INFERENCE APPROACH as in a pilot study, the maximum likeli- to covariates and a random intercept. To avoid any misspeciffication issues, the Romarie Morales*, Arizona State hood estimates of the four parameters covariates are modeled using unknown University are biased and not near-normally distrib- uted. Therefore, asymptotically-based functional forms and the random inter- We focus on improving the current methods for estimating the parameter cept is kept distribution-free and allowed methodology for estimating transmission biases and confidence intervals do not to depend on covariates. Over a range parameters by applying the Bayesian sta- always produce reliable results. In this of time points, the model is shown to be tistical framework to a probabilistic model talk, we investigate different methods for reminiscent of an additive logistic mixed of disease transmission. We then general- computing bias-corrected parameters, effect model, from which we can handle ize this formulation to any disease. The and propose the use of Beta and Gamma censoring via pseudo-value regression Bayesian method takes into account the distributions to improve confidence and apply semiparametric techniques to stochasticity of disease transmission intervals. Simulation results on the cover- factor out the unknown random effect. 
and provides more robust parameter age probability and interval widths show The resulting estimators are shown to estimates. Increasing estimation accu- the improved performance of the pro- be simple, consistent, and robust to any racy through adoption of the Bayesian posed method over the existing classical nuances of the random effect distribution. framework will equip policymakers with procedures. We illustrate the method’s robustness better tools for mitigating the effects of an and competitiveness to existing methods e-mail: [email protected] epidemic. in a simulation study that involves diferent random-intercept distributions and dif- e-mail: [email protected] ferent dependency structures between

348 ENAR 2015 | Spring Meeting | March 15–18 the random intercept and model covari- found that using stacked survival models 100. New Statistical ates. Lastly, we apply our method to the can have potential to provide robust Methods in the Cooperative Huntington’s Observational estimation at little loss of precision. We Environmental Research Trial data, to provide new also illustrate the approaches using lung Health Sciences insights into differences between motor transplantation data. and cognitive impairment event times in e-mail: [email protected] genetically predisposed patients. NEW STATISTICAL MODELS TO DETECT VULNERABLE PRENA- e-mail: [email protected] THE CoGAUSSIAN DISTRIBUTION: TAL WINDOW TO CARCINOGENIC A MODEL FOR RIGHT SKEWED DATA POLYCYCLIC AROMATIC HYDRO- STACKED SURVIVAL MODELS FOR CARBONS ON FETAL GROWTH Govind S. Mudholkar, University CENSORED QUANTILE REGRESSION of Rochester Lu Wang*, University of Michigan Kyle Rudser*, University of Minnesota Ziji Yu*, University of Rochester Prenatal exposure to carcinogenic poly- Andrew Wey, University of Hawaii cyclic aromatic hydrocarbons (c-PAHs) Saria S. Awadalla, University of Chicago through maternal inhalation induces John Connett, University of Minnesota Scientific data are often nonnegative, higher risk for a wide range of fetotoxic Inference on quantiles of survival is an right skewed and unimodal. For such effects. However, the most health-relevant attractive alternative to using the hazard data, CoGaussian distribution, the dose function from chronic gestational ratio for comparing groups that has a R-symmetric Gaussian twin, with its exposure remains unclear. Whether there meaningful interpretation based on units mode as the centrality parameter, is a is a gestational window during which the of time in the censored data setting. basic model. In this paper, the essentials, human embryo/fetus is particularly vul- Censored quantile regression allows namely the concept of R-symmetry, the nerable to PAHs has not been examined contrasts across groups while adjust- roles of the mode and harmonic vari- thoroughly. We consider a longitudinal ing for other factors. Current censored ance as, respectively, the centrality and semiparametric-mixed effect model to quantile regression methods often rely dispersion parameters of the CoGaussian characterize the individual prenatal PAH upon the fairly strong assumptions of distribution, are introduced. The pivotal exposure trajectory, where a nonpara- unconditionally independent censoring role of the CoGaussian family in the class metric cyclic smooth function plus a or linearity in all quantiles. We examine of R-symmetric distributions and the linear function are used to model the time the use of stacked survival models in a estimation, testing and characterization effect and random effects are used to distribution-free framework for adjusted properties are discussed. The similarities account for the within-subject correlation. contrasts of quantiles of survival. By between the Gaussian and CoGaussian We propose a penalized least squares minimizing prediction error, stacking esti- distribution, namely the G-CoG analo- approach to estimate regression coef- mates optimally weighted combinations gies, are summarized. ficients and the nonparametric function of survival models that can span paramet- e-mail: [email protected] of time. The smoothing parameter and ric, semi-parametric, and non-parametric variance components are selected using models. 
As such, the low variance of the generalized cross-validation criteria. approximately correct parametric models The estimated subject-specific trajectory can be exploited while maintaining the of prenatal exposure is linked to the birth robustness of nonparametric models. outcomes through a set of functional We analyze the performance on estima- linear models, where the coefficient of log tion and inference via simulations and PAH exposure is a fully nonparametric

Program & Abstracts 349 function of gestational age. This allows the effect of PAH exposure on each birth out- come to vary at different gestational ages, and the window associated with significant adverse effect is identified as a vulnerable prenatal window to PAHs on fetal growth. e-mail: [email protected] EVALUATING ALTERATIONS IN 101. Novel Phase II and III REGRESSION COEFFICIENTS Clinical Trial Designs DIMENSION REDUCTION FOR SPA- DIRECTED BY TOXICANT MIXTURES for Cancer Research TIALLY MISALIGNED MULTIVARIATE Peter X. K. Song*, University of Michigan that Incorporate AIR POLLUTION DATA Biomarkers and Non- Shujie Ma, University of California, standard Endpoints Adam Szpiro*, University of Washington Riverside Emerging monitoring technologies pro- We propose a new linear mixed effects vide high-dimensional characterizations of model for longitudinal data that enables NOVEL PHASE II AND III DESIGNS air pollution, promising a more nuanced us to study dynamics of interest in the FOR ONCOLOGY CLINICAL TRIALS, understanding of which pollutants/mix- process of somatic growth. This new WITH A FOCUS ON BIOMARKER tures are responsible for health effects methodology is useful to assess if and VALIDATION observed in single pollutant epidemiol- how the rate of growth may be intervened Daniel J. Sargent*, Mayo Clinic ogy studies. There are two interrelated by exposure variables such as mixtures Increasing scientific knowledge is creat- challenges (i) interpreting the association of toxicants (e.g. PBA and phthalates). ing both substantial opportunities and parameters requires dimension reduction Interestingly, most of such interveners are challenges in oncology drug develop- and (ii) spatial misalignment requires pre- of small size in their effects, and the tra- ment. As diseases are sub-stratified into diction modeling. We propose a paradigm ditional statistical method fails to detect often biomarker-based groups, usual for spatially predictive dimension reduc- their statistical significance. Our new paradigms for phase II and III disease tion, exemplified by predictive sparse modeling strategy incorporates a certain may no longer apply. Enrichment principal component analysis. We seek type of principal component in the forma- designs are appropriate when prelimi- sparse principal component loadings that tion of regression coefficients, termed as nary evidence suggest that patients with/ explain a large proportion of the variance index coefficients in that low-effect toxi- without that marker profile do not benefit in the monitoring data, while ensuring the cants are combined into possibly strong from treatments in question; however corresponding low-dimensional represen- toxicant groups. Statistical estimation and this may leave questions unanswered tations are predictable at subject locations. inference in such model is challenging (e.g. Herceptin and breast cancer). An We apply the proposed method to long- because it contains nonlinear interac- unselected design is optimal where term multi-pollutant data from regulators tions between the toxicant groups and preliminary evidence regarding treatment monitors across the United States and covariates of interest (e.g. age or time). benefit and assay reproducibility is uncer- utilize the predicted low-dimensional The proposed models and methods are tain. 
Adaptive analysis designs allow for exposures to refine our understanding of a motivated and illustrated by an analysis pre-specified marker defined subgroup previously observed association between of child growth data to evaluate altera- analyses of data from a RCT. We discuss hypertension and exposure to fine particu- tions in growth rates incurred by mother’s features of these various novel design late matter. exposures to endocrine disrupting com- strategies in the context of real trials. e-mail: [email protected] pounds during pregnancy. e-mail: [email protected] e-mail: [email protected]

350 ENAR 2015 | Spring Meeting | March 15–18 STRATIFIED SINGLE ARM PHASE LUNG-MAP: A PHASE II/III Network (NCTN) in study leadership and 2 DESIGN FOR FINDING A BIO- BIOMARKER-DRIVEN MASTER PRO- trial participation. The Lung-MAP study MARKER GROUP THAT BENEFITS TOCOL FOR SECOND LINE THERAPY activates[activated] in June 2014 with the FROM TREATMENT OF SQUAMOUS CELL LUNG CANCER intention to enroll 1,000 patients per year onto the therapeutic sub-studies. In this Irina Ostrovnaya*, Memorial Sloan Ket- Mary W. Redman*, Fred Hutchinson talk I will describe the statistical design tering Cancer Center Cancer Research Center of the Lung-MAP study and discuss the Emily Zabor, Memorial Sloan Kettering Lung-MAP is a large scale, screening/ current status of the study. Cancer Center clinical registration protocol that genomi- e-mail: [email protected] cally screens patients with advanced In phase II studies of cancer treatments, stage lung squamous cell cancer moving there is growing interest in investigating to second-line therapy, and uses the whether a treatment is effective among RANDOMIZED PHASE II DESIGN screening results to direct each patient the general population of patients, or only TO STUDY THERAPIES DESIGNED to a therapeutic phase II/III sub-study. among a subgroup of patients defined TO CONTROL GROWTH OF BRAIN Based on the results of the genomic by a biomarker value or genetic muta- METASTASES IN CANCER PATIENTS analysis, patients will either be assigned tion. While there are many such statistical to one of the biomarker-driven sub- Sujata M. Patil*, Memorial Sloan- “enrichment” designs developed for the studies or to the “non-match” sub-study Kettering Cancer Center randomized clinical trials, there are very for patients with none of the eligibility few available for single arm studies when The presence of brain metastases in biomarkers, and subsequently random- current treatment is compared to histori- cancer patients often indicates poor ized between an investigational therapy cal controls, and these designs are not prognosis. Additionally, the presence or standard of care. The biomarker- easily applied. Here we propose a simple of brain metastases can directly impact driven sub-studies are designed around two-stage single arm stratified design a patient’s quality of life. Controlling a genotypically-defined alteration in the for binary endpoint similar to Simon two brain disease is important and has been tumor and a drug that targets it. The stage design that allows for discontinu- one current focus of clinical trials and non-match study is designed around an ation of the marker negative subgroup retrospective reviews [Preusser et al, investigational agent with the potential in the first stage if there is not enough Eur J Cancer 2012; Lin, ecancer 2013]. for efficacy in a broader/less selected evidence of treatment efficacy in that However, there are challenges in con- population. Each Lung-MAP sub-study subgroup. The software for calculating ducting such studies and interpretations functions autonomously and will open the sample size for the proposed design of results are not uniform. For instance, and close independently of the other is publically available. This design often patients may progress extracrainially sub-studies. 
When an endpoint for a sub- requires fewer patients than two parallel before progression in the brain can be study is met, that drug-biomarker specific Simon two stage designs in marker posi- assessed, thereby creating a compet- combination may proceed to FDA for tive and negative patients independently. ing risks analytic setting. Assessing true approval review of the new drug with its brain recurrence versus radionecrosis e-mail: [email protected] matching companion diagnostic. When and the use of consistent criteria to an endpoint is not met, that sub-study assess brain recurrence have also been will be closed and another modular sub- methodological issues. Through the use study of a different agent will be initiated. of simulations, we describe how these While organized by SWOG, Lung-MAP is issues affect power and sample size in the result of unprecedented public-private Phase II studies and propose a design collaboration between government, non- that reduces their impact. profit, and for-profit organizations and e-mail: [email protected] involves the entire National Clinical Trials

Program & Abstracts 351 102. Novel Statistical STATISTICAL ANALYSIS OF A CASE STUDY OF RNA-Seq DATA Methods to Decipher DIFFERENTIAL ALTERNATIVE IN BREAST CANCER PATIENTS SPLICING USING RNA-Seq DATA Gene Regulation Wei Sun*, University of North Carolina, Using Sequence Data Mingyao Li*, University of Pennsylvania Chapel Hill Yu Hu, University of Pennsylvania We carry out a systematic study of RNA- seq data and its genetic architecture ON THE DETECTION OF NONLINEAR Cheng Jia, University of Pennsylvania AND INTERACTIVE RELATIONSHIPS in 550 breast cancer patients from The RNA sequencing (RNA-seq) allows an IN GENOMIC DATA Cancer Genome Atlas project. eQTL unbiased survey of the entire transcrip- mapping of gene expression (measured Bo Jiang, Harvard University tome in a high-throughput manner. It has by RNA-seq in tumor tissue) vs. germline Jun Liu*, Harvard University rapidly replaced microarrays as the major genotype and tumor copy number aber- platform for transcriptomics studies. A I will discuss a few recent results from my rations show that both types of genetic major application of RNA-seq is to detect group aiming at the detection of non- variants have substantial influence on differential alternative splicing (DAS), or lineardependence and interactive effects gene expression. We further assess such differential transcript usage, across exper- of several random variables. These associations after deconvoluting gene imental conditions. Differential analysis at approaches were developed by taking a expression from tumor cells and normal the transcript level is of great biological Bayesian view on the inverse-slicing idea cells (e.g. stromal cells within the tumor interest due to its direct relevance to pro- first proposed by Ker-Chau Li for dimen- tissue), and discuss possible scenarios tein function and disease pathogenesis. sion reduction. To detect whether a new to use such eQTL results to obtain further However, DAS analysis using RNA-seq covariate X can influence the continu- biological insights. data is challenging because of the dif- ous response Y conditional on a set of e-mail: [email protected] ficulty of quantifying alternative splicing selected covariates Z, we assume that and various biases present in RNA-seq there is a latent slicing variable Y which data. In this talk, I will present several indicates how Y can be sliced into a few UNIT-FREE AND ROBUST DETECTION statistical issues related to the analysis of levels. We then model the conditional OF DIFFERENTIAL EXPRESSION DAS. I will discuss methods for detecting distribution of [X|Z] at each level of Y FROM RNA-Seq DATA DAS for both paired and unpaired data, and compare with the overall conditional and compare the performance of exon- Hui Jiang*, University of Michigan distribution [X|Z] unconditional of Y. We based and gene-based tests of DAS. I Ultra high-throughput sequencing of can also provide a prior on the latent slic- will show simulation results as well as transcriptomes (RNA-Seq) has recently ing variable and average over all posible some examples from real transcriptomics become one of the most widely used slicing schemes weighted by their prior studies. methods for quantifying gene expres- probabilities. We will show how these sion levels due to its decreasing cost, methods are applied to bioinformatics e-mail: [email protected] high accuracy and wide dynamic range problems such as gene-set enrichment for detection. 
However, the nature of analysis, transcription regulation analysis, RNA-Seq makes it nearly impossible eQTL studies, and others. to provide absolute measurements of e-mail: [email protected] transcript concentrations. Several units or data summarization methods for tran- script quantification have been proposed to account for differences in transcript lengths and sequencing depths across genes and samples. However, none

352 ENAR 2015 | Spring Meeting | March 15–18 of these methods can reliably detect specimens, with current technologies This talk will present the design and differential expression directly without supporting the quantification of 10’s implementation of a data-clustering further proper normalization. We propose to 100’s of individual features in every algorithm, FLOCK, for computational a statistical model for joint detection of cell. These technologies are being identification of cell populations from differential expression and data normal- used to study normal and abnormal cell multi-dimensional flow cytometry data. ization. Our method is independent of activation, differentiation, and func- Then we will discuss improvements to the unit in which gene expression levels tion, to diagnose leukemia, lymphoma, FLOCK that allows for the analysis of are summarized. We also introduce an and myeloproliferative disorder, and to higher dimensionality data such as those efficient algorithm for model fitting. Due to identify novel biomarkers of therapeutic generated by mass and imaging cytom- the L0-penalized likelihood used by our response and treatment outcome. In this etry. The methods will be presented in the model, it is able to reliably normalize the presentation we will review each of these context of their applications in biomedical data and detect differential expression in technologies, summarize some of the basic and translational research. Both the some cases when more than half of the computational challenges associated impact and the limitations of the methods genes are differentially expressed in an with the high dimensionality, biological will be discussed. Besides FLOCK, sev- asymmetric manner. The robustness of variability and technical artifacts inher- eral other computational methods have our proposed approach is demonstrated ent in the resulting data, and describe been developed and shown to provide with simulations. some of the computational and statistical excellent performance when compared methods that have been developed to with manual analysis. We will summa- e-mail: [email protected] process, analyze and interpret cytometry rize the basic principles behind these data. We will also present the results from approaches and review the existing infra- 103. Flow Cytometry: the FlowCAP challenges (http://flowcap. structure support for computational single Data Collection and flowsite.org) that were developed to com- cell data analysis. Through transforming Statistical Analysis pare the performance of these methods. the methods and their accessories into workflow steps, we are able to integrate e-mail: [email protected] and compare multiple methods in the FLOW, MASS AND IMAGING CYTOM- same running environment for data- ETRY FOR SINGLE CELL ANALYSIS: COMPUTATIONAL IDENTIFICA- driven selection and optimization of the A FERTILE FIELD FOR BIOSTATIS- TION OF CELL POPULATIONS computational methods. We will report TICS RESEARCH FROM CYTOMETRY DATA: progress in the development of Flow- Richard H. Scheuermann*, J. Craig METHODS, APPLICATIONS, AND Gate; a Scientific Gateway that combines Venter Institute and University INFRASTRUCTURE graphical user interfaces, data analytical platforms and workflow engines including of California, San Diego Yu Qian*, J. Craig Venter Institute GenePattern and bioKepler, and parallel Yu Qian, J. Craig Venter Institute Hyunsoo Kim, J. 
Craig Venter Institute computing support for processing and Chiaowen Hsiao, University Shweta Purawat, University of California, analyzing cytometry single cell data in of Maryland, College Park San Diego an extensible, scalable, and reproducible way. Monnie McGee, Southern Methodist Rick Stanton, J. Craig Venter Institute University email: [email protected] Ilkay Altintas, University of California, Flow, mass and imaging cytometry are San Diego used to quantitatively assess the phe- Richard H. Scheuermann, J. Craig notypic characteristics of large numbers Venter Institute of single cells in complex biological

Program & Abstracts 353 MAPPING CELL POPULATIONS significantly different from a comparison 104. Statistical Methods IN FLOW CYTOMETRY DATA FOR population, thereby defining a new cel- In Chronic Kidney CROSS-SAMPLE COMPARISON lular phenotype by providing an objective Disease USING THE FRIEDMAN-RAFSKY TEST measure for when a cell population has become functionally distinct. Through Chiaowen Joyce Hsiao*, University comparing cell populations, FlowMap- JOINT MODELING OF KIDNEY FUNC- of Maryland, College Park FR can also detect situations in which TION DECLINE, END STAGE KIDNEY Mengya Liu, Southern Methodist inappropriate splitting or merging of cell DISEASE (ESRD), AND DEATH WITH University populations has occurred during gat- SPECIAL CONSIDERATION OF COM- Rick Stanton, J. Craig Venter Institute ing procedures. We have implemented PETING RISKS FlowMap-FR as a stand-alone R/Biocon- Monnie McGee, Southern Methodist Dawei Xie*, University of Pennsylvania ductor package that is publicly available University to the community. Wensheng Guo, University Yu Qian, J. Craig Venter Institute of Pennsylvania e-mail: [email protected] Richard H. Scheuermann, J. Craig Wei Yang, Merrill Lynch Venter Institute and University Qiang Pan, University of Pennsylvania of California, San Diego A NOVEL APPROACH TO MODELING IMMUNOLOGY DATA DERIVED FROM Methods have been developed previously This talk presents FlowMap-FR, a novel FLOW CYTOMETRY to jointly model a repeatedly measured method for comparative analysis of cell continuous variable such as estimated populations across flow cytometry (FCM) Jacob A. Turner*, Baylor Institute glomerular filtration rate (eGFR) and a experiment samples. FlowMap-FR is for Immunology Research time-to-event outcome (such as ESRD), based on the Friedman-Rafsky (FR) non- This presentation will illustrate some or jointly model two time-to-event out- parametric test statistic, which is used to of the distributional properties the comes that represent competing risks measure the equivalence of multivariate variables from flow cytometry (FCM) (such as ESRD and death). We propose distributions. As applied to FCM data studies exhibit. A novel modeling strategy to jointly model repeated measures of by FlowMap-FR, the FR test objectively denoted Layered Dirichlet Modeling eGFR, ESRD and death to address the quantifies the similarity between cell (LDM) will be introduced to model pro- correlation between repeated measures populations based on their shapes, sizes, portions derived from FCM data. The of eGFR and ESRD/death and the com- and positions in the high-dimensional LDM strategy takes into account that the peting risks between ESRD and death. feature space. We will present the variables are compositional and have Specifically, repeated measures of eGFR method and discuss the performance of a hierarchical structure that imposes are modelled via a linear mixed effects FlowMap-FR in mapping cell populations correlation between the variables. The model and the times to ESRD and death under the different kinds of biological properties of the LDM testing procedures accelerated failure time frailty models. We and technical sample variations that are are explored. A data-driven tree finding further assume the linear and the frailty commonly observed in FCM data. Our algorithm is provided to find a hierarchy models share the same mixed effects. 
evaluation results show that FlowMap-FR of relationships among FCM subpopula- An EM algorithm is used to calculate is able to effectively identify equivalent tion when the hierarchy is missing or the maximum likelihood estimates of the cell populations across samples under unknown. The motivation of LDM comes parameters. The method will be illus- scenarios of proportion changes and from a generalization of the Dirichlet trated using data from the Chronic Renal modest distribution shifts. As a statistical distribution known as the Nested Dirichlet Insufficiency Cohort (CRIC) study. test, FlowMap-FR thus can be used to distribution. e-mail: [email protected] objectively determine when the expres- e-mail: [email protected] sion of a cellular marker has become

354 ENAR 2015 | Spring Meeting | March 15–18 JOINT MULTIPLE IMPUTATION FOR MODELING THE EFFECT OF BLOOD time-varying confounders, including urine LONGITUDINAL OUTCOMES AND PRESSURE ON DISEASE PROGRES- protein, creatinine, and hemoglobin, in CLINICAL EVENTS WHICH TRUN- SION IN CHRONIC KIDNEY DISEASE estimating the effect of blood pressure CATE LONGITUDINAL FOLLOW-UP USING MULTISTATE MARGINAL on the probability of transitioning among STRUCTURAL MODELS states. We apply our model to data from Bo Hu*, Cleveland Clinic the Chronic Renal Insufficiency Cohort, a Alisa J. Stephens*, University Liang Li, University of Texas multisite observational study of patients of Pennsylvania MD Anderson Cancer Center with CKD. Wei Peter Yang, University Tom Greene, University of Utah e-mail: [email protected] of Pennsylvania Longitudinal cohort studies often collect Marshall M. Joffe, University both repeated measurements of longi- of Pennsylvania DYNAMIC PREDICTION OF CLINI- tudinal outcomes and times to clinical CAL EVENTS USING LONGITUDINAL events whose occurrence precludes Tom H. Greene, University of Utah BIOMARKERS IN A COHORT STUDY further longitudinal measurements. Both In patients with chronic kidney disease, OF CHRONIC RENAL DISEASE types of data are usually subject to non- clinical interest often centers on deter- ignorable missingness due to informative Liang Li*, University of Texas mining treatments and exposures that dropout as well as intermittent missed MD Anderson Cancer Center are causally related to progression. visits. Although joint modeling of the Analyses of longitudinal clinical data in In longitudinal studies, prognostic clinical events and the longitudinal data this population are often complicated by biomarkers are often measured longitu- can be used to provide valid statistical clinical events, such as end stage renal dinally. It is of both scientific and clinical inference for target estimands in certain disease (ESRD) or death, and time- interest to predict the risk of clinical contexts, the application of joint models dependent confounding, where patient events, such as disease progression or in medical literature is currently rather factors that are affected by past expo- death, using these longitudinal biomark- restricted due to the complexity of the sures are predictive of later exposures ers and possibly other time-dependent joint models and the intensive computa- and outcomes. We developed multistate and time-independent information. This tion involved. We propose a multiple marginal structural models to assess the problem can be done in two ways. One imputation (MI) approach to jointly impute effect of time-varying systolic blood pres- is to build a joint model of longitudinal missing data of both the longitudinal and sure on disease progression in subjects data and clinical events data, and draw clinical event outcomes. With complete with CKD. The multistate nature of our predictions from the fitted model using imputed datasets, analysts are then able models allows us to consider jointly as the posterior distributions. The other to use simple and transparent statistical outcomes disease progression char- approach is the landmark dynamic methods and standard statistical soft- acterized by changes in the estimated prediction model, which is a system of ware to perform various analyses without Glomerular Filtration Rate (eGFR), the prediction models that evolve with the dealing with the complications of missing onset of (ESRD), and death, and thereby landmark times. 
We review the pros data and joint modeling. We show that avoid unnatural assumptions of death or and cons of the two approaches in the the proposed MI approach is flexible and ESRD as an informative censoring event context of the chronic renal disease stud- easy to implement in practice. Numerical after which disease progression can ies, and present our research using the results are also provided to demonstrate occur. Under a Markov assumption, we landmark approach. One drawback of the its performance. model the causal effect of systolic blood current landmark methodology is that the e-mail: [email protected] pressure on the probability of transition- predictors are difficult to define when the ing into one of five disease states given the current state. We use inverse prob- ability weights to account for potential

Program & Abstracts 355 longitudinal data are measured at irregu- the images into a group of data-driven realization from some underlying point larly spaced time points. We present a subsets, called “principal patterns”. process. In contrast, our motivating data solution to this problem by an augmenta- The representation of the expression are lesion locations from a cohort of tion of the landmark model. We apply patterns as learned principal patterns Multiple Sclerosis patients with patient our proposed methodology to the African allows for a compact and interpretable specific covariates measuring disease American Study of Kidney Disease and representation. Based on the learned severity. Patient specific covariates enter Hypertension (AASK) to derive and test a patterns, we constructed spatially local the model as a linear combination with dynamic prediction model for quantifying TF networks. The constructed networks spatially varying coefficients. Our goal is the time-varying risk of end stage renal agreed well with known networks such as to correlate disease severity with lesion disease. the gapgene network. More interestingly, location within the brain. Estimation of our method also identified a number of the LGCP intensity function is typically e-mail: [email protected] previously undescribed TFs as possible performed in the Bayesian framework new candidates regulators of the gap- using the Metropolis adjusted Langevin 105. Challenging Statistical gene network. We are currently validating algorithm (MALA) and, more recently, Issues in Imaging the candidate TFs in knockout experi- Reimannian manifold Hamiltonian Monte ments. Thus, our dataset, representation Carlo (RMHMC). Due to the extremely and modeling approach have shown large size of our problem---3D data RELATING DEVELOPMENTAL TRAN- significant potential for modeling and on 240 subjects with 275,000 voxel SCRIPTION FACTORS BASED ON identifying novel components of gene locations---we show that MALA performs DROSOPHILA EMBRYONIC GENE networks during animal development. poorly in terms of posterior sampling and EXPRESSION IMAGES that RMHMC is computationally intrac- e-mail: [email protected] Siqi Wu*, University of California, table. As a compromise between these Berkeley two extremes, we show that posterior ANALYSIS OF POINT PATTERN estimation via Hamiltonian Monte Carlo TFs play a central role in controlling IMAGING DATA USING LOG performs exceptionally well in terms of gene expression. A fundamental prob- GAUSSIAN COX PROCESSES WITH speed of convergence and mixing. lem in systems biology is to understand SPATIALLY VARYING COEFFICIENTS the interactions between the TFs, or, in e-mail: [email protected] other words, to understand the tran- Timothy D. Johnson*, University scription networks. For the first time in of Michigan FIBER DIRECTION ESTIMATION any metazoan animal, we have imaged Thomas E. Nichols, University IN DIFFUSION MRI spatiotemporal gene expression of of Warwick all known and predicted TFs during Raymond Wong*, Iowa State University, Log Gaussian Cox Processes (LGCP) are Drosophila embryogenesis. We used Thomas C. M. Lee, University used extensively to model point pattern 400+ images of 155 TFs with restricted of California, Davis data. 
In a LGCP, the log intensity func- gene expression during early develop- Debashis Paul, University of tion is modeled semi-parametrically as ment and developed novel methods California, Davis a linear combination of spatially varying relate these TFs to each other in order Jie Peng, University of California, Davis covariates with scalar coefficients plus to shed light on the TF cascades that Diffusion magnetic resonance imaging is a Gaussian process that models the trigger transcription. We borrowed the an emerging medical imaging technol- random spatial variation. Almost exclu- idea of Nonnegative Matrix Factorization ogy to probe anatomical architectures sively, the point pattern data are a single (NMF) from computational neurosci- of biological samples in an in vivo and ence/computer vision and decomposed noninvasive manner. It is widely used the expression patterns contained in

356 ENAR 2015 | Spring Meeting | March 15–18 to reconstruct white matter ber tracts genetic, and clinical data to detect 106. Statistical Methods in brains. In this talk, we first propose putative genes for complexly inherited for Predicting a new parametrization of the tensor neuropsychiatric and neurodegenerative Subgroup Level mixture model and develop a stable disorders. Several major big-data chal- Treatment Response numerical procedure for estimating lenges arise from testing genome-wide diffusion direction(s) within a voxel. To ($N_G>12$ million known variants) further improve the estimation of these associations with signals at millions A REGRESSION TREE APPROACH directions, we then propose a direction of locations ($N_V\sim 10^6$) in the TO IDENTIFYING SUBGROUPS smoothing method which is applicable to brain from thousands of subjects ($n\ WITH DIFFERENTIAL TREATMENT regions with crossing bers. In addition, sim 10^3$). The aim of this paper is EFFECTS we develop a novel tracking algorithm to develop a fast statistical method, Wei-Yin Loh*, University of Wisconsin, which takes (estimated) diffusion direc- referred as FVGWAS, to efficiently Madison tions as input and allows for multiple carry out whole-genome analyses of Regression trees are natural for subgroup directions within a voxel. whole-brain data. FVGWAS consists of three components including a spatially identification because they partition e-mail: [email protected] heteroscedastic linear model, a global the data space. We introduce two new sure independence screening (GSIS) methods that are practically free of selection bias and are applicable to FVGWAS: FAST VOXELWISE procedure, and a detection procedure two or more treatment arms, censored GENOME WIDE ASSOCIATION based on wild bootstrap methods. response variables, and missing values ANALYSIS OF LARGE-SCALE Specifically, for standard linear associa- in the predictor variables. The methods IMAGING GENETIC DATA tion, the computational complexity is $O(n^2N_VN_G)$ for the voxelwise extend the GUIDE approach by using Hongtu Zhu*, University of North genome wide association (VGWAS) three key ideas: (i) treatment as a linear Carolina, Chapel Hill method in \citep{Hibar2011} compared predictor, (ii) chi-squared tests to detect Meiyang Chen, University of North with $O((N_G+N_V)n^2)$ for FVGWAS. residual patterns and lack of fit, and Carolina, Chapel Hill Simulation studies show that FVGWAS (iii) proportional hazards modeling via Poisson regression. Importance scores Thomas Nichols, University of Warwick is an efficient method of searching sparse signals in an extremely large with thresholds for identifying influential Chao Huang, University of North search space, while controlling for the variables are obtained as by-products. Carolina, Chapel Hill family-wise error rate. Finally, we have e-mail: [email protected] Yu Yang, University of North Carolina, successfully applied FVGWAS to a subset Chapel Hill of ADNI data with 374 subjects, 195,855 INCREASING EFFICIENCY FOR Zhaohua Lu, University of North voxels, and 503,892 SNPs, and the total ESTIMATING TREATMENT-BIO- Carolina, Chapel Hill processing time was 3997 seconds for a single CPU. MARKER INTERACTIONS WITH Qianjing Feng, Southern Medical HISTORICAL DATA University e-mail: [email protected] Jeremy MG Taylor*, University Rebecca C. Knickmeyer, University of Michigan of North Carolina, Chapel Hill Philip S. 
Boonstra, University More and more large-scale imaging of Michigan genetic studies are being widely con- Bhramar Mukherjee, University ducted to collect a rich set of imaging, of Michigan

FEATURE ELIMINATION FOR REINFORCEMENT LEARNING METHODS
Sayan Dasgupta*, Fred Hutchinson Cancer Research Center; Michael R. Kosorok, University of North Carolina, Chapel Hill
Personalized medicine can be defined as the medical model that can adapt itself to the appropriate needs of a patient, with treatments and medical decisions suited to his/her requirements. Discovering tailored therapies for these patients is a very complex issue because the effects must be modeled within the multistage structure. Recently Q-learning (Watkins 1989; Murphy et al. 2006) has been proposed for maximizing the average survival time of patients in this format. One important problem that we typically face, however, is that the information about prognosis is sometimes very rich, and moreover in Q-learning this prognosis information (history) grows with the number of stages in the trial. Hence, overfitting is an issue that needs to be addressed, and feature elimination becomes an important tool here; this will be the primary focus of this talk. We will discuss a few different methods for feature selection in Q-learning, based on the idea of feature screening through ranking in a sequential backward selection scheme. We will discuss the applicability of the methods, partly reasoned on heuristics stemming from our previous work on feature selection in support vector machines, and will give results showing their performance in various simulated settings.
e-mail: [email protected]

ADAPTIVE DESIGNS FOR DEVELOPING AND VALIDATING PREDICTIVE BIOMARKERS
Noah Simon, University of Washington; Richard M. Simon*, National Cancer Institute, National Institutes of Health
Clinical trials are generally designed to determine whether a treatment provides average benefit for the eligible population. The average benefit is often small and many patients must be treated for each one who benefits. This approach to evaluating treatments is particularly problematic in oncology, where the usual diagnostic categories are molecularly heterogeneous and modern molecularly targeted drugs are unlikely to be broadly useful. In this presentation we will present a new paradigm for phase III clinical trials which we believe is better suited to early 21st century oncology. The new paradigm includes two objectives: testing the null hypothesis of uniform ineffectiveness of the test regimen for the eligible population, and prospective development of a predictive biomarker or biomarker classifier that provides internally validated guidance regarding the subset of patients who are most likely to benefit from the test treatment. The first objective is assured in a frequentist framework, whereas the development and validation of the "indication classifier" is viewed as a prediction, classification or decision problem, not as a hypothesis testing problem. We will present an adaptive approach to prospective development and validation of an "indication classifier" in a phase III or phase II/III trial. The approach is suited for settings where the best predictive biomarker is not established by the start of the study but there are a limited number of candidate markers. We describe several types of indication classifiers, including a Bayesian method in an otherwise frequentist randomized clinical trial.
e-mail: [email protected]

107. CONTRIBUTED PAPERS: ROC Curves

IMPROVED ESTIMATION OF DIAGNOSTIC CUT-OFF POINT ASSOCIATED WITH YOUDEN INDEX USING RANKED SET SAMPLING
Jingjing Yin*, Georgia Southern University; Hani Samawi, Georgia Southern University; Chen Mo, Georgia Southern University; Daniel Linder, Georgia Southern University
A diagnostic cut-off point for biomarker measurements is needed for classifying a random subject as either diseased or healthy. However, such a cut-off point is usually unknown and needs to be estimated by some optimization criterion, among which the Youden index has been widely adopted in practice. The Youden index, defined as max (sensitivity + specificity - 1), directly measures the largest total diagnostic accuracy a biomarker can achieve. Therefore, it is desirable to estimate the optimal cut-off point associated with the Youden index. Sometimes, taking the actual measurements of a biomarker is very difficult and expensive, while ranking them without actual measurements can be easy. In such cases, ranked set sampling would give more accurate estimation than simple random sampling, since ranked set samples are more likely to span the full range of the population (and thus are more representative). In this study, kernel density estimation is utilized to numerically solve for the nonparametric estimate of the optimal cut-off point. Intensive simulations are carried out to compare the proposed method using ranked set samples with the one using simple random samples, and the proposed method outperforms universally with much smaller mean squared error (MSE). A real data set is analyzed to illustrate the proposed method.
e-mail: [email protected]

A BETTER CONFIDENCE INTERVAL FOR THE SENSITIVITY AT A FIXED LEVEL OF SPECIFICITY FOR DIAGNOSTIC TESTS WITH CONTINUOUS ENDPOINTS
Guogen Shan*, University of Nevada Las Vegas
For a diagnostic test with continuous measurement, it is often important to construct confidence intervals for the sensitivity at a fixed level of specificity. Bootstrap-based confidence intervals were shown to have good performance as compared to others, and the one by Zhou and Qin (2005) was recommended as the best existing confidence interval, named the BTII interval. We propose two new confidence intervals based on the profile variance method, and conduct extensive simulation studies to compare the proposed intervals and the BTII interval under a wide range of conditions. An example from a medical study on severe head trauma is used to illustrate application of the new intervals. The new proposed intervals generally have better performance than the BTII interval.
e-mail: [email protected]

SIMPSON'S PARADOX IN THE IDI
Jonathan Chipman*, Vanderbilt University; Danielle Braun, Dana-Farber Cancer Institute
The Integrated Discrimination Improvement (IDI) is a commonly used metric to compare two risk prediction models; it summarizes the extent to which a new model increases risk in cases and decreases risk in controls. The IDI averages risks across cases and controls and is therefore susceptible to Simpson's Paradox. In some settings, adding a predictive covariate to a well calibrated model results in an overall negative IDI. However, if we stratify by the covariate, the stratum-specific IDIs are non-negative. Meanwhile, the calibration (O/E), AUC, and Brier Score improve overall and for each stratum. We ran extensive simulations to determine which settings lead to paradoxical IDI results. We provide an analytic explanation and suggest a simple modification to be used in these settings. We illustrate the paradox on Cancer Genomics Network data, by calculating predictions based on two versions of BRCAPRO (version 2.08 and version 2.07), a Mendelian risk prediction model for breast and ovarian cancer. Version 2.08 updates contralateral breast cancer (CBC) penetrance. Calibration (O/E), AUC, and Brier Score improve overall and by CBC stratum; however, the overall IDI is negative while CBC stratum-specific IDIs are non-negative.
e-mail: [email protected]
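As a small illustration of the kind of nonparametric, kernel-density-based Youden-index cut-off estimation described in the first abstract of this session, the sketch below uses simulated biomarker values and a plain grid search; it does not implement the ranked-set-sampling design itself, and the distributions and grid are placeholders.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(1)
    healthy = rng.normal(0.0, 1.0, 150)      # biomarker values, non-diseased subjects
    diseased = rng.normal(1.5, 1.2, 150)     # biomarker values, diseased subjects

    kde_h, kde_d = gaussian_kde(healthy), gaussian_kde(diseased)
    grid = np.linspace(min(healthy.min(), diseased.min()),
                       max(healthy.max(), diseased.max()), 500)

    # Smooth specificity and sensitivity at each candidate cut-off c:
    #   specificity(c) = P(healthy <= c),  sensitivity(c) = P(diseased > c)
    spec = np.array([kde_h.integrate_box_1d(-np.inf, c) for c in grid])
    sens = np.array([kde_d.integrate_box_1d(c, np.inf) for c in grid])

    youden = sens + spec - 1.0
    best = grid[np.argmax(youden)]
    print(f"Estimated optimal cut-off: {best:.3f}, Youden index: {youden.max():.3f}")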

A NONPARAMETRIC TEST BASED ON t-DISTRIBUTION FOR COMPARING TWO CORRELATED C INDICES WITH RIGHT-CENSORED SURVIVAL OUTCOME OR AUCs WITH DICHOTOMOUS OUTCOME
Le Kang*, Virginia Commonwealth University; Shumei Sun, Virginia Commonwealth University
When the reference standard is a binary outcome, the area under the receiver operating characteristic (ROC) curve (AUC) is routinely used as a summary measure of diagnostic accuracy. When the reference standard is a right-censored time-to-event (survival) outcome, the C index, motivated as an extension of the AUC, provides a measure of concordance between a prognostic biomarker and the right-censored survival outcome. Statistical methods for estimating the C index and its confidence interval, as well as for comparing two or more correlated C indices, have been investigated extensively. In this work, we propose a nonparametric test based on the t-distribution for comparing two correlated C indices (AUCs as a special case). We adopt U-statistics based estimators, both for the C index and for the variance of the difference between two C indices. We show that the resulting test statistic for comparing two C indices follows an approximate t-distribution, and we propose to estimate the appropriate degrees of freedom for the t-distribution using the Satterthwaite approximation. We validate our proposed test and particularly assess its performance in parallel with the DeLong test for comparing two correlated AUCs via Monte Carlo simulation studies. Simulation results show that the proposed method provides satisfactory type I error rates, even with very small sample sizes.
e-mail: [email protected]

LATENT MIXTURE MODELS FOR ORDERED ROC CURVES USING THE SCALE MIXTURE OF NORMAL DISTRIBUTIONS
Zhen Chen*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health; Sungduk Kim, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
In the Physician Reliability Study (PRS), 12 physicians in OB/GYN were invited to diagnose endometriosis of about 150 participants under several settings, each with a different amount of clinical information. To assess the diagnostic accuracy of the physicians under each setting, care has to be taken to address the between-setting dependence of the study outcome rASRM. Moreover, as the clinical information increases with settings, it is desirable to account for the a priori constraint in the estimation of the diagnostic parameters. In this work, we proposed a latent mixture modeling framework for the rASRM scores, using the class of scale mixtures of normal distributions within each disease population. The a priori constraint was specified as decreasing variances of the distributions of the outcome. We developed an MCMC procedure to implement model inference from a Bayesian perspective and conducted a simulation study to evaluate the performance of the proposed approach. By DIC, the model with a generalized t distribution for rASRM fits the PRS data the best. Substantively, we observed a higher AUC estimate in setting 2 when the constraint was used, and a higher AUC estimate in setting 1 when the constraint was not used.
e-mail: [email protected]

LEAST SQUARES ROC METHOD FOR TESTS WITH THE ABSENCE OF THE GOLD STANDARD
Larry Tang*, George Mason University and National Institutes of Health Clinical Center; Minh Huynh, Department of Labor and National Institutes of Health Clinical Center; Xuan Che, Epidemiology and Biostatistics, National Institutes of Health Clinical Center; Elizabeth K. Rasch, Epidemiology and Biostatistics, National Institutes of Health Clinical Center; Ao Yuan, Georgetown University
The topics on diagnostic accuracy without the gold standard can be classified into several areas, including 1) binary test results with a perfect gold standard, 2) ordinal or continuous test results with a perfect gold standard, 3) binary test results without a gold standard, and 4) ordinal or continuous test results without a gold standard. Extensive literature is available on parametric, semiparametric and nonparametric methods to evaluate the accuracy of diagnostic tests with perfect gold standards. Sensitivities and specificities are commonly used for a binary test. These parameters can be estimated using the proportions when a perfect gold standard is available for every individual in the sample. The receiver operating characteristic (ROC) curve plotting pairs of sensitivities and specificities is a common statistical tool to evaluate the accuracy of ordinal or continuous tests. The ROC curve estimated from data without the gold standard is biased. To correct for the bias, a linear regression method is proposed to estimate the ROC curve from pairs of consistent sensitivity and specificity estimates. The proposed method first applies Hui and Walter's method to estimate a pair of sensitivity and specificity for a given cutoff point. For a set of chosen cutoff points on the continuous data, a number of pairs can be obtained, and the estimates in the pairs can serve as values for the response variable and covariate in the linear regression setting.
e-mail: [email protected]
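The sketch below illustrates the U-statistic (Mann-Whitney) AUC estimator that underlies comparisons of correlated AUCs such as the one in the first abstract above; the paired bootstrap shown here is a generic substitute for, not an implementation of, that abstract's t-distribution test, and all data are simulated.

    import numpy as np

    def auc_mann_whitney(pos, neg):
        """U-statistic (Mann-Whitney) estimate of the AUC."""
        diff = pos[:, None] - neg[None, :]
        return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

    rng = np.random.default_rng(2)
    n_pos, n_neg = 60, 80
    marker1_pos, marker1_neg = rng.normal(1.0, 1, n_pos), rng.normal(0, 1, n_neg)
    marker2_pos = marker1_pos + rng.normal(0, 0.5, n_pos)   # correlated second marker
    marker2_neg = marker1_neg + rng.normal(0, 0.5, n_neg)

    observed = (auc_mann_whitney(marker1_pos, marker1_neg)
                - auc_mann_whitney(marker2_pos, marker2_neg))

    # Paired bootstrap of the AUC difference: cases and controls are resampled
    # jointly so the correlation between the two markers is preserved.
    boot = []
    for _ in range(2000):
        ip = rng.integers(0, n_pos, n_pos)
        ineg = rng.integers(0, n_neg, n_neg)
        boot.append(auc_mann_whitney(marker1_pos[ip], marker1_neg[ineg])
                    - auc_mann_whitney(marker2_pos[ip], marker2_neg[ineg]))
    print(f"AUC difference {observed:.3f}, bootstrap SE {np.std(boot):.3f}")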

108. CONTRIBUTED PAPERS: Personalized Medicine and Biomarkers

USING DECISION LISTS TO CONSTRUCT INTERPRETABLE AND PARSIMONIOUS TREATMENT REGIMES
Yichi Zhang*, North Carolina State University; Eric Laber, North Carolina State University; Anastasios Tsiatis, North Carolina State University; Marie Davidian, North Carolina State University
A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of treatment regimes for application in clinical practice requires the long-term, joint effort of statisticians and clinical scientists. In this collaborative process, the statistician must integrate clinical science into the statistical models underlying a treatment regime, and the clinician must scrutinize the estimated treatment regime for scientific validity. To facilitate meaningful information exchange, it is important that estimated treatment regimes be interpretable in a subject-matter context. We propose a simple, yet flexible class of treatment regimes whose members are representable as a short list of if-then statements. Regimes in this class are immediately interpretable and are therefore an appealing choice for broad application in practice. We derive a robust estimator of the optimal regime within this class and demonstrate its finite sample performance using simulation experiments. The proposed method is illustrated with data from two clinical trials.
e-mail: [email protected]

SYNTHESIZING GENETIC MARKERS FOR INCORPORATION INTO CLINICAL RISK PREDICTION TOOLS
Sonja Grill*, Technical University Munich, Germany; Donna P. Ankerst, Technical University Munich, Germany and University of Texas Health Science Center at San Antonio
Clinical risk prediction tools built on standard risk factors are important devices for many different diseases. Newly discovered genetic and high-dimensional -omic markers, such as single nucleotide polymorphisms (SNPs) and gene expressions, have the potential to increase the practical utility of clinical risk prediction tools. Typically these markers are not assessed in the original cohorts used to build the existing risk prediction tools, making their incorporation into those tools complicated. We provide an intuitive Bayesian method for updating an existing clinical risk prediction tool with external marker information via the use of likelihood ratios to transform the prior odds of a disease to posterior odds. We illustrate the method with two applications, the first incorporating SNPs from multiple published genome-wide association studies into the Prostate Cancer Prevention Trial Risk Calculator via a random-effects meta-analysis with an option accounting for linkage disequilibrium between groups of SNPs. The second application is detailed family history of cancer from the nationwide Swedish Family-Cancer Database (the world's largest of its kind). Both markers are independent predictors of prostate cancer to the commonly-used risk factors.
e-mail: [email protected]
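To make the "short list of if-then statements" idea from the first abstract of this session concrete, here is a toy decision-list regime in Python. The covariates, thresholds, and treatment labels are invented; estimating such a list from data is what the abstract's method addresses and is not shown here.

    from typing import Callable, Dict, List, Tuple

    # A decision list: ordered (condition, treatment) clauses with a default.
    DecisionList = List[Tuple[Callable[[Dict], bool], str]]

    regime: DecisionList = [
        (lambda p: p["age"] >= 65 and p["biomarker"] > 2.0, "treatment A"),
        (lambda p: p["prior_failures"] >= 2,                "treatment B"),
    ]
    DEFAULT = "standard of care"

    def recommend(patient: Dict) -> str:
        """Return the treatment of the first clause whose condition the patient satisfies."""
        for condition, treatment in regime:
            if condition(patient):
                return treatment
        return DEFAULT

    print(recommend({"age": 70, "biomarker": 2.5, "prior_failures": 0}))  # treatment A
    print(recommend({"age": 50, "biomarker": 1.0, "prior_failures": 3}))  # treatment B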

A PRIM APPROACH TO PREDICTIVE-SIGNATURE DEVELOPMENT FOR PATIENT STRATIFICATION
Gong Chen*, Roche TCRC, Inc.; Hua Zhong, New York University School of Medicine; Anton Belousov, Roche Diagnostics GmbH; Viswanath Devanarayan, AbbVie, Inc.
Patients often respond differently to a treatment due to individual heterogeneity. Failures of clinical trials can be substantially reduced if, prior to an investigational treatment, patients are stratified into responders and non-responders based on biological or demographic characteristics. These characteristics are captured by a predictive signature. In this talk, we introduce a procedure to search for predictive signatures based on the approach of the Patient Rule Induction Method (PRIM). Specifically, we discuss selection of a proper objective function for the search, present its algorithm, and describe a resampling scheme that can enhance search performance. Through simulations, we characterize conditions that enable the procedure to work well. To demonstrate practical uses of the procedure, we apply it to two real-world data sets. We also compare the results with those obtained from a recent regression-based approach, Adaptive Index Models, and discuss their respective advantages. We will be focused on oncology applications with survival responses.
e-mail: [email protected]

ON ESTIMATION OF OPTIMAL TREATMENT REGIMES FOR MAXIMIZING T-YEAR SURVIVAL PROBABILITY
Runchao Jiang*, North Carolina State University; Wenbin Lu, North Carolina State University; Rui Song, North Carolina State University; Marie Davidian, North Carolina State University
A treatment regime is a deterministic function that dictates personalized treatment based on patients' individual prognostic information. There is increasing interest in finding optimal treatment regimes, which determine treatment at one or more treatment decision points so as to maximize expected long-term clinical outcome, where larger outcomes are preferred. For chronic diseases such as cancer or HIV infection, survival time is often the outcome of interest, and the goal is to select treatment to maximize survival probability. We propose two nonparametric estimators for the survival function of patients following a given treatment regime involving one or more decisions, i.e., the so-called value. Based on data from a clinical or observational study, we estimate an optimal regime by maximizing these estimators for the value over a prespecified class of regimes. Because the value function is very jagged, we introduce kernel smoothing within the estimator to improve performance. Asymptotic properties of the proposed estimators of value functions are established under suitable regularity conditions, and simulation studies evaluate the finite-sample performance of the proposed regime estimators. The methods are illustrated by application to data from an AIDS clinical trial.
e-mail: [email protected]

EVALUATION OF NOVEL BIOMARKERS WHEN LIMITED BY SMALL SAMPLE SIZE
Bethany J. Wolf*, Medical University of South Carolina; John Christian Spainhour, Medical University of South Carolina; Jim C. Oates, Medical University of South Carolina
Advances in high-throughput biologic methods provide potential for discovery of biomarkers predictive of disease status. Several difficulties in identifying predictive biomarkers include small sample size, weak main effects, marker interactions, and non-linear relationships between markers and outcomes. Logistic regression is a common approach for modeling binary disease outcomes, and while logistic regression can model weak main effects and interactions, it suffers from poor precision of regression estimates, model over-fitting, and failure to meet underlying assumptions. Machine learning methods have many desirable features for evaluating novel biomarkers for prediction of disease outcomes and require fewer assumptions than logistic regression. However, these methods may also over-fit the data and thus do not necessarily validate well in new data. There is not one statistical method that can provide a "best" model for all data, particularly with small sample size. Thus it is beneficial to evaluate and compare the predictive performance of multiple statistical models when examining biomarkers. We present a strategy for evaluating the predictive capability of a set of biomarkers using different statistical models. We apply this strategy to evaluate prediction performance of a set of novel urine biomarkers of treatment response in patients with lupus nephritis.
e-mail: [email protected]
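In the spirit of the multi-model comparison strategy described in the last abstract above, the following sketch cross-validates two candidate models on a small simulated biomarker panel with a weak main effect and an interaction. The models, data, and metric are illustrative choices, not the authors' lupus nephritis analysis.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    rng = np.random.default_rng(7)
    n, p = 60, 8                                 # small sample, a handful of candidate markers
    X = rng.normal(size=(n, p))
    # Outcome driven by a weak main effect and a marker-marker interaction.
    logit = 0.5 * X[:, 0] + 1.0 * X[:, 1] * X[:, 2]
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        print(f"{name}: cross-validated AUC {auc.mean():.2f} (+/- {auc.std():.2f})")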

CALIBRATE VARIATIONS IN BIOMARKER MEASURES FOR IMPROVING PREDICTION
Cheng Zheng*, University of Wisconsin, Milwaukee; Yingye Zheng, Fred Hutchinson Cancer Research Center
Novel biologic markers have been widely used in predicting important clinical outcomes. One specific feature of biomarkers is that they often are ascertained with variations due to the specific process of measurement. The magnitude of such variation may differ when applied to a different targeted population or when the platform for biomarker assaying changes from the original platform upon which the prediction algorithm (cutoffs) was based. Statistical methods have been proposed to characterize the effects of the underlying error-free quantity in association with an outcome, yet the impact of measurement errors in terms of prediction has not been well studied. We focus in this manuscript on the settings where biomarkers are used for predicting an individual's future risk, and propose semiparametric estimators for error-corrected risk when replicates of the error-prone biomarkers are available. The predictive performance of the proposed estimators is evaluated and compared to alternative approaches with numerical studies under settings with various assumptions on the measurement distributions. We studied the asymptotic properties of the proposed estimator. Application is made in a liver cancer biomarker study to predict risk of 3- and 4-year liver cancer incidence using age and a novel biomarker.
e-mail: [email protected]

BUILDING SMALL, ROBUST GENE SIGNATURES TO PREDICT PROGNOSIS
Prasad Patil*, Johns Hopkins University; Jeffrey T. Leek, Johns Hopkins University
We describe a novel approach to building lightweight, robust, and interpretable gene signatures for prediction of prognosis in cancer patients. A bottleneck to building gene signatures is that the feature space of all possible genes is extremely large and noisy. This space is commonly reduced by incorporating knowledge about gene function and regulation to weed out biologically implausible genes, but this may discard genes that offer predictive value. Feature selection methods for microarrays can struggle with the size of the feature set and often incorporate the outcome throughout, which may lead to overfitting. In this work, we focus on pairwise comparisons between genes. These features are robust to the technology used to measure gene expression, and signatures built using these features do not require retraining across platforms and technologies. We present a fast, two-stage filter/wrapper method that can reduce 20,000+ genes to a handful of pairwise comparisons. This method relies on unique properties of predictive pairwise features and on the equivalency of F-statistics when outcome and covariate are flipped in a regression. We then compare gene signatures created using our method to leading signatures that have been validated for use in the clinic for predicting prognosis of breast cancer patients.
e-mail: [email protected]
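As a toy illustration of the pairwise-comparison features discussed in the preceding abstract, the sketch below scores binary features of the form I(gene_i > gene_j) against a simulated outcome. It is only a naive stage-1 filter on invented data, not the authors' two-stage filter/wrapper method or their F-statistic equivalence.

    import numpy as np

    rng = np.random.default_rng(8)
    n_samples, n_genes = 100, 500
    expr = rng.normal(size=(n_samples, n_genes))        # expression matrix (samples x genes)
    outcome = rng.binomial(1, 0.5, n_samples)           # e.g., good vs poor prognosis
    # Make one pair informative: gene 10 tends to exceed gene 20 in poor-prognosis samples.
    expr[outcome == 1, 10] += 1.0

    def pair_score(i, j):
        """Association between the rank-based feature I(gene_i > gene_j) and the outcome."""
        feature = (expr[:, i] > expr[:, j]).astype(float)
        return abs(feature[outcome == 1].mean() - feature[outcome == 0].mean())

    # Score a candidate set of pairs and keep the strongest few.
    candidates = [(i, j) for i in range(0, 50) for j in range(i + 1, 50)]
    scores = sorted(((pair_score(i, j), (i, j)) for i, j in candidates), reverse=True)
    print("Top pairwise comparisons:", [pair for _, pair in scores[:3]])

Because each feature depends only on the within-sample ordering of two genes, it is unchanged by platform-specific rescaling of expression values, which is the robustness property the abstract emphasizes.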

109. CONTRIBUTED PAPERS: Time Series Analysis and Methods

ROBUST PORTFOLIO OPTIMIZATION UNDER HIGH DIMENSIONAL HEAVY-TAILED TIME SERIES
Huitong Qiu*, Johns Hopkins University; Fang Han, Johns Hopkins University; Han Liu, Princeton University; Brian Caffo, Johns Hopkins University
In this paper, we study a robust portfolio optimization strategy by resorting to quantile-based statistics. Computationally, the method is as efficient as its Gaussian-based alternative. Theoretically, by exploiting the quantile-based statistics, we show that the actual portfolio risk approximates the oracle risk with a parametric rate of convergence. The rate is set in a double asymptotic framework where the portfolio size may scale exponentially with the sample size. Moreover, the theory holds under heavy-tailed distributions with no moment constraints, and allows for weakly dependent time series. The empirical effectiveness of the method is demonstrated in both synthetic and real data. The experiments demonstrate that the method can significantly stabilize portfolio risk under highly volatile stock returns, and effectively avoid extreme losses.
e-mail: [email protected]

CHANGE-POINT DETECTION IN EEG SPECTRA FOR INFORMED FREQUENCY BAND SELECTION
Anna Louise Schroeder*, London School of Economics; Hernando Ombao, University of California, Irvine
The analysis of neural activity in a brain when exposed to an external stimulus is core to many neuroscientific research questions, e.g. on Brain-Computer Interfaces or developmental disorders such as dyslexia. In clinical settings, applications exist e.g. for the diagnosis of brain diseases, head injuries and sleep disorders. Electroencephalograms measure electrical activity non-invasively and with high temporal resolution. In experiments, data is recorded over multiple trials and at many points distributed over the skull. It is commonly analysed in the spectral domain, where the temporal evolution of pre-defined, broad frequency bands is monitored. The a priori definition of frequency bands originated from the analysis of key surface features, such as the average peak frequency. It has been shown that the highest-energy frequency within a band differs from individual to individual and can be related to e.g. age, performance and intelligence. Subject-independent applications therefore typically consider the mean power spectral density and thus risk averaging out possibly pronounced local changes in power. To avoid this, they require a mechanism to identify the most informative frequencies and compare these, accounting for individual differences. We present a novel method to detect change points over time in frequencies. Based on these change points we can identify the most informative frequencies, which may be located at different points within a frequency band. Our approach takes the high dimensionality of the data over channels into account. Furthermore, we analyse the information content over trial repetitions and subjects.
e-mail: [email protected]

TIME SERIES ANALYSIS FOR SYMBOLIC-VALUED DATA
S. Yaser Samadi*, Southern Illinois University; Lynne Billard, University of Georgia
Symbolic values can be lists, intervals, frequency distributions, and so on. Therefore, in comparison with standard classical data, they are more complex and can have structures (especially internal structures) that impose complications that are not evident in classical data. In general, using "classical" analysis approaches directly leads to inaccurate results. As a result of dependency in time series observations, it is more difficult to deal with symbolic (interval) time series data and take into account their complex structure and internal variability. In the literature, the proposed procedures for analyzing interval time series data used either the midpoint or the radius, which are inappropriate surrogates for symbolic interval variables. We develop a theory and methodology to analyze symbolic time series data (interval data) directly. Autocorrelation and partial autocorrelation functions are formulated, and maximum likelihood estimators of the parameters of symbolic autoregressive processes are provided.
e-mail: [email protected]
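The fragment below merely contrasts a standard-deviation risk estimate with a simple quantile-based scale estimate on heavy-tailed simulated returns; it is a generic illustration of why quantile-based statistics are attractive for the portfolio problem in the first abstract of this session, not the estimator or theory developed there.

    import numpy as np

    rng = np.random.default_rng(3)
    # Heavy-tailed daily returns for 5 assets (Student t with 3 df), 250 days.
    returns = rng.standard_t(df=3, size=(250, 5)) * 0.01
    weights = np.full(5, 1 / 5)                     # an equally weighted portfolio
    portfolio = returns @ weights

    # Gaussian-style risk estimate (sample standard deviation) versus a
    # quantile-based estimate (scaled interquartile range, robust to heavy tails).
    sd_risk = portfolio.std(ddof=1)
    iqr = np.subtract(*np.percentile(portfolio, [75, 25]))
    quantile_risk = iqr / 1.349                     # IQR of a normal is about 1.349 sd
    print(f"SD-based risk: {sd_risk:.4f}, quantile-based risk: {quantile_risk:.4f}")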

HIGH DIMENSIONAL STATE SPACE MODEL WITH L-1 AND L-2 PENALTIES
Shaojie Chen*, Johns Hopkins University; Joshua Vogelstein, Johns Hopkins University; Seonjoo Lee, Columbia University; Martin Lindquist, Johns Hopkins University; Brian Caffo, Johns Hopkins University
The time-invariant state space model, also known as the linear dynamical system (LDS) model or linear Gaussian model (LGM), is widely used in time series analysis. A broad class of popular models including factor analysis, principal component analysis (PCA) and independent component analysis (ICA) can be unified as variations of this generative model. Parameter learning in this model is challenging, especially when the dimension is high. In this paper, we generalize the model by penalizing the coefficient matrices with L-1 and L-2 penalties. An Expectation-Maximization algorithm is then designed for parameter learning. Finally, the model is applied to explore the motor cortex of human brains.
e-mail: [email protected]

AUTOREGRESSIVE MODELS FOR SPHERICAL DATA WITH APPLICATIONS IN PROTEIN STRUCTURE ANALYSIS
Daniel Hernandez-Stumpfhauser*, University of North Carolina, Chapel Hill; F. Jay Breidt, Colorado State University; Mark van der Woerd, Colorado State University
Proteins consist of sequences of the 21 natural amino acids. There can be tens to hundreds of amino acids in the protein, and hundreds to hundreds of thousands of atoms. A complete model for the protein consists of coordinates for every atom. A class of simplified models is obtained by focusing only on the alpha-carbon sequence, consisting of the primary carbon atom in the backbone of each amino acid. The three-dimensional structure of the alpha-carbon backbone of the protein can be described as a sequence of angle pairs, each consisting of a bond angle and a dihedral angle. These angle pairs lie naturally on a sphere. We consider autoregressive time series models for such spherical data sequences, using extensions of projected normal distributions. Application to protein data and further developments, including regime-switching autoregressive models, are described. This is joint work with F. Jay Breidt, Department of Statistics, Colorado State University, and Mark van der Woerd, Department of Biochemistry and Molecular Biology, Colorado State University.
e-mail: [email protected]

MODELING SERIAL COVARIANCE STRUCTURE IN SEMIPARAMETRIC LINEAR MIXED-EFFECTS REGRESSION FOR LONGITUDINAL DATA
Changming Xia*, University of Rochester Medical Center; Hua Liang, The George Washington University; Sally W. Thurston, University of Rochester Medical Center
Mixed-effects regression accounts for correlation and overdispersion in longitudinal data by introducing random effects of subjects. Any further unexplained correlation and variance structure, such as autoregressive time series and exponential weights, is accounted for by serial covariance structure within subjects after conditioning on random effects. We evaluate the effects of serial covariance structure mis-specification on model fitting and hypothesis testing in semiparametric linear mixed-effects regression for dependent continuous and categorical outcomes fitted by smoothing splines based on a reproducing kernel Hilbert space.
e-mail: [email protected]

110. Incorporating Biological Information in Statistical Modeling of Genome-Scale Data with Complex Structures

PRIORITIZING GWAS RESULTS BY INTEGRATING PLEIOTROPY AND ANNOTATION
Hongyu Zhao*, Yale School of Public Health; Dongjun Chung, Medical University of South Carolina; Can Yang, Hong Kong Baptist University; Cong Li, Yale University; Qian Wang, Yale University; Joel Gelernter, Yale School of Medicine
Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identification of these risk variants remains a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this presentation, we will introduce a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify disease-associated variants because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA performs an integrative analysis of multiple GWAS datasets and functional annotations to seek association signals, as well as hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. When we applied GPA to analyze jointly five psychiatric disorders with annotation information, not only did GPA identify many weak signals missed by the traditional single-phenotype analysis, but it also revealed relationships in the genetic architecture of these disorders. We will also demonstrate the usefulness of GPA using several other examples. This is joint work with Dongjun Chung, Can Yang, Cong Li, Qian Wang, and Joel Gelernter.
e-mail: [email protected]

CHALLENGES AND SOLUTIONS FOR WHOLE EXOME SEQUENCE ANALYSIS FOR PEDIGREE AND EXTERNAL CONTROL DATA
Daniel J. Schaid*, Mayo Clinic
Whole exome sequencing (WES) targets protein-coding DNA sequences, a technique we have used to screen for genes associated with familial prostate cancer. The study samples are pedigree members with prostate cancer selected from the International Collaboration of Prostate Cancer Genetics. For cost efficiency, unrelated external controls with WES from prior studies were used. Statistical challenges of analyzing pedigree data with external controls, and proposed solutions, will be presented. Topics include evaluating quality control metrics and comparability of WES data between cases and controls, accounting for pedigree relationships in association analyses, and incorporating biological annotation to weight variants. We will present new statistical methods for case-control comparisons with related subjects, as well as statistical tests for co-segregation of genetic variants with disease, allowing for gene-level analyses that evaluate multiple genetic variants within a gene.
e-mail: [email protected]

BIG DATA METHODS FOR DISSECTING VARIATIONS IN HIGH-THROUGHPUT GENOMIC DATA
Fang Du, Johns Hopkins Bloomberg School of Public Health; Bing He, Johns Hopkins Bloomberg School of Public Health; Hongkai Ji*, Johns Hopkins Bloomberg School of Public Health
Variance decomposition (e.g., ANOVA, PCA) is a fundamental tool in statistics to understand data structure. High-throughput genomic data have heterogeneous sources of variation. Some are of biological interest, and others are unwanted (e.g., lab and batch effects). Knowing the relative contribution of each source to the total data variance is crucial for making data-driven discoveries. However, when one has massive amounts of high-dimensional data with heterogeneous origins, analyzing variances is non-trivial. The dimension, size and heterogeneity of the data all pose significant challenges. Big Data Variance Decomposition (BDVD) is a new tool developed to solve this problem. Built upon the recently developed RUV approach, BDVD decomposes data into biological signals, unwanted systematic variation, and independent random noise. The biological signals can then be further decomposed to study variations among genomic loci or sample types, or correlation between different data types. The algorithm is implemented by incorporating techniques to handle big data. Applying BDVD to ENCODE, we show the variance structure of the ENCODE DNase-seq data and demonstrate that BDVD allows one to develop tools that better separate signals from noise in various applications.
e-mail: [email protected]

MODEL-BASED APPROACH FOR SPECIES QUANTIFICATION AND DIFFERENTIAL ABUNDANCE ANALYSIS BASED ON SHOTGUN METAGENOMIC DATA
Hongzhe Li*, University of Pennsylvania
The human microbiome, which includes the collective microbial genomes residing in or on the human body, has a profound influence on human health. DNA sequencing technologies have made large-scale human microbiome studies possible by using shotgun metagenomic sequencing. It is of great interest to quantify the bacterial abundances based on the sequencing data and to identify the bacteria that are associated with clinical outcomes. We propose a hierarchical Poisson-Gamma regression model and its Empirical Bayes extension to quantify microbial abundances based on species-specific taxonomic markers, as well as to identify the covariate-associated bacteria. Our model takes into account the marker-specific effect when normalizing the sequencing count data. Compared to currently available methods on simulated data and real data, our method has demonstrated improved accuracy in bacterial abundance quantification and better sensitivity and specificity in identifying the covariate-associated species.
e-mail: [email protected]

111. Emerging Issues in Clinical Trials and High Dimensional Data

ASSESSING COVARIATE EFFECTS WITH THE MONOTONE PARTIAL LIKELIHOOD USING JEFFREYS' PRIOR IN THE COX MODEL
Ming-Hui Chen*, University of Connecticut; Mario de Castro, Universidade de Sao Paulo; Jing Wu, University of Connecticut; Elizabeth D. Schifano, University of Connecticut
In clinical trials, the monotone partial likelihood is frequently encountered in the analysis of time-to-event data using the Cox model. When there are zero events in one or more covariate groups, the resulting partial likelihood is monotonic and, consequently, the covariate effects are difficult to estimate. In this paper, we develop both Bayesian and frequentist approaches using the Jeffreys' prior to handle the monotone partial likelihood problem. We characterize sufficient and necessary conditions for the propriety of the Jeffreys' prior. We also show that the Jeffreys' prior has finite modes. A modification of the Jeffreys' prior is proposed in order to obtain more robust estimates of covariate effects. We perform extensive simulations to examine the performance of parameter estimates and demonstrate the applicability of our methods by analyzing real data from cancer clinical trials in detail.
e-mail: [email protected]

ASSESSING TEMPORAL AGREEMENT BETWEEN CENTRAL AND LOCAL PROGRESSION-FREE SURVIVAL TIMES
Donglin Zeng*, University of North Carolina, Chapel Hill; Emil Cornea, University of North Carolina, Chapel Hill; Jun Dong, Amgen Inc.; Jean Pan, Amgen Inc.; Joseph Ibrahim, University of North Carolina, Chapel Hill
In oncology clinical trials, progression-free survival (PFS), generally defined as the time from randomization until disease progression (PD) or death, has been a key endpoint to support licensing approval. When PFS is the primary or co-primary endpoint, it is recommended to have tumor assessments verified by an independent review committee (IRC) blinded to study treatments, especially in open-label studies. It is considered reassuring about the lack of reader-evaluation bias if treatment effect estimates from the investigator and IRC's evaluations agree. Agreement between these evaluations may vary for subjects with short or long PFS, while there exist no such statistical quantities that can completely account for this temporal pattern of agreements. Therefore, in this paper, we propose a new method to assess temporal agreement between two time-to-event endpoints, while the two event times are assumed to have a positive probability of being identical. This method measures agreement in terms of the two event times being identical at a given time, or both being greater than a given time. Overall scores of agreement over a period of time are also proposed. We propose maximum likelihood estimation to infer the proposed agreement measures using empirical data, accounting for different censoring mechanisms including reader's censoring. The proposed method is demonstrated to perform well in small samples via extensive simulation studies and is illustrated through a head and neck cancer trial.
e-mail: [email protected]
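To see concretely how a monotone partial likelihood arises when all events fall in one covariate group (the situation addressed by the Jeffreys'-prior abstract above), the toy computation below evaluates the Cox partial log-likelihood on a grid; the data are invented for illustration.

    import numpy as np

    # Tiny dataset: event time, event indicator, binary covariate (e.g., treatment arm).
    time  = np.array([2.0, 3.0, 5.0, 6.0, 8.0, 9.0])
    event = np.array([1,   1,   1,   0,   0,   0  ])   # all events occur in the z = 0 arm
    z     = np.array([0,   0,   0,   1,   1,   1  ])

    def cox_partial_loglik(beta):
        ll = 0.0
        for i in np.where(event == 1)[0]:
            risk_set = time >= time[i]                  # subjects still at risk at this event
            ll += beta * z[i] - np.log(np.sum(np.exp(beta * z[risk_set])))
        return ll

    for beta in [-8.0, -4.0, -2.0, 0.0, 2.0]:
        print(f"beta = {beta:5.1f}  partial log-likelihood = {cox_partial_loglik(beta):8.4f}")
    # The log-likelihood keeps increasing as beta decreases toward -infinity: it is
    # monotone, so an unpenalized maximum does not exist; a Jeffreys-type prior or
    # penalty is one remedy, as discussed in the abstract.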

STATISTICAL DESIGN OF NON-INFERIORITY MULTIPLE REGION CLINICAL TRIALS TO ASSESS GLOBAL AND CONSISTENT TREATMENT EFFECTS
Guoqing Diao*, George Mason University; Donglin Zeng, University of North Carolina, Chapel Hill; Joseph G. Ibrahim, University of North Carolina, Chapel Hill; Alan Rong, Amgen Inc.; Oliver Lee, Amgen Inc.; Kathy Zhang, Amgen Inc.; Qingxia Chen, Vanderbilt University
Non-inferiority multi-regional clinical trials (MRCTs) have recently received increasing attention in drug development. Two major goals in an MRCT are (1) to estimate the global drug effect and (2) to assess the consistency of drug effects across multiple regions. In this paper, we propose an intuitive definition of consistency of non-inferior drug effects across regions under the random effects modeling framework. Specifically, we quantify the consistency of drug effects by the percentage of regions that meet a pre-defined treatment margin. This new approach enables us to achieve both goals in one modeling framework. We propose to use a signed likelihood ratio test for testing the global drug effect and the consistency of non-inferior drug effects. In addition, we provide guidelines for the allocation rule to achieve optimal power for testing consistency among multiple regions. Extensive simulation studies are conducted to examine the performance of the proposed methodology. An application to a real data example is provided.
e-mail: [email protected]

BAYESIAN SHRINKAGE METHODS FOR HIGH DIMENSIONAL DATA
Joseph G. Ibrahim*, University of North Carolina, Chapel Hill; Hongtu Zhu, University of North Carolina, Chapel Hill; Zakaria Khondker, Medivation, Inc.; Zhaohua Lu, University of North Carolina, Chapel Hill
Big data presents the overwhelming challenge of estimating a large number of parameters, often much larger than the sample size. Even for a simple linear model, when the number of predictors is larger than or close to the sample size, such a model may be unidentifiable and the least squares estimates of regression coefficients can be unstable. To deal with this issue, we systematically investigate three Bayesian regularization methods with applications in imaging genetics. First, we develop a Bayesian lasso estimator for the covariance matrix and propose a Metropolis-based sampling scheme. This development is motivated by functional network exploration for the entire brain from magnetic resonance imaging (MRI) data. Second, we propose a Bayesian generalized low rank regression model (GLRR) for the mean parameter estimation, and combine this with a factor loading method of covariance estimation to capture the spatial correlation among the responses and jointly estimate the mean and covariance parameters. This development is motivated by performing genome-wide searches for associations between genetic variants and brain imaging phenotypes from data collected by the Alzheimer's Disease Neuroimaging Initiative (ADNI). Third, we extend GLRR to the longitudinal setting and propose a Bayesian longitudinal low rank regression (L2R2) model to account for spatiotemporal correlation among the responses, as well as estimation of a full-rank coefficient matrix for standard prognostic factors. This development is motivated by genome-wide searches for associations between genetic variants and brain imaging phenotypes observed over time, with a primary focus on the role of aging and the interaction of age with genotype in affecting brain volume.
e-mail: [email protected]
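The simulation fragment below illustrates only the consistency definition from the first abstract above, namely the percentage of regions whose observed effect meets a pre-defined margin under a random-effects model; the effect sizes, margin, and sample sizes are invented, and the abstract's signed likelihood ratio test is not implemented here.

    import numpy as np

    rng = np.random.default_rng(4)
    n_regions, patients_per_arm = 8, 100
    ni_margin = -0.15                 # non-inferiority margin on the mean difference

    # Random-effects model: region-specific true treatment-minus-control effects.
    region_effects = rng.normal(loc=0.05, scale=0.05, size=n_regions)

    observed = []
    for mu in region_effects:
        trt = rng.normal(mu, 1.0, patients_per_arm)
        ctl = rng.normal(0.0, 1.0, patients_per_arm)
        observed.append(trt.mean() - ctl.mean())
    observed = np.array(observed)

    global_effect = observed.mean()
    pct_consistent = np.mean(observed > ni_margin)   # share of regions meeting the margin
    print(f"Global effect estimate: {global_effect:.3f}")
    print(f"Regions meeting the non-inferiority margin: {100 * pct_consistent:.0f}%")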

112. Advances in Repeated Measures and Longitudinal Data Analysis

JOINT MODELLING OF DIFFERENT TYPES OF LONGITUDINAL DATA WITH OUTLIERS AND CENSORING
Lang Wu*, University of British Columbia
In multivariate mixed effects models for longitudinal data, the response variables may be of different types, such as continuous and discrete. Moreover, the data may contain outliers, missing values, and censoring. We provide several methods for inference, addressing these data complications. The methods will be illustrated by AIDS datasets and will be evaluated by simulations.
e-mail: [email protected]

JOINT MODELLING OF NONIGNORABLE MISSING LONGITUDINAL OUTCOMES AND TIME-TO-EVENT DATA
Sanjoy Sinha*, Carleton University
Joint models for longitudinal and time-to-event data have received considerable attention in recent years for analyzing follow-up data. These are typically used when the focus is on survival data and one wishes to investigate the effect of an endogenous time-dependent covariate on the survival times. Often we encounter missing values in the longitudinal data due to a stochastic missing data mechanism. In this work, we investigate methods for jointly analyzing longitudinal and time-to-event data in the presence of nonignorable and nonmonotone missing responses. We perform sensitivity analyses to study effects of misspecified missing data models as well as random effects distributions on the estimates of the model parameters. The methods will be evaluated using simulations. An application will be presented using actual data from a clinical study.
e-mail: [email protected]

A HIDDEN MARKOV MODEL FOR NON-IGNORABLE NON-MONOTONE MISSING LONGITUDINAL DATA FOR MEDICAL STUDIES OF QUALITY OF LIFE
Kaijun Liao, Hisun Pharmaceuticals USA; Qiang Zhang, Radiation Therapy Oncology Group; Andrea B. Troxel*, University of Pennsylvania Perelman School of Medicine
In longitudinal studies, the problem of non-ignorable and non-monotone missing data has gained increasing attention recently. The statistical approach depends on the factorization of the joint likelihood of the data and the missingness mechanism. We adopt a latent process approach for the analysis of longitudinal data with non-ignorable and non-monotone missingness. Hidden Markov models are widely used for applications in pattern recognition including speech recognition, handwriting, bioinformatics, and gene finding and profiling. Multi-state Markov models are widely used to model disease progression and cancer screening. The hidden Markov model is a powerful extension of the multi-state Markov model in longitudinal studies assuming the states are unobserved. Incorporating this approach with selection models and shared parameter models, we can identify differences among disease processes with incomplete data simultaneously in both the state-dependent model and the missingness mechanism model. We propose the models in a generalized linear model and generalized linear mixed model framework, using a backward-forward algorithm to provide efficient parameter estimation in the general situation of non-ignorable non-monotone longitudinal missing data. A two-stage pseudo-likelihood method is used to reduce the parameter space to make this model more attractive. We illustrate the approach using data from a clinical trial in brain cancer.
e-mail: [email protected]

INVERSE WEIGHTED ESTIMATING EQUATIONS FOR REPEATED MEASURES IN TRANSFUSION MEDICINE
Richard Cook*, University of Waterloo
Trials in transfusion medicine are routinely designed to assess the effect of experimental platelet products on patients' platelet counts. In such trials patients may receive several transfusions over a period of time, and a response is available from each such administration. It is natural to consider testing for treatment effects based on standard methods for repeated measures data, but naive analyses of the multiple responses can yield biased estimates of the probability of response and associated treatment effects. These biases arise because only subsets of the patients randomized contribute responses to second and subsequent administrations of therapy, and hence the balance between treatment groups is lost with respect to potential confounding factors. We discuss analysis issues in this setting and demonstrate how biases can be reduced by use of inverse probability weighted estimating equations.
e-mail: [email protected]
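A small simulation in the spirit of the last abstract above: selection into the second administration depends on the first response, a naive analysis uses only the selected patients, and an inverse-probability-weighted estimate reweights them. The data-generating mechanism and the use of known (rather than estimated) selection probabilities are simplifications for illustration.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5000
    arm = rng.integers(0, 2, n)                        # randomized treatment arm

    # First-administration response; responders are less likely to get a second
    # transfusion, and the second response also depends on the first.
    resp1 = rng.binomial(1, 0.45 + 0.20 * arm)
    p_second = np.where(resp1 == 1, 0.3, 0.8)
    gets_second = rng.binomial(1, p_second)
    resp2 = rng.binomial(1, 0.30 + 0.10 * arm + 0.30 * resp1)

    # Naive estimate: compare arms among patients who happened to get a second transfusion.
    naive = (resp2[(gets_second == 1) & (arm == 1)].mean()
             - resp2[(gets_second == 1) & (arm == 0)].mean())

    # Inverse probability weighting: weight contributors by 1 / P(second | history),
    # restoring the randomized composition within each arm.
    w = 1.0 / p_second
    def ipw_mean(a):
        sel = (gets_second == 1) & (arm == a)
        return np.sum(w[sel] * resp2[sel]) / np.sum(w[sel])

    print(f"Naive arm difference: {naive:.3f}, IPW arm difference: {ipw_mean(1) - ipw_mean(0):.3f}")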

113. Advances in Modeling Zero-Inflated Data

BAYESIAN TWO-PART SPATIAL MODELS FOR SEMICONTINUOUS DATA
Brian Neelon*, Duke University; Li Zhu, University of Pittsburgh; Sara Benjamin, Duke University
In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero and a continuous distribution of positive values. Examples include medical expenditures, in which the zeros represent patients who do not use health services, while the continuous distribution describes the level of expenditures among users. Semicontinuous data are customarily analyzed using two-part mixture models consisting of a Bernoulli distribution for the probability of a nonzero response and a continuous distribution for the positive responses. In the spatial analysis of semicontinuous data, two-part models are especially appealing because they provide a joint picture of how health services utilization and associated expenditures vary across geographic regions. However, when applying these models, careful attention must be paid to distributional choices, as model misspecification can lead to biased and imprecise inferences. This paper introduces a broad class of Bayesian two-part models for the spatial analysis of semicontinuous data. Specific models considered include two-part lognormal, log skew-elliptical, and Bayesian nonparametric models. Multivariate conditionally autoregressive priors are used to link the binary and continuous components and provide spatial smoothing across neighboring regions, resulting in a joint spatial modeling framework for health utilization and expenditures. We develop a fully conjugate Gibbs sampling scheme, leading to efficient posterior computation. We illustrate the approach using data from a recent study of emergency department expenditures.
e-mail: [email protected]

ZERO-INFLATED FRAILTY MODEL FOR RECURRENT EVENT DATA
Lei Liu*, Northwestern University; Xuelin Huang, University of Texas MD Anderson Cancer Center; Alex Yaroshinsky, Vital Systems Inc.
Recurrent event data arise frequently in longitudinal medical studies. In many situations, there is a large portion of subjects without any recurrent events (e.g., tumor recurrences), manifesting the "zero-inflated" nature of the data. Some of the zero events may be due to "cure", while others are due to censoring before any recurrent events. In this paper, we propose a zero-inflated frailty model for this type of data, combining a logistic model for "cure" status (Yes/No) and a frailty proportional hazards model for recurrent event times of those "not cured". The model can be fitted conveniently in SAS Proc NLMIXED. Simulation results show the satisfactory finite sample property of the estimation method. We apply the method to model tumor recurrences in a soft tissue sarcoma study. We find that this model has a better performance than the frailty model alone.
e-mail: [email protected]

TWO-PART MODELS FOR ROLLING ADMISSION GROUP THERAPY DATA
Lane F. Burgette*, RAND Corporation; Susan M. Paddock, RAND Corporation
Group therapy is a common treatment modality in alcohol and other drug (AOD) treatment programs, wherein multiple clients attend group therapy sessions together. Clients are often admitted into therapy groups on a rolling basis. The analysis of data arising from such studies is complicated by clustering of client outcomes due to joint participation in group therapy sessions. Outcomes are correlated not only for clients attending common sessions but also for clients attending different sessions that are offered as part of the same rolling group. Conditional autoregression has been used to model the correlation of client outcomes that is due to common therapy session attendance among clients, while allowing for the non-independence of random effects for sessions within the same rolling group. Whereas previous research in the area has focused on continuous measures, many AOD treatment outcomes are two-part in nature. For example, some clients might report no AOD use following treatment while others report some level of AOD use. For two-part outcomes, we model correlations for both parts of the two-part outcome. We propose vector autoregressive and G-Wishart priors to account for correlations in the random effects distribution while taking advantage of the structure of the group therapy design itself.
e-mail: [email protected]

A MARGINALIZED TWO-PART MODEL FOR SEMICONTINUOUS DATA
Valerie A. Smith*, Center for Health Services Research in Primary Care, Durham VAMC and University of North Carolina, Chapel Hill; John S. Preisser, University of North Carolina, Chapel Hill; Brian Neelon, Duke University; Matthew L. Maciejewski, Center for Health Services Research in Primary Care, Durham VAMC
In health services research, it is common to encounter semicontinuous data characterized by a degenerate distribution at zero followed by a right-skewed continuous distribution with positive support. Semicontinuous data are typically analyzed using two-part mixtures that separately model the probability of health services use and the distribution of positive responses among users. However, because the second part conditions on a nonzero response, conventional two-part models do not provide a marginal interpretation of covariate effects on the overall population of health service users and non-users, even though this is often of greatest interest to investigators. We propose a marginalized two-part model that yields more interpretable effect estimates by parameterizing the model in terms of the marginal mean. This model maintains many of the important features of conventional two-part models, such as capturing zero-inflation and skewness, but allows investigators to examine covariate effects on the overall marginal mean, a target often of primary interest. Using a simulation study, we examine properties of maximum likelihood estimates from this model. We illustrate the approach by evaluating the effect of a behavioral weight loss intervention on health care expenditures in the Veterans Affairs health care system. Extensions to longitudinal and clustered data are also considered.
e-mail: [email protected]

114. New Developments in Missing Data Analysis: From Theory to Practice

COMPETING RISKS REGRESSION WITH MISSING DATA IN THE PROGNOSTIC FACTORS
Federico Ambrogi*, University of Milan; Thomas H. Scheike, University of Copenhagen
For medical studies involving competing risks, one often wishes to estimate and model the cumulative incidence probability, the marginal probability of failure for a specific cause. Recently, several new methods have been developed to directly model the cumulative incidence probability of a specific cause of failure. The key issue here is how to deal with incomplete data due to the fact that observations are subject to right-censoring. We refer to a simple problem in which one covariate, say Z, is always observed and the other, say X, is sometimes missing. There has been considerable focus on handling missing covariates, and there are several suggestions for dealing with the simpler survival data where there are not several causes of death. For survival data the key suggestions are multiple imputation techniques that typically aim for the modeling of the hazard function. An alternative is the IPCW techniques for survival data. Even though the competing risks framework is very common in practice, there are no studies dealing with the problem of missing covariate information in competing risks regression. Here we present some results regarding multiple imputation and IPCW techniques applied to the direct binomial regression model through some simple simulations.
e-mail: [email protected]
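As a numerical illustration of the marginal-versus-conditional distinction emphasized in the marginalized two-part abstract above, the fragment below combines a Bernoulli "any use" part with a lognormal positive part to recover an overall-population mean. The simulated expenditures and the covariate-free setup are simplifications; the abstract's model parameterizes the marginal mean directly in terms of covariates.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 1000
    use = rng.binomial(1, 0.4, n)                            # any health services use
    cost = use * rng.lognormal(mean=7.0, sigma=1.0, size=n)  # expenditures, zero for non-users

    # Conventional two-part summaries: P(use) and the mean among users only.
    p_use = use.mean()
    log_cost_users = np.log(cost[cost > 0])
    mu, s2 = log_cost_users.mean(), log_cost_users.var(ddof=1)

    # Marginal (overall-population) mean implied by the two parts, assuming a
    # lognormal positive component: E[Y] = P(use) * exp(mu + s2 / 2).
    marginal_mean = p_use * np.exp(mu + s2 / 2)
    print(f"P(use) = {p_use:.3f}, mean among users = {np.exp(mu + s2 / 2):,.0f}")
    print(f"Implied overall marginal mean = {marginal_mean:,.0f} vs empirical {cost.mean():,.0f}")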

Program & Abstracts 371 COMPARISON OF MULTIPLE MICE is flexible to use but lack of a clear behavior still remains unclear when IMPUTATION VIA CHAINED EQUA- theoretical rationale and suffers from applied to survey data with complex TIONS AND GENERAL LOCATION potential incompatibility of the conditional sample designs including unequal MODEL FOR ACCELERATED FAIL- regression models used in imputation. weighting and clustering. Recently, URE TIME MODELS WITH MISSING In contrast, GLM is theoretically sound Lewis et al. (2014) compared single and COVARIATES and can be rather robust toward model multiple imputation analyses for certain misspecifications and violations of GLM incomplete variables in the 2008 National Lihong Qi*, University of California, assumptions. Therefore, we believe that Ambulatory Medicare Care Survey, which Davis GLM shows the potential for being a has a nationally representative, multi- Yulei He, Centers for Disease Control competitive and attractive tool for tackling stage, and clustered sample design. and Prevention the analysis of AFT models with missing Their study results suggested that the Rongqi Chen, University of California, covariates. increase of the variance estimate due to multiple imputation compared with Davis e-mail: [email protected] single imputation largely disappears for Ying-Fang Wang, University estimates with large design effects. We of California, Davis THE EFFECT OF DATA CLUSTER- supplement their research by providing Xiaowei Yang, University ING ON THE MULTIPLE IMPUTATION a theoretical explanation for this phe- of California, Davis VARIANCE ESTIMATOR nomenon. We consider data sampled from an equally weighted, single-stage Missing covariates are common in bio- Yulei He*, Centers for Disease Control cluster design and characterize the pro- medical studies with survival outcomes. and Prevention Multiple imputation is a practical strategy cess using a balanced, one-way normal Iris Shimizu, Centers for Disease Control for handling this problem with various random-effects model. Assuming that the and Prevention approaches and software packages missingness is completely at random, available for implementation. In this talk, Susan Schappert, Centers for Disease we derive the analytic expressions of the we compare two important approaches: Control and Prevention within and between- multiple imputation variance estimators for the mean estima- multiple imputation by chained equation Nathaniel Schenker, Centers for Disease tor and propose an approximation for (MICE) and multiple imputation via a gen- Control and Prevention eral location model (GLM) for accelerated the fraction of missing information. As failure time (AFT) models with missing Vladislav Beresovsky, Centers for hypothesized by Lewis et al. (2014), we covariates. Through a comprehensive Disease Control and Prevention show that rate of missingness and intra- simulation study, we investigate the Diba Khan, Centers for Disease Control cluster-correlation (i.e., design effect) performance of the two approaches and and Prevention have opposite effects on the increase their robustness toward violation of the of the variance estimate due to multiple Roberto Valverde, Centers for Disease GLM assumptions and model misspecifi- imputation. 
THE EFFECT OF DATA CLUSTERING ON THE MULTIPLE IMPUTATION VARIANCE ESTIMATOR

Yulei He*, Centers for Disease Control and Prevention
Iris Shimizu, Centers for Disease Control and Prevention
Susan Schappert, Centers for Disease Control and Prevention
Nathaniel Schenker, Centers for Disease Control and Prevention
Vladislav Beresovsky, Centers for Disease Control and Prevention
Diba Khan, Centers for Disease Control and Prevention
Roberto Valverde, Centers for Disease Control and Prevention

Multiple imputation is a popular approach to statistical analysis with missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. On the other hand, its general behavior still remains unclear when applied to survey data with complex sample designs, including unequal weighting and clustering. Recently, Lewis et al. (2014) compared single and multiple imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, clustered sample design. Their results suggested that the increase of the variance estimate due to multiple imputation, compared with single imputation, largely disappears for estimates with large design effects. We supplement their research by providing a theoretical explanation for this phenomenon. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-imputation variance estimators for the mean estimator and propose an approximation for the fraction of missing information. As hypothesized by Lewis et al. (2014), we show that the rate of missingness and the intracluster correlation (i.e., the design effect) have opposite effects on the increase of the variance estimate due to multiple imputation. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.

e-mail: [email protected]
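A standard rendering of the setup described above, with notation that is ours rather than the authors': a balanced one-way normal random-effects model for the clustered sample, together with Rubin's combining rule whose between-to-within ratio drives the variance increase analyzed in the abstract.

```latex
% Notation is illustrative, not the authors'.
\[
  y_{ij} = \mu + a_i + e_{ij}, \qquad
  a_i \sim N(0,\sigma_a^2), \quad e_{ij} \sim N(0,\sigma_e^2), \qquad
  \rho = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2},
\]
% so that clusters of size m have design effect 1 + (m-1)\rho.  With M
% imputations, Rubin's rule combines within- and between-imputation variances,
\[
  T_M = \bar W_M + \Bigl(1 + \tfrac{1}{M}\Bigr) B_M ,
\]
% and the relative increase over single imputation is governed by the ratio
% B_M / \bar W_M, the quantity the abstract links to the missingness rate
% and to \rho.
```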

FRACTIONAL HOT DECK IMPUTATION FOR MULTIVARIATE MISSING DATA IN SURVEY SAMPLING

Jae Kwang Kim*, Iowa State University
Wayne A. Fuller, Iowa State University

Hot deck imputation is popular for handling item nonresponse in survey sampling. Fractional hot deck imputation is extended to multivariate missing data. The joint distribution of the study items is nonparametrically estimated using a discrete approximation, and the discrete transformation serves to create imputation cells. The fractional imputation procedure first assigns cells to each missing item and then imputes the real observations within each imputed cell. Replication variance estimation is discussed, and results from a limited simulation study are presented.

e-mail: [email protected]

115. Environmental Methods with Deterministic and Stochastic Components

HIGH RESOLUTION NONSTATIONARY RANDOM FIELD SIMULATION

William Kleiber*, University of Colorado, Boulder

Stochastic weather generators (SWGs) are used in many scientific studies, including model downscaling, climate impact assessments and seasonal resource planning. The fundamental requirement of a stochastic weather generator is simulated realizations of plausible weather patterns. In recent years, focus has shifted to developing spatially consistent SWGs. Unless the region of interest is relatively small, or has homogeneous topography, the SWG will necessarily require simulation of nonstationary spatial fields. However, high resolution simulation of a nonstationary process is difficult, typically requiring a Cholesky decomposition of a matrix whose dimension equals that of the desired simulation resolution. We introduce an approach to large, high resolution nonstationary process simulation by exploiting ideas very similar to Sampson and Guttorp (1992): spatially deforming geographical space to achieve approximate stationarity, then using fast stationary simulation algorithms, followed by an inverse transformation back to the nonstationary plane. We illustrate the algorithm on simulated and real datasets.

e-mail: [email protected]
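A minimal sketch of the three-step recipe described above, assuming a hand-picked deformation map and an exponential covariance; it is not the authors' algorithm. The dense Cholesky step here stands in for whatever fast stationary simulator (for example, circulant embedding on a grid) would be used at high resolution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observation sites on the original (nonstationary) plane.
n = 400
sites = rng.uniform(0.0, 1.0, size=(n, 2))

def deform(xy):
    """Illustrative Sampson-Guttorp-style deformation: stretch the x-axis
    more strongly near x = 1 so the field is stationary in deformed space."""
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([x + 0.5 * x**2, y])

def exp_cov(pts, range_=0.2, sill=1.0):
    """Stationary exponential covariance evaluated at the deformed points."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)

# Step 1: map sites to the deformed plane where the process is stationary.
warped = deform(sites)
# Step 2: simulate the stationary field there (stand-in: dense Cholesky;
# a large grid would instead use circulant embedding or spectral methods).
C = exp_cov(warped) + 1e-8 * np.eye(n)
field = np.linalg.cholesky(C) @ rng.standard_normal(n)
# Step 3: the simulated values are indexed by the original sites, so the
# inverse transformation amounts to reading them off at those coordinates.
print(field[:5])
```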
ESTIMATING PARAMETERS IN DELAY DIFFERENTIAL EQUATION MODELS

Liangliang Wang*, Simon Fraser University
Jiguo Cao, Simon Fraser University

Delay differential equations (DDEs) are widely used in ecology, physiology and many other areas of applied science. Although the form of the DDE model is usually proposed based on scientific understanding of the dynamic system, parameters in the DDE model are often unknown. Thus it is of great interest to estimate DDE parameters from noisy data. Since the DDE model does not usually have an analytic solution, and the numeric solution requires knowing the history of the dynamic process, the traditional likelihood method cannot be directly applied. We propose a semiparametric method to estimate DDE parameters. The key feature of the semiparametric method is the use of a flexible nonparametric function to represent the dynamic process; the nonparametric function is estimated by maximizing the DDE-defined penalized likelihood function. Simulation studies show that the semiparametric method gives satisfactory estimates of DDE parameters. The method is demonstrated by estimating a DDE model from Nicholson's blowfly population data.

e-mail: [email protected]
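A schematic version of the estimation idea, using Hutchinson's delayed logistic equation as an illustrative DDE and our own notation for the DDE-defined penalized likelihood.

```latex
% Illustrative delay differential equation (delayed logistic / Hutchinson):
\[
  x'(t) = \theta_1\, x(t)\Bigl(1 - \frac{x(t-\tau)}{\theta_2}\Bigr).
\]
% Represent the trajectory nonparametrically, x(t) \approx \sum_k c_k \phi_k(t),
% and maximize a penalized likelihood of the noisy observations y_i:
\[
  \ell_\lambda(c,\theta) \;=\; \sum_i \log f\bigl(y_i \mid x(t_i)\bigr)
  \;-\; \lambda \int \Bigl[x'(t) - g\bigl(x(t),\, x(t-\tau);\, \theta\bigr)\Bigr]^2\, dt,
\]
% where g is the right-hand side of the DDE and \lambda controls how strongly
% the fitted curve is forced to obey the DDE.
```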

ZERO-INFLATED SPATIAL TEMPORAL MODELS FOR EXPLORING TREND IN COMANDRA BLISTER RUST INFECTION IN LODGEPOLE PINE TREES

Cindy Feng*, University of Saskatchewan

Environmental and ecological count data are often characterized by an excess of zeroes and by spatial and temporal dependence. Motivated by a forestry study of Comandra blister rust (CBR) infection of lodgepole pine trees in British Columbia, Canada, we develop a class of zero-inflated models for analyzing such data. The model consists of two components that compare the abundance of trees that are resistant to CBR infection and the right-skewed count of lesions on each infected tree. The model incorporates a series of predictors, as well as spatially and temporally correlated random effects for each model component. The random effect terms are linked to induce dependence between the two components and also to provide spatial and temporal smoothing. Modeling and inference use a fully Bayesian approach via Markov chain Monte Carlo (MCMC) simulation.

e-mail: [email protected]

A SPATIO-TEMPORAL APPROACH TO MODELING SPATIAL COVARIANCE

Ephraim M. Hanks*, The Pennsylvania State University

Spatially-correlated data can often be viewed as being generated by a spatio-temporal process. We illustrate how a potential spatio-temporal generating process can motivate spatial statistical models, providing a broad framework for modeling spatial covariance functions. We link spatio-temporal generating processes to the Matérn class of spatial covariance functions, intrinsic conditional autoregressive (ICAR) models, and others. We present a continuous-time Markov process on a spatial graph that has a spatial random field with ICAR structure as its stationary distribution, and consider generalizations that allow for principled specification of ICAR precision matrices based on existing knowledge of the system. We illustrate the utility of this approach through an example of spatial modeling on a stream network.

e-mail: [email protected]

INCORPORATING COVARIATES IN DETERMINISTIC ENVIRONMENTAL MODELS

Edward L. Boone*, Virginia Commonwealth University
Ben Stewart-Koster, Australian Rivers Institute at Griffith University

Environmental models developed by both the statistical and mathematical communities are very good at capturing the complex behavior found in nature. While statistics has ventured into estimating parameters in deterministic models, not much work has been done to combine the two approaches in a more unified way. In this talk we present a Bayesian method to incorporate covariates into deterministic models. To estimate the model parameters, MCMC techniques will be used. The method will be illustrated using the simple Lotka-Volterra predator-prey model. We will also present how to incorporate spatial correlation into these models.

e-mail: [email protected]
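As a concrete, purely illustrative instance of the idea, a covariate could enter a Lotka-Volterra system through one of its rates; the parameterization below is not the authors' specification.

```latex
\[
  \frac{dx}{dt} = \alpha x - \beta x y, \qquad
  \frac{dy}{dt} = \delta x y - \gamma y,
\]
\[
  \log \alpha_i = \eta_0 + \eta_1 z_i, \qquad
  y_{ij}^{\mathrm{obs}} \sim N\bigl(x_i(t_j;\, \alpha_i, \beta, \delta, \gamma),\ \sigma^2\bigr),
\]
% so the covariate z_i shifts the prey growth rate, and posterior draws of
% (\eta_0, \eta_1, \beta, \delta, \gamma, \sigma^2) are obtained by MCMC.
```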
NONPARAMETRIC INFERENCE FOR of Texas, Austin e-mail: [email protected] CAUSAL EFFECTS OF MEDIATION A regression discontinuity design (RDD) Chanmin Kim, Harvard University is a non-randomized design where treat- ment (versus non-treatment) assignment Michael J. Daniels*, University to a subject depends on whether or not of Texas, Austin her/his value of the assignment vari- Jason Roy, University of Pennsylvania

A BAYESIAN NONPARAMETRIC CAUSAL MODEL FOR REGRESSION DISCONTINUITY DESIGNS

George Karabatsos*, University of Illinois, Chicago
Stephen G. Walker, University of Texas, Austin

A regression discontinuity design (RDD) is a non-randomized design in which treatment (versus non-treatment) assignment for a subject depends on whether or not her/his value of the assignment variable crosses a known threshold. Under relatively mild conditions, the RDD can identify and estimate causal effects for the subgroup of subjects located in a neighborhood around the threshold, as if treatments were randomly assigned to those subjects. However, accurate estimation of causal effects still relies on a correctly specified statistical model. Also, in applications, it may be of interest to infer causal effects in terms of general features of the outcome variable distribution, not only the mean. For RDDs, we propose a flexible Bayesian nonparametric regression model that can provide accurate estimates of causal effects in terms of the predictive mean, variance, quantile, probability density, distribution function, or any other chosen functional of the outcome variable. The model allows the entire distribution of the outcome variable to change flexibly as a function of predictors, and can be extended to handle multivariate assignment variables. We illustrate the model through the analysis of two real data sets, involving, respectively, a sharp RDD and a fuzzy RDD. Free user-friendly software is available for the model.

e-mail: [email protected]
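For reference, the sharp version of the design and the local causal estimand can be written as follows (standard RDD notation, not specific to the proposed model):

```latex
\[
  Z_i = \mathbf{1}\{X_i \ge c\}, \qquad
  \tau(c) = E\bigl[\,Y_i(1) - Y_i(0) \mid X_i = c\,\bigr],
\]
% and the proposal above targets not only this mean contrast but any chosen
% functional (variance, quantiles, the full distribution) of Y(1) and Y(0)
% for subjects near the threshold c.
```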
EVALUATING THE EFFECT OF UNIVERSITY GRANTS ON STUDENT DROPOUT: EVIDENCE FROM A REGRESSION DISCONTINUITY DESIGN USING BAYESIAN PRINCIPAL STRATIFICATION ANALYSIS

Fan Li*, Duke University
Alessandra Mattei, University of Florence
Fabrizia Mealli, University of Florence

Regression discontinuity (RD) designs are often interpreted as local randomized experiments: an RD design can be considered a randomized experiment for units whose realized value of a so-called forcing variable falls around a pre-fixed threshold. Motivated by the evaluation of Italian university grants, we consider a fuzzy RD design in which receipt of the treatment is based on both eligibility criteria and a voluntary application status. Resting on the fact that grant application and grant receipt statuses are post-assignment (post-eligibility) intermediate variables, we use the principal stratification framework to define causal estimands within the Rubin Causal Model. We propose a probabilistic formulation of the assignment mechanism underlying RD designs, re-formulating the Stable Unit Treatment Value Assumption (SUTVA) and making an explicit local overlap assumption for a subpopulation around the threshold. A local randomization assumption is invoked instead of standard continuity assumptions. We also develop a model-based Bayesian approach to select the target subpopulation(s) with adjustment for multiple comparisons, and to draw inference for the target causal estimands in this framework. Applying the method to the data from two Italian universities, we find evidence that university grants are effective in preventing students from low-income families from dropping out of higher education.

e-mail: [email protected]

BAYESIAN NONPARAMETRIC ESTIMATION FOR DYNAMIC TREATMENT REGIMES WITH SEQUENTIAL TRANSITION TIMES

Yanxun Xu*, University of Texas, Austin
Peter Mueller, University of Texas, Austin
Abdus S. Wahed, University of Pittsburgh
Peter F. Thall, University of Texas MD Anderson Cancer Center

Dynamic treatment regimes in oncology and other disease areas are often characterized by an alternating sequence of treatments or other actions and transition times between disease states. The sequence of transition states may vary substantially from patient to patient, depending on how the regime plays out, and in practice there often are many possible counterfactual outcome sequences. For evaluating the regimes, the mean final overall time may be expressed as a weighted average of the means of all possible sums of successive transition times. A common example arises in cancer therapies, where the transition times between various sequences of treatments, disease remission, disease progression, and death characterize overall survival time. For the general setting, I propose estimating the mean overall outcome time by assuming a nonparametric Bayesian survival regression for the transition times. I construct a dependent Dirichlet process prior with a Gaussian process base measure (DDP-GP) and summarize the joint posterior distribution by Markov chain Monte Carlo (MCMC) posterior simulation. I then use likelihood-based G-estimation under the DDP-GP model to draw causal inferences, accounting for all possible outcome paths, the transition times between successive states, and the effects of covariates and previous outcomes on each transition time. The Bayesian paradigm works very well, and the simulation studies suggest that our DDP-GP method yields more reliable estimates than the inverse probability of treatment weighting (IPTW) method.

e-mail: [email protected]

117. Design of Multiregional Clinical Trials: Theory and Practice

RANDOM EFFECTS MODELS FOR MULTIREGIONAL CLINICAL TRIAL DESIGN AND ANALYSIS

Gordon Lan*, Janssen Research & Development

In recent years, developing pharmaceutical products via a multiregional clinical trial (MRCT) has become more popular. Many studies with proposals on the design and evaluation of MRCTs under the assumption of a common treatment effect across regions have been reported in the literature. However, heterogeneity among regions causes concern that a fixed effects model for combining information may not be appropriate for MRCTs. In this presentation, we discuss the use of a continuous random effects model and a discrete random effects model for the design and evaluation of MRCTs. Many numerical examples will be provided to illustrate the fundamental differences between these two random effects approaches.

e-mail: [email protected]
Many MUM LIKELIHOOD APPROACH FOR gap times) are often of interest in clinical numerical examples will be provided to REGRESSION ANALYSIS OF BIVARI- and epidemiologic studies. While many illustrate the fundamental differences ATE INTERVAL-CENSORED FAILURE methods exist for estimating the effect of between these two random effects TIME DATA covariates on each gap time, relatively approaches. Qingning Zhou*, University of Missouri few methods have targeted comparisons e-mail: [email protected] Tao Hu, Capital Normal University between the gap times themselves. Motivated by the comparison of primary Jianguo Sun, University of Missouri

METHODS FOR CONTRASTING GAP TIME HAZARD FUNCTIONS

Xu Shu*, University of Michigan
Douglas E. Schaubel, University of Michigan

Times between successive events (i.e., gap times) are often of interest in clinical and epidemiologic studies. While many methods exist for estimating the effect of covariates on each gap time, relatively few methods have targeted comparisons between the gap times themselves. Motivated by the comparison of primary and repeat organ transplantation, our interest is specifically in comparing the gap-time-specific hazard functions. We propose a two-stage procedure, wherein the first stage involves a Cox regression model on the first gap time. Weighted estimating equations are then solved at the second stage to compare the first and second gap time hazard functions. Large-sample properties are derived, with simulation studies carried out to evaluate finite-sample performance. We apply the proposed methods to kidney transplant data obtained from a national organ transplant registry.

e-mail: [email protected]

USING FULL COHORT INFORMATION TO IMPROVE THE EFFICIENCY OF MULTIVARIATE MARGINAL HAZARD MODEL FOR CASE-COHORT STUDIES

Hongtao Zhang*, University of North Carolina, Chapel Hill
Jianwen Cai, University of North Carolina, Chapel Hill
Haibo Zhou, University of North Carolina, Chapel Hill
David Couper, University of North Carolina, Chapel Hill

The case-cohort design is widely used in large cohort studies when it is prohibitively costly to measure some exposures for all subjects in the full cohort, especially in studies where the disease rate is low. To investigate the effect of a risk factor on different diseases, multiple case-cohort studies using the same subcohort are usually conducted. To compare the effect of a risk factor on different types of diseases, times to different disease events need to be modeled simultaneously. Existing case-cohort estimators for multiple disease outcomes utilize only the relevant covariate information in cases and subcohort controls, though many covariates are measured for everyone in the full cohort. Intuitively, making full use of the relevant covariate information can improve efficiency. To this end, we consider a class of doubly-weighted estimators for both regular and generalized case-cohort studies with multiple disease outcomes. The asymptotic properties of the proposed estimators are derived, and our simulation studies show that a gain in efficiency can be achieved with a properly chosen weight function. We illustrate the proposed method with a data set from the Atherosclerosis Risk in Communities (ARIC) study.

e-mail: [email protected]
MARGINAL MODELS FOR RESTRICTED MEAN SURVIVAL WITH CLUSTERED TIME TO EVENT DATA USING PSEUDO-VALUES

Brent R. Logan*, Medical College of Wisconsin
Kwang Woo Ahn, Medical College of Wisconsin

Many time-to-event studies are complicated by the nesting of individuals within a cluster, such as patients in the same center in a multi-center study. These clustered data can be readily handled within the Cox model framework; however, when the proportional hazards assumption is violated, the hazard ratio is not easily interpretable. Restricted mean survival is an alternative summary measure that has a useful clinical interpretation as the expected life years up to a pre-specified time point. While a number of techniques have been described for modeling restricted mean survival with independent observations, little work has been done for clustered data. We apply pseudo-value regression to the clustered data framework and show that it provides a marginal model for the restricted mean survival parameter. We compute leave-one-out pseudo-observations from estimates of the restricted mean survival. These are used in a generalized estimating equation to model the marginal restricted mean survival and to obtain consistent estimates of the model parameters. The method is easy to implement using standard software once the pseudo-values are obtained, and simulation studies show that the method has good operating characteristics. We illustrate the method using a bone marrow transplantation example.

e-mail: [email protected]
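The pseudo-value construction sketched above can be written compactly in generic notation (not the authors'):

```latex
% Restricted mean survival time up to \tau and its leave-one-out pseudo-values:
\[
  \theta = E\bigl[\min(T,\tau)\bigr] = \int_0^{\tau} S(t)\, dt, \qquad
  \hat\theta_i = n\,\hat\theta - (n-1)\,\hat\theta^{(-i)},
\]
% where \hat\theta integrates a Kaplan-Meier type survival estimate and
% \hat\theta^{(-i)} omits subject i.  The \hat\theta_i then replace the
% incompletely observed min(T_i,\tau) as outcomes in a GEE with a working
% correlation structure for subjects in the same cluster.
```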

SEMI-PARAMETRIC MODELING OF BIVARIATE RECURRENT EVENTS

Jing Yang*, Emory University
Limin Peng, Emory University

Recurrent events are frequently observed in biomedical studies, and often they involve more than one type of event of interest. Marginal analysis of each type of recurrent event is useful but cannot address questions about the relationship between different types of recurrent events. In this work, we study a dynamic association model that extends a recently developed quantile association model. Our estimating equations are constructed based on the stochastic processes embedded in the bivariate recurrent events data. The proposed estimation can be implemented by an efficient and stable algorithm. We investigate the asymptotic properties of the proposed estimator and develop appropriate inference procedures. Our proposals are illustrated via simulation studies and an application to a registry dataset.

e-mail: [email protected]

ANALYSIS OF A COMPOSITE ENDPOINT UNDER DIFFERENT CENSORING SCHEMES FOR COMPONENT EVENTS VIA MULTIPLE IMPUTATION

Yuqi Chen*, University of California, Santa Barbara
Chunlei Ke, Amgen Inc.
Jianming Wang, Celgene Corporation

A composite endpoint is often used as the endpoint of primary interest, for various reasons, in clinical trials. It may happen that the component events are monitored or collected in different ways, potentially leading to different censoring schemes among the components. It then becomes challenging to define the time variable for the composite endpoint to be used for analysis. Some ad-hoc methods impute or define the time variable based on that of the component events, which may be inefficient or involve strong assumptions. In this article, we propose three multiple-imputation-based methods under a monotone censoring scheme: imputing the event time marginally using Kaplan-Meier estimates; imputing based on a Cox proportional hazards model; and imputing the event time of one component event based on a Kaplan-Meier estimate conditional on the other event. Inference procedures are developed for estimating the survival distribution and quantile survival times, comparing survival distributions, and estimating covariate effects. We compare the proposed methods with some ad-hoc methods through simulations. Simulation results show that the multiple imputation methods perform consistently well, while the performance of the ad-hoc methods depends on the simulation settings. We also apply the proposed methods to a real data example.

e-mail: [email protected]

QUANTILE REGRESSION FOR SURVIVAL DATA WITH DELAYED ENTRY

Boqin Sun*, University of Massachusetts, Amherst
Jing Qian, University of Massachusetts, Amherst

Delayed entry arises frequently in follow-up studies for survival outcomes, where additional study subjects enter during the study period. We propose a quantile regression model to analyze survival data subject to delayed entry and right censoring. Such a model offers flexibility in assessing covariate effects on the survival outcome, and the regression coefficients are interpretable as direct effects on the event time. Under the conditional independent censoring assumption, we propose a weighted martingale-based estimating equation and formulate the solution finding as an L1-type convex optimization problem, which is solved through a linear programming algorithm. We establish uniform consistency and weak convergence of the resultant estimators, and develop and justify a resampling inference procedure for variance and covariance estimation. The finite-sample performance of the proposed method is demonstrated via simulation studies, and the method is illustrated through an application to a clinical study.

e-mail: [email protected]
119. CONTRIBUTED PAPERS: Constrained Inference

ORDER STATISTICS FROM LINDLEY DISTRIBUTION AND THEIR APPLICATIONS

Khalaf S. Sultan*, College of Science, King Saud University, Saudi Arabia
Wafaa S. AL-Thubyani, College of Science, King Saud University, Saudi Arabia

We derive exact expressions for the single and product moments of order statistics from the Lindley distribution. We then use these moments to obtain the best linear unbiased estimates (BLUEs) of the location and scale parameters based on Type-II censoring. We also utilize the single and product moments to develop a correlation goodness-of-fit test for the Lindley distribution, and we calculate the power of the test under some alternative distributions. To show the usefulness of these findings, we carry out Monte Carlo simulations. Finally, we discuss some applications based on real data sets.

e-mail: [email protected]
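For reference, the Lindley density and the generic expression for single moments of order statistics from which such derivations start (standard forms, in our notation):

```latex
\[
  f(x;\theta) = \frac{\theta^2}{1+\theta}\,(1+x)\,e^{-\theta x}, \qquad x>0,\ \theta>0,
\]
\[
  E\bigl[X_{r:n}^{\,k}\bigr]
  = \frac{n!}{(r-1)!\,(n-r)!}\int_0^\infty x^k\,
    F(x)^{\,r-1}\,\bigl[1-F(x)\bigr]^{\,n-r} f(x)\, dx ,
\]
% where F is the Lindley distribution function corresponding to f.
```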

CLME: A TOOL FOR INFERENCE IN LINEAR MIXED EFFECTS MODELS UNDER INEQUALITY CONSTRAINTS

Casey M. Jelsema*, National Institute of Environmental Health Sciences, National Institutes of Health
Shyamal D. Peddada, National Institute of Environmental Health Sciences, National Institutes of Health

In many applications, researchers are interested in testing for inequality constraints in the context of linear fixed effects and mixed effects models; for example, a researcher may wish to test for an increasing response over increasing dose levels. Popular procedures such as ANOVA only test for differences, whereas estimation subject to linear inequality constraints can often yield greater power to detect an effect. Furthermore, while there exists a large body of literature on statistical inference for linear models subject to inequality constraints, user-friendly statistical software implementing such methods is lacking. We develop a package in the R language, CLME, which can be used for testing a broad collection of inequality constraints. It uses residual-bootstrap-based methodology that is reasonably robust to non-normality as well as heteroscedasticity. The package contains a graphical interface built using the shiny package, enabling a researcher with minimal knowledge of R to easily take advantage of the methods implemented in CLME. We illustrate the package using real-world datasets and demonstrate the graphical interface.

e-mail: [email protected]
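A typical instance of the problem the package addresses, written in generic notation rather than the package's own syntax: a linear mixed model whose fixed effects are tested against an ordered alternative.

```latex
\[
  y = X\theta + Zb + \varepsilon, \qquad
  b \sim N(0, \sigma_b^2 I), \quad \varepsilon \sim N(0, \sigma^2 I),
\]
\[
  H_0:\ \theta_1 = \theta_2 = \cdots = \theta_p
  \quad \text{versus} \quad
  H_1:\ \theta_1 \le \theta_2 \le \cdots \le \theta_p
  \ \text{(at least one inequality strict)},
\]
% one example of a constraint pattern that can be encoded as A\theta \ge 0
% and assessed with a residual-bootstrap test.
```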
ORDER-CONSTRAINED BAYESIAN NONPARAMETRIC MODELING OF CORRELATED THREE-WAY ROC SURFACES

Beomseuk Hwang*, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Zhen Chen, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

In many multiple diagnostic tests, some a priori constraints may exist. Also, the true disease status often has more than two levels, e.g., stages of endometriosis. In this case, traditional binary diagnostic measures such as the ROC curve and the AUC need to be extended to handle classification problems with three or more categories while respecting the constraints. We propose a nonparametric Bayesian joint modeling framework for three-dimensional ROC surfaces that accounts for stochastic and variability orders. The stochastic order constrains the distributional centers of the three disease populations within each test, while the variability order constrains the distributional spreads of the multiple tests within each of the three populations. We demonstrate the performance of the proposed approach using data from the Physician Reliability Study, which investigated the accuracy of diagnosing endometriosis. To address the issue of no gold standard in the real data, we use a sensitivity analysis approach that exploits diagnostic results from a panel of experts.

e-mail: [email protected]

PARTIAL LIKELIHOOD ESTIMATION OF ISOTONIC PROPORTIONAL HAZARDS MODELS

Yunro Chung*, University of North Carolina, Chapel Hill
Anastasia Ivanova, University of North Carolina, Chapel Hill
Michael Hudgens, University of North Carolina, Chapel Hill
Jason Fine, University of North Carolina, Chapel Hill

We consider estimation of the semiparametric proportional hazards model with a completely unspecified baseline hazard function, where the effect of a continuous covariate is assumed monotone but otherwise unspecified. Previous work on full nonparametric maximum likelihood estimation for isotonic Cox proportional hazards regression with right-censored data is computationally intensive, lacking in theoretical justification, and may be prohibitive in large samples. We study partial likelihood estimation. An iterative quadratic programming method (IQM) is proposed, with theoretically justified convergence properties. However, unlike with likelihoods for isotonic parametric regression models, the IQM for the partial likelihood cannot be implemented using standard pool adjacent violators techniques, increasing the computational burden.

An alternative pseudo iterative convex minorant algorithm (PICM) is presented which exploits such techniques and is also theoretically justified. The algorithms are extended to models with time-dependent covariates. Analysis of real data illustrates the practical utility of the isotonic methodology in estimating nonlinear covariate effects.

email: [email protected]

NONPARAMETRIC TESTS OF UNIFORM STOCHASTIC ORDERING

Chuan-Fa Tang*, University of South Carolina
Joshua M. Tebbs, University of South Carolina
Dewei Wang, University of South Carolina

We derive nonparametric procedures for testing for and against uniform stochastic ordering in the two-population setting with continuous distributions. We account for this ordering by examining the least star-shaped majorant of the ordinal dominance curve formed from the nonparametric maximum likelihood estimators of the continuous distribution functions F and G. In particular, we focus on testing equality of F and G versus uniform stochastic ordering, and on testing for a violation of uniform stochastic ordering. For both testing problems, we propose a family of Lp norm statistics, derive appropriate limiting distributions, and provide simulation results that characterize the performance of our procedures. We illustrate our methods using data from a study involving premature infants and the occurrence of necrotizing enterocolitis.

email: [email protected]

COVARIATE BALANCED RESTRICTED RANDOMIZATION: OPTIMAL DESIGNS, EXACT TESTS, AND ASYMPTOTIC PROPERTIES

Jingjing Zou*, Columbia University
Jose R. Zubizarreta, Columbia University

In randomized experiments, the act of randomly assigning units to treatment (i) physically induces a distribution that can be used for exact testing and (ii) ensures that both observed and unobserved covariates are balanced across the treatment groups in expectation. However, in a given realization of the random assignment mechanism, the covariates may exhibit considerable imbalances due to chance, especially if the experiment has a small number of units. To address this limitation while explicitly using the randomization distribution for inference, in this paper we propose a new method for achieving strong forms of balance on the observed covariates and develop a procedure for conducting exact inferences based on a covariate balanced restricted randomization distribution. We derive asymptotic results for the procedure and illustrate its use on an important randomized experiment that was used for targeting the poor in Indonesia. It is demonstrated through both theoretical results and simulation studies that the proposed method has higher power than unrestricted methods.

email: [email protected]

120. CONTRIBUTED PAPERS: Nonparametric Methods

NONPARAMETRIC AND SEMIPARAMETRIC ESTIMATION IN MULTIPLE COVARIATES

Richard Charnigo*, University of Kentucky
Limin Feng, Intel Corporation
Cidambi Srinivasan, University of Kentucky

We consider the problem of simultaneously estimating a mean response function and its partial derivatives when the mean response function depends nonparametrically on two or more covariates. To address this problem, we propose a "compound estimation" approach, in which differentiation and estimation are interchangeable: an estimated partial derivative is exactly equal to the corresponding partial derivative of the estimated mean response function. Compound estimation yields essentially optimal convergence rates and exhibits substantially smaller squared error in finite samples compared to local regression. We also explain how to employ compound estimation under more general circumstances, when the mean response function depends parametrically on some additional covariates and the observations are not statistically independent.

In a case study, we apply compound estimation to examine how the progression of Parkinson's disease may relate to a subject's age and the signal fractal scaling exponent of the subject's recorded voice. Especially among those intermediate in age, an abnormal signal fractal scaling exponent may portend greater symptom progression.

email: [email protected]

NONPARAMETRIC EMPIRICAL BAYES VIA MAXIMUM LIKELIHOOD FOR HIGH-DIMENSIONAL CLASSIFICATION

Lee H. Dicker, Rutgers University
Sihai D. Zhao, University of Illinois, Urbana-Champaign
Long Feng*, Rutgers University

Nonparametric empirical Bayes methods are naturally suited for many problems in high-dimensional statistics. The Kiefer-Wolfowitz (1956) nonparametric maximum likelihood estimator (NPMLE) for mixture models provides an elegant approach to some of these problems. However, implementation and theoretical analysis of the Kiefer-Wolfowitz NPMLE are notoriously difficult. Recently, Koenker and Mizera (2013) proposed a fast method for approximately computing the Kiefer-Wolfowitz NPMLE based on convex optimization. This computational breakthrough has greatly simplified the application of NPMLE-based empirical Bayes methods. In this talk, we propose some novel applications of the Kiefer-Wolfowitz NPMLE to high-dimensional classification problems. In simulated data, these methods dramatically outperform well-known alternative methods (by an order of magnitude, in some instances); in real genomic data analyses, the NPMLE methods appear to be very competitive. Theoretical results will also be discussed. This is joint work with Sihai Dave Zhao and Long Feng.

email: [email protected]
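The Kiefer-Wolfowitz NPMLE referred to above, in the discretized form that turns it into a finite-dimensional convex program (the grid and notation are illustrative):

```latex
% Gaussian location mixture: observe X_i with density \int \varphi(x-\mu)\, dG(\mu).
\[
  \hat G = \arg\max_{G} \sum_{i=1}^n \log \int \varphi(X_i - \mu)\, dG(\mu)
  \;\approx\;
  \arg\max_{w \ge 0,\ \sum_j w_j = 1} \sum_{i=1}^n \log \sum_{j=1}^m w_j\, \varphi(X_i - u_j),
\]
% a concave maximization over the probability simplex once the support is
% restricted to a fixed grid u_1, \dots, u_m; the fitted \hat G then feeds
% empirical Bayes rules such as posterior means or classification scores.
```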
NONPARAMETRIC INFERENCE FOR AN INVERSE-PROBABILITY-WEIGHTED ESTIMATOR WITH DOUBLY TRUNCATED DATA

Xu Zhang*, University of Mississippi Medical Center

Efron and Petrosian (1999) formulated the problem of double truncation and proposed nonparametric methods for testing and estimation. An alternative estimation method was proposed by Shen (2010), utilizing the inverse-probability-weighting technique. One aim of this paper is to assess the computational complexity of the existing estimation methods; through a simulation study, we found that the two estimation methods have the same level of computational efficiency. The other aim is to study the non-iterative IPW estimator under the condition that the truncation variables are independent. The IPW estimator and the associated interval estimation were shown to perform satisfactorily in the simulation study.

email: [email protected]

A TEST FOR DIRECTIONAL DEPARTURE FROM LOEWE ADDITIVITY

Mingyu Xi*, University of Maryland, Baltimore County

In assessing the impact of exposure to chemical mixtures, scientists often assume simplified models in which the combined effect of the chemicals is the sum of the individual effects. One such assumption is Loewe additivity. In practice, however, this is often violated due to positive (synergistic) or negative (antagonistic) interaction. If the combined effect of the mixture is more potent than the simple sum of individual effects, the interaction is positive; if less potent, the interaction is negative. Only parametric model-based tests are available in the literature for testing directional interaction. However, models for less-analyzed mixtures may not be readily available and hence may be grossly misspecified, compromising the power of the test. Based on the observed contour profiles, we propose a novel nonparametric test for directional interaction. The test is shown to be robust and is applied to the motivating example of testing for directional interaction among common battery waste chemicals such as nickel, cadmium and chromium.

email: [email protected]

ESTIMATION AND CONFIDENCE BANDS FOR NONPARAMETRIC REGRESSION WITH FUNCTIONAL RESPONSES AND MULTIPLE SCALAR COVARIATES

Andrada E. Ivanescu*, Montclair State University

The proposed nonparametric functional regression methodology accounts for nonlinear dependence on several scalar predictors in a modeling approach that explains changes in responses consisting of functionally sampled data. The goal is to achieve adaptive inference, and the pathway is to establish thresholded estimators and to construct nonparametric confidence bands through a sparse estimation strategy that uses a data-supplied approach to determine the threshold levels. Applications to data analysis and simulation results illustrate the performance of the proposed implementations.

email: [email protected]

NONPARAMETRIC BAYESIAN ANALYSIS OF THE 2 SAMPLE PROBLEM WITH CENSORING

Kan Shang*, University of Minnesota
Cavan Sheerin Reilly, University of Minnesota

Testing for differences between two groups is a fundamental problem in statistics, and due to developments in Bayesian nonparametrics and semiparametrics there has been renewed interest in approaches to this problem. Here we describe a new approach to developing such tests and introduce a class of tests that takes advantage of developments in Bayesian nonparametric computing. This class of tests uses the connection between the Dirichlet process prior and the Wilcoxon rank sum test, but extends this idea to the mixture of Dirichlet processes model. Given consistency results for this class of models, we develop tests that have appropriate frequentist sampling properties in large samples but have the potential to outperform the usual frequentist tests. Extensions to interval and right censoring are considered, and an application to a high-dimensional data set obtained from an RNA-Seq investigation demonstrates the practical utility of the method.

email: [email protected]


Index

Abarin, Taraneh  | 32 Asar, Ozgur  | 20 Abner, Erin  | 47, 72 Awadalla, Saria S.  | 99 Aboukhamseen, Suja  | 71 Ayala, Guadalupe X.  | 59 Adomavicius, Gediminas  | 78 Baccarelli, Andrea A.  | 56 Aguilar, Ruth  | 68 Bacher, Rhonda L.  | 6m Aharoni, Ehud  | 37 Baek, Jonggyu  | 71 Ahn, Jeongyoun  | 18 Baek, Songjoon  | 87 Ahn, Kwang Woo  | 86, 118 Bai, Jiawei  | 59 Ahn, Mihye  | 1j Bai, Shasha  | 55 Ahn, Mihye  | 2i Bair, Eric  | 6d, 6h, 35g Airan, Raag  | 43 Bakoyannis, Giorgos  | 16 al’Absi, Mustafa  | 41 Bakshi, Rohit  | 2d Albert, Paul S.  | 20, 47, 58 Baladandayuthapani, Veerabhadran 9d, 45, 81, 120 Alexeeff, Stacey E.  | 17 Balasubramanian, Sriram  | 1l Allen, Genevera I.  | 35m, 80 Ballentyne, Rachel  | 31 AL-Marshadi, Ali H.  | 8j Ballinger, Mandy L.  | 97 AL-Thubyani, Wafaa S.  | 119 Ballman, Karla  | 18 Altintas, Ilkay  | 53 Bandyopadhyay, Dipankar  | 99 Alvarez-Esteban, Pedro  | 59 Bandyopadhyay, Sunayan  | 78 Ambrogi, Federico  | 114 Banerjee, Anjishnu  | 8i Amos, Christopher  | 61g Banerjee, Moulinath  | 85 Anderson, Keaven  | T1 Banerjee, Sayantan  | 9d Anderson, Mark C.  | 2d Banerjee, Sudipto  | 17, 19, 99 Anderson, Stewart J.  | 44 Bao, Junshu  | 94 Andrew, Michael E.  | 94 Bao, Le  | 90 Andridge, Rebecca R.  | 3c, 21 Barnes, Kathleen C.  | 26 Ankerst, Donna Pauler  | 1c, 97, 108 Baro, Elande  | 46 Antonelli, Joseph L.  | 22 Barrdahl, Myrto  | 95 Aponte, John  | 68 Barry, Chris  | 6i Arab, Ali  | 39 Bartsch, Andreas J.  | 2k Arnold, Susan  | 99

Program & Abstracts 383 Basu, Cynthia  | 36 Boone, Edward L.  | 115 Cao, Yumei  | 7k Basu, Sanjib  | 89 Boonstra, Philip S.  | 106 Capuano, Ana W.  | 20 Basu, Saonli  | 61d, 61r, 64 Bosch, Ronald J.  | 24 Carlin, Bradley P.  | 3m, 36 Basu, Sumanta  | 27 Bose, Maitreyee  | 22 Carmichael, Owen  | 38 Bauman, Julie  | 14 Bott, Marjorie J.  | 42 Carnethon, Mercedes  | 59 Bebu, Ionut  | 48k Bowman, DuBois  | 2a Carroll, Margaret Devers  | 51 Beck, J. Robert  | 3g Bradic, Jelena  | 67 Carroll, Raymond J.  | 8f, 95 Beck, James D.  | 30 Braun, Danielle  | 107 Carroll, Regina A.  | 3f Beckerman, Bernardo  | 17 Braun, Thomas M.  | 14, 44, 85 Cavanaugh, Joseph  | 1k Beer, Joanne C.  | 6f Breidt, F. Jay  | 109 Celniker, Susan  | 96 Begg, Colin B.  | 29, 88 Brinkman, Ryan 6j Chai, Hao  | 74 Begg, Melissa D.  | 92 Broman, Karl W. 95 Chalise, Prabhakar  | 56 Bekelman, Justin  | 5e Brookmeyer, Ron  | 40 Chan, Kwun Chuen Gary  | 60 Bellach, Anna  | 16 Brooks, John M.  | 10 Chan, Wenyaw  | 72 Belousov, Anton  | 108 Brooks, Maria  | 20 Chang, Changgee  | 70 Bengtsson, Henrik  | 52 Brown, Elizabeth  | 24 Chang, Chung-Chou H.  | 16 Benjamin, Sara E.  | 113 Browning, Sharon R.  | 64 Chang, Mark  | 48f Benjamini, Yuval  | 12 Brownstein, Naomi C.  | 6h Chapman, Cole G.  | 10 Benoit, Julia  | 72 Bruckner, Mathew  | 3f Chatterjee, Arpita  | 7b, 48m, 48p Bentil, Ekua  | 98 Brumback, Babette A.  | 30 Chatterjee, Nilanjan  | 8f, 64, 95 Beresovsky, Vladislav  | 114 Bryant, Christopher  | 1j Chatterjee, Somak  | 44 Bernhardt, Paul W.  | 34 Buchanan, Ashley L.  | 5f Chattopadhyay, Pratip  | 6j Bernstein, Jason  | 53 Buck Louis, Germaine M.  | 44 Chatu, Sukhdev | 46 Berrocal, Veronica J.  | 2k, 71, 82 Budenz, Donald L.  | 71 Chaurasia, Ashok K.  | 58 Berry, Donald A.  | 11 Bugbee, Bruce D.  | 45 Chawla, Akshita  | 84 Berry, Scott M.  | 33, 36, 48g Buhule, Olive D.  | 20 Che, Xuan  | 107 Betensky, Rebecca  | 4a, 83 Bull, Shelley B.  | 95 Chekouo, Thierry  | 88 Bi, Xuan  | 84 Burchett, Woodrow W.  | 85 Chen, Fang  | T5 Biau, Gerard  | 62 Burchfiel, Cecil M.  | 94 Chen, Fei  | 117 Bilder, Christopher R.  | 97 Burgette, Lane F.  | 113 Chen, Gong  | 108 Billard, Lynne  | 109 Caffo, Brian  | 2m, 5b, 43, 87, 109 Chen, Guanhua  | 58 Billig, Erica  | 7i Cai, Bo  | 34 Chen, Han  | 61o Bimali, Milan  | 83 Cai, Jianwen  | 4l, 34, 118 Chen, James J.  | 8e Bjornstad, Ottar N.  | 82 Cai, Tianxi  | 16, 35p, 50, 90 Chen, Jarvis  | 5g Blair, Aaron  | 17 Cai, Tony  | 31 Chen, Jia-Yuh  | 44 Bleich, Justin  | 62 Cai, Yi  | 21 Chen, Joshua  | 117 Bliznyuk, Nikolay  | 7g Cai, Zhuangyu  | 30 Chen, Jun  | 18, 35c Boatman, Jeffrey A.  | 5c Cao, Guanqun  | 45 Chen, Li  | 17 Boehnke, Michael L.  | 61b, 61h Cao, Jiguo  | 96, 115 Chen, Lin S.  | 37, 84 Boerwinkle, Eric  | 61f Cao, Sherry  | 6k Chen, Meiyang  | 105 Bojadzieva, Jasmina  | 97 Cao, Yuan  | 43 Chen, Mengjie  | 37 Bondell, Howard  | 8g Cao, Yuanpei  | 35d Chen, Ming-Hui  | 111

384 ENAR 2015 | Spring Meeting | March 15–18 Chen, Oliver  | 5b Chu, Li-Fang  | 6i Davis, Sonia M.  | 59, 91 Chen, Qingxia  | 111 Chu, Wanghuan  | 70 Dawson, Jeffrey D.  | 20 Chen, Rongqi  | 114 Chua, Alicia S.  | 2d de Castro, Mario  | 111 Chen, Rui  | 72 Chung, Dongjun  | 31, 110 de Koning, Harry  | 11 Chen, Shaojie  | 43, 109 Chung, Moo K.  | 25 de los Campos, Gustavo  | 6b Chen, Sheau-Chiann  | 48i Chung, Yunro  | 119 de Somer, Marc L.  | 71 Chen, Tian  | 8g Ciarleglio, Adam  | 35i DeBoer, Mark D.  | 94 Chen, Tian  | 72 Colantuoni, Elizabeth A.  | 48o DeGruttola, Victor  | 69 Chen, Wei  | 61g, 61h, 95 Cole, Stephen R.  | 5f Delamater, Alan M.  | 59 Chen, Xiwei  | 18 Coley, Yates  | 77 DeMauro, Sara B.  | 42 Chen, Yakuan  | 45 Conaway, Mark R.  | 48b DeMets, Dave  | 75 Chen, Yang  | 53 Connett, John  | 99 Deng , Yanzhen  | 35a Chen, Yeh-Fong  | 48f Conway, Baqiyyah N.  | 94 Deng, Yi  | 84 Chen, Yeh-Fong  | 91 Coogan, Patricia  | 17 Deng, Yu  | 34, 59 Chen, Yi-Fan  | 20 Cook, Richard J.  | 30, 93, 112 Devanarayan, Viswanath  | 108 Chen, Yi-Hau  | 8f Cook, Tyler  | 60 Dewey, Blake  | 2c Chen, Ying Q.  | 65 Coombes, Brandon J.  | 61d Dey, Dipak K.  | 34, 94 Chen, Yong  | 21, 61l, 95 Cornea, Emil A.  | 85, 111 Diah, Jonathan (JJ) H.  | 19 Chen, Yu  | 57 Cottrell, Lesley  | 7f Diao, Guoqing  | 111 Chen, Yue-Ming  | 98 Coull, Brent A.  | 5g Diaz, Francisco J.  | 68 Chen, Yuqi  | 118 Couper, David  | 118 Díaz, Iván  | 36 Chen, Zhe  | 1g Cox, Nancy  | 37 Diener-West, Marie  | 92 Chen, Zhen  | 107, 119 Crainiceanu, Ciprian M.  | 2b, 2c, Diergaarde, Brenda  | 14 2m, 5b, 9f, 9j, 34, 41, 45, 59, T4 Chen, Zhijian  | 95 Diez Roux, Ana V.  | 31, 64 Craiu, Radu V.  | 95 Cheng, Cheng  | 18 Diggle, Peter J.  | 20 Cramer, Steven C.  | 57 Cheng, Joyce  | 9i Ding, Wei  | 84 Cui, Jike  | 6k Cheng, Longjie  | 70 Ding, Ying  | 86 Cui, Yuehua  | 43, 61c, 61n, 74 Cheng, Wenting  | 9b Do, Kim-Anh  | 71, 88, 97 Cunanan, Kristen May  | 3l Cheng, Yansong  | 30 Dobaño, Carlota  | 68 Cutter, Gary R.  | 48q Cheng, Yichen  | 26 Dobbin, Kevin K.  | 55 Dai, James Y.  | 24, 26 Cheung, Ken  | 14, 48b Doecke, James  | 88 Dai, Tian  | 55 Chi, Yunchan  | 48i Dominici, Francesca  | 17, 73 Dai, Wei  | 50 Chien, Jeremy  | 56 Dong, Jun  | 111 Daley, Christine  | 33 Chinchilli, Vernon M.  | 44 Doove, Lisa L.  | 33 Dang, Xibei  | 6h Chipman, Jonathan  | 107 Doss, Hani  | 1g Daniels, Michael J.  | 2l, 73, 116 Chitnis, Tanuja  | 2d Drake, Daniel  | 2a Das, Ritabrata  | 85 Choi, Dongseok  | 6f Draper, David  | 77, SC1 Dasgupta, Sayan  | 106 Choi, Hee Min  | 87 Du, Fang  | 110 Datta, Abhirup  | 19 Choi, Wansuk  | 22 DuMouchel, William  | 15 Davidian, Marie  | 108, SC4, R9 Choi, Won  | 33 Dunbrack Jr., Roland L.  | 3g Davidson, Philip W.  | 17 Chouldechova, Alexandra  | 67 Dunson, David  | 9a Davis, Barry  | 60 Chu, Haitao  | 21, 69 Durkalski, Valerie | 48j

Program & Abstracts 385 Dusseldorp, Elise | 33 Fisher, Aaron | 87 Genton, Marc G. | 57 Eberly, Lynn E. | 2f Fisher, William | 96 George, Edward I. | 62 Egleston, Brian L. | 3g Fitzgerald, Anthony P. | 20 Gerson, Jason | 10 Egorova, Svetlana | 2d Fletez-Brant, Kipper | 6j Gertheiss, Jan | 94 Elmi, Angelo | 58 Flournoy, Nancy | 68 Geyer, Charles J. | 87 Eloyan, Ani | 2c, T4 Fonseca, Miguel S. | 96 Ghebrehawariat, Kidane B. | 86 Eltinge, John L. | 51 Fortin, Jean-Philippe | 56 Ghosh, Joyee | 87 Engel, Lawrence | 17 Fotouhi, Ali Reza | 58 Ghosh, Malay | 8a Englehardt, Barbara | 26 Foulkes, Andrea S. | 31 Ghosh, Samiran | 48m, 48p Epstein, David | 41 Franceschini, Nora | 61f Ghosh, Santu | 48m, 48p Erion, Gabriel | 69 Frangakis, Constantine | 16 Ghosh, Souparno | 7e Ertefaie, Ashkan | 33 Freeland, Katherine E. | 5d Ghosh, Sujit | 22 Ertin, Emre | 41 French, Benjamin | 7h Giannobile, William V. | 85 Etzioni, Ruth | 11 Fricks, John | 53 Gillespie, Scott | 3i Euán, Carolina | 59 Fridley, Brooke L. | 56, 83 Giovanello, Kelly Sullivan | 57 Evani, Bhanu Murthy | 73 Fridlyand, Jane | 48e Gipson, Philip S. | 7e Evans, Katie | 59 Frise, Erwin | 96 Glynn, Nancy W. | 9f Evenson, Kelly R. | 59 Fu, Haoda | 3m, 33 Gneiting, Tilmann | 28 Faes, Christel | 1b Fu, Rong | 81 Goday, Praveen S. | 7k Fan, Ailin | 33 Fu, Zhixuan | 4g Goldsmith, Jeff | 45 Fan, Jianqing | 12 Fuentes, Claudio | 74 Goldstein, Joshua | 69, 82 Fan, Liqiong | 48l Fuentes, Montserrat | 82 Gonen, Mithat | R5 Fan, Ruzong | 61g, 61h Fuller, Wayne A. | 114 Gonzalez, Joe Fred | 51 Fan, Yong | 2i Furey, Terrence S. | 37 Gosik, Kirk | 98 Fang, Xingyuan | 10 Gaile, Daniel P. | 18 Grandhi, Anjana | 86 Farewell, Vern | 93 Gajewski, Byron J. | 33, 42 Grantz, Katherine | 20 Favorov, Alexander V. | 9h Galagate, Douglas | 73 Graubard, Barry I. | 42 Fei, Zhe | 56 Gallagher, Colin M. | 69 Grazier-G’sell, Max | 67 Feingold, Eleanor | 6e, 95 Gamage, Purna S. | 7e Greene, Tom H. | 104 Feng, Cindy | 115 Gao, Bin | 43 Greenwood, Celia M. T. | 56 Feng, Limin | 120 Gao, Lei | 3j Grego, John M. | 66 Feng, Long | 120 Garcia, Tanya P. | 99 Griffin, Felicia R. | 7l Feng, Qianjing | 105 Garrard, Lili | 42 Grill, Diane | 18 Feng, Rui | 42 Gaynor, Sheila | 18 Grill, Sonja | 108 Feng, Yang | 12 Gebregziabher, Mulugeta | 1d Groth, Caroline P. | 17 Feng, Ziding | 81, 97 Gelernter, Joel | 31, 110 Groves, Eric | 48n Ferrari, Matthew | 69 Gelfand, Alan E. | 82 Grubesic, Tony H. | 71 Ferrell, Rebecca | 15 Gellar, Jonathan E. | 34 Gu, Chiyu | 120 Fertig, Elana J. | 9h, 56 Geller, Nancy L. | 49 Gu, Mengyang | 82 Feuer, Eric J. | 11 Gelman, Andrew | 42 Gu, Quanquan | 43 Fine, Jason P. | 16, 60, 89, 119 Geng, Ziqan | 37 Guan, Yongtao | 19 Finlayson, Teresa | 51 Gennings, Chris | 73 Guha, Sharmistha | 61d

386 ENAR 2015 | Spring Meeting | March 15–18 Guha, Subharup | 81 He, Tao | 61n Hu, Bo | 104 Guinness, Joseph | 82 He, Xin | 13 Hu, Fengjiao | 56 Gulati, Roman | 11 He, Xuming | 54, 70 Hu, Jianhua | 54, 88 Gunewardena, Sumedha | 56 He, Yulei | 114 Hu, Lechuan | 2h Gunn, Laura H. | 46 He, Zihuai | 6g, 31, 64 Hu, Liangyuan | 32 Guo, Wenge | 86 Heagerty, Patrick J. | 50, 90 Hu, Ming | 37 Guo, Wensheng | 58, 104 Healy, Brian C. | 2d Hu, Tao | 34, 118 Guo, Xiuqing | 31, 64 Hedeker, Donald | 20 Hu, Yu | 56, 102 Guo, Ying | 55 Heijnsdijk, Eveline | 11 Hu, Yue | 35m, 80 Gupta, Shuva | 68 Helgeson, Erika S. | 35g Huang, Chao | 2i, 105 Gurka, Matthew J. | 94 Henderson, Nicholas C. | 3d, 95 Huang, Chaorui | 97 Guttmann, Charles R. | 2d Henderson, Robin | 76 Huang, Chiung-Yu | 13, 86 Hade, Erinn M. | 46 Hermans, Lisa | 48a Huang, Emily J. | 32 Hager, Gordon L. | 87 Hernandez-Stumpfhauser, Daniel Huang, Guan-Hua | 98 109 Hakonarson, Hakon | 31 Huang, Haiyan | 38 Herring, Amy H. | 9a, 49 Hall, Charles B. | 71 Huang, Hsin-Cheng | 80 Hesterberg, Dean | 82 Halloran, M. Elizabeth | 69 Huang, Jianhua | 66, 96 Hibbard, Jonathan | 76 Halpern, Carolyn | 9a Huang, Kuan-Chieh | 56 Higdon, Dave M. | 82 Hamada, Chikuma | 3e Huang, Lei | 59 Hilafu, Haileab | 70 Hammonds, Ann | 96 Huang, Xuelin | 113 Hirakawa, Akihiro | 3e Han, Fang | 109 Huang, Yi | 46 Hitchcock, David B. | 66 Han, Sung Won | 43 Huang, Yisong | 7b Hobbs, Brian | 36, 71, 88 Hancock, William | 53 Hudgens, Michael G. | 5f, 69, 119 Hobert, James P. | 87 Handorf, Elizabeth | 5e Hudson, Thomas J. | 56 Hochberg, Marc C. | 13 Haneuse, Sebastien | 22, 79 Hughes, John | 2f Hodge, Domonique Watson | 8d Hanfelt, John J. | 30 Hughes, James P. | 40 Hodges, James S. | 22 Hanks, Ephraim M. | 115 Hughes, Sara | 23 Hoefler, Josef | 1c Hanley, Daniel | 2b Hung, Hsien-Ming James | 36 Hoeting, Jennifer A. | 66 Hansen, Kasper D. | 6p, 56 Hung, Ying | 53 Hoffmann, Raymond G. | 7k Hanson, Timothy E. | 4f, 94 Huo, Zhiguang | 21 Hogan, Joseph W. | 24, 32 Hao, Han | 61m Huynh, Minh | 107 Hong, Chuan | 21, 95 Haran, Murali | 69, 82 Hwang, Beomseuk | 119 Hooker, Giles | 62 Harezlak, Jaroslaw | 1e, 9f, 68 Iasonos, Alexia | 48b, SC3 Hooten, Mevin B. | 100 Harpaz, Rave | 15 Ibrahim, Joseph G. | 1j, 21, 35j, 57, Hoots, Brooke | 51 74, 85, 111 Harrell, Frank E. | T2 Horton, Beth | 79 Ilk Dag, Ozlem | 72 Harris, Jonathan | 1l Hou, Peijie | 97 Imbriano, Paul M. | 42 Harris, Tamara B. | 9f Hou, Lifang | 56 Inan, Gul | 72 Harter, Rachel M. | 51 Hsiao, Chiaowen Joyce | 103 Ionan, Alexei C. | 55 Harvey, Richard | 77 Hsieh, Hsin-Ju | 48e Ionita-Laza, Iuliana | 61i He, Bing | 110 Hsu, Jesse Y. | 32, 58 Irizzary, Rafael | 88 He, Jianghua | 33 Hsu, Paul | 84 Irony, Telba | R6 He, Kevin | 60, 74

Program & Abstracts 387 Isasi, Carmen R. | 59 Karabatsos, George | 116 Kolar, Mladen | 12, 67 Ivanescu, Andrada E. | 120 Kardia, Sharon L. R. | 31, 64 Kolm, Paul | 4k, 32 Ivanova, Anastasia | 119 Karlson, Elizabeth W. | 50 Kong, Dehan | 57 Iyengar, Satish | 70 Katki, Hormuzd | 40 Kong, Shengchun | 72 Jablonski, Kathleen A. | 83 Kaufeld, Kimberly | 71 Kong, Xiangrong | 40 Jabrah, Rajai | 48c Kaul, Abhishek | 47 Konikoff, Jacob Moss | 40 Jackson, Dan | 21 Kawaguchi, Atsushi | 3b Kooperberg, Charles | 26 Jadhav, Sneha | 61e Ke, Chunlei | 118 Koopmeiners, Joseph S. | 3l Jelsema, Casey M. | 119 Kearney, Patricia M. | 20 Korthauer, Keegan D. | 6l Jensen, Shane T. | 62 Kelley, George A. | 3f Kosel, Alison E. | 50 Jeong, Jong-Hyeon | 83, 86 Kendziorski, Christina | 6i, 6l, 6m Kosorok, Michael R. | 16, 35n, 58, 76, 106 Jerrett, Michael | 17 Kennedy, Edward H. | 46 Kou, Samuel | 53 Ji, Hongkai | 110 Kennedy, Richard E. | 48q Koul, Hira L. | 47 Ji, Yuan | 54 Kenward, Michael G. | 48a Kozbur, Damian | 12 Jia, Cheng | 56, 102 Khalili, Abbas | 31 Kraft, Peter | 95 Jiang, Bo | 102 Khan, Diba | 114 Krieger, Nancy | 5g Jiang, Hui | 102 Khare, Kshitij | 8a Kryscio, Richard J. | 7j, 17, 47, Jiang, Jiancheng | 12 Khondker, Zakaria | 111 72, 93 Jiang, Libo | 96 Kim, Chanmin | 73, 116 Kudela, Maria A. | 1e Jiang, Peng | 6i Kim, Chulmin | 20 Kumar, Santosh | 41 Jiang, Qi | 89 Kim, Clara | 3a Kunihama, Tsuyoshi | 9a Jiang, Runchao | 108 Kim, Hyunsoo | 53 Kurum, Esra | 66 Jiang, Wei | 33 Kim, Inyoung | 19 Kwak, Il Youp | 95 Jiang, Yunyun | 8h Kim, Jae Kwang | 114 Kwok, Richard | 17 Jin, Fulai | 37 Kim, Janet S. | 45 Kwon, Deukwoo | 9g Jin, Zhezhen | 65 Kim, Jong-Min | 1h, 8j Labbe, Aurelie | 56 Joffe, Marshall M. | 104 Kim, Junghi | 2j Laber, Eric | 108 Johnson, Brent A. | 48d Kim, Jung In | 60 Labriola, Dominic | 23 Johnson, Chris | 51 Kim, Sehee | 60 Lachin, John M. | 48k Johnson, Paul E. | 78 Kim, Sungduk | 20, 107 Laeyendecker, Oliver B. | 40 Johnson, Timothy D. | 2k, 19, 57, Kim, SungHwan | 35o Lahiri, Soumendra | 68 105 Kimmel, Stephen E. | 7h | | Lan, Gordon 117 Joseph, Antony 96 King, Emily | 18 | | LaVange, Lisa 23, T3 Joshi, Amit D. 95 Kiragga, Agnes | 24 | | Leary, Emily 96 Jung, Yeun Ji 87 Kirpich, Alexander | 69, 96 | | Lee, Chi Hyun 86 Kahle, David 9i Klasjna, Pedja | 41 | | Lee, Eunjee 57 Kaizar, Eloise E. 21 Kleiber, William | 115 | | Lee, Eun-Joo 4c Kalbfleisch, Jack 72 Kline, David M. | 21 | | Lee, Mei-Ling Ting 13, 65 Kalra, Philip A. 20 Knickmeyer, Rebecca C. | 105 | | Lee, Oliver 111 Kang, Jian 2g Kobie, Julie | 31 | | Lee, Seonjoo 109 Kang, Le 107 Koch, Gary | 3b, 94 | | Lee, Seunggeun 31, 61j, 64 Kapelner, Adam 62 Kohane, Isaac | 50

388 ENAR 2015 | Spring Meeting | March 15–18 Lee, Seung-Hwan | 4b Li, Yun R. | 31 Liu, Yeqian | 34 Lee, Shing M. | 14 Liang, Chao-Kang Jason | 90 Liu, Ying | 97 Lee, Thomas | 81 Liang, Han | 9d Liu, Yufeng | 58 Lee, Wai S. | 9h Liang, Hua | 109 Liu, Yulun | 61l Lee, Xia | 20 Liao, Kaijun | 112 Liu, Zhonghua | 61k Leek, Jeffrey T. | 108 Liao, Katherine P. | 50 Liu, Zhuqing | 2k Lemire, Mathieu | 56 Liao, Peng | 41 Liyanag, Gayan | 55 Leng, Ning | 6i Liebhold, Andrew M. | 82 Lock, Eric F. | 98 Leong, Traci | 3i Lilly, Christa | 7f Logan, Brent R. | 86, 118 Leon-Novelo, Luis G. | 68 Lim, Junho | 8j Loh, Wei-Yin | 106 Le-Rademacher, Jennifer | 30 Lin, Danyu | 16, 61f Lok, Anna | 90 Leurgans, Sue E. | 20 Lin, Feng-Chang | 60 Lok, Judith J. | 24, 32 Levenson, Mark | 3a Lin, Hongbo | 1i Long, D Leann | 3f Levina, Elizaveta | 80 Lin, Hui-Min | 6e Long, Dustin M. | 7m Levy, Michael | 7i Lin, Li-An | 60 Long, Qi | 2g, 8d, 35k, 70, 84 Li, Cong | 6n, 31, 110 Lin, Lizhen | 94 Longini, Ira | 7g, 69 Li, Dan | 94 Lin, Shili | 31 Loo, Geok Yan | 13 Li, Fan | 116 Lin, Shu-Yi | 7d Lopetegui, Marcelo A. | 55 Li, Hongzhe | 6o, 31, 35d, 98, 110 Lin, Wei | 35d Loredo-Osti, J. Concepcion 61p Li, Jianing | 89 Lin, Xihong | 18, 61a, 61k, 61o, 63 Lou, Wenjie | 47 Li, Jiaqi | 5e Lin, Yan | 6e Louis, Thomas A. | 44 Li, Li | 17 Lin, Zhixiang | 43 Lourenço, Vanda M. | 96 Li, Liang | 104 Linder, Daniel F. | 7b, 48c, 94, 107 Love, Tanzy M. T. | 17, 59 Li, Lingling | 32 Lindquist, Martin A. | 1e, 2m, 5b, 25, 59, 73, 109, SC2 Lu, Bo | 46 Li, Meng | 2e Liu, Benmei | 42 Lu, Qing | 6c, 6g, 61e Li, Ming | 6c, 6g Liu, Congjian | 9c Lu, Wenbin | 33, 108 Li, Mingyao | 56, 102 Liu, Dandan | 90 Lu, Xin | 48d Li, Qunhua | 55 Liu, Danping | 58, 90 Lu, Yuefeng | 6k Li, Runze | 66, 70 Liu, Han | 10, 35l, 43, 109 Lu, Zhaohua | 105, 111 Li, Shanshan | 4j Liu, Jun | 102 Lucas, Joseph Edward | 78 Li, Shi | 61b Liu, Ke | 44 Luedtke, Alexander R. | 33 Li, Siying | 94 Liu, Lan | 24, 73 Lum, Kirsten J. | 44 Li, Xiaochun | 32 Liu, Lei | 56, 113 Luo, Sheng | 60, 72 Li, Yanming | 60, 74 Liu, Mengya | 103 Luo, Wei | 77 Li, Yehua | 47 Liu, Piaomu | 16 Luo, Xi | 73 Li, Yi | 56, 60, 74 Liu, Qing | 16 Luo, Xianghua | 86 Li, Yifang | 22 Liu, Shelley Han | 69 Lyles, Robert H. | 97 Li, Yingbo | 87 Liu, Song | 61q Lymp, James | 48e Li, Yisheng | 73 Liu, Xiang | 83 Lynch, Gavin | 86 Li, Yuan | 6i Liu, Xiaoxue | 71 Lynch, Kevin | 33 Li, Yumeng | 44 Liu, Xu | 61c Ma, Ling | 4h Li, Yun | 37, 56, 61g, 61h

Program & Abstracts 389 Ma, Shuangge | 74 Mejia, Amanda F. | 2m Nahum-Shani, Inbal | 33 Ma, Shujie | 100 Mendonca, Enedia | 50 Nam, Kijoeng | 3d Ma, Xiaoye | 21 Mendoza, Maria Corazon B. | 51 Nan, Bin | 72, 85 Ma, Yanyuan | 99 Mentch, Lucas K. | 62 Nassiri, Vahid | 48a Maas, Paige | 8f Mercaldo, Nathaniel D. | 79 Nathoo, Farouk S. | 19 Machogu, Evans M. | 7k Mesenbrink, Peter Grant | 49 Nebel, Mary Beth | 2m Maciejewski, Matthew L. | 113 Michailidis, George | 80 Needham, Dale M. | 34 Madden, Jamie M. | 20 Miecznikowski, Jeffrey C. | 18, 61q Neelon, Brian | 113 Madden, Stephen | 6k Miles, Caleb | 63 Nelson, LaRon E. | 1f Madigan, David | 5a, 15, 21, 35b Millen, Brian A. | 3n Neuvirth, Hani | 37 Mahmoud, Hamdy Fayez Farahat Milton, Jacqueline N. | 92 Nevo, Daniel | 6q 19 Min, Eun Jeong | 9e Newton, Michael A. | 95 | Mahnken, Jonathan D. 33, 83 Miranda, Michelle F. | 74 Neykov, Matey | 16 | Maiti, Tapabrata 84, 99 Mitchell, Emily M. | 97 Ngo, Duy | 57 | Maity, Arnab 45 Mitra, Nandita | 5e Nguyen, Thuan | 6f | Majeed, Azeem 46 Mitra, Robin | 87 Nichols, Thomas E. | 105 | Maleki, Arian 12 Mo, Chen | 107 Nicolae, Dan | 37 | Manatunga, Amita K. 55, 97 Molenberghs, Geert | 1b, 48a, 85 Ning, Yang | 35l, 43, 95 | Mandrekar, Jay 69 Moncunill, Gemma | 68 Niu, Xiaoyue | 90 | Manukyan, Zorayr 3j Monteiro, Andreia | 96 Noel, Janelle R. | 56 | Mao, Lu 16 Moodie, Erica E. M. | 76 Normolle, Daniel | 14 | March, Dana 92 Moore, Jason | 61g North, Kari E. | 61f | Marchenko, Olga R2 Morales, Romarie | 99 Northrup, Karen | 7f | Marcovitz, Michelle S. 9k Morris, Jeffrey S. | 45 Novitsky, Vladimir | 69 | Marder, Karen 99 Morris, Max | 35e Nunez, Sara | 31 | Mariam, Shiferaw 23 Mortier, Frederic | 39 Nychka, Doug | 17 | Marioni, John 88 Morton, Sally | 7a Oakes, David | 13 | Marks, Sarah J. 30 Mostofsky, Stewart | 2m Oates, Jim C. | 108 | Mathelier, Hansie 7h Moustaki, Irini | 1a, 21 Ochs, Michael F. | 9h | Mathias, Rasika A. 26 Mowrey, Wenzhu | 35f O’Connor, Patrick J. | 78 | Matsouaka, Roland A. 83 Mudholkar, Govind S. | 99 Oganyan, Anna | 51 | Mattei, Alessandra 116 Mueller, Hans-Georg | 38 Ogburn, Elizabeth | 5b | Mauro, Christine M. 59 Mueller, Peter | 54, 116 Ogden, R. Todd | 35i, 45 | Mavridis, Dimitris 21 Mukherjee, Bhramar | 9b, 31, 61b, Ohlssen, David | R4 | Mayo, Matthew S. 33 63, 64, 106, R1 Oluyede, Broderick | 55 | | Maze, Alena 51 Mukhopadhyay, Nandita 95 Ombao, Hernando | 2h, 25, 57, 59, McCormick, Tyler | 15 Murawska, Magdalena | 68 109, SC2 McGee, Daniel L. | 7l Murphy, Susan A. | 35a, 41 O’Quigley, John | 48b, SC3 McGee, Monnie | 103 Muschelli, John | 2b, T4 Orr, Megan | 98 McLain, Alexander C. | 20, 34 Musgrove, Donald R. | 2f Ortega, Joaquin | 59 McMahan, Christopher S. | 68, 69 Mwanza, Jean-Claude | 71 Ostrovnaya, Irina | 101 Mealli, Fabrizia | 116 Myint, Leslie | 6p Oswald, Trevor J. | 39

390 ENAR 2015 | Spring Meeting | March 15–18 Ou, Fang-Shu | 4l Polgar-Turcsanyi, Mariann | 2d Reed, Eric | 31 Pacheco, Christina M. | 33 Polizzotto, Matthew | 82 Reich, Daniel | 2c Paddock, Susan M. | 113 Pollok, Richard | 46 Reilly, Cavan Sheerin | 120 Palmas, Walter | 31, 64 Powell, Helen | 17 Reilly, Muredach P. | 31 Palmer, Jeffrey | 6k Prado, Raquel | 57 Reimherr, Matthew | 70 Pan, Jean | 111 Preisser, John S. | 30, 113 Reis, Isildinha M. | 9g Pan, Jianxin | 8b Prentice, Ross | 29 Reiss, Philip T. | 59 Pan, Qiang | 104 Preston, Kenzie | 41 Ren, Haobo | 61g Pan, Wei | 2j, 6a, 80 Prezant, David J. | 71 Ren, Xing | 61q Papadogeorgou, Georgia | 17 Price, Brad | 87 Resa, Maria de los Angeles | 46 Pararai, Mavis | 55 Price, Dionne L. | R3 Rice, John D. | 34, 61b Parikh, Chirag R. | 4g Price, Larry R. | 42 Rice, Madeline M. | 83 Park, So Young | 45 Pullenayegum, Eleanor M. | 72 Richardson, Sylvia | 88 Park, Yeonhee | 35h Purawat, Shweta | 53 Ritchie, James | 20 Park, Yeonjoo | 83 Qi, Lihong | 114 Robins, James M. | 24, 73, 76 Park, YongSeok | 35o Qi, Meng | 30 Robinson, Lucy F. | 1l Parmigiani, Giovanni | 73 Qian, Hong | 53 Rochani, Haresh D. | 94 Paskett, Electra D. | 3c Qian, Jing | 31, 118 Rodrigues, Paulo Canas | 96 Patil, Prasad | 108 Qian, Yu | 103 Rohan, Patricia | 3d Patil, Sujata M. | 101 Qin, Jing | 13, 60 Rong, Alan | 111 Pavur, Gregory | 7e Qin, Li | 1h Rosenbaum, Paul R. | 32 Paul, Debashis | 81 Qin, Li-Xuan | 81 Rosenberger, William F. | 3j | Peddada, Shyamal D. 68, 119 Qin, Zhaohui | 37 Rosenblum, Michael A. | 10, 32, 36 | Pedraza, Omar 3g Qiu, Huitong | 109 Ross, Eric A. | 3g | Pekar, James J. 2m Qiu, Sheng | 4i Ross, Michelle | 7i | Pelagia, Ioanna 8b Qu, Annie | 70, 84 Rosset, Saharon | 37 | Peña, Edsel 16 Qu, Liming | 31 Rothman, Adam J. | 87 | Peng, Gang 97 Quinlan, Erin Burke | 57 Roy, Anindya | 46 | Peng, Limin 55, 70, 118 Raghavan, Rama | 56 Roy, Dooti | 34 | Peng, Roger D. 17 Raghunathan, Trivellore E. | 42 Roy, Jason | 7i, 78, 116 | Peng, Jie 81 Rahman, AKM F. | 55 Roy, Vivekananda | 34 | Pennell, Michael L. 3c Rakhmawati, Trias Wahyuni | 1b RoyChoudhury, Arindam | 19 | Perera, Robert A. 73 Ramachandran, Gurumurthy | 17, Ruczinski, Ingo | 26, 52 | Perkins, Neil J. 97 99 Rudser, Kyle | 99 | | Petersen, Alexander 38 Rana, Santu 77 Ruppert, Amy S. | 3k | | Petersen, Ashley 37 Rasch, Elizabeth K. 107 Ruppert, David | 9j, 17 | | Petkova, Eva 35i Ratcliffe, Sarah J. 58 Rüschendorf, Ludger | 16 | | Pfister, Gabriele G. 17 Rathouz, Paul 79 Russek-Cohen, Estelle | 3d | | Phung, Dinh 77 Ray, Debashree 61r, 64 Rutter, Carolyn M. | 11 | | Pillai, Natesh S. 42 Ray, Surajit 30 Ryan, Patrick | 35b | | Pires, Ana M. 96 Reagen, Ian 17 Sabanes Bove, Daniel | 48e | | Plenge, Robert 50 Redman, Mary W. 101 Safikhani, Abolfazl | 19

Program & Abstracts 391 Safo, Sandra Addo | 18, 35k Schroeder, Anna Louise | 109 Sies, Aniek | 3h Saha, Krishna K. | 86 Schulte, Phillip J. | 46, 47 Sima, Adam P. | 42 Saha Chaudhuri, Paramita | 90 Schutt, Rachel | 27 Simon, Noah | 37, 67, 106 Sair, Haris | 43 Schwartz, Brian | 87 Simon, Richard M. | 106 Salanti, Georgia | 21 Schwartz, Theresa M. | 71 Simpson, Douglas G. | 83 Salleh, Sh-Hussain | 57 Schwartzman, Armin | 2e Simpson, Pippa M. | 7k Salman, Hanna | 70 Scornet, Erwan | 62 Simsek, Burcin | 70 Samadi, S. Yaser | 109 Seaman, John W. | 9i Singer, Sam | 81 Samawi, Hani M. | 7b, 48c, 94, 107 Seaman Jr., John | 9k Singh, Sonia | 20 Sanchez, Brisa N. | 58, 71, R7 Segal, Mark Robert | 52 Sinha, Samiran | 84 Sanchez-Vaznaugh, Emma V. | 71 Senturk, Damla | 66 Sinha, Sanjoy | 112 Sanders, Anne E. | 30 Sethuraman, Venkat | 23 Sinnott, Jennifer A. | 50 Sandler, Dale | 17 Shaddick, Gavin | 38 Slate, Elizabeth H. | 7l Santa Ana, Elizabeth J. | 1d Shahn, Zach | 35b Small, Dylan S. | 24, 32, 46 Sanz, Hector | 68 Shan, Guogen | 107 Smith, Jennifer A. | 31, 64 Sargent, Daniel J. | 101 Shang, Kan | 120 Smith, Valerie A. | 113 Saria, Suchi | 77 Shankara, Srinivas | 6k Smoot, Elizabeth | 79 Sarkar, Somnath | 48e Shardell, Michelle | 46 Sobel, Michael E. | 5a, 21, 73 Sasala, Emily A. | 7m Shaw, Pamela A. | 7h Sofrygin, Oleg | 22 Sato, Hiroyuki | 3e Shear, M. Katherine | 59 Sollecito, Bill | T3 Saville, Ben | 48g Shelton, Brent | 17 Soltani, Ahmad Reza 71 Saxena, Sonia | 46 Shen, Changyu | 1i, 32 Song, Minsun 95 Schaid, Daniel J. | 110 Shen, Frank | 23 Song, Peter X. K. | 8c, 43, 84, 100 Schappert, Susan | 114 Shen, Haipeng | 2i, 45 Song, Rui | 33, 108 Scharfstein, Daniel O. | 48o Shen, Ronglai | 52 Song, Xiao | 47 Schaubel, Douglas E. | 4e, 60, 118 Shen, Xiaotong | 80 Song, Xiaoyu | 61i Scheet, Paul | 37, 61l Shen, Xiaoxi | 6g Sonmez, Kemal | 6f Schefzik, Roman | 28 Shen, Yuanyuan | 35p Sotres-Alvarez, Daniela | 59 Scheike, Thomas H. | 89, 114 Shepler, Samantha | 3i Spainhour, John Christian | 108 Scheipl, Fabian | 34 Shi, Haiwen | 86 Spiegelman, Donna | 6q Schenker, Nathaniel | 42, 114 Shi, Jianxin | 64 Spidlen, Josef | 6j Scheuermann, Richard H. | 53, 103 Shi, Jingchunzi | 61j Srinivasan, Cidambi | 120 Schifano, Elizabeth D. | 111 Shi, Peibei | 70 Staicu, Ana-Maria | 45 Schildcrout, Jonathan S. | 79 Shi, Pixu | 98 Stambolian, Dwight | 56 Schindler, Jerry | 23 Shimizu, Iris | 114 Stanton, Rick | 53, 103 Schisterman, Enrique F. | 97 Shinohara, Russell | 2c Starren, Justin B. | 56 Schliep, Erin M. | 66 Shoben, Abigail B. | 3k Stenzel, Mark | 17 Schuemie, Martijn J. | SC5, 15 Shou, Haochang | 2m Steorts, Beka | 27 Schneeweiss, Sebastian | 10 Shringarpure, Suyash S. | 26 Stephens, Alisa J. | 104 Schneider, Lon S. | 48q Shu, Xu | 118 Stephens, David A. | 76 Schork, Nicholas | 61d Shyr, Yu | 48i Stewart, Patricia | 17 Schrack, Jennifer | 45 Si, Yajuan | 42 Stewart, Ron | 6i

392 ENAR 2015 | Spring Meeting | March 15–18 Stewart, Thomas G. | 74 Tayob, Nabihah | 97 Valverde, Roberto | 114 Stewart-Koster, Ben | 115 Tchetgen Tchetgen, Eric | 24, 63, 73 Van der Elst, Wim | 48a Stingo, Francesco C. | 88 Tebbs, Joshua M. | 68, 97, 119 van der Laan, Mark J. | 22, 33 Storey, John | 26 Teklehaimanot, Abeba | 1d van der Woerd, Mark | 109 Stringham, Heather | 61b Teng, Ming | 19 Van Deun, Katrijn | 33 Strobl, Andreas | 97 Teng, Zhaoyang | 48f Van Mechelen, Iven | 3h, 33 Stromberg, Arnold | 4d, 95 Tewari, Ambuj | 41 VanderWeele, Tyler J. | 5g, 63 Strong, Louise C. | 97 Thall, Peter F. | 116 Vardeman, Stephen | 35e Stuart, Elizabeth A. | 73 Thomas, David M. | 97 Vaughan, Roger D. | 92 Su, Hai | 4d Thomas, Laine | 46, 47 Vazquez-Benitez, Gabriela | 78 Su, Haiyan | 22 Thompson, Wesley Kurt | 25 Venkatesh, Svetha | 77 Su, Zhihua | 35h Thomson, James | 6i Verbeke, Geert | 1b, 48a Suchard, Marc A. | 15, SC5 Thorarinsdottir, Thordis L. | 28 Vert, Jean-Philippe | 62 Sullivan, Lisa | 92 Thuillier, Vincent | 6k Veturi, Yogasudha | 6b Sullivan, Patrick F. | 37 Thurston, Sally W. | 17, 59, 109 Vock, David M. | 5c, 78 Sultan, Khalaf S. | 119 Tian, Xinyu | 35c Vogel, Robert L. | 7b, 48c, 94 Sun, BaoLuo | 24, 73 Tibshirani, Ryan Joseph | 67 Vogelstein, Joshua | 109 Sun, Boqin | 118 Ting, Chee-Ming | 57 Voronca, Delia | 1d Sun, Hengrui | 3b Titman, Andrew | 93 Wager, Stefan | 62 Sun, Jianguo | 34, 60, 118 Tong, Xin | 12 Wages, Nolan A. | 48b Sun, Shumei | 107 Tong, Xin | 85 Wahed, Abdus S. | 116 Sun, Wei | 102 Tran, Truyen | 77 Walker, Stephen G. | 116 Sun, Ying | 57 Trippa, Lorenzo | 22, 54 Wall, Melanie M. | 1a, 21 Sundaram, Rajeshwari | 4h, 44 Tritchler, David L. | 18 Wan, Lijie | 72 Sung, Myong-Hee | 87 Troiano, Richard | 42 Wang, Alice | 56 Sweeney, Elizabeth | 2c, T4 Troxel, Andrea B. | 112 Wang, Chaolong | 61o Szpiro, Adam | 100 Tsai, Huei-Ting | 13 Wang, Chenguang | 48o Tabb, Loni Philip | 71 Tseng, George C. | 21, 35f, 35o Wang, Chi | 73, 95 Tamura, Roy N. | 83, 91 Tsiatis, Anastasios (Butch) | 23, Wang, Dewei | 69, 119 108, SC4 Tan, Kai | 18 Wang, Dong | 45 Tsodikov, Alexander | 4i, 11, 34 Tang, Chuan-Fa | 119 Wang, Guoqiao | 48q Tu, Xin | 72 Tang, Fan | 1k Wang, Honglang | 74 Tucker, Tom | 17 Tang, Larry | 107 Wang, Hongyuan | 4d, 95 Turner, Jacob A. | 103 Tang, Lu | 8c Wang, Huixia Judy | 7n Tuschl, Tom | 81 Tang, Xueying | 7g Wang, Jianmin | 61q Tymofyeyev, Yevgen R. | 8 Tanna, Angelo P. | 71 Wang, Jianming | 118 Ullman, Natalie | 2b Tao, Ran | 61f Wang, Jiebiao | 37, 84 Umbricht, Annie | 41 Tao, Yebin | 33 Wang, Jue | 72 Urbanek, Jacek K. | 9f Tarpey, Thaddeus | 35i Wang, Kai | 18 Valeri, Linda | 5g Taub, Margaret A. | 26 Wang, Kehui | 7n Valim, Clarissa | 68 Taylor-Rodriguez, Daniel | 74 Wang, Le | 7h Vallejos, Catalina | 88 Taylor, Jeremy M. G. | 9b, 61b, 106 Wang, Liangliang | 115

Program & Abstracts 393 Wang, Lijia | 30 Weppelmann, Alex | 69 Xiao, Luo | 9j, 17, 45, 59 Wang, Lu | 33, 100 Westgate, Philip M. | 85 Xiao, Yimin | 19 Wang, Luojun | 44 Wey, Andrew | 99 Xie, Dawei | 104 Wang, Mei-Cheng | 13, 34 White, Laura F. | 17 Xie, Sharon X. | 86 Wang, Min | 98 Whitmore, George A. | 13, 65 Xing, Fuyong | 4d Wang, Molin | 6q Wick, Jo A. | 33 Xiong, Momiao | 61g, 61h Wang, Naisyin | 85 Wikle, Christopher K. | 39 Xu, Cong | 96 Wang, Pei | 81, 84 Wilson, Robert S. | 20 Xu, Hongyan | 56 Wang, Qian | 110 Wittberg, Richard | 7f Xu, Kun | 19 Wang, Shuang | 95 Wolf, Bethany J. | 8f, 108 Xu, Rengyi | 42 Wang, Sue-Jane | 36 Wolfson, Julian | 78 Xu, Yanxun | 54, 116 Wang, Tao | 43 Womack, Andrew | 68, 74 Xu, Yuhang | 47 Wang, Tianxiu | 16 Won, Kyoung-Jae | 6o Xu, Yuting | 59 Wang, Wei | 5a, 21 Wong, Raymond K. W. | 105 Xu, Zheng | 37 Wang, Wenyi | 97 Wong, Yu-Ning | 3g Xue, Wei | 6d Wang, Xia | 94 Woo, Emily Jane | 3d Xue, Wengiong | 2a Wang, Xianlong | 84 Wright, George W. | 81 Yabes, Jonathan | 20 Wang, Xin | 4e Wu, Chih-Da | 85 Yang, Can | 6n, 31, 43, 110 Wang, Xuefeng | 35c Wu, Fan | 60 Yang, Dan | 45 Wang, Xueying | 96 Wu, Hao | 95 Yang, Haiyan | 61p Wang, Yang | 66 Wu, Huaiqing | 35e Yang, Hojin | 35j Wang, Yalin | 57, 61g, 61h Wu, Jeff C. F. | 53 Yang, Jing | 118 Wang, Ying-Fang | 114 Wu, Jianrong John | 83 Yang, Lin | 4d Wang, Yuan | 71, 88 Wu, Jing | 111 Yang, Shu | 32 Wang, Yuanjia | 59, 97, 99 Wu, Lang | 112 Yang, Song | 65 Wang, Yuxiao | 57 Wu, Michael C. | 31, 74 Yang, Wei | 104 Wang, Zheyu | 58 Wu, Pan | 1m Yang, Wei Peter | 104 Warasi, Md S. | 68 Wu, Qian | 6o Yang, Xiaowei | 114 Warren, Joshua L. | 71 Wu, Rongling | 61m, 96, 98 Yang, Yang | 6a Wassink, Bronlyn | 99 Wu, Siqi | 96 Yang, Yang | 7g, 69 Weakley, Jessica | 71 Wu, Tianshuang | 33 Yang, Yifan | 22 Webber, Mayris P. | 71 Wu, Xiao | 2l Yang, Yu | 105 Webb-Vargas, Yenny | 73 Wu, Xiaowei | 68 Yang, Yuchen | 17 Weeks, Daniel E. | 95 Wu, Yun-Jhong | 80 Yang, Zhao | 55 Wei, Changshuai | 6c Wu, Zhenke | 58, 77 Yaroshinsky, Alex | 113 Wei, Peng | 6a, 98 Xi, Dong | T6 Ye, Meixia | 96 Wei, Shaoceng | 7j Xi, Mingyu | 120 Ye, Wen | 60 Wei, Ying | 61i Xi, Wenna | 3c Yeatts, Sharon D. | 48l Wei, Yu-Chung | 98 Xia, Changming | 109 Yeh, Hung-Wen | 33 Weiner, Howard L. | 2d Xia, Fang | 46 Yeung, Kayee | 90 Weissfeld, Lisa A. | 20, 35f Xia, Rong | 85 Yi, Min | 68 Weiszmann, Richard | 96 Xiang, Ruoxuan | 8a Yiannoutsos, Constantin T. | 16, 24

394 ENAR 2015 | Spring Meeting | March 15–18 Zhong, Yujie | 30, 93 ENAR Zhou, Bingqing | 4g Zhou, Haibo | 22, 79, 118 Zhou, Haiming | 4f Zhou, Hua | 9e 2015 Zhou, Jie | 34 Zhou, Jincheng | 69 Zhou, Mai | 4m, 22 Zhou, Ming | 55 Yin, Jingjing | 107 Zhang, Jiajia | 4f, 34 Zhou, Qingning | 118 Yoo, Byunggil | 56 Zhang, Jing | 3m Zhou, Wen | 35e Young, Nicolas L. | 6h Zhang, Kathy | 111 Zhou, Xiao-Hua | 58 Yu, Bin | 12, 38, 96, 105 Zhang, Mei-Jie | 89 Zhou, Xin | 35n Yu, Jeffrey | 17 Zhang, Min | 31, 57, 64 Zhou, Yan | 43 Yu, Kai | 64 Zhang, Nanhua | 1f Zhou, Yuzhen | 19 Yu, Lili | 7b Zhang, Qiang | 112 Zhu, Hong | 46 Yu, Mandi | 42, 84 Zhang, Wei | 56 Zhu, Hongtu | 1j, 2i, 35j, 45, 57, 74, | | Yu, Menggang 16 Zhang, Wenfei 6k 85, 105, 111 | | Yu, Zhe 57 Zhang, Xu 120 Zhu, Hongxiao | 68 | | Yu, Ziji 99 Zhang, Yichi 108 Zhu, Ji | 60, 74, 80 | | Yuan, Ao 107 Zhang, Ying 44 Zhu, Jun | 39 | | Yuan, Ying 48b Zhang, Yue 1f, 48h Zhu, Li | 113 | | Yue, Chen 43 Zhang, Zhenzhen 58 Zhu, Shihong | 4m | | Zabor, Emily 101 Zhang, Zugui 32 Zhu, Xiaoqing | 47 | | Zanke, Brent W. 56 Zhao, Hongyu 6n, 31, 43, 110 Zhu, Yu | 70 | | Zariffa, Nevine 23 Zhao, Jinying 34 Zidek, James V. | 38 | | Zauber, Ann G. 11 Zhao, Jiwei 84 Zigler, Corwin Matthew | 17, 73 | | Zee, Jarcy 86 Zhao, Ni 31 Zipunnikov, Vadim | 9f, 9j, 45, 59, 87 | | Zeger, Scott L. 58, 77 Zhao, Sihai Dave 31, 120 Zirkle, Keith W. | 42 | | Zeig-Owens, Rachel 71 Zhao, Weizhong 8e Zöllner, Sebastian | 37 | | Zeng, Donglin 4l, 34, 59, 61f, 74, Zhao, Xiaoyue 99 Zou, Jingjing | 119 97, 111 | Zhao, Yang 6k Zou, Wen | 8e Zeng, Peng | 70 | Zhao, Yi 73 Zozus, Meredith Nahm | 78 Zeng, Zhen | 95 | Zhao, Yize 2g, 70 Zubizarreta, Jose R. | 46, 119 Zhan, Jia | 32 | Zhao, Yunpeng 80 Zucker, David | 6q | Zhang, Anru 98 Zheng, Cheng | 108 | Zhang, Bin 48h Zheng, Qi | 70 | Zhang, Fangyuan 31 Zheng, Yinan | 56 | Zhang, Guangyu 51 Zheng, Yingye | 90, 108 | Zhang, Guosheng 37 Zhong, Feiran | 19 | Zhang, Han 64 Zhong, Hua (Judy) | 43, 108 | Zhang, Hongtao 118 Zhong, Ping-Shou | 61n, 74


Floor Plan

ENAR 2015, Hyatt Regency Miami: 3D Meeting Space Layout (floor plan diagram)

Hyatt Regency Miami Lobby (floor plan diagram)

Statistics

Recent releases of SAS/STAT® software provide exciting new capabilities. Highlights include:

SAS/STAT 13.2

Weighted GEE methods. Deal with drop-outs in longitudinal studies with a method that produces unbiased estimates under the missing-at-random (MAR) assumption.
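As a rough illustration (not official SAS sample code), the sketch below shows how such a dropout-weighted analysis might be specified with the new PROC GEE procedure: a marginal model for the repeated outcome plus a logistic missingness model whose fitted probabilities supply the inverse-probability weights. The dataset and variable names (long, id, visit, trt, y, prev_y) are hypothetical, and the MISSMODEL covariates would need to reflect the dropout mechanism assumed for a given study.

```sas
/* Hypothetical long-format data: one record per subject-visit */
proc gee data=long;
   class id visit trt;
   /* marginal model for the repeated outcome */
   model y = trt visit trt*visit / dist=normal;
   /* independence working correlation, as is typical for weighted GEE */
   repeated subject=id / within=visit type=ind;
   /* logistic model for the probability of being observed;
      its fitted values provide the observation weights */
   missmodel prev_y trt;
run;
```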

Analysis for spatial point patterns. Understand locations of random events, such as crimes or lightning strikes, and how other spatial factors influence event intensity.
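A minimal, hypothetical declaration of a point pattern with PROC SPP is sketched below; the dataset, the coordinate variables, and the rectangular study window are all assumptions, and the exact PROCESS statement options should be verified against the SAS/STAT 13.2 documentation.

```sas
/* Hypothetical point pattern: event locations (x, y) observed in a
   100 x 100 rectangular study region */
proc spp data=events;
   process ev = (x, y / area=(0, 0, 100, 100));
run;
```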

Proportional hazards regression models for interval-censored data. Apply Cox regression models when you have interval-censored data.
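For instance, a proportional hazards fit to interval-censored data might look roughly like the sketch below, where each event time is known only to lie between the variables lower and upper; the dataset, covariates, and baseline-hazard choice are hypothetical.

```sas
/* Hypothetical interval-censored data: event time known only to fall in
   (lower, upper]; a missing upper bound indicates right censoring */
proc icphreg data=study;
   class trt;
   model (lower, upper) = trt age / basehaz=piecewise;
run;
```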

Nested multilevel nonlinear mixed models. Fit hierarchical models often used in the analysis of pharmacokinetics data.
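One possible sketch of such a hierarchical fit is shown below: a simple one-compartment pharmacokinetic model with a random effect for center and a nested random effect for subject within center, specified through multiple RANDOM statements. All names, starting values, and the model form are illustrative assumptions rather than a worked SAS example.

```sas
/* Hypothetical concentration data: center, subject, dose, time, conc */
proc nlmixed data=pk;
   parms beta_cl=1 beta_v=3 s2_center=0.1 s2_subj=0.2 s2_res=1;
   cl   = exp(beta_cl + c0 + s0);                 /* clearance */
   v    = exp(beta_v);                            /* volume of distribution */
   pred = (dose / v) * exp(-(cl / v) * time);
   model conc ~ normal(pred, s2_res);
   /* center-level and nested subject-within-center random effects */
   random c0 ~ normal(0, s2_center) subject=center;
   random s0 ~ normal(0, s2_subj)   subject=subject(center);
run;
```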

SAS/STAT 13.1

Sensitivity analysis for multiple imputation. Assess sensitivity of multiple imputation to the missing at random assumption with pattern-mixture models.
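The sketch below illustrates one hypothetical tipping-point style specification: imputed values of a final outcome y2 are shifted downward in the active arm through the MNAR statement, so the analysis can be repeated over a range of shifts. The dataset, variables, shift size, and arm label are all assumptions.

```sas
/* Hypothetical monotone-missing trial data: trt, baseline y1, final y2 */
proc mi data=trial nimpute=25 seed=20150315 out=imputed;
   class trt;
   monotone reg(y2 = trt y1);
   /* shift imputed y2 values by -1.5 for treated subjects only */
   mnar adjust(y2 / shift=-1.5 adjustobs=(trt='Active'));
   var trt y1 y2;
run;
```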

Survival analysis for interval-censored data. Compute nonparametric estimates of the survival function for interval-censored data.
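For example, nonparametric survival estimates for interval-censored data, stratified by a hypothetical treatment group, might be requested roughly as follows; the dataset and variable names are illustrative.

```sas
/* Hypothetical interval-censored data: one (lower, upper] interval per subject */
proc iclifetest data=study plots=survival;
   time (lower, upper);
   strata trt;
run;
```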

Bayesian choice models. Use Bayesian discrete choice analysis to model consumer decisions in choosing products or selecting from multiple alternatives.
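A bare-bones sketch of a Bayesian multinomial logit choice model is given below, assuming the data are arranged with one row per alternative within each choice set; the variable names, MCMC settings, and the CHOICESET= specification are assumptions to check against the PROC BCHOICE documentation.

```sas
/* Hypothetical choice data: one row per alternative, with an indicator
   (choice) for the selected alternative in each subject-task choice set */
proc bchoice data=choices seed=2015 nmc=20000;
   class brand;
   model choice = brand price / choiceset=(subjid task);
run;
```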

Competing risk models. Analyze time-to-event data with competing risks using the method of Fine and Gray (1999).
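As an illustration, a Fine-Gray subdistribution hazards fit might be coded as in the sketch below, where status code 1 marks the event of interest, 2 a competing event, and 0 censoring; the dataset and covariates are hypothetical.

```sas
/* Hypothetical competing-risks data: time, status (0/1/2), covariates */
proc phreg data=crisk;
   class trt;
   /* eventcode=1 requests the Fine-Gray model for cause 1,
      treating cause 2 as a competing risk */
   model time*status(0) = trt age / eventcode=1;
run;
```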

Item response models. Use item response models to calibrate test items and evaluate respondents’ abilities.
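A minimal sketch of calibrating a set of binary items with a two-parameter logistic model is shown below; the dataset and item names are hypothetical.

```sas
/* Hypothetical 0/1 item responses, one row per examinee */
proc irt data=responses;
   var item1-item20;
   model item1-item20 / resfunc=twop;
run;
```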

To learn more: support.sas.com/statnewreleases

SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. © 2015 SAS Institute Inc. All rights reserved. S136288US.0215


12100 Sunset Hills Road, Suite 130, Reston, Virginia 20190

Phone 703-437-4377 Fax 703-435-4390 www.enar.org