
October 16-19

DAE 2019 provides support and encouragement to junior researchers in the field of design and analysis of experiments.

Sponsored by: National Security Agency, JMP, The University of Tennessee, Knoxville's Haslam College of Business, and The University of Tennessee Office of Research & Engagement (SARIF)

DAE STEERING COMMITTEE

Manohar Aggarwal (DAE 2007, Memphis, TN) University of Memphis

Nancy Flournoy (DAE 2009, Columbia, MO) University of Missouri

John Stufken, Chair (DAE 2012, Athens, GA) University of North Carolina, Greensboro

Ryan Lekivetz (DAE 2015, Cary, NC) SAS Institute

Hongquan Xu (DAE 2017, Los Angeles, CA) UCLA

DAE 2019 ORGANIZERS

Robert Mee & Wei Zheng University of Tennessee, Knoxville

DAE 2019 SPONSORS

JMP

National Security Agency

University of Tennessee Haslam College of Business

University of Tennessee Office of Research & Engagement

American Statistical Association

Institute of Mathematical Statistics

National Institute of Statistical Sciences

OCTOBER 16 – 19, 2019

Wednesday, October 16

6:00 p.m. – 7:30 p.m.  DAE Conference Reception (Four Points Sheraton Hotel)

Thursday, October 17

7:30 a.m. – 8:10 a.m.  Continental Breakfast & Registration (Student Union Ballroom A)
8:10 a.m. – 8:30 a.m.  Welcome & Opening Remarks (Student Union 169)
8:30 a.m. – 10:00 a.m.  SESSION 1: SEQUENTIAL DESIGN AND OPTIMIZATION OF SIMULATION EXPERIMENTS (Student Union 169)
Organizers: Peter Frazier, Cornell University; Robert Gramacy, Virginia Tech
Chair: Max Morris, Iowa State University

Bayesian Optimization with Expensive Integrands
Speaker: Saul Toscano-Palmerin, Cornell University

Robust design and optimization for advancing science and engineering
Speaker: Stefan Wild, Argonne National Lab

Replication or exploration? Sequential design for stochastic simulation experiments
Speaker: Robert Gramacy, Virginia Tech

10:00 a.m. – 10:30 a.m.  Coffee & Tea Break (Student Union 169)
10:30 a.m. – 12:00 p.m.  SESSION 2: INFORMATION AND INFERENCE WITH ADAPTIVE DESIGN (Student Union 169)
Organizer and Chair: Nancy Flournoy, University of Missouri

Random Norming Aids Analysis of Non-Linear Regression Models with Informative Sequential Dose Selection
Speaker: Zhantao Lin, George Mason University

New Adaptive Design Strategies Based on the Observed Fisher Information
Speaker: Adam Lane, University of Cincinnati

The Effect of Interim Adaptations in Group Sequential Designs
Speaker: Sergey Tarima, Medical College of Wisconsin

12:00 p.m. – 1:00 p.m.  Lunch (Student Union Ballroom A)


Thursday, October 17 (continued)

1:00 p.m. – 2:30 p.m.  SESSION 3: INTERFACE BETWEEN CAUSAL INFERENCE AND DESIGN OF EXPERIMENTS (Student Union 169)
Organizer: P. Richard Hahn, Arizona State University
Chair: Abhyuday Mandal, University of Georgia

Combining Observational and Experimental Data for Improved Inference and Design
Speaker: Evan Rosenman, Stanford University

Rerandomization and Regression Adjustment
Speaker: Xinran Li, University of Illinois at Urbana-Champaign

Harmonizing Fully Optimal Designs with Classic Randomization in Fixed Trial Experiments
Speaker: Adam Kapelner, Queens College

2:30 p.m. – 3:30 p.m.  Poster Session 1 with Coffee & Tea Break (Student Union 262)
3:30 p.m. – 5:00 p.m.  SESSION 4: DOE APPLICATIONS TO BUSINESS, HEALTH AND EDUCATION (Student Union 169)
Organizer: DAE Committee
Chair: David Edwards, Virginia Commonwealth University

Sub-data selection for subgroup analysis
Speaker: Min Yang, University of Illinois at Chicago

Optimal Sample Size for Cluster Randomized Trials: A Simulation-Based Search Algorithm
Speaker: Roee Gutman, Brown University

The Role of Optimal Challenge in Adaptive E-Learning: Evidence from a Field Experiment with Middle School Students
Speaker: De Liu, University of Minnesota

5:00 p.m. – 5:45 p.m.  Roundtable Discussion (Student Union Ballroom A)


Friday, October 18, 2019

7:30 a.m. – 8:30 a.m.  Continental Breakfast & Registration (Student Union Ballroom A)
8:30 a.m. – 10:00 a.m.  SESSION 5: MACHINE LEARNING ALGORITHMS ASSISTED BY DESIGN CONCEPTS (Student Union 169)
Organizer: Arman Sabbaghi, Purdue University
Chair: Wei Zheng, University of Tennessee

Designing for low-rank matrix recovery: a maximum entropy approach
Speaker: Simon Mak, Duke University

Support Points: An Optimal and Model-Free Method for Subsampling Big Data
Speaker: Roshan Joseph, Georgia Institute of Technology

Collaborative Design for Improved Causal Machine Learning on Big Observational Data Speaker: Arman Sabbaghi, Purdue University

10:00 a.m. – 10:30 a.m.  Coffee & Tea Break (Student Union 169)
10:30 a.m. – 12:00 p.m.  SESSION 6: INTERNET EXPERIMENTS WITH EMPHASIS ON E-COMMERCE (Student Union 169)
Organizer and Chair: William Li, Shanghai Advanced Institute of Finance

Active Arm Selection using Thompson Sampling (AASETS): A multi-armed bandit method under arm budget constraints
Speaker: Yuanshuo (David) Zhao, Uber

Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb
Speaker: Aleksander Fabijan, Microsoft

Improving LinkedIn Member Experience amid Network Interferences: Journey and Learning
Speaker: Weitao Duan, LinkedIn

12:00 p.m. – 1:00 p.m.  Lunch (Student Union Ballroom A)


Friday, October 18 (continued)

1:00 p.m. – 2:30 p.m.  SESSION 7: NETWORK EXPERIMENTATION (Student Union 169)
Organizer: Jean Pouget-Abadie, Google
Chair: Weitao Duan, LinkedIn

Designs for estimating the treatment effect in networks with interference
Speaker: Ravi Jagadeesan, Harvard University

A Graph-Theoretic Approach to Randomization Tests of Causal Effects Under General Interference
Speaker: David Puelz, The University of Chicago

Variance Reduction in Bipartite Experiments through Correlation Clustering
Speaker: Jean Pouget-Abadie, Google

2:30 p.m. – 3:30 p.m.  Poster Session 2 with Coffee & Tea Break (Student Union 262)
3:30 p.m. – 5:00 p.m.  SESSION 8: DISCRETE CHOICE EXPERIMENTS (Student Union 169)
Organizer and Chair: Angela Dean, Ohio State University

Optimal Product Design by Sequential Experiments in High Dimensions
Speaker: Mingyu (Max) Joo, University of California, Riverside

Efficient Design and Analysis for a Selective Choice Process
Speaker: Qing Liu, University of Wisconsin

Benefit Formation and Enhancement
Speaker: Greg Allenby, The Ohio State University

5:05 p.m. – 5:45 p.m.  Mentoring Session (Student Union 262)
6:00 p.m. – 8:00 p.m.  Conference Banquet, Club LeConte, 800 S Gay St, 27th Floor (See Page 35)


Saturday, October 19, 2019

7:30 a.m. – 8:30 a.m.  Continental Breakfast (Student Union Ballroom A)
8:30 a.m. – 10:00 a.m.  SESSION 9: OPTIMAL DESIGNS (Student Union 169)
Organizer and Chair: John Stufken, University of North Carolina, Greensboro

Optimal designs for nonlinear multiple regression models with censored data
Speaker: Dennis Schmidt, Otto-von-Guericke-Universität Magdeburg

A Comparative Study of the Probability Distribution in Optimal Design
Speaker: Sergio Pozuelo-Campos, Universidad de Castilla-La Mancha

Robust Experimental Designs for Model Calibration
Speaker: William Myers, The Procter & Gamble Company

10:00 a.m. – 10:30 a.m.  Coffee & Tea Break (Student Union 169)
10:30 a.m. – 12:00 p.m.  SESSION 10: FACTORIAL DESIGNS (Student Union 169)
Organizer and Chair: Boxin Tang, Simon Fraser University

A class of multilevel nonregular fractional factorial designs for studying quantitative factors
Speaker: Lin Wang, George Washington University

Group Orthogonal Supersaturated Designs
Speaker: Ryan Lekivetz, JMP

Cost-efficient Mixed-Level Covering Designs for Testing Experiments: Construction and Application
Speaker: Frederick Phoa, Academia Sinica

12:00 p.m. – 1:00 p.m.  Lunch, to-go boxes (Student Union Ballroom A)

INVITED POSTERS, DAY 1

Poster Presentations, 2:30 p.m. – 3:15 p.m., October 17

Faten Alamri (Virginia Commonwealth University): Finding Bayesian Optimal Follow-up Experiment of Design Points for Simultaneous Tolerable Dosage Combinations for Multiple Endpoints
Carlos de la Calle-Arroyo (Universidad de Castilla-La Mancha): D-Optimal Designs and Efficiency of Designs for the Antoine's Equation in Distillation Experiments
Carlos Diaz-Tufinio (Axis Clinicals Latina): The Relevance of Designing Experiments in Clinical Trials: Experiences and Perspectives from a Clinical Research Organization (CRO) in Mexico
Jing-Wen Huang (National Tsing Hua University): A Systematic Construction of Cost-Efficient Designs for Order-of-Addition Experiments
Tim Keaton (Purdue University): Design and Dismemberment for Controlling the Risk of Regret for the Multi-Armed Bandit
Arvind Krishna (Georgia Institute of Technology): Distributional Clustering: A Distribution-Preserving Clustering Method
Abhyuday Mandal (University of Georgia): EzGP: Easy-to-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors
Reid Vincent Paris (Iowa State University): Designing for Large Simulations: A Case Study
Garima Priyadarshini (Imperial College London): Experimental Designs Suitable for Cases with Varying Error Variances
Zack Stokes (UCLA): Design and Analysis of Order-of-Addition Experiments with Application to Sequential Drug Administration
Qian Xiao (University of Georgia): A Novel Bayesian Optimization Approach for Both Quantitative and Sequence Inputs
Yue Zhang (University of Illinois at Chicago): Dose Allocation Using Optimal Design Theory in Phase I/II Clinical Trials Where Toxicity and Efficacy Are Evaluated Together

INVITED POSTERS, DAY 2

Poster Presentations, 2:30 p.m. – 3:15 p.m., October 18

Kazeem Adepoju (University of Minnesota): The Transmuted F Test: A Robust Testing Criterion in Design and Analysis of Experiments
Yasmeen Akhtar (Arizona State University): Cost-Efficient Mixed-Level Covering Designs for Testing Experiments: Construction and Application
David Cole (Virginia Tech): Inducing Point Methods for Gaussian Process Surrogates of Large-Scale Simulations
Jeevan Jankar (University of Georgia): Optimal Crossover Designs for Generalized Linear Models
Andrew Kane (Duke University): A New Analysis Strategy for Designs with Complex Aliasing
Joseph Resch (University of Georgia): Inverse Problem for Dynamic Computer Simulators via Multiple Scalar-Valued Contour Estimation
Yao Shi (Arizona State University): Optimal Design for a Two-Parameter Generalized Linear Mixed Model in Longitudinal Study
Ye Tian (UCLA): Minimum Space-Filling Aberration for Strong Orthogonal Arrays
Hongzhi Wang (University of Georgia): A Review on Finding Maximin Distance Hypercube Designs with Flexible Sizes
Ching-Chi Yang (University of Memphis): Dimensional Analysis for Response Surface Methodology
Xueru Zhang (Nankai University): Sequential Maximin Good Lattice Point Sets
Boya Zhang (Virginia Tech): Distance-Distributed Design for Gaussian Process Surrogates

ABSTRACTS OF INVITED TALKS

Day 1: Thursday, October 17, 2019

Session 1: Sequential design and optimization of simulation experiments
Organizers: Peter Frazier, Cornell University; Robert Gramacy, Virginia Tech
Chair: Max Morris, Iowa State University

Bayesian Optimization with Expensive Integrands
Speaker: Saul Toscano-Palmerin, Cornell University
Coauthor(s): Peter Frazier
Abstract: Non-convex derivative-free time-consuming objectives are often optimized using "black-box" optimization. These approaches assume very little about the objective. While broadly applicable, these approaches typically require more evaluations than methods exploiting more problem structure. Often, such non-convex derivative-free time-consuming (or "expensive") objectives are actually the sum or integral of a larger number of less time-consuming objectives. This arises in designing aircraft, choosing parameters in ride-sharing dispatch systems, and tuning a machine learning algorithm's hyper-parameters in deep neural networks. We develop a novel Bayesian optimization algorithm that leverages this structure to improve performance. Our algorithm is average-case optimal by construction when a single evaluation of the integrand remains within our evaluation budget. We show consistency of our method for objective functions that are sums. In numerical experiments, our method performs as well or better across a wide range of problems.
Keywords: Bayesian optimization
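[Editor's note: the setting can be made concrete with a toy sketch. Below, the expensive objective F(x) is a sum of cheaper integrand terms f(x, w), and a standard Gaussian-process/expected-improvement loop optimizes the aggregate. This illustrates plain Bayesian optimization only, not the speakers' algorithm, which additionally exploits the sum structure; all functions and constants are invented for the example.]

```python
# Minimal Bayesian-optimization sketch: the toy objective F(x) is a sum of
# cheap "integrand" terms f(x, w); standard GP + expected improvement search.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
W = np.linspace(0.0, 1.0, 20)                    # integration points w

def integrand(x, w):                             # one cheap term f(x, w)
    return np.sin(3 * x) * np.exp(-w * x) / len(W)

def F(x):                                        # expensive objective: sum over w
    return sum(integrand(x, w) for w in W)

X = list(rng.uniform(0, 2, 3))                   # initial design
y = [F(x) for x in X]
grid = np.linspace(0, 2, 200)

for _ in range(15):                              # sequential BO loop
    gp = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-8).fit(
        np.array(X)[:, None], y)
    mu, sd = gp.predict(grid[:, None], return_std=True)
    best = max(y)
    z = (mu - best) / np.maximum(sd, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = grid[int(np.argmax(ei))]
    X.append(x_next)
    y.append(F(x_next))

print("best x:", X[int(np.argmax(y))], "best F:", max(y))
```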

Robust Design and Optimization for Advancing Science and Engineering
Speaker: Stefan Wild, Argonne National Lab
Coauthor(s): Raghu Bollapragada, Wendy Di, Arindam Fadikar, Jeff Larson and Matt Menickelly
Abstract: From experimental measurement to multifidelity mathematical models to numerical implementation, uncertainty is ubiquitous in real-world applications. Robust optimization is a branch of mathematical optimization that allows decision and design makers to additionally model their level of uncertainty in terms of their models, objectives, and constraints. A robust optimization problem formulation seeks to ensure that optimal decisions are insulated from all possible uncertainties described in the uncertainty set, thereby insulating the solution from potential worst-case scenarios. A further challenge is that the specification of the models, objectives, and constraints is available only by observation, experiment, or simulation. Consequently, derivative information with respect to the decision and/or uncertain variables is often incomplete. We present ways to solve such robust optimization problems in the derivative-free setting. Our general methodology involves sampling from the uncertainty set in an adaptive way that focuses on the design objective.
Keywords: Robust optimization, Sequential design, Design under uncertainty
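[Editor's note: for intuition, here is a naive worst-case version of the problem under invented assumptions: the uncertainty set is approximated by a fixed sample, and a derivative-free grid search minimizes the sampled worst case. The adaptive, objective-focused sampling described in the talk is not shown.]

```python
# Illustrative worst-case (robust) optimization by sampling the uncertainty
# set -- a naive stand-in for the adaptive sampling the talk describes.
import numpy as np

rng = np.random.default_rng(1)

def f(x, u):                      # objective observed under uncertainty u
    return (x - 1.0) ** 2 + u * np.sin(5 * x)

def worst_case(x, u_samples):     # approximate sup over the uncertainty set
    return max(f(x, u) for u in u_samples)

u_samples = rng.uniform(-0.5, 0.5, 50)       # sampled uncertainty set
candidates = np.linspace(-1, 3, 401)         # derivative-free search grid
values = [worst_case(x, u_samples) for x in candidates]
x_robust = candidates[int(np.argmin(values))]
print("robust minimizer ~", x_robust)
```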

Replication or Exploration? Sequential Design for Stochastic Simulation Experiments
Speaker: Robert Gramacy, Virginia Tech
Coauthor(s): Mickaël Binois, Jiangeng Huang, Mike Ludkovski
Abstract: We investigate the merits of replication, and provide methods that search for optimal designs (including replicates), in the context of noisy computer simulation experiments. We first show that replication offers the potential to be beneficial from both design and computational perspectives, in the context of Gaussian process surrogate modeling. We then develop a lookahead-based sequential design scheme that can determine if a new run should be at an existing input location (i.e., replicate) or at a new one (explore). When paired with a newly developed heteroskedastic Gaussian process model, our dynamic design scheme facilitates learning of signal and noise relationships which can vary throughout the input space. We show that it does so efficiently, on both computational and statistical grounds. In addition to illustrative synthetic examples, we demonstrate performance on two challenging real-data simulation experiments, from inventory management and epidemiology.
Keywords: Computer experiment, Gaussian process, Input-dependent noise, Lookahead, Replicated observations, Surrogate model

Session 2: Information and inference with adaptive design
Organizer and Chair: Nancy Flournoy, University of Missouri

Random Norming Aids Analysis of Non-Linear Regression Models with Informative Sequential Dose Selection
Speaker: Zhantao Lin, George Mason University
Coauthor(s): Nancy Flournoy, William F. Rosenberger
Abstract: A two-stage adaptive optimal design is an attractive option for increasing the efficiency of clinical trials. In these designs, based on interim data, the locally optimal dose is chosen for further exploration, which induces dependencies between data from the two stages. When the maximum likelihood estimator (MLE) is used under nonlinear regression models with independent normal errors in a pilot study where the first stage sample size is fixed, and the second stage sample size is large, the Fisher information fails to normalize the estimator adequately asymptotically, because of dependencies. In this situation, three alternative random information measures are presented and are shown to provide better normalization of the MLE asymptotically. The performance of random information measures is investigated in simulation studies, and the results suggest that the observed information performs best when the sample size is small.

Adaptive Designs for Optimal Observed Fisher Information
Speaker: Adam Lane, University of Cincinnati
Abstract: Expected Fisher information can be found a priori, and as a result its inverse is the primary variance approximation used in the design of experiments. This is in contrast to the common claim that the inverse of observed Fisher information is a better approximation of the variance of the maximum likelihood estimator. Observed Fisher information cannot be known a priori; however, if an experiment is conducted sequentially (in a series of runs), the observed Fisher information from previous runs is known. In the current work, two adaptive designs are proposed that use the observed Fisher information from previous runs to design the future runs.
Keywords: Adaptive Design, Observed Information, Expected Information, Optimal Design
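[Editor's note: a minimal sketch of the flavor of such designs, under invented assumptions (a one-parameter exponential mean model with known error variance): doses are chosen sequentially at the current MLE, and the accumulated observed information, rather than the expected information, supplies the final variance approximation. This is not the talk's proposed procedure.]

```python
# Sequential dose selection with an observed-information variance estimate.
# Model: y = exp(-theta * x) + N(0, sigma^2), scalar theta (invented example).
import numpy as np

rng = np.random.default_rng(2)
theta_true, sigma = 1.5, 0.05
thetas = np.linspace(0.1, 4.0, 400)          # grid for the MLE
doses = np.linspace(0.1, 2.0, 50)            # candidate design points

def mean(theta, x):
    return np.exp(-theta * x)

def observe(x):                              # run the experiment at dose x
    return mean(theta_true, x) + rng.normal(0, sigma)

X, Y = [0.5], [observe(0.5)]                 # pilot run
for _ in range(10):
    x_arr, y_arr = np.array(X), np.array(Y)
    sse = [np.sum((y_arr - mean(t, x_arr)) ** 2) for t in thetas]
    theta_hat = thetas[int(np.argmin(sse))]  # least-squares MLE so far
    # locally optimal next dose: maximize the information it adds at the MLE
    x_next = doses[int(np.argmax((doses * mean(theta_hat, doses)) ** 2))]
    X.append(x_next)
    Y.append(observe(x_next))

# observed Fisher information at the final MLE as a variance approximation
x_arr, y_arr = np.array(X), np.array(Y)
sse = [np.sum((y_arr - mean(t, x_arr)) ** 2) for t in thetas]
theta_hat = thetas[int(np.argmin(sse))]
d1 = -x_arr * mean(theta_hat, x_arr)         # df/dtheta
d2 = x_arr ** 2 * mean(theta_hat, x_arr)     # d2f/dtheta2
i_obs = np.sum(d1 ** 2 - (y_arr - mean(theta_hat, x_arr)) * d2) / sigma ** 2
print("MLE:", theta_hat, "variance approx (1/observed info):", 1 / i_obs)
```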

Effect of Interim Adaptations in Group Sequential Designs
Speaker: Sergey Tarima, Medical College of Wisconsin
Coauthor(s): Nancy Flournoy
Abstract: We investigate unconditional and conditional-on-stopping maximum likelihood estimators (MLEs), information measures and information loss associated with conditioning in group sequential designs (GSDs). The possibility of early stopping brings truncation to the distributional form of MLEs; sequentially, GSD decisions eliminate some events from the sample space. Multiple testing induces mixtures on the adapted sample space. Distributions of MLEs are mixtures of truncated distributions. Test statistics that are asymptotically normal without GSD have asymptotic distributions, under GSD, that are non-normal mixtures of truncated normal distributions under local alternatives; under fixed alternatives, asymptotic distributions of test statistics are degenerate. Estimation of various statistical quantities such as information, information fractions, and confidence intervals should account for the effect of planned adaptations. Calculation of adapted information fractions requires substantial computational effort. Therefore, a new GSD is proposed in which stage-specific sample sizes are fully determined by desired operational characteristics, and calculation of information fractions is not needed (see the preprint of this research at https://arxiv.org/pdf/1908.01411.pdf).
Keywords: Adaptive designs, Maximum likelihood estimation, Asymptotic distribution theory, Interim analyses, Local alternative hypotheses

Session 3: Interface between causal inference and design of experiments
Organizer: P. Richard Hahn, Arizona State University
Chair: Abhyuday Mandal, University of Georgia

Combining Observational and Experimental Data for Improved Inference and Design
Speaker: Evan Rosenman, Stanford University
Coauthor(s): Art Owen, Mike Baiocchi
Abstract: The increasing availability of large, observational datasets can help researchers to design better experiments. We consider the problem of assigning treatments in a prospective randomized controlled trial. We suppose an observational study of the same treatment exists, but it may suffer from unmeasured confounding. Making use of recent results in sensitivity analysis, we derive a conservative procedure for identifying areas of the covariate space in which unequal allocation of the treatment and control conditions in the RCT is justified. We also consider estimators that pool data between the RCT and the observational study. We propose a robust method for trading off between the estimators to choose a final number of treated units.
Keywords: causal inference, randomized controlled trials, observational data, propensity scores

Rerandomization and Regression Adjustment
Speaker: Xinran Li, University of Illinois at Urbana-Champaign
Coauthor(s): Peng Ding
Abstract: Randomization is a basis for the statistical inference of treatment effects without strong assumptions on the outcome-generating process. Appropriately using covariates further yields more precise estimators in randomized experiments. R. A. Fisher suggested blocking on discrete covariates in the design stage or conducting the analysis of covariance (ANCOVA) in the analysis stage. We can embed blocking into a wider class of experimental design called rerandomization, and extend the classical ANCOVA to more general regression adjustment. Rerandomization trumps complete randomization in the design stage, and regression adjustment trumps the simple difference-in-means estimator in the analysis stage. It is then intuitive to use both rerandomization and regression adjustment. Under the randomization-inference framework, we establish a unified theory allowing the designer and analyzer to have access to different sets of covariates. We find that asymptotically (a) for any given estimator with or without regression adjustment, rerandomization never hurts either the sampling precision or the estimated precision, and (b) for any given design with or without rerandomization, our regression-adjusted estimator never hurts the estimated precision. Therefore, combining rerandomization and regression adjustment yields better coverage properties and thus improves statistical inference. To theoretically quantify these statements, we first propose two notions of optimal regression-adjusted estimators, and then measure the additional gains of the designer and the analyzer in the sampling precision and the estimated precision. We finally suggest using rerandomization in the design and regression adjustment in the analysis, followed by the Huber-White robust standard error.
Keywords: covariate balance; experimental design; potential outcome; randomization
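[Editor's note: the two ingredients combine naturally in code. The sketch below, with made-up data and a loose balance threshold, redraws the assignment until the Mahalanobis imbalance of covariate means is small (rerandomization) and then fits an OLS covariate adjustment (regression adjustment). It is illustrative only, not the paper's estimator or asymptotic theory.]

```python
# Rerandomization (accept assignments with small covariate imbalance)
# followed by regression adjustment on centered covariates.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 3
Xc = rng.normal(size=(n, p))                  # covariates
y0 = Xc @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
y1 = y0 + 2.0                                 # constant treatment effect

def mahalanobis(z):                           # covariate-mean imbalance
    diff = Xc[z == 1].mean(0) - Xc[z == 0].mean(0)
    cov = np.cov(Xc.T) * (1 / (z == 1).sum() + 1 / (z == 0).sum())
    return diff @ np.linalg.solve(cov, diff)

# rerandomize: redraw the assignment until imbalance falls below a threshold
while True:
    z = rng.permutation(np.repeat([0, 1], n // 2))
    if mahalanobis(z) < 1.0:
        break

y = np.where(z == 1, y1, y0)                  # observed outcomes
# regression adjustment: OLS of y on treatment and centered covariates
D = np.column_stack([np.ones(n), z, Xc - Xc.mean(0)])
beta = np.linalg.lstsq(D, y, rcond=None)[0]
print("difference in means:", y[z == 1].mean() - y[z == 0].mean())
print("regression-adjusted estimate:", beta[1])
```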

Harmonizing Fully Optimal Designs with Classic Randomization in Fixed Trial Experiments
Speaker: Adam Kapelner, Queens College, CUNY
Coauthor(s): Abba Krieger, Michael Sklar, David Azriel
Abstract: There is a long debate in experimental design between the classic randomization design of Fisher, Yates, Kempthorne, Cochran and those who advocate deterministic assignments based on notions of optimality. In non-sequential trials comparing treatment and control, covariate measurements for each subject are known in advance, and subjects can be divided into two groups based on a criterion of imbalance. With the advent of modern computing, this partition can be made nearly perfectly balanced via numerical optimization, but these allocations are far from random. These perfect allocations may endanger estimation relative to classic randomization because unseen subject-specific measurements can be highly imbalanced. To demonstrate this, we consider different performance criteria such as Efron's worst-case analysis and our original tail criterion of mean squared error. Under our tail criterion for the difference-in-means estimator, we prove asymptotically that the optimal design must be more random than perfect balance but is not completely random. This result vindicates restricted designs that are used regularly, such as blocking and rerandomization. For a covariate-adjusted estimator, balancing offers fewer rewards, and it seems good performance is achievable with complete randomization. Further work will provide a procedure to find the explicit optimal design in different scenarios in practice.
Keywords: randomization, experimental design, optimization, restricted randomization

Session 4: DOE applications to business, health and education
Organizer: DAE Committee
Chair: David Edwards, Virginia Commonwealth University

Subdata Selection for Subgroup Analysis
Speaker: Min Yang, University of Illinois at Chicago
Abstract: How to implement data reduction to draw useful information from big data is a hot spot of modern scientific research. One attractive approach is data reduction through subdata selection. Typically, this approach is based on a strong model assumption: the data follow one specific statistical model. Big data are complex, however, and it may not be best to model the data using one specific model. Instead of assuming one specific model for the whole population, subgroup analysis assumes there is a hidden group structure and each group has its own model. While subgroup analysis addresses the balance between model complexity and interpretability efficiently, one disadvantage of this approach is its computational complexity. Even when the sample size is moderate, it can take considerable computational resources to analyze the data. How can informative subdata be selected under subgroup analysis? In this talk, a new framework is proposed to address this issue.
Keywords: Optimal, IBOSS, Segmentation
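[Editor's note: the IBOSS keyword refers to information-based optimal subdata selection. A minimal version of that rule (not the subgroup-aware framework of the talk) keeps, for each covariate in turn, the rows with the most extreme values among those not yet selected:]

```python
# IBOSS-style subdata selection: per covariate, keep the r smallest and the
# r largest rows among rows not yet selected.
import numpy as np

def iboss(X, k):
    n, p = X.shape
    r = k // (2 * p)                          # rows per extreme per covariate
    selected = np.zeros(n, dtype=bool)
    for j in range(p):
        avail = np.where(~selected)[0]
        order = avail[np.argsort(X[avail, j])]
        selected[order[:r]] = True            # r smallest on covariate j
        selected[order[-r:]] = True           # r largest on covariate j
    return np.where(selected)[0]

rng = np.random.default_rng(4)
X = rng.normal(size=(100_000, 5))
idx = iboss(X, k=1000)
print(len(idx), "rows selected from", len(X))
```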

Optimal Sample Size for Cluster Randomized Trials: A Simulation-Based Search Algorithm
Speaker: Roee Gutman, Brown University
Coauthor(s): Ruoshui Zhai
Abstract: Cluster randomized trials (CRTs) are experimental designs in which groups are randomized to treatments rather than individual units. Some advantages of CRTs include simplicity of design and reduction of experimental contamination. However, when the goal is to estimate individual-level effects, CRTs can be less efficient than randomizing individuals directly because outcomes of individuals in similar clusters may be correlated. The statistical literature describes several closed-form formulae for approximating the required number of clusters for a predefined effect size, power, and normal distribution. The derivation of these formulae relies on asymptotic approximation; however, in many CRTs, only a small number of clusters are actually randomized, and asymptotic approximation may not be appropriate. Simulation procedures have been proposed as an alternative method for sample size determination, but the procedures have not been completely outlined with a detailed discussion of the assumptions and proper implementations, especially for CRTs. We propose a simulation-based search algorithm to determine the optimal sample size for CRTs. This approach is non-parametric and does not limit investigators to specific types of outcomes or test statistics. It can incorporate data from previous experiments and propose unbalanced sample allocation when the costs for enrolling a cluster to the control or intervention arm differ. We demonstrate this approach using data from a recent CRT that compares the effect of high-dose and low-dose influenza vaccination on health care utilization in nursing homes.
Keywords: Cluster Randomization Trials, Potential Outcomes, Optimal Sample Size, Simulation Procedure
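[Editor's note: a toy version of the idea, with an invented Gaussian outcome model and intracluster correlation: estimate power by simulation at a given number of clusters and step the number upward until a target is met. The talk's algorithm is more general (non-parametric, cost-aware); this shows only the simulation-search skeleton.]

```python
# Toy simulation-based search for the number of clusters needed in a CRT.
import numpy as np

rng = np.random.default_rng(5)

def simulated_power(n_clusters, m=20, effect=0.4, icc=0.1, reps=500):
    hits = 0
    for _ in range(reps):
        arm = np.repeat([0, 1], n_clusters // 2)        # cluster randomization
        b = rng.normal(0, np.sqrt(icc), n_clusters)     # cluster random effects
        ybar = np.array([effect * a + bi +
                         rng.normal(0, np.sqrt((1 - icc) / m))
                         for a, bi in zip(arm, b)])     # cluster means
        diff = ybar[arm == 1].mean() - ybar[arm == 0].mean()
        se = np.sqrt(ybar[arm == 1].var(ddof=1) / (n_clusters // 2) +
                     ybar[arm == 0].var(ddof=1) / (n_clusters // 2))
        hits += abs(diff / se) > 1.96                   # two-sided z-test
    return hits / reps

n = 4
while simulated_power(n) < 0.8:                         # search upward
    n += 2
print("clusters needed for ~80% power:", n)
```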

The Role of Optimal Challenge in Adaptive E-Learning: Evidence from a Field Experiment with Middle School Students
Speaker: De Liu, University of Minnesota
Coauthor(s): Tao Li, Xin Xu, Yufang Wang
Abstract: The existing adaptive e-learning literature focuses on detecting and mitigating knowledge gaps by serving learning materials that learners have not mastered. However, the issue of the challenge levels of new learning materials is neglected. Intrinsic motivation theory predicts that providing optimal challenges is important for ensuring engagement and therefore learning outcomes. To fill this knowledge gap, we conducted a large-scale field experiment with 708 eighth-grade Chinese students at two middle schools for their summer English reading assignments during a three-week period. In this study, we first developed an integrated algorithm that predicts the accuracy of each student for each set of problems at the moment. We used this algorithm to select sets of problems with desirable challenge levels based on estimated accuracy. Students were randomly assigned to one of three groups: a control group with a random challenge level, a high-challenge group, and a low-challenge group. We did not find a significant difference between the high- and low-challenge groups based on pre- and post-tests of English reading skills. However, the steady-challenge groups (including the high- and low-challenge groups) outperformed the control group with random challenge levels. A replication with six other middle schools further confirmed these findings.
Keywords: e-learning, optimal challenge, field experiment, adaptive learning

Day 2: Friday, October 18, 2019

Session 5: Machine learning algorithms assisted by design concepts
Organizer: Arman Sabbaghi, Purdue University
Chair: Wei Zheng, University of Tennessee

Designing for Low-Rank Matrix Recovery: a Maximum Entropy Approach
Speaker: Simon Mak, Duke University
Coauthor(s): Yao Xie
Abstract: Low-rank matrices play a fundamental role in modeling a variety of statistical and machine learning problems. In many such problems, however, these matrices cannot be fully observed as data, due to expensive costs or massive matrix sizes. It is therefore of interest to design the data collection procedure, in order to maximize matrix recovery from incomplete data. We propose a new design method for (linear) matrix measurements, using a novel Singular Matrix-variate Gaussian (SMG) model on matrix X. Fundamental to our method is the "maximum entropy sampling" principle (Shewry & Wynn, 1987), which states that measurements with maximum entropy can in turn maximize information on matrix X. For initial design, this principle provides a way to construct designs using well-packed subspaces. For sequential design, the same principle yields a closed-form design construction, which adaptively incorporates learned subspace information from data. We demonstrate the usefulness of the proposed design method in several real-world applications, including solar imaging, database compression, and building recommendation systems for e-commerce.
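[Editor's note: the cited maximum entropy sampling principle is easy to illustrate outside the matrix-recovery setting: greedily choose design rows whose Gaussian covariance submatrix has the largest log-determinant. The kernel and candidate set below are arbitrary; the talk's SMG-based construction is not shown.]

```python
# Greedy maximum-entropy sampling (Shewry & Wynn, 1987): pick locations
# maximizing the log-determinant of the covariance submatrix.
import numpy as np

rng = np.random.default_rng(6)
pts = rng.uniform(size=(200, 2))                       # candidate locations
K = np.exp(-25 * ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
K += 1e-9 * np.eye(len(pts))                           # numerical jitter

def greedy_max_entropy(K, k):
    chosen = [int(np.argmax(np.diag(K)))]
    for _ in range(k - 1):
        best, best_gain = None, -np.inf
        for j in range(len(K)):
            if j in chosen:
                continue
            sub = np.ix_(chosen + [j], chosen + [j])
            sign, logdet = np.linalg.slogdet(K[sub])   # entropy up to constants
            if logdet > best_gain:
                best, best_gain = j, logdet
        chosen.append(best)
    return chosen

design = greedy_max_entropy(K, 10)
print("selected rows:", design)
```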

Support Points: An Optimal and Model-Free Method for Subsampling Big Data
Speaker: Roshan Joseph, Georgia Institute of Technology
Coauthor(s): Simon Mak
Abstract: This talk presents a novel method called support points, which can be used for optimal and model-free subsampling of big data. This method has important applications to many practical problems in statistics and machine learning, particularly when the available data is plentiful and high-dimensional, but the processing of such data is expensive due to computation or storage costs. We also propose an extension of the method called Projected Support Points to deal with high-dimensional data, which ensures that the data is well-reduced on low-dimensional projections of the data space.
Keywords: Experimental design, Quasi-Monte Carlo
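[Editor's note: support points are defined as minimizers of the energy distance between the representative set and the data. The sketch below minimizes that objective by plain gradient descent on a toy dataset; the published algorithms are much faster, so treat this only as a statement of the objective under invented settings.]

```python
# Rough sketch of support points as minimizers of the energy distance to
# the data (the actual method uses a faster majorization-based algorithm).
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(size=(2000, 2))                 # "big" dataset
n = 50
sp = data[rng.choice(len(data), n, replace=False)].copy()   # initial points

def unit_dirs(a, b):                              # (a_i - b_j)/||a_i - b_j||
    d = a[:, None, :] - b[None, :, :]
    norms = np.maximum(np.linalg.norm(d, axis=-1, keepdims=True), 1e-9)
    return d / norms

for step in range(200):                           # plain gradient descent
    grad = (2 / (n * len(data))) * unit_dirs(sp, data).sum(1) \
         - (2 / n ** 2) * unit_dirs(sp, sp).sum(1)
    sp -= 0.1 * grad

print("support points shape:", sp.shape)
```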

Collaborative Design for Improved Causal Machine Learning on Big Observational Data
Speaker: Arman Sabbaghi, Purdue University
Coauthor(s): Yumin Zhang
Abstract: The successful application of machine learning algorithms for inferring causal effects from an observational study requires consideration of the design of the data. One strategy to design an observational study is to stratify or match the treated and control subjects so as to create balance in their covariate distributions. In Big Data settings, the workload of stratifying or matching a large number of subjects and assessing the distributional balance of their high-dimensional covariates is unwieldy for a single researcher. We propose a procedure to stratify or match subjects in Big Observational Data based on the collaborative efforts of multiple analysts. In this procedure, each analyst first individually estimates the propensity scores for all of the subjects using data on a subset of covariates that they are assigned. They then collaborate with the other analysts to determine a final design utilizing all of their propensity scores. This process decomposes the design task into manageable modules of propensity score estimation, stratification/matching, and covariate balance assessment across analysts. When covariate balance is deemed to be achieved, causal effects are then inferred via the application of flexible machine learning algorithms to the designed observational study. Our procedure ultimately reduces the workload of each study designer, and facilitates machine learning for causal effects from large observational studies with diversified data sources.
Keywords: causal inference; Rubin causal model; XGBoost

Session 6: Internet experiments with emphasis on e-commerce
Organizer and Chair: William Li, Shanghai Advanced Institute of Finance

Active Arm Selection Using Thompson Sampling (AASETS): A Multi-Armed Bandit Method under Arm Budget Constraints
Speaker: Yuanshuo Zhao, Uber
Coauthor(s): Simon Mak, C. F. Jeff Wu
Abstract: In e-commerce companies such as Amazon and LinkedIn, a key step for revenue optimization is designing a website which maximizes conversion rates. This is achieved by first running many conversion experiments on different website settings (i.e., with different combinations of design factors), then using this data to pick an optimal website setting. In real-world scenarios, there are oftentimes many factors of interest, resulting in a large website design space. For such problems, only a small fraction of websites can be run in each experiment round due to budget constraints. This poses a problem for traditional multi-armed bandit methods, which typically assume all website settings (arms) are tested in each experiment round. To address this so-called "arm budget constraint", we propose a new method called Active Arm Selection using Thompson Sampling (AASETS), which performs active arm selection and traffic allocation in an online setting, under a fixed budget of arms in each experiment round. The key novelty of AASETS is the use of a low-order interaction model to learn dependencies between arms on the factorial design space. This model allows an experimenter to (i) adaptively add good arms and remove bad arms from experimentation, and (ii) leverage conversion data over all arms for effective traffic allocation. We show that AASETS outperforms several industry benchmark methods by a large margin under arm budget constraints, both in simulated examples and a real-world problem.
Keywords: multi-armed bandit, Thompson sampling, factorial design
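[Editor's note: a stripped-down illustration of the constraint, not of AASETS itself (which models dependencies between arms through a low-order interaction model): a beta-binomial Thompson sampler runs with only a budget of active arms, periodically swapping the weakest active arm for an untested one. All rates and budgets are invented.]

```python
# Thompson sampling under an "arm budget": only `budget` arms are live per
# round; the worst-performing live arm is periodically swapped out.
import numpy as np

rng = np.random.default_rng(8)
true_rates = rng.uniform(0.02, 0.08, 30)     # conversion rate per website arm
alpha = np.ones(30); beta = np.ones(30)      # Beta posteriors per arm
active = list(range(5)); budget = 5

for round_ in range(200):
    for _ in range(100):                     # traffic within the round
        samp = rng.beta(alpha[active], beta[active])
        arm = active[int(np.argmax(samp))]   # Thompson draw among active arms
        reward = rng.random() < true_rates[arm]
        alpha[arm] += reward; beta[arm] += 1 - reward
    if round_ % 10 == 9:                     # swap out the weakest active arm
        means = alpha[active] / (alpha[active] + beta[active])
        worst = active[int(np.argmin(means))]
        untried = [a for a in range(30) if a not in active]
        if untried:
            active[active.index(worst)] = untried[0]

best = int(np.argmax(alpha / (alpha + beta)))
print("best arm:", best, "true rate:", true_rates[best])
```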

Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb
Speaker: Aleksander Fabijan, Microsoft
Coauthor(s): Jayant Gupchup, Somit Gupta, Jeff Omhover, Wen Qin, Lukas Vermeer, Pavel Dmitriev
Abstract: Online Controlled Experiments (OCEs), aka A/B tests, are becoming a standard operating procedure in software companies. They can detect small causal changes in user behavior due to product modifications (e.g., new features). However, OCEs are sensitive to trustworthiness and data quality issues which, if they go unaddressed or unnoticed, may result in making wrong decisions. One of the most useful indicators of a variety of data quality issues is a Sample Ratio Mismatch (SRM): the situation when the observed sample ratio in the experiment is different from the one that we expected. Just as a fever is a symptom of multiple types of illness, an SRM is a symptom of a variety of data quality issues. Ignoring the SRM without knowing the root cause may result in a bad product modification appearing to be good and getting shipped to users, or vice versa. In this talk, I will share a taxonomy of SRMs that we derived based on our experience of running tens of thousands of OCEs at Microsoft and other case companies that we collaborate with. I will present real online controlled experiments with an SRM that ran at Microsoft and lessons that we learned from diagnosing them.
Keywords: SRM, Online Controlled Experiments
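[Editor's note: the standard SRM check is a chi-square goodness-of-fit test of observed assignment counts against the configured split. The counts and the 0.001 alarm threshold below are illustrative assumptions, not figures from the talk.]

```python
# Sample-ratio-mismatch check: chi-square goodness-of-fit of observed
# traffic counts against the configured split.
from scipy.stats import chisquare

observed = [50_121, 49_370]                 # users seen in control, treatment
expected_ratio = [0.5, 0.5]                 # configured split
total = sum(observed)
stat, p = chisquare(observed, f_exp=[r * total for r in expected_ratio])
if p < 0.001:                               # illustrative alarm threshold
    print(f"SRM detected (p = {p:.2e}): investigate before trusting results")
else:
    print(f"no SRM evidence (p = {p:.3f})")
```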

Improving LinkedIn Member Experience amid Network Interferences: Journey and Learning
Speaker: Weitao Duan, LinkedIn
Coauthor(s): Guillaume Saint Jacques
Abstract: At LinkedIn, we run a lot of A/B tests. It is critical to understand whether a new feature or relevance model is working as intended and is improving the member experience. However, standard A/B testing is, unfortunately, inadequate to assess the impact on our ecosystem. It relies on a strong assumption: when comparing feature A and control, the behavior of feature A users is not impacted by the activity of control users. In other words, most methods assume that there is no interference, sometimes called a "network effect", between features. Many of our experiments at LinkedIn have network impacts that are hard to measure. Moreover, interference happens in various ways across different LinkedIn products. In some cases, interference travels only along an edge between two members, while in other cases, interference propagates through the entire network. In this talk, we will introduce the suite of solutions we have for understanding the true treatment impact on our ecosystem under network interference, including the ego-cluster method and edge-level analysis.
Keywords: network experiment, A/B testing, social network

Session 7: Network experimentation
Organizer: Jean Pouget-Abadie, Google
Chair: Weitao Duan, LinkedIn

Designs for Estimating the Treatment Effect in Networks with Interference
Speaker: Ravi Jagadeesan, Harvard University
Coauthor(s): Natesh S. Pillai and Alexander Volfovsky
Abstract: In this paper we introduce new, easily implementable designs for drawing causal inference from randomized experiments on networks with interference. Inspired by the idea of matching in observational studies, we introduce the notion of considering a treatment assignment as a "quasicoloring" on a graph. Our idea of a perfect quasicoloring strives to match every treated unit on a given network with a distinct control unit that has an identical number of treated and control neighbors. For a wide range of interference functions encountered in applications, we show both by theory and simulations that the classical Neymanian estimator for the direct effect has desirable properties for our designs. This further extends to settings where homophily is present in addition to interference.
Keywords: Experimental Design, Network Interference, Neyman Estimator, Symmetric Interference Model, Homophily

A Graph-Theoretic Approach to Randomization Tests of Causal Effects Under General Interference
Speaker: David Puelz, The University of Chicago
Coauthor(s): Guillaume Basse, Avi Feller, Panos Toulis
Abstract: Interference between units, in which a unit's outcome may depend on other units' assignments, is an important but challenging problem. Standard randomization tests, for example, can only test sharp null hypotheses, such as the global null hypothesis of no effect. Such tests, however, are typically invalid for many hypotheses of interest, such as testing for treatment spillovers, which are not sharp. One solution is to find subsets of units and assignments for which a given null hypothesis is sharp; researchers can then use standard randomization-based tests. Finding these subsets is challenging, and existing methods either have low power or are limited to special cases. In this paper, we propose powerful, valid, and easy-to-implement tests that allow for arbitrary dependence between units. Our key idea is to represent the null hypothesis of interest as a bipartite graph between units and assignments and to find a clique on this graph. Importantly, the null hypothesis is sharp for the subset of units and assignments in that clique, enabling standard randomization-based tests. We illustrate this approach in clustered interference settings and show advantages over methods designed specifically for this setting. We then apply this method to a large-scale policing experiment in Medellín, Colombia, in which spillover is a primary question of interest.
Keywords: randomization test, interference, causal inference, networks, clique
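[Editor's note: the building block being generalized here is the standard randomization test of a sharp null. A minimal version on made-up data follows; the clique construction for non-sharp nulls under interference is beyond this sketch.]

```python
# Standard randomization test of the sharp null of no effect: re-randomize
# the assignment to build the null distribution of the test statistic.
import numpy as np

rng = np.random.default_rng(9)
n = 40
z = np.repeat([0, 1], n // 2)
y = rng.normal(size=n) + 0.8 * z            # observed outcomes

obs = y[z == 1].mean() - y[z == 0].mean()   # test statistic
null = []
for _ in range(5000):
    zp = rng.permutation(z)                 # re-randomize the assignment
    null.append(y[zp == 1].mean() - y[zp == 0].mean())
p = np.mean(np.abs(null) >= abs(obs))       # two-sided randomization p-value
print(f"estimate = {obs:.3f}, randomization p-value = {p:.4f}")
```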

Variance Reduction in Bipartite Experiments through Correlation Clustering
Speaker: Jean Pouget-Abadie, Google
Coauthor(s): Kevin Aydin, Warren Schudy, Kay Brodersen and Vahab Mirrokni
Abstract: Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization, but this design has not been studied in the bipartite setting, where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments.

Session 8: Discrete choice experiments
Organizer and Chair: Angela Dean, Ohio State University

Optimal Product Design by Sequential Experiments in High Dimensions
Speaker: Mingyu Joo, UC Riverside
Coauthor(s): Michael L. Thompson and Greg M. Allenby
Abstract: The identification of optimal product and package designs is challenged when attributes and their levels interact. Firms recognize this by testing trial products and designs prior to launch, during which the effects of interactions are revealed. A difficulty in conducting analysis for product design is dealing with the high dimensionality of the design space and the selection of promising product configurations for testing. We propose an experimental criterion for efficiently testing product profiles with high demand potential in sequential experiments. The criterion is based on the expected improvement in market share of a design beyond the current best alternative. We also incorporate a stochastic search variable selection method to selectively estimate relevant interactions among the attributes. A validation experiment confirms that our proposed method leads to improved design concepts in a high-dimensional space compared with alternative methods.
Keywords: design criterion, expected improvement, interaction effects, stochastic search, variable selection

Efficient Design and Analysis for a Selective Choice Process
Speaker: Qing Liu, University of Wisconsin
Coauthor(s): Ty Henderson
Abstract: Variable selection is a decision heuristic that describes a selective choice process: choices are made based on only a subset of product attributes, while the presence of other ("inactive") attributes plays no active role in the decision. Within this context, we address two integrated topics that have received scant attention: the efficient design of choice experiments and the analysis of data arising from a selective choice process. We propose a new dual-objective compound design criterion that incorporates prior information for the joint purpose of efficient estimation of the effects of the active attributes and detection of the effects of attributes stated as inactive but that may turn out to be active. The approach leverages self-stated auxiliary data as prior information both for individual-level customized design construction and in a heterogeneous variable selection model. We demonstrate the efficiency advantages of the approach relative to design benchmarks and highlight practical implications using both simulated data and actual data from a conjoint choice experiment where individual designs were customized instantaneously using self-stated active/inactive attribute status.
Keywords: variable selection, selective choice process, customized conjoint choice designs, discrete choice designs, compound design criterion

Benefit Formation and Enhancement
Speaker: Greg Allenby, The Ohio State University
Coauthor(s): Hyowon Kim and Dong Soo Kim
Abstract: The identification of product attributes and features that are essential for a benefit to exist is critical to new product development and formulation. Some attributes are benefit enabling in the sense that their presence signals that an offering is responsive to some need, while their absence indicates that the product is not responsive. Not all brands, for example, are seen as luxurious, and specific brand names may be needed for consumers to consider an offering as providing luxury. Benefit formation hinges on these types of attributes being present, while other attributes may only enhance a benefit that has already been formed. Benefit-forming attributes therefore create a specific form of interaction among product attributes that may be heterogeneously distributed in the population. In this presentation we will discuss challenges in the design of choice experiments to uncover benefit forming and enhancement interactions.
Keywords: Conditional subadditive utility, Lancasterian model

Day 3: Saturday, October 19, 2019

Session 9: Optimal designs
Organizer and Chair: John Stufken, UNC Greensboro

Optimal Designs for Nonlinear Multiple Regression Models with Censored Data
Speaker: Dennis Schmidt, Otto-von-Guericke-Universität Magdeburg
Abstract: A broad class of nonlinear multiple regression models with an arbitrary number of covariates is considered. This class includes proportional hazards models with both type I and random censoring, the Poisson and the negative binomial model. The D-optimal designs for a single covariate are two-point designs with a boundary point of the design region as one of its support points. We show that under these conditions the D-optimal designs for any number of covariates can be constructed from the D-optimal designs in the marginal models with a single covariate. The support points of these D-optimal designs are located on the edges of the design region. Furthermore, we consider other optimality criteria. For certain vectors c we determine the c-optimal designs analytically. It is shown that c-optimal designs with singular information matrices and design points in the interior of the design region may be optimal. In certain cases the c-optimal design is not unique and additional c-optimal designs with regular information matrices may exist. For the general k-optimality criteria, which include D- and A-optimality, we compute optimal designs for an arbitrary number of covariates. In a numerical example it is investigated how censoring affects the optimal designs for the different optimality criteria. It is shown that these optimal designs have a much better performance than the optimal designs obtained when the censoring is ignored.
Keywords: multiple regression, censored data, proportional hazards model

A Comparative Study of the Probability Distribution in Optimal Design
Speaker: Sergio Pozuelo-Campos, University of Castilla-La Mancha
Coauthor(s): Mariano Amo-Salas, Víctor Casero-Alonso
Abstract: It is a common assumption in the context of optimal experimental design that the response variable follows a homoscedastic normal distribution. There are, however, other studies that assume different probability distributions based on prior experience or additional information. The main goal of this study is to look at the effect, in terms of efficiency, of misspecification in the probability distribution on optimal design. From the elemental information matrix, which includes information on the probability distribution of the response variable, a generalized Fisher information matrix is obtained. Relevant theoretical results were obtained, for different regression models, comparing heteroscedastic Poisson, gamma and normal distributions. Finally, the analysis was broadened to include a practical case which considers a 4-parameter Hill model, to explain the effect of a pharmaceutical drug on cell development.
Keywords: Elemental Information Matrix; Gamma Distribution; Poisson Distribution; D-optimization; D-efficiency

Robust Experimental Designs for Model Calibration
Speaker: William Myers, The Procter & Gamble Company
Coauthor(s): Roshan Joseph, Arvind Krishna, William Brenneman and Shan Ba
Abstract: A physics-based model can be used for predicting an output only after specifying the values of some unknown physical constants known as calibration parameters. The unknown calibration parameters can be estimated from real data by conducting physical experiments. This paper presents an approach to optimally design such a physical experiment. The problem of optimally designing a physical experiment, using a physics-based model, is similar to the problem of finding an optimal design for nonlinear models. However, the problem is more challenging than the existing work on nonlinear optimal design because of the possibility of model discrepancy, that is, the physics-based model may not be an accurate representation of the true underlying model. Therefore, we propose an optimal design approach that is robust to potential model biases. We show that our designs are better than the commonly used physical experimental designs that do not make use of the information contained in the physics-based model, and better than other nonlinear optimal designs that ignore potential model biases. We illustrate our approach using a toy example and a real example from Procter & Gamble.
Keywords: calibration parameters, model discrepancy, physical experimental design

Session 10: Factorial designs
Organizer and Chair: Boxin Tang, Simon Fraser University

A Class of Multilevel Nonregular Fractional Factorial Designs for Studying Quantitative Factors
Speaker: Lin Wang, George Washington University
Coauthor(s): Hongquan Xu
Abstract: Nonregular fractional factorial designs can have better properties than regular designs, but their construction is challenging. Current research on the construction of nonregular designs focuses on two-level designs. We construct a novel class of multilevel nonregular designs by permuting levels of regular designs via the Williams transformation. The constructed designs can reduce aliasing among effects without increasing the run size. They are more efficient than regular designs for studying quantitative factors.
Keywords: Generalized minimum aberration; Level permutation; Williams transformation.
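[Editor's note: for reference, one common form of the Williams transformation for an odd number of levels p maps level x to 2x when 2x < p and to 2(p - x) - 1 otherwise; applied after level shifts of a regular design's columns, it yields level-permuted designs of the same run size. The small GF(5) design below is an invented illustration, not a construction from the paper.]

```python
# Level permutation via the Williams transformation (one common form,
# odd prime number of levels p), applied to a small regular design.
import numpy as np

p = 5

def williams(x):                       # W(x) = 2x if 2x < p else 2(p-x)-1
    return np.where(2 * x < p, 2 * x, 2 * (p - x) - 1)

# a regular 25-run design over GF(5): columns x, y, x+y, x+2y (mod 5)
x, y = np.meshgrid(np.arange(p), np.arange(p))
D = np.column_stack([x.ravel(), y.ravel(),
                     (x + y).ravel() % p, (x + 2 * y).ravel() % p])

c = 3                                  # one of p possible level shifts
D_perm = williams((D + c) % p)         # shift levels, then apply W
print(D_perm[:6])
```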

Group Orthogonal Supersaturated Designs
Speaker: Ryan Lekivetz, JMP
Coauthor(s): Bradley Jones, Chris Nachtsheim, Dibyen Majumdar, Jon Stallrich
Abstract: This talk introduces a new method for constructing supersaturated designs (SSDs) that is based on the Kronecker product of two carefully chosen matrices. The construction method leads to a partitioning of the columns of the design such that the columns within a group are correlated to the others within the same group, but are orthogonal to any factor in any other group. We refer to the resulting designs as group orthogonal supersaturated designs (GOSSDs). We leverage this structure to obtain an unbiased estimate of the error variance, and to develop an effective, design-based model selection procedure. Simulation results show that the use of these designs, in conjunction with the model selection procedure, enables the identification of larger numbers of active main effects than have previously been reported for supersaturated designs.
Keywords: E(s²)-optimality, UE(s²)-optimality, Group screening designs, Hadamard matrices, Model selection

Cost-Efficient Mixed-Level Covering Designs for Testing Experiments: Construction and Application
Speaker: Frederick Phoa, Academia Sinica
Coauthor(s): Yasmeen Akhtar
Abstract: A covering design is a traditional class of experimental plans for hardware and software testing purposes. This paper presents a class of size-optimal covering designs for testing experiments with mixed-level factors. Among all factors of different levels, one or two factors have a high number of levels while the other factors form a full factorial, so that all level combinations among factor pairs are covered at least once and appear almost equally frequently. We use coloring techniques for hypergraphs to construct such near-balanced mixed-level covering designs with the minimum run size. The resulting class of designs is applied to a real-life experiment in the food industry.
Keywords: Covering Designs, Near-Continuous Factors, Hyperedge Coloring, Mixed Covering Array on Hypergraph

ABSTRACTS OF INVITED POSTERS

Day 1: Thursday, October 17, 2019

Finding Bayesian Optimal Follow-up Experiment of Design Points for Simultaneous Tolerable Dosage Combinations for Multiple Endpoints
Speaker: Faten Alamri, Virginia Commonwealth University and Princess Nourah bint Abdulrahman University
Coauthor(s): Edward Boone and David Edwards
Abstract: Everyone is exposed to chemicals in modern society, as most workplaces, homes, farms, and markets use some form of chemical to address pests, spoilage, sanitation, etc., and assessing and controlling the risk is of paramount importance. In particular, agriculture has a high risk due to fertilizers and pesticides, which may contain hazardous chemicals. Farmers are interested in determining a proper treatment (dose) of chemicals to eliminate pests but want a dosage that still permits a safe work environment. This work develops a statistical framework to determine the tolerable region that has low doses of chemicals and minimal side effects using a Bayesian design of experiments approach. A novel approach to follow-up designs using the variance of the tolerable region when parametric models are used is presented. The work considers various scenarios that could be confronted in the real world to demonstrate the method for obtaining design points that minimize the variance of the resulting tolerable region. This approach considers multiple outcomes and multiple stressors.
Keywords: Bayesian experiment design; Tolerable Region; Markov Chain Monte Carlo; Toxicology data analysis

D-Optimal Designs and Efficiency of Designs for the Antoine's Equation in Distillation Experiments
Speaker: Carlos de la Calle-Arroyo, Universidad de Castilla-La Mancha
Coauthor(s): Jesús López-Fidalgo, Licesio Rodríguez-Aragón
Abstract: In distillation processes it is very important to know precisely the relationship between temperature and vapor pressure. Vapor pressures not only depend on the temperature but vary enormously for different substances.
Keywords: Optimal Experimental Design, Antoine's Equation, D-Optimality, I-Optimality

The Relevance of Designing Experiments in Clinical Trials: Experiences and Perspectives from a Clinical Research Organization (CRO) in Mexico
Speaker: Carlos Diaz-Tufinio, Axis Clinicals Latina / National Institute of Genomic Medicine / UNAM
Coauthor(s): Jose Antonio Palma-Aguirre
Abstract: Statistics has been an indispensable tool for succeeding in the experimental setting in pharmacology. Moreover, the field of Design of Experiments (DoE) has been one of the most important issues to address while planning the conduct of a clinical trial. Specifically, for trials with bioequivalence and comparative bioavailability purposes, where crossover designs are common and feasible to perform, the blocking of potential sources of variation leads to a reduction of the expected biological variability in the pharmacokinetic parameters. This allows a substantial reduction in time and costs, as well as lower sample sizes, which minimizes the exposure of healthy volunteers to unneeded pharmacological treatments, and speeds up the development and positioning of generic drugs in the market, which ultimately has social and economic impacts.
Keywords: Clinical trials, Variability, Drug development, Blocking, Stratification

A Systematic Construction of Cost-Efficient Designs for Order-of-Addition Experiments
Speaker: Jing-Wen Huang, National Tsing Hua University
Coauthor(s): Fred Phoa
Abstract: An order-of-addition (OofA) experiment aims at investigating how the order of factor inputs affects the experimental response, which is recently of great interest among practitioners in clinical trials and industrial processes. Although the initial framework was established more than 70 years ago, recent studies on the design construction of OofA experiments have focused on their properties of algebraic optimality rather than cost-efficiency. The latter is more practical in the sense that some experiments, like treatments, may not easily have an adequate number of observations. In this work, we propose a systematic construction method for designs in OofA experiments from a cost-efficient perspective. Specifically, our designs take the effect of two successive treatments into consideration. To be cost-efficient, each pair of level settings from two different factors in our design matrix appears exactly once. Compared to recent studies of OofA experiments, our designs not only handle experiments with one-level factors (i.e., all factors are mandatorily considered), but also factors of two or more levels, so practitioners may insert a placebo or choose different doses when our designs are used in an OofA experiment in clinical trials, for example.
Keywords: Pairwise Order Designs, Order-of-Addition Experiments, Cost-efficiency, Clinical Trials

Design and Dismemberment for Controlling the Risk of Regret for the Multi-Armed Bandit Speaker: Timothy Keaton, Purdue University Coauthor(s): Arman Sabbaghi Abstract: The multi-armed bandit (MAB) problem refers to the task of sequentially assigning different treatments to experimental units so as to identify the best treatment(s) while controlling the regret, or opportunity cost, of exploration. The traditional criterion of interest for the design of an MAB algorithm has been control of the expected regret over the course of the algorithm's implementation. However, an additional criterion that must be considered for many practical, real-life problems is control of the variance, or risk, of regret. We develop a framework to address both of these criteria by means of two elementary concepts that can be incorporated into any existing MAB algorithm: design of a learning phase and dismemberment of interventions after the learning phase. The utility of our framework is demonstrated in the construction of new Thompson samplers that involve a small number of simple and interpretable tuning parameters. Additionally included is a presentation of new and unique dynamic techniques to visualize MAB algorithms. Keywords: sequential design, binomial bandit, visualizations
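
For intuition, here is a minimal sketch (not the authors' samplers) of a binomial-bandit Thompson sampler with a forced learning phase, after which clearly inferior arms are dismembered; the success probabilities, horizon, phase length, and drop threshold are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.30, 0.45, 0.50])    # hypothetical success probabilities
n_arms, horizon, learn_len = len(true_p), 500, 60

succ = np.zeros(n_arms)                  # posterior for arm a is Beta(1 + succ, 1 + fail)
fail = np.zeros(n_arms)
active = np.ones(n_arms, dtype=bool)

for t in range(horizon):
    if t < learn_len:
        arm = t % n_arms                 # learning phase: round-robin allocation
    else:
        if t == learn_len:               # dismember clearly inferior arms once
            post_mean = (1 + succ) / (2 + succ + fail)
            active = post_mean >= post_mean.max() - 0.10
        draws = rng.beta(1 + succ, 1 + fail)
        draws[~active] = -np.inf         # dismembered arms are never sampled again
        arm = int(np.argmax(draws))
    reward = int(rng.random() < true_p[arm])
    succ[arm] += reward
    fail[arm] += 1 - reward

print("pulls per arm:", (succ + fail).astype(int))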

Distributional clustering: A distribution-preserving clustering method Speaker: Arvind Krishna, Georgia Institute of Technology Coauthor(s): Simon Mak, V. Roshan Joseph Abstract: One of the key uses of clustering is to identify representative points from a dataset of interest via cluster centers. However, a drawback of k-means clustering is that it induces a distortion between the distribution of its cluster centers and that of the underlying data. This can be disadvantageous in problems where cluster centers are subsequently used to gain insights on the data, such as density estimation or pattern recognition, as the accuracy of the analysis method in these cases depends on how well the cluster centers mimic the distribution of the data. To address this shortcoming, we propose a new clustering method called "distributional clustering", where cluster centers capture the distribution of the underlying data. We first prove the asymptotic convergence of the proposed cluster centers to the data generating distribution (which addresses the aforementioned distortion problem of k-means), then propose an efficient algorithm for computing these cluster centers in practice. Finally, we demonstrate the effectiveness of our method on synthetic and real datasets.
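
The distortion in question is easy to reproduce. In the small sketch below (all settings illustrative, not the authors' method), the k-means centers of strongly skewed data over-represent the sparse right tail, so their mean exceeds the data mean.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=20000).reshape(-1, 1)   # skewed data, mean 1

# Fit k-means and extract the cluster centers as "representative points".
centers = KMeans(n_clusters=50, n_init=10, random_state=1).fit(x).cluster_centers_

# The centers follow a heavier-tailed distribution than the data themselves.
print("data mean   :", float(x.mean()))
print("centers mean:", float(centers.mean()))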

EzGP: Easy-to-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors Speaker: Abhyuday Mandal, University of Georgia Coauthor(s): Qian Xiao, C. Devon Lin and Xinwei Deng Abstract: Computer experiments with both quantitative and qualitative (QQ) inputs are commonly used in science and engineering applications. Constructing desirable emulators for such computer experiments remains a challenging problem. In this article, we propose an easy-to-interpret Gaussian process (EzGP) model for computer experiments to reflect the change of the computer model under different level combinations of qualitative factors. The proposed modeling strategy, based on an additive Gaussian process (GP), is flexible to address the heterogeneity of computer models involving multiple qualitative factors. We also develop two useful variants of the EzGP model to achieve computational efficiency when dealing with high-dimensional data and large data sizes. The merits of these models are illustrated by several numerical examples and a real data application. Keywords: Additive Model; Categorical Data; Computer Model; Emulator; Kriging.

Designing for Large Simulations: A Case Study Speaker: R. Vincent Paris, Iowa State University Coauthor(s): George Ostrouchov, Drew Schmidt, Joshua New Abstract: We will present an overview of using a large group screening experiment for a deterministic large-scale simulation written by the Department of Energy. The simulation had up to 4.5k variables, with special interest in second-order interactions and good performance on supercomputers. The advantages of the implementation of group screening compared to the original analysis, as well as a comparison of the final results, will be presented. Keywords: Group Screening, Simulation, Factorial Experiment
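
For intuition only (this is not the presented implementation, and the toy simulator below is hypothetical), two-stage group screening for a deterministic simulator works as follows: factors are pooled into groups, a first pass flags active groups, and only their member factors are tested individually.

import numpy as np

rng = np.random.default_rng(5)
p, g = 100, 10                            # 100 factors pooled into 10 groups
beta = np.zeros(p)
beta[[3, 47]] = [2.0, -1.5]               # two truly active factors

def sim(x):                               # toy additive deterministic simulator
    return x @ beta

groups = np.array_split(np.arange(p), g)
base = -np.ones(p)                        # all factors at their low level

active_groups = []
for gi, idx in enumerate(groups):         # stage 1: flip one whole group at a time
    x = base.copy()
    x[idx] = 1.0
    if abs(sim(x) - sim(base)) > 1e-8:    # caveat: within-group cancellation possible
        active_groups.append(gi)

active_factors = []
for gi in active_groups:                  # stage 2: factors within active groups only
    for j in groups[gi]:
        x = base.copy()
        x[j] = 1.0
        if abs(sim(x) - sim(base)) > 1e-8:
            active_factors.append(j)

print("active groups :", active_groups)   # expect the groups containing 3 and 47
print("active factors:", active_factors)  # expect [3, 47]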

Experimental Designs Suitable for Cases with Varying Error Variances Speaker: Garima Priyadarshini, Imperial College London Abstract: For the usual linear model Y = Xβ + ε, the ordinary least squares (OLS) estimator is BLUE (best linear unbiased estimator) only if the assumptions of linear regression are satisfied. Among these, the assumption of homoscedasticity is frequently observed to be violated, and certain solutions to deal with this are available. These solutions, however, assume some knowledge about the form of the variance terms. As estimation of treatment effects in experimental designs is essentially OLS estimation, and any knowledge about the form of the observed variances is quite unlikely to be available at the time of designing an experiment for a process, it would be useful to obtain robust experimental designs suited to the case of variable error variances. This work aims at defining a methodology to obtain suitable experimental designs for processes that exhibit random error variances. The methodology proposes designs with sparse associated design matrices, tailored to facilitate estimation of the treatment effects of interest. As the presence of heteroscedasticity makes the usual variance estimates of the treatment effect estimates unreliable, the utility of these designs lies in the fact that the estimated variances of the treatment effect estimates are brought close to their OLS counterparts by exploiting this sparsity in the design matrix. The proposed designs are thus the most efficient for estimating the effects of interest when the error variances are not the same. Keywords: experimental designs, heteroscedasticity, efficient designs
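
The underlying issue can be stated in one formula: with error covariance Sigma != sigma^2 * I, the true variance of the OLS estimator is the sandwich (X'X)^-1 X' Sigma X (X'X)^-1 rather than sigma^2 (X'X)^-1. A small numeric sketch (the design and error variances below are hypothetical):

import numpy as np

X = np.column_stack([np.ones(6), [0, 0, 1, 1, 2, 2]])    # tiny two-column design
Sigma = np.diag([1.0, 1.0, 1.0, 4.0, 4.0, 9.0])           # unequal error variances

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv            # true Var(beta_hat) under heteroscedasticity
naive = Sigma.diagonal().mean() * XtX_inv                 # what the homoscedastic formula would report

print("true Var(beta_hat):\n", sandwich)
print("naive OLS formula :\n", naive)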

Design and Analysis of Order-of-Addition Experiments with Application to Sequential Drug Administration Speaker: Zack Stokes, UCLA Coauthor(s): Hongquan Xu Abstract: In many physical and computer experiments the order in which the steps of a process are performed may have a substantial impact on the measured response. Often the goal in these situations is to uncover the order which optimizes the response according to some metric. The brute-force approach of performing all permutations quickly becomes infeasible as the number of components in the process increases. Instead, we seek to develop order-of-addition experiments that choose an economically viable subset of permutations to test. The statistical literature on this topic is sparse and many researchers rely on ad hoc methods to study the effect of process order. In this work we present a series of novel developments through the applied lens of sequence optimization for drug combinations. These developments include models that appropriately exploit the structure of the data, a method for constructing optimal designs under these proposed models, and evaluation of the robustness of the constructed designs to algorithmic variability and model misspecification. Keywords: Drug administration, Optimal design, Order of addition, Orthogonal arrays
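
One standard device from this literature for exploiting the structure of order data is the pairwise-order (PWO) coding: each pair of components (i, j) is coded +1 if i is added before j and -1 otherwise. The sketch below builds the full PWO design for three components; it is illustrative, not necessarily the speakers' exact construction.

from itertools import combinations, permutations

def pwo_row(order):
    """PWO coding of one run, e.g. order = (2, 0, 1)."""
    pos = {c: k for k, c in enumerate(order)}
    m = len(order)
    return [1 if pos[i] < pos[j] else -1 for i, j in combinations(range(m), 2)]

# Full design for 3 components: 6 permutations by 3 PWO columns.
for order in permutations(range(3)):
    print(order, pwo_row(order))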

A Novel Bayesian Optimization Approach for Both Quantitative and Sequence Inputs Speaker: Qian Xiao, University of Georgia Coauthor(s): Xinwei Deng; Abhyuday Mandal Abstract: Drug combinations have been widely applied in disease treatment, especially chemotherapy for cancer. Traditionally, researchers focus only on optimizing drugs' dosages. Yet, some recent studies show that the order of adding drug components is also important to the efficacy of drug combinations. In practice, experiments enumerating all possible sequences with different drug doses are not usually affordable. Thus, statistical tools that can identify optimal drug combinations, consisting of both quantitative and sequence inputs, within a few runs are required. Such problems are also encountered in engineering, chemistry, physics, management, food science, etc. In this paper, we propose a novel Bayesian optimization approach to tackle this problem, which includes a novel Mapping-based Additive Gaussian Process (MaGP) model for both quantitative and sequence inputs, an innovative global optimization algorithm for the sequential scheme, and a new class of optimal experimental designs. The proposed method can identify optimal solutions within a few runs, provide accurate predictions on the response surface, and give clear interpretations of the model structure. The MaGP model can easily be generalized to further include qualitative inputs, e.g., blocking in physical experiments. We illustrate the superiority of the proposed method via a real drug experiment on lymphoma and several simulation studies. Keywords: Order of addition experiment, Data Science, Pharmaceutical Study

Dose Allocation Using Optimal Design Theory in Phase I/II Clinical Trials Where Toxicity and Efficacy Are Evaluated Together Speaker: Yue Zhang, University of Illinois at Chicago Abstract: It is common that both toxicity and efficacy are of interest and often observed simultaneously in dose finding for phase I/II clinical trials. In this situation, evaluating them together and identifying the optimal dose, such that toxicity is under control and efficacy is maximized, are crucial targets. The clinical outcomes are treated as following a sequential order: no dose-limiting toxicity (DLT) and no efficacy, no DLT but with efficacy, and severe DLT; they are modeled using a continuation ratio (CR) model. We extend the continual reassessment method (CRM) by incorporating optimal design theory to identify the optimal dose under the CR model. Some general analytic conclusions regarding the optimal dose allocation at each stage of the trial are drawn. Simulation studies under various scenarios demonstrate the method's promising ability to identify the optimal dose while alleviating safety and ethical concerns. Keywords: optimal dose, optimal design theory, toxicity and efficacy, CRM

Day 2: Friday, October 18, 2019

The Transmuted F Test: A Robust Testing Criterion in Design and Analysis of Experiments Speaker: Kazeem Adepoju, University of Minnesota Coauthor(s): Galin Jones Abstract: The analysis of experimental studies involves the use of analysis of variance (ANOVA) models. In single-factor experiments, ANOVA models are used to compare the mean response values at different levels of the factor via the F test. The validity of the F test, which compares several population means, depends on the underlying assumptions, which include independence of the populations, constant variance, and absence of outliers, among others. Arguably, a common source of violation of these assumptions is an outlier, which leads to unequal variances across the populations and consequently to the failure of the classical F test to reach the correct decision about the null hypothesis. A series of robust tests have been proposed to ameliorate these lapses, with some degree of inaccuracy and with limitations in terms of inflating the Type I error. This study focuses on developing a Transmuted F test capable of making decisions in analysis of variance that are robust to the existence of outliers. The performance of the Transmuted F test is compared with existing F tests in the literature using the power of the test. Keywords: ANOVA, Classical F, Transmuted F, Outlier

Locating Arrays and Their Application as Screening Designs Speaker: Yasmeen Akhtar, Arizona State University (CIDSE) Coauthor(s): C. J. Colbourn, S. A. Seidel, J. Stufken, V. R. Syrotiuk, and F. Zhang Abstract: The identification of parameters having a significant impact on the response, and of the interactions that account for faulty behavior, are two essential aspects of testing. Traditional screening designs do not guarantee identification of faulty interactions. On the other hand, covering arrays can reveal the presence of a fault, but they are inadequate for locating the interactions responsible for it and are not suitable for measuring the effects. To address this problem, we use a locating array (LA) as the screening design. A (d, t)-locating array is a covering array of strength t with the property that any set of d t-tuples can be distinguished from any other such set by appearing in a distinct set of trials. Consequently, LAs can distinguish the influence of different interactions (t-tuples). We propose a method to screen important factors and interactions in experiments based on LAs. The method is used for analyzing data from different experiments, and the results show that it efficiently identifies the influential factors and interactions; the findings also agree with results obtained by existing methods in the literature. Keywords: Locating array, screening designs

Inducing Point Methods for Gaussian Process Surrogates of Large-Scale Simulations Speaker: David Cole, Virginia Tech Abstract: Gaussian processes (GPs) provide a flexible methodology for modeling complex surfaces. One challenge with GPs is the computational burden with an increasing sample size or number of dimensions. The machine learning community has turned to pseudo-inputs or inducing points to reduce the computational burden in such contexts. We seek to port this family of methods to build GP surrogates for noisy and even heteroskedastic stochastic processes, with extensions to sequential design and Bayesian optimization. We show that using inducing points extends the reach of GP surrogates in big simulation contexts and makes for efficient design and meta-modeling of large-scale computer simulation experiments. Examples are provided for epidemiological, industrial, and financial applications.
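
As a flavor of the inducing-point idea (a bare-bones "subset of regressors" style approximation, not the presented methodology), the sketch below replaces an n x n GP solve with an m x m one built from m inducing points; the kernel, length scale, and data are all hypothetical.

import numpy as np

def k(A, B, ls=0.3):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-d2.sum(-1) / (2 * ls**2))

rng = np.random.default_rng(2)
X = rng.random((500, 1))                        # n = 500 training inputs
y = np.sin(8 * X[:, 0]) + 0.1 * rng.standard_normal(500)
Z = np.linspace(0, 1, 20)[:, None]              # m = 20 inducing points
noise = 0.1 ** 2

Kmm, Kmn = k(Z, Z), k(Z, X)
A = Kmm + Kmn @ Kmn.T / noise                   # m x m system instead of n x n
w = np.linalg.solve(A, Kmn @ y / noise)

Xs = np.linspace(0, 1, 5)[:, None]              # a few test inputs
print("approximate GP posterior mean:", k(Xs, Z) @ w)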

Optimal Crossover Designs for Generalized Linear Models Speaker: Jeevan Jankar, Indian Institute of Science Education and Research Kolkata, India Coauthor(s): Abhyuday Mandal, Jie Yang Abstract: We identify locally D-optimal crossover designs for generalized linear models. We use generalized estimating equations to estimate the model parameters along with their variances. To capture the dependency among observations coming from the same subject, we propose six different correlation structures. We identify the optimal allocations of units for different sequences of treatments. For two-treatment crossover designs, we show via simulations that the optimal allocations are reasonably robust to different choices of the correlation structures. We discuss a real example of multiple-treatment crossover experiments using Latin square designs. Using a simulation study, we show that a two-stage design with our locally D-optimal design at the second stage is more efficient than the uniform design, especially when the responses from the same subject are more correlated. Keywords: Approximate Designs, D-Optimality, AR(1) Correlation Structure, Generalized Estimating Equations, Two-Stage Design.

A New Analysis Strategy for Designs with Complex Aliasing Speaker: Andrew Kane, Duke University Coauthor(s): Abhyuday Mandal Abstract: Non-regular designs are popular in planning industrial experiments for their run-size economy. These designs often produce partially aliased effects, where the effects of different factors cannot be completely separated from each other. In this paper, we propose applying an adaptive lasso regression as an analytical tool for designs with complex aliasing. Its utility compared to traditional methods is demonstrated by analyzing real-life experimental data and simulation studies.
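
A common way to implement the adaptive lasso with standard tools, sketched below on simulated data (the design, penalty values, and weights are illustrative, not the authors' settings), is to rescale each column by an initial coefficient estimate, run an ordinary lasso, and rescale the coefficients back.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.choice([-1.0, 1.0], size=(12, 8))       # hypothetical 12-run two-level design
beta = np.array([3, 0, 0, 2, 0, 0, 0, 0.0])     # two active effects
y = X @ beta + 0.5 * rng.standard_normal(12)

b0 = Ridge(alpha=1.0).fit(X, y).coef_           # initial estimate for the weights
w = np.abs(b0) + 1e-8                           # adaptive weights |b0|^gamma with gamma = 1
fit = Lasso(alpha=0.1).fit(X * w, y)            # weighted lasso via column rescaling
print("adaptive-lasso coefficients:", fit.coef_ * w)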

Inverse Problem for Dynamic Computer Simulators via Multiple Scalar-valued Contour Estimation Speaker: Joseph Resch, University of Georgia Coauthor(s): Abhyuday Mandal, Pritam Ranjan Abstract: The inverse problem refers to finding a set of inputs that generates a pre-specified simulator output. A dynamic computer simulator produces a time-series response y_t(x) over time points t = 1, 2, ..., T for every given input parameter x. The motivating application uses a rainfall-runoff measurement model (called the Matlab Simulink model) to predict the rate of runoff and sediment yield in a watershed, using several inputs on the soil, weather (e.g., temperature and humidity), elevation, and land characteristics. The input parameters x are typically unknown, and our aim is to identify the ones that correspond to the target response, which is to be used further for calibrating the computer model and making more accurate predictions of water level. The proposed approach starts with discretizing the target response series on k (<< T) time points and then iteratively solves k scalar-valued inverse problems with respect to the discretized targets. We investigate two methods for the scalar-valued inverse problem. The first is a sequential approach of contour estimation via an expected improvement criterion developed by Ranjan et al. (2008, DOI: 10.1198/004017008000000541). The second is a modified history matching algorithm in the spirit of Bhattacharjee et al. (2019, DOI: 10.1007/s10651-019-00420-9). We also propose to use spline smoothing of the target response series to identify k as the optimal number of knots, and the discretization time points as the actual locations of the knots. The performance of the proposed methods is compared for several test-function-based computer simulators and the real-life Matlab-Simulink model. Keywords: History matching; Gaussian process model; Expected improvement criterion; Spline smoothing; Matlab-Simulink model

Optimal Design for a Two-Parameter Generalized Linear Mixed Model in Longitudinal Studies Speaker: Yao Shi, Arizona State University Coauthor(s): John Stufken Abstract: In a longitudinal study, an experiment may involve several different subjects with discrete responses, for which we usually apply a generalized linear mixed-effects model (GLMM). However, determining the information matrix can be difficult because it usually does not have a closed form. My focus is on locally D-optimal design under a two-parameter longitudinal GLMM. Applying the penalized quasi-likelihood (PQL) method, we obtain an approximation to the real information matrix. Then, under different levels of response accuracy and respondent heterogeneity, comparisons between the approximated information matrix (by the PQL method) and the numerically calculated real information matrix (by numerical integration) are made, showing good performance. Finally, a case study (the French EPIDOS study) shows the optimal design found based on this PQL approximation. Keywords: Locally optimal design, Binary longitudinal study, Penalized quasi-likelihood, Two-parameter model.

Minimum Space-Filling Aberration for Strong Orthogonal Arrays Speaker: Ye Tian, UCLA Coauthor(s): Hongquan Xu Abstract: Strong orthogonal arrays (He and Tang, 2013) with "stratified" orthogonality have good space-filling properties. If we treat the design matrix as points distributed in the design region, a strength-t strong orthogonal array has an equal number of points in any of the s^t equal-volume margins of the design region obtained by projection. We propose the space-filling pattern as a measurement of this projection property. J-characteristics (Tang, 2001) are carefully redefined to capture finer information about orthogonality. Calculated from J-characteristics, each element of the space-filling pattern indicates how uniform a certain type of projection design will be. A minimum space-filling aberration criterion based on the space-filling pattern is introduced as a systematic way of ranking the space-filling properties of designs. Justification and applications of this new criterion are provided. Keywords: Space-filling pattern, uniform projection, strong orthogonal arrays

A Review on Finding Maximin Distance Latin Hypercube Designs with Flexible Sizes Speaker: Hongzhi Wang, University of Georgia Coauthor(s): Qian Xiao and Abhyuday Mandal Abstract: Space-filling Latin hypercube designs (LHDs), e.g., maximin distance LHDs, are widely used in computer experiments. Their construction with flexible run and factor sizes is challenging. Different algebraic constructions and heuristic algorithms have been proposed in the literature, each with its own advantages and disadvantages. In this paper, we compare and summarize some popular methods of both kinds and provide recommendations for different design sizes. The results can be used as benchmarks for future development on finding maximin distance LHDs. Keywords: Computer Experiment; Lattice Point Designs; Simulated Annealing; Particle Swarm Optimization
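
For readers new to the criterion, the toy sketch below (a naive random search, far weaker than the algebraic and metaheuristic methods reviewed above) shows what "maximin distance LHD" means operationally: among Latin hypercubes, keep the design with the largest minimum pairwise distance. The run and factor sizes are arbitrary.

import numpy as np
from scipy.spatial.distance import pdist

def random_lhd(n, d, rng):
    """n-run, d-factor Latin hypercube: each column is a permutation of 0..n-1."""
    return np.column_stack([rng.permutation(n) for _ in range(d)]) / (n - 1)

rng = np.random.default_rng(4)
best, best_d = None, -1.0
for _ in range(2000):                  # crude random search over LHDs
    D = random_lhd(10, 3, rng)
    m = pdist(D).min()                 # minimum pairwise distance of this design
    if m > best_d:
        best, best_d = D, m

print("best minimum distance found:", best_d)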

Dimensional Analysis for Response Surface Methodology Speaker: Ching-Chi Yang, University of Memphis Coauthor(s): Dennis Lin Abstract: Response surface methodology (RSM) is widely used in chemical engineering and industrial processes. The objective is to obtain, in a small number of experiments, the input values for which the response is optimal. Dimensional analysis (DA) is a well-developed methodology in the physical sciences and engineering. Given all relevant variables that impact the system of interest, DA extracts dimensionless variables from the original variables via their physical dimensions, and the underlying system can be expressed in its simplest form via DA. Because the dimensionless variables are sufficient to approximate this simplest form, an optimal design on the dimensionless variables can be more efficient than an optimal design on the original variables. Moreover, models based on the dimensionless variables are free from dimensional constraints. However, the dimensionless response might not be the original response, so the optimal original response cannot be obtained directly. In this paper, a general procedure for utilizing DA in RSM is proposed. The optimal response obtained by the proposed method is proved to be the optimal response in the original space under some regularity conditions. The number of experiments required by the proposed method is smaller than that required by conventional RSM. The Airfoil case study, as well as simulated examples, is used for illustration. Keywords: Dimension Reduction; Optimization; Variable Selection
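
The DA extraction step has a clean linear-algebra core: dimensionless groups are exponent vectors in the null space of the dimensional matrix. A textbook pendulum example (not from the talk):

import numpy as np
from scipy.linalg import null_space

# Columns: period T, length L, gravity g, mass m; rows: exponents of M, L, T.
D = np.array([[0, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, -2, 0]])

pi = null_space(D)[:, 0]
pi = pi / pi[0]                # scale so the period enters with exponent 1
print(dict(zip(["T", "L", "g", "m"], np.round(pi, 3))))
# Exponents (1, -0.5, 0.5, 0): the single dimensionless group is T * sqrt(g / L).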

Sequential Maximin Good Lattice Point Sets Speaker: Xueru Zhang, Nankai University Coauthor(s): Min-Qian Liu, Yong-Dao Zhou Abstract: Localization of search in the quasi-Monte Carlo method can accelerate convergence significantly, and it is widely utilized to approximate the global optimum of a continuous function. However, it typically neglects previous points that fall into the current domain, which is uneconomical and causes a loss of information. To overcome this drawback, this paper proposes a new sequential quasi-random point set, the sequential good lattice point (SGLP) set, which contributes to a new local search method for optimization, named SGO. To improve the uniformity of the scattered points, sequential maximin good lattice point (SMGLP) sets are proposed and constructed under the maximin distance criterion. The SGO method does not easily fall into a local optimum and converges rapidly. Moreover, combined with modeling technology, SGO can be extended to investigate an unknown process or system with random error. Compared with the response surface method, SGO is more effective for multi-peak function optimization. Keywords: Quasi-Monte Carlo method; Localization of search; Maximin distance criterion; Response surface method
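
The non-sequential building block is easy to construct: for run size n and a generator vector h with entries coprime to n, row i of a good lattice point set is (i * h mod n) / n. A minimal sketch of this basic construction (the generator below is chosen ad hoc for illustration):

import math
import numpy as np

def glp_set(n, h):
    """Good lattice point set with run size n and generator vector h."""
    assert all(math.gcd(hk, n) == 1 for hk in h), "entries of h must be coprime to n"
    i = np.arange(1, n + 1)[:, None]
    return (i * np.asarray(h) % n) / n   # n runs, len(h) factors, values in [0, 1)

D = glp_set(n=13, h=[1, 5, 8])           # 13 runs, 3 factors
print(D)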

Distance-Distributed Design for Gaussian Process Surrogates Speaker: Boya Zhang, Virginia Tech Coauthor(s): D. Austin Cole, Robert B. Gramacy Abstract: A common challenge in computer experiments and related fields is to efficiently explore the input space using a small number of samples, i.e., the experimental design problem. Much of the recent focus in the computer experiment literature, where modeling is often via Gaussian process (GP) surrogates, has been on space-filling designs, via maximin distance, Latin hypercube, etc. However, it is easy to demonstrate empirically that such designs disappoint when the model hyper-parameterization is unknown, and must be estimated from data observed at the chosen design sites. This is true even when the performance metric is prediction-based, or when the target of interest is inherently or eventually sequential in nature, such as in blackbox (Bayesian) optimization. Here we expose such inefficiencies, showing that in many cases purely random design is superior to higher-powered alternatives. We then propose a family of new schemes by reverse engineering the qualities of the random designs which give the best estimates of GP length scales. Specifically, we study the distribution of pairwise distances between design elements, and develop a numerical scheme to optimize those distances for a given sample size and dimension. We illustrate how our distance-based designs, and their hybrids with more conventional space-filling schemes, outperform in both static (one-shot design) and sequential settings. Keywords: Computer experiment, emulator, experimental design, sequential design, Bayesian optimization

2019 DAE List of Attendees

First Name Last Name Email Organization
Kazeem Adepoju [email protected] University of Minnesota
Yasmeen Akhtar [email protected] Arizona State University (CIDSE)
Faten Alamri [email protected] Virginia Commonwealth University
Greg Allenby [email protected] The Ohio State University
Ahlam Alzharani [email protected] Virginia Commonwealth University
Tanveer Bhuiyan Hossain [email protected] University of Tennessee
Ham Bozdogan [email protected] University of Tennessee
Donald Bryson [email protected] IMS/ACM/IEEE
Kelley Callaway [email protected] University of Tennessee
Din Chen [email protected] University of North Carolina-Chapel Hill
Jianbin Chen [email protected] Nankai University
David Cole [email protected] Virginia Tech
Carlos de la Calle-Arroyo [email protected] Universidad de Castilla-La Mancha
Angela Dean [email protected] The Ohio State University
Abhijeet Dhakane [email protected] University of Tennessee
Carlos Diaz-Tufinio [email protected] Axis Clinicals Latina / National Institute of Genomic Medicine / UNAM
Weitao Duan [email protected] LinkedIn
David Edwards [email protected] Virginia Commonwealth University
Aleksander Fabijan [email protected] Microsoft
Nancy Flournoy [email protected] University of Missouri
Mike Galbreth [email protected] University of Tennessee
Robert Gramacy [email protected] Virginia Tech
Roee Gutman [email protected] Brown University
Arved Harding [email protected] Eastman Chemical Company


Harrison Hicks [email protected] University of Tennessee
Haileab Hillafu [email protected] University of Tennessee
Jing-Wen Huang [email protected] National Tsing Hua University
Ravi Jagadeesan [email protected] Harvard University
Jeevan Jankar [email protected] University of Georgia
Matt Jones [email protected] Austin Peay State University
Max Joo [email protected] University of California, Riverside
Roshan Joseph [email protected] Georgia Tech
Andrew Kaminsky [email protected] CFDRC
Andrew Kane [email protected] Duke University
Adam Kapelner [email protected] Queens College, CUNY
Timothy Keaton [email protected] Purdue University
Xiangshun Kong [email protected] Beijing Institute of Technology
Arvind Krishna [email protected] Georgia Institute of Technology
Joseph Kupolusi Ayodele [email protected] Federal University of Technology
Adam Lane [email protected] Cincinnati Children's Hospital
Yeon Ok Lee [email protected] St. Jude Children's Hospital
Ryan Lekivetz [email protected] SAS
William Li [email protected] Shanghai Advanced Institute of Finance
Xinran Li [email protected] University of Illinois at Urbana-Champaign
Seung-Hwan Lim [email protected] Oak Ridge National Lab
Zhantao Lin [email protected] George Mason University
De Liu [email protected] University of Minnesota
Qing Liu [email protected] University of Wisconsin
Yanxi Liu [email protected] University of Illinois at Chicago
Peshadi Liyanaarachchi [email protected] University of Tennessee


Dibyen Majumdar [email protected] University of Illinois at Chicago
Simon Mak [email protected] Duke University
Abhyuday Mandal amandal@stat..edu University of Georgia
Hugh Medal [email protected] University of Tennessee
Robert Mee [email protected] University of Tennessee
Leslie Moore [email protected] Sandia National Laboratories
JP Morgan [email protected] Virginia Tech
Max Morris [email protected] Iowa State University
William Myers [email protected] The Procter & Gamble Company
Mitsunori Ogawa [email protected] University of Tokyo
Kavya Paladugu [email protected] University of Georgia
Reid Vincent Paris [email protected] Iowa State University
Frederick Phoa [email protected] Academia Sinica, Taiwan
Jean Pouget-Abadie [email protected] Google
Sergio Pozuelo-Campos [email protected] University of Castilla-La Mancha
Garima Priyadarshini [email protected] Imperial College London
David Puelz [email protected] University of Chicago, Booth School of Business
Joseph Resch [email protected] University of Georgia
Evan Rosenman [email protected] Stanford, PhD student
Arman Sabbaghi [email protected] Purdue University
Dennis Schmidt [email protected] Otto-von-Guericke-Universität Magdeburg
Yao Shi [email protected] Arizona State University
Zack Stokes [email protected] University of California, Los Angeles
John Stufken [email protected] University of North Carolina at Greensboro

Yaojin Sun [email protected] University of Tennessee
Jindong Tan [email protected] University of Tennessee
Boxin Tang [email protected] Simon Fraser University
Yike Tang [email protected] The University of Illinois at Chicago
Sergey Tarima [email protected] Medical College of Wisconsin
Ye Tian [email protected] University of California, Los Angeles
Saul Toscano [email protected] Cornell University
Hongzhi Wang [email protected] University of Georgia
Lin Wang [email protected] George Washington University
Chunyan Wang [email protected] Nankai University
Stefan Wild [email protected] Argonne National Laboratory
Qian Xiao [email protected] University of Georgia
Min Yang [email protected] University of Illinois at Chicago
Jialin Yang [email protected] University of Georgia
Ching-Chi Yang [email protected] University of Memphis
Yuhao Yin [email protected] University of California, Los Angeles
Xueru Zhang [email protected] Nankai University
Yue Zhang [email protected] University of Illinois at Chicago
Boya Zhang [email protected] Virginia Tech
Yuanshuo Zhao [email protected] Uber
(David) Wei Zheng [email protected] University of Tennessee

Parking Information

The Sheraton Four Points is offering free parking for DAE conference participants who are staying at the hotel. UT’s Student Union Building, where the conference will be held, is a 7-minute walk from the hotel.

Those who are not staying at the Sheraton and plan to drive to UT’s campus may park in the UT Visitors Parking Garage at 1563 White Avenue. It is a 5-minute walk from this garage to the Student Union Building.

If you plan to use this garage, ask for a parking pass at the conference registration table each day. Present the pass when you exit the garage to be exempted from paying.

Conference Banquet

The conference banquet will be held at Club LeConte, 800 South Gay St., 27th floor, 6:00 – 8:00 p.m., Oct. 18.

We will provide a shuttle from the Student Union to Club LeConte at 5:50 p.m., and back to the hotel after the banquet.

Club LeConte is a 15-minute walk from the Sheraton Four Points.

For those driving to Club LeConte, there is $2 parking for the evening below Club LeConte. There is also free parking in the evenings and weekends in several nearby parking garages.

For more parking information: https://www.downtownknoxville.org/first-tennessee-plaza-garage/

Walking Map

Four Points by Sheraton Guests: The Student Union is located within a short walking distance (7-minute walk):

Recommended Route via White Ave:
Head west on White Ave (go four blocks).
Turn left on James Agee St and cross Cumberland Ave.
Access the Student Union at the nearest entrance (see star on map).

(Walking map: Volunteer Blvd and Cumberland Ave, with the Student Union entrance/exit marked.)

Invited Presentations - Room 169 (Tiered Seminar Room)
Breakfast, Lunch, and Roundtable Discussions - Ballroom A

Poster Sessions and Mentoring - Room 262

Downtown Knoxville

More than 80 restaurants catering to every taste and budget, over 40 boutique shops in beautifully restored buildings, festivals, world-class theatres, parks, museums, and extraordinary venues for events and outdoor activities, all in less than one square mile. For more information on local entertainment and attractions, visit Visitknoxville.com.

Popular Destinations

GAY STREET: Since the 1790s, Gay Street has played a primary role in Knoxville’s historical and cultural development. This popular thoroughfare runs through a vibrant downtown that features historic theatres, museums, galleries, parks, shops, rooftop bars and more than 75 restaurants within less than one square mile. Several of these buildings are listed on the National Register of Historic Places.

MARKET SQUARE: Since the 1860s, Market Square has been one of Knoxville’s most popular places to shop, work, play, eat, drink and live. Market Square is home to outdoor concerts and movies, Shakespeare on the Square and much more. The Square is located just minutes from the University of Tennessee campus and Knoxville’s Urban Wilderness, a 1,000-acre stretch of land that features more than 50 miles of hiking and biking and connects parks, trails, Civil War sites, and recreational amenities.

HISTORIC OLD CITY: Knoxville’s Historic Old City is a unique alternative for downtown visitors looking for a variety of shops, restaurants, coffee houses, art galleries, breweries and entertainment venues as well as an area rich in history with aesthetically inspiring architectural structures that bring the past to life.
