
First Steps in Experimental Design—The Screening Experiment

John A. Wass

"Statistical Viewpoint" addresses principles of statistics useful to practitioners in compliance and validation. We intend to present these concepts in a meaningful way so as to enable their application in daily work situations. Reader comments, questions, and suggestions are needed to help us fulfill our objective for this column. Please send your comments to coordinating editor Susan Haigney at [email protected].

KEY POINTS

The following key points are discussed in this article:

• Design of experiments (DOE) consists of three basic stages: screening (to identify important factors), response surface methodology (to define the optimal space), and model validation (to confirm predictions)
• A critical preliminary step in the screening stage is for subject matter experts to identify the key list of factors that may influence the process
• A DOE design consists of a table with rows that represent experimental trials and columns (vectors) that give the corresponding factor levels. In a DOE analysis, the factor level columns are used to estimate the corresponding factor main effects
• Interaction columns in a design are formed as the "dot" product of two other columns. In a DOE analysis, the interaction columns are used to estimate the corresponding interaction effects
• When two design columns are identical, the corresponding factors or interactions are aliased and their corresponding effects cannot be distinguished
• A desirable feature of a screening design is orthogonality, in which the vector products of any two main effect or interaction columns sum to zero. Orthogonality means that all estimates can be obtained independently of one another
• DOE provides efficient screening designs with columns that are not aliased and from which orthogonal estimates can be obtained
• Full-factorial designs include all combinations of factor levels and provide a predictive model that includes main effects and all possible interactions
• Fractional factorial (screening) designs include fewer trials and may be more efficient than the corresponding full factorial design
• The concept of aliasing is one of the tools that can be used to construct efficient, orthogonal screening designs
• Center points are often included in screening designs to raise the efficiency and to assess lack of model fit due to curvature
• The order of running and testing experimental trials is often randomized to protect against the presence of unknown lurking variables
• Blocking variables (such as day or run or session) may be included in a design to raise the design efficiency
• Factor effects in screening designs may be missed because they were not included in the screening experiment, because they were not given sufficiently wide factor ranges, because the design was underpowered for those factors, because trial order was not properly randomized or blocked, or because of an inadequate model.

ABOUT THE AUTHOR

John A. Wass, Ph.D., is a consulting statistician with Quantum Cats Consulting in the Chicago area, as well as a contributing editor at Scientific Computing and administrator of a regional statistical software group. He may be reached by e-mail at [email protected]. For more author information, go to gxpandjvt.com/bios.


INTRODUCTION

In days of old (i.e., the author's undergraduate years), we were introduced to the joys of manual calculations and analysis of variance (ANOVA). Experimenters would change one factor at a time and identify what they felt were "optimal" processing conditions. With the advent of personal computers and the dissemination of more efficient techniques by statisticians, problems of increasing complexity were solved. This not only enlightened the basic researcher, but permitted scientists and engineers to design more robust products and processes. In fact, the statistical design of experiments (DOE) has been called the most cost-effective quality and productivity optimization method known. In this brief introduction, we will concentrate on practical aspects and keep mathematical theory to a minimum.

The advent of DOE brought a modicum of order to the wild west of one-factor-at-a-time changes. The technique has many variations but consists of the following three basic stages:

• Screening—to exclude extraneous effects considered as noise
• Response surface methodology—to finely define the optimal result space
• Model validation—to confirm predictions.

Each is quite important. In this paper we will concentrate on the first stage, screening design and analysis.

There are many packages, usually either standalone or modules within general statistics programs, that will support DOE. Some of these are listed later in this discussion. Each has its unique strengths and weaknesses. For this paper, JMP8 has been used. The principles will be the same for most programs, although the user interfaces, algorithms, and output will vary.

THEORY

The literature of DOE is replete with names such as full factorial, fractional factorial, runs, power, levels, and interactions. In addition, we have categorical and continuous factors and a variety of design names. Fortunately, in screening we usually confine ourselves to the fractional factorial designs. Unfortunately, as with everything in real life, there is a price to pay for every extra bit of information required.

We can show an experimental design as a table. An example is presented in Figure 1. Each row in the table corresponds to an experimental trial. The table columns indicate the levels of the experimental factors. For screening designs we usually consider only two levels, usually coded +/-1. In this notation, + represents "high" and - represents "low." We may also include other columns that indicate interactions among the factors. The columns giving the experimental factor levels permit us to estimate "main effects" and the interaction columns permit us to estimate "interaction effects." We will say more about main effects and interaction effects below. In addition to factor level and interaction columns, we may record one or more columns of measured variable values that result from each trial. We refer to these dependent variables as the experimental responses.

Screening designs are useful as they are a practical compromise between cost and information. Their main contribution is in suggesting which of many factors that may impact a result are actually the most important. Because screening designs require fewer runs, they are far less costly than the more informative full-factorial designs where the practitioner uses all combinations of factor levels. It has been suggested that no more than 25% of the total budget for DOE be spent on the screening runs. Screening runs are usually a prelude to further experimentation, namely the response surface and confirmatory runs, where specific information is gained around target (desired) outcomes.

Key Assumption For Screening Studies

In screening designs, we make the assumption that our real-world processes are driven by only a few factors, the others being relatively unimportant. This works quite well, but it is a crucial assumption that requires careful consideration by subject matter experts. Also keep in mind that the fractional factorial designs may be upgraded to full factorial designs (main effects plus all interactions) if there are only a few main effects. This allows us to observe interactions at a reasonable cost.

Number Of Runs

With screening designs, responses are taken only for a small fraction of the total possible combinations to reduce the number of runs and thus cost. The total number of runs is calculated by raising the number of levels to the power of the number of factors (e.g., for three factors at two levels each we have runs = 2^3 = 2x2x2 = 8). This is actually a full factorial design, as we are testing all combinations of factor levels.
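To make the run-count arithmetic concrete, the combinations of a full two-level factorial can be enumerated directly. The following is a minimal sketch, not part of the original article and independent of any DOE package, using only the Python standard library; the factor names are placeholders.

```python
from itertools import product

# Enumerate a full two-level factorial: every combination of -1/+1 appears
# exactly once, so the number of runs is 2 raised to the number of factors.
factors = ["A", "B", "C"]        # three hypothetical factors, two levels each
levels = (-1, +1)

runs = list(product(levels, repeat=len(factors)))

print(f"{len(factors)} factors at {len(levels)} levels -> {len(runs)} runs")
for i, run in enumerate(runs, start=1):
    print(i, dict(zip(factors, run)))
```

For three factors this prints the same eight factor-level combinations listed in Figure 1; adding a fourth factor doubles the count to 16, which is exactly the cost that screening designs try to avoid.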


Full factorial designs allow us to build predictive models that include the main effects of each factor as well as interactions. This brings us to three important concepts of these models: interaction (the effects of one factor on another), orthogonality (all factors are independent of one another), and aliasing (when the effects due to multiple factors cannot be distinguished).

Interactions

One of the more important things that practitioners need to know about is that main factors may affect each other in ways known and unknown (i.e., interaction among effects). For example, the interaction of two reagents in a chemical process may be a significant driver of the overall process (think enzyme and substrate). In deciding which are important, statistically and physically, it is necessary to consult with the bench scientists and technicians to get a handle on what is already known and suspected to be important to the process. Too few factors risk missing something important. Including too many factors will render the screening more costly and lead to a lack of orthogonality due to aliasing. Aliasing occurs when two columns in the design (referred to by statisticians as a vector of the input space) are identical or when one column is identical to another formed from the interaction of two columns (i.e., the "vector" or "dot" product of two columns).

Figure 1 presents a design table for a full factorial design in three factors. This design requires eight trials (rows) and has three factors (A, B, C) for which we can estimate main effects. It is also possible to estimate all possible two-way interactions with this design (AB, AC, BC) as well as the single three-way interaction ABC that is shown in the design table. If, later, the design is augmented with a fourth factor D (all runs not shown below), we have a problem. Now the contribution to any measured outcome (effect) from ABC is indistinguishable from D and, therefore, we do not know if the driver was D or ABC.

Figure 1: Aliasing in a simple design.

Run    A    B    C    ABC    D
 1     -    -    -     -     -
 2     -    -    +     +     +
 3     -    +    -     +     +
 4     -    +    +     -     -
 5     +    -    -     +     +
 6     +    -    +     -     -
 7     +    +    -     -     -
 8     +    +    +     +     +
Sum    0    0    0     0     0

The A, B, and C columns give the levels (coded +/-) of three of the four experimental design factors. The ABC interaction column is formed as the "dot" product of the A, B, and C columns. Notice how each row in the ABC column is formed as a product of the corresponding levels of A, B, and C. In the analysis of such a design, the ABC interaction column would be used to estimate the corresponding ABC interaction effect.

The sums of the levels of the A, B, C, and ABC columns all equal zero. This means that the design is balanced. Balanced designs give more precise estimates of main and interaction effects than unbalanced designs.

Orthogonality

Further, the "dot" product of any two of the columns A, B, C, or ABC will also sum to zero (try it and see). This more subtle design characteristic is called "orthogonality" and is critical to good experimental design. To understand why orthogonality is so important, we return to our concept of aliasing. Aliasing is the extreme absence of orthogonality. It is impossible to separately estimate the effects corresponding to two design columns that are aliased. In a sense, such estimates are 100% correlated (the statistical term is "confounded"). In contrast, when two design columns are orthogonal, the corresponding effect estimates have zero correlation. This means that errors in estimating one effect do not, on average, bias our estimate of the other effect. Orthogonal designs prevent us from accidentally confusing the effects of two different factors.

DOE software provides us with orthogonal (or nearly orthogonal) screening designs in which the main effect and interaction columns are not aliased. These allow us to estimate the corresponding effects without worrying about possible correlations. This lack of correlation usually implies that the estimates are independent.

Figure 2 gives a good mental image of the value of orthogonality and the independence it provides in our estimates. Let the three mutually perpendicular (orthogonal) axes X, Y, and Z represent dimensions along which three estimates from some experimental design may lie. The results of an experiment may then be indicated as a point in that three-dimensional space. If we repeat the experiment many times, the resulting estimates will form a cluster of points in space, centered about the "true" effects being estimated. With orthogonal designs, the cluster of points will be spherical in shape, indicating a lack of correlation (or independence) among the estimates. In designs with aliasing, the points will fall along a one-dimensional line or two-dimensional plane that indicate a complete correlation of three or two effects, respectively. In between these extremes will be designs that produce ellipsoid-shaped clusters whose axes are not parallel to X, Y, and Z. Such designs are less efficient than fully orthogonal designs and may result in misinterpretation of screening results.
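The properties claimed above for the Figure 1 design (the ABC column as an element-wise product, column sums of zero, pairwise "dot" products of zero, and the aliasing of D with ABC) can be verified numerically. This is an illustrative sketch only, assuming NumPy is available; it is not how any particular DOE package builds its designs.

```python
import numpy as np
from itertools import product

# Eight-run full factorial in A, B, C with levels coded -1/+1 (Figure 1).
design = np.array(list(product((-1, 1), repeat=3)))
A, B, C = design[:, 0], design[:, 1], design[:, 2]

# The ABC interaction column is the element-wise ("dot") product of A, B, C.
ABC = A * B * C

# If a fourth factor D is run at the ABC settings, D is aliased with ABC.
D = ABC.copy()

columns = {"A": A, "B": B, "C": C, "ABC": ABC, "D": D}

# Balance: every column sums to zero (equal numbers of high and low settings).
print("column sums:", {name: int(col.sum()) for name, col in columns.items()})

# Orthogonality: dot products of distinct, un-aliased columns are zero.
print("A.B =", int(A @ B), " A.ABC =", int(A @ ABC), " B.C =", int(B @ C))

# Aliasing: D and ABC are identical columns, so their effects are confounded.
print("D identical to ABC:", bool(np.array_equal(D, ABC)))
```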


Figure 2: Orthogonality. (The figure shows three mutually perpendicular axes labeled X, Y, and Z.)

Similarly, when the columns in our experimental design are unaliased and when the vector products of any two columns sum to zero, the corresponding effect estimates are mutually independent. This means that random errors in estimating one effect will not (on average) bias the estimate of another effect. Designing for orthogonality is excellent protection against aliasing.
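The geometric picture in Figure 2 has a simple algebraic counterpart: in a least-squares analysis the covariance of the effect estimates is proportional to the inverse of XᵀX, where X is the design matrix. Orthogonal columns make that matrix diagonal (independent estimates, spherical clusters), while correlated columns introduce off-diagonal terms (tilted, ellipsoidal clusters). The short sketch below is not from the article and assumes NumPy; it contrasts an orthogonal two-factor design with a deliberately poor one.

```python
import numpy as np
from itertools import product

# Orthogonal design: full two-level factorial in two factors, plus intercept.
ortho = np.array(list(product((-1.0, 1.0), repeat=2)))
X_ortho = np.column_stack([np.ones(len(ortho)), ortho])

# Poor design: the second factor column is strongly (though not perfectly)
# correlated with the first, mimicking a partially aliased pair of effects.
poor = np.array([[-1.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])
X_poor = np.column_stack([np.ones(len(poor)), poor])

# Covariance of the effect estimates is proportional to the inverse of X'X.
print("orthogonal design (diagonal => independent estimates):")
print(np.linalg.inv(X_ortho.T @ X_ortho))
print("correlated design (off-diagonal terms => correlated estimates):")
print(np.linalg.inv(X_poor.T @ X_poor))
```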

Aliasing

The concept of aliasing is also useful in constructing efficient fractional factorial designs. The design in Figure 1 includes an ABC interaction column. In the real world, two-way interaction effects may well be present, but three-way interactions such as ABC are often considered unlikely to be important. This is referred to as the "sparsity of effects" principle. Based on our knowledge of the process, we may be willing to assume that the ABC interaction effect is not present. In that case we might consider including factor D in our experiment and setting its levels to those indicated by ABC. In this way we can estimate the main effects of all four factors, although we sacrifice our ability to learn about the ABC interaction. This is an example of using aliasing as a design tool (a short sketch following Figure 3 below illustrates the construction). We indicate this aliasing mathematically by writing "ABC=D". This is not meant to imply that the effects of D are the same as the interaction of ABC, but only that our experiment cannot distinguish the main effect of D from the ABC interaction. We have now created a design with four factors in only eight trials (a full factorial design with four factors would have required 2^4=16 trials).

As this is an introduction, we will not delve into the more advanced concepts of saturation, efficiency, foldover, and the more complex screening designs. Any of the textbooks cited in the reference section may be consulted for these topics.

GENERAL TECHNIQUES

In most modern software it is a straightforward matter to design an experiment and analyze the results (with a little practice and much consulting of the manual). It's the decision as to what factors to include that is critical. It is strongly advised that you not throw in "everything and the kitchen sink" for fear of missing something. This is where consultation with subject matter experts like the process engineers and bench scientists familiar with the process or product is crucial.

Figure 3 is an example of an experimental design produced by the JMP software. The software allows fast generation of an experimental sheet showing the factors and their levels, plus a column for the results. This design has three input factors (X1-X3) and a single output (Y). The levels indicate low (-1), median (0), and high (1) levels of the factor. The trials for which all factors are at their median level are called "center points" and may represent target or control factor settings. The 000 points are the center points and are useful for testing linearity in the process. It is a simple and inexpensive way to check for curvature. We now have nine design points, each replicated twice to yield 18 runs. This design is a full factorial design in which each of the nine trials in the full factorial is replicated twice. Replication is sometimes needed to provide sufficient statistical power.

Figure 3: A typical screening design.

Run   Pattern   X1   X2   X3   Y
 1     ---      -1   -1   -1   *
 2     --+      -1   -1    1   *
 3     000       0    0    0   *
 4     -+-      -1    1   -1   *
 5     +-+       1   -1    1   *
 6     ++-       1    1   -1   *
 7     +++       1    1    1   *
 8     +-+       1   -1    1   *
 9     +--       1   -1   -1   *
10     --+      -1   -1    1   *
11     -++      -1    1    1   *
12     000       0    0    0   *
13     -+-      -1    1   -1   *
14     +--       1   -1   -1   *
15     -++      -1    1    1   *
16     ++-       1    1   -1   *
17     +++       1    1    1   *
18     ---      -1   -1   -1   *
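As a concrete illustration of the ABC=D construction described before Figure 3, the following sketch (again an illustration in NumPy, not JMP output) builds a half-fraction of the 2^4 design: four factors screened in eight runs by setting factor D to the ABC interaction column.

```python
import numpy as np
from itertools import product

# Start from the eight-run full factorial in A, B, and C.
base = np.array(list(product((-1, 1), repeat=3)))

# Generator D = ABC: factor D is deliberately aliased with the three-way
# interaction, which the sparsity-of-effects principle lets us neglect.
D = (base[:, 0] * base[:, 1] * base[:, 2]).reshape(-1, 1)

half_fraction = np.hstack([base, D])        # a 2**(4-1) fractional factorial

print("runs:", len(half_fraction), "(a full 2**4 factorial would need 16)")
print(half_fraction)

# The four main-effect columns remain balanced and pairwise orthogonal:
# all off-diagonal entries of X'X are zero.
XtX = half_fraction.T @ half_fraction
print("off-diagonal of X'X:\n", XtX - np.diag(np.diag(XtX)))
```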


These factor levels are obviously coded, but the actual numbers could be entered and more factors screened against multiple outputs, if desired.

An Example

The medical diagnostics industry makes great use of DOE for myriad products. In this example, we will look at the design and analysis of a clinical laboratory kit pack for an unspecified analyte. We wish to determine and minimize the product variability given certain known conditions of reagent concentration and physical/environmental factors for the chemical reactions occurring in the kit pack. The input factors are as follows:

• Reagent 1
• Reagent 2
• Enzyme
• Temperature
• Mixing speed.

In this example, the two reagents are combined with the enzyme in a reaction vessel at a specified temperature. The mixture is agitated at several mixing speeds and the researcher wants to know how important each of the factors (i.e., reagents, enzyme, temp, and mix speed) is to the concentration of the final product.

All factors are held at either low, median, or high levels. The single output (response) variable is the concentration of the product. We will minimize the variability of the concentration measurement (i.e., for the given inputs we wish to hold the measured concentration within a stated reference range with minimal variability). This reference range could be from the clinical range for that particular analyte in human fluids or dictated by the company quality document.

With five factors, the full factorial design would require 2^5=32 trials; however, that is about twice as many trials as our budget will allow. Also, our statistician informs us that the addition of center points in the design will increase efficiency and replicates are necessary to evaluate random error. Our experiment will require four sessions to complete, and we are concerned that experimental conditions may vary from session to session. We would like to design the experiment in some way so that any session differences are not misconstrued as factor effects. So our statistician suggests the use of blocking in our design. Blocking is the arranging of experimental units (the runs) in groups, or blocks, that are similar to one another.

We block our designs to reduce random noise between sessions (in this case) as the experiment was carried out over several time periods and we expect the data within any one block to be more homogeneous than that between blocks. Thus, greater precision is obtained by doing within-block comparisons, as inter-block differences are eliminated. Also, runs are randomized within a block, and block order may be randomized if necessary. Note that although this screen is designed for maximal efficiency and precision, only main effects and low-order effects may be estimated. After gaining management approval for the cost, we obtain the design using JMP software. Note that the block column here indicates the session in which to perform each trial (see Figure 4).

Notice that we have five factors and that a full factorial design would require 2^5 or 32 runs, yet our design only has 18. The reduction comes in an aspect of DOE called design efficiency. This is a measure of the efficiency of any given design in covering the design space, compared to a 100% efficient design that would cover all of the points needed to extract maximal information about the dependency of the results upon the input factors. Modern DOE software employs highly efficient algorithms, based upon some very complex mathematics, that will reduce the number of points needed while minimally impacting efficiency.

Having designed the experiment, we go into the lab and collect the data, then return to the JMP design table and fill in the Y column of the table (see Figure 5). The data can be analyzed in JMP. The first step is to specify the model. The model can then be automatically generated as an effect screen using JMP's standard least squares platform. The model contains a "list" of all the factor effects that we wish (and are able) to estimate. Our estimates will include the five main factor effects, the block effect, and the two-way interaction between reagents 1 and 2 (see Figure 6). Running this model shows us an actual vs. predicted plot for the product (see Figure 7).

As the p-value from the analysis of variance shows that the model is very significant (p << 0.01), the R-squared is 92%, and the estimate of the standard deviation of the process noise (RMSE) is quite low, we are initially happy with the model fit. However, Figure 8 shows the parameter estimates that test the significance of the individual factors to the overall model.

Based upon the usual p-value cutoff of 0.05, only the effects of the two reagents were statistically significant. The intercept is usually not of interest in the physical sense of the model as it only gives information about the "0" point of the x-axis.

As the chemists suspected, the reagents are quite important to the product and the mix speed was not. However, the lack of enzyme and temperature effects is surprising and may indicate a flaw in the design.
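Randomizing the run order within each session (block), as described above, takes only a few lines. This sketch uses the Python standard library and the trial patterns from the first two sessions of Figure 4; it is an illustration of the idea, not the randomization scheme JMP actually applies.

```python
import random

# Trials grouped by session (block); patterns taken from Figure 4, blocks 1-2.
blocks = {
    1: ["00000", "+--++", "++---", "-++-+", "--++-"],
    2: ["+-+--", "+++++", "----+", "-+-+-"],
}

random.seed(1)                    # fixed seed so the run sheet is reproducible
for block, trials in blocks.items():
    order = trials[:]             # copy, leaving the design table untouched
    random.shuffle(order)         # randomize run order within the block
    print(f"session {block} run order: {order}")
```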


Figure 4: Example problem screening design.

Run  Pattern  Block  Reagent 1  Reagent 2  Enzyme  Temp  MixSpeed  Y
 1   00000      1        0          0         0      0       0     *
 2   +--++      1        1         -1        -1      1       1     *
 3   ++---      1        1          1        -1     -1      -1     *
 4   -++-+      1       -1          1         1     -1       1     *
 5   --++-      1       -1         -1         1      1      -1     *
 6   +-+--      2        1         -1         1     -1      -1     *
 7   +++++      2        1          1         1      1       1     *
 8   ----+      2       -1         -1        -1     -1       1     *
 9   -+-+-      2       -1          1        -1      1      -1     *
10   -++-+      3       -1          1         1     -1       1     *
11   ++---      3        1          1        -1     -1      -1     *
12   00000      3        0          0         0      0       0     *
13   +--++      3        1         -1        -1      1       1     *
14   --++-      3       -1         -1         1      1      -1     *
15   ----+      4       -1         -1        -1     -1       1     *
16   -+-+-      4       -1          1        -1      1      -1     *
17   +-+--      4        1         -1         1     -1      -1     *
18   +++++      4        1          1         1      1       1     *

Either the wrong values of those factors were chosen (the range was not sufficiently wide to see the effect), the design was underpowered for those factors (power in this case meaning there were not enough runs to discern a difference given that one truly exists), or both. The one interaction and the block effects cannot truly be assessed (as to their true effect on the product), as we do not have the resolution to do so with this design. We can test the power, however, for the surprises in the main effects (see Figure 9; a simple approximation of such a power calculation is sketched below). These are both low and we would need more data to determine true lack of significance. For guidance on what constitutes adequate power, see the "Tips, Tricks, and Pitfalls" section. The Pareto plot (see Figure 10) shows the results graphically.

The prediction profiler (see Figure 11) shows us how to set factors to maximize our product and gives the mean and 95% confidence interval for the mean response. Our task is now to determine the best settings for the temperature and enzyme factors and to augment the design with enough data to ensure adequate power. Once this is achieved, we can zero in on maximal setting and minimal variation using response surface methodology.

Other Designs

Fractional and full factorials are not the only screening designs available. The following are among the most popular:

• Plackett-Burman. Resolution is higher than in a simple factorial and this is a good technique for large numbers of input factors. It is also more flexible, as the number of runs need not be a power of 2. All columns are balanced and pairwise orthogonal.
• Box-Behnken. An efficient technique for modeling quantitative three-level factors. These designs hold some of the factors at their center points and are easily blocked.
• Box-Wilson. These are central composite designs that may be used when significant interaction is suspected. They have the desirable property of rotatability, where the predicted response may be estimated with equal variance in any direction from the center of the design space.

TIPS, TRICKS, AND PITFALLS

Experimental design is an art. There are many detailed techniques and lessons learned from years of experience that novices may have to gradually learn on their own—the hard way. The following comments are meant to minimize the pain for experimenters who wish to take advantage of these modern experimental tools.
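For a feel for the power check mentioned in the example (Figure 9) and the rules of thumb given below under "Power," here is a rough normal-approximation sketch. It is not the calculation JMP performs, which accounts for the exact design and uses t distributions; it simply shows how the chance of detecting a fixed two-level factor effect grows with the number of runs. The effect size and noise values are made up, and SciPy is assumed to be available.

```python
from math import sqrt
from scipy.stats import norm

def approx_power(effect, sigma, n_runs, alpha=0.05):
    """Approximate power to detect a two-level factor effect of size `effect`
    (high-minus-low difference) with `n_runs` total runs and noise SD `sigma`,
    using a two-sided z test as a rough stand-in for the exact t-based test."""
    se = 2 * sigma / sqrt(n_runs)          # SD of (mean at +1) - (mean at -1)
    z_crit = norm.ppf(1 - alpha / 2)
    z_eff = abs(effect) / se
    return norm.cdf(z_eff - z_crit) + norm.cdf(-z_eff - z_crit)

# Hypothetical effect of 0.5 units against a noise SD of 0.5 units.
for n in (8, 16, 32, 64):
    print(f"{n:>2d} runs -> power ~ {approx_power(0.5, 0.5, n):.2f}")
```

Under these made-up numbers, eight runs give power well below the 70-80% research rule of thumb, which mirrors the situation flagged above for the enzyme and temperature effects.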


Figure 5: Test data.

Run  Pattern  Block  Reagent 1  Reagent 2  Enzyme  Temp  MixSpeed   Y
 1   00000      1        0          0         0      0       0     2.4
 2   +--++      1        1         -1        -1      1       1     2.0
 3   ++---      1        1          1        -1     -1      -1     2.3
 4   -++-+      1       -1          1         1     -1       1     1.8
 5   --++-      1       -1         -1         1      1      -1     1.5
 6   +-+--      2        1         -1         1     -1      -1     1.9
 7   +++++      2        1          1         1      1       1     3.0
 8   ----+      2       -1         -1        -1     -1       1     0.9
 9   -+-+-      2       -1          1        -1      1      -1     1.6
10   -++-+      3       -1          1         1     -1       1     1.9
11   ++---      3        1          1        -1     -1      -1     2.6
12   00000      3        0          0         0      0       0     2.6
13   +--++      3        1         -1        -1      1       1     2.2
14   --++-      3       -1         -1         1      1      -1     1.4
15   ----+      4       -1         -1        -1     -1       1     1.1
16   -+-+-      4       -1          1        -1      1      -1     1.8
17   +-+--      4        1         -1         1     -1      -1     2.1
18   +++++      4        1          1         1      1       1     3.1
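As an independent cross-check of the analysis summarized in Figures 6 through 8, the test data above can be fit by ordinary least squares outside of JMP. The sketch below assumes NumPy and codes the block as three dummy columns; because JMP parameterizes the block and intercept differently, the printed coefficients will not match the JMP report exactly, but the overall fit and the dominance of the two reagent effects should be broadly consistent with the text.

```python
import numpy as np

# Columns: block, Reagent 1, Reagent 2, Enzyme, Temp, MixSpeed, Y (Figure 5).
data = np.array([
    [1,  0,  0,  0,  0,  0, 2.4], [1,  1, -1, -1,  1,  1, 2.0],
    [1,  1,  1, -1, -1, -1, 2.3], [1, -1,  1,  1, -1,  1, 1.8],
    [1, -1, -1,  1,  1, -1, 1.5], [2,  1, -1,  1, -1, -1, 1.9],
    [2,  1,  1,  1,  1,  1, 3.0], [2, -1, -1, -1, -1,  1, 0.9],
    [2, -1,  1, -1,  1, -1, 1.6], [3, -1,  1,  1, -1,  1, 1.9],
    [3,  1,  1, -1, -1, -1, 2.6], [3,  0,  0,  0,  0,  0, 2.6],
    [3,  1, -1, -1,  1,  1, 2.2], [3, -1, -1,  1,  1, -1, 1.4],
    [4, -1, -1, -1, -1,  1, 1.1], [4, -1,  1, -1,  1, -1, 1.8],
    [4,  1, -1,  1, -1, -1, 2.1], [4,  1,  1,  1,  1,  1, 3.1],
])
block, factors, y = data[:, 0], data[:, 1:6], data[:, 6]

# Model: intercept + five main effects + Reagent1*Reagent2 + block dummies.
interaction = (factors[:, 0] * factors[:, 1]).reshape(-1, 1)
block_dummies = np.column_stack([(block == b).astype(float) for b in (2, 3, 4)])
X = np.column_stack([np.ones(len(y)), factors, interaction, block_dummies])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

names = ["Intercept", "Reagent1", "Reagent2", "Enzyme", "Temp", "MixSpeed",
         "Reagent1*Reagent2", "Block2", "Block3", "Block4"]
for name, c in zip(names, coef):
    print(f"{name:>18s}: {c: .3f}")
print("R-squared:", round(r_squared, 3))
```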

Figure 6: Data analysis.

Figure 7: Actual by predicted plot.

(The plot shows Y Actual versus Y Predicted for the 18 runs; P = 0.0019, RSq = 0.92, RMSE = 0.253.)


Curvature

First and foremost is the fact that screening is a linear process and as such does not take curvature, or the higher-order interactions that generate curvature, into account. If the model renders a poor fit, and the graph of the data appears to suggest curvature (points clustered above and below the line), the experimenter is well advised to go to a more complex model (e.g., add a center point or quadratic). Also note that screening may underestimate the true number of main factors, depending upon choices and basic knowledge of the process. Lack-of-fit should also be carefully assessed, as by its very nature screening will evidence poor lack of fit for any curved process despite the fact that the selection of main factors was complete. Bottom line: there are problems with only looking at first-order effects.

Randomization

One of the more important concepts in all of DOE is to randomize the order of runs. This averages over noise that may be due to a specific time of run, type of chemical prep, temperature, pressure, or other condition that may be concentrated in space and time. When runs may be dependent upon long physical startup times, the experimenter may choose to block certain factors.

Power

Here we refer to statistical power (i.e., the probability of detecting a difference if one truly exists). Most software will perform this test and it should never be ignored. In certain cases, if the power is found to be unacceptably low, it might dictate rerunning the entire experiment with a larger number of runs. Acceptable power is determined by the experimenter or may be dictated by company or governmental rules. As a rule of thumb: for purely research purposes, 70-80% may be adequate. For process development, 90+% is desirable, and in certain critical cases (e.g., HIV diagnostics) 98% may be the least that is acceptable.

Aliasing

As was mentioned, due to the low number of runs and resolution in most screening designs, we encounter situations where it is difficult to assign effects unambiguously to single factors. To get around this problem the experimenter may consider the following: select different design generators, increase the number of runs, or hold one or more factors constant.

Replication

One of the most important concepts in statistics is replication. The use of replication provides an estimate of the magnitude of noise. An adequate number of replicates will ensure a more precise estimate of the variation about a process mean, as well as facilitating the detection of true differences. For screening, we can usually afford but a single replicate, and in some cases it may not be necessary to replicate all of the points in a design. The heavy replication of points is usually left to the response surface methodologies where we really need to get a sharper picture of the reduced design space. Replication is one of the most underappreciated necessities of statistical analyses and the one that will generate the most headaches if ignored.

SOFTWARE

There are numerous software products available to assist the practitioner in design and analysis of their experiments. The author has had experience with the following commercial packages:

• JMP (www.jmp.com)
• Design Expert (www.statease.com)
• MODDE (www.umetrics.com)
• Unscrambler (www.camo.no)
• Minitab (www.minitab.com)
• SYSTAT (www.systat.com)
• STATISTICA (www.statsoft.com)
• GenStat (www.vsni.co.uk).

CONCLUSIONS

Modern experimental design is sometimes art as well as science. It is the objective of this column to acquaint the reader with the rudiments of the screening design, introduce them to the nomenclature, and supplement the learning experience with a real-world example. Furthermore, as the process is not exact, several rules of thumb are given to provide guidance in those situations where hard and fast rules may not be available.

GENERAL REFERENCES

Schmidt, S.R. and Launsby, R.G., Understanding Industrial Designed Experiments (4th ed.), Air Academy Press, 1997.
Box, G.E.P., Hunter, J.S., and Hunter, W.G., Statistics for Experimenters (2nd ed.), Wiley Interscience, 2005.
Montgomery, D.C., Design and Analysis of Experiments (5th ed.), John Wiley, 2001.
JMP Design of Experiments Guide, Release 7, SAS Institute Inc., 2007.
ECHIP Reference Manual, Version 6, ECHIP Inc., 1983-1993.
Deming, S.N., "Quality by Design (Part 5)," Chemtech, pp. 118-126, Feb. 1990.
Deming, S.N., "Quality by Design (Part 6)," Chemtech, pp. 604-607, Oct. 1992.
Deming, S.N., "Quality by Design (Part 7)," Chemtech, pp. 666-673, Nov. 1992.

Originally published in the Spring 2010 issue of The Journal of Validation Technology
