
of (DOE) in Covid-19 Factor Screening and Assessment

Jorge Luis Romeu, Ph.D. https://www.researchgate.net/profile/Jorge_Romeu http://web.cortland.edu/romeu/ Email: [email protected] Copyright. October 28, 2020

1.0 Introduction

We apply Design of Experiments (DOE) to Covid-19 patient treatments and health conditions for Factor screening and assessment. We assume that readers are familiar with our previous article on DOE1. This work is also part of our pro-bono collaboration to the struggle against Covid-19: https://www.researchgate.net/publication/341282217_A_Proposal_for_Fighting_Covid-19_and_its_Economic_Fallout

Our previous work includes:

Statistical methods for Vaccine Life: https://www.researchgate.net/publication/344495955_Survival_Analysis_Methods_Applied_to_Establishing_Covid-19_Vaccine_Life
Methods to help accelerate vaccine testing: https://www.researchgate.net/publication/344193195_Some_Statistical_Methods_to_Accelerate_Covid-19_Vaccine_Testing
A Markov model to study problems of reopening college: https://www.researchgate.net/publication/343825461_A_Markov_Model_to_Study_College_Re-opening_Under_Covid-19
A Markov model to study the effects of Herd Immunization: https://www.researchgate.net/publication/343345908_A_Markov_Model_to_Study_Covid-19_Herd_Immunization?channel=doi&linkId=5f244905458515b729f78487&showFulltext=true
A Markov Chain Model for Covid-19 general survival: https://www.researchgate.net/publication/343021113_A_Markov_Chain_Model_for_Covid-19_Survival_Analysis
Socio-economic and racial issues affected by Covid-19: https://www.researchgate.net/publication/343700072_A_Digression_About_Race_Ethnicity_Class_and_Covid-19
An Example of Survival Analysis applied to analyzing Covid-19: https://www.researchgate.net/publication/342583500_An_Example_of_Survival_Analysis_Data_Applied_to_Covid-19
Multivariate statistics (principal components and discrimination) in the Analysis of Covid-19 Data: https://www.researchgate.net/publication/341385856_Multivariate_Stats_PC_Discrimination_in_the_Analysis_of_Covid-19
More on applying principal components and discrimination analysis to Covid-19 Data: https://www.researchgate.net/publication/342154667_More_on_Applying_Principal_Components_Discrimination_Analysis_to_Covid-19
Design of Experiments applied to the Assessment of Covid-19: https://www.researchgate.net/publication/341532612_Example_of_a_DOE_Application_to_Coronavarius_Data_Analysis
Offshoring: https://www.researchgate.net/publication/341685776_Off-Shoring_Taxpayers_and_the_Coronavarus_Pandemic
Methods in ICU assessment: https://www.researchgate.net/publication/342449617_Example_of_the_Design_and_Operation_of_an_ICU_using_Reliability_Principles
Control methods for monitoring Covid-19: https://web.cortland.edu/matresearch/AplicatSPCtoCovid19MFE2020.pdf

1 https://www.researchgate.net/publication/341532612_Example_of_a_DOE_Application_to_Coronavarius_Data_Analysis in ResearchGate, or https://web.cortland.edu/matresearch/ApplicatDOEtoCovid-19MFE2020.pdf in QR&CII

2.0 Problem Statement

We use DOE, one of the most powerful tools that statistics can provide to research, to screen and identify experimental Covid-19 treatments and characteristics that may affect Covid-19 patients. DOE helps identify which variables or factors, from a large variable pool, can be discarded as ineffective, and which ones should be examined more carefully, for they affect the responses.

The tool used here is the Fractional Factorial DOE (FF-DOE). Each Factor analyzed will have two levels (low/high); thus, a Full Factorial design has 2^k treatments, where k is the number of Factors used. The number of experiments therefore increases geometrically with k. For example, with only six Factors there are 2^6 = 64 treatments; we then multiply by the number of experimental replications (at least two runs, so variability can be estimated), yielding a total of 128 experiments to conduct.
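The run-count arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name total_runs and its parameters are hypothetical, not taken from Minitab or from the original paper.

```python
def total_runs(k, replications=2, fraction_power=0):
    """Experiments needed for a two-level design.

    k               number of Factors
    replications    runs per treatment (at least two to estimate variability)
    fraction_power  p in a 2^(k-p) Fractional Factorial (0 = Full Factorial)
    """
    treatments = 2 ** (k - fraction_power)
    return treatments * replications


# Six Factors, Full Factorial, two replications: 2^6 * 2
print(total_runs(6))
# Same six Factors in a Half Fraction: 2^(6-1) * 2
print(total_runs(6, fraction_power=1))
```

Halving the fraction halves the experimental effort, which is the motivation for the screening designs used in the rest of the paper.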

The need to reduce the number of experiments is evident (time and resource constraints). So we will use two techniques: first, Plackett-Burman (PB) procedures; second, Fractional Factorial designs. We assume the reader is acquainted with our previously mentioned DOE paper, which implements Full Factorial designs using Excel-based calculations. Such calculations can be used here, too. But we will use Minitab FF-DOE procedures, and their equivalent regression methods.

We do pay a price by reducing the number of experiments (i.e. using FF-DOEs): Estimations are confounded, for we cannot tell if results pertain to one Factor, or another, or both. Confounded Factors are also called aliased. Knowledge of the aliasing structure is important to analyze the experimental results. When interactions are small, confounding becomes a lesser problem.

We will first apply PB to a set of 11 variables, to screen out those that do not impact the response, and to identify those that do. Then, we will implement FF-DOEs with the Factors found to be statistically significant, to assess the level and direction of their impact on the response.

The response may be any metric that assesses the status of a Covid-19 patient (e.g. a wellness score). It can be a single metric (e.g. temperature, BP), or a linear combination of such metrics (Ʃαiχi, i = 1, …, k, where the χi are k patient health variables, and the αi are coefficients such that Ʃαi = 1). In our present study we will assume that such a combined response metric exists.
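As a minimal sketch of such a combined response, the weighted sum can be written as follows. The three metric values and the weights are purely hypothetical placeholders (the paper does not specify them), and the function name wellness_score is ours.

```python
def wellness_score(values, weights):
    """Linear combination sum(alpha_i * x_i) with weights that add up to 1."""
    if len(values) != len(weights):
        raise ValueError("one weight is needed per health variable")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(a * x for a, x in zip(weights, values))


# Hypothetical example: three health variables already scaled to [0, 1]
metrics = [0.8, 0.5, 0.9]
alphas = [0.5, 0.3, 0.2]
print(wellness_score(metrics, alphas))
```

In practice the health variables would need to be put on a common scale before being combined; that step is outside this sketch.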

3.0 The Data

We have tried, unsuccessfully, to obtain Covid-19 patient data. We believe that it is important to illustrate the power and use of statistical techniques using appropriate data. So we have created a data set based on our judgment and experience. Its numerical results, thence, have no medical value. But interested researchers can follow our procedures, inserting their real data in them.

First, let’s consider the Factors used. We will implement DOEs on several Covid-19 treatments listed below, currently (or in the recent past) undergoing medical trials. We have included a Russian antiviral drug: Coronavir. In addition, we will implement DOEs on patient characteristics that the medical community has considered as possible factors placing the patients who hold them into a higher risk category.

Variables or Factors Involved in DOE Analysis

Acronym   Identification    Low       High
Coro      Russian           None      Active
Mono      Monoclonal        None      Active
Rem       Remdesivir        None      Active
Hydr      Hydroxychloro     None      Active
Dexa      Dexamethasone     None      Active
Plas      Conval. Plasma    None      Active
Gen       Gender            Male      Female
Age       Patient Age       < 45 yr   > 65 yr
Com       Co-morbidities    Zero      > 0
Soc       Socio-economic    Poor      Other
Bmas      Body Mass         Normal    Obese

Socio economic corresponds to patient’s background. It has been discussed how individuals of lower socio-economic levels may be more susceptible to becoming infected with Covid-19, due to their ensuing habitation density, diets, occupations and other environmental conditions. The same considerations apply to gender, body mass, age and Number of Co-morbidities. We want to test whether such factors affect in any way treatment effectiveness in such cohorts. Finally, the response (a wellness score) has been described fully in the previous section.

4.0 Plackett-Burman Factor Screening Analysis:

A key objective of this paper is to demonstrate how DOE techniques can help screen out potential Covid-19 treatments, so that researchers can reduce the time and resources dedicated to analyzing them. We will screen eleven factors corresponding to Covid-19 treatments and patient characteristics, to determine which have a significant impact on the Response (a patient wellness score), and in what direction (increasing or decreasing) each one moves the Response.

The DOE Analysis matrix, for the corresponding Plackett-Burman (PB) design is:

Run    A   B   C   D   E   F   G   H   I   J   K
  1    1  -1   1  -1  -1  -1   1   1   1  -1   1
  2    1   1  -1   1  -1  -1  -1   1   1   1  -1
  3   -1   1   1  -1   1  -1  -1  -1   1   1   1
  4    1  -1   1   1  -1   1  -1  -1  -1   1   1
  5    1   1  -1   1   1  -1   1  -1  -1  -1   1
  6    1   1   1  -1   1   1  -1   1  -1  -1  -1
  7   -1   1   1   1  -1   1   1  -1   1  -1  -1
  8   -1  -1   1   1   1  -1   1   1  -1   1  -1
  9   -1  -1  -1   1   1   1  -1   1   1  -1   1
 10    1  -1  -1  -1   1   1   1  -1   1   1  -1
 11   -1   1  -1  -1  -1   1   1   1  -1   1   1
 12   -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
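The 12-run matrix above has a regular structure: each of Runs 2 to 11 is the previous run shifted cyclically one position to the right, and Run 12 is all Low. The sketch below rebuilds it that way and checks that the columns are balanced and mutually orthogonal. The function names are ours, and the code relies only on the first row as printed in the matrix.

```python
FIRST_ROW = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]  # Run 1 of the matrix above


def plackett_burman_12(first_row=FIRST_ROW):
    """Build the 12-run, 11-Factor PB matrix by cyclic right shifts."""
    rows = []
    row = list(first_row)
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]  # move the last level to the front
    rows.append([-1] * 11)          # Run 12: every Factor at its Low level
    return rows


def is_orthogonal(design):
    """True if every column sums to zero and every pair of columns is orthogonal."""
    k = len(design[0])
    for i in range(k):
        if sum(r[i] for r in design) != 0:
            return False
        for j in range(i + 1, k):
            if sum(r[i] * r[j] for r in design) != 0:
                return False
    return True


design = plackett_burman_12()
for run, levels in enumerate(design, start=1):
    print(run, levels)
print("orthogonal:", is_orthogonal(design))
```

Orthogonality is what allows the eleven Main Effects to be estimated independently of each other from only twelve treatment combinations.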

For example, a Run 1 patient would display Factors A, C, G, H, I, K at High levels (1), and B, D, E, F, J at Low levels (-1). Responses used in the analysis, and Factor Ids are given below:

Response (four replications of the 12 runs)

Run     Rep 1      Rep 2      Rep 3      Rep 4
  1     6.4792     3.5821     2.7513     1.3988
  2    25.9234    22.2068    29.8859    24.9400
  3    16.7154    23.3188    14.3129    17.9327
  4     0.9684    17.7480    -3.0161    -1.4878
  5    -0.7402    19.4090    -1.2803    -0.9963
  6    28.0808    26.6889    33.2380    35.1034
  7    -1.4403    -2.6356    -6.0776    -2.4701
  8    17.1465    22.6174    17.7292    17.9177
  9     8.4471    10.7344     9.4564    11.8238
 10     7.6250    25.9626    23.4639    23.3005
 11     4.0044     8.0272     8.9847     2.8541
 12     9.4544    10.8481     5.3468     9.0677

Variables or Factors in the Analysis

Letter   Acronym    Letter   Acronym
A        Mono       F        Gender
B        Coro       G        Plasma
C        Hydro      H        Age
D        Rem        I        SocioEc
E        Dexa       J        Comor
                    K        Bmas

Submitting the above 12 treatments, with four replications (for a total of 48 runs), to the corresponding PB statistical analysis, we obtain:

Estimated Effects and Coefficients

Term        Effect     Coef    StDev Coef      T       P
Constant              12.195     0.7679     15.88   0.000
A            4.880     2.440     0.7679      3.18   0.003
B            2.776     1.388     0.7679      1.81   0.079
C           -0.506    -0.253     0.7679     -0.33   0.744
D           -4.656    -2.328     0.7679     -3.03   0.004
E            9.611     4.805     0.7679      6.26   0.000
F           -1.941    -0.970     0.7679     -1.26   0.214
G           -7.922    -3.961     0.7679     -5.16   0.000
H            7.279     3.639     0.7679      4.74   0.000
I            0.413     0.207     0.7679      0.27   0.789
J            6.367     3.184     0.7679      4.15   0.000
K           -9.271    -4.635     0.7679     -6.04   0.000

Analysis of Variance for Response

Source            DF   Seq SS   Adj SS   Adj MS       F       P
Main Effects      11     4704     4704   427.62   15.11   0.000
Residual Error    36     1019     1019    28.30
  Pure Error      36     1019     1019    28.30
Total             47     5723
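For readers without DOE software, the Main Effects in the table above can be recomputed directly from the design matrix and the responses: each Effect is the mean response at the High level minus the mean response at the Low level. The sketch below assumes that the 48 responses are listed replication by replication (12 runs each, in run order); that reading is our assumption, chosen because it reproduces the reported values.

```python
def pb12():
    """12-run PB matrix: cyclic right shifts of Run 1, plus an all-Low Run 12."""
    row = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]
    rows = []
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]
    rows.append([-1] * 11)
    return rows


# Responses from Section 4.0, one list per replication (assumed ordering)
REPLICATES = [
    [6.4792, 25.9234, 16.7154, 0.9684, -0.7402, 28.0808,
     -1.4403, 17.1465, 8.4471, 7.6250, 4.0044, 9.4544],
    [3.5821, 22.2068, 23.3188, 17.7480, 19.4090, 26.6889,
     -2.6356, 22.6174, 10.7344, 25.9626, 8.0272, 10.8481],
    [2.7513, 29.8859, 14.3129, -3.0161, -1.2803, 33.2380,
     -6.0776, 17.7292, 9.4564, 23.4639, 8.9847, 5.3468],
    [1.3988, 24.9400, 17.9327, -1.4878, -0.9963, 35.1034,
     -2.4701, 17.9177, 11.8238, 23.3005, 2.8541, 9.0677],
]


def main_effects(design, replicates, names="ABCDEFGHIJK"):
    """Effect = mean(response at +1) - mean(response at -1), per Factor."""
    n_runs = len(design)
    run_means = [sum(rep[i] for rep in replicates) / len(replicates)
                 for i in range(n_runs)]
    effects = {}
    for j, name in enumerate(names):
        contrast = sum(design[i][j] * run_means[i] for i in range(n_runs))
        effects[name] = contrast / (n_runs / 2)
    grand_mean = sum(run_means) / n_runs
    return grand_mean, effects


constant, effects = main_effects(pb12(), REPLICATES)
print("Constant:", round(constant, 3))
for name, value in effects.items():
    print(name, round(value, 3))
```

The T and P columns additionally require the pure-error variance from the replications, which is not computed in this sketch.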

Four Factors are not statistically significant (shown in red in the original table): B, C, F and I. All other factors are statistically significant (i.e. exert an impact on the response), at least at the 5% significance level.

Notice how, among experimental treatments considered, only Hydroxychloroquine and Russian2 Coronavir are non-significant (do not impact the patient recovery or wellness index). Among all patient characteristics, Gender and Socio-economic level are also non-significant.

The PB procedure is complex and does not work well in the presence of strong Factor interactions, or when PB Confounding patterns are involved. PB design matrices are not easy to construct (as opposed to Fractional Factorial design matrices). But they are available in the Bibliography for up to 35 Factors (see Bibliography: Box, Hunter & Hunter, Montgomery, and Romeu).

The Main Effect plots (below) help understand, and explain to lay people, the DOE analysis results. Main Effect plots complement and illustrate the DOE table of numerical results.

For example, the Main Effect for Factor E is strong (the response goes from about 7 to 17 as the Factor moves two units, from -1 to +1) and positive (the response increases when the Factor increases). Factor I does not influence the response (its Main Effect is non-significant, and the plot line is flat). The Main Effect for Factor K is strong and negative (the plot line decreases from 17 to 7): it impacts the response negatively.

[Figure: Main Effects for Response; one panel per Factor A through K, levels -1 and +1; Response axis from 7.0 to 17.0]

2 https://www.pharmaceutical-technology.com/news/russia-nod-coronavir-covid-19/

The interaction plots (below) help us see which Factors interact (when their lines cross), or do not (when their lines are parallel). For example, Factors B and G have a strong interaction. If these are two treatments, it means that one treatment is compounded with the other treatment. If the Factors are one treatment and one patient characteristic, then the interaction may be interpreted as the Covid-19 treatment being affected, positively or negatively, by the patient condition, or vice-versa.

[Figure: Interaction Plot for Response; pairwise panels for Factors A through K, levels -1 and +1]

In general, Main Effects of screening experiments are implemented between two values: one Low (-1) and one High (+1). These values define a region where said Effects are assumed to impact the response. Many effects can then be simultaneously screened in a single experiment, leaving only the good ones. This saves research time and effort, which can then be dedicated to further analyzing the sub-set of statistically significant Effects that the DOE has identified.

Alternative calculations:

If the analyst does not have access to a specialized statistical software package with DOE procedures, such as SAS or Minitab, the analyst can alternatively Regress the response (e.g. wellness score) on the DOE matrix (of +/- 1s) given at the start of Section 4.0. Each DOE Main Effect is then double the corresponding Regression coefficient (because the regression step is one unit, but the Factorial step, from -1 to +1, is two units). Their signs, statistical tests and p-values remain valid.

The regression equation is

Response = 12.2 + 2.44 A + 1.39 B - 0.253 C - 2.33 D + 4.81 E - 0.970 F - 3.96 G + 3.64 H + 0.207 I + 3.18 J - 4.64 K

Predictor      Coef     StDev       T       P
Constant    12.1948    0.7679   15.88   0.000
A            2.4400    0.7679    3.18   0.003
B            1.3879    0.7679    1.81   0.079
C           -0.2531    0.7679   -0.33   0.744
D           -2.3278    0.7679   -3.03   0.004
E            4.8055    0.7679    6.26   0.000
F           -0.9705    0.7679   -1.26   0.214
G           -3.9609    0.7679   -5.16   0.000
H            3.6394    0.7679    4.74   0.000
I            0.2067    0.7679    0.27   0.789
J            3.1836    0.7679    4.15   0.000
K           -4.6353    0.7679   -6.04   0.000

S = 5.320 R-Sq = 82.2% R-Sq(adj) = 76.8%

Analysis of Variance

Source        DF        SS       MS       F       P
Regression    11   4703.78   427.62   15.11   0.000
Error         36   1018.89    28.30
Total         47   5722.67

Notice how the same four Factors (B, C, F and I) are non-statistically significant. Factors keep the same signs and p-values as in the DOE analysis: among the significant ones, A, E, H and J are positive, while D, G and K are negative. This means, for example, that as Factor E (drug Dexamethasone) increases, the response (wellness score) also increases. On the other hand, as Factor K (patient Body Mass) increases, the response (wellness score) decreases.
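The relation between Regression coefficients and Main Effects can be checked on a toy case. The sketch below uses a 2^2 design with four hypothetical response values (not data from this paper). Because the +/-1 columns are orthogonal, the least-squares coefficient of each column reduces to (column . response) / n, so no matrix inversion is needed.

```python
# 2^2 Full Factorial in coded units: (A, B) for runs (1), a, b, ab
DESIGN = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
RESPONSE = [5.0, 9.0, 7.0, 11.0]  # hypothetical values, for illustration only


def regression_coefficients(design, y):
    """Least-squares fit for an orthogonal +/-1 design: b_j = sum(x_ij * y_i) / n."""
    n = len(y)
    intercept = sum(y) / n
    slopes = [sum(row[j] * yi for row, yi in zip(design, y)) / n
              for j in range(len(design[0]))]
    return intercept, slopes


def main_effect(design, y, j):
    """Mean response at High (+1) minus mean response at Low (-1) for column j."""
    high = [yi for row, yi in zip(design, y) if row[j] == 1]
    low = [yi for row, yi in zip(design, y) if row[j] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


b0, (bA, bB) = regression_coefficients(DESIGN, RESPONSE)
print("intercept:", b0, "coef A:", bA, "coef B:", bB)
print("effect A:", main_effect(DESIGN, RESPONSE, 0),
      "effect B:", main_effect(DESIGN, RESPONSE, 1))
```

In this toy case the Effect of A is 4 while its coefficient is 2, which is the factor of two described in the text.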

5.0 Fractional Factorial Design for Covid-19 Drug Treatments:

In this section we implement an FF-DOE on five experimental treatments. To minimize research time and effort we use a Resolution V Half Fraction Factorial DOE (denoted 2^(5-1) V) for drug treatments Monoclonal, Convalescent Plasma, Dexamethasone, Remdesivir, and Hydroxychloroquine. We construct this design from the 2^4 Full Factorial, by making the fifth Factor (E) equal to the four-factor interaction (ABCD).

We initially use the Minitab software DOE routine, with notation:

Factors Analyzed

Letter   Acronym
A        Mono
B        Conv
C        Dexa
D        Rem
E        Hydro

Responses -0.474086 -3.23094 -1.02434 -1.76997 1.7832887 4.288786 1.347528 5.97645 2.9125628 3.178303 6.166301 4.355656 10.212619 8.935715 13.50254 10.60435 6.6854336 12.19264 13.15214 13.6751 15.594037 13.91064 18.44668 16.09081 15.158581 11.04969 11.21969 13.1245 25.923135 20.91327 23.61092 22.98009

ANALYSIS OF A 2^5 HALF FRACTION, RESOLUTION 5, FOR DRUG TREATMENTS:

DOE MATRIX FOR A 16-Run (2^(5-1) V) FRACTIONAL FACTORIAL DESIGN w/Generator ABCDE

Run     A   B   C   D   AB  AC  BC  AD  BD  CD  ABC ABD BCD E=ABCD
(1)    -1  -1  -1  -1    1   1   1   1   1   1  -1  -1  -1    1
a       1  -1  -1  -1   -1  -1   1  -1   1   1   1   1  -1   -1
b      -1   1  -1  -1   -1   1  -1   1  -1   1   1   1   1   -1
ab      1   1  -1  -1    1  -1  -1  -1  -1   1  -1  -1   1    1
c      -1  -1   1  -1    1  -1  -1   1   1  -1   1  -1   1   -1
ac      1  -1   1  -1   -1   1  -1  -1   1  -1  -1   1   1    1
bc     -1   1   1  -1   -1  -1   1   1  -1  -1  -1   1  -1    1
abc     1   1   1  -1    1   1   1  -1  -1  -1   1  -1  -1   -1
d      -1  -1  -1   1    1   1   1  -1  -1  -1  -1   1   1   -1
ad      1  -1  -1   1   -1  -1   1   1  -1  -1   1  -1   1    1
bd     -1   1  -1   1   -1   1  -1  -1   1  -1   1  -1  -1    1
cd     -1  -1   1   1    1  -1  -1  -1  -1   1   1   1  -1    1
abd     1   1  -1   1    1  -1  -1   1   1  -1  -1   1  -1   -1
acd     1  -1   1   1   -1   1  -1   1  -1   1  -1  -1  -1   -1
bcd    -1   1   1   1   -1  -1   1  -1   1   1  -1  -1   1   -1
abcd    1   1   1   1    1   1   1   1   1   1   1   1   1    1

Notice how Factor E = ABCD. The result is a 2^5 HALF FRACTION of Resolution 5: 2^(5-1) V
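A minimal sketch of this construction: enumerate the 2^4 Full Factorial in A, B, C, D and set E to the product ABCD. The rows come out in standard (Yates) order, which differs slightly from the row order printed in the matrix of this section; the function name is ours.

```python
from itertools import product


def half_fraction_2_5_1():
    """16-run 2^(5-1) design: Full Factorial in A, B, C, D with E = A*B*C*D."""
    runs = []
    # product() varies its last element fastest, so unpack as (d, c, b, a)
    # to make A the fastest-changing Factor, as in standard order.
    for d, c, b, a in product((-1, 1), repeat=4):
        e = a * b * c * d
        runs.append((a, b, c, d, e))
    return runs


design = half_fraction_2_5_1()
print(" A  B  C  D  E")
for run in design:
    print(" ".join(f"{level:2d}" for level in run))
```

Every row satisfies A*B*C*D*E = +1, which is the defining relation I = ABCDE expressed in terms of levels.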

Interaction ABCDE is called Design Generator, and I=ABCDE, the Defining Relation. Obtain the aliasing structure of this design from these two relationships (see a Half Fraction example in Romeu: https://www.quanterion.com/design-of-experiments-for-reliability-improvement/)

An example of finding Effect aliases is as follows. Any Effect multiplied by itself yields Identity:

A*A = B*B = C*C = D*D = E*E = I

Applying above rule to I = ABCDE, we have: E = ABCD => ED = ABC => CED = AB

Therefore, the interaction ED is aliased with ABC, and the interaction AB is aliased with CED.

Factorial Fit: Response versus A, B, ...

Estimated Effects and Coefficients for Response (coded units)

Term        Effect      Coef   SE Coef       T       P
Constant             10.2241    0.5637   18.14   0.000
A           6.3122    3.1561    0.5637    5.60   0.000
B           5.1354    2.5677    0.5637    4.55   0.000
C          10.5435    5.2718    0.5637    9.35   0.000
D           0.3953    0.1976    0.5637    0.35   0.730
E=ABCD      1.1628    0.5814    0.5637    1.03   0.315
AB          1.2413    0.6206    0.5637    1.10   0.284
AC          2.2668    1.1334    0.5637    2.01   0.058
BC         -0.3452   -0.1726    0.5637   -0.31   0.763
AD         -0.6290   -0.3145    0.5637   -0.56   0.583
BD         -1.1567   -0.5783    0.5637   -1.03   0.317
CD         -0.0782   -0.0391    0.5637   -0.07   0.945

S = 3.18890 R-Sq = 88.06% R-Sq(adj) = 81.49%

Analysis of Variance for Response (coded units)

Source            DF    Seq SS    Adj SS    Adj MS       F       P
Main Effects      11   1499.43   1499.43   136.312   13.40   0.000
Residual Error    20    203.38    203.38    10.169
  Lack of Fit      4     16.34     16.34     4.086    0.35   0.841
  Pure Error      16    187.04    187.04    11.690
Total             31   1702.81

The statistically significant Factors (shown in red in the original table) are A, B and C, all with positive Effects. Factors D and E are non-significant (do not impact the response). The interaction AC, between factors A and C, is marginally significant (p = 0.058, just above the 5% level). The DOE model used explains about 88% of the response variation (R-Sq = 88.06%; adjusted R-Sq = 81.49%).

On the other hand, since this is a Fractional Factorial, we need to consider the aliasing structure of the design (shown below). As is the rule with Resolution V FF-DOEs, no Main Factor is aliased with any other Main Factor or with any first level (two-factor) interaction. Higher order interactions may be aliased with other higher order interactions. Notice from the analysis table that, except for AC, no Factor interaction comes close to statistical significance.

Below, we show the aliasing structure for this experimental design:

Alias Structure (up to order 2)

A + B*AB + C*AC + D*AD
B + A*AB + C*BC + D*BD
C + A*AC + B*BC + D*CD
D + A*AD + B*BD + C*CD
E=ABCD + AB*CD + AC*BD + BC*AD
AB + A*B + E=ABCD*CD + AC*BC + AD*BD
AC + A*C + E=ABCD*BD + AB*BC + AD*CD
BC + B*C + E=ABCD*AD + AB*AC + BD*CD
AD + A*D + E=ABCD*BC + AB*BD + AC*CD
BD + B*D + E=ABCD*AC + AB*AD + BC*CD
CD + C*D + E=ABCD*AB + AC*AD + BC*BD
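Independently of the software listing, the alias of any Effect under the Defining Relation I = ABCDE can be found with the rule that a letter multiplied by itself yields Identity. In code, that rule is a symmetric difference of letter sets. The sketch below is ours (the function name alias is hypothetical) and reproduces the aliases derived earlier in this section.

```python
def alias(effect, defining_word="ABCDE"):
    """Multiply an Effect by the defining word; squared letters cancel (X*X = I)."""
    letters = set(effect) ^ set(defining_word)  # symmetric difference
    return "".join(sorted(letters))


for effect in ["A", "E", "AB", "DE", "AC"]:
    print(effect, "is aliased with", alias(effect))
```

Every Main Factor comes out aliased with a four-factor interaction and every two-factor interaction with a three-factor one, which is what makes the design Resolution V.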

[Figure: Residual Plots for Response. Panels: Normal Probability Plot of the Residuals (Percent vs. Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual vs. Fitted Value); Histogram of the Residuals (Frequency vs. Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual vs. Observation Order)]

Model assumptions (Normality, independence, homoskedasticity) should always be checked, to establish the validity of the results. In real life, however, model assumptions are seldom fully satisfied, so we assess them on a case-by-case basis. We show above the graphical assumption analysis for this DOE model, and verify that these assumptions are not seriously violated.

Below we show a Pareto chart for the model Effects. Columns extending beyond the 2.086 line (the 5% significance threshold) are statistically significant. Factors A and C are the most significant ones. The hereditary effect explains why interaction AC is almost significant. The sparsity effect states that, as more interactions between Main Effects occur, the significance level decreases. For this reason we do not place much weight on effects beyond the second order interaction (e.g. AC).

The Normality plot represents an alternative graphical way of displaying which Main Effects and interactions are statistically significant (those further away from the graph line).
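What the Pareto chart displays can be reproduced numerically: each standardized Effect is the coefficient divided by its standard error, and it is compared in absolute value with the 2.086 reference line. The sketch below takes the coefficients, the common standard error (0.5637) and the reference value directly from the Factorial Fit output of this section; it does not compute the critical value itself.

```python
# Coefficients from the Factorial Fit table (coded units)
COEFS = {
    "A": 3.1561, "B": 2.5677, "C": 5.2718, "D": 0.1976, "E": 0.5814,
    "AB": 0.6206, "AC": 1.1334, "BC": -0.1726,
    "AD": -0.3145, "BD": -0.5783, "CD": -0.0391,
}
SE_COEF = 0.5637      # same standard error for every term
T_REFERENCE = 2.086   # reference line of the Pareto chart (Alpha = .05)

# Rank terms by absolute standardized Effect, largest first (Pareto order)
ranked = sorted(COEFS, key=lambda term: abs(COEFS[term] / SE_COEF), reverse=True)
significant = [t for t in ranked if abs(COEFS[t] / SE_COEF) > T_REFERENCE]

for term in ranked:
    t_value = COEFS[term] / SE_COEF
    flag = "significant" if term in significant else ""
    print(f"{term:3s} {t_value:7.2f} {flag}")
```

Interaction AC falls just short of the reference line, which matches its reported p-value of 0.058.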

These plots are the graphical representations of the numerical values in the DOE analysis tables. Such graphical representations of the results are very useful when explaining the DOE analysis to others, especially to laymen. The Normal Plot and the Pareto Chart are often used to help explain and illustrate the model results to lay users. They show how Factors A, B, and C are statistically significant.

[Figure: Normal Probability Plot of the Standardized Effects (response is Response, Alpha = .05); Percent vs. Standardized Effect; effect types Not Significant and Significant; A, B and C labeled]

[Figure: Pareto Chart of the Standardized Effects (response is Response, Alpha = .05); Term vs. Standardized Effect; reference line at 2.086]

Below we show the equivalent regression analysis results, here fitted with the five Main Factors only. Notice again how the regression coefficients are half of the corresponding Main Effects. Tests and significance conclusions remain the same.

Regression Analysis: Response versus A, B, C, D, E=ABCD

The regression equation is:

Response = 10.2 + 3.16 A + 2.57 B + 5.27 C + 0.198 D + 0.581 E=ABCD

Predictor      Coef   SE Coef       T       P
Constant    10.2241    0.5714   17.89   0.000
A            3.1561    0.5714    5.52   0.000
B            2.5677    0.5714    4.49   0.000
C            5.2718    0.5714    9.23   0.000
D            0.1976    0.5714    0.35   0.732
E=ABCD       0.5814    0.5714    1.02   0.318

S = 3.23256 R-Sq = 84.0% R-Sq(adj) = 81.0%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         5   1431.13   286.23   27.39   0.000
Residual Error    26    271.68    10.45
Total             31   1702.81

[Figure: Residual Plots for Response. Panels: Normal Probability Plot of the Residuals (Percent vs. Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual vs. Fitted Value); Histogram of the Residuals (Frequency vs. Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual vs. Observation Order)]

The above plots are presented with the same objectives stated in the previous example.

Extracting a Full Factorial 2^3 Design from the above Half Fraction Factorial 2^(5-1)

A characteristic of Resolution 5 Fractional Factorials is that we can extract from them Full Factorial Designs in fewer than 5 Factors, to study more carefully those statistically significant relations identified from the Half Fraction analysis. In our case we will extract and analyze a 2^3 Full Factorial Design for the three statistically significant Main Factors: A, B, C.
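The extraction works because, once two Factors are dropped, the 16 runs of the Half Fraction cover every combination of the remaining three exactly twice. The sketch below checks this by counting; it rebuilds the design with E = ABCD, and the helper names are ours.

```python
from collections import Counter
from itertools import combinations, product

FACTORS = "ABCDE"


def half_fraction():
    """16 runs of the 2^(5-1) design with E = A*B*C*D, as dicts keyed by Factor."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append(dict(zip(FACTORS, (a, b, c, d, a * b * c * d))))
    return runs


def projection_counts(runs, keep):
    """Count how often each level combination of the kept Factors appears."""
    return Counter(tuple(run[f] for f in keep) for run in runs)


runs = half_fraction()
counts = projection_counts(runs, "ABC")
for combo, n in sorted(counts.items()):
    print(combo, n)
```

With two replications of the Half Fraction (32 observations), each of the eight A, B, C treatments therefore has four observations, which is why the extracted analysis reports 24 degrees of freedom for pure error.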

Factorial Fit: Response versus A_1, B_1, C_1

Estimated Effects and Coefficients for Response (coded units)

Term            Effect      Coef   SE Coef       T       P
Constant                 10.2241    0.5464   18.71   0.000
A_1             6.3122    3.1561    0.5464    5.78   0.000
B_1             5.1354    2.5677    0.5464    4.70   0.000
C_1            10.5435    5.2718    0.5464    9.65   0.000
A_1*B_1         1.2413    0.6206    0.5464    1.14   0.267
A_1*C_1         2.2668    1.1334    0.5464    2.07   0.049
B_1*C_1        -0.3452   -0.1726    0.5464   -0.32   0.755
A_1*B_1*C_1    -0.1190   -0.0595    0.5464   -0.11   0.914

S = 3.09066 R-Sq = 86.54% R-Sq(adj) = 82.61%

Analysis of Variance for Response (coded units)

Source                DF    Seq SS    Adj SS    Adj MS       F       P
Main Effects           3   1419.06   1419.06   473.020   49.52   0.000
2-Way Interactions     3     54.39     54.39    18.129    1.90   0.157
3-Way Interactions     1      0.11      0.11     0.113    0.01   0.914
Residual Error        24    229.25    229.25     9.552
  Pure Error          24    229.25    229.25     9.552
Total                 31   1702.81

Factors Analyzed

Letter   Acronym
A        Mono
B        Conv
C        Dexa

The re-analysis of the three Main Effects A, B, and C has been done without the need to conduct additional experimentation, by extracting a Full 2^3 Design. A Full Factorial design will always provide stronger results than a Fractional Factorial, because there is no aliasing or confounding.

We present below the graphical analysis of these results.

[Figure: Main Effects Plot (data means) for Response; panels A_1, B_1, C_1, levels -1 and +1; Response axis from 5.0 to 15.0]

In the Main Effects plot, notice that the three Main Factors A, B, C, that correspond to Covid-19 treatments Monoclonal, Convalescent Plasma and Dexamethasone, are positive and significant. The AC interaction is mildly significant. The other interactions are non-significant. This confirms the results obtained. As we work with a Full Factorial design, there are no aliases involved, and Factor estimations are clean.

An increase of two units (from -1 to +1) in Factor A carries with it an increase from 7.5 to 12.5 in the Response. The same procedure can be applied to the other Main Factors.

It is important to notice that DOE matrices are constructed by "converting" the original Factor values: they are composed of +/-1s, which yield orthogonal factors. Finally, after the regression analysis, if we want to work with the original Factor values, we need to first "convert" back. These calculations are shown in detail in our previous DOE paper, or in Romeu: https://www.quanterion.com/design-of-experiments-for-reliability-improvement/

For example, assume that A (Mono) is used at two levels: Low (7 units) and High (13 units). Then, the "conversion" is:

A Upper = Converted Unit for High = (High - Mid) / Half Interval = (13 - 10) / [(13 - 7)/2] = 3/3 = +1
A Lower = Converted Unit for Low = (Low - Mid) / Half Interval = (7 - 10) / [(13 - 7)/2] = -3/3 = -1
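The conversion between original Factor values and coded (-1/+1) units discussed in this part of the paper can be sketched as a pair of small functions. The Low = 7 and High = 13 levels for Factor A (Mono) are the ones used in the paper's example; the function names are ours.

```python
def to_coded(value, low, high):
    """Convert an original Factor value to coded units: (value - mid) / half interval."""
    mid = (high + low) / 2
    half_interval = (high - low) / 2
    return (value - mid) / half_interval


def to_original(coded, low, high):
    """Convert a coded value (e.g. -1 or +1) back to the original Factor units."""
    mid = (high + low) / 2
    half_interval = (high - low) / 2
    return mid + coded * half_interval


LOW, HIGH = 7, 13  # Factor A (Mono) levels from the example
print(to_coded(HIGH, LOW, HIGH))   # High level in coded units
print(to_coded(LOW, LOW, HIGH))    # Low level in coded units
print(to_original(0.5, LOW, HIGH)) # a coded value halfway between Mid and High
```

The same two functions let a fitted regression equation in coded units be evaluated at any original Factor value inside the experimental region.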

[Figure: Interaction Plot (data means) for Response; panels for A_1, B_1, C_1, levels -1 and +1]

The interaction plots confirm that the Factor Effects do not interact (notice the parallel lines). The equivalent Regression analysis again provides the same results as those of the Factorial.

Regression Analysis: Response versus A_1, B_1, ...

The regression equation is:

Response = 10.2 + 3.16 A_1 + 2.57 B_1 + 5.27 C_1 + 0.621 AB_1 + 1.13 AC_1 - 0.173 BC_1 - 0.060 ABC_1

Predictor      Coef   SE Coef       T       P
Constant    10.2241    0.5464   18.71   0.000
A_1          3.1561    0.5464    5.78   0.000
B_1          2.5677    0.5464    4.70   0.000
C_1          5.2718    0.5464    9.65   0.000
AB_1         0.6206    0.5464    1.14   0.267
AC_1         1.1334    0.5464    2.07   0.049
BC_1        -0.1726    0.5464   -0.32   0.755
ABC_1       -0.0595    0.5464   -0.11   0.914

S = 3.09066 R-Sq = 86.5% R-Sq(adj) = 82.6%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         7   1473.56   210.51   22.04   0.000
Residual Error    24    229.25     9.55
Total             31   1702.81

Analysis of the Response Variability

In addition to increasing or decreasing the Response values, as occurs in the examples given above, Factors can impact Response variability. This is not very frequently analyzed, but it is an important consideration: when the Response Variability (which is associated with the variance) increases, the Response precision decreases.

Below is an example of a DOE that analyzes the impact of Factors on Variability. It is performed using regression (specialized DOE software has routines to do this). In the analysis of variability, the regression response used is the Logarithm of the original response. The logarithm is undefined for non-positive responses, so such cases are treated as missing.

Regression Analysis: LogResp versus A_1, B_1, ...

The regression equation is:

LogResp = 2.09 + 0.367 A_1 + 0.110 B_1 + 0.585 C_1 + 0.145 AB_1 - 0.068 AC_1 + 0.061 BC_1 - 0.163 ABC_1

29 cases used, 3 cases contain missing values

Predictor      Coef   SE Coef       T       P
Constant     2.0857    0.1605   12.99   0.000
A_1          0.3665    0.1605    2.28   0.033
B_1          0.1104    0.1605    0.69   0.499
C_1          0.5847    0.1605    3.64   0.002
AB_1         0.1454    0.1605    0.91   0.375
AC_1        -0.0679    0.1605   -0.42   0.677
BC_1         0.0606    0.1605    0.38   0.709
ABC_1       -0.1633    0.1605   -1.02   0.320

S = 0.840684 R-Sq = 51.7% R-Sq(adj) = 35.6%

Analysis of Variance

Source            DF        SS       MS      F       P
Regression         7   15.8687   2.2670   3.21   0.018
Residual Error    21   14.8417   0.7067
Total             28   30.7104

The Regression analysis result shows that Factors A and C are significant, and thus increase the response variability (since their coefficients are positive). Factor B, on the other hand, does not affect the response variability (nor does any of the interactions).
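The log-transformed response used in this variability analysis can be prepared as sketched below. We assume a natural logarithm (the output only says "Logarithm") and mark non-positive responses as missing, which is consistent with the "cases contain missing values" note in the output. The three sample values are taken from the Section 5.0 response list; the function name is ours.

```python
import math


def log_response(values):
    """Natural log of each response; None where the log is undefined (value <= 0)."""
    return [math.log(v) if v > 0 else None for v in values]


sample = [-0.474086, 1.7832887, 10.212619]  # three responses from Section 5.0
logged = log_response(sample)
used = [v for v in logged if v is not None]

print(logged)
print(len(used), "cases used,", len(logged) - len(used), "missing")
```

The regression is then run on the non-missing cases only, with the same +/-1 columns as before.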

Statisticians prefer to work with variables having small variability (because, for example, these yield smaller confidence intervals, for the same coverage). The Coefficient of Variation (CV) = sigma/Mu is a frequently used metric to compare variation across different variables.

6.0 Fractional Factorial Design for Screening Patient Characteristics

In this section we implement FF-DOEs to screen five patient characteristics and determine which ones may affect the patient treatments considered. FF-DOEs are used to minimize research time and effort. We will use a Resolution V Half Fraction Factorial Design, denoted 2^(5-1) V, with the characteristics Gender, Age, Socio-Economic conditions, Co-morbidities, and Body Mass. We first use Minitab DOE software, and then a regression. We use the five variables, with notation:

Variables in the Analysis

Letter   Acronym   Factor
A        Gen       Gender
B        Age       Patient Age
C        Com       Comorbidities
D        Soc       Socio-Econ
E        Bmas      Body Mass

ANALYSIS FOR (2^(5-1) V) HALF FRACTION, RESOLUTION 5, FOR PATIENT CHARACTERISTICS:

DOE MATRIX FOR A 16-Run (2^(5-1) V) FRACTIONAL FACTORIAL DESIGN w/Generator ABCDE

Run     A   B   C   D   AB  AC  BC  AD  BD  CD  ABC ABD BCD E=ABCD
(1)    -1  -1  -1  -1    1   1   1   1   1   1  -1  -1  -1    1
a       1  -1  -1  -1   -1  -1   1  -1   1   1   1   1  -1   -1
b      -1   1  -1  -1   -1   1  -1   1  -1   1   1   1   1   -1
ab      1   1  -1  -1    1  -1  -1  -1  -1   1  -1  -1   1    1
c      -1  -1   1  -1    1  -1  -1   1   1  -1   1  -1   1   -1
ac      1  -1   1  -1   -1   1  -1  -1   1  -1  -1   1   1    1
bc     -1   1   1  -1   -1  -1   1   1  -1  -1  -1   1  -1    1
abc     1   1   1  -1    1   1   1  -1  -1  -1   1  -1  -1   -1
d      -1  -1  -1   1    1   1   1  -1  -1  -1  -1   1   1   -1
ad      1  -1  -1   1   -1  -1   1   1  -1  -1   1  -1   1    1
bd     -1   1  -1   1   -1   1  -1  -1   1  -1   1  -1  -1    1
cd     -1  -1   1   1    1  -1  -1  -1  -1   1   1   1  -1    1
abd     1   1  -1   1    1  -1  -1   1   1  -1  -1   1  -1   -1
acd     1  -1   1   1   -1   1  -1   1  -1   1  -1  -1  -1   -1
bcd    -1   1   1   1   -1  -1   1  -1   1   1  -1  -1   1   -1
abcd    1   1   1   1    1   1   1   1   1   1   1   1   1    1

Response (two replications of the 16 runs, in the run order of the matrix above)

Run      Rep 1     Rep 2
(1)      3.670     0.220
a        6.970     6.470
b        0.140     5.150
ab      15.590     8.930
c        7.680     4.560
ac      18.370    13.510
bc      13.320     9.540
abc     22.490    23.900
d        0.150     3.410
ad       6.999     7.550
bd       2.455     2.080
cd      12.344    12.500
abd      6.970     8.670
acd     16.389    17.290
bcd      9.789     6.500
abcd    21.777    18.950

The Half Fractional Factorial Design is built starting from a 2^4 Full Factorial

Fractional Factorial Design

Runs: 16 Replicates: 1 Fraction: 1/2 Blocks: none Center pts (total): 0

Design Generator: ABCDE

Estimated Effects and Coefficients for Response

Term        Effect     Coef   StDev Coef       T       P
Constant              9.823     0.3900     25.19   0.000
A            7.957    3.979     0.3900     10.20   0.000
B            2.386    1.193     0.3900      3.06   0.007
C            8.968    4.484     0.3900     11.50   0.000
D           -0.418   -0.209     0.3900     -0.54   0.599
AB           1.831    0.915     0.3900      2.35   0.031
AC           1.598    0.799     0.3900      2.05   0.056
BC           0.567    0.284     0.3900      0.73   0.477
AD          -1.036   -0.518     0.3900     -1.33   0.201
BD          -2.316   -1.158     0.3900     -2.97   0.009
CD           0.689    0.345     0.3900      0.88   0.389
ABC          0.606    0.303     0.3900      0.78   0.448
ABD          0.134    0.067     0.3900      0.17   0.865
BCD         -1.014   -0.507     0.3900     -1.30   0.211
E=ABCD       1.330    0.665     0.3900      1.70   0.106

Analysis of Variance for Response

Source            DF    Seq SS    Adj SS    Adj MS       F       P
Main Effects      14   1327.41   1327.41   94.8148   19.48   0.000
Residual Error    17     82.75     82.75    4.8674
  Lack of Fit      1      0.32      0.32    0.3222    0.06   0.806
  Pure Error      16     82.42     82.42    5.1514
Total             31   1410.15
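The Main Effects of this Half Fraction can also be recomputed by hand from the responses of this section. The sketch below assumes the 32 responses are listed as one full replication of the 16 runs followed by the second replication, both in the run order of the DOE matrix; that ordering is our assumption, chosen because it reproduces the reported Effects. Factor levels are derived from the run labels (a letter present means High), and E is set to ABCD.

```python
RUNS = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc",
        "d", "ad", "bd", "cd", "abd", "acd", "bcd", "abcd"]
REP1 = [3.670, 6.970, 0.140, 15.590, 7.680, 18.370, 13.320, 22.490,
        0.150, 6.999, 2.455, 12.344, 6.970, 16.389, 9.789, 21.777]
REP2 = [0.220, 6.470, 5.150, 8.930, 4.560, 13.510, 9.540, 23.900,
        3.410, 7.550, 2.080, 12.500, 8.670, 17.290, 6.500, 18.950]


def levels(label):
    """Coded levels for A..E from a run label such as 'abd'; E = A*B*C*D."""
    a, b, c, d = (1 if letter in label else -1 for letter in "abcd")
    return {"A": a, "B": b, "C": c, "D": d, "E": a * b * c * d}


def main_effects(runs, rep1, rep2):
    """Effect = mean response at +1 minus mean response at -1, per Factor."""
    coded = [levels(label) for label in runs]
    totals = [y1 + y2 for y1, y2 in zip(rep1, rep2)]
    n_obs = 2 * len(runs)
    effects = {f: sum(c[f] * t for c, t in zip(coded, totals)) / (n_obs / 2)
               for f in "ABCDE"}
    return sum(totals) / n_obs, effects


constant, effects = main_effects(RUNS, REP1, REP2)
print("Constant:", round(constant, 3))
for factor, value in effects.items():
    print(factor, round(value, 3))
```

Interaction Effects can be obtained the same way by multiplying the corresponding level columns before forming the contrast.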

Notice how Factors D and E (=ABCD) are non-significant (only Factors A, B, C are significant). We will explore this issue further in the next section, using the extracted 2^3 Full Factorial.

Main Factors are not confounded (aliased) with second order (three-factor) interactions; but they are aliased with third order (four-factor) interactions which, in this example, are non-significant.

We use Fractional Factorial Designs (such as Half Fractions) to obtain screening designs. They are not as sensitive as Full Factorial Designs. But they save much time and effort, and can readily identify those Factors that merit further investigation.

Equivalent Regression Analysis

The regression equation is:

Response = 9.82 + 3.98 A + 1.19 B + 4.48 C - 0.209 D + 0.915 AB + 0.799 AC + 0.284 BC - 0.518 AD - 1.16 BD + 0.345 CD + 0.303 ABC + 0.067 ABD - 0.507 BCD + 0.665 E=ABCD

Predictor     Coef   StDev      T      P
Constant    9.8229  0.3900  25.19  0.000
A           3.9787  0.3900  10.20  0.000
B           1.1928  0.3900   3.06  0.007
C           4.4839  0.3900  11.50  0.000
D          -0.2090  0.3900  -0.54  0.599
AB          0.9153  0.3900   2.35  0.031
AC          0.7990  0.3900   2.05  0.056
BC          0.2837  0.3900   0.73  0.477
AD         -0.5182  0.3900  -1.33  0.201
BD         -1.1578  0.3900  -2.97  0.009
CD          0.3445  0.3900   0.88  0.389
ABC         0.3030  0.3900   0.78  0.448
ABD         0.0672  0.3900   0.17  0.865
BCD        -0.5070  0.3900  -1.30  0.211
E=ABCD      0.6649  0.3900   1.70  0.106

S = 2.206 R-Sq = 94.1% R-Sq(adj) = 89.3%

Analysis of Variance

Source      DF        SS      MS      F      P
Regression  14  1327.407  94.815  19.48  0.000
Error       17    82.745   4.867
Total       31  1410.153
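The regression route can be followed without specialized software. Because the coded columns of the DOE matrix are orthogonal, the least-squares coefficient of each column is simply the sum of (column x response) divided by the number of observations; no matrix inversion is needed. The sketch below is an illustration only, and assumes the 32 responses are two observations per run, in the order of the DOE matrix.

```python
from math import prod, sqrt

RUNS = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc",
        "d", "ad", "bd", "cd", "abd", "acd", "bcd", "abcd"]
# Assumed pairing: first and second observation of each run, in that order.
REP1 = [3.670, 6.970, 0.140, 15.590, 7.680, 18.370, 13.320, 22.490,
        0.150, 6.999, 2.455, 12.344, 6.970, 16.389, 9.789, 21.777]
REP2 = [0.220, 6.470, 5.150, 8.930, 4.560, 13.510, 9.540, 23.900,
        3.410, 7.550, 2.080, 12.500, 8.670, 17.290, 6.500, 18.950]
TERMS = ["A", "B", "C", "D", "AB", "AC", "BC", "AD", "BD", "CD",
         "ABC", "ABD", "BCD", "E"]


def level(label, factor):
    """Coded level (+1/-1) of a factor in a run; E is generated as ABCD."""
    if factor == "E":
        return prod(level(label, f) for f in "ABCD")
    return 1 if factor.lower() in label else -1


y = REP1 + REP2
n = len(y)
# Regressor columns, repeated once for each of the two observations per run.
X = {t: [prod(level(lab, f) for f in t) for lab in RUNS] * 2 for t in TERMS}

b0 = sum(y) / n
coef = {t: sum(x * v for x, v in zip(X[t], y)) / n for t in TERMS}

fitted = [b0 + sum(coef[t] * X[t][i] for t in TERMS) for i in range(n)]
sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
sst = sum((v - b0) ** 2 for v in y)
df_error = n - len(TERMS) - 1
s = sqrt(sse / df_error)
r_sq = 1 - sse / sst

print(f"Constant = {b0:.4f}")
for t in TERMS:
    print(f"{t:>4} = {coef[t]:8.4f}")
print(f"S = {s:.3f}   R-Sq = {100 * r_sq:.1f}%   Error DF = {df_error}")
```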

We present below the corresponding analysis graphs. The first two (Normal Probability Plot and Pareto Chart) are equivalent and identify D and E, the two non-statistically significant Factors in the analysis (A, B, C are significant Main Factors). If the analyst does not have access to DOE specialized software, these plots can be implemented with Excel, where plot calculations are first done following the definitions, and then implemented using standard Excel plots.
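The calculations behind both plots are short. The sketch below starts from the T values reported above. For the Pareto Chart it sorts them by absolute value; for the Normal Probability Plot it pairs each ordered T value with a normal score. Two items are our assumptions: the critical value 1.740 (two-sided Alpha = .10 with 17 degrees of freedom, taken from a t table) and Blom's plotting positions (i - 3/8)/(n + 1/4), which are one common choice and may differ from the one used by the software that produced the original plots.

```python
from statistics import NormalDist

# T values copied from the Estimated Effects table.
T_VALUES = {"A": 10.20, "B": 3.06, "C": 11.50, "D": -0.54, "AB": 2.35,
            "AC": 2.05, "BC": 0.73, "AD": -1.33, "BD": -2.97, "CD": 0.88,
            "ABC": 0.78, "ABD": 0.17, "BCD": -1.30, "E=ABCD": 1.70}
T_CRIT = 1.740  # assumed: t table, two-sided alpha = .10, 17 df

# Pareto Chart: terms in decreasing order of |T|.
pareto = sorted(T_VALUES, key=lambda k: abs(T_VALUES[k]), reverse=True)
significant = [k for k in pareto if abs(T_VALUES[k]) > T_CRIT]

# Normal Probability Plot: ordered T values against normal scores.
ordered = sorted(T_VALUES, key=T_VALUES.get)
n = len(ordered)
scores = {name: NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
          for i, name in enumerate(ordered, start=1)}

print("Pareto order:", " ".join(pareto))
print("Beyond the critical value:", " ".join(significant))
for name in ordered:
    print(f"{name:>7}  T = {T_VALUES[name]:6.2f}  score = {scores[name]:6.3f}")
```

In Excel, the same normal scores can be obtained by applying NORM.S.INV to the plotting positions and charting them against the sorted T values in a scatter plot.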

Most of the two and three factor interactions are also non-significant. This situation will be re-assessed in the extracted 2^3 Full Factorial, in the next section, without having to redo the experiment (just recalculating the results from the present one).

The last two (Normality and Residual) plots help assess the assumptions of Normality, independence and homoskedasticity. Normality is checked via the first plot; independence and equality of variances are checked via the second plot.

[Figure: Normal Probability Plot of the Standardized Effects (response is Response, Alpha = .10). Axes: Standardized Effect vs. Normal Score. Labeled effects: C, A, B, AB, AC, BD.]

[Figure: Pareto Chart of the Standardized Effects (response is Response, Alpha = .10). Effects in decreasing order: C, A, B, BD, AB, AC, E=ABCD, AD, BCD, CD, ABC, BC, D, ABD.]

The Normal Plot of the residuals and the plot of residuals vs. fitted values are used to check the above-mentioned regression model assumptions.

[Figure: Normal Probability Plot of the Residuals (response is Response). Axes: Normal Score vs. Standardized Residual.]

[Figure: Residuals Versus the Fitted Values (response is Response). Axes: Fitted Value vs. Standardized Residual.]

Extracted 2^3 Full Factorial Design:

We analyze again the three statistically significant patient characteristics A, B, C by using a 2^3 Full Factorial extracted from the previous Half Fraction. The other two patient characteristics (Socio-Economic and Body Mass) are non-significant and will not be re-analyzed.

Factors D and E may have been partially embodied by the three significant variables: Covid-19 patients belonging to lower Socio-economic levels may have developed more Co-morbidities, including obesity, for a number of environmental reasons discussed in a previous paper (see footnote 3).

Variables in the Analysis

Letter  Acronym  Factor
AA      BMass    Body Mass
BB      Age      Patient Age
CC      Com      Comorbidities

Estimated Effects and Coefficients for Response:

Term       Effect    Coef  StDev Coef      T      P
Constant           9.8229      0.4592  21.39  0.000
AA         7.9573  3.9787      0.4592   8.66  0.000
BB         2.3856  1.1928      0.4592   2.60  0.016
CC         8.9678  4.4839      0.4592   9.76  0.000
AAB        1.8306  0.9153      0.4592   1.99  0.058
AAC        1.5981  0.7990      0.4592   1.74  0.095
BBC        0.5673  0.2837      0.4592   0.62  0.543
AABBCC     0.6061  0.3030      0.4592   0.66  0.516

Analysis of Variance for Response

Source           DF  Seq SS  Adj SS   Adj MS      F      P
Main Effects      7  1248.2  1248.2  178.315  26.43  0.000
Residual Error   24   162.0   162.0    6.748
  Pure Error     24   162.0   162.0    6.748
Total            31  1410.2
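The extraction requires no new data. Ignoring columns D and E turns the 16 runs into a 2^3 Full Factorial in A, B, C in which each of the eight combinations is observed four times. The coefficients of A, B, C and their interactions do not change; only the residual error does, since the sums of squares of the dropped terms move into it. A minimal sketch, under the same assumed pairing of the 32 responses with the runs of the DOE matrix:

```python
from math import prod, sqrt

RUNS = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc",
        "d", "ad", "bd", "cd", "abd", "acd", "bcd", "abcd"]
# Assumed pairing: first and second observation of each run, in that order.
REP1 = [3.670, 6.970, 0.140, 15.590, 7.680, 18.370, 13.320, 22.490,
        0.150, 6.999, 2.455, 12.344, 6.970, 16.389, 9.789, 21.777]
REP2 = [0.220, 6.470, 5.150, 8.930, 4.560, 13.510, 9.540, 23.900,
        3.410, 7.550, 2.080, 12.500, 8.670, 17.290, 6.500, 18.950]
TERMS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]  # full 2^3 model


def column(term):
    """Coded column of a term in A, B, C; D and E are simply ignored."""
    return [prod(1 if f.lower() in lab else -1 for f in term) for lab in RUNS]


y = REP1 + REP2
n = len(y)
mean = sum(y) / n
coef = {t: sum(x * v for x, v in zip(column(t) * 2, y)) / n for t in TERMS}

# With orthogonal +1/-1 columns, each term contributes n * coef^2.
ss_model = n * sum(c ** 2 for c in coef.values())
ss_total = sum((v - mean) ** 2 for v in y)
df_error = n - len(TERMS) - 1
mse = (ss_total - ss_model) / df_error
se_coef = sqrt(mse / n)
t_ratio = {t: coef[t] / se_coef for t in TERMS}

print(f"Error DF = {df_error}  MSE = {mse:.3f}  StDev Coef = {se_coef:.4f}")
for t in TERMS:
    print(f"{t:>4}  coef = {coef[t]:7.4f}  T = {t_ratio[t]:6.2f}")
```

The larger residual mean square (6.748 instead of 4.867) explains why the T values of A, B and C are somewhat smaller here than in the Half Fraction analysis, even though their coefficients are identical.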

We show below the Pareto Chart that illustrates the statistical analysis performed. The columns corresponding to the statistically significant Factors (A, B, C) appear above the dashed red line (which corresponds to the critical value for said significance level). Per the heredity effect, the interactions AB and AC are also significant. Per the sparsity effect, the triple interaction ABC is not. Columns corresponding to non-statistically significant Factors appear below the dashed red line.

We also present below a residual plot that helps visualize whether the model fits (or does not) its assumptions. A plot presenting random results, such as the one below, is indicative of a model reasonably complying with assumptions. If there is a distinctive pattern in this plot, as opposed to random scatter, then there is reason to suspect problems exist with the model assumptions.

3 https://www.researchgate.net/publication/343700072_A_Digression_About_Race_Ethnicity_Class_and_Covid-19

[Figure: Pareto Chart of the Standardized Effects (response is Response, Alpha = .10). Effects in decreasing order: CC, AA, BB, AAB, AAC, AABBCC, BBC.]

[Figure: Residuals Versus the Fitted Values (response is Response). Axes: Fitted Value vs. Standardized Residual.]

Variability Analysis for Patient Characteristics:

We now analyze whether any of the patient characteristics impact response variability and, if so, in which direction (increasing or decreasing said variability). This is of importance, for having a smaller variability helps to increase the accuracy of results. Variability is analyzed by regressing the Log(response) on the corresponding DOE matrix, presented above.

Regression Analysis

The regression equation is:

LogResp = 1.82 + 0.696 AA + 0.156 BB + 0.745 CC

Predictor    Coef   StDev      T      P
Constant   1.8217  0.1546  11.78  0.000
AA         0.6960  0.1546   4.50  0.000
BB         0.1563  0.1546   1.01  0.321
CC         0.7447  0.1546   4.82  0.000

S = 0.8745 R-Sq = 61.4% R-Sq(adj) = 57.2%

Analysis of Variance

Source      DF      SS      MS      F      P
Regression   3  34.030  11.343  14.83  0.000
Error       28  21.411   0.765
Total       31  55.441

Notice how both characteristics, Body Mass and Co-morbidities, increase the wellness response variability, while Age is non-statistically significant. This means that the corresponding wellness results, for patients having high Body Mass and Co-morbidities, may vary more than is typical.
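The regression of Log(response) reported above can be reproduced with the same contrast arithmetic used for the effects. The sketch below assumes natural logarithms and the same pairing of the 32 responses with the runs of the DOE matrix; AA, BB and CC are the coded A, B and C columns.

```python
from math import log

RUNS = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc",
        "d", "ad", "bd", "cd", "abd", "acd", "bcd", "abcd"]
# Assumed pairing: first and second observation of each run, in that order.
REP1 = [3.670, 6.970, 0.140, 15.590, 7.680, 18.370, 13.320, 22.490,
        0.150, 6.999, 2.455, 12.344, 6.970, 16.389, 9.789, 21.777]
REP2 = [0.220, 6.470, 5.150, 8.930, 4.560, 13.510, 9.540, 23.900,
        3.410, 7.550, 2.080, 12.500, 8.670, 17.290, 6.500, 18.950]
FACTORS = {"AA": "a", "BB": "b", "CC": "c"}  # names used in this section

log_y = [log(v) for v in REP1 + REP2]  # natural logarithm (assumed)
n = len(log_y)

constant = sum(log_y) / n
coef = {}
for name, letter in FACTORS.items():
    # Coded column, repeated for the two observations per run.
    col = [1 if letter in label else -1 for label in RUNS] * 2
    coef[name] = sum(x * v for x, v in zip(col, log_y)) / n

print(f"LogResp = {constant:.3f} + {coef['AA']:.3f} AA"
      f" + {coef['BB']:.3f} BB + {coef['CC']:.3f} CC")
```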

7.0 Discussion

Design of Experiments is an advanced statistical procedure that presumes some knowledge of statistics. In the Bibliography section we include several textbooks that may help in their review. The Barrentine textbook is practically oriented. The Box, Hunter and Hunter text is rigorous in DOE methodology, but geared to the experimenter and the engineer. The Montgomery book is more formal in its treatment of the subject. This author has used them all, with much success, in his graduate and professional statistics training courses.

We have provided three methods to analyze data from DOEs. Programming the calculations into Excel was used in our mentioned previous DOE paper. The other two methods were illustrated: using design of experiment software (e.g. Minitab, SAS), or (if the analyst does not have access to such procedures) implementing a regression analysis using the corresponding DOE matrix.

We developed a framework frequently used in Factor screening. We started by analyzing eleven Factors with both PB and FF-DOE designs (such designs can process many more Factors). Some Factor effects may be confounded with others. But PB/FF-DOE saves valuable time and effort.

The detected sub-set of significant Factors was then submitted to a Resolution 5 Fractional Factorial, allowing a more thorough screening of said selected factors. Finally, we re-analyzed them by extracting a sub-model from the Resolution 5 Fractional Factorial, using the set of Factors that were identified as statistically significant. Thus, we refined the results obtained.

In the present analysis we have used said screening approach, illustrated with our made-up data.

Lastly (or perhaps firstly), there is an additional and important use of modeling for the Covid-19 context. Models can be neutral devices that help move the argument from subjective partisanship to a more objective and scientific debate. Through models, we can present our viewpoints clearly and support or dispute their merits with civility. Models can also assess the effects of the various proposed courses of action. And, by changing their parameters and modifying their components, their outcomes could then be compared, facilitating the selection of more efficient solutions.

8.0 Conclusions

Our Covid-19 work stems from our proposal to the retired academic and research communities: https://www.researchgate.net/publication/341282217_A_Proposal_for_Fighting_Covid-19_and_its_Economic_Fallout It pursues one goal: to contribute to defeating Covid-19.

The main objective of this paper is to provide a tutorial on the use of DOE in the screening and identification of Covid-19 Factors. The data we analyzed were created using this researcher’s experience and information. Hence, numerical results have only an illustrative value. However, Public Health and medical researchers can follow these statistical procedures, substituting their own data, generating additional analyses, and including new factors, as they become available.

With our work, we want to reach four audiences: (1) public health officers and researchers, (2) medical doctors, (3) statisticians, and (4) the public in general. We want to encourage public health and medical professionals to use DOE and other statistical procedures, which are not easy to implement correctly. They need to work with statisticians: not only after their data have been collected, but also at the time that their experiments are being designed. Joint work enables the possibility of extrapolating to the general population the promising results obtained in their laboratories and hospital wards, which is the final objective of most research.

We want to encourage statisticians, especially those retired, who have the experience, financial support (their pension), and the time to provide such assistance, to contribute in helping with the planning, implementation and analysis of statistical procedures –or with writing about them.

We want to provide illustrative examples to doctors, public health researchers, and to the general public, so they can better understand what each one does, fostering more efficient collaboration.

Finally, we have written a series of papers on statistical analysis of Covid-19. They are listed in the initial section of this article, with their web addresses. Such papers could become a part of a course in public health, or an applications course, in the medical curriculum.

Bibliography

Scheffe, H. The Analysis of Variance. Wiley. New York, 1959.

Box, G., Hunter, W. G., and J. S. Hunter. Statistics for Experimenters. Wiley. New York, 1978.

Montgomery, D. C. Design and Analysis of Experiments. 2nd Ed. Wiley. New York, 1984.

Barrentine, L. An Introduction to Design of Experiments. ASQ Quality Press. Milwaukee, 1999.

Romeu, J. L. Design and Analysis of an Aquatic Ecosystem. Proceedings of Federal Conference on Statistical Methodology. https://web.cortland.edu/romeu/DesEvalAqcEcosystems.pdf

Romeu, J. L. Design of Experiments for Reliability Improvement. Quanterion Reliability Ques. https://www.quanterion.com/design-of-experiments-for-reliability-improvement/

About the Author:

Jorge Luis Romeu retired Emeritus from the State University of New York (SUNY). He was, for sixteen years, a Research Professor at Syracuse University, where he is currently an Adjunct Professor of Statistics. Romeu worked for many years as a Senior Research Engineer at the Reliability Analysis Center (RAC), an Air Force Information and Analysis Center operated by IIT Research Institute (IITRI). Romeu received seven Fulbright assignments: in Mexico (3), the Dominican Republic (2), Ecuador, and Colombia. He holds a doctorate in Statistics/O.R., is a C. Stat. Fellow of the Royal Statistical Society, a Senior Member of the American Society for Quality (ASQ) and Member of the American Statistical Association. Romeu is a Past ASQ Regional Director (currently Deputy Regional Director), and holds Reliability and Quality ASQ Professional Certifications. Romeu created and directs the Juarez Lincoln Marti International Ed. Project (JLM, https://web.cortland.edu/matresearch/), which (i) supports higher education in Ibero-America and (ii) maintains the Quality, Reliability and Continuous Improvement Institute (QR&CII, https://web.cortland.edu/matresearch/QR&CIInstPg.htm) statistical web site.