<<

E REVIEW ARTICLE Methodology 2: Observational Clinical Research

Daniel I. Sessler, MD, and Peter B. Imrey, PhD

* † Case-control and cohort studies are invaluable research tools and provide the strongest fea- sible research designs for addressing some questions. Case-control studies usually involve retrospective collection. Cohort studies can involve retrospective, ambidirectional, or prospective . Observational studies are subject to errors attributable to selec- tion , , measurement bias, and reverse causation—in addition to errors of chance. Confounding can be statistically controlled to the extent that potential factors are known and accurately measured, but, in practice, bias and unknown confounders usually remain additional potential sources of error, often of unknown magnitude and clinical impact. —the most clinically useful relation between exposure and outcome—can rarely be defnitively determined from observational studies because intentional, controlled manipu- lations of exposures are not involved. In this article, we review several types of observa- tional clinical research: , comparative case-control and cohort studies, and hybrid designs in which case-control analyses are performed on selected members of cohorts. We also discuss the analytic issues that arise when groups to be compared in an , such as patients receiving different , are not comparable in other respects. (Anesth Analg 2015;121:1043–51)

bservational clinical studies are attractive because Group, and the American Society of Anesthesiologists they are relatively inexpensive and, perhaps more Anesthesia Quality Institute. importantly, can be performed quickly if the required Recent retrospective perioperative studies include data O 1,2 data are already available. Epidemiologic and health ser- from tens-of-thousands to tens-of-millions of patients. vices investigators have used such approaches for decades. Large numbers per se limit research errors attributable But until recently, surgical and perioperative retrospective to chance, but do not prevent bias and confounding, and studies were too often “100-patient chart reviews,” which can exacerbate their effects by inducing overconfdence in rarely produced valid conclusions. Increasingly, though, biased results. But large samples do support more in depth the availability of electronic data and sophisticated statisti- and effective application of statistical techniques to com- cal techniques makes retrospective observational studies of pensate for known confounding factors than is possible surgical and perioperative practices a valuable tool. with small studies. Especially in anesthesia, there has been a revolution in Although analyses of observational data should gener- data quality and availability because of electronic anesthe- ally be considered exploratory rather than defnitive, they sia and hospital records. Consequently, some institutions are nonetheless often a relatively inexpensive and quick have large and dense (i.e., minute-to-minute) registries of way to evaluate the plausibility of hypotheses and build anesthesia care. Often they are linked to related hospital support for subsequent experimental studies. Done well, databases, such as those from clinical laboratories and retrospective analyses can provide good estimates of treat- blood banks, and to outcomes such as duration-of-hos- ment effect3–7 although they tend to underestimate harms pitalization and date-of-death. Billing codes also provide associated with interventions.8 Excellent guidelines have valuable information about diagnoses and procedures been published to encourage uniform and complete report- although the intricacies and requirements of reimburse- ing of observational studies, including full acknowledge- ment mechanisms, and of administrative record systems ment of limitations.9,10 more generally, can sometimes distort clinical realities. To glean the most information from such registries, there CASE SERIES are now several national registries that pool data from A case series—a description of what happened to a series of various institutions, notably the National Surgical Quality patients with a particular diagnosis, perhaps treated with a Improvement Project, Multicenter Perioperative Outcomes particular strategy—is certainly an improvement over anec- dotal experience and case reports because compiled data

From the Department of , Anesthesiology Institute, from a group are less likely to be idiosyncratic than results Cleveland Clinic, Cleveland, Ohio; and Department of Quantitative Health from 1 or a few individuals. There are many examples in , Lerner* Research Institute, Cleveland Clinic, Cleveland, Ohio. which case series, and even case reports, have provided crit- † Accepted for publication May 1, 2015. ical advances.9 Malignant hyperthermia, for example, was Funding: Supported by internal sources. initially reported as a case series and, because it is so rare, The authors declare no conficts of interest. has never been subject to randomized trial.10,11 Address correspondence to Daniel I. Sessler, MD, Department of Outcomes In a typical case series, physicians might report that 79 of Research, Anesthesiology Institute, Cleveland Clinic, 9500 Euclid Ave./P77, Cleveland, OH 44195. Address e-mail to [email protected]; website: www.OR.org. their patients given a particular treatment for Copyright © 2015 International Anesthesia Research Society had a survival of 37 months. If you are comparable DOI: 10.1213/ANE.0000000000000861 to their patients and get exactly the same treatment, it is

Number 4 www.anesthesia-analgesia.org 1043 ڇ Volume 121 ڇ October 2015 Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. E REVIEW ARTICLE

reasonable to expect to have about a 50-50 chance of living reintubations are also rare—too rare, in fact—to practically more or less than 37 months. evaluate in prospective trials. The trouble is that this result, although useful for con- An alternative approach is to fnd patients who required sidering the prognosis of an individual already committed reintubation in the postanesthesia care unit and compare to the particular treatment, offers no comparative context. them to a similar group of patients who did not. Investigators Median survival for comparable patients is not all that we can then look backward in time and determine, for exam- want to know; what we really need to understand is whether ple, which patients in each group had their neuromuscular this survival (or any other outcome) is better or worse with block reversed. If patients who required reintubation were this treatment than with alternatives, and that is where the signifcantly less likely to have been given reversal agents, logic gets tricky. The danger is that in assessing alternative one can conclude that there is an association between inad- treatment plans, the results of a case series of a new treat- equate block reversal and emergent reintubation. The basic ment are almost always implicitly or explicitly compared approach to such case-control studies is shown in Figure 1. with previous results, that is, to a “historical control.” For case-control studies to be valid, the case and control Historically controlled studies tend to falsely generate groups must be chosen or statistically adjusted to be compa- a conclusion that new treatment or local management is rable with respect to potential confounding factors, and the superior because the comparative effects of the treatment assessment of exposures and potential confounding factors tend to get mixed up with, or confounded by, other time- must be equally complete and accurate for both. A danger dependent changes. For instance, recent patients may have with clinical records is that important potential confound- been diagnosed earlier in their courses than his- ers may not have been evaluated or may have been inaccu- torical patients because of the improvement in diagnostic rately recorded. They may also be nonrandomly erroneous: imaging or other technology. Patients diagnosed earlier— for example, patients who have complications may have a the recent ones—would thus be expected to survive lon- more complete list of preexisting conditions. A further dan- ger from the time of diagnoses whether or not the new ger is that important confounding factors may be unknown treatment they receive is an improvement. This problem and thus not even considered by investigators. is known as “lead time” bias in the context of studies of But because investigators must look back in time— medical . Because there is no way to recover when sometimes quite far back—it is diffcult to determine the diagnoses would have occurred had the historical reference extent to which the groups might meaningfully differ. subjects been seen under present conditions, it is diffcult Equivalent exposure assessment for both groups may also or impossible to retroactively correct for this problem. be hard to ensure. For example, data about cases often must Alternatively, for some disease-related complica- be obtained directly from patients or their families, whose tions may have improved over time. Improved treatment memories may be stimulated by the symptoms, diagnostic of these complications, rather than the new therapy, may processes, and concerns about the disease. In contrast, the have improved survival. memories of and about control patients may not undergo Conclusions based on historical comparisons are, star- such reinforcement. The tendency for stronger memories in tlingly often shown to be misleading, usually exaggerated, 1 group to distort exposure comparisons between case and in subsequent randomized trials.12,13 Comparisons to histor- control subjects is known as recall bias. ical controls are generally invalid because: (1) the compari- It is possible to conduct case-control studies within well- son patients differ from those in the case series in important defned cohorts and thus ensure that both cases and con- ways that have not been accounted for and may even be trols represent the same defned population. For instance, unknown (confounding); and/or (2) outcome measurement disease registries that centrally record all cases within a accuracy differs nonrandomly (measurement bias). defned geographic area allow investigators to conduct That said, population-based observational research— following a group of subjects over time—is the proper tool for determining the natural history and prognosis of vari- ous , and the conditions (i.e., “risk factors”) that frequently precede their development. For example, obser- vational studies would be used to evaluate weight gain or development of over time in a population. We will return later to observational designs involving more rigorous interperiod comparisons than the historically con- trolled case series. But it is essential to avoid unwarranted causal inference such as the conclusion that health will be improved by preventing obesity or hypertension—which may or may not prove to be the case. Figure 1. The cardinal feature of case-control studies is that inves- tigators start with a group of people who have already experienced CASE-CONTROL STUDIES the outcome-of-interest (reintubation) and an appropriate group of Case-control studies are usually the best approach—and controls who have not. They then look backward in time to compare often the only practical one—to study rare diseases or out- the and the extent of exposures (in this case, reversal of muscle relaxation) between the groups. Case-control studies are comes. Consider unanticipated tracheal reintubation in the most often retrospective, in that investigators collect the data about postanesthesia care unit. This is a potentially severe com- exposure, often many different exposures, backward in time, after plication that is often used as a quality metric; fortunately, the cases have occurred.

1044 www.anesthesia-analgesia.org ANESTHESIA & ANALGESIA Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. Observational Clinical Research

“population-based” studies with cases drawn from the reg- development is to be compared. Usually, and prefer- istry and controls drawn from the general population in the ably, the exposure groups are subsets of a single natural area covered by the registry. cohort although on occasion they may arise from different “Case-cohort” and “nested case-control” studies describe sources. hybrid designs combining elements of conventional cohort Cohort studies can also start after exposure but before and case-control research. In each, an underlying cohort development of disease, an approach termed an “ambidi- study defnes the context within which a case-control study rectional .” Ambidirectional studies are used is conducted. In case-cohort studies, the exposures of cases in public health crises when the exposure was unexpected that develop within the course of the cohort study are com- but when its effects on health are potentially important pared with those of a random of cohort members, (e.g., a leaking nuclear power facility, chemical plant which may be selected before disease develops. In nested explosion, or widespread exposure to contaminated air or case-control studies, controls are matched to each case indi- water). Exposed subjects are then followed prospectively vidually by random selection from other cohort members for ill effects, even though the exposure occurred before observed and remaining disease free for at least as long as the study started. Studies of , air pollution, diet, the case. This matches the times and data col- , etc. are also considered ambidirectional lection periods of controls with those of the cases. because the (ongoing) exposure is present at the time the Case-cohort and nested case-control designs are more study starts. assuredly valid than conventional case-control or retro- Finally, cohort studies can be completely prospective. spective cohort studies. But they require surveillance of a Some are observational, which is appropriate when expo- cohort to obtain the cases and controls and thus may take sure cannot be controlled by the investigators (i.e., employ- longer and inevitably cost more than fully retrospective ment in a chemical plant, a genetic factor, or a lifestyle factor approaches. Savings nonetheless accrue because exposure or maintenance that cannot ethically or practi- and outcome data need not be ascertained for all cohort cally be manipulated for a suffcient time frame). But a com- members. For instance, if exposures are genetic or otherwise pletely prospective cohort approach, in principle, allows the available from preserved biosamples, only those from cases investigators to take an experimental approach by control- and selected controls, who together may constitute only a ling the exposure. This can vastly increase the validity of small subset of the cohort, will require analysis. the study. A special case of a prospective experimental cohort study COHORT STUDIES is a “randomized ,” in which the exposure (in Cohort studies differ from case-control studies in that they this case, a preventive strategy or therapeutic treatment for look forward in time from exposure to disease/outcome. As a disease) is randomly assigned. The 3 “favors” of cohort above, the term “exposure” is used broadly and can refer to studies are shown in Figure 3. (Also, see the third article in a patient’s , environmental exposure, or treatment. this series that focuses on randomized trials.) The term “disease” is equally broad and includes complica- tions, progression, and death. The general logic of cohort studies is shown in Figure 2. In cohort studies, investigators start with groups dif- fering in their exposures or levels of exposure and look forward in time, comparing subsequent disease between these groups. But that does not that data collection needs to be prospective. For example, investiga- tors can “look forward” from exposure to disease within the confnes of an existing database. This latter approach is termed a “retrospective cohort study” because the research is conducted after the period for which disease

Figure 3. In cohort studies, investigators start with exposure and look forward in time, comparing development or progression of dis- ease between groups differing in their exposures. In a retrospective cohort study, investigators assemble the groups and make these comparisons within the confnes of existing records. An ambidirec- tional approach is used when the exposure has occurred (i.e., a chemical plant exploding) or is ongoing (i.e., cholesterol concentra- tion or diet), and its effects on health are potentially important and thus worth following. Prospective cohort studies, in which investiga- tors collect their data as the exposures and subsequent disease events occur, can be especially powerful because this approach may Figure 2. The cardinal feature of cohort studies is that investigators allow investigators to control measurements and manipulate the start with groups of people who differ in their exposure and analyti- exposure, which vastly increases the study reliability. A special case cally look forward in time to compare the development of disease of a is a randomized clinical trial, in which between these groups. exposure (treatment) is randomly assigned.

Number 4 www.anesthesia-analgesia.org 1045 ڇ Volume 121 ڇ October 2015 Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. E REVIEW ARTICLE

INTERPERIOD COMPARATIVE DESIGNS associated with amount of on the preceding night; and “Before-and-after” studies, which compare outcomes of episodes of bleeding and thrombosis for patients on ven- a case series after initiation of an intervention with those tricular assist devices might be compared across periods of of historical controls in an immediately preceding period, differing preventive medication regimens. use temporally proximal historical controls who are also highly similar in other respects to those coming after and BIAS IN CASE-CONTROL, COHORT, AND thus exposed to the intervention. Such studies often allow INTERPERIOD COMPARISON STUDIES planned, uniform measurements throughout the full obser- The “,” initially identifed in the 1940s,16 vation period. For these reasons, they are clearly less vul- is a subtle type of bias favoring the intervention in before- nerable to error from extraneous infuences than more and-after studies. It refers to the fact that simply being in loosely historically controlled case series. Before-and-after a study, and the associated attention from investigators, studies are frequently used in health services research alters responses of participants. For example, patients in because implementations of complex practice changes, such a -controlled clinical trial of Ginkgo biloba for treat- as electronic records or multispecialty enhanced recovery ment of mild-moderate dementia exhibited more improve- pathways, tend to occur at discrete points in time and are ment when intensively monitored than with less intense essentially irreversible. .17 Observational analogs of experimental crossover Two diffculties with before-and-after studies and more designs and “n-of-1” clinical studies,14,15 in which individu- elaborate interperiod comparisons are that (1) the presence als or groups are exposed to planned sequences of 1 or more of the exposure may infuence the measurements of the experimental and control conditions in discrete periods, are outcome unless the measurement process is under tight more resistant to errors than simple before-and-after com- control of the researcher throughout the study period; and parisons. Such designs are often called “quasiexperimental” (2) medical improvements, or secular trends concurrent and are most straightforward for studies of exposures with with but irrelevant to the intervention, may make an interven- short-term biological effects and of people whose health tion appear more (or less) effective than it really is. The sec- conditions are chronic and relatively stable; more restric- ond of these can be greatly mitigated by multiple series and tive assumptions and complex analyses are required when multiperiod studies because such studies can effectively responses vary systematically among periods because of exclude many simple alternative explanations for data, disease progression or persistent effects of therapies in pre- leaving causality as the sole remaining plausible scenario. vious periods. For example, at the population level, cardiac Combined concerns about concurrent time-dependent arrest incidence rates might be compared among periods of improvements, the Hawthorne effect, and the potential for high and low air pollution, as determined by a relevant air measurement , make simple before-and-after stud- quality metric. ies an especially weak design. Where feasible, multiperiod Interrupted studies involve measurements crossover studies, including stepped wedge18 and inter- over multiple time periods, preferably a considerable rupted time series19,20 comparisons, are generally superior number, bracketing an intervention, or other exposure of alternatives and even more so when they include a parallel interest. This elaboration of a conventional before-and- control series. after design allows observation of trends coincident with Case-control studies look back in time from disease/ the intervention. The technique allows differentiation of a outcome to exposure, whereas cohort studies look forward response to intervention from stable secular trends caused in time from exposure to disease/outcome. A consequence by extraneous factors, such as steady increases or decreases is that selection and measurement biases apply differently. or perhaps cyclical effects because of seasonal variation. In case-control studies, applies to selection of Close temporal association between exposure and outcome, the case and control groups. To the extent that the 2 groups in a long series of stable or predictably changing outcomes, differ on variables extraneous to the focus of the study, the can compellingly suggest causality in the context of obser- effects of such variables may contaminate inferences about vational research. the exposure(s) of interest, compromising the validity of the Elaborations of this design, where possible, involve conclusions. of similar time series across multiple groups The opposite is true for cohort studies. There, selection for which the timing of the exposure has varied. Controlled bias applies to selection of the exposed and unexposed interrupted time series studies include a second series of groups—which need to be otherwise comparable. An observations, at the same times as the frst, in a group that example from anesthesia is that patients who get neuraxial remains unexposed throughout the study. Stepped wedge anesthesia usually differ from those who do not; a simple designs observe multiple time series at similar points, from comparison of outcomes between patients who do and do groups initially exposed at different times in the sequence of not have epidural or spinal anesthesia will thus be con- observations. The additional time series improve investiga- founded by such differences to the extent that they infu- tors’ ability to distinguish the effects of exposure from those ence the outcomes of interest. of extraneous, irregular infuences on the outcome. Such issues emerged in the controversy over risks and Similarly, in what are known as “case-crossover stud- benefts of hormone replacement therapy for postmeno- ies” at the individual patient level, measures of chronic pausal women. Hormone replacement therapy users might be compared between periods in which patients used tend to be more affuent, better educated, have greater different or doses; childhood accidents might be access to care and treatment of comorbidities, and tend to

1046 www.anesthesia-analgesia.org ANESTHESIA & ANALGESIA Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. Observational Clinical Research

be more medically compliant than nonusers.21 To the extent LINKS BETWEEN CASE-CONTROL AND COHORT that these factors predispose to better health outcomes, STUDIES direct observational comparisons between users and nonus- Although cohort and case-control studies differ substan- ers randomly sampled from the population were potentially tially, they share an identical basic purpose: to shed light biased in favor of hormone replacement therapy. on how current status and exposures, such as medical treat- Simple comparisons of such an outcome in cohort stud- ments, predict and affect future outcomes. ies would thus compare groups of women collectively at dif- It is not intuitively obvious that comparisons of statistics ferent risk levels. Case-control studies would similarly fnd about past exposures in case-control studies can be used to fewer using hormone replacement among cases than controls describe the statistical patterns of evolution of disease out- because more affuent and educated women with better medi- comes in current and future patients. In fact, precise math- cal care would be less frequent among the case group, which ematical reasoning, including a bit of algebraic jujitsu, is would thus be relatively more populated with women less required, and this reasoning does not apply to arbitrarily likely to have access to and use hormone replacement therapy. assembled case and control groups. For the logic to work, Thus, confounding in both types of studies could arise from both cases and controls must be sampled from a com- failure to compensate from the selective manner in which hor- mon underlying cohort wherein the relevant pathogenic mone replacement therapy becomes available to, and is used processes are generally stable. In other words, the control by, perimenopausal and postmenopausal women. sample in a case-control study must be representative of the For case-control studies, measurement bias applies pri- larger group of all those who, had they developed the dis- marily to estimating exposure, which might be genetic, ease under study, would have been detected and included environmental, or a treatment. Again, the opposite is true in the case group. In practice, it is often diffcult to establish for cohort studies; there, measurement bias applies pri- such groups, especially when studying rare diseases and marily to assessing disease or outcome. For instance, good potential links to distant occupational and environmental clinical practice typically requires greater surveillance for, exposures. and higher sensitivity to, disease signs in patients known Consider, for example, a case-control study in which or suspected to be at increased risk, but such differences in hospitalized patients with 1 disease are compared with sim- relation to a suspect exposure produce ascertainment bias ilarly hospitalized controls with other diagnoses. The dif- in cohort studies. Figure 4 shows the 2 types of studies and fculty with this approach is that hospitalized patients are where selection and measurement bias are of most concern special by virtue of their need for hospitalization and thus in each. may differ greatly from the cases in characteristics and past Case-control and cohort studies are both important exposures related to other causes of hospitalization. So the research tools. Case-control studies usually involve retro- selection processes that result in hospitalization for various spective data collection although a case-cohort or nested conditions can introduce serious confounding that is often case-control analysis is sometimes planned within a ran- hard to anticipate—and even harder to eradicate—a prob- domized trial. Cohort studies can involve retrospective, lem known as Berkson bias. ambidirectional, or prospective data collection. More com- With hindsight, we can well appreciate an example plex observational studies may combine features of both of this in 2 seminal 1950s studies of smoking and health the case-control and cohort approaches. Comparisons conducted in England by Doll and Hill, who were both between observational studies and subsequent large ran- subsequently knighted for their contributions. One was a domized trials suggest that the observational studies usu- prospective study of smoking and mortality from various ally correctly determine the direction of the effect being causes in a cohort of physicians.21 The other was a case-con- evaluated but often overestimate the magnitude of the trol study of smoking and lung incidence, in which treatment effect.6 hospitalized patients were compared with con- trols selected from inpatients of the same hospitals with other diagnoses.22 Because lung cancer was almost always fatal, and there is no particular reason to expect its rela- tion with smoking to differ in physicians from its relation with smoking in others, one might expect estimates of the smoker-to-nonsmoker lung cancer incidence ratio from the hospital-based case-control study and the smoker-to-non- smoker lung cancer mortality ratio from the prospectively studied physician cohort to be similar. However, the hospital-based case-control study inci- dence ratios for categories of cigarette smoking at levels above half-a-pack per day were only half the corresponding mortality ratios observed in the physician cohort, whereas the incidence and mortality ratios for less frequent smok- Figure 4. In case-control studies (left), selection bias applies to ers were similar. Subsequent research has clearly supported selection of the case and control groups, and measurement bias applies to determination of exposure. In cohort studies (right), selec- the ratios found from the physician cohort, and it is now tion bias applies to selection of exposed and unexposed groups, and accepted that the case-control study substantially underesti- measurement bias applies to determination of disease or outcome. mated the strength of the smoking and lung cancer linkage.

Number 4 www.anesthesia-analgesia.org 1047 ڇ Volume 121 ڇ October 2015 Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. E REVIEW ARTICLE

We know now what was not known then, and why such obtain a clearer picture of the effect of state of residence, a discrepancy between the results of these studies at higher separated from the known effect of age. But unless a very smoking levels was virtually inevitable. Cigarette smok- large sample is available, it is diffcult with this method to ing causes many diseases, including myocardial infarc- simultaneously protect against confounding by more than a tions and ischemic stroke; collectively, cardiovascular and few variables, and it only protects against known confound- cerebrovascular diseases linked to smoking were then, ers for which reliable data are available. and are now, much more common than lung cancer. In the “Multivariable modeling” refers to more comprehensive 1950s, heart attack patients were typically hospitalized for and sophisticated statistical multiple regression methods 3 weeks. Hence, the British hospitals from which controls for simultaneously controlling multiple confounders and were selected in the early 1950s were likely flled with heart potentially also evaluating effect modifcation. One use attack survivors, whose past smoking exposures had con- of such models is to develop an overall propensity score tributed to pathogenesis of their heart disease and thus to summarize the multiple characteristics correlated with selected them to be available as controls. exposure. The score may then be used to develop individu- Control patients in this study were thus bound to over- ally matched or stratifed comparisons, more or less as just estimate the smoking levels of those without lung cancer in described, in relatively simple statistical analyses as if the the population, leading to underestimation of the disparity propensity score itself had been a single prognostic char- between smoking levels of cases and controls and conse- acteristic known at the study’s outset. Thus, the modeling quently of the inferred disparities between lung cancer inci- effort is directed toward accounting for preceding correlates dences of nonsmokers and smokers—but only at smoking of the exposure and is conducted entirely without reference levels suffcient to noticeably affect heart attack risk. Thus, to clinical outcomes. When the exposure is a treatment, with current knowledge, it is unsurprising that the effect was propensity modeling then attempts to represent physician not seen in Doll and Hill’s lowest smoking dose category, behavior, about which much may be known a priori and where effects of smoking on other diseases were weakest. used to inform the modeling process. Most multivariable modeling, however, is directed at CONTROLLING CONFOUNDING AND COMPLEXITY exposure-disease linkages and attempting the more diffcult IN OBSERVATIONAL STUDIES task of representing complex biology and resulting disease Observational clinical researchers attempt to isolate the behaviors. These models attempt to achieve the benefts of causal effects of 1 factor from the biological and statistical fnely stratifed analyses by mathematically incorporating infuences of others. Fortunately, human environments and the relations between pairs of variables, assuming others behaviors are not subject to the sorts of manipulations and are held constant, and then correcting for differences in controls that are routine for laboratory samples and animal groups being compared that may generate confounding models. But a consequence is that confounding is always by mathematical adjustments of outcomes. Essentially, this possible. Specifc features of study design and analysis are involves sliding the outcome distributions for each group thus used to reduce the risk of confounding errors by foster- along scales corresponding to individually predictive vari- ing like-with-like comparisons. These methods are power- ables, until averages of all such variables are closely aligned ful but imperfect. across all groups. Direct multivariable modeling and pro- methods assemble the groups to be compared pensity matching are both useful, and which is best for a in a manner that forces them to be similarly constituted with given analysis depends on the questions being asked, the respect to characteristics that otherwise might confound the type of data, and the characteristics of the data source.23 comparisons of greatest interest. Group compositions might A basic approach to adjusting for confounding is com- be harmonized as a whole in what is known as “frequency mon to the various types of used for matching.” Alternatively, and more aggressively, individu- modeling continuous measurements, dichotomous out- als in each group might be linked to specifc similar indi- comes, rates of recurrent disease episodes, and survival viduals in the other group(s) in “matched set” studies. times. At the core of most statistical modeling is an index Analytical strategies strive for similarly equitable com- of patient factors known technically as a “linear predictor”

parisons by restructuring how observations are organized and consisting of a weighted sum of patient variables a1x1

and weighted when comparisons are made. Stratifcation + a2x2 + ...+ akxk. The xi are numbers, or numerical codes methods such as classical direct rate adjustment subdivide for values of ordinal or qualitative patient factors, and the

the data into strata whose members are similar with respect ai are weights intended to refect the statistical relations of to confounding variables, obtain like-with-like subcompari- these variables to the outcome. When enough data are avail- sons between the relevant groups within each of these strata, able and relations in nature are simple, confounding can be

and then reassemble these subcomparisons into an overall statistically removed simply by adding 1 or more terms ajxj summary of group differences. Because the subcomparisons representing confounders into the linear predictor. This are protected against confounding, their aggregation is also. approach can, in principle, remove confounding for even a For example, Florida’s is higher than Alaska’s large set of variables. because Floridians tend to be older. But we would be foolish Effect modifcation can also be represented in regression to move from 1 state to the other to live longer because mor- analysis by including terms in the linear predic- talities for Floridians and Alaskans of comparable ages are tor for which the x portion is constructed as the product similar. By stratifying each state’s population and averaging of numerical representations of the exposure variable and the differences in the resulting age-specifc mortalities, we variables that may modify its effect. When interaction terms

1048 www.anesthesia-analgesia.org ANESTHESIA & ANALGESIA Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. Observational Clinical Research

are statistically signifcant, exposure effects should usually alternative highly correlated (technically known as be described separately for different values or ranges of the “collinear”) measurements in the same linear pre- effect modifer. dictor may conceal important relations. This occurs For instance if, in an observational study, the vari- because the test of each variable individually treats

able x1 is respectively 1 or 0 for patients receiving the counterpart variable as a potential confounder A or drug B, and physicians tend to somewhat preferen- and, by doing so, largely adjusts its own effects out tially prescribe drug A for older and drug B for younger of the test. patients—and the average outcome of treatment depends 4. Intermediate variables in a causal pathway between linearly on age—then confounding of the treatment effect an exposure and outcome, technically known as by age might be removed by simply including a multiplier “mediators,” are not confounders. Treating them as if of patient age in the linear predictor or including separate they were, by matching or inclusion in a linear pre- variables taking values 0 or 1 to indicate membership in dictor, conceals such portion of the exposure’s causal each of several age strata. Effect modifcation can also be effect that acts through the mediator. For example,

investigated by including an additional variable, say x6, anesthesia handovers are associated with, and may

constructed as the product of x1 with age: 1 × age = age be causal for, major in-hospitality morbidity or mor- for patients receiving drug A and 0 × age = 0 for patients tality, potentially because of errors resulting from receiving drug B. the loss of critical information during the transfer.1 Occam’s Razor, attributed to William of Occam (1285-c. If this is correct, then if analyses of the 1347) and sometimes known as the law of parsimony or the association between handovers and outcomes were KISS (keep it simple stupid) principle, is the practical, and to somehow adjusted for loss of critical information, the some extent aesthetic, philosophy of choosing, among com- adjustment would greatly reduce or even eliminate peting explanations, the simpler over the more complex. the association between handoffs and outcomes. Occam’s Razor requires hypotheses involving effect 5. Controlling the relations of 2 variables for a mutual modifcation to be held somewhat at arms length until consequence, known as a “collider,” or for a collider’s essentially required by data that cannot be explained more near relative, can distort the relation unpredictably simply. Keeping this approach in mind when conducting and yield paradoxical results. To see this, consider multivariable modeling is wise because a search for inter- the classic example of the relation between 2 causes of actions among many variables involves large numbers of wet lawns, rain, and automated lawn sprinkling only combinations, promoting false-positive results. on Mondays, Wednesdays, and Fridays. If the asso- The apparent simplicity of these powerful methods can ciation is examined after controlling for lawn status be deceptive, especially when the methods are applied in by fxing the lawn as wet, then absence of either cause conjunction with automatic or semiautomatic methods for fully determines—indeed, logically implies—that the variable selection, which itself is among the most diffcult other must be present. But of course this is nonsense problems in statistical methodology. Specifcally: because rain and automatic sprinkling are statistically independent, despite the conviction of many home- 1. Because confounding is a consequence of sample com- owners that watering brings on rain. position regardless of statistical signifcance, methods 6. Accurate determination of timing, and hence sequenc- that select variables solely by statistical signifcance ing of exposures, outcomes, and possible mediators, criteria, such as signifcance of unadjusted associa- confounders, and effect modifers may be diffcult tions with either outcome or exposure or by forward, in observational studies of chronic diseases and/ backward, or stepwise variable selection, may fail or exposures, leading to inadvertent adjustment for to recognize and adjust for important confounding. mediators or consequences rather than confounders. Thus, the common practice of using statistical signif- 7. Adjustments for confounding in statistical models cance to determine which variables to adjust for in are themselves infuenced by random variation of the comparisons of outcomes in an observational study estimated relations between the outcome and the con- is ad hoc rather than justifed by statistical principles. founder, random variation which can be substantial 2. Sliding variables along their scales individually to in modestly sized studies. achieve comparability of averages across groups, as is done in the basic modeling approach to confounder These issues clarify why (1) a hands-on statistical approach adjustment described above, may implicitly assume is generally preferable to “by the numbers” handling of con- biologically impossible combinations of variables. In founder control in observational studies; (2) data from some such cases, adjustment models effectively extrapolate such studies may not have a unique coherent interpretation, beyond the of the data. even when large amounts of data are available; and (3) data 3. A characteristic that cannot itself be measured from controlled clinical are preferable to obser- directly is termed “latent.” Pain, for instance, is latent vational data when obtainable and affordable. but can be assessed by its effects on multiple behav- iors and measurement instruments such as, for young STROBE REPORTING GUIDELINES children, the Faces Pain Scale (Faces, Legs, Activity, Various checklists and guidelines have been proposed to Cry, Consolability)24 and numerical visual analog enhance completeness and consistency of reporting in obser- scale.25 When such multiple measures of the same vational studies. In practice, the reporting guidelines also latent entity are available for analysis, inclusion of serve as guidelines for study conduct because many required

Number 4 www.anesthesia-analgesia.org 1049 ڇ Volume 121 ڇ October 2015 Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. E REVIEW ARTICLE

elements will only be available if designed into the original measurement bias. The third article in this series discusses . The best known and most commonly used check- randomized clinical trials. E list is from the Strengthening the Reporting of Observational Studies in (STROBE) statement.26 DISCLOSURES The STROBE checklist includes 22 items, half-a-dozen of Name: Daniel I. Sessler, MD. which have subitems. Among these, perhaps the most criti- Contribution: This author helped write the manuscript. cal are (1) specifc objectives and hypotheses; (2) eligibility Attestation: Daniel I. Sessler approved the fnal manuscript. criteria, sources, and methods of participant selection and Name: Peter B. Imrey, PhD. sample size rationale; (3) defnition of all outcomes, expo- Contribution: This author helped design the study and write sure, predictors, potential confounders, and effect modifers; the manuscript. (4) data sources; (5) methods of statistical analysis, includ- Attestation: Peter B. Imrey approved the fnal manuscript. ing matching, adjustments for confounding, subgroups, and This manuscript was handled by: Steven L. Shafer, MD. interactions; (6) relevant participant characteristics; and (7) REFERENCES results including both unadjusted and (if used) adjusted/ 1. Saager L, Hesler BD, You J, Turan A, Mascha EJ, Sessler DI, matched outcomes with 95% confdence intervals. Kurz A. Intraoperative transitions of anesthesia care and post- operative adverse outcomes. Anesthesiology 2014;121:695–706 CONCLUSIONS 2. Dalton JE, Glance LG, Mascha EJ, Ehrlinger J, Chamoun N, Sessler DI. Impact of present-on-admission indicators on risk- Large registries have revolutionized retrospective perioper- adjusted hospital mortality measurement. Anesthesiology ative studies by giving investigators access to large amounts 2013;118:1298–306 of high-quality data. High-density databases including 3. Abraham NS, Byrne CJ, Young JM, Solomon MJ. Meta-analysis tens-of-thousands to tens-of-millions of patients, combined of well-designed nonrandomized comparative studies of surgi- with sophisticated statistical techniques, have markedly cal procedures is as good as randomized controlled trials. J Clin Epidemiol 2010;63:238–45 improved the reliability of retrospective studies. 4. Benson K, Hartz AJ. A comparison of observational studies and Case series are often explicitly or implicitly compared randomized, controlled trials. N Engl J Med 2000;342:1878–86 with historical controls, an approach that is subject to 5. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, numerous biases. Retrospective case-control studies may be observational studies, and the hierarchy of research designs. N Engl J Med 2000;342:1887–92 the only way to evaluate rare conditions, and retrospective 6. Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, cohort studies are a quick and inexpensive way to evalu- Tektonidou MG, Contopoulos-Ioannidis DG, Lau J. Comparison ate hypotheses. Long-running prospective cohort studies of evidence of treatment effects in randomized and nonran- accumulate invaluable data and biosamples that inform on domized studies. JAMA 2001;286:821–30 the natural history of disease and provide a basis for subse- 7. Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling quent case-cohort and nested case-control studies. in observational studies: a . J Clin Epidemiol Done properly, case-control and cohort studies are 2005;58:550–9 powerful research tools and may provide the strongest 8. Papanikolaou PN, Christidi GD, Ioannidis JP. Comparison of feasible research designs for addressing some questions. evidence on harms of medical interventions in randomized and nonrandomized studies. CMAJ 2006;174:635–41 Case-control studies usually involve retrospective data 9. Albrecht J, Meves A, Bigby M. Case reports and case series collection. Cohort studies can involve retrospective, ambi- from Lancet had signifcant impact on medical literature. J Clin directional, or prospective data collection. More complex Epidemiol 2005;58:1227–32 observational studies may combine features of both the 10. Denborough M, Lovell R. Anaesthetic deaths in a family (let- case-control and cohort approaches. ter). Lancet 1960;ii45 11. Denborough MA, Forster JF, Lovell RR, Maplestone PA, Villiers However, observational studies are subject to errors JD. Anaesthetic deaths in a family. Br J Anaesth 1962;34:395–6 attributable to selection bias, confounding, measurement 12. Sacks H, Chalmers TC, Smith H Jr. Randomized versus histori- bias, and reverse causation—in addition to errors of chance. cal controls for clinical trials. Am J Med 1982;72:233–40 Confounding can be statistically controlled to the extent 13. Bhansali MS, Vaidya JS, Bhatt RG, Patil PK, Badwe RA, Desai PB. Chemotherapy for carcinoma of the esophagus: a compari- that potential factors are known and accurately measured son of evidence from meta-analyses of randomized trials and of but, for the reasons enumerated above, adjustment may not historical control studies. Ann Oncol 1996;7:355–9 be straightforward in practice, and bias and unknown con- 14. Guyatt G, Sackett D, Adachi J, Roberts R, Chong J, Rosenbloom founders remain additional potential sources of error, often D, Keller J. A clinician’s guide for conducting randomized trials in individual patients. CMAJ 1988;139:497–503 of unknown magnitude and clinical impact. Causality—the 15. Guyatt G, Sackett D, Taylor DW, Chong J, Roberts R, Pugsley S. most clinically useful outcome—can rarely be defnitively Determining optimal therapy—randomized trials in individual determined from observational data, because intentional, patients. N Engl J Med 1986;314:889–92 controlled manipulations of exposures are not involved, 16. Mayo E. The Human Problems of an Industrial Civilization. and clear exclusion of competing noncausal interpretations 2nd ed. New York, NY, MacMillan, 1946 17. McCarney R, Warner J, Iliffe S, van Haselen R, Griffn M, Fisher is logically impossible and exceptionally diffcult, even to a P. The Hawthorne effect: a randomised, controlled trial. BMC practical standard of assuaging reasonable doubt. Med Res Methodol 2007;7:30 Experimentation, however, provides more powerful 18. Brown CA, Lilford RJ. The stepped wedge trial design: a sys- tools for preventing clinical research errors. Randomized tematic review. BMC Med Res Methodol 2006;6:54 19. Ma ZQ, Kuller LH, Fisher MA, Ostroff SM. Use of interrupted assignment of treatment prevents reverse causation time-series method to evaluate the impact of cigarette excise errors and selection bias and, in suffciently large studies, tax increases in Pennsylvania, 2000–2009. Prev Chronic Dis strongly protects against confounding. Blinding minimizes 2013;10:120268

1050 www.anesthesia-analgesia.org ANESTHESIA & ANALGESIA Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited. Observational Clinical Research

20. Terner Z, Carrol T, Brown DE. Time series forecasts and volatil- Face, Legs, Activity, Cry, Consolability scale for evaluat- ity measures as predictors of post-surgical death and kidney ing pain in toddlers during . Pain Res Manag injury. Healthcare Innovation Conference (HIC), 2014 IEEE, 2013;18:e124–8 319–22 25. Myles PS, Troedel S, Boquest M, Reeves M. The pain 21. Doll R, Hill AB. Smoking and carcinoma of the lung; prelimi- visual analog scale: is it linear or nonlinear? Anesth Analg nary report. Br Med J 1950;2:739–48 1999;89:1517–20 22. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits; a preliminary report. Br Med J 1954;1:1451–5 26. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, 23. Blackstone EH. Comparing apples and oranges. J Thorac Vandenbroucke JP; STROBE Initiative. The Strengthening the Cardiovasc Surg 2002;123:8–15 Reporting of Observational Studies in Epidemiology (STROBE) 24. Gomez RJ, Barrowman N, Elia S, Manias E, Royle J, Harrison statement: guidelines for reporting observational studies. D. Establishing intra- and inter-rater agreement of the Lancet 2007;370:1453–7

Number 4 www.anesthesia-analgesia.org 1051 ڇ Volume 121 ڇ October 2015 Copyright © 2015 International Anesthesia Research Society. Unauthorized reproduction of this article is prohibited.