Confounding and Cornfield: Back to the Future

Total Page:16

File Type:pdf, Size:1020Kb

Confounding and Cornfield: Back to the Future ICOTS10 (2018) Invited Paper Schield CONFOUNDING AND CORNFIELD: BACK TO THE FUTURE Milo Schield Augsburg University, Minneapolis, MN, USA [email protected] Most students see less value in statistics after taking the intro course than they did before. Most students are in disciplines that deal with observational studies and confounding. Introductory statistics doesn't deal with observational studies. Most intro statistics textbooks never mention confounding. Confounding wasn't one of McKenzie's top 30 concepts. Why are we so silent? Perhaps we aren't comfortable teaching confounding. To change the future, we need to go back to when Jerome Cornfield argued that smoking caused cancer. Fisher disagreed arguing it might be caused by a genetic confounder. Cornfield refuted Fisher by deriving the Cornfield conditions: one of the greatest contributions of statistics to human knowledge. We need to teach multivariate statistics, confounding and the Cornfield conditions so students will appreciate statistics. Key words: Statistical literacy, effect size, confounder, lurking variable, Simpson's paradox STATISTICAL EDUCATION ENJOYS SUCCESS Statistical education has had success. Most college graduates study statistics. Schield (2016a) estimates that 57% of US four-year college students graduating in 2012 studied statistics. This estimate assumes that all students in the Table 1 majors must take statistics. This assumption will over-estimate the number in math, biology and history, but it omits many other majors where statistics may be required. ASA (2016). The number of US high school students taking the AP statistics exam has more than doubled in 9 years: from 98 thousand in 2007 to 206 thousand in 2016. Wikipedia (2018a) Table 1: Distribution Stat 101 by Major Table 2: Student Math Aptitudes by Major ' STATISTICAL EDUCATION HAS ONE BIG PROBLEM Students see less value in statistics after taking the course than they did before. Schield and Schield (2008). Statistical educators seem unwilling to survey their students on this criterion. This topic is almost absent in statistical education papers or conferences. It's not a stated goal. It's as though statistical educators don't want to know. If we don't measure it, it doesn't exist. Why don't students appreciate the value of statistics despite being deluged by a sea of data? Perhaps it is an acquired taste like Scotch. Maybe students won't see the value until they start working; maybe their dislike of math carries over to statistics. Here are some more explanations: One explanation is a student-teacher math-aptitude gap. This gap can be seen in Table 2. (Schield 2016a). Most statistical educators majored in mathematics or statistics with an average math SAT score at the 80th percentile among college-bound SAT takers. The biggest group of students taking statistics majored in business where the average math SAT score is the 51st percentile. The average math SAT for psychology and education majors is at the 39th percentile. Maybe statistics teachers think and teach at a more abstract level than their students. A second explanation is a student-teacher subject-interest gap. A student's subject interest is typically related to their chosen major. Table 1 shows the choice of majors among those taking statistics. The majors in Table 1 can be classified into two groups: those in the social science and those in the physical sciences with math/stat included in the latter. Since statistics teachers tend to In M. A. Sorto, A. White, & L. Guyot (Eds.), Looking back, looking forward. Proceedings of the Tenth International Conference on Teaching Statistics (ICOTS10, July, 2018), Kyoto, Japan. Voorburg, The Netherlands: International Statistical Institute. iase-web.org [© 2018 ISI/IASE] ICOTS10 (2018) Invited Paper Schield come from math/stat or the physical sciences, perhaps they can't relate to the subject interests of their students in the social sciences. A third explanation involves a student-teacher prediction-causation gap. Richard de Veaux argued that "We are teaching the wrong things in the wrong way and in the wrong order." Rossman (2016). Perhaps it is the kinds of data we are – or are not – teaching. Normally we introduce causation using random assignment. If we teach observational data, it only involves predictive modeling – not causal exploration or explanation. Confounding isn't a problem in predicting. Students majoring in the social sciences are more likely to use observational data to understand causation. Confounding is a major problem. Multivariate data is needed to deal with confounding. Teachers may avoid those topics except to say "Association is not causation." By making multivariate data a goal for intro statistics, the 2016 update to the GAISE guidelines was a major step forward. Mentioning observational studies and confounding was a big step. Schield (2017) But confounding was not mentioned at the highest level. Our unwillingness to talk about observational causation, confounding and strength of evidence is arguably the primary reason our students' see little value in the introductory statistics normally taught in Stat 101. STATISTICAL EDUCATORS: CAUSATION AND CONFOUNDING Statistical educators are encouraged to deal with causation and confounding in intro stats. • Moore (1997, 1998) identified "Beware the lurking variable" as a key component of statistical literacy. • Tintle et al (2014) identified “confounding and variation" as "two substantial hindrances to drawing conclusions from data". • Kaplan (2010). "Saying 'Correlation is not causation' is like saying 'Sex is not love'." Statistical educators seem unwilling to deal with causation in the introductory course. • Pearl (2018) has written a most comprehensive critique of why statisticians are unwilling to deal with causation. Pearl's book, The Book of Why: The New Science of Cause and Effect, is arguably the most important book on causal statistics since Cornfield debated Fisher on whether smoking caused lung cancer. Statistical educators seem unwilling to deal with confounding in the introductory course. • Confounding was not listed in McKenzie's (2004) top-30 topics in intro statistics. • Most statistical educators use confounding to show that association is not causation, but never mention the topic thereafter: a classic case of "bait and switch"! • Confounding is not mentioned in the index of many (most?) intro textbooks. (Schield, 2018). This finding was based on a convenience sample of 80 statistics textbooks. Of these 65% (52/80) contained no index reference to confounding or lurking variable, 25% (20/80) contained a few index entries and 10% (8/80) contained substantial content. • Statistical educators disregarded Moore's call (1997) for two separate introductory statistics courses: statistical competence (a first course for those who must deal with data in their work); statistical literacy (what every educated person should know). So why are statistical educators unwilling to discuss confounding in the intro course? • Content. Moore (1997) wondered if a statistical literacy course was really "statistics". • Mathematical allegiance. Causation is not a mathematical idea or operation. Avoiding causation and confounding maintains a closer allegiance to mathematics. Schield (2013) • Statistical allegiance. Statistical educators may share Fisher's total disregard for observational studies. Freedman et al (1997) quoted Fisher's comment on observational studies, "That's not an experiment you have there, that's an experience." • Concern. “When students see how easily statistics are influenced by other factors, this may bring our discipline into disrepute.” Anonymous statistical educator. • Ignorance. Most statistical educators never heard of Cornfield's minimum effect size needed to ward off nullification an observed association. This paper addresses the last two explanations. Concern and ignorance. - 2 - ICOTS10 (2018) Invited Paper Schield BACK TO THE FUTURE: BATTLE OF THE TITANS There is a similarity between statistical education today and the parents in the 80s hit movie: Back to the Future. Wikipedia (2018b). In that movie, Marty's dad and mom are not communicating. His parents had great plans in the past, but now they've stopped listening. Similarly, statistical educators have stopped listening to their students. They didn't listen when students said they saw less value in statistics after taking the course than they did before. To change the future, we need to go back to 1959 and see Jerome Cornfield arguing that smoking caused cancer – based on data from observational studies. We need to see Sir Ronald Fisher arguing that the observationally-based association between smoking and cancer could be due to genetics: a confounder. Yes, Fisher was a smoker, but he produced data from a German twins study showing an association between the type of twinship (identical vs. fraternal) and smoking preference. We need to see how Cornfield derived the minimum effect size needed for a confounder to nullify or reverse an observed association. See Schield (1999). Cornfield noted that smokers were at least 10 times as likely to develop lung cancer as non- smokers. Fisher's association had a relative risk of only three. Cornfield proved that Fisher's confounder was too small to nullify the observed association. Yes, there is no test for an unobserved confounder, but there is a necessary condition for an unobserved confounder to nullify an observed association. See Schield (1999), Gastwirth et al. (2000) and Cornfield (2018). Cornfield's
Recommended publications
  • No-Decision Classification: an Alternative to Testing for Statistical
    The Journal of Socio-Economics 33 (2004) 631–650 No-decision classification: an alternative to testing for statistical significance Nathan Berga,b,∗ a School of Social Sciences, University of Texas at Dallas, GR 31 211300, Box 830688, Richardson, TX 75083-0688, USA b Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany Abstract This paper proposes a new statistical technique for deciding which of two theories is better supported by a given set of data while allowing for the possibility of drawing no conclusion at all. Procedurally similar to the classical hypothesis test, the proposed technique features three, as opposed to two, mutually exclusive data classifications: reject the null, reject the alternative, and no decision. Referred to as No-decision classification (NDC), this technique requires users to supply a simple null and a simple alternative hypothesis based on judgments concerning the smallest difference that can be regarded as an economically substantive departure from the null. In contrast to the classical hypothesis test, NDC allows users to control both Type I and Type II errors by specifying desired probabilities for each. Thus, NDC integrates judgments about the economic significance of estimated magnitudes and the shape of the loss function into a familiar procedural form. © 2004 Elsevier Inc. All rights reserved. JEL classification: C12; C44; B40; A14 Keywords: Significance; Statistical significance; Economic significance; Hypothesis test; Critical region; Type II; Power ∗ Corresponding author. Tel.: +1 972 883 2088; fax +1 972 883 2735. E-mail address: [email protected]. 1053-5357/$ – see front matter © 2004 Elsevier Inc. All rights reserved.
    [Show full text]
  • Comparison of Machine Learning Techniques When Estimating Probability of Impairment
    Comparison of Machine Learning Techniques when Estimating Probability of Impairment Estimating Probability of Impairment through Identification of Defaulting Customers one year Ahead of Time Authors: Supervisors: Alexander Eriksson Prof. Oleg Seleznjev Jacob Långström Xun Su June 13, 2019 Student Master thesis, 30 hp Degree Project in Industrial Engineering and Management Spring 2019 Abstract Probability of Impairment, or Probability of Default, is the ratio of how many customers within a segment are expected to not fulfil their debt obligations and instead go into Default. This isakey metric within banking to estimate the level of credit risk, where the current standard is to estimate Probability of Impairment using Linear Regression. In this paper we show how this metric instead can be estimated through a classification approach with machine learning. By using models trained to find which specific customers will go into Default within the upcoming year, based onNeural Networks and Gradient Boosting, the Probability of Impairment is shown to be more accurately estimated than when using Linear Regression. Additionally, these models provide numerous real-life implementations internally within the banking sector. The new features of importance we found can be used to strengthen the models currently in use, and the ability to identify customers about to go into Default let banks take necessary actions ahead of time to cover otherwise unexpected risks. Key Words Classification, Imbalanced Data, Machine Learning, Probability of Impairment, Risk Management Sammanfattning Titeln på denna rapport är En jämförelse av maskininlärningstekniker för uppskattning av Probability of Impairment. Uppskattningen av Probability of Impairment sker genom identifikation av låntagare som inte kommer fullfölja sina återbetalningsskyldigheter inom ett år.
    [Show full text]
  • Removing Hidden Confounding by Experimental Grounding
    Removing Hidden Confounding by Experimental Grounding Nathan Kallus Aahlad Manas Puli Uri Shalit Cornell University and Cornell Tech New York University Technion New York, NY New York, NY Haifa, Israel [email protected] [email protected] [email protected] Abstract Observational data is increasingly used as a means for making individual-level causal predictions and intervention recommendations. The foremost challenge of causal inference from observational data is hidden confounding, whose presence cannot be tested in data and can invalidate any causal conclusion. Experimental data does not suffer from confounding but is usually limited in both scope and scale. We introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data. Our method makes strictly weaker assumptions than existing approaches, and we prove conditions under which it yields a consistent estimator. We demonstrate our method’s efficacy using real-world data from a large educational experiment. 1 Introduction In domains such as healthcare, education, and marketing there is growing interest in using observa- tional data to draw causal conclusions about individual-level effects; for example, using electronic healthcare records to determine which patients should get what treatments, using school records to optimize educational policy interventions, or using past advertising campaign data to refine targeting and maximize lift. Observational datasets, due to their often very large number of samples and exhaustive scope (many measured covariates) in comparison to experimental datasets, offer a unique opportunity to uncover fine-grained effects that may apply to many target populations.
    [Show full text]
  • Confounding Bias, Part I
    ERIC NOTEBOOK SERIES Second Edition Confounding Bias, Part I Confounding is one type of Second Edition Authors: Confounding is also a form a systematic error that can occur in bias. Confounding is a bias because it Lorraine K. Alexander, DrPH epidemiologic studies. Other types of can result in a distortion in the systematic error such as information measure of association between an Brettania Lopes, MPH bias or selection bias are discussed exposure and health outcome. in other ERIC notebook issues. Kristen Ricchetti-Masterson, MSPH Confounding may be present in any Confounding is an important concept Karin B. Yeatts, PhD, MS study design (i.e., cohort, case-control, in epidemiology, because, if present, observational, ecological), primarily it can cause an over- or under- because it's not a result of the study estimate of the observed association design. However, of all study designs, between exposure and health ecological studies are the most outcome. The distortion introduced susceptible to confounding, because it by a confounding factor can be large, is more difficult to control for and it can even change the apparent confounders at the aggregate level of direction of an effect. However, data. In all other cases, as long as unlike selection and information there are available data on potential bias, it can be adjusted for in the confounders, they can be adjusted for analysis. during analysis. What is confounding? Confounding should be of concern Confounding is the distortion of the under the following conditions: association between an exposure 1. Evaluating an exposure-health and health outcome by an outcome association.
    [Show full text]
  • Binary Classification Models with “Uncertain” Predictions
    Binary classification models with “Uncertain” predictions Damjan Krstajic1§, Ljubomir Buturovic2, Simon Thomas3, David E Leahy4 1Research Centre for Cheminformatics, Jasenova 7, 11030 Beograd, Serbia 2Clinical Persona, 932 Mouton Circle, East Palo Alto, CA 94303, USA 3Cyprotex Discovery Ltd, No. 24 Mereside, Alderley Park, Macclesfield SK10 4TG, UK 4Discovery Bus Ltd, The Oakridge Centre, Gibhill Farm, Shrigley Rd, Macclesfield, SK10 5SE, UK §Corresponding author Email addresses: DK: [email protected] LB: [email protected] ST: [email protected] DEL: [email protected] - 1 - Abstract Binary classification models which can assign probabilities to categories such as “the tissue is 75% likely to be tumorous” or “the chemical is 25% likely to be toxic” are well understood statistically, but their utility as an input to decision making is less well explored. We argue that users need to know which is the most probable outcome, how likely that is to be true and, in addition, whether the model is capable enough to provide an answer. It is the last case, where the potential outcomes of the model explicitly include ‘don’t know’ that is addressed in this paper. Including this outcome would better separate those predictions that can lead directly to a decision from those where more data is needed. Where models produce an “Uncertain” answer similar to a human reply of “don’t know” or “50:50” in the examples we refer to earlier, this would translate to actions such as “operate on tumour” or “remove compound from use” where the models give a “more true than not” answer.
    [Show full text]
  • Social Dating: Matching and Clustering
    Social Dating: Matching and Clustering SangHyeon (Alex) Ahn Jin Yong Shin 3900 North Charles St 3900 North Charles St Baltimore, MD 21218, USA Baltimore, MD 21218, USA [email protected] [email protected] Abstract 1 Introduction Purpose of the paper was to apply machine learn- Social dating is a stage of interpersonal rela- ing algorithms for Social Dating problems and build tionship between two individuals for with the tools to be used for smarter decision making in aim of each assessing the other’s suitability as matching process. We used the data from Speed Dat- a partner in a more committed intimate rela- ing Experiment (of 21 waves) which includes each tionship or marriage. Today, many individ- candidate’s demographic attributes and preferences. uals spend a lot of money, time, and effort We have selected relevant features to provide the for the search of their true partners. Some most informative data and used SQL to inner join reasons for the inefficiency in seeking sexual partners include limited pool of candidates, missing feature variables. lack of transparency in uncovering personal- We implemented Machine Learning algorithms ities, and the nature of time consumption in (without the use of external libraries) to perform building relationships. Finding the right part- (1) binary classifications to predict a date match ner in a machine driven automated process can between a candidate and a potential partner, and be beneficial towards increasing efficiency in (2) clustering analysis on candidates to use narrow the following aspects: reduced time and ef- down a pool of candidates by demographic traits fort, a larger pool, and a level of quantita- and partner preferences.
    [Show full text]
  • How Will AI Shape the Future of Law?
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Helsingin yliopiston digitaalinen arkisto How Will AI Shape the Future of Law? EDITED BY RIIKKA KOULU & LAURA KONTIAINEN 2019 Acknowledgements The editors and the University of Helsinki Legal Tech Lab would like to thank the authors and interviewees for the time and effort they put into their papers. We would also like to thank the Faculty of Law at University of Helsinki and particularly the continuous kind support of Professor Kimmo Nuotio, the former dean of the faculty and the Lab’s unofficial godfather, for facilitating the Lab’s development and for helping us make the second Legal Tech Conference happen. In addition, we would like to express our gratitude to the conference sponsors, Finnish Bar Association, the Association of Finnish Lawyers, Edita Publishing, Attorneys at Law Fondia and Attorneys at Law Roschier as well as the Legal Design Summit community for their support. It takes a village to raise a conference. Therefore, we would like to thank everyone whose time and commitment has turned the conference and this publication from an idea into reality. Thank you to the attendees, speakers, volunteers and Legal Tech Lab crew members. RIIKKA KOULU & LAURA KONTIAINEN Legal Tech Lab, University of Helsinki 2019 University of Helsinki Legal Tech Lab publications © Authors and Legal Tech Lab ISBN 978-951-51-5476-7 (print) ISBN 978-951-51-5477-4 (PDF) Print Veiters Helsinki 2019 Contents Foreword 009 KIMMO NUOTIO I Digital Transformation of
    [Show full text]
  • Gaussian Process Classification Using Privileged Noise
    Mind the Nuisance: Gaussian Process Classification using Privileged Noise Daniel Hernandez-Lobato´ Viktoriia Sharmanska Universidad Autonoma´ de Madrid IST Austria Madrid, Spain Klosterneuburg, Austria [email protected] [email protected] Kristian Kersting Christoph H. Lampert TU Dortmund IST Austria Dortmund, Germany Klosterneuburg, Austria [email protected] [email protected] Novi Quadrianto SMiLe CLiNiC, University of Sussex Brighton, United Kingdom [email protected] Abstract The learning with privileged information setting has recently attracted a lot of at- tention within the machine learning community, as it allows the integration of ad- ditional knowledge into the training process of a classifier, even when this comes in the form of a data modality that is not available at test time. Here, we show that privileged information can naturally be treated as noise in the latent function of a Gaussian process classifier (GPC). That is, in contrast to the standard GPC setting, the latent function is not just a nuisance but a feature: it becomes a natural measure of confidence about the training data by modulating the slope of the GPC probit likelihood function. Extensive experiments on public datasets show that the proposed GPC method using privileged noise, called GPC+, improves over a stan- dard GPC without privileged knowledge, and also over the current state-of-the-art SVM-based method, SVM+. Moreover, we show that advanced neural networks and deep learning methods can be compressed as privileged information. 1 Introduction Prior knowledge is a crucial component of any learning system as without a form of prior knowl- edge learning is provably impossible [1].
    [Show full text]
  • Binary Classification Approach to Ordinal Regression
    Binary Classification Approach to Ordinal Regression Jianhao Peng [email protected] Abstract In the lecture we mainly focus on questions about binary classification or regres- sion. However in many cases we may want the results to show preference in terms of several discrete, but ordered, values, like ratings of a restaurant, disease condition and students' letter grades. Such problems are usually known as ordinal regression or ordinal classification. Standard approaches to ordinal regression in statistics require a assumption on the distribution of a latent variable while many other learning methods treat this as a multiclass problem, which may lose the order information in the data. In this paper I first present the probability model of ordinal regression in traditional statistics, then briefly go through different types of loss function defined from two per- spectives in order to reduce the problem to simple binary case. From the loss function we can also get performance bounds similar to those we learnt in binary classification. 1 Introduction Compared to the standard machine learning tasks of classification or regression, there exist many other systems in which users specify preferences by giving each sample one of several discrete values, e.g. one through five stars in movie or restaurant rating and A through F in letter grades. Those problems are often considered as multiclass classification, which treated the labels as a finite unordered nominal set. However in that case the natural order structure of the data is ignored and may not be as accurate. Therefore one nature question arises: can we make use of the lost ordinal information to potentially improve the predictive performance of the classifier or simplify the learning algorithm? Such question has been studied in statistics for decades.
    [Show full text]
  • Beyond Controlling for Confounding: Design Strategies to Avoid Selection Bias and Improve Efficiency in Observational Studies
    September 27, 2018 Beyond Controlling for Confounding: Design Strategies to Avoid Selection Bias and Improve Efficiency in Observational Studies A Case Study of Screening Colonoscopy Our Team Xabier Garcia de Albeniz Martinez +34 93.3624732 [email protected] Bradley Layton 919.541.8885 [email protected] 2 Poll Questions Poll question 1 Ice breaker: Which of the following study designs is best to evaluate the causal effect of a medical intervention? Cross-sectional study Case series Case-control study Prospective cohort study Randomized, controlled clinical trial 3 We All Trust RCTs… Why? • Obvious reasons – No confusion (i.e., exchangeability) • Not so obvious reasons – Exposure represented at all levels of potential confounders (i.e., positivity) – Therapeutic intervention is well defined (i.e., consistency) – … And, because of the alignment of eligibility, exposure assignment and the start of follow-up (we’ll soon see why is this important) ? RCT = randomized clinical trial. 4 Poll Questions We just used the C-word: “causal” effect Poll question 2 Which of the following is true? In pharmacoepidemiology, we work to ensure that drugs are effective and safe for the population In pharmacoepidemiology, we want to know if a drug causes an undesired toxicity Causal inference from observational data can be questionable, but being explicit about the causal goal and the validity conditions help inform a scientific discussion All of the above 5 In Pharmacoepidemiology, We Try to Infer Causes • These are good times to be an epidemiologist
    [Show full text]
  • Analysis of Confirmed Cases of COVID-19
    sustainability Article Investigating a Serious Challenge in the Sustainable Development Process: Analysis of Confirmed cases of COVID-19 (New Type of Coronavirus) Through a Binary Classification Using Artificial Intelligence and Regression Analysis Behrouz Pirouz 1 , Sina Shaffiee Haghshenas 2,* , Sami Shaffiee Haghshenas 2 and Patrizia Piro 2 1 Department of Mechanical, Energy and Management Engineering, University of Calabria, 87036 Rende, Italy; [email protected] 2 Department of Civil Engineering, University of Calabria, 87036 Rende, Italy; Sami.shaffi[email protected] (S.S.H.); [email protected] (P.P.) * Correspondence: S.shaffi[email protected]; Tel.: +39-0984-496-542 Received: 5 March 2020; Accepted: 16 March 2020; Published: 20 March 2020 Abstract: Nowadays, sustainable development is considered a key concept and solution in creating a promising and prosperous future for human societies. Nevertheless, there are some predicted and unpredicted problems that epidemic diseases are real and complex problems. Hence, in this research work, a serious challenge in the sustainable development process was investigated using the classification of confirmed cases of COVID-19 (new version of Coronavirus) as one of the epidemic diseases. Hence, binary classification modeling was used by the group method of data handling (GMDH) type of neural network as one of the artificial intelligence methods. For this purpose, the Hubei province in China was selected as a case study to construct the proposed model, and some important factors, namely maximum, minimum, and average daily temperature, the density of a city, relative humidity, and wind speed, were considered as the input dataset, and the number of confirmed cases was selected as the output dataset for 30 days.
    [Show full text]
  • Study Designs in Biomedical Research
    STUDY DESIGNS IN BIOMEDICAL RESEARCH INTRODUCTION TO CLINICAL TRIALS SOME TERMINOLOGIES Research Designs: Methods for data collection Clinical Studies: Class of all scientific approaches to evaluate Disease Prevention, Diagnostics, and Treatments. Clinical Trials: Subset of clinical studies that evaluates Investigational Drugs; they are in prospective/longitudinal form (the basic nature of trials is prospective). TYPICAL CLINICAL TRIAL Study Initiation Study Termination No subjects enrolled after π1 0 π1 π2 Enrollment Period, e.g. Follow-up Period, e.g. three (3) years two (2) years OPERATION: Patients come sequentially; each is enrolled & randomized to receive one of two or several treatments, and followed for varying amount of time- between π1 & π2 In clinical trials, investigators apply an “intervention” and observe the effect on outcomes. The major advantage is the ability to demonstrate causality; in particular: (1) random assigning subjects to intervention helps to reduce or eliminate the influence of confounders, and (2) blinding its administration helps to reduce or eliminate the effect of biases from ascertainment of the outcome. Clinical Trials form a subset of cohort studies but not all cohort studies are clinical trials because not every research question is amenable to the clinical trial design. For example: (1) By ethical reasons, we cannot assign subjects to smoking in a trial in order to learn about its harmful effects, or (2) It is not feasible to study whether drug treatment of high LDL-cholesterol in children will prevent heart attacks many decades later. In addition, clinical trials are generally expensive, time consuming, address narrow clinical questions, and sometimes expose participants to potential harm.
    [Show full text]