
Why are we here? Intro to study design Intro to biostatistics

Research objective, study design and statistical approach

Assoc. Prof. Cameron Hurst

QIMR Berghofer Institute, Queensland, Australia

2nd November, 2020

Good scientific research

Before I go further, I want to talk about what represents strong research (i.e. a manuscript likely to be accepted in a high-impact journal). Strong research is:
1. Scientifically significant: It addresses an important research question that clearly needs answering
2. Of high scientific quality: We believe the findings presented. In other words, the paper provides strong quality of evidence
As a methodologist, it is this second issue that interests me. It’s also the only one we have any real power over, anyway. Also, to the expert, the first issue, scientific importance (and novelty), should be obvious.

Scientific significance vs scientific quality

To get the point across about scientific significance and scientific quality, I would like to give you an example of two different studies.
Study 1: Prevalence of blood sugar control in rural Type 2 diabetes (T2D) outpatients: A retrospective cross-sectional study
Single-center (a hospital) study involving 120 patients
Recorded glycated hemoglobin (HbA1c) along with some patient socio-demographic factors from medical records
In our sample, approx. 34% of patients had HbA1c ≤ 7%
Note: Around the world (other studies), the prevalence of blood sugar control in Type 2 diabetes patients ranges from about 30% to 40%
1. Important/novel?? ...
2. Strength of evidence??
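One way to make the "strength of evidence" question for Study 1 concrete is to look at how precisely a single-center sample of 120 patients pins down the prevalence. The sketch below computes a Wilson score interval using only the Python standard library. The count of 41 controlled patients is an assumption chosen to match the rounded "approx. 34%" on the slide; the slide does not report the actual count.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a single proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumption: 41 of 120 patients had HbA1c <= 7% (about 34%).
lower, upper = wilson_interval(41, 120)
print(f"Prevalence 95% CI: {lower:.1%} to {upper:.1%}")
```

With a sample this small, the interval is wide enough to cover the whole 30% to 40% range reported by other studies, which is one way of seeing why the study adds little new evidence.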

Scientific significance vs scientific quality

Study 2: Impact of a ‘patient empowerment program’ on blood sugar control in T2D patients: A cluster randomized implementation trial
A study involving 25 diabetes clinics from across the country
Implementation trial of a ‘patient empowerment’ intervention that seeks to enhance patient self-management through improving patient diabetes management self-efficacy
Randomized implementation sequence for centers, using a stepped-wedge design, for this prospectively powered study
Results demonstrated that blood sugar control went from 35% (pre-intervention) to 53% post-intervention (diff = 18%, 95% CI: 13.8%, 22.2%, p < 0.0001)
Recommend that all diabetes clinics (nationally) adopt this program for their outpatients.
1. Important/novel?? ...
2. Strength of evidence??

Scientific quality: The quality of evidence

Now let’s think about the quality of evidence. Can we believe:
The results are generalizable: Does the sample represent a sufficiently representative and broad subgroup of the target population that the findings are important for others?
The results are reproducible: Is the sample sufficiently large and diverse that we would see similar results if the study were repeated?
The results are unbiased: Was sufficient care taken in the study design to minimize bias? If it was experimental, was there randomization and blinding? Were there problems with patient compliance? If there is bias (e.g. confounding), were appropriate statistical models employed that may control this bias?

The idea of Quality of Evidence

As an epidemiologist, I like to think of quality of evidence issues in the same way as we think about risk factors for a disease. Some risk factors are non-modifiable (there is nothing we can do about them), and some risk factors are modifiable. For example:
Non-modifiable risk factors include things like age, gender and ethnicity; and
Modifiable risk factors include factors like smoking, BMI, dietary salt intake, etc.
In a similar way, it is useful to think about quality of evidence issues as either non-modifiable (forgivable) or modifiable (unforgivable). Design issues that are forgivable should be acknowledged (limitations section), and those that are unforgivable should be designed/modeled out.

Enter “Clinical Epidemiology and Biostatistics”

So, why am I here today??? Am I here to simply inflict another painful biostatistics lecture on you? Answer: No! I promise I won’t mention even one formula!

INSTEAD, I would like to emphasize the importance of clinical epidemiology and biostatistics in the sciences. I would like to show you how they can help you move from mediocre research to producing (conceiving, designing, conducting and disseminating) strong clinical research.

Epidemiology Vs Biostatistics

The disciplines of Epidemiology and Biostatistics are inextricably related.
They both relate to the ‘research’ end of the health discipline
... as such, a basis in epi and biostats is essential for health and (bio)medical researchers
Definition: Epidemiology is the study of the distribution and determinants of disease, health, or injury outcomes in human populations, and the use of the knowledge we gain (from this study) to control these health problems
Clinical epidemiology focuses on those who are already sick (i.e. clinical populations)
... which can make it considerably easier to collect representative samples

Another definition of Clinical epidemiology (in practice)

In practice, clinical epidemiology is about the generic research methods (regardless of clinical sub-discipline) we use to produce strong research that, we hope, will (eventually) make its way into clinical practice or policy (and improve patient outcomes): evidence-based medicine.

So all we need to do now is produce strong research.....EASY?

But how do we do this? So (again), let’s reiterate what is meant by strong research.

The ‘strong’ study

So a reminder ... A strong study is one that:
1. is contextually (i.e. clinically) significant: That is, it needs to address an important research question (one that fills a gap in our knowledge). If a paper does not address an interesting issue, readers (and therefore a good journal) are unlikely to be interested in it.
2. has high scientific quality: It is a well-conceived study, with an effective study design, an appropriate analysis, AND all of this is then packaged into a well-written paper.

Yet another definition of Clinical Epidemiology

With this in mind, I would like to put forward another PRACTICAL definition of Clinical epidemiology.
Clinical epidemiology ⇒ Generic clinical research skills
Clinical epidemiology is about the processes and methods underpinning clinical research practice and has four main components:
1. Problem formulation (relates to the clinical science)
2. Design (epidemiology)
3. Methods (biostatistics)
4. Dissemination (academic research communication skills)

This is to be distinguished from the clinical sciences (e.g. nephrology, ophthalmology, etc.), which are areas of clinical expertise.

A bit of Epi101: Study design

We will start by examining some of the (classical) health and medical study designs
The main factors that govern our choice of study design are:
1. the research question,
2. the target population (and our ability to sample it) and (correspondingly)
3. our ability to control sources of bias
Fortunately (for you), most clinical studies (especially those involving a ‘treatment’) tend to be at the stronger end of health study designs (more on this later)
Study designs in clinical studies: Design vs analytical approach
Some clinical studies (especially clinical trials) try to control sources of bias through design, allowing the use of simple statistical analysis. Where we can’t do this, we need to use more sophisticated statistical approaches

Health and medical study designs (epi101)

Study design:
Randomized controlled trials
Non-randomized trials
Cohort
Cross-sectional
Case-control
Ecological
Case series
Case study

Randomized controlled trials

Widely considered to be the gold standard of study designs
Provide the strongest evidence
Double blinding and randomization attempt to minimize selection and confounding bias
As a true experiment, RCTs always involve an intervention (or treatment) that we (as the researcher) impose on the patients
That is, experimental manipulation by the researcher
Often not possible in epidemiological studies, especially those involving the study of risk factors (unethical to impose a protocol of risky behaviour)
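To make the randomization step concrete, here is a minimal sketch of one common allocation scheme, permuted-block randomization. The slides do not say which scheme is meant, so the choice of scheme, the block size of 4, the arm labels and the seed are all assumptions made for illustration only.

```python
import random


def block_randomize(n_participants, block_size=4,
                    arms=("treatment", "control"), seed=2020):
    """Permuted-block allocation: every block contains each arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the sequence can be reproduced
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]


# Hypothetical trial with 20 participants.
allocation = block_randomize(20)
print(allocation)
```

Because allocation is decided by the random sequence rather than by the clinician or the patient, known and unknown patient characteristics should, on average, be balanced across arms. Blocking simply keeps the arm sizes equal as recruitment proceeds.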

Non-randomized trials

Many studies don’t (or can’t) include a randomization process, but still involve an intervention
For example, it may be unethical or impractical to randomize patients, or it might be impossible to avoid contamination between different experimental arms
An example might be comparing the different treatments used at two different hospitals
This is called a ‘non-equivalent group design’
Patients are not randomized to the hospitals and may differ in many ways (e.g. demographics, health coverage, exposure, etc.)
What problems might this cause???
Important to realize that, like RCTs, NRTs are prospective.

Cohort studies

Cohort studies are considered the strongest (in terms of evidence) of the observational studies.
Observational designs: Researcher has no control over group membership (‘outsider’ who can only observe what happens)
Their main strength (above something like a cross-sectional study) is that the risk factors (exposures) are collected before the outcome (e.g. disease)
Cohort studies can be retrospective: data already exist and were NOT collected for the purposes of the study (i.e. a secondary analysis or analysis of routinely collected data); or
Prospective: data collected specifically for answering the research question (i.e. the study generates the data)
Also, cohort studies can involve a single collection of the endpoint (e.g. disease); or
They can be longitudinal: where patients are repeatedly measured over time

Longitudinal cohort studies

Generally the strongest of all ‘observational’ designs
Repeated observation of participants over time
A common problem is loss to follow-up (LTFU)
LTFU is especially a problem when it is confounded with effects/outcomes of interest: LTFU does not occur at random
Like other cohort studies, longitudinal cohorts can be collected retrospectively or prospectively (data collected for study)
Reality check: Urban myths
Some reviewers can be overly critical of retrospective studies. Their argument is that the data were NOT collected to answer the question, so why should we believe them (e.g. sample size/power is an issue)? The logic is that a well-planned prospective study can be trusted. BUT in reality most ‘planned’ cohort studies are not that much better (prospectivity often makes little difference to data quality). The main advantage is that we can measure new variables that are not routinely measured

Cross-sectional studies

Often called ‘prevalence’ studies, as we can get an estimate of disease prevalence from our sample (or subgroups of the sample)
In contrast, see case-control studies below
A common study design in population-based epidemiology, as they are generally cheaper and more practical (project timeline) than longitudinal cohort studies
Main problem is that, without strong contextual evidence, associations cannot be assumed to be causal (only associative)
Also rely strongly on a ‘representative’ sample of the target population (something harder to obtain than you might think)

Case-control studies

Case-control studies are usually performed where there is a scarcity of participants with the ‘condition’ of interest (i.e. low prevalence)
Idea is to collect a group of cases (those with the outcome of interest) and a group of controls: participants without the ‘condition’ of interest who are otherwise as similar as possible to the cases
In this respect, the relative balance of cases to controls is an artefact of study design (so in no way reflects prevalence)
Then (along with measuring observable traits) we ask participants about exposure history
For this reason, case-control studies are prone to recall bias
There is also a variant design, the matched case-control study, which involves balancing cases and controls within strata (e.g. age-sex). Generally not used that much anymore (better approaches available)
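The point that the case-to-control balance is a design artefact can be illustrated with the odds ratio, the usual effect measure in case-control studies. The sketch below uses invented counts (not from any real study) and a simple log-scale (Woolf) confidence interval. It shows that doubling the number of controls changes the fraction of cases in the sample but leaves the odds ratio unchanged.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp,
               confidence=0.95):
    """Odds ratio from a 2x2 table with a Woolf (log-scale) interval."""
    cells = (cases_exp, cases_unexp, controls_exp, controls_unexp)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    estimate = (cases_exp * controls_unexp) / (cases_unexp * controls_exp)
    se_log = sqrt(sum(1 / c for c in cells))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (estimate,
            exp(log(estimate) - z * se_log),
            exp(log(estimate) + z * se_log))


# Hypothetical counts: 100 cases (40 exposed) and 100 controls (20 exposed).
or_1, low_1, high_1 = odds_ratio(40, 60, 20, 80)
# Same exposure pattern, but twice as many controls recruited.
or_2, low_2, high_2 = odds_ratio(40, 60, 40, 160)

print(f"1:1 design  OR = {or_1:.2f} ({low_1:.2f}, {high_1:.2f})")
print(f"1:2 design  OR = {or_2:.2f} ({low_2:.2f}, {high_2:.2f})")
```

In the first design half of the sample are cases and in the second one third are, yet neither figure says anything about prevalence in the population. Recruiting more controls only narrows the interval.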

Ecological studies

Finally we come to the last of the ‘primary data-based’ study designs: Ecological studies
Typically the outcomes from such studies represent aggregate counts (or rates) from different times (e.g. year) or spaces (e.g. countries or localities)
Often data are routinely collected (e.g. hospital or government data)
As the outcomes are often counts (or rates), Poisson regression (or related methods) is often used in these studies
Probably the study design most prone to confounding, as there are always multiple sources of variation between ‘observations’ (→ ecological fallacy)
Common in population studies of vector-borne and parasitic diseases (e.g. malaria and dengue fever)

Impact of study design

Study design is a major determinant of scientific quality. The study design can also have a major impact on analytical planning (how we plan to analyze the data)
I would like to clarify what I mean by analytical planning. I mean:
What is the likely distribution of our outcome(s) and predictor(s)?
What is the best way (and what are the practical issues) for sampling (e.g. random sample, stratified samples, convenience samples)?
What are the likely sources of (uncontrolled) bias?
Based on our study design, what statistical method/modeling approach should we use?
Prospective powering: If meaningful, what sample size is appropriate?

Study design and strength of evidence
Study design + analytical planning = Scientific quality

Biostatistics

Statistics is the science (and art) of dealing with (analyzing or modelling) variation in data to obtain reliable results and conclusions

Biostatistics is the application of statistical methods to the health sciences: biomedical sciences, public health, allied health, medicine, etc.

Biostatistics in the public health and medical contexts is usually (but not always) the application of statistical models to community-based and clinical samples

Oh, why do we have to learn biostatistics?

Biostatistics represents the scientific framework underpinning most health research
An understanding of biostatistics is essential for both conducting and disseminating health research

Warning: Why do grants, protocols, manuscripts and theses fail?
Many unsuccessful grants submitted to funding bodies, unsuccessful protocols submitted to ECs/IRBs, and unsuccessful manuscripts/theses sent to reviewers/examiners fail in their scientific quality (i.e. poor research design and/or analytical planning), NOT in their contextual significance (importance of the research question)

Study objectives and analytical approach

There are a number of reasons (objectives) why a researcher might conduct a project
In my experience, a large majority of studies fall under three main categories:
1. “Predictive” modelling: Performing analysis for predictive purposes. For example: Diagnostic testing or prognostic scoring models
2. Hypothesis testing: Answering a previously posed (and very specific) research question. Includes most clinical trials, and epidemiological studies with a particular study effect
3. “Exploratory” modelling: Trying to identify potential risk/protective factors (e.g. where disease or other health outcome epidemiology is not yet well understood)
This ‘type’ of study has a profound effect on the design and analytical approaches we have to consider

Predictive modelling

Emphasis is on the individual (e.g. patient) rather than the population
Much more common in clinical settings than in population studies, particularly:
1. Diagnostic tests
2. Prognostic models and predicting survival
Predictive modelling relies on very strong relationships
Need high R² for continuous outcome models and high classification accuracy for binary outcome models (i.e. high sensitivity and/or specificity, high AUC in ROC curve).

Take home point: Predictive models
Fit to the data (not statistical significance) is perhaps the most important factor in selecting the ‘best’ model in predictive studies, although the model still needs to be generalizable
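The classification measures named above (sensitivity, specificity and AUC) can be computed by hand for a small example. This is a sketch on made-up data: the ten outcomes, the predicted risks and the 0.5 cut-off are all hypothetical. The AUC is calculated as the proportion of case/non-case pairs in which the case receives the higher score, with ties counted as half.

```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = has the condition, scores = model-predicted risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_score, threshold=0.5)
area = auc(y_true, y_score)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.3f}")
```

Note that these are apparent (in-sample) measures. Checking generalizability, as the take home point requires, would mean evaluating the same measures on data not used to build the model.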

Hypothesis testing: Addressing a specific research question

Studies that consider a specific research question typically address a single (primary) hypothesis
As such (and if sensible) they should be prospectively and formally powered (i.e. formal sample size calculation)
Examples of these types of studies include RCTs and population studies considering a particular and well-understood risk factor
In these studies we typically have a single predictor (study effect) of interest (although other covariates can also be included in the models)
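As an illustration of what a formal sample size calculation looks like, the sketch below applies the standard normal-approximation formula for comparing two independent proportions. It borrows the 35% and 53% control rates from Study 2 purely as example inputs and assumes a simple two-arm, individually randomized comparison with two-sided alpha of 0.05 and 80% power. These are assumptions for illustration: a cluster or stepped-wedge trial like Study 2 would need additional adjustment for clustering, which is not shown here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for comparing two independent proportions.

    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Example inputs only: 35% control rate without, 53% with the intervention.
n = n_per_group(0.35, 0.53)
print(f"Participants needed per group: {n}")
```

The calculation needs a single, pre-specified effect size, which is why this kind of formal powering suits studies with one primary hypothesis.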

Exploratory studies

Exploratory studies are usually done when we don’t have a strong (scientific) understanding of the system (e.g. disease outcome or population) under consideration
For this reason, these studies typically involve a large number of covariates which may (or may not) represent potential risk factors, confounders and/or effect modifiers
Given limited knowledge (and lack of a specific research hypothesis), we must be less formal in our sample size calculation for such studies. For example, rules of thumb such as:
Continuous outcomes: N = 60 + 5k, where k is the number of covariates under consideration
Binary outcomes: 10 events (i.e. ‘cases’) per covariate rule (aka the 10EPV rule)
It is even possible to use simulation methods to ‘guesstimate’ sample size
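The two rules of thumb above are simple enough to write down as functions. In this sketch, the number of covariates (8) and the expected event rate (25%) are hypothetical inputs. Converting the 10EPV rule into a total sample size by dividing by the expected event rate is an added step that the slide does not spell out.

```python
from math import ceil


def n_continuous(k):
    """Rule of thumb for continuous outcomes: N = 60 + 5k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 60 + 5 * k


def n_binary(k, expected_event_rate, events_per_variable=10):
    """Total N so the expected number of events is at least EPV * k."""
    if not 0 < expected_event_rate <= 1:
        raise ValueError("expected_event_rate must be in (0, 1]")
    events_needed = events_per_variable * k
    return ceil(events_needed / expected_event_rate)


# Hypothetical exploratory study with 8 candidate covariates.
k = 8
n_cont = n_continuous(k)
# Assumption: about 25% of participants are expected to have the event.
n_bin = n_binary(k, expected_event_rate=0.25)
print(f"Continuous outcome: N = {n_cont}")
print(f"Binary outcome: {10 * k} events, so N = {n_bin}")
```

For rare outcomes the binary rule drives the sample size up quickly, since it is the number of events, not the number of participants, that limits how many covariates a model can support.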

Take home message: Talk to the biostatistician EARLY!!!

To call the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.
Ronald A. Fisher (Grand-daddy of statistics)

Thank-you!!!!!! QUESTIONS???
