Basic Biostatistics Part 3
31st January, 2018
Content
• Part 2 Summary
• Frequency Distributions and Contingency Tables
• Diagnostic Tests
• Survival Analysis
• Rate and Risk
Part 2 Summary
• What were the key learning points from Part 2?
− In groups, identify 3 key learning points from the second session
Frequency Distributions and Contingency Tables
Definition of a Frequency Distribution
• A few examples:
− “a representation, either in a graphical or tabular format, which displays the number of observations within a given interval”
− “a mathematical function showing the number of instances in which a variable takes each of its possible values”
− “an arrangement of statistical data that exhibits the frequency of the occurrence of the values of a variable”
Contingency Table
• A table in which the entries are frequencies
• A matrix format that displays the frequency distribution of the variables
• If there are 2 rows and 2 columns it is called a “2x2 contingency table”
• Often used in conjunction with statistical tests (e.g. the Chi-squared test) and in the evaluation of diagnostic tests
General Contingency table: 2 x 2
Characteristic    Group 1    Group 2    Total
Present           a          b          a + b
Absent            c          d          c + d
Total             a + c      b + d      n = a + b + c + d
Contingency Table in Practice
Diagnostic Tests
Gold Standard and Diagnostic Tests
• Gold Standard - provides a definitive diagnosis of a particular condition
• May be impractical or not routinely available
• Simple diagnostic tests - provide a guide to whether or not a patient has a condition
• To evaluate the usefulness of a diagnostic test, the test needs to be applied to a group of patients whose true disease status is known from the Gold Standard
Use of a contingency table
                  Gold Standard Test
Test result       Disease    No disease    Total
Positive          a          b             a + b
Negative          c          d             c + d
Total             a + c      b + d         n = a + b + c + d
Prevalence
• a + c individuals have the disease
• Prevalence of the disease = $\frac{a + c}{n}$
• Of the a + c individuals who have the disease
− a have positive test results – true positives
− c have negative test results – false negatives
People without the disease
• Of the b + d individuals who do not have the disease
− d have negative test results – true negatives
− b have positive test results – false positives
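To make the bookkeeping concrete, here is a minimal Python sketch that builds the counts a, b, c and d from raw patient data and then computes the prevalence. It assumes each patient's Gold Standard status and simple-test result are stored as booleans; the function names and the eight-patient data set are hypothetical.

```python
from typing import Sequence


def contingency_counts(disease: Sequence[bool], test_positive: Sequence[bool]) -> dict:
    """Count a, b, c, d for a 2 x 2 diagnostic table.

    disease[i]       -- True if patient i has the disease (Gold Standard)
    test_positive[i] -- True if patient i's simple diagnostic test is positive
    """
    if len(disease) != len(test_positive):
        raise ValueError("both sequences must describe the same patients")
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for has_disease, positive in zip(disease, test_positive):
        if has_disease and positive:
            counts["a"] += 1  # true positive
        elif positive:
            counts["b"] += 1  # false positive
        elif has_disease:
            counts["c"] += 1  # false negative
        else:
            counts["d"] += 1  # true negative
    return counts


def prevalence(a: int, b: int, c: int, d: int) -> float:
    """Prevalence = (a + c) / n, where n = a + b + c + d."""
    return (a + c) / (a + b + c + d)


# Hypothetical data for 8 patients (not taken from the lecture).
disease = [True, True, True, False, False, False, False, True]
test = [True, True, False, True, False, False, False, True]
counts = contingency_counts(disease, test)
print(counts)                # {'a': 3, 'b': 1, 'c': 1, 'd': 3}
print(prevalence(**counts))  # 0.5
```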
Assessing Effectiveness: Sensitivity and Specificity
• Sensitivity = the proportion of individuals with the disease who are correctly identified by the test
• Sensitivity = $\frac{a}{a + c}$
• Specificity = the proportion of individuals without the disease who are correctly identified by the test
• Specificity = $\frac{d}{b + d}$
High Sensitivity or High Specificity?
• Desirable to have both as close to 1 (or 100%) as possible
• In practice, high sensitivity may be gained at the expense of specificity (or vice versa)
• For conditions that are easily treatable, high sensitivity is preferred
• For conditions that are serious and untreatable, high specificity is preferred
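The sensitivity and specificity formulas translate directly into code. This is a minimal Python sketch; the counts are hypothetical and chosen only to give round numbers.

```python
def sensitivity(a: int, c: int) -> float:
    """Sensitivity = a / (a + c): diseased individuals correctly identified."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Specificity = d / (b + d): disease-free individuals correctly identified."""
    return d / (b + d)


# Hypothetical counts (not from the lecture): a = true positives,
# b = false positives, c = false negatives, d = true negatives.
a, b, c, d = 90, 30, 10, 170
print(f"sensitivity = {sensitivity(a, c):.2f}")  # sensitivity = 0.90
print(f"specificity = {specificity(b, d):.2f}")  # specificity = 0.85
```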
Using the test result for diagnosis - Positive Predictive Value
• Positive Predictive Value = proportion of individuals with a positive test result who have the disease
• Positive Predictive Value = $\frac{a}{a + b}$
Using the test result for diagnosis - Negative Predictive Value
• Negative Predictive Value = proportion of individuals with a negative test result who do not have the disease
• Negative Predictive Value = $\frac{d}{c + d}$
Comparing Values
• Sensitivity and Specificity quantify the diagnostic ability of the test
• Predictive values (PPV and NPV) indicate how likely it is that the individual has or does not have the disease, given his/her test result
Cut-off values
• A diagnosis is sometimes made on the basis of a numerical or ordinal measurement
• A cut-off value must be defined, above (or below) which an individual is believed to have a high chance of having the disease
• Can evaluate a cut-off value by calculating its associated sensitivity, specificity and predictive values
• Can choose the cut-off to optimise the measures as desired
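Evaluating a cut-off amounts to dichotomising the measurement and filling in the 2 x 2 table. The Python sketch below assumes that a value at or above the cut-off counts as a positive result (flip the comparison for tests where low values indicate disease); the measurements and the function names are hypothetical.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate_cutoff(values, disease, cutoff):
    """Dichotomise a numerical measurement at `cutoff` and summarise the test.

    Assumption: a value at or above the cut-off counts as a positive result.
    `disease` holds the Gold Standard status for each individual.
    """
    a = b = c = d = 0
    for value, has_disease in zip(values, disease):
        positive = value >= cutoff
        if positive and has_disease:
            a += 1  # true positive
        elif positive:
            b += 1  # false positive
        elif has_disease:
            c += 1  # false negative
        else:
            d += 1  # true negative
    return {
        "sensitivity": _ratio(a, a + c),
        "specificity": _ratio(d, b + d),
        "ppv": _ratio(a, a + b),
        "npv": _ratio(d, c + d),
    }


# Hypothetical measurements and Gold Standard results for 10 patients.
values = [2.1, 3.5, 4.0, 5.2, 6.8, 1.0, 1.8, 2.5, 3.0, 4.5]
disease = [False, True, True, True, True, False, False, False, False, False]

for cutoff in (3.5, 5.0):
    summary = evaluate_cutoff(values, disease, cutoff)
    print(cutoff, {name: round(value, 3) for name, value in summary.items()})
```

Raising the cut-off from 3.5 to 5.0 in this toy data trades sensitivity (1.0 to 0.5) for specificity (about 0.83 to 1.0), which is the trade-off described under "High Sensitivity or High Specificity?".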
ROC curve
• Receiver Operating Characteristic (ROC) curve
• A way of assessing whether a test provides useful information
• Used to compare different tests
• Used to select an optimal cut-off value
Drawing and Interpreting the ROC curve
• For all cut-off points, plot Sensitivity vs. 1 – Specificity
• Connect the points
• The ROC curve for a test with some diagnostic value lies above and to the left of the diagonal of the graph
• The optimal cut-off for a test can be chosen from the graph
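The drawing procedure can be sketched in Python by trying every observed value as a cut-off. The slides leave the choice of "optimal" cut-off to the reader; as an assumption, this sketch uses Youden's J (sensitivity + specificity − 1), which is one common rule among several. The data and function names are hypothetical.

```python
def roc_points(values, disease):
    """Return (cutoff, 1 - specificity, sensitivity) for every candidate cut-off.

    Each distinct observed value is tried as a cut-off, with values at or above
    it called positive. A cut-off of +infinity (nobody positive) supplies the
    (0, 0) corner; the smallest observed value (everybody positive) gives (1, 1).
    """
    n_diseased = sum(disease)
    n_healthy = len(disease) - n_diseased
    points = [(float("inf"), 0.0, 0.0)]
    for cutoff in sorted(set(values), reverse=True):
        tp = sum(1 for v, y in zip(values, disease) if y and v >= cutoff)
        fp = sum(1 for v, y in zip(values, disease) if not y and v >= cutoff)
        points.append((cutoff, fp / n_healthy, tp / n_diseased))
    return points


def youden_cutoff(points):
    """Point maximising Youden's J = sensitivity - (1 - specificity) (assumed rule)."""
    return max(points, key=lambda p: p[2] - p[1])


# Hypothetical measurements and Gold Standard results for 10 patients.
values = [2.1, 3.5, 4.0, 5.2, 6.8, 1.0, 1.8, 2.5, 3.0, 4.5]
disease = [False, True, True, True, True, False, False, False, False, False]

points = roc_points(values, disease)
for cutoff, fpr, tpr in points:
    print(f"cut-off {cutoff}: 1 - specificity = {fpr:.3f}, sensitivity = {tpr:.3f}")
print("Youden-optimal cut-off:", youden_cutoff(points)[0])
```

Plotting the second and third elements of each point and joining them gives the ROC curve.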
AUROC
• Area Under the Receiver Operating Characteristic Curve
• Equal to the c statistic
• c = 1 means that the test is perfect at discriminating between disease outcomes
• c = 0.5 means that the test performs no better than chance
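For a binary outcome, the c statistic can be estimated without drawing the curve. It equals the probability that a randomly chosen individual with the disease has a higher test value than a randomly chosen individual without it, with ties counting one half. The Python sketch below uses that pairwise definition on hypothetical data.

```python
def c_statistic(values, disease):
    """Estimate the area under the ROC curve as a concordance probability.

    Over every (diseased, disease-free) pair, count 1 if the diseased individual
    has the higher value and 1/2 for a tie, then divide by the number of pairs.
    """
    diseased = [v for v, y in zip(values, disease) if y]
    healthy = [v for v, y in zip(values, disease) if not y]
    score = 0.0
    for x in diseased:
        for z in healthy:
            if x > z:
                score += 1.0
            elif x == z:
                score += 0.5
    return score / (len(diseased) * len(healthy))


# Hypothetical measurements and Gold Standard results for 10 patients.
values = [2.1, 3.5, 4.0, 5.2, 6.8, 1.0, 1.8, 2.5, 3.0, 4.5]
disease = [False, True, True, True, True, False, False, False, False, False]
print(round(c_statistic(values, disease), 3))                 # 0.917
print(c_statistic([1, 2, 3, 4], [False, False, True, True]))  # 1.0 (perfect separation)
```

The pairwise count gives the same value as the trapezoidal area under the empirical ROC curve, which is why c = 0.5 corresponds to a test no better than chance.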
Is a test useful? Likelihood Ratio (LR) for a positive test result
• LR for a positive result = $\frac{\text{sensitivity}}{1 - \text{specificity}}$
• Ratio of the chance of a positive result if the patient has the disease to the chance of a positive result if the patient does not have the disease
• e.g. an LR of 10 for a positive result indicates that a positive result is 10 times as likely to occur in a patient with the disease as in a patient without it
• A high likelihood ratio (e.g. > 10) suggests that the test is useful and provides evidence to support the diagnosis
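A minimal Python sketch of the positive likelihood ratio, using hypothetical values for sensitivity and specificity. Returning infinity when specificity is exactly 1 is a design choice of this sketch, since the formula divides by zero there.

```python
import math


def lr_positive(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity); infinite if specificity is 1."""
    if specificity >= 1.0:
        return math.inf
    return sensitivity / (1.0 - specificity)


# Hypothetical test: sensitivity 0.90, specificity 0.95.
lr = lr_positive(0.90, 0.95)
print(f"LR+ = {lr:.1f}")  # LR+ = 18.0, above the illustrative threshold of 10
```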
Is a test useful? Likelihood Ratio (LR) for a negative test result
• LR for a negative result = $\frac{1 - \text{sensitivity}}{\text{specificity}}$
• Ratio of the chance of a negative result if the patient has the disease to the chance of a negative result if the patient does not have the disease
• A likelihood ratio close to 0 (e.g. < 0.01) suggests that the test is useful and provides evidence to rule out the diagnosis
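The negative likelihood ratio can also be computed straight from the 2 x 2 table counts, which is convenient for the workshop calculations. This Python sketch uses hypothetical counts.

```python
def lr_negative(a: int, b: int, c: int, d: int) -> float:
    """LR- = (1 - sensitivity) / specificity, computed from 2 x 2 table counts.

    a = true positives, b = false positives, c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return (1.0 - sensitivity) / specificity


# Hypothetical counts (not from the lecture): sensitivity 0.90, specificity 0.85.
print(f"LR- = {lr_negative(a=90, b=30, c=10, d=170):.3f}")  # LR- = 0.118
```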
References and Workshop
• http://apt.rcpsych.org/content/10/6/446.full
• Workshop: from page 117 of Medical Statistics at a Glance by Petrie and Sabin
− Calculate Prevalence, Sensitivity, Specificity, PPV, NPV, LR for positive result and LR for negative result
Survival Analysis
• Analysis of the time it takes an individual to reach an endpoint of interest (often death)
• The variable of interest is the length of time
• Data are often censored
• Examples: time to relapse after treatment or time to death after treatment
Survival Times
• Survival times are calculated from some baseline that reflects a ‘starting point’ for the study, e.g. the date of surgery or the date of diagnosis of a condition
• ...until the time that a patient reaches the endpoint of interest
Survival plot
• Separate horizontal lines can be drawn for each patient
• The length indicates the survival time
• Different symbols at the end of each line indicate whether the endpoint was actually reached or the data are censored
Survival curve (Kaplan-Meier curve)
• Usually calculated by the Kaplan-Meier method
• Displays the cumulative probability (the survival probability) of an individual remaining free of the endpoint at any time after baseline
• Drawn as a series of steps
• Most statistical packages have functionality for survival curves
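To show where the steps come from, here is a minimal Python sketch of the Kaplan-Meier estimator. At each time when at least one endpoint occurs, the running survival probability is multiplied by the proportion of those still at risk who did not reach the endpoint. It assumes the usual convention that patients censored at the same time as an endpoint are still counted as at risk at that time; the data are hypothetical, and in practice a statistical package would be used.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival curve.

    times  -- follow-up time for each patient (from baseline)
    events -- True if the endpoint was reached at that time, False if censored
    Returns a list of (time, survival probability) steps, starting at (0, 1.0).
    """
    endpoint_counts = Counter(t for t, e in zip(times, events) if e)
    leaving_counts = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = [(0, 1.0)]
    for t in sorted(leaving_counts):
        d = endpoint_counts.get(t, 0)
        if d:
            # Conditional probability of surviving past t, given survival up to t.
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        # Patients censored at t were still counted as at risk at t (usual convention).
        at_risk -= leaving_counts[t]
    return curve


# Hypothetical data: follow-up in months; False marks a censored observation.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [True, True, False, True, False, True, True, False]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored patients never produce a downward step, but they do shrink the number at risk for later steps.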
Summarising survival
• Survival rates are often summarised by quoting survival probabilities at certain time points on the curve
• The median time to reach the endpoint is sometimes quoted
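Both summaries can be read off a step curve. In this Python sketch the curve is flat between steps, and the median is taken as the earliest time at which the survival probability falls to 0.5 or below. That is one common convention; some packages interpolate instead. The curve values are hypothetical.

```python
def survival_at(curve, t):
    """Survival probability at time t from a step curve [(time, S), ...] sorted by time."""
    s = 1.0
    for time, prob in curve:
        if time > t:
            break
        s = prob  # the curve stays flat between steps
    return s


def median_survival(curve):
    """Earliest time at which S drops to 0.5 or below; None if it never does."""
    for time, prob in curve:
        if prob <= 0.5:
            return time
    return None


# Hypothetical Kaplan-Meier steps (time in months).
curve = [(0, 1.0), (2, 0.875), (3, 0.75), (5, 0.6), (8, 0.4), (9, 0.2)]
print(survival_at(curve, 6))   # 0.6
print(median_survival(curve))  # 8
```

Returning None matters in practice: if fewer than half of the patients reach the endpoint during follow-up, the median cannot be quoted.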
Survival Curves Workshop
• What are the endpoints of interest?
• Explore:
− the effect of treatments A and B on survival
− the effect of different stages of cancer on survival
− the effect of altering or not altering the gene set on survival
• Quote survival probabilities at certain time points
• What is the median time to reach the endpoint in each case?
• Identify potential applications of survival plots and curves in psychiatry
Censored data
• We may not know exactly when the patient reached the endpoint, e.g.:
− they may die from some other cause
− they may withdraw from the study
− the study may end before they reach the endpoint
Further analysis of survival rates
• Differences in survival (or progression) rates between groups can be tested formally for statistical significance, e.g. using:
− the log-rank test
− regression models
Rate and Risk
Event Rate (per person per year)
Rate is the number of events occurring divided by the total follow-up time of all individuals
$\text{Rate} = \frac{\text{Number of events occurring}}{\text{Total number of years of follow-up for all individuals}} = \frac{\text{Number of events occurring}}{\text{Person-years of follow-up}}$
− Each person’s length of follow-up time is the time from when he/she enters the study until the time when the event occurs or the study draws to a close if the event does not occur
− Total number of years follow-up time is the sum of all the individuals’ follow-up times
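A minimal Python sketch of the person-years calculation. Each person contributes follow-up until the event or until the study closes. The five-person cohort is hypothetical.

```python
def event_rate(follow_up_years, had_event):
    """Rate = number of events / total person-years of follow-up.

    follow_up_years -- each person's time from study entry until the event,
                       or until the study closes if no event occurs
    had_event       -- True if that person experienced the event
    """
    person_years = sum(follow_up_years)
    return sum(had_event) / person_years


# Hypothetical cohort of five people (20 person-years in total, 2 events).
follow_up = [2.0, 5.0, 3.5, 4.5, 5.0]
had_event = [True, False, True, False, False]
rate = event_rate(follow_up, had_event)
print(f"{rate:.2f} events per person-year")  # 0.10 events per person-year
```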
Different names for rates
• The rate is called an incidence rate when the event is a new case (e.g. of a disease)
• The rate is called the mortality rate when the event is death
− rates cannot be calculated in a cross-sectional study, since this type of study does not involve follow-up over time
Relative Rate (rate ratio)
$\text{Relative Rate} = \frac{\text{Rate}_{\text{exposed}}}{\text{Rate}_{\text{unexposed}}}$
− compares the rate of disease in a group of individuals exposed to some factor of interest with that in a group of individuals not exposed to the factor
Interpreting Relative Rate
• Relative rate = 1 indicates that the rate of disease is the same in the two groups
• Relative rate > 1 indicates that the rate is higher in those exposed to the factor than in those who are unexposed
• Relative rate < 1 indicates that the rate is lower in those exposed to the factor than in those who are unexposed
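The relative rate and its interpretation rules can be sketched in Python as follows. The event counts and person-years are hypothetical. The sketch compares the point estimate with 1 exactly, as the slides do, and does not account for sampling variability.

```python
def relative_rate(events_exposed, py_exposed, events_unexposed, py_unexposed):
    """Rate ratio = rate in the exposed group / rate in the unexposed group."""
    return (events_exposed / py_exposed) / (events_unexposed / py_unexposed)


def interpret_ratio(ratio):
    """Apply the interpretation rules from the slides to a point estimate."""
    if ratio > 1:
        return "rate higher in those exposed"
    if ratio < 1:
        return "rate lower in those exposed"
    return "same rate in both groups"


# Hypothetical data: 30 events in 1500 person-years (exposed)
# versus 20 events in 2000 person-years (unexposed).
rr = relative_rate(30, 1500, 20, 2000)
print(rr, interpret_ratio(rr))  # 2.0 rate higher in those exposed
```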
Risk
Risk is the probability of developing the disease (or dying) in a stated time period
$\text{Risk} = \frac{\text{Total number of events}}{\text{Number of individuals in the study}}$
− the risk of an event is greater when individuals are followed for longer, so the time period must always be stated
Relative Risk
$\text{Relative Risk} = \frac{\text{Risk}_{\text{exposed}}}{\text{Risk}_{\text{unexposed}}}$
− compares the risk of disease in a group of individuals exposed to some factor of interest with that in a group of individuals not exposed to the factor
Interpreting Relative Risk
• Relative risk = 1 indicates that the risk of disease is the same in the two groups
• Relative risk > 1 indicates that the risk is higher in those exposed to the factor than in those who are unexposed
• Relative risk < 1 indicates that the risk is lower in those exposed to the factor than in those who are unexposed
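A minimal Python sketch of risk and relative risk over a stated follow-up period. The counts are hypothetical.

```python
def risk(events: int, n: int) -> float:
    """Risk = number of events / number of individuals, over a stated period."""
    return events / n


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Relative risk = risk in the exposed group / risk in the unexposed group."""
    return risk(events_exposed, n_exposed) / risk(events_unexposed, n_unexposed)


# Hypothetical 2-year follow-up: 12 of 200 exposed and 30 of 250 unexposed
# individuals developed the disease.
print(risk(12, 200), risk(30, 250))     # 0.06 0.12
print(relative_risk(12, 200, 30, 250))  # 0.5, i.e. lower risk in the exposed group
```

Unlike the rate, the denominator here is a head count, not person-years, so the two measures agree only when everyone is followed for the same length of time.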
Workshop: Rate and Risk
• Over a total follow-up of 718 person-years, 61 patients experienced virological failure
− calculate the event rate
• Exercise: from page 46 of Medical Statistics at a Glance by Petrie and Sabin
− calculate risks and relative risks
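For the first workshop item, the calculation follows directly from the event-rate formula (events divided by person-years of follow-up). A worked version, rounded to three decimal places:

```latex
\text{Rate} = \frac{61 \text{ events}}{718 \text{ person-years}}
\approx 0.085 \text{ per person-year}
\approx 85 \text{ per 1000 person-years}
```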
Summary
• Frequency Distributions and Contingency Tables
• Diagnostic Tests
• Survival Analysis
• Rate and Risk