Basic Biostatistics Part 3


31st January, 2018

Content

• Part 2 Summary

• Frequency Distributions and Contingency Tables

• Diagnostic Tests

• Survival Analysis

• Rate and Risk

Part 2 Summary

• What were the key learning points from Part 2?

− In groups, identify 3 key learning points from the second session

Frequency Distributions and Contingency Tables

Definition of a Frequency Distribution

• A few examples:

− “a representation, either in a graphical or tabular format, which displays the number of observations within a given interval”

− “a mathematical function showing the number of instances in which a variable takes each of its possible values”

− “an arrangement of statistical data that exhibits the frequency of the occurrence of the values of a variable”

Contingency Table

• A table in which the entries are frequencies

• A matrix format that displays the frequency distribution of the variables

• If there are 2 rows and 2 columns it is called a “2x2 contingency table”

• Often used in conjunction with statistical tests and analyses, e.g. the Chi-squared test or the evaluation of a diagnostic test

General Contingency table: 2 x 2

Characteristic   Group 1   Group 2   Total
Present          a         b         a + b
Absent           c         d         c + d
Total            a + c     b + d     n = a + b + c + d

Contingency Table in Practice

Diagnostic Tests

Gold Standard and Diagnostic Tests

• Gold Standard - provides a definitive diagnosis of a particular condition

• May be impractical or not routinely available

• Simple diagnostic tests - provide a guide to whether or not a patient has a condition

• To evaluate the usefulness of a diagnostic test, the test needs to be applied to a group of patients whose true disease status is known from the Gold Standard

Use of a contingency table

                 Gold Standard Test
Test result      Disease   No disease   Total
Positive         a         b            a + b
Negative         c         d            c + d
Total            a + c     b + d        n = a + b + c + d

Prevalence

• a + c individuals have the disease

• Prevalence of the disease = $\frac{a + c}{n}$

• Of the a + c individuals who have the disease

− a have positive test results – true positives

− c have negative test results – false negatives

People without the disease

• Of the b + d individuals who do not have the disease

− d have negative test results – true negatives

− b have positive test results – false positives

Assessing Effectiveness: Sensitivity and Specificity

• The proportion of individuals with the disease who are correctly identified by the test

Sensitivity = $\frac{a}{a + c}$

• The proportion of individuals without the disease who are correctly identified by the test

Specificity = $\frac{d}{b + d}$

High Sensitivity or High Specificity?

• Desirable to have both as close to 1 (or 100%) as possible

• In practice – may gain high sensitivity at the expense of specificity (or vice versa)

• For conditions that are easily treatable – high sensitivity is preferred

• For conditions that are serious and untreatable – high specificity is preferred
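The prevalence, sensitivity and specificity formulas can be checked with a few lines of Python. This is a minimal sketch: the counts a, b, c and d are hypothetical values chosen for illustration, not data from the course.

```python
def sensitivity_specificity(a, b, c, d):
    """Cells of the 2 x 2 table: a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)   # diseased individuals correctly identified
    specificity = d / (b + d)   # non-diseased individuals correctly identified
    return sensitivity, specificity

# Hypothetical counts (assumption, for illustration only)
a, b, c, d = 90, 30, 10, 170
n = a + b + c + d

prevalence = (a + c) / n
sens, spec = sensitivity_specificity(a, b, c, d)

print(f"Prevalence  = {prevalence:.3f}")   # 0.333
print(f"Sensitivity = {sens:.2f}")         # 0.90
print(f"Specificity = {spec:.2f}")         # 0.85
```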

Using the test result for diagnosis - Positive Predictive Value

• Positive Predictive Value = proportion of individuals with a positive test result who have the disease

• Positive Predictive Value = $\frac{a}{a + b}$

Using the test result for diagnosis - Negative Predictive Value

• Negative Predictive Value = proportion of individuals with a negative test result who do not have the disease

• Negative Predictive Value = $\frac{d}{c + d}$

Comparing Values

• Sensitivity and Specificity quantify the diagnostic ability of the test

• Predictive values (PPV and NPV) indicate how likely it is that the individual has or does not have the disease, given his/her test results

Cut-off values

• A diagnosis is sometimes made on the basis of a numerical or ordinal measurement

• A cut-off value needs to be defined above (or below) which it is believed an individual has a high chance of having the disease

• Can evaluate a cut-off value by calculating its associated sensitivity, specificity and predictive values

• Can choose the cut-off to optimise the measures as desired
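As a rough illustration of evaluating a cut-off, the sketch below classifies each measurement as test-positive when it is at or above the cut-off, then computes sensitivity, specificity and the predictive values against the Gold Standard. The measurements, the disease labels and the "at or above" rule are assumptions made for the example.

```python
def evaluate_cutoff(values, diseased, cutoff):
    """Treat value >= cutoff as a positive test and summarise the 2 x 2 table."""
    a = b = c = d = 0
    for value, has_disease in zip(values, diseased):
        positive = value >= cutoff
        if positive and has_disease:
            a += 1          # true positive
        elif positive:
            b += 1          # false positive
        elif has_disease:
            c += 1          # false negative
        else:
            d += 1          # true negative
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b) if a + b else None,   # undefined if nobody tests positive
        "npv": d / (c + d) if c + d else None,   # undefined if nobody tests negative
    }

# Hypothetical measurements and Gold Standard results (True = disease)
values   = [2, 3, 4, 5, 6, 7, 8, 9]
diseased = [False, False, False, True, False, True, True, True]

low_cutoff = evaluate_cutoff(values, diseased, cutoff=5)
high_cutoff = evaluate_cutoff(values, diseased, cutoff=7)
print(low_cutoff)
print(high_cutoff)
```

With these made-up data, the lower cut-off gives higher sensitivity and the higher cut-off gives higher specificity, which is the trade-off described earlier.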

ROC curve

• Receiver Operating Characteristic (ROC) curve

• A useful way of assessing whether a test provides useful information

• Used to compare different tests

• Used to select an optimal cut-off value

Drawing and Interpreting the ROC curve

• For all cut-off points, plot Sensitivity vs. 1 – Specificity

• Connect the points

• The ROC curve for a test with some use will lie above and to the left of the diagonal of the graph

• The optimal cut-off for a test can be chosen from the graph
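The points of a ROC curve can be computed directly, as in the sketch below. The data are hypothetical and higher values are assumed to indicate disease. The slides do not name a rule for choosing the optimal cut-off; the example uses the largest value of sensitivity minus (1 – specificity), often called Youden's index, as one possible criterion.

```python
def roc_points(values, diseased):
    """Return (1 - specificity, sensitivity, cutoff) for every candidate cut-off.
    A value >= cutoff counts as a positive test."""
    n_diseased = sum(1 for s in diseased if s)
    n_healthy = len(diseased) - n_diseased
    # Start with a cut-off above every value: nobody tests positive
    points = [(0.0, 0.0, None)]
    for cutoff in sorted(set(values), reverse=True):
        tp = sum(1 for v, s in zip(values, diseased) if v >= cutoff and s)
        fp = sum(1 for v, s in zip(values, diseased) if v >= cutoff and not s)
        points.append((fp / n_healthy, tp / n_diseased, cutoff))
    return points

# Hypothetical measurements and Gold Standard results (True = disease)
values   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
diseased = [False, False, False, True, False, False, True, True, True, True]

points = roc_points(values, diseased)

# One possible selection rule (assumption): maximise sensitivity - (1 - specificity)
best = max(points, key=lambda p: p[1] - p[0])
print("Chosen cut-off:", best[2])   # 7
```

Plotting the second element of each point against the first and connecting the points gives the ROC curve.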

AUROC

• Area Under the Receiver Operating Characteristic Curve

• Given by the c statistic

• c = 1 means that the test is perfect at discriminating between disease outcomes

• c = 0.5 means that the test performs no better than chance
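One way to compute the c statistic without drawing the curve is to look at every pair made up of one diseased and one non-diseased individual and count how often the diseased individual has the higher measurement, with ties counting as one half. The sketch below assumes that higher values indicate disease and uses hypothetical data.

```python
def c_statistic(values, diseased):
    """Proportion of (diseased, non-diseased) pairs in which the diseased
    individual has the higher value; ties count as half."""
    with_disease = [v for v, s in zip(values, diseased) if s]
    without_disease = [v for v, s in zip(values, diseased) if not s]
    score = 0.0
    for x in with_disease:
        for y in without_disease:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(with_disease) * len(without_disease))

# Hypothetical measurements and Gold Standard results (True = disease)
values   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
diseased = [False, False, False, True, False, False, True, True, True, True]

c = c_statistic(values, diseased)
print(f"c = {c:.2f}")   # 0.92
```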

Is a test useful? Likelihood Ratio (LR) for a positive test result

• LR for a positive result = $\frac{\text{sensitivity}}{1 - \text{specificity}}$

• Ratio of the chance of a positive result if the patient has the disease to the chance of a positive result if the patient does not have the disease

• e.g. an LR of 10 for a positive result indicates that a positive result is 10 times as likely to occur in a patient with the disease as in a patient without it

• A high likelihood ratio (e.g. > 10) suggests that the test is useful and provides evidence to support the diagnosis

Is a test useful? Likelihood Ratio (LR) for a negative test result

• LR for a negative result = $\frac{1 - \text{sensitivity}}{\text{specificity}}$

• Ratio of the chance of a negative result if the patient has the disease to the chance of a negative result if the patient does not have the disease

• A likelihood ratio close to 0 (e.g. < 0.01) suggests that the test is useful and provides evidence to rule out the diagnosis
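Both likelihood ratios follow directly from sensitivity and specificity. A minimal sketch, using hypothetical values for the two inputs:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive result, LR for a negative result).
    Note: the LR for a positive result is undefined when specificity is 1."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Hypothetical sensitivity and specificity (assumption, for illustration only)
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.85)

print(f"LR for a positive result = {lr_pos:.2f}")   # 6.00
print(f"LR for a negative result = {lr_neg:.2f}")   # 0.12
```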

References and Workshop

• http://apt.rcpsych.org/content/10/6/446.full

• Workshop: from page 117 of Medical Statistics at a Glance by Petrie and Sabin

− Calculate Prevalence, Sensitivity, Specificity, PPV, NPV, LR for positive result and LR for negative result

Survival Analysis

• Analysis of the time it takes an individual to reach an endpoint of interest (often death)

• Length of time is the variable

• Data is often censored

• Examples:

− time to relapse after treatment, or

− time to death after treatment

Survival Times

• Survival times are calculated from some baseline that reflects a ‘starting point’ for the study, e.g. the date of surgery or the date of diagnosis of a condition

• ...until the time that a patient reaches the endpoint of interest

Survival plot

• Separate horizontal lines can be drawn for each patient

• The length indicates the survival time

• Different symbols at the end of the line represent true endpoint or censored data

Survival curve (Kaplan-Meier curve)

• Usually calculated by the Kaplan-Meier method

• Displays the cumulative probability (the survival probability) of an individual remaining free of the endpoint at any time after baseline

• Drawn as a series of steps

• Most statistical packages have functionality for survival curves
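To show where the steps come from, here is a minimal sketch of the Kaplan-Meier calculation in plain Python. The follow-up times and event indicators are hypothetical, and the sketch follows the usual convention that a patient censored at time t is still counted as at risk at time t. In practice a statistical package would normally be used.

```python
def kaplan_meier(times, events):
    """times: follow-up time per patient.
    events: True if the endpoint was observed, False if censored.
    Returns (time, survival probability) for each time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            # The curve steps down only at times when an event occurs
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times (months) and event indicators
times  = [2, 3, 3, 5, 6, 8, 9, 12]
events = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

With these data the curve steps down at 2, 3, 5 and 8 months, to approximately 0.875, 0.750, 0.600 and 0.400.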

Summarising survival

• Survival rates are often summarised by quoting survival probabilities at certain time points on the curve

• The median time to reach the endpoint is sometimes quoted
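Both summaries can be read off the step curve. The sketch below assumes the curve is stored as a list of (time, survival probability) steps; the values are hypothetical. The median is taken as the first time at which the survival probability falls to 0.5 or below.

```python
def survival_at(curve, t):
    """Survival probability at time t, read from a step curve."""
    prob = 1.0                      # before the first step, everyone is event-free
    for step_time, step_prob in curve:
        if step_time > t:
            break
        prob = step_prob
    return prob

def median_time(curve):
    """First time at which the survival probability is 0.5 or lower."""
    for step_time, step_prob in curve:
        if step_prob <= 0.5:
            return step_time
    return None                     # the curve never drops to 0.5

# Hypothetical step curve: (time in months, survival probability)
curve_steps = [(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.4)]

print(survival_at(curve_steps, 4))   # 0.75
print(median_time(curve_steps))      # 8
```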

Survival Curves Workshop

• What are the endpoints of interest?

• Explore:

− the effect of treatments A and B on survival

− the effect of different stages of cancer on survival

− the effect of altering or not altering the gene set on survival

• Quote survival probabilities at certain time points

• What is the median time to reach the endpoint in each case?

• Identify potential applications of survival plots and curves in psychiatry

Censored data

• May not know exactly when the patient reached the endpoint, e.g.

− they may die from some other cause

− they may withdraw from the study

− the study may end before they die of the cause

Further analysis of survival rates

• Significant differences in progression rates between different groups can be tested formally

Log-rank test Regression models Rate and Risk Event Rate p/person p/year

Rate is the number of events occurring expressed as a proportion of the total follow-up time of all individuals

$\text{Rate} = \frac{\text{Number of events occurring}}{\text{Total number of years of follow-up for all individuals}} = \frac{\text{Number of events occurring}}{\text{Person-years of follow-up}}$

− Each person’s length of follow-up time is the time from when he/she enters the study until the time when the event occurs or the study draws to a close if the event does not occur

− Total number of years follow-up time is the sum of all the individuals’ follow-up times
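The calculation can be sketched as follows. The follow-up times and event indicators are hypothetical; each follow-up time is assumed to already end at the event or at the close of the study, as described above.

```python
def event_rate(follow_up_years, had_event):
    """Number of events divided by total person-years of follow-up."""
    person_years = sum(follow_up_years)
    n_events = sum(1 for e in had_event if e)
    return n_events / person_years

# Hypothetical follow-up (years) for six individuals, and whether the event occurred
follow_up_years = [2.0, 3.5, 1.5, 4.0, 5.0, 4.0]   # 20 person-years in total
had_event       = [True, False, True, False, False, False]

rate = event_rate(follow_up_years, had_event)
print(f"{rate:.2f} events per person-year")               # 0.10
print(f"{rate * 100:.1f} events per 100 person-years")    # 10.0
```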

Different names for rates

• The rate is called an incidence rate when the event is a new case (e.g. of a disease)

• The rate is called the mortality rate when the event is death

− Rates cannot be calculated in a cross-sectional study since this type of study does not involve time

Relative Rate (rate ratio)

$\text{Relative Rate} = \frac{\text{Rate}_{\text{exposed}}}{\text{Rate}_{\text{unexposed}}}$

− Compares the rate of disease in a group of individuals exposed to some factor of interest with that in a group of individuals not exposed to the factor

Interpreting Relative Rate

• Relative rate = 1 indicates that the rate of disease is the same in the two groups

• Relative rate > 1 indicates that the rate is higher in those exposed to the factor than in those who are unexposed

• Relative rate < 1 indicates that the rate is lower in those exposed to the factor than in those who are unexposed
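A short sketch of the relative rate calculation, using hypothetical event counts and person-years for the exposed and unexposed groups:

```python
def relative_rate(events_exposed, person_years_exposed,
                  events_unexposed, person_years_unexposed):
    """Rate in the exposed group divided by the rate in the unexposed group."""
    rate_exposed = events_exposed / person_years_exposed
    rate_unexposed = events_unexposed / person_years_unexposed
    return rate_exposed / rate_unexposed

# Hypothetical data: 30 events in 1000 person-years (exposed),
# 15 events in 1500 person-years (unexposed)
rr = relative_rate(30, 1000, 15, 1500)
print(f"Relative rate = {rr:.1f}")   # 3.0
```

A value of 3 here would mean that the rate of disease is three times as high in the exposed group.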

Risk

Risk is the probability of developing the disease (or dying) in a stated time period

$\text{Risk} = \frac{\text{Total number of events}}{\text{Number of individuals in the study}}$

− The risk of an event is greater when the individuals are followed for longer

Relative Risk

$\text{Relative Risk} = \frac{\text{Risk}_{\text{exposed}}}{\text{Risk}_{\text{unexposed}}}$

− Compares the risk of disease in a group of individuals exposed to some factor of interest with that in a group of individuals not exposed to the factor

Interpreting Relative Risk

• Relative risk = 1 indicates that the risk of disease is the same in the two groups

• Relative risk > 1 indicates that the risk is higher in those exposed to the factor than in those who are unexposed

• Relative risk < 1 indicates that the risk is lower in those exposed to the factor than in those who are unexposed
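Risk and relative risk use the number of individuals in the denominator rather than person-years. A minimal sketch with hypothetical counts:

```python
def risk(n_events, n_individuals):
    """Probability of the event over the stated time period."""
    return n_events / n_individuals

def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by the risk in the unexposed group."""
    return risk(events_exposed, n_exposed) / risk(events_unexposed, n_unexposed)

# Hypothetical data: 20 of 100 exposed and 10 of 200 unexposed individuals
# develop the disease during the same follow-up period
risk_exposed = risk(20, 100)
risk_unexposed = risk(10, 200)
rel_risk = relative_risk(20, 100, 10, 200)

print(f"Risk (exposed)   = {risk_exposed:.2f}")     # 0.20
print(f"Risk (unexposed) = {risk_unexposed:.2f}")   # 0.05
print(f"Relative risk    = {rel_risk:.1f}")         # 4.0
```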

Workshop: Rate and Risk

• Over a total follow-up of 718 person-years, 61 patients experienced virological failure

− calculate the event rate

• Exercise: from page 46 of Medical Statistics at a Glance by Petrie and Sabin

− calculate risks and relative risks

Summary

• Frequency Distributions and Contingency Tables

• Diagnostic Tests

• Survival Analysis

• Rate and Risk