Data Driven Discovery

Data-driven discovery
Case studies from patient records and spontaneous reports

Niklas Norén, PhD
Uppsala Monitoring Centre
WHO Collaborating Centre for International Drug Monitoring

ISPE 2011 Mid-Year Meeting. April 9, 2011. Florence, Italy.

Disclosure
• Uppsala Monitoring Centre research primarily self-funded
• The results on patient records in this presentation came out of a now-finished pilot study co-financed by IMS Health
• Government support through grants
  – IMI PROTECT
  – Monitoring Medicines (FP7)
  – OMOP (FNIH)

Presentation outline
• Data-driven discovery
• Three case studies
• Lessons learned

[Figure: grid of random-looking letters]

What is data-driven discovery?
• The application of analytics to detect patterns in data
• Let data lead the way!
  – No pre-specified hypothesis
  – Parallel perspectives on data
  – Many covariates, many pattern types
• Simple diagnostic test: Can you enumerate the possible findings prior to your analysis?

Why is it important?
• Identify the unexpected
• Obtain a more complete perspective of data
• Highlight issues that may alter the interpretation of your primary analysis

When is it relevant?
• Fundamental to broad surveillance
• A core component in signal refinement and refutation
• Useful for data management and quality assurance
• A safeguard in hypothesis-driven research

Pattern discovery
Hand and Bolton, J Appl Statist, 2004
• A pattern can be defined as a local deviation from a global baseline model
• Affects a limited number of covariates and/or data points

[Figure: letter grid with a local cluster of the letter V highlighted]

How do you do it?
• No such thing as completely open-ended analysis!
• Need to define:
  – Type of patterns (examples on following slides)
  – Baseline model
  – Covariates
  – Data subset(s)
  – How to follow up
• The challenge is to maintain power to detect the unexpected

Success factors
• Data preparation
  – Effective data management and cleaning
• Robustness to data quality issues
  – ... or relevant patterns may be drowned in noise
• Control of false alerts
  – Some false positives are acceptable, but the positive predictive value must be adequate
  – Rate of spurious associations can often be evaluated with Monte Carlo simulation or permutation tests
  – Biases are more difficult!

Record matching
Norén et al., Data Min Knowl Discov, 2007
• Duplicate detection in six million VigiBase reports
• Screen for pairs of suspiciously similar records
• Baseline model: independent reports
• Score each covariate with log-likelihood ratio (matches rewarded, mismatches penalized)

[Figure: letter grid with two rows sharing a long identical letter sequence highlighted]

Record matching method
• Covariates: country of origin, patient gender, patient age, date of onset, outcome, drugs, suspected ADRs
• Suspected duplicates reviewed by national centre

Record matching results
• 78,000 suspected duplicates (2011)
• ~65% recall, ~80% precision (relative to manual review, small study!)
• Highlighted non-duplicates typically otherwise related

Country of origin  Patient age  Patient gender  Drugs                  ADRs         Date of onset
Norway             8            F               Epinephrine/Lidocaine  Facial pain  2003-12-16
Norway             18           F               Epinephrine/Lidocaine  Facial pain  2003-12-16
Norway             29           F               Epinephrine/Lidocaine  Facial pain  2003-12-16

• Three reports from the same dentist!

Interaction detection
Norén et al., Stat Med, 2008
• Drug interaction detection in VigiBase
• Covariates: drugs and ADRs
• Identify excess reporting of ADR with two drugs

[Figure: letter grid with the recurring letter sequence QXY highlighted]

Interaction detection method
• Baseline model: additive attributable risks
• Shrinkage observed-to-expected ratio to protect against spurious associations
• Suspected interactions assessed by clinical experts

Interaction detection results
• 15,000 triplets with excess reporting rates
• Among those are cases of known interactions such as cerivastatin/gemfibrozil, digoxin/clarithromycin etc.
• Also report clusters and a patient safety issue:

Drugs                                                         ADR(s)                  # Reports  Expected  Comment
Bupivacaine, Hyaluronidase, Cefazolin, Gentamicin, Lidocaine  Strabismus              25         <1        25 reports listing the same five drugs and ADR, submitted by the same reporter in 1985
Celecoxib, Citalopram                                         Drug maladministration  51         <1        Confusion of brand names (Celebrex & Celexa)

Temporal pattern discovery
Norén et al., Data Min Knowl Discov, 2010
• Pattern discovery in IMS UK collection of two million longitudinal patient records
• Covariates: drugs and medical events
• Screen for medical events that occur more often than expected soon after start of treatment

Temporal pattern discovery method
• Baseline model: relative frequency of medical event constant over time in exposed patients
• Self-controlled cohort with external control group to adjust for age gradients, variations in use of healthcare and clustering of doctor’s visits
• Follow-up:
  – Visualisation of temporal patterns
  – Computerized highlighting of potential confounders
  – Clinical assessment of patient details
  – Secondary analysis (related drugs or events, stratification...)

Temporal pattern discovery results
• 42,000 associations between drugs and events
• A variety of temporal patterns

Lessons learned
• Many different patterns can be highlighted as deviations from a single simple baseline model
• A substantial proportion of findings relate to data quality issues or highlight aspects of data that are important for interpretation of the primary analysis
• Major intellectual input required after initial discovery!

More lessons learned
• 10,000+ patterns -> additional triages required
  – Emerging patterns
  – Focus areas
  – Predictive models
• Careful communication!
• Many times, biases dominate!
• Multiple comparisons are a real issue in some applications
  – Naive sub-group analyses in spontaneous reports can lead to ~50% false positive rates (Hopstadius and Norén, submitted, 2011)

[Figure: letter grid with several patterns highlighted]

References
1. Hand DJ, Bolton R. Pattern discovery and detection: a unified statistical methodology. Journal of Applied Statistics, 2004. 31(8):885-924.
2. Norén GN, Orre R, Bate A, Edwards IR. Duplicate detection in adverse drug reaction surveillance. Data Mining and
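
Worked example: shrinkage observed-to-expected ratio

The interaction detection slides name an additive baseline model and a shrinkage observed-to-expected ratio but give no formulas. The sketch below is one possible reading, not the method from the cited paper: it assumes the expected rate with both drugs is the background rate plus each drug's excess over background, and that shrinkage adds a constant (here 0.5, a hypothetical choice) to both the observed and expected counts. All function names and rates are illustrative.

```python
import math


def additive_expected_rate(rate_a_only: float, rate_b_only: float,
                           rate_neither: float) -> float:
    """Expected ADR rate with both drugs if attributable risks simply add.

    Simplified assumption: background rate plus each drug's excess over it.
    """
    rate = rate_neither + (rate_a_only - rate_neither) + (rate_b_only - rate_neither)
    # Clamp so the result stays a valid proportion.
    return min(max(rate, 0.0), 1.0)


def shrunk_log_ratio(observed: float, expected: float, k: float = 0.5) -> float:
    """log2 of (observed + k) / (expected + k); k is a hypothetical shrinkage constant."""
    return math.log2((observed + k) / (expected + k))


# Hypothetical reporting rates, for illustration only.
reports_with_both_drugs = 200
expected_count = reports_with_both_drugs * additive_expected_rate(0.02, 0.03, 0.01)

# Many observed reports against a modest expectation keep a clearly positive score.
well_supported = shrunk_log_ratio(25, expected_count)

# A single report against a tiny expectation has a huge raw ratio,
# but the shrunk score is pulled strongly towards zero.
raw_sparse = math.log2(1 / 0.01)
shrunk_sparse = shrunk_log_ratio(1, 0.01)

print(f"expected count: {expected_count:.2f}")
print(f"well supported: {well_supported:.2f}")
print(f"sparse, raw:    {raw_sparse:.2f}")
print(f"sparse, shrunk: {shrunk_sparse:.2f}")
```

The design point is that the added constant matters little when counts are large but dominates when they are small, so associations resting on one or two reports do not top the ranking.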
