Data Mining in the Pharmaceutical Industry
Total Page:16
File Type:pdf, Size:1020Kb
Data Mining in the Pharmaceutical Industry By Jerry Swartz Introduction • Since I am a remote student, if there are questions, feel free to e-mail [email protected] Pharmaceutical Development • Four Stages of Drug Development – Research finds new drugs – Development tests and predicts drug behavior – Clinical trials test the drug in humans – Commercialization takes drug and sells it to likely consumers (doctors and patients) • I’ll show an example for the Research, Development, and Clinical Trials stages Research Stage • Huge user of data mining tools and techniques • Scientists run experiments to determine activity of potential drugs • Uses high speed screening to test tens, hundreds, or thousands of drugs very quickly – this generates microarray data Research Stage • “Bioinformatics” is a general term for the information processing activities on data generated in Research Stage, especially microarray data • General goal is to find activity on relevant genes or to find drug compounds that have desirable characteristics (whatever those may be) Research Stage • Data mining techniques used – Clustering – Classification – Neural networks Research Stage Example 1 • Goal: Determine compounds with similar activity • Why: Compounds with similar activity may behave similarly • When: – Have known compound and are looking for something better – Don’t have known compound but have desired activity and want to find compound that exhibits this activity Research Stage Example 1 • Sample data Structure\Activity Alpha Beta Delta Gamma CO2 0.07 0.88 0.62 0.09 H2O 0.80 0.54 0.32 0.79 H2O2 0.34 0.91 0.44 0.40 Research Stage Example 1 • Cluster compounds that have similar activity • We like behavior of H2O and want to see what compounds have similar activity • Example derived from Application of Nearest-Neighbor and Cluster Analyses in Pharmaceutical Lead Discovery • Clustering takes place based on similar activity using Euclidean “distance.” Research Stage Example 1 • For simplicity, distance in example is simply difference between Beta and Delta values, not Euclidean • Distances: CO2 H2OH2O2 CO2 0.00 0.65 0.16 H2O 0.65 0.00 0.49 H2O2 0.16 0.49 0.00.