Marrying Prediction and Segmentation to Identify Sales Leads

Jim Porzak, The Generations Network
Alex Kriney, Sun Microsystems
February 19, 2009

Business Challenge
● In 2005 Sun Microsystems began the process of open-sourcing and making its software stack freely available
● Many millions of downloads per month*
● Heterogeneous registration practices
● Multichannel (web, email, phone, in-person) and multitouch marketing strategy
● 15%+ response rates; low contact-to-lead ratio
● Challenge: Identify the sales leads
* Solaris, MySQL, GlassFish, NetBeans, OpenOffice, OpenSolaris, xVM VirtualBox, JavaFX, Java, etc.

Sun Confidential: Internal Only

The Project
● Focus on Solaris 10 (x86 version)
● Data sources: download registration, email subscriptions, other demographics databases, product x purchases
● What are the characteristics of someone with a propensity to purchase?
● How can we become more efficient at identifying potential leads?
● Apply learnings to marketing strategy for all products and track results

Predictive Models Employed
● Building a purchase model using random forests
● Creating prospect persona segmentation using cluster analysis

Random Forest Purchase Model

Random Forests
• Developed by Leo Breiman of Cal Berkeley, one of the four developers of CART, and Adele Cutler, now at Utah State University.
• Accuracy comparable with modern machine learning methods (SVMs, neural nets, AdaBoost).
• Built-in cross-validation using “Out of Bag” data. (Prediction error estimate is a by-product.)
• Predictors are selected automatically from a large number of candidates. (Resistant to overtraining.)
• Continuous and/or categorical predictor & response variables. (Easy to set up.)
• Can be run in unsupervised mode for cluster discovery. (Useful for market segmentation, etc.)
• Free prediction and scoring engines run on PCs, Unix/Linux & Macs.
  (R version)
• See Appendix for links to more details.

Data for Model
• Data cleanup issues
  > Categorical variables with many categories limited to top 30, with rest grouped into “Other2”
  > Silly answers in number of licenses questions
  > Email domains checked & simplified
• Types of predictors
  > Demographic: Job & Organization
  > Intended Use by application area
  > Engagement with Sun: Newsletter list combinations
• Solaris product registration not used in model – concern that registration was intrinsic in purchase process
• Predict purchase of product x; yes or no.

randomForest Run
• Training set: random sample of 8000 records
• Tuned & repeated to verify stability.
• Final model:

    > train2.rf
    Call:
     randomForest(x = train2[, -1], y = train2[, 1], classwt = c(5, 1),
                  importance = TRUE, proximity = FALSE)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 5
            OOB estimate of error rate: 27.3%
    Confusion matrix:
             Has Paid Not Paid class.error
    Has Paid     2884     1116       0.279
    Not Paid     1068     2932       0.267

Diagnostics – Error Rate by # Trees
[Figure: Purchase model, error rate by number of trees]

Variable Importance Plot
[Figure: Purchase model, variable importance]

Partial Dependence Plot Example
[Figure: partial dependence plot]

Model worked well!
Purchase model validation: trained on 8,000 records, validated on entire database.
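As a cross-check on the model being validated here, the out-of-bag error and the per-class errors in the randomForest printout follow directly from its confusion matrix counts. A minimal Python sketch of that arithmetic; the function names are illustrative and are not part of the randomForest package:

```python
# Out-of-bag confusion matrix counts copied from the randomForest printout.
# Outer keys are the true class, inner keys the predicted class.
confusion = {
    "Has Paid": {"Has Paid": 2884, "Not Paid": 1116},
    "Not Paid": {"Has Paid": 1068, "Not Paid": 2932},
}


def class_error(matrix, label):
    """Share of records of the true class `label` that were misclassified."""
    row = matrix[label]
    return 1 - row[label] / sum(row.values())


def overall_error(matrix):
    """Share of all records that were misclassified."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return 1 - correct / total


if __name__ == "__main__":
    for label in confusion:
        print(label, round(class_error(confusion, label), 3))
    print("overall", round(overall_error(confusion), 3))
```

The two class errors are close to each other, which fits the balanced 4,000/4,000 rows of the matrix.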
% depth of file    Cumulative % purchased
20%                3.4%
40%                7%
60%                13%
80%                30%

Predictors by Importance
[Figure: Purchase model variable importance plot, annotated with Tier 1, Tier 2 and Tier 3]

Variables that Predict Purchase
● On the whole, those who raise their hands and identify themselves with more detailed company-related information tend to purchase
● Specificity is the key to all the variables
● 3 tiers of variables in order of importance

Tier 1               Tier 2                    Tier 3
- Email domain       - Organization (dept.)    - Industry
- List membership    - Job Role                - Everything else
- Company            - Number of Lists

A Closer Look at Tier 1
● Users that put in a corporate domain tend to purchase. Users that use free web-based email such as Yahoo or Gmail do not
● People that enter a valid company name tend to purchase
● Opting in for most lists has a positive association with purchase, while a few lists and not opting in at all have a negative association
  ➢ Some lists are better than others. Developers and students do not purchase. SysAdmins and generalists do when they are subscribed to multiple lists

A Closer Look at Tier 2
● Organization (Department)
  ➢ IT, MIS, Sys Admin positions have a high tendency to purchase
● Job Role (more of a functional attribute)
  ➢ The more specific the role that is entered, the higher the association with purchase, e.g. Systems Administrator
● The more email lists an individual signs up for, the stronger the association with purchase

Digital Persona Cluster Model

Clustering Data Source – 1 of 2
• Surveys from 20k respondents
  > All within same time frame
  > All requesting Solaris download
• This survey started after modeling data pull
  > Totally independent data used here!
• Question 1 free-form inputs coded as 1 for a reasonable answer, 0 for none or silly.
• Rest coded 1 if ticked, else 0.
  > “Other” coded 1 if used

Clustering Data Source – 2 of 2
Variables:
Q1. Licenses?
Q2. Roles?
Q3. Hardware systems?
Q4. Why most interested?
Q5. What applications?

Choice of Clustering Method
• In this kind of “check box” survey, checking a box is much more significant than not checking it.
  > If I check “Database,” that says a whole lot more about me than if I don't check “Database.”
• We need an appropriate distance measure.
  > Expectation based Jaccard
• Fritz Leisch's flexclust package for R handles this quite nicely.
  > See Appendix for links & details
• We tell flexclust the number of clusters to find. Starting from a random point, it does so.
• Our challenge is picking a stable & meaningful solution.

One Example 4-cluster Solution
[Figure: example 4-cluster solution]

Another Example 4-cluster Solution
[Figure: another example 4-cluster solution]

Best Solution was 5 Clusters
● SPARC Loyalists
● Other Brand Responders
● Other Brand Non-Responders
● Sun x86 Loyalists
● Students

Digital Persona Segments
[Figure: one bar chart per cluster showing the response rate (0.0 to 1.0) for each survey variable]

Cluster    Segment                       Points    Share
#1         SPARC Loyalists               2313      11.56%
#2         Other Brand Responders        6472      32.36%
#3         X86 Loyalists                 4658      23.29%
#4         Other Brand Non-Responders    2699      13.49%
#5         Students                      3858      19.29%

Variables on the chart axis:
  Licenses: Lic_SPARC, Lic_X86
  Roles: Role_Dev, Role_SA, Role_IT_Mngr, Role_IT_Arch, Role_Student, Role_Other
  Hardware systems: Sys_SUNSPARC, Sys_SUN6486, Sys_DELL, Sys_HP, Sys_IBM, Sys_FUJITSU, Sys_Other
  Interests: Int_Multiplatform, Int_Opensource, Int_Pred_Life, Int_Pricing, Int_Support, Int_64_bit, Int_Containers, Int_Performance, Int_DTrace, Int_ZFS, Int_Other
  Applications: Ap_Web_Serving, Ap_Infrastructrue, Ap_Collaboration, Ap_Database, Ap_J2EE, Ap_Desk_Lap_Top, Ap_Development, Ap_High_Perf, Ap_Other

• Red dots represent the average response for the entire population
• Height of the bar represents that cluster's response and can be viewed in context of the average response

Cluster 1: SPARC Loyalists
[Figure: response profile for Cluster 1, 2313 points (11.56%)]
• Heavy SPARC users/licensees
• Tend to be Sys Admins
• Do not show interest in open source

Cluster 2: Other Brand Responders
[Figure: response profile for Cluster 2, 6472 points (32.36%)]
• Sys admins
• Use multi-platform (non-Sun) hardware in their data centers
• Very interested in a variety of applications for Solaris 10

[Figure: response profiles for Cluster 3, X86 Loyalists, 4658 points (23.29%), and Cluster 4, 2699 points (13.49%)]
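The point made under Choice of Clustering Method, that a ticked box says more than an unticked one, is what a Jaccard-type distance encodes: boxes that neither respondent ticked are ignored. The sketch below is a plain Jaccard distance in Python, not the expectation-based variant that flexclust provides, and the six-box responses are hypothetical. It is compared with simple matching, which counts shared blanks as agreement.

```python
def jaccard_distance(a, b):
    """1 - (boxes ticked by both) / (boxes ticked by at least one)."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # convention chosen here: two all-blank forms are identical
    return 1 - both / either


def matching_distance(a, b):
    """Share of boxes on which the two respondents differ (blanks count)."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# Hypothetical respondents answering six check boxes (1 = ticked).
only_database = [1, 0, 0, 0, 0, 0]
only_web = [0, 1, 0, 0, 0, 0]
database_and_web = [1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    # No shared ticks: Jaccard treats the two as maximally different,
    # while simple matching is swayed by the four shared blanks.
    print(jaccard_distance(only_database, only_web))
    print(matching_distance(only_database, only_web))
    # One shared tick out of two ticked overall.
    print(jaccard_distance(only_database, database_and_web))
```

With sparse check-box data most boxes are blank for most respondents, so a measure that rewards shared blanks tends to make everyone look alike; ignoring them keeps the ticked boxes in charge of the segmentation.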
