R: a Swiss Army Knife for Market Research

R: a Swiss Army Knife for Market Research

R: A Swiss Army Knife for Market Research Enhancing work practices with R Martin Chan – Consultant, Rainmakers CSI 12th September, 2018 1 2 This presentation is about using R in business environments or conditions where its usage is less intuitively suitable… … and a story of how R has been transformative for our work practices 2 3 Who are we? What do we do? 3 4 What How Where Answer strategic Estimate size of market, and size of Across multiple questions to help our opportunity, anticipating trends industries and clients grow profitably markets Utilise resources available within an Inform strategic decisions organisation (e.g. stakeholder knowledge, • FMCG through creative and existing research) • Finance & analytical thinking Insurance Develop consumer targeting frameworks – • Travel identify core consumers and areas of • Media opportunities 4 5 A team with a mix of backgrounds from strategy, brand planning, marketing, research… (where analytics is a key, but only one of the components of our work) 5 6 What’s different about the use of R in our work? 6 Challenges of Using R Nature of data: Nature of U&A Client 1 2 3 disparate, patchy data requirement and expectations (of outputs) U&A – research with the aim to understand a market and identify growth opportunities by answering questions on whom to target, with what, and how. (Source: https://www.ipsos.com/en/ipsos-encyclopedia- 7 usage-attitude-surveys-ua) Nature of data: 1 disparate, patchy 8 9 Historical survey data – often designed for different purposes Stakeholder Interviews Our data come (Qualitative) together in and collected from different samples different forms, like pieces of Population / Demographic data (from census, World Bank Customer Interviews (Qualitative) JIGSAW research etc.) Pricing data Historical segmentation work (e.g. Euromonitor, Nielsen) (past work produced by client’s suppliers) Primary research (e.g. U&A, customer satisfaction surveys) 9 10 Challenges in Using R Disparate and patchy data… Ad-hoc data – easier to explore in Excel 1 Traditionally Excel is an easy tool for these (albeit chaotic) situations when you need to ‘play around’ to figure out your plan of attack 2 Data is small or ‘non-repetitive’ – not worth automating Perception of using R – for automating analysis - is only justifiable if the benefits of reproducibility exceed the costs of … renders it more setting up code (prominent for small data sets) difficult to reap the benefits offered by R 3 Poorly structured / stored data – more time-consuming to use R to clean Condition of the input data (think merged cells in Excel, or data saved in PowerPoint) almost make R even more time- consuming in the short-term 10 Integrating data types 11 Integrating analysis from different Package for Qualitative Data Analysis sources of data under one roof RQDA Qualitative Research Output stakeholder interviews, Understand the “why” and the reasoning telephone interviews; focus behind behaviour, or develop hypotheses / segments – development of a groups, workshops framework of themes and quotes 11 Integrating data types 12 Qualitative Analysis (Without R) Interview or group takes place, which is recorded and transcribed into a text or Word file • Transcripts are annotated and interpreted • Possible themes / hypotheses are documented in a separate document • Quotes are copied out and pasted into a separate document, Keep refining with references to the respondent background and interview framework until it is a number fair and sensible interpretation of the Transcript Quotes Themes different themes emerging from the interviews A framework of themes/hypotheses or a story is developed – and a report is produced using the analysis of the annotations, themes, and quotes 12 Integrating data types 13 RQDA: What is it used for? RQDA 1. A more systematic tool for analysing qualitative data 2. GUI launched in R for marking up themes, quotes from qualitative interviews 3. Creates an analysis output in sqlite database, allowing further analysis or production of outputs within R 4. Integrated with R – use sqlite to extract data, enable exploration of insights through text mining techniques Outputs 1 Word / N-gram frequency analysis → Help generate hypotheses wordcloud 2 Word cloud → Visual output for communicating strategic output tidytext 3 Table output → Ease of retrieving quotes to be used in a presentation 13 Integrating data types 14 RQDA GUI Interface Marked up transcript Outputs Structure of database Code ID Codes / Themes File … 1 Data Science in Practice 1, 2, 3 frequency of occurrence, 2 Industry Differences 1, 2 code themes, etc. 3 R versus Python 3,5 … … … Source: https://leanpub.com/conversationsondatascience/read_s 14 ample Nature of U&A 2 data U&A – research with the aim to understand a market and identify growth opportunities by answering questions on whom to target, with what, and how. (Source: https://www.ipsos.com/en/ipsos-encyclopedia- 15 usage-attitude-surveys-ua) 16 Challenges of Using R 1 Wide data Features of a Large number of variables with relatively few cases Usage & / observations Attitude Survey 2 Design-dependent Variables must be very closely interpreted relative (U&A) to the original research design, e.g. questionnaire wording, rating vs ranking – making it difficult to interpret results within R 16 17 Example Question 1 Attitudinal Q25. For each of the following statements - on a scale of 1 to 6, please select the score which best describes your attitude towards using R packages: I write my own R packages I look extensively for existing whenever it is possible to /do packages as solutions before so considering to produce my own Q25A … 1 2 3 4 5 6 I believe R packages should be I believe R packages should contain comprehensive enough to cover Q25P the minimum number of functions the most likely tasks that I will needed to solve a problem do when solving a problem Wording will also slightly differ if you enter the survey as a different respondent – e.g. R package developer, beginner or heavy R user! 17 18 Example Question 2 Usage Q27. For each of the following tasks, please select the packages that you are likely to use in carrying out the task… SELECT A MAXIMUM OF THE TEN MOST LIKELY PACKAGES Tasks Packages 1. Building a linear model with an 1. dplyr interactive dashboard output 2. tidyr Entered through 2. Writing a package to store 3. data.table search box customised analysis functions for 4. tibble a project 5. shiny 3. Producing publishing quality 6. plotly visualisations 7. ggplot2 … … 12.4. Analyse survey data 110.8. Other (Please enter) 18 19 1 2 Our challenge 3 1 3 4 The meaning 5 represented by Handling both the each variable is variable and value 6 highly dependent labels effectively This explains the on the exact through R when prevalence of wording on both there are often a GUI analysis Number of ends of the scale large number of variables from software such one question these variables, as SPSS and Q which are not 2 supported by data Sparse data frames / tibbles (particularly in example 2) 19 20 Generalised Approach The solution to overcoming this is to use a mixture of the Create empty list following that helps us Loop* through scenarios / categories automate certain workflows: Operations to wrangle / analyse data, e.g. group_by(), summarise(), mutate(), spread(), gather() • Dplyr Output: either as a tibble / data.frame or mschart object Assign output to member of list • for loops / apply → Create a combined output using merge_recurse or merge_all functions → Export Excel output using writexl::write_excel() • merge functions → Export PowerPoint output using mschart package • User-defined functions + timestamp to outputs to document analysis *Or in the form of combining sapply and a UDF, if possible – for computation efficiency; but for loops benefit from readability 20 21 Survey Analysis (With R) Run different iterations Output Import Prepare Analyse Summary writexl tables, e.g. contingency tables VBA for automating formatting Import files as SAV Extract, cleaning, apply functions, loops, and custom PowerPoint (SPSS) or CSV files; and structuring data functions for generating the outputs outputs – sometimes serialised as to prepare for – usually in the form of summary occasionally RDS to save loading analysis tables using SVG time and memory (vector) for Lepton providing an alternative to R visual quality packages for managing ad-hoc 21 functions 22 Outcome • Reproducibility – can trace back to exactly what you have done • A quick and organised way of exploring data through iterations (using custom functions) 22 Client 3 requirement and expectations (of outputs) 23 24 In many business environments there is still a preference of (perceived) simple and reliable outputs over interactive dashboard outputs like Shiny or Tableau 24 25 There are many reasons for this – sometimes it may be the nature of the data (small, ad-hoc datasets that require a lot of cleaning), but also driven by certain specific needs: Clients can easily adapt No requirement to know any 1 content for their own use 2 code (access to data not when sharing with internal dependable on internal skill or stakeholders supplier) Perceived to have fewer Output is not merely data 3 ‘moving parts’, e.g. what 4 visualisation – often a requirement do I when there is an error to heavily annotate or to construct a conceptual framework 5 Familiarity… 25 26 Familiarity (Style) “Content looks great, could you put this in a format that looks less academic?” 26 27 We work with partners and clients from very varied

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    32 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us