1
Ways of Reducing coverage and sampling error as part of the Total Survey Error Framework for Establishment Surveys in Europe: Recent Developments
Nikola Jovanovski - Sample Solutions BV Carsten Broich - Sample Solutions BV
www.companyname.com © 2016 Startup theme. All Rights Reserved. 2
CONTENT OVERVIEW
Enrichment Methodology Introduction How can we use a Big Data approach to Company Background decrease sampling error in European European Establishment Surveys 4 1 Establishment Surveys Big Data as a crucial part in sampling
Pros and Cons Research Sample challenges and Research Problems Benefits and limitations from using the
2 ESRA 2019 Future considerations 5 enrichment approach
Establishment Sample sources Conclusion What are the different sources of Research Problem address
6 3 establishment sample data for the EU27 Q&A
www.companyname.com © 2016 Startup theme. All Rights Reserved. 3 1. INTRODUCTION
www.companyname.com © 2016 Startup theme. All Rights Reserved. 4
About Sample Solutions
Background Founded in the Netherlands, back in 2009 with the focus on Business & Consumer
Telephone Sample
Sample Survey Platform B2B module Specialized B2B Database designed Survey Research -instant counts and sample
ordering
Multi country B2B sample Projects Eurobarometer, London Economics, PwC, ABB, World Bank, American Express
www.companyname.com © 2016 Startup theme. All Rights Reserved. 5 WHAT IS BIG DATA?
www.companyname.com © 2016 Startup theme. All Rights Reserved. 6 "Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or “ complex to be dealt with by traditional data-processing application software. “
www.companyname.com © 2016 Startup theme. All Rights Reserved. European Establishments 7 Surveys:
Flash Eurobarometer (business survey part)
European Companies Survey (ECS)
European Survey of Enterprises on New and Emerging Risks
(ESENER)
www.companyname.com © 2016 Startup theme. All Rights Reserved. Sampling frame sources per 8 country:
B2B data vendors (finance, marketing,sales purposes)
National Business Register
Chamber of Commerce Directory
www.companyname.com © 2016 Startup theme. All Rights Reserved. 9 2. RESEARCH
www.companyname.com © 2016 Startup theme. All Rights Reserved. 10
Sampling Challenge ~60% telephone Issues among the Sampling sources coverage No Phone number included
➔ Decentralized Sample data vending (1 vendor per country) ◆ Under representativeness among certain: ● Industries ● Company Sizes ● Area typologies ● Subsidiaries(Branches) ◆ Variable sample costs among countries ◆ Limited coverage of Contact details of decision makers Entire frame and officers ◆ No or limited coverage of email addresses Wrong/ Outdated Numbers
www.companyname.com © 2016 Startup theme. All Rights Reserved. Source: Business Count that includes entire frame with and without phone number records 11 Sample comparison: Phone v Email
Establishment sample
➔ Offline - phone
◆ Key breakdown values (Industry code, Size) Phone Email ◆ No direct contact point with decision maker
◆ Overall better response rates • Response rates • Direct approach
➔ Online - email • Coverage among • Desired role
Sole Traders • Convenient time ◆ Key breakdown values (Industry code, Size) • Wide availability of interview ◆ Direct email of Decision Maker • Rapid interview • Low Cost
◆ Targeting specific job roles based on research turnaround time • Non opt-in sampling
www.companyname.com © 2016 Startup theme. All Rights Reserved. 12
Research questions
What are the ways of reducing coverage and sampling error for Establishment
Surveys in EU27: 1. How can Big Data be used for increasing the telephone coverage in the Sampling Frame 2. How can the Sampling frame go online, for a mixed mode sampling approach? 3. How can we ensure sample data accuracy and validity in a international and longitudinal study?
www.companyname.com © 2016 Startup theme. All Rights Reserved. 13 Future considerations:
From ESRA 2019 we have posted these future action steps:
➔ Use Big Data to Identify Company Size(no. of Employees) on businesses with unknown size ➔ Include a validation step in the enrichment process to lower data inaccuracy ➔ Include a Area Typology stratification criteria in order to make a more representable sample to frame ➔ Design sample of all countries from a centralized source ➔ Source Contact person data and email
www.companyname.com © 2016 Startup theme. All Rights Reserved. 14 3. ESTABLISHMENT SAMPLE SOURCES
www.companyname.com © 2016 Startup theme. All Rights Reserved. ESTABLISHMENT SAMPLE ELEMENTS: 15
● Country, City, Postcode, Region, HQ Location, Branches 01 LOCATION Location
● SIC code, Keywords (from LinkedIn), NAICS code, NACE 02 INDUSTRY code
● Employee Range 03 COMPANY SIZE ● Revenue Range
● Founded date 04 COMPANY STATUS ● Operating status, Entity Type
● Email available, Phone available, Website URL, Decision 05 CONTACT OPTIONS maker (C-level)
06 TECHNOLOGY ● Technologies divided into categories
● Departments, Job title 07 CONTACTS INFO www.companyname.com● C-level, VP level, Director, Manager, Non - Manager © 2016 Startup theme. All Rights Reserved. 16
Three sources for Establishment sample sources:
TRADITIONAL ENRICHMENT GENERATED Designed Designed Designed by by by slidefusion slidefusion slidefusion
Email patterns Available lists Online lookups
Basic Business Verification of email Big Data information activity
www.companyname.com © 2016 Startup theme. All Rights Reserved. 17 Traditional sources
National Registers Phone book directories
Open and free access for State or Privately run general public Categorized with NACE and SME sizes Commercial Data Chamber of Commerce vendors Rich in business information Paid access Can be Member only For Marketing, Credit report www.companyname.com and Sales purposes © 2016 Startup theme. All Rights Reserved. Rich in detail 18
CHARACTERISTICS OF TRADITIONAL SOURCES:
Slidefusion copyright
BRAND MANDATORY FREE ACCESS EMAIL ADDRESSES CONTACTS
National Business * Register * *
Phone book *
Commercial Data Vendor *
Chamber of Commerce * * * *
www.companyname.com © 2016 Startup theme. All Rights Reserved. *Applies to majority of cases in the EU27 19
BIG DATA - ENRICHMENT SOURCES
Search Engines Directory lookup 01 02
Social Media Review sites 03 04
www.companyname.com © 2016 Startup theme. All Rights Reserved. 20
Enrichment sources and tools:
Search
Directory lookups
Engine Google Snippet Reverse phone lookups Company name and address search Company Lookups( Company Name + Email address search Address)
Geographic lookup Contact person lookup
Social Media Review Sites
Facebook Company lookup
Phone lookup Geographic lookup Linkedin Contact and Emp. size lookup
www.companyname.com © 2016 Startup theme. All Rights Reserved. SAMPLE DATA ENRICHMENT 21
SAMPLE DATA ENRICHMENT
Technology Contact personnel Sizes Telephone number Misc details Enrichment
Decision makers From employee count For missing records From a variety of VAT number,
Officers And Reports Incorrect/Outdated 300 Web BusinessDescription,
Employees Of branches technologies Listed email addresses www.companyname.com incorporated on the Entity form © 2016 Startup theme. All Rights Reserved.
website 22
Which source can produce which data
Slidefusion copyright
FEATURES Employee size Phone number Contact person Email Address
Directory lookup
Yelp & Tripadvisor
www.companyname.com © 2016 Startup theme. All Rights Reserved. 23
Overall Enrichment rates among elements
75% 60% 40% 20%
Phone Employee size Contacts Email address number
www.companyname.com © 2016 Startup theme. All Rights Reserved. 24 Challenges in Big Data enrichment
Native encoding Incorrect Location
Wrong or no address Recognition and acceptance of information reduces letters such as Greek, Cyrillic, enrichment productivity Nordic etc.
Registration Name Inactive businesses
Legal Name ≠ Merchant Input frames Name includes potential www.companyname.com dead records © 2016 Startup theme. All Rights Reserved. GENERATED SAMPLE SOURCE: 25 EMAIL ADDRESS SAMPLING METHODOLOGY
Input Data Sourced Company URL to create email domain Contact person Name
Generating Patterns Apply input data to the pattern Combinations eg. fi[email protected] [email protected] fi[email protected]
Validation of email activity Screen all combinations using a SMTP service to see whether the email server recognizes
the email addresses
www.companyname.com © 2016 Startup theme. All Rights Reserved. 26
Email Categorization
[email protected] Personal
[email protected] General
Categorization: Lookup table for CompanyEnrichmen general t
Email Name database for personal Freemail Provider list Freemail
[email protected] Personal
[email protected] General
www.companyname.com © 2016 Startup theme. All Rights Reserved. 27 SAMPLE DATA VERIFICATION
VERIFIED NAME
EMAILS NOT VERIFIED NAME
NO STATUS DATA VERIFICATION NAME WORKING
NOT WORKING URLS
NAME REDIRECTING
NAME NO STATUS
www.companyname.com © 2016 Startup theme. All Rights Reserved. 28 5. ENRICHMENT METHODOLOGY
www.companyname.com © 2016 Startup theme. All Rights Reserved. 29 Outline
European enterprises and establishments sampling featured in both online and offline sampling modes Full Sampling method
➔ Mixed Mode Sample (Online and Offline): defined by:
◆ Direct and/or Company Email
◆ Company phone
➔ Merging and blending multiple business sample sources for a
country into one frame ensuring maximum coverage among the
EU27
➔ Collected email address, contact person data and phone number
Offline(phone) Online(email) information from publicly available sources(URLs, Social Media
etc) Figure 2. Estimation of Coverage between offline and online modes EU27 ➔ Generate Email addresses where not available
www.companyname.com➔ Ensure Generated vs. Collected email data (compliance) © 2016 Startup theme. All Rights Reserved. 30 Full Sample Enrichment
Business Register Online lookup Validation Enriching otherwise unavailable Merge and blend sample frame Ensuring generated data for business frame among multiple sources output is valid
Evaluate variables Generation available Data availability differs Inhouse generating from source to source patterns for missing
www.companyname.com data © 2016 Startup theme. All Rights Reserved. Enriching B2B records with phone 31 numbers
Compile list Web Mining Enriched list Bulk web query to find listing Blank records and incorrect The enriched list is of the company phone numbers are submitted delivered to the FW
Input Phone information number
Company Name The phone number + from the result is www.companyname.com Address Information © 2016 Startup theme. All Rights Reserved. extracted ESENER-3 Results: enrichment of 32
wrong telephone data
www.companyname.com © 2016 Startup theme. All Rights Reserved. ESENER-3 Results: Enrichment of 33
without phone
www.companyname.com © 2016 Startup theme. All Rights Reserved. 34 Strategies for limiting sample Coverage & Bias
Sources Cross-enrichment
Source validation Using multiple sources
Cross-validation Sustainability
Referencing one data fragment Keeping the data up to date
across multiple sources
www.companyname.com *Weighting is possible but might get too statistical (adapting selection probability) © 2016 Startup theme. All Rights Reserved. 35 5. PROS AND CONS:
www.companyname.com © 2016 Startup theme. All Rights Reserved. 36
BENEFITS
DETAILED RICH SAMPLE DATA Data contains more data elements than any other individual source 2 3
Verification Centralized source for EU27 data
Data accuracy is One stop shop for 4 validated by multiple comprehensive data in all sources 27 countries 1 5
Sample going Online Increased coverage
Combining phone and online More sample data can be in survey fieldwork accessed via phone or
www.companyname.com © 2016 Startup theme. All Rights Reserved. B. Limitations/Possible Issues 37
Lack of data availability among Sole Traders
Sources limiting available information
Underrepresentation among certain industries ! that are fully offline
Unintentional sourcing of inactive businesses
What are the ethical aspects that should
www.companyname.combe considered when sampling email © 2016 Startup theme.addresses? All Rights Reserved. 38
CONCLUSION
www.companyname.com © 2016 Startup theme. All Rights Reserved. 39
Research questions
What are the ways of reducing coverage and sampling error for Establishment
Surveys in EU27: 1. How can Big Data be used for increasing the telephone coverage in the Sampling Frame 2. How can the Sampling frame go online, for a mixed mode sampling approach? 3. How can we ensure sample data accuracy and validity in a international and longitudinal study?
www.companyname.com © 2016 Startup theme. All Rights Reserved. 40 Future considerations:
From ESRA 2019 we have posted these future action steps:
➔ Use Big Data to Identify Company Size(no. of Employees) on businesses with unknown size (COMPLETED) ➔ Include a validation step in the enrichment process to lower data inaccuracy (COMPLETED) ➔ Include a Area Typology stratification criteria in order to make a more representable sample to frame (PENDING) ➔ Design sample of all countries from a centralized source (PENDING) ➔ Source Contact person data and email (COMPLETED)
www.companyname.com © 2016 Startup theme. All Rights Reserved. 41
SAMPLE SOLUTIONS WWW.SAMPLE.SOLUTIONS
ADDRESS E-MAIL TELEPHONE WEBSITE
Stationsplein 45 [email protected] Direct: 123-4567-9898 www.sample.solutions
3013 AK Rotterdam Fax: 123-4567-9898 The Netherlands
www.companyname.com © 2016 Startup theme. All Rights Reserved. 42
www.companyname.com © 2016 Startup theme. All Rights Reserved.