
Ways of Reducing Coverage and Sampling Error as part of the Total Survey Error Framework for Establishment Surveys in Europe: Recent Developments

Nikola Jovanovski - Sample Solutions BV
Carsten Broich - Sample Solutions BV

www.companyname.com © 2016 Startup theme. All Rights Reserved.

CONTENT OVERVIEW

1 Introduction: Company Background; European Establishment Surveys; Big Data as a crucial part in sampling
2 Research: Sample challenges and Research Problems; ESRA 2019 Future considerations
3 Establishment Sample Sources: What are the different sources of establishment sample data for the EU27
4 Enrichment Methodology: How can we use a Big Data approach to decrease error in European Establishment Surveys
5 Pros and Cons: Benefits and limitations from using the enrichment approach
6 Conclusion: Research Problem address; Q&A

1. INTRODUCTION


About Sample Solutions

Background: Founded in the Netherlands in 2009, with a focus on Business & Consumer Telephone Sample

Sample Survey Platform, B2B module: Specialized B2B database designed for survey research, with instant counts and sample ordering

Multi-country B2B sample projects: London Economics, PwC, ABB, World Bank, American Express

WHAT IS BIG DATA?

"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

European Establishment Surveys:

Flash Eurobarometer (business survey part)

European Companies Survey (ECS)

European Survey of Enterprises on New and Emerging Risks (ESENER)

Sample sources per country:

B2B data vendors (finance, marketing, sales purposes)

National Business Register

Chamber of Commerce Directory

2. RESEARCH


Sampling Challenge

Issues among the sampling sources:

➔ Decentralized sample data vending (1 vendor per country)
◆ Underrepresentation among certain:
● Industries
● Company sizes
● Area typologies
● Subsidiaries (branches)
◆ Variable sample costs among countries
◆ Limited coverage of contact details of decision makers and officers
◆ No or limited coverage of email addresses

[Figure: telephone coverage of the entire frame. Segments: ~60% telephone coverage; No phone number included; Wrong/outdated numbers]
Source: Business Count that includes entire frame with and without phone number records

Sample comparison: Phone v Email

Establishment sample

➔ Offline - phone
◆ Key breakdown values (Industry code, Size)
◆ No direct contact point with decision maker
◆ Overall better response rates

➔ Online - email
◆ Key breakdown values (Industry code, Size)
◆ Direct email of decision maker
◆ Targeting specific job roles based on research

Phone:
• Response rates
• Coverage among Sole Traders
• Wide availability

Email:
• Direct approach
• Desired role
• Convenient time
• Rapid interview turnaround time
• Low cost
• Non opt-in sampling


Research questions

What are the ways of reducing coverage and sampling error for Establishment Surveys in the EU27?

1. How can Big Data be used to increase the telephone coverage in the sampling frame?
2. How can the sampling frame go online, for a mixed-mode sampling approach?
3. How can we ensure sample data accuracy and validity in an international and longitudinal study?

Future considerations:

At ESRA 2019 we set out these future action steps:

➔ Use Big Data to identify company size (no. of employees) for businesses with unknown size
➔ Include a validation step in the enrichment process to lower data inaccuracy
➔ Include an Area Typology stratification criterion to make the sample more representative of the frame
➔ Design the sample of all countries from a centralized source
➔ Source contact person data and email

3. ESTABLISHMENT SAMPLE SOURCES

ESTABLISHMENT SAMPLE ELEMENTS:

01 LOCATION
● Country, City, Postcode, Region, HQ Location, Branches Location

02 INDUSTRY
● SIC code, Keywords (from LinkedIn), NAICS code, NACE code

03 COMPANY SIZE
● Employee Range
● Revenue Range

04 COMPANY STATUS
● Founded date
● Operating status, Entity Type

05 CONTACT OPTIONS
● Email available, Phone available, Website URL, Decision maker (C-level)

06 TECHNOLOGY
● Technologies divided into categories

07 CONTACTS INFO
● Departments, Job title
● C-level, VP level, Director, Manager, Non-Manager
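As an illustration only, the sample elements listed above could be held in a record type like the following. This is a minimal sketch: the class name, field names and the placeholder phone number are assumptions made for the example, not the schema of any actual database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EstablishmentRecord:
    """One establishment in the sampling frame (hypothetical field names)."""
    # 01 Location
    country: str
    city: Optional[str] = None
    postcode: Optional[str] = None
    # 02 Industry
    nace_code: Optional[str] = None
    # 03 Company size
    employee_range: Optional[str] = None
    # 04 Company status
    operating_status: Optional[str] = None
    # 05 Contact options
    phone: Optional[str] = None
    email: Optional[str] = None
    website_url: Optional[str] = None
    # 07 Contacts info
    contact_job_titles: list[str] = field(default_factory=list)

    def contact_modes(self) -> set[str]:
        """Return the survey modes through which this record is reachable."""
        modes = set()
        if self.phone:
            modes.add("phone")
        if self.email:
            modes.add("email")
        return modes


# Placeholder values, for illustration only.
record = EstablishmentRecord(country="NL", city="Rotterdam", phone="+31 10 000 0000")
print(record.contact_modes())
```

A record with neither phone nor email returns an empty set, which is one way to flag it as a candidate for enrichment.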

Three sources for establishment sample:

TRADITIONAL
● Available lists
● Basic business information

ENRICHMENT
● Online lookups
● Big Data

GENERATED
● Email patterns
● Verification of email activity

Traditional sources

National Registers
● State or privately run
● Categorized with NACE and SME sizes

Phone book directories
● Open and free access for general public

Commercial Data vendors
● Paid access
● For marketing, credit report and sales purposes
● Rich in detail

Chamber of Commerce
● Rich in business information
● Can be member only

CHARACTERISTICS OF TRADITIONAL SOURCES:


BRAND MANDATORY FREE ACCESS EMAIL ADDRESSES CONTACTS

National Business Register * * *

Phone book *

Commercial Data Vendor *

Chamber of Commerce * * * *

*Applies to majority of cases in the EU27

BIG DATA - ENRICHMENT SOURCES

01 Search Engines
02 Directory lookup
03 Social Media
04 Review sites


Enrichment sources and tools:

Search Engine
● Google Snippet
● Company name and address search
● Email address search
● Geographic lookup

Directory lookups
● Reverse phone lookups
● Company lookups (Company Name + Address)
● Contact person lookup

Social Media
● Facebook: Phone lookup
● LinkedIn: Contact and employee size lookup

Review Sites
● Company lookup
● Geographic lookup


SAMPLE DATA ENRICHMENT

Contact personnel
● Decision makers
● Officers
● Employees

Sizes
● From employee count and reports of branches

Telephone number
● For missing records
● Incorrect/outdated

Technology Enrichment
● From a variety of 300 web technologies incorporated on the website

Misc details
● VAT number, business description, listed email addresses, entity form

Which source can produce which data


FEATURES Employee size Phone number Contact person Email Address

LinkedIn

Google

Facebook

Directory lookup

Yelp & Tripadvisor


Overall Enrichment rates among elements

Phone number: 75%
Employee size: 60%
Contacts: 40%
Email address: 20%
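To show how an enrichment rate translates into frame coverage, here is a purely illustrative calculation. It assumes the rate applies to the records that lack the element, and it combines the ~60% telephone coverage quoted earlier with the 75% phone enrichment rate. The resulting figure is a worked example under that assumption, not a result reported in these slides, and the function name is hypothetical.

```python
def coverage_after_enrichment(initial_coverage: float, enrichment_rate: float) -> float:
    """Share of the frame holding an element after enriching the records that lack it.

    Assumption: `enrichment_rate` is the share of *missing* records that
    enrichment manages to fill; wrong or outdated values are ignored here.
    """
    missing = 1.0 - initial_coverage
    return initial_coverage + missing * enrichment_rate


# ~60% telephone coverage before enrichment, 75% phone enrichment rate
phone_coverage = coverage_after_enrichment(0.60, 0.75)
print(f"{phone_coverage:.0%}")
```

If the rate were instead measured against the whole frame, the arithmetic would differ, so the interpretation of the rate should be confirmed before using such a figure.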

Challenges in Big Data enrichment

Native encoding
● Recognition and acceptance of letters such as Greek, Cyrillic, Nordic etc.

Incorrect Location
● Wrong or no address information reduces enrichment productivity

Registration Name
● Legal Name ≠ Merchant Name

Inactive businesses
● Input frames include potential dead records

GENERATED SAMPLE SOURCE: EMAIL ADDRESS SAMPLING METHODOLOGY

Input Data
● Sourced company URL to create email domain
● Contact person name

Generating Patterns
● Apply input data to the pattern combinations, e.g. fi[email protected], [email protected], [email protected]

Validation of email activity
● Screen all combinations using an SMTP service to see whether the email server recognizes the email addresses
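The first two steps above (input data, pattern generation) can be sketched as follows. The example patterns on the slide are obscured in this copy, so the local-part patterns below are common conventions chosen as assumptions, and the function names are hypothetical. The SMTP screening step is left out because it needs a live mail server.

```python
from urllib.parse import urlparse


def email_domain(company_url: str) -> str:
    """Derive the email domain from a company URL, dropping a leading 'www.'."""
    # Prefix scheme-less input with '//' so urlparse treats it as a host.
    parsed = urlparse(company_url if "//" in company_url else "//" + company_url)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


def candidate_emails(first: str, last: str, company_url: str) -> list[str]:
    """Apply a few assumed local-part patterns to a contact name.

    Assumes non-empty first and last names.
    """
    f, l = first.strip().lower(), last.strip().lower()
    domain = email_domain(company_url)
    local_parts = [f"{f}.{l}", f"{f}{l}", f"{f[0]}.{l}", f"{f[0]}{l}", f]
    return [f"{local}@{domain}" for local in local_parts]


# Hypothetical contact and URL, for illustration only.
candidates = candidate_emails("Jane", "Doe", "https://www.example.com/about")
for address in candidates:
    print(address)
```

Each candidate would then be passed to the validation step, and only addresses the mail server recognizes would enter the sample.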


Email Categorization

Company
● [email protected]: Personal
● [email protected]: General

Freemail
● [email protected]: Personal
● [email protected]: General

Categorization:
● Lookup table for general
● Name database for personal
● Freemail provider list
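A rough sketch of the categorization described above, using three small lookup sets. The set contents, the function name and the extra "unknown" category are assumptions made for the example; real lookup tables would be far larger.

```python
import re

FREEMAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}   # hypothetical provider list
GENERAL_LOCAL_PARTS = {"info", "office", "contact", "sales"}    # hypothetical lookup table
KNOWN_FIRST_NAMES = {"jane", "john", "maria"}                   # hypothetical name database


def categorize_email(address: str) -> tuple[str, str]:
    """Return (provider type, address type) for an email address."""
    local, _, domain = address.lower().partition("@")
    provider = "freemail" if domain in FREEMAIL_DOMAINS else "company"
    if local in GENERAL_LOCAL_PARTS:
        kind = "general"
    elif any(token in KNOWN_FIRST_NAMES for token in re.split(r"[._-]", local)):
        kind = "personal"
    else:
        kind = "unknown"  # not on the slide; added so unmatched addresses are not mislabelled
    return provider, kind


print(categorize_email("info@example.com"))
print(categorize_email("jane.doe@gmail.com"))
```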

SAMPLE DATA VERIFICATION

DATA VERIFICATION

EMAILS
● VERIFIED
● NOT VERIFIED
● NO STATUS

URLS
● WORKING
● NOT WORKING
● REDIRECTING
● NO STATUS
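One possible way to assign the four URL statuses above is from the HTTP status code returned when a website is checked. The mapping below is an assumption made for illustration, as is the function name; the slides do not state how the statuses are derived.

```python
from typing import Optional


def url_status(http_status: Optional[int]) -> str:
    """Map an HTTP status code (or None if no response) to a verification label."""
    if http_status is None:
        return "NO STATUS"       # no response obtained, or URL not checked
    if 200 <= http_status < 300:
        return "WORKING"
    if 300 <= http_status < 400:
        return "REDIRECTING"
    return "NOT WORKING"         # client or server error


for code in (200, 301, 404, None):
    print(code, url_status(code))
```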

4. ENRICHMENT METHODOLOGY

Outline

Full Sampling method

European enterprises and establishments sampling featured in both online and offline sampling modes

➔ Mixed-mode sample (online and offline), defined by:
◆ Direct and/or company email
◆ Company phone
➔ Merging and blending multiple business sample sources for a country into one frame, ensuring maximum coverage among the EU27
➔ Collected email address, contact person data and phone number information from publicly available sources (URLs, social media etc.)
➔ Generate email addresses where not available
➔ Ensure generated vs. collected email data (compliance)

Figure 2. Estimation of coverage between offline (phone) and online (email) modes, EU27

Full Sample Enrichment

Business Register
● Merge and blend sample frame among multiple sources

Evaluate variables available
● Data availability differs from source to source

Online lookup
● Enriching otherwise unavailable data for business frame

Generation
● In-house generating patterns for missing data

Validation
● Ensuring generated output is valid

Enriching B2B records with phone numbers

Compile list
● Blank records and incorrect phone numbers are submitted

Input information
● Company Name + Address Information

Web Mining
● Bulk web query to find listing of the company

Phone number
● The phone number from the result is extracted

Enriched list
● The enriched list is delivered to fieldwork (FW)

[Figure: ESENER-3 Results: enrichment of wrong telephone data]

[Figure: ESENER-3 Results: enrichment of records without phone]
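The phone enrichment workflow (compile list, input information, web mining, extraction, enriched list) can be sketched roughly as below. The record keys, the `search` callable and the phone regular expression are all assumptions for illustration. The search function is injected so that no particular search engine or directory API is implied, and a canned stand-in is used here.

```python
import re
from typing import Callable, Optional

# Loose pattern for an international-style number; an assumption, not a validator.
PHONE_RE = re.compile(r"\+?\d[\d\s()./-]{6,}\d")


def needs_enrichment(record: dict) -> bool:
    """Compile list: blank numbers, or numbers flagged as wrong, are submitted."""
    return not record.get("phone") or record.get("phone_wrong", False)


def build_query(record: dict) -> str:
    """Input information: company name + address."""
    return f'{record["name"]} {record["address"]}'


def extract_phone(result_text: str) -> Optional[str]:
    """Extract the first phone-like string from a lookup result."""
    match = PHONE_RE.search(result_text)
    return match.group(0) if match else None


def enrich(records: list[dict], search: Callable[[str], str]) -> list[dict]:
    """Return a copy of the list with phone numbers filled in where found."""
    enriched = []
    for record in records:
        if needs_enrichment(record):
            phone = extract_phone(search(build_query(record)))
            if phone:
                record = {**record, "phone": phone, "phone_wrong": False}
        enriched.append(record)
    return enriched


def fake_search(query: str) -> str:
    """Stand-in for a bulk web query; returns canned listing text."""
    return "Acme BV - Stationsplein 45, 3013 AK Rotterdam - Tel: +31 10 123 4567"


# Hypothetical input records.
records = [
    {"name": "Acme BV", "address": "Stationsplein 45, Rotterdam", "phone": ""},
    {"name": "Beta GmbH", "address": "Hauptstrasse 1, Berlin", "phone": "+49 30 000000"},
]
out = enrich(records, fake_search)
print(out[0]["phone"])
```

Records that already carry an unflagged number pass through unchanged, so the enriched list can be delivered in the same layout as the input.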

Strategies for limiting sample coverage error and bias

Sources
● Source validation

Cross-enrichment
● Using multiple sources

Cross-validation
● Referencing one data fragment across multiple sources

Sustainability
● Keeping the data up to date
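Cross-enrichment and cross-validation can be illustrated with two small helpers: one takes the first available value across sources, the other accepts a value only when enough sources agree on it. The function names, the agreement threshold and the sample values are assumptions for the example.

```python
from collections import Counter
from typing import Optional


def cross_enrich(values: list[Optional[str]]) -> Optional[str]:
    """Cross-enrichment: take the first non-empty value found across sources."""
    return next((v for v in values if v), None)


def cross_validate(values: list[Optional[str]], min_agreement: int = 2) -> Optional[str]:
    """Cross-validation: return the value reported by at least
    `min_agreement` sources, otherwise None."""
    counts = Counter(v for v in values if v)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= min_agreement else None


# One data fragment (a phone number) as reported by three hypothetical sources,
# e.g. a business register, a directory lookup and a social media page.
phone_by_source = ["+31101234567", None, "+31101234567"]
print(cross_validate(phone_by_source))
print(cross_enrich([None, "info@example.com", None]))
```

Values are compared as exact strings here, so in practice they would need normalizing (for example, stripping spaces from phone numbers) before comparison.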

*Weighting is possible but might get too statistical (adapting selection probability)

5. PROS AND CONS:


BENEFITS

Sample going online
● Combining phone and online in survey fieldwork

Detailed rich sample data
● Data contains more data elements than any other individual source

Verification
● Data accuracy is validated by multiple sources

Centralized source for EU27 data
● One-stop shop for comprehensive data in all 27 countries

Increased coverage
● More sample data can be accessed via phone or email

B. Limitations/Possible Issues

Lack of data availability among Sole Traders

Sources limiting available information

Underrepresentation among certain industries that are fully offline

Unintentional sourcing of inactive businesses

What are the ethical aspects that should be considered when sampling email addresses?

CONCLUSION


Research questions

What are the ways of reducing coverage and sampling error for Establishment Surveys in the EU27?

1. How can Big Data be used to increase the telephone coverage in the sampling frame?
2. How can the sampling frame go online, for a mixed-mode sampling approach?
3. How can we ensure sample data accuracy and validity in an international and longitudinal study?

Future considerations:

At ESRA 2019 we set out these future action steps:

➔ Use Big Data to identify company size (no. of employees) for businesses with unknown size (COMPLETED)
➔ Include a validation step in the enrichment process to lower data inaccuracy (COMPLETED)
➔ Include an Area Typology stratification criterion to make the sample more representative of the frame (PENDING)
➔ Design the sample of all countries from a centralized source (PENDING)
➔ Source contact person data and email (COMPLETED)


SAMPLE SOLUTIONS

ADDRESS: Stationsplein 45, 3013 AK Rotterdam, The Netherlands
E-MAIL: [email protected]
TELEPHONE: Direct: 123-4567-9898, Fax: 123-4567-9898
WEBSITE: www.sample.solutions
