Social Measurement Depends on Data Quantity and Quality
Total Page:16
File Type:pdf, Size:1020Kb
POINT OF VIEW SOCIAL MEASUREMENT DEPENDS ON DATA QUANTITY AND QUALITY Social Measurement Depends on Data Quantity and Quality As social platforms proliferate, enthusiastic users are generating more data than ever. Social media data are fast becoming the hottest commodity in market research. But can social data yield measurements that are comparable to those from other, more established The future of actionable forms of research? Is it really possible for brand managers to tap into these data streams to gain social media measurement insight into brand equity? is only as strong as its We believe it is too early to say for sure, even standards for data quality. though social data have been used effectively by PR and marketing departments for years. Anne Czernek For crisis management and on-the-fly campaign brands and more than 30 million online conversations Senior Research Analyst assessments, social monitoring involves to determine the most appropriate methodologies Emerging Media Lab Millward Brown/Dynamic Logic watching a wide stream of updates in real time for working with social measurement from a brand and using those to gauge immediate next steps. perspective. Our conclusion? The future of actionable For these purposes, a qualitative sense of the social media measurement is only as strong as its consumer mood is adequate; the precision of standards for data quality. quantitative research is not required. THE “SOCIAL” VOICE: HOW IS But increasingly, insight and brand strategy teams IT DIFFERENT? are interested in using social data, and they would like to place social measurement alongside other Social data are fundamentally different from types of brand metrics (attitudinal, behavioral, and traditional brand measurement data. We can so on). In this context, social data must be treated think of consumers speaking in two different with the same rigor we expect of more traditional voices. As Figure 1 shows, the “survey” voice forms of measurement. Therefore, Millward Brown’s of consumers is captured under structured and Emerging Media Lab has conducted tests across 60 replicable conditions, while their “social” voice is observed in a fluid state. FIGURE 1: SOCIAL VOICE VS. SURVEY VOICE SURVEY VOICE SOCIAL VOICE Guided Replicable Unsolicited Fluid Quantifiable Observational Unmoderated Structured SHARE 1 POINT OF VIEW SOCIAL MEASUREMENT DEPENDS ON DATA QUANTITY AND QUALITY The challenge is first making these two types of Our collective eagerness to be perpetually data structurally comparable and then establishing connected has spawned an ever-expanding linkages across them. Methodological issues ecosystem of technology platforms, and as a arise out of the nature of the samples available result, the prevailing methodology for social to us in these two data sources. Traditional brand media listening is to capture as much social data tracking and brand equity measurement rely on as possible—from Facebook, Twitter, Instagram, observing a statistically similar group of people Pinterest, Tumblr, etc. But should comments over time. Individuals in a quantitative dataset may on news articles be considered? What about be treated differently—their opinions weighted reviews on sites like Yelp? How do we define the more or less heavily—to ensure that the sample is borders of the social universe? representative and consistent. It seems that the edges of the social universe are as murky as those of the real universe; in both cases, the boundaries are expanding at an apparently The only way to address the increasing rate. Without an agreed-upon definition issues of user duplication and of the limits of the universe—or even what “social” means—it’s difficult to know whether we are truly lack of profiling information is to capturing all of the relevant data. put boundaries on our dataset. Of course, there are also significant differences in We need to make some choices. the type of data available across platforms. Blog and forum discussion tends to be more composed and “conversationally” oriented than Twitter and But the social sample is a rolling sample of Facebook updates, which tend to be short and active, not necessarily representative, voices. posted on the fly. And while, to consumers, it may Activists may overamplify topics they care deeply seem like social data points are easily accessible about, while people having positive but ordinary and browsable, from a research perspective, experiences with a brand may not feel compelled platforms control how those data streams are to speak up at all. It is not possible to weight these syndicated at scale. Twitter has monetized its responses effectively because the information “firehose” and charges for full access to it; Yelp, needed to profile respondents is not consistently in contrast, prohibits collection of its reviews for available. Thus it is difficult, if not impossible, research purposes. Table 1 shows the data fields to ascertain whether those who post, tweet, or that are available (or generally inferable) across comment on a brand are representative of the different types of platforms. population of interest. TABLE 1: TYPES OF INFORMATION We expect that social profiling will improve AVAILABLE ACROSS SOCIAL MEDIA over time as we’re able to derive more of these PLATFORMS attributes from implicit relationships. But even with FACEBOOK BLOGS/ TWITTER PERSONAL FORUMS improved profiling, we anticipate ongoing concerns (TWEETS) as basic as user duplication—e.g., how can we PAGES* (POSTS) ascertain whether one individual has posted five Full data stream times—once each on Facebook, Twitter, Tumblr, Full text availability Blogger, and a forum—or whether five different Majority public profiles individuals posted similar content? Historical data DEFINING A SOCIAL UNIVERSE User-level data to MEASURE User profiles Influence In the current environment, the only way to Location address the issues of user duplication and lack Gender of profiling information is to put boundaries on our dataset. We need to make some choices. *Facebook is a special case: Because data is only released in aggregate, none of the measures above are publicly available for research at a user level. SHARE 2 POINT OF VIEW SOCIAL MEASUREMENT DEPENDS ON DATA QUANTITY AND QUALITY Twitter, THE BEST SOCIAL DATA ISSUES RESEARCH SOURCE — FOR NOW But having chosen a data source, our work is Because of all the issues we have mentioned thus not done. Further processing of social data is far, we believe it is necessary to rely on a single needed. Even if Twitter is the best source of source for social data, and to us, Twitter seems to data for social measurement, not all of the data be the strongest candidate. It’s open, it’s mobile, within it is clean and useable—far from it. Our and it’s the world’s largest information platform. tests of over 30 million tweets show that up to 60 And because the Twitter firehose flows with a percent of Twitter data must be removed from good deal of tweet-level information (including the dataset before it is ready for analysis. the full text, user ID, and timestamp), we’re able to ascertain more not only about the text, but also Twitter tends to have two main issues that the users. Thus Twitter meets our requirement for can degrade data quality: respondent-level information in a bounded dataset. 1. Keyword AMBIGuity Moreover, because of the way it is used and Social data are generally collected through perceived by users, Twitter seems to us to be keyword searches, so when a brand name most representative of the broader social sphere. is also a common word, a large proportion of Dynamic Logic’s 2010 AdReaction study found content returned will not be about the brand. For that consumers viewed Facebook as being example, collecting data for the sandwich chain about connecting to friends and family, whereas “Subway” returns many mentions of the New Twitter was seen as an information platform for York City transit system. This not only distorts discovering, sharing, and learning. Though much the themes of conversation topics tracked, but has changed in social media in the intervening can also wreak havoc on other metrics, like years, those observations still hold true. Facebook sentiment. (In this case, a delayed subway is still the central connection platform, even though train might generate many complaints that are consumers are also using smaller, more interest- irrelevant to $5 foot-long sandwiches.) based social networks to share content related to cooking, photography, sports, or other specialized 2. SPAM interests. When the wealth of data from these sites There is widespread proliferation of spam content is shared into the broader social stream, it usually on Twitter. As Figure 2 shows, half of the social comes through Twitter. Thus Twitter seems to us media mentions of a particular CPG brand were the best aggregation of discovery and sharing. spam. When spam comments are removed, FIGURE 2: SPAM PROPORTION OF TOTAL BRAND KEYWORD MENTIONS 30,000 Spam 25,000 CPG brand 20,000 15,000 Tweets 10,000 5,000 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Weeks SHARE 3 POINT OF VIEW SOCIAL MEASUREMENT DEPENDS ON DATA QUANTITY AND QUALITY “net sentiment” tends to go down. (We obtain This research is under way now, and we our measure of net sentiment by adding the anticipate sharing results in the coming months. percentages of positive and neutral content, and from that sum subtracting the percentage of negative comments.) This happens because most spam is neutral in tone; e.g., “Check out this Using social measurement coupon”—so removing it leaves negative sentiment effectively will require unstinting proportionally higher, as illustrated by Figure 3. attention to the quality of data FIGURE 3: EFFECT OF we consider. CLEANING ON SENTIMENT Uncleaned Cleaned The first step in harnessing the brand insights 35% contained in social data is to create a dataset that can be managed by the same principles 30% that govern established, trusted methodologies.