
A Natural Language Processing Approach to Predicting the Persuasiveness of Marketing Communications

Marketing Science Institute Working Paper Series 2020 Report No. 20-104

Siham El Kihal, A. Selin Atalay, and Florian Ellsaesser

“A Natural Language Processing Approach to Predicting the Persuasiveness of Marketing Communications” © 2020 Siham El Kihal, A. Selin Atalay, and Florian Ellsaesser

MSI working papers are distributed for the benefit of MSI corporate and academic members and the general public. Reports are not to be reproduced or published in any form or by any means, electronic or mechanical, without written permission.

1. Introduction

The use of language is ubiquitous in any marketing message communicated, such as blogs, emails, or product descriptions, and so is the decision problem of how to formulate such messages so that they are effective with a wide audience. Most marketing messages are crafted with the goal of facilitating persuasion. Persuasion, changing individuals' attitudes, feelings, or behavior (Rocklage et al. 2018), is a major outcome sought in marketing communications. Language, spoken and written alike, is the fundamental tool used to express a message intended for persuasion. With the growth of the internet and connectivity, companies today have many tools at their disposal that they can use to spread messages of various lengths for numerous purposes. For instance, email, social media posts, live chats, and blogs provide unprecedented opportunities to reach consumers with longer messages than those that can be used on traditional channels such as TV, radio, or newspaper ads. Many companies are trying to make use of these opportunities. Recently, Brandwatch estimated that there are more than 60 million active business pages on Facebook attempting to reach their customers (Brandwatch 2019). The volume of verbal content created online every minute is immense: for example, per minute, 12,989,111 text messages are sent, 473,400 messages are tweeted on Twitter, and 79,740 blog posts are published on Tumblr, pointing to a rising need to better evaluate which message will perform better and be more persuasive (Statista 2018).

The aim of the current research is to understand how the use of language impacts persuasion.

We answer the following research question: What is the role of language in predicting how persuasive a message will be? Our goal in asking this question is twofold. First, we want to contribute to the literature in persuasion by testing the unique role of language. Second, we want to provide guidance to decision makers or those that design marketing messages by developing a tool they can use to formulate their messages and/or assess the persuasiveness of their messages.

To answer this question, we focus on the two main elements of language that comprise each message: (1) the choice of words used to communicate the content of the message, which is referred to as diction, and (2) the arrangement of the words creating the sentences, which is referred to as syntax.


Syntax is the grammatical structure of the words communicating the intended meaning of the sentences. Psycholinguistic studies have established that both diction and syntax are crucial for language processing (Bates 1995). Thus, we expect both these elements to impact how individuals process a message, and thereby impact how persuasive that message is. Building on this understanding, we develop a machine learning approach to predict the persuasiveness of messages as a function of diction and syntax. We use a dataset including 134 debates with 129,480 sentences and a follow-up experiment to measure content and syntactic complexity and predict the persuasiveness of messages. More specifically, we use the LIWC dictionary (Pennebaker et al. 2015) to classify the categories of words used in the message and measure content complexity. We take a natural language processing approach and use universal dependency parsing based on convolutional neural networks to classify the syntax of the sentences in the message and measure syntactic complexity.

The measurement of syntactic complexity in larger corpora of language has only recently become possible with advances in natural language processing (NLP). An emerging stream of research in marketing has used NLP to improve decision-making. For instance, to analyse user-generated content to identify consumer content preferences, predict future changes in sales, or analyse market structures (Archak et al. 2011, Lee and Bradlow 2011, Liu and Toubia 2018, Netzer et al. 2012).

Recently, Timoshenko and Hauser (2019) used advanced NLP methods to identify customer needs from user-generated content as well as to identify new opportunities in marketing contexts such as new product development. Our approach combines advances in NLP with traditional dictionary approaches to investigate how both diction and syntax contribute to predicting the persuasiveness of a message. We show that syntax can be used to predict how persuasive any message will be, beyond diction.

2. Theoretical Background: Language and Persuasion

The Elaboration Likelihood Model (ELM, Petty and Cacioppo 1986, Petty and Wegener 1999) provides a comprehensive account of elements that impact attitude change and persuasion. The ELM is a dual-route theory, which explains that the message, source (i.e., speaker), and context elements of a persuasive communication are processed differently in different contexts to facilitate changes in judgment.


Language is part of the message, and the ELM points to how language may be processed by different individuals in different contexts. The ELM suggests that the motivation and ability of the audience are two factors that are instrumental to predicting how much the audience will be willing or able to elaborate on a message to accept or reject it. The motivation and the ability to elaborate on a message may be due to factors related to the audience (i.e., cognitive skills, involvement), the context (i.e., distractors are present, message is repeated), or to the message itself (i.e., number of arguments, complexity of arguments, comprehensibility of content).

According to the ELM, the impact of a persuasive message is differentiated by the degree of elaborative information processing activity, which is determined by the motivation and ability of the audience. The ELM suggests that individuals who are motivated and able to elaborate on the message presented respond differently to complex messages, compared to those individuals that are not motivated or able to elaborate on the message presented. Complex messages may be harder to comprehend than simple messages due to limited processing abilities or when the motivation to process is low. From the perspective of the ELM, this may mean that complex messages will be more persuasive than simple messages for those individuals that are motivated and able to deliberately process the message, because these individuals will be able to process the content and evaluate the claims made to form their judgments. Complex messages can also be persuasive for those individuals that are not motivated or able to deliberately process the message. Through heuristic processing, these individuals may make inferences (i.e., credibility, expertise) about the information and nonetheless be persuaded by it.

Note that this view equates message complexity with content complexity. From the perspective of language processing, there is, however, another dimension of message complexity. This dimension is syntactic complexity, which is the complexity of the grammatical structure of the words in the message. A message that is not complex in terms of content can still be complex in terms of its syntax.

Syntactic complexity impacts readability, reaction times, and recall (cf. Lowrey 1998, Lowrey 2006, Bradley and Meeds 2002), making it at least as critical as content complexity for the persuasiveness of a message. We expect syntactic complexity to affect the persuasiveness of a message from an information processing perspective.

In order to process any syntax, working memory is required such that the linguistic material can be temporarily stored and processed simultaneously to extract its meaning (Gibson 1998, Lewis et al. 2006). Working memory is the limited resource used for information processing. It is the mental capacity individuals have to allocate to information processing tasks such as reasoning, comprehension, and learning (Cowan 2010, Engle 2001, Britton et al. 1982, Bradley and Meeds 2002).

If there is not enough mental capacity, proper information processing is not feasible (Basil 1994, Schneider et al. 1984, Shiffrin and Schneider 1977, Lang 2000, Lang 2006). The limited capacity of working memory that impacts learning tasks such as serial recall also impacts language processing (McElree 2006, Cowan 2001). To comprehend sentences and the overall meaning of a message, individuals have to process syntax, through a mechanism referred to as sentence parsing.

The process of sentence parsing refers to how individuals break down a sentence into its grammatical components and identify the syntactic relationships between words. Sentence parsing is an automatic process that individuals engage in without thinking about it. The syntactic relations, the links between the words, are referred to as dependencies (Nivre et al. 2016, McDonald et al. 2013). For instance, to parse the sentence “Amazon delivers diapers”, the individual identifies the dependencies between the three words: the noun ‘Amazon’, which is the subject of the sentence; the verb ‘delivers’; and the noun ‘diapers’, which is the object. This is done automatically and is key to comprehending the sentence. Albeit automatic, this process consumes working memory resources (Gibson 1998): individuals need to retain the words and the relations between the words in their working memory until they can correctly derive the meaning of the message (Swets et al. 2007).

Since working memory is a limited resource, as the syntactic complexity of a message increases, processing the message becomes harder, and may even be impaired (Gibson 1998). For instance, compared to parsing the sentence “Amazon delivers diapers”, parsing the sentence “Amazon, the retailer that changed how consumers shop, delivers high quality and cheap diapers” exerts higher demand on working memory to identify the dependencies.


Providing support to the premise that syntactic complexity has a cost to working memory, in the context of education, Britton et al. (1982) found that syntactic complexity increases the depletion of working memory resources as measured by reaction times and recall. There is also support in the extant marketing literature. In marketing to date, syntactic complexity has mainly been studied in the context of broadcast and print advertising (cf. Lowrey 1998, Lowrey 2006, Bradley and Meeds 2002), and it has been equated with readability (Lowrey 1998, Lowrey 2006, Metoyer-Duran 1993) or surface structure transformations (i.e., active voice sentences transformed into passive voice; Bradley and Meeds 2002). High syntactic complexity in advertisement scripts was found to lead to lower levels of ad recall and recognition in both broadcast and print advertising, while it led to lower levels of persuasiveness only in print advertising (Lowrey 1998, Lowrey 2006). This converges with findings in the context of advertising slogans showing that, although syntactic complexity does not hurt comprehension of slogans, use of simple language structures such as the active voice increases slogan recognition. According to this stream of research, the use of moderately complex syntax helps with slogan recall as well as attitude change, whereas the use of highly complex syntax hurts slogan recall as well as attitude change (Bradley and Meeds 2002). The role of syntactic complexity holds for both spoken and written communication, such that syntactic complexity has similar effects on recall and attitude change for either modality (Lowrey 2006, Flesch 1951). Nonetheless, it is hard to generalize from the context of advertising to other corpora of persuasive messages, as advertising is a unique context in persuasion due to the length of advertisement scripts or slogans. In addition, most advertisements compete in an environment where they target a wide group of consumers with limited time to try to persuade them. Thus, advertisement scripts or slogans need to be brief, catchy, memorable, and comprehensible.

However, not all persuasive messages are subject to the same constraints as advertising. Messages that are longer and target consumers in individualized contexts may operate differently.

In the current research, we propose that the syntactic complexity of a message will be predictive of the persuasiveness of the message beyond content complexity indicated by the diction of the message. We predict that syntactic complexity will have a negative impact on the persuasiveness of a message, as the depletion of working memory resources due to processing the complex syntax would impair the comprehension of the message. In other words, the fewer the working memory resources that go into processing a message, the easier it will be to comprehend the message (Gibson 1998). Stated bluntly, an argument cannot have an impact if the meaning cannot be processed due to syntactic complexity (Gibson 1998).

3. Capturing the role of content and syntax in persuasive message communication

3.1 Content and persuasion

Content complexity effects on persuasion. The content of a message is indicated by the choice of words used to communicate the message, namely the diction. In the LIWC dictionary we use to study diction (Tausczik and Pennebaker 2010, Pennebaker et al. 2015), content complexity is captured by the dictionary categories: articles, exclusive words, causal words, and negations (Tausczik and Pennebaker 2010). As explained in §2, from the perspective of the ELM, we expect that persuasion will benefit from content complexity.

Other content-based effects on persuasion. Recent work in the domain of persuasion has shown that persuaders spontaneously use more emotional words when they strive to persuade another individual even when they know that using emotional appeals may backfire (Rocklage et al. 2018).

Rocklage et al. (2018) showed, using both real-world data and a lab experiment, that using emotional words is a widely used strategy among persuaders, but it has not been tested whether using emotional words is a successful strategy leading to persuasion. In the LIWC dictionary, the categories positive emotion and negative emotion broadly capture the use of emotion. We include these dictionary categories as diction measures indicating content, given that emotional words are widely used as a persuasive strategy (Rocklage et al. 2018).

Table 1 summarizes the dictionary categories, which are related to content and persuasion.

3.2 Syntax and persuasion

The syntax of a message is indicated by the linguistic relationships between the words used in the message. In the universal dependency approach we use to study syntax (Nivre et al. 2016, McDonald et al. 2013), the relationships between the words are called dependencies. Of the various dependency types identified in this approach, we suggest that syntactic complexity is captured by the use of direct objects, conjuncts, coordinating conjuncts, preconjuncts, open clause complements, and possessives. Specifically, we expect that the use of direct objects as well as the use of coordinating conjuncts will have a positive effect on persuasiveness, while the use of conjuncts, preconjuncts, open clause complements, and possessives will have a negative effect on the persuasiveness of a message. Table 2 provides an overview of the dependency types.

Dependency types. We identify six dependency types that we expect will have an impact on working memory resources either by simplifying the syntax or by increasing its complexity. These dependency types are:

1. A direct object is an object that is transformed by a verb. Sentences with direct objects have simple syntax with low working memory costs. Ninety-six percent of direct objects are right-embedded (Nivre 2015). A right-embedded sentence starts with the main verb, thus indicating a low working memory cost to parse the sentence. Hence, we expect that the use of direct objects will have a positive effect on the persuasiveness of the message.

2. A conjunct is two or more grammatical units connected with each other. Sentences with conjuncts have complex syntax with high working memory costs. Sentences containing conjuncts have a large average distance between the main verb of the sentence and the object it refers to (Nivre 2015). This implies that the main verb of the sentence needs to be kept in memory until the object that the verb refers to has been encountered, and this increases the cost to working memory (Graf and Marcinek 2014). Finally, a conjunct may not be explicitly conjoined by a coordinating conjunct such as “and” or “or”, which may lead to parsing ambiguity (i.e., uncertainty about how to process the conjunct). Parsing ambiguity has a high cost to working memory, as multiple options of how to parse the two parts of the sentence that are conjoined have to be kept in working memory until the parsing ambiguity has been resolved (Engelhardt and Ferreira 2010). Thus, we expect that the use of conjuncts will have a negative effect on the persuasiveness of a message.

3. A coordinating conjunct (e.g., “or”, “and”) is an explicit connection between conjuncts. Sentences with coordinating conjuncts have simple syntax with low working memory costs. Their use reduces parsing ambiguity and thus lowers the cost to working memory. Consequently, we expect that the use of coordinating conjuncts will have a positive effect on the persuasiveness of a message.

4. A preconjunct is the relation between the head of a noun phrase and a word that appears at the beginning of a bracketing conjunction (and puts emphasis on it), such as either, both, or neither. Sentences with preconjuncts have complex syntax with high working memory costs. Following the same logic as conjuncts, we expect that the use of preconjuncts will have a negative effect on the persuasiveness of a message.

5. In an open clause complement, the subject is inferred from an argument external to the open clause complement. In other words, the meaning of an open clause complement is determined by the context, which can create ambiguity, as in the sentence “the chicken is ready to eat” (Spivey-Knowlton and Tanenhaus 2015). The context has to be kept in working memory to parse the open clause complement. Sentences with open clause complements have complex syntax with high working memory costs. Thus, we expect that the use of open clause complements will have a negative effect on the persuasiveness of a message.

6. A possessive indicates a relationship between nouns. The encoding required to link two nouns together increases the cost to working memory, especially if possessive pronouns are used. Possessive pronouns replace nouns or noun phrases to avoid repetition and improve the style of communication, yet they require the recipient of the communication to keep in working memory the initial noun or noun phrase referred to. Hence, sentences with possessives have complex syntax with high working memory costs. Thus, we expect that the use of possessives will have a negative effect on the persuasiveness of a message.

4. Data

To investigate the role of language in predicting how persuasive a message will be, we use publicly accessible debates from an online platform. We crawl transcripts of debates from the platform Intelligence² Debates (IQ2). Intelligence² Debates is a nonpartisan, non-profit organization based in the US. The platform organizes a debate series with the idea of exchanging arguments and contributing to a constructive public discourse on current and contemporary topics such as politics, culture, education, and science. Figure 1 describes the distribution of the topics of debates in our data (see the website for more details: https://www.intelligencesquaredus.org/). The debate series follows the Oxford-style debate format, where one team proposes and the other team opposes a motion.

Each debate is hosted in front of an audience that has paid to attend the debate. A debate usually lasts about 2.5 hours. We crawl transcripts of all 134 debates available on the website by January 2018,¹ resulting in more than 300 hours of debates. The resulting dataset consists of 129,480 sentences including 2,703,624 words. In each debate, there is a host and two teams composed of two to three speakers on each side of the motion. Each team gets the same amount of time to present their arguments. The audience watching the debates is polled for their attitudes on the motion: each individual in the audience is asked to anonymously vote whether they are undecided, for, or against the motion, both before the debate starts and upon completion of the debate. We have access to all aggregate vote results before and after the debate.

We collect and validate the transcripts of the debates provided by IQ2, and the attitude polls from the audience before and after the debate. We manually code all debates, to tag the speakers and identify which speaker belongs to which team.

The debates represent a unique setting for studying the role of language in predicting how persuasive a communication is. First, the debates are intended to persuade an audience, meaning that the main goal of messages communicated in the context of the debates is persuasion. Second, the debates are held in a highly controlled environment. Specifically, the teams chosen to compete to persuade the audience have a similar level of credibility and expertise. Additionally, both teams have approximately equal speaking times. Third, the average 2.5-hour length of the debates allows for a wide variety of language units, with different words and syntax, to occur naturally. Since an experimental manipulation of all possible language unit combinations leading to various levels and types of content and syntax with respect to their complexity would not be feasible, the debates provide a setting in which a large corpus of language can be studied to understand how language affects persuasion. Fourth, the debates cut across a great variety of topics, such as culture, education, and technology (Figure 1). This allows us to test whether the results are topic dependent. Last and most importantly, the audience of the debates is polled to capture their attitudes both before and after the debate without any time lag. This allows us to measure the change in audience attitudes with respect to the motion as a result of the debate, providing us with a unique measure of persuasion. Please note that an additional feature of this dataset is that the initial votes for and against the motion are on average equally distributed. This reduces sample bias in capturing persuasion. Table 3 provides descriptive statistics of the distribution of votes before and after the debates.

¹ Please note that since January 2018, Intelligence² Debates has made changes to how the debates are operated. However, the procedures used to operate the debates up to the end of our data collection period are the same.

5. Method

Dependent Variable. We operationalize persuasion as the change in votes of the audience as captured by the polls before and after the debate. In each debate, we measure the persuasion per debate by computing the difference of the vote results taken before and after the debate, for each of the two teams. We explain how we compute the dependent variable using the numerical example in Table 4. In this example, before the debate, 30% of participants were for the motion, 20% against the motion, while the remaining 50% were undecided. After the debate, 50% were for the motion, 10% against the motion, and 40% were undecided. This means that votes for the motion increased by 20% while votes against the motion decreased by 10%. The total change in votes for and against the motion is 30%. See Table 3 for the descriptives of the dependent variable.
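
For concreteness, the following minimal Python sketch (our illustration, not the authors' code) reproduces the Table 4 example under the reading that the dependent variable sums the absolute shifts in the "for" and "against" shares:

```python
def persuasion(before, after):
    """Total change in votes: absolute shift in the 'for' share plus absolute
    shift in the 'against' share. `before` and `after` are dicts of vote shares
    (in %). Hypothetical helper mirroring the numerical example in Table 4."""
    return (abs(after["for"] - before["for"])
            + abs(after["against"] - before["against"]))

# Table 4 example: 30/20/50 before the debate, 50/10/40 after.
before = {"for": 30, "against": 20, "undecided": 50}
after = {"for": 50, "against": 10, "undecided": 40}
print(persuasion(before, after))  # 20 + 10 = 30
```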

5.1 Measurement of Content Complexity

To compute a measurement of content complexity of the message, we look at diction using a dictionary approach that compares each word used in the corpus against a set of predefined categories of words, namely dictionaries. Specifically, we parse the sentences into their word stems and count word occurrences in each sentence in the dataset. Next, using the LIWC dictionary (Tausczik and Pennebaker 2010, Pennebaker et al. 2015), we classify the counted words into categories. We compute the relative use of each word category in the given message by averaging the frequency of word category occurrences.
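
The LIWC 2015 dictionary itself is proprietary, so the sketch below illustrates the counting logic with a hypothetical mini-dictionary; the category names and word stems are placeholders, not the actual LIWC entries:

```python
import re
from collections import Counter

# Hypothetical mini-dictionary standing in for LIWC categories; each category
# maps to a set of word stems (placeholders, not the real LIWC word lists).
CATEGORIES = {
    "articles":  {"a", "an", "the"},
    "exclusive": {"but", "without", "except", "exclud"},
    "causal":    {"because", "since", "therefore", "caus"},
    "negations": {"no", "not", "never"},
}

def category_rates(sentences):
    """Relative use of each category: matches per word, averaged over sentences."""
    totals = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            continue
        for name, stems in CATEGORIES.items():
            hits = sum(any(tok.startswith(stem) for stem in stems) for tok in tokens)
            totals[name] += hits / len(tokens)
    return {name: totals[name] / len(sentences) for name in CATEGORIES}

print(category_rates(["The team won because it prepared, but not without effort."]))
```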

5.2 Measurement of Syntactic Complexity

To measure syntactic complexity, we extract the dependency relationships we describe in §3.2. Our method uses convolutional neural networks to extract information about the syntactic structure of a sentence in order to reveal its dependency structure (Nivre 2015). The output of the method is a classification of the syntax of the language into dependency types.

We use the dependency parser of the spaCy natural language processing framework to classify the dependency structure of a sentence. Our methodology consists of four steps, as illustrated in Figure 2. To identify the dependency structure of the sentences in the corpus, we first convert each word into a matrix (Step 1). We then perform word embedding into a latent space (Step 2), contextually embed those words (Step 3), and finally perform a transition-based classification to generate the dependency structure (Step 4).
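
To make the pipeline concrete, the sketch below shows how per-sentence frequencies of the dependency types of interest could be obtained from spaCy's pretrained dependency parser; it is our illustration rather than the authors' implementation, and the label strings follow spaCy's English label scheme, which may differ across model versions:

```python
import spacy
from collections import Counter

# Requires an English model, e.g.: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Dependency labels of interest (spaCy's English label scheme).
LABELS = {"dobj", "conj", "cc", "preconj", "xcomp", "poss"}

def dependency_rates(text):
    """Frequency of each dependency type per sentence."""
    doc = nlp(text)
    sentences = list(doc.sents)
    counts = Counter(tok.dep_ for tok in doc if tok.dep_ in LABELS)
    return {label: counts[label] / len(sentences) for label in LABELS}

print(dependency_rates("Amazon, the retailer that changed how consumers shop, "
                       "delivers high quality and cheap diapers."))
```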

Step 1: Dictionary mapping. To classify the relevant dependencies using the dependency parser, we first encode all words into rows of vectors by taking each letter of the word and converting it into a 256-dimensional one-hot encoded vector based on ASCII (a system that converts each letter into a number). Each word is converted into a matrix, where each row represents a letter of the original word.
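
A minimal numpy sketch of this letter-level one-hot encoding (assuming 256 ASCII code points; the exact encoding details are ours):

```python
import numpy as np

def word_to_matrix(word, dim=256):
    """Encode a word as a matrix with one 256-dimensional one-hot row per letter."""
    matrix = np.zeros((len(word), dim))
    for row, letter in enumerate(word):
        matrix[row, ord(letter) % dim] = 1.0  # ASCII code of the letter
    return matrix

m = word_to_matrix("Amazon")
print(m.shape)        # (6, 256): one row per letter
print(m.sum(axis=1))  # each row is one-hot
```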

Step 2: Word embedding. To reduce dimensionality, we map the resulting matrix from Step 1 into a dense embedded word vector of length 256 via word2vec, which is the standard approach in natural language processing (Mikolov et al. 2013, Neelakantan et al. 2014). The mapping is done through a lookup table, which was learned by a skip-gram bi-directional neural network based on co-occurrences of the particular word with other words. In a skip-gram word embedding model, the words of the text are fed to the neural network, following their order in the sentences. Some of the words are left out (i.e., skipped) and the task of the neural network is to learn how to correctly predict those skipped words. Bi-directional means that the neural network has access to words following as well as preceding the skipped word. To predict the skipped words correctly, the neural network learns the co-occurrence structure of words in a language and consequently learns the implicit meanings of words. For example, the words “latte macchiato” and “espresso” have a closely related meaning despite not sharing many letters, as they occur in the context of similar words such as “drink”, “break”, “café”. In this sense, the meaning of a word is defined by the other words it usually co-occurs with. This is important at a later stage, since the meaning of a word also has implications for its function. Through word embedding, the neural network has learned, for example, that in general the words “nice” and “nicely” have a similar relationship as “happy” and “happily” (adjective to adverb). In the next step, the words are then put into their specific context, which is the specific sentence they are uttered in.
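
For illustration, a skip-gram embedding of this kind can be fitted with gensim (argument names assume gensim 4.x); the toy corpus below is ours and far too small to learn meaningful vectors, but it shows the moving parts:

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens. In the paper the embedding is
# a pre-learned lookup table; here we fit a small skip-gram model directly.
corpus = [
    ["i", "drink", "an", "espresso", "during", "my", "break"],
    ["i", "drink", "a", "latte", "macchiato", "at", "the", "cafe"],
]

model = Word2Vec(
    corpus,
    vector_size=256,  # dense embedding dimension
    window=3,         # context words on either side of the target word
    sg=1,             # skip-gram objective: predict context from the target word
    min_count=1,
)

vec = model.wv["espresso"]  # 256-dimensional dense vector for the word
print(model.wv.similarity("espresso", "latte"))
```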

Step 3: Contextual embedding. In this step, we need to grasp the function of words, given the specific relations they stand in with other words in a sentence. This is important since it allows the neural network to learn the different functions of the word “work”, such as a noun in “he went to work” and a verb in “I work in the finance industry”, depending on the context. Similar to Collobert et al. (2011), we use convolutional neural networks, mainly for computational efficiency. Our approach employs a three-gram CNN in which layers are stacked on top of each other. Three-gram implies that each layer looks at three sentence tokens (e.g., three words) at a time (see Figure 3 for an illustration).

The convolutions mix information from one word on either side of the target word, allowing the algorithm to learn the function of the encoded word in the respective context. In the first layer, we have one word on either side, in the second layer two words on either side, in the third layer three words and so forth. This allows us to recalculate the word vector based on the surrounding context.

A single one-dimensional convolution is defined as follows:

(1) $y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k] \cdot h[n-k]$

where x[n] is the input signal (a word vector), h[n] is the impulse response, and y[n] is the output; * denotes convolution. We thus multiply the terms of x[k] by the terms of an index-shifted h[n] and add them up. In our specific case of sentence parsing, the index shift refers to words before and after the target word.
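
A minimal numpy illustration of Equation (1) on a toy signal (our sketch; in the model the filter weights are learned, not fixed):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # input signal (one dimension of a word-vector sequence)
h = np.array([0.25, 0.5, 0.25])     # impulse response (stand-in for learned filter weights)

# Discrete convolution as in Equation (1); mode="same" keeps the output aligned
# with the input so each position mixes the target word with its neighbours.
y = np.convolve(x, h, mode="same")
print(y)
```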

Once we get an output after convolving the vectors of a sentence, we add a bias term b to those outputs to ensure that the neuron only passes a signal after a specific threshold (Equation 2).


Finally, we apply a non-linear activation function, the rectified linear unit (ReLU), defined as g(z) = max(0, z). The resulting activation a[1] of a neuron from the network is illustrated in Figure 4a and can be described as:

(2) $z^{[1]} = W^{[1]} * a^{[0]} + b^{[1]}$

(3) $a^{[1]} = g(z^{[1]})$

The type of filtering for the convolution is learned in terms of the weight vector W. On the initial layer with index 0, the activation a[0] consists of the rows of embedded word vectors of the sentence. The general flow to calculate the activations across layers l is illustrated in Figure 4a and can be described as follows:

(4) $z^{[l+1]} = W^{[l+1]} * a^{[l]} + b^{[l+1]}$

(5) $a^{[l+1]} = g(z^{[l+1]})$

(6) $z^{[l+2]} = W^{[l+2]} * a^{[l+1]} + b^{[l+2]}$

(7) $a^{[l+2]} = g(z^{[l+2]})$

Since deep neural networks suffer from vanishing and exploding gradients between the layers and the potential of fatal information loss in deeper layers (Hochreiter et al. 2001), we add skip connections to the network (see Figure 4b). Skip connections form a bypass between the layers to avoid information loss (Mao et al. 2016). The idea is to enhance the flow of information by passing through the network both the original as well as the transformed input. Thus, the signal is not weakened during the process, and gradients can flow backwards more easily and do not vanish.
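
The sketch below illustrates, in numpy, one hypothetical three-gram convolutional layer with a ReLU activation and an additive skip connection; the dimensions and random weights are arbitrary stand-ins, not the trained network:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def trigram_layer_with_skip(a, W, b):
    """One 3-gram convolutional layer with a skip connection (illustrative sketch).

    a : (T, d) activations, one row per token
    W : (d, 3*d) filter weights mixing [previous, current, next] token
    b : (d,) bias
    """
    padded = np.vstack([np.zeros((1, a.shape[1])), a, np.zeros((1, a.shape[1]))])
    z = np.stack([W @ np.concatenate([padded[t - 1], padded[t], padded[t + 1]]) + b
                  for t in range(1, len(padded) - 1)])
    return relu(z) + a  # skip connection: the original input bypasses the layer

T, d = 5, 8                      # 5 tokens, 8-dimensional embeddings (toy sizes)
a0 = np.random.randn(T, d)       # embedded sentence (layer-0 activations)
W = np.random.randn(d, 3 * d) * 0.1
b = np.zeros(d)
a1 = trigram_layer_with_skip(a0, W, b)
print(a1.shape)                  # (5, 8): same shape, so layers can be stacked
```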

Step 4: Transition-based classification. To classify the dependency structure of a sentence, we use a transition-based parser (Nivre et al. 2006, 2007), which treats parsing as a sequence of actions that produce a parse tree. Here, a classifier is trained to score the possible actions at each stage of the process to determine the parsing process.


The advantages of transition-based parsing are twofold. First, whilst the particular transitions are learned, transitions can be constrained with global rules on dependency types (Choi and Palmer 2011). We know from computational linguistics, for example, that some dependencies cannot follow from others. Second, transition-based parsing is efficient in terms of computation, as only one word has to be added at a time. Therefore, the parser has to classify only the next operation, one step at a time, in a greedy manner (Bohnet 2010).

The transition-based parser is an abstract state machine that processes sentences step by step and has a dependency tree as the final output (Dyer et al. 2015). The parser works in a greedy manner, performing a series of locally optimal decisions that allow for very fast parsing speeds. The transition-based parser has a set of states and a set of transitions to move from one state to another, moving word by word. The parser starts in an initial state with a “buffer” that contains the sentence to be classified and an empty “stack” into which words can be loaded (Figure 5). At each step, the parser asks a guide (which is the classification from the convolutional neural network) to choose between one of several transitions into new states.

In the initial state, the stack is empty since all words are in the buffer (Figure 5). In the terminal state, the buffer is empty and the stack contains a single word. At each state, three types of transitions are possible (Figure 5): shift, left, and right (Eisenstein 2019).

1. Shift: move the next word from the buffer to the stack.

2. Left: add an arc from the last word in the stack, s1, to the second-last word, s2, and remove s2.

3. Right: add an arc from the second-last word on the stack, s2, to the last word, s1, and remove s1.

where s1 and s2 describe the positions of the words in the stack, as illustrated in Figure 5.

Figure 6 illustrates transition-based parsing using the sentence “I ordered a book from Amazon” as an example. Initially, the sentence is loaded into the buffer. In the initial step, only the shift transition is an option, since the stack is empty. Thus, the word “I” is moved to the stack. No dependencies are found since there is only one word on the stack. Another shift transition is performed, pushing “ordered” to the stack. Afterwards, a left transition is performed, where a nominal subject (nsubj) is identified as the dependency relating “I” and “ordered”. Consequently, “I” is removed from the stack. A shift transition is performed, pushing “a” to the stack. Since no dependencies are found, another shift transition is performed, moving the next word “book” to the stack. A left transition is then performed, describing the dependency relation determiner (det) relating the words “a” and “book”. “a” is consequently removed from the stack. After a few more transitions, we reach the terminal state, in which the buffer is empty and the stack contains only one word. The result is the dependency graph of the sentence “I ordered a book from Amazon”, as illustrated in Figure 6.
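
The toy Python sketch below replays a hand-specified transition sequence for this example sentence using the shift/left/right operations defined above; the action sequence and dependency labels are our illustration of the mechanics (in the full system the neural guide chooses each transition):

```python
def parse(words, actions):
    """Replay a given sequence of shift/left/right transitions and collect arcs.

    Simplified sketch of the state machine described above: `actions` plays the
    role of the neural guide that scores and selects transitions.
    """
    buffer = list(words)        # initial state: all words in the buffer
    stack, arcs = [], []
    for action, label in actions:
        if action == "shift":   # move the next word from the buffer to the stack
            stack.append(buffer.pop(0))
        elif action == "left":  # arc from the last stack word s1 to the second-last s2
            s1, s2 = stack[-1], stack[-2]
            arcs.append((s1, label, s2))
            del stack[-2]       # remove s2
        elif action == "right": # arc from the second-last stack word s2 to the last s1
            s1, s2 = stack[-1], stack[-2]
            arcs.append((s2, label, s1))
            del stack[-1]       # remove s1
    return stack, arcs

words = ["I", "ordered", "a", "book", "from", "Amazon"]
actions = [("shift", None), ("shift", None), ("left", "nsubj"),
           ("shift", None), ("shift", None), ("left", "det"),
           ("right", "dobj"), ("shift", None), ("shift", None),
           ("left", "case"), ("right", "obl")]
stack, arcs = parse(words, actions)
print(stack)  # terminal state: a single word left on the stack, buffer empty
print(arcs)   # dependency arcs, e.g. ('ordered', 'nsubj', 'I')
```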

So far, we have described how transition-based parsing works, by performing one of three transitions: shift, left, or right. Now we will explain how a particular transition is chosen through the classification task, which is the output of the convolutional neural network.

The classification task determines both the direction and the particular type of dependency. The output of the dependency classification is a sequence of per-class scores, where each possible dependency type between two particular words is one class. We have the three possible transitions “shift”, “right”, and “left”. The “right” and “left” transitions can each carry one of the 37 possible types of dependencies, while the “shift” transition is a single option. Therefore, there are 37*2+1=75 possible classes. For example, “right direct object” is one class. We determine the dependency class between two words in a sentence by taking the argmax over these per-class scores.
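
A minimal sketch of this classification step, with placeholder dependency labels and random scores standing in for the network output:

```python
import numpy as np

# 37 dependency types x {left, right} + 1 "shift" action = 75 classes.
DEPENDENCY_TYPES = [f"dep_{i}" for i in range(37)]          # placeholder labels
CLASSES = ([f"left_{d}" for d in DEPENDENCY_TYPES]
           + [f"right_{d}" for d in DEPENDENCY_TYPES]
           + ["shift"])

scores = np.random.rand(len(CLASSES))                       # stand-in for CNN output
print(len(CLASSES))                                         # 75
print(CLASSES[int(np.argmax(scores))])                      # chosen transition/label
```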

To increase the efficiency of the algorithm, we use “arc-eager” dependency parsing (Goldberg and Nivre 2012, Nivre 2003). The arc-eager dependency parser allows for an additional transition, called “reduce”, which lets longer-range arcs be formed directly, so that fewer steps need to be taken, improving efficiency. Moreover, the algorithm uses beam search, implying that it is not completely locally greedy (Zhang and Clark 2008, Zhang and Nivre 2011). Rather than evaluating just a single step at a time, it considers the most likely options for several steps ahead and thus overcomes some of the problems of a completely “myopic” algorithm.

The model is trained on a pre-classified corpus, the English OntoNotes corpus (Weischedel 2013), which contains examples of sentences that have been classified in terms of their dependency structure. It contains 635 thousand sentences from newspapers, 200 thousand sentences from broadcast conversations, 300 thousand sentences from the web, and 120 thousand sentences from television conversations. We train our model on this corpus, and the resulting feedback on the correct classification of dependencies takes the form of an error gradient of the loss function. We define the loss function as the multi-label log-loss (Crouch et al. 2002):

(8) $\text{Loss} = \frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{T} -y'_{nt}\,\log_2 y_{nt}$

where $y'_{nt}$ is the correct value of the label for the particular example (a sentence to be parsed) at the particular state of the dependency parser, $y_{nt}$ is the predicted value for the class generated by our neural network (both between 0 and 1), and n is the index of the example. The convolutional neural network is optimized using mini-batch stochastic gradient descent. The gradient flows back through backpropagation, optimizing the convolutional neural networks to perform correct classifications.
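
A small numpy sketch of Equation (8) on toy targets and predictions (three classes instead of 75, for readability):

```python
import numpy as np

def multilabel_log_loss(y_true, y_pred, eps=1e-12):
    """Equation (8): mean over examples of -sum_t y'_nt * log2(y_nt)."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return np.mean(-np.sum(y_true * np.log2(y_pred), axis=1))

# Toy example: 2 parser states, 3 classes, one correct class per state.
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]])
print(multilabel_log_loss(y_true, y_pred))
```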

5.3 Model

To investigate the role of content complexity and syntactic complexity on persuasion, we estimate a linear regression model with persuasion, the change in votes, as the dependent variable. To construct our independent variables, we count the frequency of each of the dictionary categories (content) and the frequency of the types of dependencies (syntax) per sentence. We then calculate the mean of these frequencies for each of the teams in the debate. Recall that our goal is to predict the total change in votes (persuasion) as a function of the content complexity and syntactic complexity of the language used by the “for” and “against” teams. Therefore, we build the independent variables measuring content complexity and syntactic complexity as the difference between the content and syntactic complexity of the language used by each of the teams. Moreover, we rescale these variables (content and syntactic complexity) by multiplying them by a factor of 100. Table 5 provides summary statistics of the variables.

We model persuasion as a function of initial votes, topic of the debate, language content, and language structure. We estimate different models, which include different sets of variables. Model 1, which we define as the baseline model, includes initial votes before the debate and the topic of the debate. Model 2 includes content complexity in addition to the variables included in Model 1. Finally, Model 3 includes content complexity as well as syntactic complexity, in addition to the variables included in Model 1.

(9) $\text{persuasion} = f(\text{initial votes},\ \text{debate topic},\ \text{content},\ \text{syntax})$

To assess the predictive accuracy of the models, we use k-fold cross-validation with k=20. We randomly split the data into 20 folds. We use 19 folds to train the models and use the remaining fold as a holdout to test predictive accuracy in terms of mean squared error (MSE). We repeat the procedure on all possible combinations of the folds (20 times), and calculate the average increase in predictive accuracy (decrease in MSE) across the folds as well as the respective standard deviation. We also calculate an adjusted one-sided t-test, testing whether the increase in predictive accuracy is significant.
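
The procedure can be sketched with scikit-learn as follows; the data here are synthetic stand-ins (the debate-level variables are not reproduced), so only the mechanics of the 20-fold comparison are illustrated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n = 134                                    # one observation per debate
X_base = rng.normal(size=(n, 4))           # stand-ins: initial votes + topic dummies
X_syntax = rng.normal(size=(n, 6))         # stand-ins: the six dependency-type differences
y = (X_base @ np.array([0.3, 0, 0, 0])
     + X_syntax @ np.array([-0.8, 0.5, 0.6, -0.8, 0, 0])
     + rng.normal(scale=0.5, size=n))      # synthetic persuasion outcome

cv = KFold(n_splits=20, shuffle=True, random_state=0)

def cv_mse(X):
    """Average holdout MSE over the 20 folds."""
    scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    return -scores.mean()

mse_base = cv_mse(X_base)
mse_full = cv_mse(np.hstack([X_base, X_syntax]))
print(f"MSE improvement over baseline: {100 * (mse_base - mse_full) / mse_base:.1f}%")
```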

6. Results

Table 6 compares the predictive accuracy of the models when trained with different sets of variables. Model 1 is the baseline model and uses initial votes (initial attitude) and the debate topic (dummies) to predict persuasion. Model 2 adds the measurement of content complexity, namely the LIWC dictionary categories. Model 3, in addition to the variables in Model 2, includes syntactic complexity, namely the dependencies we extracted using convolutional neural networks.

Compared to the baseline model, out of sample MSE improves by 9.4% when adding content complexity to train the model (Model 2). A t-test shows that this improvement in predictive accuracy is not significant. Including syntactic complexity to train the model (Model 3) improves the predictive accuracy significantly (p<.05) compared to the baseline model by 32.0%, which is an additional 22.6% compared to Model 2. The increase in predictive accuracy of Model 3 that includes the syntactic complexity compared to Model 2 is statistically significant (p<.05). Hence, the additional increase in predictive power from adding syntactic complexity is more than twice as high as the increase from adding content complexity.

To understand which content complexity and syntactic complexity elements have a strong effect on persuasion, so that the message communicated can be adjusted accordingly by decision makers, we analyse the regression results in more detail. Table 7 shows the results of our estimation. The baseline model (Model 1), which includes initial votes and debate topic dummies, shows that the initial share of undecided individuals in the audience (before the debate) positively impacts the change in votes (β=0.32; p<.05). This shows that the higher the initial share of undecided individuals in the audience, the more likely it is that these individuals swing to either for or against the motion after the debate.

Impact of content complexity. Including the dictionary categories in the model increased the adjusted R² of the model from 0.16 for the baseline model to 0.19 (Model 2). Recall that we included six dictionary categories that capture the role of content complexity in persuasion (i.e., articles, exclusive words, causal words, negations, positive emotion, and negative emotion). The estimation results show that only one of these dictionary categories has a significant impact on persuasion. We find that the “exclusives” dictionary category (e.g., but, exclude, without) has a significant positive impact (β=7.31; p<.05) on persuasion. This is consistent with our predictions based on the ELM. The usage of “exclusives” is associated with content complexity as well as honesty in communication (Tausczik and Pennebaker 2010), suggesting that both the information provided and the trust gained may have increased persuasion. The remaining dictionary categories that are linked to content complexity, articles, causal words, and negations, do not have a significant impact on persuasion. Though explaining why exclusives have a significant effect on persuasion while articles, causal words, and negations do not is beyond the scope of our data, the findings are consistent with extant literature in marketing. Exclusives may be persuasive because they provide both sides of an argument, preventing the audience from coming up with counterarguments while improving the perceived trustworthiness of the speaker. This is consistent with the literature that studies the use of two-sided messages. In brief, two-sided messages are often viewed as trustworthy because the source communicating the message includes both positive and negative claims about the argument being made, and thus they are more persuasive (Hovland, Janis, and Kelley 1953).

Recall that we included the dictionary categories positive emotion and negative emotion in light of the recent findings that individuals spontaneously use more emotion words when they intend to persuade (Rocklage et al. 2018). The analyses show that using emotional language as a persuasion strategy (Rocklage et al. 2018) does not increase the persuasiveness of the message in this context, even though it is a widely used strategy by persuaders (see Table 5 for descriptives of the use of emotional language in the data). More studies are needed before drawing any conclusions with respect to the effectiveness of using emotional language in persuasion.

Impact of syntactic complexity. Including the dependencies in the model (Model 3) led to a considerable increase in adjusted R², to 0.33 (compared with 0.19 for Model 2), showing a large improvement in the model fit as a result of including syntactic complexity. Recall that we included six dependency types that capture syntactic complexity in persuasion (i.e., conjuncts, preconjuncts, coordinating conjuncts, direct objects, open clause complements, and possessives). The estimation results show that out of the six types of dependency structure, four have a significant impact on persuasion. Consistent with our predictions based on the ELM, we find that the use of the dependency type “conjunct” (β=-7.92; p<.01) reduces persuasion. The use of the dependency type “direct object” (β=4.54; p<.01) increases persuasion. The use of the dependency type “coordinating conjunct” (β=5.60; p<.05) increases persuasion. The use of the dependency type “open clause complement” (β=-8.32; p<.01) reduces persuasion. Finally, we did not find any effect of “preconjuncts” and “possessives” on persuasion. The significant effects are consistent with our expectations listed in §3.2.

Taken together, the results suggest that in order to be persuasive, communicators should avoid the use of complex syntax. This can be accomplished by using direct objects and coordinating conjuncts while avoiding conjuncts and open clause complements.

7. Follow-up Experiment

One unique aspect of our dataset is the competition inherent in it: two teams (for and against the motion) seek to persuade the same audience at the same time. While this is highly representative of some persuasive communication contexts, such as brands competing with other brands in the marketplace or political candidates debating other candidates, one could ask whether the role of syntactic complexity would be different in a context in which the audience is exposed to only a single communicator. Additionally, the debates span a long time, about 2.5 hours on average.

Whereas this allows for various types of syntactic complexity to occur naturally, it also may point to other factors instantiating the cost to working memory. Therefore, one could ask whether the role of syntactic complexity would be different in a context where the audience is exposed to a shorter message. Finally, in our data the persuasiveness of the debate is measured at an aggregate level, so we do not have individual measurements of attitude change. In an attempt to address these concerns, we ran a follow-up experiment at a European business school, in which we asked 74 individuals (average age 21.4, 47% female) to read a text on the benefits of wearing a helmet while biking. In this experiment we measured their attitudes towards wearing a helmet before and after reading the text.

We manipulated the level of syntactic complexity (low/high) while holding the content of the text constant. Syntactic complexity was manipulated using our approach described in §5. Table 8 shows the exact wording of the two texts (Versions 1 and 2), with low and high syntactic complexity, and the complexity measurements for both texts in terms of content and syntactic complexity. The measurements of language complexity show that while the content complexity of both texts is almost identical, the syntactic complexity of Version 1 of the text is considerably lower than the syntactic complexity of Version 2 of the text (see Table 8 for all measurements).

During the experiment, participants were randomly assigned to read one of the two texts.

Before reading the text, we measured participants’ attitudes toward wearing a helmet when riding a bike by asking them “How essential is it for you to wear a helmet when biking?”, on a 7-point scale.

We asked the same question again after reading the text, to measure the change in participants’ attitudes towards wearing a helmet when biking. Distractor questions were included to prevent the participants from guessing the hypothesis.

The results corroborate our main findings. The relative change in attitudes of the group who read the text with high syntactic complexity was considerably lower (M=0.46; SD=0.57) than the relative change in attitudes of the group who read the text with low syntactic complexity (M=0.79; SD=0.98). This difference in individual attitude change is significant (t(72)=-1.78; p=0.039) and provides additional support that high syntactic complexity hurts the persuasiveness of the message (Figure 7). This result provides further evidence for the strong and significant impact of syntactic complexity on the persuasiveness of a message, and highlights how communicators can use our approach to assess and adjust the complexity of their messages.

8. Robustness Tests

To test the robustness of our findings, we run several tests:

 Including only syntactic complexity. To further test the impact of syntactic complexity on persuasion, we estimate a model in which persuasion is only a function of syntactic complexity (see Model 4 in Table 9). The estimation results show that, in general, syntactic complexity has a significant impact on persuasion, and we can replicate our findings (from Model 3 in Table 7) on the impact of the different dependency types on persuasion.

 Exclusion of potential outliers. For one debate, no one in the audience indicated an attitude for the motion before the debate. Classifying this debate as an outlier and excluding it from the data did not change the results (see Model 5 in Table 9).

 Alternative operationalization of syntactic complexity. To test whether the use of alternative measures of syntactic complexity (other than the dependencies) would change the predictive accuracy of our model, we calculated three measures of graph length (a sketch of how such distances can be computed is given after this list). The first measure of graph length is the average dependency distance, which indicates the average length of the relational arcs that each word spans in a sentence (see Figure 6 for an example of relational arcs between words). The second measure of graph length is the maximum dependency distance, which indicates the longest spanning arc of every sentence. Finally, the third measure of graph length is the maximum shortest distance necessary to understand a sentence, which indicates the minimum length that has to be parsed by a reader to comprehend a sentence. Using these three measures of graph length to measure syntactic complexity (Model 6 in Table 9) shows that only the maximum dependency distance has a negative impact on persuasion (β=-4.13, p<.10), showing that as the longest spanning arcs of sentences become longer, persuasion decreases. Moreover, in terms of predictive accuracy, we find that using the dependency distances as an alternative measure of syntactic complexity improved the model's predictive accuracy by 22.6% compared to the baseline model. This is an improvement in predictive accuracy of 13.2% compared to Model 2, which only includes content in addition to the baseline model. However, this model with the alternative measures of syntactic complexity underperforms compared to Model 3, where the dependency types are included to capture the syntactic complexity and where we were able to achieve a 32.0% improvement in predictive accuracy compared to the baseline. This provides further evidence for the power of dependency types in capturing syntactic complexity in message communication.
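
As referenced in the last bullet, the sketch below shows one way such dependency distances could be computed with spaCy, assuming distance is measured as the number of token positions an arc spans (our illustration; the third measure, the maximum shortest distance, is omitted):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_distances(sentence):
    """Average and maximum dependency distance for one sentence."""
    doc = nlp(sentence)
    # Arc length = number of token positions between a word and its head;
    # the root (whose head is itself in spaCy) is skipped.
    arcs = [abs(tok.i - tok.head.i) for tok in doc if tok.head is not tok]
    return sum(arcs) / len(arcs), max(arcs)

print(dependency_distances("Amazon, the retailer that changed how consumers shop, "
                           "delivers high quality and cheap diapers."))
```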

9. Discussion

Building on the premise that language is a fundamental element of any persuasive communication, we examined two key aspects of language, diction and syntax, to test their unique roles in predicting the persuasiveness of a message. Based on the ELM (Petty and Cacioppo 1986), we predicted that language complexity, as indicated by both the content complexity and the syntactic complexity of a message, will have an impact on the persuasiveness of the message. We tested our predictions using dictionary approaches to measure content complexity, and a universal dependency grammar approach to measure syntactic complexity, in a unique dataset comprising 134 debates with 129,480 sentences on different topics. We found a positive effect of content complexity on persuasion as captured by the use of the dictionary category exclusives. Our main finding is that syntactic complexity has a significant and strong impact on persuasion: when the syntax is more complex, the persuasiveness of the message is diminished. Our model shows that using both content complexity and syntactic complexity as predictors improves the accuracy of predicting the persuasiveness of a message by about 32%, compared to a baseline model where use of language is not factored in. Furthermore, we show that adding syntactic complexity to the baseline model has roughly twice the effect of adding content complexity to the baseline model in terms of increasing the predictive power of the model. Stated bluntly, our results suggest that syntactic complexity is a stronger predictor of the persuasiveness of a message than content complexity, highlighting the importance of considering syntactic complexity while designing a message. This is consistent with the literature that argues that an argument cannot make the desired impact (i.e., be persuasive) if the content cannot be processed due to the syntactic complexity of the message (Gibson 1998). The results are corroborated by a follow-up experiment in which the syntactic complexity of a message was manipulated, and its effect on the persuasiveness of the message was measured.

In a marketing context, our work is one of the first to use convolutional neural networks to enable the automatic and effective classification of language dependency structures. Our methodology can be employed for both short and large corpora of communications. This makes it easy for academics and practitioners alike to use our model as a tool in research as well as in managerial decision making. Marketers today have access to large amounts of textual data, as well as a vast number of opportunities to reach their customers (online chats, blogs, social media). In a similar vein, consumers today expect and demand more communication from companies. We provide practitioners with a tool to assess language in which we account for both diction and syntax simultaneously, to develop and predict the persuasiveness of their messages. Our tool can be used for both spoken and written messages. It is not restricted to the length, motivation, or content of the message.

In the current work, we leveraged the predictive power of syntactic complexity to build a model extracting syntactic complexity and making predictions of its impact on persuasive outcomes.

Our work introduces a novel use of Natural Language Processing for decision making with respect to how to formulate messages as well as how to assess their complexity to increase persuasion. The model is not restricted to persuasion as an outcome. Using our measurement of syntactic complexity, syntax can be studied both as a predictor and as an outcome to further understand the role of language in a marketing context. For example, online retailers can use our method to analyse online product descriptions and assess the complexity of the language used as well as how this complexity ties to quality, sales, expectations about products, and satisfaction outcomes. An entrepreneur posting a project on Kickstarter may use our model to design the product description in an effective way, using language that is beneficial for persuasion, and convince more supporters to contribute financially. In another domain, in replying to customer emails, our tool can help improve the quality and effectiveness of the email communication by helping marketers formulate a message that can be comprehended more easily, above and beyond its content. Taken together, our model can serve as a decision-making tool that can be used both for making predictions about the success of verbal communications and for identifying and fixing verbal communications that are not effective.

As language is ubiquitous in communication, our work paves the way for a systematic analysis of the elements of language to understand the effects of language use in various domains. From writing a news article to communicating a new policy, it is important to ascertain that the audience can elaborate on the communication; this is the first step toward accomplishing most goals that may be pursued through verbal communication. Needless to say, the benefits of using our model are not restricted to marketing contexts. Companies can improve their internal communication with employees to increase motivation. Contracts can be written more clearly, avoiding misunderstandings, if the impact of language is factored in. Educators can improve learning outcomes if they pay attention to language use. Presidential candidates can reach a wider base of voters. Compliance with rules and regulations can increase if more persuasive language is used. As these few examples demonstrate, factoring in the role of language has the potential to improve communication across domains, beyond the context of persuasion.

Our work facilitates the discussion of the role of syntactic complexity, alongside content complexity, in designing effective verbal communications in any field. We find that syntactic complexity is more predictive of the success of persuasive communications than content complexity. Certainly, this should not undermine the importance of content complexity in persuasive communication. Future work can bring speaker and audience effects into this discussion to enhance the predictive power of language use in verbal communications, and further compare the predictive roles of content complexity and syntactic complexity.


10. References

Archak N, Ghose A, Ipeirotis PG (2011) Deriving the pricing power of product features by mining consumer reviews. Management Sci. 57(8):1485–1509.
Baddeley A (1992) Working memory. Sci. 255(5044):556–559.
Baddeley AD, Hitch G (1974) Working memory. Psych. Learning and Motivation. 8:47–89.
Basil MD (1994) Multiple resource theory I: Application to television viewing. Communication Res. 21(2):177–207.
Bates M (1995) Models of natural language understanding. Proc. National Academy of Sciences of the United States of America. 92(22):9977–9982.
Bohnet B (2010) Very high accuracy and fast dependency parsing is not a contradiction. Proc. 23rd Int. Conf. Comput. Linguist. 89–97.
Bradley SD, Meeds R (2002) Surface-structure transformations and advertising slogans: The case for moderate syntactic complexity. Psych. & Marketing. 19(7-8):595–619.
Brandwatch (2019) 53 incredible Facebook statistics and facts. Accessed March 28, 2019, https://www.brandwatch.com/blog/facebook-statistics/.
Britton BK, Glynn SM, Meyer BJ, Penland MJ (1982) Effects of text structure on use of cognitive capacity during reading. J. Ed. Psych. 74(1):51–61.
Choi JD, Palmer M (2011) Getting the most out of transition-based dependency parsing. Proc. 49th Annu. Meet. Assoc. Comput. Linguist. Hum. Lang. Technol. 687–692.
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12(Aug):2493–2537.
Cowan N (2001) The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behav. Brain Sci. 24(1):87–114.
Cowan N (2010) The magical mystery four: How is working memory capacity limited, and why? Curr. Dir. Psych. Sci. 19(1):51–57.
Crouch R, Kaplan RM, King TH, Riezler S (2002) A comparison of evaluation metrics for a broad-coverage stochastic parser. Proc. LREC Beyond PARSEVAL Workshop. 67–74.
Dyer C, Ballesteros M, Ling W, Matthews A, Smith NA (2015) Transition-based dependency parsing with stack long short-term memory. Proc. 53rd Annu. Meet. Assoc. Comput. Linguist. 7th Int. Jt. Conf. Nat. Lang. Process. (Association for Computational Linguistics, Cambridge, MA), 334–343.
Eisenstein J (2019) Introduction to Natural Language Processing (MIT Press, Cambridge, MA).
Engelhardt PE, Ferreira F (2010) Processing coordination ambiguity. Lang. Speech. 53(4):494–509.
Flesch R (1951) The Art of Clear Thinking (Harper, Oxford, England).
Gibson E (1998) Linguistic complexity: Locality of syntactic dependencies. Cognition. 68(1):1–76.
Goldberg Y, Nivre J (2012) A dynamic oracle for arc-eager dependency parsing. Proc. COLING 2012. 959–976.
Graf T, Marcinek B (2014) Evaluating evaluation metrics for minimalist parsing. Proc. Fifth Workshop Cogn. Model. Comput. Linguist. 28–36.
Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J (2001) Gradient flow in recurrent nets: The difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Neural Networks (IEEE Press, New York), 237–244.
Hovland CI, Janis IL, Kelley HH (1953) Communication and Persuasion (Yale University Press, New Haven, CT).
Lang A (2000) The limited capacity model of mediated message processing. J. Communication. 50(1):46–70.
Lang A (2006) Using the limited capacity model of motivated mediated message processing to design effective cancer communication messages. J. Communication. 56(1):57–80.
Lee TY, Bradlow ET (2011) Automated marketing research using online customer reviews. J. Marketing Res. 48(5):881–894.
Lewis RL, Vasishth S, Van Dyke JA (2006) Computational principles of working memory in sentence comprehension. Trends Cogn. Sci. 10(10):447–454.
Liu J, Toubia O (2018) A semantic approach for estimating consumer content preferences from online search queries. Marketing Sci. 37(6):930–952.
Lowrey TM (1998) The effects of syntactic complexity on advertising persuasiveness. J. Consumer Psych. 7(2):187–206.
Lowrey TM (2006) The relation between script complexity and commercial memorability. J. Advertising. 35(3):7–15.
Mao XJ, Shen C, Yang YB (2016) Image restoration using convolutional auto-encoders with symmetric skip connections. arXiv preprint arXiv:1606.08921.
McElree B (2006) Accessing recent events. Ross BH, ed. The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 46 (Elsevier Academic Press, San Diego, CA), 155–200.
Metoyer-Duran C (1993) The readability of published, accepted, and rejected papers appearing in College & Research Libraries. College & Research Libraries. 54(6):517–526.
Mikolov T, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 3111–3119.
Neelakantan A, Shankar J, Passos A, McCallum A (2014) Efficient non-parametric estimation of multiple embeddings per word in vector space. Proc. 2014 Conf. Empir. Methods Nat. Lang. Process. 1059–1069.
Netzer O, Feldman R, Goldenberg J, Fresko M (2012) Mine your own business: Market-structure surveillance through text mining. Marketing Sci. 31(3):521–543.
Nivre J (2003) An efficient algorithm for projective dependency parsing. Proc. 8th Int. Workshop Parsing Technol. 149–160.
Nivre J (2015) Towards a universal grammar for natural language processing. Int. Conf. Intell. Text Process. Comput. Linguist. 3–16.
Nivre J, Hall J, Nilsson J (2006) MaltParser: A data-driven parser-generator for dependency parsing. Proc. Fifth Int. Conf. Lang. Resour. 2216–2219.
Nivre J, Hall J, Nilsson J, Chanev A, Eryigit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: A language-independent system for data-driven dependency parsing. Nat. Lang. Eng. 13(2):95–135.
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The Development and Psychometric Properties of LIWC2015 (University of Texas at Austin, Austin, TX).
Petty RE, Cacioppo JT (1986) The elaboration likelihood model of persuasion. Adv. Exp. Soc. Psych. 19:123–205.
Rocklage MD, Rucker DD, Nordgren LF (2018) Persuasion, emotion, and language: The intent to persuade transforms language via emotionality. Psych. Sci. 29(5):749–760.
Schneider W, Dumais ST, Shiffrin RM (1984) Automatic and controlled processing revisited. Psych. Rev. 91(2):269–276.
Shiffrin RM, Schneider W (1977) Controlled and automatic information processing: II. Perceptual learning, automatic attending and a general theory. Psych. Rev. 84(2):127–190.
Spivey-Knowlton M, Tanenhaus M (2015) Perspectives on sentence processing. Frazier L, Keith R, eds. Perspectives on Sentence Processing (Psychology Press, London, UK), 173–195.
Statista (2018) Media usage in an internet minute as of June 2018. Accessed March 11, 2019, https://www.statista.com/statistics/195140/new-user-generated-content-uploaded-by-users-per-minute/.
Swets B, Desmet T, Hambrick DZ, Ferreira F (2007) The role of working memory in syntactic ambiguity resolution: A psychometric approach. J. Exp. Psych. Gen. 136(1):64–81.
Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psych. 29(1):24–54.
Timoshenko A, Hauser JR (2019) Identifying customer needs from user-generated content. Marketing Sci. 38(1):1–20.
Weischedel R (2013) OntoNotes Release 5.0 LDC2013T19. Web download (Linguistic Data Consortium, Philadelphia).
Zhang Y, Clark S (2008) A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing using beam-search. Proc. Conf. Empir. Methods Nat. Lang. Process. 562–571.
Zhang Y, Nivre J (2011) Transition-based dependency parsing with rich non-local features. Proc. 49th Annu. Meet. Assoc. Comput. Linguist. Hum. Lang. Technol., Short Papers Vol. 2. 188–193.

11. Tables and Figures

Table 1: Description of dictionary categories

Dictionary category      Examples
Causation                because, effect
Articles                 a, an, the
Negation                 no, not, never
Exclusives               but, without, exclude
Nonfluencies             er, hm, umm
Positive emotion         love, nice, sweet
Negative emotion         hurt, ugly, nasty

Table 2: Description and illustration of dependency types

Table 3: Descriptives of the vote distribution before and after the debate

Variables                             Mean       St. Dev.    Min        Max
Before the debate
  For the motion                      36.8%      14.2%       0.0%       100.0%
  Against the motion                  31.8%      13.2%       0.0%       63.0%
  Undecided                           31.2%      9.5%        0.0%       51.0%
After the debate
  For the motion                      46.9%      15.7%       0.0%       100.0%
  Against the motion                  44.4%      15.7%       0.0%       85.0%
  Undecided                           8.5%       3.4%        0.0%       17.0%
Absolute change in opinion            -2.3%      21.2%       -46.0%     51.0%
Length of debates: number of words    20,176     50,368
Number of speakers per team           2          2           4

Table 4: Numerical example

                               For the motion    Against the motion    Total change
Before the debate              30%               20%
After the debate               50%               10%
Absolute change in votes       20%               -10%                  30%
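Our reading of how the total change in Table 4 is computed, which is also consistent with the absolute change in opinion reported in Table 3, is the change in the share voting for the motion minus the change in the share voting against it. Stated as a worked version of the example:

```latex
\Delta \;=\; \underbrace{(50\% - 30\%)}_{\text{change for the motion}}
      \;-\; \underbrace{(10\% - 20\%)}_{\text{change against the motion}}
      \;=\; 20\% - (-10\%) \;=\; 30\%
```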

Table 5: Descriptive statistics

Variables                             Mean       Std. Dev.   Min        Max
Initial votes
  For the motion (%)                  37.0%      14.3%       0.0%       100.0%
  Against the motion (%)              31.6%      13.3%       0.0%       63.0%
  Undecided (%)                       31.0%      95.1%       0.0%       51.0%
Topics
  Culture                             0.112      0.316       0.000      1.000
  Economics & Finance                 0.187      0.391       0.000      1.000
  Education                           0.090      0.287       0.000      1.000
  Energy & Environment                0.052      0.223       0.000      1.000
  Health                              0.134      0.342       0.000      1.000
  Law                                 0.149      0.358       0.000      1.000
  Politics                            0.187      0.391       0.000      1.000
  Religion                            0.060      0.238       0.000      1.000
  Science                             0.082      0.276       0.000      1.000
  Sports                              0.030      0.171       0.000      1.000
  Technology                          0.104      0.307       0.000      1.000
  U.S.                                0.440      0.498       0.000      1.000
  World                               0.291      0.456       0.000      1.000
Content
  Causation                           -0.013     0.384       -1.047     0.984
  Articles                            0.011      0.745       -2.008     2.121
  Negation                            -0.035     0.293       -0.879     0.780
  Exclusives                          -0.009     0.492       -1.336     1.555
  Positive emotions                   -0.071     0.489       -1.062     1.225
  Negative emotions                   0.037      0.495       -1.454     1.807
Syntax
  Direct objects                      0.009      0.586       -1.651     1.533
  Coordinating conjunctions           0.010      0.548       -1.328     1.190
  Preconjuncts                        -0.004     0.034       -0.132     0.070
  Conjuncts                           -0.058     0.550       -1.156     1.953
  Possessives                         0.019      0.370       -1.322     0.936
  Open clausal complements            -0.007     0.319       -1.050     0.751

Table 6: Improvement in predictive accuracy

             In-sample MSE    Out-of-sample MSE    Out-of-sample MSE improvement in %
Model 1      94.9             162.8                -
Model 2      86.9             147.5                9.4%
Model 3      67.5             110.7                32.0%

Note: All models include the initial votes and the debate topic.
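The improvement column appears to be computed relative to Model 1's out-of-sample MSE; for Model 3, for instance:

```latex
\text{improvement}
  \;=\; \frac{\text{MSE}^{\text{out}}_{\text{Model 1}} - \text{MSE}^{\text{out}}_{\text{Model 3}}}
             {\text{MSE}^{\text{out}}_{\text{Model 1}}}
  \;=\; \frac{162.8 - 110.7}{162.8}
  \;\approx\; 32.0\%
```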

Table 7: Estimation results

                                   Model 1            Model 2            Model 3
                                   coef (SE)          coef (SE)          coef (SE)
Initial votes
  For the motion                   -0.11 (0.12)       -0.11 (0.12)       -0.03 (0.11)
  Against the motion               0.04 (0.12)        0.03 (0.13)        0.03 (0.12)
  Undecided                        0.32** (0.13)      0.33** (0.13)      0.35*** (0.12)
Content
  Causation                                           1.60 (2.58)        1.51 (2.45)
  Articles                                            0.50 (1.41)        0.52 (1.45)
  Negation                                            -6.16 (4.21)       -5.16 (3.84)
  Exclusives                                          7.31*** (2.49)     6.41*** (2.40)
  Positive emotions                                   -1.76 (2.08)       -0.86 (1.95)
  Negative emotions                                   0.87 (1.99)        1.24 (1.87)
Syntax
  Direct objects                                                         4.54*** (1.64)
  Coordinating conjunctions                                              5.60** (2.61)
  Preconjuncts                                                           21.05 (28.06)
  Conjuncts                                                              -7.92*** (2.39)
  Possessives                                                            -3.92 (2.59)
  Open clausal complements                                               -8.32*** (3.02)
Intercept                          0.99 (11.25)       0.91 (11.13)       -3.39 (10.83)
Adjusted R2                        0.16               0.19               0.33

Notes: (1) Topic dummies are included in all models; (2) ***, **, * denote significance at the 1%, 5%, and 10% level, respectively.

Table 8: Texts used in experimental manipulation and measurements of content and syntactic complexity

Version 1: Low syntactic complexity
"Wearing a bicycle helmet may not seem like the cool thing to do. To protect yourself, wearing a bicycle helmet every time that you ride your bike is essential. Wearing a bicycle helmet is beneficial for many reasons: First, a helmet protects your head. You need to protect your head while biking. Injuries to the head are one of the main causes of death in biking accidents. Helmets are the single most effective way to reduce this risk by up to 85-88%. Second, wearing a helmet, you are setting an example for children. Children under the age of 15 accounted for about 53 percent of bicycle injuries treated in emergency room departments. Third, wearing a helmet keeps your head warm and dry and clean. If you are bike riding at a time of year when the weather is wet or chilly, a bicycle helmet can help to keep your head nice and dry. The helmet would trap the escaping heat from your body. It would also prevent birds pooping on your head while biking."

Version 2: High syntactic complexity
"A bicycle helmet may not seem like the cool thing to wear, if you want to protect yourself, having a bicycle helmet worn every time that you bike is essential. Wearing a bicycle helmet is beneficial for many reasons: First, a helmet protects your head. You need to protect your head while biking, injuries to the head are one of the main causes of death in accidents due to biking and helmets are the single most effective way of risk reduction by up to 85-88%. Second, wearing a helmet, you are an example for children. Children that are under the age of 15 accounted for about 53 percent of bicycle injuries treated in emergency room departments. Third, a helmet keeps your head warm, dry, clean. If you are bike riding at a time of year when the weather is wet or chilly, a bicycle helmet can help to keep your head nice, dry. The escaping heat from your body would be trapped by the helmet. Birds would also be prevented from pooping on your head while biking."

Measurements                       Version 1    Version 2
Content
  Causation                        3            3
  Articles                         17           18
  Negation                         1            2
  Exclusives                       3            4
  Positive emotions                1            1
  Negative emotions                4            4
Syntax
  Direct objects                   14           7
  Coordinating conjunctions        4            2
  Preconjuncts                     0            0
  Conjuncts                        4            5
  Possessives                      7            6
  Open clausal complements         2            3

Table 9: Robustness tests - Estimation results

                                   Model 4             Model 5               Model 6
                                   Only syntax         Excluding outliers    Alternative syntax measures
                                   coef (SE)           coef (SE)             coef (SE)
Initial votes
  For the motion                                       -0.03 (0.11)          -0.06 (0.12)
  Against the motion                                   0.01 (0.12)           0.01 (0.12)
  Undecided                                            0.36*** (0.12)        0.37*** (0.13)
Content
  Causation                                            1.61 (2.48)           2.34 (2.59)
  Articles                                             0.49 (1.46)           0.87 (1.52)
  Negation                                             -5.09 (3.86)          -7.42* (4.42)
  Exclusives                                           6.36*** (2.42)        7.91*** (2.57)
  Positive emotions                                    -0.86 (1.96)          -2.52 (2.10)
  Negative emotions                                    1.12 (1.92)           0.72 (1.97)
Syntax
  Direct objects                   3.76** (1.62)       4.43*** (1.70)
  Coordinating conjunctions        7.94*** (2.44)      5.70** (2.65)
  Preconjuncts                     35.94 (28.09)       21.78 (28.31)
  Conjuncts                        -9.22*** (2.44)     -7.98*** (2.41)
  Possessives                      -2.54 (2.65)        -3.91 (2.60)
  Open clausal complements         -9.03*** (2.97)     -8.29*** (3.03)
Syntactic distances
  Average dependency distance                                                0.00 (0.00)
  Max dependency distance                                                    -4.13* (2.15)
  Min dependency distance                                                    1.50 (3.16)
Intercept                          9.63*** (0.93)      -3.21 (10.89)         -2.31 (11.26)
Adjusted R2                        0.16                0.33                  0.20

Notes: (1) Topic dummies are included in all models; (2) ***, **, * denote significance at the 1%, 5%, and 10% level, respectively.

Figure 1: Distribution of topics of the debates

Note: A debate can be categorized into more than one topic.

Figure 2: Approach for measurement of language structure

Figure 3: Illustration of the three-gram embedding model

Figure 4: Flow of activations through convolutional layers

Figure 5: Types of transitions within the transition-based classification step

Figure 6: Illustration of the transition-based classification step

Note: For the example, we take the transition at each state as given.

Figure 7: Impact of syntactic complexity on persuasion

Note: We measure the relative change in attitudes here as the individual relative change (between pre- and post-reading of the text) in the participant's attitude towards wearing a helmet when biking ("How essential is it for you to wear a helmet when biking?", measured on a 7-point scale).
