Tilburg University

Essays on reporting and information processing

de Kok, Ties

DOI: 10.26116/center-lis-1904

Publication date: 2019

Document Version: Publisher's PDF (also known as Version of Record)

Link to publication in Tilburg University Research Portal

Citation for published version (APA): de Kok, T. (2019). Essays on reporting and information processing. CentER, Center for Economic Research. https://doi.org/10.26116/center-lis-1904


Essays on reporting and information processing

Proefschrift

ter verkrijging van de graad van doctor aan Tilburg University op gezag van de rector magnificus, prof. dr. E.H.L. Aarts, in het openbaar te verdedigen ten overstaan van een door het college voor promoties aangewezen commissie in de aula van de Universiteit op maandag 20 mei 2019 om 16.00 uur door

Ties Cornelis Jouke de Kok

geboren op 16 januari 1992 te Berkel-Enschot.

Promotiecommissie:

Promotores: prof. dr. P.P.M. Joos, prof. dr. J.F.M.G. Bouwens
Overige leden: prof. dr. S. Hollander, prof. dr. M. Clatworthy, prof. dr. E. deHaan

Acknowledgments

In hindsight, I consider completing the PhD similar in spirit to climbing Mount Everest. It is a challenge of persistence and self-discovery that is not possible to complete on your own. It requires people who support you, and it gets a whole lot easier if there are people taking the challenge alongside you. I started the “climb” having no idea what I was embarking on, nor having a clear understanding as to why I wanted to conquer “the mountain” to begin with. However, unbeknownst to me at the time, the journey turned out to be the most important part, with the final destination being just a bonus. More specifically, I have been incredibly fortunate to meet Yusiyu, my partner for life, at the beginning of this journey. Much of my development during the PhD I owe to her, and she made taking a leap of faith on the PhD one of the best decisions of my life.

I would also like to especially acknowledge the help, support, and friendship of Victor van Pelt. Victor, Yusiyu, and myself formed our PhD cohort, and I very much consider us akin to the “Three Musketeers”. I have benefited greatly, both personally and professionally, from having Victor as a friend, colleague, and co-author. He is an excellent researcher who is always willing to pitch in on research ideas and provide critical insights on how to solve research-related problems. Furthermore, his assertive personality proved useful on more than one occasion; you can rely on Victor when something needs to be taken care of. I very much hope that we can continue our traditions of getting together for a beer in the future, wherever it may be!

There are also many people whose support has been instrumental to my PhD. First of all, special thanks to my supervisor Philip Joos. There are many different types of PhD supervisors, and I consider myself lucky to have ended up with Philip. I would describe his supervision as supportive and enabling, without being restrictive. I never felt that Philip was forcing my decision making; instead, he trusted in my autonomy to incorporate his advice and feedback. This trust is something that I greatly appreciate. Furthermore, Philip is the definition of a supervisor who can create great opportunities, as long as you are willing to capitalize on them. Many of my interactions with the accounting community in Europe and the United States have been enabled by the initiatives and network of Philip. A good example of this is his role in realizing my research visit to the University of Washington, which was a great opportunity.

Second of all, special thanks to my second supervisor Jan Bouwens. Jan and I only briefly overlapped at Tilburg, but I still consider him an important mentor. If it wasn’t for him, I might not have been in the PhD to begin with, given that he was the person I talked to, besides Willem Buijink, when I was considering the research master program. I also learned later that he was instrumental in allowing me to enter into the program, even though some of the formal criteria were not in my favor. Jan is also my first co-author, and he opened my mind up to the world of “empirical management accounting”, but he did it in a way that did not restrict my interests in financial accounting topics. This is how it should be done! I greatly enjoy working with Jan, and he is an incredibly nice person to be around. Jan has played an important supporting role throughout my PhD, and I am very grateful for that.

I also would like to express a thank you to my committee members: Stephan Hollander, Mark Clatworthy, and Ed deHaan. They have provided me with important feedback, and my papers have benefited from it greatly. I would like to especially thank Stephan for being a great colleague with whom I have always been able to discuss cool and interesting research ideas. We share a lot of common interests, and I will fondly remember our interactions over Python in particular. Also, I would like to express a special thanks to Ed for being open-minded enough to extend me an invitation to visit the University of Washington. I consider Ed an important third mentor besides Philip and Jan. His advice on my research projects and the US job market has been instrumental to my PhD. It is a privilege to join the University of Washington and be one of Ed’s colleagues.

There are two remaining co-authors to whom I also would like to express a special thank you: Christoph Sextroh and Arnt Verriest. Christoph joined Tilburg half-way through my PhD, and I consider myself lucky to call him my co-author. He is an exceptionally smart researcher and has a very kind and supportive personality. Besides working on research, I have fond memories of playing squash together and hanging out over beer and board games. Arnt and I never overlapped at Tilburg, but our paths crossed through our mutual co-authorship with Jan. I got to know Arnt as a great researcher, but more importantly as a kind and caring person. He is incredibly easy to hang out with and always has a way of making you feel comfortable around him; it truly is a pleasure to work with him.

A special thanks also goes out to all the fellow PhD students over the years at Tilburg. First of all, thanks a lot to everyone who shared a research master cohort with me in addition to Victor and Yusiyu: Xiang, Nan, Xiaochi, Yue, and Thijs. I have many great memories of these first two years. A big thanks also to all the other research

master students and PhD students at Tilburg that I interacted with over the years: Martin, Ruishen, Jingwen, Tim, Ruidi, and Mathieu. You are all great people, and you definitely made my PhD a lot more enjoyable. Besides the accounting PhDs, I also have fond memories of playing video and board games with Clemens, Peter, Carlos, Ricardo, and Tung. I also would like to thank all the PhD students at the University of Washington for being so welcoming to me. In particular, Rosh was a great job market buddy and David was a great office mate. I would like to also especially thank John for his friendship and willingness to help me out during my visit. Even though he did not have to, he made sure that I had someone to hang out with from the very beginning, and without him my visit would have been a lot lonelier!

Thanks as well to the whole accounting department at Tilburg, which has provided me with a great environment to develop myself in. There are some people that I would like to mention in particular. First of all, I would not have been here if it wasn’t for Willem Buijink spotting my potential in his bachelor course. You have opened up a new world for me, Willem! I also would like to thank Bart Dierynck for his support during my PhD. A special thanks also to Bob van den Brand; most of my teaching was for his financial accounting course, and I always had a blast. Hetty Rutten has also set a very high bar with regard to administrative support; she is an instrumental part of the department and helped me a lot throughout my PhD. Lastly, I would like to acknowledge the help and training from Laurence van Lent; he is a very knowledgeable person, and I learned a lot from him during the research master.

Finally, I owe everything to my parents, Jolanda de Kok-Hulzenga and Maarten de Kok. I consider their unconditional support and incredible upbringing instrumental to how I have developed as a person, both personally and professionally. In particular, their spirit of being super hardworking with a very down-to-earth mentality will always be a big inspiration to me. It is hard to describe how lucky I feel to have them as my parents. Furthermore, a special thanks to Anneloes, Joris, and Michiel. They are a fantastic group of siblings that are all achieving great things in their lives, and I could not be prouder of them! A last special thanks goes to two of my best friends, Johan and Tim. Playing video games with them online and getting together to hang out have always been important ways for me to decompress from the PhD. Thanks for sticking around, and I hope to welcome both of you to Seattle some day!

Ties de Kok

Tilburg, April 2019

Contents

1 Introduction...... 1

2 Reporting Frequency and Market Pressure in Crowdfunding Markets 5
2.1 Introduction...... 7
2.2 Background and Literature Review...... 10
2.2.1 Crowdfunding...... 10
2.2.2 Disclosures...... 11
2.2.3 Market Pressure...... 13
2.3 Hypotheses Development...... 14
2.3.1 Moderating effects...... 17
2.4 Setting and Empirical Design...... 18
2.4.1 Data...... 18
2.4.2 Construct operationalization...... 19
2.4.3 Empirical models...... 21
2.5 Results...... 23
2.5.1 Descriptive statistics...... 23
2.5.2 Main results...... 25
2.5.3 Moderating results...... 26
2.6 Additional analyses...... 27
2.6.1 Effect of recently joined funders...... 27
2.6.2 Cross-sectional splits...... 28
2.6.3 Market Pressure Heterogeneity and LDA...... 29
2.7 Conclusion...... 31
Graphs...... 33
Period length visualization...... 33
Mediation results...... 34
Visualization LDA model...... 35
Bibliography...... 36

Appendix A Market Pressure Examples 42

Appendix B Amazon Mechanical Turk procedure 44

Appendix C Variable Definitions 47

Tables 49

3 The effect of allocating decision rights on the generation, application, and sharing of soft information 59
3.1 Introduction...... 61
3.2 Hypotheses Development...... 64
3.2.1 Reallocating decision rights enables effective use of soft information...... 65
3.2.2 Reallocating decision rights impedes effective use of soft information...... 67
3.3 Research Setting...... 68
3.3.1 Credit assessments...... 69
3.3.2 Policy change...... 70
3.3.3 Role of the loan officer...... 71
3.4 Sample Selection and Empirical Design...... 72
3.4.1 Sample selection...... 72
3.4.2 Empirical design...... 72
3.5 Descriptive Statistics and Empirical Results...... 76
3.5.1 Descriptive Statistics...... 76
3.5.2 Main Analysis...... 77
3.5.3 Selection Effects...... 78
3.6 Additional Tests...... 80
3.6.1 Loan officer Fixed Effects...... 80
3.6.2 Loan outcomes...... 80
3.6.3 Pre-screening...... 81
3.7 Robustness Tests...... 82
3.7.1 Likelihood of acceptance...... 82
3.7.2 Common Trend Assumption...... 83
3.8 Conclusions...... 85
Graphs...... 87
Parallel Trends...... 87
Bibliography...... 90

Tables 94

4 Are All Readers on the Same Page? Predicting Variation in Information Retrieval from Financial Narratives 107
4.1 Introduction...... 109
4.2 Background and Related Literature...... 114
4.2.1 Text Characteristics...... 114
4.2.2 User Characteristics...... 116
4.3 Machine-learning Approach and Computational Pipeline...... 118
4.4 Eliciting Variation in Investor Behavior...... 121
4.4.1 Reading and Marking Task...... 121
4.4.2 Financial Literacy...... 122
4.4.3 Participant Pool...... 123
4.4.4 Validation of the Elicitation Procedure...... 124
4.5 Predicting Variation in Investor Behavior and Market Reactions to Financial Narratives...... 128
4.5.1 Machine-learning Approach to Predict Relevance Judgments across Financial Literacy Groups...... 128
4.5.2 Prediction Sample and Descriptive Statistics...... 130
4.5.3 Heterogeneity in User Behavior and Capital Market Outcomes...... 133
4.6 Conclusion...... 136
Graphs...... 137
(a) Text-centric approach...... 138
(b) User-centric approach...... 138
Computational Pipeline to Estimate Variation in User Behavior...... 139
Illustration of the MTurk Instrument used to elicit User’s Behavior...... 140
Trend of IR Heterogeneity and Document/Text Features by Year...... 141
Association of IR and Text Features with MD&A Length...... 142
Bibliography...... 143

Appendix A: Financial Literacy Questions 147

Appendix B: Variable Definitions 151

Appendix C: Machine Learning Details 156

Appendix D: Top 15 Words and Bigrams in Marked Sentences Split by Marking Sentiment 158

Tables 161

List of Figures

Chapter 2: 33
2.1 Estimation results for various period lengths...... 33
2.2 Results of mediation model — Levels estimation...... 34
2.3 Results of mediation model — Changes estimation...... 34
2.4 Visualization of LDA model...... 35

A.1 A small selection of market pressure Twitter messages...... 43

B.1 Screen capture of M-Turk instructions...... 45 B.2 Screen capture of M-Turk task...... 46

Chapter 3: 87
3.1 Ex-ante trend for the main analysis...... 87
3.2 Ex-ante trend for the selection analysis...... 88
3.3 Ex-post trend for the selection analysis...... 89

Chapter 4: 137
4.1 (a) Text-centric approach...... 138
4.2 (b) User-centric approach...... 138
4.3 Computational Pipeline to Estimate Variation in User Behavior...... 139
4.4 Illustration of the MTurk Instrument used to elicit User’s Behavior...... 140
4.5 Trend of IR Heterogeneity and Document/Text Features by Year...... 141
4.6 Association of IR Heterogeneity and Document/Text Features with MD&A Length...... 142

List of Tables

Chapter 2: 5

Table 1: Descriptive Statistics...... 49
Table 2: Main Regressions...... 51
Table 3: Reporting Quality Regressions...... 53
Table 4: Frequency of Unverifiable Announcements...... 54
Table 5: New Backers Regressions...... 55
Table 6: Cross-Sectional Split Regressions...... 56
Table 7: Market Pressure Heterogeneity (LDA) Regressions...... 57

Chapter 3: 59
Table 1: Descriptive Statistics...... 94
Table 2a: Descriptive Statistics...... 96
Table 2b: Descriptive Statistics...... 97
Table 3: Main Results...... 98
Table 4: First Stage Heckman Selection Model...... 99
Table 5: Second Stage Heckman Selection Model...... 100
Table 6: Loan Officer Fixed-Effects...... 101
Table 7: Loan outcome analysis...... 102
Table 8: Screening analysis...... 103
Table 9: Likelihood of acceptance analysis...... 104
Table 10: Placebo Tests...... 105

Chapter 4: 107
Table 1: Regressions of Participants’ Judgments on Marking Behavior...... 161
Table 2: Regressions of Reading Time on Financial Literacy and Text Complexity...... 163
Table 3: Logit of Marking Likelihood on Financial Literacy and Text Complexity...... 164
Table 4: Regressions of Marking Behavior on Financial Literacy and Text Complexity...... 165
Table 5: Top 15 Words and Bigrams in Sentences Marked by Medium or High Literacy Users Only...... 166
Table 6: Predicted Marking Behavior of Financial Literacy Groups...... 167
Table 7: Heterogeneity in Predicted Marking Behavior...... 168
Table 8: Variable Descriptive Statistics for 10-K Filings...... 169
Table 9: Analysis of Predicted Heterogeneity in Information Retrieval Using Post-Filing Date Market Model RMSE...... 170
Table 10: Analysis of Predicted Heterogeneity in Information Retrieval Using Analyst Dispersion as Dependent Variable...... 171
Table 11: Analysis of Predicted Heterogeneity in Information Retrieval and the Level of Institutional Ownership Using Post-Filing Date Market Model RMSE...... 172

Chapter 1

Introduction

In this dissertation, I present three empirical essays on a range of topics relating to reporting and information processing. All of these essays utilize state-of-the-art empirical techniques drawn from computer science along with new data sources to study fundamental accounting questions. More specifically, they cover four primary topics: internal and external reporting practices, narrative disclosures, recent advancements in reporting technologies, and the role of reporting in emerging markets.

Chapter 2 studies the reporting implications of recent technological advancements in funding structures and two-way communication channels. More specifically, this single-author paper studies the role of reporting frequency in crowdfunding markets. Using data from one of the largest reward crowdfunding platforms, I provide the first empirical evidence on the relationship between reporting frequency and the level of market pressure in a scenario where the consumers of an entrepreneur also act as her funders. I develop a text-based measure for market pressure by classifying Twitter messages directed to the entrepreneurs with a machine learning algorithm trained using Amazon Mechanical Turk. The results show a negative association between reporting frequency and the level of market pressure. This result is driven primarily by the reporting part of the updates; a mediation analysis shows that only a small fraction of the relationship is driven by a consumption utility effect. Furthermore, this association is stronger when accompanied by higher quality reporting events, is not influenced by the frequency of unverifiable additional announcements, and is weaker during periods with a large presence of newly joined funders. These results highlight the fact that higher reporting frequencies can lead to reduced agency frictions in crowdfunding markets, even when these markets are characterized by strong myopic market preferences.

Chapter 3 falls into the category of empirical management accounting and studies the use of soft information in the context of internal bank lending decisions. More specifically, we study how the use of soft information, which is difficult to gather, transfer, and verify, is affected by the location of knowledge and the allocation of decision rights. In particular, we examine how operational decisions are affected if controls are introduced that limit the decision rights of loan officers. To that end, we exploit a quasi-natural experiment at a large European bank. The reallocation of decision rights was prompted by a regulator, which required the bank to restructure its loan

decision process. Our findings indicate that this change enables better integration of soft information in credit decisions. These findings are robust to controlling for strategic loan-sorting behavior, manager fixed effects, and the likelihood of acceptance. We also document that this improved integration of soft information is driven by a change in behavior by the loan officers and that it results in better loan outcomes.

Chapter 4 studies the information retrieval process for narrative disclosures from a user perspective by combining innovative tracking techniques deployed on Amazon Mechanical Turk with state-of-the-art machine learning techniques. More specifically, we develop a comprehensive measure for variation in information retrieval based on observed user behavior that is also able to incorporate understudied text characteristics such as the semantics and content of a narrative. Using a tool that tracks reading and marking behavior in a controlled environment, we first document how users with varying degrees of financial literacy retrieve information from financial narratives. We find significant variation among financial literacy groups that cannot be solely explained by text characteristics related to processing costs. Next, we use state-of-the-art machine learning to predict variation in information retrieval for out-of-sample financial narratives, and we show that these predictions are incrementally associated with the post-announcement return volatility. Overall, our results suggest that efforts by regulators and corporations to simplify text characteristics of corporate communications might not resolve all differences in how users retrieve information from financial narratives.

In summary, my dissertation is organized as follows. In Chapter Two, I present my single-authored paper titled “Reporting Frequency and Market Pressure in Crowdfunding Markets”. In Chapter Three, I present a study co-authored with Jan Bouwens titled “The effect of allocating decision rights on the generation, application, and sharing of soft information”. The last chapter presents a study co-authored with Christoph Sextroh and Victor van Pelt titled “Are All Readers on the Same Page? Predicting Variation in Information Retrieval from Financial Narratives”.

Chapter 2

Reporting Frequency and Market Pressure in Crowdfunding Markets

2.1. Introduction

I study the relationship between reporting frequency and market pressure in crowdfunding markets. Reward crowdfunding is an alternative channel for entrepreneurs to bring their ideas to market by raising funds directly from their consumers rather than through traditional financial intermediaries such as venture capitalists. Entrepreneurs increasingly opt to use this funding channel because crowdfunding allows them to retain ownership and provides a cost-effective way to reduce demand uncertainty before the investment decision (Strausz, 2017). Analysts estimate that by 2022 more than US $25 billion will have been raised worldwide through reward crowdfunding.1 This popularity, however, is often considered puzzling from an economic perspective as the lack of regulation and monitoring mechanisms in these crowdfunding markets is expected to result in a moral hazard problem (Gutiérrez Urtiaga and Saez Lacave, 2018).

Disclosures by the entrepreneur provide a mechanism for alleviating this agency friction. Recent studies examine the role of such disclosures and document that projects with higher upfront disclosure quality are able to raise more funding (Ahlers, Cumming, Günther, and Schweizer, 2015; Cascino, Correia, and Tamayo, 2018; Hornuf and Schwienbacher, 2018; Madsen and McMullin, 2018). I extend these findings by studying how the frequency of disclosures interacts with funders’ monitoring behavior after they have provided their funding. In particular, I measure funders’ monitoring behavior as the level of market pressure that the incumbent funders impose on the entrepreneur. Drawing on various concepts in the literature, I define market pressure as communications from funders to the entrepreneur that are intended to steer the entrepreneur’s decisions in a specific direction.

It is not ex ante clear how the disclosure frequency affects the level of market pressure (Gigler, Kanodia, Sapra, and Venugopalan, 2014; Edmans, Heinle, and Huang, 2016). From the perspective of information asymmetry and agency frictions, a higher reporting frequency would be expected to reduce the need for market pressure. Empirical evidence shows that in a traditional capital market context, timelier information reduces the need for active stakeholders to exert market pressure owing to a reduction in moral hazard (Demsetz and Lehn, 1985; Bushman, Chen, Engel, and Smith, 2004; Armstrong, Guay, and Weber, 2010). From the perspective of market myopia, however, the reporting frequency might also increase the level of market pressure. Experimental evidence shows that exposing individuals to a higher evaluation frequency can adversely affect their decision horizon (Thaler, Tversky, Kahneman, and Schwartz, 1997; Gneezy and Potters, 1997; Van Der Heijden, Klein, Müller, and Potters, 2012).

1 Retrieved from: https://www.statista.com/outlook/335/100/crowdfunding/worldwide

The disclosure literature also suggests that higher reporting frequencies might attract funders with short-term preferences (e.g., Wagenhofer, 2014) and might result in more dissatisfied funders owing to increased exposure to the outcome variability (Guo, Finke, and Mulholland, 2015; Casas-Arce, Lourenço, and Martínez-Jerez, 2017).

To empirically study this relationship, I examine a type of reward crowdfunding called “Early Access”. Early-access crowdfunding differentiates itself from “traditional” reward crowdfunding by providing the funders with access to all development versions of the product. It is increasingly replacing platforms such as Kickstarter in the video game and other digital technology industries.2 The early-access model enables me to study the effect of reporting frequency on market pressure because the entrepreneurs provide product updates at varying frequencies that reveal the progress in the product’s development and because the funder-entrepreneur interactions happen on publicly observable communication channels. Furthermore, the funding period is open throughout the development period, which provides the entrepreneur with incentives to care about disclosures and market pressure, as the behavior of incumbent funders is a strong determinant for the investment decisions of future funders (Hildebrand, Puri, and Rocholl, 2017; Hornuf and Schwienbacher, 2018).

I collect publicly available data from the most popular early-access platform, Steam Early Access, using programmatic data-gathering techniques. The Steam platform is the largest digital distributor of games, with more than 125 million registered users and close to 15 million concurrent users during peak hours. My sample consists of 144 projects that became available on the Steam platform between May 2013 and February 2017 as part of the early-access program. An average project is, conservatively estimated, able to attract around US $1.5 million in funding from early-access funders. For my main analyses, this results in 916 observations, with each observation being a 10-week project period. This sample exhibits substantial variation in terms of the number of new product updates, as an average project has an update every 6 weeks but some projects update as frequently as once per week. Each product update reports on the progress made by the entrepreneur but also potentially provides funders with additional consumption utility for receiving new product features. To distinguish the reporting effect from the utility effect, I use a mediation model wherein changes in the product satisfaction ratings serve as a proxy for the utility effect.

In the early-access crowdfunding setting, the primary channel for funders to communicate with the entrepreneurs is through the entrepreneur’s Twitter account. I utilize

2 “Kickstarter in 2017: In depth look at the Games category”, February 1, 2018, by the ICO Partners consulting firm. http://icopartners.com/2018/02/kickstarter-2017-depth-look-games-category/

this characteristic of the setting to develop a new measure for market pressure based on the subset of tweets that both relate to the state of the project and contain a direct or indirect request. This subset of tweets is identified by a machine learning algorithm trained on a sample of 5,000 manually classified tweets that I obtain by deploying a custom classification tool on Amazon Mechanical Turk. I use this algorithm to classify the entire sample of 220,000 tweets directed toward entrepreneurs, which yields a subsample of 105,000 tweets that plausibly impose market pressure on the entrepreneur. On average, this translates to an entrepreneur receiving around 56 market pressure tweets from funders in each 10-week period.
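The classification step described above (train on a small human-labeled sample, then score the full corpus) can be sketched with a standard bag-of-words pipeline. This is a minimal illustration, not the classifier actually used in the paper; the tweets and labels below are invented stand-ins for the 5,000 Mechanical Turk labels and the 220,000-tweet corpus.

```python
# Illustrative sketch: fit a text classifier on hand-labeled tweets, then
# apply it to the (much larger) unlabeled corpus of tweets to entrepreneurs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled training sample: 1 = market-pressure tweet (a request about
# the state of the project), 0 = other. All examples are invented.
labeled_tweets = [
    ("please add multiplayer before release", 1),
    ("when is the next update coming out?", 1),
    ("you should fix the crashes first", 1),
    ("fix the save bug, it keeps deleting progress", 1),
    ("congrats on the launch, love the game", 0),
    ("just bought this, looks great", 0),
    ("streaming this game tonight", 0),
    ("thanks for the quick reply", 0),
]
texts, labels = zip(*labeled_tweets)

# TF-IDF features plus logistic regression: a common, simple baseline for
# short-text classification.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Score the unlabeled corpus; tweets predicted 1 form the pressure subsample.
unlabeled = [
    "please fix the crashes before adding new content",
    "having a lot of fun with this game",
]
predictions = clf.predict(unlabeled)
print(predictions.tolist())
```

With a real training sample of several thousand labels, the same pipeline (or a stronger model) generalizes far better than this toy version can.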

The results show a negative association between the number of reporting events in a period and the level of market pressure imposed on the entrepreneur. This result is economically meaningful, because one more reporting event in a 10-week period is associated with, on average, 6 fewer market pressure tweets, which reflects a reduction of 17% relative to the median. I obtain a similar result when I use the stricter changes specification. Increasing the number of reports in a 10-week period by 1 report is associated with a reduction in market pressure of around 9 tweets, which corresponds to a reduction of 25% relative to the median. This result is driven primarily by the progress reporting part of the product updates, as the mediation analysis shows that only a small fraction of the relationship is driven by a utility effect. Additional tests show that this association is stronger when accompanied by higher quality reporting events, is not influenced by the frequency of unverifiable additional announcements, and is weaker during periods with a large presence of newly joined market participants. Finally, a latent Dirichlet allocation (LDA) model trained on the market pressure tweets reveals that the reporting frequency primarily affects pressure relating to the product’s quality and feature set but has little effect on pressure relating to the release schedule of the product’s development.

This paper makes several contributions. First, I provide the first empirical insights on the relationship between reporting frequency and market pressure in crowdfunding markets. Second, my results add to the growing literature on the positive effects of disclosure in reward crowdfunding settings where consumers act as funders (e.g., Strausz, 2017; Cascino et al., 2018; Hornuf and Schwienbacher, 2018; Madsen and McMullin, 2018). Third, in the disclosure literature, it is often suggested that market pressure drives the relationship between reporting frequency and managerial myopia (e.g., Ernstberger, Link, Stich, and Vogler, 2017; Kraft, Vashishtha, and Venkatachalam, 2017). Yet, my paper is one of the first to provide direct evidence of the existence and direction of the relationship between reporting frequency and market pressure. Finally, I make a methodological contribution to the textual analysis literature by developing a project-period-specific measure for the level of market pressure based on textual communications from market participants, obtained by training a machine learning algorithm using Amazon Mechanical Turk. This empirical approach is still relatively new to the literature and demonstrates that textual data can be classified using training samples of appropriate sizes in a cost-efficient and objective manner, which is becoming increasingly important.

2.2. Background and Literature Review

2.2.1. Crowdfunding

The existing literature shows that obtaining early-stage funding through traditional channels such as bank loans or large equity investors can be troublesome for some entrepreneurs (e.g., Berger and Udell, 1995; Cosh, Cumming, and Hughes, 2009). Many entrepreneurs are therefore increasingly turning to crowdfunding as an alternative source of financing. Under crowdfunding, early-stage funding is obtained directly from a large group of individuals who are each willing to support the entrepreneur by pledging a relatively small amount of money (Belleflamme, Lambert, and Schwienbacher, 2014; Belleflamme, Omrani, and Peitz, 2015). This crowd-based funding approach helps the entrepreneur to obtain not only financing at an early stage but also feedback and ideas from these individuals, which can help to reduce the uncertainty around product demand that is inherent in starting a new business (Agrawal, Catalini, and Goldfarb, 2014; Strausz, 2017; Xu, 2017; Chemla and Tinn, 2018).

There are many types of crowdfunding. Two primary categories can be distinguished according to their models: reward crowdfunding and profit-sharing (equity) crowdfunding (Belleflamme et al., 2014, 2015). Under the reward crowdfunding model, the funders have a claim on the final product, whereas under the profit-sharing model, the funders have a claim on future profits or equity securities. Early-access crowdfunding is a particular type of reward crowdfunding. The early-access model, however, gives the funders not only a claim on the final product but also an additional reward by granting them early access to all intermediate versions of the software product during the development process. This additional access distinguishes early-access crowdfunding from traditional reward crowdfunding models used by platforms such as Kickstarter (e.g., Mollick, 2014). Entrepreneurs in the video game industry increasingly opt for the early-access model over the Kickstarter model as it allows them to obtain more funding and product feedback earlier in the development process. Besides the video game industry, the early-access model is also increasingly used to fund products such as electronic books, online courses, and video series.

The popularity of these crowdfunding models is often considered puzzling from an economic and legal perspective owing to the strong presence of moral hazard and information asymmetries (e.g., Strausz, 2017; Gutiérrez Urtiaga and Saez Lacave, 2018). This is particularly true for reward crowdfunding. First, the entrepreneurs in these markets tend to be young and relatively inexperienced, which makes it difficult for funders to evaluate the reputation of an entrepreneur.3 Second, funders in these markets are often considered to be primarily consumers, who are not traditionally exposed to such agency frictions. The theoretical papers by Strausz (2017) and Chemla and Tinn (2018) model this reward crowdfunding scenario and conclude that these funders are a new type of stakeholder that is best described as a consumer exhibiting “investor like” behavior. The incentives of these funders differ from those of traditional investors; however, the underlying agency frictions are similar. Empirical studies on the Kickstarter platform find evidence consistent with these theoretical predictions (e.g., Barbi and Bigelli, 2017; Courtney, Dutta, and Li, 2017; Kuppuswamy and Bayus, 2018).

2.2.2. Disclosures

The agency frictions inherent to crowdfunding highlight the importance of mechanisms that reduce the information asymmetry between the entrepreneur and the funders (Strausz, 2017). Courtney et al. (2017) show that signals such as third-party endorsements and the entrepreneur’s prior crowdfunding experience have a positive effect on the projects’ likelihood of attaining funding. Besides these indirect signals, the entrepreneur can also reduce information asymmetry directly by providing more and better disclosures to the funders. Several papers have documented that Kickstarter projects with longer project descriptions on average obtain more project funding (Barbi and Bigelli, 2017; Kuppuswamy and Bayus, 2018; Cascino et al., 2018). Madsen and McMullin (2018) extend these results by analyzing the “Risks and Challenges” section and documenting that projects with higher quality risk disclosures receive more funding. Furthermore, Cumming, Hornuf, Karami, and Schweizer (2017) show that projects with poorly worded and confusing campaign pitches have a higher likelihood of being ex post identified as fraudulent.4

3 “Developer Satisfaction Survey 2017” by the International Game Developers Association. https://www.igda.org/page/dss2017
4 On a related note, the role of proprietary information in providing credibility to a voluntary reporting event (e.g., Gigler, 1994) is less relevant in the early-access model, given that the new product version accompanying an announcement provides a direct way for funders to verify the credibility.

Prior literature also documents the importance of reporting frequency in equity market scenarios. A substantial body of research examines the effect of reporting frequency on the level of information asymmetry in traditional capital market settings (Cuijpers and Peek, 2010; Fu, Kraft, and Zhang, 2012). More relevant are the papers by Block, Hornuf, and Moritz (2018) and Hornuf and Schwienbacher (2018), which show that entrepreneurs who provide more updates have a higher likelihood of obtaining funding in equity crowdfunding markets. However, little is known about the role of reporting frequency in reward crowdfunding markets.

Early-access crowdfunding is particularly suitable for studying the role of reporting frequency in reward crowdfunding markets. On the Kickstarter platform, disclosure is largely determined by the up-front campaign description, as the funding is concentrated at the beginning and the end of the campaign (Kuppuswamy and Bayus, 2018). In the early-access crowdfunding variant, however, the funding distribution is more constant, as the funding period is open throughout the development process. As a result, the information asymmetry for an early-access project depends not only on the initial up-front disclosure but also on the quality and frequency of intermediate product updates on the progress of the project’s development. This characteristic of the early-access model provides one of the first opportunities to study the effects of reporting frequency in a reward crowdfunding scenario. While not a natural experiment, there is quasi-exogenous variation in reporting frequencies both across projects and over time. This variation is driven by the new and unexplored nature of the early-access model combined with the inexperience of the entrepreneurs, which results in a wide range of reporting frequencies being explored. Combining this quasi-exogenous variation with a wide array of control variables and project fixed effects enables me to evaluate the unconfounded impact of higher reporting frequencies.

Theoretically, there are several incentives for the entrepreneur to disclose more or less frequently. The primary incentive to provide more frequent disclosures is that doing so can help attract more future funding, as shown by Block et al. (2018) and Hornuf and Schwienbacher (2018). Furthermore, as explored in this paper, providing more frequent reports can potentially help the entrepreneur to reduce the level of market pressure imposed on them. However, there are also several reasons that push the entrepreneur toward less frequent disclosures. The first is that providing more frequent disclosures is costly from the perspective of preparation. Entrepreneurs who do most of the work themselves, or with only a small team, will be resource constrained, making it difficult to provide more frequent updates given that each update requires a fixed amount of preparation effort. This relates to the second reason, which is that there is a cost, in the form of reputation concerns and

increased difficulty in raising future funding, to entrepreneurs when they provide updates at a higher frequency that are not up to par with the expectations of the funders. For each entrepreneur there will be an appropriate disclosure frequency at which these advantages and disadvantages are balanced; however, these entrepreneurs are unlikely to know what this frequency is and will thus experiment with various frequencies in a quasi-exogenous way.

2.2.3. Market Pressure

Another primary determinant of the crowdfunding investment decision, besides disclosures by the entrepreneur, is “herding” behavior based on the actions of incumbent funders (Hildebrand et al., 2017). Courtney et al. (2017) find that the sentiment of incumbent funders is one of the primary determinants of the likelihood that new funders will opt to fund a Kickstarter project. Furthermore, Hornuf and Schwienbacher (2018) and Hildebrand et al. (2017) show that similar behavior occurs in both equity crowdfunding and lending-based crowdfunding settings. This herding behavior is relevant because it gives the entrepreneur a strong incentive to care about communications from incumbent funders, as they are a primary determinant of his or her ability to attract future funding. This is particularly relevant in the early-access crowdfunding model because the entrepreneur’s development decisions are expected to be partially driven by pressure imposed by the market (i.e., the incumbent funders). I refer to this phenomenon as “market pressure” and define it as “communications from funders to the entrepreneur that are intended to steer the entrepreneur’s decisions in a specific direction”.

Market pressure could have either a positive or a negative effect on the decision-making quality of the entrepreneur. From the perspective of moral hazard, market pressure might improve decision-making quality. As described in the seminal work of Jensen and Meckling (1976), agency frictions can be resolved if the market has a way to monitor and control the behavior of the manager. Communications (i.e., a “voice”) from the market to the entrepreneur can operate as a governance mechanism, as they help to align the market’s interest with that of the entrepreneur (Armstrong et al., 2010). Several papers empirically document this in the traditional capital market by showing that communicating with management constitutes an effective instrument for monitoring and controlling the manager (Brav, Jiang, Partnoy, and Thomas, 2008; Dimson, Karakaş, and Li, 2015; McCahery, Sautner, and Starks, 2016; Harford, Kecskés, and Mansi, 2018).

However, crowdfunding markets tend to be inherently myopic due to the prepurchasing nature of the transaction. Crowdfunders have strong incentives to demand faster

product development because the time value of money suggests that the up-front investment becomes more expensive the longer the funder must wait for the final product. As a result, a misalignment may develop between the entrepreneur’s decision horizon and the funders’ myopic horizon. In general terms, myopia refers to the overvaluation of short-term outcomes and the undervaluation of long-term outcomes (Fishburn and Rubinstein, 1982). Misalignment stemming from market myopia is relevant because exposure to such myopic preferences can induce the entrepreneur to engage in myopic development decisions. In that case, market pressure driven by market myopia might decrease decision-making quality. Such market myopia, when reporting on the progress of product development, is also observed in the context of R&D announcements by bio-pharmaceutical companies (e.g., McNamara and Baden-Fuller, 2007). In a more general sense, this idea of market-induced managerial myopia is popularized in the seminal work of Stein (1988), Stein (1989), and Froot, Perold, and Stein (1992).

2.3. Hypotheses Development

The primary objective of this paper is to study how reporting frequency interacts with the level of market pressure imposed on the entrepreneur in a crowdfunding setting. This interaction is relevant to studying early-access crowdfunding because the intermediate product updates act as reporting events that are likely to introduce disclosure effects that could either increase or decrease the level of market pressure (e.g., Gigler et al., 2014; Edmans et al., 2016). I expect the level of market pressure to be a first-order determinant of managerial decision-making in the crowdfunding setting, yet it is not ex ante clear how this market pressure is influenced by the primary disclosure choice (i.e., reporting frequency) of an entrepreneur pursuing financing through early-access crowdfunding.

From the perspective of agency frictions, a higher reporting frequency should reduce the amount of market pressure imposed on the entrepreneur. Armstrong et al. (2010) emphasize, in a general sense, the importance of the information environment in shaping the extent of the conflict between the manager and the market participants and the role that this environment plays in resolving this tension. Having more reporting events will improve the information environment, which in turn reduces entrepreneurs’ discretion to make decisions that do not align with market interests (e.g., Barbi and Bigelli, 2017; Xu, 2017; Kuppuswamy and Bayus, 2018; Madsen and McMullin, 2018). The interplay between the level of this managerial discretion and the level of communication from market participants (i.e., agency friction driven market pressure) is less straightforward to predict. At first glance, one might expect that having better available information would increase funders’ propensity to strategically communicate with the entrepreneur to steer his or her decision-making. However, as discussed in Demsetz and Lehn (1985) and Bushman et al. (2004), this relationship is more nuanced than that. Demsetz and Lehn (1985) argue that active monitoring by market participants is most important in settings characterized by low transparency. This prediction derives from the fact that there is less need for market participants to exert market pressure in a scenario where the entrepreneur has committed to less self-motivated decision-making by increasing transparency (Demsetz and Lehn, 1985; Bushman et al., 2004; Armstrong et al., 2010). Following a similar logic, in scenarios of low transparency, there is more room for moral hazard and in turn more need for market pressure aimed at reducing agency frictions. As a result, I predict that higher reporting frequencies in the crowdfunding setting will reduce funders’ propensity to exert market pressure on the entrepreneur as a way of resolving agency frictions.

However, from the perspective of myopic market preferences, I expect that a higher reporting frequency will increase the level of market pressure imposed on the entrepreneur. Based on the conditions formalized by Froot et al. (1992), there is an indirect and a direct channel through which short-term oriented market participants can influence the decision-making of the entrepreneur. Indirect myopic market pressure occurs when the entrepreneur cares about the public behavior of incumbent funders, seeing it as a primary determinant of future funding (Hornuf and Schwienbacher, 2018). Direct myopic market pressure occurs when market participants alter the sensitivity of the entrepreneur to their myopic preferences by varying the frequency with which they communicate their preferences to the entrepreneur (Froot et al., 1992). In a capital market context, various papers empirically confirm these predictions. Jacobson and Aaker (1993) and Asker, Farre-Mensa, and Ljungqvist (2015) both document that market-induced myopia is expected to be larger in scenarios where the manager is more exposed to the market participants’ horizon. Regarding market participants influencing the sensitivity of the manager to their preferences, the results of Bushee (1998) indicate that the institutional investors’ horizon preference influences the amount of R&D a manager will invest in a product.

The aforementioned direct channel of myopic market pressure can be influenced by the reporting frequency in several ways.5 The first way works through the effect that reporting frequency has on the self-selection of funders with certain horizons. Basic

5 Various papers, such as Hermalin and Weisbach (2012), Gigler et al. (2014), and Edmans et al. (2016), adapt the models of Stein (1988, 1989) to incorporate the concept of reporting frequency. Their models, however, primarily treat reporting frequency as a dichotomous concept based on the presence or absence of premature evaluation. In my empirical setting, all reporting frequencies cause premature evaluation, making it difficult to use these models for appropriate empirical predictions.

economic theory suggests that impatient funders prefer a higher frequency of updates, and vice versa, as short-term oriented funders discount longer update intervals more strongly (e.g., Thaler et al., 1997; Wagenhofer, 2014). Furthermore, it is also possible that the reporting frequency alters the horizon of incumbent funders directly. Experimental evidence suggests that the evaluation frequency (which is a product of the reporting frequency) imposed on individuals alters their risk preferences and, more specifically, their myopic tendencies (Gneezy and Potters, 1997; Thaler et al., 1997; Van Der Heijden et al., 2012). Besides having an effect on the funder horizon, reporting frequency is also expected to affect funders’ propensity to communicate their preferences to the entrepreneur. Each reporting event is expected to trigger a period of heightened attention, temporarily increasing the amount of communication. Theoretically, this expectation is one case of the broader “agenda-setting hypothesis”, which predicts that the frequency of reports on an issue determines the level of attention that issue receives (Cornelissen, 2011; Sayre, Bode, Shah, Wilcox, and Shah, 2010; Conway, Kenski, and Wang, 2015). Moreover, higher reporting frequencies expose the funders to more of the outcome variability, which increases the likelihood that some of the reports will not meet expectations. Based on the results of Guo et al. (2015) and Casas-arce et al. (2017), I expect funders to asymmetrically overvalue these below-expectation reports, which increases their propensity to communicate their dissatisfaction to the entrepreneur.

Main prediction

Combining the effect on market pressure that is driven by agency frictions with the effect on market pressure that is driven by myopic preferences yields conflicting expectations regarding the direction of the relationship between reporting frequency and market pressure. I expect market pressure driven by agency frictions to decrease with higher reporting frequencies, whereas I expect market pressure driven by myopic preferences to increase with higher reporting frequencies. In a more general sense, this dynamic is similar to the trade-off between the positive effects (reduction in cost of capital) and the negative effects (inducing managerial short-termism) of increased disclosure as modelled by Edmans et al. (2016). To reflect this trade-off, I frame Hypothesis 1 in a non-directional manner.

Hypothesis 1. The reporting frequency is associated with the level of market pressure.

2.3.1. Moderating effects

Reporting quality

Providing information in a timely manner is, as previously discussed, important for establishing the information environment that is necessary for reducing agency frictions. However, the overall quality of the information environment is also influenced by the quality of the reporting events (i.e., the degree to which the expectations of the market are updated via the reported information). In the capital market context, this is often referred to as the trade-off between the timeliness of information and the quality (i.e., the “representational faithfulness”) of information (e.g., Doyle and Magilke, 2013). Boland, Bronson, and Hogan (2015) and Lee, Mande, and Son (2015) provide empirical evidence of this trade-off by showing that accelerated filings are likely associated with lower levels of reporting quality. More generally, Casas-arce et al. (2017) show in a management accounting context that there can also be an upper bound to the reporting frequency, whereby the additional processing costs of more granular information start to outweigh the benefits of providing timelier information. Based on this trade-off, I would expect that reporting frequency and reporting quality act as substitutes, whereby an increase in the reporting quality weakens the relationship between reporting frequency and market pressure. From the perspective of agency frictions, however, there might also be a complementary relationship between the reporting frequency and the reporting quality. This is particularly relevant to discussions about the effects of a change in reporting frequency. Increasing the number of reporting events in combination with increasing the quality of those reporting events is substantially more costly and should thus increase the signaling strength of these reporting events.
Since it is not ex ante clear whether the relationship between changes in reporting frequency and changes in market pressure is stronger or weaker when accompanied by a change in the reporting quality, I frame Hypothesis 2 in a non-directional manner.

Hypothesis 2. The association between a change in the reporting frequency and a change in the level of market pressure is influenced by changes in the average reporting quality.

Frequency of unverifiable announcements

As in a capital market scenario, early-access entrepreneurs can make announcements in addition to the main product updates. Compared to the main reporting events, these additional announcements are unverifiable because they are not accompanied by a new product update. It is not ex ante clear whether these additional unverifiable announcements have enough substance and credibility to cause market participants to update their beliefs. According to the analytical models of Verrecchia (1983) and

Gigler (1994), a market participant will incorporate announcements into their decision-making only if these communications are costly to the entrepreneur. The empirical evidence on this prediction in the capital market context is mixed. For example, the literature on managerial use of social media for additional communications shows that it can sometimes improve the information environment (Chen, Hwang, and Liu, 2013; Blankespoor, Miller, and White, 2014) and sometimes serves only a strategic purpose (Jung, Naughton, Tahoun, and Wang, 2017). In the crowdfunding context, there is little to no enforcement or regulation, making it difficult for an entrepreneur to credibly commit to unbiased additional announcements. Moreover, Ben-Rephael, Da, Easton, and Israelsen (2017) document that, in the context of Form 8-K filings, the decreased information benefits of additional communications are particularly profound for less sophisticated market participants. Based on these analytical and empirical results, I predict that the frequency of an entrepreneur’s unverifiable announcements will not influence the relationship between reporting frequency and market pressure.

Hypothesis 3. The association between reporting frequency and the level of market pressure is not influenced by the frequency of unverifiable announcements.

2.4. Setting and Empirical Design

2.4.1. Data

I collect publicly available data from the most popular early-access platform, Steam Early Access. The Steam platform is the largest digital distributor of games in the world, with more than 125 million registered users and more than 15 million concurrent users during peak hours. In 2013, the Steam platform started allowing software entrepreneurs to register their projects for the Steam early-access program. This program enables users of the platform to crowdfund a video game project by prepurchasing the game in return for early access to all development versions (i.e., pre-alpha, alpha, and beta) and the final version.

My sample consists of all early-access projects that have sufficient economic substance for meaningful market pressure to occur. I define economic substance using two conditions: the purchase price is higher than $5.99 and at least 1,000 funders have funded the project. I verified compliance with these conditions manually based on publicly available information. Furthermore, I require the entrepreneur to have an identifiable Twitter account that is actively used and the project to have at least three 10-week periods’ worth of data. These requirements result in a sample of 144 early-access projects available on the Steam platform between May 2013 and February 2017. The median

number of funders for a project is around 80,000, and the mean price is around $19. This yields the conservative estimate that an average project will raise around US $1.5 million via this early-access crowdfunding platform.
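The sample screens described above can be sketched as a simple filter. The data frame, column names, and values below are hypothetical stand-ins, not the actual research dataset:

```python
import pandas as pd

# Hypothetical project-level data; columns and values are illustrative only.
projects = pd.DataFrame({
    "project_id": [1, 2, 3, 4],
    "price_usd": [9.99, 4.99, 19.99, 24.99],
    "n_funders": [1500, 5000, 800, 120000],
    "active_twitter": [True, True, True, True],
    "n_10wk_periods": [5, 6, 4, 2],
})

# Economic-substance screens (price above $5.99, at least 1,000 funders)
# plus the data-availability requirements (active Twitter, >= 3 periods).
sample = projects[
    (projects["price_usd"] > 5.99)
    & (projects["n_funders"] >= 1000)
    & projects["active_twitter"]
    & (projects["n_10wk_periods"] >= 3)
]
print(sample["project_id"].tolist())  # [1]
```

Only the first hypothetical project survives all four screens; the others fail the price, funder-count, and period-count conditions, respectively.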

2.4.2. Construct operationalization

Reporting frequency

A reporting event is defined as an update to the software that is substantial enough to at least partially meet the funders’ expectations (i.e., substantial enough for the funders to update their expectations). I identify a reporting event based on two criteria: (1) there is an announcement on the official project page containing at least one keyword referencing a new version or update, and (2) the reporting event is accompanied within three days by a change in the file size of the software.6 The first criterion guarantees that the reporting event is visible to funders and a change in the software’s file size guarantees that the reporting event has substance. Data on these announcements and reporting events is gathered directly from the Steam API and from the project news feed on the Steam platform. The reporting frequency is operationalized by the number of reporting events in a time interval.
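The two identification criteria can be sketched as follows. The keyword list comes from footnote 6; the function name and example dates are illustrative:

```python
from datetime import date

# Keywords from footnote 6 that signal a new version or update.
KEYWORDS = ("version", "release", "patch", "change log", "changelog",
            "update", "hotfix", "hot fix", "build", "what's new")

def is_reporting_event(announcement, announced_on, filesize_change_dates,
                       window_days=3):
    """Criterion 1: the announcement mentions an update-related keyword.
    Criterion 2: a file-size change occurs within `window_days` of it."""
    if not any(kw in announcement.lower() for kw in KEYWORDS):
        return False
    return any(abs((d - announced_on).days) <= window_days
               for d in filesize_change_dates)

print(is_reporting_event("Patch 0.4 is live!", date(2016, 3, 1),
                         [date(2016, 3, 2)]))   # True
print(is_reporting_event("Dev stream tonight!", date(2016, 3, 1),
                         [date(2016, 3, 2)]))   # False: no keyword
```

The first announcement passes both criteria; the second fails the keyword check even though a file-size change occurred nearby.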

One concern with this approach is that each product update will not only adjust the funders’ beliefs about the development progress but also provide funders with utility from receiving new product features. The primary mechanism of interest in this study is the reporting effect (i.e., the reduction in information asymmetry) and not the utility effect. To rule out the alternative explanation that market pressure is associated with the number of product updates owing to the utility effect, I use a mediation model. In this mediation model, I use the number of negative reviews that the project receives during a given period as a proxy for the utility effect. Any extra utility derived from additional product updates should be strongly correlated with a reduction in the number of negative product reviews. This mediation analysis approach allows me to separate the relationship between the frequency of product updates and market pressure into the reporting effect and the mediated effect that flows through changes in product satisfaction.
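As a rough sketch of the mediation logic, the simulated data below shows how a total effect of update frequency on pressure decomposes into a direct (reporting) effect and an indirect (utility) effect flowing through negative reviews. All numbers are simulated for illustration, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

x = rng.poisson(3, n).astype(float)              # update frequency (simulated)
m = 5 - 0.8 * x + rng.normal(0, 1, n)            # mediator: negative reviews
y = 2 + 0.5 * x + 0.6 * m + rng.normal(0, 1, n)  # outcome: market pressure

def ols(dep, *cols):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(dep)), *cols])
    return np.linalg.lstsq(X, dep, rcond=None)[0]

total = ols(y, x)[1]        # total effect of updates on pressure
direct = ols(y, x, m)[1]    # reporting effect, holding the mediator fixed
indirect = ols(m, x)[1] * ols(y, x, m)[2]  # utility effect via reviews

# In linear models the decomposition is exact: total = direct + indirect.
print(abs(total - (direct + indirect)) < 1e-8)  # True
```

The exact decomposition of the total effect into direct and indirect components is a standard algebraic property of linear mediation models estimated on the same sample.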

Market pressure

Measuring the level of market pressure is empirically challenging. In the early-access setting, the primary channel for funders to communicate with the entrepreneur is through the entrepreneur’s social media accounts, in particular his or her Twitter

6 The keywords used are: “version”, “release”, “patch”, “change log”, “changelog”, “update”, “hotfix”, “hot fix”, “build”, and “what’s new”.

account. I utilize this characteristic of the setting to develop a new measure for market pressure based on the subset of tweets received by an entrepreneur during a given period that plausibly impose pressure. I consider a tweet to contribute to the market pressure experienced by the entrepreneur when it fulfills two criteria: (1) the tweet must relate to the current and/or future state of the project, and (2) the tweet must contain a direct and/or an indirect request. An example of a direct request would be a funder asking the entrepreneur to focus their development attention on a particular aspect of the project, whereas an example of an indirect request would be a complaint or a general inquiry about the project’s development status. The second criterion is of particular importance, as a tweet without an explicit or implicit request is unlikely to steer the entrepreneur’s decision-making in a specific direction, which is a key aspect of my theoretical definition of market pressure. I provide a small selection of market pressure Twitter messages in appendix A.

I classify tweets based on both criteria using an objective procedure that involves training a machine learning algorithm on a large sample of tweets manually classified by Amazon Mechanical Turk (i.e., M-Turk) workers. I manually identify the Twitter account of each entrepreneur, and all Twitter messages to and from these accounts are gathered in a programmatic manner from the Twitter API and the Twitter advanced search functionality. This yields a sample of roughly 220,000 tweets directed toward the entrepreneurs. A training sample of 5,000 tweets was randomly selected and classified by M-Turk workers to identify whether each tweet relates to the development of the project and whether it contains an explicit or an implicit request. Each M-Turk worker was paid US $0.20 for every 5 tweets they classified. To validate the accuracy of this procedure, I also classify 1,000 tweets manually and cross-validate this sample with the M-Turk classification. This cross-validation shows a disagreement rate of 3% to 4%, which provides reassurance that the M-Turk workers understand the task and are properly incentivized. A detailed description of the classification instrument and procedure is available in appendix B.
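The cross-validation step amounts to computing a simple disagreement rate between the manual and M-Turk labels. The labels below are made up purely to illustrate the calculation:

```python
def disagreement_rate(labels_a, labels_b):
    """Share of tweets on which the two classifications differ."""
    assert len(labels_a) == len(labels_b)
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)

mturk = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1] * 100   # 1,000 illustrative labels
manual = list(mturk)
for i in range(35):                             # flip 35 labels
    manual[i] = 1 - manual[i]

print(disagreement_rate(mturk, manual))  # 0.035, i.e., 3.5% disagreement
```

A rate in the 3% to 4% range, as reported above, indicates that the two labeling sources agree on the vast majority of tweets.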

Based on this training sample, a machine learning algorithm is trained to classify the entire sample of 220,000 tweets. I use an evolutionary grid search algorithm to find the optimal machine learning classifier and corresponding hyper-parameters. A support vector machine (SVM) classifier with tuned hyper-parameters yields an F-beta score of around 76%. This score is comparable to other studies that train machine learning classifiers to classify tweets based on more sophisticated concepts (e.g., Sakaki, Okazaki, and Matsuo, 2010). The full sample is classified using this algorithm, with the result that around 47% of the total tweets (around 105,000 tweets) are classified as contributing to the level of market pressure imposed on the entrepreneur.
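A minimal sketch of this pipeline, using scikit-learn, is shown below. The tweets and labels are invented toy data, and an ordinary grid search stands in for the evolutionary search used in the paper; it is not the actual classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import fbeta_score, make_scorer

# Tiny illustrative training set; the study itself uses 5,000 labeled tweets.
tweets = ["when is the next update coming?", "please fix the multiplayer lag",
          "loving the game so far", "great stream last night",
          "any news on the beta release?", "congrats on the award",
          "the last patch broke my saves, fix it", "beautiful screenshots"] * 10
labels = [1, 1, 0, 0, 1, 0, 1, 0] * 10    # 1 = market-pressure tweet

pipeline = Pipeline([("tfidf", TfidfVectorizer()),
                     ("svm", LinearSVC())])

# Plain grid search over the SVM regularization strength, scored with F-beta.
search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0, 10.0]},
                      scoring=make_scorer(fbeta_score, beta=1), cv=3)
search.fit(tweets, labels)
print(search.best_score_ > 0.9)  # True: the toy folds repeat the same tweets
```

On real data, held-out performance would be much lower than on this toy sample; the paper reports an F-beta score of around 76% after hyper-parameter tuning.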

Moderating variables

I measure the reporting quality of a reporting event based on the length of the announcement posted on the official project page. This approach is similar to the approaches used by Cascino et al. (2018) and Madsen and McMullin (2018). I operationalize the length of an announcement using a tokenizer and counting the number of tokens (i.e., words) in the announcement. I choose this method because the number of tokens in the announcement message is more easily compared across different projects and reporting events than alternatives such as the file size change of the product update. I operationalize the frequency of additional unverifiable announcements by identifying the number of announcements on the official project page that are not accompanied by a new product update.
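The length measure can be sketched with a minimal regex tokenizer; the specific tokenizer used in the paper is not described here, so this is an illustrative stand-in:

```python
import re

def announcement_length(text):
    """Count word tokens in an announcement with a simple regex tokenizer."""
    return len(re.findall(r"\w+", text))

print(announcement_length(
    "Patch 0.4: fixed the save bug and added two new levels."))
# 12 ("0.4" splits into two tokens under this simple tokenizer)
```

Tokenizer choices (how numbers, punctuation, and contractions are split) shift the raw counts, but the measure only needs to be consistent across announcements.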

Control variables

I expect that a period characterized by high levels of dissatisfaction among funders is likely to increase market pressure. To control for this, I include a variable that accounts for the number of negative reviews posted on the project page in a given period. Changes to the funder base of a project can also influence the level of market pressure imposed on the entrepreneur. To account for this, I include a control variable based on the change in ownership for a given period. Because I expect the general attention a product receives on social media to influence the project’s visibility, I include a control variable based on the number of YouTube videos posted about a project during a given period. Because I expect the level of active users of the software being developed to influence the likelihood that a funder will decide to express concerns to the entrepreneur, I include a control variable to account for the median playtime in a given period. Finally, I expect an entrepreneur who is more active on social media to attract more market pressure via social media, because their social media presence becomes a more effective communication channel for funders wanting to influence the entrepreneur. To control for this, I include the number of tweets from the entrepreneur in a given period.7

2.4.3. Empirical models

Hypothesis 1

To test Hypothesis 1, namely, the association between the reporting frequency and the level of market pressure, I use the following model:

7 The various statistics required to calculate these control variables are collected from the Steam API and various community websites.

MarketPressure_{i,t} = β0 + β1 RepFreq_{i,t−1} + β{Controls}_{i,t−1} + ProjectFE + TimeFE + ε_{i,t}    (2.1)

where i indicates the project, t indicates the period, MarketPressure_{i,t} indicates the number of market pressure tweets directed to the entrepreneur for project i and period t, RepFreq_{i,t−1} indicates the number of reporting events for project i in period t−1, ProjectFE is a dummy variable indicating project i, and TimeFE is a dummy variable controlling for life-cycle fixed effects. The primary coefficient of interest is β1. Given that MarketPressure_{i,t} is a count variable, all models are estimated with both an OLS and a Poisson regression.8

This model is similar to the model employed by Fu et al. (2012), the difference being that I use a different period length. The length of a period in Fu et al. (2012) is 12 months, which is not appropriate for the early-access crowdfunding setting, as the reporting frequency for a given project tends to change within such a long period. Finding the appropriate period length is an important empirical choice, as a period length that is too long will absorb most of the variation, whereas a period length that is too short will result in a noisy estimation. My decision criterion for the period length is to pick a length as long as possible up to the point where the estimation starts to lose power. I estimate the main model for various period lengths ranging from 2 weeks to 16 weeks in 2-week increments; the results are displayed in Figure 2.1. The results indicate that the model starts to lose statistical power at a period length of around 12 weeks. I therefore choose a period length of 10 weeks for my analyses.
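The search over period lengths amounts to re-aggregating the weekly data for each candidate length and re-estimating the model. The aggregation step can be sketched as follows (the weekly-count layout and function name are illustrative assumptions):

```python
from typing import List

def aggregate_periods(weekly_counts: List[int], period_weeks: int) -> List[int]:
    """Sum weekly counts into non-overlapping periods of `period_weeks` weeks.

    Trailing weeks that do not fill a complete period are dropped so that
    every period in the output covers an equal span of time.
    """
    n_full = len(weekly_counts) // period_weeks
    return [
        sum(weekly_counts[i * period_weeks:(i + 1) * period_weeks])
        for i in range(n_full)
    ]

# One project's weekly market pressure tweet counts, re-aggregated for
# candidate period lengths of 2 to 16 weeks in 2-week increments.
weekly = [3, 1, 0, 4, 2, 5, 1, 0, 2, 3, 4, 1, 0, 2, 1, 3]
for k in range(2, 17, 2):
    print(k, aggregate_periods(weekly, k))
```

The model would then be re-estimated on each aggregated panel, and the candidate length retained is the longest one at which the coefficient of interest remains precisely estimated.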

There is substantial variation in the level of market pressure across the various projects and periods. To study how changes in the reporting frequency are associated with changes in market pressure, I use a changes version of the first model, wherein each variable is corrected for its average value in the preceding three periods:

∆MarketPressure_{i,t} = β0 + β1 ∆RepFreq_{i,t−1} + β{∆Controls}_{i,t−1} + TimeFE + ε_{i,t}    (2.2)
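The changes transformation, in which each observation is reduced by the average of its values over the preceding window, can be sketched as follows (the flat per-project series is an illustrative assumption):

```python
from typing import List, Optional

def changes_spec(series: List[float], window: int = 3) -> List[Optional[float]]:
    """Return each period's value minus the mean of the `window` preceding periods.

    The first `window` periods lack a complete preceding window and are
    returned as None; these observations drop out of the changes sample,
    which is why the changes specification has fewer observations.
    """
    out: List[Optional[float]] = []
    for t, value in enumerate(series):
        if t < window:
            out.append(None)
        else:
            prior = series[t - window:t]
            out.append(value - sum(prior) / window)
    return out

# A project's market pressure across five periods.
print(changes_spec([10.0, 20.0, 30.0, 18.0, 24.0]))
```

Averaging over three preceding periods rather than differencing against a single lag smooths out extreme one-period swings, which matters for noisy count data.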

Moderating effects

To test Hypothesis 2, namely, the effect of reporting quality on the association between reporting frequency and market pressure, I use the following model:

8 All models are estimated using standard errors clustered at the project level.

∆MarketPressure_{i,t} = β0 + β1 ∆RepFreq_{i,t−1} + β2 ∆RepQual_{i,t−1} + β3 ∆Interaction_{i,t−1} + β{∆Controls}_{i,t−1} + ProjectFE + TimeFE + ε_{i,t}    (2.3)

To test Hypothesis 3, namely, the effect of the frequency of unverifiable announcements on the association between reporting frequency and market pressure, I use the following model:

MarketPressure_{i,t} = β0 + β1 RepFreq_{i,t−1} + β2 FreqAddAnn_{i,t−1} + β3 Interaction_{i,t−1} + β{Controls}_{i,t−1} + TimeFE + ε_{i,t}    (2.4)

where i indicates the project, t indicates the period, MarketPressure_{i,t} indicates the number of market pressure tweets directed toward the entrepreneur for project i and period t, RepFreq_{i,t−1} indicates the number of reporting events for project i in period t−1, RepQual_{i,t−1} indicates the average announcement length for project i in period t−1, FreqAddAnn_{i,t−1} indicates the number of additional unverifiable announcements for project i in period t−1, ProjectFE is a dummy variable indicating project i, and TimeFE is a dummy variable controlling for life-cycle fixed effects. The primary coefficient of interest is β3.

2.5. Results

2.5.1. Descriptive statistics

Table 1 provides detailed descriptive statistics for the observations I use to test the various models. Table 1 Panel A relates to the sample used for models 1, 4, and 5. Table 1 Panel B contains the descriptive statistics for the changes specification used for models 2 and 3. The main sample consists of 883 observations for 144 projects, each observation being a 10-week project period. Owing to the nature of a changes specification, the sample described in Table 1 Panel B consists of 449 observations, each observation being the change relative to the average of the previous 3 periods.9

9 Inferences remain similar when calculating the changes using only the previous period; however, that specification is sensitive to extreme changes that are potentially the result of data errors. Taking the difference relative to the average of the previous three periods provides a more robust specification.

The primary dependent variable MarketPressure is the number of market pressure tweets received in a 10-week period. Table 1 Panel A shows that there is substantial variation in the level of market pressure across different periods. At the median, an entrepreneur receives 16 market pressure tweets in a 10-week period, with a mean of 56 tweets. This translates to around 2 tweets per week at the median (6 at the mean). Some project periods receive a very high number of tweets; the entrepreneur in the period with the most tweets received 2,241 tweets during the 10-week period. The descriptive statistics in Table 1 Panel B show that the level of market pressure also varies substantially across periods of the same project. The average median change is 3 tweets, which corresponds to 17% of the period median number of tweets.

The primary independent variable RepFreq is the number of reports issued by the entrepreneur in a 10-week period. As expected, Table 1 Panel A shows that the reporting frequency varies across project periods. On average, an entrepreneur issues between 1 and 2 reports per 10-week period. In the span of a year, this translates to 7 to 8 reports on average, or 1 report every 7 weeks. Some entrepreneurs, however, choose to report much more frequently, up to as much as once a week. There is also variation in the reporting frequency between periods of the same project. As displayed in Table 1 Panel B, some entrepreneurs increase or decrease their frequency by as much as 5 reports compared to the previous three periods.

With regard to the other variables, Table 1 Panel A shows that a project receives, on average, 48 negative reviews in a 10-week period, but this number varies strongly between project periods. An entrepreneur's Twitter account sends, on average, 88 tweets during each 10-week period, which confirms that Twitter is a relevant and active channel for two-way communication between the entrepreneur and his or her funders. Finally, in an average 10-week period, each project gains 79,000 new funders and 11 YouTube videos, and the median per-funder playtime at the high end of the distribution is 20 hours per week.10 Regarding the moderating variables, an average reporting event is accompanied by an announcement that is, on average, 284 words long with a standard deviation of 410 words, and an entrepreneur creates, on average, 4 to 5 unverifiable additional announcements per 10-week period.

[Table 1 Panel A and Table 1 Panel B about here]

10 The actual median playtime will be lower, as the data source only collects playtime data for individuals who play more than average. This applies equally to all projects.

2.5.2. Main results

Table 2 shows the results for my main models: Table 2 Panel A corresponds to the results for model 1, and Table 2 Panel B corresponds to the results for model 2. Because the level of market pressure is a count variable, model 1 is estimated using both Poisson and OLS regressions. Panel A of Table 2 contains the following columns. Columns 1 and 2 contain the full model without fixed effects, estimated using Poisson and OLS regressions, respectively. Columns 3, 4, and 5 contain the various combinations of fixed effects estimated using Poisson regressions. Finally, Column 6 contains the full specification with both project and life-cycle fixed effects estimated using an OLS regression. Panel B of Table 2 contains the following columns. Columns 1, 2, and 3 contain the main changes model with various combinations of control variables, excluding any fixed effects. Column 4 contains the full changes model with project life-cycle fixed effects. All models are estimated with standard errors that are corrected for heteroscedasticity and clustered at the project level.

The results of Table 2 Panel A indicate that there is a negative association (−0.059, p = 0.065; −2.65, p = 0.05) between the number of reporting events in a period and the number of market pressure tweets that the entrepreneur receives. This result provides support for my first hypothesis, which states that there is an association between reporting frequency and the level of market pressure. Furthermore, the direction of the association is consistently negative throughout the various specifications, which indicates that in the early-access crowdfunding setting, the reporting frequency is primarily associated with market pressure that is driven by agency frictions. Inferences remain similar when all the control variables, project fixed effects, and project life-cycle fixed effects are included. This result is economically meaningful, as one more reporting event in a 10-week period is associated with, on average, 3 fewer market pressure tweets, or a reduction of 17% relative to the median number of market pressure tweets. A similar result is obtained when using the stricter changes specification displayed in Table 2 Panel B. The results of the changes model indicate a similar negative association (−4.74, p = 0.05) between a change in the number of reporting events and a change in the level of market pressure. Increasing the number of reports in a 10-week period by 1 report is associated with a reduction in market pressure of around 5 tweets, which corresponds to a reduction of 29% relative to the median number of market pressure tweets.

A potential alternative explanation for the reduction in market pressure is that it is driven by the additional utility that funders derive from getting earlier access to new features, and not by the reporting part (i.e., reduction of information asymmetry) of the product updates. To disentangle the utility effect from the reporting effect, I run a

mediation model wherein I use the number of negative reviews that the project receives during a given period (i.e., product satisfaction) as a proxy for the utility effect. I run this mediation model for both the levels and changes specifications; the results are displayed in Figure 2 and Figure 3. The results in Figure 2 indicate that product satisfaction is not significantly associated with the reporting frequency (0.63, p = 0.60) but is associated with the level of market pressure (0.23, p = 0.000). More importantly, the effect of reporting frequency on market pressure that is explained through the effect on product satisfaction is not significant (0.14, p = 0.60). Similar inferences are obtained when running the mediation model for the changes specification; the indirect effect explained through the effect on product satisfaction is not significant (−0.02, p = 0.94). These mediation results provide reassurance that the negative association between the reporting frequency and market pressure is plausibly driven by the reporting effect of these product updates.

[Table 2 Panel A and Table 2 Panel B about here]

2.5.3. Moderating results

Table 3 shows the results for model 3, which is used to test whether the association between a change in the reporting frequency and a change in the level of market pressure is moderated by the average reporting quality of the reporting events. The coefficient for ∆Number of Reports_{t−1} is no longer statistically significant (−3.65, p = 0.18). The coefficient for ∆Reporting Quality_{t−1} is not statistically significant either (−0.01, p = 0.26), which indicates that a change in the reporting quality is by itself not associated with a change in market pressure. The interaction term (−0.01, p = 0.04) indicates that the average reporting quality amplifies the association between reporting frequency and market pressure. Given the insignificance of ∆Reporting Quality_{t−1}, this result suggests that a change in reporting frequency only affects market pressure when it is also accompanied by a change in reporting quality. Increasing the quality of an average report by 100 words (35% of the mean number of words) strengthens the association between reporting frequency and market pressure by an average of 1 tweet (around 33% of the main effect size, which is 3 tweets). This result is consistent across all specifications and confirms a complementary relationship between reporting quality and reporting frequency. Overall, these results support my second hypothesis that changes in the reporting quality amplify the association between a change in the reporting frequency and changes in the level of market pressure.

[Table 3 about here]

Table 4 shows the results for model 4, which I use to test whether the association between the reporting frequency and the level of market pressure is moderated by the

number of unverifiable additional announcements in the same period. The coefficient for Number of Reports_{t−1} remains negative and statistically significant (−3.67, p = 0.05) and increases in magnitude compared to the results in Table 2 Panel A.

The coefficient for Num Announcements_{t−1} is not statistically significant (−1.17, p = 0.15). This indicates that, by itself, issuing more extra announcements is not associated with lower market pressure. The interaction term is also not significant (0.15, p = 0.38). These results suggest that the frequency of unverifiable additional announcements does not influence the association between the reporting frequency and the level of market pressure.

[Table 4 about here]

2.6. Additional analyses

2.6.1. Effect of recently joined funders

An interesting phenomenon in the crowdfunding setting is the large fluctuation in the number of newly joined funders across periods. The presence of a large number of newly joined funders might have implications for the relationship between reporting frequency and the level of market pressure. On the one hand, it is possible that these newly joined funders will exhibit a higher than average level of attention, which will increase the scrutiny placed on the entrepreneur. The analytical model of Hirshleifer and Teoh (2003) highlights that the presence or absence of attention from market participants has important implications for the degree to which, and if so how, they react to reporting events. Based on this "market distraction hypothesis", I expect that periods with a large number of newly joined funders will exhibit a stronger negative relationship between reporting frequency and market pressure. However, Porter and Smith (1995) also suggest that market participants with limited or no experience might not be able to form rational expectations, with the result that these inexperienced market participants exhibit a form of "temporary myopia". Newly joined funders, for example, might have irrationally high expectations for the product updates, which would increase their propensity to impose market pressure on the entrepreneur. This "temporary myopia hypothesis" suggests that the presence of more newly joined backers will weaken the negative relationship between reporting frequency and market pressure.

I measure the presence of newly joined funders for a given period by calculating the trend in the number of newly joined funders in preceding periods. A positive trend indicates a larger presence of newly joined funders in the current period. I obtain ownership statistics via the Steam API and various community websites and operationalize the trend as the average percentage change in ownership over the three 10-week periods before the reporting period. Using this new variable, I test the following model:

MarketPressure_{i,t} = β0 + β1 RepFreq_{i,t−1} + β2 NewFunderTrend_{i,t−1} + β3 Interaction_{i,t−1} + β{Controls}_{i,t−1} + TimeFE + ε_{i,t}    (2.5)
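The construction of the trend variable can be sketched as follows (the cumulative-ownership list and function name are illustrative assumptions):

```python
from typing import List

def new_funder_trend(ownership: List[float], t: int, window: int = 3) -> float:
    """Average percentage change in ownership over the `window` periods
    immediately preceding period `t`.

    ownership[k] is the cumulative number of funders at the end of period k,
    so the growth in period k is (ownership[k] - ownership[k-1]) / ownership[k-1].
    Requires t > window so that all lagged periods exist, which is why the
    sample shrinks when this variable is added to the model.
    """
    pct_changes = [
        (ownership[k] - ownership[k - 1]) / ownership[k - 1]
        for k in range(t - window, t)
    ]
    return sum(pct_changes) / window

# Steady 10% ownership growth in each of the three periods before period 4.
print(new_funder_trend([100.0, 110.0, 121.0, 133.1, 140.0], t=4))
```

A positive value indicates that the funder base grew in the run-up to the reporting period, i.e., a larger presence of newly joined funders.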

The results of model 5 are shown in Table 5. Owing to how the NewFunderTrend_{t−1} variable is created, the number of observations drops to 645, but the results in Column 4 show that the main result still holds (−5.17, p = 0.06). Column 3 shows that Number of Reports_{t−1} remains negative and statistically significant (−9.18, p = 0.004) when NewFunderTrend_{t−1} is included, but the magnitude is substantially higher than the coefficient in Column 4. The coefficient for NewFunderTrend_{t−1} is statistically significant and negative (−16.38, p = 0.01), indicating that the presence of recently joined funders is by itself associated with lower levels of market pressure. This relationship suggests that an average increase in ownership of 10% across the previous three periods is associated with 2 fewer market pressure tweets. The interaction term is positive and statistically significant (5.39, p = 0.02) throughout all specifications, which implies that the negative relationship between reporting frequency and market pressure is weaker during periods with a large presence of newly joined funders. Untabulated results show that the inferences remain the same when Poisson regressions are used instead of OLS regressions. In summary, this result suggests that newly joined funders temporarily exhibit a higher degree of myopia, which pushes the relationship between reporting frequency and market pressure in the positive direction, consistent with how higher reporting frequencies are expected to interact with the myopia of market participants.

[Table 5 about here]

2.6.2. Cross-sectional splits

2.6.2.1. Effect of variation in monitoring incentives

On average, funders have incentives to impose market pressure on the entrepreneur to ensure an appropriate return on their investment. However, these incentives might vary with the characteristics of the project. One particularly important characteristic is the degree to which the utility of the product is conditional on it having a sufficiently large user base. For projects that rely solely on single-player functionality, the active user base is of little concern, as the product is usable even when no one else is using it.

This is not the case for multiplayer projects, which become useless without an active user base. As such, I expect that the funders of multiplayer projects will have stronger incentives to provide monitoring than will funders of single-player projects. Columns 1 and 2 of Table 6 show the results of a comparison between single-player and multiplayer projects. I determine whether a project is single-player or multiplayer based on the tags provided on the project page. The results show a significantly negative reporting frequency effect for multiplayer projects (−3.40, p = 0.03) but no significant effect for single-player projects (−0.57, p = 0.72). This finding indicates that the result is driven primarily by funders of multiplayer projects, as they have stronger incentives to monitor the entrepreneur and thus are more likely to respond to the frequency of reporting.

2.6.2.2. Effect of funders’ free-riding propensity

A potential downside to the crowdfunding model is that effective monitoring might be difficult owing to widespread free-riding behavior among funders (Belleflamme et al., 2015). Kuppuswamy and Bayus (2018), for example, document that Kickstarter projects with a large amount of initial support receive less additional funder support in later stages of the campaign. Based on this prior evidence, I expect that the relationship between the reporting frequency and the level of market pressure will be weaker for projects whose funders are more likely to exhibit free-riding behavior. I develop a proxy for cross-sectional variation in the free-riding propensity based on the amount of funding raised during the first two periods of the project (i.e., the first 20 weeks). Columns 3 to 5 of Table 6 contain the results for projects with varying levels of initial success. The results show a significant reporting effect for projects with average initial funding success (−2.76, p = 0.04) but an insignificant effect for projects with either low (0.20, p = 0.78) or high (−4.85, p = 0.32) initial funding success. The insignificant coefficient for high initial success projects is consistent with prior evidence that free-riding behavior can impede the effectiveness of monitoring in crowdfunding markets.

[Table 6 about here]

2.6.3. Market Pressure Heterogeneity and LDA

In the main analyses, I treat the concept of market pressure as dichotomous: a Twitter message is classified as either imposing or not imposing market pressure on the entrepreneur. In reality, however, there is substantial variation in the type of Twitter messages that the entrepreneurs receive from funders. From an empirical perspective, incorporating this heterogeneity into the model is not straightforward, given that it is not ex ante clear how to categorize the messages into meaningful categories. I solve

this problem by training a latent Dirichlet allocation (LDA) model on the corpus of market pressure tweets to model the underlying topic distribution. This LDA approach has been used in several other papers, such as Dyer, Lang, and Stice-Lawrence (2017), who model the topics of 10-K disclosures, and Huang, Lehavy, Zang, and Zheng (2018), who model the topics of analyst reports.

Latent Dirichlet allocation is an unsupervised machine learning model for discovering abstract topics using only the corpus of documents and a set of hyper-parameters as input. In contrast to the supervised machine learning procedure that I use to classify market pressure tweets, the LDA procedure does not require a pre-classified training sample. While easy to train, the LDA procedure relies on the researcher's subjective interpretation to determine the appropriate number of abstract topics and the categorical labels assigned to each. After experimenting with various options, I settle on an LDA model with 3 topics, as this seems to yield the best discriminant and convergent validity. I run this LDA model on a stratified random sample of around 130,000 tweets to ensure that each project is appropriately represented.

Figure 4 provides detailed insights into the topics identified by the LDA procedure. The intertopic distance map and termite plot in Figure 4 indicate relatively strong discriminant validity.11 I subjectively assign a label to each of the three central topic clusters: release-related tweets, problem-related tweets, and feature-related tweets. The release-related tweets contain words such as "release", "version", and "update" and refer mostly to funders asking about the release schedule and timing of new updates. The problem-related tweets contain words such as "fix", "bug", and "problem" and refer mostly to funders expressing that they have problems with the product. The feature-related tweets contain words such as "add", "new", and "future" and refer mostly to funders suggesting and discussing new features that they would like to see added to the product.

Using the trained LDA model, I classify a tweet into one of the three categories if the tweet's topic probability exceeds 50%. Some ambiguous tweets have roughly equal probabilities for each of the three topics and therefore do not fall into any of the categories. I rerun both my main levels and my changes regressions with a topic-specific measure for market pressure. The results of these regressions are shown in Table 7. Columns 1 and 2 of Table 7 indicate no significant effect (−0.47, p = 0.44) of the reporting frequency on release-related market pressure tweets. Columns 3 to 6 show negative coefficients for problem-related tweets (−0.80, p = 0.02) and feature-related tweets (−0.48, p = 0.20). These results are in line with the effect being driven by monitoring incentives and not so much by myopic incentives. From a myopic perspective, I would

11 These visuals are adapted from the LDAvis, pyLDAvis, and Textacy packages.

expect the reporting frequency to have a stronger effect on the likelihood that funders send release-related tweets.
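The 50% assignment rule can be sketched as follows, given per-tweet topic probabilities from the trained model (the topic labels and dictionary layout are illustrative assumptions):

```python
from typing import Dict, Optional

TOPICS = ("release", "problem", "feature")

def assign_topic(probs: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Assign a tweet to the topic whose probability exceeds `threshold`.

    Returns None for ambiguous tweets whose probability mass is spread too
    evenly for any single topic to dominate; such tweets fall outside all
    three topic-specific market pressure measures.
    """
    top = max(TOPICS, key=lambda topic: probs.get(topic, 0.0))
    return top if probs.get(top, 0.0) > threshold else None

print(assign_topic({"release": 0.72, "problem": 0.18, "feature": 0.10}))
print(assign_topic({"release": 0.34, "problem": 0.33, "feature": 0.33}))
```

The threshold trades off coverage against precision: a higher cutoff discards more ambiguous tweets but makes each topic-specific count cleaner.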

[Table 7 about here]

2.7. Conclusion

Using data from one of the largest reward crowdfunding platforms, I provide empirical evidence on the relationship between reporting frequency and market pressure in crowdfunding markets. I measure market pressure by classifying Twitter messages to the entrepreneurs using a machine learning algorithm that is trained using Amazon Mechanical Turk. The results show a negative association between reporting frequency and the level of market pressure; one additional reporting event in a 10-week period is associated with a level of market pressure that is, on average, 17% lower than the median level of market pressure. This result is driven primarily by the reporting part of the product updates, as the mediation analysis reveals that only a small fraction of the relationship is driven by a consumption utility effect. Furthermore, additional tests show that this association is stronger when accompanied by reporting events of higher quality, is not influenced by the frequency of unverifiable additional announcements, and is weaker during periods with a large presence of newly joined market participants. Consistent with the existing literature on crowdfunding disclosures, these results highlight that higher reporting frequencies can reduce agency frictions in a crowdfunding market even when that market is characterized by strong myopic market preferences.

My findings can inform discussions among practitioners and academics on the role of disclosure practices and market pressure in the increasingly common scenario whereby consumers of an entrepreneur's product also become the entrepreneur's funders. Specifically, my results extend our limited understanding of the dynamics of two-way communication and agency frictions in crowdfunding settings. In a more general sense, my paper is one of the first to directly examine the existence and direction of the role that market pressure plays in the relationship between reporting frequency and managerial myopia. Finally, my study leaves unanswered several interesting questions about the relationship between reporting frequency and market pressure. For example, in my early-access crowdfunding setting, funders have very limited exit options. Prior analytical models suggest that the existence of an exit option can serve as a complement or substitute for active market participants' communication with management (e.g., Levit, 2018). Also, in my setting, there is no explicit distinction between

mandatory and voluntary reporting, as crowdfunding has only limited regulation and oversight. It would be interesting to obtain more insight into the effects of mandated reporting frequencies, especially in the context of equity crowdfunding. Finally, social media research suggests that the interpretation and perception of two-way communication between market participants and managers can be quite subtle and nuanced (e.g., Cade, 2018). My measure considers only the level of communication, but future research could extend this measure by creating empirical models that also incorporate the content and nuances of communications from market participants.

Graphs

Period length visualization

Fig. 2.1. Estimation results for various period lengths.

Mediation results

Fig. 2.2. Results of mediation model — Levels estimation

Fig. 2.3. Results of mediation model — Changes estimation

Visualization of LDA model

Fig. 2.4. Visualization of LDA model

Bibliography

Agrawal, A., Catalini, C., Goldfarb, A., 2014. Some Simple Economics of Crowdfunding. Innovation Policy and the Economy 14, 63–97.

Ahlers, G. K., Cumming, D., Günther, C., Schweizer, D., 2015. Signaling in Equity Crowdfunding. Entrepreneurship Theory and Practice 39, 955–980.

Armstrong, C. S., Guay, W. R., Weber, J. P., 2010. The role of information and financial reporting in corporate governance and debt contracting. Journal of Accounting and Economics 50, 179–234.

Asker, J., Farre-Mensa, J., Ljungqvist, A., 2015. Corporate Investment and Stock Market Listing: A Puzzle? Review of Financial Studies 28, 342–390.

Barbi, M., Bigelli, M., 2017. Crowdfunding practices in and outside the US. Research in International Business and Finance 42, 208–223.

Belleflamme, P., Lambert, T., Schwienbacher, A., 2014. Crowdfunding: Tapping the right crowd. Journal of Business Venturing 29, 585–609.

Belleflamme, P., Omrani, N., Peitz, M., 2015. The economics of crowdfunding platforms. Information Economics and Policy 33, 11–28.

Ben-Rephael, A., Da, Z., Easton, P. D., Israelsen, R. D., 2017. Does SEC Form 8-K Provide Information Necessary or Useful for the Protection of Investors? SSRN Electronic Journal.

Berger, A. N., Udell, G. F., 1995. Relationship Lending and Lines of Credit in Small Firm Finance.

Blankespoor, E., Miller, G. S., White, H. D., 2014. The Role of Dissemination in Market Liquidity: Evidence from Firms’ Use of Twitter. The Accounting Review 89, 79–112.

Block, J., Hornuf, L., Moritz, A., 2018. Which updates during an equity crowdfunding campaign increase crowd participation? Small Business Economics 50, 3–27.

Boland, C. M., Bronson, S. N., Hogan, C. E., 2015. Accelerated Filing Deadlines, Internal Controls, and Financial Statement Quality: The Case of Originating Misstatements. Accounting Horizons 29, 551–575.

Brav, A., Jiang, W., Partnoy, F., Thomas, R., 2008. Hedge Fund Activism, Corporate Governance, and Firm Performance. The Journal of Finance 63, 1729–1775.

Bushee, B. J., 1998. The Influence of Institutional Investors on Myopic R&D Investment Behavior.

Bushman, R., Chen, Q., Engel, E., Smith, A., 2004. Financial accounting information, organizational complexity and corporate governance systems. Journal of Accounting and Economics 37, 167–201.

Cade, N. L., 2018. Corporate Social Media: How Two-Way Disclosure Channels Influ- ence Investors. Accounting, Organizations and Society .

Casas-Arce, P., Lourenço, S. M., Martínez-Jerez, F. A., 2017. The Performance Effect of Feedback Frequency and Detail: Evidence from a Field Experiment in Customer Satisfaction. Journal of Accounting Research.

Cascino, S., Correia, M. M., Tamayo, A. M., 2018. Does Consumer Protection Enhance Disclosure Credibility in Reward Crowdfunding? SSRN Electronic Journal .

Chemla, G., Tinn, K., 2018. Learning Through Crowdfunding. SSRN Electronic Journal.

Chen, H., Hwang, B.-H., Liu, B., 2013. The Economic Consequences of Having ’Social’ Executives. SSRN Electronic Journal .

Conway, B. A., Kenski, K., Wang, D., 2015. The Rise of Twitter in the Political Campaign: Searching for Intermedia Agenda-Setting Effects in the Presidential Primary. Journal of Computer-Mediated Communication 20, 363–380.

Cornelissen, J., 2011. Corporate communication: a guide to theory and practice. Sage.

Cosh, A., Cumming, D., Hughes, A., 2009. Outside Entrepreneurial Capital. The Economic Journal 119, 1494–1533.

Courtney, C., Dutta, S., Li, Y., 2017. Resolving Information Asymmetry: Signaling, Endorsement, and Crowdfunding Success. Entrepreneurship Theory and Practice 41, 265–290.

Cuijpers, R., Peek, E., 2010. Reporting Frequency, Information Precision and Private Information Acquisition. Journal of Business Finance & Accounting 37, 27–59.

Cumming, D. J., Hornuf, L., Karami, M., Schweizer, D., 2017. Disentangling Crowdfunding from Fraudfunding. SSRN Electronic Journal.

Demsetz, H., Lehn, K., 1985. The Structure of Corporate Ownership: Causes and Consequences. Journal of Political Economy 93, 1155–1177.

Dimson, E., Karaka¸s,O., Li, X., 2015. Active Ownership. Review of Financial Studies 28, 3225–3268.

Doyle, J. T., Magilke, M. J., 2013. Decision Usefulness and Accelerated Filing Dead- lines. Journal of Accounting Research 51, 549–581.

Dyer, T., Lang, M., Stice-Lawrence, L., 2017. The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and Economics 64, 221–245.

Edmans, A., Heinle, M. S., Huang, C., 2016. The Real Costs of Financial Efficiency When Some Information Is Soft. Review of Finance 20, 2151–2182.

Ernstberger, J., Link, B., Stich, M., Vogler, O., 2017. The Real Effects of Mandatory Quarterly Reporting. The Accounting Review pp. accr–51705.

Fishburn, P. C., Rubinstein, A., 1982. Time Preference. International Economic Review 23, 677.

Froot, K. A., Perold, A. F., Stein, J. C., 1992. Shareholder Trading Practices And Corporate Investment Horizons. Journal of Applied Corporate Finance 5, 42–58.

Fu, R., Kraft, A., Zhang, H., 2012. Financial reporting frequency, information asym- metry, and the cost of equity. Journal of Accounting and Economics 54, 132–149.

Gigler, F., 1994. Self-Enforcing Voluntary Disclosures. Journal of Accounting Research 32, 224.

Gigler, F., Kanodia, C., Sapra, H., Venugopalan, R., 2014. How Frequent Financial Reporting Can Cause Managerial Short-Termism: An Analysis of the Costs and Benefits of Increasing Reporting Frequency. Journal of Accounting Research 52, 357–387.

Gneezy, U., Potters, J., 1997. An Experiment on Risk Taking and Evaluation Periods. The Quarterly Journal of Economics 112, 631–645.

Guo, T., Finke, M., Mulholland, B., 2015. Investor attention and advisor social media interaction. Applied Economics Letters 22, 261–265.

38 Gutirrez Urtiaga, M., Saez Lacave, M. I., 2018. The Promise of Reward Crowdfunding. SSRN Electronic Journal .

Harford, J., Kecsk´es,A., Mansi, S., 2018. Do long-term investors improve corporate decision making? Journal of Corporate Finance 50, 424–452.

Hermalin, B. E., Weisbach, M. S., 2012. Information Disclosure and Corporate Gov- ernance. The Journal of Finance 67, 195–233.

Hildebrand, T., Puri, M., Rocholl, J., 2017. Adverse Incentives in Crowdfunding. Man- agement Science 63, 587–608.

Hirshleifer, D., Teoh, S. H., 2003. Limited attention, information disclosure, and finan- cial reporting. Journal of Accounting and Economics 36, 337–386.

Hornuf, L., Schwienbacher, A., 2018. Market mechanisms and funding dynamics in equity crowdfunding. Journal of Corporate Finance 50, 556–574.

Huang, A. H., Lehavy, R., Zang, A. Y., Zheng, R., 2018. Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach. Management Science 64, 2833–2855.

Jacobson, R., Aaker, D., 1993. Myopic management behavior with efficient, but im- perfect, financial markets. Journal of Accounting and Economics 16, 383–405.

Jensen, M. C., Meckling, W. H., 1976. Theory of the firm: Managerial behavior, agency costs and ownership structure. Journal of Financial Economics 3, 305–360.

Jung, M. J., Naughton, J. P., Tahoun, A., Wang, C., 2017. Do Firms Strategically Dis- seminate? Evidence from Corporate Use of Social Media. The Accounting Review, Forthcoming .

Kraft, A. G., Vashishtha, R., Venkatachalam, M., 2017. Frequent Financial Reporting and Managerial Myopia. The Accounting Review pp. accr–51838.

Kuppuswamy, V., Bayus, B. L., 2018. Crowdfunding Creative Ideas: The Dynamics of Project Backers. In: The Economics of Crowdfunding, Springer International Pub- lishing, Cham, pp. 151–182.

Lee, H.-Y., Mande, V., Son, M., 2015. Are earnings announced early of higher quality? Accounting & Finance 55, 187–212.

Levit, D., 2018. Soft Shareholder Activism. SSRN Electronic Journal .

39 Madsen, J., McMullin, J., 2018. Economic Consequences of Risk and Ability Disclo- sures: Evidence From Crowdfunding. Working Paper .

Mc Namara, P., Baden-Fuller, C., 2007. Shareholder returns and the explorationex- ploitation dilemma: R&D announcements by biotechnology firms. Research Pol- icy 36, 548–565.

McCahery, J. A., Sautner, Z., Starks, L. T., 2016. Behind the Scenes: The Corporate Governance Preferences of Institutional Investors. The Journal of Finance 71, 2905– 2932.

Mollick, E., 2014. The dynamics of crowdfunding: An exploratory study. Journal of Business Venturing 29, 1–16.

Porter, D. P., Smith, V. L., 1995. Futures Contracting and Dividend Uncertainty in Experimental Asset Markets.

Sakaki, T., Okazaki, M., Matsuo, Y., 2010. Earthquake shakes Twitter users. In: Pro- ceedings of the 19th international conference on World wide web - WWW ’10 , ACM Press, New York, New York, USA, p. 851.

Sayre, B., Bode, L., Shah, D., Wilcox, D., Shah, C., 2010. Agenda Setting in a Digital Age: Tracking Attention to California Proposition 8 in Social Media, Online News and Conventional News. Policy & Internet 2, 7–32.

Stein, J. C., 1988. Takeover Threats and Managerial Myopia. Journal of Political Economy 96, 61–80.

Stein, J. C., 1989. Efficient Capital Markets, Inefficient Firms: A Model of Myopic Corporate Behavior. The Quarterly Journal of Economics 104, 655.

Strausz, R., 2017. A Theory of Crowdfunding: A Mechanism Design Approach with Demand Uncertainty and Moral Hazard. American Economic Review 107, 1430– 1476.

Thaler, R. H., Tversky, A., Kahneman, D., Schwartz, A., 1997. The Effect of Myopia and Loss Aversion on Risk Taking: An Experimental Test.

Van Der Heijden, E., Klein, T. J., M¨uller,W., Potters, J., 2012. Framing effects and impatience: Evidence from a large scale experiment. Journal of Economic Behavior & Organization 84, 701–711.

Verrecchia, R. E., 1983. Discretionary disclosure. Journal of Accounting and Economics 5, 179–194.

40 Wagenhofer, A., 2014. Trading off Costs and Benefits of Frequent Financial Reporting. Journal of Accounting Research 52, 389–401.

Xu, T., 2017. Financial Disintermediation and Entrepreneurial Learning: Evidence from the Crowdfunding Market. SSRN Electronic Journal .

Appendix A

Market Pressure Examples

Fig. A.1. A small selection of market pressure Twitter messages.

Appendix B

Amazon Mechanical Turk procedure

The Amazon Mechanical Turk (i.e., MTurk) workers are asked to classify 5 tweets and are paid US $0.20 upon completion. Completing the task takes between 60 and 90 seconds, so the payment equates to an average hourly rate of roughly $10. This hourly rate is above average but ensures that the workers take the time to evaluate each tweet appropriately. Figure B.1 shows a screen capture of the HIT instructions. Figure B.2 shows a screen capture of the three independent questions that an MTurk worker is required to evaluate for each tweet.
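The implied hourly rate follows from simple arithmetic; the sketch below (the helper function is hypothetical, the figures are those stated above) brackets it for the stated completion times:

```python
def implied_hourly_rate(payment_usd: float, seconds_per_task: float) -> float:
    """Convert a per-task payment into an implied hourly rate."""
    tasks_per_hour = 3600 / seconds_per_task
    return payment_usd * tasks_per_hour

# US $0.20 per HIT at 60-90 seconds per HIT:
fast = implied_hourly_rate(0.20, 60)  # fastest workers: $12/hour
slow = implied_hourly_rate(0.20, 90)  # slowest workers: $8/hour
print((fast + slow) / 2)              # midpoint of $10/hour
```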

Fig. B.1. Screen capture of MTurk instructions

Fig. B.2. Screen capture of MTurk task

Appendix C

Variable Definitions

Main variables

Market Pressure: The number of Twitter messages directed to the Twitter account of the entrepreneur during a given period that are classified as inducing pressure using the machine learning algorithm.
Reporting Frequency: The number of product updates with a non-zero file size change accompanied by an announcement that contains at least one keyword referencing a new version during a given period.
Reporting Quality: The average length (i.e., number of tokens) of the announcements accompanying a product update in a given period.
Additional Announcements: The number of announcements in a given period that are not accompanied by a product update.

Control variables

Number Negative Reviews: The number of negative reviews posted on the project page in a given period.
Number of Dev Tweets: The number of Twitter messages sent by the entrepreneur during a given period.
Owners Delta: The change in ownership for a given period.
Media Attention: The number of YouTube videos posted related to the project during a given period.
Median Playtime: The median number of hours that an average funder played the game for a given period at the high end of the distribution.

Additional variables

New Funder Trend: The average percentage change in ownership for the three 10-week periods prior to the given period.
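The New Funder Trend definition above amounts to a trailing three-period average; a minimal sketch (the function name and the numbers are hypothetical, not from the paper's data):

```python
def new_funder_trend(pct_changes, t):
    """Average percentage change in ownership over the three
    10-week periods preceding period t (None if fewer than three exist)."""
    if t < 3:
        return None
    window = pct_changes[t - 3:t]
    return sum(window) / len(window)

# Hypothetical per-period percentage changes in ownership:
changes = [0.10, 0.40, 0.10, 0.05]
print(new_funder_trend(changes, 3))  # average of the three prior periods
```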

Tables

Table 1: Descriptive Statistics

Panel A: Levels models

Statistic                          N     Mean    St. Dev.   Min     Median   Max

Market Pressure (#Tweets)          883   56.3    106.4      1       16.0     1,241.0
Number of Reports (t-1)            883   1.3     1.6        0       1.0      12.0
Number Negative Reviews (t-1)      883   48.3    80.8       0       15.0     614.0
Number of Dev Tweets (t-1)         883   87.8    118.4      1       46.0     1,022.0
Owners Delta (t-1)                 883   78.7    75.9       1.8     51.1     419.8
Media Attention (#Videos) (t-1)    883   11.0    16.1       0.0     3.2      50.0
Median Playtime (t-1)              883   270.9   263.1      6       204.6    2,225.0
Reporting Quality (t-1)            883   280.7   411.9      0       137.0    3,227.0
#Additional Announcements (t-1)    883   4.7     4.9        0       3.0      35.0
New Backer Trend (t-1)             639   0.2     0.6        -0.8    0.1      4.5

Panel B: Changes models

Statistic                          N     Mean    St. Dev.   Min        Median   Max

∆Market Pressure (#Tweets)         449   -7.1    72.5       -727.0     -2.7     358.0
∆Number of Reports                 449   -0.1    1.4        -5.3       -0.3     4.7
∆Number Negative Reviews           449   -2.0    58.5       -577.7     -0.7     453.0
∆Number of Dev Tweets              449   -11.5   88.4       -720.7     -6.0     577.7
∆Owners Delta                      449   6.6     28.1       -140.2     4.4      153.1
∆Media Attention (#Videos)         449   0.9     4.3        -22.4      0.1      26.5
∆Median Playtime                   449   15.0    37.1       -143.3     11.1     220.2
∆Reporting Quality                 449   -5.3    389.7      -1,128.0   36.3     3,227.0
∆#Additional Announcements         449   -0.5    3.1        -16.0      -0.3     21.0

Table 2: Main Regression

Panel A: Levels models

Dependent variable:

Market Pressure (i.e., Number of Tweets)

(1) (2) (3) (4) (5) (6)

Number of Reports (t-1)         -0.099**    -3.84***    -0.058*     -0.100**    -0.059*     -2.65**
                                (-2.35)     (-2.58)     (-1.86)     (-2.18)     (-1.85)     (-2.05)

Number Negative Reviews (t-1)    0.000       0.04        0.001       0.001       0.000       0.04
                                (0.21)      (0.42)      (1.24)      (0.87)      (1.00)      (0.61)

Number Dev Tweets (t-1)          0.004***    0.38***     0.001**     0.004***    0.001*      0.14*
                                (0.003)     (4.95)      (2.47)      (7.29)      (1.81)      (1.82)

Owners Delta (t-1)               0.003**     0.19*       0.000       0.003**     0.000       0.06
                                (2.19)      (1.70)      (0.17)      (2.21)      (0.75)      (0.70)

Media Attention (t-1)            0.033***    2.10*      -0.006       0.032***   -0.004      -0.39
                                (3.45)      (1.86)      (-0.83)     (3.58)      (-0.57)     (-0.61)

Median Playtime (t-1)           -0.001***   -0.003       0.000      -0.001***    0.001       0.001
                                (-3.96)     (-0.19)     (0.03)      (-3.58)     (1.40)      (0.05)

Constant                         2.97***    -11.61*                  2.722***
                                (19.18)     (-1.83)                 (8.51)

Estimator                Poisson   OLS     Poisson   Poisson   Poisson   OLS
Project Clustered SE     Yes       Yes     Yes       Yes       Yes       Yes
Project FE               No        No      Yes       No        Yes       Yes
Period FE                No        No      No        Yes       Yes       Yes
Observations             883       883     883       883       883       883
R2                                 0.39                                  0.80
Adjusted R2                        0.39                                  0.76

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 2: Main Regression (cont.)

Panel B: Changes models

Dependent variable:

Delta Market Pressure (i.e., Delta Number of Tweets)

(1) (2) (3) (4)

∆Number of Reports (t-1)         -3.48      -5.18**    -5.13**    -4.74**
                                 (-1.41)    (-2.10)    (-2.09)    (-2.04)

∆Number Negative Reviews (t-1)               0.29*      0.32*      0.30**
                                            (1.68)     (1.83)     (1.99)

∆Number Dev Tweets (t-1)                     0.19***    0.19***    0.19***
                                            (3.27)     (3.29)     (3.66)

∆Owners Delta (t-1)                                     0.02       0.04
                                                       (0.08)     (0.21)

∆Media Attention (t-1)                                 -1.34      -1.40
                                                       (-1.26)    (-1.29)

∆Median Playtime (t-1)                                  0.02       0.01
                                                       (0.30)     (0.16)

Constant                         -7.37**    -4.74*     -3.87*
                                 (-2.34)    (-1.67)    (-1.68)

Project Clustered SE     Yes       Yes       Yes       Yes
Project FE               No        No        No        No
Period FE                No        No        No        Yes
Observations             449       449       449       449
R2                       0.004     0.13      0.14      0.17
Adjusted R2              0.002     0.13      0.13      0.13
Residual Std. Error      72.43 (df = 447)   67.76 (df = 445)   67.77 (df = 442)   67.58 (df = 428)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 3: Reporting Quality Regressions

Dependent variable:

Delta Market Pressure (i.e., Delta Number of Tweets)

(1) (2) (3) (4)

∆Number of Reports (t-1)         -3.05      -4.27      -4.15      -3.65
                                 (-1.10)    (-1.52)    (-1.49)    (-1.36)

∆Report Quality (t-1)            -0.003     -0.01      -0.01      -0.01
                                 (-0.33)    (-1.02)    (-1.12)    (-1.13)

∆Interaction (t-1)               -0.01**    -0.01**    -0.01*     -0.01**
                                 (-2.08)    (-2.09)    (-1.85)    (-2.15)

∆Number Negative Reviews (t-1)               0.29       0.32*      0.30**
                                            (1.65)     (1.84)     (1.98)

∆Number Dev Tweets (t-1)                     0.19***    0.20***    0.19***
                                            (3.35)     (3.38)     (3.80)

∆Owners Delta (t-1)                                    -0.01       0.01
                                                       (-0.04)    (0.05)

∆Media Attention (t-1)                                 -1.39      -1.45
                                                       (-1.30)    (-1.33)

∆Median Playtime (t-1)                                  0.03       0.01
                                                       (0.34)     (0.17)

Constant                         -5.06      -2.37      -1.29
                                 (-1.56)    (-0.79)    (-0.44)

Project Clustered SE     Yes       Yes       Yes       Yes
Project FE               No        No        No        No
Period FE                No        No        No        Yes
Observations             449       449       449       449
R2                       0.01      0.14      0.15      0.18
Adjusted R2              0.004     0.13      0.13      0.14
Residual Std. Error      72.38 (df = 445)   67.63 (df = 443)   67.63 (df = 440)   67.30 (df = 426)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 4: Frequency of Unverifiable Announcements

Dependent variable:

Market Pressure (i.e., Number of Tweets)

(1) (2) (3) (4) (5) (6)

Number of Reports (t-1)         -7.40**    -8.32***   -6.12***   -3.30*     -6.30**    -3.67**
                                (-2.11)    (-2.68)    (-2.64)    (-1.75)    (-2.19)    (-1.98)

Num Announcements (t-1)         -0.32      -1.95      -1.36      -0.77      -1.40      -1.17
                                (-0.18)    (-1.65)    (-1.54)    (-0.95)    (-1.48)    (-1.47)

Interaction (t-1)                0.16       0.41       0.38       0.15       0.37       0.15
                                (0.43)     (1.39)     (1.48)     (0.88)     (1.37)     (0.88)

Number Negative Reviews (t-1)               0.42***    0.11       0.06       0.14       0.05
                                           (5.33)     (1.08)     (1.12)     (1.37)     (1.04)

Number Dev Tweets (t-1)                     0.40***    0.39***    0.17**     0.40***    0.15*
                                           (4.64)     (4.81)     (2.48)     (4.96)     (1.87)

Media Attention (t-1)                                  2.42**    -0.54       2.45**    -0.34
                                                      (2.52)     (-0.87)    (2.51)     (-0.55)

Median Playtime (t-1)                                  0.01      -0.02       0.01       0.001
                                                      (0.67)     (-0.59)    (0.62)     (0.03)

Constant                         66.71***   18.74**   -0.68
                                (4.41)     (2.01)     (-0.11)

Project Clustered SE     Yes       Yes       Yes       Yes       Yes       Yes
Project FE               No        No        No        Yes       No        Yes
Period FE                No        No        No        No        Yes       Yes
Observations             883       883       883       883       883       883
R2                       0.01      0.29      0.38      0.80      0.40      0.80
Adjusted R2              0.01      0.29      0.38      0.76      0.38      0.76
Residual Std. Error      106.06 (df = 879)   89.88 (df = 877)   83.80 (df = 875)   51.86 (df = 737)   83.90 (df = 857)   52.06 (df = 719)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 5: New Backers Regressions

Dependent variable:

Market Pressure (i.e., Number of Tweets)

(1) (2) (3) (4)

Number of Reports (t-1)         -10.14***  -9.15***   -9.81***   -5.17*
                                (-2.65)    (-2.81)    (-2.95)    (-1.89)

New Backer Trend (t-1)          -22.96***  -17.22***  -16.38***
                                (-2.66)    (-2.89)    (-2.62)

Interaction (t-1)                7.06**     5.92***    5.39**
                                (2.10)     (2.64)     (2.46)

Number Negative Reviews (t-1)               0.38**     0.40***    0.10
                                           (2.12)     (2.70)     (1.23)

Number Dev Tweets (t-1)                     0.41***    0.42***    0.14
                                           (3.86)     (4.01)     (1.36)

Media Attention (t-1)                       1.55**     1.64**     0.27
                                           (1.97)     (2.15)     (0.27)

Median Playtime (t-1)                       0.03       0.03      -0.04
                                           (1.55)     (1.53)     (-0.77)

Constant                         73.30***  -5.89
                                (5.84)     (-0.64)

Project Clustered SE     Yes       Yes       Yes       Yes
Project FE               No        No        No        Yes
Period FE                No        No        Yes       Yes
Observations             645       645       645       645
R2                       0.02      0.45      0.47      0.84
Adjusted R2              0.02      0.44      0.45      0.78
Residual Std. Error      108.13 (df = 641)   81.44 (df = 637)   80.81 (df = 621)   50.99 (df = 480)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 6: Cross-Sectional Split Regressions

Dependent variable:

Market Pressure (i.e., Number of Tweets)

                                Singleplayer   Multiplayer   Low Initial   Average Initial   High Initial
                                                             Success       Success           Success

Number of Reports (t-1)         -0.57          -3.40**        0.20         -2.76**           -4.85
                                (-0.36)        (-2.30)       (0.28)        (-2.09)           (-1.00)

Number Negative Reviews (t-1)    0.26*         -0.02         -0.31          0.07              0.05
                                (1.69)         (-0.36)       (-0.94)       (1.14)            (0.72)

Number Dev Tweets (t-1)          0.04           0.20*         0.07**        0.19***           0.18
                                (0.71)         (1.79)        (2.22)        (3.06)            (0.77)

Owners Delta (t-1)              -0.22           0.12          0.09          0.19              0.04
                                (-0.79)        (1.61)        (0.50)        (1.20)            (0.40)

Media Attention (t-1)           -3.03*         -0.17          1.38          0.27             -1.13
                                (-1.82)        (-0.28)       (1.01)        (0.54)            (-1.24)

Median Playtime (t-1)            0.02           0.01         -0.01         -0.09              0.16
                                (1.06)         (0.24)        (-0.74)       (-0.77)           (1.58)

Project Clustered SE     Yes       Yes       Yes       Yes       Yes
Project FE               Yes       Yes       Yes       Yes       Yes
Period FE                Yes       Yes       Yes       Yes       Yes
Observations             308       575       284       309       290
R2                       0.83      0.81      0.72      0.88      0.76
Adjusted R2              0.78      0.76      0.64      0.84      0.69
Residual Std. Error      42.36 (df = 235)   55.94 (df = 462)   16.46 (df = 219)   32.98 (df = 234)   83.33 (df = 225)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 7: Market Pressure Heterogeneity (LDA) Regressions

Dependent variable:

Market Pressure (i.e., Number of Tweets)

                                Release-Related        Problem-Related        Feature-Related
                                Levels     Changes     Levels     Changes     Levels     Changes

Number of Reports (t-1)         -0.47      -1.06       -0.80**    -1.34**     -0.48**    -0.75***
                                (-0.78)    (-0.90)     (-2.52)    (-2.55)     (-2.53)    (-3.18)

Number Negative Reviews (t-1)    0.01       0.05*       0.02       0.12*      -0.001      0.03**
                                (0.41)     (1.79)      (1.44)     (1.87)      (-0.10)    (2.18)

Number Dev Tweets (t-1)          0.02       0.04**      0.03*      0.04**      0.05***    0.04***
                                (0.94)     (2.24)      (1.67)     (2.30)      (2.73)     (3.38)

Owners Delta (t-1)               0.02      -0.001       0.01      -0.01        0.03*      0.04
                                (0.39)     (-0.02)     (0.34)     (-0.17)     (1.78)     (1.48)

Media Attention (t-1)            0.002      0.03       -0.19      -0.65*      -0.10      -0.34**
                                (0.01)     (0.07)      (-1.01)    (-1.81)     (-0.80)    (-2.01)

Median Playtime (t-1)            0.004      0.02        0.001     -0.01       -0.004      0.01
                                (0.52)     (0.76)      (0.07)     (-0.56)     (-0.60)    (0.29)

Model                    Levels    Changes   Levels    Changes   Levels    Changes
Project Clustered SE     Yes       Yes       Yes       Yes       Yes       Yes
Project FE               Yes       No        Yes       No        Yes       No
Period FE                Yes       Yes       Yes       Yes       Yes       Yes
Observations             862       426       862       426       862       426
R2                       0.68      0.07      0.78      0.21      0.81      0.19
Adjusted R2              0.61      0.02      0.73      0.17      0.77      0.15
Residual Std. Error      18.46 (df = 699)   24.36 (df = 405)   14.79 (df = 699)   21.60 (df = 405)   9.22 (df = 699)   11.62 (df = 405)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Chapter 3

The effect of allocating decision rights on the generation, application, and sharing of soft information

Co-author:

Jan Bouwens

3.1. Introduction

We study the effect of a regulator imposing a mandatory risk management system in an established setting. Economic theory suggests that firms are better off granting decision rights to the employees with the best knowledge of a given situation, as long as the firm also implements controls to prevent these employees from behaving opportunistically (Jensen and Meckling, 1992; Milgrom and Roberts, 1992; Raith, 2008). This expectation is supported by empirical work (Abernethy, Bouwens, and van Lent, 2004; Moers, 2006). What happens, however, to the quality of decision-making if the firm is forced to limit the decision-making rights of these employees? We study this question by examining whether implementing additional controls enables or impedes the incorporation of soft information in the loan decisions of a large bank.

Theory offers contradictory predictions on the effect of reallocating decision rights away from the agent responsible for collecting and processing soft information. Based on the work of Aghion and Tirole (1997), one might argue that centralization will make the role of the agent less prominent, which will lead the agent to shirk on information collection efforts. This prediction is supported by the argument that introducing additional agents into the process increases the communication costs of soft information (Dessein, 2002; Dewatripont and Tirole, 2005). Work by Holmström and Milgrom (1991), however, suggests that a reduction in task multidimensionality can increase agents' effectiveness in collecting and processing information. Constant, Kiesler, and Sproull (1994) support this prediction by arguing that career opportunities motivate agents to work harder when their actions become directly visible to a higher hierarchical party.

Studies (e.g., Campbell, 2012; Qian, Strahan, and Yang, 2015; Liberti, 2017) have looked into contexts in which individual banks unilaterally decide to extend the decision rights of loan officers on granting loans and establishing loan rates. These papers suggest that the extension of decision rights led loan officers to collect more (soft) information, which in turn enhanced their decisions. We extend this literature by examining a reduction in decision rights called for by a regulator. The regulator required that the bank increase its level of scrutiny in evaluating whether loans should be granted. We argue that, in this context, loan officers will be more willing to accept a limitation on their decision rights, as central management can claim that the regulator pointed to shortcomings in the loan process. As such, in our context, loan officers might increase their effort to collect and evaluate the information needed to assess loan applications. We suggest that the officers are sensitive to the underlying origin of the decision to reduce their decision rights. Employees who consider a procedure to be fair are more likely to subscribe to the outcome of that procedure (e.g., Fehr and Gächter, 2000).

We exploit an externally imposed organizational design change in the credit application process for small to medium-sized companies at a large European bank to study these contradictory predictions. Specifically, the bank limited the ability of loan officers to approve loans based upon soft information they had collected. It reallocated their decision rights to risk approvers and required the loan officers to share information with the approvers. This setting is especially suitable for studying the use of soft information because it is characterized by the low availability of verifiable information. Soft information, such as subjective assessments of the quality of management, is collected solely by the loan officer and contributes to the assessment of the creditworthiness of potential borrowers (Agarwal and Hauswald, 2010; Uchida, Udell, and Yamori, 2012; Drexler and Schoar, 2014).

Our empirical design exploits the shock to the bank's loan-application reviews that came with the introduction of this procedure. This shock, which is exogenous from the perspective of the loan officers, creates a quasi-natural experiment. Jensen and Meckling (1992) provide a framework to describe the expected effect of a shock of this sort on an organization's design. In our case, the loan officers' soft information is specific knowledge that is costly to transfer. The initial design of the bank's credit reviews minimized transfer costs by allocating decision rights to loan officers. This design, however, exacerbated the agency problem because higher-level managers could not verify the information underlying loan decisions (Jensen and Meckling, 1992). External pressure to impose additional controls led upper-level management to redesign the process by requiring loan officers to obtain approval from the risk department.

We use a loan application-level dataset for the years 2013 and 2014, compiled from a variety of proprietary internal information systems of the bank. These information systems primarily contain hard information related to the loan applications. Soft information is qualitative and unverifiable, which impedes storing it in the same way. A common approach to operationalize soft information is to create a proxy based on an internal risk rating, corrected for an estimated amount of integrated hard information (Berg, Puri, and Rocholl, 2014; Qian et al., 2015). Like Campbell, Erkens, and Loumioti (2014) and Campbell, Erkens, and Loumioti (2016), we avoid these indirect proxies by creating a variable based on the subjective interest-rate adjustment that the bank makes for some loans. The main determinant of the interest rate charged on loans is hard information. The optional adjustment is designed to incorporate the availability of soft information. Adjustments therefore proxy for the degree to which available soft information impacts decision-making.
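The proxy construction can be illustrated with a minimal sketch (all names and rates below are hypothetical; the actual adjustment data come from the bank's internal systems):

```python
def rate_adjustment(charged_rate: float, hard_info_rate: float) -> float:
    """Soft-information proxy: the discretionary gap between the rate
    actually charged and the rate implied by hard information alone."""
    return charged_rate - hard_info_rate

# Hypothetical loans: (charged rate, hard-information rate), in percent.
loans = [(5.00, 5.00), (4.60, 5.10), (6.25, 6.00)]
adjustments = [rate_adjustment(c, m) for c, m in loans]
# Share of loans where soft information moved the rate:
adjusted_share = sum(1 for a in adjustments if a != 0) / len(loans)
print(adjusted_share)  # two of the three loans carry an adjustment
```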

Our results indicate that soft information is used more effectively in loan decisions following the reallocation of decision rights. This result is driven by a change in the behavior of the loan officers and is robust to controlling for loan officer fixed effects and to the inclusion of rejected loan applications. Our analysis shows that this change in behavior is associated with improved outcomes, as measured by the risk rating trends in the 15 months following loan granting.

This paper extends knowledge in accounting and finance on the gathering, sharing, and application of soft information. Our setting permits a strong identification strategy without losing track of the underlying business process. With the credit reviews for corporate loan applications as a control group, we can account for bank-specific, geographical, and economic time trends by using a differences-in-differences design. We can also correct for strategic loan-sorting by the loan officer using a Heckman selection model, and we directly proxy for soft information using the discretionary part of the interest rate.
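In its simplest 2x2 form, the differences-in-differences logic reduces to netting the control group's change out of the treated group's change. A minimal sketch with hypothetical numbers (the actual estimation uses loan-level data with controls and a Heckman selection step):

```python
def diff_in_diff(treat_pre: float, treat_post: float,
                 ctrl_pre: float, ctrl_post: float) -> float:
    """Classic 2x2 differences-in-differences estimate: the treated group's
    change minus the control group's change nets out common time trends."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical average soft-information use (rate adjustments) before and
# after the design change, for SME loans (treated) vs. corporate loans
# (the control group):
effect = diff_in_diff(treat_pre=0.30, treat_post=0.55,
                      ctrl_pre=0.28, ctrl_post=0.33)
print(round(effect, 2))  # treated change 0.25 minus control change 0.05
```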

Our paper makes several contributions. We first extend the findings of Qian et al. (2015) by studying an explicit shock to an organization's design, applying a direct measure of soft information, and using a differences-in-differences design to rule out confounding events and time trends. We likewise broaden the findings of Liberti and Mian (2009) by showing that hierarchical distance does not impede the sharing and incorporation of soft information. And we add to the stream of literature led by Campbell et al. (2016) on the portability of soft information. Campbell et al. (2016) document that soft information can be transferred over time and between employees at the same hierarchical level. We show that soft information can also be transferred between hierarchical levels, even after a reallocation of decision rights. We also complement a growing literature on the effect of risk management in monitoring loan officers (e.g., Berg, 2015). Our study examines the introduction of a risk management system imposed by a regulator. To the best of our knowledge, this is the first study that directly examines the effects of a regulator imposing a mandatory risk management system in an established setting.

From a broader perspective, our paper contributes to the literature on the relationship between regulation and contracting. Hart (2009) suggests that "rather than being based on sound principles, regulation often seems to be a consequence of the public's need for action in response to a crisis", implying that regulatory intervention provides little contractual benefit. However, the work of Ertan, Loumioti, and Wittenberg-Moerman (2017) and Granja (2018) documents that regulator-imposed reporting practices increase loan quality and enhance the stability and development of commercial banks. We contribute to this debate by providing empirical evidence, from the internal perspective of a bank, on what happens when operational changes are imposed through regulation. Granja and Leuz (2017) suggest that banks that get assigned to a stricter regulator restructure their loan application processes. They predict that this could lead banks to grant fewer loans, as a stricter process naturally results in more rejected loans. On the other hand, it could also enhance the quality of risk assessment, allowing the bank to accept more loans than before the introduction of a more sophisticated risk management system. Our results highlight a positive relation between regulation and operational decision-making, suggesting that regulation can have positive implications for contracting.

Our results also provide insight into the difficulties for banks in evaluating loans for small and medium-sized enterprises. These kinds of borrowers are of great economic importance and rely mainly on bank financing (Uchida et al., 2012). In recent years, especially following the credit crisis of 2008-2009, banks have been criticized for being insufficiently willing to lend to small and medium-sized applicants (Wehinger, 2014). Our study delves into a bank's screening of loan applications from these kinds of companies to investigate factors that influence the likelihood of acceptance. Most papers are constrained by data limitations and cannot include these types of analyses.1

3.2. Hypotheses Development

A trade-off between agency costs and transfer costs

The classical agency problem is characterized by information asymmetry between the principal and the agent. In the framework of Jensen and Meckling (1992), information asymmetry relates to two types of information: general and specific. These types differ mainly in their transfer costs when people want to share information; specific information is costlier to transfer. In theory, the optimal solution would be to transfer general information upward and transfer decision rights downward to the agents in possession of the specific information. The possibility of opportunism, however, complicates this solution; organizational design must account for agents who might misuse their decision rights. In a situation with substantial specific information, this results in a trade-off between transfer costs (information lost by transfer) and agency costs (suboptimal use of the information).
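This trade-off can be made concrete with a toy comparison (all cost numbers below are invented for illustration; the framework itself is from Jensen and Meckling, 1992):

```python
def total_cost(transfer_cost: float, agency_cost: float) -> float:
    """Total cost of a decision-rights allocation: information lost in
    transfer plus the cost of potential misuse of decision rights."""
    return transfer_cost + agency_cost

# Centralizing raises transfer costs (specific knowledge is lost on the way
# up) but lowers agency costs (less scope for opportunism); decentralizing
# does the reverse:
centralized = total_cost(transfer_cost=8.0, agency_cost=2.0)
decentralized = total_cost(transfer_cost=1.0, agency_cost=6.0)
best = min([("centralized", centralized), ("decentralized", decentralized)],
           key=lambda pair: pair[1])
print(best)  # the allocation with the lower total cost
```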

1 With some notable exceptions, such as Agarwal and Hauswald (2010).

The concept of soft information

Soft information can be hard to define (Liberti and Mian, 2009). We investigate a setting that relates closely to the one described by Berger, Klapper, and Udell (2001) and Petersen (2004). These authors agree on three characteristics of soft information: it is difficult to generate, hard to verify, and costly to share. Hard or quantitative information, in contrast, is easy to store and can be objectively transmitted (Petersen, 2004). Examples of hard information used in credit evaluation are internal risk ratings as well as information about an applicant's industrial segment and location. Examples of soft information are the loan officer's assessments of the quality of management, the sustainability of involvement by key persons, the feasibility of the underlying credit purpose, the sustainability of the business model, and the risk profile of important suppliers.

Based on the theoretical framework presented below, we expect that the reallocation of decision rights affects how soft information influences decision-making. Our first hypothesis reflects this expectation.

Hypothesis 1. Moving decision rights to higher hierarchical levels affects the effectiveness of considering soft information in decision-making.

3.2.1. Reallocating decision rights enables effective use of soft information.

Reduced task multidimensionality

The efficient advocacy hypothesis (Berg, 2015), as developed by Holmström and Milgrom (1990, 1991), posits that a superior outcome can be achieved by splitting the responsibility for a task into several separate objectives. In our research setting, the risk approver is put in place to act as a safeguard and decrease the likelihood that risk is either misjudged or not appropriately acted upon by loan officers. From the perspective of the loan officer, knowing someone else will review credit decisions shifts attention away from risk assessment. Put differently, a reduction in task multidimensionality allows the loan officer to dedicate more time and effort to gathering and sharing information, increasing the availability of soft information that can be incorporated in the credit decision.

Knowledge sharing as a cost-benefit trade-off

Career considerations influence the behavior of loan officers (e.g., Cole, Kanz, and Klapper, 2015). Loan officers who perform well can expect to receive promotions, either through increased authority or through better positions in different departments. Constant et al. (1994) approach knowledge sharing as a cost-benefit trade-off. In our setting, that implies that the introduction of the risk approver increases the exposure of the loan officer. Knowledge sharing thus becomes a way for loan officers to improve their career prospects. This is an example of the generation and sharing of high-quality soft information being influenced by extrinsic motivators. As discussed by Lin (2007), however, intrinsic motivators may also influence the sharing of information. Adding agents increases the reciprocal benefits, knowledge self-efficacy, and enjoyment of helping others by sharing high-quality information.

Increased task clarity through the introduction of a supervisor

Gibbons and Henderson (2012) suggest that introducing an additional supervisor improves task clarity for loan officers. From the perspective of relational contract theory, there are two important components of knowledge: task understanding and relational knowledge. Understanding procedural guidelines and executing them falls under the category of task understanding, but the abstract nature of soft information inhibits the bank's creation of such guidelines (Campbell et al., 2016). Relational knowledge is defined as an understanding of what each party can do and is expected to do. Employees of the risk department have greater authority and are thus expected to better understand these undocumented rules and expectations. By mandating that loan officers interact with these risk employees, the bank might aim to improve loan officers' understanding of what soft information they are expected to gather and share.

A reduction of soft information that is being strategically withheld

Loan officers can strategically communicate soft information to affect the outcome of the credit application by withholding parts of their private information (Crawford and Sobel, 1982). Hertzberg, Liberti, and Paravisini (2010), for example, find that loan officer rotations improve the accuracy of communications because of a reduction in the strategic motivation to withhold information. The incentive of loan officers to strategically withhold information is also influenced by the hierarchical distance within the authorization chain (Dessein, 2002). Mosk (2014) documents that limiting the authority of loan officers to approve applications increases their incentives to share information. In our setting, the introduction of a risk approver might therefore increase the benefits of strategically sharing information. This, in turn, may increase the amount of available soft information.

The actions of the regulator

Regulators create social value to the extent they set rules that allow parties to write contracts that could not arise if not for those rules (Hart, 2009). Granja and Leuz (2017) find in this regard that banks that get assigned to a stricter regulator improve their internal management and organization. They show how these improvements enhanced lending and local business activity. A situation like this may exist, for instance, when individual agents can exploit their information advantage at the cost of the other party. In the case of loan decisions, it is not immediately clear under what conditions bank management and loan officers are unable to conclude (adapt) their contract. As theory suggests, agents may be perturbed if management decides that the bank makes better decisions if it centralizes decision-making on loans. However, if it is the case that the bank is better off with centralization, provided that loan officers accept centralization, regulators may provide the bank with the necessary justification to centralize decision-making. This condition exists if a central manager is better able than a loan officer to process information guiding the decision to grant the loan, set the conditions, or both. In that case, the loan officer specializes in collecting and sharing soft and hard information, while the central manager interprets the data to make the loan decision. The benefit would be enhanced if the loan officer is prepared to extend her or his information collection and sharing efforts. This would increase the amount of information available to the decision-maker. Even when the talents of central managers and loan officers are the same, decision-making will improve, provided that the loan officer starts to collect and share additional information in response to the centralization of decision-making.

Combining these theoretical considerations results in our first sub-hypothesis.
Hypothesis 2a. Moving decision rights to higher hierarchical levels increases the effectiveness of considering soft information in decision-making.

3.2.2. Reallocating decision rights impedes effective use of soft information.

A less prominent role for the loan officer

While the efficient advocacy hypothesis highlights the benefits of splitting responsibilities, another stream of literature emphasizes potential downsides. By introducing an additional agent into the authorization chain, the role of the loan officer is made less prominent (Aghion and Tirole, 1997). Recent papers, such as Qian et al. (2015) and Liberti (2017), show that an increase in responsibility increases the effort put into the generation of soft information. Reversing that logic results in the expectation that reducing the loan officer's responsibility leads to him or her making less effort to generate and share soft information. In summary, the amount of responsibility given to the loan officer is a delicate balance. Too much responsibility might cause the loan officer to act opportunistically, whereas too little could result in a loss of soft information.

The negative effects of evaluative pressure

Introducing a risk approver increases the exposure of the information generated by the loan officer. Increased exposure might cause the loan officer to hesitate about being creative in generating soft information. This change in behavior is explained by the finding of Campbell, Epstein, and Martinez-Jerez (2011) that increasing the intensity of monitoring can discourage unorthodox thinking and experimentation. Loan officers might experience greater evaluative pressure, which, combined with uncertainty about the expectations of upper-level management, could cause them to generate less information.

Increased communication costs

The introduction of a risk approver also increases communication costs. The loan officer must consider that there is an additional agent, from a different department, who will have to receive, understand, and interpret the loan officer's information. As a result, the loan officer will have to devote additional effort to presenting soft information in a way that maximizes the likelihood that it will be correctly interpreted. Only then can the risk approver act upon the information in the way the loan officer intended. To compensate for the increased costs, the loan officer may reduce the effort put into generating and sharing soft information. The risk approver may, in turn, not act upon the soft information because the costs of correctly understanding and interpreting it are too high (Bolton and Dewatripont, 1994; Dessein, 2002; Dewatripont and Tirole, 2005).

Combining these theoretical considerations results in our competing sub-hypothesis.
Hypothesis 2b. Moving decision rights to higher hierarchical levels decreases the effectiveness of considering soft information in decision-making.

3.3. Research Setting

The research setting for this study is a large European bank that offers banking, insurance, and asset management services. Data for one geographical segment, from August 2013 through August 2014, was compiled from the internal information systems of the bank. This geographical segment is an environment with a large and economically significant network of small and medium-sized enterprises. Throughout this period, the bank was consistently ranked among the 25 largest European banks.

3.3.1. Credit assessments

Loan rates

The bank generates a suggested loan rate for each application. This rate comprises a risk-based component (based on hard information) and a cost-based component, which includes charges for services such as document preparation, underwriting, and origination. In the case of applications from small and medium-sized businesses, the suggested loan rate is relatively high compared to those offered to bigger companies. The reason is that the risk-based component and the cost-based adjustment are both higher on average. These higher rates stem from the relative lack of hard information on applicants and the difficulty for the bank of covering the largely fixed application expenses with the small principal amounts of these loans. The loan officers, however, can propose a rate adjustment based on any material soft information that they have collected.

The suggested loan rate for these applications is, on average, substantially higher than the interest rate that would be calculated based on the actual probability of default by small and medium-sized businesses. As a result, loan officers typically propose loan rate reductions when they incorporate soft information into the applications.2,3

Application process

In the small and medium-sized sector, the bank receives many applications that generally request relatively small amounts of credit. Evaluation is thus standardized to efficiently cope with the volume. Only a limited amount of hard information is readily available about the applicants, which increases the importance of soft information, when it is available (Uchida et al., 2012). The bank's guidelines, however, emphasize only the importance of soft information without providing comprehensive guidance on how to gather and store it (Campbell et al., 2016).

A loan application starts with a request for a loan via regular channels, such as the website or a phone call. Another option is for the applicant to directly contact his or her relationship manager at the bank. Depending on the channel through which an application arrives, it is screened based on a set of guidelines that require the applicant to have the appropriate set of documents, such as a Chamber of Commerce registration. If the basic conditions are met, the application is taken into consideration via a front-office manager, who handles the early communications. In many circumstances, the application is discontinued at this early stage, often at the initiative of the applicant. Applications that pass through this early stage are classified as potential credit candidates and are assigned a loan officer.

2 This is also reflected by the fact that these adjustments are referred to as loan rate discounts within the bank. An adjustment therefore interacts with the final interest rate in the following way: a positive (negative) adjustment reduces (increases) the interest rate.
3 A counterintuitive implication of this typical overstatement is that net negative soft information frequently leads to an interest-reducing adjustment if the calculated interest rate is still overstated after incorporating the negative soft information. These scenarios will obviously reduce the interest rate less, compared to a situation of net positive soft information, but they can still be interest reducing.

Based on the soft information gathered by the loan officer and the hard information provided by the applicant, a loan proposal is suggested by the loan officer. It includes hard information, such as internal ratings and details on the credit history of the applicant. It also includes soft information, such as communications and meeting summaries, combined with any subjective assessments of the loan officer. The application is then sent to a front-office member. The file contains both the proposal of the loan officer and a summary of the soft and hard information used to develop the proposal. The loan officer and front-office manager may then discuss the proposal, and they make a joint decision to sign off on the loan file.

3.3.2. Policy change

At the beginning of 2014, a policy change was implemented. Some of the decision rights of loan officers to grant loans were moved to the risk department. In other words, loan officers could no longer prepare the full credit application and make an offer to the client without the explicit approval of the risk department. This change was primarily triggered by a policy change initiated by a regulator, which required the bank to implement tougher controls in its credit reviews. The rationale behind the bank's choice to alter the decision rights was that upper-level management viewed the original review process as unable to generate appropriate risk profiles for applicants.4 Soft information is an important part of an accurate risk profile, especially for small and medium-sized businesses (Uchida et al., 2012; Drexler and Schoar, 2014). The loan officers are the account managers of the loan applicants and are thus the only employees tasked with collecting soft information. The bank decided to involve the risk department toward the end of the application process. The risk department then has the final say on the loan proposal and the acceptance of an application. The risk department can ask loan officers to elaborate on their initial loan proposals. If the risk department disagrees with the loan officer, its decision prevails.

4 The bank did not decide to alter the decision rights because it perceived the interest rate to be consistently too high or too low. The combination of external pressure with a perceived inability of the loan officers to generate an appropriate risk profile was the primary driver.

3.3.3. Role of the loan officer

Incentives of the loan officer

The majority of activities by loan officers involve managing their accounts. These activities range from providing services to customers to negotiating with clients to increase the likelihood that loan payments arrive on time (e.g., forbearance negotiations). The incentive system for the loan officers does not include performance-related bonuses. Loan officers receive a fixed wage determined by tenure, and the bank's culture motivates through the possibility of promotion.

Loan officers are therefore not given individual targets in terms of the number of loans they are expected to sell. Occasionally, they are encouraged to increase their sales activity if the total number of granted loans in a period is falling short of the projected amount. These inducements, however, do not play a leading role in promotion decisions. Instead, the likelihood of promotion, and the performance appraisal in general, is primarily based on how well loan officers manage their clients. Standard advancement paths include a promotion to servicing corporate loans or to a different department such as the risk department. Promotions are characterized by not only an increase in responsibilities but also an increase in the total amount of outstanding credit that the officer can manage.5

Discretion of the loan officer

For our study, it is of particular interest how the soft information collected by the loan officer is incorporated into the loan proposal and the outcome of the credit application. In the settings of similar papers, soft information is usually integrated into the internal risk rating (Berg, 2015; Qian et al., 2015) of a client. In the application process we study, however, these internal credit ratings are solely based on hard information and are kept clear from any discretionary influence by the loan officer. Based on policy documents and conversations with bank employees, we have identified two key stages at which the collected soft information can influence applications.

In the early stage of an application, front-office employees have some discretion over the applications that they pre-screen. The majority of early rejects or cancellations are due to the inability of the applicant to comply with basic requirements, such as the uploading of hard information. At this early stage, however, the front-office employee is the sole possessor of any soft information gathered from communications with the applicant. Many of these applications do not reach the point of being logged in the main application information system.

5 The incentive structure for risk officers is largely similar to that of the loan officers. The primary difference is that the evaluation of risk officers is more focused on the risk assessment quality of their portfolio.

After the application has passed these early stages, it is entered into the application processing system. At this stage, the loan officer determines the specifics of the loan proposal. The officer has discretion over two main choices: credit construction and interest rate. The credit construction determines the type of product (e.g., working capital loan, overdraft loan, or regular fixed-interest loan) that serves as the foundation for the rest of the application. One application can consist of multiple product types. The total interest rate consists of an objective part, based solely on hard information, and an optional subjective adjustment based on soft information. This optional adjustment is only allowed for some product types.

3.4. Sample Selection and Empirical Design

3.4.1. Sample selection

The sample for our main analyses consists of 2,600 credit applications from August 2013 through August 2014, including applications from both small and medium-sized enterprises and the lower half of the corporate segment. A benefit of this data is that many of our variables are based on meta-data captured and logged by the information systems. Auditors prefer to audit processes such as these based on meta-data because it is an independent and unmanipulated source of information (Jans, Alles, and Vasarhelyi, 2013).

This set of applications is selected based on the following criteria. The applications are all from the same country. An applicant requests some form of credit (e.g., early repayment requests are excluded). The application must be for a meaningful amount (no administrative cases), and it must have available data for our key constructs. To ensure consistency throughout the sample, our main analyses focus on a sample of accepted applications.

3.4.2. Empirical design

Main dependent variable

Our main dependent variable is a proxy for the soft information integrated into the credit application. It is operationalized by means of the subjective component of the interest rate. This optional subjective component is intended to adjust the interest rate margin that is calculated based on a set of hard information. Campbell et al. (2016) verify this role by documenting a significant link between their direct measure for soft information, based on textual analysis of exception reports, and the discretionary part of the interest rate. Both a positive (interest decreasing) and a negative (interest increasing) adjustment are possible. Our main dependent variable is a single quantitative construct defined below.6

Soft information taken into consideration = Subjective adjustment / Calculated interest margin
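The construct above is a simple ratio, which can be sketched as follows. The function name and the example values are hypothetical, not the bank's data:

```python
# Minimal sketch of the dependent variable: the subjective (soft-information)
# interest adjustment scaled by the calculated (hard-information) interest
# margin. Names and example values are hypothetical illustrations.

def soft_information(subjective_adjustment: float, calculated_margin: float) -> float:
    """Subjective adjustment divided by the calculated interest margin."""
    if calculated_margin == 0:
        raise ValueError("calculated interest margin must be non-zero")
    return subjective_adjustment / calculated_margin

# e.g., a 0.5pp interest-reducing adjustment on a 4pp calculated margin:
print(soft_information(0.5, 4.0))  # 0.125
```

Scaling by the calculated margin is what prevents hard information from indirectly driving the construct: a large adjustment on a large margin and a small adjustment on a small margin yield the same value.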

Note that we do not attempt to measure the amount of soft information. We are interested in measuring the degree of effectiveness with which soft information is integrated into the decision process. By doing so, we not only examine the gathering and sharing of soft information but also emphasize the aspect of appropriately acting on that information.7

We define “effectively integrating soft information” as using soft information to adjust the calculated interest rate toward a rate that better reflects the probability of default. This definition implies that a larger interest-decreasing adjustment reflects an increase (decrease) in effectiveness, if the calculated interest rate is substantially too high (low) and vice versa for a higher interest-increasing adjustment. As explained in Section 3.1, our sample of applications is characterized by a calculated interest rate that is typically overstated, compared to the probability of default. As a result, we can interpret an average increase (decrease) in interest-reducing adjustments as soft information being integrated more (less) effectively into decision-making.8

Research design

We aim to isolate the effect on our soft information construct that is attributable to the organizational design shock. This is operationalized by the use of a differences-in-differences design. Our treatment group is the small and medium-sized business segment of the bank, and the corporate segment is our control group. These two segments have their own separate departments within the bank, each with their own employees, policies, and procedures. There is no reason to suspect that the policy change in the small and medium-sized department also affects applications for corporate clients. This assumption is strengthened by the observation that risk-department review is already a practice for the corporate segment throughout our sample period. The first set of empirical procedures is based on the following model (i indexes borrowers and t indexes months):

6 The height of the adjustment is correlated with the calculated interest margin. A large calculated interest margin generally requires a larger adjustment. We prevent hard information from indirectly affecting our soft information construct by using a ratio that explicitly corrects for the height of the calculated interest margin.
7 It is for this reason that we do not look at the absolute value of our dependent variable; the absolute value would not allow us to study the effectiveness of incorporating soft information.
8 Note that this interpretation is specific to our setting. In a scenario in which the calculated interest rate typically understates the underlying probability of default, the interpretation would be reversed.

Soft information_{i,t} = β0 + β1 Shock_t + β2 SME_i + β3 (Shock_t × SME_i) + Controls + ε_{i,t}    (3.1)

The coefficient β1 is an indicator for the general trend of the control group. β2 indicates any level differences of the trend in the ex ante period when comparing the control and treatment groups. For our analysis, the main coefficient of interest is β3, which indicates how the subjective adjustment is influenced by the shock, relative to the unaltered trend of the control group. A significant coefficient on the interaction term is consistent with our expectation that the shock affects the way that soft information is integrated into decision-making. A more interest-decreasing correction, in our setting, should be interpreted as soft information being better integrated into decision-making, as it allows a reduction of the typically overstated initial interest rate. This interpretation comports with the work of Campbell et al. (2014) and Campbell et al. (2016), who study a similar setting. To summarize, a significantly positive β3 coefficient supports hypothesis 2a (soft information better incorporated), whereas a significantly negative β3 coefficient supports hypothesis 2b (soft information incorporated less effectively).
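Without controls, the interaction coefficient β3 in a differences-in-differences design equals the treated group's pre/post change minus the control group's pre/post change. The sketch below illustrates this using the subjective-adjustment group means reported in Table 2, Panel A; the function name is our own:

```python
# Illustration of the differences-in-differences logic behind beta_3 in
# equation (3.1): with no controls, it equals (treated change) - (control
# change). The four values are the sample means from Table 2, Panel A.

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """beta_3 = (treated pre/post change) - (control pre/post change)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

beta3 = did_estimate(treat_pre=0.014, treat_post=0.024,
                     ctrl_pre=-0.043, ctrl_post=-0.084)
print(round(beta3, 3))  # 0.051, matching the baseline coefficient in Table 3
```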

Control variables

Several control variables are added to account for applicant- and application-specific factors. The internal risk rating captures the majority of applicant-specific factors. This rating is calculated based on the objective characteristics of an applicant that are available to the bank. The height of the rating corresponds with the estimated probability of default; a high rating relates to a high estimated probability of default. Bank employees often use the risk rating as a baseline prediction for the riskiness of a company. We therefore expect that companies with high ratings will receive more attention and have more soft information considered. There are multiple types of rating models available for use, and the dummy variable objective rating indicates whether the risk rating is generated by a new stream of arguably better rating models. Two additional applicant-specific control variables are included: the dummy variable going concern is 1 if the applicant is still active six months after the application, and the dummy variable easy financials is 1 if basic financial information is available via the Chamber of Commerce. Three additional application-specific control variables are included. Variable interest is a dummy variable to indicate whether the interest rate is variable, and new credit is a dummy variable to indicate that the application is a request for a new loan. The variable processing time is the number of working days between the initialization date and the date at which the application is either accepted or rejected.

Selection effects

Certain product types receive an interest rate that is based only on hard information. These standard product types are generally used for lower risk (e.g., lower credit amount) applications and allow for quick processing. The loan officer, however, influences this product choice and thus indirectly influences whether a subjective adjustment is possible. This presents a potential problem because our soft information construct is available only for applications where a subjective adjustment is possible. A solution would be to assume that the subjective adjustment is zero for those applications with a standard product type. This approach, however, would ignore any nonrandomness involved in selecting the product type. Work by Heckman (1979) provides a better solution in the form of the Heckman selection model. This estimation procedure works by treating the selection effect as an omitted variable bias. An adjustment factor is estimated via a selection model, which is then included in the outcome model as an additional explanatory variable. In the first stage, the following selection model is estimated via binary choice estimation procedures.

Custom product_i = β0 + β1 Shock_t + β2 SME_i + β3 (Shock_t × SME_i) + Controls + Exclusion restrictions + ε_{i,t}    (3.2)

Based on this first stage, an adjustment factor called the inverse Mills ratio, or Lambda, is calculated, which is included in the second stage as an additional explanatory variable.

Soft information_i = β0 + β1 Shock_t + β2 SME_i + β3 (Shock_t × SME_i) + β4 Lambda_i + Controls + ε_{i,t}    (3.3)

At least one valid exclusion restriction is required to assure the econometric validity of the Heckman procedure. These exclusion restrictions must be included in the selection equation but not in the outcome equation. It is difficult to find an exclusion restriction for which it makes economic sense to exclude it from the second-stage equation. For our situation, two reasonably valid exclusion restrictions have been identified: a dummy variable to indicate a limited liability company (LLC legal form) and the years of incorporation. Conversations with bank employees and policy documents suggest that the legal form of a company is an important consideration for choosing a particular product type. Both the age of an applicant and the legal form are hard-information factors that are not expected to influence the subjective adjustment. To statistically verify this claim, the first set of analyses will include these exclusion restrictions to assess their significance in the second-stage outcome equation.
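The adjustment factor in the two-step procedure can be sketched as follows. After the probit selection equation yields a fitted index for each application, the inverse Mills ratio λ = φ(z)/Φ(z) is computed and added as the extra regressor "Lambda" in the outcome equation. This is a pure-stdlib sketch of the ratio itself, assuming a standard normal error; an actual estimation would use a statistics package:

```python
# Sketch of the Heckman two-step adjustment term: the inverse Mills ratio
# lambda(z) = phi(z) / Phi(z), evaluated at the fitted probit index z of
# the selection equation. Standard normal pdf/cdf via the stdlib only.
import math

def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z: float) -> float:
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def inverse_mills(z_hat: float) -> float:
    """Adjustment factor included as 'Lambda' in the second stage."""
    return norm_pdf(z_hat) / norm_cdf(z_hat)

print(round(inverse_mills(0.0), 4))  # 0.7979 (= sqrt(2/pi) at z = 0)
```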

3.5. Descriptive Statistics and Empirical Results

3.5.1. Descriptive Statistics

Table 1 provides detailed descriptive statistics for the samples used for our two main empirical tests. The first (second) column in Table 1 displays the mean values and standard deviations for the subset of observations with product types that can (cannot) have a subjective adjustment. For the total set of observations these statistics are displayed in the third column. Around 65% of the applications in our sample have a construction that allows for a subjective adjustment. This results in 1,646 applications for the custom product type and 917 observations for the standard product type, totaling 2,563 observations.

The dependent variable subjective adjustment is included in the first column of Table 1. On average, the subjective adjustment is -3.3% with a standard deviation of 23.3%. This negative adjustment indicates that an average application receives a subjective adjustment that increases its calculated interest rate by 3.3%. The standard deviation, however, implies that this adjustment varies substantially across applications. Around 60% of our 2,563 applications belong to the small and medium-sized treatment group. The group of applications with a standard product type consists mainly of these applicants. This is explained by corporate applicants generally requesting a larger credit amount, resulting in a custom credit construction. Roughly 60% of the observations fall in the ex-post period. Consistent with our prediction, applications with a standard product design have, on average, a lower risk rating and request a relatively small amount of credit. Over 95% of the applicants in our sample remain active at least six months after the application has been completed. The average applicant firm is 22 years old. A credit application with a potential adjustment generally takes 46 working days to complete, compared to 25 working days for those with a standard product design.

[Table 1 about here]

The main econometric characteristic of our empirical analyses is the differences-in-differences approach. We believe that we can improve our understanding of this approach if we support our analyses by splitting the descriptive statistics based on the four groups of the differences-in-differences design. In our case, that results in a split based on small and medium-sized businesses versus corporate and ex ante versus ex post. These descriptive statistics are displayed in Table 2, Panel A, for the first set of empirical tests and in Table 2, Panel B, for the second set of empirical tests.

The main statistics of interest in Table 2, Panel A, are the subjective adjustment differences between the four groups. The treatment group is small and medium-sized loan applications, and the control group is corporate loan applications. Before (after) indicates before (after) the policy change. In our control group, the subjective adjustment goes from -4.3% in the ex ante period to an average of -8.4% in the ex post period. This pattern deviates from that of the treatment group: a positive 1.4% in the ex ante period shifts to an average of 2.4%. The treatment group has a slightly higher risk rating compared to the control group, which implies that larger clients are, in general, given a lower estimated probability of default. These ratings do not appear to be affected by the shock. After the introduction of a risk approver, the average processing time for small and medium-sized loan applications increases from 32 to 37 working days.

[Table 2 Panel A about here]

Panel B of Table 2 is constructed using the same groups as Panel A. Of main interest is the distribution between custom and standard product types. The proportion of custom products for the treatment group increases slightly, from 38% to 46%. This distribution remains at a high level throughout our sample period for the control group. There is a slight increase in the average credit amount and internal risk rating for the treatment group. Similar to Panel A, the average processing time for small and medium-sized loan applications increases by around seven working days.

[Table 2 Panel B about here]

3.5.2. Main Analysis

In Table 3, we show the results of our main differences-in-differences model. Three different specifications are included. Columns 1 and 4 show the results of our baseline model, using only the explanatory variables of the differences-in-differences design. Columns 2 and 5 improve this baseline model by including the control variables. The final columns, 3 and 6, present the most comprehensive model by adding two-digit NAICS industry and geographical indicators. We run an ordinary least squares estimation for the equations represented in the first three columns. Our dependent variable, however, is a ratio that is censored by an upper and lower bound of 2. Ordinary least squares estimation does not take such bounded dependent variables into account, and we therefore run a Tobit type 1 censored regression estimation in the final three columns. All estimation results have standard errors that are corrected for heteroscedasticity and clustered by two-digit industry.

The results of Table 3 indicate that the interest adjustment of the control group has decreased (-0.044, p = 0.028) when comparing the ex ante and ex post situations. Also consistent with the descriptive statistics of Table 2, Panel A, is the positive and significant coefficient for SME (0.071, p = 0.007). The interaction term between shock and SME is significant and positive with a coefficient of around 5.0% (p = 0.034). This result supports the hypothesis that the introduction of a risk approver significantly influences the integration of soft information into credit applications. The positive sign of this coefficient indicates that the effectiveness of integrating soft information into decision-making improved, which supports hypothesis 2a. This result holds across all specifications and estimation methods. The internal credit rating (0.019, p = 0.000) and credit amount (0.001, p = 0.000) are significant and positive in all of our estimations. This result is consistent with our expectation that applications with a higher probability of default or those that request more credit receive more attention. Neither exclusion restriction, LLC legal form (p = 0.543) nor years of incorporation (p = 0.197), is statistically significant.

The coefficient and significance of the interaction term are also influenced by the trend of the treated group relative to the trend of the control group. This consideration is relevant even when the common trend assumption holds. In the descriptive statistics from Panel A of Table 2, we show that the average subjective adjustment changes from 1.4% to 2.4% for the treatment group and from -4.3% to -8.4% for the control group. Under the common trend assumption, this indicates that, without the shock, the subjective adjustment for the treated group would have changed from 1.4% to approximately -2.7%. The interaction coefficient in the first column of Table 3 (0.051, p = 0.025) captures this difference between the expected -2.7% and the observed 2.4%. Our result is strengthened by the fact that this difference is not only driven by a relative trend effect but also by an absolute increase in the mean value of our construct.
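The counterfactual arithmetic in this paragraph can be verified directly (values in percentage points, taken from Table 2 Panel A):

```python
# Difference-in-differences arithmetic for the subjective adjustment
# (percentage points, from Table 2 Panel A).
treated_before, treated_after = 1.4, 2.4
control_before, control_after = -4.3, -8.4

control_trend = control_after - control_before   # -4.1
counterfactual = treated_before + control_trend  # expected treated value absent the shock
did_estimate = (treated_after - treated_before) - control_trend

print(round(counterfactual, 1))  # -2.7, matching the text
print(round(did_estimate, 1))    # 5.1, close to the 0.051 coefficient in Table 3
```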

[Table 3 about here]

3.5.3. Selection Effects

Our main result may be driven by a selection effect. Loan officers can sort applications into proposals that allow for subjective interest adjustments (a “custom product”) or ones that do not (a “standard product”). This second set of analyses is aimed at identifying how selection effects (changes in loan sorting) impact our results. First, we investigate whether the shock changed the selection behavior of the loan officers. Second, we investigate whether this behavior could be driving the results observed in the first set of tests. The results of the selection equations presented in Table 4 aim at answering the first question. Table 5 presents the results of the outcome equations that replicate the results of Table 3 with a selection correction. Two estimation procedures for the Heckman selection model are included: a full information maximum likelihood (MLE) procedure and the less efficient two-step procedure. The two-step procedure has corrected standard errors to account for the fact that an estimated parameter is included in the second stage.

[Table 4 about here]

The main result of Table 4 is that the coefficient on the interaction term Shock * SME is not statistically significant (p-values range from 0.100 to 0.768) in the majority of specifications. This indicates that, from an econometric perspective, the shock did not trigger a change in selection behavior by the loan officers. Both exclusion restrictions are, as expected, significant determinants of the likelihood of being assigned a custom product type (LLC Legal Form: 0.242, p = 0.000 and Years of incorporation: 0.002, p = 0.001). Besides the main explanatory variables and the two exclusion restrictions, other control variables are significant in the selection equations. Applications with higher internal ratings (0.017, p = 0.000), ratings generated by the new rating models (0.043, p = 0.072), or applicants requesting a new loan (0.202, p = 0.000) have a higher likelihood of being assigned a custom product type.

[Table 5 about here]

Following the structure of Table 4, Table 5 includes two specifications and two estimation methods. The objective of Table 5 is to investigate whether the results in Table 3 are driven by a selection effect. An applicable starting point is to investigate the statistics generated by the Heckman estimation procedure: rho and lambda. Rho is calculated to assess whether selection into the outcome sample is based on a nonrandom process. At the bottom of Table 5, it becomes apparent that rho is significant (-0.755, p = 0.000) in most specifications, indicating that the selection into the outcome sample is not random. A negative rho indicates that the unobservable characteristics affect the likelihood of receiving a custom product in a way opposite to how these characteristics influence the magnitude of the subjective adjustment. Both the estimates of rho and lambda indicate that selection into the outcome equation is not random, which makes it worthwhile to include lambda in the outcome equations.

As in Table 3, the main variable of interest in Table 5 is the interaction term Shock * SME. Throughout all specifications, this interaction term remains positive and significant (p-values range from 0.020 to 0.069). The inclusion of a selection adjustment therefore does not compromise our earlier results. The increased effectiveness of incorporating soft information, as observed in Table 3, is not driven by a selection effect.

3.6. Additional Tests

3.6.1. Loan officer Fixed Effects

The pool of employees responsible for assessing and processing the applications in our sample is quite heterogeneous. Roughly 350 different loan officers have been identified as assigned to a credit application in our sample, and around 250 different risk approvers are identified as being involved with a credit application in our ex post sample. This heterogeneity in the loan officer pool inhibits the implementation of loan officer fixed effects in our main analyses, as in many cases a loan officer appears only once or twice in our sample.

[Table 6 about here]

Table 6 replicates Table 3 for a subsample of applications assigned to a loan officer who appears at least twice in both the before and after periods. The literature documents that employee turnover is one of the main drivers of changes in economic outcomes after a change in organizational design (e.g., Campbell, 2012). The goal of this analysis is to study whether our main effect is driven by a shift in the loan officer pool and to check whether our results are susceptible to the inclusion of loan officer fixed effects. Columns 1 and 3 speak to the first question and show that the interaction term for this subsample remains statistically significant (0.059, p = 0.012). In columns 2 and 4, loan officer fixed effects are included; this absorbs the effect of SME, but our interaction term Shock * SME remains unchanged (0.056, p = 0.032). Overall, these results show that our main result is primarily driven by a change in behavior by the established loan officers and not by a change of the loan officer pool.

3.6.2. Loan outcomes

Our primary results in Table 3 and our additional results of Table 6 indicate a change in behavior by the loan officers following the change in organizational structure. These analyses, however, do not allow us to infer whether this change in behavior has positive or negative implications for the loan outcomes. The literature usually uses the charge-off rate to assess loan outcomes. The first two columns of Table 7 show how the charge-off rate is influenced by the change in organizational structure. The dependent variable of these logit regressions is a dummy variable that equals 1 if the risk rating of a loan is higher than the charge-off threshold 15 months after the loan has been granted. Regardless of region and segment fixed effects, the primary coefficient of this regression is not significant (-0.002, p = 0.966 and -0.009, p = 0.774). This result is not surprising, given that the nature of these loans is long term and performance would be expected to deteriorate gradually. Given our sample years, it is not possible to directly observe the long-term charge-off outcome, but we use a proxy (the “probability of default slope”) to capture the long-term performance of these loans. This proxy is calculated by estimating a linear regression for each loan, where the risk rating (on a per-month basis) is the dependent variable and time (month relative to the date of loan granting) is the independent variable. This slope is then used as the dependent variable for the OLS regressions in columns 3 and 4 of Table 7. The negative coefficient on Shock * SME (-0.047, p = 0.030 and -0.044, p = 0.040) suggests that, relative to the control group, the shock in organizational design improves the probability of default prospects. These results confirm that better integration of soft information into loan applications helps the performance of these loans.
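The per-loan slope described above amounts to a small regression for every loan; a sketch assuming a long-format panel with hypothetical column names `loan_id`, `month`, and `rating`:

```python
import numpy as np
import pandas as pd

def pd_slope(panel: pd.DataFrame) -> pd.Series:
    """Per-loan 'probability of default slope': the OLS slope of the
    monthly risk rating on the month relative to loan granting.
    A positive slope means the rating (risk) worsens over time."""
    return panel.groupby("loan_id").apply(
        lambda g: np.polyfit(g["month"], g["rating"], deg=1)[0])
```

The resulting Series would then serve as the dependent variable in the regressions of columns 3 and 4.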

[Table 7 about here]

3.6.3. Pre-screening

An aspect of the application process that may be relevant but is hard to investigate is pre-screening. Pre-screening does not relate to our main result directly, because these screens are generally performed by a different group of employees. It is, however, helpful to provide some descriptive results that give insight into pre-screening.9 We create a separate sample based on another information system that stores the majority of credit applications received by the bank via regular channels, such as the internet or by telephone. Applications that lack the basic requirements and those that are retracted by the applicant are filtered out of this sample. Following the approach of Qian et al. (2015), we split the sample into two periods: February 2013 through August 2013 and February 2014 through August 2014. For each period, an estimation is performed to identify how the known characteristics of an application influence the likelihood that it will be allowed to enter the next application stage. This approach is descriptive and not intended to differentiate between time trends or a potential effect attributable to our shock.

9 Unfortunately, we are not able to match these early applications to the applications in our main sample, because the various systems use incompatible identifiers.

[Table 8 about here]

Table 8 provides the results of logit estimations for these two periods. A large difference between these two periods is the inflow of credit requests; it decreases from a total of 4,000 to around 1,600. This is consistent with the contraction in small and medium-sized business financing described by Wehinger (2014). Looking at the coefficients, several characteristics indicate a higher likelihood of early acceptance: applications received via a face-to-face meeting (client meeting: 0.262, p = 0.000), applicants who request a loan via their relationship manager (0.423, p = 0.000), and applicants whose existing loan is older than one year (establish client: 0.097, p = 0.000). The positive and significant coefficient on going concern (0.179, p = 0.000) confirms that pre-screening can select applicants that have a lower probability of defaulting within six months after the application is completed.

Besides the reduction in the number of requests, there are three other differences in Table 8 when comparing the two periods. The first is the increasing importance of the credit amount that is requested. Consistent with the trend of increasing risk aversion by the bank is the lower likelihood of an early acceptance for higher credit amounts in the more recent period (0.012, p = 0.082 → -0.030, p = 0.000). It is possible for the front office to request a brief early recommendation by the risk department, which is captured by the dummy variable risk involvement. A strong increase in the coefficient for risk involvement is observed when comparing the two periods (0.302, p = 0.000 → 0.763, p = 0.000). This growth suggests that the risk department also has an increased importance during the pre-screening stages after the organizational shock. The final difference is an increase in the explanatory power (R² = 0.175 → R² = 0.446) of these basic characteristics, which highlights the attempt of the bank to improve the loan review process for small to medium-sized companies.

3.7. Robustness Tests

3.7.1. Likelihood of acceptance

In the previous empirical tests, only accepted applications were considered. As mentioned by Agarwal and Hauswald (2010), leaving these unaccepted applications out of the sample might result in an endogeneity problem. The goal of this section is to investigate whether the main results are driven by the exclusion of these non-accepted applications and whether the shock influenced the likelihood of acceptance itself. We define non-accepted applications as those that have a completed application process but are rejected or declined at the final stage.

[Table 9 about here]

The first column of Table 9 is a reduced version of the model underlying Table 3. Around 300 non-accepted applications are added to the sample, and the dummy variable accepted indicates whether an application is accepted. The negative and significant coefficient (-0.024, p = 0.019) of accepted shows that accepted applications, on average, receive a more interest-increasing or less interest-decreasing adjustment compared to non-accepted applications. The interaction coefficient remains unchanged (0.050, p = 0.029), which gives reassurance that our results are not driven by the exclusion of non-accepted applications.

Columns 2 and 3 of Table 9 present logit estimations with the acceptance dummy as the dependent variable. Columns 2 and 3 are based only on the small and medium-sized business sample, due to an insufficient number of non-accepted observations with a standard product type for the control group. The results, as displayed through the negative and significant shock coefficient (-0.058, p = 0.001), are consistent with a downward trend in the likelihood of acceptance. The interaction term is insignificant (0.029, p = 0.397), indicating that the likelihood of acceptance is not impacted by the introduction of a risk approver.

3.7.2. Common Trend Assumption

Our analyses contain two differences-in-differences estimations, each with its own common trend assumption. No test can guarantee a robust verdict on the validity of the common trend assumption. There are, however, several tests that can give some reassurance that a violation of the common trend assumption is not driving the results (Roberts and Whited, 2012). We present two of these tests: a graphical representation of the ex ante linear trend and a placebo test using a random shock on the ex ante sample.

The graphical approach is performed by estimating a linear trend via OLS for the ex ante period. These estimates result in a linear prediction line for both the treatment and control groups. While this approach does not represent a statistical procedure, it does allow for an eyeball approach to identify potential problems. These graphical results are followed up by placebo tests designed to add statistical verification. The intuition behind this test is that a randomly picked placebo shock point in the ex ante period should not yield a significant interaction term if the common trend assumption holds.10 A number of random placebo shock dates are picked, and the baseline specifications, presented below, are iteratively estimated based on these placebo shocks. The iterative results are averaged to yield the final ones.

10 The ex ante period used is Sept. 1, 2013, through Nov. 30, 2013, excluding December 2013, to

Soft information_i = β0 + β1 Placebo Shock_t + β2 SME_i + β3 (Placebo Shock_t × SME_i)

Custom product_i = β0 + β1 Placebo Shock_t + β2 SME_i + β3 (Placebo Shock_t × SME_i)
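The iterative placebo procedure can be sketched as follows (illustrative only; `date`, `sme`, and the outcome column `y` are assumed names, and the actual specifications include the controls of the baseline models):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def placebo_interaction(df, n_draws=50, seed=0):
    """Draw random placebo shock dates within the ex ante window,
    re-estimate the baseline interaction for each draw, and return
    the average interaction coefficient across draws."""
    rng = np.random.default_rng(seed)
    dates = np.sort(df["date"].unique())
    coefs = []
    for cutoff in rng.choice(dates[1:-1], size=n_draws):
        d = df.assign(placebo=(df["date"] >= cutoff).astype(int))
        fit = smf.ols("y ~ placebo * sme", data=d).fit()
        coefs.append(fit.params["placebo:sme"])
    return float(np.mean(coefs))
```

Under a common trend, the averaged interaction coefficient should be close to zero, as in Panel A of Table 10.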

Figure 1 and Panel A of Table 10 display the test results for the first set of analyses. Inspection of Figure 1 for both the small and medium-sized and corporate samples shows a decreasing trend that appears to be aligned between the two groups. The placebo results presented in Panel A of Table 10 confirm the common trend suggested by Figure 1; the placebo shock interaction term is not significant (-0.001, p = 0.810).

[Figure 1 about here]

[Table 10 Panel A about here]

Figure 2 and Panel B of Table 10 display the test results for the second set of analyses. Figure 2 reveals an almost flat trend for the corporate segment, while there appears to be an increasing trend for the small and medium-sized segment. This would suggest that these trends are misaligned. The nonsignificant (-0.449, p = 0.308) interaction term in Panel B of Table 10, however, contradicts this suggestion. Comparing the tail end of the small and medium-sized linear trend line with the mean value in Panel B of Table 2 (46%) suggests that the increasing trend of the segment discontinues after the shock. We confirm this suggestion when we plot the ex post period selection trend. In Figure 3, we indeed observe that both trends appear to be aligned for the ex post period. This observation results in a special econometric case in which the interaction term in Table 4 is mainly influenced by any trend difference that preceded the shock. This intuition is best understood by noting that switching the 0 and 1 around for the shock term yields results identical to Table 4 but with flipped signs. The nonsignificance of the interaction term in Table 4 therefore indicates that the common trend assumption is not violated. Overall, these results indicate that the common trend assumption can reasonably be expected to hold for both the outcome and selection models.
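The flipped-sign intuition is an exact algebraic property of the interaction model and can be checked numerically (synthetic data, illustrative only):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# In y = a + b*shock + c*sme + d*shock*sme, substituting shock = 1 - flipped
# yields an interaction coefficient of exactly -d on flipped*sme.
rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({"shock": rng.integers(0, 2, size=n),
                   "sme": rng.integers(0, 2, size=n)})
df["y"] = 0.1 * df["shock"] * df["sme"] + rng.normal(scale=0.5, size=n)
df["flipped"] = 1 - df["shock"]

b = smf.ols("y ~ shock * sme", data=df).fit().params["shock:sme"]
b_flip = smf.ols("y ~ flipped * sme", data=df).fit().params["flipped:sme"]
assert abs(b + b_flip) < 1e-8  # equal magnitude, opposite sign
```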

[Figure 2 about here]

[Table 10 Panel B about here]

[Figure 3 about here]

avoid potentially picking up effects related to the shock.

3.8. Conclusions

We examine how externally imposed changes to the allocation of decision rights affect the generation, sharing, and application of soft information in a large European bank. Our research design is built on a differences-in-differences model that uses the introduction of a risk approver as a quasi-natural experiment. This introduction of additional risk assessment procedures was imposed by the local regulator. Using a sample of loan applications, we find that reallocating decision rights to a higher organizational level affects the integration of soft information into the assessment of credit applications. Decreasing the decision authority of loan officers leads them to share their soft information with the risk approvers. Increasing the decision rights of risk management and reducing those of the loan officers therefore improves the bank's decision-making by allowing soft information to be better used in the assessment of credit applications. This increased effectiveness is accompanied by an improvement in the loan outcomes. These findings are robust to controlling for strategic loan-sorting behavior, manager fixed effects, and the likelihood of acceptance. We also document that this improved integration of soft information is driven by a change in behavior of the loan officers and not by a change in the loan officer pool.

The results of our study may contradict recent findings of Qian et al. (2015) and Liberti (2017). In our paper, we examine a situation in which central management decided to take away the loan officers' decision rights over whether loan applicants would be granted a loan. Based on the findings of Qian et al. (2015) and Liberti (2017), one might expect that decision-making would deteriorate. However, a key characteristic of our setting is that central management can justify its decision as being imposed by the regulator. This justification makes it more likely that loan officers will accept the decision and cooperate (Fehr and Gächter, 2000). Our evidence suggests that this cooperation extends to loan officers stepping up their effort to collect, share, and process information in the wake of the centralization decision.

There are several other characteristics of our setting that might be driving our result that centralization improves the use of soft information in lending decisions. The first relates to competition between banks, as our bank operates in a setting with much competition. The analytical model of Heider and Inderst (2012) yields the expectation that the agency problem between a bank and its loan officers is greater in situations of high competition, causing the loan officers to rely on hard information and neglect soft information. Furthermore, Canales and Nanda (2012) empirically document that decentralization can have adverse effects on the lending terms for small businesses in settings with high competition. The introduction of a risk approver might help alleviate this heightened agency problem and allow for soft information to be integrated better into the applications. Second, Berger, Miller, Petersen, Rajan, and Stein (2005) show that bigger banks have more trouble incorporating soft information in loan reviews. Our bank is one of the biggest in Europe. Reducing its decentralization might thus help alleviate its difficulties with incorporating soft information. Lastly, loan officers and risk managers are relatively homogeneous in our setting. It is common for risk managers to have been loan officers previously. Canales and Greenberg (2016) document, for example, that it is easier for loan officers with similar lending styles to transfer information between each other. This homogeneity may help reduce some of the frictions that might prevent soft information from being transferred, enabling the risk approver to better use soft information.

There are several limitations to this study that could be addressed by future research. Due to data limitations, we cannot determine exactly how the change in decision rights impacts the collection, sharing, and application of soft information individually; we only observe the aggregate outcome of the decision-making. Such effects could be teased out with a randomized controlled trial (Paravisini and Schoar, 2015). Furthermore, our sample period is relatively short. Future research can extend our work by investigating how the influence of risk management evolves. Additionally, besides a reallocation of decision rights, the bank is trying to improve its risk rating models by shifting employees' focus to verifiable information. While these new rating models do not affect our sample period, they do provide an interesting avenue for future research.

Graphs

Parallel Trends

Fig. 3.1. Ex-ante trend for the main analysis.

Fig. 3.2. Ex-ante trend for the selection analysis.

Fig. 3.3. Ex-post trend for the selection analysis.

Bibliography

Abernethy, M., Bouwens, J., van Lent, L., 2004. Determinants of control system design in divisionalized firms. The Accounting Review 79, 545–570.

Agarwal, S., Hauswald, R., 2010. Distance and Private Information in Lending. Review of Financial Studies 23, 2757–2788.

Aghion, P., Tirole, J., 1997. Formal and Real Authority in Organizations. Journal of Political Economy 105, 1–29.

Berg, T., 2015. Playing the Devil’s Advocate: The Causal Effect of Risk Management on Loan Quality. Review of Financial Studies 28, 3367–3406.

Berg, T., Puri, M., Rocholl, J., 2014. Loan officer incentives, internal ratings and default rates. Working Paper.

Berger, A. N., Klapper, L. F., Udell, G. F., 2001. The Ability of Banks to Lend to Informationally Opaque Small Businesses. Journal of Banking & Finance 25, 2127–2167.

Berger, A. N., Miller, N. H., Petersen, M. A., Rajan, R. G., Stein, J. C., 2005. Does function follow organizational form? Evidence from the lending practices of large and small banks. Journal of Financial Economics 76, 237–269.

Bolton, P., Dewatripont, M., 1994. The Firm as a Communication Network. Quarterly Journal of Economics 109, 809–839.

Campbell, D., 2012. Employee Selection as a Control System. Journal of Accounting Research 50, 931–966.

Campbell, D., Epstein, M. J., Martinez-Jerez, F., 2011. The learning effects of monitoring. Accounting Review 86, 1909–1934.

Campbell, D., Erkens, D. H., Loumioti, M., 2014. Exception Reports as a Source of Idiosyncratic Information. Working Paper.

Campbell, D., Erkens, D. H., Loumioti, M., 2016. Monitoring and the Portability of Soft Information. Working Paper.

Canales, R., Greenberg, J., 2016. A Matter of (Relational) Style: Loan Officer Consistency and Exchange Continuity in Microfinance. Management Science 62, 1202–1224.

Canales, R., Nanda, R., 2012. A darker side to decentralized banks: Market power and credit rationing in SME lending. Journal of Financial Economics 105, 353–366.

Cole, S., Kanz, M., Klapper, L., 2015. Incentivizing Calculated Risk-Taking: Evidence from an Experiment with Commercial Bank Loan Officers. Journal of Finance 70, 537–575.

Constant, D., Kiesler, S., Sproull, L., 1994. What’s mine is ours, or is it? A study of attitudes about information sharing. Information Systems Research 5, 400–421.

Crawford, V. P., Sobel, J., 1982. Strategic Information Transmission. Econometrica 50, 1431–1451.

Dessein, W., 2002. Authority and Communication in Organizations. The Review of Economic Studies 69, 811–838.

Dewatripont, M., Tirole, J., 2005. Modes of Communication. Journal of Political Economy 113, 1217–1238.

Drexler, A., Schoar, A., 2014. Do Relationships Matter? Evidence from Loan Officer Turnover. Management Science 60, 2722–2736.

Ertan, A., Loumioti, M., Wittenberg-Moerman, R., 2017. Enhancing Loan Quality Through Transparency: Evidence from the European Central Bank Loan Level Reporting Initiative. Journal of Accounting Research 55, 877–918.

Fehr, E., Gächter, S., 2000. Fairness and Retaliation: The Economics of Reciprocity. Journal of Economic Perspectives 14, 159–182.

Gibbons, R., Henderson, R., 2012. What Do Managers Do? Exploring Persistent Performance Differences among Seemingly Similar Enterprises. In: The Handbook of Organizational Economics, Princeton University Press, pp. 680–731.

Granja, J., 2018. Disclosure Regulation in the Commercial Banking Industry: Lessons from the National Banking Era. Journal of Accounting Research 56, 173–216.

Granja, J., Leuz, C., 2017. The Death of a Regulator: Strict Supervision, Bank Lending and Business Activity. SSRN Electronic Journal.

Hart, O., 2009. Regulation and Sarbanes-Oxley. Journal of Accounting Research 47, 437–445.

Heckman, J., 1979. Sample Selection Bias as a Specification Error. Econometrica 47, 153–161.

Heider, F., Inderst, R., 2012. Loan Prospecting. Review of Financial Studies 25, 2381–2415.

Hertzberg, A., Liberti, J. M., Paravisini, D., 2010. Information and Incentives Inside the Firm: Evidence from Loan Officer Rotation. The Journal of Finance 65, 795–828.

Holmström, B., Milgrom, P., 1990. Regulating Trade Among Agents. Journal of Institutional and Theoretical Economics pp. 85–105.

Holmström, B., Milgrom, P., 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, & Organization pp. 24–52.

Jans, M., Alles, M., Vasarhelyi, M., 2013. The case for process mining in auditing: Sources of value added and areas of application. International Journal of Accounting Information Systems 14, 1–20.

Jensen, M. C., Meckling, W. H., 1992. Specific and General Knowledge, and Organizational Structure. In: Werin, L., Wijkander, H. (eds.), Contract Economics, Blackwell Publishers, Oxford, U.K.

Liberti, J. M., 2017. Initiative, Incentives, and Soft Information. Management Science p. mnsc.2016.2690.

Liberti, J. M., Mian, A. R., 2009. Estimating the Effect of Hierarchies on Information Use. Review of Financial Studies 22, 4057–4090.

Lin, H., 2007. Effects of extrinsic and intrinsic motivation on employee knowledge sharing intentions. Journal of Information Science 33, 135–149.

Milgrom, P. R., Roberts, J., 1992. Economics, organization, and management. Prentice-Hall.

Moers, F., 2006. Performance Measure Properties and Delegation. The Accounting Review 81, 897–924.

Mosk, T., 2014. Delegation of Authority and Information Manipulation: Evidence from Bank Lending Decisions. Working Paper.

Paravisini, D., Schoar, A., 2015. The Incentive Effect of Scores: Randomized Evidence from Credit Committees.

Petersen, M. A., 2004. Information: Hard and Soft. Working Paper.

Qian, J. Q., Strahan, P. E., Yang, Z., 2015. The Impact of Incentives and Communication Costs on Information Production and Use: Evidence from Bank Lending. Journal of Finance 70, 1457–1493.

Raith, M., 2008. Specific knowledge and performance measurement. The RAND Journal of Economics 39, 1059–1079.

Roberts, M. R., Whited, T. M., 2012. Endogeneity in Empirical Corporate Finance. Working Paper.

Uchida, H., Udell, G. F., Yamori, N., 2012. Loan officers and relationship lending to SMEs. Journal of Financial Intermediation 21, 97–122.

Wehinger, G., 2014. SMEs and the credit crunch. OECD Journal: Financial Market Trends 2013/2, 115–148.

Tables

Table 1: Descriptive Statistics

Variable                  Custom products    Standard products   Total              Min      Max
SME                       0.397 (0.489)      0.971 (0.169)       0.602 (0.489)      0.000    1.000
Shock                     0.601 (0.490)      0.538 (0.499)       0.579 (0.494)      0.000    1.000
Subjective Adjustment     -0.033 (0.233)     —                   —                  -1.857   1.000
Custom product            —                  —                   0.642 (0.479)      0.000    1.000
Variable Interest         0.606 (0.489)      0.692 (0.462)       0.637 (0.481)      0.000    1.000
Internal Rating           13.270 (3.254)     12.797 (3.035)      13.100 (3.185)     5.000    22.000
Going Concern             0.976 (0.152)      0.979 (0.143)       0.977 (0.149)      0.000    1.000
LLC Legal Form            0.759 (0.428)      0.347 (0.476)       0.612 (0.487)      0.000    1.000
Objective rating          0.758 (0.428)      0.708 (0.455)       0.740 (0.439)      0.000    1.000
Years of incorporation    24.065 (24.447)    18.267 (19.563)     21.991 (22.984)    1.000    312.000
Processing time           45.603 (38.631)    25.062 (22.649)     38.254 (35.194)    1.000    195.000
New credit                0.501 (0.500)      0.301 (0.459)       0.429 (0.495)      0.000    1.000
Amount                    8.676 (17.924)     0.842 (6.160)       5.873 (15.295)     0.003    193.339
Easy financials           0.717 (0.451)      0.314 (0.464)       0.573 (0.495)      0.000    1.000
Observations              1,646              917                 2,563

This table presents the descriptive statistics for the variables described in Section 4. The first column shows the subset of observations with a credit construction that allows for an adjustment. The second column is the subsample with a credit construction that does not allow for an adjustment. The third column is the total sample. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Subjective Adjustment is our main dependent variable and proxies for soft information, calculated by: (subjective interest rate adjustment / calculated interest margin); note that a negative adjustment increases the interest rate. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard deviations are included in parentheses.

Table 2A: Descriptive Statistics

                          Before                                 After
Variable                  Treated            Control             Treated            Control
Subjective Adjustment     0.014 (0.150)      -0.043 (0.264)      0.024 (0.116)      -0.084 (0.282)
Variable Interest         0.412 (0.493)      0.781 (0.414)       0.406 (0.492)      0.705 (0.457)
Internal Rating           13.283 (3.115)     13.378 (3.246)      13.315 (3.168)     13.158 (3.378)
Going Concern             0.972 (0.165)      0.970 (0.170)       0.985 (0.121)      0.976 (0.153)
LLC Legal Form            0.712 (0.454)      0.791 (0.407)       0.658 (0.475)      0.828 (0.378)
Objective rating          0.692 (0.463)      0.783 (0.413)       0.720 (0.449)      0.795 (0.404)
Years of incorporation    22.504 (23.950)    24.490 (23.676)     22.807 (22.075)    25.304 (26.624)
Processing time           31.744 (24.366)    51.704 (48.735)     36.938 (25.717)    53.261 (40.316)
New credit                0.636 (0.482)      0.387 (0.488)       0.639 (0.481)      0.427 (0.495)
Amount                    2.421 (3.118)      12.142 (22.214)     2.315 (2.581)      13.329 (21.872)
Easy financials           0.632 (0.483)      0.764 (0.425)       0.614 (0.487)      0.792 (0.406)
Observations              250                406                 404                586

This table presents the descriptive statistics for the sample underlying our first set of analyses. The columns are split based on the groups of the differences-in-differences model. The treatment group are SME loan applications and the control group are corporate loan applications. Before (After) indicates before (after) the policy change. Subjective Adjustment is our main dependent variable and proxies for soft information, calculated by: (subjective interest rate adjustment / calculated interest margin); note that a negative adjustment increases the interest rate. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard deviations are included in parentheses.

Table 2B: Descriptive Statistics

Before After
Treated Control Treated Control

Custom product 0.379 0.964 0.456 0.980 (0.486) (0.186) (0.498) (0.140)

Variable Interest 0.592 0.765 0.567 0.699 (0.492) (0.425) (0.496) (0.459)

Internal Rating 12.889 13.316 13.093 13.193 (3.075) (3.244) (3.089) (3.389)

Going Concern 0.971 0.971 0.985 0.977 (0.167) (0.167) (0.120) (0.151)

LLC Legal Form 0.481 0.796 0.475 0.829 (0.500) (0.404) (0.500) (0.376)

Objective rating 0.684 0.774 0.730 0.793 (0.465) (0.419) (0.444) (0.406)

Years of incorporation 19.407 24.599 20.493 25.219 (21.063) (23.598) (21.090) (26.492)

Processing time 24.244 52.128 32.149 52.958 (21.015) (48.868) (24.713) (40.250)

New credit 0.420 0.397 0.451 0.430 (0.494) (0.490) (0.498) (0.495)

Amount 1.153 12.127 1.277 13.474 (2.189) (22.045) (2.018) (22.570)

Easy financials 0.429 0.770 0.437 0.793 (0.495) (0.422) (0.496) (0.406)

Observations 659 421 885 598

This table presents the descriptive statistics for the sample underlying our second set of analyses. The columns are split based on the groups of the differences-in-differences model. The treatment group consists of SME loan applications and the control group of corporate loan applications. Before (After) indicates before (after) the policy change. Custom product is our second dependent variable and indicates whether an application was assigned a credit construction that allows for a subjective adjustment. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard deviations are included in brackets.

Table 3: Main Results

Subjective Adjustment
OLS Tobit [-2, 2]

Shock −0.041∗ −0.037∗∗ −0.044∗∗ −0.041∗∗ −0.037∗∗ −0.044∗∗ (0.020) (0.018) (0.018) (0.020) (0.018) (0.018)

SME 0.057∗∗∗ 0.081∗∗∗ 0.071∗∗∗ 0.057∗∗∗ 0.081∗∗∗ 0.071∗∗∗ (0.021) (0.023) (0.023) (0.021) (0.023) (0.023)

Shock * SME 0.051∗∗ 0.045∗∗ 0.050∗∗ 0.051∗∗ 0.045∗∗ 0.050∗∗ (0.021) (0.021) (0.022) (0.021) (0.021) (0.021)

Variable Interest 0.020 0.026 0.020 0.026 (0.018) (0.017) (0.018) (0.017)

Internal Rating 0.018∗∗∗ 0.019∗∗∗ 0.018∗∗∗ 0.019∗∗∗ (0.003) (0.003) (0.003) (0.003)

Going Concern −0.010 0.004 −0.010 0.004 (0.029) (0.024) (0.029) (0.024)

LLC Legal Form −0.016 −0.016 −0.016 −0.016 (0.025) (0.026) (0.025) (0.025)

Objective rating −0.002 0.001 −0.002 0.001 (0.010) (0.009) (0.010) (0.009)

Years of incorporation 0.0002 0.0002 0.0002 0.0002 (0.0002) (0.0001) (0.0002) (0.0001)

Processing time 0.0001 0.0001 0.0001 0.0001 (0.0001) (0.0001) (0.0001) (0.0001)

Easy financials 0.012 0.012 0.012 0.012 (0.024) (0.026) (0.024) (0.026)

New credit −0.002 0.005 −0.002 0.005 (0.014) (0.016) (0.014) (0.016)

Amount 0.001∗∗∗ 0.001∗∗∗ 0.001∗∗∗ 0.001∗∗∗ (0.0002) (0.0002) (0.0002) (0.0002)

Intercept −0.043∗∗ −0.309∗∗∗ −0.321∗∗∗ −0.043 −0.309∗∗∗ −0.321∗∗∗ (0.021) (0.069) (0.071) (0.044) (0.069) (0.070)

Region indicators No No Yes No No Yes Segment indicators No No Yes No No Yes

Observations 1,646 1,646 1,646 1,646 1,646 1,646
R2 0.038 0.113 0.157
Adjusted R2 0.037 0.106 0.135
Log Likelihood 95.980 162.816 203.996

This table presents our main result, the effect of reallocating decision rights on the amount of soft information that is integrated into the assessment of a credit application. Subjective Adjustment is our main dependent variable and proxies for soft information, calculated by: (Subjective interest rate adjustment / Calculated interest margin). Note, a negative adjustment increases the interest rate. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Shock * SME is the main variable of interest; this interaction term indicates how the reallocation of decision rights affects the amount of considered soft information. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01.

Table 4: First Stage Heckman Selection Model

Likelihood Custom Product
Maximum Likelihood Two-Step

LLC Legal Form 0.227∗∗∗ 0.183∗∗∗ 0.237∗∗∗ 0.242∗∗∗ (0.052) (0.028) (0.049) (0.049)

Years of incorporation 0.001∗∗ 0.002∗∗∗ 0.001∗∗∗ 0.002∗∗∗ (0.001) (0.001) (0.0005) (0.001)

Shock 0.081 0.119∗∗∗ 0.064 0.078 (0.055) (0.043) (0.052) (0.052)

SME −0.581∗∗∗ −0.543∗∗∗ −0.567∗∗∗ −0.522∗∗∗ (0.069) (0.061) (0.042) (0.043)

Shock * SME −0.037 −0.072∗ −0.017 −0.028 (0.052) (0.044) (0.056) (0.057)

Variable Interest −0.003 −0.055 −0.006 −0.040 (0.063) (0.044) (0.039) (0.040)

Internal Rating 0.019∗∗∗ 0.009∗ 0.017∗∗∗ 0.017∗∗∗ (0.004) (0.005) (0.003) (0.004)

Going Concern −0.110∗∗∗ −0.087∗ −0.117 −0.117 (0.042) (0.052) (0.074) (0.076)

Objective rating 0.053∗ 0.026 0.058∗∗ 0.043∗ (0.029) (0.026) (0.024) (0.024)

Processing time 0.002∗∗∗ 0.001∗∗∗ 0.002∗∗∗ 0.002∗∗∗ (0.0004) (0.0004) (0.0004) (0.0004)

Easy financials 0.025 0.107∗∗∗ 0.027 0.062 (0.053) (0.038) (0.048) (0.048)

New credit 0.238∗∗∗ 0.152∗∗∗ 0.252∗∗∗ 0.202∗∗∗ (0.060) (0.048) (0.039) (0.039)

Amount 0.001 0.001 0.002 0.001 (0.003) (0.004) (0.001) (0.001)

Intercept 0.022 0.364 0.002 −0.031 (0.422) (0.479) (0.341) (0.420)

Region and Industry indicators No Yes No Yes Robust Clustered SE Yes Yes No No

log pseudolikelihood −812.539 −678.257
Observations 2,563 2,563 2,563 2,563

This table presents the results for the first stage of the Heckman selection procedure. The results indicate whether the shock had an effect on the selection behavior of loan officers. The dependent variable Custom product is 1 if an application was assigned a credit construction that allows for a subjective adjustment. The first two columns are estimated using full-information maximum likelihood and the last two columns are estimated by the less efficient two-step approach. Two exclusion restrictions are included that are not included in the second stage: LLC Legal Form and the Years of incorporation. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Shock * SME is the main variable of interest; this interaction term indicates how the shock altered the selection behavior of loan officers. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. All coefficients (except the intercept) are average marginal effects. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 5: Second Stage Heckman Selection Model

Subjective Adjustment Maximum Likelihood Two-Step

Shock −0.035∗ −0.049∗∗∗ −0.036∗∗ −0.046∗∗∗ (0.020) (0.017) (0.014) (0.014)

SME 0.009 0.175∗∗∗ 0.057∗ 0.095∗∗∗ (0.028) (0.031) (0.032) (0.027)

Shock * SME 0.052∗∗ 0.038∗ 0.048∗∗ 0.048∗∗ (0.022) (0.021) (0.023) (0.022)

Variable Interest 0.018 0.031∗∗ 0.020 0.027∗ (0.022) (0.012) (0.016) (0.016)

Internal Rating 0.020∗∗∗ 0.015∗∗∗ 0.019∗∗∗ 0.019∗∗∗ (0.003) (0.003) (0.002) (0.002)

Going Concern −0.018 0.017 −0.013 0.006 (0.027) (0.028) (0.036) (0.036)

Objective rating 0.004 −0.003 0.001 0.002 (0.010) (0.010) (0.013) (0.013)

Processing time 0.0002 −0.0002 0.0001 0.00001 (0.0001) (0.0001) (0.0002) (0.0001)

Easy financials 0.028 −0.050∗∗∗ 0.009 −0.010 (0.020) (0.012) (0.016) (0.017)

New credit 0.024 −0.035∗∗ 0.006 −0.004 (0.018) (0.015) (0.018) (0.017)

Amount 0.001∗∗∗ 0.001∗∗∗ 0.001∗∗∗ 0.001∗∗∗ (0.0002) (0.0002) (0.0003) (0.0003)

Intercept −0.373∗∗∗ −0.194∗∗∗ −0.329∗∗∗ −0.294∗∗∗ (0.075) (0.061) (0.058) (0.068)

Region and Industry indicators No Yes No Yes Robust Clustered SE Yes Yes No No

rho −0.459 −0.755 0.152 0.195
rho Prob > chi2 0.0∗∗∗ 0.0∗∗∗
lambda −0.104 −0.174 0.033 0.042
lambda SE 0.018∗∗∗ 0.023∗∗∗ 0.039 0.034
log pseudolikelihood −812.539 −678.257
Observations 1,646 1,646 1,646 1,646

This table presents the results for the second stage of the Heckman selection procedure. The results indicate whether the results of Table 3 are affected by a selection effect. Subjective Adjustment is our main dependent variable and proxies for soft information, calculated by: (Subjective interest rate adjustment / Calculated interest margin). Note, a negative adjustment increases the interest rate. The first two columns are estimated using full-information maximum likelihood and the last two columns are estimated by the less efficient two-step approach. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Shock * SME is the main variable of interest; this interaction term indicates how the reallocation of decision rights affects the amount of considered soft information. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01.

Table 6: Loan Officer Fixed-Effects

Subjective Adjustment OLS Tobit [-2, 2]

Shock −0.046∗∗ −0.036∗∗ −0.046∗∗∗ −0.044∗∗∗ (0.018) (0.017) (0.018) (0.016)

SME 0.054∗∗∗ 0.027 0.054∗∗∗ 0.049∗∗ (0.021) (0.084) (0.021) (0.024)

Shock * SME 0.059∗∗ 0.056∗∗ 0.059∗∗ 0.060∗∗ (0.023) (0.026) (0.023) (0.027)

Variable Interest 0.021 0.046∗∗ 0.021 0.030 (0.018) (0.021) (0.018) (0.019)

Internal Rating 0.021∗∗∗ 0.023∗∗∗ 0.021∗∗∗ 0.022∗∗∗ (0.002) (0.003) (0.002) (0.002)

Going Concern 0.001 0.053 0.001 0.007 (0.036) (0.040)− (0.035) (0.046)−

Objective rating 0.001 0.0001 0.001 0.002 (0.016)− − (0.020)− (0.016) (0.015)−

Processing time 0.0001 −0.0001 0.0001 0.00004 (0.0002) (0.0002) (0.0002) (0.0002)

Easy financials 0.006 0.0001 0.006 0.004 (0.017) (0.021) (0.017) (0.017)

New credit 0.005 0.029∗ 0.005 0.014 (0.019) (0.017) (0.018) (0.018)

Amount 0.001∗∗∗ 0.001∗∗∗ 0.001∗∗∗ 0.001∗∗∗ (0.0003) (0.0003) (0.0003) (0.0003)

Intercept −0.365∗∗∗ −0.323∗∗∗ −0.365∗∗∗ −0.364∗∗∗ (0.071) (0.084) (0.069) (0.081)

Loan Officer FE No Yes No Yes Region + Industry indicators Yes Yes Yes Yes

Group Stats Total: 150 Min: 2 Mean: 8.4 Max: 39

R-Square 0.173 0.299
Observations 1,213 1,213 1,213 1,213

This table presents the results of an analysis designed to investigate how changes in the loan officer pool and loan officer fixed-effects affect our main results. Subjective Adjustment is our main dependent variable and proxies for soft information, calculated by: (Subjective interest rate adjustment / Calculated interest margin). Note, a negative adjustment increases the interest rate. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Shock * SME is the main variable of interest; this interaction term indicates how the reallocation of decision rights affects the amount of considered soft information. Variable Interest is 1 for applications assigned a variable interest rate. Going Concern is 1 if the applicant did not default within 6 months after the application. Objective rating is 1 if an improved rating model is used. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01.

Table 7: Loan outcome analysis

Likelihood of Charge-Off Probability of Default slope

Shock 0.013 0.018 0.015 0.015 (0.018) (0.015) (0.012) (0.013)

SME -0.076 -0.054 -0.008 -0.001 (0.046) (0.044) (0.010) (0.012)

Shock * SME -0.002 -0.009 -0.047∗∗ -0.044∗∗ (0.037) (0.033) (0.020) (0.020)

Variable Interest 0.159∗∗∗ 0.163∗∗∗ -0.020∗∗ -0.017∗ (0.025) (0.026) (0.008) (0.009)

Processing time -0.001∗∗∗ -0.001∗∗∗ 0.00002 0.00004 (0.0003) (0.0003) (0.0001) (0.0001)

Easy financials -0.047∗∗ -0.027 0.028∗∗∗ 0.032∗∗∗ (0.020) (0.023) (0.007) (0.009)

New credit -0.099∗∗∗ -0.099∗∗∗ -0.020∗∗ -0.018∗ (0.030) (0.024) (0.010) (0.009)

Amount -0.001∗ -0.001 0.0003 0.0003 (0.001) (0.001) (0.0002) (0.0002)

Intercept -1.272∗∗∗ -0.476 0.043∗∗∗ 0.006 (0.467) (0.626) (0.012) (0.037)

Model Logit Logit OLS OLS Region and Segment FE No Yes No Yes

(Pseudo) R-Squared 0.116 0.150 0.024 0.053

Observations 1,661 1,661 1,661 1,661
Log Likelihood -724.694 -695.834

This table presents the results of an analysis designed to investigate how the observed change in behavior by loan officers following the change in organizational design influences the ex-post loan outcomes. Likelihood of Charge-Off is our first dependent variable and is 1 if the credit rating is above the charge-off threshold 15 months after the loan is granted. Probability of Default slope is our second dependent variable and is calculated by estimating a linear regression for each loan where the risk rating is the dependent variable and time is the independent variable. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Shock * SME is the main variable of interest; this interaction term indicates how the reallocation of decision rights affects the amount of considered soft information. Variable Interest is 1 for applications assigned a variable interest rate. Processing time is the number of working days from initiation to completion. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. Easy financials is 1 if basic financial information is available at the Chamber of Commerce. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01.

Table 8: Screening analysis

Likelihood early acceptance (Logit) 2013-2 – 2013-10 2014-2 – 2014-10

Log(Amount) 0.010 0.012∗ −0.029∗∗∗ −0.030∗∗∗ (0.007) (0.007) (0.004) (0.005)

Client meeting 0.266∗∗∗ 0.262∗∗∗ 0.222∗∗∗ 0.215∗∗∗ (0.033) (0.033) (0.034) (0.035)

Relationship Manager 0.420∗∗∗ 0.423∗∗∗ 0.303∗∗∗ 0.302∗∗∗ (0.029) (0.031) (0.028) (0.031)

Established client 0.096∗∗∗ 0.097∗∗∗ 0.106∗∗ 0.099∗∗ (0.026) (0.026) (0.045) (0.044)

Risk involvement 0.305∗∗∗ 0.302∗∗∗ 0.772∗∗∗ 0.763∗∗∗ (0.028) (0.029) (0.073) (0.061)

Years of incorporation 0.001 0.001 0.0001 0.00000 (0.0004) (0.0004) (0.0004)− (0.0004)

LLC Legal Form −0.068∗∗∗ −0.061∗∗∗ 0.035 0.036 (0.020) (0.023) (0.026) (0.029)

Going Concern 0.175∗∗∗ 0.179∗∗∗ 0.336∗∗ 0.340∗∗ (0.044) (0.042) (0.160) (0.159)

Intercept −3.095∗∗∗ −3.023∗∗∗ −3.706∗∗∗ −2.739∗∗∗ (0.298) (0.369) (0.942) (0.876)

Region and Industry indicators No Yes No Yes

Pseudo R-squared 0.158 0.175 0.433 0.446 log pseudolikelihood -2340.163 -2291.762 -544.078 -531.46

Observations 4,062 4,062 1,563 1,563

This table presents the descriptive results of an analysis designed to investigate how pre-screening behavior at the early stages changes over time. The first two columns are applications from the period 2013-02 to 2013-10, the last two columns are applications from the period 2014-02 to 2014-10. Early acceptance is the dependent variable which indicates whether an application made it past the initial pre-screening stage. Client meeting is 1 if a face-to-face meeting took place. Relationship Manager is 1 if the application entered the system via a relationship manager. Established client is 1 if the client is older than 1 year and has a prior credit history with the bank. Risk involvement is 1 if a brief early recommendation was requested from the risk department. Going Concern is 1 if the applicant did not default in the 6 months after the application was completed. log(Amount) is the logarithmic transformation of the credit amount in euros. All coefficients (except intercept) are average marginal effects. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01.

Table 9: Likelihood of acceptance analysis

Subjective Adjustment Likelihood of acceptance Custom product Standard product

Accepted −0.024∗∗ (0.009)

Shock −0.049∗∗∗ −0.058∗∗∗ 0.008 (0.017) (0.018) (0.030)

SME 0.055∗∗∗ −0.040 (0.019) (0.026)

Shock * SME 0.050∗∗ 0.029 (0.021) (0.034)

Internal Rating 0.019∗∗∗ 0.004∗∗∗ 0.002 (0.003) (0.001) (0.004)

Objective rating 0.004 0.001 −0.026 (0.008) (0.020) (0.023)

New credit −0.006 −0.031∗∗ 0.040 (0.012) (0.015) (0.025)

Amount 0.001∗∗∗ −0.0003 0.027 (0.0002) (0.0003) (0.027)

Intercept −0.259∗∗∗ 2.789∗∗∗ 1.278∗∗ (0.055) (0.480) (0.571)

Model OLS Logit Logit

(Pseudo) R-Squared 0.149 0.040 0.027

Observations 1,982 1,945 1,085
Log Likelihood -707.270 -494.110

This table presents the results of an analysis designed to investigate whether the shock influenced the likelihood of acceptance. The first column replicates Table 3 but includes applications that are not accepted or declined at the final stage back into the sample. Subjective Adjustment is our main dependent variable and proxies for soft information, calculated by: (Subjective interest rate adjustment / Calculated interest margin). Note, a negative adjustment increases the interest rate. SME is 1 for SME clients and 0 for corporate clients (treatment group indicator). Shock is 1 for applications after the policy change (after the shock). Shock * SME is the main variable of interest; this interaction term indicates how the reallocation of decision rights affects the amount of considered soft information. Objective rating is 1 if an improved rating model is used. New credit is 1 if the applicant requests a new loan and 0 in case of a loan increase or an overdraft loan. Amount is the total credit amount requested in euros scaled by a random number. All coefficients (except the intercept) are average marginal effects. Standard errors are included in brackets. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01.

Table 10A: Placebo Test

Dependent variable: Subjective Adjustment
OLS Dif-in-Dif with random placebo shock

Placebo Shock −0.013 p = 0.584

SME 0.058 p = 0.032∗∗

Placebo Shock * SME −0.001 p = 0.810

Intercept −0.035 p = 0.085∗

Observations 656

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01. For an explanation of this procedure see Section 7. The reported figures are the coefficients and p values averaged over 30 iterations.

Table 10B: Placebo Test

Dependent variable: Custom product
Probit Dif-in-Dif with random placebo shock

Placebo Shock 0.033 p = 0.345

SME −2.347 p = 0.000∗∗∗

Placebo Shock * SME 0.449 p = 0.308

Intercept 1.885 p = 0.000∗∗∗

Observations 1,080

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01. For an explanation of this procedure see Section 7. The reported figures are the coefficients and p values averaged over 30 iterations.

Chapter 4

Are All Readers on the Same Page? Predicting Variation in Information Retrieval from Financial Narratives

Co-authors:

Christoph Sextroh Victor van Pelt

4.1. Introduction

Over the past several decades, policy-makers and scholars have been increasingly concerned about how users of financial statements, such as investors, stakeholders, and information intermediaries, vary in how they retrieve information from financial narratives in financial statements and other corporate communications. It has been widely suggested that such variation is caused by the text characteristics of financial narratives and that it has non-trivial capital market consequences (e.g., Loughran and McDonald, 2014; Guay, Samuels, and Taylor, 2016; Bonsall, Leone, Miller, and Rennekamp, 2017). Whereas prior empirical accounting research and practice have mainly focused on how specific text characteristics produce variation in how users retrieve information from financial narratives, this paper instead takes a broader focus that incorporates all variation stemming from differences in the financial literacy of users.

Rather than focusing on specific text characteristics, such as readability and length, we expand the measurement of “variation in information retrieval” by first eliciting observed variation in information retrieval exhibited by users with varying degrees of financial literacy, and then relating this variation back to the full set of text characteristics of financial narratives. We, therefore, measure “variation in information retrieval” without having to rely on assumptions about the homogeneity of users, and it also enables us to incorporate text characteristics that are less documented by prior literature such as the semantics and content of the text. Another primary advantage of our approach is that we can also use it to predict “variation in information retrieval” based on text characteristics for financial narratives at large, and relate it to capital market outcomes.

Prior accounting research proposes that certain text characteristics of financial narratives may increase processing costs for some users but not for others depending on their financial literacy (e.g., Bloomfield, 2002; Rennekamp, 2012; Krische, 2018). Yet, there is more to the information retrieval process because financial literacy may also interact with how and which information is retrieved from the financial narrative. Specifically, research in judgment and decision-making suggests that differences in domain-specific knowledge impact individuals’ representations of judgment and decision situations (Yates, 1990; Bonner, 2007). When these mental models or schema are activated by, for instance, thinking about specific topics or thinking about information relevant to the schema, they may influence attention to specific information, problem representation, and how this information is interpreted (Smith, 1998). By constructing a broader measure that directly incorporates variation in users’ financial

literacy, we thus aim to better capture how users retrieve information from financial narratives.

To measure variation in information retrieval from financial narratives, we follow a two-step approach that integrates experimental and archival methodologies. The first step of our approach is eliciting how users retrieve information from financial narratives in a controlled environment. We develop and execute a procedure that elicits how users with varying degrees of financial literacy evaluate the relevance of information in MD&As included in public 10-K filings. Specifically, we ask participants to read through MD&A excerpts and mark information that they find relevant (i.e., sentences and words). Using obfuscation and mouse-tracking techniques, we are not only able to elicit participants’ relevance judgments but also measure the time that they spend on reading individual sentences, whether they evaluate a marking as positive, negative, or neutral, whether they undo markings, and whether they switch back and forth between sentences. Our approach is similar in spirit to eye-tracking techniques employed for related questions, e.g. in the field of marketing or computational linguistics (e.g. Hyönä, Lorch, and Kaakinen, 2002; Cole, Gwizdka, Liu, Belkin, and Zhang, 2013; Pieters and Wedel, 2008). However, our approach allows us to capture and track user behavior, for the purpose of eliciting variation in information retrieval, in a much more cost-effective and large-sample-oriented way.

We recruit more than 600 participants from Amazon’s Mechanical Turk (M-Turk). Since we propose that variation in information retrieval is driven by both text characteristics and users’ financial literacy, we ensure that a sufficiently heterogeneous set of users evaluates the same excerpt. Before participants executed the main reading task, we required them to complete a qualification task and to respond to a set of questions aiming to assess their level of financial literacy (van Rooij, Lusardi, and Alessie, 2011). Next, we created three financial literacy categories (i.e., low, medium, and high) and randomly allocated a minimum of 3 users of each financial literacy category to each of the randomly selected excerpts, resulting in at least 9 observations per excerpt.
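The allocation logic of this design can be sketched in a few lines. The function below is our own illustration rather than the authors' actual implementation; it assumes only what the text states, namely that every excerpt must receive at least three raters from each literacy category:

```python
import random
from collections import defaultdict

def assign_raters(users, excerpts, per_group=3, seed=42):
    """Allocate `per_group` raters from every literacy category
    (low/medium/high) to each excerpt at random.

    `users` is a list of (user_id, literacy) tuples and `excerpts` a list
    of excerpt ids. Returns {excerpt_id: [user_id, ...]}.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for uid, literacy in users:
        pools[literacy].append(uid)
    assignments = {e: [] for e in excerpts}
    for pool in pools.values():
        rng.shuffle(pool)
        i = 0  # walk the shuffled pool so the workload spreads evenly
        for e in excerpts:
            for _ in range(per_group):
                assignments[e].append(pool[i % len(pool)])
                i += 1
    return assignments
```

With three literacy categories and `per_group=3`, each excerpt ends up with the minimum of nine raters mentioned above.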

To ensure the validity of our elicited data, we validate whether participants actually performed the task as instructed. First, we examine whether reading time varies with financial literacy and text characteristics in a manner that is consistent with expectations from prior literature. We find that users spend more time reading longer sentences, spend less time reading sentences that contain a higher percentage of complex words, and that the reading time per sentence varies across financial literacy groups. Second, we also test the validity of users’ marking behavior. We find that positive (negative) markings are positively (negatively) related to participants’ willingness to invest and their willingness to consult additional information sources. Neutral marking behavior

is not related to participants’ willingness to invest, but does positively relate to their willingness to consult additional information sources. Consistent with prior literature, we also find that readability reduces the willingness to invest (e.g., Rennekamp, 2012) while at the same time it increases the willingness to consult external resources (e.g., Asay, Elliott, and Rennekamp, 2016). Lastly, we test whether user marking behavior varies with financial literacy and text characteristics. We find that users are less likely to mark characters in sentences that contain a higher percentage of complex words. Yet, users are more likely to mark characters in a sentence, and mark more characters in that sentence, when the sentence is longer. We also find that the likelihood of marking characters in sentences and the number of marked characters in marked sentences varies across financial literacy groups. Overall, our validation tests suggest that participants read, processed, and assessed the excerpts presented to them in the procedure. More importantly, how participants read, processed, and assessed the MD&A excerpts depends on both text characteristics and their financial literacy.
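The text features behind these validation tests, sentence length and the share of complex words, can be computed with simple heuristics. The vowel-group syllable counter below is a common approximation used in readability measures such as the Gunning Fog index; it is illustrative and not necessarily the exact implementation used in the chapter:

```python
import re

def count_syllables(word):
    """Crude syllable heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def pct_complex_words(sentence):
    """Share of words with three or more syllables, the 'complex word'
    ingredient of readability indices such as the Gunning Fog score."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    return sum(count_syllables(w) >= 3 for w in words) / len(words)
```

A sentence such as "Liquidity deteriorated badly." scores two complex words out of three, while plain monosyllabic prose scores zero.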

The second step of our approach involves testing whether the variation in information retrieval that we elicited in the first stage can be predicted using the text characteristics of financial narratives. We first create a topical structure, which is defined by applying a transfer-learning approach where we convert the sentences into a dense vector space that captures the meaning (i.e., semantics) of the text using FastText word embeddings (Mikolov, Grave, Bojanowski, Puhrsch, and Joulin, 2017). Specifically, each sentence is mapped into a vector space of 300 dimensions based on the pre-trained Common Crawl FastText word vectors (Joulin, Grave, Bojanowski, Douze, Jégou, and Mikolov, 2016). We then add several additional linguistic features, such as readability scores, to complement the FastText dimensions. All these text characteristics combined with the actual user behavior data obtained in the first stage serve as the data to train state-of-the-art machine-learning algorithms that are able to predict the likelihood that a sentence is marked by a reader of high, medium, or low literacy. While we are able to predict the marking of the medium and high literacy groups with reasonable accuracy (the two neural networks achieved an F-1 score on the test set of around 68 (69) percent for the medium (high) literacy group), the marking behavior of low literacy users proved too erratic to be predicted based on the features of the text. Even the best predictive model for the low group’s marking behavior provided predictions that were only marginally more accurate than a random choice baseline, with an F-1 score of around 54 percent. This is consistent with the notion that users with low financial literacy are not equipped with enough knowledge to make consistent evaluations about the relevance of information in financial narratives.
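Two building blocks of this step can be sketched compactly: averaging word vectors into a sentence embedding, and scoring binary marking predictions with an F-1 measure. The toy two-dimensional vectors below stand in for the 300-dimensional pretrained FastText vectors used in the actual pipeline, and the function names are ours:

```python
def sentence_vector(sentence, word_vectors, dim):
    """Average the available word vectors into one fixed-length sentence
    embedding (a stand-in for 300-dim pretrained FastText vectors)."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim  # out-of-vocabulary sentence maps to the zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def f1_score(y_true, y_pred):
    """F-1 for binary labels (1 = sentence marked as relevant):
    the harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A classifier whose F-1 hovers near the random-choice baseline, as reported for the low literacy group, is one whose marked/unmarked predictions carry essentially no signal about the true markings.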
Since low literacy users are unlikely to be the main users of financial narrative information, we focus on the high and medium literacy groups in subsequent tests.

We measure our proxy for variation in information retrieval as the degree to which the predicted marking for high and medium financial literacy users varies within sentences and across documents. Based on a sample of about 31,000 MD&As from 10-K documents with filing dates on EDGAR between 2003 and 2016, we predict the probability of marking for all 10,208,953 individual sentences in the MD&A based on the trained algorithms for the high and medium financial literacy groups. Consistent with the results of the elicitation procedure, descriptive statistics of these predictions suggest that there is also considerable predictable variation in how users of high and medium financial literacy retrieve information from financial narratives.

In our final set of tests, we test whether the heterogeneity in the predicted marking behavior correlates with observed reactions of capital market participants after publication of the corresponding 10-K filing. To the extent that heterogeneity in the predicted marking behavior is a valid proxy for variation in information retrieval of capital market participants, we expect a positive relationship with post-filing stock return volatility. We measure heterogeneity in the predicted marking behavior using standard inter-rater correlation and agreement measures. We find that the level of heterogeneity in predicted marking behavior is significantly positively associated with post-filing stock return volatility, suggesting that more variation in information retrieval from financial narratives among different groups of financial statement users generates greater stock return volatility following the public release of the MD&A. This result holds while controlling for the general length and/or complexity of the 10-K (as measured by the total number of words and the percentage of complex words) as well as industry or firm fixed effects. We do not find significant results for tests using analyst forecast dispersion as an alternative measure of the information environment that focuses more on the ability of analysts to process data into earnings forecasts. While these results are seemingly inconsistent with the idea that predicted marking behavior actually captures variation in information retrieval, financial analysts are generally sophisticated users of financial statements and, hence, should exhibit lower variation in financial literacy. The insignificant association observed in the analyst dispersion tests therefore de facto corroborates our idea that heterogeneity in predicted marking behavior captures variation in information retrieval based on differences in financial statement users’ financial literacy.
Similarly, we observe that the relationship between heterogeneity and stock return volatility is more pronounced for firms with lower levels of institutional ownership. This is consistent with the notion that variation in information retrieval of capital market participants should be more important for firms with a relatively more diverse investor base. Overall, the results provide evidence that relatively simple predictions of the variation in information retrieval from financial narratives have important consequences for capital market outcomes.

We believe that this paper makes several important contributions to the literature in accounting and finance. While prior research identifies specific text characteristics that may affect financial statement users’ ability to retrieve financial narrative information (e.g., Loughran and McDonald, 2014; Guay et al., 2016; Bonsall et al., 2017; Rennekamp, 2012), this paper is, to the best of our knowledge, the first to elicit and predict variation in information retrieval from financial narratives based on actual behavior. Rather than focusing on specific text characteristics, such as readability and length, we expand the measurement of “variation in information retrieval” by first eliciting observed variation in information retrieval exhibited by users with varying degrees of financial literacy, and then relating this variation back to the full set of text characteristics of financial narratives. This allows us to avoid making assumptions about this variation and its source, it allows us to incorporate the semantics and content of text beyond text characteristics relating to processing costs, and the machine learning second stage enables us to create out-of-sample predictions based on this approach. Furthermore, our approach and results highlight that variation in information retrieval across users may not be resolved solely by reducing the complexity of a financial narrative, given that financial literacy will also influence information retrieval based on the semantics and content of the text. Although prior literature, such as Hodge and Pronk (2006), has documented that financial literacy impacts the type of source documents that users focus on, we extend this line of literature by examining how it impacts the information retrieval process within one source document. These insights are relevant not only for future empirical research in accounting and finance, but also for practitioners and preparers of financial narratives.

Second, our study uses a two-stage empirical approach to predict and validate user behavior that otherwise would be difficult to observe using naturally-occurring data alone. More recent literature in applied computational linguistics discussing approaches to measure text comprehension suggests that there might be a shift from traditional text characteristics toward more data-driven, user-centric, knowledge-based computational assessment algorithms (e.g., Collins-Thompson, 2014). Such approaches evolve automatically as vocabularies evolve and adapt to individual users and groups, which may reduce the importance of assumptions and discussions made in the past about the text characteristics of financial narratives (e.g., Loughran and McDonald, 2014; Henry and Leone, 2016; Siano and Wysocki, 2018). Our study answers these calls by using advanced techniques to test the relevance of a user-centric approach to information retrieval in the context of financial narratives; an arguably important source of economic information used by a variety of users with different backgrounds (Drake, Roulstone, and Thornock, 2012; Chi and Shanthikumar, 2018; Gibbons, Iliev, and Kalodimos, 2018; Drake, Roulstone, and Thornock, 2016). We also believe that our

paper makes a contribution to the growing literature on automatic summarization for financial documents (e.g., Cardinaels, Hollander, and White, 2019), in particular by highlighting that “human variation” exists and that it might be necessary to create multiple versions of each summary to cater to various types of stakeholders. Finally, our empirical approach also highlights the importance of including the semantics and contents of a text, as proxied for by the dense word embeddings, when evaluating financial narratives.

Lastly, by integrating experimental and archival methods to elicit and predict user behavior, we also open up a broad range of new research questions for empirical accounting and finance. Prior experimental research typically focuses on testing whether specific text characteristics of financial narratives produce meaningful variation in user behavior in a controlled, experimental environment (e.g., Rennekamp, 2012; Elliott, Rennekamp, and White, 2015; Asay et al., 2016). Using the judgments and decisions of user-participants, scholars can increase their confidence that a story or theory about the relationship between text characteristics and user behavior makes sense (Libby, Bloomfield, and Nelson, 2002). In contrast, archival work uses naturally-occurring data to relate text characteristics to real-world outcomes, and tests whether the relationships generalize across firms and industries (see e.g., Bloomfield, Nelson, and Soltes, 2016). While some scholars have used both empirical methods in conjunction to improve their examination of text characteristics of financial narratives (e.g., Bonsall et al., 2017), we are unaware of endeavours that integrate both empirical methods to predict variation in user behavior for financial narratives at large.

4.2. Background and Related Literature

4.2.1. Text Characteristics

Academic interest in how users may vary in how they retrieve information from financial narratives grew considerably when the Securities and Exchange Commission (SEC) produced specific guidance recommending that preparers of financial narratives use plain English features (Securities and Exchange Commission, 1998a,b). This guidance coincided with calls by accounting scholars to empirically examine the text characteristics of financial narratives. Much of this argumentation stems from the conjecture that the text characteristics of financial narratives produce variation in information retrieval among users, such as investors and information intermediaries, because certain text characteristics may vary in the processing costs that they impose on users. For instance, Bloomfield (2002) formulates the incomplete revelation hypothesis which

posits that when information is more costly to extract by users, this will result in less trading and the information will be less completely revealed in market prices. The incomplete revelation hypothesis suggests that the text characteristics of financial narratives may increase processing costs for some users but not for others, depending on their degree of financial literacy. When users of financial narratives experience different processing costs, they may respond differently to the same financial narrative.

To this date, research in accounting and finance has produced considerable evidence suggesting that certain text characteristics, such as word familiarity and complexity, in fact produce variation in whether users retrieve information from financial narratives. You and Zhang (2009), for instance, show that firms with longer 10-K filings have a larger delay in market reactions to 10-K filings, and De Franco, Hope, Vyas, and Zhou (2015) show that more readable analysts’ reports lead to increases in trading volume reactions. Lang and Stice-Lawrence (2015) conduct a large-sample empirical examination of annual report textual disclosures for non-U.S. firms and find that firms that improve their financial narratives experience improvements in several economic outcomes. Importantly, firms also seem not to be completely oblivious to the impact of text characteristics in their disclosures and might use voluntary disclosure to mitigate the negative effects associated with less readable filings (Guay et al., 2016).

However, accounting scholars and regulators struggle to agree on a measure to capture variation in the processing costs of financial narratives. Most research today has focused on measuring syntactic features of the text (e.g., word familiarity or sentence structure) or simple aspects of text legibility (e.g., the quantity of text). However, even popular measures, such as Gunning’s (1952) Fog index, have received increasing criticism in the areas of finance and accounting. The Fog index gained initial popularity due to its arguably intuitive way of measuring variation in the processing costs of financial narratives: the average sentence length and the percentage of complex words. However, Loughran and McDonald (2014), for example, argue that the Fog Index and its two components are poorly specified in financial applications. One significant shortcoming of the Fog Index proposed by the authors is that financial narratives tend to use a high percentage of words that contain three or more syllables. Although multi-syllabic words are thus common in financial narratives, the Fog index labels such words as complex, biasing the measurement of processing costs upward. Instead, Loughran and McDonald (2014) put 10-K file size forward as a simple, yet relevant feature to capture variation in the processing costs of financial narratives.
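To make the measure concrete, a minimal implementation of the Fog index might look as follows (with a naive vowel-group syllable counter; production implementations use more careful tokenization and syllable rules):

```python
import re

def count_syllables(word):
    """Rough syllable count: number of vowel groups (a naive approximation)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    """Gunning Fog: 0.4 * (avg sentence length + pct of words with >= 3 syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    pct_complex = 100 * sum(count_syllables(w) >= 3 for w in words) / len(words)
    return 0.4 * (len(words) / len(sentences) + pct_complex)
```

The bias Loughran and McDonald (2014) describe is visible immediately: a sentence of routine financial terms such as "depreciation" and "amortization" scores as highly complex under this formula even though such words are familiar to financial statement readers.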

Bonsall et al. (2017) suggest a more specific measure by incorporating the plain English attributes of disclosure (e.g., active voice, fewer hidden verbs, etc.) and show that the resulting measure, the Bog Index, incrementally correlates with capital market

outcomes. Notably, however, even these more elaborate measures focus on relatively basic features of a text, and, therefore, do not incorporate higher level text characteristics such as discourse, cohesion, pragmatics, advanced semantics, etc. Also, Siano and Wysocki (2018) suggest that textual analysis using financial narratives generally suffers from the prevalence of numbers within corporate disclosures. This characteristic of financial narratives seems not only relevant from a measurement perspective, but highlights a larger fundamental concern related to the analysis of text characteristics of financial narratives.

4.2.2. User Characteristics

Despite these advances and ongoing discussions, a few recent studies suggest that directly incorporating user characteristics may be of crucial importance to advance our understanding of how users retrieve information from financial narratives. Miller (2010), for instance, finds that firms with more readable financial narratives have more pronounced small investor trading around the 10-K filing date. Similarly, Lawrence (2013) finds that retail investors are more likely to invest in firms with shorter and more readable financial narratives. An influential paper by Rennekamp (2012) shows that more readable disclosures lead to stronger reactions from small investors, so that changes in valuation judgments are more positive when news is good and more negative when news is bad. Rennekamp (2012) attributes this finding to an indirect effect operating through feelings elicited by perceived processing fluency. Consistent with the idea that text characteristics affect investors’ confidence in interpreting narrative information, Elliott et al. (2015) experimentally show that, when concrete language is highlighted in a prospectus, investors are significantly more willing to invest in a firm than when abstract language is highlighted. Furthermore, the effect of concrete language is particularly important when investors feel more psychologically distant from a firm, because such investors feel less comfortable in their ability to evaluate an investment. More recently, Asay et al. (2016) use an experiment to examine investors’ reactions to disclosures that contain mixed news about firm performance and vary in readability. Their findings suggest that investors who view a less readable initial disclosure feel less comfortable evaluating the firm and, in turn, rely more on outside information.
In sum, directly incorporating user characteristics into the examination of information retrieval from financial narratives is an important area of research with a large number of unanswered questions.

Research in judgment and decision-making also suggests that incorporating individual characteristics into the examination of information retrieval is more than just accounting for differences in ability to retrieve information from a source (Yates, 1990; Bonner,

2007). Information retrieval is a cognitive process that has different dimensions: how much information is retrieved, the speed by which the information is retrieved, the order in which it is retrieved, and the type of information that is retrieved (Ford, Schmitt, Schechtman, Hults, and Doherty, 1989). Knowledgeable individuals tend to retrieve less information than less knowledgeable individuals (Johnson, 1988). Knowledgeable individuals also tend to retrieve information in different sequential orders and search for different pieces of information, whereas less knowledgeable individuals use the same sequential order (Camerer and Johnson, 1997; Maines and McDaniel, 2000). Knowledgeable users conduct a more focused information search (Cloyd, 1997; Wood and Lynch, 2002) and access more internal decision rules to judge whether information is more or less relevant to them (Bonner and Lewis, 1990; Ed, Koch, and Boone, 2005). In contrast, users who possess less knowledge may rely more strongly on heuristics and use broader surface-level features and indicators to identify and extract the information they deem relevant to retrieve (Benbasat and Schroeder, 1977).

A few studies have documented how knowledge differences impact the information retrieval process in accounting settings. Shields (1983) shows that students and managers retrieve different pieces of information, which affects their final judgments during performance evaluations. Information retrieval strategies also vary for individuals with different levels of knowledge in capital budgeting settings (Swain and Haka, 2000). Johnson (1988) compares analysts’ and students’ information retrieval behavior and prediction of stock prices, and finds that analysts search for less information and are faster. Also, Anderson (1988) compares analysts to investors making a stock recommendation. While both groups have similar depth and speed, analysts vary more strongly in the order in which they work through information, while investors work through information more sequentially. Also, analysts retrieve both confirming and disconfirming information, while investors tend to retrieve mostly confirming information.

The judgment and decision-making literature also suggests that differences in knowledge may impact how individuals form mental representations of judgment and decision situations (Yates, 1990; Bonner, 2007). When retrieving information from narratives, thinking about or being confronted with specific text causes individuals to activate a mental model or schema that not only depends on the text but also on prior knowledge and experiences, helping individuals form meaningful relationships, patterns, and categories (Bower and Morrow, 1990; Castellan, 1993). Specifically, individuals translate the surface of the text into underlying conceptual propositions, and use their knowledge and experience to identify referents of the text’s concepts, linking expressions that refer to the same entity, and drawing inferences to tie the causal links among the

action sequences of the narrative together. Once activated, schemas influence memory retrieval, attention to specific types and pieces of information, and how retrieved information is interpreted (Smith, 1998). Individuals also tend to remember the schemas they constructed from a text, rather than the text itself (Johnson-Laird, 1983). As a consequence, we expect that variation in users’ financial literacy not only affects their abilities to retrieve information from financial narratives, but also impacts how they retrieve information, which type of information they retrieve, and how they interpret that information.

4.3. Machine-learning Approach and Computational Pipeline

Research in computational linguistics suggests that readers’ information retrieval is determined by at least three dimensions: (1) characteristics of the text, (2) characteristics of the reader, and (3) the goal and context of the reading task (see, e.g., Collins-Thompson, 2014, for a discussion). Based on this, we deviate from the traditional empirical approach of evaluating narratives using rule-based measures. Instead, we use a machine learning approach with training data based on actual observed user behavior. This approach enables us to directly incorporate characteristics of the user into the model when evaluating information retrieval from financial narratives. Figure 1 illustrates the main differences. The traditional text-only approach used in empirical finance and accounting typically focuses on specific characteristics of the text and how these affect the presentation and ultimately the processing of a given piece of information (see Figure 1a). This traditional approach is inherently based on assumptions with regard to the chosen text characteristics and the way these are expected to interact with the context and knowledge of the individual accessing the information. Our approach is able to measure user variation without having to make assumptions about this variation and how it interacts with text characteristics. The user characteristics can directly impact the information retrieval process, but they can also interact with text characteristics, as indicated by the blue lines in Figure 1. In particular, it is important to note that characteristics of the text not only affect a user’s ability or willingness to decode information, but may also affect how users access information and which information they retrieve.
For example, while differences in users’ reactions to more or less complex financial narratives might be caused by differences in users’ perceived ability to process this information, textual characteristics are also expected to affect users’ access to and relative weighting of different pieces of information (i.e., their relevance judgments); Section 2 includes more details. We directly identify and

predict heterogeneity in these relevance judgments that are driven by differences in user characteristics (see Figure 1b).

— Insert Figure 1 here —

To identify variation in behavior and relevance judgments of different types of users, we proceed in two stages that integrate experimental and archival methodologies. In the first stage, we elicit behavior of different users and for different pieces of narrative information in a controlled environment. The main purpose of this stage is to assign user behavior to a distinct text structure (e.g., the sentence or paragraph). These user-based assignments can then be used to analyze the relationship of distinct patterns of user behavior in response to the characteristics of the corresponding text structure (e.g., its semantic and syntactic characteristics) and variations in this relationship for different user characteristics. In the second stage, we use the text structures and assignments from the first stage as training corpora and observed user behavior to train a machine-learning algorithm that predicts how a heterogeneous user-group would vary in their relevance judgments of financial narratives based on an inclusive set of text characteristics, including understudied text characteristics such as the semantics and content. Overall, the goal of our approach is not only to analyze differences in user behavior (for which the first stage would be sufficient), but also to test whether it is possible to predict meaningful differences in users’ relevance judgments for out-of-sample narratives in order to evaluate whether such differences have capital market consequences.

— Insert Figure 2 here —

Figure 2 presents an overview of the computational pipeline to estimate variation in information retrieval. Our two-stage approach would not be possible without machine learning. A machine-learning approach aimed at predicting user behavior typically consists of three key ingredients:

1. A training corpus of individual text structures that is representative of the genre, language, or any other aspect of the narrative of interest.

2. A set of text characteristics that capture attributes of the text that are related to the outcome that is to be predicted.

3. A valid elicitation of user behavior that is to be predicted.
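The three ingredients map directly onto a supervised-learning setup. As an illustration only (toy data, and a simple unigram log-odds model standing in for the neural networks used here), the corpus supplies sentences, the elicited behavior supplies labels, and the model learns which textual features predict marking:

```python
import math
from collections import Counter

def train_marking_model(sentences, marked):
    """Learn per-word log-odds of appearing in marked vs. unmarked sentences."""
    pos, neg = Counter(), Counter()
    for sent, label in zip(sentences, marked):
        (pos if label else neg).update(sent.lower().split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    # Laplace-smoothed log-odds weight per word
    return {w: math.log((pos[w] + 1) / (n_pos + len(vocab)))
             - math.log((neg[w] + 1) / (n_neg + len(vocab)))
            for w in vocab}

def marking_score(weights, sentence):
    """Average log-odds of a sentence's words; higher = more likely marked."""
    words = sentence.lower().split()
    return sum(weights.get(w, 0.0) for w in words) / max(len(words), 1)

# Toy elicitation data: 1 = participants marked the sentence as relevant
corpus = ["revenue increased due to higher sales volume",
          "revenue declined in the fourth quarter",
          "this section contains forward looking statements",
          "see the notes for further legal information"]
labels = [1, 1, 0, 0]
weights = train_marking_model(corpus, labels)
```

In the actual design, a separate model is trained per financial literacy group, so the same out-of-sample sentence receives group-specific predicted marking probabilities.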

We focus on the Management Discussion & Analysis (MD&A) section in 10-K filings as our main object of analysis and use excerpts from MD&As as an instrument in the elicitation stage and as training corpora in the second stage to predict variation in users’ information retrieval of financial narratives. The MD&A section is one of

the key financial narratives in the financial statement and has the favorable empirical characteristic that it covers a broad range of topics, making it a suitable candidate from the perspective of generalizability. In addition, we follow prior literature analyzing financial narratives and focus on the 10-K and not on the 10-Q because 10-Ks are considered to be more informative for investors (Griffin, Harris, and Topaloglu, 2003), are regularly consulted by the investment community (Drake et al., 2012; Gibbons et al., 2018; Drake et al., 2016), and are generally more consistent with regard to content and formatting.

Text characteristics capture semantic, syntactic, and other properties of the text contained in the MD&A section of 10-K filings. When analyzing the data obtained from the first stage, we impose no normative assumptions about which text characteristics “should” presumably affect user behavior in a financial context. Instead, we compile a comprehensive list of text characteristics used by either prior accounting research on financial narratives or by research in computational linguistics and evaluate their association with variation in the observed user behavior. Furthermore, for the second stage (i.e., the prediction stage) we deliberately also include characteristics of the text that relate to the underlying topic structure. This approach enables us to create a prediction for variation in information retrieval that is not only driven by the underlying processing costs of a text but also by the content (i.e., semantics) of this text. We present more details on the second stage in Section V.

Finally, we focus on differences in financial literacy when eliciting variation in users’ information retrieval. Although such variation can originate from various inter-related user characteristics such as age, education, and gender, we focus on financial literacy for several important reasons. First, financial literacy is a well-established concept in empirical finance and accounting, and is often used to categorize users of financial narratives, especially small and retail investors. Second, financial literacy has also been shown to interact with the text characteristics of financial narratives (e.g., Rennekamp, 2012). Financial literacy is, therefore, an important user characteristic that may drive variation in how users evaluate the relevance of financial narratives on the user-side of the disclosure channel.

4.4. Eliciting Variation in Investor Behavior

4.4.1. Reading and Marking Task

To elicit which information in financial narratives users find relevant, we develop and execute a procedure in which a sample of participants with varying degrees of financial literacy evaluate the relevance of excerpts from MD&As. These excerpts comprise one to four paragraphs depending on the amount of text and are randomly drawn from MD&A sections of 10-K filings released in 2015, 2016, and 2017. We asked participants to read through the MD&A excerpt and mark financial information (i.e., sentences, words) that they find “relevant”. We also inform participants that the text they choose to mark is completely up to them and may be subjective. We purposefully refrain from specifying explicitly what we or other institutions consider “relevant” financial information because this would guide participants’ marking behavior and attenuate the heterogeneity among participants that we intended to elicit for the machine-learning algorithm. Participants have no particular incentives to mark or search for specific information and receive a fixed fee for finishing an MD&A excerpt. Each participant can read and mark a maximum of 10 excerpts.

We use obfuscation and mouse-tracking techniques to measure the time participants spend reading each sentence, and whether they switch back and forth between sentences. To elicit which information participants find relevant, we store the portion of text participants mark, whether they undo markings, how long they spend marking, and whether they consider the information positive, negative, or neutral financial information. Figure 3 presents an example of the reading task presented to participants. While participants can see which sentences they previously marked as relevant (colored markings), they can only read one sentence at a time (all other sentences are blurred). By hovering their mouse over the excerpt, participants can decide which sentence to read.

— Insert Figure 3 here —

We also informed participants that we would ask them three questions after they finished reading each excerpt (i.e., Inv. Slider, Lik. Investment, and Consult Other). Inv. Slider captures participants’ willingness to divest versus invest between −100% and +100%: “Suppose you have $10,000 invested in this company. Would you change your investment based on the text you just read?”. Lik. Investment equals investment likelihood on a five-point Likert scale ranging from 1 (Definitely not) to 5 (Definitely): “Based on the text you just read, would you consider investing in this company?” Consult Other equals participants’ willingness to consult other information sources

on a five-point Likert scale ranging from 1 (Definitely not) to 5 (Definitely): “Given the text you just read, would you still consult other sources of information before you decide to invest in this company?” We use these three questions to validate whether participants’ reading and marking behavior predicts their judgments in a way that is consistent with prior literature.

4.4.2. Financial Literacy

The success of the elicitation procedure crucially depends on how reliably we can elicit how different participants vary in their information retrieval for a sub-sample of excerpts from the MD&A section. Since we predict that the variation in information retrieval of those excerpts not just depends on certain text characteristics, but also on how those users vary on important user characteristics, it is important to maximize and control the likelihood that a heterogeneous set of participants evaluates the relevance of the same excerpt of the MD&A section. Allocating multiple participants to the same excerpt randomly may lead to an imbalance between excerpts, meaning that some excerpts are matched to a relatively heterogeneous set of participants while others are matched to a relatively homogeneous set of participants. We address this concern by randomly allocating a maximum of 3 × 3 unique participants to the same excerpt of the MD&A section conditional on their level of financial literacy, which reflects basic knowledge of financial concepts (van Rooij et al., 2011) and basic knowledge about evaluating different financial assets. Before participants executed the main reading task, we required them to complete a qualification task and respond to a carefully constructed set of questions aiming to assess their level of financial literacy. Participants received a fixed fee for participating in this qualification task.

To derive a concise set of questions for the financial literacy test, we collected, adapted, and constructed 22 questions commonly used to measure financial literacy among participants in empirical finance and accounting (e.g., van Rooij et al., 2011). Next, we ran a pre-test and extracted a set of 10 questions that optimized participant allocation into three distinct buckets: low financial literacy (less than or equal to 50% correct), average financial literacy (between 50% and 80% correct), and high financial literacy (80% correct or more). Appendix A presents an overview of the 10 questions. Although we selected these 10 questions to create variation in financial literacy scores among participants, we maintain a sufficient level of internal consistency among those questions to ensure that they measure the same general construct (Cronbach’s alpha = 0.602).
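Both the bucketing rule and the internal-consistency check are straightforward to compute; a minimal sketch with hypothetical response data:

```python
def literacy_bucket(fraction_correct):
    """Assign a participant to a bucket using the stated cutoffs."""
    if fraction_correct <= 0.5:
        return "low"
    return "high" if fraction_correct >= 0.8 else "medium"

def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a participants x items matrix of 0/1 answers."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(scores[0])
    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha equals 1 when all items move together across participants and falls toward 0 as items become unrelated; the reported 0.602 sits in between, consistent with questions that were chosen to spread participants out while still measuring one construct.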

We instructed participants to answer the questions carefully and to the best of their ability. We also instructed them to choose the option “I do not know” if they were

unable to answer a question. Based on their qualification in the financial literacy test, we then allocated participants efficiently to the excerpts of the MD&A section to maximize the heterogeneity of participants allocated to the same excerpt. Specifically, after sorting participants into three buckets using their financial literacy scores, we made sure that each excerpt was assigned to three randomly drawn participants from each financial literacy group. After sorting participants into one of the three buckets, participants also had to pass a short technical pre-test to guarantee that they could perform the task using their mouse to hover over sentences, that our task worked on their system, that they were human, and that they understood how to mark sentences using their mouse. Only if participants passed this test could they start the task. Each unique user on M-Turk that completed the financial literacy and technical pre-tests was allowed to read and mark a maximum of 10 different excerpts.
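The stratified allocation can be sketched as follows (an illustration of the described constraints, not the actual assignment code): each excerpt draws three participants from every literacy group, subject to the 10-excerpt cap per participant.

```python
import random

def allocate(excerpt_ids, by_group, per_group=3, max_tasks=10, seed=42):
    """Assign per_group participants from each literacy group to every excerpt."""
    rng = random.Random(seed)
    load = {p: 0 for members in by_group.values() for p in members}
    assignment = {}
    for ex in excerpt_ids:
        picks = []
        for members in by_group.values():
            eligible = [p for p in members if load[p] < max_tasks]
            for p in rng.sample(eligible, per_group):  # draw without replacement
                load[p] += 1
                picks.append(p)
        assignment[ex] = picks
    return assignment

# Hypothetical participant ids, five per literacy bucket
groups = {g: [f"{g}{i}" for i in range(5)] for g in ("low", "medium", "high")}
plan = allocate(["ex1", "ex2"], groups)
```

Stratifying the draw per group is what prevents the imbalance described above, where a purely random assignment could pair some excerpts with a homogeneous set of readers.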

4.4.3. Participant Pool

We recruited our participants on an online labor market called Amazon Mechanical Turk (M-Turk), which is a popular online labor market to recruit participants for finance and accounting-focused studies (Rennekamp, 2012; Koonce, Miller, and Winchel, 2015; Bonsall et al., 2017; Farrell, Grenier, and Leiby, 2017), but also for prediction tasks in computational linguistics (e.g., De Clercq, Hoste, Desmet, van Oosten, de , and Macken, 2014). Since M-Turk was originally developed to create a thriving online marketplace for jobs that involve classification and evaluation tasks (see the M-Turk website for more information), we feel confident that M-Turk participants possess the necessary skills and work ethic to complete the task we feature in our elicitation procedure (Libby et al., 2002). Online platforms, such as M-Turk, are also low-cost, meaning that we maximize the sample available for training our neural-network algorithms to predict variation in information retrieval out-of-sample.

Since M-Turk participants are often compared to retail investors (e.g., Rennekamp, 2012), we expect them to be reasonable proxies for this population of interest. More importantly for realizing our research goals, however, is that M-Turk participants are relatively heterogeneous. Although we acknowledge that the heterogeneity of online participant pools may also present challenges (e.g., Dennis, Goodson, and Pearson, 2018; Bentley, 2018), we argue that its benefits outweigh its costs in light of our research goals. Contrary to more traditional work, our study benefits greatly from heterogeneity in the participant pool because it increases the likelihood that we obtain sufficient variation in financial literacy among participants. We need this variation to help elicit variation in information retrieval of financial narratives.

Throughout the elicitation procedure, we aimed to pay M-Turk participants a flat rate around the U.S. Federal minimum wage ($7.25 per hour). This flat rate served as the basis for calculating the payment per excerpt and the payment for finishing the financial literacy test. We screened M-Turk participants based on their location (U.S. only) and an approval rating of at least 95 percent, which are generally accepted screening criteria in the finance and accounting literature (e.g., Rennekamp, 2012; Farrell et al., 2017; Dennis et al., 2018). In total, 662 M-Turk participants took part in our study. 90.4 percent of our M-Turk participants are between 19 and 50 years old, and 95.9 percent possess a college degree or higher. Participants have completed an average of 1.34 accounting and 1.15 finance courses, and 70.7 percent of participants indicate that they have experience with trading or consult financial reports on a regular basis. These descriptive results are similar to those reported by prior studies that use M-Turk participants to proxy for small or retail investors (e.g., Rennekamp, 2012; Bonsall et al., 2017).

4.4.4. Validation of the Elicitation Procedure

A key concern with the elicitation procedure is that participants on M-Turk do not behave as if they were actually reading the excerpt and retrieving relevant pieces of information. As a consequence, any observed behavior could measure relevance judgments with bias or, at best, significant noise. We validate that participants actually performed the task as intended by examining whether reading time and marking behavior vary with financial literacy and text characteristics in a manner consistent with expectations from prior literature. For this analysis, we only include sentences from MD&A excerpts with at least two participants from each of the financial literacy groups (low, medium, and high). We also drop sentences that participants have not read (i.e., 908 participant-sentences) and sentences that took participants 1 minute or more to read (i.e., 192 participant-sentences above the 99th percentile of the reading time distribution). Applying these criteria, we arrive at a sample of 522 M-Turk participants and 18,042 participant-sentence observations. The median time spent on a sentence equals 7.191 seconds. A total of 47.41 percent of the participant-sentences include marked characters (8,554 out of 18,042); 11.31 percent of the participant-sentences include positively marked characters (2,041 out of 18,042), 17.34 percent include neutrally marked characters (3,129 out of 18,042), and 20.50 percent include negatively marked characters (3,699 out of 18,042).

We start the validation of our elicitation procedure by analyzing whether participants’ marking behavior predicts their judgments for the MD&A excerpts they read (Table 1). Specifically, after participants finished reading an excerpt, we asked them to answer three judgment-related questions: Inv. Slider, Lik. Investment, and Consult Other. For this particular analysis, we collapse the participant-sentence data structure to the participant-excerpt level, and we run three regressions that cluster standard errors by participant. The results in Table 1 show that positive marking behavior is positively related to participants’ willingness to invest, their stated likelihood of investment, and their willingness to consult other information sources (two-tailed p-value < 0.010). In contrast, negative marking behavior is negatively related to participants’ willingness to invest, their stated likelihood of investment, and their willingness to consult other information sources (two-tailed p-value < 0.010). We find no support for a relationship between participants’ neutral marking behavior and their willingness to invest and stated likelihood of investment (two-tailed p-value > 0.100). However, neutral marking behavior is positively related to participants’ willingness to consult other information sources. This is what we would expect for relevant financial information whose valence participants have difficulty evaluating. Overall, these results suggest that participants’ marking behavior reflects their judgments, giving us confidence that participants mark information that is relevant to them and mark information in the way that it matters to them.

We find no evidence of a relationship between participants’ judgments and the total number of words in the MD&A excerpt (two-tailed p-value > 0.100). Consistent with prior empirical literature, however, we do find evidence of a negative relationship between the relative use of complex words in MD&A excerpts and participants’ willingness to invest and their stated likelihood of investment (two-tailed p-value < 0.010). The relative use of complex words in MD&A excerpts is also positively related to participants’ willingness to consult other information sources (two-tailed p-value < 0.010).

— Insert Table 1 here —
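The clustered regressions used throughout these validation tests can be sketched as follows with statsmodels. All data here are simulated and the variable names are illustrative stand-ins for the actual measures; `cov_type="cluster"` clusters standard errors by participant, as described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated participant-excerpt data (purely illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "inv_slider": rng.normal(size=200),   # judgment outcome
    "pos_marked": rng.normal(size=200),   # positive marking behavior
    "neg_marked": rng.normal(size=200),   # negative marking behavior
    "participant": rng.integers(0, 40, size=200),
})

# OLS with standard errors clustered by participant.
model = smf.ols("inv_slider ~ pos_marked + neg_marked", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["participant"]}
)
```

The same pattern, with `groups` set to the sentence identifier, covers the sentence-level specifications in Tables 2 through 4.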

Next, we examine how reading time per sentence maps into participants’ financial literacy and the readability of the text in the sentence. Table 2 displays four regressions predicting reading time as a function of financial literacy and readability. These analyses are at the participant-sentence level, and we cluster standard errors by sentence. Columns 1 and 2 represent the full sample of participants. Column 1 shows that participants who vary in financial literacy read financial narratives differently. While we find that participants with medium financial literacy spend longer reading a sentence than participants with low financial literacy (b = 0.034, two-tailed p-value = 0.019), we find no such evidence for participants with high financial literacy (two-tailed p-value > 0.100). This difference in differences between financial literacy groups is significant (F = 6.67, two-tailed p-value = 0.009) and implies that participants who vary in their financial literacy vary non-linearly in their reading time of sentences. In column 2, we interact financial literacy with the readability of the sentence, which we measure using the percentage of complex words in the sentence. We find no evidence that the relationship between reading time and financial literacy varies depending on the readability of the sentence (p-value > 0.100). In columns 3 and 4, we reexamine the same empirical specifications but only for participants with medium and high financial literacy because those two participant groups may be more reflective of real users of financial narratives in practice. Column 3 reveals that participants with high financial literacy spend less time reading a sentence compared to participants with medium financial literacy (b = −0.035, two-tailed p-value = 0.009). However, column 4 suggests that this difference in reading time (b = −0.089, two-tailed p-value = 0.011) diminishes as a sentence becomes less readable (b = 0.248, two-tailed p-value = 0.092). The readability of the text and user financial literacy, therefore, interact in a non-trivial way to impact how long participants read a sentence in MD&A excerpts.

— Insert Table 2 here —

The next step is to examine whether financial literacy and readability of the text impact participants’ tendency to mark characters in a sentence. Table 3 shows the results of four logit regressions that predict the likelihood that a participant marks characters in a sentence. These analyses are at the participant-sentence level, and we cluster standard errors by sentence. Columns 1 and 2 represent the full sample of participants and suggest that the likelihood of marking also varies with the financial literacy of participants in a non-linear way. While we find that participants with medium financial literacy are more likely to mark characters in a sentence compared to participants with low financial literacy (b = 0.118, two-tailed p-value = 0.001), we find no such evidence for participants with high financial literacy (two-tailed p-value > 0.100). This difference in differences between financial literacy groups is significant, meaning that participants who vary in their financial literacy vary non-linearly in whether they mark characters in a sentence (F = 4.34, two-tailed p-value = 0.037). In column 2, we interact the financial literacy variables with the readability of the text in the sentence, measured using the percentage of complex words in the sentence. We find that participants with medium financial literacy are more likely to mark characters in a sentence compared to participants with low financial literacy (b = 0.411, two-tailed p-value = 0.052). We find no evidence that the differences in the likelihood of marking between participants with varying levels of financial literacy depend on the readability of the sentence. In columns 3 and 4, we reexamine the same empirical specifications, but only for participants with medium and high financial literacy.
Column 3 reveals that participants with high financial literacy are less likely to mark characters in a sentence compared to participants with medium financial literacy (b = −0.072, two-tailed p-value = 0.037). However, column 4 suggests that this difference in the likelihood of marking depends on the readability of the sentence (b = 0.908, two-tailed p-value = 0.015). The readability of the text and user financial literacy, therefore, interact in a non-trivial way to impact the likelihood of marking characters in a sentence.

— Insert Table 3 here —

Our last validation test focuses on explaining variation in marking behavior for sentences that are marked by participants. Specifically, Table 4 displays four regressions where financial literacy and the readability of the text predict the log of how many characters participants mark in a marked sentence. These analyses are at the participant-sentence level, and we cluster standard errors by sentence. Columns 1 and 2 represent the full sample of participants and suggest that marking behavior in marked sentences varies with the financial literacy of participants in a non-linear way. While we find that participants with medium financial literacy mark more characters in a marked sentence compared to participants with low financial literacy (b = 0.031, two-tailed p-value = 0.000), we find no such evidence for participants with high financial literacy (two-tailed p-value > 0.100). This difference in differences between financial literacy groups is significant (F = 10.83, two-tailed p-value = 0.001). In column 2, we interact financial literacy with the readability of the text in the marked sentence, measured using the percentage of complex words in the sentence. We find that participants mark fewer characters in a marked sentence as that sentence becomes less readable, but only for participants with high and medium financial literacy (b = 0.126, two-tailed p-value = 0.021 for medium financial literacy; b = 0.025, two-tailed p-value = 0.087 for high financial literacy). In columns 3 and 4, we reexamine the same empirical specifications again for participants with medium and high financial literacy only. Column 3 reveals that participants with high financial literacy mark fewer characters in a marked sentence compared to participants with medium financial literacy (b = −0.024, two-tailed p-value = 0.001). Column 4 suggests that this difference in marking behavior between high and medium financial literacy does not vary with the readability of the text in marked sentences.

— Insert Table 4 here —

Besides the number of characters marked, it is also insightful to discuss variation with regard to the kinds of words and topics that certain types of participants mark. Specifically, Table 5 aims to provide anecdotal evidence on the types of words that are distinctly marked by one group of users but not by the other group. To improve the generalizability of our relatively small sample of unique sentences, we replace all general entities with their abstract representation. This implies replacing, for example, company names with ORG and monetary values with MONEY. Each column represents the top 15 words or bigrams that appear frequently in sentences marked by group 1 but not frequently in sentences marked by group 2. Columns 1 and 3 (2 and 4) show the top 15 words that appear frequently in sentences marked by medium (high) literacy participants but not frequently in sentences marked by high (medium) literacy participants. While descriptive in nature, the top 15 items shown in Columns 1 and 3 suggest that medium literacy participants focus more on particular heuristic terms such as fair value, foreign currency, and cash flow compared to high literacy participants. In contrast, high literacy participants appear to focus more on detailed and specific constructs such as references to organizations (ORG), monetary values (MONEY ), and locations (GPE) compared to medium literacy participants. More generally, Table 5 shows strong and meaningful variation between the marking behavior of medium and high literacy participants.1
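The entity-abstraction step described above can be sketched as follows. The actual implementation presumably relies on a full named-entity recognizer; here two regular-expression rules, with hypothetical company names, stand in for it, and only the label set (ORG, MONEY, GPE) follows the text.

```python
import re

# Toy NER substitute: each rule maps a surface pattern to an abstract
# entity label. "Acme Corp" and "Globex Inc" are hypothetical names.
RULES = [
    (re.compile(r"\$\s?\d[\d,.]*(?:\s?(?:million|billion))?"), "MONEY"),
    (re.compile(r"\b(?:Acme Corp|Globex Inc)\b"), "ORG"),
]

def abstract_entities(sentence):
    """Replace concrete entity mentions with their abstract labels."""
    for pattern, label in RULES:
        sentence = pattern.sub(label, sentence)
    return sentence

s = abstract_entities("Acme Corp reported revenue of $1.2 million.")
# s == "ORG reported revenue of MONEY."
```

Abstracting entities this way lets infrequent but structurally similar sentences (different firms, different dollar amounts) contribute to the same word and bigram counts.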

— Insert Table 5 here —

4.5. Predicting Variation in Investor Behavior and Market Reactions to Financial Narratives

4.5.1. Machine-learning Approach to Predict Relevance Judgments across Financial Literacy Groups

We utilize the marking behavior elicited in the previous stage to train a machine learning algorithm that predicts marking behavior (i.e., relevance judgments) for an average user of a certain literacy level based on the text characteristics of the document. In this way, we can predict whether a particular user would mark sentences of an out-of-sample document as relevant given the characteristics of that sentence. To rule out that differences in relevance judgments are purely driven by variations in the processing costs of a sentence, we deliberately also include topic-based characteristics in our machine learning algorithm. In order to incorporate the user element into the predictions, we train two separate prediction algorithms. The first algorithm is based on the elicited data obtained from the users with medium literacy and the second algorithm is based on the elicited data obtained from the users with high literacy. We focus on the high and medium literacy users because attempting to train a prediction algorithm for the low literacy users revealed that their marking behavior was too erratic

1 While beyond the scope of our paper, we also provide similar statistics that focus specifically on variation with regard to the chosen sentiment for the markings. These statistics are provided in Appendix D.

to be predicted based on the characteristics of the text.2 This is consistent with the notion that users with low financial literacy are not equipped with enough knowledge to make consistent relevance judgments. This is not a limitation of our study given that this group of users is unlikely to consume financial narrative information in the first place.

Predicting the marking behavior of a user based on the text characteristics, including the underlying topic structure, is a complex supervised machine learning task. This complexity is amplified by the relatively small size of our training sample. After aggregating the elicited data (19,107 observations) to the sentence level, we are left with a sample of 2,123 unique sentences that have at least three observations for each of the three literacy groups. Reserving 20 percent (425 observations) for the test sample, the training sample consists of 1,698 observations. To deal with the relatively small sample size, we apply a transfer-learning approach to our prediction strategy by also converting the sentences into a dense vector space using FastText word embeddings (Mikolov et al., 2017). Specifically, each sentence is mapped into a vector space of 300 dimensions based on the pre-trained Common Crawl FastText word vectors (Joulin et al., 2016). These dimensions are determined at the token level and aggregated to the sentence level by taking the mean value for each dimension. The FastText word embeddings are a specific variant of the broader group of dense word embedding algorithms, the most popular one being the Word2Vec algorithm (Mikolov et al., 2017). Dense word embeddings aim to capture the meaning (i.e., semantics) of words by representing each word in a low-dimensional vector space. Each word (i.e., token) is represented by a vector of latent dimensions, where the number of dimensions is defined by the size of the dense vector. These representations are learned by training a neural network that attempts to predict tokens based on the tokens surrounding them. While it is possible to train such a neural network on our own sample of MD&A narratives, we opt to use the pre-trained Common Crawl FastText word vectors instead (Joulin et al., 2016).
Using the FastText word embeddings is more robust given that these word vectors are trained on a substantially larger corpus of documents (the Common Crawl dataset consists of 600 billion tokens). The main empirical benefit of this choice is that our prediction algorithm generalizes to a wider range of documents compared to a scenario with MD&A-specific word embeddings.
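The mean-pooling step described above, mapping token vectors to a sentence-level vector, can be sketched as follows. Toy four-dimensional vectors with made-up values stand in for the 300-dimensional pre-trained Common Crawl FastText vectors; in the actual pipeline, FastText would also handle out-of-vocabulary tokens via subword information rather than skipping them.

```python
import numpy as np

# Toy embeddings standing in for pre-trained FastText vectors.
EMBEDDINGS = {
    "revenue":   np.array([0.2, 0.1, 0.0, 0.5]),
    "increased": np.array([0.0, 0.3, 0.1, 0.1]),
    "sharply":   np.array([0.4, 0.0, 0.2, 0.0]),
}

def sentence_vector(tokens, embeddings, dim=4):
    """Mean-pool token vectors into one sentence-level vector.

    Tokens missing from the toy lookup are skipped; an all-unknown
    sentence maps to the zero vector.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

v = sentence_vector(["revenue", "increased", "sharply"], EMBEDDINGS)
```

Each sentence thus becomes a fixed-length dense feature vector, regardless of its word count, which is what the downstream classifier consumes.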

Several additional linguistic features, such as readability scores, are included to complement the FastText dimensions. As mentioned earlier, to compose a set of valid and

2 Even the best predictive model for the low groups’ marking behavior provided predictions that were only marginally more accurate than a random choice baseline with an F-1 score of around 54 percent.

relevant linguistic features, we impose no normative assumptions about which linguistic features ‘should’ presumably be included. Instead, we compile a list of linguistic features used by prior accounting research on financial narratives and by research in computational linguistics. A full list of all the additional text features is provided in Appendix B.

Ex-ante, it is not clear which machine learning model is optimal for this prediction task. We evaluate the various options by performing basic hyperparameter optimization for the following learning models: Logistic Regression, Naive Bayes, Support Vector Machines, Decision Trees, and Artificial Neural Networks. The results of most of the “traditional” models, such as Support Vector Machines, are susceptible to over-fitting on the training set due to the high dimensionality of the input features relative to the sample size. An artificial neural network with dropout layers and L2 regularization yields the highest performance in our baseline tests as it is better able to combat such over-fitting.3

We utilize an evolutionary hyperparameter grid-search algorithm to find the optimal number of hidden layers, learning rate, batch size, L2 regularization, and dropout rates for the neural networks. This hyperparameter optimization is performed separately for each of the two prediction algorithms (i.e., for high and medium literacy user behavior). The two neural networks achieve an F-1 score on the test set (which is completely unseen by the algorithm) of around 68 percent for the medium literacy group and 69 percent for the high literacy group. These metrics indicate that both of our models outperform a random choice by a substantial margin and are comparable to or higher than some of the previous literature that applies machine learning algorithms to financial documents (e.g., Theil, Stajner, and Stuckenschmidt, 2018). Appendix C provides an overview of the optimal hyperparameters and a full performance report for both models.
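A hyperparameter search of the kind described above can be sketched with scikit-learn. This is a simplified stand-in: exhaustive grid search replaces the evolutionary search, an `MLPClassifier` replaces the Keras network (scikit-learn's MLP supports L2 regularization via `alpha` but not dropout), data are simulated, and the grid values are illustrative rather than the tuned values reported in Appendix C.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Simulated sentence features and binary marking labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(32,), (64, 32)],  # number/size of hidden layers
    "alpha": [1e-4, 1e-2],                    # L2 regularization strength
    "learning_rate_init": [1e-3],
    "batch_size": [32],
}
search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    scoring="f1",  # models are evaluated on F-1, as in the text
    cv=3,
)
search.fit(X, y)
```

As in the paper, the selection criterion is the F-1 score, so the chosen configuration balances precision and recall of the marking predictions.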

4.5.2. Prediction Sample and Descriptive Statistics

The prediction sample is based on all 10-K documents with filing dates on EDGAR between 2003 and 2016. Following prior literature (e.g., Loughran and McDonald, 2014; Bonsall et al., 2017), we retain all observations with a CRSP-PERMNO match (61,282 observations remaining), a stock price greater than $3 (52,928 observations remaining), shares classified as ordinary common equity (51,090 observations remaining), at least 3,000 words (50,910 observations remaining), a COMPUSTAT match and positive book-to-market ratio available (44,914 observations remaining), a filing date

3 The performance results are obtained using the Scikit-Learn, Keras, and TensorFlow Python libraries.

greater than 180 days from the prior filing (44,898 observations remaining), and pre- and post-market model data available (43,687 observations remaining). We download each raw 10-K filing as HTML from the SEC EDGAR database and extract all text corresponding to the MD&A. Since, to ensure the validity of our sentence sample, we apply a relatively strict extraction algorithm that fails if the document is not consistently formatted (which is more likely to be the case for earlier observations), the final prediction sample is further reduced to 35,650 observations with all necessary text and capital market data available. We then estimate the likelihood of marking for each of the 10,208,953 individual sentences included in the final MD&A sample, both for the high and the medium financial literacy group.

Table 6 presents descriptive statistics for the predicted marking behavior of the high and medium financial literacy groups obtained from the machine learning algorithm for all 10,208,953 unique sentences of the final MD&A sample. The predicted probability of marking a sentence as relevant is, on average, between 43.2 percent (medium literacy group) and 54.3 percent (high literacy group). To measure the variation in information retrieval between both groups, we rely on standard inter-rater agreement measures, i.e., Cohen’s kappa and Cronbach’s alpha. These measures explicitly take into account the possibility that agreement between different groups occurs by chance and, hence, should result in a more robust measure of variation in information retrieval than just the percentage of identical markings. Since inter-rater agreement calculations are typically based on categorical items, we convert the continuous marking probabilities obtained from the prediction algorithm into marking indicators based on a 50 percent probability cut-off (i.e., the marking indicator takes a value of one if the predicted probability is above 50 percent, and zero otherwise). Although arbitrary, the 50 percent cut-off is the simplest decision rule based on the notion of a “more likely than not” classification. However, we also note that the resulting inter-rater agreement scores will potentially measure heterogeneity with error. In particular, it is possible that relatively small differences in marking probabilities between the high and medium literacy group are classified as disagreement if the predicted probabilities are located just around the 50 percent threshold. At the same time, if the predicted probabilities are both below or both above the threshold, they will be classified as agreement even if the differences between both groups are relatively large.4 However, this potential measurement error would bias the resulting inter-rater agreement scores towards finding more homogeneity and less heterogeneity among both financial literacy groups. Hence, these

4 For example, predicted probabilities of 48 percent for group A and 52 percent for group B would be classified as disagreement, while 60 percent (group A) and 80 percent (group B) would be classified as agreement. However, the difference between both groups is only 4 percent in the first case vs. 20 percent in the second case.

measures will, if at all, underestimate the variation of information retrieval from financial narratives and its impact on information processing in capital markets. For robustness, we also compute correlation-based measures (i.e., Kendall’s tau and Spearman’s rho) based on probability rankings that we obtain from allocating predicted marking probabilities to the corresponding probability percentile (i.e., a marking probability of 94% is assigned rank 9; a marking probability of 87% rank 8, etc.). All measures are bound between zero and one with higher values indicating higher levels of agreement between the high and medium literacy group. For ease of interpretation, we multiply all four variables by minus one so that higher values represent higher levels of variation in information retrieval.
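The two conversions described above, binarizing at the 50 percent cut-off for the agreement measures and bucketing probabilities into ranks for the correlation-based measures, can be sketched as follows. The probability values are made-up illustrations, not model output.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical predicted marking probabilities per sentence.
p_high = np.array([0.94, 0.48, 0.60, 0.20, 0.87])  # high-literacy model
p_med  = np.array([0.87, 0.52, 0.80, 0.10, 0.91])  # medium-literacy model

# Marking indicators based on the 50 percent probability cut-off.
m_high = (p_high > 0.50).astype(int)
m_med  = (p_med  > 0.50).astype(int)

# Chance-corrected agreement between the two groups' indicators.
kappa = cohen_kappa_score(m_high, m_med)

# Probability-percentile ranks for the correlation-based measures
# (e.g., 0.94 -> rank 9, 0.87 -> rank 8, as in the text).
rank_high = np.floor(p_high * 10).astype(int)
```

On these toy vectors, the indicators disagree only on the second sentence (0.48 vs. 0.52), illustrating the threshold-induced measurement error discussed above.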

Table 7 reports statistics for the heterogeneity in marking among both literacy groups based on the 50 percent probability cut-off. While about 75 percent of all sentences are predicted to be treated similarly (i.e., marking/non-marking) by both financial literacy groups, about 15.0 (10.7) percent of all sentences are predicted to be marked by the high (medium) financial literacy group only. These statistics are in line with the results of the elicitation procedure and suggest that while financial statement users extract similar base information from financial narratives, there is also considerable predictable heterogeneity concerning the retrieval of additional information.

For an initial intuition about how our predicted measures of heterogeneity in relevance judgments (IR HET ) compare to document and text characteristics, we next examine how these measures change over the sample period and across percentiles of document length. Figure 4.6 shows the trends in each of the IR HET measures as well as MD&A/10-K file size and MD&A readability (measured using the FOG index) by year over our entire sample period. Consistent with prior studies, both FOG and the quantity of disclosure seem to steadily increase throughout the sample period, suggesting an increase in document complexity over the 2003-2016 period. In contrast, all four measures of variation in information retrieval appear to decrease prior to 2006 (i.e., agreement among users increased) and remain relatively stable after 2009. This pattern suggests that IR HET measures distinct variation in information retrieval that can hardly be explained by variation in processing costs (i.e., document complexity) alone.

Figure 4.6 also shows the variation of IR HET measures and key document/text characteristics across percentiles of MD&A document length (measured by the total number of words in the MD&A). Again, consistent with prior studies, the document/text features (file size and FOG index) increase in the length of the narrative. IR HET measures, in contrast, seem to be associated with document length mainly if the document is rather short. For longer documents, IR HET is relatively stable across different length

percentiles. Univariate statistics (untabulated) fail to detect any significant differences in IR HET measures for the third and higher percentiles. Since document length is a potential determinant of processing costs, we exclude all MD&A documents with fewer than 3,000 words (i.e., within the second length percentile) from the final regression sample to ensure that the tests for the association between variation in information retrieval and post-filing market outcomes are not affected by the correlation of IR HET and document length observed for short documents. The final sample includes 31,322 observations.

4.5.3. Heterogeneity in User Behavior and Capital Market Outcomes

In our final set of tests, we provide evidence that the predicted relevance judgments are a valid measure of heterogeneity in financial statement users’ information retrieval from financial narratives. In particular, we test whether heterogeneity in predicted relevance judgments between the high and medium financial literacy groups correlates with observed reactions of capital market participants after the corresponding 10-K filing. We follow the tests in Loughran and McDonald (2014) and Bonsall et al. (2017) and examine the association between the predicted level of heterogeneity in relevance judgments and post-filing stock return volatility as well as analyst forecast dispersion. To the extent that our predictions capture differences in financial statement users’ retrieval and processing of financial narrative information, we expect a positive association between the heterogeneity measures and post-filing stock return volatility. Following Loughran and McDonald (2014) and Bonsall et al. (2017), we estimate the following models:

RMSE[6,28]_i,t / Analyst dispersion_i,t = β0 + β1 IR HET_i,t + Σ Control Variables_i,t + Σ Year FE + Σ Industry/Firm FE    (4.1)

The dependent variable, RMSE[6,28], is the root mean squared error, multiplied by 100, from a market model estimated over trading days 6 to 28 relative to the 10-K filing date. The main variable of interest, IR HET, is the level of heterogeneity (or inter-rater disagreement) between the high and medium financial literacy groups measured as Cohen’s kappa or Cronbach’s alpha (based on predicted marking indicators) and Kendall’s tau or Spearman’s rho (based on ranks of predicted marking probabilities). Higher values of kappa, alpha, tau, and rho correspond to higher levels of heterogeneity in the rating between both literacy groups and hence should correspond to higher post-filing stock return volatility. We control for the total number of words of the 10-K as well as the percentage of complex words in the 10-K to capture the general processing costs and complexity of the financial narrative (Loughran and McDonald, 2014). In addition, the model controls for the pre-filing alpha obtained from the market model over the trading days 257 to 6 prior to the filing date, the pre-filing RMSE from the prior period market model regression, the absolute filing period abnormal return from the filing date to the next day, the firm’s market capitalization and book-to-market ratio, and whether the firm trades on NASDAQ. We also include year as well as industry- or firm-fixed effects. Please refer to Appendix B for a detailed description of all variables used in the analyses.
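The dependent variable can be computed as sketched below: fit a market model over the post-filing window and take the root mean squared error of its residuals, scaled by 100. Returns here are simulated; in the actual analysis, they would come from CRSP for trading days 6 to 28 after the filing date.

```python
import numpy as np

# Simulated daily returns for the 23 trading days in [6, 28].
rng = np.random.default_rng(1)
mkt = rng.normal(0.0, 0.01, 23)                       # market returns
firm = 0.001 + 1.2 * mkt + rng.normal(0.0, 0.02, 23)  # firm returns

# Market model: firm_t = a + b * mkt_t + e_t, estimated by OLS.
X = np.column_stack([np.ones_like(mkt), mkt])
beta, *_ = np.linalg.lstsq(X, firm, rcond=None)
resid = firm - X @ beta

# RMSE[6,28], multiplied by 100 as in the text.
rmse = 100 * np.sqrt(np.mean(resid ** 2))
```

The pre-filing alpha and pre-filing RMSE controls come from the same calculation applied to the trading-day window 257 to 6 before the filing.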

In addition to the stock return volatility tests, we also conduct tests using analyst forecast dispersion as a dependent variable. To be included in the sample, we require two or more analyst forecasts available from I/B/E/S in the time period between the 10-K filing date and the firm’s next quarterly earnings announcement. Due to data availability, the sample for the analyst dispersion tests falls to 20,848 observations. In these tests, we also include the logarithm of the number of analysts following the firm as an additional control variable.

Table 8 presents the descriptive statistics for all variables used in the regression analysis as well as text features examined by prior studies for the entire sample of MD&A documents. Overall, descriptive statistics are comparable to those reported by, e.g., Loughran and McDonald (2014) and Bonsall et al. (2017). Consistent with Li (2008), text features relating to the complexity of the text (e.g., Gunning Fog score, percentage of complex words) are slightly lower for the MD&A than corresponding statistics for the entire 10-K reported by prior studies. It is likely that the MD&A represents more coherent writing and is presumably less affected by, e.g., numbers, tables, etc. compared to other parts of the 10-K filing, such as the notes section. Most importantly, all four IR HET measures show considerable variation in the level of homogeneity in information retrieval between the high and medium financial literacy groups, suggesting that homogeneity varies across firms and/or over time.

Table 9 reports the results of our regression estimations including industry or firm fixed effects. All control variables, except the NASDAQ indicator, are significant in explaining post-filing RMSE and show coefficient estimates consistent in sign and magnitude with prior research. The level of homogeneity in marking, as measured by both Cohen’s Kappa and Cronbach’s Alpha, is significantly negatively associated with post-filing RMSE, suggesting that more homogeneous (heterogeneous) patterns of information retrieval from financial narratives among different groups of financial statement users generate lower (greater) stock return volatility following the public release of the MD&A. This result holds while controlling for the general length and/or complexity of the 10-K (as measured by the total number of words and the percentage of complex words) as well as industry or firm fixed effects. Interestingly, the percentage of complex words loads negatively in industry-fixed effects specifications (insignificant in firm-fixed effects specifications), which is consistent with the notion in Loughran and McDonald (2014) that the most frequently used multisyllable words contained in a 10-K are easily understood by investors and, hence, do not capture investors’ processing costs. Overall, the results in Table 9 suggest not only that variation in information retrieval has an effect on the processing of information by capital market participants, but also that it is possible to predict these differences and market participants’ reactions even under relatively simplistic assumptions (such as the 50 percent probability cut-off).

In Table 10 we do not find any evidence of a statistically significant association between any of the IR HET measures and analyst forecast dispersion. At the same time, we find that a measure of information complexity, i.e., the quantity of narrative disclosures, is positively associated with the dispersion in analyst forecasts, consistent with evidence presented in Loughran and McDonald (2014) and Bonsall et al. (2017). While these results are seemingly inconsistent with the idea that the IR HET measures capture variation in information retrieval and processing, they de facto corroborate the idea that IR HET measures heterogeneity based on differences in financial statement users' financial literacy. Since financial analysts are generally sophisticated users of financial statements and, hence, should show little variation in financial expertise, it is not surprising that we find no association between the IR HET measures and analyst dispersion, i.e., the heterogeneity among earnings projections of more sophisticated market participants.

To further corroborate our interpretation that IR HET captures variation in information retrieval and processing across financial statement users with different levels of financial literacy, we re-estimate the RMSE model including an interaction term for IR HET with the level of institutional ownership. If variation in information retrieval matters for market pricing, we should observe that the effect is more pronounced for firms with a more diverse investor base. Assuming that lower levels of institutional ownership are associated with diversity in the investor base, we match the sample of MD&As with institutional holdings data obtained from the Thomson Reuters 13-f data.5 Due to data availability the sample drops to 20,346 observations. Low Institutional Ownership is an indicator variable that takes the value of one if the percentage of institutional ownership is below the sample median, and zero otherwise. Due to the sticky nature of

5 We note that there have been reports about data quality problems regarding the Thomson Reuters 13-f data. We obtained the data via WRDS in October 2018 and only use periods for which the data quality issues have been reported to be corrected as of October 2018.

stock ownership, the corresponding regressions in Table 11 only include industry fixed effects. While more diversity in the investor base is associated with greater post-filing stock return volatility, more variation in information retrieval among investor groups seems to amplify this effect. This is consistent with the idea that the effects of variation in financial statement users' information retrieval from financial narratives should be more important for firms with a relatively more diverse investor base. Overall, the results provide further evidence that predictions of variation among financial statement users' information retrieval from financial narratives capture variations in information processing and subsequent capital market reactions.

4.6. Conclusion

We develop a measure for the extent to which different users agree on the relevance of information in financial narratives based on their observed behavior. Using a tool that tracks users' reading and marking behavior in a controlled environment, we first elicit how a group of users with varying financial literacy evaluates the relevance of information in firms' MD&As. We find that a broad set of text characteristics of MD&A excerpts interacts with the financial literacy of users, and that this interaction affects how users retrieve information from MD&A excerpts. Next, we use the data elicited in the first stage to train two state-of-the-art machine-learning algorithms that predict how users, who vary in their financial literacy, assess the relevance of information in firms' MD&As. Our machine-learning algorithms predict variation in information retrieval for users with medium and high literacy with reasonable accuracy. Our final set of analyses reveals that predicted variation in information retrieval from MD&A excerpts is incrementally associated with capital market outcomes. Specifically, we find that less variation in information retrieval is associated with lower post-filing stock return volatility. We also find that this effect is more pronounced for firms with lower levels of institutional ownership, which is consistent with the notion that variation in information retrieval should be more pronounced for firms with a relatively more diverse investor base.

Graphs

Fig. 4.1. (a) Text-centric approach

Fig. 4.2. (b) User-centric approach

Fig. 4.3. Computational Pipeline to Estimate Variation in User Behavior

Fig. 4.4. Illustration of the MTurk Instrument used to elicit Users' Behavior

Fig. 4.5. Trend of IR Heterogeneity and Document/Text Features by Year This figure shows the trend in each of the IR Heterogeneity measures and key document/text features by year over our entire sample period. For ease of interpretation all regression coefficients are standardized with a mean of zero and standard deviation of one.

Fig. 4.6. Association of IR Heterogeneity and Document/Text Features with MD&A Length This figure shows variation in IR Heterogeneity measures and key document/text features across percentiles of MD&A document length. For ease of interpretation all regression coefficients are standardized with a mean of zero and standard deviation of one.

Bibliography

Anderson, M. J., 1988. A Comparative Analysis of Information Search and Evaluation Behavior of Professional and Non-professional Financial Analysts. Accounting, Organizations and Society 13, 431–446.

Asay, H. S., Elliott, W. B., Rennekamp, K., 2016. Disclosure Readability and the Sensitivity of Investors' Valuation Judgments to Outside Information. The Accounting Review 92, 1–25.

Benbasat, I., Schroeder, R., 1977. An Experimental Investigation of Some MIS Design Variables. Management Information Systems Quarterly 1.

Bentley, J. W., 2018. Challenges with Amazon Mechanical Turk Research in Accounting.

Bloomfield, R. J., 2002. The Incomplete revelation hypothesis and financial reporting. Accounting Horizons 16, 233–243.

Bloomfield, R. J., Nelson, M. W., Soltes, E. F., 2016. Gathering Data for Archival, Field, Survey and Experimental Accounting Research. Journal of Accounting Research.

Bonner, S. E., 2007. Judgment and Decision Making in Accounting. Pearson.

Bonner, S. E., Lewis, B. L., 1990. Determinants of Auditor Expertise. Journal of Accounting Research 28, 1.

Bonsall, S. B., Leone, A. J., Miller, B. P., Rennekamp, K., 2017. A Plain English Measure of Financial Reporting Readability. Journal of Accounting and Economics 63, 329–357.

Bower, G. H., Morrow, D. G., 1990. Mental Models in Narrative Comprehension. Science 247, 44–8.

Camerer, C. F., Johnson, E. J., 1997. The Process-Performance Paradox in Expert Judgment: How Can Experts Know So Much and Predict So Badly. In: Research on Judgment and Decision Making: Currents, Connections, and Controversies, p. 768.

Cardinaels, E., Hollander, S., White, B. J., 2019. Automatic Summarization of Earnings Releases: Attributes and Effects on Investors' Judgments. Review of Accounting Studies pp. 1–31.

Castellan, N. J. J., 1993. Individual and Group Decision Making. Lawrence Erlbaum Associates, New York, NY.

Chi, S., Shanthikumar, D., 2018. Do Retail Investors Use SEC Filings? Evidence from EDGAR Search. SSRN.

Cloyd, C. B., 1997. Performance in Tax Research Tasks: The Joint Effects of Knowledge and Accountability. The Accounting Review 72, 111–131.

Cole, M. J., Gwizdka, J., Liu, C., Belkin, N. J., Zhang, X., 2013. Inferring user knowl- edge level from eye movement patterns. Information Processing & Management 49, 1075–1091.

Collins-Thompson, K., 2014. Computational Assessment of Text Readability: a Survey of Current and Future Research. International Journal of Applied Linguistics 165, 97–135.

De Clercq, O., Hoste, V., Desmet, B., van Oosten, P., , M., Macken, L., 2014. Using the crowd for readability prediction. Natural Language Engineering 20, 293–325.

De Franco, G., Hope, O.-K., Vyas, D., Zhou, Y., 2015. Analyst Report Readability. Contemporary Accounting Research 32, 76–104.

Dennis, S. A., Goodson, B. M., Pearson, C., 2018. MTurk Workers' Use of Low-Cost 'Virtual Private Servers' to Circumvent Screening Methods: A Research Note.

Drake, M. S., Roulstone, D. T., Thornock, J. R., 2012. Investor Information Demand: Evidence from Google Searches Around Earnings Announcements. Journal of Accounting Research 50, 1001–1040.

Drake, M. S., Roulstone, D. T., Thornock, J. R., 2016. The Usefulness of Historical Accounting Reports. Journal of Accounting and Economics 61, 448–464.

O'Donnell, E., Koch, B., Boone, J., 2005. The influence of domain knowledge and task complexity on tax professionals' compliance recommendations. Accounting, Organizations and Society 30, 145–165.

Elliott, W. B., Rennekamp, K. M., White, B. J., 2015. Does Concrete Language in Disclosures Increase Willingness to Invest? Review of Accounting Studies 20, 839–865.

Farrell, A. M., Grenier, J. H., Leiby, J., 2017. Scoundrels or Stars? Theory and Evidence on the Quality of Workers in Online Labor Markets. The Accounting Review 92, 93–114.

Ford, J., Schmitt, N., Schechtman, S. L., Hults, B. M., Doherty, M. L., 1989. Process Tracing Methods: Contributions, Problems, and Neglected Research Questions. Organizational Behavior and Human Decision Processes 43, 75–117.

Gibbons, B., Iliev, P., Kalodimos, J., 2018. Analyst Information Acquisition via EDGAR. SSRN.

Griffin, J. M., Harris, J. H., Topaloglu, S., 2003. The Dynamics of Institutional and Individual Trading. The Journal of Finance 58, 2285–2320.

Guay, W., Samuels, D., Taylor, D., 2016. Guiding through the Fog: Financial statement complexity and voluntary disclosure. Journal of Accounting and Economics 62, 234–269.

Gunning, R., 1952. The Technique of Clear Writing. McGraw-Hill International Book Co., New York.

Henry, E., Leone, A. J., 2016. Measuring Qualitative Information in Capital Markets Research: Comparison of Alternative Methodologies to Measure Disclosure Tone. The Accounting Review 91, 153–178.

Hodge, F., Pronk, M., 2006. The Impact of Expertise and Investment Familiarity on Investors’ Use of Online Financial Report Information. Journal of Accounting, Auditing & Finance 21, 267–292.

Hyönä, J., Lorch, R. F., Jr., Kaakinen, J. K., 2002. Individual differences in reading to summarize expository text: Evidence from eye fixation patterns. Journal of Educational Psychology 94, 44–55.

Johnson, E. J., 1988. Expertise and Decision under Uncertainty: Performance and Process. In: The Nature of Expertise, Lawrence Erlbaum, pp. 209–228.

Johnson-Laird, P. N., 1983. Mental Models. Harvard University Press, 6th ed.

Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., Mikolov, T., 2016. FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651.

Koonce, L., Miller, J., Winchel, J., 2015. The Effects of Norms on Investor Reactions to Derivative Use. Contemporary Accounting Research 32, 1529–1554.

Krische, S. D., 2018. Investment Experience, Financial Literacy, and Investment-Related Judgments. Contemporary Accounting Research.

Lang, M., Stice-Lawrence, L., 2015. Textual Analysis and International Financial Reporting: Large Sample Evidence. Journal of Accounting and Economics 60, 110–135.

Lawrence, A., 2013. Individual Investors and Financial Disclosure. Journal of Accounting and Economics 56, 130–147.

Li, F., 2008. Annual Report Readability, Current Earnings, and Earnings Persistence. Journal of Accounting and Economics 45, 221–247.

Libby, R., Bloomfield, R., Nelson, M. W., 2002. Experimental Research in Financial Accounting. Accounting, Organizations & Society 27, 775–810.

Loughran, T., McDonald, B., 2014. Measuring Readability in Financial Disclosures. The Journal of Finance 69, 1643–1671.

Maines, L. A., McDaniel, L. S., 2000. Effects of Comprehensive Income Characteristics on Nonprofessional Investors' Judgments: The Role of Financial Statement Presentation Format. The Accounting Review 75, 179–207.

Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A., 2017. Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405 .

Miller, B. P., 2010. The Effects of Reporting Complexity on Small and Large Investor Trading. The Accounting Review 85, 2107–2143.

Pieters, R., Wedel, M., 2008. Informativeness of eye movements for visual marketing. Visual Marketing From Attention to Action pp. 43–71.

Rennekamp, K., 2012. Processing Fluency and Investors' Reactions to Disclosure Readability. Journal of Accounting Research 50, 1319–1354.

Securities and Exchange Commission, 1998a. A Plain English Handbook: How to Create Clear SEC Disclosure. SEC Office of Investor Education and Assistance, Washington, District of Columbia.

Securities and Exchange Commission, 1998b. Staff Legal Bulletin No. 7.

Shields, M. D., 1983. Effects of Information Supply and Demand on Judgment Accuracy: Evidence from Corporate Managers. The Accounting Review 58, 284–303.

Siano, F., Wysocki, P., 2018. The Primacy of Numbers in Financial and Accounting Disclosures: Implications for Textual Analysis Research. SSRN.

Smith, E. R., 1998. Mental Representations and Memory. In: The Handbook of Social Psychology, McGraw-Hill, Boston, MA, pp. 391–440, fourth ed.

Swain, M. R., Haka, S. F., 2000. Effects of Information Load on Capital Budgeting Decisions. Behavioral Research in Accounting 12, 171–198.

Theil, C. K., Stajner, S., Stuckenschmidt, H., 2018. Word Embeddings-Based Uncertainty Detection in Financial Disclosures. Proceedings of the First Workshop on Economics and Natural Language Processing pp. 32–37.

van Rooij, M., Lusardi, A., Alessie, R., 2011. Financial Literacy and Stock Market Participation. Journal of Financial Economics 101, 449–472.

Wood, S. L., Lynch, J. G., 2002. Prior Knowledge and Complacency in New Product Learning. Journal of Consumer Research 29, 416–426.

Yates, J. F., 1990. Judgment and Decision Making. Pearson College Div.

You, H., Zhang, X.-j., 2009. Financial reporting complexity and investor underreaction to 10-K information. Review of Accounting Studies 14, 559–586.

Appendix A: Financial Literacy Questions


(1) Suppose you had $100 in a savings account and the interest rate was 2% per year. After 5 years, how much do you think you would have in the account if you left the money to grow? a) More than $102 b) Exactly $102 c) Less than $102

(2) Suppose you had $100 in a savings account and the interest rate is 20% per year and you never withdraw money or interest payments. After 5 years, how much would you have on this account in total? a) More than $200 b) Exactly $200 c) Less than $200

(3) Imagine that the interest rate on your savings account was 1% per year and inflation was 2% per year. After 1 year, how much would you be able to buy with the money in this account? a) More than today b) Exactly the same as today c) Less than today

(4) Assume a friend inherits $10,000 today and his sibling inherits $10,000 three years from now. Assume both your friend and his sibling do not spend the $10,000. Who is richer because of the inheritance? a) My friend b) His sibling c) They are equally rich

(5) Suppose that in the year 2020, your income has doubled, and prices of all goods have doubled too. In 2020, how much will you be able to buy with your income? a) More than today b) Exactly the same c) Less than today

(6) Which of the following financial assets typically grant you the highest return over a long period of time (e.g., 10-20 years)? a) Savings accounts b) Individual shares and stocks c) Debt securities and bonds

(7) If the interest rate drops, what happens to bond prices? a) They rise b) They fall c) They stay the same d) I do not know

Appendix A (continued)

(8) Compared to similar firms in the same industry, a company uses more borrowed money to finance its operations. Which of the following statements is most likely to be true for the company? a) It is less likely to experience any difficulty with its creditors compared to other firms in the industry b) It has less liquidity than other firms in the industry c) It will be viewed as having relatively high creditworthiness d) It has greater than average financial risk when compared to other firms in the industry

(9) Which of the following activities would most likely result in an increased risk of the firm being unable to repay borrowed funds? a) Increasing short-term assets while decreasing short-term liabilities b) Increasing short-term assets while increasing short-term liabilities c) Reducing short-term assets, increasing short-term liabilities, and reducing long-term liabilities d) Replacing short-term liabilities with equity

(10) Please indicate which of the following you think best captures the operating performance of a firm. a) The return of a company’s stock b) The value of sales generated relative to the value of a firm’s total assets c) The difference between total sales and cost of sales d) The net income generated relative to the value of a firm’s total assets

Appendix A lists all 10 questions included in the pre-test to determine participants' level of financial literacy. All questions also include the option "I do not know". The correct answers to the questions are as follows (in ascending order): 1-a, 2-a, 3-c, 4-a, 5-b, 6-b, 7-a, 8-d, 9-c, 10-c.
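Scores on this pre-test map into the three literacy groups used throughout the analysis (defined in Appendix B: above 80 percent correct is high, 50 to 80 percent is medium, below 50 percent is low). A minimal sketch of that mapping; the treatment of scores exactly at the 50 and 80 percent boundaries reflects our reading of those definitions:

```python
def literacy_group(n_correct, n_questions=10):
    """Classify a participant's financial literacy from the pre-test score.

    Cut-offs follow the Appendix B definitions: strictly above 80% is
    'high', 50%-80% is 'medium', below 50% is 'low'."""
    pct = 100 * n_correct / n_questions
    if pct > 80:
        return "high"
    if pct >= 50:
        return "medium"
    return "low"
```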

Appendix B: Variable Definitions


Text Characteristics

Gunning FOG score Gunning (1952) Fog Index equal to 0.4*(average number of words per sentence + percent of complex words).

Average words per sentence Number of words in the 10-K divided by the total number of sentences.

Flesch score Flesch reading ease formula.

Smog score The Simple Measure of Gobbledygook (SMOG) readability score.

Text score Readability consensus score based on a portfolio of readability scores.

Number of tokens Number of tokens (i.e. words) in a sentence.

Number of words Number of words in a sentence or document (depending on the level of analysis).

%Complex words Number of words with three or more syllables in a sentence or document (depending on the level of analysis) scaled by the total number of words.

Sentence position Order of the sentence within a document excerpt based on its occurrence.

Contains number A dummy variable that equals one if the sentence contains at least one numerical value.

Contains monetary value A dummy variable that equals one if the sentence contains at least one monetary value as measured by the presence of one of the following tokens: "$", "dollar", "dollars", "€", "euro", "euros".

Contains date A dummy variable that equals one if the sentence contains at least one date value as measured by the presence of a year (in the range 1980 to 2040) or contains the name of a month.

Note that the textual features included in the machine-learning algorithm are much broader and also include, among other aspects, FastText word vectors (Joulin et al., 2016). Please refer to section V for a detailed explanation of the components included in the machine learning approach.
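To make the readability definitions above concrete, the Gunning Fog score can be sketched as follows. This is an illustrative implementation: the vowel-group syllable counter is a crude heuristic of our own, whereas production readability tools typically rely on a pronunciation dictionary:

```python
import re

def count_syllables(word):
    """Crude syllable estimate: count groups of consecutive vowels.
    A heuristic stand-in for a proper syllable dictionary."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(len(groups), 1)

def gunning_fog(text):
    """Gunning (1952) Fog index, as defined above:
    0.4 * (average words per sentence + percent of complex words),
    where complex words have three or more syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_words = len(words) / max(len(sentences), 1)
    pct_complex = 100 * len(complex_words) / max(len(words), 1)
    return 0.4 * (avg_words + pct_complex)
```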

Financial Literacy and Other Participant Variables

High financial literacy Indicator variable equal to one if participants scored higher than 80 percent correct on the financial literacy pre-test, zero otherwise. Please refer to appendix A for an overview of questions asked in the financial literacy pre-test.

Medium financial literacy Indicator variable equal to one if participants scored between 50 and 80 percent correct on the financial literacy pre-test, zero otherwise. Please refer to appendix A for an overview of questions asked in the financial literacy pre-test.

Appendix B (continued)

Low financial literacy Indicator variable equal to one if participants scored below 50 percent correct on the financial literacy pre-test, zero otherwise. Please refer to appendix A for an overview of questions asked in the financial literacy pre-test.

Reading time per sentence Equals the time a participant spent reading the sentence in seconds.

Investment slider Captures participants' willingness to divest versus invest, ranging from −100% to +100%.

Likelihood investment Equals investment likelihood on a five-point Likert scale ranging from 1 (Definitely not) to 5 (Definitely).

Consult other Equals participants' willingness to consult other information sources on a five-point Likert scale ranging from 1 (Definitely not) to 5 (Definitely).

Marking Variables

Positive markings Number of characters marked as positive in a given excerpt.

Neutral markings Number of characters marked as neutral in a given excerpt.

Negative markings Number of characters marked as negative in a given excerpt.

Marking indicator Indicator equal to one if a sentence contains marked characters, and zero otherwise.

Number of marked characters Equals the number of marked characters in a sentence.

Probability of marking The probability of marking a sentence as relevant as obtained from the prediction algorithm (min. 0%, max: 100%).

Predicted marking indicator Indicator variable equal to one if the predicted marking probability is above 50 percent, and zero otherwise.

IR HET (Kendall's tau) Inter-rater agreement statistic based on Kendall's tau using ranked marking probabilities for the high and medium financial literacy groups as input. Raw marking probabilities are transformed into percentile probability rankings based on the relative probability interval. The variable is multiplied by minus one so that a larger score indicates higher levels of disagreement between both rating groups.

IR HET (Spearman's rho) Inter-rater agreement statistic based on Spearman's rho using ranked marking probabilities for the high and medium financial literacy groups as input. Raw marking probabilities are transformed into percentile probability rankings based on the relative probability interval. The variable is multiplied by minus one so that a larger score indicates higher levels of disagreement between both rating groups.

Appendix B (continued)

IR HET (Cohen's kappa) Inter-rater agreement statistic based on Cohen's kappa using the marking indicators for the high and medium financial literacy groups as input. The variable is multiplied by minus one so that a larger score indicates higher levels of disagreement between both rating groups.

IR HET (Cronbach's alpha) Inter-rater agreement statistic based on Cronbach's alpha using the marking indicators for the high and medium financial literacy groups as input. The variable is multiplied by minus one so that a larger score indicates higher levels of disagreement between both rating groups.
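The two indicator-based agreement statistics can be illustrated with small pure-Python implementations of the standard textbook formulas, treating the two literacy groups' marking indicators as raters. This is a sketch under those assumptions, not the exact estimation code used in the paper:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary rating vectors of equal length:
    observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)                # chance agreement
    return (p_o - p_e) / (1 - p_e)

def cronbachs_alpha(raters):
    """Cronbach's alpha across k rater vectors (list of equal-length lists)."""
    def var(xs):                                           # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k, n = len(raters), len(raters[0])
    totals = [sum(r[i] for r in raters) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(r) for r in raters) / var(totals))

def ir_het(agreement):
    """IR HET flips the sign so that larger values mean more disagreement."""
    return -agreement
```

For example, two identical marking-indicator vectors yield kappa and alpha of one, and hence an IR HET score of minus one (maximal homogeneity).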

Market Test Variables

Post-filing RMSE [6,28] The root mean squared error (RMSE) from a market model estimated using trading days [+6, +28] relative to the 10-K file date multiplied by 100, with a minimum of 60 observations available.

Pre-filing RMSE [-257,-6] The root mean squared error (RMSE) from a market model estimated using trading days [−257, −6] relative to the 10-K file date multiplied by 100, with a minimum of 60 observations available.

Pre-filing alpha [-252,-6] Alpha from a market model using trading days [-252, -6] relative to the 10-K file date multiplied by 100. A minimum of 60 observations of daily returns must be available to be included in the sample.

Abs(Filing period abnormal return) Absolute value of the filing date excess return. The buy-and-hold return period is measured from the 10-K filing date (day 0) through day +1 minus the buy-and-hold return of the CRSP value-weighted index over the same 2-day period.

Market capitalization CRSP stock price times shares outstanding on the day prior to the 10-K filing date (in $ millions).

Book-to-market Firm’s book-to-market ratio, using data from both Compustat (book value from most recent year prior to filing date) and CRSP (market value of equity). Firms with negative book values are removed from the sample.

NASDAQ dummy Indicator variable equal to one if the firm is listed on NASDAQ at the time of the 10-K filing, and zero otherwise.

Analyst dispersion Standard deviation of analyst forecasts divided by the stock price prior to the 10-K filing date, multiplied by 100. Only forecasts occurring between the 10-K filing date and the next earnings announcement date are included, and thus stale forecasts are not in the sample. If a given analyst has more than one forecast reported during this time interval, only the forecast closest to the filing date is included in the sample. We retain only firms with at least two analyst forecasts.

Appendix B (continued)

Analyst following The number of analysts used in the Analyst dispersion calculation.

Low institutional ownership Indicator variable equal to one if the firm’s percentage of institutional ownership is below the sample median, and zero otherwise. Due to reported data quality problems with Thomson Reuters 13-f data, we only calculate the variable for time periods reported as corrected as of October 2018.
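The post- and pre-filing RMSE variables above are residual volatilities from a one-factor market model. A self-contained sketch of that estimation; note that we divide the sum of squared residuals by the number of observations, and whether the original computation uses n or the regression degrees of freedom is an assumption on our part:

```python
from math import sqrt

def market_model_rmse(firm_ret, mkt_ret):
    """RMSE (x100) from the market model r_i = alpha + beta * r_m + e,
    estimated by OLS over the chosen window of daily returns."""
    n = len(firm_ret)
    mx = sum(mkt_ret) / n
    my = sum(firm_ret) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(mkt_ret, firm_ret))
            / sum((x - mx) ** 2 for x in mkt_ret))
    alpha = my - beta * mx
    resid = [y - (alpha + beta * x) for x, y in zip(mkt_ret, firm_ret)]
    return 100 * sqrt(sum(e * e for e in resid) / n)
```

A perfectly linear firm-market relation yields an RMSE of zero; residual (idiosyncratic) volatility raises it.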

Appendix C: Machine Learning Details


Panel A: Hyperparameters

Hyperparameter     High Financial Literacy Model    Medium Financial Literacy Model
Hidden layers      Layer 1: 10; Layer 2: 200        Layer 1: 50; Layer 2: 200
Learning rate      0.001                            0.001
Batch size         48                               32
Epochs             200                              300
Dropout rates      Layer 1: 0.5; Layer 2: 0.25      Layer 1: 0.5; Layer 2: 0.5
Class weights      Not Marked: 1, Marked: 1.5       Not Marked: 1, Marked: 1.1

Panel B: Performance report

                      High Financial Literacy Model      Medium Financial Literacy Model
Performance metric    Not Marked   Marked   avg/total    Not Marked   Marked   avg/total
Precision             0.73         0.65     0.69         0.71         0.64     0.68
Recall                0.71         0.67     0.69         0.68         0.67     0.68
F1-score              0.72         0.66     0.69         0.69         0.65     0.68
Support               237          188      425          230          195      425

Appendix D: Top 15 Words and Bigrams in Marked Sentences Split by Marking Sentiment

Panel A: Top 15 bigrams that are marked as POS, NEG, or NEUT only by medium or high literacy users

Marked Positive, Only Medium: (DATE, DATE); (ORG, ORG); (fair, value); (foreign, currency); (PERSON, PERSON); (goodwill, impairment); (LAW, LAW); (DATE, ORG); (economic, conditions); (balance, sheet); (PRODUCT, PRODUCT); (estimated, fair); (located, GPE); (ORG, DATE); (cash, flows)

Marked Positive, Only High: (MONEY, MONEY); (MONEY, DATE); (GPE, GPE); (DATE, MONEY); (increased, MONEY); (compared, DATE); (net, cash); (DATE, compared); (long, term); (doubtful, accounts); (loss, ratio); (stock, based); (cash, provided); (ORG, GPE); (based, compensation)

Marked Negative, Only Medium: (MONEY, MONEY); (DATE, DATE); (MONEY, DATE); (ORG, ORG); (cash, flow); (deferred, tax); (MONEY, PERCENT); (MONEY, net); (carrying, value); (tax, assets); (GPE, GPE); (LAW, LAW); (valuation, allowance); (fair, values); (compared, MONEY)

Marked Negative, Only High: (PERCENT, PERCENT); (accounts, receivable); (operating, income); (increased, MONEY); (compared, DATE); (expenses, increased); (tax, positions); (depreciation, amortization); (reasonably, assured); (ORG, MONEY); (uncertain, tax); (impact, future); (MONEY, primarily); (incurred, MONEY); (costs, MONEY)

Marked Neutral, Only Medium: (DATE, DATE); (ORG, ORG); (LOC, LOC); (foreign, currency); (tax, assets); (taxable, income); (future, cash); (long, term); (raw, materials); (fair, value); (cash, flows); (stock, MONEY); (net, interest); (ORG, related); (impairment, charge)

Marked Neutral, Only High: (PERSON, PERSON); (cash, equivalents); (activities, MONEY); (cash, cash); (backed, securities); (interest, rate); (ORG, 's); (Net, cash); (market, conditions); (mortgage, backed); (MONEY, compared); (price, MONEY); (discount, rate); (net, income); (financing, activities)

Capitalized words indicate generalized entities. For example, PERSON refers to persons, PRODUCT refers to objects, MONEY relates to monetary values, QUANTITY refers to measurements, ORG refers to any company or institution, GPE refers to countries, cities, and states, LAW refers to documents made into laws, DATE refers to dates, and TIME refers to times smaller than a day.

Appendix D (continued)

Panel B: Top 20 keyword differences to predict whether a sentence is marked as either positive or negative

Words, POS: 2015; based; common; performance; service; balance; investments; continue; believe; transaction; June; plans; owned; exchange; method; contracts; shares; acquisitions; benefits; facilities

Words, NEG: million; results; future; assumptions; conditions; subject; impairment; decreased; costs; loss; number; demand; affect; impact; reduced; result; tax; included; significant; actuarial

Word entities, POS: ORG; PERSON; CARDINAL; PERCENT; based; performance; balance; PRODUCT; continue; believe; transaction; date; owned; acquisitions; shares; method; facilities; retail; benefits; positions

Word entities, NEG: MONEY; future; results; assumptions; subject; impairment; decreased; costs; loss; demand; affect; impact; number; reduced; result; tax; included; significant; actuarial; charge

Bigrams, POS: (ORG, ORG); (PERSON, PERSON); (PRODUCT, PRODUCT); (DATE, ORG); (DATE, increased); (CARDINAL, CARDINAL); (wholly, owned); (acquisition, ORG); (increased, PERCENT); (real, estate); (tax, benefit); (balance, sheet); (PERCENT, PERCENT); (ORG, GPE); (shares, common); (CARDINAL, shares); (borrowing, rates); (tax, benefits); (PERCENT, total); (beginning, DATE)

Bigrams, NEG: (MONEY, MONEY); (MONEY, DATE); (financial, condition); (increased, MONEY); (impairment, charge); (net, actuarial); (results, operations); (expected, future); (DATE, MONEY); (goodwill, impairment); (decreased, PERCENT); (carrying, value); (operating, loss); (operating, results); (MONEY, primarily); (pension, costs); (tax, assets); (investment, securities); (financial, results); (single, digits)

Capitalized words indicate generalized entities. For example, PERSON refers to persons, PRODUCT refers to objects, MONEY relates to monetary values, QUANTITY refers to measurements, ORG refers to any company or institution, GPE refers to countries, cities, and states, LAW refers to documents made into laws, DATE refers to dates, and TIME refers to times smaller than a day.

Tables

Table 1: Regressions of Participants' Judgments on Marking Behavior

                              (1)            (2)                (3)
                           Inv. Slider   Lik. Investment   Consult Other
Log Positive Markings        0.214***       0.496***          0.208***
                            (0.018)        (0.037)           (0.054)
Log Neutral Markings         0.022          0.051             0.164***
                            (0.017)        (0.036)           (0.058)
Log Negative Markings       -0.273***      -0.698***         -0.312***
                            (0.017)        (0.040)           (0.064)
Percentage Complex Words    -0.620**       -1.154**           1.732**
                            (0.256)        (0.544)           (0.755)
Log Number of Words         -0.015          0.096            -0.028
                            (0.053)        (0.126)           (0.162)
Constant                    -0.067          2.435***          3.649***
                            (0.316)        (0.739)           (0.984)
Adjusted R-squared           0.255          0.295             0.057
F-statistic                 79.272         85.368             7.797
 — p-value                   0.000          0.000             0.000
Observations                 1416           1416              1416

Table 1 reports three OLS regressions of participants' judgments on marking behavior at the participant-excerpt level. Standard errors are clustered by participant and presented in parentheses; all p-values are two-tailed: * p < 0.1, ** p < 0.05, *** p < 0.01. Investment slider captures participants' willingness to divest versus invest between −100% and +100%. Likelihood investment equals investment likelihood on a five-point Likert scale ranging from 1 (Definitely not) to 5 (Definitely). Consult other equals participants' willingness to consult other information sources on a five-point Likert scale ranging from 1 (Definitely not) to 5 (Definitely). Log(Positive markings) is the log of one plus the number of positive markings in the MD&A excerpt; Log(Neutral markings) is the log of one plus the number of neutral markings in the MD&A excerpt; Log(Negative markings) is the log of one plus the number of negative markings in the MD&A excerpt; Percentage complex words is the number of complex words divided by the number of words in the MD&A excerpt; Log(Number of words) equals the log of the number of words in the MD&A excerpt.

Table 2: Regressions of Reading Time on Financial Literacy and Text Complexity

                                                      (1)        (2)        (3)        (4)
High Financial Literacy                              -0.001     -0.004     -0.035***  -0.089**
                                                     (0.015)    (0.032)    (0.013)    (0.035)
Medium Financial Literacy                             0.034**   -0.023
                                                     (0.015)    (0.079)
High Financial Literacy x Percentage Complex Words               0.015                 0.248*
                                                                (0.132)               (0.147)
Medium Financial Literacy x Percentage Complex Words             0.018
                                                                (0.024)
Percentage Complex Words                             -0.213***  -0.218**   -0.326***  -0.450***
                                                     (0.079)    (0.090)    (0.091)    (0.116)
Sentence Position                                    -0.027***  -0.027***  -0.026***  -0.026***
                                                     (0.002)    (0.002)    (0.002)    (0.002)
Log Number of Words                                   0.616***   0.610***   0.644***   0.644***
                                                     (0.016)    (0.018)    (0.018)    (0.018)
Constant                                              0.195***   0.215***   0.156**    0.183***
                                                     (0.057)    (0.063)    (0.065)    (0.067)
Adjusted R-squared                                    0.130      0.130      0.144      0.144
F-statistic                                         361.025    258.024    391.286    313.320
  p-value                                             0.000      0.000      0.000      0.000
Observations                                         18042      18042      12192      12192

Table 2 reports four OLS regressions of reading time on financial literacy and readability on the participant-sentence level; columns 1 and 2 are the full sample of participants, while columns 3 and 4 include only participants with medium and high financial literacy; standard errors are clustered by sentence and presented in parentheses; all p-values are two-tailed: * p < 0.1, ** p < 0.05, *** p < 0.01. The dependent variable, reading time, equals the log of the time a participant spent reading the sentence in seconds; High financial literacy equals one if participants scored higher than 80 percent correct on the financial literacy pre-test, zero otherwise; Medium financial literacy equals one if participants scored between 50 percent and 80 percent on the financial literacy pre-test, zero otherwise; Percentage complex words is the number of complex words divided by the number of words in the sentence; Sentence position equals the position of the sentence in the MD&A excerpt; Log(Number of words) equals the log of the number of words in the sentence.

Table 3: Logit of Marking Likelihood on Financial Literacy and Text Complexity

                                                      (1)        (2)        (3)        (4)
High Financial Literacy                               0.046     -0.041     -0.072**   -0.269***
                                                     (0.035)    (0.079)    (0.034)    (0.088)
Medium Financial Literacy                             0.118***   0.411*
                                                     (0.036)    (0.212)
High Financial Literacy x Percentage Complex Words               0.401                 0.908**
                                                                (0.329)               (0.372)
Medium Financial Literacy x Percentage Complex Words            -0.091
                                                                (0.065)
Percentage Complex Words                             -1.842***  -1.978***  -2.028***  -2.482***
                                                     (0.231)    (0.259)    (0.249)    (0.310)
Sentence Position                                    -0.017***  -0.017***  -0.018***  -0.018***
                                                     (0.005)    (0.005)    (0.006)    (0.006)
Log Number of Words                                   0.428***   0.459***   0.412***   0.412***
                                                     (0.045)    (0.050)    (0.050)    (0.050)
Constant                                             -1.008***  -1.079***  -0.789***  -0.690***
                                                     (0.161)    (0.177)    (0.175)    (0.180)
Pseudo R-squared                                      0.013      0.013      0.014      0.014
Chi2                                                165.509    168.714    140.305    146.472
  p-value                                             0.000      0.000      0.000      0.000
Observations                                         18042      18042      12192      12192

Table 3 reports four logit regressions of marking likelihood on financial literacy and readability on the participant-sentence level; columns 1 and 2 are the full sample of participants, while columns 3 and 4 include only participants with medium and high financial literacy; standard errors are clustered by sentence and presented in parentheses; all p-values are two-tailed: * p < 0.1, ** p < 0.05, *** p < 0.01. The dependent variable, marking likelihood, equals one if a sentence contains marked characters and zero otherwise; High financial literacy equals one if participants scored higher than 80 percent correct on the financial literacy pre-test, zero otherwise; Medium financial literacy equals one if participants scored between 50 percent and 80 percent on the financial literacy pre-test, zero otherwise; Percentage complex words is the number of complex words divided by the number of words in the sentence; Sentence position equals the position of the sentence in the MD&A excerpt; Log(Number of words) equals the log of the number of words in the sentence.

Table 4: Regressions of Marking Behavior on Financial Literacy and Text Complexity

                                                      (1)        (2)        (3)        (4)
High Financial Literacy                               0.008     -0.019     -0.024***  -0.027*
                                                     (0.006)    (0.013)    (0.007)    (0.015)
Medium Financial Literacy                             0.031***  -0.049
                                                     (0.007)    (0.044)
High Financial Literacy x Percentage Complex Words               0.126**               0.017
                                                                (0.054)               (0.066)
Medium Financial Literacy x Percentage Complex Words             0.025*
                                                                (0.014)
Percentage Complex Words                             -0.006     -0.049      0.066*     0.057
                                                     (0.028)    (0.034)    (0.034)    (0.050)
Sentence Position                                    -0.005***  -0.005***  -0.006***  -0.006***
                                                     (0.001)    (0.001)    (0.001)    (0.001)
Log Number of Words                                   0.085***   0.076***   0.100***   0.100***
                                                     (0.006)    (0.008)    (0.008)    (0.008)
Constant                                             -0.174***  -0.137***  -0.202***  -0.200***
                                                     (0.021)    (0.026)    (0.026)    (0.026)
Adjusted R-squared                                    0.030      0.031      0.038      0.038
F-statistic                                          46.476     33.950     50.794     40.663
  p-value                                             0.000      0.000      0.000      0.000
Observations                                          8554       8554       5856       5856

Table 4 reports four OLS regressions of marking behavior on financial literacy and readability for sentences that were marked by participants; columns 1 and 2 are the full sample of participants, while columns 3 and 4 include only participants with medium and high financial literacy; standard errors are clustered by sentence and presented in parentheses; all p-values are two-tailed: * p < 0.1, ** p < 0.05, *** p < 0.01. The dependent variable, marking behavior, is the log of the number of marked characters in a marked sentence; High financial literacy equals one if participants scored higher than 80 percent correct on the financial literacy pre-test, zero otherwise; Medium financial literacy equals one if participants scored between 50 percent and 80 percent on the financial literacy pre-test, zero otherwise; Percentage complex words is the number of complex words divided by the number of words in the sentence; Sentence position equals the position of the sentence in the MD&A excerpt; Log(Number of words) equals the log of the number of words in the sentence.
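Tables 1 through 4 all cluster standard errors (by participant or by sentence). As a generic illustration of such clustering — a minimal CR0 "sandwich" estimator sketched with numpy, not the chapter's actual estimation code:

```python
import numpy as np

def ols_cluster_se(X, y, clusters):
    """OLS coefficients with cluster-robust (CR0 sandwich) standard errors."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sum score outer products within each cluster (the "meat" of the sandwich).
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(s, s)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Toy example: intercept plus one regressor, two clusters.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(8), rng.normal(size=8)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.1, size=8)
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])
beta, se = ols_cluster_se(X, y, clusters)
```

Production analyses would typically rely on a statistical package's built-in clustered-covariance options rather than hand-rolling the estimator; the sketch only makes the mechanics explicit.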

Table 5: Top 15 Words and Bigrams in Sentences Marked by Medium or High Literacy Users Only

Words with entity abstractions      Bigrams with entity abstractions

Only Medium   Only High     Only Medium                Only High
DATE          MONEY         (DATE, DATE)               (ORG, ORG)
tax           ORG           (fair, value)              (MONEY, MONEY)
value         GPE           (foreign, currency)        (GPE, GPE)
fair          water         (goodwill, impairment)     (increased, MONEY)
goodwill      increased     (cash, flow)               (backed, securities)
impairment    costs         (reporting, unit)          (water, meters)
rate          sales         (deferred, tax)            (DATE, compared)
foreign       hotels        (tax, assets)              (mortgage, backed)
market        expenses      (investment, securities)   (compared, DATE)
reporting     related       (DATE, ORG)                (cash, provided)
currency      driven        (operating, cash)          (partially, offset)
customers     meters        (value, reporting)         (PERCENT, PERCENT)
orders        returns       (carrying, value)          (ORG, PRODUCT)
carrying      net           (available, sale)          (municipal, water)
exchange      revenue       (exchange, rate)           (activities, MONEY)

Table 5 lists the top 15 words (columns 1 and 2) and bigrams (columns 3 and 4) in sentences marked by medium or high literacy users only. Capitalized words indicate generalized entities. For example, PERSON refers to persons, PRODUCT refers to objects, MONEY relates to monetary values, QUANTITY refers to measurements, ORG refers to any company or institution, GPE refers to countries, cities, and states, LAW refers to documents made into laws, DATE refers to dates, and TIME refers to times smaller than a day.

Table 6: Predicted Marking Behavior of Financial Literacy Groups

                                  N           Mean    Sd      p1      p50     p99
High Financial Literacy Group
  Probability of Marking          10,208,953  0.543   0.306   0.000   0.507   0.958
  Marking Indicator               10,208,953  0.503   0.500   0.000   1.000   1.000
Medium Financial Literacy Group
  Probability of Marking          10,208,953  0.432   0.266   0.000   0.460   0.845
  Marking Indicator               10,208,953  0.460   0.498   0.000   0.000   1.000

Table 6 reports descriptive statistics for the predicted marking probability of the high and medium financial literacy groups for all 10,208,953 unique sentences contained in the sample of MD&As. While the Probability of Marking is the output obtained from the machine-learning algorithm, the Marking Indicator is a constructed variable equal to one if the predicted marking probability is above 50 percent, and zero otherwise.
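The Marking Indicator defined in the note is a simple 50 percent cutoff applied to the machine-learning output; a minimal sketch:

```python
def marking_indicator(probs, cutoff=0.5):
    """One if the predicted marking probability exceeds the cutoff, else zero."""
    return [1 if p > cutoff else 0 for p in probs]

# Illustrative predicted probabilities for four sentences.
probs = [0.91, 0.46, 0.507, 0.12]
print(marking_indicator(probs))  # -> [1, 0, 1, 0]
```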

Table 7: Heterogeneity in Predicted Marking Behavior

                                               N           Perc.
Marking High and Medium Literacy Group         3,603,989    35.30
Non-Marking High and Medium Literacy Group     3,986,148    39.05
High Literacy Group marking only               1,530,057    14.99
Medium Literacy Group marking only             1,088,759    10.66
Total                                         10,208,953   100.00

Table 7 reports the number and frequency of consistent and deviating marking of high and medium financial literacy groups based on predictions of marking behavior for all 10,208,953 unique sentences contained in the sample of MD&As. Marking High and Medium Literacy Group and Non-Marking High and Medium Literacy Group refer to observations for which the machine-learning algorithm predicts identical marking/non-marking behavior for the high and medium financial literacy groups. High Literacy Group marking only and Medium Literacy Group marking only refer to observations for which marking is only predicted for the high or medium literacy group, respectively. All alignment scores are based on the predicted marking indicator, which is equal to one if the predicted marking probability is above 50 percent, and zero otherwise.
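The four categories above partition sentences by whether the two groups' predicted marking indicators agree. A small sketch of that partition (the label names are illustrative, not the table's exact row labels):

```python
from collections import Counter

def alignment(high, medium):
    """Tally each sentence by the two groups' predicted marking indicators."""
    labels = {
        (1, 1): "both marking",
        (0, 0): "both non-marking",
        (1, 0): "high only",
        (0, 1): "medium only",
    }
    return Counter(labels[pair] for pair in zip(high, medium))

# Illustrative indicators for five sentences.
counts = alignment([1, 1, 0, 1, 0], [1, 0, 0, 1, 1])
print(counts)  # four-way tally of agreement and disagreement
```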

Table 8: Variable Descriptive Statistics for 10-K Filings

                                  N        Mean       Sd          p1       p50      p99
Inter-rater disagreement measures
IR HET (Kendall's tau)            31,332   -0.499      0.074      -0.641   -0.507   -0.290
IR HET (Spearman's rho)           31,332   -0.601      0.086      -0.758   -0.612   -0.354
IR HET (Cohen's kappa)            31,332   -0.479      0.098      -0.678   -0.488   -0.206
IR HET (Cronbach's alpha)         31,332   -0.648      0.093      -0.811   -0.661   -0.358
Text features
Gunning FOG score 10-K            31,332   20.127      1.108      18.104   20.056    23.326
Gunning FOG score MD&A            31,332   18.836      1.245      16.031   18.798    21.909
%Complex words 10-K               31,332   25.664      1.215      22.682   25.687    28.554
%Complex words MD&A               31,332   20.557      1.625      16.652   20.557    24.676
Number of words 10-K              31,332   44,075     22,401      16,144   39,582   127,831
Number of words MD&A              31,332    8,442      4,400       3,140    7,548    24,072
Market model variables
Post-filing RMSE [6,28]           31,332    2.113      1.545       0.429    1.725     7.548
Pre-filing RMSE [-257,-6]         31,332    2.533      1.431       0.747    2.227     7.240
Pre-filing alpha [-252,-6]        31,332    0.043      0.179      -0.379    0.030     0.621
Abs(abnormal return)              31,332    0.030      0.044       0.000    0.017     0.208
Market capitalization             31,332  3,735.582  15,368.410   18.925  612.551  58,222.945
Book-to-market                    31,332    0.622      0.480       0.043    0.513     2.492
NASDAQ dummy                      31,332    0.587      0.492       0.000    1.000     1.000
Other variables
Analyst following                 20,484    7.692      5.790       2.000    6.000    29.000
Analyst dispersion                20,484    0.248      0.457       0.000    0.107     2.609
Low institutional ownership       20,346    0.474      0.499       0.000    0.000     1.000

Table 8 reports descriptive information for the key variables used in the regressions examining volatility subsequent to the 10-K filing, based on the entire sample of 66,173 observations. Detailed variable definitions for all variables are provided in appendix B.

Table 9: Analysis of Predicted Heterogeneity in Information Retrieval Using Post-Filing Date Market Model RMSE

Dependent variable = Post-filing RMSE [6,28]
Baseline model: columns (1)-(2); IR HET Kendall’s tau: (3)-(4); IR HET Spearman’s rho: (5)-(6); IR HET Cohen’s kappa: (7)-(8); IR HET Cronbach’s alpha: (9)-(10)

IR HET 0.301∗ 0.297∗∗ 0.251∗ 0.247∗∗ 0.285∗∗ 0.314∗∗ 0.320∗∗ 0.335∗∗ (0.160) (0.135) (0.136) (0.113) (0.120) (0.118) (0.128) (0.127)

%Complex Words (10-K) -0.021∗∗ 0.011 -0.019∗∗ 0.011 -0.019∗∗ 0.011 -0.019∗∗ 0.010 -0.019∗∗ 0.011 (0.007) (0.013) (0.007) (0.013) (0.007) (0.013) (0.007) (0.013) (0.007) (0.013)

Log(Number of Words 10-K) 0.125∗∗∗ 0.036 0.130∗∗∗ 0.038 0.130∗∗∗ 0.038 0.126∗∗∗ 0.037 0.126∗∗∗ 0.037 (0.040) (0.037) (0.041) (0.037) (0.041) (0.037) (0.040) (0.037) (0.040) (0.037)

Pre-filing alpha [-257,-6] -0.673∗∗ -0.366∗ -0.671∗∗ -0.364∗ -0.671∗∗ -0.364∗ -0.670∗∗ -0.363∗ -0.670∗∗ -0.363∗ (0.234) (0.185) (0.234) (0.185) (0.234) (0.185) (0.234) (0.185) (0.234) (0.185)

Pre-filing RMSE [-252,-6] 0.460∗∗∗ 0.300∗∗∗ 0.459∗∗∗ 0.299∗∗∗ 0.459∗∗∗ 0.299∗∗∗ 0.459∗∗∗ 0.299∗∗∗ 0.459∗∗∗ 0.299∗∗∗

(0.052) (0.036) (0.052) (0.036) (0.052) (0.036) (0.052) (0.036) (0.052) (0.036)

Abs(filing period abnormal return) 3.860∗∗∗ 3.782∗∗∗ 3.856∗∗∗ 3.778∗∗∗ 3.856∗∗∗ 3.779∗∗∗ 3.857∗∗∗ 3.778∗∗∗ 3.856∗∗∗ 3.779∗∗∗ (0.557) (0.540) (0.557) (0.540) (0.557) (0.541) (0.556) (0.539) (0.557) (0.539)

Log(market capitalization) -0.159∗∗∗ -0.384∗∗∗ -0.159∗∗∗ -0.384∗∗∗ -0.159∗∗∗ -0.384∗∗∗ -0.159∗∗∗ -0.385∗∗∗ -0.159∗∗∗ -0.385∗∗∗ (0.026) (0.060) (0.026) (0.060) (0.026) (0.060) (0.026) (0.060) (0.026) (0.060)

Log(book-to-market) -0.067∗∗ -0.110∗∗ -0.067∗∗ -0.110∗∗ -0.067∗∗ -0.110∗∗ -0.067∗∗ -0.110∗∗ -0.067∗∗ -0.110∗∗ (0.024) (0.037) (0.024) (0.037) (0.024) (0.037) (0.024) (0.037) (0.024) (0.037)

NASDAQ dummy 0.054 0.102 0.053 0.102 0.053 0.102 0.052 0.102 0.052 0.103 (0.038) (0.088) (0.037) (0.088) (0.037) (0.088) (0.037) (0.088) (0.037) (0.088)

Constant 1.028∗∗∗ 2.960∗∗∗ 1.076∗∗∗ 3.075∗∗∗ 1.076∗∗∗ 3.075∗∗∗ 1.108∗∗∗ 3.109∗∗∗ 1.171∗∗∗ 3.168∗∗∗ (0.299) (0.837) (0.285) (0.840) (0.285) (0.839) (0.280) (0.833) (0.277) (0.831)
Observations 31,332 30,420 31,332 30,420 31,332 30,420 31,332 30,420 31,332 30,420
Firm FE No Yes No Yes No Yes No Yes No Yes
Industry FE Yes No Yes No Yes No Yes No Yes No
Year FE Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Adjusted R-squared 0.405 0.470 0.405 0.470 0.405 0.470 0.405 0.470 0.405 0.470

Table 9 reports the results for the relationship between post-filing RMSE and heterogeneity in information retrieval between high and medium financial literacy groups. Standard errors clustered by year and industry/firm are presented in parentheses. *, **, and *** represent statistical significance at the 10 percent, 5 percent, and 1 percent level (two-tailed), respectively.

Table 10: Analysis of Predicted Heterogeneity in Information Retrieval Using Analyst Dispersion as Dependent Variable

Dependent variable = Analyst Dispersion
Baseline model: columns (1)-(2); IR HET Kendall’s tau: (3)-(4); IR HET Spearman’s rho: (5)-(6); IR HET Cohen’s kappa: (7)-(8); IR HET Cronbach’s alpha: (9)-(10)

IR HET 0.086 -0.034 0.074 -0.021 0.002 -0.016 0.001 -0.026 (0.057) (0.047) (0.050) (0.042) (0.048) (0.042) (0.048) (0.047)

%Complex Words (10-K) -0.008∗ -0.000 -0.007∗ -0.000 -0.007∗ -0.000 -0.008∗∗ -0.000 -0.008∗∗ -0.000 (0.004) (0.005) (0.003) (0.005) (0.003) (0.005) (0.004) (0.005) (0.004) (0.005)

Log(Number of Words 10-K) 0.125∗∗∗ 0.071∗∗∗ 0.127∗∗∗ 0.071∗∗∗ 0.127∗∗∗ 0.071∗∗∗ 0.125∗∗∗ 0.071∗∗∗ 0.125∗∗∗ 0.071∗∗∗ (0.034) (0.019) (0.035) (0.020) (0.035) (0.020) (0.035) (0.020) (0.035) (0.020)

Pre-filing alpha [-257,-6] -0.513∗∗∗ -0.294∗∗∗ -0.512∗∗∗ -0.294∗∗∗ -0.512∗∗∗ -0.294∗∗∗ -0.513∗∗∗ -0.294∗∗∗ -0.513∗∗∗ -0.294∗∗∗ (0.090) (0.037) (0.090) (0.037) (0.090) (0.037) (0.090) (0.037) (0.090) (0.037)

Pre-filing RMSE [-252,-6] 0.120∗∗∗ 0.083∗∗∗ 0.120∗∗∗ 0.083∗∗∗ 0.120∗∗∗ 0.083∗∗∗ 0.120∗∗∗ 0.083∗∗∗ 0.120∗∗∗ 0.083∗∗∗ (0.031) (0.019) (0.031) (0.019) (0.031) (0.019) (0.031) (0.019) (0.031) (0.019)

Abs(filing period abnormal return) 0.748∗∗∗ 0.461∗∗∗ 0.748∗∗∗ 0.461∗∗∗ 0.748∗∗∗ 0.461∗∗∗ 0.748∗∗∗ 0.461∗∗∗ 0.748∗∗∗ 0.461∗∗∗ (0.180) (0.144) (0.181) (0.144) (0.181) (0.144) (0.180) (0.144) (0.180) (0.144)

Log(market capitalization) -0.036∗∗ -0.218∗∗∗ -0.037∗∗ -0.218∗∗∗ -0.037∗∗ -0.218∗∗∗ -0.036∗∗ -0.218∗∗∗ -0.036∗∗ -0.218∗∗∗ (0.016) (0.021) (0.016) (0.021) (0.016) (0.021) (0.016) (0.021) (0.016) (0.021)

Log(book-to-market) 0.067∗∗∗ 0.015 0.067∗∗∗ 0.015 0.067∗∗∗ 0.015 0.067∗∗∗ 0.015 0.067∗∗∗ 0.015 (0.017) (0.013) (0.017) (0.013) (0.017) (0.013) (0.017) (0.013) (0.017) (0.013)

NASDAQ dummy -0.039∗∗ -0.010 -0.040∗∗ -0.010 -0.040∗∗ -0.010 -0.039∗∗ -0.010 -0.039∗∗ -0.010 (0.015) (0.024) (0.015) (0.024) (0.015) (0.024) (0.015) (0.024) (0.015) (0.024)

Log(Analyst Following) 0.023∗∗ 0.057∗∗∗ 0.023∗∗ 0.057∗∗∗ 0.023∗∗ 0.057∗∗∗ 0.023∗∗ 0.057∗∗∗ 0.023∗∗ 0.057∗∗∗ (0.010) (0.008) (0.010) (0.008) (0.010) (0.008) (0.010) (0.008) (0.010) (0.008)

Constant -0.878∗∗∗ 0.782∗∗ -0.862∗∗∗ 0.767∗∗ -0.862∗∗∗ 0.771∗∗ -0.877∗∗∗ 0.773∗∗ -0.877∗∗∗ 0.764∗∗ (0.277) (0.326) (0.270) (0.321) (0.271) (0.321) (0.269) (0.324) (0.265) (0.324)
Observations 20,484 19,712 20,484 19,712 20,484 19,712 20,484 19,712 20,484 19,712
Firm FE No Yes No Yes No Yes No Yes No Yes
Industry FE Yes No Yes No Yes No Yes No Yes No
Year FE Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Adjusted R-squared 0.265 0.438 0.265 0.438 0.265 0.438 0.265 0.438 0.265 0.438

Table 10 reports the results for the relationship between analyst dispersion and heterogeneity in information retrieval between high and medium financial literacy groups. Standard errors clustered by year and industry/firm are presented in parentheses. *, **, and *** represent statistical significance at the 10 percent, 5 percent, and 1 percent level (two-tailed), respectively.

Table 11: Analysis of Predicted Heterogeneity in Information Retrieval and the Level of Institutional Ownership Using Post-Filing Date Market Model RMSE

Dependent variable = Post-filing RMSE [6,28]
IR HET Kendall’s tau: column (1); IR HET Spearman’s rho: (2); IR HET Cohen’s kappa: (3); IR HET Cronbach’s alpha: (4)

IR HET -0.042 -0.039 0.037 0.013 (0.130) (0.113) (0.083) (0.089)

IR HET x Low institutional ownership 0.590∗∗∗ 0.511∗∗∗ 0.456∗∗ 0.547∗∗ (0.168) (0.145) (0.178) (0.187)

Low institutional ownership 0.424∗∗∗ 0.436∗∗∗ 0.347∗∗ 0.484∗∗∗ (0.106) (0.105) (0.116) (0.148)

%Complex Words (10-K) -0.013 -0.013 -0.014 -0.014 (0.013) (0.013) (0.013) (0.013)

Log(Number of Words 10-K) 0.159∗∗∗ 0.159∗∗∗ 0.155∗∗∗ 0.155∗∗∗ (0.045) (0.045) (0.043) (0.044)

Pre-filing alpha [-257,-6] -0.526∗ -0.527∗ -0.525∗ -0.525∗ (0.269) (0.269) (0.269) (0.269)

Pre-filing RMSE [-252,-6] 0.430∗∗∗ 0.430∗∗∗ 0.429∗∗∗ 0.429∗∗∗ (0.088) (0.087) (0.092) (0.091)

Abs(filing period abnormal return) 3.711∗∗∗ 3.712∗∗∗ 3.716∗∗∗ 3.714∗∗∗ (0.638) (0.638) (0.636) (0.636)

Log(market capitalization) -0.190∗∗∗ -0.190∗∗∗ -0.190∗∗∗ -0.190∗∗∗ (0.038) (0.038) (0.038) (0.038)

Log(book-to-market) -0.065∗∗ -0.065∗∗ -0.064∗∗ -0.064∗∗ (0.026) (0.026) (0.026) (0.026)

NASDAQ dummy 0.033 0.033 0.033 0.033 (0.036) (0.037) (0.032) (0.032)

Log(Analyst Following) 0.092∗∗∗ 0.092∗∗∗ 0.092∗∗∗ 0.092∗∗∗ (0.016) (0.016) (0.016) (0.016)
Constant 0.548 0.542 0.632 0.619 (0.356) (0.356) (0.369) (0.380)
Observations 20,346 20,346 20,346 20,346
Firm FE No No No No
Industry FE Yes Yes Yes Yes
Year FE Yes Yes Yes Yes
Adjusted R-squared 0.438 0.438 0.438 0.438

Table 11 reports the results for the relationship between post-filing RMSE and heterogeneity in information retrieval between high and medium financial literacy groups taking into account diversity in firms’ ownership base. Standard errors clustered by year and industry are presented in parentheses. *, **, and *** represent statistical significance at the 10 percent, 5 percent, and 1 percent level (two-tailed), respectively.
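The post- and pre-filing RMSE measures used throughout Tables 8, 9, and 11 are root mean squared errors of market-model residuals over the stated trading-day windows. A generic sketch of such a computation (the return series and window handling are illustrative; the chapter's exact implementation may differ):

```python
import math

def market_model_rmse(firm_ret, mkt_ret):
    """Fit r_i = alpha + beta * r_m by OLS and return the residual RMSE."""
    n = len(firm_ret)
    mean_f = sum(firm_ret) / n
    mean_m = sum(mkt_ret) / n
    cov = sum((f - mean_f) * (m - mean_m) for f, m in zip(firm_ret, mkt_ret))
    var = sum((m - mean_m) ** 2 for m in mkt_ret)
    beta = cov / var
    alpha = mean_f - beta * mean_m
    resid = [f - (alpha + beta * m) for f, m in zip(firm_ret, mkt_ret)]
    return math.sqrt(sum(e * e for e in resid) / n)

# Illustrative daily returns over a short window.
mkt = [0.01, -0.02, 0.015, 0.0, -0.005]
firm = [0.012, -0.025, 0.02, 0.001, -0.004]
print(round(market_model_rmse(firm, mkt), 4))
```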
