Can Consumer-Posted Photos Serve as a Leading Indicator of Restaurant Survival? Evidence from Yelp

Mengxia Zhang and Lan Luo

May 2021

Mengxia Zhang (e-mail: [email protected]) is a PhD candidate in Marketing, and Lan Luo (e-mail: [email protected]) is an Associate Professor of Marketing, both at the Marshall School of Business, University of Southern California. We thank participants at the Advanced Research Techniques (ART) Forum, the ISMS Doctoral Dissertation Award Special Session at the INFORMS Marketing Science conference, the UT Dallas FORMS Conference, the Product and Service Innovation Conference, Carnegie Mellon University Conference on Digital Marketing and Machine Learning, Erasmus University Workshop on Machine Learning & Multi-bandit Algorithms, GERAD Montreal, University of Miami Marketing Research Camp, MIT Marketing Group Doctoral Workshop, and seminar attendees at Chinese University of Hong Kong, City University of Hong Kong, Erasmus University, Hong Kong University of Science and Technology, University of Hong Kong, University of Iowa, National University of Singapore, Ohio State University, Peking University, Penn State University, Santa Clara University, Shanghai University of Finance and Economics, Tongji University, Tsinghua University, University of Western Ontario, and University of Wisconsin-Madison for their helpful comments. We also thank Nvidia and Clarifai for supporting this research.

Can Consumer-Posted Photos Serve as a Leading Indicator of Restaurant Survival? Evidence from Yelp

Abstract

Despite the substantial economic impact of the restaurant industry, large-scale empirical research on restaurant survival has been sparse. We investigate whether consumer-posted photos can serve as a leading indicator of restaurant survival above and beyond reviews, firm characteristics, competitive landscape, and macro conditions. We employ machine learning techniques to extract features from 755,758 photos and 1,121,069 reviews posted on Yelp between 2004 and 2015 for 17,719 U.S. restaurants. We also collect data on restaurant characteristics (e.g., cuisine type; price level), competitive landscape, as well as entry and exit (if applicable) times from each restaurant’s Yelp/Facebook page, own website, or search engine. Using a predictive XGBoost algorithm, we find that consumer-posted photos are strong predictors of restaurant survival. Interestingly, the informativeness of photos (e.g., the proportion of food photos; the proportion of photos with helpful votes) relates more to restaurant survival than do photographic attributes (e.g., composition; brightness). Additionally, photos carry more predictive power for independent, young or mid-aged, and medium-priced restaurants. Assuming that restaurant owners possess no knowledge about future photos and reviews, photos predict restaurant survival for up to three years, while reviews are only predictive for one year. We further employ causal forests to facilitate the interpretation of our predictive results. Among photo content variables, the proportion of food photos has the largest positive association with restaurant survival, followed by proportions of outside and interior photos. Among others, the proportion of photos with helpful votes also positively relates to restaurant survival.


1. Introduction

The restaurant industry has a substantial impact on the U.S. economy. According to the National

Restaurant Association, the U.S. restaurant industry generated more than $833 billion in revenue and provided jobs for 1 in 10 workers in 2018. Meanwhile, this industry is also well known for its high turnover rate.

According to Parsa et al. (2005), the first-year turnover rate of restaurants is as high as 26%.

Nevertheless, large-scale empirical research on restaurant survival has been scarce.

With the extensive use of camera-enabled smartphones and the increasing popularity of various photo-sharing platforms, three billion photos are shared on social media daily (McGrath 2017). The number of photos taken by consumers in 2017 was projected to be 1.3 trillion globally (New York Times,

July 29, 2015). Compared to many other industries, the restaurant industry is also unique in that consumers love to share photos of their dining experience online (New York Times, April 6, 2010).

Moreover, the restaurant industry is abundant with rich information on consumer reviews, restaurant characteristics, and competitive landscape. Consequently, it provides us with an ideal setting to investigate whether photos may serve as a leading indicator of restaurant survival above and beyond these alternative factors.

Why do we believe consumer-posted photos may be a strong predictor of restaurant survival? We conjecture that there are two possible routes through which photos may be useful in predicting restaurant survival. First, from the posters’ perspective, it takes very little effort to upload photos on social media.

As such, photos may serve as a proxy for restaurant popularity and/or reflect a restaurant’s situation (e.g., well-managed/deteriorating restaurants may be reflected by customer-posted photos on platforms such as

Yelp) above and beyond reviews. Second, from the viewers’ perspective, photos may communicate valuable information about the restaurant with great efficiency, hence facilitating a good match between the focal business and its customers. Scientific studies have shown that human brains can process a photo in as little as 13 milliseconds (MIT News, Jan 16, 2014). The brain also processes visual information 60,000 times faster than text (Vogel et al. 1986). On Yelp and TripAdvisor, businesses with 10 or more photos receive two to three times as many views as businesses with the same number of reviews and no

photos.1 Furthermore, photos visualize rich information about the restaurant (e.g., food items served; ambiance), revealing whether the focal restaurant matches the viewer’s horizontal taste. For example, a consumer may not be able to figure out whether she may like a seafood pizza based on a review stating “it is the most unique pizza I had!” But she can clearly visualize the pizza from a photo and discern whether this is a food item that she may like. Ghose et al. (2012) have shown that hotel ranking systems incorporating photos can generate recommendations with a better fit compared to those with no photos.

Similarly, we conjecture that photos may help viewers better visualize how much they might enjoy the food items and/or the dining experience.

Thus, although the survival of a restaurant might not hinge on the experience of a single consumer who either posted photos or viewed photos posted by others, consumer-posted photos may collectively provide some useful information that can be used to forecast its survival potential. Within this context, we explore answers to the following questions: 1) Can consumer-posted photos serve as a leading indicator of restaurant survival? 2) If so, what aspects of photos are more important and what are less important? 3) Are photos more informative for certain types of restaurants? And 4) how long can photos stay predictive in forecasting restaurant survival?

Specifically, we employ machine learning methods to extract various features from 755,758 photos posted on Yelp between October 2004 and December 2015 for 17,719 U.S. restaurants, among which 25.37% went out of business during this time window. To investigate the incremental predictive power of photos, we also extract features from 1,121,069 reviews for these restaurants during the same time window as controls. Built upon the business/restaurant-survival literature (e.g., Lafontaine et al.

2018; Parsa et al. 2005), we further collect data on these restaurants’ characteristics (e.g., chain status, cuisine type, price level), competitive landscape (e.g., restaurant concentration, new entries/exits, photos and reviews of competitors), entry time (from each restaurant’s Yelp/Facebook page, own website, or the

Google search engine), along with data on macro factors (year and zip code) as additional controls. Given

1https://searchengineland.com/business-profile-review-best-practices-tripadvisor-yelp-276247 and https://www.tripadvisor.com/ForRestaurants/wp-content/uploads/2018/01/Dinerengagement_us_en.pdf

that unobserved restaurant quality may impact both consumer-posted photos and survival, we employ deep learning techniques to extract restaurant quality measures along four dimensions (food, service, environment, and price) via deep mining of 1,121,069 reviews.2 We emphasize these four dimensions of restaurant quality based on the prior literature (e.g., Bujisic et al. 2014; Hyun 2010; Ryu et al. 2012).

Such an approach is inspired by the recent trend of utilizing consumer reviews to track product or service quality over time (e.g., Tirunillai and Tellis 2014; Hollenbeck 2018).3 We then employ XGBoost, an implementation of gradient-boosted trees (Chen and Guestrin 2016), to discern the incremental predictive power of photos with all other variables serving as controls.
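The prediction step above can be sketched in code. The paper's actual feature matrix and XGBoost configuration are not reproduced here; the sketch below uses synthetic data and scikit-learn's GradientBoostingClassifier as a stand-in (the xgboost package's XGBClassifier exposes the same fit/predict_proba interface), with out-of-sample AUC as the accuracy criterion:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the restaurant-year feature matrix
# (photo, review, characteristic, competition, and macro features).
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

# Gradient-boosted trees; hyperparameters here are illustrative only.
model = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                   learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)

# Out-of-sample AUC on the holdout split.
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

The incremental predictive power of photos can then be assessed by comparing holdout AUC with and without the photo-derived columns in `X`.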

We discover that consumer-posted photos can significantly improve forecast accuracy for restaurant survival above and beyond reviews, restaurant characteristics, competitive landscape, and macro conditions. We further explore the incremental predictive power of various aspects of photos, including content, photographic attributes, caption, volume, and helpful votes. We learn that the informativeness of photos (e.g., the proportion of food photos and the proportion of photos with helpful votes) relates more to restaurant survival than do photographic attributes (e.g., composition, brightness).

In addition, photos are more informative for independent (vs. chain), young or mid-aged (vs. established), and medium-priced (vs. low-priced) restaurants. Assuming that restaurant owners do not possess any knowledge about future photos and reviews for both themselves and their competitors, photos can predict restaurant survival for up to three years, while reviews are informative only for one year.

Despite its many desirable features (such as the ability to explore nonlinear relationships among a large number of variables), the predictive XGBoost algorithm emphasizes maximizing out-of-sample

2 Since consumers self-select into writing reviews, we fully acknowledge that Yelp reviews are not a perfect measure of restaurant quality. We have explored alternative measures of restaurant quality, such as Zagat or the Michelin Guide. Nevertheless, these sources suffer from severe restaurant-selection bias, with Zagat only focusing on the most popular restaurants and Michelin mostly emphasizing high-end upscale dining options. Because our research requires time-varying measures of restaurant quality for a large number of restaurants, we resort to text mining of Yelp reviews to discern trajectories of restaurant quality over time.

3 Tirunillai and Tellis (2014) find that quality metrics mined from consumer reviews and those from Consumer Reports correlate at levels ranging from 0.61 (for footwear) to 0.81 (for computers). Using 15 years of hotel reviews from multiple platforms, Hollenbeck (2018) discovers that consumer reviews reduce information asymmetry about quality between sellers and buyers and serve as a substitute for brand name (or chain affiliation) for independent hotels.

prediction accuracy rather than obtaining unbiased/consistent parameter estimates quantifying how each independent variable (e.g., the proportion of photos with helpful votes) relates to restaurant survival.

Aiming to provide a better understanding of how various photo-related (as well as non-photo-related) factors relate to restaurant survival, we further employ cluster-robust causal forests (Athey et al. 2019;

Athey and Wager 2019) to facilitate parameter interpretation of the most informative predictors as suggested by the SHAP feature importance (Lundberg et al. 2020) derived from our XGBoost algorithm.

As discussed in Athey and Imbens (2016) and Wager and Athey (2018), the causal forests model can serve as a suitable alternative to conventional propensity score methods in inferring treatment effects from rich observational data like ours. Furthermore, our cluster-robust causal forests allow for clustered errors within a restaurant to account for time-invariant variables that are unobservable to us (e.g., owner education).4 We learn that, among photo content variables, the proportion of food photos has the largest positive association with restaurant survival, followed by proportions of outside and interior photos.

Among others, the proportion of photos with helpful votes is also positively related to restaurant survival.

To our knowledge, this study is among the first to link consumer-posted photos with business survival. Over the past decade, there has been an extensive literature in marketing (e.g., Archak et al.

2011; Lee et al. 2019; Liu et al. 2019; Netzer et al. 2012; Netzer et al. 2019; Puranam et al. 2017;

Timoshenko and Hauser 2019; Tirunillai and Tellis 2012, 2014; Toubia and Netzer 2016) that emphasized extracting managerially relevant information from consumer reviews. More recently, several researchers have explored the role of photos in consumption experiences (e.g., Barasch et al. 2017;

Barasch et al. 2017; Diehl et al. 2016), advertising (Xiao and Ding 2014), ranking systems (Ghose et al.

2012), brand perceptions (Dzyabura and Peres 2018; Liu et al. 2018), social media engagement

(Hartmann et al. 2019; Ko and Bowman 2018; Li and Xie 2018), lodging demand (Zhang et al. 2018), product returns (Dzyabura et al. 2018), crowdfunding success (Li et al. 2019), and labor market (Malik et

4 In Section 4.1, we discuss in detail how we mitigate possible time-varying restaurant unobservables (e.g., changes in finances or chef) via our restaurant quality measures based on directed acyclic graph theory (Morgan and Winship 2015; Pearl 2014).

al. 2020; Troncoso and Luo 2020). However, few studies have explored the relationship between consumer-posted photos and long-term business prosperity. Our research contributes to the literature by filling this void.

Our research also adds to the business/restaurant-survival literature by examining whether consumer-posted photos can predict restaurant survival above and beyond reviews and other known factors related to company, competition, and macro conditions. Historically, research on business survival has mainly focused on company characteristics (e.g., Bates 1995; Lafontaine et al. 2018), competitive landscape (e.g., Fritsch et al. 2006; Wagner 1994), and macro conditions (e.g., Audretsch and Mahmood

1995; Boden Jr and Nucci 2000) (see Table 1 for a review of this literature). We contribute by adding user-generated content (particularly photos) as an additional component to this business survival literature. Given our extensive efforts in extracting as much information as possible from a large number of restaurants over an extended time period, we believe that our research is perhaps among the most comprehensive studies on restaurant survival to date.

Findings from our research can be beneficial to various stakeholders, including business investors, landlords, online platforms, restaurant owners, and research/trade associations. First, managers/investors can better understand the market before starting a new business or investing in existing businesses. Additionally, our research can help landlords who must make high-stake decisions such as deciding whether to begin/renew rental leases to new/existing restaurants. The average investment per restaurant in the US is about half a million dollars (Restaurant Engine, May 4, 2015). As of May 2019, the restaurant industry represents about $375 billion in market capitalization.5 Nevertheless, restaurants are also known to have the highest turnover rates in the retail sector (Fritsch et al. 2006; Parsa et al. 2005). Per our findings, incorporating photos (especially their content information and helpful votes received) into such analysis can significantly increase the prediction accuracy of these important business decisions.

5 The number is calculated by summing up market capitalization of all publicly traded restaurant companies listed on https://markets.on.nytimes.com/research/markets/usmarkets/industry.asp?industry=53312 accessed on 05/10/2019.


Second, because our research suggests that UGC (especially photos) is useful in forecasting survival, online platforms such as Yelp and TripAdvisor might monetize our work by offering premium analytics reports to their business customers. These platforms already routinely provide business owners with basic analytics reports (e.g., number of consumer-posted photos and reviews). These business owners may also benefit from an enriched report that provides more in-depth information on photos (e.g., photo content; helpful votes received) and reviews (e.g., consumer sentiment on food, service, environment, and price) for both the focal business and all competing businesses within close proximity.

Third, restaurant owners might also leverage our research to advance their competitive intelligence and resource-allocation strategies. As is well-known in the literature (e.g., Fritsch et al. 2006;

Kalleberg and Leicht 1991; Parsa et al. 2005), a thorough understanding of the competitive landscape is vital to business/restaurant survival. Based on our predictive model, restaurant owners might decide whether and/or when to initiate aggressive marketing strategies (e.g., offering promotions) upon detecting a decline in the survival probability of a close competitor. Additionally, given that photos can predict restaurant survival for up to three years into the future, our research may also be useful for longer-term strategic planning by restaurant owners.

Last but not least, our research provides useful insights for research/trade associations such as the

National Restaurant Association and the National Tour Association regarding survival probabilities by business characteristics (e.g., chain vs. independent; cuisine type; price level), time trends, and macro performance in the restaurant industry.

2. Data

We collect data from 17,719 U.S. restaurants listed on Yelp with a total of 755,758 photos and 1,121,069 reviews between October 12, 2004 and December 24, 2015. Our dataset also contains information on firm characteristics, competitive landscape, year, zip code, and entry and exit (if applicable) times for these restaurants. The restaurant and review data come from the Yelp Dataset Challenge Round 7. The competitive landscape, zip codes, and exit times are processed from Yelp-provided data. To complement the Yelp dataset, we further collect the Yelp photos of all restaurants in our dataset. Additionally, we

collect data on the entry time of these restaurants from Yelp, the restaurant’s website, Facebook page, or via the Google search engine.

Figure 1 depicts the roadmap of this research. In particular, we focus on studying the role of photos in predicting restaurant survival. All other factors, including reviews, restaurant characteristics, competition, and macro conditions, serve as important control variables in our research. In the following subsections, we explain each component of this roadmap in more detail. Summary statistics of reviews and photos are presented in Table 2. Restaurant level summary statistics are provided in Table 3. Figure 2 shows the distribution of restaurant life span broken down by open and closed restaurants in our sample.

Table 4A describes all variables related to consumer photos and reviews in our predictive model. Table

4B lists all variables associated with company characteristics, competitive landscape, and macro factors.

In Appendix A, we depict survival patterns by cuisine types and states. This appendix also provides model-free evidence that supports our conjecture that consumer-posted photos may serve as a leading indicator of restaurant survival.

2.1 Photos

The photo information collected from Yelp includes each photo’s upload date, helpful votes received, photo caption written by the poster, as well as its general content (i.e., food, drink, interior, outside, menu) as classified by Yelp. Because we only observe the posting date of each photo and the total number of helpful votes received by a photo at the end of our data collection, we use the number of helpful votes normalized by the number of years since upload, i.e., yearly helpful votes, to measure the helpfulness of each photo. We describe how we extract additional information for photo content, photographic attributes, and photo caption below.
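The yearly-helpful-votes normalization described above is straightforward; a minimal sketch (the dates and vote counts are illustrative, not from the dataset):

```python
from datetime import date

def yearly_helpful_votes(total_votes, upload_date, collection_date):
    """Helpful votes normalized by the number of years since upload."""
    years = (collection_date - upload_date).days / 365.25
    # Guard against photos uploaded on the collection date itself.
    return total_votes / max(years, 1e-9)

# A photo uploaded ~2 years before the end of data collection
# with 10 total helpful votes accrues ~5 helpful votes per year.
rate = yearly_helpful_votes(10, date(2013, 12, 24), date(2015, 12, 24))
```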

2.1.1 Photo Content

Yelp has developed a deep learning model that yields a general photo classification as follows: food, drink, interior, outside, and menu.6 Because this general classification may not fully capture the detailed

6 https://engineeringblog.yelp.com/2015/10/how-we-use-deep-learning-to-classify-business-photos-at-yelp.html

content of a photo (e.g., strawberries, seafood, ocean view), we further extract specific content in each photo using Clarifai API. Several marketing researchers have utilized APIs to identify objects in photos

(Ko and Bowman 2018; Li et al. 2019). We choose Clarifai because it has been a market leader in photo-content detection. Clarifai was among the top five winners in photo classification at the ImageNet 2013 competition and is particularly useful for our study because it provides highly detailed labels for food.

Clarifai’s “Food” model recognizes more than 1,000 food items in photos down to the ingredient level.7

Appendix B provides a more detailed description of how we use Clarifai API in our research. The Clarifai

API generates 5,080 unique labels for all photos in our dataset.

To summarize the content in the photos detected by Clarifai, we calibrate a topic model called

Latent Dirichlet allocation (LDA) (Blei et al. 2003). The topic model serves as a data reduction technique, allowing us to use our prediction model to relate LDA probabilities of photo content topics to survival probabilities. We vary the number of topics between 2 and 40. We find that the fitted LDA model with 10 topics yields the highest topic coherence score (Röder et al. 2015), a method that has been shown to make the resulting topics more interpretable (Chang et al. 2009; Newman et al. 2010). Therefore, we set the number of photo topics to 10 in our study. Table A2 in Appendix B presents the 10 topics and the most representative words for each topic. Please refer to Appendix B for technical details of our topic modeling procedure.
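The LDA step can be sketched as follows, treating each photo's bag of Clarifai labels as a document. The sketch uses scikit-learn with invented labels and a fixed topic count; the paper instead selects the topic count by coherence score, for which gensim's CoherenceModel is a common tool:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Each "document" is the set of labels returned for one photo
# (labels here are invented for illustration).
photo_labels = [
    "pizza cheese tomato basil dinner",
    "cheesecake strawberry dessert cream",
    "bar beer cocktail glass drink",
    "ocean view patio table outdoor",
] * 25  # repeat to give the model enough documents to fit

counts = CountVectorizer().fit_transform(photo_labels)
lda = LatentDirichletAllocation(n_components=4, random_state=0)

# Each row of doc_topics is a probability distribution over topics;
# these per-photo topic probabilities feed into the survival model.
doc_topics = lda.fit_transform(counts)
```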

Given that the variety of objects in a photo might also affect its memorability (Isola et al. 2011) and consumer appetite (Wadhera and Capaldi-Phillips 2014), we follow prior literature on visual complexity (Isola et al. 2011) by counting the number of unique labels in a photo. One challenge we face is that there is no clear structure in the returned 5,080 labels (e.g., returned labels might include both

“berry” and “strawberry,” but “berry” is a superset of “strawberry”). To solve this problem, we use

WordNet (Fellbaum 1998; Miller 1995) to structure the labels. For example, “fruit” is a superset of

“berry,” which, in turn, is a superset of “strawberry.” Thus, in this example, “strawberry” is the leaf node,

7 https://clarifai.com/models/food-image-recognition-model-bd367be194cf45149e75f01d59f77ba7

and we use leaf nodes (the label at the most refined level) among the labels of a photo to measure the variety of objects in the photo. The total number of unique leaf nodes is 5,037.
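The leaf-node logic can be sketched independently of WordNet: a label counts toward the variety measure only if no other label on the same photo is its descendant. Below, a toy hand-coded hypernym map stands in for WordNet's hierarchy:

```python
# Toy hypernym map (child -> parent); WordNet supplies the real hierarchy.
PARENT = {"strawberry": "berry", "berry": "fruit", "mozzarella": "cheese"}

def ancestors(label):
    """All supersets of a label under the toy hierarchy."""
    out = set()
    while label in PARENT:
        label = PARENT[label]
        out.add(label)
    return out

def leaf_nodes(labels):
    """Labels that are not a superset of any other label in the photo."""
    labels = set(labels)
    covered = set().union(*(ancestors(l) for l in labels))
    return labels - covered

# "berry" and "fruit" are supersets of "strawberry", so only the most
# refined labels ("strawberry", "plate") count toward the variety measure.
variety = len(leaf_nodes(["fruit", "berry", "strawberry", "plate"]))
```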

2.1.2 Photographic Attributes

We further examine whether photographic attributes carry any predictive power for restaurant survival.

As suggested by Zhang et al. (2018), photographic attributes may reflect the quality of a photo, which in turn may affect demand. Following Zhang et al. (2018), we organize photographic attributes into three categories: 1) color; 2) composition; and 3) figure-ground relationship (see definitions in Table A3 in

Appendix B) based on the prior photography and marketing literature. For example, researchers have shown that color can affect aesthetics (Arnheim 1965; Freeman 2007) and attractiveness of food

(Nishiyama et al. 2011; Takahashi et al. 2016; Wadhera and Capaldi-Phillips 2014). Composition describes how different subjects and visual elements are arranged within the photo (Krages 2012).

Visually balanced photos give the audience the feeling of order and tidiness and minimize cognitive demands (Kreitler and Kreitler 1972). The figure-ground relationship reflects how the central element of the photo is distinguished from the background. Consumer research suggests that ads containing photos with clear figure-ground relationships receive more attention from their audience (Larsen et al. 2004).

In sum, we use an 18-dimensional vector to capture the color, composition, and figure-ground relationship attributes of each photo, with each dimension bounded between 0 and 1. A higher score represents a higher intensity in that dimension. For example, a photo with a 0.9 brightness level is much brighter than one measuring 0.1. Appendix B includes more technical details of how we extract these photographic attributes.
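As one concrete example of a [0, 1]-bounded attribute, brightness might be computed as mean pixel luminance rescaled by the 8-bit maximum. The exact attribute definitions follow Zhang et al. (2018), so the sketch below is an approximation using the common Rec. 601 luma weights:

```python
def brightness(pixels):
    """Mean relative luminance of RGB pixels, scaled to [0, 1].

    Uses Rec. 601 luma weights (0.299R + 0.587G + 0.114B); treat this
    as an approximation of the brightness attribute, not the paper's
    exact definition.
    """
    luma = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    return sum(luma) / (255 * len(luma))

dark = brightness([(10, 10, 10)] * 4)       # near-black pixels, close to 0
bright = brightness([(250, 250, 250)] * 4)  # near-white pixels, close to 1
```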

2.1.3 Photo Caption

Seventy-one percent of photos in our dataset have a caption (i.e., a short description of the photo provided by the poster), generally reflecting either a dish name, such as “strawberry cheesecake,” or positive/negative sentiment associated with the photo, such as “giant pretzel is yummy!” or “the tofu wasn’t that great.”


We use VADER (Valence Aware Dictionary and sEntiment Reasoner) (Hutto and Gilbert 2014) sentiment analysis to analyze photo captions. VADER is a lexicon and rule-based sentiment-analysis tool specifically attuned to sentiments expressed in social media platforms such as Yelp. VADER is calibrated on multiple sources: a sentiment lexicon built on well-established sentiment word-banks (LIWC, ANEW,

GI) and validated by independent human judges, as well as on tweets, New York Times editorials, movie reviews from

Rotten Tomatoes, and Amazon reviews.

Our analysis generates a sentiment score for each caption, ranging from -1 (most negative) to +1

(most positive). Consistent with our expectation, dish-name captions always have a neutral sentiment

(sentiment score = 0). We also find that 70% of the captions are neutral dish names, 26% are positive, and only 4% are negative. Please see Appendix B for examples and distribution of extracted sentiments from photo captions.
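In practice this analysis uses the vaderSentiment package's SentimentIntensityAnalyzer (its polarity_scores method returns a compound score in [-1, 1]). The toy sketch below only illustrates the style of lexicon-and-rule scoring, including VADER's x/sqrt(x² + α) normalization with α = 15; the mini-lexicon entries and negation rule are invented simplifications:

```python
import math

# Invented mini-lexicon; the real VADER lexicon has thousands of
# human-rated entries.
LEXICON = {"yummy": 2.3, "great": 3.1}
NEGATIONS = {"not", "wasn't", "isn't"}

def caption_sentiment(caption, alpha=15.0):
    """Toy VADER-style score in [-1, 1]; dish names score 0 (neutral)."""
    words = caption.lower().rstrip("!.?").split()
    total = 0.0
    for i, w in enumerate(words):
        v = LEXICON.get(w, 0.0)
        if i > 0 and words[i - 1] in NEGATIONS:
            v = -v  # crude negation flip; VADER's rules are far richer
        total += v
    # VADER's normalization maps the raw sum into [-1, 1].
    return total / math.sqrt(total * total + alpha)

neutral = caption_sentiment("strawberry cheesecake")     # dish name -> 0.0
positive = caption_sentiment("giant pretzel is yummy!")  # > 0
negative = caption_sentiment("not great")                # < 0
```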

2.2 Reviews

Given that both photos and reviews are user-generated content that reflects consumer experience with the focal restaurant, reviews serve as an important control for us to examine the incremental predictive power of photos. Because prior literature (e.g., Chevalier and Mayzlin 2006; Liu 2006) suggests that review volume, star rating, and length can impact demand, we include these variables in our analysis. To the extent feasible, we also make the analyses of photos and reviews as comparable as possible. For example, similar to helpful votes of photos, we use yearly helpful votes of a review to measure its helpfulness. In the following, we explain how we extract restaurant-quality dimensions and content information from reviews.

2.2.1 Restaurant Quality Dimensions

Restaurant quality is unobservable to us, but it is a crucial factor for restaurant survival. Based on prior literature on restaurant-quality dimensions (Bujisic et al. 2014; Hyun 2010; Ryu et al. 2012), we extract the following four quality dimensions from reviews: food, service, environment, and price. Compared to

the time-invariant price level given by Yelp, the quality dimension measure of price based on reviews can track consumer sentiment regarding prices over time.

One challenge of evaluating restaurant quality dimensions based on reviews is that Yelp reviews do not provide separate numerical ratings for each quality dimension. For each review in our dataset, we must determine whether each quality dimension is mentioned and, if so, the corresponding sentiment.

Because we have over a million reviews in our dataset, it is challenging to manually label all of them.

Therefore, we randomly select 10,000 reviews from our review dataset and recruit 4,051 consumers from

Amazon Mechanical Turk (Mturk) to each read and provide labels for 20 reviews, which provides an average of 8 labels per review. Please see the details of our Mturk survey in Appendix C.

In line with the stream of literature that utilizes deep learning to extract managerially relevant information from reviews (e.g., Lee et al. 2019; Liu et al. 2019; Timoshenko and Hauser 2019), we use a text-based multi-task convolutional neural network (CNN) to extract restaurant quality dimensions from reviews. The 10,000 reviews and their labels are used to calibrate the text-based CNN. We randomly split the 10,000 reviews into 80% for calibration and 20% for out-of-sample testing. In the calibration process, the text of each review is treated as model inputs, and the outputs are the eight quality dimension scores labeled by the Mturk survey. We use AUC, the area under the receiver operating characteristic (ROC) curve (Hanley and McNeil 1982), to measure prediction accuracy in the holdout test dataset. AUC has proven to be useful in evaluating prediction accuracy in many machine learning applications (Bradley

1997; Fawcett 2004, 2006; Netzer et al. 2019). AUC ranges between 0 and 1. The larger the AUC, the better. A random model generates an AUC=0.5. The text-based CNN yields AUC scores greater than 0.88 for all eight tasks, as shown in Table A7 in Appendix C. We then extrapolate the calibrated CNN to each review in our dataset to extract the quality dimension scores, an 8-dimensional vector of scores bounded between 0 and 1. A higher score represents a higher probability that the review mentioned the corresponding quality dimension (for the first four scores in the vector) or better sentiment (for the second four scores in the vector) in the corresponding dimension. More technical details of this text-based CNN are provided in Appendix C.
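AUC has a direct probabilistic reading: it is the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counting half. A minimal pairwise implementation makes the 0.5-for-random benchmark concrete:

```python
def auc(labels, scores):
    """AUC as the probability a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking yields 1.0; a constant score behaves like a
# random model and yields 0.5.
perfect = auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
chance = auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

This pairwise definition is equivalent to the area under the ROC curve (Hanley and McNeil 1982); in practice, library routines such as scikit-learn's roc_auc_score compute it more efficiently.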


2.2.2 Review Content

Although the primary purpose of extracting restaurant quality dimensions is to control for the valence of each dimension, these four dimensions may not fully capture the detailed content of a review (e.g., dessert served, waiting time, happy hour). Additionally, while certain specific content (e.g., dessert) may exist in both photos and review texts, there may be rich information contained in review content (e.g., dissatisfaction for waiting time, love for a restaurant) that are not reflected in photos. Therefore, we further extract specific content information from reviews using topic modeling.

Similar to our topic modeling on photo content, we calibrate an LDA model to analyze the review content. We vary the number of topics between 2 and 40 and find that the fitted LDA model with 20 topics yields the highest topic coherence score (Röder et al. 2015). Table A9 in Appendix C presents the

20 topics and the most representative words for each topic. Please refer to Appendix C for technical details of our topic modeling procedure for reviews.

Moreover, because the variety of objects in photos might also be discussed in reviews, we capture the variety of objects in reviews as an additional control. Because nouns are natural units of objects in reviews, we count the number of unique nouns in a review. We use WordNet (Fellbaum 1998; Miller

1995) to structure the nouns (e.g., “strawberry” is a leaf node of “berry”). Then, we use leaf nodes among the nouns of a review to measure the variety of objects in the review. The total number of unique leaf nodes is 30,296.

2.3 Restaurant Characteristics

Based on the extant business survival literature (e.g., Audretsch and Mahmood 1995; Bates 1995; Parsa et al. 2005), we include several restaurant characteristics, including age, chain status, cuisine type, and price level, as additional controls in our survival model. Prior studies (e.g., Carroll 1983) often suggest that less-established businesses are more prone to failure than their longer-established counterparts. Since not all restaurants include their opening-year information on Yelp, we take the following steps to collect age data for restaurants in our sample. First, we write a Python program to check and collect birth-year information from each restaurant's Yelp page. Second, for restaurants with no birth year listed on Yelp, we collect the URL of their Facebook page from the restaurant's website (if listed on Yelp). Third, we then collect birth-year information from the respective Facebook page. Lastly, for the restaurants that do not report birth years on Yelp or Facebook, two research assistants manually check birth years on restaurants' websites (if available) or through the Google search engine. In total, we are able to collect birth-year information for 10,368 restaurants, accounting for 59% of all restaurants in our sample. For the remaining 41% of restaurants, we use the date of the first photo or first review as a proxy for their birthdates.

Following Parsa et al. (2011), chain status is determined by whether the number of restaurants with the same name in our dataset is greater than five. Cuisine types are based on the cuisine-type labels provided by Yelp (e.g., fast food, bars, pizza). Yelp categorizes restaurant cuisines into 260 types. A restaurant may have more than one cuisine type (e.g., McDonald’s belongs to both fast food and burger categories). Because many cuisine types include only a few restaurants in our sample, we examine the survival probabilities of the top 10 cuisine types and group the remaining types as “others.” The price level is measured by one to four dollar signs, as indicated on Yelp. As shown in Table 3, 26% of restaurants in our sample are chain restaurants, and our restaurant sample includes a wide variety of restaurants with different cuisine types and price levels.
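A minimal sketch of the chain-status rule described above, with hypothetical restaurant names:

```python
# Sketch: flagging chain restaurants following Parsa et al. (2011): a
# restaurant is a chain if more than five restaurants in the dataset
# share its name. Data are hypothetical.
from collections import Counter

restaurant_names = ["McDonald's"] * 8 + ["Joe's Diner"] * 2 + ["Luigi's"]
name_counts = Counter(restaurant_names)

def is_chain(name, counts=name_counts, threshold=5):
    # chain status: number of same-name restaurants exceeds the threshold
    return counts[name] > threshold

print(is_chain("McDonald's"))   # -> True
print(is_chain("Joe's Diner"))  # -> False
```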

2.4 Competitive Landscape

According to the business survival literature (Fritsch et al. 2006; Kalleberg and Leicht 1991; Parsa et al. 2005; Wagner 1994), the intensity of competition plays a vital role in the survival of a business enterprise.

Consequently, we take into account competitor concentration, new entries/exits in each period, and the photos and reviews of competitors in our survival model. Following Parsa et al. (2005), we consider all restaurants operating in the same zip code and year as competitors of the focal restaurant. Given that restaurants of the same cuisine type often compete for a similar customer base, we further group competitors into those with overlapping and non-overlapping cuisine types. Overlapping means that the competing restaurant shares at least one cuisine type with the focal restaurant; non-overlapping means that they have no cuisine type in common. A large number of competitors within the same cuisine type in an area can indicate the popularity of that cuisine type and/or fierce competition in that area (Fritsch et al. 2006). We also consider the number of new entries and exits (we discuss how we identify exit status and exit time in Section 2.6) in each year and zip code. Many exits may indicate decreased demand and/or less competition in that area, while many new entries may indicate increased demand and/or more competition (Fritsch et al. 2006). Finally, we account for the volume of photos and reviews and the average star rating of competitors in each year and zip code, because the UGC of competitors may also reflect the fierceness of competition.
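The competitor definitions above can be sketched as follows, with hypothetical records: a restaurant's competitors are those sharing its zip code and year, split by cuisine-type overlap.

```python
# Sketch: counting overlapping vs. non-overlapping competitors of a focal
# restaurant, defined (following Parsa et al. 2005) as restaurants in the
# same zip code and year. Records are hypothetical tuples of
# (name, zip code, year, set of cuisine types).
records = [
    ("Focal Pizza", "90001", 2014, {"pizza", "italian"}),
    ("Rival Pizza", "90001", 2014, {"pizza"}),
    ("Sushi Place", "90001", 2014, {"sushi"}),
    ("Other Pizza", "90002", 2014, {"pizza"}),  # different zip: not a competitor
]

def competitor_counts(focal_name, records):
    focal = next(r for r in records if r[0] == focal_name)
    _, zipc, year, cuisines = focal
    overlap = non_overlap = 0
    for name, z, y, c in records:
        if name == focal_name or z != zipc or y != year:
            continue
        if c & cuisines:        # shares at least one cuisine type
            overlap += 1
        else:
            non_overlap += 1
    return overlap, non_overlap

print(competitor_counts("Focal Pizza", records))  # -> (1, 1)
```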

2.5 Macro Conditions

We account for macro conditions such as time trends with year dummies and local environments with zip-code dummies. The interactions between year dummies and zip-code dummies capture changes in local conditions over the years. We do not need to include year-by-zip-code dummies explicitly, because the models we use (the XGBoost algorithm in Section 3 and causal forests in Section 4) account for interactions between year dummies and zip-code dummies automatically.

2.6 Exit Time

To study restaurant survival, we need to know whether and/or when a restaurant went out of business. For each closed restaurant, Yelp has a salient banner on the restaurant page indicating that “Yelpers report this location has closed” (see Figure A1). During the eleven years in our observation window, 4,495 restaurants (25.37% of all restaurants in our sample) went out of business.

However, Yelp does not provide information regarding the exit time. For each closed restaurant, we scan for the earliest review that mentions the restaurant's closure and use the date of that review to approximate its exit time. We use keyword matching to identify the earliest review mentioning the closure of a restaurant. To obtain a dictionary of keywords, we recruit a research assistant to read all reviews of 200 randomly chosen closed restaurants in order to identify words and phrases representing the permanent closure of a restaurant. This dictionary of keywords is then used to identify the exit time for all closed restaurants in our dataset. Figure A1 in Appendix A shows an example of a review that mentions a restaurant closure. If a closed restaurant has no reviews mentioning its closure, we use the date of the last photo or last review to approximate its exit time.
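A minimal sketch of this keyword-matching step, with an illustrative keyword dictionary (the paper's dictionary was hand-built from reviews of 200 closed restaurants) and hypothetical reviews:

```python
# Sketch: approximating exit time as the date of the earliest review that
# mentions closure, via keyword matching. Keywords and reviews are illustrative.
import re
from datetime import date

CLOSURE_KEYWORDS = [r"permanently closed", r"went out of business",
                    r"shut (its|their) doors"]
pattern = re.compile("|".join(CLOSURE_KEYWORDS), flags=re.IGNORECASE)

reviews = [  # (review date, text) for one closed restaurant
    (date(2013, 5, 2), "Great tacos, friendly staff."),
    (date(2014, 1, 10), "Sadly this place went out of business last month."),
    (date(2014, 3, 8), "Permanently closed, so disappointing."),
]

def approximate_exit_time(reviews):
    closure_dates = sorted(d for d, text in reviews if pattern.search(text))
    # fall back to the last photo/review date when nothing matches (not shown)
    return closure_dates[0] if closure_dates else None

print(approximate_exit_time(reviews))  # -> 2014-01-10
```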

As a robustness check, we collect reports about restaurant closure on Eater.com (a website with local culinary news that reports restaurant openings and closings). Eater.com only covered restaurant closure information for selected cities in our dataset from 2012 to 2015. We use the information provided by the subset of closed restaurants that we could identify on Eater.com to crosscheck restaurant exit time.

We use the exact closure date if it is provided in the Eater.com report; if no exact date is reported, we use the date of the report itself as the exit time. The average discrepancy between the closure date identified via Yelp reviews and that identified by Eater.com is 60.95 days. Since we use a year as the unit of analysis, the difference is negligible.

3. Predicting Restaurant Survival

In Section 3.1, we describe a predictive model of restaurant survival based on the XGBoost algorithm (Chen and Guestrin 2016), a scalable implementation of gradient boosting trees (Friedman 2001), using all input variables shown in Table 4. We first apply this model for a one-year-ahead prediction and explore which factors are the most informative in forecasting restaurant survival in Section 3.2. We then examine whether consumer-posted photos are more informative for certain types of restaurants in Section 3.3. In Section 3.4, we investigate how long photos can remain predictive in forecasting restaurant survival.

3.1 Restaurant Survival Model

Our restaurant survival model is based on gradient boosting trees. In particular, we apply the XGBoost algorithm (Chen and Guestrin 2016) due to its excellent performance in many predictive tasks. The superior predictive performance of this algorithm has been widely recognized across many machine learning challenges held by Kaggle and KDD cups (Chen and Guestrin 2016). Several prior marketing studies have also used XGBoost (e.g., Rafieian and Yoganarasimhan 2018; Rajaram et al. 2018) or similar gradient boosting trees methods (Yoganarasimhan 2019) for complex customer behavior prediction problems.

In our context, the XGBoost algorithm has the following four properties that are particularly desirable. First, XGBoost works well in handling a large number of predictor variables that may correlate with each other (Chen et al. 2018). This property is particularly helpful in our context because many of our predictors are correlated. Second, given that our predictive model includes a very large number of independent variables, XGBoost is also particularly suitable as it automatically selects the most informative variables for prediction (Chen and Guestrin 2016). Third, XGBoost's flexibility in handling potentially higher-order interactions among predictors (Friedman 2001) is advantageous in our context (e.g., photos might be more indicative of the survival of an independent restaurant than of a chain restaurant). Lastly, the XGBoost algorithm's ability to efficiently process sparse data (Chen and Guestrin 2016) is also helpful in our setting, as our data contain many dummy variables (e.g., cuisine types, zip codes, years) as well as variables with missing values. To discern the predictive ability of the XGBoost algorithm in our specific context, we compare the predictive performance of XGBoost with random forests and support vector machines in Appendix D. The comparison results show that XGBoost outperforms the benchmark models.

Following Figure 1, we include consumer-posted photos and a set of control variables (i.e., consumer reviews, company, competition, and macro factors) as inputs and restaurant survival as the outcome variable for XGBoost. Let i denote a restaurant and t denote a period. We define each period as a calendar year (for example, in year 2014, t = 2014) because yearly (compared to monthly) predictions provide a longer lead time for managers and investors to plan out their resource-allocation decisions. Furthermore, because some restaurants may have no newly posted photos or reviews within a given month, using the year rather than the month as the unit of analysis also reduces data sparsity and improves the stability of our empirical model.

Let us denote the observation window of restaurant i as running from calendar year T_i^0 to T_i. Let the dependent variable be y_{it} (y_{it} = 1 if restaurant i survives in period t; y_{it} = 0 if restaurant i is closed at period t). We use data up to period t−1 to predict whether a restaurant will survive in period t for a one-year-ahead prediction. For example, to predict survival in year 2010, we only use data until year 2009 for model calibration. As such, we carry out multiple forecasting tasks (e.g., predicting survival in year 2010 using data until 2009; predicting survival in year 2011 using data until 2010; etc.). Specifically, we carry out six forecasting scenarios to predict survival in years 2010-2015, respectively. We did not perform such tasks to forecast survival before year 2010 because there are too few data for model calibration (less than 20% of total restaurant-year observations were from years before 2010). All tables in this section are based on average predictive results across years 2010-2015.

To ensure that our predictive model of restaurant survival can be generalized to restaurants not in our dataset, we split the training and testing data at the restaurant level (all observations of a restaurant will be either in the training or the testing dataset, but not both). Namely, we utilize calibrated parameter estimates from our training data to predict restaurant survival in the testing data. We take within- restaurant variation into account in Section 4 when our primary goal is to interpret parameter estimates rather than maximizing out-of-sample prediction accuracy.
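The restaurant-level split can be implemented, for example, with scikit-learn's GroupShuffleSplit, treating each restaurant as a group so that all of its observations fall on exactly one side. Data below are synthetic.

```python
# Sketch: train/test split at the restaurant level, so no restaurant
# appears in both the training and testing data.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

n_obs = 200
restaurant_ids = np.repeat(np.arange(40), 5)  # 40 restaurants x 5 years
X = np.random.default_rng(0).normal(size=(n_obs, 3))

splitter = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=restaurant_ids))

# verify: no restaurant appears on both sides of the split
assert not set(restaurant_ids[train_idx]) & set(restaurant_ids[test_idx])
print(len(train_idx), len(test_idx))
```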

In what follows, we outline the basic intuition behind our prediction model. Following Chen and Guestrin (2016), the XGBoost algorithm in our predictive model aims to minimize the objective function outlined in Equation (1), consisting of two parts: a loss function loss(θ) (a negative log-likelihood) and a regularization term Ω(θ). θ is the set of parameters to be calibrated through the model (i.e., the functions f_g, each containing the structure of the tree and the leaf scores ω_{gl}). In Equation (1), y_{it} is the observed binary dependent variable. X_{it−1} = (x_{it−1}, x_i, x_{t−1}) is the vector of independent variables shown in Table 4, with x_{it−1} being the time-variant variables such as the volume of photos, x_i representing time-invariant variables such as cuisine types, and x_{t−1} being restaurant-invariant variables such as year dummies. ŷ_{it} is the predicted survival probability between 0 and 1. Please refer to Appendix D for more technical details of this predictive model and how we fine-tune the hyper-parameters (e.g., γ and λ).

(1)   min_θ  loss(θ) + Ω(θ)

loss(θ) = −∑_{i=1}^{I} ∑_{t=T_i^0}^{T_i} [ y_{it} ln ŷ_{it} + (1 − y_{it}) ln(1 − ŷ_{it}) ]

Ω(θ) = ∑_{g=1}^{G} [ γL_g + (λ/2) ∑_{l=1}^{L_g} ω_{gl}² ]

where ŷ_{it} = exp(∑_{g=1}^{G} f_g(X_{it−1})) / (1 + exp(∑_{g=1}^{G} f_g(X_{it−1}))); f_g: X_{it−1} → ω_{gl}, l = 1, 2, …, L_g; g indexes trees; l indexes leaves.

Given our primary interest in predicting restaurant survival in the next period, we lag the time-variant independent variables by one period. In our data, 1,723 restaurants failed within one year. The inclusion of such restaurants in our analysis can be highly beneficial for our understanding of restaurant survival. Therefore, we do not drop these restaurants. Instead, we include a period zero in which the UGC volume of all restaurants is set to zero, and non-UGC variables (i.e., restaurant characteristics, competitive landscape, and macro factors) are used to predict survival during the opening year of each restaurant. In total, we have 89,384 restaurant-year observations.
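A minimal sketch of this panel construction with pandas, using hypothetical column names; filling the lag with zero mirrors the period-zero convention of setting UGC volume to zero in the opening year.

```python
# Sketch: one-period-lag and cumulative UGC features per restaurant-year.
import pandas as pd

panel = pd.DataFrame({
    "restaurant": ["A", "A", "A", "B", "B"],
    "year":       [2011, 2012, 2013, 2012, 2013],
    "n_photos":   [3, 5, 2, 0, 4],
}).sort_values(["restaurant", "year"]).reset_index(drop=True)

grp = panel.groupby("restaurant")["n_photos"]
# OnePeriod_{t-1}: photos posted in the previous year (0 in the opening year)
panel["one_period_photos"] = grp.shift(1).fillna(0)
# Cum_{t-1}: all photos accumulated through the end of the previous year
panel["cum_photos"] = grp.cumsum() - panel["n_photos"]

print(panel[["restaurant", "year", "one_period_photos", "cum_photos"]])
```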

Given that the literature on restaurant survival (e.g., Carroll 1983; Fritsch et al. 2006; Kalleberg and Leicht 1991) has shown that organizational age is a critical factor for business survival, we include age as a set of dummies (age=0, 1, 2, ..., 21, 21+) to explicitly examine the potential nonlinear relationship between age and restaurant survival. Because consumers can post photos without posting reviews and vice versa, Yelp separates photos (on top) and reviews (on bottom) on each restaurant page. Consequently, we include photo variables and review variables as two separate batches of independent variables.

Since recent photos and reviews may carry more predictive power than earlier ones, we include both the one-period-lag variables that capture photos and reviews posted in the last period (referred to as OnePeriod_{t−1}) and the cumulative variables that encompass all photos and reviews accumulated until the end of the last period (referred to as Cum_{t−1}) whenever applicable. We include Cum_{t−1} because: 1) cumulative values capture the entire history of a restaurant; and 2) in each period, consumers see summary statistics of accumulated photos and reviews on Yelp (e.g., the volume of photos and the average star rating). Therefore, the results reported in the paper are based on a predictive model that includes both OnePeriod_{t−1} and Cum_{t−1} variables.

As robustness checks, we also explore the following alternative model specifications: 1) OnePeriod_{t−1} + Cum_{t−2}, to avoid overlap of period t−1 in the model; 2) OnePeriod_{t−1} + OnePeriod_{t−2} + Cum_{t−3}, to capture both one-year and two-year lags; and 3) OnePeriod_{t−1} + Cum_{t−1} + Change_{t−1}, to explicitly capture changes in UGC. Appendix D provides details on these alternative model specifications. Multiple-year lags are required in all alternative models described above. As such, we are constrained to drop restaurants that survived for only a short time period (one year for specifications 1 and 3; one or two years for specification 2), the inclusion of which can be crucial to our understanding of restaurant survival. Tables A13 to A15 in Appendix D show that our main model specification performs similarly to or slightly better than these alternative specifications, possibly because the main specification includes the highest number of observations along with relatively fewer predictors. Additionally, comparisons of the incremental predictive power of photos are consistent across these different model specifications, which also provides robustness checks for our Table 5 findings discussed below.

3.2 One-Year-Ahead Survival Prediction: What Is the Most, Somewhat, or Not at All Important?

Given that both photos and reviews reflect consumer experience, they may overlap in their incremental predictive power over all non-UGC variables. Hence, in Table 5, we compare the predictive power of the following model specifications: 1) baseline (including all non-UGC variables related to company, competition, and macro factors); 2) baseline + review (including all non-UGC variables and variables related to reviews only); 3) baseline + photo (including all non-UGC variables and variables related to photos only); and 4) baseline + review + photo.

We employ the following metrics to gauge the performance of the above models: 1) AUC; 2) Kullback-Leibler (KL) divergence; 3) sensitivity; 4) specificity; 5) balanced accuracy; and 6) Pseudo R2.

Our predicted ŷ_{it} is a probability bounded between 0 and 1, which is used to calculate AUC, KL divergence, and Pseudo R2. Because metrics (3) to (5) can only be calculated using a predicted binary outcome, we convert the survival probability to a dummy variable using 0.5 as the cutoff. We use metrics (1) to (5) for out-of-sample validation and metric (6) for in-sample fit measurement. KL divergence (Kullback and Leibler 1951) is an information-theoretic measure of the divergence between the predicted distribution and the observed distribution, which has been widely applied in the marketing literature (e.g., Dzyabura and Hauser 2011; Hauser et al. 2014; Huang and Luo 2016). A smaller KL divergence value represents better accuracy. In our context, sensitivity is the hit rate among open restaurant-year observations, and specificity is the hit rate among closed restaurant-year observations. Because sensitivity focuses on open observations and specificity emphasizes closed observations, the training data are reweighted so that the total weights of open and closed observations are equal (Seiffert et al. 2008). The balanced accuracy is the arithmetic average of sensitivity and specificity (Bekkar et al. 2013; Brodersen et al. 2010). For AUC and KL divergence, the training data do not need to be reweighted because these metrics work well for imbalanced samples (Dzyabura and Hauser 2011; Kotsiantis et al. 2006). Pseudo R2 (bounded between 0 and 1) measures the percentage improvement in log-likelihood over the null model.
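The metrics above can be computed as in the following sketch. Because the observed outcome is binary (a degenerate distribution with zero entropy), the KL divergence reduces to the average cross-entropy. Values are hypothetical.

```python
# Sketch: evaluation metrics from predicted survival probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.8, 0.7, 0.6, 0.2, 0.4, 0.55, 0.3])
y_pred = (y_prob >= 0.5).astype(int)          # 0.5 cutoff, as in the paper

auc = roc_auc_score(y_true, y_prob)
# KL divergence vs. the degenerate observed distribution = cross-entropy
kl = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)       # hit rate among open restaurant-years
specificity = tn / (tn + fp)       # hit rate among closed restaurant-years
balanced_accuracy = (sensitivity + specificity) / 2

print(auc, sensitivity, specificity, balanced_accuracy)
```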

To statistically test the incremental predictive power across the four different model specifications, we use 10-fold cross-validation to predict survival in each year and report the average performance of these models in Table 5. Specifically, we shuffle the restaurants and divide them into ten even buckets. For each iteration, we use 9 buckets (i.e., 90% of restaurants) to train the XGBoost algorithm and calculate in-sample Pseudo R2 for measuring in-sample fit. The hold-out bucket (i.e., 10% of restaurants) is used to calculate metrics (1) to (5). Then, we calculate the mean and the standard deviation of model performance over the years and cross-validation iterations.

Table 5 suggests that photos are strong predictors of restaurant survival. In particular, the "baseline + photo" model exhibits significantly greater performance than the baseline according to all metrics. Additionally, the "baseline + photo" model also performs better than the "baseline + review" model on most performance metrics, with the exception of one metric (sensitivity, i.e., the hit rate among open restaurant-year observations). In our particular context, there are far fewer closed observations than open observations. Therefore, predicting restaurant closure (i.e., specificity) is a considerably harder task. The specificity measure shows that the "baseline + photo" model does a much better job than the "baseline + review" model. Moreover, the "baseline + review + photo" model does not improve much over the "baseline + photo" model, indicating that the prediction improvement from UGC mainly stems from photos. These comparisons are further confirmed by the ROC curves in Figure 3.

By and large, photos appear to be a leading indicator of restaurant survival above and beyond reviews and other known factors. Recall that we summarize the content of photos and reviews by topic modeling in our main model specification as described above. In Appendix D, we use the top 100 Clarifai labels to capture photo content and the top 100 nouns to capture review content. The results are qualitatively the same.

We further explore which aspects of photos are the most informative in predicting survival. Specifically, we compare the incremental predictive power of the various aspects of photos defined in Table 4A (photographic attributes, caption, volume, helpful votes, and content). Table 6 shows that the predictive power of photos mainly stems from their content, followed by helpful votes. Although photo volume increases prediction accuracy above the baseline on most performance metrics, such improvements are not statistically significant. Interestingly, photo captions and photographic attributes do not improve prediction accuracy over the baseline model according to most metrics, with the exception that they significantly increase sensitivity (i.e., the hit rate among open restaurant-year observations) over the baseline model. Again, there are many more open observations than closed observations in our context, and thus predicting opening is a relatively easier task. Overall, our findings suggest that the informativeness of photos (i.e., photo content and helpfulness) plays a more prominent role than photo volume or photo aesthetics.

Linking these findings to our earlier conjecture as to why consumer-posted photos may predict restaurant survival: if the posters' perspective were at play, photo volume would play an important role in predicting survival. Meanwhile, if the viewers' perspective prevails by facilitating a better horizontal match between restaurants and photo viewers, photo content and the helpful votes provided by photo viewers would be the most informative predictors of survival. Our findings above suggest that, while both perspectives might be at play, the viewers' perspective may play a more salient role in explaining why consumer-posted photos help predict restaurant survival.

We further examine the top 35 predictors with the highest SHAP feature importance weights (bounded between 0 and 1) (Lundberg et al. 2020; Lundberg and Lee 2017) for the "baseline + review + photo" model. We choose this model because it contains all factors related to restaurant survival, which allows us to compare the importance weights of all variables. Essentially, the SHAP feature importance of a predictor is its marginal contribution considering all possible combinations of the other predictors (Molnar 2021), so it takes into account the correlations across predictors. Figure 4 shows the top 35 variables and their SHAP feature importance.8

First and foremost, we find that restaurant age is among the most significant predictors of restaurant survival. Among the top 35 most informative predictors, we observe five age dummy variables (i.e., age>21, age=1, age=0, age=2, and age=3, in descending order of SHAP feature importance). This finding provides some external validity for our results because prior research (e.g., Carroll 1983; Fritsch et al. 2006; Kalleberg and Leicht 1991) has long shown that organizational age highly correlates with survival. We conjecture that restaurants aged three years or younger have a lower chance of surviving and that restaurants older than 21 years are more likely to survive. We formally analyze the relationship between age and survival in Section 4.

Furthermore, we observe that eight out of the top 35 variables relate to photos. Specifically, the three most informative predictors related to photos are 1) the proportion of photos depicting food; 2) the proportion of photos depicting the outside; and 3) the proportion of photos with helpful votes. These results are consistent with prior literature suggesting that food is the most critical aspect of a restaurant (Duarte Alonso et al. 2013; Sulek and Hensley 2004) and that outdoor attractions are a vital factor for tourism regions (Getz and Brown 2006). Certain other photo content (i.e., the proportion of photos depicting the interior; the LDA topic on food and drink based on Clarifai labels) and the total volume of photos are also among the top 35 variables. Interestingly, photo captions and photographic attributes are not in the top 35 predictors, indicating again that content may be more informative for restaurant survival than captions or photographic attributes.

8 The top 35 predictors are generally consistent when forecasting survival from 2010 to 2015. The results in Figure 4 are based on the scenario of using data until 2014 to predict survival in year 2015. We choose to report results from this scenario because it covers the most observations and contains the most recent and comprehensive data in our sample.

With regard to control variables, we find that mixed/negative reviews on food (a review content topic), helpful votes received by reviews, review length, star rating, and consumer sentiment toward service also appear among the top 35 most informative predictors of restaurant survival. Our findings regarding review valence and length are by and large consistent with prior research linking consumer reviews to demand or stock performance (e.g., Chevalier and Mayzlin 2006; Tirunillai and Tellis 2012). Regarding restaurant characteristics, consistent with prior literature on business survival (e.g., Audretsch and Mahmood 1995; Bates 1990; Kalnins and Mayer 2004; Parsa et al. 2005), we observe that chain status, price level, and cuisine type are all predictive of restaurant survival, in addition to age. Lastly, eight of the top 35 predictors relate to competition, including the number of overlapping/non-overlapping competitors, new entries/exits, and the review volume, star rating, and photo volume of competitors, suggesting the importance of the competitive landscape in restaurant survival.

3.3 For Which Types of Restaurants Are Photos More Informative in Predicting Survival?

We further explore whether the incremental predictive power of photos differs across heterogeneous restaurants. Specifically, we focus on the following three dimensions of restaurant heterogeneity: 1) chain vs. independent restaurants; 2) restaurants of different ages; and 3) restaurants of varying price levels. We focus on these dimensions and conjecture that their survival may be differently associated with photos because prior literature on business survival suggests that these restaurants differ in their survival rates (Audretsch and Mahmood 1995; Carroll 1983; Lafontaine et al. 2018).

Table 7 provides the results of these comparisons. To fully calibrate the model for each type of restaurant, we retrain our predictive model separately using the subset of data associated with each restaurant type. The results shown here are based on AUC; we also examine other metrics, and the results are qualitatively consistent. We find that while photos significantly increase prediction accuracy for all sub-types of restaurants, photos carry more incremental predictive power for independent (vs. chain), young or mid-aged (vs. established), and medium-priced (vs. low-priced) restaurants.

Photos are more predictive of the survival of independent restaurants than of chain restaurants, possibly because consumers are in general less uncertain about chain restaurants due to their brand reputation and franchise regulations (Kalnins and Mayer 2004; Lafontaine et al. 2018). For ease of exposition, we group restaurants by the 33.3% and 66.7% percentiles of age in the second comparison, i.e., young (1<=age<=3), mid-aged (3<age<=21), and established (age>21).9 Table 7 demonstrates that photos carry more predictive power for young and mid-aged restaurants than for established restaurants, possibly because the latter are well known and rely less on consumer-posted photos. Given that restaurants with three or four dollar signs on Yelp represent only 5% of our data, we include only medium-priced (price level=2) and low-priced (price level=1) restaurants in the third comparison to be conservative. Table 7 suggests that photos carry more predictive power for medium-priced than for low-priced restaurants. It is possible that consumers care more about their dining experiences when paying more, and photos might be more useful in facilitating the horizontal match in this case.

3.4 Multiple-Year Survival Prediction: How Long Can Photos Stay Predictive in Forecasting Restaurant Survival?

We now explore the ability of photos to predict multiple-year restaurant survival. This question is particularly important because owners may not foresee their restaurants’ survival beyond one year, and a longer-term forecast for survival can be useful for their resource allocation and strategic planning.

We aim to predict restaurant survival during the future Δt years, assuming that restaurant owners do not observe future photos and reviews (e.g., using photos/reviews in 2010 to predict whether a restaurant would survive until the end of 2013). Therefore, we redefine the dependent variable as ỹ_{it+Δt} = 0 if any of y_{it+1}, y_{it+2}, …, y_{it+Δt} is 0 (recall that 0 means exit); otherwise, ỹ_{it+Δt} = 1. We use X_{it} to predict ỹ_{it+Δt} with Δt = 1, 2, 3, 4, 5 years. We remove all restaurants whose next Δt-year periods are censored at the end of year 2015, as we do not know their survival status in the next Δt-year period. For each Δt, we train separate models (baseline; baseline + review; baseline + photo; baseline + review + photo) with data until t and 10-fold cross-validation. We report the average performances of these models across t and cross-validation iterations in Figure 5.

9 We exclude age=0 in this comparison because restaurants have no photos at age=0. We also tried other age cutoff values (e.g., 1<=age<=5; 5<age<=30; age>30), and the results were consistent. In Appendix D, we further report photos' incremental predictive power at each age for restaurants younger than or equal to five years old in order to better understand photos' predictive power, since younger restaurants tend to be more susceptible to failure.
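The redefinition of the dependent variable can be sketched as a small helper, with hypothetical survival paths:

```python
# Sketch: the Delta-t-year-ahead label equals 1 only if the restaurant
# survives every one of the next Delta_t years.
def multi_year_label(future_survival, delta_t):
    """future_survival: [y_{t+1}, y_{t+2}, ...]; returns y_tilde_{t+Delta_t}.
    Returns None when the horizon is censored (not enough future years)."""
    if len(future_survival) < delta_t:
        return None                      # censored: dropped from the sample
    return int(all(future_survival[:delta_t]))

print(multi_year_label([1, 1, 0, 1], delta_t=2))  # -> 1 (survives t+1 and t+2)
print(multi_year_label([1, 1, 0, 1], delta_t=3))  # -> 0 (exits in t+3)
print(multi_year_label([1, 1], delta_t=3))        # -> None (censored)
```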

As expected, the predictive power of all models declines with forecast duration. Interestingly, the AUC curve of the "baseline + photo" model is significantly above that of the baseline model for up to three years. In contrast, the "baseline + review" model is significantly above the baseline model for only one year. These findings indicate that consumer-posted photos can predict restaurant survival for up to three years, while reviews are only predictive for one year. Moreover, the "baseline + photo" model is almost as good as the "baseline + review + photo" model, indicating that the improvement in multiple-year survival prediction mainly stems from photos.

As robustness checks, we also explore alternative specifications for multiple-year survival predictions in Appendix D. Compared with the main specification, in which we predict survival during the future Δt years (e.g., forecasting whether a restaurant would survive until the end of 2013), these robustness checks break down the prediction by year (e.g., predicting whether a restaurant would survive in 2011, 2012, or 2013). To make these robustness checks comparable with the main specification, we multiply the predicted survival probabilities for the future years to derive the predicted survival probability during the future Δt years. For instance, ỹ̂_{i,2013} = ŷ_{i,2011} × ŷ_{i,2012} × ŷ_{i,2013}. Specifically, we try three different model specifications within this setup: 1) predict survival in year t+Δt assuming that a restaurant owner does not observe future UGC (the same assumption as in the main specification); 2) predict survival in year t+Δt assuming that UGC in future years will be the same as that from the last observed year (incorporating continuity in UGC for both focal and competing restaurants); and 3) predict survival in year t+Δt based on a distributed lag model (Mela et al. 1997). Findings from all robustness checks are consistent with what we learn from the main specification.
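The chaining of per-year predictions in these robustness checks amounts to a simple product, sketched below with hypothetical probabilities:

```python
# Sketch: combining per-year survival predictions into a Delta-t-year
# survival probability by multiplication.
import math

yearly_probs = {2011: 0.9, 2012: 0.8, 2013: 0.7}  # hypothetical y_hat per year

def survival_through(year_from, year_to, probs):
    # probability of surviving every year in [year_from, year_to]
    return math.prod(probs[y] for y in range(year_from, year_to + 1))

p = survival_through(2011, 2013, yearly_probs)
print(round(p, 3))  # -> 0.504 (= 0.9 * 0.8 * 0.7)
```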


Our multiple-year survival prediction relates to the literature on the long-term effects of marketing (e.g., Ataman et al. 2010; Brüggen et al. 2011; Montoya et al. 2010; Pauwels et al. 2002). There are two common approaches to capturing long-term effects: 1) relate the treatment to the long-term outcome directly (Brüggen et al. 2011; Liu et al. 2017); and 2) derive the long-term effect from the intermediate effect quantified with a model (e.g., the distributed lag model in Mela et al. 1997; the hidden Markov model in Montoya et al. 2010; and the vector autoregressive model in Pauwels et al. 2002; see reviews in Ataman et al. 2010 and Pauwels et al. 2002). Our main model specification and first alternative specification follow the first approach, which makes no functional form assumptions. The other two alternative specifications belong to the second approach.

4. Cluster-Robust Causal Forests for Parameter Interpretation

While the XGBoost algorithm coupled with SHAP values generates a kernel of interesting and managerially relevant insights, it is not ideal for obtaining unbiased/consistent parameter estimates quantifying how each independent variable relates to restaurant survival. Within our context, the key benefit of implementing a cluster-robust causal forests model (Athey et al. 2019; Athey and Wager 2019) is that causal forests emphasize obtaining consistent estimates of treatment effects, rather than maximizing out-of-sample prediction accuracy as many prediction-based machine learning models do. Although we can plot each independent variable against its SHAP value across observations to check whether it is positively or negatively associated with survival (Lundberg et al. 2020), such interpretations may be biased by issues common to predictive machine learning models, such as regularization and overfitting (Chernozhukov et al. 2018); the gradient boosted trees model described in Section 3 is no exception. Findings from our causal forests model can therefore help us better gauge the extent to which each of the top 35 predictors associates with restaurant survival. Similar causal forests models have been used in recent marketing papers (e.g., Guo et al. 2017; Narang et al. 2019). In Section 4.1, we describe how we apply the cluster-robust causal forests model in our setting. In Section 4.2, we report results from the model.
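For readers unfamiliar with the SHAP values mentioned above, the following toy computes exact Shapley attributions for a tiny hand-written model by averaging marginal contributions over all feature orderings. This is illustrative only: the paper relies on TreeSHAP for gradient boosted trees, which computes these attributions efficiently, and the two-feature "model" below is invented.

```python
from itertools import permutations

def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f at point x: average each feature's
    marginal contribution over all orders of switching features from the
    baseline value to the observed value."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = f(current)
        for j in order:
            current[j] = x[j]
            cur = f(current)
            phi[j] += (cur - prev) / len(orders)
            prev = cur
    return phi

# Toy "model": a survival score rising with two hypothetical features
f = lambda v: 2.0 * v[0] + 1.0 * v[1]
print(shapley_values(f, [1.0, 1.0], [0.0, 0.0]))  # -> [2.0, 1.0]
```

By construction the attributions sum to f(x) − f(baseline) (the efficiency property), which is what makes plotting a variable against its SHAP values a sensible sign check.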


4.1 Model: Cluster-Robust Causal Forests

As discussed in Athey and Imbens (2016) and Wager and Athey (2018), the causal forests model can serve as a good alternative to conventional propensity score methods in inferring treatment effects from rich observational data like ours. Similar in spirit to propensity score matching (Rosenbaum and Rubin 1983; Rubin 1973), cluster-robust causal forests are built upon an outcome variable ($y_{it}$), a treatment variable ($W_{it-1}$), and a set of control variables ($X_{it-1}$). For instance, when estimating the treatment effect of the proportion of photos with helpful votes ($W_{it-1}$) on restaurant survival ($y_{it}$), we include a vector of control variables ($X_{it-1}$) consisting of the number of photos and reviews, photo content, review-related variables, restaurant characteristics, competition, year dummies, and zip codes. The relationship between the outcome, treatment, and controls is modeled in Equation (2) (Athey et al. 2019).

(2) $y_{it} = W_{it-1}\,\tau(X_{it-1}) + \mu(X_{it-1}) + \varepsilon_{it}$

This model aims to estimate the treatment effect $\tau(X_{it-1})$ conditional on $X_{it-1}$, with $\mu(X_{it-1})$ capturing effects only from the controls. The treatment variable is assumed to be independent of the potential outcomes conditional on controls (Athey et al. 2019): $\{Y_{it}(W)\} \perp W_{it-1} \mid X_{it-1}$. As discussed in Wooldridge (2009), one way to mitigate endogeneity bias due to unobservables is to include as many variables as possible in the model to reduce the correlation between the treatment/control variables and the error term $\varepsilon_{it}$. As shown in Figure A18 in Appendix E, we include as many as 95 controls in our causal forests estimations. Similar to Hollenbeck (2018), the inclusion of these controls is crucial to our empirical strategy, which may be viewed as descriptive while attempting to come close to causality.

Compared to conventional propensity score matching, cluster-robust causal forests utilize a flexible, non-parametric, data-driven approach to determine similarity across observations. As Athey et al. (2019) discuss, this approach is particularly advantageous with a large number of controls, as in our study. Additionally, as Fong et al. (2018) note, the estimation of traditional propensity score methods is often sensitive to the model specification, especially when the treatment variable is continuous.


The causal forests are immune to such problems because the building of an honest tree (the building block of causal forests) does not rely on any particular functional form.

We describe the basic intuition of the model below and provide more technical details in Appendix E. For the causal forests model to properly identify the treatment effects from observational data, the correlations between the treatment variable and the control variables should not be very high (i.e., the “overlap” assumption stated in Wager and Athey 2018). Consequently, building upon the top 35 predictors from XGBoost, we define the treatment and control variables in our causal forests model as follows. First, we consolidate highly correlated variables; for instance, we combine the volume of photos and the volume of reviews into one variable because these two variables are highly correlated (see Table A17 in Appendix E for a complete list of consolidated variables). Next, we use cumulative rather than one-period lag variables for photos and reviews in our causal forests because 1) these two sets of variables are often highly correlated and 2) our XGBoost algorithm suggests that cumulative variables are oftentimes more predictive of restaurant survival than one-period lag variables. Lastly, we include all general content types of photos (i.e., proportions of food, outside, inside, drink, and menu photos), six age brackets (<= 1, 2-3, 4-7, 8-21, 22-42, > 42, each covering about 16.67% of observations)10, and the top 10 cuisine types for completeness of the comparison. We also include the average star rating based on prior literature (Chevalier and Mayzlin 2006). See Table 8 for a complete list of our treatment variables. To obtain the effect of each treatment variable, we estimate a separate causal forests model, with all other treatment variables plus restaurant quality dimensions, zip code, and year as controls (see the complete list of controls in Table 8).11

10 We did not use age dummies (age = 0, 1, 2, ..., 21, 21+) for causal forests because a single age can have very few restaurant observations, making it difficult to satisfy the “overlap” assumption in the causal forests model. This assumption requires that there be enough observations in both treated and untreated conditions near any set of control values, which also implies that the correlations between the treatment variable and the control variables should not be very high (Wager and Athey 2018).

11 When an age bracket dummy (e.g., 4 <= age <= 7) is the treatment variable, all other age brackets are combined as the untreated group because doing so provides a larger pool of untreated units to satisfy the “overlap” assumption.
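The “overlap” diagnostic above boils down to checking that the treatment is not nearly collinear with any control. A minimal Pearson-correlation check (the vectors below are made up for illustration):

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical treatment (photo volume) vs. a control (review volume):
# a correlation near 1 would signal an overlap problem and motivate
# consolidating the two variables, as done in the text.
print(pearson([10, 25, 40, 80], [12, 22, 45, 78]))
```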


We then employ orthogonalization (Athey et al. 2019), also used in double machine learning (Chernozhukov et al. 2018), to further reduce correlations between treatment variables and controls. Specifically, as in Equation (3), we regress out the main effect of $X_{it-1}$ on $W_{it-1}$ and $y_{it}$ using regression forests and retain the residuals, $W^{*}_{it-1}$ and $y^{*}_{it}$, to build the causal forests. The idea is similar in spirit to propensity score adjustment for continuous treatments (Hirano and Imbens 2004), with the goal that, after such adjustment, the treatment variable becomes more independent of the controls. Table A18 in Appendix E demonstrates that the correlations between the treatment variables and controls are considerably reduced after orthogonalization.

(3) $W^{*}_{it-1} = W_{it-1} - \hat{W}^{(-i)}_{it-1}(X_{it-1})$, $\quad y^{*}_{it} = y_{it} - \hat{y}^{(-i)}_{it}(X_{it-1})$.
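The residualizing step in Equation (3) can be mimicked with a deliberately simplified stand-in: instead of out-of-bag regression-forest predictions, use the leave-one-out mean as the prediction for observation $i$ (this ignores the controls entirely and is for intuition only):

```python
def loo_residuals(values):
    """Residual of each observation against the mean of all *other*
    observations -- a toy analogue of the out-of-bag predictions
    W_hat^(-i) and y_hat^(-i) in Equation (3)."""
    total, n = sum(values), len(values)
    return [v - (total - v) / (n - 1) for v in values]

print(loo_residuals([1.0, 2.0, 3.0]))  # -> [-1.5, 0.0, 1.5]
```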

Note that, even with the orthogonalization step above, two types of unobservables could still threaten our attempts to properly identify the treatment effects in our setting: 1) time-invariant restaurant-level unobservables (e.g., owner education), and 2) time-variant unobservables (e.g., changes in finances or chef). Our cluster-robust causal forests account for time-invariant restaurant-level unobservables by allowing clustered errors (Athey et al. 2019). All observations from one restaurant are treated as a cluster. Intuitively, the restaurant-year observations are not entirely independent and are subject to unobserved restaurant-level effects that might influence survival. Essentially, utilizing clustered errors with causal forests is analogous to a non-parametric random-effects model that does not make distributional assumptions (Athey and Wager 2019).12
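The clustering logic can be illustrated with a toy cluster (block) bootstrap that resamples whole restaurants rather than individual restaurant-year rows, so within-restaurant dependence is preserved in each replicate (a sketch under invented data, not the grf implementation):

```python
import random

def cluster_bootstrap_means(data_by_cluster, n_boot=200, seed=0):
    """Resample entire clusters (restaurants) with replacement and return
    the mean outcome of each bootstrap replicate."""
    rng = random.Random(seed)
    clusters = list(data_by_cluster)
    means = []
    for _ in range(n_boot):
        drawn = rng.choices(clusters, k=len(clusters))
        values = [v for c in drawn for v in data_by_cluster[c]]
        means.append(sum(values) / len(values))
    return means

# Hypothetical survival outcomes per restaurant (restaurant-year rows)
data = {"r1": [1.0, 1.0], "r2": [0.0, 0.0], "r3": [1.0, 0.0]}
reps = cluster_bootstrap_means(data)
```

Because entire restaurants enter or leave each replicate together, the resulting spread reflects between-restaurant variation rather than treating every restaurant-year as independent.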

We try to mitigate potential bias caused by time-variant unobservables via our control variables. Based on directed acyclic graph theory (Morgan and Winship 2015; Pearl 2014), the inclusion of controls is often adequate to correct such bias as long as the control variables fully absorb the impact of time-variant unobservables on the outcome variable. Within our context, for any time-variant unobservable whose effect on survival can be fully captured by our restaurant quality measures, controlling for restaurant quality is adequate to correct the bias caused by that unobservable. Namely, if a change in chef impacts a restaurant’s survival by affecting its quality in food, service, environment, or price, we should be able to absorb such impact by including our restaurant quality metrics in the controls.

12 We have considered fixed effects to account for unobserved restaurant characteristics. Nevertheless, because our dependent variable is censored and non-repeating (i.e., each restaurant can only fail once), simply adding fixed effects to such nonlinear models leads to inconsistent and biased coefficient estimates (Greene 2004).

Within this context, we follow Athey et al. (2019) in building honest trees to weight the similarity between a set of control values $x$ and an arbitrary $X_{it-1}$, denoted by $\alpha_{it}(x)$. We can then calculate $\hat{\tau}(x)$ (the conditional treatment effect at $x$) and $\hat{\tau}$ (the average treatment effect) as in Equation (4) (Athey et al. 2019; Athey and Wager 2019).

(4) $\hat{\tau}(x) = \dfrac{\sum_{it} \alpha_{it}(x)\,(W^{*}_{it-1} - \bar{W}^{*}_{\alpha})(y^{*}_{it} - \bar{y}^{*}_{\alpha})}{\sum_{it} \alpha_{it}(x)\,(W^{*}_{it-1} - \bar{W}^{*}_{\alpha})^{2}}$, where $\bar{W}^{*}_{\alpha} = \sum_{it} \alpha_{it}(x)\,W^{*}_{it-1}$, $\bar{y}^{*}_{\alpha} = \sum_{it} \alpha_{it}(x)\,y^{*}_{it}$

$\hat{\tau} = \dfrac{1}{\sum_{i}(T_i - T_i^{0} + 1)} \sum_{it} \hat{\tau}(X_{it-1})$
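The conditional effect in Equation (4) is a weighted residual-on-residual regression; a direct transcription in Python (the weights and residuals below are toy values, and the alphas are normalized to sum to one, as forest weights are):

```python
def tau_hat(alpha, w_star, y_star):
    """Weighted residual-on-residual regression of Equation (4):
    alpha are similarity weights from the honest trees, and w_star, y_star
    are the orthogonalized treatment and outcome residuals."""
    total = sum(alpha)
    a = [x / total for x in alpha]  # normalize weights to sum to 1
    w_bar = sum(ai * wi for ai, wi in zip(a, w_star))
    y_bar = sum(ai * yi for ai, yi in zip(a, y_star))
    num = sum(ai * (wi - w_bar) * (yi - y_bar)
              for ai, wi, yi in zip(a, w_star, y_star))
    den = sum(ai * (wi - w_bar) ** 2 for ai, wi in zip(a, w_star))
    return num / den

# Toy residuals where the outcome moves twice as much as the treatment
print(tau_hat([1.0, 1.0, 1.0, 1.0],
              [-1.0, -1.0, 1.0, 1.0],
              [-2.0, -2.0, 2.0, 2.0]))  # -> 2.0
```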

4.2 Results: Cluster-Robust Causal Forests

Table 8 presents results from our cluster-robust causal forests model.13 As discussed above, each row is a separate cluster-robust causal forests estimation, where we treat the focal variable as treatment and all other observable factors (including other variables listed in Table 8, restaurant quality measures, zip-code dummies, and year dummies) as controls. A positive sign means a positive association with restaurant survival.

Among photo content variables, the proportion of food photos has the largest positive association with restaurant survival, followed by the proportions of outside and interior photos. This result is not surprising, since prior literature has suggested that food is the most critical aspect of a restaurant (Duarte Alonso et al. 2013; Sulek and Hensley 2004); food photos could be what gets consumers in the door. Outside photos show outdoor attractions, a vital factor for tourism regions (Getz and Brown 2006). Consumers may also use outside photos to check the neighborhood and locate a restaurant’s entrance. Inside photos depict the ambiance of a restaurant. Overall, these findings indicate that photos may offer useful cues that provide a better horizontal match between restaurants and consumer preferences.

13 Given that our focus here is to interpret how each independent variable relates to restaurant survival, we use all data to estimate the causal forests since there is no longer a need to separate out the training and testing samples.


Moreover, the proportion of photos with helpful votes is also positively associated with restaurant survival. In practice, helpful votes may also highlight negative aspects of a restaurant (e.g., stains on a table); nevertheless, we learn from our photo caption analysis that most photos carry neutral or positive sentiment. Mudambi and Schuff (2010) suggest that a review’s helpful votes reflect its information diagnosticity in facilitating the consumer’s purchase decision process. In a similar spirit, we conjecture that helpful votes on photos reflect the degree of diagnosticity associated with those photos, which in turn facilitates a better match between consumers and the focal business and hence relates positively to survival. Interestingly, with all other factors controlled, the cumulative number of photos and reviews is not significantly associated with restaurant survival.

For reviews, we find that restaurant survival is negatively associated with the proportion of mixed/negative reviews on food, review length, and standard deviation of star ratings, while positively associated with helpful votes for reviews and average star rating. The results on review length and star rating are consistent with prior literature studying the relationship between consumer text reviews and demand (e.g., Chevalier and Mayzlin 2006; Zhu and Zhang 2010). Table 8 also reveals the types of restaurants with better chances of survival. We observe a general trend that more established restaurants have a higher survival chance, resonating with Carroll (1983). In particular, the first three years are the hardest for restaurants to survive, and the survival rate is quite stable after the first seven years.

Additionally, mainstream cuisine-type restaurants (e.g., Mexican, American traditional, Pizza, Nightlife, Fast Food, Sandwiches, and Chinese categories), lower-priced restaurants, and chain restaurants have better odds of survival. Lastly, we find that restaurants are less likely to survive when there are more competitors, or when competitors receive more photos and reviews, or have better star ratings.

Restaurants usually concentrate in locations with more foot traffic and consequently may face stiffer competition. These findings indicate that the competition effect outweighs the demand effect regarding competitor concentration, reinforcing that monitoring competitors is important for restaurant managers.


5. Conclusions

We investigate whether consumer-posted photos can forecast business survival above and beyond reviews, restaurant characteristics, competitive landscape, and macro conditions. Utilizing a predictive algorithm called XGBoost, we discover that photos provide significant incremental predictive power for restaurant survival. Specifically, the informativeness of photos (i.e., photo content and helpfulness) carries the most predictive power for restaurant survival compared to other aspects of photos such as photographic attributes (e.g., composition or brightness). We also find that photos are more informative in predicting the survival of independent, young or mid-aged, and medium-priced restaurants. Moreover, assuming that restaurant owners possess no knowledge of future UGC for the focal business and its competitors, current photos can predict restaurant survival for up to three years, while reviews are predictive for only one year. We further develop cluster-robust causal forests with the aim to obtain unbiased/consistent parameter estimates quantifying the relationship between the set of most informative predictors (based on our XGBoost algorithm) and restaurant survival. Among photo content variables, the proportion of food photos has the largest positive association with restaurant survival, followed by proportions of outside and interior photos. The proportion of photos with helpful votes is also positively related to restaurant survival. To the best of our knowledge, this is the first empirical study that explores whether consumer photos can serve as a leading indicator of long-term business prosperity. Our research is also among the most comprehensive empirical studies on restaurant survival to date.

Findings of this study generate useful insights for multiple stakeholders. Business investors and landlords can employ our results to better evaluate the market and to monitor restaurant survival likelihood. Our models can significantly increase survival prediction accuracy for better-informed capital investment/lease decisions on restaurants. Online platforms can also utilize our method for premium business intelligence. Furthermore, our findings regarding consumer-posted photos can provide managers and investors foresight about survival likelihood for up to three years, which can be highly valuable for competitive intelligence, resource allocation, and longer-term strategic planning. Lastly, our research offers insights into the macro performance of the restaurant industry for marketing research firms and trade associations.

Our research is also subject to several limitations and provides some fruitful directions for future research. First, future research may follow Hollenbeck (2018) by collecting restaurant reviews from multiple platforms (e.g., TripAdvisor, Google Reviews) to obtain a better measure of restaurant quality.

Under our approach, the quality metrics for less popular restaurants might be noisier than those of popular restaurants because the former have fewer reviews. Collecting reviews from multiple platforms provides a larger pool of reviews and may potentially mitigate the measurement error of restaurant quality from Yelp only. Based on simulation studies, we learn that, as long as the quality measure is unbiased, even if the magnitude of its noise (i.e., variance of noise) is larger for less popular restaurants, our estimates of the treatment effects would remain consistent. Second, future research may consider collecting Census data to gather possibly richer information on these restaurants to further enrich the restaurant survival study.

Lastly, while we emphasize the relationship between consumer-posted photos and restaurant survival, future research may examine how such photos relate to other outcomes related to long-term business prosperity such as consumer attitudes towards the focal business and brand loyalty.

References

Anonymous. 2015. “5 Things Startup Restaurants Typically Overspend On.” Restaurant Engine, May 04. Archak, Nikolay, Anindya Ghose, and Panagiotis G. Ipeirotis. 2011. “Deriving the Pricing Power of Product Features by Mining Consumer Reviews.” Management Science 57(8):1485–1509. Arnheim, Rudolf. 1965. Art and Visual Perception: A Psychology of the Creative Eye. Univ of California Press. Ataman, M. Berk, Harald J. Van Heerde, and Carl F. Mela. 2010. “The Long-Term Effect of Marketing Strategy on Brand Sales.” Journal of Marketing Research 47(5):866–82. Athey, Susan, and Guido Imbens. 2016. “Recursive Partitioning for Heterogeneous Causal Effects.” Proceedings of the National Academy of Sciences 113(27):7353–60. Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” Annals of Statistics 47(2):1148–78. Athey, Susan, and Stefan Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” Observational Studies. Audretsch, David B., Patrick Houweling, and A. Roy Thurik. 2000. “Firm Survival in the Netherlands.” Review of Industrial Organization 16(1):1–11. Audretsch, David B., and Talat Mahmood. 1995. “New Firm Survival: New Results Using a Hazard Function.” The Review of Economics and Statistics 77(1):97–103. Barasch, Alixandra, Kristin Diehl, Jackie Silverman, and Gal Zauberman. 2017. “Photographic Memory: The Effects of Volitional Photo Taking on Memory for Visual and Auditory Aspects of an


Experience.” Psychological Science 28(8):1056–66. Barasch, Alixandra, Gal Zauberman, and Kristin Diehl. 2017. “How the Intention to Share Can Undermine Enjoyment: Photo-Taking Goals and Evaluation of Experiences.” Journal of Consumer Research 44(6):1220–37. Bates, Timothy. 1990. “Entrepreneur Human Capital Inputs and Small Business Longevity.” The Review of Economics and Statistics 72(4):551–59. Bates, Timothy. 1995. “A Comparison of Franchise and Independent Small Business Survival Rates.” Small Business Economics 7(5):377–88. Bates, Timothy. 1998. “Survival Patterns among Newcomers to Franchising.” Journal of Business Venturing 13(2):113–30. Bekkar, Mohamed, Hassiba Kheliouane Djemaa, and Taklit Akrouf Alitouche. 2013. “Evaluation Measures for Models Assessment over Imbalanced Data Sets.” Journal of Information Engineering and Applications 3(10). Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3(Jan):993–1022. Boden Jr, Richard J., and Alfred R. Nucci. 2000. “On the Survival Prospects of Men’s and Women’s New Business Ventures.” Journal of Business Venturing 15(4):347–62. Bradley, Andrew P. 1997. “The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms.” Pattern Recognition 30(7):1145–59. Brodersen, Kay Henning, Cheng Soon Ong, Klaas Enno Stephan, and Joachim M. Buhmann. 2010. “The Balanced Accuracy and Its Posterior Distribution.” IEEE 20th International Conference on Pattern Recognition 3121–24. Brüggen, Elisabeth C., Bram Foubert, and Dwayne D. Gremler. 2011. “Extreme Makeover: Short-and Long-Term Effects of a Remodeled Servicescape.” Journal of Marketing 75(5):71–87. Bujisic, Milos, Joe Hutchinson, and H. G. Parsa. 2014. “The Effects of Restaurant Quality Attributes on Customer Behavioral Intentions.” International Journal of Contemporary Hospitality Management 26(8):1270–91. Carroll, Glenn R. 1983. 
“A Stochastic Model of Organizational Mortality: Review and Reanalysis.” Social Science Research 12(4):303–29. Chang, Jonathan, Sean Gerrish, Chong Wang, Jordan Boyd-Graber, and David Blei. 2009. “Reading Tea Leaves: How Humans Interpret Topic Models.” Advances in Neural Information Processing Systems 22:288–96. Chen, Tianqi, Michaël Benesty, and Tong He. 2018. “Understand Your Dataset with Xgboost.” R Document. Chen, Tianqi, and Carlos Guestrin. 2016. “Xgboost: A Scalable Tree Boosting System.” Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining 785–94. Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21:C1–68. Chevalier, Judith A., and Dina Mayzlin. 2006. “The Effect of Word of Mouth on Sales: Online Book Reviews.” Journal of Marketing Research 43(3):345–54. Cooper, Arnold C., F. Javier Gimeno-Gascon, and Carolyn Y. Woo. 1994. “Initial Human and Financial Capital as Predictors of New Venture Performance.” Journal of Business Venturing 9(5):371–95. Diehl, Kristin, Gal Zauberman, and Alixandra Barasch. 2016. “How Taking Photos Increases Enjoyment of Experiences.” Journal of Personality and Social Psychology. Ding, Min, John R. Hauser, Songting Dong, Daria Dzyabura, Zhilin Yang, S. U. Chenting, and Steven P. Gaskin. 2011. “Unstructured Direct Elicitation of Decision Rules.” Journal of Marketing Research 48(1):116–27. Duarte Alonso, Abel, Martin O’neill, Yi Liu, and Michelle O’shea. 2013. “Factors Driving Consumer Restaurant Choice: An Exploratory Study From the Southeastern United States.” Journal of Hospitality Marketing & Management 22(5):547–67.


Dzyabura, Daria, and John R. Hauser. 2011. “Active Machine Learning for Consideration Heuristics.” Marketing Science 30(5):801–19. Dzyabura, Daria, Siham El Kihal, John R. Hauser, and Marat Ibragimov. 2018. “Leveraging the Power of Images in Predicting Product Return Rates.” Working Paper. Dzyabura, Daria, and Renana Peres. 2018. “Visual Elicitation of Brand Perception.” Working Paper. Fawcett, Tom. 2004. “ROC Graphs: Notes and Practical Considerations for Researchers.” Machine Learning 31(1):1–38. Fawcett, Tom. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters 27(8):861–74. Fellbaum, Christiane. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press. Fong, Christian, Chad Hazlett, and Kosuke Imai. 2018. “Covariate Balancing Propensity Score for a Continuous Treatment: Application to the Efficacy of Political Advertisements.” The Annals of Applied Statistics 12(1):156–77. Freeman, Michael. 2007. The Photographer’s Eye: Composition and Design for Better Digital Photos. Routledge. Friedman, Jerome. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 1189–1232. Fritsch, Michael, Udo Brixy, and Oliver Falck. 2006. “The Effect of Industry, Region, and Time on New Business Survival--a Multi-Dimensional Analysis.” Review of Industrial Organization 28(3):285– 306. Getz, Donald, and Graham Brown. 2006. “Critical Success Factors for Wine Tourism Regions: A Demand Analysis.” Tourism Management 27(1):146–58. Ghose, Anindya, Panagiotis G. Ipeirotis, and Beibei Li. 2012. “Designing Ranking Systems for Hotels on Travel Search Engines by Mining User-Generated and Crowdsourced Content.” Marketing Science 31(3):493–520. Greene, William. 2004. “The Behaviour of the Maximum Likelihood Estimator of Limited Dependent Variable Models in the Presence of Fixed Effects.” The Econometrics Journal 7(1):98–119. Guo, Tong, S. Sriram, and Puneet Manchanda. 2017. 
“The Effect of Information Disclosure on Industry Payments to Physicians.” Working Paper. Haapanen, Mika, and Hannu Tervo. 2009. “Self-Employment Duration in Urban and Rural Locations.” Applied Economics 41(19):2449–61. Hanley, James A., and Barbara J. McNeil. 1982. “The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve.” Radiology 143(1):29–36. Hartmann, Jochen, Mark Heitmann, Christina Schamp, and Oded Netzer. 2019. “The Power of Brand Selfies in Consumer-Generated Brand Images.” Working Paper. Hauser, John R., Songting Dong, and Min Ding. 2014. “Self-Reflection and Articulated Consumer Preferences.” Journal of Product Innovation Management 31(1):17–32. Heyman, Stephen. 2015. “Photos, Photos Everywhere.” New York Times, July 29. Hirano, Keisuke, and Guido W. Imbens. 2004. “The Propensity Score With Continuous Treatments.” Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives 73–84. Hollenbeck, Brett. 2018. “Online Reputation Mechanisms and the Decreasing Value of Chain Affiliation.” Journal of Marketing Research 55(5):636–54. Huang, Dongling, and Lan Luo. 2016. “Consumer Preference Elicitation of Complex Products Using Fuzzy Support Vector Machine Active Learning.” Marketing Science 35(3):445–64. Hutto, Clayton J., and Eric Gilbert. 2014. “Vader: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text.” Eighth International AAAI Conference on Weblogs and Social Media. Hyun, Sunghyup Sean. 2010. “Predictors of Relationship Quality and Loyalty in the Chain Restaurant Industry.” Cornell Hospitality Quarterly 51(2):251–67. Isola, Phillip, Devi Parikh, Antonio Torralba, and Aude Oliva. 2011. “Understanding the Intrinsic Memorability of Images.” Advances in Neural Information Processing Systems 2429--2437. Kalleberg, Arne L., and Kevin T. Leicht. 1991. “Gender and Organizational Performance: Determinants


of Small Business Survival and Success.” Academy of Management Journal 34(1):136–61. Kalnins, Arturs, and Francine Lafontaine. 2013. “Too Far Away? The Effect of Distance to Headquarters on Business Establishment Performance.” American Economic Journal: Microeconomics 5(3):157– 79. Kalnins, Arturs, and Kyle J. Mayer. 2004. “Franchising, Ownership, and Experience: A Study of Pizza Restaurant Survival.” Management Science 50(12):1716–28. Kamshad, Kimya M. 1994. “Firm Growth and Survival: Does Ownership Structure Matter?” Journal of Economics & Management Strategy 3(3):521–43. Ko, Eunhee (Emily), and Douglas Bowman. 2018. “Content Engineering of Images: The Effect of Sentiment and Complexity on Consumer Engagement with Brand-Themed User-Generated Content.” Working Paper. Kotsiantis, Sotiris, Dimitris Kanellopoulos, Panayiotis Pintelas, and others. 2006. “Handling Imbalanced Datasets: A Review.” GESTS International Transactions on Computer Science and Engineering 30(1):25–36. Krages, Bert. 2012. Photography: The Art of Composition. Simon and Schuster. Kreitler, Hans, and Shulamith Kreitler. 1972. Psychology of the Arts. Duke University Press. Kullback, Solomon, and Richard A. Leibler. 1951. “On Information and Sufficiency.” The Annals of Mathematical Statistics 22(1):79–86. Lafontaine, Francine, Marek Zapletal, and Xu Zhang. 2018. “Brighter Prospects? Assessing the Franchise Advantage Using Census Data.” Journal of Economics & Management Strategy. Larsen, Val, David Luna, and Laura A. Peracchio. 2004. “Points of View and Pieces of Time: A Taxonomy of Image Attributes.” Journal of Consumer Research 31(1):102–11. Lee, Dokyun, Emaad Ahmed Manzoor, and Zhaoqi Cheng. 2019. “Focused Concept Miner (FCM): An Interpretable Deep Learning for Text Exploration.” Working Paper. Li, Xi, Mengze Shi, and Xin (Shane) Wang. 2019. “Video Mining: Measuring Visual Information Using Automatic Methods.” International Journal of Research in Marketing 36(2). Li, Yiyi, and Ying Xie. 2018. 
“Is a Picture Worth a Thousand Words? An Empirical Study of Imagery Content and Social Media Engagement.” Working Paper. Liu, Liu, Daria Dzyabura, and Natalie Mizik. 2020. “Visual Listening In: Extracting Brand Image Portrayed on Social Media.” Marketing Science 39(4):669–848. Liu, Xiao, Dokyun Lee, and Kannan Srinivasan. 2019. “Large Scale Cross Category Analysis of Consumer Review Content on Sales Conversion Leveraging Deep Learning.” Journal of Marketing Research. Liu, Yan, Venkatesh Shankar, and Wonjoo Yun. 2017. “Crisis Management Strategies and the Long- Term Effects of Product Recalls on Firm Value.” Journal of Marketing 81(5). Liu, Yong. 2006. “Word-of-Mouth for Movies: Its Dynamics and Impact on Box Office Revenue.” Journal of Marketing 70(3):74–89. Lundberg, Scott, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” 31st Conference on Neural Information Processing Systems. Lundberg, Scott M., Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. 2020. “From Local Explanations to Global Understanding With Explainable AI for Trees.” Nature Machine Intelligence 2(1):56–67. Malik, Nikhil, Param Vir Singh, and Kannan Srinivasan. 2020. “A Dynamic Analysis of Beauty Premium.” Working Paper. McGrath, Thomas. 2017. “Social Listening Meets Image Sharing: A Picture Is Worth 1,000 Words.” Marketing Science Institute. Mela, Carl F., Sunil Gupta, and Donald R. Lehmann. 1997. “The Long-Term Impact of Promotion and Advertising on Consumer Brand Choice.” Journal of Marketing Research XXXIV:248–61. Miller, George A. 1995. “Wordnet: A Lexical Database for English.” Communications of the ACM 38(11):39–41. Molnar, Christoph. 2021. Interpretable Machine Learning: A Guide for Making Black Box Models


Explainable.
Montabone, Sebastian, and Alvaro Soto. 2010. "Human Detection Using a Mobile Platform and Novel Features Derived from a Visual Saliency Mechanism." Image and Vision Computing 28(3):391–402.
Montoya, Ricardo, Oded Netzer, and Kamel Jedidi. 2010. "Dynamic Allocation of Pharmaceutical Detailing and Sampling for Long-Term Profitability." Marketing Science 29(5):909–24.
Morgan, Stephen L., and Christopher Winship. 2015. Counterfactuals and Causal Inference. Cambridge University Press.
Mudambi, Susan M., and David Schuff. 2010. "What Makes a Helpful Online Review? A Study of Customer Reviews on Amazon.com." MIS Quarterly 185–200.
Murphy, Kate. 2010. "First the Camera, Then the Fork." New York Times, April 6.
Narang, Unnati, Venkatesh Shankar, and Sridhar Narayanan. 2019. "The Impact of Mobile App Failures on Purchases in Online and Offline Channels." Working Paper.
Netzer, Oded, Ronen Feldman, Jacob Goldenberg, and Moshe Fresko. 2012. "Mine Your Own Business: Market-Structure Surveillance through Text Mining." Marketing Science 31(3):521–43.
Netzer, Oded, Alain Lemaire, and Michal Herzenstein. 2019. "When Words Sweat: Identifying Signals for Loan Default in the Text of Loan Applications." Journal of Marketing Research 56(6):960–80.
Newman, David, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. "Automatic Evaluation of Topic Coherence." Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics 100–108.
Nishiyama, Masashi, Takahiro Okabe, Imari Sato, and Yoichi Sato. 2011. "Aesthetic Quality Classification of Photographs Based on Color Harmony." Computer Vision and Pattern Recognition 33–40.
Parsa, H. G., Amy Gregory, and Michael Doc Terry. 2010. "Why Do Restaurants Fail? Part III: An Analysis of Macro and Micro Factors." Emerging Aspects Redefining Tourism and Hospitality 1(1):16–25.
Parsa, H. G., Jean-Pierre I. Van Der Rest, Scott R. Smith, Rahul A. Parsa, and Milos Bujisic. 2015. "Why Restaurants Fail? Part IV: The Relationship between Restaurant Failures and Demographic Factors." Cornell Hospitality Quarterly 56(1):80–90.
Parsa, H. G., John Self, Sandra Sydnor-Busso, and Hae Jin Yoon. 2011. "Why Restaurants Fail? Part II: The Impact of Affiliation, Location, and Size on Restaurant Failures: Results from a Survival Analysis." Journal of Foodservice Business Research 14(4):360–79.
Parsa, H. G., John T. Self, David Njite, and Tiffany King. 2005. "Why Restaurants Fail." Cornell Hotel and Restaurant Administration Quarterly 46(3):304–22.
Pauwels, Koen, Dominique M. Hanssens, and Sivaramakrishnan Siddarth. 2002. "The Long-Term Effects of Price Promotions on Category Incidence, Brand Choice, and Purchase Quantity." Journal of Marketing Research 39(4).
Pearl, Judea. 2014. "The Deductive Approach to Causal Inference." Journal of Causal Inference 2(2):115–29.
Puranam, Dinesh, Vishal Narayan, and Vrinda Kadiyali. 2017. "The Effect of Calorie Posting Regulation on Consumer Opinion: A Flexible Latent Dirichlet Allocation Model with Informative Priors." Marketing Science 36(5):726–46.
Rafieian, Omid, and Hema Yoganarasimhan. 2018. "Targeting and Privacy in Mobile Advertising." Working Paper.
Rajaram, Prashant, Puneet Manchanda, and Eric Schwartz. 2018. "Finding the Sweet Spot: Ad Scheduling on Streaming Media." Working Paper.
Röder, Michael, Andreas Both, and Alexander Hinneburg. 2015. "Exploring the Space of Topic Coherence Measures." Proceedings of the Eighth ACM International Conference on Web Search and Data Mining 399–408.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. "The Central Role of the Propensity Score in Observational Studies for Causal Effects." Biometrika 70(1):41–55.
Rubin, Donald B. 1973. "Matching to Remove Bias in Observational Studies." Biometrics 159–83.


Ryu, Kisang, Hye-Rin Lee, and Woo Gon Kim. 2012. "The Influence of the Quality of the Physical Environment, Food, and Service on Restaurant Image, Customer Perceived Value, Customer Satisfaction, and Behavioral Intentions." International Journal of Contemporary Hospitality Management 24(2):200–223.
Seiffert, Chris, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. 2008. "Resampling or Reweighting: A Comparison of Boosting Implementations." 20th IEEE International Conference on Tools with Artificial Intelligence 1:445–51.
Sulek, Joanne M., and Rhonda L. Hensley. 2004. "The Relative Importance of Food, Atmosphere, and Fairness of Wait: The Case of a Full-Service Restaurant." Cornell Hotel and Restaurant Administration Quarterly 45(3):235–47.
Takahashi, Kazuma, Keisuke Doman, Yasutomo Kawanishi, Takatsugu Hirayama, Ichiro Ide, Daisuke Deguchi, and Hiroshi Murase. 2016. "A Study on Estimating the Attractiveness of Food Photography." 2016 IEEE Second International Conference on Multimedia Big Data (BigMM) 444–49.
Timoshenko, Artem, and John R. Hauser. 2019. "Identifying Customer Needs from User-Generated Content." Marketing Science 38(1):1–20.
Tirunillai, Seshadri, and Gerard J. Tellis. 2012. "Does Chatter Really Matter? Dynamics of User-Generated Content and Stock Performance." Marketing Science 31(2):198–215.
Tirunillai, Seshadri, and Gerard J. Tellis. 2014. "Extracting Dimensions of Consumer Satisfaction with Quality from Online Chatter: Strategic Brand Analysis of Big Data Using Latent Dirichlet Allocation." Journal of Marketing Research 51:463–79.
Toubia, Olivier, and Oded Netzer. 2016. "Idea Generation, Creativity, and Prototypicality." Marketing Science.
Trafton, Anne. 2014. "In the Blink of an Eye." MIT News, January 16.
Troncoso, Isamar, and Lan Luo. 2020. "Look the Part? The Role of Profile Pictures in Online Labor Markets." Working Paper.
Tveterås, Ragnar, and Geir Egil Eide. 2000. "Survival of New Plants in Different Industry Environments in Norwegian Manufacturing: A Semi-Proportional Cox Model Approach." Small Business Economics 14(1):65–82.
Vogel, Douglas Rudy, Gary W. Dickson, and John A. Lehman. 1986. "Persuasion and the Role of Visual Presentation Support: The UM/3M Study." Minneapolis: Management Information Systems Research Center, School of Management, University of Minnesota.
Wadhera, Devina, and Elizabeth D. Capaldi-Phillips. 2014. "A Review of Visual Cues Associated with Food on Food Acceptance and Consumption." Eating Behaviors 15(1):132–43.
Wager, Stefan, and Susan Athey. 2018. "Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests." Journal of the American Statistical Association 113(523):1228–1242.
Wagner, Joachim. 1994. "The Post-Entry Performance of New Small Firms in German Manufacturing Industries." The Journal of Industrial Economics 141–54.
Wooldridge, Jeffrey. 2009. Introductory Econometrics. South-Western Cengage Learning.
Xiao, Li, and Min Ding. 2014. "Just the Faces: Exploring the Effects of Facial Features in Print Advertising." Marketing Science 33(3):338–52.
Yoganarasimhan, Hema. 2019. "Search Personalization Using Machine Learning." Management Science.
Zhang, Shunyuan, Dokyun Lee, Param Singh, and Kannan Srinivasan. 2018. "How Much Is an Image Worth? Property Demand Analytics Leveraging a Scalable Image Classification Algorithm." Working Paper.
Zhu, Feng, and Xiaoquan Zhang. 2010. "Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics." Journal of Marketing 74(2):133–48.


Table 1. Review of Prior Literature on Business Survival Factors

Company
- Firm age: Bates 1995; Carroll 1983; Fritsch et al. 2006; Kalleberg and Leicht 1991
- Ownership type (e.g., chain vs. independent): Audretsch and Mahmood 1995; Bates 1995, 1998; Cooper et al. 1994; Kalnins and Mayer 2004; Kamshad 1994; Parsa et al. 2005, 2011
- Type of business (e.g., cuisine types): Parsa et al. 2005
- Operation (e.g., price, service, environment): Audretsch et al. 2000; Audretsch and Mahmood 1995; Kalleberg and Leicht 1991; Parsa et al. 2015; Parsa, Gregory, and Terry 2010; Tveterås and Eide 2000

Competition
- Concentration: Fritsch et al. 2006; Kalleberg and Leicht 1991; Parsa et al. 2005; Wagner 1994
- Number of entries: Fritsch et al. 2006

Macro
- National conditions: Audretsch and Mahmood 1995; Boden Jr and Nucci 2000; Fritsch et al. 2006; Parsa et al. 2010
- Local environment: Fritsch et al. 2006; Haapanen and Tervo 2009; Kalleberg and Leicht 1991; Kalnins and Lafontaine 2013; Parsa et al. 2010, 2011, 2015


Table 2. Summary Statistics of Photos and Reviews

                                          Count        Mean     Std. dev.   Min.   Max.
General content of photos
  Food                                    755,758     59.10%
  Interior                                755,758      6.09%
  Outside                                 755,758      5.14%
  Drink                                   755,758      3.03%
  Menu                                    755,758      2.30%
  Others                                  755,758     24.34%
Length of captions (# of characters)      533,965     31.94     26.13       1      140
Yearly helpful votes for photos^a         755,758      0.16      0.34       0       38
Star rating of reviews                  1,121,069      3.75      1.30       1        5
Length of reviews (# of characters)     1,121,069    617.45    585.41       1    5,000
Yearly helpful votes for reviews^a      1,121,069      0.56      1.52       0      133

^a Yearly helpful votes = (total # of helpful votes received at the end of our observation window) / (years since upload).
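The normalization in the table footnote is simple arithmetic; as a minimal illustration (the function name is ours, not the paper's):

```python
def yearly_helpful_votes(total_votes: int, years_since_upload: float) -> float:
    """Normalize cumulative helpful votes by the time the photo has been online."""
    return total_votes / years_since_upload

# A photo uploaded four years ago that has accumulated 2 helpful votes:
print(yearly_helpful_votes(2, 4))  # 0.5
```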

Table 3. Restaurant-Level Summary Statistics

                                  Count      Mean     Std. dev.   Min.   Max.
Age (at the end of year 2015)     17,719     19.89    21.10       1      192
Chain                             17,719     25.95%
Price level                       16,932      1.56     0.61       1        4
Top 10 cuisine types
  Mexican                         17,719     12.34%
  American (Traditional)          17,719     12.04%
  Pizza                           17,719     11.89%
  Nightlife                       17,719     10.94%
  Fast Food                       17,719     10.80%
  Sandwiches                      17,719      9.76%
  American (New)                  17,719      8.10%
  Burgers                         17,719      7.36%
  Italian                         17,719      7.18%
  Chinese                         17,719      6.77%
Other cuisine types               17,719      2.82%
Total # of photos^a               17,719     42.65    122.55      0    4,793
Total # of reviews^a              17,719     63.27    135.76      0    5,040

^a Total # of photos/reviews is the cumulative volume at the end of our observation window.


Table 4. Definitions of Independent Variables

Table 4A. Definitions of Independent Variables: Photos and Reviews
(All variables are computed both as one-period lags and as cumulative values until t-1.)

Photo
  Photo content (it-1):
  - Prop. of photos depicting each of the general content types (food, drink, interior, outside, and menu) provided by Yelp
  - Prop. of photos on each of the 10 LDA topics
  - Avg. photo variety based on Clarifai labels
  - Std. of photo variety based on Clarifai labels
  - Prop. of photos in the top 33.3% of variety in photo content^a
  - Prop. of photos in the middle 33.3% of variety in photo content
  - Prop. of photos in the bottom 33.3% of variety in photo content
  Photographic attributes (it-1):
  - Avg. of each attribute (18 attributes of color, composition, and figure-ground relationship)
  - Std. of each attribute
  - Prop. of photos in the top 33.3% of each attribute
  - Prop. of photos in the middle 33.3% of each attribute
  - Prop. of photos in the bottom 33.3% of each attribute
  Caption (it-1):
  - Prop. of photos with a caption
  - Avg. caption length (# of characters)
  - Std. of caption length
  - Prop. of photos with a dish name caption
  - Prop. of photos with a positive caption
  - Prop. of photos with a negative caption
  - Avg. sentiment of captions (if the caption is not a dish name)
  - Std. of caption sentiment (if the caption is not a dish name)
  Photo volume (it-1):
  - # of photos
  Helpful vote (it-1):
  - Prop. of photos with helpful votes
  - Avg. yearly helpful votes for photos
  - Std. of yearly helpful votes for photos

Review
  Restaurant quality dimensions (it-1):
  - Prop. of reviews mentioning each restaurant quality dimension (food, service, environment, price)
  - Avg. sentiment of each dimension
  - Std. of sentiments of each dimension
  - Prop. of reviews in the top 33.3% of sentiment of each dimension
  - Prop. of reviews in the middle 33.3% of sentiment of each dimension
  - Prop. of reviews in the bottom 33.3% of sentiment of each dimension
  Review content (it-1):
  - Prop. of reviews on each of the 20 LDA topics
  - Avg. review variety based on nouns
  - Std. of review variety based on nouns
  - Prop. of reviews in the top 33.3% of variety in review content
  - Prop. of reviews in the middle 33.3% of variety in review content
  - Prop. of reviews in the bottom 33.3% of variety in review content
  Review volume (it-1):
  - # of reviews
  Star rating (it-1):
  - Avg. star rating
  - Std. of star ratings
  - Prop. of reviews of each star
  Review length (it-1):
  - Avg. review length (# of characters)
  - Std. of review length
  Helpful vote (it-1):
  - Prop. of reviews with helpful votes
  - Avg. yearly helpful votes for reviews
  - Std. of yearly helpful votes for reviews

^a For example, "prop. of photos in the top 33.3% of variety in photo content" is the prop. of photos from the focal restaurant whose variety in photo content falls in the top one-third among all photos in our dataset.


Table 4B. Definitions of Independent Variables: Company, Competition, and Macro

Company
- Age (it-1): Years since the restaurant opened. We include age as a set of dummies (age = 0, 1, 2, ..., 21, 21+) in our prediction model.
- Chain (i): Whether the number of restaurants with the same name was greater than five in our data.
- Price level (i): $, $$, $$$, $$$$.
- Cuisine types (i): Cuisine types as defined in Table 3.

Competition
- # of overlapping competitors (it-1): Number of competitors with overlapping cuisine types in the same zip code in period t-1.
- # of non-overlapping competitors (it-1): Number of competitors with no overlapping categories in the same zip code in period t-1.
- # of new entries (it-1): Number of competitors opened in the same zip code in period t-1.
- # of new exits (it-1): Number of competitors closed in the same zip code in period t-1.
- Avg. of competitors' # of photos (it-1): Avg. photo volume of competitors in the same zip code (both one-period lag and cumulative until t-1).
- Avg. of competitors' # of reviews (it-1): Avg. review volume of competitors in the same zip code (both one-period lag and cumulative until t-1).
- Avg. of competitors' avg. star rating (it-1): Avg. of the avg. star rating across competitors in the same zip code (both one-period lag and cumulative until t-1).

Macro
- Year (t-1): Year dummies.
- Location: zip code (i): Zip codes with more than 100 restaurants are included.


Table 5. One-Year-Ahead Prediction – Comparison between Photos and Reviews

                              |----------------------- Out of sample -----------------------|  In sample
                              AUC       KL div.   Sensitivity  Specificity  Balanced acc.      Pseudo R2
Baseline (i.e., no UGC)       0.7020    0.1973    0.6484       0.6284       0.6384             0.1373
                              (0.0047)  (0.0035)  (0.0056)     (0.0092)     (0.0046)           (0.0026)
Baseline + review             0.7152    0.1955    0.7301^a     0.5614       0.6457             0.2100
                              (0.0048)  (0.0035)  (0.0054)     (0.0095)     (0.0042)           (0.0062)
Baseline + photo              0.7612^a  0.1877^a  0.6735       0.6936^a     0.6835^a           0.2324
                              (0.0066)  (0.0027)  (0.0087)     (0.0086)     (0.0051)           (0.0029)
Baseline + review + photo     0.7660^a  0.1857^a  0.7027       0.6820^a     0.6924^a           0.2673^a
                              (0.0065)  (0.0028)  (0.0081)     (0.0089)     (0.0052)           (0.0046)

Notes: The baseline model includes restaurant characteristics, competitive landscape, and macro conditions. For sensitivity, specificity, and balanced accuracy, the training data are reweighted so that the total weights of open and closed observations are equal. Results are averaged over years and cross-validation iterations. Standard errors are in parentheses. Bold numbers indicate significant improvement over the baseline model at the 0.05 level with a 2-sided test. ^a Best in the column, or not significantly different from the best in the column, at the 0.05 level with a 2-sided test.
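The class reweighting described in the table notes (equal total weight for open and closed observations) can be sketched as follows. This is a toy illustration on synthetic data, with scikit-learn's gradient boosting standing in for the paper's XGBoost model; all variable names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the restaurant-year panel: roughly 90% "open" (1).
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > -1.3).astype(int)

# Reweight training observations so that the total weight of open and
# closed observations is equal (weights sum to 0.5 within each class).
n_pos, n_neg = y.sum(), len(y) - y.sum()
w = np.where(y == 1, 0.5 / n_pos, 0.5 / n_neg)

clf = GradientBoostingClassifier(random_state=0)  # stand-in for XGBoost
clf.fit(X[:1500], y[:1500], sample_weight=w[:1500])

scores = clf.predict_proba(X[1500:])[:, 1]
auc_value = roc_auc_score(y[1500:], scores)
bal_acc = balanced_accuracy_score(y[1500:], (scores > 0.5).astype(int))
print("AUC:", auc_value, "Balanced accuracy:", bal_acc)
```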

Table 6. One-Year-Ahead Prediction – Different Aspects of Photos

                                      |----------------------- Out of sample -----------------------|  In sample
                                      AUC       KL div.   Sensitivity  Specificity  Balanced acc.      Pseudo R2
Baseline                              0.7020    0.1973    0.6484       0.6284       0.6384             0.1373
                                      (0.0041)  (0.0030)  (0.0049)     (0.0080)     (0.0046)           (0.0023)
Baseline + photographic attributes    0.7005    0.1975    0.6861^a     0.5838       0.6350             0.1767
                                      (0.0048)  (0.0035)  (0.0055)     (0.0093)     (0.0042)           (0.0043)
Baseline + photo caption              0.7010    0.1972    0.6646       0.6060       0.6353             0.1460
                                      (0.0047)  (0.0035)  (0.0046)     (0.0096)     (0.0048)           (0.0027)
Baseline + photo volume               0.7051    0.1967    0.6581       0.6222       0.6401             0.1407
                                      (0.0048)  (0.0035)  (0.0048)     (0.0086)     (0.0043)           (0.0027)
Baseline + helpful votes              0.7157    0.1962    0.6601       0.6339       0.6470             0.1670
                                      (0.0046)  (0.0034)  (0.0098)     (0.0115)     (0.0038)           (0.0034)
Baseline + photo content              0.7539^a  0.1876^a  0.6698       0.6884^a     0.6791^a           0.2014^a
                                      (0.0075)  (0.0028)  (0.0061)     (0.0094)     (0.0065)           (0.0020)

Notes: The baseline model includes restaurant characteristics, competitive landscape, and macro conditions. For sensitivity, specificity, and balanced accuracy, the training data are reweighted so that the total weights of surviving and closed observations are equal. Results are averaged over years and cross-validation iterations. Standard errors are in parentheses. Bold numbers indicate significant improvement over the baseline model at the 0.05 level with a 2-sided test. ^a Best in the column, or not significantly different from the best in the column, at the 0.05 level with a 2-sided test.


Table 7. Comparison of Incremental Predictive Power of Photos by Restaurant Type (AUC)

                            Chain      Independent  1<=Age<=3   3<Age<=21   Age>21     Price=1    Price=2
Baseline + review           0.6700     0.6698       0.6290      0.6261      0.6580     0.7142     0.6960
                            (0.0117)   (0.0054)     (0.0084)    (0.0108)    (0.0126)   (0.0059)   (0.0053)
Baseline + review + photo   0.7106     0.7323       0.7128      0.7320      0.7080     0.7582     0.7624
                            (0.0157)   (0.0075)     (0.0093)    (0.0085)    (0.0132)   (0.0064)   (0.0075)
AUC increase by photo       0.0406     0.0626^b     0.0837^a    0.1059^a    0.0501     0.0440     0.0664^a
                            (0.0119)   (0.0059)     (0.0084)    (0.0111)    (0.0103)   (0.0053)   (0.0060)
% increase by photo         6.06%      9.34%        13.31%      16.91%      7.61%      6.16%      9.54%

Notes: The baseline includes restaurant characteristics, competitive landscape, and macro conditions. Results are averaged over years and cross-validation iterations. Standard errors are in parentheses. Bold numbers indicate that the incremental predictive power of photos is significantly different from zero at the 0.05 level with a 2-sided test. ^a Best in the comparison (row), or not significantly different from the best in the comparison (row), at the 0.05 level with a 2-sided test. ^b Best in the comparison (row), or not significantly different from the best in the comparison (row), at the 0.10 level with a 2-sided test.


Table 8. Results of Cluster-Robust Causal Forests

Type               Treatment variable                                        Estimate        SE
Photo and review   # of photos and reviews (cum.)                             0.0118         0.1301
Photo              % of food photos (cum.)                                    0.0473 ***     0.0056
                   % of outside photos (cum.)                                 0.0327 **      0.0101
                   % of interior photos (cum.)                                0.0297 *       0.0120
                   % of drink photos (cum.)                                  -0.1958         0.1066
                   % of menu photos (cum.)                                   -0.0272         0.0420
                   % of photos on food and drink (cum.)                      -0.0109         0.0131
                   % of photos with helpful votes (cum.)                      0.0538 ***     0.0037
Review             % of mixed/negative reviews on food (cum.)                -0.0731 ***     0.0129
                   Avg. yearly helpful votes for reviews (cum.)               0.0351 ***     0.0041
                   Avg. review length (cum.)                                 -0.0036 ***     0.0005
                   Avg. star rating (cum.)                                    0.0047 **      0.0016
                   Std. of star ratings (cum.)                               -0.0229 ***     0.0046
Company            Chain                                                      0.0243 ***     0.0024
                   Age <= 1                                                  -0.0719 ***     0.0051
                   Age 2-3                                                   -0.0189 ***     0.0031
                   Age 4-7                                                    0.0057 *       0.0026
                   Age 8-21                                                   0.0255 ***     0.0017
                   Age 22-42                                                  0.0226 ***     0.0019
                   Age > 42                                                   0.0237 ***     0.0025
                   Price level                                               -0.0062 ***     0.0016
                   Mexican                                                    0.0157 ***     0.0027
                   American (traditional)                                     0.0066 **      0.0024
                   Pizza                                                      0.0060 *       0.0028
                   Nightlife                                                  0.0152 ***     0.0026
                   Fast Food                                                  0.0095 ***     0.0028
                   Sandwiches                                                 0.0057 *       0.0027
                   American (new)                                            -0.0034         0.0032
                   Burgers                                                   -0.0018         0.0031
                   Italian                                                   -0.0006         0.0031
                   Chinese                                                    0.0331 ***     0.0029
Competition        # of competitors (one-period)                             -0.0189 ***     0.0044
                   Avg. of competitors' # of photos and reviews (cum.)       -0.0718 ***     0.0196
                   Avg. of competitors' avg. star rating (cum.)              -0.0212 *       0.0093

Additional controls: restaurant quality dimensions, including 1) prop. of reviews mentioning each restaurant quality dimension (food, service, environment, price) (cum.) and 2) avg. sentiment of each dimension (cum.); zip codes; and year dummies.

Notes: *** p<0.001, ** p<0.01, * p<0.05. A positive sign means a positive association with restaurant survival. Each row is a separate cluster-robust causal forests estimation. We rescaled "avg. review length (cum.)," "# of competitors (one-period)," and "avg. of competitors' # of photos and reviews (cum.)" by 1/100. Avg. sentiment of each quality dimension is not included as a control when "avg. star rating" is the treatment variable due to high correlation.


Figure 1. Research Roadmap

Consumer photos
• Content extracted by Yelp & Clarifai deep learning and topic modeling
• Photographic attributes extracted by computer vision
• Caption (sentiment extracted by text mining)
• Volume
• Helpful votes

Controls:

Consumer reviews
• Restaurant quality extracted from reviews by deep learning
• Content extracted by topic modeling
• Volume
• Star rating
• Length
• Helpful votes

Company
• Restaurant age
• Chain status
• Price level
• Cuisine type

Competition
• Competitors with overlapping/non-overlapping cuisine types
• New entries/exits
• Photos and reviews of competitors

Macro
• Year
• Zip code

Outcome: Restaurant survival


Figure 2. Distribution of Life Span of Open vs. Closed Restaurants

Figure 3. Performance AUC Visualization for One-Year-Ahead Prediction: Photos Carry More Predictive Power Than Reviews

Because we cannot directly average ROC curves, each ROC curve in this figure is plotted by pooling together testing observations across years and cross-validation iterations. In contrast, we report the average AUC values across years and cross-validation iterations in Table 5. Hence, the AUC numbers in this figure are slightly different from those in Table 5.
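The pooling step described above can be sketched as follows, using synthetic fold-level scores (all data here are hypothetical) and scikit-learn for the ROC computation:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical out-of-sample labels/scores from three cross-validation iterations.
rng = np.random.default_rng(1)
folds = []
for _ in range(3):
    y = rng.integers(0, 2, size=200)
    scores = y * 0.4 + rng.normal(0, 0.4, size=200)  # noisy but informative scores
    folds.append((y, scores))

# ROC curves cannot be averaged point-wise, so pool the observations first
# and compute a single curve (and AUC) on the pooled sample.
y_all = np.concatenate([y for y, _ in folds])
s_all = np.concatenate([s for _, s in folds])
fpr, tpr, _ = roc_curve(y_all, s_all)
pooled_auc = auc(fpr, tpr)
print("Pooled AUC:", pooled_auc)
```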


Figure 4. Top 35 Predictors of Restaurant Survival in One-Year-Ahead Prediction (SHAP Feature Importance)

Photo: % of food photos (cum.); % of outside photos (cum.); % of photos with helpful votes (cum.); % of interior photos (cum.); avg. yearly helpful votes for photos (cum.); % of photos on food and drink (cum.); # of photos (cum.); avg. yearly helpful votes for photos (one-period)

Review: % of mixed/negative reviews on food (cum.); std. of yearly helpful votes for reviews (cum.); avg. yearly helpful votes for reviews (cum.); avg. review length (cum.); std. of star ratings (cum.); avg. yearly helpful votes for reviews (one-period); std. of review length (cum.); std. of sentiments of service (cum.)

Company: Chain; Age > 21; Age = 1; Age = 0; Price level; Age = 2; Chinese; Fast Food; Age = 3

Competition: # of non-overlapping competitors (one-period); avg. of competitors' # of reviews (one-period); # of overlapping competitors (one-period); avg. of competitors' avg. star rating (cum.); avg. of competitors' # of reviews (cum.); # of new closes (one-period); avg. of competitors' # of photos (cum.); # of new entries (one-period)

Macro: Year 2014; Year 2011

Notes: Importance weights are based on predicting survival in 2015. Within each type of factors (e.g., photo), variables are ordered by their predictive power.


Figure 5. Multiple-Year Survival Prediction Results (AUC): Predictive Power of Photos Lasts Longer Than Reviews

Baseline model includes restaurant characteristics, competitive landscape, and macro conditions. Results are averaged over years and cross-validation iterations. The error bars represent ±1 standard error of each point estimate.


Web Appendixes: Can Consumer-Posted Photos Serve as a Leading Indicator of Restaurant Survival? Evidence from Yelp

Appendix A. Survival Identification and Survival-Related Summary Statistics

Figure A1. Example of Closure Time Identification

Figure A2. Survival Rate by Top 10 Cuisine Types

The error bars represent ±1 standard error of each point estimate.


Figure A3. Survival Rate by State

States are listed alphabetically. The error bars represent ±1 standard error of each point estimate.

To investigate model-free evidence supporting our conjecture that consumer-posted photos may serve as a leading indicator of restaurant survival, we compare the proportions of closed and open restaurants that have: 1) a decreasing number of photos; and 2) a decreasing proportion of photos with helpful votes, in each year. Using the number of photos as an example, for the year 2014 we first identify restaurants that closed in 2014 and those that were still open. We then check whether each restaurant received fewer photos during 2013 than during 2012. Lastly, we calculate the proportion of restaurants with a decreasing number of photos among closed and open restaurants, respectively. To make our comparisons more reliable, we conduct them only in years with at least 20 open and at least 20 closed restaurants. This comparison is similar in spirit to a difference-in-differences approach: last year vs. the year before last, and closed vs. open. Such an approach is superior to directly plotting the total numbers of photos for open vs. closed restaurants because it takes the baseline differences between the two groups of restaurants into account.

Figure A4 shows the comparison results. The red dots representing proportions among closed restaurants are generally above the green triangles representing proportions among open restaurants. This pattern indicates that, by and large, restaurants near closure are more likely than open restaurants to have decreasing photo volume and a decreasing proportion of photos with helpful votes. These comparisons provide model-free evidence that photos may be predictive of restaurant survival after the macro year trend is controlled for.

Figure A4. Photo Volume and Helpful Votes Trend Comparisons Across Years

The error bars represent ±1 standard error of each point estimate.
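The year-level comparison procedure above can be sketched with a toy panel (all column names and values are hypothetical):

```python
import pandas as pd

# Toy restaurant-year panel: photo counts in the two years preceding 2014
# and whether the restaurant closed in 2014.
panel = pd.DataFrame({
    "restaurant": ["A", "B", "C", "D"],
    "photos_2012": [10, 4, 7, 2],
    "photos_2013": [6, 5, 3, 2],
    "closed_2014": [True, False, True, False],
})

# Flag restaurants whose photo volume declined from 2012 to 2013, then
# compare the proportion of decliners among closed vs. still-open restaurants.
panel["decreasing"] = panel["photos_2013"] < panel["photos_2012"]
by_status = panel.groupby("closed_2014")["decreasing"].mean()
print(by_status)
```

In this toy data, every closed restaurant saw a decline while no open restaurant did, mimicking the pattern in Figure A4.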

As a benchmark, we compare the proportions of closed and open restaurants that have: 1) a decreasing number of reviews; and 2) a decreasing proportion of reviews with helpful votes, in each year. Figure A5 shows that restaurants near closure are more likely to have decreasing review volume than open restaurants. However, this trend is not as apparent for the proportion of reviews with helpful votes, indicating that the proportion of reviews with helpful votes may not be as predictive of restaurant survival as its counterpart in photos.

Figure A5. Review Volume and Helpful Votes Trend Comparison Across Years

The error bars represent ±1 standard error of each point estimate.

We also compare the proportions of closed and open restaurants that have: 1) a decreasing number of photos; and 2) a decreasing proportion of photos with helpful votes, at each restaurant age. For example, for age = 3, we first identify restaurants that closed at age 3 and those that were still open at that age. We then check whether each restaurant received fewer photos at age 2 than at age 1. Lastly, we calculate the proportion of restaurants with a decreasing number of photos among closed and open restaurants, respectively. To make the comparisons reliable, we conduct them only at ages with at least 20 open and at least 20 closed restaurants. Because we compare volume and helpful votes over the last two years, we also restrict attention to restaurants that survived for more than two years and had been listed on Yelp for more than two years. Based on all considerations above, we plot comparisons for ages between 3 and 11 in Figures A6 and A7.

Similar to the comparison across calendar years, Figure A6 shows that the red dots representing proportions among closed restaurants are generally above the green triangles representing proportions among open restaurants. These patterns again indicate that, by and large, restaurants near closure are more likely than open restaurants to have decreasing photo volume and a decreasing proportion of photos with helpful votes. These comparisons also provide model-free evidence that photos may be predictive of restaurant survival after controlling for restaurant age.

Figure A6. Photo Volume and Helpful Votes Trend Comparisons Across Ages

The error bars represent ±1 standard error of each point estimate.

As a benchmark, we also compare the proportions of closed and open restaurants that have: 1) a decreasing number of reviews; and 2) a decreasing proportion of reviews with helpful votes, at each restaurant age. Figure A7 shows that restaurants near closure are more likely to have decreasing review volume than open restaurants. Nevertheless, this trend is not apparent for the proportion of reviews with helpful votes, indicating again that the proportion of reviews with helpful votes may not be as predictive of restaurant survival as its counterpart in photos.

Figure A7. Review Volume and Helpful Votes Trend Comparison Across Ages


Appendix B. Photo Analysis

Clarifai Labels

Clarifai, a computer vision company founded by Matthew Zeiler, author of "Visualizing and Understanding Convolutional Networks" (Zeiler and Fergus 2014), has been a market leader in photo content detection since winning the top five places in image classification at the ImageNet 2013 competition. We use the Clarifai "Food" model to identify objects in food and drink photos, following the photo classification on Yelp. The "Food" model is designed to process photos of food and drink.1 Upon comparing the APIs provided by Google, Microsoft, Amazon, and Clarifai, we find that Clarifai provides the most refined labels for food. For example, among these APIs, only Clarifai could label content as specific as "strawberry" in a photo. Figure A8 provides four examples of food and drink photos with labels and probabilities provided by the Clarifai "Food" model. The Clarifai API does a good job of recognizing a large variety of food ingredients in a photo.

Figure A8. Examples of Clarifai Labels for Food and Drink Photos

Photo 1 (pizza): pizza 1, crust 0.999, pepperoni 0.999, dough 0.999, sauce 0.998, mozzarella 0.998, tomato 0.994, salami 0.99, basil 0.77, meat 0.715
Photo 2 (sushi): sushi 1, salmon 0.998, nori 0.99, seafood 0.983, rice 0.952, sashimi 0.941, fish 0.94, caviar 0.911, shrimp 0.903, nigiri 0.85
Photo 3 (salad): corn 1, corn salad 0.987, vegetable 0.978, salad 0.977, onion 0.895, tomato 0.872, lettuce 0.615
Photo 4 (drink): strawberry 1, fruit 0.998, berry 0.998, mint 0.997, refreshment 0.991, ice 0.963, glass 0.943, juice 0.941

We use the Clarifai “General” model for interior, outside, menu, and other photos as per the classification of Yelp. The ‘General’ model recognizes over 11,000 different labels.2 Figure A9 shows four examples of restaurant photos with labels and probabilities provided by the Clarifai “General” model.

1 https://clarifai.com/models/food-image-recognition-model-bd367be194cf45149e75f01d59f77ba7 2 https://clarifai.com/models/general-image-recognition-model-aaa03c23b3724a16a56b629203edc62c


Figure A9. Examples of Clarifai Labels for Interior and Outside Restaurant Photos

Photo 1 (interior): people 0.993, restaurant 0.991, adult 0.977, man 0.966, dining 0.957, indoors 0.941, drink 0.933, woman 0.93, table 0.93, wine 0.882, group 0.879, candle 0.866, service 0.861
Photo 2 (interior): restaurant 0.988, dining 0.983, table 0.974, people 0.954, interior design 0.949, chair 0.937, indoors 0.914, bar 0.9, man 0.875, group 0.875, seat 0.852, woman 0.846, cafeteria 0.842
Photo 3 (outside): street 0.98, city 0.968, no person 0.957, urban 0.952, building 0.932, commerce 0.9, restaurant 0.9, light 0.896, window 0.876, town 0.861
Photo 4 (outside): outdoors 0.979, no person 0.972, sky 0.949, light 0.942, dusk 0.931, evening 0.922, nightlife 0.916, city 0.915, street 0.886

We include only labels with more than 50 percent confidence according to the Clarifai API, resulting in a total of 5,080 unique labels. Table A1 shows the top 100 Clarifai labels.

Table A1. Top 100 Clarifai Labels

sauce, shrimp, pizza, lettuce, salsa, chicken, table, avocado, seat, street, cheese, people, drink, lamb, milk, pork, onion, cheddar, ice, duck, meat, pepper, wine, outdoors, group, bacon, chocolate, text, pasta, corn, beef, cake, city, mushroom, house, vegetable, egg, tuna, tacos, toast, garlic, butter, plate, dinner, apple, rice, salmon, coffee, ham, mozzarella, salad, seafood, refreshment, chips, ginger, sweet, sandwich, lemon, dairy product, shop, cream, bar, chair, bun, pastry, bread, chili, turkey, lunch, sign, potato, sausage, beans, curry, cuisine, food, room, tea, dish, window, indoors, soup, spinach, inside, basil, steak, beer, lobster, hotel, noodle, tomato, meal, sushi, honey, light, fish, crab, pie, ramen, lime
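The 50%-confidence filter described above, plus one plausible per-photo "variety" count based on the retained labels (our illustration, not necessarily the paper's exact operationalization), can be sketched as:

```python
# Hypothetical Clarifai-style response for one photo: label -> confidence.
labels = {"pizza": 1.0, "crust": 0.999, "basil": 0.77, "meat": 0.715, "fork": 0.31}

# Keep only labels detected with more than 50 percent confidence.
kept = [name for name, conf in labels.items() if conf > 0.5]
print(kept)  # ['pizza', 'crust', 'basil', 'meat']

# A simple per-photo "variety" proxy: the number of distinct retained labels.
variety = len(set(kept))
print(variety)  # 4
```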


Topic Modeling of Clarifai Labels

We randomly sampled 10,000 photos to calibrate the LDA model for topic modeling, with 80% for training and 20% for testing. We implemented the LDA model using Gensim (Rehurek and Sojka 2010), a widely used Python library for topic modeling, with the online variational inference algorithm for training (Hoffman, Bach, and Blei 2010). We used all 1,281 labels that Clarifai identified in more than 0.05% of photos; removing rare labels reduces the risk that the results are influenced by outlier labels (Netzer et al. 2019; Tirunillai and Tellis 2014). We ran the model with 2 to 40 topics on the training dataset and found that the fitted LDA with 10 topics yielded the highest topic coherence score (Röder et al. 2015) on the testing dataset. Table A2 presents the 10 topics and the most representative words for each topic based on the relevance score calculated using λ = 0.5 (Sievert and Shirley 2014). Intuitively, the representative words for a topic tend to appear together (Tirunillai and Tellis 2014).

Table A2. The 10 LDA Topics and Representative Words with Highest Relevance
(Topics are listed in an arbitrary order)

ID  Topic name             Representative words with highest relevance
1   Symbol and decoration  illustration, symbol, sign, design, retro, art, image, text, vintage, decoration
2   Seafood                fish, seafood, salmon, tuna, sushi, shrimp, crab, sashimi, lobster, nigiri
3   Text and information   page, text, number, document, paper, time, information, form, order, writing
4   Meat                   pork, beef, chicken, steak, lamb, meat, duck, rib, broth, tenderloin
5   Inside                 chair, indoors, room, seat, table, bar, inside, dining, interior design, counter
6   People                 portrait, people, music, recreation, festival, wear, performance, group, celebration, facial expression
7   Dessert                chocolate, cake, cream, ice, tea, sweet, coffee, strawberry, ice cream, dessert
8   Outside                outdoors, city, street, road, vehicle, light, urban, daylight, entrance, sky
9   Food and drink         meal, refreshment, plate, dinner, lunch, dish, cuisine, fruit, glass, drink
10  Sandwich and pizza     cheese, bacon, sandwich, bread, cheddar, pizza, sausage, chicken, tomato, ham

In the extrapolation step, we applied the calibrated LDA parameters to extract the topic distribution for each photo in our entire data set of 755,758 photos. Following the standard approach (Netzer et al. 2019), a 10-dimensional topic distribution vector was generated for each photo. Each dimension represented the empirical percentage of Clarifai labels assigned to a topic, with all percentages summing up to 1. For example, the topic distribution for one photo could be [5%, 5%, 15%, 15%, 10%, 10%, 0%, 0%, 20%, 20%]. We then used the average topic distribution across photos for each restaurant-year as inputs for our prediction model.
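The aggregation into restaurant-year features can be sketched as follows; the helper name and toy data are hypothetical.

```python
# Minimal sketch of the aggregation step: average per-photo topic
# distributions within each (restaurant, year) cell. Data are made up.
from collections import defaultdict

def average_topic_distributions(photo_topics):
    """photo_topics: list of ((restaurant_id, year), [10 topic shares]) pairs."""
    sums = defaultdict(lambda: [0.0] * 10)
    counts = defaultdict(int)
    for key, dist in photo_topics:
        counts[key] += 1
        for i, p in enumerate(dist):
            sums[key][i] += p
    return {key: [s / counts[key] for s in sums[key]] for key in sums}

photos = [
    (("r1", 2012), [0.5] + [0.0] * 8 + [0.5]),
    (("r1", 2012), [0.3] + [0.0] * 8 + [0.7]),
]
features = average_topic_distributions(photos)
# The restaurant-year feature is the mean of its photos' distributions.
assert abs(features[("r1", 2012)][0] - 0.4) < 1e-9
```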


Photographic Attributes

Table A3. Definitions of Photographic Attributes

Color
- Brightness: the overall lightness or darkness of the photo.
- Saturation: color purity. Pure colors in a photo tend to be more appealing than dull or impure ones.
- Contrast: the difference in brightness between regions.
- Clarity: the proportion of pixels with enough brightness.
- Warm hue: the proportion of warm color in a photo.
- Colorfulness: distinguishes multi-colored photos from monochromatic, sepia, or simply low-contrast photos.

Composition
- Diagonal dominance: measures how close the main region in the photo is positioned to the diagonal lines.
- Rule of thirds: related to the "golden ratio" (about 0.618). The photo is divided into nine equal segments by two vertical and two horizontal lines; the rule of thirds says that the most important elements are positioned near the intersections.
- Physical visual balance: measures how symmetrical a photo is around its central vertical line (horizontal physical balance) or central horizontal line (vertical physical balance).
- Color visual balance: measures how symmetrical the colors of a photo are around its central vertical line (horizontal color balance) or central horizontal line (vertical color balance).

Figure-ground relationship
- Size difference: measures how much bigger the figure is than the ground.
- Color difference: measures how different the color of the figure is from that of the ground.
- Texture difference: measures how different the texture of the figure is from that of the ground.
- Depth of field: the range of distance from a camera that is acceptably sharp in a photo.

Color

For the first five color features, we first convert photos from RGB space to HSV space, in which each pixel is a 3-dimensional vector representing hue, saturation, and value. In the following, we explain how we extract each color attribute.

1. Brightness is the average of the value dimension of HSV across pixels (Datta et al. 2006). We normalize brightness to be between 0 and 1, with a higher score meaning brighter.
2. Saturation is the average of the saturation dimension across pixels (Datta et al. 2006). We normalize saturation to be between 0 and 1, with a higher score meaning more saturated.
3. Contrast (of brightness) is calculated as the standard deviation of the value dimension of HSV across pixels. We normalize the score to be between 0 and 1, with a higher score meaning higher contrast.
4. Clarity. We first normalize the value dimension of HSV to be between 0 and 1 and define a pixel to be of enough clarity if its value exceeds 0.7. Clarity is then defined as the proportion of pixels with enough clarity.
5. Warm hue. Following Wang et al. (2006), the warm-hue level of a photo is the proportion of warm-hue (i.e., red, orange, yellow) pixels in the photo.
6. Colorfulness. We follow Hasler and Suesstrunk (2003) to measure colorfulness for each photo based on the RGB vector of each pixel. We normalize this metric to be between 0 and 1, with a higher score meaning more colorful.
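A minimal sketch of three of these color attributes, computed from raw RGB pixels with the standard library only; the toy image is made up, and only the 0.7 clarity threshold comes from the text.

```python
# Illustrative computation of brightness, saturation, and clarity from raw
# RGB pixels via the RGB-to-HSV conversion described above. The four-pixel
# "photo" below is hypothetical.
import colorsys

def color_attributes(rgb_pixels):
    """rgb_pixels: list of (r, g, b) tuples with channels in [0, 1]."""
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in rgb_pixels]
    values = [v for _, _, v in hsv]
    sats = [s for _, s, _ in hsv]
    brightness = sum(values) / len(values)                # mean HSV value
    saturation = sum(sats) / len(sats)                    # mean HSV saturation
    clarity = sum(v > 0.7 for v in values) / len(values)  # share of bright pixels
    return brightness, saturation, clarity

# Two pure-white pixels and two black pixels.
b, s, c = color_attributes([(1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0)])
assert (b, s, c) == (0.5, 0.0, 0.5)
```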


Composition

To compute the four composition attributes, we first assign each pixel a saliency score based on computer vision methods (Montabone and Soto 2010). We then separate a photo into 10 segments using the superpixel algorithm (Mori et al. 2004; Ren and Malik 2003). The segment with the highest average saliency is identified as the salient region (Zhang et al. 2018). Identifying the salient region is necessary because diagonal dominance and the rule of thirds are defined with respect to the main element of a photo. In the following, we explain how we extract each composition attribute.

1. Diagonal dominance. We calculate the distance between the center of the salient region and each of the two diagonals of a photo. Diagonal dominance is then defined as the negative of the minimum of the two distances (Wang et al. 2013). We normalize this score to be between 0 and 1, with a higher score meaning stronger diagonal dominance.
2. Rule of thirds. We calculate the distance from the center of the salient region to each of the four intersections of the two horizontal lines and the two vertical lines that evenly divide the photo into nine parts (Datta et al. 2006). The rule of thirds is then defined as the negative of the minimum of the four distances. We normalize this score to be between 0 and 1, with a higher score meaning a stronger rule of thirds.
3. Physical visual balance. We calculate two scores: vertical and horizontal physical visual balance. We first calculate the weighted center of the photo by weighting the centers of the segments by their respective saliency. The vertical physical visual balance is then defined as the negative of the vertical distance between the weighted center and the horizontal line that divides the photo into two equal parts. The horizontal physical visual balance is defined as the negative of the horizontal distance between the weighted center and the vertical line that divides the photo into two equal parts (Wang et al. 2013). We normalize the two scores to be between 0 and 1, with a higher score meaning higher balance.
4. Color visual balance. We calculate two scores: vertical and horizontal color visual balance. Following Wang et al. (2013), to measure vertical color balance, we first separate the photo into two equal parts by a horizontal line; a pixel pair consists of a pixel on the top part and its symmetric counterpart on the bottom part. The vertical color balance is then defined as the negative of the average Euclidean distance across pixel pairs. To measure horizontal color balance, we first separate the photo into two equal parts by a vertical line; a pixel pair consists of a pixel on the left part and its symmetric counterpart on the right part. The horizontal color balance is then defined as the negative of the average Euclidean distance across pixel pairs. We normalize the two scores to be between 0 and 1, with a higher score meaning higher color balance.
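The geometry behind diagonal dominance and the rule of thirds can be illustrated as follows, assuming the salient-region center is already known. The unit-square coordinates and function names are illustrative; the paper further negates and normalizes these distances.

```python
# Geometric sketch of two composition attributes for a photo rescaled to the
# unit square, given the center (cx, cy) of the salient region (the saliency
# and superpixel steps are assumed to have run already).
import math

def rule_of_thirds_distance(cx, cy):
    """Min distance from (cx, cy) to the four one-third intersections."""
    points = [(x, y) for x in (1/3, 2/3) for y in (1/3, 2/3)]
    return min(math.hypot(cx - x, cy - y) for x, y in points)

def diagonal_distance(cx, cy):
    """Min perpendicular distance from (cx, cy) to the two diagonals
    y = x and y = 1 - x of the unit square."""
    d_main = abs(cy - cx) / math.sqrt(2)
    d_anti = abs(cx + cy - 1) / math.sqrt(2)
    return min(d_main, d_anti)

# A salient region centered on a thirds intersection has distance 0
# (strongest rule of thirds); the photo center lies on both diagonals.
assert rule_of_thirds_distance(1/3, 2/3) == 0.0
assert diagonal_distance(0.5, 0.5) == 0.0
```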

Figure-ground relationship

Figure refers to the foreground of a photo, and ground refers to its background. For the first three figure-ground relationship features, we first use the GrabCut algorithm (Rother et al. 2004) to identify the figure and background of each photo. In the following, we explain how we extract each figure-ground attribute.

1. Size difference. We take the difference between the number of pixels in the figure and that in the background, normalized by the total number of pixels in the photo (Wang et al. 2013).
2. Color difference. We first calculate the average RGB vectors for the figure and the ground. The color difference is then the Euclidean distance between the two RGB vectors (Wang et al. 2013). We normalize the score to be between 0 and 1, with a higher score meaning a bigger figure-ground color difference.
3. Texture difference. We use the Canny edge detection algorithm (Canny 1987) to detect edges in the figure and the background and compute the density of edges in each. The texture difference is the absolute value of the difference between the figure edge density and the background edge density. This score is normalized to be between 0 and 1, with a higher score meaning a bigger difference.
4. Depth of field. Professional photos usually use a low depth of field to emphasize the most important element in the photo; a photo with a low depth of field is typically sharp in the center and out of focus in the surrounding area. We divide the photo into 16 equal regions. Following Datta et al. (2006), we compute the high-frequency Daubechies wavelet coefficients (Daubechies 1992) for each HSV dimension of a photo. We then calculate the depth of field by dividing the sum of the wavelet coefficients of the center four regions by the sum over the whole photo, for each HSV dimension. This yields three depth-of-field scores: depth of field (hue), depth of field (saturation), and depth of field (value). A higher score means a lower depth of field.
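A toy computation of the size- and color-difference attributes, assuming a figure/ground segmentation (e.g., from GrabCut) is already available as a boolean mask; all numbers are hypothetical and the final normalization is omitted.

```python
# Size difference and color difference between figure and ground, given a
# precomputed segmentation mask. The four-pixel "photo" below is made up.
import math

def size_difference(figure_mask):
    """figure_mask: flat list of booleans (True = figure pixel)."""
    n_fig = sum(figure_mask)
    n_total = len(figure_mask)
    return (n_fig - (n_total - n_fig)) / n_total

def color_difference(pixels, figure_mask):
    """Euclidean distance between mean figure and mean ground RGB vectors."""
    fig = [p for p, m in zip(pixels, figure_mask) if m]
    gnd = [p for p, m in zip(pixels, figure_mask) if not m]
    mean = lambda pts: [sum(c) / len(pts) for c in zip(*pts)]
    return math.dist(mean(fig), mean(gnd))

mask = [True, True, True, False]                        # 3 figure px, 1 ground px
pixels = [(1, 0, 0), (1, 0, 0), (1, 0, 0), (0, 0, 0)]   # red figure, black ground
assert size_difference(mask) == 0.5                     # (3 - 1) / 4
assert color_difference(pixels, mask) == 1.0            # |(1,0,0) - (0,0,0)|
```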

Table A4. Summary Statistics of Photographic Attributes

Attribute                      Count     Mean   Std. dev.  Min    Max
Color
  Brightness                   755,758   0.53   0.13       0.00   1.00
  Saturation                   755,758   0.43   0.15       0.00   1.00
  Contrast                     755,758   0.47   0.11       0.00   0.99
  Clarity                      755,758   0.31   0.20       0.00   1.00
  Warm hue                     755,758   0.85   0.17       0.00   1.00
  Colorfulness                 755,758   0.25   0.10       0.00   1.00
Composition
  Diagonal dominance           755,758   0.52   0.17       0.00   1.00
  Rule of thirds               755,758   0.56   0.18       0.00   1.00
  Vertical physical balance    755,758   0.94   0.06       0.00   1.00
  Horizontal physical balance  755,758   0.90   0.08       0.00   1.00
  Vertical color balance       755,758   0.43   0.08       0.00   1.00
  Horizontal color balance     755,758   0.44   0.08       0.02   1.00
Figure-ground relationship
  Size difference              755,758   0.45   0.19       0.00   1.00
  Color difference             755,758   0.21   0.14       0.00   1.00
  Texture difference           755,758   0.08   0.07       0.00   1.00
  Depth of field (hue)         755,758   0.25   0.13       0.00   1.00
  Depth of field (saturation)  755,758   0.30   0.09       0.00   1.00
  Depth of field (value)       755,758   0.32   0.09       0.00   0.99


Photo Caption Sentiments We use VADER (Hutto and Gilbert 2014) sentiment analysis to analyze photo captions. We find that photo captions fall into three categories: neutral dish-name captions, positive captions, and negative captions. Table A5 shows example captions and their sentiment scores for each category. Figure A10 shows the distribution of caption sentiment.
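The categorization step can be sketched as follows, assuming each caption already has a VADER compound score in [-1, 1] (the library call itself is omitted here). The zero cutoffs are illustrative, consistent with dish-name captions scoring exactly 0 in Table A5.

```python
# Sketch of caption categorization from precomputed VADER compound scores.
# The example scores are taken from Table A5.
def categorize_caption(compound):
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"  # dish-name captions score exactly 0

scored = {
    "Red velvet pancakes": 0.0,
    "Kiffka pita so good! :)": 0.6009,
    "This place sucks. Flat beer bad service had to leave": -0.7351,
}
labels = {caption: categorize_caption(s) for caption, s in scored.items()}
assert labels["Red velvet pancakes"] == "neutral"
assert labels["Kiffka pita so good! :)"] == "positive"
```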

Table A5. Examples of Caption Classification and Sentiment Scores

Category            Caption                                                     Sentiment
Dish-name captions  Sam & Emma Sandwich                                         0
                    Red velvet pancakes                                         0
                    Our sushi boat                                              0
                    Vegetables                                                  0
                    Mandarin Kung Pao Chicken                                   0
Positive captions   Inspiration + food - love it!                               0.8356
                    I love being transported to another country by their food!  0.6696
                    Kiffka pita so good! :)                                     0.6009
Negative captions   This place sucks. Flat beer bad service had to leave        -0.7351
                    Can i get a bag of salt to add to my sodium overload?       -0.3612

Figure A10. Distribution of Extracted Caption Sentiment


Appendix C. Review Analysis

Restaurant Quality Dimensions We define the four quality dimensions in Table A6, based on prior literature on restaurant quality dimensions (Bujisic et al. 2014; Hyun 2010; Ryu et al. 2012). We first label 10,000 reviews through an MTurk survey. After an introduction to the definition of each restaurant quality dimension, each consumer was presented with 20 screens of reviews, one review per screen. The 20 reviews were randomly selected from the 10,000 reviews. See Figure A11 for a screenshot of the survey interface. On average, it took 20 minutes for each consumer to finish the survey. We removed respondents who finished the survey in less than 5 minutes, given that these participants might have skimmed through each review too quickly to provide reliable responses. As a result, each review was read by eight consumers on average. We instructed each consumer to answer whether a review mentioned the food, service, environment, or price of the restaurant and, if so, whether the specific content was positive or negative on a 7-point Likert scale, with 1 being "extremely negative" and 7 being "extremely positive." The average of the sentiment votes was used to label the sentiment for each quality dimension mentioned in each review.

Table A6. Definitions of Restaurant Quality Dimensions Extracted from Reviews

Dimension    Definition
Food         food, drink, menu variety, nutrition, healthiness, plating, food presentation, serving size, freshness, etc.
Service      employee behavior, attitude, responsiveness, reliability, tangibility, empathy, assurance, process speed, wait time, etc.
Environment  interior/exterior design, décor, cleanliness, ambience, aesthetics, lighting, layout, table setting, employee appearance, location, etc.
Price        good/bad value for money, price of items, etc.

Figure A11. Restaurant Review Survey

Review 1 of 20. This is my first time trying this place and everything was good. We got the tom yum shrimp, duck pad thai and mango sticky rice. Service was great too. Will definitely come back again.

We use a deep learning model to extract the following four quality dimensions from reviews: food, service, environment, and price. Each dimension has two labels: whether the dimension is mentioned and the sentiment of the dimension. To increase the accuracy of the text-based deep learning model, we follow the standard procedure (Timoshenko and Hauser 2019) and use pre-trained word embeddings as inputs for our text-based CNN. Specifically, we use GloVe (Pennington et al. 2014) word embeddings, which provide an effective measure of the linguistic or semantic similarity of the corresponding words. Under GloVe, each word is represented by a 200-dimensional vector. Each review is then represented by a 200 × (number of words) matrix, which serves as the input for the text-based CNN.

Figure A12 depicts the structure of the text-based CNN. The left panel of the figure shows the entire structure; the right panel depicts the detailed structure of the ConvBlock (the building block of the text-based CNN). The model is constructed in a way that is similar to the photo-based VGG model (Simonyan and Zisserman 2014), replacing 2-dimensional convolution with 1-dimensional convolution. The idea is borrowed from Kim (2014), who shows that text-based CNN structures adapted from established image-based CNN structures perform well on several benchmark datasets. We also add more convolutions and pooling operations because our text is relatively long. The input goes through five convolutional blocks (ConvBlock), one global average pooling layer, and two full connections (FC). The guiding principle of convolution is to identify the features that best predict the output variable. The primary function of pooling is dimension reduction and some degree of shift invariance (LeCun et al. 1998); average pooling calculates the average over the previous layer. The full connections can be regarded as a classification task that links the last layer before the full connection with the output variable.
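The building blocks just described (1-D convolution, ReLU, average pooling) can be demonstrated numerically. The shapes and random weights below are toy-sized stand-ins, not the paper's architecture (real inputs are 200-dimensional GloVe vectors).

```python
# Numerical sketch of the text-CNN building blocks: a 1-D convolution over
# the word-embedding sequence, ReLU, and global average pooling.
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, n_filters, width = 12, 8, 4, 3

x = rng.normal(size=(seq_len, emb_dim))           # one review: words x embedding
w = rng.normal(size=(n_filters, width, emb_dim))  # convolution filters

# 1-D convolution (valid padding): slide each filter along the word axis and
# take the elementwise product summed over the window.
conv = np.array([
    [np.sum(w[f] * x[t:t + width]) for t in range(seq_len - width + 1)]
    for f in range(n_filters)
])
relu = np.maximum(conv, 0.0)   # non-linearity: negatives clipped to zero
pooled = relu.mean(axis=1)     # global average pooling over word positions

assert conv.shape == (n_filters, seq_len - width + 1)
assert pooled.shape == (n_filters,)
```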

Figure A12. Structure of Text-based CNN

Convolution: feature extraction. A 1×3 weight matrix ("filter") multiplies with each segment of the previous layer, and the dot product becomes an element of the next layer. Different filters capture different information from the previous layer. ReLU: non-linear transformation that turns all negative values in the previous layer to zero. Max pooling: subsampling method. A 1×2 window glides over the previous layer, and only the maximum value in each window is kept as an element of the next layer.

We employ a multitasking structure. As suggested by the deep learning literature (see Ruder (2017) for a review), multitasking can increase the accuracy of a deep learning model, intuitively because the information across tasks may be complementary. A multitasking model structure is also more efficient than a single-task structure. In our context, there are eight tasks to predict for each review: whether the review mentions food, service, environment, and price, and whether the sentiment toward food, service, environment, and price is positive or negative. Under a single-task text-based CNN, we would train eight separate deep learning models; with a multitasking structure, we only need to train one. To implement the multitask CNN, we link the layer after global average pooling with a separate full connection for each task, generating a score for each task.

For the tasks that identify whether the review mentions one or more dimensions of restaurant quality, the label $s_r^k = 1$ if review $r$ mentions the quality dimension for task $k$, and $s_r^k = 0$ otherwise, with $k = 1, 2, 3, 4$ and $r = 1, 2, \ldots, 8000$. For the other four tasks ($k = 5, 6, 7, 8$) that label the sentiment of each quality dimension, only reviews that mention the quality dimension are kept for the respective sentiment task. For the sentiment of each quality dimension, we follow the conventional procedure as in Liu et al. (2019), Zhang et al. (2018), and Zhang et al. (2015) and convert the 7-point Likert scale sentiment to binary levels to mitigate potential noise in the data. In particular, a sentiment label is defined as positive ($s_r^k = 1$) if it receives a rating above 4, negative ($s_r^k = 0$) if the rating is below 4, and is disregarded if the rating equals 4.

The loss function is defined in Equation (A1), which sums the loss over the eight tasks, with $\hat{s}_r^k$ being the predicted probability that $s_r^k = 1$:

$$Loss = -\sum_{r=1}^{8000} \sum_{k=1}^{8} \left[ s_r^k \ln(\hat{s}_r^k) + (1 - s_r^k) \ln(1 - \hat{s}_r^k) \right] \quad (A1)$$

This specification is called "cross-entropy" in the computer science literature and is equivalent to the negative log-likelihood.

We randomly split the 10,000 reviews into 80% for calibration and 20% for out-of-sample testing. In the calibration process, the text of each review is treated as model input, and the outputs are the eight quality dimension scores labeled by the MTurk survey. Parameters are optimized via stochastic gradient descent by minimizing the above loss function. We validate the calibrated text-based CNN on the holdout dataset; it yields good AUC scores for all eight tasks, as shown in Table A7. We then extrapolate the calibrated CNN to extract the eight quality dimension scores for all reviews in our entire dataset of 1,121,069 reviews. Each review goes through the layers of the text-based CNN, with the review text and calibrated parameters as inputs and eight numerical scores between 0 and 1, representing the mentions and sentiments of the four restaurant quality dimensions, as outputs. The distribution of the extracted quality scores is shown in Table A8. The calibration and extrapolation of the text-based CNN were implemented using TensorFlow, a deep learning library within Python.
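Equation (A1) can be evaluated directly; the toy labels and predictions below are made up (the real sums run over 8,000 reviews and 8 tasks).

```python
# Direct evaluation of the multitask cross-entropy loss in Equation (A1).
import math

def cross_entropy_loss(labels, preds):
    """labels[r][k] in {0, 1}; preds[r][k] in (0, 1)."""
    return -sum(
        s * math.log(p) + (1 - s) * math.log(1 - p)
        for lab_row, pred_row in zip(labels, preds)
        for s, p in zip(lab_row, pred_row)
    )

labels = [[1, 0]]        # one review, two tasks
preds = [[0.9, 0.2]]     # predicted probabilities that the label is 1
loss = cross_entropy_loss(labels, preds)
# loss = -(ln 0.9 + ln 0.8): confident correct predictions give a small loss.
assert abs(loss + math.log(0.9) + math.log(0.8)) < 1e-12
```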

Table A7. Out-of-Sample Testing Performance of the Text-based CNN

Task                    AUC     Valid samples for testing
Mention: Food           0.9187  1990
Mention: Service        0.9163  1986
Mention: Environment    0.8814  1980
Mention: Price          0.9449  1993
Sentiment: Food         0.9515  1721
Sentiment: Service      0.9627  1325
Sentiment: Environment  0.8841  781
Sentiment: Price        0.9139  552

Table A8. Summary Statistics of Restaurant Quality Dimension Scores

Score                   Count      Mean    Std. dev.  Min  Max
Mention: Food           1,121,069  0.9714  0.1667     0    1
Mention: Service        1,121,069  0.9998  0.0141     0    1
Mention: Environment    1,121,069  0.4043  0.4908     0    1
Mention: Price          1,121,069  0.2570  0.4370     0    1
Sentiment: Food         1,089,039  0.7726  0.3903     0    1
Sentiment: Service      1,120,880  0.7249  0.4025     0    1
Sentiment: Environment  453,203    0.8311  0.3384     0    1
Sentiment: Price        288,070    0.6017  0.4512     0    1

Topic Modeling of Reviews For the topic modeling of reviews, we also calibrate a latent Dirichlet allocation (LDA) model (Blei et al. 2003) to summarize the reviews' content, using the same procedure as for the topic modeling of photos. We randomly sample 10,000 reviews to calibrate the LDA model, with 80% for training and 20% for testing. We implement the LDA model using GENSIM (Rehurek and Sojka 2010), a widely used Python library for topic modeling, with the online variational inference algorithm for training (Hoffman et al. 2010). We use all the 5,330 words3 that appeared in more than 0.05% of reviews; removing rarer words reduces the risk that the results are influenced by outlier words (Netzer et al. 2019; Tirunillai and Tellis 2014). We run the model with 2 to 40 topics on the training dataset and find that the fitted LDA yields the highest topic coherence score (Röder et al. 2015) on the testing dataset when the number of topics equals 20. Table A9 presents the 20 topics and the most representative words for each topic based on the relevance score calculated using λ = 0.5 (Sievert and Shirley 2014). Intuitively, the representative words for a topic tend to appear together (Tirunillai and Tellis 2014).

Table A9. The 20 LDA Topics and Representative Words with Highest Relevance (Topics are listed in an arbitrary order)

ID  Topic name                      Representative words with highest relevance
1   Indian food                     Indian, naan, masala, tikka, paneer, mais, frites, tater, tot, meatloaf
2   Mixed/negative reviews on food  Sauce, chicken, flavor, salad, cheese, shrimp, steak, taste, didnt, wasnt
3   Japanese food                   Sushi, roll, fish, Japanese, tuna, tofu, tempura, sashimi, spicy, chef
4   Breakfast and brunch            Dog, toast, brunch, hot, juice, omelette, fresh, scramble, local, french
5   Satisfaction                    Great, food, service, excellent, friendly, good, recommend, staff, highly, love
6   Love for a place                Best, place, ever, food, love, try, eat, phoenix, town, always
7   Happy hour and price            Happy, hour, slider, discount, alcohol, Monday, free, drink, fondue, price
8   Italian food                    Bread, tomato, garlic, mozzarella, feta, meatball, Italian, crust, marinara, pasta
9   Buffet                          Buffet, Vega, pasta, wine, squid, seafood, station, tray, din, selection
10  Breakfast                       Coffee, breakfast, egg, pancake, waffle, biscuit, bacon, omelet, morning, benedict
11  Dissatisfaction                 Floor, response, horrible, needless, boring, crew, host, havent, bore, service
12  Mexican food                    Chip, Mexican, salsa, enchilada, tapa, band, tortilla, fajitas, queso, guacamole
13  Sandwich                        Sandwich, beef, pork, turkey, bbq, chicken, sub, roast, meat, lettuce
14  Dessert                         Dessert, pie, cake, chocolate, appetizer, creme, cheesecake, peanut, course, cream
15  General service                 Order, ask, get, take, minute, say, wait, come, didnt, table
16  Fast food                       Burger, shake, fry, pizza, bun, hamburger, topping, Grimaldis, smash, gourmet
17  Bar                             Beer, bar, game, watch, night, bartender, selection, sport, tap, place
18  Asian food                      Thai, noodle, pho, curry, Chinese, ramen, rice, bowl, pad, Vietnamese
19  Specific service                Kid, time, wait, long, service, issue, table, hostess, waiter, refill
20  Special event                   Birthday, venue, football, reward, gift, private, cater, celebrate, sing, thanksgiving

In the extrapolation step, we apply the calibrated LDA parameters to extract the topic distribution for each review in our entire dataset of reviews. A 20-dimensional topic distribution vector is generated for each review. Each dimension represents the empirical percentage of words assigned to a topic, with all percentages summing up to 1. For example, the topic distribution for one review could be [2.5%, 2.5%, 7.5%, 7.5%, 5%, 5%, 0%, 0%, 10%, 10%, 2.5%, 2.5%, 7.5%, 7.5%, 5%, 5%, 0%, 0%, 10%, 10%]. We then use the average topic distribution across reviews for each restaurant-year as inputs for our predictive model.

3 For word preprocessing, we implemented the following procedures following Tirunillai and Tellis (2014): 1) remove punctuation (i.e., keep only words, numbers, and spaces); 2) change capitalized characters to lower case; 3) tokenize words (i.e., split a sentence into a list of words); 4) apply part-of-speech tagging and keep only nouns, verbs, adjectives, and adverbs; 5) lemmatize words (e.g., "booths" to "booth") to normalize word forms; and 6) remove stop words.

Variety of Objects in Reviews To measure the variety of objects in a review, we count the number of unique nouns in the review. The reviews in our dataset contain 30,927 unique nouns in total. All nouns are lemmatized to ensure they share the same form; for example, "apples" is lemmatized to "apple," so "apples" and "apple" are treated as the same noun. We also remove stop words, words with fewer than two characters, and words that appear in less than 0.001% of reviews.
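The preprocessing and unique-noun count can be sketched with the standard library. The paper uses proper POS tagging and lemmatization (e.g., via an NLP toolkit), so the naive suffix rule, stop-word list, and noun vocabulary below are deliberate simplifications for illustration.

```python
# Rough sketch of review preprocessing and the unique-noun count.
# STOP_WORDS, lemmatize(), and the noun vocabulary are simplified stand-ins.
import re

STOP_WORDS = {"the", "a", "an", "and", "was", "were", "very"}

def lemmatize(word):
    # Naive plural stripping for illustration only ("booths" -> "booth").
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def unique_nouns(text, noun_vocab):
    tokens = re.findall(r"[a-z]+", text.lower())        # keep words only
    tokens = [lemmatize(t) for t in tokens
              if t not in STOP_WORDS and len(t) >= 2]   # drop stop/short words
    return {t for t in tokens if t in noun_vocab}       # nouns via lookup set

vocab = {"booth", "apple", "burger", "fry"}
nouns = unique_nouns("The booths were great and the Burgers too!", vocab)
assert nouns == {"booth", "burger"}
```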

Top 100 Nouns As a robustness check for using topic modeling, we use an alternative way to summarize specific content in reviews. While all reviews in our dataset contain 30,927 unique nouns, the vast majority of nouns appear in only a few reviews. Hence, we use the top 100 nouns (encompassing all nouns with more than 5% frequency in our review dataset) to capture the specific content in reviews.

Table A10. Top 100 Nouns in Reviews
food, friend, plate, option, manager, place, star, soup, check, water, time, dinner, beef, husband, room, service, experience, portion, family, fan, restaurant, fry, quality, work, crab, order, flavor, potato, town, fun, chicken, day, egg, choice, bbq, menu, taste, visit, chip, group, table, meat, waitress, coffee, party, pizza, hour, buffet, cream, end, drink, beer, shrimp, house, piece, sauce, sushi, wine, line, chocolate, price, review, feel, couple, tomato, salad, roll, pork, guy, tea, burger, location, customer, thai, style, bar, bread, home, strip, size, lunch, breakfast, selection, course, decor, night, rice, spot, wife, pasta, staff, area, appetizer, week, name, people, taco, item, owner, glass


Appendix D. Restaurant Survival Predictive Model: Gradient Boosted Trees

Technical Details We follow Chen and Guestrin (2016) to carry out our restaurant survival predictive model using the XGBoost algorithm. We provide additional technical details of our predictive model in this appendix. The regularization term Ω(θ) in Equation (1) of the paper aims to prevent overfitting. We carefully tune the hyper-parameters (e.g., γ and λ in Ω(θ)) before calibrating XGBoost, as described below. Following the convention for hyper-parameter tuning of XGBoost, we tune the hyper-parameters sequentially as follows.4 Table A11 lists the hyper-parameters that we tuned as well as the values that we tried.

Table A11. Hyper-parameters for Tuning

Hyper-parameter   Definition                                                                                                       Values tried
max_depth         The maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit.      [3, 4, 5, 6]
min_child_weight  The minimum number of nodes in a leaf. Increasing this value may reduce overfitting.                             [1, 2]
colsample_bytree  The fraction of features to use when building each tree. A lower value might reduce overfitting.                 [0.6, 0.8, 1]
subsample         The fraction of observations to subsample at each step for building a tree. A lower value might reduce overfitting.  [0.6, 0.8, 1]
γ                 Penalizes additional leaves. A higher value might reduce overfitting.                                            [0, 4, 8]
λ                 Regularization term on weights. Increasing this value makes the model more conservative.                         [0, 1, 2]
η                 Learning rate: step-size shrinkage used in updating to prevent overfitting.                                      [0.05, 0.1, 0.3, 0.5]

Ninety percent of restaurants are randomly chosen to tune the hyper-parameters; we further randomly split these restaurants into 80% for training and 20% for validation. We first find the best combination of max_depth and min_child_weight among the 4 × 2 = 8 combinations implied by the last column of Table A11, using a grid search. Given the best combination of max_depth and min_child_weight, we then pick the best values among the nine combinations of colsample_bytree and subsample. Next, given the best combination of max_depth, min_child_weight, colsample_bytree, and subsample, we choose the best values among the nine combinations of γ and λ. Finally, given the best combination of the six tuned hyper-parameters, we choose the best η among the four values. We end up with the following values for the hyper-parameters in our XGBoost model: max_depth = 4, min_child_weight = 2, colsample_bytree = 1, subsample = 1, γ = 4, λ = 2, and η = 0.1. Additionally, we learn empirically that the model converges within 100 trees; therefore, we set the number of trees to 100 in our XGBoost model.
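The sequential (block-wise) grid search can be sketched as follows. The validation_score function is a hypothetical stand-in so the control flow is runnable; in the paper, each evaluation trains an XGBoost model and scores it on the validation split.

```python
# Schematic of sequential hyper-parameter tuning: tune one block of
# hyper-parameters at a time, holding earlier blocks fixed at their best
# values. GRID mirrors Table A11; validation_score is a made-up stand-in.
from itertools import product

GRID = [
    (("max_depth", "min_child_weight"), ([3, 4, 5, 6], [1, 2])),
    (("colsample_bytree", "subsample"), ([0.6, 0.8, 1], [0.6, 0.8, 1])),
    (("gamma", "lambda"), ([0, 4, 8], [0, 1, 2])),
    (("eta",), ([0.05, 0.1, 0.3, 0.5],)),
]

def validation_score(params):  # hypothetical stand-in for train-and-evaluate
    return -abs(params["max_depth"] - 4) - abs(params.get("eta", 0.1) - 0.1)

best = {}
for names, grids in GRID:                      # tune one block at a time
    candidates = []
    for combo in product(*grids):              # grid search within the block
        trial = {**best, **dict(zip(names, combo))}
        candidates.append((validation_score(trial), trial))
    best = max(candidates, key=lambda t: t[0])[1]

assert best["max_depth"] == 4 and best["eta"] == 0.1
```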

We provide details below for the model evaluation metrics used in Tables 5-6 in Section 3.2 of the paper. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier (Hanley and McNeil 1982). The false positive rate is on the horizontal axis: false positive rate = 1 − specificity = (# of false positive samples) / (# of real negative samples). The true positive rate is on the vertical axis: true positive rate = recall = sensitivity = (# of true positive samples) / (# of real positive samples). Given a false positive rate on the horizontal axis, the higher the true positive rate, the better. KL divergence $= -\frac{1}{N} \sum_{it} \left[ y_{it} \ln \hat{y}_{it} + (1 - y_{it}) \ln(1 - \hat{y}_{it}) \right]$, which is equivalent to the negative log-likelihood. Pseudo R2 $= 1 - LogLikelihood_{proposed} / LogLikelihood_{null}$. We reweight the data for sensitivity, specificity, and balanced accuracy, based on the discussion of the parameter "scale_pos_weight" at https://xgboost.readthedocs.io/en/latest/parameter.html and "handle imbalanced dataset" at https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html.

4 https://blog.cambridgespark.com/hyperparameter-tuning-in-xgboost-4ff9100a3b2f
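These metrics can be computed directly from their definitions; the labels and predicted probabilities below are made up.

```python
# Evaluation metrics computed from their definitions on a toy sample:
# sensitivity, specificity, and KL divergence (per-observation negative
# log-likelihood).
import math

y     = [1, 1, 0, 0]             # true labels
y_hat = [0.9, 0.6, 0.2, 0.4]     # predicted probabilities
pred  = [p >= 0.5 for p in y_hat]

tp = sum(p and t for p, t in zip(pred, y))
tn = sum((not p) and (not t) for p, t in zip(pred, y))
sensitivity = tp / sum(y)             # true positive rate (recall)
specificity = tn / (len(y) - sum(y))  # 1 - false positive rate

kl = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
          for t, p in zip(y, y_hat)) / len(y)

assert sensitivity == 1.0 and specificity == 1.0
assert kl > 0  # perfect, fully confident predictions would drive this to 0
```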

Comparisons with Other Predictive Algorithms We compare the XGBoost algorithm used in the paper with two alternative predictive algorithms: random forests and SVM (see Table A12). We also tried Lasso, a hazard model, and OLS, which are omitted from Table A12 because their predictive performance was considerably worse than that of the three models presented. For random forests, we train an ensemble of 100 trees, the same as the number of trees in our XGBoost model. For SVM, we use L1 regularization and the square of the hinge loss. We implement both random forests and SVM in Python. We then calculate means and standard deviations of prediction performance, measured by AUC, across years 2010-2015 and cross-validation iterations. For each photo- or review-related variable, we use the same main model specification as in the paper, namely OnePeriodt-1 + Cumt-1. Table A12 shows that XGBoost dominates random forests and SVM in our context. This is not surprising, because XGBoost is very flexible in handling potentially high-order interactions among predictors (Friedman 2001) and can process sparse data efficiently (Chen and Guestrin 2016).

Table A12. Out-of-sample Prediction Performance (AUC) of Different Predictive Algorithms

Model                      XGBoost    Random forests  SVM
Baseline                   0.7020 a   0.6596          0.6904
                           (0.0047)   (0.0048)        (0.0054)
Baseline + review          0.7152 a   0.6607          0.7007
                           (0.0048)   (0.0047)        (0.0053)
Baseline + photo           0.7612 a   0.7005          0.7329
                           (0.0066)   (0.0061)        (0.0068)
Baseline + review + photo  0.7660 a   0.7081          0.7371
                           (0.0065)   (0.0065)        (0.0072)
Training obs.: 89,384

The baseline model includes restaurant characteristics, competition, and macro factors. The results are averaged over years and cross-validation iterations. Standard errors are in parentheses.
a Best in the row or not significantly different from the best in the row at the 0.05 level with a 2-sided test.


Alternative Model Specifications Tables A13 to A15 provide the predictive results based on three alternative model specifications. It is worth noting that, while our main model specification includes all restaurant observations (89,384 in total), these alternative specifications have fewer observations due to the inclusion of multi-year lags (i.e., OnePeriodt-2, Cumt-2, Cumt-3, Changet-1). Overall, our main model specification has better or similar predictive performance compared with these alternative specifications, and the incremental predictive power of photos is robust across them.

Table A13. Prediction Results of Alternative Model Specification #1: OnePeriodt-1 + Cumt-2

                           Out of sample                                                             In sample
Model                      AUC       KL divergence  Sensitivity  Specificity  Balanced accuracy      Pseudo R2
Baseline                   0.6984    0.2001         0.7208       0.5402       0.6305                 0.1344
                           (0.0043)  (0.0041)       (0.0053)     (0.0109)     (0.0047)               (0.0025)
Baseline + review          0.7135    0.1981         0.7993 a     0.4595       0.6294                 0.2287
                           (0.0046)  (0.0042)       (0.0048)     (0.0106)     (0.0045)               (0.0073)
Baseline + photo           0.7606 a  0.1903 a       0.7308       0.6210 a     0.6759 a               0.2495
                           (0.0065)  (0.0033)       (0.0085)     (0.0125)     (0.0061)               (0.0033)
Baseline + review + photo  0.7662 a  0.1880 a       0.7629       0.6105 a     0.6867 a               0.2894 a
                           (0.0063)  (0.0034)       (0.0082)     (0.0135)     (0.0060)               (0.0059)
Total obs.: 71,665

The baseline model includes restaurant characteristics, competition landscape, and macro factors. For sensitivity, specificity, and balanced accuracy, the training data are reweighted so that the total weights of open and closed observations are equal. The results are averaged over years and cross-validation iterations. Standard errors are in parentheses. Bold numbers indicate significant improvement over the baseline model at the 0.05 level with a 2-sided test.
a Best in the column or not significantly different from the best in the column at the 0.05 level with a 2-sided test.


Table A14 Prediction Results of Alternative Model Specification #2: OnePeriod_{t-1} + OnePeriod_{t-2} + Cum_{t-3}

                              ------------------------ Out of sample ------------------------   In sample
                              AUC        KL divergence  Sensitivity  Specificity  Bal. accuracy  Pseudo R2
Baseline                      0.6736     0.1903         0.7721       0.4314       0.6017         0.1366
                              (0.0054)   (0.0045)       (0.0087)     (0.0137)     (0.0052)       (0.0037)
Baseline + review             0.6933     0.1888         0.8650 a     0.3207       0.5928         0.3109
                              (0.0044)   (0.0047)       (0.0068)     (0.0139)     (0.0048)       (0.0126)
Baseline + photo              0.7465 a   0.1819 b       0.7676       0.5418 a     0.6547 a       0.2896
                              (0.0071)   (0.0036)       (0.0109)     (0.0186)     (0.0067)       (0.0053)
Baseline + review + photo     0.7484 a   0.1806 b       0.8204       0.4679       0.6442 a       0.3592 a
                              (0.0067)   (0.0039)       (0.0104)     (0.0204)     (0.0070)       (0.0105)
Total obs.: 55,669

The baseline model includes restaurant characteristics, competitive landscape, and macro factors. For sensitivity, specificity, and balanced accuracy, the training data are reweighted so that the total weights of open and closed observations are equal. The results are averaged over years and cross-validation iterations. Standard errors are provided in parentheses. Bold numbers indicate significant improvement over the baseline model at the 0.05 level with a 2-sided test.
a Best in the column, or not significantly different from the best in the column, at the 0.05 level with a 2-sided test.
b Best in the column, or not significantly different from the best in the column, at the 0.10 level with a 2-sided test.

Table A15 Prediction Results of Alternative Model Specification #3: OnePeriod_{t-1} + Cum_{t-1} + Change_{t-1}

                              ------------------------ Out of sample ------------------------   In sample
                              AUC        KL divergence  Sensitivity  Specificity  Bal. accuracy  Pseudo R2
Baseline                      0.6972     0.2005         0.7433       0.5131       0.6282         0.1438
                              (0.0044)   (0.0041)       (0.0056)     (0.0091)     (0.0044)       (0.0029)
Baseline + review             0.7152     0.1987         0.8171 a     0.4454       0.6312         0.2518
                              (0.0044)   (0.0043)       (0.0050)     (0.0110)     (0.0043)       (0.0084)
Baseline + photo              0.7683 a   0.1891 a       0.7395       0.6331 a     0.6863 a       0.2669
                              (0.0066)   (0.0031)       (0.0089)     (0.0112)     (0.0056)       (0.0036)
Baseline + review + photo     0.7788 a   0.1863 a       0.7737       0.5991       0.6864 a       0.3149 a
                              (0.0065)   (0.0034)       (0.0085)     (0.0124)     (0.0057)       (0.0063)
Total obs.: 71,665

The baseline model includes restaurant characteristics, competitive landscape, and macro factors. For sensitivity, specificity, and balanced accuracy, the training data are reweighted so that the total weights of open and closed observations are equal. The results are averaged over years and cross-validation iterations. Standard errors are provided in parentheses. Bold numbers indicate significant improvement over the baseline model at the 0.05 level with a 2-sided test.
a Best in the column, or not significantly different from the best in the column, at the 0.05 level with a 2-sided test.

Robustness Check for Photo and Review Content Measures

As a robustness check for the photo and review content measures, we use content labels to capture photo and review content and repeat the comparisons in Table A16. Specifically, for photos, we replace "prop. of photos on each of the 10 LDA topics" in Table 4A of the paper with "prop. of photos containing each of the top 100 Clarifai objects" listed in Table A1. For reviews, we replace "prop. of reviews on each of the 20 LDA topics" in Table 4A with "prop. of reviews containing each of the top 100 nouns" listed in Table A10. Table A16 below provides a robustness check for Table 5. The results are qualitatively similar to those presented in the paper.
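The content-label features can be derived by aggregating per-photo object labels into restaurant-level proportions. The sketch below uses a handful of illustrative labels and toy data, not the actual Clarifai output or the top-100 object list.

```python
# Sketch of the content-label features in Table A16: for each restaurant, the
# proportion of its photos containing each detected object label.
import pandas as pd

photos = pd.DataFrame({
    "restaurant_id": [1, 1, 1, 2, 2],
    "labels": [["food", "plate"], ["drink"], ["food"], ["menu"], ["food", "menu"]],
})

top_labels = ["food", "drink", "menu"]   # stand-in for the top-100 Clarifai objects
for lab in top_labels:
    photos[f"has_{lab}"] = photos["labels"].apply(lambda ls: lab in ls)

# Mean of the indicator = proportion of the restaurant's photos with that label.
props = photos.groupby("restaurant_id")[[f"has_{l}" for l in top_labels]].mean()
```

The review-side features ("prop. of reviews containing each of the top 100 nouns") follow the same indicator-then-average pattern.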

Table A16 One-Year-Ahead Prediction - Using Content Labels on Photo and Review Content

                              ------------------------ Out of sample ------------------------   In sample
                              AUC        KL divergence  Sensitivity  Specificity  Bal. accuracy  Pseudo R2
Baseline (i.e., no UGC)       0.7020     0.1973         0.6484       0.6284       0.6384         0.1373
                              (0.0047)   (0.0035)       (0.0056)     (0.0092)     (0.0046)       (0.0026)
Baseline + review             0.7190     0.1948         0.7342 a     0.5487       0.6415         0.2198
                              (0.0046)   (0.0035)       (0.0045)     (0.0092)     (0.0040)       (0.0060)
Baseline + photo              0.7623 a   0.1878 a       0.6759       0.6936 a     0.6848 a       0.2358
                              (0.0064)   (0.0028)       (0.0086)     (0.0094)     (0.0052)       (0.0027)
Baseline + review + photo     0.7708 a   0.1850 a       0.7103       0.6745 a     0.6924 a       0.2752 a
                              (0.0061)   (0.0028)       (0.0074)     (0.0093)     (0.0050)       (0.0042)

The baseline model includes restaurant characteristics, competitive landscape, and macro conditions. For sensitivity, specificity, and balanced accuracy, the training data are reweighted so that the total weights of open and closed observations are equal. The results are averaged over years and cross-validation iterations. Standard errors are provided in parentheses. Bold numbers indicate significant improvement over the baseline model at the 0.05 level with a 2-sided test.
a Best in the column, or not significantly different from the best in the column, at the 0.05 level with a 2-sided test.


Photos' Predictive Power Broken Down by Each Age for Age <= 5

Because the survival of younger restaurants is more volatile,5 we take a closer look at photos' predictive power at each age for age <= 5 in Figure A13. We calculate the AUC increase from photos as the difference between AUC_{Baseline + review + photo} and AUC_{Baseline + review} for each age group.6 The figure shows that photos increase prediction accuracy for restaurants at every age younger than or equal to five years.

Figure A13 Photos' Prediction Power by Each Age (Age <= 5)

Robustness Checks for Multiple-Year Survival Prediction

We report results from three alternative specifications for multiple-year (Δt) survival prediction as robustness checks below, with Δt = 1, 2, 3 years. Compared with the main specification in the paper, where we predict survival during the future Δt years (e.g., forecasting whether a restaurant would survive until the end of 2013), these alternative specifications break down the prediction year by year (e.g., predicting whether a restaurant would survive in 2011, 2012, or 2013). To make these specifications comparable with the main specification, we multiply the predicted survival probabilities for the individual future years to derive the predicted survival probability during the future Δt years (e.g., ŷ_{i,2010+3} = ŷ_{i,2011} × ŷ_{i,2012} × ŷ_{i,2013}). For all approaches below, for each Δt, we train separate models (baseline; baseline + review; baseline + photo; baseline + review + photo) with data up to year t and 10-fold cross-validation. Then, for each Δt, we report average performance across t and cross-validation iterations.

First specification. We assume that a restaurant owner only has information up to time t. Without knowledge of any UGC posted on Yelp for either the focal restaurant or its competitors over the next Δt years, he wants to predict whether the restaurant can survive in year t + Δt. Namely, we use X_{it} to predict y_{i,t+Δt}. For example, the owner may use photos and reviews up to year 2010 to predict the restaurant's survival probability in year 2013, assuming that he has no information about UGC posted on Yelp for the focal restaurant and its competitors between 2010 and 2013.

Second specification. We assume that UGC posted in future years will be the same as that posted in the current year. Namely, we use X̂_{i,t+Δt-1} to predict y_{i,t+Δt}, with X̂_{i,t+Δt-1} derived from X_{it} assuming continuity in UGC for both the focal and competing restaurants.
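The multiplication step that converts yearly predictions into a Δt-year survival probability, ŷ_{i,t+Δt} = ŷ_{i,t+1} × ... × ŷ_{i,t+Δt}, can be sketched as follows (the probabilities are illustrative, not model output):

```python
# Sketch of chaining per-year survival predictions into a 3-year survival
# probability, as in y_hat_{i,2010+3} = y_hat_{i,2011} * y_hat_{i,2012} * y_hat_{i,2013}.
import numpy as np

# Illustrative predicted probabilities of surviving each of the next 3 years,
# one row per restaurant.
yearly_survival = np.array([
    [0.95, 0.90, 0.88],   # restaurant A
    [0.60, 0.55, 0.50],   # restaurant B
])
survival_3yr = yearly_survival.prod(axis=1)
```

This treats survival in each future year as conditional on having survived the previous years, so the product gives the probability of surviving the full horizon.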

5 The failure rates for the three age groups in Table 7 (1 <= age <= 3; 3 < age <= 21; age > 21) are 10%, 4%, and 2%, respectively. Please note that the failure rates are calculated at the restaurant-year level (instead of the restaurant level), since restaurant age changes over the years.

6 In Table 7, we train a separate model for each age bracket (young, mid-aged, established) to fully calibrate the model for each type of restaurant. Because each individual age in Figure A13 has a relatively small number of observations, we do not train a separate model for each age. Instead, we plot the AUC difference for each age separately, using the model trained on all ages.

Third specification. This specification is based on the distributed lag model (Mela et al. 1997). In this model, the dependent variable y_{i,t+Δt} is a function of X_{i,t+Δt-1} and the lagged dependent variable y_{i,t+Δt-1}. We do not observe X_{i,t+Δt-1} and y_{i,t+Δt-1} in the current year t when Δt > 1. Therefore, we predict the dependent variables of future years sequentially, assuming continuity in UGC. For example, in order to predict survival in year 2013 (i.e., y_{i,2013}) during focal year 2010, we predict ŷ_{i,2011} using X_{i,2010} as in the first robustness check and derive X̂_{i,2011} from X_{i,2010} assuming continuity in UGC as in the second robustness check. Then we use X̂_{i,2011} and ŷ_{i,2011} to predict ŷ_{i,2012}. Finally, we use X̂_{i,2012} and ŷ_{i,2012} to predict ŷ_{i,2013}. Figures A14-A16 show that, under all three alternative specifications, photos can predict three-year survival while reviews can only predict one-year survival, consistent with the main prediction pattern in the paper.
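The sequential step of the third specification can be sketched as a loop that holds the focal-year UGC features fixed (the continuity assumption) and feeds each year's predicted survival probability back in as the lagged dependent variable. The fitted model below is a logistic-regression stand-in on synthetic data, not the paper's trained XGBoost model.

```python
# Sketch of the distributed-lag specification: predict t+1, t+2, t+3 sequentially,
# carrying current-year UGC features forward and using each predicted y_hat as
# the next year's lagged dependent variable. All data and names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))       # [UGC features..., lagged survival]
y_train = (X_train[:, -1] + rng.normal(size=500) > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

x_focal = rng.normal(size=3)              # focal-year UGC features, held fixed
y_lag = 1.0                               # restaurant observed open in year t
probs = []
for _ in range(3):                        # predict t+1, t+2, t+3 in turn
    features = np.append(x_focal, y_lag).reshape(1, -1)
    y_lag = model.predict_proba(features)[0, 1]
    probs.append(y_lag)
survival_3yr = np.prod(probs)             # survival over the next 3 years
```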

Figure A14 Multiple-Year Survival Prediction (First Specification)

Baseline includes restaurant characteristics, competition landscape, and macro factors. Results are averaged over years and cross-validation iterations. The error bars represent ±1 standard error of each point estimate.

24

Figure A15 Multiple-Year Survival Prediction Assuming Same Future UGC as Focal Year (Second Specification)

Baseline includes restaurant characteristics, competition landscape, and macro factors. Results are averaged over years and cross-validation iterations. The error bars represent ±1 standard error of each point estimate.

Figure A16 Multiple-Year Survival Prediction Using Distributed Lag Model (Third Specification)

Baseline includes restaurant characteristics, competition landscape, and macro factors. Results are averaged over years and cross-validation iterations. The error bars represent ±1 standard error of each point estimate.

25

Appendix E. Cluster-Robust Causal Forests

Figure A17 summarizes the estimation procedures of our causal forests. We provide more details on the estimation procedures below.

Figure A17 Causal Forest Estimation Procedures

x denotes a set of values of X_{it-1}.

Consolidating Highly Correlated Variables

Before carrying out the causal forest estimation, we consolidate variables to reduce the correlations between treatments and controls, as required by the overlap assumption (Wager and Athey 2018). Table A17 presents the complete list of consolidated variables. When both the cumulative and one-period variables of the same measure (e.g., cum. and one-period avg. yearly helpful votes for photos) are among the top 35 most informative variables in Figure 4 of the main paper, we use the cumulative rather than the one-period lag variable, because the former is often more predictive of restaurant survival than the latter. When two variables carry similar meanings (e.g., prop. of photos with helpful votes; avg. yearly helpful votes for photos), we keep the one with the higher SHAP feature importance from Figure 4 in the main paper. When the average and the standard deviation of the same measure are highly correlated (e.g., avg. and std. of yearly helpful votes for reviews), we keep the average for ease of interpretation.
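The consolidation rule — among highly correlated candidate treatments, keep the one with the higher feature importance — can be sketched as below. The data, correlation threshold, and importance values are all illustrative, not the paper's SHAP values.

```python
# Sketch of consolidating highly correlated variables: for each correlated pair,
# drop the variable with the lower feature importance. Names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=300)
df = pd.DataFrame({
    "prop_photos_helpful": base + rng.normal(scale=0.1, size=300),
    "avg_helpful_votes_photos": base + rng.normal(scale=0.1, size=300),
    "avg_star_rating": rng.normal(size=300),
})
importance = {"prop_photos_helpful": 0.9,
              "avg_helpful_votes_photos": 0.4,
              "avg_star_rating": 0.7}

corr = df.corr().abs()
keep = set(df.columns)
for a in df.columns:
    for b in df.columns:
        if a < b and corr.loc[a, b] > 0.8 and {a, b} <= keep:
            keep.discard(min((a, b), key=importance.get))  # drop the less important
```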


Table A17 The List of Consolidated Variables

Variables before consolidation -> Treatment variable used in causal forests
- # of photos (cum.); # of reviews (cum.) -> # of photos and reviews (cum.)
- Prop. of photos with helpful votes (cum.); avg. yearly helpful votes for photos (cum.); avg. yearly helpful votes for photos (one-period) -> Prop. of photos with helpful votes (cum.)
- Avg. yearly helpful votes for reviews (cum.); std. of yearly helpful votes for reviews (cum.); avg. yearly helpful votes for reviews (one-period) -> Avg. yearly helpful votes for reviews (cum.)
- Avg. review length (cum.); std. of review length (cum.) -> Avg. review length (cum.)
- # of overlapping competitors (one-period); # of non-overlapping competitors (one-period); # of new entries (one-period); # of new exits (one-period) -> # of competitors (one-period)
- Avg. of competitors' # of photos (cum.); avg. of competitors' # of reviews (cum.); avg. of competitors' # of reviews (one-period) -> Avg. of competitors' # of photos and reviews (cum.)

Correlations Between Treatment and Control Variables Before and After Orthogonalization

We carry out an orthogonalization procedure (Step 1 in Figure A17) to further reduce correlations between treatment variables and controls, and we report the correlations between treatment and control variables before and after orthogonalization. The idea of examining such correlations is similar to the matching quality check under traditional propensity score matching methods (Austin 2009). Table A18 displays the maximum correlation coefficient between each treatment variable and its controls before and after orthogonalization. For example, when the cumulative number of photos and reviews is the treatment variable, we check its correlation with each control variable (i.e., all other treatment variables in the second column of Table A18, restaurant quality dimensions, zip codes, and years). Before orthogonalization, the maximum correlation between cumulative UGC volume and the controls was 0.40; after orthogonalization, it is reduced to 0.08. The correlations between treatment and control variables are thus greatly reduced by orthogonalization. As a robustness check, we also verify the quality of the propensity scores \hat{W}^{(-i)}_{it-1}(X_{it-1}) (the second term of the first part of Equation 3 in the paper) generated in the orthogonalization step. We perform this robustness check on binary treatment variables, given that this conventional balance check method is only applicable to binary variables. The intuition is that, if the quality of the propensity scores is good, the control variables adjusted by the propensity scores will be balanced across the treated and untreated conditions. For example, suppose that the treatment variable is chain status and that one control variable is age.
To check the extent to which age is balanced between the observations of chain and independent restaurants, we compute the normalized balance diagnostic d (Austin 2011) for age as in Equation (A2), before and after the adjustment (d_raw and d_adjusted). Here, \overline{age}_D is the average of age and s^2_{age,D} is the variance of age over observations with D \in {chain, independent}. Following Austin (2011), we define the weight for the adjustment as

    \varphi_{it} = \frac{chain_{it-1}}{\widehat{chain}^{(-i)}_{it-1}(X_{it-1})} + \frac{1 - chain_{it-1}}{1 - \widehat{chain}^{(-i)}_{it-1}(X_{it-1})}.^7

Then, following Austin and Stuart (2015),

    \overline{age}_{D,adjusted} = \frac{\sum_{i \in D} \varphi_{it}\, age_{it}}{\sum_{i \in D} \varphi_{it}}, and
    s^2_{age,D,adjusted} = \frac{\sum_{i \in D} \varphi_{it}}{\left(\sum_{i \in D} \varphi_{it}\right)^2 - \sum_{i \in D} \varphi_{it}^2} \sum_{i \in D} \varphi_{it} \left(age_{it} - \overline{age}_{D,adjusted}\right)^2,

with D \in {chain, independent}. The normalized balance diagnostic d is calculated for the other control variables in the same fashion.

(A2)    d_raw = \frac{\overline{age}_{chain} - \overline{age}_{independent}}{\sqrt{(s^2_{age,chain} + s^2_{age,independent})/2}},
        d_adjusted = \frac{\overline{age}_{chain,adjusted} - \overline{age}_{independent,adjusted}}{\sqrt{(s^2_{age,chain,adjusted} + s^2_{age,independent,adjusted})/2}}
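The balance diagnostic of Equation (A2) can be computed directly from the weighted mean and the weighted variance of Austin and Stuart (2015). The sketch below uses illustrative data and unit weights; in the actual check, the weights would be the inverse-probability weights built from the estimated propensity scores.

```python
# Sketch of the normalized balance diagnostic d: standardized difference in a
# covariate (age) across treated/untreated groups, with the weighted variance
# formula of Austin and Stuart (2015). Data here are illustrative.
import numpy as np

def weighted_stats(x, w):
    """Weighted mean and the Austin-Stuart weighted variance."""
    mean = np.sum(w * x) / np.sum(w)
    var = np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2)) * np.sum(w * (x - mean) ** 2)
    return mean, var

def std_diff(x_treat, x_ctrl, w_treat, w_ctrl):
    m1, v1 = weighted_stats(x_treat, w_treat)
    m0, v0 = weighted_stats(x_ctrl, w_ctrl)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

rng = np.random.default_rng(0)
age_chain = rng.normal(10, 3, size=200)   # chain restaurants (illustrative)
age_indep = rng.normal(6, 3, size=800)    # independent restaurants (illustrative)

# With unit weights this reduces to the unweighted d_raw; the adjusted version
# would pass in 1/p_hat for treated and 1/(1 - p_hat) for untreated observations.
d_raw = std_diff(age_chain, age_indep,
                 np.ones_like(age_chain), np.ones_like(age_indep))
```

With all weights equal to one, the weighted variance collapses to the ordinary sample variance, so the same function serves for both d_raw and d_adjusted.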

Figure A18 visualizes d_raw and d_adjusted when chain status is the treatment variable; the x-axis is the normalized balance diagnostic d and the y-axis lists the names of all control variables. As per Austin (2009), d <= 0.2 indicates adequate balance in a control variable between the treated (chain restaurants) and untreated (independent restaurants) groups. For example, in Figure A18, d_raw for age is greater than 0.2, while d_adjusted for age is smaller than 0.2, indicating that the control variable age is balanced across chain and independent restaurants after being adjusted by the propensity scores generated by the causal forests. Figure A18 shows d_adjusted <= 0.2 for all control variables.

7 This weight is conventionally used by inverse probability of treatment weighting methods to estimate the average treatment effect (ATE), where \widehat{chain}^{(-i)}_{it-1}(X_{it-1}) is the probability of being treated (a chain restaurant in this example) and 1 - \widehat{chain}^{(-i)}_{it-1}(X_{it-1}) is the probability of being untreated (an independent restaurant in this example) (Austin 2011). Namely, each observation's weight equals the inverse of the probability of receiving the condition it actually received.

Table A18 Correlations between Treatment and Control Variables Before and After Orthogonalization

Maximum correlation coefficient between each treatment variable and its controls, before and after orthogonalization:

Type              Treatment variable                                      Before   After
Photo and review  # of photos and reviews (cum.)                          0.40     0.08
Photo             % of food photos (cum.)                                 0.36     0.07
                  % of outside photos (cum.)                              0.23     0.06
                  % of interior photos (cum.)                             0.19     0.06
                  % of drink photos (cum.)                                0.16     0.03
                  % of menu photos (cum.)                                 0.09     0.04
                  % of photos on food and drink (cum.)                    0.36     0.05
                  % of photos with helpful votes (cum.)                   0.31     0.04
Review            % of mixed/negative reviews on food (cum.)              0.23     0.04
                  Avg. yearly helpful votes for reviews (cum.)            0.20     0.06
                  Avg. review length (cum.)                               0.23     0.08
                  Avg. star rating (cum.)                                 0.32     0.05
                  Std. of star ratings (cum.)                             0.32     0.04
Company           Chain                                                   0.56     0.05
                  Age <= 1                                                0.26     0.04
                  Age 2-3                                                 0.21     0.03
                  Age 4-7                                                 0.21     0.03
                  Age 8-21                                                0.11     0.03
                  Age 22-42                                               0.24     0.03
                  Age > 42                                                0.50     0.05
                  Price level                                             0.28     0.09
                  Mexican                                                 0.14     0.03
                  American (traditional)                                  0.25     0.05
                  Pizza                                                   0.33     0.05
                  Nightlife                                               0.27     0.06
                  Fast Food                                               0.47     0.04
                  Sandwiches                                              0.17     0.04
                  American (new)                                          0.20     0.05
                  Burgers                                                 0.29     0.03
                  Italian                                                 0.33     0.03
                  Chinese                                                 0.14     0.05
Competition       # of competitors (one-period)                           0.58     0.20
                  Avg. of competitors' # of photos and reviews (cum.)     0.55     0.19
                  Avg. of competitors' avg. star rating (cum.)            0.23     0.06


Figure A18 An Example of Control Variable Balance Check before and after Orthogonalization

[Covariate balance plot; treatment variable: chain. The y-axis lists the names of all control variables (the treatment variables in Table A18, restaurant quality dimensions, year dummies, and zip code dummies); the x-axis is the normalized balance diagnostic d, with raw (unadjusted) and adjusted values plotted for each variable.]


Building Honest Trees

In Step 2 of Figure A17, we follow Athey et al. (2019) in building honest trees to weight the similarity between a set of control values x and an arbitrary X_{it-1}. For consistent estimation of the treatment effects, honest trees use separate subsamples of observations for placing the splits and for calculating the similarity weights. We grow B = 2,000 trees, each tree (indexed by b) built with a random subsample of observations S_b and a random subsample of control variables. Each tree aims to group observations X_{it-1} similar to x in one leaf. Denote the leaf containing x as L_b(x). The similarity between observation X_{it-1} and x measured by tree b is defined as α_{b,it}(x) = 1{X_{it-1} ∈ L_b(x)} / |L_b(x)|. The overall similarity between X_{it-1} and x, denoted α_{it}(x), captures the frequency with which the observation X_{it-1} falls into the same leaf as x: α_{it}(x) = (1/B) Σ_{b=1}^{B} α_{b,it}(x). As discussed in the paper, our cluster-robust causal forests account for within-restaurant variation by allowing clustered errors. Technically, to allow errors to be correlated within clusters, we first draw a subsample of clusters I_b ⊆ {1, 2, ..., I} when building each tree; S_b is then formed by all observations from the clusters in I_b.8
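The forest similarity weights α_{it}(x) can be illustrated with leaf co-membership in an off-the-shelf random forest: for each tree, an observation receives weight 1/|L_b(x)| if it lands in the same leaf as x, and the weights are averaged over trees. This is only the weighting idea; it omits the honest sample splitting and cluster subsampling that grf actually implements, and all data are synthetic.

```python
# Sketch of the similarity weights alpha_it(x) = (1/B) * sum_b 1{X in L_b(x)} / |L_b(x)|,
# using a plain sklearn forest as a stand-in for honest trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=50, bootstrap=False,
                               random_state=0).fit(X, y)
leaves = forest.apply(X)                  # (n_obs, n_trees) leaf ids
x_query = X[:1]
query_leaves = forest.apply(x_query)[0]   # leaf containing x in each tree

alpha = np.zeros(len(X))
for b in range(forest.n_estimators):
    in_leaf = leaves[:, b] == query_leaves[b]
    alpha += in_leaf / in_leaf.sum()      # 1{X_it-1 in L_b(x)} / |L_b(x)|
alpha /= forest.n_estimators              # average over the B trees
```

Because each tree's leaf weights sum to one over the observations in that leaf, the averaged weights α also sum to one across observations.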

References for Appendices

Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. "Generalized Random Forests." Annals of Statistics 47(2):1148–78.
Austin, Peter C. 2009. "Balance Diagnostics for Comparing the Distribution of Baseline Covariates between Treatment Groups in Propensity-Score Matched Samples." Statistics in Medicine 28(25):3083–3107.
Austin, Peter C. 2011. "An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies." Multivariate Behavioral Research 46(3):399–424.
Austin, Peter C., and Elizabeth A. Stuart. 2015. "Moving towards Best Practice When Using Inverse Probability of Treatment Weighting (IPTW) Using the Propensity Score to Estimate Causal Treatment Effects in Observational Studies." Statistics in Medicine 34(28):3661–79.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3(Jan):993–1022.
Bujisic, Milos, Joe Hutchinson, and H. G. Parsa. 2014. "The Effects of Restaurant Quality Attributes on Customer Behavioral Intentions." International Journal of Contemporary Hospitality Management 26(8):1270–91.
Canny, John. 1987. "A Computational Approach to Edge Detection." Readings in Computer Vision, 184–203.
Chen, Tianqi, and Carlos Guestrin. 2016. "Xgboost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94.
Datta, Ritendra, Dhiraj Joshi, Jia Li, and James Z. Wang. 2006. "Studying Aesthetics in Photographic Images Using a Computational Approach." European Conference on Computer Vision, 288–301.
Daubechies, Ingrid. 1992. Ten Lectures on Wavelets. SIAM.
Friedman, Jerome. 2001. "Greedy Function Approximation: A Gradient Boosting Machine." Annals of Statistics 29(5):1189–1232.
Hanley, James A., and Barbara J. McNeil. 1982. "The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve." Radiology 143(1):29–36.
Hasler, David, and Sabine E. Suesstrunk. 2003. "Measuring Colorfulness in Natural Images." Human Vision and Electronic Imaging VIII 5007:87–96.
Hoffman, Matthew, Francis Bach, and David Blei. 2010. "Online Learning for Latent Dirichlet Allocation." Advances in Neural Information Processing Systems 23:856–64.
Hutto, Clayton J., and Eric Gilbert. 2014. "Vader: A Parsimonious Rule-Based Model for Sentiment

8 https://grf-labs.github.io/grf/REFERENCE.html#cluster-robust-estimation

Analysis of Social Media Text." Eighth International AAAI Conference on Weblogs and Social Media.
Hyun, Sunghyup Sean. 2010. "Predictors of Relationship Quality and Loyalty in the Chain Restaurant Industry." Cornell Hospitality Quarterly 51(2):251–67.
Kim, Yoon. 2014. "Convolutional Neural Networks for Sentence Classification." ArXiv Preprint ArXiv:1408.5882.
LeCun, Yann, Leon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. "Gradient-Based Learning Applied to Document Recognition." Proceedings of the IEEE 86(11):2278–2324.
Liu, Xiao, Dokyun Lee, and Kannan Srinivasan. 2019. "Large Scale Cross Category Analysis of Consumer Review Content on Sales Conversion Leveraging Deep Learning." Journal of Marketing Research.
Mela, Carl F., Sunil Gupta, and Donald R. Lehmann. 1997. "The Long-Term Impact of Promotion and Advertising on Consumer Brand Choice." Journal of Marketing Research 34:248–61.
Montabone, Sebastian, and Alvaro Soto. 2010. "Human Detection Using a Mobile Platform and Novel Features Derived from a Visual Saliency Mechanism." Image and Vision Computing 28(3):391–402.
Mori, Greg, Xiaofeng Ren, Alexei A. Efros, and Jitendra Malik. 2004. "Recovering Human Body Configurations: Combining Segmentation and Recognition." Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2:II–II.
Netzer, Oded, Alain Lemaire, and Michal Herzenstein. 2019. "When Words Sweat: Identifying Signals for Loan Default in the Text of Loan Applications." Journal of Marketing Research 56(6):960–80.
Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. "Glove: Global Vectors for Word Representation." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1532–43.
Rehurek, Radim, and Petr Sojka. 2010. "Software Framework for Topic Modelling with Large Corpora." Proceedings of the LREC Workshop on New Challenges for NLP Frameworks.
Ren, Xiaofeng, and Jitendra Malik. 2003. "Learning a Classification Model for Segmentation." Proceedings of the 9th International Conference on Computer Vision 1:10–17.
Röder, Michael, Andreas Both, and Alexander Hinneburg. 2015. "Exploring the Space of Topic Coherence Measures." Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 399–408.
Rother, Carsten, Vladimir Kolmogorov, and Andrew Blake. 2004. "Grabcut: Interactive Foreground Extraction Using Iterated Graph Cuts." ACM Transactions on Graphics (TOG) 23(3):309–14.
Ruder, Sebastian. 2017. "An Overview of Multi-Task Learning in Deep Neural Networks." ArXiv Preprint ArXiv:1706.05098.
Ryu, Kisang, Hye-Rin Lee, and Woo Gon Kim. 2012. "The Influence of the Quality of the Physical Environment, Food, and Service on Restaurant Image, Customer Perceived Value, Customer Satisfaction, and Behavioral Intentions." International Journal of Contemporary Hospitality Management 24(2):200–223.
Sievert, Carson, and Kenneth Shirley. 2014. "LDAvis: A Method for Visualizing and Interpreting Topics." Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 63–70.
Simonyan, Karen, and Andrew Zisserman. 2014. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ArXiv Preprint ArXiv:1409.1556.
Timoshenko, Artem, and John R. Hauser. 2019. "Identifying Customer Needs From User-Generated Content." Marketing Science 38(1):1–20.
Tirunillai, Seshadri, and Gerard Tellis. 2014. "Extracting Dimensions of Consumer Satisfaction with Quality From Online Chatter: Strategic Brand Analysis of Big Data Using Latent Dirichlet Allocation." Journal of Marketing Research 51:463–79.
Wager, Stefan, and Susan Athey. 2018. "Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests." Journal of the American Statistical Association 113(523):1228–1242.
Wang, Wei-ning, Ying-lin Yu, and Sheng-ming Jiang. 2006. "Image Retrieval by Emotional Semantics:

A Study of Emotional Space and Feature Extraction." IEEE International Conference on Systems, Man and Cybernetics 4:3534–39.
Wang, Xiaohui, Jia Jia, Jiaming Yin, and Lianhong Cai. 2013. "Interpretable Aesthetic Features for Affective Image Classification." IEEE International Conference on Image Processing, 3230–34.
Zeiler, Matthew D., and Rob Fergus. 2014. "Visualizing and Understanding Convolutional Networks." European Conference on Computer Vision, 818–33.
Zhang, Shunyuan, Dokyun Lee, Param Singh, and Kannan Srinivasan. 2018. "How Much Is an Image Worth? Airbnb Property Demand Analytics Leveraging A Scalable Image Classification Algorithm." Working Paper.
Zhang, Xiang, Junbo Zhao, and Yann LeCun. 2015. "Character-Level Convolutional Networks for Text Classification." Advances in Neural Information Processing Systems, 649–57.