
Working Paper Series 2018/57/TOM

Does Real-time Feedback Make You Try Less Hard? A Study of Automotive Telematics

Vivek Choudhary INSEAD, [email protected]

Masha Shunko University of Washington, [email protected]

Serguei Netessine The Wharton School, [email protected]




A Working Paper is the author's intellectual property. It is intended as a means to promote research to interested readers. Its content should not be copied or hosted on any server without written permission from [email protected]. Find more INSEAD papers at https://www.insead.edu/faculty-research/research

Does Real-time Feedback Make You Try Less Hard? A Study of Automotive Telematics

Vivek Choudhary, INSEAD
Masha Shunko, Foster School of Business, University of Washington
Serguei Netessine, The Wharton School, University of Pennsylvania

Abstract

Mobile and internet-of-things (IoT) devices increasingly enable tracking of user behavior, and they often provide real-time feedback to consumers in an effort to improve their conduct. Growing adoption of such technologies leads to an important question: "Does real-time feedback provided to users improve their behavior?" To answer this question, we study real-time feedback in the context of automotive telematics, a key technological disruption in the automotive insurance industry. Automotive telematics enables drivers to track their behavior using various technologies built into a mobile device and provides real-time feedback on driving quality. We investigate the impact of such real-time feedback on driving behavior, as measured by several parameters such as harsh braking, over speeding, and steep acceleration. Contrary to much of the existing feedback literature, we find that, on average, the driving performance of users post-detailed feedback is 13.3% worse than the performance of users who do not review their detailed feedback. This impairment in performance translates into a one-year reduction in inter-accident time. Our results suggest this deterioration is associated with increased sharp accelerations and over speeding. Drivers also demonstrate higher speed dispersion within a trip after feedback, which results in a 1.65% increased probability of an accident. Further, we demonstrate the critical role that insurance incentive thresholds play in the effect of real-time feedback.

Keywords: Real-time Feedback, Empirical Operations Management, Behavioral Operations Management, Automotive Telematics.

1. Introduction

Mobile and IoT devices increasingly allow tracking of user behavior that was not observable before, be it a clickstream during shopping, geolocation while running, or heartbeat patterns during sleep. In addition to collecting and summarizing data, such applications and devices usually provide real-time feedback to the users, typically in an effort to change their conduct. Overall, the goal is usually to "improve" user behavior pertaining to a particular setting: for example, sleep better, run longer, reduce workload, and decrease calorie intake. Providing automated feedback to the consumer in real time has potentially large implications for system performance.

Numerous disruptive technologies (such as sensors in mobile devices, RFID, etc.) provide objective and insightful assessment of one's task performance: e.g., FitBit monitors physical activity in real time and EarlySense monitors sleep quality. In industries ranging from utilities (Tiefenbeck et al. 2016) to transportation (Toledo and Lotan 2006), technological advances already allow immediate feedback to users, who can use it to improve performance. Such feedback is important, as human decision making is often biased (Kahneman and Tversky 1973); therefore, feedback or nudging can be used as a low-cost intervention to exploit human behavioral tendencies such as ahead-seeking (utility gain from overperforming), behind-aversion (utility loss from underperforming) (Roels and Su 2014), and last-place aversion (Buell and Norton 2014).

One important application of IoT is in the automotive industry, where it is termed automotive telematics, or telematics for short (https://en.wikipedia.org/wiki/Telematics). Automotive is a huge industry serving more than 1.3 billion vehicles on the roads worldwide; in the USA, the automotive industry accounts for nearly 3.5 percent of GDP and employs more than 940,000 people directly or indirectly in manufacturing (USASelect 2018). Due to its size, transportation is the second-largest polluter in the USA, and it is also one of the biggest contributors to deaths and injuries in the world, with approximately 1.25 million deaths every year due to accidents (WHO 2015).

Evidently, driving behavior can have drastic implications for safety, emissions, and vehicle longevity. Moreover, recent economic trends including urbanization, concerns about city pollution, the gig economy (e.g., Uber, Lyft, and similar companies in the trucking industry like Convoy), crowdsourced deliveries (e.g., Amazon and InstaCart), and ridesharing (BlaBlaCar, Car2Go) all lead to higher vehicle utilization (Cramer and Krueger 2016). As a result, a growing number of drivers operate potentially unfamiliar vehicles in unfamiliar congested locations while increasingly relying on GPS-enabled devices. Such conditions create a potentially unsafe environment on the roads and pose challenges for companies in managing their drivers and maintaining their fleets. Thus, telematics applications have become numerous: companies can monitor drivers' behavior, including harsh braking, steep acceleration, lane changes, sharp cornering, and speeding, all of which lead to higher fuel consumption, higher emissions, and more accidents. Companies also use this information to manage the driver pool and the fleet, and to provide incentives that lower insurance premiums and operational costs. For example, with the help of telematics, left turns have been found to be associated with more accidents (Najm et al. 2001), and UPS has saved millions of dollars and gallons of fuel by implementing a 'no left turn' policy (UPS 2016).

Acknowledging these developments, telematics has been recognized as the most disruptive technology (Accenture 2018) in the automotive insurance industry, which is already a very large industry with revenues exceeding $250 billion in 2017. Telematics is utilized in usage-based insurance (UBI), which is expected to have 142 million subscribers by 2023 (IHS Markit 2016). There are several other applications of telematics where insights from this study apply. For example, fleet drivers are often incentivized to improve driving behavior through threshold bonuses. Further, GPS-enabled devices, sensors, and drivers' phones together allow UPS to capture data related to vehicle routes, idling times, and safety practices (UPS 2016). Gig economy companies, such as Uber in Singapore, often lease vehicles to their drivers and track driver performance closely (Beinstein and Sumers 2016).

While telematics devices are already providing complex real-time feedback to drivers, we still poorly understand the implications of such feedback for user behavior. This understanding is important given that other attempts to make driving safer have led to unintended consequences. For example, the introduction of airbags in the mid-1970s led to more deaths in lower-speed collisions (Cunningham et al. 2000). In this paper, we focus on an application in which insurance companies incentivize individual drivers to drive better through insurance premium discounts and real-time feedback.

At a high level, much of the existing literature suggests that "conventional" feedback (post-factum average performance, for example, utility bills at the end of each month, annual employee appraisals, etc.) and feedback-seeking may lead to improved performance (see Anseel et al. (2015) for a review). While these results are encouraging, the focus of our paper is on real-time feedback, which works differently from the conventional feedback studied in the literature. Real-time feedback has an immediate effect on user performance: users can observe their feedback after short time intervals (in our case, after every trip) and learn from it. Further, using what they learn from feedback, users can modify their behavior almost immediately to achieve the ultimate goal, such as improving their driving quality score (and subsequently obtaining an insurance discount). Very recent studies have shown (Dahlinger and Ryder 2018, Gnewuch et al. 2018, Tiefenbeck et al. 2016) that providing some types of real-time feedback can be highly effective in some cases (e.g., utilities consumption, medicine intake). Given that the effect of conventional feedback is sometimes ambiguous (Kluger and DeNisi 1996), it is important to study real-time feedback and identify the effect it has on human behavior, especially in such important applications as automotive.

Using a novel dataset obtained through a collaboration with a startup, we study behavioral implications of telematics. Once a user opts in, the insurer tracks the user's driving behavior using a device embedded in the user's car or using sensors in the smartphone. In our case, users are provided with two types of scores after each trip: a trip score (the score for the latest trip) and a to-date score (an average score based on all the trips taken in the past). The overall trip score is calculated based on a combination of three inputs (speeding, acceleration, and braking) and two trip attributes (time of the day and number of miles driven). Users can check detailed real-time feedback regarding their performance through a smartphone application.

Our dataset comprises trip-level data for 382 users totaling 80,885 trips. After every trip, users receive a notification about their trip score as well as an option to review detailed feedback via the app interface (we refer to the trips completed after the user reviewed detailed feedback as trips with detailed feedback). Each trip score is then combined into a to-date score, which determines the financial benefits to the user. Users who obtain a to-date score above a threshold qualify for an insurance premium discount that increases in a step-wise manner with the score. The dataset includes users' data regarding interactions with the application: when a user opens the application, which screens s/he visits, etc. Thus, unlike in most other related papers, we are actually able to track when users check their detailed feedback, making our study richer than studies done with SMS, e-mail, or paper mail campaigns, where we seldom know precisely whether and when users have read or observed the intervention. While the question "why do people seek detailed feedback" merits a separate study given decades of extensive work in this area (Anseel et al. 2015), we focus on the impact of receiving detailed real-time feedback.

Existing operations literature utilizing lab experiments to study various phenomena (such as the bullwhip effect and newsvendor games) sometimes provides real-time feedback to users. Such studies (e.g., Croson and Donohue 2006, Schweitzer and Cachon 2000) involve multiple decisions over several rounds, allowing users to see the outcomes of their past actions. Unlike these studies, we focus on field data. Moreover, most of these studies (e.g., Bolton and Katok 2008, Schweitzer and Cachon 2000) focus on decision biases in the presence of feedback (or information) that is mandated, and they do not study the effect of providing feedback on performance.

Our application provides a rather unique setup, which helps us study the impact of real-time feedback on end-consumers. First, we use novel trip-level data for diverse individuals receiving real-time feedback; they are not part of an organization or a group and have independent monetary incentives. Second, feedback is available in the application with no social cost, unlike in many organizational setups. Third, individuals receive the feedback without being shown any information about the driving behavior of other participants (no relative feedback). Fourth, users self-select to see their detailed feedback; therefore, the feedback-seeking behavior is intrinsic rather than mandated. Fifth, in our setting, we know precisely when people saw their detailed feedback, in contrast with numerous SMS/email campaign-based field studies, where it is seldom known whether and when the intervention was actually observed.

Our results suggest that, surprisingly, reviewing detailed feedback may not lead to improved driving quality. On average, the driving performance of users who review the detailed feedback regarding their last driving trip is 13.3% worse than the performance of users who do not review their detailed feedback, which is equivalent to a one-year reduction in inter-accident time. Moreover, users who review their performance more frequently experience larger deterioration in performance. Our results indicate this deterioration is associated with a higher number of sharp accelerations and increased over speeding. Drivers also increase variation in their speed within a trip after feedback, which is associated with a 1.65% increased probability of an accident. Although we show that the overall effect of feedback is negative, we find that insurance incentive thresholds are important: they make the impact of feedback consistently less negative, and the lowest threshold is the most important one. A plausible explanation of our results is overestimation bias regarding one's own driving skills. It is well known that drivers overestimate their driving skills (see Roy and Liersch 2013), often regarding them as nearly perfect. Receiving scores far below perfect could make drivers feel bad, challenging their self-belief, which might induce them to do worse after detailed feedback. Since individuals have very different, idiosyncratic definitions of "bad" driving, they may realize that their definition does not align with the scoring system, leading to negative reactions to feedback.

This paper makes several contributions. We contribute to the literature studying the lesser-understood behavioral impact of real-time feedback on performance. To the best of our knowledge, ours is among the first works in the context of real-time feedback delivered through telematics in a field setting. We also contribute to the feedback literature, demonstrating potentially counter-intuitive effects of real-time feedback. Finally, we demonstrate the critical role that incentive thresholds play in the effect of real-time feedback, which suggests ways to implement personalized real-time feedback approaches.

2. Related Literature and Hypotheses Development

Our research is closely related to the operations, business management, and economics literature in which feedback systems have been shown to be instrumental in improving performance, such as productivity improvement through performance feedback (Song et al. 2017). In a very recent study, using data from a UBI provider, Soleymanian et al. (2017) find that individuals' overall scores improved by 9% and harsh braking reduced by 21% due to enrollment in UBI. In contrast to our study, this paper studies the impact of being in the program (a one-time decision) and not the effect of the specific feedback provided (which happens during the program). A related study (Blader et al. 2015) used a field experiment with a large US transportation company with electronic on-board recorders, which monitor drivers' performance and provide transparent assessment. Their results indicate that providing feedback leads to better performance. In that study, the feedback monitor is fitted on the trucks for drivers to monitor performance continuously, and alarms in the devices sound when a driver exceeds thresholds. In our case, drivers self-select to see their detailed feedback and do not receive any intermediate alarms if they perform poorly during a trip. In addition to these applications, telematics implementation in fleets has shown significant improvement in driving safety (Leverson and Chiang 2014, Toledo and Musicant 2008) and fuel economy (Leverson and Chiang 2014). Though real-time feedback is a common theme in these papers, in all of these cases the feedback is forced on the individuals, which is mostly feasible in fleet applications. In contrast, our drivers monitor themselves and seek feedback when they wish, and the effect we study is different from mere participation or monitoring in the program.

A rich body of work in the psychology and behavioral economics literature seeks to understand how conventional (i.e., not real-time) feedback affects performance. It has been relatively well established that feedback is related to beneficial outcomes, including self-awareness and competency enhancements. It inculcates various aspects of self-development, such as setting goals for performance improvement. In our case, drivers enrolled in the telematics program can seek detailed feedback regarding their driving performance on a recently completed trip as well as their overall performance across all past trips. Performance feedback provides information about prior performance and serves as a basis for evaluating one's capability to perform successfully on subsequent tasks (Bandura 1991). Common wisdom suggests that subjects will be able to find the problematic areas in their driving conduct and improve upon them, given the financial incentives involved. The few papers that do study real-time feedback have shown that it can be effective in improving behavior (Dahlinger et al. 2018, Gnewuch et al. 2018, Tiefenbeck et al. 2016). For example, Tiefenbeck et al. (2016) reported a 22% reduction in hot water consumption when subjects are provided with real-time feedback on their usage. Since in our case drivers are provided with an explicit monetary incentive (a discount) to improve performance, we hypothesize that detailed feedback should lead to a positive impact on drivers' future behavior. The effect of feedback should be observed in the trip that follows the observation of feedback, and such trips should have higher performance scores. Formally, we state our first hypothesis as follows:

H1a: Drivers who seek detailed feedback before a trip will, on average, exhibit higher performance in that trip compared to drivers who do not.

On the other hand, there have been studies demonstrating the ineffectiveness of feedback on user performance (Volpp et al. 2017). Surprisingly, Kluger and DeNisi (1996) found in their meta-analysis of the feedback literature that 38% of the studies report a negative effect of feedback on performance. The effect of feedback on performance can be negative if the feedback provided induces a threat to self-concept (Green et al. 2017): analyzing peer review data, this study found that feedback that is lower than recipients' own self-assessment (i.e., disconfirming feedback) results in performance losses. Since most people believe that they are better drivers than average (Roy and Liersch 2013), it is plausible that feedback in our setting can be disconfirming. The literature studying peer effects (Beshears et al. 2015, Schultz et al. 2007) has reported some unintended consequences of providing feedback (information on peers), such as increased electricity consumption by low-consumption households when peers consume more, and a reduction in savings rate after learning that peers are saving more. In our case, drivers do not feel peer pressure; however, a common scale with a maximum score of 100 can induce a comparison effect that can lead to lower performance. For example, in a lab experiment, Ashton (1990) reports a decrease in performance when users are provided with feedback on performance in the presence of a performance standard. In a case study with bus drivers, Rolim et al. (2017) observe that drivers do worse on environmental parameters (e.g., idling) after real-time feedback is provided. Hannan et al. (2008) show that providing precise relative feedback can result in deterioration of performance. Studying the effects of feedback on motivation, Deci et al. (1999) found a negative correlation between extrinsic rewards and task performance. Therefore, we hypothesize that feedback in the case of telematics may have a negative effect.

H1b: Drivers who seek detailed feedback before a trip will, on average, exhibit lower performance in that trip compared to drivers who do not.

Further, feedback-seeking behavior has been regarded as a useful resource for individual adaptation (Ashford 1986). Frequent feedback seeking leads to identification of gaps and provides individuals with opportunities to address them. In addition, repetitive evaluation reinforces its effect. It is natural to hypothesize that more feedback over a time period should have a multiplier effect, leading to a change in performance in the same direction (positive or negative) as the effect of real-time feedback. That is, we predict that a high number of feedback views will amplify the effect of real-time feedback on next-trip performance. Our Hypothesis 2 states this formally as follows:

H2: When the effect of feedback on performance is positive (negative), drivers who seek feedback more (less) frequently perform better than drivers who seek detailed feedback less (more) frequently.

Driving involves uncertainty. In making decisions under uncertainty, people tend to make systematic errors (Kahneman and Tversky 1973). Using heuristics based on past performance can lead to systematic errors in the driver's future trips. People overestimate the probability of a rare event (e.g., achieving the highest score) when it occurs. This can lead to carelessness on the next trip after achieving a high score. As a result, good past scores can lead to poor future performance. Research reveals that positive or "good" feedback does not automatically lead to positive reactions (Mulder 2013). Ashford and Tsui (1991) found that negative feedback (associated with seeking critical appraisals that might hurt) leads to better performance. Burgers et al. (2015) further illustrated a positive effect of negative feedback on performance, with temporal nuance, in a study employing computer games. They show that negative feedback (e.g., telling users "you perform poorly") has a positive impact on performance in the short term; however, positive feedback (e.g., telling users "you did well") is more effective for long-term behavioral improvement. The authors find that intrinsic motivation plays an important role in the temporal aspect of feedback. In a field study, Martocchio and Webster (1992) examine the effect of feedback on the training performance of employees of a large public university. They find that positive feedback (sharing with participants that their performance was excellent) leads to improved learning compared to negative feedback (sharing with participants that they performed below average) and thus can improve long-term performance. In our case, the reward for better driving is realized in the long term, after the end of six months. Drivers qualify for a discount on their annual insurance premiums once they reach a threshold in their to-date scores. Following the theory and the evidence provided in the prior literature, we hypothesize that a driver who receives poor scores (where we define "poor" or "negative feedback" as scores lower than the financial incentive threshold) would improve more than a driver who receives "high" scores or "positive feedback" (i.e., scores higher than the financial incentive threshold). (We later use alternative definitions of 'negative' and 'positive' feedback to test this hypothesis.) Formally, we state our hypothesis as follows:

H3: After reviewing detailed feedback, performance on the next trip (short-term effect) for the drivers who receive negative feedback will be better than for the drivers who receive positive feedback; average performance over all the trips taken after receiving detailed feedback (long-term effect) for the drivers who receive positive feedback will be better than for the drivers who receive negative feedback.

One important aspect of real-time feedback is that it enables users to act on their behavior promptly to induce correction. In a field experiment, Tiefenbeck et al. (2016) show that providing real-time feedback results in large behavior changes in the consumption of hot water during showering, and the full effect on behavior change may be realized from the onset of the intervention. This behavior change can also be caused by humans' innate tendency to over-react. For example, attributing it to cognitive limitations, Croson and Donohue (2006) find that people tend to make adjustments even when they are not required to: they observe variation in orders made by subjects facing stationary demand in a lab experiment studying the bullwhip effect. Over-reaction to short-term information (feedback) has also been documented in several other experiments (Bolton and Katok 2008), where performance-tracking information may not lead to a reduction in over-reaction. On the other hand, inadequate adjustment of behavior (e.g., ordering) while responding to variability (e.g., uncertain demand) can lead to higher variance in the outcome (e.g., profit). For example, in a newsvendor lab experiment, Schweitzer and Cachon (2000) show that, even in the presence of feedback (through profits made, the outcomes of past actions), people make insufficient adjustments in ordering behavior, which can be attributed to anchoring. Kremer et al. (2011) find that forecasters exhibit systematic judgement biases that lead to both over- and under-reactions. In a lab experiment, they provide on-screen information on past decisions and outcomes (as feedback) and find that people over-react to forecast errors in stable time-series and under-react to errors in relatively unstable time-series. Although they do not study the effect of feedback (they provide feedback to everyone), in our context these effects together should lead to higher performance variability in trips with detailed feedback compared to trips without detailed feedback.

However, the literature also suggests that people may become more consistent after receiving feedback; this can be attributed to the effect of feedback as well as to overcoming cognitive biases. In a field experiment with patients, Lee and Dey (2014) found that providing real-time feedback on medication taking using an in-home ambient display significantly reduces variation (measured by adherence to schedule, i.e., variance in the time of day the medication was supposed to be taken). Findings from lab studies of the bullwhip effect (Croson and Donohue 2006, Wu and Katok 2006) have shown substantial reduction in ordering oscillation across the supply chain when information on past decisions and performance is provided. This improvement comes from a reduction in over-reaction by each participating agent using the information provided (feedback). Put together, information sharing and consistency in behavior may lead to lower performance variability in the trips with detailed feedback compared to the trips without detailed feedback. Therefore, we form two competing hypotheses regarding the impact of feedback review on the size of performance adjustment.

H4a: Trips with detailed feedback have higher performance variation (i.e., changes in performance from trip to trip) than the trips without detailed feedback.

H4b: Trips with detailed feedback have lower performance variation (i.e., changes in performance from trip to trip) than the trips without detailed feedback.

‘Goal-gradient’ theory suggests that people close to a goal tend to exert higher effort to improve performance, and near the first reward people tend to exert very high effort with increased engagement (Kivetz et al. 2006). After attaining a reward threshold, people tend to reset their efforts. For example, in reward programs, people tend to spend more to gain the next status (e.g., in airline loyalty programs). Using lab experiments, it has been found that people pay more attention to the goal as they get closer to attaining it, and that visualization (e.g., through progress bars) increases commitment and effort (Cheema and Bagchi 2011). This effect of increasing effort near the goal is well established in the literature and is termed the ‘goal looms larger’ effect (Forster et al. 1998). We expect that drivers close to achieving the threshold score, which is necessary for a discount, will perform better than drivers who are far from the threshold. However, once the goal is reached, the motivation to do well diminishes (Lee et al. 1997). This prediction is also valid in terms of economic returns to effort. Therefore, we hypothesize that proximity to the threshold from below will have a positive effect on performance.

H5: After reviewing detailed feedback, drivers whose to-date score is lower than, but near, a threshold perform better than drivers whose to-date score is higher than or far from the thresholds.

3. Data & Analysis

Data and Variables

We use a data set from a startup in Singapore, which will remain unidentified to protect confidentiality. The data consist of trip-level records for drivers enrolled in the UBI program offered by a large insurer. Users enroll in the program using a smartphone application (app). For each trip, the app collects start and end time stamps, origin and destination, GPS coordinates, and miles driven. In addition, the app captures driving behavior on three dimensions (speeding, acceleration, and braking) and assigns a trip score on a scale from 0 to 100 based on a proprietary algorithm that combines the assessments on these dimensions using a variety of sensors built into the mobile phone (such as the gyroscope, GPS, and accelerometer). Following every trip, the user receives a push notification with an overall score for the trip. Further, users can open the application and study the detailed feedback, which includes the overall score and its individual components. In the dashboard, scores are displayed in two ways: a trip score (the score for the latest trip) and a to-date score (an average score based on all the trips taken in the past), along with the potential discount a user can receive.

It is evident that three self-selections happen during the UBI process. First, drivers decide to insure with the focal insurance company, probably because the company offers them a competitively priced policy. Second, the drivers self-select into the optional UBI program. Third, drivers choose to seek feedback by visiting the detailed trip evaluation. Unfortunately, our data do not allow us to study or address the first and second self-selection issues, and we focus in this paper on the effect of feedback on the driving behavior of users who choose to enroll in the UBI program and receive real-time feedback on their driving. This population is, in fact, the most relevant for the insurance company, because it has no means of (or desire for) forcing the app on all insured drivers, and it cannot insure everyone. Naturally, while the effects that we study have a relatively short time horizon, it is possible that, in the very long term, they can begin to affect whom the company insures and who selects to enroll in UBI. If anything, the first and second self-selections bias our results downwards: users who enroll in UBI probably believe that they are good drivers and that they can obtain a discount by improving their driving behavior further, which means our negative effect of feedback is conservatively estimated.

Any driver can register for the telematics application provided by the insurer. There is no cost or risk to signing up: a good score can qualify a user for a discount, whereas a bad score does not currently increase the premium. Once drivers sign up, they need to keep the application running for a minimum of 6 months and drive at least 5,000 km. After completing this 6-month interval, drivers can obtain a discount on the insurance policy renewal fee based on their driving scores. Drivers obtain no financial benefits (or discounts) until they achieve a to-date threshold score of 70. Above the threshold, drivers qualify for monetary benefits in the form of a discount on policy renewal, which increases from 5% (scores from 70 to 80) to 20% (scores > 95). This premium discount comes on top of the no-claims discount provided to drivers (an additional discount); no-claims discounts are provided to drivers who did not make any claims in the previous year while holding the policy. Additive discounts and no sign-up fee create a low entry barrier for drivers.
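For illustration only, this step-wise incentive can be sketched as a mapping from the to-date score to a discount rate (Python). The text specifies only the endpoints (no discount below 70, 5% for scores from 70 to 80, 20% above 95), so the intermediate tier below is a hypothetical placeholder, not the insurer's actual schedule:

```python
def premium_discount(todate_score: float) -> float:
    """Map a to-date score to an insurance-premium discount rate.

    Known from the text: no discount below 70, 5% for scores in [70, 80),
    and 20% for scores above 95. The middle tier is hypothetical.
    """
    if todate_score < 70:
        return 0.00          # below the lowest incentive threshold
    if todate_score < 80:
        return 0.05
    if todate_score <= 95:
        return 0.10          # hypothetical intermediate tier
    return 0.20
```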

We have 81,839 recorded trips in the original dataset. To ensure that our trips are comparable, we drop 239 trips that originate in or are destined for Malaysia (a neighboring country). We further remove trips with lengths below 0.3 km or above 70 km (excluding another 488 observations). Similarly, we exclude users with fewer than 10 trips in total (227 trips). Our final dataset consists of 80,885 observations for 382 users over a period of 140 days. Next, we describe the variables used in our analysis, tabulated in Table 1. We index each observation by i, j, and t, where i denotes individuals, j denotes trips, and t denotes the dates on which trips were taken.
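A minimal sketch of these sample-construction filters in Python (pandas), assuming hypothetical column names such as trip_km, origin_country, dest_country, and user_id (the original dataset's schema is not disclosed):

```python
import pandas as pd

trips = pd.read_csv("trips.csv")  # hypothetical export of the trip-level data

# Drop trips originating in or destined for Malaysia.
trips = trips[(trips["origin_country"] != "MY") & (trips["dest_country"] != "MY")]

# Keep trip lengths between 0.3 km and 70 km.
trips = trips[trips["trip_km"].between(0.3, 70)]

# Exclude users with fewer than 10 trips in total.
trips = trips[trips.groupby("user_id")["user_id"].transform("size") >= 10]
```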

oscore_ijt. Our outcome variable, which we refer to as the (overall) trip score. After each trip, users receive a notification on their phone with their overall score for the last trip. This score is derived according to a proprietary algorithm based on three behaviors of the users (speeding, acceleration, and braking) and two trip aspects (the hour in which the trip was taken and the distance travelled). Scores are provided on a scale of 0 to 100.

todatescore_ijt. Based on the performance in all the trips taken up to date t, a to-date score is provided to the users using a proprietary algorithm, and it determines the discount a user will receive after the completion of the program. This score reflects the to-date performance of user i based on all the trips taken up to trip j on date t.

While each user is notified about their score using a push notification, some users choose to open the dashboard and review their trip score, to-date score, potential incentives, and performance on various driving dimensions. We call this event a detailed feedback review, and we know when the users open the application and which screens they visit. We define the binary variable feedback:

feedback_ijt = 1 if user i visits the main dashboard before the current trip j, which takes place at time t; 0 otherwise.

Figure 1: Screenshots of the Smartphone Application

The app consists of five screens (see Figure 1). The screen shown first is the dashboard, which summarizes to-date performance along various dimensions of driving behavior. In the lower part of the dashboard, the maneuvers ring measures acceleration and deceleration behavior, and the speed ring captures over-speeding behavior. The dashboard also shows the discount percentage a user can obtain, at the time of policy renewal, with the current to-date performance. The screen graphically displays scores for maneuvers (that is, acceleration and braking behavior), mileage (projected kilometers one would drive in a year), over speeding, and drive time (driving in the rush or night hours). A full solid ring denotes a score of 100; the partial solid rings in Figure 1 denote a score of 85 out of 100. The mileage score is based on the estimated kilometers a user would drive over a year: a higher number of kilometers driven is associated with a higher probability of accidents, so higher mileage translates into lower scores. In our app, users are penalized for mileage if they drive more than 10,000 km a year. Users can swipe right to other screens in the app to see additional details of their driving performance; for example, users can see how many steep accelerations they have made per 100 km travelled (second screen in Figure 1).

The app captures the time stamps of screen visits, so we can detect whether the user has reviewed detailed feedback before starting the next trip. We find that screen visits are almost perfectly correlated (correlation > 0.99), indicating that users who visit the dashboard also visit most other screens; therefore, we use only the main dashboard visit as the treatment variable, named feedback_ijt.

cfeedback_ijt. A discrete variable capturing the total number of times user i has viewed feedback prior to trip j on day t; it hence captures the frequency effect of feedback.

perdailyhr_ijt, perrushhr_ijt, pernighthr_ijt. Drivers' scores are penalized if travel happens during rush or night hours, which are associated with a higher probability of accidents. To account for the effect of these variables on the score, we control for the percentage of trip duration that occurs during night and rush hours. For each trip, we divide the minutes travelled during daily/night/rush hours by the total trip duration. As Singapore is located near the equator, these hours are defined similarly throughout the year. The insurer sets these time intervals after studying Singapore traffic behavior. The hours vary between working days (non-holiday Monday to Friday) and non-working days (holidays, Saturdays, and Sundays). Figure 2 illustrates the classification of hours for different days. Note that one trip can cover multiple types of hours; for example, the first few minutes can be driven during a rush hour followed by a few minutes travelled during a regular daily hour.

freq_ijt. Users who drive rarely may behave systematically differently from frequent drivers. To control for this effect, we calculate the number of trips taken by user i since enrollment in the program prior to trip j on day t, divided by the number of days in the program by day t.

fbfreq_ijt. Users differ in how often they seek feedback, so we control for this propensity. We calculate this variable as the percentage of trips with detailed feedback for an individual up to the current trip.

perphoneuse_ijt. Through the app, we track whether the phone was used during a trip; phone usage is usually distracting and can potentially affect the overall score. For this reason, phone usage for calls and texting while driving is prohibited in Singapore (Singapore Road Traffic Act 65B, implemented in 2015). Unfortunately, we cannot be sure of the reason for phone usage (for example, navigation, a call, or texting). We control for the percentage of trip time during which the phone was in use as a potential distraction. Removing this variable does not alter our results.

In addition to the above variables, we use two lagged variables in our analysis: the one-trip-lagged values of todatescore and oscore. These two variables capture the performance of a user on the trip before the current trip. We denote them ltodatescore_ijt and loscore_ijt and use them as controls. Essentially, ltodatescore_ijt is the to-date score of an individual one trip before the current trip, and loscore_ijt is the oscore of an individual one trip before the current trip. The latest to-date score observed by the user in the app, as well as the last trip score, can influence the user's driving behavior on the subsequent trip; hence, we use these variables as controls in our econometric model.
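These per-user constructions reduce to grouped operations; a sketch in Python (pandas), under the same hypothetical schema as above:

```python
# Sort each user's trips chronologically before lagging.
trips = trips.sort_values(["user_id", "trip_start"])

# One-trip-lagged scores used as controls.
trips["loscore"] = trips.groupby("user_id")["oscore"].shift(1)
trips["ltodatescore"] = trips.groupby("user_id")["todatescore"].shift(1)

# cfeedback: cumulative count of feedback reviews before the current trip
# (the feedback flag itself already refers to the pre-trip window).
trips["cfeedback"] = trips.groupby("user_id")["feedback"].cumsum()
```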

Figure 2: Definition of Rush Hour, Night Hour, and Daily Hour. (The figure maps each hour of the day, 0 to 23, to N = night hours, D = daily hours, or R = rush hours. On working days, the day alternates between night, daily, and rush-hour intervals; non-working days contain only night and daily hours.)
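The per-trip hour shares (perdailyhr, pernighthr, perrushhr) can then be computed minute by minute. A sketch with illustrative boundaries only, since the insurer's exact intervals in Figure 2 are proprietary:

```python
from datetime import datetime, timedelta

def classify_hour(hour: int, working_day: bool) -> str:
    """Illustrative N/D/R classification; the insurer's real boundaries differ."""
    if hour < 6:                                  # placeholder night interval
        return "N"
    if working_day and hour in (7, 8, 17, 18):    # placeholder rush hours
        return "R"
    return "D"

def hour_shares(start: datetime, end: datetime, working_day: bool) -> dict:
    """Percentage of trip minutes spent in night (N), rush (R), and daily (D) hours."""
    counts = {"N": 0, "R": 0, "D": 0}
    t = start
    while t < end:
        counts[classify_hour(t.hour, working_day)] += 1
        t += timedelta(minutes=1)
    total = max(sum(counts.values()), 1)
    return {k: 100.0 * v / total for k, v in counts.items()}
```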

We provide the descriptive statistics for all the variables in the first four columns of Table 1. On average, trips have the highest percentage of daily hours (mean 62.1%), followed by rush hours (mean 34.9%). The users in our sample take an average of 3.7 trips per day. Nearly one third of the trips are treated, with drivers reviewing their detailed feedback before the trip (feedback has a mean of 0.33). It is evident from the table that our data are right-skewed with respect to the variable oscore, which has a mean of 81.39 on a scale of 0 to 100. 52,412 trips happened without any phone usage, while the average phone usage was 2.47% of the time during a trip. The total number of trips observed is 80,885.

Table 1: Summary Statistics and Correlation Structure of the Variables

                       Mean    SD      Min    Max      (1)+    (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)
(1)  feedback          0.33    0.47    0.00     1.00   1.00
(2)  cfeedback        63.49   60.56    0.00   308.00   0.18*   1.00
(3)  perrushhr        34.99   46.74    0.00   100.00  -0.09*  -0.04*   1.00
(4)  perdailyhr       62.10   47.44    0.00   100.00   0.08*   0.05*  -0.94*   1.00
(5)  pernighthr        2.91   16.41    0.00   100.00  -0.06*  -0.01*  -0.13*  -0.22*   1.00
(6)  freq              3.66    1.43    0.12    13.00  -0.13*   0.18*  -0.02*   0.02*   0.01*   1.00
(7)  fbfreq           35.20   19.06    0.00   100.00   0.53*   0.31*   0.02*  -0.02*  -0.00   -0.23*   1.00
(8)  perphoneuse       2.47    6.87    0.00   100.00   0.02*   0.04*  -0.02*   0.01*   0.02*   0.02*   0.03*   1.00
(9)  ltodatescore     72.59   16.64   22.00   100.00   0.04*  -0.11*   0.01    0.00   -0.03*  -0.50*   0.09*  -0.03*   1.00
(10) loscore          81.39   16.81   22.00   100.00  -0.05*   0.00   -0.12*   0.14*  -0.06*  -0.01*   0.01*  -0.03*   0.20*   1.00
* p<0.05; + polyserial correlation; oscore has summary statistics similar to loscore.

Table 1 also shows the correlation structure of our data (columns (1) to (10)). Because perrushhr is almost perfectly negatively correlated with perdailyhr (-0.94), we keep only perdailyhr and pernighthr in our analysis. We do not observe any other collinearity issues among the variables used. Since feedback is a binary variable, we report polyserial correlations for it. As cfeedback is a discrete variable with a very large number of categories, we treat it as continuous and report Pearson correlations (Kolenikov and Angeles 2004). We observe a positive correlation of our treatment variables (feedback and cfeedback) with fbfreq and perphoneuse. We also observe that trip frequency is negatively correlated with feedback.

We start analyzing our data by conducting a t-test for differences between the mean scores of the control trips (for which users did not review their feedback) and the treatment trips (for which users reviewed detailed feedback). Out of the 80,885 trips, we have 26,313 treatment trips and 54,572 control trips. We find that the mean score for the treatment trips (80.4) is 1.9% lower than that of the control trips (81.9). Thus, trips with detailed feedback have lower scores, on average, than trips without detailed feedback. Further, we conduct a Kolmogorov–Smirnov test of equality of distributions to compare the score distributions of trips with and without feedback. The results suggest that the two distributions are significantly different, with p<0.01.
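For illustration, both tests can be run with scipy; a minimal sketch under the hypothetical schema used above:

```python
from scipy import stats

treated = trips.loc[trips["feedback"] == 1, "oscore"].dropna()
control = trips.loc[trips["feedback"] == 0, "oscore"].dropna()

# Difference in mean scores between treatment and control trips.
t_stat, t_pvalue = stats.ttest_ind(treated, control, equal_var=False)

# Kolmogorov-Smirnov test for equality of the two score distributions.
ks_stat, ks_pvalue = stats.ks_2samp(treated, control)
```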

Econometric Model

To test our first hypothesis, we start with a linear regression model to evaluate the impact of feedback obtained prior to a trip on driving behavior during the focal trip. Since decisions to review feedback are likely to be endogenous, we later complement our analysis with a series of matching methods and an instrumental variable regression to account for this endogenous choice. Our data set is an unbalanced panel, since users in the sample complete different numbers of trips during the study period. We specify the following econometric model:

oscore_ijt = α_0 + β·Treatment_ijt + γ·Controls_ijt + date_t + individual_i + orig_j + des_j + ε_ijt        (1)

Treatment_ijt represents one of the two treatment variables, feedback_ijt and cfeedback_ijt (we introduce them one at a time and then together). We include the Controls_ijt vector to account for the potential confounding factors summarized in Table 1. To account for unobserved heterogeneity among dates, such as the impact of weather, we use date fixed effects. To control for time-invariant individual unobservable characteristics, such as gender and education, we use individual fixed effects. Since our data cover about five months of observations, we believe that the individual fixed effects also absorb unobserved driving skills and the vehicle types used by individuals. To control for location-specific characteristics, such as commercial office areas, the number of traffic lights, and the width of roads, we include origin and destination dummies (orig_j and des_j). To obtain them, we map each address to the corresponding district area using the standard location classification of the Singapore government. Our origins and destinations cover the 64 district areas of the country. We have repeated observations for each individual, so we cluster the standard errors at the individual level to address potential correlations among the driving scores.
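A sketch of estimating equation (1) with statsmodels, entering the fixed effects as dummies via C() and clustering standard errors at the individual level (column names follow the variable definitions above; the paper's actual estimation code is not disclosed):

```python
import statsmodels.formula.api as smf

cols = ["oscore", "feedback", "cfeedback", "perdailyhr", "pernighthr",
        "freq", "fbfreq", "perphoneuse", "ltodatescore", "loscore",
        "date", "individual", "orig", "des"]
est = trips.dropna(subset=cols)

# Model 4 of Table 2: both treatment variables plus controls and
# date/individual/origin/destination fixed effects.
model = smf.ols(
    "oscore ~ feedback + cfeedback + perdailyhr + pernighthr + freq"
    " + fbfreq + perphoneuse + ltodatescore + loscore"
    " + C(date) + C(individual) + C(orig) + C(des)",
    data=est,
).fit(cov_type="cluster", cov_kwds={"groups": est["individual"]})
print(model.summary())
```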

Table 2: Effect of Feedback on Next Trip Score (DV = oscore)

                  (1)          (2)          (3)          (4)
feedback                       -0.837***                 -0.840***
                               (0.160)                   (0.160)
cfeedback                                   -0.011***    -0.011***
                                            (0.004)      (0.004)
perdailyhr        0.148***     0.147***     0.148***     0.147***
                  (0.003)      (0.003)      (0.003)      (0.003)
pernighthr        -0.145***    -0.147***    -0.145***    -0.147***
                  (0.005)      (0.005)      (0.005)      (0.005)
freq              0.206        0.195        0.307**      0.296**
                  (0.129)      (0.129)      (0.126)      (0.125)
fbfreq            0.006        0.017        0.016        0.027**
                  (0.010)      (0.010)      (0.011)      (0.011)
perphoneuse       0.034**      0.034**      0.033**      0.033**
                  (0.013)      (0.013)      (0.013)      (0.013)
ltodatescore      0.003        0.004        0.003        0.004
                  (0.033)      (0.033)      (0.031)      (0.031)
loscore           0.074***     0.073***     0.074***     0.073***
                  (0.005)      (0.005)      (0.005)      (0.005)
Constant          79.073***    78.951***    78.049***    77.919***
                  (11.113)     (10.947)     (11.234)     (11.068)
R-squared         0.394        0.395        0.395        0.395
Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1, with individual, date, origin, and destination fixed effects. SEs clustered for individuals, N = 80,503.

We report the estimation results for equation (1) in Table 2. In model 1, we estimate the coefficients using only the control variables, and we find the expected directions for their effects on the trip score. Performance is negatively associated with the percentage of night hours and positively associated with the percentage of daily hours. Further, we find a positive association of oscore_ijt with the past trip score as well as with the to-date score. Trip frequency is also positively associated with performance. We find a positive association of phone usage with the dependent variable; this may be due to drivers using map applications (such as Google Maps and Waze) to identify optimal routes. Next, model 2 includes the first treatment variable (feedback) along with the date, individual, origin, and destination fixed effects. We observe that feedback is negatively associated with the users' performance in the next trip. The coefficient of feedback_ijt is significant and negative (-0.837 in model 2, p<0.01), indicating that, once users review their feedback, they perform worse in the next trip by 0.837 points. This is contrary to what we predicted in H1a.

In model 3, we include the variable cfeedback, which denotes the number of times a user has reviewed detailed feedback prior to the current trip. We find a significant negative association of cfeedback with oscore (coefficient -0.011, p<0.01). In model 4, we include both treatment variables (feedback and cfeedback), and the results are consistent with models 2 and 3: not only feedback but also cfeedback has a negative association with oscore (coefficient -0.011 in model 4, p<0.01), as predicted. This indicates that repeated reviewing of feedback is associated with a further reduction in performance: a user who reviewed feedback 100 times prior to a trip performs 1.1 points lower on the upcoming trip. The effect size is rather small, but our results support H2.

3.2.1. Assessing the Impact of Feedback on Driving Performance using Matching

Using simple reduced-form regression models, our first analysis established that trips prior to which users reviewed their performance scores have lower scores than trips prior to which users did not review their scores. However, because we do not randomly assign treatment to the trips, some other underlying characteristics of the trips may influence the scores, so that the difference in scores may be mistakenly attributed to feedback. To account for this non-random assignment, we next employ matching methods and instrumental variable regression. We start with matching.

Propensity scoring (see Rosenbaum and Rubin (1983) for details) is the standard way to match subjects in a treatment group to statistically equivalent subjects in a control group in the absence of random assignment to groups. This method has been successfully employed by many researchers as a way to select an unbiased control group when studying the effect of some treatment. The key idea of matching in our case is that trips that belong to two different groups (treatment and control) are nevertheless comparable if they have similar propensity scores (Luellen et al. 2005). Therefore, this method enables matching of trips despite non-random assignment.

We compare treated trips to the control trips that are best matched on observed parameters, so as to achieve the best possible balance on covariates between the feedback trips and the matched non-feedback trips and thereby obtain the closest estimate of the marginal effect of feedback on oscore. We consider two approaches to creating an appropriate control set for the treatment trips: (a) propensity score matching and (b) nearest-neighbor matching. Once we have a control group of non-feedback trips for every treatment trip, we estimate the "average treatment effect" (ATE) of feedback based on the mean score of the treatment trips versus the mean score of the control trips. First, we utilize matching based on propensity scores, which are typically calculated as the predicted probabilities of group membership, such as the probability of seeking feedback, based on an appropriate set of observable factors. Thus, propensity scores can be calculated based on the estimation of a binary response model. We use a probit model to estimate the following specification:

Pr(feedback_ijt = 1) = Φ(α_0 + β·Controls_ijt + date_t + individual_i + orig_j + des_j)        (2)
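A sketch of computing the propensity scores from equation (2) with statsmodels (illustrative only; with hundreds of fixed-effect dummies the probit can be slow to converge, and this is not the paper's actual code):

```python
import statsmodels.formula.api as smf

probit = smf.probit(
    "feedback ~ perdailyhr + pernighthr + freq + fbfreq + perphoneuse"
    " + ltodatescore + loscore + C(date) + C(individual) + C(orig) + C(des)",
    data=est,
).fit()

# Predicted probability of reviewing feedback before a trip.
est["pscore"] = probit.predict(est)
```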

Prior research suggests that a small set of matching variables can provide reasonably good predictive power (Dehejia and Wahba 2002). We present the results of the probit model (equation (2)) in Table 3. Of the seven variables, five are significant at the p<0.01 level. Trips involving a higher percentage of night hours or daily hours are associated with a lower probability of feedback before the trip (negative and significant coefficients of perdailyhr and pernighthr). We note a smaller number of observations than in the original dataset, since there are 16 users who either did not review any feedback or reviewed detailed feedback after nearly every trip.

Table 3: Probit Model with Feedback as a Binary Outcome Variable

VARIABLES         feedback
perdailyhr        -0.003***
                  (0.000)
pernighthr        -0.006***
                  (0.001)
freq              -0.038**
                  (0.017)
fbfreq            0.044***
                  (0.002)
perphoneuse       0.001
                  (0.001)
ltodatescore      0.004
                  (0.003)
loscore           -0.003***
                  (0.000)
Constant          -2.547***
                  (0.875)
Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1, with individual, date, origin, and destination fixed effects. SEs clustered for individuals, N = 79,648.

Given a trip with detailed feedback and its predicted probability of being a treatment trip, as determined by the probit estimation, we take the most similar non-feedback trip to be the one with the numerically closest ex ante predicted probability of feedback, matching with replacement. Once we determine the best match for each treatment trip, we compute the mean score for each group and perform a difference-of-means t-test, as before. We show these results in the first column of Table 4, which reports the size of the ATE, i.e., the difference in mean scores between the feedback and non-feedback trips. The matched sample based on propensity scores suggests that feedback has a negative impact on the score of the next trip (coefficient -1.189, p<0.01). The effect size is quite similar to the coefficient obtained without matching: the OLS slightly underestimated the coefficient when we compare it to the ATE provided by propensity score matching.
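The matching step itself reduces to a one-nearest-neighbor lookup on the propensity score, with replacement. A sketch (scikit-learn), again under hypothetical column names; the paper reports the resulting mean difference as the ATE:

```python
from sklearn.neighbors import NearestNeighbors

treated = est[est["feedback"] == 1]
control = est[est["feedback"] == 0]

# For each treated trip, find the control trip with the numerically
# closest propensity score (1-NN, with replacement).
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]].to_numpy())
_, idx = nn.kneighbors(treated[["pscore"]].to_numpy())
matched = control.iloc[idx.ravel()]

# Difference in mean scores between treated trips and matched controls
# (compare with column (1) of Table 4).
effect = treated["oscore"].mean() - matched["oscore"].mean()
```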

Although the propensity scoring method has been used extensively in the literature, recent seminal work (Abadie and Imbens 2011) has questioned the large-sample properties of such matching estimators. They propose a set of nearest-neighbor (NN) matching estimators for the treated units that provide each unit with a control group of NN ≥ 1 untreated (control) observations whose covariates are most similar to those of the treated unit. We set NN to one in the results reported below; that is, each trip with feedback is matched to its nearest neighbor, with replacement.

We use the same covariates for matching that we employed for the probit estimation in Table 3. We report the results of matching in columns 2 to 6 of Table 4; the first row shows the effect of feedback on oscore. Abadie and Imbens (2011) have shown that nearest-neighbor matching estimators are not consistent when matching on more than one matching variable; therefore, we employ the bias-adjusted estimator using all continuous control variables, as prescribed in the literature, to obtain a consistent estimator. We calculate robust standard errors and observe that our results are consistent across models. We also try exact matching on variables (date, individual, and both) and obtain consistent results, which further reinforces the earlier analysis (refer to models 4-6). Note that we are unable to use robust standard errors when conducting exact matching on both individual and date variables (column 6), as the number of observations drops significantly (fewer than 8,000). Our estimated effect size using OLS is within the range of the results obtained using matching.

Table 4: Results of Matching: Effect of Detailed Feedback on oscore

                       PSM        Nearest-Neighbor Matching
                       (1)        (2)        (3)        (4)         (5)        (6)
feedback (1 vs 0)      -1.189***  -1.160***  -0.752***  -0.958***   -0.671***  -0.794***
                       (0.224)    (0.277)    (0.152)    (0.174)     (0.136)    (0.137)
Observations           79,648     79,648     80,503     77,881      80,503     55,710
Matched treatments     26,042     26,042     26,042     26,012      26,042     21,236
Matching variables     -          propensity score      all control variables, (3)-(6)
Bias-adjusted          -          -          yes        yes         yes        yes
Robust std. errors     yes        yes        yes        yes         yes        no
Exact-match variables  -          -          -          individual  date       date & individual

Standard errors reported in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

3.2.2. Instrumental Variable Analysis

Looking at detailed feedback is likely an endogenous decision. Namely, we can realistically envision mechanisms in which unobserved variables affect both feedback and oscore. For example, if drivers use their phones extensively to plan an unexpected, rushed trip, they are more likely to view their dashboard (that is, review feedback) due to the increased screen time; the same rushed trip may also cause speeding and lead to a lower oscore, thereby affecting both feedback and oscore. Such omitted variables will lead to underestimation of the effect of feedback on performance and potentially biased estimates. A Hausman test does not allow us to reject the hypothesis that feedback is endogenous. Therefore, to address the endogeneity of the decision to review detailed feedback, we use an instrumental variable approach with two instruments: 1) the standardized time between two trips, and 2) network outages.

We define time between two trips as the difference (in minutes) between the end of the latest trip and the start of the next trip. We standardize the time between trips by subtracting the average of the time between trips and dividing by the standard deviation of the time between trips for each user. This standardization helps us disentangle the users’ trip frequency effect (which we control for in the model) from the instrumental variable. We denote this variable stdtimeb4trip.
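In symbols, with i indexing users and j trips (our notation, not the paper's), this standardization reads:

```latex
\[
\mathit{stdtimeb4trip}_{ij}
  = \frac{\mathit{timeb4trip}_{ij} - \overline{\mathit{timeb4trip}}_{i}}
         {\sigma_{i}\!\left(\mathit{timeb4trip}\right)},
\]
```

where the mean and standard deviation are computed separately for each user i.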

We argue that stdtimeb4trip should be positively correlated with the feedback variable (hence, the relevance condition holds): if a user has more time between two trips (and hence a higher standardized time between trips), it is more likely that the user will be able to review feedback. We further argue that, as the start time of a trip is largely exogenous (i.e., driven by the user's needs, such as work location and hours), it is unlikely that stdtimeb4trip affects oscore through any channel other than feedback (hence, the exclusion restriction holds). One such potential channel could be driver fatigue, as documented in the literature studying the effect of drivers' break time between trips on their driving performance (Chen and Xie 2014, Hyodo et al. 2017). In our dataset, however, the trips have three distinct features that make this effect unlikely. First, trip duration is short: for most trips (75th percentile) it is less than 27 minutes, and most users (75th percentile) take fewer than 5 trips a day (i.e., these are not heavy car users like, say, taxi drivers). Second, users take breaks of more than 55 minutes before most trips (75th percentile). Third, fewer than 3.4% of the trips involve any night-time driving. To sum up, the driving durations are short enough and the rest times long enough that fatigue should not alter the drivers' cognitive ability and directly affect oscore (Chen and Xie 2014, Hyodo et al. 2017). Therefore, we argue that stdtimeb4trip does not affect our dependent variable oscore directly or through any channel other than feedback review, satisfying the exclusion restriction for instrumental variables.

The distribution of stdtimeb4trip is skewed within the range [-1.57, 25.88] due to the high variability in the time between two trips, which has a mean of 454 minutes and a median of 161 minutes. Out of the 80,885 trips, 382 trips (the first trip of every user) lack stdtimeb4trip. There are 85 back-to-back trips with no time gap between them, which we include in our analysis, and 2,971 trips taken with a gap of more than 24 hours. To address the skewness, we take the log transformation of stdtimeb4trip, using $\log_e(1 + x + |\min(x)|)$ (the shift keeps the argument positive, and the "+1" accommodates the zeros in our data), code the result as logstdtimeb4trip, and use it in all further analysis. The correlation between logstdtimeb4trip and feedback is positive (0.46, p<0.05), as predicted. In a 2SLS estimation, the first-stage F-stat of this instrument is 1197.10, indicating a strong instrument.
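A sketch of this transformation in Python, under our reading that the shift equals the magnitude of the in-sample minimum so that the argument of the logarithm stays positive:

```python
# Sketch: shift stdtimeb4trip by |min(x)| so that 1 + x + |min(x)| >= 1,
# then take the natural log; log1p(y) computes log(1 + y).
import numpy as np

x = df["stdtimeb4trip"]
df["logstdtimeb4trip"] = np.log1p(x + abs(x.min()))  # log_e(1 + x + |min x|)
```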

Our second instrumental variable is mobile network disruption (outage), an exogenous shock affecting users' ability to utilize services requiring network connectivity. To construct this variable, we use data from http://downdetector.sg, which publishes user-reported network disruptions throughout the year. It aggregates data for all three telecom providers in Singapore (Singtel, Starhub, and M1), which account for nearly 100% of the mobile users in the country, and time-stamps their outages. The website checks and validates these data against a series of sources to ascertain network disruptions. We code the network disruption data with a variable outage_{jt} that denotes the total number of outages, across all three network providers, reported between the end of trip j-1 and the beginning of trip j, for trips taken on date t. Thus, outages affect the feedback variable if they happen after the previous trip has ended but before the next trip starts.
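Constructing outage_{jt} amounts to counting the reported outages that fall between the end of trip j-1 and the start of trip j. A sketch, with hypothetical column and DataFrame names:

```python
# Sketch: count outages reported between the previous trip's end and the
# current trip's start. 'outages' has one row per reported outage;
# 'trips' has one row per trip with prev_trip_end and trip_start timestamps.
import numpy as np

ts = np.sort(outages["timestamp"].to_numpy())
lo = np.searchsorted(ts, trips["prev_trip_end"].to_numpy(), side="right")
hi = np.searchsorted(ts, trips["trip_start"].to_numpy(), side="left")
trips["outage"] = hi - lo  # number of outages in the inter-trip window
```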

When network connectivity is poor, the app shows a 'disconnected from network' notification on the smartphone. In the absence of network connectivity, trip data are stored on the phone and transmitted to the server for score calculation once the network is restored; users then receive notifications from the app after the phone reconnects to the network. Outages should cause users to look at their phones more often to check whether the network has been restored (e.g., in order to use Google Maps, or simply to check on the reasons for the lack of notifications). Hence, we predict that more outages are associated with higher feedback-viewing activity, and therefore positively associated with the feedback variable (hence, the relevance condition holds). Indeed, our data confirm that outage and feedback are significantly positively associated (0.15, p<0.05). Full network outages in the country are rare, and not everyone who receives a delayed notification necessarily visits the application dashboard; consequently, the correlation is somewhat weak. The first-stage F-stat of this instrument is 699.01, indicating a strong instrument (using a 2SLS estimator). The joint F-statistic of the two instrumental variables is 637.45, well above the threshold value of 10, indicating that our instruments are not weak.

We check and confirm, from the providers' websites as well as news wires, that these outages are technical faults, not consequences of poor weather, which could cause excessive traffic and, in turn, directly affect oscore. We argue that outage affects feedback without directly affecting oscore, given that it is an exogenous shock (hence, the exclusion restriction holds). One could argue that during a network outage users are unable to use their phones and are hence subject to fewer distractions while driving, which could affect oscore directly; however, since we control for phone usage in our model, our instrumental variable satisfies conditional independence after this control.

We use the method provided in Wooldridge (2002) to estimate the effect of feedback on performance, as our treatment variable is binary (for a practical application of this method, see Adams et al. 2009). We adopt a three-step method as follows: (i) we estimate a binary response model of feedback on Z (instruments) and controls x to obtain the fitted probabilities $\widehat{feedback}$; (ii) we then regress feedback on the fitted $\widehat{feedback}$ and controls x to obtain the fitted values $\widetilde{feedback}$; (iii) finally, we regress oscore on $\widetilde{feedback}$ and controls x. In the first step, we estimate a probit model using logstdtimeb4trip, outage, and all other control covariates as follows:

$\Pr(feedback = 1 \mid \mathbf{E}) = \Phi(\gamma\mathbf{E})$,  (3)

where E = {Z, x} is the set of exogenous variables. The results of the first-step estimation are reported in Table 5. To estimate the parameters and ensure convergence, we drop nine outlier observations for which the time between two trips exceeds 7 days. Next, using the predicted probabilities, we run a 2SLS model to estimate the effect of feedback on oscore. In step 1, we obtain a positive and significant coefficient for our instrument logstdtimeb4trip (3.441, p<0.01) and a significant positive coefficient for outage (0.093, p<0.01), as reported in Table 5. The positive signs indicate that, as predicted, higher logstdtimeb4trip and outage are associated with a higher probability of users reviewing their dashboard.

Using the predicted probabilities from this probit model as instruments, we run a 2SLS model with feedback as the endogenous variable. This method is advantageous because it yields correct standard errors (Wooldridge 2002). In the first stage of the 2SLS, we obtain an F-stat above 10, indicating strong instruments (Staiger and Stock 1997). Using a linear model (2SLS), the Anderson-Rubin test (χ² = 29.3, p<0.01) and the under-identification test (significant at p<0.01) indicate the instruments' sufficiency. The coefficient for the predicted probabilities is 1.074, significant at p<0.01. We report the results of the second stage of the 2SLS in Table 6, model 1: we find a negative and significant coefficient for feedback (-2.220, p<0.01).
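A compact sketch of this three-step procedure, using statsmodels for the probit step and linearmodels for the 2SLS stage. The DataFrame, the control-column list, and the clustering column are placeholders, and the fixed effects are folded into the controls for brevity.

```python
# Minimal sketch of the Wooldridge (2002) procedure for a binary endogenous
# treatment: probit fitted probabilities serve as the instrument in 2SLS.
import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

Z = df[["logstdtimeb4trip", "outage"]]   # instruments
W = sm.add_constant(df[control_cols])    # exogenous controls (placeholder)

# Step (i): probit of feedback on instruments and controls.
probit = sm.Probit(df["feedback"], pd.concat([W, Z], axis=1)).fit()
df["fb_hat"] = probit.predict()          # fitted probabilities

# Steps (ii)-(iii): 2SLS with the fitted probability as the instrument,
# which yields correct second-stage standard errors.
iv = IV2SLS(dependent=df["oscore"], exog=W,
            endog=df["feedback"], instruments=df[["fb_hat"]])
res = iv.fit(cov_type="clustered", clusters=df["individual"])
print(res.summary)
```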

Table 5: First Step – Instrumental Variable Regression

VARIABLES              feedback
logstdtimeb4trip       3.441*** (0.118)
outage                 0.093*** (0.014)
Constant               -5.649*** (0.987)
Controls               yes
Observations           79,639
origin/destination FE  yes
individual FE          yes
cluster                individual

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Further, as a robustness test, we also estimate a model treating feedback as a continuous rather than binary variable with a 2SLS estimator; we report the results in model 2 of Table 6. Similarly, we estimate two models (feedback as binary, feedback as continuous) with only one instrument at a time. We report these results in models 3 and 4 of Table 6 with logstdtimeb4trip as the single instrument, and in models 5 and 6 with outage as the single instrument. Even with a single instrument, we obtain consistent results: the estimated effects are comparable in both size and significance.

To summarize, the effect size is nearly three times larger than the estimate obtained without the instrumental variables. On average, users who review their detailed feedback prior to a trip perform worse on the upcoming trip by 2.22 points (model 1, Table 6), or equivalently by 13.3% in terms of the standardized effect size (measured as a multiple of the standard deviation of oscore), than users who do not review their feedback. As a robustness check, we also run sub-sample analyses for trips taken with a minimum gap of 30 and 60 minutes between trips; our results remain consistent.

After discussing the results with the collaborating firm, we are able to estimate the economic size of the effect. Based on historical data, the firm can link the scores to the number of years until the next accident. For an average driver in our data set, with a mean score of 81.4, a one-point reduction in performance reduces the time until the next accident by 0.4 years, thus significantly increasing the probability of having an accident. Our results therefore indicate that feedback reduces the inter-accident time by 5.6 percent, or roughly one year.

Table 6: Results of Instrumental Variable Regression (dependent variable: oscore)

               (1)         (2)         (3)         (4)         (5)         (6)
feedback       -2.220***   -2.787***   -2.212***   -2.786***   -2.459***   -2.835***
               (0.422)     (0.492)     (0.424)     (0.496)     (0.828)     (0.931)
Constant       78.761***   78.657***   78.762***   78.658***   78.735***   78.660***
               (10.575)    (10.506)    (10.576)    (10.507)    (10.534)    (10.504)
Controls       yes         yes         yes         yes         yes         yes
Observations   79,639      80,494      79,639      80,494      79,648      80,503
R-squared      0.393       0.392       0.393       0.392       0.393       0.392
Instruments    logstdtimeb4trip, outage   logstdtimeb4trip        outage
Model          3-step      2SLS        3-step      2SLS        3-step      2SLS

Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1; estimation with individual, date, origin, and destination fixed effects; SEs clustered at the individual level.

3.2.3. Positive vs. Negative Feedback

To test H3, we first define negative and positive feedback. Drivers know that any final score below 70 does not qualify them for any financial benefits; thus, a trip score of 70 serves as a natural reference point for the users, and we use this reference to define positive and negative feedback. Namely, we create a binary variable negfbck, which equals one when the last obtained trip score is below 70, and zero otherwise. We interact this variable with feedback and estimate the effect of negative feedback on the trip score, using the instruments logstdtimeb4trip and outage for feedback. We report the results in the first column of Table 7. We find a positive and significant coefficient for the interaction term negfbck×feedback (1.357, p<0.01), indicating that negative feedback moderates the impact of feedback, making it less negative than it would be otherwise.
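The construction of negfbck and its interaction term is straightforward; a sketch, with column names mirroring the paper's variables:

```python
# Sketch: negative-feedback indicator relative to the 70-point incentive
# threshold, interacted with the feedback dummy.
df["negfbck"] = (df["loscore"] < 70).astype(int)
df["negfbck_x_feedback"] = df["negfbck"] * df["feedback"]  # interaction term
```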

To test the long-term implications of feedback, we define futureaverage_{ijt}, an outcome variable that captures the long-term mean of all trip scores obtained in the future (all trips from j+1 onwards). Along the same lines as negfbck, we define positive feedback, code this variable as posfbck, and estimate our model. We report the results in the second column of Table 7, where we observe a positive effect of positive feedback on long-term performance (the coefficient of posfbck×feedback in column 2 is 0.148, p<0.1). Therefore, H3 is supported. Interestingly, the main effect of feedback is statistically insignificant for future performance. We further run the analysis with future moving averages over the next two to seven trips after seeking feedback and report the results in Table E.1 (e-companion). Our results clearly show that the effect of feedback is short-lived (up to two trips).

Table 7: Effect of Negative Feedback on Trip Performance

                    (1)          (2)
VARIABLES           oscore       futureaverage
feedback            -2.570***    -0.088
                    (0.409)      (0.079)
negfbck×feedback    1.357***
                    (0.510)
negfbck             -0.826***
                    (0.298)
posfbck×feedback                 0.148*
                                 (0.088)
posfbck                          0.002
                                 (0.045)
Constant            79.736***    84.614***
                    (10.629)     (2.142)
Controls            yes          yes
R-squared           0.393        0.900

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual. N = 80,503; estimation with both instruments.

3.2.4. Effect of Feedback on Performance Variability

To test the effect of feedback on performance variability (H4), we use four constructs that measure the difference between the latest two scores (loscore and oscore) for each user: $\Delta_{oscore} = oscore - loscore$, $\Delta_{\%oscore} = \frac{oscore - loscore}{loscore}\%$, and their absolute values, i.e., $|\Delta_{oscore}|$ and $|\Delta_{\%oscore}|$. We estimate the effect of feedback using instrumental variable regressions, regressing each variability measure on feedback. Note that we do not use loscore as a control in this model, to retain a meaningful interpretation of the effects. All four constructed variables are positively correlated with each other (pairwise correlations of at least 0.320, significant at p<0.05), and the differences and absolute differences have very high positive correlations (above 0.87, significant at p<0.05). The averages of $\Delta_{oscore}$ and $|\Delta_{oscore}|$ are 0.02 and 14.78, respectively, while the means of $\Delta_{\%oscore}$ and $|\Delta_{\%oscore}|$ are 4.32% and 20.82%, respectively. We report the results of the estimation in Table 8. In columns 1 to 4, we observe that feedback consistently leads to higher performance variability.
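The four constructs can be computed directly from consecutive scores; a sketch (column names as in the paper, helper names ours):

```python
# Sketch: score change, percentage change, and their absolute values.
df["d_oscore"] = df["oscore"] - df["loscore"]
df["d_pct_oscore"] = 100 * df["d_oscore"] / df["loscore"]
df["abs_d_oscore"] = df["d_oscore"].abs()
df["abs_d_pct_oscore"] = df["d_pct_oscore"].abs()
```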

Our results suggest that trips with feedback have higher variation in their performance scores than trips without feedback, thus supporting H4a and rejecting H4b. We note that an alternative metric of variability could be the standard deviation of oscore; however, because the effect of feedback is short-lived (refer to Table E.1), this metric is problematic, as it requires many trips for its calculation. Referring to columns 3 and 4, our results suggest that feedback results in a 3.56% to 6.70% increase in the variability of performance manifested in the next trip.

Table 8: Effect of Feedback on Score Variation

               (1)        (2)        (3)        (4)
VARIABLES      Δoscore    |Δoscore|  Δ%oscore   |Δ%oscore|
feedback       1.491**    4.095***   3.562***   6.697***
               (0.623)    (0.305)    (0.960)    (0.575)
Constant       18.202     24.019*    42.600     51.441*
               (23.034)   (13.894)   (42.834)   (30.891)
Controls       yes        yes        yes        yes
R-squared      0.107      0.092      0.084      0.112

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual. N = 79,639; estimation with both instruments.

3.2.5. Feedback and Incentive Thresholds

To test H5, we construct a variable nearThres, which equals one if a trip is taken when the user is 'close' to one of the thresholds (70, 80, 90, and 95) from below, and zero otherwise. We define 'close' by including all trips that are within w points of a threshold. For instance, for threshold 70 and w = 4, trips with ltodatescore within [66, 70] have nearThres = 1. We present our results in Table 9, where we interact nearThres with feedback, and we test our results with different values of w = {2, 3, 4, 5}.
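A sketch of the nearThres construction for a given window w, using the [threshold - w, threshold] rule defined above:

```python
# Sketch: nearThres = 1 if ltodatescore lies within w points below any
# incentive threshold (e.g., [66, 70] for threshold 70 and w = 4).
import numpy as np

thresholds = np.array([70, 80, 90, 95])
w = 4
lt = df["ltodatescore"].to_numpy()[:, None]          # shape (n, 1)
near = (lt >= thresholds - w) & (lt <= thresholds)   # shape (n, 4)
df["nearThres"] = near.any(axis=1).astype(int)
df["nearThres_x_feedback"] = df["nearThres"] * df["feedback"]
```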

Table 9: Threshold Effect on Performance (dependent variable: oscore)

                     (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
feedback             -2.401***  -2.709***  -3.028***  -3.082***  -2.228***  -2.308***  -2.347***  -2.366***
                     (0.478)    (0.500)    (0.543)    (0.572)    (0.499)    (0.518)    (0.558)    (0.573)
nearThres×feedback   0.690      1.341**    1.835***   1.756***   0.0279     0.275      0.329      0.347
                     (0.493)    (0.550)    (0.588)    (0.611)    (0.550)    (0.561)    (0.589)    (0.597)
nearThres            -0.279     -0.552**   -0.797**   -0.957***  0.198      0.255      0.248      0.131
                     (0.243)    (0.257)    (0.316)    (0.331)    (0.289)    (0.314)    (0.373)    (0.371)
Constant             78.55***   78.44***   78.48***   78.35***   78.88***   79.12***   79.14***   79.16***
                     (10.68)    (10.73)    (10.71)    (10.62)    (10.55)    (10.50)    (10.48)    (10.48)
Controls             yes        yes        yes        yes        yes        yes        yes        yes
R-squared            0.393      0.393      0.393      0.393      0.393      0.393      0.393      0.393
w                    2          3          4          5          -2         -3         -4         -5

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual. N = 79,639; estimation with both instruments.

The first four columns in Table 9 report the estimation results as we vary w from 2 to 5. All coefficients for the interaction term nearThres×feedback are positive; in column 1 the p-value is 16.3%, while the remaining columns show much stronger significance. Taking column 3 as an example, if two users drive after detailed feedback (i.e., feedback = 1), the one whose score is within 4 points below a threshold will perform better than the one who is not close to any threshold, by 1.038 (= 1.835 - 0.797) points. In terms of the standard deviation of oscore after feedback (16.81), users near a threshold perform 6.2% (= 1.038/16.81) of a standard deviation better than users whose ltodatescore is away from any threshold. To test the robustness of the results, we estimate the same models with w = {-2, -3, -4, -5}, that is, with ltodatescore close to a threshold but two to five points above it. We report the estimates in columns 5 to 8 of Table 9 and do not find any significant effects of being near a threshold from above. Therefore, H5 is supported. We further test whether all thresholds are equally important by introducing individual dummy variables (e.g., if a user is close to the first threshold of 70, we create a dummy variable nearThres70, and so on) and interacting them with feedback. We report the results in Table E.2; they indicate that the first threshold has the strongest effect.

Table 10: Effect of Feedback on Various Dimensions

               (1)         (2)       (3)            (4)          (5)       (6)           (7)
VARIABLES      acclr_rate  brk_rate  overspeedrate  overspeedkm  maxspeed  averagespeed  max-avg speed
feedback       0.221**     0.127     0.745***       0.186***     5.817***  4.096***      1.732***
               (0.105)     (0.125)   (0.145)        (0.0353)     (0.726)   (0.543)       (0.302)
Constant       1.201       1.421     -5.381***      -0.819**     48.99**   19.17***      29.89**
               (1.074)     (1.339)   (1.645)        (0.332)      (21.41)   (7.098)       (14.59)
Controls       yes         yes       yes            yes          yes       yes           yes
Observations   79,639      79,639    79,639         79,639       79,528    79,639        79,528
R-squared      0.086       0.093     0.159          0.175        0.188     0.202         0.161

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual; estimation with both instruments.

3.2.6. Effect of Feedback on Different Aspects of Driving Behavior

In testing H1, we used the overall score as the dependent variable. To better understand the reasons for our key results, here we unbundle the overall score and analyze the effect of feedback on three dimensions of driving – harsh braking, sharp acceleration, and speeding. For the first two dimensions, we create variables (acclr_rate and brk_rate) that measure the number of incidents of sharp acceleration and harsh braking per 100 minutes of trip duration. For each trip, we also observe the distance (in km) travelled while over speeding, normalized per 100 minutes of trip duration (coded as overspeedrate). We report the results in Table 10. The first three models suggest that feedback deteriorates driving behavior by increasing the number of sharp accelerations and the distance travelled while over speeding, but it has no effect on the harsh-braking rate. We drill down further on over speeding in models 4 to 6: feedback leads to more kilometers travelled while over speeding (model 4), a higher maximum speed (model 5), and a higher average speed (model 6). In model 7, we test whether users have higher speed variation within a trip (measured by maxspeed - averagespeed) when they observe real-time feedback. The literature suggests that not only high average speed but also high speed variation is highly correlated with accidents (Quddus 2013, Tanishita and van Wee 2017); other notable work (Nicholas and Gadirau 1988) found that 'accident rates do not necessarily increase with increase in average speed but do increase with increase in speed variance'. Together with the results obtained for H4, model 7 suggests that users not only significantly change their overall driving behavior due to feedback but also tend to have a higher spread in their trip speed.
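The per-100-minute normalization described above can be sketched as follows; the raw count and duration columns are hypothetical:

```python
# Sketch: event counts and over-speeding distance per 100 minutes of driving.
df["acclr_rate"] = 100 * df["n_sharp_accel"] / df["trip_minutes"]
df["brk_rate"] = 100 * df["n_harsh_brake"] / df["trip_minutes"]
df["overspeedrate"] = 100 * df["overspeed_km"] / df["trip_minutes"]
```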

Our results suggest that feedback increases sharp accelerations by 4%, and over-speeding distance and speed variability by 18%, as measured in standard deviations of the respective variables. For an average user with an average within-trip variation of 31.74 km/hr (maxspeed - averagespeed), this effect is equivalent to a 5.5% (= 1.73/31.74) increase in variability. Recent work (Quddus 2013) suggests that this will increase the probability of an accident by 1.65% on average (a 1% increase in speed variation is associated with a 0.3% higher probability of an accident).

Robustness Checks

a) Effect of learning: One could argue that the behavior we observe in our data is partially due to learning: while becoming familiar with the app, users may review feedback more frequently (a novelty effect that might vanish over time), so the negative effect would appear only in the first few trips of the program. To disentangle the effect of learning, we code the first 10, 20, 30, and 40 trips with dummy variables, interact these variables with feedback, and estimate our model. We summarize the results in Table E.3: we do not find any differences across the first 10 to 40 trips. We also run all our models while controlling for the number of trips taken so far in the program and find consistent results. We conclude that the behavior we observe is not transient; we find no evidence that learning contributes to our results, and the effect of feedback on performance remains consistent.

b) Effect of high trip scores: The trip scores are right-skewed with a mean of 81.4 (refer to Table 1), so regression to the mean provides a natural alternative explanation for the negative effect of feedback that we observe in H1. We therefore perform a sub-sample analysis that excludes high scores, in addition to controlling for loscore, running our estimation only on observations with oscore below 95 (models 1 and 3 in Table E.4) and below 90 (models 2 and 4 in Table E.4). Our results remain consistent in these sub-samples, supporting the notion that they are not driven by a drift of the score towards the mean; the estimates remain consistent even when the sample size drops below 55,000 trips.

c) Effect of the duration between feedback and the trip: One could argue that the time interval between a user reviewing detailed feedback and the next trip is an important variable: a trip taken long after a feedback review should show a weaker effect on performance. We introduce the variable fb2triptime to code the duration (in hours) between a feedback review and the trip's starting time, and we include its interaction with feedback. We report the results in Table E.5. The interaction term is insignificant (models 1 and 2 in Table E.5), meaning that the duration between feedback and a trip has no effect on performance irrespective of whether the user reviews feedback; our results for the effect of feedback on performance remain consistent.

d) Alternative definition of negative feedback: In testing H3, we used one definition of negative feedback, i.e., loscore < 70. However, there are other ways to define what "negative" feedback is. For instance, users may perceive feedback as negative if they observe a negative trend in their previous trip scores (i.e., the difference between the trip scores of the past two trips). We estimate our results with this construct for negative feedback, coding the variables as negtrend and postrend, and present the results in Table E.6. With these alternative constructs, our results are consistent; the sign of the coefficient of the interaction term postrend×feedback remains similar, although its p-value is 15%.

4. Discussion and Implications

IoT devices and mobile connectivity increasingly enable user behavior to be tracked. Although the conventional feedback literature provides ample support for a positive effect of feedback on user behavior, little is known about the effect of real-time feedback. In this paper, we focus on real-time feedback in the context of telematics-based insurance. The main offering of usage-based insurance (UBI) is to provide a real-time evaluation of driving behavior to users and to motivate them to improve it. We study the effect of such real-time feedback in the case of automotive telematics, where drivers are provided with real-time driving-quality feedback to enable them to improve their driving behavior and earn insurance discounts.

Our results reveal several interesting findings. First, providing real-time feedback to users may not improve their driving behavior: on average, we find that feedback has a negative impact on next-trip performance, and the effect persists and amplifies when users seek feedback frequently. Second, the type of feedback (positive or negative) matters. We find that negative feedback (as defined by a score below the incentive threshold) helps improve short-term behavior, while positive feedback (a score above the threshold) helps improve long-term performance; despite these moderating effects, the overall effect of feedback remains negative. Third, our results suggest that the deterioration is associated with a higher number of sharp accelerations and increased over speeding. Drivers also increase the variation in their speed within a trip after reviewing detailed feedback, which results in a 1.65% higher probability of an accident. Fourth, the incentive threshold plays a very important role: users who are close to the threshold from below react to feedback less negatively.

Our data does not allow us to identify the precise mechanism behind the negative effect of feedback on performance; we can only state that users increase acceleration and over speeding. It is possible that feedback causes drivers to overreact immediately after reviewing it (which we observe as higher oscore variance), but not in an optimal manner: after a feedback event, drivers adjust their behavior consciously and then revert to their natural driving style (the short-term effect of feedback). It is also possible that the period we observe is not long enough for positive changes; it has been shown that even after one year of real-time continuous feedback, people tend to return to their natural driving habits, and for some parameters real-time feedback leads to lower performance (Rolim et al. 2017). Our results can also be attributed to the allocation of mental resources that emerges from the fundamental difference between one's mental model and the actual scoring model. While people pay attention to a few aspects of driving (e.g., acceleration, speeding), the algorithm weighs many more interrelated attributes; hence, this strategy may lead to judgement errors. In a newsvendor experiment setting, it has been shown that decomposing an ordering task into components (e.g., forecast, service level) increases random judgement errors under higher uncertainty (Lee and Siemsen 2016). Another phenomenon that can explain our results is overestimation bias regarding one's own skills. For example, Larkin and Leider (2012) have shown that people overestimate their ability to the extent that they choose the wrong (payoff-reducing) performance incentives, and that this bias does not go away even after feedback showing that they have lost money due to their wrong choice. Since people tend to overestimate their driving capabilities and often regard their own skills as flawless (Roy and Liersch 2013), any score lower than 100 makes them feel bad and challenges their self-belief, which may induce them to do worse after detailed feedback. The same study also shows that people have very different, idiosyncratic definitions of what constitutes "bad" driving, and simply realizing that their definition does not align with the scoring system may lead to negative reactions to feedback.

It is also possible that users are too focused on the incentives and do not want to use the app to improve their driving through the real-time feedback provided. For example, in healthcare, adoption of apps for managing even chronic care has not taken off, as patients tend to avoid them because of the continual dictates on behavior, e.g., exercise and diet restrictions (Huckman and Stern 2018). Volpp et al. (2017) did not find any improvement in adherence to medication schedules when patients were provided with real-time feedback. Similar to their explanation, different framings of objectives may lead to different behavior (Shellenbarger 2018); the framing in our application ("improve behavior" vs. "obtain a discount") may likewise be influencing the results. Further, studies of the behavioral inhibition system (BIS) have shown that individuals with a strong BIS tend to feel bad about themselves (Ilies et al. 2010), which results in lower self-efficacy and performance. This could be linked to a cultural aspect of Singapore (kiasu culture, i.e., the fear of losing); such cultural aspects have been found to affect how people react to adverse outcomes (Voss et al. 2004). Lastly, the discrete nature of the incentive structure could also play a role, as threshold-based incentives make people slack off after they achieve a target (recall H5). A field experiment with varying interfaces, incentives, and framings would be a logical next step towards gaining understanding in this regard.

Our study has limitations similar to those of any research conducted with archival data. The first, and perhaps the most important, limitation is the generalizability of our results: we are unable to observe drivers who are not in the telematics insurance program. However, we believe that these drivers would not be very different from the ones already on the program platform. In our context, users do not pay any setup costs; they can voluntarily leave the app if they do not find it useful; and the discount obtained is on top of the no-claim discounts, so it does not deter a driver from obtaining and using the app. Perhaps most importantly, there is no way to obtain data on users who do not download the app: such data simply does not exist. Second, we lack random assignment to feedback. We have tried to mitigate this issue by using matching (propensity score and nearest-neighbor matching) as well as instrumental variable regressions, but a controlled field experiment would be ideal. Third, our data limits us from studying the role of demographics: it would be interesting to study the effects of age, education, and vehicle type in this context, because insurance companies conventionally consider these factors in premium calculations; we use fixed effects for users to get around this issue. Fourth, we do not capture the quality or type of feedback in our model, as the feedback provided was of a similar format for the entire time period; it would be interesting to observe how people respond to different kinds of feedback. Again, these issues are best addressed in a field experiment.

References

Abadie A, Imbens GW (2011) Bias-Corrected Matching Estimators for Average Treatment Effects. J. Bus. Econ. Stat. 29(1):1–11.
Accenture (2018) UKI: Telematics—Passing Fad or Game-Changer? Retrieved (August 20, 2018), https://accntu.re/2zKQz0H.
Adams R, Almeida H, Ferreira D (2009) Understanding the relationship between founder-CEOs and firm performance. J. Empir. Financ. 16(1):136–150.
Anseel F, Beatty AS, Shen W, Lievens F, Sackett PR (2015) How Are We Doing After 30 Years? A Meta-Analytic Review of the Antecedents and Outcomes of Feedback-Seeking Behavior. J. Manage. 41(1):318–348.
Ashford SJ (1986) Feedback-Seeking in Individual Adaptation: A Resources Perspective. Acad. Manag. J. 29(3):465–487.
Ashford SJ, Tsui A (1991) Self-Regulation for Managerial Effectiveness: The Role of Active Feedback Seeking. Acad. Manag. J. 34(2):251–280.
Ashton RH (1990) Pressure and Performance in Accounting Decision Settings: Paradoxical Effects of Incentives, Feedback, and Justification. J. Account. Res. 28:148–180.
Bandura A (1991) Social cognitive theory of self-regulation. Organ. Behav. Hum. Decis. Process. 50(2):248–287.
Beinstein A, Sumers T (2016) How Uber Engineering Increases Safe Driving with Telematics. Uber Engineering Blog.
Beshears J, Choi JJ, Laibson D, Madrian BC, Milkman KL (2015) The Effect of Providing Peer Information on Retirement Savings Decisions. J. Finance 70(3):1161–1201.
Blader S, Gartenberg C, Prat A (2015) The Contingent Effect of Management Practices. Columbia Bus. Sch. Res. Pap. No. 15-48.
Bolton GE, Katok E (2008) Learning by Doing in the Newsvendor Problem: A Laboratory Investigation of the Role of Experience and Feedback. Manuf. Serv. Oper. Manag. 10(3):519–538.
Buell RW, Norton MI (2014) Last-Place Aversion. Q. J. Econ.:105–149.
Burgers C, Eden A, Van Engelenburg MD, Buningh S (2015) How feedback boosts motivation and play in a brain-training game. Comput. Human Behav. 48:94–103.
Cheema A, Bagchi R (2011) The Effect of Goal Visualization on Goal Pursuit: Implications for Consumers and Managers. J. Mark. 75(2):109–123.
Chen C, Xie Y (2014) Modeling the safety impacts of driving hours and rest breaks on truck drivers considering time-dependent covariates. J. Safety Res. 51:57–63.
Cramer J, Krueger AB (2016) Disruptive Change in the Taxi Business: The Case of Uber. Am. Econ. Rev. Pap. Proc. 106(5):177–182.
Croson R, Donohue K (2006) Behavioral Causes of the Bullwhip Effect and the Observed Value of Inventory Information. Manage. Sci. 52(3):323–336.
Cunningham K, Brown TD, Gradwell E, Nee PA (2000) Airbag associated fatal head injury: case report and review of the literature on airbag injuries. Emerg. Med. J. 17(2):139–142.
Dahlinger A, Ryder B (2018) The Impact of Abstract vs. Concrete Feedback Design on Behavior – Insights from a Large Eco-Driving Field Experiment. Proc. CHI:1–11.
Deci EL, Koestner R, Ryan RM (1999) A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychol. Bull. 125(6):627–668.
Dehejia RH, Wahba S (2002) Propensity Score-Matching Methods for Nonexperimental Causal Studies. Rev. Econ. Stat. 84(1):151–161.
Forster J, Higgins T, Chen Idson L (1998) Approach and avoidance strength during goal attainment: Regulatory focus and the "goals loom larger" effect. J. Pers. Soc. Psychol. 75(5):1115–1131.
Gnewuch U, Morana S, Heckmann C, Maedche A (2018) Designing Conversational Agents for Energy Feedback. Proc. 13th Int. Conf. Des. Sci. Res. Inf. Syst. Technol. (DESRIST 2018).
Green PJ, Gino F, Staats B (2017) Shopping for Confirmation: How Disconfirming Feedback Shapes Social Networks. Harvard Bus. Sch. Work. Pap. (18-028).
Hannan RL, Krishnan R, Newman AH (2008) The effects of disseminating relative performance feedback in tournament and individual performance compensation plans. Account. Rev. 83(4):893–913.
Huckman RS, Stern AD (2018) Why Apps for Managing Chronic Disease Haven't Been Widely Used, and How to Fix It. Harv. Bus. Rev.
Hyodo S, Yoshii T, Satoshi M, Hirotoshi S (2017) An analysis of the impact of driving time on the driver's behavior using probe car data. Transp. Res. Procedia 21:169–179.
IHS Markit (2016) Usage-Based Insurance Expected to Grow to Subscribers Globally by 2023, IHS. Retrieved (September 14, 2018), https://bit.ly/2OvhLZa.
Ilies R, Judge TA, Wagner DT (2010) The Influence of Cognitive and Affective Reactions to Feedback on Subsequent Goals: Role of Behavioral Inhibition/Activation. Eur. Psychol. 15:121–131.
Kahneman D, Tversky A (1973) On the Psychology of Prediction. Psychol. Rev. 80(4):237–251.
Kivetz R, Urminsky O, Zheng Y (2006) The Goal-Gradient Hypothesis Resurrected: Purchase Acceleration, Illusionary Goal Progress, and Customer Retention. J. Mark. Res. 43(1):39–58.
Kluger AN, DeNisi A (1996) The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychol. Bull. 119(2):254–284.
Kolenikov S, Angeles G (2004) The Use of Discrete Data in PCA: Theory, Simulations, and Applications to Socioeconomic Indices. Carolina Popul. Cent. Meas. Eval.:1–59.
Kremer M, Moritz B, Siemsen E (2011) Demand Forecasting Behavior: System Neglect and Change Detection. Manage. Sci. 57(10):1827–1843.
Larkin I, Leider S (2012) Incentive Schemes, Sorting, and Behavioral Biases of Employees: Experimental Evidence. Am. Econ. J. Microecon. 4(2):184–214.
Lee ML, Dey AK (2014) Real-time feedback for improving medication taking. Proc. 32nd Annu. ACM Conf. Hum. Factors Comput. Syst. – CHI '14:2259–2268.
Lee TW, Locke EA, Phan SH (1997) Explaining the Assigned Goal-Incentive Interaction: The Role of Self-Efficacy and Personal Goals. J. Manage. 23(4):541–559.
Lee Y, Siemsen E (2016) Task Decomposition and Newsvendor Decision Making. Manage. Sci. 63(10):3226–3245.
Leverson B, Chiang KH (2014) Study of the Impact of a Telematics System on Safe and Fuel-efficient Driving in Trucks. U.S. Dep. Transp.
Luellen JK, Shadish WR, Clark MH (2005) Propensity Scores. Eval. Rev. 29(6):530–558.
Martocchio JJ, Webster J (1992) Effects of Feedback and Cognitive Playfulness on Performance in Microcomputer Software Training. Pers. Psychol. 45(3):553–578.
Mulder RH (2013) Exploring feedback incidents, their characteristics and the informal learning activities that emanate from them. Eur. J. Train. Dev. 37:49–71.
Najm WG, Smith JD, Smith DL (2001) Analysis of Crossing Path Crashes. Natl. Tech. Inf. Serv., Springfield, VA.
Nicholas JG, Gadirau R (1988) Speed Variance and its Influence on Accidents. AAA Found. Traffic Saf.
Quddus M (2013) Exploring the relationship between average speed, speed variation, and accident rates using spatial statistical models and GIS. J. Transp. Saf. Secur. 5(1):27–45.
Roels G, Su X (2014) Optimal Design of Social Comparison Effects: Setting Reference Groups and Reference Points. Manage. Sci. 60(3):606–627.
Rolim C, Baptista P, Duarte G, Farias T, Pereira J (2017) Impacts of real-time feedback on driving behaviour: A case-study of bus passenger drivers. Eur. J. Transp. Infrastruct. Res. 17(3):346–359.
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41–55.
Roy MM, Liersch MJ (2013) I Am a Better Driver Than You Think: Examining Self-Enhancement for Driving Ability. J. Appl. Soc. Psychol. 43(8):1648–1659.
Schultz PW, Nolan JM, Cialdini RB, Goldstein NJ, Griskevicius V (2007) The Constructive, Destructive, and Reconstructive Power of Social Norms. Psychol. Sci. 18(5):429–434.
Schweitzer ME, Cachon GP (2000) Decision Bias in the Newsvendor Problem with a Known Demand Distribution: Experimental Evidence. Manage. Sci. 46(3):404–420.
Shellenbarger S (2018) The Rare Workers Who Thrive on Negative Feedback. Wall Str. J.
Soleymanian M, Weinberg C, Zhu T (2017) Sensor Data, Privacy, and Behavioral Tracking: Does Usage-Based Auto Insurance Benefit Drivers? Work. Pap.:1–41.
Song H, Tucker AL, Murrell KL, Vinson DR (2017) Closing the Productivity Gap: Improving Worker Productivity through Public Relative Performance Feedback and Validation of Best Practices. Manage. Sci.:1–37.
Staiger D, Stock J (1997) Instrumental Variables Regression with Weak Instruments. Econometrica 65(3):557–586.
Tanishita M, van Wee B (2017) Impact of vehicle speeds and changes in mean speeds on per vehicle-kilometer traffic accident rates in Japan. IATSS Res. 41(3):107–112.
Tiefenbeck V, Goette L, Degen K, Tasic V, Fleisch E, Lalive R, Staake T (2016) Overcoming Salience Bias: How Real-Time Feedback Fosters Resource Conservation. Manage. Sci. 64(3):1458–1476.
Toledo T, Lotan T (2006) In-Vehicle Data Recorder for Evaluation of Driving Behavior and Safety. Transp. Res. Rec. 1953:112–119.
Toledo T, Musicant O (2008) In-Vehicle Data Recorders for Monitoring and Feedback on Drivers' Behavior. Transp. Res. Part C 16(3):320–331.
UPS (2016) ORION: The algorithm proving that left isn't right.
USASelect (2018) The Automotive Industry in the United States.
Volpp KG, Troxel AB, Mehta SJ, Norton L, Zhu J, Lim R, Wang W, et al. (2017) Effect of Electronic Reminders, Financial Incentives, and Social Support on Outcomes After Myocardial Infarction: The HeartStrong Randomized Clinical Trial. JAMA Intern. Med.:1–9.
Voss CA, Roth AV, Rosenzweig ED, Blackmon K, Chase RB (2004) A Tale of Two Countries' Conservatism, Service Quality, and Feedback on Customer Satisfaction. J. Serv. Res. 6(3):212–230.
WHO (2015) Global status report on road safety.
Wooldridge JM (2002) Econometric Analysis of Cross Section and Panel Data (MIT Press, Cambridge, MA).
Wu DY, Katok E (2006) Learning, communication, and the bullwhip effect. J. Oper. Manag. 24(6):839–850.


E-companion

Table E.1: Effect of Feedback on Future Moving Average Performance

                 (1)        (2)        (3)        (4)        (5)        (6)
VARIABLES        MA_2trips  MA_3trips  MA_4trips  MA_5trips  MA_6trips  MA_7trips
feedback         -0.835***  -0.339     -0.080     -0.076     -0.126     -0.146
                 (0.302)    (0.233)    (0.190)    (0.162)    (0.144)    (0.131)
Constant         80.944***  79.311***  83.859***  86.345***  85.364***  83.310***
                 (7.904)    (7.741)    (4.600)    (5.054)    (4.809)    (5.611)
Controls         yes        yes        yes        yes        yes        yes
Observations     79,268     78,897     78,526     78,155     77,784     77,413
R-squared        0.408      0.432      0.467      0.502      0.534      0.562
Moving average   2 trips    3 trips    4 trips    5 trips    6 trips    7 trips

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual; estimation with both instruments.


Table E.2: Identifying Effect of Individual Threshold

                   (1)        (2)        (3)
VARIABLES          oscore     oscore     oscore
feedback           -2.715***  -3.029***  -2.968***
                   (0.501)    (0.545)    (0.558)
nearThres70Xfbck   1.496+     1.898**    1.582*
                   (0.997)    (0.947)    (0.893)
nearThres80Xfbck   1.833      2.339*     2.820*
                   (1.388)    (1.245)    (1.443)
nearThres90Xfbck   1.446*     2.164***   1.655**
                   (0.778)    (0.821)    (0.783)
nearThres95Xfbck   0.906      1.303*     0.771
                   (0.762)    (0.703)    (0.629)
nearThres70        -0.224     -0.489     -0.580
                   (0.437)    (0.474)    (0.446)
nearThres80        -0.214     -0.497     -0.988
                   (0.739)    (0.615)    (0.802)
nearThres90        -1.044**   -1.793***  -1.505***
                   (0.409)    (0.560)    (0.510)
nearThres95        -0.791*    -1.034**   -0.877**
                   (0.435)    (0.452)    (0.376)
Constant           78.25***   78.01***   77.40***
                   (10.98)    (10.97)    (11.03)
Controls           yes        yes        yes
Observations       79,639     79,639     79,639
R-squared          0.393      0.393      0.393
w                  3          4          5

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual. N = 79,639; estimation with both instruments. + denotes a p-value of 13%.


Table E.3: Regression Results for the Effect of Learning (dependent variable: oscore)

                       (1)         (2)         (3)         (4)
feedback               -2.164***   -2.201***   -2.200***   -2.239***
                       (0.428)     (0.434)     (0.438)     (0.442)
first10trip×feedback   0.687
                       (1.098)
first10trip            0.095
                       (0.634)
first20trip×feedback               0.954
                                   (0.729)
first20trip                        -0.517
                                   (0.468)
first30trip×feedback                           0.616
                                               (0.633)
first30trip                                    -0.028
                                               (0.375)
first40trip×feedback                                       0.808
                                                           (0.601)
first40trip                                                -0.180
                                                           (0.339)
Constant               80.872***   80.982***   80.873***   80.826***
                       (10.238)    (10.269)    (10.149)    (10.187)
R-squared              0.393       0.393       0.393       0.393
Controls               yes         yes         yes         yes
Dummies for            first 10    first 20    first 30    first 40 trips

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual. N = 80,503; estimation with both instruments.

Table E.4: Regression Results for High Trip Scores Sub-Sample (dependent variable: oscore)

               (1)        (2)        (3)        (4)
feedback       -2.012***  -2.138***  -2.040***  -2.165***
               (0.384)    (0.384)    (0.386)    (0.385)
cfeedback                            -0.010***  -0.009**
                                     (0.004)    (0.004)
Constant       74.784***  74.202***  73.486***  72.985***
               (3.036)    (2.986)    (2.997)    (2.968)
Observations   54,239     53,633     54,239     53,633
R-squared      0.266      0.264      0.266      0.264
Controls       yes        yes        yes        yes
oscore range   <95        <90        <95        <90

Robust SEs in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual; estimation with both instruments.


Table E.5: Effect of Time to Feedback Duration on Performance (dependent variable: oscore)

                       (1)        (2)
feedback               -1.479***  -1.549***
                       (0.516)    (0.513)
cfeedback                         -0.013***
                                  (0.004)
fb2triptime            -0.000     -0.000
                       (0.001)    (0.001)
fb2triptime×feedback   -0.116     -0.110
                       (0.093)    (0.092)
Constant               82.246***  81.030***
                       (10.088)   (10.317)
Controls               yes        yes
Observations           79,022     79,022
R-squared              0.393      0.394

Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual; estimation with both instruments.

Table E.6: Robustness Tests for Positive and Negative Feedback

                    (1)        (2)
VARIABLES           oscore     futureaverage
feedback            -2.768***  -0.026
                    (0.427)    (0.055)
negtrend×feedback   1.471***
                    (0.318)
negtrend            -0.243
                    (0.189)
postrend×feedback              0.076+
                               (0.053)
postrend                       -0.015
                               (0.030)
Constant            77.719***  84.556***
                    (10.562)   (2.185)
Controls            yes        yes
Observations        79,268     79,268
R-squared           0.393      0.900

Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1; individual, date, origin, and destination fixed effects; SEs clustered by individual; estimation with both instruments. + denotes a p-value of 15%.

