AN ANALYSIS OF CORRUPTION IN : BASED ON EVIDENCE FROM LICENSE PLATES

Word count: 16409

Tom Eeckhout, student number: 01100232

Supervisor: Prof. dr. Koen Schoors

Master's dissertation submitted to obtain the degree of:

Master of Science in Economics

Academic year: 2016 - 2017

CONFIDENTIALITY AGREEMENT

PERMISSION

I declare that the content of this Master's Dissertation may be consulted and/or reproduced, provided that the source is referenced.

Name of student: ………………………………………………………………………………………………

Signature

Summary

Personalised license plates are officially not available in Russia. It is, however, an open secret that these plates can be bought for the right sum of rubles. We can demonstrate this bribery thanks to the availability of an administrative database.

The car owner is not interested in a special plate for its own sake: the plate serves as a signal of wealth or illustrates the owner's special connections with those in power. The personalised plates follow the standard structure but consist of special letter or number combinations. As a result, the distribution of letter and number combinations over cars does not conform to a random allocation of license plates. Based on a theoretical model, we predict that these special plates are combined with luxury cars. The basis for this complementarity lies in the signals these plates are meant to send. When special plates can also be obtained by chance, that is, without a bribe, this reduces the strength of the signal that such a plate gives about its owner. When a cheap car is then combined with a special plate, the outside world will regard the special plate as a coincidence.

To investigate and test the phenomenon, we need to delineate the two key concepts. Which cars qualify as luxury cars, and which license plates are special? The group of special plates consists of plates with letter or number combinations that have a strongly elevated average engine power relative to their theoretical expectation. For the luxury cars we use an official list of the Russian government, a selection based on car class segments, and a selection based on the distribution of special plates over car brands. This last approach explicitly uses the chosen group of special plates.

We test for the existence of these special plates using t-tests and regressions. The t-tests compare the frequency of special plates with their theoretical expectation. We then repeat this test for the group of luxury cars. Both tests show that license plates are not distributed randomly, and that special plates occur more often on luxury cars. A series of regressions of the license plate variable on the luxury car variable also confirms our model and rejects the null hypothesis of absence of corruption.

Finally, we illustrate regional differences, differences across years of birth, and a time trend in this corruption.

Contents

1 Introduction
2 Data and Model
2.1 Data description
2.2 Model and hypotheses
2.3 Constructing the key variables
2.3.1 Vanity plate dummy
2.3.2 Luxury vehicle dummy
3 Results
3.1 Frequency in the general population
3.2 Frequency in the luxury vehicle subpopulation
3.3 Regressing vanity on luxury
3.4 Government agencies
4 Further Investigation
4.1 Regional differences
4.1.1 Region
4.1.2 Region of birth
4.1.3 Assessing the regional corruption measurements
4.2 Time trend
4.3 Year of birth
4.4 Other features
4.4.1 Regional differences in
4.4.2 Vanity plates as immunity to traffic stop
5 Conclusion
6 References
A Selected letters for lvan1
B Selected letters for lvan2
C Selected letters for lvan3
D Classes in the nocar category
E Brands in the lux1 category
F Corruption ranks: regions
G Corruption ranks: regions of birth

List of Figures

1 Luxury in function of resource
2 Density plots of numcount for number and numdum
3 Density plots of lcount for l and ldum
4 Bar chart of nvan1 by class
5 Bar chart of lvan2 by class
6 Bar chart of lvan3 by class
7 Regional variation in corruption: frequency for region
8 Regional variation in corruption: coefficient for region
9 Regional variation in corruption: frequency for region of birth
10 Regional variation in corruption: coefficient for region of birth
11 Density of registrations by year
12 Time trend of vanity plates
13 Density of registrations by year of birth
14 Comparison of vanity plates over year of birth
15 Comparison of vanity plates over year of birth: zoomed in on 1928-1988

List of Tables

1 Other types of license plates
2 Types of license plates
3 Reserved letter combinations
4 Summary statistics: count of numdum
5 Summary statistics: count of number
6 Summary statistics: medpower (number)
7 Summary statistics: medpower (numdum)
8 List of numbers with high median power
9 Summary statistics: count of ldum
10 Summary statistics: count of l
11 Summary statistics: medpower (l)
12 Summary statistics: medpower (ldum)
13 Summary statistics: medpower (l), excluding nocar
14 Summary statistics: medpower (ldum), excluding nocar
15 Euro Market Segment
16 Russian number system: passenger classes
17 Composition of car population in the data
18 Ttest for nvan1 by brand
19 Top-brands by nvan1
20 Summary statistics: van
21 Summary statistics: lux
22 Correlation matrix for vanity variables
23 Correlation matrix for luxury variables
24 Ttests for vanity frequency in general population
25 Ttests for vanity frequency in luxury car subpopulation: lux1
26 Ttests for vanity frequency in luxury car subpopulation: lux2
27 Ttests for vanity frequency in luxury car subpopulation: lux3
28 Ttests for vanity frequency in luxury car subpopulation: lux4
29 Ttests for vanity frequency in luxury car subpopulation: lux5
30 Ttests for vanity frequency in luxury car subpopulation: lux6
31 Regression of nvan1 on luxury variables
32 Regression of lvan1 on luxury variables
33 Regression of lvan2 on luxury variables
34 Regression of lvan3 on luxury variables
35 Government agencies
36 Regression of nvan1 on luxury variables: government agencies
37 Regression of nvan1 on gov1-gov9
38 Correlation matrix of different corruption measures on the federal level and WVS variables
39 Correlation matrix of different corruption ranks on the regional level, the CPI and regional GDP per capita and growth
40 Letters in lvan1
41 Letters in lvan2
42 Letters in lvan3
43 Nocar classes
44 Brands in lux1
45 Rank of regions: frequency
46 Rank of regions: coefficient
47 Rank of regions of birth: frequency
48 Rank of regions of birth: coefficient

Abstract

This thesis exploits an administrative database to look for evidence of corruption within the Russian traffic police. It studies the vanity plate phenomenon. We build a theoretical model that predicts a strong population correlation between vanity plates and luxury vehicles, and derive tests for the existence of corruption from it. These tests confirm that the corruption exists. Potential topics for further research are also highlighted: differences in our corruption measures across years of registration, years of birth, regions of birth and regions of registration.


1 Introduction

Vanity plates are a well-known phenomenon in a large part of the world. These personalized license plates most often consist of a combination of letters and/or numbers that has a certain value for the owner. Being more expensive than a 'normal' plate, they often signal, among other things, the wealth of their owner. This makes vanity plates a form of conspicuous consumption. In Russia, officially, you cannot buy vanity plates; license plates are distributed randomly instead. However, it is well known among Russians that vanity plates do exist and can be bought. The plates follow the standard structure, so it is not easy to distinguish a vanity plate from a 'normal' one. But the distribution of plates will differ from a random one in a scenario where certain plates are preferred and can be bought. The purchase of these plates, for which no legal path exists, is an act of bribery, and studying the distribution of these vanity plates is a case study of revealed corruption.

Corruption, most commonly defined as the misuse of public office for private gain (Svensson, 2005), has recently become a more important issue in policy and research. It is recognized as a key obstacle to the development of countries, either because it acts as an extra tax burden or because it undermines the effectiveness of public policy. Russia consistently ranks among the most corrupt countries in the world: in the 2015 Corruption Perceptions Index of Transparency International it ranked 119th out of 168 countries. Within Russia, the police, together with public officials and civil servants, rank as the institutions most affected by corruption as perceived by the public ("Transparency International - Country Profiles", n.d.). Our case study concerns the police, who are responsible for the registration of cars and the issuing of new license plates. In countries where the issuing of vanity plates is regulated, it represents a significant revenue stream. For example, in 2008 a plate in Abu Dhabi was sold for $14 million (Brown, 2008). The existence of these plates in Russia reveals not only corruption among the police, but also a willingness to pay bribes among Russian car owners.

Truly objective corruption measurements are rare (Urra, 2007). Nonetheless, we have the opportunity to create such an objective corruption measure, thanks to the availability of an administrative transaction database. More precisely, we look for evidence and measures of bribes paid to the police in order to receive a vanity plate. First, we examine the database at hand. We then build a model explaining the complementary nature of luxury vehicles and vanity plates, and construct several variables to capture the key concepts 'luxury car' and 'vanity plate'. Next, we test whether the empirical distribution of the vanity plates conforms to random plate assignment. These tests include t-tests that compare empirical frequencies to theoretical expectations, both in the general population and in a subset of luxury cars. Both t-tests confirm our model: license plates are not distributed randomly, and vanity plates and luxury vehicles are complementary. A last test consists of a series of regressions that control for certain issues arising from the specific approach followed when constructing the key variables 'luxury vehicle' and 'vanity plate'. The regressions also confirm our model, and controlling for these variable construction issues does not change this conclusion. Specific attention is given to a set of plates reserved for government agencies.

Lastly, we highlight some features of the distribution of vanity plates that might be interesting for future research. Our corruption measure rises over the period 1994-2006, a period characterized by a general rise in corruption in Russia. There are important differences in our corruption measures between Russians with different regions of birth, which might reflect differences in their willingness to pay bribes. Regional variation in our corruption measure also exists between the regions where the license plates are issued. This regional variation should be influenced by the quality of governance of the local institutions. A last important finding is the variation in corruption between Russians with different years of birth. Plotting our corruption measure over years of birth shows both an upward trend and spikes around traumatic episodes of Russian history. These might highlight mechanisms driving the formation of social norms, such as the willingness to pay bribes.

As mentioned before, objective measures of corruption are rare. This thesis adds to the literature on revealed corruption by examining a case study of corruption: we prove the existence of bribes for vanity plates in Russia, propose a way of measuring them and find interesting features in their variation over time and regions.


2 Data and Model

2.1 Data description

The data consist of an administrative transaction database of the Main Directorate for Road Safety of the Ministry of Internal Affairs of Russia (abbreviated GIBDD). It contains 92 variables and covers transactions such as first registrations of vehicles, technical inspections, traffic violations, traffic accidents and many more. The total number of observations is more than 32 million, covering the period 1994-2006. This administrative database is rather unpolished, and a large part of our effort went into transforming it into a data set that is ready for analysis. For example, one variable in the original database contained both brands and models. This field was filled in by hundreds of police officers in a non-standardized way, which meant that the field not only had to be split into a 'brand' variable and a 'model' variable, but also that we had to look for all the dozen or so ways these officers spelled 'Daewoo' or 'Daihatsu' in Russian. Besides making the existing variables more manageable, we constructed some new variables from the existing ones, which involved bundling models into class variables and place names into regional levels. More information about these new variables is provided when we discuss the construction of the luxury variable and when we discuss regional variation in corruption. Having completed this necessary transformation of the database, we could start our analysis.
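The kind of clean-up described above can be illustrated with a minimal sketch. It is not the procedure actually used for the thesis: the function name `split_brand_model` and the spelling variants in `BRAND_VARIANTS` are hypothetical and only show the idea of splitting the free-text field and mapping each variant to one canonical brand name.

```python
# Hypothetical spelling variants; the thesis does not list the actual ones.
BRAND_VARIANTS = {
    "DAEWOO": "Daewoo",
    "ДЭУ": "Daewoo",
    "ДЭО": "Daewoo",
    "DAIHATSU": "Daihatsu",
    "ДАЙХАТСУ": "Daihatsu",
    "ДАЙХАЦУ": "Daihatsu",
}


def split_brand_model(raw):
    """Split a free-text 'brand model' field into (brand, model).

    The first token is treated as the brand and mapped to a canonical
    spelling when it is a known variant; the rest is kept as the model.
    """
    tokens = raw.strip().upper().split()
    if not tokens:
        return (None, None)
    brand = BRAND_VARIANTS.get(tokens[0], tokens[0])
    model = " ".join(tokens[1:]) or None
    return (brand, model)
```

For instance, `split_brand_model("дэу нексия")` maps the brand to `Daewoo` and keeps the remaining token as the model. Brands spelled as two words would need extra handling that this sketch leaves out.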

Our primary focus is on the first registrations. Car owners have to register their car before being allowed on public roads. When registering the car they receive, at least in theory, a randomly assigned license plate. Therefore, the first registrations should provide the clearest picture of the vanity plate phenomenon. The first registration database contains 4,245,585 observations. Key variables in the database are the license plate and the brand and model of the vehicle. The first is used to construct a dummy for having a vanity plate, the latter two are used to construct a dummy for having a luxury vehicle.

2.2 Model and hypotheses

To test whether there is clear evidence for these bribes in the data, we test whether the license plate numbers are distributed randomly, as they should be if there were no corrupt agents. The first test comes from looking at the frequency of special license plates. If, for example, we were looking for suspicious frequencies of license plates with the number 007 in them, we would compare the empirical frequency with the theoretical frequency $\frac{1}{999} \approx 0.10\%$ (numbers of the standard license plate are in the range 001-999). Standard errors for this test can be retrieved by simulating a process that truly assigns plates randomly. But first, before postulating the hypotheses, let's construct a theoretical model. In the model there is a population, which consists of a share $c$ of corrupt individuals and a share $1 - c$ of non-corrupt individuals (Algan, Cahuc, & Sangnier, 2016). Corrupt agents are defined by being willing to pay a bribe to the police officer. Every agent has a certain budget $r$ that he wants to spend on conspicuous consumption, given the prices of both 'normal' consumption and conspicuous consumption goods. Conspicuous consumption goods consist of a luxury car and a vanity plate. The utility function for conspicuous consumption can be put in the following form:

U = f(signal-plate, signal-car) = f(van, lux)

The timeline proceeds as follows: first an agent buys a car and implicitly decides how much he wants to spend on the social prestige he gets from the car (lux). Then he registers his car and receives a license plate. Corrupt agents can decide to intervene and bribe the police officer to get a special license plate (van). When a special license plate can also be acquired by luck (randomly assigned), the social prestige gained from such a plate is not independent of other signals, such as the kind of car you are driving. This makes lux and van behave as complementary goods. Concerning the special license plates, agents face a discrete choice (there actually is a spectrum of vanity over license plates, from a little special to very special, but we abstract from this). This implies that agents will decide to buy the van license plate only from a certain level of r onwards, since they need a certain level of lux to go with it. A corrupt individual not yet in possession of a vanity plate faces a trade-off between paying a bribe or not, with the following utilities:

$$\alpha := \pi \cdot U\left(1, \frac{r - p_{bribe}}{p_{lux}}\right) + (1 - \pi) \cdot U\left(0, \frac{r - p_{bribe}}{p_{lux}}\right) \quad \text{if he pays a bribe}$$

$$\beta := U\left(0, \frac{r}{p_{lux}}\right) \quad \text{if he doesn't}$$

where $p_{lux}$ and $p_{bribe}$ are the price of a unit of luxury vehicle and the size of a bribe respectively. $\pi(lux)$ is defined as the subjective probability, in the eyes of the public, that a certain car-plate combination is the result of bribery. The utility of having a vanity plate could be divided into an 'aesthetic' component and a 'wealth' component. We abstract from the 'aesthetic' component. The wealth-signaling value of a vanity plate then originates in the bribe you might have paid for it. When vanity plates can also be received incidentally, this undermines the value of those plates to their owners. The signal of the plate will then depend on the signal of the car, so $\pi(lux)$ really is a function of lux. If $\pi = 0$, it is clear that agents will prefer not to pay a bribe, because then $\alpha < \beta$ (if $p_{bribe} > 0$).

We make a set of assumptions in this model. First of all, the focus is on the demand side of vanity plates. It is assumed that every police officer is equally willing to supply a vanity plate, so we don't divide the police officer population into corrupt and non-corrupt as we do with our agents. This assumption is not crucial for the model: a non-homogeneous police officer corps would imply different levels of $p_{bribe}$. It is convenient, however, to make the simplification of one level of $p_{bribe}$. When looking at differences between groups of agents, this assumption becomes more important. Next, we assume identical preferences of our agents concerning van and lux. They only differ in their endowments r and their willingness to pay a bribe (c is 0 or 1 for an individual). Finally, we assume that being corrupt is independent of the level of r. The frequency of corrupt individuals at each level of resource (income) is then equal to c.

We consider two scenarios for the license plate entitlement.

Scenario 1 Agents can only receive a vanity plate when they offer a bribe, otherwise they receive a randomly chosen ’normal’ plate.

Scenario 2 Agents receive their plate randomly. Next, when they haven’t received a vanity plate by luck, the corrupt agents have the opportunity to bribe and receive a vanity plate instead of the normal plate.

Scenario 1. This scenario is easy to solve. Since vanity plates can only be bought, $\pi(lux) \equiv 1$. If $p_{bribe}$ is low enough, there will always be a level of $r$ above which agents prefer to bribe:

$$\left(\lim_{p_{bribe} \to 0} \alpha\right) > \beta$$

Let's call this level of $r$ $r_{eq}$. At that level of $r$, agents are indifferent between bribing the officer or not. If $\alpha$ were lower than $\beta$ for all $r$, the demand for vanity plates would be 0. To balance the market, the officers would lower their price $p_{bribe}$. Also, since agents at least have to be able to pay the bribe, $r_{eq} \ge p_{bribe}$. If you define $V := \alpha - \beta$, then $r_{eq}$ follows from solving $\frac{\partial V(r)}{\partial r} = 0$. Choosing a specification for the utility function would allow us to derive $r_{eq}$ analytically; however, this is not necessary to show the main implications of the model and construct our hypotheses. Assume now that $r_{eq} > p_{bribe}$. For the corrupt agents, their individual consumption of lux for each level of $r$ is displayed below, with

$$lux_H = \frac{r_{eq}}{p_{lux}}$$

$$lux_L = \frac{r_{eq} - p_{bribe}}{p_{lux}}$$

[Plot: luxury lux(r) against resource (r), with one curve for agents who pay no bribe and one for agents who bribe; luxL and luxH are marked on the luxury axis and req on the resource axis.]

Figure 1: Luxury in function of resource: At req luxury falls back because agents now spend budget on vanity plates as well. Between luxL and luxH there will be both agents who paid a bribe and who didn’t.

The public doesn't know the r of the other agents; it can, however, see the lux of those agents. And between $lux_L$ and $lux_H$ there are agents both with and without vanity plates. Depending on the distribution of the population over r, let's call it $\chi(r)$, you can determine the frequency of van for each level of lux. Under the very unrealistic but simple assumption of a uniform distribution of the population over r, you can calculate the frequencies easily. Let's call the frequency of vanity plates $\varphi$. Assume $\chi(r) \equiv \delta$, with $\delta = \frac{1}{R}$ and $R$ the maximum of $r$. Then it follows that

$$\varphi(lux) = \begin{cases} 0 & \text{if } lux < lux_L \\ \frac{c}{2} & \text{if } lux_L \le lux \le lux_H \\ c & \text{if } lux > lux_H \end{cases}$$

Half of the corrupt agents with a lux between $lux_L$ and $lux_H$ have a vanity plate, whereas the other half do not. Suppose you divide the car population into two parts separated by some $lux_D$ (between $lux_L$ and $lux_H$), and then create a dummy $Lux = 1$ if $lux > lux_D$ and $Lux = 0$ if $lux < lux_D$. This model then predicts that the frequency of vanity plates when $Lux = 1$ will be higher than when $Lux = 0$. Also, if $c \le p$, where $p$ denotes the theoretical frequency of vanity plates under random assignment, this would imply that fewer vanity plates are sold than would be distributed randomly.

frequency(vanity) = 0 if lux < luxL

frequency(vanity) = c if lux > luxH


Scenario 2. In scenario 2, $\pi(lux) < 1$ for all levels of lux. This is because a fraction of the population received the vanity plate by luck. $\pi(lux)$ is now determined by the frequency of 'bought' vanity plates versus 'incidental' vanity plates. To illustrate the mechanism:

$$\pi(lux) \to U(\text{bribing}) \to r_{eq} \to \varphi(lux) \to \pi(lux)$$

With $p$ the theoretical frequency of vanity plates (that is, without corruption), we have

$$\pi(lux) = \frac{\varphi(lux) - p}{\varphi(lux)}$$

Note that $\varphi(lux) \ge p$. The $\alpha$ here is lower than in scenario 1, because of the uncertainty that arises about the origin of the vanity plate. Again, as in scenario 1, if $r_{eq} > p_{bribe}$ it can be found by solving $\frac{\partial V(r)}{\partial r} = 0$. Should $\alpha > \beta \;\forall\, lux$, it would imply $r_{eq} = p_{bribe}$. And if $\alpha < \beta \;\forall\, lux$, then market forces should lower $p_{bribe}$ and create equality between $\alpha$ and $\beta$ for some level of lux. This is because

$$\left(\lim_{p_{bribe} \to 0} \alpha\right) = \pi \cdot U\left(1, \frac{r}{p_{lux}}\right) + (1 - \pi) \cdot U\left(0, \frac{r}{p_{lux}}\right) > U\left(0, \frac{r}{p_{lux}}\right) = \beta$$

where the inequality holds because $U\left(1, \frac{r}{p_{lux}}\right) > U\left(0, \frac{r}{p_{lux}}\right)$. Assume for the rest of the paragraph that $r_{eq} > p_{bribe}$. The graph of lux against r is similar to the one in scenario 1.

Assume again a uniform distribution of $r$, $\chi(r) = \delta \;\forall\, r$. Then we have for $\varphi(lux)$

$$\varphi(lux) = \begin{cases} p & \text{if } lux < lux_L \\ p + \frac{(1 - p) \cdot c}{2} & \text{if } lux_L \le lux \le lux_H \\ p + (1 - p) \cdot c & \text{if } lux > lux_H \end{cases}$$

and thus for $\pi(lux)$

$$\pi(lux) = \begin{cases} 0 & \text{if } lux < lux_L \\ \dfrac{\frac{(1 - p) \cdot c}{2}}{p + \frac{(1 - p) \cdot c}{2}} & \text{if } lux_L \le lux \le lux_H \\ \dfrac{(1 - p) \cdot c}{p + (1 - p) \cdot c} & \text{if } lux > lux_H \end{cases}$$

In this scenario, the model predicts that the frequency of vanity plates equals $p$ in the subgroup of cars with $lux \le lux_L$, and that the frequency will be above $p$ in the complementary group if there is corruption ($c > 0$).

frequency(vanity) = p if lux < luxL

frequency(vanity) > p if lux > luxH
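The piecewise expressions for scenario 2 can be transcribed directly into code, which makes it easy to see how strongly a small theoretical frequency p lets the signal recover once lux passes luxL. The sketch below is only an illustration: the function names and the numeric values used in the usage note are hypothetical, and `pi_signal` assumes p > 0.

```python
def phi(lux, lux_low, lux_high, p, c):
    """Frequency of vanity plates at luxury level lux (scenario 2, uniform r)."""
    if lux < lux_low:
        return p
    if lux <= lux_high:
        return p + (1 - p) * c / 2
    return p + (1 - p) * c


def pi_signal(lux, lux_low, lux_high, p, c):
    """Perceived probability that a vanity plate was bought: (phi - p) / phi.

    Assumes p > 0, so that phi is never zero.
    """
    freq = phi(lux, lux_low, lux_high, p, c)
    return (freq - p) / freq
```

With the made-up values p = 0.001, c = 0.1, luxL = 10 and luxH = 20, `phi` returns 0.001 below luxL, 0.05095 in between and 0.1009 above luxH, while `pi_signal` goes from 0 to roughly 0.98 and 0.99. Calling `phi` with p = 0 reproduces the scenario 1 frequencies 0, c/2 and c.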


A few notes on the assumptions

1. We assume a uniform distribution $\chi(r)$. This simplifies matters and doesn't change the main conclusions of the model qualitatively. Abandoning this assumption doesn't change the predicted frequencies below $lux_L$ and above $lux_H$; it does, however, change $\varphi(lux)$ in between.

2. We assume a homogeneous police officer corps. It is, however, very likely that some registration centers are more corrupt than others. This might, for example, translate into scenario 1 being the case in some centers (very corrupt: you can't get a vanity plate without bribing) and scenario 2 being the case in others (less corrupt: vanity plates are still being handed out without a bribe). If this were the case, the resulting predictions could change to

frequency(vanity) < p if lux < luxL

frequency(vanity) > p if lux > luxH

Hypotheses. Our model shows that, in scenario 2, $p$ is the theoretically expected frequency of van under the null hypothesis of no corruption, since then $c = 0$:

$$H_{0a}: freq(van) = p$$

It is expected that the real frequency will be higher. Of course, under the null hypothesis of no corruption scenario 1 can’t occur, because it assumes that vanity plates can only be acquired by bribery.
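A test of this first null hypothesis can be sketched as a one-sample test on a proportion. The thesis reports t-tests; the sketch below uses the closely related normal approximation with the standard error implied by the null, and the function name and the counts in the example are made up for illustration.

```python
import math


def proportion_z(successes, n, p0):
    """z-statistic for H0: the true frequency equals p0 (normal approximation)."""
    freq = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
    return (freq - p0) / se


# Made-up example: one plate number observed 1,300 times among 999,000 plates,
# tested against the theoretical frequency 1/999.
z = proportion_z(1300, 999_000, 1 / 999)
```

A large positive `z` indicates that the plate number occurs more often than random assignment would produce; a count of exactly 1,000 in this example would give `z = 0`.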

A second approach is more subtle. Since there might be only a small number of bribes compared to the huge number of registrations, it makes sense to look at a relevant subgroup. Following the above model, the subpopulation of luxury cars should be affected the most by the bribes, since $\varphi(lux) > p$ if $lux > lux_H$. A second test is then:

$$H_{0b}: freq(van \mid lux > lux_H) = p$$

Again, the expectation is that the real frequency will be higher. Should scenario 1 be the case, then this frequency, which then equals $c$, might be lower. But at least, if the corruption exists, the frequency of vanity plates should be higher in the luxury vehicle subpopulation than in the complementary population. An alternative to simply looking at that frequency is to regress van on lux. Under the null hypothesis of no corruption, lux shouldn't have any explanatory power and the coefficient $b$ should not differ significantly from 0:

$$van_i = a + b \cdot lux_i + \varepsilon_i$$
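The logic of this regression can be illustrated on simulated data. The sketch below is a toy setup, not the thesis estimation: it assumes a binary lux dummy, gives luxury cars a vanity plate with probability p + (1 - p) * c and all other cars with probability p, and fits the regression by ordinary least squares in plain Python. All function names and parameter values are hypothetical.

```python
import random


def simulate(n, p, c, share_lux, seed=0):
    """Draw (van, lux) pairs under a stylised version of scenario 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        lux = 1 if rng.random() < share_lux else 0
        # Luxury cars: random plates plus bought plates; others: random only.
        prob = p + (1 - p) * c if lux else p
        van = 1 if rng.random() < prob else 0
        data.append((van, lux))
    return data


def ols(data):
    """OLS of van on a constant and lux; returns the pair (a, b)."""
    n = len(data)
    mean_v = sum(v for v, _ in data) / n
    mean_l = sum(l for _, l in data) / n
    cov = sum((v - mean_v) * (l - mean_l) for v, l in data)
    var = sum((l - mean_l) ** 2 for _, l in data)
    b = cov / var
    a = mean_v - b * mean_l
    return a, b


a_corrupt, b_corrupt = ols(simulate(100_000, p=0.05, c=0.3, share_lux=0.2, seed=1))
a_clean, b_clean = ols(simulate(100_000, p=0.05, c=0.0, share_lux=0.2, seed=2))
```

With a binary regressor, `b` is simply the difference in vanity frequency between luxury and other cars, so it should be close to (1 - p) * c when corruption is present and close to 0 when c = 0.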


2.3 Constructing the key variables

2.3.1 Vanity plate dummy

As explained above, some license plates are preferred over others. How can we determine which ones? Browsing the internet already gives an idea of what kind of plates qualify as vanity plates ("Designated license plate Russia for power – Автоновини з усього свiту", 2015; "o000oo.ru - купить номер на авто реально!", 2016; "Купить номера на автомобиль", 2016). Лобанов (2015) gives a good summary of the vanity plate phenomenon in Russia. As he argues, there have been frequent efforts to legalize the sale of vanity plates and to put them up for auction: in 2010 (Плехов, n.d.), in 2013 (Tabakov, 2013) and again in 2015. The legalization has been postponed indefinitely by the Duma, the Russian parliament, even though the 2010 proposal had the backing of Putin, who wanted to put the vanity plates up for auction. The key change to the market for vanity plates eventually came in 2013. From then on, license plates could be kept by the car owner, instead of being inseparable from the car. This has caused the rapid development of a secondary market in vanity plates. The owner of a vanity plate sells his car, with plate, to the buyer of the vanity plate. The buyer then resells the car, without plate, to the original owner (Гагарин, 2016). The development of this secondary market, and the fact that it takes place mostly online, gives us more information about this market ("Продажа автомобильных номеров", 2016). It tells us that some of these plates are sold for up to 10,000,000 rubles (≈ 150,000 euros).

An analytic approach to selecting numbers for a vanity category might consist of looking at the frequencies of the different number and letter combinations. But these frequencies are not necessarily different from their expected values if, for example, no extra vanity plates are produced. One of the variables in the data set contains the engine power (in kW) of the car. Luxury cars typically have more engine power than other cars. By calculating the median power per specific number or letter combination, and comparing it to a random simulation, we should see that for some combinations the median power deviates significantly from the expected median power. Letter or number combinations with such a higher median power indicate that plates with these combinations are more often linked to a luxury vehicle.

Structure of the Plates The current structure of the license plates is LDDDLL|Reg, with L and D symbolizing a letter and a digit respectively. The Reg stands for the region code. Only 12 letters from the Cyrillic alphabet that resemble Latin characters are used: A, B, E, K, M, H, O, P, C, T, Y, X.


Seven other types of license plates are in circulation ("Vehicle registration plates of Russia", 2016; "License plate as a source of valuable data – SA WEST, Global intelligence at your disposal", 2016).

Group                                Structure of the plate

Ministry of Internal Affairs         LDDDD|Reg
Diplomatic                           DDDLLD|Reg
Armed Forces                         DDDDLL|Reg
Public Transport                     LLDDD|Reg
Cars exported from the RF            'T'LLDDD|Reg
Trailers                             LLDDDD|Reg
Transit (temporary registrations)    LLDDDL|Reg

Table 1: Other types of license plates

In table 2 we tabulate the type of license plate for the observations in the data. For the rest of the analysis we drop the observations that don't have a "Normal" plate, since our focus is on passenger cars.

Type of license plate          Obs.

Diplomatic                     9
Export                         2
Military                       113
Normal                         4,244,295
Police                         324
Public Transport or Trailer    792
Temporary & Transit            50
Total                          4,245,585

Table 2: Types of license plates
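A classification like the one in table 2 can be sketched with regular expressions built from the plate structures in table 1. This is only an illustration of the idea, not the code used for the thesis: it assumes plates are stored without their region code and with letters already transliterated to upper-case Latin characters, and the names `PLATE_PATTERNS` and `classify_plate` are hypothetical.

```python
import re

L = "[A-Z]"  # assumed: letters transliterated to upper-case Latin
D = r"\d"

# One pattern per structure in table 1; the shapes are mutually exclusive.
PLATE_PATTERNS = [
    ("Normal", f"{L}{D}{{3}}{L}{{2}}"),       # LDDDLL
    ("Police", f"{L}{D}{{4}}"),               # LDDDD
    ("Diplomatic", f"{D}{{3}}{L}{{2}}{D}"),   # DDDLLD
    ("Military", f"{D}{{4}}{L}{{2}}"),        # DDDDLL
    ("Public Transport", f"{L}{{2}}{D}{{3}}"),  # LLDDD
    ("Export", f"T{L}{{2}}{D}{{3}}"),         # 'T'LLDDD
    ("Trailer", f"{L}{{2}}{D}{{4}}"),         # LLDDDD
    ("Transit", f"{L}{{2}}{D}{{3}}{L}"),      # LLDDDL
]


def classify_plate(plate):
    """Return the plate type for a plate string without region code, or None."""
    for name, pattern in PLATE_PATTERNS:
        if re.fullmatch(pattern, plate):
            return name
    return None
```

For example, `classify_plate("A007AA")` returns `"Normal"`, while a string that fits none of the structures returns `None`. Table 2 reports public transport and trailers as one group; the sketch keeps them apart because their structures differ.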

Besides these other types of license plates, there are also specific letter combinations reserved for certain purposes ("License plate as a source of valuable data – SA WEST, Global intelligence at your disposal", 2016). These reserved letters are listed in table 3.

State agency                                              Letter combination

Higher authorities: Federal Assembly, The Government,     A***MP
bodies of Supreme and Constitutional Court,
law enforcement agencies
City Administration and state agencies                    A***AA, A***OA, O***OA, O***AA, A***OO, A***OA, A***AO
State security agencies                                   O***CM, O***OO, O***CA
Ministry of Internal Affairs                              O***OO, O***MO, O***MM, O***MP, O***OM, O***TT, O***PP, O***BO
Bodies of Supreme and Constitutional Court                O***KC
Federal Migration Service                                 O***TT, O***PP
Federal Bailiff Service                                   O***KO
Committee of inquiry, Prosecutor's office                 O***KO, O***CK
State Traffic Safety Inspectorate (GIBDD)                 O***CA
Federal Drug Control Service                              O***OH

Table 3: Reserved letter combinations

The analysis focuses on the normal license plates, with special attention to the reserved letter combinations.

The Numbers We generate a new variable numdum which assigns to each observation a random number in the range 1 to 999. Next we calculate medpower, the median engine power for each group of observations with the same number, or for each group with the same numdum. Before looking at the median power distributions, we inspect the frequencies (numcount) of each number. We compare them to the group sizes of numdum, which can be seen as a simulation of the distribution of the numbers under random assignment, with each of the 999 groups acting as one replication.


Variable mean sd p1 p5 p95 p99 Obs. Skewness

numcount 4,249.48 64.335 4,108 4,149 4,352 4,402 4,244,259 0.077

Table 4: Summary statistics: count of numdum: numcount is the size of each group in the data with the same number dummy numdum.

Variable mean sd p1 p5 p95 p99 Obs. Skewness

numcount 4,260.84 226.151 3,595 3,883 4,500 4,750 4,244,259 -0.158

Table 5: Summary statistics: count of number: numcount is the size of each group in the data with the same number.
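The benchmark in table 4 can be approximated without the data. Under random assignment the size of each number group is binomial with n equal to the number of observations and p = 1/999, so its standard deviation is sqrt(n p (1 - p)). The sketch below computes this value for the 4,244,259 observations and checks the formula with a small simulation; the function names and the reduced simulation size of 200,000 draws are assumptions made for illustration only.

```python
import math
import random
from collections import Counter
from statistics import pstdev


def binomial_sd(n_obs: int, n_groups: int) -> float:
    """Standard deviation of one group size under uniform random assignment."""
    p = 1.0 / n_groups
    return math.sqrt(n_obs * p * (1.0 - p))


def simulate_group_sizes(n_obs: int, n_groups: int = 999, seed: int = 0) -> list[int]:
    """Draw a random number 1..n_groups per observation and count each group."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, n_groups) for _ in range(n_obs))
    return [counts.get(g, 0) for g in range(1, n_groups + 1)]


# Theoretical benchmark for the full data set (roughly 65).
theoretical_sd = binomial_sd(4_244_259, 999)

# Small simulation to illustrate that the formula matches random draws.
sizes = simulate_group_sizes(200_000)
simulated_sd = pstdev(sizes)
```

The theoretical value of about 65 is close to the 64.3 reported for numdum in table 4, whereas the empirical standard deviation in table 5 is 226.2.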

The frequencies of certain numbers imply that those numbers aren’t drawn randomly. The standard deviation of the empirical group sizes (226.2) is about 3.5 times the standard deviation under random assignment (64.3). More than 5% of the numbers have count frequencies above the 99th percentile of the theoretical distribution. Figure 2 illustrates the difference in distribution between the theoretical numcount and the empirical numcount. The next thing to examine is the distribution of the median power, medpower, for both the number groups and the dummy numdum groups.

Variable mean sd min p1 p5 p95 p99 max Obs. Skewness

medpower 58.57 2.152 55 57.2 57.2 60 69 80.88 4,244,259 5.193

Table 6: Summary statistics: medpower (number): median engine power medpower for each group in the data with the same number.

Variable mean sd min p1 p5 p95 p99 max Obs. Skewness

medpower 58.35 0.659 57.2 57.2 57.2 59 60 62.5 4,244,259 0.135

Table 7: Summary statistics: medpower (numdum): median engine power medpower for each group in the data with the same number dummy numdum.

[Figure 2: density plot; x-axis: Count; y-axis: Frequency; series: number, numdum]

Figure 2: Density plots of numcount for number and numdum: empirical distribution of group sizes for each number versus the theoretical distribution simulated by numdum

The distribution of medpower for numdum indicates that numbers with medpower above 62.5 are very exceptional (they represent less than 1/999, or about 0.1%, of the theoretical population). Mind also the high skewness of the empirical distribution, indicating that the biggest deviations are on the positive side of the distribution. More specifically, groups with high median engine power cause this positive skewness. The numbers in the data with this exceptional median power are listed in table 8. We see that the same types of numbers as on the online markets come to the fore. Numbers of the type ’00∗’, ’0∗0’ and ’∗00’, and also of the type ’DDD’ with 3 identical digits, top the list. The 3 numbers with the biggest medpower deviation are 001, 777 and 999. As a first vanity variable, we create the dummy variable nvan1, which is equal to 1 if the plate has a number from the list in table 8 and 0 otherwise. 34 numbers are on the list, which would have a theoretical frequency of 34/999 ≈ 3.4%. We already know that the frequencies of certain numbers in the data differ significantly from what could theoretically be expected. More interesting is what these frequencies will be in the subgroup of luxury vehicles.
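The selection rule for nvan1 can be illustrated on synthetic data. The sketch below is not the code used for the thesis: the data are simulated, the function names are hypothetical, and the cutoff is taken as the maximum median power over the randomly assigned groups, which mirrors the value of 62.5 used above. Because the cutoff comes from a single random assignment, a few ordinary numbers may be flagged by chance in the toy data.

```python
import random
from collections import defaultdict
from statistics import median


def group_medians(keys, values):
    """Median of `values` within each group defined by `keys`."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: median(vals) for key, vals in groups.items()}


def exceptional_numbers(numbers, powers, seed=0):
    """Flag numbers whose median power exceeds every random-group median."""
    rng = random.Random(seed)
    numdum = [rng.randint(1, 999) for _ in numbers]        # random assignment
    cutoff = max(group_medians(numdum, powers).values())   # simulated maximum
    medpower = group_medians(numbers, powers)
    flagged = {n for n, m in medpower.items() if m > cutoff}
    return flagged, cutoff


# Synthetic cars: 40 per number, with extra engine power on three numbers.
rng = random.Random(1)
numbers, powers = [], []
for number in range(1, 1000):
    boost = 60.0 if number in (1, 777, 999) else 0.0
    for _ in range(40):
        numbers.append(number)
        powers.append(rng.gauss(58.0, 15.0) + boost)

flagged, cutoff = exceptional_numbers(numbers, powers)
nvan1 = [int(n in flagged) for n in numbers]  # dummy per observation
```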

The Letters We follow the same approach as for the numbers, and generate a new variable ldum which assigns to each observation 3 random letters, from the set of 12 letters used for the Russian license plates. Once more median engine power for groups is calculated, this time for groups of the same letter combination: medpower(l).

One difference with the numbers, is that some letter combinations are reserved. They are listed in table 3. These reserved combinations shall be treated separately later on. We also define a new variable gov, with the different government agencies as categories.

First, we inspect the summary statistics of lcount and medpower, for both the letter combinations from the data, l, and the randomly generated ldum. Table 10 and table 9 contain the summary statistics for the group count of l and ldum respectively. Figure 3 shows the density plots of lcount for both variables.

The 2 distributions differ even more than for the numbers. Two things might help explain this behavior. First of all, the database contains information about the engine power for only a subsample of the cars. Some groups thus may suffer from the smaller sample size for which engine power data is available. Furthermore, from ("Административный регламент по регистрации", 2011) we know that license plates are distributed in series, in ascending number first and then alphabetically: "Свидетельства о регистрации, регистрационные знаки и паспорта транспортных средств выдаются в порядке возрастания их цифровых номеров, а по серии - в алфавитном порядке" ("Registration certificates, registration plates and vehicle passports are issued in ascending order of their numerical numbers, and by series in alphabetical order"). So A999BC is followed by A001BE. Thus, the license plate assignment isn’t completely random. Because of this, number and l are serially correlated. And because the cycle of l is longer, it is more sensitive to time trends in corruption. This might help explain why the distribution of lcount differs more from what we would expect under the assumption of a random draw than numcount does from its own expected distribution. Remember that serial correlation typically causes slower convergence of estimators, such as a sample mean converging to the true population average.

number    medpower

001       80
002       72
003       70.59
004       66
005       69
007       73.5
008       67
009       66.7
010       66.2
012       63
020       66
030       63.4
050       66
070       66
090       63
100       67
111       68.8
200       66
222       67
300       66.18
333       69
400       66
444       66.7
500       69
555       73
600       66
666       63
700       66.7
707       64
777       77
800       66.18
888       72
900       66.7
999       75
Total     68.13

Table 8: List of numbers with high median power

[Figure 3: density plot; x-axis: Count; y-axis: Frequency; series: l, ldum]

Figure 3: Density plots of lcount for l and ldum: empirical distribution of group sizes for each letter combination l versus the theoretical distribution simulated by ldum

Variable  mean     sd      p1     p5     p95    p99    Obs.       Skewness

lcount    2457.41  49.466  2,345  2,377  2,542  2,574  4,244,295  0.055

Table 9: Summary statistics: count of ldum: lcount is the size of each group in the data with the same letter dummy.

Variable mean sd p1 p5 p95 p99 Obs. Skewness

lcount 2670.32 743.830 1,237 1,553 3,998 4,662 4,244,295 0.453

Table 10: Summary statistics: count of l: lcount is the size of each group in the data with the same letter combination.


The same discrepancy exists between the theoretically expected distribution of medpower for l under random assignment and the real distribution. Table 11 and table 12 display the summary statistics of medpower(l) and medpower(ldum). There are big outliers in medpower for l: medpower is 267 for EOM and 161 for EHP. Both letter combinations have a small group size, and therefore it is more likely for their median power to lie far from the population average.

Variable mean sd min p1 p5 p95 p99 max Obs. Skewness

medpower 62.17 10.494 49.8 51 54.5 77 98 267 4,237,868 6.061

Table 11: Summary statistics: medpower (l): median engine power medpower for each group in the data with the same letter combination.

Variable mean sd min p1 p5 p95 p99 max Obs. Skewness

medpower 58.41 .919 57.2 57.2 57.2 59.53 62 63.2 4,244,295 1.389

Table 12: Summary statistics: medpower (ldum): median engine power medpower for each group in the data with the same letter dummy ldum.

Based upon the variable containing brand and model of the car, we have constructed a variable cclass, which is based upon both the European car classification system ("Euro Car Segment", 2016) and the Russian automobile numbering system ("Automobile model numbering system in the and Russia", 2016). More details concerning these different types of classes are provided in the section about the luxury vehicles. We use cclass to construct a dummy nocar, which indicates that a vehicle is not a passenger car but a truck, a bus, a trailer, a crane, etc. More details are provided in the appendix. We then look once more at the median power of l and ldum, excluding the observations of the nocar category. Table 13 and table 14 show the summary statistics for these medpower variables.

Variable mean sd min p1 p5 p95 p99 max Obs. Skewness

medpower 59.82 9.060 47 51 53 74.9 97 161 3,638,450 3.590

Table 13: Summary statistics: medpower (l), excluding nocar : median engine power medpower for each group in the data with the same letter combination, excluding all the types of vehicles that aren’t cars.

The discrepancy is now smaller than without excluding the nocar observations, but it is still much larger than for the numbers. Once again, EHP is an outlier. It is clear that the nocar observations enlarge the deviations between the empirical and theoretical distributions, but do not create them.

Variable  mean   sd     min  p1    p5  p95   p99   max    Obs.       Skewness

medpower  57.07  0.160  56   56.1  57  57.2  57.2  57.75  3,648,035  -3.465

Table 14: Summary statistics: medpower (ldum), excluding nocar: median engine power medpower for each group in the data with the same letter dummy, excluding all the types of vehicles that aren’t cars.

Analogous to our approach with the numbers, we could use the 99.9th percentile of the theoretical distribution as a boundary for calling combinations exceptional (as in: an exceptionally high median power in the group). But since applying that approach to the letters would lead to more than 10% of the observations qualifying as exceptional, we choose a different approach. Instead we take a boundary above which approximately 3.4% of the letter combinations lie. This is equal to about 59 of the 1,728 values of l. Choosing a subset of plates that has roughly the same relative size as our numerical vanity variable has the advantage of a clearer comparison between the two types of vanity variables.

Before determining such a boundary, we examine the reserved letter combinations. It appears that a lot of the reserved combinations have above average power. This, however, doesn’t need to indicate possible corruption. Cars from government agencies typically differ from the average passenger car. Think for example of highway police cars or high-end cars for high-level government functionaries. One simple solution would be to exclude these reserved combinations when constructing a variable lvan. Unfortunately, things are not that simple. The police officers might be so corrupt that they also sell plates with the reserved letter combinations. Among the Russian population, some say that this is in fact the case. These plates would give owners a certain degree of immunity on the road and might in fact be the highest level of vanity plates. Therefore we construct 2 variables, one excluding the reserved letters, the other not excluding them. By determining a boundary as described above, a discrete variable is created.

Because some l have the same medpower, the 58 (instead of 59) highest values of medpower(l), when excluding nocar, are used to define a dummy lvan1. The same approach is then repeated excluding the reserved letters; in that case 57 values of l are used to create the variable lvan2.
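One way to arrive at 58 rather than 59 combinations is to never split a group of tied values at the boundary. The following sketch shows such a rule on toy data. The tie rule itself, the function name and the toy values are assumptions made for illustration; the text above only states that ties reduced the selection from 59 to 58.

```python
def top_groups_without_splitting_ties(medpower: dict[str, float], target: int) -> set[str]:
    """Select at most `target` groups with the highest medpower.

    If the boundary falls inside a run of tied values, the whole run is
    left out, so the selection can be smaller than `target`.
    """
    ranked = sorted(medpower.items(), key=lambda kv: kv[1], reverse=True)
    k = min(target, len(ranked))
    # Shrink k while the last selected value equals the first excluded value.
    while 0 < k < len(ranked) and ranked[k - 1][1] == ranked[k][1]:
        k -= 1
    return {name for name, _ in ranked[:k]}


# Target size used in the text: about 3.4% of the 12**3 letter combinations.
target_size = round(0.034 * 12**3)

# Toy example with a tie at the boundary.
toy = {"AAA": 90.0, "OOO": 80.0, "EHP": 80.0, "BBB": 70.0}
selected_two = top_groups_without_splitting_ties(toy, 2)
selected_three = top_groups_without_splitting_ties(toy, 3)
```

With a target of 2 the tie between the second and third value leaves only one selected group, which is the same effect that reduces 59 to 58 in the text.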

Finally, we wish to create a more parsimonious lvan variable, namely by using the plates of type ’LLL’. These plates with 3 identical letters top the lists of secondary plate markets. The variable lvan3 is 1 when a plate belongs to this class and 0 otherwise. The lists of the letter combinations that make up the discrete variables lvan1, lvan2 and lvan3 can be found in the appendix.
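The ’LLL’ class is small enough to enumerate. The sketch below lists all letter combinations and the 12 combinations with 3 identical letters, and defines a dummy in the spirit of lvan3. It assumes plates written in Latin transliteration in the normal LDDDLL structure; the helper names are hypothetical.

```python
from itertools import product

# The 12 letters used on Russian plates, written in Latin script (assumption).
LETTERS = "ABEKMHOPCTYX"

# All possible letter combinations and the subset with 3 identical letters.
ALL_COMBINATIONS = ["".join(c) for c in product(LETTERS, repeat=3)]
TRIPLES = {letter * 3 for letter in LETTERS}


def letters_of(plate: str) -> str:
    """Return the three letters of a normal plate written as LDDDLL."""
    return plate[0] + plate[4:6]


def lvan3(plate: str) -> int:
    """1 if the plate carries 3 identical letters, 0 otherwise."""
    return int(letters_of(plate) in TRIPLES)
```

There are 12³ = 1,728 combinations in total, of which 12 are of the type ’LLL’.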


2.3.2 Luxury vehicle dummy

Our theoretical model predicts that luxury vehicles have a higher percentage of vanity plates. Our analysis therefore requires selecting a subset of vehicles from the data to label as luxury vehicles. We follow 3 different approaches. To begin with, the Russian Ministry of Industry and Trade lists certain vehicles as luxury vehicles. As a second strategy we examine a new variable cclass, which divides cars into car classes based upon their brand and model. Finally, we follow an approach based upon the distribution of vanity over car brands. This last approach uses the data to construct an ’optimal’ luxury vehicle variable. The main purpose of this third strategy is to compare the links between lux and van over different subgroups, such as regions or years of birth.

Official list of luxury vehicles The Russian Ministry of Industry and Trade maintains a list of luxury vehicles ("Which cars are on the luxury item list?", 2014). This list exists for tax reasons. Cars appearing on this list have a value of more than 3 million rubles. We use this list to construct a similar list in our data. For some observations both brand and model are available in our data. For other cars only the brand is known. When a brand appears on the official luxury list and the models aren’t specified, we select all cars of that brand for our luxury dummy. Often these are brands with few observations. For brands with more observations, model information is more often available, and in that case we can add the models of the official list to our luxury dummy. Another complication comes from the fact that this list was published in 2008 while the data covers the period 1994-2006. Some models from our data have been succeeded by new models. We handle this by looking for the models that preceded the models from the official list. More information about which observations precisely were included can be found in the appendix.

By car class As mentioned in 2.3.1, we’ve constructed a variable cclass. The construction of this variable uses two types of car classifications. First of all it uses the Euro Market Segment. The different classes of the EMS are listed in table 15. When we sorted out the cars in different classes, we slightly deviated from the EMS classes. We kept A-F. We’ve split up sports cars into a lighter subset (G) and the subset (H) of more serious sports cars. We’ve added the class (N) for vans and minibuses. We also split up MPV into 2 subsets of smaller and larger MPV’s (J & K). Lastly, the class of sports utility cars, SUV’s and pickups is called (L).

Next to the Euro Market Segment, we use the Russian automobile numbering system. Every vehicle produced in Russia receives a number. This number is equivalent to the model. For example, the Moskvitch 2140 is a small family car: Moskvitch is the brand, 2140 the model. The numbering system uses engine displacement and vehicle weight to classify the different cars, and this in turn determines the first digit. The second digit stands for the type of vehicle, being a truck, a passenger car, a bus, etc. This second digit is 1 when the vehicle is a passenger car. The 3rd and 4th digit stand for the factory model number. To return to our example, we have a passenger car (1) of the small class (2) that is model number 40 of the factory. Table 16 below lists the different classes of passenger cars in this numbering system.

EMS Segment    Segment in the data    Description

A              A                      mini cars
B              B                      small cars
C              C                      medium cars
D              D                      large cars
E              E                      executive cars
F              F                      luxury cars
J              L                      sports utility cars (incl. SUV)
M              J, K                   multi purpose cars
S              G, H                   sports cars
               N                      vans, minibuses

Table 15: Euro Market Segment

First digit    Segment in the data    Class          Engine displacement    Weight

1              R-1                    Extra small    0...1099               0...799
2              R-2                    Small          1100...1799            800...1149
3              R-3                    Middle         1800...3499            1150...1499
4              R-4                    Large          3500 and more          1500 and more
5              R-5                    Upper          non-regulated          non-regulated

Table 16: Russian number system: passenger car classes
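The reading of a model number described above can be written as a small decoder. This is a sketch for illustration only: the function name and the returned field names are hypothetical, and because table 16 only covers passenger cars, the class is left empty for every other vehicle type.

```python
# Passenger car classes by first digit, as listed in table 16.
PASSENGER_CLASSES = {
    "1": "Extra small",
    "2": "Small",
    "3": "Middle",
    "4": "Large",
    "5": "Upper",
}


def decode_model_number(model: str) -> dict:
    """Split a four-digit model number into class, vehicle type and factory model."""
    if len(model) != 4 or not model.isdigit():
        raise ValueError("expected a four-digit model number such as '2140'")
    is_passenger_car = model[1] == "1"  # second digit 1 means passenger car
    size_class = PASSENGER_CLASSES.get(model[0]) if is_passenger_car else None
    return {
        "is_passenger_car": is_passenger_car,
        "size_class": size_class,
        "segment": f"R-{model[0]}" if size_class else None,
        "factory_model": int(model[2:]),  # 3rd and 4th digit
    }
```

For the example from the text, decode_model_number("2140") yields a passenger car of class Small (segment R-2) with factory model number 40.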

In the data, we call these Russian classes R-1, . . . , R-5. Table 17 gives the composition of the car population into these classes in the data. It is clear that the car population is dominated by Russian class 2 passenger cars (R-2). They make up more than half the population. In table 17 there is also a class called "Unknown car". We’ve added to this class all the cars of a passenger car brand of which the model is unknown.

Segment in the data    Obs.         Percent

A                      12,149       0.286
B                      78,789       1.856
C                      344,937      8.127
D                      207,623      4.892
E                      114,157      2.690
F                      17,663       0.416
G                      1,879        0.044
H                      1,712        0.040
J                      12,479       0.294
K                      13,606       0.321
L                      134,813      3.176
N                      27,127       0.639
R-1                    112,198      2.644
R-2                    2,175,516    51.257
R-3                    271,785      6.404
R-4                    371          0.009
R-5                    89           0.002
Unknown car            31,584       0.744
Nocar                  596,260      14.049
Total                  4,244,295    100

Table 17: Composition of the car population in the data

Based upon these classes we are now able to construct a dummy for luxury vehicles. The variable lux2 is 1 when the class is E, F, G, H or L and 0 otherwise. lux3 is 1 when the class is E, F, G, H, L, R-3, R-4 or R-5 and 0 otherwise.
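The two class-based dummies amount to simple set membership. The sketch below defines them and applies them to the class counts of table 17 as a plausibility check. The names are hypothetical. The class rows of table 17 sum to 4,154,737, which is the number of observations reported for lux2 and lux3 in table 21, and this sum is used as the denominator here.

```python
# Classes counted as luxury under each definition.
LUX2_CLASSES = {"E", "F", "G", "H", "L"}
LUX3_CLASSES = LUX2_CLASSES | {"R-3", "R-4", "R-5"}


def lux2(cclass: str) -> int:
    return int(cclass in LUX2_CLASSES)


def lux3(cclass: str) -> int:
    return int(cclass in LUX3_CLASSES)


# Observations per class, copied from table 17.
TABLE_17_COUNTS = {
    "A": 12_149, "B": 78_789, "C": 344_937, "D": 207_623, "E": 114_157,
    "F": 17_663, "G": 1_879, "H": 1_712, "J": 12_479, "K": 13_606,
    "L": 134_813, "N": 27_127, "R-1": 112_198, "R-2": 2_175_516,
    "R-3": 271_785, "R-4": 371, "R-5": 89, "Unknown car": 31_584,
    "Nocar": 596_260,
}

n_classified = sum(TABLE_17_COUNTS.values())
n_lux2 = sum(n for c, n in TABLE_17_COUNTS.items() if lux2(c))
n_lux3 = sum(n for c, n in TABLE_17_COUNTS.items() if lux3(c))
share_lux2 = n_lux2 / n_classified  # about 0.065
share_lux3 = n_lux3 / n_classified  # about 0.131
```

The resulting shares of roughly 6.5% and 13.1% agree with the means of lux2 and lux3 in table 21.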

By car brand A third approach to constructing a variable measuring the luxury of a vehicle exploits the link between luxury cars and vanity plates. More specifically, it uses our expectation that the most luxurious vehicles will have the highest frequencies of vanity plates. We select those brands that have statistically significantly higher frequencies of vanity plates, namely frequencies of nvan1. This would be a bad approach if we wanted to use this variable to show that there is a link between lux and van. However, we do not create this variable with that purpose in mind. This ’optimal’ lux variable will be used to try to get more robust results in smaller subsets. These smaller subsets might be regions or years, for example.

We perform a series of t-tests to test for equality of the percentage of nvan1 between a subgroup with the same brand and the general car population. We do this t-test for every brand except for those that belong completely to the nocar category; these brands are included together in the nocar line of the table.

\[
H_0:\ \mathrm{freq}(nvan1) - \mathrm{freq}(nvan1 \mid brand_i) = 0
\]
\[
H_a:\ \mathrm{freq}(nvan1) - \mathrm{freq}(nvan1 \mid brand_i) < 0
\]

Table 18 shows for each of these brands the t-statistic and the one-sided p-value. This one-sided p-value is the probability of seeing a negative difference at least as big as the one observed between the general car population and the brand subgroup, under the hypothesis of equal frequencies between the groups. We are mainly interested in negative differences, because these imply a higher frequency of nvan1 for the brand.
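The test statistic can be approximated from summary figures alone. The sketch below assumes a two-sample t-test with pooled variance between the general population and one brand, and uses a normal approximation for the one-sided p-value; both are assumptions, since the exact test variant is not spelled out here. The inputs are rounded values from tables 18 and 20, and the function names are hypothetical.

```python
import math


def pooled_t_stat(p_pop: float, n_pop: int, p_brand: float, n_brand: int) -> float:
    """t-statistic for freq(population) - freq(brand) with pooled variance."""
    var_pop = p_pop * (1.0 - p_pop)        # variance of a 0/1 variable
    var_brand = p_brand * (1.0 - p_brand)
    pooled_var = ((n_pop - 1) * var_pop + (n_brand - 1) * var_brand) / (
        n_pop + n_brand - 2
    )
    se = math.sqrt(pooled_var * (1.0 / n_pop + 1.0 / n_brand))
    return (p_pop - p_brand) / se


def one_sided_p(t_stat: float) -> float:
    """Lower-tail probability under a standard normal approximation."""
    return 0.5 * math.erfc(-t_stat / math.sqrt(2.0))


# General population: nvan1 frequency 3.58% over 4,244,295 observations.
t_bentley = pooled_t_stat(0.0358, 4_244_295, 0.2742, 248)
t_cadillac = pooled_t_stat(0.0358, 4_244_295, 0.1065, 742)
p_cadillac = one_sided_p(t_cadillac)
```

Under these assumptions the sketch gives a t-statistic of about -20.2 for Bentley and about -10.4 for Cadillac, close to the values reported in table 18.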

Table 18: Ttest for nvan1 by brand

Brand           freq(nvan1) (in %)  t-stat   p-value    Obs.

AZLK            2.65                11.13    1          47,633
Acura           8.52                -3.97    3.65e-5    223
AlfaRomeo       3.90                -0.75    0.2259     2,002
Asia            4.92                -0.79    0.2139     122
AstonMartin     20                  -4.42    5.03e-06   25
Audi            5.72                -26.41   0          52,283
Austin          5.56                -0.45    0.3263     18
BMW             6.99                -39.15   0          45,082
Bentley         27.42               -20.20   0          248
Buick           4.86                -0.94    0.1742     185
Cadillac        10.65               -10.35   2.05e-25   742
[...]           0                   1.14     0.8730     35
Chevrolet       5.94                -20.93   0          27,007
Chrysler        6.70                -10.85   1.01e-27   4,178
Citroen         3.49                0.48     0.6833     9,901
Dacia           1.76                4.38     1          1,990
Daewoo          2.85                10.06    1          63,626
Daf             1.88                2.92     0.9983     1,013
Daihatsu        4.08                -0.59    0.2765     490
Daimler         4.36                -1.79    0.0364     1,834
Datsun          4.10                -0.31    0.3798     122
De Tomaso       0                   .        .          1
Dodge           4.42                -2.50    0.0063     3,052
[...]           8.16                -9.56    5.70e-22   1,507
Eagle           1.55                1.24     0.8929     129
ErAZ            2.71                1.37     0.9139     848
Ferrari         30.16               -11.35   3.79e-30   63
Fiat            2.31                5.75     1          6,996
Ford            3.54                0.72     0.7636     75,797
Freightrover    0                   .        .          1
GAZ             4.17                -24.66   0          247,893
GAZelle         2.72                0.80     0.7869     294
GMC             10.67               -6.91    2.50e-12   328
GreatWallHaval  1.76                2.33     0.9901     567
Honda           4.67                -8.49    9.99e-18   20,846
Hummer          21.28               -17.27   0          329
Hyundai         3.39                2.71     0.9966     64,001
IFA             3.36                0.24     0.5939     387
IMYA            1.46                1.63     0.9488     205
Infiniti        17.12               -22.41   0          946
Isuzu           4.46                -1.72    0.0427     1,322
Jaguar          9.44                -11.55   3.61e-31   1,346
Jeep            6.59                -13.11   0          6,554
Kia             3.42                1.55     0.9396     30,395
[...]           2.56                0.59     0.7235     117
Lamborghini     22.22               -4.25    1.05e-5    18
Lancia          4.04                -0.58    0.2810     570
LandRover       7.79                -17.23   0          5,791
Lexus           12.86               -46.23   0          8,563
Lincoln         8.24                -7.70    6.59e-15   947
Lotus           15.79               -2.86    0.0021     19
LuA             2.10                3.49     0.9998     1,905
MG              4.76                -0.29    0.3857     21
Maserati        24.68               -9.96    1.17e-23   77
Maybach         16.67               -1.72    0.0423     6
[...]           3.81                -1.84    0.0325     23,089
Mercedes        7.91                -58.30   0          59,263
Mercury         4.28                -0.76    0.2225     421
Mini            8.74                -6.29    1.56e-10   515
Mitsubishi      4.34                -10.54   2.82e-26   65,394
Morgan          0                   0.33     0.6308     3
Morris          0                   .        .          1
Moskvitsj       1.94                29.25    1          106,797
[...]           4.73                -15.72   0          63,421
OeAZ            2.68                13.92    1          32,788
Oldsmobile      3.55                0.02     0.5091     197
Oltcit          2.94                0.20     0.5799     34
[...]           3.03                6.77     1          50,572
Packard         0                   0.27     0.6074     2
Peugeot         3.47                0.98     0.8357     25,106
Plymouth        3.42                0.23     0.5928     702
Pontiac         4.78                -1.91    0.0284     879
Porsche         17.10               -25.12   0          1,193
Proton          4.35                -0.74    0.2303     322
RangeRover      15.40               -12.45   7.30e-36   383
[...]           2.61                9.96     1          35,145
RollsRoyce      20                  -6.55    2.88e-11   55
Rover           5.52                -5.13    1.47e-07   2,429
Saab            6.01                -9.61    3.60e-22   5,386
Saturn          2.44                0.39     0.6533     41
SeAZ            2.10                8.39     1          11,075
Seat            2.46                2.05     0.9797     1,140
Skoda           3.62                -0.37    0.3557     30,307
Smart           7.89                -2.02    0.0216     76
Ssangyong       4.08                -0.92    0.1790     1,176
Steyr           0                   0.72     0.7647     14
Subaru          5.62                -10.83   1.28e-27   9,774
Suzuki          3.79                -1.39    0.0823     16,136
Talbot          2.84                0.48     0.6833     141
Tata            0                   1.69     0.9547     77
Tatra           3.04                0.74     0.7715     657
Toyota          6.21                -37.03   0          67,371
Trabant         11.54               -2.18    0.0145     26
Triumph         11.11               -1.21    0.1122     9
VAZ             3.29                32.98    1          2,115,782
Vauxhall        0                   0.58     0.7185     9
Vis             1.53                7.90     1          235
Volkswagen      3.98                -6.16    3.62e-10   81,336
Volvo           5.65                -18.66   0          25,006
Wartburg        0                   2.02     0.9784     110
Will            7.14                -0.72    0.2369     14
Yulon           0                   1.11     0.8660     33
ZAZ             2.06                18.53    1          50,277
nocar           2.03                69.63    4.2e+06    596,260

The brands that top the list with the highest nvan1 frequency are Aston Martin (20%), Bentley (27.4%), Ferrari (30.2%), Hummer (21.3%), Lamborghini (22.2%), Maserati (24.7%) and Rolls Royce (20%). The results of these t-tests are consistent with our model, but we will test the hypotheses of our model more thoroughly later on.

brand           freq(nvan1)

Ferrari         30.2%
Bentley         27.4%
Maserati        24.7%
Lamborghini     22.2%
Hummer          21.3%
Rolls Royce     20%
Aston Martin    20%

Table 19: Top brands by nvan1: frequency of cars from a certain brand that have a license plate from the numerical vanity category nvan1

Now we use these t-tests to construct the binary lux variables lux4, lux5 and lux6. lux4 equals 1 for the brands that had a higher level of nvan1, with the t-test being significant at the 5% level (t-stat < −1.645). Selecting on statistical significance in such a large data set can cause brands with rather small levels of nvan1 (but above the population average) to be tagged as luxury vehicles. These brands most often have a big group size, and thus are more likely to have statistically different frequencies of nvan1. In order to construct lux5 and lux6 we therefore employ, besides the 5% cut-off for statistical significance, also a cut-off for practical significance. We require the nvan1 frequency of a brand to be at least 5% for lux5 and at least 10% for lux6. Compared with the nvan1 frequency of about 3.5% in the general car population, these cut-offs mean a difference of 1.5 percentage points (43%) and 6.5 percentage points (186%) respectively.
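The three brand-based rules can be summarised in a few lines. The sketch below applies them to a handful of rows copied from table 18. The function name and the dictionary layout are hypothetical; the cut-offs are the ones stated in the text.

```python
T_CRITICAL = -1.645  # one-sided 5% level


def lux_flags(freq_pct: float, t_stat: float) -> dict:
    """Apply the lux4, lux5 and lux6 rules to one brand."""
    significant = t_stat < T_CRITICAL  # brand frequency significantly higher
    return {
        "lux4": int(significant),
        "lux5": int(significant and freq_pct >= 5.0),
        "lux6": int(significant and freq_pct >= 10.0),
    }


# (freq(nvan1) in %, t-stat) for a few brands from table 18.
ROWS = {
    "Bentley": (27.42, -20.20),
    "Audi": (5.72, -26.41),
    "Volkswagen": (3.98, -6.16),
    "VAZ": (3.29, 32.98),
    "Triumph": (11.11, -1.21),
}

flags = {brand: lux_flags(freq, t) for brand, (freq, t) in ROWS.items()}

# Relative size of the practical cut-offs versus a population level of 3.5%.
relative_lux5 = (5.0 - 3.5) / 3.5   # about 0.43
relative_lux6 = (10.0 - 3.5) / 3.5  # about 1.86
```

Volkswagen illustrates the concern raised above: with 81,336 observations its frequency of 3.98% is statistically significant, so it enters lux4, but it does not pass the practical cut-offs of lux5 and lux6. Triumph shows the opposite case, a high frequency that is not significant because the group is very small.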

Summary of the constructed variables Before moving to testing our hypotheses we summarize the variables we have built to do so. Table 20 presents summary statistics for the vanity variables nvan1, lvan1, lvan2 and lvan3. Table 21 shows the summary statistics for the constructed luxury variables lux1, lux2, lux3, lux4, lux5 and lux6.

Furthermore we consider 2 correlation matrices, which can be found in table 22 for the different vanity variables and in table 23 for the different luxury variables. Some remarks on the correlations should be made. A first remark concerns the existence of some kind of dichotomy between the vanity variables. Numerical vanity plates and letter vanity plates seem to be 2 uncorrelated things. Intuitively we can think of the 2 kinds of plates as being some sort of substitutes for most agents. Buying a combination of special letters and numbers might be too expensive. It will be interesting to see whether analyzing these two different kinds of vanity plates leads to similar conclusions. We also note that lvan3 is the letter vanity variable least correlated with the other vanity variables. This variable was constructed purely based upon a theoretical structure of vanity plates, namely 3 identical letters. The other variables were all constructed by exploiting the distribution of power among these letter combination groups.

The correlations between the luxury variables are in the range 0.0733-0.6806. The 2 variables most correlated with each other are lux2 and lux3, the two least correlated are lux4 and lux6. We defined lux2 and lux3 both on our cclass variable, making their construction quite similar. As a last remark on these correlations, we mention that lux6 is weakly correlated with the other luxury variables.

Variable  Obs.       mean    sd      min  max

nvan1     4,244,295  0.0358  0.1858  0    1
lvan1     4,244,292  0.0255  0.1575  0    1
lvan2     4,244,292  0.0266  0.1608  0    1
lvan3     4,244,292  0.0064  0.0751  0    1

Table 20: Summary statistics: van

Variable  Obs.       mean    sd      min  max

lux1      4,220,540  0.0169  0.1289  0    1
lux2      4,154,737  0.0650  0.2466  0    1
lux3      4,154,737  0.1306  0.3369  0    1
lux4      4,220,540  0.2672  0.4425  0    1
lux5      4,220,540  0.0790  0.2698  0    1
lux6      4,220,540  0.0031  0.0555  0    1

Table 21: Summary statistics: lux

        nvan1   lvan1   lvan2   lvan3

nvan1   1
lvan1   0.0014  1
lvan2   0.0009  0.9275  1
lvan3   0.0072  0.1332  0.1224  1

Table 22: Correlation matrix for vanity variables


       lux1    lux2    lux3    lux4    lux5    lux6

lux1   1
lux2   0.4818  1
lux3   0.3279  0.6806  1
lux4   0.2129  0.3167  0.5106  1
lux5   0.4260  0.5300  0.3324  0.4776  1
lux6   0.3273  0.1598  0.1084  0.0733  0.1535  1

Table 23: Correlation matrix for luxury variables

3 Results

As mentioned when formulating our hypotheses, we follow 3 strategies. First, we compare the frequency of vanity plates to its theoretical expectation. Second, the same test is performed within the category of luxury vehicles. In the absence of corruption, when plates are distributed randomly, there is no reason why frequencies would be higher for the category of luxury vehicles. A third strategy is to regress vanity variables on luxury variables. Under the null hypothesis of no corruption, having a luxury vehicle should not have a significant effect on the probability of having a vanity plate. These regressions give us the opportunity to control for some issues concerning the specific construction of our variables. For example, nvan1, lvan1 and lvan2 were constructed by exploiting the unusual distribution of engine power among certain plates. By adding powerkw as an explanatory variable we can check whether the results are driven by our way of selecting vanity plates.

In section 3.1 we perform our tests for the frequency in the general population. Next, in section 3.2, we do the same for the frequency within the category of luxury vehicles. Then, in section 3.3, we run regressions of vanity variables on luxury variables. Finally, in section 3.4 we take a closer look at possible corruption within the set of plates reserved for government agencies.


3.1 Frequency in the general population

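The theoretical frequency of a group of plates is simply the number of different combinations in the group, divided by the total number of combinations. This way we have:

\[
\begin{aligned}
\mathrm{freq}(nvan1) &= \frac{34}{999} \approx 3.4034\% \\
\mathrm{freq}(lvan1) &= \frac{58}{12^3} \approx 3.3565\% \\
\mathrm{freq}(lvan2) &= \frac{57}{12^3} \approx 3.2861\% \\
\mathrm{freq}(lvan3) &= \frac{12}{12^3} \approx 0.6944\%
\end{aligned}
\]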

Mind that for the numbers 000 is excluded and for the letters only 12 Cyrillic characters are used (those that resemble Latin characters). We test our hypotheses with a t-test. The results are presented in table 24. The frequency of nvan1 is higher than expected; the frequencies of lvan1, lvan2 and lvan3, however, are lower than expected. All t-tests are significant. It is worth noting that although the mean of nvan1 differs significantly from its expected value, this statistically significant difference is a rather modest 0.17 percentage points. The differences between expected value and real frequency are bigger for lvan1 (0.81 percentage points) and lvan2 (0.63 percentage points). For all the different measures of letter vanity, real frequencies are lower than expected, contrary to the numerical vanity frequencies. This suggests that numerical vanity and letter vanity might be 2 different kinds of vanity plates.

variable  expected mean  mean     std. dev.  t-stat    p-value (2-sided)  df

nvan1     0.0340         0.0358   0.1858     19.53     0.0000             4,244,294
lvan1     0.0336         0.0255   0.1575     -1.1e+02  0.0000             4,244,291
lvan2     0.0329         0.0266   0.1608     -80.73    0.0000             4,244,291
lvan3     0.00694        0.00637  0.0796     -14.76    0.0000             4,244,291

Table 24: Ttests for vanity frequency in general population: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.
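The expected frequencies and the t-statistics of table 24 can be approximated from the rounded summary figures. The sketch below is a plain one-sample t-statistic, (mean − expected) / (sd / √n), with hypothetical function and variable names. Because the inputs are rounded to 4 decimals, the results only approximately match the table. Note also that evaluating 57/12³ directly gives roughly 3.2986%, marginally above the 3.2861% printed above; the gap is small compared with the differences tested here.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, expected: float) -> float:
    """t-statistic for the difference between a sample mean and a fixed value."""
    return (mean - expected) / (sd / math.sqrt(n))


# Theoretical frequencies: combinations in the group / all combinations.
EXPECTED = {
    "nvan1": 34 / 999,     # 000 is excluded, so 999 numbers
    "lvan1": 58 / 12**3,   # 12 letters, 3 positions
    "lvan2": 57 / 12**3,
    "lvan3": 12 / 12**3,
}

# Rounded sample figures from table 24 (n = df + 1).
t_nvan1 = one_sample_t(0.0358, 0.1858, 4_244_295, EXPECTED["nvan1"])
t_lvan1 = one_sample_t(0.0255, 0.1575, 4_244_292, EXPECTED["lvan1"])
```

With these inputs the statistic is about 19.6 for nvan1 and about −105 for lvan1, in line with the reported 19.53 and −1.1e+02.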

3.2 Frequency in the luxury vehicle subpopulation

Tables 25 to 30 show the results of the same t-tests as in the previous section, but now performed in a subpopulation of luxury vehicles. Remember that our model predicts that the frequency of vanity plates will be higher in the luxury vehicle subpopulation. We see that all the means are above the means in the general population, and all the means also differ significantly from their expected values. The means of the letter vanity variables are higher than expected, except for lvan1 and lvan2 in the subpopulation for which lux4 equals 1. The variable lux4 is the broadest of the luxury variables, and the rise in letter vanity frequency isn’t enough in this subset to offset the lower frequency in the general population. In general these t-tests seem to support the model. Luxury vehicles and vanity plates go hand in hand, for a series of different measurements. If we performed a t-test to compare the luxury vehicle subset with its complement, instead of comparing the subset with its expected value, we would see the same kind of results. The results of such a test are not reported here.

variable  expected mean  mean    std. dev.  t-stat  p-value (2-sided)  df

nvan1     0.0340         0.0826  0.2753     54.43   0.0000             95,115
lvan1     0.0336         0.0634  0.2438     37.81   0.0000             95,115
lvan2     0.0329         0.0533  0.2246     28.07   0.0000             95,115
lvan3     0.00694        0.0267  0.1611     37.76   0.0000             95,115

Table 25: Ttests for vanity frequency in luxury car subpopulation: lux1: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.

variable  expected mean  mean    std. dev.  t-stat  p-value (2-sided)  df

nvan1     0.0340         0.0652  0.2469     75.71   0.0000             359,781
lvan1     0.0336         0.0478  0.2133     39.92   0.0000             359,781
lvan2     0.0329         0.0463  0.2100     38.25   0.0000             359,781
lvan3     0.00694        0.0163  0.1268     44.47   0.0000             359,781

Table 26: Ttests for vanity frequency in luxury car subpopulation: lux2: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.

variable   expected mean   mean     std. dev.   t-stat   p-value (2-sided)   df
nvan1      0.0340          0.0640   0.2448      97.38    0.0000              632,026
lvan1      0.0336          0.0437   0.2044      39.40    0.0000              632,026
lvan2      0.0329          0.0411   0.1985      32.88    0.0000              632,026
lvan3      0.00694         0.0140   0.1174      47.66    0.0000              632,026

Table 27: T-tests for vanity frequency in luxury car subpopulation: lux3: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.

variable   expected mean   mean     std. dev.   t-stat   p-value (2-sided)   df
nvan1      0.0340          0.0495   0.2169      76.47    0.0000              1,151,602
lvan1      0.0336          0.0320   0.1759      -9.75    0.0000              1,151,602
lvan2      0.0329          0.0313   0.1742      -9.37    0.0000              1,151,602
lvan3      0.00694         0.0107   0.1026      38.74    0.0000              1,151,602

Table 28: T-tests for vanity frequency in luxury car subpopulation: lux4: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.

variable   expected mean   mean     std. dev.   t-stat   p-value (2-sided)   df
nvan1      0.0340          0.0658   0.2479      76.56    0.0000              357,325
lvan1      0.0336          0.0492   0.2164      43.33    0.0000              357,325
lvan2      0.0329          0.0460   0.2096      37.57    0.0000              357,325
lvan3      0.00694         0.0175   0.1311      48.11    0.0000              357,325

Table 29: T-tests for vanity frequency in luxury car subpopulation: lux5: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.

variable   expected mean   mean     std. dev.   t-stat   p-value (2-sided)   df
nvan1      0.0340          0.0686   0.2528      26.24    0.0000              36,775
lvan1      0.0336          0.0519   0.2217      15.82    0.0000              36,775
lvan2      0.0329          0.0477   0.2132      13.39    0.0000              36,775
lvan3      0.00694         0.0243   0.1539      21.60    0.0000              36,775

Table 30: T-tests for vanity frequency in luxury car subpopulation: lux6: expected mean is the theoretical mean, mean is the empirical mean. df is the degrees of freedom for the t-test.

In figure 4, figure 5 and figure 6 we show the frequencies of nvan1, lvan2 and lvan3 for the different Euro Market Segment classes. We use the official class segments, except for the N class, which consists of vans. The brown line in the graphs marks the average level in the data, the red line marks the theoretical average. These figures serve as an illustration, but they tend to support the model. One important note should be made, however. In our model we assumed 2 classes of plates: vanity plates and non-vanity plates. Vanity plates in reality are not homogeneous; there is actually a whole range between them. You could think of each of these levels of vanity as having its own price and being accompanied by its own level of luxury. That would support the stair-stepped graph we see. Taking luxury and vanity in the limit to continuous variables would change the graph between them to an upward-sloping line, instead of a flat line with a breaking point.

Figure 4: Bar chart of nvan1 by class (classes A, B, C, D, E, F, J, M, N and S): Letters stand for classes of the EMS-segment, E are executive cars, F luxury cars, J SUVs and S sports cars. The brown line is the average level of nvan1 in the data, the red line is the theoretical level of nvan1.

Figure 5: Bar chart of lvan2 by class (classes A, B, C, D, E, F, J, M, N and S): Letters stand for classes of the EMS-segment, E are executive cars, F luxury cars, J SUVs and S sports cars. The brown line is the average level of lvan2 in the data, the red line is the theoretical level of lvan2.

Figure 6: Bar chart of lvan3 by class (classes A, B, C, D, E, F, J, M, N and S): Letters stand for classes of the EMS-segment, E are executive cars, F luxury cars, J SUVs and S sports cars. The brown line is the average level of lvan3 in the data, the red line is the theoretical level of lvan3.

3.3 Regressing vanity on luxury

The first thing to consider when choosing a model to regress vanity on luxury is whether to apply a linear model or a logistic model. A linear model with only luxury as predictor poses no problems, since this is a saturated model. When other variables are included and the model is no longer saturated, the linear probability model (LPM) would still provide consistent estimates of the coefficients. However, we must use heteroskedasticity-robust standard errors, since heteroskedasticity arises naturally in models with binary outcomes. The estimation results of such an LPM with robust standard errors were the same as the estimation results of a logistic model. Because the LPM is easier to interpret, it is preferred over the logistic model.
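To illustrate why the saturated case poses no problems, the sketch below fits an LPM of a binary vanity indicator on a binary luxury indicator. In that case the constant equals the vanity frequency among non-luxury cars and the slope equals the difference in frequencies between the two groups. The data are made up for illustration, and the robust standard error uses the plain White (HC0) formula, which is an assumption: the text does not specify which robust variant was used.

```python
import math


def lpm_binary(y, x):
    """OLS of binary y on a constant and binary x, with HC0 robust s.e. of the slope."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # White (HC0) variance of the slope in a regression with one regressor.
    meat = sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, residuals))
    robust_se = math.sqrt(meat) / sxx
    return intercept, slope, robust_se


# Hypothetical data: 1,000 ordinary cars (30 vanity plates), 200 luxury cars (14 vanity plates).
lux = [0] * 1000 + [1] * 200
van = [1] * 30 + [0] * 970 + [1] * 14 + [0] * 186
intercept, slope, robust_se = lpm_binary(van, lux)
print(intercept, slope, robust_se)
```

In this toy sample the constant is the 3% frequency among ordinary cars and the slope is the 4 percentage point gap to luxury cars. The HC0 standard error coincides with the unequal-variance standard error of a difference in two proportions, which is why robust errors are the natural choice for binary outcomes.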

There are 4 variables for vanity and 6 for luxury. The variables nvan1, lvan1 and lvan2 were constructed by explicitly exploiting the distribution of power among plates. Luxury vehicles tend to have higher engine power, so a positive association between these vanity variables and luxury vehicles might be driven by power. By adding powerkw as an explanatory variable to our regression, we control for this. Another issue is the construction of lux4, lux5 and lux6, which uses the distribution of nvan1 over brands. A positive association between nvan1 and these luxury variables would therefore only be natural. It does not, however, imply a positive association between the different lvan variables and lux4, lux5 and lux6, all the more so since we have seen that lvan and nvan measure two different things (low correlation).


The results of these regressions can be found in table 31, table 32, table 33 and table 34. For nvan1 the coefficients of all lux variables are positive and highly significant. The level of nvan1 rises from a baseline of around 3-3.5% by between 1.9%-points (for lux4) and 10.6%-points (for lux6). When controlling for powerkw, all lux coefficients keep their sign and significance. The effects become smaller, ranging between 1.2%-points (lux4) and 8.2%-points (lux6). The coefficient for powerkw is positive and significant. This could be due to the construction of the vanity variables. On the other hand, it could be due to an 'intra-class' link between luxury vehicles and vanity plates. This means that the luxury variables do not capture the range of luxury between vehicles in the same category (lux=1 or lux=0), whereas powerkw might. Further research might explore ways of discriminating between these scenarios.

For both lvan1 and lvan2 the lux variable has a positive significant effect, even after controlling for engine power. The effects range between 0.9%-points for lux4 and 5.1%-points for lux6 when not controlling for power, and between 0.5%-points for lux4 and 2.0%-points for lux6 when controlling for power. A note on these first letter vanity variables: running the regression, uncontrolled for powerkw, with lvan2 instead of lvan1 does not change the conclusions qualitatively. Therefore we can conclude that the results for lvan1 are not driven by special behavior of the reserved government plates. For the last letter vanity variable, lvan3, we have positive and significant coefficients both controlled and uncontrolled for engine power. The effects are even more pronounced than for the other vanity variables. The constant ranges between 0.48% and 0.62%. Uncontrolled, the effect of lux ranges between 0.58%-points (lux4) and 3.50%-points (lux6). After controlling for engine power the effects range between 0.29%-points (lux4) and 2.55%-points (lux6). This means that luxury vehicles in the subset determined by lux6 have 8 times(!) as many plates of the type lvan3 as the general population.

3.4 Government agencies

The reserved plates deserve a separate analysis. Normally, as the name suggests, they are reserved for certain government agencies. However, they might also be high-level vanity plates: the more exclusive, the higher the price. Worth mentioning is the 'migalki' phenomenon (Parfitt, 2003). Since the 1990s a combination of blue flashing lights, sirens and reserved plates has been sold to people with political connections. Due to widespread protest there has been a clampdown by the government since the early 2000s. If they really are high-level vanity plates, they might also have vanity numbers. Therefore we look at the distribution of nvan1. First, in table 35, a set of gov dummies and the number of observations for each government agency is tabulated. In table 36 nvan1 is regressed

33 3.4 Government agencies 3 RESULTS -au u#0000000000000000000000000000000000.000 0.000 0.000 0.000 0.000 0.000 0.000 2,573,448 4,220,540 0.000 2,573,448 4,220,540 2,573,448 0.000 4,220,540 2,541,093 0.000 4,154,737 2,541,093 0.000 4,154,737 2,573,448 4,220,540 0.000 Observations lux# p-value powerkw lux6 lux5 lux4 lux3 lux2 lux1 Constant (386.33) (58.31) 0.0659 0.0347 1 2 3 4 5 6 7 8 9 1)(1 (12) (11) (10) (9) (8) (7) (6) (5) (4) (3) (2) (1) 0.000211 (64.44) (55.69) (37.61) 0.0545 0.0180 al 1 ersino vn nlxr variables luxury on nvan1 of Regression 31: Table (364.51) (75.37) 0.0378 0.0331 0.0001753 (45.34) (45.89) (68.92) 0.0291 0.0194 (102.13) (339.14) 0.0309 0.0358 0.0001585 (66.45) (40.62) (64.99) 0.0292 0.0188 (312.96) (84.59) 0.0192 0.0307 -tt r nparentheses in are t-stats : 0.0002196 (55.84) (44.27) (52.62) 0.0124 0.0150 (364.43) (79.27) 0.0354 0.0330 0.0001901 (50.40) (48.82) (65.96) 0.0276 0.0184 (393.58) (34.70) 0.1060 0.0355 0.0002513 (65.82) (23.53) (55.75) 0.0821 0.0157

34 3 RESULTS 3.4 Government agencies (7.27) 0.0195 -0.0074 (-30.92) (100.51) 0.0003595 0.0252 0.0511 (21.95) (329.98) 0.0193 (87.77) (39.48) -0.0049 (-20.96) 0.0003051 0.0233 0.0267 (69.46) (304.31) 0.0054 (24.16) (92.14) -0.0075 (-31.03) 0.000341 : t-stats are in parentheses 0.0230 0.0088 (47.29) (270.03) 0.0155 (42.75) (85.14) -0.0059 (-24.52) 0.0003109 0.0223 0.0225 (77.12) (286.81) 0.0189 (34.41) (85.42) -0.0051 (-21.20) 0.0003096 0.0278 0.0234 (64.47) (305.03) Table 32: Regression of lvan1 on luxury variables 0.0331 (25.88) (92.62) -0.0055 (-23.14) 0.0003257 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) 0.0472 0.0246 (48.69) (323.39) lux2 lux1 lux3 p-value lux#Observations 0.000 4,220,537 2,573,448 0.000 4,154,734 2,541,093 4,154,734 0.000 2,541,093 4,220,537 0.000 2,573,448 4,220,537 2,573,448 0.000 4,220,537 2,573,448 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Constant lux4 lux5 lux6 powerkw

35 3.4 Government agencies 3 RESULTS -au u#0000000000000000000000000000000000.005 0.000 0.000 0.000 0.000 0.000 0.000 2,573,448 4,220,537 0.000 2,573,448 4,220,537 2,573,448 0.000 4,220,537 2,541,093 0.000 4,154,734 2,541,093 4,154,734 0.000 2,573,448 4,220,537 0.000 Observations lux# p-value powerkw lux6 lux5 lux4 lux3 Constant lux2 lux1 (332.46) (36.23) 0.0259 0.0317 1 2 3 4 5 6 7 8 9 1)(1 (12) (11) (10) (9) (8) (7) (6) (5) (4) (3) (2) (1) 0.0002955 -6.34) ( -0.0014 (89.69) (13.38) 0.0155 al 3 ersino vn nlxr variables luxury on lvan2 of Regression 33: Table (313.89) (57.22) 0.0242 0.0247 0.0002663 -0.0003 (79.12) (31.82) (-1.37) 0.0176 (298.18) (61.96) 0.0175 0.0240 0.0002784 -0.0015 (82.99) (31.78) (-6.54) 0.0112 (280.33) (34.28) 0.0064 0.0248 -tt r nparentheses in are t-stats : 0.0002928 -0.0022 (86.11) (20.18) (-9.97) 0.0047 (314.21) (58.16) 0.0217 0.0248 0.0002666 -0.0002 (80.66) (30.63) (-0.94) 0.0152 (337.61) (16.73) 0.0264 0.0353 0.000312 (-10.66) -0.0024 (95.01) 0.0069 (2.82)

36 3 RESULTS 3.4 Government agencies 0.0255 (12.38) (50.20) -0.0032 (-18.88) 0.0001231 0.0350 (20.09) 0.00622 (162.25) 0.0098 (36.26) (43.95) -0.0022 (-13.58) 0.0001002 0.0054 0.0123 (53.32) (144.62) 0.0029 (24.86) (46.39) -0.0035 (-20.35) 0.0001175 : t-stats are in parentheses 0.0048 0.0058 (55.58) (121.91) 0.0064 (30.25) (40.96) -0.0025 (-14.66) 0.0001034 0.0050 0.0091 (55.18) (135.31) (26.73) (40.90) -0.0021 0.00856 (-12.21) 0.0001005 0.0055 0.0120 (46.99) (145.91) Table 34: Regression of lvan3 on luxury variables 0.0215 (24.96) (44.27) -0.0022 (-13.07) 0.0001052 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) 0.0059 0.0247 (38.20) (157.05) lux4 Constant lux5 lux3 lux2 lux1 lux6 powerkw p-value lux#Observations 0.000 4,220,537 2,573,448 0.000 4,154,734 2,541,093 4,154,734 0.000 2,541,093 4,220,537 0.000 2,573,448 4,220,537 2,573,448 0.000 4,220,537 2,573,448 0.000 0.000 0.000 0.000 0.000 0.000 0.000

on our different luxury variables for the subset of government agencies. Finally, in table 37, we also take a look at the differences between the various agencies. This is done by regressing nvan1 on the 9 gov dummies and controlling for powerkw. Controlling for powerkw instead of a luxury variable might be preferred because the car populations might differ significantly between the government agencies and the general car population. We once again use heteroskedasticity-robust standard errors in all regressions.

The results in table 36 show that even in the class of reserved plates there is a strong link between the lux variables and nvan1. This strong link is evidence that our model also applies to these reserved plates. The nature of the corruption involved with these plates is difficult to investigate. In theory, an unusual frequency of nvan1 could suggest two different things. First, members of the state agencies have an influence over the handing out of plates, and through informal connections they get themselves a 'nice' plate. Second, the reserved plates are sold to anyone with the right connections and the right amount of cash. It is hard to conclude from our tests which scenario is true (or whether both are).

Table 37 shows that gov8 has a significant effect on nvan1. The frequency of nvan1 goes from 3.58% to 5.88% for gov8. Since gov7, which consists of plates for the MIA, is insignificant and gov8 consists of plates used for both the MIA and the State Security (FSB), it is probable that the results are driven by State Security plates. When controlling for powerkw we no longer have a significant difference for gov8. Now gov5 has a significant positive effect (1.39%-points) and gov6 a significant negative effect (-1.86%-points) on nvan1. When controlling for power, these results have to be read as follows. The effects measure differences between cars with similar power but from the different agencies, compared to the general car population. For gov6 this might mean, for example, that the same kind of cars (in the sense of engine power) less frequently have a numerical vanity plate than in the general population. Gov6 consists of plates for higher authorities, and these are often cars with high engine power. In the general population, cars with high engine power tend to have more numerical vanity plates. But higher authority plates are extremely exclusive. If this means that they are too risky to sell or to buy, they would have a lower frequency of numerical vanity plates than similar cars with high engine power. On the other hand, if the plates of the gov6 type are so exclusive (perhaps coming with the 'migalki' siren and flashing light) that a vanity number would add little extra signal, then we would see the same negative coefficient as in table 37 (2). Once more, it is hard to tell which scenario is the case here. A last remark: the fact that gov5 (GIBDD) has a higher (and in (2) a significantly higher) frequency of numerical vanity plates should be no surprise. It is the GIBDD agency that is responsible for handing out the license plates.


Variable   Description                 Observations
gov1       CityAdm&State               19,795
gov2       FedBailiff/CoI/Prosecutor   4,608
gov3       FedDrugControl              2,133
gov4       FedMigration                3,189
gov5       GIBDD                       1,715
gov6       HigherAuth                  2,342
gov7       MIA                         1,166
gov8       MIA/StateSecurity           3,319
gov9       SupremeCourts               2,830

Table 35: Government agencies

4 Further Investigation

We now wish to illustrate some interesting features of the vanity plate phenomenon, without thoroughly analyzing them. Future research might take a closer look at each of them.

4.1 Regional differences

Having established the strong links between luxury vehicles and vanity plates, we have strong evidence for corruption and for our model. But there are more interesting aspects to this corruption than merely the fact that it exists. We have already taken a look at the subpopulation of government agencies and at differences between the agencies. But we can study differences between subgroups for other variables too. To start, we can compare plates with different region codes (recall from the structure of the plate that each plate has a region code). Our data set also has variables registering the year of operation, year of birth and place of birth. We have used the place of birth variable to construct a 'region of birth' variable. We start by studying the different region codes in section 4.1.1. Thereafter, we compare the regions of birth in section 4.1.2. In section 4.2 we show a time trend measuring the evolution of the vanity plate phenomenon. Lastly, in section 4.3, we compare the links between vanity and luxury for individuals with different years of birth.

If all Russians had the same preferences over vanity plates and luxury vehicles, then differences in vanity between groups in Russia could be attributed to differences between those

39 4.1 Regional differences 4 FURTHER INVESTIGATION Constant -au u#0000080000070000000000060000010000.159 0.000 27,359 0.001 40,749 0.000 27,359 0.006 40,749 27,359 0.000 40,749 0.000 26,906 0.000 39,893 0.087 26,906 39,893 0.000 27,359 0.018 40,749 Observations 0.000 lux# p-value powerkw lux6 lux5 lux4 lux3 lux2 lux1 (38.32) 0.0366 0.0370 (6.38) 1 2 3 4 5 6 7 8 9 1)(1 (12) (11) (10) (9) (8) (7) (6) (5) (4) (3) (2) (1) al 6 ersino vn nlxr aibe:gvrmn agencies government variables: luxury on nvan1 of Regression 36: Table 0.0001989 0.0168 0.0189 (6.65) (2.36) (7.70) (35.76) 0.0245 0.0350 (6.75) 0.0001954 0.0081 0.0188 (6.17) (1.71) (7.58) (32.63) 0.0228 0.0329 (8.53) 0.0001795 0.0125 0.0183 (5.98) (3.65) (7.74) (29.97) 0.0173 0.0325 (8.10) 0.0002016 0.0076 0.0169 (6.72) (2.76) (7.28) -tt r nparentheses in are t-stats : (35.35) 0.0298 0.0344 (8.75) 0.0001644 0.0204 0.0163 (4.90) (7.94) (3.34) (39.86) 0.0616 0.0379 (4.07) 0.0002238 0.0175 0.0216 (8.23) (7.49) (1.41)


               (1)          (2)
Constant       0.0358       0.0144
               (394.87)     (50.36)
powerkw                     0.0003
                            (70.86)
gov1           0.0019       -0.0018
               (1.38)       (-1.08)
gov2           0.0018       0.0034
               (0.63)       (1.05)
gov3           0.0017       0.0066
               (0.42)       (1.35)
gov4           -0.0028      -0.0013
               (-0.90)      (-0.38)
gov5           0.0062       0.0139
               (1.28)       (2.45)
gov6           0.0018       -0.0186
               (0.46)       (-4.06)
gov7           -0.0083      -0.0102
               (-1.74)      (-1.98)
gov8           0.0230       -0.0023
               (5.63)       (-0.46)
gov9           -0.0029      0.0013
               (-0.87)      (0.29)
Observations   4,244,295    2,583,085

Table 37: Regression of nvan1 on gov1-gov9: t-stats are in parentheses

groups in the fraction of corrupt agents. This is only true if they all operate in the same institutional environment. In practice, if frequencies of vanity plates are higher in one region than in another, this might be due to differences in the institutions. A more corrupt local government or, on a smaller scale, more corrupt police officers might imply a smaller bribe to obtain the vanity plate. In the limit, there would be no finite price for which an honest police officer would be willing to hand out a vanity plate. Everything else being equal, a higher (lower) bribe price p_bribe would lead to lower (higher) vanity plate frequencies. Ideally, to compare individuals' willingness to pay bribes, they need to be in the same institutional environment. We have that opportunity here. The data set contains a transaction "Traffic Violation". This Traffic Violation database contains information about the place of birth as well as the current place of residence. This gives us the opportunity to obtain a group of agents who all live in the same institutional environment (Moscow and Moscow Oblast) but have different origins. Willingness to pay bribes is a social norm, and social norms tend to be rather persistent over generations and within regions, so we can proxy the willingness to pay bribes of people from different regions with that of their respective group living in Moscow. This approach is inspired by Algan & Cahuc (2010), who use the trust of American immigrants to proxy the trust in their country of origin in order to study the effects of trust on economic growth.

There still remains an important issue when comparing individuals with different regions of origin: they might differ in income or wealth. Merely looking at the 'raw' difference in frequency might be misleading, since it might measure not differences in 'willingness to pay' but rather differences in 'ability to pay'. We will therefore also use a more subtle approach: regressing vanity on luxury and interacting luxury with regional dummies. We thus estimate the following equation:

\[
van_i = c + a \ast lux_i + a_1 \ast lux_i \ast reg1_i + \dots + a_N \ast lux_i \ast regN_i + \epsilon_i
\]

with N being the number of regions, and reg#_i being 1 if individual i lives in (or originates from) region #. By looking at the interaction terms we control for differences in income or wealth by specifically focusing on the subgroup that buys a luxury vehicle. By focusing on residents of Moscow and Moscow Oblast it also becomes more likely that the preferences of individuals with different origins will be alike. They are subject to the same fashion trends in these vanity plates, something that is less likely when you look at individuals living in the different regions of Russia.
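The interaction regression can be sketched without a regression library when vanity, luxury and the regional dummies are all binary, because the fitted values of such a model are simply cell means. The sketch below rests on two assumptions that are not stated in the text: every individual belongs to exactly one region, and one region is left out as a baseline to avoid perfect collinearity between lux and the interaction terms. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def interaction_coefficients(rows, baseline):
    """rows: list of (van, lux, region) tuples with van and lux in {0, 1}.

    Returns (c, a, a_k) for van = c + a*lux + sum_k a_k*lux*reg_k,
    where `baseline` is the omitted region (an assumption of this sketch).
    """
    non_lux = [van for van, lux, _ in rows if lux == 0]
    c = sum(non_lux) / len(non_lux)  # vanity frequency among non-luxury cars
    by_region = defaultdict(list)
    for van, lux, region in rows:
        if lux == 1:
            by_region[region].append(van)
    lux_means = {r: sum(v) / len(v) for r, v in by_region.items()}
    a = lux_means[baseline] - c  # luxury effect in the baseline region
    a_k = {r: m - lux_means[baseline] for r, m in lux_means.items() if r != baseline}
    return c, a, a_k


# Hypothetical sample: regions "A" (baseline) and "B".
rows = (
    [(1, 0, "A")] * 3 + [(0, 0, "A")] * 97
    + [(1, 0, "B")] * 3 + [(0, 0, "B")] * 97
    + [(1, 1, "A")] * 5 + [(0, 1, "A")] * 95
    + [(1, 1, "B")] * 10 + [(0, 1, "B")] * 90
)
c, a, a_k = interaction_coefficients(rows, baseline="A")
print(c, a, a_k)
```

In the toy sample, luxury car owners from region B carry vanity plates 5 percentage points more often than luxury car owners from the baseline region, and it is this kind of interaction coefficient that is used to rank regions. Adding further controls would break the cell-mean shortcut and require an ordinary least squares routine.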

Besides comparing individuals from different regions to determine differences in the willingness to pay bribes between those regions, it might also be interesting to simply compare the distribution of vanity over the regions. Differences between those regions will also be a result of differences in the institutions. Some of the differences might be attributable to differences in the fashion trends surrounding these plates, but by using a broad measure of vanity we hope to keep this to a minimum. We also control for differences in income by including luxury in the regression. Therefore the same equation as for the regions of birth is estimated.

The differences between regions are highlighted in section 4.1.1. In section 4.1.2 we will compare individuals with different regions of birth.

4.1.1 Region

The region code of a plate depends on the region where the plate is issued, that is, where the car gets its first registration. The quality of the local institutions could thus play a big role in explaining differences in the frequency of vanity plates. As a first comparison we rank the different regions based on their frequency of nvan1, van and lvan2. The variable van is newly constructed: it is 1 when nvan1 is 1 or lvan2 is 1, and 0 otherwise. As such, it is a broad variable measuring both letter and numerical vanity plates. It has an empirical frequency of 6.06%. Aggregating these ranks allows us to derive a total rank. Figure 7 shows a map colored by this total rank. We also perform the regression of vanity on luxury and interact luxury with regional dummies. This way, we get a coefficient for each region. Again, we do this for nvan1, van and lvan2. Figure 8 is colored by the total rank of these 3 ranks. The ranks themselves can be found in the appendix.
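The construction of van and the aggregation of the three frequency ranks can be sketched as follows. The plate records, region labels and tie-breaking rule are hypothetical, and summing the three ranks before ranking the sums is an assumption based on the figure captions, which describe the total rank as a sum of 3 rankings.

```python
from collections import defaultdict


def regional_frequencies(plates):
    """plates: list of (region, nvan1, lvan2) with 0/1 indicators.

    Returns {measure: {region: frequency}} for nvan1, van and lvan2.
    """
    counts = defaultdict(lambda: {"n": 0, "nvan1": 0, "van": 0, "lvan2": 0})
    for region, nvan1, lvan2 in plates:
        van = 1 if (nvan1 == 1 or lvan2 == 1) else 0  # broad vanity indicator
        cell = counts[region]
        cell["n"] += 1
        cell["nvan1"] += nvan1
        cell["van"] += van
        cell["lvan2"] += lvan2
    return {
        m: {r: cell[m] / cell["n"] for r, cell in counts.items()}
        for m in ("nvan1", "van", "lvan2")
    }


def rank_descending(values):
    """Rank 1 goes to the highest value; ties are broken alphabetically (an assumption)."""
    ordered = sorted(values, key=lambda r: (-values[r], r))
    return {r: i + 1 for i, r in enumerate(ordered)}


def total_rank(freqs):
    """Sum the per-measure ranks for each region, then rank the sums (1 = lowest sum)."""
    ranks = [rank_descending(f) for f in freqs.values()]
    sums = {r: sum(rk[r] for rk in ranks) for r in ranks[0]}
    ordered = sorted(sums, key=lambda r: (sums[r], r))
    return {r: i + 1 for i, r in enumerate(ordered)}


# Hypothetical regions X, Y and Z with 100 plates each.
plates = (
    [("X", 1, 0)] * 10 + [("X", 0, 1)] * 5 + [("X", 1, 1)] * 2 + [("X", 0, 0)] * 83
    + [("Y", 1, 0)] * 4 + [("Y", 0, 1)] * 8 + [("Y", 0, 0)] * 88
    + [("Z", 1, 0)] * 1 + [("Z", 0, 1)] * 1 + [("Z", 0, 0)] * 98
)
freqs = regional_frequencies(plates)
ranking = total_rank(freqs)
print(ranking)
```

A region with a low total rank scores high on all three vanity measures at once, which is what the darker shading in the maps represents. The coefficient-based ranking works the same way once the frequencies are replaced by the region interaction coefficients.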

The 9 most corrupt regions by frequency are Omsk, Altai Krai, Khabarovsk, Sakha, North Ossetia-Alania, Adygea, Irkutsk, Chechnya and Sakhalin. These are all either Siberian or Caucasian regions. Small regions often appear at the extreme ends of the table.

The 9 most corrupt regions by their coefficient are Kalmykia, Arkhangelsk, Astrakhan, Karachay-Cherkess, Mari El, Sverdlovsk, Bashkortostan, Krasnodar and Lipetsk. Those are different regions compared to the frequency ranking, but they are also regions in the periphery.

Figure 7: Regional variation in corruption: frequency for region: darker regions have a lower total rank, which is based on the sum of 3 rankings (on nvan1, van and lvan2). Darker areas are thus more corrupt. Frequencies may, however, measure the ability to bribe rather than the willingness to bribe.


Figure 8: Regional variation in corruption: coefficient for region: darker regions have a lower total rank, which is based on the sum of 3 rankings (on nvan1, van and lvan2). Darker areas are thus more corrupt.

4.1.2 Region of birth

The region code might differ from the region of residence of the car owner, so it is necessary to compare different regional variables to get a clearer picture. First we construct two variables: rego, the region of residence of the owner of the car, derived from the place of residence, and region, derived from the region code of the plate. We then tabulate rego for all plates with a region code standing for Moscow or Moscow Oblast (M.O.). This shows that 98.3% of the plates distributed in Moscow or M.O. are held by residents of Moscow or M.O. Information on rego is available for only 1,702,295 observations, compared with 2,856,561 out of a total of 2,878,462 observations for region. Therefore, we will use region to select a Moscow-only sample.

As for the regions, we make 3 rankings for both a coefficient approach and a frequency approach. These are then aggregated to get a total frequency rank and a total coefficient rank. Figure 9 is colored by the total frequency rank, figure 10 by the total coefficient rank. The ranks themselves can be found in the appendix.

The 9 most corrupt regions of birth by frequency are Pskov, Leningrad Oblast, Kirov, Murmansk, Irkutsk, Arkhangelsk, North Ossetia-Alania, Yaroslavl and Primorsky.

The 9 most corrupt regions of birth by coefficient are Stavropol, North Ossetia-Alania, Astrakhan, Arkhangelsk, Primorsky, Kabardino-Balkar, Khabarovsk, Ulyanovsk and Yaroslavl.

4.1.3 Assessing the regional corruption measurements

Although there are big differences between the different rankings of regions, there are some regions that consistently come out as best or worst performing. North Ossetia-Alania, for example, is one of the most corrupt regions in each of the 4 figures. We will compare the different regional variables with variables from the World Values Survey, with a regional corruption index from Transparency International and with the GDP per capita data for the different regions from the Russian Federal State Statistics Service (Rosstat).

Figure 9: Regional variation in corruption: frequency for region of birth: darker regions have a lower total rank, which is based on the sum of 3 rankings (on nvan1, van and lvan2). Darker areas are thus more corrupt. Frequencies may, however, measure the ability to bribe rather than the willingness to bribe.

Figure 10: Regional variation in corruption: coefficient for region of birth: darker regions have a lower total rank, which is based on the sum of 3 rankings (on nvan1, van and lvan2). Darker areas are thus more corrupt.

For a comparison with questions from wave 5 of the WVS we need to calculate the different regional corruption variables at the level of the Federal Districts ("Federal districts of Russia", 2016). The first question of interest from the WVS concerns generalized trust: "Do you think most people would try to take advantage of you if they got a chance, or would they try to be fair?", where 1 means that "people would try to take advantage of you" and 10 means that "people would try to be fair". The second is about trust in the police: "I am going to name a number of organizations. For each one, could you tell me how much confidence you have in them: is it a great deal of confidence, quite a lot of confidence, not very much confidence or none at all?" - "The police", where 1 means "A great deal" and 4 means "None at all". The last question from the WVS we consider concerns civic virtues: "Please tell me for each of the following actions whether you think it can always be justified, never be justified, or something in between?" - "Someone accepting a bribe in the course of their duties", where 1 means "Never justifiable" and 10 means "Always justifiable".

Table 38 shows the correlations between the following variables: the vanity frequency by federal district for nvan1, van, lvan2 and lvan3, the coefficient from regressing vanity on lux4 for each Federal District with nvan1, van and lvan2 as vanity variables, and the three WVS variables. The WVS covers neither the North Caucasian Federal District nor the Siberian Federal District; these were treated as missing values.
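A correlation with missing districts can be computed as in the minimal sketch below, which drops every district for which either variable is missing. Whether the original analysis used this pairwise deletion or dropped the two districts from all variables at once is not stated, so the choice here is an assumption, and the numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present (None = missing)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical district-level values; None marks districts without WVS coverage.
vanity_coeff = [0.010, 0.020, 0.030, 0.040, 0.050]
police_distrust = [2.0, None, 2.5, None, 3.0]
r = pearson(vanity_coeff, police_distrust)
print(r)
```

With only a handful of federal districts left after dropping the missing ones, each correlation rests on very few data points, so the signs are more informative than the magnitudes.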

All corruption variables except the one based on the frequency of lvan3 are negatively correlated with trust. Only the variables based on the regressions are positively correlated with the police trust variable. A high value for police trust means lower trust, so a positive correlation implies that federal districts with more corruption have a lower level of trust in the police. For the correlation with the acceptability of bribes, only 4 out of 7 variables have the right sign. The measurements based on the coefficients have the right signs for all 3 correlations. The coefficients of nvan1 and van perform best, with van coeff and bribe having a correlation of 0.69 and distrust in the police and nvan1 coeff having a correlation of 0.4656. Because of its good performance, we will use the van variable later on when looking at time trends.

From Transparency International Russia we have a generalized index of the level of everyday corruption ("Состояние бытовой коррупции в Российской Федерации", 2011). This variable is called cpi in table 39. From the Russian Federal State Statistics Service we obtain the regional GDP per capita. We include it for 1998 and 2013, together with the growth over that period. Correlations between those variables and the total rank based on regressions of vanity, as well as the ranks based on regressions of nvan1 and van, are shown in table 39.

The correlations of the total rank have the opposite sign to those of the separate ranks. The most reasonable rank seems to be the ranking based on regressions of van. This rank is slightly correlated with the cpi. However, the bribes we study are only one specific form of corruption, and therefore a high correlation with a broad corruption measure such as the cpi is not necessarily to be expected.

Besides these issues with the way we measure regional differences in corruption, one should not forget that these results will probably suffer seriously from measurement error. The data is focused on Moscow, which in practice means that although the Traffic Violation database, for example, contains more than 2 million observations, most regions have fewer than 2,000 observations. It is in the nature of vanity plates to be rare, and in our case they make up 0.6% to 6% of the observations. In a sample of 1,000 observations this means 6 to 60 vanity plates. I do believe that reviewing the methods to measure regional differences, and perhaps applying them to a larger part of the data set, might be valuable for shedding light on regional differences in corruption, particularly for studying corruption in the police.
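The small-sample concern can be made concrete with a short calculation. The sketch below assumes that plates in a region are independent draws with a common vanity probability, which is a simplification, and reports the expected number of vanity plates together with the standard error of the estimated share for a region with 1,000 observations.

```python
import math


def expected_count_and_se(p, n):
    """Expected number of vanity plates and standard error of the estimated share."""
    return p * n, math.sqrt(p * (1 - p) / n)


for p in (0.006, 0.06):
    count, se = expected_count_and_se(p, 1000)
    print(f"p={p:.3f}: about {count:.0f} plates, s.e. {se:.4f}, relative s.e. {se / p:.1%}")
```

Under this assumption the standard error is roughly 40% of the share itself for the rarest vanity measure and roughly 12.5% for the broadest one, so rankings of small regions can easily change through sampling noise alone.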

               nvan1     van       lvan2     nvan1     van       lvan2     lvan3     trust     police    bribe
               coeff     coeff     coeff     freq      freq      freq      freq                trust
nvan1 coeff    1
van coeff      0.1545    1
lvan2 coeff    -0.0720   0.9353    1
nvan1 freq     -0.1988   0.7773    0.9214    1
van freq       0.1835    -0.0026   0.0951    0.4136    1
lvan2 freq     0.2785    -0.5229   -0.5110   -0.2118   0.8001    1
lvan3 freq     0.1852    -0.5931   -0.7817   -0.7872   -0.0887   0.4433    1
trust          -0.3072   -0.0174   -0.1946   -0.4114   -0.7647   -0.5134   0.4574    1
police trust   0.4656    0.2393    0.1031    -0.2302   -0.6691   -0.6129   -0.2180   0.1699    1
bribe          0.2561    0.6870    0.4991    0.1892    -0.5667   -0.7499   -0.3665   0.3864    0.7403    1

Table 38: Correlation matrix of different corruption measures on the federal level and WVS variables: On the Russian federal district level: nvan1 coeff, van coeff and lvan2 coeff measure corruption by the coefficient of vanity on luxury; nvan1 freq, van freq, lvan2 freq and lvan3 freq measure the frequency of the respective vanity variable. From the WVS: trust measures generalized trust (higher levels mean more trust), police trust measures trust in the police (higher levels mean more distrust) and bribe measures the acceptability of bribery (higher levels mean more acceptable).

4.2 Time trend

Figure 11 shows the distribution of first registrations over the period 1990-2006. Most registrations occurred after 1998. The system of digital registration had to be rolled out. More interestingly,


                       GDPpc98     GDPpc13      growth         cpi coeff total nvan1 coeff   van coeff
                                                                          rank        rank        rank
GDPpc98                      1
GDPpc13                 0.9542           1
growth                 -0.0917      0.1909           1
cpi                    -0.3354     -0.2179      0.3851           1
coeff total rank        0.1080      0.1427      0.0467      0.0111           1
nvan1 coeff rank       -0.1593     -0.1877     -0.0423     -0.0736      0.4054           1
van coeff rank         -0.1474     -0.1751     -0.0991      0.1674      0.5737      0.4973           1

Table 39: Correlation matrix of different corruption ranks on the regional level, the CPI and regional GDP per capita and growth: GDP per capita and growth are from Rosstat, cpi is from Transparency International Russia. Coeff total rank is the total aggregated rank of 3 coefficient ranks: nvan1, van and lvan2. Coeff ranks are made by regressing the vanity variable on a luxury variable and ranking the coefficients of the region interactions with luxury.

figure 12 shows the frequency of van and the estimated coefficient of a regression of van on lux4 for each year. The frequency doesn’t have a clear trend. The coefficient, however, rises from below 0.02 in 1995 to over 0.04 in 2005. This seems to suggest that this form of corruption has become stronger over time, although a thorough analysis would have to control for other factors such as changes in preferences, income, and the prices of both luxury vehicles and bribes. A deterioration in institutional quality, or a more corrupt police force, would probably mean lower bribe prices, since it would imply a larger supply of vanity plates. On the other hand, individuals’ norms might have become more tolerant of corruption. This would strengthen demand for vanity plates and drive bribe prices up.

[Figure 11 plot. x-axis: Year of First Registration; y-axis: Frequency]

Figure 11: Density of registrations by year: The distribution of the data over years of registration. Most observations are from registrations that occurred after 1995.

[Figure 12 plot. x-axis: Year of First Registration; series: frequency, coefficient]

Figure 12: Time trend of vanity plates: In blue the frequency of van, in red the coefficient of regressing van on lux4. The frequency fluctuates without a clear trend, the coefficient however shows a clear upwards trend.
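The two yearly series described for Figure 12 can be sketched as follows: for every year of first registration, take the share of vanity plates and the slope of a simple regression of van on lux4. This is a minimal illustration and not the code used for the thesis; the record layout `(year, van, lux4)`, the function names and the toy records are assumptions made for the example.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return float("nan")  # no variation in the regressor
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def yearly_measures(records):
    """Map each year to (frequency of van, coefficient of van on lux4).

    `records` is an iterable of (year, van, lux4) tuples; this layout is
    hypothetical and only serves the illustration.
    """
    by_year = defaultdict(list)
    for year, van, lux in records:
        by_year[year].append((van, lux))
    measures = {}
    for year in sorted(by_year):
        van = [v for v, _ in by_year[year]]
        lux = [x_val for _, x_val in by_year[year]]
        measures[year] = (sum(van) / len(van), ols_slope(lux, van))
    return measures


# Made-up toy records, not taken from the GIBDD data.
toy = [
    (1995, 0, 0), (1995, 0, 1), (1995, 1, 0), (1995, 0, 0),
    (2005, 1, 1), (2005, 0, 0), (2005, 1, 1), (2005, 0, 0),
]
for year, (freq, coef) in yearly_measures(toy).items():
    print(year, round(freq, 3), round(coef, 3))
```

The same grouping logic would apply to year of birth instead of year of registration by changing only the grouping key.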

4.3 Year of birth

A last interesting feature of the corruption is illustrated in figure 14. Similar to the trend by year of registration, it plots the frequency of van and the coefficient of the regression of van on lux4. As figure 13 shows, the bulk of individuals in the Traffic Violation database were born after the Second World War, so the trends before 1940 should be read with this in mind. Figure 14 shows a big spike in the coefficient between 1910 and 1920. It also shows a rapid rise during the 1930s and a steep climb at the end of the data, just before the collapse of the Soviet Union. Figure 15 zooms in on the period 1928-1988. Some of the spikes coincide with lower data availability, but this appears to strengthen certain trends rather than create them.

What might these differences in the estimated coefficient across individuals with different years of birth mean? As a measure of the link between luxury and vanity, the coefficient is linked to the fraction c of our model. A higher c, meaning more people willing to pay a bribe, will cause a stronger coefficient of lux4 on van. BenYishay (2013) investigates the effect of early-life rainfall on adult trust in Sub-Saharan countries. Similarly, we can point to certain periods in Russian history that might have been a traumatic experience and acted as an institutional shock. They might have altered social norms such as the willingness to pay bribes. The peak between 1910 and 1920 coincides with the First World War, the Russian Revolution and a great famine. Somebody who was born at the end of the 1980s grew up during the severe transition crisis of the 1990s. Future research may use this data to investigate which contexts alter social norms, such as the willingness to pay bribes.
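The claim that a higher c strengthens the coefficient can be illustrated with a toy simulation. The setup below is a deliberately simplified stand-in and not the theoretical model of this thesis: it assumes that everybody obtains a vanity plate by chance with a small probability, and that luxury car owners who are willing to bribe (a fraction c) always buy one. The function name and all parameter values are hypothetical.

```python
import random


def simulated_coefficient(c, p_chance=0.01, share_lux=0.2, n=100_000, seed=0):
    """Simulated coefficient of a regression of van on a binary luxury dummy.

    Toy assumptions (not the thesis model):
      * a plate is obtained by chance with probability p_chance,
      * a fraction c of individuals is willing to bribe,
      * willing individuals with a luxury car always buy a vanity plate.
    """
    rng = random.Random(seed)
    n_lux = n_other = van_lux = van_other = 0
    for _ in range(n):
        lux = rng.random() < share_lux
        willing = rng.random() < c
        lucky = rng.random() < p_chance
        van = lucky or (lux and willing)
        if lux:
            n_lux += 1
            van_lux += van
        else:
            n_other += 1
            van_other += van
    # With a binary regressor the OLS slope equals the difference in group means.
    return van_lux / n_lux - van_other / n_other


for c in (0.0, 0.02, 0.04):
    print(f"c={c:.2f}: coefficient={simulated_coefficient(c):.4f}")
```

In this toy setup the coefficient is approximately c(1 - p_chance), so it rises with c while chance allocation alone produces a coefficient close to zero.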

[Figure 13 plot. x-axis: Year of Birth; y-axis: Frequency]

Figure 13: Density of registrations by year of birth: The distribution of the data over years of birth. Most observations are from individuals who were born after 1940.

[Figure 14 plot. x-axis: Year of Birth; series: frequency, coefficient]

Figure 14: Comparison of vanity plates over year of birth: Plot of 2 corruption measures over persons with different years of birth. In blue is the frequency of van and in red is the coefficient of regressing van on lux4. We see a strong rise of the coefficient during World War 1 and the Civil War, during the 30’s and again at the end of the 80’s. These spikes coincide with significant events in Russian history.

[Figure 15 plot. x-axis: Year of Birth; series: frequency, coefficient]

Figure 15: Comparison of vanity plates over year of birth: zoomed in on 1928-1988: Plot of 2 corruption measures over persons with different years of birth. In blue is the frequency of van and in red is the coefficient of regressing van on lux4. The rise in coefficient during the 30’s and at the end of the 80’s is now more pronounced.

4.4 Other features

4.4.1 Regional differences in Moscow

The data contains a wide variety of regional information, and most of it is from the Moscow and Moscow Oblast regions. Analyzing the regional differences in corruption within Moscow and comparing them to socio-economic variables is an interesting pursuit. In addition, comparing corruption between the different registration centers might give a glimpse of the supply of these vanity plates. Putting these 2 regional variables together might enable an analysis that breaks the corruption up into a demand side and a supply side, a unique opportunity.

4.4.2 Vanity plates as immunity to traffic stop

Vanity plates aren’t bought merely for their aesthetic value, nor for their wealth signal. As mentioned earlier, they are said to provide a certain degree of immunity from traffic stops by the police. If this were true, it should lead to more risky driving behavior. The data set may provide the opportunity to test these suspicions that are held by Russian citizens. It contains information about traffic violations, such as fines and months of deprivation of the driver’s license. Immunity for vanity plates should imply fewer registered violations for the drivers equipped with these plates.


But since they might engage in more risky driving, this negative difference could be offset: more risky behavior should lead to more traffic violations. To have a clear test for this excessive risk taking, another part of the data set might be useful: the traffic accidents database. More risk taking would result in more traffic collisions.
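One simple way to set up the comparison suggested above is a two-proportion z-test on the share of drivers with a registered violation (or with a registered accident) in the two groups. The sketch below is only a starting point under strong assumptions: it ignores differences in car type, region and driver characteristics, and the counts used are invented for the illustration, not taken from the data.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z-statistic for H0: equal proportions in groups a and b (pooled variance)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (p_a - p_b) / se


# Hypothetical counts: drivers with at least one registered violation.
z = two_proportion_z(hits_a=80, n_a=1_000,      # vanity plate holders
                     hits_b=5_000, n_b=50_000)  # all other drivers
print(f"z = {z:.2f}")
```

A clearly negative statistic for violations combined with a positive one for accidents would be in line with the immunity story; a serious test would have to add the control variables mentioned above.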

5 Conclusion

The availability of a transaction database of the GIBDD provides a unique opportunity to study Russian vanity plates. Russian citizens suspect the police of selling these plates, although it’s illegal. We’ve constructed a theoretical model that predicts a strong link between luxury vehicles and vanity plates, and the tests we perform confirm both the suspicions of Russian citizens and our theoretical model. In order to do these tests we had to construct variables capturing the concepts of vanity plates and of luxury vehicles. For a whole series of different variables the tests confirm our model.

Besides showing the existence of this corruption we also highlight several interesting features of it. Differences in corruption can be studied for different regions, regions of birth, years of registration or years of birth. We made some rankings of the Russian regions based on our approach to measuring corruption. These rankings are not strongly correlated with an index of corruption perceptions, but an approach based on regressions leads to a high correlation with distrust in the police and the acceptability of bribes. The time trend suggests a rise in this form of corruption over the period 1994-2006. And lastly, an interesting pattern exists for corruption by year of birth: spikes in corruption behavior by year of birth seem to coincide with important traumatic experiences in Russian history.

Corruption is a serious problem, certainly in Russia. Law enforcement is ranked as one of the most corrupt institutions. All this makes it all the more important to be able to measure and understand how this corruption evolves and why it is spread unequally over the Russian regions. Having transformed this administrative database into something that’s ready for analysis and having put forward an approach to measure this corruption, I hope other researchers will take up the thread.


6 References

Algan, Y., Cahuc, P., & Sangnier, M. (2016). Trust and the Welfare State: the Twin Peaks Curve. The Economic Journal, 126 (593), 861-883. http://dx.doi.org/10.1111/ecoj.12278

Algan, Y., & Cahuc, P. (2010). Inherited Trust and Growth. The American Economic Review, 100 (5), 2060-2092. Retrieved from http://www.jstor.org/stable/41038755

BenYishay, A. (2013). The Transmission of Mistrust: Institutional Consequences of Early-Life Rainfall. Retrieved 19 December 2016, from http://cegadev.org/assets/cega_events/61/2B_Risk_and_Mistrust.pdf

Fisman, R., & Miguel, E. (2007). Corruption, Norms, and Legal Enforcement: Evidence from Diplomatic Parking Tickets. Journal of Political Economy, 115 (6), 1020-1048.

Sampford, C., Shacklock, A., Connors, C., & Galtung, F. Measuring Corruption. Hampshire: Ashgate Publishing.

Svensson, J. (2005). Eight Questions about Corruption. Journal Of Economic Perspectives, 19 (3), 19-42. http://dx.doi.org/10.1257/089533005774357860

Urra, F-J. (2007). Assessing Corruption: An analytical review of Corruption measurement and its problems: Perception, Error and Utility. Retrieved from Georgetown University: http://unpan1.un.org/intradoc/groups/public/documents/APCITY/UNPAN028792.pdf

Automobile model numbering system in the Soviet Union and Russia - Wikipedia. (2016). Retrieved October 22, 2016, from https://en.wikipedia.org/wiki/Automobile_model_numbering_system_in_the_Soviet_Union_and_Russia

Brown, M. (2008). Abu Dhabi License Plate Fetches $14 Million, Sets World Record - Bloomberg. Web.archive.org. Retrieved 24 December 2016, from https://web.archive.org/web/20121103135255/http://www.bloomberg.com/apps/news?pid=newsarchive&sid=aJ8EZTdrItjs

Buranov, I. (2016). Ъ - «От всех бед камеры не спасают». Retrieved October 22, 2016, from http://kommersant.ru/doc/3093898

Car classification - Wikipedia. (2016). Retrieved October 22, 2016, from https://en.wikipedia.org/wiki/Car_classification


Designated license plate Russia for power – Автоновини з усього свiту. (2015). Retrieved October 23, 2016, from http://avtoz.net/designated-license-plate-russia-for-power/

Euro Car Segment. (2016). En.wikipedia.org. Retrieved 9 December 2016, from https://en.wikipedia.org/wiki/Euro_Car_Segment

Federal districts of Russia. (2016). En.wikipedia.org. Retrieved 19 December 2016, from https://en.wikipedia.org/wiki/Federal_districts_of_Russia

Gagarin, V. (2016). Лучшее за 2015. Красивые номера - как, за сколько и где купить. Retrieved October 20, 2016, from https://auto.mail.ru/article/57453-luchshee_za_2015_krasivye_nomera_kak_za_skolko_i_gde_kupit/

Main Directorate for Road Traffic Safety (Russia) - Wikipedia (2016). Retrieved October 22, 2016, from https://en.wikipedia.org/wiki/Main_Directorate_for_Road_Traffic_Safety_(Russia)

Keates, N. (2011). What Drives People to Take a Creative License? Retrieved October 23, 2016, from http://www.wsj.com/articles/SB10001424052702303745304576359910386002034

Lobanov, S. (2015). Красивые номера: где купить, как продать и в чем коррупция. Retrieved October 22, 2016, from http://info.drom.ru/misc/34222/

Parfitt, T. (2003). Russia strips ’untouchable’ drivers of their sirens. Telegraph.co.uk. Retrieved 17 December 2016, from http://www.telegraph.co.uk/news/worldnews/europe/russia/1428528/Russia-strips-untouchable-drivers-of-their-sirens.html

Plekhov, C. (s.d.). ГИБДД больше не будет выдавать автомобильные номера. Retrieved October 22, 2016, from http://spokoino.ru/articles/gibdd/gibdd_bolshe_ne_budet_vydavat_avtomobilnye_nomera/

Plekhov, C. (s.d.). Продажа «красивых номеров» по новым правилам. Retrieved October 21, 2016, from http://spokoino.ru/articles/gibdd/prodazha_krasivyh_nomerov_po_novym_pravilam/

Police of Russia - Wikipedia (2016). Retrieved October 22, 2016, from https://en.wikipedia.org/wiki/Police_of_Russia

PWC. (2014). Which cars are on the luxury item list? Retrieved October 23, 2016, from https://www.pwc.ru/ru/tax-consulting-services/assets/legislation/tax-flash-report-issue-7-eng.pdf


SA West. (2011). License Plate as a Source of Valuable Data. Retrieved October 23, 2016, from http://www.backgroundscreeninginrussia.com/documents/lplate.pdf

Sputnik. (2012). Senior Traffic Cop Suspected of Taking $17,000 Bribe. Retrieved October 23, 2016, from https://sputniknews.com/russia/20120403172585054/

Stata Tools for LaTeX. (2016). Ats.ucla.edu. Retrieved 18 December 2016, from http://www.ats.ucla.edu/stat/stata/latex/

Tabakov, I. (2013). Personalized License Plates to Boost Budget. Themoscowtimes.com. Retrieved 8 December 2016, from https://themoscowtimes.com/articles/personalized-license-plates-to-boost-budget-24644

The World Bank. (2013). Russian Federation: National and Regional Trends in Regulatory Burden and Corruption. Retrieved from http://www.worldbank.org/content/dam/Worldbank/document/eca/Russia-Regional-BEEPS-2013.pdf

Transparency International - Country Profiles. Transparency.org. Retrieved 25 December 2016, from https://www.transparency.org/country/#RUS_PublicOpinion

Transparency International. (2015). CORRUPTION PERCEPTIONS INDEX 2015. Retrieved from http://www.transparency.org/cpi2015

WORLD VALUES SURVEY Wave 5 2005-2008 OFFICIAL AGGREGATE v.20140429. World Values Survey Association (www.worldvaluessurvey.org). Aggregate File Producer: Asep/JDS, Madrid SPAIN.

Административный регламент по регистрации (2008). Retrieved October 22, 2016, from https://autorambler.ru/bz/registration/registration_reglament/

Состояние бытовой коррупции в Российской Федерации. (2011). Retrieved 20 December 2016, from http://www.indem.ru/corrupt/doklad_cor_indem_fom_2010.pdf

Госавтоинспекция: История Госавтоинспекции. (2012). Retrieved October 23, 2016, from http://www.gibdd.ru/about/history/

Красивые номера на авто: Зеркалки, тройные, нули- o000oo.ru (2016). Retrieved October 21, 2016, from http://o000oo.ru/номера/красивые/

Купить красивые номера на авто (автономера). Спецномер в регионе РФ. | Госномер-RUS. (2016). Retrieved October 21, 2016, from https://gosnomer-rus.ru/

Купить номер на авто- o000oo.ru (2016). Retrieved October 21, 2016, from http://o000oo.ru/номера/

Купить номера на автомобиль. (2016). Avto-nomer.ru. Retrieved October 22, 2016, from http://nomera-auto.ru/ru/prodam

Продажа автомобильных номеров (2016). Retrieved October 22, 2016, from http://migalki. net/shop.php?item_price_start=1000000&item_price_end=5000000

Приказ МВД РФ от 31 октября 2008г. №948/ММ-3-6/561 (2009). Retrieved October 22, 2016, from http://www.garant.ru/products/ipo/prime/doc/12064763/

Приказ МВД РФ № 125 от 31.03.1995 Об учете автомототранспортных средств и специальной продукции в ГИБДД - 2016 автомобильное законодательство РФ (2014). Retrieved October 22, 2016, from http://voprosov-net.ru/prikazi/125.htm


A Selected letters for lvan1

Table 40: Letters in lvan1

l      mean
AAA    110
ABA    96
AMM    95.59
AMO    96
AMP    100
AOO    92
BAP    82.5
BEY    90
BMP    120
CCC    110
CCH    85
CCY    85
CEC    92
CPA    87
CXX    95
EAA    98
EAY    82.25
EHC    88
EHP    161
EKA    86.5
EKO    106
EMB    85.85
EME    110
EMO    85
EOT    108.5
EOY    99.5
EPA    83.09
EPH    97.5
EPO    94
EXB    118
EXO    103
EYK    85
EYP    85
HAC    94
HAT    94
KKK    96
KMP    100
KXA    98.5
KXP    94
KXY    94
MMM    96
MMP    110
MXB    85
MXO    95
MXT    82
MYX    85
OMP    107.5
OOO    140
PXY    95
TXA    95
TXM    85
XAE    94
XEP    97
XPC    85
XXY    85
YCY    94
YPX    97
YYX    85
Tot    96.91403


B Selected letters for lvan2

Table 41: Letters in lvan2

l      mean
AAK    81
ABA    96
AMM    95.59
AMO    96
BAP    82.5
BEY    90
BMP    120
CAY    80
CCC    110
CCH    85
CCY    85
CEC    92
CPA    87
CPX    81
CXX    95
EAA    98
EAY    82.25
EHC    88
EHP    161
EKA    86.5
EKO    106
EMB    85.85
EME    110
EMO    85
EOT    108.5
EOY    99.5
EPA    83.09
EPH    97.5
EPO    94
EXB    118
EXO    103
EYK    85
EYP    85
HAC    94
HAT    94
HTX    80
KKK    96
KMP    100
KXA    98.5
KXP    94
KXY    94
MCY    80
MMM    96
MMP    110
MXB    85
MXO    95
MXT    82
MYX    85
OCC    80
OMP    107.5
PXY    95
TXA    95
TXM    85
XAE    94
XEP    97
XMO    81
XPC    85
XXY    85
YCY    94
YPX    97
YYX    85
Tot    93.13352


C Selected letters for lvan3

Table 42: Letters in lvan3

l      mean
AAA    110
BBB    52.5
CCC    110
EEE    51
HHH    66.44
KKK    96
MMM    96
OOO    140
PPP    54.8
TTT    56
XXX    71
YYY    56
Tot    86.35


D Classes in the nocar category

Table 43: Nocar classes

Cclass            Obs.
Bus              22028
Crane             2472
Minibus           4673
Motorcycle       65205
R-1 Bus             23
R-1 Truck         8325
R-1 Van            143
R-2 Bus          15373
R-2 Reserved       651
R-2 Special        783
R-2 Truck        32823
R-2 Van          78455
R-3 Bus          23556
R-3 Special      22519
R-3 Truck       136936
R-3 Van           7722
R-4 Bus             22
R-4 Reserved        39
R-4 Special       1489
R-4 Truck        31195
R-4 Van           5966
R-5 Bus           3497
R-5 Truck        20444
R-6 Special        737
R-6 Truck         3197
R-6 Van             18
R-9 Truck           33
Trailer             66
Truck            62387
Truck/Bus        45473
Total           596260

E Brands in the lux1 category

Table 44: Brands in lux1

Brand            Obs.
Audi            3,915
BMW            29,515
Bentley           248
Cadillac          742
Ferrari            63
Infinity          946
Jaguar          1,346
Jeep            6,554
Lamborghini        18
LandRover         404
Lexus           7,559
Maserati           77
Mercedes        4,766
Nissan          2,321
Porsche         1,193
RangeRover        383
RollsRoyce         55
Toyota         10,873
Volkswagen        383
Total          71,361


F Corruption ranks: regions

Table 45: Rank of regions: frequency: higher numbers indicate more corrupt

region              nvan1  lvan2    van  total rank       Obs.
Adygea                 65     64     63          68        450
Altai Krai             80     72     80          72        174
Altai Republic          2      1      2           2         86
[name missing]         82      1     71          56         94
Arkhangelsk            25     38     25          15      1,567
Astrakhan              27     66     54          35      1,394
Bashkortostan          63     18     39          24      1,192
Belgorod               57     68     65          60      2,425
Bryansk                38     41     33          42      6,487
Buryatia                3     78     15          40         70
Chechen                70     63     69          67        372
Chelyabinsk            66     14     35          25      1,039
Chukotka Aut.          43     10     13          28        124
[name missing]          6     37      8          10      2,604
Ingushetia              4      2      3           3        125
Irkutsk                75     61     77          68        193
Ivanovo                17     77     62          54      5,460
Jewish Aut. Obla        5     12      4           8        161
Kabardino-Balkar       10     76     59          32        763
Kaliningrad Obla       34     60     50          45      6,241
Kalmykia               78     46     75          62        504
Kaluga                 31     34     26          22     16,271
Kamchatka              83      7     79          46        126
Karachay-Cherkes       28     16     14          27        635
Karelia                47     55     55          39        485
Kemerovo               73      4     51          25        140
Khabarovsk             81     74     83          71        318
Khakassia              79      1     42          37         82
Khanty-Mansi Aut       72      8     37          31        752
Kirov                  60     26     43          41      2,174
Komi                   77     31     72          52        758
Kostroma               54     42     44          34      4,314
Krasnodar              33     43     31          19      5,596
Krasnoyarsk            84     15     82          57        395
Kurgan                 64     69     73          55        335
Kursk                  19     27     17          20      3,021
Leningrad Oblast       35     67     58          39      2,385
Lipetsk                51     57     57          50      6,172
Magadan                 8     61     27          18        193
Mari El                18     80     76          63        948
Mordovia               11     23     10          13      3,753
Moscow                 12     32     18           7    839,911
Moscow Oblast          14     35     19          11  1,730,706
Murmansk               44     45     38          48      1,057
Nenets Aut.             9     82     85          65         37
Nizhny Novgorod        30     30     23          17      7,435
North Ossetia-Al       62     79     78          69        604
Novgorod               46     71     66          61      2,246
Novosibirsk            58     52     56          51        385
Omsk                   86     75     86          73        427
Orenburg               16     59     36          29      1,645
Oryol                  45     50     47          56      3,880
Penza                  50     81     81          63      6,504
Perm                   55     49     48          44      1,289
Primorsky              71      6     40          43        318
Pskov                  37     70     61          58      2,245
Rostov                 21     36     22          10      5,707
Ryazan                 32     11     12          21     26,772
Sakha                  85     47     84          70        231
Sakhalin               76     51     74          66        169
Samara                 56     40     46          33      5,212
Saratov                40     53     41          33      5,024
Smolensk               39     19     20          26     14,136
St. Petersburg         53     24     29          29      8,469
Stavropol              59     65     68          53      3,041
Sverdlovsk             29     48     34          36      1,293
Tambov                 49     73     70          47      8,176
Tatarstan              15     28     16          14      2,723
Tomsk                  74     25     67          59        428
Tula                   36     39     28          23     16,145
Tuva                    7     21      5           6        130
Tver                   22      9      7          15     25,598
Tyumen                 68     20     52          47        521
Udmurt                 61      3     21          12      1,037
Ulyanovsk              24      5      6           5      2,037
Vladimir               26     33     24          16     18,689
Volgograd              41     58     49          41      4,879
Vologda                48     29     30          31      4,191
Voronezh               20     17     11           9      5,375
Yamalo-Nenets Au       69     22     53          30        609
Yaroslavl              52     44     45          43     12,873
Zabaykalsky            42     62     60          49        397

Table 46: Rank of regions: coefficient: higher numbers indicate more corrupt

region              nvan1  lvan2    van  total rank       Obs.
Arkhangelsk            46     55     49          48      1,567
Astrakhan              51     51     52          47      1,394
Bashkortostan          54     37     55          44      1,192
Belgorod               30      9      8           9      2,425
Bryansk                40     19     35          23      6,487
Chelyabinsk            20     34      5          26      1,039
Dagestan                9     53     21          38      2,604
Ivanovo                12     41      6          28      5,460
Kabardino-Balkar       36      4     20           6        763
Kaliningrad Obla        6      8      2           2      6,241
Kalmykia               56     56     56          49        504
Kaluga                 26     30     31          25     16,271
Karachay-Cherkes       52     42     53          46        635
Khanty-Mansi Aut       49     10     44          17        752
Kirov                   2      7     14           1      2,174
Komi                   45      1      1           8        758
Kostroma               34     17     15          16      4,314
Krasnodar              31     47     39          43      5,596
Kursk                  23     14     45          11      3,021
Leningrad Oblast        8     31     16          18      2,385
Lipetsk                32     45     42          42      6,172
Mari El                55     38     54          45        948
Mordovia               21     40     47          31      3,753
Moscow                 14     49     25          36    839,911
Moscow Oblast          25     46     27          39  1,730,706
Murmansk                1     16     46           4      1,057
Nizhny Novgorod        41     28     17          30      7,435
North Ossetia-Al       44      3     48          10        604
Novgorod               19      5      3           3      2,246
Orenburg               39     32     32          32      1,645
Oryol                  42     35     40          36      3,880
Penza                  17     52     36          41      6,504
Perm                   16     11      7           5      1,289
Pskov                  24     24     33          19      2,245
Rostov                 28     13     38          14      5,707
Ryazan                  3     39     10          24     26,772
Samara                 53     29     51          35      5,212
Saratov                 4     36     18          22      5,024
Smolensk               18     15      4           9     14,136
St. Petersburg         13     48     29          34      8,469
Stavropol              11     54     41          40      3,041
Sverdlovsk             43     44     34          45      1,293
Tambov                 33      6     13           7      8,176
Tatarstan              22     27     30          22      2,723
Tula                   35     20     37          21     16,145
Tver                   15     25     19          15     25,598
Tyumen                  7     50     11          33        521
Udmurt                 37     26     24          27      1,037
Ulyanovsk              50     22     43          28      2,037
Vladimir               38     18     22          20     18,689
Volgograd              29     12     12          13      4,879
Vologda                47     33     28          37      4,191
Voronezh               27     21     26          17      5,375
Yamalo-Nenets Au       48      2     50          12        609
Yaroslavl              10     43     23          29     12,873


G Corruption ranks: regions of birth

Table 47: Rank of regions of birth: frequency: higher numbers indicate more corrupt

region of birth     nvan1  lvan2    van  total rank       Obs.
Altai Krai             21     55     44          34      1,767
Amur                   68      8     36          35        996
Arkhangelsk            64     64     66          55      2,126
Astrakhan               5     43      9          13      1,258
Bashkortostan          31     15     16           6      2,203
Belgorod               25     50     37          40      1,878
Bryansk                50     57     63          46      6,130
Chechen                70     16     45          44      1,791
Chelyabinsk             8     10      6          10      2,954
Chita                  27     73     69          48      1,165
Dagestan               63     36     54          50      4,308
Irkutsk                74     63     72          56      2,100
Ivanovo                14      6      7           9      3,395
Kabardino-Balkar       56      5     22          15        769
Kaliningrad Obla       33     25     19          18      1,741
Kaluga                 26     47     27          36      6,427
Kamchatka              57     26     25          28        576
Kemerovo               42     52     59          49      2,174
Khabarovsk Krai        15     59     42          24      1,640
Kirov                  38     72     70          58      1,996
Komi                   20     23     13           6      1,341
Kostroma               47     48     53          32      1,849
Krasnodar              28     11     10          16      5,229
Krasnoyarsk            54     44     51          33      2,784
Kurgan                 12     54     39          42        762
Kursk                  65     19     50          43      3,928
Leningrad Oblast       73     65     74          59      2,153
Lipetsk                55     29     46          39      5,973
Magadan                22     41     28          33      1,228
Mari El                61     30     52          39        733
Mordovia               17     14     11           5      7,221
Moscow                 45     31     35          27    189,578
Moscow Oblast          51     24     34          34    378,240
Murmansk               62     62     65          57      1,661
Nizhny Novgorod        36     42     32          31      5,382
North Ossetia-Al       53     46     58          55        818
Novgorod               18     60     49          23      1,105
Novosibirsk            30     66     62          45      1,845
Omsk                   58      3     17           8      1,564
Orenburg               13     39     18          12      2,886
Oryol                  43     49     48          35      3,438
Penza                  10     13      8           4      8,138
Perm                   37     53     57          52      2,723
Primorsky              71     32     61          53      2,127
Pskov                  66     58     64          59      1,007
Rostov                 49      7     15          16      5,191
Ryazan                 34     28     23          19     24,511
Sakha                  69      2     29          17        732
Sakhalin                3      9      3          11      1,295
Samara                  9      4      5           2      3,082
Saratov                40     38     33          26      5,568
Smolensk               72     45     68          50      5,391
St. Petersburg         35     35     30          29      2,012
Stavropol              52     34     43          47      2,849
Sverdlovsk              6     70     55          50      3,342
Tambov                 24     40     24          22     11,075
Tatarstan              39     68     67          51      2,497
Tomsk                   7     71     56          31        810
Tula                   41     27     26          21     10,507
Tver                   23     22     12          14      7,922
Tyumen                 59     21     41          39      1,436
Udmurt                 46     51     60          38        883
Ulyanovsk              48     12     21          25      2,152
Vladimir               11     37     14          20      6,869
Volgograd              60     17     38          40      4,325
Vologda                19     56     40          41      1,384
Voronezh               44     33     31          30      6,049
Yaroslavl              29     61     47          54      4,016

Table 48: Rank of regions of birth: coefficient: higher numbers indicate more corrupt

region of birth     nvan1  lvan2    van  total rank       Obs.
Altai Krai             59     31     37          34      1,767
Amur                   62     36     45          39        996
Arkhangelsk            50     68     47          52      2,126
Astrakhan              66     61     52          53      1,258
Bashkortostan          32     18     35          15      2,203
Belgorod               46      4      6          10      1,878
Bryansk                63     27     58          31      6,130
Chechen                53     29     55          27      1,791
Chelyabinsk            35     39     40          29      2,954
Chita                  21      7      4           6      1,165
Dagestan               64     35     62          39      4,308
Irkutsk                 9     20     18           8      2,100
Ivanovo                13     14     30           7      3,395
Kabardino-Balkar       67     50     69          50        769
Kaliningrad Obla        3     63     15          38      1,741
Kaluga                 55     15     38          23      6,427
Kamchatka              61      1     21          14        576
Kemerovo               36     59     60          45      2,174
Khabarovsk Krai        47     58     61          49      1,640
Kirov                  69      2     12          18      1,996
Komi                   26     43     27          28      1,341
Kostroma               22     17     23          12      1,849
Krasnodar              39     40     43          32      5,229
Krasnoyarsk            25     66     20          47      2,784
Kurgan                  8     10     13           3        762
Kursk                  56     30     67          30      3,928
Leningrad Oblast        7     64     10          40      2,153
Lipetsk                52     51     44          45      5,973
Magadan                28     38     49          25      1,228
Mari El                19     52     33          36        733
Mordovia               44     48     34          41      7,221
Moscow                 38     42     46          35    189,578
Moscow Oblast          40     41     42          35    378,240
Murmansk                1     16     14           5      1,661
Nizhny Novgorod        18     19     25          12      5,382
North Ossetia-Al       60     67     68          54        818
Novgorod                2     28      5          13      1,105
Novosibirsk            12     69     41          43      1,845
Omsk                   10     45     50          24      1,564
Orenburg               23     23     26          16      2,886
Oryol                  54     33     59          33      3,438
Penza                  58     24     19          26      8,138
Perm                    4     34     11          17      2,723
Primorsky              57     60     65          51      2,127
Pskov                  68      8     63          22      1,007
Rostov                 16     32     16          20      5,191
Ryazan                 29     44     31          31     24,511
Sakha                  15     13      2           7        732
Sakhalin                5     11      3           2      1,295
Samara                 51     46     57          42      3,082
Saratov                37     57     48          44      5,568
Smolensk               33     25     29          21      5,391
St. Petersburg         42     49     64          41      2,012
Stavropol              65     65     66          55      2,849
Sverdlovsk             30     21     17          17      3,342
Tambov                 17     47     28          27     11,075
Tatarstan              49     53     54          46      2,497
Tomsk                   6      5      1           1        810
Tula                   31     12     39          11     10,507
Tver                   20     54     51          37      7,922
Tyumen                 41      6      7           9      1,436
Udmurt                 43      3      8           8        883
Ulyanovsk              34     62     56          48      2,152
Vladimir               24     55     32          39      6,869
Volgograd              11     22     36          11      4,325
Vologda                14      9     22           4      1,384
Voronezh               27     26     24          19      6,049
Yaroslavl              45     56     53          47      4,016