Not a Zero Sum Game: How to Simultaneously Maximize Efficiency and Privacy in Data-Driven Urban Governance by Nikita Krishna Kodali Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2019 © Massachusetts Institute of Technology 2019. All rights reserved.

Author...... Department of Electrical Engineering and Computer Science May 28, 2019

Certified by...... Karen Sollins Principal Research Scientist Thesis Supervisor

Accepted by...... Katrina LaCurts Chair, Master of Engineering Thesis Committee

Submitted to the Department of Electrical Engineering and Computer Science on May 28, 2019, in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering

Abstract

India has been striving towards digitization of citizen data and government services using e-governance platforms to improve accountability, transparency, and efficiency. Accordingly, India launched the world's largest biometric ID system, Aadhaar, in 2009 and the "100 Smart Cities Mission" in 2015. However, with the immense wealth of personal data being digitized, guarantees on personal privacy were subsequently called into question. In August of 2017, the Supreme Court of India declared that privacy is a fundamental right. To examine the juxtaposition of governmental efficiency and personal privacy through rapid digitization, we investigate and collect city metadata by examining the architecture of the products of eGovernments Foundation, one of the leading providers of digital tools for Urban Local Bodies (ULBs), and by directly interacting with cities. In this particular study, we investigate New Property Tax Assessment applications, New Water Tap Connection applications, and Public Grievance Redressals for 112 ULBs in the state of Andhra Pradesh. Through field work, collection of data, and further analysis, we observe which data fields are collected and how they are used. We define a Government Efficiency Index (GEI) and Information Privacy Index (IPI) in order to provide a standard for understanding and analyzing the trade-offs between government efficiency and citizen privacy for these services across all ULBs. This thesis examines how ULBs perform on the GEI and IPI axes through multiple lenses. Using real data, we demonstrate that both efficiency and privacy are measurable concepts in the context of urban governance. Furthermore, the methodology of identifying top-performing cities outlined in this thesis allows us to conclude that there exist exemplar cases of ULBs that have high impact on both the GEI and IPI axes. Thus, this methodology for comparing efficiency and privacy provides a structure for understanding, evaluating, and comparing governmental processes.

Thesis Supervisor: Karen Sollins Title: Principal Research Scientist

Acknowledgments

First and foremost, I am deeply grateful for my research supervisors, Karen Sollins and Chintan Vaishnav. Without their guidance, compassion, and encouragement, I would not have been able to produce this document with the confidence I do today. Thank you, Karen, for allowing me to voice my sometimes naive optimism or trivial frustrations several times a week, and for pushing me to think about implications beyond just the scope of our research. Your immense depth of knowledge in your field of research inspires me, and I hope to one day be as meaningful a mentor to someone as you have been to me. Thank you, Chintan, for your boundless patience in teaching me how to become a better researcher both in the lab and on the ground. Your endless enthusiasm, calming presence, and passion for using technology for social good give me a role model to look up to every day.

Second, this work would not have been possible without the work and contributions of Gautham Ravichander and his colleagues at the eGovernments Foundation. In addition, I wish to acknowledge my Research Assistantship funding from the MIT Internet Policy Research Initiative. I want to express my profound gratitude for your support and encouragement over the last two years.

My accomplishments and successes are never singular. The support, guidance, love, encouragement, and optimism of my friends and family empower me every day. The women in my life - Ammamma, Nanamma, Amma, Chinnu Amma, and Souji Atha - have taught me how to be strong, and the men in my family - Thatha, Nana, Chandu Nana, Deva Mama, Bablu Mama, and Pasi Babai - have picked me up when I am unable to be. Thank you for always believing in me.

To my dear friend, Divya, thank you for your unwavering support and encouragement through thick and thin. And of course, thank you in advance to my forever best buddies - Anita, Bhargavi, Kavya, Sravya, Siri, Ronak, and Aadya - for listening to me recite every line in this document annually. Sandeep, I literally would not have been able to hand in this thesis on time without you, so you are exempt.

For Geethatha, whose warm smile will always be etched in our memories,

whose love will forever remain in our hearts.

Contents

1 Introduction 14

2 Research Questions 17

3 Background: Digitization and Constitutional Privacy 18 3.1 Advances in Digitization of Cities ...... 18 3.2 The Constitutionality of Privacy ...... 19 3.2.1 The Aadhaar Issue ...... 20 3.2.2 Supreme Court Decision ...... 20

4 Understanding the Value of Privacy in the Indian Context 22

5 Existing Models of Privacy 25 5.0.1 Command and Control Model ...... 25 5.0.2 Sectoral Model ...... 26 5.0.3 Co-Regulatory Model ...... 27 5.0.4 Recommended Model of Privacy in India ...... 29 5.1 Data and Data Collection ...... 30 5.2 eGovernments Foundation, Current Installations and Digital Services ...... 30 5.3 Site and Module Selection ...... 31 5.3.1 Department Hierarchy ...... 31 5.3.2 Classification of eGov Modules ...... 32 5.3.3 Modules Selected for This Study ...... 32

6 Data Available for Analysis 33 6.0.1 Workflows ...... 33 6.0.2 Binary and Qualitative Matrices ...... 33 6.0.3 2018 Data ...... 34 6.1 Service Modules Data ...... 34


6.1.1 Property Tax Module ...... 35 6.1.2 Water Charges Module ...... 36 6.1.3 Public Grievances Module ...... 38

7 Understanding Data Use and Data Disclosure 41 7.1 Understanding Implications of Loss of Data Integrity ...... 42 7.1.1 Privacy Analysis I: Loss of Data Integrity ...... 42 7.1.2 Functional and Financial Implications of Loss of Data Integrity ...... 42 7.2 Understanding the Implications of Data Disclosure on Privacy ...... 44 7.2.1 Privacy Analysis II: Loss of Data Confidentiality ...... 44 7.2.2 Financial and Cultural Implications of Voluntary or Involuntary Data Disclosure ...... 44 7.2.3 Grievance and Inference ...... 45

8 Synthesis 46 8.1 Government Efficiency Index ...... 46 8.1.1 Timeliness of Service ...... 46 8.1.2 Accuracy of Service ...... 48 8.2 Informational Privacy Index ...... 49 8.2.1 Right Collection Index ...... 49 8.2.2 Right Use ...... 50 8.2.3 Right Disclosure ...... 51

9 Results and Analysis 53 9.1 GEI Calculation ...... 53 9.1.1 Timeliness ...... 53 9.1.2 Property Tax Assessment ...... 55 9.1.3 Water Charges ...... 67 9.1.4 Comparison of PT and WT Model ULBs ...... 79 9.1.5 Public Grievances ...... 79 9.1.6 Accuracy ...... 83 9.2 IPI Calculation ...... 83 9.2.1 Right Collection ...... 83 9.2.2 Right Use ...... 85 9.2.3 Right Disclosure ...... 85 9.2.4 IPI Impact ...... 86 9.3 GEI and IPI ...... 86

10 Applications and Effects on Public Policy 90 10.1 Learning from Top Performing ULBs ...... 90 10.2 Evaluating the Trade-offs between Efficiency and Privacy ...... 91 10.3 Reviewing Fairness of SLAs ...... 91

11 Conclusions and Further Work 93 11.1 Further Work ...... 94

A Site and Module Selection 99

B Understanding Data Use and Data Disclosure 103

C New Property Tax Assessment Data 105

D New Water Tap Connection: Data 112

E Public Grievance Redressal Data 116

F Results and Analysis 121

List of Figures

5.1 Co-Regulatory Model ...... 28 5.2 Comparison of Privacy Principles ...... 29 5.3 eGovernments Foundation ERP Suite [3] ...... 31 5.4 Department Hierarchy ...... 31

6.1 New Property Tax Assessment Workflow [3] ...... 36 6.2 New Water Tap Connection Workflow [3] ...... 37 6.3 PGR Escalation Hierarchy [3] ...... 39

7.1 Table of Fields Necessary, Collected for Efficiency/Accuracy, and Unnecessary for Completion of PT Assessment, WT Assessment, and PGR ...... 43

9.1 Distribution of Inaccurate Application Durations for PT by Tier: This plot describes the distribution of the percentage of inaccurately recorded transactions by population tier. Each marker indicates a ULB, where the size of the marker indicates the volume of applications received and the color indicates the population tier to which that ULB belongs. The median of each tier is also marked in yellow and annotated with the value of the respective median ...... 56 9.2 PT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines ...... 57 9.3 PT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier ...... 58 9.4 Distribution of Adjusted Timeliness for PT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow line indicates the median t_adj for the corresponding tiers ...... 59


9.5 Distribution of Weighted Impact for PT by Tier: Boxplots for the distribution of weighted impact by tier, where markers are ULBs of size proportional to the volume of applications received ...... 61 9.6 GEImpact vs. Adjusted Timeliness of PT by Tier: ULBs handling high volumes of applications but maintaining high timeliness rates can be identified here. Each pane gives a plot of weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile regions for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis ...... 62 9.7 Tier 1 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment ...... 64 9.8 Tier 2 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment ...... 65 9.9 Tier 3 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment ...... 66 9.10 Distribution of Inaccurate Application Durations for WT by Tier: This plot describes the distribution of the percentage of inaccurately recorded transactions by population tier ...... 68 9.11 WT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB for WT. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines ...... 69 9.12 WT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier ...... 70 9.13 Distribution of Adjusted Timeliness for WT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow line indicates the median t_adj for the corresponding tiers ...... 71 9.14 Distribution of Weighted Impact for WT by Tier: Boxplots for the distribution of weighted impact by tier, where markers are ULBs of size proportional to the volume of applications received ...... 72

9.15 GEImpact vs. Adjusted Timeliness of WT by Tier: ULBs handling high volumes of applications but maintaining high timeliness rates can be identified here. Each pane gives a plot of weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile regions for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis ...... 74 9.16 Tier 1 Model ULBs with respect to Timeliness of Service for New Water Tap Connection ...... 76 9.17 Tier 2 Model ULBs with respect to Timeliness of Service for New Water Tap Connection ...... 77 9.18 Tier 3 Model ULBs with respect to Timeliness of Service for New Water Tap Connection ...... 78 9.19 Average Timeliness across Tiers by PGR Department ...... 80 9.20 PGR Timeliness by Department of Complaint and Tier ...... 81 9.21 Process for Identifying Model ULBs for a Complaint ...... 82 9.22 New Water Tap Connection Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column ...... 84 9.23 New Property Tax Assessment Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column ...... 84 9.24 PGR Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column ...... 85 9.25 IPI Calculation for All ULBs ...... 86

9.26 Tier 1: GEImpact_total vs. IPImpact_total ...... 87 9.27 Tier 2: GEImpact_total vs. IPImpact_total ...... 87 9.28 Tier 3: GEImpact_total vs. IPImpact_total ...... 88 9.29 GEI Comparison for All Services by Tier ...... 89

A.1 Service Level Agreements for PGR ...... 102

B.1 Examples of Public Grievance types for which loss of confidentiality leads to financial loss ...... 103 B.2 Examples of Public Grievance types for which loss of confidentiality leads to financial and cultural loss ...... 104 B.3 Loss of confidentiality can be to either the complainer or the subject; the relationship is related to whether the loss is financial or cultural ...... 104

C.2 New Property Tax Assessment: Necessary Data Matrix ...... 107

C.4 New Property Tax Assessment: Operational and Administrative Matrix ...... 109 C.6 New Property Tax Assessment: 2018 Data Fields ...... 111

D.1 New Water Tap Connection: Necessary Data Matrix ...... 113 D.2 New Water Tap Connection: Operational and Administrative Matrix ...... 114 D.3 New Water Tap Connection: 2018 Data Fields ...... 115

E.1 Public Grievance Redressal: Necessary Data Matrix ...... 116 E.2 Public Grievance Redressal: Operational and Administrative Matrix ...... 117 E.5 Public Grievance Redressal: 2018 Data Fields ...... 120

F.3 New Property Tax Assessment Timeliness of Service Results ...... 124 F.4 PT Population vs. Percentage of Inaccurate Application Durations: This plot shows no correlation between the population of a ULB and its compliance with proper use of the ERP System ...... 125 F.5 PT Volume of Applications vs. Adjusted Timeliness ...... 126 F.8 New Water Tap Connection Timeliness of Service Results ...... 129 F.9 WT Population vs. Percentage of Inaccurate Application Durations: This plot shows no correlation between the population of a ULB and its compliance with proper use of the ERP System ...... 130 F.10 WT Volume of Applications vs. Adjusted Timeliness ...... 131 F.14 Public Grievance Redressal Timeliness by ULB across Departments ...... 135 F.15 Tier 1: GEImpact vs. IPImpact ...... 136 F.16 Tier 2: GEImpact vs. IPImpact ...... 137 F.17 Tier 3: GEImpact vs. IPImpact ...... 138

Chapter 1

Introduction

India has been undergoing rapid digitization in the private and public sectors. Behind China, India is the second-fastest digital adopter among 17 major digital economies [17]. Even with a population of 1.2 billion people, India has instituted the world's largest national identification system with the implementation of Aadhaar by the Unique Identification Authority of India (UIDAI) [14]. With the introduction of the "100 Smart Cities Mission" in 2015 [6], India started focusing on digitizing governance across cities to improve accountability, transparency, and efficiency. With the immense wealth of personal data being digitized, guarantees on personal privacy were called into question. Consequently, in August of 2017, the Supreme Court of India declared that privacy is a fundamental right (see Section 3.2.2). The court-issued Srikrishna Committee of Experts further expanded on and explored the societal and operational implications of the judgement. We work with eGovernments Foundation (eGov) - a company that develops e-governance platforms that enable city and state governments to improve citizen service delivery and transparency - to investigate the juxtaposition of rapid digitization and personal privacy. Until recently, all citizen data was collected via paper forms. During the push for digitization, organizations tasked with making services digital and equipping municipalities with digital platforms translated the paper process directly into a digital process. All the information that was collected on paper was then collected digitally. As officials processing data on a paper form would have access to all the data fields, officials processing digital applications now do as well. However, with the advent of big data and the ability to save and search large sets of data easily, data that on paper would not have been easily accessible can now be searched with relative ease.
Instead of having to manually go through an immense volume of New Property Tax Assessment applications, for example, an official or any citizen can merely search eGov's website to find out what someone owes in property taxes to the government. The online system does provide accountability and transparency regarding a government service's workflow and government officials' performance. However, due to the current lack of privacy constraints, personal information with cultural, financial, or functional implications is left vulnerable to exploitation and discrimination.

Though there has been a wealth of work done on anonymizing data to hide personal information, the question of privacy is separate from that of anonymity. Techniques such as k-anonymity [22], differential privacy [9], and searchable encryption [8] provide methods to anonymize large sets of data. Yet anonymizing data does not necessarily address privacy violations or infringements due to inference, as Barocas and Nissenbaum point out [15]. Particularly in the case of India, details other than a person's name, such as job title or house address, can provide a great deal of information regarding caste, income level, political preferences, and so on. Inference comes not from specific data fields but rather from contextual information in the underlying data. Subsequently, we strive to understand the types of implications that specific data can have in the context of government services, regardless of advances in anonymization methods.

Through field work, collection of data, and further analysis, we observe and analyze the types of data collected by municipalities, their use cases for government services, access controls to personal data, and the trade-offs between governmental effectiveness and privacy-driven constraints on data access. Of the more than 20 government services eGov is in the process of digitizing, we investigate three citizen services: New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal in the 112 urban local bodies (ULBs) in Andhra Pradesh, India. For New Property Tax Assessment and New Water Tap Connection, citizens can submit applications online via eGov's online system or in person via municipal Citizen Service Centers. Citizens can submit complaints regarding a range of municipal and infrastructural issues using the Puraseva mobile application. Data such as personal details, building construction type, family occupancy, physical location, and other information is collected for various services.
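The anonymization techniques cited above can be made concrete with a small sketch. The snippet below is a minimal illustration, not the formulation from the cited works: it checks whether a table satisfies k-anonymity over a chosen set of quasi-identifiers. The records and column names are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows, i.e. no record is distinguishable
    from fewer than k - 1 others on those fields."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical records: even with names removed, job title and ward
# can act as quasi-identifiers that support inference.
records = [
    {"job": "teacher", "ward": 4, "tax_due": 1200},
    {"job": "teacher", "ward": 4, "tax_due": 900},
    {"job": "engineer", "ward": 7, "tax_due": 2100},
]

# The lone engineer in ward 7 breaks 2-anonymity.
print(is_k_anonymous(records, ["job", "ward"], 2))
```

Note that even a table passing this check can leak information by inference: if every record in a quasi-identifier group shares the same sensitive value, the group attributes themselves reveal it, which is precisely the gap between anonymity and privacy highlighted above.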
Across all ULBs, each service has a standardized workflow for processing the data, from the opening of the application until its approval or rejection. Each official in the workflow has a specific task regarding the application, and subsequently uses specific information in the application to complete that task. The eGov ERP system logs timestamps for task completion, details of and comments for each task, and so on. In order to provide a standard for understanding and analyzing the trade-offs between government efficiency and citizen privacy of these workflows across all ULBs, we define a Government Efficiency Index (GEI) and Information Privacy Index (IPI). By analyzing municipal data use and calculating these indices across all ULBs in Andhra Pradesh, we strive to understand:
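To illustrate how such logged timestamps could feed a timeliness measure, the sketch below computes the fraction of applications closed within a service-level window. This is a hypothetical construction for exposition only, not the GEI formula defined later in the thesis; the field names and the 15-day SLA are assumptions.

```python
from datetime import datetime

# Hypothetical workflow log: each application records when it was
# opened and when it was approved or rejected.
applications = [
    {"opened": datetime(2018, 3, 1), "closed": datetime(2018, 3, 10)},
    {"opened": datetime(2018, 3, 2), "closed": datetime(2018, 3, 25)},
    {"opened": datetime(2018, 3, 5), "closed": datetime(2018, 3, 18)},
]

SLA_DAYS = 15  # assumed service-level agreement for the module

def timeliness(apps, sla_days):
    """Fraction of applications completed within the SLA window."""
    on_time = sum(1 for a in apps if (a["closed"] - a["opened"]).days <= sla_days)
    return on_time / len(apps)

print(timeliness(applications, SLA_DAYS))
```

A per-ULB score of this kind, aggregated across services and weighted by application volume, is the general shape of measurement that the logged timestamps make possible.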

1. How does loss of data integrity or confidentiality affect data use?

2. How do we measure government efficiency and informational privacy in the Indian context?

3. How can we use this methodology to understand how to maximize efficiency and privacy?

In order to answer these questions, this document proceeds as follows: In Chapter 2, we elaborate on the research questions this document strives to answer. Chapter 3 discusses the history of the Supreme Court judgement and the need for privacy in the Indian context. In Chapters 4 and 5, we evaluate the value of privacy and various models of privacy for India as well. Chapter 6 describes the municipal organizational workflow for, and citizen data collected from, the three services of study: New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal. Details specific to eGov's implementation of these services are included as well. Chapter 7 discusses the impact of loss of integrity and loss of confidentiality on citizen privacy. We provide the formulation of GEI and IPI in Chapter 8, detailing the calculation of these indices at multiple levels of organization. Chapter 9 shows the results of these calculations across the three services, in addition to further analysis to improve our understanding of how ULBs use the ERP system. We identify model cities, or cities that perform well with respect to GEI, IPI, or both. Given the observations and determinations from Chapter 9, Chapter 10 discusses the subsequent use cases for the methodology in determining model cities and details possible impacts on public policy. Lastly, we conclude in Chapter 11 with a summary of our observations and directions for further work.

Chapter 2

Research Questions

The overarching question of interest to this paper is this: Governance-Privacy Tension: How can cities (a state actor) use citizen data to maximize governance while protecting citizens' fundamental right to privacy? The Srikrishna Committee white paper [21] identifies several privacy principles of importance in this context. Given the focus of this research on cities, we focus on the following subset: Data Collection, Data Use, Data Disclosure, Data Security, and Data Anonymity. These are principles that affect both governance efficiency and privacy, and for which cities, acting as a Data Controller, must determine what their policies ought to be. In order to understand the tension between governance and privacy, it is important to first analyze the effects of loss of privacy for citizen data, then construct a standard methodology to evaluate both efficiency and privacy across cities. Using standardized indices as a tool, we can then identify which ULBs in particular are able to maximize efficiency without loss of citizen privacy. Subsequently, the sub-questions we answer are as follows:

1. How do ULBs compare against each other on the axis of efficiency?

2. How do ULBs compare against each other on the axis of privacy?

3. Which ULBs are able to maintain high GEI or high IPI?

4. Which factors lead to a decrease in GEI or IPI?

5. Which ULBs are able to maintain high GEI and high IPI?

6. What factors impact both GEI and IPI?

...

Chapter 3

Background: Digitization and Constitutional Privacy

3.1 Advances in Digitization of Cities

India, the second most populous country in the world, has been making strong efforts to improve the efficiency of its metropolitan areas. As of 2014, the urban population of India is 410 million, and the United Nations has projected that India's urban areas will grow by another 404 million people. As a result of this urban revolution, which has been underway for the past decade, urban bodies in India have been struggling to keep up with the management and monitoring of large populations. Without accountable collection and processing of data, policymakers may not necessarily have access to the most up-to-date information on societal, financial, and environmental problems and resources. Furthermore, without access to proper data, urban planners may have difficulties optimizing the allocation of these resources. In order to accommodate the increasing urban population, comprehensive development of physical, institutional, social, and economic infrastructure is necessary. To improve the quality of life in urban areas and to improve urban sustainability, Indian Prime Minister Narendra Modi launched the 100 Smart Cities Mission in 2015. The Mission, commissioned under the Union Ministry of Urban Development, is a five-year urban renewal project that included the development of 100 Indian cities and the revitalization of 500 more. According to its official guidebook, the Mission's core infrastructural focuses would consist of ten areas, including robust IT connectivity and digitization; good governance (especially e-Governance and citizen participation); sustainable environment; safety and security of citizens, particularly women, children, and the elderly; and health and education.

Of the core infrastructure objectives outlined in the Smart Cities Mission Guidelines, the central government and state governments have been putting a focus on the implementation of e-governance projects and the digitization of citizen data. The goal of e-governance is to increase the transparency, accountability, and efficiency of basic governmental tasks, including but not limited to property tax payment, maintenance of public infrastructure, and the issuing of marriage licenses, commercial licenses, and birth certificates, as well as informational and transactional exchanges within the government, between the government and government agencies at the national, state, municipal, and local levels, and between cities and businesses. E-governance also empowers citizens through access, use, and ownership of their own data. State governments have hired private consulting firms and companies to develop and implement the appropriate software and hardware infrastructure. Once the municipal offices become almost completely paperless, the public will have access to every online record that exists in the e-governance databases, as privacy of data and information security is not yet enforced. As observed from fieldwork this summer, municipal government administrations tend to believe that broad access to citizen information, the status of their records, and the progress logged on those records will increase the accountability of their own officials, increase the efficiency of work, and improve the public's faith in the government's ability to work for the individual. Additionally, with the advent of machine learning and data science techniques, third-party observers will be able to use the data for research that could improve service delivery or give insights into public goods. Private corporations may be able to cater their goods and services to certain populations based on needs and tastes observed from the data. Furthermore, policymakers will be able to make more data-driven decisions.
Despite the possibility of significant public and private benefits to various actors, data collectors and data holders are being forced to re-evaluate their methods and policies as a result of the Indian Supreme Court's recent landmark decision declaring privacy a fundamental right under the Indian Constitution.

3.2 The Constitutionality of Privacy

In 2009, India mandated that every citizen register their information and certain biometric data under a new unique identification system called Aadhaar. While Aadhaar was used to verify identification and document citizens, there were numerous cases in which sensitive personal information was leaked or hacked and privacy was compromised. After petitioners questioned the need to collect biometric information, data security, and personal privacy, the central government convened a Supreme Court bench to decide whether, under the Indian Constitution, privacy is a guaranteed fundamental right. The Supreme Court ruled that privacy is, in fact, a fundamental right, and is intrinsic to the values of Article 21 of the Indian Constitution, which gives citizens the right to life and personal liberty. The Court noted that the complexity of regulating privacy derives from the context-dependent nature of privacy, and commissioned a Committee of Experts to deliberate on a data protection framework for the country.

3.2.1 The Aadhaar Issue

In an effort to create a comprehensive database of its citizens, in 2009 the Government of India created Aadhaar, the world's largest biometric ID system, collected by the Unique Identification Authority of India (UIDAI). The Aadhaar system would ideally link a person's bank account, Aadhaar number, and mobile phone number to create a cashless, presence-less, paperless citizen information system. Aadhaar was meant to be used by companies to know their customers, to verify customers online, and to create a unique ID [13]. However, the debate on personal privacy was re-instigated when the Government of India passed the Aadhaar Act in 2016. Several stakeholders, including private organizations, municipal governments, activists, and citizens, filed petitions against UIDAI regarding the constitutionality of the Aadhaar Act and the scheme's violation of the right to privacy [19]. Petitions included objections to the lack of transparency regarding the handling of private data and the lack of data security at information collection agencies. Furthermore, many questioned the need for the government to collect all ten fingerprints and two retinal scans in order to uniquely identify each person. There existed little to no guidelines on how or what information could be shared within government agencies or how much personal information could be collected by the government. First, on July 21, 2015, a three-judge bench clarified that mandating that all citizens must have Aadhaar violates a Supreme Court interim order from September 23, 2013, holding that Aadhaar is voluntary. The central government argued the next day that privacy was not a guaranteed fundamental right under the Indian Constitution, that the right to privacy is not absolute, and that privacy is subject to restrictions in the public interest.
On August 6, 2015, the three-judge bench upheld the petitions arguing that linking the biometric registration process with basic and essential subsidies and welfare schemes constituted a violation of privacy.

3.2.2 Supreme Court Decision

In 2009, India implemented a new national ID system, Aadhaar. Citizens were required to register personal information as well as certain biometric data for Aadhaar. While Aadhaar was intended to verify identity and document citizens, it was not entirely secure: there were numerous cases of hacking, leaking, or selling of sensitive personal information. As a result, petitions were filed by citizens and civil liberties groups questioning the need to collect biometric information, data security, and personal privacy. The central government convened a Supreme Court bench to decide whether, under the Indian Constitution, privacy is a guaranteed fundamental right.

Privacy is a Fundamental Right

The central government convened a nine-judge bench to decide on the following questions, reflecting how the Constitution makers envisioned the nature of privacy:

• Is privacy a guaranteed fundamental right in the Constitution?
• What is privacy defined as?
• Is the right to privacy embedded in the right to liberty and personal dignity, or in other guarantees of protected fundamental rights?
• In what parts of a citizen's life is privacy guaranteed?
• How much should the government regulate privacy (nature of regulatory power)?
• What are the different aspects of privacy, and does the Constitution cover some but not others?

On August 24, 2017, the Bench unanimously decided that under the Indian Constitution, privacy is a fundamental right, subject to exceptions for national security, protection against crime, and protection of revenue. Observing that the Indian Constitution is a dignitarian constitution focused on upholding every citizen's personal dignity, the Bench outlined several reasons why privacy is important for ordered liberty: (1) privacy is a form of dignity; (2) privacy provides a limit on the government's power as well as on private sector entities' power; (3) privacy is key to freedom of thought and opinion; (4) it provides the right to control personal information and an incentive for the development of personality; (5) a guarantee of privacy prevents unreasonable intrusions by malicious public, private, or individual actors. It was determined that privacy is intrinsic to the values of Article 21, which gives citizens the right to life and personal liberty. Furthermore, privacy should apply both to physical forms and to technological forms of information; rights to enter the home should be up to the individual, excepting security reasons listed in Article 14. Lastly, privacy serves eternal values and guarantees as well as the foundation of ordered liberty. Consequently, the Bench formulated a three-fold requirement for a valid law on privacy:

1. A law stating that privacy is a fundamental right according to Article 21 should exist.

2. To guard against arbitrary state action, the restrictions imposed on the nature and content of the law should abide by Article 14’s exceptions to reasonableness.

3. The law must be proportional to the object and needs it seeks to fulfill.

The Bench, recognizing that data protection and data privacy are complex issues that require expert opinion, mandated that the government create a Committee of Experts under the Chairmanship of Justice BN Srikrishna, a former judge of the Indian Supreme Court, to deliberate on a data protection framework for the country. While the constitutionality of the right to privacy was decided upon, the complexity of regulating privacy derives from the context-dependent economics of privacy. To better understand existing models of privacy protection and enforcement, it is important to understand how the definition and value of privacy transform depending on context.

Chapter 4

Understanding the Value of Privacy in the Indian Context

The combination of big data and machine learning techniques can reveal much about society and has the potential to bring about positive societal change, and digital records can allow for greater efficiency and accountability. Policymakers have to reconsider how open open data should be, and where the fine line lies between keeping information private and taking the most advantage of the large scale of digitized information. Often, data collected with informed consent for a particular purpose can be re-purposed and analyzed for a different subset of insights. In these cases, the economic value of the data changes, and especially in the big data realm, privacy regulation grapples with problems of unpredictability, externalities, probabilistic harms, and valuation difficulties.[11] The economics of privacy concerns the trade-offs associated with the balance of public and private spheres between individuals, organizations, and governments with respect to personal privacy.[7] In the Indian e-governance context, personal data is any information that provides knowledge on an individual's traits or attributes, including but not limited to age, gender, income tax level, address, number of family members, occupation, education level, and welfare status. The data generated by the citizens, who are called the data subjects or providers, is passed sequentially to the data collector, data holder, and data users, who may be private or public entities providing a particular service to the citizens. The data collectors for e-governance data are the municipal governments, but the data holders in the backend differ from state to state. The state of Andhra Pradesh, for example, hired eGovernments Foundation to implement and build specialized data collection, visualization, and storage tools.[3] The companies and consulting firms hired by states vary, as state governments are fairly independent from the central government.
All stakeholders, including individual citizens, the Government of India, UIDAI, data collection agencies, data storage agencies, analytics firms, government welfare agencies, and telecommunications companies, to name a few, will have access to or will have to transfer personal data. While citizens derive individual benefits and enjoy any common public goods produced using the assembled data, three key themes emerge on the flow and use of information about individuals by firms or governmental organizations.[7] First, a single unifying practice of privacy is difficult to formulate, as privacy issues of economic relevance arise in a wide variety of contexts and a variety of markets for personal information. For example, the definition of privacy even within the Indian context changes when discussed with respect to national security versus commercial marketing versus healthcare. Although the Smart Cities Mission mentions only security in the e-governance context, the Srikrishna Committee seeks to create an overarching data privacy protection framework that may minimize the costs of standardizing across sectors, even if it may not be able to minimize trade-offs. Second, it is difficult to conclude whether privacy protection entails a net positive or negative change in economic terms, as the benefits of protecting privacy, including fraud and identity protection, may or may not be greater than the costs of anonymizing data, securing the storage of data, and so on. For instance, while revealing mobile location information can be beneficial in improving traffic conditions or transportation efficiency, it may be considered an intrusion upon privacy if the government continuously monitors citizens' locations with the intent of surveillance. In both cases, the same information is being monitored, but depending on the context, the credibility and legitimacy of the data collection may be undermined, perhaps leading to greater distrust in the government and higher costs of data collection for public services.
Lastly, especially in a country like India where its 1.3 billion citizens lie on a broad spectrum of levels of education and income, a large number of poor or poorly educated people are at a disadvantage in accurately assessing the benefits or consequences of sharing or protecting personal information. The average citizen in India has undergone only 5.1 years of schooling [4], and even the most educated may not necessarily understand the power of analytics or machine learning. When requesting consent for the use of particular information, organizations intending to perform predictive analysis or machine learning cannot lay out all of the scenarios in which the information will be used and what insights may be discovered about their data subjects. Disclosing data causes the reversal of information asymmetries: before the information is released, the data subject (the citizen, in this case) holds greater knowledge about the information than the data holder (the government or a third party). Afterwards, the data subject may not know what the data holder can do with the data and the consequences associated with sharing it. While giving up privacy may allow a citizen to receive tangible benefits such as welfare approval, revealing the data may also incur intangible consequences, such as the loss of autonomy and the possibility of increased surveillance, that the common citizen will not be able to foresee or gauge. The market cannot respond appropriately to information gaps where users cannot express their true preferences for privacy protection.[10] As a result of this information asymmetry, even specific privacy regulations cannot necessarily cover unknown use cases or account for under-informed citizens.

In addition to the caveats that exist with creating privacy legislation, there are two basic trade-offs that the government sees with the sharing of personal data.[7] First, individuals and communities can economically benefit from sharing data. One particular case in which the sharing of data is undoubtedly beneficial is India's Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA), the world's largest social welfare scheme.[20] Through rural employment, the program seeks to alleviate poverty and provide benefits to impoverished and marginalized sections of society. In a social welfare state, the collection and storage of data about each individual citizen who is a part of the program is not only necessary for accountable monitoring of progress and allocation of scarce public resources, but also a powerful enabler of the spread of innovation and knowledge if legitimately deployed to improve the state's understanding of the causes of, and predictive power over, particular living conditions, education levels, income levels, and so on. At the same time, when Aadhaar is linked to welfare schemes and education scholarships, inappropriate access to the information could compromise personally identifiable information like banking information or culturally sensitive information such as caste. Second, certain positive and negative externalities arise through data creation and transmission. The obvious positive externality is that specific aggregate and individual analysis of the data may reveal correlations between particular events in, for example, the health or education sector. For example, researchers with access to education data were able to discover that student attendance in low-income areas increased when in-school meals were provided.
As a result, the central government implemented the Midday Meal Scheme, which targeted certain demographics of school children as well as employed an estimated 2 million poor and marginalized women to cook and help with meals.[5] Negative externalities, however, can include intrusive surveillance by the government or targeted pricing by corporations, arising from a comfort with sharing one's information if the people around him are willing to. As a result, the economic value of one's personal data continuously changes depending on context and on how willing other people are to share their personal information.

Chapter 5

Existing Models of Privacy

The constitutionality and contextual dependence of privacy pose a considerable challenge to formulating one set of standardized regulations on the conditions under which personal information can be shared and the methods by which to share and monitor the data. The two main components of international privacy regulations are, first, guidelines on how to protect the data and, second, guidelines on how to enforce privacy protection. Internationally, the three common models of privacy protection can be described as i) the Command and Control Model, ii) the Self-Regulation/Sectoral Model, and iii) the Co-Regulatory Model.[21] The Srikrishna Committee assessed the three models and concluded that the Co-Regulatory Model was appropriate for India, as its varying levels of government involvement and industry participation can be molded to the Indian context.

5.0.1 Command and Control Model

The Command and Control Model, also known as the Comprehensive Model[1], includes a general law that regulates the collection, use, and dissemination of personal information in the private and public sectors, governed by an oversight body. Around 40 countries and jurisdictions in Europe have adopted this model,[1] mainly through the European Union Data Protection Directive (EUDPD) and the Organization for Economic Co-operation and Development (OECD) Guidelines on the Protection of Privacy and Transborder Flow of Personal Data. There are three reasons for adoption of a comprehensive model: 1. to remedy past injustices incurred by authoritarian regimes; 2. to ensure consistency with European privacy laws to facilitate trans-border information transfer while protecting personal privacy; and 3. to promote electronic commerce.[16] The challenges of implementing this model include costly paperwork and documentation even in low-risk scenarios, as the model sets out minimum requirements for data collection, storage, and transfer, as well as insufficient opportunities for innovation in data processing, as the purpose of use of the data must be pre-determined and communicated to the data subjects.


With the goals of creating a unified economic market and providing strong overall protection of privacy within the EU, the EUDPD sets down Data Protection Principles: transparent data processing, purpose limitation and proportional use, data minimization, accuracy, data retention periods, data security, and accountability to a supervising body. Additionally, the Directive outlines data subjects' rights of access, rectification, deletion, and objection to the data, restrictions on onward transfers, additional protections for special categories of data and direct marketing, and a prohibition on automated individual decisions. Unlike the sectoral model, these principles put the onus of protecting private data on the data collectors and data holders as opposed to the data subjects. As can be observed, such strict restrictions on how personal data can be used limit freedom of innovation through analytics. Given the breadth and quantity of data being collected at the federal and municipal levels in India, the Srikrishna Committee noted that this model is quite restrictive and computationally expensive to implement in India. Additionally, the dichotomy of federal and municipal control would create difficulties in creating a federated, sectoral framework and raise issues about how involved state machinery should be.[21]

5.0.2 Sectoral Model

The Sectoral Model of privacy protection applies to countries that have varying legislation across industries and sectors, and is based on a combination of legislation, regulation, and self-regulation. Each industry is at liberty to draft its own set of guidelines and incorporate self-policing techniques for enforcement of the codes of practice.[21] While this approach provides flexibility and specificity across industries, a significant disadvantage is that over-specified regulation requires modifications or entirely new sets of regulations when new technologies come into use. Furthermore, the self-regulatory approach can be self-serving and ineffective, as there is no official oversight agency in place. Consequently, some countries opt to use a sectoral approach combined with the Comprehensive Model to have general privacy regulations in addition to more specific industry-oriented guidelines. The standalone Sectoral Model is used in the United States, Japan, and Singapore. The United States Federal Trade Commission's Code of Fair Information Practice Principles (FIPPs) gives guidelines on the use of personal data in the online marketplace, upon which sectoral legislation is built. It is interesting to note that while the European model holds the data collectors and users responsible for the privacy protection of personal data, the American model holds the data subject responsible for the privacy and disclosure of their data. Data collectors must follow certain guidelines on notice and consent, but regardless of who has access to the data, the data subject owns his data.
As a result, the core privacy principles are consent-based: Adequate Notice of Data Use, Choice/Consent for Data Use, Access/Participation of Data Subject, Integrity/Security of Data, and Enforcement/Redress for Data Collectors.[18] Other examples of sectoral regulation include the Fair Credit Reporting Act (1970), the Video Privacy Protection Act (1988), the Children's Online Privacy Protection Act (1998), and the Cable Television Consumer Protection and Competition Act (1992). Similar to the European model, the American guidelines include rules about the security of data from internal and external threats, as well as enforcement guidelines from a regulatory body. Physical protection of data is guaranteed by both models. Privacy protection of data, however, is not necessarily guaranteed by the self-regulation model. First, the FIPPs are not legally enforced and are merely guidelines to maintain privacy-friendly, consumer-oriented data collection practices. The American model does ensure that people have a choice in providing data, and that they know their data and how it is being used, but companies can deny services if data subjects do not agree to their terms of disclosure. Furthermore, industries will most likely choose the self-regulatory method of enforcement to avoid third-party oversight. While some firms may truly institute costly, self-regulatory standards, competitors may free-ride on the sector's improved reputation for protecting privacy.[10] While the sectoral model is the least intrusive and most efficient in ensuring fair information practices, the government risks the possibility of firms putting their own profits ahead of the public interest.[10] While the basic privacy principles were instituted in order to be able to engage in commercial data transfers with EU member states, the United States' policy on the government's right to personally identifiable information is completely separate. Certain government agencies, under the Fourth Amendment of the US Constitution and the Patriot Act, are allowed to withhold and analyze any personally identifiable information in the interest of national security. The extent to which privacy can be compromised is vague, and privacy regulation is essentially non-existent when the government is the data holder, collector, and user. In the historical and cultural context of India, self-regulation has been ineffective and highly corrupt.
When industries, such as the telecommunications industry, are composed of oligopolies, self-regulation and guidelines will ultimately lead to improper trade-offs between marketing strategies and the maintenance of personal privacy. Furthermore, the sectoral model does not impose any guidelines on the government's use and processing of personal data. The Srikrishna Committee noted that substantive elements of the self-regulatory model and its data protection framework are not considered as part of an enforcement mechanism.[21] As a result, the sectoral method in India would be not only incomplete but also inefficient for such a diverse country.

5.0.3 Co-Regulatory Model

The Srikrishna Committee and the Expert Group on Privacy concluded that the Co-Regulatory Model captures elements of both the Command and Control Model and the self-regulatory model to create a more appropriate middle path that combines the flexibility of self-regulation with the rigour of government rule-making. In this model, both the government and industry draft regulations for privacy protection, which are enforced by the industry and overseen by a private state agency. Canada's Personal Information Protection and Electronic Documents Act (PIPEDA) and the Australian National Privacy Principles implement the Co-Regulatory Model. In Canada, privacy in the private and public sectors is a guaranteed fundamental right under the Charter of Rights and Freedoms. The core privacy principles in the model are similar if not identical to those of the EUDPD, where the data collectors are responsible for protecting the privacy of the data subjects. This model, however, spreads regulatory power over individuals, industry organizations, and the central government in a four-tier system composed of legislation that is enforced by a government protection agency. The government's agency oversees watchdog agencies, which help empower the public and private sectors. The four tiers of the co-regulatory model are implemented as shown in Figure 5.1:

Figure 5.1: Co-Regulatory Model

The co-regulatory model takes into account privacy principles and provides a multi-layered method to ensure that federal agencies as well as private entities and individuals are aware of and abide by privacy regulations. Like the European model, there is an official governmental supervisory body to maintain accountability and verify enforcement. Additionally, like the Sectoral Model, industries have the flexibility of some amount of self-regulation, and so can keep up with the needs of the growing Internet economy and rapid technological changes. By the same token, critics of the co-regulation model fear that the government's cooperation with industries may reduce accountability and transparency. Industry lobbying and the power to participate in creating legislation would facilitate industry taking advantage of co-regulatory processes to capture the agency and enforce its point of view.

5.0.4 Recommended Model of Privacy in India

In 2012, an expert group under the Planning Commission of the Indian Government produced a Report of the Group of Experts on Privacy. Chaired by the former Chief Justice of the Delhi High Court, Justice A.P. Shah, the Expert Group was composed of representatives from industry, civil society, NGOs, voluntary organizations, and government departments. As national programs like the Aadhaar card, DNA profiling, and Reproductive Rights of Women increasingly came to rely on internet-connected technologies, the amount of information about a person grew to range over data related to health, travel, taxes, religion, education, financial status, employment, disability, living situation, welfare status, citizenship status, marriage status, crime record, and more. Analytic tools that generate economic value out of data and the ubiquitous transfer of data require an overarching privacy policy to regulate the government and commercial collection of information. Consequently, the Srikrishna Committee drew from the Group of Experts' Report, examined international and national privacy principles, and identified a set of recommendations for the Indian Government to consider when formulating a privacy framework for the country. The White Paper proposes seven salient features as a conceptual foundation for a Privacy Act for India, which the Supreme Court case file reiterates: Technology Agnosticism, Holistic Application, Informed Consent, Data Minimization, Controller Accountability, Structured Enforcement, and Deterrent Penalties. The data collectors, holders, and users are responsible for protecting the privacy of data subjects. Drawing particularly from the EUDPD and OECD Guidelines, the Expert Group provides a comprehensive set of principles and foundational elements to construct an exhaustive framework that protects personal privacy, especially in the context of government collection of personal data.
The principles include guidelines on Notice of Data Use, Choice and Consent, Collection Limitation, Purpose Limitation, Access and Correction, Notice of Disclosure of Information, Security, Openness/Transparency of Data Use, and Accountability. The scope of this study in particular is outlined in Figure 5.2.

Figure 5.2: Comparison of Privacy Principles

...

5.1 Data and Data Collection

In this section we review the sources of our data, both in terms of the use of the current tools for collecting data, through eGov, and our decisions about both site and data type selection. Our key observation is that not all data collected by ULBs for the property tax (PT), water charges (WT), and public grievance redressal (PGR) services are necessary for providing these particular services. This discrepancy suggests room for improvement in access controls and therefore personal privacy.

5.2 e-Governments Foundation, Current Installations and Digital Services

The eGovernments Foundation (eGov) develops digital platforms that enable city and state governments to improve accountability, transparency, and efficiency in the delivery of citizen services and in accounting and organization within the government. The eGov platform is designed to aid in the management of four categories of government information: administration, revenue, expenditure, and citizen services. While administration and expenditure modules account for employee management, legal case management, payroll and pensions, assets, and so on, revenue and citizen service modules mainly include tax evaluations and registrations filed by citizens. Revenue sources include the collection of property tax, water tax, trade licenses, advertisement tax, and fees from government land and estates, while citizen services include birth and death registrations, marriage registrations, an online citizen portal, public grievance registrations, and building plan approvals. The platform allows municipal officials to enter information and view individual and cumulative data on quantitative and geo-spatial dashboards. The digital actions of each employee are logged in order to monitor performance and accountability. The platform also promotes citizen engagement by interfacing with an online citizen portal and mobile app where people can submit and view the status of their applications and registrations, improving transparency and accessibility. eGov's clients include but are not limited to the state of Andhra Pradesh, the state of Punjab, the Greater Chennai Corporation, and the state of Maharashtra.

Figure 5.3: eGovernments Foundation ERP Suite[3]

5.3 Site and Module Selection

For this study, we chose research sites located in a single state. For confidentiality reasons, the identity of the state is kept private. Within this state, there are 112 ULBs that are now equipped with the eGov platform. ULBs are classified into three types by population: Nagar Panchayats have populations of less than 100,000 people, municipalities have populations greater than 100,000 people, and municipal corporations have populations greater than one million people. For this study, we chose two municipal corporations and one municipality to construct a representative sample. Administratively, the Director of Municipal Administration (DMA) is a state official who oversees the eGov implementations in all ULBs.

5.3.1 Department Hierarchy

Figure 5.4: Department Hierarchy

Within a state, the DMA, who provides state-level oversight for support services in the municipalities, manages the Additional Director, Joint Directors, and Assistant Directors who oversee various aspects of all municipalities. Then, each municipal corporation houses a commissioner who, with the ULB mayor, provides administration and governance of the operations of each district. Each ULB is assigned a commissioner depending on which district it resides in. The Commissioner defines access controls for employees and can monitor employee performance. Within every ULB, there exist the Administration, Revenue, Accounts, Public Health and Sanitation, Engineering, Town Planning, and Poverty Alleviation departments. Each department is responsible for processing certain modules classified under Expenditure and Revenue. All departments, however, are responsible for the Public Grievance module, depending on factors discussed in Section 6.1.3.

5.3.2 Classification of eGov Modules

Of the four categories of eGov modules, the Revenue and Citizen Services modules are public-facing and relevant to citizen data collection and citizen service delivery. This subset of modules can be grouped into four types based on what kind of information they may reveal about a citizen. First, modules may be revealing of personal identity, as they contain highly sensitive personal information. Birth and Death Registration, Marriage Licenses, and the Citizen Portal fall into this category. Second, modules such as Water Charges, Property Taxes, and Building Approval are revealing of personal assets. Third, the Trade License and Advertisement Tax modules are examples of modules that are revealing of a citizen's commercial assets. Lastly, the Public Grievances module forms its own category, as its function does not necessarily require citizens to reveal sensitive personal information, and it affects public infrastructure.
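The four-way grouping above can be captured as a small lookup table. The sketch below is illustrative only: the category keys are paraphrased from this section, and the module names are taken from the text rather than from eGov's actual codebase.

```python
# Hypothetical mapping of the public-facing eGov modules to the four
# revelation categories described in this section. Category labels are
# paraphrased from the text; this is not eGov's internal schema.
REVELATION_CATEGORIES = {
    "personal_identity": ["Birth and Death Registration", "Marriage Licenses", "Citizen Portal"],
    "personal_assets": ["Water Charges", "Property Taxes", "Building Approval"],
    "commercial_assets": ["Trade License", "Advertisement Tax"],
    "public_infrastructure": ["Public Grievances"],
}

def category_of(module: str) -> str:
    """Return the revelation category for a given module name."""
    for category, modules in REVELATION_CATEGORIES.items():
        if module in modules:
            return category
    raise KeyError(f"unknown module: {module}")
```

A table like this makes it easy to ask, for any module under study, what kind of citizen information its data fields may expose.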

5.3.3 Modules Selected for This Study

The Property Tax Module (PT), Water Charges Module (WT), and Public Grievance Module (PGR) were chosen for this study based on volume of data, accessibility, and prevalence. These modules were among the first to be implemented at our site in 2016. As a result, the volume of transactions for each of the modules exceeded 100,000 in 2018. This combination also lets us consider various different types of potential revelation of citizens' personal information. We will examine the nature of this information in further detail in the next few sections.

Chapter 6

Data Available for Analysis

6.0.1 Workflows

For each of the modules, we gathered data and information in three parts. To understand the workflow, we first interviewed eGov's team for the state as well as state officials about the workflow of each module. Each of the three selected modules has its own workflow. Once a citizen submits a form or a request, all of the information that they have submitted is passed through various levels of hierarchy in the appropriate department within a certain number of days. These durations, called Service Level Agreements or SLAs, are unique to the Indian state where we carried out our site visits. If an official does not complete his task within the given SLA, then the task is escalated to the next level in the hierarchy. This accountability model promotes transparency and improves efficiency. We developed an understanding of how the data collected from a citizen is used and passed through a department to provide a particular service, and how long it takes to do so in comparison to the SLA for that service. This information is important in detecting possible efficiency and privacy trade-offs while providing a service.
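The escalation rule described above can be sketched as a simple check. The function and the hierarchy names below are hypothetical, invented to illustrate the accountability model; the actual eGov implementation was not examined at the code level.

```python
from datetime import date, timedelta

def escalate_if_overdue(received: date, today: date, sla_days: int,
                        hierarchy: list, level: int) -> int:
    """Return the hierarchy level that should now hold the task.

    If the functionary at `level` has exceeded the SLA window, the task
    escalates one step up the department hierarchy. This is a sketch of
    the accountability model described in the text, not eGov's logic.
    """
    deadline = received + timedelta(days=sla_days)
    if today > deadline and level + 1 < len(hierarchy):
        return level + 1  # escalate to the next level in the hierarchy
    return level  # still within SLA (or already at the top level)
```

For example, with a 7-day SLA, a task received on January 1 and still pending on January 10 would move from the first functionary to the next level.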

6.0.2 Binary and Qualitative Matrices

The second type of data we needed was a matrix of how each data field for each service is used. We conducted on-site interviews with service functionaries, the state employees that complete certain tasks in the service workflows. During the interviews, a functionary from each level of the workflow was asked to identify the data fields that they were given access to, the data fields that they needed to complete their task in the workflow, and the data fields that were not necessary to complete their task but useful to have access to. During the interviews, we built two matrices to structure this data. In both matrices, the rows contain the data fields collected for a particular service and the columns contain the name of each functionary in the workflow and their tasks. In the Necessary Data Matrix (NDM), cells are filled with "1" if a particular functionary uses a particular data field to complete his task, or "0" otherwise. In the Operational and Administrative Matrix (OAM), a cell is assigned "O" if the data field is used for operational purposes, meaning it is absolutely required for a particular functionary to complete his task. The cell is denoted "A" if the functionary may use the field for efficiency purposes but it is not required to complete his task. The cell is filled with "U" if the field is not necessary or used by the functionary at all to complete his task. We assume the NDM and OAM are identical across all ULBs. The state has guidelines on how each service is performed and how the outcome of each service is determined. For services like WT and PT, there is a master sheet that is filled with relevant information, which then calculates the respective tax assessment. The data fields collected and service workflows are the same across all ULBs. The master sheet is also identical across all ULBs. In smaller ULBs, one functionary may be responsible for a couple of tasks that are spread out across multiple functionaries in larger ULBs. Although these small variations in responsibilities exist, we assume the tasks performed, data access, and data use are uniform.
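As a concrete, illustrative sketch of the two matrices (the field and functionary names here are hypothetical, and we assume the NDM's "1" corresponds to the OAM's operationally required "O" entries), the NDM can be derived from the OAM:

```python
# OAM: rows are data fields, columns are functionary tasks.
# "O" = operationally required, "A" = useful for efficiency, "U" = unused.
oam = {
    "Owner Name":    {"Jr./Sr. Assistant": "O", "Bill Collector": "O", "Commissioner": "A"},
    "Email Address": {"Jr./Sr. Assistant": "U", "Bill Collector": "U", "Commissioner": "U"},
    "Floor Details": {"Jr./Sr. Assistant": "O", "Bill Collector": "A", "Commissioner": "U"},
}

def oam_to_ndm(oam):
    """Binary NDM: 1 where the field is required to complete the task."""
    return {field: {task: int(use == "O") for task, use in row.items()}
            for field, row in oam.items()}

ndm = oam_to_ndm(oam)
```

Under this assumption the NDM carries strictly less information than the OAM, which is why both matrices are kept.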

6.0.3 2018 Data

The final source of data is the 2018 data collected by the state through eGov’s modules. With the permission of eGov and our partner state, we were given access to the data collected by all New Water Tap Connection applications, New Property Tax Assessment applications, and Public Grievance Redressals in 2018 for all 112 ULBs. The data includes all data fields collected by these services as well as details of the workflow transitions for each application. The workflow transition documentation includes when a particular application was received by a functionary, the time it took for the functionary to complete his task, the functionary’s comments, and the state-mandated completion deadline for the application. Such granular data on the movement of applications through workflows was essential for our results and analysis.

6.1 Service Modules Data

The Property Tax (PT) and Water Charges modules offer various services. For example, in the Water Charges Module, citizens can apply for New Connection, Re-Connection, Closure of Connection, and so on. For both modules, the New Property Tax Assessment and New Water Tap Connection applications and workflows are fairly representative of, and the most comprehensive in terms of collected data fields among, all of the services in their respective modules. As such, we refer to the New Property Tax Assessment and New Water Tap Connection workflows as the generalized Property Tax Module and the Water Charges Module, respectively.

6.1.1 Property Tax Module

The Property Tax Module includes services to evaluate property tax or change property tax. While the representative service is New Property Tax Assessment, the module also includes services like Transfer of Title, Bifurcation, Addition/Alterations, Revision Petitions, Demolitions, and so on. The module requires the applicant to give owner details, property address details, assessment details, amenities, construction details, floor details, details of surrounding boundaries of the properties, court documents, and vacant land details if applicable.

Workflow

The quantitative evaluation of property tax payment depends on the Usage, Classification, Zone, Age, and Occupancy Type data fields. Application particulars, such as contact details and address, are important in verifying personal identity and assets. Once a citizen submits an evaluation request, the data is verified by a Junior/Senior Assistant, then sent to a Bill Collector and Revenue Inspector who verify details and conduct site visits. A Revenue Officer validates the evaluation, at which point the application must be approved at the Commissioner level in order to be complete. In smaller ULBs, two or more of these functions may be completed by the same official. In larger ULBs, the process may be less uniform, so that work is spread across multiple officials in the same level of hierarchy. The workflow for processing a property tax assessment application is described in Figure 6.1.[3] All data is first collected from the citizen through an online portal, the Citizen Service Center’s (CSC) physical location, or the state’s online app. Along the rows we see the various functionaries, including Jr./Sr. Assistant, Bill Collector, Revenue Inspector, Revenue Officer, and Commissioner. The green boxes describe the tasks each functionary is responsible for, and the arrows indicate the order of operations.

Figure 6.1: New Property Tax Assessment Workflow[3]
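The workflow in Figure 6.1 can be sketched as an ordered chain of functionaries, each with an SLA; tasks that overrun escalate to the next level. The task names and SLA day counts below are illustrative assumptions, not the state's published values.

```python
# Ordered (functionary, task, SLA-in-days) steps of the PT workflow.
WORKFLOW = [
    ("Jr./Sr. Assistant", "verify application details", 3),
    ("Bill Collector",    "verify details and site visit", 5),
    ("Revenue Inspector", "site inspection", 5),
    ("Revenue Officer",   "validate evaluation", 3),
    ("Commissioner",      "final approval", 2),
]

def sla_breaches(days_taken):
    """Return the (functionary, task) steps whose SLA was exceeded;
    each such task would escalate to the next level of the hierarchy."""
    return [(who, task)
            for (who, task, sla), days in zip(WORKFLOW, days_taken)
            if days > sla]
```

For example, an application that spent seven days with the Bill Collector would register a single breach at that step.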

Binary and Qualitative Matrices

The NDM and OAM matrices were collected on the ground through interviews with ULB functionaries as well as eGov site specialists. The full matrices for our state of study are shown in Appendix C.

2018 Data

We were given access to all New Property Tax Assessment data from 2018. In 2018, 201,458 applications were processed, which overall went through 931,393 workflow transitions. As of the beginning of 2019, the state had processed 366,711 applications, which had undergone 1,986,969 transitions in total. The fields which were collected and their descriptions are shown in Figure C.6.

6.1.2 Water Charges Module

The Water Charges module, which the Engineering department manages, includes services to evaluate or change water tax payments. In our study, we analyze applications for New Water Tap Connection, as this service is fairly representative of, and the most comprehensive in terms of collected data fields among, all of the services in the Water Charges Module. Other services in the module include Change of Usage, Closure of Connection, Re-connection Service, and Additional Water Tap Connection. Similar to the PT module, application particulars are necessary as well for verification.

Workflow

The fields that are essential for the evaluation of water tax are Zone, Usage Type, Water Source, Pipe Size, and, where applicable, the White Ration Card. If the resident holds a White Ration Card, they are eligible for subsidies. In that case, the name and address become important for verification purposes, to confirm that the person holding the White Ration Card is the one living at the property. The Property Assessment ID must also be provided in the application, through which all of the information from that Property Tax (PT) assessment is available to the officials in the Water Charges workflow. The workflow generally looks similar to the PT module: a Junior/Senior Assistant verifies application details, an Assistant Engineer does a field verification and feasibility testing, a Deputy Executive Engineer/Executive Engineer/Superintendent Engineer scrutinizes the estimation details, and the Commissioner approves the evaluation.
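The subsidy-verification logic described above can be sketched as follows; the field names (`name`, `address`, `holder_name`) are illustrative, not the module's actual schema.

```python
# Sketch of the White Ration Card check: a subsidy applies only when the
# card exists and its holder's name and address match the application.
def subsidy_eligible(application, ration_card):
    if ration_card is None:          # no card: no subsidy
        return False
    return (application["name"] == ration_card["holder_name"]
            and application["address"] == ration_card["address"])
```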

Figure 6.2: New Water Tap Connection Workflow[3]

Binary and Qualitative Matrices

The NDM and OAM matrices were collected on the ground through interviews with ULB functionaries as well as eGov site specialists. The full matrices are shown in Appendix D.

2018 Data

In 2018, 101,849 New Water Tap Connection applications were processed, which overall went through 747,521 workflow transitions. As of the beginning of 2019, the state had processed 160,809 applications, which had undergone 1,192,056 transitions in total.[3] The fields which were collected and their descriptions are shown in Figure D.3.

6.1.3 Public Grievances Module

The Public Grievance Redressal (PGR) Module allows citizens to submit a complaint to the municipality about sanitation issues, stray animals, illegal businesses, non-functioning street lights, concerns regarding schools, voter lists, and so on. Each complaint is mapped to an internal department and an official in that department. Once the complaint is submitted and reaches an official in the relevant department, the official has an SLA for that concern by which he must address the issue. If an official does not address the concern within the given SLA, then the task is escalated to the next level in the hierarchy. This accountability model promotes transparency and improves efficiency. A comprehensive list of complaint types and corresponding SLAs is outlined by the state, as shown in Figure A.1.

Workflow

The workflow and escalation of tasks depend on the complaint and the department to which the complaint is assigned. Unlike Property Tax and Water Charges, each PGR complaint can be addressed and completed by one functionary. The module maps to a municipal administration department depending on the type of grievance submitted. This module requires the citizen to input their contact details and grievance details, including the location of the grievance and photos of the complaint if relevant. Depending on the type and geographical area of the complaint, the back-end maps the complaint to a functionary. If the functionary exceeds the SLA for that complaint, then the complaint escalates to his superior according to the escalation hierarchy shown in Figure 6.3.
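The routing-and-escalation behavior above can be sketched as a small lookup plus an escalation rule. The complaint types, departments, SLA values, and functionary titles below are hypothetical stand-ins, not the state's actual mapping.

```python
# Complaint type -> (department, SLA in days); escalation walks up the
# hierarchy one level for every full SLA period that lapses.
ROUTING = {
    "street light not working": ("Engineering", 7),
    "stray animals":            ("Health & Sanitation", 3),
}
HIERARCHY = ["Ward Officer", "Municipal Engineer", "Commissioner"]

def current_assignee(complaint_type, days_open):
    dept, sla = ROUTING[complaint_type]
    level = min(days_open // sla, len(HIERARCHY) - 1)
    return dept, HIERARCHY[level]
```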

Binary and Qualitative Matrices

The NDM and OAM matrices were collected on the ground through interviews with ULB functionaries as well as eGov site specialists. The full matrices are shown in Appendix E.

2018 Data

In 2018, 135,242 PGR submissions were processed, which overall went through 654,418 workflow transitions. As of the beginning of 2019, the state had processed 265,192 submissions, which had undergone 1,307,552 transitions in total.[3] The fields which were collected and their descriptions are shown in Figure E.5. PGR has not yet been implemented for all departments in all ULBs.

Figure 6.3: PGR Escalation Hierarchy[3]

In the eGov data we received, ULBs had a non-trivial number of transactions for the Revenue Department, Administration Department, Town Planning Department, and Urban Poverty Alleviation Department. In this study we analyze complaints from only these four departments.

Chapter 7

Understanding Data Use and Data Disclosure

To maximize privacy, it seems obvious that municipalities should collect only the data that they need to determine a tax assessment or provide a particular service. However, why is this not necessarily the case? The current online forms were transcribed from the previously existing paper forms. Pre-digitization, as much information as possible was collected from the citizen, even if some fields seemed unnecessary. Privacy was still conserved because the data was not easily searchable, and citizens were fine with giving up more information so that they would not have to spend more time and money to return to the municipal office again to give more details. In the digital era, where data is searchable and more easily accessible, collecting unnecessary data jeopardizes personal privacy. From talking with various officials and gaining an in-depth understanding of the workflow, we understand that all citizen data collected for each module can be categorized broadly into three categories: necessary for completing the function of the module, collected for efficiency/accuracy of the workflow, and unnecessary to the module (see Figure 7.1). Necessary use means that a data field is either required for a quantitative valuation or for personal identity or asset verification purposes. In Water Charges, for example, the attributes Property Type, Usage Type, Pipe Size, Water Source Type, Connection Type, and Address are the only factors used to quantitatively calculate the water tax. Attributes such as Property Assessment Number, White Ration Card, and Name of Applicant are used to verify details of the request, and so are important to validate the request. Data fields collected for efficiency/accuracy are not strictly needed to complete the function of the module, but give officials a clearer picture of the application. Some of these attributes include information that can be observed on site, such as amenities or details of surrounding areas for the Property Tax module.
Other information makes establishing contact with the applicant easier, such as the name and phone number of the citizen filing a public grievance. Lastly, there are data fields that are unnecessary to complete a function, but are still collected. In PGR, for example, the address of the complaining citizen is not necessary either to contact him or to address the grievance. For the Water Charges and Property Tax modules, email is not used to contact or give updates to the citizen, as the status of an application is communicated verbally or through the citizen portal or app.

7.1 Understanding Implications of Loss of Data Integrity

In this section we focus on several sorts of implications that derive from loss of data integrity. We begin by considering privacy implications and then consider the functional and financial implications of loss of data integrity. The reasons for loss of integrity are left to future work.

7.1.1 Privacy Analysis I: Loss of Data Integrity

According to NIST,[12] the loss of data integrity is defined as data being altered in an unauthorized manner during storage, processing, or transit. We sought to understand, for each data field in the three modules, the implications if that field's integrity is lost. We determined three types of implications that the loss of integrity may have: cultural, financial, and functional. Cultural implications relate to what social inferences can be made about a person from a particular data field. There is a financial implication if the citizen is affected financially, and a functional implication if the function of the module is unfulfilled. Due to the size of this detailed analysis, we summarize it here.

7.1.2 Functional and Financial Implications of Loss of Data Integrity

For each data field in each of the modules, we evaluate what kind of implication may result if only that particular data field were altered in an unauthorized manner. For example, if only a person’s name in the Water Charges module was changed, and the name on the application no longer matches the name in the property assessment, then the water tax evaluation would be stalled. As a result, the loss of integrity of the Name attribute in the Water Charges module would have a functional implication. In the case of Water Charges and Property Tax, financial implications and functional implications go hand in hand. If a tax evaluation is incorrect, then the citizen is financially affected. From our analysis, we observe that the loss of integrity is likely to have functional and financial implications. The loss of integrity can affect the verification of personal identity or assets, affect the quantitative valuation of the water or property tax, or hinder efficiency. As described in Section 6.2, necessary data fields can be separated into those required for verification and those required for the quantitative calculation that is the output of a module. If the integrity of either type of necessary

Figure 7.1: Table of Fields Necessary, Collected for Efficiency/Accuracy, and Unnecessary for Completion of PT Assessment, WT Assessment, and PGR

field is lost, then by definition the function of the module cannot be completed. There may be an over-valuation or under-valuation of a property or water tax if the factors determining them are changed. This poses a financial implication. A functional implication is the function being stalled if the identifying information of a person, his assets, or a public grievance is inaccurate. If the data fields collected for efficiency/accuracy are corrupted, then the function of the module may not be stalled, but it may be severely hindered. If a Grievance Photo were changed to a different image, a functionary would still be able to find the location of the grievance through the Grievance Details or by contacting the complainant, but his job would likely be severely sidetracked by an irrelevant image. Therefore, we understand that protecting the integrity of citizen data fields is important in order to protect against harmful functional and financial implications.

7.2 Understanding the Implications of Data Disclosure on Privacy

In this section we consider the effects of data disclosure on privacy. We begin by considering who will be impacted and the relative severity of that impact. We then discuss the financial and cultural implications, and conclude this section with a discussion of the challenges of inference over the grievance data.

7.2.1 Privacy Analysis II: Loss of Data Confidentiality

In parallel with analyzing for loss of integrity, we analyzed the data types for loss of confidentiality. According to NIST, the loss of data confidentiality means that protected data is accessed by or disclosed to an unauthorized party. For each data field, we evaluated what kind of implication may arise from the data field being exposed. At a high level, we can separate two dimensions of a citizen service: the type of service being requested/offered, and who the service is being offered to or offered by. The implications of the loss of confidentiality are gravest when both aspects are leaked, meaning an unauthorized person knows not only what the service being offered is, but also who is being served. The implications can be functional, where the outcome of the service is affected, or cultural, where political, social, or economic inferences can be made about the citizen.

7.2.2 Financial and Cultural Implications of Voluntary or Involuntary Data Disclosure

The next question we asked of the data was whether we could understand and categorize the types of implications from loss of confidentiality of the data. When we analyze the types of grievances and their implications, we find that some are solely financial. Figure B.1 presents a subset of these that we found among the data. Typically, these complaints were made either by individual citizens or by a business, but about other businesses; we note their financial implications. If a restaurant has complaints about incorrect slaughtering or garbage disposal, whether or not they are true, the restaurant may suffer financially. In contrast, we also found some kinds of grievances that were solely cultural. If a citizen complained about noise at night, it was likely to be about another citizen at home or walking down the street. As a third category, we found some complaints that had both financial and cultural implications. Examples of these can be found in Figure B.2. Noting that, generally, financial-only losses would derive from a complaint against a business, a citizen complaining about another citizen would lead to a cultural loss, and a business complaining about a citizen would lead to a combination of cultural and financial loss, we tabulated the number of each type of relationship, as shown in Appendix B. In addition, in that figure we noted the number of complaints where the respondent to the complaint would be the ULB, noting that exposure there would have neither cultural nor financial implications directly.

7.2.3 Grievance and Inference

For the PGR module, the types of inferences that can be made from the loss of confidentiality can be determined through an understanding of the actors and their roles. For a public grievance, there is a complainer and the entity that must take action to fix the complaint. The complainer can be either a citizen or a business, and the entity who must take action to fix the problem can be another citizen, a business, or the municipality. For example, a citizen may submit a complaint about the non-functioning of street lights. In that case, the municipal administration must take action. However, if the complaint is about the illegal slaughtering of animals, then the complainer could have been a citizen who was affected by the business or a legal competing business. The entity that must fix the issue is the illegal business. From analyzing the two actors for the 110 types of public grievances published by the state, we found that we can assign certain types of implications depending on who the two actors are. We notice that a citizen-on-citizen complaint has cultural implications, a citizen-on-business or business-on-business complaint has financial implications, and a business-on-citizen complaint has cultural and financial implications. As shown in Figure B.3, most complaints must be fixed by the ULBs, likely regarding public infrastructure and health and sanitation issues. The majority of complaints that have implications come from complaining about a business, resulting in financial implications. If a business is affected by a complaint and the entity who has complained is exposed, the complainer may endure financial consequences due to bias against them from the business. In contrast, complaints against citizens tend to result in cultural implications, as the person against whom the complaint has been lodged may develop political, social, or other types of cultural biases against the complaining entity.

...

Chapter 8

Synthesis

We now synthesize the analysis above into two indices that help measure the efficiency of governance and information privacy. Defining these indices helps in highlighting the tension that arises when trying to maximize both indices simultaneously. This tension then carves out a space where innovation can help achieve a high level of governance efficiency, information privacy, and transparency.

8.1 Government Efficiency Index

The Government Efficiency Index (GEI) is defined as the product of the timeliness and accuracy parameters for a given service or entity. GEI is constructed such that it ranges from 0 to 1, where a value of 1 denotes the highest level of governance efficiency.

8.1.1 Timeliness of Service

The definition of Timeliness of Service rests upon when a service is considered timely. We consider a service timely when it is delivered on or before the desired Service Level Agreement (SLA). The ULBs in India publish an SLA for each service they offer, as promised by the Citizen’s Charter.[2] The Timeliness of Service component is measured as follows: for a given service, it is measured by the fraction of times the service is delivered on or before the SLA over a given unit of time (e.g., hour, day, month). For a given group (a division within a ULB, or the ULB as a whole), Timeliness of Service is measured by averaging the timeliness of the services delivered by the group over a given unit of time. Timeliness of Service can be computed at the level of a functionary, a given service, or all services offered by an entity (e.g., all services of a division within a ULB, all services of the ULB, all services of the ULBs in a given block, all services of ULBs in a given state). Accordingly, the equations for it come in three flavors.


Some notation that is important to note:

1. A task $k_i$ is a multi-set consisting of the times it has taken for every instance of that task to be completed.

2. For one particular service, there are $n$ tasks that need to be completed: $K = \{k_1, k_2, \ldots, k_n\}$.

3. Each functionary $f^{(j)}$ involved in the service completes some partial set of tasks from $K$. We denote the set of tasks $f^{(j)}$ must complete as $K^{(j)}$. The union of the $K^{(j)}$ equals $K$. The set of all $f^{(j)}$ is $F$.

Timeliness of Service at the level of a functionary

Each functionary’s timeliness can be described by the proportion of tasks that the functionary has completed within the task’s SLA as prescribed by the state’s SLA guidebook, the Puraseva User Manual.

$$k_i = \{\text{time to complete instance}_1, \ldots, \text{time to complete instance}_m\} \tag{8.1}$$

Each task $k_i$ also has an $SLA_{k_i}$ associated with it according to the Puraseva User Manual.

Therefore, the timeliness of a given task $k_i$ is:

$$t_{k_i} = \frac{\|\{k_i \mid k_i \le SLA_{k_i}\}\|}{\|k_i\|} \tag{8.2}$$

A functionary is usually responsible for one task, although some functionaries are responsible for more. Therefore, the timeliness of a functionary $j$ can be described as his average timeliness across all of the tasks he is responsible for, $K^{(j)}$:

$$t_f^{(j)} = \frac{1}{\|K^{(j)}\|} \sum_{k \in K^{(j)}} t_k \tag{8.3}$$

Timeliness of Service at the level of a service

The timeliness of a service is described by the proportion of instances in which the service was completed by the SLA for that service. We can calculate this by dividing the number of timely instances by the total number of instances for a given service. Suppose the set $s_i$ consists of the completion times of all instances of a service $i$:

$$s_i = \{\text{time to complete service instance}_1, \text{time to complete service instance}_2, \ldots\} \tag{8.4}$$

We can denote the timeliness of a service $s_i$ with:

$$t_s = \frac{\|\{s_i \mid s_i \le SLA_{s_i}\}\|}{\|s_i\|} \tag{8.5}$$
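A minimal executable sketch of the timeliness computations in Eqs. 8.2–8.5: the task and service levels both reduce to a "fraction within SLA" rule, and the functionary level averages over owned tasks. The completion times and SLA values below are illustrative.

```python
# Fraction of completion times on or before the SLA (Eqs. 8.2 and 8.5).
def timeliness(times, sla):
    return sum(t <= sla for t in times) / len(times)

# A functionary's timeliness: average over the tasks he owns (Eq. 8.3).
# `tasks` maps task name -> (multiset of completion times, SLA in days).
def functionary_timeliness(tasks):
    return sum(timeliness(ts, sla) for ts, sla in tasks.values()) / len(tasks)

tasks = {
    "verify details": ([2, 4, 6], 5),  # 2 of 3 instances within the SLA
    "site visit":     ([3, 3], 5),     # all instances within the SLA
}
```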

Timeliness of Service at the level of an entity

The timeliness of service for an entity is the proportion of all services that the entity offers that were completed in a timely manner. All services that an entity offers can be denoted by $S = \{s_1, s_2, s_3, \ldots\}$. We can calculate the timeliness at the entity level by averaging the timeliness of all services within the given entity. We assume each service is just as important as the next, so we do not give a different weight to each service while calculating the timeliness for the entity.

$$t_e = \frac{1}{\|S\|} \sum_{s \in S} t_s \tag{8.6}$$

8.1.2 Accuracy of Service

The definition of Accuracy of Service rests upon when a service is considered accurate. We consider a service accurate when the right service is delivered to the right person without any rework. The Accuracy of Service component is measured as follows: for a given service, it is measured by the fraction of times the service is delivered without rework over a given unit of time (e.g., hour, day, month). For a given group (a division within a ULB, or the ULB as a whole), Accuracy of Service is measured by averaging the accuracy of the services delivered by the group over a given unit of time. Accuracy of Service can be computed at the level of a service or an entity as previously described. Accordingly, the equations for it come in two flavors.

Accuracy of Service at Service Level

The accuracy of a service at the service level can be computed by dividing the total number of times the service was completed accurately by the total number of times the service was completed. The number of services completed accurately is equal to the number of times the service was completed, $n$, minus the number of times the completed service was petitioned by the citizen and resulted in a change. A proxy for this latter value is the number of Revision Petitions $r$ that resulted in a change in some service. So for a given service $s$ we can calculate the accuracy of the service at the service level as follows:

$$a_s = 1 - \frac{r}{n} \tag{8.7}$$

Accuracy of Service at Entity Level

The accuracy of a service for an entity can be described by the average accuracy of all services $S$ offered by the entity:

$$a_e = \frac{1}{\|S\|} \sum_{s \in S} a_s \tag{8.8}$$
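The accuracy parameters in Eqs. 8.7–8.8 and the resulting GEI (the product of the timeliness and accuracy parameters) can be sketched as follows; all counts below are illustrative.

```python
# a_s = 1 - r/n: share of completions that triggered no revision (Eq. 8.7).
def service_accuracy(n_completed, n_revised):
    return 1 - n_revised / n_completed

# Unweighted average over all services the entity offers (Eq. 8.8).
def entity_accuracy(per_service):
    return sum(per_service) / len(per_service)

# GEI is the product of the two parameters; both lie in [0, 1], so GEI does too.
def gei(timeliness, accuracy):
    return timeliness * accuracy
```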

8.2 Informational Privacy Index

IPI is constructed such that it ranges from 0 to 1, where a value of 1 denotes the highest level of information privacy. We define Right Collection as the collection of only those data fields that are necessary for delivering the service. In other words, without collecting these data fields, the requested service cannot be delivered. Right Collection is measured for a given service, or for services offered by a given group, as Necessary Data Fields / Total Data Fields Collected. We define Right Use as access to data fields only by those (in the ULB) who need them for delivering the service. Right Use is measured for a given service, or for services offered by a given group, as Number of Data Fields To Which Access Is Necessary / Number of Data Fields To Which Access Is Granted. We define Right Disclosure as public disclosure of data fields in a manner that protects personal identity and prevents undesirable inference. Right Disclosure is measured for a given service, or for services offered by a given group, as 1 − (Number of Data Fields With PII or Undesirable Inference Disclosed / Total Number of Fields with PII or Undesirable Inference). IPI is determined based on the analysis of the data collection, use, and disclosure policies of ULBs. The real-time value of IPI will rest upon the frequency and types of service requests a ULB serves. Notation:

1. X is the set of data fields collected for a particular service.

2. $X_n^{(i)}$: the set of data necessary for a task $k_i$ to be completed in the workflow of a service. $X_n = \bigcup_i X_n^{(i)}$ is the set of all data necessary for a service to be completed.

3. $X_e^{(i)}$: the set of data collected for the efficiency/accuracy of a task $k_i$ in the workflow of a service. $X_e = \bigcup_i X_e^{(i)}$ is the set of data collected for the efficiency/accuracy of a service.

4. $X_u^{(i)}$: the set of data collected generally for legacy reasons and unnecessary for a task $k_i$ in the workflow of a service to be completed. $X_u = \bigcup_i X_u^{(i)}$ is the set of data that is collected but unnecessary for a service to be completed.

5. $X_a^{(i)}$: for each task $k_i$, the set of data fields the functionary responsible for $k_i$ has access to. A functionary currently has access to all of the collected data fields, so $X_a^{(i)} = X$ for all $i$.
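The three IPI parameters defined verbally above can be sketched directly over these sets; the field names below are illustrative, not the modules' actual schemas.

```python
# Right Collection (Eq. 8.9), Right Use (Eqs. 8.12-8.13), and Right
# Disclosure (Eq. 8.17), each computed over sets of field names.
def right_collection(necessary, collected):
    return len(necessary) / len(collected)

def right_use(necessary, accessible, efficiency=None):
    # Eq. 8.12; passing `efficiency` applies the half-credit variant (Eq. 8.13)
    u = len(necessary) / len(accessible)
    if efficiency is not None:
        u += 0.5 * len(efficiency) / len(accessible)
    return u

def right_disclosure(pii_exposed, pii_collected):
    return 1 - len(pii_exposed) / len(pii_collected)

X  = {"name", "address", "phone", "pipe_size", "zone", "email"}
Xn = {"name", "pipe_size", "zone"}   # necessary fields
Xe = {"phone"}                       # efficiency/accuracy field
```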

8.2.1 Right Collection Index

The extent of right collection is determined by calculating the proportion of the collected data fields that are actually necessary for the completion of the task or service. Since each task has specific

data that it requires to be completed, we compute right collection at the task level as opposed to the functionary level. We can also compute right collection at the service level as well as the entity level.

Task Level

The right collection index at the task level, $c_{k_i}$, is the number of necessary data fields for that particular task divided by the total number of data fields collected. So for a task $k_i$ the right collection is:

$$c_{k_i} = \frac{\|X_n^{(i)}\|}{\|X\|} \tag{8.9}$$

Service Level

The right collection index at the service level is computed by dividing the number of necessary fields collected by a particular service by the total number of fields collected by that service.

$$c_s = \frac{\|X_n\|}{\|X\|} \tag{8.10}$$

Entity Level

The right collection index for an entity is measured by the average of the right collection indices for all the services $S$ the entity offers.

$$c_e = \frac{1}{\|S\|} \sum_{s \in S} c_s \tag{8.11}$$

8.2.2 Right Use

The right use index measures the extent to which access to data fields is given only to those (in the ULB) who need it for delivering the service.

Task Level

We can compute the right use index $u_k^{(i)}$ for each task by dividing the number of fields necessary to complete a given task by the total number of fields the functionary completing that task is given access to. Ideally, a functionary should only have access to the fields that are required for him to complete a given task, giving a right use index of 1.

$$u_k^{(i)} = \frac{\|X_n^{(i)}\|}{\|X_a^{(i)}\|} \tag{8.12}$$

The above calculation fully penalizes the index for a functionary having access to $X_e$. An alternate calculation of the right use index that gives partial credit for the right use of $X_e$ could be as follows:

$$u_k^{(i)} = \frac{\|X_n^{(i)}\|}{\|X_a^{(i)}\|} + \frac{1}{2} \frac{\|X_e^{(i)}\|}{\|X_a^{(i)}\|} \tag{8.13}$$

Service Level

The right use index for a given service can be described either as the average or as the product of the right use indices for all tasks involved in completing that service. While averaging over all $u_k^{(i)}$ would give a general indication of right use for a service, it would not give a nuanced index across all services and all ULBs, as the right use index for tasks within a service may vary greatly. The average right use index for a particular service $s$ is computed as follows:

u_s = \frac{1}{\|K\|} \sum_{k \in K} u_k \qquad (8.14)

The product of the right use indices of all tasks for a particular service s is computed as follows:

u_s = \prod_{k \in K} u_k \qquad (8.15)

Entity Level

The right use index for a given entity can be similarly described: we can take the average or the product of the right use indices for all services the entity offers. The average right use index for an entity is as follows:

u_e = \frac{1}{\|S\|} \sum_{s \in S} u_s \qquad (8.16)

8.2.3 Right Disclosure

The right disclosure index describes how well personal identity and undesirable inferences are protected against public disclosure. This parameter is defined as the proportion of fields considered PII that are not publicly open, at each level. In our case study, home address and mobile phone number are considered PII, as defined by eGov.
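As a concrete sketch, the per-task right collection, right use, and right disclosure ratios and their higher-level averages could be computed as below. All field names and index values here are hypothetical illustrations, not taken from the eGov schema:

```python
import math

# Hypothetical field sets for one task (not the actual eGov schema).
necessary = {"owner_name", "plot_area", "ward"}                          # X_n: fields the task needs
collected = {"owner_name", "plot_area", "ward", "mobile"}                # X_k: fields actually collected
accessible = {"owner_name", "plot_area", "ward", "mobile", "address"}    # X_a: fields the functionary can see
pii_collected = {"mobile", "address"}
pii_exposed = {"mobile"}                                                 # publicly disclosed PII

c_k = len(necessary) / len(collected)              # right collection (Eq. 8.9): 3/4 = 0.75
u_k = len(necessary) / len(accessible)             # right use (Eq. 8.12): 3/5 = 0.6
d_k = 1 - len(pii_exposed) / len(pii_collected)    # right disclosure (Eq. 8.17): 1 - 1/2 = 0.5

# Service-level right use: average (Eq. 8.14) or product (Eq. 8.15) over tasks.
task_u = [0.6, 1.0, 0.8]                           # hypothetical task-level indices
u_s_avg = sum(task_u) / len(task_u)
u_s_prod = math.prod(task_u)                       # 0.6 * 1.0 * 0.8 = 0.48

# Entity-level indices are plain averages over services (Eqs. 8.11, 8.16, 8.19).
def entity_index(service_indices):
    return sum(service_indices) / len(service_indices)
```

The product form of right use is the stricter aggregate: a single task with poor right use drags the whole service's index down, whereas the average can mask it.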

Functionary Level

At the functionary level, right disclosure is calculated by the proxy 1 − (data fields with PII or undesirable inference disclosed / total fields with PII or undesirable inference). For a given service, the right disclosure parameter will be the same across all functionaries; data fields disclosed publicly are obviously accessible by functionaries as well. However, it is useful to note the parameter at

the functionary level in order to calculate the IPI across functionaries in a given service. The right collection and right use parameters are not necessarily the same across all functionaries.

d_k^{(i)} = 1 - \frac{PII_{exposed}}{PII_{collected}} \qquad (8.17)

Service Level

The calculation and values for the right disclosure parameter at the service level will be the same as those at the functionary level. However, the parameter will vary across services, depending on the PII fields collected and exposed to the public.

d_s = 1 - \frac{PII_{exposed}}{PII_{collected}} \qquad (8.18)

Entity Level

At the entity level, the right disclosure parameter can be defined as the average of the right disclosure indices of all services.

d_e = \frac{1}{\|S\|} \sum_{s \in S} d_s \qquad (8.19)

Chapter 9

Results and Analysis

In this Chapter, we calculate and analyze GEI and IPI for New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal. We describe each ULB with a variety of metrics including but not limited to timeliness, weighted impact, inaccurate application duration rate, and average time to complete service. Ultimately, we determine top-performing ULBs according to GEI, IPI, and both. These ULBs will serve as case studies in determining factors that help maintain high GEI and IPI.

9.1 GEI Calculation

In this Section, we explore the distribution of timeliness and accuracy values for New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal across all ULBs. We often group ULBs by tier, as the varying volume of transactions and resources could affect GEI. We determine top-performing cities according to GEI, and argue that since there are ULBs within each tier that have consistently high GEI, other ULBs within each tier can also improve their efficiency.

9.1.1 Timeliness

We assess timeliness through several lenses. First, we observe the distribution of timeliness across all ULBs. We calculate adjusted timeliness values and the weighted impact of timeliness across all three tiers of ULBs for Property Tax Assessment, Water Charges, and Public Grievance Redressal. For Property Tax and Water Charges, we identify model cities by tier based on adjusted timeliness values and the quality of workflow transparency. Identifying top-performing ULBs is important in later determining why these ULBs in particular are able to maintain high levels of efficiency and perhaps even privacy. For PGR, we observe timeliness values at various levels. Second, we analyze the


distribution of timeliness values across all ULBs by department. Third, we observe which ULBs perform well in which departments. Overall, we determine model cities primarily by timeliness values across all departments and secondarily by the average difference between the SLA and the days it took to complete the set or subset of requests.

Adjusted Timeliness

Adjusted timeliness gives a more accurate measure of timeliness than perceived timeliness. Perceived timeliness is calculated as previously described. Application duration is defined as the difference in timestamps from the application entry until the acceptance or denial of the application. Perceived timeliness is the proportion of applications whose application durations are less than their respective SLAs. However, when observing the distribution of task durations, we find that a non-trivial number of applications have unrealistic application durations. Some applications were recorded as having been completed in less than one or two days. Others show entry and approval within seconds. We assume that these applications were received manually, proceeded through the workflow, and were accepted or denied manually. After the completion of the workflow, these applications were likely then entered into the ERP system. Subsequently, technical aides or administrators enter, digitally approve, and pass on the application through all steps of the workflow within a day. Applications digitally logged as completed in less than 1 day are likely to have been recorded in this inaccurate manner. Inaccurate application durations can significantly improve a ULB's timeliness value, leading to incorrect identification of model ULBs. To normalize for inaccurate recordings, we calculate and compare an adjusted timeliness parameter. For each service, adjusted timeliness is calculated by ignoring applications with application durations of less than a day and re-calculating timeliness as described in Section 8.1.1 with the remaining applications. Adjusted timeliness is calculated as follows:

t_{adj} = \frac{\|\{s_i \mid s_i \le SLA_s \wedge s_i \ge 1\}\|}{\|\{s_i \mid s_i \ge 1\}\|} \qquad (9.1)

Adjusted timeliness gives a more accurate measure of how timely applications are completed. It also allows for a fairer comparison between ULBs, penalizing cities that record large numbers of inaccurate application durations.
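The adjusted timeliness computation can be sketched as follows; the durations and SLA here are hypothetical values in days:

```python
# Sketch of adjusted timeliness (Eq. 9.1): discard sub-day durations as
# likely back-entered records, then take the share completed within the SLA.
def adjusted_timeliness(durations_days, sla_days):
    plausible = [d for d in durations_days if d >= 1]   # drop likely-inaccurate records
    if not plausible:
        return 0.0
    timely = sum(1 for d in plausible if d <= sla_days)
    return timely / len(plausible)

durations = [0.01, 0.5, 3, 8, 15, 30, 45]    # hypothetical application durations
print(adjusted_timeliness(durations, sla_days=30))   # 4 of 5 plausible -> 0.8
```

Note that the two sub-day records would have counted as timely under perceived timeliness, inflating the score to 6/7; dropping them is what penalizes back-entered applications.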

Weighted Impact

We calculate weighted impact to understand the magnitude of applications that are affected by the GEI for a particular ULB. We define the GEI Impact (GEImpact) of a ULB as the number of applications that on average are delivered in a timely manner for a particular service. As this number depends on the number of applications received by each ULB, the impact gives a sense of the magnitude of changes in timeliness. A large ULB with a lower timeliness value and a smaller ULB with a higher

timeliness value may have the same impact score. As such, comparing impact scores is not as useful to determine model ULBs overall. In particular, for Property Tax and Water Charges we calculate weighted impact. For a given ULB and service, weighted impact is the number of timely applications that can be expected given the adjusted timeliness value and total number of applications:

I_w^{(s)} = t_{adj}^{(s)} \cdot \|s\| \qquad (9.2)

For a given ULB, I_w^{(PGR)} is equal to the number of applications that had been completed by that ULB within each complaint's SLA period. For Property Tax and Water Charges, however, I_w^{(PT)} and I_w^{(WC)} will be smaller than the impact values calculated with perceived timeliness values. We calculate adjusted timeliness based on only the partial set of applications that were likely to have been recorded accurately. The weighted impacts are based on the actual volume of applications, but with adjusted timeliness values. As such, we interpret weighted impact as the projected impact of a service if the ERP system is used as intended.
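A small numerical sketch of Eq. 9.2, using made-up volumes, also illustrates why two very different ULBs can share an impact score:

```python
# Weighted impact (Eq. 9.2): projected number of timely applications if the
# adjusted timeliness held over the full (hypothetical) application volume.
def weighted_impact(t_adj, total_applications):
    return t_adj * total_applications

large_ulb = weighted_impact(0.50, 2000)    # lower timeliness, high volume -> 1000.0
small_ulb = weighted_impact(0.90, 1111)    # higher timeliness, low volume -> ~999.9
print(large_ulb, small_ulb)                # nearly identical impact scores
```

This is why impact alone cannot identify model ULBs: it conflates timeliness with volume.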

9.1.2 Property Tax Assessment

See Appendix F for a complete table of results.

Inaccurate Application Duration

Figure 9.1 shows the distribution of the percentage of inaccurately recorded transactions for each ULB by tier. The colors of the markers refer to the population tiers of the cities, and are consistent for each tier across all plots in this document. From Figure 9.1, we observe that while Tier 1 cities on average process more applications, they have a smaller spread of inaccurate application durations. The median of Tier 1, at around 7%, is the lowest among all tiers, and the median of Tier 2 cities is highest at about 29%. Tier 2 ULBs show the largest, nearly maximal, spread of percentages of inaccurate transactions. While the median for Tier 3 is around 20%, the upper 50% of Tier 2 and Tier 3 ULBs have a large spread. High rates of inaccurate workflow documentation seem to be largely uncorrelated with the number of applications processed by a ULB. In fact, ULBs with smaller volumes of applications seem to generally have a higher proportion of inaccurate records. This observation falls in line with what we were told during field visits.
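The per-ULB inaccuracy rate and tier medians discussed above can be computed along these lines; the ULB IDs, tier assignments, and durations below are illustrative, not the thesis data:

```python
from statistics import median

# (tier, application durations in days) keyed by illustrative ULB ID.
ulbs = {
    1013: (1, [0.2, 3, 9, 12]),
    1089: (2, [0.1, 0.3, 5, 20]),
    1151: (3, [0.5, 2, 7, 0.9]),
}

def inaccurate_rate(durations):
    """Share of applications recorded with an implausible (<1 day) duration."""
    return sum(1 for d in durations if d < 1) / len(durations)

rates_by_tier = {}
for tier, durations in ulbs.values():
    rates_by_tier.setdefault(tier, []).append(inaccurate_rate(durations))

tier_medians = {tier: median(rates) for tier, rates in rates_by_tier.items()}
print(tier_medians)   # {1: 0.25, 2: 0.5, 3: 0.5}
```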

Figure 9.1: Distribution of Inaccurate Application Durations for PT by Tier: This plot describes the distribution of percentage of inaccurately recorded transactions by tier of population. Each marker indicates a ULB, where the size of the marker indicates the volume of applications received and the color indicates the population tier to which that ULB belongs. The median of each tier is also marked in yellow and annotated with the value of the respective median.

During interviews with eGov staff, we were told that some ULBs were using the ERP system throughout the course of a service workflow as intended. In most ULBs, eGov had partnered with an NGO called Karvy to help municipal officials. Karvy employees are appointed to act as liaisons between eGov and lower-level officials. They train officials on how to use, navigate, and fulfill their responsibilities in the ERP system. When interviewing a Karvy employee on site, he explained that he inputs application information into the ERP system. As he also has access to all login credentials, he digitally processes applications in the system after they are manually approved, due to the poor technical skills of municipal staff. eGov officials confirmed that this is not an isolated case, and is in fact quite common. Some officials and ULBs feel that the barrier to learning is too high. As such, they resort to inaccurate recording techniques even if the integrity of the application process is upheld. Operating under this assumption, it is expected that ULBs with less technically trained staff depend on one or two data entry persons to digitally process applications after manual processing. It is important to deal with this issue in the long run if there is to be a commitment to actual efficiency and transparency. After disregarding inaccurate workflow transitions, the adjusted timeliness values were calculated.

Figure 9.2 shows a comparison of adjusted and unadjusted timeliness values by tier. The R-squared values for Tiers 1, 2, and 3 are 0.987392, 0.919681, and 0.741817, respectively. The farther the ULBs are from the y = x line, the greater the discrepancy between the adjusted and unadjusted timeliness values. The trendlines fitted in Figure 9.2 show how the adjusted timeliness values may be correlated with unadjusted timeliness. We can observe the overall trend in discrepancy within each tier as well.

Figure 9.2: PT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines.

The Tier 1 trend line is quite close to the y = x line, and the Tier 1 ULBs are clustered very closely around the line. Given our observations in Figure 9.1, this result is not surprising. Tier 2 and 3 ULBs are not as closely clustered around their trendlines. The general trend is that the unadjusted timeliness values are higher than the adjusted values, converging as adjusted timeliness increases. Tier 1 ULBs with high adjusted timeliness values have nearly equal unadjusted values. These observations indicate that ULBs with high unadjusted timeliness values are not necessarily recording applications inaccurately. Furthermore, we note that the size of the ULB need not affect the integrity of digital application processing. As can be seen in Appendix F.5, there exist ULBs that handle volumes proportional to their populations and yet have a low percentage of inaccurate workflow records. These two pieces of analysis indicate that it is possible for ULBs to have high timeliness, regardless of tier and volume of applications, while using the ERP system and digital workflow processing as intended.
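Since the trendlines are one-variable OLS fits, each tier's R-squared equals the squared Pearson correlation between adjusted and unadjusted timeliness. A sketch with made-up timeliness pairs:

```python
# R-squared of a simple (one-variable) OLS fit = squared Pearson correlation.
def r_squared(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

t_adj   = [0.40, 0.55, 0.70, 0.85, 0.95]   # hypothetical adjusted timeliness per ULB
t_unadj = [0.50, 0.62, 0.74, 0.88, 0.96]   # corresponding unadjusted values
print(round(r_squared(t_adj, t_unadj), 3))
```

A tier whose ULBs hug the y = x line, like Tier 1 here, will show an R-squared near 1; large, uneven inaccurate-recording rates, as in Tier 2, pull it down.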

Figure 9.3: PT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier

Adjusted Timeliness

Overall, we observe a unimodal distribution of adjusted timeliness values across all tiers, centered around 0.55 to 0.60. While a 55% timely service rate is not poor, we see examples of ULBs in all tiers that have reached 90%-100% timely service rates. Consequently, we determine that it is possible to optimize timeliness across all ULBs. We would expect that as the volume of applications a ULB has to handle increases, efficiency would be compromised due to limits on resources. However, within each tier of cities we observe ULBs with high adjusted timeliness values. From Appendix F.5, we observe no obvious correlation between population or volume of applications and adjusted timeliness. In Figure 9.4, we describe the distribution of adjusted timeliness by tier. Tier 1 ULBs have on average lower timeliness values in comparison to the other tiers, as can be expected. Even if the volume of applications received does not affect timeliness, there could be other factors that lead to more Tier 1 cities being less efficient. Human and time resources do not grow proportionately with the volume of applications, and the organizational complexity of larger municipalities could further affect timeliness.

Figure 9.4: Distribution of Adjusted Timeliness for PT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow lines indicate the median t_adj for the corresponding tiers.

Weighted Impact

In order to compare and identify which ULBs are able to have high impact while maintaining high timeliness, we calculate weighted impact. Weighted impact is the number of applications that could be delivered in a timely manner by a ULB, given that the adjusted timeliness applies to the original volume of applications and all application durations were recorded accurately. As can be

seen in Figure 9.5, there are ULBs in each tier that stand out as having significantly higher weighted impact. While a larger impact does not necessarily indicate greater timeliness, ULBs with high impact and timeliness can be further analyzed to learn how high timeliness is maintained with large application volumes. ULBs to further analyze in this way are those located in the upper right hand corner of each pane in Figure 9.6. ULBs 1013, 1070, 1073, 1034, 1030, 1117, 1089, 1120, 1062, and 1148 are examples of such ULBs. ULBs not in the upper right corner of these plots are not necessarily inefficient. Some ULBs have high efficiency but a lesser volume of applications; as a result, they have smaller impact. While analyzing the top-performing ULBs according to weighted impact and timeliness can be informative, weighted impact should not be used to determine the overall top-performing ULBs.

Figure 9.5: Distribution of Weighted Impact for PT by Tier: Boxplots for the distribution of weighted impact by tier, where markers are ULBs of size proportional to the volume of applications received.

Figure 9.6: GEImpact vs. Adjusted Timeliness of PT by Tier: ULBs handling high volumes of applications while maintaining high timeliness rates can be identified here. Each pane gives a plot of weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile ranges for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis.

Model ULBs

Determining the top-performing ULBs with regard to New Property Tax Assessments is important for understanding the limitations on timeliness for this particular service and for further analyzing what changes to ULBs can improve efficiency for this service. Throughout the analysis we have addressed various characteristics and parameters of a ULB. Determining the top-performing ULBs, however, takes into account two factors. We consider as model ULBs those with adjusted timeliness greater than the 75th percentile and differences between adjusted and perceived (or unadjusted) timeliness in the top 50th percentile. We identify model ULBs by tier; see Figures 9.7, 9.8, and 9.9. While ultimately adjusted timeliness values are most important in determining model ULBs within each tier, we use the difference between perceived and adjusted timeliness (∆t) as a secondary factor. A smaller difference does not necessarily indicate a lower percentage of inaccurately recorded application durations. However, we use it as a secondary factor for two reasons. First, when comparing ULBs handling similar volumes of applications, a smaller ∆t indicates that a smaller proportion of applications had inaccurate application durations. The adjusted timeliness values for ULBs processing similar volumes are similarly sensitive to the number of applications that were inaccurately recorded. As a result, a slight preference can be given to the ULB with a smaller ∆t. Second, when comparing ULBs with highly varying volumes of applications even within the same tier, comparing ∆t values gives more leeway to ULBs processing large application volumes. Suppose we have a ULB α that handles twice the volume of applications as ULB β. The adjusted timeliness of α is less sensitive to a larger percentage of inaccurate application durations. ULB α can have a higher percentage than β

while ∆t for both will remain similar. This leeway partially accounts for the fact that the resources and SLAs of a ULB do not grow proportionally with the volume of applications received. The ULBs are, in a way, given credit for having high timeliness despite large application volumes.
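The two-factor filter described above can be sketched as follows. The ULB IDs and values are illustrative, and we read "top 50th percentile for ∆t" as a ∆t at or below the median, since a smaller gap between perceived and adjusted timeliness is preferable; this interpretation is an assumption:

```python
import statistics

# (t_adj, delta_t = t_perceived - t_adj) for illustrative ULB IDs.
ulbs = {
    1013: (0.90, 0.05),
    1070: (0.85, 0.02),
    1030: (0.60, 0.20),
    1117: (0.75, 0.10),
    1089: (0.55, 0.30),
}

t_adj_values = [t for t, _ in ulbs.values()]
delta_values = [d for _, d in ulbs.values()]
q3 = statistics.quantiles(t_adj_values, n=4)[2]    # 75th percentile of t_adj
med = statistics.median(delta_values)              # median of delta_t

# Model ULBs: t_adj above the third quartile, delta_t at or below the median.
model_ulbs = [ulb for ulb, (t, d) in ulbs.items() if t > q3 and d <= med]
print(model_ulbs)   # [1013]
```

`statistics.quantiles` defaults to the exclusive method, so the third-quartile cut point may fall between observed values, as it does here.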

(a) Top-performing ULBs for New Property Tax Assessment timeliness of service in Tier 1 according to criteria described in 9.1.2

(b) Identification of Model ULBs in Tier 1: We plot t_adj against ∆t and choose top-performing ULBs according to the criteria outlined in Section 9.1.2. The blue region shows the interquartile range of t_adj for Tier 1 ULBs. The grey region shows the interquartile range for ∆t. Each marker corresponds to a ULB in Tier 1, where the size of the marker is proportional to the number of applications received by that ULB.

Figure 9.7: Tier 1 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment

(a) Top-performing ULBs for New Property Tax Assessment timeliness of service in Tier 2 according to criteria described in 9.1.2

(b) Identification of Model ULBs in Tier 2: We plot t_adj against ∆t and choose top-performing ULBs according to the criteria outlined in Section 9.1.2. The blue region shows the interquartile range of t_adj for Tier 2 ULBs. The grey region shows the interquartile range for ∆t. Each marker corresponds to a ULB in Tier 2, where the size of the marker is proportional to the number of applications received by that ULB.

Figure 9.8: Tier 2 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment

(a) Top-performing ULBs for New Property Tax Assessment timeliness of service in Tier 3 according to criteria described in 9.1.2

(b) Identification of Model ULBs in Tier 3: We plot t_adj against ∆t and choose top-performing ULBs according to the criteria outlined in Section 9.1.2. The blue region shows the interquartile range of t_adj for Tier 3 ULBs. The grey region shows the interquartile range for ∆t. Each marker corresponds to a ULB in Tier 3, where the size of the marker is proportional to the number of applications received by that ULB.

Figure 9.9: Tier 3 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment

9.1.3 Water Charges

Inaccurate Application Durations

As in the New Property Tax Assessment records, we notice applications that have unrealistic application durations. Applications were supposedly completed in as little as 1 minute. It is reasonable to assume that most of these applications were approved or rejected manually, then inputted into the ERP system. As these "incorrect" transactions unfairly improve timeliness, they are disregarded in calculating adjusted timeliness. In Figure 9.10, the distribution of the percentage of applications with inaccurate application durations by tier is shown. Similar to New Property Tax Assessment, Tier 1 cities have a smaller median and average percentage of incorrect transactions. This result is not surprising, as the ERP system provides organization and accountability for the larger ULBs. It is also possible that Tier 1 ULBs with very low percentages of incorrect transactions are not manually approving applications before digitally inputting them. Instead, there may just be a backlog in inputting paper forms into the system, such that the application is approved by the time the initial application reaches the data entry desk for the first time. This may be the case as well for some Tier 2 and Tier 3 ULBs. Overall, however, the Engineering Department responsible for New Water Tap Connection applications seems to be more compliant with the digital recording procedure than the Revenue Department. Compared to the rate of incorrect recording of applications for property tax, that for new water connections is much lower, especially for Tiers 2 and 3. For Property Tax Assessments, Tiers 1, 2, and 3 had median incorrect transaction rates of 7.03%, 28.83%, and 19.90%, respectively. There is also a higher volume of property tax assessments in comparison to water connection applications. It is difficult to determine at this time whether the higher volume is the cause of the higher inaccurate recording rate for property tax assessments.
Across tiers, Tier 1 ULBs handle large volumes well with the ERP system, but within tiers, Tier 2 and Tier 3 ULBs have difficulty being compliant. Some ULBs have rates of incorrect recording of applications above the third quartiles of both PT and WT: ULBs 1008, 1024, 1070, and 1074 in Tier 1; 1089 and 1121 in Tier 2; and 1082, 1151, and 1160 in Tier 3. It is important to recognize that these ULBs in particular are not consistently using the ERP system as intended. It could be interesting to further analyze why these ULBs are not doing so, and to address those issues to improve the transparency and accountability of those municipalities.

Figure 9.10: Distribution of Inaccurate Application Durations for WT by Tier: This plot describes the distribution of percentage of inaccurately recorded transactions by tier of population.

If there were a trivial number of inaccurate application durations compared to the total number of applications, the difference between the adjusted and unadjusted timeliness values should be minimal. In Figure 9.11, ULBs along the y = x line are those with close to no inaccurate transactions, in comparison to those farther away. As expected, the Tier 1 ULBs are fairly close to the y = x line. The OLS line of best fit for Tier 1 has an R-squared value of 0.9753. In contrast to PT, Tier 3 ULBs perform relatively well. As the adjusted timeliness increases, the unadjusted timeliness values approach the adjusted timeliness. The R-squared value for Tier 3's line of best fit is 0.8631. Tier 2, however, shows a much weaker correlation between adjusted and unadjusted timeliness. From Figure 9.10, we see the weak correlation is a result of the large variation in inaccurate application rates in Tier 2 cities. In fact, the R-squared value is 0.4424. Furthermore, we note that the size of the ULB need not affect the integrity of digital application processing, as was the case for PT. As can be seen in ??, there exist ULBs that handle volumes proportional to their populations and yet have a low percentage of inaccurate workflow records. These two pieces of analysis indicate that it is possible for ULBs to have high timeliness, regardless of tier and volume of applications, while using the ERP system and digital workflow processing as intended.

Figure 9.11: WT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB for WT. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines.

Adjusted Timeliness

We would expect that as the volume of applications a ULB has to handle increases, efficiency would be compromised due to limits on resources. However, within each tier of cities we observe ULBs with high adjusted timeliness values. From Figure F.10, we observe no obvious correlation between population or volume of applications and adjusted timeliness. There is a very slight negative correlation between volume and adjusted timeliness for Tier 1: as volume increases, timeliness decreases very slightly. Overall, however, there are ULBs that are able to perform well despite volume. In Figure 9.12, we describe the distribution of adjusted timeliness by tier. Tier 3 timeliness values are shifted slightly to the right, but in general, all tiers have a spread of timeliness. The histogram is unimodal, with the peak at the 0.65 to 0.70 bin. In comparison to PT, WT has higher timeliness in general.

Figure 9.12: WT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier

Tiers 1, 2, and 3 have average timeliness values of 0.517, 0.508, and 0.559, respectively. All tiers have similar timeliness, where the Tier 3 average and median are slightly higher than those for Tier 1 and Tier 2. Tier 3 has a maximal spread of timeliness values, ranging from 0 to 1.0. ULBs in Tier 3 receive fairly small volumes of applications and yet have a wide range of timeliness. Workflows are likely not different, but further studies should be completed to understand what differences between municipalities lead to this spread of timeliness. Additionally, we can see here that large Tier 1 ULBs with large volumes of applications seem to have lower timeliness. This observation confirms the slight negative correlation observed in Figure F.10.

Figure 9.13: Distribution of Adjusted Timeliness for WT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow lines indicate the median t_adj for the corresponding tiers.

Weighted Impact

As can be expected, we see decreasing impact from Tier 1 to Tier 3. However, multiple ULBs in all tiers stand out as having large impact while receiving similar volumes of applications. In comparison to the weighted impact distributions for New Property Tax Assessment (see Figure 9.5), Tier 2 and Tier 3 have more outliers. The increased number of outliers in Figure 9.14 could derive from a wider spread in the distribution of t_adj for water charges, or from large differences in the volumes of applications for property tax assessments versus water charges. The distribution of t_adj for water charges is in

fact more spread out; the ranges of t_adj for Tiers 1, 2, and 3 are approximately 0.9, 0.9, and 1. In comparison, the ranges of t_adj for property tax assessment are about 0.85, 0.83, and 0.65 for Tiers 1, 2, and 3. Most if not all ULBs process many more new property tax assessment applications than water charges applications, partially because the Property Tax Module was implemented in the state producing this data before the Water Charges Module was implemented. There are a couple of ULBs that have impact above the third quartile for their respective tiers for both modules. For example, 1013 and 1033 have significantly high impacts for both services. Further analysis of why these common high-impact ULBs are able to consistently handle high volumes of applications across departments would be informative.

Figure 9.14: Distribution of Weighted Impact for WT by Tier: Boxplots for the distribution of weighted impact by tier, where markers are ULBs of size proportional to the volume of applications received.

ULBs with high timeliness and impact values for Water Charges are identified and can be further CHAPTER 9. RESULTS AND ANALYSIS 73

analyzed to understand why their Engineering Departments, responsible for water charges, are able to handle high volumes of applications while maintaining high timeliness. As described in Section 9.1.2, we can identify these ULBs of interest by filtering for ULBs that perform in the top 25th percentile for both weighted impact and t_adj. In Figure 9.15, the top-performing ULBs according to the aforementioned criteria reside in the top right corner of each pane. Such ULBs include 1035, 1008, 1117, 1006, 1144, and 1160, to name a few. In particular, ULBs 1008, 1034, and 1117 have high impact and timeliness for both new property assessment and water assessment. On-site research regarding workplace efficiency and organization at these municipalities would inform how to maintain consistent timeliness across departments even with a large load of transactions. Weighted impact, however, should not be used to determine the overall top-performing ULBs, as the volume of applications received can vary greatly within each tier.

Figure 9.15: GEImpact vs. Adjusted Timeliness of WT by Tier: ULBs handling high volumes of applications while maintaining high timeliness rates can be identified here. Each pane gives a plot of weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile ranges for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis.

Model ULBs

We determine top-performing ULBs for New Water Tap Connection according to timeliness by a method similar to what we did for New Property Tax Assessment: we consider as model ULBs those that perform in the top 25th percentile for adjusted timeliness and the top 50th percentile for the difference between adjusted

and perceived (or unadjusted) timeliness. The results by tier are shown in Figures 9.16, 9.17, and 9.18.

(a) Top-performing ULBs for New Water Tap Connection timeliness of service in Tier 1 according to criteria described in 9.1.3

(b) Identification of Model ULBs in Tier 1: We plot t_adj against t and choose top-performing ULBs according to the criteria outlined in Section 9.1.3. The blue region shows the interquartile range of t_adj for Tier 1 ULBs. The grey region shows the interquartile range for t. Each marker corresponds to a ULB in Tier 1, where the size of the marker is proportional to the number of applications received by that ULB.

Figure 9.16: Tier 1 Model ULBs with respect to Timeliness of Service for New Water Tap Connection

(a) Top-performing ULBs for New Water Tap Connection timeliness of service in Tier 2, according to the criteria described in Section 9.1.3

(b) Identification of Model ULBs in Tier 2: We plot t_adj against t and choose top-performing ULBs according to the criteria outlined in Section 9.1.3. The blue region shows the interquartile range of t_adj for Tier 2 ULBs. The grey region shows the interquartile range for t. Each marker corresponds to a ULB in Tier 2, where the size of the marker is proportional to the number of applications received by that ULB.

Figure 9.17: Tier 2 Model ULBs with respect to Timeliness of Service for New Water Tap Connection

(a) Top-performing ULBs for New Water Tap Connection timeliness of service in Tier 3, according to the criteria described in Section 9.1.3

(b) Identification of Model ULBs in Tier 3: We plot t_adj against t and choose top-performing ULBs according to the criteria outlined in Section 9.1.3. The blue region shows the interquartile range of t_adj for Tier 3 ULBs. The grey region shows the interquartile range for t. Each marker corresponds to a ULB in Tier 3, where the size of the marker is proportional to the number of applications received by that ULB.

Figure 9.18: Tier 3 Model ULBs with respect to Timeliness of Service for New Water Tap Connection

9.1.4 Comparison of PT and WT Model ULBs

We observe a couple of ULBs of note. First, we identify 1073 as a model ULB in Tier 1 (see Figure 9.16), but it received only 2 applications. We can disregard this ULB due to the trivial volume of applications. Second, 1075 is classified as a model ULB for both new property tax assessment and new water tap connection. It receives a significant volume of applications (1,867) but maintains high timeliness, where t_adj^(PT) = 0.83 and t_adj^(WT) = 0.73. Additionally, only about 7.2% of PT applications and 4.3% of WT applications are incorrectly recorded. Such low numbers imply that the ERP system is being utilized as intended during the application workflow, as opposed to after the processing is manually completed. In Tier 2, 1117 and 1032 are model ULBs for both WT and PT. For 1117, the percentage of inaccurate application durations for WT is 14%, while it is 66% for PT. Such a large difference within one ULB is surprising. However, the PT module falls under the Revenue Department and the WT module falls under the Engineering Department. It is possible that officials in the Engineering Department were more skilled with computers, such that adoption of the ERP system did not require a high activation energy. Although 1135 is classified as a model ULB for both PT and WT, the data indicates that 1135 may not necessarily be a timely and transparent ULB. When calculating adjusted timeliness, we disregarded applications that were processed in less than a day. The one-day threshold is a lower bound, as applications with durations of less than 2-3 days are likely also inaccurately recorded. 91% of PT applications and 75% of WT applications were recorded as having been completed in less than a day. If the threshold for being considered inaccurately recorded were increased from 1 day to 2 days, the majority of the remaining PT and WT applications would likely be ignored in the timeliness calculation as well. Further work is required to determine whether 1135 is in fact providing services in a timely manner or is recording inaccurate application durations above a higher time threshold.
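The sensitivity of the adjusted-timeliness calculation to this cutoff can be probed directly. A minimal sketch, with invented durations in days (real durations come from the ERP records):

```python
# Share of applications flagged as inaccurately recorded, for several cutoffs.
durations = [0, 0, 0, 1, 1, 2, 2, 3, 5, 8, 12, 20]

def flagged_share(durations, threshold_days):
    """Fraction of applications whose recorded duration falls below the
    threshold; these are ignored when computing adjusted timeliness."""
    return sum(1 for d in durations if d < threshold_days) / len(durations)

# Moving the cutoff from 1 day to 2-3 days shows how many more applications
# would be excluded from the calculation.
for threshold in (1, 2, 3):
    print(threshold, round(flagged_share(durations, threshold), 2))
```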

9.1.5 Public Grievances

We analyze PGR timeliness along a couple of different dimensions:

• Which departments of which ULBs consistently provide timely service?
• Which complaints consistently require more time than the given SLA across all ULBs?

Unlike new property tax assessment and new water tap connection, public grievances can only be submitted digitally. As soon as a complaint is submitted by the citizen, it enters the workflow, and all of its information, including the time of submission, is recorded. As a result, we do not need to normalize the data for inaccurate application durations. Timeliness at the ULB level or department level can be computed directly. See Figures 9.19 and F.14 for a full table of results.
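Because every PGR record carries its submission time, timeliness reduces to the share of complaints closed within their SLA. A sketch with hypothetical records (the ULB codes, departments, and day counts are invented):

```python
import pandas as pd

# Hypothetical PGR records; a complaint is timely if resolved within its SLA.
pgr = pd.DataFrame({
    "ulb":        [1008, 1008, 1008, 1035, 1035, 1035],
    "department": ["Revenue", "Revenue", "Administration",
                   "Administration", "Administration", "Revenue"],
    "days_taken": [3, 10, 2, 15, 4, 6],
    "sla_days":   [7, 7, 7, 7, 7, 7],
})
pgr["within_sla"] = pgr["days_taken"] <= pgr["sla_days"]

# Department-level timeliness; group by ["ulb", "department"] for ULB level.
timeliness = pgr.groupby("department")["within_sla"].mean()
print(timeliness)
```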

Figure 9.19: Average Timeliness across Tiers by PGR Department

(a) Administration (b) Revenue

(c) Town Planning (d) UPA

Figure 9.20: PGR Timeliness by Department of Complaint and Tier

We observe the distribution of timeliness by tier as well (Figure 9.20). It is easy to identify the top-performing ULBs in each tier for each department. As can be expected, Tier 1 ULBs process significantly more requests than Tier 2 or Tier 3 ULBs. Tier 1 ULBs perform much worse when it comes to complaints in the administration department. By breaking down the department timeliness by complaint, one can detect exactly which complaints are driving the drop in timeliness for Tier 1 ULBs. As seen in Figure 9.21b, Tier 1 ULBs are particularly less timely regarding "Complaints regarding schools" and "Inclusion, Detection of Correction in Voter List" than the average Tier 2 or Tier 3 ULB. As shown in Figure 9.21c, one can then further break down the complaints by ULB to determine which ULBs in Tier 1 are able to redress the specific complaint in a timely manner. One method of identifying model ULBs is looking at the ULBs that are in the top 5th percentile for each complaint across each tier. This process can be repeated across all departments and complaints to identify model ULBs. These ULBs can be studied further to inform how possible differences in workflow, amount of staff, and other factors can improve timeliness of service.
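The per-complaint selection can be implemented as a grouped quantile cutoff; here "top 5th percentile" is read as exceeding the 95th quantile of timeliness within the tier, and the frame below is illustrative:

```python
import pandas as pd

# Illustrative complaint-level timeliness for ULBs in one tier.
df = pd.DataFrame({
    "ulb":        [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "complaint":  ["schools"] * 5 + ["voter list"] * 5,
    "timeliness": [0.9, 0.4, 0.5, 0.3, 0.2, 0.8, 0.7, 0.95, 0.1, 0.2],
})

# Per-complaint cutoff at the 95th quantile, then keep ULBs at or above it.
cut = df.groupby("complaint")["timeliness"].transform(lambda s: s.quantile(0.95))
model = df[df["timeliness"] >= cut]
print(sorted(model["ulb"]))
```

Repeating this over departments and tiers yields the candidate model ULBs described above.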

(a) Distribution of Timeliness for Administration Complaints by Tier (b) Average Timeliness by Administration Complaints and Tier

(c) ULB Timeliness by Administration Complaint

Figure 9.21: Process for Identifying Model ULBs for a Complaint

In general, we identify ULBs that provide consistently timely redressal for each department. Again, we assume that departments within ULBs function independently of each other. As a result, identifying model ULBs across all of PGR is not necessarily the most informative. Given that the current SLAs are uniform for each complaint across ULBs, we would expect that Tier 1 ULBs on average will provide less timely service than smaller ULBs due to a larger volume of complaints or other such population-related factors. As seen in Figure 9.21, this is not always the case. Surprisingly, in the Revenue Department, Tier 1 cities overall perform the best regardless of application volume. Therefore, a study of model cities for each department and tier together can provide the most information on why certain departments, with respect to specific volumes of complaints, are still able to provide efficient service.

9.1.6 Accuracy

Currently there is no easy way to determine how accurately an application or PGR has been processed in the ERP system. A citizen can re-open a previous complaint if he is not satisfied with the redressal. However, it is difficult to automate the process of determining whether the complaint was subsequently closed because the ULB did not properly address it or because it was determined to be an unreasonable request. Similarly, citizens can file a revision petition for WT and PT if they believe that their property tax or water charges determination is incorrect. The assessment may initially be wrong due to a mistake on the part of the ULB for multiple reasons; for example, functionaries could have inputted wrong values into the master sheet that calculates charges. If the municipality accepts the revision petition, the citizen's property assessment or water charges may be affected. Even after approval, however, the revision petition can be withdrawn. Due to the complexity of evaluating it, this study did not address accuracy, and has assigned the accuracy parameter a value of 1. Further work and on-site visits should be completed to understand how to go about calculating this parameter.

9.2 IPI Calculation

Using the workflows and the NDM and OAM matrices, we calculated the right use, right collection, and right disclosure parameters at the level of the functionary, the service, and the ULB. In this section, we explain how each parameter was calculated and discuss the resulting values. One point to note is that at the ULB level, these parameters are the same across all ULBs. In the state we studied, every ULB collected the same data for each service, and all ULBs follow similar workflows and evaluate each application or complaint using the same criteria outlined by the state government.

9.2.1 Right Collection

The right collection index at the functionary level is calculated from what data is necessary for a given task to be completed and what data is collected for that service. Each of the service forms includes mandatory data fields and optional data fields. For the purpose of this study, we consider "collected fields" to be all of the fields that are requested by a particular service. This includes both the mandatory and non-mandatory fields, as the government has the ability to observe all fields. The right collection parameter at the functionary level is calculated by dividing the number of necessary fields by the number of all collected fields for a particular functionary. The number of necessary fields is found by summing the corresponding column of a functionary, where a cell is denoted "1" if the field is necessary for the completion of a task. The number of all collected fields is the number of fields given in the service form. At the service level, the parameter is found by dividing the number of fields that are needed by one or more functionaries by the number of all collected fields. To calculate the numerator, we run a row-wise "OR" function and sum the resulting values. The denominator remains the same. The results for each of the services are shown in Figure 9.22, Figure 9.23, and Figure 9.24.
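A toy version of this computation, with an invented Necessary Data Matrix (the real matrices come from the field study, and the field and functionary labels here are hypothetical):

```python
import numpy as np

# Toy NDM: rows are data fields, columns are functionaries; a 1 means the
# field is necessary for that functionary's task. Values are illustrative.
ndm = np.array([
    [1, 0, 0],   # applicant name
    [1, 1, 0],   # address
    [0, 1, 1],   # construction type
    [0, 0, 0],   # an optional field no functionary needs
    [0, 0, 1],   # plot area
])
n_collected = ndm.shape[0]  # all requested fields, mandatory or not

# Functionary level: necessary fields / all collected fields (column sums).
rc_functionary = ndm.sum(axis=0) / n_collected

# Service level: fields needed by at least one functionary (row-wise OR).
rc_service = np.logical_or.reduce(ndm, axis=1).sum() / n_collected
print(rc_functionary, rc_service)
```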

Figure 9.22: New Water Tap Connection Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column.

Figure 9.23: New Property Tax Assessment Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column.

As all ULBs have the same parameter values, the Right Collection parameter at the ULB level across all three services we analyzed is 0.435966087.

Figure 9.24: PGR Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column.

9.2.2 Right Use

Like the right collection index, the right use index is calculated from the NDM matrix. At the functionary level, the right use parameter is calculated by dividing the number of necessary data fields by the number of data fields the functionary is given access to. In this particular state, all functionaries are given access to all data that is collected. As a result, the right collection and right use parameters at the functionary level are the same across all services. Right use at the service level, however, is calculated by taking the average of the right use indices across all functionaries (see Figure 9.25). The ULB-level right use parameter is similarly the average across all services.

9.2.3 Right Disclosure

The right disclosure values at the functionary level are the same as those at the service level, as described in Section 8.2.3. Of all the information collected across all services, we consider mobile phone number and home address to be PII. The right disclosure parameter has been calculated accordingly. At the ULB level, the right disclosure value is the average of the right disclosure values across all services.

Figure 9.25: IPI Calculation for All ULBs

9.2.4 IPI Impact

Similar to GEI, IPI Impact (IPImpact) helps us understand the magnitude of applications affected by a ULB's IPI. Theoretically, it is the number of applications that are fully privacy-protected according to the principles IPI is composed of. Of course, in practice, this interpretation is not exact. IPImpact is calculated by multiplying the number of applications received by a ULB by its IPI:

I_p(s) = IPI(s) × ||s||    (9.3)

where ||s|| is the number of applications received for service s.
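Equation (9.3) is a direct product. As a sketch, the numbers below reuse the ULB-level Right Collection value from Section 9.2.1 and the application volume of ULB 1075 purely for illustration; they are not an IPI actually computed in the thesis:

```python
def ip_impact(ipi: float, num_applications: int) -> float:
    """IPImpact, Eq. (9.3): I_p(s) = IPI(s) * ||s||, where ||s|| is the
    number of applications received for service s."""
    return ipi * num_applications

# Illustrative combination of two values that appear elsewhere in the chapter.
print(ip_impact(0.436, 1867))
```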

9.3 GEI and IPI

Comparing GEImpact and IPImpact is not necessarily useful for determining top-performing cities, but it is informative for determining ULBs that are able to maintain high GEI and IPI while handling large volumes of applications. As IPI is equal for all ULBs and we assume 100% accuracy in this study, ULBs with high GEImpact and IPImpact are also those maintaining high timeliness despite large volumes of applications. By plotting GEImpact against IPImpact for each tier and service, it is easy to identify model ULBs by selecting those exceeding the third quartile on both axes. See Figures F.15, F.16, and F.17. Since impact values depend on the volume of applications received, comparing GEI across all evaluated services serves as a better proxy for the best-performing ULBs irrespective of the volume of applications received. In Figure 9.29, GEI across the three services is compared. ULBs exceeding the third quartile on both axes and having high GEI for PGR (indicated in green) can be considered model ULBs.

ULBs can also be compared across all services evaluated. GEImpact_total is the sum of the GEImpact values for all services, and IPImpact_total is evaluated similarly. The resulting distributions are as follows:

Figure 9.26: Tier 1: GEImpact_total vs IPImpact_total

Figure 9.27: Tier 2: GEImpact_total vs IPImpact_total

Figure 9.28: Tier 3: GEImpact_total vs IPImpact_total

Figure 9.29: GEI Comparison for All Services by Tier

Chapter 10

Applications and Effects on Public Policy

A methodology for comparing the efficiency and privacy of various entities provides a structure for understanding, evaluating, and comparing governmental processes. The process of evaluating GEI and IPI, as well as the resulting values, provides multiple lenses through which the workflows of a state, ULB, or department can be analyzed, as discussed in Chapter 9. In particular, we highlight a couple of use cases for GEI and IPI.

10.1 Learning from Top Performing ULBs

Across all services, top-performing ULBs within each tier are identified according to the criteria outlined in Chapter 9. As all ULBs currently have the same IPI and the accuracy parameter has been set to 1, the model ULBs identified according to timeliness can be considered top-performing with respect to both IPI and GEI. These ULBs can be studied in more depth to understand why they are able to provide timely services. Each ULB, even within a tier, might have slight variations in its workflow. Consolidation or separation of certain tasks or roles may lead to an improved workflow. Such specific optimizations can be detected through analysis of model cities and implemented elsewhere. Conversely, comparison of GEI and IPI can identify ULBs that consistently perform poorly for a given service or for redressal of specific complaints. One can use a methodology similar to that described in Section 9.1.2: ULBs within each tier performing under the 25th percentile can be targeted for improvement. By comparing against the respective model ULBs for each service or complaint, ULBs marked for improvement can compare staffing models, workflow modifications, and other such ULB-dependent factors.


Furthermore, parameters for GEI can be calculated at the functionary level as well, as described in Chapter 8. Calculating timeliness for each functionary along the service workflows can reveal where the bottleneck for a particular ULB may be. Once this is identified, the ULB can act accordingly, possibly by increasing resources for that particular task. Making the functionary timeliness values public for ULBs of similar size to view can give a basis for comparison as to what can reasonably be expected of each functionary.
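As a sketch of this bottleneck analysis, one can average the time applications spend at each stage of the workflow and flag the largest; the roles and durations below are hypothetical:

```python
import pandas as pd

# Hypothetical per-application stage durations (in days) along one service
# workflow; each row is one application's time at one functionary's stage.
stages = pd.DataFrame({
    "functionary": ["Jr. Assistant", "Revenue Inspector", "Commissioner",
                    "Jr. Assistant", "Revenue Inspector", "Commissioner"],
    "days":        [1, 6, 2, 2, 8, 1],
})

# Average time spent at each stage; the largest is the likely bottleneck.
avg = stages.groupby("functionary")["days"].mean()
bottleneck = avg.idxmax()
print(bottleneck)
```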

10.2 Evaluating the Trade-offs between Efficiency and Privacy

From Section 9.2, it is clear that there is reasonable room for improvement in privacy standards at the service and functionary level. The most obvious optimization is tightening access controls for functionaries in terms of the citizen information they have access to. When a citizen files a new water tap connection application or a new property tax assessment application, a Jr./Sr. Assistant first views the application to verify contact and personal details. However, he/she does not need access to details about the building, construction type, occupation, and so on, as can be seen in Figures C.2 and D.1. If he/she were not given immediate access to these fields, then the right use parameter would increase, leading to an increase in IPI. Access controls that are too strict, however, could lead to decreases in efficiency. Consequently, the structured methodology we provide to evaluate both efficiency and privacy in tandem is informative.
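A minimal numerical sketch of this effect, assuming right use is computed per functionary as necessary fields over accessible fields and averaged at the service level (Section 9.2.2); the matrices and field labels are invented:

```python
import numpy as np

# Toy matrices: rows are fields, columns are functionaries.
# necessary[i, j] = 1 if functionary j needs field i to complete their task.
necessary = np.array([
    [1, 0],   # contact details: needed by the Jr./Sr. Assistant only
    [0, 1],   # construction type: needed by the inspector only
    [0, 1],   # plot area: needed by the inspector only
])

def right_use(necessary, access):
    # Per functionary: necessary fields / accessible fields; service: mean.
    per_functionary = necessary.sum(axis=0) / access.sum(axis=0)
    return per_functionary.mean()

everything = np.ones_like(necessary)  # current state: everyone sees all fields
restricted = necessary.copy()         # access limited to what is needed

print(right_use(necessary, everything))  # lower: broad access
print(right_use(necessary, restricted))  # higher: tightened access
```

Tightening access raises the service-level right use value, which in turn raises IPI, mirroring the argument above.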

10.3 Reviewing Fairness of SLAs

GEI analysis for PGR across all departments and complaints could be used to identify complaints for which the SLAs may need to be reviewed or possibly differentiated by tier. SLAs are uniform for each service and PGR complaint across all ULBs, regardless of the volume of applications or complaints received. While shorter SLAs push ULBs to complete tasks in a timely manner, SLAs that are consistently not being met may do more harm than good: they could erode citizens' trust in the municipality. Furthermore, functionaries, particularly in PGR, are evaluated by the number of times they have met or missed the SLA. For PGR, this value is public. However, it cannot reasonably be expected that functionaries performing the same role in a village versus a corporation will be able to complete their jobs in the same way for some complaints, due to the sheer difference in volume of complaints, population, and physical constraints of being in a large ULB. Since SLAs do affect a functionary's performance metric, it is interesting to investigate, first by tier, for which complaints the SLAs are consistently not being met.

A modified version of timeliness can be used to identify the subset of complaints whose SLAs may need to be re-evaluated, in general or by tier. While timeliness measures the proportion of complaints addressed within the SLA, it does not provide information on how close to or far from the SLA complaints are being addressed. Timeliness is calculated under the assumption that the SLAs are fair and correct. If timeliness values across ULBs are low, then either the SLA is too low, or it is low for a reason and ULBs need to improve their workflow process, staffing model, or other such organizational factors. Using a threshold or a percentile threshold for timeliness, one can determine what is considered a low timeliness for a given complaint for a particular tier of ULBs. If the state expects a timeliness-of-service rate of 50%, then one can expect the distribution of timeliness values for a given complaint to be normally distributed with a mean of 0.50. If the probability of sampling the relevant subset of ULBs' timeliness from this distribution is below a certain threshold, then one may need to re-evaluate the SLA itself. If the complaint is time-sensitive, however, then the pros and cons of increasing the SLA time should be weighed. For example, "Electric Shock due to Street Lights" or "Illegal Draining of Sewerage to Open Site" are complaints that, regardless of timeliness values across tiers, should have low SLAs. In general, complaints regarding Public Health and Sanitation are time-sensitive.

Chapter 11

Conclusions and Further Work

This study lays out and applies a methodology to analyze how ULBs perform on the axes of government efficiency (GEI) and informational privacy (IPI). Using real data from eGovernments Foundation for three government services - New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal - we demonstrate that both efficiency and privacy are measurable concepts in the context of urban governance. The model of identifying top-performing cities allows us to conclude that there are exemplar cases of ULBs that should be studied further. These model cities demonstrate that ULBs can have high GEImpact and IPImpact, and that there is room for improvement for model cities as well. We analyze the performance of ULBs by tier, as the volume of applications received and the resources available to each ULB affect the timeliness parameter of GEI. We also discover that certain New Property Tax Assessment and New Water Tap Connection applications in some ULBs have gone through the digital workflow in the ERP system in a suspiciously small amount of time. We identify applications that have inaccurate application durations and ignore them during the calculation of adjusted timeliness. As only online complaints can be filed, the Public Grievance Redressal module does not have applications with inaccurate application durations. In addition to grouping and analyzing cities by tier, we identify top-performing cities in each department independently. There also exist ULBs that are top-performing across all of PGR. As IPI is the same for each service across all ULBs and accuracy could not be calculated with the collected data, ULBs with high timeliness within each tier are considered model cities, regardless of the volume of applications received. We identify ULBs that have high timeliness across all services, demonstrating that it is possible to maximize efficiency irrespective of population or the resulting volume of transactions.

Finally, we calculate and compare GEImpact_total and IPImpact_total across all ULBs. While ULBs with low impact are not necessarily less efficient or private, ULBs with high impact on both axes should be further studied to understand what specifics about their workflow, organizational structure, or resource structure lead to high efficiency and privacy impact even with a high volume of applications.

11.1 Further Work

A methodology to calculate efficiency and privacy is helpful in determining top-performing ULBs within an entity, in this case Andhra Pradesh; however, further work can be done to improve the indices to make them more accurate, informative, and comprehensive. There are a few more specific questions we may ask with regard to this study.

First, in this study we only calculate GEI and IPI at the service level for each ULB. An interesting direction for further work is implementing the methods laid out in Chapter 8 to calculate GEI and IPI at the functionary level. In particular, observing and analyzing timeliness at the functionary level across ULBs can provide a wealth of information on exactly which task or functionary in the workflow for each service is the bottleneck for improving timeliness. Once the bottleneck is identified, municipalities can devise individualized solutions for addressing the issue. Cities with high timeliness across all functionaries can serve as case studies for those unable to consistently maintain high efficiency. In general, on-site studies and interviews can be conducted at the model ULBs identified in this document to pinpoint exactly what factors lead to high timeliness. Further investigation can be done into why ULBs with high impact and high timeliness are able to maintain efficiency while handling large volumes of applications.

Second, further work is required to develop a methodology to calculate accuracy or a proxy for accuracy. Accuracy is an important component of GEI, and while we currently assume that accuracy values are largely close to 1, research confirming that assumption needs to be completed. Calculation of accuracy may require designing and implementing a way of logging accuracy in the ERP system with eGov.

Third, for complaints in PGR that have been marked as possibly requiring review, a process for determining any change in SLA can be developed. While we calculate GEI based on the assumption that SLAs are fair and correct, we recognize that this may not necessarily be the case. This process would take into account the urgency of the complaint as well as timeliness information. Based on the resulting analysis, it may also be argued that some SLAs should depend on the tier of a ULB, so that functionaries in corporations are not penalized for urban mobility or resource issues that are out of their control.

Fourth, a more rigorous approach to identifying ULBs that are not properly documenting application workflows in the ERP system can be developed. In this study, we classify applications that have an application duration of less than a day as incorrectly recorded transactions. It is reasonable to assume, however, that most applications completed in less than a couple of days may have also been incorrectly recorded, particularly if the applications were from Tier 1 cities. In general, it would be interesting to train a machine learning model that could predict the timeliness of a ULB based on a range of socio-economic or organizational factors, including population, volume of transactions, amount of resources, geographic location, and so on. Depending on the model, we could gain valuable information regarding the importance of each of these factors in determining the performance of a city.

Most importantly, now that there exists a methodology to measure and analyze government efficiency and informational privacy, the general direction of further research should be to understand any causal effects between GEI and IPI. Does implementing changes to access controls or data disclosure affect government efficiency? More excitingly, can we innovate upon existing procedures to improve both efficiency and privacy in a sustainable manner? Answering this question will require further field work, experiments with changes to the workflows or ERP system for particular services, and so on. Equipped with the lessons from our current study, we hope to expand upon this line of research in the future.


Appendix A

Site and Module Selection

Figure A.1: Service Level Agreements for PGR

Appendix B

Understanding Data Use and Data Disclosure

Financial Implication/Inference

  - Complaints regarding restaurants / function halls [PHS]
  - Misuse of Community Hall [Town Planning]
  - Removal of shops in the footpath [Town Planning]
  - Complaints regarding Dispensary [PHS]
  - Unhygienic and improper transport of meat and livestock [PHS]
  - Unauthorized/Illegal Construction [Town Planning]
  - Unhygienic conditions because of Slaughter House [PHS]
  - Food adulteration: road side eateries [PHS]
  - Unauthorized advt. boards [Town Planning]
  - Complaints regarding Function halls [PHS]
  - Complaints regarding advertisement Boards [Town Planning]
  - Issues relating to all Sanctioned loans [UPA]

Figure B.1: Examples of Public Grievance types for which loss of confidentiality leads to financial loss


Financial and Cultural Implication/Inference

  - Obstruction of water flow [Engineering]
  - Burning of garbage [PHS]
  - Misuse of Community Hall [Town Planning]
  - Illegal Slaughtering [PHS]
  - Issues relating to vacant lands [PHS]
  - Unauthorized sale of meat and meat products [PHS]
  - Illegal draining of sewage to SWD/Open site [PHS]
  - Violation of DCR/Building by-laws [Town Planning]
  - Unauthorized tree Cutting [PHS]
  - Encroachment on the public property [Town Planning]

Figure B.2: Examples of Public Grievance types for which loss of confidentiality leads to financial and cultural loss

  Type of relationship                         No. grievance types in this category
  Business --> Citizen (cultural)              8
  Citizen --> Business (financial)             20
  Citizen --> Citizen (cultural + financial)   5
  Business --> Business (financial)            15
  Only ULB Addressable                         75

Figure B.3: Loss of confidentiality can fall on either the complainer or the subject; the type of relationship determines whether the loss is financial or cultural.

Appendix C

New Property Tax Assessment Data


Figure C.2: New Property Tax Assessment: Necessary Data Matrix

Figure C.4: New Property Tax Assessment: Operational and Administrative Matrix

Figure C.6: New Property Tax Assessment: 2018 Data Fields

Appendix D

New Water Tap Connection: Data

Figure D.1: New Water Tap Connection: Necessary Data Matrix

Figure D.2: New Water Tap Connection: Operational and Administrative Matrix

Figure D.3: New Water Tap Connection: 2018 Data Fields

Appendix E

Public Grievance Redressal Data

Figure E.1: Public Grievance Redressal: Necessary Data Matrix


Figure E.2: Public Grievance Redressal: Operational and Administrative Matrix

Figure E.5: Public Grievance Redressal: 2018 Data Fields

Appendix F

Results and Analysis


Figure F.3: New Property Tax Assessment Timeliness of Service Results

Figure F.4: PT Population vs. Percentage of Inaccurate Application Durations: This plot shows no correlation between the population of a ULB and its compliance with proper use of the ERP System.

Figure F.5: PT Volume of Applications vs. Adjusted Timeliness

Figure F.8: New Water Tap Connection Timeliness of Service Results

Figure F.9: WT Population vs. Percentage of Inaccurate Application Durations: This plot shows no correlation between the population of a ULB and its compliance with proper use of the ERP System.

Figure F.10: WT Volume of Applications vs. Adjusted Timeliness

Figure F.14: Public Grievance Redressal Timeliness by ULB across Departments

Figure F.15: Tier 1: GEImpact vs. IPImpact

Figure F.16: Tier 2: GEImpact vs. IPImpact

Figure F.17: Tier 3: GEImpact vs. IPImpact