Not a Zero Sum Game: How to Simultaneously Maximize Efficiency and Privacy in Data-Driven Urban Governance

by Nikita Krishna Kodali

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2019

© Massachusetts Institute of Technology 2019. All rights reserved.
Author...... Department of Electrical Engineering and Computer Science May 28, 2019
Certified by...... Karen Sollins Principal Research Scientist Thesis Supervisor
Accepted by...... Katrina LaCurts Chair, Master of Engineering Thesis Committee
Submitted to the Department of Electrical Engineering and Computer Science on May 28, 2019, in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering
Abstract

India has been striving towards digitization of citizen data and government services using e-governance platforms to improve accountability, transparency, and efficiency. Accordingly, India launched the world's largest biometric ID system, Aadhaar, in 2009 and the "100 Smart Cities Mission" in 2015. However, with the immense wealth of personal data being digitized, guarantees on personal privacy were subsequently called into question. In August of 2017, the Supreme Court of India declared that privacy is a fundamental right. To examine the juxtaposition of governmental efficiency and personal privacy through rapid digitization, we investigate and collect city metadata by examining the architecture of the products of eGovernments Foundation, one of the leading providers of digital tools for Urban Local Bodies (ULBs), and by directly interacting with cities. In this particular study, we investigate New Property Tax Assessment applications, New Water Tap Connection applications, and Public Grievance Redressals for 112 ULBs in the state of Andhra Pradesh. Through field work, collection of data, and further analysis, we observe which data fields are collected and how they are used. We define a Government Efficiency Index (GEI) and Information Privacy Index (IPI) in order to provide a standard for understanding and analyzing the trade-offs between government efficiency and citizen privacy for these services across all ULBs. This thesis examines how ULBs perform on the GEI and IPI axes through multiple lenses. Using real data, we demonstrate that both efficiency and privacy are measurable concepts in the context of urban governance. Furthermore, the methodology of identifying top-performing cities outlined in this thesis allows us to conclude that there exist exemplar ULBs that have high impact on both the GEI and IPI axes.
Thus, this methodology for comparing efficiency and privacy provides a structure for understanding, evaluating, and comparing governmental processes.
Thesis Supervisor: Karen Sollins Title: Principal Research Scientist
Acknowledgments
First and foremost, I am deeply grateful for my research supervisors, Karen Sollins and Chintan Vaishnav. Without their guidance, compassion, and encouragement, I would not have been able to produce this document with the confidence I do today. Thank you Karen, for allowing me to voice my sometimes naive optimism or trivial frustrations several times a week, and pushing me to think about implications beyond just the scope of our research. Your immense depth of knowledge in your field of research inspires me, and I hope to one day be as meaningful of a mentor to someone as you have been to me. Thank you Chintan, for your boundless patience in teaching me how to become a better researcher both in the lab and on the ground. Your endless enthusiasm, calming presence, and passion for using technology for social good give me a role model to look up to every day.
Second, this work would not have been possible without the work and contributions of Gautham Ravichander and his colleagues at the eGovernments Foundation. In addition, I wish to acknowledge my Research Assistantship funding from the MIT Internet Policy Research Initiative. I want to express my profound gratitude for your support and encouragement over the last two years.
My accomplishments and successes are never singular. The support, guidance, love, encouragement, and optimism of my friends and family empower me every day. The women in my life - Ammamma, Nanamma, Amma, Chinnu Amma, and Souji Atha - have taught me how to be strong, and the men in my family - Thatha, Nana, Chandu Nana, Deva Mama, Bablu Mama, and Pasi Babai - have picked me up when I am unable to be. Thank you for always believing in me.
To my dear friend, Divya, thank you for your unwavering support and encouragement through thick and thin. And of course, thank you in advance to my forever best buddies - Anita, Bhargavi, Kavya, Sravya, Siri, Ronak, and Aadya - for listening to me recite every line in this document annually. Sandeep, I literally would not have been able to hand in this thesis on time without you, so you are exempt.
For Geethatha, whose warm smile will always be etched in our memories,
whose love will forever remain in our hearts.

Contents
1 Introduction 14
2 Research Questions 17
3 Background: Digitization and Constitutional Privacy 18
3.1 Advances in Digitization of Cities ...... 18
3.2 The Constitutionality of Privacy ...... 19
3.2.1 The Aadhaar Issue ...... 20
3.2.2 Supreme Court Decision ...... 20
4 Understanding the Value of Privacy in the Indian Context 22
5 Existing Models of Privacy 25
5.0.1 Command and Control Model ...... 25
5.0.2 Sectoral Model ...... 26
5.0.3 Co-Regulatory Model ...... 27
5.0.4 Recommended Model of Privacy in India ...... 29
5.1 Data and Data Collection ...... 30
5.2 e-Governments Foundation, Current Installations and Digital Services ...... 30
5.3 Site and Module Selection ...... 31
5.3.1 Department Hierarchy ...... 31
5.3.2 Classification of eGov Modules ...... 32
5.3.3 Modules Selected for This Study ...... 32
6 Data Available for Analysis 33
6.0.1 Workflows ...... 33
6.0.2 Binary and Qualitative Matrices ...... 33
6.0.3 2018 Data ...... 34
6.1 Service Modules Data ...... 34
6.1.1 Property Tax Module ...... 35
6.1.2 Water Charges Module ...... 36
6.1.3 Public Grievances Module ...... 38
7 Understanding Data Use and Data Disclosure 41
7.1 Understanding Implications of Loss of Data Integrity ...... 42
7.1.1 Privacy Analysis I: Loss of Data Integrity ...... 42
7.1.2 Functional and Financial Implications of Loss of Data Integrity ...... 42
7.2 Understanding the Implications of Data Disclosure on Privacy ...... 44
7.2.1 Privacy Analysis II: Loss of Data Confidentiality ...... 44
7.2.2 Financial and Cultural Implications of Voluntary or Involuntary Data Disclosure ...... 44
7.2.3 Grievance and Inference ...... 45
8 Synthesis 46
8.1 Government Efficiency Index ...... 46
8.1.1 Timeliness of Service ...... 46
8.1.2 Accuracy of Service ...... 48
8.2 Informational Privacy Index ...... 49
8.2.1 Right Collection Index ...... 49
8.2.2 Right Use ...... 50
8.2.3 Right Disclosure ...... 51
9 Results and Analysis 53
9.1 GEI Calculation ...... 53
9.1.1 Timeliness ...... 53
9.1.2 Property Tax Assessment ...... 55
9.1.3 Water Charges ...... 67
9.1.4 Comparison of PT and WT Model ULBs ...... 79
9.1.5 Public Grievances ...... 79
9.1.6 Accuracy ...... 83
9.2 IPI Calculation ...... 83
9.2.1 Right Collection ...... 83
9.2.2 Right Use ...... 85
9.2.3 Right Disclosure ...... 85
9.2.4 IPI Impact ...... 86
9.3 GEI and IPI ...... 86
10 Applications and Effects on Public Policy 90
10.1 Learning from Top Performing ULBs ...... 90
10.2 Evaluating the Trade-offs between Efficiency and Privacy ...... 91
10.3 Reviewing Fairness of SLAs ...... 91
11 Conclusions and Further Work 93
11.1 Further Work ...... 94
A Site and Module Selection 99
B Understanding Data Use and Data Disclosure 103
C New Property Tax Assessment Data 105
D New Water Tap Connection: Data 112
E Public Grievance Redressal Data 116
F Results and Analysis 121

List of Figures
5.1 Co-Regulatory Model ...... 28
5.2 Comparison of Privacy Principles ...... 29
5.3 eGovernments Foundation ERP Suite [3] ...... 31
5.4 Department Hierarchy ...... 31
6.1 New Property Tax Assessment Workflow [3] ...... 36
6.2 New Water Tap Connection Workflow [3] ...... 37
6.3 PGR Escalation Hierarchy [3] ...... 39
7.1 Table of Fields Necessary, Collected for Efficiency/Accuracy, and Unnecessary for Completion of PT Assessment, WT Assessment, and PGR ...... 43
9.1 Distribution of Inaccurate Application Durations for PT by Tier: This plot describes the distribution of the percentage of inaccurately recorded transactions by population tier. Each marker indicates a ULB, where the size of the marker indicates the volume of applications received and the color indicates the population tier to which that ULB belongs. The median of each tier is also marked in yellow and annotated with the value of the respective median. ...... 56
9.2 PT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines. ...... 57
9.3 PT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier ...... 58
9.4 Distribution of Adjusted Timeliness for PT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow line indicates the median t_adj for the corresponding tiers. ...... 59
9.5 Distribution of Weighted Impact for PT by Tier: Boxplots for the distribution of weighted impact by tier, where markers are ULBs of size proportional to the volume of applications received ...... 61
9.6 GEImpact vs. Adjusted Timeliness of PT by Tier: ULBs handling high volumes of applications but maintaining high timeliness rates can be identified here. Each pane gives a plot of weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile regions for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis. ...... 62
9.7 Tier 1 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment ...... 64
9.8 Tier 2 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment ...... 65
9.9 Tier 3 Model ULBs with respect to Timeliness of Service for New Property Tax Assessment ...... 66
9.10 Distribution of Inaccurate Application Durations for WT by Tier: This plot describes the distribution of the percentage of inaccurately recorded transactions by population tier. ...... 68
9.11 WT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB for WT. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines. ...... 69
9.12 WT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier ...... 70
9.13 Distribution of Adjusted Timeliness for WT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow line indicates the median t_adj for the corresponding tiers. ...... 71
9.14 Distribution of Weighted Impact for WT by Tier: Boxplots for the distribution of weighted impact by tier, where markers are ULBs of size proportional to the volume of applications received ...... 72
9.15 GEImpact vs. Adjusted Timeliness of WT by Tier: ULBs handling high volumes of applications but maintaining high timeliness rates can be identified here. Each pane gives a plot of weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile regions for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis. ...... 74
9.16 Tier 1 Model ULBs with respect to Timeliness of Service for New Water Tap Connection ...... 76
9.17 Tier 2 Model ULBs with respect to Timeliness of Service for New Water Tap Connection ...... 77
9.18 Tier 3 Model ULBs with respect to Timeliness of Service for New Water Tap Connection ...... 78
9.19 Average Timeliness across Tiers by PGR Department ...... 80
9.20 PGR Timeliness by Department of Complaint and Tier ...... 81
9.21 Process for Identifying Model ULBs for a Complaint ...... 82
9.22 New Water Tap Connection Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column. ...... 84
9.23 New Property Tax Assessment Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column. ...... 84
9.24 PGR Right Collection Parameters: The functionary-level parameters are displayed under the columns of each functionary, while the parameter at the service level is displayed in the rightmost column. ...... 85
9.25 IPI Calculation for All ULBs ...... 86
9.26 Tier 1: GEImpact_total vs. IPImpact_total ...... 87
9.27 Tier 2: GEImpact_total vs. IPImpact_total ...... 87
9.28 Tier 3: GEImpact_total vs. IPImpact_total ...... 88
9.29 GEI Comparison for All Services by Tier ...... 89
A.1 Service Level Agreements for PGR ...... 102
B.1 Examples of Public Grievance types for which loss of confidentiality leads to financial loss ...... 103
B.2 Examples of Public Grievance types for which loss of confidentiality leads to financial and cultural loss ...... 104
B.3 Loss of confidentiality can be to either the complainer or the subject; the relationship is related to whether the loss is financial or cultural. ...... 104
C.2 New Property Tax Assessment: Necessary Data Matrix ...... 107
C.4 New Property Tax Assessment: Operational and Administrative Matrix ...... 109 C.6 New Property Tax Assessment: 2018 Data Fields ...... 111
D.1 New Water Tap Connection: Necessary Data Matrix ...... 113 D.2 New Water Tap Connection: Operational and Administrative Matrix ...... 114 D.3 New Water Tap Connection: 2018 Data Fields ...... 115
E.1 Public Grievance Redressal: Necessary Data Matrix ...... 116 E.2 Public Grievance Redressal: Operational and Administrative Matrix ...... 117 E.5 Public Grievance Redressal: 2018 Data Fields ...... 120
F.3 New Property Tax Assessment Timeliness of Service Results ...... 124
F.4 PT Population vs. Percentage of Inaccurate Application Durations: This plot shows no correlation between the population of a ULB and its compliance with proper use of the ERP System. ...... 125
F.5 PT Volume of Applications vs. Adjusted Timeliness ...... 126
F.8 New Water Tax Assessment Timeliness of Service Results ...... 129
F.9 WT Population vs. Percentage of Inaccurate Application Durations: This plot shows no correlation between the population of a ULB and its compliance with proper use of the ERP System. ...... 130
F.10 WT Volume of Applications vs. Adjusted Timeliness ...... 131
F.14 Public Grievance Redressal Timeliness by ULB across Departments ...... 135
F.15 Tier 1: GEImpact vs. IPImpact ...... 136
F.16 Tier 2: GEImpact vs. IPImpact ...... 137
F.17 Tier 3: GEImpact vs. IPImpact ...... 138

Chapter 1
Introduction
India has been undergoing rapid digitization in the private and public sectors. Behind China, India is the second fastest digital adopter among 17 major digital economies [17]. Even with a population of 1.2 billion people, India has instituted the world's largest national identification system with the implementation of Aadhaar by the Unique Identification Authority of India (UIDAI) [14]. With the introduction of the "100 Smart Cities Mission" in 2015 [6], India started focusing on digitizing governance across cities to improve accountability, transparency, and efficiency. With the immense wealth of personal data being digitized, guarantees on personal privacy were called into question. Consequently, in August of 2017, the Supreme Court of India declared that privacy is a fundamental right (Section 3.2.2). The Srikrishna Committee of Experts, constituted following the judgement, further expanded on and explored the societal and operational implications of the judgement.

We work with eGovernments Foundation (eGov) - a company that develops e-governance platforms that enable city and state governments to improve citizen service delivery and improve transparency - to investigate the juxtaposition of rapid digitization and personal privacy. Until recently, all citizen data was collected via paper forms. During the push for digitization, organizations tasked with making services digital and equipping municipalities with digital platforms translated the paper process directly into a digital process. All the information that was collected via paper then was collected digitally. Just as officials processing data on a paper form would have access to all the data fields, officials processing digital applications now do as well. However, with the advent of big data and the ability to save and search large sets of data easily, data that on paper would not have been easily accessible can now be searched with relative ease.
Instead of having to manually go through an immense volume of New Property Tax Assessment applications, for example, an official or any citizen can merely search eGov's website to find out what someone owes in property taxes to the government. The online system does provide accountability and transparency regarding a government service's workflow and government officials' performance. However, due to the current lack of privacy constraints, personal information with cultural, financial, or functional implications is left
vulnerable to exploitation and discrimination.

Though there has been a wealth of work on anonymizing data to hide personal information, the question of privacy is separate from that of anonymity. Techniques such as k-anonymity [22], differential privacy [9], and searchable encryption [8] provide methods to anonymize large sets of data. Anonymizing data does not necessarily address privacy violations or infringements due to inference, as Barocas and Nissenbaum point out [15]. Particularly in the case of India, details other than a person's name, such as job title or house address, can provide a great deal of information regarding caste, income level, political preferences, and so on. Inference comes not from specific data fields but rather from contextual information in the underlying data. Consequently, we strive to understand the types of implications that specific data can have in the context of government services, regardless of advances in anonymization methods.

Through field work, collection of data, and further analysis, we observe and analyze the types of data collected by municipalities, their use cases for government services, access controls to personal data, and the trade-offs between governmental effectiveness and privacy-driven constraints on data access. Of the more than 20 government services eGov is in the process of digitizing, we investigate three citizen services: New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal in the 112 urban local bodies (ULBs) in Andhra Pradesh, India. For New Property Tax Assessment and New Water Tap Connection, citizens can submit applications online via eGov's online system or in person via municipal Citizen Service Centers. Citizens can submit complaints regarding a range of municipal/infrastructural issues using the Puraseva mobile application.
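To make the anonymity discussion above concrete, the following is a minimal, purely illustrative sketch of a k-anonymity check; the field names and records are hypothetical and are not drawn from the thesis data. It shows that k-anonymity is a property of groups of records over chosen quasi-identifier fields, and that a table can fail (or pass) the check independently of whether names are present.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    groups = Counter(
        tuple(record[qi] for qi in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical records (not real municipal data).
records = [
    {"ward": "12", "occupation": "teacher", "tax_due": 4200},
    {"ward": "12", "occupation": "teacher", "tax_due": 3900},
    {"ward": "7",  "occupation": "vendor",  "tax_due": 1100},
]

# The ("7", "vendor") group contains only one record, so that person
# remains identifiable even with names removed.
print(is_k_anonymous(records, ["ward", "occupation"], 2))  # False
```

Even if the table were generalized until this check passed, a field like occupation could still support inferences about income or caste, which is precisely the gap between anonymity and privacy noted above.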
Data such as personal details, building construction type, family occupancy, physical location, and ration card information is collected for various services. Across all ULBs, each service has a standardized workflow for processing the data, from the opening of the application until its approval or rejection. Each official in the workflow has a specific task regarding the application, and subsequently uses specific information in the application to complete that task. The eGov ERP system logs timestamps for task completion, details of and comments for each task, and so on. In order to provide a standard for understanding and analyzing the trade-offs between government efficiency and citizen privacy in these workflows across all ULBs, we define a Government Efficiency Index (GEI) and Information Privacy Index (IPI). By analyzing municipal data use and calculating these indices across all ULBs in Andhra Pradesh, we strive to understand:
1. How does loss of data integrity or confidentiality affect data use?
2. How do we measure government efficiency and informational privacy in the Indian context?
3. How can we use this methodology to understand how to maximize both efficiency and privacy?
In order to answer these questions, this document proceeds as follows. In Chapter 2, we elaborate on the research questions this document strives to answer. Chapter 3 discusses the history of the Supreme Court judgement and the need for privacy in the Indian context; Chapters 4 and 5 evaluate the value of privacy and various models of privacy for India. Chapter 6 describes the municipal organizational workflow for, and the citizen data collected from, the three services of study: New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal. Details specific to eGov's implementation of these services are included as well. Chapter 7 discusses the impact of loss of integrity and loss of confidentiality on citizen privacy. We provide the formulation of GEI and IPI in Chapter 8, detailing the calculation of these indices at multiple levels of organization. Chapter 9 presents the results of these calculations across the three services, in addition to further analysis to improve our understanding of how ULBs use the ERP system. We identify model cities, or cities that perform well with respect to GEI, IPI, or both. Given the observations and determinations from Chapter 9, Chapter 10 discusses the subsequent use cases for the methodology in determining model cities and details possible impacts on public policy. Lastly, we conclude in Chapter 11 with a summary of our observations and directions for further work.

Chapter 2
Research Questions
The overarching question of interest to this thesis is the following:

Governance-Privacy Tension: How can cities (as state actors) use citizen data to maximize governance while protecting citizens' fundamental right to privacy?

The Srikrishna Committee white paper [21] identifies several privacy principles of importance to India's context. Given the focus of this research on cities, we focus on the following subset: Data Collection, Data Use, Data Disclosure, Data Security, and Data Anonymity. These are principles that affect both governance efficiency and privacy, and for which cities, acting as Data Controllers, must determine what their policies ought to be. In order to understand the tension between governance and privacy, it is important to first analyze the effects of loss of privacy for citizen data, then construct a standard methodology to evaluate both efficiency and privacy across cities. Using standardized indices as a tool, we can then identify which ULBs in particular are able to maximize efficiency without loss of citizen privacy. Subsequently, the sub-questions we answer are as follows:
1. How do ULBs compare against each other on the axis of efficiency?
2. How do ULBs compare against each other on the axis of privacy?
3. Which ULBs are able to maintain high GEI or high IPI?
4. Which factors lead to a decrease in GEI or IPI?
5. Which ULBs are able to maintain high GEI and high IPI?
6. What factors impact both GEI and IPI?
...
Chapter 3
Background: Digitization and Constitutional Privacy
3.1 Advances in Digitization of Cities
India, the second most populous country in the world, has been making strong efforts to improve the efficiency of its metropolitan areas. As of 2014, the urban population of India is 410 million, and the United Nations has projected that India's urban areas will grow by 404 million more people. As a result of this urban revolution, which has been underway for the past decade, urban bodies in India have been struggling to keep up with the management and monitoring of large populations. Without accountable collection and processing of data, policymakers may not have access to the most up-to-date information on societal, financial, and environmental problems and resources. Furthermore, without access to proper data, urban planners may have difficulties optimizing the allocation of these resources. In order to accommodate the increasing urban population, comprehensive development of physical, institutional, social, and economic infrastructure is necessary.

To improve the quality of life in urban areas and to improve urban sustainability, the Indian Prime Minister Narendra Modi launched the 100 Smart Cities Mission in 2015. The Mission, commissioned under the Union Ministry of Urban Development, is a five-year urban renewal project that includes the development of 100 Indian cities and the revitalization of 500 more. According to its official guidebook, the Mission's core infrastructural focuses consist of ten areas, including robust IT connectivity and digitization; good governance (especially e-Governance and citizen participation); sustainable environment; safety and security of citizens, particularly women, children, and the elderly; and health and education.

Of the core infrastructure objectives outlined in the Smart Cities Mission Guidelines, the central government and state governments have been putting a focus on the implementation of e-governance
projects and digitization of citizen data. The goal of e-governance is to increase the transparency, accountability, and efficiency of basic governmental tasks - including but not limited to property tax payment, maintenance of public infrastructure, and the issuing of marriage licenses, commercial licenses, and birth certificates - as well as of informational and transactional exchanges within the government, between the government and government agencies at the national, state, municipal, and local levels, and between cities and businesses. E-governance also empowers citizens through access, use, and ownership of their own data. State governments have hired private consulting firms and companies to develop and implement the appropriate software and hardware infrastructure. Once the municipal offices become almost completely paperless, the public will have access to every online record that exists in the e-governance databases, as privacy of data and information security is not yet enforced.

As observed from fieldwork this summer, municipal government administrations tend to believe that broad access to citizen information, the status of their records, and the progress logged by the records will increase the accountability of their own officials, increase the efficiency of work, and improve the public's faith in the government's ability to work for the individual. Additionally, with the advent of machine learning and data science techniques, third-party observers will be able to use the data for research that could improve service delivery or give insights into public goods. Private corporations may be able to cater their goods and services to certain populations based on needs and tastes observed from the data. Furthermore, policymakers will be able to make more data-driven decisions.
Despite the possibility of significant public and private benefits to various actors, data collectors and data holders are being forced to re-evaluate their methods and policies as a result of the Indian Supreme Court's recent landmark decision declaring privacy a fundamental right under the Indian Constitution.
3.2 The Constitutionality of Privacy
In 2009, India mandated that every citizen register their information and certain biometric data under a new unique identification system called Aadhaar. While Aadhaar was used to verify identification and document citizens, there were numerous cases in which sensitive personal information was leaked or hacked and privacy was compromised. As petitioners questioned the need to collect biometric information, data security, and personal privacy, the central government convened a Supreme Court bench to decide whether, under the Indian Constitution, privacy is a guaranteed fundamental right. The Supreme Court ruled that privacy is in fact a fundamental right, intrinsic to the values of Article 21 of the Indian Constitution, which gives citizens the right to life and personal liberty. The Court noted that the complexity of regulating privacy derives from the context-dependent nature of privacy, and constituted a Committee of Experts to deliberate on a data protection framework for the country.
3.2.1 The Aadhaar Issue
In an effort to create a comprehensive database of its citizens, in 2009 the Government of India created Aadhaar, the world's largest biometric ID system, administered by the Unique Identification Authority of India (UIDAI). The Aadhaar system would ideally link a person's bank account, Aadhaar number, and mobile phone number to create a cashless, presence-less, paperless citizen information system. Aadhaar was meant to be used by companies to know their customers, to verify customers online, and to create a unique ID [13]. However, the debate on personal privacy was re-instigated when the Government of India passed the Aadhaar Act in 2016. Several stakeholders, including private organizations, municipal governments, activists, and citizens, filed petitions against UIDAI regarding the constitutionality of the Aadhaar Act and the scheme's violation of the right to privacy [19]. Petitions included objections to the lack of transparency regarding the handling of private data and the lack of data security by information collection agencies. Furthermore, many questioned the need for the government to collect all ten fingerprints and two retinal scans in order to uniquely identify each person. There existed little to no guidance on how or what information could be shared within government agencies or how much personal information could be collected by the government. First, on July 21, 2015, a three-judge bench clarified that mandating that all citizens must have Aadhaar violates a Supreme Court interim order from September 23, 2013, which held that Aadhaar is voluntary. The central government argued the next day that privacy was not a guaranteed fundamental right under the Indian Constitution, that the right to privacy is not absolute, and that privacy is subject to restrictions in the public interest.
On August 6, 2015, the three-judge bench upheld the petitions, which argued that linking the biometric registration process with basic and essential subsidies and welfare schemes constituted a violation of privacy.
3.2.2 Supreme Court Decision
In 2009, India implemented a new national ID system, Aadhaar. Citizens were required to register personal information as well as certain biometric data for Aadhaar. While Aadhaar was intended to verify identities and document citizens, it was not entirely secure: there were numerous cases of hacking, leaking, or selling of sensitive personal information. As a result, petitions were filed by citizens and civil liberties groups questioning the need to collect biometric information, data security, and personal privacy. The central government convened a Supreme Court bench to decide whether privacy is a guaranteed fundamental right under the Indian Constitution.
Privacy is a Fundamental Right
The central government convened a nine-judge bench to decide the following questions, reflecting how the framers of the Constitution envisioned the nature of privacy:
• Is privacy a guaranteed fundamental right in the Constitution?
• What is privacy defined as?
• Is the right to privacy embedded in the right to liberty and personal dignity, or in other guarantees of protected fundamental rights?
• In what parts of a citizen's life is privacy guaranteed?
• How much should the government regulate privacy (nature of regulatory power)?
• What are the different aspects of privacy, and does the Constitution cover some but not others?
On August 24, 2017, the Bench unanimously decided that under the Indian Constitution, privacy is a fundamental right, subject to exceptions for national security, protection against crime, and protection of revenue. Observing that the Indian Constitution is a dignitarian constitution focused on upholding every citizen's personal dignity, the Bench outlined several reasons why privacy is important for ordered liberty: (1) privacy is a form of dignity; (2) privacy provides a limit on the government's power as well as on private sector entities' power; (3) privacy is key for freedom of thought and opinion; (4) it provides the right to control personal information as well as an incentive for the development of personality; (5) a guarantee of privacy prevents unreasonable intrusions by malicious public, private, or individual actors. It was determined that privacy is intrinsic to the values of Article 21, which gives citizens the right to life and personal liberty. Furthermore, privacy should apply both to physical forms and to technological forms of information; rights to enter the home should be up to the individual, excepting the security reasons listed in Article 14. Lastly, privacy serves eternal values and guarantees as well as the foundation of ordered liberty. Consequently, the Bench formulated a three-fold requirement for a valid law on privacy:
1. A law stating that privacy is a fundamental right according to Article 21 should exist.
2. To guard against arbitrary state action, the restrictions imposed on the nature and content of the law should abide by Article 14's standard of reasonableness.
3. The law must be proportionate to the objects and needs it seeks to fulfill.
The Bench, recognizing that data protection and data privacy are complex issues that require expert opinion, mandated that the government create a Committee of Experts under the Chairmanship of Justice BN Srikrishna, a former judge of the Indian Supreme Court, to deliberate on a data protection framework for the country. While the constitutionality of the right to privacy was decided, the complexity of regulating privacy derives from the context-dependent economics of privacy. To better understand existing models of privacy protection and enforcement, it is important to understand how the definition and value of privacy transform depending on context.

Chapter 4
Understanding the Value of Privacy in the Indian Context
The combination of big data and machine learning techniques can reveal a great deal about society and has the potential to bring about positive societal change, while digital records can allow for greater efficiency and accountability. Policymakers have to reconsider how open open data should be, and where the fine line lies between keeping information private and taking full advantage of the large scale of digitized information. Often, data collected with informed consent for a particular purpose can be repurposed and analyzed for a different set of insights. In these cases, the economic value of the data changes and, especially in the big data realm, privacy regulation grapples with problems of unpredictability, externalities, probabilistic harms, and valuation difficulties.[11] The economics of privacy concerns the trade-offs associated with the balance of public and private spheres between individuals, organizations, and governments with respect to personal privacy.[7] In the Indian e-governance context, personal data is any information that provides knowledge of an individual's traits or attributes, including but not limited to age, gender, income tax level, address, number of family members, occupation, education level, and welfare status. The data generated by the citizens, who are called the data subjects or providers, is passed sequentially to the data collector, data holder, and data users, who may be private or public entities providing a particular service to the citizens. The data collectors for e-governance data are the municipal governments, but the data holders in the backend differ from state to state. The states of Andhra Pradesh and Maharashtra, for example, hired the eGovernments Foundation to implement and build specialized data collection, visualization, and storage tools.[3] The companies and consulting firms hired by states vary, as state governments are fairly independent of the central government.
All stakeholders, including individual citizens, the Government of India, UIDAI, data collection agencies, data storage agencies, analytics firms, government welfare agencies, and telecommunications companies, to name
a few, will have access to or will have to transfer personal data. While citizens derive individual benefits and enjoy any common public goods produced using the assembled data, three key themes emerge on the flow and use of information about individuals by firms or governmental organizations.[7] First, a single unifying practice of privacy is difficult to formulate, as privacy issues of economic relevance arise in a wide variety of contexts and a variety of markets for personal information. For example, the definition of privacy, even within the Indian context, changes when discussed with respect to national security versus commercial marketing versus healthcare. Although the Smart Cities Mission mentions only security in the e-governance context, the Srikrishna Committee seeks to create an overarching data privacy protection framework that may minimize the costs of standardizing across sectors, even if it may not be able to minimize trade-offs. Second, it is difficult to conclude whether privacy protection entails a net positive or negative change in economic terms, as the benefits of protecting privacy, including fraud and identity protection, may or may not be greater than the costs of anonymizing data, securing the storage of data, and so on. For instance, while revealing mobile location information can be beneficial in improving traffic conditions or transportation efficiency, it may be considered an intrusion upon privacy if the government continuously monitors citizens' locations with the intent of surveillance. In both cases, the same information is being monitored, but depending on the context, the credibility and legitimacy of the data collection may be undermined, perhaps leading to greater distrust in the government and higher costs of data collection for public services.
Lastly, especially in a country like India, where its 1.3 billion citizens lie on a broad spectrum of education and income levels, a large number of poor or poorly educated people are at a disadvantage in accurately assessing the benefits or consequences of sharing or protecting personal information. The average citizen in India has undergone only 5.1 years of schooling [4], and even the most educated may not necessarily understand the power of analytics or machine learning. When requesting consent for the use of particular information, organizations intending to perform predictive analysis or machine learning cannot lay out all of the scenarios in which the information will be used and what insights may be discovered about their data subjects. Disclosing data causes a reversal of information asymmetries: before the information is released, the data subject (the citizen, in this case) holds greater knowledge about the information than the data holder (the government or a third party). Afterwards, the data subject may not know what the data holder can do with the data and the consequences associated with sharing it. While giving up privacy may allow a citizen to receive tangible benefits such as welfare approval, revealing the data may also incur intangible consequences, such as the loss of autonomy and the possibility of increased surveillance, that the common citizen will not be able to foresee or gauge. The market cannot respond appropriately to information gaps where users cannot express their true preferences for privacy protection.[10] As a result of this information asymmetry, even specific privacy regulations cannot necessarily cover unknown use cases or account for under-informed citizens.
In addition to the caveats that exist in creating privacy legislation, there are two basic trade-offs that the government sees in the sharing of personal data.[7] First, individuals and communities can economically benefit from sharing data. One particular case in which the sharing of data is undoubtedly beneficial is India's Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA), the world's largest social welfare scheme.[20] Through rural employment, the program seeks to alleviate poverty and provide benefits to impoverished and marginalized sections of society. In a social welfare state, the collection and storage of data about each individual citizen who is a part of the program is not only necessary for accountable monitoring of progress and allocation of scarce public resources, but also a powerful enabler of the spread of innovation and knowledge if legitimately deployed to improve the state's understanding of the causes of, and predictive power over, particular living conditions, education levels, income levels, and so on. At the same time, when Aadhaar is linked to welfare schemes and education scholarships, inappropriate access to the information could compromise personally identifiable information like banking details or culturally sensitive information such as caste. Second, certain positive and negative externalities arise through data creation and transmission. The obvious positive externality is that aggregate and individual analysis of the data may reveal correlations between particular events in, for example, the health or education sectors. For example, researchers with access to education data were able to discover that student attendance in low-income areas increased when in-school meals were provided.
As a result, the central government implemented the Midday Meal Scheme, which targeted certain demographics of school children and employed an estimated 2 million poor and marginalized women to cook and help with meals.[5] Negative externalities, however, can include intrusive surveillance by the government or targeted pricing by corporations, arising from the comfort of sharing one's information when the people around one are willing to do the same. As a result, the economic value of one's personal data continuously changes depending on context and on how willing other people are to share their personal information.

Chapter 5
Existing Models of Privacy
The constitutional status and contextual dependence of privacy pose a considerable challenge in formulating one set of standardized regulations on the conditions under which personal information can be shared and the methods by which to share and monitor the data. The two main components of international privacy regulations are, first, guidelines on how to protect the data and, second, guidelines on how to enforce privacy protection. Internationally, the three common models of privacy protection can be described as i) the Command and Control Model, ii) the Self-Regulation/Sectoral Model, and iii) the Co-Regulatory Model.[21] The Srikrishna Committee assessed the three models and concluded that the Co-Regulatory Model was appropriate for India, as its varying levels of government involvement and industry participation can be molded to the Indian context.
5.0.1 Command and Control Model
The Command and Control Model, also known as the Comprehensive Model,[1] includes a general law that regulates the collection, use, and dissemination of personal information in the private and public sectors, governed by an oversight body. Around 40 countries and jurisdictions in Europe have adopted this model,[1] mainly through the European Union Data Protection Directive (EUDPD) and the Organization for Economic Co-operation and Development (OECD) Guidelines on the Protection of Privacy and Transborder Flows of Personal Data. There are three reasons for adoption of a comprehensive model: 1. To remedy past injustices incurred by authoritarian regimes; 2. To ensure consistency with European privacy laws to facilitate trans-border information transfer while protecting personal privacy; and 3. To promote electronic commerce.[16] The challenges of implementing this model include costly paperwork and documentation even in low-risk scenarios, as the model sets out minimum requirements for data collection, storage, and transfer, as well as insufficient opportunities for innovation in data processing, as the purpose of use of the data must be pre-determined and communicated to the data subjects.
With the goals of creating a unified economic market and providing strong overall protection of privacy within the EU, the EUDPD sets down Data Protection Principles: transparent data processing, purpose limitation and proportional use, data minimization, accuracy, data retention periods, data security, and accountability to a supervising body. Additionally, the Directive outlines data subjects' rights of access, rectification, deletion, and objection to the data; restrictions on onward transfers; additional protections for special categories of data and direct marketing; and a prohibition on automated individual decisions. Unlike the sectoral model, these principles put the onus of protecting private data on the data collectors and data holders as opposed to the data subjects. As can be observed, such strict restrictions on how personal data can be used limit freedom of innovation through analytics. Given the breadth and quantity of data being collected at the federal and municipal levels in India, the Srikrishna Committee noted that this model is quite restrictive and computationally expensive to implement in India. Additionally, the dichotomy of federal and municipal control would create difficulties in creating a federated, sectoral framework and raise issues about how involved state machinery should be.[21]
5.0.2 Sectoral Model
The Sectoral Model of privacy protection applies to countries that have varying legislation across industries and sectors, and is based on a combination of legislation, regulation, and self-regulation. Each industry is free to draft its own set of guidelines and incorporate self-policing techniques for enforcement of its codes of practice.[21] While this approach provides flexibility and specificity across industries, a significant disadvantage is that over-specified regulation requires modifications or entirely new sets of regulations when new technologies come into use. Furthermore, the self-regulatory approach can be self-serving and ineffective, as there is no official oversight agency in place. Consequently, some countries opt to use a sectoral approach combined with the Comprehensive Model to have general privacy regulations in addition to more specific industry-oriented guidelines. The standalone Sectoral Model is used in the United States, Japan, and Singapore. The United States Federal Trade Commission's Code of Fair Information Practice Principles (FIPPs) gives guidelines on the use of personal data in the online marketplace, upon which sectoral legislation is built. It is interesting to note that while the European model holds the data collectors and users responsible for the privacy protection of personal data, the American model holds the data subject responsible for the privacy and disclosure of their data. Data collectors must follow certain guidelines on notice and consent, but regardless of who has access to the data, the data subject owns their data.
As a result, the core privacy principles are consent-based: Adequate Notice of Data Use, Choice/Consent for Data Use, Access/Participation of the Data Subject, Integrity/Security of Data, and Enforcement/Redress for Data Collectors.[18] Other examples of sectoral regulation include the Fair Credit Reporting Act (1970), the Video Privacy Protection Act (1988), the Children's Online Privacy Protection Act (1998), and the Cable Television
Consumer Protection and Competition Act (1992). Similar to the European model, the American guidelines include rules about the security of data from internal and external threats, as well as enforcement guidelines from a regulatory body. Physical protection of data is guaranteed by both models. Privacy protection of data, however, is not necessarily guaranteed by the self-regulation model. First, the FIPPs are not legally enforced and are merely guidelines to maintain privacy-friendly, consumer-oriented data collection practices. The American model does ensure that people have a choice in providing data, and that they know what data is collected and how it is being used, but companies can deny services if data subjects do not agree to their terms of disclosure. Furthermore, industries will most likely choose the self-regulatory method of enforcement to avoid third-party oversight. While some firms may truly institute costly, self-regulatory standards, competitors may free ride on the sector's improved reputation for protecting privacy.[10] While the sectoral model is the least intrusive and most efficient in ensuring fair information practices, the government risks the possibility of firms putting their own profits ahead of the public interest.[10] While the basic privacy principles were instituted in order to be able to engage in commercial data transfers with EU member states, the United States' policy on the government's right to personally identifiable information is completely separate. Certain government agencies, under the Fourth Amendment of the US Constitution and the Patriot Act, are allowed to withhold and analyze any personally identifiable information in the interest of national security. The extent to which privacy can be compromised is vague, and privacy regulation is essentially non-existent when the government is the data holder, collector, and user. In the historical and cultural context of India, self-regulation has been ineffective and highly corrupt.
When industries, such as the telecommunications industry, are composed of oligopolies, self-regulation and guidelines will ultimately lead to improper trade-offs between marketing strategies and the maintenance of personal privacy. Furthermore, the sectoral model does not impose any guidelines on the government's use and processing of personal data. The Srikrishna Committee noted that substantive elements of the self-regulatory model and its data protection framework are not considered part of an enforcement mechanism.[21] As a result, the sectoral method in India would be not only incomplete but also inefficient for such a diverse country.
5.0.3 Co-Regulatory Model
The Srikrishna Committee and the Expert Group on Privacy concluded that the Co-Regulatory Model captures elements of both the Command and Control Model and the self-regulatory model to create a more appropriate middle path that combines the flexibility of self-regulation with the rigour of government rule-making. In this model, both the government and industry draft regulations for privacy protection, which are enforced by the industry and overseen by a state agency. Canada's Personal Information Protection and Electronic Documents Act (PIPEDA) and the Australian National Privacy Principles implement the Co-Regulatory Model. In Canada, privacy in
the private and public sectors is a guaranteed fundamental right under the Charter of Rights and Freedoms. The core privacy principles in the model are similar, if not identical, to those of the EUDPD, where the data collectors are responsible for protecting the privacy of the data subjects. This model, however, spreads regulatory power over individuals, industry organizations, and the central government in a four-tier system composed of legislation enforced by a government protection agency. The government's agency oversees watchdog agencies, which help empower the public and private sectors. The four tiers of the co-regulatory model are implemented as shown in Figure 5.1:
Figure 5.1: Co-Regulatory Model
The co-regulatory model takes into account privacy principles and provides a multi-layered method to ensure that federal agencies as well as private entities and individuals are aware of and abide by privacy regulations. Like the European model, there is an official governmental supervisory body to maintain accountability and verify enforcement. Additionally, like the Sectoral Model, industries have the flexibility of some amount of self-regulation, and so can keep up with the needs of the growing Internet economy and rapid technological change. By the same token, critics of the co-regulation model fear that the government's cooperation with industries may reduce accountability and transparency. Industry lobbying and the power to participate in creating legislation would facilitate the industry taking advantage of co-regulatory processes to capture the agency and enforce its point of view.
5.0.4 Recommended Model of Privacy in India
In 2012, an expert group under the Planning Commission of the Indian Government produced a Report of the Group of Experts on Privacy. Chaired by the former Chief Justice of the Delhi High Court, Justice A.P. Shah, the Expert Group was composed of representatives from industry, civil society, NGOs, voluntary organizations, and government departments. As national programs like the Aadhaar card, DNA profiling, and reproductive rights of women increasingly relied on internet-connected technologies, the information held about a person grew to span data related to health, travel, taxes, religion, education, financial status, employment, disability, living situation, welfare status, citizenship status, marriage status, crime record, and more. Analytic tools that generate economic value out of data and the ubiquitous transfer of data require an overarching privacy policy to regulate the governmental and commercial collection of information. Consequently, the Srikrishna Committee drew from the Group of Experts' Report, examined international and national privacy principles, and identified a set of recommendations for the Indian Government to consider when formulating a privacy framework for the country. The White Paper proposes seven salient features as a conceptual foundation for a Privacy Act for India, which the Supreme Court case file reiterates: Technology Agnosticism, Holistic Application, Informed Consent, Data Minimization, Controller Accountability, Structured Enforcement, and Deterrent Penalties. The data collectors, holders, and users are responsible for protecting the privacy of data subjects. Drawing particularly from the EUDPD and OECD Guidelines, the Expert Group provides a comprehensive set of principles and foundational elements for constructing an exhaustive framework that protects personal privacy, especially in the context of government collection of personal data.
The principles include guidelines on Notice of Data Use, Choice and Consent, Collection Limitation, Purpose Limitation, Access and Correction, Notice of Disclosure of Information, Security, Openness/Transparency of Data Use, and Accountability. The scope of this study in particular is outlined in Figure 5.2.
Figure 5.2: Comparison of Privacy Principles
...
5.1 Data and Data Collection
In this section, we review the sources of our data, both in terms of the use of the current tools for collecting data, through eGov, and our decisions about both site and data type selection. Our key observation is that not all data collected by ULBs for PT, WT, and PGR is necessary for providing these particular services. This discrepancy suggests room for improvement in access controls and therefore in personal privacy.
5.2 e-Governments Foundation, Current Installations and Digital Services
The eGovernments Foundation (eGov) develops digital platforms that enable city and state governments to improve accountability, transparency, and efficiency in the delivery of citizen services and in accounting and organization within the government. The eGov platform is designed to aid in the management of four categories of government information: administration, revenue, expenditure, and citizen services. While administration and expenditure modules account for employee management, legal case management, payroll and pensions, assets, and so on, revenue and citizen service modules mainly include tax evaluations and registrations filed by citizens. Revenue sources include the collection of property tax, water tax, trade licenses, advertisement tax, and fees from government land and estates, while citizen services include birth and death registrations, marriage registrations, an online citizen portal, public grievance registrations, and building plan approvals. The platform allows municipal officials to enter information and view individual and cumulative data on quantitative and geo-spatial dashboards. The digital actions of each employee are logged in order to monitor performance and accountability. The platform also promotes citizen engagement by interfacing with an online citizen portal and mobile app where people can submit and view the status of their applications and registrations, improving transparency and accessibility. eGov's clients include, but are not limited to, the state of Andhra Pradesh, the state of Punjab, the Greater Chennai Corporation, and the state of Maharashtra.
Figure 5.3: eGovernments Foundation ERP Suite[3]
5.3 Site and Module Selection
For this study, we chose research sites located in a single state. For confidentiality reasons, the identity of the state is kept private. Within this state, there are 112 ULBs that are now equipped with the eGov platform. ULBs are classified into three types by population: Nagar Panchayats have populations of less than 100,000 people, municipalities have populations greater than 100,000 people, and municipal corporations have populations greater than one million people. For this study, we chose two municipal corporations and one municipality to construct a representative sample. Administratively, the Director of Municipal Administration (DMA) is a state official who oversees the eGov implementations in all ULBs.
5.3.1 Department Hierarchy
Figure 5.4: Department Hierarchy
Within a state, the DMA, who provides state-level oversight for support services in the municipalities, manages the Additional Director, Joint Directors, and Assistant Directors, who oversee various aspects of all municipalities. Each municipal corporation houses a commissioner who, with the ULB mayor, provides administration and governance of the operations of each district. Each ULB is assigned a commissioner depending on the district in which it resides. The Commissioner defines access controls for employees and can monitor employee performance. Within every ULB, there exist the Administration, Revenue, Accounts, Public Health and Sanitation, Engineering, Town Planning, and Poverty Alleviation departments. Each department is responsible for processing certain modules classified under Expenditure and Revenue. All departments, however, are responsible for the Public Grievance module, depending on factors discussed in Section 6.1.3.
5.3.2 Classification of eGov Modules
Of the four categories of eGov modules, the Revenue and Citizen Services modules are public-facing and relevant to citizen data collection and citizen service delivery. This subset of modules can be grouped into four types based on what kind of information they may reveal about a citizen. First, modules may be revealing of personal identity, as they contain highly sensitive personal information. Birth and Death Registration, Marriage Licenses, and the Citizen Portal fall into this category. Second, modules such as Water Charges, Property Taxes, and Building Approval are revealing of personal assets. Third, the Trade License and Advertisement Tax modules are examples of modules that are revealing of a citizen's commercial assets. Lastly, the Public Grievances module forms its own category, as its function does not necessarily require citizens to reveal sensitive personal information, and it affects public infrastructure.
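This four-way grouping can be expressed as a simple lookup table. The sketch below is illustrative only: the dictionary structure, category labels, and function name are ours, not identifiers from the eGov platform.

```python
# Hypothetical grouping of public-facing eGov modules by the kind of
# information they may reveal about a citizen (labels are illustrative).
MODULE_CATEGORIES = {
    "personal_identity": ["Birth and Death Registration", "Marriage Licenses", "Citizen Portal"],
    "personal_assets": ["Water Charges", "Property Taxes", "Building Approval"],
    "commercial_assets": ["Trade License", "Advertisement Tax"],
    "public_infrastructure": ["Public Grievances"],
}

def category_of(module: str) -> str:
    """Return the revelation category for a given module name."""
    for category, modules in MODULE_CATEGORIES.items():
        if module in modules:
            return category
    raise KeyError(f"unknown module: {module}")

print(category_of("Property Taxes"))  # -> personal_assets
```

Such a mapping makes the later privacy analysis mechanical: each module's collected fields can be weighted by how sensitive its category is.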
5.3.3 Modules Selected for This Study
The Property Tax Module (PT), Water Charges Module (WT), and Public Grievance Module (PGR) were chosen for this study based on volume of data, accessibility, and prevalence. These modules were among the first to be implemented at our site in 2016. As a result, the volume of transactions for each of the modules exceeded 100,000 in 2018. This combination also lets us consider several different types of potential revelation of citizens' personal information. We will examine the nature of this information in further detail in the next few sections.

Chapter 6
Data Available for Analysis
6.0.1 Workflows
For each of the modules, we gathered data and information in three parts. To understand the workflow, we first interviewed eGov's team for the state as well as state officials about the workflow of each module. Each of the three selected modules has its own workflow. Once a citizen submits a form or a request, all of the information that they have submitted is passed through various levels of hierarchy in the appropriate department within a certain number of days. These durations, called Service Level Agreements or SLAs, are unique to the Indian state where we carried out our site visits. If an official does not complete his task within the given SLA, then the task is escalated to the next level in the hierarchy. This accountability model promotes transparency and improves efficiency. We developed an understanding of how the data collected from a citizen is used and passed through a department to provide a particular service, and how long it takes to do so in comparison to the SLA for that service. This information is important in detecting possible efficiency and privacy trade-offs in providing a service.
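The escalation mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the hierarchy levels and the 7-day window are hypothetical placeholders, since the actual SLAs and roles are specific to the state studied.

```python
from datetime import date

# Hypothetical department hierarchy and per-task SLA (illustrative values).
HIERARCHY = ["Junior Assistant", "Revenue Inspector", "Revenue Officer", "Commissioner"]
SLA_DAYS = 7  # assumed state-mandated completion window per task

def current_assignee(received: date, today: date, start_level: int = 0) -> str:
    """Escalate one level up the hierarchy for every missed SLA window."""
    missed_windows = (today - received).days // SLA_DAYS
    level = min(start_level + missed_windows, len(HIERARCHY) - 1)
    return HIERARCHY[level]

received = date(2018, 3, 1)
print(current_assignee(received, date(2018, 3, 5)))   # within SLA -> Junior Assistant
print(current_assignee(received, date(2018, 3, 16)))  # two windows missed -> Revenue Officer
```

The point of the sketch is the accountability property: an unattended application cannot stall indefinitely at one desk, because responsibility moves upward automatically.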
6.0.2 Binary and Qualitative Matrices
The second type of data we needed was a matrix of how each data field for each service is used. We conducted on-site interviews with service functionaries, the state employees who complete certain tasks in the service workflows. During the interviews, a functionary from each level of the workflow was asked to identify the data fields that they were given access to, the data fields that they needed to complete their task in the workflow, and the data fields that were not necessary to complete their task but useful to have access to. From the interviews, we built two matrices to structure this data. In both matrices, the rows contain the data fields collected for a particular service and the columns contain the name of each functionary in the workflow and their tasks. In the Necessary Data Matrix (NDM), cells are filled with "1" if a particular functionary uses a particular
33 CHAPTER 6. DATA AVAILABLE FOR ANALYSIS 34
data field to complete his task, or ”0” otherwise. In the Operational and Administrative Matrix (OAM), a cell is assigned ”O” if the data field is used for operational purposes, as in it is absolutely required for a particular functionary to complete his task. The cell is denoted ”A” if the functionary may use the field for e ciency purposes but is not necessarily required to complete his task. The cell is filled with ”U” if the field is not necessary or used by the functionary at all to complete his task. We assume the NDM and OAM are identical across all ULBs. The state has guidelines on how each service is performed or how the outcome of each service is determined. For services like WT and PT, there is a master sheet that is filled with relevant information, which then calculates the respective tax assessment. The data fields collected and service workflows are the same across all ULBs. The master sheet is also identical across all ULBs. In smaller ULBs, one functionary may be responsible for a couple tasks that are spread out across multiple functionaries in larger ULBs. Although these small variations in responsibilities exists, the tasks performed, data access, and data use we assume are uniform.
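The two matrices can be represented as simple lookups keyed by (data field, functionary). The field and functionary names below are illustrative, not the full matrices of Appendices C–E, and deriving the NDM from the OAM by treating only operational ("O") use as necessary is our simplifying assumption for the sketch.

```python
# Illustrative sketch of the OAM and NDM structures described above.
fields = ["Owner Name", "Mobile Number", "Zone", "Email"]
functionaries = ["Jr./Sr. Assistant", "Revenue Inspector"]

# OAM: "O" = operational (required), "A" = administrative (efficiency only),
# "U" = not used by this functionary.
oam = {
    ("Owner Name", "Jr./Sr. Assistant"): "O",
    ("Owner Name", "Revenue Inspector"): "O",
    ("Mobile Number", "Jr./Sr. Assistant"): "A",
    ("Mobile Number", "Revenue Inspector"): "U",
    ("Zone", "Jr./Sr. Assistant"): "U",
    ("Zone", "Revenue Inspector"): "O",
    ("Email", "Jr./Sr. Assistant"): "U",
    ("Email", "Revenue Inspector"): "U",
}

# NDM (assumption for this sketch): 1 where the field is operationally
# required ("O"), 0 otherwise.
ndm = {key: 1 if use == "O" else 0 for key, use in oam.items()}

# Example query: fields the Revenue Inspector needs to complete his task.
needed = [f for f in fields if ndm[(f, "Revenue Inspector")] == 1]
print(needed)  # ['Owner Name', 'Zone']
```

Structuring the interview answers this way is what later allows the right collection and right use indices of Chapter 8 to be computed as simple ratios of matrix entries.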
6.0.3 2018 Data
The final source of data is the 2018 data collected by the state through eGov's modules. With the permission of eGov and our partner state, we were given access to the data collected by all New Water Tap Connection applications, New Property Tax Assessment applications, and Public Grievance Redressals in 2018 for all 112 ULBs. The data includes all data fields collected by these services as well as details of the workflow transitions for each application. The workflow transition documentation includes when a particular application was received by a functionary, the time it took for the functionary to complete his task, the functionary's comments, and the state-mandated completion deadline for the application. Such granular data on the movement of applications through workflows was extremely important for our results and analysis.
6.1 Service Modules Data
The Property Tax (PT) and Water Charges modules offer various services. For example, in the Water Charges Module, citizens can apply for New Connection, Re-Connection, Closure of Connection, and so on. For both modules, the New Property Tax Assessment and New Water Tap Connection applications and workflows are fairly representative of, and collect the most comprehensive set of data fields among, all of the services in their respective modules. As such, we refer to the New Property Tax Assessment and New Water Tap Connection workflows as the generalized Property Tax Module and Water Charges Module, respectively.
6.1.1 Property Tax Module
The Property Tax Module includes services to evaluate property tax or change property tax. While the representative service is New Property Tax Assessment, the module also includes services like Transfer of Title, Bifurcation, Addition/Alterations, Revision Petitions, Demolitions, and so on. The module requires the applicant to give owner details, property address details, assessment details, amenities, construction details, floor details, details of surrounding boundaries of the properties, court documents, and vacant land details if applicable.
Workflow
The quantitative evaluation of property tax payment depends on the Usage, Classification, Zone, Age, and Occupancy Type data fields. Application particulars, such as contact details and address, are important in verifying personal identity and assets. Once a citizen submits an evaluation request, the data is verified by a Junior/Senior Assistant, then sent to a Bill Collector and Revenue Inspector who verify details and conduct site visits. A Revenue Officer validates the evaluation, at which point the application must be approved at the Commissioner level in order to be complete. In smaller ULBs, two or more of these functions may be completed by the same official. In larger ULBs, the process may be less uniform, with work spread across multiple officials in the same level of the hierarchy. The workflow for processing a property tax assessment application is described in Figure 6.1.[3] All data is first collected from the citizen through an online portal, the Citizen Service Center's (CSC) physical location, or the state's online app. Along the rows we see the various functionaries, including Jr./Sr. Assistant, Bill Collector, Revenue Inspector, Revenue Officer, and Commissioner. The green boxes describe the tasks each functionary is responsible for, and the arrows indicate the order of operations.
Figure 6.1: New Property Tax Assessment Workflow[3]
Binary and Qualitative Matrices
The NDM and OAM matrices were collected on the ground through interviews with ULB functionaries as well as eGov site specialists. The full matrices for our state of study are shown in Appendix C.
2018 Data
We were given access to all New Property Tax Assessment data from 2018. In 2018, 201,458 applications were processed, which went through 931,393 workflow transitions overall. As of the beginning of 2019, the state has processed 366,711 applications, which have undergone 1,986,969 transitions in total. The fields which were collected and their descriptions are shown in Figure C.6.
6.1.2 Water Charges Module
The Water Charges Module, which the Engineering Department manages, includes services to evaluate or change water tax payments. In our study, we analyze applications for New Water Tap Connection, as this service is fairly representative of, and collects the most comprehensive set of data fields among, all of the services in the Water Charges Module. Other services in the module include Change of Usage, Closure of Connection, Re-connection Service, and Additional Water Tap Connection. Similar to the PT module, application particulars are also necessary for verification.
Workflow
The fields that are essential for the evaluation of water tax are Zone, Usage Type, Water Source, Pipe Size, and, where applicable, the White Ration Card. If the resident holds a White Ration Card, they are eligible for subsidies. In that case, the name and address become important for verifying that the person holding the White Ration Card is the one living at the property. The Property Assessment ID must also be provided in the application, through which all of the information from the corresponding Property Tax (PT) assessment is available to the officials in the Water Charges workflow. The workflow generally looks similar to that of the PT module: a Junior/Senior Assistant verifies application details, an Assistant Engineer performs field verification and feasibility testing, a Deputy Executive Engineer/Executive Engineer/Superintendent Engineer scrutinizes the estimation details, and the Commissioner approves the evaluation.
Figure 6.2: New Water Tap Connection Workflow[3]
Binary and Qualitative Matrices
The NDM and OAM matrices were collected on the ground through interviews with ULB functionaries as well as eGov site specialists. The full matrices are shown in Appendix D.
2018 Data
In 2018, 101,849 New Water Tap Connection applications were processed, which went through 747,521 workflow transitions overall. As of the beginning of 2019, the state has processed 160,809 applications, which have undergone 1,192,056 transitions in total.[3] The fields which were collected and their descriptions are shown in Figure D.3.
6.1.3 Public Grievances Module
The Public Grievance Redressal (PGR) Module allows citizens to submit a complaint to the municipality about sanitation issues, stray animals, illegal businesses, non-functioning street lights, concerns regarding schools, voter lists, and so on. Each complaint is mapped to an internal department and an official in that department. Once the complaint is submitted and reaches an official in the relevant department, the official has an SLA by which he must address the issue. If he does not address the concern within the given SLA, then the task is escalated to the next level in the hierarchy. This accountability model promotes transparency and improves efficiency. A comprehensive list of complaint types and corresponding SLAs is outlined by the state, as shown in Figure A.1.
Workflow
The workflow and escalation of tasks depend on the complaint and the department to which the complaint is assigned. Unlike Property Tax and Water Charges, each PGR complaint can be addressed and completed by one functionary. The module maps each complaint to a municipal administration department depending on the type of grievance submitted. It requires the citizen to input their contact details and grievance details, including the location of the grievance and photos of the complaint if relevant. Depending on the type and geographical area of the complaint, the back-end maps the complaint to a functionary. If the functionary exceeds the SLA for that complaint, then the complaint escalates to his superior according to the escalation hierarchy shown in Figure 6.3.
Binary and Qualitative Matrices
The NDM and OAM matrices were collected on the ground through interviews with ULB functionaries as well as eGov site specialists. The full matrices are shown in Appendix E.
2018 Data
In 2018, 135,242 PGR submissions were processed, which went through 654,418 workflow transitions overall. As of the beginning of 2019, the state has processed 265,192 submissions, which have undergone 1,307,552 transitions in total.[3] The fields which were collected and their descriptions are shown in Figure E.5. PGR has not yet been implemented in all departments across all ULBs.
Figure 6.3: PGR Escalation Hierarchy[3]
In the eGov data we received, ULBs had a non-trivial number of transactions for the Revenue Department, Administration Department, Town Planning Department, and Urban Poverty Alleviation Department. In this study we analyze complaints from only these four departments.

Chapter 7
Understanding Data Use and Data Disclosure
To maximize privacy, it seems obvious that municipalities should collect only the data that they need to determine a tax assessment or provide a particular service. Why, then, is this not the case? The current online forms were transcribed from the previously existing paper forms. Pre-digitization, as much information as possible was collected from the citizen, even if some fields seemed unnecessary. Privacy was still preserved because the data was not easily searchable, and citizens were willing to give up more information so that they would not have to spend more time and money returning to the municipal office to provide additional details. In the digital era, where data is searchable and more easily accessible, collecting unnecessary data jeopardizes personal privacy. From talking with various officials and gaining an in-depth understanding of the workflow, we understand that all citizen data collected for each module falls broadly into three categories: necessary for completing the function of the module, collected for efficiency/accuracy of the workflow, and unnecessary to the module (see Figure 7.1). Necessary use means that a data field is required either for a quantitative valuation or for verifying personal identity or assets. In Water Charges, for example, the attributes Property Type, Usage Type, Pipe Size, Water Source Type, Connection Type, and Address are the only factors used to quantitatively calculate the water tax. Attributes such as Property Assessment Number, White Ration Card, and Name of Applicant are used to verify details of the request, and so are important to validate the request. Data fields collected for efficiency/accuracy are not strictly needed to complete the function of the module, but give officials a clearer picture of the application.
Some of these attributes include information that can be observed on site, such as amenities or details of surrounding areas for the Property Tax module. Other information makes establishing contact with the applicant easier,
such as the name and phone number of the citizen filing a public grievance. Lastly, there are data fields that are unnecessary to complete a function but are still collected. In PGR, for example, the address of the complaining citizen is not necessary either to contact him or to address the grievance. For the Water Charges and Property Tax modules, email is not used to contact or give updates to the citizen, as the status of an application is communicated verbally or through the citizen portal or app.
7.1 Understanding Implications of Loss of Data Integrity
In this section we focus on several sorts of implications that derive from loss of data integrity. We begin by considering privacy implications and then consider the functional and financial implications of loss of data integrity. The reasons for loss of integrity are left to future work.
7.1.1 Privacy Analysis I: Loss of Data Integrity
According to NIST,[12] loss of data integrity is defined as data being altered in an unauthorized manner during storage, processing, or transit. In order to understand the implications of lost integrity for each data field in the three modules, we determined three types of implications that the loss of integrity may have: cultural, financial, and functional. Cultural implications relate to what social inferences can be made about a person from a particular data field. There is a financial implication if the citizen is affected financially, and a functional implication if the function of the module is unfulfilled. Due to the size of this detailed analysis, we summarize it here.
7.1.2 Functional and Financial Implications of Loss of Data Integrity
For each data field in each of the modules, we evaluate what kind of implication may result if only that particular data field were altered in an unauthorized manner. For example, if only a person's name in the Water Charges module were changed, so that the name on the application no longer matched the name in the property assessment, then the water tax evaluation would be stalled. As a result, the loss of integrity of the Name attribute in the Water Charges module would have a functional implication. In the case of Water Charges and Property Tax, financial and functional implications go hand in hand: if a tax evaluation is incorrect, then the citizen is financially affected. From our analysis, we observe that the loss of integrity is likely to have functional and financial implications. The loss of integrity can affect the verification of personal identity or assets, affect the quantitative valuation of the water or property tax, or hinder efficiency. As described in Section 6.2, necessary data fields can be separated into those required for verification and those required for the quantitative calculation that is the output of a module. If the integrity of either type of necessary field is lost, then by definition the function of the module cannot be completed. There may be an over-valuation or under-valuation of a property or water tax if the factors determining them are changed; this poses a financial implication. A functional implication is the function being stalled if the identifying information of a person, his assets, or a public grievance is inaccurate. If the data fields collected for efficiency/accuracy are corrupted, then the function of the module may not be stalled, but may be severely hindered. If a Grievance Photo were changed to a different image, a functionary would still be able to find the location of the grievance through the Grievance Details or by contacting the complainant, but his job would likely be severely sidetracked by the irrelevant image. Therefore, we understand that protecting the integrity of citizen data fields is important in order to protect against harmful functional and financial implications.

Figure 7.1: Table of Fields Necessary, Collected for Efficiency/Accuracy, and Unnecessary for Completion of PT Assessment, WT Assessment, and PGR
7.2 Understanding the Implications of Data Disclosure on Privacy
In this section we consider the effects of data disclosure on privacy. We begin by considering who will be impacted and the relative severity of that impact. We then discuss the financial and cultural implications, and conclude with a discussion of the challenges of inference over the grievance data.
7.2.1 Privacy Analysis II: Loss of Data Confidentiality
In parallel with the loss-of-integrity analysis, we analyzed the data fields for loss of confidentiality. According to NIST, loss of data confidentiality means that protected data is accessed by or disclosed to an unauthorized party. For each data field, we evaluated what kind of implication may arise from the field being exposed. At a high level, we can separate two dimensions of a citizen service: the type of service being requested/offered, and who the service is being offered to/offered by. The implications of the loss of confidentiality are gravest when both aspects are leaked, meaning an unauthorized person knows not only what service is being offered, but also who is being served. The implications can be functional, affecting the outcome of the service, or cultural, where political, social, or economic inferences can be made about the citizen.
7.2.2 Financial and Cultural Implications of Voluntary or Involuntary Data Disclosure
The next question we asked of the data was whether we could understand and categorize the types of implications arising from loss of confidentiality of the data. When we analyze the types of grievances and their implications, we find that some are solely financial. Figure B.1 presents a subset of these that we found in the data. Typically, these complaints were made by individual citizens or by a business about another business; we note their financial implications. If a restaurant has complaints about incorrect slaughtering or garbage disposal, whether or not they are true, the restaurant may suffer financially. In contrast, we also found some kinds of grievances that were solely cultural. If a citizen complained about noise at night, it was likely to be about another citizen at home or walking down the street. As a third category, we found some complaints that had both financial and cultural implications. Examples of these can be found in Figure B.2. Noting that, generally, financial-only losses would derive from a complaint against a business, a citizen complaining about another citizen would lead to a cultural loss, and a business complaining about a citizen would lead to a combination of cultural and financial loss, we tabulated the number of each type of relationship, as shown in Appendix B. In that figure we also noted the number of complaints where the respondent would be the ULB, noting that exposure there would have neither cultural nor financial implications directly.
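The (complainer, respondent) categorization above reduces to a small lookup. The sketch below is our own illustration of that mapping, with hypothetical function and label names; it is not part of the eGov platform or the thesis tooling.

```python
# Illustrative sketch: map the (complainer, respondent) relationship of a
# grievance to the implication types described in the text, should both
# actors be exposed by a confidentiality breach.

def disclosure_implications(complainer: str, respondent: str) -> set:
    """Implication types if both actors of a grievance are leaked."""
    if respondent == "ulb":
        return set()                 # ULB respondent: no direct personal loss
    if complainer == "citizen" and respondent == "citizen":
        return {"cultural"}
    if respondent == "business":     # citizen or business complaining about a business
        return {"financial"}
    if complainer == "business" and respondent == "citizen":
        return {"cultural", "financial"}
    return set()

print(disclosure_implications("citizen", "business"))  # {'financial'}
print(disclosure_implications("business", "citizen"))
```

Tabulating grievances through such a mapping is effectively how the counts of Appendix B were derived: each complaint type contributes to exactly one relationship bucket.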
7.2.3 Grievance and Inference
For the PGR module, the types of inferences that can be made from the loss of confidentiality can be determined through an understanding of the actors and their roles. For a public grievance, there is a complainer and the entity that must take action to fix the complaint. The complainer can be either a citizen or a business, and the entity that must take action can be another citizen, a business, or the municipality. For example, a citizen may submit a complaint about non-functioning street lights; in that case, the municipal administration must take action. However, if the complaint is about the illegal slaughtering of animals, then the complainer could have been a citizen affected by the business or a legal competing business, and the entity that must fix the issue is the illegal business. From analyzing the two actors for the 110 types of public grievances published by the state, we found that we can assign certain types of implications depending on who the two actors are. We notice that a citizen-on-citizen complaint has cultural implications, a citizen-on-business or business-on-business complaint has financial implications, and a business-on-citizen complaint has cultural and financial implications. As shown in Figure B.3, most complaints must be fixed by the ULBs, likely regarding public infrastructure and health and sanitation issues. The majority of complaints that have implications come from complaints about a business, resulting in financial implications. If a business is affected by a complaint and the entity who complained is exposed, the complainer may endure financial consequences due to bias against them from the business. In contrast, complaints against citizens tend to result in cultural implications, as the person against whom the complaint has been lodged may develop political, social, or other cultural biases against the complaining entity.

Chapter 8
Synthesis
We now synthesize the analysis above into two indices that help measure the efficiency of governance and information privacy. Defining these indices highlights the tension that arises when trying to maximize both simultaneously. This tension then carves out a space where innovation can help achieve a high level of governance efficiency, information privacy, and transparency.
8.1 Government Efficiency Index
The Government Efficiency Index (GEI) is defined as the product of the timeliness and accuracy parameters for a given service or entity. GEI is constructed such that it ranges from 0 to 1, where a value of 1 denotes the highest level of governance efficiency.
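Since GEI is the product of the two parameters (the entity-level timeliness $t_e$ and accuracy $a_e$ defined in the sections below), the definition can be written compactly as:

```latex
\mathrm{GEI}_e = t_e \cdot a_e, \qquad t_e,\, a_e \in [0,1] \;\Rightarrow\; \mathrm{GEI}_e \in [0,1]
```

Because both parameters lie in $[0,1]$, so does their product, which is what guarantees the stated range of the index.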
8.1.1 Timeliness of Service
The definition of Timeliness of Service rests upon when a service is considered timely. We consider a service timely when it is delivered on or before the desired Service Level Agreement (SLA). The ULBs in India publish an SLA for each service they offer, as promised by the Citizen's Charter.[2] The Timeliness of Service component is measured as follows: for a given service, it is the fraction of times the service is delivered on or before the SLA over a given unit of time (e.g., hour, day, month). For a given group (a division within a ULB, or the ULB as a whole), Timeliness of Service is measured by averaging the timeliness of the services delivered by the group over a given unit of time. Timeliness of Service can be computed at the level of a functionary, a given service, or all services offered by an entity (e.g., all services of a division within a ULB, all services of the ULB, all services of the ULBs in a given block, all services of ULBs in a given state). Accordingly, the equations for it come in three flavors.
Some important notation:
1. A task $k_i$ is a multiset consisting of the times it has taken for every instance of that task to be completed.

2. For one particular service, there are $n$ tasks that need to be completed: $K = \{k_1, k_2, \ldots, k_n\}$.

3. Each functionary $f^{(j)}$ involved in the service completes some subset of the tasks in $K$. We denote the set of tasks $f^{(j)}$ must complete as $K^{(j)}$. The union of the $K^{(j)}$ equals $K$. The set of all $f^{(j)}$ is $F$.
Timeliness of Service at the level of a functionary
Each functionary's timeliness can be described by the proportion of tasks that the functionary has completed within the task's SLA, as prescribed by the state's SLA guidebook, the Puraseva User Manual.

$$k_i = \{\text{time to complete instance}_1, \ldots, \text{time to complete instance}_m\} \tag{8.1}$$

Each task $k_i$ also has an $SLA_{k_i}$ associated with it according to the Puraseva User Manual.

Therefore, the timeliness of a given task $k_i$ is:

$$t_{k_i} = \frac{\|\{\, k_i \mid k_i \le SLA_{k_i} \,\}\|}{\|k_i\|} \tag{8.2}$$

A functionary is usually responsible for one task, although some functionaries are responsible for more. Therefore, the timeliness of a functionary $j$ can be described as his average timeliness across all of the tasks $K^{(j)}$ he is responsible for:

$$t_{f^{(j)}} = \frac{1}{\|K^{(j)}\|} \sum_{k \in K^{(j)}} t_k \tag{8.3}$$

Timeliness of Service at the level of a service
The timeliness of a service is described by the proportion of instances in which the service was completed within the SLA for that service. We can calculate this by dividing the number of timely instances by the total number of instances for a given service. Suppose the set $s_i$ consists of the completion times of all instances of a service $i$:

$$s_i = \{\text{time to complete service instance}_1, \text{time to complete service instance}_2, \ldots\} \tag{8.4}$$

We can denote the timeliness of a service $s_i$ by:

$$t_{s_i} = \frac{\|\{\, s_i \mid s_i \le SLA_{s_i} \,\}\|}{\|s_i\|} \tag{8.5}$$
Timeliness of Service at the level of an entity

The timeliness of service for an entity is the proportion of all services that the entity offers that were completed in a timely manner. All services that an entity offers can be denoted by $S = \{s_1, s_2, s_3, \ldots\}$. We can calculate the timeliness at the entity level by averaging the timeliness of all services within the given entity. We assume each service is as important as the next, so we do not give a different weight to each service while calculating the timeliness for the entity.

$$t_e = \frac{1}{\|S\|} \sum_{s \in S} t_s \tag{8.6}$$

8.1.2 Accuracy of Service
The definition of Accuracy of Service rests upon when a service is considered accurate. We consider a service accurate when the right service is delivered to the right person without any rework. The Accuracy of Service component is measured as follows: for a given service, it is the fraction of times the service is delivered without rework over a given unit of time (e.g., hour, day, month). For a given group (a division within a ULB, or the ULB as a whole), Accuracy of Service is measured by averaging the accuracy of the services delivered by the group over a given unit of time. Accuracy of Service can be computed at the level of a service or an entity, as previously described. Accordingly, the equations for it come in two flavors.
Accuracy of Service at Service Level
The accuracy of a service at the service level can be computed by dividing the number of times the service was completed accurately by the total number of times the service was completed. The number of services completed accurately equals the number of times the service was completed, $n$, minus the number of times the completed service was petitioned by the citizen and resulted in a change. A proxy for this latter value is the number of Revision Petitions $r$ that resulted in a change to some service. So for a given service $s$ we can calculate the accuracy at the service level as follows:

$$a_s = 1 - \frac{r}{n} \tag{8.7}$$
Accuracy of Service at Entity Level
The accuracy of a service for an entity can be described by the average accuracy of all services $S$ offered by the entity:

$$a_e = \frac{1}{\|S\|} \sum_{s \in S} a_s \tag{8.8}$$
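As a worked illustration of Equations 8.2–8.8 and the GEI product, the sketch below computes timeliness from raw completion times, accuracy from revision counts, and their product. The numbers are made up for illustration and are not drawn from the 2018 dataset; the function names are our own.

```python
# Sketch of the GEI computation: timeliness is the share of instances
# completed within the SLA (Eqs. 8.2/8.5), accuracy is 1 - revisions/total
# (Eq. 8.7), entity values are unweighted averages (Eqs. 8.6, 8.8), and
# GEI is the product of the two entity-level parameters.

def timeliness(completion_days, sla_days):
    """Fraction of instances completed on or before the SLA."""
    return sum(1 for d in completion_days if d <= sla_days) / len(completion_days)

def accuracy(n_completed, n_revised):
    """1 - (revision petitions resulting in change) / (completed)."""
    return 1 - n_revised / n_completed

def entity_index(values):
    """Unweighted average across a ULB's services."""
    return sum(values) / len(values)

# Two hypothetical services for one ULB (illustrative data):
pt_timeliness = timeliness([5, 9, 12, 30, 14], sla_days=15)  # 4 of 5 on time
wt_timeliness = timeliness([3, 8, 20, 6], sla_days=10)       # 3 of 4 on time
t_e = entity_index([pt_timeliness, wt_timeliness])

pt_accuracy = accuracy(n_completed=100, n_revised=10)
wt_accuracy = accuracy(n_completed=80, n_revised=4)
a_e = entity_index([pt_accuracy, wt_accuracy])

gei = t_e * a_e
print(round(gei, 4))  # 0.7169
```

Note how the product form penalizes a ULB that is fast but sloppy (high $t_e$, low $a_e$) as heavily as one that is careful but slow.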
8.2 Informational Privacy Index
IPI is constructed such that it ranges from 0 to 1, where a value of 1 denotes the highest level of information privacy. We define Right Collection as the collection of only those data fields that are necessary for delivering the service; in other words, without collecting these data fields, the requested service cannot be delivered. Right Collection is measured for a given service, or for services offered by a given group, as Necessary Data Fields / Total Data Fields Collected. We define Right Use as access to data fields only by those (in the ULB) who need them for delivering the service. Right Use is measured for a given service, or for services offered by a given group, as Number of Data Fields To Which Access Is Necessary / Number of Data Fields To Which Access Is Granted. We define Right Disclosure as public disclosure of data fields in a manner that protects personal identity and prevents undesirable inference. Right Disclosure is measured for a given service, or for services offered by a given group, as 1 - (Number of Data Fields With PII or Undesirable Inference Disclosed / Total Number of Fields With PII or Undesirable Inference). IPI is determined based on the analysis of the data collection, use, and disclosure policies of ULBs. The real-time value of IPI will rest upon the frequency and types of service requests a ULB serves. Notation:
1. $X$: the set of data fields collected for a particular service.

2. $X_n^{(i)}$: the set of data necessary for a task $k_i$ to be completed in the workflow of a service. $X_n = \bigcup_i X_n^{(i)}$ is the set of all data necessary for a service to be completed.

3. $X_e^{(i)}$: the set of data collected for the efficiency/accuracy of a task $k_i$ in the workflow of a service. $X_e = \bigcup_i X_e^{(i)}$ is the set of data collected for the efficiency/accuracy of a service.

4. $X_u^{(i)}$: the set of data collected generally for legacy reasons and unnecessary for a task $k_i$ in the workflow of a service to be completed. $X_u = \bigcup_i X_u^{(i)}$ is the set of data that is collected but unnecessary for a service to be completed.

5. $X_a^{(i)}$: for each task $k_i$, the set of data fields that the functionary responsible for $k_i$ has access to. A functionary currently has access to all of the collected data fields, so $X_a^{(i)} = X$ for all $i$.
8.2.1 Right Collection Index
The extent of right collection is determined by calculating the proportion of the collected data fields that are actually necessary for the completion of the task or service. Since each task requires specific data to be completed, we compute right collection at the task level as opposed to the functionary level. We can also compute right collection at the service level as well as the entity level.
Task Level
The right collection index at the task level, $c_k$, is the number of necessary data fields collected for that particular task divided by the total number of data fields collected. So for a task $k_i$ the right collection index is:

$$c_k^{(i)} = \frac{\|X_n^{(i)}\|}{\|X\|} \tag{8.9}$$

Service Level

The right collection index at the service level is computed by dividing the number of necessary fields collected by a particular service by the total number of fields collected by that service.

$$c_s = \frac{\|X_n\|}{\|X\|} \tag{8.10}$$

Entity Level

The right collection index for an entity is measured by the average of the right collection indices for all services $S$ the entity offers:

$$c_e = \frac{1}{\|S\|} \sum_{s \in S} c_s \tag{8.11}$$

8.2.2 Right Use
The right use index measures the extent to which access to data fields is given only to those (in the ULB) who need it for delivering the service.
Task Level
We can compute the right use index $u_k^{(i)}$ for each task by dividing the number of fields necessary to complete a given task by the total number of fields the functionary completing that task is given access to. Ideally, a functionary should only have access to the fields required for him to complete a given task, giving a right use index of 1.

$$u_k^{(i)} = \frac{\|X_n^{(i)}\|}{\|X_a^{(i)}\|} \tag{8.12}$$

The above calculation penalizes the index fully for a functionary having access to $X_e$. An alternate calculation for the right use index that takes into account the right use of $X_e$ could be as follows:

$$u_k^{(i)} = \frac{\|X_n^{(i)}\|}{\|X_a^{(i)}\|} + \frac{1}{2}\,\frac{\|X_e^{(i)}\|}{\|X_a^{(i)}\|} \tag{8.13}$$

Service Level
The right use index for a given service can either be described as the average or the product of i the right use indices for all tasks involved in completing that service. While averaging over all uk would give a general indication of right use for a service, it would not give a nuanced index across all services and all ULBs, as the right use index for tasks within a service may vary greatly. The average right use index for a particular service s is computed as follows:
u_s = \frac{1}{\|K\|} \sum_{k \in K} u_k    (8.14)

The product of the right use indices of all tasks for a particular service s is computed as follows:
u_s = \prod_{k \in K} u_k    (8.15)

Entity Level
The right use index for a given entity can be similarly described: we can take the average or the product of the right use indices for all services the entity offers. The average right use index for an entity is as follows:

u_e = \frac{1}{\|S\|} \sum_{s \in S} u_s    (8.16)

8.2.3 Right Disclosure
The right disclosure index should describe how well personal identity and undesirable inferences are protected against public disclosure. This parameter is defined as the proportion of fields considered PII that are not open at each level. In our case study, home address and mobile phone number are considered PII, as defined by eGov.
Functionary Level
At the functionary level, right disclosure is calculated by the proxy 1 − (data fields with PII or undesirable inference disclosed / total fields with PII or undesirable inference). For a given service, the right disclosure parameter will be the same across all functionaries; data fields disclosed publicly are obviously accessible by functionaries as well. However, it is useful to note the parameter at
the functionary level in order to calculate the IPI across functionaries in a given service. The right collection and right use parameters are not necessarily the same across all functionaries.
d_k^{(i)} = 1 - \frac{PII_{exposed}}{PII_{collected}}    (8.17)
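A minimal sketch of the disclosure proxy (8.17), assuming the PII fields named in this case study (home address and mobile number); the function and field names are illustrative, not from the thesis:

```python
# PII fields as defined by eGov in this case study; names are assumed.
PII_FIELDS = {"home_address", "mobile_number"}

def right_disclosure(collected, publicly_disclosed):
    """d_k: 1 - (PII fields exposed / PII fields collected); 1.0 means no PII is public."""
    pii_collected = set(collected) & PII_FIELDS
    if not pii_collected:
        return 1.0  # no PII collected, so nothing can be disclosed
    pii_exposed = set(publicly_disclosed) & pii_collected
    return 1 - len(pii_exposed) / len(pii_collected)

# One of two collected PII fields is publicly visible -> d = 0.5.
d = right_disclosure(
    collected={"owner_name", "home_address", "mobile_number"},
    publicly_disclosed={"owner_name", "home_address"},
)
```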
Service Level
The calculation and values for the right disclosure parameter at the service level will be the same as that at the functionary level. However, the parameter will vary across services, depending on the PII fields collected and exposed to the public.
d_s = 1 - \frac{PII_{exposed}}{PII_{collected}}    (8.18)
Entity Level
At the entity level, the right disclosure parameter can be defined as the average of the right disclosure indices of all services.
d_e = \frac{1}{\|S\|} \sum_{s \in S} d_s    (8.19)

Chapter 9
Results and Analysis
In this Chapter, we calculate and analyze GEI and IPI for New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal. We describe each ULB with a variety of metrics including but not limited to timeliness, weighted impact, inaccurate application duration rate, and average time to complete service. Ultimately, we determine top-performing ULBs according to GEI, IPI, and both. These ULBs will serve as case studies in determining factors that help maintain high GEI and IPI.
9.1 GEI Calculation
In this Section, we explore the distribution of timeliness and accuracy values for New Property Tax Assessment, New Water Tap Connection, and Public Grievance Redressal across all ULBs. We often group ULBs by tier, as the varying volume of transactions and resources could affect GEI. We determine top-performing cities according to GEI and argue that, since there are ULBs within each tier with consistently high GEI, other ULBs in each tier can also improve their efficiency.
9.1.1 Timeliness
We assess timeliness through several lenses. First, we observe the distribution of timeliness across all ULBs. We calculate adjusted timeliness values and the weighted impact of timeliness across all three tiers of ULBs for Property Tax Assessment, Water Charges, and Public Grievance Redressal. For Property Tax and Water Charges, we identify model cities by tier based on the adjusted timeliness values and quality of workflow transparency. Identifying top-performing ULBs is important for later determining why these ULBs in particular are able to maintain high levels of efficiency and perhaps even privacy. For PGR, we observe timeliness values at various levels. Second, we analyze the
distribution of timeliness values across all ULBs by department. Third, we observe which ULBs perform well in which departments. Overall, we determine model cities primarily by timeliness values across all departments and secondarily by the average difference between the SLA and the days taken to complete the set or subset of requests.
Adjusted Timeliness
Adjusted timeliness gives a more accurate measure of timeliness than perceived timeliness. Perceived timeliness is calculated as previously described. Application duration is defined as the difference in timestamps from application entry until the acceptance or denial of the application. Perceived timeliness is the proportion of applications whose application durations are less than their respective SLAs. However, when observing the distribution of task durations, we observe that a non-trivial number of applications have unrealistic application durations. Some applications were recorded as having been completed in less than one or two days. Others show entry and approval within seconds. We assume that these applications were received manually, proceeded through the workflow, and were accepted or denied manually. After the completion of the workflow, these applications were likely then entered into the ERP system. Subsequently, technical aides or administrators enter, digitally approve, and pass on the application according to all steps of the workflow within a day. Applications digitally logged as having been completed in less than one day are likely to have been recorded in this inaccurate manner. Inaccurate application durations can significantly improve a ULB's timeliness value, leading to incorrect identification of model ULBs. To normalize for inaccurate recordings, we calculate and compare an adjusted timeliness parameter. For each service, the adjusted timeliness is calculated by ignoring applications that have application durations of less than a day and re-calculating timeliness as described in Section 8.1.1 with the remaining applications. Adjusted timeliness is calculated as follows:
t_{adj} = \frac{\|\{s_i \mid s_i \le SLA_s \wedge s_i \ge 1\}\|}{\|\{s_i \mid s_i \ge 1\}\|}    (9.1)

Adjusted timeliness gives a more accurate measure of how timely applications are completed. It also allows for a fairer comparison between ULBs, penalizing cities that record large numbers of inaccurate application durations.
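The adjusted timeliness computation (9.1) can be sketched as follows; this is an illustration under the stated assumption that sub-day durations are inaccurate records, and the function name and sample durations are hypothetical:

```python
def adjusted_timeliness(durations_days, sla_days):
    """Adjusted timeliness (Eq. 9.1): among applications taking at least one day,
    the share completed within the SLA. Sub-day durations are treated as
    inaccurately recorded and excluded."""
    plausible = [d for d in durations_days if d >= 1]
    if not plausible:
        return 0.0
    timely = [d for d in plausible if d <= sla_days]
    return len(timely) / len(plausible)

# Hypothetical ULB: two sub-day (likely back-entered) records are dropped,
# leaving 3 of 4 plausible applications within a 30-day SLA.
durations = [0.01, 0.5, 3, 10, 20, 45]
t_adj = adjusted_timeliness(durations, sla_days=30)  # 0.75
```

Note that excluding sub-day records can only remove "instant" approvals, so a ULB that back-enters many applications sees its timeliness fall under this adjustment, which is exactly the penalty described above.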
Weighted Impact
We calculate weighted impact to understand the magnitude of applications that are affected by the GEI for a particular ULB. We define the GEI Impact (GEImpact) of a ULB as the number of applications that on average are delivered in a timely manner for a particular service. As this number depends on the number of applications received by each ULB, the impact gives a sense of the magnitude of changes in timeliness. A large ULB with a lower timeliness value and a smaller ULB with a higher
timeliness value may have the same impact score. As such, comparing impact scores is not as useful to determine model ULBs overall. In particular, for Property Tax and Water Charges we calculate weighted impact. For a given ULB and service, weighted impact is the number of timely applications that can be expected given the adjusted timeliness value and total number of applications:
I_w^{(s)} = t_{adj}^{(s)} \cdot \|s\|    (9.2)

For a given ULB, I_w^{(PGR)} is equal to the number of applications that had been completed by that ULB within each complaint's SLA period. For Property Tax and Water Charges, however, I_w^{(PT)} and I_w^{(WC)} will be smaller than the impact values calculated with perceived timeliness values. We calculate adjusted timeliness based on only the partial set of applications that were likely to have been recorded accurately. The weighted impacts are based on the actual volume of applications, but with adjusted timeliness values. As such, we interpret weighted impact as the projected impact of a service if the ERP system is used as intended.
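Equation (9.2) is a simple product, sketched below with hypothetical numbers; it also illustrates the earlier point that a large, less timely ULB and a small, highly timely ULB can have nearly equal impact scores:

```python
def weighted_impact(t_adj, volume):
    """Weighted impact (Eq. 9.2): expected number of timely applications given
    the adjusted timeliness and the total application volume."""
    return t_adj * volume

# Hypothetical ULBs: similar impact despite very different timeliness values.
large_ulb = weighted_impact(0.50, 2000)  # 1000.0 expected timely applications
small_ulb = weighted_impact(0.90, 1111)  # roughly 1000 as well
```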
9.1.2 Property Tax Assessment
See Appendix F for a complete table of results.
Inaccurate Application Duration
Figure 9.1 shows the distribution of the percentage of inaccurately recorded transactions for each ULB by tier. The color of the markers refers to the population tier of the city and is consistent for each tier across all plots in this document. From Figure 9.1, we observe that while Tier 1 cities on average process more applications, they have a smaller spread of inaccurate application durations. The median of Tier 1, at around 7%, is the lowest among all tiers, and the median of Tier 2 cities is the highest, at about 29%. Tier 2 ULBs show the largest, nearly maximal, spread of percentages of inaccurate transactions. While the median for Tier 3 is around 20%, the upper 50% of Tier 2 and Tier 3 ULBs have a large spread. High rates of inaccurate workflow documentation seem to be largely uncorrelated with the number of applications processed by a ULB. In fact, ULBs with smaller volumes of applications seem generally to have a higher proportion of inaccurate records. This observation falls in line with what we were told during field visits.
Figure 9.1: Distribution of Inaccurate Application Durations for PT by Tier: This plot describes the distribution of percentage of inaccurately recorded transactions by tier of population. Each marker indicates a ULB, where the size of the marker indicates the volume of applications received and the color indicates the population tier to which that ULB belongs. The median of each tier is also marked in yellow and annotated with the value of the respective median.
During interviews with eGov staff, we were told that some ULBs were using the ERP system throughout the course of a service workflow as intended. In most ULBs, eGov had partnered with an NGO called Karvy to help municipal officials. Karvy employees are appointed to act as liaisons between eGov and lower-level officials. They train officials on how to use, navigate, and fulfill their responsibilities in the ERP system. When interviewing a Karvy employee on site, he explained that he inputs application information into the ERP system. As he also has access to all login credentials, he digitally processes applications in the system after they are manually approved, due to the poor technical skills of municipal staff. eGov officials confirmed that this is not an isolated case and is in fact quite common. Some officials and ULBs feel that the barrier to learning is too high. As such, they resort to inaccurate recording techniques even if the integrity of the application process is upheld. Operating under this assumption, it is expected that ULBs with less technically trained staff depend on one or two data-entry persons to digitally process applications after manual processing. It is important to deal with this issue in the long run if there is to be a commitment to actual efficiency and transparency. After disregarding inaccurate workflow transitions, the adjusted timeliness values were calculated.
Figure 9.2 shows a comparison of adjusted timeliness and unadjusted timeliness values by tier. The R-squared values for Tier 1, 2, and 3 are 0.987392, 0.919681, and 0.741817 respectively. The farther the ULBs are from the y = x line, the greater the discrepancy between the adjusted and unadjusted timeliness values. The trendlines fitted in Figure 9.2 show how the adjusted timeliness values may be correlated with timeliness. We can observe the overall trend in discrepancy within each tier as well.
Figure 9.2: PT Adjusted vs. Unadjusted Timeliness: This figure compares adjusted timeliness and unadjusted timeliness values for each ULB. The size and color of the marker correspond to the volume of applications received by and population tier of that ULB, respectively. The OLS lines-of-best-fit for each tier are indicated by dashed lines.
The Tier 1 trend line is quite close to a y = x line, and the Tier 1 ULBs are clustered very closely around the line. From our observations in Figure 9.1, this result is not surprising. Tier 2 and 3 ULBs are not as closely clustered around their trendlines. The general trend is that the unadjusted timeliness values are higher than the adjusted values, converging as adjusted timeliness increases. Tier 1 ULBs with high adjusted timeliness values have nearly equal unadjusted values. These observations indicate that ULBs with high unadjusted timeliness values are not necessarily recording applications inaccurately. Furthermore, we note that the size of the ULB need not affect the integrity of digital application processing. As can be seen in Appendix F.5, there exist ULBs handling volumes proportional to their population that nevertheless have a low percentage of inaccurate workflow records. These two pieces of analysis indicate that it is possible for ULBs to have high timeliness, regardless of tier and volume of applications, while using the ERP system and digital workflow processing as intended.
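Per-tier trendlines and R-squared values of the kind reported above can be obtained from an ordinary least-squares fit. The sketch below is an illustration only (not the thesis's analysis code), with hypothetical timeliness pairs:

```python
def ols_fit_r2(xs, ys):
    """Fit y = a*x + b by ordinary least squares; return (slope, intercept, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical (adjusted, unadjusted) timeliness pairs for one tier of ULBs:
t_adj = [0.40, 0.55, 0.70, 0.90]
t_unadj = [0.62, 0.70, 0.80, 0.92]
slope, intercept, r2 = ols_fit_r2(t_adj, t_unadj)
```

A slope near 1 with intercept near 0 would indicate close agreement between adjusted and unadjusted values, as observed for Tier 1.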
Figure 9.3: PT: A histogram of the frequency of adjusted timeliness bins, aggregated by color according to population tier
Adjusted Timeliness
Overall, we observe a unimodal distribution of adjusted timeliness values across all tiers, centered around 0.55 to 0.60. While a 55% timely service rate is not poor, we see examples of ULBs in all tiers that have reached 90%-100% timely service rates. Consequently, we determine that it is possible to optimize timeliness across all ULBs. We would expect that as the volume of applications a ULB has to handle increases, efficiency would be compromised due to limits on resources. However, within each tier of cities we observe ULBs with high adjusted timeliness values. From Appendix F.5, we observe no obvious correlation between population or volume of applications and adjusted timeliness. In Figure 9.4, we describe the distribution of adjusted timeliness by tier. Tier 1 ULBs have on average lower timeliness values in comparison to the other tiers, as can be expected. Even if the volume of applications received does not affect timeliness, there could be other factors that lead to more Tier 1 cities being less efficient. Human and time resources do not grow proportionately with the volume of applications, and the organizational complexity of larger municipalities could further affect timeliness.
Figure 9.4: Distribution of Adjusted Timeliness for PT by Tier: These boxplots, separated by tier, describe the distribution of adjusted timeliness values. Each marker, representing a ULB, has size proportional to the volume of applications received by that ULB. The pink reference lines indicate the average t_adj for the corresponding tiers. The yellow line indicates the median t_adj for the corresponding tiers.
Weighted Impact
In order to compare and identify which ULBs are able to have high impact while maintaining high timeliness, we calculate weighted impact. Weighted impact is the number of applications that could be delivered in a timely manner by a ULB, given that the adjusted timeliness applies to the original volume of applications and all application durations were recorded accurately. As can be
seen in Figure 9.5, there are ULBs in each tier that stand out as having significantly higher weighted impact. While a larger impact does not necessarily indicate greater timeliness, ULBs with high impact and timeliness can be further analyzed to learn how high timeliness is maintained with large application volumes. ULBs to analyze further in this way are those located in the upper right-hand corner of each pane in Figure 9.6. ULBs 1013, 1070, 1073, 1034, 1030, 1117, 1089, 1120, 1062, and 1148 are examples of such ULBs. ULBs not in the upper right corner of these plots are not necessarily inefficient. Some ULBs have high efficiency but a lesser volume of applications; as a result, they have smaller impact. While analyzing the top-performing ULBs according to weighted impact and timeliness can be informative, weighted impact should not be used to determine the overall top-performing ULBs.
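The "upper right corner" selection, where both adjusted timeliness and weighted impact exceed the third quartile, can be sketched as below. This is an illustrative simplification for a single tier, using a nearest-rank percentile and hypothetical ULB records:

```python
def q3(values):
    """Third quartile via a simple nearest-rank rule (an approximation)."""
    s = sorted(values)
    return s[int(0.75 * (len(s) - 1))]

def upper_right_ulbs(records):
    """Select ULBs whose adjusted timeliness AND weighted impact both exceed
    the third quartile (the shaded upper-right region of Figure 9.6).
    `records` maps ULB id -> (t_adj, impact); the structure is assumed."""
    t_cut = q3([t for t, _ in records.values()])
    i_cut = q3([i for _, i in records.values()])
    return [u for u, (t, i) in records.items() if t > t_cut and i > i_cut]

# Hypothetical tier of five ULBs: only ULB 1 clears both cutoffs.
records = {1: (0.95, 1000), 2: (0.50, 100), 3: (0.60, 200),
           4: (0.70, 300), 5: (0.80, 400)}
candidates = upper_right_ulbs(records)
```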
Figure 9.5: Distribution of Weighted Impact for PT by Tier: Boxplots for the distribution of weighted impact by tiers, where markers are ULBs of size proportional to the volume of applications received.
Figure 9.6: GEImpact vs. Adjusted Timeliness of PT by Tier: ULBs handling high volumes of applications while maintaining high timeliness rates can be identified here. Each pane plots weighted impact vs. adjusted timeliness for a particular tier. The shaded regions correspond to the interquartile ranges for each axis. The markers represent ULBs, where the size of the marker is proportional to the total number of applications received. ULBs where both coordinates are greater than the third quartile can be informative upon further analysis.
Model ULBs
Determination of top-performing ULBs with regard to New Property Tax Assessments is important for understanding the limitations on timeliness for this particular service and for further analyzing what changes to ULBs can improve efficiency for this service. Throughout the analysis we have addressed various characteristics and parameters of a ULB. Determining the top-performing ULBs, however, takes into account two factors. We consider ULBs with adjusted timeliness greater than the 75th percentile and differences between adjusted and perceived (or unadjusted) timeliness above the median as model ULBs. We identify model ULBs by tier; see Figures 9.7, 9.8, and 9.9. While ultimately adjusted timeliness values are most important in determining model ULBs within each tier, we use the difference between perceived and adjusted timeliness (Δt) as a secondary factor. A smaller difference does not necessarily indicate a lower percentage of inaccurately recorded application durations. However, we use it as a secondary factor for two reasons. First, when comparing ULBs handling similar volumes of applications, a smaller Δt indicates that a smaller proportion of applications had inaccurate application durations. The adjusted timeliness values for ULBs processing similar volumes are similarly sensitive to the number of applications that were inaccurately recorded. As a result, a slight preference can be given to the ULB with a smaller Δt. Second, when comparing ULBs with highly varying volumes of applications even within the same tier, comparing Δt values gives more leeway to ULBs processing large application volumes. Suppose we have a ULB α that handles twice the volume of applications as ULB β. The adjusted timeliness of α is less sensitive to a larger percentage of inaccurate application durations. ULB α can have a higher percentage than