Where Is the Data to Study the Internet in India?
ISSN (Online) - 2349-8846
MAITRAYEE MUKERJI
Maitrayee Mukerji is an independent researcher, formerly with Centre for Excellence in Sustainable Development, IIM Kashipur, Uttarakhand.
Vol. 54, Issue No. 4, 26 Jan, 2019
Social science researchers who want to study the internet in India using data mining and analytic techniques are challenged by constraints in access to, and the availability of, big data. Even when such data is available, it is often behind a paywall or organised in a manner that makes it difficult to interpret.
India is ranked at 127 among 201 countries in terms of internet penetration (Internet Live Stats 2016). Although internet penetration is only 13.5%, the country still has the second-largest number of users worldwide. According to the latest estimates, around 35% of the Indian population access the internet using multiple devices (Internet Live Stats 2016). India is thus considered to be one of the fastest growing online markets, and thereby part of the strategic focus of many internet-based companies. As Indians browse, search, transact and interact online, one can observe how the internet is getting increasingly enmeshed in everyday lives.
But how do we study the influence and impact of the internet in India beyond anecdotes, journalistic articles and descriptive narratives? Conducting rigorous and critical studies on economic, social and political aspects of the internet in India using data-driven analytic approaches is challenging in many ways. The challenges include access to data, having a certain level of proficiency in technological skills for data acquisition and analysis, training in cross-disciplinary perspective, and opportunities for collaborative efforts, among other things. This essay examines the first challenge, that is, the availability of and access to relevant data.
What are the ways in which social science scholars might approach online interactions and transactions as an empirical research field? What data sources are available to them, and what challenges do they face in accessing them? We examine both the traditional sources of data and the emerging social/big data sources that can be used for studying the internet in India.
Diffusion and Adoption of Internet
The first policy framework addressing the internet in India was “IT [Information Technology] for Masses” in 2001. Subsequent national level policies provided a common IT policy framework for the country. These policies were adopted by different states, and many interventions were implemented as per their perceived priorities and strategies. As a result, the penetration of and access to the internet have been, in general, uneven across the country.
The starting point for any scholar wanting to do an in-depth study on the internet in India would probably be to examine the nature and extent of internet penetration in India. Estimated and actual figures for the same are available for the country as a whole. In fact, there are multiple values for each parameter from multiple sources, such as the census, indiastat.com, etc. It is thus imperative for the researcher to be clear about the source of the data, and about the appropriateness of both, before using them.
The Organization for Economic Cooperation and Development (OECD) defines digital divide as “the gap between individuals, households, businesses, and geographic areas at different socioeconomic levels with regard both to their opportunities to access ICTs and to their use of the internet for a wide variety of activities”. In order to understand particular dimensions of the digital divide, the Census 2011 is, perhaps, the most comprehensive public data set. For the first time, the Census 2011 collected data that provides a picture of digital inclusion in India.
It offers household data on the possession of digital assets from the village level upwards. In separate tables, it also has data on village level assets and infrastructural facilities. This might be useful in assessing the extent of digital divide a village may be experiencing. However, the data set does not record if a particular house has more than one digital asset. Micro-level data about each household is available at select academic institutions.
The Census 2011 data set is fairly voluminous, and relevant data is generally distributed among multiple tables. Multiple files have to be searched or combined for relevant data. Thus, a social science researcher would need some technical skills to use the requisite software for automating the extraction, merging, analysis and visualisation of data. The main limitation of Census 2011 data is that it is slightly dated, especially if one considers the rapid proliferation of the mobile phone. But if mined carefully, the results can serve as a benchmark for all future studies.
Compared to computers and laptops, the diffusion of mobile phones in India has been far more rapid and widespread. The smartphone has become one of the most popular devices through which the internet is accessed in India. The Telecom Regulatory Authority of India’s (TRAI) website provides recent aggregated data on the growth of telecom services. However, data on this website is generally available in PDF format (for example, TRAI 2018). The PDF format is suitable for sharing documents, but for aggregation and analysis, its contents often need to be copied and formatted manually. Availability of aggregate data in machine-readable file formats like XML, RDF, JSON or even Excel would make it easy to process contents using computers.
Established in 2000, www.indiastat.com is an IT-enabled private limited company providing data aggregation services in the socio-economic information domain.
It draws data from sources like the TRAI website, the Lok Sabha and Rajya Sabha questions, ITU, UN reports, etc, and provides them online in a reusable format. The website has data that is generally not available from other sources. For example, it provides data about the uptake of financial services in India or the number of complaints associated with online transactions. However, the data can be accessed only in exchange for a fee or through institutional access. A simple search on this website yields a huge amount of data, and here also the researcher has to sift through individual files to find relevant research data. Moreover, care has to be taken to handle the duplication and comparability of data across different states and time periods. In many cases, although triangulation of data is not possible, the figures available on the website do give some idea about the phenomenon being investigated.
In addition to the data sets mentioned here for examining the nature and extent of the digital divide, the data from the 71st round of National Sample Survey Office (NSSO) on Social Consumption – Education Survey (2016) can be used for examining the second order digital divide. The second order digital divide refers to the lack of skills or capabilities that can prevent people who already have supporting devices from accessing the digital sphere. Among the details the survey has about socio-economic characteristics of households, it also has data on whether the household has a computer or access to the internet. It also has particulars of information technology literacy for household members aged 14 and above.
Digital Economy
In general, information and communication technologies are fundamental for a transition from an industrial to an information/knowledge/digital economy, with the commodification of information, digital goods and services, and online transactions. The internet has enabled new, and at times disruptive, business models.
In addition, it has implications for the operations and the productivity of traditional industries. Market structure and competitiveness, pricing, incentives and regulations, the impact of online exchange of goods and services, and online behaviour are some areas through which the internet can be studied from an economic perspective.
However, the data to study these linkages between the internet and the Indian economy is limited. For example, the Annual Survey of Industries (ASI), conducted by the Ministry of Statistics and Programme Implementation, collects only a single piece of data for firms in the manufacturing sector, that is, whether the firm has a computerised accounting system or not.
Some structural aspects of the IT/ITES sector can be derived from the 63rd round of NSSO survey Service Sector 2006–07 (2012). It has a section on post and telecommunications, with survey results of all enterprises providing communication services like courier, ISD/STD/PCO booths, voicemail through computer networking, video/fax/phone, voiced and non-voiced leased circuits, email, video conferencing, internet, and activity of cable operators. It basically covers all enterprises not owned by the government, public sector undertakings and local bodies. In another section on computer and related activities, the survey covers enterprises engaged in hardware consultancy; software publishing; software consultancy, supply and maintenance; data processing; and maintenance and repair of office, accounting and computing machinery, among others. State and national level data on the use of ICTs by unincorporated non-agricultural enterprises in manufacturing, trade and other service sectors are available in the data set of the 67th round of NSSO (2015).
As mobile phones and internet services proliferate, many private market research firms and industry associations like Internet and Mobile Association of India (IAMAI) have begun to collect data related to internet usage, especially from a marketing point of view. However, such data is generally available to the general public only as summary reports. Anybody who wants to carry out related research will have to pay to access the data or engage similar data collection agencies for collecting data. Similarly, although one can see the increasing popularity of e-commerce sites and also the use of mobile-based applications, very little data is publicly available to study the various facets of e-commerce or m-commerce.