VLDB | Prerequisite for the success of Digital India

Contents

Foreword
Introduction to Very Large Database
Adoption of VLDB
Overview of Digital India Programme
How VLDB Can Enable Digital India Programme
Key VLDB Challenges and Solutions
Conclusion
References
Contacts


Foreword

A few decades ago, data was considered a byproduct of algorithms or processes, not quite an integral part. But as algorithms started being used for businesses, it was realized that the data generated is not just a byproduct but an essential part of the process. Personal desktops also began using client-server computing regularly. Two decades later, we see databases involved in activities we perform on a daily basis. The presence of the “Industrial Revolution of Data” is being felt all over the world, from science to the arts, from business to government. Digital information increases tenfold every five years, which results in a vast amount of data being shared. This can be attributed to the improvements in the algorithms driving computer applications.

In 2014, the Prime Minister of India, Narendra Modi, introduced the Digital India Programme, a programme which aims to connect every citizen of India using digital infrastructure. An initiative of this magnitude has never before been attempted in India. It comes with its fair share of challenges, one of which is the immensity of the data it will generate, store and handle.

With the Internet completing 25 years of its existence, India should utilize this platform to reap the benefits of aggregating and analyzing data in the various fields concerned. Data associated with the services of the Digital India Programme will range in terabytes and will have to be stored in Very Large Databases (VLDB). Processing and accessing this data will also be more challenging because of the enormous amounts of data involved, given a population of around 1.3 billion and an area of 1.3 million square miles. Moreover, ensuring data security and protecting privacy is becoming harder as information is shared ever more widely around the world. For the Digital India programme to be successful, large databases are an important aspect that needs to be acknowledged.

To tap the ongoing momentum of digitizing India, there is a great need to develop an atmosphere of strong association between government, industry and the common man. A new kind of professional has emerged, the data scientist, who combines the skills of a software programmer, a statistician and an artist to extract data. With time, the data generated and processed will further increase and new solutions will have to be devised, but this first step is essential in ensuring that the whole country moves towards digitization as one.

The purpose of this report is to promote discussion and share our point of view on the recent developments around Digital India. Deloitte believes this report provides you with insights into India-scale data problems and how technology can be instrumental in solving them.

Hemant Joshi


Introduction to Very Large Database

“The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, clickstream, and even transactions remain largely unnavigated by most firms. The opportunity to leverage streaming analytics has never been greater.”- Forrester

According to NASSCOM & Hansa Cequity’s report on Indian Analytics, the explosion in data has increased the importance of analytics. Experts believe that 90% of the world’s data was created in the last two years. It is expected that, by 2020, the amount of digital information in existence will have grown from 3.2 zettabytes to 40 zettabytes (1 zettabyte = 1 trillion gigabytes).1 Businesses are clear that they need the right set of products, tools and techniques to manage such a high volume of data.

As technology becomes cheaper and more widely accessible to end users, more data is being produced than ever before. The advent of new technologies such as the Internet of Things (IoT) and Big Data, together with the rising number of analyses needed to make informed business decisions, also creates a massive demand for databases that can store huge volumes of data for various purposes without compromising accessibility or processing speed.

The very large database (VLDB) emerges as a solution to existing data management problems. It is a type of database that consists of a very high number of database records, rows and entries, which are spanned across a wide file system. Although there is no standard size at which a database is classified as a VLDB, it is typically in the terabyte range and contains billions of table rows. It is primarily an enterprise-class database.

A VLDB is generally a repository for [...], a transactional processing system or a combination of the two, and is maintained through standard relational database management system (RDBMS) software with levels of normalization to reduce redundancy. A VLDB not only requires capable hardware computing and storage resources but also needs the underlying system to have the ability to scale up to address its ever-growing size.


Adoption of VLDB

Enterprises are facing several challenges in managing the heterogeneous, unstructured and enormous data that is being produced every second. Data originating from various known and unknown sources has many forms of complexity. Enterprises need databases that are technologically equipped to reduce the magnitude of these challenges. VLDB appears to be a solution to this problem. It is being seen as a remedy for complex data management, with its ever-increasing flow and churn, for the factors listed below:

•• For a long time, systems have been developed and managed in isolation. Enterprises have now started to recognise the benefits of combining these systems to enable cross-departmental analysis while reducing system maintenance costs.2 Consolidation of databases and applications is a key factor in the ongoing growth of database size.
•• Enterprises grow by expanding sales and operations or through mergers and acquisitions, causing the amount of generated and processed data to increase. At the same time, the user population that relies on the database for daily activities increases.
•• The high demand, popularity and usage of social media (Facebook, WhatsApp, YouTube, Twitter, etc.), where various types of data such as images and videos are being uploaded, shared and downloaded every moment, makes it imperative for enterprises to have a robust database management solution in place to meet these challenges.
•• Many enterprises and Government organisations face regulations for storing data for a minimum period of time. These regulations generally result in more data being stored for longer periods of time.

VLDB addresses these challenges and problems faced by modern enterprises by helping them adapt to the ever-increasing complexity of data management and the cost-effectiveness of performing operations against a system of that size.

[Fig.1: Survey from ISACA Journal Vol 3, "Do you have the data you need?" Response categories: No; Yes, poor quality; Good set of data; Good set of enriched data. Values shown: 37, 29, 21, 13.]

According to an article published in ISACA Journal Volume 3, in which a survey was conducted amongst 100 attendees, 58% of attendees say they either have no data available or the data available is of poor quality (as represented in Fig.1), which underlines the need for very large databases among enterprises.

One principal advantage of the large volume of data that is collected and stored on every click or tap of the user is that it can provide new insights that enable enterprises to make more informed decisions. This is true not only for organisations but also for customers. Customers are better informed to make better choices, resulting in improved satisfaction.


While it may appear that only enterprises are venturing into innovative solutions to manage these large sets of data, Government organisations and departments are making equal use of these technologies. However, their challenges and solutions differ slightly from, or go beyond, those of enterprises.

Both corporate and government organisations are grappling with the issue of managing large data coming from multiple known and unknown sources. The Digital India programme launched by the Government will prepare India for a "Knowledge Future." The focus of this programme is to make technology central to enabling change. It is an umbrella programme that cuts across several departments and ministries.

Although Digital India was introduced officially on 1 July, 2015, the foundation stone was laid much earlier, with the advent of the telecom revolution back in 1984 through India's foreign and domestic telecommunications policies. The results of these initiatives are quite apparent today: India's telecommunication network is the second largest in the world, based on the total number of telephone users (both fixed and mobile phone), and has one of the lowest call tariffs. Telecommunication has been a vital thread in connecting people all across the country.

The early 2000s saw the introduction of Telco OTT (Over the Top) services, where a telecommunications service provider delivers one or more services such as voice and messages, content, data, etc. This telecom growth has been further propelled by the introduction and rapid penetration of first multimedia phones and, more recently, smart phones. Such rapid growth in the telecommunication sector has been possible largely because of the liberalisation policies and a hyper-competitive market.

With the infrastructure in place and the far reach of telecom, the infrastructure has been complemented by a growing Information Technology (IT) workforce in the country. With the introduction of the globalisation policy in 1991, the market was thrown open for global firms to set up their business in India, which gave IT outsourcing a major boost. India is now being viewed as the largest exporter of IT in the world. Such a large and skilful workforce reaffirms the fact that the country is ready for a digital revolution, not only in terms of infrastructure but also in terms of workforce.

But as they say, the system is only as good as its users. The systems will not be of any use if people do not start using them in their daily lives and operations. Adoption of technology plays a significant role in decision making; early adopters of digital systems have given us quite a few success stories. Departments like Income Tax and the Passport Office have completely automated their workflows and made them digital. This not only helped them run the business efficiently but also proved to be more effective than the manual processes and resulted in better outcomes and customer satisfaction.

The vision of Digital India is that citizens of the country can use government services at their disposal, which can only be achieved by making those services digital. The adoption of digital services can’t be questioned anymore, as people have embraced many digital businesses like ecommerce, passport services, etc. with open hands. Taking a cue from the past progression and penetration of IT in daily lives, the integration of varied services comes as a much needed and welcome step and holds a great future.

Some of the cardinal issues that the government will face in implementing the Digital India programme include data security, privacy and, most importantly, management of the data (both structured and unstructured) that needs to be hosted, updated and accessed. VLDB could play an important role in dealing with the high volume of data that will be used in the Digital India programme.

The subsequent section closely examines the initiatives under the Digital India programme and the fit of VLDB in each of them.


Overview of Digital India Programme

Universal Access to Mobile Connectivity initiative aims to ensure mobile connectivity in all parts of the country by 2018. The objective is to provide each citizen of the country with access to mobile networks and associated services. There are a lot of services offered by various departments of Government (both Union and State), and other basic services such as Banking, Insurance, etc., where the mobile number of the user acts as authentication for ensuring reliability of services. Therefore, it is critical that each citizen has access to this infrastructure for mobile connectivity.

Public Internet Access Program initiative aims to improve connectivity within the country. This initiative entails building Common Service Centres (CSCs) in each gram panchayat and also converting post offices into multi-service centres. The CSCs would provide citizens with centres from where they can get access to multiple services like utility bill payments, exam records, land records, etc. Post Offices will be converted to multi-service centres to provide not only postal services but also financial services like savings and insurance.

Broadband Highways initiative is the first step in digitizing India through internet connectivity in all areas within the country. All services that will be made available as part of the Digital India Program would require good connectivity across the country. Broadband Highways is the initiative which aims to provide this basic connectivity to all citizens.

This initiative focuses on establishing broadband connectivity all over the country. As part of this initiative, 250,000 village panchayats are planned to be covered under the National Optical Fibre Network (NOFN) by December 2016.3

Another important aspect of this initiative is the development of the National Information Infrastructure (NII). NII is to be implemented by integrating already existing networks like the National Knowledge Network (NKN), State Wide Area Network (SWAN), Government User Network (GUN), National Optical Fibre Network (NOFN) and MeghRaj Cloud. NII will provide high speed internet connectivity to government entities by accessing the network and cloud infrastructure in India.

e-Kranti initiative is one of the most crucial initiatives of Digital India, involving the provisioning of services like e-healthcare, technology for justice, e-education, etc. e-Healthcare would cover online medical consultation, online medical records, online medicine supply, a pan-India exchange for patient information, etc.4 Medical records of each patient, including multimedia data like CT scans, X-rays and MRIs, will be stored and accessed from a single point of access.

Technology for justice plans to integrate the justice system using IT by providing online portals to register FIRs and upload evidence, and eventually to reduce the number of pending cases the Indian judiciary faces by providing an efficient and hassle-free litigation process.

e-Education aims at developing Massive Open Online Courses (MOOCs) that can be leveraged for education and provide citizens access to huge amounts of knowledge, along with a forum for students and teachers to debate, discuss and form meaningful conversations.5

e-Governance initiative aims at making the best use of growing technology to provide services like banking, civil services, postal services, etc. to every citizen of the country, irrespective of their location. It is not commercially feasible to open physical branches of banks and full government services throughout India, since it is such a widespread nation with its population distributed across villages and cities. Hence, to provide equal services to all the citizens of the country, e-Governance (delivery of government services in electronic form) has been planned and is one of the nine pillars of Digital India. Using the pervasive nature of IT as a platform, the government intends to reach every far corner of the country and provide services like voter cards, Aadhaar cards, integration of services, registration of companies, etc.

Information for All initiative aims to make documents imperishable, pervasive, and immune to theft and loss, while allowing the authenticity of a document to be verified.

Considering the current population of 1.25 billion, the data collected from the digitalization of documents would be enormous and can only be contained in a Very Large Database (VLDB). The database will store all the information related to the documents (e.g. name and other identifiable information, document identification number, date of issue and expiry, digital copy of the certificate, etc.) and make it available to the person at their disposal. Implementation of such a program will also reduce document fraud and help accelerate the processes of background screening.

Early Harvest Programs initiative contains multiple services like the national portal for lost and found children (KhoyaPaya), biometric scanners in all government offices, a mass messaging app for government employees, etc. The national portal for lost and found children would be a gateway for real-time information regarding lost and found children, where a complaint can be filed and tracked. Biometric scanning will ensure the physical security of government offices along with logging the time employees report for duty. This is a very efficient time management initiative. A messaging app will be created for elected representatives and government employees, with the intent to create a forum for discussion, generate a fast response, and share ideas through two-way communication between representatives and employees.

A major part of the country’s capital is lost in importing electronic items. To counteract this, we require the growth of Electronics Manufacturing, another initiative under the Digital India programme, within the country. This initiative aims at decreasing imports and also plans to provide job opportunities by promoting the establishment of electronics manufacturing plants within the country. Its final objective is to have net zero import of electronics by 2020.

IT for Jobs initiative plans to train students from various villages, small towns and Tier-II/III cities so that they become competent enough to take up IT jobs in the next 5 years. Training will also be provided to the workforce, which is mainly rural, so as to help the Telecom service providers in their own area. It is planned to set up BPOs in the North Eastern states to deliver ICT-enabled growth. To help with this, there is the North East BPO scheme started by the government, which provides 50 percent of the capital funding to citizens for setting up BPOs.6


How VLDB Can Enable Digital India Programme

It is clear that all of these initiatives will produce, process and store data for providing services to the citizens. Some of the initiatives are more data intensive and will need advanced technology platforms to meet the growing data needs. VLDBs clearly are the solution to meet these needs.

A few key benefits of VLDB in the Digital India programme are as under:

•• National Information Infrastructure (NII) will require a huge database to hold the necessary data, currently contained within separate network databases.
•• NII will also make use of data warehousing to get necessary data from other databases.
•• To create EHR (Electronic health records) under e-Kranti, there is a need to access data from various heterogeneous sources, such as e-records of patients in existing vendor software, and make them available at a single point of access.
•• The technology for justice service provided under the e-Kranti initiative needs to deal with unstructured data like CCTV footage and photographs of evidence, along with the usual structured data like text, First Information Reports (FIRs), etc.
•• To host MOOCs under e-Kranti, a huge volume of data, including various course materials, quizzes and certificates, needs to be accessed by millions in the country quickly and simultaneously with the best possible throughput.
•• The DigiLocker service, under the Information for All initiative, will also host huge amounts of data consisting of citizens’ important documents.
•• Biometric Scanner, to be implemented under the Early Harvest Program, needs unstructured data like fingerprints, retina scans, etc., to be accessed from the database. Also, scanning response time needs to be extremely small to avoid wastage of time when employees are logging in.
•• The national portal for lost and found children (KhoyaPaya) will also be hosting a huge amount of data consisting of every single case involving a lost child.
•• A Very Large Database (VLDB) will be needed to host the content shared/exchanged on various apps, such as the Mass Messaging App for government employees and representatives.

Analyzing data in the VLDBs used for Digital India services will help understand the requirements and trends in various citizen-centric services across the country. To ensure that the Digital India Programme continues to be successful in the future, it is necessary to leverage all the insights one can get from this stack of centralized data. For instance, the e-justice service would store details of each case on the database; this information can be analyzed to identify key issues in every region, and each of them can then be dealt with individually. This would provide a clear connection between the issues faced by the common man and what the government and police department need to do to resolve them.

Likewise, on analyzing data from cases in the KhoyaPaya portal, it will be possible to identify areas where child abduction is taking place. Furthermore, on investigating further, it might be possible to recognize areas which are involved in child trafficking rackets.

Similarly, analysis of data from e-healthcare services can also help identify resource requirements (doctors, nurses, medicines, etc.) in certain areas, probable epidemics, etc.
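The regional analysis described above can be illustrated with a minimal sketch. It assumes case records are available as simple dictionaries with hypothetical `region` and `issue` fields; these names and the sample data are illustrative only and do not come from any Digital India system.

```python
from collections import Counter

# Hypothetical case records; field names and values are illustrative only.
cases = [
    {"region": "North", "issue": "land dispute"},
    {"region": "North", "issue": "land dispute"},
    {"region": "North", "issue": "theft"},
    {"region": "South", "issue": "cyber fraud"},
    {"region": "South", "issue": "cyber fraud"},
    {"region": "South", "issue": "cyber fraud"},
    {"region": "South", "issue": "land dispute"},
]


def top_issues_by_region(records, n=1):
    """Return the n most frequently recorded issues for each region."""
    counts = {}
    for rec in records:
        # One Counter per region, incremented for every case seen.
        counts.setdefault(rec["region"], Counter())[rec["issue"]] += 1
    return {region: c.most_common(n) for region, c in counts.items()}


result = top_issues_by_region(cases)
print(result)
```

In a real VLDB this aggregation would more likely run inside the database as a grouped query; the sketch only shows the shape of the analysis.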

Key VLDB Challenges and Solutions

While VLDBs solve several data related problems, they come with their own set of challenges. It is important to recognize these challenges and put in place steps to address them for effective use of VLDB in the Digital India programme.

[Fig.2: Key challenges. Challenges shown: Storage Management; Quick Data Access; Data Volume; Data Concurrency and Redundancy; Access Management; Data Backup and Restoration; Data Confidentiality, Integrity and Availability.]

Quick Data Access

Data that is stored in a VLDB is not only huge but also comes from various heterogeneous sources and is usually in an unstructured format (e.g. multimedia files). One of the major challenges is to integrate structured and unstructured data while still being able to access it. Unstructured data cannot be manipulated as easily as text data. Querying multimedia data, or data that needs to be retrieved from heterogeneous sources, is not an easy task, whereas textual information can be retrieved by the relational software available today. Apart from accessing the data, the process of retrieving it should be efficient and have minimal response time. Finally, to achieve the optimum performance of a very large database, we need to find a careful balance between hardware resources, software resources and the size of data, essentially to get the best throughput.7


Outlined below are some suggestions to overcome challenges related to quick data access:

•• In the traditional sense, data warehouses of VLDB contain only textual information but, with the influx of huge amounts of multimedia content, there is a need to integrate structured data (e.g. text) and unstructured data (e.g. multimedia content) and access them efficiently. This can be done using Extract, Transform and Load (ETL) jobs, which read data from heterogeneous sources like flat files (Notepad, Word), databases and XML files, including multimedia files, and consolidate them into a single unified format.
•• The challenge of obtaining a coherent longitudinal patient record that is built from various sources and can be accessed in real time can be tackled with an update-driven approach to data warehousing, where integrated information from various heterogeneous sources is stored in a warehouse and subsequently made available for direct querying and analysis.
•• In a large country like India, where many people would want to access a huge amount of data (e.g. MOOCs) simultaneously, parallel processing and partitioning of data help in giving the end user the best throughput. Also, a plethora of techniques that come under adaptive query processing can be implemented; these focus on using runtime feedback so that query processing can be modified to provide better response time and more efficient CPU utilization.8
•• To ensure fast response time, a filtered index can be created by specifying a range for the query to retrieve the data. In the Biometric scanner scenario, this can be done if the employee inputs the last few digits of a unique ID number (which targets a specific range of data) in the VLDB, after which the scanned fingerprint/retina data can be matched. One more indexing method, called Columnstore indexing, can enable processing in batch mode. This will greatly increase performance for queries which scan multiple rows.

Data Volume

Handling data which spans terabytes (or even petabytes) can be extremely challenging in terms of mere storage space. Moreover, data is rapidly increasing in volume, and this increase must also be taken into account when dealing with databases this huge.

The Digital India program consists of various initiatives, as discussed earlier, and most of the services offered within each initiative would deal with huge amounts of data on a regular basis. This data needs to be stored in a manner that can accommodate the data that gets generated with time.

Recommendation to overcome the Data Volume related challenges

The Digital India initiative entails services which would deal with enormous amounts of data. This data will not only span terabytes at the inception of the database but will also continue to grow with time. To ensure that growing data doesn’t become a hindrance to the functioning of these initiatives, it is necessary to compress the data within the database.

Different types of compression can be used depending on how frequently the data is used. The oldest archive data can be compressed using page compression, moderately used data can be row compressed, while the newest and most frequently used data can be left uncompressed.

However, additional CPU resources would be required to compress and decompress the data in the database.9 Hence it is important to fully understand the workload of the server to decide which tables must be compressed.

Data Backup and Restoration

It is important to have a backup and to be able to restore it, since sometimes a database needs to be brought back to its original state, e.g. when the database is corrupted due to a software error, or if it has been updated with erroneous data. This becomes of prime importance when dealing with sensitive data.

All services provided by the Digital India Programme will face this challenge. Due to the presence of critical data in all the databases, a database backup and restoration procedure is a must.

Solutions for data backup and restoration

Ideally, all long-lived tasks, such as data cleansing, warehouse population and refresh, data summarization, indexing, roll-up, and query processing, should incrementally write checkpoints to persistent storage. In case of failure, the system only needs a partial roll-back to the previous checkpoint before restarting. Incremental checkpointing can be implemented to provide forward recovery. Once the backup is done, the size it takes up in the database is checked. In this scenario, compression comes into play. Row compression and page compression can both be used depending on the data.

Data Security

The data present in very large databases needs to be protected so that its confidentiality, integrity and availability are not compromised. Data needs to be encrypted both when it is hosted on the database and when it flows to the users who access it.

Also, securing the database, along with the various programs and data in it, has become very critical as networks are widely accessed, in particular from the Internet. Hence, network security controls become very significant in this scenario.

A database is required for the implementation of all the services, hence the confidentiality and integrity of the database are of prime importance.

Outlined below are key considerations to deal with data security challenges in VLDB

•• The first step in safeguarding data is to design security policies. These should be designed taking into consideration the documents to be protected, the level of risk posed by users, and the sensitivity of the data.
•• These policies then need to be enforced and implemented. The end users must also be made aware of these policies through adequate training.
•• Vulnerability assessment tests or penetration tests against the databases need to be run to evaluate the security. These tests are generally used to find the vulnerabilities that can bypass the security controls and enter the database. Vulnerabilities might include misconfiguration of controls or any known vulnerabilities of the database software. The results obtained can be used to strengthen the security of the database.


•• To ensure data security at all times, a continuous monitoring system must be in place to make note of any unusual activities. Database activity monitoring (DAM) can be one such system.
•• Analysis can be performed to bring to light known exploits and policy breaches, and baselines can be captured over time to build a normal pattern used for finding anomalous activity that could indicate intrusion. Subsequent changes in the policies can be made, taking these results into account.

Apart from the above steps, there should be regular monitoring for compliance with the security standards. These include, among other procedures, reviewing and managing permissions given to objects of the database, and patch management.

An end-to-end encryption solution can be implemented for the data that is stored and for the flow of data from the sender to the receiver. This is to protect the privacy of the users, especially when using services like the mass messaging app.
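The idea of capturing a baseline over time and flagging activity that departs from it can be sketched as follows. This is a minimal illustration only: it assumes activity is summarised as a count of queries per user per hour, and it uses a simple standard-deviation threshold that is an assumption of this sketch, not a feature of any particular DAM product.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarise past activity counts as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(count, baseline, threshold=3.0):
    """Flag a count that lies more than `threshold` deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # With no variation in the history, any different value is unusual.
        return count != mu
    return abs(count - mu) / sigma > threshold


# Hypothetical hourly query counts for one user (illustrative data).
history = [100, 104, 98, 101, 97, 102, 99, 103]
baseline = build_baseline(history)

print(is_anomalous(102, baseline))  # within the normal pattern
print(is_anomalous(450, baseline))  # far outside the normal pattern
```

A production monitoring system would track many more signals (objects accessed, time of day, source address) and would refresh the baseline periodically; the threshold of three deviations is a tunable assumption.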

[Fig.3: High Level Data Security Model for VLDB. Stages shown: Design security policies and framework; Enforce and implement policies; Carry out vulnerability assessment; Continuous monitoring system.]

Access Management
Access management and, subsequently, data protection are crucial to the success of any business that implements a VLDB. Giving access only to authorized personnel is critical to maintaining the confidentiality of data. The challenge lies in the administrator identifying a user and the data they require access to.

Solutions for Access Management
In a VLDB we can implement access provisioning and de-provisioning to manage user access. This can be achieved by having multiple levels of permissions depending on the roles and responsibilities of the end user, so that a user's access is pre-determined by their role. For example, a nurse and a receptionist at the same physician practice do not require the same data for a given patient; hence data access for them will be decided depending on the roles and responsibilities they have to fulfil.

In conclusion, robust customized identity and access management solutions can be provided to ensure access to all the stakeholders.
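The nurse and receptionist example can be sketched as role-based access with explicit provisioning and de-provisioning. This is a minimal illustration in Python, not a description of any particular identity and access management product; the role names, field names, user names and the AccessManager class are all hypothetical.

```python
# Hypothetical mapping from role to the patient fields that role may see.
ROLE_PERMISSIONS = {
    "receptionist": {"name", "appointment_time", "contact_number"},
    "nurse": {"name", "appointment_time", "vitals", "medications"},
}

class AccessManager:
    """Grants and revokes roles, and filters records by the caller's role."""

    def __init__(self, role_permissions):
        self._role_permissions = role_permissions
        self._user_roles = {}

    def provision(self, user, role):
        """Give a user access by assigning one known role."""
        if role not in self._role_permissions:
            raise ValueError(f"unknown role: {role}")
        self._user_roles[user] = role

    def deprovision(self, user):
        """Remove all access, for example when a user leaves the practice."""
        self._user_roles.pop(user, None)

    def view(self, user, record):
        """Return only the fields of the record that the user's role allows."""
        role = self._user_roles.get(user)
        if role is None:
            raise PermissionError(f"{user} has no provisioned access")
        allowed = self._role_permissions[role]
        return {field: value for field, value in record.items() if field in allowed}

# Illustrative data only.
patient = {
    "name": "A. Patient",
    "appointment_time": "10:30",
    "contact_number": "555-0100",
    "vitals": "120/80",
    "medications": "none",
}
manager = AccessManager(ROLE_PERMISSIONS)
manager.provision("asha", "nurse")
manager.provision("ravi", "receptionist")
nurse_view = manager.view("asha", patient)
reception_view = manager.view("ravi", patient)
manager.deprovision("ravi")  # any later call to view() for "ravi" raises PermissionError
```

Keeping permissions on the role rather than on the individual means that a change of responsibilities is a single re-provisioning step, and de-provisioning removes every field at once.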


Fig.4: Services provided for Access Management

Governance and security effectiveness
Business drivers:
• Reduce risk of unauthorized access to critical systems
• Consistent and automated enforcement of security policies and access controls
Business outcomes:
• Protection against rogue user accounts
• Reduce risk of data loss and privacy related incidents and violations

Audit and regulatory compliance
Business drivers:
• Real-time visibility and reporting on “who has access to what, and why”
• Ease of compliance to meet internal security policies, and government regulations
Business outcomes:
• Increase accuracy and accountability for regulatory compliance
• Management of privilege access and Segregation of Duties controls

Business enablement and IT agility
Business drivers:
• Synergize business collaborations with partners and suppliers
• Deliver advantage in today's mobile, always-on, cloud-based business environment
Business outcomes:
• Ease of registration and application access to customers
• Role-based automated access to business applications

Data Concurrency and Redundancy
When dealing with databases that contain as much information as a VLDB, concurrency controls play a major role. They are needed to ensure that database transactions can be performed concurrently without compromising the integrity of the data. Whenever two or more transactions overlap in time while accessing the same data, concurrency control is needed to avoid errors.

Solutions to maintain data concurrency and avoid data redundancy
•• Concurrency controls can be implemented to tackle the above challenges. The concept of multiversion concurrency control can be used to achieve data concurrency and consistency [10]. When a user is editing the database, a snapshot is created, and only when the transaction is complete does the new version become visible to others accessing the same data. Page restore can be used to restore the content if the data in a particular page is corrupted [11].
•• Unused indexes and redundant information need to be reviewed regularly to avoid the accumulation of trash. Some indexes may be subsets of others; in such cases, they can be removed if the developer deems them unnecessary. Some indexes may have the same keys but use different indexing techniques; there may be an opportunity for index consolidation there, which also saves a lot of essential space on the VLDB.
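The snapshot behaviour of multiversion concurrency control can be sketched with a small in-memory store. This is a simplified illustration under stated assumptions, not how any particular database engine implements it: the MVCCStore and Transaction classes are hypothetical, versions are never garbage-collected, and conflicting writers are resolved with a simple first-committer-wins rule.

```python
import threading

class MVCCStore:
    """Keeps every committed version of a key, tagged with a commit timestamp."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the latest commit
        self._lock = threading.Lock()

    def begin(self):
        """Start a transaction that sees the data as of the latest commit."""
        with self._lock:
            return Transaction(self, self._clock)

    def _read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        with self._lock:
            for commit_ts, value in reversed(self._versions.get(key, [])):
                if commit_ts <= snapshot_ts:
                    return value
        return None

    def _commit(self, txn):
        with self._lock:
            # First committer wins: reject the commit if another transaction
            # committed a newer version of any key this transaction wrote.
            for key in txn.writes:
                versions = self._versions.get(key, [])
                if versions and versions[-1][0] > txn.snapshot_ts:
                    raise RuntimeError(f"write conflict on {key!r}")
            self._clock += 1
            for key, value in txn.writes.items():
                self._versions.setdefault(key, []).append((self._clock, value))

class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # uncommitted changes, visible only to this transaction

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store._commit(self)

# Illustrative use: a reader keeps its snapshot while a writer commits.
store = MVCCStore()
setup = store.begin()
setup.write("address", "old address")
setup.commit()

reader = store.begin()
writer = store.begin()
writer.write("address", "new address")
writer.commit()

seen_by_reader = reader.read("address")           # still the reader's snapshot
seen_by_new_txn = store.begin().read("address")   # a later transaction sees the update
```

Readers never block writers in this scheme because each transaction reads from its own snapshot; the cost is the extra storage for old versions, which a real system reclaims once no transaction can still see them.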


Storage Management
With data in a VLDB being in the range of terabytes or petabytes, it is very important to structure the data systematically so that it can be accessed at a reasonable speed. If the data is not structured well, any query will take much longer to process. Planned storage will not just improve query processing speed, but will also make it much easier to maintain, update and discard the data.

Data involved with the Digital India initiatives will have a lifecycle that could be as long as 70+ years, instead of the five to ten years typical of most enterprises. This also implies that the data warehouse will continue to grow for 70 years. In addition to size, a long data lifecycle imposes a heavy requirement on the structure of the data warehouse, which has to be flexible enough to access 70-year-old data.

Methods for efficient storage management
Multiple options like ‘filegrouping’, partitioning, tiered storage, and ‘columnstore’ indexing will play an essential role in enabling the database to grow in the future, as all the databases used for hosting the data need to be structured so that they can last for many years to come. Tiered storage will enable the data to be grouped depending on how often it may be required. The freshest or most frequently accessed data can be kept on the faster tiers, while old or occasionally needed data can be stored in the slower tiers. ‘Columnstore’ indexing, if used, can enable processing in batch mode. This will greatly increase performance for queries which scan many rows.
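The tiering rule described above (recent or frequently used data on fast storage, old data on slow storage) can be sketched as a simple placement plan. The tier names, the age boundaries and the partition names below are assumptions chosen for illustration; a real deployment would derive them from measured access patterns and storage costs.

```python
from datetime import date

# Hypothetical tier boundaries: (maximum days since last access, tier name).
TIER_RULES = [
    (90, "hot"),        # accessed within roughly the last quarter
    (365 * 5, "warm"),  # accessed within roughly the last five years
]
COLD_TIER = "cold"      # everything older

def choose_tier(last_accessed, today):
    """Pick the fastest tier whose age limit the partition still satisfies."""
    age_days = (today - last_accessed).days
    for max_age_days, tier in TIER_RULES:
        if age_days <= max_age_days:
            return tier
    return COLD_TIER

def plan_placement(partitions, today):
    """Group partition names by the tier they should be stored on."""
    plan = {}
    for name, last_accessed in partitions.items():
        plan.setdefault(choose_tier(last_accessed, today), []).append(name)
    return plan

# Illustrative partitions keyed by name, with the date each was last accessed.
today = date(2016, 6, 1)
partitions = {
    "records_2016": date(2016, 5, 20),
    "records_2013": date(2014, 1, 10),
    "records_1995": date(2001, 3, 2),
}
placement = plan_placement(partitions, today)
for tier, names in placement.items():
    print(tier, names)
```

Because the plan works on whole partitions, it combines naturally with partitioning by period: a partition that ages out of the hot tier can be moved as a unit without touching current data.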

The national information infrastructure will require immense amounts of pre-planning with respect to how the data will be stored in the database.

Course content for the online courses and user-specific data, like their completed courses, certificates, etc., can be two major divisions. Course content would be read-only data while user-specific data would be regularly updated/modified.


Summary of key challenges
While very large databases solve several data management issues and problems for the Digital India programme, they come with certain challenges that need to be effectively dealt with by Government agencies.

•• Data security remains a growing concern in protecting vital information of citizens and records maintained by several departments. It is important that suitable security measures covering all aspects of the data lifecycle are considered.
––Data classification and security policies should be defined and implemented.
––Regular security testing of the databases and underlying infrastructure should be performed to ascertain vulnerabilities.
––Suitable encryption techniques should be used to protect the confidentiality of information.
––Solutions like database activity monitoring should be deployed for continuous monitoring for security issues.

•• While VLDB offers great flexibility in data access and storage, it is also prone to failure. Therefore, it is critical that adequate data backup and restoration policies are defined and enforced in these systems. An automated data backup solution should be considered. Regular testing of the backed-up data should be performed to gain assurance on its integrity and availability.

•• Need-to-know and need-to-have are the key principles for providing access to legitimate users. User access remains a concern across organisations and industries.
––Data access policies and processes should be defined as guiding principles for access management.
––For critical data access, additional authentication mechanisms such as multi-factor and biometric techniques should be considered.
––Solutions like identity and access management and privilege access management should be considered for controlling access to data in various initiatives of the Digital India programme.
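One part of testing backed-up data, checking that a backup file has not changed since it was written, can be sketched with a checksum comparison. This is a minimal Python illustration under stated assumptions: the file name and contents are hypothetical, and a matching checksum only shows the file is intact; it does not replace a periodic trial restore.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so very large backups need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(backup_path, expected_digest):
    """Return True if the backup still matches the digest recorded at backup time."""
    return sha256_of(backup_path) == expected_digest

# Illustrative run on a temporary file standing in for a real backup.
with tempfile.TemporaryDirectory() as workdir:
    backup = Path(workdir) / "example.bak"  # hypothetical backup file
    backup.write_bytes(b"example backup contents")
    recorded = sha256_of(backup)            # stored alongside the backup

    intact_ok = verify_backup(backup, recorded)

    # Simulate silent corruption of the stored backup.
    backup.write_bytes(b"example backup c0ntents")
    corrupted_ok = verify_backup(backup, recorded)

print("intact backup verified:", intact_ok)
print("corrupted backup verified:", corrupted_ok)
```

Scheduling such a check together with the automated backup gives early warning of storage corruption, while restore drills cover the availability side.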


Concluding remarks

Many enterprises have recognized the importance of managing huge and complex sets of data, and most of them have started to embrace VLDB solutions for it. The Government of India and its departments are recommended to consider employing VLDB for the Digital India programme. The programme will need the application of data science to build predictive and detective models for service efficiency and proactive decision making, for an improved citizen experience. Additional measures to address issues such as database access and data security will have to be put in place to handle large and complex volumes of data in a seamless manner.

A SWOT Analysis View for Use of VLDBs in Digital India Programme

Strength
•• Centralization of scattered data
•• Systematic storage of data
•• Data warehousing, which provides high throughput

Weakness
•• Political willingness
•• Data accessibility
•• Skillset and competency

Opportunities
•• Most pillars of Digital India Programme will require VLDB
•• Analytics of structured and unstructured data for service improvement and decision making
•• Employment Opportunities

Threats
•• Data duplication and inconsistency
•• Unauthorised Data Access and Security Breaches
•• Cyber terrorism


References

1. https://community.nasscom.in/docs/DOC-1062
2. https://docs.oracle.com/cd/E11882_01/server.112/e25523/intro.htm
3. http://www.digitalindia.gov.in/content/broadband-highways
4. http://www.digitalindia.gov.in/content/ekranti-electronic-delivery-services
5. http://www.digitalindia.gov.in/content/ekranti-electronic-delivery-services
6. http://www.digitalindia.gov.in/content/it-jobs
7. tdan.com/what-is-vldb-very-large-databases/496
8. https://www.is.upenn.edu/~zives/research/aqp-survey.pdf
9. https://technet.microsoft.com/en-us/library/dd894051(v=sql.100).aspx
10. http://www.eazynotes.com/pages/database-management-system/concurrency-control.html
11. http://sqlturbo.com/presentation-best-practices-for-sql-server-very-large-databases-vldbs/

Contacts

Shree Parthasarathy, Partner
PN Sudarshan, Partner
Gaurav Shukla, Director
Achal Gangwani, Senior Manager
Manikanda Prabhu, Manager
Udit Lekhi, Manager
Isha Shyam, Consultant
Pranav Krishna, Consultant


For further information please email at [email protected]

Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”), its network of member firms, and their related entities. DTTL and each of its member firms are legally separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients. Please see www.deloitte.com/about for a more detailed description of DTTL and its member firms.

This material is prepared by Deloitte Touche Tohmatsu India LLP (DTTILLP). This material (including any information contained in it) is intended to provide general information on a particular subject(s) and is not an exhaustive treatment of such subject(s) or a substitute to obtaining professional services or advice. This material may contain information sourced from publicly available information or other third party sources. DTTILLP does not independently verify any such sources and is not responsible for any loss whatsoever caused due to reliance placed on information sourced from such sources. Without limiting the generality of this notice and terms of use, nothing in this material or information comprises legal advice or services (you should consult a legal practitioner for these). None of DTTILLP, Deloitte Touche Tohmatsu Limited, its member firms, or their related entities (collectively, the “Deloitte Network”) is, by means of this material, rendering any kind of investment, legal or other professional advice or services. You should consult a relevant professional for these kind of services. This material or information is not intended to be relied upon as the sole basis for any decision which may affect you or your business. Before making any decision or taking any action that might affect your personal finances or business, you should consult a qualified professional adviser.

No entity in the Deloitte Network shall be responsible for any loss whatsoever sustained by any person or entity by reason of access to, use of or reliance on, this material. By using this material or any information contained in it, the user accepts this entire notice and terms of use.

©2016 Deloitte Touche Tohmatsu India LLP. Member of Deloitte Touche Tohmatsu Limited

Deloitte Touche Tohmatsu India Private Limited (U74140MH1995PTC093339), a private company limited by shares, was converted into Deloitte Touche Tohmatsu India LLP, a limited liability partnership (LLP Identification No. AAE-8458) with effect from October 1, 2015.