
Dr. Craig Thompson, Professor and Acxiom Database Chair in Engineering, Computer Science and Computer Engineering Department, 313 Engineering Hall, Fayetteville, AR 72701. Tel: (479) 575-6519, Fax: (479) 575-5339, Email: [email protected]

Database

Problem
Industry, government, the U.S. military, DHS, the scientific community, our networked society, and RFID and sensor networks are all generating data at unprecedented and accelerating rates. The commercial database community is dominated by a few main vendors, and database architectures have not changed enough in thirty years: they are monolithic and built around disk-based storage. Organizations with huge datasets and extreme throughput requirements are outgrowing traditional database architectures and searching for a next-generation approach.

Objective
The objective of this proposed project is to develop an open architecture for database management that complements disk-based storage with grids of hundreds of low-cost machines (e.g., PCs), all of which can query data fragments in parallel.

Impact if Successful
In Arkansas, companies like Acxiom Corporation and Wal-Mart will benefit from low-cost, scalable database supercomputing. These companies currently process terabytes daily and petabytes annually, and the volume keeps growing. Acxiom is already years into pioneering data grid technology: it can process datasets for clients like Citibank, major credit agencies, and hundreds of other clients, comparing customer records against a knowledgebase the size of the US population in hours to days. Wal-Mart is processing all of its transactions worldwide and will soon also process all RFID sensor data worldwide. These companies do not want to be in the database business but need better database management architectures to keep pace with their voracious data needs. At the national level, the need for scalable database supercomputing is glaring:
• The Department of Homeland Security lists data sharing among its organizations as its top priority.
• The US military views information dominance as a critical goal.
• NASA and the scientific community need ways to process huge data sets.

Approach
Over the past year, researchers at the University of Arkansas have been working closely with Acxiom Corporation on a next-generation architecture (funded by $280K from Acxiom). The approach is related to NSF-sponsored research in grid computing. Much of that research has focused on computational grids, that is, how to spread a computation across large numbers of cheap machines (e.g., PCs). Our work with Acxiom has focused on data grids, in which huge files are partitioned across the main memory of pools of hundreds of PCs. Queries can then be partitioned to operate in parallel across all machines at once. The architecture scales by adding machines and can be tuned to assure very high throughput. Acxiom uses a special-purpose version of this kind of architecture to process huge data sets effectively. The opportunity is to generalize this architecture to cover a wide variety of traditional query operations while retaining the massive capacity and huge speedups of the “” data grid architecture.
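The data grid idea described above can be illustrated with a minimal sketch. It is not Acxiom's implementation: the function names (partition_records, scan_fragment, parallel_query), the hash-partitioning rule, and the sample records are all assumptions made for illustration, and threads in one process stand in for separate machines holding fragments in main memory.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_records(records, num_nodes, key):
    """Hash-partition records into num_nodes in-memory fragments (one per simulated node)."""
    fragments = [[] for _ in range(num_nodes)]
    for record in records:
        fragments[hash(record[key]) % num_nodes].append(record)
    return fragments


def scan_fragment(fragment, predicate):
    """Work done locally by one node: filter only its own fragment."""
    return [record for record in fragment if predicate(record)]


def parallel_query(fragments, predicate):
    """Scatter the same predicate to every fragment, then gather the partial results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(lambda frag: scan_fragment(frag, predicate), fragments)
        return [record for partial in partials for record in partial]


# Hypothetical sample data: 1000 customer records spread over 8 simulated nodes.
records = [{"id": i, "state": "AR" if i % 4 == 0 else "TX"} for i in range(1000)]
fragments = partition_records(records, num_nodes=8, key="id")
matches = parallel_query(fragments, lambda r: r["state"] == "AR")
print(len(matches), "matching records found across", len(fragments), "fragments")
```

In this sketch, adding capacity means raising num_nodes so that each fragment, and therefore each local scan, gets smaller. A real grid would place each fragment on its own machine and ship the query over the network instead of using threads.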

Task 1 – Year 1 – Storage Management Architecture Prototype
We propose to work with the open source database community (e.g., MySQL) to develop a standard storage architecture API so that third parties can add their own storage architecture to any front-end relational database. We will then develop a data grid cluster architecture consisting of a collection of general index structures, initially HASH and ISAM. The system will be prototyped and tested on large datasets. We will also demonstrate that custom indexes, such as Acxiom's current workhorse AbiliTec indexes, can be plugged in through the storage engine interface. This would benefit Acxiom by supporting a broader class of relational queries on current datasets. At the same time, the architecture, motivated by Acxiom’s positive experience with data grids, will generalize so that the entire field can benefit from a radically new, highly parallel and scalable storage management architecture.
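As a rough illustration of what a pluggable storage architecture API could look like, the sketch below defines one interface with two interchangeable index implementations. Everything here is hypothetical: StorageEngine, HashEngine, SortedEngine, and load_and_query are invented names, this is not MySQL's actual storage engine API, and the sorted in-memory index is only a loose stand-in for ISAM.

```python
import bisect
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical plug-in interface that a relational front end would call."""

    @abstractmethod
    def insert(self, key, row): ...

    @abstractmethod
    def lookup(self, key): ...

    @abstractmethod
    def range_scan(self, low, high): ...


class HashEngine(StorageEngine):
    """Hash index: fast exact-match lookups, no inherent key order."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        self._rows[key] = row

    def lookup(self, key):
        return self._rows.get(key)

    def range_scan(self, low, high):
        # No ordering in a hash index, so every key must be examined.
        return [self._rows[k] for k in sorted(self._rows) if low <= k <= high]


class SortedEngine(StorageEngine):
    """Sorted-key index (ISAM-like stand-in): binary search plus cheap range scans."""

    def __init__(self):
        self._keys = []
        self._rows = []

    def insert(self, key, row):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            self._rows[pos] = row  # overwrite an existing key
        else:
            self._keys.insert(pos, key)
            self._rows.insert(pos, row)

    def lookup(self, key):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._rows[pos]
        return None

    def range_scan(self, low, high):
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._rows[start:end]


def load_and_query(engine):
    """Front-end code written once against the interface, not against a specific engine."""
    for customer_id in (42, 7, 19, 3):
        engine.insert(customer_id, {"customer_id": customer_id})
    return engine.lookup(19), engine.range_scan(5, 20)


for engine in (HashEngine(), SortedEngine()):
    print(type(engine).__name__, load_and_query(engine))
```

The point of the design is that load_and_query never needs to know which index sits behind the interface, so a custom index could be added as a third subclass without touching the front end.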

Tasks 2 & 3 – Years 2 & 3 – Query Processing, Automation, and Digital Rights Management
Results from Task 1 are immediately useful because some high-performance applications can interface to that architectural level directly. Generalizing the higher levels of a DBMS can also lead to richer, more capable systems. In follow-on years, we expect to extend the work from Year 1 to build higher-level capabilities. These include:
• Multiple query processors that can operate in parallel against one or several storage management architectures.
• Extensions of relational query processing to parallel execution that speed up the handling of multiple simultaneous queries.
• Secure access, which is critical to the long-term success of next-generation databases. New policy-driven languages are needed to assure security and privacy.

Risk Assessment and Chance of Success
Based on special-purpose prototypes developed at Acxiom, we can claim with reasonable certainty that new, more general architectures can provide similar benefits to a wide variety of mega-databases. The Principal Investigator has credentials (see below) that qualify him to architect and lead the proposed development effort. A partnership between the University of Arkansas and Acxiom, with connectivity to open source vendors such as MySQL, is already yielding productive results, and the project can build on this working relationship.

Team
Dr. Craig Thompson, Principal Investigator. Dr. Thompson is Professor and Acxiom Database Chair in Engineering. He has a strong record of industrial research and external funding ($11.3M, with database research funded by DARPA from 1990 to present). His background includes university teaching, publications, presentations, inventions, consulting, standards, and administration, and his work has had reasonable impact in several fields: architectures, survivable reliable secure distributed object middleware, and multi-agent systems. He led the DARPA Open OODB project, which helped define OODB functionality but, more importantly, led to service-oriented architectures (SOAs), that is, middleware architectures consisting of collections of Lego-like modular components. He co-authored the Object Management Group’s Object Management Architecture and Object Services Architecture documents, which served as the blueprint for CORBA and CORBAservices, a direct precursor to today’s web services. He also helped architect the DARPA CoABS agent grid and is currently working with Acxiom on improvements to a very high-performance data grid architecture.

Budget
A detailed budget and plan will be developed if this project is selected. A successful project would involve:
• University of Arkansas CSCE Department: 3-4 faculty for 2-3 months each summer plus some release time and cost during the academic year, and 7-10 grad students at the PhD and MS level plus undergraduate researchers.
• Acxiom guidance: probably a range of Acxiom architects experienced in different facets of the architecture. Most of these contacts are already in place.
• MySQL partnership: we have made initial contacts and believe our proposal is aligned with their open source project goals.
Initial estimate: $700-$800K in year one.

Benefits to University of Arkansas
• Arkansas is an EPSCoR state and would benefit from the opportunity to create a center of excellence in data engineering. The investment is strategic regionally because Arkansas is home to Wal-Mart, Acxiom, JB Hunt, Tyson, and many other companies that depend on enterprise computing to process huge data sets.
• This is a large software architecture and development project that could also involve resources from CAST and the Walton College of Business, so cross-college participation is likely. This would strengthen university-industry ties in a year when strong ties with industry partners, especially Acxiom, are particularly important.
• This project puts the University of Arkansas on the map as a world leader in data management. The University’s 2010 goal is to build -intensive industries and world-class leadership in research in Arkansas. This proposal offers an opportunity to do that.
• If we win this effort, we can leverage our relationship with Acxiom to win other efforts, potentially benefiting DHS, DoD, NSF, scientific research, and many data management efforts nationwide.
• This project involves thinking outside the box and re-thinking the boundaries of the data management area. It would draw students, especially to our Ph.D. and M.S. programs. Even better, it would make a difference in the world. And it would be fun.