Programs That Use Databases Are Central to Our Information Infrastructure

Programs that use databases are central to our information infrastructure. Such systems are increasingly being developed using procedural object-oriented languages and relational databases. But procedural languages and database query languages are based on different semantic foundations and optimization strategies. No approach has yet been developed to bridge this semantic gap and integrate databases and procedural language programs into efficient, complete systems. As a result, applications that access databases are awkward to design and develop. Careful optimizations are often needed to attain good performance, resulting in programs that are difficult to maintain and evolve. In addition, databases and programming languages are taught separately, so developers trained in one domain may not be expert in the other. Finally, there has not been sufficient evaluation and comparison of the performance of different integration approaches. Improving the development of these programs therefore offers very high leverage, both for programmer productivity and for application performance.
The goals of this project are to create methods, languages and tools that enable application programmers to readily develop efficient, well-structured programs that integrate database processing into procedural programs; to develop a framework for evaluating solutions to this problem; and to develop and offer courses that integrate database and procedural language program development.
The intellectual merit of this project lies in developing fundamental solutions to the problem of integrating programming languages and databases. The vision is to extend programming languages to specify modular and statically-typed database operations, while still leveraging the full power of database query optimization and concurrency control.
Two approaches will be explored. Safe query objects use statically-typed classes to define compositional and modular queries. This simple and novel approach may have immediate impact by enabling static typing of queries in widely-used technologies like EJB. Query extraction derives database queries from traditional procedural code. This project will go beyond previous work by extracting large queries from nested sub-expressions and by exploring interprocedural analysis to extract queries from modular programs. These new approaches will be evaluated against existing approaches for performance and usability. In addition to traditional performance benchmarking, large open-source applications will be used to test the applicability and usability of the proposed solutions and to measure their performance in more realistic settings.
This project will have broad impact on both industry and academia. A fundamental improvement in the interface between programming languages and databases could significantly reduce the cost and improve the quality of software, from e-commerce servers to desktop and mobile applications. The project includes a plan to publish in both programming language and database conferences to facilitate and encourage coordinated research. These results are also a concrete first step in a larger research effort to better support the development of distributed systems from large-scale components, using technologies like web services.
The educational innovation in this project is an integrated approach to teaching programming, languages and databases. Although incremental improvements to existing courses will be implemented, the project will also develop undergraduate and graduate courses that take an integrated view of programming with persistent data as their central theme.
Project Description
1 Introduction
Programs that use databases are a critical part of our information infrastructure.
These systems generally use programming languages for general-purpose computation and databases to control concurrent access to data, search large amounts of data, and/or update data reliably and securely. Such systems are increasingly being developed using procedural object-oriented languages and relational databases. For scalability and reliability, multiple application servers typically communicate with a shared, replicated database server.
Procedural languages and database query languages are based on different semantic foundations and optimization strategies. These differences are known informally as the “impedance mismatch” [44]: imperative programs versus declarative queries, compiler optimization versus query optimization, algorithms and data structures versus relations and automatic indexes, null pointers versus nulls for missing data, and different approaches to modularity and information hiding. Because databases and programming languages can perform many of the same tasks, developers must make difficult architectural decisions about how to organize and partition system functionality. Distributed execution also requires efficient structuring and management of specialized communication patterns.
As a result, applications that access databases are awkward to design and develop. Programming languages do not facilitate effective use of databases, and attaining good performance usually requires careful optimization based on expert knowledge, which can make programs difficult to maintain and evolve. In addition, databases and programming languages are taught separately, so developers trained in one domain may not be expert in the other. Finally, there has not been sufficient evaluation of the end-to-end performance of design alternatives. Methods, languages and tools that enable more effective development of these programs therefore offer very high leverage, both for programmer productivity and for application performance.
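To make the performance point concrete, the following Java sketch contrasts two ways of listing every department's employees. It assumes a hypothetical `FakeDatabase` class that keeps data in memory and simply counts each query as one round trip; it is not JDBC or any real driver API, and the SQL in its comments only indicates which query each method stands for. The straightforward loop issues one query per department, while the hand-optimized version issues a single combined query. This is the kind of restructuring that currently depends on expert knowledge and that query extraction aims to perform automatically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote database server (not a real JDBC API):
// it keeps rows in memory and counts how many queries the client sends.
class FakeDatabase {
    int roundTrips = 0;
    private final Map<String, List<String>> employeesByDept; // department -> employee names

    FakeDatabase(Map<String, List<String>> employeesByDept) {
        this.employeesByDept = employeesByDept;
    }

    // Plays the role of "SELECT name FROM Department".
    List<String> departmentNames() {
        roundTrips++;
        return new ArrayList<>(employeesByDept.keySet());
    }

    // Plays the role of "SELECT name FROM Employee WHERE department = ?".
    List<String> employeesOf(String dept) {
        roundTrips++;
        return employeesByDept.getOrDefault(dept, List.of());
    }

    // Plays the role of a single join over Department and Employee.
    List<String[]> departmentEmployeePairs() {
        roundTrips++;
        List<String[]> rows = new ArrayList<>();
        employeesByDept.forEach((dept, names) -> {
            for (String name : names) {
                rows.add(new String[] {dept, name});
            }
        });
        return rows;
    }
}

class RoundTripDemo {
    // Straightforward procedural code: one query for the departments,
    // then one more query per department inside the loop.
    static int procedural(FakeDatabase db) {
        int pairs = 0;
        for (String dept : db.departmentNames()) {
            pairs += db.employeesOf(dept).size();
        }
        return pairs;
    }

    // Hand-optimized code: the nested loop becomes one combined query.
    static int combined(FakeDatabase db) {
        return db.departmentEmployeePairs().size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = Map.of(
                "Sales", List.of("Ann", "Bob"),
                "Research", List.of("Cy"),
                "Support", List.of());
        FakeDatabase loopDb = new FakeDatabase(data);
        FakeDatabase joinDb = new FakeDatabase(data);
        // prints: procedural: 3 pairs, round trips = 4
        System.out.println("procedural: " + procedural(loopDb) + " pairs, round trips = " + loopDb.roundTrips);
        // prints: combined: 3 pairs, round trips = 1
        System.out.println("combined: " + combined(joinDb) + " pairs, round trips = " + joinDb.roundTrips);
    }
}
```

With N departments the loop needs N + 1 round trips against one for the combined query; when each round trip crosses the network to a shared database server, that difference can dominate response time.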
1.1 Research Overview
The goals of this project are to create methods, languages and tools that enable application programmers to readily develop efficient, well-structured programs that integrate database processing into procedural programs; to develop a framework for evaluating solutions to this problem; and to develop and offer courses that integrate database and procedural language program development. The vision is to extend programming languages to specify modular and statically-typed database operations, while still leveraging the full power of database query optimization and concurrency control.
Two approaches will be explored. Safe query objects use statically-typed classes to define compositional and modular queries. This approach is effective because it leverages existing language constructs to enable static typing of queries in widely-used technologies like EJB. Query extraction involves extracting database queries from traditional procedural code. This approach naturally combines compiler optimization and query optimization within a unified language framework. This project will go beyond previous work by extracting large queries from nested sub-expressions and by exploring interprocedural analysis to extract queries from modular programs. These new approaches will be evaluated against existing approaches for performance and usability. In addition to traditional performance benchmarking, large open-source applications will be used to test the applicability and usability of the proposed solutions and to measure their performance in more realistic settings.

    import java.util.Collection;

    class Employee {
        String name;
        float salary;
        Department department;
    }

    class Department {
        String name;
        Collection<Employee> employees;
        Employee manager;
    }

Figure 1: Example database schema defined via classes
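As a minimal sketch of the safe query object idea, the following Java code expresses query criteria over a pared-down version of the `Employee` class from Figure 1 as ordinary methods. All names here are hypothetical and chosen for illustration (`SafeQuery`, its `filter` and `and` methods, `SalaryAbove`, `NameStartsWith` and `SafeQueryDemo.execute`); they are not an existing library. Because the criterion is plain Java, a misspelled field or a comparison between incompatible types is a compile-time error rather than a runtime SQL error. The hypothetical `execute` method only evaluates the query in memory; in the approach described above, the criterion would instead be translated into a database query so that the database can optimize it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Pared-down version of the Figure 1 Employee class (department link omitted).
class Employee {
    final String name;
    final float salary;

    Employee(String name, float salary) {
        this.name = name;
        this.salary = salary;
    }
}

// Hypothetical base type for a safe query object: the query criterion is an
// ordinary Java method, so field names and types are checked by the compiler.
interface SafeQuery<T> {
    boolean filter(T candidate);

    // Queries compose as objects; the result is again a SafeQuery.
    default SafeQuery<T> and(SafeQuery<T> other) {
        return candidate -> filter(candidate) && other.filter(candidate);
    }
}

// A reusable, parameterized query: employees earning more than a limit.
class SalaryAbove implements SafeQuery<Employee> {
    private final float limit;

    SalaryAbove(float limit) { this.limit = limit; }

    @Override
    public boolean filter(Employee e) { return e.salary > limit; }
}

// Another query: employees whose name starts with a given prefix.
class NameStartsWith implements SafeQuery<Employee> {
    private final String prefix;

    NameStartsWith(String prefix) { this.prefix = prefix; }

    @Override
    public boolean filter(Employee e) { return e.name.startsWith(prefix); }
}

class SafeQueryDemo {
    // Stand-in executor that evaluates the query in memory. A real system would
    // instead translate filter() into SQL (e.g. a WHERE clause) and run it on the database.
    static <T> List<T> execute(SafeQuery<T> query, List<T> table) {
        return table.stream().filter(query::filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("Alice", 120000f),
                new Employee("Alan", 60000f),
                new Employee("Bob", 90000f));
        SafeQuery<Employee> query = new SalaryAbove(80000f).and(new NameStartsWith("A"));
        for (Employee e : execute(query, employees)) {
            System.out.println(e.name); // prints: Alice
        }
    }
}
```

Because the composed query is still a single object, a translator could in principle emit one `WHERE` clause combining both conditions, keeping the queries modular in the source while still sending one query to the database.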
1.2 Expected Results
The results from this project will include fundamental theories of how to interface programming languages with databases to achieve both programmer productivity and high performance, practical implementations of this theory in a language design and implementation, and measurements of this solution relative to other approaches. The project will produce several PhD theses, one of which is now in progress, and several master's theses. Theoretical results will be published in both programming language and database conferences. The practical applications of the project make it possible to communicate the results in professional and trade publications. All software developed in the project will be freely available on the web as a basis for training and experimentation.
The project will explore an integrated approach to teaching programming language and database concepts. New courses will be developed at the undergraduate and graduate level that take integration as a central theme. The project will also make incremental improvements to existing courses to address integration issues.
The results from this project will help to improve the performance and reliability of software, while reducing development costs. The solutions developed in this project will be defined at a fundamental, theoretical level, so that they will apply to a wide range of languages and applications. The empirical validation of the results will provide evidence that the techniques are worth adopting. These results are also a concrete first step in a larger research effort to better support the development of distributed systems from large-scale components, using technologies like web services. Databases are an important concrete example of a large-scale component with complex integration needs. The solutions developed in this project should contribute to a longer-term effort to address integration
