Comparison of JPA Providers and Issues with Migration
MASARYKOVA UNIVERZITA
FAKULTA INFORMATIKY

DIPLOMA THESIS

Lukáš Šembera

Brno, June 2012

Declaration

I hereby declare that this thesis is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Lukáš Šembera

Advisor: Jiří Pechanec, Red Hat Czech, s.r.o.

Acknowledgement

I would like to thank my technical advisor Jiří Pechanec from Red Hat Czech for his valuable comments and suggestions. I would also like to thank my fiancée Daria for her support during writing.

Abstract

This thesis aims to compare three implementations of the JPA standard, specifically Hibernate, OpenJPA and EclipseLink. Besides the comparison, it also describes the migration of various real-world applications between those JPA implementations and documents the issues that developers might typically run into. The practical part involves developing an application that supports migrating projects between those three JPA providers.

Keywords

JPA, JPA2, Hibernate, OpenJPA, EclipseLink, Java, persistence, relational, databases, Scala

Contents

1 Introduction
  1.1 Database management systems
    1.1.1 Relational databases
    1.1.2 Object-oriented databases
    1.1.3 NoSQL databases
  1.2 Object-relational mismatch
  1.3 Brief history of Java persistence solutions
    1.3.1 JDBC
    1.3.2 EJB 2.x entity beans
    1.3.3 JDO
    1.3.4 myBatis
  1.4 JPA
  1.5 Goals of the thesis
2 Comparison of JPA providers
  2.1 Methodology of the comparison
  2.2 Identifier generation
  2.3 Performance
    2.3.1 Batch inserts
    2.3.2 Searching by ID
    2.3.3 Basic JPA QL test
    2.3.4 Basic criteria API test
    2.3.5 Aggregate function
    2.3.6 Performance summary
  2.4 Type conversion
  2.5 Caching support
  2.6 Entity lifecycle and transactional events
  2.7 Schema generation
  2.8 Support for stored procedures
  2.9 Integrating with other frameworks
  2.10 Licenses
  2.11 Documentation quality
  2.12 Build systems
  2.13 Summary
3 Experimental migration of JPA applications
  3.1 Migrating from Hibernate
  3.2 Migrating from OpenJPA
  3.3 Migrating from EclipseLink
  3.4 Migration summary
4 Automatic migration tool
  4.1 The application architecture
  4.2 Java source files parsing
  4.3 Ideas for a further development
5 Conclusion
A Generated database schemas
  A.1 Hibernate
  A.2 OpenJPA
  A.3 EclipseLink

Listings

1.1 Sample of JDBC code
2.1 DDL defining sample database schema
2.2 Sample stored procedure
4.1 Recursively searching the abstract syntax tree for vendor-specific annotations using Scala pattern matching
A.1 Hibernate-generated sample database schema
A.2 OpenJPA-generated sample database schema
A.3 EclipseLink-generated sample database schema

List of Figures

2.1 ER diagram of sample database schema
4.1 Class diagram of the migration application

List of Tables

2.1 Batch inserts on PostgreSQL test results
2.2 Batch inserts on MySQL test results
2.3 Find by ID test results
2.4 Fetch all users using JPA QL test results
2.5 Fetch all users using criteria API test results
2.6 Complex join using JPA QL test results
2.7 Complex join using criteria API test results
2.8 Feature matrix

1 Introduction

Every application, except the most basic ones, has to deal with data. The very first computers were designed as black boxes receiving input, doing some calculations and producing output. Since then, computers have become much more complicated and nowadays they do much more than such simple data processing. Nevertheless, they still operate on data stored on some kind of permanent storage device, such as a hard drive.

Input data for an application could be saved, without much thinking, into an ordinary text file. However, such files are next to impossible to process by machine because they do not follow any rules that would describe their structure. For this reason, a variety of rules that the data have to follow are often introduced (e.g. the structure is described by XML with an appropriate XML Schema definition).

Even if the data are in an easily computer-readable form, the biggest problem with this “file-based” approach remains. It is still just a text file and, therefore, data access is limited by the I/O operations of the operating system. The demands of current enterprise applications, however, go far beyond the possibilities of such file-based persistence. We require reliability, transaction management, high-performance concurrent access, advanced user access control and much more. To support all of these advanced features, database management systems were invented.
1.1 Database management systems

A database management system (DBMS), as defined in [1], is software designed to assist in maintaining and utilizing large collections of data. Each DBMS has its model, which describes data, data relationships, semantics and consistency constraints [2]. The model is the theoretical foundation upon which a database management system operates.

Over the last few decades, several database models have been invented. In the 1960s, IBM introduced its database management system IMS, which internally uses the hierarchical database model. The hierarchical model stores data in records, which are connected with each other through links, creating tree-like structures [2]. An evolution of the hierarchical model is the network model, which allows records to be connected in arbitrary graphs and thus makes data modelling more flexible (e.g. it allows many-to-many relationships between records). Even though hierarchical and network databases exist and are still in use¹, the models have many flaws (further discussed in [4]), which make their usage in certain scenarios particularly complicated.

1.1.1 Relational databases

In 1970, E. F. Codd published a revolutionary paper [5], in which he laid out the concept of the relational data model, the theoretical foundation of relational databases. Thanks to its flexibility², simplicity and strong but simple formal background (which allows mathematical reasoning about data), its popularity grew rapidly. Many commercial and open-source implementations exist; they are very mature and industry-proven, and the relational model itself is very well understood and documented. For these reasons, relational databases are effectively an industry standard, and knowledge of them is essential for every programmer.
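To make the remark about the formal background of the relational model more concrete, the following minimal sketch treats a relation as a set of tuples and implements two basic relational operations, selection and projection. It is an illustration only and is not taken from the thesis: the class `RelationSketch`, its helper methods and the sample data are all hypothetical, and a real DBMS stores and evaluates relations very differently.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical illustration: a relation modelled as a set of attribute-to-value tuples. */
public class RelationSketch {

    /** Selection: keeps only the tuples that satisfy the given condition. */
    public static Set<Map<String, Object>> select(Set<Map<String, Object>> relation,
                                                  Predicate<Map<String, Object>> condition) {
        Set<Map<String, Object>> result = new LinkedHashSet<>();
        for (Map<String, Object> tuple : relation) {
            if (condition.test(tuple)) {
                result.add(tuple);
            }
        }
        return result;
    }

    /** Projection: keeps only the named attributes; equal tuples collapse because the result is a set. */
    public static Set<Map<String, Object>> project(Set<Map<String, Object>> relation,
                                                   String... attributes) {
        Set<Map<String, Object>> result = new LinkedHashSet<>();
        for (Map<String, Object> tuple : relation) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String attribute : attributes) {
                projected.put(attribute, tuple.get(attribute));
            }
            result.add(projected);
        }
        return result;
    }

    /** Builds one tuple of the hypothetical "users" relation. */
    public static Map<String, Object> tuple(String name, String city, int age) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("name", name);
        t.put("city", city);
        t.put("age", age);
        return t;
    }

    /** Made-up sample data used only for this illustration. */
    public static Set<Map<String, Object>> sampleUsers() {
        Set<Map<String, Object>> users = new LinkedHashSet<>();
        users.add(tuple("Alice", "Brno", 30));
        users.add(tuple("Bob", "Brno", 25));
        users.add(tuple("Carol", "Praha", 41));
        return users;
    }

    public static void main(String[] args) {
        Set<Map<String, Object>> users = sampleUsers();
        // Names of all users living in Brno: a selection followed by a projection.
        System.out.println(project(select(users, t -> "Brno".equals(t.get("city"))), "name"));
        // Distinct cities: projecting onto "city" removes the duplicate value.
        System.out.println(project(users, "city"));
    }
}
```

Because both operations take a relation and return a relation, they can be composed freely, which is the property that makes algebraic reasoning about queries possible. The caller never sees how the tuples are physically stored, only the set of tuples itself.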
¹ Probably the best known hierarchical database is the Windows System Registry [3].
² By “flexibility” I mean the ability of the relational model to hide its internal data representation. Clients thus need no knowledge of how data are physically stored and, therefore, are not affected when the server implementation changes.

1.1.2 Object-oriented databases

In the last decade, under the influence of object-oriented programming, the concepts of object-oriented (OODBMS) and object-relational (ORDBMS) database management systems have emerged. OODBMS allow object graphs to be stored in the database directly and are very often integrated with the programming language itself. Thus, they provide a homogeneous environment and remove the need for various transformations when data are passed back and forth between the application and the data layer. Even though object-oriented databases have undeniable benefits, their popularity is not very high. This is not only because of the enormous amounts of data already stored in relational databases (the migration of which would not be cost-free), but also because of technical issues they still face and which remain unresolved³. Moreover, vendors of relational databases are integrating various object-oriented features into their products, making the need for pure object-oriented databases less urgent.

1.1.3 NoSQL databases

Recently, with the rise of interest in cloud computing, a new category of databases has emerged, the so-called NoSQL⁴ databases. NoSQL is neither a specific database model nor an evolution of relational or object-oriented databases; rather, it is a group of database products suited to specific scenarios, often those where other solutions fail. They often offer only a subset of the features of relational databases, but they are superior in certain