Apache Derby Performance


Olav Sandstå, Dyre Tjeldvoll, Knut Anders Hatlen
Database Technology Group, Sun Microsystems

Overview
• Derby Architecture
• Performance Evaluation of Derby
• Performance Tips
• Comparing Derby, MySQL and PostgreSQL

Derby Architecture: Embedded
[Diagram: the application and Derby run in the same Java Virtual Machine. The application calls JDBC; the SQL layer handles parsing of SQL, compiling, optimization, byte-code generation and execution; the access layer provides access paths and indexes; the storage layer manages the database buffer and the log and data files.]
• Asynchronous (checkpoint) write of data
• Synchronous write of log records
• Synchronous read of data

Derby Architecture: Client-Server
[Diagram: the application runs in its own Java Virtual Machine and uses the JDBC client driver to talk to the Network Server, a separate Java Virtual Machine containing the JDBC, SQL, access and storage layers, the database buffer, the log and the data files.]

Performance Evaluation of Derby
Overview:
• Evaluation of disk and file system configurations:
  > Disk write cache
  > Database and transaction log on separate disks
• Database configurations:
  > Size of database buffer
• Comparing Embedded and Client-Server

What Is “Performance”?
• How to measure database performance?
  > Throughput
  > Response time
  > Scalability
• Which configuration?
  > Out-of-the-box configuration
  > Carefully tuned
• How to compare database systems with different properties and different tuning possibilities?

Test Configuration
Load clients:
1. TPC-B like load:
  > 3 UPDATEs
  > 1 SELECT
  > 1 INSERT
2. Single-record SELECT:
  > Select of one record on the primary key
Test platform:
• Sun JVM 1.5.0
• Solaris 10 x86
• 2 x 2.4 GHz AMD Opteron
• 2 GB RAM
• 2 SCSI disks
• UFS file system

Effect of Write Cache on Disk
[Chart: TPC-B like load, throughput with and without the disk write cache.]
WARNING: The write cache reduces the probability of successful recovery after a power failure.

Separate Data and Log Devices
Log device:
  > Sequential write of the transaction log
  > Synchronous as part of commit
  > Group commit
Data device:
  > Data in the database buffer is regularly written to disk as part of checkpoints
  > Data is read from disk on demand
Tip: use separate disks for the data and log devices.

Database Buffer
• Cache of frequently used data pages in memory
• A cache miss leads to a read from disk (or from the disk cache)
• Size:
  > default 4 MB
  > derby.storage.pageCacheSize
[Chart: example with a single-record SELECT load.]
Performance tip:
• Increase the size of the database buffer to keep frequently accessed data in memory.

Comparing Embedded and Client-Server
• TPC-B like load:
[Chart: throughput, embedded vs. client-server.]

Comparing Embedded and Client-Server
• Single-record SELECT:
[Chart: throughput, embedded vs. client-server.]

Comparing Embedded and Client-Server: CPU Usage
[Chart: user and system CPU usage per transaction (ms) for TPC-B and single-record SELECT, embedded vs. client-server.]
• The Network Server adds 30%-50% to the CPU usage
• System CPU usage:
  > Message sending and receiving
• User CPU usage:
  > Message parsing
  > Character set conversions

Performance Tips

Performance Tips 1: Use Prepared Statements
[Chart: user and system CPU usage per transaction (ms), PreparedStatement vs. Statement.]
• Compilation of SQL statements is expensive:
  > Derby generates Java byte code and loads the generated classes
• Prepared statements eliminate this cost
Tip:
• USE prepared statements
• and REUSE them (see the sketch below)
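
A minimal JDBC sketch of this tip follows. It assumes derby.jar is on the classpath; the database name sampledb and the ACCOUNTS table are invented for the example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class PreparedStatementTip {
        public static void main(String[] args) throws Exception {
            // Embedded Derby database; "sampledb" is created on first use.
            Connection conn = DriverManager.getConnection("jdbc:derby:sampledb;create=true");

            // Example table; this fails if it already exists from an earlier run.
            try (Statement ddl = conn.createStatement()) {
                ddl.executeUpdate("CREATE TABLE accounts (id INT PRIMARY KEY, balance INT)");
            }

            // Compile the INSERT once: Derby generates and loads byte code here ...
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO accounts (id, balance) VALUES (?, ?)")) {
                // ... then reuse the compiled statement, changing only the parameters.
                for (int id = 0; id < 1000; id++) {
                    insert.setInt(1, id);
                    insert.setInt(2, 100);
                    insert.executeUpdate();
                }
            }

            conn.close();
        }
    }

Preparing the statement once and reusing it means Derby compiles the SQL, generates byte code and loads the generated class a single time instead of once per execution.
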
Performance Tips 2: Avoid Table Scans
• Use indexes to optimize much-used access paths:
  > CREATE INDEX ...
• Check the query plan:
  > derby.language.logQueryPlan=true
• Use RunTimeStatistics:
  > SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)
  > SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()
Tip:
• Use indexes
• Use Derby's tools to understand the query execution (see the sketch below)
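
As a sketch of these tips, the snippet below creates an index and inspects the chosen access path through runtime statistics. It reuses the sampledb database and ACCOUNTS table from the previous sketch; the index name and query are made up, and derby.language.logQueryPlan is set as a system property here, though it can equally go in derby.properties.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IndexAndQueryPlanTip {
        public static void main(String[] args) throws Exception {
            // Log query plans to derby.log; must be set before the engine boots.
            System.setProperty("derby.language.logQueryPlan", "true");

            Connection conn = DriverManager.getConnection("jdbc:derby:sampledb");

            try (Statement stmt = conn.createStatement()) {
                // Index for a much-used access path, so lookups can avoid a table scan.
                stmt.executeUpdate("CREATE INDEX accounts_balance_idx ON accounts(balance)");

                // Collect runtime statistics for the statements that follow.
                stmt.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");

                try (ResultSet rs = stmt.executeQuery(
                        "SELECT id FROM accounts WHERE balance > 500")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt(1));
                    }
                }

                // Print the statistics, including the access path, for the last statement.
                try (ResultSet stats = stmt.executeQuery(
                        "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()")) {
                    while (stats.next()) {
                        System.out.println(stats.getString(1));
                    }
                }
            }

            conn.close();
        }
    }

The statistics text reports the access path the optimizer chose, which is how you verify that a query really uses an index instead of scanning the table.
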
Comparing the Performance of MySQL, PostgreSQL and Derby

Performance Evaluation: MySQL, PostgreSQL and Derby
Evaluated performance of:
• MySQL/InnoDB (version 5.0.10)
• PostgreSQL (version 8.0.3)
• Derby Embedded (version 10.1.1.0)
• Derby Client-Server

Database Configurations
Configurations:
• “Out-of-box” performance
• No tuning, except:
  > size of database buffer
  > database and transaction log on separate disks
➔ Not a benchmark
Databases:
1. Main-memory database:
  > 10 MB user data
  > 64 MB database buffer
2. Disk database:
  > 10 GB user data
  > 64 MB database buffer
Load:
  > 1-100 concurrent clients

Throughput: TPC-B like load
[Charts: throughput for the main-memory database (10 MB) and the disk-based database (10 GB).]

Throughput: Single-record SELECT
[Charts: throughput for the main-memory database (10 MB) and the disk-based database (10 GB).]

Observations
• Derby outperforms MySQL on disk-based databases
  > Derby has 100% higher throughput than MySQL
• MySQL performs better on small main-memory databases
  > Update-intensive load: Derby has 20-50% lower throughput
  > Read-intensive load: Derby has 50% lower throughput
• PostgreSQL performs best on read-only databases, and has the lowest throughput on update-intensive databases
Why?

CPU Usage: Main-Memory Database
[Charts: user and system CPU time per transaction (ms) under the TPC-B and single-record SELECT loads for Derby embedded, Derby client/server, MySQL and PostgreSQL.]

CPU Usage: Disk-Based Database
[Charts: user and system CPU time per transaction (ms) under the TPC-B and single-record SELECT loads for Derby embedded, Derby client/server, MySQL and PostgreSQL.]

Network I/O
[Charts: incoming and outgoing packets per transaction under the TPC-B and single-record SELECT loads for Derby client/server, MySQL and PostgreSQL.]
Observation:
• Derby has two extra round-trips for SELECT operations

Disk I/O: Main-Memory Database (TPC-B)
[Charts: write operations per transaction and write volume (KB per transaction), split into data and log, for Derby, MySQL and PostgreSQL.]

Disk I/O: Disk-Based Database (TPC-B)
[Charts: I/O operations per transaction and data volume (KB per transaction), split into log, data writes and data reads, for Derby, MySQL and PostgreSQL.]

Disk I/O: Disk-Based Database (SELECT)
[Charts: I/O operations per transaction and data volume (KB per transaction), split into data writes and data reads, for Derby, MySQL and PostgreSQL.]

Conclusions: Resource Usage
• MySQL performs better than Derby when:
  > The database is small and fits in the database buffer
  > Throughput becomes CPU-bound
  > Derby uses more CPU and sends more messages over the network
• Derby performs better than MySQL when:
  > The database is large and does not fit in the database buffer
  > Throughput becomes I/O-bound
• PostgreSQL performs best with a read-only load
  > An update-intensive load results in much disk I/O

Performance Improvement Activities
CPU usage:
• High CPU usage during initialization of the database buffer (DONE)
• Reduce CPU usage in the network server (message parsing and generation)
• Improve use of hash tables and reduce lock contention
Network I/O:
• Reduce the number of messages sent and received
Disk I/O:
• Improve fairness of scheduling of I/O requests (DONE)
• Allow concurrent read operations on data files
Performance tip:
• The next version of Derby will have even better performance

Summary
• Disk write cache:
  > Improves performance, but...
  > WARNING: durability and the probability of successful recovery after a power failure are reduced
• Data and log devices:
  > Keep the transaction log and the database on separate devices
• Database buffer:
  > Keep frequently accessed data in the database buffer
• Client-Server versus Embedded:
  > Throughput: down by 5% (TPC-B) to 30% (SELECT)
• Use:
  > Prepared statements and indexes

Questions?
Olav Sandstå, Dyre Tjeldvoll, Knut Anders Hatlen
[email protected], [email protected], [email protected]
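
As a closing illustration of the buffer and log-device tips in the Summary, here is a hedged configuration sketch. The cache size, paths, database name tunedb and port are example values; it assumes derby.jar on the classpath and, for the last connection, a Network Server already started in the same directory on the default port 1527 with derbyclient.jar available.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ConfigurationSketch {
        public static void main(String[] args) throws Exception {
            // The database buffer is sized in pages, not bytes; the default of
            // 1000 pages is about 4 MB with the default 4 KB page size.
            // Must be set before the Derby engine boots.
            System.setProperty("derby.storage.pageCacheSize", "10000");

            // Create the database with its transaction log on a separate disk.
            // "/disk2/derbylog" is an example path; logDevice is honored when the
            // database is created.
            Connection embedded = DriverManager.getConnection(
                    "jdbc:derby:tunedb;create=true;logDevice=/disk2/derbylog");
            embedded.close();

            // The same database accessed in client/server mode through the Network Server.
            Connection client = DriverManager.getConnection(
                    "jdbc:derby://localhost:1527/tunedb");
            client.close();
        }
    }

There is no programmatic switch for the remaining tip: whether the disk write cache is enabled is an operating-system or controller setting, and as the Summary warns, enabling it trades durability for speed.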