VoltDB (NewSQL)
Sunnie Chung, CIS 612

VoltDB

- VoltDB is an ACID-compliant relational database management system that uses main memory as its storage to maximize performance.
- VoltDB also uses a shared-nothing architecture, in which each node is independent and self-sufficient. This architecture is what gives a relational DBMS such as VoltDB its scalability.

Architecture

- VoltDB is a NewSQL relational database system. NewSQL is a class of modern database management systems that seek to provide the same scalable performance as NoSQL systems while still maintaining the ACID guarantees of traditional database systems. Its main architectural elements are:
  - Automatic partitioning across a shared-nothing server cluster
  - Main-memory data architecture
  - Elimination of multi-threading and locking overhead
  - Automatic replication and command logging
  - Stored procedure interface for transactions

Features

- VoltDB uses in-memory storage to maximize throughput, avoiding costly disk access.
- Further performance gains come from serializing all data access, which avoids many of the time-consuming functions of traditional databases, such as locking, latching, and maintaining transaction logs.
- Scalability, reliability, and high availability are achieved through clustering and replication across multiple servers and server farms.
- Scaling is transparent to applications and can be done in two dimensions: up (by increasing the capacity of existing database nodes) and out (by increasing the number of nodes in the cluster).

ACID in VoltDB

- VoltDB is a fully ACID-compliant transactional database, which relieves application developers from writing their own code to perform transactions and manage rollbacks.
- It guarantees that every transaction either completes in full or has no effect, so the data always remain consistent. ACID is ensured as follows:
  - Data is organized into in-memory partitions.
  - Clients connect to the database and send transactions.
  - Incoming transactions are routed to the partition that holds their data and are executed serially.
  - Each stored procedure is defined as a transaction: the stored procedure either succeeds or rolls back as a whole, which ensures database consistency.
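The all-or-nothing behavior of a stored procedure can be illustrated with a small in-memory model. This is a hypothetical sketch, not how VoltDB is implemented: the `AtomicProcedureDemo` class, its undo-log approach, and the sample `towns` map are assumptions for illustration. Each write records how to reverse itself; if the procedure throws, the recorded actions are replayed in reverse order, so the table is left exactly as it was before the call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not VoltDB code): a procedure's writes either all take effect or none do.
class AtomicProcedureDemo {
    // In-memory "table": town name -> state abbreviation (illustrative data)
    static final Map<String, String> towns = new HashMap<>();

    // A procedure body that writes through a transaction context
    interface Procedure {
        void run(Tx tx) throws Exception;
    }

    // Transaction context: records an undo action for every write it performs
    static class Tx {
        private final Deque<Runnable> undo = new ArrayDeque<>();

        void put(String key, String value) {
            boolean existed = towns.containsKey(key);
            String old = towns.put(key, value);
            // Remember how to reverse this write: restore the old value, or remove the new row
            Runnable reverse = existed ? () -> towns.put(key, old) : () -> towns.remove(key);
            undo.push(reverse);
        }

        void rollback() {
            while (!undo.isEmpty()) {
                undo.pop().run();   // most recent write is undone first
            }
        }
    }

    // Runs a procedure as one transaction: commit on success, roll back on any exception
    static boolean execute(Procedure proc) {
        Tx tx = new Tx();
        try {
            proc.run(tx);
            return true;    // committed: the undo log is simply discarded
        } catch (Exception e) {
            tx.rollback();
            return false;   // rolled back: the table is unchanged
        }
    }

    public static void main(String[] args) {
        execute(tx -> tx.put("Billerica", "MA"));
        boolean ok = execute(tx -> {
            tx.put("Buffalo", "NY");
            tx.put("Billerica", "XX");
            throw new IllegalStateException("simulated failure");
        });
        System.out.println(ok + " " + towns);   // prints false {Billerica=MA}
    }
}
```

Running `main` prints `false {Billerica=MA}`: the failed procedure's insert of Buffalo and its overwrite of Billerica were both undone.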

- Also, by using serialized (single-threaded) processing, VoltDB ensures transactional consistency without the overhead of locking, latching, and transaction logs. Multiple requests are handled at the same time through partitioning: each partition processes its own transactions serially, and different partitions work in parallel.
- Multi-partition transactions are slower than single-partition transactions, but integrity is maintained in both cases and throughput is maximized.

How VoltDB Works
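The partition-based concurrency described above can be modeled in plain Java, assuming a hash of the partitioning key selects the partition and each partition is served by its own single-threaded executor. This is a conceptual sketch only: `PartitionedStore`, `partitionFor`, and the use of `String.hashCode()` are hypothetical and do not reflect VoltDB's actual hashing or internals. Because each partition runs one task at a time, its state needs no locks, while different partitions still proceed in parallel; a read that must visit every partition plays the role of a slower multi-partition transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual model (not VoltDB code): one single-threaded executor per partition, no locks.
class PartitionedStore {
    private final ExecutorService[] sites;   // one serial executor per partition
    private final long[] rowCounts;          // each slot is touched only by its own partition's thread

    PartitionedStore(int partitions) {
        sites = new ExecutorService[partitions];
        rowCounts = new long[partitions];
        for (int i = 0; i < partitions; i++) {
            sites[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Hypothetical routing rule: hash the partitioning key onto a partition index
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), sites.length);
    }

    // Single-partition "insert": queued on the owning partition and run serially there
    Future<?> insert(String key) {
        int p = partitionFor(key);
        return sites[p].submit(() -> { rowCounts[p]++; });
    }

    // Multi-partition read: must visit every partition, so it costs more than a single-partition call
    long countAll() throws Exception {
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 0; i < sites.length; i++) {
            final int p = i;
            parts.add(sites[p].submit(() -> rowCounts[p]));
        }
        long total = 0;
        for (Future<Long> f : parts) {
            total += f.get();   // waits for each partition's read and safely returns its value
        }
        return total;
    }

    void shutdown() {
        for (ExecutorService site : sites) {
            site.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        PartitionedStore store = new PartitionedStore(4);
        List<Future<?>> pending = new ArrayList<>();
        for (String state : new String[] {"MA", "NY", "OH", "NY"}) {
            pending.add(store.insert(state));
        }
        for (Future<?> f : pending) {
            f.get();   // wait for the single-partition inserts to finish
        }
        System.out.println(store.countAll());   // prints 4
        store.shutdown();
    }
}
```

In this simplified model the per-partition reads inside `countAll()` are not coordinated with one another, whereas VoltDB coordinates a multi-partition transaction across all the partitions it touches.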

- Tested with VoltDB Enterprise Edition 4.2 on a Mac

- VoltDB is not like traditional database products, because there is no generic database. Instead, each VoltDB database is optimized for a specific application by compiling the schema, stored procedures, and partitioning information into a VoltDB application catalog.
- The catalog is then loaded onto one or more host machines to create a distributed database.

VoltDB example

Example: a schema is saved in a text file, towns.sql:

    CREATE TABLE towns (
        town VARCHAR(128),
        county VARCHAR(64),
        state VARCHAR(2)
    );

Compiling the Application Catalog:

    $ voltdb compile towns.sql
    ------
    Successfully created catalog.jar
    Includes schema: towns.sql

    [MP][WRITE] TOWNS.insert
      INSERT INTO TOWNS VALUES (?, ?, ?);
    ------
    Catalog contains 1 built-in CRUD procedures.
      Simple insert, update, delete and select procedures are created
      automatically for convenience.
    ------
    Full catalog report can be found at file:///Users/nqt289/Desktop/voltdb/catalog-report.html
      Or can be viewed at "http://localhost:8080" when the server is running.

VoltDB Command

- Or, to name the catalog (the default is catalog.jar):

    $ voltdb compile -o towns.jar towns.sql

Starting the Database:

    $ voltdb create towns.jar
    Initializing VoltDB...

    Build: 4.2 voltdb-4.2-0-gc9751d3-local Enterprise Edition
    Connecting to VoltDB cluster as the leader...
    Host id of this node is: 0
    Starting VoltDB with trial license. License expires on May 17, 2014.
    Initializing the database and command logs. This may take a moment...
    WARN: This is not a highly available cluster. K-Safety is set to 0.
    Server completed initialization.

- Check the report, schema, procedures, etc. at http://localhost:8080/

Command Line Interface

- VoltDB provides a SQL shell interpreter that lets users execute VoltDB SQL statements and stored procedures interactively, as well as non-interactively via scripts.
- This command line interface is accessed through sqlcmd:

    $ sqlcmd
    SQL Command :: localhost:21212
    1>

- Three key options at the sqlcmd prompt:
  - SQL queries: run ad hoc SQL queries
  - Procedure calls: execute stored procedures
  - Exit: end the interactive session

VoltDB Query/Syntax

- VoltDB supports a subset of ANSI-standard SQL-99, including CREATE INDEX, CREATE TABLE, and CREATE VIEW for schema definition, and SELECT, INSERT, UPDATE, and DELETE for data manipulation.

Insert statement:

    1> insert into towns values ('Billerica','Middlesex','MA');
    (1 row(s) affected)
    2> insert into towns values ('Buffalo','Erie','NY');
    (1 row(s) affected)
    3> insert into towns values ('Bay View','Erie','OH');
    (1 row(s) affected)

- Select statement:

    4> select count(*) as total from towns;
    TOTAL
    -----
        3

    (1 row(s) affected)
    5> select town, state from towns ORDER BY town;
    TOWN      STATE
    --------- -----
    Bay View  OH
    Billerica MA
    Buffalo   NY

    (3 row(s) affected)

- Exit:

    6> exit

VoltDB Input

- CSV and TXT files are the standard input files for loading data into a VoltDB database.
- VoltDB provides a simplified CSV loader through the shell script csvloader. Command:

    csvloader tableName < dataFile.csv
    csvloader tableName -f dataFile.csv

VoltDB Input Example:
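As a rough model of what a delimited-file loader does, the sketch below splits each line on a literal separator, skips a number of leading lines (like csvloader's `--skip` option), and sets aside rows whose column count does not match the table. It is a hypothetical illustration, not csvloader's implementation: `DelimitedLoader` and its `load` method are invented names, and csvloader's real handling of quoting, data types, and errors is more involved.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical loader (not csvloader code): split on a separator, skip leading lines, set aside malformed rows.
class DelimitedLoader {
    final List<String[]> inserted = new ArrayList<>();
    final List<String> invalid = new ArrayList<>();   // plays the role of an invalid-row file

    void load(BufferedReader in, char separator, int skip, int columns) throws IOException {
        String sep = Pattern.quote(String.valueOf(separator));   // match "|" literally, not as regex alternation
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            if (lineNo++ < skip) {
                continue;   // e.g. skip = 1 drops a header line
            }
            String[] fields = line.split(sep, -1);   // -1 keeps trailing empty fields
            if (fields.length == columns) {
                inserted.add(fields);
            } else {
                invalid.add(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "TOWN|COUNTY|STATE\nBillerica|Middlesex|MA\nBuffalo|Erie\n";
        DelimitedLoader loader = new DelimitedLoader();
        loader.load(new BufferedReader(new StringReader(data)), '|', 1, 3);
        // prints: 1 inserted, 1 invalid
        System.out.println(loader.inserted.size() + " inserted, " + loader.invalid.size() + " invalid");
    }
}
```

Here the header line is skipped, the Billerica row is accepted, and the two-field Buffalo row is set aside, much as csvloader reports rejected rows separately from inserted ones.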

- Create a database with two tables, towns and people, from a schema saved in towns.sql:

    $ voltdb compile -o towns.jar towns.sql
    $ voltdb create towns.jar

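- Prepare the input file: keep fields 2, 4-7, and 16 of the pipe-delimited source file, and drop lines that end in an empty field or contain an empty field in the middle:

    $ cut -d'|' -f2,4-7,16 POP_PLACES_20140401.txt | grep -v '|$' | grep -v '||' > towns.txt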
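For readers less familiar with `cut` and `grep`, the preparation step can be expressed as an equivalent Java filter. This sketch assumes only what the command itself states: the input is `|`-delimited, `cut` keeps fields 2, 4 through 7, and 16 (counting from 1), and the two `grep -v` filters drop any line whose last kept field is empty (`'|$'`) or that has an empty field in the middle (`'||'`). The `PrepareTowns` class and its methods are hypothetical names, and the `f1`...`f16` records are placeholders, not real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical Java equivalent of: cut -d'|' -f2,4-7,16 FILE | grep -v '|$' | grep -v '||'
class PrepareTowns {
    // 1-based field numbers selected by -f2,4-7,16
    private static final int[] FIELDS = {2, 4, 5, 6, 7, 16};

    // Like cut: join the selected fields with '|'; lines without '|' pass through unchanged
    static String cut(String line) {
        if (line.indexOf('|') < 0) {
            return line;
        }
        String[] parts = line.split(Pattern.quote("|"), -1);   // -1 keeps empty trailing fields
        List<String> kept = new ArrayList<>();
        for (int f : FIELDS) {
            if (f <= parts.length) {
                kept.add(parts[f - 1]);   // fields beyond the end of the line are simply absent
            }
        }
        return String.join("|", kept);
    }

    // Like the two grep -v filters: reject an empty last field or an empty interior field
    static boolean keep(String cutLine) {
        return !cutLine.endsWith("|") && !cutLine.contains("||");
    }

    static List<String> prepare(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String c = cut(line);
            if (keep(c)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder records with 16 fields; the second has an empty field 16
        String full = "f1|f2|f3|f4|f5|f6|f7|f8|f9|f10|f11|f12|f13|f14|f15|f16";
        String emptyLast = "f1|f2|f3|f4|f5|f6|f7|f8|f9|f10|f11|f12|f13|f14|f15|";
        System.out.println(prepare(Arrays.asList(full, emptyLast)));   // prints [f2|f4|f5|f6|f7|f16]
    }
}
```

Note that, exactly like the shell pipeline, a line whose first kept field is empty (output starting with `|`) is not filtered out.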

- Loading the data:

    $ csvloader --separator "|" --skip 1 --file towns.txt towns
    Read 194465 rows from file and successfully inserted 194465 rows (final)
    Elapsed time: 4.989 seconds
    Invalid row file: /Users/nqt289/Desktop/voltdb/csvloader_TOWNS_insert_invalidrows.csv
    Log file: /Users/nqt289/Desktop/voltdb/csvloader_TOWNS_insert_log.log
    Report file: /Users/nqt289/Desktop/voltdb/csvloader_TOWNS_insert_report.log

Querying the Database

    1> SELECT town,state,elevation from towns order by elevation desc limit 5;
    TOWN                      STATE ELEVATION
    ------------------------- ----- ---------
    Corona (historical)       CO         3573
    Quartzville (historical)  CO         3527
    Logtown (historical)      CO         3524
    Tomboy (historical)       CO         3508
    Rexford (historical)      CO         3484

    (5 row(s) affected)

    2> select town, count(town) as duplicates from towns
    3> group by town order by duplicates desc limit 5;
    TOWN        DUPLICATES
    ----------- ----------
    Midway             214
    Fairview           211
    Oak Grove          167
    Five Points        150
    Riverside          130

    (5 row(s) affected)

- Load another file, people.txt:

    $ csvloader --file people.txt --skip 1 people
    Read 3143 rows from file and successfully inserted 1802 rows (final)
    Elapsed time: 0.467 seconds

- Check the "people" table:

    1> select * from people order by population desc limit 5;
    STATE_NUM COUNTY_NUM STATE       COUNTY              POPULATION
    --------- ---------- ----------- ------------------- ----------
            6         37 California  Los Angeles County     9818605
           17         31 Illinois    Cook County            5194675
            4         13 Arizona     Maricopa County        3817117
            6         73 California  San Diego County       3095313
            6         59 California  Orange County          3010232

    (5 row(s) affected)

- Join the two tables:

    2> select top 5 min(t.elevation) as height,
    3> t.state,t.county, max(p.population)
    4> from towns as t, people as p
    5> where t.state_num=p.state_num and t.county_num=p.county_num
    6> group by t.state, t.county order by height desc;
    HEIGHT STATE COUNTY       C4
    ------ ----- -------- ------
      2754 CO    Lake       7310
      2640 CO    Hinsdale    843
      2609 CO    Mineral     712
      2523 CO    San Juan    699
      2452 CO    Summit    27994

    (5 row(s) affected)

Save and Recover

- Because VoltDB uses memory as its operational storage, it provides a tool to save database snapshots.
- Snapshots are a complete disk-based representation of a VoltDB database, including everything needed to reproduce the database after a shutdown.
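The idea behind a snapshot, a complete on-disk copy of in-memory tables that can be read back after a restart, can be sketched with Java's built-in object serialization. This only illustrates the concept under simple assumptions (one table held as a `Map`, Java object streams as the file format, a temporary file as the snapshot location); it is not VoltDB's snapshot format, and `SnapshotDemo`, `save`, and `restore` are hypothetical names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the snapshot concept; not VoltDB's on-disk format.
class SnapshotDemo {
    // Write the whole in-memory table to one file
    static void save(Map<String, Long> table, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new HashMap<>(table));   // copy into a HashMap, which is Serializable
        }
    }

    // Read the table back from the file, e.g. after a restart
    @SuppressWarnings("unchecked")
    static Map<String, Long> restore(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, Long>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> people = new HashMap<>();
        people.put("Loving County", 82L);
        Path file = Files.createTempFile("townsandpeople", ".snapshot");
        save(people, file);
        people.clear();                                  // simulate losing memory contents on shutdown
        Map<String, Long> recovered = restore(file);
        System.out.println(recovered);                   // prints {Loving County=82}
        Files.delete(file);
    }
}
```

The table is cleared to simulate a shutdown and then recovered intact from the file; a real database snapshot must also capture schema and other metadata, as the slide notes.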

- Save:

    $ voltadmin save /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "townsandpeople"

    -- Snapshot Save Results --

    HOST_ID HOSTNAME                 TABLE  RESULT  ERR_MSG
    ------- ------------------------ ------ ------- -------
          0 Thuats-MacBook-Pro.local PEOPLE SUCCESS
          0 Thuats-MacBook-Pro.local STATES SUCCESS
          0 Thuats-MacBook-Pro.local TOWNS  SUCCESS
          0 Thuats-MacBook-Pro.local        SUCCESS

- Recover:

    $ voltdb recover
    Initializing VoltDB...

    Build: 4.2 voltdb-4.2-0-gc9751d3-local Enterprise Edition
    Connecting to VoltDB cluster as the leader...
    Host id of this node is: 0
    Starting VoltDB with trial license. License expires on May 17, 2014.
    Initializing the database and command logs. This may take a moment...
    WARN: This is not a highly available cluster. K-Safety is set to 0.
    Restoring from path: voltdbroot/snapshots with nonce: townsandpeople
    Finished restore of voltdbroot/snapshots with nonce: townsandpeople in 0.87 seconds
    Server completed initialization.

- Tables can be added or dropped, and stored procedures changed, while the database is running. A new catalog is compiled, and the data can then be recovered.
- When updating the schema, the deployment.xml file is required to specify configuration settings such as the number of servers and the number of partitions.

    $ voltadmin update towns.jar voltdbroot/deployment.xml

Stored Procedure

- When a stored procedure is added to the schema, a snapshot of the running database must be saved before the update, and a new catalog is then compiled.

    $ voltadmin save /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "states"
    $ voltadmin restore /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "states"

    CREATE PROCEDURE leastpopulated AS
        SELECT TOP 1 county, abbreviation, population
        FROM people, states
        WHERE people.state_num=?
          AND people.state_num=states.state_num
        ORDER BY population ASC;

Test procedure:

    1> exec leastpopulated 6;
    COUNTY         ABBREVIATION POPULATION
    -------------- ------------ ----------
    Alpine County  CA                 1175

    (1 row(s) affected)

    2> exec leastpopulated 48;
    COUNTY         ABBREVIATION POPULATION
    -------------- ------------ ----------
    Loving County  TX                   82

    (1 row(s) affected)

Stored Procedure

- Stored procedures can also be written in Java, which is a more convenient way to work with complex procedures.
- Example: a stored procedure, UpdatePeople.java, that uses SELECT to check each record before either inserting a new one or updating the existing one (see the VoltDB example code).
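The select-then-insert-or-update logic can be illustrated without a database. In this hypothetical sketch a `HashMap` stands in for the people table, keyed on a (state_num, county_num) pair; the `UpsertSketch` class, the `CountyKey` type, and the sample values are illustrative only, and VoltDB's Java stored procedure API is not shown. In VoltDB itself, the whole procedure runs as one serialized transaction, so no other transaction can insert the same key between the SELECT and the INSERT.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of an upsert (not VoltDB code): look the row up first, then INSERT or UPDATE.
class UpsertSketch {
    // Illustrative composite key standing in for (STATE_NUM, COUNTY_NUM)
    static final class CountyKey {
        final int stateNum;
        final int countyNum;

        CountyKey(int stateNum, int countyNum) {
            this.stateNum = stateNum;
            this.countyNum = countyNum;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CountyKey)) {
                return false;
            }
            CountyKey k = (CountyKey) o;
            return stateNum == k.stateNum && countyNum == k.countyNum;
        }

        @Override
        public int hashCode() {
            return Objects.hash(stateNum, countyNum);
        }
    }

    // In-memory stand-in for the people table: key -> population
    static final Map<CountyKey, Long> people = new HashMap<>();

    // Returns which branch ran, "INSERT" or "UPDATE"
    static String upsert(int stateNum, int countyNum, long population) {
        CountyKey key = new CountyKey(stateNum, countyNum);
        if (!people.containsKey(key)) {     // the SELECT step: does the row exist?
            people.put(key, population);    // no: INSERT a new row
            return "INSERT";
        }
        people.replace(key, population);    // yes: UPDATE the existing row
        return "UPDATE";
    }

    public static void main(String[] args) {
        // Illustrative key and population values
        System.out.println(upsert(48, 301, 82));                  // prints INSERT
        System.out.println(upsert(48, 301, 94));                  // prints UPDATE
        System.out.println(people.get(new CountyKey(48, 301)));   // prints 94
    }
}
```

Calling `upsert` twice with the same key takes the INSERT branch first and the UPDATE branch second, leaving a single row holding the latest population.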

- Compile the Java file against the VoltDB jar files:

    $ javac -cp "/Users/nqt289/voltdb-ent-4.2/voltdb/*" UpdatePeople.java

    $ voltadmin save /Users/nqt289/Desktop/voltdb/voltdbroot/snapshots/ "tutorial5"
    $ voltdb compile --classpath=./ -o towns.jar towns.sql
    $ voltadmin update towns.jar deployment.xml
    INFO: The catalog update succeeded.

- Check the two counties with the smallest population:

    1> SELECT TOP 2 county, abbreviation, population
    2> FROM people,states WHERE people.state_num=states.state_num
    3> ORDER BY population ASC;
    COUNTY         ABBREVIATION POPULATION
    -------------- ------------ ----------
    Loving County  TX                   82
    Kalawao County HI                   90

    (2 row(s) affected)

- Check the two counties with the smallest population again:

    1> SELECT TOP 2 county, abbreviation, population
    2> FROM people,states WHERE people.state_num=states.state_num
    3> ORDER BY population ASC;
    COUNTY         ABBREVIATION POPULATION
    -------------- ------------ ----------
    Kalawao County HI                   90
    Loving County  TX                   94

    (2 row(s) affected)

Client Applications

- VoltDB provides client libraries in several programming languages (Java, Python, C++, etc.), all following the same process:
  - Create a client connection to the database:

    import org.voltdb.client.Client;
    import org.voltdb.client.ClientConfig;
    import org.voltdb.client.ClientFactory;

    Client client = null;
    ClientConfig config = null;
    try {
        // Username and password for the connection
        config = new ClientConfig("advent", "xyzzy");
        client = ClientFactory.createClient(config);
        // 21212 is the default client port (21211 is the default admin port)
        client.createConnection("myserver.xyz.net", 21212);
    } catch (java.io.IOException e) {
        e.printStackTrace();
        System.exit(-1);
    }

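- Interacting with the database:
  - Calling a procedure and processing the results:

    // Asynchronous call: MyCallback is an application-defined class implementing
    // org.voltdb.client.ProcedureCallback; its clientCallback() runs when the response arrives
    try {
        client.callProcedure(new MyCallback(), "NewCustomer", firstname, lastname, custID);
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }

  - Closing the connection:

    try {
        client.drain();   // wait until all outstanding procedure calls have received responses
        client.close();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }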