THE NOSQL MOVEMENT (2)

GENOVEVA VARGAS SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
[email protected]
http://www.vargas-solar.com/bigdata-managment

THE NOSQL FAMILY

[Figure: the NoSQL family, with branches for graph, document and key-value stores]

• NoSQL covers document databases, key-value stores and graph databases

GRAPH

• Use graph structures with nodes, edges, and properties to represent and store data
• Nodes are similar in nature to the objects that object-oriented programmers are familiar with
• Properties are pertinent information that relates to nodes
• Edges are the lines that connect nodes to nodes or nodes to properties; they represent the relationship between the two
• By definition, a graph database is any storage system that provides index-free adjacency
• Every element contains a direct pointer to its adjacent element
• No index lookups are necessary
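As a rough illustration of index-free adjacency, the Python sketch below keeps each node's outgoing edges directly on the node, so a traversal simply follows references and never consults a global index. The names `Node`, `Edge`, `connect` and `traverse`, as well as the sample data, are hypothetical; this is not the API of any particular graph database.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    properties: dict
    edges: list = field(default_factory=list)  # direct references to adjacent elements


@dataclass(eq=False)
class Edge:
    label: str
    target: Node
    properties: dict = field(default_factory=dict)


def connect(source, label, target, **properties):
    """Attach a directed, labelled edge to the source node."""
    source.edges.append(Edge(label, target, properties))


def traverse(start, label):
    """Breadth-first traversal that follows only edges with the given label."""
    seen, queue, order = {id(start)}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for edge in node.edges:  # no index lookup: the node already holds its neighbours
            if edge.label == label and id(edge.target) not in seen:
                seen.add(id(edge.target))
                queue.append(edge.target)
    return order


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
carol = Node({"name": "Carol"})
connect(alice, "knows", bob, since=2010)
connect(bob, "knows", carol, since=2012)
connect(alice, "works_with", carol)

reachable = [n.properties["name"] for n in traverse(alice, "knows")]
print(reachable)
```

Edges carry both a label and properties here, which is the shape usually meant by a property graph; restricting the traversal to one label is one simple way to walk a multi-relational graph.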

Figures on the following graph slides: Takahiro Inoue, MongoDB leader, slideshare

GRAPH

UNDIRECTED GRAPH

DIRECTED GRAPH

MIXED GRAPH, MULTIGRAPH

SINGLE RELATIONAL GRAPH

MULTI RELATIONAL GRAPH

PROPERTY GRAPH

PROPERTY GRAPH: SUMMARY

GRAPH TRAVERSALS

THE NOSQL FAMILY

[Figure: NoSQL family diagram; recoverable labels: NoSQL, Graph]

DOCUMENT DATABASE

• A computer program designed for storing, retrieving, and managing document-oriented (semi-structured) information
• A document encapsulates and encodes data (or information) in some standard format or encoding
• Encodings include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on)
• Documents are similar, in some ways, to records or rows in relational databases, but they are less rigid
• They are not required to adhere to a standard schema, nor will they all have the same sections, slots, parts, keys, or the like

ORGANIZATION AND ACCESS

• Ways of organizing documents include collections, tags, non-visible metadata, directory hierarchies, and buckets
• Documents are addressed in the database via a unique key that represents the document
• This key is a simple string, e.g., a URI or path
• This key can be used to retrieve the document from the database
• The database retains an index on the key so that document retrieval is fast
• Beyond the simple key-document (or key-value) lookup, the database offers an API or query language to retrieve documents based on their contents
• For example, you may want a query that gets you all the documents with a certain field set to a certain value
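The two access paths described above can be sketched in a few lines of Python. This is a toy in-memory store, not a real product: the class `DocumentStore`, its methods, the keys and the sample documents are all made up for illustration, and a plain dictionary stands in for the index on the key.

```python
class DocumentStore:
    """Toy in-memory document store: a dict acts as the index on the key."""

    def __init__(self):
        self._by_key = {}

    def put(self, key, document):
        self._by_key[key] = document

    def get(self, key):
        """Key-document lookup; returns None when the key is unknown."""
        return self._by_key.get(key)

    def find(self, field, value):
        """Content-based query: every document whose field equals value."""
        return [doc for doc in self._by_key.values() if doc.get(field) == value]


store = DocumentStore()
store.put("/library/couchdb-guide", {"title": "CouchDB guide", "format": "PDF"})
store.put("/library/lecture-notes", {"title": "Lecture notes", "format": "JSON"})
store.put("/library/report-2013", {"title": "Report", "format": "PDF", "year": 2013})

print(store.get("/library/lecture-notes"))
print([doc["title"] for doc in store.find("format", "PDF")])
```

The lookup by key costs one dictionary access, while `find` scans every document; a real document database would typically offer secondary indexes to avoid that scan.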

DATA MODEL: DOCUMENT

• An object with named attributes and «attachments»
• Identified by one unique ID and a version number
• Different data types: text, numbers, booleans, dates, lists, maps
• Does not use locks for concurrency control: conflicts can be merged

• Examples:
  "Title": "CouchDB: The Definitive Guide: Time to Relax (Animal Guide)"
  "Authors": ["Chris Anderson", "Jan Lehnardt", "Noah Slater"]
  "Keywords": ["NoSQL databases", "Document databases"]
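To make the "version number instead of locks" idea concrete, here is a minimal Python sketch of optimistic concurrency control. It assumes an integer version per document (real stores may use opaque revision identifiers), and `VersionedStore`, `ConflictError` and `merge_keywords` are hypothetical names introduced only for this example.

```python
class ConflictError(Exception):
    """Raised when a writer's copy is older than the stored version."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def read(self, doc_id):
        return self._docs[doc_id]

    def write(self, doc_id, body, expected_version=0):
        current_version, _ = self._docs.get(doc_id, (0, None))
        if expected_version != current_version:
            raise ConflictError(f"{doc_id}: expected v{expected_version}, found v{current_version}")
        self._docs[doc_id] = (current_version + 1, dict(body))
        return current_version + 1


def merge_keywords(mine, theirs):
    """Application-level merge: keep their body, union the keyword lists."""
    merged = dict(theirs)
    merged["Keywords"] = sorted(set(mine["Keywords"]) | set(theirs["Keywords"]))
    return merged


store = VersionedStore()
store.write("book-1", {"Title": "CouchDB: The Definitive Guide", "Keywords": ["NoSQL databases"]})

# Two writers read the same version, then both try to update it.
version_a, copy_a = store.read("book-1")
version_b, copy_b = store.read("book-1")
copy_a = {**copy_a, "Keywords": copy_a["Keywords"] + ["Document databases"]}
copy_b = {**copy_b, "Keywords": copy_b["Keywords"] + ["CouchDB"]}

store.write("book-1", copy_a, expected_version=version_a)  # first writer wins
try:
    store.write("book-1", copy_b, expected_version=version_b)  # stale version
except ConflictError:
    latest_version, latest = store.read("book-1")
    store.write("book-1", merge_keywords(copy_b, latest), expected_version=latest_version)

final_version, final_doc = store.read("book-1")
print(final_version, final_doc["Keywords"])
```

No writer ever blocks another; the cost is that the second writer has to detect the conflict and decide how to merge.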

11/01/15

FLEXIBLE DOCUMENT STRUCTURE

• Different classes of tag can be represented as documents
• Both documents can be inserted in the same collection

SIMPLE QUERY

• db.tags.find({id: "tone/obituaries"})

• Query operators (cf. h

• db.tags.find({"section": {$exists: true}})
• db.tags.find({"webtitle": /^Obit*/i})
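To show what these three queries do, the sketch below evaluates them in plain Python over an in-memory list. It is not the MongoDB driver: `matches` and `find` are hypothetical helpers that mimic only equality, `$exists` and regular-expression conditions, and the two sample tag documents (with different structures) are invented.

```python
import re


def matches(document, query):
    """Return True when the document satisfies every condition in the query."""
    for field, condition in query.items():
        if isinstance(condition, dict) and "$exists" in condition:
            if (field in document) != condition["$exists"]:
                return False
        elif isinstance(condition, re.Pattern):
            value = document.get(field)
            if not isinstance(value, str) or not condition.search(value):
                return False
        elif document.get(field) != condition:
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


# Two classes of tag with different fields, stored in the same collection.
tags = [
    {"id": "tone/obituaries", "type": "tone", "webtitle": "Obituaries"},
    {"id": "world/france", "type": "keyword", "webtitle": "France", "section": "world"},
]

by_id = find(tags, {"id": "tone/obituaries"})
with_section = find(tags, {"section": {"$exists": True}})
by_title = find(tags, {"webtitle": re.compile(r"^Obit*", re.IGNORECASE)})
print(by_id, with_section, by_title)
```

Because the documents do not share a schema, the `$exists` condition is what lets a query select only the tag class that carries a `section` field.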

MODIFYING THE DOCUMENT STRUCTURE

THE NOSQL FAMILY

[Figure: NoSQL family diagram; recoverable labels: NoSQL, Document]

DEMO: COUCHDB

THE NOSQL FAMILY

[Figure: NoSQL family diagram; recoverable labels: eventually consistent, hierarchical, hosted services, multivalue databases, key-value store, object databases, stores on disk, tabular, ordered stores, tuple]

• Data model
• Availability
• Consistency
• Query support
• Storage
• Durability

Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands/millions of users

Use the right tool for the right job…

How do I know which is the right tool for the right job?

(Katsov-2012)

PROBLEM STATEMENT: HOW MUCH TO GIVE UP?

[Figure: triangle of fault-tolerant partitioning, availability and consistency]

• CAP theorem [1]: a distributed system can guarantee at most two of the three properties at the same time
• Many NoSQL systems relax consistency in favour of the other two

[1] Eric Brewer, "Towards robust distributed systems." PODC. 2000 http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

COMPARING NOSQL & NEWSQL SYSTEMS

SYSTEM       CONCURRENCY CONTROL   DATA STORAGE   REPLICATION    TRANSACTION
Redis        Locks                 RAM            Asynchronous   No
Scalaris     Locks                 RAM            Synchronous    Local
Tokyo        Locks                 RAM/Disk       Asynchronous   Local
Voldemort    MVCC                  RAM/BDB        Asynchronous   No
Riak         MVCC                  Plug-in        Asynchronous   No
Membrain     Locks                 Flash+Disk     Synchronous    Local
Membase      Locks                 Disk           Synchronous    Local
Dynamo       MVCC                  Plug-in        Asynchronous   No
SimpleDB     None                  S3             Asynchronous   No
MongoDB      Locks                 Disk           Asynchronous   No
CouchDB      MVCC                  Disk           Asynchronous   No
Terrastore   Locks                 RAM+           Synchronous    Local
HBase        Locks                 Hadoop         Asynchronous   Local
HyperTable   Locks                 Files          Synchronous    Local
Cassandra    MVCC                  Disk           Asynchronous   Local
BigTable     Locks+stamps          GFS            Both           Local
PNUTS        MVCC                  Disk           Asynchronous   Local
MySQL-C      ACID                  Disk           Synchronous    Yes
VoltDB       ACID/no lock          RAM            Synchronous    Yes
Clustrix     ACID/no lock          Disk           Synchronous    Yes
ScaleDB      ACID                  Disk           Synchronous    Yes
ScaleBase    ACID                  Disk           Asynchronous   Yes
NimbusDB     ACID/no lock          Disk           Synchronous    Yes

Categories shown on the slide: key-value, document, extended records, relational

Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27

CONCLUSIONS

• Data are growing bigger and more heterogeneous and need new, adapted ways to be managed; thus the NoSQL movement is gaining momentum
• Data heterogeneity implies different management requirements; this is where polyglot persistence comes in
• Consistency – Availability – Fault tolerance theorem: find the balance
• Which data store according to its data model?
• A lot of programming implied …

Open opportunities if you’re interested in this topic!

POLYGLOT PERSISTENCE

GENOVEVA VARGAS SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
[email protected]
http://www.vargas-solar.com

THIS TALK IS ABOUT

alternatives for managing multiform and multimedia data collections according to different properties and requirements

POLYGLOT PERSISTENCE

• Polyglot programming: applications should be written in a mix of languages, because different languages are suitable for tackling different problems
• Polyglot persistence: any decent-sized enterprise will have a variety of different data storage technologies for different kinds of data
• A new strategic enterprise application should no longer be built assuming relational persistence support
• The relational option might be the right one, but you should seriously look at other alternatives

M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012

DESIGNING AND BUILDING A POLYGLOT DATABASE

OBJECTIVE

[Figure: MyNet App, MyNet DB, social network]

• Build a MyNet app based on a polyglot database, in order to build an integrated directory of my contacts including their status and posts from several social networks

• Integrating posts from all networks
• Directory synchronisation: integrating contacts’ information from all SN
• Contact graph traversal for building groups out of common characteristics
• User sessions in different social networks
• Analysis on contacts networks, overlapping according to interests, posts topics
• Top 10 most popular contacts
• Synchronizing posts to all SN
• Friends network
• User accounts activity in different social networks
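One way to picture how such requirements could map onto a polyglot database is a thin facade that routes each kind of data to the store that suits it. The Python sketch below is only an assumption about how MyNet might be organised: the slides do not say which store handles which requirement, and `PolyglotDatabase`, its methods and its three dictionary "stores" are hypothetical stand-ins for a document store, a graph store and a key-value store.

```python
class PolyglotDatabase:
    """Facade that routes each kind of data to a different (toy) store."""

    def __init__(self):
        self.documents = {}  # document store stand-in: contact documents
        self.graph = {}      # graph store stand-in: contact id -> set of neighbour ids
        self.sessions = {}   # key-value store stand-in: (network, contact id) -> token

    def save_contact(self, contact_id, info):
        self.documents[contact_id] = info
        self.graph.setdefault(contact_id, set())

    def link_contacts(self, a, b):
        """Record an undirected 'is contact of' relationship."""
        self.graph.setdefault(a, set()).add(b)
        self.graph.setdefault(b, set()).add(a)

    def open_session(self, network, contact_id, token):
        self.sessions[(network, contact_id)] = token

    def most_popular(self, n=10):
        """Top-n contacts by number of direct contacts (ties broken by id)."""
        ranked = sorted(self.graph, key=lambda c: (-len(self.graph[c]), c))
        return ranked[:n]


db = PolyglotDatabase()
for contact_id, name in [(1, "Ana"), (2, "Luis"), (3, "Marie")]:
    db.save_contact(contact_id, {"givenName": name})
db.link_contacts(1, 2)
db.link_contacts(1, 3)
db.open_session("network-a", 1, "token-123")

print(db.most_popular(2))
```

The application sees one object, while each requirement (directory, friends network, sessions) is served by the data model that fits it; keeping the stores consistent with one another is the hard part.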

DEPLOYING A POLYGLOT DATABASE

[Figure: UML class diagram of the MyNet data. Recoverable classes and attributes:
 Contact (idContact: Integer, lastName: String, givenName: String, society: String)
 Basic Info (webSite: URI, socialNetworkID: URI)
 Phone (number: String, type: {pers, prof})
 Email (email: String, type: {pers, prof})
 Address (street: String, number: Integer, City: String, Zipcode: Integer)
 Group (groupName: String)
 Post (postID: Integer, timeStamp: Date, geoStamp: String)
 Content (contentID: Integer, text: String, image: Jpeg, video: Avi)
 Relationships: hasBasicInfo, hasMobilePhones, hasOtherPhones, hasEmail, hasAddress, isContactof, isComposedof, publishesPost, hasContent]

MULTI-CLOUD POLYGLOT DATABASE

[Figure: the same class diagram distributed across stores; recoverable labels: MyNet, MyNetContacts, REST, JSON documents]

MANAGING A POLYGLOT DATABASE: QUERYING, INSERTING, MAINTAINING

GENERATING NOSQL PROGRAMS FROM HIGH LEVEL ABSTRACTIONS

[Figure: generation pipeline; recoverable labels: high-level abstractions, UML class diagram, Spring Data, Spring Roo, Java web app, application classes, low-level abstractions, graph database, relational database]

http://code.google.com/p/model2roo/

POLYGLOT DATABASE EVOLUTION

[Figure: evolved UML class diagram. Recoverable classes: Post, Content, Contact, Basic Info, Address, Email, and a Language class (ID: Integer, Lang: String) linked by speaksLanguage; a second Contact box lists idContact: Integer, firstName: String, lastName: String]

• Problem statement:
• Evolution of the application: modification of classes, new classes, new relationships among classes
• Evolution of the “entities” managed in the polyglot database
• Some change structure, change values, …
• The content of the stores starts diverging from the application data structures
• Which is the current structure of the entities stored?
• Are there elements that are not being accessed because they no longer correspond to the application data structures?
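The last two questions can be approached, in a very simplified way, by scanning the stored documents and comparing the fields found with the fields the application classes currently declare. The Python sketch below is only that: a naive scan with hypothetical helper names and sample data, not a description of how any existing tool solves the problem.

```python
from collections import Counter


def discover_structure(documents):
    """Count how many stored documents carry each field name."""
    usage = Counter()
    for doc in documents:
        usage.update(doc.keys())
    return usage


def compare_with_application(documents, expected_fields):
    """Report fields that drifted between the store and the application classes."""
    usage = discover_structure(documents)
    stored = set(usage)
    return {
        # present in the store but unknown to the application
        "orphaned": sorted(stored - expected_fields),
        # declared by the application but never stored
        "missing": sorted(expected_fields - stored),
        # known to both, yet absent from some stored documents
        "partial": sorted(f for f in stored & expected_fields if usage[f] < len(documents)),
    }


# An old Contact document next to one written after 'givenName' became 'firstName'.
stored_contacts = [
    {"idContact": 1, "givenName": "Ana", "lastName": "Lopez", "society": "CNRS"},
    {"idContact": 2, "firstName": "Luis", "lastName": "Diaz"},
]
contact_class_fields = {"idContact", "firstName", "lastName"}

report = compare_with_application(stored_contacts, contact_class_fields)
print(report)
```

Orphaned fields are candidates for data that is no longer accessed, and partial fields point at entities stored before the application evolved.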

CRUD OPERATIONS

[Figure: consistent view of data across Contact, Content, Post and Group; a table with columns Id, FirstName, LastName, Society]

BACKGROUND

• Classic protocols are not an option (e.g., two-phase commit)
• Voting phase + commit phase = scalability issues

• Limited transactional support in NoSQL solutions
• Neo4j is one of the few that truly supports ACID (Atomicity, Consistency, Isolation, Durability)
• Others provide transactions limited to single entities (MongoDB), no roll-back (Redis), etc.
• They rely on BASE (Basically Available, Soft state, Eventual consistency) instead of ACID
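For readers who have not seen it spelled out, the two phases of two-phase commit can be sketched as follows in Python. The `Participant` class and `two_phase_commit` function are hypothetical and leave out everything that makes the real protocol hard (timeouts, logging, coordinator failure); the point is only that the coordinator needs an answer from every participant before anyone may commit.

```python
class Participant:
    """A store taking part in a distributed transaction (toy model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        """Voting phase: promise to commit, or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator waits for a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only if the vote was unanimous, otherwise roll back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("mysql"), Participant("redis")]
one_failing = [Participant("mysql"), Participant("mongodb", can_commit=False)]

print(two_phase_commit(healthy), [p.state for p in healthy])
print(two_phase_commit(one_failing), [p.state for p in one_failing])
```

Every transaction costs two rounds of messages to all participants, and the slowest or unavailable one holds everybody back, which is the scalability issue mentioned above.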

EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL

Updating REDIS #FAIL

Case 1: Redis has updated, MySQL does not

begin MySQL transaction
  update MySQL
  update Redis
rollback MySQL transaction

Case 2: MySQL has updated, Redis does not

begin MySQL transaction
  update MySQL
commit MySQL transaction
<< system crashes >>
update Redis

https://oracleus.activeevents.com/connect/sessionDetail.ww?SESSION_ID=4775

EXAMPLE 1: UPDATING REDIS RELIABLY

Step 1 (ACID)

begin MySQL transaction
  update MySQL
  queue CRUD event in MySQL
commit transaction

CRUD event: Event Id; Operation: Create, Update, Delete; New entity state, e.g. JSON

Step 2

for each CRUD event in MySQL queue
  get next CRUD event from MySQL queue
  if CRUD event is not duplicate then
    update Redis (incl. eventID)
  end if
  begin MySQL transaction
    mark CRUD event processed
  commit transaction
end for each
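The two steps can be tried out with nothing but the Python standard library. In this sketch `sqlite3` stands in for MySQL and a dictionary stands in for Redis, so it shows the pattern rather than the actual deployment; the table layout, the function names `update_contact` and `process_events`, and the `applied` set used for duplicate detection are all assumptions made for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE crud_event ("
    "event_id INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, state TEXT, "
    "processed INTEGER DEFAULT 0)"
)
cache = {}       # stand-in for Redis
applied = set()  # event ids already applied to the cache (kept alongside it)


def update_contact(contact_id, name):
    """Step 1: entity update and CRUD event are written in one transaction."""
    state = json.dumps({"id": contact_id, "name": name})
    with db:  # commits both statements together, or rolls both back on error
        db.execute("INSERT OR REPLACE INTO contact VALUES (?, ?)", (contact_id, name))
        db.execute("INSERT INTO crud_event (op, state) VALUES (?, ?)", ("UPDATE", state))


def process_events():
    """Step 2: replay unprocessed events into the cache, skipping duplicates."""
    rows = db.execute(
        "SELECT event_id, op, state FROM crud_event WHERE processed = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, op, state in rows:
        if event_id not in applied:  # duplicate check makes redelivery harmless
            entity = json.loads(state)
            cache[f"contact:{entity['id']}"] = entity
            applied.add(event_id)
        with db:
            db.execute("UPDATE crud_event SET processed = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


update_contact(1, "Ada")
update_contact(1, "Ada Lovelace")
print(process_events(), cache)
```

If the process crashes after updating the cache but before marking the event processed, the event is simply delivered again on the next run and the duplicate check ignores it; the cache is therefore eventually consistent with the relational store rather than updated in the same transaction.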

[Figure: Step 1, the EntityCRUDEvent Repository stores events (INSERT INTO ..) in a table with columns ID, JSON, Processed?; Step 2, a Timer triggers the EntityCRUDEvent Processor, which reads events (SELECT … FROM ..) and calls apply(event) on the Redis updater]

EXAMPLE 1: TRACKING CHANGES (HIBERNATE)

EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL

• Tracking changes (Hibernate)…

EXAMPLE 2: SYNCHRONIZING MONGODB+RELATIONAL

• Spring Data project

http://static.springsource.org/spring-data/data-mongodb/docs/current/reference/html/#mongo.cross.store

Use the right tool for a given job…

Lack of standardization of models and data storage technologies

(Katsov-2012)

QUALITY DRIVEN BENCHMARK [1]

CHARACTERISTIC               SUBCHARACTERISTIC      METRIC
Reliability                  Maturity               API changes
                             Availability           Downtime [3]
                             Fault tolerance        Node down throughput [3]
                             Recoverability         Time to stabilize on node up [3]
Performance and efficiency   Time behaviour         Throughput, latency [2]
                             Resource utilisation   CPU, memory and disk usage [4]

Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands/millions of users

[1] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki
[2] Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM symposium on Cloud computing. pp. 143–154. SoCC ’10, ACM, New York, NY, USA (2010)
[3] Nelubin, D., Engber, B.: Failover Characteristics of leading NoSQL databases. Tech. rep., Thumbtack Technology (2013)
[4] Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing 30(7) (Jul 2004)

QUALITY DRIVEN BENCHMARK

[Figure: QDB applied to linked data and temporal streams, on top of the YCSB client architecture]

• Parameters: read/write mix, record size, popularity distribution, DB to use, workload to use, target throughput, number of threads
• YCSB client: workload executor, client threads, stats (read latency, throughput), DB interface layer
• Target: cloud serving store
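To give a feel for what such a workload executor measures, here is a much reduced Python sketch. It is not YCSB: it runs a single thread against an in-memory dictionary, uses a uniform key popularity distribution, and the function `run_workload` with all its parameters is hypothetical. It does, however, show how a read/write mix and a record size turn into throughput and latency figures.

```python
import random
import time


def run_workload(store, operations=1000, read_ratio=0.95,
                 record_size=100, key_space=100, seed=42):
    """Run a single-threaded read/update mix and collect basic metrics."""
    rng = random.Random(seed)
    # Load phase: populate the store with fixed-size records.
    for k in range(key_space):
        store[f"user{k}"] = "x" * record_size

    latencies = {"read": [], "update": []}
    started = time.perf_counter()
    for _ in range(operations):
        key = f"user{rng.randrange(key_space)}"  # uniform popularity (assumption)
        op = "read" if rng.random() < read_ratio else "update"
        t0 = time.perf_counter()
        if op == "read":
            store.get(key)
        else:
            store[key] = "y" * record_size
        latencies[op].append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started

    return {
        "operations": {op: len(v) for op, v in latencies.items()},
        "throughput_ops_per_s": operations / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_s": {op: sum(v) / len(v) for op, v in latencies.items() if v},
    }


report = run_workload({}, operations=2000, read_ratio=0.5)
print(report)
```

Swapping the dictionary for a client object with the same `get` and item-assignment behaviour would be the hook for measuring a real store; measuring behaviour while a node is down or recovering would need extra instrumentation that this sketch does not attempt.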

ONGOING WORK

• QDB benchmark extends YCSB: FaultTolerance, Recoverability and TimeBehaviour
• Pivot data model for representing NoSQL stores data models
• Sample application: Shopping system [1] (ProductInfo)
• Data stores: MongoDB, Couchbase, VoltDB, Redis, Neo4j
• Cluster of four Ubuntu 12.04 servers deployed with extra large VM instances (8 virtual cores and 14 GB of RAM) in Windows Azure [2]
• Distributed polyglot (big) database engineering
• Model2Roo: engineering data storage solutions for given data collections
• ExSchema for supporting the maintenance of a polyglot storage solution

[1] McMurtry, D., Oakley, A., Sharp, J., Subramanian, M., Zhang, H.: Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence. Microsoft patterns & practices, Microsoft (2013)
[2] http://www.windowsazure.com/
[3] http://forge.puppetlabs.com/puppetlabs/
[4] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki

IN BRIEF…

• Many proposals, but no definite solution yet…

• Research/Industry challenges

• Open opportunities if you’re interested in this topic!

WHEN IS POLYGLOT PERSISTENCE PERTINENT?

• Applications that essentially compose and serve web pages
• They only look up page elements by ID, have different needs for availability and concurrency, and do not need to share all their data
• A problem like this is much better suited to a NoSQL store than to the corporate relational DBMS
• Scaling to lots of traffic gets harder and harder to do with vertical scaling
• Many NoSQL databases are designed to operate over clusters
• They can tackle larger volumes of traffic and data than is realistic with a single server

Dr. Genoveva Vargas-Solar, CNRS, LIG-LAFMIA, France
[email protected]
http://www.vargas-solar.com/bigdata-management

Juan Carlos Castrejón, University of Grenoble, France
Javier Espinosa, University of Grenoble, France

REFERENCES

• Eric A. Brewer, "Towards robust distributed systems." PODC. 2000
• Rick Cattell, "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27
• Juan Castrejon, Genoveva Vargas-Solar, Christine Collet, and Rafael Lozano, ExSchema: Discovering and Maintaining Schemas from Polyglot Persistence Applications, In Proceedings of the International Conference on Software Maintenance, Demo Paper, IEEE, 2013
• M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012
• C. Richardson, Developing polyglot persistence applications, http://fr.slideshare.net/chris.e.richardson/developing-polyglotpersistenceapplications-gluecon2013