THE NOSQL MOVEMENT (2)
GENOVEVA VARGAS SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
[email protected]
http://www.vargas-solar.com/bigdata-managment
THE NOSQL FAMILY
NoSQL: Graph, Document, Key-value store
! NoSQL concerns document databases, key-value databases and graph databases
GRAPH DATABASE
! Use graph structures with nodes, edges, and properties to represent and store data
! Nodes are similar in nature to the objects that object-oriented programmers are familiar with
! Properties are pertinent information that relate to nodes
! Edges are the lines that connect nodes to nodes or nodes to properties and they represent the relationship between the two
! By definition, a graph database is any storage system that provides index-free adjacency
! Every element contains a direct pointer to its adjacent element
! No index lookups are necessary
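The bullets above can be made concrete with a small sketch. It is not taken from any particular graph database: the names Node, Edge, connect and traverse are hypothetical, and the example data (alice, bob, carol) is invented for illustration. Each node holds direct references to its outgoing edges, so following a relationship is a pointer hop and not an index lookup.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries properties and direct references to its edges."""
    name: str
    properties: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)  # index-free adjacency


@dataclass
class Edge:
    """A labelled, directed relationship with its own properties."""
    label: str
    target: Node
    properties: dict = field(default_factory=dict)


def connect(source, label, target, **properties):
    """Attach a new edge to the source node and return it."""
    edge = Edge(label, target, properties)
    source.out_edges.append(edge)
    return edge


def traverse(start, label):
    """Breadth-first traversal following only edges with the given label."""
    seen, order, queue = {id(start)}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node.name)
        for edge in node.out_edges:  # pointer hop, no index lookup
            if edge.label == label and id(edge.target) not in seen:
                seen.add(id(edge.target))
                queue.append(edge.target)
    return order


# Hypothetical example data
alice = Node("alice", {"age": 30})
bob = Node("bob")
carol = Node("carol")
connect(alice, "knows", bob, since=2010)
connect(bob, "knows", carol)
connect(alice, "likes", carol)
print(traverse(alice, "knows"))  # ['alice', 'bob', 'carol']
```

Keeping the edge list on the node is the design choice that stands in for index-free adjacency here; a real engine would do the same with on-disk pointers.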
Figure credit: Takahiro Inoue, MongoDB leader, SlideShare
GRAPH
UNDIRECTED GRAPH
DIRECTED GRAPH
MIXED GRAPH, MULTIGRAPH
SINGLE RELATIONAL GRAPH
MULTI RELATIONAL GRAPH
PROPERTY GRAPH
PROPERTY GRAPH: SUMMARY
GRAPH TRAVERSALS
THE NOSQL FAMILY
NoSQL: Graph
DOCUMENT DATABASE
! Computer program designed for storing, retrieving, and managing document-oriented, or semi-structured, information
! A document encapsulates and encodes data (or information) in some standard format or encoding
! XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on)
! Documents are similar, in some ways, to records or rows in relational databases, but they are less rigid
! They are not required to adhere to a standard schema, nor will they all have the same sections, slots, parts, keys, or the like
ORGANIZATION AND ACCESS
! Ways of organizing documents include notions of collections, tags, non-visible metadata, directory hierarchies, and buckets
! Documents are addressed in the database via a unique key that represents that document
! This key is a simple string, e.g., a URI or path
! This key can be used to retrieve the document from the database
! The database retains an index on the key so that document retrieval is fast
! Beyond the simple key-document (or key-value) lookup, the database offers an API or query language to retrieve documents based on their contents
! For example, you may want a query that gets you all the documents with a certain field set to a certain value
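The two access paths described above (lookup by unique key, and retrieval by content) can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: DocumentStore, put, get and find are hypothetical names, the keys are invented paths, and a plain dict plays the role of the key index.

```python
class DocumentStore:
    """Toy in-memory document store: one dict acts as the key index."""

    def __init__(self):
        self._by_key = {}  # unique key (a string such as a path) -> document

    def put(self, key, document):
        """Store a copy of the document under its unique key."""
        self._by_key[key] = dict(document)

    def get(self, key):
        """Key-document lookup through the key index; None if absent."""
        return self._by_key.get(key)

    def find(self, **criteria):
        """Content-based query: keys of documents whose fields equal the given values."""
        return sorted(
            key
            for key, doc in self._by_key.items()
            if all(name in doc and doc[name] == value
                   for name, value in criteria.items())
        )


# Hypothetical example data
store = DocumentStore()
store.put("/tags/tone/obituaries", {"type": "tone", "webtitle": "Obituaries"})
store.put("/tags/world/france", {"type": "keyword", "webtitle": "France"})

print(store.get("/tags/tone/obituaries"))  # retrieval by key
print(store.find(type="keyword"))          # retrieval by field value
```

The content query scans every document in this sketch; a real document database would typically rely on secondary indexes or views for that path.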
DATA MODEL: DOCUMENT
! An object with named attributes and «attachments»:
! Identified by one unique ID and a version number
! Different data types: text, numbers, booleans, dates, lists, maps
! Does not use locks for dealing with concurrency control: conflicts can be merged
! Examples:
! "Title": "CouchDB: The Definitive Guide: Time to Relax (Animal Guide)"
! "Authors": ["Chris Anderson", "Jan Lehnardt", "Noah Slater"]
! "Keywords": ["NoSQL databases", "Document databases"]
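How an ID plus a version number can replace locks is easier to see in code. The sketch below is a simplified assumption, not the mechanism of any specific product: VersionedStore, ConflictError and merge are hypothetical names, versions are plain integers, and the merge policy (union of fields, the local copy wins on clashes) is just one possible choice.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    """Toy store where each document is identified by an ID and a version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def read(self, doc_id):
        version, body = self._docs[doc_id]
        return version, dict(body)

    def write(self, doc_id, body, expected_version=0):
        """Accept the write only if the caller saw the latest version (no locks)."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version != current_version:
            raise ConflictError(f"{doc_id}: expected v{expected_version}, "
                                f"found v{current_version}")
        self._docs[doc_id] = (current_version + 1, dict(body))
        return current_version + 1


def merge(mine, theirs):
    """Naive merge policy (assumption): union of fields, 'mine' wins on clashes."""
    merged = dict(theirs)
    merged.update(mine)
    return merged


store = VersionedStore()
store.write("book", {"Title": "CouchDB: The Definitive Guide"})

# Two clients read the same version and edit independently.
version_a, copy_a = store.read("book")
version_b, copy_b = store.read("book")
copy_a["Keywords"] = ["NoSQL databases"]
copy_b["Authors"] = ["Chris Anderson"]

store.write("book", copy_a, expected_version=version_a)
try:
    store.write("book", copy_b, expected_version=version_b)  # stale version
except ConflictError:
    latest_version, latest = store.read("book")
    store.write("book", merge(copy_b, latest), expected_version=latest_version)

print(store.read("book"))
```

The second writer is never blocked; it is told about the conflict and resolves it by merging, which is the behaviour the last bullet above alludes to.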
11/01/15 FLEXIBLE DOCUMENT STRUCTURE
! Can represent different classes of tag as documents
! Both documents can be inserted in the same collection
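A minimal sketch of this flexibility, assuming a collection is nothing more than a list of dictionaries: the two tag documents below have different fields yet live side by side. The field names id, webtitle and section echo the queries used in these slides; the type field, the second document and the helper names are invented for illustration.

```python
tags = []  # a "collection" is just a list of dicts in this sketch

# Two classes of tag with different structures, stored in the same collection
tags.append({"id": "tone/obituaries", "type": "tone", "webtitle": "Obituaries"})
tags.append({"id": "world/france", "type": "keyword", "webtitle": "France",
             "section": "world"})


def fields_in_use(collection):
    """All field names that appear in at least one document."""
    return sorted({name for doc in collection for name in doc})


def with_field(collection, name):
    """IDs of the documents that actually carry the given field."""
    return [doc["id"] for doc in collection if name in doc]


print(fields_in_use(tags))          # union of the fields of both documents
print(with_field(tags, "section"))  # only the document that has a section
```

Because no schema is enforced, code that reads the collection has to tolerate missing fields, which is why the existence check is explicit.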
SIMPLE QUERY
! db.tags.find({id: "tone/obituaries"})
! Query operators (cf. h
! db.tags.find({"section": {$exists: true}})
! db.tags.find({"webtitle": /^Obit*/i})
MODIFYING THE DOCUMENT STRUCTURE
THE NOSQL FAMILY
NoSQL: Document
DEMO: COUCHDB
THE NOSQL FAMILY
NoSQL: Eventually-consistent, Hierarchical, Hosted services, Multivalue databases, Key-value store, Object databases, Stores on disk, Tabular, Ordered stores, Tuple
• Data model
• Availability
• Consistency
• Query support
• Storage
• Durability
Data stores designed to scale simple OLTP-style application loads
Read/Write operations by thousands/millions of users
Use the right tool for the right job…
How do I know which is the right tool for the right job?
(Katsov-2012)
PROBLEM STATEMENT: HOW MUCH TO GIVE UP?
Fault-tolerant partitioning, Availability, Consistency
! CAP theorem [1]: a system can have at most two of the three properties
! NoSQL systems typically sacrifice consistency
[1] Eric Brewer, "Towards robust distributed systems." PODC. 2000 http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
COMPARING NOSQL & NEWSQL SYSTEMS
SYSTEM      CONCURRENCY CONTROL  DATA STORAGE  REPLICATION   TRANSACTION
Redis       Locks                RAM           Asynchronous  No
Scalaris    Locks                RAM           Synchronous   Local
Tokyo       Locks                RAM/Disk      Asynchronous  Local
Voldemort   MVCC                 RAM/BDB       Asynchronous  No
Riak        MVCC                 Plug in       Asynchronous  No
Membrain    Locks                Flash+Disk    Synchronous   Local
Membase     Locks                Disk          Synchronous   Local
Dynamo      MVCC                 Plug in       Asynchronous  No
SimpleDB    None                 S3            Asynchronous  No
MongoDB     Locks                Disk          Asynchronous  No
CouchDB     MVCC                 Disk          Asynchronous  No
Terrastore  Locks                RAM+          Synchronous   Local
Hbase       Locks                HADOOP        Asynchronous  Local
HyperTable  Locks                Files         Synchronous   Local
Cassandra   MVCC                 Disk          Asynchronous  Local
BigTable    Locks+stamps         GFS           Both          Local
PNuts       MVCC                 Disk          Asynchronous  Local
MySQL-C     ACID                 Disk          Synchronous   Y
VoltDB      ACID/no Lock         RAM           Synchronous   Y
Clustrix    ACID/no Lock         Disk          Synchronous   Y
ScaleDB     ACID                 Disk          Synchronous   Y
ScaleBase   ACID                 Disk          Asynchronous  Y
NimbusDB    ACID/no Lock         Disk          Synchronous   Y
Row groups in the original figure: Key-Value, Document, Extended records, Relational
Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27
CONCLUSIONS
! Data are growing bigger and more heterogeneous, and they need new, adapted ways to be managed; thus the NoSQL movement is gaining momentum
! Data heterogeneity implies different management requirements; this is where polyglot persistence comes in
! Consistency, Availability, Fault tolerance theorem: find the balance
! Which data store according to its data model?
! A lot of programming implied …
Open opportunities if you're interested in this topic!
POLYGLOT PERSISTENCE
GENOVEVA VARGAS SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
[email protected]
http://www.vargas-solar.com
THIS TALK IS ABOUT
An alternative for managing multiform and multimedia data collections according to different properties and requirements
POLYGLOT PERSISTENCE
! Polyglot programming: applications should be written in a mix of languages, to take advantage of the fact that different languages are suitable for tackling different problems
! Polyglot persistence: any decent-sized enterprise will have a variety of different data storage technologies for different kinds of data
! A new strategic enterprise application should no longer be built assuming relational persistence support
! The relational option might be the right one, but you should seriously look at other alternatives
M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012
DESIGNING AND BUILDING A POLYGLOT DATABASE
OBJECTIVE
MyNet App, MyNet DB, Social network
! Build a MyNet app based on a polyglot database for building an integrated directory of my contacts, including their status and posts from several social networks
Integrating posts from all networks
Directory synchronisation: integrating contacts' information from all SN
Contact graph traversal: for building groups out of common characteristics
User sessions in different social networks
Analysis on contacts networks, overlapping according to interests, posts topics
Top 10 most popular contacts
Synchronizing posts to all SN
Friends network
User accounts activity in different social networks
DEPLOYING A POLYGLOT DATABASE
MULTI-CLOUD POLYGLOT DATABASE
Figure: UML class diagram of the MyNet data, with the labels MyNet, REST, MyNetContacts and JSON documents
Contact: idContact: Integer, lastName: String, givenName: String, society: String
Basic Info: webSite: URI, socialNetworkID: URI
Phone: number: String, type: {pers, prof}
Email: email: String, type: {pers, prof}
Address: street: String, number: Integer, City: String, Zipcode: Integer
Group: groupName: String
Post: postID: Integer, timeStamp: Date, geoStamp: String
Content: contentID: Integer, text: String, image: Jpeg, video: Avi
Relationships: hasBasicInfo, hasMobilePhones, hasOtherPhones, hasEmail, hasAddress, publishesPost, hasContent, isContactof, isComposedof
MANAGING A POLYGLOT DATABASE
QUERYING, INSERTING, MAINTAINING
GENERATING NOSQL PROGRAMS FROM HIGH LEVEL ABSTRACTIONS
High-level abstractions: UML class diagram, application classes
Spring Roo, Spring Data, Java web App
Low-level abstractions: Graph database, Relational database
http://code.google.com/p/model2roo/
POLYGLOT DATABASE EVOLUTION
Figure: the MyNet class diagram extended with a Language class (ID: Integer, Lang: String) and a speaksLanguage relationship, next to a Contact entity with idContact: Integer, firstName: String, lastName: String
! Problem statement:
! Evolution of the application: modification of classes, new classes, new relationships among classes
! Evolution of the "entities" managed in the polyglot database
! Some change structure, change values, …
! The content of the stores starts drifting away from the application data structures
! What is the current structure of the entities stored?
! Are there elements that are not being accessed because they no longer correspond to the application data structures?
CRUD OPERATIONS
Consistent view of data
Contact, Content, Post, Group
Id, FirstName, LastName, Society
BACKGROUND
! Classic protocols are not an option (e.g., two-phase commit)
! Voting phase, commit phase = scalability issues
! Limited transactional support by NoSQL solutions
! Neo4j is one of the few that truly supports ACID (Atomicity, Consistency, Isolation, Durability)
! Others provide transactions limited to single entities (MongoDB), no roll-back (Redis), etc.
! Rely on BASE (Basically Available, Soft state, Eventual consistency) instead of ACID
EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL
Updating REDIS #FAIL
Redis has updated, MySQL does not:
  begin MySQL transaction
    update MySQL
    update Redis
  rollback MySQL transaction
MySQL has updated, Redis does not:
  begin MySQL transaction
    update MySQL
  commit MySQL transaction
  << system crashes >>
  update Redis
https://oracleus.activeevents.com/connect/sessionDetail.ww?SESSION_ID=477500
EXAMPLE 1: UPDATING REDIS RELIABLY
Step 1 (ACID):
  begin MySQL transaction
    update MySQL
    queue CRUD event in MySQL
  commit transaction
Step 2:
  for each CRUD event in MySQL queue
    get next CRUD event from MySQL queue
    if CRUD event is not duplicate then
      update Redis (incl. eventID)
    end if
    begin MySQL transaction
      mark CRUD event processed
    commit transaction
  end for each
Queued CRUD event: Event Id; Operation: Create, Update, Delete; New entity state, e.g. JSON
EXAMPLE 1: UPDATING REDIS RELIABLY
Step 1: EntityCRUDEvent Repository: INSERT INTO ..
Step 2: Timer, EntityCRUDEvent Processor: SELECT … FROM.., apply(event), Redis updater
Event table columns: ID, JSON, Processed?
EXAMPLE 1: TRACKING CHANGES (HIBERNATE)
EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL
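The two-step scheme of "Updating Redis reliably" can be tried out with a small runnable sketch. Everything below is an assumption made for illustration: an in-memory SQLite database stands in for MySQL, a Python dict stands in for Redis, and the table, column and function names (contact, crud_event, save_contact, process_events) are hypothetical. The point it shows is that the entity update and the queued event share one ACID transaction, and that replaying an event twice is harmless because the event ID is stored with the cached state.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
db.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE crud_event ("
    "event_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "operation TEXT, entity_json TEXT, processed INTEGER DEFAULT 0)"
)
db.commit()

cache = {}  # stand-in for Redis: key -> (last applied event ID, entity state)


def save_contact(contact_id, name):
    """Step 1: update the entity and queue the CRUD event in one transaction."""
    state = json.dumps({"id": contact_id, "name": name})
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("INSERT OR REPLACE INTO contact (id, name) VALUES (?, ?)",
                   (contact_id, name))
        db.execute("INSERT INTO crud_event (operation, entity_json) VALUES (?, ?)",
                   ("update", state))


def process_events():
    """Step 2: replay unprocessed events into the cache, skipping duplicates."""
    rows = db.execute(
        "SELECT event_id, entity_json FROM crud_event "
        "WHERE processed = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, entity_json in rows:
        entity = json.loads(entity_json)
        key = f"contact:{entity['id']}"
        last_event_id = cache.get(key, (0, None))[0]
        if event_id > last_event_id:  # not a duplicate
            cache[key] = (event_id, entity)  # store the event ID with the state
        with db:
            db.execute("UPDATE crud_event SET processed = 1 WHERE event_id = ?",
                       (event_id,))
    return len(rows)


save_contact(1, "Ada")
save_contact(1, "Ada Lovelace")
process_events()
print(cache["contact:1"][1]["name"])
```

If the process crashes between updating the cache and marking the event as processed, the event is simply replayed on the next run and ignored by the duplicate check, so the cache converges to the committed relational state.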
! Tracking changes (Hibernate)…
EXAMPLE 2: SYNCHRONIZING MONGODB+RELATIONAL
! Spring Data project http://static.springsource.org/spring-data/data-mongodb/docs/current/reference/html/#mongo.cross.store
Use the right tool for a given job…
Lack of standardization of models and data storage technologies
(Katsov-2012)
QUALITY DRIVEN BENCHMARK [1]
CHARACTERISTIC              SUBCHARACTERISTIC     METRIC
Reliability                 Maturity              API changes
                            Availability          Downtime [3]
                            Fault tolerance       Node down throughput [3]
                            Recoverability        Time to stabilize on node up [3]
Performance and efficiency  Time behaviour        Throughput, latency [2]
                            Resource utilisation  CPU, memory and disk usage [4]
Data stores designed to scale simple OLTP-style application loads
Read/Write operations by thousands/millions of users
[1] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki
[2] Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM symposium on Cloud computing. pp. 143–154. SoCC '10, ACM, New York, NY, USA (2010)
[3] Nelubin, D., Engber, B.: Failover Characteristics of leading NoSQL databases. Tech. rep., Thumbtack Technology (2013)
[4] Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing 30(7) (Jul 2004)
QUALITY DRIVEN BENCHMARK
Linked data & temporal streams:
• Read/write mix
• Record size
• Popularity distribution
QDB:
• DB to use
• Workload to use
• Target throughput
• Number of threads
YCSB Client: Client threads, Workload executor, Stats, DB interface layer
Cloud serving store
Read latency, Throughput
ONGOING WORK
! QDB benchmark extends YCSB: FaultTolerance, Recoverability and TimeBehaviour
! Pivot data model for representing NoSQL stores data models
! Sample application: Shopping system [1] (ProductInfo)
! Data stores: MongoDB and Couchbase (document), VoltDB (NewSQL), Redis (key-value), Neo4J (graph)
! Cluster of four Ubuntu 12.04 servers deployed with extra large VM instances (8 virtual cores and 14 GB of RAM) in Windows Azure [2]
! Distributed polyglot (big) database engineering
! Model2Roo: engineering data storage solutions for given data collections
! ExSchema for supporting the maintenance of a polyglot storage solution
[1] McMurtry, D., Oakley, A., Sharp, J., Subramanian, M., Zhang, H.: Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence. Microsoft patterns & practices, Microsoft (2013)
[2] http://www.windowsazure.com/
[3] http://forge.puppetlabs.com/puppetlabs/
[4] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki
IN BRIEF…
! Many proposals, but no definite solution yet…
! Research/Industry challenges
! Open opportunities if you're interested in this topic!
WHEN IS POLYGLOT PERSISTENCE PERTINENT?
! Application essentially composing and serving web pages
! They only looked up page elements by ID, they had different needs for availability and concurrency, and no need to share all their data
! A problem like this is much better suited to a NoSQL store than the corporate relational DBMS
! Scaling to lots of traffic gets harder and harder to do with vertical scaling
! Many NoSQL databases are designed to operate over clusters
! They can tackle larger volumes of traffic and data than is realistic with a single server
Dr. Genoveva Vargas-Solar, CNRS, LIG-LAFMIA, France
[email protected]
http://www.vargas-solar.com/bigdata-management
Juan Carlos Castrejón, University of Grenoble, France
Javier Espinosa, University of Grenoble, France
REFERENCES
! Eric A. Brewer, "Towards robust distributed systems." PODC. 2000
! Rick Cattell, "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27
! Juan Castrejon, Genoveva Vargas-Solar, Christine Collet, and Rafael Lozano, ExSchema: Discovering and Maintaining Schemas from Polyglot Persistence Applications, In Proceedings of the International Conference on Software Maintenance, Demo Paper, IEEE, 2013
! M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012
! C. Richardson, Developing polyglot persistence applications, http://fr.slideshare.net/chris.e.richardson/developing-polyglotpersistenceapplications-gluecon2013