Polyglot Persistence
THE NOSQL MOVEMENT (2)
GENOVEVA VARGAS SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
[email protected]
http://www.vargas-solar.com/bigdata-managment

THE NOSQL FAMILY
[Figure: the NoSQL family: graph, document, key-value store]
• NoSQL concerns document databases, key-value databases and graph databases

GRAPH DATABASE
• Uses graph structures with nodes, edges, and properties to represent and store data
• Nodes are similar in nature to the objects that object-oriented programmers are familiar with
• Properties are pertinent information that relate to nodes
• Edges are the lines that connect nodes to nodes or nodes to properties; they represent the relationship between the two
• By definition, a graph database is any storage system that provides index-free adjacency:
  • Every element contains a direct pointer to its adjacent element
  • No index lookups are necessary

[Figures from Takahiro Inoue, MongoDB leader, slideshare: graph; undirected graph; directed graph; mixed graph, multigraph; single-relational graph; multi-relational graph; property graph; property graph: summary; graph traversals]

DOCUMENT DATABASE
• A computer program designed for storing, retrieving, and managing document-oriented, or semi-structured, data
• A document encapsulates and encodes data (or information) in some standard format or encoding:
  • e.g., XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on)
• Documents are similar, in some ways, to records or rows in relational databases, but they are less rigid
• They are not required to adhere to a standard schema, nor will they all have the same sections, slots, parts, keys, or the like

ORGANIZATION AND ACCESS
• Notions for organizing documents include collections, tags, non-visible metadata, directory hierarchies, and buckets
• Documents are addressed in the database via a unique key that represents that document
  • This key is a simple string, e.g., a URI or path
  • This key can be used to retrieve the document from the database
  • The database retains an index on the key so that document retrieval is fast
• Beyond simple key-document (or key-value) lookup, the database offers an API or query language to retrieve documents based on their contents
  • For example, you may want a query that gets you all the documents with a certain field set to a certain value

DATA MODEL: DOCUMENT
• An object with named attributes and «attachments»:
  • Identified by one unique ID and a version number
  • Different data types: text, numbers, booleans, dates, lists, maps
  • Does not use locks for concurrency control: conflicts can be merged
• Example:
  {
    "Title": "CouchDB: The Definitive Guide: Time to Relax (Animal Guide)",
    "Authors": ["Chris Anderson", "Jan Lehnardt", "Noah Slater"],
    "Keywords": ["NoSQL databases", "Document databases"]
  }

11/01/15

FLEXIBLE DOCUMENT STRUCTURE
• Different classes of tags can be represented as documents
• Both documents can be inserted in the same collection

SIMPLE QUERY
• db.tags.find({id: "tone/obituaries"})
• Query operators (cf. http://docs.mongodb.org/manual/crud/)
• db.tags.find({"section": {$exists: true}})
• db.tags.find({"webtitle": /^Obit/i})

MODIFYING THE DOCUMENT STRUCTURE

DEMO: COUCHDB

THE NOSQL FAMILY
[Figure: NoSQL store categories: eventually consistent, hierarchical, hosted services, multivalue databases, key-value store, object databases, stores on disk, tabular, ordered stores, tuple]

• Data model
• Availability
• Consistency
• Query support
• Storage
• Durability
Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands/millions of users

Use the right tool for the right job… How do I know which is the right tool for the right job? (Katsov-2012)

PROBLEM STATEMENT: HOW MUCH TO GIVE UP?
[Figure: CAP triangle: fault-tolerant partitioning, availability, consistency]
• CAP theorem¹: a distributed system can have at most two of the three properties
• Many NoSQL systems sacrifice (strong) consistency
¹ Eric Brewer, "Towards robust distributed systems." PODC 2000. http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

COMPARING NOSQL & NEWSQL SYSTEMS
Type              System      Concurrency control  Data storage  Replication   Transactions
Key-value         Redis       Locks                RAM           Asynchronous  No
Key-value         Scalaris    Locks                RAM           Synchronous   Local
Key-value         Tokyo       Locks                RAM/Disk      Asynchronous  Local
Key-value         Voldemort   MVCC                 RAM/BDB       Asynchronous  No
Key-value         Riak        MVCC                 Plug in       Asynchronous  No
Key-value         Membrain    Locks                Flash+Disk    Synchronous   Local
Key-value         Membase     Locks                Disk          Synchronous   Local
Key-value         Dynamo      MVCC                 Plug in       Asynchronous  No
Document          SimpleDB    None                 S3            Asynchronous  No
Document          MongoDB     Locks                Disk          Asynchronous  No
Document          CouchDB     MVCC                 Disk          Asynchronous  No
Document          Terrastore  Locks                RAM+          Synchronous   L
Extended records  Hbase       Locks                HADOOP        Asynchronous  L
Extended records  HyperTable  Locks                Files         Synchronous   L
Extended records  Cassandra   MVCC                 Disk          Asynchronous  L
Extended records  BigTable    Locks+stamps         GFS           Both          L
Extended records  PNuts       MVCC                 Disk          Asynchronous  L
Relational        MySQL-C     ACID                 Disk          Synchronous   Y
Relational        VoltDB      ACID/no Lock         RAM           Synchronous   Y
Relational        Clustrix    ACID/no Lock         Disk          Synchronous   Y
Relational        ScaleDB     ACID                 Disk          Synchronous   Y
Relational        ScaleBase   ACID                 Disk          Asynchronous  Y
Relational        NimbusDB    ACID/no Lock         Disk          Synchronous   Y
Source: Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27.

CONCLUSIONS
• Data are growing bigger and more heterogeneous, and they need new, adapted ways of being managed; this is why the NoSQL movement is gaining momentum
• Data heterogeneity implies different management requirements: this is where polyglot persistence comes in
• Consistency, availability, fault tolerance (CAP) theorem: find the balance!
• Which data store, according to its data model?
• A lot of programming is implied… Open opportunities if you're interested in this topic!

POLYGLOT PERSISTENCE
GENOVEVA VARGAS SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
[email protected]
http://www.vargas-solar.com

THIS TALK IS ABOUT
Alternatives for managing multiform and multimedia data collections according to different properties and requirements

POLYGLOT PERSISTENCE
• Polyglot programming: applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
• Polyglot persistence: any decent-sized enterprise will have a variety of different data storage technologies for different kinds of data
  • A new strategic enterprise application should no longer be built assuming relational persistence support
  • The relational option might be the right one, but you should seriously look at other alternatives
Source: M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012.

DESIGNING AND BUILDING A POLYGLOT DATABASE

OBJECTIVE
[Figure: MyNet App, MyNet DB, social networks]
• Build a MyNet app based on a polyglot database for building an integrated directory of my contacts, including their status and posts, from several social networks

[Figure: MyNet use cases: integrating posts from all networks; directory synchronization (integrating contacts' information from all social networks); contact graph traversal (building groups out of common characteristics; analysis of contacts' networks, overlapping according to interests and post topics; top 10 most popular contacts; friends network); user sessions in different social networks (synchronizing posts to all social networks; user account activity in different social networks)]

DEPLOYING A POLYGLOT DATABASE

MULTI-CLOUD POLYGLOT DATABASE
[Figure: UML class diagram of the MyNet data, shown in several panels, with REST access to MyNetContacts. Classes and attributes: Contact (idContact: Integer, lastName: String, givenName: String, society: String); Basic Info (webSite: URI, socialNetworkID: URI); Phone (number: String, type: {pers, prof}); Email (email: String); Address (street: String, number: Integer, City: String, Zipcode: Integer); Group (groupName: String); Post (postID: Integer, timeStamp: Date, geoStamp: String); Content (contentID: Integer, text: String, image: Jpeg, video: Avi). Associations: hasBasicInfo, hasMobilePhones, hasOtherPhones, hasEmail, hasAddress, isContactOf, isComposedOf, publishesPost, hasContent.]
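The MyNet objective, one integrated directory backed by several kinds of stores, can be illustrated with a small JavaScript sketch (JavaScript being the language of the MongoDB shell queries earlier). This is a hypothetical, in-memory illustration only. The classes KeyValueStore, DocumentStore and GraphStore are stand-ins invented for this example, not the API of any real product, and the session key, user names, networks and posts are made-up sample data. The sketch sends user sessions to a key-value store and posts to a document store whose documents need not share a schema. Contacts go to a graph store with index-free adjacency, where each node holds direct references to its neighbours, so a traversal follows pointers rather than looking anything up in an index.

```javascript
// Hypothetical sketch of polyglot persistence for the MyNet example.
// KeyValueStore, DocumentStore and GraphStore are illustrative in-memory
// stand-ins, NOT the API of any real database product.

// Key-value store: opaque values addressed by a unique key (user sessions).
class KeyValueStore {
  constructor() { this.data = new Map(); }
  put(key, value) { this.data.set(key, value); }
  get(key) { return this.data.get(key); }
}

// Document store: schema-less documents addressed by a unique key,
// plus a query on document contents (a predicate over each document).
class DocumentStore {
  constructor() { this.docs = new Map(); } // plays the role of the index on the key
  insert(doc) { this.docs.set(doc._id, doc); }
  findById(id) { return this.docs.get(id); }
  find(predicate) { return [...this.docs.values()].filter(predicate); }
}

// Graph store with index-free adjacency: every node keeps direct references
// to its neighbour nodes, so traversals follow pointers, not index lookups.
class GraphStore {
  constructor() { this.nodes = new Map(); } // used only to locate a traversal's start node
  addNode(id, props = {}) {
    // props holds property-graph style properties attached to the node
    this.nodes.set(id, { id, props, neighbours: new Set() });
  }
  // Undirected "isContactOf" edge between two existing nodes.
  addEdge(a, b) {
    const na = this.nodes.get(a);
    const nb = this.nodes.get(b);
    na.neighbours.add(nb);
    nb.neighbours.add(na);
  }
  // Breadth-first traversal: sorted ids of contacts reachable within maxHops.
  reachable(startId, maxHops) {
    const start = this.nodes.get(startId);
    const seen = new Set([start]);
    let frontier = [start];
    for (let hop = 0; hop < maxHops; hop++) {
      const next = [];
      for (const node of frontier) {
        for (const nb of node.neighbours) {
          if (!seen.has(nb)) {
            seen.add(nb);
            next.push(nb);
          }
        }
      }
      frontier = next;
    }
    seen.delete(start);
    return [...seen].map((n) => n.id).sort();
  }
  // Contacts ranked by number of direct connections, ties broken by id.
  topByDegree(n) {
    return [...this.nodes.values()]
      .sort((x, y) => y.neighbours.size - x.neighbours.size || x.id.localeCompare(y.id))
      .slice(0, n)
      .map((node) => node.id);
  }
}

// Route each kind of (made-up) MyNet data to the store whose model fits it.
const sessions = new KeyValueStore(); // user sessions
const posts = new DocumentStore();    // posts integrated from all networks
const contacts = new GraphStore();    // contact graph

sessions.put("session:42", { user: "alice", network: "twitter" });

// Documents in the same collection need not share a schema.
posts.insert({ _id: "p1", author: "bob", network: "twitter", text: "Hello", tags: ["nosql"] });
posts.insert({ _id: "p2", author: "carol", network: "facebook", text: "Graphs!", geoStamp: "Grenoble" });

for (const id of ["alice", "bob", "carol", "dave"]) contacts.addNode(id);
contacts.addEdge("alice", "bob");
contacts.addEdge("alice", "carol");
contacts.addEdge("carol", "dave");

console.log(sessions.get("session:42").user);
console.log(posts.find((d) => d.network === "twitter").map((d) => d._id));
console.log(contacts.reachable("bob", 2));
console.log(contacts.topByDegree(2));
```

Each access pattern stays in the data model that favours it. Sessions are single key lookups, and posts are filtered on their contents much like db.tags.find. The "most popular contacts" and friends-of-friends questions become pointer-following traversals. In a real deployment each role would be filled by an actual product (for example, CouchDB or MongoDB for the posts), and the application would have to keep the stores synchronized.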