Cambridge University Press 978-1-107-18612-5 — Principles of Database Management Wilfried Lemahieu, Seppe vanden Broucke, Bart Baesens Index

INDEX

aborted transaction, 432
absolute address, 366
abstraction. See generalization
access category, 82
access modifiers, 59
access paths, 397
access transparency, 525
accessibility dimension, 617
accessor methods, 208
accuracy dimension, 617
accuracy ratio (AR), 687
ACID properties
  defined, 15, 452–453
  in loosely coupled systems, 535–538
  in NoSQL, 538
active DBMS, 232–236
ActiveX Data Objects (ADO), 468, 502
activity services, 607–608
actuator, 353
ADO.NET, 468–471, 502, 533–534
after images, 435
after trigger, 233
agglomerative hierarchical clustering, 693
aggregate functions, 157
aggregated data, 717
aggregation
  in EER model, 55
  mapping EER to relational, 137
  in UML, 62–63
AJAX (Asynchronous JavaScript and XML), 508–509
ALL, 178–181
allocation, 519, 523
ALTER, 155–156
alternative keys, 109
Amazon, 593
Amazon Redshift, 572, 621
Amazon Relational Database Service (RDS), 621
Amazon Web Services, 706
Amsterdam, 8
analytics
  applications, 667
  data pre-processing, 669–672
  economic perspectives
    in- versus outsourcing, 664–704
    on-premises versus cloud solutions, 705–706
    open-source versus commercial software, 706–708
    return on investment, 702–704
    total cost of ownership, 702
  improving ROI of
    cross-fertilization, 713–714
    data quality, 711–712
    management support, 712
    new data sources, 708–711
    organizational aspects, 712–713
  post-processing, 700–701
  predictive model evaluation, 689–696
  privacy and security
    accessing internal data, 717
    anonymization, 717–718
    definitions and considerations for, 714–715
    encryption, 721
    importance of, 714
    label-based access control, 719–721
    RACI matrix, 715–716
    regulations, 721–723
    SQL views, 719
  process model, 665–666
  success factors for, 701
  types of
    descriptive, 689–695
    predictive, 673–682
    social network, 695–700
analytics process model, 665–666
anonymization (data), 717–718
ANY, 178–181
Apache Flume, 621
Apache Hadoop, 631
Apache Kylin, 621
Apache Lucene, 616
Apache Spark
  background of, 652–653
  GraphX, 658
  MLlib, 656–657
  Spark Core, 653–654
  Spark SQL, 654–656
  Spark Streaming, 657–658
Apache Sqoop, 621
Apache Storm, 658
Apollo program, 97
application developer, 12
application programming interface (API)
  classification
    background, 462–463
    early binding versus late binding, 465–466
    embedded versus call-level, 464
    proprietary versus universal, 463–464
  object persistence
    Enterprise JavaBeans, 484–488
    Entity Framework, 498–499
    Java Data Objects, 495–498
    Java Persistence API, 488–494
    object-relational mapping, 483–484, 498
    SQLAlchemy, 499–502
  universal database
    ADO.NET, 468–471
    embedded API versus embedded DBMS, 480–482
    JDBC, 471–477
    language-integrated querying, 482–483
    ODBC, 466–467
    OLE DB and ADO, 467–468
    SQL injection, 477–479
    SQLJ, 479–480


architecture categorization, 30–31
architecture components
  connection and security manager, 21–22
  DDL compiler, 22
  interacting with
    DDL statements, 21
    embedded DML statements, 21
    interactive queries, 21
  interfaces, 27
  query processor, 22–25
  storage manager, 25–26
  utilities, 26
archiving, 438
arcs, 333
area under the ROC curve (AUC), 685
ASP (Active Server Pages), 506
association class, 60
association rules
  basic setting, 689–690
  defined, 689
  post-processing, 691
  support, confidence, and lift, 690–691
associations, 59–61. See also relationship type
associative query, 221
Asynchronous JavaScript and XML (AJAX), 508–509
atomic attribute type, 42
atomic literal, 217
atomic search key, 402–404
atomicity property, 15, 452, 530–532
attribute type
  defined, 40
  in ER model, 42–43
  in file organization, 362
  in index creation, 400
  relationship, 46
attributes, 57
authorization identifier, 150
availability, 539
AVG, 162
Axibase, 342

B+-trees, 378, 388–389, 402–404
Bachman diagram, 98
backup, 438
backup and recovery utility
  and data availability, 423–425
  as database advantage, 15
  defined, 27
Baesens, B., 82, 85
bag. See multiset
BASE principle, 312
BASE transactions, 540
Bayer, Rudolf, 388
BayesDB, 342
Bean-Managed Persistence (BMP), 488
before image, 435
before trigger, 233
begin_transaction instruction, 432
behavior (OO), 244
BETWEEN, 159
BFI, 384
bidirectional association, 61
Big Data
  Apache Spark
    background of, 652–653
    GraphX, 658
    MLlib, 656–657
    Spark Core, 653–654
    Spark SQL, 654–656
    Spark Streaming, 657–658
    outlook, 621
  defined, 627
  Hadoop
    definition and design, 630
    history of, 630–631
    SQL, 643–652
    stack, 631–643
  scope of
    value, 627–629
    variety, 627–629
    velocity, 627–629
    veracity, 627–629
    volume, 627–629
BigQuery ETL, 621
binary large object (BLOB), 155–156, 247, 360–361
binary relationship type
  cardinalities, 45
  defined, 44–45
  mapped to a relational model, 122–127
  and ternary types, 48–50
binary search, 364
binary search trees, 385–386
biometric data, 3
bitmap index, 383
BLOB (binary large object), 155–156, 247, 360–361
block, 397
block-level I/O protocols, 415
block pointer, 371
blockchains, 524
blocking factor (BF), 361
bootstrapping, 683–684
Boyce, Raymond, 120
Boyce–Codd normal form (BCNF), 119–120
Brewer, Eric, 312–313, 539
B-tree, 378, 386–388
bucket, 365, 368–369
buffer manager, 26
business activity monitoring (BAM), 593
business continuity
  contingency planning, recovery point and recovery time, 398–421
  defined, 421
business intelligence (BI). See also decision-making
  defined, 572
  hybrid OLAP, 575
  multidimensional OLAP, 574–587
  on-line analytical processing, 574
  operational BI, 592
  pivot tables, 573
  query and reporting, 573–587
  relational OLAP, 575
business process
  defined, 601
  in database design, 38
business process integration
  data and process integration in, 606–610
  defined and modeling, 601–602
  managing dependencies, 604–606
  manual processes, 602–604

CallableStatement, 475–476
call-level APIs, 464
[…]
  defined, 108
  in file organization, 362
  in index creation, 400
canonical form, 526
CAP theorem, 312–313, 539
Capability Maturity Model Integration (CMMI), 619–620
cardinalities. See also multiplicities
  CODASYL model, 101
  ER model, 45
Cartesian product, 108
cascading rollback, 447–448
catalog
  and role, 80
  data types in, 401
  defined, 10–11


categorization
  based on architecture, 30–31
  based on data model, 28–30
  based on degree of simultaneous access, 30
  based on usage, 31–32
  in EER model, 54–55
  mapping EER to relational, 136–137
central processing unit (CPU), 352
central storage, 352
centrality metrics, 696–698
centralized DBMS architecture, 459–460
centralized system architecture, 30
chaining, 370
changeability property, 65
changed data capture (CDC), 598
character large object (CLOB), 247, 360–361, 610
CHECK constraint, 151
checkpoints, 435
Chen, Peter Pin-Shan, 40
choreography, 604–606
churn prediction, 667
class, 57
class diagram, 58
class invariant, 65
classification
  defined, 673
  performance measures for, 684–687
classification accuracy, 685
cleansing, 566
client–server DBMS architecture, 31, 459–460
client-side scripting, 507–508
CLOB (character large object), 247, 360–361, 610
cloud DBMS architecture
  analytics, 705–706
  data in the, 600–601
  data warehousing in, 572
  defined, 31
  tiered system architectures, 462
cloud storage, 421
cloud-based solutions, 705–706
CLUSTER, 398
clustered index, 373–374, 398
clustering
  defined, 423, 692–693
  hierarchical clustering, 693–695
  K-means, 695
CODASYL model
  defined, 97
  key building blocks, 97–101
Codd, Edgar F., 104–105, 120
coefficient of determination, 688
collection literal, 218
collection types (OO), 245–247
collision, 366
[…] constraints, 151
column value, 399
column-oriented DBMS, 331–332
combination notation, 404
combined approach, 270–271
commercial analytical software, 706–708
committed, 432
common gateway interface (CGI), 504–507
Common Language Runtime (CLR), 468–471
compatibility matrix, 445
compensation-based transaction models, 535–538
completeness constraint, 53
completeness dimension, 82–84
composite aggregation, 62–63
composite attribute type, 42
composite key, 362
comprehensibility, 689
conceptual data model
  advantages of, 13
  defined, 9
  in design phase, 39
  EER, 52–57
  ER, 40–52, 121–133
  not stored in catalog, 10
  physical design architecture, 357–358
  UML class diagram, 57–66
conceptual/logical layer, 10
[…]
  defined, 6, 14–15
  in distributed databases, 528–534
  locking protocol, 444–452
  multi-version, 541–542
  optimistic and pessimistic schedulers, 443–444
  problems, 439–442
  schedule and serial schedule, 442
  serializable schedules, 442–443
  in transactions, 431
confidence, 690
conformed dimensions, 568–569
connection manager, 21–22
Connection object, 477
connectivity, 414
conservative 2PL, 446
consistency
  in CAP theorem, 539
  eventual, 312–313, 540
  quorum-based, 542–544
consistency dimension, 82–84
consistency in […], 300–301
consistency property, 15, 452
consistent hashing, 309–310, 538
constructor, 210
container managed relationships (CMR), 488
container-managed persistence (CMP), 488
contextual category, 82
contingency plan, 398–421
Control Objectives for Information and Related Technology (COBIT), 620
correlated nested queries, 175–178
cost-based optimizer, 400
COUNT, 161, 222–223
credit scoring models, 667
cross-[…], 573
cross-validation, 682–683
CRUDS functionality, 608
CUBE, 577–578
cube (three-dimensional), 575
cumulative accuracy profile (CAP), 686–687
[…] mechanism, 474
customer relationship management system (CRM), 5, 628
customer segmentation, 667
cutoff, 685
Cutting, Doug, 630
cylinder, 354
Cypher
  in graph-based database, 334
  overview of, 335–341

data access request, 717
data accessibility, 84
data accuracy, 82
Data as a Service (DaaS), 599–601
data auditing services, 610
data availability, 423–425
data cleansing services, 609
data completeness, 617
data consistency, 617
data consolidation, 593–595
data definition language (DDL), 12
data definitions, 21
  and DDL compiler, 22
  versus actual data, 8–9


data dependency
  defined, 604
  managing, 604–610
data enrichment services, 609
data event services, 610
data federation, 595–596
data flow, 607
data governance, 85–86, 712–713
data in the cloud, 600–601
data independence, 12
data integration
  in business process integration, 606–610
  defined, 591
  outlook, 621
  searching unstructured documents
    enterprise search, 616–617
    full-text, 610–611
    indexing, 611–613
    web search engines, 613–616
data integration pattern, 593
data integrity rules, 14
data item, 357
data lake, 571–572
[…], 608
data localization, 526
[…]
  in catalogs, 401
  data quality, 81–85
  defined, 79
  governance, 85–86
  and metadata catalogs, 80–81
  roles in, 86–88
Data Management Body of Knowledge (DMBOK), 620
Data Management Maturity Model, 619–620
data manipulation language (DML)
  declarative, 22–25
  defined, 12
  procedural, 22–23
[…]
  as smaller […], 567–569
  virtual, 569–570
data model
  categorization
    extended relational, 29
    hierarchical, 28
    network, 28
    not-only SQL, 30
    object-oriented, 28
    object-relational, 29
    relational, 28
    XML, 29
  defined, 9–10
data needs convergence, 591–593
data owner, 87
data pointers, 386
data pooling firms, 709
data pre-processing
  defined, 669
  denormalization, 669–670
  exploratory analysis, 671
  missing values, 671–672
  outlier detection and handling, 672
  sampling, 670
data profiling services, 609
data projections, 626
data propagation
  enterprise application integration, 596–597
  enterprise data replication, 597
data providers, 469
data quality (DQ)
  in analytics, 711–712
  and data governance
    Capability Maturity Model Integration, 619–620
    Control Objectives for Information and Related Technology, 620
    Data Management Body of Knowledge, 620
    Information Technology Infrastructure Library, 621
    Total Data Quality Management, 619
  defined, 80–81
  dimensions, 81–84
  master data management, 617–618
  problems, 84–85
data redundancy, 14, 311, 438
data replication, 311
data scientist
  and analytics processing, 666
  defined, 88
  job profile, 668–669
data security, 15
data service composition, 599
data services, 608
data silo, 591
data stewards, 87
data striping, 411–412
[…] services, 610
data type, 359
[…], 598–599
data warehouse
  data marts, 567–569
  definition, 553–554
  ETL process, 565–567
  Hive, 649–652
  largest, 554
  most popular vendors, 565
  […], 571
  schemas
    fact constellation, 557
    snowflake, 556
    star, 555–556
  traditional set-up, 592
  versus data lakes, 571–572
  virtual, 569–570
data adapter method, 470–471
database, 4
database access in the World Wide Web
  client-side scripting, 507–508
  common gateway interface, 504–507
  JavaScript, 508–509
  JSP, ASP, and ASP.NET, 506
  original web server, 504
  REST-based web services, 509–511
  Simple Object Access Protocol, 509–511
database access methods
  atomic key index searches, 402–404
  full table scan, 408
  index-only access, 408
  multiple index and multicolumn index search, 403–407
  query optimizer functioning, 400–402
[…] (DBA)
  and analytics processing, 666
  data management role, 87
  defined, 11
  interaction tools, 21
  privileges, 191–192
database approach, 6–8
database design phases, 38–39
database designer, 11, 87
database functionality, 422–423
database languages, 12
database management, 12–15
database management system (DBMS)
  architecture of, 20–27
  categorization of
    based on architecture, 30–31
    based on data model, 28–30
    based on degree of simultaneous access, 30
    based on usage, 31–32
  definition, 5
  market value of, 5
  object-oriented, 207–226


database management system architecture, 20–27
[…]
  advantages of, 13
  versus instances, 8–9
database schema, 8–9
database schema issues in data warehousing
  dimension table optimization, 559–560
  granularity, 558–559
  factless fact table, 559
  junk dimensions, 560
  outrigger table, 561
  rapidly changing dimension, 563–565
  slowly changing dimensions, 561–563
  surrogate keys, 557–558
database state, 8
database system, 12–15
  definition, 5
  elements, 8–10
database system architecture, 459–461
database system types
  extended relational, 231–249
  legacy, 93–101
  object-oriented, 207–226
  relational, 104–137
  SQL, 147–194
  XML, 255–293
database technology applications, 3–4
database user types, 12
DataNodes, 632–634
data-oriented approach, 270
dataset, 357
Davenport, T.H., 712
DBCLOB (double byte character large object), 247
DDL compiler, 22
DDL statements, 21
deadlock, 448
  detection and resolution, 448–449
  prevention, 449
decision making, 552–553. See also business intelligence
decision phase, 531
decision support systems (DSS), 552
decision trees
  defined, 677
  properties of, 680
  regression trees, 680–681
  splitting decision, 677–679
  stopping decision, 679–680
declarative DML, 22–25
deduplication, 567
deep equality, 217
deferred update, 437
degree (of relationship type), 44
Dejaeger, K., 82
DELETE, 185–186, 474
deletion anomaly, 113
delimiters, 360
delineation, 432
Delta Airlines, 225
dendrogram, 694
denormalization, 669–670
dense indexes, 371
dependency
  existence, 45, 47
  full functional, 117
  managing, 604–606
  multi-valued, 120
  transitive, 118
  trivial functional, 119
  in UML, 66
dependent data marts, 568
derived attribute type, 43
derived fragmentation, 523
descriptive analytics
  association rules
    basic setting, 689–690
    defined, 689
    post-processing, 691
    support, confidence and lift, 690–691
  clustering, 692–695
  goal of, 689
  sequence rules, 691–692
detection (deadlock), 449
dicing, 577
dimension reduction, 710
dimension table optimization, 559–560
dimensions
  accessibility, 84
  completeness, 82–84
  conformed, 568–569
  consistency, 82–84
  junk, 560
  mini, 563
  rapidly changing, 563–565
  slowly changing, 561–563
  table optimization, 559–560
direct attach, 414
direct file organization, 365–370
directly accessible storage devices (DASDs), 353
directly attached storage (DAS), 416, 419
directory, 376
dirty read problem, 440–441
disaster tolerance, 421, 424–425
discretization, 717
disjoint specialization, 52
disjointness constraint, 52
disk arrays, 411–413
disk blocks, 354, 361
disk mirroring, 412, 438
dissemination, 308
DISTINCT, 170
distinct data type, 238–252
distributed 2PL, 529–530
[…] systems
  architectural implications, 518–519
  blockchains, 524
  defined, 517–518
  fragmentation, 520–523
  metadata allocation, 524
  replication, 523–524
distributed query processing, 525–528
divisive hierarchical clustering, 693
DML compiler
  in database access, 401
  defined, 22–25
document metadata, 613
document stores, 315
Document Type Definition (DTD), 260
document-oriented approach, 270
DOM API, 267
domain
  defined, 41
  relational database, 106–108
  in SQL, 150–151
  in UML, 59
double byte character large object (DBCLOB), 247
doubling amount, 677
DQ frameworks, 82
drill-across, 576
drill-down operator, 576
drill-up operator, 575
Driver Manager, 471–473, 479
DROP, 155–156
dumb client architecture, 460
dummy record type, 99–100
durability property, 15, 453
dynamic binding, 212
dynamic hashing, 370
dynamic random access memory (DRAM), 356
dynamic SQL, 466

early binding API, 465, 502
eBay, 629


edge, 333, 696
efficiency, 399
Elasticsearch, 616
electronic vaulting, 424
ELK stack, 616
embedded API, 464, 502
embedded DBMSs, 480–482
embedded DML statements, 21
embedded documents, 320
embedded identification, 359
encapsulation, 208
encryption, 721
end_transaction, 432
Enhanced Entity Relationship Model (EER)
  aggregation in, 55
  categorization in, 54–55
  designing a, 56–57
  examples of, 56
  foreign keys, 109
  mapped to a relational model, 133–137
  metadata modeling, 81
  and relational model, 106
  specialization and generalization in, 52–54
  SQL for metadata management, 192
  versus UML, 66
ensemble methods, 681–682
Enterprise Application Integration (EAI), 284, 596–597
Enterprise Data Replication (EDR), 597
Enterprise Information Integration (EII), 595–596
Enterprise JavaBeans (EJB), 484–488, 502
enterprise resource planning (ERP), 628
enterprise search, 616–617
enterprise storage subsystems
  directly attached storage (DAS), 416
  iSCSI/Storage over IP, 419–421
  NAS gateway, 418–419
  network attached storage (NAS), 417–418
  overview, 414–416
  storage area network (SAN), 416–417
Entity Relationship model (ER)
  attribute types, 40–41
  domains, 41
  entity type, 40
  examples of, 50–51
  history of, 40
  mapped to a relational model, 121–133
  relationship types, 43–46
  temporal constraints of, 51–52
  ternary relationship types, 48–50
  weak entity types, 46–47
entity type
  defined, 40
  mapped to a relational model, 121–122
Entity Manager, 492–493
Ethernet
  defined, 415
  iSCSI (internet SCSI), 419–420
  NAS, 417–418, 420
ETL (extract, transform, load), 565–566, 593–595. See also near real-time ETL
Event Store, 342
eventual consistency, 312–313, 318–319, 540
EXCEPT, 183–184
exclusive lock, 445
ExecuteReader method, 470
ExecuteScalar method, 470
existence dependency
  cardinality, 45
  and weak entity type, 47
EXISTS, 181–182
experts, 666
explicit networks, 708–709
exploratory analysis, 671
extended relational DBMS (ERDBMS)
  active RDBMS extensions, 232–236
  defined, 29
Extensible Markup Language (XML), 256–259
Extensible Stylesheet Language (XSL), 263–266
external data model
  defined, 10
  SQL having, 150
  SQL views, 188–190
external scalar function, 241
external table function, 241
extraction strategy, 566
extraction transformation and loading process (ETL), 565–567

Facebook
  API, 463
  as Big Data, 628
  Hive, 649
  Presto, 652
fact constellation, 557
fact table granularity, 558–559
factless fact table, 559
failover, 416
failover time, 438
failure detection, 308
failure types, 436
fat client variant, 31
fat server variant, 31
featurization, 699–700
federated database, 519, 525
federated DBMS, 31
Fibre Channel (FC), 415
field, 357
file-level I/O protocols, 415
file system, 418
file-based approach
  definition, 5–6
  record organization, 359
fill factor, 389
filter factor, 401
filters, 316–320
first normal form (1 NF), 115–117
5 Vs of Big Data, 627–629
FlockDB, 334
FLWOR, 280–282
[…]
  defined, 109–110
  mapped to a relational model, 129–130
  and SQL constraints, 154–155
formatting rules, 566
fourth normal form (4 NF), 120
fragment query, 526
fragmentation, 520–523
fragmentation transparency, 524
fragments, 519
fraud detection, 667
free-form language, 148
FROM, 157
full backup, 439
full functional dependency, 117
full outer join, 170–172
full table scan, 408
full-text
  indexing, 611–613
  searching, 610–611
full-text search (XML), 280
functional dependency, 114–115
fuzzy logic, 612

gain, 678
galaxy schema, 557


garbage collection, 541
garbage in, garbage out (GIGO), 81, 572, 711
General Data Protection Regulation (GDPR), 722
generalization
  defined, 52
  in UML, 62
geographical information systems (GIS)
  applications, 4
  as file-based approach, 5
global deadlock, 530
global […], 526
Google
  consistency, 314
Google Cloud Dataflow, 621
Google File System, 630
Google Trends, 616
grain. See granularity
granularity, 558–559
graph theory, 333
graph-based databases, 333, 709
GraphX, 658
GROUP BY, 163–164, 170
GROUPING SETS, 579–580

Hadoop
  and Big Data, 621
  definition and initial design, 630
  history of, 630–631
  stack
    distributed file system, 631–635
    MapReduce, 635–641
    pure form, 631
    YARN, 641–643
Hadoop Common, 631
Hadoop Distributed File System (HDFS), 631–635
hard disk backup, 424
hard disk controller, 353
hard disk drive (HDD)
  failures of, 631
  internals of, 353–355
  as secondary storage device, 352
Harris, J.G., 712
hash file organization, 365–370
hash function, 304–305
hash indexes, 383
hash join, 410
hashing, 362
HAVING clause (SQL DML), 163–164
HBase, 644–648
heap file, 363
Hibernate, 489
hierarchical clustering, 693–695
hierarchical DBMS, 28
hierarchical model, 93–97
Hive, 649–652
horizontal fragmentation, 521, 538
horizontal scaling
  defined, 301
  in distributed databases, 519
  and NoSQL databases, 305–308
HTML
  client-side scripting languages, 507
  as original web server language, 504
  and XML, 263–266
  versus XML, 256–257
  web search engines and, 615
HTTP, 506
  definition, 285
  and REST, 288–289
hybrid OLAP (HOLAP), 575

I/O, 353, 414–416
I/O boundary, 353
immediate update policy, 438
immutable object identifier (OID), 58
impedance mismatch problem, 24
implementation, 209
implicit networks, 708–709
impurity, 677–679
IN, 159
inconsistent analysis problem, 441–442
incremental backups, 439
independent data marts, 586
index, 190–191
index design, 398–400
index entry, 371
index-only access, 408
index search, 402–404
index spaces, 397
indexed sequential file organization
  clustered indexes, 373–374
  defined, 370
  multilevel indexes, 374–375
  primary indexes, 373
  terminology of, 370–385
indexer, 613
indexing, 362, 611–613
information analyst, 86
information architect
  data management role, 86
  defined, 11
information hiding, 58, 209
Information Technology Infrastructure Library (ITIL), 621
Infrastructure as a Service (IaaS), 600–601
inheritance
  at data type level, 242–243
  defined, 210
  in EER model, 54
  at table type level, 243–244
  in UML, 58
in-memory DBMS, 31
Inmon, Bill, 553, 569
inner join, 166–171
inner table, 409–410
INSERT, 185, 474
insertion anomaly, 112
insourcing, 664–704
integrator standard, 620
integrity constraints, 314–315
integrity rules, 14
intention exclusive lock (ix-lock), 451–452
intention lock, 451
intention shared lock (is-lock), 451–452
inter-query parallelism, 397, 519
interactive queries, 21
interface, 27, 209
internal data model
  defined, 10
  in design phase, 40
  physical database design facilitates, 356–358
  and physical database organization, 396
  and SQL, 149
  SQL indexes, 190–191
internal layer, 10
internal representation format, 25
Internet of Things (IoT), 4, 627–628
INTERSECT, 183–184
intersection, 405
intervals, 370
intra-query parallelism, 397, 519
intrinsic category, 82
inverted file
  characteristics, 382
  and database access, 370
  defined, 380
iSCSI (internet SCSI), 419–420
isolation levels (locking protocol), 449–450
isolation property, 15, 452


Java (programming language)
  hash map, 304
  HDFS, 631, 634
  and MapReduce, 636–641
  MongoDB, 321–330
  NoSQL databases, 316–320
  object-oriented paradigm, 223–225
  popularity of, 148
  and SQL, 148
Java applets, 507
JavaBean, 484–488
Java Data Objects (JDO), 495–498, 502
Java DataBase Connectivity (JDBC), 471–477, 511
Java Persistence API, 489–494, 502
JavaScript, 507–510
JavaScript Object Notation (JSON), 290–292
JavaServer Pages (JSP), 486, 506
join condition, 408–409
join index, 384
join queries
  defined, 166
  in index creation, 399
  in physical database organization, 408–410
  OQL, 222
jOOQ, 482
JSON standard, 316
JSONB, 331
junk dimension, 560

key attribute type, 42
key performance indicators (KPIs)
  in business intelligence, 592
  defined, 12–16
  monitoring of, 27
keys, 108–110
key-to-address transformation, 365–368
key–value stores, 304
keyword-based search (XML), 280
Kibana, 617
Kimball, Ralph, 569
K-means clustering, 695

Label-Based Access Control (LBAC), 719–721
LAN-free backup, 416–417
language-integrated querying, 482–483
large objects (LOBs), 247
late binding, 465–466, 502
latency, 354, 591–593
left outer join, 170–172
legacy, 93
legacy databases
  CODASYL model, 98–101
  hierarchical model, 93–97
legacy file-based system, 6
Lemahieu, W., 82
lift, 690
lift curve, 686
LIKE, 159
linear decision boundary, 676
linear list, 375–377
linear regression, 673–675
linear search, 362
Linear Tape-Open (LTO), 424
linked data, 283
linked list
  defined, 370
  represented as tree structure, 378
LinkedIn, 628
LINQ, 483
Lismont, J., 713
list data organization, 375–378
lists, 360, 375
literal, 217
loading factor, 368
loading utility, 26
local area network (LAN), 519
local query optimization, 526
location transparency, 524
lock manager, 26
lock table, 445
locking, 444–446
locking protocol
  cascading rollbacks, 447–448
  in concurrency control, 444
  deadlocks, 448–449
  defined, 26
  isolation levels, 449–450
  lock granularity, 450–452
  purposes of, 444–446
  Two-Phase Locking Protocol, 446–448
log records, 435
logfile, 435
logical data independence, 12
logical data model
  defined, 9–10
  in design phase, 39–40
  physical design architecture, 357–358
  SQL having, 150
  transparency in, 524
logistic regression, 675–677
Logstash, 616
long-running transactions, 534–535
long-term locks, 449
loosely coupled systems, 535
Lorenz curve, 686
lost update problem, 440, 447, 450

macroeconomic data, 709
mainframe architecture, 460
manual failover, 422
MapReduce
  and Hadoop, 631, 635–641
  and Hive, 651–652
  innovative aspects of, 321–330
  parallelization in, 630
marketing analytics, 667
master data management (MDM), 617–618
MATCH, 335–341
mathematical notation, 404
maturity (analytics applications), 713–714
McCreight, Edward, 388
mean absolute deviation (MAD), 688
mean squared error (MSE), 688
media failure, 436
media recovery, 438–439
member record type, 98
membership protocol, 308
Memcached, 306–308, 353
message-oriented middleware (MOM), 284–285
metadata
  and catalogs, 80
  in database approach, 6
  distribution and replication of, 524
  document, 613
  modeling, 80–81
  NameNode, 632
  semantic, 567
  SQL for, 192–193
  structural, 567
metadata services, 610
metamodel, 80–81
method overloading, 209
method overriding, 211–212
metrics (social network), 696–698
Microsoft
  ADO.NET, 468–471, 502, 533–534
  AJAX, 508–509
  ASP.NET, 506
  LINQ, 483
  ODBC, 466–467
  OLE DB, 467–468, 502
  purchase of LinkedIn, 629
mini-dimension table, 563
mirroring, 424


misclassification rate, 685
missing values, 671–672
mixed file, 357
mixed fragmentation, 521–522
MLlib, 656–657
mobile DBMSs, 32
modulo (mod), 366–367
Moges, H.T., 82
MongoDB
  complex queries and aggregations, 320–330
  eventual consistency, 318–319
  filters and queries, 316–318
  items with keys, 316
Monsanto, 334
Morison, R., 712
multi-user database, 430
multicolumn index, 382–383, 403–407
multidimensional DBMS (MDBMS), 574–587
multidimensional OLAP (MOLAP), 574–587
multilevel indexes, 374–375, 384–385
multimedia data
  applications, 3–4
  technology for, 32
multimedia DBMSs, 32
Multiple Granularity Locking Protocol (MGL protocol), 451–452
multiple indexes, 403–407
multiplicities, 60. See also cardinalities
multi-user system, 30
multi-valued attribute type
  defined, 43
  mapped to a relational model, 130–131
multi-valued dependency, 120
multi-version concurrency control, 541–542
MySQL (interactive environment), 147–148

named type, 239
NameNode server, 631–634
namespace, 266–267
n-ary relationship, 129–130
NAS gateway, 418–419
natural key, 717
navigational query, 221
near real-time ETL, 598. See also ETL
Neo4j (graph-based database), 334, 658
nested queries, 172–175
nested-loop join, 409–410
Netflix, 593
network attach, 414
network attached storage (NAS), 417–418, 518
network DBMSs, 28
network sockets, 462
neural networks, 681–682
NewSQL, 343
nodes, 696
  defined, 333
  tree data structures, 386–389
nonlinear list, 375
nonrepeatable read, 442
non-volatile, 553–554
normalization
  anomalies in unnormalized relational model, 113–114
  defined, 111–112
  functional dependencies, 114–115
  informal guidelines, 114
  prime attribute type, 115
normalization form tests
  Boyce–Codd normal form, 119–120
  first normal form, 115–117
  fourth normal form, 120
  second normal form, 117–118
  third normal form, 118–119
NoSQL databases
  and Big Data, 31
  blended systems, 343
  column-oriented databases, 331–332
  and consistency, 301–302
  data distribution/transaction management
    BASE transactions, 540
    CAP theorem, 539
    horizontal fragmentation and consistent hashing, 538
    multi-version concurrency control and vector clocks, 541–542
    quorum-based consistency, 542–544
  defined, 30
  graph-based databases
    and Hbase, 644
    Cypher query language, 335–341
    defined, 333–335
  fragmentation, 520–521
  inconsistency, 440
  key-value stores, 304
    consistent hashing, 309–310
    eventual consistency, 312–313
    hash function, 304–305
    horizontal scaling, 305–308
    integrity constraints and querying, 314–315
    replication and redundancy, 311–312
    request coordination, 308–309
    stabilization, 314
  modern background, 302–304
  movement emergence, 302
  REST-based web services, 509–511
  tuple and document stores
    complex queries and aggregations, 320–330
    defined, 315–316
    filters and queries, 316–320
    items with keys, 316
    SQL interface, 330–331
NOT EXISTS, 182
NOT NULL, 151
n-tier DBMS architecture, 31, 461
Nutch project, 630

object, 57
object constraint language (OCL), 64–66
Object Data Management Group (ODMG), 217
object definition language (ODL), 218–221
object equality, 217
object identifier (OID), 216–217
object identity, 217
Object Management Group (OMG), 57
object manipulation language (OML), 223
object model, 217–218
object persistence, 483–484
  basic principles of, 214
  serialization in, 214–215
object query language (OQL), 221–223
object-relational mapping (ORM), 226–227, 484, 501–502
object storage, 420
object-oriented database management systems (OODBMS), 208–227


defined, 216 Spark, 652 defined, 16 evaluation of, 225–227 operational BI, 592 in modern DBMSs, 397 identifiers, 216–217 operational data store (ODS), 571 persistence as NoSQL niche database, 303 operational efficiency, 689 by class, 214 standard operational level, 552 by creation, 214 defined, 217 operations, 57 by inheritance, 214 language bindings, 223–225 optimistic protocol, 443–444, 534–536 by marking, 214 object definition language, orchestration pattern, 604–606 by reachability, 214 218–221 ORDER BY, 163–164 independence, 214 object model, 217–218 ordinary least squares (OLS), 674 orthogonality, 214 object query language, 221–223 outer joins (SQL DML), 170–172 persistent object, 214 object-oriented DBMS (OODBMS). See outer table, 409–410 persistent storage media, 352 object-oriented database outliers, 672 personal computer (PC), 459 management systems outrigger table, 561 pessimistic protocol, 444, 533 object-oriented paradigm (OO) outsourcing, 664–704 phantom reads, 442 advanced concepts of, 209–213 overfitting, 679 PHP, 505 basic concepts of, 208–209 overflow physical data independence, 12 defined, 207 and database access, 408 physical database, 357 object persistence in, 214–215 defined, 366 physical database design, 356–358 object-relational DBMS (ORDBMS) retrieval of, 368 physical database organization behavior, 244 overflow area, 369 business continuity, 421–425 collection types, 245–247 overflow handling technique, database access methods, 400–408 defined, 29, 236 369–370 disk arrays and RAID, 411–413 inheritance, 242–244 overlap specialization, 53 enterprise storage subsystems, large objects, 247 owner entity type, 46 413–421 polymorphism, 245 owner record type, 98 join implementations, 408–410 recursive SQL queries, 249–253 physical design architecture, user-defined functions, 240–241 page, 397 356–358 user-defined types, 236–240 parallel databases, 519 record organization, 359–361 odds ratio, 676 parent–child relationship records and files, 
396–400 OLE DB, 467–468, 502 defined, 94–95 storage hierarchy, 352–353 ON DELETE CASCADE, 154 foreign keys, 110 physical file, 357 ON UPDATE CASCADE, 154 Parquet, 332 physical file organization one-way linked list, 375–376 partial categorization, 55 bitmap index, 383 on-line analytical processing (OLAP), partial participation, 45 hash index, 383 31 partial shredding, 270–271 heap file, 363 on-line DBMS partial specialization, 53 indexed sequential file, 370–375 (OLTP), 31 participants, 528 join index, 384 in business intelligence, 574 PARTITION BY, 582 list data organization, 375–378 in decision-making, 552 partition tolerance, 539 random file/hashing, 365–370 operators, 575–577 partitions, 370 secondary indexes and inverted files, SQL queries, 577–583 passive, 232 379–384 on-premises analytics, 705–706 Pearson correlation coefficient, 687 sequential file, 363–365 opaque data type, 238 performance measures terminology, 362–363 open addressing, 369 classification models, 684–687 Pig platform, 648–649 open API, 463 comprehensibility, 688 pivot or cross-table, 573 Open Database Connectivity (ODBC), operational efficiency, 689 Platform as a Service (PaaS), 600 464, 466–467 regression models, 687–688 pointers open-source analytical software, performance monitoring utilities block, 371 706–708 and CAP theorem, 539 data, 386 open-source DBMS defined, 27 defined, 357–358, 360 defined, 32 in modern DBMSs, 397 record, 371 and Hadoop, 630–631 performance utilities tree, 386


point-of-sale (POS) query by example (QBE), 573 record-at-a-time DML, 22–23 application (POS), 552 query cardinality (QC), 401 recovery, 431 defined, 4 query decomposition, 526 recovery facilities, 15 as on-line transactional processing, 31 query executor, 25 recovery manager, 26 polymorphism, 212, 245 query optimizer recovery point objective (RPO), 422 PostgreSQL, 330 in database access, 400–402 recovery time objective (RTO), power curve, 686 defined, 25 421–422 precedence graph, 443 and index design, 398 recovery utilities, 27 precision, 685 query parser, 25, 401 REDO, 436–438 predictive analytics query predicate, 401 redundancy decision trees, 677–681 query processor and clustering, 311 evaluating, 689–696 in database access, 401 defined, 95 goal and types of, 673 defined, 22 in disk arrays, 412 linear regression, 673–675 in a distributed database, Redundant Array of Independent Disks logistic regression, 675–677 525–528 (RAID), 411–413, 422 PreparedStatement interface, 475 DML compiler, 22–25 referential integrity constraints Presto, 652 function of, 25 ALTER command, 155–156 primary area, 369 optimization of, 25 DROP command, 155–156 primary copy 2PL, 529 query executor, 25 SQL having, 154–155 primary file organization methods, 362 query rewriter, 24, 401 regression, 673 primary index QueryDSL, 483 regression tree defined, 373, 398 querying, 314–320 defined, 680–681 range query, 404 quorum-based consistency, performance measures, 687–688 primary key 542–544 relation, 105, See also set type defined, 109 relational DBMS (RDBMS) in file organization, 362 RACI matrix, 715–716 active extensions, 232–236 in index creation, 400 RAID controller, 411 and SQL, 147–149 NoSQL databases, 316 RAID levels, 412–413 defined, 28, 147 primary site 2PL, 529 RAIN (redundant array of independent relational model, 105–112 primary storage, 352 nodes), 420 SQL, 147–149 prime attribute type, 115 random file organization relational databases privacy (in analytics) defined, 365 basic concepts, 
105–106 anonymization (data), 717–718 efficiency factors, 368–370 compared to NoSQL, 303 defined, 714 key-to-address transformation, constraints, 111 encryption, 721 365–368 examples of, 111–112 internal data access, 717 ranking, 580 formal definitions, 106–108 LBAC, 719–721 ranking module, 613 history of, 104–105 RACI matrix, 715–716 rapidly changing dimension, 563–565 mapping conceptual EER model to, regulations, 721–723 raw data, 6 133–137 SQL views, 719 RDF Schema, 283, 341 mapping conceptual ER model to, privilege, 191–192 read committed, 450 121–133 procedural DML, 22 read lock, 26 mathematical underpinning, 105 process engine, 602 read uncommitted, 450 normalization, 111–120 process integration read/write heads, 353 types of keys, 108–110 and data integration, 606–610 receiver operating characteristic curve versus XML databases, 271–272 defined, 591 (ROC curve), 685 relational model, 231–232 product notation, 404 recommender systems, 667 Relational OLAP (ROLAP), 575 projection, 527 record organization, 359–361 relationship, 43 proprietary API, 463–464 record pointer, 371 relationship type. See also associations proximity, 612 record type with attribute type, 46 CODASYL model, 98–101 defined, 44 qualified association, 61 defined, 94 dependency, 66 query and reporting, 573–587 variable length, 360–361 in ER model, 43–46


legacy databases, 94 scaling out, 301 self-service BI, 573 mapped to a relational model, scaling up, 301 semantic metadata, 567 122–132 schedule, 432, 442 semantic search, 282–284 relative block address, 366 scheduler, 433, 439–440 semantical rules relative location, 359 schema-aware mapping (XML), defined, 14 remote procedure call (RPC), 284 275–276 in UML, 59 reorganization utility, 27 schema-level triggers, 234 semi-structured data repeatable read, 450 schema-oblivious mapping/shredding, defined, 13 repeated group, 98 273–275 and document stores, 315–318 replicas, 311 Scheule, H., 85 sensitivity, 685 replication SCSI (Small Computer Systems sensor DBMS, 32 and data availability, 424 Interface), 414–415 sequence rules, 691–692 and distributed 2PL, 529–530 search key, 362 sequential file organization, 363–365 in distributed databases, 520, search key values, 386 sequentially accessible storage device 523–524 search tree, 385 (SASD), 353 replication transparency, 524 second normal form (2 NF), 117–118 serial schedule, 442 representation category, 82 secondary file organization methods, serial transactions, 431 representational state transfer (REST), 363 serializable, 442–443 288–289, 509–511 secondary index, 363, 380–381, 398, serializable level, 450 request coordinator, 308 404 serialization, 214–215 requirement collection and analysis, 38 secondary storage, 352 server-free backups, 416–417 resilient distributed datasets (RDDs), sectors, 354 service oriented architecture (SOA), 653–654 security (in analytics) 599 resolution (deadlock), 449 anonymization (data), 717–718 service time, 354 Resource Description Framework defined, 714–715 SET DEFAULT, 154 (RDF), 282–284 encryption, 721 SET NULL, 154 response modeling, 667 internal data access, 717 set operators (SQL DML), 183–184 response time LBAC, 719–721 set type, 97–101, See also relation defined, 354 RACI matrix, 715–716 set-at-a-time DML, 22–25, 147 as KPI, 16 regulations, 721–723 shallow equality, 217 REST 
(representational state transfer), SQL views, 719 shard, 306 288–289, 509–511 security manager, 22 sharding, 306, 538 RESTRICT, 154–155 seek time, 354 shared aggregation, 62–63 ResultsSets, 474, 476 SELECT shared and intention exclusive lock return on investment (ROI) correlated queries, 175–178 (six-lock), 451–452 in analytics, 708–714 full syntax of, 156–157 shared-disk architecture, 518 defined, 702–704 GROUP BY/HAVING clause, shared lock, 445 right outer join, 170–172 163–164 shared-memory architecture, 518 rigorous 2PL, 446 join queries shared-nothing architecture, 518 ring topology, 309 defined, 166 short-term locks, 449 risk analytics, 667 inner joins, 166–171 shredding, 270, 273–275 Roesch, D., 85 outer joins, 170–172 similarity measures, 612 roles, 44 nested queries, 172–175 simple attribute type, 42 rollback, 432 queries with aggregate functions, Simple Object Access Protocol (SOAP), roll-down, 576 161–163 285–288, 509–511 rollforward recovery, 439 queries with ALL/ANY, 178–181 simultaneous access, 30 roll-up, 575 queries with EXISTS, 181–182 single points of failure, 422 ROLLUP, 578–579 queries with ORDER BY, 165 single-user system, 30 rotational delay, 354 queries with set operators, 183–184 single-valued attribute, 43 row, See tuple SELECT/FROM subqueries, singular value decomposition (SVD), 182–183 710 sampling, 670 simple queries, 157–160 slicing, 587 SAX API (simple API for XML), 268 selective inheritance, 55 slowly changing dimensions, 561–563


Small Computer Systems Interface specialization hierarchy, 54 for metadata management, 192–193 (SCSI), 414–415 specificity, 685 in index creation, 398–400 snowflake schema, 556 spindle, 353 indexes, 190–191 SOAP (Simple Object Access Protocol), splitting decision, 677–679 and JPA, 494 285–288, 509–511 splitting up the dataset techniques, key characteristics of, 147–148 Sober (fictional taxi company) 682–684 in NoSQL queries, 330–331 analytics, 723–724 Spotify, 32, 463 OLAP queries, 577–583 background of, xxiiii SQL data types, 150–151 and ORDBMSs, 249–253 Big Data, 660 SQL injection, 477–479 popularity of, 148 data management, 88 SQL schema, 150 privileges, 191–192 data quality and governance, 622 SQL table, 150–151 in relational databases, 147 data warehousing and business SQL views, 719 search keys, 362 intelligence, 583–584 SQL/XML, 276–279 three-layer architecture, 149 database access methods, 426 SQL/XML mapping, 276–279 views, 188–190 database architecture and SQLite, 480–482 Structured Query Language (SQL) on categorization, 33 SQLJ, 479–480 Hadoop distributed transaction management, stabilization, 314 background for using, 643–644 545 star schema, 555–556 HBase, 644–648 EER model for, 67–68 starvation, 446 Hive, 649–652 file organization methods, 390 static 2PL, 446 Pig, 648–649 legacy databases, 102 static binding, 212 Structured Query Language data mapping relational models, static SQL, 466 definition language 138–139 StAX (Streaming API for XML), 269 (SQL DDL) NoSQL databases, 300 stopping decision, 679–680 example of, 151–154 SQL, 194–195 storage area network (SAN), 416–417, key concepts of, 150–151 transaction management, 453 419, 518 referential integrity constraints, UML model for, 69–70 storage devices, 352–353, 422 154–155 XML, 294–296 storage manager Structured Query Language data web-based access, 512–513 and buffer manager, 26 manipulation language social network, 696 defined, 25 (SQL DML) social network analytics and lock manager, 26 DELETE statement, 
185–186 definitions, 696–697 and recovery manager, 26 INSERT statement, 185 learning, 699–700 and transaction manager, 26 purpose of, 156 metrics, 696–698 stored data manager, 434 SELECT statement, 156–184 popularity of, 695–696 stored database, 357 UPDATE statement, 186–188 sociogram, 696–697 stored procedure, 234–236, 465 structured search, 280–282 Software as a Service (SaaS), 600 stored record, 357 subject-oriented, 553 solid state drive (SSD) stored table, 397 SUM, 162 integrated circuitry of, 355–356 strategic level of decision-making, summation notation, 404 as secondary storage device, 352 552 superkey, 108 sort-merge join, 410 streaming data, 593 supply chain management systems sourced function, 241 stretched cluster, 425 (SCM), 628 space utilization, 16 strong entity type, 46 support, 690 Spark, 652 Strong, D.M., 82 support vector machines (SVM), Spark Core, 653–654 structural metadata, 567 681–682 Spark SQL, 654–656 structured data surrogate keys, 557–558 Spark Streaming, 657–658 defined, 13 synonyms, 366 SPARQL, 284, 341 versus unstructured data, 613 syntactical rules, 14 sparse indexes, 371, 380 structured literal, 218 system database, 396 spatial DBMS, 32 Structured Query Language (SQL) system failure, 436 specialization binding in, 465–466 system recovery, 436–438 defined, 52 data definition language, 149–156 mapping EER to relational, 133–136 data manipulation language, table cardinality (TC), 401 in UML, 62 156–188 table data type, 240


table-based mapping, 272–273 logfile, 435 defined, 57 tablespace, 396–397 transparency, 524–525 dependency relationships in, 66 tactical level, 552 transaction manager, 26, 433 examples of, 64 tape backup, 423 transaction recovery OCL in, 64–66 technical key, 717 defined, 425 specialization and generalization in, template-based mapping, 278 failure types, 436 62 temporal constraints media recovery, 438–439 variables, 59 of EER model, 56–57 system recovery, 436–438 versus EER, 66 of ER model, 51–52 transaction transparency, 525 uniform distribution, 368 relational database, 111 transfer time, 354 UNION, 183–184 ternary relationship types transient object, 214 UNIQUE constraint defined, 48 transitive dependency, 118 defined, 151 in ER model, 48–50 transitive persistence, 214 in index creation, 398 text analytics, 667 transparency unique index, 399 text mining, 612 defined, 517 universal API, 463–464, 525 textual data, 710 in distributed databases, 524–525 universal data access, 468 thesaurus, 612 in transaction management, 524–525 universal data storage, 468 third normal form (3 NF), 118–119 tree data structures unnamed row type, 238 thread, 22 B+-tree, 388–389 unrepeatable read, 442 three-layer architecture binary search, 385–386 unstructured data defined, 10 B-tree, 386–388 in analytics ROI, 710 SQL having, 149 defined, 377–379 defined, 13 three-schema architecture, 10 tree pointers, 386 versus structured data, 613 three-tier architecture, 460 trigger, 232–236 UPDATE, 186–188, 474 throughput rate, 12–16 trivial functional dependency, 119 update anomaly, 113 tiered system architecture, 460–462 tuple usage categorization, 31–32 tightly coupled defined, 105 user database, 396 in distributed databases, 528 in relationships, 106–108 user interface, 27 in Hadoop, 631 SQL DML, 161–164 user management utilities, 27 primary site 2PL and primary copy tuple stores, 315 user-defined functions (UDF), 2PL, 529 Twitter 240–241 time variant, 554 API, 463 user-defined types (UDT) 
timeliness, 82 as Big Data, 628 defined, 236–237 timestamping, 444, 533 as social network, 334 distinct data type, 238–252 total categorization, 55 Two-Phase Commit Protocol (2PC), named row type, 239 total cost of ownership (TCO), 569, 702 530–532 opaque data type, 238 Total Data Quality Management Two-Phase Locking Protocol (2PL), table data types, 240 (TDQM), 85–86, 619 446–448, 529 unnamed row type, 238 total participation, 45 two-tier architecture, 460 utilities, 26–27 total specialization, 53 two-way linked list, 377 tracks, 354 type orthogonality, 214 valid, 260 transaction, 430–431 value, 629 transaction coordinator, 528 unary relationship type, 44–45, 127– value distortion, 717 transaction failure, 436 128 variable length records, 360 transaction management uncommitted dependency problem, variables, 59 ACID properties of, 452–453 440–441, 448, 537 VARIANCE, 162 compensation-based models, 534–538 UNDO, 436–438 variety, 628 concurrency control, 528–534 unidirectional association, 60 vector, 98 DBMS components, 433–435 Unified Modeling Language (UML) vector clocks, 541–542 delineating, 432–433 access modifiers, 59 velocity, 628 distributed and concurrency control, aggregation in, 62–63 veracity, 629 528–534 associations, 59–61 vertical fragmentation, 520 distributed query processing, changeability property, 65 vertical scaling, 301 525–528 classes, 58 vertices, 333


victim selection, 449 well-formed, 259 web services and databases, WHERE, 159, 221–223 289–290 defined, 10 wide area network (WAN), 519 YAML Ain’t a Markup Language, SQL, 188–190 windowing, 582 292–293 virtual child record type, 95 workflow service, 607 mapping between object-relational virtual data mart, 569–570 wrappers, 525, 569, 595 databases virtual data warehouse, 569–570 write ahead log strategy, 435 schema-aware, 275–276 virtual nodes, 311 write lock, 26 schema-oblivious, 273–275 virtual parent record type, 95 WS-BPEL, 602–604 SQL/XML, 276–279 virtual parent–child relationship type, table-based, 272–273 95–97 XML and XML DBMS namespaces, 266–267 volatile data, 4 AJAX, 508–509 processing documents, 267–269 volatile memory, 352 and JPA, 489 searching volume, 628 and relational databases, 271–272 full-text, 280 voting phase, 531 as NoSQL niche database, 303 keyword-based, 280 basic concepts of, 256–259 semantic search with RDF and wait-for graph, 449 defined, 29 SPARQL, 282–284 Wang, R.Y., 82, 88 document storage, 269–271 structured search with XQuery, weak entity type document stores, 316 280–282 defined, 46 Document Type and Schema XML element, 256 mapped to a relational model, 131–132 Definitions, 260–263 XML Schema Definition (XSD), 260 qualified associations for, 62 Extensible Stylesheet Language, XML-enabled DBMS, 271 web crawler 263–266 XPath, 256 and Big Data, 630 for information exchange XQuery, 280–282 defined, 613 JavaScript Object Notation, XSL Formatting Objects (XSL-FO), Web Ontology Language (OWL), 283 290–292 263–266 web search engines, 613–616 message-oriented middleware, XSL Transformations (XSLT), 263–266 web services, 285–288 284–285 Web Services Description Language REST-based web services, YAML Ain’t a Markup Language, (WSDL), 286 288–289 292–293, 316 WeChat, 628 SOAP-based web services, YARN (Yet Another Resource Weibo, 628 285–288 Negotiator), 631, 641–643
