File Organizations and Indexing Administrivia Review – the Big

Total Page:16

File Type:pdf, Size:1020Kb

File Organizations and Indexing Administrivia Review – the Big File Organizations and Indexing Administrivia Lecture 5 • Homework 1 Assignment now Available – Due in 2 parts: Sunday 9/14, Thursday 9/18 R&G Chapter 8 – Have to make changes to Postgres "If you don't find it in the index, look very – Don’t procrastinate carefully through the entire catalogue." -- Sears, Roebuck, and Co., Consumer's Guide, 1897 Review – The Big Picture Review – Relational Algebra • Basic operators for manipulating relations • Databases are cool – Selection ( s ) Selects a subset of rows – Theory: data modelling, relational algebra – Projection ( p ) Retains only wanted columns – Practical: storing data with disk, RAM – Cross-product ( ´ ) Combine two relations – Set-difference ( — ) Tuples in r1, but not in r2 • Today: – Union ( È ) Tuples in r1 and/or in r2 – Rename ( ? ) Change column name – Practical: File Organization, Indexing • Other Useful Operators • Next Time: – Intersection (Ç ) Tuples in both relations – Theory: Relational Calculus – Join ( ) Match common columns – Division ( / ) Tuples in A that match all in B Relation Algebra 2: Division Review: Examples of Division A/B sno pno pno • Division A / B, pno pno s1 p1 p2 p2 – columns in A must be a superset of columns in B p1 s1 p2 p4 p2 – let C be the columns in both A and B B1 both s1 p3 B2 p4 – let C be the columns that are only in A A-Only s1 p4 B3 – result has columns C A-Only s2 p1 sno – result has only rows where values in C A-Only have a s2 p2 s1 match with every value in B s3 p2 s2 sno – can be defined using other operators s4 p2 s3 s1 sno s4 p4 s4 s4 s1 A A/B1 A/B2 A/B3 1 sid bid day Review: Reserves 22 101 10/10/96 Find the names of Find the names of sailors who’ve reserved all boats Sailors who have 58 103 11/12/96 reserved all boats sid sname rating age • Uses division; schemas of the input relations to / must be carefully chosen: Sailors 22 dustin 7 45.0 31 lubber 8 55.5 r (Tempsids, (p Re serves) / (p Boats)) 58 rusty 10 35.0 sid,bid bid p (Tempsids>< Sailors) Boats bid bname color sname 101 Interlake Blue 102 Interlake Red v To find sailors who’ve reserved all ‘Interlake’ boats: 103 Clipper Green ..... / p (s Boats) 104 Marine Red bid bname=' Interlake' Review – Buffer Management and Files Today: File Organization • Storage of Data • How to keep pages of records on disk – Fields, either fixed or variable length... – Stored in Records... • but must support operations: – Stored in Pages... – scan all records – Stored in Files – search for a record id “RID” – search for record(s) with certain values • If data won’t fit in RAM, store on Disk – insert new records – Need Buffer Pool to hold pages in RAM – delete old records – Different strategies decide what to keep in pool Alternative File Organizations Indexes Many alternatives exist, tradeoffs for each: • Sometimes need to retrieve records by the values in one – Heap files: or more fields, e.g., • Suitable when typical access is file scan of all records. – Find all students in the “CS” department – Find all students with agpa > 3 • An index on a file is a: – Sorted Files: – Disk-based data structure • Best for retrieval in search key order – Speeds up selections on the search key fields for the index. • Also good for search based on search key – Any subset of the fields of a relation can be index searchkey – Search key is not the same as key • (e.g. doesn’t have to be unique ID). – Indexes: Organize records via trees or hashing. • An index • Like sorted files, speed up searches for search key fields – Contains a collection of data entries • Updates are much faster than in sorted files. – Supports efficient retrieval of all records with a given search key value k. 2 First Question to Ask About Indexes Index Classification • What selections does it support • What kinds of selections do they support? – Selections of form field <op> constant – Equality selections (op is =) • Representation of data entries in index – Range selections (op is one of <, >, <=, >=, BETWEEN) – what info is the index storing? 3 alternatives: – More exotic selections: • Data record with key value k • 2-dimensional ranges (“east of Berkeley and west of Truckee and North of Fresno and South of Eureka”) • <k, rid of data record with search key value k> – Or n- dimensional • <k, list of rids of data records with search key k> • 2-dimensional distances (“within 2 miles of Soda Hall”) – Or n- dimensional • Ranking queries (“10 restaurants closest to VLSB”) • Clustered vs. Unclustered Indexes • Regular expression matches, genome string matches, etc. • One common n-dimensional index: R-tree • Single Key vs. Composite Indexes – Supported in Oracle and Informix • Tree-based, hash-based, other – See http://gist.cs.berkeley.edu for research on this topic Alternatives for Data Entry k* in Index Alternatives for Data Entries (Contd.) • Three alternatives: · Actual data record (with key value k) • Alternative 1: · <k, rid of matching data record> Actual data record (with key value k) · <k, list of rids of matching data records> – Index structure is file organization for data records (like Heap files or sorted files). • Choice is orthogonal to the indexing technique. – At most one index on a table can use Alternative 1. – techniques: B+ trees, hash-tables, R trees, … – Saves pointer lookups – Typically, index contains auxiliary information that – Can be expensive to maintain with insertions and directs searches to the desired data entries deletions. • Can have multiple (different) indexes per file. – E.g. file sorted by age, with a hash index on salary and a B+tree index on name. Alternatives for Data Entries (Contd.) Index Classification Alternative 2 <k, rid of matching data record> • Clustered vs. unclustered: and Alternative 3 – If order of data records is the same as, or `close to’, <k, list of rids of matching data records> order of index data entries, then called clustered index. – Easier to maintain than Alt 1. • A file can be clustered on at most one search key. – Atmost one index can use Alternative 1; any others must use Alternatives 2 or 3. • Cost to retrieve data records with index varies – Alternative 3 more compact than Alternative 2, but greatly based on whether index clustered or not! leads to variable sized data entries even if search keys are of fixed length. • Alternative 1 implies clustered, but not vice-versa. – Even worse, for large rid lists the data entry might have to span multiple pages! 3 Clustered vs. Unclustered Index Unclustered vs. Clustered Indexes • Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file. – To build clustered index, first sort the Heap file (with • What are the tradeoffs???? some free space on each block for future inserts). • Clustered Pros – Overflow blocks may be needed for inserts. (Thus, order of – Efficient for range searches data recs is `close to’, but not identical to, the sort order.) – May be able to do some types of compression – Possible locality benefits (related data?) Index entries UNCLUSTERED CLUSTERED direct search for data entries • Clustered Cons Data entries Data entries – Expensive to maintain (on the fly or sloppy with (Index File) reorganization) (Data file) Data Records Data Records B+ Tree Indexes Hash-Based Indexes • Good for equality selections. Non-leaf • Index is a collection of buckets. Bucket = primary Pages page plus zero or more overflow pages. • Hashing function h: – h(r) = bucket in which record r belongs. Leaf Pages – h looks at the search key fields of r. v Leaf pages containdata entries, and are chained (prev & next) v Non-leaf pages contain index entries and direct searches: • If Alternative (1) is used, the buckets contain the data records; otherwise, they contain <key, rid> index entry or <key, rid-list> pairs. P K P K P 0 1 1 2 P 2 K m m Example B+ Tree Comparing File Organizations Root 17 • Heap files (random order; insert at eof) Entries <= 17 Entries > 17 • Sorted files, sorted on <age, sal> • Clustered B+ tree file, Alternative (1), search 5 13 27 30 key <age, sal> • Heap file with unclustered B + tree index on 39* 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* search key <age, sal> • Find 28*? 29*? All > 15* and < 30* • Heap file with unclustered hash index on • Insert/delete: Find data entry in leaf, then search key <age, sal> change it. Need to adjust parent sometimes. – And change sometimes bubbles up the tree 4 Operations to Compare Cost Model for Analysis I/O cost 150,000 times more than hash function • Scan: Fetch all records from disk – We ignore CPU costs, for simplicity • Fetch all records in sorted order B: The number of data pages • Equality search R: Number of records per page • Range selection F: Fanout of B-tree • Insert a record • Delete a record Average-case analysis; based on several simplistic assumptions. * Good enough to show the overall trends! Heap File I/O Cost of Scan all Assumptions in Our Analysis Operations records • Heap Files: Get all in sort order – Equality selection on key; exactly one match. B: Number of data pages (packed) • Sorted Files: R: Number of records per page Equality Search – Files compacted after deletions. S: Time required for equality search • Indexes: Range – Alt (2), (3): data entry size = 10% size of record Search – Hash: No overflow buckets. Insert • 80% page occupancy => File size = 1.25 data size – Tree: 67% occupancy (this is typical). Delete • Implies file size = 1.5 data size Sorted File Clustered Tree I/O Cost of Scan all I/O Cost of Scan all Operations records Operations records Get all in Get all in sort order sort
Recommended publications
  • Tuple Relational Calculus Formula Defines Relation
    COS 425: Database and Information Management Systems Relational model: Relational calculus Tuple Relational Calculus Queries are formulae, which define sets using: 1. Constants 2. Predicates (like select of algebra ) 3. BooleanE and, or, not 4.A there exists 5. for all • Variables range over tuples • Attributes of a tuple T can be referred to in predicates using T.attribute_name Example: { T | T ε tp and T.rank > 100 } |__formula, T free ______| tp: (name, rank); base relation of database Formula defines relation • Free variables in a formula take on the values of tuples •A tuple is in the defined relation if and only if when substituted for a free variable, it satisfies (makes true) the formula Free variable: A E x, x bind x – truth or falsehood no longer depends on a specific value of x If x is not bound it is free 1 Quantifiers There exists: E x (f(x)) for formula f with free variable x • Is true if there is some tuple which when substituted for x makes f true For all: A x (f(x)) for formula f with free variable x • Is true if any tuple substituted for x makes f true i.e. all tuples when substituted for x make f true Example E {T | E A B (A ε tp and B ε tp and A.name = T.name and A.rank = T.rank and B.rank =T.rank and T.name2= B.name ) } • T not constrained to be element of a named relation • T has attributes defined by naming them in the formula: T.name, T.rank, T.name2 – so schema for T is (name, rank, name2) unordered • Tuples T in result have values for (name, rank, name2) that satisfy the formula • What is the resulting relation? Formal definition: formula • A tuple relational calculus formula is – An atomic formula (uses predicate and constants): •T ε R where – T is a variable ranging over tuples – R is a named relation in the database – a base relation •T.a op W.b where – a and b are names of attributes of T and W, respectively, – op is one of < > = ≠≤≥ •T.a op constant • constant op T.a 2 Formal definition: formula cont.
    [Show full text]
  • Tuple Relational Calculus Formula Defines Relation Quantifiers Example
    COS 597A: Tuple Relational Calculus Principles of Queries are formulae, which define sets using: Database and Information Systems 1. Constants 2. Predicates (like select of algebra ) 3. Boolean and, or, not Relational model: 4. ∃ there exists 5. ∀ for all Relational calculus Variables range over tuples Value of an attribute of a tuple T can be referred to in predicates using T[attribute_name] Example: { T | T ε Winners and T[year] > 2006 } |__formula, T free ______| Winners: (name, tournament, year); base relation of database Formula defines relation Quantifiers • Free variables in a formula take on the values of There exists: ∃x (f(x)) for formula f with free tuples variable x • A tuple is in the defined relation if and only if • Is true if there is some tuple which when substituted when substituted for a free variable, it satisfies for x makes f true (makes true) the formula For all: ∀x (f(x)) for formula f with free variable x Free variable: • Is true if any tuple substituted for x makes f true i.e. all tuples when substituted for x make f true ∃x, ∀x bind x – truth or falsehood no longer depends on a specific value of x If x is not bound it is free Example Formal definition: formula • A tuple relational calculus formula is {T |∃A ∃B (A ε Winners and B ε Winners and – An atomic formula (uses predicate and constants): A[name] = T[name] and A[tournament] = T[tournament] and • T ε R where B[tournament] =T[tournament] and T[name2] = B[name] ) } – T is a variable ranging over tuples – R is a named relation in the database – a base relation • T not constrained to be element of a named relation • T[a] op W[b] where • Result has attributes defined by naming them in the formula: – a and b are names of attributes of T and W, respectively, T[name], T[tournament], T[name2] – op is one of < > = ≠ ≤ ≥ – so schema for result: (name, tournament, name2) • T[a] op constant unordered • constant op T[a] • Tuples T in result have values for (name, tournament, name2) that satisfy the formula • What is the resulting relation? 1 Formal definition: formula cont.
    [Show full text]
  • Regulations & Syllabi for AMIETE Examination (Computer Science
    Regulations & Syllabi For AMIETE Examination (Computer Science & Engineering) Published under the authority of the Governing Council of The Institution of Electronics and Telecommunication Engineers 2, Institutional Area, Lodi Road, New Delhi – 110 003 (India) (2013 Edition) Website : http://www.iete.org Toll Free No : 18001025488 Tel. : (011) 43538800-99 Fax : (011) 24649429 Email : [email protected] [email protected] Prospectus Containing Regulations & Syllabi For AMIETE Examination (Computer Science & Engineering) Published under the authority of the Governing Council of The Institution of Electronics and Telecommunication Engineers 2, Institutional Area, Lodi Road, New Delhi – 110 003 (India) (2013 Edition) Website : http://www.iete.org Toll Free No : 18001025488 Tel. : (011) 43538800-99 Fax : (011) 24649429 Email : [email protected] [email protected] CONTENTS 1. ABOUT THE INSTITUTION 1 2. ELIGIBILITY 4 3. ENROLMENT 5 4. AMIETE EXAMINATION 6 5. ELIGIBILITY FOR VARIOUS SUBJECTS 6 6. EXEMPTIONS 7 7. CGPA SYSTEM 8 8. EXAMINATION APPLICATION 8 9. EXAMINATION FEE 9 10. LAST DATE FOR RECEIPT OF EXAMINATION APPLICATION 9 11. EXAMINATION CENTRES 10 12. USE OF UNFAIR MEANS 10 13. RECOUNTING 11 14. IMPROVEMENT OF GRADES 11 15. COURSE CURRICULUM (Computer Science) (CS) (Appendix-‘A’) 13 16. OUTLINE SYLLABUS (CS) (Appendix-‘B’) 16 17. COURSE CURRICULUM (Information Technology) (IT) (Appendix-‘C”) 24 18. OUTLINE SYLLABUS (IT) (Appendix-‘D’) 27 19. COURSE CURRICULUM (Electronics & Telecommunication) (ET) 35 (Appendix-‘E’) 20. OUTLINE SYLLABUS (ET) (Appendix-‘F’) 38 21. DETAILED SYLLABUS (CS) (Appendix-’G’) 45 22. RECOGNITIONS GRANTED TO THE AMIETE EXAMINATION BY STATE 113 GOVERNMENTS / UNIVERSITIES / INSTITUTIONS (Appendix-‘I’) 23. RECOGNITIONS GRANTED TO THE AMIETE EXAMINATION BY 117 GOVERNMENT OF INDIA / OTHER AUTHORITIES ( Annexure I - V) 24.
    [Show full text]
  • Session 5 – Main Theme
    Database Systems Session 5 – Main Theme Relational Algebra, Relational Calculus, and SQL Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Presentation material partially based on textbook slides Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant Navathe Slides copyright © 2011 and on slides produced by Zvi Kedem copyight © 2014 1 Agenda 1 Session Overview 2 Relational Algebra and Relational Calculus 3 Relational Algebra Using SQL Syntax 5 Summary and Conclusion 2 Session Agenda . Session Overview . Relational Algebra and Relational Calculus . Relational Algebra Using SQL Syntax . Summary & Conclusion 3 What is the class about? . Course description and syllabus: » http://www.nyu.edu/classes/jcf/CSCI-GA.2433-001 » http://cs.nyu.edu/courses/fall11/CSCI-GA.2433-001/ . Textbooks: » Fundamentals of Database Systems (6th Edition) Ramez Elmasri and Shamkant Navathe Addition Wesley ISBN-10: 0-1360-8620-9, ISBN-13: 978-0136086208 6th Edition (04/10) 4 Icons / Metaphors Information Common Realization Knowledge/Competency Pattern Governance Alignment Solution Approach 55 Agenda 1 Session Overview 2 Relational Algebra and Relational Calculus 3 Relational Algebra Using SQL Syntax 5 Summary and Conclusion 6 Agenda . Unary Relational Operations: SELECT and PROJECT . Relational Algebra Operations from Set Theory . Binary Relational Operations: JOIN and DIVISION . Additional Relational Operations . Examples of Queries in Relational Algebra . The Tuple Relational Calculus . The Domain Relational Calculus 7 The Relational Algebra and Relational Calculus . Relational algebra . Basic set of operations for the relational model . Relational algebra expression . Sequence of relational algebra operations . Relational calculus . Higher-level declarative language for specifying relational queries 8 Unary Relational Operations: SELECT and PROJECT (1/3) .
    [Show full text]
  • Ch06-The Relational Algebra and Calculus.Pdf
    Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 6- 1 Chapter 6 The Relational Algebra and Calculus Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Relational Algebra Unary Relational Operations Relational Algebra Operations From Set Theory Binary Relational Operations Additional Relational Operations Examples of Queries in Relational Algebra Relational Calculus Tuple Relational Calculus Domain Relational Calculus Example Database Application (COMPANY) Overview of the QBE language (appendix D) Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 6- 3 Relational Algebra Overview Relational algebra is the basic set of operations for the relational model These operations enable a user to specify basic retrieval requests (or queries) The result of an operation is a new relation, which may have been formed from one or more input relations This property makes the algebra “closed” (all objects in relational algebra are relations) Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 6- 4 Relational Algebra Overview (continued) The algebra operations thus produce new relations These can be further manipulated using operations of the same algebra A sequence of relational algebra operations forms a relational algebra expression The result of a relational algebra expression is also a relation that represents the result of a database query (or retrieval request) Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 6- 5 Brief History of Origins of Algebra Muhammad ibn Musa al-Khwarizmi (800-847 CE) wrote a book titled al-jabr about arithmetic of variables Book was translated into Latin. Its title (al-jabr) gave Algebra its name. Al-Khwarizmi called variables “shay” “Shay” is Arabic for “thing”.
    [Show full text]
  • Relational Algebra & Relational Calculus
    Relational Algebra & Relational Calculus Lecture 4 Kathleen Durant Northeastern University 1 Relational Query Languages • Query languages: Allow manipulation and retrieval of data from a database. • Relational model supports simple, powerful QLs: • Strong formal foundation based on logic. • Allows for optimization. • Query Languages != programming languages • QLs not expected to be “Turing complete”. • QLs not intended to be used for complex calculations. • QLs support easy, efficient access to large data sets. 2 Relational Query Languages • Two mathematical Query Languages form the basis for “real” query languages (e.g. SQL), and for implementation: • Relational Algebra: More operational, very useful for representing execution plans. • Basis for SEQUEL • Relational Calculus: Let’s users describe WHAT they want, rather than HOW to compute it. (Non-operational, declarative.) • Basis for QUEL 3 4 Cartesian Product Example • A = {small, medium, large} • B = {shirt, pants} A X B Shirt Pants Small (Small, Shirt) (Small, Pants) Medium (Medium, Shirt) (Medium, Pants) Large (Large, Shirt) (Large, Pants) • A x B = {(small, shirt), (small, pants), (medium, shirt), (medium, pants), (large, shirt), (large, pants)} • Set notation 5 Example: Cartesian Product • What is the Cartesian Product of AxB ? • A = {perl, ruby, java} • B = {necklace, ring, bracelet} • What is BxA? A x B Necklace Ring Bracelet Perl (Perl,Necklace) (Perl, Ring) (Perl,Bracelet) Ruby (Ruby, Necklace) (Ruby,Ring) (Ruby,Bracelet) Java (Java, Necklace) (Java, Ring) (Java, Bracelet) 6 Mathematical Foundations: Relations • The domain of a variable is the set of its possible values • A relation on a set of variables is a subset of the Cartesian product of the domains of the variables. • Example: let x and y be variables that both have the set of non- negative integers as their domain • {(2,5),(3,10),(13,2),(6,10)} is one relation on (x, y) • A table is a subset of the Cartesian product of the domains of the attributes.
    [Show full text]
  • RELATIONAL DATABASE PRINCIPLES © Donald Ravey 2008
    RELATIONAL DATABASE PRINCIPLES © Donald Ravey 2008 DATABASE DEFINITION In a loose sense, almost anything that stores data can be called a database. So a spreadsheet, or a simple text file, or the cards in a Rolodex, or even a handwritten list, can be called a database. However, we are concerned here with computer software that manages the addition, modification, deletion and retrieval of data from specially formatted computer files. Such software is often called a Relational Database Management System (RDBMS) and that will be the subject of the remainder of this tutorial. RELATIONAL DATABASE HISTORY The structure of a relational database is quite specific. Data is contained in one or more tables (the purists call them relations). A table consists of rows (also known as tuples or records), each of which contains columns (also called attributes or fields). Within any one table, all rows must contain the same columns; that is, every table has a uniform structure; if any row contains a Zipcode column, every row will contain a Zipcode column. In the late 1960’s, a mathematician and computer scientist at IBM, Dr. E. F. (Ted) Codd, developed the principles of what we now call relational databases, based on mathematical set theory. He constructed a consistent logical framework and a sort of calculus that would insure the integrity of stored data. In other words, he effectively said, If you structure data in accordance with these rules, and if you operate on the data within these constraints, your data will be guaranteed to remain consistent and certain operations will always work.
    [Show full text]
  • Scientific Calculator Operation Guide
    SSCIENTIFICCIENTIFIC CCALCULATORALCULATOR OOPERATIONPERATION GGUIDEUIDE < EL-W535TG/W516T > CONTENTS CIENTIFIC SSCIENTIFIC HOW TO OPERATE Read Before Using Arc trigonometric functions 38 Key layout 4 Hyperbolic functions 39-42 Reset switch/Display pattern 5 CALCULATORALCULATOR Coordinate conversion 43 C Display format and decimal setting function 5-6 Binary, pental, octal, decimal, and Exponent display 6 hexadecimal operations (N-base) 44 Angular unit 7 Differentiation calculation d/dx x 45-46 PERATION UIDE dx x OPERATION GUIDE Integration calculation 47-49 O G Functions and Key Operations ON/OFF, entry correction keys 8 Simultaneous calculation 50-52 Data entry keys 9 Polynomial equation 53-56 < EL-W535TG/W516T > Random key 10 Complex calculation i 57-58 DATA INS-D STAT Modify key 11 Statistics functions 59 Basic arithmetic keys, parentheses 12 Data input for 1-variable statistics 59 Percent 13 “ANS” keys for 1-variable statistics 60-61 Inverse, square, cube, xth power of y, Data correction 62 square root, cube root, xth root 14 Data input for 2-variable statistics 63 Power and radical root 15-17 “ANS” keys for 2-variable statistics 64-66 1 0 to the power of x, common logarithm, Matrix calculation 67-68 logarithm of x to base a 18 Exponential, logarithmic 19-21 e to the power of x, natural logarithm 22 Factorials 23 Permutations, combinations 24-26 Time calculation 27 Fractional calculations 28 Memory calculations ~ 29 Last answer memory 30 User-defined functions ~ 31 Absolute value 32 Trigonometric functions 33-37 2 CONTENTS HOW TO
    [Show full text]
  • Database Management Systems Solutions Manual Third Edition
    DATABASE MANAGEMENT SYSTEMS SOLUTIONS MANUAL THIRD EDITION Raghu Ramakrishnan University of Wisconsin Madison, WI, USA Johannes Gehrke Cornell University Ithaca, NY, USA Jeff Derstadt, Scott Selikoff, and Lin Zhu Cornell University Ithaca, NY, USA CONTENTS PREFACE iii 1 INTRODUCTION TO DATABASE SYSTEMS 1 2 INTRODUCTION TO DATABASE DESIGN 6 3THERELATIONALMODEL 16 4 RELATIONAL ALGEBRA AND CALCULUS 28 5 SQL: QUERIES, CONSTRAINTS, TRIGGERS 45 6 DATABASE APPLICATION DEVELOPMENT 63 7 INTERNET APPLICATIONS 66 8 OVERVIEW OF STORAGE AND INDEXING 73 9 STORING DATA: DISKS AND FILES 81 10 TREE-STRUCTURED INDEXING 88 11 HASH-BASED INDEXING 100 12 OVERVIEW OF QUERY EVALUATION 119 13 EXTERNAL SORTING 126 14 EVALUATION OF RELATIONAL OPERATORS 131 i iiDatabase Management Systems Solutions Manual Third Edition 15 A TYPICAL QUERY OPTIMIZER 144 16 OVERVIEW OF TRANSACTION MANAGEMENT 159 17 CONCURRENCY CONTROL 167 18 CRASH RECOVERY 179 19 SCHEMA REFINEMENT AND NORMAL FORMS 189 20 PHYSICAL DATABASE DESIGN AND TUNING 204 21 SECURITY 215 PREFACE It is not every question that deserves an answer. Publius Syrus, 42 B.C. I hope that most of the questions in this book deserve an answer. The set of questions is unusually extensive, and is designed to reinforce and deepen students’ understanding of the concepts covered in each chapter. There is a strong emphasis on quantitative and problem-solving type exercises. While I wrote some of the solutions myself, most were written originally by students in the database classes at Wisconsin. I’d like to thank the many students who helped in developing and checking the solutions to the exercises; this manual would not be available without their contributions.
    [Show full text]
  • Introduction to Data Management CSE 344
    Introduction to Data Management CSE 344 Unit 3: NoSQL, Json, Semistructured Data (3 lectures*) *Slides may change: refresh each lecture Introduction to Data Management CSE 344 Lecture 11: NoSQL CSE 344 - 2017au 2 Announcmenets • HW3 (Azure) due tonight (Friday) • WQ4 (Relational algebra) due Tuesday • WQ5 (Datalog) due next Friday • HW4 (Datalog/Logicblox/Cloud9) is posted CSE 344 - 2017au 3 Class Overview • Unit 1: Intro • Unit 2: Relational Data Models and Query Languages • Unit 3: Non-relational data – NoSQL – Json – SQL++ • Unit 4: RDMBS internals and query optimization • Unit 5: Parallel query processing • Unit 6: DBMS usability, conceptual design • Unit 7: Transactions • Unit 8: Advanced topics (time permitting) 4 Two Classes of Database Applications • OLTP (Online Transaction Processing) – Queries are simple lookups: 0 or 1 join E.g., find customer by ID and their orders – Many updates. E.g., insert order, update payment – Consistency is critical: transactions (more later) • OLAP (Online Analytical Processing) – aka “Decision Support” – Queries have many joins, and group-by’s E.g., sum revenues by store, product, clerk, date – No updates CSE 344 - 2017au 5 NoSQL Motivation • Originally motivated by Web 2.0 applications – E.g. Facebook, Amazon, Instagram, etc – Web startups need to scaleup from 10 to 100000 users very quickly • Needed: very large scale OLTP workloads • Give up on consistency • Give up OLAP 6 What is the Problem? • Single server DBMS are too small for Web data • Solution: scale out to multiple servers • This is hard
    [Show full text]
  • The Relational Model
    The Relational Model Serge Abiteboul INRIA Saclay, Collège de France et ENS Cachan 20/03/2012 1 Organization The priilinciples – Abstraction – UiUniversa lity – Independence Abs trac tion: the reltilationa l modldel Universality: main functionalities Independence: the views revisited Optimization Complexity and expressiveness Conclusion 20/03/2012 2 The principles 3/20/2012 3 DBMS Goal: the management of large amounts of data – Large amounts of dtdata: dtbdatabase – Software that does this: DBMS Management systems, databases Characteristics of the data – Persistence over time (years) – Size (giga, tera, etc.). – Shared among many users and programs – May be distributed geographically – Heterogeneous storage : hard disk, network 3/20/2012 4 Mediation The data management system acts as a mediator between intelligent users and objects that store information tdt,d ( Film(t, d, «Bogart ») Séance(t, s, h) ) Où et à quelle intget(intkey){ heure puis-je inthash=(key%T voir un film S);while(table[h avec Bogart? ash]=NU LL&&ta ble[hash]‐ >getKey()=key) hash=(hash… The questions are translated into first order logic and then into pgprograms with precise and unambiguous syntax and semantics Alice does not want to write this program; she does not have to 3/20/2012 5 1st principle: abstraction • Data modldel – Definition language for describing the data – Manipulation language (queries and updates) • Simple data structure – Relations – Trees – Graphs • Formal language for queries – LiLogics – Declarative vs. Procedural – Graphical languages 3/20/2012
    [Show full text]
  • Chapter 7 Database Management
    Chapter 7 Database Management Introduction The information age has brought about heavy demand on storing, organizing and retrieving data as well as producing usable information out of the stored data in the form of reports or answers to queries. Due the enormous nature of the data stored, they are placed in many files in an organized way and relationships between the records in these files are established taking special care to minimize duplication of the data. A collection of programs that handle all these data requirements is called a Database Management System (DBMS). The collection of files that contain related data is called a database. Traditional file processing systems concentrated on a particular task such as Accounting or a particular department such as Human Resources. These files were sequential, random or indexed access depending on the program. These programs were isolated and did not share the files with another program, and would only generate a pre-defined set of reports. If a new report is needed, a separate program had to be written by those familiar with the file structure of that program. It was difficult to obtain file structures for proprietary programs. The main disadvantages of these traditional file processing systems were the data redundancy and resulting inconsistency. The database management system attempts to answer these shortcomings. Some Terminology A relation in a database is organized as a table of rows and columns, each row containing a tuple (record) and each column containing an attribute (field) of the record. Each attribute must have a type declaration associated with it.
    [Show full text]