SQL
Dean Williamson, Ph.D.
Assistant Vice President
Institutional Research, Effectiveness, Analysis & Accreditation
Prairie View A&M University

Short History of SQL
• 1965: Maron & Levien propose the Relational Data File
• 1968: Di Paola challenges the RDF model
• 1968: Childs proposes Extended Set Theory (Tuples)
• 1970: Codd proposes the Relational Model
• 1974: SEQUEL (Structured English Query Language) introduced by IBM developers Donald D. Chamberlin and Raymond F. Boyce
• 1979: Relational Software (Oracle) introduces SQL
• 1986: SQL becomes a standard of the American National Standards Institute (ANSI)

Bill Maron
Melvin Earl "Bill" Maron was an American computer scientist and emeritus professor at the University of California, Berkeley. Maron is best known for his work on probabilistic information retrieval, which he published together with his friend and colleague Lary Kuhns. Quite remarkably, Maron also pioneered relational databases, proposing a system called the Relational Data File in 1967.
• Maron, Melvin E.; Kuhns, J. L. (1960). "On relevance, probabilistic indexing, and information retrieval". Journal of the ACM.
• Levien, Roger E.; Maron, Melvin E. (1967). "A computer system for inference execution and data retrieval". Communications of the ACM.

Robert Di Paola
A class of formulas of the first-order predicate calculus, the definite formulas, has recently been proposed as the formal representation of the "reasonable" questions to put to a computer in the context of an actual data retrieval system, the Relational Data File of Levien and Maron. It is shown here that the decision problem for the class of definite formulas is recursively unsolvable. Hence there is no algorithm to decide whether a given formula is definite.

David Childs’ 1968 Article

Childs’ N-Tuple

Ted Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the Relational Model for database management, the theoretical basis for relational databases. He made other valuable contributions to computer science, but the relational model, a very influential general theory of data management, remains his most mentioned achievement.

Codd’s Relational Model

Relational Model
In the relational model, records are "linked" using virtual keys that are not stored in the database but are defined as needed between the data contained in the records.

Structured Query Language
• Structured Query Language is a domain-specific language used in programming and designed for managing data held in a relational database management system.
• Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language, a data manipulation language, and a data control language.
• SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks." Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language.

Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate and should be replaced by two separate Null-type markers that indicate the reason why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system to be expanded to accommodate a four-valued logic system.
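
As a rough illustration of the single-Null behavior Codd criticized, the sketch below runs a few comparisons against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in here for Oracle or SQL Server, and the staff table, its columns, and its rows are all hypothetical. Bob has no phone at all (missing but inapplicable), while Cy's number was never recorded (missing but applicable), yet SQL stores both as the same NULL.

```python
import sqlite3


def null_demo():
    """Count rows matched by three predicates over a column containing NULLs."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical table: phone is NULL for two different reasons.
    cur.execute("CREATE TABLE staff (name TEXT, phone TEXT)")
    cur.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [
            ("Ann", "555-0100"),
            ("Bob", None),  # has no phone (inapplicable)
            ("Cy", None),   # phone exists but was not recorded (applicable)
        ],
    )
    # "= NULL" evaluates to UNKNOWN for every row, so nothing matches.
    eq_null = cur.execute(
        "SELECT COUNT(*) FROM staff WHERE phone = NULL"
    ).fetchone()[0]
    # "<>" is also UNKNOWN for the NULL rows, so Bob and Cy are not returned.
    not_equal = cur.execute(
        "SELECT COUNT(*) FROM staff WHERE phone <> '555-0100'"
    ).fetchone()[0]
    # IS NULL is the only predicate that finds them, and it cannot
    # distinguish Bob's case from Cy's.
    is_null = cur.execute(
        "SELECT COUNT(*) FROM staff WHERE phone IS NULL"
    ).fetchone()[0]
    conn.close()
    return eq_null, not_equal, is_null


if __name__ == "__main__":
    print(null_demo())
```

Under these assumptions, only the IS NULL predicate finds the two rows with missing phones, and nothing in the result says why each value is missing.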

SQL Code Problem

SQL’s Inherent Problems
Despite not entirely adhering to the relational model:
• Moved away from strict adherence to the foundational principles of relational algebra and tuple relational calculus
• Which results in:
• Foundational problem with null values
• Left joins result in slow code
• No algorithm can determine if a given formula is definite
• Leads to argument about code development being an art or a science
• No allowances for waste
• Throwaway causes problems

SQL Outcomes
SQL Resources
Introductory Courses
Creating a Table
Training Table
SQL Statement
Basic Statement

Subquery Factoring
The WITH clause, or subquery factoring clause, is part of the SQL-99 standard and was added to the Oracle SQL syntax in Oracle 9.2. The WITH clause may be processed as an inline view or resolved as a temporary table. The advantage of the latter is that repeated references to the subquery may be more efficient, because the data is retrieved from the temporary table rather than being requeried by each reference.

Oracle Example

Inline View
An inline view is a SELECT statement in the FROM clause of another SELECT statement. Inline views are commonly used to simplify complex queries by removing join operations and condensing several separate queries into a single query. This feature was introduced in Oracle 7.2.

http://www.sqlstyle.guide/
https://wiki.easyvista.com/xwiki/bin/view/Documentation/SQL+rules
https://technet.microsoft.com/en-us/library/bb264565(v=sql.90).aspx
http://weblogs.sqlteam.com/jeffs/archive/2006/03/14/9289.aspx
http://www.fontstuff.com/access/acctut15.htm
http://softwareengineering.stackexchange.com/questions/144602/how-do-i-make-complex-sql-queries-easier-to-write

SQL’s Job
Remember your job. The SQL Server's job is to return data. Your client application or reporting tool's job is to present that data. Always remember this. If you try to format a result set using T-SQL, your nice DateTime and Money values come to the client as VARCHARs, which means that if the client wants to do anything meaningful with these values (sort, compare, do math, format, etc.) it must immediately convert those strings back to the original datatype! So many people try to format their results in T-SQL thinking this makes things easier for their clients, when it makes the T-SQL longer, more complicated, and less efficient. Let SQL do what it does best, and let your presentation layer do what it does best. It really kind of almost makes sense when you think about it.

Pivot

SQL Performance
Don’t think that a query execution of 50ms is fast or even acceptable. It’s not. If you get these speeds at development time, make sure you investigate execution plans. Those monsters might explode in production, where you have more complex contexts and data. In my recent SQL work for a large Swiss bank, I have maintained nested database view monsters whose unnested SQL code amounted to as much as 5,000 lines, joining the same table over and over again in separate subselects combined via UNION operations. This monster performed in way under 50ms, no matter how we queried it. Of course, this performance was only achieved after lots of fine-tuning, load-testing and benchmarking. But it worked. Our Oracle database never let us down on these things.

Operators in WHERE Clause
Functions in the WHERE Clause
Queries with NOT IN
BETWEEN is better than IN
UNION is better than OR

Format Consistency
• Begin a new line with each clause in the statement. For example, place the FROM clause on a separate line from the SELECT clause, place the WHERE clause on a separate line from the FROM clause, and so on.
• Use tabs or spaces for indentation when the arguments of a clause in the statement exceed one line.
• Use tabs and spaces consistently.
• Begin a new line with each column name in the SELECT clause if many columns are being selected.
• Begin a new line with each table name in the FROM clause if many tables are being used.
• Begin a new line with each condition of the WHERE clause, so that you can easily see all conditions of the statement and the order in which they are used.

Limit Sort Operations
Avoid HAVING and DISTINCT
Using Select *
Outdated Join Clauses
Old Syntax

Formatting a SQL Statement
Formatting a SQL statement may sound obvious, but it is worth mentioning. There are several things that a newcomer to SQL will probably not take into consideration when building a SQL statement. The following sections discuss the listed considerations; some are common sense, others are not so obvious:
• Formatting SQL statements for readability
• The order of tables in the FROM clause
• The placement of the most restrictive conditions in the WHERE clause
• The placement of join conditions in the WHERE clause

Tables from Smallest to Largest
Restrictive Conditions

EXISTS or IN
• When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query (think: IN = inside to outside).
• When you write EXISTS in a WHERE clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query (think: EXISTS = outside to inside).

Don't Skimp on Information
Whenever a field is specified, you have the option to prefix it with the table name, separating the two with a dot.

    SELECT Firstname, Lastname FROM tblStaff ORDER BY Lastname

is the same as:

    SELECT tblStaff.Firstname, tblStaff.Lastname FROM tblStaff ORDER BY tblStaff.Lastname

provided that the fields belong to the data source specified in the FROM clause. If your query refers to more than one table, you must include the table name along with the field name.

Scalar Subqueries
Introduced in Oracle9, scalar subqueries allow you to treat the output of a subquery as a column or even an expression within a SELECT statement. A scalar subquery is a query that selects only one column or expression and returns just one row. If the scalar subquery fails to return any rows, Oracle will use a NULL value for the output of the scalar subquery.
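
A minimal sketch of a scalar subquery used as a column, run against an in-memory SQLite database through Python's sqlite3 module rather than against Oracle. The tblDept table and the IsHead column are assumptions made for this example; tblStaff borrows its name from the earlier slide.

```python
import sqlite3


def scalar_subquery_demo():
    """Return each department with the last name of its head, if any."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical schema and data for illustration only.
    cur.executescript(
        """
        CREATE TABLE tblDept (DeptId INTEGER PRIMARY KEY, DeptName TEXT);
        CREATE TABLE tblStaff (
            StaffId  INTEGER PRIMARY KEY,
            Lastname TEXT,
            DeptId   INTEGER,
            IsHead   INTEGER
        );
        INSERT INTO tblDept VALUES (10, 'Research'), (20, 'Accreditation');
        INSERT INTO tblStaff VALUES (1, 'Adams', 10, 1), (2, 'Baker', 10, 0);
        """
    )
    # The parenthesized SELECT is the scalar subquery: one column, at most
    # one row per department, used as an ordinary column of the outer query.
    rows = cur.execute(
        """
        SELECT d.DeptName,
               (SELECT s.Lastname
                  FROM tblStaff s
                 WHERE s.DeptId = d.DeptId
                   AND s.IsHead = 1) AS HeadLastname
          FROM tblDept d
         ORDER BY d.DeptId
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for dept_name, head in scalar_subquery_demo():
        print(dept_name, head)
```

In this sketch the second department has no head, so its scalar subquery returns no rows and the column comes back as NULL (None in Python). One difference to keep in mind: when a scalar subquery returns more than one row, Oracle raises ORA-01427, whereas SQLite silently uses the first row.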