A Hands-On Tour Inside the World of PROC SQL® Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California
NESUG 2007 Hands-On Workshops

Abstract

Structured Query Language (SQL), implemented in SAS as PROC SQL, is a database language found in the Base SAS software. It enables access to data stored in SAS® data sets or tables using a powerful assortment of statements, clauses, options, functions, and other language features. This hands-on workshop presents core concepts of this powerful language as well as its many applications, and is intended for SAS users who desire an overview of the capabilities of this exciting procedure. Attendees will explore the construction of SQL queries, ordering and grouping data, the application of case logic for data reclassification, the creation and use of views, and the construction of simple inner and outer joins.

Introduction

The SQL procedure is a wonderful tool for querying and subsetting data; restructuring or regrouping data by specifying case expressions; constructing and using virtual tables known as views; and joining two or more tables (up to 32) to explore data relationships. Occasionally, a problem comes along where the SQL procedure is either better suited or easier to use than more conventional DATA and/or PROC step methods. As each situation presents itself, PROC SQL should be examined to see if its use is warranted for the task at hand. Many SAS users prefer the power and coding simplicity of the SQL procedure for performing data analysis and routine everyday tasks over other DATA or PROC step approaches. Whatever the nature of the requirements or SQL usage, this paper presents an overview of some of the most exciting features found in PROC SQL.

Example Tables

The examples used throughout this paper utilize a database of tables. A relational database is a collection of tables. Each table contains one or more columns and one or more rows of data.
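To make the idea of a table as a set of columns and rows concrete, the following minimal sketch creates a small table and displays it. The table name WORK.DEMO_MOVIES and the column types and lengths are assumptions chosen for illustration; they are not the paper's actual table definitions. The two rows are copied from the MOVIES table shown in this paper.

```sas
PROC SQL;
  /* Hypothetical demo table: the name, column types, and lengths are assumptions */
  CREATE TABLE work.demo_movies
    (title  CHAR(30),
     year   NUM,
     rating CHAR(5));

  /* Two rows copied from the MOVIES table in this paper */
  INSERT INTO work.demo_movies
    VALUES ('Casablanca', 1942, 'PG')
    VALUES ('Jaws', 1975, 'PG');

  /* Display every column and row of the demo table */
  SELECT * FROM work.demo_movies;
QUIT;
```

Each column definition names one column and its type, and each VALUES clause supplies one row.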
The example database consists of three tables: MOVIES, ACTORS, and RENTAL_INFO. Each table appears below.

MOVIES

TITLE                        LENGTH  CATEGORY              YEAR  STUDIO              RATING
Brave Heart                  177     Action Adventure      1995  Paramount Pictures  R
Casablanca                   103     Drama                 1942  MGM / UA            PG
Christmas Vacation           97      Comedy                1989  Warner Brothers     PG-13
Coming to America            116     Comedy                1988  Paramount Pictures  R
Dracula                      130     Horror                1993  Columbia TriStar    R
Dressed to Kill              105     Drama Mysteries       1980  Filmways Pictures   R
Forrest Gump                 142     Drama                 1994  Paramount Pictures  PG-13
Ghost                        127     Drama Romance         1990  Paramount Pictures  PG-13
Jaws                         125     Action Adventure      1975  Universal Studios   PG
Jurassic Park                127     Action                1993  Universal Studios   PG-13
Lethal Weapon                110     Action Cops & Robber  1987  Warner Brothers     R
Michael                      106     Drama                 1997  Warner Brothers     PG-13
National Lampoon's Vacation  98      Comedy                1983  Warner Brothers     PG-13
Poltergeist                  115     Horror                1982  MGM / UA            PG
Rocky                        120     Action Adventure      1976  MGM / UA            PG
Scarface                     170     Action Cops & Robber  1983  Universal Studios   R
Silence of the Lambs         118     Drama Suspense        1991  Orion               R
Star Wars                    124     Action Sci-Fi         1977  Lucas Film Ltd      PG
The Hunt for Red October     135     Action Adventure      1989  Paramount Pictures  PG
The Terminator               108     Action Sci-Fi         1984  Live Entertainment  R
The Wizard of Oz             101     Adventure             1939  MGM / UA            G
Titanic                      194     Drama Romance         1997  Paramount Pictures  PG-13

ACTORS

TITLE                        ACTOR_LEADING          ACTOR_SUPPORTING
Brave Heart                  Mel Gibson             Sophie Marceau
Christmas Vacation           Chevy Chase            Beverly D'Angelo
Coming to America            Eddie Murphy           Arsenio Hall
Forest Gump                  Tom Hanks              Sally Field
Ghost                        Patrick Swayze         Demi Moore
Lethal Weapon                Mel Gibson             Danny Glover
Michael                      John Travolta          Andie MacDowell
National Lampoon's Vacation  Chevy Chase            Beverly D'Angelo
Rocky                        Sylvester Stallone     Talia Shire
Silence of the Lambs         Anthony Hipkins        Jodie Foster
The Hunt for Red October     Sean Connery           Alec Baldwin
The Terminator               Arnold Schwarzenegger  Michael Biehn
Titanic                      Leonardo DiCaprio      Kate Winslet

RENTAL_INFO

CUST_NO  RENTAL_DATE  TITLE
5        11/29/2006   Christmas Vacation
10       05/18/2007   Star Wars
3        11/28/2006   Christmas Vacation
10       04/04/2007   Coming to America
3        06/20/2007   Forest Gump
5        10/15/2006   Ghost
10       05/07/2006   Jurassic Park
3        06/20/2007   National Lampoon's Vacation
10       07/10/2007   Star Wars
5        08/01/2007   The Wizard of Oz
3        09/15/2006   The Wizard of Oz

Constructing SQL Queries to Retrieve and Subset Data

PROC SQL provides simple but powerful retrieval and subsetting capabilities. These include retrieving and displaying all rows and columns in a table, removing rows with duplicate values, using wildcard characters to search for partially known information, and integrating ODS to create enhanced and more esthetically pleasing output.

Retrieving and Displaying All Columns in a Table with a Wildcard Character “*”

SQL can retrieve and display all columns in a table by specifying the wildcard character “*” in a SELECT statement. The wildcard character “*” instructs the SQL processor to retrieve and display the columns in the order in which they were defined in the underlying table. The following example illustrates the use of the wildcard character to display all columns and rows in the underlying MOVIES table.
SQL Code

   PROC SQL;
     SELECT *
       FROM MOVIES;
   QUIT;

SQL Output

                                    The SAS System

Title                        Length  Category              Year  Studio              Rating
Brave Heart                  177     Action Adventure      1995  Paramount Pictures  R
Casablanca                   103     Drama                 1942  MGM / UA            PG
Christmas Vacation           97      Comedy                1989  Warner Brothers     PG-13
Coming to America            116     Comedy                1988  Paramount Pictures  R
Dracula                      130     Horror                1993  Columbia TriStar    R
Dressed to Kill              105     Drama Mysteries       1980  Filmways Pictures   R
Forrest Gump                 142     Drama                 1994  Paramount Pictures  PG-13
Ghost                        127     Drama Romance         1990  Paramount Pictures  PG-13
Jaws                         125     Action Adventure      1975  Universal Studios   PG
Jurassic Park                127     Action                1993  Universal Studios   PG-13
Lethal Weapon                110     Action Cops & Robber  1987  Warner Brothers     R
Michael                      106     Drama                 1997  Warner Brothers     PG-13
National Lampoon's Vacation  98      Comedy                1983  Warner Brothers     PG-13
Poltergeist                  115     Horror                1982  MGM / UA            PG
Rocky                        120     Action Adventure      1976  MGM / UA            PG
Scarface                     170     Action Cops & Robber  1983  Universal Studios   R
Silence of the Lambs         118     Drama Suspense        1991  Orion               R
Star Wars                    124     Action Sci-Fi         1977  Lucas Film Ltd      PG
The Hunt for Red October     135     Action Adventure      1989  Paramount Pictures  PG
The Terminator               108     Action Sci-Fi         1984  Live Entertainment  R
The Wizard of Oz             101     Adventure             1939  MGM / UA            G
Titanic                      194     Drama Romance         1997  Paramount Pictures  PG-13

Retrieving and Displaying Specific Columns in a Table

SQL can retrieve and display specific columns from a table when they are named in a SELECT statement. In the next example, the SELECT statement displays the movie titles and ratings from the MOVIES table.
SQL Code

   PROC SQL;
     SELECT title, rating
       FROM MOVIES;
   QUIT;

SQL Output

         The SAS System

Title                        Rating
Brave Heart                  R
Casablanca                   PG
Christmas Vacation           PG-13
Coming to America            R
Dracula                      R
Dressed to Kill              R
Forrest Gump                 PG-13
Ghost                        PG-13
Jaws                         PG
Jurassic Park                PG-13
Lethal Weapon                R
Michael                      PG-13
National Lampoon's Vacation  PG-13
Poltergeist                  PG
Rocky                        PG
Scarface                     R
Silence of the Lambs         R
Star Wars                    PG
The Hunt for Red October     PG
The Terminator               R
The Wizard of Oz             G
Titanic                      PG-13

Ordering the Results of a Query with an ORDER BY Clause

The results of an SQL query can be ordered with an ORDER BY clause. Query results are ordered in ascending order by default, or in descending order when the DESC keyword is specified following the column name. The following example selects movie titles and ratings from the MOVIES table and displays the results in ascending order by rating.

SQL Code

   PROC SQL;
     SELECT title, rating
       FROM MOVIES
         ORDER BY rating;
   QUIT;

Output from the ORDER BY Clause

         The SAS System

Title                        Rating
The Wizard of Oz             G
The Hunt for Red October     PG
Star Wars                    PG
Poltergeist                  PG
Jaws                         PG
Rocky                        PG
Casablanca                   PG
Forrest Gump                 PG-13
Christmas Vacation           PG-13
Michael                      PG-13
National Lampoon's Vacation  PG-13
Jurassic Park                PG-13
Titanic                      PG-13
Ghost                        PG-13
Dressed to Kill              R
Lethal Weapon                R
Dracula                      R
The Terminator               R
Brave Heart                  R
Coming to America            R
Silence of the Lambs         R
Scarface                     R

Removing Duplicate Rows with the DISTINCT Keyword

When the same value or row appears multiple times in a table, SQL can be instructed to remove the duplicates from the query results by specifying the DISTINCT keyword in a SELECT statement. In the following example, the DISTINCT keyword is specified for the RATING column so that each rating value in the MOVIES table appears only once in the results.

SQL Code

   PROC SQL;
     SELECT DISTINCT rating
       FROM MOVIES;
   QUIT;

The resulting output from specifying the DISTINCT keyword lists each distinct rating value once and appears below.
Output from the DISTINCT Keyword

   The SAS System

Rating
PG
PG-13
R

Using Wildcard Characters with the LIKE Operator for Searching

When you need to search for specific rows of data but know only part of the value you are looking for, SQL lets you use wildcard characters as part of the search argument. Suppose you want to find all movies classified as an “Action” type of movie. Specifying the percent sign (%) wildcard with the LIKE operator in a WHERE clause returns all rows whose CATEGORY value contains the word “ACTION”, as illustrated in the following query. The percent sign matches any sequence of zero or more characters, and the UPCASE function converts each category value to uppercase so the comparison does not depend on how the value is capitalized in the table.

SQL Code

   PROC SQL;
     SELECT title, category
       FROM MOVIES
         WHERE UPCASE(category) LIKE '%ACTION%';
   QUIT;

The resulting output from specifying the wildcard character “%” with the LIKE operator appears below.