<<

Advanced Tutorials

SELECT ITEMS FROM PROC.SQL WHERE ITEMS> BASICS

Alan Dickson & Ray Pass ASG. Inc.

to get more SAS sort-space. Also, it may be possible to limit For those of you who have become extremely the total size of the resulting extract set by doing as much of comfortable and competent in the DATA step, getting into the joining/merging as close to the raw data as possible. PROC SQL in a big wrJ may not initially seem worth it Your first exposure may have been through a training class. a 3) Communications. SQL is becoming a lingua Tutorial at a conference, or a real-world "had-to" situation, franca of data processing, and this trend is likely to continue such as needing to read corporate data stored in DB2, or for some time. You may find it easier to deal with your another relational . So, perhaps it is just another tool systems department, vendors etc., who may not know SAS at in your PROC-kit for working with SAS datasets, or maybe all, ifyou can explain yourself in SQL tenus. it's your primary interface with another environment 4) Career growth. Not unrelated to the above - Whatever the case, your first experiences usually there's a demand for people who can really understand SQL. involve only the basic CREATE TABLE, SELECT, FROM, Its apparent simplicity is deceiving. On the one hand, you can WHERE, GROUP BY, ORDER BY options. Thus, the do an incredible amount with very little code - on the other, tendency is frequently to treat it as no more than a data extract you can readily generate plausible, but inaccurate, results with tool. You then go on to do the "real" work in SAS DATA step code that looks to be bullet-proof at first glance. post-processing. So, our aim is not to proselytize. SQL may not be While that approach may have the advantage of the answer to all (or even any) of your problems. Nonetheless, getting the job done faster (alwrJs a consideration these days), ifyou've only been a swface-scratcher until now, come along there are a number of reasons why it might be worth looking and take a look at some examples that go just a little bit into doing more in the realm ofSQL before cranking up that deeper. We hope they whet your appetite to do some more good old DATA step. In fact, maybe you can avoid it exploring ofyour own. altogether! At this point we would like to emphasize that, in this 1) SQL just works differently. Sometimes that's paper, we are only showing what can be done - we're not better, sometimes worse. Although it can be a bit of a mental making any efficiency statements. As always, the results you wrench to think SQL rather than DATA step, there are obtain depend on your environment, your data, and the choice situations the SQL approach is more appropriate. An oftechnique. That last point is particularly important since, as actuary once asked one of the authors (AD) how he could you progress through the examples, you will see that there is force SAS to do an all-to-all of two datasets (both with fiequently more than a single teclmique that can be applied to non-unique keys) which he was planning to extract separately solve a specific problem. Even ifyou're already familiar with an SQLlDS database. To his credit, by the time we'd one, we hope that we'll expose you to some others that may be talked through his reasons for doing this, he had the "lightbulb more appropriate or efficient We suggest that you conduct experience" - SQL does it that way all the time. so why bypass your own benchmarks before committing yourself to anyone that power? Why not just "" them with SQL. approach, especially if it will become a "production" application.. 2) Resource constraints. At the risk of being branded heretics for suggesting such "cost-shifting", it may It's also important to realize that this paper focuses prove easier!fasterlcheaper to take advantage of system-owned on the SAS implementation ofSQL. That is the background of resources to get some ofyour processing done. This may be one ofthe authors (RP), while the other is more familiar with particularly true when using PROC SQL to access corporate the DB2 implementation (AD). There are many differences relational , since they tend to have large sort-space between the two - we'll tty to point out any major and work-space pools associated with them. Even if your discrepancies as we go, but be warned that some of this code benchmarks show that PROC SORT is more efficient than will not work in non-SAS SQL environments. using an ORDER BY statement, perhaps the ~ h~ run-time is more effective than the three days of politics 1Iying

22 Advanced Tutorials

SECTION I - First Things First Here, we are reading the temporary dataset First, let's discuss the title of this paper - we wanted WORKPROC using the table SQL, with the same it not only to describe the paper's contents, but also to be results as the previous intended code. Let this serve as an syntactically valid SQL code. Here is an example including early lesson. It's quite possible to produce syntactically the paper title in the PRoe SQL statement, along with data anect code which could yield perfectly meaningless results, that would support the code. without much effort at all. Be careful. *------; 'lTl'LE 'EX 0 - PAPER TI'l'LE' ; For the remainder of this paper, we will be using a RUN; fictitious set ofmedical practice data which is contained in two tIA'l!A PROC. SQL; SAS datasets: dataset PIDEMOG contains variables mPO'l' 801 KEY $4. PATIENT, PRIMDOC (patient's primary doctor), DOB 806 ITEMS 2. 809 BASICS 2.; (patient's date of birth) and SEX (patient's sex); dataset CARDS; PTVISlTS contains information about ten visits made by these KEY1 34 17 patients to the doctor's office in variables PATIENT, KEY22244 VlSDATE, VlSDOC (doctor seen), HX (salient patient Kft3 18 09 :nY4 18 36 history item for visit), DX (diagnosis at time of visit), RX :nY5 52 26 (prescription written -if any- at time of visit) and CHARGES :nY6 12 24 (for the visit.) Although these data are obviously of a simple DY7 40 20 ; sample nature, the techniques we employ are totally *------; to to PROC SQL; expandable all real-life data situations. The source code SELECT ITEMS create the datasets is as follows. FROM PROC • SQL WHERE ITEMS > BASICS; *------; *------J tIA'l!A P'l'DEMOG; XNI?O'l' 8001 PATIEN'l' $2. We are reading the column ITEMS from the 8004 PRJJmOC $5. 8010 noB YDlMDD6. permanently stored (two-level name) SAS dataset 8017 SEX $1. ; PROC.SQL. The resulting output is as follows. FORMAT DOB 1OIl)DYY8.;

CARDS; Ex 0 - PAPER TI'l'LE Pl SMITH 540221 H P2 BRONN 431111 H ITEMS P3 BRONN 380818 F P4 SMITH 610704 F 34 P5 JONES 331219 H 18 P6 BRONN 420525 F 52 P7 JONES 660606 F ; 40 *------; DA'l!A P'l'VIsr:rs I Unfortunately, the paper title was frequently typoed :INPOT 8001 PATIENT $2. 8004 VISDA'l'E YYMHDD6. in pre-publication advertising - usually by missing the period 8011 VISDOC $5 . inPROC.SQL (and thereby missing the pointl) We can now 8017 HX $3. demonstrate that even this rendering is syntactically valid, 8021 OX $3. because of the "a1i8s" feature ofPROC SQL. The following 8025 RX $4. 8030 CHARGBS 6.2; code produces theexact same output as the originally intended FORMATVISDA'l'E lIMDDYYB. syntax. CHARGES 6.2; CARDS; TI'l'LE*------; 'EX 0 - PAPER TI'l'LE' ; Pl 910511 SMITH 243 087 1831 074.50 RUN; P1 921104 SMITH 465 131 NONE 128.00 P1 930122 SMITH 164 676 NONE 085.00 tIA'l!A PROC; P2 911021 BROWN 227 257 1798 039.00 SE'l' PROC. SQL; P3 920907 BROWN 140 673 NONE 110.75 P3 930909 JONES 589 324 2978 068.50 PROC SQL; P4 930515 SMITH 995 270 NONE 032.00 SELECT 7TEMS p5 920717 JONES 658 195 NONE 106.85 FROM PROC SOL P6 910521 BROWN 754 042 NONE 098.00 WHERE ITEMS > BASJ:CS; P7 940111 JONES 576 357 1130 135.00 *------; RUN; *------;

23 Advanced Tutorials

It is important to keep in mind the many functional when AS is required in conjunction with a table name is equalities between the SQL procedure and the DATA step, as when the table is being CREATEd from an SQL query weD as between PROC SQL and some of the other SAS basic expression. We'll see an example of this towards the end of building block procedures, e.g. PROC SORT and PROC the paper. SUMMARY (or PROC l'.:1EANS). SAS dalasets are composed of observations and variables. This statement is DB2 SQL allows both colunm and table aliasing directly translatable to: SQL tables are composed of rows and like PROC SQL, but unlike the SAS implementation, does columns. In fact, for all intents and purposes, SAS da.tasets not support the AS keyword in these processes. Only the and SQL tables are totally interchangeable in the course of a implicit form of aliasing as mentioned earlier (without the SAS session. AS) is available. However, similar to the last point made above, AS is required in DB2 when a is created from Welypically use the DATA step to read in raw data a SELECT expression (unlike PROC SQL, DB2 does not and to create new computed variables based on manipulations currently support creating tables from SELECT.) of raw data. New computed columns can also be created in PROC SQL with all the same computational power and features of DATA step assignment statements. This is The above code yields the following output. accomplished quite simply in PROC SQL by including the computational code in the SELECT statement as a new "column", and optionally giving the new column a name EX 1 - 'AS' by (column alias) using the AS keyword. If the newly created Row VISDOC PATZENT AGE column is not named in this llllIIlIlCf, it will still be created, and listed in the SELECT columnar output display, but there will 1 BROWN P2 47 be no column header, and reference to the new "column" will 2 BROWN P6 48 3 BROWN P3 S4 have to be made by repeating the computational code. In the 4 JONES P3 S5 following example, we list the VISDOC, PATIENT and a 5 JONES P7 27 newly created variable, AGE. A few other features are 6 JONES P5 58 employed which will be discussed following the output 7 SMITH P4 31 8 SMITH P1 38 9 SMITH P1 38 *------; 10 SMITH P1 37 'l'l:'l'LE • EX 1 - , AS' • ; RON; NOTE: This output, as well as all other examples in this PROC SQL NtlMBBR; paper are not actual output. but are rather typed in here to SEI.ECT PV • VISDOC, accommodate the columnar formatting requirements ofthe PV. PA'l'IEN'l' , Proceedings. Many code samples are likewise rearranged IN'l'«VISDATE-DOB)/36S.2S) AS AGE to fit into the columns, and are not optimally indented or FROM P'l'DEMOG AS PD, line-spaced. P'l'V:ISl:'l'S AS PV WHERE PD. PA'l'l:BNT = PV. PA'l'ZENT OBDBR BY PV • VJ:SDOC ; Unlike PROC PRINT. the output from a PROC Q1JJ:T; *------; SQL SELECT statement will not automatically include row (observation) numbers. This is easily included in the display by including the NUMBER statement on the PROC SQL This example shows two different uses ofAS. It is statement Also, the ORDER BY statement will cause the first used, as noted above, in the SELECT statement to give data to be displayed. in the desired order (ASC is the default; the alias AGE to the newly computed column. In cases like DESC can also be specifled.) It is important to realize that this, when creating column aliases, AS is required. It is then the ORDERBY statement refers to the display order only. used in the FROM statement to give aliases to the tables There is no internal ordering of data in a relational SQL PTDEMOG and PTVISITS. These explicit uses of AS are table. optional; the FROM statement would have been just fine without them, but good coding practice suggests their use. or As mentioned above, the age data is computed by even over-use. The use of table aliases here are themselves the code INT«(VISDATE-DOB)136S.2S), and is named by optional. The WHERE statement could have been written as the AS AGE colunm alias. If this newly created column WHERE PTDEMOG.PATIENT = PTVISITS.PATIENT, but AGE is to be referred to in subsequent code in the PROC the aliases make the coding a little briefer. The one instance SQL, it can be done so by using the CALCULATED

24 Advanced Tutorials

keyword as is shown in the next example. Prior to version statement as well, similar to the forms of the DATA step 6.08, you would have had to use the calculational code every SELECT statement. In fact it's best to conceive of the time you wanted to refer to the age column. CASE statement as an SQL version of the DATA step SELECT statement. Note that we've included one more Suppose we now want to group our data into column in the current example, VISDA TE, and that PROC subgroups based on age categories by decade: 20's, 30's, etc. SQL displays it with its defined display' format, This is easily accomplished in a DATA step by a series of MMDDYYS .. Unfortunately, neither CALCULATED nor IF-TIffiN-ELSE statements, or a SELECT statement CASE are currently supported in DB2 SQL. Although there is no equivalent to IF-THEN-ELSE in PROC SQL, there is a very powerful alternate to the DATA step Here is the output from the above code. SELECT statement, namely CASE. Keep in mind the possibly confusing terminology here. The PROC SQL "CASE" statanent is fi.mctionally similar to the DATA step 'SELECT" EX 2 - 'CASE &. CALCULATED' statement, not the PROC SQL ·SELECT" statement In our VISDOC PATIENT VJ:SDA'l'E AGE AGEGROUP example, we will create a new column AGEGROUP by using ------CASE as follows. BROWN P2 10/21/91 47 40'S BROWN P6 OS/21/91 48 40'S BROWN P3 09/07/92 54 50'S JONES P3 09/09/93 55 50'S *------; JONES P7 01/11/94 27 20'S TI'l'LE "SX 2 - 'CASE &. CALCULATED'·; JONES P5 07/17/92 58 50'S RON; SHI'l'H P4 05/15/93 31 30'S SHI'l'H P1 11104192 38 30'S PROC SQL; SHI'l'H P1 01/22/93 38 30'S SHI'l'H P1 05/11191 37 30'S SELECT PV. VJ:SDOC, PV. PATIENT, PV • VJ:SDA'l'E, IN'l'{(PV.VJ:SDA'l'E-PD.DOB)/365.25) AS AGE, SECTION II-A PROC of DISTINCTion CASE WHEN (DO LE CALCULATED AGE LE 19) THEN "PEDS" A quick examination of the data will reveaI that WHEN (20 LE CALCULATED AGE LE 29) some patients have had more than one visit, and in one case, THEN ·20'S· to more than one doctor. Obviously, in a real-world WHEN (3D LE CALCULATED AGE LE 39) example, this would be the rule rather than the exception. THEN "30'S· WHEN (40 LE CALCOLA'l'ED AGE LE 49) Ifwe want to remove the duplicate visit situation from our THEN "40'S· output to see just who visited who, we can use the WHEN (50 LE CALCULATED AGE LE 59) DISTINCT sWf:ment to return only unique combinations of THEN "50'S" patients and doctors seen. Notice that DISTINCT applies to WHEN (60 LE CALCULATED AGE LE 69) THEN "60'S" the group of columns which follow it, not just the WHEN (70 LE CALCULATED AGE LE 79) immediately following column. This is in contrast to other 'IfiEN "GER" situations where a qualifier typically refers to the BloSE "???7· iDllIlediately preceding (or following) columns only, such as END AS AGEGRotlP FROM P'l'DEI!OG AS PO, the DESCENDING option in the BY statement within P'l'Ir.ISITS AS PV PROCSORT. WHERE PD.PATIENT = PV.PATIENT ORDER BY PV • VJ:SDOC;

QtJIT; *------1 T:ITLE 'SX 3 - DJ:S'l'INCT '; *------; RUN;

PROC SQL; The PROC SQL CASE statement always yields a SEtoECT DISTDlCT PATJ:ENT, VJ:SDOC single value. In this example, the value of the calculated FROM PTV:IS:ITS, colwnn AGE for each row is evaluated over the entire CASE statement, and a single value is created AS AGEGROUP, for QtJIT; each row. The CASE statement is extremely powerful and can *------~-; be used anywhere in a PROC SQL nm where a single column value is expected. There are other fonns of the CASE

25 Advanced Tutorials

Here is the output from the above code. instructions. We'll see this problem crop up again later. For now we will solve the problem by using a GROUP BY EX 3 - DISTINCT statement as follows (and we'll also give the COUNT(*) column an alias.) PA'l'IENT VISDOC *------; Pl. SMITH 'l'ITLE "EX 5 - COONT (BiH'rBR)·; P2 BROWN RUN; P3 BROWN P3 JONES PROC SQI,; P4 SMITH SELECT DIS'l'INC'l' PA'l'IEN'l', P5 JONES COUNT(*, AS NUKVIS P6 BROWN ROllI P'l'VISI'l'S P7 JONES GROOP BY PA'l'IEN'l'; QUIT; *------; Patient P3 was seen by someone other than his Here is the output primaly doctor on one visit, and therefore shows up in the output twice since PROC SQL is evaluating the PATIENTI EX 5 - COUNT (BE'l'I'ER) VISDOC combination for uniqueness. PA'l'IENT NUMVIS

At this point, let's add the COUNT function to the P1 3 DISTINCT picture to pro:Iuce a total number of visits for ea$ P2 1 patient To those new to SQL, the following code would P3 2 probably look sufficient P4 1 P5 1 P6 1 *------; p7 1 'l'ITLE 'EX 4 - COUNT (BAD)'; RUN;

PROC SQI,; Now that we've seen the inter-relationship SELECT DISTINCT PATIENT, between COUNT(*) and GROUP BY, we can tty a COUNT(*) somewhat more realistic example, including the doctor who FROM PTVISITS; attended the patient QUIT; *------; *------; 'l'ITLE "EX 6 - COONT (BEST)·; RUN; As seen in the resulting output, it is not. PROC SQL; EX 4 - COUNT (BAD) SELECT DISTINC'l' PA'l'IEN'l', v:rSDOC, PA'l'IENT COUNT (*) AS Ntll!IVIS FROM P'l'VISITS Pl. 10 GROUP BY PA'l'IEN'l', P2 10 VISDOC; P3 10 QUIT; P4 10 *------; P5 10 P6 10 Here is the output P7 10 EX 6 - COUNT (BEST)

Not only does this not give us the desired result, it PATIENT VISDOC NUMVIS also generates a warning note as fonows: Pl SMITH 3 NOTE: The query requires remerging P2 BROWN 1 summary statistics back with the P3 BROWN l. original data. P3 JONES l. p4 SMITH 1 P5 JONES 1 This is because PROC SQL resolves the function P6 BROWN 1 across the entire table unless we give it more explicit . P7 JONES 1 .

26 Advanced Tutorials

Again, we note the P3/JONES row impacting results, *------_. splitting her two visits into one for each doctor. TJ:TLE 'EX 8 - SUBQUERY-FEMALES ONLY'; , RON;

PRIX SQL; SECTION III - Lers Get Together SBLBCT fN .PA'l'IEN'l', fN • VlSDOC , Theone feature ofPROC SQL that is perhaps touted fN • VlSDATE most often as the real reason for its existence is its ability to FROM PTVJ:srrs AS fN easily and powerfully merge multiple tables together with many options either unavailable, or quite difficult to perfmm WHERE fN .PATJ:EN'l' IN (SELECT PD.PATJ:EN'l' using DATA steps alone. The following example FROM PTDEMOG AS PD demonstrates this basic feature while also using a WHERE WHERE PD.SEX='P') statement to limit the processing of rows selected for display based on values of one column in one of the merging tables. ORDER BY fN. VlSDOC; Here we show visit information for female patients only. QUIT; *------; *------: 'l'rl'LE "EX 7 - SIHPLE JOIN-FEKALES ONLY·; The subquery is the code in parentheses in the RON; WHERE statement (called the inner query) which is PRIX SQL; evaluated before the outer query. In this example, it yields a series of values for PV.PATIENT which become the IN SBLBCT fN. PATJ:EN'l', portion of the WHERE statement After evaluation of the fN • VlSDOC. is fN • VlSDATE subquery, the WHERE statement read internally as FROM PTDEMOG AS PD. WHERE PV.PATIENT IN ('P3', 'P4', 'P6', 'P7'). Here is PTVJ:srrs AS fN the output of the above code. Except for the title, it is WHBiIB PD. PATJ:EN'l' = fN. PATJ:EN'l' identical to the previous output AND PD.SEX = 'F' ORDER BY fN • VlSDOC; EX 8 - SUBQUERY-FEMALES ONLY QUIT; *------; PATJ:ENT VISDOC VISDATE

Here is the output for the above code. 10'3 BROWN 09/07/92 P6 BROWN OS/21/91 EX 7 - SIMPLE JOIN-FEMALES ONLY P3 JONES 09/09/93 P7 JONES 01/11/94 PATJ:ENT VISDOC VISDATE P4 SMITH 05/15/93

P3 BROWN 09/07/92 Subqueries are a very powerful technique in P6 BROWN OS/21/91 (and P3 JONES 09/09/93 PROC SQL. Suppose we bad need to isolate all visits P7 JONES 01/11/94 charges) by patients to physicians other than their primary P4 SMrl'H 05/15/93 doctor. This is easily handled by a subquery as follows. *------; TJ:TLE 'EX 9 - stlBQUERY-NOHPRIJI1ollY DOCS '; While the above demonstrated merging ability is an R.tlN; extremely important aspect of PROC SQL, there are addit.iooal features inherent in the proced~ for working with PROC SQL; multiple tables simultaneously which add to the SAS System SBLBCT fN .I?ATJ:EN'l'. users cache of data manipulation tools. Notable among these fN. VlSDOC, are various types of "subqueries.· A basic subquery can be PIT. VlSDATE, used to form the business end of a WHERE statement. It fN.CBARGES allows you to selectively choose rows for display from one FROM P'l'Ir.IS:ITS AS fN table based on selection criteria derived from another table. WHERE fN.PA'l'IEN'l'llfN.VISDOC NOT IN The above example could have been coded as follows to make (SELECT PD.PATJ:ENTIIPD.PRDIDOC use of a subquery. Don't be fooled by the simplicity of our FROM P'1'DEMOG AS PD); QUIT; examples. These features are extremely powerful! *------;

27 Advanced Tutorials

Note once again that almost all DATA step P3/JONES combination using EXISTS. operations (in this case, character value concatenation) are available for use in PROC SQL. In this case, the only *------_.• TJ:'l'LB 'BX 11 - BXISTS-Nom'RDfARY DOC'; patient-doctor combination from the visits file that does NOT Rm; occur in the demographic file is P3-JONES. Here is the output. PROC SQL; SELECT PATIENT. EX 9 - SUBQUERY-NONPRIMARY DOCS VISDOC FROM PTVJ:SI'l'S AS PV WHERE NOT BXISTS PATIENT VISDOC VISDATE CHARGES (SELECT PA'l'IEN'l', PRnmoc P3 JONES 09/09/93 68.50 FROM P'l'DEKOG AS PD WHERE PD. PATIENT=PV • PATIEN'I' AND PD.PRll!IDOC=PV.VISDOC); gorr; We can also use DISTINCT to isolate the *------, non-primary doctor situation, in conjunction with the aliasIqualification of data columns technique covered earlier. Here is the output from the above code. This is illustrated as follows. *------; EX 11 - EXISTS-NONPRIMARY DOC TJ:TIaB 'BX 10 - DISTINCT-NONPRllIOIRY DOC'; RtIN; PATIENT VISDOC

PROC SQL; P3 JONES SELECT DISTINCT PV. PATIENT, PV.VIsnoc FROM PTVJ:SITS AS PV, P'l'DBIIOG AS PD WHERE PV.PATIEN'I' = PD.PATIENT SECTION IV -I'd Rather Do It Myself lIND PV • VISDOC <> PD. PRll!IDOC; gorr; Many SAS programmers may be unaware of the *------; fact that it is perfectly legal to issue two discrete SET statements for the same dataset in a single DATA step. Here is the resulting output. While this may at first seem to fall into the rea1m of "Stupid SAS Tricks", anyone who has spent time with Bob Virgile EX 10 - DISTINCT-NONPRIMARY DOC and his "Puzzle Comer" will have learned that it can be an incredibly powerful technique. PATIENT VISDOC

P3 JONES Up until now, we have only locked at subqueries on a second table; we now demonstrate joining a table with A word about valid "NOT-EQUALS" syntax may be itself! The set-logic orientation of SQL makes this concept in order here. SAS accepts NE as usual : other languages may of working with more than one "image" of a table at one prefer the "0" as used above. SAS also accepts "0", but time much more common than it is in the SAS DATA step. puts out a note: In SQL terms, it is known as a "reflexive join". It becomes particularly necessary when trying to answer queries of the NOTE: The "<>" operator is interpreted form, "Find all rows where X is true, and all other as "not equals". associated rows", or, "Find all rows where X is true, but with no other associated rows Y being true". Yet another approach (do we really need another) to identifying this situation involves the EXISTS statement. In the next example, we will look for all other EXISTS basically allows you to answer the question - "Is diagnoses ever given for patients with a specified "trigger" there a row in the subquery that fits the bill, yes or no?". In diagnosis (e.g. '131 '). To stream1ine the output results, it that regard, it is closely related to ANY (see later). The may also be appropriate to remove the trigger diagnosis primary distinction between the two is that EXISTS focuses from the display. Obviously, this kind of query could be on a row satisfYing certain conditions (usually equalities), enhanced to identifY associated diagnoses which occurred whereas ANY focuses on a value within that row satisfYing before or after the trigger, perhaps to aid in early certain conditions (usually comparisons). Here we isolate the identification of risk or to better plan follow-up treatment.

28 Advanced Tutorials

The code is as follows. This shows the one and only row where SMITHs *------; charges exceed aU ofBROWN's. Note that the result could 'l':tTLE 'EX 12 - REFLEXIVE JOIN' ; easily have been multiple rows. RON; Now observe the result when we use the same PR.OC SQL; code, but substitute ANY for AIL. SELECT PV2 • PA'l':tENT. PV2.OX *------; FROM P'l'VJ:SJ:TS AS PVl., TITLE 'EX 14 - ANY'; P'l'VJ:SJ:TS AS PV2 RON; WHERE PV2.OX <> '131' AND PV2 • PA'l':tENT = PVl. • PATIENT PROC SQL; AND PVl..OX '131'; SELBC'l' PATIENT, QUIT; CHARGES *------; PROK PTV:tSITS WHERE VISDOC = 'SHITH' We are accessing PTVISITS twice, with each access .AND CHIIRGES > ANY independent of the other. We use the two aliases, PVI and (SELECT CHIIRGES FROH P'l'VJ:SITS PV2, to create the two separate "views" of the same table. WHERE VISDOC= ' BROWN' ) ; Here is the resulting output. QUIT; *------; EX 12 - REFLEXJ:VE Jom Here is the output. PATJ:ENT DX EX 14 - ANY P1 087 P1 676 PATl:ENT

P1 74.50 Two additional PROe SQL tools, ANY and AlL, P1 128.00 Pl 85.00 are easier to comprehend from seeing them in action than by explanation. so let's dive right in. In this example, the subquery is also a reflexive join, but it does not have to be the This shows all of SMITHs charges except the one same table at all. (P41S32) which is lower than the lowest of any of Brown's *------; charges (P2I$39). Note that the result could easily have TITLE 'EX 13 - ALL'; been a single row.

Once again, let's throw a function (MAX this time) PROC SQL; SELECT PATIENT, into the mix. In this example, we're trying to find the CBAElGES highest charge each patient has ever incurred in.a single FROM P'l'VJ:Sl:TS visit. From our experience earlier, many of you will WHERE VISDOC=' maTH' probably suspect that the following obvious code won't .AND CBAElGES > ALL (SELECT CHIIRGES work PROK P'l'VJ:srrs WHERE VISDOC=' BROWN' ) ; *------; QUIT; Tl:TLE 'EX 15 - HIGHEST (BAD); *------; RON; PROC SQL; The above code produces the following output. SELECT DIS'l'INCT PATIENT, lIIAX (CHIIRGBS) FROM P'l'VJ:SITS 1 EX 13 - ALL QUIT; *------; PATIENT CHARGES

P1 128.00

29 Advanced Tutorials

As the following output shows, you were right In a "correlated subquery", we are asking PROC SQL to identifY, in a subquery, the one row (possibly rows), which satisfies some , from a group of "candidate" EX 15 - HIGHEST (BAD) rows identified in the outer query. This is a complex PATIENT concept, SO let's look at it another way. We have already seen that functions operate on the "universe" of rows P1 135.00 available to them. In the examples earlier, that universe was P2 135.00 P3 135.00 the entire table. A correlated subquery constrains the P4 135.00 function to operate on small subsets of the table (the P5 135.00 candidate rows from the outer query) one at a time. To P6 135.00 achieve this, the key variables in the subquery have to be P7 135.00 exactly correlated to those in the outer query. Once again. SAS produces the "remerging" note, and In our example, only PATIENT needs to be resolves the function across the entire dataset correlated - however, be warned that the real world is seldom this simple. For instance, if we were evaluating Will a GROUP BY solve the problem. as it did charges by doctorlpatient combination, both patient equality earlier? Yes - it will. However, because of volume (i.e. and doctor equality would have to be established in the discrete nwnber of GROUP values), that approach may only subquery. be suitable for situations of a similar size to this example data. In the real world, GROUP BY is appropriate for studies by Another side-effect of the operation of correlated SEX. HOSPITAL, perhaps DOCTOR, etc. A better approach sub queries is that many "universes" (albeit smaller ones) to low-level swnmaries (i.e. specific to one key, but where have to be evaluated. To accomplish this, SQL has to there are many discrete keys) may be a "correlated subquery". establish. retain, and discard working set after working set. This technique is used in the following example to the This technique is widely regarded as a "hog", and for good highest charge paid by each patient on any visit reason. It may be worthwhile to look into rewriting such *------; queries as "joins." Theoretically, it should be possible, but TITLE 'EX 16 - CORRELATED SUBQOERY'; the developmentlmainteoa ofsuch code may make it less RON; desirable than correlated subqueries, even with their intrinsic inefficiencies. PROC SQI,; SELBCT DISTINCT PVl. PATIENT, PVl. CHARGES FROM P'l'I7J:SI:TS AS PV1 SECTION V - You Could Look It Up WHERE PVl. CHARGES = (SELECT HAX{CHARGES) FROM P'l'I7J:SI:TS AS PV2 How often have you needed to merely display the WHERE PVl. PATIEN'1'=PV2 • PATIENT) ; descriptive information about a SAS dataset, or better yet, collect all the data and use it as a dataset itself? Of course, QtJJ:T; the place to turn for this type of information is PROC *------; CONTENTS. This procedure does an admirable job (despite the mysterious disappearance of the HISTORY Output from the above code is as follows, with option), but is somewhat inflexible in terms of default discussion to follow. output. Once again, PROC SQL provides an alternative which does an excellent job ifyou are in need of this type of EX 16 - CORRELATED SUBQUERY supportive data.

In version 6.08, a few enhancements were added PATIENT CHARGES to PROC SQL in addition to the above mentioned P1 128.00 CALCULATED keyword. Notable among them is the P2 39.00 DICTIONARY component which comes in eight garden P3 110.75 varieties. You can build tables (datasets) containing P4 32.00 information about CATALOGS (SAS libraries, catalogs, P5 106.85 P6 98.00 entries), COLUMNS (SAS dataset variables), EXTFD..ES P7 135.00 (external files defined to your current session by filerefs), INDEXES (of SAS datasets), MEMBERS (SAS files and data libraries), OPTIONS (current settings), TABLES

30 Advanced Tutorials

(SAS data files) and VIEWS (SAS data views.) There is appeared as the PROC SQL default output Making the some overlap in the choices, but its all there. In addition, data available to other reporting procedures via a dataset there are numerous preconstructed VIEWS of these types of however, provides a very powerful tool for developing data available in the SASHELP data library which provide documentation on all of oW' data entities. selected subgroups of information based on the DICTIONARY component. SECTION VI - Now RUN; with It (even though As a fmal example in this looooonnnnng paper, lets you don't have tot) simply create a new table containing some descriptive data about the variables in oW' two working datasets. The This last section title is oW' clever way to get you following code will accomplish this. to play with PROC SQL. Throw a RUN; statement after yoW' next PROC SQL and see what happens.

OK, we've now dug into the PROC SQL bag of *------; tricks to come up with a few techniques which only begin to TITLE 'EX 17 - DICTIONARY'; show the power available when you scratch a little harder RON; and deeper than merely at the surface. This exposition is by PROC SQL; no means extensive. OW' aim here is to show that PROC SQL is a SAS powerhouse, once you take yoW' first steps CREATE TABLE EX17 AS beyond the basics. We only meant to point you in the right SELBCT MEMNAME, direction and give you a peek over the horizon. We hope NAME, we have whetted your appetite to delve fw1her into this new TYPE, frontier (do you think we used enough cliches there?) LBNGTH. FORMAT Happy coding.

FRO!I DICTIONARY • COLUMNS SAS is a registered trademark of SAS Institute, Cary, NC.

WHERE LIBISIJ\HB=' WORK ' AND HEMNAME IN ( , PTDEMOG' • 'P'l'VISITS' ) ; DB2 is a registered trademark of International Business Machines, Inc., Armonk, NY. QUl:T;

PROC PRINT DATA=EX17; The authors currently be contacted follows: RON; can as *------; Alan Dickson 40 Riverbend Drive The output is as follows. Yannouth, ME 04096 EX 17 - DICTIONARY (207) 846-8957 [email protected] OBS MEMN1IME NAME TYPE LENGTH FORMAT

1 PTDEMOG I'ATIENT char 2 2 PTDEMOG PRIMDOC char 5 Ray Pass 3 PTDEMOG DOB nUl1\ 8 MMDDYY8. ASG,Inc. 4PTDEMOG SEX char 1 1100 Summer Street 5 PTITISITS PATIENT char 2 6 PTlTISITS VISDATE num 8 MMDDYYB. Stamford, CT 06905 7 PTlTISITS VISDOC char 5 B PTlTISI.TS HX char 3 voice: (203) 356-9540 9 PTlTISITS DX char 3 fax: (203) 967-8644 10 PTlTISITS RX char 4 11 PTlTISITS CHARGES nUl1\ 8 6.2 e-mail: 0005962116.mcimail.com

We could have simply listed the data by merely SELECTing it and not CREATEing TABLE EX17. We would then not have to PROe PRINT it - it would have

31