Programming and Data Management for IBM SPSS Statistics 19
A Guide for IBM SPSS Statistics and SAS Users

Raynald Levesque and SPSS Inc.

Note: Before using this information and the product it supports, read the general information under “Notices” on p. 435.

This document contains proprietary information of SPSS Inc., an IBM Company. It is provided under a license agreement and is protected by copyright law. The information contained in this publication does not include any product warranties, and any statements provided in this manual should not be interpreted as such.

When you send information to IBM or SPSS, you grant IBM and SPSS a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright SPSS Inc. 1989, 2010.

Preface

Experienced data analysts know that a successful analysis or meaningful report often requires more work in acquiring, merging, and transforming data than in specifying the analysis or report itself. IBM® SPSS® Statistics contains powerful tools for accomplishing and automating these tasks. While much of this capability is available through the graphical user interface, many of the most powerful features are available only through command syntax, and you can make the programming features of command syntax significantly more powerful by combining it with a full-featured programming language. This book offers many examples of the kinds of things that you can accomplish using command syntax by itself and in combination with other programming languages.

For SAS Users

If you have more experience with SAS for data management, see Chapter 32 for comparisons of the different approaches to handling various types of data management tasks. Quite often, there is not a simple command-for-command relationship between the two programs, although each accomplishes the desired end.

Acknowledgments

This book reflects the work of many members of the SPSS Inc.
staff who have contributed examples here and in Developer Central, as well as that of Raynald Levesque, whose examples formed the backbone of earlier editions and remain important in this edition. We also wish to thank Stephanie Schaller, who provided many sample SAS jobs and helped to define what the SAS user would want to see, as well as Marsha Hollar and Brian Teasley, the authors of the original chapter “IBM® SPSS® Statistics for SAS Programmers.”

A Note from Raynald Levesque

It has been a pleasure to be associated with this project from its inception. I have for many years tried to help IBM® SPSS® Statistics users understand and exploit its full potential. In this context, I am thrilled about the opportunities afforded by the Python integration and invite everyone to visit my site at www.spsstools.net for additional examples. And I want to express my gratitude to my spouse, Nicole Tousignant, for her continued support and understanding.

Raynald Levesque

Contents

1 Overview
    Using This Book
    Documentation Resources

Part I: Data Management

2 Best Practices and Efficiency Tips
    Working with Command Syntax
        Creating Command Syntax Files
        Running Commands
        Syntax Rules
    Protecting the Original Data
        Do Not Overwrite Original Variables
        Using Temporary Transformations
        Using Temporary Variables
    Use EXECUTE Sparingly
        Lag Functions
        Using $CASENUM to Select Cases
        MISSING VALUES Command
        WRITE and XSAVE Commands
    Using Comments
    Using SET SEED to Reproduce Random Samples or Values
    Divide and Conquer
        Using INSERT with a Master Command Syntax File
        Defining Global Settings

3 Getting Data into IBM SPSS Statistics
    Getting Data from Databases
        Installing Database Drivers
        Database Wizard
        Reading a Single Database Table
        Reading Multiple Tables
    Reading IBM SPSS Statistics Data Files with SQL Statements
        Installing the IBM SPSS Statistics Data File Driver
        Using the Standalone Driver
    Reading Excel Files
        Reading a “Typical” Worksheet
        Reading Multiple Worksheets
    Reading Text Data Files
        Simple Text Data Files
        Delimited Text Data
        Fixed-Width Text Data
        Text Data Files with Very Wide Records
        Reading Different Types of Text Data
    Reading Complex Text Data Files
        Mixed Files
        Grouped Files
        Nested (Hierarchical) Files
        Repeating Data
    Reading SAS Data Files
    Reading Stata Data Files
    Code Page and Unicode Data Sources

4 File Operations
    Using Multiple Data Sources
    Merging Data Files
        Merging Files with the Same Cases but Different Variables
        Merging Files with the Same Variables but Different Cases
        Updating Data Files by Merging New Values from Transaction Files
    Aggregating Data
        Aggregate Summary Functions
    Weighting Data
    Changing File Structure
        Transposing Cases and Variables
        Cases to Variables
        Variables to Cases

5 Variable and File Properties
    Variable Properties
        Variable Labels
        Value Labels
        Missing Values
        Measurement Level
        Custom Variable Properties
        Using Variable Properties as Templates
    File Properties

6 Data Transformations
    Recoding Categorical Variables
    Binning Scale Variables
    Simple Numeric Transformations
    Arithmetic and Statistical Functions
    Random Value and Distribution Functions
    String Manipulation
        Changing the Case of String Values
        Combining String Values
        Taking Strings Apart
    Changing Data Types and String Widths
    Working with Dates and Times
        Date Input and Display Formats
        Date and Time Functions

7 Cleaning and Validating Data
    Finding and Displaying Invalid Values
    Excluding Invalid Data from Analysis
    Finding and Filtering Duplicates
    Data Preparation Option

8 Conditional Processing, Looping, and Repeating