Programming and Data Management for IBM SPSS Statistics 19

Total Page:16

File Type:pdf, Size:1020Kb

Programming and Data Management for IBM SPSS Statistics 19 Programming and Data Management for IBM SPSS Statistics 19 A Guide for IBM SPSS Statistics and SAS Users Raynald Levesque and SPSS Inc. Note: Before using this information and the product it supports, read the general information under “Notices” on p. 435. This document contains proprietary information of SPSS Inc, an IBM Company. It is provided under a license agreement and is protected by copyright law. The information contained in this publication does not include any product warranties, and any statements provided in this manual should not be interpreted as such. When you send information to IBM or SPSS, you grant IBM and SPSS a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligationtoyou. © Copyright SPSS Inc. 1989, 2010. Preface Experienced data analysts know that a successful analysis or meaningful report often requires more work in acquiring, merging, and transforming data than in specifying the analysis or report itself. IBM® SPSS® Statistics contains powerful tools for accomplishing and automating these tasks. While much of this capability is available through the graphical user interface, many of the most powerful features are available only through command syntax—and you can make the programming features of its command syntax significantly more powerful by adding the ability to combine it with a full-featured programming language. This book offers many examples of the kinds of things that you can accomplish using command syntax by itself and in combination with other programming language. For SAS Users If you have more experience with SAS for data management, see Chapter 32 for comparisons of the different approaches to handling various types of data management tasks. Quite often, there is not a simple command-for-command relationship between the two programs, although each accomplishes the desired end. Acknowledgments This book reflects the work of many members of the SPSS Inc. staff who have contributed examples here and in Developer Central, as well as that of Raynald Levesque, whose examples formed the backbone of earlier editions and remain important in this edition. We also wish to thank Stephanie Schaller, who provided many sample SAS jobs and helped to define what the SAS user would want to see, as well as Marsha Hollar and Brian Teasley, the authors of the original chapter “IBM® SPSS® Statistics for SAS Programmers.” A Note from Raynald Levesque It has been a pleasure to be associated with this project from its inception. I have for many years tried to help IBM® SPSS® Statistics users understand and exploit its full potential. In this context, I am thrilled about the opportunities afforded by the Python integration and invite everyone to visit my site at www.spsstools.net for additional examples. And I want to express my gratitude to my spouse, Nicole Tousignant, for her continued support and understanding. Raynald Levesque © Copyright SPSS Inc. 1989, 2010 iii Contents 1Overview 1 UsingThisBook.............................................................. 1 DocumentationResources...................................................... 2 Part I: Data Management 2 Best Practices and Efficiency Tips 5 WorkingwithCommandSyntax.................................................. 5 CreatingCommandSyntaxFiles............................................... 5 Running Commands........................................................ 6 SyntaxRules............................................................. 7 Protecting theOriginalData..................................................... 7 Do Not OverwriteOriginalVariables............................................ 8 UsingTemporaryTransformations............................................. 8 UsingTemporaryVariables.................................................. 9 UseEXECUTESparingly........................................................10 LagFunctions............................................................11 Using$CASENUMtoSelectCases.............................................12 MISSING VALUES Command.................................................13 WRITEandXSAVECommands................................................13 Using Comments..............................................................13 Using SET SEED toReproduceRandomSamplesorValues..............................14 Divide and Conquer...........................................................15 UsingINSERTwithaMasterCommandSyntaxFile................................15 DefiningGlobalSettings.....................................................15 3 Getting Data into IBM SPSS Statistics 19 GettingDatafromDatabases....................................................19 Installing Database Drivers . 19 DatabaseWizard..........................................................20 ReadingaSingleDatabaseTable..............................................20 ReadingMultipleTables.....................................................22 v ReadingIBMSPSSStatisticsDataFileswithSQLStatements............................25 Installing the IBM SPSS Statistics Data File Driver. 25 UsingtheStandaloneDriver.................................................26 ReadingExcelFiles............................................................27 Readinga“Typical”Worksheet...............................................27 ReadingMultipleWorksheets................................................30 ReadingTextDataFiles.........................................................32 SimpleTextDataFiles......................................................32 DelimitedTextData........................................................33 Fixed-WidthTextData......................................................36 TextDataFileswithVeryWideRecords.........................................40 ReadingDifferentTypesofTextData...........................................40 ReadingComplexTextDataFiles..................................................41 MixedFiles..............................................................42 GroupedFiles............................................................43 Nested(Hierarchical)Files..................................................45 RepeatingData...........................................................49 ReadingSASDataFiles........................................................50 ReadingStataDataFiles........................................................51 CodePageandUnicodeDataSources.............................................52 4 File Operations 55 UsingMultipleDataSources....................................................55 MergingDataFiles............................................................58 MergingFileswiththeSameCasesbutDifferentVariables..........................58 MergingFileswiththeSameVariablesbutDifferentCases..........................61 UpdatingDataFilesbyMergingNewValuesfromTransactionFiles....................64 AggregatingData.............................................................65 AggregateSummaryFunctions...............................................67 WeightingData...............................................................68 ChangingFileStructure........................................................70 TransposingCasesandVariables..............................................70 CasestoVariables.........................................................71 VariablestoCases.........................................................73 vi 5 Variable and File Properties 77 VariableProperties............................................................77 VariableLabels...........................................................79 ValueLabels.............................................................80 MissingValues...........................................................80 MeasurementLevel........................................................81 CustomVariableProperties..................................................81 UsingVariablePropertiesasTemplates ........................................82 FileProperties...............................................................83 6 Data Transformations 85 RecodingCategoricalVariables..................................................85 BinningScaleVariables........................................................85 SimpleNumericTransformations.................................................88 ArithmeticandStatisticalFunctions...............................................88 RandomValueandDistributionFunctions...........................................89 StringManipulation...........................................................90 ChangingtheCaseofStringValues............................................90 Combining StringValues....................................................91 TakingStringsApart.......................................................92 Changing Data TypesandStringWidths............................................95 Working with DatesandTimes...................................................96 Date Input and DisplayFormats...............................................97 DateandTimeFunctions....................................................99 7 Cleaning and Validating Data 105 FindingandDisplayingInvalidValues............................................. 105 ExcludingInvalidDatafromAnalysis............................................. 107 FindingandFilteringDuplicates................................................. 107 DataPreparationOption....................................................... 110 8 Conditional Processing, Looping, and Repeating
Recommended publications
  • SPSS to Orthosim File Conversion Utility Helpfile V.1.4
    SPSS to Orthosim File Conversion Utility Helpfile v.1.4 Paul Barrett Advanced Projects R&D Ltd. Auckland New Zealand email: [email protected] Web: www.pbarrett.net 30th December, 2019 Contents 3 Table of Contents Part I Introduction 5 1 Installation Details ................................................................................................................................... 7 2 Extracting Matrices from SPSS - Cut and Paste ................................................................................................................................... 8 3 Extracting Matrices from SPSS: Orthogonal Factors - E.x..c..e..l. .E..x..p..o..r.t................................................................................................................. 17 4 Extracting Matrices from SPSS: Oblique Factors - Exce.l. .E..x..p..o..r..t...................................................................................................................... 24 5 Creating Orthogonal Factor Orthosim Files ................................................................................................................................... 32 6 Creating Oblique Factor Orthosim Files ................................................................................................................................... 41 3 Paul Barrett Part I 6 SPSS to Orthosim File Conversion Utility Helpfile v.1.4 1 Introduction SPSS-to-Orthosim converts SPSS 11/12/13/14 factor loading and factor correlation matrices into the fixed-format .vf (simple ASCII text) files
    [Show full text]
  • Integrating Excel, SQL, and SPSS Within an Introductory Finance Intrinsic Value Assignment
    Journal of Finance and Accountancy Volume 20, September, 2015 Integrating Excel, SQL, and SPSS within an Introductory Finance Intrinsic Value Assignment Richard Walstra Dominican University Anne Drougas Dominican University Steve Harrington Dominican University ABSTRACT In 2013, the Association to Advance Collegiate Schools of Business (AACSB) issued revised accreditation standards to address a “new era” of business education. The standards recognize that students will be entering a data-driven world and need skills in information technology to analyze and manage data. Employer surveys also emphasize the importance of technological skills and knowledge in a rapidly changing environment. To address these challenges, business faculty in all disciplines must adapt course content to incorporate software tools and statistical applications and integrate active learning tools that provide students with hands-on experience. This paper describes a technology-based, business valuation assignment instructors can employ within an undergraduate managerial finance course. The assignment draws upon historical financial data from the restaurant industry as students predict the intrinsic value of a firm through the application of various software products. Students receive data in an Access database and create SQL queries to determine free cash flows before transferring information into Excel for further analysis. Students continue by uploading key data elements into SPSS and performing statistical analysis on three growth models in order to identify a desired predictor of intrinsic value. The assignment develops students’ abilities to navigate software tools, understand statistical concepts, and apply quantitative decision making. Key Words: Business valuation, growth rates, software tools Copyright statement: Authors retain the copyright to the manuscripts published in AABRI journals.
    [Show full text]
  • IBM DB2 for Z/OS: the Database for Gaining a Competitive Advantage!
    Why You Should Read This Book Tom Ramey, Director, DB2 for z/OS IBM Silicon Valley Laboratory “This book is a ‘must read’ for Enterprise customers and contains a wealth of valuable information! “It is clear that there is a technology paradigm shift underway, and this is opening enormous opportunities for companies in all industries. Adoption of Cloud, Mobile, and Analytics promises to revolutionize the way we do business and will add value to a company’s business processes across all functions from sales, marketing, procurement, manufacturing and finance. “IT will play a significant role enabling this shift. Read this book and find out how to integrate the heart of your infrastructure, DB2 for z/OS, with new technologies in order to maximize your investment and drive new value for your customers.” Located at IBM’s Silicon Valley Laboratory, Tom is the director of IBM’s premiere relational database management system. His responsibilities include Architecture, Development, Service, and Customer Support for DB2. He leads development labs in the United States, Germany, and China. Tom works closely with IBM’s largest clients to ensure that DB2 for z/OS continues as the leading solution for modern applications, encompassing OLTP to mobile to analytics. At the same time he continues an uncompromising focus on meeting the needs of the most demanding operational environments on Earth, through DB2’s industry- leading performance, availability, scaling, and security capabilities. IBM DB2 for z/OS: The Database for Gaining a Competitive Advantage! Shantan Kethireddy Jane Man Surekha Parekh Pallavi Priyadarshini Maryela Weihrauch MC Press Online, LLC Boise, ID 83703 USA IBM DB2 for z/OS: The Database for Gaining a Competitive Advantage! Shantan Kethireddy, Jane Man, Surekha Parekh, Pallavi Priyadarshini, and Maryela Weihrauch First Edition First Printing—October 2015 © Copyright 2015 IBM.
    [Show full text]
  • A FORTRAN 77 Program for a Nonparametric Item Response Model: the Mokken Scale Analysis
    BehaviorResearch Methods, Instruments, & Computers 1988, 20 (5), 471-480 A FORTRAN 77 program for a nonparametric item response model: The Mokken scale analysis JOHANNES KINGMA University of Utah, Salt Lake City, Utah and TERRY TAERUM University ofAlberta, Edmonton, Alberta, Canada A nonparametric item response theory model-the Mokken scale analysis (a stochastic elabo­ ration of the deterministic Guttman scale}-and a computer program that performs this analysis are described. Three procedures of scaling are distinguished: a search procedure, an evaluation of the whole set of items, and an extension of an existing scale. All procedures provide a coeffi­ cient of scalability for all items that meet the criteria of the Mokken model and an item coeffi­ cient of scalability for every item. Four different types of reliability coefficient are computed both for the entire set of items and for the scalable items. A test of robustness of the found scale can be performed to analyze whether the scale is invariant across different subgroups or samples. This robustness test serves as a goodness offit test for the established scale. The program is writ­ ten in FORTRAN 77. Two versions are available, an SPSS-X procedure program (which can be used with the SPSS-X mainframe package) and a stand-alone program suitable for both main­ frame and microcomputers. The Mokken scale model is a stochastic elaboration of which both mainframe and MS-DOS versions are avail­ the well-known deterministic Guttman scale (Mokken, able. These programs, both named Mokscal, perform the 1971; Mokken & Lewis, 1982; Mokken, Lewis, & Mokken scale analysis. Before presenting a review of the Sytsma, 1986).
    [Show full text]
  • 2018 Corporate Responsibility Report
    2018 Corporate Responsibility Report Trust and responsibility. Earned and practiced daily. #GoodTechIBM IBM 2018 Corporate Responsibility Report | 1 Trust and responsibility. Earned and practiced daily. We have seen, for more than a century, that when to the boardroom. They are core to every — We invested hundreds of millions of dollars in we apply science to real-world problems, we can relationship — with our employees, our clients, programs to help train and prepare the global create a tomorrow that is better than today. More our shareholders, and the communities in which workforce for this new era. These initiatives sustainable. More equitable. More secure. we live and work. include 21st century apprenticeship programs, returnships for women reentering In fact, we have never known a time when In this report, you will read about the many the workforce, veterans programs and science and technology had more potential to achievements we made to further this foundation volunteer skills-building sessions for more benefit society than right now. of trust and responsibility throughout 2018. than 3.2 million students worldwide. And we For example: helped scale the P-TECH™ school model — a In the last 10 years alone, the world has achieved six-year program that offers a high school Ginni Rometty at P-TECH in Brooklyn, N.Y., May 2019 stunning advancements, from breaking the — After reaching our aggressive goals to increase diploma and an associate’s degree, along AI winter to the dawn of quantum computing. our use of renewable energy and reduce CO2 with real-world working experience and These and other advanced technologies have emissions 4 years ahead of schedule, we set mentorship — at no cost to students.
    [Show full text]
  • IBM SPSS Orientation: Version 23
    IBM SPSS Orientation: Version 23 Installation & Introduction to Operations [6 and 12-Month Licence version] This document provides an introduction to IBM SPSS, explaining how to install and run the program, as well as a basic overview of its features. Your Real Estate Division course workbook will provide software instructions for running necessary statistical procedures. However, you are expected to have installed the software and have a working knowledge of how to operate the software program and interpret its results. In general, IBM SPSS is very easy to learn and use. The user interface is similar to that of Microsoft Word and Microsoft Excel and its output is easily transferred to these programs. There is comprehensive help available within the program as well as customer support. Note that the Real Estate Division’s courses do not cover all of the capabilities of the IBM SPSS program. For a complete explanation of statistical procedures not covered in the lessons, you should refer to the IBM SPSS Help menu found in the program. (Note: from this point on SPSS means IBM SPSS) Computer Requirements It is essential that students have at least a very basic knowledge of how a computer operates. If you are unfamiliar with the operation of personal computers, you may wish to visit your local bookstore for a “how to” manual or investigate schools, colleges, or libraries in your area which may offer a course designed for computer beginners. SPSS operates with a Windows or Mac operating system. To use SPSS, you will need to have basic computer skills in order to do the following: • download a program from a website; • start an application; • use a mouse; • use menus and submenus; • use click, select, and drop and drag actions; and • save files to a hard disk drive and open them again.
    [Show full text]
  • IBM Db2 on Cloud Solution Brief
    Hybrid Data Management IBM Db2 on Cloud A fully-managed, relational database on IBM Cloud and Amazon Web Services with elastic scaling and autonomous failover 1 IBM® Db2® on Cloud is a fully-managed, SQL cloud database that can be provisioned on IBM Cloud™ and Amazon Web Services, eliminating the time and expense of hardware set up, software installation, and general maintenance. Db2 on Cloud provides seamless compatibility, integration, and licensing with the greater Db2 family, making your data highly portable and extremely flexible. Through the Db2 offering ecosystem, businesses are able to desegregate systems of record and gain true analytical insight regardless of data source or type. Db2 on Cloud and the greater Db2 family support hybrid, multicloud architectures, providing access to intelligent analytics at the data source, insights across the business, and flexibility to support changing workloads and consumptions cases. Whether you’re looking to build cloud-native applications, transition to a fully-managed instance of Db2, or offload certain workloads for disaster recovery, Db2 on Cloud provides the flexibility and agility needed to run fast queries and support enterprise-grade applications. Features and benefits of Db2 on Cloud Security and disaster recovery Cloud databases must provide technology to secure applications and run on a platform that provides functional, infrastructure, operational, network, and physical security. IBM Db2 on Cloud accomplishes this by encrypting data both at rest and in flight, so that data is better protected across its lifecycle. IBM Db2 on Cloud helps restrict data use to only approved parties with user authentication for platform services and resource access control.
    [Show full text]
  • SPSS Statistics 23.0.0.0 SPSS Statistics 23.0.0.0 Supported Operating Systems
    Software Product Compatibility Reports Supported Operating Systems Product SPSS Statistics 23.0.0.0 SPSS Statistics 23.0.0.0 Supported Operating Systems Contents Included in this report Operating systems Glossary Disclaimers Report data as of 2016-12-27 01:20:00 EST 1 SPSS Statistics 23.0.0.0 Supported Operating Systems Included in this report This report can be generated with filters applied to operating system platforms, components, and/or software capabilities. This section reflects how the report was filtered when it was generated. Legend • The information about this item is included in this report. • The information about this item is not included in the report filter. Platforms Components • AIX Desktop • Linux • IBM SPSS Statistics Client Mac OS • Server • Solaris • IBM SPSS Statistics Server • Windows Report data as of 2016-12-27 01:20:00 EST 2 SPSS Statistics 23.0.0.0 Supported Operating Systems Operating Systems The operating sysytem section specifies the operating systems that SPSS Component support Statistics 23.0.0.0 supports, organized by operating system familiy. Full Operating system families Partial AIX Linux Mac OS Solaris Windows None AIX Summary Operating system OperatingHardware Bitness Product Components Notes? system minimum minimum AIX 6.1 Base POWER 64-Exploit 23.0.0.0 Yes System - Big Endian AIX 7.1 Base POWER 64-Exploit 23.0.0.0 Yes System - Big Endian Report data as of 2016-12-27 01:20:00 EST 3 SPSS Statistics 23.0.0.0 Supported Operating Systems AIX 6.1 POWER System - Big Endian Legend: Supported Not supported
    [Show full text]
  • IBM SPSS Statistics Version 25: Mac OS Installation Instructions (Authorized User License) Installation Instructions
    IBM SPSS Statistics Version 25 Mac OS Installation Instructions (Authorized User License) IBM Contents Installation instructions ........ 1 Notes for installation .......... 1 System requirements............ 1 Licensing your product ........... 2 Authorization code ........... 1 Using the license authorization wizard .... 2 Installing ............... 1 Viewing your license .......... 2 Running multiple versions and upgrading from a Applying fix packs ............ 3 previous release ............ 1 Uninstalling .............. 3 Note for IBM SPSS Statistics Developer .... 1 Updating, modifying, and renewing IBM SPSS Installing from a downloaded file ...... 1 Statistics ................ 3 Installing from the DVD/CD ........ 1 iii iv IBM SPSS Statistics Version 25: Mac OS Installation Instructions (Authorized User License) Installation instructions The following instructions are for installing IBM® SPSS® Statistics version 25 using the license type authorized user license. This document is for users who are installing on their desktop computers. System requirements To view system requirements, go to http://publib.boulder.ibm.com/infocenter/prodguid/v1r0/clarity/ index.jsp. Authorization code You will also need your authorization code(s). In some cases, you might have multiple codes. You will need all of them. You should have received separate instructions for obtaining your authorization code. If you cannot find your authorization code, contact Customer Service by visiting http://www.ibm.com/software/analytics/ spss/support/clientcare.html. Installing Running multiple versions and upgrading from a previous release You do not need to uninstall an old version of IBM SPSS Statistics before installing the new version. Multiple versions can be installed and run on the same machine. However, do not install the new version in the same directory in which a previous version is installed.
    [Show full text]
  • Implementing Database Connections in SPSS
    District Data Coordinator Toolbox: Implementing Database Connections in SPSS Jason Schoeneberger, Ph.D. Senior Researcher & Task Lead Data, data, everywhere The volume of and the push to Teachers, principals, make use educational data is administrators and analysts often growing: have difficulty keeping pace. • More people must become data savvy (teachers, coordinators, etc.) • Leadership may request cyclical reporting to establish and monitor trends • Little time to document business rules or standardize data storage practices • Quality control can take time or be difficult to manage 2 Some familiar scenarios (using data stored in SQL, Oracle, Access, etc.) • The same data points are • Analysts are maintaining necessary across idiosyncratic versions of reporting cycles various data elements • Process to acquire and (e.g. test score files, report data is repetitive student attendance files, across reporting cycles etc.) • A non-technical person • Idiosyncratic versions may be tasked with have commonalities reporting responsibility across analyst versions • Lack of documentation • Separate data requests • Analysts report shortage completed by different of storage space on analysts yield conflicting network or external hard results (e.g. a school drives mean test score) 3 Database connections • Databases (e.g., SQL, Oracle, Access, etc.) allow for basic data base connectivity: – Open Database Connectivity (ODBC) – Object Linking and Embedding Database (OLEDB) – These are often standard on computers • ODBC/OLEDB connections are frameworks
    [Show full text]
  • IBM SPSS Statistics
    IBM SPSS Statistics: Time Is Money 1 Overview Advanced statistical procedures and visualization can provide a robust, user-friendly, and integrated platform to understand your data and solve complex business and research problems. No matter what industry you are coming from, you can reveal the meaning behind your data with IBM SPSS Statistics’ easy-to-use software. In no time you can perform complex statistical analyses, giving your data experts time to focus on insights instead of setup. 2 If you’re looking for a market-leading IBM SPSS Statistics: product to use for statistical analysis, Leading the Pack look no further than IBM SPSS Statistics. Based on data from real user reviews, IBM SPSS Statistics was ranked as a Leader in G2’s Statistical Analysis category. IBM SPSS Statistics has the highest Market Presence score and second highest Satisfaction score on the G2 Grid®. Grid® Report for Statistical Analysis | Winter 2020 Leaders Market Presence Market Satisfaction * The Grid® Report for Statistical Analysis | Winter 2020 is based off of scores calculated using the G2 Grid® algorithms from reviews collected by November 27, 2019. 3 Benefits Easy to Use Intuitive interface gives everyone the ability to conduct statistical analysis Quick, Reproducible Insights Simple-to-document processes for reproducible insights Great Support Find answers easily from IBM’s support team and IBM SPSS Statistics community Full Featured Gives you the features you need to conduct complex statistical analysis Flexible and Scalable Always improving with frequent updates 4 Compared to other statistical software it Easy to Use is the best. Allows for complex analysis with far more ease than excel or other similar software.
    [Show full text]
  • OHD Outbreak Investigation Templates for Questionnaires, Data Entry, and Data Analysis
    31 July 2003 OHD Outbreak Investigation Templates for Questionnaires, Data Entry, and Data Analysis Overview This describes how to use a family of questionnaire templates for outbreak investigations. Although most commonly used with outbreaks of gastroenteritis, they can be adapted for other kinds of investigations. Although they can be used as stand-alones, they are optimized for use with a FileMaker data entry template, “Data Entry.FP5”. (Note to non-FileMaker users: although a tedious project, these templates could be replicated in Access or most other database programs; help yourself.) The FileMaker template allows for fast and accurate data entry, and feeds directly into pre-written EpiInfo analysis programs. Because the set-up does not vary much from outbreak to outbreak, data entry people quickly become (and stay) proficient. The key difference between this system and more traditional data entry is that many exposures (typically food items), as well as symptoms and other characteristics, are grouped into blocks for data entry. Only “yes” answers are entered; “no” is otherwise assumed. For example, if there were 20 items available at a suspect meal, traditionally one would make 20 yes/no variables. In this system, however, those 20 items are entered as a single variable. If the person ate only items A, C, and H on the list, for example, one would enter “ACH”. Behind the scenes, these raw data are being split into “traditional” YES/NO variables. At any time, with the flick of a button, the data can be automatically cleaned and exported (e.g., as DBF files) for analysis in EpiInfo or other programs.
    [Show full text]