Introduction to Python for Econometrics, Statistics and Data Analysis

3rd Edition, 1st Revision
Kevin Sheppard
University of Oxford
Monday 9th September, 2019

©2019 Kevin Sheppard

Changes since the Third Edition

• Verified that all code and examples work correctly against 2019 versions of modules. The notable packages and their versions are:
  – Python 3.7 (Preferred version)
  – NumPy: 1.16
  – SciPy: 1.3
  – pandas: 0.25
  – matplotlib: 3.1
• Python 2.7 support has been officially dropped, although most examples continue to work with 2.7. Do not use Python 2.7 in 2019 for numerical code.
• Small typo fixes, thanks to Marton Huebler.
• Fixed direct download of FRED data due to API changes, thanks to Jesper Termansen.
• Thanks to Bill Tubbs for a detailed read and multiple typo reports.
• Updated for changes in line profiler (see Ch. 24).
• Updated deprecations in pandas.
• Removed hold from the plotting chapter since this is no longer required.
• Thanks to Gen Li for multiple typo reports.
• Tested all code on Python 3.6. Code has been tested against the current set of modules installed by conda as of February 2018. The notable packages and their versions are:
  – NumPy: 1.13
  – pandas: 0.22

Notes to the 3rd Edition

This edition includes the following changes from the second edition (August 2014):

• Rewritten installation section focused exclusively on using Continuum’s Anaconda.
• Python 3.5 is the default version of Python instead of 2.7. Python 3.5 (or newer) is well supported by the Python packages required to analyze data and perform statistical analysis, and brings some useful new features, such as a new operator for matrix multiplication (@).
• Removed distinction between integers and longs in the built-in data types chapter. This distinction is only relevant for Python 2.7.
• dot has been removed from most examples and replaced with @ to produce more readable code.
• Split Cython and Numba into separate chapters to highlight the improved capabilities of Numba.
• Verified all code working on current versions of core libraries using Python 3.5.
• pandas
  – Updated syntax of pandas functions such as resample.
  – Added pandas Categorical.
  – Expanded coverage of pandas groupby.
  – Expanded coverage of date and time data types and functions.
• New chapter introducing statsmodels, a package that facilitates statistical analysis of data. statsmodels includes regression analysis, Generalized Linear Models (GLM) and time-series analysis using ARIMA models.

Changes since the Second Edition

• Fixed typos reported by a reader, thanks to Ilya Sorvachev.
• Code verified against Anaconda 2.0.1.
• Added diagnostic tools and a simple method to use external code in the Cython section.
• Updated the Numba section to reflect recent changes.
• Fixed some typos in the chapter on Performance and Optimization.
• Added examples of joblib and IPython’s cluster to the chapter on running code in parallel.
• New chapter introducing object-oriented programming as a method to provide structure and organization to related code.
• Added seaborn to the recommended package list, and included it by default in the graphics chapter.
• Based on experience teaching Python to economics students, the recommended installation has been simplified by removing the suggestion to use virtual environments. The discussion of virtual environments has been moved to the appendix.
• Rewrote parts of the pandas chapter.
• Changed the Anaconda install to use both create and install, which shows how to install additional packages.
• Fixed some missing packages in the direct install.
• Changed the configuration of IPython to reflect best practices.
• Added a subsection covering IPython profiles.
• Added a small section about Spyder as a good starting IDE.

Notes to the 2nd Edition

This edition includes the following changes from the first edition (March 2012):

• The preferred installation method is now Continuum Analytics’ Anaconda.
  Anaconda is a complete scientific stack and is available for all major platforms.
• New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform preliminary analysis. It also greatly simplifies importing and exporting data.
• New chapter on advanced selection of elements from an array.
• Numba provides just-in-time compilation for numeric Python code which often produces large performance gains when pure NumPy solutions are not available (e.g. looping code).
• Dictionary, set and tuple comprehensions.
• Numerous typo fixes.
• All code has been verified working against Anaconda 1.7.0.

Contents

1 Introduction
  1.1 Background
  1.2 Conventions
  1.3 Important Components of the Python Scientific Stack
  1.4 Setup
  1.5 Using Python
  1.6 Exercises
  1.A Additional Installation Issues
2 Python 2.7 vs. 3 (and the rest)
  2.1 Python 2.7 vs. 3.x
  2.2 Intel Math Kernel Library and AMD’s GPUOpen Libraries
  2.3 Other Variants
  2.A Relevant Differences between Python 2.7 and 3
3 Built-in Data Types
  3.1 Variable Names
  3.2 Core Native Data Types
  3.3 Additional Container Data Types in the Standard Library
  3.4 Python and Memory Management
  3.5 Exercises
4 Arrays and Matrices
  4.1 Array
  4.2 Matrix
  4.3 1-dimensional Arrays
  4.4 2-dimensional Arrays
  4.5 Multidimensional Arrays
  4.6 Concatenation
  4.7 Accessing Elements of an Array
  4.8 Slicing and Memory Management
  4.9 import and Modules
  4.10 Calling Functions
  4.11 Exercises
5 Basic Math
  5.1 Operators
  5.2 Broadcasting
  5.3 Addition (+) and Subtraction (-)
  5.4 Multiplication (*)
  5.5 Matrix Multiplication (@)
  5.6 Array and Matrix Division (/)
  5.7 Exponentiation (**)
  5.8 Parentheses
  5.9 Transpose
  5.10 Operator Precedence
  5.11 Exercises
6 Basic Functions and Numerical Indexing
  6.1 Generating Arrays and Matrices
  6.2 Rounding
  6.3 Mathematics
  6.4 Complex Values
  6.5 Set Functions
  6.6 Sorting and Extreme Values
  6.7 Nan Functions
  6.8 Functions and Methods/Properties
  6.9 Exercises
7 Special Arrays
  7.1 Exercises
8 Array and Matrix Functions
  8.1 Views
  8.2 Shape Information and Transformation
  8.3 Linear Algebra Functions
  8.4 Exercises
9 Importing and Exporting Data
  9.1 Importing Data using pandas
  9.2 Importing Data without pandas
  9.3 Saving or Exporting Data using pandas
  9.4 Saving or Exporting Data without pandas
  9.5 Exercises
10 Inf, NaN and Numeric Limits
  10.1 inf and NaN
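
The notes to the 3rd edition state that dot has been replaced with the @ operator in most examples. As a rough illustration of what that change looks like in practice, the sketch below compares the two styles on a pair of small arrays. The array values and variable names are made up for this illustration and are not taken from the book; it assumes NumPy is installed and Python 3.5 or newer is in use.

```python
import numpy as np

# Two small conformable arrays (illustrative values, not from the book)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

# Older style: matrix product through an explicit function call
old_style = np.dot(x, y)

# Python 3.5+ style: the @ matrix multiplication operator
new_style = x @ y

# Element-by-element multiplication (*) is a different operation
elementwise = x * y

print(new_style)
print(elementwise)
```

For 2-dimensional arrays such as these, np.dot and @ give the same result, while * multiplies element by element.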
