How to Use…

Guide to using software for the practicals of the Essex Summer School 2016 course 1E ‘Introduction to multilevel models with applications’

Paul Lambert, July 2016 Version 2.5 of 13/JUN/2016

Contents

How to... Undertake the practical sessions for module 1E
   Principles of the practical sessions
   Organisation of the practical sessions
   ..The tremendous importance of workflows..
How to use… EditPad Lite
How to use… Stata
How to use… SPSS
How to use… MLwiN
   How to use MLwiN (i): MLwiN the easy way
   How to use MLwiN (ii): MLwiN for serious people
   How to use MLwiN (iii): Using ‘runmlwin’ or ‘r2mlwin’
   Some miscellaneous points on working with MLwiN
How to use… R
References cited
Software-specific recommendations

How to... Undertake the practical sessions for module 1E

The next few pages describe topics and materials from the 1E lab sessions. The format of the labs is that you are supplied with a set of example ‘command files’ in the relevant packages, plus a corresponding (electronic) exercise sheet. During the scheduled session you should open and review the example command files (.do, .sps, .mac and .R files), then finish the session by answering the exercise sheet (which draws upon things illustrated in the example command files).

The example command files include annotation notes which try to explain what the commands are doing. Some of the example files are very long, but once you become familiar with them you will find that their sections can often be moved through quite quickly. Ordinarily, you should proceed through the example command files by running a few lines at a time and observing the outputs generated. As you digest the examples, it will often help to add your own notes, adjustments or examples to the command file text. You’ll ordinarily need to have the analytical software open, together with the relevant tool for working with a syntax file (e.g. ‘syntax window’ or ‘do file editor’). In addition it will typically also be helpful to have open some applications to remind you of where the data is stored, and perhaps a plain text editor allowing you to conveniently open up several other syntax files for purposes of comparison.

- When working with Stata a typical view of your desktop might be something like:

Description: The first two interfaces you can see in this screenshot are respectively the Stata do file editor (where I write commands and send them to be processed, such as by highlighting the relevant lines and clicking ‘ctrl-d’); and the main Stata window (here Version 10) which includes the results page. Note that the syntax file open is a modified (personalised) version of the supplied illustrative syntax file – the name has been changed so that I can save the original file plus my own version of it after edits (e.g. with my own comments). Behind the scenes I’ve also got open an ‘Editpadlite’ session which I’m using to conveniently open up and compare some other syntax files that I don’t particularly want in my do file editor itself; I’ve also got a file manager software open showing the data I’m drawing upon (in what Stata will call ‘path1a’); and I’ve got some Internet Explorer (IE) sessions open (I’m looking up the online BHPS documentation, and the Stata listserv, where information on Stata commands is available).


- When working with SPSS, you may have a desktop session something like:

Description: The three SPSS interfaces shown in this screenshot are firstly the syntax editor (where I write commands and send them to be processed, such as by highlighting the relevant lines and clicking ‘ctrl-r’); then behind this are the ‘data editor’ and the ‘output’ window in SPSS, where I can see the data and results from the commands. Again, the syntax file open is a modified (personalised) version of the supplied illustrative syntax file – the name has been changed so that I can save the original file plus my own version of it after edits (e.g. with my own comments). Whilst working in SPSS I’ve also got open an ‘Editpadlite’ session which I’m using to keep open and compare some other syntax files that I’m checking against; I’ve also got a file manager software open (Windows Explorer) to show the location of the data files I’m using (i.e. the contents of ‘path1a’); and I’ve got some IE sessions open (I’m looking at the BHPS documentation, but it’s late in the day and my self-discipline is slipping a little, so I’ve got my email open as well, which is never going to help with my multilevel modelling..!).

Principles of the practical sessions

The programme of practicals included within course 1E is intended first to help you understand the topics covered in the teaching sessions, and second to furnish you with important skills in applying multilevel analysis techniques to social science data.

For the practical sessions we provide this handout which briefly describes the contents of the sessions and the Essex lab setup, and then gives more extended material describing the various software packages used in the labs. In addition, within the labs you are given copies of extended software-specific command files which contain the details of the exercises in the form of software code and some degree of descriptive notes. These sample command files will hopefully provide you with some useful worked examples which will help you for quite some time into the future. Supplementing these resources, you are also given lab ‘exercise sheets’, and some ‘homework’ exercises, which should serve to test the skills covered in the practical sessions.

A lot of material is presented across the programme of practicals, particularly in the details of supplementary exercises or activities. It is easy with software based materials to skip through parts of the sessions briefly without properly digesting them. Nevertheless you will get the best out of the module if you work through the practical exercises in full (even if that involves catching up with some aspects of them later on). You should also expect that covering the materials in full will often involve following up other information sources, such as software help guides and files.

To some extent, part of learning to work effectively with complex data and analysis methods (which is what, arguably, defines multilevel modelling of social science processes) is about teaching yourself to be a better software programmer in relevant software packages. Many social scientists have not previously been trained in programming. Good programming involves using software in a manner which generates a syntactical record of the procedures; keeping your software commands and files well organised; and learning specific programming rules, fragments of code, and code options, as are relevant to the particular software packages (see also the sections immediately below). Much of the work of Course 1E, indeed, concerns the development of your skills for using software for handling and analysing social science datasets. To pre-warn you, for most of us, the development of programming skills in this domain is a long and arduous process.

We would also highlight four themes running through the practical sessions which summarise the approach we are adopting and the principles of good practice in using software for handling and analysing social science datasets which we follow:

- Software pluralism
- Documentation for replication
- The integration of data management and data analysis
- The use of ‘real’ data

Software pluralism

Payne et al. (2004) popularised the concept of ‘methodological pluralism’ for social scientists, as the idea that all good researchers should strive to have a working understanding of different methods of analysis, not just their own specialities. We have a similar view with regard to software: it is valuable to have some understanding of how to do similar things in different packages. This is the case because, as well as providing valuable practical skills, pluralism also provides insight and understanding, by seeing things from different perspectives. Our lab materials do not comprehensively have the same materials in all four packages (they have, relatively, much more content in Stata than in SPSS, R or MLwiN), but we have tried to provide illustrations across packages, and, though we recognise that most of you will concentrate on one set of examples, we think that you will learn useful things from spending a little time with other packages as well.

Documentation for replication

Our next principle refers to the importance of working in such a way as to keep a clear linear record of the steps involved in your analysis. This serves as a means to support the long-term preservation of your work. A useful guiding principle is that, after a significant break in activities, you yourself – or another researcher at some future point – could retrace and replicate your analysis, generating exactly the same results. There are several excellent recent texts which explain the importance of the principle of replicability to a model of systematic scientific enquiry with social science research data (e.g. Dale, 2006; Freese, 2007; Kohler and Kreuter, 2009; Altman and Franklin, 2010). Long (2009) gives a very useful extended guide to organising your research materials for this purpose.

In social statistics we work overwhelmingly with software packages which process datasets and generate results. In this domain, documentation for replicability is readily achieved by (a) the careful and consistent organisation and storage of data files and other related materials, and (b) the systematic use (and storage) of ‘syntactical’ programming to process tasks.

‘Syntactical programming’ (also often described as ‘textual command programming’) involves finding a way to write out software commands in a textual format which may be easily stored, reopened, re-run and/or edited and re-run. Examples include using SPSS ‘syntax files’, and Stata ‘do files’, but almost all software programmes support some sort of equivalent syntactical command representation. The use of syntactical programming is not universal in the analysis of social science research data - but it should be. A great attraction of programmes such as SPSS and Stata is that their syntactical command language is largely ‘human readable’ (that is, you can often guess what the syntax commands mean even before you look them up). We’ll say more on applying syntactical programming in the ‘How to use…’ sections below.

The integration of data management and data analysis.

Our third principle involves recognising the scope for modification and enhancement of quantitative data in the social sciences. We all know that data is ‘socially constructed’, but the impact for analysts, which we sometimes forget, is that many measures in a survey dataset are ‘up for negotiation’ (such as in the choices we have regarding category recodes, missing data treatments, derived summary variables, and ‘functional form’). The way that such decisions are actually negotiated in any particular study comprises an element of research which is often labelled ‘data management’. (In fact, in other work, a group of us at the University of Stirling developed a number of training and instructional materials on data management for social survey research, available from www.dames.org.uk/workshops).

It is our experience that most applied projects require more input to data management than to data analysis aspects of the work. As such, finding software approaches (or combinations of software approaches) which support the seamless movement back and forth between data management and data analysis is an important consideration. In this regard, Stata stands out at the current time as the superior package for social survey research as it combines extended functionality for both components of the research process (this is the main reason that Stata is used more frequently than other packages in this course). Other packages or combinations of packages can also be used effectively (for instance a common way of working with multilevel models for those who don’t have access to Stata is to combine SPSS for data management and exploratory analysis, and MLwiN or R for more complex multilevel data analysis functions).

The use of ‘real’ data.

Our last principle may sound rather trite but is an important part of gaining applied skills in social science data analysis. Statistics courses and texts ordinarily start with simplified illustrative data such as simulated data and data files which are very small in terms of the number of cases and variables involved. This is for very good reasons – trying anything else can lead to access permission problems, software processing time delays, and onerous requirements for further data management. In this course we do often use simplified datasets with these characteristics; however, as much as possible we also include materials which start from the original source data (e.g. the full version of the data such as you could get from downloading via the UK Data Archive or a similar data provider).

There are quite a few reasons why using full size data and the original measures is a good thing to aspire to. Firstly, in the domain of multilevel modelling techniques, estimation and calculation problems, and the actual magnitude of hierarchical effects, can often be different in ‘real’ datasets compared to those from simplified examples (in ‘real’ applications, the former tend to be much greater, and the latter much smaller..!). Secondly, using full size data and original variables very often exposes problems, errors or required adjustments with software routines which may not otherwise be apparent (for instance, errors caused by missing data conventions, or upper limits on the numbers of categories, cases or variables, are common problems here). Thirdly, our experience is that by using ‘real’ data during practical sessions, as course participants you are much more likely to go away from the course in a position to be readily able to apply the analytical methods covered to your own data and analysis requirements.

Do I really need to do all of this? There’s no doubt that we are ‘aiming high’ in our advice here about approaches to using software, and in the correspondingly wide coverage of the labs! Some students find this all a bit overwhelming, for instance because there are a great many files to look through, and challenging practices to try to adhere to. We won’t stop you if you don’t do everything in our way, and in fact we’d prefer that to you working yourself into a nervous wreck trying to cover everything that we suggest. Still, there really are benefits to developing good practices of the manner described here, and my argument would be that if you’ve come this far (to Colchester!), you might as well push yourself further to develop as best you can as a data analyst.

Organisation of the practical sessions

Materials referred to in the 1E practical sessions will include:
- data files (copies of survey data used in the practicals);
- sample command files (pre-prepared materials which include programming commands in the language of the relevant software);
- summary exercise sheets (questions which test you on things covered by each lab);
- supplementary ‘macros’ or ‘sub-files’ (further pre-prepared materials featuring programming commands in relevant languages, usually invoked as a sub-routine within the main sample command files);
- teaching materials (i.e. the Coursepack notes and lecture files).


At each session we will give you example command files for the data analysis packages, which take you through the process of carrying out the relevant data analysis operations. The main activities of the lab sessions involve opening the relevant files and working your way through them. During the labs, we do of course expect that you will ask for help or clarification on parts of the command files which are not clear.

During the workshop you will be able to store data files at a network location within the University of Essex systems. You should allocate a folder for this module, and use subfolders within it to store the relevant materials.

A typical setup will look something like:

This screenshot is from my own laptop so it won’t be identical to your machine, but you should have something broadly similar. As illustrated, I’ve made a folder called ‘d:\essex10’ within which I’ve made a number of subfolders for different component materials linked to the course. For example, in the ‘syntax’ subfolder I’ve got copies of the four example command files linked to Practical session 1; although not shown, other things from the module are in the other subfolders, such as a number of other relevant command files in the ‘sub_files’ folder, and some relevant data files in the ‘data’ folder.

An important point to make is that the various command files will need to draw upon other files (e.g. data files) in order to run. To do this, they need to be able to reference the location of the required files. In most applications, we do this by defining ‘macros’ which point to specific ‘paths’ on your computer (see also software sections below). For the labs to work successfully, it will be necessary to ensure that the command file you are trying to run is pointing to the right paths at the right time. In general, this only requires one specification to be made at the start of the session, for instance to do this in SPSS and in Stata we define ‘macros’ for the relevant paths (also see pages 20 and 28 of the ‘How to use’ guide on this issue). Sometimes however it can be necessary to edit the full path reference of a particular file in order to be able to access it.

For example, in the text below, we show some Stata and SPSS commands which in both cases define a macro (called ‘path3a’) which gives the directory location of the data file ‘aindresp.dta’ or ‘aindresp.sav’. Subsequent commands which call upon it will then go directly to that path:

Stata example:

global path3a "d:\data\bhps\"
use pid asex using $path3a\aindresp.dta, clear
tab asex

SPSS example:

define !path3a () "d:\data\bhps\" !enddefine.
get file=!path3a+"aindresp.sav".
fre var=asex.

Example command files

Most of the command files are annotated with descriptive comments which try to explain, at relevant points, what they are doing in terms of the data and analysis.

In general, to use the example command files, you should:

- Save them into your own filestore space
- Open them with the relevant part of the software application (e.g. the ‘syntax editor’ in SPSS or the ‘do file editor’ in Stata; see also the ‘How to use..’ subsections)
- Save the files with a new name to indicate that this will be your personalised copy of the file (i.e. so that you can edit the files without risk of losing the original copies)
- Make any necessary edits to the path locations given in the files (usually this involves editing the ‘path’ statements near the top of the relevant files; see the sketch after this list)
- Work through the files, implementing the commands and interpreting the outputs, adding your own comments to the files to help you understand the operations you’re performing
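As a minimal sketch of what the top of a personalised command file might then look like (shown for Stata; the folder locations here are hypothetical):

* my_practical1.do: my personalised copy of the supplied example file
* (saved under a new name so that the original remains intact)

* Edit these path macros to match your own filestore:
global path1a "M:\essex1e\data"    // location of the data files
global path9 "M:\essex1e\temp"     // a folder for temporary files

* Later commands can then call files via the macros, e.g.:
use $path1a\aindresp.dta, clear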

Data Files

We’ll use a great many data files over the course of the practical programme. We’re planning to supply them via the network drive, though it may sometimes be easier to transfer them by other means.

Many of these data are freely available online, such as the example data files used in the textbooks by Hox (2010), Treiman (2009) and Rabe-Hesketh and Skrondal (2012).

Some of the data files are versions of complex survey datasets from the UK and from comparative international studies. We can use these for teaching purposes, but you are not allowed to take them away with you from the class. You can in general access these data for free; however, to do so you must register with the relevant data provider (e.g. the UK Data Service) and agree to any associated conditions of access (e.g. required citations and acknowledgements).

Macros/non-interactive command files

As well as the example ‘interactive’ command files, at certain points we will also supply you with command files in the various software languages which essentially serve as ‘macros’ (also referred to as ‘sub-files’) to perform specific tasks in data management or analysis. These ordinarily do not need to be further edited or adjusted, and it is not particularly necessary to examine their contents, but it will be necessary to take some time to place the files in the right location in order to invoke them successfully at a suitable time.

Where to put your files..

In general it will be effective to store your files (example command files, data files, and macro files) in the allocated network filestore space during the course of the summer school (‘M:\ drive’). This filestore should be available to you from any machine on the Essex system.

It might not be effective to store some of the larger survey data files on the network location, however, due to their size. In some instances, it might be more efficient to store the files on the hard drives of the machine you are using (e.g. the C:\ drive). However, if doing this, you should be aware that the contents of the hard drives may not be preserved when you log off.

The various exercises and command files ought all to be transferable to other machines (e.g. your laptop or your work machine at your home institution), though obviously you will need to install the relevant software if you don’t already have it. At the end of the module, you will probably want to take a copy of all your command files and related notes, plus the freely available example datasets and the macro files, on a memory stick or by some other means, to take back with you.

Analysis of your own data

In general we would very much encourage you to bring your own data to the practical sessions. You’ll get a lot out of the module if you are able to repeat the demonstration analyses on your own data and research problems. It will ordinarily be most effective to undertake the class examples first, then subsequently try to re-do the analysis, in part or in whole, on your own data. If we can, we’ll help you in working with your own data within lab sessions, though when the class is busy we need to prioritise people who are working through the class examples.

..The tremendous importance of workflows..

If you haven’t come across the idea of workflows before, then this might simply be a point that, at this stage, we ask you to take in good faith! There are very good expositions of the idea of workflows in the social science data analysis process in, amongst others, Long (2009), Treiman (2009), and Kohler and Kreuter (2008).

A workflow in itself is a representation of a series of tasks which contribute to a project or activity. It can be a useful exercise to conceptualise a research project as a workflow (with components such as data collection, data processing, data analysis, report writing). However, the really important contribution is to organise the data and command files that are associated with a project in a consistent style that recognises their respective contributions to the workflow structure.

What does that involve? The issue is that we want to construct a replicable trail of our data oriented research, which allows us to go all the way from opening the initial data file, to producing the publication quality graph or statistical results which are our end products. We need the replicable trail in order that it will be possible to adjust our analysis on the basis of minor changes at any possible stage of the process (or to be able to transfer a record of our work on to others). However because the trail is long and complex (and not entirely linear), we can only do this, realistically, if we break down our activities into multiple, separate components.

There are different ways to organise files for these purposes, but a popular and highly effective approach is to design a ‘master’ syntax command file and a series of ‘sub-files’ which it draws upon. In this model, the sub-files cover different parts of the research analysis. Personally, my preference is to construct both the master and sub-files in the principal software package being used, though Long (2009) notes that creating a documentation master file in different software (e.g. MS Excel) is an effective way to record a wider range of activities which span across different software.

Here’s an example of a series of tasks being called upon via a Stata format ‘master’ file:

This screenshot shows the Stata master file (‘bhps_educ_master.do’), plus a couple of other Stata do files, which I’ve opened in the do file editor; plus, in addition, a number of other component files are open within the EditPad editor, including the sub-file which is being called directly by the master file (‘pre_analysis1.do’). The Stata output file is not visible, but is open behind the scenes.
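As a textual sketch of what such a master file might contain (the paths and the second sub-file name are hypothetical; ‘pre_analysis1.do’ follows the screenshot description):

* bhps_educ_master.do: master file for the project
version 10
global path1a "d:\data\bhps"                  // raw data location
global path9 "d:\temp"                        // temporary files

* Call the project sub-files in sequence:
do "d:\essex10\sub_files\pre_analysis1.do"    // data preparation
do "d:\essex10\sub_files\analysis1.do"        // main analysis (hypothetical)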


Here’s an example of a project documentation file that might be constructed in Excel:

Note that the other tabs in the Excel file can be used to show things like author details, context of the analysis, and last update time. The file also notes some (though not all) dependencies within the workflow – for instance step 9 requires step 4 to have been taken (the macro reads in a plain text data file that was generated in Stata by the do file ‘pre_analysis1.do’).

In summary, we can’t stress strongly enough the value of organising your data files around a workflow conceptualisation, such as through master and sub-files. Read the opening chapters of Long (2009), or the other references mentioned above, for more on this theme. You could also look at the workshop materials from the ‘DAMES’ research Node, at www.dames.org.uk, for more on this topic.

Software alternatives

Many different software packages can be used effectively for applied research using complex social science data (e.g. Lambert et al. 2015). Various packages support the estimation of a wide range of multilevel models covering random effects models and other specifications. Packages are not uniform, however, in their support for multilevel models. There are variations in their coverage of different modelling options; their requirements and approaches to operationalising data for modelling; the statistical algorithms the packages use; the relative estimation times required; the default settings for estimation options and other data options; and the means of displaying and storing the results produced. The Hox (2010) text comments systematically on software options for different modelling requirements, and other comparisons can be found in Twisk (2006) and West et al. (2007), amongst others. A very thorough range of material reviewing alternative software options for multilevel modelling is available at the Centre for Multilevel Modelling website at:

http://www.bristol.ac.uk/cmm/learning/mmsoftware/

In this course we focus upon four packages which bring slightly different contributions to multilevel modelling.

- SPSS is used because it is a widely available general purpose package for data management and data analysis, which also supports the estimation of many random effects multilevel models. Field’s (2013) introductory text on SPSS includes a chapter on multilevel modelling in SPSS, whilst Heck et al. (2010, 2012) give longer treatments of the topic.

- Stata is used because it is a popular general purpose package for data management and data analysis which includes a substantial range of multilevel modelling options. Stata is attractive to applied researchers for many reasons, including its good facilities for storing and summarising estimation results; its support of a wide range of advanced analytical methods which complement a multilevel analysis (such as clustering routines often used in Economics); and its wide range of data management functions suited to complex data. Rabe-Hesketh and Skrondal’s (2012) text provides a very full intermediate-to-advanced level guide to multilevel modelling in Stata.

- MLwiN is used because it is a widely available, purpose built package for applied multilevel modelling which covers almost all random effects modelling permutations, and is also designed to be broadly accessible to non-specialist users. A further attraction of MLwiN is that its purpose built functionality means that it supports a wider range of estimation options, and achieves substantially more efficient estimation, than many other packages. The MLwiN manual by Rasbash et al. (2012) is a very strong, and accessible, guide to this software.

- R is used in this course because it is a popular package that supports most forms of multilevel model estimation (via extension libraries). It is a difficult language for many social scientists to work with because it has high ‘overheads’ in its programming requirements in several different ways (although plenty of help is available – e.g. Grolemund and Wickham 2016). Gelman and Hill (2007) give an excellent advanced introduction to multilevel modelling which is illustrated with examples in R.

We should stress that many more packages can be used effectively for multilevel modelling analysis. Some of those that are often used for multilevel modelling in social science research, but that we have not mentioned hitherto, are HLM (Raudenbush et al., 2004), M-Plus (Muthén and Muthén, 2010), and BUGS (e.g. Lun et al., 2000). Some cross-over between packages can also be found, such as in linkage routines known as ‘plug-ins’ which allow a user to invoke one software package from the interface of another – examples include routines for running HLM (hlm, see Reardon, 2006) and MLwiN (runmlwin, see Leckie and Charlton, 2011) from Stata, and tools to run an estimation engine known as ‘Sabre’ in Stata and in R (see Crouchley et al. 2009).

Finally, an exciting development in the area is a project based at the University of Bristol that has constructed a generic software, ‘StatJR’ (www.bristol.ac.uk/cmm/software/statjr/), for specifying and estimating complex statistical models of ‘arbitrary complexity’, including most forms of multilevel models, and many other types of model (e.g. Browne et al. 2012). In ongoing work this project is also linked to a facility supporting workflow modelling of research tasks, known as ‘ebooks’ (www.bristol.ac.uk/cmm/research/ebooks/).


How to use… EditPad Lite

The first package we’ll introduce is not a statistical analysis package. EditPad Lite is a freely distributed plain text editor, and we go to the trouble of introducing it for two reasons:

- Using a plain text editor to open supplementary files and subfiles is a very good way to perform complex operations and programming without getting into too tangled a mess.
- In our experience, plenty of people haven’t previously used plain text editors, and are initially confused by the software until they’ve got used to using it.

1) Access. To access the package, go to http://www.editpadlite.com/ and follow the links to download the installation software:


2) More on installation. It is best to click ‘save file as’ and run the setup.exe file from a location on your computer. This will extract four component files which allow you to run the software, regardless of where they are stored. You could for instance install the files on a section of your memory stick and run the programme from there, but I would recommend saving them to your network filestore folder and running the software from that location. E.g.:

You can now run the software by double clicking the ‘EditPadLite.exe’ file. To save yourself some time, you might want to copy that .exe file, then ‘paste as shortcut’ onto your desktop (and maybe also add a shortcut key to the shortcut icon), so that you can open up the software rapidly (though such desktop settings might not be preserved across login sessions).

3) Opening files. When you open the software, you are given a blank template from which you can then ask to open one or more files (you can in fact open several files in one call, which can be convenient):

(note how the multiple files are organised under multiple tabs)

4) Editing files. You can use EditPadLite to edit files (for instance sub-files and macros which are called upon in other applications). There are numerous handy file editing functionalities in EditPad Lite which you won’t get in a simpler text editor such as Notepad. For instance you can run various types of ‘search’ and ‘replace’ commands, across one or multiple files. The latter is often helpful, for instance if you have built up a suite of related files and you want to change a path specification in all of them.

(The red tabs indicate the contents of the file have been changed and have not yet been saved)

Closing comments on EditPad Lite

Using a plain text editor is very helpful if your work involves programming in any software language. It can help you to deal with complex combinations of files, and it provides a ‘safe’ way of editing things without inadvertently adding in formatting or other characters (a problem that will arise if you try to edit command files in a word processor such as Microsoft Word).

We’ve nominated EditPadLite for this purpose, though you might find a different editor more helpful or preferable. There are many alternatives, some of which are free to access for some or all users, and others of which require a (typically small) purchase price. Note that if you wish to use EditPad for commercial purposes, the licence requires that you should purchase the ‘professional’ version of the package, called ‘EditPadPro’.


How to use… Stata

Stata was first developed in 1984 and was originally used mainly in academic research in economics. From approximately the mid-1990s its functionalities for social survey data analysis began to filter through to other social science disciplines, and in the last decade it has displaced SPSS as the most popular intermediate-to-advanced level statistical analysis package in many academic disciplines which use social survey data, such as in sociology, educational research, population health and geography.

Stata is popular for many good reasons (see also Treiman 2009: 80-6). The list of features of Stata that lead me personally to favour this package above others are:

- It supports explicit documentation of complex processes through a concise and ‘human readable’ syntax language
- It supports a wide range of data management functions, including many routines useful for complex survey data which are not readily performed in other packages (e.g. ‘egen’, ‘xtdes’; see the sketch after this list)
- It supports a very full range of statistical modelling options, including several advanced model specifications which are not widely available elsewhere
- It has excellent graphics capabilities, supporting the specification and export of publication quality graphs in a syntactical, replicable manner
- It features very convenient tools for storing the results from multiple models or analyses and compiling them in summary tables or files (e.g. ‘est store’, ‘statsby’)
- It can read online data files and run command files and macros from online locations
- It supports extended add-on programming capabilities, and benefits from a large, constructive community of user-contributed extensions (see http://www.stata.com/links/utilities-and-services/)
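To give a flavour of the convenience of the commands mentioned above, a minimal sketch (the variable names here are hypothetical):

* 'egen' can compute, e.g., household summaries from person-level records:
bysort hid: egen hhinc = total(income)

* 'statsby' collates a summary statistic for each group into a new dataset:
statsby mean_inc=r(mean), by(region) clear: summarize income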

In pragmatic terms, most users of Stata are reasonably confident programmers, and getting started with the package does need a little effort in learning about data manipulation and data analysis. This is one reason why Stata is not yet widely taught in introductory social science courses - though, in the UK for example, it is increasingly used in intermediate and advanced level teaching (e.g. MSc programmes or Undergraduate social science programmes with extended statistical components).

A common problem with working with Stata is that many institutions do not have site-level access to the software, and accordingly many individual researchers don’t have access to the package. Stata is generally sold as an ‘n-user’ package, which means that an institution buys a specified number of copies at any one time. Recently however, access to Stata for academic researchers has probably been made easier by the Stata ‘GradPlan’, which allows purchase of personal copies of the package for students and faculty at a fairly low price – see http://www.stata.com/order/new/edu/gradplans/. Stata also comes in several different forms with different upper limits on the scale of data it may handle – ‘Small Stata’ is not normally adequate for working with advanced survey datasets; ‘Intercooled’ Stata (I/C) usually has more than enough capacity to support social survey research analysis (using I/C with a large scale dataset, you may occasionally hit upper limits, such as on the number of variables or cases, but it is usually possible to find an easy work-around, such as by dropping unnecessary variables); Stata SE and MP offer greater capacity regarding the size of datasets and faster processing power, but they are more expensive to purchase. To my knowledge, most academic researchers use Intercooled Stata.

Many online resources on Stata are available, in particular we highlight:

- UCLA’s ATS pages: http://www.ats.ucla.edu/stat/stata/ (features a wide range of materials including videos of using Stata and routines across the range of the package)
- The CMM’s LEMMA online course: http://www.bristol.ac.uk/cmm/learning/course.html (includes detailed descriptions of running basic regression models and of specifying random effects multilevel models in Stata)
- In the first lab session we point you to an illustrative do file which serves as an introduction to Stata, available from www.longitudinal.stir.ac.uk

The steps below give you some relevant instructions on working with Stata for the purposes of the module.

(The images below are for Stata version 10, whereas the Essex lab is likely to have access to the most recent version, 13. For introductory purposes, there are few differences between these versions, though you might notice small changes to the default colour schemes and layouts – many people edit these to their own tastes anyway; see under ‘general preferences’. Later on, when we get to running multilevel models, version 13 is better to use, since it has improved functionality here.)

On opening the programme, you see the basic Stata window (this image shows Stata version I/C 10.1).

You can customise its appearance (e.g. right click on the results window) – the image on the right reflects how I’ve set up the windows on my machine, and will be slightly different to what you see by default on first launching the package in the lab at Essex.

Is Stata perfect? My account above might give this impression, and it is certainly well suited to applied social science research with survey datasets. For multilevel modelling, Stata does have a few flaws, for instance having very slow estimation times for some models, and there are a few other bugs and annoying quirks that we will see during the course of the lab sessions. Thus it probably isn’t the perfect package for our purposes - but I would say that it gets quite close.

The very first thing you should do at the start of every session is to ask explicitly to open the ‘do file’ editor, with ‘ctrl+8’ or ‘ctrl+9’ or via the GUI.

Note below that we can have several do files open at once.

Not shown below, but from Stata 11 onwards, the default syntax editor has formatting features such as colour coding that aim to aid programming. It is also possible to set up Stata to run directly from a plain text editor if you wish to (search online for how to do this).

Once you’ve opened a ‘do file’ you can begin running commands by highlighting the segments of the relevant command lines and clicking ‘ctrl+d’.

(Note that I’ve changed the windows display to ‘white’ background to make it easier to read the outputs)

Important: defining macros for paths. This particular image shows an important component of the start of every session in the module lab exercises. The lines beginning with ‘global’ are ways of defining permanent ‘macros’ for the session. The macros serve to define locations on my machine where my files (e.g. data files) are actually stored. Doing this means that in later commands (e.g. the image below) I can call on files via their macro folder locations rather than their full path – this aids transferability of work between machines/locations.

The results are shown in the results window. Error messages are by default shown in red text and lead to the termination of the sequence of commands at that point

(unlike in SPSS, which carries on, disregarding the error).

In the above, the macro which I have called ‘path3b’ means that when Stata reads the line:

what it reads ‘behind the scenes’ is
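To illustrate in text (the definition shown here is hypothetical): if the session had defined

global path3b "d:\data\ghs"

then the line

use $path3b\ghs95.dta, clear

would be read behind the scenes as

use d:\data\ghs\ghs95.dta, clear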


You can also submit commands line by line through the command interface (e.g. if you don’t want to log them in the do file).

Note how the ‘review’ window shows lines that were entered through the command window, but it just shows some programming code for commands run through the do file editor.

Note some of the features of the Stata syntax language:

You need to ‘clear’ the dataspace to read in a new file, e.g.:
use $path3b\ghs95.dta, clear

You can’t create a new variable with the same name as an existing one – if it’s there already you need to delete it first, e.g.:
drop fem
gen fem=(sex==2)

The ‘capture’ command suppresses potential error messages so is a useful way to make commands generic:
capture drop fem
gen fem=(sex==2)

Using ‘by:’ or ‘if..’ within a command can usefully restrict the specifications:
tab nstays if sex==1
bysort sex: tab nstays if age >= 20 & age <= 30

The ‘numlabel’ command is a useful way to show both numeric values and categorical labels compared to the default (labels only), e.g.:
tab ecstaa
numlabel _all, add
tab ecstaa

There’s no requirement for full stops at the end of lines, but a carriage return serves as the default delimiter, and so we usually use ‘///’ to extend a command over more than one line:
bysort sex: tab nstays ///
   if age >= 20 & age <= 60 & ecstaa==1


Example of finding and installing the ‘tabplot’ extension routine:

Extension routines are often written by users and made available to the wider community. To exploit them, you need to run either ‘net install’ or ‘ssc install’

(the exact code needed may depend on which machine you are working on – you may have to define a folder for the installation that you have permission to write to)
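A sketch of the commands involved (‘tabplot’ is the real routine named above; the ado folder shown is hypothetical):

* Install from the SSC archive:
ssc install tabplot

* If you lack write permission for the default ado folder, first point
* installations at a folder you can write to:
net set ado "M:\essex1e\ado"
ssc install tabplot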

We often run subfiles, or define macros or programmes, via calling upon other do files with the ‘do’ command
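For instance (the file name is hypothetical):

do "$path9\mysubfile.do"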


‘est store’ is a great way to collate and review multiple model results
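A minimal sketch of the idiom (the variable names are hypothetical):

quietly regress income age
est store m1
quietly regress income age fem
est store m2
est table m1 m2, stats(N r2) b(%9.3f) se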

Extension: You can write a ‘profile.do’ file to load some starting specifications into your session
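For example, a minimal profile.do (its contents here are illustrative), saved in a folder Stata searches on start-up:

* profile.do: run automatically when Stata launches
set more off
global path1a "d:\data\bhps"   // my usual data location (hypothetical)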

For lots more on Stata, see the links and references given above, or the DAMES Node workshops at dames.org.uk


How to use… SPSS

Here we will refer to SPSS as the software which, in 2008, was officially rebranded as ‘PASW Statistics’, and then renamed again in 2009 as “IBM SPSS Statistics”. SPSS has been a market leader as a software tool for social survey data analysis and data management since its first versions in the late 1960s. Some people believe that position is coming to an end, since recent revisions to SPSS seem to make it less suitable for academic research users, whilst other packages have emerged, such as Stata, that seem to many to be more robust and more wide-ranging in their capabilities. Nevertheless, most European educational establishments have site-level access to SPSS, and continue to teach it to students as a first introductory package for survey data analysis; additionally a great many social science researchers continue to work with the package – so the software itself remains a significant part of social survey data analysis.

SPSS has a very good range of useful features for data management and exploratory analysis of survey data. It also has functionality that supports many variants of random effects multilevel models, which has increased in coverage substantially in recent versions (e.g. Heck et al. 2010; 2012). Perhaps relatively few advanced users of multilevel models work with SPSS, principally because other packages have a wider range of model specification options.[1] Indeed a common practice amongst multilevel modellers in the social sciences is to use SPSS for data preparation, in combination with a specialist multilevel modelling package such as MLwiN or HLM for the actual statistical analysis. We try to illustrate this usage in some examples (using SPSS in preparation for analysis in MLwiN) but our examples also seek to demonstrate SPSS’s own multilevel modelling capacity.

In working with SPSS for the preparation or analysis of complex survey data, it is critical to use ‘syntax’ command programming in order to preserve a record of the tasks you undertake. This is often a bit of a burden to adapt to if it is new to you: probably most people are first taught to use the package using non-syntax methods (i.e. using the drop down or ‘GUI’ interface). For topics such as covered in this module, however, it is essential to work with syntax commands, which we’ll give you examples of in the relevant lab session files. You can also find out relevant syntax by setting up commands via the GUI menus then clicking ‘paste’, or by using the SPSS help facilities.

The steps below highlight ways of using SPSS effectively for the purpose of this module (these images use version 17, and again for introductory purposes these issues are much the same for later versions). Online introductions to SPSS are available from many other sources, and a list of online tutorials is maintained at: http://www.spsstools.net/spss.htm

[1] We should note that SPSS can also be used as the means of specifying a much wider range of multilevel modelling and other statistical models through its ‘plug-in’ facility supporting R and Python scripts (see e.g. Levesque and SPSS Inc, 2010, for more on this topic).


Your first step should always be to open the programme, then immediately open a new or existing syntax file

I advise against double clicking on SPSS format data files as a way to open the package and data simultaneously. This might seem a trivial issue, but actually it makes quite a difference to the replicability and consistency of your analysis.


You can run commands via syntax by highlighting all or part of the line or lines you want to run, then clicking ‘run -> selection’, or just ‘ctrl+r’.

You can write out the commands yourself; or find their details in the manual; or use the ‘paste’ button to retrieve them from the GUI dialogue windows.

The above gives you a manual with documentation on all syntax commands. Alternatively by clicking the ‘syntax help’ icon whilst the cursor is pointing within a particular command, you can go straight to the specific syntax details of the command you’re looking at (see below).


These images depict using the ‘paste’ option to find relevant syntax (for the crosstabs command). Using ‘paste’, you’ll often get a bit more syntax detail than you bargained for since all the default settings are explicitly shown.


Most often, you’ll start with an existing syntax file and work from it, editing and amending that file. A well organised syntax file should start with metadata and path definitions.

By default, the syntax editor has colour coding and other command recognition. I personally prefer to turn off annotation settings on the syntax editor (under ‘tools’ and ‘view’), but others prefer to keep them on.

*Path definitions are important! In the above, at the top of the file we’ve used the ‘define..’ lines to specify macros which SPSS will now recognise as the relevant text. For example, when SPSS reads

in the above ‘get file’ line, it is actually reading
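To illustrate in text (the definition and file names here are hypothetical, following the earlier BHPS example): if the file had defined

define !path1a () "d:\data\bhps\" !enddefine.

then a line such as

get file=!path1a+"aindresp.sav".

is expanded behind the scenes to

get file="d:\data\bhps\"+"aindresp.sav".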

A few syntax tips:
- New commands begin on a new line.
- Command lines can spread over several lines, but commands must always end with a full stop. When a command line is spread over several lines, it is conventional, though not essential, to indent the 2nd, 3rd etc lines by one or more spaces.
- Any line beginning with a ‘*’ symbol is ignored by the SPSS processor and is regarded as a ‘comment’ (you write comments purely for your own benefit).
- Do your best not to use the drop-down menus! Practice pays off where using syntax is concerned – the more you run in syntax, the more comfortable you’ll be with it.
- It’s good to organise your syntax files into master files and sub-files. The master file can call the sub-files with the ‘include’ command. (A short sketch follows below.)
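A minimal sketch illustrating these points in SPSS syntax (the second variable and the file locations are hypothetical; ‘asex’ follows the earlier BHPS example):

* This comment line is ignored by the processor.
get file=!path1a+"aindresp.sav".
crosstabs
  /tables=asex by aregion
  /cells=count row.
include file="m:\essex1e\syntax\sub1.sps".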

Some comments on output from SPSS

- You can manually edit the content of the output windows (e.g. text, fonts, colours).
- You can paste output directly into another application, either as an image (via ‘paste special’) or as an editable graph or table (in some applications).
- Usually, you’re unlikely to want to save a whole output file (it’s big and isn’t transferable), though you can print output files to pdf.
- Many users don’t save SPSS output itself, but merely save their syntax and transcribe important results from the output as and when they need them.


How to use… MLwiN

MLwiN was first released in 1995 as an easy to use Windows style version of the package ‘MLN’. That package supported the preparation of data and the estimation of multilevel models via a DOS style command interface; MLwiN has since retained the MLN estimation functions, and developed a great many new additional routines.

How to use MLwiN (i): MLwiN the easy way

The standard means of using MLwiN is to open the package then open data files in its format, and input commands through opening the various windows and choosing options. People most often use the Names and Equations windows here – the latter is especially impressive, allowing you to specify models then run them and review their results in an intuitive interface.

The MLwiN user’s guide gives a very accessible description of the windows command options, as does the LEMMA online course covering MLwiN examples (http://www.cmm.bristol.ac.uk/research/Lemma/). One important addition, however, is that it is very useful to have the ‘command interface’ and ‘output’ windows open in the background. These will give you valuable information on what is going on behind the scenes when you operate the package.

Here are some comments and description on using MLwiN in the standard way:

When you first open the MLwiN interface…

…the first thing to do is click ‘data manipulation’ and ‘command interface’, then click ‘output’, then tick ‘include output..’

{Note - These materials are oriented to MLwiN v 2.20, but they should be compatible with other versions of the software}

The normal next step is to open up an existing MLwiN format dataset (‘worksheet’)..


..or else to read in data in plain text format, which requires you to specify how many columns there are in the data..

When data is entered, most people now open the ‘names’ and ‘equations’ windows as their next steps. From the equations window, you can click on terms or add terms to develop your model as you wish…!!

Important: as you read in a worksheet file, it might already contain some saved model settings which are visible via the equations window

This is the popularity analysis dataset as used in Hox (2010). The terms in the equations window show the model that was ‘left’ in MLwiN at the last save: the outcome variable (popular); the explanatory variables (cextravl; csex; ctexp; cextrav*ctexp); the random effects design; and the model coefficient estimates for that model.

MLwiN supports all sorts of other useful windows and functions; we’ll explore some but not all over the module.


How to use MLwiN (ii): MLwiN for serious people

Good practice with statistical analysis involves keeping a replicable log of your work. MLwiN has some very nice features but it is not well designed to support replication of analysis activities. This is because it is programmed on the presumption that most specifications are made through windows interface options (point and click specifications). Most people working with MLwiN specify models, and produce other outputs, via the windows links. This can work well to a point, but it does make documentation difficult. To see the ‘behind the scenes’ commands that MLwiN is processing when responding to point and click instructions, ensure the output window has the box ‘include output from system generated commands’ ticked, from which point you can see results in the background.

Replication in MLwiN can be achieved – by building up a command file which will run all the processes you are working on – but this is quite hard work. (An alternative is to set a log of your analysis and run your analysis through point-and-click methods – some people do this, but it does generate a very long list of commands that isn’t easily navigated). Building up a command file is hard because the command language (known as ‘MLN’) is not widely reported. If possible, it is usually better to build up your analysis using a template from another file (e.g. the command files we’ll supply in this module). Also, some features of the programme use rather long-winded command language sequences which are burdensome to write out manually (e.g. in moving through multiple iterations when estimating models).

Even so, my advice is to persevere with building up a command file in MLwiN covering the model specification you’re working on. To do this, I recommend writing your file in a separate plain text editor, and invoking it with the ‘obey’ command in MLwiN. This is cumbersome, but tractable. I find that a reasonable way to approach this is to explore and refine models via the windows options, but make sure that when I settle on an approach, I write it into my command file, and then run the command file again to check that this has worked.

The following screens show this being done:


The above screens depict how I would typically have MLwiN configured for my own work. I would have the package open, but simultaneously I would also be editing a command file in another editor, which I would periodically run in full in MLwiN.

Looking at the MLwiN session, at various points there are other windows that I’ll need to open, but the four windows shown are probably the most important. At the command interface I can manually input text commands. I sometimes do this, and at other times I use it to run a whole subfile, such as with:

obey "d:\26jun\new\prep\prep_mlwin1.mac"

This is used to run the whole of the command file (that I’m simultaneously editing in EditPad Lite) through MLwiN (which includes a ‘logo’ command which sends the log of the results to a second plain text location).
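As an illustrative sketch of what such a macro might contain – here assuming the two-level popularity example, hypothetical file locations, and command abbreviations in the style of the MLwiN manual:

NOTE prep_mlwin1.mac: a two-level variance components model (a sketch)
LOGO 'd:\26jun\new\prep\prep_log1.txt'
RETR 'd:\26jun\new\data\popular.ws'
RESP 'popular'
IDEN 2 'class'
IDEN 1 'pupil'
EXPL 1 'cons'
SETV 2 'cons'
SETV 1 'cons'
BATC 1
STAR
FIXE
RAND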

Next, in the output window, I can see the results of commands I send. This includes results emerging from my macro file and commands sent in through the command interface, and it also includes results emergent from windows based command submissions. For example, in this image, I’ve recently input the commands ‘fixe’ and ‘rand’ from the command interface, with the effect of generating a list of the ‘fixed’ and ‘random’ parameter estimates from the last model run.

The other windows shown in MLwiN here are the names window (which shows a list of the variables and scalars currently in memory), and the equations window. The equations window is a very attractive feature of MLwiN through which, firstly, you can see the current model represented statistically, along with its parameter estimates, and secondly, you can manually edit to adjust the terms of the model.


How to use MLwiN (iii): Using 'runmlwin' or 'R2MLwiN'

A third mechanism exists for running MLwiN as a plug-in from Stata (see Leckie and Charlton, 2011) or from R (Zhang et al., 2013). This can be attractive because it offers the convenience of working with your data in your preferred package, plus the ability to run complex multilevel models using the more efficient (and wider-ranging) estimation routines of MLwiN (particularly helpful when compared with the relatively slow routines in Stata). Below we illustrate 'runmlwin' (the Stata plug-in); 'R2MLwiN' (the R plug-in) works in a very similar manner.

'runmlwin' allows you to save most model results within your Stata session, in much the same way as if the model had been run in Stata itself (i.e. you can subsequently use commands such as 'est table' on model results which were calculated in MLwiN).

Using runmlwin involves working in Stata, and requires reasonably recent versions of both Stata and MLwiN.

As a preliminary, you must set a global macro, specifically called 'MLwiN_path', which identifies where MLwiN is installed on your machine.

You must then install runmlwin using the typical Stata module installation commands, as sketched below.
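A minimal Stata sketch of these two preliminaries (the installation path below is hypothetical and will vary across machines):

* point Stata at the local MLwiN executable (path will vary by machine):
global MLwiN_path "C:\Program Files\MLwiN v2.36\mlwin.exe"
* install the runmlwin module from the SSC archive:
ssc install runmlwin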

Running a model with runmlwin uses a fairly intuitive syntax.
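For example, a hedged sketch of a two-level random intercept model, assuming the classic MLwiN 'tutorial' variables (normexam, standlrt, cons, school, student) are in memory:

* random intercept model estimated in MLwiN, with control returned to Stata:
runmlwin normexam cons standlrt, ///
    level2(school: cons) level1(student: cons) nopause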

The output includes a brief glimpse of MLwiN, which is opened, used, and, if you specified the 'nopause' option, promptly closed again!

Note that by using the 'viewmacro' option we would also get to see the underlying MLwiN macro used for the analysis.

A great attraction is the storage of model results from MLwiN as items in Stata.
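As a sketch of what this allows (assuming the model above has just been run):

estimates store m1        // file the MLwiN results as a named Stata estimate set
estimates table m1, se    // tabulate coefficients with standard errors
ereturn list              // inspect the e() results returned from MLwiN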

In my opinion, runmlwin is a very exciting contribution to software for multilevel modelling, and will prove a very valuable tool for anyone with access to both packages!

Some miscellaneous points on working with MLwiN

 Sorting data. Some MLwiN models will only work effectively if the data have been sorted in ascending order of the respective level identifiers. In some circumstances, moreover, it may be necessary to ensure that the identifiers begin at 1 and increase contiguously. Generally, it is easiest to arrange identifiers in this way in another package such as Stata or SPSS before reading the data into MLwiN (see the sketch after this list).
 Macro editor. It is possible to edit a macro within the MLwiN session using the link 'file -> {open/new} macro'. The image below shows an example of this. However, the macro can only be executed in its entirety (there is no function comparable to the SPSS and Stata syntax editors that allows you to run only a selected subset of lines), and on balance you are likely to find this less convenient than executing a macro which is open in a separate plain text editor.
 Saving worksheets. An important feature of MLwiN is that when a worksheet is saved, as well as saving the underlying data, the software automatically saves the current model settings as part of the worksheet. Opening and reviewing the Equations window is usually adequate to reveal any existing model specifications, and model specifications can always be cleared from memory with the 'clear' command. To follow 'best practice' replication standards, ideally it is preferable to 'clear' a dataset of its model specifications first, and then re-specify the model from first principles via your working macro. In an exploratory analysis phase, however, people frequently do exploit this save functionality to preserve their current model settings.
 Reading data from other formats. MLwiN is relatively poorly designed for reading data from other software formats. In general, only plain text data can be read in easily, and you will need to add any metadata (e.g. variable names; variable descriptions) manually yourself. An effective way to do this is to write a data input macro for the relevant files, which makes the metadata specifications you want and then saves out to the MLwiN data format (*.ws), thus allowing you to start an analytical file from that point. There is an illustration of this process in Practical 2b.
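As a minimal Stata sketch of that sorting and identifier preparation (variable names are hypothetical):

* hypothetical preparation in Stata before moving data to MLwiN:
egen schid = group(schoolcode)    // contiguous level-2 ids starting at 1
sort schid pupilid                // ascending order of level identifiers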

Useful user guides on MLwiN are:

Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2012). A User's Guide to MLwiN, v2.26. Bristol: Centre for Multilevel Modelling, University of Bristol. (pdf available: http://www.bristol.ac.uk/cmm/software/mlwin/download/2-26/manual-print.pdf)
Rasbash, J., Browne, W. J., & Goldstein, H. (2003). MLwiN 2.0 Command Manual. London: Centre for Multilevel Modelling, Institute of Education, University of London. (pdf available: http://www.cmm.bristol.ac.uk/MLwiN/download/commman20.pdf)

(The former covers windows operation of the package, and the latter covers the syntax command language based upon the earlier MLN package).


How to use… R

R is freeware and a popular tool amongst statisticians and a small community of social science researchers with advanced programming skills. It is an 'object oriented' programming language which supports a vast range of statistical analysis routines, and many data management tasks, through its 'base' and extension commands. Being 'object oriented' is important, and means the package behaves rather differently from the other packages described above. The other packages essentially hold one principal quantitative dataset in memory at any one time, plus metadata on that dataset and typically some other statistical results in the form of scalars and matrices; in such packages, commands are automatically applied to the variables of the principal dataset. In R, however, different quantitative datasets ('data frames'), matrices, vectors, scalars and metadata are all stored as distinct 'objects', potentially alongside each other. R therefore works by first defining objects, and then performing operations on one or more objects, however defined.
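As a small illustration of that object-based logic (all object names here are invented for the example):

# two separate data frames can be held in memory simultaneously
pupils  <- data.frame(id = 1:4, score = c(12, 15, 9, 14))
schools <- data.frame(sch = 1:2, size = c(300, 450))
mean(pupils$score)    # operations are applied to a named object, not a default dataset
ls()                  # list all objects currently in memory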

An important concept in R is the 'extension library', which is how 'shortcut' programmes to undertake many routines are supplied. In fact, you will rarely use R without exploiting extension libraries. The idea is that R has a 'base' set of commands and support, but many user-contributed programmes develop substantially from that base language, typically providing shortcut routes to useful analyses. A few extension libraries in R are specifically designed to support random effects multilevel model estimation – e.g. the 'nlme' and 'lme4' packages (Pinheiro & Bates, 2000; Bates, 2005).
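A hedged sketch of how such an extension library would be used for a two-level random intercept model (the data frame and variable names are hypothetical):

# install.packages("lme4")    # fetch the extension library (once per machine)
library(lme4)                 # attach it to the current session
m1 <- lmer(score ~ 1 + ses + (1 | school), data = pupildata)
summary(m1)                   # fixed and random parameter estimates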

Some researchers are very enthusiastic about R, most commonly because it is free and because it often supports exciting statistical models or functions which aren't available in other packages. My own perspective is that R isn't the ideal package for a social survey researcher interested in applied research with large-scale datasets: the programming demands needed to exploit it are very high, and, because it isn't widely used in applied research, it doesn't always have the robust and helpful routines, working interfaces, or documentation standards that meet popular social science data-oriented requirements.

R is installed as freeware, and since it is frequently updated it is wise to revisit the distribution site (CRAN) regularly and re-download the latest version.


When you open R, you will see something like this ‘R console’

(on my machine I use a '.Rprofile' settings file, so my starting display is marginally different from the default)

The first few lines show me defining a new object (a short vector) and listing the objects in current memory.
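In text form, those first few lines might look something like this (values invented for illustration):

x <- c(1, 3, 5)    # define a new object (a short vector)
x                  # print its contents to the console
ls()               # list the objects in current memory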

R’s basic help functions point to webpages.

The general help pages mostly contain generic information, and are not in general accompanied by worked examples; many R users instead get their help from other online sources, e.g. http://www.statmethods.net/
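For reference, help is requested with standard commands such as:

?lm                          # open the help page for the 'lm' function
help.search("multilevel")    # search help files across installed packages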


In general, with R, the first thing you should do is open a new or existing script and work from that. Scripts in R work in a similar way to a syntax file in Stata or SPSS: highlight a line or lines, and press 'ctrl+R' to run them.

R or RStudio? These outputs are from the default R 'console' session. Many, perhaps most, people who work regularly with R instead use an alternative front-end for the command script and output windows; there are a few options, but RStudio is probably the most popular (www.rstudio.com). RStudio has the attraction of a smoother linkage between the programming within the script file and the outputs and other information on the session.


After running commands, output is sent either to the main console or to a separate graphics window.
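For example (re-using the hypothetical 'pupils' data frame from above):

summary(pupils$score)    # text output is returned to the main console
hist(pupils$score)       # the histogram opens in a separate graphics window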


References cited

Altman, M., & Franklin, C. H. (2010). Managing Social Science Research Data. London: Chapman and Hall.
Bates, D. M. (2005). Fitting linear models in R using the lme4 package. R News, 5(1), 27-30.
Browne, W. J., Cameron, B., Charlton, C. M. J., Michaelides, D. T., Parker, R. M. A., Szmaragd, C., et al. (2012). A Beginner's Guide to Stat-JR (Beta release). Bristol: Centre for Multilevel Modelling, University of Bristol.
Crouchley, R., Stott, D., & Pritchard, J. (2009). Multivariate generalised linear mixed models via sabreStata (Sabre in Stata). Lancaster: Centre for e-Science, Lancaster University.
Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
Field, A. (2013). Discovering Statistics using SPSS, Fourth Edition. London: Sage.
Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression, 2nd Edition. London: Sage.
Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.
Gelman, A., & Hill, J. (2007). Data Analysis using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Grolemund, G., & Wickham, H. (2016). R for Data Science. O'Reilly, and http://r4ds.had.co.nz.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2010). Multilevel and Longitudinal Modeling with IBM SPSS. Hove: Psychology Press.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2012). Multilevel Modeling of Categorical Outcomes Using IBM SPSS. London: Routledge.
Hox, J. (2010). Multilevel Analysis, 2nd Edition. London: Routledge.
Kohler, H. P., & Kreuter, F. (2009). Data Analysis Using Stata, 2nd Edition. College Station, TX: Stata Press.
Kulas, J. T. (2008). SPSS Essentials: Managing and Analyzing Social Sciences Data. New York: Jossey Bass.
Lambert, P. S., Browne, W. J., & Michaelides, D. T. (2015). Contemporary developments in statistical software for social scientists. In P. Halfpenny & R. Procter (Eds.), Innovations in Digital Research Methods (pp. 143-160). London: Sage.
Leckie, G., & Charlton, C. (2011). runmlwin: Running MLwiN from within Stata. Bristol: University of Bristol, Centre for Multilevel Modelling, http://www.bristol.ac.uk/cmm/software/runmlwin/ [accessed 1.6.2011].
Levesque, R., & SPSS Inc. (2010). Programming and Data Management for IBM SPSS Statistics 18: A Guide for PASW Statistics and SAS users. Chicago: SPSS Inc.
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Muthén, L. K., & Muthén, B. O. (1998-2010). Mplus User's Guide, Sixth Edition. Los Angeles, CA: Muthén and Muthén, and http://www.statmodel.com/.
Payne, G., Williams, M., & Chamberlain, S. (2004). Methodological Pluralism in British Sociology. Sociology, 38(1), 153-164.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-Plus. New York: Springer-Verlag.
Rabe-Hesketh, S., & Skrondal, A. (2008). Multilevel and Longitudinal Modelling Using Stata, Second Edition. College Station, TX: Stata Press.
Rafferty, A., & Watham, J. (2008). Working with survey files: using hierarchical data, matching files and pooling data. Manchester: Economic and Social Data Service, and http://www.esds.ac.uk/government/resources/analysis/.
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2012). A User's Guide to MLwiN, v2.26. Bristol: Centre for Multilevel Modelling, University of Bristol.
Rasbash, J., Browne, W. J., Healy, M., Cameron, B., & Charlton, C. (2011). MLwiN, Version 2.24. Bristol: Centre for Multilevel Modelling, University of Bristol.
Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2004). HLM 6: Hierarchical Linear and Nonlinear Modelling. Chicago: Scientific Software International.
Reardon, S. (2006). HLM: Stata Module to invoke and run HLM v6 software from within Stata. Boston: EconPapers, Statistical Software Components (http://econpapers.repec.org/software/bocbocode/s448001.htm).
Spector, P. (2008). Data Manipulation with R (Use R). Amsterdam: Springer.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
Twisk, J. W. R. (2006). Applied Multilevel Analysis: A Practical Guide. Cambridge: Cambridge University Press.
West, B. T., Welch, K. B., & Gałecki, A. T. (2007). Linear Mixed Models. Boca Raton, FL: Chapman and Hall.
Zhang, Z., Charlton, C. M. J., Parker, R. M. A., Leckie, G., & Browne, W. J. (2013). R2MLwiN v0.1-7. Bristol: Centre for Multilevel Modelling, University of Bristol.

Software-specific recommendations:

Stata
Kohler, H. P., & Kreuter, F. (2009). Data Analysis Using Stata, 2nd Edition. College Station, TX: Stata Press.
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modelling Using Stata, Third Edition. College Station, TX: Stata Press.
Web: http://www.longitudinal.stir.ac.uk/Stata_support.html ; http://www.ats.ucla.edu/stat/stata/

SPSS
Field, A. (2013). Discovering Statistics using SPSS, Fourth Edition. London: Sage.
Kulas, J. T. (2008). SPSS Essentials: Managing and Analyzing Social Sciences Data. New York: Jossey Bass.
Levesque, R., & SPSS Inc. (2010). Programming and Data Management for IBM SPSS Statistics 18: A Guide for PASW Statistics and SAS users. Chicago: SPSS Inc.
Web: http://www.spsstools.net/ ; http://www.spss-tutorials.com/

MLwiN
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2009). A User's Guide to MLwiN, v2.10. Bristol: Centre for Multilevel Modelling, University of Bristol.
Web: http://www.bristol.ac.uk/cmm/software/mlwin/

R
Field, A., Miles, J., & Field, Z. (2013). Discovering Statistics using R. London: Sage.
Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression, 2nd Edition. London: Sage.
Gelman, A., & Hill, J. (2007). Data Analysis using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Grolemund, G., & Wickham, H. (2016). R for Data Science. O'Reilly, and http://r4ds.had.co.nz.
Spector, P. (2008). Data Manipulation with R (Use R). Amsterdam: Springer.
(There is also a useful guide to using R within SPSS in the body of Levesque & SPSS Inc. 2010, cited above.)
Web: http://www.ats.ucla.edu/stat/r/
