Users' Manual
Total Page:16
File Type:pdf, Size:1020Kb
THE NewMDSX SERIES OF MULTIDIMENSIONAL SCALING PROGRAMS USERS’ MANUAL FOR WINDOWS 9x / NT / 2000/XP First published : October 2001 Revised : August 2004 © The NewMDSX Project Background The original MDS(X) Project was funded (1974-1982) by the U.K. Social Science Research Council in conjunction with the Prog ra m Library Unit of the University of Edinburgh. It grew out of the frustration of a research group at Edinburgh University trying to work out the similarities and differences in programs coming from different sources – particularly Bell Laboratories and University of Michigan (Guttma n-Lingoes). The project was designed to: • Collect MDS and related programs in common use or of particular interest • rewrite the source-code up to Fortran77 specifications • replace the common subroutines by numerically efficient versions • provide a common instruction set for running programs • produce a utility for producing measures from raw data for input into (any) multidimensional scaling programs. For many years, a mainframe ver sion was wi d ely a vailable, and maintained until recently by Manchester Inform at ion and As sociated Services [http://www.mimas.ac.uk/]. Until recently its main use has been on PCs operating under MD-DOS. This manual describes a new version for use with Windows 9x, NT, 2000, and XP. The Windows version now includes programs for Correspondence Analysis (CORRESP) , analysis of sorting data (MDSORT), and principal components analysis (PRINCOMP), in addition to the routines originally available in MDS(X). For information about MDS(X) on MAC machines contact Wolfgang Otto: [[email protected]]. He has also operated NewMDSX for Windows successfully using the MAC PC emulator. A version of NewMDSX for Linux is in preparation. The NEWMDS Project has a number of SPONSORS and COUNTRY REPRESENTATIVES in addition to the Core Project Team. 1. SPONSORS <LIST TO FOLLOW> 2. COUNTRY REPRESENTATIVES <LIST TO FOLLOW> 3. NEWMDSX PROJECT TEAM Professor A. P. M. Coxon (University of Edinburgh) [email protected] Dr A.P.Brier [email protected] Professor C.L. Jones (University of Toronto) [email protected] Mr D.T. Muxworthy (University of Edinburgh) [email protected] Mr W. Otto (University of Zurich) [email protected] Dr S.K. Tagg (Strathclyde University) [email protected] Dr. Wijbrandt H. van Schuur (University of Groningen) [email protected] Dr Nico Tiliopoulos (Queen Margaret University College, Edinburgh) [email protected] ========================== All enquiries about NewMDSX should be directed to [email protected] The NewMDSX Home Page is at http://www.newMDSX.com INTRODUCTION Why scale to begin with ? The purpose of scaling is to obtain a quantitative representation of a set of data. How is such a representation obtained ? The basic idea is that much data can be thought of as giving information about how similar or dissimilar things are to each other. Scaling models then take this idea seriously, and represent the objects as points in space. In this space, the more similar objects are, the closer they lie to each other. The pattern of points which most accurately represent the information in the data is referred to as ‘the solution’ or ‘final configuration’. Some common uses of MDS are:- 1. to measure an attitude, attribute or variable. e.g. the subjective loudness of a series of tones, the degree of ethnocentrism, the intensity of a particular sexual orientation, preference for a range of educational policies, the utility of a set of goods, the prestige of a group of occupations. 2. to portray complex data in a simpler manner. e.g. to represent the relationships between a set of objects in an easily assimilable, usually spatial, form. 3. to infer latent dimensions or processes. e.g. to identify the factors involved in peoples’ judgements of the desirability of types of housing, or the most likely historical sequence of a set of graves, or how subjects' overall judgements of similarity relate to the known properties of the objects concerned. DATA THEORY AND MEASUREMENT The main impetus towards developing MDS models came from the wish to develop distance models as a paradigm for the measurement and analysis of psychological and social science data, and to build such models without being committed to the strong distributional or measurement assumptions usually made. This so-called “non-metric” orientation has been associated above all with Clyde Coombs (1964) who pioneered much early non-metric MDS modelling, and whose viewpoint might be summarised in the following propositions: i) Assumptions about the “level of measurement” of one’s data, and assumptions involved in the scaling models used to analyse data, commit one to substantive hypotheses about human behaviour. ii) It is better to err on the side of conservatism in attributing metric properties to social science data, and to use weaker measurement structures to represent them. iii) Because most social science data have been elicited in non- experimental settings, and often refer to diversified or non- homogeneous populations, it is well to be especially sensitive to individual or group differences, which may be crucial to the interpretation of the processes generating the data, but which are typically “washed out” in the usual aggregation procedures. Coombs’ initial work lay in the analysis of preferential data and he evolved a distance model for their analysis. This model, which he termed “Unfolding Analysis” was especially sensitive to individual differences. The failure to develop a workable algorithm for fallible data meant that Unfolding Analysis was of little interest to the practising scientist, however attractive it was, or sensitive it was to representing individual differences. A tractable algorithm in fact awaited the development of multi-dimensional scaling procedures, which were equally committed to making use only of ordinal information to obtain a metric solution to the data. NON-METRIC MDS: THE BASIC MODEL Developments in non-metric MDS procedures and models represent one of the most significant methodological advances of the last forty years. Stated simply, their purpose seems very pedestrian – namely to relax the assumption of linearity usually made about the kind of function linking the dissimilarities (the data) and the distances in the solution. In this sense, it could be seen as analogous to the shift of interest to non- parametric sta tis tics. The greatest pay-off from the use of non-metric MDS is that the same basic algorithm is easily extended to very different types of data, to different models (other than just the distance model) and it is readily a ppl ied in a wide variety of situations and in disciplines as diverse as archaeology and electronics as well as the usual social science applications. Moreover, unlike conventional multivariate models, assumptions about distributions rarely need to be made and the procedures in no way depend upon the particular measures of similarity used. For example, frequencies, probabilities, ratings, co-occurrences are quite as appropriate as measures of similarity as are composite indices like coefficients of correlation, covariance, association and overlap. Perhaps most importantly, however, non-metric M DS solutions are “order-invariant”. That is to say that only the ordinal content of the data is made use of in obtaining a solution, so that any set of data with the same ordering of (dis)similarities will generate the same metric solution.1 The basic rationale of non-metric MDS is well discussed in Shepard (1962). He begins by considering the difficulty of achieving numerical representation when only a ranking of the objects is known. This stems from the fact that points representing the objects can be moved very extensively (i.e. can take on a large range of numerical values), whilst still satisfying such ordinal constraints. However, once the representation must in addition satisfy “ordered metric” constraints (i.e. once the data contain, in addition, information on the order of the inter point distances) the range of possible numerical values is greatly reduced: “if non-metric constraints are imposed in sufficient number they begin to act like metric constraints ... As the points are forced to satisfy more and more Inequalities on the inter-point distances ... the Spacing tightens up until any but very small Perturbations of the points will usually isolate one or more of the inequalities” (ibid. 288). The notion that order relations on distances impose very severe constraints on the uniqueness of numerical representation is now commonplace, but its convincing demonstration awaited the development of an iterative algorithm to implement the set of constraints obtained from the data. The basic rationale for this non-metric MDS algorithm is given by Kruskal (1964) and this has formed the basis for almost all subsequent work in this area.2 2.1 The empirical data are interpreted as follows i) There is a set C of objects (often termed stimuli), and these objects will be represented as points in a multidimensional space. Significant information about the relations between the objects is contained in some empirical measure of dis/similarity, linking pairs of objects. Only the (possibly weak) ordering of these dissimilarity coefficients δ(ci , cj) = δij will be preserved in obtaining the solution. In common terminology, the measures input to MDS are termed “proximities” or “dis/similarities”. This usage emphasizes the fact that such measures may be EITHER similarities OR dissimilarities; the only difference is that dissimilarities will be positively related to the distances of the solution whereas similarities will be related negatively to the distances of the solution. Thus if similarity measures (such as correlations, co-occurrences as well as actual similarity ratings)are input then the higher the similarity of two objects, the closer they will be made to be in the solution space, whereas if a dissimilarity measure (such as the Index of Dissimilarity, Euclidean distance or dissimilarity ratings)is input, the higher the dissimilarity of two objects, the more distant they will be made to be in the solution space.