Exploratory analysis: what to do first. Inj Prev: first published as 10.1136/ip.4.2.140 on 1 June 1998

Injury Prevention 1998;4:140

Robert W Platt
McGill University/Montreal Children’s Hospital Research Institute, 2300 Tupper, Montreal, PQ H3H 1P3, Canada
Correspondence to: Dr Platt ([email protected]).

One of the more important but often overlooked parts of statistical analysis is the very first step: an exploratory and descriptive analysis. Typically, researchers take a quick look at the data and then dive into more complex regression models or t tests. In this column, I discuss preliminary analysis in general and look at some techniques that are less well known than others but provide interesting and useful results.

The first step in understanding your data is to establish the kinds of variables you have. Are they continuous (ranging over many values, like weight or height) or categorical (taking only a few values)? Are the continuous variables bounded (like age, which can’t be less than zero) or unbounded? Are there any outliers or strange values?

This last question can be looked at in a simple way. Calculate the mean and standard deviation of a variable, and examine values that are more than three (or, if you want to be very careful, two) standard deviations from the mean. If there are outliers, they need to be investigated, and either eliminated (if they are errors) or treated carefully (if they are valid data points). Next, look at histograms for the continuous variables and frequency tables for the categorical variables. These show roughly how each variable is distributed, which influences the statistics you can use.

The next thing to consider is bivariate analyses of the data. First, what do we do with continuous variables? A common mistake is to examine correlations first. But these are usually an inefficient way of inspecting the data: the correlation coefficient measures linear association, so if the true relationship is curved, the correlation may not reveal it. Another approach is to graph a scatterplot of the two variables and check for a relationship.

For categorical variables, it is easiest to inspect bivariate (for example 2 × 2) cross tabulations to identify patterns and potentially interesting relationships. These relationships provide the baseline for further analyses.

Finally, a multivariate exploratory analysis may be needed to detect possible confounding (the mixing of the effect of an exposure on an outcome with the effect of a third variable that is associated with the primary predictor and also affects the outcome) or effect modification (when the effect of an exposure on the outcome differs across levels of a third variable). The easiest way to do this is with a bivariate analysis stratified by the third variable. If the third variable is categorical, just look at the relationship between the other two variables within each of its levels; if it is continuous, first create a new categorical variable from it. If there is important confounding or effect modification (the definition of “important” here is arbitrary and depends on the needs of the analysis), it must be accounted for in the formal models when computing estimates for the primary predictor.

After these preliminary analyses, the patterns and relationships in the results should be reasonably clear and the analyses that need to be done should be obvious. If this is the case, then the rest is simple: for continuous variables, t tests, ANOVA, or linear regression can be used to confirm the exploratory work. Similarly, for categorical data, χ² or non-parametric tests can be used.

If patterns in the results are not clear, two things are possible: either there aren’t any interesting relationships, or there are but they are complex and you need to consult a statistician!
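The outlier screen described in the column (flag values more than three, or two, standard deviations from the mean) can be sketched in a few lines of Python. This is a minimal illustration, not the author's own procedure: the function name `flag_outliers`, the default cut-off `k=3.0` and the weight values are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed.

```python
from statistics import mean, stdev

def flag_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k sample SDs from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(i, v) for i, v in enumerate(values) if abs(v - m) > k * s]

# Hypothetical body weights in kg; 710 looks like a data-entry error for 71.
weights = [62, 70, 68, 75, 71, 66, 73, 69, 64, 72, 67, 70, 65, 74, 710]
print(flag_outliers(weights))       # three-SD screen: [(14, 710)]
print(flag_outliers(weights, k=2))  # stricter two-SD screen: [(14, 710)]
```

One caveat worth knowing: an extreme value inflates the very standard deviation it is judged against. With the sample SD, no observation can lie more than (n − 1)/√n standard deviations from the mean, so in a sample of 10 (bound ≈ 2.85) the three-SD rule can never flag anything. Flagged values are candidates for investigation, not automatic deletions.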
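The histograms and frequency tables the column recommends can be produced as plain counts before any plotting. The sketch below is hypothetical: `frequency_table`, `histogram_counts`, the bin width of 5 and both data lists are illustrative assumptions. Empty bins are kept on purpose, because gaps in a distribution are themselves informative.

```python
import math
from collections import Counter

def frequency_table(values):
    """Count and proportion for each level of a categorical variable, most common first."""
    n = len(values)
    return [(level, count, count / n) for level, count in Counter(values).most_common()]

def histogram_counts(values, bin_width):
    """Counts in half-open bins [start, start + bin_width), keeping empty bins.

    Raises ValueError on empty input (min/max of an empty Counter).
    """
    bins = Counter(math.floor(v / bin_width) for v in values)
    return [(i * bin_width, bins.get(i, 0)) for i in range(min(bins), max(bins) + 1)]

# Hypothetical data for illustration only.
helmet_use = ["yes", "no", "yes", "yes", "unknown", "no"]
ages = [3, 7, 12, 14, 18, 31]
width = 5

for level, count, share in frequency_table(helmet_use):
    print(f"{level:<8} {count:>3} {share:6.1%}")

# A crude text histogram: one '#' per observation in each bin.
for start, count in histogram_counts(ages, width):
    print(f"{start:>3} to <{start + width:<3} {'#' * count}")
```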
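The column's warning that a correlation can miss a curved relationship is easy to demonstrate. In this sketch (the function `pearson_r` and the data are illustrative, not from the article), `y` is completely determined by `x` through a U-shaped curve, yet the Pearson correlation is exactly zero; a scatterplot of the same points would show the pattern at once.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: a measure of *linear* association only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
curved = [a ** 2 for a in x]       # y is completely determined by x ...
straight = [2 * a + 1 for a in x]  # an exact straight line, for contrast

print(pearson_r(x, curved))    # ... yet r is 0.0
print(pearson_r(x, straight))  # r is 1.0
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient; the hand-written version is shown only to make the linear form of the calculation visible.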
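The 2 × 2 cross tabulation and the stratified analysis for confounding can be sketched together. Everything here is an illustrative assumption: the record fields `exposed`, `injured` and `age_group`, the helper names, and the counts, which were constructed so that exposure has no association with injury within either age group (odds ratio 1) while the pooled table suggests a strong one. Summarizing each table by an odds ratio is a choice made for this example; the column itself only says to inspect the tables.

```python
from collections import Counter

def crosstab(records, row, col):
    """Bivariate cross tabulation: counts of (row value, column value) pairs."""
    return Counter((r[row], r[col]) for r in records)

def odds_ratio(table):
    """Odds ratio from a 2 x 2 table keyed by (exposed, outcome) booleans.

    Raises ZeroDivisionError if an off-diagonal cell is empty.
    """
    a = table[(True, True)]    # exposed, outcome present
    b = table[(True, False)]   # exposed, outcome absent
    c = table[(False, True)]   # unexposed, outcome present
    d = table[(False, False)]  # unexposed, outcome absent
    return (a * d) / (b * c)

def stratified_odds_ratios(records, exposure, outcome, stratum):
    """Odds ratio within each level of a categorical third variable."""
    levels = sorted({r[stratum] for r in records})
    return {
        level: odds_ratio(crosstab([r for r in records if r[stratum] == level], exposure, outcome))
        for level in levels
    }

def make_records(stratum, a, b, c, d):
    """Expand hypothetical 2 x 2 cell counts into individual records."""
    cells = [(True, True, a), (True, False, b), (False, True, c), (False, False, d)]
    return [
        {"exposed": e, "injured": o, "age_group": stratum}
        for e, o, n in cells
        for _ in range(n)
    ]

# Hypothetical data: exposure is common in the younger group, where injury is also common.
records = make_records("younger", 80, 20, 16, 4) + make_records("older", 2, 18, 10, 90)

crude = odds_ratio(crosstab(records, "exposed", "injured"))
by_age = stratified_odds_ratios(records, "exposed", "injured", "age_group")
print(f"crude OR = {crude:.2f}")  # about 7.80
for level, ratio in by_age.items():
    print(f"{level}: OR = {ratio:.2f}")  # 1.00 in each stratum
```

The gap between the crude and stratum-specific odds ratios is the signature of confounding by age group. Stratum-specific ratios that differed from one another would instead suggest effect modification.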
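When the third variable is continuous, the column suggests first creating a categorical version of it. One way to do that is with fixed cut points, sketched below; the function name `categorize`, the cut points and the labels are hypothetical, and in practice the cut points should come from subject-matter knowledge or the data (for example tertiles). Using `bisect_right` places a value equal to a cut point in the upper category.

```python
from bisect import bisect_right

def categorize(values, cut_points, labels):
    """Assign labels[k], where k is the number of cut points <= value.

    cut_points must be sorted in ascending order.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(cut_points, v)] for v in values]

# Hypothetical age cut points, for illustration only.
ages = [3, 10, 15, 40]
print(categorize(ages, [10, 20], ["<10", "10-19", "20+"]))
# ['<10', '10-19', '10-19', '20+']
```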
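For the confirmatory step on categorical data, the Pearson χ² test for a 2 × 2 table can be computed with the standard library alone. This sketch is an assumption-laden illustration: the function name and table are hypothetical, no continuity correction is applied, and the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds only for 1 degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    table is [[a, b], [c, d]] with exposure in rows and outcome in columns.
    Returns (statistic, p_value) on 1 degree of freedom.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under no association
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 df, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, for illustration only.
stat, p = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

The χ² approximation is poor when expected counts are small (a common rule of thumb is below 5); Fisher's exact test is the usual alternative then. If SciPy is available, `scipy.stats.chi2_contingency` performs this test but applies Yates' continuity correction to 2 × 2 tables by default, so its result will differ slightly from this uncorrected version.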
