1. Your First Step Is to Define Your Topic(s) of Interest



New Measures

“Measures” is the umbrella term for any type of measurement tool, such as scales, questionnaires, inventories, and tests.

There are two concepts pertinent to developing a new measure. “Constructs” are the theoretical or conceptual variables of interest. Constructs are sometimes called “latent variables”, where “latent” refers to the variable being not directly observable, but instead inferred from other measurable variables (your new measures). “Variables” are the operational definitions of the construct. In other words, the act of creating new measures is translating your theoretical variables (constructs) into operational variables.

1. Your first step is to define your topic(s) of interest

You need to identify the purpose of your scale so that you can determine what you want to measure (allowing you to directly and concretely identify measurement items) and what you do not want to measure (allowing you to exclude irrelevant and extraneous concepts from your measure).

A. Identify theoretically relevant concepts. For example, if you were interested in a measure of anger, other theoretically related concepts could be hostility and frustration, which are shown in the literature to be relevant to anger. If you are not familiar with the literature on your topic of interest, try asking a colleague for theoretically relevant concepts, or read a literature review on the topic. Also, if you are contacting a colleague who conducts research on the topic, ask for the measures they use when measuring it.

B. Using the APA Thesaurus. Another way to find related concepts is using synonyms, which for anger could be rage, fury, wrath, resentment, indignation, annoyance, hatred, etc. An online thesaurus, or Googling the word, will provide ample examples. There is also an APA book called “Thesaurus of Psychological Index Terms” (http://www.apa.org/psycinfo/products/thesaurus.html) which contains standardized vocabulary terms used in database searches such as PsycINFO. By standardizing the words or phrases used to represent concepts, you don't need to try to figure out all the ways different authors could refer to the same concept.

C. Breaking your topic into distinct components. Another way to define your topic is by breaking up your topic into distinct components, which can be separately well-defined and used for measuring the larger construct. For example, if your topic is anger, you could divide it into components like state and trait anger, direct and indirect anger, etc.

D. Moving from specific to broader levels. You may also want well-defined components that start at a narrow level of specificity and then cycle toward broader and broader concepts. For example, if the specific topic of your research is how displaced anger influences health outcomes in terms of cardiovascular functioning, then your first narrow search could be “displaced anger cardiovascular”, followed by combinations such as “displaced anger”, “anger cardiovascular”, and so forth. Then, a broader level could substitute health for cardiovascular function, or anger for displaced anger, and so forth.
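If you want to generate these progressively broader search queries systematically, a short sketch could look like the following (the topic terms are just the example above):

```python
from itertools import combinations

def search_queries(terms):
    """Generate every combination of search terms, from the most
    specific (all terms together) down to the broadest (single terms)."""
    queries = []
    for size in range(len(terms), 0, -1):  # largest combinations first
        for combo in combinations(terms, size):
            queries.append(" ".join(combo))
    return queries

# Hypothetical topic: displaced anger and cardiovascular health
print(search_queries(["displaced anger", "cardiovascular"]))
```

Running the most specific queries first keeps the early search results tightly focused, with the broader single-term queries as a fallback.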

2. Generate Items

Your task at this point is to brainstorm as many items as possible for the construct and/or distinct components of the construct. After over-generating possible items, you can then evaluate the items and decide which ones to trim from the final measure.

Here are the basic Do’s and Don’ts (which mirror each other).

DO: One concept per item. DON'T: Use double-barreled items.
DO: One component per item. DON'T: Merge two or more components.
DO: Make all items easily understandable. DON'T: Write overly complicated items.
DO: Use short, direct, and concise wording. DON'T: Write overly lengthy items.
DO: Use unambiguous wording. DON'T: Allow ambiguity or miscommunication.
DO: Use correct grammar and spelling. DON'T: Use double-negatives.
DO: Write items on which subjects can vary. DON'T: Write items with which everyone would agree.
DO: Include both positively and negatively valenced items. DON'T: Word all items in the same direction, which will create response bias.
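A few of these Don'ts can even be checked mechanically during the brainstorming stage. The following sketch (the heuristics and thresholds are my own assumptions, not a standard tool) flags likely double-barreled items, overly long items, and double negatives:

```python
def lint_item(item, max_words=20):
    """Flag common item-writing problems from the Do's/Don'ts list.
    Heuristics only -- a human reviewer makes the final call."""
    warnings = []
    words = item.lower().split()
    # Double-barreled items often join two ideas with "and"/"or".
    if "and" in words or "or" in words:
        warnings.append("possibly double-barreled")
    # Overly lengthy items invite misreading.
    if len(words) > max_words:
        warnings.append("overly long")
    # Two or more negations usually means a double negative.
    negatives = [w for w in words
                 if w in ("not", "never", "no") or w.endswith("n't")]
    if len(negatives) >= 2:
        warnings.append("double negative")
    return warnings

print(lint_item("I do not never get angry and frustrated at work"))
print(lint_item("I feel calm"))  # no warnings
```

A checker like this cannot replace human judgment (a legitimate item such as "research and development" would be flagged), but it is a quick first pass over a large pool of brainstormed items.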

3. Evaluate the Items

Get a second opinion. Just as with any type of writing -- articles, grants, book chapters, even a Christmas card -- you should seek the advice of others, particularly people who are not part of your research team and can look upon the items with a fresh eye. People outside your research team who are not experts on the subject matter of the measure can also provide an outside perspective from the non-expert or lay-person point of view. Assessing how the lay person interprets and comprehends your measure is critical, since the lay person is likely the intended user of the measure. In fact, after completing all the different stages of generating new measures, one of your final steps should be to pilot test the measure by having subjects complete it and then indicate any problems, issues, confusions, ambiguities, etc. Finally, taking the time to get a second opinion allows you to step back from your own work so that you can then evaluate it with a fresh perspective.

After getting feedback, you will want to revise, reword, and rework the measure.

When it comes time to trim down the number of items, the total length of the measure depends on your construct and your methodological approach. For example, the length of your measure is based upon how many items you think are necessary to assess the construct. Some constructs require more items because they are more complicated, have more components, or are larger in scope than other constructs. The length of your measure is also dictated by the practical concerns of how you are implementing the measure. For surveys and experiments where you have the relatively undivided attention of the subject, you can include as many items as you believe the average person can process before response fatigue occurs. I would recommend the entire survey/experiment last no longer than 1 hour, so pilot test your study to assess its total duration. For other research venues, such as online studies or telephone studies, you may want a shorter study to compensate for the quicker escalation to response fatigue that occurs when you no longer have the relatively undivided attention of the subject.

4. Assign Measurement Format

You have many different options for formatting the items. For example…

A. Open-ended versus Closed-ended? Open-ended items allow the subject to answer in their own words so that the researcher can learn how the subjects think and feel about the topic. It is possible to learn a great deal about a topic by simply asking the subjects their opinion of it. In fact, at the beginning stages of new research, it is very advantageous to learn as much as you can about your topic by utilizing open-ended questions. Plus, you are not restricting the subject's response by limiting them to your carefully worded and focused closed-ended questions. The disadvantage of open-ended questions is the inability to statistically analyze the responses without first quantifying them by translating the text into numbers. There will always be some measurement error when a researcher codes written open-ended responses into a numeric format, because the researcher introduces an unavoidable level of imprecision when assigning numbers to qualitatively different responses. In other words, some data or variation is lost in the transfer. Closed-ended questions, conversely, lose some of the qualitative differences between subjects that can only be captured by open-ended responses, but their advantage is that you know exactly how the subject answered the questions (without the secondary filter of the researcher coding the responses). Solution -- ask both closed-ended and open-ended questions.
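The coding step described above -- translating open-ended text into numbers -- might look like this minimal sketch, where the codebook, keywords, and example responses are all hypothetical:

```python
# Hypothetical codebook: category names mapped to numeric codes.
codebook = {
    "positive": 1,
    "neutral": 0,
    "negative": -1,
}

def code_response(text, keyword_map):
    """Assign the code of the first matching category; uncodable text
    gets None. This mapping is exactly where coder imprecision enters
    the data -- qualitatively different answers collapse to one number."""
    lowered = text.lower()
    for keyword, category in keyword_map.items():
        if keyword in lowered:
            return codebook[category]
    return None  # response could not be coded

# Hypothetical keyword-to-category rules.
keyword_map = {"enjoy": "positive", "fine": "neutral", "hate": "negative"}
print(code_response("I really enjoy my commute", keyword_map))  # 1
```

Note how "I really enjoy my commute" and "I mildly enjoy my commute" would both receive the code 1 -- the variation lost in that collapse is the measurement error the paragraph describes.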

B. Rating Scale Formats? Dichotomous formats use a yes/no response. Categorical formats involve assigning labels or numbers to categories, such as ethnicity categories, state of residence in the US, gender, etc. Continuous formats have a continuous range, such as a 7-point scale from 1-7, or a “feeling thermometer” which asks the subject to indicate whether they feel warmly toward the topic (and thus answer in the 50-100 point range) or coldly toward the topic (and thus answer in the 0-50 point range).
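The three formats can be expressed as simple response validators -- a sketch with hypothetical category lists and ranges, including the feeling thermometer described above:

```python
def valid_dichotomous(answer):
    """Yes/no format."""
    return answer in ("yes", "no")

def valid_categorical(answer, categories):
    """Labeled categories, e.g. state of residence."""
    return answer in categories

def valid_continuous(answer, low=1, high=7):
    """A continuous range, e.g. a 7-point scale from 1-7."""
    return low <= answer <= high

def thermometer_feel(score):
    """Feeling thermometer: 0-50 reads as cold, 50-100 as warm."""
    return "warm" if score >= 50 else "cold"

print(valid_dichotomous("yes"))                     # True
print(valid_categorical("WA", ["WA", "OR", "CA"]))  # True
print(valid_continuous(9))                          # False: outside 1-7
print(thermometer_feel(75))                         # warm
```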

So the question is: how many points on the scale? At one extreme, you could argue that a scale with few points, such as a 3-point or 4-point scale, might not provide enough variation between subjects because of the limited range. At the other extreme, a scale with many points, such as a 20-point or 100-point scale, might be too large for the subject to have a meaningful appreciation of the distinctions between all the points. At the same time, you want enough points to capture subtle distinctions or variations within the questions. Most researchers settle for a 5-point, 7-point, 9-point, or 11-point scale. Plus, classic research by Miller (1956) showed that most people are only able to process about 7 ± 2 chunks of information at a time. Also, keep in mind that some subjects are reluctant to use the extreme ends of a scale, so a 9-point scale effectively functions more like a 7-point scale.

C. Neutral or Mid-point to the scale? If you offer a mid-point to your scale, as on an odd-numbered scale, then some subjects will indicate the mid-point because they truly believe they fit the neutral option, and some subjects will indicate the mid-point because it is easier than making a decision about which side of the scale they fall on. In other words, if you provide the mid-point, you are losing some data via the subjects who are not forced to decide upon which side of the scale they fall; but if you do not provide the mid-point, you lose some data via the subjects who truly are at the mid-point that does not exist on your scale. Sometimes the purpose of the measure dictates whether or not a mid-point is included. Assessing verdict choice, for example, typically should not include a mid-point because you want the “jurors” to self-select whether they fall on the “guilty” side or the “not guilty” side of the scale.

D. How many Labels? Should your scale include labels for every numbered option, or just labels for the end-points? The answer depends partly upon the scale range and the label wording. For example, if your scale range is short, such as 2-point or 3-point, it should probably include labels for each numbered option to give the user enough information to understand how to respond. If your scale range is larger, it may be unnecessary to include labels for every numbered option because the difference between any two scale points might not be meaningful enough to warrant an appropriate label. For example, if you have the labels strongly agree, somewhat agree, and slightly agree, can you think of any appropriate labels to fit between them? At what point can the user no longer understand the distinctions between the labels? The bottom line is that you need labels for the end-points so that the user can understand the range of the response format, and then more labels only as necessary for the user to understand the basic distinctions within the scale range.

E. What should be the wording of the labels? The answer to this question depends primarily upon the purpose of the scale and how the items are worded. For example, many items are worded specifically to fit a format of “disagree/agree”, or “uncharacteristic of me / characteristic of me”. Some measures are designed to fit a “yes versus no” format, so the end-points will correspond to yes (“I agree”) and no (“I disagree”); whereas other measures are designed to fit a “high versus low” format, so the end-points will correspond to high (“I strongly agree”) and low (“I weakly agree”).

F. Filler Questions? Sometimes you want to disguise the purpose of your measure. Filler questions fit closely enough with the other “true” items that the user perceives the entire measure as coherent and related, but at the same time are distinct enough from the “true” items that the overall theme of the measure is camouflaged from the user. Why would you want to deceive the user? If the user is aware of the true purpose of the measure, they may react differently than if unaware of why they are being tested. For example, if the user thinks they know why you are having them take the measure, they may react with compliance (by acting in accordance with how they assume you want them to respond) or with reactance (by purposefully responding in a manner converse to what they perceive you want from them). Either response is detrimental to the purpose of accurately assessing the user's true response. On the other hand, many measures involve benign topics for which the user could correctly deduce the purpose of the measure and still not respond with compliance or reactance.

G. Question Order? You may want the first item to directly correspond to the construct being investigated so that the user is oriented to the objective of the scale. After that, you are free to intersperse filler items, positively versus negatively valenced items, etc. In fact, scientific standards dictate that you should randomize the remaining questions, so the issue of “Question Order” is largely moot.
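The approach in the paragraph above -- an orienting first item followed by a randomized remainder -- can be sketched in Python (the item texts are hypothetical):

```python
import random

def order_items(orienting_item, other_items, seed=None):
    """Keep the construct-defining item first, then randomize the rest
    (true items and fillers alike) to wash out order effects."""
    rng = random.Random(seed)  # seed only for reproducible demos/tests
    rest = list(other_items)   # copy so the caller's list is untouched
    rng.shuffle(rest)
    return [orienting_item] + rest

items = order_items("I often feel angry.",
                    ["I slam doors.", "I enjoy gardening.", "I hold grudges."],
                    seed=42)
print(items[0])  # the orienting item always leads
```

In a real administration you would omit the seed (or draw a fresh order per subject) so that each respondent sees a different sequence.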

5. Evaluate the Measure

At this point you have generated items, evaluated and revised the items, and assigned a measurement format. You are now ready to evaluate the entire measure. There are two steps in evaluating the measure -- practical and statistical.

From a practical standpoint, you want to assess whether the measure is appropriate for its intended user. For example, you want to confirm that the questions are understandable and unambiguous, that subjects vary on the items, that there is no response fatigue, etc. You should pilot test the measure by having subjects complete it and then indicate any problems, issues, confusions, ambiguities, etc. Another purpose of the pilot test is to assess the total length of time the average subject requires to finish the measure.

From a statistical point of view, you want to assess whether the measure is valid and reliable. The statistical analyses will be discussed in the Spring class on Statistics.
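Those analyses are left for the Spring class, but as a small preview, internal-consistency reliability is often summarized with Cronbach's alpha. Here is a pure-Python sketch on hypothetical data (a real analysis would use a dedicated stats package):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.
    scores: list of rows, one row of item scores per respondent."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]  # each respondent's total score

    def variance(xs):
        """Sample variance (n - 1 denominator)."""
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance(totals)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 4 respondents x 3 items on a 5-point scale
data = [[3, 3, 3], [4, 4, 3], [2, 2, 2], [5, 4, 5]]
print(round(cronbach_alpha(data), 2))
```

Higher alpha indicates that the items covary and are plausibly tapping the same construct; conventions for acceptable values will be covered in class.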
