I. Steps in Problem Solving Using Statistical Methods

Business Statistics 240

This course is an introduction to "quantitative reasoning" (thinking with numbers). Its main objective is to present the foundations of problem solving using quantitative approaches. These approaches are referred to by the generic term "statistical methods", and their purpose is to enable us to reach conclusions about “problems” from observable data sets called "samples".

I. Steps in problem solving using statistical methods:

1) Identification of a problem:
   a) Recognition.
   b) Definition.
2) Taking inventory of informational requirements obtained from (1)(b).
3) Gathering data. Complication: not all data is observable. Solution: take a representative subset of all data, a "sample".
4) Organizing available data. (COVERED IN COURSE)
5) Analyzing available data. (COVERED IN COURSE)
6) Reaching conclusions from data. (COVERED IN COURSE)
7) Prescribing “solution” to “problem”.

II. Most important checks and balances (debugging):

1. Is the sample large enough (to be representative of any/all data that could have been used in the above steps)?
2. Do the conclusions from the sample used extend/generalize to the “problem” over any/all data?

III. Taxonomy of Data:

Statistical methods are applicable to many kinds of data (see note 1). These varieties of data are classified into two categories (Qualitative or Quantitative) and into four levels of measurement (Nominal, Ordinal, Interval and Ratio).

Categories of Data

Qualitative data answers the question “What kind?” while Quantitative data answers the question “How much?”. There is a greater variety of quantitative data than qualitative data because: (1) the level of detail required of our measurements determines the numeric scale/system employed, continuous or discrete, and (2) one typically cannot account for all the possible values that quantitative data may take, while it is typically easy to count the different observed values of qualitative data.

By Category     Discrete, Continuous, Both?    Countable, Not Countable, Both?
Qualitative     Discrete                       Countable
Quantitative    Both                           Both

Levels of Measurement

By level of measurement, data of interest may take any of four forms, depending on what we want the data for: sorting, measuring intensity, or providing perspective or a sense of relation. Data can be Nominal, Ordinal, Interval or Ratio. Nominal data enables us to label or name observations. Ordinal data can be used to sort, rank, or order elements of a sample or population. Interval data captures the intensity of some phenomenon by giving us an idea of the lowest and highest values an item of interest takes for a given observation. And when expressed in Ratio form, data can also give us a clue as to how much variability there is in the data relative to the value in the denominator of the ratio (the reference point).

By Level of Measurement    Organization of Data
Nominal                    None
Ordinal                    Sorting Possible
Interval                   Sorting and Intensity Possible
Ratio                      Sorting, Intensity and Reference Point Possible
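The levels described above can be written down as a small lookup, which makes it easy to check what a given kind of data allows. This is only an illustrative sketch: the names `ORGANIZATION`, `supports` and the shirt-size data are hypothetical and not part of the course material.

```python
# What each level of measurement makes possible, as listed in the notes.
# (Dictionary and function names are illustrative assumptions.)
ORGANIZATION = {
    "nominal": set(),
    "ordinal": {"sorting"},
    "interval": {"sorting", "intensity"},
    "ratio": {"sorting", "intensity", "reference point"},
}

def supports(level, capability):
    """Return True if data at this level of measurement allows the capability."""
    return capability in ORGANIZATION[level.lower()]

# Ordinal data can be sorted once the ranking of its labels is known.
# Hypothetical shirt sizes, ranked from smallest to largest:
ranking = ["small", "medium", "large"]
observed = ["large", "small", "medium"]
sorted_sizes = sorted(observed, key=ranking.index)

print(supports("ordinal", "sorting"))   # sorting is possible for ordinal data
print(supports("nominal", "sorting"))   # but not for nominal data
print(sorted_sizes)
```

Each level keeps everything the previous level allows and adds one capability, which is why the sets grow from Nominal to Ratio.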

Note 1: A more formal definition of what we mean by data will be given in class.

The following table presents a full organization of all descriptors of data, by both methods of data classification:

                                Qualitative                 Quantitative
By category                     "discrete" and              May be "continuous" or not;
                                "countable"                 may be "countable" or not

By level of measurement

Nominal                         Names/Labels
(no order necessary)            Attributes

Ordinal                         Rankings                    All "number" systems
"a", "b", "c", "d", "e", …      Lexicographic               used for counting
(sorting)                       (culture)

Interval                                                    Temperature
From "a" to "b"                                             Stock Prices
(sorting; intensity)                                        Statistical Classes

Ratio                                                       All existing divisions
"a":"b" or "a"/"b"                                          of all "number" systems
(sorting; intensity;                                        used for counting
reference point)

Treatment of Data: Organizing Samples in Chapters 4 and 6

There are two questions we will always seek to answer when analyzing data using statistical methods:

1. What is happening? What values are observed in the gathered samples of data? (OBSERVED DATA VALUES, ODV)
2. How often do we observe the values that are occurring in the data? Do these values repeat themselves in identifiable or predictable patterns? (FREQUENCY, FREQ.)

Two of the first substantive chapters in your textbook for the course, Chapters 4 and 6, elaborate on the treatment of data by showing how we use statistics to organize sample data into a meaningful arrangement of data values by category as well as by level of measurement.

Chapters 4 and 6 describe visual tools that help you organize data via charts, graphs, or plots that provide insights into the data of interest to you. Chapter 6 goes into more detail and offers definitions of what we formally call "statistics": numbers that describe features of interest about the observed data values in samples.

However, both chapters offer us complementary views of observed data values that permit us to meaningfully arrange samples of data into organized formats that allow us to tell what values occur and how often they happen.

Visual Organization of Data

In the case of Chapter 4, we are ultimately interested in charting (picturing) different varieties of data to obtain a clearer understanding of, or perspective on, the content of the data. As the saying goes: "a picture is worth a thousand words".

Effective charting tools for Chapter 4

                                Qualitative                 Quantitative
By category                     "discrete" and              May be "continuous" or not;
                                "countable"                 may be "countable" or not

By level of measurement

Nominal                         Pareto Chart
(no order necessary)            Pie Chart

Ordinal                         Bar Graph                   Histogram / Polygon
"a", "b", "c", "d", "e", …                                  Ogive
(sorting)                                                   Stem-Leaf Plot
                                                            Run Chart / Time Series

Interval                                                    Histogram / Polygon
From "a" to "b"                                             Ogive
(sorting; intensity)                                        Stem-Leaf Plot
                                                            Run Chart / Time Series

Ratio                                                       Histogram / Polygon
"a":"b" or "a"/"b"                                          Ogive
(sorting; intensity;                                        Stem-Leaf Plot
reference point)                                            Run Chart / Time Series
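Two of the tools named above, the Pareto chart and the ogive, are driven by cumulative counts, so the numbers behind them can be sketched without any plotting library. The sketch below assumes the usual textbook definitions (a Pareto chart orders categories from most to least frequent and tracks the cumulative share; an ogive plots cumulative counts against class upper bounds); your textbook's definitions take precedence. The function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def pareto_order(observations):
    """Rows of (category, count, cumulative share), most frequent category first."""
    counts = Counter(observations).most_common()
    total = sum(n for _, n in counts)
    running = accumulate(n for _, n in counts)   # running total of counts
    return [(cat, n, cum / total) for (cat, n), cum in zip(counts, running)]

def ogive_points(class_upper_bounds, class_counts):
    """(upper bound, cumulative count) pairs: the points an ogive would plot."""
    return list(zip(class_upper_bounds, accumulate(class_counts)))

# Hypothetical qualitative sample: reasons for customer complaints
complaints = ["late", "late", "damaged", "late", "wrong item", "damaged"]
for row in pareto_order(complaints):
    print(row)

# Hypothetical quantitative classes: upper bounds and counts per class
print(ogive_points([16, 20, 24, 28, 32], [2, 1, 2, 1, 4]))
```

The last cumulative share in a Pareto ordering is always 1, and the last cumulative count in an ogive always equals the sample size, which gives a quick check on either table.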

All of the charting tools above use one fundamental vehicle of data organization: the frequency table. A frequency table is an array of rows and columns that organizes data into "classes" (rows) that share some common element(s). The columns of a frequency table then contain counts (absolute and relative) of how often different values arise in each class of data, hence the name frequency table. Depending on the type of data we are analyzing, by category or by level of measurement, the process of constructing frequency tables will vary, as will the kind of charting tool that most effectively organizes the data visually (see the last table above, and the slides below).

Building Frequency Tables

For Qualitative data
• classes (rows) are by types of attributes
• first column describes attributes
• all other columns count frequencies in different ways

For Quantitative data
• classes (rows) are ranges of values of equal magnitude (class width)
• first column details ranges for each class
• all other columns count frequencies...
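For qualitative data, the construction above amounts to one row per attribute with an absolute and a relative count. A minimal sketch, assuming the sample is simply a list of labels; the function name and the payment-method data are hypothetical:

```python
from collections import Counter

def qualitative_frequency_table(observations):
    """Rows of (attribute, absolute frequency, relative frequency)."""
    counts = Counter(observations)      # one class (row) per distinct attribute
    total = sum(counts.values())        # sample size
    # Rows appear in the order each attribute was first observed
    return [(attr, n, n / total) for attr, n in counts.items()]

# Hypothetical sample of payment methods (illustrative data only)
sample = ["cash", "card", "card", "check", "card", "cash"]
for attr, absolute, relative in qualitative_frequency_table(sample):
    print(f"{attr:<8}{absolute:>4}{relative:>8.2f}")
```

The absolute frequencies add up to the sample size and the relative frequencies add up to 1.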

Classes? Class Width? Frequency-table Construction Steps for Quantitative Data

• Sturges' rule: #classes = 1 + 3.3*log(n), where n is the sample size and log is the base-10 logarithm
• class width = (max - min) / #classes
• start first class close to min
• start second class close to min + (class width), and so on…
• all other columns count frequencies
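The construction steps above can be sketched in a few lines. Several choices here are assumptions made for illustration and are not stated in the notes: the number of classes is rounded up to a whole number, the first class starts exactly at the minimum, and the maximum value is counted in the last class. The function names and the sample are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes from 1 + 3.3*log10(n), rounded up (rounding is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def quantitative_frequency_table(values):
    """Rows of (class lower bound, class upper bound, absolute freq., relative freq.)."""
    n = len(values)
    lo, hi = min(values), max(values)
    if hi == lo:                        # all values equal: a single class
        return [(lo, hi, n, 1.0)]
    k = sturges_classes(n)
    width = (hi - lo) / k               # class width = (max - min) / #classes
    counts = [0] * k
    for v in values:
        # Index of the class containing v; the maximum goes in the last class
        i = min(int((v - lo) / width), k - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]

# Hypothetical sample of 10 measurements (illustrative data only)
sample = [12, 15, 17, 21, 22, 25, 28, 30, 31, 32]
for lower, upper, absolute, relative in quantitative_frequency_table(sample):
    print(f"{lower:6.1f} to {upper:6.1f}{absolute:>4}{relative:>8.2f}")
```

In practice the class boundaries are often nudged to round numbers near these values, which is what "close to min" allows; the sketch keeps the exact values so the arithmetic is easy to follow.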
