
ARTNeT Greater Mekong Sub-region (GMS) initiative

Session 7

Introduction to important statistical techniques for competitiveness analysis – example and interpretations

ARTNeT Consultant
Witada Anukoonwattaka, PhD
Thammasat University, Thailand
[email protected]

Asia‐Pacific Research and Training Network on Trade
www.artnetontrade.org

Outline
• Concepts of analysis
• Basic data analysis:
– Interpreting quantitative and qualitative data
• Technical tools
– Statistical analysis
– Regression
• Concepts and interpretation of basic

What is data analysis?

1. Describing what is going on in the dataset
E.g. You explore the sample to find out
– the level of, and changes in, the relative price competitiveness of the observed garment producers.
– differences in cost competitiveness among firm groups, such as
• purely national firms vs. foreign joint ventures
• small vs. large firms

2. Testing hypotheses
E.g. You may want to know
– Are changes in the cost of Chinese garments relative to that of the GMS group systematically related to tariff reductions?
– Do the changes in relative costs differ systematically between countries in the group?
– Are the trends in competitiveness similar between exports to the US and Japanese markets?

3. Forecasting
• Can exchange rate depreciation increase the export competitiveness of GMS countries to China? By how much?
• Can tariff reductions enhance the export competitiveness of GMS countries? To what extent?

Describing what is going on in the data

Interpreting Quantitative Data (1)

1. Overall average scores: high or low? Very high or very low scores might mean that the question is poorly worded.
2. Standard deviations: a low standard deviation means respondents generally had a common response. A high standard deviation means they had different responses.
3. The distribution will help you get a better idea of what is happening.
• Is there a bi-modal distribution, where two different groups gave very different responses?
• A bi-modal distribution might show up as a normal-looking average score with a high standard deviation.
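The bi-modal point can be illustrated with a small sketch. The scores below are hypothetical 1 to 5 survey responses (not from the workshop data), and Python's standard library is assumed here only for illustration. Both groups have the same average, but the standard deviation and a simple text histogram reveal the split.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical 1-5 survey scores; both samples have the same average.
consensus = [3, 3, 3, 2, 4, 3, 3, 4, 2, 3]
split = [1, 1, 5, 5, 1, 5, 1, 5, 1, 5]

for name, scores in (("consensus", consensus), ("split", split)):
    counts = Counter(scores)
    print(name, "mean:", mean(scores), "SD:", round(stdev(scores), 2))
    # Text histogram: one '#' per respondent giving that score.
    for value in range(1, 6):
        print(f"  {value}: {'#' * counts[value]}")
```

The average alone cannot distinguish the two samples; the standard deviation and the frequency distribution can.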

Interpreting Quantitative Data (2)

4. Compare the results between the different demographic subgroups.
– Focus especially on the items where something interesting showed up in the frequency distributions.
5. If you are serious about understanding your numeric data, you should also perform some statistical analyses.

Interpreting Qualitative Data

1. Read through all the comments. Get a feeling for what people are saying.
2. Categorize the comments into different areas.
3. Look at each category separately. How many unique comments are in each? How detailed are those comments? How strongly are they stated? At this point, you should be able to identify which categories are more important and which are less important.
4. Look at the different subgroups to see if any relationships emerge between subgroups and categories of comments.

Technical Data Analysis:

• Statistical analysis
• Hypothesis testing
• Forecasting

Statistical Analysis

1. Analysis of individual variables
– Look at the “central tendency”, “distribution” and “dispersion” of responses for each variable.
2. Analysis of relationships between variables
– Look at “possible interdependence” between variables.
3. Analysis of differences in characteristics between subgroups
– Look at “characteristic differences” between subgroups.

Examples

What are we analyzing when we investigate a competitiveness survey dataset to find out…
a) Whether foreign investment tends to enhance labor productivity in the garment industry?
b) Whether export-oriented industries have higher labor productivity than import-competing industries?
c) How productive labor is in the garment industry?


Activity

Statistics        Worker   Industry   1      2      3      4      AFTA   Foreign
Mean              58.75    2          0.33   0.50   0.25   0.42   1      0.58
Standard Error    13.50    0.28       0.14   0.15   0.13   0.15   0.33   0.15
Median            45       2          0      0.5    0      0      1      1
Mode              30       2          0      1      0      0      2      1
SD                46.76    0.95       0.49   0.52   0.45   0.51   1.13   0.51
Minimum           15       1          0      0      0      0      -1     0
Maximum           180      4          1      1      1      1      2      1
Sum               705      24         4      6      3      5      12     7
Count             12       12         12     12     12     12     12     12

Note: You can do descriptive statistics in Excel

• Go to the menu Tools – Add-Ins, check Analysis ToolPak and then press the OK button. The next time you open the Tools menu, you will see Data Analysis at the bottom of the menu.
• Click the menu Tools – Data Analysis and you will see the Data Analysis dialog. Scroll down to Descriptive Statistics, select it and click the OK button.

• You will get the Descriptive Statistics dialog. In the Input Range box, select the range of data that you want to analyze. Include the label in the first row and check the Labels in first row check box. Also check the Summary statistics check box and then click the OK button.

The result of the descriptive statistics tool, after formatting, is shown in the figure below.

Analyzing Individual Variables

• Central tendency of the data • Distribution of the data • Dispersion of the data

Tools for Measuring Central Tendency: Mode, Median, Mean

• Mode is the most frequently occurring value.
• Median is the middle value.
• Mean is the average value.

Notes:
a “Yes” means the indicator is suitable for the measurement level shown.
b May be OK in some circumstances. See Example 2.
c May be misleading when the distribution is asymmetric or has a few outliers.

Competitiveness Analysis Examples:

Example 1: Which measure of central tendency would you use to find the following information from your dataset?
a) Unit labor cost of firms in the footwear industry
b) The majority of foreign investors in the textile industry
c) Average export ratio when the dataset shows that

Firm No.   Export ratio
1          20%
2          24%
3          28%
4          30%
5          85%

Example 2. The following ordinal scale data show customers’ views on the quality of domestically produced garments (sample size is 30). Is it possible to find the “mean” of this ordinal variable?
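As a rough illustration of note c, the export ratios in Example 1(c) can be summarized with both the mean and the median. The sketch below assumes Python's standard statistics module (the slides themselves use Excel) and shows how the single 85% firm pulls the mean well above the typical firm.

```python
from statistics import mean, median

# Export ratios (%) of the five firms listed in Example 1(c).
export_ratio = [20, 24, 28, 30, 85]

avg = mean(export_ratio)    # pulled upward by the 85% firm
mid = median(export_ratio)  # middle value, not affected by the extreme firm

print(f"mean = {avg}, median = {mid}")
```

Four of the five firms export less than the mean, which is why the median may describe the "average" firm better here.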

Analyzing Data Dispersion: ‘Range’ and ‘Standard Deviation (SD)’

Dispersion is the spread of the values around the central tendency.

Range = Max - Min

SD =
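A sketch of both dispersion measures follows. It assumes the sample form of the standard deviation (squared deviations from the mean are summed, divided by n - 1, then square-rooted); the data are the export ratios from Example 1, and the variable names are illustrative only.

```python
from math import sqrt
from statistics import stdev

# Export ratios (%) from Example 1; any numeric variable would do.
values = [20, 24, 28, 30, 85]

value_range = max(values) - min(values)  # Range = Max - Min

n = len(values)
mean_value = sum(values) / n
# Sample standard deviation: squared deviations from the mean, divided by n - 1.
sd_manual = sqrt(sum((v - mean_value) ** 2 for v in values) / (n - 1))
sd_library = stdev(values)  # the standard library uses the same n - 1 form

print(value_range, round(sd_manual, 2), round(sd_library, 2))
```

If the population form (dividing by n) is intended instead, statistics.pstdev is the corresponding standard-library function.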

Note: All statistics programs (even Excel) are capable of calculating descriptive statistics for you.

Analyzing Data Distribution: A Frequency Distribution

The frequency distribution is a summary of the frequency of individual values or ranges of values for a variable.

A Frequency Distribution of Age Groups
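A frequency distribution of age groups can be sketched by assigning each observation to a range and counting. The ages and the ten-year bin width below are hypothetical choices made for illustration; they are not the data behind the original chart.

```python
from collections import Counter

# Hypothetical ages of survey respondents (not from the slides).
ages = [19, 23, 25, 31, 34, 35, 38, 42, 47, 51, 58, 63]

def age_group(age, width=10):
    """Return a label such as '30-39' for the bin that contains age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Count how many respondents fall into each age group.
frequency = Counter(age_group(a) for a in ages)

for group in sorted(frequency):
    share = 100 * frequency[group] / len(ages)
    print(f"{group}: {frequency[group]:2d} ({share:.0f}%)")
```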


We usually expect the observations to be normally distributed if we performed random sampling.

Normal Distribution

[Figure: normal distribution curve with markers at -2 SD, -1 SD, 1 SD and 2 SD]

If the mean of our example is 20.5 and the standard deviation is 7.5, we can estimate that approximately 95% of the scores will fall in the range of 20.5 - (2 × 7.5) to 20.5 + (2 × 7.5), or between 5.5 and 35.5.

Analyzing Relationships between Variables

• Scatter plot diagram
• Cross tabulation (Pivot Table)
• Regression analysis

Relationships between Variables

Is there any relationship between the two variables shown in the scatter plot diagram?

Cross Tabulation (Pivot Table)

Attitude toward QC   Export orientation
                     Low    Medium   High   Total
Indifferent          27     37       56     120
Somewhat positive    35     39       41     115
Positive             43     33       30     106
Total                105    109      127    341
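Raw counts are hard to compare across groups of different size, so a common next step is to convert each column to percentages. The sketch below does this for the counts in the table above; the variable names are illustrative, and the rounded results agree with the Percentage Cross Tabulation shown later in the deck.

```python
# Counts from the cross tabulation above: rows are attitudes toward QC,
# columns are export-orientation groups.
columns = ["Low", "Medium", "High"]
counts = {
    "Indifferent":       [27, 37, 56],
    "Somewhat positive": [35, 39, 41],
    "Positive":          [43, 33, 30],
}

# Column totals: number of firms in each export-orientation group.
totals = [sum(row[i] for row in counts.values()) for i in range(len(columns))]

# Column percentages: share of each attitude within an export group.
percent = {
    attitude: [round(100 * n / t) for n, t in zip(row, totals)]
    for attitude, row in counts.items()
}

for attitude, row in percent.items():
    print(attitude.ljust(18), row)
```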

Note: Some call it cross tabulation, while MS Excel calls it Pivot Table.

Interpretation (1)

Attitude toward QC   Export orientation
                     Low    Medium   High   Total
Indifferent                                 120    35%
Somewhat positive                           115    34%
Positive                                    106    31%
Total                105    109      127    341    100%

The Total column (35%, 34%, 31%) shows the distribution of the attitude variable.
The Total row (105, 109, 127, summing to 100%) shows the distribution of the export-orientation variable.

Is the sample biased toward a particular attitude?
Is the sample biased toward particular firm types?

Interpretation (2)

Attitude toward QC   Export orientation
                     Low    Medium   High   Total
Indifferent                          56
Somewhat positive                    41
Positive             43     33       30     106
Total                                127

The High column shows the distribution of attitudes for high-export firms.
The Positive row shows the distribution of export orientation for firms with a positive attitude toward QC.

• Is attitude toward QC associated with the export orientation of the firms?
• Do the firms with a positive attitude toward QC tend to be low or high export-orientation firms?
• Do the firms with high export orientation tend to be positive or indifferent toward QC?

Analysis of Differences between Groups

E.g. Differences between firm groups.

Percentage Cross Tabulation

Attitude toward QC   Export orientation (%)
                     Low    Medium   High   Total
Indifferent          26     34       44     35
Somewhat positive    33     36       32     34
Positive             41     30       24     31
Total                100    100      100    100

• Are there differences between low-export and high-export firms in their attitude toward QC?

Note: You can do Cross Tabulation in Excel

In Microsoft Excel, cross tabulations can be automated using a Pivot Table. You may use either the Pivot Table icon in the toolbar or the MS Excel menu Data – Pivot Table and Pivot Chart Report.

When you click the toolbar icon or the menu item, the Pivot Table wizard will pop up. Click Next.

In step 2 of the wizard, highlight the data, including the labels in the top row, as shown in the following figure.

In step 3 of the Pivot Table Wizard, select the Layout button.

To examine the relationship between the variables Playground and Satisfaction, drag and drop the variable names on the right into the diagram. Put the Satisfaction button in the row area and the Playground button in the column area, then drop Satisfaction once more into the Data area. It will appear as Sum of Satisfaction. After that, double-click that last button (Sum of Satisfaction) and the Pivot Table Field dialog will appear. Select Summarize by Count and then click the OK button twice.

When you are back at step 3 of the Pivot Table wizard, click the Finish button.

MS Excel will automatically create the cross tabulation table. Personally, I don't like to use it directly because it may contain very long formulas. Thus, I prefer to highlight the Pivot Table and use the menu Edit – Copy (CTRL-C), then select another cell and use the menu Edit – Paste Special. Choose the Values option and click the OK button.
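Outside Excel, the same "summarize by Count" logic can be sketched in a few lines. The records below are hypothetical (Satisfaction, Playground) answers invented for illustration, and the use of Python's Counter is an assumption of this example rather than part of the workshop material.

```python
from collections import Counter

# Hypothetical survey records: (Satisfaction, Playground) per respondent.
records = [
    ("High", "Yes"), ("High", "Yes"), ("Low", "No"),
    ("Low", "Yes"), ("High", "No"), ("Low", "No"),
]

counts = Counter(records)  # counts each (row, column) combination
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Print the cross tabulation with a Total column.
print("Satisfaction".ljust(14) + "".join(c.rjust(6) for c in cols) + "Total".rjust(7))
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(r.ljust(14) + "".join(str(n).rjust(6) for n in cells) + str(sum(cells)).rjust(7))
```

The result is already a table of plain values, so no Paste Special step is needed.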

Key Considerations
• Watch the "n" (number of observations). Be wary of small samples.
– If there are few respondents in a particular category, you should NOT trust the data or, at least, you should look for much stronger trends before trusting the results.

For example, can we draw a conclusion if we found that…
Case A) 38% of the sample (8 observations) said they have not had a problem competing with imports from China.
Case B) 88% of the sample (8 observations) said they have not had a problem competing with imports from China.
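One way to see how little 8 observations can tell us is to put a confidence interval around each percentage. The sketch below uses the Wilson score interval, which is a technique chosen for this illustration and not one named in the slides. It also assumes that 38% and 88% of 8 observations correspond to 3 and 7 firms.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion (a common textbook choice)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Assumption: 38% and 88% of 8 observations correspond to 3 and 7 firms.
case_a = wilson_interval(3, 8)
case_b = wilson_interval(7, 8)

print("Case A:", [round(x, 2) for x in case_a])
print("Case B:", [round(x, 2) for x in case_b])
```

Under these assumptions the interval for Case A includes values both below and above 50%, so nothing can be said about the majority. The interval for Case B stays above 50% but is still very wide.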

• Knowing whether a relationship is strong enough when respondent numbers are small takes some practice and experience.
• What you really want to know is whether the relationship is "statistically significant".
– This type of analysis is rather technical.

Introduction to Regression Analysis
• Regression Analysis: a technique for using data to identify relationships among variables and to use these relationships to make predictions.

Basic Concepts of Regression Analysis

• You first fit a straight line to model the data.

y = b0 + b1 x + error

• A straight line provides the simplest model of the relationship between the response (y variable) and the predictor (x variable).
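A minimal sketch of fitting such a line follows. It assumes ordinary least squares as the fitting method (the slides do not name one) and uses hypothetical observations, for example a firm-size index as x and a productivity index as y.

```python
# Hypothetical (x, y) observations, e.g. firm-size index and productivity index.
x = [10, 20, 30, 40, 50]
y = [36, 64, 97, 123, 155]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope: co-variation of x and y divided by the variation of x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)

# The error term is what the line leaves unexplained for each observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"y = {b0:.2f} + {b1:.2f} x")
```

In practice a spreadsheet or statistics package does this calculation; the point of the sketch is only to show where b0, b1 and the error come from.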

Simple Regression

[Figure: scatter plot of Productivity Index (Y) against Firm-size Index (X), with the fitted line y = b0 + b1 x + error]

Productivity = b0 + b1(Size) + error

y = b0 + b1 x + error

– y: dependent variable
– b0, b1: coefficients
– x: independent variable
– error: how far the fitted line is from the data

• The size of the coefficient gives you the size of the effect that the variable is having on your dependent variable.
• The sign on the coefficient (positive or negative) gives you the direction of the effect.

Interpretation

Productivity = b0 + b1(Size) + error

b1 represents the increase in productivity for a one-unit increase in firm size.
b0 could in theory be thought of as the productivity of a firm whose size is zero.

Regression: Productivity = 5 + 3 Size + error
Prediction: Expected Productivity = 5 + 3 Size
• Productivity is predicted to increase by 3 units if firm size increases by 1 unit.
• If the average firm size of the industry of interest is 20, we get a predicted productivity of 5 + 3(20) = 65.

Your turn!

What is the following regression telling us?

Market share = 100 – 0.2 (labor cost) + error

General Regression
• If a straight line doesn’t fit the data well, you can
– Fit a curved line with quadratic or cubic terms
– Apply a log transformation to the response (y) or predictor (x) variable.

E.g. ln y = β0 + β1 ln x + error
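The effect of a log transformation can be sketched with data that follow a curve. The numbers below are hypothetical and were constructed to satisfy y = 2 x^1.5 exactly, so a straight line fits the logged data perfectly. In such a log-log model the slope can be read as the percentage change in y associated with a one percent change in x.

```python
from math import log

# Hypothetical data that follow a curve (y = 2 * x ** 1.5), not a straight line.
x = [1, 4, 9, 16]
y = [2, 16, 54, 128]

# Log-transform both variables, then fit a straight line to the logs.
ln_x = [log(v) for v in x]
ln_y = [log(v) for v in y]

n = len(x)
mx = sum(ln_x) / n
my = sum(ln_y) / n
s_xy = sum((a - mx) * (b - my) for a, b in zip(ln_x, ln_y))
s_xx = sum((a - mx) ** 2 for a in ln_x)
beta1 = s_xy / s_xx      # slope of ln y on ln x
beta0 = my - beta1 * mx  # intercept, equal to ln(2) for this data

print(round(beta0, 3), round(beta1, 3))
```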

A regression model may need more than one independent variable to adequately describe the response (Y variable).

y = b0 + b1 x1 + b2 x2 + b3 x3 + error

This is called “Multiple Regression”.
• Each coefficient tells you how much the response is expected to increase when that independent variable increases by one, holding all the other independent variables constant.

E.g. What is the regression telling us?

export price = 120 – 3 (exchange rate) + 1.7 (wage) + e

Regression Output

Export share = b0 + b1 (real wage) + b2 (investment) + b3 (L. prod.) + e

R square = 0.646
Adjusted R square = 0.613
Prob>F = 0.000

                     Estimated coefficient   SE      t statistic   P value
Constant             41.36                   37.82   1.094         0.280
Real wage            -15.85***               2.88    5.500         0.000
Investment           0.64                    0.27    0.236         0.814
Labor productivity   2.42***                 0.81    2.992         0.004

Note: Statistical significance at the 1 percent, 5 percent and 10 percent levels is indicated by ***, **, and *.

Interpretation of a regression output (1)

1) Are the independent (X) variables having a genuine effect on the response (Y)?
1.1 Look for a small “P value” in a regression output.
– The “P value” tells you how confident you can be that each individual variable has some correlation with the dependent variable. It is also called the observed significance level.
– “P < 0.05” is the most common threshold for statistical significance.
• It says that, if the variable truly had no effect, an estimate this far from zero would occur less than 5% of the time, assuming your model is specified correctly. Informally, you can be about 95% confident that the variable is having some effect.

Interpretation of a regression output (2)

1.2 Look for a large “t statistic” in a regression output.
– The t statistic is the coefficient divided by its standard error (SE).
– The SE tells you the precision of the regression coefficient. If a coefficient is large compared to its standard error, then the t statistic is large (the coefficient is significantly different from 0).
– Your regression software will compare the t statistic on your variable with values in the table of the t distribution to determine the P value, which is the number that you really need to be looking at.
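The relationship between coefficient, SE, t statistic and P value can be sketched with the labor productivity row of the regression output (coefficient 2.42, SE 0.81). The P value here is an assumption-laden approximation: it uses the normal distribution, whereas regression software uses the t distribution with the model's degrees of freedom, which the output does not report.

```python
from statistics import NormalDist

def t_statistic(coefficient, standard_error):
    """t statistic = coefficient divided by its standard error."""
    return coefficient / standard_error

def approx_p_value(t):
    """Two-sided P value from the normal distribution.

    Assumption: a large-sample approximation. Regression software uses the
    t distribution with the model's degrees of freedom, so its P value is
    somewhat larger in small samples.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Labor productivity row of the regression output: coefficient 2.42, SE 0.81.
t = t_statistic(2.42, 0.81)
p = approx_p_value(t)

print(round(t, 2), round(p, 4))
```

The computed t is close to the reported 2.992 (the small gap presumably reflects rounding of the displayed coefficient and SE), and the approximate P value falls below 0.01, in line with the *** marking.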

Interpretation of a regression output (3)

– The larger the t statistic (in absolute value), the smaller the P value. When the t statistic is large enough, you have P value < 0.05, the usual threshold for concluding that the variable is having some effect.

1.3 Look for symbols indicating statistical significance at the 1%, 5%, and 10% levels.
– Statistical significance at the 1%, 5%, and 10% levels is another way of saying P < 0.01, P < 0.05, and P < 0.10, respectively.

Interpretation of a regression output (4)

2) Is your regression model making accurate predictions?
- Look for an “R-squared (R2)” close to 100%.
- It says how much of the variation in the dependent variable (Y) has been explained by the regression model.
Ex. What is meant by R2 = 100%?
3) Is there any explanatory variable missing from the model?
- See whether “Adjusted R square” is significantly lower than R2.
- It usually says that there are some explanatory variables missing from the model.

Interpretation of a regression output (5)

4) You should be aware that the P-value is generally more important than R-square.
- The P value tells you how confident you can be that each individual variable has some correlation with the dependent variable.
- The R-squared is generally of secondary importance, unless your main concern is using the regression equation to make accurate predictions.
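For reference, the usual relationship between R-squared and adjusted R-squared can be sketched as below. The formula is the standard textbook one; the sample size of 36 is a hypothetical value assumed for illustration, because the regression output in these slides does not report the number of observations.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R square = 0.646 with 3 predictors, as in the regression output.
# Assumption: n_obs = 36 is hypothetical; the output does not report it.
adj = adjusted_r_squared(0.646, n_obs=36, n_predictors=3)

print(round(adj, 3))
```

The gap between the two measures widens as more predictors are added relative to the number of observations.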

Interpretation of a regression output (6)

5) Signs of multicollinearity (independent variables may be correlated with each other)
- A small P-value for the regression as a whole (Prob>F in the upper part of the regression output is less than 0.05), but large P-values for individual variables.
- It means the coefficients on individual variables may be insignificant even when the regression as a whole is significant.
- Intuitively, this is because highly correlated independent variables are explaining the same part of the variation in the dependent variable, so their explanatory power and the significance of their coefficients are "divided up" between them.

Regression Methods and Choosing Criteria

Regression

Continuous X variables ⇒ Continuous response (Y)

E.g. How are the age and the body mass index (BMI) of a patient associated with the length of stay in the hospital?

Day = b0 + b1 Age + b2 BMI + e

General Regression

Categorical X variables ⇒ Continuous response (Y)

How are the payment method and the day of the week associated with the cost of a transaction?

Day (x1)   Dummy Value   Method (x2)   Dummy Value
Mon        0,1           Credit        0,1
Tue        0,1           Cash          0,1
Wed        0,1           Check         0,1
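Dummy coding of this kind can be sketched as follows. The transactions are hypothetical, and the helper name dummy_columns is invented for this illustration.

```python
def dummy_columns(values):
    """Turn a categorical variable into one 0/1 dummy column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical transactions (not from the slides).
day = ["Mon", "Tue", "Wed", "Mon"]
method = ["Credit", "Cash", "Check", "Cash"]

day_dummies = dummy_columns(day)
method_dummies = dummy_columns(method)

for name, column in {**day_dummies, **method_dummies}.items():
    print(name.ljust(7), column)
```

When the regression includes an intercept, one category per variable is usually left out as the reference group, since otherwise the dummies for a variable always sum to one and are perfectly collinear with the intercept.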

Cost = b0 + b1 DayDummy + b2 MethodDummy + e

Binary Logistic Regression

Two Response (Y) Categories

Advertisement: Are customers who saw an advertisement for a new cereal more likely to buy the product? Analysts randomly sample customers and ask them whether they saw the advertisement and whether they bought the cereal.

Binary Logistic Regression

Two Response (Y) Categories

Advertisement

Decision (y)   Coding
Buy            1
Don’t buy      0
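With a single 0/1 predictor, a binary logistic regression can be worked out by hand from the four survey counts, because the fitted coefficients are simply the observed log-odds of the two groups. The counts below are hypothetical, and the names used are illustrative only.

```python
from math import exp, log

# Hypothetical survey counts (not from the slides).
saw_ad = {"buy": 30, "dont_buy": 20}
no_ad = {"buy": 15, "dont_buy": 35}

def odds(group):
    """Odds of buying = buyers / non-buyers."""
    return group["buy"] / group["dont_buy"]

# With one 0/1 predictor, the logistic regression
#   log-odds(Buy) = b0 + b1 * AdDummy
# is fitted exactly by the observed log-odds of the two groups.
b0 = log(odds(no_ad))
b1 = log(odds(saw_ad)) - b0  # log odds ratio

def predicted_probability(ad_dummy):
    """Pr(Buy = 1) implied by the fitted coefficients."""
    return 1 / (1 + exp(-(b0 + b1 * ad_dummy)))

print(round(predicted_probability(1), 2), round(predicted_probability(0), 2))
```

A positive b1 means customers who saw the advertisement have higher odds of buying. With several predictors or continuous ones, the coefficients have to be estimated by statistical software.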

Pr(Decision) = f(Ad. Dummy) + e

Ordinal Logistic Regression

More than Two Response (Y) Categories in Natural Order

Hen Weight: Is the weight of a hen related to the size of its eggs? Analysts randomly sample hens, record the weight of each hen, and classify the size of its eggs as small, medium, or large.

Ordinal Logistic Regression

More than Two Response (Y) Categories in Natural Order

Hen Weight

Egg Size (y)   Coding
Small          1
Medium         2
Large          3

Pr(Egg Size) = f(Hen Weight) + e

Nominal Logistic Regression

More than Two Response (Y) Categories with No Natural Order

Is the color of the vehicle that consumers purchase related to their gender or age? Because the colors of the vehicles cannot be arranged from least to greatest, the response categories do not follow a natural order.

Nominal Logistic Regression

More than Two Response (Y) Categories with No Natural Order

Color (y)   Dummy Value
Silver      0,1
Blue        0,1
Red         0,1

Pr(Color) = f(Age) + e

Potential Misuses of Statistics

• Manipulating the scale to change the appearance of the distribution of data
• Eliminating high/low scores for more coherent presentation
• Inappropriately focusing on certain variables to the exclusion of other variables
• Presenting correlation as causation

Conclusion
• Statistical analysis is just one way of working with observable information.
• It consists of tests used to analyze data. These tests provide an analytical framework within which researchers can pursue their research questions.
• However, statistical tests may be misused, resulting in potential misinterpretation and misrepresentation.
