Exceltm and Statistical Analysis

Exceltm and Statistical Analysis

<p> Excel and Statistical Analysis Jonathan Hill Florida International University</p><p>In this document, basic principles for analyzing data with Microsoft’s Excel software will be detailed. Many assignments with require the information provided in the document. It is assumed that you know how to boot-up Excel from your personal computer’s Start Menu, as well as open existing Excel files and save files.</p><p>CONTENTS</p><p>Topic Page </p><p>1. Data Arrangement: Rows, Columns, Observations 2 1.1 Rows and Columns 2 1.2 Observations 2</p><p>2. Creating News Variables as Functions of Existing Variables 3 2.1 Creating One New Variable for One Observation 3 2.2 Creating One New Variable for All Observations 4 2.3 Examples</p><p>3. Statistical Analysis: Mean, Variance, Standard Deviation 5 3.1 Sample Mean, Mode and Median 5 3.2 Sample Variance and Standard Deviation 7 3.3 Descriptive Statistics: Tool Package 8</p><p>4. Relative Frequency and Histograms 10 4.1 Frequency 10 4.2 Relative Frequency 11 4.3 Histogram 12</p><p>5. Probability Distribution Functions 14 5.1 Discreet Distribution Functions 14 Binomial and Poisson 5.2 Continuous Distribution Functions 17 Normal and Student's t</p><p>6. Bivariate Analysis: Sample Covariance and Correlation 20 6.1 Sample Covariance 20 6.2 Sample Correlation 21</p><p>7. Interval Estimation 23</p><p>8. Graphics: Charts, Tables and Figures 25 8.1 Scatter Plots 25 8.2 Time-Series: Line Plots 27 8.3 Editing Graphs and Figures 29</p><p>1 1. Data Arrangement: Rows, Columns, Observations</p><p>1.1 Rows and Columns When you boot-up Excel, a spread sheet of “cells” appears with column labels in letters (e.g. A, B, etc..) and row labels in Arabic numerals (e.g. 1, 2, etc..). For example, the upper-left corned of the spread-sheet will appear as follows:</p><p>A B C 1 2 3 4</p><p>A vertical stack of cells is a “column”. For example, column “A” is the stack A</p><p>A horizontal layer of cells is a “row”. For example, the 3rd row is the layer</p><p>A B C 1 2 3 4</p><p>1.2 Observations [Individual People/Time Period in a Sample of Data] Your data assignments will include Excel data files. For example, consider data on work-hours, hourly wage, education and age of 4 people:</p><p>A B C D 1 hours wage ed age 2 34 7.5 12 20 3 50 20.8 18 37 4 40 5.75 16 50 5 30 12.90 14 42</p><p>The data files will always contain column headers with variable names. The data is read as follows. Clearly, the columns contain individual variables. For example, columns C contains information on the years of education the 4 people have. Moreover, in the 2nd row is the first person in our sample of data. This person works 34 hours/week, earns $7.50/hour, has 12 years of education and is 20 years old. In this manner, the rows contain information for one person. We call one such set of information for one person an “observation”. Thus, the fourth observation comprises 30 work hours, a wage of $12.90, 14 years of education, and an age of 42. Our sample, therefore, has n = 4 observations. Similarly, we could have data that changes over time, like GNP, the CPI and inflation for 1988, 1989 and 1990 A B C D 1 YEAR GNP CPI INFL 2 1988 3 1989 4 1990</p><p>Thus, one year, say 1988, and its GNP, CPI and inflation rate, represents one observation. </p><p>2 2. Creating News Variables as Functions of Existing Variables</p><p>2.1 Creating One New Variable for One Observation Often, we want to create new columns of data based on the existing data in our data file. For example, if our file contains information on one firm’s revenue [REV] and cost-per-unit [COST] of production for 4 separate years (1998, 1999, 2000, 2001), we can use an empty column to create a new profit variable, PROFIT= REV – COST.</p><p>A B C D 1 YEAR REV COST 2 1998 100 87 3 1999 76 75 4 2000 490 500 5 2001 80 40</p><p>2.2.1 First, we want to create a new column label, in column D, for the PROFIT variable. Simply place the cursor over the cell in row 1, column D (denoted as cell "D1"), click, and type “PROFIT”:</p><p>A B C D 1 YEAR REV COST PROFIT 2 1998 100 87 3 1999 76 75 4 2000 490 500 5 2001 80 40</p><p>2.2.2 Next, we want to use the data in the 2nd row (the first year in our sample) to create the new profit variable for the year 1998. Point and click-on column D, row 2 (i.e. cell D2), just below the new label, PROFIT. We will use Excel’s ability to perform mathematical operations. Type “=”, then type in the cell coordinates containing the first year’s revenues, “B2” (always letters/columns first), then type “-”, then the coordinates containing cost, “C2”:</p><p>A B C D 1 YEAR REV COST PROFIT 2 1998 100 87 =B2-C2 3 1999 76 75 4 2000 490 500 5 2001 80 40</p><p>Finally, hit Enter. Excel will perform the operation for you:</p><p>A B C D 1 YEAR REV COST PROFIT 2 1998 100 87 13 3 1999 76 75 4 2000 490 500 5 2001 80 40</p><p>2.2 Creating One New Variable for All Observations Excel beautifully allows us to perform the same operation for all other years in the sample simply by clicking and dragging the cursor. Click-on the cell containing the first year’s profit. The cell will be surrounded by black with a small box in the lower-right corner:</p><p>A B C D 1 YEAR REV COST PROFIT 2 1998 100 87 13  3 1999 76 75 4 2000 490 500 5 2001 80 40</p><p>3 Point the cursor over small blacken box in the lower right corner of the cell: Excel will change the shape of the cursor to a small cross. Then, click, hold, and drag down: Excel will automatically perform the same arithmetic operation for the first cell in all subsequent cells. Once you reach the 5 th row (i.e. cell D5), stop dragging, and release the mouse-button. The result is</p><p>A B C D 1 YEAR REV COST PROFIT 2 1998 100 87 13 3 1999 76 75 1 4 2000 490 500 -10 5 2001 80 40 40</p><p>2.3 Examples We want to the use labor data from above to create a yearly wage income variable. Mathematically, yearly wage income is</p><p> incomewage work hours wages weeks    52 (thus, the units are $'s) year week hour year</p><p>A B C D E 1 hours wage ed age 2 34 7.5 12 20 3 50 20.8 18 37 4 40 5.75 16 50 5 30 12.90 14 42</p><p>In column E, row 1 (cell E1), type “income_wage”, or whatever variable name you prefer, as long as it logically describes the content of the data:</p><p>A B C D E 1 hours wage ed age income_wag e 2 34 7.5 12 20 3 50 20.8 18 37 4 40 5.75 16 50 5 30 12.90 14 42</p><p>Click-on column E, row 2 (cell E2), type “=A2*B2*52”, then enter. Notice that multiplication is performed with the asterisk “*”. The result is a yearly wage income of $13260/year:</p><p>A B C D E 1 hours wage ed age income_wag e 2 34 7.5 12 20 13260 3 50 20.8 18 37 4 40 5.75 16 50 5 30 12.90 14 42</p><p>Place the cursor above the small black box in the lower right corned of E2, click, hold, and drag down to the 5th row (cell E5). The outcome is</p><p>A B C D E 1 hours wage ed age income_wag e 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124</p><p>4 3. Statistical Analysis: Mean, Variance, Standard Deviation</p><p>In the present sub-section, we will study how to use Excel for basic statistical analysis of data. The theory required for the subsequent discourse is contained in Newbold, Chapter 2. The data that we inspect in any situation or environment is necessarily a sample taken from a larger population: we cannot (typically) have the entire population of data. Thus, all data analysis is of the sample statistic form.</p><p>3.1 Sample Mean, Mode and Median Consider the sample of data on work-hours, wage, education, age, and newly created variable, yearly wage income = income_wage: A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7</p><p>For ease of reference, we can use the 7th row to contain a label for the "mean", as well as the sample means of each variable. Thus, type in cell A7 the title "MEAN".</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN</p><p>3.1.1 Sample Mean for one Variable (e.g. work-hours) For the sample mean (i.e. average) of work-hours, click-on cell B7, and type the command </p><p>=average(B2:B5)</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN =average(B2:B5)</p><p>This command will tell Excel to derive the sample mean of the numbers contained in cells B2 - through - B5, which identically represents the work hours data. Once you type the above command in, hit ENTER, and Excel will fill the cell with the sample average:</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.5</p><p>5 3.1.2 Sample Mean for All Variables Recall the point, click and drag method for easily creating observations of a newly created variable. We will employ the same technique. Click-on cell B7, in which the sample mean of "hours" is contained. The cell will be highlighted with black around the edges, and a small box in the lower right corner:</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.5  Point the cursor over the small black box until it changes into a small black cross, click and drag to the right over cells C7 - F7. Excel will not copy the numerical value "38.5" into the subsequent cells. Rather, Excel will copy the formula for the average in those cells, updating the column of data used as you move right. The result is sample averages for all variables:</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.5 11.7375 15 37.25 24856</p><p>3.1.3 Formatting Cells for Decimal Places The above results are somewhat messy in the sense that the sample mean of hours has one decimal place, the sample mean of wages has 4 decimal places, etc. We would like to standardize the representation by displaying all numbers with only two decimal places. Click-on cell B7. Place the cursor above the center of the cell, and NOT the small box in the corner. Click and drag over cells B7 - F7. The cells with become highlighted. </p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.5 11.7375 15 37.25 24856</p><p>Go to the task-bar on the top of the screen, click-on FORMAT, CELL, NUMBER. Once done, Excel will have displayed a pop-up box with a choice of the number of decimal places. Choose "2". Click-on "ok" on the bottom of the pop-up box. </p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00</p><p>6 Excel will round values in the case when the true numerical value has more than the prescribed number of decimal places (in this care, two). </p><p>3.1.4 Sample Mode and Median</p><p>For sample modes and medians, use the command “=mode(…)”and “=median(…)”. As a separate example, consider two columns of data labeled simply X and Y. Using the methods describe above, we can calculate the samples of the two variables:</p><p>A B C 1 X Y 2 1 2 3 1 7 4 0 0 5 -1 0 6 7 MEAN .25 2.25 8 9</p><p>For clarity, type “MODE” and “MEDIAN” in cells A8 and A9, respectively. In cell B8, type</p><p>=mode(B2:B5)</p><p>Then, hit ENTER. The mode of variable X is 1.00, the most frequently occurring value in the sample of 4 observations. For the sake of speed, click on cell B8, place the cursor above the small black box in the lower right corner, click, hold, and drag over cell C8. The mode of variable Y is 0.00. Formatting the cells so numerical values have 2 decimal places, we now have</p><p>A B C 1 X Y 2 1 2 3 1 7 4 0 0 5 -1 0 6 7 MEAN .25 2.25 8 MODE 1.00 0.00 9 MEDIAN</p><p>Next, in cell B9, type “=median(B2:B5)”, then ENTER. Click and drag for variable Y. The result, after formatting, is</p><p>A B C 1 X Y 2 1 2 3 1 7 4 0 0 5 -1 0 6 7 MEAN .25 2.25 8 MODE 1.00 0.00 9 MEDIAN .50 1.00</p><p>3.2 Sample Variance and Standard Deviation Any subsequent statistical operation in Excel requires only knowledge of the proper command. All further steps are identical to those outlined for the sample mean. For the sample variance, type "VARIANCE" in cell A8, and in cell B8 use the command </p><p>7 =var(B2:B5)</p><p>Then type ENTER, and click-and-drag over the remaining cells, C8 - F8. For the sample standard deviation, type "STAND. DEV." in cell A9, then type in cell B9</p><p>=stdev(B2:B5)</p><p>Then type ENTER, and click-and-drag over the remaining cells, C9 - F9. After the numerical values in the cells have been standardized to two decimal places, we obtain</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16</p><p>3.3 “Descriptive Statistics” Excel Package</p><p>3.3.1 Deriving All Descriptive Statistics Excel provides a very simple mechanism for obtaining an array of sample descriptive statistics of your data. Consider the spread-sheet</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124</p><p>In the main Excel tool-bar, click-on TOOLS, DATA ANALYSIS, DESCRIPTIVE STATISTCS, then OK. In the area to the right of Input Range, type in the cell range in which your data is stored. In this case, we type</p><p>B1: F5</p><p>Notice that we include the 1st row with the variable labels. Click-on Labels in First Row. Click-on Summary Statistics for standard sample statistics. Click-on Confidence Level for Mean in order to generate the CI-length. Excel sends the output to another “sheet” which can be accessed by clicking-on the sheet with the highest number at the bottom of the screen (e.g. Sheet 4). The result is</p><p> hours wage ed age income_wage Mean 38.5 Mean 11.7375 Mean 15 Mean 37.25 Mean 24856 Standard Error 4.349329 Standard Error 3.382392 Standard Error 1.290994 Standard Error 6.342647 Standard Error 9904.578 Median 37 Median 10.2 Median 15 Median 39.5 Median 16692 Mode #N/A Mode #N/A Mode #N/A Mode #N/A Mode #N/A Standard Deviation 8.698659 Standard Deviation 6.764783 Standard Deviation 2.581989 Standard Deviation 12.68529 Standard Deviation 19809.16 Sample Variance 75.66667 Sample Variance 45.76229 Sample Variance 6.666667 Sample Variance 160.9167 Sample Variance 3.92E+08 Kurtosis -0.03656 Kurtosis -0.20184 Kurtosis -1.2 Kurtosis 1.365516 Kurtosis 3.290698 Skewness 0.829537 Skewness 0.980108 Skewness 0 Skewness -0.96447 Skewness 1.813812 Range 20 Range 15.05 Range 6 Range 30 Range 42120 Minimum 30 Minimum 5.75 Minimum 12 Minimum 20 Minimum 11960 Maximum 50 Maximum 20.8 Maximum 18 Maximum 50 Maximum 54080 Sum 154 Sum 46.95 Sum 60 Sum 149 Sum 99424 Count 4 Count 4 Count 4 Count 4 Count 4</p><p>8 3.3.2 Descriptive Statistics Output</p><p>“NA”: Denotes “not applicable” in the event a unique answer does not exist.</p><p>Standard Error: Denotes the sample mean divided by the estimated standard deviation of the sample mean: </p><p>X seX   2 s x n</p><p>Range: Denotes the difference between the maximum and minimum values.</p><p>Count: Denotes the number of actual data points.</p><p>Skew: Measures the degree of skew in the distribution of X. A large positive skew implies a skew to the right (disproportionate probability of relatively small/negative values); a large negative skew implies a skew to the right (disproportionate probability of relatively large/positive values).</p><p>Kurtosis: Measures the thickness of the tails of the distribution. Random variables with a high variance have a greater odds of taking on relatively large values, hence the middle realm of the distribution is flat, and the tails are thick: this denotes a large kurtosis. Random variables with a low variance have a smaller odds of taking on relatively large values, hence the middle realm of the distribution is tall, and the tails are thin: this denotes a small kurtosis.</p><p>9 4. Relative Frequency and Histograms</p><p>Excel provides a simple method for calculating frequencies, relative frequencies and histograms. Consider the following column of data representing the number of divorces in a sample of n = 10 people:</p><p>A B C D 1 divorces 2 0 3 0 4 1 5 4 6 2 7 1 8 1 9 0 10 0 11 2</p><p>4.1 Frequency We would like to derive the frequency for each divorce value. Recall that we define frequency as the number of times a value occurs in a sample. Notice that the only values that appear in the sample are (0,1,2,4), thus the relative frequencies of values 3,5,6,… will be zero. In cell B2, type FREQUENCY. In column D, we need to create the possible values which we want to include for the frequencies: unfortunately, Excel does not understand that we simply want the values between, say 0 and 5; rather, we need to type the numerical values to be counted in a separate column. Create the following:</p><p>A B C D 1 divorces FREQUENCY 2 0 0 3 0 1 4 1 2 5 4 3 6 2 4 7 1 5 8 1 9 0 10 0 11 2</p><p>Now, click on cell B2, hold the mouse-key down, and drag down to cell B7. This will highlight cells B2 - B7, exactly parallel to cells D2-D7, which contain the possible values we want to count in our sample. Next, in the task-bar at the top of the screen, click on the fx key: this brings up a pop-up box within which we can employ of Excel's statistical and mathematical formulas. On the left-side of the pop-up box, click-on "statistical", on the right-side, scroll down to "frequency", and click. A frequency pop-up box appears. In the Data-Array, type the cells in which the column of data is contained, in this case A2:A11 (be sure to use a colon between cell values). In the Bins-Array, type the cells which contain the possible values we want to count, in this case cells D2:D7. Again, be sure to use a colon. DO NOT TYPE ENTER. In order to tell Excel that you want this operation to occur for all possible values in D2-D7, relative to the sample of data A2-A11, we need to hit simultaneously CTRL and SHIFT and ENTER. Once done, Excel will display the number of times each value in cells D2-D7 occurs in the sample contained in A2-A11. In summary:</p><p>10 i. Highlight cells </p><p>A B C D 1 divorces FREQUENCY 2 0 0 3 0 1 4 1 2 5 4 3 6 2 4 7 1 5 8 1 9 0 10 0 11 2</p><p> ii. Click on the fx key, click on statistical, then frequency. Type in A2:A11 in the Data- Array, and type in D2:D7 in the Bins-Array.</p><p> iii. Hit simultaneously CTRL and SHIFT and ENTER. The result: </p><p>A B C D 1 divorces FREQUENCY 2 0 4 0 3 0 3 1 4 1 2 2 5 4 0 3 6 2 1 4 7 1 0 5 8 1 9 0 10 0 11 2</p><p>Thus, the value 0 occurs 4 times, 1 occurs 2 times, 2 occurs twice, 3 never occurs, 4 occurs onces, zero 5's, and of course, zero 6's, etc. </p><p>4.2 Relative Frequency Recall that the relative frequency is the frequency relative to the sample size. Thus, for example, the relative frequency of the value of 0 divorces represents the percent of the sample which has this value of divorces. In this case, the relative frequency of 0 is 4/10 = .4 0, or 40% of the people in our sample have a value of 0 divorces. To derive the relative frequency, simply use column C, and perform a basic new- variable derivation, as detailed in Section 1. Type "REL. FREQ." in cell C1. Click on cell C2, type </p><p>=B2/10 then hit ENTER. Excel will derive 4/10 for you, and present .40. To derive the rest of the relative frequencies, click on cell C2, hold the cursor over the small black box in the lower right corner, click, hold and drag down over cells C3 - C7. The result, after formatting the numerical values to have 2 decimal places, is</p><p>11 A B C D 1 divorces FREQUENCY REL. FREQ. 2 0 4 0.40 0 3 0 3 0.30 1 4 1 2 0.20 2 5 4 0 0.00 3 6 2 1 0.10 4 7 1 0 0.00 5 8 1 9 0 10 0 11 2</p><p>Clearly, the relative frequencies must sum to 1.00 (one). </p><p>4.3 Histogram A simple method of graphically representing an array of relative frequencies if the histogram. We will use Excel's ability to create graphs to build a histogram: further details on graphs and figures is presented in Section 8. 4.3.1 Create the Rough Figure Shape First, highlight the cells that contain the relative frequency information, as well as the label "REL. FREQ.":</p><p>A B C D 1 divorces FREQUENCY REL. FREQ. 2 0 4 0.40 0 3 0 3 0.30 1 4 1 2 0.20 2 5 4 0 0.00 3 6 2 1 0.10 4 7 1 0 0.00 5 8 1 9 0 10 0 11 2</p><p>Next, go to the task bar at the top of the screen, click on the little colored bar-chart that is two buttons to the left of the screen percent size [e.g. 100%]. A pop-up box appears. Click on "Standard Types", then "Column". Then, click on "Next": Excel will show you the basic set-up of the resulting chart. Click on "Next" again: a pop-up box allowing for substantial alterations of the basic outlook of the bar- chart appears. Type in the Title as "Relative Frequency of Divorces", type is Category X as "Number of Divorces", and type in Category Y as "f(x)/n". If you want to remove the grid-lines, click-on Gridlines, and click-on any box with a check-mark in it: this will remove the mark, and remove the associated grid-lines. Because the legend (the graph box that describes the variable) is redundant, click-on Legend, then click off the mark on "Show Legend". Finally, click on Finish at the bottom of the pop-up box. The result is the following histogram:</p><p>Relative Frequency of Divorces</p><p>0.60</p><p> n 0.40 / ) x (</p><p> f 0.20 0.00 1 2 3 4 5 6 Num ber of Divorces 12 4.3.2 Refine the X-axis Observe that unless we tell Excel otherwise, it will automatically display X-axis values starting at 1: For our histogram purposes, however, this is in error: the smallest number of divorces possible is, of course, zero. In order to specify to Excel the correct X-axis array of values, in our case 0 – 5, perform the following task. First, recall that column D contains the possible values, 0 – 5. Now, point the cursor over the white-area of the chart, and click once: this highlight the chart with small black dots around the border. Go to the task-bar at the top of the screen, click on CHART, SOURCE DATA, then SERIES. At the bottom of the pop-up box, there is a white-area next to “Category (X) Axis”. Click once in this area: this places Excel’s cursor in the box. Now, move the mouse cursor over the cell that contains the first possible value, 0: in our case, the cell is D2. Click once and hold, dragging the cursor down over all cells that contain possible, cells D2 – D7. Once you reach cell D7, let go of the mouse button: Excel will automatically fill in the cell range D2 – D7 into the white area to the right of “Category (X) Axis”. Once this is done, click on OK at the bottom of the pop-up box. The result is as follows:</p><p>Relative Frequency of Divorces</p><p>0.5 0.4</p><p> n 0.3 / ) x (</p><p> f 0.2 0.1 0 0 1 2 3 4 5 Number of Divorces</p><p>See Section 8 for further information about graph/chart/figure formatting. </p><p>13 5. Probability Distribution Functions</p><p>Excel provides a simple means for deriving the probability distribution functions or cumulative distribution functions for a variety of discreet and continuous random variables. In this section, we will study Excel’s built in mathematical operations relating to the pdf’s of the binomial, the poission, normal and Student’s t distributions.</p><p>5.1 Discreet Distribution Functions</p><p>5.1.1 Binomial Distribution Consider the following set-up. We have 5 people drawn from a population in which p = 35% are smokers. We want to tabulate the probability distribution function of the number of smokers X in our group of n = 5 people. The easiest way to derive the probability distribution function is by first creating a column containing all possible values of X, in this case (0,1,2,3,4,5):</p><p>A B C 1 X 2 0 3 1 4 2 5 3 6 4 7 5</p><p>Now, in cell B1 type P(X). In cell B2, type the command</p><p>=binomdist(A2, 5, .35, false)</p><p>Then hit ENTER. The command “false” indicates that we want a probability distribution function, and NOT a cumulative distribution function. The result is</p><p>A B C</p><p>1 x PX(X=x) 2 0 0.116029 3 1 0.312386 4 2 0.336416 5 3 0.181147 6 4 0.048770 7 5 0.005252</p><p>Consider, next, deriving the cumulative distribution function, F(X < x). Type FXX < x) in cell C1, click on cell C2, and type the command</p><p>=binomdist(A2, 5, .35, true)</p><p>Next, click on cell C2, the hold the cursor above the small black box, click and drag over cells C3-C7. The result, after formatting the cells to show 4 decimal places, is the complete cdf for x = (0,1,2,3,4,5):</p><p>A B C</p><p>1 x PX(X=x) FX( x) 2 0 0.1160 0.1160 3 1 0.3124 0.4284 4 2 0.3364 0.7648 5 3 0.1811 0.9460 6 4 0.0488 0.9947 7 5 0.0053 1.0000</p><p>14 Graphically, we can employ the techniques discussed in Sections 4 and 8 for creating a bar-chart of the pdf and cdf. Consider the pdf contained in cells C2-C7. Highlight the cells C2-C7, and create a bar- chart using said techniques. Remember to change the X-axis values: our random variable’s possible values start at 0, and the cells that contain the possible values, in our case, are A2-A7. The result is:</p><p>PDF for Binom ial Random Variable (n = 5,  = .35)</p><p>0.4</p><p>) 0.3 x</p><p>= 0.2 X (</p><p>P 0.1 0 0 1 2 3 4 5 x 5.1.2 Poisson The poisson distribution is helpful for modeling the probability of discreet occurrences of some event, x. In particular, the random variable must be zero or positive and discreet. For example, the number of times a volcano erupts in a year can be assumed to be a poisson random variable. Recall the probability distribution function of a poisson random variable of x occurrences:</p><p> e x P(x)  x! where it can be shown the parameter  satisfies </p><p>E[x]   V[x]  </p><p>Excel provides a means for deriving P(x) based on x and the mean, . In this case, you can the values of x and  contained in cells, or you can simply use Excel to derive a probability by typing the values in directly. Consider the latter quick method, and consider a blank grid spread:</p><p>A B 1 2 3 4</p><p>Suppose you want to derive P(x = 5) for a poisson random variable x with mean  = 2. Click on any cell, say A1, and type</p><p>=poisson(5,2,false) where "false" dictates that we want the probability at the point x = 5, and NOT the cumulative distribution up to x = 5. Hit ENTER, and Excel displays the odds of such an event:</p><p>A B 1 .036089 2 3 4</p><p>If you have an array of values x from the same poisson distribution (i.e. with the same mean and variance, ), then type in cell values instead of numerical values. For example, consider the following array of x's from a poisson distribution where  = 2:</p><p>15 A B 1 x 2 0 3 1 4 2 5 3</p><p>Click on cell B2, and type the command =poisson(A2,2,false)</p><p>Then hit ENTER. Subsequently, point and click on cell B2, aim the cursor above the small black box, click and drag down over the remaining cells, B3-B5. The result is P(x) for x = 0,1,2,…,9:</p><p>A B 1 x 2 0 0.135335283 3 1 0.270670566 4 2 0.270670566 5 3 0.180447044 6 4 0.090223522 7 5 0.036089409 8 6 0.012029803 9 7 0.003437087 10 8 0.000859272 11 9 0.000190949</p><p>We can use the graphical methods described in Section 4 on histograms to create a figure for the above probability distribution. Highlight the cells containing the probabilities, B2-B11, and using the methods described in Section 4 (for the title, and x-axis and y-axis labels), create a bar graph. We obtain:</p><p>Probability Distribution of a Poison Random Variable 0.3 0.25</p><p>) 0.2 x</p><p>=</p><p>0.15 X ( 0.1 P 0.05 0 0 1 2 3 4 5 6 7 8 9 x</p><p>Intuitively, for a poisson random variable x with a mean of 2, there will be great odds that the actual value of x will be near 1 (i.e. 0,1,2,3,4). Hence, the pdf is stacked near 2 (high probability values at 0,1,2,3,4). For a cumulative distribution value, simply use "true" instead of "false":</p><p>=poisson(A2,2,true) The result is: A B 1 x 2 0 0.135335 3 1 0.406006 4 2 0.676676 5 3 0.857123 6 4 0.947347 7 5 0.983436 8 6 0.995466 9 7 0.998903 10 8 0.999763 11 9 0.999954</p><p>16 A simple bar-chart plot of the CDF, above, follows: CDF for a Poisson Random Variable</p><p>1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5 6 7 8 9</p><p> x</p><p>5.2 Continuous Distribution Functions For all subsequent probability derivations, the only changes from the above methods in the particular command required. We can derive continuous probabilities P(x  a) for normally or t-distributed random variables x by typing in the value a directly, or by using cells that contain various a's. </p><p>5.2.1 Normal Distribution</p><p>5.2.1.1 Probabilities from any Normal Distribution</p><p>Let x ~ N(, 2 ) and suppose we want to derive P(x  a)</p><p>For example, let  = 2 ,2 = 16 and a = 3. For direct computation, directly type in any cell</p><p>=normdist(3,2,16,true) where "true" denotes that we want the cumulative distribution up to the value a. The result is .524918. Using the same procedure as with Binomial or Poisson random variables, we can instead use cell values that contain a rather than directly typing in a.</p><p>5.2.1.2 Probabilities from The Standard Normal</p><p>Let z ~ N(0,1) and suppose we want to derive P(z  a)</p><p>For example, let a = 1.23. For direct computation, directly type in any cell</p><p>=normsdist(1.23)</p><p>The result is .89065.</p><p>17 5.2.1.3 Cut-off Values of the Standard Normal</p><p>Suppose instead of a probability, for given a, we want some a for a given probability p. For example, suppose we want some a such that</p><p>P(z  a)  .95</p><p>In any cell in Excel, type</p><p>=normsinv(.95)</p><p>The result is 1.644853.</p><p>5.2.2 Student's t Distribution</p><p>5.2.2.1 Probabilities from the t-Distribution</p><p>Excel provides a built-in operation for deriving "tail" probabilities of t-distributed random variables, x. We define a "one-tail" probability as, for example, </p><p>P(t  2) or</p><p>P(t   2) and a "two-tailed" probability as P(| t |  2)  P(t  2 or t  2)</p><p>Suppose we want to derive the two-tailed probability</p><p>P(| t |  1.75) which is identical to</p><p>P(t  1.75 or t  1.75 ) for a t distributed random variable, x, with 5 degrees of freedom:</p><p>P(t  1.75 or t  1.75 ) t ~ t5</p><p>For a cumulative probability derivation of a t distributed random variable, x, use the following command:</p><p>=tdist(1.75,5,2) where the final "2" denotes that we want a two-tailed derivation rather than a one-tailed probability. The result is .140522. For a one-tailed probability,</p><p>P(t  1.75)</p><p>18 use</p><p>=tdist(1.75,5,1) and we obtain half of the two-tailed probability, .070261.</p><p>5.2.2.2 Two-Tailed Cut-off Values from the t-Distribution</p><p>As in the normal case, we have an inverse function for t-random variables that returns some a for a given probability p. For example, suppose we want some a such that</p><p>P(| t | a)  P(t  a )  P(t  a )  .010 t ~ t25</p><p>In Excel, type</p><p>=tinv(.01, 15)</p><p>The result is 2.787438.</p><p>19 6. Sample Bivariate Analysis: Sample Covariance and Correlation</p><p>A primary concern in the analysis of data is not measures of central tendency and dispersion, per se. Rather, we are interested in how economic phenomena relate to each other. For example, we might ask whether income tax increases will diminish net tax revenues collected and therefore threaten the solvency of certain welfare programs. This topic is ultimately left for Econ. 120B and 120C, Econometrics 1 and 2, however the initial step entails the analysis of simple linear relationships by measures of covariance and correlation. Details on the theory of covariance and correlation can be found in Newbold, Chapter 4.</p><p>6.1 Sample Covariance Consider the labor data again, without means, variances and standard deviations present:</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16</p><p>We would like to measure the degree of linear of dependence between work hours, wages education and age. For the sample covariance between the relevant variables, we need to set up an area in the spread- sheet that can contain this information. Here, we will place the covariance information below the mean and variance information. Type in cell A11 the title "COVARIANCE", and re-type the variable names in cells C11 - F11 and B11 - B14:</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16 10 11 COVARIANCE hours wage ed age 12 hours 13 wage 14 ed 15 age</p><p>We will fill out the table completely in order to make a point below. Click-on cell C12 in order to derive the sample covariance between hours and hours. Type the command</p><p>=covar(B2:B5, B2:B5)</p><p>Excel will use the data in cells B2 - B5 (i.e. work hours) to derive the covariance between hours and itself. Type ENTER to complete the task. Note that the sample covariance between a variable and it self is simply a sample estimate of the variance. However, it is not an unbiased sample estimate of the variance. Specifically, Excel uses the following formula for sample covariance:</p><p>20 ^ 1 n cov(x, y)  xi  xyi  y n i1</p><p>Thus, the sample between, say, x and x is</p><p> n ^ 1 2 cov(x, x)  xi  x n i1 which is a biased estimator of the variance. The unbiased variance estimator that Excel uses is</p><p> n ^ 1 2 var(x, x)  xi  x n 1 i1</p><p>Next, perform the task for the covariance between hours and wages. Click-on cell D12, and type</p><p>=covar(B2:B5, C2:C5)</p><p>Then type ENTER. Excel will use the data for work hours (cells B2 - B5) and the data for wages (cells C2 - C5) to derive the sample covariance. Repeat this task for all other combinations of variables (hours, ed), (hours,age), (wage,wage), (wage,ed), (wage,age), (ed,ed), (ed,age) and (age,age). We find, after formatting the numbers to have four decimal places</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16 10 11 COVARIANCE hours wage ed age 12 hours 56.7500 26.1063 14.5000 13.3850 13 wage 34.3217 8.1875 0.0031 14 ed 5.0000 14.7500 15 age 120.6875</p><p>Now, recall that the covariance is symmetric, hence</p><p>^ ^ COV (X ,Y )  COV (Y, X ) which means we can ignore the rest of the cells. For example, the sample covariance between hours and wages in cell D12 is 26.1063. Likewise, the sample covariance between wages and hours would be 26.1063, which could redundantly be placed in cell C13. Thus, the above covariance "table" is considered complete. </p><p>6.2 Sample Correlation Consider the correlation between hours and wages. Deriving the sample correlation is performed in a manner identical to the derivation of the covariance, except we use the command in cell D18 (see the presentation, below)</p><p>=correl(B2:B5, C2:C5)</p><p>21 Like covariance, the sample correlation is symmetric, thus we do need to find the correlation between wages and hours. Recall that the correlation between a variable and itself is simple 1, thus you only need to perform the operation for couples of different variables. Consider using the cells beneath the covariance cells. We can type "CORRELATION" in cell A17, and use the cells to the right to fill in the correlation derivations. After formatting the cells to allow for 5 decimal places, we obtain the following:</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16 10 11 COVARIANCE hours wage ed age 12 hours 56.7500 26.1063 14.5000 13.3850 13 wage 34.3217 8.1875 0.0031 14 ed 5.0000 14.7500 15 age 120.6875 16 17 CORRELATION hours wage ed age 18 hours .59153 .86080 .16161 19 wage .62500 .00005 20 ed .60045 21 age</p><p>22 7. Interval Estimation</p><p>Consider the task of estimating a 95% Confidence Interval for the true mean  of some variable. Recall that the confidence interval itself is</p><p>95%CI()  (X  K, X  K) where K satisfies P(  K  x    K)  .95</p><p>Excel assumes the variable in question is either normally distributed, or the sample size is large enough so that, by the Central Limit Theorem (see Newbold, Chapter 6), the sample mean will be nearly normally distributed . After some simplification (see Newbold, Chapter 8), we can solve directly for K based on Excel's distributional assumptions:  K  1.96 n</p><p>Of course, we do not know the true standard deviation, thus we must give Excel a sample estimate of . In order to calculate a 95% confidence interval length, K, perform the following. Suppose the sample size is 4, as it is in the previous section, and suppose an estimate of the standard deviation is contained in cell B9, as it is for the variable hours, above. Click-on an empty cell (say, cell B24 for the variable hours), the type</p><p>=confidence(.05, B9, 4) then hit ENTER. Notice that the above command is specifically contoured to our needs. The general command format is</p><p>=confidence(, , n) where  satisfies the (1 - )% confidence interval size,  denotes the true standard deviation, and n denotes the number of individuals in our group. Excel will derive K and place it in the chosen cell. For clarity, we can type "95% CI: K" in cell A24, “95% Lower” in A25 and “95% Upper” in A26. Notice that we re-typed the variable names for ease of reference. The result is A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16 10 … 22 23 hours wage ed age income_wage 24 95% CI: K 8.524516 25 95% Lower</p><p>23 26 95% Upper</p><p>Thus, the 95% CI for the sample of hours is 95%CI()  (x  8.52, x  8.52)  (38.50  8.52, 38.50  8.52)  (29.98, 47.02)</p><p>We can directly create the lower bound by typing in cell B25</p><p>=B7 – B24 and we can directly create the upper bound by typing in cell B26</p><p>=B7 + A24</p><p>We can now proceed as usual to perform the same CI operation on the rest of the cells, C24 - F24. As usual, click-on cell B24, place the cursor above the small black-box in the lower right corner, click, hold and drag over C24 - F24. The result, after formatting the cells to 4 decimal places, is</p><p>A B C D E F 1 hours wage ed age income_wage 2 34 7.5 12 20 13260 3 50 20.8 18 37 54080 4 40 5.75 16 50 11960 5 30 12.90 14 42 20124 6 7 MEAN 38.50 11.74 15.00 37.25 24856.00 8 VARIANCE 75.67 45.76 6.67 160.92 392402677.33 9 STAND. DEV. 8.70 6.76 2.58 12.69 19809.16 10 … 23 hours wage ed age income_wage 24 95% CI: K 8.5245 6.6294 2.5303 12.4313 19412.5878 25 95% Lower 29.9755 5.1106 12.4697 24.8187 5443.4122 26 95% Upper 47.0245 18.3694 17.5303 49.6813 44268.5878 27</p><p>Comment: For a 99% CI, replace the ".05" with ".01". For a 90% CI, replace the ".05" with ".10".</p><p>24 8. Graphics: Charts, Tables and Figures</p><p>8.1 Scatter Plots: Two Variable Relationships</p><p>A scatter-plot present points (X,Y) in two-dimensional Euclidean space. For example, in the (WAGE,ED) space, a point would represent an individuals hourly wages and level of education:</p><p>ED 15 14 13  This point denotes an individual's (WAGE, ED) = ($5.25/hour, 13 years) 12 11 10 9 8</p><p>4 5 6 7 8 9 10 11 12 13 14 15 16 WAGE</p><p>Consider a sample of n = 10 years of aggregate U.S. health care expenditure, EXP_HEALTH, and real income, INCOME. </p><p>A B 1 INCOME EXP_HEALTH 2 9.3 0.998 3 11.2 1.499 4 17.1 4.285 5 13.8 1.573 6 10.9 2.021 7 15.3 2.26 8 12.8 1.953 9 14.6 2.103 10 21.2 3.428 11 19.3 2.277</p><p>We want to create a scatter-plot with income on the X-axis and health care expenditures on the income on the X-axis and health care expenditures on the Y-axis. </p><p>25 1. Click on cell A1 (the label INCOME), hold and drag over cells A1-A11 and B1-B11. That is, highlight the labels and data for expenditure and income:</p><p>A B 1 INCOME EXP_HEALTH 2 9.3 0.998 3 11.2 1.499 4 17.1 4.285 5 13.8 1.573 6 10.9 2.021 7 15.3 2.26 8 12.8 1.953 9 14.6 2.103 10 21.2 3.428 11 19.3 2.277</p><p>2. Next, go to the task bar at the top of the screen, click on INSERT, then CHART. A pop-up bax appears: for a scatter-plot, on the left side of the pop-up box, click on “standard types”, then “XY (Scatter)”. Click on NEXT at the bottom of the box: Excel will show you the graph with its default style. We will add a title, axes labels and remove the grid-lines, next. 3. Click on NEXT again. Click on TITLE, and type “Income versus Health Expenditure”. Under Category (X) axis, type “Income”; under Value (Y) axis, type “Health Expenditure”. Click on GRIDLINES, and remove any check-marks by clicking on them: this will turn off all gridlines. Finally, click on LEGEND, and remove the check-mark in the legend box by clicking on it. </p><p>The result is the following polished scatter plot:</p><p>Income versus Health Expenditure</p><p>25 e r</p><p> u 20 t i d</p><p> n 15 e p</p><p> x 10 E</p><p> h t</p><p> l 5 a e</p><p>H 0 0 1 2 3 4 5 Incom e</p><p>Comment 1. Notice that a very clear linear relationship seems to exist: as real income increases, consumers can afford more health care and subsequently purchase more health care. Indeed, the sample correlation between the two economic variates is .74, a strong positive value.</p><p>2. For editing (point colors, X-axis, background color, etc.), see the discourse under Time Series, below). 8.2 Time-Series: Line Plots: Multiple Variable Representations, PDF's, CFD's</p><p>26 Often, or data exists over time. Examples include GNP, the CPI, production indices, inflation rate, exchange rates, asset returns and stock market indices, etc. It is often usual to present such information in a concise manner on the same graph. Consider data U.S. GDP (in billions $), the money supply MS (in billions $) and gross private domestic investment GPDI: A B C D 1 Year GDP MS GPDI 2 1970 1010.7 628.1 150.3 3 1971 1097.2 717.2 175.5 4 1972 1207 805.2 205.6 5 1973 1349.6 861 243.1 6 1974 1458.6 908.6 245.8 7 1975 1585.9 1023.3 226 8 1976 1768.4 1163.7 286.4 9 1977 1974.1 1286.6 358.3 10 1978 2232.7 1388.7 434 11 1979 2488.6 1496.7 480.2 12 1980 2708 1629.5 467.6 13 1981 3030.6 1792.9 558 14 1982 3149.6 1951.9 503.4 15 1983 3405 2186.1 546.7 16 1984 3777.2 2374.3 718.9 17 1985 4038.7 2569.4 714.5 18 1986 4268.6 2811.1 717.6 19 1987 4539.9 2910.8 749.3 20 1988 4900 3071.1 793.6 21 1989 5250.8 3227.3 832.3 22 1990 5522.2 3339 799.5 23 1991 5677.5 3439.8 721.1</p><p>In the following discourse, we will look at various graphical presentations of the above data.</p><p>8.2.1 Smooth Lines</p><p>Creating the Figure Highlight the data and labels by clicking on cell A1, hold and drag over cells A1-D23. The "year" variable has already been formatted to be understood by Excel to be, in fact, a year. When we include this variable in a graph, Excel will not plot it: rather, Excel will use the year-formatted data for a time-line on the X-axis. Next, go to the task bar, click-on INSERT, then CHART, then COSTUM TYPES, scroll down, and finally click-on Smooth Lines. Click on NEXT: Excel shows you the default style of the figure. Click on NEXT again: a pop-up box appears for title and label editing. Click on TITLE, and type under Chart Title, "U.S. Macro Data: 1970-1991", or whatever you deem appropriate. Under Value (Y) Axis, type "Billions of $": this will provide a Y-axis label explaining that the data is in billions of U.S. dollars (i.e. GDP = 1010.7 denotes 1010.7billion = 1.0107 trillion dollars). </p><p>Finally, click on FINISH. Excel will present you the following figure:</p><p>27 U.S. Macro Data: 1970-1991</p><p>6000 5000 $</p><p> f 4000 GDP o</p><p> n</p><p> o 3000 MS i l l i 2000 GDPI B 1000 0</p><p>8.3 Editing Graphs and Figures Consider the figure we created for Gross Domestic Product and Investment data:</p><p>U.S. Macro Data: 1970-1991</p><p>6000 5000 $</p><p> f 4000 GDP o</p><p> n</p><p> o 3000 MS i l l i 2000 GDPI B 1000 0</p><p>The year orientation is difficult to read, and we might want to change the yellow color of the line representing GDPI over time to some darker hue. Also, for style (according to your personal interests as well as readability for the viewer of your figure), we may want to remove the dark background color, or eved remove the outline boundary line of the entire figure, as well as re-locate the legend, and change the font of the test used.</p><p>Line Color In order to change a line's color, double click on the line itself, and a pop-up box appears. Click on PATTERN, then COLOR, and chose a darker hue: here, we will choose a shade of green. </p><p>Background Area In order to remove the dark background, double click on the dark background itself, go to AREA and click on NONE. </p><p>U.S. Macro Data: 1970-1991</p><p>6000 5000 $</p><p> f 4000 GDP o</p><p> n 3000 MS o i l l i 2000 GDPI B 1000 0</p><p>28 Borders I prefer figures without outer boundaries. To create this effect, double click-on the white-area between the outer box and the inner box. A pop-up box appears: go to BORDER, and click on NONE. Also, we can remove the border around the legend: double-click on the legend, in the pop-up box, under BORDER, and click on NONE, U.S. Macro Data: 1970-1991</p><p>6000 5000 $</p><p> f 4000 GDP o</p><p> n</p><p> o 3000 MS i l l i 2000 GDPI B 1000 0</p><p>X-Axis Label and Orientation Finally, to orient the year denominations, double click on the X-axis itself, then click on ALIGNMENT, and notice that a little line appears sprouting from the word Text . Click on this line, hold and drag the line to an upward orientation. If you want the numerical values on the Y-axis to display dollar signs, double click on the Y-axis, click on Number, Currency, go to Symbol, and click on $. Next to Decimal Places, because we like round numbers on the axes, type 0. We obtain:</p><p>U.S. Macro Data: 1970-1991</p><p>$6,000 $5,000 $</p><p> f $4,000 GDP o</p><p> n</p><p> o $3,000 MS i l l i $2,000</p><p>B GDPI $1,000 $0 Fonts If you prefer a different than the default font (usually 12-point), click ONCE on the white-area between the outer and inner boxes. Excel will "highlight" the figure by placing small black boxes in the outer corners. Next, at the top of the screen, in the task bar, change the font size to 10, or 8, or what-have-you. I prefer 10- point font.</p><p>Using the Left and Right Y-Axes Finally, some data is difficult to see because we represent it on the same Y-axis as data with a very different size structure. For example, above, clearly GDP is much larger than GDPI, and therefore some of the small year-to-year nuances of GDPI is lost in the large Y-axis used for GDP. There are, however, two Y-axes, the left and standard Y-axis, and the right Y-axis. Consider placing GDPI on its own Y-axis, to the right. Click on the green, GDPI, line: a pop-up box appears. Click on Axis, then click on Secondary. Excel will orient the GDPI data on the right-side Y-axis, using a more convenient range of numerical values that befits the smaller values of the GDPI. Now, we can see specific year-to-year changes</p><p>29 that were somewhat over-shadowed, above. As before, we can double click on the right Y-axis, click on Number, Currency, go to Symbol, and click on $.</p><p>U.S. Macro Data: 1970-1991</p><p>$6,000 $900 $800 $5,000 $700 $</p><p> f $4,000 $600 GDP o $500 n</p><p> o $3,000 MS i</p><p> l $400 l i GDPI</p><p>B $2,000 $300 $200 $1,000 $100 $0 $0 2 6 4 8 0 4 8 0 2 6 0 7 7 8 8 7 7 7 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 1 1 1 1 1 1 1 1 1 1 1</p><p>Figure Title If you want to enhance the figure title (increase font size, use italics, bold, etc.), click on the title once, place the cursor anywhere above the title characters (the cursor should change in shape to a little line), and click. Excel will now allow you to edit the title. You can highlight the title, and using the task bar at the top of the screen, change the characters to italics, bold, change the font, etc. We will change the font to 12-point and use bold:</p><p>U.S. Macro Data: 1970-1991 $ 6 ,0 0 0 $ 9 0 0 $ 8 0 0 $ 5 ,0 0 0 $ 7 0 0 $ 4 ,0 0 0 $ 6 0 0 G D P $ 5 0 0 $ 3 ,0 0 0 $ 4 0 0 M S $ 2 ,0 0 0 $ 3 0 0 G D P I $ 2 0 0 $ 1,0 0 0 $ 10 0 $ 0 $ 0</p><p>30</p>

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    30 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us