The Coefficient of Determination
Lecture 46, Section 13.9
Robb T. Koether
Hampden-Sydney College
Tue, Apr 13, 2010
Outline
1. The Regression Identity
2. Sums of Squares on the TI-83
3. Explaining Variation
4. TI-83 - The Coefficient of Determination
5. Assignment
Explaining the Variation in y
Statisticians use regression models to explain y. More specifically, through the model they use variation in x to explain variation in y.
For example, why do some people weigh more than other people? One explanation is that some people weigh more than others because they are taller. That is, there is variation in weight because there is variation in height and because weight and height are correlated. But that is only a partial explanation.
Statisticians want to quantify how much of the variation in y is explained by the variation in x.
The Regression Identity
As always, variation is measured by calculating a sum of squared deviations. There are three different deviations that we can measure:
- Deviations of y from ȳ (variation in the data).
- Deviations of ŷ from ȳ (variation in the model).
- Deviations of y from ŷ (difference between the data and the model).
- Variation in the data (total sum of squares): $\mathrm{SST} = \sum (y - \bar{y})^2$.
- Variation in the model (regression sum of squares): $\mathrm{SSR} = \sum (\hat{y} - \bar{y})^2$.
- Residuals (sum of squared errors): $\mathrm{SSE} = \sum (y - \hat{y})^2$.
Example - SST, SSR, and SSE
The following data represent the heights and weights of 10 adult males.

Height (x)   Weight (y)
    70          185
    65          140
    71          180
    76          220
    68          150
    67          170
    68          185
    72          200
    74          210
    69          160
The regression line is $\hat{y} = -310 + 7x$. The model predicts, for example, that if a person is 70 inches tall, he will weigh -310 + 7(70) = 180 pounds. The model also predicts that a person will weigh an additional 7 pounds for each additional inch of height.
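As a check on the line quoted above, here is a minimal Python sketch that fits the least-squares line to the ten data points. It assumes the standard formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄; the variable names and the `predict` helper are illustrative, not part of the lecture.

```python
# Heights (x, inches) and weights (y, pounds) from the example table.
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Least-squares slope b = Sxy / Sxx and intercept a = y_bar - b * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def predict(height):
    """Predicted weight (pounds) for a given height (inches)."""
    return intercept + slope * height


print(f"y-hat = {intercept:g} + {slope:g}x")
print(predict(70))
```

For this data set the slope comes out to 7 and the intercept to -310, so the prediction for a height of 70 inches is 180 pounds, in agreement with the slide.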
Compute the predicted weight: Y1(L1) → L3.

Height (x)   Weight (y)   Pred. Wgt. (ŷ)
    70          185            180
    65          140            145
    71          180            187
    76          220            222
    68          150            166
    67          170            159
    68          185            166
    72          200            194
    74          210            208
    69          160            173
[Figure: The regression line]
[Figure: The deviations of y from ȳ]
[Figure: The deviations of ŷ from ȳ]
[Figure: The deviations of y from ŷ]
Example
Compute SST: evaluate L2 - ȳ, then Ans², then sum(Ans). Here ȳ = 180.

 x     y    y - ȳ   (y - ȳ)²
70   185       5        25
65   140     -40      1600
71   180       0         0
76   220      40      1600
68   150     -30       900
67   170     -10       100
68   185       5        25
72   200      20       400
74   210      30       900
69   160     -20       400
              Sum     5950
Compute SSR: store Y1(L1) → L3, then evaluate L3 - ȳ, then Ans², then sum(Ans).

 x     y     ŷ    ŷ - ȳ   (ŷ - ȳ)²
70   185   180       0         0
65   140   145     -35      1225
71   180   187       7        49
76   220   222      42      1764
68   150   166     -14       196
67   170   159     -21       441
68   185   166     -14       196
72   200   194      14       196
74   210   208      28       784
69   160   173      -7        49
                    Sum     4900
Compute SSE: store Y1(L1) → L3, then L2 - L3 → L4, then evaluate Ans², then sum(Ans).

 x     y     ŷ    y - ŷ   (y - ŷ)²
70   185   180       5        25
65   140   145      -5        25
71   180   187      -7        49
76   220   222      -2         4
68   150   166     -16       256
67   170   159      11       121
68   185   166      19       361
72   200   194       6        36
74   210   208       2         4
69   160   173     -13       169
                    Sum     1050
We have now found that
- SSR = 4900.
- SSE = 1050.
- SST = 5950.
We see that SSR + SSE = SST. This is called the regression identity.
TI-83 - Finding SSR, SSE, and SST
- Put the x values into L1 and the y values into L2.
- Use LinReg(a+bx) L1,L2,Y1.
- Enter Y1(L1) → L3.
- To get SSR, evaluate sum((L3 - ȳ)²).
- To get SSE, evaluate sum((L2 - L3)²).
- To get SST, evaluate sum((L2 - ȳ)²).
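The calculator steps above can be mirrored in a few lines of Python. This is only a sketch: the names `L1`, `L2`, `L3`, and `Y1` are chosen to echo the TI-83 lists and function, and the regression line is taken from the height and weight example rather than refitted.

```python
# L1 and L2 play the role of the calculator lists; the names are illustrative.
L1 = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]            # heights x
L2 = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]  # weights y


def Y1(x):
    """Regression line from the example: y-hat = -310 + 7x."""
    return -310 + 7 * x


L3 = [Y1(x) for x in L1]       # corresponds to Y1(L1) -> L3
y_bar = sum(L2) / len(L2)      # mean of the y values

SSR = sum((y_hat - y_bar) ** 2 for y_hat in L3)          # sum((L3 - y_bar)^2)
SSE = sum((y - y_hat) ** 2 for y, y_hat in zip(L2, L3))  # sum((L2 - L3)^2)
SST = sum((y - y_bar) ** 2 for y in L2)                  # sum((L2 - y_bar)^2)

print(SSR, SSE, SST)
print(SSR + SSE == SST)  # the regression identity
```

With the example data the three sums are 4900, 1050, and 5950, so the last line confirms that SSR + SSE = SST.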
Explaining Variation
One goal of regression is to explain the variation in y. For example, if y were weight, how would we explain the variation in weight? That is, why do some people weigh more than others? A partial answer is that some people weigh more because they are taller. That is, an explanatory variable is height x. What are some other partial answers?
How much of the variation in weight is explained by variation in height? The total variation in weight is SST. The linear model (the regression line) explains some of the variation. The model predicts the variation SSR. The remainder is SSE, the variation not predicted by the model.
Statisticians consider the predicted variation SSR to be the amount of variation in y that is explained by the model. The residual variation SSE is the remaining variation in y that is not explained by the model. It all checks out because SST = SSR + SSE.
Variation Explained by the Model
[Figure: The regression line]
[Figure: The total variation in y (SST)]
[Figure: The variation in y that is explained by the model (SSR)]
[Figure: The variation in y that is unexplained by the model (SSE)]
Explaining Variation
It can be shown that
$$r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$$
and, therefore,
$$1 - r^2 = \frac{\mathrm{SSE}}{\mathrm{SST}}.$$
Therefore, $r^2$ is the proportion of variation in y that is explained by the model. It is called the coefficient of determination. $1 - r^2$ is the proportion that is not explained by the model.
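One way to see why the ratio SSR/SST equals the square of the correlation coefficient is sketched below. The derivation assumes the usual least-squares notation, which the slides do not spell out: the fitted line is $\hat{y} = a + bx$ with $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$.

```latex
\begin{align*}
\hat{y} - \bar{y} &= b\,(x - \bar{x})
  && \text{since } \hat{y} = a + bx \text{ and } \bar{y} = a + b\bar{x}, \\
\mathrm{SSR} &= \sum (\hat{y} - \bar{y})^2
  = b^2 \sum (x - \bar{x})^2
  = \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2}, \\
\frac{\mathrm{SSR}}{\mathrm{SST}}
  &= \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}
          {\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}
  = r^2 .
\end{align*}
```

The second identity then follows from the regression identity: $1 - r^2 = (\mathrm{SST} - \mathrm{SSR})/\mathrm{SST} = \mathrm{SSE}/\mathrm{SST}$. For the height and weight example, $r^2 = 4900/5950 \approx 0.8235$.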
TI-83 - Coefficient of Determination
To calculate $r^2$ on the TI-83, follow the procedure that produces the regression line and r. In the same window, the TI-83 reports the value of $r^2$.
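Away from the calculator, the same two numbers can be reproduced directly from the definition of the correlation coefficient. The following is a minimal sketch using the height and weight example; the variable names are illustrative and only the standard library is used.

```python
import math

# Heights (x) and weights (y) from the example.
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Sums of squared deviations and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)   # this is SST
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
r_squared = r ** 2                  # coefficient of determination

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

The value of `r_squared` agrees with SSR/SST = 4900/5950, so about 82% of the variation in weight in this sample is explained by the variation in height.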
Practice
The data on the next slide represent crude oil prices [a] (x) vs. gasoline prices [b] (y).
- Draw the scatter plot.
- Find the equation of the regression line.
- Perform the residual analysis.
- Find the correlation coefficient.
- Find the coefficient of determination.
- Compute SST, SSR, and SSE.
[a] http://tonto.eia.doe.gov/dnav/pet/xls/pet_pri_wco_k_w.xls
[b] http://tonto.eia.doe.gov/oog/ftparea/wogirs/xls/pswrgvwrec.xls
Date      Crude Oil   Date      Gasoline
Jan 16      40.98     Jan 19     1.833
Jan 23      41.05     Jan 26     1.833
Jan 30      42.07     Feb 2      1.894
Feb 6       41.77     Feb 9      1.926
Feb 13      43.04     Feb 16     1.970
Feb 20      39.87     Feb 23     1.924
Feb 27      40.22     Mar 2      1.942
Mar 6       42.85     Mar 9      1.936
Mar 13      42.91     Mar 16     1.921
Mar 20      44.90     Mar 23     1.950
Mar 27      50.10     Mar 30     2.048
Apr 3       48.09     Apr 9      2.044

Find SST, SSR, and SSE. Find $r^2$ and interpret the value.
Assignment
Homework
- Read Section 13.9, pages 868-869.
- Work the practice problem on the previous slide.
Answers to Even-Numbered Exercises
SST = 0.0490, SSR = 0.0321, SSE = 0.0169.
$r^2$ = 0.6544. About 65.44% of the variation in gas prices is explained by variation in oil prices.
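The practice answers can be checked with a short Python sketch. The helper `sums_of_squares` is a hypothetical name introduced here; it fits the least-squares line and returns the three sums, assuming the crude oil and gasoline prices are paired row by row as in the practice table.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line fitted to xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return sst, ssr, sse


# Practice data: crude oil prices (x) and gasoline prices (y), row by row.
crude_oil = [40.98, 41.05, 42.07, 41.77, 43.04, 39.87,
             40.22, 42.85, 42.91, 44.90, 50.10, 48.09]
gasoline = [1.833, 1.833, 1.894, 1.926, 1.970, 1.924,
            1.942, 1.936, 1.921, 1.950, 2.048, 2.044]

sst, ssr, sse = sums_of_squares(crude_oil, gasoline)
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

The printed values should agree with the answers above up to rounding, and `ssr + sse` should equal `sst` up to floating-point error.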