Econometrics for Health Policy, Health Economics, and Outcomes Research
Topic 5, Lecture 3: Estimating Policy Effects via the Simple Linear Regression Model (SLRM) and the Ordinary Least Squares (OLS) Method
Copyright Joseph V. Terza, Ph.D. 2007. All Rights Reserved.

3. Estimating Policy Effects in the Context of the Simple Linear Regression Model via the Ordinary Least Squares Estimator

Recall that the specific form of the sampling model in the SLRM is

    y_i = β1 + x_pi β2 + e_i                                        (5-22)

where the specific form of the regression error e_i (as given in Definition 5.2) is

    e_i = y_i − (β1 + x_pi β2).
Under assumptions (a) through (e), we have fully characterized the aspect of the factual joint distribution of y (the dependent variable) and x_p (the policy variable) that is relevant to the analysis: the systematic part of the regression model,

    E[y | x_p] = β1 + x_p β2.

If we could observe the entire joint population of y and x_p, we would know the desired conditional mean relationship. Unfortunately, taking a census typically imposes prohibitive costs.
We must therefore rely on an appropriate estimation method, applied to a subset of the population (i.e., a sample), to estimate the values of the parameters β1 and β2. The ordinary least squares (OLS) estimator is one such estimation method. It:

- is based on the analogy principle
- has desirable statistical properties; in particular, OLS is both unbiased and efficient.
The population parameters β1 and β2 are the intercept and slope of the linear relationship between E[y | x_p] and x_p, respectively:

intercept: the point at which the line cuts the vertical axis of the graph
slope: the rate of change in E[y | x_p] per one-unit change in x_p.

Geometrically, the OLS estimator finds the intercept and the slope of the line that best fits the two-dimensional plot of the (x_p, y) data pairs.
(5.6) DEF: For any estimators of β1 and β2, denoted as b1 and b2:

1) the estimated regression line is defined as ŷ = b1 + x_p b2;
2) the predicted value of y_i is ŷ_i = b1 + x_pi b2; and
3) the residual is defined as ê_i = y_i − ŷ_i.
Figure 5-1 depicts all of the concepts defined in (5.6).

[Figure 5-1 here]

Note that y_i = ŷ_i + ê_i, and that ê_i is the vertical distance between the estimated regression line and the observed value of y_i.
In developing an estimation method we adhere to the analogy principle, as we did in motivating the DOM estimator. There we noted that our object of interest, the policy effect, is the difference between two population averages. By analogy, we proposed the DOM estimator, which is the difference between two sample averages.
Similarly, in the present context we know that the fundamental component of the policy effect is the linear relationship between x_p and y in the population. By analogy, we seek to fit an appropriate linear relationship in the sample. The problem, of course, is that there are infinitely many ways to fit a line to a sample of (x_pi, y_i) pairs, generically represented in Figure 5-1 by the dashed estimated regression line.
From among these infinitely many possibilities, we choose the method of least squares (ordinary least squares, OLS), which selects as the estimates of β1 and β2 those values that, in a very specific sense, minimize the vertical distances between the estimated regression line and the observed data points.
For example, to each of the six data points in Figure 5-1 there corresponds a residual (i.e., the vertical distance between the point and the line). The least squares method positions the line so that the sum of the squared vertical distances (residuals) is as small as possible. Another way to say this is that the least squares method chooses b1 and b2 so as to minimize the sum of squared residuals.
The formal definition of the Ordinary Least Squares (OLS) estimators of β1 and β2 is given in the following.

(5.7) DEF: The Ordinary Least Squares (OLS) estimators of β1 and β2 are defined to be the values of b1 and b2 that minimize

    SSR(b1, b2) = Σ_{i=1}^{n} (y_i − b1 − x_pi b2)²                 (5-23)

where y_i and x_pi denote the observed values of the outcome and policy variable for the ith sample member, and i = 1, ..., n.
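The minimization in (5-23) can be sketched in a few lines of code. This is an illustrative check, not part of the lecture: the sample data, variable names, and perturbation sizes below are made up, and the closed-form solution used to locate the minimizer is the standard OLS formula (stated in (5-24)/(5-25)).

```python
# Sketch of the OLS criterion in (5-23): the sum of squared residuals
# SSR(b1, b2) = sum_i (y_i - b1 - x_pi * b2)^2.
# Data and variable names are illustrative, not from the lecture.

def ssr(b1, b2, x, y):
    """Sum of squared residuals for a candidate line y-hat = b1 + x*b2."""
    return sum((yi - b1 - xi * b2) ** 2 for xi, yi in zip(x, y))

# A small made-up sample:
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# Closed-form OLS solution (the standard least squares formulas):
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2_ols = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
b1_ols = ybar - xbar * b2_ols

# Any perturbed line yields a larger SSR than the OLS line:
assert ssr(b1_ols, b2_ols, x, y) < ssr(b1_ols + 0.1, b2_ols, x, y)
assert ssr(b1_ols, b2_ols, x, y) < ssr(b1_ols, b2_ols - 0.1, x, y)
```

Perturbing either the intercept or the slope away from the OLS values increases the sum of squared residuals, which is exactly the sense in which the OLS line is "best."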
It is easy to show that the solution to this optimization problem is

    β̂2 = Σ_{i=1}^{n} (x_pi − x̄_p)(y_i − ȳ) / Σ_{i=1}^{n} (x_pi − x̄_p)²    (5-24)

and

    β̂1 = ȳ − x̄_p β̂2                                                       (5-25)

where x̄_p and ȳ denote the sample means of x_p and y. It should also be noted that if x_p is binary, the DOM estimator of the policy effect is identical to the OLS estimator of β2.
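The closed-form solutions (5-24)/(5-25) and the binary-policy equivalence with the DOM estimator can both be verified numerically. The data below are made up for illustration, and the function name `ols` is mine.

```python
# Sketch of (5-24)/(5-25), plus the binary-policy result: when x_p is
# a 0/1 variable, the OLS slope equals the difference-of-means (DOM)
# estimator. All data below are made up for illustration.

def ols(x, y):
    """Return (b1_hat, b2_hat): the OLS intercept (5-25) and slope (5-24)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b1 = ybar - xbar * b2
    return b1, b2

# Binary policy variable: 1 = subject to policy, 0 = not.
x = [1, 1, 1, 0, 0, 0]
y = [10.0, 12.0, 11.0, 7.0, 8.0, 6.0]

b1, b2 = ols(x, y)
mean_1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / 3
mean_0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / 3
dom = mean_1 - mean_0

assert abs(b2 - dom) < 1e-12   # OLS slope = DOM when x_p is binary
```

Note also that with a binary x_p the OLS intercept β̂1 equals the sample mean of y among the x_p = 0 observations, so the fitted line simply connects the two group means.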
(5.8) DEF: For the OLS estimators of β1 and β2, denoted as β̂1 and β̂2:

1) the OLS estimated regression line is defined as ŷ = β̂1 + x_p β̂2;
2) the OLS predicted value of y_i is ŷ_i = β̂1 + x_pi β̂2; and
3) the OLS residual is defined as ê_i = y_i − ŷ_i.
For example, suppose we want to estimate the effect of a person's age on her yearly health care expenditure. We draw the following sample:

EXPEND (y) [$]    AGE (x_p) [yrs]
1200              50
500               35
650               25
300               30
450               18
2000              60
1500              65
350               21
200               20
1300              45
1100              40
250               19
1900              64
500               23
750               54
1000              55
650               47
100               24
800               26
850               27
Now consider the plot of these data pictured in Figure 5-2.

[Figure 5-2 here]
Each point denotes a sample observation for a particular individual's age and yearly health care expenditure. The solid line represents the OLS estimated line, i.e.

    ŷ = β̂1 + x_p β̂2                                                (5-26)

where β̂1 and β̂2 are the estimated values of the intercept (β1) and slope (β2), respectively. The OLS estimated line is a best fit in the sense that it minimizes the sum of squared vertical distances of the plotted points from the line.
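Applying the closed-form solutions (5-24)/(5-25) to the age/expenditure sample above gives the estimated line (5-26). This is an illustrative computation (variable names are mine); the lecture does not report the estimates, so the numbers below come from the formulas applied to the tabulated data.

```python
# Fitting line (5-26) to the age/expenditure sample, using the
# closed-form OLS solutions (5-24)/(5-25).

age = [50, 35, 25, 30, 18, 60, 65, 21, 20, 45,
       40, 19, 64, 23, 54, 55, 47, 24, 26, 27]
expend = [1200, 500, 650, 300, 450, 2000, 1500, 350, 200, 1300,
          1100, 250, 1900, 500, 750, 1000, 650, 100, 800, 850]

n = len(age)
xbar = sum(age) / n
ybar = sum(expend) / n

# Slope (5-24) and intercept (5-25):
b2_hat = sum((x - xbar) * (y - ybar) for x, y in zip(age, expend)) / \
         sum((x - xbar) ** 2 for x in age)
b1_hat = ybar - xbar * b2_hat

# Estimated line: EXPEND-hat = b1_hat + AGE * b2_hat
print(round(b1_hat, 2), round(b2_hat, 2))  # approx -233.95 and 28.11
```

On this sample the estimated slope is roughly 28, i.e. predicted yearly expenditure rises by about $28 per additional year of age.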
Formulation of the SLRM, sampling, and OLS estimation combine to constitute two of the components of the scientific method, viz., 1) hypothesis formulation and 2) observation. Choice of the SLRM as the basis for analysis lies at the heart of the former component, while sampling and OLS estimation are the keys to the latter.
4. Desirable Properties and Sampling Distribution of the OLS Estimator in the Context of the SLRM

4.1 Desirable Properties

(5.9) THEOREM: The OLS estimators in the simple linear regression model can be written as linear combinations of the regression errors e_1, ..., e_n. Specifically,

    β̂2 = β2 + Σ_{i=1}^{n} w_i e_i,  where w_i = (x_pi − x̄_p) / Σ_{j=1}^{n} (x_pj − x̄_p)²    (5-27)

and

    β̂1 = β1 + Σ_{i=1}^{n} (1/n − x̄_p w_i) e_i.                                              (5-28)

Unbiasedness

(5.10) THEOREM: The OLS estimator in the simple linear regression model is unbiased, i.e.

    E[β̂1] = β1  and  E[β̂2] = β2.
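Unbiasedness can be illustrated with a small Monte Carlo experiment: draw many samples from an SLRM with known parameters and check that the OLS estimates average out to the true values. The parameter values, sample size, number of replications, and error distribution below are all my illustrative choices, not from the lecture.

```python
# Monte Carlo sketch of unbiasedness: over many samples drawn from an
# SLRM with known beta1 = 2 and beta2 = 3, the OLS estimates average
# out to the true parameter values. All settings are illustrative.
import random

random.seed(0)
beta1, beta2, n, reps = 2.0, 3.0, 50, 2000

def ols_fit(x, y):
    """Return (b1_hat, b2_hat), the OLS intercept and slope."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - xbar * b2, b2

b1_draws, b2_draws = [], []
for _ in range(reps):
    x = [random.uniform(0, 10) for _ in range(n)]
    # Errors drawn with mean zero given x_p, as the SLRM assumptions require:
    y = [beta1 + xi * beta2 + random.gauss(0, 1) for xi in x]
    b1, b2 = ols_fit(x, y)
    b1_draws.append(b1)
    b2_draws.append(b2)

avg_b1 = sum(b1_draws) / reps
avg_b2 = sum(b2_draws) / reps
# Averages over replications should be close to the true (2, 3):
assert abs(avg_b1 - beta1) < 0.05 and abs(avg_b2 - beta2) < 0.02
```

Each individual draw of (β̂1, β̂2) misses the truth, but the misses average out across replications, which is exactly what E[β̂1] = β1 and E[β̂2] = β2 assert.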
We now note an efficiency result for the OLS estimator relative to a certain class of estimators.

(5.11) THEOREM (The Gauss-Markov Theorem): The OLS estimators of β1 and β2 in the SLRM are Best Linear Unbiased Estimators (BLUE). The relevant class of estimators consists of those that can be written as linear combinations of the regression errors, as in Theorem 5.9, and that are unbiased, as in Theorem 5.10.