Lecture 2: Review of Linear Regression I. Statistics 211 - Statistical Methods II. Presented January 9, 2018


Review of Linear Regression I
Statistics 211 - Statistical Methods II
Presented January 9, 2018
Dan Gillen
Department of Statistics, University of California, Irvine

Our plan

1. Start with OLS and see where we get
2. See which assumptions are not fulfilled with categorical response data (e.g., binary or Poisson)
3. Fix up OLS to satisfy those assumptions and obtain valid inference

Linear Regression

Definition: By a classical (ordinary least squares) linear regression model, we mean a model in which we assume that

1. E[Y_i | X_i] = X_i^T β
2. ε_i = Y_i - X_i^T β, and note that the model demands E[ε_i] = 0
3. the ε_i's are independent
4. var(ε_i) = σ^2 for all i = 1, ..., n
5. the ε_i's are identically distributed
6. ε_i ~ N(0, σ^2)

Goal

Construct a model for the dependence of a response Y on predictors X_1, X_2, ..., X_{p-1}. Two components to the model:

1. The systematic component (mean model): μ_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + ... + β_{p-1} X_{i,p-1}
2. The random component (error term): Y_i = μ_i + ε_i, where ε_i ~ N(0, σ^2)

Note: We can write the above model in matrix notation, in which the i-th row of the design matrix X is X_i^T, the response vector is Y = (Y_1, ..., Y_n)^T, and the error vector is ε = (ε_1, ..., ε_n)^T, so that Y = Xβ + ε.

Estimation: Least Squares

We consider parameter estimates that minimize the sum of squared errors

Σ_{i=1}^n (Y_i - μ_i)^2 = Σ_{i=1}^n (Y_i - X_i^T β)^2 = (Y - Xβ)^T (Y - Xβ),

where X_i^T is the i-th row of the design matrix (the row vector of covariate values corresponding to the i-th observation) and β = (β_0, β_1, ..., β_{p-1})^T.

Why focus on the sum of squared errors?

- It leads to the score estimating equation under classical OLS
- It is reasonable and mathematically convenient!

Estimation: Least Squares

Proposition: Assume rank(X^T X) = p (i.e., the number of observations n is greater than the number of parameters p, and no predictor is constant or a linear combination of the other predictors). Then the least squares estimate is given by

β̂ = (X^T X)^{-1} X^T Y.
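As a quick numerical check, the closed-form estimate above can be computed directly. The following is a minimal sketch on simulated data; the sample size, true coefficients, and noise level are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

# Sketch: compute beta_hat = (X^T X)^{-1} X^T Y on simulated data.
# All data below are made up purely for illustration.
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # full-rank design with intercept
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations (X^T X) beta = X^T Y rather than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Solving the normal equations with `np.linalg.solve` (or using `lstsq`, which works via a factorization of X) is numerically preferable to computing `(X^T X)^{-1}` directly.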

Proof:

Mean and Variance of the OLS Estimate

Proposition: β̂ is unbiased for β (i.e., E[β̂] = β).

Proof:

Variance of the OLS Estimate

Proposition: The variance of the ordinary least squares estimate is

var(β̂) = (X^T X)^{-1} X^T Σ X (X^T X)^{-1},

where Σ = var(Y). When Σ = σ^2 I_n (i.e., the Y_i's are uncorrelated and have equal variance; assumptions 3-4), this reduces to

var(β̂) = σ^2 (X^T X)^{-1}.

Proof: Follows directly from var(AY) = A var(Y) A^T.

Estimation of Var[β̂]

Note: In practice, we estimate σ^2 with

σ̂^2 = (1 / (n - p)) Σ_{i=1}^n (Y_i - μ̂_i)^2.

It can "easily" be shown that σ̂^2 is an unbiased and consistent estimator of σ^2 using the methods of Stat 120B/200B!
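The residual variance estimate and the resulting model-based variance estimate for β̂ can be sketched in a few lines. The data are again simulated; the true error standard deviation of 0.5 is an assumed illustrative value.

```python
import numpy as np

# Sketch: sigma2_hat = RSS / (n - p) and the model-based variance estimate
# sigma2_hat * (X^T X)^{-1}. Simulated data with true sigma = 0.5 (illustrative).
rng = np.random.default_rng(1)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)            # divide by n - p, not n, for unbiasedness
var_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated var(beta_hat) under constant variance
se_hat = np.sqrt(np.diag(var_hat))              # standard errors of the coefficients
```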

OLS is "optimal" under the normality assumption

If the ε_i are independent and distributed N(0, σ^2), then the OLS estimate is the MLE. This means that the OLS estimate is:

1. Consistent,
2. Asymptotically normally distributed, and
3. Asymptotically efficient (achieves the Cramér-Rao lower bound).

Estimation under non-normal errors: Gauss-Markov

If we do not assume normality, we may appeal to the Gauss-Markov theorem...

Proposition (Gauss-Markov Theorem): Suppose Var(Y) = σ^2 I_n. Let β̃ = CY be a linear unbiased estimator of β. Then the variance of linear functions of β̃ is at least as great as the variance of the corresponding linear functions of β̂ (that is, the ordinary least squares estimate is the best linear unbiased estimator (BLUE) of β).

Gauss-Markov Theorem

Proof:

Estimation under non-normal errors: Gauss-Markov

Note: Now suppose that Var(Y) = Σ is arbitrary. For a positive definite symmetric matrix Σ we can find a nonsingular symmetric matrix A such that Σ = AA. In that case, Z = A^{-1} Y has expectation A^{-1} X β and variance A^{-1} Σ A^{-1} = I_n. Letting W = A^{-1} X in this transformed model, the ordinary least squares estimate for β would be

β̃ = (W^T W)^{-1} W^T Z.

Estimation under non-normal errors: Gauss-Markov

In terms of the original response Y and predictors X, this yields the generalized least squares estimate

β̃ = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} Y,

which is unbiased for β and has variance (X^T Σ^{-1} X)^{-1}. Note that by the Gauss-Markov theorem, this is the best linear unbiased estimate of β in this general setting.

Note: Generalized least squares can obviously handle the case of correlated Y_i's. In this class, we do not consider such settings. We do, however, consider the setting in which the Y_i's are uncorrelated but do not have equal variance.

Estimation under non-normal errors: Gauss-Markov

Definition: Consider a linear regression model in which Var(Y_i) = σ_i^2 and Cov(Y_i, Y_j) = 0 for i ≠ j, so that Σ = Var(Y) = diag(σ_1^2, ..., σ_n^2). The weighted least squares estimate of β is given by the generalized least squares estimate using the above definition of Σ.

Note: The above optimality (BLUE) of the ordinary, weighted, and generalized least squares estimates does not depend upon any particular distribution of the Y_i's beyond their first two moments. However, if we want to make inference after an analysis, we need to know the distribution of the estimates, which in turn requires some assumptions on the regression model.
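A sketch of weighted least squares as the diagonal-Σ special case of GLS, computed two equivalent ways: via the GLS formula, and via ordinary least squares on the transformed data Z = A^{-1}Y from the previous slide. The heteroskedastic variances σ_i = 0.3 x_i are an assumed illustrative choice.

```python
import numpy as np

# Sketch: WLS with Sigma = diag(sigma_1^2, ..., sigma_n^2), two equivalent ways.
# Data and the form of the nonconstant variances are simulated for illustration.
rng = np.random.default_rng(2)
n = 150
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
sigma_i = 0.3 * x                          # nonconstant error standard deviations (assumed)
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=sigma_i)

# GLS / WLS formula: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
Sigma_inv = np.diag(1.0 / sigma_i**2)
beta_wls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

# Equivalent: transform by A^{-1} = diag(1/sigma_i) and run ordinary least squares
W = X / sigma_i[:, None]                   # A^{-1} X
Z = Y / sigma_i                            # A^{-1} Y
beta_transformed, *_ = np.linalg.lstsq(W, Z, rcond=None)
```

The two estimates agree because OLS on (Z, W) solves exactly the normal equations X^T Σ^{-1} X β = X^T Σ^{-1} Y.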

The OLS estimate under normality

Proposition: Suppose the Y_i's are jointly normally distributed and are uncorrelated (hence independent; assumptions 1-6). Then the ordinary (weighted, generalized) least squares estimates are multivariate normally distributed. Thus, in the case of constant variance,

β̂ ~ N(β, σ^2 (X^T X)^{-1}).

Proof: This follows from linear transformations of multivariate normals.

Inference under normality

Consider testing the null hypothesis H_0: β_k = β_{k,0} vs. H_1: β_k ≠ β_{k,0}. In Stat 210 you found that the Wald test statistic satisfies

T = (β̂_k - β_{k,0}) / ŝe(β̂_k) ~ t_{n-p} under H_0,

where ŝe(β̂_k) is given by the square root of the k-th diagonal element of Vâr[β̂] = σ̂^2 (X^T X)^{-1}, with σ̂^2 = (1 / (n - p)) Σ_{i=1}^n (y_i - μ̂_i)^2.

A 100(1 - α)% CI for β_k is given by computing β̂_k ± t_{n-p, 1-α/2} ŝe(β̂_k).
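The Wald test and confidence interval above can be sketched end to end. The data, the null value β_{k,0} = 0, and α = 0.05 are illustrative choices; to keep the sketch dependency-free, the t_{n-p} quantile is approximated by the standard normal quantile, which is close for the moderately large n - p used here (an exact t quantile would come from a t table or a stats library).

```python
import numpy as np
from statistics import NormalDist

# Sketch: Wald test of H0: beta_1 = 0 and an approximate 95% CI for beta_1.
# Simulated data; null value and alpha are illustrative choices.
rng = np.random.default_rng(3)
n, p = 120, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

k, beta_null, alpha = 1, 0.0, 0.05
t_stat = (beta_hat[k] - beta_null) / se[k]        # Wald statistic, ~ t_{n-p} under H0
# Normal approximation to the t_{n-p} reference distribution (n - p = 118 here)
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
crit = NormalDist().inv_cdf(1 - alpha / 2)        # stand-in for t_{n-p, 1 - alpha/2}
ci = (beta_hat[k] - crit * se[k], beta_hat[k] + crit * se[k])
```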

Asymptotic normality of OLS

Question: What happens when the normality assumption is not satisfied?

Answer: Like most (useful) estimators, we can approximate the sampling distribution in large samples! To do this, we must appeal to the Lindeberg-Feller Central Limit Theorem...

Lindeberg-Feller Central Limit Theorem

Proposition (Lindeberg-Feller Central Limit Theorem): Let Y_1, Y_2, ... be independent random variables with E[Y_i] = 0 and var(Y_i) = σ_i^2. Define S_n = Σ_{i=1}^n Y_i and σ_{(n)}^2 = Σ_{i=1}^n σ_i^2. Then both

1. S_n / σ_{(n)} →_d N(0, 1), and
2. lim_{n→∞} max{σ_i^2 / σ_{(n)}^2 : 1 ≤ i ≤ n} = 0

hold if and only if the Lindeberg condition holds: for every ε > 0,

lim_{n→∞} (1 / σ_{(n)}^2) Σ_{i=1}^n E[ Y_i^2 1{|Y_i| ≥ ε σ_{(n)}} ] = 0.

Asymptotic normality of OLS

Proposition: Consider simple linear regression in which (Y_i, X_i) are pairs of response random variables and known predictors, and the Y_i's are independently distributed with Y_i ~ (μ_i, σ^2), where σ^2 < ∞ is known. In particular, consider a regression model of the form μ_i = β_0 + β_1 (X_i - X̄), and assume the (Y_i - μ_i) are iid (0, σ^2). Further, let X denote the design matrix with columns 1 and (X - X̄), let β = (β_0, β_1)^T, and consider the OLS estimate β̂ = (X^T X)^{-1} X^T Y. Then

Z_n = (X^T X)^{1/2} (β̂ - β) = ( √n (β̂_0 - β_0), √(Σ_{i=1}^n (x_i - x̄)^2) (β̂_1 - β_1) )^T →_d N_2(0, σ^2 I_2).

Asymptotic normality of OLS

Proof:

Asymptotic normality of OLS

Conclusion: Even if we do not assume normality, but simply have independence between the errors, the ordinary least squares estimate will be asymptotically normally distributed, as long as

max_i { (X_i - X̄)^2 / Σ_j (X_j - X̄)^2 } → 0 as n → ∞,

by the Lindeberg-Feller CLT. In particular, in the case of constant variance we have, approximately,

β̂ ~ N(β, σ^2 (X^T X)^{-1}).
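This conclusion is easy to see by simulation: even with skewed, decidedly non-normal errors, the standardized OLS slope is close to standard normal. A minimal Monte Carlo sketch, with sample size, replication count, and the centered-exponential error law all chosen for illustration:

```python
import numpy as np

# Sketch: Monte Carlo check that sqrt(Sxx) * (b1 - beta1) / sigma is approximately
# N(0, 1) under skewed (centered exponential) errors. All settings are illustrative.
rng = np.random.default_rng(4)
n, reps = 200, 2000
x = rng.uniform(0, 1, size=n)
xc = x - x.mean()
sxx = (xc**2).sum()
beta0, beta1, sigma = 1.0, 2.0, 1.0      # exponential(scale=sigma) has sd sigma

z = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(scale=sigma, size=n) - sigma  # mean-zero, skewed errors
    y = beta0 + beta1 * x + eps
    b1 = (xc @ y) / sxx                                 # OLS slope
    z[r] = np.sqrt(sxx) * (b1 - beta1) / sigma          # standardized slope

# If the CLT applies, z should have mean near 0 and standard deviation near 1
```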

Varying degrees of assumptions

Consider the regression model Y_i = μ_i + ε_i.

1. ε_i ~ N(0, σ^2) for all i
2. the ε_i are independent and identically distributed with mean zero
3. the ε_i are independent with constant variance and mean zero
4. the ε_i are independent with mean zero
5. the ε_i have mean zero

Weaker assumptions lead to weaker properties for the OLS estimate

Consider the regression model Y_i = μ_i + ε_i. Under the corresponding assumptions above:

1. OLS is optimal (consistent, unbiased, most efficient)
2. OLS is consistent and is the best linear unbiased estimator (BLUE)
3. OLS is consistent and is the best linear unbiased estimator (BLUE)
4. OLS is consistent and asymptotically normal
5. No guarantees (OLS is consistent and asymptotically normal under additional assumptions)

What is the effect of changing the error distribution?

Changing the error distribution could...

1. Change Var[β̂]: in repeated experimentation, β̂ varies more than it would if ε ~ N(0, σ^2)
2. Affect the efficiency of β̂: in repeated experimentation, β̂ varies more than some other estimator of β
3. Make σ̂^2 (X^T X)^{-1} a bad estimate of Var[β̂]: in repeated experimentation, the variability of β̂ is greater (or less) than σ̂^2 (X^T X)^{-1}

What is the effect of changing the error distribution?

These results are distinct: the above effects of changing the error distribution are all different phenomena.

- Items (1) and (2) mean that another estimator may be more efficient (have smaller variability) than the OLS estimate.
- Item (3) means that if we estimate Var[β̂] by σ̂^2 (X^T X)^{-1}, then our inference for β̂ will be wrong:
  - the Type I error rate of hypothesis tests will be higher (or lower) than the nominal level, and
  - confidence intervals will not have the correct coverage probability.

(3) occurs when the variance of the error terms is not constant

Why does this matter to us?

1. Suppose that our response is a binary outcome variable Y, so that Y_i ~ Binomial(1, μ_i).
   Standard linear regression mean model: E[Y_i] = μ_i = X_i^T β.
   Error distribution: Var[Y_i] = μ_i (1 - μ_i).

2. Suppose that our response Y counts the number of events over a specified interval. We might assume Y_i ~ Poisson(μ_i).
   Standard linear regression mean model: E[Y_i] = μ_i = X_i^T β.
   Error distribution: Var[Y_i] = μ_i.

Note: Nonconstant variance can also cause (1) and (2).
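The binary case above can be sketched numerically: with Var[Y_i] = μ_i(1 - μ_i) varying across observations, the model-based estimate σ̂^2 (X^T X)^{-1} need not match Var[β̂], while a heteroskedasticity-consistent ("sandwich", or White-type) estimate replaces the constant-variance middle term with squared residuals. The linear-probability setup and coefficients below are illustrative assumptions, not the lecture's example.

```python
import numpy as np

# Sketch: binary response => Var[Y_i] = mu_i (1 - mu_i) is nonconstant, so the
# model-based variance estimate can misstate Var[beta_hat]. Compare it with a
# sandwich-form estimate (X^T X)^{-1} X^T diag(e_i^2) X (X^T X)^{-1}.
rng = np.random.default_rng(5)
n = 500
x = rng.uniform(size=n)
X = np.column_stack([np.ones(n), x])
mu = 0.2 + 0.6 * x                     # assumed linear probability model, mu in (0.2, 0.8)
Y = rng.binomial(1, mu)                # binary outcomes with Var[Y_i] = mu_i (1 - mu_i)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)

# Model-based (constant-variance) estimate of Var[beta_hat]
var_naive = (resid @ resid / (n - 2)) * XtX_inv

# Sandwich estimate: squared residuals stand in for the unequal variances
meat = X.T @ (X * resid[:, None]**2)
var_sandwich = XtX_inv @ meat @ XtX_inv
```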

Bottom line

Because of the mean-variance relationship in these (and many other) outcome distributions, we cannot fulfill the constant variance assumption!

- σ̂^2 (X^T X)^{-1} is a bad estimate of Var[β̂]
- This yields invalid inference

Much of our class will be devoted to deriving a general class of estimators for regression models in which a mean-variance relationship exists...