Motor Trend
Yvette Winton
September 1, 2016

Executive Summary

Objective

This analysis explores the relationship between a set of variables and miles per gallon (MPG), the outcome, in a data set of cars. The objective is to determine whether an automatic or a manual transmission is more fuel efficient, and to quantify the MPG difference between the two.

Conclusion

The difference in fuel efficiency between transmission types is statistically significant in both the single-variable model mpg ~ am (transmission type) and the best-fitting multivariate model mpg ~ am + wt + qsec (transmission type + vehicle weight + 1/4 mile time). In the single-variable model, mean fuel efficiency for manual transmissions is 24.4 MPG, which is 7.2 MPG higher than the 17.1 MPG for automatic transmissions. In the multivariate model, mean fuel efficiency for manual transmissions is only 2.9 MPG higher than for automatic transmissions when weight and 1/4 mile time are held constant.

Both models carry uncertainty. Adjusted R-squared is 0.339 for the single-variable model and 0.833 for the multivariate model. The multivariate model predicts MPG better, but it still explains only about 83.3% of the variance in MPG.

Analysis

To explore the difference in fuel efficiency, if any, between automatic and manual transmissions, fuel efficiency is plotted by transmission type (see appendix) and a t-test compares fuel efficiency between the two transmission types. The difference is statistically significant: the p-value is below 0.05 and the 95% confidence interval does not contain 0, so the null hypothesis of equal means is rejected. Mean fuel efficiency for manual transmissions is 24.4 MPG, which is 7.2 MPG higher than the 17.1 MPG for automatic transmissions.
t.test(mpg ~ transm, data=mtcars1)

        Welch Two Sample t-test

data: mpg by transm
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group automatic    mean in group manual
               17.14737                24.39231
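The report does not show how mtcars1 and its transm column were created. The sketch below is one plausible reconstruction, assuming transm is simply the am column of the built-in mtcars data recoded as a factor (0 = automatic, 1 = manual). The names tt, means, and diff_manual_auto are illustrative and do not come from the original analysis.

```r
# Assumption: mtcars1 is mtcars plus a factor column `transm` derived from `am`
# (0 = automatic, 1 = manual). The report does not show this step.
mtcars1 <- mtcars
mtcars1$transm <- factor(mtcars1$am, levels = c(0, 1),
                         labels = c("automatic", "manual"))

# Welch two-sample t-test (unequal variances is the default in t.test)
tt <- t.test(mpg ~ transm, data = mtcars1)

# Group means and their difference (manual minus automatic)
means <- tapply(mtcars1$mpg, mtcars1$transm, mean)
diff_manual_auto <- as.numeric(means["manual"] - means["automatic"])

print(round(means, 2))
print(round(diff_manual_auto, 2))
print(tt$conf.int)
```

The sign of the t statistic and of the confidence interval depends on factor level order: with automatic as the first level, the test reports automatic minus manual, which is why both interval bounds are negative.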
However, fuel efficiency is also strongly correlated with other variables. The best-fitting multivariate model was found with the following procedure (see appendix for details):

1. Start with a fitted model of fuel efficiency vs transmission type, the variable of interest in this analysis.
2. Fit fuel efficiency against all variables in the data set and rank the variables in ascending order of their p-values.
3. Add each variable, in that order, to the model of fuel efficiency vs transmission type. If the p-value of the added variable is > 0.05, it contributes little to the model and is not kept.
4. Repeat until every variable that is significant in the fitted model (p-value < 0.05) has been added.

The best-fitting model was found to be fuel efficiency vs transmission type + weight + 1/4 mile time (lm(mpg ~ am + wt + qsec)). An ANOVA comparing it with the transmission-only model gives a very small p-value (1.55e-09), so adding wt and qsec significantly improves the fit. In the coefficient table, am, wt, and qsec all have p-values below 0.05, so all three contribute significantly to predicting fuel efficiency. In this multivariate model, mean fuel efficiency for manual transmissions is only 2.9 MPG higher than for automatic transmissions when wt and qsec are held constant.

fitam <- lm(mpg ~ am, data=mtcars)
fitbest <- lm(mpg ~ am + wt + qsec, data=mtcars)
anova(fitam, fitbest)

Analysis of Variance Table

Model 1: mpg ~ am
Model 2: mpg ~ am + wt + qsec
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     30 720.90
2     28 169.29  2    551.61 45.618 1.55e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(fitbest)$coeff

             Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
am           2.935837  1.4109045  2.080819 4.671551e-02
wt          -3.916504  0.7112016 -5.506882 6.952711e-06
qsec         1.225886  0.2886696  4.246676 2.161737e-04

Adjusted R-squared is 0.339 for the single-variable model and 0.833 for the multivariate model.
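The four-step selection procedure described above can be sketched in a few lines of R. This is an illustrative reconstruction and not the author's code (the appendix shows the individual fits instead). The 0.05 threshold comes from the text; the names alpha, selected, candidates, and best_formula are assumptions introduced here.

```r
alpha <- 0.05

# Step 1: start from the variable of interest (transmission type)
selected <- "am"

# Step 2: fit mpg against all variables, rank predictors by ascending p-value
full_coef <- summary(lm(mpg ~ ., data = mtcars))$coefficients
pvals <- full_coef[-1, "Pr(>|t|)"]              # drop the intercept row
candidates <- setdiff(names(sort(pvals)), "am") # am is already in the model

# Steps 3-4: add candidates one at a time, keep a candidate only if its
# own p-value in the enlarged model is below alpha
for (v in candidates) {
  trial <- reformulate(c(selected, v), response = "mpg")
  p_new <- summary(lm(trial, data = mtcars))$coefficients[v, "Pr(>|t|)"]
  if (p_new < alpha) {
    selected <- c(selected, v)
  }
}

best_formula <- reformulate(selected, response = "mpg")
print(best_formula)
```

A procedure like this is order dependent: a variable rejected early is never reconsidered, so the result should be read as one reasonable model and not as a guaranteed optimum.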
summary(fitam)$adj.r.squared
[1] 0.3384589
summary(fitbest)$adj.r.squared
[1] 0.8335561

The residual plot of the best-fitting multivariate model lm(mpg ~ am + wt + qsec) shows a slight curve but no obvious trend, and the residuals appear random. In the Q-Q plot, the residuals are approximately normally distributed (see appendix).
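To make the adjusted R-squared figures less of a black box, the sketch below recomputes them from the residual and total sums of squares, using 1 - (RSS / residual df) / (TSS / (n - 1)). The helper adj_r2 is a hypothetical name introduced for illustration; fitam and fitbest are refit here so that the block runs on its own.

```r
fitam   <- lm(mpg ~ am, data = mtcars)
fitbest <- lm(mpg ~ am + wt + qsec, data = mtcars)

# Hypothetical helper: adjusted R-squared for a linear model with an intercept
adj_r2 <- function(fit) {
  y   <- model.response(model.frame(fit))
  rss <- sum(residuals(fit)^2)        # residual sum of squares
  tss <- sum((y - mean(y))^2)         # total sum of squares
  1 - (rss / df.residual(fit)) / (tss / (length(y) - 1))
}

print(adj_r2(fitam))
print(adj_r2(fitbest))
```

The penalty for extra predictors enters through the residual degrees of freedom (30 for fitam, 28 for fitbest), which is why adjusted R-squared is the fairer comparison between models of different size.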
Appendix:

library(ggplot2)
ggplot(aes(x=transm, y=mpg), data=mtcars1) + geom_boxplot(aes(fill=transm)) + xlab("transmission Type")

[Figure: Miles/Gallon vs Transmission Type. Box plots of Miles/(US) gallon (y-axis) by Transmission Type (x-axis: automatic, manual), filled by transm.]

summary(lm(mpg ~., data=mtcars))$coeff

               Estimate  Std. Error    t value   Pr(>|t|)
(Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
disp         0.01333524  0.01785750  0.7467585 0.46348865
hp          -0.02148212  0.02176858 -0.9868407 0.33495531
drat         0.78711097  1.63537307  0.4813036 0.63527790
wt          -3.71530393  1.89441430 -1.9611887 0.06325215
qsec         0.82104075  0.73084480  1.1234133 0.27394127
vs           0.31776281  2.10450861  0.1509915 0.88142347
am           2.52022689  2.05665055  1.2254035 0.23398971
gear         0.65541302  1.49325996  0.4389142 0.66520643
carb        -0.19941925  0.82875250 -0.2406258 0.81217871

summary(lm(mpg~ am + wt, data=mtcars))$coeff

               Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 37.32155131  3.0546385 12.21799285 5.843477e-13
am          -0.02361522  1.5456453 -0.01527855 9.879146e-01
wt          -5.35281145  0.7882438 -6.79080719 1.867415e-07
summary(lm(mpg~ am + wt + qsec, data=mtcars))$coeff

             Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
am           2.935837  1.4109045  2.080819 4.671551e-02
wt          -3.916504  0.7112016 -5.506882 6.952711e-06
qsec         1.225886  0.2886696  4.246676 2.161737e-04

summary(lm(mpg~ am + wt + qsec + hp, data=mtcars))$coeff

               Estimate Std. Error   t value    Pr(>|t|)
(Intercept) 17.44019110  9.3188688  1.871492 0.072149342
am           2.92550394  1.3971471  2.093913 0.045790788
wt          -3.23809682  0.8898986 -3.638726 0.001141407
qsec         0.81060254  0.4388703  1.847021 0.075731202
hp          -0.01764654  0.0141506 -1.247052 0.223087932

summary(lm(mpg~ am + wt + qsec + disp, data=mtcars))$coeff

                Estimate Std. Error    t value     Pr(>|t|)
(Intercept)  6.442378425 8.25723071  0.7802105 0.4420535035
am           3.310153929 1.51241278  2.1886577 0.0374482129
wt          -4.588282099 1.16677426 -3.9324506 0.0005290227
qsec         1.416958261 0.39148853  3.6194119 0.0012000578
disp         0.007689836 0.01053478  0.7299473 0.4717085406

summary(lm(mpg~ am + wt + qsec + drat, data=mtcars))$coeff

              Estimate Std. Error    t value     Pr(>|t|)
(Intercept)  7.6277466  8.2102682  0.9290496 3.610954e-01
am           2.5728751  1.6225267  1.5857213 1.244465e-01
wt          -3.8039824  0.7592452 -5.0102160 2.963097e-05
qsec         1.1958078  0.2995350  3.9922141 4.517773e-04
drat         0.6429296  1.3551408  0.4744375 6.390028e-01

par(mfrow=c(2,2))
plot(fitbest)
[Figure: diagnostic plots for fitbest in a 2 x 2 grid. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labeled points include Chrysler Imperial, Fiat 128, and Merc 230.]
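Since one goal of the analysis is to quantify the MPG difference, it may help to report an interval around the 2.9 MPG estimate from the multivariate model and not just the point value. The sketch below uses confint from base R on the refitted model; the names am_est and am_ci are introduced here for illustration and are not part of the original report.

```r
# Refit the report's best model on the built-in mtcars data
fitbest <- lm(mpg ~ am + wt + qsec, data = mtcars)

# Point estimate of the manual-vs-automatic difference,
# with weight and 1/4 mile time held constant
am_est <- unname(coef(fitbest)["am"])

# 95% confidence interval for that difference
am_ci <- confint(fitbest, parm = "am", level = 0.95)

print(round(am_est, 2))
print(round(am_ci, 2))
```

With a p-value of about 0.047 for am, the lower bound of this interval sits only just above 0, so the adjusted advantage of manual transmissions is much less certain than the unadjusted 7.2 MPG gap suggests.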