Motor Trend MPG Analysis

SJ
May 15, 2016

Executive Summary

For this project, we were asked to look at a data set covering a collection of cars in the automobile industry. We explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in the following two questions:

1. Is an automatic or manual transmission better for MPG?
2. How large is the MPG difference between automatic and manual transmissions?

Following are the steps I took to conduct this study:

Data Loading

    data(mtcars)
    str(mtcars)

    'data.frame': 32 obs. of 11 variables:
     $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
     $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
     $ disp: num 160 160 108 258 360 ...
     $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
     $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
     $ wt  : num 2.62 2.88 2.32 3.21 3.44 ...
     $ qsec: num 16.5 17 18.6 19.4 17 ...
     $ vs  : num 0 0 1 1 0 1 0 1 1 1 ...
     $ am  : num 1 1 1 0 0 0 0 0 0 0 ...
     $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
     $ carb: num 4 4 1 1 2 1 4 2 2 4 ...

A quick look at the data with str(mtcars) and ?mtcars indicates that some variables are categorical and need to be converted to factors. Those variables are cyl, vs, gear, carb, and am.

    mtcars$cyl <- factor(mtcars$cyl)
    mtcars$vs <- factor(mtcars$vs)
    mtcars$gear <- factor(mtcars$gear)
    mtcars$carb <- factor(mtcars$carb)
    mtcars$am <- factor(mtcars$am, labels = c('automatic', 'manual'))

Exploratory data analysis

    summary(mtcars)

          mpg        cyl        disp             hp             drat
     Min.   :10.40   4:11   Min.   : 71.1   Min.   : 52.0   Min.   :2.760
     1st Qu.:15.43   6: 7   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080
     Median :19.20   8:14   Median :196.3   Median :123.0   Median :3.695
     Mean   :20.09          Mean   :230.7   Mean   :146.7   Mean   :3.597
     3rd Qu.:22.80          3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920
     Max.   :33.90          Max.   :472.0   Max.   :335.0   Max.   :4.930

           wt             qsec       vs             am      gear   carb
     Min.   :1.513   Min.   :14.50   0:18   Automatic:19   3:15   1: 7
     1st Qu.:2.581   1st Qu.:16.89   1:14   Manual   :13   4:12   2:10
     Median :3.325   Median :17.71                         5: 5   3: 3
     Mean   :3.217   Mean   :17.85                                4:10
     3rd Qu.:3.610   3rd Qu.:18.90                                6: 1
     Max.   :5.424   Max.   :22.90                                8: 1

Please see the scatterplot matrix in the appendix.

Regression model

Stepwise model selection by backward elimination was used to choose the predictors, because it fits a sequence of candidate models and keeps the one with the lowest AIC.

    full.model <- lm(mpg ~ ., data = mtcars)
    best.model <- step(full.model, direction = "backward")  # results not shown because of page limitation
    summary(best.model)

    Call:
    lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.9387 -1.2560 -0.4013  1.1253  5.0513

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
    cyl6        -3.03134    1.40728  -2.154  0.04068 *
    cyl8        -2.16368    2.28425  -0.947  0.35225
    hp          -0.03211    0.01369  -2.345  0.02693 *
    wt          -2.49683    0.88559  -2.819  0.00908 **
    ammanual     1.80921    1.39630   1.296  0.20646
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.41 on 26 degrees of freedom
    Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
    F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
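Since the stepwise output was omitted for space, the following is a minimal sketch of how the selection could be rerun quietly and checked. It works on a copy of the data named cars (a name introduced here for illustration, not used in the report) and relies on `trace = 0` to suppress the step-by-step printout; `AIC()` is used only to confirm that the reduced model scores lower than the full one.

```r
# Sketch: rerun the backward elimination without printing every step.
# 'cars' is an illustrative copy so the built-in mtcars stays untouched.
data(mtcars)
cars <- mtcars
for (v in c("cyl", "vs", "gear", "carb")) {
  cars[[v]] <- factor(cars[[v]])
}
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

full.model <- lm(mpg ~ ., data = cars)

# step() drops one term at a time while the AIC keeps improving;
# trace = 0 silences the intermediate tables.
best.model <- step(full.model, direction = "backward", trace = 0)

# Terms retained by the selection
kept.terms <- attr(terms(best.model), "term.labels")
print(kept.terms)

# Lower AIC is better; the reduced model should beat the full one
print(AIC(full.model, best.model))
```

Backward elimination is greedy, so the retained model is the best one found along a single path, not necessarily the best of all possible subsets.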

Interpretation: The backward elimination retains cyl, hp, wt, and am as predictors (overall F-test p-value < 0.001). The adjusted R-squared indicates that about 84% of the variance in mpg is explained by the final model. Holding the other predictors fixed, the model estimates that mpg is lower for 6- and 8-cylinder cars than for 4-cylinder cars (by 3.03 and 2.16 mpg, respectively), decreases by 0.03 mpg per additional unit of horsepower, and decreases by about 2.5 mpg for every additional 1,000 lb of weight. On the other hand, mpg is estimated to be 1.8 higher for cars with a manual transmission, although this coefficient is not statistically significant at the 5% level (p = 0.21). Residual plots (see appendix) suggest that some transformation may be necessary to achieve linearity.

Transmission type differences

Constructed a boxplot of mpg per transmission type (see appendix) and conducted a t-test as follows:

    t.test(mpg ~ am, data = mtcars)

        Welch Two Sample t-test

    data: mpg by am
    t = -3.7671, df = 18.332, p-value = 0.001374
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -11.280194  -3.209684
    sample estimates:
    mean in group Automatic    mean in group Manual
                   17.14737                24.39231

Interpretation: The boxplots show a difference in mpg depending on the type of transmission. The t-test output confirms that this difference is statistically significant (p-value < 0.05): manual cars average 24.39 mpg against 17.15 mpg for automatic cars, a difference of about 7.2 mpg.

Conclusions

According to these results, cars with a manual transmission are better for mpg than cars with an automatic transmission. The rate of change of the conditional mean mpg with respect to am is about 1.8, and we are 95% confident that this value lies between -1.06 and 4.68. Because this interval includes zero, the difference that remains after adjusting for cylinders, horsepower, and weight is not statistically significant, and it is much smaller than the unadjusted difference of about 7.2 mpg.

There are however some limitations to this study. To name a few:

- Conducting this study with the base package only makes it difficult to dig deeper into this assignment.
- The residual plots show a lack of linearity. This could have been addressed by transforming the variables, and packages other than base would have made it easier to determine which transformations are necessary.
- The sample size is very small, which is a limitation by itself for statistical inference.
- Being allowed only 5 pages (including 2 pages or less for the main text) to conduct this study is another.
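The interval quoted in the conclusions can be reproduced from the fitted model. The sketch below assumes the final model reported in this study (mpg ~ cyl + hp + wt + am) and refits it on an illustrative copy of the data named cars. It also fits a transmission-only model, named unadjusted here for illustration, to show how far the estimate moves once the other predictors are included.

```r
# Sketch: confidence interval for the transmission effect, with and
# without adjustment. 'cars' and 'unadjusted' are illustrative names.
data(mtcars)
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Final model reported in the study
best.model <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% interval for the manual-vs-automatic coefficient
ci.am <- confint(best.model, "ammanual", level = 0.95)
print(ci.am)

# Transmission-only model: its slope is the raw difference in group means
unadjusted <- lm(mpg ~ am, data = cars)
print(coef(unadjusted)["ammanual"])

# F-test of whether cyl, hp and wt add explanatory power beyond am alone
print(anova(unadjusted, best.model))
```

Comparing the two coefficients side by side makes it clear how much of the raw manual-versus-automatic gap is associated with weight, horsepower, and cylinder count in this sample.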

Appendix - Supporting figures

Scatterplot matrix of the "mtcars" dataset

    pairs(mpg ~ ., data = mtcars)

Residual plots of the best model as evaluated by the stepwise regression

    par(mfrow = c(2, 2))
    plot(best.model)
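As a possible follow-up to the residual plots, the sketch below compares residuals against fitted values for the reported model and for a refit with log(mpg) as the response. The log transformation is an assumption made here for illustration only; the study does not identify which transformation, if any, would be appropriate. The names cars, log.model, and plot.resid are likewise introduced for this example.

```r
# Sketch: residual-vs-fitted comparison, original vs. log response.
# The log transform is an illustrative assumption, not the study's method.
data(mtcars)
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

best.model <- lm(mpg ~ cyl + hp + wt + am, data = cars)
log.model <- lm(log(mpg) ~ cyl + hp + wt + am, data = cars)

# Residual range of the reported model, for comparison with its summary
res <- residuals(best.model)
print(range(res))

# Hypothetical helper: scatter of residuals with a lowess smoother;
# a flatter smoother suggests less remaining curvature.
plot.resid <- function(model, title) {
  f <- fitted(model)
  r <- residuals(model)
  plot(f, r, main = title, xlab = "fitted values", ylab = "residuals")
  abline(h = 0, lty = 2)
  lines(lowess(f, r), col = "red")
}

par(mfrow = c(1, 2))
plot.resid(best.model, "mpg")
plot.resid(log.model, "log(mpg)")
```

The two panels are on different scales, so only the shape of the smoother should be compared, not the size of the residuals.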

Boxplot of miles per gallon by transmission type

    boxplot(mpg ~ am, data = mtcars, col = "blue", ylab = "miles per gallon")