Lecture 3: Measure of Central Tendency

Donglei Du (ddu@unb.edu)
Faculty of Business Administration, University of New Brunswick, NB Canada Fredericton E3B 9Y2
Donglei Du (UNB) ADM 2623: Business Statistics

Table of contents
1 Measure of central tendency: location parameter
- Introduction
- Arithmetic Mean
- Weighted Mean (WM)
- Median
- Mode
- Geometric Mean
- Mean for grouped data
- The Median for Grouped Data
- The Mode for Grouped Data
2 Discussion: How to lie with averages? Or how to defend yourself against those who lie with averages?

Section 1 Measure of central tendency: location parameter

Subsection 1 Introduction

Introduction
Measures of central tendency characterize the average or typical behavior of the data. There are many types of central tendency measures:
- Arithmetic mean
- Weighted arithmetic mean
- Geometric mean
- Median
- Mode

Subsection 2 Arithmetic Mean

Arithmetic Mean
The arithmetic mean (AM) of a set of n numbers x_1, ..., x_n is
$$AM = \frac{x_1 + \cdots + x_n}{n}.$$
The arithmetic mean for a population of size N and for a sample of size n:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

Example
A sample of five executives received the following bonuses last year (in thousands of dollars): 14.0, 15.0, 17.0, 16.0, 15.0.
Problem: Determine the average bonus given last year.
Solution:
$$\bar{x} = \frac{14 + 15 + 17 + 16 + 15}{5} = \frac{77}{5} = 15.4.$$
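As a quick check of the formula, the sketch below computes the sample mean of the five bonuses in base R twice: once by hand (sum divided by count) and once with the built-in `mean()`. The vector name `bonus` is our own choice for illustration.

```r
# Bonuses of the five executives, in thousands of dollars
bonus <- c(14.0, 15.0, 17.0, 16.0, 15.0)

# Arithmetic mean by the formula: sum of the values divided by n
am_by_hand <- sum(bonus) / length(bonus)

# Built-in arithmetic mean
am_builtin <- mean(bonus)

am_by_hand  # 15.4
am_builtin  # 15.4
```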

Example: the weight example (weight.csv)
The R code:
    weight <- read.csv("weight.csv")
    sec_01a <- weight$weight.01a.2013fall
    # Mean
    mean(sec_01a)
    ## [1] 155.8548

Will Rogers phenomenon
Consider two sets of IQ scores of famous people.
- Group 1: Albert Einstein 160, Bill Gates 160, Sir Isaac Newton 190 (mean 170)
- Group 2: John F. Kennedy 117, George Washington 118, Abraham Lincoln 128 (mean 121)
Let us move Bill Gates from the first group to the second group:
- Group 1: Albert Einstein 160, Sir Isaac Newton 190 (mean 175)
- Group 2: John F. Kennedy 117, Bill Gates 160, George Washington 118, Abraham Lincoln 128 (mean 130.75)

Will Rogers phenomenon
The example above shows the Will Rogers phenomenon: "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states." Both means rise because Bill Gates's score (160) is below the Group 1 mean (170) but above the Group 2 mean (121).
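The phenomenon can be reproduced numerically. The following base-R sketch uses the IQ values as listed in the Will Rogers example; indexing by name is just one convenient way to move Bill Gates between the groups, and the variable names are ours.

```r
# IQ scores from the example, before the move
group1 <- c(Einstein = 160, Gates = 160, Newton = 190)
group2 <- c(Kennedy = 117, Washington = 118, Lincoln = 128)
before <- c(mean(group1), mean(group2))   # 170, 121

# Move Bill Gates from group 1 to group 2
group2 <- c(group2, group1["Gates"])
group1 <- group1[names(group1) != "Gates"]
after <- c(mean(group1), mean(group2))    # 175, 130.75

# Both group means rise, although no individual score changed
all(after > before)   # TRUE
```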

Properties of Arithmetic Mean
- It requires at least the interval scale.
- All values are used.
- It is unique.
- It is easy to calculate and allows easy mathematical treatment.
- The sum of the deviations from the mean is 0. The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero!
- It is easily affected by extremes, such as very big or very small numbers in the set (non-robust).

The sum of the deviations from the mean is 0: an illustration
Values      Deviations
3           -2
4           -1
8            3
Mean = 5    Sum = 0
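The zero-sum property from the illustration is easy to verify in base R. This sketch reuses the values 3, 4, 8; the second data set, 0.1, 0.2, 0.7, is our own and is included only to show that with non-integer data, floating-point rounding may leave a tiny nonzero residue, so a tolerance-based comparison is safer than `==`.

```r
values <- c(3, 4, 8)
deviations <- values - mean(values)   # -2 -1 3
sum(deviations)                       # 0

# With non-integer data, floating-point rounding can leave a tiny residue,
# so compare with a tolerance instead of ==
z <- c(0.1, 0.2, 0.7)
isTRUE(all.equal(sum(z - mean(z)), 0))   # TRUE
```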

How Do Extremes Affect the Arithmetic Mean?
The mean of the values 1, 1, 1, 1, 100 is 20.8. However, 20.8 does not represent the typical behavior of this data set! Values that are extreme relative to the rest of the data are called outliers. Examining data for possible outliers serves many useful purposes, including:
- Identifying strong skew in the distribution.
- Identifying data collection or entry errors.
- Providing insight into interesting properties of the data.
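A short base-R sketch makes the effect of the single outlier concrete: it compares the mean with and without the value 100, and also shows the median (discussed in the Median subsection of this lecture), which stays at the typical value.

```r
x <- c(1, 1, 1, 1, 100)

mean(x)            # 20.8, pulled up by the single value 100
mean(x[x != 100])  # 1, the mean once the outlier is dropped
median(x)          # 1, the median is not affected by the outlier
```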

Subsection 3 Weighted Mean (WM)

Weighted Mean (WM)
The weighted mean (WM) of a set of n numbers x_1, ..., x_n with weights w_1, ..., w_n is
$$WM = \frac{w_1 x_1 + \cdots + w_n x_n}{w_1 + \cdots + w_n}.$$
This formula will be used to calculate the mean and variance for grouped data!

Example
During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold:
- five drinks for $0.50
- fifteen for $0.75
- fifteen for $0.90
- fifteen for $1.10
Problem: Compute the weighted mean of the price of the drinks.
$$WM = \frac{5(0.50) + 15(0.75) + 15(0.90) + 15(1.10)}{5 + 15 + 15 + 15} = \frac{43.75}{50} = 0.875.$$

Example: the above example
The R code:
    ## weighted mean
    wt <- c(5, 15, 15, 15)/50   # relative frequencies used as weights
    x <- c(0.5, 0.75, 0.90, 1.1)
    weighted.mean(x, wt)
    ## [1] 0.875
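To connect the R call back to the WM formula, this sketch computes the same weighted mean by hand as sum(w * x) / sum(w). Because `weighted.mean()` divides by the sum of the weights itself, the raw counts can be passed directly without first dividing by 50. The names `price` and `count` are ours.

```r
price <- c(0.50, 0.75, 0.90, 1.10)   # price per drink ($)
count <- c(5, 15, 15, 15)            # number of drinks sold at each price

# Weighted mean by the formula: sum(w * x) / sum(w)
wm_by_hand <- sum(count * price) / sum(count)   # 43.75 / 50 = 0.875

# weighted.mean() normalizes by sum(w), so raw counts work as weights
wm_builtin <- weighted.mean(price, count)       # 0.875
```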

Subsection 4 Median

Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest. Equivalently, the median is a number that divides the data set into two equal parts: each item in one part is no more than this number, and each item in the other part is no less than this number.

Two-step process to find the median
Step 1. Sort the data in nondecreasing order.
Step 2. If the total number of items n is odd, the number in position (n+1)/2 is the median. If n is even, the average of the two numbers in positions n/2 and n/2+1 is the median. (For ordinal data, choose either of the two middle values.)

Examples
Example: The ages for a sample of five college students are 21, 25, 19, 20, 22. Arranging the data in ascending order gives 19, 20, 21, 22, 25. The median is 21.
Example: The heights of four basketball players, in inches, are 76, 73, 80, 75. Arranging the data in ascending order gives 73, 75, 76, 80. The median is the average of the two middle numbers:
$$\text{Median} = \frac{75 + 76}{2} = 75.5.$$
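The two-step process translates directly into a small function. The sketch below defines a hypothetical helper `median_two_step()` (not part of base R) for numeric data and checks it against the two examples and against the built-in `median()`.

```r
# Hypothetical helper implementing the two-step process for numeric data
median_two_step <- function(x) {
  x <- sort(x)                        # Step 1: sort in nondecreasing order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # Step 2, n odd: the middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # Step 2, n even: average the two middle values
  }
}

ages    <- c(21, 25, 19, 20, 22)
heights <- c(76, 73, 80, 75)

median_two_step(ages)     # 21
median_two_step(heights)  # 75.5
median(heights)           # 75.5, agrees with the built-in
```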

One more example
Example: Earthquake intensities are measured using a device called a seismograph, which is designed to be most sensitive for earthquakes with intensities between 4.0 and 9.0 on the open-ended Richter scale. Measurements of nine earthquakes gave the following readings:
4.5, L, 5.5, H, 8.7, 8.9, 6.0, H, 5.2
where L indicates that the earthquake had an intensity below 4.0 and H indicates that the earthquake had an intensity above 9.0.
Problem: What is the median earthquake intensity of the sample?
Solution:
Step 1. Sort: L, 4.5, 5.2, 5.5, 6.0, 8.7, 8.9, H, H
Step 2. With n = 9, the median is the value in position (9+1)/2 = 5, so the median is 6.0.
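Because the median depends only on the order of the values, the censored readings can still be handled. In the sketch below we code L as `-Inf` and H as `Inf`; this coding is our own assumption for illustration and preserves only the ranks of those readings. The median comes out as 6.0, while the mean is undefined (`NaN`) because it needs the actual magnitudes.

```r
# Censored readings: L (below 4.0) coded as -Inf, H (above 9.0) coded as Inf
quakes <- c(4.5, -Inf, 5.5, Inf, 8.7, 8.9, 6.0, Inf, 5.2)

sort(quakes)     # -Inf 4.5 5.2 5.5 6.0 8.7 8.9 Inf Inf
median(quakes)   # 6, only the order of the values matters
mean(quakes)     # NaN, the mean needs the actual magnitudes
```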

Example: the weight example (weight.csv)
The R code:
    weight <- read.csv("weight.csv")
    sec_01a <- weight$weight.01a.2013fall
    # Median
    median(sec_01a)
    ## [1] 155

Properties of Median
- It requires at least the ordinal scale.
- All values are used.
- It is unique.
- It is easy to calculate but does not allow easy mathematical treatment.
- It is not affected by extremely large or small numbers (robust).

Subsection 5 Mode

Mode
The mode is the value that occurs with the highest frequency.

Example
The exam scores for ten students are 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. The score of 81 occurs most often (three times), so it is the mode!

Example: the weight example (weight.csv)
The R code:
    weight <- read.csv("weight.csv")
    sec_01a <- weight$weight.01a.2013fall
    # Mode: the most frequent value in the frequency table
    # (which.max() reports only the first mode if several values tie)
    names(table(sec_01a)[which.max(table(sec_01a))])
    ## [1] "155"

Properties of Mode
- Even nominal data have mode(s).
- All values are used.
- It is not necessarily unique:
  - Modeless: if all data have different values, such as 1, 2, 3.
  - Multimodal: if more than one value has the highest frequency, such as 1, 1, 2, 2, 3.
- It is easy to calculate but does not allow easy mathematical treatment.
- It is not affected by extremely large or small numbers (robust).
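Since `which.max()` returns only the first maximum, the weight-example code cannot reveal a multimodal data set. The sketch below defines a hypothetical helper `modes()` that returns every value tied for the highest frequency, as character labels produced by `table()`. The convention that data in which every distinct value occurs equally often are "modeless" (returning `character(0)`) is our assumption, chosen to match the properties listed above.

```r
# Hypothetical helper: return every mode of x as character labels.
# Assumed convention: if every distinct value occurs equally often,
# the data are treated as modeless and character(0) is returned.
modes <- function(x) {
  tab <- table(x)
  counts <- as.vector(tab)
  values <- names(tab)
  if (length(counts) > 1 && all(counts == counts[1])) {
    return(character(0))            # modeless, e.g. 1, 2, 3
  }
  values[counts == max(counts)]     # one or more modes
}

modes(c(81, 93, 84, 75, 68, 87, 81, 75, 81, 87))  # "81"
modes(c(1, 1, 2, 2, 3))                           # "1" "2" (bimodal)
modes(c(1, 2, 3))                                 # character(0) (modeless)
modes(c("red", "blue", "red"))                    # "red" (nominal data)
```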

Subsection 6 Geometric Mean

Geometric Mean (GM)
Given a set of n positive numbers x_1, ..., x_n, the geometric mean is given by the following formula:
$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}.$$
If we know only the initial and final values over n periods (instead of the individual numbers in each period), then
$$GM = \sqrt[n]{\frac{\text{final value}}{\text{initial value}}}.$$

Example
The interest rates on three bonds were 5%, 21%, and 4%. Suppose you invested $10,000 in the first bond at the beginning, then switched to the second bond the following year, and switched again to the third bond the year after.
Problem: What is your final wealth after three years?
Solution: Your final wealth will be
$$10{,}000 \times GM^3 = 10{,}000 \times 1.05 \times 1.21 \times 1.04 \approx 13{,}213.2,$$
where
$$GM = \sqrt[3]{1.05 \times 1.21 \times 1.04} \approx 1.097.$$

Example: the above example
The R code:
    # Geometric mean: base R does not have a built-in function for it.
    # You can install and use another package, e.g. psych:
    library(psych)
    rates <- c(1.05, 1.21, 1.04)
    geometric.mean(rates)
    ## [1] 1.097327
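If installing a package is not an option, the geometric mean takes one line of base R. The sketch below defines a hypothetical helper `geo_mean()` (not a base-R or psych function) that works on the log scale, exp(mean(log(x))), which equals the n-th root of the product but avoids overflow when many factors are multiplied. It reproduces the bond example.

```r
# Hypothetical helper (not part of base R): geometric mean of positive
# numbers on the log scale; equals the n-th root of the product
geo_mean <- function(x) {
  stopifnot(all(x > 0))
  exp(mean(log(x)))
}

growth <- c(1.05, 1.21, 1.04)          # yearly growth factors of the three bonds
geo_mean(growth)                        # about 1.097327
prod(growth)^(1 / length(growth))       # the same value via the n-th root
10000 * geo_mean(growth)^3              # final wealth, about 13213.2
```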

Example
The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
Problem: What is the geometric mean rate of increase?
Solution: The geometric mean over these 8 years is
$$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} \approx 1.0127.$$
Therefore, the geometric mean rate of increase is about 1.27% per year.
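The initial/final form of the formula can be checked in base R. This sketch computes the 8-year geometric mean for the enrolment figures and confirms that compounding it over 8 years recovers the final value; the variable names are ours.

```r
initial <- 755000          # female enrolment in 1992
final   <- 835000          # female enrolment in 2000
years   <- 2000 - 1992     # 8 periods

gm   <- (final / initial)^(1 / years)   # about 1.0127
rate <- gm - 1                          # about 0.0127, i.e. 1.27% per year

# Compounding the mean growth factor over 8 years recovers the final value
initial * gm^years                      # 835000 (up to rounding)
```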

Arithmetic Mean vs Geometric Mean: the AM-GM inequality
If x_1, ..., x_n ≥ 0, then
$$AM = \frac{x_1 + x_2 + \cdots + x_n}{n} \ \ge\ \sqrt[n]{x_1 x_2 \cdots x_n} = GM,$$
with equality if and only if x_1 = x_2 = ... = x_n.
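A numerical spot check (not a proof) of the AM-GM inequality is sketched below in base R. The helpers `am()` and `gm()` are hypothetical; `gm()` uses logarithms and therefore assumes strictly positive values, a slightly narrower condition than the x_i ≥ 0 in the inequality. The seed and the uniform range are arbitrary choices.

```r
# Hypothetical helpers for a numerical spot check (not a proof)
am <- function(x) mean(x)
gm <- function(x) exp(mean(log(x)))   # assumes all x > 0

bonds <- c(1.05, 1.21, 1.04)
am(bonds)   # 1.1
gm(bonds)   # about 1.0973, smaller than the AM

# Random positive data: AM >= GM in every trial
set.seed(2623)
holds <- replicate(1000, {
  x <- runif(5, min = 0.1, max = 10)
  am(x) >= gm(x)
})
all(holds)  # TRUE

# Equal values give AM == GM (up to floating-point rounding)
y <- rep(3, 4)
isTRUE(all.equal(am(y), gm(y)))   # TRUE
```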


Case: GM vs AM in fund reporting
A fund manager tries to convince you to invest in their fund by showing you the annual returns over the last five years, 10%, -20%, 30%, 12%, 10%, and claiming that the average return per year realized over the last five years is 8.4%, calculated as follows:
$$AM = \frac{(1 + 0.1) + (1 - 0.2) + (1 + 0.3) + (1 + 0.12) + (1 + 0.1)}{5} = 1.084.$$
This can be misleading. It is much more accurate to say that the average return realized over the last five years with us is approximately 7% per year:
$$GM - 1 = \sqrt[5]{1.10 \times 0.80 \times 1.30 \times 1.12 \times 1.10} - 1 \approx 0.07104408.$$

Example: the above example
The R code:
    # Geometric mean: base R does not have a built-in function for it.
    # You can install and use another package, e.g. psych:
    library(psych)
    rates <- c(1.10, 0.80, 1.30, 1.12, 1.10)
    mean(rates) - 1
    ## [1] 0.084
    geometric.mean(rates) - 1
    ## [1] 0.07104408
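Why the GM is the honest figure becomes clear when each average is compounded for five years and compared with what $1 actually grew to. This base-R sketch (variable names ours) shows that only the GM reproduces the actual outcome, while the 8.4% AM overstates it.

```r
growth <- c(1.10, 0.80, 1.30, 1.12, 1.10)   # yearly growth factors, -20% in year 2

actual   <- prod(growth)                    # about 1.4094: $1 became about $1.41
am_claim <- mean(growth)^5                  # about 1.4967 if 8.4% had compounded
gm_claim <- exp(mean(log(growth)))^5        # about 1.4094, matches the actual outcome
```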

Properties of Geometric Mean
- Similar to the arithmetic mean, except that it is used in different scenarios (such as average growth rates and returns).
- It requires ratio-level data with positive values.
- All values are used.
- It is unique.
- It is easy to calculate and allows easy mathematical treatment.

Subsection 7 Mean for grouped data

Mean for grouped data
The mean of a sample of data organized in a frequency distribution is computed by the following formula:
$$\bar{x} = \frac{f_1 x_1 + \cdots + f_k x_k}{f_1 + \cdots + f_k},$$
where f_i is the frequency of Class i and x_i is the class midpoint of Class i.

Example
Recall the weight example from Chapter 2:
Class         Freq. (f_i)   Midpoint (x_i)   f_i x_i
[130, 140)    3             135              405
[140, 150)    12            145              1740
[150, 160)    23            155              3565
[160, 170)    14            165              2310
[170, 180)    6             175              1050
[180, 190]    4             185              740
Total         62                             9810
The mean for the grouped data is
$$\bar{x} = \frac{9810}{62} \approx 158.2258.$$
The actual mean of the raw data is 155.8548.
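The grouped-data mean is simply a weighted mean of the class midpoints with the class frequencies as weights. This base-R sketch reproduces the table's totals from the frequencies and midpoints listed above.

```r
freq     <- c(3, 12, 23, 14, 6, 4)
midpoint <- c(135, 145, 155, 165, 175, 185)

sum(freq * midpoint)                                # 9810
grouped_mean <- sum(freq * midpoint) / sum(freq)    # about 158.2258

# Equivalently, a weighted mean of the midpoints with frequencies as weights
weighted.mean(midpoint, freq)
```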

Subsection 8 The Median for Grouped Data

Median for grouped data: Two-step procedure
Step 1: Identify the median class, which is the class that contains the item in position n/2.
Step 2: Estimate the median value within the median class using the following formula, which assumes that the items in the median class are spread evenly across it:
$$\text{median} = L + C \cdot \frac{\frac{n}{2} - CF}{f},$$
where
- L is the lower limit of the median class,
- CF is the cumulative frequency before the median class,
- f is the frequency of the median class,
- C is the class interval (class width).

Example
Recall the weight example from Chapter 2:
Class         Freq.   Relative freq.   Cumulative freq.
[130, 140)    3       0.05             3
[140, 150)    12      0.19             15
[150, 160)    23      0.37             38    (median class, C = 10)
[160, 170)    14      0.23             52
[170, 180)    6       0.10             58
[180, 190]    4       0.06             62
$$\text{median} = 150 + 10 \cdot \frac{\frac{62}{2} - 15}{23} \approx 156.9565.$$
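The two-step grouped-median procedure can be sketched in base R as below. `grouped_median()` is a hypothetical helper, and it assumes equal-width classes described by their lower limits; it locates the median class with cumulative frequencies and then applies the interpolation formula.

```r
# Hypothetical helper: grouped median for equal-width classes
grouped_median <- function(lower, freq, width) {
  n  <- sum(freq)
  cf <- cumsum(freq)
  k  <- which(cf >= n / 2)[1]                   # Step 1: the median class
  cf_before <- if (k == 1) 0 else cf[k - 1]     # CF before the median class
  lower[k] + width * (n / 2 - cf_before) / freq[k]   # Step 2: interpolate
}

lower <- c(130, 140, 150, 160, 170, 180)
freq  <- c(3, 12, 23, 14, 6, 4)
grouped_median(lower, freq, width = 10)   # about 156.9565
```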

An explanation of the median formula [figure; labels: L, U, C, f, CF, n/2]

Subsection 9 The Mode for Grouped Data

Mode for grouped data: Two-step procedure
Step 1: Identify the modal class(es), i.e., the class(es) with the highest frequency.
Step 2: Estimate the mode(s) as the midpoint(s) of the modal class(es).

Example
Recall the weight example from Chapter 2:
Class         Freq. (f_i)   Midpoint (x_i)
[130, 140)    3             135
[140, 150)    12            145
[150, 160)    23            155
[160, 170)    14            165
[170, 180)    6             175
[180, 190]    4             185
The modal class is [150, 160), so mode = 155.
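The grouped-mode procedure amounts to two lines of base R: find the class(es) with the highest frequency, then report their midpoints. The sketch uses the frequencies and midpoints from the table; using `which()` rather than `which.max()` keeps every modal class if several tie.

```r
freq     <- c(3, 12, 23, 14, 6, 4)
midpoint <- c(135, 145, 155, 165, 175, 185)

# Step 1: modal class(es) = class(es) with the highest frequency
modal <- which(freq == max(freq))
# Step 2: estimate each mode by the class midpoint
midpoint[modal]   # 155
```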

Section 2 Discussion: How to lie with averages? Or how to defend yourself against those who lie with averages?

Lie with averages
There are many different interpretations of averages:
- Arithmetic mean vs geometric mean: be careful with investment fund statements.
- Mean vs median: be careful with accounting statements [Huff, 2010].

References
Huff, D. (2010). How to Lie with Statistics. W. W. Norton & Company.