Descriptive Statistics


Chapter 2 Descriptive Statistics

2-1 Overview
2-2 Summarizing Data
2-3 Pictures of Data
2-4 Measures of Central Tendency
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis
Review and Projects

2-1 Overview

Descriptive statistics summarizes or describes the important characteristics of a known set of population data.

Inferential statistics uses sample data to make inferences about a population.

Important Characteristics of Data
1. Nature or shape of the distribution, such as bell-shaped, uniform, or skewed
2. Representative score, such as an average
3. Measure of scattering or variation

2-2 Summarizing Data With Frequency Tables

A frequency table lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category.

Table 2-1 Axial Loads of 0.0109 in. Cans (175 values)

270 278 250 278 290 274 242 269 257 272 265 263 234 270 273 270 277 294 279 268 230 268 278 268 262
273 201 275 260 286 272 284 282 278 268 263 273 282 285 289 268 208 292 275 279 276 242 285 273 268
258 264 281 262 278 265 241 267 295 283 281 209 276 273 263 218 271 289 223 217 225 283 292 270 262
204 265 271 273 283 275 276 282 270 256 268 259 272 269 270 251 208 290 220 259 282 277 282 256 293
254 223 263 274 262 263 200 272 268 206 280 287 257 284 279 252 280 215 281 291 276 285 287 297 290
228 274 277 286 277 251 278 277 286 277 289 269 267 276 206 284 269 284 268 291 289 293 277 280 274
282 230 275 236 295 289 283 261 262 252 283 277 204 286 270 278 270 283 272 281 288 248 266 256 292

Table 2-2 Frequency Table of Axial Loads of Aluminum Cans

Axial Load   Frequency
200-209          9
210-219          3
220-229          5
230-239          4
240-249          4
250-259         14
260-269         32
270-279         52
280-289         38
290-299         14

Frequency Table Definitions
Class: an interval.
Lower class limit: the left (lower) endpoint of a class.
Upper class limit: the right (upper) endpoint of a class.
Class mark: the midpoint of a class.
Class width: the difference between two consecutive lower class limits.

Definition Values for the Example (Table 2-2)
Lower class limits: 200, 210, ..., 290
Upper class limits: 209, 219, ..., 299
Class marks: 204.5 = (200 + 209)/2, 214.5, ..., 294.5
Class width: 210 - 200 = 10
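The definition values can be recomputed mechanically. This is a minimal Python sketch that assumes equal-width integer classes stored as (lower, upper) pairs. The helper name `class_definitions` is hypothetical and does not come from the text.

```python
# Table 2-2 classes as (lower class limit, upper class limit) pairs.
classes = [(200 + 10 * i, 209 + 10 * i) for i in range(10)]


def class_definitions(classes):
    """Return lower limits, upper limits, class marks, and the class width."""
    lower = [lo for lo, _ in classes]
    upper = [hi for _, hi in classes]
    # Class mark: midpoint of a class, (lower limit + upper limit) / 2.
    marks = [(lo + hi) / 2 for lo, hi in classes]
    # Class width: difference between two consecutive lower class limits.
    width = lower[1] - lower[0]
    return lower, upper, marks, width


lower, upper, marks, width = class_definitions(classes)
print(lower[:2], upper[:2], marks[:2], width)  # [200, 210] [209, 219] [204.5, 214.5] 10
```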

Exercise: Determine the definition values (classes, lower class limits, upper class limits, class marks, class width) for this frequency table.

Quiz Scores   Frequency
0-4               2
5-9               5
10-14             8
15-19            11
20-24             7

Constructing a Frequency Table
1. Decide on the number of classes.
2. Determine the class width by dividing the range by the number of classes and rounding up, where range = highest score - lowest score:
   $\text{class width} = \left\lceil \frac{\text{range}}{\text{number of classes}} \right\rceil$
3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit, and keep adding it to get the remaining lower class limits.
5. List the lower class limits in a vertical column and enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class. Total the tally marks to find the total frequency for each class.
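The steps above can be sketched in Python. This sketch assumes integer scores, so each upper limit is the next lower limit minus 1. The function name `frequency_table` and the small sample `scores` are hypothetical. The function also checks the guideline that the class frequencies must add up to the number of data values.

```python
import math


def frequency_table(data, num_classes, start=None):
    """Build an equal-width frequency table for integer scores.

    Returns the class width and a list of (lower limit, upper limit, frequency).
    """
    lowest, highest = min(data), max(data)
    # Step 2: class width = range / number of classes, rounded up.
    width = math.ceil((highest - lowest) / num_classes)
    # Step 3: start at the lowest score unless a convenient smaller value is given.
    first = lowest if start is None else start
    table = []
    for i in range(num_classes):
        lower = first + i * width        # Step 4: successive lower class limits
        upper = lower + width - 1        # Step 5: upper limit (integer scores assumed)
        # Step 6: tally the scores in this class and total the tallies.
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, frequency))
    # Guideline: class frequencies must add up to the number of data values.
    if sum(f for _, _, f in table) != len(data):
        raise ValueError("classes do not cover every score; widen them or add a class")
    return width, table


scores = [3, 7, 8, 12, 15, 15, 18, 21, 24]
width, table = frequency_table(scores, 5)
print(width, table)
```

Applied to Table 2-1 with 10 classes, the range is 297 - 200 = 97. The width is therefore ceil(9.7) = 10, and starting at 200 gives the classes 200-209 through 290-299 of Table 2-2.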

Guidelines for Frequency Tables
1. Classes should be mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.

Relative Frequency Table
$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$

Table 2-3 Relative Frequency Table of Axial Loads
Each relative frequency is computed from Table 2-2, for example 9/175 = 0.051, 3/175 = 0.017, and 5/175 = 0.029.

Axial Load   Relative Frequency
200-209          0.051
210-219          0.017
220-229          0.029
230-239          0.023
240-249          0.023
250-259          0.080
260-269          0.183
270-279          0.297
280-289          0.217
290-299          0.080

Table 2-4 Cumulative Frequency Table of Axial Loads (from Table 2-2)

Axial Load       Cumulative Frequency
Less than 210          9
Less than 220         12
Less than 230         17
Less than 240         21
Less than 250         25
Less than 260         39
Less than 270         71
Less than 280        123
Less than 290        161
Less than 300        175

Frequency Tables (Tables 2-2, 2-3, and 2-4 side by side)

Axial Load   Frequency   Relative Frequency   Cumulative Frequency
200-209          9            0.051           Less than 210:    9
210-219          3            0.017           Less than 220:   12
220-229          5            0.029           Less than 230:   17
230-239          4            0.023           Less than 240:   21
240-249          4            0.023           Less than 250:   25
250-259         14            0.080           Less than 260:   39
260-269         32            0.183           Less than 270:   71
270-279         52            0.297           Less than 280:  123
280-289         38            0.217           Less than 290:  161
290-299         14            0.080           Less than 300:  175
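Tables 2-3 and 2-4 follow directly from the frequencies in Table 2-2. This short Python sketch derives both, with relative frequencies rounded to three decimals as in Table 2-3.

```python
from itertools import accumulate

# Lower class limits and frequencies from Table 2-2.
lower_limits = list(range(200, 300, 10))
frequencies = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]

total = sum(frequencies)  # must equal the number of original data values

# Relative frequency = class frequency / sum of all frequencies (Table 2-3).
relative = [round(f / total, 3) for f in frequencies]

# Cumulative frequency: running total, i.e. scores "less than" the next lower limit (Table 2-4).
cumulative = list(accumulate(frequencies))

for lo, rel, cum in zip(lower_limits, relative, cumulative):
    print(f"{lo}-{lo + 9}  {rel:.3f}  less than {lo + 10}: {cum}")
```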

Mean as a Balance Point
[Figure 2-7: the mean as the balance point of a distribution]

Notation
$\Sigma$ denotes the summation of a set of values.
x is the variable usually used to represent the individual data values.
n represents the number of data values in a sample.
N represents the number of data values in a population.
$\bar{x}$ (pronounced "x-bar") denotes the mean of a set of sample values.
$\mu$ (pronounced "mu") denotes the mean of all values in a population.

Mean
Definition: the value obtained by adding the scores and dividing the total by the number of scores.
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Calculators can calculate the mean of data.

Median
Definition: the middle value when the scores are arranged in order (ascending or descending).
It is often denoted by $\tilde{x}$ (pronounced "x-tilde").
It is not affected by an extreme value.

Example 1: 5 5 5 3 1 5 1 4 3 5 2 (n = 11)
In order: 1 1 2 3 3 4 5 5 5 5 5
There is an exact middle, so the median is 4.

Example 2 (in order): 1 1 3 3 4 5 5 5 5 5 (n = 10)
There is no exact middle; the middle is shared by two numbers, so the median is (4 + 5)/2 = 4.5.
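The median rule for odd and even n can be written in a few lines of Python. This is a minimal sketch; the standard library's `statistics.median` follows the same rule.

```python
def median(scores):
    """Middle value of the ranked scores (mean of the two middle values if n is even)."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                        # exact middle
    return (ranked[mid - 1] + ranked[mid]) / 2    # middle shared by two numbers


print(median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]))  # 4
print(median([1, 1, 3, 3, 4, 5, 5, 5, 5, 5]))     # 4.5
```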

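Mode
Definition: the score that occurs most frequently.
Bimodal: two scores occur with the same greatest frequency.
Multimodal: more than two scores occur with the same greatest frequency.
No mode: no score is repeated, or every score occurs equally often (as in example e below).
The mode is the only measure of central tendency that can be used with nominal data.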

Examples
a. 5 5 5 3 1 5 1 4 3 5: the mode is 5.
b. 2 2 2 3 4 5 6 6 6 7 9: bimodal (2 and 6).
c. 2 3 6 7 8 9 10: no mode.
d. 2 2 3 3 3 4: the mode is 3.
e. 2 2 3 3 4 4 5 5: no mode.
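This Python sketch reproduces the mode examples. It assumes the "no mode" convention implied by examples c and e: there is no mode when no score repeats, or when every distinct score occurs equally often. The function name `modes` is hypothetical.

```python
from collections import Counter


def modes(scores):
    """Return the mode(s) as a sorted list; an empty list means "no mode"."""
    counts = Counter(scores)
    top = max(counts.values())
    # No mode: nothing repeats, or all distinct scores share the same frequency.
    if top == 1 or (len(counts) > 1 and len(set(counts.values())) == 1):
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # [2, 6]  (bimodal)
```

The standard library's `statistics.multimode` returns every value when nothing repeats. It therefore does not apply the "no mode" convention used here.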

Midrange
Definition: the value halfway between the highest and lowest scores.
$\text{midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$

Round-off Rule for Measures of Central Tendency
Carry one more decimal place than is present in the original set of data.

An Example of Skewness
Dataset 1: 3, 4, 4, 5, 5, 5, 6, 6, 7. Mean = 5, median = 5. Symmetric.
Dataset 2: 3, 4, 4, 5, 5, 5, 7, 7, 9. Mean = 5.444, median = 5. Skewed right.
Dataset 3: 2, 3, 3, 5, 5, 5, 6, 6, 7. Mean = 4.667, median = 5. Skewed left.
[Figure: frequency histograms of datasets 1, 2, and 3]
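The three datasets can be checked with Python's `statistics` module. The comparison of mean and median used here is only a rough rule of thumb, not a formal test of skewness; the helper `skew_direction` is hypothetical.

```python
from statistics import mean, median


def skew_direction(data):
    """Rough heuristic only: compare the mean with the median."""
    m, md = mean(data), median(data)
    if m > md:
        return "skewed right"   # mean pulled toward a long right tail
    if m < md:
        return "skewed left"    # mean pulled toward a long left tail
    return "symmetric"


datasets = {
    1: [3, 4, 4, 5, 5, 5, 6, 6, 7],
    2: [3, 4, 4, 5, 5, 5, 7, 7, 9],
    3: [2, 3, 3, 5, 5, 5, 6, 6, 7],
}
for k, d in datasets.items():
    print(k, round(mean(d), 3), median(d), skew_direction(d))
```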

Skewness (Figure 2-8)
(a) Skewed left (negatively): mean, median, mode, in order from left to right.
(b) Symmetric: mode = mean = median.
(c) Skewed right (positively): mode, median, mean, in order from left to right.

Best Measure of Central Tendency
[Table 2-6: advantages and disadvantages of each measure of central tendency]

Mean from a Frequency Table
Use the class mark of each class for the variable x:
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$ (Formula 2-2)
where x = class mark, f = frequency, and $\sum f = n$.

Quiz Scores   Frequency   Class Mark
0-4               2            2
5-9               5            7
10-14             8           12
15-19            11           17
20-24             7           22

Here $\sum (f \cdot x) = 476$ and $\sum f = 33$. The mean of this frequency table is therefore 476/33 = 14.4.
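Formula 2-2 applied to the quiz-score table, as a minimal Python sketch. The function name `frequency_table_mean` is hypothetical.

```python
def frequency_table_mean(class_marks, frequencies):
    """Formula 2-2: x-bar = sum(f * x) / sum(f), using class marks for x."""
    n = sum(frequencies)                                   # sum of f equals n
    total = sum(f * x for f, x in zip(frequencies, class_marks))
    return total / n


class_marks = [2, 7, 12, 17, 22]
freqs = [2, 5, 8, 11, 7]
print(round(frequency_table_mean(class_marks, freqs), 1))  # 14.4
```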

Waiting Times of Bank Customers at Different Banks (in minutes)
Jefferson Valley Bank: 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
Bank of Providence:    4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0

            Jefferson Valley Bank   Bank of Providence
Mean               7.15                   7.15
Median             7.20                   7.20
Mode               7.7                    7.7
Midrange           7.10                   7.10
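The four measures of center for both banks can be checked with Python's `statistics` module. The `midrange` and `summary` helpers are hypothetical. Results are rounded to two decimals, one more than in the data, following the round-off rule.

```python
from statistics import mean, median, mode

jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def midrange(data):
    """Value halfway between the highest and lowest scores."""
    return (max(data) + min(data)) / 2


def summary(data):
    # Round to one more decimal place than the data (round-off rule).
    return {
        "mean": round(mean(data), 2),
        "median": round(median(data), 2),
        "mode": mode(data),
        "midrange": round(midrange(data), 2),
    }


print(summary(jefferson))
print(summary(providence))
```

Both banks have identical measures of center, yet the Providence times are visibly more spread out. Measures of center alone therefore cannot distinguish them.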

Measure of Variation: Range
Range = highest score - lowest score

Measure of Variation: Standard Deviation
The standard deviation is a measure of the variation of the scores about the mean (roughly, a typical distance of the scores from the mean).

Sample Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (Formula 2-4)
Calculators can calculate the sample standard deviation of data.

Example: Find the standard deviation of the sample data 2, 3, 4, 5, 5, 5.
Here $\bar{x} = 4$ and $\sum (x - \bar{x})^2 = 8$, so $s^2 = 8/5 = 1.6$ and $s = 1.26$.

Use the shortcut formula to find the standard deviations of the above data and of the waiting times at the two banks.
1) Sample data: $\sum x^2 = 104$, $\sum x = 24$.
2) Jefferson Valley Bank: $\sum x^2 = 513.27$, $\sum x = 71.5$, $s = 0.48$.
3) Bank of Providence: $\sum x^2 = 541.09$, $\sum x = 71.5$, $s = 1.82$.

Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Calculators can calculate the population standard deviation of data.
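The two definitions differ only in the divisor: n - 1 for a sample and N for a population. This minimal Python sketch implements both from the definitions. The function names are hypothetical; the standard library's `statistics.stdev` and `statistics.pstdev` compute the same two quantities.

```python
import math


def sample_sd(data):
    """Formula 2-4: s = sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


def population_sd(data):
    """sigma = sqrt(sum((x - mu)^2) / N), treating the data as the whole population."""
    N = len(data)
    mu = sum(data) / N
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)


data = [2, 3, 4, 5, 5, 5]
print(round(sample_sd(data) ** 2, 1), round(sample_sd(data), 2))  # 1.6 1.26
print(round(population_sd(data), 2))                               # 1.15
```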

Symbols for Standard Deviation

                                 Sample     Population
Textbook                         s          σ
Some graphics calculators        Sx         σx
Some nongraphics calculators     xσn-1      xσn

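Measure of Variation: Variance
Variance = standard deviation squared
Notation: $s^2$ (sample variance), $\sigma^2$ (population variance)
To get the variance from a calculator's standard deviation, use the square key.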

Variance
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Round-off Rule for Measures of Variation
Carry one more decimal place than was present in the original data.

Standard Deviation Shortcut Formula
$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$ (Formula 2-6)
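This Python sketch of Formula 2-6 reproduces the worked values s = 1.26, 0.48, and 1.82. The function name `shortcut_sd` is hypothetical. The shortcut subtracts two large, nearly equal numbers, so for data with a large mean and a small spread it can lose floating-point precision compared with the definition formula.

```python
import math


def shortcut_sd(data):
    """Formula 2-6: s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


sample = [2, 3, 4, 5, 5, 5]
jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]
for name, data in [("sample", sample), ("Jefferson Valley", jefferson), ("Providence", providence)]:
    print(name, round(shortcut_sd(data), 2))
```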

[Figure 2-10: frequency histograms of four data sets with the same mean ($\bar{x} = 4$) and different standard deviations: s = 0, s = 0.8, s = 1.0, and s = 3.0]
The standard deviation gets larger as the spread of the data increases.

The Empirical Rule (applies to bell-shaped distributions)
About 68% of the data are within 1 standard deviation of the mean ($\bar{x} - s$ to $\bar{x} + s$).
About 95% are within 2 standard deviations of the mean ($\bar{x} - 2s$ to $\bar{x} + 2s$).
About 99.7% are within 3 standard deviations of the mean ($\bar{x} - 3s$ to $\bar{x} + 3s$).
[Figure 2-10: bell-shaped curve with area 0.340 between $\bar{x}$ and $\bar{x} \pm s$, 0.135 between $\bar{x} \pm s$ and $\bar{x} \pm 2s$, 0.024 between $\bar{x} \pm 2s$ and $\bar{x} \pm 3s$, and 0.001 beyond $\bar{x} \pm 3s$ on each side]
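The 68%, 95%, and 99.7% figures are rounded proportions of the normal distribution, used here as the model of a bell-shaped distribution. This sketch uses `statistics.NormalDist` (Python 3.8+) to show where they come from. The helper `within_k_sd` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def within_k_sd(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 3))
```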

Range Rule of Thumb
(minimum) $\bar{x} - 2s$ and (maximum) $\bar{x} + 2s$, so the range ≈ 4s, or
$s \approx \frac{\text{range}}{4}$
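This Python sketch applies the range rule of thumb to the bank waiting times. The helper names are hypothetical.

```python
def range_rule_sd(data):
    """Range rule of thumb: estimate s as (highest - lowest) / 4."""
    return (max(data) - min(data)) / 4


def usual_limits(x_bar, s):
    """Approximate minimum and maximum usual values: x_bar - 2s and x_bar + 2s."""
    return x_bar - 2 * s, x_bar + 2 * s


jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]
print(round(range_rule_sd(jefferson), 2), round(range_rule_sd(providence), 2))
```

The estimates of 0.3 and 1.45 are rough compared with the computed s = 0.48 and 1.82. They still correctly show that Providence has much more variation.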

Chebyshev's Theorem
Applies to distributions of any shape: the proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least $1 - \frac{1}{k^2}$, where k is any positive number greater than 1.
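Chebyshev's lower bound for a few values of k, as a minimal Python sketch. The function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(k):
    """Least proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, chebyshev_bound(k))
```

For k = 2 the bound is 0.75. Chebyshev therefore guarantees at least 75% of the data within 2 standard deviations for any shape, whereas the empirical rule gives about 95% for bell-shaped data. For k = 3 the bound is 8/9, about 0.889.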

Measures of Variation Summary
For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.

An Application of Measures of Variation
There are two brands of car tires, A and B. Both have a mean lifetime of 60,000 miles, but brand A has a standard deviation of lifetime of 1000 miles and brand B has a standard deviation of lifetime of 3000 miles. Which brand would you prefer?

Quartiles
$Q_1$, $Q_2$, $Q_3$ divide the ranked scores into four equal parts, each containing 25% of the scores:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

Percentiles
The 99 percentiles $P_1, P_2, \ldots, P_{99}$ divide the ranked scores into 100 groups.

Finding the Percentile of a Given Score
$\text{percentile of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$

Sorted Axial Loads of 175 Aluminum Cans (the bracketed number is the position of the first value in each row):
[1]   200 201 204 204 206 206 208 208 209 215 217 218 220 223 223
[16]  225 228 230 230 234 236 241 242 242 248 250 251 251 252 252
[31]  254 256 256 256 257 257 258 259 259 260 261 262 262 262 262
[46]  262 263 263 263 263 263 264 265 265 265 266 267 267 268 268
[61]  268 268 268 268 268 268 268 269 269 269 269 270 270 270 270
[76]  270 270 270 270 271 271 272 272 272 272 272 273 273 273 273
[91]  273 273 274 274 274 274 275 275 275 275 276 276 276 276 276
[106] 277 277 277 277 277 277 277 277 278 278 278 278 278 278 278
[121] 279 279 279 280 280 280 281 281 281 281 282 282 282 282 282
[136] 282 283 283 283 283 283 283 284 284 284 284 285 285 285 286
[151] 286 286 286 287 287 288 289 289 289 289 289 290 290 290 291
[166] 291 292 292 292 293 293 294 295 295 297
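The percentile-of-a-score formula, as a minimal Python sketch. The function name `percentile_of_score` is hypothetical.

```python
def percentile_of_score(data, x):
    """Percentile of x = 100 * (number of scores less than x) / (total number of scores)."""
    less_than_x = sum(1 for value in data if value < x)
    return 100 * less_than_x / len(data)


print(percentile_of_score([10, 20, 30, 40, 50], 40))  # 60.0
```

In the sorted axial loads, 41 scores are less than 262. The formula therefore gives 100 · 41 / 175 ≈ 23.4 for a load of 262.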

Finding the Value of the kth Percentile
1. Rank the data (arrange the data in order from lowest to highest).
2. Compute $L = \frac{k}{100} \cdot n$, where n = number of scores and k = percentile in question.
3. Is L a whole number?
   Yes: the value of the kth percentile is midway between the Lth score and the next higher score in the original set of data. Find $P_k$ by adding the Lth score and the next higher score and dividing the total by 2.
   No: change L by rounding it up to the next larger whole number. The value of $P_k$ is the Lth score, counting from the lowest.
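The kth-percentile procedure above, written as a Python sketch. The function name `kth_percentile` is hypothetical. Integer arithmetic decides whether L is a whole number, which avoids floating-point surprises. Other software, such as spreadsheets or R's `quantile`, may use different percentile conventions and return slightly different values.

```python
import math


def kth_percentile(data, k):
    """Value of the kth percentile P_k (0 < k < 100) by the rule above."""
    if not 0 < k < 100:
        raise ValueError("k must satisfy 0 < k < 100")
    ranked = sorted(data)                  # rank the data, lowest to highest
    n = len(ranked)
    if (k * n) % 100 == 0:                 # L = k*n/100 is a whole number
        L = k * n // 100
        # Midway between the Lth score and the next higher score (1-based positions).
        return (ranked[L - 1] + ranked[L]) / 2
    L = math.ceil(k * n / 100)             # not whole: round L up
    return ranked[L - 1]                   # the Lth score, counting from the lowest
```

Applied to the 175 sorted axial loads, this gives P10 = 230 and P25 = 262, matching the worked example.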

Using the sorted axial loads above:
The 10th percentile: L = 175 · 10/100 = 17.5, which is not a whole number, so round up to 18. The 10th percentile is the 18th value in the sorted data, i.e., 230.
The 25th percentile: L = 175 · 25/100 = 43.75, which rounds up to 44. The 25th percentile is the 44th value in the sorted data, i.e., 262.

Interquartile range: $Q_3 - Q_1$
Semi-interquartile range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_1 + Q_3}{2}$

Exploratory Data Analysis
Used to explore data at a preliminary level.
Few or no assumptions are made about the data.
Tends to involve relatively simple calculations and graphs.

Traditional Statistics
Used to confirm final conclusions about data.
Typically requires some very important assumptions about the data.
Calculations are often complex, and graphs are often unnecessary.

Boxplots (Box-and-Whisker Diagrams)
A boxplot is drawn from the 5-number summary: minimum, first quartile $Q_1$, median, third quartile $Q_3$, maximum.
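This Python sketch computes the 5-number summary, together with the interquartile range, semi-interquartile range, and midquartile. It assumes the quartiles are found with the kth-percentile rule (Q1 = P25, Q2 = P50, Q3 = P75); under that rule, Q2 coincides with the median. All function names are hypothetical.

```python
import math


def percentile_value(ranked, k):
    """P_k for already-sorted data: L = k*n/100; average Lth and next if whole, else round up."""
    n = len(ranked)
    if (k * n) % 100 == 0:
        L = k * n // 100
        return (ranked[L - 1] + ranked[L]) / 2
    return ranked[math.ceil(k * n / 100) - 1]


def five_number_summary(data):
    """Minimum, Q1, median (Q2), Q3, maximum."""
    ranked = sorted(data)
    q1, q2, q3 = (percentile_value(ranked, k) for k in (25, 50, 75))
    return ranked[0], q1, q2, q3, ranked[-1]


def quartile_measures(q1, q3):
    """Interquartile range, semi-interquartile range, and midquartile."""
    return q3 - q1, (q3 - q1) / 2, (q1 + q3) / 2


summary = five_number_summary(range(1, 21))
print(summary)                                    # (1, 5.5, 10.5, 15.5, 20)
print(quartile_measures(summary[1], summary[3]))  # (10.0, 5.0, 10.5)
```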

[Figure 2-13: Boxplot of pulse rates (beats per minute) of smokers: minimum 52, $Q_1$ = 60, median 68.5, $Q_3$ = 78, maximum 90]

[Figure 2-14: boxplots of normal, uniform, and skewed distributions]

Outliers
Outliers are values that are very far away from most of the data.
[Figure: boxplot of axial loads, vertical axis from 200 to 300]

[Figure: class survey data: boxplots of heights for those who never broke a bone (n) and those who did (y)]

When comparing two or more boxplots, it is necessary to use the same scale.
[Figure: side-by-side boxplots of pulse rate for smokers (1, yes) and nonsmokers (2, no) on a common scale from 40 to 100]