PSYC 200 Statistical Methods in Psychology

Summer Session II
Meets 07/13/04-08/19/04, Tu - Th 5:00pm-8:20pm (BPS 1124)
Instructor: Walky Rivadeneira
TA: Susan Campbell

The course will
  Improve your ability to make choices under uncertainty
  Provide a foundation in statistical thinking and practice

Combination of mathematics, psychology, and logic
  Study regularly and actively
  Do all the exercises in the main text
  Expect to study 4-6 hrs/week or more outside of class
  Homework is required and must be on time
  Homework and Answer Sheets will be posted on the web
  Attendance at lab and lecture is expected

Class web page: www.lap.umd.edu/lap/psyc200/
See Syllabus

What will happen in Psyc200?
  You will develop skills and abilities in how to reason effectively with data
  Topics include:
    How to describe/summarize data (descriptive statistics)
    How the process of making statistical inferences works (probability, inferential statistics)
    How to correctly use and interpret statistical techniques and procedures (depends on research question)

Why statistics?
  The science and practice of psychology is entirely dependent on a probabilistic view of human behavior; scientists use statistics as an inferential tool!
  Information in the media is often presented in quantitative/statistical terms
  Even if you don't use statistics in your everyday life (but many of you will!), decision-makers affecting your life will!
  Learn to be skeptical!

"There are three kinds of lies: lies, damned lies, and statistics." - Disraeli

[Figure: two graphs of government payrolls by month (June-Dec), titled "STABLE" and "GOVERNMENT PAY ROLLS UP!". One y-axis runs from 0 to 30 millions of dollars; the other runs from $19,500,000 to $20,000,000.]

How to lie with statistics

             LAST YEAR   THIS YEAR
  Milk         $3.00       $1.50
  Bread        $2.00       $4.00

  COST OF LIVING UP: last year as base period
  COST OF LIVING DOWN: this year as base period
  COST OF LIVING STABLE: last year as base period AND use geometric mean
  [Figure: three bar charts, one per claim, each comparing last year and this year on an index scale from 0 to 200.]

What Would You Do? (Chance, 2002, Vol15, No 15, p. 3)
  In June 2002, challenger Eric Perrodin beat Mayor Omar Bradley in Compton, CA, by 281 votes
  Mistakenly, Perrodin's name had been first on all ballots
  Bradley called in Prof. John Krosnick, whose research suggests that the first-listed candidate on the ballot receives about 2.5% more votes than the last-listed
  On that basis, Perrodin got 306 more votes than if his name had appeared last
  The judge took 306 votes from Perrodin and gave them to Bradley, declaring him the winner
  Appeal is in process
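The three cost-of-living claims on this slide can be checked with a little arithmetic. The sketch below is one plausible reading of the slide, assuming each index is an unweighted average of the milk and bread price relatives with the base period set to 100; the function and variable names are illustrative, not part of the course material.

```python
from math import prod

# Prices from the slide
last_year = {"milk": 3.00, "bread": 2.00}
this_year = {"milk": 1.50, "bread": 4.00}

def price_relatives(base, current):
    """Each item's current price as a percentage of its base-period price."""
    return [100 * current[item] / base[item] for item in base]

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return prod(values) ** (1 / len(values))

# Last year as base (= 100): index for this year, arithmetic mean
index_up = arithmetic_mean(price_relatives(last_year, this_year))
# This year as base (= 100): index for last year, arithmetic mean
index_down = arithmetic_mean(price_relatives(this_year, last_year))
# Last year as base (= 100): index for this year, geometric mean
index_stable = geometric_mean(price_relatives(last_year, this_year))

print("This year vs. base 100 (arithmetic):", index_up)
print("Last year vs. base 100 (arithmetic):", index_down)
print("This year vs. base 100 (geometric): ", index_stable)
```

Under these assumptions, the arithmetic index is 125 whichever year is the base, so the same prices can be reported as "up" (100 to 125) or "down" (125 to 100), while the geometric mean gives 100 (stable).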

Definitions
  STATISTICS: A method for dealing with data; a tool for organizing and analyzing numerical facts or observations
  DATA: A collection of measurements or observations (FYI - data is a plural term; datum is the singular)

Population
  POPULATION: The collection of all people, objects, events, or observations sharing one or more specified characteristics
  The size of the population may be very large or relatively small. The size is dependent on the defining characteristics established by the research
  Examples:
    All introductory statistics students
    All registered voters
    All babies born in 1997

Sample
  SAMPLE: A subset of individuals selected from a population, usually intended to reflect or represent the population in a study
  Used because collecting observations from an entire population is usually time-consuming, costly, impractical, and/or infeasible
  Sample sizes can vary greatly, just as population sizes do.
  Examples of samples:
    All statistics students in your class
    Registered voters in Montgomery county
    Babies born at Shady Grove hospital

When describing data, it is useful to distinguish whether the data come from a population or a sample: is it a parameter or a statistic?
  PARAMETER: A value that describes a characteristic of a population
    Example: The average height of the entire UMD student population
    USE GREEK LETTERS
  STATISTIC: A value that describes a characteristic of a sample
    Example: The average height of the members of this class
    USE ROMAN LETTERS

SAMPLING ERROR: The discrepancy, or difference, between a sample statistic and the corresponding population parameter
  Example: The difference that exists between the performance of 300 UMD males and females and the performance of all UMD students.

Variables vs. Constants
  VARIABLE: A characteristic that takes on different values for different individuals in a population or a sample; something that varies
    Examples: Height, GPA, IQ, hair color
  CONSTANT: A characteristic that does not change its value in a given context
    Examples: Days in a week, sample size
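Sampling error can be made concrete with a small simulation. The sketch below uses an entirely hypothetical population of 30,000 made-up scores (not real UMD data), draws a sample of 300, and reports the difference between the sample mean (a statistic) and the population mean (a parameter). The seed, population size, and score distribution are assumptions chosen only for illustration.

```python
import random
from statistics import mean

random.seed(200)  # fixed seed so the sketch is repeatable

# Hypothetical population: 30,000 made-up scores (mean near 75, SD near 10)
population = [random.gauss(75, 10) for _ in range(30_000)]

# A sample of 300 individuals drawn without replacement
sample = random.sample(population, 300)

parameter = mean(population)   # describes the population
statistic = mean(sample)       # describes the sample
sampling_error = statistic - parameter

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
print(f"Sampling error:              {sampling_error:+.2f}")
```

Changing the seed draws a different sample and gives a different sampling error, which is the point: the statistic varies from sample to sample while the parameter stays fixed.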

Discrete vs. Continuous Variables
  DISCRETE VARIABLE: A variable that consists of separate, indivisible categories; can assume only a finite number of values between any two points
    a.k.a. Categorical or Qualitative variables
    No values can exist between the categories
    Examples: Gender, religious affiliation, number of children in a family
  CONTINUOUS VARIABLE: A variable for which there are an infinite number of possible values that fall between any 2 observed values
    a.k.a. Quantitative or Numeric variables
    Can be pictured as a number line without gaps between neighboring points
    Examples: Family income, test scores, weight

Independent vs. Dependent Variables
  INDEPENDENT VARIABLE (a.k.a. I.V.): A variable that is examined in order to determine its effects on an outcome of interest
    Often (but not always) manipulated by the researcher
    Consists of at least two levels, or categories
  DEPENDENT VARIABLE (a.k.a. D.V.): An outcome of interest that is being observed and measured in order to assess the effects of the independent variable
    The actual observations or measurements that you record

Independent vs. Dependent Variables
  A researcher wants to identify the effects of sleep deprivation on test performance in his introductory statistics course. He studies 2 groups of students. One group is instructed to stay awake for 2 nights and days and the other group is told to sleep normally. The exam scores of a test given after the 2-day period are recorded.
  What is the independent variable?
  What is the dependent variable?

Independent vs. Dependent Variables
  The independent variable is sleep deprivation (the researcher is interested in the effects of this variable on the dependent variable; this is what the researcher is manipulating). There are two levels of this variable: 1) the presence of deprivation, 2) the absence of deprivation
  The dependent variable is test performance. This is what is being measured by the researcher in terms of exam scores. The scores are presumed to have been influenced by the manipulation of the I.V.
  It may be useful to identify independent and dependent variables using the following phrase: "The effects of IV on DV." Thus, in the previous example we could state, "The effects of sleep deprivation on test performance."

Two Basic Areas of Statistics
  1) DESCRIPTIVE STATISTICS: used to summarize, organize, and present data in a convenient and communicable form.
     Examples:
       Summarize exam scores by reporting an average score for the class
       Summarize age by reporting the range of ages (e.g., "The subjects ranged in age from 17-38")
  2) INFERENTIAL STATISTICS: techniques that allow us to make inferences or conclusions about a population based on data that are gathered from a sample
     Example: The academic performance of 150 females and 150 males from UMD is examined. The information provided by these 300 students is then used to draw conclusions about performance for all UMD students.

Levels of Measurement
  MEASUREMENT: The process of assigning numbers to objects or events according to a set of rules
  There are 4 levels (or scales) of measurement:
    Nominal
    Ordinal
    Interval
    Ratio

Levels of Measurement
  NOMINAL SCALE: Classify observations into mutually exclusive and exhaustive categories
    No attempt is made to measure magnitude or amount; only distinguishes between groups
    Observations of unordered variables
    Examples: gender, political affiliation, county of residence

Levels of Measurement
  ORDINAL SCALE: Observations are rank ordered in terms of size or magnitude
    One group can be greater than or less than another group
    Does not tell how much difference there is between groups; it only identifies a direction of difference
    Example: letter grades A, B, C, D, F. The students are rank-ordered in terms of class standing, but the magnitude of the difference in grades between students is not identified

Levels of Measurement
  INTERVAL SCALE: A quantitative scale that requires a constant unit of measurement; intervals between numbers are equal in size
    Allows for addition, subtraction and other mathematical operations
    The 0-point is arbitrary: a value of 0 does not necessarily mean the absence of that quality
    Example: The Fahrenheit scale of measuring temperature. The difference between 30 and 31 degrees is 1 degree, as is the difference between 95 and 96 degrees; a temperature of 0 does not indicate a lack of temperature

Levels of Measurement
  RATIO SCALE: Similar to interval scale except that the 0-point is absolute: a zero value reflects an absence of the quality we are measuring
    Allows the use of ratios when comparing numbers: you can have twice as much of that variable or 1/2 as much (you can't do this with interval scale values)
    This is the most common level of measurement
    Example: weight. 0 pounds means that object has no weight; a weight of 160 pounds is twice as much as a weight of 80 pounds
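One way to see why ratios are meaningful on a ratio scale but not on an interval scale is to change units and check whether the ratio survives. The sketch below is only an illustration: the temperatures 80 and 40 degrees Fahrenheit are made-up values, while the weights are the 160 and 80 pounds from the slide.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert an interval-scale temperature; the zero point shifts."""
    return (deg_f - 32) * 5 / 9

def pounds_to_kilograms(pounds):
    """Convert a ratio-scale weight; zero stays zero."""
    return pounds * 0.45359237

# Interval scale: the "ratio" depends on the arbitrary zero point
ratio_fahrenheit = 80 / 40
ratio_celsius = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: the ratio is the same in any unit
ratio_pounds = 160 / 80
ratio_kilograms = pounds_to_kilograms(160) / pounds_to_kilograms(80)

print("80F vs 40F, ratio in Fahrenheit:", ratio_fahrenheit)
print("Same temperatures, ratio in Celsius:", round(ratio_celsius, 2))
print("160 lb vs 80 lb, ratio in pounds:", ratio_pounds)
print("Same weights, ratio in kilograms:", round(ratio_kilograms, 2))
```

The temperature ratio changes from 2 to about 6 when the unit changes, so "twice as hot" has no fixed meaning; the weight ratio stays at 2.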

Organizing and Displaying Data
  Data by themselves are just numbers
  Here are the final raw exam scores from 2 classes, 25 students randomly assigned to each, comparing a new and an old method of teaching algebra. How can we make sense of them?

  New Method: 94 66 75 81 91 84 101 100 74 97 87 112 106 109 85 103 121 73 92 104 79 113 110 106 81
  Old Method: 72 64 68 116 52 84 53 84 56 64 55 65 109 121 92 62 73 104 88 104 90 94 128 93 67

Ranking the data

  New Method: 66 73 74 75 79 81 81 84 85 87 91 92 94 97 100 101 103 104 106 106 109 110 112 113 121
  Old Method: 52 53 55 56 62 64 64 65 67 68 72 73 84 84 88 90 92 93 94 104 104 109 116 121 128

  What features of the data are suggested?
    Old method yields greater variability
    Equal performance in old and new group does not indicate equal relative performance
    Need to look at ranks and percentile scores
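The ranking step is easy to reproduce. The sketch below sorts the raw scores from the slide and uses the range (highest minus lowest) as one rough indicator of the "greater variability" noted for the old method; the choice of the range here is an assumption, since the slide does not say which measure it has in mind.

```python
# Raw exam scores from the slide (25 students per method)
new_method = [94, 66, 75, 81, 91, 84, 101, 100, 74, 97, 87, 112, 106,
              109, 85, 103, 121, 73, 92, 104, 79, 113, 110, 106, 81]
old_method = [72, 64, 68, 116, 52, 84, 53, 84, 56, 64, 55, 65, 109,
              121, 92, 62, 73, 104, 88, 104, 90, 94, 128, 93, 67]

def rank_scores(scores):
    """Return the scores ordered from lowest to highest."""
    return sorted(scores)

def score_range(scores):
    """Highest score minus lowest score, a crude measure of variability."""
    return max(scores) - min(scores)

new_ranked = rank_scores(new_method)
old_ranked = rank_scores(old_method)

print("New Method:", new_ranked, "| range =", score_range(new_method))
print("Old Method:", old_ranked, "| range =", score_range(old_method))
```

With these data the range is 55 for the new method (66 to 121) and 76 for the old method (52 to 128).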

Frequency Distributions
  FREQUENCY DISTRIBUTION: An organization of data indicating the number of people that obtained a certain score or fell in a certain category
    Can either be in the form of tables or graphs
    Help to provide us with a visual picture of how the scores are spread out across a measurement scale
  FREQUENCY DISTRIBUTION TABLES: The simplest frequency distributions list a column of scores (x's) and then the frequency, or the number of times that score occurs (f), beside it

FREQUENCY DISTRIBUTION TABLES

  New Method   f     Old Method   f
          66   1             52   1
          73   1             53   1
          74   1             55   1
          75   1             56   1
          79   1             62   1
          81   2             64   2
          84   1             65   1
          85   1             67   1
          87   1             68   1
          91   1             72   1
          92   1             73   1
          94   1             84   2
          97   1             88   1
         100   1             90   1
         101   1             92   1
         103   1             93   1
         104   1             94   1
         106   2            104   2
         109   1            109   1
         110   1            116   1
         112   1            121   1
         113   1            128   1
         121   1
           n  25              n  25

  Scores are listed from lowest to highest
  Freq = count of observations with same value
  The sum of the f column should be equal to n (the number of subjects in your sample)

FREQUENCY DISTRIBUTION TABLES (cont)

  New Method   f  Rel f    Old Method   f  Rel f
          66   1   0.04            52   1   0.04
          73   1   0.04            53   1   0.04
          74   1   0.04            55   1   0.04
          75   1   0.04            56   1   0.04
          79   1   0.04            62   1   0.04
          81   2   0.08            64   2   0.08
          84   1   0.04            65   1   0.04
          85   1   0.04            67   1   0.04
          87   1   0.04            68   1   0.04
          91   1   0.04            72   1   0.04
          92   1   0.04            73   1   0.04
          94   1   0.04            84   2   0.08
          97   1   0.04            88   1   0.04
         100   1   0.04            90   1   0.04
         101   1   0.04            92   1   0.04
         103   1   0.04            93   1   0.04
         104   1   0.04            94   1   0.04
         106   2   0.08           104   2   0.08
         109   1   0.04           109   1   0.04
         110   1   0.04           116   1   0.04
         112   1   0.04           121   1   0.04
         113   1   0.04           128   1   0.04
         121   1   0.04
           n  25   1.00             n  25   1.00

  In addition to x and f, we can add a relative frequency (Rel f) column. This column tells us the score's frequency relative to the total sample size (n)
  Relative Frequency = f/n
  The sum of the Rel f column should equal 1.00

FREQUENCY DISTRIBUTION TABLES (cont)

  New Method   f  Rel f  Cum f    Old Method   f  Rel f  Cum f
          66   1   0.04      1            52   1   0.04      1
          73   1   0.04      2            53   1   0.04      2
          74   1   0.04      3            55   1   0.04      3
          75   1   0.04      4            56   1   0.04      4
          79   1   0.04      5            62   1   0.04      5
          81   2   0.08      7            64   2   0.08      7
          84   1   0.04      8            65   1   0.04      8
          85   1   0.04      9            67   1   0.04      9
          87   1   0.04     10            68   1   0.04     10
          91   1   0.04     11            72   1   0.04     11
          92   1   0.04     12            73   1   0.04     12
          94   1   0.04     13            84   2   0.08     14
          97   1   0.04     14            88   1   0.04     15
         100   1   0.04     15            90   1   0.04     16
         101   1   0.04     16            92   1   0.04     17
         103   1   0.04     17            93   1   0.04     18
         104   1   0.04     18            94   1   0.04     19
         106   2   0.08     20           104   2   0.08     21
         109   1   0.04     21           109   1   0.04     22
         110   1   0.04     22           116   1   0.04     23
         112   1   0.04     23           121   1   0.04     24
         113   1   0.04     24           128   1   0.04     25
         121   1   0.04     25
           n  25                            n  25

  Additional columns may be added to a grouped frequency table:
  CUMULATIVE FREQUENCIES (Cum f): The number of people/scores who are in or below each class interval; represents the accumulation of individuals as you go up the scale
  The last value in the Cum f column should reflect the number of subjects that you have (n)

FREQUENCY DISTRIBUTION TABLES (cont)

  New Method   f  Rel f  Cum f  Cum Rel f    Old Method   f  Rel f  Cum f  Cum Rel f
          66   1   0.04      1       0.04            52   1   0.04      1       0.04
          73   1   0.04      2       0.08            53   1   0.04      2       0.08
          74   1   0.04      3       0.12            55   1   0.04      3       0.12
          75   1   0.04      4       0.16            56   1   0.04      4       0.16
          79   1   0.04      5       0.20            62   1   0.04      5       0.20
          81   2   0.08      7       0.28            64   2   0.08      7       0.28
          84   1   0.04      8       0.32            65   1   0.04      8       0.32
          85   1   0.04      9       0.36            67   1   0.04      9       0.36
          87   1   0.04     10       0.40            68   1   0.04     10       0.40
          91   1   0.04     11       0.44            72   1   0.04     11       0.44
          92   1   0.04     12       0.48            73   1   0.04     12       0.48
          94   1   0.04     13       0.52            84   2   0.08     14       0.56
          97   1   0.04     14       0.56            88   1   0.04     15       0.60
         100   1   0.04     15       0.60            90   1   0.04     16       0.64
         101   1   0.04     16       0.64            92   1   0.04     17       0.68
         103   1   0.04     17       0.68            93   1   0.04     18       0.72
         104   1   0.04     18       0.72            94   1   0.04     19       0.76
         106   2   0.08     20       0.80           104   2   0.08     21       0.84
         109   1   0.04     21       0.84           109   1   0.04     22       0.88
         110   1   0.04     22       0.88           116   1   0.04     23       0.92
         112   1   0.04     23       0.92           121   1   0.04     24       0.96
         113   1   0.04     24       0.96           128   1   0.04     25       1.00
         121   1   0.04     25       1.00
           n  25                                       n  25

  CUMULATIVE RELATIVE FREQUENCIES (Cum Rel f): The relative frequency of people/scores who are in or below each class interval; represents the accumulation of percentages as you go up the scale
  The last value in the Cum Rel f column should be 1.00

FREQUENCY DISTRIBUTION TABLES (cont)

  New Method   f  Rel f  Cum f  Cum Rel f  Percentile    Old Method   f  Rel f  Cum f  Cum Rel f  Percentile
          66   1   0.04      1       0.04           4            52   1   0.04      1       0.04           4
          73   1   0.04      2       0.08           8            53   1   0.04      2       0.08           8
          74   1   0.04      3       0.12          12            55   1   0.04      3       0.12          12
          75   1   0.04      4       0.16          16            56   1   0.04      4       0.16          16
          79   1   0.04      5       0.20          20            62   1   0.04      5       0.20          20
          81   2   0.08      7       0.28          28            64   2   0.08      7       0.28          28
          84   1   0.04      8       0.32          32            65   1   0.04      8       0.32          32
          85   1   0.04      9       0.36          36            67   1   0.04      9       0.36          36
          87   1   0.04     10       0.40          40            68   1   0.04     10       0.40          40
          91   1   0.04     11       0.44          44            72   1   0.04     11       0.44          44
          92   1   0.04     12       0.48          48            73   1   0.04     12       0.48          48
          94   1   0.04     13       0.52          52            84   2   0.08     14       0.56          56
          97   1   0.04     14       0.56          56            88   1   0.04     15       0.60          60
         100   1   0.04     15       0.60          60            90   1   0.04     16       0.64          64
         101   1   0.04     16       0.64          64            92   1   0.04     17       0.68          68
         103   1   0.04     17       0.68          68            93   1   0.04     18       0.72          72
         104   1   0.04     18       0.72          72            94   1   0.04     19       0.76          76
         106   2   0.08     20       0.80          80           104   2   0.08     21       0.84          84
         109   1   0.04     21       0.84          84           109   1   0.04     22       0.88          88
         110   1   0.04     22       0.88          88           116   1   0.04     23       0.92          92
         112   1   0.04     23       0.92          92           121   1   0.04     24       0.96          96
         113   1   0.04     24       0.96          96           128   1   0.04     25       1.00         100
         121   1   0.04     25       1.00         100
           n  25                                                   n  25

  PERCENTILE: The percentage of individuals who are located at or below the upper real limit of each interval
  Percentile = (Cum f / n)(100)
  The last value in the Percentile column should be 100
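The columns built up over these slides (f, Rel f, Cum f, Cum Rel f, Percentile) all follow from the raw scores and n. The sketch below recomputes them for the New Method scores using the slide's formulas; the function name and row layout are illustrative choices, not part of the course material.

```python
from collections import Counter

# New Method exam scores from the slides (n = 25)
new_method = [94, 66, 75, 81, 91, 84, 101, 100, 74, 97, 87, 112, 106,
              109, 85, 103, 121, 73, 92, 104, 79, 113, 110, 106, 81]

def frequency_table(scores):
    """Return rows of (score, f, rel_f, cum_f, cum_rel_f, percentile)."""
    n = len(scores)
    counts = Counter(scores)          # f for each distinct score
    rows = []
    cum_f = 0
    for score in sorted(counts):      # lowest score first
        f = counts[score]
        cum_f += f                    # running total of f
        rows.append((score, f, f / n, cum_f, cum_f / n, 100 * cum_f / n))
    return rows

table = frequency_table(new_method)

print("Score   f  Rel f  Cum f  Cum Rel f  Percentile")
for score, f, rel_f, cum_f, cum_rel_f, percentile in table:
    print(f"{score:>5} {f:>3} {rel_f:>6.2f} {cum_f:>6} "
          f"{cum_rel_f:>10.2f} {percentile:>11.0f}")
```

The same function applied to the Old Method scores should reproduce the right-hand half of the table.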

FREQUENCY DISTRIBUTION TABLES (cont)
  (Same table as on the previous slide: f, Rel f, Cum f, Cum Rel f, and Percentile for both methods.)

  The data are somewhat grouped
  Still cannot form visual impressions
  Divide into larger classes

  Freq = count of observations with same value
  Relative frequency = f/n
  Cumulative freq = sum of accumulated freq
  Cumulative Rel. frequency = (Cum f)/n
  Percentile = 100(Cum f)/n

GROUPED FREQUENCY DISTRIBUTIONS: We can further simplify data by grouping scores together into intervals and presenting them in a table
  INTERVALS / CLASS INTERVALS: Groups of scores; class intervals have real limits, reflecting the continuous nature of the variable
  REAL LIMITS: Used to separate adjacent scores or intervals; they lie exactly halfway between adjacent scores or intervals, and neighboring intervals share a real limit

Grouped Frequency Distribution
  Example:
    On an ungrouped continuous scale, a score of 94 has a lower real limit of 93.5 and an upper real limit of 94.5
    On a grouped frequency distribution, a score of 94 may be in the interval 90-99, which has a lower real limit of 89.5 and an upper real limit of 99.5
  Why real limits?
    The real limits of the intervals should be impossible scores, so that every score falls in exactly one interval

Frequency Histograms
  Form roughly 10-20 equal intervals of scores

  Interval   New Method   Old Method
  50-59               0            4
  60-69               1            6
  70-79               4            2
  80-89               5            3
  90-99               4            4
  100-109             7            3
  110-119             3            1
  120-129             1            2
  n                  25           25

  Real limits: 49.5-59.5, 59.5-69.5, etc.

  [Figure: Frequency Histogram for New Method and Old Method; x-axis: Score (interval midpoints 55 to 125), y-axis: Frequency (0 to 8).]
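The grouped counts and real limits above can be reproduced from the raw scores. The sketch below assumes intervals of width 10 starting at 50, as in the table, and takes each real limit to lie half a unit beyond the interval's stated limits; the function name and parameters are illustrative.

```python
# Exam scores from the earlier slides (25 students per method)
new_method = [94, 66, 75, 81, 91, 84, 101, 100, 74, 97, 87, 112, 106,
              109, 85, 103, 121, 73, 92, 104, 79, 113, 110, 106, 81]
old_method = [72, 64, 68, 116, 52, 84, 53, 84, 56, 64, 55, 65, 109,
              121, 92, 62, 73, 104, 88, 104, 90, 94, 128, 93, 67]

def grouped_frequencies(scores, start=50, width=10, n_intervals=8):
    """Return rows of (label, lower_real_limit, upper_real_limit, f)."""
    rows = []
    for k in range(n_intervals):
        low = start + k * width        # lowest whole score in the interval
        high = low + width - 1         # highest whole score in the interval
        f = sum(1 for s in scores if low <= s <= high)
        rows.append((f"{low}-{high}", low - 0.5, high + 0.5, f))
    return rows

new_rows = grouped_frequencies(new_method)
old_rows = grouped_frequencies(old_method)

print("Interval   Real limits     New  Old")
for (label, lower, upper, f_new), (_, _, _, f_old) in zip(new_rows, old_rows):
    print(f"{label:<10} {lower:>5}-{upper:<7} {f_new:>4} {f_old:>4}")
```

Because the real limits (49.5, 59.5, and so on) are not possible whole-number scores, every score falls in exactly one interval, and the counts in each column sum to 25.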