IRT Models for Polytomous Response Data


Lecture #4 :: ICPSR Item Response Theory Workshop

Lecture Overview
- Big picture overview: framing Item Response Theory as a generalized latent variable modeling technique
- Differentiating response THEORY from item RESPONSES
- Polytomous (but categorical) response data:
  - Ordered category models :: Graded Response Model
  - Partially ordered category models :: Partial Credit Model
  - Unordered category models :: Nominal Response Model
- Brief introduction to even more types of data

DIFFERENTIATING RESPONSE THEORY FROM ITEM RESPONSES

Fundamentals of IRT
- IRT is a type of measurement model in which transformed item responses are predicted using properties of persons (Theta) and properties of items (difficulty, discrimination)
  - Rasch models are a subset of IRT models with more restrictive slope assumptions
- Items and persons are on the same latent metric: conjoint scaling
  - Anchor (identify) the scale with either persons (z-scored Theta) or items
- After controlling for a person's latent trait score (Theta), the item responses should be uncorrelated: local independence
- Item response models are re-parameterized versions of item factor models (for binary outcomes)
- Thus, we can now extend IRT to polytomous responses (3+ options); a minimal binary 2PL sketch follows below as a point of reference
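Before moving to polytomous data, here is a minimal sketch of the binary 2PL item response function that the polytomous models below generalize. The parameter values are hypothetical, and the 1.7 scaling constant is omitted:

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL: probability of endorsing/answering correctly, given person
    trait theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item (a = 1.2, b = 0.5) evaluated for three persons
print(p_2pl(np.array([-1.0, 0.0, 1.0]), a=1.2, b=0.5))
```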

The Big Picture
The key to working through the varying types of IRT models is understanding that IRT is all about the type of data you have and intend to model. Once the data type is known, the nuances of a model family become evident (they are mainly due to data types).

Response Theory (latent variable) --[causal assumption]--> Item Response (variable type)

In latent variable modeling, we assume that variability in unobserved traits causes variability in item responses.

IRT from the Big Picture Point of View
Or, more conveniently re-organized, the model has two parts:
- Item Response (variable type)
- Response Theory (latent variable)

Polytomous Items
- Polytomous items end up changing the left-hand side of the equation: the Item Response portion
- Subsequently, minor changes are made to the right-hand side: the Response Theory portion
- These changes are frequently related to the item more than to the theory
  - Think of the c parameter in the 3-PL (for guessing): it cannot be present in an item that is scored continuously
- More commonly, nuances in IRT software reflect the changes in how models are constructed, but the general theory remains the same

Polytomous Items
- Polytomous items have more than 2 options (categorical)
- Polytomous models are not named with numbers like binary models; instead they get called different names
  - Most have a 1-PL vs. 2-PL version that go by different names
  - Different constraints on what to do with multiple categories
- Three main kinds* of polytomous models:
  - Outcome categories are ordered (scoring rubrics, Likert scales) :: Graded Response or Modified Graded Response Model
  - Outcome categories could be ordered :: (Generalized) Partial Credit Model or Rating Scale Model
  - Outcome categories are not ordered (distractors/multiple choice) :: Nominal Response Model

* Lots and lots more exist; these are the major categories

Threshold Concept for Binary and Ordinal Variables
Each ordinal variable is really a chopped-up version of a hypothetical underlying continuous variable (Y*) with a mean of 0.
- Probit (ogive) model: pretend the variable has a normal distribution (variance = 1)
- Logit model: pretend the variable has a logistic distribution (variance = π²/3)
- # thresholds = # options − 1
Polytomous models will differ in how they make use of the multiple (k − 1) thresholds per item; a small numeric sketch follows below.
[Figure: two Y* distributions (SD = 1 and SD = 1.8) chopped at thresholds 0, 1, 2.]
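To make the threshold idea concrete, a small sketch assuming the probit version (Y* standard normal); the threshold values are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical thresholds for a 4-option item (# thresholds = # options - 1)
thresholds = np.array([-1.0, 0.2, 1.1])

# Chop the latent Y* at the thresholds: each category's probability is the
# normal area between adjacent cut points
cuts = np.concatenate(([-np.inf], thresholds, [np.inf]))
probs = norm.cdf(cuts[1:]) - norm.cdf(cuts[:-1])
print(probs, probs.sum())  # four category probabilities that sum to 1
```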

GRADED RESPONSE MODEL

Example Graded Response Item
From the 2006 Illinois Standards Achievement Test (ISAT): www.isbe.state.il.us/assessment/pdfs/grade_5_isat_2006_samples.pdf

ISAT Scoring Rubric

Additional Example Item
Cognitive items are not the only ones where graded response data occur. Likert-type questionnaires are commonly scored using ordered categorical values. Typically, these ordered categories are treated as continuous data (as with factor analysis). Consider the following item from the Satisfaction With Life Scale (SWLS; Diener, Emmons, Larsen, & Griffin, 1985):

SWLS Item #1
I am satisfied with my life.
1. Strongly disagree
2. Disagree
3. Slightly disagree
4. Neither agree nor disagree
5. Slightly agree
6. Agree
7. Strongly agree

Graded Response Model (GRM)
- Ideal for items with a clear underlying response continuum
- # response options (k) don't have to be the same across items
- Is an indirect or difference model: compute the difference between submodels to get the probability of each response
- Estimate one a_i per item and k − 1 difficulties (4 options :: 3 difficulties)
- Models the probability of any given response category or higher, so each difficulty submodel looks like the 2PL
  - Otherwise known as a cumulative logit model
- Like dividing a 4-category item into a series of binary items:
  - 0 vs. 1,2,3 (b_1i)
  - 0,1 vs. 2,3 (b_2i)
  - 0,1,2 vs. 3 (b_3i)
- But each threshold uses all response data in estimation

Example GRM for 4 Options (0-3): 3 Submodels with a Common a
Each submodel is a 2PL-style model for responding in that category or higher (reconstructed in standard logistic form):

P_i1 = P(Y ≥ 1 | θ_s) = exp(a_i(θ_s − b_1i)) / [1 + exp(a_i(θ_s − b_1i))]   (0 vs. 1,2,3)
P_i2 = P(Y ≥ 2 | θ_s) = exp(a_i(θ_s − b_2i)) / [1 + exp(a_i(θ_s − b_2i))]   (0,1 vs. 2,3)
P_i3 = P(Y ≥ 3 | θ_s) = exp(a_i(θ_s − b_3i)) / [1 + exp(a_i(θ_s − b_3i))]   (0,1,2 vs. 3)

Category probabilities come from differencing (see the sketch below):
Prob of 0 = 1 − P_i1
Prob of 1 = P_i1 − P_i2
Prob of 2 = P_i2 − P_i3
Prob of 3 = P_i3 − 0

Note: a_i is the same across thresholds :: only one slope per item.
b_ik = trait level needed to have a 50% probability of responding in that category or higher.
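A minimal sketch of this differencing logic in Python, assuming the standard logistic form above (hypothetical item parameters; no 1.7 scaling):

```python
import numpy as np

def grm_probs(theta, a, b):
    """Graded Response Model category probabilities for one item.
    a: the item's single slope; b: k-1 ordered difficulties b_1i < b_2i < ..."""
    b = np.asarray(b)
    # Cumulative 2PL-style submodels: P(Y >= k | theta), one per threshold
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Difference adjacent cumulative probabilities; P(Y>=0)=1 and P(Y>=K)=0
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical 4-option item (0-3): a_i = 1.5, b_1i = -1, b_2i = 0, b_3i = 1
print(grm_probs(theta=0.25, a=1.5, b=[-1.0, 0.0, 1.0]))  # sums to 1
```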

Cumulative Item Response Curves (GRM for 5-Category Item, a_i = 1)
[Figure: P(Y ≥ y | Theta) plotted against Theta for y = 0..4, with b_1 = −2, b_2 = −1, b_3 = 0, b_4 = 1. With a_i = 1, all curves have the same slope.]

Cumulative Item Response Curves (GRM for 5-Category Item, a_i = 2)
[Figure: the same curves, P(Y ≥ y | Theta) for y = 0..4 with b_1 = −2, b_2 = −1, b_3 = 0, b_4 = 1, but with a_i = 2 the slopes are steeper.]

Category Response Curves (GRM for 5-Category Item, a_i = 1)
[Figure: P(Y = y | Theta) for y = 0..4, giving the most likely category response across Theta.]
The b_ik's do not map directly onto this illustration of the model, as these curves are calculated from the differences between the submodels. This is what is given in Mplus, however.

Category Response Curves (GRM for 5-Category Item, a_i = 2)
[Figure: P(Y = y | Theta) for y = 0..4, giving the most likely category response across Theta; with a_i = 2 the slopes are steeper.]

Category Response Curves (GRM for 5-Category Item, a_i = .5)
[Figure: P(Y = y | Theta) for y = 0..4 with flat slopes.]
This is exactly what you do NOT want to see: although they are ordered, the middle categories are basically worthless.
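A short plotting sketch for producing category response curves like these from your own (here hypothetical) parameter values, so weak middle categories can be spotted:

```python
import numpy as np
import matplotlib.pyplot as plt

def grm_probs(theta, a, b):
    """GRM category probabilities (same differencing sketch as above)."""
    cum = np.concatenate(([1.0],
                          1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b)))),
                          [0.0]))
    return cum[:-1] - cum[1:]

thetas = np.linspace(-4, 4, 201)
# Hypothetical weak item: a_i = 0.5 with b's at -2, -1, 0, 1 (5 categories)
curves = np.array([grm_probs(t, 0.5, [-2.0, -1.0, 0.0, 1.0]) for t in thetas])

for k in range(curves.shape[1]):
    plt.plot(thetas, curves[:, k], label=f"P(Y={k}|Theta)")
plt.xlabel("Theta"); plt.ylabel("P(Y = y | Theta)"); plt.legend()
plt.show()  # flat middle-category curves flag options that never dominate
```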

Modified ("Rating Scale") Graded Response Model
- A more parsimonious version of the graded response model, designed for items with the same response format
- In the GRM, there are (#options − 1)*(#items) thresholds estimated, plus one slope per item
- In the MGRM, each item gets its own slope and its own location parameter, but the differences between categories around that location are constrained equal across items (a c shift for each threshold)
  - Items differ in overall location, but the spread of categories within is equal
  - So: a different a_i and b_i per item, but the same c_1, c_2, and c_3 across items
- Each submodel shifts the item location by the shared category offset (reconstructed form):
  P(Y ≥ 1 | θ_s) = exp(a_i(θ_s − b_i + c_1)) / [1 + exp(a_i(θ_s − b_i + c_1))]   (and so forth for c_2 and c_3)
- Not the same c as the guessing parameter; sorry, they reuse letters
- Not directly available within Mplus, but probably could be done using constraints

[Figure: Modified GRM :: one location b_i per item plus k − 1 shared offsets (c_1, c_2, c_3, c_4); all category distances are the same across items. Original GRM :: k − 1 locations per item (b_11 ... b_34); all category distances are allowed to differ across items.]

Summary of Models for Ordered Categorical Responses
Available in Mplus with the CATEGORICAL ARE option.

- Difficulty per item only (category distances equal):
  - Equal discrimination across items (1-PLish): possible, but no special name
  - Unequal discriminations (2-PLish): Modified GRM or Rating Scale GRM (same response options)
- Difficulty per category per item:
  - Equal discrimination across items (1-PLish): possible, but no special name
  - Unequal discriminations (2-PLish): Graded Response Model (cumulative logit)

The GRM and Modified GRM are reliable models for ordered categorical data:
- Commonly used in real-world testing; most stable to use in practice
- Least data demand, because all data get used in estimating each b_ik
- Only major deviations from the model will end up causing problems

PARTIAL CREDIT MODEL

Partial Credit Model (PCM)
- Ideal for items for which you want to test the assumption of an ordered underlying continuum
- # response options doesn't have to be the same across items
- Is a direct, "divide-by-total" model (the probability of each response is given directly)
- Estimate k − 1 thresholds (so 4 options :: 3 thresholds)
- Models the probability of adjacent response categories
  - Otherwise known as an adjacent category logit model
- Divides the item into a series of binary items, but without order constraints beyond adjacent categories, because each submodel uses only those 2 categories:
  - 0 vs. 1 (δ_1i)
  - 1 vs. 2 (δ_2i)
  - 2 vs. 3 (δ_3i)
- No guarantee that any category will be most likely at some point

Partial Credit Model
- With different slopes (a_i) per item, it's the Generalized Partial Credit Model; otherwise, the 1-PLish version is the Partial Credit Model
- Still 3 submodels for 4 options, but set up differently (reconstructed in standard adjacent-category form; see the sketch below):
  P(Y = 1 | Y = 0 or 1, θ_s) = exp(a_i(θ_s − δ_1i)) / [1 + exp(a_i(θ_s − δ_1i))]
  P(Y = 2 | Y = 1 or 2, θ_s) = exp(a_i(θ_s − δ_2i)) / [1 + exp(a_i(θ_s − δ_2i))]
  P(Y = 3 | Y = 2 or 3, θ_s) = exp(a_i(θ_s − δ_3i)) / [1 + exp(a_i(θ_s − δ_3i))]
- δ is the "step" parameter :: the latent trait level where the next category becomes more likely (not necessarily 50%)
- Other parameterizations are also used; check the program manuals
- Currently not directly available in Mplus
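A minimal divide-by-total sketch of the (generalized) PCM under the parameterization above; setting `a = 1` gives the 1-PLish PCM, and the δ values are hypothetical:

```python
import numpy as np

def gpcm_probs(theta, a, deltas):
    """(Generalized) Partial Credit Model category probabilities for one item.
    deltas: k-1 step parameters, which need NOT be ordered (reversals allowed)."""
    deltas = np.asarray(deltas)
    # Exponent for category k is the cumulative sum of a*(theta - delta_j);
    # category 0 gets an exponent of 0
    z = np.concatenate(([0.0], np.cumsum(a * (theta - deltas))))
    ez = np.exp(z - z.max())  # subtract the max for numerical stability
    return ez / ez.sum()      # "divide by total"

# Hypothetical 4-option item with a reversal (delta_2 < delta_1)
print(gpcm_probs(theta=0.0, a=1.0, deltas=[0.5, -0.8, 1.2]))
```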

Generalized Partial Credit Model
The item score category response function, in divide-by-total form (reconstructed):

P(Y = k | θ_s) = exp( Σ_{j=1..k} a_i(θ_s − δ_ji) ) / Σ_{m=0..K−1} exp( Σ_{j=1..m} a_i(θ_s − δ_ji) )

(with the empty sum for m = 0 set to 0)

Category Response Curves (PCM for 5-Category Item, a_i = 1)
[Figure: P(Y = y | Theta) for y = 0..4, giving the most likely category response across Theta.]
These curves look similar to the GRM's, but the location parameters are interpreted differently: they are NOT cumulative, they are only adjacent.

Category Response Curves (PCM for 5-Category Item, a_i = 1)
[Figure: the same curves with the step parameters δ_01, δ_12, δ_23, δ_34 marked at the crossings of adjacent category curves.]
The δ's are the locations where the next category becomes more likely (not 50%).

Category Response Curves (PCM for 5-Category Item, a_i = 1)
[Figure: the same curves, but with δ_12 falling to the left of δ_01.]
Here a score of 2 instead of 1 requires less Theta than 1 instead of 0. This is called a reversal. But here, this likely only happens because of a very low frequency of 1's.

Partial Credit Model vs. Graded Response Model
- The PCM is very similar to the GRM, except that PCM-family models allow for the fact that one or more of the score categories may never have a point where its probability is greatest for a given θ level
- Because of local estimation, there is no guarantee that the category b values will be ordered
- This is a flaw or a strength, depending on how you look at it

PCM and GPCM vs. GRM
- The GPCM and GRM will generally agree very closely, unless one or more of the score categories is underused
- The GRM forces the category boundary parameters to be ordered; the GPCM and PCM do not
- For this reason, comparing results from the same data across models can point out interesting phenomena in your data

More of what you don't want to see: category response curves from a PCM where reversals are plentiful and the middle categories are fairly useless.
[Figure legend: response categories 0 = green = Time-Out; 1 = pink = 30-45 s; 2 = blue = 15-30 s; 3 = black = < 15 s. *Misfit (p < .05).]

PCM Example: General Intrusive Thoughts (5 options)
Note that the 4 thresholds cover a wide range of the latent trait, and what the distribution of Theta looks like as a result... but the middle 3 categories are used infrequently and/or are not differentiable.
[Figure: category response curves and the Theta distribution over latent trait scores from −3 to 3.]

Partial Credit Model Example: Event-Specific Intrusive Thoughts (4 options)
Note that the 3 thresholds do NOT cover a wide range of the latent trait, and what the distribution of Theta looks like as a result.
[Figure: category response curves and the Theta distribution over latent trait scores from −3 to 3.]

Rating Scale Model
- The Rating Scale Model is to the PCM what the Modified GRM is to the GRM
- A more parsimonious version of the partial credit model, designed for items with the same response format
- In the PCM, there are (#options − 1)*(#items) step parameters estimated (plus one slope per item in the generalized PCM version)
- In the RSM, each item gets its own slope and its own location parameter, but the differences between categories around that location are constrained equal across items
  - Items differ in overall location, but the spread of categories within is equal
  - So: a different δ_i per item, but the same c_1, c_2, and c_3 across items
- Each submodel shifts the item location by the shared step offset (reconstructed form):
  P(Y = 1 | Y = 0 or 1, θ_s) = exp(a_i(θ_s − δ_i + c_1)) / [1 + exp(a_i(θ_s − δ_i + c_1))]   (and so forth for c_2 and c_3)
- δ_i is a location parameter, and c is the step parameter as before
- Constrains the curves to look the same across items, just shifted by δ_i

[Figure: Rating Scale Model :: one location δ_i per item plus k − 1 shared offsets (c_1, c_2, c_3, c_4); all category distances are the same across items. Original PCM :: k − 1 step parameters per item (δ_11 ... δ_34); all category distances are allowed to differ across items.]

Summary of Models for Partially Ordered Categorical Responses
- Partial Credit Models test the assumption of ordered categories
  - This can be useful for item screening, but perhaps not for actual analysis
- These models have additional data demands relative to the GRM
  - Only data from that threshold get used (i.e., for 1 vs. 2, the 0 and 3 responses don't contribute)
  - So larger sample sizes are needed to identify all model parameters
  - Sometimes categories have to be consolidated to get the model to not blow up
- Not directly available in Mplus

- Difficulty per item only (category distances equal; same response options):
  - Equal discrimination across items (1-PLish): Rating Scale PCM
  - Unequal discriminations (2-PLish): Generalized Rating Scale PCM(?)
- Difficulty per category per item:
  - Equal discrimination across items (1-PLish): Partial Credit Model
  - Unequal discriminations (2-PLish): Generalized PCM (adjacent category logit)

ADDITIONAL FEATURES OF ORDERED CATEGORICAL MODELS

Expected Scores
It is useful to combine the probability information from the categories into one function for an expected score: multiply each score by its probability, then add up over the categories at any Theta level. This expected score function acts as a single Item Characteristic Function (analogous to the ICC for dichotomous/binary items); a small sketch follows below.
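A minimal sketch of the expected-score calculation, using made-up category probabilities for a single item at one Theta value:

```python
import numpy as np

def expected_score(category_probs):
    """E(X | theta): multiply each score (0, 1, 2, ...) by its probability
    and sum over categories; tracing this over theta gives the ICF."""
    p = np.asarray(category_probs)
    return float(np.sum(np.arange(len(p)) * p))

# Hypothetical 5-category probabilities at some theta
print(expected_score([0.05, 0.15, 0.40, 0.30, 0.10]))  # -> 2.25
```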

[Figure: Item Characteristic Function :: expected score E(X) (0-4) plotted against ability (θ) from −3 to 3.]

[Figure: Expected proportion correct :: expected proportion score E(X)/m_j (0-1) plotted against ability (θ) from −3 to 3.]

[Figure: ICF shown alongside the category probability curves P of x for y = 0 through y = 4, plotted against ability (θ) from −3 to 3.]

Item/Test Characteristic Function
- The ICF is a good summary of an item and is used in test development, DIF studies, and model-data fit evaluations
- As before, the TCF is equal to the sum of the expected scores over items (see the sketch below)
- This could include dichotomous, polytomous, or mixed-format tests
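A tiny sketch of the TCF as the sum of item expected scores; the per-item values here are hypothetical, as if produced by the expected-score sketch above:

```python
import numpy as np

# Hypothetical expected scores for three items, all evaluated at the same theta
item_expected_scores = np.array([2.25, 0.80, 1.60])

# Test characteristic function value at that theta: sum over items
print(item_expected_scores.sum())  # expected total test score given theta
```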

NOMINAL RESPONSE MODELS

Nominal Response Model
- Ideal for items with no ordering of any kind (e.g., dog, cat, bird)
- # response options don't have to be the same across items
- Is a direct model (the probability of each response is given directly)
- Models the probability of one response category against all others
- Still like dividing the item into a series of binary items, but now each option is really considered as a separate item ("baseline category logit"):
  - 0 vs. 1,2,3 (c_1i)
  - 1 vs. 0,2,3 (c_2i)
  - 2 vs. 0,1,3 (c_3i)
- For example, for a 4-option item:
  P(Y = 1 | θ_s) = exp(1.7 a_i1(θ_s + c_i1)) / Σ_{y=0..3} exp(1.7 a_iy(θ_s + c_iy))
- Estimate one slope (a_i) and one intercept (c_i) parameter per item, per threshold, such that sum(a's) = 0 and sum(c's) = 0 (so a and c are only relatively meaningful within a single item)
- Available in Mplus with the NOMINAL ARE option
- Can be useful for examining distractors in multiple-choice tests; a small sketch follows below
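A minimal sketch of the NRM response probabilities using the slide's parameterization (including the 1.7 constant and the θ + c form); the slope and intercept values are hypothetical, with each set summing to zero:

```python
import numpy as np

def nrm_probs(theta, a, c):
    """Nominal Response Model (baseline-category logit) probabilities.
    a, c: per-option slopes and intercepts, each constrained to sum to 0,
    so they are only relatively meaningful within a single item."""
    a, c = np.asarray(a), np.asarray(c)
    z = 1.7 * a * (theta + c)   # parameterization as given on the slide
    ez = np.exp(z - z.max())    # stabilize before normalizing
    return ez / ez.sum()

# Hypothetical 4-option item: sum(a) = 0 and sum(c) = 0
a = np.array([-1.0, -0.2, 0.4, 0.8])
c = np.array([0.5, -0.9, 0.1, 0.3])
print(nrm_probs(theta=1.0, a=a, c=c))  # probabilities across the 4 options
```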

Example Nominal Response Item

Additional Item Types
Non-cognitive tests can also contain differing item types that could be modeled using a Nominal Response Model. For example, consider an item from a questionnaire about political attitudes:

Which political party would you identify yourself with?
A. Democrat
B. Republican
C. Independent
D. Green
E. Unaffiliated

Category Response Curves (NRM for 5-Category Item)
[Figure: nominal response item response functions P(X = a | Theta) through P(X = d | Theta) plotted over Theta from −4 to 4.]
Example distractor analysis: people low in Theta are most likely to pick d, but c is their second choice; people high in Theta are most likely to pick a, but b is their second choice.

CONCLUDING REMARKS

Summary: Polytomous Models
There are many kinds of polytomous IRT models:
- Some assume order of the response options (done in Mplus)
  - Graded Response Model family :: cumulative logit model
  - Models cumulative change in categories, using all data for each threshold
- Some allow you to test the order of the response options (no Mplus)
  - Partial Credit Model family :: adjacent category logit model
  - Models adjacent category thresholds only, so they allow you to see reversals (empirical mis-ordering of your response options with respect to Theta)
  - The PCM is useful for identifying the separability and adequacy of categories
  - Can be done using SAS NLMIXED (although very slowly; see example)
- Some assume no order of the response options (done in Mplus)
  - Nominal Model :: baseline category logit model
  - Useful for examining the probability of each response option
  - Is very unparsimonious and thus can be hard to estimate

Up Next: Estimation of Parameters for IRT Models
- Estimating person parameters when item parameters are known
- Joint estimation of person and item parameters