Revisiting Nested Stratification of Primary Sampling Units


Tom Krenzke and Wen-Chau Haung
Westat, 1600 Research Boulevard, Rockville, MD 20850

Stratified multi-stage cluster area sample designs are used widely when conducting large, in-person surveys in the US because they are cost effective and efficient. In such surveys, it is expensive to conduct a listing operation to create a frame of dwelling units and to travel interviewers to the selected households. Therefore, in the first stage of selection, a usual approach is to form, stratify, and select Primary Sampling Units (PSUs), for example, counties or groups of counties. These geographic areas are formed to reduce interviewer travel costs and to increase the heterogeneity within PSUs. Prior to selection, PSUs are stratified into homogeneous groups in order to reduce the anticipated sampling variation in the resulting survey estimates. Another objective is to form strata close-to-equal in population totals to help achieve close-to-equal interviewer workloads and a self-weighting sample. Reducing the variation among stratum-level population totals for one-PSU-per-stratum designs helps to reduce the variance in survey estimates, especially totals, and also reduces the bias in variance estimates. While some stratification designs are conducted to serve multiple purposes or surveys, we focus the discussion on a single survey with a single variable of interest.

Stratification searches have been implemented using sophisticated multivariate clustering algorithms, such as described in Friedman and Rubin (1967), Jewett and Judkins (1988) and a more computer-intensive approach as presented in Ludington (1992). Kish (1965) describes the considerable effort spent implementing a stratification approach, and questions the benefits of such expensive efforts. He mentions that stratification attempts that appear to be very different often lead to about the same variances. One purpose of this paper is to investigate Kish's conclusive remarks.
We describe searches under a simplified multivariate algorithm using nested stratification, likely used by Kish, which attempts to increase homogeneity (using distance measures) and reduce the variation among substrata population totals, while arriving at explicit boundaries, as some may prefer for documentation and for clearly communicating the stratification results. In effect, we revisit the efforts undertaken by Kish and his colleagues, measuring the variation across hundreds of stratification schemes.

A second purpose of this paper is to present an evaluation of the PSU stratification design for the 2003 National Assessment of Adult Literacy (NAAL), which used a nested stratification procedure. Subsequent to the conduct of the survey, extensive modeling led to the identification of key variables that would be good stratification variables (such as Decennial Census data) for the future. Also, model-based estimates of low literacy at the county level have been produced and are used here as an evaluation variable for computing the between PSU variance for different substratification schemes. Lastly, improvements to Westat's PSU stratification software (WesStrat) have allowed more stratification schemes to be created.

Key steps leading up to the substratification process

The underlying scenario for this discussion is to select a stratified probability proportionate to size sample of PSUs that will lead to a self-weighting design. We consider the following steps in the stratification of PSUs.

Determining the measure of size. The measure of size used to select PSUs is typically the population count within the PSUs. The measure of size is used to allocate the total number of substrata proportionate to size to each major stratum. It is also used in forming the substrata, for instance, to reach the objective of equal-sized strata.

Identifying self-representing (SR) PSUs. Self-representing (SR) PSUs are typically PSUs with the largest values of the measure of size. The SR PSUs come into the sample with probability equal to one. Each SR PSU is in a stratum by itself and therefore is excluded from the substratification process.

Determining the number of PSUs and strata. The number of PSUs to select depends primarily on cost and reliability considerations, which include the increase in sampling variance due to clustering individuals within sampling units. In general, the more PSUs selected, the less clustering but the higher the cost due to interviewer travel. Once the number of PSUs is established, the total number of strata is derived from the number of PSUs planned in the sample and the number of sample PSUs per stratum. Once the self-representing PSUs are identified, under a one-PSU-per-stratum design, the total number of strata is equal to the total number of PSUs needed, which is equal to the number of SR PSUs plus the number of non-self-representing (NSR) PSUs. Therefore, the number of NSR strata is equal to the total number of strata minus the number of SR PSUs. Under a two-PSU-per-stratum design, the total number of strata is equal to the number of SR PSUs added to one-half the number of NSR PSUs needed. Again, the number of NSR strata is equal to the total number of strata minus the number of SR PSUs.

Identifying major strata. The non-self-representing (NSR) PSUs on the frame are grouped into major strata. The major strata are typically formed to ensure representation across geographic areas while allowing for estimates to be reported for the domains they represent (e.g., state-level estimates). Once identified, they serve as hard boundaries when forming the substrata. The major strata should also be related to the survey outcome measure.

Identifying substratification variables. Typically 2 to 4 variables are used to form strata within the major strata. The substratification variables should be related to the survey outcome variable of interest. They may be selected after processing a stepwise regression, or after a review of the literature of past analyses. Some examples of stratifiers used in demographic surveys include median household income, total population size, and proportion of the population with a college degree.

Allocating the total number of NSR strata to the major strata. The total number of NSR strata needs to be allocated to the major strata.
When the allocation is done proportionate to the measure of size, strata totals may be more equal in size across all strata. With a one-PSU-per-stratum design, it is preferable to have an even number of strata allocated to each major stratum, since strata will need to be combined (paired) to facilitate variance estimation.

Substratifying each major stratum. Once the steps described above are completed, the substratification process is implemented. The next section discusses a nested stratification approach that has been used in several surveys at Westat, including the Early Childhood Longitudinal Study, NAAL, and the Adult Literacy and Lifeskills Survey, each sponsored by the National Center for Education Statistics, and the National Center for Health Statistics' National Health and Nutrition Examination Survey.

A nested stratification approach

Given the dual objective of reducing the between PSU variance and arriving at close-to-equal measure of size totals across substrata, we undertook a search for the best substratification solution given the underlying nested stratification approach. We should note that we treat the objectives as equal in this paper; however, in practice, one may be favored over the other depending on the situation.

The nested stratification design begins with forming substrata from one stratifier. With a 2nd stratifier, substrata are formed within each stratum from the 1st stratifier. A 3rd stratifier is used to form substrata within each substratum formed by the 2nd stratifier, and so on. The splitting on each stratifier is focused on arriving at close-to-equal size totals across substrata. This nested approach can be thought of in terms of a tree structure, where a set of branches is created by splitting the set of PSUs into groups. The branches are identified by using percentiles weighted by the measure of size (MOSVAR). For example, suppose percent Black (PCT_BLK) is the lone stratifier (SV = 1) used to form three substrata (H_g = 3). Then there is only one possible solution for the tree structure. Given that solution, two (H_g - 1) cutpoints are created on the stratifier PCT_BLK.
To find the cutpoints, the algorithm first sorts the PSUs by PCT_BLK, and then computes the cumulated sum of the measure of size for each subsequent PSU record. The cutoffs in PCT_BLK are the points that contribute 1/3 and 2/3 of the total measure of size. Appendix A provides more details of the algorithm.

Given the number of substrata allocated to major stratum g (H_g), and given the number of stratifiers (SV), all possible nested substratification schemes are found under the above splitting approach. The number of possible substratification schemes (Z) is equal to

\[ Z = SV^{\,H_g - 1}. \]

Table 1 shows the number of schemes related to the number of stratifiers and number of substrata.
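The cutpoint search described above can be sketched in a few lines of Python. This is a minimal illustration, not the WesStrat implementation: the function names, the input layout (pairs of stratifier value and measure of size), the sample data, and the rule that the PSU whose cumulated size first reaches the target closes the substratum are all assumptions made for the example.

```python
from itertools import accumulate

def weighted_cutpoints(psus, n_substrata):
    """Return the n_substrata - 1 stratifier cutpoints.

    psus: list of (stratifier_value, measure_of_size) pairs (assumed layout).
    """
    ordered = sorted(psus, key=lambda p: p[0])           # sort PSUs by the stratifier
    cum = list(accumulate(mos for _, mos in ordered))    # cumulated measure of size
    total = cum[-1]
    cutpoints = []
    for k in range(1, n_substrata):
        # First PSU whose cumulated size reaches k/n of the total.
        # Integer comparison avoids floating-point issues (assumed tie rule).
        idx = next(i for i, c in enumerate(cum) if c * n_substrata >= total * k)
        cutpoints.append(ordered[idx][0])
    return cutpoints

def assign_substrata(psus, cutpoints):
    """Label each PSU 1..H by counting the cutpoints strictly below its value."""
    return [sum(value > c for c in cutpoints) + 1 for value, _ in psus]

# Hypothetical PSUs: (PCT_BLK, MOSVAR)
psus = [(5.0, 100), (12.0, 200), (20.0, 300), (35.0, 150), (50.0, 150)]
cutpoints = weighted_cutpoints(psus, 3)
labels = assign_substrata(psus, cutpoints)
```

With these made-up sizes the three substrata each hold one third of the total measure of size. Real frames rarely split this evenly, which is why the equal-size strata measure is tracked as an evaluation tool.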

Table 1. Number of substratification schemes by number of stratifiers (SV) and number of substrata

Number of substrata    SV=1    SV=2       SV=3        SV=4
 2                     1          2          3           4
 3                     1          4          9          16
 4                     1          8         27          64
 5                     1         16         81         256
 6                     1         32        243        1024
 7                     1         64        729        4096
 8                     1        128       2187       16384
 9                     1        256       6561       65536
10                     1        512      19683      262144
11                     1       1024      59049     1048576
12                     1       2048     177147     4194304
13                     1       4096     531441    16777216
14                     1       8192    1594323    67108864

The underlying approach is illustrated in Figure 1, which shows the nested stratification charts for three substrata (H=3) and three stratifiers (SV=3), referred to as a, b, and c. The figure shows scheme notations, for example, at the top left chart (1): (1,1,1)(1,1,2)(1,1,3), giving 3 nodes, one for each final substratum. This notation provides useful information: the first position in each node is related to stratifier 1, the second position to stratifier 2, and the third position to stratifier 3. For chart (1), the scheme notation is shown as (1,1,1)(1,1,2)(1,1,3), because the first two positions are constant and the third position changes, reflecting that only the third stratifier is used in the stratification. For chart (2), the scheme notation (1,1,1)(1,1,2)(1,2,1) shows that the first stratifier is not used, and the second stratifier is split into two, with one further split made on the third stratifier.

After each automated substratum solution is generated, the evaluation tools (between PSU variance and equal-size strata measure) are computed. The objective is to reduce the values of these measures when grouping PSUs into strata.
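One way to read the scheme notation is as a sequence of node labels in which each new node increments one position of the previous node and resets every later position to 1. The sketch below enumerates schemes under that reading. It is an interpretation inferred from the nine charts of Figure 1 and from the count Z = SV^(H_g - 1), not a description of how the software generates schemes, and `enumerate_schemes` is a hypothetical name.

```python
from itertools import product

def enumerate_schemes(n_substrata, n_stratifiers):
    """List nested scheme notations as lists of node tuples (assumed reading)."""
    schemes = []
    # Each of the H - 1 additional nodes chooses which position to increment.
    for choices in product(range(n_stratifiers), repeat=n_substrata - 1):
        node = [1] * n_stratifiers
        scheme = [tuple(node)]
        for pos in choices:
            node[pos] += 1
            for later in range(pos + 1, n_stratifiers):
                node[later] = 1          # reset positions after the incremented one
            scheme.append(tuple(node))
        schemes.append(scheme)
    return schemes

schemes_3x3 = enumerate_schemes(3, 3)
for scheme in schemes_3x3:
    print("".join(str(node).replace(" ", "") for node in scheme))
```

For three substrata and three stratifiers this yields nine schemes, including (1,1,1)(1,1,2)(1,1,3) and (1,1,1)(1,1,2)(1,2,1) discussed above. In general the number of schemes produced is SV raised to the power H - 1, matching Table 1.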
The computation of the between-PSU variance for an evaluation variable's total U, for major stratum g and substratum h, among the I_h PSUs in the substratum, is as follows:

\[ \mathrm{BETWVAR}_{gh} = \sum_{i=1}^{I_h} \mathrm{PROB}_{ghi} \left( \hat{U}_{gh(i)} - U_{gh} \right)^2 , \]

where

\[ \mathrm{PROB}_{ghi} = \frac{\mathrm{MOSVAR}_{ghi}}{\mathrm{MOSVAR}_{gh}}, \qquad \hat{U}_{gh(i)} = \frac{U_{ghi}}{\mathrm{PROB}_{ghi}}, \]

and where

MOSVAR_ghi = size measure for PSU i, substratum h for major stratum g,
MOSVAR_gh = size measure for substratum h for major stratum g,
Û_gh(i) = estimated total of evaluation variable U by PSU i of substratum h, major stratum g,
U_ghi = total of evaluation variable for PSU i, substratum h for major stratum g,
U_gh = total of evaluation variable for substratum h for major stratum g.

For each major stratum g, the between PSU variance becomes:

\[ \mathrm{BETWVAR}_{g} = \sum_{h=1}^{H_g} \mathrm{BETWVAR}_{gh}. \]

If the evaluation variables are in terms of percentages, they are converted to totals U.

Figure 1. Nested Stratification Charts for Three Strata (H=3) and Three Stratifiers (SV=3)

(1) 111 112 113    (2) 111 112 121    (3) 111 112 211
(4) 111 121 122    (5) 111 121 131    (6) 111 121 211
(7) 111 211 212    (8) 111 211 221    (9) 111 211 311

The equal-size strata measure is simply the variance of the substratum-level MOSVAR values. It is computed as follows for major stratum g:

\[ \frac{\displaystyle \sum_{h=1}^{H_g} \mathrm{MOSVAR}_{gh}^{2} - \frac{1}{H_g}\left( \sum_{h=1}^{H_g} \mathrm{MOSVAR}_{gh} \right)^{2}}{H_g - 1} \]
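Both evaluation measures can be computed directly from their definitions. The sketch below assumes each substratum is supplied as a list of (measure of size, evaluation total) pairs. The function names and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
def between_psu_variance(psus):
    """BETWVAR for one substratum; psus is a list of (MOSVAR, U) pairs."""
    mos_h = sum(mos for mos, _ in psus)       # substratum size measure
    u_h = sum(u for _, u in psus)             # substratum total of U
    variance = 0.0
    for mos, u in psus:
        prob = mos / mos_h                    # PROB = MOSVAR_i / MOSVAR_h
        variance += prob * (u / prob - u_h) ** 2
    return variance

def major_stratum_variance(substrata):
    """Sum BETWVAR over the substrata of one major stratum."""
    return sum(between_psu_variance(psus) for psus in substrata)

def equal_size_measure(substratum_sizes):
    """Sample variance of the substratum-level size measures."""
    h = len(substratum_sizes)
    total = sum(substratum_sizes)
    total_sq = sum(x * x for x in substratum_sizes)
    return (total_sq - total * total / h) / (h - 1)

# Hypothetical substrata: totals proportional to size give zero variance.
proportional = [(100, 10), (300, 30)]
skewed = [(100, 30), (300, 10)]
v_prop = between_psu_variance(proportional)
v_skew = between_psu_variance(skewed)
v_major = major_stratum_variance([proportional, skewed])
eq_even = equal_size_measure([300, 300, 300])
eq_uneven = equal_size_measure([100, 200, 300])
```

The proportional case shows why stratifiers correlated with the outcome help: when each PSU's total is proportional to its size, every PSU estimates the substratum total exactly and the between PSU variance vanishes.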

Evaluating the 2003 NAAL substratification scheme

We use the substratification process outlined above to evaluate the 2003 NAAL stratification scheme. As described in Mohadjer et al. (2009), the NAAL 2003 household study was designed to be a nationally representative sample from the 50 states and the District of Columbia of persons in households or college dormitories who were 16 years of age or older at the time of interview. The NAAL sample was selected based on a four-stage area sample design, aimed at reducing the cost of interviewing and assessing respondents in their homes. The first stage of selection was of primary sampling units (PSUs). PSUs were defined to be counties or sets of counties with the following general characteristics: 1) PSUs were required to have a minimum population of 15,000 persons; 2) PSUs were required to be no wider than 100 miles in maximum point-to-point distance; 3) PSUs consisted of counties that were either all Metropolitan Statistical Area (MSA) or all non-MSA; and 4) PSUs were required to stay within state boundaries. A total of 1,884 PSUs were formed and combined into 100 strata. A total of 100 PSUs was selected (one per stratum) with probability proportionate to size as the first-stage sample, with the estimated size equal to the year 2000 population.

Associated with the NAAL design were six state-level samples, called the State Assessment of Adult Literacy (SAAL). An additional 74 PSUs were sampled for the SAAL states, of which 14 overlapped with the 84 national NSR PSUs. To simplify the evaluation, we focused only on the stratification relating to the national NAAL sample. The 16 PSUs with the largest measures of size (based on the total household population from the 2000 Decennial Census) were identified as self-representing. Twelve of these 16 PSUs were identified as having probabilities equal to one, and the remaining four PSUs had initial probabilities of selection close to 1 and were also selected as self-representing. Each of the SR PSUs was treated as a single stratum and the remaining PSUs were stratified into 84 NSR strata.
The stratification process for the NSR PSUs started with the formation of 17 major strata defined by Census Division[1] and MSA status, where non-MSA PSUs in Census Divisions 1 and 2 were combined into one major stratum. Then, the sample size of 84 NSR PSUs was allocated proportional to the total measure of size in each of the major strata. Table 2 presents the allocation of the 84 NSR PSUs among the 17 major strata. As it is desirable for the purpose of variance estimation to select an even number of PSUs from a major stratum, the allocated numbers were mostly rounded to even numbers. Table 3 presents the variables used for substratification within each major stratum. The variables used in the substratification process were identified earlier by performing a regression analysis with the demographic variable relating to the percentage of the population 25 years and older who were high school graduates. The variables are listed in the table in order of importance, relating to the variability explained by each variable (as measured by R-square) in the regression analysis. The 2003 NAAL substratification process was done using the nested approach described above.

To evaluate the NAAL PSU stratification scheme, the ideal evaluation variable would be one that is an outcome variable from the survey itself, and is available for each county in the entire country. After the 2003 NAAL, county-level estimates were produced using small area estimation (SAE) techniques that rely on NAAL survey data, as well as data from other sources, such as the Decennial Census. As described in Mohadjer et al. (2009), NCES undertook the project to produce estimates of adults at the lowest literacy level for individual counties using statistical modeling approaches (such as in Rao 2003). The local area predictions estimate the percent lacking basic prose literacy skills (BPLS). These model-dependent estimates are called indirect estimates to distinguish them from standard or direct estimates that do not depend on the validity of a statistical model. The SAE approach uses the NAAL direct estimates and the modeling to borrow strength from other counties, and uses the auxiliary data to help improve upon the imprecision of available direct estimates.
We use the indirect estimates to evaluate the stratification scheme in terms of the between PSU variance.

[1] The nine census divisions are: 1) New England, 2) Middle Atlantic, 3) East North Central, 4) West North Central, 5) South Atlantic, 6) East South Central, 7) West South Central, 8) Mountain, 9) Pacific.

Table 2. Allocation of NSR PSUs in major strata

Census Division   MSA status   No. of PSUs   Population    Exact allocation   Rounded allocation
1                 MSA            13          8,569,586      3.31               3
1+2               Non-MSA        88          5,262,752      2.03               2
2                 MSA            36          19,721,29      7.61               7
3                 Non-MSA       238          8,874,76       3.42               3
3                 MSA            63          28,319,74     11.49              12
4                 Non-MSA       272          7,376,87       2.85               3
4                 MSA            36          11,274,000     4.35               4
5                 Non-MSA       264          1,215,21       3.94               4
5                 MSA            73          35,349,252    13.63              14
6                 Non-MSA       211          6,853,688      2.64               3
6                 MSA            31          9,696,238      3.74               4
7                 Non-MSA       220          6,649,947      2.57               3
7                 MSA            50          16,668,66      6.43               6
8                 Non-MSA       132          4,457,347      1.72               2
8                 MSA            29          1,293,97       3.97               4
9                 Non-MSA        72          3,604,643      1.39               2
9                 MSA            40          23,113,95      8.92               8
Total NSR                      1868          216,31,111    84                 84
Total SR                         16          55,868,955    16                 16
Total All PSUs                 1884          273,643,259  100                100

Note: Sums may not add to totals because of rounding.

Table 3. Substratification variables used in NAAL PSU stratification

Division        MSA Status   Substratification Variables
1, 2, 8 and 9   Non-MSA      Per capita income
3 and 4         Non-MSA      Per capita income, Percent Non-Hispanic White
5, 6, and 7     Non-MSA      Per capita income, Percent Non-Hispanic Black
1 and 2         MSA          Per capita income, Percent Hispanics
3 and 6         MSA          Per capita income, Percent Non-Hispanic Black
4               MSA          Per capita income
5 and 7         MSA          Per capita income, Percent Non-Hispanic Black, Percent Hispanic
8 and 9         MSA          Per capita income, Percent Non-Hispanic White
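Allocation proportional to the measure of size, as in the exact allocation column of Table 2, scales each major stratum's share of the total size by the number of strata to allocate. The sketch below uses hypothetical stratum names and populations rather than the NAAL figures. Rounding to the nearest integer is an assumption for illustration; in the NAAL design the rounded numbers were also adjusted toward even values where possible.

```python
def exact_allocation(sizes, n_strata):
    """Allocate n_strata proportional to each major stratum's measure of size."""
    total = sum(sizes.values())
    return {name: n_strata * size / total for name, size in sizes.items()}

# Hypothetical major strata and measures of size (not NAAL data).
sizes = {
    "stratum A": 4_200_000,
    "stratum B": 3_300_000,
    "stratum C": 2_500_000,
}
exact = exact_allocation(sizes, 84)

# Nearest-integer rounding (assumed rule); check that the total is preserved.
rounded = {name: round(value) for name, value in exact.items()}
total_preserved = sum(rounded.values()) == 84

for name in sizes:
    print(f"{name}: exact {exact[name]:.2f}, rounded {rounded[name]}")
```

When simple rounding does not preserve the total, or leaves a major stratum with an odd number of strata under a one-PSU-per-stratum design, the rounded values need a manual adjustment.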

To gain maximum benefit from stratification, high correlation between the stratifiers and the key survey outcome variable is required. The correlations of the indirect estimates with the SAE key predictors, as well as with the substratification variables used in the 2003 NAAL, are shown in Table 4. The correlation coefficients for the 2003 NAAL stratification process variables are slightly lower as a group when compared to the 2003 NAAL SAE predictors. Also provided in Table 4 are R^2 values for logistic regression models. As we would expect, the R^2 value for the model with the NAAL stratifiers is much lower (0.669) than for the model with the NAAL SAE predictors (0.898). The resulting R^2 value for a model that excludes the percentage of the population below the 150 percent poverty line from the set of NAAL SAE predictors is 0.868, still much larger than for the model that includes the NAAL stratifiers. The resulting R^2 value for a model that excludes two variables (the percentage who are Black or Hispanic, and the percentage of the population below the 150 percent poverty line) from the set of NAAL SAE predictors is 0.634, which is about the same level as for the model that includes the NAAL stratifiers.

Table 4. Logistic regression R^2 values and correlation coefficients with percent lacking BPLS for the 2003 NAAL stratifiers and SAE predictors

Covariate                                                                               R^2                           Correlation with percent lacking BPLS
2003 NAAL stratifiers                                                                   0.669
  Per capita income                                                                                                   -0.35
  Percentage of the population who are Non-Hispanic White                                                             -0.73
  Percentage of the population who are Non-Hispanic Black                                                              0.51
  Percentage of the population who are Hispanic                                                                        0.56
2003 NAAL SAE predictors / Evaluation stratifiers                                       0.898 (0.868*) (0.634**)
  Percentage of the population who are foreign-born stayed in the United States 0-20 years                             0.45
  Percentage of persons age 25 and older with high school education or less                                            0.51
  Percentage of the population who are Black or Hispanic                                                               0.80
  Percentage of the population below the 150 percent poverty line                                                      0.66

* This is the resulting R^2 value for a model that excludes the following predictor: percentage of the population below the 150 percent poverty line.
** This is the resulting R^2 value for a model that excludes the following predictors: percentage of the population who are Black or Hispanic, percentage of the population below the 150 percent poverty line.

The variables used as key predictors in the SAE models are used as stratifiers in the evaluation, excluding the percentage of the population below the 150 percent poverty line. Due to the computer-intensive search, for major strata with more than 10 substrata (H_g > 10), we used two stratifiers (SV = 2). For major strata with H_g ≤ 8, we used SV = 4. As shown in Table 2, no major strata had H_g = 9 or 10. Furthermore, the automated approach provides all possible schemes, and sometimes schemes include substrata with just one PSU. For this comparison, we exclude any substratification scheme with at least one substratum with just one PSU.

Since the evaluation stratifiers were predictors used in the SAE model that generated the evaluation measure (indirect estimates), the evaluation is a tough test for the 2003 NAAL scheme. This is exemplified by the results in Table 5, which shows the percentiles on the between PSU variance distribution within each major stratum for the NAAL 2003 stratification scheme among all generated evaluation schemes. The percentiles for non-MSAs, based on between PSU variance, ranged from the 53rd percentile in census division 3 to the 100th (worst scheme compared to the 16 evaluation schemes) in census division 7, and for MSAs, the percentiles ranged from the 8th percentile in census division 1 to the 98th percentile in census division 5. We would expect better results for the equal size strata measure since it does not depend on the evaluation variable, which is associated with the evaluation stratifiers. For non-MSAs, the NAAL percentiles based on the equal size strata measure ranged from the 12th percentile in census division 7 to the 80th percentile in the combined census divisions 1 and 2, whereas for MSAs the percentiles ranged from 0.2 in census divisions 3 and 9 to the 46th in census division 1. Each scheme was ranked within the major stratum, both in terms of the between PSU variance measure and the equal size strata measure.
Using the two ranks, the average combined rank among the two measures was then computed, and the percentiles within each major stratum associated with the NAAL scheme among the average combined ranks associated with the evaluation schemes are also shown in Table 5. For non-MSAs, the 2003 NAAL stratification's percentile ranged from 40 to 80, while for MSAs, the percentiles ranged from 0.3 to 60.

Table 5. Percentiles for 2003 NAAL results by evaluation measure and major strata

                                                                         Percentiles
Census Division   MSA Status   Number of substratification    Between PSU   Equal size   Average
                               evaluation schemes             variance      strata       combined rank
1                 MSA              12                           7.7          46.2         23.1
1+2               Non-MSA           4                          60.0          80.0         80.0
2                 MSA            4015                          91.7           1.1         45.5
3                 Non-MSA          16                          52.9          29.4         41.2
3                 MSA             168                          46.0           0.2         17.1
4                 Non-MSA          16                          58.8          70.6         70.6
4                 MSA              64                          93.8          21.5         60.0
5                 Non-MSA          64                          93.8          13.8         47.7
5                 MSA             723                          98.5           3.5          0.3
6                 Non-MSA          16                          64.7          35.3         52.9
6                 MSA              64                          40.0          23.1         15.4
7                 Non-MSA          16                         100.0          11.8         52.9
7                 MSA            1024                          38.6           9.3         13.2
8                 Non-MSA           4                          80.0          40.0         40.0
8                 MSA              64                          89.2           3.1         40.0
9                 Non-MSA           4                          80.0          40.0         60.0
9                 MSA           16269                           9.4           0.2         43.4

Improvements to stratification for future adult literacy surveys

The above analysis shows that improvements can be made to PSU stratification in future adult literacy surveys. The variables used in the extensive search for predictor variables in the SAE model are leading candidates for stratifiers. Using a measure such as the average combined rank, described above, will help to find the best scheme in reducing the between PSU variance and measure of size variation across substrata. We also investigated whether it was beneficial to use more stratifiers. The 2003 NAAL scheme included one to three stratifiers for any given major stratum. Table 6 shows the percent relative differences between the optimal solutions among the evaluation measures (separately for between PSU variance and equal size strata measure) for SV=2 and for SV=4. For MSAs in divisions 3 and 5, only schemes with SV=2 were generated and therefore the comparison between two and four stratifiers was not made. The two stratifiers used are the best two predictors in the SAE model and the schemes having four stratifiers are based on the four SAE predictors. A decrease signifies a reduction in the measure when going from two stratifiers to four stratifiers.
When comparing the minimums (lowest resulting measure), there are seven major strata with more than a 10% reduction in the between PSU variance (reductions ranging from 0% to -23%), while most strata have more than a 10% reduction in the equal size strata measure. These results imply that using more stratifiers can be beneficial, especially in reducing the variance in the size measure among substrata.

Variation among the stratification scheme results

One of the objectives of this paper was to determine whether efforts such as those undertaken by Kish and cohorts were worth the effort. Table 7 shows the 10th, 50th and 90th percentiles of the distribution of the between PSU variance and equal size strata measure across all substratification schemes in each major stratum. The table shows much more variation among the equal size strata measure than the between PSU variance. This is also seen in the scatterplots for each MSA major stratum in Figure 2 and each Non-MSA major stratum in Figure 3. The objective is to find the point that is furthest in the lower left-hand corner of each plot. Certainly, efforts to identify key stratifiers that are highly correlated with survey outcome measures will help to reduce the between PSU variance, in some major strata more than others, and one could search and select a scheme that approaches optimality. Also, there remains the benefit of reducing the equal size strata measure. Since there is a lot of variation between stratification schemes in terms of the between PSU variance and the equal size strata measure, we can say that it is well worth the effort to evaluate several stratification schemes. However, key auxiliary data are needed at the time of stratification to reduce the between PSU variance.

Table 6. Percent relative reductions between the resulting minimum evaluation measures when SV=2 and when SV=4

                               Number of schemes     Difference between the resulting minimum values (SV=2 to SV=4)
Census Division   MSA Status   SV=2     SV=4         Between PSU variance   Equal size strata measure
1                 MSA            2        12          -2.2%                  -67.7%
1+2               Non-MSA        2         4           0.0%                  -93.8%
2                 MSA           64      4015          -6.4%                  -15.0%
3                 Non-MSA        4        16           0.0%                  -66.8%
3                 MSA          168        --           --                     --
4                 Non-MSA        4        16           0.0%                  -77.1%
4                 MSA            8        64         -21.4%                  -27.0%
5                 Non-MSA        8        64         -16.0%                  -47.6%
5                 MSA          723        --           --                     --
6                 Non-MSA        4        16          -2.8%                  -62.8%
6                 MSA            8        64         -23.0%                  -37.9%
7                 Non-MSA        4        16         -11.7%                    0.0%
7                 MSA           32      1024          -2.3%                   -5.%
8                 Non-MSA        2         4          -1.1%                    0.0%
8                 MSA            8        64          -1.1%                    0.0%
9                 Non-MSA        2         4          -0.4%                    0.0%
9                 MSA          128     16269          -2.4%                  -12.1%

Note: For MSAs in divisions 3 and 5, only schemes with SV=2 were generated.

Table 7. Distribution of between PSU variance and equal size strata measure by major strata for the evaluation runs

                               Between PSU variance                          Equal size strata
Census Division   MSA Status   10th percentile   Median    90th percentile   10th percentile   Median    90th percentile
1                 MSA          6,955             87,286    116,283           97,979            723,41    1,81,964
1+2               Non-MSA      73,237            11,596    114,776           891               9,169     32,14
2                 MSA          249,163           28,849    332,543           19,117            392,78    614,662
3                 Non-MSA      44,928            61,189    63,345            4,38              15,871    21,743
3                 MSA          75,963            94,687    111,795           376,18            56,648    626,763
4                 Non-MSA      94,137            99,486    12,25             4,888             13,844    25,13
4                 MSA          41,655            46,791    51,856            96,229            237,762   631,251
5                 Non-MSA      176,469           2,236     219,718           7,766             22,962    43,547
5                 MSA          255,651           26,85     266,973           324,87            411,91    525,214
6                 Non-MSA      18,569            121,44    134,659           5,713             17,779    32,291
6                 MSA          71,767            85,535    95,41             83,546            187,78    347,391
7                 Non-MSA      223,732           249,68    264,46            5,631             12,213    2,344
7                 MSA          323,91            455,321   49,38             158,548           322,551   482,599
8                 Non-MSA      134,652           139,215   151,341           3,283             8,686     29,11
8                 MSA          134,88            158,567   182,36            135,773           3,397     477,877
9                 Non-MSA      155,73            159,36    168,99            4,13              7,62      36,951
9                 MSA          181,132           23,387    238,86            317,674           458,676   574,814

Note: Square roots of each measure are shown.
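The average combined rank used to compare schemes within a major stratum can be sketched as follows. The scheme names and measure values are hypothetical, the function name is illustrative, and ties are broken by sort order, which is an assumption since the handling of ties is not described here.

```python
def average_combined_rank(schemes):
    """Average the ranks on the two measures; lower is better on both.

    schemes: dict mapping scheme name -> (between PSU variance, equal size measure).
    """
    def ranks(values):
        ordered = sorted(values, key=values.get)       # smallest value gets rank 1
        return {name: position + 1 for position, name in enumerate(ordered)}

    variance_rank = ranks({name: m[0] for name, m in schemes.items()})
    size_rank = ranks({name: m[1] for name, m in schemes.items()})
    return {name: (variance_rank[name] + size_rank[name]) / 2 for name in schemes}

# Hypothetical measures for three evaluation schemes and the scheme actually used.
schemes = {
    "scheme 1": (85.0, 20.0),
    "scheme 2": (80.0, 70.0),
    "scheme 3": (120.0, 10.0),
    "used": (90.0, 60.0),
}
combined = average_combined_rank(schemes)
best = min(combined, key=combined.get)
print(best, combined)
```

In this made-up example no scheme is best on both measures, yet the combined rank still singles out the scheme that is second on each. That is the kind of compromise point sought in the lower left-hand corner of the scatterplots.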

Figure 2. Scatterplots showing the between PSU variance and equal size strata measure for each MSA major stratum (one panel each for Division 1 MSA through Division 9 MSA)
Notes: The x-axis is the equal size strata measure and the y-axis is the between PSU variance.

Figure 3. Scatterplots showing the between PSU variance and equal size strata measure for each Non-MSA major stratum (one panel each for Divisions 1 and 2 Non-MSA and for Division 3 Non-MSA through Division 9 Non-MSA)
Notes: The x-axis is the equal size strata measure and the y-axis is the between PSU variance.

As mentioned above, we are interested in how much better the best evaluation scheme is over the 2003 NAAL scheme. Figure 4 compares the NAAL scheme, depicted as a dot, and the associated best evaluation scheme, depicted by the arrow. Each line is for a comparison between the NAAL scheme and the best evaluation scheme for a certain census division. This is for MSAs only. The x-axis is the equal size strata measure and the y-axis is the between PSU variance. The objective is to reduce the values in each axis and therefore head into the lower left hand corner. The best evaluation scheme was determined by the average combined rank. As you can see, some of the lines point down to the lower left corner, showing the improvement can be made to both the equal size strata measure and the between PSU variance. However, some point down to the right, showing improvement in between PSU variance but not in equal size strata measure. In Division 2, the equal size measure was reduced substantially at the expense of some slight increase to the between PSU variance. While this was a tough test for the NAAL scheme, we were moderately satisfied with the results, while there is still room for improvement.

Figure 4. Comparison of the NAAL scheme and the best evaluation scheme, by Census Division, MSAs only

Concluding remarks

At the time of the 2003 NAAL stratification process, there was no measure of the association between the stratifiers and key survey outcome variables. With the production of county-level indirect estimates of the percent lacking BPLS, an evaluation variable that is one of the key NAAL survey outcomes became available, and key predictor variables were identified for the SAE model. Much of the effort, once software such as the one developed here is established, should go into the identification of key stratifiers. Soon, sample design plans will be written for the 2011 Programme for International Assessment of Adult Competencies (PIAAC). The PIAAC survey is similar to the 2003 NAAL in that it measures outcomes related to literacy through an in-person assessment.
As we have seen from the evaluation, the use of the key predictors in the SAE process as stratifiers will help reduce the between PSU variance in the PIAAC survey. That is, the plan will include the percent lacking BPLS as the evaluation variable, while forming explicit strata using demographic data (the SAE predictors) from the most recent decennial Census. Also, when using the underlying approach, the use of more stratifiers when creating the stratification schemes will help to reduce the equal size strata measure.

In Kish (1965) page 379, he says "A great many man-hours were spent in the stratification process. However, it is questionable whether the amount of time devoted to reviews and refinements paid off in appreciable reductions in sampling variances. Intuitive notions about gains from stratification can be misleading." As a result of the evaluation of the 2003 NAAL stratification process, we have, in effect, revisited the efforts conducted by Kish and his colleagues, by measuring the variation across numerous stratification schemes, using a computer-intensive search. Under the nested stratification approach described in this paper, which arrives at explicitly defined boundaries, we found that there is considerable variation among resulting schemes in terms of the equal size strata measure. Reducing this component will lead to equalizing interviewer workloads and reducing the variation in estimated totals and the bias of these variance estimates -- not so much the variation in proportions and means. We found less variation in most major strata with regards to the between PSU variance, given a strong set of stratifiers. However, some major strata experienced considerable between PSU variance, for which a reduction in sampling variances can be realized. Therefore, we conclude that benefits can be realized by using a constructive, systematic, and thorough approach for identifying a stratification scheme for PSUs. We also recommend the development of software to generate many stratification schemes to facilitate an analysis of the many solutions. Lastly, we recommend that repeating surveys use information from the prior round to improve the stratification process. With regards to future research, it would be interesting to compare results from the simplistic and efficient nested approach to the more sophisticated computer-intensive clustering algorithms, while weighing in the level of effort involved.

References

Friedman, H. P. and Rubin, J. (1967). On some invariant criteria for grouping data. Journal of the American Statistical Association. Vol. 62, 1159-1178.
Jewett, R.S. and Judkins, J. (1988). Multivariate stratification with size constraints. SIAM Journal of Scientific Statistical Computing, Vol. 9, No. 6, 1091-1096.
Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.
Ludington, P. (1992). Stratification of primary sampling units for the Current Population Survey using computer intensive methods. Proceedings of the Section on Survey Research Methods of the American Statistical Association.
Mohadjer, L., Kalton, G., Krenzke, T., Liu, B., Van de Kerckhove, W., Li, L., Sherman, D., Dillman, J., Rao, J. and White, S. (2009). National Assessment of Adult Literacy: Indirect county and state estimates of the percentage of adults at the lowest level of literacy for 1992 and 2003 (NCES 2009-482). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.
Rao, J.N.K. (2003). Small area estimation. Wiley-Interscience.

Appendix A

The following discusses the substratification approach in more detail. Suppose a is the 1st stratifier, b the 2nd stratifier and c the 3rd stratifier (if SV = 3). Then (a_1, b_1, c_1) defines substratum 1 (i.e., an 'ending' node), (a_2, b_2, c_2) defines substratum 2, ..., (a_H, b_H, c_H) defines substratum H. To define the tree structure given H_g and SV (for this explanation we set SV = 3) the following constraints are used:

1. a_i + b_i + c_i < H_g + SV for any set of (a_i, b_i, c_i) defining a node i.
2. The set of ending nodes i = 1, 2, ..., H_g defining a set of substrata must always contain ending node (1,1,1).
3. There are no gaps between a_i and a_j, or b_i and b_j, or c_i and c_j, for all i and j that comprise the ending nodes defining a set of substrata.
4. Similar to 3), for a given value for stratifier a, there must exist a sequence of values (or one value) starting at 1 for b_i, for all i relating to the given value for stratifier a. In the same manner, for a given value for stratifier b, there must exist a sequence of values (or one value) starting at 1 for c_i, for all i relating to the given value for stratifier b.

The sets of all possible substrata (or ending nodes) are the combinations of stratifiers a, b, and c, where a = 1, 2, ..., H_g; b = 1, 2, ..., H_g; c = 1, 2, ..., H_g, that satisfy the above constraints. At this point, there is a tree structure defined for each possible substratification scheme for a major stratum. However, the branches have not been explicitly defined within each of the trees. To make branches from a node, the software does the following:

1. Counts the number of branches formed by the first stratifier. Let us call it A.
2. Counts the number of substrata, holding the value of the stratifier constant. That is, for the split on the first stratifier, count the number of ending nodes that result from a = 1 and call it H_{a=1}. Do that for each value of a that results from the first stratifier to arrive at H_{a=1}, H_{a=2}, ..., H_{a=A}.
3. Sorts by the stratifier.
4. Creates A - 1 cutpoints on the stratifier.
The 1st cutpoint is the stratifier value contributing 100%*H_{a=1}/H_g of the total measure of size for the subpopulation defined by the particular node (which is the major stratum if it is the 1st stratifier). The 2nd cutpoint is the stratifier value defining 100%*(H_{a=1} + H_{a=2})/H_g of the total measure of size for the subpopulation defined by the particular node, and so on.
5. For the second stratifier, analogously repeats steps 1)-3) for each non-ending node created by stratifier a. In the same manner, continue with stratifier c.

As an example, suppose H_g = 4 and SV = 4. Let the scheme be (1,1,1,1) (1,1,2,1) (1,1,3,1) (2,1,1,1).

For stratifier a, H_{a=(1)} = 3 and H_{a=(2)} = 1. Therefore, the percentile cutoff is 75% for the first branch and 25% for the second branch.

For stratifier b: H_{b=(1,1)} = 3; H_{b=(2,1)} = 1. But since the parent H_{a=(1)} has just 1 immediate child, and parent H_{a=(2)} has just 1 immediate child, the number of cutpoints = 0 for each parent branch, and so we don't need to compute the cutpoints using stratifier b.

For stratifier c, H_{c=(1,1,1)} has 1 ending node, H_{c=(1,1,2)} has 1 ending node, and H_{c=(1,1,3)} has 1 ending node. The parent H_{b=(1,1)} has 3 immediate children and therefore 2 cutpoints are made. The cutpoints use the number of ending nodes from the parent H_{b=(1,1)}, which is equal to 3 (in general it does not equal the number of immediate children, but it does in this example). So the cutpoints are 100%*H_{c=(1,1,1)}/H_{b=(1,1)} and 100%*(H_{c=(1,1,1)} + H_{c=(1,1,2)})/H_{b=(1,1)}, or 33.3% and 66.6%. And since parent H_{b=(2,1)} has 1 immediate child, there is no cutpoint formed.

For stratifier d, the parents H_{c=(1,1,1)}, H_{c=(1,1,2)}, H_{c=(1,1,3)} and H_{c=(2,1,1)} are all equal to 1, so there are no cutpoints generated for stratifier d.
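
The percentile cutpoints in the worked example above can be checked with a short script. What follows is a minimal Python sketch and not the software used for the evaluation: the function name cutpoint_percentiles and the representation of a scheme as a list of tuples of stratifier levels are assumptions made for illustration. It only computes the cumulative percentages of the measure of size at which cutpoints fall; it does not sort PSUs or locate the actual stratifier values.

```python
from collections import Counter


def cutpoint_percentiles(scheme):
    """Map each parent node to its cumulative percent cutpoints.

    `scheme` is a list of ending nodes, each a tuple of stratifier
    levels such as (a, b, c, d). A parent is identified by the prefix
    of levels above the stratifier being split; the major stratum
    itself is the empty tuple (). Parents with a single immediate
    child get no cutpoints and are omitted from the result.
    """
    n_stratifiers = len(scheme[0])
    result = {}
    for depth in range(n_stratifiers):
        # Number of ending nodes below every node at this depth.
        ending_counts = Counter(node[:depth + 1] for node in scheme)

        # Group the children (in sorted order) under their parent.
        children_by_parent = {}
        for child, count in sorted(ending_counts.items()):
            children_by_parent.setdefault(child[:-1], []).append(count)

        for parent, counts in children_by_parent.items():
            total = sum(counts)  # ending nodes under the parent
            running = 0
            cuts = []
            # A immediate children need A - 1 cutpoints.
            for count in counts[:-1]:
                running += count
                cuts.append(100.0 * running / total)
            if cuts:
                result[parent] = cuts
    return result


if __name__ == "__main__":
    example = [(1, 1, 1, 1), (1, 1, 2, 1), (1, 1, 3, 1), (2, 1, 1, 1)]
    for parent, cuts in cutpoint_percentiles(example).items():
        print(parent, [round(c, 1) for c in cuts])
```

For the example scheme, the sketch yields a single cutpoint at 75% for the split on stratifier a and two cutpoints, at one third and two thirds, for the split on stratifier c under parent (1,1). No cutpoints arise for stratifiers b and d, in line with the walk-through.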