01:43 Thursday, March 10, 2011

AGRO 6005
Analysis of Variance (II)

1. Checking assumptions. The following data are counts of an aphid on wheat in 6 different weeks. On each occasion, forty randomly chosen plants were sampled and the insects on each plant were counted. With the NPAR1WAY procedure we run nonparametric tests, which are equivalent to the tests obtained from an ANOVA on the ranked data.

/* Read 6 weeks x 40 plants; @@ holds the line so several counts can be read per record */
data afidos;
  do semana=1 to 6;
    do repet=1 to 40;
      input recuento @@;
      output;
    end;
  end;
datalines;
12 1 6 1 5 7 1 1 2 1 20 0 9 7 0 12 2 0 0 2
8 0 11 2 21 0 3 18 2 2 6 6 5 1 12 0 3 1 1 18
40 16 32 15 44 41 43 53 67 21 6 31 15 11 21 40 15 50 17 32
24 7 25 11 64 22 50 27 3 46 45 10 8 27 34 19 86 83 17 36
86 63 20 68 55 42 24 29 20 27 26 63 40 46 7 15 10 30 46 26
15 42 6 28 7 9 5 35 6 9 108 38 35 64 21 20 62 25 0 0
29 2 3 0 4 2 6 7 5 4 6 0 0 5 1 3 2 2 2 5
0 1 1 0 3 1 2 0 3 3 18 7 21 0 0 0 2 3 0 40
5 7 0 0 0 1 1 2 1 0 25 1 0 0 0 0 0 0 0 5
0 2 0 0 0 2 0 0 0 4 0 0 0 0 2 0 0 0 0 2
1 0 0 1 7 0 0 0 4 1 5 2 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
;

title 'Datos originales';   /* "Original data" */
proc boxplot data=afidos;
  plot recuento*semana;
run;

/* One-way ANOVA on the raw counts; save predicted values and residuals */
proc glm data=afidos;
  class semana;
  model recuento = semana;
  output out=b p=pred r=resid;
run;

/* Normality of the residuals */
proc univariate data=b normal;
  var resid;
  qqplot resid / normal;
run;

/* Residuals against predicted values (homogeneity of variance) */
proc gplot data=b;
  plot resid*pred;
run;

/* Levene test: ANOVA on the absolute residuals */
data b;
  set b;
  absresid=abs(resid);
run;

title 'Prueba de Levene (puede generalizarse a cualquier modelo)';   /* "Levene test (can be generalized to any model)" */
proc glm data=b;
  class semana;
  model absresid = semana;
run;

/* Log transformation; 1 is added because some counts are 0 */
data afidos;
  set afidos;
  logrec=log(recuento+1);
run;

title 'Datos transformados';   /* "Transformed data" */
proc glm data=afidos;
  class semana;
  model logrec = semana;
  output out=d p=pred r=resid;
run;
/* Normality of the residuals from the log-transformed analysis */
proc univariate data=d normal;
  var resid;
  qqplot resid / normal;
run;

proc gplot data=d;
  plot resid*pred;
run;

/* Kruskal-Wallis test on the counts */
title 'Datos ranqueados';   /* "Ranked data" */
proc npar1way data=afidos wilcoxon;
  class semana;
  var recuento;
run;

/* Same comparison as an ANOVA on the ranks */
proc rank data=afidos out=ranks1;
  var recuento;
  ranks rankrec;
run;

proc glm data=ranks1;
  class semana;
  model rankrec=semana;
run;
Datos originales

Class Level Information
Class     Levels   Values
semana         6   1 2 3 4 5 6

Number of Observations Read   240
Number of Observations Used   240

Dependent Variable: recuento

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5      44393.33750    8878.66750     47.01   <.0001
Error             234      44191.72500     188.85353
Corrected Total   239      88585.06250

R-Square   Coeff Var   Root MSE   recuento Mean
0.501138    109.3922   13.74240        12.56250

Source   DF      Type I SS   Mean Square   F Value   Pr > F
semana    5    44393.33750    8878.66750     47.01   <.0001

Source   DF    Type III SS   Mean Square   F Value   Pr > F
semana    5    44393.33750    8878.66750     47.01   <.0001
Datos originales

The UNIVARIATE Procedure
Variable: resid

Moments
N                         240   Sum Weights              240
Mean                        0   Sum Observations           0
Std Deviation      13.5978901   Variance          184.902615
Skewness           1.61685246   Kurtosis          6.70692905
Uncorrected SS      44191.725   Corrected SS       44191.725
Coeff Variation             .   Std Error Mean    0.87774003

Tests for Normality
Test                  Statistic            p Value
Shapiro-Wilk          W       0.818005     Pr < W       <0.0001
Kolmogorov-Smirnov    D       0.223108     Pr > D       <0.0100
Cramer-von Mises      W-Sq    3.285398     Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq    15.74663     Pr > A-Sq    <0.0050
Prueba de Levene (puede generalizarse a cualquier modelo)

Class Level Information
Class     Levels   Values
semana         6   1 2 3 4 5 6

Number of Observations Read   240
Number of Observations Used   240

Dependent Variable: absresid

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5      11822.90918    2364.58184     32.07   <.0001
Error             234      17251.84681      73.72584
Corrected Total   239      29074.75599

R-Square   Coeff Var   Root MSE   absresid Mean
0.406638    108.1890   8.586375        7.936458

Source   DF      Type I SS   Mean Square   F Value   Pr > F
semana    5    11822.90918    2364.58184     32.07   <.0001

Source   DF    Type III SS   Mean Square   F Value   Pr > F
semana    5    11822.90918    2364.58184     32.07   <.0001
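The Levene test above is nothing more than a one-way ANOVA on the absolute residuals. As a cross-check outside SAS, the following Python sketch (standard library only) recomputes the F statistic for the raw counts and for the absolute residuals from the counts listed in the DATA step. The helper name one_way_f and the variable names are ours, not part of the handout, and the sketch assumes the balanced one-way layout used here.

```python
# Aphid counts copied from the DATA step: 6 weeks x 40 plants, in reading order.
RAW = """
12 1 6 1 5 7 1 1 2 1 20 0 9 7 0 12 2 0 0 2
8 0 11 2 21 0 3 18 2 2 6 6 5 1 12 0 3 1 1 18
40 16 32 15 44 41 43 53 67 21 6 31 15 11 21 40 15 50 17 32
24 7 25 11 64 22 50 27 3 46 45 10 8 27 34 19 86 83 17 36
86 63 20 68 55 42 24 29 20 27 26 63 40 46 7 15 10 30 46 26
15 42 6 28 7 9 5 35 6 9 108 38 35 64 21 20 62 25 0 0
29 2 3 0 4 2 6 7 5 4 6 0 0 5 1 3 2 2 2 5
0 1 1 0 3 1 2 0 3 3 18 7 21 0 0 0 2 3 0 40
5 7 0 0 0 1 1 2 1 0 25 1 0 0 0 0 0 0 0 5
0 2 0 0 0 2 0 0 0 4 0 0 0 0 2 0 0 0 0 2
1 0 0 1 7 0 0 0 4 1 5 2 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"""


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_model = len(groups) - 1
    df_error = n_total - len(groups)
    return (ss_model / df_model) / (ss_error / df_error)


counts = [int(tok) for tok in RAW.split()]
# Split the 240 counts into the 6 weeks (40 plants each).
weeks = [counts[i:i + 40] for i in range(0, len(counts), 40)]

# Absolute residuals |y - week mean|, as in the data step that creates absresid.
abs_resid = [[abs(x - sum(g) / len(g)) for x in g] for g in weeks]

f_raw = one_way_f(weeks)
f_levene = one_way_f(abs_resid)
print(f"F for recuento: {f_raw:.2f}")
print(f"F for absresid: {f_levene:.2f}")
```

The two printed values can be compared with the F Value column of the 'Datos originales' and 'Prueba de Levene' listings.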
Datos transformados

Class Level Information
Class     Levels   Values
semana         6   1 2 3 4 5 6

Number of Observations Read   240
Number of Observations Used   240

Dependent Variable: logrec

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5      339.4187667    67.8837533     93.18   <.0001
Error             234      170.4676155     0.7284941
Corrected Total   239      509.8863823

R-Square   Coeff Var   Root MSE   logrec Mean
0.665675    52.95501   0.853519      1.611781

Source   DF      Type I SS   Mean Square   F Value   Pr > F
semana    5    339.4187667    67.8837533     93.18   <.0001

Source   DF    Type III SS   Mean Square   F Value   Pr > F
semana    5    339.4187667    67.8837533     93.18   <.0001
Datos transformados

The UNIVARIATE Procedure
Variable: resid

Moments
N                         240   Sum Weights              240
Mean                        0   Sum Observations           0
Std Deviation      0.84454344   Variance          0.71325362
Skewness           0.06112673   Kurtosis          1.32847944
Uncorrected SS     170.467616   Corrected SS      170.467616
Coeff Variation             .   Std Error Mean    0.05451504

Tests for Normality
Test                  Statistic            p Value
Shapiro-Wilk          W       0.972419     Pr < W        0.0001
Kolmogorov-Smirnov    D       0.101145     Pr > D       <0.0100
Cramer-von Mises      W-Sq    0.39147      Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq    2.063397     Pr > A-Sq    <0.0050
Datos ranqueados

The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable recuento
Classified by Variable semana

semana    N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1        40         4568.00              4820.0         393.642469    114.20000
2        40         7821.00              4820.0         393.642469    195.52500
3        40         7541.50              4820.0         393.642469    188.53750
4        40         4248.00              4820.0         393.642469    106.20000
5        40         2654.50              4820.0         393.642469     66.36250
6        40         2087.00              4820.0         393.642469     52.17500

Average scores were used for ties.

Kruskal-Wallis Test
Chi-Square         155.7551
DF                        5
Pr > Chi-Square      <.0001
Datos ranqueados

Class Level Information
Class     Levels   Values
semana         6   1 2 3 4 5 6

Number of Observations Read   240
Number of Observations Used   240

Dependent Variable: rankrec   Rank for Variable recuento

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5       724048.263    144809.653     87.57   <.0001
Error             234       386974.738      1653.738
Corrected Total   239      1111023.000

R-Square   Coeff Var   Root MSE   rankrec Mean
0.651695    33.74787   40.66618       120.5000

Source   DF      Type I SS   Mean Square   F Value   Pr > F
semana    5    724048.2625   144809.6525     87.57   <.0001

Source   DF    Type III SS   Mean Square   F Value   Pr > F
semana    5    724048.2625   144809.6525     87.57   <.0001
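The link between the Kruskal-Wallis test and the ANOVA on ranks can be made explicit: when midranks are used for ties, the Kruskal-Wallis chi-square equals (N - 1) times the R-square of the one-way ANOVA on the ranks. The short Python sketch below applies that identity to the sums of squares printed in the rank ANOVA listing. The function name is ours and is only illustrative.

```python
def kruskal_wallis_from_rank_anova(ss_model, ss_total, n_obs):
    """Kruskal-Wallis statistic from a one-way ANOVA on (mid)ranks.

    H = (N - 1) * SS_model / SS_total, where both sums of squares
    come from the ANOVA of the ranks.
    """
    return (n_obs - 1) * ss_model / ss_total


# Values taken from the PROC GLM listing for rankrec.
h = kruskal_wallis_from_rank_anova(ss_model=724048.2625,
                                   ss_total=1111023.000,
                                   n_obs=240)
print(f"Kruskal-Wallis chi-square from rank ANOVA: {h:.4f}")
```

The result can be compared with the Chi-Square value in the Kruskal-Wallis Test table from PROC NPAR1WAY. The two procedures differ only in the reference distribution: chi-square with 5 DF for NPAR1WAY, F with 5 and 234 DF for the rank ANOVA.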
2. In this example we run a nonparametric test (a comparison of ranks) for a block design. The data are the amounts of oxalic acid produced by different strains of a fungus in an experiment carried out in four different weeks (blocks).

data sclerot;
  do week=1 to 4;
    do strain=1, 5, 6, 7;
      input oxalic @@;
      output;
    end;
  end;
datalines;
2.40 0.23 0.00 0.01
2.30 7.03 0.01 0.02
5.01 26.93 0.02 0.01
2.67 12.05 0.05 0.00
;

proc sort data=sclerot;
  by week;
run;

title 'Ejemplo 3';   /* "Example 3" */

/* Rank the observations separately within each week (block) */
proc rank data=sclerot out=ranks2;
  by week;
  var oxalic;
  ranks rankoxal;
run;

/* Block-design ANOVA on the within-block ranks */
proc glm data=ranks2;
  class week strain;
  model rankoxal = week strain / ss3;
  means strain / tukey;
run;

Class Level Information
Class     Levels   Values
week           4   1 2 3 4
strain         4   1 5 6 7

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               6      16.50000000    2.75000000      7.07   0.0052
Error               9       3.50000000    0.38888889
Corrected Total    15      20.00000000

R-Square   Coeff Var   Root MSE   rankoxal Mean
0.825000    24.94438   0.623610        2.500000

Source   DF    Type III SS   Mean Square   F Value   Pr > F
week      3     0.00000000    0.00000000      0.00   1.0000
strain    3    16.50000000    5.50000000     14.14   0.0009
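To see where the sums of squares above come from, the ranking step can be reproduced by hand. The Python sketch below ranks the oxalic acid values within each week, rebuilds the strain and total sums of squares of the ranks, and also computes the Friedman statistic as b(k - 1) SS_strain / SS_total. Naming the Friedman test is our addition (the handout only speaks of a comparison of ranks), and all function and variable names are illustrative.

```python
# Oxalic acid by week (block); columns are strains 1, 5, 6, 7 as in the DATA step.
strains = [1, 5, 6, 7]
oxalic = [
    [2.40, 0.23, 0.00, 0.01],
    [2.30, 7.03, 0.01, 0.02],
    [5.01, 26.93, 0.02, 0.01],
    [2.67, 12.05, 0.05, 0.00],
]


def midranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


ranks = [midranks(week) for week in oxalic]   # one list of ranks per block
b, k = len(ranks), len(strains)
grand = (k + 1) / 2                           # every block has mean rank (k+1)/2

mean_rank = [sum(r[j] for r in ranks) / b for j in range(k)]
ss_strain = b * sum((m - grand) ** 2 for m in mean_rank)
ss_total = sum((x - grand) ** 2 for r in ranks for x in r)
friedman = b * (k - 1) * ss_strain / ss_total

for s, m in zip(strains, mean_rank):
    print(f"strain {s}: mean rank {m:.4f}")
print(f"SS strain = {ss_strain:.2f}, SS total = {ss_total:.2f}")
print(f"Friedman statistic = {friedman:.2f} on {k - 1} DF")
```

Because every week has the same mean rank, the week sum of squares is necessarily 0, which is what the Type III table shows.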
Tukey's Studentized Range (HSD) Test for rankoxal

Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                      0.05
Error Degrees of Freedom                      9
Error Mean Square                      0.388889
Critical Value of Studentized Range     4.41490
Minimum Significant Difference           1.3766

Means with the same letter are not significantly different.

Tukey Grouping     Mean   N   strain
A                3.7500   4   5
A                3.2500   4   1
B                1.5000   4   6
B                1.5000   4   7
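The minimum significant difference in the Tukey output follows from the other three quantities listed with it: MSD = q * sqrt(MSE / n), where n is the number of observations per mean. The Python sketch below redoes that arithmetic with the printed values and then applies the MSD to the mean ranks. Variable names are ours.

```python
import math

q_crit = 4.41490      # Critical Value of Studentized Range (from the listing)
mse = 0.388889        # Error Mean Square
n_per_mean = 4        # each strain mean is based on 4 weeks

msd = q_crit * math.sqrt(mse / n_per_mean)
print(f"Minimum significant difference: {msd:.4f}")

# Mean ranks by strain, as printed in the Tukey grouping table.
mean_rank = {5: 3.75, 1: 3.25, 6: 1.50, 7: 1.50}
for a, b in [(5, 1), (1, 6), (6, 7)]:
    diff = abs(mean_rank[a] - mean_rank[b])
    verdict = "different" if diff > msd else "not different"
    print(f"strain {a} vs strain {b}: |diff| = {diff:.2f} -> {verdict}")
```

Only differences larger than the MSD separate letter groups, which is why strains 5 and 1 share the letter A and strains 6 and 7 share the letter B.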
Analysis of Variance (III). Factorial experiments

The following data come from an experiment comparing the effects of soil pH and calcium additives on the growth of orange trees. The experiment was run as a randomized complete block design.

data arboles;
  input bloque ph calcio diametro @@;
datalines;
1 4 100 5.2 1 4 200 7.4 1 4 300 6.3
2 4 100 5.9 2 4 200 7.0 2 4 300 6.7
3 4 100 6.3 3 4 200 7.6 3 4 300 6.1
1 5 100 7.1 1 5 200 7.4 1 5 300 7.3
2 5 100 7.4 2 5 200 7.3 2 5 300 7.5
3 5 100 7.5 3 5 200 7.1 3 5 300 7.2
1 6 100 7.6 1 6 200 7.6 1 6 300 7.2
2 6 100 7.2 2 6 200 7.5 2 6 300 7.3
3 6 100 7.4 3 6 200 7.8 3 6 300 7.0
1 7 100 7.2 1 7 200 7.4 1 7 300 6.8
2 7 100 7.5 2 7 200 7.0 2 7 300 6.6
3 7 100 7.2 3 7 200 6.9 3 7 300 6.4
;

/* Blocks plus 4 x 3 factorial; Tukey-adjusted comparisons and tests of calcio within each ph */
proc glm data=arboles;
  class bloque ph calcio;
  model diametro=bloque ph calcio ph*calcio;
  lsmeans ph calcio ph*calcio / slice=ph pdiff adjust=tukey;
  ods output diff=diferencias lsmeans=medias;
  *ods listing exclude lsmeans diff;
run;

/* Macro that turns the p-value matrices into letter groupings */
%include 'i:\pdglm800.sas';
%pdglm800(diferencias, medias,alpha=.05,sort=yes);
run;

/* Interaction plot of the cell least squares means */
proc gplot data=medias;
  where effect='ph_calcio';
  symbol i=join value=dot;
  plot lsmean*ph=calcio;
run;

Class Level Information
Class     Levels   Values
bloque         3   1 2 3
ph             4   4 5 6 7
calcio         3   100 200 300
Number of Observations Read   36
Number of Observations Used   36

Dependent Variable: diametro

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              13       9.19194444    0.70707265      9.62   <.0001
Error              22       1.61777778    0.07353535
Corrected Total    35      10.80972222

R-Square   Coeff Var   Root MSE   diametro Mean
0.850340    3.844925   0.271174        7.052778

Source      DF   Type III SS   Mean Square   F Value   Pr > F
bloque       2    0.00888889    0.00444444      0.06   0.9415
ph           3    4.46083333    1.48694444     20.22   <.0001
calcio       2    1.46722222    0.73361111      9.98   0.0008
ph*calcio    6    3.25500000    0.54250000      7.38   0.0002
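The summary line of the ANOVA listing (R-Square, Coeff Var, Root MSE) and the overall F are all simple functions of the sums of squares and the mean. As a reading aid, the Python sketch below recomputes them from the values printed above. Variable names are ours.

```python
import math

# Values copied from the ANOVA table for diametro.
ss_model, df_model = 9.19194444, 13
ss_error, df_error = 1.61777778, 22
ss_total = 10.80972222
grand_mean = 7.052778

ms_model = ss_model / df_model
mse = ss_error / df_error

f_model = ms_model / mse                 # overall F test of the model
r_square = ss_model / ss_total           # proportion of variation explained
root_mse = math.sqrt(mse)                # residual standard deviation
coeff_var = 100 * root_mse / grand_mean  # Root MSE as a percentage of the mean

print(f"F = {f_model:.2f}")
print(f"R-Square = {r_square:.6f}")
print(f"Root MSE = {root_mse:.6f}")
print(f"Coeff Var = {coeff_var:.6f}")
```

Each F in the Type III table is obtained the same way, dividing the mean square of the effect by the Error mean square.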
Least Squares Means
Adjustment for Multiple Comparisons: Tukey

ph   diametro LSMEAN   LSMEAN Number
4         6.50000000               1
5         7.31111111               2
6         7.40000000               3
7         7.00000000               4

Least Squares Means for effect ph
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: diametro

i/j        1        2        3        4
1               <.0001   <.0001   0.0039
2     <.0001            0.8978   0.0998
3     <.0001   0.8978            0.0234
4     0.0039   0.0998   0.0234

calcio   diametro LSMEAN   LSMEAN Number
100           6.95833333               1
200           7.33333333               2
300           6.86666667               3

Least Squares Means for effect calcio
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: diametro

i/j        1        2        3
1               0.0072   0.6900
2     0.0072            0.0010
3     0.6900   0.0010
Least Squares Means
Adjustment for Multiple Comparisons: Tukey

ph   calcio   diametro LSMEAN   LSMEAN Number
4    100           5.80000000               1
4    200           7.33333333               2
4    300           6.36666667               3
5    100           7.33333333               4
5    200           7.26666667               5
5    300           7.33333333               6
6    100           7.40000000               7
6    200           7.63333333               8
6    300           7.16666667               9
7    100           7.30000000              10
7    200           7.10000000              11
7    300           6.60000000              12

Least Squares Means for effect ph*calcio
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: diametro

i/j        1        2        3        4        5        6        7        8        9       10       11       12
1              <.0001   0.3567   <.0001   <.0001   <.0001   <.0001   <.0001   0.0002   <.0001   0.0003   0.0526
2     <.0001            0.0102   1.0000   1.0000   1.0000   1.0000   0.9608   0.9997   1.0000   0.9939   0.0968
3     0.3567   0.0102            0.0102   0.0199   0.0102   0.0051   0.0005   0.0526   0.0143   0.0968   0.9939
4     <.0001   1.0000   0.0102            1.0000   1.0000   1.0000   0.9608   0.9997   1.0000   0.9939   0.0968
5     <.0001   1.0000   0.0199   1.0000            1.0000   1.0000   0.8693   1.0000   1.0000   0.9997   0.1705
6     <.0001   1.0000   0.0102   1.0000   1.0000            1.0000   0.9608   0.9997   1.0000   0.9939   0.0968
7     <.0001   1.0000   0.0051   1.0000   1.0000   1.0000            0.9939   0.9939   1.0000   0.9608   0.0526
8     <.0001   0.9608   0.0005   0.9608   0.8693   0.9608   0.9939            0.6231   0.9237   0.4395   0.0051
9     0.0002   0.9997   0.0526   0.9997   1.0000   0.9997   0.9939   0.6231            1.0000   1.0000   0.3567
10    <.0001   1.0000   0.0143   1.0000   1.0000   1.0000   1.0000   0.9237   1.0000            0.9983   0.1292
11    0.0003   0.9939   0.0968   0.9939   0.9997   0.9939   0.9608   0.4395   1.0000   0.9983            0.5296
12    0.0526   0.0968   0.9939   0.0968   0.1705   0.0968   0.0526   0.0051   0.3567   0.1292   0.5296
ph*calcio Effect Sliced by ph for diametro

ph   DF   Sum of Squares   Mean Square   F Value   Pr > F
4     2         3.606667      1.803333     24.52   <.0001
5     2         0.008889      0.004444      0.06   0.9415
6     2         0.326667      0.163333      2.22   0.1322
7     2         0.780000      0.390000      5.30   0.0132

OUTPUT OF THE PDGLM800 MACRO

BYGROUP=1 Effect=calcio

Obs   Dependent   ph   LSMean       calcio   LetterGroup
1     diametro         7.33333333   200      A
2     diametro         6.95833333   100      B
3     diametro         6.86666667   300      B

BYGROUP=2 Effect=ph

Obs   Dependent   ph   LSMean       calcio   LetterGroup
4     diametro    6    7.40000000            A
5     diametro    5    7.31111111            AB
6     diametro    7    7.00000000            B
7     diametro    4    6.50000000            C

BYGROUP=3 Effect=ph_calcio

Obs   Dependent   ph   LSMean       calcio   LetterGroup
8     diametro    6    7.63333333   200      A
9     diametro    6    7.40000000   100      AB
10    diametro    4    7.33333333   200      AB
11    diametro    5    7.33333333   100      AB
12    diametro    5    7.33333333   300      AB
13    diametro    7    7.30000000   100      AB
14    diametro    5    7.26666667   200      AB
15    diametro    6    7.16666667   300      ABC
16    diametro    7    7.10000000   200      ABC
17    diametro    7    6.60000000   300      BCD
BYGROUP=3 Effect=ph_calcio (continued)

Obs   Dependent   ph   LSMean       calcio   LetterGroup
18    diametro    4    6.36666667   300      CD
19    diametro    4    5.80000000   100      D
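The sliced tests requested with slice=ph compare the three calcium levels separately within each pH, always using the Error mean square of the full model as the denominator. For balanced data such as these, each slice sum of squares is the number of blocks times the sum of squared deviations of the cell means from their pH mean. The Python sketch below recomputes the slice table from the ph*calcio least squares means. It assumes balance (3 blocks per cell), and the function and variable names are ours.

```python
# Cell least squares means (calcio = 100, 200, 300) for each ph, from the LSMEANS listing.
cell_lsmeans = {
    4: [5.80000000, 7.33333333, 6.36666667],
    5: [7.33333333, 7.26666667, 7.33333333],
    6: [7.40000000, 7.63333333, 7.16666667],
    7: [7.30000000, 7.10000000, 6.60000000],
}
n_blocks = 3        # observations behind each cell mean
mse = 0.07353535    # Error mean square of the full model (22 DF)


def slice_test(means, reps, error_ms):
    """Return (SS, F) for the simple effect among `means` in a balanced design."""
    center = sum(means) / len(means)
    ss = reps * sum((m - center) ** 2 for m in means)
    df = len(means) - 1
    return ss, (ss / df) / error_ms


slices = {ph: slice_test(m, n_blocks, mse) for ph, m in cell_lsmeans.items()}
for ph, (ss, f) in slices.items():
    print(f"ph {ph}: SS = {ss:.6f}, F = {f:.2f}")

# The slices partition the calcio and ph*calcio sums of squares together.
total_slices = sum(ss for ss, _ in slices.values())
print(f"sum of slice SS = {total_slices:.6f}")
```

The last line can be compared with SS(calcio) + SS(ph*calcio) = 1.46722222 + 3.25500000 from the Type III table: the slices split the calcium main effect and the interaction into one piece per pH.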
Difference between unadjusted means and marginal means in unbalanced factorials

data factor;
  input variedad $ fungic $ repet rendim;
datalines;
a no 1 14
a no 2 15
a si 1 14
a si 2 16
b no 1 13
b no 2 12
b si 1 19
b si 2 20
b si 3 20
;

/* MEANS gives unadjusted (raw) means; LSMEANS gives marginal means averaged over cells */
proc glm data=factor;
  class variedad fungic;
  model rendim = variedad fungic variedad*fungic;
  means variedad fungic variedad*fungic;
  lsmeans variedad fungic variedad*fungic / stderr;
run;

Dependent Variable: rendim
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3      71.22222222   23.74074074     32.37   0.0011
Error               5       3.66666667    0.73333333
Corrected Total     8      74.88888889

R-Square   Coeff Var   Root MSE   rendim Mean
0.951039    5.389608   0.856349      15.88889

Source            DF     Type I SS    Mean Square   F Value   Pr > F
variedad           1    9.33888889     9.33888889     12.73   0.0161
fungic             1   37.64090909    37.64090909     51.33   0.0008
variedad*fungic    1   24.24242424    24.24242424     33.06   0.0022

Source            DF   Type III SS    Mean Square   F Value   Pr > F
variedad           1    3.87878788     3.87878788      5.29   0.0698
fungic             1   32.06060606    32.06060606     43.72   0.0012
variedad*fungic    1   24.24242424    24.24242424     33.06   0.0022

Means

Level of          rendim
variedad    N     Mean          Std Dev
a           4     14.7500000    0.95742711
b           5     16.8000000    3.96232255

Level of          rendim
fungic      N     Mean          Std Dev
no          4     13.5000000    1.29099445
si          5     17.8000000    2.68328157
Level of    Level of          rendim
variedad    fungic      N     Mean          Std Dev
a           no          2     14.5000000    0.70710678
a           si          2     15.0000000    1.41421356
b           no          2     12.5000000    0.70710678
b           si          3     19.6666667    0.57735027

Least Squares Means

variedad   rendim LSMEAN   Standard Error   Pr > |t|
a             14.7500000        0.4281744     <.0001
b             16.0833333        0.3908680     <.0001

fungic     rendim LSMEAN   Standard Error   Pr > |t|
no            13.5000000        0.4281744     <.0001
si            17.3333333        0.3908680     <.0001

variedad   fungic   rendim LSMEAN   Standard Error   Pr > |t|
a          no          14.5000000        0.6055301     <.0001
a          si          15.0000000        0.6055301     <.0001
b          no          12.5000000        0.6055301     <.0001
b          si          19.6666667        0.4944132     <.0001
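The two sets of means differ only where the design is unbalanced. An unadjusted mean averages all observations at a level, so the cell with three observations (b, si) weighs more. A least squares mean averages the cell means with equal weights. The Python sketch below reproduces both, together with the standard errors of the marginal least squares means, from the nine observations in the DATA step. Function and variable names are ours, and the standard error formula assumes the full two-way model with interaction fitted above.

```python
import math

# rendim by (variedad, fungic) cell, copied from the DATA step.
yields = {
    ("a", "no"): [14, 15],
    ("a", "si"): [14, 16],
    ("b", "no"): [13, 12],
    ("b", "si"): [19, 20, 20],
}


def mean(xs):
    return sum(xs) / len(xs)


cell_mean = {cell: mean(ys) for cell, ys in yields.items()}

# Pooled within-cell variance (Error mean square of the full model).
ss_error = sum((y - cell_mean[c]) ** 2 for c, ys in yields.items() for y in ys)
df_error = sum(len(ys) for ys in yields.values()) - len(yields)
mse = ss_error / df_error


def raw_mean(factor, level):
    """Unadjusted mean: average of all observations at this level."""
    return mean([y for c, ys in yields.items() if c[factor] == level for y in ys])


def ls_mean(factor, level):
    """Least squares mean and its standard error: equal-weight average of cell means."""
    cells = [c for c in yields if c[factor] == level]
    estimate = mean([cell_mean[c] for c in cells])
    std_err = math.sqrt(mse * sum(1 / len(yields[c]) for c in cells)) / len(cells)
    return estimate, std_err


for factor, name, levels in [(0, "variedad", "ab"), (1, "fungic", ["no", "si"])]:
    for level in levels:
        est, se = ls_mean(factor, level)
        print(f"{name} {level}: raw mean {raw_mean(factor, level):.4f}, "
              f"LS mean {est:.4f} (SE {se:.4f})")
```

For variedad a and fungic no, every cell has two observations, so both kinds of mean coincide. For variedad b and fungic si they differ because the (b, si) cell has an extra observation.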