Appendices for: Statistical Power in Analyzing Interaction Effects: Questioning the Advantage of PLS with Product Indicators


Dale Goodhue
Terry College of Business, MIS Department
University of Georgia
Athens, GA 30606
dgoodhue@terry.uga.edu

William Lewis
College of Administration and Business
Louisiana Tech University
P.O. Box 10318
Ruston, LA 71272-0001
wlewis@cab.latech.edu

Ronald L. Thompson
Babcock Graduate School of Management
Wake Forest University
Winston-Salem, NC 27109
ron.thompson@mba.wfu.edu

Abstract of Main Paper: A significant amount of IS research involves hypothesizing and testing for interaction effects. Chin, Marcolin and Newsted (2003) completed an extensive experiment using Monte Carlo simulation that compared two different techniques for detecting and estimating such interaction effects: Partial Least Squares (PLS) with a product indicator approach versus multiple regression with summated indicators. By varying the number of indicators for each construct and the sample size, they concluded that PLS using product indicators was better (at providing higher and presumably more accurate path estimates) than multiple regression using summated indicators. Although we view the Chin et al. (2003) study as an important step in using Monte Carlo analysis to investigate such issues, we believe their results give a misleading picture of the efficacy of the product indicator approach with PLS. By expanding the scope of the investigation to include statistical power, and by replicating and then extending their work, we reach a different conclusion: although PLS with the product indicator approach provides higher point estimates of interaction paths, it also produces wider confidence intervals, and thus provides less statistical power than multiple regression. This disadvantage increases with the number of indicators and (up to a point) with sample size. We explore the possibility that these surprising results can be explained by capitalization on chance. Regardless of the explanation, our analysis leads us to recommend that, if sample size or statistical significance is a concern, regression or PLS with the product of sums should be used instead of PLS with product indicators for testing interaction effects.

APPENDICES

Table of Contents

Appendix A: A SAS Program to Generate Data for 500 Samples of 100 Questionnaires Each
Appendix B: Determinants of Statistical Power
Appendix C: Real Versus Integer Data
Appendix D: Capitalization on Chance
Appendix E: Comparing Bootstrapping with 100 and 500 Resamples
Appendix F: PLS with Normal Theory Significance Testing
Appendix G: Normality and Kurtosis
Appendix H: Results Using a Significance Level of 0.05

Appendix A: A SAS Program to Generate Data for 500 Samples of 100 Questionnaires Each
(Two Indicators for Each Main Effect Construct, Four Indicators for the Interaction)

    * The data step header and output destination were garbled in the
      source. The DATA _NULL_ and FILE statements below are a minimal
      reconstruction (the output file name is ours) so that the program
      runs as described. ;
    data _null_;
      file 'samples.dat';  * 500 samples x 100 questionnaires = 50,000 output lines ;
      retain Seed1 193434 Seed2 187934 Seed3 138574 Seed4 132634
             Seed5 1388374 Seed6 138521874 Seed7 13854274
             Seed8 13988574 Seed9 13857974;
      do I = 1 to 50000;
        call rannor(Seed1, KSI1);
        call rannor(Seed2, X1Err);
        call rannor(Seed3, X2Err);
        X1 = round(.70*KSI1 + .714*X1Err);
        X2 = round(.70*KSI1 + .714*X2Err);
        call rannor(Seed4, KSI2);
        call rannor(Seed5, Z1Err);
        call rannor(Seed6, Z2Err);
        Z1 = round(.70*KSI2 + .714*Z1Err);
        Z2 = round(.70*KSI2 + .714*Z2Err);
        call rannor(Seed7, Eta1Err);
        call rannor(Seed8, Y1Err);
        call rannor(Seed9, Y2Err);
        ETA1 = .30*KSI1 + .50*KSI2 + .30*KSI1*KSI2 + .755*Eta1Err;
        Y1 = round(.70*ETA1 + .714*Y1Err);
        Y2 = round(.70*ETA1 + .714*Y2Err);
        I1 = X1*Z1;
        I2 = X1*Z2;
        I3 = X2*Z1;
        I4 = X2*Z2;
        put Y1 4.0 Y2 4.0 X1 4.0 X2 4.0 Z1 4.0 Z2 4.0
            I1 4.0 I2 4.0 I3 4.0 I4 4.0;
      end;
    run;
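In equation form, the program above implements the following population model (our restatement; KSI1, KSI2, the error terms, and Eta1Err correspond to the independent standard normal draws produced by RANNOR, and each indicator is rounded to an integer):

\[
\begin{aligned}
x_i &= .70\,\xi_1 + .714\,\delta_{x_i}, \qquad
z_i = .70\,\xi_2 + .714\,\delta_{z_i}, \qquad
y_i = .70\,\eta_1 + .714\,\varepsilon_{y_i},\\
\eta_1 &= .30\,\xi_1 + .50\,\xi_2 + .30\,\xi_1 \xi_2 + .755\,\zeta,
\end{aligned}
\]

with the four product indicators formed as I1 = x1*z1, I2 = x1*z2, I3 = x2*z1, and I4 = x2*z2.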

Appendix B: Determinants of Statistical Power

When the underlying relationships among the constructs and indicators are explicitly known (as in data generated for a Monte Carlo analysis using Figure 1 and Appendix A), we have a unique opportunity to see how measurement error, indicator reliability, number of indicators, construct reliability, path strength, effect size and statistical power are all related. This appendix walks the reader through the logic and calculations needed, using several basic statistics equations and tables from Cohen (1988).

Measurement error is a general term referring to error in the score of an indicator of a construct. Reliability (as measured by alpha; Cronbach, 1951) is more precisely defined as the true score variance of a measure divided by its total variance (Carmines and Zeller 1979, page 31). Because we specifically designed the model in Figure 1 this way, we know that each indicator and each of the X, Z and Y constructs is normally distributed with mean 0 and variance 1.0. In Figure 1 and Appendix A, for example, x1 is generated as .7*X + .714*RANNOR, where X and RANNOR are independent normal (0,1) variables. Therefore the true score variance of x1 is (.7)^2 times the variance of X, or .49 (Larsen and Marx, 1981, Theorem 3.12), and the error variance is (.714)^2 times the variance of RANNOR, or .51. The total variance is .49 + .51 = 1, so the reliability of x1 is .49 / 1.0 = .49. The same calculation holds for each of the indicators in Figure 1, so the reliability of each indicator is .49. (If we changed all the indicator loadings to .6, adjusting the error variance accordingly, the reliability of each indicator would be .36, and so on.)

We can also calculate the reliability of the two-indicator measure of X by writing its equation and using a standard formula for the variance of a sum of variables.[1] Accordingly, if each of two indicators has a .7 loading, the true score variance of X is .49 and the total variance is .745. The reliability of the two-indicator measure of X (and of Z and Y) is therefore .49 / .745 = .66. If we moved to four indicators per construct, the reliability of each construct would be .79, and so on. The reliability of each construct is thus a function of the relative sizes of the true score loading and the error loading in each indicator, and of the number of indicators.

Finally, though the math is more extensive,[2] if not more difficult, we can calculate the covariances of X and Y, of Z and Y, and of I and Y. Since X, Z, and X*Z are all independent (see footnote 5 of the main paper), the partial R² (or incremental variance explained) for each is simply its covariance with Y, squared. The effect size of I is then the partial R² of I divided by the total unexplained variance:

    f² = ΔR²_I / (1 - Σ_{j=X,Z,I} ΔR²_j)    (Cohen 1988, page 410, Equation 9.2.3)

Note that effect size is not equivalent to the value of b3, nor are b3 and effect size related in any straightforward way (Carte and Russell 2003, pp. 482-483). The effect size of I is increased by greater reliability of the constructs X, Z, and Y (which can be brought about by more reliable indicators, by more indicators, or both), by a stronger b3 (the true interaction path), and by more total explained variance (from all three independent variables).

Finally, Cohen (1988) gives tables (i.e., Tables 9.3.1 and 9.3.2, pages 416-432) that allow us to predict the power of any given independent variable in a regression if we know its effect size, the sample size n, and the number of other independent variables. Table B-1 below shows the expected power for Figure 1 at six different sample sizes and three different numbers of indicators. Since power is a proportion (the number of significant t-statistics divided by the number of datasets analyzed), we can determine a 95% confidence interval around our estimate for each condition by using a standard equation for confidence intervals around a proportion.

Of course, our advance calculations apply only to regression, since PLS will modify the indicator loadings to maximize the variance explained, and this changes all the calculations unpredictably. However, these power calculations serve as a way to check our Monte Carlo results for regression, and they provide a target value against which to compare the PLS results.

Table B-1. Estimated Power for the Figure 1 Model for Regression
(Values in square brackets show the 95% confidence interval.)

                2 indicators            4 indicators            6 indicators
                Main Effect Rel. = .66  Main Effect Rel. = .79  Main Effect Rel. = .85
                Interaction Rel. = .44  Interaction Rel. = .62  Interaction Rel. = .72
Sample Size     Effect Size = 0.031     Effect Size = 0.059     Effect Size = 0.080
20              OOR*                    OOR*                    OOR*
50              OOR*                    17 [14-20]              24 [20-28]
100             20 [10-23]              41 [37-46]              56 [52-61]
150             32 [28-36]              63 [59-67]              79 [76-83]
200             45 [41-49]              79 [75-82]              92 [89-94]
500             91 [88-93]              OOR*                    OOR*

*OOR = out of range in Cohen's tables; i.e., power is predicted to be either less than 10 or essentially 100.

[1] Var(X1/2 + X2/2) = Σ_i Var(X_i/2) + 2 Σ_{j<k} Cov(X_j/2, X_k/2) (Larsen and Marx 1981, Equation 10.3).
[2] We use Cov(X,Y) = E(XY) - μ_X μ_Y, and E(XY) = E(X)E(Y) when X and Y are independent (Larsen and Marx 1981, Theorems 3.10 and 10.1).
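For readers who wish to verify these figures, the key calculations can be restated compactly. The following worked equations (ours) use the .70 loadings from Figure 1 and the 500 datasets per cell from Appendix A:

\[
\rho_{x_1} = \frac{(.70)^2}{(.70)^2 + (.714)^2} = \frac{.49}{1.00} = .49,
\qquad
\operatorname{Var}\!\left(\frac{x_1 + x_2}{2}\right) = \frac{1}{4} + \frac{1}{4} + \frac{\operatorname{Cov}(x_1, x_2)}{2} = .50 + \frac{.49}{2} = .745,
\]
\[
\rho_X = \frac{.49}{.745} \approx .66,
\qquad
\text{95\% CI for power:} \quad \hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/500}.
\]

For example, at an estimated power of 17% the interval is 17 ± 1.96·√(.17 × .83 / 500) ≈ 17 ± 3.3, i.e. [14-20], matching the corresponding cell of Table B-1.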

Appendix C: Real Versus Integer Data

To test the impact on the pattern of results of moving from input data recorded as real numbers to four decimal places (as used by Chin et al. 2003) to integer data (as produced by the questionnaires most IS researchers use), we rounded the X, Z, and Y indicator values in the A1 through A7 datasets to integers and repeated the analysis. Table C-1 shows the results. Obviously some information is lost in going from real to integer data: power for both regression and PLS-PI drops by about 3 to 5 percentage points. But the pattern of results is unchanged. PLS-PI still has an advantage over regression in terms of higher point estimates for beta; regression still has an advantage over PLS-PI in terms of power. Widely differing indicator loadings (for example A4, A5, or A7) reduce regression's advantage but do not eliminate it. So the pattern we found in Figure 2 and Table 1 will, in fact, be faced by researchers using questionnaires with integer data as well. A minimal sketch of the rounding step follows Table C-1.

Table C-1. Integer Values (Indicator Values Rounded to Integers), Chin et al.'s A1-A7 Data

         Main Effect Construct    Reliability (main    Power                Beta Values
DataSet  Indicator Loadings       effect constructs)   PLS    Reg    Reg. Advantage    PLS    Reg    PLS Advantage
A1       2@.8; 2@.7; 2@.6         0.85                 0.43   0.58   0.15              0.28   0.23   0.06
A2       3@.8; 3@.7               0.89                 0.51   0.64   0.13              0.29   0.24   0.05
A3       3@.8; 3@.6               0.85                 0.43   0.57   0.14              0.29   0.22   0.06
A4       2@.8; 2@.6; 2@.4         0.77                 0.43   0.51   0.08              0.30   0.21   0.09
A5       3@.8; 3@.4               0.77                 0.38   0.49   0.11              0.30   0.21   0.10
A6       3@.7; 3@.6               0.81                 0.39   0.51   0.12              0.28   0.21   0.07
A7       2@.7; 2@.6; 2@.3         0.70                 0.33   0.36   0.03              0.29   0.18   0.11
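The rounding itself is a one-line transformation per indicator. Here is a minimal SAS sketch (ours, not from the original study; the dataset and variable names are hypothetical, since the A1-A7 file layouts are not shown here), with the product indicators presumably recomputed from the rounded values afterward, as in Appendix A:

    * Convert real-valued indicators (four decimal places) into
      integer "questionnaire" responses before re-analysis. ;
    data a1_integer;
      set a1_real;  * hypothetical dataset names ;
      array v{*} x1-x6 z1-z6 y1-y6;  * hypothetical indicator names ;
      do k = 1 to dim(v);
        v{k} = round(v{k}, 1);  * round to the nearest integer ;
      end;
      drop k;
    run;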

Appendix D: Exploring the Possibility That PLS-PI Capitalizes on Chance

We said at the beginning of the Discussion section of the main paper that our results in Tables 2 and 3 suggest the possibility that PLS-PI capitalizes on chance. In this appendix we examine this contention in more detail. We first describe the PLS estimation process and show where in that process capitalization on chance could occur. With this as background, we use a hypothetical example of how PLS-PI and regression would respond to a single extreme outlier data point that causes an interaction indicator to be highly correlated with Y. We then move away from the idea of single outlier data points and argue that random variation in general could have the same effect. We then support these contentions with a detailed look at one specific sample in which PLS-PI generated a large beta estimate that was not statistically significant. We conclude that all of our ad hoc analyses give results that are consistent with PLS-PI capitalizing on chance, especially when there are many indicators per construct.

The PLS Estimation Process. The algorithm for PLS estimation, as described by Chin (1998), involves three stages. Stage One is an iterative estimation process to determine the set of indicator weights for all constructs, computed in such a way as to maximize the amount of variance explained. Once the final indicator weights are determined, Stages Two and Three are "simple noniterative applications of OLS regression for obtaining loadings, path coefficients, and mean scores and location parameters for the LV [latent variable] and observed variables" (Chin 1998, p. 302). To be complete, we might add a Stage Four to determine the statistical significance of the parameter estimates, which typically involves the use of bootstrapping or jackknifing.

The key to understanding how PLS might capitalize on chance is to look closely at how Stage One operates. Consider only the interaction indicators and the link between the interaction construct (X*Z) and Y (from Figure 1). The following steps are performed (Barclay et al., 1995):

1. PLS starts by assuming equal weights for all indicators (the loadings are set to 1.0). An initial estimate for Y is calculated by summing the values for y1 and y2.
2. To estimate the weights for the interaction indicators (x1*z1, etc.), a regression is completed with the initial estimate of Y as the dependent variable and the interaction indicator values (x1*z1, etc.) as the independent variables.
3. These weights are then used in a linear combination of the interaction indicators (x1*z1 through x2*z2) to give an initial estimate for X*Z (the interaction construct).
4. The loadings for y1 and y2 are then estimated by a pair of simple regressions of y1 and y2 on X*Z.
5. The next step uses the estimated loadings, transformed into weights, to form a linear combination of y1 and y2 as a new estimate of the value of Y.

This process continues until the change in the stop criterion (e.g., the average of the R²s of all endogenous constructs) between consecutive iterations is extremely small. Put differently, at each Stage One cycle PLS calculates new construct scores for the interaction construct and for Y using their existing weights. It then seeks to increase the R² of Y by adjusting the interaction indicator weights. It does this by looking through the immediate construct (X*Z) to the next proximal construct (Y). In other words, for the interaction construct indicators, it ignores the interaction construct score just calculated and looks instead at the construct score for Y, the proximal construct. In effect, it runs a regression analysis with the construct score for Y as the dependent variable and the many interaction product indicators as the independent variables. The betas for the indicators in this regression provide the basis for adjusting the indicator weights for the next round. In effect, PLS selects those indicators with the highest correlations with Y and gives them the highest weights.

PLS-PI and an Extreme Outlier Point. How would this process react to a single extreme outlier data point for one of the interaction indicators? Consider our underlying true model in Figure 1, with two indicators for each construct and all indicator loadings equal to .7. Further, suppose an extreme outlier point created an unusually high correlation between one of the interaction indicators (say x1*z2) and the Y construct. In this case the Stage One process described above would tend to give a higher than normal weight to the x1*z2 indicator, in order to maximize the R². (We note that maximizing R² is what both PLS and regression try to do, but by being able to adjust the indicator weights as well as the beta weights, PLS has more ability to capitalize on chance than regression does.) If the x1*z2 indicator is weighted more heavily in the score for X*Z, then X*Z will be more highly correlated with Y than it would normally be, which results in a higher than normal estimate for the path from X*Z to Y.

What would regression do in response to this extreme outlier point? Because our regression approach uses equal loadings for all indicators, the effect of the outlier data point would be dampened: the x1*z2 indicator would be averaged in with three other indicators that did not have such a high correlation with Y. The result is that for regression there would be a smaller increase in the path from X*Z to Y, and a smaller increase in R².

So far this explanation is consistent with the results from both the Chin et al. analysis and our own: we saw higher average interaction paths (and, though not shown here, higher average R²) in PLS than in regression. But how would statistical significance be affected in the two techniques? Regression calculates the standard error of the interaction coefficient under the assumption that all indicators have equal weightings, and it includes all data points, including the outlier, in that single calculation. To test for statistical significance in PLS, on the other hand, the recommended technique is bootstrapping, which is relatively conservative with respect to statistical significance driven by outlier data points, as explained below.

Using sampling with replacement, bootstrapping creates a number of resamples (say 400) of the original data points, each with the same n as the original sample, but not necessarily the same data points. (Some data points may be omitted; others may appear more than once in any given resample.) For each of these 400 resamples there is a complete PLS analysis (Stages One through Three), yielding a new estimate of the interaction beta. Since some of the 400 resamples would include the outlier point and some would not, the 400 beta estimates would tend to jump around: higher when the outlier was included, lower when it was not.
Bootstrapping uses these beta estimates from the 400 resamples to determine the standard error of the original beta estimate, so the larger variation in the beta estimates due to the outlier data point results in the bootstrapping program generating a larger standard error for the interaction beta. This larger standard error tends to offset the larger interaction beta value, perhaps even leading to smaller t-statistics for the PLS interaction beta than for the regression interaction beta. This can make it more difficult to achieve statistical significance, even with a higher interaction beta estimate.
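To make the resampling mechanics concrete, here is a minimal SAS sketch (ours, not from the original study; the dataset name and seed are hypothetical) of how such with-replacement resamples could be drawn. Each resample would then be run through a complete PLS analysis to produce one interaction beta estimate, and the standard deviation of those 400 estimates serves as the bootstrap standard error of the original estimate.

    * Draw 400 with-replacement resamples, each the same size as the
      original sample. OUTHITS writes one output record per selection,
      and the automatic variable Replicate indexes the resamples. ;
    proc surveyselect data=onesample out=bootreps
                      seed=271828 method=urs samprate=1
                      outhits reps=400;
    run;
    * Because selection is with replacement, an outlier questionnaire
      appears in some resamples and not in others, which is exactly what
      makes the 400 beta estimates, and hence the bootstrap standard
      error, fluctuate. ;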

PLS-PI with the Usual Random Variation but Without Extreme Outliers. Although above we discussed the effect of a single extreme outlier data point, the same general effect can arise from the normal random variation present in virtually all questionnaire data, even without extreme outliers. In fact, it is virtually certain that when there are multiple interaction indicators, some will have higher correlations with Y than others. PLS will give larger weights to the indicators with higher correlations with Y, and lower weights to those with smaller correlations. The more indicators there are, the greater the possible variation in correlations with Y, and the more scope PLS has to "capitalize on chance" by assigning higher weights to those interaction indicators highly correlated with Y. The impact from any single indicator need not be large if it adds to the impact from the other indicators.

If PLS-PI does sometimes capitalize on chance, we should be able to identify samples in our own analysis that capitalized on chance by looking for large differences between the regression and PLS-PI estimates of the interaction beta. If those samples also did not have statistically significant interaction terms in PLS, that would be consistent with our conjecture. To test this conjecture, we identified the five samples (of the 500 in the six-indicator, n=100 cell) with the largest differences between the PLS and regression interaction estimates. In all five cases, the PLS run has a much larger absolute value for the interaction estimate. However, none of the five samples has a statistically significant interaction term, either for PLS or for regression. Table D-1 shows the results for the five samples.

Table D-1. Examples of Large Path Estimates in PLS Not Translating to Statistical Significance
(The five samples from the six-indicator, n=100 cell with the largest absolute difference between the interaction point estimates for PLS-PI and regression.)

                      PLS-PI                                    Regression
                      Path Estimate   R²     t statistic       Path Estimate   R²     t statistic
Example 1 (#87)       -0.32           0.38   -0.94             0.05            0.21   0.54
Example 2 (#422)       0.40           0.37    0.92             0.10            0.23   1.12
Example 3 (#471)       0.34           0.32    0.98             0.04            0.17   0.47
Example 4 (#477)      -0.31           0.34   -0.89             0.02            0.23   0.17
Example 5 (#497)       0.37           0.36    1.10             0.04            0.20   0.46

An In-Depth Look at One Particular Sample. To investigate this further, in Table D-2 we look at sample #497 (from the bottom of Table D-1) in more detail. (We completed a similar analysis with sample #87 and obtained similar results.) We focus not on the estimate of the interaction path from X*Z to Y, but on the weights PLS assigns to the 36 product indicators to compute a value for I, the interaction construct. According to our explanation above, if sample #497 has some interaction indicators with especially high correlations with Y, we should see PLS assigning higher weights to those indicators. We certainly see that in Table D-2: the six indicators with the highest absolute correlations with Y (bold in column 2 of the original table, flagged * here) receive the six highest absolute weights from PLS (column 3). Note also that three of the highest absolute correlations are positive, and three are negative. We know that in the underlying model of Figure 1 the true relationship between Y and the interaction term (or any of its indicators) is positive. Of course there is also random error variance, which is why we see quite a scattering of negative and positive correlations with Y in Table D-2. What is interesting is that PLS has responded to this chance scattering of correlations by assigning high positive weights to indicators with high positive correlations with Y (for example i8), and high negative weights to indicators with high negative correlations (for example i6). This is evidence that PLS is capitalizing on large negative and positive chance correlations. PLS does this to maximize the correlation between the resulting interaction construct (the weighted average of all 36 product indicators) and Y. But as we can see, the result is a strange mixture of positive and negative weights for the 36 product indicators, even though the underlying model implies all positive (or all negative) weights.

If the higher correlations between product indicator scores and Y are caused by a small collection of data points, there are two things we might expect to see in the bootstrap results. First, we might expect a large difference (column 5) between the original PLS estimate of the weights based on the full sample (column 3) and the average of the bootstrap resample estimates of the weights (column 4), since the bootstrap resamples would sometimes include these chance data points and sometimes not. In fact we do see that: the six highest absolute correlations in column 2 also show the six largest differences (column 5) between the full-sample PLS weight and the bootstrap average weight. Second, we would expect the bootstrap to produce a larger standard error for the weights (column 6) when an indicator had these chance data points (i.e., those causing a high positive or negative correlation with Y). In fact, the six highest absolute correlations in column 2 also have the six highest standard errors as determined by bootstrapping (column 6). Finally, of the six indicators with the highest absolute correlations, only two have a t-statistic higher than 1.98. All of this is consistent with our conjecture that PLS capitalizes on random high positive or negative correlations with Y in determining its beta values, but that during significance testing bootstrapping compensates by penalizing beta values caused by only a few data points.

Extreme Outliers or Random Chance? Are the characteristics shown in Table D-2 really due to chance variations, and not to recognizable outliers in the data? If the latter, perhaps the outliers could be removed prior to PLS analysis. Looking at frequency distributions for the X, Z and Y indicators, we saw mostly values between -2 and +2, a lesser number between -3 and +3, and only one possible outlier, a Z value of 4. Removing that data point had no material impact on the PLS results: beta was still high (.367) and not statistically significant. Though most of the values for the interaction indicators were between -4 and +4, we found a collection of data points with values of +6 or -6, which might be construed as outliers.[3] Removing these left 95 data points, and PLS again returned a high interaction beta (.407), again not significant. There were 10 more data points with values of +4 or -4. While it is not clear that any researcher would be comfortable removing as outliers 15 out of 100 questionnaires on this evidence (Carte and Russell 2003, p. 489, caution against this), we did remove those questionnaires and performed a PLS analysis on the remaining 85 questionnaires.
The result was still a large beta value for the interaction term (.298, within the 10% accuracy hurdle), but still no statistical significance. We conclude that this phenomenon cannot be blamed on identifiable outlier data points, but is due to capitalization on chance in a more general sense. The more interaction indicators there are, the more opportunity there is for PLS to capitalize on chance, and the more bootstrapping will penalize the findings in terms of loss of power.

[3] We note that it would be possible to run a regression predicting Y with all of the X and Z predictors, and then use standard regression diagnostics, such as hat values, to identify multivariate outliers (Bollen, 1989; Bollen and Arminger, 1991). We would like to thank an anonymous reviewer for this suggestion.

Table D-2. Sample #497 Statistics on PLS-PI Product Indicators (Six Indicators, N=100)
Higher positive or negative indicator correlations with Y lead to increased PLS indicator weights, increased differences from bootstrap means, and increased bootstrap standard errors. Entries marked * were set in bold in the original: |corr| >= .13 (col. 2), |weight| > .195 (col. 3), |difference| > .14 (col. 5), std. error > .105 (col. 6), t > 1.98 (col. 7).

Interaction   Correlation    Weight,        Mean Wt. of     Difference    Std. Error     t-Statistic
Indicator     with Y         Full Sample    Bootstrap       (Wt - Mean)   of Wt.         (Bootstrap)
              (from SAS)     (from PLS)     Resamples                     (Bootstrap)
i1            -0.018         -0.027          0.026          -0.053        0.071          0.38
i2            -0.012         -0.016          0.022          -0.038        0.076          0.214
i3            -0.005          0.001          0.03           -0.028        0.071          0.02
i4            -0.086         -0.123          0.01           -0.133        0.1            1.228
i5             0.005          0.006          0.028          -0.022        0.051          0.117
i6            -0.173*        -0.243*         0.005          -0.248*       0.149*         1.638
i7             0.103          0.143          0.035           0.109        0.086          1.666
i8             0.144*         0.213*         0.051           0.162*       0.106*         2.01*
i9             0.123          0.194          0.056           0.139        0.102          1.896
i10            0.042          0.074          0.035           0.04         0.08           0.932
i11           -0.084         -0.114          0.018          -0.132        0.091          1.253
i12            0.013          0.029          0.019           0.011        0.094          0.31
i13            0.035          0.052          0.023           0.029        0.067          0.781
i14           -0.069         -0.083          0.011          -0.094        0.076          1.103
i15            0.072          0.115          0.044           0.071        0.086          1.342
i16            0.04           0.069          0.025           0.044        0.07           0.984
i17            0.001          0.013          0.021          -0.008        0.052          0.254
i18            0.092          0.147          0.035           0.112        0.084          1.756
i19            0.146*         0.218*         0.043           0.175*       0.11*          1.975
i20            0.076          0.122          0.035           0.087        0.075          1.626
i21            0.034          0.073          0.041           0.032        0.074          0.986
i22           -0.016         -0.013          0.024          -0.038        0.06           0.219
i23           -0.087         -0.106          0.021          -0.127        0.091          1.163
i24           -0.18*         -0.235*         0.005          -0.24*        0.146*         1.605
i25           -0.056         -0.086          0.012          -0.098        0.086          0.996
i26           -0.007         -0.006          0.027          -0.033        0.059          0.102
i27            0.118          0.178          0.048           0.13         0.098          1.822
i28           -0.021         -0.028          0.023          -0.051        0.079          0.355
i29           -0.028         -0.029          0.024          -0.052        0.076          0.379
i30           -0.144*        -0.198*         0.007          -0.205*       0.124*         1.6
i31            0.183*         0.258*         0.047           0.211*       0.126*         2.05*
i32            0.053          0.088          0.037           0.051        0.06           1.452
i33            0.01           0.02           0.041          -0.022        0.053          0.37
i34            0.041          0.068          0.044           0.024        0.047          1.466
i35           -0.083         -0.105          0.023          -0.128        0.084          1.25
i36            0.005          0.022          0.036          -0.014        0.064          0.344

Appendix E: Comparing Bootstrapping with 100 and 500 Resamples

To address the question of whether moving to 500 resamples (rather than 100) would affect our results, we selected two cells (N=20 and N=50, with 4 indicators) and reran the analysis using 500 resamples. Recall that each cell contains 500 samples of N=20 (or N=50). Using 500 resamples for each sample thus results in 250,000 resamples per cell, all drawn from the same underlying population. We limited the number of cells for this analysis because, given the technique we were using, the computational burden of 500 resamples was substantial: one of the researchers spent approximately 16 hours completing the analyses for the N=20 cell alone.

The results were as we expected. To two decimal places, the path estimates and power computations were identical for 100 and 500 resamples. At three decimal places, very minor differences in power were observed, as shown in Table E-1 below.

Table E-1. Comparing 100 and 500 Bootstrapping Resamples

                              100 Resamples    500 Resamples
N=20, path estimate for I     .238             .238
N=20, power for I (p<.01)     .030             .028
N=50, path estimate for I     .242             .242
N=50, power for I (p<.01)     .154             .154

Appendix F: PLS with Normal Theory Significance Testing

A concern was raised that comparing statistical significance tests based on normal theory testing for regression against bootstrapping for PLS is like comparing apples and oranges; the two are not equivalent. To address this, we conducted additional analyses comparing regression with normal theory testing to PLS with normal theory testing. To create "PLS with normal theory testing," we used the indicator weights generated by PLS to calculate construct scores, and then ran regression with normal theory testing on those construct scores. These results were compared with the straight regression (equal indicator loadings) significance results.

To describe the process in more detail: first, we modified the model to have the same effect sizes as before for X and Z, but no effect for I (the interaction term). In this manner we could test the efficacy of both approaches in terms of both Type I and Type II errors. We used four items to measure each of the constructs X and Z, with sixteen indicators for the interaction term I. We used sample sizes of 50, 100, and 150, and we generated 500 datasets for each sample size. We ran the PLS analysis without bootstrapping for all 500 datasets in each sample size condition, then stripped off the indicator weights for each construct and used those weights and the raw data to determine construct scores for each "questionnaire." These construct scores were then fed into a regression analysis that estimated the betas and t-statistics for each of the 500 datasets. The proportion of t-statistics that are significant (i.e., the power) for each effect size and N is displayed in Table F-1 below (labeled PLS-NTT, for PLS using normal theory testing), alongside the results for regression with normal theory testing (MR) and PLS with bootstrapping (PLS-B).

Two things are worth noting in the results. First, for the medium effect size path (X to Y), the power of PLS with regression significance testing (PLS-NTT in the table) dominates the other approaches at N=50 and N=100. At a sample size of 150 this advantage disappears, and the power of PLS-NTT is generally similar to that of the other techniques. On the face of it, this is evidence that PLS with regression significance testing is a more efficacious technique (has more power) than the other techniques at small sample sizes. But see below.

Second, PLS-NTT also finds far more significant betas for the interaction term, for which there is no actual effect. The other techniques both find these false positives between 5% and 7% of the time, within a 95% confidence interval of .03 to .07 around the nominal rate of .05. PLS-NTT finds these false positives between 31% and 36% of the time at all three sample sizes. This is strong evidence that PLS-NTT is capitalizing on chance.

Our interpretation (developed more fully in Appendix D) is the following. PLS has more "levers" available for capitalizing on chance than regression does: regression can vary only the beta coefficients, while PLS can vary both the beta coefficients and the indicator weights. This gives PLS a stronger ability to capitalize on any chance high correlation between a particular indicator and the dependent construct. Especially at small sample sizes, these chance high correlations often come about through one or a few outlier data points.
Bootstrapping, because of the way it determines the standard error for significance testing, reacts to such outliers with a larger standard error, since the outlier data point will be included in some resamples and not in others. PLS-NTT, however, allows the PLS algorithm to capitalize on chance and does not correct for this using bootstrapping. The result is an unacceptably high rate of false positives with PLS-NTT.

This suggests that the approach of using PLS to determine indicator weightings and then feeding those weightings and indicator scores into a regression analysis is not appropriate, at least without considerable further investigation. Further, no published work that we are aware of has advocated this approach. In addition, we note that Goodhue, Lewis and Thompson (2006) obtained similar results when they tested the impact of moving from the use of bootstrapping with PLS to the use of normal theory testing with PLS.

Table F-1. Power at Each Effect Size and Sample Size (Proportion of Statistically Significant Betas), Including PLS Analysis Followed by Regression Significance Testing

X to Y (true beta = .30)
n =        50      100     150
MR         0.50    0.79    0.93
PLS-B      0.46    0.81    0.95
PLS-NTT    0.56    0.86    0.94

Z to Y (true beta = .50)
n =        50      100     150
MR         0.85    0.99    1.00
PLS-B      0.82    0.99    1.00
PLS-NTT    0.86    0.99    1.00

Interaction (true beta = .00)
n =        50      100     150
MR         0.06    0.05    0.05
PLS-B      0.05    0.06    0.07
PLS-NTT    0.31    0.36    0.36

For the medium effect size, PLS-NTT dominates the other techniques at small sample sizes but is about the same at a sample size of 150. For the strong effect size, the proportions statistically significant are roughly equal for all three techniques at all three sample sizes. When there is no actual effect, PLS-NTT finds false positives over 30% of the time at N=50, 100 and 150; the other techniques (MR and PLS with bootstrapping) are in line with expectations.

PLS-B: PLS using bootstrapping (100 resamples) to assess statistical significance.
PLS-NTT: PLS using normal theory testing (employing weights from PLS to compute weighted construct scores, then using regression in SAS to estimate path coefficients and t-statistics).
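As a concrete illustration of the two-step PLS-NTT procedure described above, here is a minimal SAS sketch (ours; the dataset name, variable names, and weight values are hypothetical placeholders, since the actual PLS weights differ for every dataset):

    * Step 1: form construct scores as weighted sums of the indicators,
      using indicator weights stripped from a prior PLS run.
      Step 2: regress the Y score on the X, Z and interaction scores,
      letting PROC REG supply the normal theory t-tests. ;
    data scores;
      set onedataset;  * holds x1-x4, z1-z4, i1-i16, y1-y2 ;
      array xv{4}  x1-x4;  array wx{4}  _temporary_ (.31 .27 .29 .26);
      array zv{4}  z1-z4;  array wz{4}  _temporary_ (.28 .30 .25 .27);
      array iv{16} i1-i16; array wi{16} _temporary_ (.08 .05 -.02 .11 .07 .03 .09 .06
                                                     .04 .10 .02 .05 .07 .03 .08 .06);
      Xs = 0; Zs = 0; Is = 0;
      do k = 1 to 4;
        Xs = Xs + wx{k}*xv{k};
        Zs = Zs + wz{k}*zv{k};
      end;
      do k = 1 to 16;
        Is = Is + wi{k}*iv{k};
      end;
      Ys = .55*y1 + .52*y2;  * hypothetical PLS weights for the y indicators ;
      drop k;
    run;

    proc reg data=scores;
      model Ys = Xs Zs Is;  * normal theory t-statistics on the PLS-weighted scores ;
    run;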

Appendix G: Normality and Kurtosis

Although the data that we (and Chin et al. 2003) generated for the main effect and dependent constructs (X, Z and Y) were normally distributed (N(0,1) by design), the interaction data computed by multiplying two normally distributed values together had zero skew but fairly high kurtosis. This is always the result of multiplying two N(0,1) values together, so kurtosis will generally be present in this type of interaction construct.

To see whether this had any impact on the PLS-PI and regression results, we selected three cells (N=20, 50 and 100 for the 4-indicator data) that we believed would be representative of the type of data normally used by IS researchers. We then transformed the interaction data (each of the PLS product indicators, and the single interaction value for regression) by taking the square root, reducing the kurtosis to within the normally accepted range. We re-ran PLS and regression with the transformed data and compared the statistical power results to our original results (see Table G-1). At most, the power changed by .02 (e.g., at N=100 the power for regression was .50 with the original data and .48 with the transformed data). These differences are negligible, suggesting that the non-normality of the interaction term did not affect the pattern of our results.

Table G-1. Power (Proportion Statistically Significant at p < .01) for 4 Indicators

Sample    Regression       Regression          PLS-PI           PLS-PI
Size      Original Data    Transformed Data    Original Data    Transformed Data
20        .08              .07                 .03              .03
50        .17              .17                 .15              .15
100       .50              .48                 .41              .40
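The high kurtosis is easy to verify analytically. For two independent, unit-variance normal indicators x and z (as in the Figure 1 design, ignoring the rounding step), the moments of the product follow directly from independence:

\[
E[(xz)^2] = E[x^2]\,E[z^2] = 1, \qquad E[(xz)^4] = E[x^4]\,E[z^4] = 3 \times 3 = 9,
\]

so each product indicator has kurtosis 9/1² = 9, triple the value of 3 for a normal variable, while all odd moments (and hence the skew) are zero.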

Appendix H: Results Using a Significance Level of 0.05

Comparing the PLS-PI results in Table 3a of the main paper (using α = 0.01) to those in Table H-1 below (using the more common α = 0.05), it is clear that relaxing the significance level does not change the pattern of results, though it does reduce the sample sizes needed to achieve power of .80 across the various conditions. The same is true for regression (Table H-2). In addition, the pattern of regression holding an advantage over PLS-PI in most conditions (except at very large sample sizes) holds at α = 0.05 as well (Table H-3).

Table H-1. PLS-PI, Power at p < .05
(Equal loadings at .70; in the original, power values above .80 are set in bold.)

Sample    Number of Indicators for Main Effect Constructs
Size      2 i     4 i     6 i     8 i     10 i    12 i
20        0.05    0.13    0.15    0.17    0.16    0.16
50        0.20    0.32    0.37    0.40    0.41    0.43
100       0.36    0.59    0.63    0.63    0.68    0.63
150       0.47    0.72    0.77    0.80    0.81    0.79
200       0.64    0.80    0.85    0.91    0.90    0.92
500       0.93    0.99    0.99    1.00    1.00    1.00

Table H-2. Regression, Power at p < .05
(Equal loadings at .70; in the original, power values above .80 are set in bold.)

Sample    Number of Indicators for Main Effect Constructs
Size      2 i     4 i     6 i     8 i     10 i    12 i
20        0.13    0.25    0.24    0.28    0.28    0.32
50        0.28    0.41    0.50    0.59    0.62    0.66
100       0.43    0.73    0.79    0.87    0.90    0.87
150       0.58    0.86    0.89    0.95    0.97    0.98
200       0.73    0.91    0.96    0.98    0.99    0.99
500       0.96    1.00    1.00    1.00    1.00    1.00

Table H-3. Regression Advantage over PLS-PI at p < .05

Sample    Number of Indicators for Main Effect Constructs
Size      2 i     4 i     6 i     8 i     10 i    12 i     Row Avg.
20        0.08    0.12    0.09    0.11    0.12    0.16     0.11
50        0.08    0.09    0.13    0.19    0.21    0.23     0.16
100       0.07    0.14    0.16    0.24    0.22    0.24     0.18
150       0.11    0.14    0.12    0.15    0.16    0.19     0.15
200       0.09    0.11    0.11    0.07    0.09    0.07     0.09
500       0.03    0.01    0.01    0.00    0.00    0.00     0.01
Col. Avg. 0.08    0.10    0.10    0.13    0.13    0.15

References for the Online Appendices

Barclay, D., Higgins, C., and Thompson, R. 1995. "The Partial Least Squares (PLS) Approach to Causal Modeling: Personal Computer Adoption and Use as an Illustration." Technology Studies 2(2), 285-309.

Bollen, K. A. 1989. Structural Equations with Latent Variables. Wiley, New York.

Bollen, K. A., and Arminger, G. 1991. "Observational Residuals in Factor Analysis and Structural Equation Models." Sociological Methodology 21, 235-262.

Carmines, E. G., and Zeller, R. A. 1979. Reliability and Validity Assessment. Sage Publications, Beverly Hills, CA.

Carte, T., and Russell, C. 2003. "In Pursuit of Moderation: Nine Common Errors and Their Solutions." MIS Quarterly 27(3), 479-501.

Chin, W. W. 1998. "The Partial Least Squares Approach to Structural Equation Modeling." In G. A. Marcoulides (ed.), Modern Methods for Business Research, London, 295-336.

Chin, W. W., Marcolin, B., and Newsted, P. 2003. "A Partial Least Squares Latent Variable Modeling Approach for Measuring Interaction Effects: Results from a Monte Carlo Simulation Study and an Electronic-Mail Emotion/Adoption Study." Information Systems Research 14(2), 189-217.

Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Hillsdale, NJ.

Cronbach, L. J. 1951. "Coefficient Alpha and the Internal Structure of Tests." Psychometrika 16, 297-334.

Goodhue, D., Lewis, W., and Thompson, R. 2006. "Small Sample Size and Statistical Power in MIS Research." In R. Sprague (ed.), Proceedings of the 39th Hawaii International Conference on System Sciences (CD), IEEE Computer Society Press, Los Alamitos, CA, (January 4-7), 1-10.

Larsen, R. J., and Marx, M. L. 1981. An Introduction to Mathematical Statistics and Its Applications. Prentice-Hall, Englewood Cliffs, NJ.