Sum of Squares Type I, II, III:
the underlying hypotheses, model comparisons, and their calculation in R

General remarks
Example walk-through in R
Intercorrelations of indicator variables
Sum of squares type I
Sum of squares type II
Sum of squares type III

General remarks

In non-orthogonal factorial between-subjects designs, which typically result from non-proportional unequal cell sizes, the so-called type I-III sums of squares (SS) can give different ANOVA results for all tests except that of the highest-order interaction. The SS of an effect is the sum of squared differences between the predicted values from the least-squares fit of a restricted model and those from the least-squares fit of a more general model. What differs between the "types of SS" is the choice of the restricted and the more general model when testing an effect. Orthogonal designs obscure these differences because there the different hypotheses lead to the same calculations.
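
As a minimal sketch of this definition, using the depression data Y and the factors A and B from the walk-through below (modR and modG are illustrative names): comparing a restricted model containing only A with a more general model containing A and B yields the SS for the effect of B.

> modR <- lm(Y ~ A)      # restricted model
> modG <- lm(Y ~ A + B)  # more general model
> # sum of squared differences between the two predictions
> sum((fitted(modG) - fitted(modR))^2)
[1] 1253.189

> # equivalently: the drop in the residual sum of squares
> sum(residuals(modR)^2) - sum(residuals(modG)^2)
[1] 1253.189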

The hypotheses tested by the model comparisons underlying the different SS types can also be formulated using appropriately weighted cell expected values; for details, see this pdf. This manuscript derives an explicit formula for type III SS.

Since the different types of SS can give different results, the question arises which one should be chosen. General guidelines are hard to give, since the choice should be motivated by the actual hypotheses being tested. The issue does not arise for hypotheses that do not refer to averages of cell expected values but simply compare the cell expected values themselves, or differences between them.

SAS and SPSS use SS type III as their default, while the functions that ship with R (e.g., anova()) use type I. This can lead to different results when the same data are analyzed with different statistics packages. If you simply "want to get SS type III in R" despite the host of issues associated with that choice, you can do this (see the numerical example below for other ways):

  • Install the package car:
    > install.packages("car") # (car on CRAN)
  • Set the contrast coding scheme for unordered factors to effect coding (contr.sum) or some other sum-to-zero coding (e.g., contr.helmert):
    > options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))
  • Apply function Anova() to a model fitted with lm(), and use the type="III" argument:
    > library(car)
    > Anova(lm(<formula>), type="III")
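
Put together, and using the Y, A, and B defined in the walk-through below (the object name fit is arbitrary), the whole sequence might look like this; the resulting table is shown in the section on SS type III:

> library(car)
> options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))
> fit <- lm(Y ~ A + B + A:B)
> Anova(fit, type="III")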

A thorough resource on the topic is chapter 7 of Maxwell & Delaney (2004), Designing Experiments and Analyzing Data: A Model Comparison Perspective. Mahwah, NJ: Lawrence Erlbaum. However, M&D do not cover the geometrical interpretation in terms of orthogonal projections onto model subspaces and their orthogonal complements; a separate text on the General Linear Model is required for an introduction to that aspect.

Example walk-through in R

To make sense of the different explanations and the pro and con arguments brought forth when SS types are discussed, it helps to have a specific example at hand. The following R code aims to provide such an example.

> ####-----------------------------------------------------------------
> # 3x3 unbalanced design
> # data from Maxwell & Delaney 2004 p339
> # dependent variable Y: depression score
> # factor A: type of therapy, B: degree of severity
> ####-----------------------------------------------------------------
> 
> P   <- 3                              # number of groups factor A
> Q   <- 3                              # number of groups factor B
> g11 <- c(41, 43, 50)                  # scores in group A1/B1
> g12 <- c(51, 43, 53, 54, 46)          # scores in group A1/B2
> g13 <- c(45, 55, 56, 60, 58, 62, 62)  # scores in group A1/B3
> g21 <- c(56, 47, 45, 46, 49)          # scores in group A2/B1
> g22 <- c(58, 54, 49, 61, 52, 62)      # scores in group A2/B2
> g23 <- c(59, 55, 68, 63)              # scores in group A2/B3
> g31 <- c(43, 56, 48, 46, 47)          # scores in group A3/B1
> g32 <- c(59, 46, 58, 54)              # scores in group A3/B2
> g33 <- c(55, 69, 63, 56, 62, 67)      # scores in group A3/B3
> Y   <- c(g11, g12, g13, g21, g22, g23, g31, g32, g33)  # all scores
> # corresponding factors
> A   <- factor(rep(1:P, c(3+5+7, 5+6+4, 5+4+6)),
+               labels=paste("A", 1:P, sep=""))
> B   <- factor(rep(rep(1:Q, P), c(3,5,7, 5,6,4, 5,4,6)),
+               labels=paste("B", 1:Q, sep=""))
> 
> ####-----------------------------------------------------------------
> # for comparison: 3x3 balanced design with random data
> # -> all SS types lead to identical results
> ####-----------------------------------------------------------------
> 
> # Njk <- 5                     # equal cell size
> # P   <- 3                     # number of groups factor A
> # Q   <- 3                     # number of groups factor B
> # Y   <- rnorm(Njk*P*Q, 0, 1)  # all scores
> # # corresponding factors
> # A   <- factor(rep(1:P, Njk*Q),      labels=paste("A", 1:P, sep=""))
> # B   <- factor(rep(1:Q, each=Njk*P), labels=paste("B", 1:Q, sep=""))
> 
> ####-----------------------------------------------------------------
> # function getInf3x3
> # get indicator variables (predictors) for factors from design matrix
> # fit all relevant regression models for 3x3 between-subjects design
> # output: * residual sum of squares (RSS) for each model and its df
> #         * orthogonal projection on subspace as defined
> #           by the design matrix of each model
> ####-----------------------------------------------------------------
> 
> getInf3x3 <- function() {
+   X <- model.matrix(lm(Y ~ A + B + A:B))  # ANOVA design matrix
+ 
+   # numerical predictors (indicator variables) from design matrix
+   idA1 <- X[ , 2]   # factor A 1st indicator
+   idA2 <- X[ , 3]   # factor A 2nd indicator
+   idB1 <- X[ , 4]   # factor B 1st indicator
+   idB2 <- X[ , 5]   # factor B 2nd indicator
+   idI1 <- X[ , 6]   # interaction A:B 1st indicator
+   idI2 <- X[ , 7]   # interaction A:B 2nd indicator
+   idI3 <- X[ , 8]   # interaction A:B 3rd indicator
+   idI4 <- X[ , 9]   # interaction A:B 4th indicator
+ 
+   ####---------------------------------------------------------------
+   # RSS for each regression model and its df with the help of lm()
+ 
+   # all relevant regression models from lm()
+   mod1   <- lm(Y ~ 1)                      # no effect
+   modA   <- lm(Y ~ idA1+idA2)              # factor A
+   modB   <- lm(Y ~             idB1+idB2)  # factor B
+   modAB  <- lm(Y ~ idA1+idA2 + idB1+idB2)  # factors A, B
+   # factor A, interaction A:B
+   modAI  <- lm(Y ~ idA1+idA2             + idI1+idI2+idI3+idI4)
+   # factor B, interaction A:B
+   modBI  <- lm(Y ~             idB1+idB2 + idI1+idI2+idI3+idI4)
+   # full model A, B, interaction A:B
+   modABI <- lm(Y ~ idA1+idA2 + idB1+idB2 + idI1+idI2+idI3+idI4)
+ 
+   # RSS for all models from lm(), i.e., sum((Y - fitted(<model>))^2)
+   rss1   <- sum(residuals(mod1)^2)    # no effect, i.e., total SS
+   # = var(Y) * (length(Y)-1) = sum((Y - mean(Y))^2)
+   rssA   <- sum(residuals(modA)^2)    # factor A
+   rssB   <- sum(residuals(modB)^2)    # factor B
+   rssAB  <- sum(residuals(modAB)^2)   # factors A, B
+   rssAI  <- sum(residuals(modAI)^2)   # factor A, A:B
+   rssBI  <- sum(residuals(modBI)^2)   # factor B, A:B
+   rssABI <- sum(residuals(modABI)^2)  # full model A, B, A:B
+ 
+   # degrees of freedom for RSS from each model
+   N     <- length(Y)  # total N
+   df1   <- N - (0+1)  # no effect:            0 predictors + mean
+   dfA   <- N - (2+1)  # factor A:             2 predictors + mean
+   dfB   <- N - (2+1)  # factor B:             2 predictors + mean
+   dfAB  <- N - (4+1)  # factors A, B:         4 predictors + mean
+   dfAI  <- N - (6+1)  # factor A, A:B:        6 predictors + mean
+   dfBI  <- N - (6+1)  # factor B, A:B:        6 predictors + mean
+   dfABI <- N - (8+1)  # full model A, B, A:B: 8 predictors + mean
+ 
+   ####---------------------------------------------------------------
+   # alternatively: get RSS for each model and its df manually
+   # based on geometric interpretation
+ 
+   # design matrix for each model
+   one  <- rep(1, nrow(X))                     # column of 1s
+   X1   <- cbind(one)                          # no effect
+   Xa   <- cbind(one, idA1, idA2)              # factor A
+   Xb   <- cbind(one,             idB1, idB2)  # factor B
+   Xab  <- cbind(one, idA1, idA2, idB1, idB2)  # factors A, B
+   # factor A, interaction A:B
+   Xai  <- cbind(one, idA1, idA2,             idI1, idI2, idI3, idI4)
+   # factor B, interaction A:B
+   Xbi  <- cbind(one,             idB1, idB2, idI1, idI2, idI3, idI4)
+   # full model A, B, interaction A:B
+   Xabi <- cbind(one, idA1, idA2, idB1, idB2, idI1, idI2, idI3, idI4)
+ 
+   # orthogonal projections P on subspace
+   # given by the design matrix of each model
+   # P*y = y^hat are predictions of the model's least squares fit
+ 
+   # in a balanced design, effect term estimates are:
+   M       <- mean(Y)                        # grand mean
+   Mjk     <- ave(Y, A, B, FUN="mean")       # cell means
+   Mj      <- ave(Y, A,    FUN="mean")       # row means
+   Mk      <- ave(Y,    B, FUN="mean")       # col means
+   alphaJ  <- Mj - M                         # main effect A
+   betaK   <- Mk - M                         # main effect B
+   gammaJK <- Mjk - (M + alphaJ + betaK)     # interaction effect
+ 
+   # no effect: prediction = (weighted) grand mean
+   P1 <- X1  %*% solve(t(X1) %*% X1) %*% t(X1)
+   # P1 %*% Y = M
+ 
+   # factor A: prediction = corresponding (weighted) A-group mean
+   Pa <- Xa %*% solve(t(Xa) %*% Xa) %*% t(Xa)
+   # Pa %*% Y = Mj = M + alphaJ
+ 
+   # factor B: prediction = corresponding (weighted) B-group mean
+   Pb <- Xb %*% solve(t(Xb) %*% Xb) %*% t(Xb)
+   # Pb %*% Y = Mk = M + betaK
+ 
+   # factors A, B:
+   Pab <- Xab %*% solve(t(Xab) %*% Xab) %*% t(Xab)
+   # only the same in a balanced design:
+   # Pab %*% Y = M + alphaJ + betaK
+ 
+   # factor A, interaction A:B
+   Pai <- Xai %*% solve(t(Xai) %*% Xai) %*% t(Xai)
+   # only the same in a balanced design && with sum-to-zero contrasts:
+   # Pai %*% Y = M + alphaJ + gammaJK
+ 
+   # factor B, interaction A:B
+   Pbi <- Xbi %*% solve(t(Xbi) %*% Xbi) %*% t(Xbi)
+   # only the same in a balanced design && with sum-to-zero contrasts:
+   # Pbi %*% Y = M + betaK + gammaJK
+ 
+   # full model A, B, A:B: prediction = corresponding cell mean
+   Pabi <- Xabi %*% solve(t(Xabi) %*% Xabi) %*% t(Xabi)
+   # Pabi %*% Y = Mjk = M + alphaJ + betaK + gammaJK
+ 
+   # RSS = squared length of projections onto the orthogonal
+   # complement of each model's null hypothesis subspace
+   # within NxN space
+   # (I-P)*y = I*y - P*y = y - P*y = y - y^hat
+   # differences to model prediction, i.e., residuals
+ 
+   # NxN identity matrix = "projection" onto outermost space
+   ID     <- diag(N)
+ 
+   # no effect:
+   rss1   <- c(crossprod((ID - P1)   %*% Y))
+   # = sum((Y - M)^2)
+ 
+   # factor A:
+   rssA   <- c(crossprod((ID - Pa)   %*% Y))
+   # = sum((Y - Mj)^2) = sum((Y - (M + alphaJ))^2)
+ 
+   # factor B:
+   rssB   <- c(crossprod((ID - Pb)   %*% Y))
+   # = sum((Y - Mk)^2) = sum((Y - (M + betaK))^2)
+ 
+   # factors A, B:
+   rssAB  <- c(crossprod((ID - Pab)  %*% Y))
+   # only the same in a balanced design:
+   # = sum((Y - (M + alphaJ + betaK))^2)
+ 
+   # factor A, interaction A:B:
+   rssAI  <- c(crossprod((ID - Pai)  %*% Y))
+   # only the same in a balanced design && with sum-to-zero contrasts:
+   # = sum((Y - (M + alphaJ + gammaJK))^2)
+ 
+   # factor B, interaction A:B
+   rssBI  <- c(crossprod((ID - Pbi)  %*% Y))
+   # only the same in a balanced design && with sum-to-zero contrasts:
+   # = sum((Y - (M + betaK + gammaJK))^2)
+ 
+   # full model A, B, interaction A:B:
+   rssABI <- c(crossprod((ID - Pabi) %*% Y))
+   # = sum((Y-Mjk)^2) = sum((Y - (M+alphaJ+betaK+gammaJK))^2)
+ 
+   # get df of each RSS from dimension of subspace complementary
+   # to the null hypothesis model subspace (within NxN space)
+   df1   <- qr(ID - P1)$rank    # no effect
+   dfA   <- qr(ID - Pa)$rank    # factor A
+   dfB   <- qr(ID - Pb)$rank    # factor B
+   dfAB  <- qr(ID - Pab)$rank   # factors A, B
+   dfAI  <- qr(ID - Pai)$rank   # factor A, A:B
+   dfBI  <- qr(ID - Pbi)$rank   # factor B, A:B
+   dfABI <- qr(ID - Pabi)$rank  # full model: A, B, A:B
+ 
+   # return information about RSS from each model, its df,
+   # and the corresponding orthogonal projections
+   return(list( rss1=rss1,   rssA=rssA,    rssB=rssB, rssAB=rssAB,
+               rssAI=rssAI, rssBI=rssBI, rssABI=rssABI,
+                 df1=df1,     dfA=dfA,      dfB=dfB,   dfAB=dfAB,
+                dfAI=dfAI,   dfBI=dfBI,   dfABI=dfABI,
+                  P1=P1,       Pa=Pa,        Pb=Pb,     Pab=Pab,
+                 Pai=Pai,     Pbi=Pbi,     Pabi=Pabi))
+ }

Intercorrelations of indicator variables

In a non-orthogonal design, the indicator variables coding different effects are correlated -> this is analogous to multicollinearity in multiple regression. The degree of multicollinearity depends on the contrast coding scheme chosen for the categorical variables.

> # dummy coding -> high multicollinearity
> options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
> # get multicollinearity information
> # pairwise correlations of indicator variables
> model <- lm(Y ~ A + B + A:B)
> round(cor(model.matrix(model)[ , -1]), 3)
           AA2    AA3    BB2    BB3 AA2:BB2 AA3:BB2 AA2:BB3 AA3:BB3
AA2      1.000 -0.500  0.100 -0.162   0.555  -0.221   0.442  -0.277
AA3     -0.500  1.000 -0.100  0.032  -0.277   0.442  -0.221   0.555
BB2      0.100 -0.100  1.000 -0.551   0.555   0.442  -0.221  -0.277
BB3     -0.162  0.032 -0.551  1.000  -0.306  -0.243   0.401   0.503
AA2:BB2  0.555 -0.277  0.555 -0.306   1.000  -0.123  -0.123  -0.154
AA3:BB2 -0.221  0.442  0.442 -0.243  -0.123   1.000  -0.098  -0.123
AA2:BB3  0.442 -0.221 -0.221  0.401  -0.123  -0.098   1.000  -0.123
AA3:BB3 -0.277  0.555 -0.277  0.503  -0.154  -0.123  -0.123   1.000

> # variance inflation factors
> library(car)
> vif(model)
        GVIF Df GVIF^(1/2Df)
A   13.00000  2     1.898829
B   10.52381  2     1.801122
A:B 50.48571  4     1.632661

> # condition number
> # ev <- eigen(t(model.matrix(model)) %*% model.matrix(model))$values
> # sqrt(max(ev) / min(ev[ev > .Machine$double.eps]))
> kappa(model, exact=TRUE)
[1] 16.62590

> # effect coding -> lower multicollinearity
> options(contrasts=c(unordered="contr.sum",       ordered="contr.poly"))
> 
> # pairwise correlations of indicator variables
> model <- lm(Y ~ A + B + A:B)
> round(cor(model.matrix(model)[ , -1]), 3)
          A1     A2     B1     B2  A1:B1  A2:B1  A1:B2  A2:B2
A1     1.000  0.500 -0.101  0.000 -0.200 -0.041 -0.156 -0.082
A2     0.500  1.000  0.067  0.129 -0.040  0.000 -0.078  0.000
B1    -0.101  0.067  1.000  0.547 -0.051 -0.075  0.039 -0.068
B2     0.000  0.129  0.547  1.000  0.034 -0.076  0.075  0.007
A1:B1 -0.200 -0.040 -0.051  0.034  1.000  0.547  0.608  0.310
A2:B1 -0.041  0.000 -0.075 -0.076  0.547  1.000  0.287  0.497
A1:B2 -0.156 -0.078  0.039  0.075  0.608  0.287  1.000  0.481
A2:B2 -0.082  0.000 -0.068  0.007  0.310  0.497  0.481  1.000

> vif(model)                # variance inflation factors
        GVIF Df GVIF^(1/2Df)
A   1.124302  2     1.029724
B   1.098251  2     1.023706
A:B 1.124790  4     1.014808

> kappa(model, exact=TRUE)  # condition number
[1] 3.549330

Sum of squares type I

Type I sums of squares have the following properties:

  • They use sequential model comparisons that conform to the principle of marginality (higher order terms are entered after all corresponding lower order terms).
  • Due to the sequential nature of the model comparisons, they depend on the order of model terms.
  • The individual effect SS sum to the total effect SS.
  • They test for the equality of weighted marginal expected values (see pdf for formula).
    -> the hypotheses take the empirical cell sizes into account
  • They do not depend on the coding scheme used for categorical variables.
> ####-----------------------------------------------------------------
> # coding scheme for categorical variables doesn't matter - run with
> # dummy coding for unordered factors (factory default in R)
> options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
> 
> # or with effect coding for unordered factors (sum to zero)
> options(contrasts=c(unordered="contr.sum",       ordered="contr.poly"))
> 
> # result from anova(): SS type I
> anova(lm(Y ~ A + B + A:B))
Analysis of Variance Table
Response: Y
          Df  Sum Sq Mean Sq F value    Pr(>F)    
A          2  101.11   50.56  1.8102    0.1782    
B          2 1253.19  626.59 22.4357 4.711e-07 ***
A:B        4   14.19    3.55  0.1270    0.9717    
Residuals 36 1005.42   27.93

> ####-----------------------------------------------------------------
> # order of model terms matters
> anova(lm(Y ~ B + A + A:B))
Analysis of Variance Table
Response: Y
          Df  Sum Sq Mean Sq F value    Pr(>F)    
B          2 1115.82  557.91 19.9764 1.458e-06 ***
A          2  238.48  119.24  4.2695   0.02168 *  
B:A        4   14.19    3.55  0.1270   0.97170    
Residuals 36 1005.42   27.93

> ####-----------------------------------------------------------------
> # sequential model comparisons for SS type I for order Y ~ A+B+A:B
> # conform to marginality principle:
> # higher-order terms are entered after all corresponding lower terms
> #
> # main effect factor A
> # <constant> vs. A
> # main effect factor B: same comparison for SS type I, II
> #      A     vs. A + B
> # interaction: same comparison for SS type I, II and III
> #      A + B vs. A + B + A:B
> 
> cmpA <- anova(lm(Y ~ 1),     lm(Y ~ A))            # factor A
> cmpB <- anova(lm(Y ~ A),     lm(Y ~ A + B))        # factor B
> cmpI <- anova(lm(Y ~ A + B), lm(Y ~ A + B + A:B))  # interaction
> 
> cmpA[["Sum of Sq"]][2]  # SS factor A
[1] 101.1111

> cmpB[["Sum of Sq"]][2]  # SS factor B
[1] 1253.189

> cmpI[["Sum of Sq"]][2]  # SS interaction
[1] 14.18714

> ####-----------------------------------------------------------------
> # manual calculation of model comparisons for SS type I
> # get information from utility function
> resI <- getInf3x3()
> 
> # SS for each effect: sequential RSS-difference between model
> # and next more restrictive model
> (ssIa <- resI$rss1  - resI$rssA)           # factor A
[1] 101.1111

> (ssIb <- resI$rssA  - resI$rssAB)          # factor B
[1] 1253.189

> (ssIi <- resI$rssAB - resI$rssABI)         # interaction
[1] 14.18714

> # effect MS: RSS-difference divided by df-difference
> (msIa <- ssIa / (resI$df1  - resI$dfA))    # factor A
[1] 50.55556

> (msIb <- ssIb / (resI$dfA  - resI$dfAB))   # factor B
[1] 626.5945

> (msIi <- ssIi / (resI$dfAB - resI$dfABI))  # interaction
[1] 3.546785

> (msE  <- resI$rssABI / resI$dfABI)         # error MS full model
[1] 27.92844

> # F-values for each effect: effect MS / error MS full model
> msIa / msE  # factor A
[1] 1.810182

> msIb / msE  # factor B
[1] 22.43572

> msIi / msE  # interaction
[1] 0.1269955

> ####-----------------------------------------------------------------
> # individual effect SS sum to total effect SS:
> # sum of individual effect SS
> ssIa + ssIb + ssIi
[1] 1368.487

> # total effect SS from model comparison:
> # RSS model with 0 predictors - RSS full model
> resI$rss1 - resI$rssABI
[1] 1368.487

> ####-----------------------------------------------------------------
> # SS for each effect from geometric interpretation:
> # squared length of projection onto the orthogonal complement
> # of null hypothesis model subspace within next bigger model subspace
> crossprod((resI$P1  - resI$Pa)   %*% Y)  # factor A
         [,1]
[1,] 101.1111

> crossprod((resI$Pa  - resI$Pab)  %*% Y)  # factor B
         [,1]
[1,] 1253.189

> crossprod((resI$Pab - resI$Pabi) %*% Y)  # interaction
         [,1]
[1,] 14.18714

> ####-----------------------------------------------------------------
> # SS type I test hypotheses about main effect of factor A
> # (entered first) with marginal means that are calculated
> # with weighted cell means
> Nj <- tapply(Y, A, length)           # total row Ns
> Mj <- tapply(Y, A, mean)             # weighted row means
> gM <- mean(Y)                        # weighted grand mean
> 
> # SS type I for factor A (entered first)
> (ssIa <- sum(Nj * (Mj-gM)^2))
[1] 101.1111

> # SS type I for factor B (entered after A) and interaction SS
> # cannot be easily calculated without matrix math (like SS type II)
> # error SS: sum of squared Y - corresponding cell mean
> (ssE <- sum((Y - ave(Y, A, B, FUN=mean))^2))
[1] 1005.424

> # degrees of freedom
> N   <- length(Y)      # total N
> dfA <- P-1            # for SS factor A
> dfE <- N - (P*Q)      # for error SS
> 
> # mean squares for main effect of A and error
> (msIa <- ssIa / dfA)  # for factor A
[1] 50.55556

> (msE  <- ssE  / dfE)  # MS error
[1] 27.92844

> # F-value factor A
> msIa / msE
[1] 1.810182

Sum of squares type II

Type II sums of squares have the following properties:

  • They are based on model comparisons that conform to the principle of marginality (see the sketch after this list).
  • They do not depend on the order of model terms.
  • They test hypotheses about weighted cell expected values (see pdf for formula).
    -> the hypotheses take the empirical cell sizes into account
  • The individual effect SS do not sum to the total effect SS.
  • The result does not depend on the coding scheme for categorical variables.
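
Because these comparisons respect marginality, they can also be run directly as nested-model comparisons with anova(). A short sketch (the object names cmpIIa, cmpIIb, cmpIIi are arbitrary); the resulting SS match the manual RSS differences calculated below:

> cmpIIa <- anova(lm(Y ~ B),     lm(Y ~ A + B))        # factor A
> cmpIIb <- anova(lm(Y ~ A),     lm(Y ~ A + B))        # factor B
> cmpIIi <- anova(lm(Y ~ A + B), lm(Y ~ A + B + A:B))  # interaction
> cmpIIa[["Sum of Sq"]][2]  # SS type II for factor A
[1] 238.4826

Note that anova() tests each RSS difference against the error term of the larger model in each pair, whereas the type II table from Anova() tests against the error MS of the full model: the SS agree, but the printed F values for the main effects do not.
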
> ####-----------------------------------------------------------------
> # coding scheme for categorical variables doesn't matter - run with
> # dummy coding for unordered factors (factory default in R)
> options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
> 
> # or with effect coding for unordered factors (sum to zero)
> options(contrasts=c(unordered="contr.sum",       ordered="contr.poly"))
> 
> library(car)  # result from Anova() from package car
> Anova(lm(Y ~ A + B + A:B), type="II")
Anova Table (Type II tests)
Response: Y
           Sum Sq Df F value    Pr(>F)    
A          238.48  2  4.2695   0.02168 *  
B         1253.19  2 22.4357 4.711e-07 ***
A:B         14.19  4  0.1270   0.97170    
Residuals 1005.42 36

> ####-----------------------------------------------------------------
> # order of model terms doesn't matter
> Anova(lm(Y ~ B + A + A:B), type="II")
Anova Table (Type II tests)
Response: Y
           Sum Sq Df F value    Pr(>F)    
B         1253.19  2 22.4357 4.711e-07 ***
A          238.48  2  4.2695   0.02168 *  
A:B         14.19  4  0.1270   0.97170    
Residuals 1005.42 36

> ####-----------------------------------------------------------------
> # manual model comparisons for SS type II
> # get information from utility function
> resII <- getInf3x3()
> 
> # model comparisons for SS type II conform to marginality principle:
> # higher-order terms are entered after all corresponding lower terms
> #
> # main effect factor A
> #     B vs. A + B
> # main effect factor B: same comparison for SS type I, II
> # A     vs. A + B
> # interaction: same comparison for SS type I, II and III
> # A + B vs. A + B + A:B
> 
> # SS for main effects: RSS-difference between model with all main
> # effects and that model except the target effect
> (ssIIa <- resII$rssB  - resII$rssAB)           # factor A
[1] 238.4826

> (ssIIb <- resII$rssA  - resII$rssAB)           # factor B
[1] 1253.189

> (ssIIi <- resII$rssAB - resII$rssABI)          # interaction
[1] 14.18714

> # effect MS: RSS-difference divided by df-difference
> (msIIa <- ssIIa / (resII$dfB  - resII$dfAB))   # factor A
[1] 119.2413

> (msIIb <- ssIIb / (resII$dfA  - resII$dfAB))   # factor B
[1] 626.5945

> (msIIi <- ssIIi / (resII$dfAB - resII$dfABI))  # interaction
[1] 3.546785

> (msE   <- resII$rssABI / resII$dfABI)          # error MS full model
[1] 27.92844

> # F-values for each effect: effect MS / error MS full model
> msIIa / msE  # factor A
[1] 4.269529

> msIIb / msE  # factor B
[1] 22.43572

> msIIi / msE  # interaction
[1] 0.1269955

> ####-----------------------------------------------------------------
> # individual effect SS do not sum to total effect SS:
> # sum of individual effect SS
> ssIIa + ssIIb + ssIIi
[1] 1505.859

> # total effect SS from model comparison:
> # RSS model with 0 predictors - RSS full model
> resII$rss1 - resII$rssABI
[1] 1368.487

> ####-----------------------------------------------------------------
> # SS for each effect from geometric interpretation:
> # squared length of projection onto the orthogonal complement
> # of null hypothesis model subspace within model subspace
> # with all main effects (but not the interaction)
> crossprod((resII$Pb  - resII$Pab)  %*% Y)  # factor A
         [,1]
[1,] 238.4826

> crossprod((resII$Pa  - resII$Pab)  %*% Y)  # factor B
         [,1]
[1,] 1253.189

> crossprod((resII$Pab - resII$Pabi) %*% Y)  # interaction
         [,1]
[1,] 14.18714

Sum of squares type III

Type III sums of squares have the following properties:

  • They are based on model comparisons that violate the principle of marginality when testing main effects.
  • They do not depend on the order of model terms.
  • The individual effect SS do not sum to the total effect SS.
  • They test for the equality of unweighted marginal expected values (see pdf for formula).
    -> the hypotheses do not take empirical cell sizes into account
  • These hypotheses are only tested when using effect coding or another sum-to-zero coding scheme (e.g., Helmert) for the categorical variables.
> ####-----------------------------------------------------------------
> # coding scheme for categorical variables matters
> # run with dummy coding -> factory default in R, wrong results
> options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
> 
> # effect coding for unordered factors (sum to zero, correct results)
> options(contrasts=c(unordered="contr.sum",       ordered="contr.poly"))
> 
> library(car)  # result from Anova() from package car
> Anova(lm(Y ~ A + B + A:B), type="III")
Anova Table (Type III tests)
Response: Y
            Sum Sq Df   F value    Pr(>F)    
(Intercept) 121174  1 4338.7178 < 2.2e-16 ***
A              205  2    3.6658   0.03556 *  
B             1181  2   21.1452 8.447e-07 ***
A:B             14  4    0.1270   0.97170    
Residuals     1005 36

> ####-----------------------------------------------------------------
> # order of model terms doesn't matter
> Anova(lm(Y ~ B + A + A:B), type="III")
Anova Table (Type III tests)
Response: Y
            Sum Sq Df   F value    Pr(>F)    
(Intercept) 121174  1 4338.7178 < 2.2e-16 ***
B             1181  2   21.1452 8.447e-07 ***
A              205  2    3.6658   0.03556 *  
A:B             14  4    0.1270   0.97170    
Residuals     1005 36

> ####-----------------------------------------------------------------
> # model comparisons for SS type III
> # comparisons for main effects violate principle of marginality:
> # in the restricted model, interaction is included, but not one
> # of the corresponding main effects
> #
> # main effect factor A
> #     B + A:B vs. A + B + A:B
> # main effect factor B
> # A     + A:B vs. A + B + A:B
> # interaction: same comparison for SS type I, II and III
> # A + B       vs. A + B + A:B
> 
> # these comparisons cannot be done using anova() with factor terms
> # due to the violation of the marginality principle
> # drop1() drops each term in turn anyway
> # sum-to-zero coding is required for correct results
> # (see the sketch below for the same comparison via explicit indicators)
> (dropRes <- drop1(lm(Y ~ A + B + A:B), ~ ., test="F"))
Single term deletions
Model:
Y ~ A + B + A:B
       Df Sum of Sq    RSS    AIC F value     Pr(F)    
<none>      1005.4  157.79                      
A       2    204.76 1210.2 162.13  3.6658   0.03556 *  
B       2   1181.11 2186.5 188.75 21.1452 8.447e-07 ***
A:B     4     14.19 1019.6 150.42  0.1270   0.97170
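
The marginality-violating comparisons can, however, be specified through the numeric indicator variables instead of factor terms, mirroring the models built inside getInf3x3(). A sketch for factor A (sum-to-zero coding assumed active; X, modR, and modF are illustrative names); the resulting SS (204.76) and F value (3.6658) match the Anova() output above:

> X    <- model.matrix(lm(Y ~ A + B + A:B))  # full design matrix
> modR <- lm(Y ~ X[ , 4:9])                  # only B and A:B indicators
> modF <- lm(Y ~ X[ , 2:9])                  # full model A, B, A:B
> anova(modR, modF)                          # SS type III for factor A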

> ####-----------------------------------------------------------------
> # manual model comparisons leading to SS type III
> # get information from utility function
> resIII <- getInf3x3()
> 
> # effect SS: RSS-difference between model with all
> # effects and that model except the target effect
> (ssIIIa <- resIII$rssBI - resIII$rssABI)           # factor A
[1] 204.7617

> (ssIIIb <- resIII$rssAI - resIII$rssABI)           # factor B
[1] 1181.105

> (ssIIIi <- resIII$rssAB - resIII$rssABI)           # interaction
[1] 14.18714

> # effect MS: RSS-difference divided by df-difference
> (msIIIa <- ssIIIa / (resIII$dfBI - resIII$dfABI))  # factor A
[1] 102.3809

> (msIIIb <- ssIIIb / (resIII$dfAI - resIII$dfABI))  # factor B
[1] 590.5525

> (msIIIi <- ssIIIi / (resIII$dfAB - resIII$dfABI))  # interaction
[1] 3.546785

> (msE    <- resIII$rssABI / resIII$dfABI)           # error MS full model
[1] 27.92844

> # F-values for each effect: effect MS / error MS full model
> msIIIa / msE  # factor A
[1] 3.665829

> msIIIb / msE  # factor B
[1] 21.14520

> msIIIi / msE  # interaction
[1] 0.1269955

> # individual effect SS do not sum to total effect SS:
> # sum of individual effect SS
> ssIIIa + ssIIIb + ssIIIi
[1] 1400.054

> # total effect SS from model comparison:
> # RSS model with 0 predictors - RSS full model
> resIII$rss1 - resIII$rssABI
[1] 1368.487

> ####-----------------------------------------------------------------
> # SS for each effect from geometric interpretation:
> # squared length of projection onto the orthogonal complement
> # of null hypothesis model subspace within full model subspace
> crossprod((resIII$Pbi - resIII$Pabi) %*% Y)  # factor A
         [,1]
[1,] 204.7617

> crossprod((resIII$Pai - resIII$Pabi) %*% Y)  # factor B
         [,1]
[1,] 1181.105

> crossprod((resIII$Pab - resIII$Pabi) %*% Y)  # interaction
         [,1]
[1,] 14.18714
 
> ####-----------------------------------------------------------------
> # coding scheme for categorical variables matters for models
> # that violate marginality principle
> 
> # orthogonal projections onto subspace given by model's design matrix
> # when using dummy coding
> options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
> resD  <- getInf3x3()     # projections from utility function
> PaD   <- resD$Pa         # factor A
> PbD   <- resD$Pb         # factor B
> PabD  <- resD$Pab        # factors A, B
> PaiD  <- resD$Pai        # factor A, A:B -> violates marginality
> PbiD  <- resD$Pbi        # factor B, A:B -> violates marginality
> PabiD <- resD$Pabi       # full model A, B, A:B
> 
> # orthogonal projections onto subspace given by model's design matrix
> # when using effect coding (sum to zero)
> options(contrasts=c(unordered="contr.sum",       ordered="contr.poly"))
> resE  <- getInf3x3()     # projections from utility function
> PaE   <- resE$Pa         # factor A
> PbE   <- resE$Pb         # factor B
> PabE  <- resE$Pab        # factors A, B
> PaiE  <- resE$Pai        # factor A, A:B -> violates marginality
> PbiE  <- resE$Pbi        # factor B, A:B -> violates marginality
> PabiE <- resE$Pabi       # full model A, B, A:B
> 
> # check equality of projections from different contrast coding schemes
> # -> projections from models that violate marginality are not equal
> all.equal(PaD,   PaE)
[1] TRUE

> all.equal(PbD,   PbE)
[1] TRUE

> all.equal(PaiD,  PaiE)
[1] "Mean relative difference: 1.404650"

> all.equal(PbiD,  PbiE)
[1] "Mean relative difference: 1.354575"

> all.equal(PabD,  PabE)
[1] TRUE

> all.equal(PabiD, PabiE)
[1] TRUE

> # consequence: wrong results from Anova() and other methods
> # for main effects when dummy coding is used
> options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
> Anova(lm(Y ~ A + B + A:B), type="III")
Anova Table (Type III tests)
Response: Y
            Sum Sq Df  F value    Pr(>F)    
(Intercept) 5985.3  1 214.3096 < 2.2e-16 ***
A             31.4  2   0.5615  0.575263    
B            360.2  2   6.4488  0.004039 ** 
A:B           14.2  4   0.1270  0.971701    
Residuals   1005.4 36

> ####-----------------------------------------------------------------
> # manual calculation of SS type III for main effects based
> # on unweighted cell means
> # equals above results when sum-to-zero contrasts were used
> Njk <- tapply(Y, list(A, B), length)     # cell sizes
> Mjk <- tapply(Y, list(A, B), mean)       # cell means
> Mj  <- rowMeans(Mjk)                     # unweighted row means
> Mk  <- colMeans(Mjk)                     # unweighted column means
> 
> # effective marginal N: harmonic mean of cell sizes
> effNj <- P / rowSums(1/Njk)              # harmonic mean of row cell sizes
> effNk <- Q / colSums(1/Njk)              # harmonic mean of column cell sizes
>
> # grand mean based on corresponding marginal means
> # weighted with effective marginal N
> gM1 <- sum(effNj*Mj) / sum(effNj)        # based on row means
> gM2 <- sum(effNk*Mk) / sum(effNk)        # based on column means
> 
> # SS type III for factors A and B
> (ssIIIa <- P * sum(effNj * (Mj-gM1)^2))  # factor A
[1] 204.7617

> (ssIIIb <- Q * sum(effNk * (Mk-gM2)^2))  # factor B
[1] 1181.105

> # interaction SS cannot be easily calculated without matrix math
> # error SS: sum of squared Y - corresponding cell mean
> (ssE    <- sum((Y - ave(Y, A, B, FUN=mean))^2))
[1] 1005.424

> # degrees of freedom
> dfA <- P-1                # SS factor A
> dfB <- Q-1                # SS factor B
> dfE <- N - (P*Q)          # error SS
> 
> # MS type III for effects and error
> (msIIIa <- ssIIIa / dfA)  # factor A
[1] 102.3809

> (msIIIb <- ssIIIb / dfB)  # factor B
[1] 590.5525

> (msE    <- ssE    / dfE)  # error
[1] 27.92844

> # F-values
> msIIIa / msE              # factor A
[1] 3.665829

> msIIIb / msE              # factor B
[1] 21.14520

http://www.dwoll.de/r/ssTypes.php
Last modified: Tue, 09 Mar 2021 11:39:02 +0100