Exploratory factor analysis

Install required packages

GPArotation, mvtnorm, psych

wants <- c("GPArotation", "mvtnorm", "psych")
has   <- wants %in% rownames(installed.packages())
if(any(!has)) install.packages(wants[!has])

Factor analysis

Simulate data

True matrix of loadings

N <- 200
P <- 6
Q <- 2
(Lambda <- matrix(c(0.7,-0.4, 0.8,0, -0.2,0.9, -0.3,0.4, 0.3,0.7, -0.8,0.1),
                  nrow=P, ncol=Q, byrow=TRUE))
     [,1] [,2]
[1,]  0.7 -0.4
[2,]  0.8  0.0
[3,] -0.2  0.9
[4,] -0.3  0.4
[5,]  0.3  0.7
[6,] -0.8  0.1

Non correlated factors

set.seed(123)
library(mvtnorm)
Kf <- diag(Q)
mu <- c(5, 15)
FF <- rmvnorm(N, mean=mu,        sigma=Kf)
E  <- rmvnorm(N, mean=rep(0, P), sigma=diag(P))
X  <- FF %*% t(Lambda) + E

Using factanal()

(fa <- factanal(X, factors=2, scores="regression"))

Call:
factanal(x = X, factors = 2, scores = "regression")

Uniquenesses:
[1] 0.616 0.638 0.225 0.885 0.658 0.608

Loadings:
     Factor1 Factor2
[1,]  0.570  -0.243 
[2,]  0.601         
[3,] -0.217   0.853 
[4,] -0.197   0.276 
[5,]  0.327   0.486 
[6,] -0.622         

               Factor1 Factor2
SS loadings      1.267   1.104
Proportion Var   0.211   0.184
Cumulative Var   0.211   0.395

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 1.43 on 4 degrees of freedom.
The p-value is 0.84 

Using fa() from package psych with rotation

Rotation uses package GPArotation

library(psych)
corMat <- cor(X)
(faPC  <- fa(r=corMat, nfactors=2, n.obs=N, rotate="varimax"))
Factor Analysis using method =  minres
Call: fa(r = corMat, nfactors = 2, n.obs = N, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
    MR2   MR1   h2   u2
1  0.57 -0.24 0.38 0.62
2  0.60  0.03 0.36 0.64
3 -0.22  0.85 0.77 0.23
4 -0.20  0.28 0.12 0.88
5  0.33  0.49 0.34 0.66
6 -0.62  0.07 0.39 0.61

                       MR2  MR1
SS loadings           1.27 1.10
Proportion Var        0.21 0.18
Cumulative Var        0.21 0.40
Proportion Explained  0.53 0.47
Cumulative Proportion 0.53 1.00

Test of the hypothesis that 2 factors are sufficient.

The degrees of freedom for the null model are  15  and the objective function was  0.82 with Chi Square of  160.4
The degrees of freedom for the model are 4  and the objective function was  0.01 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.04 

The harmonic number of observations is  200 with the empirical chi square  1.37  with prob <  0.85 
The total number of observations was  200  with MLE Chi Square =  1.43  with prob <  0.84 

Tucker Lewis Index of factoring reliability =  1.067
RMSEA index =  0  and the 90 % confidence intervals are  NA 0.061
BIC =  -19.77
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                MR2  MR1
Correlation of scores with factors             0.81 0.88
Multiple R square of scores with factors       0.66 0.78
Minimum correlation of possible factor scores  0.33 0.56

Factor scores

bartlett <- fa$scores
head(bartlett)
      Factor1 Factor2
[1,] -0.14693 -0.4087
[2,]  1.12205 -0.1700
[3,]  0.06507  0.7207
[4,] -0.11232  0.1412
[5,] -0.24712  1.4317
[6,]  1.09157  0.3979
anderson <- factor.scores(x=X, f=faPC, method="Anderson")
head(anderson$scores)
         MR2     MR1
[1,] -0.2025 -0.4724
[2,]  1.3725 -0.1340
[3,]  0.1182  0.8227
[4,] -0.1308  0.1545
[5,] -0.2286  1.6144
[6,]  1.3650  0.5099

Visualize loadings

factor.plot(faPC, cut=0.5)
plot of chunk rerMultFA01

plot of chunk rerMultFA01

fa.diagram(faPC)
plot of chunk rerMultFA01

plot of chunk rerMultFA01

Determine number of factors

Parallel analysis and a "very simple structure" analysis provide help in selecting the number of factors. Again, package psych has the required functions. vss() takes the polychoric correlation matrix as an argument.

fa.parallel(X)                     # parallel analysis
plot of chunk rerMultFA02

plot of chunk rerMultFA02

Parallel analysis suggests that the number of factors =  2  and the number of components =  2 
vss(X, n.obs=N, rotate="varimax")  # very simple structure
plot of chunk rerMultFA02

plot of chunk rerMultFA02


Very Simple Structure
Call: VSS(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
    n.obs = n.obs, plot = plot, title = title)
VSS complexity 1 achieves a maximimum of 0.59  with  3  factors
VSS complexity 2 achieves a maximimum of 0.72  with  3  factors

The Velicer MAP criterion achieves a minimum of NA  with  1  factors
 
Velicer MAP
[1] 0.07 0.10 0.25 0.45 1.00   NA

Very Simple Structure Complexity 1
[1] 0.45 0.57 0.59 0.57 0.52 0.53

Very Simple Structure Complexity 2
[1] 0.00 0.69 0.72 0.72 0.67 0.67

Useful packages

For confirmatory factor analysis (CFA), see packages sem, OpenMx, and lavaan which implement structural equation models. More packages can be found in CRAN task view Psychometric Models.

Detach (automatically) loaded packages (if possible)

try(detach(package:psych))
try(detach(package:GPArotation))
try(detach(package:mvtnorm))

Get the article source from GitHub

R markdown - markdown - R code - all posts