# Exploratory factor analysis for ordinal categorical data

## Factor analysis

### Simulate categorical data based on continuous variables

First, let’s simulate 200 observations of 6 variables that stem from 2 orthogonal factors. I’ll take a couple of intermediate steps and start with multivariate normal continuous data that I later dichotomize. That way, we can compare Pearson correlations with polychoric correlations, and compare the factor loadings from the continuous data with those from the dichotomous data and with the true loadings.

Now simulate the actual data from the model $$x = \Lambda f + e$$, with $$x$$ being the observed variable values of a person, $$\Lambda$$ the true loadings matrix, $$f$$ the latent factor score, and $$e$$ iid, mean 0, normal errors.
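A minimal sketch of this simulation could look as follows. The true loadings in `Lambda` and all object names are illustrative assumptions, not values from the original analysis:

```r
set.seed(123)
N <- 200                                  # number of observations
P <- 6                                    # number of variables
Q <- 2                                    # number of orthogonal factors

Lambda <- matrix(c( 0.7,  0.0,            # assumed true loadings matrix (P x Q)
                    0.8,  0.0,
                    0.0,  0.9,
                    0.0,  0.6,
                    0.5,  0.5,
                   -0.6,  0.0), nrow = P, byrow = TRUE)

## orthogonal standard-normal factor scores f and iid normal errors e
f <- matrix(rnorm(N * Q), nrow = N)
e <- matrix(rnorm(N * P, sd = sqrt(0.3)), nrow = N)

## observed data from the model x = Lambda f + e
X <- f %*% t(Lambda) + e
colnames(X) <- paste0("X", 1:P)
```

Because the factors are orthogonal with unit variance, drawing the factor scores as independent standard normals is equivalent to a multivariate normal draw with identity covariance.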

Now let’s categorize the data. We’ll keep the data in two formats: as a data frame with ordered factors, and as a numeric matrix. hetcor() from package polycor gives us the polychoric correlation matrix we’ll later use for the FA.
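One way to do this is a median split per variable (a design choice here, not prescribed by the method). `X` is assumed to hold the simulated continuous data matrix from the earlier step; `ordNum`, `ordDf`, and `pc` are assumed names:

```r
library(polycor)                          # provides hetcor()

## dichotomize each variable at its empirical median
ordNum <- apply(X, 2, function(v) as.numeric(v > median(v)))

## the same data as a data frame of ordered factors
ordDf  <- data.frame(lapply(as.data.frame(ordNum), ordered))

pc <- hetcor(ordDf, ML = TRUE)            # polychoric correlation matrix
round(pc$correlations, 3)
```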

### Factor analysis for ordered categorical data

Use the polychoric correlation matrix to do a regular FA.
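A call along these lines fits the two-factor model; `pc$correlations` and `N` are assumed to hold the polychoric correlation matrix and sample size from the earlier steps, and `faPC` is an assumed name:

```r
library(psych)

## factor analysis of the polychoric correlation matrix; n.obs is needed
## for fit statistics when a correlation matrix (not raw data) is supplied
faPC <- fa(r = pc$correlations, nfactors = 2, n.obs = N, rotate = "varimax")
faPC$loadings
```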

```
     MR1    MR2
X1  0.558 -0.225
X2  0.605
X3 -0.161  0.845
X4 -0.192  0.306
X5  0.341  0.527
X6 -0.610

                 MR1   MR2
Proportion Var 0.205 0.189
Cumulative Var 0.205 0.394
```

It is possible to skip the step of calculating the polychoric correlation matrix, and directly use fa.poly() from package psych, which does the same thing in the end. This function accepts the raw dichotomous data as a numeric matrix.
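The corresponding call might look like this, where `ordNum` is assumed to be the numeric matrix of dichotomized data from above and `faPCpoly` is an assumed name:

```r
## fa.poly() estimates the polychoric correlations internally
faPCpoly <- fa.poly(ordNum, nfactors = 2, rotate = "varimax")
faPCpoly$fa$loadings                      # the fa() result sits in the $fa component
```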

```
     MR1    MR2
X1  0.559 -0.224
X2  0.603
X3 -0.162  0.844
X4 -0.192  0.306
X5  0.340  0.527
X6 -0.612

                 MR1   MR2
Proportion Var 0.205 0.189
Cumulative Var 0.205 0.394
```

## Factor scores

For factor scores, look at package ltm, which has a factor.scores() function specifically for polytomous outcome data. An example is provided on this page.
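A hedged sketch of how this could look: since the simulated items here are dichotomous, one option is to fit a two-parameter logistic model with ltm() and score that. `ordDf` is assumed to hold the ordered-factor data frame from above; all object names are illustrative:

```r
library(ltm)

fitLtm <- ltm(ordDf ~ z1)                 # 2PL model with one latent trait z1
fs     <- factor.scores(fitLtm)           # scores per observed response pattern
head(fs$score.dat)                        # response patterns with ability estimates
```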

You can visualize the loadings from the factor analysis using factor.plot() and fa.diagram(), both from package psych. For some reason, factor.plot() accepts only the $fa component of the result from fa.poly(), not the full object.

## Determine number of factors

Parallel analysis and a “very simple structure” analysis provide help in selecting the number of factors. Again, package psych has the required functions. vss() takes the polychoric correlation matrix as an argument. fa.parallel.poly() failed here:

```
Call: fa.parallel.poly(x = ordNum)
Error in data.frame(fa = x$fa.values, fa.sam = x$fa.simr, fa.sim = x$fa.sim, :
  arguments imply differing number of rows: 6, 0
```
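The VSS call itself can be made like this, with `pc$correlations` assumed to hold the polychoric correlation matrix and `N` the sample size from the earlier steps:

```r
## very simple structure analysis on the polychoric correlation matrix
vss(pc$correlations, rotate = "varimax", n.obs = N)
```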

```
Very Simple Structure
Call: vss(x = pc$correlations, rotate = "varimax", n.obs = N)
VSS complexity 1 achieves a maximimum of 0.62  with  4  factors
VSS complexity 2 achieves a maximimum of 0.74  with  5  factors

The Velicer MAP achieves a minimum of NA  with  1  factors
BIC achieves a minimum of  NA  with  2  factors
Sample Size adjusted BIC achieves a minimum of  NA  with  2  factors

Statistics by number of factors
  vss1 vss2   map dof   chisq    prob sqresid  fit RMSEA BIC SABIC complex  eChisq    SRMR eCRMS eBIC
1 0.41 0.00 0.076   9 7.8e+01 4.4e-13     4.6 0.41 0.195  30  58.6     1.0 1.3e+02 1.5e-01 0.188   79
2 0.58 0.69 0.107   4 8.2e+00 8.4e-02     2.4 0.69 0.072 -13  -0.3     1.3 6.7e+00 3.3e-02 0.064  -15
3 0.60 0.73 0.223   0 4.7e-01      NA     1.9 0.76    NA  NA    NA     1.4 3.1e-01 7.2e-03    NA   NA
4 0.62 0.73 0.400  -3 8.3e-09      NA     1.6 0.79    NA  NA    NA     1.5 7.4e-09 1.1e-06    NA   NA
5 0.62 0.74 1.000  -5 0.0e+00      NA     1.6 0.79    NA  NA    NA     1.6 7.4e-19 1.1e-11    NA   NA
6 0.62 0.74    NA  -6 0.0e+00      NA     1.6 0.79    NA  NA    NA     1.6 7.4e-19 1.1e-11    NA   NA
```

## Detach (automatically) loaded packages (if possible)

(based on an answer I wrote on CrossValidated)
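One possible sketch, assuming all attached non-base packages should be detached; unloading can still fail when other loaded packages depend on them:

```r
## names of all currently attached non-base packages
pkgNames <- names(sessionInfo()$otherPkgs)

if (length(pkgNames) > 0) {
    invisible(lapply(paste0("package:", pkgNames), detach,
                     character.only = TRUE, unload = TRUE))
}
```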