Binary logistic regression

TODO

  • link to regressionOrdinal, regressionMultinom, regressionDiag for outliers, collinearity, crossvalidation

Install required packages

rms, DescTools, pROC
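
A typical check-and-install sketch (the idiom itself is an assumption, not shown in the original output):

wants <- c("rms", "DescTools", "pROC")
has   <- wants %in% rownames(installed.packages())
if(any(!has)) { install.packages(wants[!has]) }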

Descriptive model fit

Simulate data
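
The simulation itself is not reproduced above. A minimal sketch that yields data with the same structure - the variable names DVpre, IV, and postFac come from the model output below, while the group labels, sample sizes, effect sizes, and the median split are assumptions:

set.seed(123)
Njk    <- 10                                        # observations per group (assumed)
IV     <- factor(rep(c("Control", "Placebo", "WL"), each=Njk))
DVpre  <- rnorm(3*Njk, mean=20, sd=4)               # pre-treatment score
DVpost <- DVpre + rnorm(3*Njk, mean=c(2, 4, 5)[IV], sd=3)
postFac  <- cut(DVpost, breaks=c(-Inf, median(DVpost), Inf),
                labels=c("lo", "hi"))               # dichotomized post-treatment score
dfAncova <- data.frame(IV, DVpre, DVpost, postFac)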

plot of chunk rerRegressionLogistic01

Fit the model
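
Fit the logistic regression with glm(); the fitted object is called glmFit in the remaining code sketches (the object name is an assumption):

glmFit <- glm(postFac ~ DVpre + IV,
              family=binomial(link="logit"), data=dfAncova)
glmFit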


Call:  glm(formula = postFac ~ DVpre + IV, family = binomial(link = "logit"), 
    data = dfAncova)

Coefficients:
(Intercept)        DVpre    IVPlacebo         IVWL  
    -8.4230       0.4258       1.7306       1.2027  

Degrees of Freedom: 29 Total (i.e. Null);  26 Residual
Null Deviance:      41.46 
Residual Deviance: 24.41    AIC: 32.41

Odds ratios
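
The odds ratios below are the exponentiated coefficients:

exp(coef(glmFit))         # odds ratios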

 (Intercept)        DVpre    IVPlacebo         IVWL 
0.0002197532 1.5308001795 5.6440022784 3.3291484767 

Profile likelihood based confidence intervals for odds ratios
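
confint() on a glm object profiles the likelihood; exponentiating maps the intervals to the odds-ratio scale:

exp(confint(glmFit))      # profile likelihood CIs for the odds ratios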

                   2.5 %     97.5 %
(Intercept) 1.488482e-07  0.0251596
DVpre       1.193766e+00  2.2446549
IVPlacebo   5.343091e-01 95.1942030
IVWL        2.916673e-01 52.2883653

Fit the model based on a matrix of counts
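
With aggregated data, the response can be a two-column matrix of success and failure counts. A sketch that generates such data - all names and simulation settings here are assumptions, only the glm() call is taken from the output below:

N      <- 100
x1     <- rnorm(N, 100, 15)
x2     <- rnorm(N,  10,  3)
total  <- sample(40:60, N, replace=TRUE)       # number of trials per observation
hits   <- rbinom(N, size=total, prob=0.4)      # number of successes
hitMat <- cbind(hits, misses=total-hits)       # matrix of success / failure counts
glm(hitMat ~ x1 + x2, family=binomial(link="logit"))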


Call:  glm(formula = hitMat ~ x1 + x2, family = binomial(link = "logit"))

Coefficients:
(Intercept)           x1           x2  
  -0.437682     0.001841    -0.012805  

Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
Null Deviance:      107.6 
Residual Deviance: 105.1    AIC: 543.5

Fit the model based on relative frequencies
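
Equivalently, fit to the relative frequency of hits and pass the number of trials as weights (continuing the sketch above); the coefficient estimates match the counts-matrix fit:

relHits <- hits / total                        # relative frequency of hits
glm(relHits ~ x1 + x2, weights=total, family=binomial(link="logit"))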


Call:  glm(formula = relHits ~ x1 + x2, family = binomial(link = "logit"), 
    weights = total)

Coefficients:
(Intercept)           x1           x2  
  -0.437682     0.001841    -0.012805  

Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
Null Deviance:      107.6 
Residual Deviance: 105.1    AIC: 543.5

Fitted logits and probabilities
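
A sketch of how to obtain the fitted logits, the fitted probabilities (first six shown below), their mean, and the observed relative frequencies of postFac:

logitHat <- predict(glmFit, type="link")       # fitted logits
probHat  <- fitted(glmFit)                     # fitted probabilities
head(probHat)
mean(probHat)                                  # mean predicted probability = observed proportion of hi
with(dfAncova, prop.table(table(postFac)))     # observed relative frequencies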

plot of chunk rerRegressionLogistic02
         1          2          3          4          5          6 
0.31891231 0.16653918 0.16653918 0.11545968 0.07856997 0.52318493 
[1] 0.4666667
postFac
       lo        hi 
0.5333333 0.4666667 

Assess model fit

Classification table
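
Dichotomize the fitted probabilities and cross-tabulate against the observed outcome (the 0.5 threshold and the object names are assumptions):

thresh <- 0.5
facHat <- cut(fitted(glmFit), breaks=c(-Inf, thresh, Inf), labels=c("lo", "hi"))
cTab   <- table(postFac=dfAncova$postFac, facHat=facHat)
addmargins(cTab)                               # classification table with margins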

       facHat
postFac lo hi Sum
    lo  12  4  16
    hi   4 10  14
    Sum 16 14  30

Correct classification rate
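
sum(diag(cTab)) / sum(cTab)                    # proportion of correct classifications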

[1] 0.7333333

Log-likelihood, AUC, Somers’ \(D_{xy}\), Nagelkerke’s pseudo \(R^{2}\)

Deviance, log-likelihood and AIC
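
These values come directly from the standard extractor functions:

deviance(glmFit)          # residual deviance
logLik(glmFit)            # log-likelihood
AIC(glmFit)               # Akaike information criterion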

[1] 24.40857
'log Lik.' -12.20428 (df=4)
[1] 32.40857

Nagelkerke’s pseudo-\(R^{2}\) (R2), area under the ROC curve (AUC = ‘C’), Somers’ \(D_{xy}\) (Dxy), Goodman & Kruskal’s \(\gamma\) (Gamma), Kendall’s \(\tau\) (Tau-a) using package rms
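
lrm() from package rms reports these indices directly; the output below corresponds to a call like:

library(rms)
lrm(postFac ~ DVpre + IV, data=dfAncova)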

Logistic Regression Model
 
 lrm(formula = postFac ~ DVpre + IV, data = dfAncova)
 
                      Model Likelihood     Discrimination    Rank Discrim.    
                         Ratio Test           Indexes           Indexes       
 Obs            30    LR chi2     17.05    R2       0.579    C       0.900    
  lo            16    d.f.            3    g        2.686    Dxy     0.799    
  hi            14    Pr(> chi2) 0.0007    gr      14.672    gamma   0.803    
 max |deriv| 2e-06                         gp       0.404    tau-a   0.411    
                                           Brier    0.139                     
 
            Coef    S.E.   Wald Z Pr(>|Z|)
 Intercept  -8.4230 2.9502 -2.86  0.0043  
 DVpre       0.4258 0.1553  2.74  0.0061  
 IV=Placebo  1.7306 1.2733  1.36  0.1741  
 IV=WL       1.2027 1.2735  0.94  0.3450  
 

Plot the ROC curve using package pROC - this also gives the AUC
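
The curve and the AUC in the output below come from a call along these lines (rocRes is an assumed object name):

library(pROC)
rocRes <- roc(response=glmFit$y, predictor=glmFit$fitted.values,
              ci=TRUE, plot=TRUE, main="ROC curve",
              xlab="1-Specificity (TN / (TN+FP))",
              ylab="Sensitivity (TP / (TP+FN))")
rocRes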


Call:
roc.default(response = glmFit$y, predictor = glmFit$fitted.values,
    ci = TRUE, plot = TRUE, main = "ROC curve",
    xlab = "1-Specificity (TN / (TN+FP))", ylab = "Sensitivity (TP / (TP+FN))")

Data: glmFit$fitted.values in 16 controls (glmFit$y 0) < 14 cases (glmFit$y 1).
Area under the curve: 0.8996
95% CI: 0.7914-1 (DeLong)
plot of chunk unnamed-chunk-13

McFadden, Cox & Snell and Nagelkerke pseudo \(R^{2}\)
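
PseudoR2() from package DescTools computes several pseudo-\(R^{2}\) measures at once (the which= selection is an assumption matching the output below):

library(DescTools)
PseudoR2(glmFit, which=c("McFadden", "Nagelkerke", "CoxSnell"))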

  McFadden Nagelkerke   CoxSnell 
 0.4112090  0.5788220  0.4334714 

Crossvalidation

cv.glm() function from package boot, see crossvalidation
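
A minimal leave-one-out sketch with cv.glm() (the misclassification cost function is an assumption):

library(boot)
cost  <- function(y, pHat) { mean(abs(y - pHat) > 0.5) }   # misclassification rate
cvRes <- cv.glm(data=dfAncova, glmfit=glmFit, cost=cost)
cvRes$delta               # raw and adjusted prediction error estimate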

Apply model to new data
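
predict() with type="response" returns predicted probabilities for new observations. The data frame dfNew and its values here are placeholders; the three probabilities below come from the original (unshown) new data:

dfNew <- data.frame(DVpre=c(24, 21, 12),                      # placeholder values
                    IV=factor(c("Placebo", "WL", "Control"),  # placeholder values
                              levels=levels(dfAncova$IV)))
predict(glmFit, newdata=dfNew, type="response")               # predicted P(postFac = hi)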

         1          2          3 
0.82301174 0.82090641 0.04474881 

Coefficient tests and overall model test

Individual coefficient tests

Wald tests for parameters
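
summary() of the glm fit gives the Wald z tests for the individual coefficients reproduced below:

summary(glmFit)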


Call:
glm(formula = postFac ~ DVpre + IV, family = binomial(link = "logit"), 
    data = dfAncova)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9865  -0.5629  -0.2372   0.4660   1.5455  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -8.4230     2.9502  -2.855   0.0043 **
DVpre         0.4258     0.1553   2.742   0.0061 **
IVPlacebo     1.7306     1.2733   1.359   0.1741   
IVWL          1.2027     1.2735   0.944   0.3450   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.455  on 29  degrees of freedom
Residual deviance: 24.409  on 26  degrees of freedom
AIC: 32.409

Number of Fisher Scoring iterations: 5

Or see lrm() above

Model comparisons - likelihood-ratio tests
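
The overall model test and the single-term deletions below can be reproduced along these lines (glm0 is an assumed name for the null model):

glm0 <- update(glmFit, . ~ 1)            # intercept-only null model
anova(glm0, glmFit, test="Chisq")        # overall likelihood-ratio test
drop1(glmFit, test="Chisq")              # LR test for each term separately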

Analysis of Deviance Table

Model 1: postFac ~ 1
Model 2: postFac ~ DVpre + IV
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1        29     41.455                          
2        26     24.409  3   17.047 0.0006912 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Single term deletions

Model:
postFac ~ DVpre + IV
       Df Deviance    AIC     LRT  Pr(>Chi)    
<none>      24.409 32.409                      
DVpre   1   39.540 45.540 15.1319 0.0001003 ***
IV      2   26.566 30.566  2.1572 0.3400666    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or see lrm() above

Model comparisons for testing IV
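
The test for IV compares the model containing only DVpre against the full model (glmPre is an assumed name):

glmPre <- update(glmFit, . ~ . - IV)     # model without IV
anova(glmPre, glmFit, test="Chisq")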

Analysis of Deviance Table

Model 1: postFac ~ DVpre
Model 2: postFac ~ DVpre + IV
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        28     26.566                     
2        26     24.409  2   2.1572   0.3401

Model comparisons for testing DVpre
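
The test for DVpre compares the intercept-only model against the model containing only DVpre (reusing glm0 and glmPre from above):

anova(glm0, glmPre, test="Chisq")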

Analysis of Deviance Table

Model 1: postFac ~ 1
Model 2: postFac ~ DVpre
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)    
1        29     41.455                         
2        28     26.566  1    14.89 0.000114 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Further resources

For penalized logistic regression, see packages logistf (using Firth’s penalized likelihood) and glmnet. An example using glmnet for linear regression is in regressionRobPen.

Detach (automatically) loaded packages (if possible)

Get the article source from GitHub

R markdown - markdown - R code - all posts