Math 430: Lecture 8b

Logistic regression part 2

Professor Catalina Medina

Logistic regression

\[\log\left(\frac{E[Y_i | X_{1i}, ..., X_{pi}]}{1 - E[Y_i | X_{1i}, ..., X_{pi}]}\right) = \beta_0 + \beta_1 X_{1i} + ... + \beta_p X_{pi}\]

We often let \(\pi_i = E[Y_i | X_{1i}, ..., X_{pi}]\) so our model is written

\[\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_{1i} + ... + \beta_p X_{pi}\]

Interpreting coefficients

\(\beta_0\) is the log odds when all \(X_i\)’s are zero.

\(\beta_1\) is the log odds ratio associated with a one unit increase in \(X_1\)

  • \(e^{\beta_1}\) is the odds ratio associated with a one unit increase in \(X_1\)
  • \(e^{\beta_1}\) is the multiplicative change (factor change) in odds associated with a one unit increase in \(X_1\)
  • For \(e^{\beta_1} > 1\), \(100 * (e^{\beta_1} - 1)\)% is the increase in odds associated with a one unit increase in \(X_1\). For \(e^{\beta_1} < 1\), we have \(100 * (1 - e^{\beta_1})\)% decrease in odds associated with a one unit increase in \(X_1\)

… with all other \(X_i\)’s held constant.
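As a quick numeric sketch of these interpretations (the coefficient value here is hypothetical, not from any fitted model):

```r
# Hypothetical slope on the log odds scale
beta_1 <- 0.94

exp(beta_1)              # odds ratio for a one unit increase in X1: about 2.56
100 * (exp(beta_1) - 1)  # about a 156% increase in odds
```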

Inference

In our class we will focus on two methods for hypothesis tests and confidence intervals

  • Wald
  • Likelihood ratio

Comparing inference methods

Wald - For a single predictor

  • Pro: Easy to calculate
  • Pro: Only requires fitting one model
  • Con: Unreliable for small sample sizes
  • Con: Confidence intervals may contain invalid values outside [0, 1]

Likelihood ratio - Compares nested models (analogous to the partial F-test in linear regression)

  • Pro: Confidence intervals will only contain valid values
  • Pro: Not invalidated by small sample sizes
  • Con: Requires fitting two models
  • Con: Not as easy to calculate

Coronary heart disease (chd) data

A retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. These data are taken from a larger dataset, described in Rousseauw et al, 1983, South African Medical Journal.

  • sbp systolic blood pressure
  • tobacco cumulative tobacco (kg)
  • ldl low density lipoprotein cholesterol
  • adiposity body fat
  • famhist family history of heart disease
  • typea type-A behavior
  • obesity
  • alcohol current alcohol consumption
  • age age at onset

heart_data <- read_csv(
  "https://hastie.su.domains/ElemStatLearn/datasets/SAheart.data", 
  show_col_types = FALSE
)

glimpse(heart_data)
Rows: 462
Columns: 11
$ row.names <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ sbp       <dbl> 160, 144, 118, 170, 134, 132, 142, 114, 114, 132, 206, 134, …
$ tobacco   <dbl> 12.00, 0.01, 0.08, 7.50, 13.60, 6.20, 4.05, 4.08, 0.00, 0.00…
$ ldl       <dbl> 5.73, 4.41, 3.48, 6.41, 3.50, 6.47, 3.38, 4.59, 3.83, 5.80, …
$ adiposity <dbl> 23.11, 28.61, 32.28, 38.03, 27.78, 36.21, 16.20, 14.60, 19.4…
$ famhist   <chr> "Present", "Absent", "Present", "Present", "Present", "Prese…
$ typea     <dbl> 49, 55, 52, 51, 60, 62, 59, 62, 49, 69, 72, 65, 59, 49, 54, …
$ obesity   <dbl> 25.30, 28.87, 29.14, 31.99, 25.99, 30.77, 20.81, 23.11, 24.8…
$ alcohol   <dbl> 97.20, 2.06, 3.81, 24.26, 57.34, 14.14, 2.62, 6.72, 2.49, 0.…
$ age       <dbl> 52, 63, 46, 58, 49, 45, 38, 58, 29, 53, 60, 40, 17, 15, 53, …
$ chd       <dbl> 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, …

Fitting the model

\[\begin{equation*} \begin{split} \log\left(\frac{\pi_i}{1 - \pi_i}\right) &= \beta_0 + \beta_1 X_{\text{Blood pressure}, i}\\ &+ \beta_2 X_{\text{Family history}, i} + \beta_3 X_{\text{Age}, i} \end{split} \end{equation*}\]

heart_model <- glm(
  chd ~ sbp + famhist + age,
  data = heart_data,
  family = binomial # This tells it to do logistic regression
)

summary(heart_model)

Call:
glm(formula = chd ~ sbp + famhist + age, family = binomial, data = heart_data)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.500388   0.767490  -5.864 4.52e-09 ***
sbp             0.006404   0.005389   1.188    0.235    
famhistPresent  0.943106   0.216915   4.348 1.37e-05 ***
age             0.056222   0.009239   6.086 1.16e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 505.24  on 458  degrees of freedom
AIC: 513.24

Number of Fisher Scoring iterations: 4
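Exponentiating the fitted coefficients moves them from the log odds ratio scale to the odds ratio scale (a sketch, assuming heart_model was fit as above):

```r
exp(coef(heart_model))
# For example, exp(0.943106) is about 2.57: the estimated odds of CHD for men
# with a family history are about 2.57 times the odds for men without,
# holding blood pressure and age constant.
```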

Testing example 1

Let’s test if systolic blood pressure should be in the model

\(H_0: \beta_1 = 0\)

\(H_A: \beta_1 \neq 0\)

We can do this with a Wald or Likelihood ratio test

heart_full_model <- glm(
  chd ~ sbp + famhist + age,
  data = heart_data,
  family = binomial # This tells it to do logistic regression
)

Performing Wald Test

summary(heart_full_model)

Call:
glm(formula = chd ~ sbp + famhist + age, family = binomial, data = heart_data)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.500388   0.767490  -5.864 4.52e-09 ***
sbp             0.006404   0.005389   1.188    0.235    
famhistPresent  0.943106   0.216915   4.348 1.37e-05 ***
age             0.056222   0.009239   6.086 1.16e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 505.24  on 458  degrees of freedom
AIC: 513.24

Number of Fisher Scoring iterations: 4

The summary() function displays Wald Statistics.
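Each z value in the table is just the estimate divided by its standard error, with a two-sided p-value from the standard normal distribution. Reproducing the sbp row by hand (numbers copied from the summary output above):

```r
# Wald test for the sbp coefficient
z <- 0.006404 / 0.005389  # estimate / standard error, about 1.188
2 * pnorm(-abs(z))        # two-sided p-value, about 0.235
```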

Performing Likelihood Ratio Tests

heart_reduced_model <- glm(
  chd ~ famhist + age,
  data = heart_data,
  family = binomial # This tells it to do logistic regression
)

anova(heart_reduced_model, heart_full_model, test = "LRT")
Analysis of Deviance Table

Model 1: chd ~ famhist + age
Model 2: chd ~ sbp + famhist + age
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       459     506.66                     
2       458     505.24  1    1.414   0.2344

  • The p-value for Wald test: 0.235
  • The p-value for the Likelihood Ratio Test: 0.2344
  • The two p-values won’t necessarily be equal. In practice you would choose one test to perform; you wouldn’t run both.

A probability of 0.2344 is not unlikely, so we fail to reject \(H_0\).

We found no evidence (Likelihood ratio test p-value = 0.2344) that the model with systolic blood pressure significantly improves the fit compared to the model with just family history and age.
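The statistic that anova() reports is just the drop in residual deviance between the two models, compared to a chi-squared distribution (a sketch using the deviances printed above; small differences are rounding):

```r
# Likelihood ratio statistic: drop in residual deviance, on 1 df
lrt_stat <- 506.66 - 505.24                  # about 1.42 (anova() reported 1.414)
pchisq(lrt_stat, df = 1, lower.tail = FALSE) # about 0.233
```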

Testing example 2

Let’s test if systolic blood pressure and family history should be in the model

\(H_0: \beta_1 = \beta_2 = 0\)

\(H_A:\) At least one of \(\beta_1\) or \(\beta_2\) is nonzero

We can do this with a Wald or Likelihood ratio test

Wald test

library(aod)

wald.test(
  Sigma = vcov(heart_full_model), 
  b = coef(heart_full_model), 
  Terms = c(2, 3) # positions of sbp and famhistPresent in coef(); position 1 is the intercept
)

So we can test subsets of predictors with a Wald test, but it is different…

Likelihood ratio test

heart_reduced_model2 <- glm(
  chd ~ age,
  data = heart_data,
  family = binomial
)

anova(heart_reduced_model2, heart_full_model, test = "LRT")
Analysis of Deviance Table

Model 1: chd ~ age
Model 2: chd ~ sbp + famhist + age
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       460     525.56                          
2       458     505.24  2   20.318 3.872e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s use the likelihood ratio test.

We found very strong evidence (likelihood ratio test p-value = 3.9e-05) that including systolic blood pressure and family history improves the model fit over a model with age only.
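As before, the anova() result can be reproduced from the residual deviances, now on 2 degrees of freedom since two coefficients are dropped (a sketch; small differences are rounding):

```r
# Likelihood ratio statistic: drop in residual deviance, on 2 df
lrt_stat <- 525.56 - 505.24                  # about 20.32
pchisq(lrt_stat, df = 2, lower.tail = FALSE) # about 3.9e-05
```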

Which to use?

  • We already listed the pros and cons, so with those considerations you have to make a choice.
  • I think it is important that you specify what test you used when providing the p-value.
  • If you have a small sample size DON’T use Wald

Confidence intervals

Via Likelihood Ratio: We can use the confint() function, as we did with linear regression

confint(heart_full_model)
                      2.5 %      97.5 %
(Intercept)    -6.038659314 -3.02234201
sbp            -0.004159741  0.01703609
famhistPresent  0.520038741  1.37140982
age             0.038573599  0.07486714

Via Wald: We can use the confint.default() function

confint.default(heart_full_model)
                      2.5 %      97.5 %
(Intercept)    -6.004640777 -2.99613599
sbp            -0.004158755  0.01696642
famhistPresent  0.517961224  1.36825057
age             0.038114783  0.07432913
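Both tables are on the log odds scale. The Wald interval is just the estimate plus or minus a normal quantile times the standard error, and exponentiating the endpoints gives an interval for the odds ratio (a sketch using the famhistPresent row of the summary output):

```r
# Wald 95% CI for the famhistPresent log odds ratio, by hand
ci_log_odds <- 0.943106 + c(-1, 1) * qnorm(0.975) * 0.216915
ci_log_odds       # about (0.518, 1.368), matching confint.default()

# Exponentiate to get a CI for the odds ratio
exp(ci_log_odds)  # about (1.68, 3.93)
```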

We didn’t have time to cover prediction in class so that was moved to Tuesday lecture 9a.