Math 430: Lecture 8b

Logistic regression part 2

Professor Catalina Medina

Logistic regression

\[\log\left(\frac{E[Y_i | X_{1i}, ..., X_{pi}]}{1 - E[Y_i | X_{1i}, ..., X_{pi}]}\right) = \beta_0 + \beta_1 X_{1i} + ... + \beta_p X_{pi}\]

We often let \(\pi_i = E[Y_i | X_{1i}, ..., X_{pi}]\) so our model is written

\[\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_{1i} + ... + \beta_p X_{pi}\]

Interpreting coefficients

\(\beta_0\) is the log odds when all \(X_i\)’s are zero.

\(\beta_1\) is the log odds ratio associated with a one unit increase in \(X_1\)

  • \(e^{\beta_1}\) is the odds ratio associated with a one unit increase in \(X_1\)
  • \(e^{\beta_1}\) is the multiplicative change (factor change) in odds associated with a one unit increase in \(X_1\)
  • For \(e^{\beta_1} > 1\), \(100 * (e^{\beta_1} - 1)\)% is the increase in odds associated with a one unit increase in \(X_1\). For \(e^{\beta_1} < 1\), we have \(100 * (1 - e^{\beta_1})\)% decrease in odds associated with a one unit increase in \(X_1\)

… with all other \(X_i\)’s held constant.
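As a quick numeric sketch of these interpretations (the coefficient value here is hypothetical, not from any fitted model):

```r
# Hypothetical slope on the log odds scale
beta_1 <- 0.94

exp(beta_1)              # odds ratio for a one unit increase in X1: about 2.56
100 * (exp(beta_1) - 1)  # about a 156% increase in odds
```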

Inference

In our class we will focus on two methods for hypothesis tests and confidence intervals

  • Wald
  • Likelihood ratio

Comparing inference methods

Wald - For a single predictor

  • Pro: Easy to calculate
  • Pro: Only requires fitting one model
  • Con: Unreliable for small sample sizes
  • Con: Confidence intervals may contain invalid values outside [0, 1]

Likelihood ratio - Compares nested models (analogous to the partial F-test in linear regression)

  • Pro: Confidence intervals will only contain valid values
  • Pro: Not invalidated by small sample sizes
  • Con: Requires fitting two models
  • Con: Not as easy to calculate

Coronary heart disease (chd) data

A retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. These data are taken from a larger dataset, described in Rousseauw et al, 1983, South African Medical Journal.

  • sbp systolic blood pressure
  • tobacco cumulative tobacco (kg)
  • ldl low density lipoprotein cholesterol
  • adiposity body fat
  • famhist family history of heart disease
  • typea type-A behavior
  • obesity
  • alcohol current alcohol consumption
  • age age at onset

heart_data <- read_csv(
  "https://hastie.su.domains/ElemStatLearn/datasets/SAheart.data", 
  show_col_types = FALSE
)

glimpse(heart_data)
Rows: 462
Columns: 11
$ row.names <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ sbp       <dbl> 160, 144, 118, 170, 134, 132, 142, 114, 114, 132, 206, 134, …
$ tobacco   <dbl> 12.00, 0.01, 0.08, 7.50, 13.60, 6.20, 4.05, 4.08, 0.00, 0.00…
$ ldl       <dbl> 5.73, 4.41, 3.48, 6.41, 3.50, 6.47, 3.38, 4.59, 3.83, 5.80, …
$ adiposity <dbl> 23.11, 28.61, 32.28, 38.03, 27.78, 36.21, 16.20, 14.60, 19.4…
$ famhist   <chr> "Present", "Absent", "Present", "Present", "Present", "Prese…
$ typea     <dbl> 49, 55, 52, 51, 60, 62, 59, 62, 49, 69, 72, 65, 59, 49, 54, …
$ obesity   <dbl> 25.30, 28.87, 29.14, 31.99, 25.99, 30.77, 20.81, 23.11, 24.8…
$ alcohol   <dbl> 97.20, 2.06, 3.81, 24.26, 57.34, 14.14, 2.62, 6.72, 2.49, 0.…
$ age       <dbl> 52, 63, 46, 58, 49, 45, 38, 58, 29, 53, 60, 40, 17, 15, 53, …
$ chd       <dbl> 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, …

Fitting the model

\[\begin{equation*} \begin{split} \log\left(\frac{\pi_i}{1 - \pi_i}\right) &= \beta_0 + \beta_1 X_{\text{Blood pressure}, i}\\ &+ \beta_2 X_{\text{Family history}, i} + \beta_3 X_{\text{Age}, i} \end{split} \end{equation*}\]

heart_model <- glm(
  chd ~ sbp + famhist + age,
  data = heart_data,
  family = binomial # This tells it to do logistic regression
)

summary(heart_model)

Call:
glm(formula = chd ~ sbp + famhist + age, family = binomial, data = heart_data)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.500388   0.767490  -5.864 4.52e-09 ***
sbp             0.006404   0.005389   1.188    0.235    
famhistPresent  0.943106   0.216915   4.348 1.37e-05 ***
age             0.056222   0.009239   6.086 1.16e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 505.24  on 458  degrees of freedom
AIC: 513.24

Number of Fisher Scoring iterations: 4
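Exponentiating the fitted coefficients moves them from the log odds ratio scale to the odds ratio scale (a sketch, assuming heart_model was fit as above):

```r
exp(coef(heart_model))
# For example, exp(0.943106) is about 2.57: the estimated odds of CHD for men
# with a family history are about 2.57 times the odds for men without,
# holding blood pressure and age constant.
```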

Testing example 1

Let’s test if systolic blood pressure should be in the model

\(H_0: \beta_1 = 0\)

\(H_A: \beta_1 \neq 0\)

We can do this with a Wald or Likelihood ratio test

heart_full_model <- glm(
  chd ~ sbp + famhist + age,
  data = heart_data,
  family = binomial # This tells it to do logistic regression
)

Performing Wald Test

summary(heart_full_model)

Call:
glm(formula = chd ~ sbp + famhist + age, family = binomial, data = heart_data)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.500388   0.767490  -5.864 4.52e-09 ***
sbp             0.006404   0.005389   1.188    0.235    
famhistPresent  0.943106   0.216915   4.348 1.37e-05 ***
age             0.056222   0.009239   6.086 1.16e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 505.24  on 458  degrees of freedom
AIC: 513.24

Number of Fisher Scoring iterations: 4

The summary() function displays Wald Statistics.
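Each z value in the table is just the estimate divided by its standard error, with a two-sided p-value from the standard normal distribution. Reproducing the sbp row by hand (numbers copied from the summary output above):

```r
# Wald test for the sbp coefficient
z <- 0.006404 / 0.005389  # estimate / standard error, about 1.188
2 * pnorm(-abs(z))        # two-sided p-value, about 0.235
```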

Performing Likelihood Ratio Tests

heart_reduced_model <- glm(
  chd ~ famhist + age,
  data = heart_data,
  family = binomial # This tells it to do logistic regression
)

anova(heart_reduced_model, heart_full_model, test = "LRT")
Analysis of Deviance Table

Model 1: chd ~ famhist + age
Model 2: chd ~ sbp + famhist + age
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       459     506.66                     
2       458     505.24  1    1.414   0.2344

  • The p-value for Wald test: 0.235
  • The p-value for the Likelihood Ratio Test: 0.2344
  • The two p-values won’t necessarily be equal. In practice you would choose one test to perform; you wouldn’t run both.

A probability of 0.2344 is not unlikely, so we fail to reject \(H_0\).

We found no evidence (Likelihood ratio test p-value = 0.2344) that the model with systolic blood pressure significantly improves the fit compared to the model with just family history and age.
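The statistic that anova() reports is just the drop in residual deviance between the two models, compared to a chi-squared distribution (a sketch using the deviances printed above; small differences are rounding):

```r
# Likelihood ratio statistic: drop in residual deviance, on 1 df
lrt_stat <- 506.66 - 505.24                  # about 1.42 (anova() reported 1.414)
pchisq(lrt_stat, df = 1, lower.tail = FALSE) # about 0.233
```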

Testing example 2

Let’s test if systolic blood pressure and family history should be in the model

\(H_0: \beta_1 = \beta_2 = 0\)

\(H_A:\) At least one of \(\beta_1\) or \(\beta_2\) is nonzero

We can do this with a Wald or Likelihood ratio test

Wald test

library(aod)

wald.test(
  Sigma = vcov(heart_full_model), 
  b = coef(heart_full_model), 
  Terms = c(2, 3) # positions of sbp and famhistPresent in coef(); position 1 is the intercept
)

So we can test subsets of predictors with a Wald test, but it is different…

Likelihood ratio test

heart_reduced_model2 <- glm(
  chd ~ age,
  data = heart_data,
  family = binomial
)

anova(heart_reduced_model2, heart_full_model, test = "LRT")
Analysis of Deviance Table

Model 1: chd ~ age
Model 2: chd ~ sbp + famhist + age
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       460     525.56                          
2       458     505.24  2   20.318 3.872e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s use the likelihood ratio test.

We found very strong evidence (likelihood ratio test p-value = 3.9e-05) that including systolic blood pressure and family history improves the model fit over a model with age only.
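As before, the anova() result can be reproduced from the residual deviances, now on 2 degrees of freedom since two coefficients are dropped (a sketch; small differences are rounding):

```r
# Likelihood ratio statistic: drop in residual deviance, on 2 df
lrt_stat <- 525.56 - 505.24                  # about 20.32
pchisq(lrt_stat, df = 2, lower.tail = FALSE) # about 3.9e-05
```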

Which to use?

  • We already listed the pros and cons, so with those considerations you have to make a choice.
  • I think it is important that you specify what test you used when providing the p-value.
  • If you have a small sample size DON’T use Wald

Confidence intervals

Via Likelihood Ratio: We can use the confint() function, as we did with linear regression

confint(heart_full_model)
                      2.5 %      97.5 %
(Intercept)    -6.038659314 -3.02234201
sbp            -0.004159741  0.01703609
famhistPresent  0.520038741  1.37140982
age             0.038573599  0.07486714

Via Wald: We can use the confint.default() function

confint.default(heart_full_model)
                      2.5 %      97.5 %
(Intercept)    -6.004640777 -2.99613599
sbp            -0.004158755  0.01696642
famhistPresent  0.517961224  1.36825057
age             0.038114783  0.07432913
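Both tables are on the log odds scale. The Wald interval is just the estimate plus or minus a normal quantile times the standard error, and exponentiating the endpoints gives an interval for the odds ratio (a sketch using the famhistPresent row of the summary output):

```r
# Wald 95% CI for the famhistPresent log odds ratio, by hand
ci_log_odds <- 0.943106 + c(-1, 1) * qnorm(0.975) * 0.216915
ci_log_odds       # about (0.518, 1.368), matching confint.default()

# Exponentiate to get a CI for the odds ratio
exp(ci_log_odds)  # about (1.68, 3.93)
```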

We didn’t have time to cover prediction in class so that was moved to Tuesday lecture 9a.