Generalized Linear Models Workshop
Last modified: 29-01-2026
Estimate the expected (average) outcome given predictors.
The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes.
A constant change in $x$ leads to a constant change in the expected outcome.
`lm(y ~ x)`: for a fixed $x$, the model describes the distribution of possible outcomes around the mean $\mu$.
A Normal distribution has parameters $\mu$ (mean) and $\sigma^2$ (variance):

$$
f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y - \mu)^2}{2\sigma^2}}
$$
When building a model, we need to know:
A probability distribution of a random variable $Y$ describes the probabilities assigned to each possible value $y$, given certain parameter values.
For a discrete variable, the function assigns a probability to each value, $P(Y = y)$, and is called the probability mass function (PMF). For a continuous variable, the function assigns a density to each value, $f(y)$, and is called the probability density function (PDF).
The PMF returns the probability associated with a certain value of $y$, while the PDF returns the probability density at a certain value of $y$.
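In R, the `d*()` functions implement both kinds: `dbinom()` is a PMF (probabilities over a discrete support summing to 1), while `dnorm()` is a PDF (densities, not probabilities):

```r
# PMF: probability that a Binomial(10, 0.5) variable equals exactly 5
dbinom(5, size = 10, prob = 0.5)
#> [1] 0.2460938

# PMF values over the whole support sum to 1
sum(dbinom(0:10, size = 10, prob = 0.5))
#> [1] 1

# PDF: density (not a probability!) of a standard Normal at 0
dnorm(0, mean = 0, sd = 1)
#> [1] 0.3989423
```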
When we use a distribution, we are usually interested in some properties (moments) describing it:
The moments are defined in the same way for every distribution, but how we actually compute them changes from distribution to distribution.
In normal linear and generalized linear models we generally include predictors on the mean of the distribution.
In the case of the Normal distribution, the mean and the variance coincide with the parameters $\mu$ and $\sigma^2$ that define the distribution.
| Distribution | Support | Mean: $E[Y]$ | Variance: $\mathrm{Var}[Y]$ |
|---|---|---|---|
| Normal: $\mathcal{N}(\mu, \sigma^2)$ | $(-\infty, +\infty)$ | $\mu$ | $\sigma^2$ |
| Gamma: $\mathrm{Gamma}(\alpha, \beta)$ | $(0, +\infty)$ | $\alpha / \beta$ | $\alpha / \beta^2$ |
| Binomial: $\mathrm{Bin}(n, p)$ | $\{0, 1, \ldots, n\}$ | $np$ | $np(1 - p)$ |
| Poisson: $\mathrm{Pois}(\lambda)$ | $\{0, 1, 2, \ldots\}$ | $\lambda$ | $\lambda$ |
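A quick simulation sketch checking the mean-variance relations in the table (the parameter values are arbitrary):

```r
set.seed(1)
n <- 1e5

# Poisson(lambda = 4): mean and variance should both be close to 4
y_pois <- rpois(n, lambda = 4)
c(mean(y_pois), var(y_pois))

# Binomial(size = 10, p = 0.3): mean np = 3, variance np(1 - p) = 2.1
y_bin <- rbinom(n, size = 10, prob = 0.3)
c(mean(y_bin), var(y_bin))
```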
Probability of passing the exam as a function of how many hours students study:
id studyh passed
1 1 29 0
2 2 79 1
3 3 41 1
4 4 88 1
5 5 94 1
6 6 5 0
7 7 53 0
8 8 89 1
9 9 55 1
10 10 46 0
11 11 96 1
12 12 45 0
13 13 68 1
14 14 57 0
15 15 10 0
16 16 90 1
17 17 25 0
18 18 4 0
19 19 33 1
20 20 95 1
21 21 89 1
22 22 69 0
23 23 64 0
24 24 99 1
25 25 66 1
26 26 71 1
27 27 54 0
28 28 59 1
29 29 29 0
30 30 15 0
31 31 96 1
32 32 90 1
33 33 69 1
34 34 80 0
35 35 2 0
36 36 48 0
37 37 76 0
38 38 22 1
39 39 32 1
40 40 23 0
41 41 14 0
42 42 41 0
43 43 41 0
44 44 37 0
45 45 15 0
46 46 14 0
47 47 23 0
48 48 47 0
49 49 27 0
50 50 86 1
# number of students that have passed the exam
sum(dat_exam$passed)
#> [1] 22
# proportion of students that have passed the exam
mean(dat_exam$passed)
#> [1] 0.44
# study hours and passing the exam
tapply(dat_exam$studyh, dat_exam$passed, mean)
#> 0 1
#> 35.46429 73.04545
#>
table(dat_exam$passed, cut(dat_exam$studyh, breaks = 4))
#> (1.9,26.2] (26.2,50.5] (50.5,74.8] (74.8,99.1]
#> 0 11 10 5 2
#> 1 1 3 6 12
#>
tapply(dat_exam$passed, cut(dat_exam$studyh, breaks = 4), mean)
#> (1.9,26.2] (26.2,50.5] (50.5,74.8] (74.8,99.1]
#> 0.08333333 0.23076923 0.54545455 0.85714286

Do you see something strange?
The model should consider both the support of the variable and the non-linear pattern!
Random Component: Choose a distribution
Systematic Component: Linear predictor
Link Function: Transform the mean
The random component specifies a probability model for the response $y_i$:
What support?
Choosing a distribution specifies not only the mean, but also how the variance depends on the mean:
$$
\mathrm{Var}(y_i) = \phi \, V(\mu_i)
$$

where $V(\mu)$ is the variance function (determined by the family) and $\phi$ is a constant scale factor.
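In R, the variance function $V(\mu)$ is stored inside each `family` object and can be inspected directly:

```r
# each family object carries its variance function V(mu)
binomial()$variance(0.3)   # mu * (1 - mu) = 0.21
poisson()$variance(2.5)    # mu = 2.5
gaussian()$variance(1.7)   # constant: 1
Gamma()$variance(3)        # mu^2 = 9
```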
The systematic component is exactly the same as in normal linear regression: a linear predictor, i.e., a linear combination of the predictors:

$$
\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots
$$

Basically, it describes how the expected value (i.e., the mean, the first moment) of the chosen distribution (the random component) varies according to the predictors.
The link function $g(\cdot)$ connects the expected value (mean) of the distribution to the linear predictor $\eta$:

$$
g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{1i} + \ldots
$$
| Distribution | Support of $\mu$ | Link | Purpose |
|---|---|---|---|
| Normal | $(-\infty, +\infty)$ | Identity: $g(\mu) = \mu$ | No transformation |
| Binomial | $(0, 1)$ | Logit: $g(p) = \log\frac{p}{1 - p}$, where $p = \mu$ | Maps a probability to the real line |
| Poisson | $(0, +\infty)$ | Log: $g(\mu) = \log(\mu)$ | Keeps the mean positive |
| Gamma | $(0, +\infty)$ | Log: $g(\mu) = \log(\mu)$ | Keeps the mean positive |
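In R, each `family` object also stores the link function (`linkfun`) and its inverse (`linkinv`):

```r
# the link and inverse link are stored in the family object
fam <- binomial(link = "logit")
fam$linkfun(0.8)       # log(0.8 / 0.2) = log(4), about 1.386
fam$linkinv(log(4))    # back to 0.8

# qlogis() and plogis() are the same logit / inverse-logit functions
qlogis(0.8)
plogis(qlogis(0.8))    # recovers 0.8
```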
For example, a Generalized Linear Model with the Normal family and identity link can be written as:

$$
y_i \sim \mathcal{N}(\mu_i, \sigma^2), \qquad \mu_i = \beta_0 + \beta_1 x_i
$$
For example, a Generalized Linear Model with the Binomial family and logit link can be written as:

$$
y_i \sim \mathrm{Bernoulli}(p_i), \qquad \log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_i
$$
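Both models are fitted with `glm()` in R. A minimal sketch on simulated data with the same structure as `dat_exam` (the true coefficients -3 and 0.06 are made up for illustration):

```r
set.seed(123)

# simulated data shaped like dat_exam (hypothetical coefficients)
studyh <- round(runif(50, min = 1, max = 100))
passed <- rbinom(50, size = 1, prob = plogis(-3 + 0.06 * studyh))
dat <- data.frame(studyh, passed)

# Binomial family with logit link
fit <- glm(passed ~ studyh, family = binomial(link = "logit"), data = dat)
coef(fit)                 # beta_0 and beta_1, on the logit scale
exp(coef(fit)["studyh"])  # odds ratio for +1 study hour

# predicted probability of passing after 50 hours of study
predict(fit, newdata = data.frame(studyh = 50), type = "response")
```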
The Bernoulli or the Binomial distribution can be used as the random component when the dependent variable is binary, or when it is the number of successes over a total number of trials.
Let $y_i \in \{0, 1\}$ indicate whether student $i$ passes the exam:
The probability mass function of the Bernoulli distribution, over the possible outcomes $y \in \{0, 1\}$, is

$$
P(Y = y) = p^y (1 - p)^{1 - y}
$$

where $p$ is the probability of success and $y$ takes the two possible values 0 and 1. The mean is $p$ and the variance is $p(1 - p)$.
Now let $Y$ be the number of students who pass out of $n$ students who take the same exam.
The probability of having $y$ successes (e.g., 0, 1, 2, etc.) out of $n$ trials, with a probability $p$ of succeeding (and $1 - p$ of failing), is:

$$
P(Y = y) = \binom{n}{y} p^y (1 - p)^{n - y}
$$

Here $np$ is the mean of the binomial distribution and $np(1 - p)$ is the variance. The binomial distribution is just the repetition of $n$ independent Bernoulli trials.
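A quick check in R that `dbinom()` matches the formula, and that the simulated mean and variance match $np$ and $np(1-p)$ (the values $n = 10$, $p = 0.3$ are arbitrary):

```r
# P(Y = 2) for n = 10 trials with p = 0.3, by the formula...
choose(10, 2) * 0.3^2 * 0.7^8
#> [1] 0.2334744

# ...and with the built-in PMF
dbinom(2, size = 10, prob = 0.3)
#> [1] 0.2334744

# mean np = 3 and variance np(1 - p) = 2.1, checked by simulation
set.seed(2)
y <- rbinom(1e5, size = 10, prob = 0.3)
c(mean(y), var(y))
```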
How many pass?
Variance is not constant!
Because probabilities must stay between 0 and 1, we work with odds.
Let $p = P(\text{pass})$; then the odds of passing compare "pass" to "fail":

$$
\mathrm{odds} = \frac{p}{1 - p}
$$
If $p = 0.8$, then $\mathrm{odds} = 0.8 / 0.2 = 4$.
So passing is 4-to-1 relative to failing (4 expected passes for 1 fail, on average).
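A quick numeric check of the odds computation, and of going back from odds to probability:

```r
p <- 0.8
odds <- p / (1 - p)
odds               # 4: passing is 4-to-1 relative to failing

# and back from odds to probability
odds / (1 + odds)  # 0.8
```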
The logit is the log-odds:

$$
\mathrm{logit}(p) = \log\frac{p}{1 - p}
$$
This maps $p \in (0, 1)$ to any real number, which makes it easier to model with a linear predictor.
The odds have an interesting property when taking the logarithm: we can express a probability using a scale ranging over $(-\infty, +\infty)$.
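In R the logit and inverse logit are built in as `qlogis()` and `plogis()`; a small sketch showing the mapping from $(0, 1)$ to the real line:

```r
p <- c(0.1, 0.5, 0.9)

# logit: log(p / (1 - p)); symmetric around p = 0.5 (which maps to 0)
qlogis(p)
log(p / (1 - p))   # identical, by hand

# inverse logit recovers the probabilities
plogis(qlogis(p))
```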
With $p_i$ the probability of passing and $x_i$ the hours studied, the model assumes:

$$
\mathrm{logit}(p_i) = \log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_i
$$
Then the odds:

$$
\frac{p_i}{1 - p_i} = e^{\beta_0 + \beta_1 x_i}
$$

When $x = 0$: $\mathrm{odds}(0) = e^{\beta_0}$

When $x = 1$: $\mathrm{odds}(1) = e^{\beta_0 + \beta_1}$

Then the ratio of these two odds is:

$$
\frac{\mathrm{odds}(1)}{\mathrm{odds}(0)} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}
$$

This means that the odds of passing after increasing study hours by 1 are $e^{\beta_1}$ times greater than at the baseline (i.e., when $x = 0$).
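A tiny numeric check of the odds-ratio identity, with made-up coefficients $\beta_0 = -2$ and $\beta_1 = 0.5$:

```r
# hypothetical coefficients, for illustration only
beta0 <- -2
beta1 <- 0.5

odds_x0 <- exp(beta0)          # odds at x = 0
odds_x1 <- exp(beta0 + beta1)  # odds at x = 1
odds_x1 / odds_x0              # the ratio...
exp(beta1)                     # ...equals exp(beta1), about 1.65
```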
A +1 increase in $x$ changes the log-odds by a constant amount $\beta_1$. So equal increases in $x$ correspond to equal increases in log-odds.
To go back to probability we apply the inverse logit:

$$
p = \mathrm{logit}^{-1}(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}
$$
Equal increases in the log-odds $\eta$ generally do not correspond to equal increases in $p$, because the inverse logit is nonlinear.
Here $\beta_1 \approx 0.8$, so each +1 hour multiplies the odds by $e^{0.8} \approx 2.23$.
| Baseline $p$ | Baseline odds | New odds (odds $\times$ 2.23) | New $p$ | $\Delta p$ |
|---|---|---|---|---|
| 0.10 | 0.11 | 0.25 | 0.20 | +0.10 |
| 0.20 | 0.25 | 0.55 | 0.36 | +0.16 |
| 0.36 | 0.55 | 1.22 | 0.55 | +0.20 |
| 0.55 | 1.22 | 2.73 | 0.73 | +0.18 |
| 0.73 | 2.73 | 6.07 | 0.86 | +0.13 |
| 0.86 | 6.07 | 13.51 | 0.93 | +0.07 |
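As a sketch, the table above can be recomputed in R (assuming an odds ratio of 2.23 per hour; values match the table up to rounding):

```r
p0 <- c(0.10, 0.20, 0.36, 0.55, 0.73, 0.86)  # baseline probabilities
or <- 2.23                                   # assumed odds ratio per +1 hour

odds0 <- p0 / (1 - p0)      # baseline odds
odds1 <- odds0 * or         # new odds after +1 hour
p1 <- odds1 / (1 + odds1)   # back to probability via the inverse logit

round(data.frame(p0, odds0, odds1, p1, delta_p = p1 - p0), 2)
```

Notice how the same multiplicative change in the odds produces the largest change in $p$ near $p = 0.5$ and smaller changes near 0 and 1.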