Multivariate logistic regression (Multicategorical)

A logistic regression with several independent variables is called multivariate logistic regression. In this example one of the independent variables is multicategorical: age, which has 5 categories.

Our age variable has 5 categories, one of which we have to choose as the reference category. The reference category is always the one that we leave out of the regression. An independent variable with more than 2 categories has to be converted into dummy variables, one for each category. Because we leave out the reference category, we have 4 dummy variables to include in the regression.

weight by pspwght.
fre rlgblg.
RECODE rlgblg (1=1)(2=0) into rlgblg_2cat.
VARIABLE LABELS rlgblg_2cat 'Belonging or not to particular religion or denomination?'.
VALUE LABELS rlgblg_2cat 1 'yes' 0 'no'.
fre rlgblg rlgblg_2cat.

fre gndr.
RECODE gndr (1=1)(2=0) into gndr_2cat.
VARIABLE LABELS gndr_2cat 'gender=male'.
VALUE LABELS gndr_2cat 1 'male' 0 'female'.
fre gndr gndr_2cat.

fre domicil.
RECODE domicil (1 2=1)(3 thru 5=0) into domicil_2cat.
VARIABLE LABELS domicil_2cat 'domicil=big city'.
VALUE LABELS domicil_2cat 1 'big city or outskirts' 0 'not big city'.
fre domicil domicil_2cat.

*Age (continuous): let us create categories.
fre agea.
RECODE agea (18 thru 29=1)(30 thru 39=2)(40 thru 49=3) (50 thru 59=4) (60 thru hi =5) into agea_5cat.
variable labels agea_5cat 'agea_5cat'.
value labels agea_5cat 1 '18-29' 2 '30-39' 3 '40-49' 4 '50-59' 5 '60+'.
fre agea agea_5cat.

RECODE agea_5cat (1=1) (2 thru 5=0) into age_18_29.
variable labels age_18_29 'age_18_29'.
value labels age_18_29 1 'age_18_29' 0 'not age_18_29'.
fre agea_5cat age_18_29.

RECODE agea_5cat (2=1) (1 3 4 5=0) into age_30_39.
variable labels age_30_39 'age_30_39'.
value labels age_30_39 1 'age_30_39' 0 'not age_30_39'.
fre agea_5cat age_30_39.

RECODE agea_5cat (3=1) (1 2 4 5=0) into age_40_49.
variable labels age_40_49 'age_40_49'.
value labels age_40_49 1 'age_40_49' 0 'not age_40_49'.
fre agea_5cat age_40_49.

RECODE agea_5cat (4=1) (1 2 3 5=0) into age_50_59.
variable labels age_50_59 'age_50_59'.
value labels age_50_59 1 'age_50_59' 0 'not age_50_59'.
fre agea_5cat age_50_59.

RECODE agea_5cat (5=1) (1 thru 4=0) into age_60.
variable labels age_60 'age_60 and above 60'.
value labels age_60 1 'age_60' 0 'not age_60'.
fre agea_5cat age_60.
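
Before running the models it can help to confirm that the five dummies were coded as intended. The following is only a sketch of one possible check, using the variable names created above: it cross-tabulates the 5-category age variable with each dummy.

```spss
*Cross-tabulate the 5-category age variable with each dummy.
*Every age category should have value 1 on exactly one dummy and 0 on the others.
CROSSTABS /TABLES=agea_5cat BY age_18_29 age_30_39 age_40_49 age_50_59 age_60.
```

In each of the five tables, all cases of one age category should sit in the column coded 1 and all other categories in the column coded 0.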

*First model: reference category = age 18-29 (age_18_29 is left out).
LOGISTIC REGRESSION rlgblg_2cat WITH gndr_2cat
domicil_2cat age_30_39 age_40_49 age_50_59 age_60.

*Second model: reference category = age 60 and above (age_60 is left out).
LOGISTIC REGRESSION rlgblg_2cat WITH gndr_2cat
domicil_2cat age_18_29 age_30_39 age_40_49 age_50_59.

*Third model: reference category = age 40-49 (age_40_49 is left out).
LOGISTIC REGRESSION rlgblg_2cat WITH gndr_2cat
domicil_2cat age_18_29 age_30_39 age_50_59 age_60.

*=======================================================.
*The same logistic regression for the 1st model without manually creating dummy variables:.
LOGISTIC REGRESSION rlgblg_2cat WITH gndr_2cat domicil_2cat agea_5cat
/categorical agea_5cat
/contrast (agea_5cat)=indicator(1).

/contrast (agea_5cat)=indicator(1) means that we choose the first category (18-29) as the reference category of the indicator (dummy) coding. SPSS then creates the dummies for you, but only for this regression; no new variables are saved in the dataset.
Note: Watch out for the codings. SPSS creates the category codings automatically, and you can check the coding system in the output table called “Categorical Variables Codings”.
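
The second and third models can presumably be requested the same way by changing the number inside indicator(). This is a sketch that assumes agea_5cat has exactly the five categories coded 1 to 5 created above, so that the 5th category is 60+ and the 3rd is 40-49; the number refers to the position of the category, so the “Categorical Variables Codings” table should be checked to confirm which category was actually left out.

```spss
*Second model: reference category = 60+ (5th category of agea_5cat).
LOGISTIC REGRESSION rlgblg_2cat WITH gndr_2cat domicil_2cat agea_5cat
/categorical agea_5cat
/contrast (agea_5cat)=indicator(5).

*Third model: reference category = 40-49 (3rd category of agea_5cat).
LOGISTIC REGRESSION rlgblg_2cat WITH gndr_2cat domicil_2cat agea_5cat
/categorical agea_5cat
/contrast (agea_5cat)=indicator(3).
```

In the output the age coefficients are labelled agea_5cat(1) to agea_5cat(4) instead of by dummy name, so the codings table is needed to match each coefficient to its age group.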

First model

We select the youngest age group as a reference category, so we leave out this category from the regression.

The constant is the predicted value “when everything else is 0”, so it describes the people who fall into the 0 category of every independent variable. Here the constant refers to females in the youngest age group (the reference category) who live in a small city.
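
Written as an equation, the first model looks roughly as follows. This is a sketch, not SPSS output: the symbol p (the probability that rlgblg_2cat = 1) is introduced here for illustration, and the coefficients are numbered in the order in which the variables appear in the first LOGISTIC REGRESSION command.

```latex
\[
\ln\frac{p}{1-p} = b_0
  + b_1\,\mathrm{gndr\_2cat}
  + b_2\,\mathrm{domicil\_2cat}
  + b_3\,\mathrm{age\_30\_39}
  + b_4\,\mathrm{age\_40\_49}
  + b_5\,\mathrm{age\_50\_59}
  + b_6\,\mathrm{age\_60}
\]
```

When every variable on the right-hand side equals 0 (female, small city, none of the four age dummies, that is, aged 18-29), all terms except b_0 drop out, which is why the constant describes exactly this group.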

b0: The log odds of being religious are -0,310 for females who live in a small city and are aged 18-29.
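
Log odds are hard to read directly, so it may help to convert b0 into odds and into a probability. The calculation below is a worked illustration based on the rounded coefficient -0,310; p is a symbol introduced here for the probability of being religious.

```latex
\[
\text{odds} = e^{b_0} = e^{-0{,}310} \approx 0{,}733
\qquad
p = \frac{e^{b_0}}{1 + e^{b_0}} \approx \frac{0{,}733}{1{,}733} \approx 0{,}423
\]
```

So for females aged 18-29 who live in a small city, the model predicts a probability of roughly 42% of belonging to a religion.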

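b1: The log odds of being religious are lower by 0,437 among males than among females, everything else held constant.

In other words: when we compare a man and a woman who live in the same type of domicile and belong to the same age group, the log odds of being religious are lower by 0,437 for the man. Because the model contains no interaction terms, this difference is the same in every domicile and age category, not only among those who live in a small city and are aged 18-29.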

b3: The log odds of being religious are higher by 0,197 among people aged 30-39 than among the youngest people (aged 18-29), everything else held constant. (everything else held constant = we compare people of the same gender and the same domicile category.)

Exp(b6): The odds of being religious among those who are aged 60 or above are 3,611 times as high as among people aged 18-29, everything else held constant. (everything else held constant = we compare people of the same gender and the same domicile category.)

Second model

We choose the category that refers to those who are aged 60 and above as a reference category.

Exp(b5): The odds of being religious among those who are aged 40-49 are 0,417 times the odds among those who are aged 60 or above, that is, about 58% lower, everything else held constant. (everything else held constant = we compare people of the same gender and the same domicile category.)
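
Switching the reference category inverts the odds ratio, which gives a quick consistency check between models. In the sketch below the superscripts (2) and (3) are notation introduced here to mark the second and third model; the value 2,396 is Exp(b6) of the third model (60 or above compared with 40-49).

```latex
\[
e^{b_5^{(2)}} = \frac{1}{e^{b_6^{(3)}}} = \frac{1}{2{,}396} \approx 0{,}417
\]
```

The two models therefore describe the same difference between the 40-49 and 60+ groups, seen from opposite directions.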

Conclusion: All younger age cohorts are significantly less religious than people aged 60 or above (this is the reference category).
The older the age cohort, the smaller its difference in religiousness from the oldest age cohort.

Third model

The reference category that we left out of the model is people aged 40-49.

Exp(b0): The odds of being religious among women aged 40-49 who live in small cities are 1,106.

b0: The log odds of being religious among women aged 40-49 who live in small cities are 0,101.

Exp(b1): The odds of being religious among men are 0,646 times the odds among women, everything else held constant. (everything else held constant = we compare men and women of the same age group and the same domicile category; the comparison is not restricted to the categories coded as 0.)

b1: The log odds of being religious among men are lower by 0,437 than among women, everything else held constant. (everything else held constant = same age group and same domicile category.)

Exp(b2): The odds of being religious among big city residents are 0,572 times the odds among small city residents, everything else held constant.

b2: The log odds of being religious among big city residents are lower by 0,558 than among small city residents, everything else held constant.

Exp(b3): The odds of being religious among people aged 18-29 are 0,664 times the odds among people aged 40-49, everything else held constant.

b3: The log odds of being religious among people aged 18-29 are lower by 0,410 than among people aged 40-49, everything else held constant.

Exp(b4): The odds of being religious among people aged 30-39 are 0,808 times the odds among people aged 40-49, everything else held constant.

Exp(b5): The odds of being religious among people aged 50-59 are 1,435 times as high as among people aged 40-49, everything else held constant.

Exp(b6): The odds of being religious among people aged 60 or above are 2,396 times as high as among people aged 40-49, everything else held constant.
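
Odds ratios are often easier to communicate as percentage changes in the odds, computed as 100 × (Exp(b) − 1). The lines below apply this to the four age odds ratios of the third model; it is a worked illustration using the rounded values quoted above.

```latex
\[
\begin{aligned}
\text{18-29 vs 40-49:}\quad 100\,(0{,}664 - 1) &= -33{,}6\% \\
\text{30-39 vs 40-49:}\quad 100\,(0{,}808 - 1) &= -19{,}2\% \\
\text{50-59 vs 40-49:}\quad 100\,(1{,}435 - 1) &= +43{,}5\% \\
\text{60+ vs 40-49:}\quad 100\,(2{,}396 - 1) &= +139{,}6\%
\end{aligned}
\]
```

A value below 1 thus means lower odds than in the reference group, and a value above 1 means higher odds. These are changes in the odds, not in the probability.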

Note: We ran this many models because we were curious about the differences between the various age groups. As you can see, the coefficients for gender and domicil remained the same in all the models. The only difference between the models is the age cohort used as the reference, which lets us see whether the difference between the reference and each of the other age groups is significant.
Of course, we could have calculated the difference between the 30-39 and 40-49 age cohorts by subtracting one coefficient from the other, but that would not give us the p value for this difference. In order to get the p value we must run the regression again using another reference category.
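Example (differences in log odds, taken from the first and third models):
the difference between 18-29 and 40-49: 0,410 (b3 in the third model, because 40-49 is our reference group)
the difference between 18-29 and 30-39: 0,410-0,213 = 0,197 (this equals b3 in the first model)
the difference between 30-39 and 40-49: 0,410-0,197 = 0,213 (this equals the size of b4 in the third model, whose Exp(b4) is 0,808)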
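
The same subtraction can be written with signed coefficients, which makes the direction of the difference explicit. In this sketch the superscripts (1) and (3) are notation introduced here for the first and third model, and -0,213 is the natural logarithm of Exp(b4) = 0,808 from the third model.

```latex
\[
b_4^{(3)} - b_3^{(3)} = -0{,}213 - (-0{,}410) = 0{,}197 = b_3^{(1)},
\qquad
e^{0{,}197} \approx 1{,}218
\]
```

So the odds of being religious among people aged 30-39 are roughly 1,218 times the odds among people aged 18-29. The subtraction reproduces the coefficient, but not its p value.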

Other examples