An interaction occurs when the relationship between one independent variable and the dependent variable depends on another independent variable. You can picture this as dividing the sample into two subtables and looking at the relationship separately in each. So, interaction means that one independent variable modifies the effect of another independent variable on the dependent variable.

To test an interaction effect, use the COMPUTE command to create a new variable, called “inter” here, as the product of the two independent variables. We use this variable only for calculation.
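To see what such a product variable contains, here is a minimal Python sketch (outside SPSS, for illustration only). It assumes both independent variables are 0/1 dummies and borrows the names `gndr_2cat` and `domicil_dummy` from the syntax in this section; the function name `interaction_term` is hypothetical.

```python
from itertools import product


def interaction_term(gndr_2cat: int, domicil_dummy: int) -> int:
    """Product of two 0/1 dummies, mirroring COMPUTE inter = a * b."""
    return gndr_2cat * domicil_dummy


# Enumerate the four possible groups and the value the interaction term takes.
groups = {
    (g, d): interaction_term(g, d)
    for g, d in product([0, 1], repeat=2)
}

for (g, d), inter in groups.items():
    print(f"gndr_2cat={g} domicil_dummy={d} -> inter={inter}")
```

The product equals 1 only when both dummies equal 1 (here: men in big cities), which is why its coefficient captures the extra gender effect in that group.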

Note: In the interpretation you have to add “everything else held constant”, since you have more than one independent variable.

Research question: Does the effect of gender on smoking differ in small and big cities?

weight by pspwght.
fre cgtsmke.
RECODE cgtsmke (1 2=1) (3 thru 5=0) INTO cgtsmke_dummy.
VARIABLE LABELS cgtsmke_dummy 'Smoking? (1=yes)'.
VALUE LABELS cgtsmke_dummy 1 'yes' 0 'no'.
fre cgtsmke cgtsmke_dummy.

fre gndr.
RECODE gndr (1=1)(2=0) into gndr_2cat.
VARIABLE LABELS gndr_2cat 'gender=male'.
VALUE LABELS gndr_2cat 1 'male' 0 'female'.
fre gndr gndr_2cat.

fre domicil.
RECODE domicil (1 2=1)(3 thru 5=0) into domicil_dummy.
VARIABLE LABELS domicil_dummy 'Big city or outskirts vs not big city'.
VALUE LABELS domicil_dummy 1 'Big city or outskirts' 0 'Not big city'.
fre domicil domicil_dummy.

COMPUTE inter_domicil_gndr=domicil_dummy*gndr_2cat.

LOGISTIC REGRESSION cgtsmke_dummy WITH gndr_2cat domicil_dummy inter_domicil_gndr.

GRAPH
/BAR(grouped)=PGT(0)(cgtsmke_dummy) BY domicil_dummy BY gndr_2cat.

b1: The log odds of smoking are 0,803 higher among men than among women, everything else held constant.
Because the model contains the interaction term, this holds for small cities (domicil_dummy=0): in small cities, the log odds of smoking are 0,803 higher among men than among women. So, b1 shows us the gender difference in smoking in small cities.

Exp(b1): 2,231: The odds of smoking for men are 2,231 times as high as for women, everything else held constant. So, we can say that men are more likely to smoke than women. Like b1, this describes the gender difference in small cities.

b2: The log odds of smoking are 0,177 lower among people living in big cities than among people living in small cities, everything else held constant.
Because the model contains the interaction term, this holds among women (gndr_2cat=0): among women, the log odds of smoking are 0,177 lower for people living in big cities than for people living in small cities.

Exp(b2): 0,838: The odds of smoking for big city residents are 0,838 times the odds for those not living in big cities (that is, lower), everything else held constant.
This difference is, however, not significant.
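The step from the B column to the Exp(B) column is plain exponentiation. The Python sketch below (outside SPSS, for illustration only) redoes it for the two coefficients quoted above; the helper names are hypothetical, and decimal points replace the decimal commas used in the text.

```python
import math


def odds_ratio(b: float) -> float:
    """Convert a logistic regression coefficient (log odds) to an odds ratio."""
    return math.exp(b)


def percent_change_in_odds(b: float) -> float:
    """Percentage change in the odds implied by coefficient b."""
    return (math.exp(b) - 1.0) * 100.0


b1 = 0.803   # gender (male) coefficient quoted in the text
b2 = -0.177  # big-city coefficient quoted in the text (negative: lower log odds)

print(round(odds_ratio(b1), 2))              # close to the reported 2,231
print(round(odds_ratio(b2), 3))              # close to the reported 0,838
print(round(percent_change_in_odds(b2), 1))  # negative: the odds are lower
```

Small differences from the SPSS output in the last digit are expected, because the coefficients quoted in the text are already rounded to three decimals.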

b3: This shows the size of the interaction effect. We use it to calculate the difference between women and men in terms of smoking in big cities (we already know the difference in small cities from b1).
Conclusion: b1+b3 = how much higher or lower the log odds of smoking are for men than for women in big cities.

So, in the interpretation we cannot use the coefficient of “inter” by itself; we have to combine it with the b1 coefficient. b1 shows the gender difference in smoking, but only for people who do not live in a big city. This is because b1 is the effect of gender when the other variable in the interaction term, domicil_dummy, takes the value 0.

Note: On the log odds scale the effect is additive, so here you have to add b1 and b3.
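A short derivation shows why the sum is the right quantity. It is a sketch under the assumption that the model contains only gender, domicil and their product; the constant b_0 is not named in the text and is introduced here for illustration.

```latex
\[
\operatorname{logit}(p) = b_0 + b_1\,\mathrm{gndr} + b_2\,\mathrm{domicil}
                        + b_3\,(\mathrm{gndr}\times\mathrm{domicil})
\]

% Gender difference in small cities (domicil = 0):
\[
\underbrace{(b_0 + b_1)}_{\text{men}} - \underbrace{b_0}_{\text{women}} = b_1
\]

% Gender difference in big cities (domicil = 1):
\[
\underbrace{(b_0 + b_1 + b_2 + b_3)}_{\text{men}}
  - \underbrace{(b_0 + b_2)}_{\text{women}} = b_1 + b_3
\]
```

The constant and b_2 cancel in both subtractions, so the gender difference is b_1 where domicil=0 and b_1+b_3 where domicil=1.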

Exp(b3): We use this for calculation only.
Similarly to b3, we must combine this with b1, but in this case we multiply the two exponentiated values by each other:
Exp(b1)*Exp(b3) = 1,98 = the odds of smoking for men are this many times as high as for women in big cities.

Note: On the odds scale the effect is multiplicative, so here you have to multiply Exp(b1) by Exp(b3).
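The multiplication rule follows directly from the addition rule, because exponentiating a sum gives a product. As a sketch, writing OR for the odds ratio of men versus women in big cities (a label introduced here for illustration):

```latex
\[
\mathrm{OR}_{\text{big city}}
  = e^{\,b_1 + b_3}
  = e^{\,b_1}\cdot e^{\,b_3}
  = \operatorname{Exp}(b_1)\cdot\operatorname{Exp}(b_3)
\]

% With the values reported in this example:
\[
\operatorname{Exp}(b_1)\cdot\operatorname{Exp}(b_3)
  = 2{,}231 \cdot \operatorname{Exp}(b_3) = 1{,}98
\]
```

Multiplying the raw coefficients b1 and b3 would mix the two scales and has no interpretation.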

How do you check whether the interaction effect is significant? Look at the significance level of the inter variable. Here it is not significant (p=0,636): the effect of gender is not significantly different in big cities and in small cities. So, we don’t see a significant difference in how gender influences smoking behaviour in big cities and in small cities.

## Example

In the original relationship, we have two variables: one independent and one dependent. But we think that a further independent variable moderates the effect that the original independent variable has on the dependent variable. We include an interaction term in the regression because we expect gender to have a different effect on taking part in a demonstration in big cities and in small towns. So, here we divide the people into two groups: we look at the effect of gender 1) in big cities and 2) in small towns, and we expect this effect to differ between the two groups.

Note: In this case, you have to use the COMPUTE command to create a new inter variable, which is the product of the two independent variables. (See the syntax.)

weight by pspwght.
fre pbldmn.
RECODE pbldmn (1=1)(2=0) into pbldmn_2cat.
VARIABLE LABELS pbldmn_2cat 'taking part or not in lawful public demonstration last 12 months?'.
VALUE LABELS pbldmn_2cat 1 'yes' 0 'no'.
fre pbldmn pbldmn_2cat.

fre gndr.
RECODE gndr (1=1)(2=0) into gndr_2cat.
VARIABLE LABELS gndr_2cat 'gender=male'.
VALUE LABELS gndr_2cat 1 'male' 0 'female'.
fre gndr gndr_2cat.

fre domicil.
RECODE domicil (1 2=1)(3 thru 5=0) into domicil_2cat.
VARIABLE LABELS domicil_2cat 'domicil=big city'.
VALUE LABELS domicil_2cat 1 'big city or outskirts' 0 'not big city'.
fre domicil domicil_2cat.

fre edulvlb.
RECODE edulvlb (0 thru 213=1)(313 thru 800=0) INTO
edulvlb_primary_dummy.
VARIABLE LABELS edulvlb_primary_dummy 'Highest education level=primary'.
VALUE LABELS edulvlb_primary_dummy 1 'Primary or below' 0 'Higher than primary'.
fre edulvlb edulvlb_primary_dummy.

fre edulvlb.
RECODE edulvlb (313 thru 520=1)(0 thru 213=0)(610 thru 800=0) INTO edulvlb_secondary_dummy.
VARIABLE LABELS edulvlb_secondary_dummy 'Highest education level=secondary'.
VALUE LABELS edulvlb_secondary_dummy 1 'Secondary' 0 'Not secondary'.
fre edulvlb edulvlb_secondary_dummy.

fre edulvlb.
RECODE edulvlb (0 thru 520=0)(610 thru 800=1) INTO edulvlb_tertiary_dummy.
VARIABLE LABELS edulvlb_tertiary_dummy 'Highest education level=tertiary'.
VALUE LABELS edulvlb_tertiary_dummy 1 'Tertiary' 0 'Not tertiary'.
fre edulvlb edulvlb_tertiary_dummy.

LOGISTIC REGRESSION pbldmn_2cat WITH gndr_2cat domicil_2cat edulvlb_secondary_dummy edulvlb_tertiary_dummy.

*============================================.

*Another method, without creating the dummy variables manually:.
fre edulvlb.
RECODE edulvlb (0 thru 213=1)(313 thru 520=2)(610 thru 800=3) INTO edulvlb_3cat.
VARIABLE LABELS edulvlb_3cat 'Highest education level'.
VALUE LABELS edulvlb_3cat 1 'Primary or below' 2 'Secondary' 3 'Tertiary'.
fre edulvlb edulvlb_3cat.

LOGISTIC REGRESSION pbldmn_2cat WITH gndr_2cat domicil_2cat edulvlb_3cat
/categorical edulvlb_3cat
/contrast (edulvlb_3cat)=indicator(1).
*The subcommand /contrast (edulvlb_3cat)=indicator(1) means that we choose the first category as the reference category.

*===================================.
*INTERACTION EFFECT.
*1.step) create the variable for the interaction term.
COMPUTE inter_gndr_domicil=gndr_2cat*domicil_2cat.

*2.step) Run the regression.
LOGISTIC REGRESSION pbldmn_2cat WITH gndr_2cat domicil_2cat edulvlb_3cat inter_gndr_domicil
/categorical edulvlb_3cat
/contrast (edulvlb_3cat)=indicator(1).

*The subcommand /contrast (edulvlb_3cat)=indicator(1) means that we choose the first category as the reference category.

“Primary or below” is the reference category (because it is coded with only zeros on the indicator variables).
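The “only zeros” coding can be spelled out with a small Python sketch (outside SPSS, for illustration only). It assumes the usual indicator coding, in which each non-reference category gets its own 0/1 variable; the function name `indicator_coding` is hypothetical.

```python
def indicator_coding(category: int, categories=(1, 2, 3), reference: int = 1) -> tuple:
    """Return the 0/1 indicator dummies for `category`, omitting the reference."""
    if category not in categories:
        raise ValueError(f"unknown category: {category}")
    # One dummy per non-reference category; it is 1 only for that category.
    return tuple(int(category == c) for c in categories if c != reference)


labels = {1: "Primary or below", 2: "Secondary", 3: "Tertiary"}
coding = {labels[c]: indicator_coding(c) for c in labels}

for label, dummies in coding.items():
    print(label, dummies)
```

The reference category is the one whose row contains only zeros, so the coefficients of the other two categories are read as differences from it.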

Compared to the previous regression table (education as a multicategory variable), most of the results are similar after we include the interaction term, but the significance level of gender changed: it is not significant anymore (p=0,266).

Exp(b1): The odds of taking part in a demonstration among men are 1,615 times as high as among women, everything else held constant. (But this is only true for small cities). So, this is the effect of gender in small cities.

The interaction term is used only for calculation; on its own, we only check its significance level. In the B column, we have to add the coefficient of the inter variable to the coefficient of the original independent variable, gender (note that gender appears in the interaction term as well). In the Exp(B) column, we have to multiply the two values.

b1+inter: The log odds of taking part in a demonstration are 0,479+0,218=0,697 higher among men than among women. This is true for big cities. In the B column the effect is additive, so you have to add the two numbers together.

Exp(b1)*Exp(inter): The odds of taking part in a demonstration among men are 2,007 times as high as among women, but this is true for big cities. (In the Exp(B) column the effect is multiplicative. This means that you do not add the two numbers together but multiply them by each other.) 1,615*1,243=2,007. So, this is the effect of gender in big cities.
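The two routes to the big-city effect can be checked against each other. The Python sketch below (outside SPSS, for illustration only) uses the coefficients quoted above with decimal points instead of decimal commas; the variable names are hypothetical.

```python
import math

b_gender = 0.479  # B for gndr_2cat quoted in the text
b_inter = 0.218   # B for the interaction term quoted in the text

# Log-odds scale: effects add.
log_odds_big_city = b_gender + b_inter

# Odds scale: exponentiated effects multiply.
or_small_city = math.exp(b_gender)
or_inter = math.exp(b_inter)
or_big_city = or_small_city * or_inter

print(round(log_odds_big_city, 3))
print(round(or_big_city, 3))

# Exponentiating the sum gives the same odds ratio as multiplying.
print(math.isclose(or_big_city, math.exp(log_odds_big_city)))
```

Both routes give an odds ratio of about 2,01 for men versus women in big cities; the last digit can differ slightly from 2,007 because the quoted coefficients are rounded.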

So, you use the inter variable only for calculations. Here, however, the inter variable is not significant (p=0,704), so we find no interaction effect. This means that the effect of gender is not significantly different in big cities and in small towns. More accurately, we say that we don’t have evidence for stating that there is a significant difference.

Note: A logistic regression can contain more than one interaction term (multiple interactions in logistic regression).