# Chi-square test

**Hypothesis:** The proportion of those who are unhappy is higher among those who rarely meet socially with friends, relatives, and colleagues than among those who often meet.

Independent variable: sclmeet. Dependent variable: happy.

weight by pweight.

fre sclmeet.

RECODE sclmeet (1 thru 3=1)(4 thru 7=2) INTO sclmeet_2cat.

VARIABLE LABELS sclmeet_2cat 'Do you meet socially often or rarely with friends, relatives or colleagues?'.

VALUE LABELS sclmeet_2cat 1 'rarely' 2 'often'.

fre sclmeet sclmeet_2cat.

fre happy.

RECODE happy (0 thru 5=1) (6 thru 10=2) INTO happy_2cat.

VARIABLE LABELS happy_2cat 'Are you happy or not?'.

VALUE LABELS happy_2cat 1 'unhappy' 2 'happy'.

fre sclmeet sclmeet_2cat happy happy_2cat.
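The two RECODE commands above can also be written out as plain functions to check the cut points. This is only a sketch in Python, not part of the SPSS workflow: the function names and the sample values are made up for illustration. It assumes that any value not covered by the RECODE ranges stays missing in the new variable.

```python
def recode_sclmeet(value):
    """Mirror RECODE sclmeet (1 thru 3=1)(4 thru 7=2): 1 = rarely, 2 = often."""
    if value is None:
        return None
    if 1 <= value <= 3:
        return 1
    if 4 <= value <= 7:
        return 2
    return None  # values outside 1-7 stay missing (assumption)


def recode_happy(value):
    """Mirror RECODE happy (0 thru 5=1)(6 thru 10=2): 1 = unhappy, 2 = happy."""
    if value is None:
        return None
    if 0 <= value <= 5:
        return 1
    if 6 <= value <= 10:
        return 2
    return None  # values outside 0-10 stay missing (assumption)


# Hypothetical raw answers, only to show the mapping at the cut points.
sample_sclmeet = [1, 3, 4, 7, None]
sample_happy = [0, 5, 6, 10, None]

print([recode_sclmeet(v) for v in sample_sclmeet])
print([recode_happy(v) for v in sample_happy])
```

The values 3 and 4 (for sclmeet) and 5 and 6 (for happy) are the ones worth checking in the frequency tables, because they sit on either side of the cut.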

CROSSTABS happy_2cat BY sclmeet_2cat /CELLS=COUNT COLUMN /STATISTICS=CHISQ RISK.

The percentages have to be calculated within the categories of the independent variable. Put the independent variable in the columns and request column percentages, then compare the percentages across the columns, within the same row. In the syntax, after the command CROSSTABS, write the name of the dependent variable first, then the keyword BY, and then the name of the independent variable.

41,8 – 27,6= 14,2 percentage points. The proportion of those who are unhappy is 14,2 percentage points higher among those who rarely meet socially with friends, relatives, and colleagues than among those who often meet.

**Conclusion:** The proportion of unhappy people is significantly higher among those who rarely meet with friends, relatives, and colleagues than among those who often meet them. Why can we state this? Because the chi-square test is significant (SPSS reports p = 0,000, which means p < 0,001 and is therefore below 0,05), and the epsilon shows in which group the proportion of unhappy people is higher.
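As a rough check on the significance claim, the chi-square statistic can be recomputed by hand from the four cell counts of the crosstab (180, 113, 251, 297). The sketch below assumes the Pearson chi-square without continuity correction and uses the rounded weighted counts, so the SPSS output may differ slightly in the decimals. The function name is illustrative.

```python
import math

# Cell counts from the crosstab (rows: happy_2cat, columns: sclmeet_2cat).
unhappy_rarely, unhappy_often = 180, 113
happy_rarely, happy_often = 251, 297


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


chi2 = chi_square_2x2(unhappy_rarely, unhappy_often, happy_rarely, happy_often)

# For 1 degree of freedom the upper-tail p-value equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, p = {p_value:.6f}")
print("significant at 0.05:", p_value < 0.05)
```

A p-value this small is what SPSS displays as 0,000 after rounding to three decimals.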

## Epsilon

**Epsilon:** the difference between two adjacent percentages – measured in percentage points.

The percentage of unhappy people (41,8%) is much higher among those who only rarely meet socially with friends, family or colleagues than among those who often meet socially (27,6%).

-> 41,8% - 27,6% = 14,2: it is 14,2 percentage points higher.

OR: The percentage of unhappy people (27,6%) is much lower among those who often meet socially with friends, family or colleagues than among those who rarely meet socially (41,8%).

-> 27,6% - 41,8% = -14,2: it is 14,2 percentage points lower.

The percentage of happy people (58,2%) is much lower among those who only rarely meet socially with friends, family or colleagues than among those who often meet (72,4%).

-> 58,2% - 72,4% = -14,2: it is 14,2 percentage points lower.

The percentage of happy people (72,4%) is much higher among those who often meet socially with friends, family or colleagues than among those who only rarely meet (58,2%).

-> 72,4% - 58,2% = 14,2: it is 14,2 percentage points higher.

Which epsilon should you interpret: the one for being happy or the one for being unhappy? The answer depends on your hypothesis. Interpret the one that your hypothesis refers to. In this example, that is the epsilon for being unhappy, since the hypothesis is about the proportion of unhappy people.
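The epsilon for being unhappy can be reproduced from the cell counts of the crosstab (180 and 251 in the "rarely" column, 113 and 297 in the "often" column). This is a minimal sketch. The helper name is illustrative, and the percentages are rounded to one decimal first, as in the SPSS table.

```python
# Cell counts per column of the crosstab (column = category of sclmeet_2cat).
unhappy_rarely, happy_rarely = 180, 251
unhappy_often, happy_often = 113, 297


def column_percent(count, column_total):
    """Column percentage, rounded to one decimal like the SPSS output."""
    return round(100 * count / column_total, 1)


pct_unhappy_rarely = column_percent(unhappy_rarely, unhappy_rarely + happy_rarely)
pct_unhappy_often = column_percent(unhappy_often, unhappy_often + happy_often)

# Epsilon: difference between the two column percentages, in percentage points.
epsilon_unhappy = pct_unhappy_rarely - pct_unhappy_often

print(pct_unhappy_rarely, pct_unhappy_often, round(epsilon_unhappy, 1))
```

Swapping the order of the subtraction only changes the sign, which is why the same 14,2 percentage points can be read as "higher" or "lower".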

## How do you calculate the probability?

**Probability:** number of cases in which the event occurs / total number of cases.

What is the probability of being unhappy overall?

P(unhappy)=293/841=0,348

What is the probability of being happy overall?

P(happy)=548/841=0,652

You can also get the same probabilities from the percentages in the total column by dividing them by 100:

P(unhappy) = 34,8 / 100 = 0,348

P(happy) = 65,2 / 100 = 0,652
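The same probabilities can be checked with a few lines of arithmetic. The sketch below uses the row totals quoted above (293 unhappy, 548 happy, 841 in total). The variable names are illustrative.

```python
# Row totals from the crosstab.
n_unhappy = 293
n_happy = 548
n_total = n_unhappy + n_happy  # 841

# Probability = cases in which the event occurs / all cases.
p_unhappy = n_unhappy / n_total
p_happy = n_happy / n_total

print(round(p_unhappy, 3), round(p_happy, 3))

# The two probabilities cover all cases, so they add up to 1.
print(round(p_unhappy + p_happy, 10))
```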

## How do you calculate the odds?

**Odds:** probability of the event occurring / probability of the event not occurring.

You can express the odds as a multiple ("times") or as a percentage change. They show how many times as likely something is to happen as not to happen.

CROSSTABS happy_2cat BY sclmeet_2cat /CELLS=COUNT COLUMN.

What are the odds of being unhappy?

Odds of being unhappy: 0,348 / 0,652 = 0,534 (calculated from the probabilities).

Odds of being unhappy: 34,8 / 65,2 = 0,534 (calculated from the percentages in the total column). You get the same result.

Odds of being unhappy: 293 / 548 = 0,535 (calculated from the counts). The small difference from 0,534 comes from rounding the probabilities.

As a percentage change: (0,534 - 1) * 100 = -46,6

People are 46,6% less likely to be unhappy than to be happy.

Odds of being happy: 0,652 / 0,348 = 1,874

As a percentage change: (1,874 - 1) * 100 = 87,4

People are 87,4% more likely to be happy than to be unhappy.
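The odds calculations can be sketched as follows, starting from the row totals 293 and 548. Note that this sketch works with unrounded values, so it gives about 0,535 and 1,870, while the hand calculation with probabilities rounded to three decimals gives 0,534 and 1,874. The helper name is illustrative.

```python
n_unhappy, n_happy = 293, 548
n_total = n_unhappy + n_happy

p_unhappy = n_unhappy / n_total

# Odds = probability of the event / probability of the event not occurring.
odds_unhappy = p_unhappy / (1 - p_unhappy)
odds_happy = (1 - p_unhappy) / p_unhappy


def as_percent_change(odds):
    """Express odds as a percentage above (+) or below (-) even odds of 1."""
    return (odds - 1) * 100


print(round(odds_unhappy, 3), round(as_percent_change(odds_unhappy), 1))
print(round(odds_happy, 3), round(as_percent_change(odds_happy), 1))
```

The two odds are reciprocals of each other, so computing one of them is enough.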

**Conditional odds**

**Conditional odds:** odds computed separately for each category of the independent variable.

What are the odds of being unhappy for those who only rarely meet socially with friends, family or colleagues?

Compute the odds of being unhappy separately in each column: once for people who only rarely meet socially with friends, family or colleagues, and once for people who often meet socially.

Rarely (conditional odds of being unhappy): 180 / 251 = 0,717

Often (conditional odds of being unhappy): 113 / 297 = 0,380
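The two conditional odds can be reproduced from the cell counts, one column at a time. This is a minimal sketch. The dictionary layout and names are illustrative.

```python
# Cell counts per category of the independent variable (sclmeet_2cat).
counts = {
    "rarely": {"unhappy": 180, "happy": 251},
    "often": {"unhappy": 113, "happy": 297},
}

# Conditional odds of being unhappy: unhappy / happy within each column.
conditional_odds = {
    group: cells["unhappy"] / cells["happy"] for group, cells in counts.items()
}

for group, odds in conditional_odds.items():
    print(f"{group}: {odds:.3f}")
```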

**Odds ratio**

**Odds ratio:** the ratio of two conditional odds.

The odds ratio shows how many times greater or smaller the odds of the phenomenon under study are in one category of the independent variable than in the other category.

CROSSTABS happy_2cat BY sclmeet_2cat /STATISTICS=RISK.

Odds ratio: 0,72 / 0,38 = 1,89. The odds of being unhappy (rather than happy) are 1,89 times as high for those who rarely meet as for those who often meet.

Odds ratio as percentage change, using the unrounded odds ratio 1,885: (1,885 - 1) * 100 = 88,5.

The odds of being unhappy are 88,5% higher among those who only rarely meet socially with friends, family or colleagues than among those who often meet socially with others.

The program computes the odds ratio for the category in the top left cell of the contingency table. Here that is the odds of being unhappy for those who rarely meet, compared to those who often meet.

Don't be confused by the keyword RISK: the value you read from the output here is the odds ratio, not a risk. In everyday language risk means probability, and the odds ratio is a ratio of odds, not of probabilities.
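The odds ratio and its dependence on the order of the categories can be sketched as follows, using the four cell counts of the crosstab. The function name is illustrative. The sketch only reproduces the point estimate, not the rest of the SPSS risk output.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a / c) / (b / d) for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Rows: unhappy, happy. Columns: rarely, often (as in the crosstab).
or_rarely_vs_often = odds_ratio(180, 113, 251, 297)

# Swapping the columns gives the reciprocal: often compared to rarely.
or_often_vs_rarely = odds_ratio(113, 180, 297, 251)

percent_change = (or_rarely_vs_often - 1) * 100

print(round(or_rarely_vs_often, 3))  # odds ratio, rarely vs often
print(round(or_often_vs_rarely, 3))  # reciprocal, often vs rarely
print(round(percent_change, 1))      # percentage change in the odds
```

This is why the order of the categories matters when you read the output: the same table yields an odds ratio above 1 or below 1 depending on which category comes first.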

## Lambda

Lambda is a measure of association that reflects the proportional reduction in error (PRE) when values of the independent variable are used to predict values of the dependent variable.

A value of 1 means that the independent variable perfectly predicts the dependent variable.

A value of 0 means that the independent variable is no help in predicting the dependent variable.

CROSSTABS happy_2cat BY sclmeet_2cat /STATISTICS=LAMBDA.

Read the value of lambda from the row of the output in which your actual dependent variable is listed as the dependent variable. In this example the dependent variable is happy, so the value of lambda is 0,000.

How do you interpret the lambda?

Convert its value into a percentage: x * 100 = 0,000 * 100 = 0

Knowing the distribution of the independent variable, compared to not knowing it, improves our ability to predict the correct outcome by 0 percent.

In other words: by knowing the independent variable we can reduce the probability of making an incorrect prediction by 0 percent. So, knowing whether a person often or rarely meets with others socially improves our ability to predict the correct outcome by 0 percent.
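Why lambda is exactly 0 here can be seen by counting prediction errors by hand. The sketch below assumes the usual PRE definition of lambda: errors made when always guessing the overall most frequent category, compared with errors made when guessing the most frequent category within each column. The function name and table layout are illustrative.

```python
# Cell counts: outer key = independent variable, inner key = dependent variable.
table = {
    "rarely": {"unhappy": 180, "happy": 251},
    "often": {"unhappy": 113, "happy": 297},
}


def pre_lambda(table):
    """Lambda = (errors without the independent variable - errors with it) / errors without it."""
    totals = {}
    for column in table.values():
        for category, count in column.items():
            totals[category] = totals.get(category, 0) + count
    n = sum(totals.values())
    # Without the independent variable: always guess the overall modal category.
    errors_without = n - max(totals.values())
    # With it: guess the modal category inside each column.
    errors_with = sum(sum(col.values()) - max(col.values()) for col in table.values())
    return (errors_without - errors_with) / errors_without


lam = pre_lambda(table)
print(lam, lam * 100)
```

In both columns the most frequent category is "happy", so the best guess is the same with or without the independent variable. Both strategies make 293 errors, which gives a lambda of 0 even though the chi-square test is significant.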