# Categorical variable

When to use this? When you want to use a categorical variable with more than two categories (levels) as an independent variable in a linear regression.

All categorical variables have to be entered into the regression as dummy variables. Each dummy variable represents one category of the independent variable: it is coded 1 if the case falls into that category and 0 if it does not. So, we first create one dummy variable for each of the categories.
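As a minimal illustration outside SPSS, the dummy coding described above can be sketched in Python. The function name `make_dummies` and the `region` data are hypothetical, chosen only for the example.

```python
def make_dummies(values, categories):
    """Return one 0/1 dummy variable (a list) per category.

    A case gets 1 on the dummy for its own category and 0 on all the others.
    """
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical example: a 'region' variable with three categories.
region = ["north", "south", "east", "south", "north"]
dummies = make_dummies(region, ["north", "south", "east"])

for cat, column in dummies.items():
    print(cat, column)
# north [1, 0, 0, 0, 1]
# south [0, 1, 0, 1, 0]
# east [0, 0, 1, 0, 0]

# Every case (row) is coded 1 on exactly one dummy variable.
assert all(sum(row) == 1 for row in zip(*dummies.values()))
```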

- Step: Check whether there are any values to exclude from the analysis (look at the frequency table). If there are any, set them as missing values in the Variable View window so that they are left out.
- Step: Recode every category into its own dummy variable (1 if the case falls into that category, 0 if it does not).
- Step: Select one category as the baseline category, the category against which all the other categories will be compared. Any of the categories can serve as the baseline (also called the base case or omitted category).
- Step: To run the linear regression, go to Analyze – Regression – Linear. Enter the dependent variable and all the dummy variables, **except the one for the baseline/omitted category.** So, the number of dummy variables included in the regression is always one less than the number of categories. Including all of them together with the constant would not work, because the dummies always add up to 1 and would therefore be perfectly collinear with the constant.
- Step: Click Paste, then run the command from the syntax window.
- Step: Interpret the tables in the Output.
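The reason for leaving one category out can be checked numerically. The sketch below (plain Python; the `region` data and the helper names `rank` and `design_matrix` are hypothetical) builds the design matrix with a constant column and computes its rank. With all three dummies, the four columns only have rank 3, because the dummies always add up to the constant column. Dropping the baseline dummy gives a matrix of full column rank, which the regression needs in order to estimate one unique coefficient per column.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def design_matrix(values, categories, baseline=None):
    """Constant column plus one dummy per category, skipping the baseline."""
    kept = [c for c in categories if c != baseline]
    return [[1] + [1 if v == c else 0 for c in kept] for v in values]


region = ["north", "south", "east", "south", "north", "east"]
cats = ["north", "south", "east"]

all_dummies = design_matrix(region, cats)           # constant + 3 dummies
k_minus_1 = design_matrix(region, cats, "north")     # constant + 2 dummies

print(rank(all_dummies), len(all_dummies[0]))  # 3 4 -> one column is redundant
print(rank(k_minus_1), len(k_minus_1[0]))      # 3 3 -> full column rank
```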

## How to interpret the values?

First of all, check the p value (Sig.), both for the model as a whole (ANOVA table) and for each coefficient (Coefficients table). If it is above 0.05, we say that the result is not statistically significant: the data are compatible with there being no real difference, so the difference we see could plausibly have occurred by chance alone. In that case we do not interpret the result further.

- **b0** (the constant): gives the average level of the dependent variable for the omitted category, which is the category coded 0 on **all** dummy variables. This is the reference point with which all other categories are compared.
- The other **coefficients** represent the difference in the average level of the dependent variable between the category coded 1 on one particular dummy variable and the omitted category:
  - b1: the difference between the mean for the category coded 1 on the first dummy variable and the mean for the omitted category.
  - b2: the difference between the mean for the category coded 1 on the second dummy variable and the mean for the omitted category.
  - b3: the difference between the mean for the category coded 1 on the third dummy variable and the mean for the omitted category.
  - If there are b4, b5, b6 and so on, the same logic applies.
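The interpretation above can be checked on a small made-up dataset. This sketch (plain Python with exact fractions; the data, the choice of "A" as the baseline, and the helper names `solve` and `ols` are all hypothetical) fits the regression by solving the normal equations and compares the coefficients with the group means. It only illustrates the relationship and is not a replacement for the SPSS output.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b exactly (Gauss-Jordan elimination)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(n):
            if i != c:
                factor = m[i][c] / m[c][c]
                m[i] = [p - factor * q for p, q in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    cols = list(zip(*x))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Hypothetical data: a 3-category variable and a numeric dependent variable.
group = ["A", "A", "B", "B", "B", "C", "C"]
score = [10, 12, 15, 17, 16, 20, 22]

# "A" is the baseline (omitted) category, so only B and C get dummies.
x = [[1, int(g == "B"), int(g == "C")] for g in group]
b0, b1, b2 = ols(x, score)

mean = {
    g: Fraction(sum(s for s, h in zip(score, group) if h == g), group.count(g))
    for g in "ABC"
}
print(b0, mean["A"])              # 11 11 -> constant = mean of the baseline
print(b1, mean["B"] - mean["A"])  # 5 5   -> B compared with the baseline
print(b2, mean["C"] - mean["A"])  # 10 10 -> C compared with the baseline
```

With only dummy variables in the model, b0 equals the baseline mean exactly and every other coefficient equals a difference between two group means.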