In case you have one dependent variable and one independent variable:

A linear regression model rests on the following assumptions:

  1. The dependent variable has to be a scale variable.
  2. Linearity: The relationship between X and the mean of Y is linear.
  3. Outlier condition
  4. Homoscedasticity, nearly normal residuals, and independence of errors: in particular, the variance of the residuals is the same for any value of X.
  5. Normality: the variables (X, Y) have to be normally distributed.

1. The dependent variable has to be a scale variable.

2. Linearity

The relationship between the dependent and the independent variable should be linear. Check this using a scatterplot of the data.
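As a rough numeric companion to the scatterplot, the sketch below fits a least-squares line and computes the Pearson correlation in plain Python. The data and the function name `fit_line` are made up for illustration; they are not part of the SPSS workflow. A high correlation alone does not prove linearity, so the residuals are also returned for inspection.

```python
from math import sqrt

def fit_line(x, y):
    """Return (intercept, slope, r) for a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r

# Hypothetical data, invented for this example only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

intercept, slope, r = fit_line(x, y)

# Residuals = observed minus predicted; a curved pattern here would suggest
# that a straight line is not an adequate description of the data.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(round(slope, 3), round(r, 3))
```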

3. Outlier condition: check for outliers and extreme values (leverage and influential points)

Check the scatterplot.

Outliers in regression are observations that fall far from the “cloud” of points. These points are especially important because they can have a strong influence on the regression line.

So, there are three types of “deviant” data points: influential data points, high-leverage data points, and outliers.
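To make the idea of leverage concrete, here is a minimal sketch for the one-predictor case, where the leverage of observation i is 1/n + (x_i - mean of x)^2 / Sxx. The data are hypothetical, and the cutoff of 2p/n (with p = 2 estimated parameters) is a common rule of thumb, assumed here rather than taken from the text. The sketch covers leverage only; it does not detect outliers in Y or measure influence.

```python
def leverages(x):
    """Leverage values for a simple regression with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Hypothetical predictor values; the last one lies far from the others.
x = [1, 2, 3, 4, 5, 20]

h = leverages(x)

# Rule-of-thumb cutoff (an assumption): 2 * p / n with p = 2 parameters
# (intercept and slope).
cutoff = 2 * 2 / len(x)
high_leverage = [xi for xi, hi in zip(x, h) if hi > cutoff]
print(high_leverage)
```

A point flagged this way is worth inspecting on the scatterplot; whether it is also influential depends on how far its Y value falls from the line fitted to the remaining points.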

4. Homoscedasticity and nearly normal residuals

Homoscedasticity = constant variability

For large sample sizes, moderate departures from normality of the residuals matter less.

Check normality using a normal probability plot of the residuals or a histogram. Check constant variability using a residuals plot (the residuals plotted against the predicted values).

Analyze-Regression-Linear: in the Plots window, mark the Normal Probability Plot

Question: Are the theoretical residuals normally distributed? We don’t know the theoretical residuals, we only have the observed residuals. Residuals should be nearly normally distributed, centered at 0. This may not be satisfied if there are unusual observations that don’t follow the trend of the rest of the data.

The variability of points around the least squares line (regression line) should be roughly constant. This implies that the variability of residuals around the 0 line should be roughly constant as well.

Put into Y: *ZRESID – standardized residuals

Put into X: *ZPRED – standardized predicted values
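The sketch below shows one way to compute the two quantities used in this plot outside SPSS. It assumes that the standardized residual is the residual divided by the standard error of the estimate, and that the standardized predicted value is the predicted value centered and scaled by the sample standard deviation of the predictions; check the SPSS documentation for the exact definitions. The data, the function name, and the half-split spread ratio are illustrative only, and the ratio is a crude indicator, not a formal test.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_pred_and_resid(x, y):
    """Return (zpred, zresid) for a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pred = [intercept + slope * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    # Standard error of the estimate: sqrt(SSE / (n - 2)).
    s = sqrt(sum(e ** 2 for e in resid) / (n - 2))
    zresid = [e / s for e in resid]
    mean_pred, sd_pred = mean(pred), stdev(pred)
    zpred = [(p - mean_pred) / sd_pred for p in pred]
    return zpred, zresid

# Hypothetical data, invented for this example only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.1, 8.4, 9.6, 12.5, 13.7, 16.2]

zpred, zresid = standardized_pred_and_resid(x, y)

# Crude check of constant variability: compare the spread of the residuals
# for low and high predicted values. A ratio far from 1 hints at a fan shape.
pairs = sorted(zip(zpred, zresid))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]
spread_ratio = stdev(high) / stdev(low)
print(round(spread_ratio, 2))
```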

What we are looking for here is whether the points more or less fall along the line. We see some deviation (towards the center), but in general the points do seem to fall along the line, so we assume that the observed standardized residuals are normally distributed. We then check whether the observed unstandardized residuals are normally distributed. (You can see them in the Data View window; SPSS creates them after you run the regression.)
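The "points fall along the line" reading can be mimicked numerically. The sketch below builds the coordinates of a normal P-P plot for a list of residuals: the observed cumulative proportion against the cumulative probability expected under a fitted normal distribution. The residuals are hypothetical, and the plotting position (i - 0.5)/n is an assumption; SPSS may use a different formula, so the coordinates need not match its output exactly.

```python
from statistics import NormalDist, mean, stdev

def pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    points = []
    for i, e in enumerate(sorted(residuals), start=1):
        observed = (i - 0.5) / n   # assumed plotting position
        expected = fitted.cdf(e)   # probability under the fitted normal
        points.append((observed, expected))
    return points

# Hypothetical residuals, invented for this example only.
residuals = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 1.1]

points = pp_points(residuals)

# Largest vertical distance from the diagonal; small values mean the points
# stay close to the line.
max_deviation = max(abs(obs - exp) for obs, exp in points)
print(round(max_deviation, 3))
```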

5. Test of normality for the variables (independent and dependent)

We use the Kolmogorov-Smirnov or the Shapiro-Wilk test.

Analyze-Descriptive Statistics-Explore: Plots – mark: Normality Plots with Tests

If the Sig. value is above 0.05, we assume that the variable is normally distributed.

If it is 0.05 or less, we conclude that the variable is not normally distributed.
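The decision rule above can be written as a small helper. This is only a sketch of the rule as stated here; the function name and the Sig. values are hypothetical and would in practice be read from the SPSS "Tests of Normality" table.

```python
def treat_as_normal(sig, alpha=0.05):
    """Apply the rule from the text: Sig. above alpha means we assume normality."""
    return sig > alpha

# Hypothetical Sig. values, invented for this example only.
sig_values = {"X": 0.200, "Y": 0.031}

decisions = {name: treat_as_normal(sig) for name, sig in sig_values.items()}
print(decisions)
```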
