When do you need to use a control variable?
You need one when you want to address alternative explanations by removing confounding effects, and when you want to improve efficiency (that is, when you want to eliminate some kind of distortion).
Use a control variable:
- If it correlates with the original independent variable and both of them have an effect on the dependent variable.
- When the control variable has an effect on the dependent variable.
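The first condition can be written as a short formula. The notation is assumed here and is not part of the original notes: x1 is the original independent variable, x2 is the control variable, b1 and b2 are their coefficients when both are in the model, b1* is the coefficient of x1 when x2 is left out, and d is the slope you get from regressing x2 on x1.

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + e
\qquad\Longrightarrow\qquad
b_1^{*} = b_1 + b_2 \, d
```

The distortion term b2 d is zero when the two independent variables do not correlate (d = 0) or when the control variable has no effect on the dependent variable (b2 = 0).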
You also have to take the time order of the variables into consideration:
1. If the new variable precedes the original independent variable in time, then it is reasonable to include it in the regression.
2. If the new variable follows the original independent variable in time, then it is not reasonable to include it in the model.
Example: what does it mean that one variable follows the other in time? In our example, if we start from the point of birth, a student first has a father who completed x grades of school (PEDUCATION), and only later in life does the student join a class (CLASS). Put simply: first you have a father, then you get into a class. So the CLASS variable follows the PEDUCATION variable in time.
Note: There are two cases in which it is reasonable to include a new independent variable that happened later in time (that follows the original variable) because you want to test an explanation:
1. The change in b1 supports the alternative explanation:
Dependent: test result. Independents: CLASS, PEDUCATION
Hypothesis: The new teaching method is much more effective than the old one.
Alternative explanation: The educational background of the parents has an effect on the student’s test scores, so we included a new control variable called PEDUCATION.
In the second regression, where we included both variables, b1 was no longer significant -> This supports the alternative explanation, not our hypothesis. (On the next page you can find this example in detail.)
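The two regressions compared in this example can be written out as follows. This is only a notational sketch: b0, b2 and the error term e are labels assumed here, while b1 is the coefficient of CLASS that the text refers to.

```latex
\text{Model 1:}\quad \text{TEST} = b_0 + b_1\,\text{CLASS} + e
\\
\text{Model 2:}\quad \text{TEST} = b_0 + b_1\,\text{CLASS} + b_2\,\text{PEDUCATION} + e
```

The check is whether b1 is still significant in Model 2, once PEDUCATION is in the model.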
2. There is no change in b1 and the results still support the alternative explanation:
Hypothesis: In the south-east region of Hungary people live in a “subculture” in which suicide is widely accepted as a norm (theory of socialization), and this is why the suicide rate in this region is higher than in other regions of Hungary.
- First regression: Dependent: suicide rate. Independent: region
- Second regression: Dependent: suicide rate. Independents: region, control variable: geographical mobility
Alternative explanation: The geographical mobility of people living in the south-east region is higher than in other regions, and a high level of geographical mobility has an effect on the suicide rate. People who move away from their hometowns face uncertainty in their new circumstances and, as strangers, do not find supportive relationships. This results in stress that they cannot cope with.
b1 did not become insignificant: it decreased slightly, but it is still significant after introducing geographical mobility as a control variable, and geographical mobility is also significant. -> The results support our hypothesis.
In the second regression, besides the original two variables (suicide rate, region), we included one more independent variable: geographical mobility. Before we included the control variable, region had an effect on the suicide rate. After we included it, the effect of region on the suicide rate was still significant, so the original effect did not disappear. But the effect of the control variable, geographical mobility, was also significant. (Thus, both independent variables have an effect on the suicide rate.)
In conclusion, the results also support our alternative explanation, which states that geographical mobility has an effect on the suicide rate. People living in the south-east region are more prone to commit suicide not only because of the social or cultural meanings attributed to suicide (socialization: through generations they have learned that this is the right way to cope with stress), but also because they have higher geographical mobility. Thus, geographical mobility increases the risk of suicide, and living in the south-east region of Hungary also increases this risk.
Do not introduce the variable
An independent variable (the control variable) can only distort the effect that another independent variable has on the dependent variable if it correlates with the other independent variable and it also has an effect on the dependent variable.
So, when the new independent variable correlates with the other independent variable but does not have an effect on the dependent variable, we do not have to include it in our model.
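This rule can be checked with a small simulation. The sketch below is only an illustration on made-up data, not part of the original material: the helper names `ols` and `simulate`, the sample size, and all effect sizes (0.8, 1.0, 2.0) are assumptions chosen for the demonstration. The control always correlates with x; only its effect on y is switched on or off.

```python
import random

def ols(y, columns):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def simulate(effect_of_control, n=5000, seed=0):
    """Return b1 of x without and with the control variable in the model."""
    rng = random.Random(seed)
    control = [rng.gauss(0, 1) for _ in range(n)]
    # x correlates with the control variable (assumed strength 0.8)
    x = [0.8 * c + rng.gauss(0, 1) for c in control]
    # the true effect of x on y is 1.0 (assumed)
    y = [1.0 * xi + effect_of_control * ci + rng.gauss(0, 1)
         for xi, ci in zip(x, control)]
    b1_without = ols(y, [x])[1]
    b1_with = ols(y, [x, control])[1]
    return b1_without, b1_with

# Control correlates with x AND affects y: b1 is distorted without it.
print("control has an effect:", simulate(effect_of_control=2.0))
# Control correlates with x but does NOT affect y: b1 barely moves.
print("control has no effect:", simulate(effect_of_control=0.0))
```

In the first call, b1 should shift clearly once the control is added. In the second call, the two b1 values should be nearly identical, which is why the control can be left out.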
How to show two regression models in one coefficient table?
Always raise the question
Are there any other alternative explanations that you can think of besides your original explanation?