Better one control variable in your mind, than 10 in your model.
A key challenge in country-comparative research is the limited number of (control) variables that can be accounted for simultaneously. But then again, I think models often include too many variables anyway. Added without critical thought, a large number of control variables may do more harm than good.
Recently, a good example was discussed by Andrew Gelman, who argued against controlling for post-treatment variables. Gelman responded to a study finding that “The more money that parents provide for higher education, the lower the grades their children earn.” It turned out that the study included a very important post-treatment control: “whether the student is employed during school”. So, the most plausible causal mechanism (parents pay -> students don’t have to work and therefore have more time to study -> better grades) is completely blocked, and therefore cannot drive the parameter estimate.
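To make the blocking argument concrete, here is a minimal simulation sketch. It does not use the data from the study Gelman discussed: the variable names, the effect sizes (-0.8 and -0.5) and the assumption that parental money affects grades only through student employment are all invented for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (all numbers are assumptions):
# more parental money -> fewer hours worked -> better grades.
money = rng.normal(size=n)
work = -0.8 * money + rng.normal(size=n)   # post-treatment variable
grades = -0.5 * work + rng.normal(size=n)  # money has no direct effect here

# Model 1: grades ~ money. Recovers the total effect, (-0.8) * (-0.5) = 0.4.
total_effect = ols(grades, money)[1]

# Model 2: grades ~ money + work. Controlling for the mediator blocks the
# only pathway, so the coefficient on money is close to zero.
controlled_effect = ols(grades, money, work)[1]

print(f"money coefficient without control: {total_effect:.2f}")
print(f"money coefficient with 'work' controlled: {controlled_effect:.2f}")
```

In this toy setup the second model is not wrong as such, but its coefficient answers a different question: the effect of parental money holding employment constant, which by construction is nil.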
In my own research, combining person-level and country-level data, I was limited in the number of control variables I could include. At the person level, this was due to the lack of comparable measurements across countries and over time. At the country level, on the other hand, I could have selected from an abundance of control variables. However, the number of country-year observations was still relatively low, and the specified regression models were quite complex. Hence, the number of controls had to be limited. I decided to control only for labour market structure when I tested the effects of family policies. In my mind, it makes sense to control for factors directly shaping the employment opportunities of women when investigating the effects of policies that also affect these opportunities.
Technically, it has become easy to estimate highly complex regression models with many variables. Doing so often makes a lot of sense, but if one is not careful the results become uninterpretable, or the correct interpretation of a parameter estimate changes, for instance because a post-treatment variable is controlled for. In any case: controls can be great, but only after careful consideration.
This is a series on the 10 propositions that are part of my PhD dissertation. These propositions are a Dutch tradition to highlight key findings of a dissertation and some additional insights by the author. My dissertation is titled “Family Policy Outcomes: Combining Institutional and Demographic Explanations of Women’s Employment and Earnings Inequality in OECD countries, 1975-2005” and I will defend it on January 10, 2014. So, this series is also a countdown. Find out more about my dissertation.