Rense Nieuwenhuis » Simpson’s Paradox

Sex discrimination in graduate admissions? A real-life aggregation paradox

Rense Nieuwenhuis — Mon, 07 Jun 2010 10:00:59 +0000

A 1975 study on graduate admissions at Berkeley found that male applicants had a substantially higher likelihood of being admitted than female applicants. However, upon closer examination, the presence of an aggregation paradox means that this finding does not legitimize the conclusion that women were discriminated against.

In an attempt to study whether or not sex inequalities in higher education are due to discrimination, one may want to study individual admissions, and compare whether a man and a woman on average have an equal opportunity of being admitted after applying for a graduate education. Sure, we need to take into account all possibly relevant differences between the candidates, such as prior academic performance, motivation, experience, and whatever other aspects may be relevant in the admission procedure. But then, when comparing (statistically) identical men and women, finding that men have a higher likelihood of being admitted to a graduate program than women would lead many to the conclusion that this university discriminates against women. Right?

I believe that if such a research finding were published about a university, a great many of us would indeed suspect, or even believe, that women are discriminated against at this university. In public opinion, the news, talk shows, and the blogosphere, I can easily imagine this university being accused of sex discrimination. Possibly Bickel, Hammel, and O’Connell did so as well, initially, when they found that women indeed had a substantially smaller chance of being admitted to a graduate program at Berkeley (the study was carried out during the fall of 1973 and published in 1975).

After having determined that 44% of all male applicants were admitted to Berkeley, but only 35% of all female applicants, the authors decided to find out which departments were the culprits. Perhaps some departments discriminated against women more strongly than other departments? Fortunately, the data on Berkeley allowed the authors to study the likelihood of being admitted for each department separately. Much to their surprise, however, they found only a few departments to be biased, and the number of departments biased towards men equalled the number biased towards women!

The paradox is clear: at the level of the departments men and women had about an equal chance of being admitted to a graduate programme, but still this resulted in the finding that in this university as a whole, a man had a substantially higher likelihood of being admitted than a woman.

A solution to this paradox lies in the aggregation: at different levels of aggregation (university as a whole vs. individual departments), the association between an applicant’s sex and their likelihood of being admitted differs. This is an example of Simpson’s Paradox. As I illustrated in an earlier post, this paradox can occur when two or more sub-populations are aggregated and one or more of the associated characteristics is not equally prevalent amongst the sub-populations.

As it turned out, Bickel, Hammel, and O’Connell found that women applied relatively frequently to departments that reject more applicants in general. So, even though these departments were not biased towards either men or women, they rejected a great number of applicants, and therefore also a proportionally large number of women. Thus, while the departments on the whole did not discriminate against women, the selection of women towards specific departments resulted in a lower overall likelihood for women of being admitted to Berkeley.
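The mechanism can be made concrete with a small sketch. The counts below are hypothetical, invented purely for illustration; they are not the figures reported by Bickel, Hammel, and O’Connell, and the department names are placeholders. Both made-up departments admit men and women at exactly the same rate, yet the pooled rate for women comes out lower because most women apply to the department that rejects more applicants.

```python
# Hypothetical applicant counts, invented for illustration only.
# These are NOT the figures from Bickel, Hammel, and O'Connell (1975).
# Layout: department -> sex -> (applicants, admitted)
applications = {
    "low_rejection": {"men": (800, 480), "women": (200, 120)},
    "high_rejection": {"men": (200, 40), "women": (800, 160)},
}


def admission_rate(applicants, admitted):
    """Share of applicants who were admitted."""
    return admitted / applicants


def department_rates(data):
    """Admission rate per department and sex."""
    return {
        dept: {sex: admission_rate(*counts) for sex, counts in by_sex.items()}
        for dept, by_sex in data.items()
    }


def overall_rates(data):
    """Admission rate per sex after pooling all departments."""
    totals = {}
    for by_sex in data.values():
        for sex, (applicants, admitted) in by_sex.items():
            pooled_applicants, pooled_admitted = totals.get(sex, (0, 0))
            totals[sex] = (pooled_applicants + applicants,
                           pooled_admitted + admitted)
    return {sex: admission_rate(*counts) for sex, counts in totals.items()}


per_department = department_rates(applications)
overall = overall_rates(applications)

for dept, rates in per_department.items():
    print(dept, rates)
print("overall", overall)
```

In this made-up example every department treats both sexes identically, so the gap in the pooled rates is produced entirely by where men and women apply.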

Is this to say that no discrimination against women takes place? No, for as the authors conclude: “Women are shunted by their socialization and education towards fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects” (p. 403). So, inequality and possibly discrimination remained, but this analysis showed that the unequal likelihood of a woman being admitted to Berkeley could not legitimize the conclusion that the departments, on average, showed a bias towards men. If there was discrimination, it was somewhere else.

Bickel PJ, Hammel EA, & O’Connell JW (1975). Sex Bias in Graduate Admissions: Data from Berkeley. Science (New York, N.Y.), 187 (4175), 398-404. PMID: 17835295

Simpson’s Paradoxical Card Trick

Rense Nieuwenhuis — Mon, 31 May 2010 10:00:03 +0000

Imagine this card trick. A statistician divides a regular deck of cards into two sets: one of 20 and one of 32 cards. Next, he urges two groups of students to investigate the cards, and hands out one set of cards to each of the groups. Both groups start counting the cards, and cross-tabulating the numbers in the various ways they can come up with. Quite rapidly, a member of the group with 20 cards observes an interesting pattern: amongst the cards his group is studying, there is a disproportionally large number of black court cards. A hypothesis is formulated: could it be that their ‘blackness’ causes them to be ‘court’ cards more frequently?

The results are drawn up in a table and shown to the entire group:

	Plain	Court	Total
Red	8 (66.7%)	4 (33.3%)	12 (100%)
Black	5 (62.5%)	3 (37.5%)	8 (100%)
Total	13 (65.0%)	7 (35.0%)	20 (100%)
Odds Ratio				1.2

The math is correct: the numbers add up to 20, so no card was missed or counted twice. For proper interpretation, the percentages were calculated by row. The percentage of ‘court’ cards amongst the black ones (37.5%) is larger than amongst the red cards (33.3%). To make sure, an odds ratio was computed as well: (8 * 3) / (5 * 4) = 1.2. This measure of association thus also indicates ‘black’ causing ‘court’. The group was satisfied with the support for their hypothesis.
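The group's arithmetic can be retraced in a few lines of Python. This is only a sketch: the function names `row_percentages` and `odds_ratio` are made up for this example, and the counts are the ones from the table of the 20-card set.

```python
def row_percentages(row):
    """Percentages within a single row, rounded to one decimal."""
    total = sum(row)
    return [round(100 * count / total, 1) for count in row]


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table laid out as
    [(red plain, red court), (black plain, black court)]."""
    (red_plain, red_court), (black_plain, black_court) = table
    return (red_plain * black_court) / (black_plain * red_court)


# Counts from the 20-card set: rows are red and black,
# columns are plain and court.
small_set = [(8, 4), (5, 3)]

print(row_percentages(small_set[0]))  # red row
print(row_percentages(small_set[1]))  # black row
print(odds_ratio(small_set))
```

An odds ratio above 1 in this layout means that the odds of being a court card are higher for black cards than for red cards.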

“Oh, come on!” a member of the other group cries out, “Use your common sense. Every deck of cards has an equal number of black and red cards, and the court cards are distributed equally over the colors. If you find a disproportionally large number of black court cards, we should simply find a correspondingly large number of red court cards in our set. Right?” And so, a contrasting hypothesis was raised. The second group quickly tabulated their set of cards, and produced the following table:

	Plain	Court	Total
Red	12 (85.7%)	2 (14.3%)	14 (100%)
Black	15 (83.3%)	3 (16.7%)	18 (100%)
Total	27 (84.4%)	5 (15.6%)	32 (100%)
Odds Ratio				1.2

How could that be? To their surprise, the only valid conclusion seems to be that in this subset of cards a disproportionally large number of black court cards is present. Or, in a more liberal interpretation of the findings, amongst these cards it was found again that their ‘blackness’ causes them to be a ‘court’ card. This resulted in a very puzzling situation: a deck of cards was split into two subsets and in both sets a positive association was found between the cards’ ‘blackness’ and them being a ‘court’ card. Could it be that the deck of cards was rigged?

The two groups teamed up, and aggregated their two tables:

	Plain	Court	Total
Red	20 (76.9%)	6 (23.1%)	26 (100%)
Black	20 (76.9%)	6 (23.1%)	26 (100%)
Total	40 (76.9%)	12 (23.1%)	52 (100%)
Odds Ratio				1.0

No, the deck of cards was not rigged. The numbers are correct, since the aggregated numbers represent a typical deck of cards: 40 plain cards, 12 court cards, and an equal number of red and black cards. Also, in this contingency table, absolutely no association is present between the color of a card and its being ‘court’ or ‘plain’.
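The aggregation step is easy to verify. The sketch below adds the two subset tables cell by cell and recomputes the odds ratio for each of the three tables; the helper names `combine` and `odds_ratio` are invented for this illustration.

```python
# Rows are red and black, columns are plain and court.
small_set = [[8, 4], [5, 3]]    # the 20-card set
large_set = [[12, 2], [15, 3]]  # the 32-card set


def combine(table_a, table_b):
    """Add two contingency tables of the same shape cell by cell."""
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(table_a, table_b)
    ]


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table."""
    (red_plain, red_court), (black_plain, black_court) = table
    return (red_plain * black_court) / (black_plain * red_court)


full_deck = combine(small_set, large_set)

print(full_deck)
for name, table in [("small", small_set),
                    ("large", large_set),
                    ("full", full_deck)]:
    print(name, odds_ratio(table))
```

Both subsets give an odds ratio above 1, while the combined table gives exactly 1: the association found in each subset is gone once the full deck is considered.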

This must be magic: in the complete deck of cards no association is present, while in the two subsets of cards a positive association is found. Since this association is in the same direction in both subsets, we cannot simply argue that the two associations cancel each other out upon aggregation. But it is no magic, it’s statistics: the two sets of cards were selected by the statistician E.H. Simpson (1951).

Of course, the cards in the two subsets were selected carefully, so that the smaller subset had both a disproportionally small number of black cards and a disproportionally large number of court cards. Simpson selected these subsets to illustrate a paradox that is as fascinating as it is relevant to our analytical practice. Finding associations (with correct math!) in subsets when no such association is present in the aggregated set is so counter-intuitive that we can easily make a mistake.

Now imagine that the cards in Simpson’s deck represent observations in the project you are currently working on. The deck of cards represents the overall population you are interested in, and the two subsets represent two sub-populations you might be studying separately (we often do, for instance if we have separate samples). Even if your sub-populations encompass all people in the population (e.g. you are studying men and women separately), and even if the findings in both sub-populations are consistent, you should not simply conclude that these consistent findings hold true in the complete population. With Simpson’s paradox in mind, you know you’re just one (dis)aggregation away from being led astray.

Simpson, E.H. (1951). The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society. Series B (Methodological), 13 (2), 238-241.