PlanetMath
Simpson's paradox (Definition)

Before describing what Simpson's paradox is, let's start with a hypothetical example. During a particular summer, an experiment was conducted to determine people's preference between two beverages: soda and lemonade. The data were collected in two locations, a city and a rural area. In each location, each subject's gender and choice of drink were recorded. The results are summarized as follows:

location  gender  lemonade  soda  total  % preferring lemonade  odds ratio
city      female       150   300    450                  33.3%        1.10
city      male         300   660    960                  31.3%
rural     female       285   860   1145                  24.9%        1.10
rural     male          30   100    130                  23.1%
The odds ratio given that location = city is about 1.1, showing that the odds of a female preferring lemonade are about 10% higher than the odds for a male. Because the conditional odds ratio given that location = rural is also about 1.1, the same conclusion holds in the rural area.
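For reference, the arithmetic behind these two values is shown below. This assumes the usual sample odds ratio of a 2 by 2 table with cell counts $n_{ij}$, where row $i$ is gender (female first) and column $j$ is the drink (lemonade first); the entry uses this quantity without writing it out.

$$\hat\theta = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}, \qquad \hat\theta_{\text{city}} = \frac{150 \cdot 660}{300 \cdot 300} = \frac{99000}{90000} = 1.10, \qquad \hat\theta_{\text{rural}} = \frac{285 \cdot 100}{860 \cdot 30} = \frac{28500}{25800} \approx 1.10$$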

Next, combine the results from both locations and form the following 2 by 2 contingency table:

gender  lemonade  soda  total  % preferring lemonade  odds ratio
female       435  1160   1595                  27.3%        0.86
male         330   760   1090                  30.3%
The odds ratio of 0.86 shows that, once the two locations are pooled, the odds of a female preferring lemonade are about 14% lower than the odds for a male, rather than about 10% higher as within each location! This is an example of Simpson's paradox.

In general, Simpson's paradox illustrates the effect that omitting a categorical explanatory variable $Z$ can have on the measure of association between a categorical explanatory variable $X$ and a categorical response variable $Y$: the marginal association can even point in the opposite direction from every conditional association.

In the example, given the location variable $Z$, the conditional odds ratios show that the gender variable $X$ and the drink-choice response variable $Y$ have a positive association in each location, with positive log-odds ratios. However, when the location variable $Z$ is ignored and the data are collapsed over it, the marginal association between $X$ and $Y$ is negative, with a negative log-odds ratio.
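The comparison of log-odds ratios can be reproduced with a short script. This is a minimal sketch using the counts from the tables above; the names `strata`, `odds_ratio`, and `collapse` are hypothetical helpers introduced only for illustration. It computes the conditional odds ratio in each location, collapses the tables over location by adding them cell by cell, and checks whether the marginal log-odds ratio has the opposite sign from all of the conditional ones.

```python
import math

# Each 2x2 table is ((female_lemonade, female_soda), (male_lemonade, male_soda)).
# Counts come from the entry's stratified table; the dict keys are illustrative labels.
strata = {
    "city": ((150, 300), (300, 660)),
    "rural": ((285, 860), (30, 100)),
}


def odds_ratio(table):
    """Sample odds ratio n11*n22 / (n12*n21) of a 2x2 table."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


def collapse(tables):
    """Add 2x2 tables cell by cell, i.e. ignore the stratifying variable Z."""
    total = [[0, 0], [0, 0]]
    for table in tables:
        for i in range(2):
            for j in range(2):
                total[i][j] += table[i][j]
    return tuple(tuple(row) for row in total)


conditional = {z: odds_ratio(t) for z, t in strata.items()}
marginal_table = collapse(strata.values())
marginal = odds_ratio(marginal_table)

for z, theta in conditional.items():
    print(f"{z}: OR = {theta:.3f}, log OR = {math.log(theta):+.3f}")
print(f"pooled: OR = {marginal:.3f}, log OR = {math.log(marginal):+.3f}")

# Sign reversal: all conditional log-odds ratios share one sign,
# and the marginal log-odds ratio has the opposite sign.
signs = {math.copysign(1, math.log(t)) for t in conditional.values()}
reversal = len(signs) == 1 and math.copysign(1, math.log(marginal)) not in signs
print("sign reversal:", reversal)
```

Run as is, the script reports conditional odds ratios of about 1.10 in both locations, a pooled table of 435, 1160, 330, 760 with an odds ratio of about 0.86, and a detected sign reversal.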

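The apparent paradox arises because the city and rural samples differ in composition. In the rural area the majority of the test subjects are female (1145 of 1275), whereas in the city the majority are male (960 of 1410). Lemonade is also less popular in the rural area (315 of 1275, about 24.7%) than in the city (450 of 1410, about 31.9%), so pooling the data places most of the females in the location where lemonade is less popular, which pulls down their overall lemonade rate.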
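The mechanism can be made precise with the law of total probability applied to the sample proportions: each pooled lemonade rate is a weighted average of the location-specific rates, weighted by how that gender is split across the locations. Using the counts from the tables above:

$$\hat P(Y = \text{lemonade} \mid X = x) = \sum_{z} \hat P(Y = \text{lemonade} \mid X = x, Z = z)\,\hat P(Z = z \mid X = x)$$

$$\hat P(\text{lemonade} \mid \text{female}) = \frac{150}{450}\cdot\frac{450}{1595} + \frac{285}{1145}\cdot\frac{1145}{1595} = \frac{435}{1595} \approx 27.3\%$$

$$\hat P(\text{lemonade} \mid \text{male}) = \frac{300}{960}\cdot\frac{960}{1090} + \frac{30}{130}\cdot\frac{130}{1090} = \frac{330}{1090} \approx 30.3\%$$

About 72% of the females (1145 of 1595) carry the lower rural rate of 24.9%, while about 88% of the males (960 of 1090) carry the higher city rate of 31.3%. Even though females have the higher rate within each location, these unequal weights are enough to reverse the pooled comparison.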

For an excellent explanation of Simpson's paradox, please refer to the book below.

Bibliography

[1] A. Agresti, An Introduction to Categorical Data Analysis, John Wiley & Sons, New York (1996).



"Simpson's paradox" is owned by CWoo.

This is version 5 of Simpson's paradox, born on 2004-10-06, modified 2008-04-15.
Object id is 6313, canonical name is SimpsonsParadox.

Classification:
AMS MSC 62H17 (Statistics :: Multivariate analysis :: Contingency tables)
