BS1.10: Contingency tables
OBJECTIVES
At the end of this section you should be able to test for, and interpret, a relationship between two categorical variables.
When two variables are categorical (for example, binary variables), the chi-squared test is commonly used to test the null hypothesis that the two variables are independent of each other.
Consider the following 2x2 table with binary variable A (as rows) and samples 1 and 2 as columns:
Variable A | Sample #1 | Sample #2 | Total |
---|---|---|---|
+ | a | b | (a + b) |
- | c | d | (c + d) |
Total | (a + c) | (b + d) | n |
The chi-squared test statistic ($\chi^2$) is calculated using the following formula:
$$\chi^2 = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)}$$
Degrees of freedom = 1.
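The formula above can be sketched in Python as follows. The function name `chi2_2x2` is an illustrative assumption rather than part of the course material; its arguments a, b, c and d are the four cell counts laid out as in the table above, and no continuity correction is applied.

```python
def chi2_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    # Marginal totals, named as in the table: rows first, then columns.
    row_plus, row_minus = a + b, c + d
    col_1, col_2 = a + c, b + d
    denominator = row_plus * row_minus * col_1 * col_2
    if denominator == 0:
        # A zero row or column total leaves the statistic undefined.
        raise ValueError("every row and column total must be non-zero")
    return (a * d - b * c) ** 2 * n / denominator
```

A table with identical proportions in both samples, such as `chi2_2x2(10, 10, 10, 10)`, gives a statistic of 0, which is what the null hypothesis of independence predicts on average.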
Suppose we wish to determine whether smoking is associated with gender among farm workers. Both smoking and gender are binary variables, so the chi-squared test is applied:
Smoking status | Female workers | Male workers | Total |
---|---|---|---|
No | 56 (47.5%) | 36 (29.3%) | 92 |
Yes | 62 (52.5%) | 87 (70.7%) | 149 |
Total | 118 (100%) | 123 (100%) | 241 |
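With a = 56, b = 36, c = 62 and d = 87:
$$\chi^2 = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)} = \frac{(56 \times 87 - 36 \times 62)^2 \times 241}{92 \times 149 \times 118 \times 123} \approx 8.4$$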
Degrees of freedom = 1.
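As a check on the arithmetic, the same statistic can be obtained from the general form of the test, the sum of (observed - expected)² / expected over all four cells, where each expected count is row total × column total / n. The sketch below uses the smoking data from the table; the variable names are illustrative assumptions.

```python
# Observed counts from the smoking example (columns: female, male).
observed = [
    [56, 36],  # non-smokers
    [62, 87],  # smokers
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if smoking status and gender were independent.
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

print(round(chi2, 1))  # 8.4
```

For a 2x2 table this sum is algebraically identical to the shortcut formula, so both routes give the same value.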
To obtain the corresponding P-value, compare the calculated value with the critical values of the chi-squared distribution at 1 degree of freedom:
D.F. | P = 0.1 | P = 0.05 | P = 0.025 | P = 0.01 | P = 0.005 |
---|---|---|---|---|---|
1 | 2.71 | 3.84 | 5.02 | 6.63 | 7.88 |
The calculated $\chi^2$ value (8.4) exceeds 7.88, the critical value for P = 0.005, so the corresponding P-value is less than 0.005.
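Instead of bracketing the P-value from the table, it can also be computed directly. With 1 degree of freedom the chi-squared statistic is the square of a standard normal variable, so its upper-tail probability equals erfc(√(χ²/2)). The following is a minimal sketch using only the Python standard library; the helper name `p_value_1df` is an assumption made for illustration.

```python
import math

def p_value_1df(chi2):
    """Upper-tail P-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Critical values at 1 degree of freedom, copied from the table above.
critical = {0.1: 2.71, 0.05: 3.84, 0.025: 5.02, 0.01: 6.63, 0.005: 7.88}

chi2 = 8.4
# Tabulated significance levels whose critical value the statistic exceeds.
exceeded = [alpha for alpha, crit in critical.items() if chi2 > crit]
print(min(exceeded))      # 0.005
print(p_value_1df(chi2))  # roughly 0.004, consistent with P < 0.005
```

The same helper reproduces the tabulated levels to within rounding when given the critical values themselves, which is a convenient sanity check on the table.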
Interpretation: Since the corresponding P-value is less than 0.05 (P < 0.05), we reject the null hypothesis that smoking status and gender are independent. The data suggest that the prevalence of smoking is significantly higher among male farm workers (70.7%) than among female farm workers (52.5%).