BS1.10: Contingency tables

OBJECTIVES

At the end of this section you should be able to test for, and interpret, a statistically significant relationship between two categorical variables.

Chi-squared test: relationship between two categorical variables

When two variables are categorical (binary variables in the simplest case), the Chi-squared test is commonly used to test the null hypothesis that the two variables are independent of each other.

Consider the following 2x2 table with binary variable A (as rows) and samples 1 and 2 as columns:

Variable A    Sample #1    Sample #2    Total
+             a            b            (a + b)
-             c            d            (c + d)
Total         (a + c)      (b + d)      n

The Chi-squared test value ($\chi^2$) is calculated using the following formula, where n = a + b + c + d:

$$\chi^2 = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)}$$

Degrees of freedom = 1.
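
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `chi_squared_2x2` and the counts used in the usage line are assumptions made for this example only and are not data from the course.

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 table laid out as

           Sample #1   Sample #2
        +      a           b
        -      c           d

    using the shortcut formula (ad - bc)^2 * n / product of the four margins.
    """
    n = a + b + c + d
    row_pos, row_neg = a + b, c + d      # row totals
    col_1, col_2 = a + c, b + d          # column totals
    denominator = row_pos * row_neg * col_1 * col_2
    if denominator == 0:
        # The statistic is undefined when any row or column total is zero.
        raise ValueError("every row and column total must be non-zero")
    return (a * d - b * c) ** 2 * n / denominator


# Hypothetical counts, chosen only to show how the function is called.
print(round(chi_squared_2x2(10, 20, 30, 40), 2))
```

This is the uncorrected statistic, matching the formula in the text. Some statistical software applies a continuity correction to 2x2 tables by default, which gives a slightly smaller value.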

Example:

Suppose we wish to determine the relationship between smoking and gender among farm workers. Both smoking and gender are binary variables, so the Chi-squared test is applied:

Smoking status    Female workers    Male workers    Total
No                56 (47.5%)        36 (29.3%)      92
Yes               62 (52.5%)        87 (70.7%)      149
Total             118 (100%)        123 (100%)      241

$$\chi^2 = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)} \approx 8.4$$

Degrees of freedom = 1.
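
For readers who want to check the arithmetic, the substitution can be written out step by step. Here a = 56, b = 36, c = 62 and d = 87 are read from the table, the row totals are 92 and 149, the column totals are 56 + 62 = 118 and 36 + 87 = 123, and n = 241:

```latex
\begin{aligned}
\chi^2 &= \frac{(56 \times 87 - 36 \times 62)^2 \times 241}{92 \times 149 \times 118 \times 123} \\
       &= \frac{(4872 - 2232)^2 \times 241}{198\,957\,912} \\
       &= \frac{1\,679\,673\,600}{198\,957\,912} \\
       &\approx 8.44
\end{aligned}
```

Rounded to one decimal place this gives the value 8.4 quoted above.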

To obtain the corresponding P-value, compare the calculated value with the critical values of the Chi-squared distribution at 1 degree of freedom:

        P-value
D.F.    0.1     0.05    0.025   0.01    0.005
1       2.71    3.84    5.02    6.63    7.88

The calculated $\chi^2$ value (8.4) exceeds 7.88, the critical value for P = 0.005, so the corresponding P-value is less than 0.005.
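
Instead of reading a table, the P-value can also be computed directly. The sketch below relies on a standard identity that holds only for 1 degree of freedom: a Chi-squared variable with 1 d.f. is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). The function name `chi2_p_value_df1` is an assumption made for this example.

```python
import math


def chi2_p_value_df1(chi2_value):
    """Upper-tail P-value for a Chi-squared statistic with 1 degree of freedom.

    Valid for 1 d.f. only: P(X > x) = erfc(sqrt(x / 2)).
    """
    if chi2_value < 0:
        raise ValueError("a Chi-squared statistic cannot be negative")
    return math.erfc(math.sqrt(chi2_value / 2.0))


# The statistic from the smoking and gender example in the text.
p = chi2_p_value_df1(8.4)
print(f"P = {p:.4f}")
print("P < 0.005:", p < 0.005)
```

Applying the same function to the critical values in the table (for example 3.84 or 7.88) should return approximately the P-values in the header row, which is a quick way to check the sketch against the table.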

Interpretation: Since the corresponding P-value is less than 0.05 (P<0.05), we reject the null hypothesis that smoking and gender are independent. The data suggest that the prevalence of smoking is significantly higher among male farm workers (70.7%) than among female farm workers (52.5%).




Creative Commons License
General Introduction to Occupational Health: Occupational Hygiene, Epidemiology & Biostatistics by Prof Jonny Myers is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 South Africa License.