Introductory Biostatistics

When both variables are categorical (here, binary), the Chi-squared test is commonly used to test the null hypothesis that the two variables are independent of each other.

Consider the following 2x2 table with binary variable A (as rows) and samples 1 and 2 as columns:

Variable A	Sample #1	Sample #2	Total
+	a	b	(a + b)
-	c	d	(c + d)
Total	(a + c)	(b + d)	n

Example:

Suppose we wish to determine the relationship between smoking and gender among farm workers. Both smoking and gender are binary variables, so the Chi-squared test is applied:

Smoking status	Female workers	Male workers	Total
No	56 (47.5%)	36 (29.3%)	92
Yes	62 (52.5%)	87 (70.7%)	149
Total	118 (100%)	123 (100%)	241

χ² = [(ad - bc)² n] / [(a + b) (c + d) (a + c) (b + d)]
χ² = 8.4
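As a check on the arithmetic, the shortcut formula above can be sketched in a few lines of Python. The function name chi_squared_2x2 is a hypothetical helper introduced only for illustration; the cell labels a, b, c, d follow the 2x2 layout shown earlier, and no continuity correction is applied, matching the formula as written.

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 table using the shortcut formula.

    a, b are the cells of the first row; c, d are the cells of the second row.
    No continuity correction is applied.
    """
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return numerator / denominator


# Smoking example: row 1 = non-smokers, row 2 = smokers;
# column 1 = female workers, column 2 = male workers.
chi2 = chi_squared_2x2(56, 36, 62, 87)
print(round(chi2, 1))  # 8.4
```

With the counts from the smoking table, the numerator is (56 × 87 - 36 × 62)² × 241 and the denominator is 92 × 149 × 118 × 123, which gives approximately 8.4.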

Degrees of freedom = 1.

To obtain the corresponding P-value, compare the calculated value with the critical values of the Chi-squared distribution at 1 degree of freedom:

P-value
D.F.	0.1	0.05	0.025	0.01	0.005
1	2.71	3.84	5.02	6.63	7.88

The calculated χ² value (8.4) exceeds 7.88, the critical value for P = 0.005, so the corresponding P-value is less than 0.005.

Interpretation: Since the corresponding P-value is less than 0.05 (P<0.05), we reject the null hypothesis that smoking and gender are independent. The data suggest that the prevalence of smoking is significantly higher among male farm workers (70.7%) than among female farm workers (52.5%).
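The table lookup can also be sketched in Python. This is a minimal illustration that assumes 1 degree of freedom only: in that case the Chi-squared variable is the square of a standard normal variable, so the upper-tail probability can be obtained from the complementary error function in the standard library. The names p_value_df1 and critical_values are hypothetical and introduced only for this example; the critical values are copied from the table above.

```python
import math


def p_value_df1(chi2):
    """Upper-tail P-value of the Chi-squared distribution with 1 degree of freedom.

    Valid for 1 degree of freedom only: P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Critical values at 1 degree of freedom, keyed by P-value (from the table).
critical_values = {0.1: 2.71, 0.05: 3.84, 0.025: 5.02, 0.01: 6.63, 0.005: 7.88}

chi2 = 8.4  # calculated value from the smoking example

# Smallest tabulated P-value whose critical value is exceeded by chi2.
exceeded = [p for p, crit in critical_values.items() if chi2 > crit]
bound = min(exceeded) if exceeded else None

p = p_value_df1(chi2)
print(f"P < {bound}" if bound is not None else "P > 0.1")
print(f"exact P-value (1 d.f.): {p:.4f}")
```

For the smoking example the lookup reports P < 0.005, in agreement with the table, and the exact value is a little under 0.004.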

OBJECTIVES

Chi-square test: Relationship between two categorical variables: