7
STEP PROJECT
SIMPLE EXAMPLE OF HOW TO ANALYSE YOUR DATA
You will need to collect data and enter it into a statistical package, e.g. Stata 6, which you were shown in Block 1. Data entry may be easier in Excel, in which case you can use Stat/Transfer to move the data from Excel into Stata 6. You will then need to analyse your data.

The following analyses should be done:

Show the difference between who you set out to study and who you ended up studying, or the response rate if this is appropriate (for example, you wanted to study 100 people or things but could only get 60 to respond or be studied).

Describe the demographics of your populations of interest (exposed and unexposed).

List all your main variables of interest in terms of your hypothesis, aims and objectives, and describe these statistically. You will need to think about which variables are continuous (age) and which are categorical (gender = male or female).

The link to the article shows many variables, which are often quite complex in the way they are put together from the basic measurements. To simplify matters we will look additionally at some selected variables from this study. In general you need to think about your variables of interest as exposure, outcome or confounding/nuisance variables.

Exposure is measured as the average intensity of exposure to manganese for a particular individual across their lifetime in the manganese industry, in mg/m³. This has been rendered as a continuous variable (aint) and a categorical variable (highexp) for the purposes of demonstration. Where highexp = 1, the subjects have high exposure, above the median exposure of 0.4 mg/m³ for the whole group of subjects. If subjects have exposure below 0.4 mg/m³ then highexp = 0. The hyperlink above shows the summary details for both variables from Stata outputs (summarize for continuous and tabulate for dichotomous).

Outcome is the speed of reaction to a stimulus and is measured as mean reaction time in milliseconds.
This has been rendered as a continuous variable (mrt) and a categorical variable (highout) for the purposes of demonstration. Where highout = 1, the subjects have slow reaction times (high values), above the median reaction time of 288 milliseconds for the whole group of subjects. If subjects have fast reaction times below 288 milliseconds then highout = 0. The hyperlink above shows the summary details for both variables from Stata outputs (summarize for continuous and tabulate for dichotomous).

An important potential confounder variable is educational level, which is measured as years of schooling passed. This has been rendered as a continuous variable (stdpass) and a categorical variable (highstd) for the purposes of demonstration. Where highstd = 1, the subjects have a high number of years of education, above the median of 5 years for the whole group of subjects. If subjects have fewer than 5 years of education then highstd = 0. The hyperlink above shows the summary details for both variables from Stata outputs (summarize for continuous and tabulate for dichotomous).

Summarise these variables by describing their distributions or frequency distributions. Decide whether they are normally distributed or not. One clue is to see how close the median (50th percentile) is to the mean value for the distribution. Another way is to do a formal test such as the Shapiro-Wilk test. In this case the p-value is < 0.05, which indicates that the distribution of mrt is not normal. This can be checked against the values for skewness and kurtosis, which are both high: for a normal distribution the skewness should be close to 0 and the kurtosis close to 3 (Stata reports kurtosis on the scale where a normal distribution scores 3, not 0). Normality is important for outcome variables if you are doing correlation or regression analysis. You can see that mrt is not really normally distributed. We may be able to get away with a correlation or regression analysis, but to be safe we also need to analyse mrt as a dichotomous variable.
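The descriptive checks above (mean against median, skewness, kurtosis, and a split at the median) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reaction times are invented and are not the study data, the helper names `describe` and `median_split` are hypothetical, and skewness and kurtosis use the simple moment definitions, under which a normal distribution scores about 0 and 3.

```python
import statistics


def describe(values):
    """Return n, mean, median, skewness and kurtosis of a list of numbers.

    Skewness is m3 / m2**1.5 and kurtosis is m4 / m2**2, where m2, m3
    and m4 are central moments with divisor n. A normal distribution
    gives roughly 0 and 3 respectively.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def median_split(values):
    """Code each value 1 if it lies above the group median, else 0."""
    cut = statistics.median(values)
    return [1 if x > cut else 0 for x in values]


# Hypothetical reaction times in milliseconds (NOT the study data):
# mostly fast responses plus a few very slow ones, i.e. a right skew.
mrt_example = [250, 260, 270, 275, 280, 288, 295, 310, 340, 420, 610]

summary = describe(mrt_example)        # mean well above median here
high_out = median_split(mrt_example)   # analogue of the highout variable

print(summary)
print(high_out)
```

With these invented values the mean sits well above the median and the skewness is positive, which is the pattern the text describes for mrt. A formal test such as Shapiro-Wilk is not reproduced here; it is best left to the statistical package.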
The shape of the distribution of the exposure and confounder variables is not important for correlation and regression analysis.

After describing your variables, the next thing to do is to look at some associations between exposures and health outcomes to see if there is any exposure effect. First we look at the crude association between the continuous variables mrt and aint. This shows a correlation coefficient of 0.16, which is a modest correlation (the size of a correlation coefficient runs from 0 to 1) but is highly significant (p < 0.0002). So there appears to be a crude effect. This association can also be looked at and presented graphically by means of a two-way scatterplot. There is a suggestion of a weak upward trend in mrt with increasing exposure. This second example, for one of the other neurobehavioural test score results, shows more of a visual trend.

We can also look at the association between highout and highexp, treating both as dichotomous variables, to get around any violation of the assumption of normality for the outcome variable. This shows a significant crude (unadjusted) odds ratio, so there is a crude effect when assumptions of normality are dispensed with.

You will note that exposure is also significantly associated with years of schooling, and that years of schooling is associated in its own right with outcome. This means that years of schooling is very likely to confound the effect (the association between exposure and outcome). So we need to examine exposure effects adjusted for the presence of the confounder. The easiest way to do this is to stratify on the confounding variable (highstd) and look at the association between exposure and outcome within each stratum of years of schooling. For high schooling years (highstd = 1) there is a significant effect, with an odds ratio near 2 and a low p-value. For low schooling years (highstd = 0) there is an odds ratio of 1.6 with a p-value that just misses being significant.
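To make the crude and stratum-specific odds ratios concrete, here is a minimal Python sketch. The cell counts are invented for illustration and are not the study data, so the resulting numbers will not match those quoted in the text. The function name `odds_ratio` is hypothetical, and the confidence interval uses the Woolf (log odds ratio) approximation, which may differ from the interval method your statistical package uses.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio and approximate 95% confidence interval for a 2x2 table.

    a = exposed with outcome,    b = exposed without outcome
    c = unexposed with outcome,  d = unexposed without outcome
    The interval uses the Woolf (log odds ratio) approximation.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts (NOT the study data). Each tuple is (a, b, c, d)
# for highout by highexp within one stratum of highstd.
strata = {
    "highstd = 1": (20, 40, 25, 90),
    "highstd = 0": (60, 40, 45, 50),
}

# The crude table is the strata added together cell by cell.
crude = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or, crude_lo, crude_hi = odds_ratio(*crude)
print(f"crude: OR = {crude_or:.2f} ({crude_lo:.2f} to {crude_hi:.2f})")

# Stratum-specific odds ratios: the exposure effect within each level
# of the confounder.
stratum_or = {}
for label, cells in strata.items():
    stratum_or[label], lo, hi = odds_ratio(*cells)
    print(f"{label}: OR = {stratum_or[label]:.2f} ({lo:.2f} to {hi:.2f})")
```

In this invented example the crude odds ratio is somewhat larger than either stratum-specific odds ratio, which is the pattern you would expect when a confounder inflates the crude effect.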
Remember from the above that the crude odds ratio (unadjusted for the confounder) was 1.9. These values are all quite close to each other, and having done this analysis we are inclined to believe that the effect we see between exposure and outcome here is real and is not due to the effect of the confounder. We do see, however, that the crude odds ratio is somewhat higher than the stratum-specific odds ratios obtained once the effect of the confounder is removed. So the effect of years of schooling, mixed in with the exposure effect, has been to make the exposure effect look bigger than it actually is.

In the case where your outcome variable is continuous and the exposure variable is dichotomous, you can do a t-test to examine mean differences between the two exposure categories. This is the crude effect. The stratum-specific effects (highstd = 1 and highstd = 0) also show significant but smaller mean differences.

Revisit your Statistics lectures from Block 1 to see how to test hypotheses and to calculate measures of association.
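The t-test comparison of a continuous outcome between two exposure categories can be sketched as follows. This is an illustration only: the reaction times are invented, the function name `pooled_t` is hypothetical, and the sketch assumes the equal-variance (pooled) form of the two-sample t-test. It returns the t statistic and degrees of freedom without a p-value, which you would read from t tables or from your statistical package.

```python
import math
import statistics


def pooled_t(group0, group1):
    """Mean difference, t statistic and degrees of freedom.

    Equal-variance (pooled) two-sample t-test comparing group1
    with group0.
    """
    n0, n1 = len(group0), len(group1)
    mean0, mean1 = statistics.fmean(group0), statistics.fmean(group1)
    # Sample variances (divisor n - 1).
    var0, var1 = statistics.variance(group0), statistics.variance(group1)
    df = n0 + n1 - 2
    pooled_var = ((n0 - 1) * var0 + (n1 - 1) * var1) / df
    se = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    diff = mean1 - mean0
    return diff, diff / se, df


# Hypothetical mean reaction times in milliseconds (NOT the study data).
mrt_low_exposure = [270, 280, 285, 290, 300]    # highexp = 0
mrt_high_exposure = [290, 300, 310, 315, 335]   # highexp = 1

diff, t_stat, df = pooled_t(mrt_low_exposure, mrt_high_exposure)
print(f"mean difference = {diff:.1f} ms, t = {t_stat:.2f}, df = {df}")
```

To look at stratum-specific effects, the same function could be applied separately to the subjects with highstd = 1 and those with highstd = 0.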