Basic Statistics for Clinicians

1. HYPOTHESIS TESTING

The Null hypothesis

The Null Hypothesis: “The true difference in the effects of the experimental and control treatments on the outcome of interest is zero”

The result of a single experiment will almost always show a difference between experimental and control groups. Is the difference due to chance, or is it large enough to reject the null hypothesis and conclude there is a true difference in treatment effects?

The p value

Statistical tests yield a p value: the probability that the experiment would show a difference as great as or greater than that observed if the null hypothesis were true. By statistical convention, the threshold that separates the plausible from the implausible is five times in 100 (p=0.05).
Statistical significance means that a result is “sufficiently unlikely to be due to chance that we are ready to reject the null hypothesis”.
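
To make the definition concrete, here is a minimal Python sketch built on a hypothetical coin-tossing experiment, where the null hypothesis is that the coin is fair. It computes an exact two-sided p value: the probability, if the null hypothesis were true, of a result at least as far from the expected 5 heads in 10 tosses as the one observed. The function name and the data are illustrative assumptions, not taken from the source.

```python
from math import comb

def two_sided_binomial_p(heads: int, tosses: int) -> float:
    """Exact two-sided p value under the null hypothesis of a fair coin.

    Adds up the probabilities of every possible outcome that is at least
    as far from the expected count (tosses / 2) as the observed count.
    """
    expected = tosses / 2
    observed_distance = abs(heads - expected)
    extreme_outcomes = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - expected) >= observed_distance
    )
    # Each of the 2**tosses sequences is equally likely under the null.
    return extreme_outcomes / 2 ** tosses

# Hypothetical result: 9 heads in 10 tosses.
p = two_sided_binomial_p(9, 10)
print(round(p, 4))  # 0.0215
```

Here 9 heads (or something more extreme in either direction: 0, 1 or 10 heads) would arise about 2 times in 100 with a fair coin, which falls below the conventional 0.05 threshold. A result of 6 heads gives p of about 0.75, which is entirely plausible under the null hypothesis.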

However, a result that is not statistically significant does not prove that the treatments are equivalent: the smaller the sample size, the greater the chance of erroneously concluding that the experimental treatment does not differ from the control. In statistical terms, the power of the test may be inadequate.

Type I Error

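A Type I error occurs when we conclude that there is a difference in outcomes between treatment and control when no such difference exists. The probability of making such an error is designated alpha; it corresponds to the significance threshold (conventionally 0.05).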
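
A small simulation can make alpha concrete. This minimal Python sketch repeatedly generates two groups with the same true event rate, so the null hypothesis is true by construction, and counts how often a pooled two-proportion z-test gives p<0.05. The 30% event rate, 200 patients per group, 2000 simulated trials, the random seed and the function names are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

def two_proportion_z_p(events_a: int, events_b: int, n: int) -> float:
    """Two-sided p value from a pooled two-proportion z-test (both groups of size n)."""
    p_pool = (events_a + events_b) / (2 * n)
    se = (p_pool * (1 - p_pool) * (2 / n)) ** 0.5
    if se == 0:
        return 1.0  # no events (or only events) in both groups: no evidence of a difference
    z = (events_a / n - events_b / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulated_type_i_error(true_rate=0.3, n=200, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated trials declared 'significant' when both groups share one true rate."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups are drawn from the same event rate, so any "difference" is chance.
        events_a = sum(rng.random() < true_rate for _ in range(n))
        events_b = sum(rng.random() < true_rate for _ in range(n))
        if two_proportion_z_p(events_a, events_b, n) < alpha:
            false_positives += 1
    return false_positives / trials

print(simulated_type_i_error())  # close to 0.05
```

Roughly 1 simulated trial in 20 crosses the threshold even though the treatments are identical: that proportion is the Type I error rate.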

Type II Error

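A Type II error occurs when we erroneously fail to reject the null hypothesis, thus falsely concluding that an effective treatment is useless. The probability of making such an error is designated beta, and the power of a study (1 - beta) is the probability of detecting a true difference of a given size. The larger the sample, the lower the risk of a Type II error and the greater the power.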
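
The link between sample size and power can be sketched with the usual normal approximation for comparing two proportions. This is a minimal Python illustration, assuming equal group sizes, a two-sided test at alpha = 0.05, and hypothetical true event rates of 30% with control and 20% with treatment; the function name is illustrative.

```python
from statistics import NormalDist

def approximate_power(p_control: float, p_treatment: float,
                      n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (normal approximation).

    Ignores the negligible chance of rejecting the null in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (p_control * (1 - p_control) / n_per_group
          + p_treatment * (1 - p_treatment) / n_per_group) ** 0.5
    return NormalDist().cdf(abs(p_control - p_treatment) / se - z_crit)

for n in (50, 100, 200, 400):
    power = approximate_power(0.30, 0.20, n)
    # beta (the Type II error probability) is 1 - power.
    print(n, round(power, 2), round(1 - power, 2))
```

With these assumed rates, the approximate power rises from about 0.2 with 50 patients per group to about 0.9 with 400 per group, so a small trial of a genuinely effective treatment would usually miss the effect.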

The Chi² test

A statistical test used to compare proportions, such as the proportion of patients in each group who experience an adverse outcome.
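
A minimal Python sketch of the Pearson chi-square test for a 2×2 table, without the Yates continuity correction that some software applies by default. The trial counts and the function name are hypothetical. A 2×2 table has one degree of freedom, and for one degree of freedom the tail probability can be computed exactly with the complementary error function, so no table lookup is needed.

```python
from math import erfc, sqrt

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                event   no event
        group 1   a        b
        group 2   c        d

    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the null hypothesis were true.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical trial: 15/100 deaths with treatment vs 30/100 with control.
chi2, p = chi_square_2x2(15, 85, 30, 70)
print(round(chi2, 2), round(p, 3))  # 6.45 0.011
```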

Student's t-test

A statistical test that compares the means of a continuous variable between two groups. P values for Student's t-test and similar tests are obtained from standard tables (or, nowadays, computed by statistical software).
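
In keeping with the table-based approach just described, this sketch computes Student's pooled-variance two-sample t statistic and compares it with a tabulated two-sided critical value for alpha = 0.05. The measurements are hypothetical, and the small dictionary holds a few critical values copied from a standard t table; statistical software would instead compute an exact p value.

```python
from statistics import mean, variance

# Two-sided critical values of Student's t for alpha = 0.05,
# from a standard t table, for a few degrees of freedom.
T_CRITICAL_05 = {10: 2.228, 18: 2.101, 20: 2.086, 30: 2.042}

def pooled_t_test(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (divisor n - 1).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se, df

# Hypothetical continuous outcome (for example a laboratory value) in two groups of 10.
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2, 5.9, 6.1, 5.5]
control = [4.6, 4.8, 5.0, 4.7, 5.2, 4.9, 4.5, 5.1, 4.8, 4.6]

t, df = pooled_t_test(treated, control)
significant = abs(t) > T_CRITICAL_05[df]
print(round(t, 2), df, significant)  # 4.97 18 True
```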

Baseline differences

If factors that determine outcome are unequally distributed between groups despite randomization, statistical adjustments can be made, for several variables at once, to yield a p value that can be interpreted in the usual way.
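
Adjustment methods vary. As one simple illustration, not necessarily the method the source has in mind, the Mantel-Haenszel test adjusts a comparison of event rates for a single baseline factor by analysing each stratum of that factor separately and then combining the results; adjusting for several variables at once is typically done with regression models such as logistic regression. The sketch assumes hypothetical counts in two strata of a baseline risk factor and omits the continuity correction.

```python
from math import erfc, sqrt

def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square (1 df, no continuity correction) over several 2x2 tables.

    Each stratum is (treated_events, treated_total, control_events, control_total).
    Returns (chi-square statistic, p value).
    """
    sum_diff = 0.0  # sum over strata of (observed - expected) treated events
    sum_var = 0.0   # sum over strata of the hypergeometric variance
    for a, n1, c, n0 in strata:
        n = n1 + n0
        events = a + c
        non_events = n - events
        expected = n1 * events / n
        sum_diff += a - expected
        sum_var += n1 * n0 * events * non_events / (n ** 2 * (n - 1))
    chi2 = sum_diff ** 2 / sum_var
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical trial stratified by a baseline risk factor: (low risk, high risk).
strata = [(5, 50, 10, 50), (20, 60, 25, 40)]
chi2, p = mantel_haenszel_chi2(strata)
print(round(chi2, 2), round(p, 4))
```

Because each stratum compares like with like, an imbalance in the risk factor between treatment groups no longer distorts the overall comparison.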

Multiple tests

How likely is it that, in six independent tests comparing two similar groups, at least one test would cross the 0.05 threshold by chance alone? The probability of not crossing the 0.05 threshold when testing a single hypothesis is 0.95; when testing two hypotheses, the probability that neither crosses the threshold is 0.95 multiplied by 0.95, and so on, so that for six hypotheses it is 0.95 to the sixth power, which is approximately 0.735. Therefore, when six independent hypotheses are tested, the probability that at least one result is statistically significant is 0.265, or approximately 1 in 4, not 1 in 20.

If we wish to keep the overall boundary for statistical significance at 0.05, we must divide the threshold p value by six, so that each of the six tests uses a boundary of p=0.008. That is, we would reject the overall null hypothesis that none of the characteristics differ only if at least one of the differences was significant at p<0.008.
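
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the formula assumes, as the text does, that the tests are independent and that every null hypothesis is true.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests crosses alpha by chance."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the overall (family-wise) error rate at or below alpha."""
    return alpha / k

k = 6
print(round((1 - 0.05) ** k, 3))                                     # 0.735: no test significant
print(round(familywise_error(0.05, k), 3))                           # 0.265: at least one significant
print(round(bonferroni_threshold(0.05, k), 4))                       # 0.0083
print(round(familywise_error(bonferroni_threshold(0.05, k), k), 3))  # 0.049, just under 0.05
```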

There are several statistical strategies for dealing with multiple hypothesis tests on the same data. Apart from dividing the threshold p value by the number of tests (the Bonferroni correction), we can specify, before the study is undertaken, a single primary outcome on which the main conclusions will hinge; a third approach is to derive a global test statistic that combines the multiple outcomes into a single measure.

Independent

Independent means that the result of a test of one hypothesis does not in any way depend on the results of the tests of any of the other hypotheses.

Acknowledgement

This article is based on: Guyatt G, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians: 1. Hypothesis testing. Can Med Assoc J 1995 Jan;152:27-32.