Basic Statistics for Clinicians

CORRELATION AND REGRESSION

Correlation and regression help us to understand the relation between variables and to predict a patient's status in regard to a particular variable of interest. Correlation examines the strength of the relation between two variables or phenomena, neither of which is considered the variable one is trying to predict (the target variable). Regression refers to the statistical techniques used for making a prediction or causal inference: regression analysis examines the ability of one or more factors, called independent variables, to predict a patient's status in regard to a target or dependent variable. Independent and dependent variables may be continuous (taking a wide range of values) or binary (dichotomous, yielding yes-or-no results). Regression models can be used to construct clinical prediction rules that help to guide clinical decisions. In considering regression and correlation, clinicians should pay more attention to the magnitude of the correlation or the predictive power of the regression than to whether the relation is statistically significant.

Correlation

A relation is strong when patients who obtain high, intermediate or low scores on the first variable obtain correspondingly high, intermediate or low scores on the second variable. The strength of the relation can be summarized in a single number, the correlation coefficient (r). This can range from -1.0 (the strongest possible negative relation, in which the patient with the highest score on one test has the lowest score on the other) to 1.0 (the strongest possible positive relation, in which the patient with the highest score on one test also has the highest score on the other). A correlation coefficient of 0 denotes no relation at all: patients with a high score on the first test have the same range of scores on the second test as those with a low score on the first test. The p value for r is calculated under the null hypothesis that the true correlation between the two measures is 0. The smaller the p value, the less likely it is that chance alone explains the observed relation between the two measures.
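To make the coefficient concrete, here is a minimal sketch of how r could be computed for paired scores. It assumes the usual Pearson product-moment definition (the text does not say which correlation coefficient is meant). The function name pearson_r and the scores for the two tests are hypothetical illustrations, not data from the article.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length score lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired observations are needed")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five patients on two tests (illustrative only).
test_a = [310, 420, 365, 500, 280]
test_b = [42, 61, 50, 70, 39]
print(round(pearson_r(test_a, test_b), 2))
```

A result near 1.0 means that patients who score high on one test tend to score high on the other. The sketch reports only the magnitude of the relation and does not compute the p value.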

Regression

As clinicians we are interested in prediction: which patients will get a disease, and which will not. Regression analysis is useful in addressing these sorts of issues. When regression analysis assumes a straight-line fit between the independent and dependent variables, and the dependent variable is continuous, we refer to the analysis as linear regression. When the dependent variable is dichotomous, the term logistic regression may be used to refer to such models because they are based on logarithmic equations (they model the logarithm of the odds of the outcome).
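The two kinds of model can be sketched as follows. This is a minimal illustration that assumes a single independent variable. The function fit_line estimates a straight line by ordinary least squares, a standard method that the text does not name. The function logistic_probability shows how a logistic model converts a linear predictor on the log-odds scale into a probability between 0 and 1. All function names, data and coefficients are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the independent variable has no variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def logistic_probability(intercept, slope, x):
    """Probability of the outcome when the log odds are intercept + slope * x."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))


# Linear regression: continuous dependent variable (hypothetical data).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_line(x, y)
predicted = intercept + slope * 5  # prediction for a new patient with x = 5

# Logistic regression: dichotomous dependent variable. The coefficients
# below are assumed for illustration, not estimated from any data.
risk = logistic_probability(-4.0, 0.05, 60)
print(predicted, round(risk, 3))
```

In practice the logistic coefficients would themselves be estimated from patient data, usually by maximum likelihood; the sketch shows only how a fitted model yields a predicted probability.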

Acknowledgement

This article is based on: G. Guyatt, R. Jaeschke, N. Heddle, D. Cook, H. Shannon, and S. Walter. Basic statistics for clinicians: 4. Correlation and regression. Can. Med. Assoc. J., Feb 1995; 152: 497-504.