Statistics - explanations and formulas

Power Analysis

This was a negative study (i.e., the results showed no difference in outcomes between the actual surgery and sham surgery groups). However, a real difference may have gone undetected simply because too few participants were enrolled. When this happens, it is called a type II error (i.e., the study shows no difference when there really is one). To assess the risk of a type II error, we need to check that the study had enough participants by looking at the power analysis. A negative study with enough people (an “adequately powered trial”) is just a negative study, but a negative study that did not enroll enough people may have committed a type II error, and its results are at risk of being invalid.
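
To see what an underpowered trial looks like in practice, the sketch below simulates many small trials in which a real difference does exist and counts how often it goes undetected. This is illustrative only: the 11.5-point difference echoes the Lysholm threshold discussed below, but the standard deviation, group size, and number of simulated trials are assumptions chosen for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_diff = 11.5     # a real difference exists (Lysholm threshold from the paper)
sd = 25.0            # assumed standard deviation (hypothetical)
n_per_group = 10     # deliberately too few participants
n_trials = 5000

misses = 0
for _ in range(n_trials):
    control = rng.normal(0.0, sd, n_per_group)
    surgery = rng.normal(true_diff, sd, n_per_group)
    _, p = stats.ttest_ind(surgery, control)
    if p >= 0.05:    # the real difference was not detected: a type II error
        misses += 1

print(f"Type II error rate with {n_per_group} per group: {misses / n_trials:.0%}")
```

With only 10 participants per group, the real difference is missed most of the time; a negative result from such a trial says very little.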

You need to refer back to the section of the paper, usually headed “statistical analysis”, to find the power analysis. At the beginning of this section the authors should state, as they did in this paper, the minimal clinically meaningful difference in the primary outcomes. In this study, differences of more than 11.5, 15.5, and 2.0 in the Lysholm, WOMET, and knee pain scores, respectively, were considered clinically relevant. A calculation is then done using the level of significance, the power, and the effect size to estimate how many patients would need to be in each of the intervention and control groups to show a difference for each of the 3 outcomes (i.e., the sample size).
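
The sketch below shows this calculation for one of the outcomes, using the statsmodels library. The 11.5-point difference comes from the paper; the standard deviation is a hypothetical value standing in for the estimates the authors would have taken from previous studies.

```python
from statsmodels.stats.power import TTestIndPower

min_clinical_diff = 11.5   # Lysholm threshold from the paper
assumed_sd = 25.0          # hypothetical SD; the real one comes from prior studies
effect_size = min_clinical_diff / assumed_sd  # standardized effect (Cohen's d)

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,    # level of significance
    power=0.80,    # 80% power
)
print(f"Participants needed per group: {n_per_group:.0f}")
```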

Level of significance: this is the probability cut-off (usually 0.05, or 5%) below which a result is declared statistically significant. It is the risk of a type I error that we are willing to accept: a cut-off of 0.05 means there is a 5% chance of declaring a difference between the 2 treatments when none truly exists.

A power level of 80% is nearly always used and means there is an eight in ten chance of detecting a difference of the specified effect size (in this case, the differences of 11.5, 15.5, or 2.0 on the knee scores above).
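
To make the link between power and sample size concrete, the snippet below (using the same hypothetical effect size of 0.46 as in the sketches above) shows how the chance of detecting the difference grows as more participants are enrolled.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (20, 40, 80, 160):
    power = analysis.power(effect_size=0.46, nobs1=n, alpha=0.05)
    print(f"{n:3d} per group -> power = {power:.0%}")
```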

The effect size is a measure of the difference in outcomes between the experimental and control groups. It is a measure of the effectiveness of the treatment, calculated by dividing the absolute difference in outcomes by the standard deviation. It is often derived from previous studies looking at similar outcomes, as it was in this case: the values for the clinically meaningful changes on the knee scores came from previous studies that asked patients how much their pain had to change before they felt meaningfully better. In other cases, where no previous information exists, pilot data or the investigators’ best estimates may have to suffice.
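
As a worked example of that arithmetic, the snippet below converts each clinically meaningful difference into a standardized effect size. The thresholds are the paper’s; the standard deviations are made-up values for illustration only.

```python
thresholds = {"Lysholm": 11.5, "WOMET": 15.5, "knee pain": 2.0}   # from the paper
assumed_sds = {"Lysholm": 25.0, "WOMET": 30.0, "knee pain": 4.0}  # hypothetical

for scale, diff in thresholds.items():
    d = diff / assumed_sds[scale]  # effect size = absolute difference / SD
    print(f"{scale}: {diff} / {assumed_sds[scale]} = {d:.2f}")
```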

Power calculations are usually performed only for the primary outcome, which is why the results from secondary outcomes need to be interpreted with more caution, particularly if they show no difference in outcomes.

A type I error happens when the study detects a difference when one does not really exist.
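
A companion sketch to the type II simulation above: here both groups are drawn from the same distribution, so no true difference exists and every “significant” result is a type I error. With a 0.05 cut-off, the false-positive rate should land near 5%. All numbers are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials = 5000
false_positives = 0
for _ in range(n_trials):
    a = rng.normal(0.0, 25.0, 50)  # both groups share the same true mean
    b = rng.normal(0.0, 25.0, 50)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:   # a "difference" is declared where none exists
        false_positives += 1

print(f"Type I error rate: {false_positives / n_trials:.1%}")
```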