The Normal Distribution: Part 2
Last month, Page2 considered the normal distribution. As stated in that post, there are tests that can be used to determine whether a data set is normally distributed, and we promised to consider them in a later entry. Although the tests described here can easily be applied using a statistical program, the first step should be to plot the data using either a histogram or a stem-and-leaf diagram (stemplot) and to judge whether the plot appears normal. This step is key because the tests described below tend to reject normality too easily in larger samples, where even trivial departures become statistically significant, and to reject too rarely in smaller samples (i.e., they are underpowered). Thus, these tests should always be combined with your visual assessment!
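Where no statistical program is at hand, a stemplot can be produced in a few lines. The following is a minimal Python sketch rather than a reference implementation. The function name `stemplot` is our own, the sample values are made up for illustration, and the code assumes non-negative whole numbers (anything else is truncated to an integer).

```python
from collections import defaultdict


def stemplot(values):
    """Return a text stem-and-leaf display.

    Assumption: values are non-negative; each is truncated to an integer,
    the last digit becomes the leaf and the remaining digits the stem.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        stems[stem].append(leaf)

    lines = []
    # Print every stem in the range, including empty ones, so that gaps
    # in the data remain visible in the shape of the plot.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)


# Illustrative (made-up) measurements
print(stemplot([12, 15, 21, 23, 23, 38]))
```

Turned on its side, the display reads like a histogram, so symmetry and outliers can be judged by eye before any formal test is run.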
Tests of normality assess whether a data set is consistent with a normal distribution, that is, a symmetrical, bell-shaped curve in which approximately 68% of observations lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. A common method to test normality is to calculate various moments (a term borrowed from physics) about the mean. These include skewness, a measure of asymmetry, and kurtosis, traditionally described as a measure of “peakedness,” although it mainly reflects how heavy the tails are. D’Agostino’s K-squared test, for example, transforms and combines the two measures into a single statistic that can detect deviations from normality. This combined statistic is known as the D’Agostino-Pearson omnibus test and was used last month in the article “Assessing the Diagnostic Accuracy of Pulse Pressure Variations for the Prediction of Fluid Responsiveness: A “Gray Zone” Approach”. The authors used the test to be more confident that results expressed as mean ± SD were normally distributed. That article was also discussed last month on Page2. Also last month, the authors of the article “Developmental Stage-dependent Persistent Impact of Propofol Anesthesia on Dendritic Spines in the Rat Medial Prefrontal Cortex” used D’Agostino’s K-squared test.
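To make the moment-based idea concrete, here is a minimal Python sketch using only the standard library. It checks the 68/95/99.7 figures against the standard normal distribution and computes the sample skewness and excess kurtosis from the central moments. The function names are our own, and this is not the K-squared test itself, which further transforms and combines the two quantities. In practice a statistical package would be used (for example, `scipy.stats.normaltest` implements the D’Agostino-Pearson test).

```python
from statistics import NormalDist, fmean


def coverage_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def skewness_and_excess_kurtosis(data):
    """Moment-based sample skewness and excess kurtosis.

    Both are approximately 0 for normally distributed data.
    Uses the simple (biased) moment estimators m3 / m2**1.5 and m4 / m2**2 - 3.
    """
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")

# Illustrative (made-up) right-skewed sample: skewness comes out positive
print(skewness_and_excess_kurtosis([1, 1, 2, 2, 3, 4, 6, 15]))
```

A skewness well away from 0 suggests asymmetry, and an excess kurtosis well away from 0 suggests tails that are heavier or lighter than normal; how far is "too far" depends on sample size, which is what the formal test accounts for.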
Another method to test for normality is to compare the observed data values with those expected under the normal distribution, using either correlation statistics or regression. A Q-Q plot compares quantiles of the data with those expected under the normal distribution; if the data are normal, the points fall close to a straight line. The Shapiro-Wilk test is based on the regression of the ordered data on their expected normal values, comparing the squared slope of that line with the sample variance, and is often used for smaller samples. In the study “Ultrasound Imaging Facilitates Spinal Anesthesia in Adults with Difficult Surface Anatomic Landmarks”, published in July and also summarized on Page2, the authors used both Q-Q plots and the Shapiro-Wilk statistic to assess whether data were normally distributed.
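The points of a Q-Q plot are simple to compute by hand, which may help in reading one. The sketch below, in Python with the standard library only, pairs each ordered observation with the standard normal quantile at plotting position (i - 0.5)/n and then reports the correlation of those pairs. The function names and the choice of plotting position are our own assumptions, and the correlation shown is a simple probability-plot correlation, not the Shapiro-Wilk statistic.

```python
from math import sqrt
from statistics import NormalDist, fmean


def qq_points(data):
    """Pair each ordered observation with its expected standard normal quantile.

    Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    """
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


def qq_correlation(data):
    """Pearson correlation of the Q-Q points; close to 1 for normal-looking data."""
    points = qq_points(data)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative (made-up) right-skewed sample with one extreme value
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]
print(round(qq_correlation(skewed), 3))
```

Plotting the pairs returned by `qq_points` gives the Q-Q plot itself; a curved pattern or a few points far from the line is the visual counterpart of a low correlation.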
Another technique to test for normality uses empirical distribution function statistics. The empirical distribution function is a cumulative distribution function that steps up by 1/n at each of the n data points. Goodness-of-fit tests can then be used to compare the distribution of the data being studied with that expected under the normal distribution. The Kolmogorov-Smirnov statistic, which is the largest vertical distance between the two functions, is an example that was used in another study, “Pregabalin Suppresses Spinal Neuronal Hyperexcitability and Visceral Hypersensitivity in the Absence of Peripheral Pathophysiology”, also published in July. We caution, however, that the Kolmogorov-Smirnov test has been shown to discriminate normal from non-normal data less well than other tests, and many authors discourage its use.
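The empirical distribution function and the Kolmogorov-Smirnov distance can be sketched in a few lines of Python using only the standard library. The function names are our own and the sample values are made up. The sketch computes the statistic only, not a p-value; note that when the mean and SD of the reference normal are estimated from the same data, as in the usage example, the standard Kolmogorov-Smirnov critical values no longer apply and a correction such as Lilliefors' is needed.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, stdev


def ecdf(data):
    """Return the empirical distribution function of the data.

    The returned function steps up by 1/n at each of the n data points.
    """
    ordered = sorted(data)
    n = len(ordered)

    def F(x):
        # Fraction of observations less than or equal to x
        return bisect_right(ordered, x) / n

    return F


def ks_statistic(data, dist):
    """Largest vertical gap between the empirical CDF and dist.cdf."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = dist.cdf(x)
        # Check the gap just after the jump (i / n) and just before it ((i - 1) / n)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Illustrative (made-up) sample, compared with a normal fitted to it
sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.9, 6.4, 7.7]
fitted = NormalDist(fmean(sample), stdev(sample))
print(round(ks_statistic(sample, fitted), 3))
```

Because the statistic depends only on the single largest gap, it is relatively insensitive to departures in the tails, which is one reason other tests are often preferred.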
Other tests for normality are available, and a nice review of the different tests has been published. Most authors test for normality, although that fact is not always mentioned in the description of the statistical methods.