The normality assumption for multiple regression is one of the most misunderstood in all of statistics. In multiple regression, the assumption of normality applies only to the residuals, not to the independent variables, as is often believed. The confusion likely stems from the residuals themselves being misunderstood: a residual is the error in the relationship between the independent variables and the dependent variable, that is, the difference between a case's observed value and the value predicted by the regression equation. The assumption is that the distribution of these residuals, taken across all cases in the sample, should be normal.
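For concreteness, the residual for case i can be written as the observed value minus the fitted value. The notation below is illustrative and not from the original page; for a model with k predictors it reads, in LaTeX:

    e_i = y_i - \hat{y}_i = y_i - \left( \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik} \right)

The normality assumption is then that these e_i are (approximately) normally distributed with mean zero.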
Violating the assumption has few consequences: it does not bias the regression coefficients or make them inefficient. It matters only for calculating p-values in significance testing, and even then primarily when the sample is very small. When the sample size exceeds roughly 200, the normality assumption is unnecessary, because the Central Limit Theorem ensures that the sampling distribution of the coefficient estimates approximates normality even when the residuals do not.
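As a rough illustration of that large-sample argument (a sketch that is not part of the original page), the following Python simulation draws deliberately skewed, non-normal errors and shows that the estimated slope is still approximately normally distributed across repeated samples; the sample size, error distribution, and true coefficients are all hypothetical choices:

    import numpy as np
    from scipy.stats import skew

    # Hypothetical setup: skewed (exponential) errors, moderately large n.
    rng = np.random.default_rng(0)
    n, reps = 500, 2000
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=n)
        e = rng.exponential(scale=1.0, size=n) - 1.0   # non-normal, mean-zero errors
        y = 1.0 + 2.0 * x + e
        # OLS slope via the closed-form formula for a single predictor
        slopes.append(np.cov(x, y, bias=True)[0, 1] / np.var(x))

    slopes = np.array(slopes)
    # Despite the skewed residuals, the sampling distribution of the slope
    # is approximately normal, so its skewness is close to zero.
    print("Mean of estimated slopes:", slopes.mean())       # close to the true value 2.0
    print("Skewness of estimated slopes:", skew(slopes))    # close to 0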
When dealing with very small samples, it is important to check for a possible violation of the normality assumption. This is accomplished by inspecting the residuals from the regression model. Some programs do this automatically, while others require saving the residuals as a new variable and examining them with summary statistics and histograms. Several statistics are available for examining the normality of the residuals, including skewness and kurtosis, as well as numerous graphical depictions, such as the normal probability plot. Unfortunately, these statistics are unstable in small samples, so their results should be interpreted with caution. If the distribution of the residuals deviates from normality, possible remedies include transforming the data, removing outliers, or using an alternative analysis, such as nonparametric regression, that does not require normality. A sketch of such a residual check follows.
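The following is a minimal sketch of that check in Python, assuming the pandas, statsmodels, scipy, and matplotlib libraries; the data and the column names x1, x2, and y are hypothetical stand-ins for an actual dataset:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from scipy import stats
    import matplotlib.pyplot as plt

    # Simulated data stand in for a real (very small) sample.
    rng = np.random.default_rng(42)
    n = 30
    df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
    df["y"] = 2.0 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=n)

    # Fit the multiple regression and save the residuals as a new variable.
    X = sm.add_constant(df[["x1", "x2"]])
    model = sm.OLS(df["y"], X).fit()
    df["resid"] = model.resid

    # Summary statistics: skewness and (excess) kurtosis of the residuals.
    print("Skewness:", stats.skew(df["resid"]))
    print("Excess kurtosis:", stats.kurtosis(df["resid"]))

    # Graphical checks: histogram and normal probability (Q-Q) plot.
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].hist(df["resid"], bins=10)
    axes[0].set_title("Histogram of residuals")
    sm.qqplot(df["resid"], line="45", fit=True, ax=axes[1])
    axes[1].set_title("Normal probability plot of residuals")
    plt.tight_layout()
    plt.show()

Marked skewness or kurtosis, or clear departures from the reference line in the normal probability plot, would point toward the remedies mentioned above.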