
Normality

The normality assumption for multiple regression is one of the most misunderstood in all of statistics. In multiple regression, the assumption requiring a normal distribution applies only to the residuals, not to the independent variables as is often believed. Perhaps the confusion about this assumption derives from difficulty understanding what the residuals are. Simply put, the residuals are the error in the relationship between the independent variables and the dependent variable in a regression model. Each case in the sample has a residual value equal to the difference between its observed value and the value predicted by the regression equation. It is the distribution of these residuals, or noise, across all cases in the sample that should be normal.
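As a minimal sketch of what a residual is, the Python snippet below fits a two-predictor regression by ordinary least squares with NumPy and computes each case's residual as observed minus predicted. The data and the names `X`, `y`, and `fit_residuals` are made up for illustration and are not taken from any particular statistics package.

```python
import numpy as np

def fit_residuals(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    Returns (coefficients, residuals), with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    residuals = y - predicted  # observed minus predicted, one value per case
    return coef, residuals

# Hypothetical sample: 8 cases, two independent variables.
X = [[1, 4], [2, 1], [3, 5], [4, 2], [5, 7], [6, 3], [7, 8], [8, 6]]
y = [5.1, 4.2, 8.9, 7.0, 13.2, 9.8, 16.1, 14.9]

coef, residuals = fit_residuals(X, y)
print(residuals)
```

The `residuals` array, not the columns of `X`, is what a normality check would be applied to.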

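There are few consequences associated with a violation of the normality assumption: non-normal residuals do not bias the coefficient estimates, and ordinary least squares remains the most efficient linear unbiased estimator as long as the other regression assumptions hold. Normality matters only for the calculation of p values (and confidence intervals) in significance testing, and even then it is a real concern only when the sample size is very small. When the sample size is sufficiently large (>200), the normality assumption is not needed at all. The Central Limit Theorem does not make the residuals themselves normal, but it does ensure that the sampling distribution of the coefficient estimates will approximate normality, which is what the significance tests rely on.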
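A small simulation can illustrate the large-sample point. The sketch below is only an illustration under stated assumptions: a single hypothetical predictor, a made-up true model (intercept 1, slope 2), and deliberately skewed errors drawn from a shifted exponential distribution. It compares how skewed the estimated slope is across repeated samples of 10 cases and of 200 cases. The function names and the number of replications are arbitrary choices.

```python
import numpy as np

def sample_skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5

def simulate_slopes(n, reps=5000, seed=0):
    """Return `reps` OLS slope estimates from samples of size n."""
    rng = np.random.default_rng(seed)
    # Fixed, deliberately skewed predictor values (an assumption of this sketch).
    x = np.linspace(0.0, 1.0, n) ** 4
    # Non-normal errors: mean 0, skewness 2.
    errors = rng.exponential(1.0, size=(reps, n)) - 1.0
    y = 1.0 + 2.0 * x + errors  # hypothetical true model
    xc = x - x.mean()
    # OLS slope for simple regression, computed for every replication at once.
    return y @ xc / (xc @ xc)

for n in (10, 200):
    slopes = simulate_slopes(n)
    print(f"n={n}: mean slope={slopes.mean():.3f}, "
          f"skewness of slope estimates={sample_skewness(slopes):.3f}")
```

Under these assumptions the average slope should stay close to the true value of 2 at both sample sizes, while the skewness of the slope estimates should shrink toward zero as the sample grows, even though the errors remain just as skewed.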

When dealing with very small samples, it is important to check for a possible violation of the normality assumption. This can be accomplished through an inspection of the residuals from the regression model (some programs will perform this automatically, while others require that you save the residuals as a new variable and examine them using summary statistics and histograms). Several statistics are available to examine the normality of a variable, including skewness and kurtosis, as are numerous graphical displays, such as the normal probability plot. Unfortunately, skewness and kurtosis statistics are themselves unstable in small samples, so their results should be interpreted with caution. When the distribution of the residuals is found to deviate from normality, possible solutions include transforming the data, removing outliers, or conducting an alternative analysis that does not require normality (e.g., a nonparametric regression).
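The checks described above can be sketched in a few lines of Python. This is a rough illustration, not the procedure of any specific program: the residual values are hypothetical, the helper names are invented here, and the plotting positions `(i - 0.5) / n` are one common convention among several. Skewness and excess kurtosis are both near 0 for normally distributed data, and the points of a normal probability plot fall close to a straight line.

```python
import numpy as np
from statistics import NormalDist

def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    return (centered**3).mean() / (centered**2).mean() ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    return (centered**4).mean() / (centered**2).mean() ** 2 - 3.0

def normal_probability_points(values):
    """Return (theoretical normal quantiles, ordered values) for a normal probability plot."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = len(ordered)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (one convention)
    theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return theoretical, ordered

# Hypothetical residuals saved from a regression on a small sample.
residuals = np.array([-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.8])

theoretical, ordered = normal_probability_points(residuals)
# Correlation between the two axes of the plot: closer to 1 means a straighter line.
straightness = np.corrcoef(theoretical, ordered)[0, 1]

print("skewness:", skewness(residuals))
print("excess kurtosis:", excess_kurtosis(residuals))
print("probability-plot correlation:", straightness)
```

The `theoretical` and `ordered` arrays can be passed to any plotting library to draw the normal probability plot itself. With only ten cases, all three numbers can change substantially from sample to sample, which is the instability the paragraph above warns about.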

To Reference this Page: Statistics Solutions. (2013). Normality. Retrieved from https://www.statisticssolutions.com/academic-solutions/resources/directory-of-statistical-analyses/normality/