Linear regression is a test of whether predictors can explain an outcome variable. For example, do two predictors (e.g., stress level and college GPA) predict an outcome variable (e.g., life satisfaction)? The results of the regression indicate whether the model (using stress and GPA) is good at predicting satisfaction, which of the predictors were important in predicting satisfaction, and what the relationship is between each predictor and the outcome variable.
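As an illustration, here is a minimal sketch in Python using statsmodels. The file name survey.csv and the column names stress, gpa, and satisfaction are placeholders assumed for this example, not part of the original discussion:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("survey.csv")

# Predict life satisfaction from stress level and college GPA
# via ordinary least squares.
model = smf.ols("satisfaction ~ stress + gpa", data=df).fit()
print(model.summary())
```

The summary printout contains the overall F-test, R-squared, and per-predictor coefficients discussed later in this section.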
Linear regression carries the assumptions of normality, homoscedasticity, the absence of multicollinearity, and the absence of outliers.
To assess normality, plot the model residuals against the quantiles of a normal distribution, a plot known as a Q-Q scatterplot. The assumption is met when the residual quantiles do not deviate strongly from the theoretical quantiles; strong deviations indicate that the parameter estimates are unreliable. Below is an example plot.
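A sketch of producing such a Q-Q plot, assuming the fitted model object from the earlier snippet:

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Plot standardized residuals against theoretical normal quantiles;
# points hugging the 45-degree line suggest approximate normality.
sm.qqplot(model.resid, line="45", fit=True)
plt.title("Q-Q plot of model residuals")
plt.show()
```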
Evaluate homoscedasticity by plotting the residuals against the predicted values. The assumption is met if the points appear randomly scattered around a mean of zero with no apparent curvature. Below is an example plot.
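A matching sketch for the residuals-versus-predicted plot, again assuming the fitted model from above:

```python
import matplotlib.pyplot as plt

# Residuals vs. predicted values: look for a random band around zero
# with no funnel shape (heteroscedasticity) or curvature (nonlinearity).
plt.scatter(model.fittedvalues, model.resid, alpha=0.6)
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. predicted values")
plt.show()
```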
Calculate Variance Inflation Factors (VIFs) to detect multicollinearity (high correlations) among the predictor variables. High VIFs (typically greater than 5) indicate problematic multicollinearity in the model.
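One way to compute VIFs, assuming the same hypothetical predictors; statsmodels calculates each VIF from the design matrix, intercept included:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Design matrix: the two hypothetical predictors plus an intercept column.
X = sm.add_constant(df[["stress", "gpa"]])

# One VIF per column; the VIF for the constant itself is usually ignored.
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)  # predictors with VIF > 5 warrant a closer look
```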
To identify outliers or influential points, calculate the Studentized residuals by dividing the model residuals by the estimated residual standard deviation, then plot the absolute values of these residuals against the observation numbers. If an observation has a Studentized residual greater than 3.15 in absolute value, it significantly influences the model's results. This threshold corresponds to the 0.999 quantile of a t distribution with 149 degrees of freedom.
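A sketch of this diagnostic, assuming the fitted model from earlier; the cutoff is computed from the model's own residual degrees of freedom rather than hard-coded:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Internally Studentized residuals: raw residuals scaled by the
# estimated residual standard deviation (and leverage).
student_resid = model.get_influence().resid_studentized_internal

# 0.999 quantile of a t distribution with the model's residual
# degrees of freedom (149 degrees of freedom gives about 3.15).
threshold = stats.t.ppf(0.999, model.df_resid)

plt.scatter(np.arange(len(student_resid)), np.abs(student_resid), alpha=0.6)
plt.axhline(threshold, color="red", linestyle="--",
            label=f"0.999 t quantile = {threshold:.2f}")
plt.xlabel("Observation number")
plt.ylabel("|Studentized residual|")
plt.legend()
plt.show()
```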
The results of a linear regression
After assessing the assumptions, the regression output indicates whether the model was significant, as shown by a statistically significant F-value. The R-squared value, also called the coefficient of determination, reports the proportion of variance in the outcome explained by the predictors. Each predictor has a beta coefficient, a t-value, and a p-value for that t-value; a p-value of 0.05 or less is considered statistically significant. The sign of the beta (positive or negative) indicates the direction of the relationship between that predictor and the outcome variable. For example, a negative beta coefficient for stress indicates that the higher the stress, the lower the satisfaction tends to be, while a positive beta for GPA indicates that the higher the GPA, the higher the satisfaction tends to be.
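These quantities can be pulled straight off the fitted statsmodels object, continuing the sketch from above:

```python
import pandas as pd

# Overall model test and fit.
print(f"F({model.df_model:.0f}, {model.df_resid:.0f}) = {model.fvalue:.2f}, "
      f"p = {model.f_pvalue:.4f}")
print(f"R-squared = {model.rsquared:.3f}")

# Per-predictor betas, t-values, and p-values; the sign of each beta
# gives the direction of the relationship with the outcome.
print(pd.DataFrame({"beta": model.params,
                    "t": model.tvalues,
                    "p": model.pvalues}))
```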
If you want to see for yourself, go to www.IntellectusStatistics.com, try it free for a week, download an example dataset, conduct a linear regression on different data, and look at the results.
We work with graduate students every day and know what it takes to get your research approved.