Directory of Statistical Analyses

This directory provides researchers with a near-exhaustive array of inferential and non-inferential statistical analyses. We aim to cover the analyses most common in academic research, with a focus on the social sciences and healthcare.

Correlation
Conduct and Interpret a Bivariate (Pearson) Correlation

A correlation expresses the strength of linkage or co-occurrence between two variables in a single value between -1 and +1. This value, called the correlation coefficient, is typically represented by the letter r.
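
As an illustration of how r might be computed in practice, here is a minimal sketch in Python using SciPy; the variables study_hours and exam_scores are hypothetical example data, not taken from this directory.

    from scipy.stats import pearsonr

    # Hypothetical paired observations for two variables
    study_hours = [2, 4, 5, 7, 9, 10]
    exam_scores = [55, 60, 62, 70, 80, 85]

    r, p_value = pearsonr(study_hours, exam_scores)
    print(f"r = {r:.3f}, p = {p_value:.4f}")  # r close to +1 indicates a strong positive linkage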

Conduct and Interpret a Canonical Correlation

The Canonical Correlation is a multivariate analysis of correlation. Canonical is the statistical term for analyzing latent variables (which are not directly observed) that represent multiple variables (which are directly observed).

Conduct and Interpret a Partial Correlation

Spurious correlations are ubiquitous in statistics. A spurious correlation occurs when two variables have no causal relationship whatsoever in real life but are nonetheless statistically linked by correlation.

Conduct and Interpret a Point-Biserial Correlation

Like all correlation analyses, the Point-Biserial Correlation measures the strength of association or co-occurrence between two variables. Correlation analyses express this strength of association in a single value, the correlation coefficient.

Correlation (Pearson, Kendall, Spearman)

Usually, in statistics, we measure four types of correlations: Pearson correlation, Kendall rank correlation, Spearman correlation, and the Point-Biserial correlation.

Correlation Ratio

The correlation ratio is a coefficient of non-linear association. In the case of a linear relationship, the correlation ratio, denoted by eta, reduces to the correlation coefficient.

Measures of Association

The measures of association refer to a wide variety of coefficients (including bivariate correlation and regression coefficients) that measure the strength and direction of the relationship between variables; these measures of strength, or association, can be described in several ways, depending on the analysis.

Partial Correlation

Partial correlation is the measure of association between two variables, while controlling for, or adjusting for, the effect of one or more additional variables.
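
One way to sketch a partial correlation is to correlate the residuals of each variable after regressing out the control variable. The example below uses NumPy and SciPy on hypothetical arrays x, y, and z (z being the control variable); it is an illustration under those assumptions, not a full implementation.

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical data: x and y are the variables of interest, z is the control variable
    rng = np.random.default_rng(0)
    z = rng.normal(size=100)
    x = 0.5 * z + rng.normal(size=100)
    y = 0.4 * z + rng.normal(size=100)

    # Regress z out of x and y, then correlate the residuals
    res_x = x - np.polyval(np.polyfit(z, x, 1), z)
    res_y = y - np.polyval(np.polyfit(z, y, 1), z)
    r_partial, p = pearsonr(res_x, res_y)
    print(f"partial r = {r_partial:.3f}, p = {p:.4f}")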

Pearson’s Correlation Coefficient

Pearson’s correlation coefficient is the test statistic that measures the statistical relationship, or association, between two continuous variables.

Factor Analysis & SEM
Conduct and Interpret a Factor Analysis

Much like cluster analysis involves grouping similar cases, factor analysis involves grouping similar variables into dimensions.

Confirmatory Factor Analysis

Confirmatory factor analysis (CFA) is a multivariate statistical procedure that is used to test how well the measured variables represent a smaller number of constructs.

Exploratory Factor Analysis

Exploratory factor analysis is a statistical technique that is used to reduce data to a smaller set of summary variables and to explore the underlying theoretical structure of the phenomena.

Factor Analysis

Factor analysis is a technique that is used to reduce a large number of variables to a smaller number of factors.
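
As a rough sketch of this kind of data reduction, the example below fits a two-factor model to hypothetical survey data with scikit-learn's FactorAnalysis; the dataset and the choice of two factors are assumptions made purely for illustration.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Hypothetical data: 200 respondents answering 6 survey items
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(200, 2))           # two underlying dimensions
    loadings = rng.normal(size=(2, 6))
    items = latent @ loadings + rng.normal(scale=0.5, size=(200, 6))

    fa = FactorAnalysis(n_components=2, random_state=0)
    scores = fa.fit_transform(items)             # factor scores per respondent
    print(fa.components_.shape)                  # (2, 6): loadings of each item on each factor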

LISREL

LISREL is a software application for Windows for performing structural equation modeling (SEM) and other related linear structural modeling (e.g., multilevel structural equation modeling, multilevel linear and non-linear modeling, etc.).

Path Analysis

Path analysis is an extension of the regression model. In path analysis, two or more causal models are compared based on the correlation matrix.

PLS Graph Software

PLS Graph is an application with a Windows-based graphical user interface that helps the researcher perform partial least squares (PLS) analyses.

Principal Component Analysis (PCA)

Principal component analysis is an approach to factor analysis that, unlike common factor analysis, considers the total variance in the data and transforms the original variables into a smaller set of linear combinations.
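
A minimal PCA sketch in Python (scikit-learn), using hypothetical standardized data assumed here for illustration, might look like this:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical data: 100 cases measured on 5 correlated variables
    rng = np.random.default_rng(2)
    base = rng.normal(size=(100, 2))
    data = np.hstack([base, base @ rng.normal(size=(2, 3))]) + rng.normal(scale=0.3, size=(100, 5))

    pca = PCA(n_components=2)
    components = pca.fit_transform(StandardScaler().fit_transform(data))
    print(pca.explained_variance_ratio_)  # share of total variance captured by each component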

Structural Equation Modeling

Structural equation modeling is a multivariate statistical analysis technique that is used to analyze structural relationships.

General
Baron & Kenny’s Procedures for Mediational Hypotheses

Mediational hypotheses are hypotheses in which it is assumed that the effect of an independent variable on a dependent variable is transmitted through a mediating variable, while the independent variable may still affect the dependent variable directly.

Conduct and Interpret a Profile Analysis

Profile Analysis is mainly concerned with test scores, more specifically with profiles of test scores.

Conduct and Interpret a Sequential One-Way Discriminant Analysis

Sequential one-way discriminant analysis is similar to the one-way discriminant analysis. Discriminant analysis predicts group membership by finding the linear combination of predictor variables that best separates the groups.

Data Levels and Measurement

On this page you’ll learn about the four data levels of measurement (nominal, ordinal, interval, and ratio) and why they are important.

Effect Size

Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale.
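
One common effect-size measure for the difference between two group means is Cohen's d; the sketch below computes it from hypothetical group scores using NumPy (the data are invented for illustration).

    import numpy as np

    # Hypothetical scores for a treatment and a control group
    treatment = np.array([78, 82, 85, 90, 88, 84])
    control = np.array([70, 75, 72, 80, 77, 74])

    # Pooled standard deviation, then Cohen's d = mean difference / pooled SD
    n1, n2 = len(treatment), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    d = (treatment.mean() - control.mean()) / pooled_sd
    print(f"Cohen's d = {d:.2f}")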

Hierarchical Linear Modeling (HLM)

Hierarchical linear modeling (HLM) is an ordinary least squares (OLS) regression-based analysis that takes the hierarchical structure of the data into account.

Hypothesis Testing

Hypothesis testing is a statistical method that is used in making statistical decisions using experimental data.

Latent Class Analysis

Latent Class Analysis (LCA) is a statistical technique that is used in factor, cluster, and regression techniques; it is a subset of structural equation modeling (SEM).

Mathematical Expectation

Mathematical expectation, also known as the expected value, is the probability-weighted summation or integration of the possible values of a random variable.
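
For a discrete random variable, the expected value is the sum of each possible value weighted by its probability; a short worked example for a fair six-sided die (assumed here purely for illustration) follows.

    # Expected value of a fair six-sided die: E[X] = sum(x * P(x)) = 3.5
    values = [1, 2, 3, 4, 5, 6]
    probabilities = [1/6] * 6
    expected_value = sum(x * p for x, p in zip(values, probabilities))
    print(expected_value)  # 3.5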

Meta Analysis

Meta analysis is a statistical analysis that combines the outcomes of many separate studies for the purpose of integrating the findings.

Moderator Variable

A moderator variable, commonly denoted as just M, is a third variable that affects the strength of the relationship between a dependent and independent variable.

Nominal Variable Association

Nominal variable association refers to the statistical relationship(s) between nominal variables. Nominal variables are variables that are measured at the nominal level and have no inherent ranking.

Normality

The normality assumption is one of the most misunderstood in all of statistics. In multiple regression, the assumption requiring a normal distribution applies only to the disturbance term, not to the independent variables as is often believed.

Probability

Probability theory originated in the study of games of chance such as cards, coin tossing, and dice. In modern times, however, probability is of great importance in decision making.

Reliability Analysis

Reliability refers to the extent to which a scale produces consistent results when the measurements are repeated a number of times. The analysis of this consistency is called reliability analysis.

Run Test of Randomness

The run test of randomness is a non-parametric method that is used when a parametric test is not appropriate.

Significance

Significance testing refers to the use of statistical techniques to determine whether a sample drawn from a population truly represents that population or whether an observed result arose by chance.

Survival Analysis

Survival analysis helps the researcher assess whether, and why, certain individuals are exposed to a higher risk of experiencing an event of interest, such as death, machine failure, or drug relapse.

Testing of Assumptions

In statistical analysis, all parametric tests assume certain characteristics of the data, known as assumptions. Violation of these assumptions changes the conclusions of the research and the interpretation of the results.

Time Series Analysis

Time series analysis is a statistical technique that deals with time series data, or trend analysis. Time series data consist of observations recorded over a series of particular time periods or intervals.

Log-Linear Analysis (Multi-way Frequency Tables)

The log-linear analysis is appropriate when the goal of research is to determine if there is a statistically significant relationship among three or more discrete variables (Tabachnick & Fidell, 2012).

Conduct and Interpret a Cluster Analysis

Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis.

Cluster Analysis Consulting

As experts in cluster analysis, with over 22 years of dissertation consulting success, and our own dissertation experience, we are well equipped to analyze your data and report your cluster analysis findings.

Validity

Validity implies precise and exact results acquired from the data collected. In technical terms, a valid measure allows proper and correct conclusions to be drawn from the sample that are generalizable to the entire population.

Regression Analysis
Assumptions of Linear Regression

Linear regression is an analysis that assesses whether one or more predictor variables explain the dependent (criterion) variable.

Assumptions of Logistic Regression

Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms – particularly regarding linearity, normality, homoscedasticity, and measurement level.

Assumptions of Multiple Linear Regression

Multiple linear regression analysis makes several key assumptions.

Binary Logistic Regressions

Binary logistic regressions, by design, overcome many of the restrictive assumptions of linear regressions. For example, linearity, normality, and equal variances are not assumed, nor is it assumed that the error terms are normally distributed.

Conduct and Interpret a Linear Regression

Linear regression is the most basic and commonly used predictive analysis.

Conduct and Interpret a Logistic Regression

Logistic regression is the regression analysis to conduct when the dependent variable is dichotomous (binary).

Conduct and Interpret a Multinomial Logistic Regression

Multinomial Logistic Regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels.

Conduct and Interpret a Multiple Linear Regression

As a predictive analysis, multiple linear regression is used to describe data and to explain the relationship between one dependent variable and two or more independent variables.

Conduct and Interpret an Ordinal Regression

Ordinal regression is a member of the family of regression analyses. As a predictive analysis, ordinal regression describes data and explains the relationship between one dependent variable and two or more independent variables.

Homoscedasticity

The assumption of homoscedasticity (meaning “same variance”) is central to linear regression models.

How to Conduct Linear Regression

Linear regression analysis consists of 3 stages – (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.
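
A minimal sketch of these three stages in Python (statsmodels), with hypothetical predictor and outcome arrays invented for illustration, might look like the following.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: one predictor and one outcome
    rng = np.random.default_rng(3)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=50)

    # Stage 1: analyze correlation and directionality
    print(np.corrcoef(x, y)[0, 1])

    # Stage 2: estimate the model (fit the line)
    model = sm.OLS(y, sm.add_constant(x)).fit()

    # Stage 3: evaluate validity and usefulness (R-squared, coefficient tests)
    print(model.summary())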

How to Conduct Logistic Regression

Logistic Regression Analysis estimates the log odds of an event.
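
As a sketch of how the log odds are estimated, the example below fits a logistic model to hypothetical pass/fail data with statsmodels; the data and variable names are assumptions, and the printed coefficients are on the log-odds scale.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: hours studied predicting a pass/fail outcome
    rng = np.random.default_rng(4)
    hours = rng.uniform(0, 10, size=200)
    passed = (rng.uniform(size=200) < 1 / (1 + np.exp(-(hours - 5)))).astype(int)

    logit_model = sm.Logit(passed, sm.add_constant(hours)).fit(disp=False)
    print(logit_model.params)           # intercept and slope in log-odds units
    print(np.exp(logit_model.params))   # exponentiated: odds ratios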

How to Conduct Multiple Linear Regression

Multiple Linear Regression Analysis consists of more than just fitting a line through a cloud of data points.

Logistic Regression

Logistic regression is a class of regression in which independent variables are used to predict a categorical (typically dichotomous) dependent variable.

Multiple Regression

Multiple regression generally explains the relationship between multiple independent or predictor variables and one dependent or criterion variable.

Nonlinear Regression

Nonlinear regression is a regression in which the dependent or criterion variables are modeled as a non-linear function of model parameters and one or more independent variables.

Ordinal Regression

Ordinal regression is a statistical technique that is used to predict behavior of ordinal level dependent variables with a set of independent variables.

Questions the Logistic Regression Answers

There are 3 major questions that the logistic regression analysis answers – (1) causal analysis, (2) forecasting an outcome, (3) trend forecasting.

Questions the Linear Regression Answers

There are 3 major areas of questions that the regression analysis answers – (1) causal analysis, (2) forecasting an effect, (3) trend forecasting.

Questions the Multiple Linear Regression Answers

There are 3 major areas of questions that the multiple linear regression analysis answers – (1) causal analysis, (2) forecasting an effect, (3) trend forecasting.

Regression

A regression assesses whether predictor variables account for variability in a dependent variable.

Scatterplot: An Assumption of Regression Analysis

Residual scatter plots provide a visual examination of the assumption of homoscedasticity between the predicted dependent variable scores and the errors of prediction.

Selection Process for Multiple Regression

The basis of a multiple linear regression is to assess whether one continuous dependent variable can be predicted from a set of independent (or predictor) variables.

The Linear Regression Analysis in SPSS

This example is based on the FBI’s 2006 crime statistics. In particular, we are interested in the relationship between the size of the state and the number of murders in the city.

The Logistic Regression Analysis in SPSS

Our example is a research study on 107 pupils. These pupils have been measured with 5 different aptitude tests, one for each important category (reading, writing, understanding, summarizing, etc.).

The Multiple Linear Regression Analysis in SPSS

This example is based on the FBI’s 2006 crime statistics. In particular, we are interested in the relationship between the size of the state, various property crime rates, and the number of murders in the city.

Two-Stage Least Squares (2SLS) Regression Analysis

Two-stage least squares (2SLS) regression analysis is a statistical technique that is used in the analysis of structural equations. This technique is an extension of the OLS method.

Using Logistic Regression in Research

Binary Logistic Regression is a statistical analysis that determines how much variance, if any, in a dichotomous dependent variable is explained by a set of independent variables.

What is Linear Regression?

Linear regression is a basic and commonly used type of predictive analysis.

What is Logistic Regression?

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).

What is Multiple Linear Regression?

As a predictive analysis, the multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.

(M)ANOVA Analysis
ANOVA (Analysis of Variance)

ANOVA is a statistical technique that assesses potential differences in a scale-level dependent variable by a nominal-level variable having 2 or more categories.
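
A minimal one-way ANOVA sketch in Python (SciPy), with three hypothetical groups invented for illustration, is shown below.

    from scipy.stats import f_oneway

    # Hypothetical scores for three independent groups
    group_a = [23, 25, 28, 30, 27]
    group_b = [31, 33, 29, 35, 34]
    group_c = [22, 21, 25, 24, 23]

    f_stat, p_value = f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p suggests at least one group mean differs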

Assumptions of the Factorial ANOVA

The factorial ANOVA has several assumptions that need to be fulfilled – (1) interval data of the dependent variable, (2) normality, (3) homoscedasticity, and (4) no multicollinearity.

Conduct and Interpret a Factorial ANCOVA

ANCOVA is short for Analysis of Covariance. The factorial analysis of covariance is a combination of a factorial ANOVA and a regression analysis.

Conduct and Interpret a Factorial ANOVA

ANOVA is short for ANalysis Of VAriance. As discussed in the chapter on the one-way ANOVA, the main purpose of a one-way ANOVA is to test if two or more groups differ from each other significantly in one or more characteristics.

Conduct and Interpret a One-Way ANCOVA

ANCOVA is short for Analysis of Covariance. The analysis of covariance is a combination of an ANOVA and a regression analysis.

Conduct and Interpret a One-Way ANOVA

ANOVA is short for ANalysis Of VAriance. The main purpose of an ANOVA is to test if two or more groups differ from each other significantly in one or more characteristics.

Conduct and Interpret a One-Way MANCOVA

MANCOVA is short for Multivariate Analysis of Covariance. The words “one” and “way” in the name indicate that the analysis includes only one independent variable.

Conduct and Interpret a One-Way MANOVA

MANOVA is short for Multivariate ANalysis Of Variance. The main purpose of a one-way MANOVA is to test if two or more groups differ from each other significantly in one or more characteristics.

Conduct and Interpret a Repeated Measures ANCOVA

The repeated measures ANCOVA is a member of the GLM procedures. ANCOVA is short for Analysis of Covariance.

Conduct and Interpret a Repeated Measures ANOVA

The repeated measures ANOVA is a member of the ANOVA family. ANOVA is short for ANalysis Of VAriance.

Generalized Linear Models

Generalized linear models are an extension, or generalization, of the linear modeling process which allows for non-normal distributions.

GLM Repeated Measure

GLM repeated measures is a statistical technique in which a dependent, or criterion, variable is measured as correlated, non-independent data.

MANOVA

Multivariate analysis of variance (MANOVA) is an extension of the univariate analysis of variance (ANOVA).

Multivariate Analysis of Covariance (MANCOVA)

Multivariate analysis of covariance (MANCOVA) is a statistical technique that is the extension of analysis of covariance (ANCOVA).

Multivariate GLM, MANOVA, and MANCOVA

The multivariate generalized linear model (GLM) is the extended form of the GLM; it deals with more than one dependent variable and one or more independent variables.

Repeated Measure

Repeated measures analysis involves a ‘within subject’ design, in which each subject is measured under each treatment condition.

Non-Parametric Analysis
CHAID

Chi-square Automatic Interaction Detector (CHAID) is a technique created by Gordon V. Kass in 1980. CHAID is a tool used to discover the relationship between variables.

Chi-Square Goodness of Fit Test

The Chi-Square goodness of fit test is a non-parametric test that is used to determine whether the observed values of a given phenomenon differ significantly from the expected values.

Assumptions of the Wilcoxon Sign Test

The Wilcoxon Sign Test requires two repeated measurements on a commensurate scale, that is, that the values of both observations can be compared.

Chi-Square Test of Independence

The Chi-Square test of independence is used to determine if there is a significant relationship between two nominal (categorical) variables.
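
A quick sketch with SciPy, using a hypothetical 2x2 contingency table (e.g., gender by preference) assumed only for illustration, shows how the test is run in practice.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical contingency table: rows = gender, columns = preference
    table = np.array([[30, 10],
                      [20, 25]])

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")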

Conduct and Interpret a Mann-Whitney U-Test

The Mann-Whitney U-test is a statistical comparison of the central tendencies of two independent samples. The U-test is a member of the larger group of dependence tests.

Conduct and Interpret a Spearman Rank Correlation

A Spearman correlation coefficient is also referred to as Spearman rank correlation or Spearman’s rho.

Conduct and Interpret a Wilcoxon Sign Test

The Wilcoxon Sign test is a statistical comparison of the average of two dependent samples. The Wilcoxon sign test is a sibling of the t-tests.

Conduct and Interpret the Chi-Square Test of Independence

The Chi-Square Test of Independence is also known as Pearson’s Chi-Square and has two major applications: 1) goodness of fit test and 2) test of independence.

Friedman Test, Kendall’s W, Cochran’s Q: Significance Tests for More Than Two Dependent Samples

There are three significance tests for cases involving more than two dependent samples. These are the Friedman test, Kendall’s W test, and Cochran’s Q test.

How to Conduct the Wilcoxon Sign Test

The Wilcoxon signed rank test is the non-parametric counterpart of the dependent samples t-test.

Kendall’s Tau and Spearman’s Rank Correlation Coefficient

There are two accepted measures of non-parametric rank correlations: Kendall’s tau and Spearman’s (rho) rank correlation coefficient. Correlation analyses measure the strength of the relationship between two variables.

Kruskal-Wallis Test

The Kruskal-Wallis test is a nonparametric (distribution free) test, and is used when the assumptions of one-way ANOVA are not met.

Mann-Whitney U Test

The Mann-Whitney U test is the non-parametric alternative to the independent samples t-test.

McNemar, Marginal Homogeneity, Sign, Wilcoxon Tests

Non-parametric significance tests for two dependent samples are used when the researcher wants to study correlated, or matched, samples.

Ordinal Association

Ordinal variables are variables that are categorized in an ordered format, so that the different categories can be ranked from smallest to largest or from less to more on a particular characteristic.

Questions the Wilcoxon Sign Test Answers

The Wilcoxon Sign Test is used to determine whether the mean ranks of two dependent, or matched, samples are different from each other.

Sign Test

The Sign test is a non-parametric test that is used to determine whether two paired groups differ, based on the signs (plus or minus) of the paired differences.

The Wilcoxon Sign Test in SPSS

The Wilcoxon sign test is a statistical comparison of the averages of two dependent samples. The Wilcoxon sign test works with metric (interval or ratio) data that is not normally distributed, or with ranked/ordinal data.
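
A minimal sketch of the test in Python uses SciPy's wilcoxon, which implements the signed-rank version; the pre/post scores below are hypothetical example data.

    from scipy.stats import wilcoxon

    # Hypothetical paired measurements (e.g., before and after an intervention)
    before = [12, 15, 14, 10, 18, 16, 13, 11]
    after = [14, 17, 13, 12, 20, 18, 15, 14]

    stat, p_value = wilcoxon(before, after)
    print(f"W = {stat:.1f}, p = {p_value:.4f}")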

Using Chi-Square Statistic in Research

The Chi Square statistic is commonly used for testing relationships between categorical variables.

Wald Wolfowitz Run Test

The Wald Wolfowitz run test is a non-parametric method that is used when a parametric test is not appropriate.

What is the Wilcoxon Sign Test?

The Wilcoxon Signed Rank test is a non-parametric analysis that statistically compares the averages of two dependent samples and assesses whether they differ significantly.

t-Tests
One Sample T-Test

The one sample t-test is a statistical procedure used to determine whether a sample of observations could have been generated by a process with a specific mean.
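
A minimal sketch with SciPy, testing a hypothetical sample against an assumed population mean of 100 (both invented for illustration), might look like this.

    from scipy.stats import ttest_1samp

    # Hypothetical sample of test scores; null hypothesis: population mean is 100
    scores = [102, 98, 110, 105, 95, 108, 101, 99]
    t_stat, p_value = ttest_1samp(scores, popmean=100)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")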

Paired Sample T-Test

The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero.

Tests for Two Independent Samples

There are four non-parametric tests for cases involving two independent samples. These tests are the Mann-Whitney U test, the Wald-Wolfowitz runs test, the Kolmogorov-Smirnov Z test, and the Moses extreme reactions test.

Conduct and Interpret a Dependent Sample T-Test

The dependent sample t-test is a member of the t-test family. All tests from the t-test family compare one or more mean scores with each other.

Conduct and Interpret a One-Sample T-Test

The one-sample t-test is a member of the t-test family. All the tests in the t-test family compare differences in mean scores of continuous-level (interval or ratio), normally distributed data.

Conduct and Interpret an Independent Sample T-Test

The independent samples t-test is a test that compares two groups on the mean value of a continuous (i.e., interval or ratio), normally distributed variable.
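
A minimal sketch with SciPy, using two hypothetical groups and the Welch variant (equal_var=False, an assumption chosen here for illustration), follows.

    from scipy.stats import ttest_ind

    # Hypothetical scores for two independent groups
    group_1 = [85, 90, 88, 75, 95, 82, 91]
    group_2 = [78, 84, 80, 72, 86, 79, 83]

    t_stat, p_value = ttest_ind(group_1, group_2, equal_var=False)  # Welch's t-test
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")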

Step Boldly to Completing your Research

If you’re like others, you’ve invested a lot of time and money developing your dissertation or project research. Finish strong by learning how our dissertation specialists support your efforts to cross the finish line.