This directory provides researchers with a near-exhaustive array of descriptive and inferential statistical analyses. We aim to cover common analyses in academic research, focusing on social sciences and healthcare.
A correlation expresses the strength of linkage or co-occurrence between two variables in a single value between -1 and +1. This value is called the correlation coefficient, and it is typically represented by the letter r.
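As a minimal sketch of how the correlation coefficient is computed, the following implements Pearson's r in plain Python (the data are invented purely for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# A perfectly linear positive relationship yields the maximum value, r = 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```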
The Canonical Correlation is a multivariate analysis of correlation. Canonical is the statistical term for analyzing latent variables (which are not directly observed) that represent multiple variables (which are directly observed).
Spurious correlations are ubiquitous in statistics. A spurious correlation occurs when two effects have clearly no causal relationship whatsoever in real life but are nonetheless statistically linked by correlation.
Like all correlation analyses the Point-Biserial Correlation measures the strength of association or co-occurrence between two variables. Correlation analyses express this strength of association in a single value, the correlation coefficient.
Usually, in statistics, we measure four types of correlations: Pearson correlation, Kendall rank correlation, Spearman correlation, and the Point-Biserial correlation.
The correlation ratio is a coefficient of non-linear association. In the case of a linear relationship, the correlation ratio, denoted by eta, reduces to the correlation coefficient.
The measures of association refer to a wide variety of coefficients (including bivariate correlation and regression coefficients) that measure the strength and direction of the relationship between variables; these measures of strength, or association, can be described in several ways, depending on the analysis.
Partial correlation is the measure of association between two variables while controlling for, or adjusting for, the effect of one or more additional variables.
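A first-order partial correlation can be computed from three pairwise Pearson coefficients; the sketch below illustrates the standard formula with made-up data (variable names are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def partial_r(x, y, z):
    """Correlation of x and y after controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))"""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical scores: exam results vs. study hours, controlling for age
hours = [2, 4, 6, 8, 10]
exam = [50, 60, 65, 75, 90]
age = [20, 21, 20, 22, 23]
print(round(partial_r(hours, exam, age), 3))
```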
Pearson’s correlation coefficient is the test statistic that measures the statistical relationship, or association, between two continuous variables.
Much like cluster analysis involves grouping similar cases, factor analysis involves grouping similar variables into dimensions.
Confirmatory factor analysis (CFA) is a multivariate statistical procedure that is used to test how well the measured variables represent the number of constructs.
Exploratory factor analysis is a statistical technique that is used to reduce data to a smaller set of summary variables and to explore the underlying theoretical structure of the phenomena.
Factor analysis is a technique that is used to reduce a large number of variables into a smaller number of factors.
LISREL is a Windows-based software application for performing structural equation modeling (SEM) and other related linear structural modeling (e.g., multilevel structural equation modeling, multilevel linear and non-linear modeling, etc.).
Path analysis is an extension of the regression model. In a path analysis, two or more causal models are compared from the correlation matrix.
PLS Graph is an application with a Windows-based graphical user interface that helps the researcher perform partial least squares (PLS) analyses.
Principal component analysis is an approach to factor analysis that considers the total variance in the data, which is unlike common factor analysis, and transforms the original variables into a smaller set of linear combinations.
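As a rough sketch of the idea, for just two variables the principal components can be obtained from the 2×2 covariance matrix in closed form (the data are hypothetical; real analyses involve more variables and a linear algebra library):

```python
import math

def pca_2d(x, y):
    """Eigenvalues of the 2x2 covariance matrix of (x, y), i.e. the variances
    of the two principal components, plus the share of total variance
    captured by the first component."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]
    mid = (sxx + syy) / 2
    half = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + half, mid - half
    return lam1, lam2, lam1 / (lam1 + lam2)

# Perfectly correlated variables: the first component explains everything
lam1, lam2, share = pca_2d([1, 2, 3, 4], [2, 4, 6, 8])
print(round(share, 6))  # → 1.0
```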
Structural equation modeling is a multivariate statistical analysis technique that is used to analyze structural relationships.
Mediational hypotheses are hypotheses in which it is assumed that the effect of an independent variable on a dependent variable is transmitted through a mediating variable, while the independent variable may still directly affect the dependent variable.
Profile Analysis is mainly concerned with test scores, more specifically with profiles of test scores.
Sequential one-way discriminant analysis is similar to the one-way discriminant analysis. Discriminant analysis predicts group membership by fitting a linear combination of the predictor variables (a discriminant function) to the data.
On this page you’ll learn about the four data levels of measurement (nominal, ordinal, interval, and ratio) and why they are important.
Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale.
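One widely used effect size for the difference between two group means is Cohen's d; here is a minimal sketch in Python, with invented scores:

```python
import math

def cohens_d(g1, g2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    v1 = sum((x - m1) ** 2 for x in g1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in g2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical treatment vs. control scores: means differ by half a pooled SD
print(cohens_d([2, 4, 6], [1, 3, 5]))  # → 0.5
```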
Hierarchical linear modeling (HLM) is an ordinary least square (OLS) regression-based analysis that takes the hierarchical structure of the data into account.
Hypothesis testing is a statistical method that is used in making statistical decisions using experimental data.
Latent Class Analysis (LCA) is a statistical technique that is used in factor, cluster, and regression techniques; it is a subset of structural equation modeling (SEM).
Mathematical expectation, also known as the expected value, is the summation or integration of the possible values of a random variable, weighted by their probabilities.
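As a small illustration, the expected value of a discrete random variable is the probability-weighted sum of its outcomes; exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x * P(X = x) for a discrete pmf given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())

# A fair six-sided die: E[X] = (1 + 2 + ... + 6) / 6 = 3.5
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))  # → 7/2
```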
Meta-analysis is a statistical analysis that combines the outcomes of a large collection of individual studies for the purpose of integrating the findings.
A moderator variable, commonly denoted as just M, is a third variable that affects the strength of the relationship between a dependent and independent variable.
Nominal variable association refers to the statistical relationship(s) between nominal variables. Nominal variables are variables that are measured at the nominal level and have no inherent ranking.
The normality assumption is one of the most misunderstood in all of statistics. In multiple regression, the assumption requiring a normal distribution applies only to the disturbance term, not to the independent variables as is often believed.
Probability theory originated in the study of games of chance such as cards, coin tossing, and dice, but in modern times probability has great importance in decision making.
Reliability refers to the extent to which a scale produces consistent results when the measurements are repeated a number of times. The analysis of reliability is called reliability analysis.
The Runs Test of Randomness is a non-parametric method that is used in cases when the corresponding parametric test cannot be used.
Significance testing refers to the use of statistical techniques to determine whether a result observed in a sample is actually representative of the population or arose merely by chance.
Survival analysis helps the researcher assess if, and why, certain individuals are exposed to a higher risk of experiencing an event of interest, such as death, machine failure, drug relapse, etc.
In statistical analysis, all parametric tests assume certain characteristics about the data, known as assumptions. Violation of these assumptions changes the conclusions of the research and the interpretation of the results.
Time series analysis is a statistical technique that deals with time series data, or trend analysis. Time series data consist of observations recorded over a series of particular time periods or intervals.
The log-linear analysis is appropriate when the goal of research is to determine if there is a statistically significant relationship among three or more discrete variables (Tabachnick & Fidell, 2012).
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis.
As experts in cluster analysis, with over 22 years of dissertation consulting success, and our own dissertation experience, we are well equipped to analyze your data and report your cluster analysis findings.
Validity implies precise and exact results acquired from the data collected. In technical terms, a valid measure allows proper and correct conclusions to be drawn from the sample that are generalizable to the entire population.
Linear regression is an analysis that assesses whether one or more predictor variables explain the dependent (criterion) variable.
Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms – particularly regarding linearity, normality, homoscedasticity, and measurement level.
Multiple linear regression analysis makes several key assumptions.
Binary logistic regressions, by design, overcome many of the restrictive assumptions of linear regressions. For example, linearity, normality and equal variances are not assumed, nor is it assumed that the error term variance is normally distributed.
Linear regression is the most basic and commonly used predictive analysis.
Logistic regression is the regression analysis to conduct when the dependent variable is dichotomous (binary).
Multinomial Logistic Regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels.
As a predictive analysis, multiple linear regression is used to describe data and to explain the relationship between one dependent variable and two or more independent variables.
Ordinal regression is a member of the family of regression analyses. As a predictive analysis, ordinal regression describes data and explains the relationship between one dependent variable and two or more independent variables.
The assumption of homoscedasticity (meaning “same variance”) is central to linear regression models.
Linear regression analysis consists of 3 stages – (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.
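The estimation stage (fitting the line) can be sketched in a few lines of Python, and R² then serves the evaluation stage; the data below are invented so that they lie exactly on a line:

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns slope, intercept, R²."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b, a, 1 - ss_res / ss_tot   # R² = 1 - SS_res / SS_tot

# Hypothetical data lying exactly on y = 1 + 2x, so R² = 1
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # → (2.0, 1.0, 1.0)
```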
Logistic Regression Analysis estimates the log odds of an event.
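A brief sketch of what "log odds" means: the logit maps a probability onto the scale on which logistic regression is linear, and the inverse logit maps it back (values below are illustrative only):

```python
import math

def logit(p):
    """Probability → log odds."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Log odds → probability."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5))                       # → 0.0  (even odds)
print(round(inv_logit(logit(0.8)), 6))  # → 0.8  (round trip recovers p)
```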
Multiple Linear Regression Analysis consists of more than just fitting a linear line through a cloud of data points.
Logistic regression is a class of regression in which one or more independent variables are used to predict a categorical dependent variable.
Multiple regression generally explains the relationship between multiple independent or predictor variables and one dependent or criterion variable.
Nonlinear regression is a regression in which the dependent or criterion variables are modeled as a non-linear function of model parameters and one or more independent variables.
Ordinal regression is a statistical technique that is used to predict behavior of ordinal level dependent variables with a set of independent variables.
There are 3 major questions that the logistic regression analysis answers – (1) causal analysis, (2) forecasting an outcome, (3) trend forecasting.
There are 3 major areas of questions that the regression analysis answers – (1) causal analysis, (2) forecasting an effect, (3) trend forecasting.
There are 3 major areas of questions that the multiple linear regression analysis answers – (1) causal analysis, (2) forecasting an effect, (3) trend forecasting.
A regression assesses whether predictor variables account for variability in a dependent variable.
Residual scatter plots provide a visual examination of the assumption of homoscedasticity between the predicted dependent variable scores and the errors of prediction.
The basis of a multiple linear regression is to assess whether one continuous dependent variable can be predicted from a set of independent (or predictor) variables.
This example is based on the FBI’s 2006 crime statistics. In particular, we are interested in the relationship between size of the state and the number of murders in the city.
Our example is a research study on 107 pupils. These pupils have been measured with five different aptitude tests, one for each important category (reading, writing, understanding, summarizing, etc.).
This example is based on the FBI’s 2006 crime statistics. In particular, we are interested in the relationship between size of the state, various property crime rates, and the number of murders in the city.
Two-stage least squares (2SLS) regression analysis is a statistical technique that is used in the analysis of structural equations. This technique is an extension of the ordinary least squares (OLS) method.
Binary Logistic Regression is a statistical analysis that determines how much variance, if at all, is explained on a dichotomous dependent variable by a set of independent variables.
Linear regression is a basic and commonly used type of predictive analysis.
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).
As a predictive analysis, the multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.
ANOVA is a statistical technique that assesses potential differences in a scale-level dependent variable by a nominal-level variable having 2 or more categories.
The factorial ANOVA has several assumptions that need to be fulfilled – (1) interval data of the dependent variable, (2) normality, (3) homoscedasticity, and (4) no multicollinearity.
ANCOVA is short for Analysis of Covariance. The factorial analysis of covariance is a combination of a factorial ANOVA and a regression analysis.
ANOVA is short for ANalysis Of Variance. As discussed in the chapter on the one-way ANOVA, the main purpose of a one-way ANOVA is to test if two or more groups differ from each other significantly in one or more characteristics.
ANCOVA is short for Analysis of Covariance. The analysis of covariance is a combination of an ANOVA and a regression analysis.
ANOVA is short for ANalysis Of VAriance. The main purpose of an ANOVA is to test if two or more groups differ from each other significantly in one or more characteristics.
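The core computation, a ratio of between-group to within-group variability, can be sketched directly; the three groups below are hypothetical:

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group vs. within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Sum of squares between groups (weighted by group size)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Sum of squares within groups
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three hypothetical groups with means 2, 3, and 4
print(anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # → 3.0
```

The p-value would then come from the F distribution with (k - 1, n - k) degrees of freedom.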
MANCOVA is short for Multivariate Analysis of Covariance. In a one-way MANCOVA, the words “one” and “way” in the name indicate that the analysis includes only one independent variable.
MANOVA is short for Multivariate ANalysis Of Variance. The main purpose of a one-way MANOVA is to test if two or more groups differ from each other significantly in more than one characteristic.
The repeated measures ANCOVA is a member of the GLM procedures. ANCOVA is short for Analysis of Covariance.
The repeated measures ANOVA is a member of the ANOVA family. ANOVA is short for ANalysis Of VAriance.
Generalized linear models are an extension, or generalization, of the linear modeling process which allows for non-normal distributions.
GLM repeated measures is a statistical technique for a dependent, or criterion, variable that is measured repeatedly, yielding correlated, non-independent data.
Multivariate analysis of variance (MANOVA) is an extension of the univariate analysis of variance (ANOVA).
Multivariate analysis of covariance (MANCOVA) is a statistical technique that is the extension of analysis of covariance (ANCOVA).
Multivariate GLM (generalized linear model) is the extended form of the GLM; it deals with more than one dependent variable and one or more independent variables.
Repeated measure analysis involves a ‘within subject’ design. The true ‘within subject’ design in this repeated measure analysis is a design in which each subject is measured under each treatment condition.
Chi-square Automatic Interaction Detector (CHAID) is a technique created by Gordon V. Kass in 1980. CHAID is a tool used to discover the relationship between variables.
The Chi-Square goodness of fit test is a non-parametric test that is used to find out whether the observed values of a given phenomenon differ significantly from the expected values.
The Wilcoxon Sign Test requires two repeated measurements on a commensurate scale, that is, that the values of both observations can be compared.
The Chi-Square test of independence is used to determine if there is a significant relationship between two nominal (categorical) variables.
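The statistic itself is easy to sketch: compare each observed cell count with the count expected under independence (the table values below are hypothetical):

```python
def chi_square_stat(table):
    """Chi-square statistic, the sum of (O - E)^2 / E over all cells of a
    contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Identical row proportions → no association → statistic is 0
print(chi_square_stat([[10, 20], [20, 40]]))  # → 0.0
```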
The Mann-Whitney U-test is a statistical comparison of the central tendencies of two independent samples. The U-test is a member of the bigger group of dependence tests.
A Spearman correlation coefficient is also referred to as Spearman rank correlation or Spearman’s rho.
The Wilcoxon Sign test is a statistical comparison of the average of two dependent samples. The Wilcoxon sign test is a sibling of the t-tests.
The Chi-Square Test of Independence is also known as Pearson’s Chi-Square and has two major applications: 1) goodness of fit test and 2) test of independence.
There are three significance tests for cases involving more than two dependent samples. These are the Friedman Test, the Kendall’s W test, and the Cochran’s Q test.
The Wilcoxon signed rank test is the non-parametric counterpart of the dependent samples t-test.
There are two accepted measures of non-parametric rank correlations: Kendall’s tau and Spearman’s (rho) rank correlation coefficient. Correlation analyses measure the strength of the relationship between two variables.
The Kruskal-Wallis test is a nonparametric (distribution free) test, and is used when the assumptions of one-way ANOVA are not met.
The Mann-Whitney U test is the non-parametric alternative to the independent sample t-test.
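The U statistic has a simple counting interpretation, which this sketch with hypothetical samples makes explicit:

```python
def mann_whitney_u(x, y):
    """U for sample x: the number of (x, y) pairs where the x value is
    larger (ties count half)."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Every value in the first group beats every value in the second: U = n1 * n2
print(mann_whitney_u([5, 6, 7], [1, 2, 3]))  # → 9.0
```

When the two samples overlap completely, U falls toward n1·n2/2, which is its value under the null hypothesis.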
Non-parametric significance tests for two dependent samples are used when the researcher wants to study correlated, or matched, samples.
Ordinal variables are variables that are categorized in an ordered format, so that the different categories can be ranked from smallest to largest or from less to more on a particular characteristic.
The Wilcoxon Sign Test is used to determine whether the mean ranks of two dependent, or matched, samples are different from each other.
The Sign test is a non-parametric test that is used to test whether positive and negative differences between two matched groups are equally likely.
The Wilcoxon sign test is a statistical comparison of the averages of two dependent samples. The Wilcoxon sign test works with metric (interval or ratio) data that is not multivariate normal, or with ranked/ordinal data.
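A minimal sketch of the signed-rank computation that underlies this family of tests, using invented before/after scores (tied magnitudes share an average rank):

```python
def signed_rank_w(before, after):
    """Sum of ranks of positive differences (zero differences are dropped;
    tied magnitudes receive the average of their ranks)."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    mags = sorted(abs(d) for d in diffs)

    def avg_rank(m):
        positions = [i + 1 for i, v in enumerate(mags) if v == m]
        return sum(positions) / len(positions)

    return sum(avg_rank(abs(d)) for d in diffs if d > 0)

# Hypothetical paired scores; the differences are +1, +3, -1
print(signed_rank_w([10, 12, 14], [11, 15, 13]))  # → 4.5
```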
The Chi Square statistic is commonly used for testing relationships between categorical variables.
The Wald-Wolfowitz runs test is a non-parametric method that is used in cases when the corresponding parametric test cannot be used.
The Wilcoxon Signed Rank test is a non-parametric analysis that statistically compares the averages of two dependent samples and assesses for significant differences.
The one sample t-test is a statistical procedure used to determine whether a sample of observations could have been generated by a process with a specific mean.
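The t statistic itself is a short computation, sketched here with made-up data; the p-value would then come from the t distribution with n - 1 degrees of freedom:

```python
import math

def one_sample_t(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return (m - mu) / (s / math.sqrt(n))

# Hypothetical sample with mean 3, tested against a hypothesized mean of 2
print(round(one_sample_t([1, 2, 3, 4, 5], 2), 4))  # → 1.4142
```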
The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero.
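Because the paired test reduces to a one-sample test on the within-pair differences, a sketch is equally short (the pre/post scores are invented):

```python
import math

def paired_t(before, after):
    """Paired-sample t: a one-sample t-test of the differences against zero."""
    d = [b - a for a, b in zip(before, after)]
    n = len(d)
    m = sum(d) / n
    s = math.sqrt(sum((x - m) ** 2 for x in d) / (n - 1))
    return m / (s / math.sqrt(n))

# Hypothetical pre/post scores; the differences are 2, 1, 2
print(round(paired_t([1, 2, 3], [3, 3, 5]), 4))  # → 5.0
```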
There are four non-parametric tests for cases involving two independent samples. These tests are the Mann-Whitney U test, the Wald-Wolfowitz Runs test, the Kolmogorov-Smirnov Z test, and the Moses Extreme Reactions test.
The dependent sample t-test is a member of the t-test family. All tests from the t-test family compare one or more mean scores with each other.
The one-sample t-test is a member of the t-test family. All the tests in the t-test family compare differences in mean scores of continuous-level (interval or ratio), normally distributed data.
The independent samples t-test is a test that compares two groups on the mean value of a continuous (i.e., interval or ratio), normally distributed variable.
If you’re like others, you’ve invested a lot of time and money developing your dissertation or project research. Finish strong by learning how our dissertation specialists support your efforts to cross the finish line.