4 Diagnostic Analytics

Diagnostic analytics involves examining data to understand the reasons behind past performance.

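This form of analytics moves beyond descriptive analytics, which merely identifies what has happened, to ask why it happened. It relies on more involved analyses, such as correlation, regression, and drill-down techniques, to uncover patterns in the data and candidate explanations for them. Correlation and regression alone show association rather than proof of causation, so causal conclusions usually need support from domain knowledge or controlled comparisons.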
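As an illustration of the drill-down idea, the sketch below starts from an overall revenue drop and narrows it down step by step. The `sales` table, its column names, and all figures are invented for this example, and pandas is assumed to be available.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration
sales = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q2", "Q2"],
    "region":  ["North", "North", "South", "South",
                "North", "North", "South", "South"],
    "product": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "revenue": [100, 80, 90, 70, 105, 82, 60, 71],
})

# Level 1: total revenue per quarter (the "what happened" view)
by_quarter = sales.groupby("quarter")["revenue"].sum()

# Level 2: break the change down by region
by_region = sales.pivot_table(index="region", columns="quarter",
                              values="revenue", aggfunc="sum")
by_region["change"] = by_region["Q2"] - by_region["Q1"]
worst_region = by_region["change"].idxmin()

# Level 3: within the weakest region, break the change down by product
by_product = sales[sales["region"] == worst_region].pivot_table(
    index="product", columns="quarter", values="revenue", aggfunc="sum")
by_product["change"] = by_product["Q2"] - by_product["Q1"]
worst_product = by_product["change"].idxmin()

print(by_quarter)
print(by_region)
print(by_product)
print(f"Largest drop: product {worst_product} in region {worst_region}")
```

Each level only localizes where the drop occurred; explaining why that product declined in that region still requires further investigation.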

Diagnostic analytics is crucial for businesses and organizations looking to diagnose issues, understand underlying factors, and improve future performance based on insights from past actions.

4.1 Parametric vs. Non-Parametric Tests

Parametric and non-parametric tests are two broad categories of statistical tests used in hypothesis testing. The choice between them depends on the type of data you’re analyzing and the assumptions you can make about that data. Here’s a comparison of the two:

Parametric Tests

Assumptions: Parametric tests assume that the data follow a specific distribution, usually a normal distribution. Many of them also assume homogeneity of variances across groups and that the data are measured on an interval or ratio scale.
Data Requirements: The data must be quantitative (interval or ratio scale) and should at least approximately satisfy the distributional assumptions above, for example normality within each group being compared.
Examples: Common parametric tests include the t-test (used to compare the means of two groups), ANOVA (used to compare the means of three or more groups), and the Pearson correlation coefficient (used to assess the strength and direction of the linear relationship between two continuous variables).
Advantages: When their assumptions are met, parametric tests are generally more powerful than non-parametric tests, meaning they are more likely to detect a true effect when one exists.
Disadvantages: The main drawback is that if the assumptions are not met, the results of the parametric tests may not be valid.
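The three parametric tests named above can be sketched with SciPy as follows. The data are simulated from normal distributions with arbitrary, made-up means and spreads, so the numbers carry no meaning beyond showing how each test is called.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Hypothetical data: three groups simulated from normal distributions
group_a = rng.normal(loc=50.0, scale=5.0, size=30)
group_b = rng.normal(loc=53.0, scale=5.0, size=30)
group_c = rng.normal(loc=56.0, scale=5.0, size=30)

# Independent two-sample t-test: compares the means of two groups
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: compares the means of three or more groups
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)

# Pearson correlation: strength and direction of a linear relationship
x = rng.normal(size=30)
y = 2.0 * x + rng.normal(scale=0.5, size=30)  # linear signal plus noise
r, r_p = stats.pearsonr(x, y)

print(f"t-test:  t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"ANOVA:   F = {f_stat:.3f}, p = {anova_p:.4f}")
print(f"Pearson: r = {r:.3f}, p = {r_p:.4g}")
```

Note that `ttest_ind` assumes equal variances by default; passing `equal_var=False` switches to Welch's t-test, which drops that assumption.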

4.1.1 Non-Parametric Tests

Assumptions: Non-parametric tests do not assume that the data follow a specific distribution such as the normal, which is why they are often called distribution-free. They are not assumption-free, however: most still require independent observations, for example.
Data Requirements: These tests can be used on data that is not normally distributed, ordinal data, or when the sample size is small. They are more flexible in terms of the types of data they can handle.
Examples: Common non-parametric tests include the Mann-Whitney U test (used to compare two independent groups), the Kruskal-Wallis test (used to compare three or more independent groups), and the Spearman rank correlation coefficient (used to assess the strength and direction of a monotonic relationship between two variables, which need not be linear).
Advantages: The main advantage is their flexibility; they can be used when parametric test assumptions are not met. They are also useful for ordinal data or for data with outliers that might affect parametric test results.
Disadvantages: Non-parametric tests are generally less powerful than parametric tests when the assumptions of the latter are met. This means they might not detect an effect that is actually there, especially if the sample size is not large enough.
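The non-parametric counterparts can be called in much the same way. The sketch below uses simulated, right-skewed (exponential) data with made-up scales, plus a deliberately non-linear but strictly increasing relationship to show where Spearman and Pearson differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed data, clearly not normally distributed
group_a = rng.exponential(scale=1.0, size=25)
group_b = rng.exponential(scale=1.5, size=25)
group_c = rng.exponential(scale=2.0, size=25)

# Mann-Whitney U test: compares two independent groups using ranks
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Kruskal-Wallis test: compares three or more independent groups using ranks
h_stat, h_p = stats.kruskal(group_a, group_b, group_c)

# A strictly increasing but non-linear relationship
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 4.0)

rho, rho_p = stats.spearmanr(x, y)  # rank-based: sees a perfect monotonic trend
r, _ = stats.pearsonr(x, y)         # linear measure: lower than rho here

print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {h_p:.4f}")
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because Spearman works on ranks, any strictly increasing relationship without ties yields a coefficient of 1, while Pearson is pulled below 1 by the curvature.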

4.1.2 Choosing Between Them

Data Distribution and Scale: If your data is normally distributed and measured on an interval or ratio scale, a parametric test might be more appropriate. If your data does not meet these criteria, consider a non-parametric test.
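Sample Size: With small samples, normality is hard to verify and parametric tests are more sensitive to violations of it, so a non-parametric test is often the safer choice. With larger samples, parametric tests on means become more robust to non-normality because of the central limit theorem. Either kind of test needs enough observations to have adequate power.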
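Research Question: The choice also depends on what you want to learn. A t-test, for example, addresses a difference in means, whereas the Mann-Whitney U test addresses whether values in one group tend to be larger than values in the other.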
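One way to turn the considerations above into a procedure is to check the assumptions first and then pick the test. The helper below, `compare_two_groups`, is a hypothetical function written for this sketch, not part of any library. It assumes two independent groups and uses the Shapiro-Wilk test for normality and Levene's test for equal variances. Pre-testing assumptions like this is only one common workflow, and the 0.05 threshold is a convention rather than a rule.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from simple assumption checks."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shapiro-Wilk: a small p-value suggests a departure from normality
    normal_a = stats.shapiro(a).pvalue > alpha
    normal_b = stats.shapiro(b).pvalue > alpha

    if normal_a and normal_b:
        # Levene: a small p-value suggests unequal variances
        equal_var = stats.levene(a, b).pvalue > alpha
        result = stats.ttest_ind(a, b, equal_var=equal_var)
        name = "Student's t-test" if equal_var else "Welch's t-test"
    else:
        # Fall back to a rank-based test when normality looks doubtful
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "Mann-Whitney U test"

    return name, result.statistic, result.pvalue

# Usage with simulated, right-skewed data (made-up scales)
rng = np.random.default_rng(seed=2)
skewed_a = rng.exponential(scale=1.0, size=40)
skewed_b = rng.exponential(scale=2.0, size=40)

name, stat, p = compare_two_groups(skewed_a, skewed_b)
print(f"{name}: statistic = {stat:.3f}, p = {p:.4f}")
```

A design note: falling back to Welch's t-test when variances look unequal keeps the comparison parametric while dropping the homogeneity assumption, which is often preferable to switching test families for that reason alone.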

In summary, the choice between parametric and non-parametric tests depends on the characteristics of your data and your specific research needs. Understanding the assumptions and requirements of each type of test is crucial in making the right choice.

Feature	Parametric Tests	Non-Parametric Tests
Assumptions	Assumes specific distribution (usually normal). Assumes homogeneity of variances. Data measured on interval/ratio scale.	No specific distribution assumed. Less strict about data assumptions.
Data Requirements	Quantitative data that meets distribution assumptions. Interval/ratio scale.	Suitable for non-normally distributed data, ordinal data, or small sample sizes. Flexible.
Examples	t-test, ANOVA, Pearson correlation coefficient	Mann-Whitney U test, Kruskal-Wallis test, Spearman rank correlation
Advantages	More powerful when assumptions are met. More likely to detect a true effect when one exists.	Flexible, can be used when parametric assumptions are not met. Useful for ordinal data and data with outliers.
Disadvantages	Results may not be valid if assumptions are not met.	Generally less powerful than parametric tests, might not detect an effect that is actually there.
Choice Considerations	Use if data is normally distributed and measured on an interval or ratio scale.	Use for non-normally distributed data, ordinal data, or when sample size is small.