Distribution-Free Tests: A Review

In data analysis, it is often necessary to draw statistical inferences about a population from a sample. Many classical tests, such as the t-test, make those inferences by assuming that the data follow a particular distribution, typically the normal distribution. Distribution-free tests are a class of statistical tests whose validity does not depend on the data following any specific distribution. They still rely on weaker assumptions, such as independent observations and, for many tests, a continuous underlying distribution. This review discusses the history, theory, and application of distribution-free tests.

History

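Distribution-free ideas predate the 1950s. In 1933, Andrey Kolmogorov introduced a goodness-of-fit statistic that measures the largest distance between the empirical distribution function of a sample and a hypothesized continuous distribution. Its null distribution is the same whichever continuous distribution is hypothesized, which makes the test distribution-free. Nikolai Smirnov later adapted the statistic to compare two samples, giving the two-sample Kolmogorov-Smirnov test. Rank-based methods are older still: Spearman's rank correlation coefficient was published in 1904. The family of distribution-free methods later grew to include rank tests for comparing groups and tests of independence.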
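The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical distribution functions of two samples. The following is a minimal Python sketch using only the standard library; `ecdf` and `ks_two_sample_statistic` are hypothetical helper names chosen for illustration, not part of any package, and the sketch computes only the statistic D, not a p-value.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of observations in sorted_sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_two_sample_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that only change at observed
    values, so the maximum gap is attained at one of the pooled data points.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(ecdf(a_sorted, x) - ecdf(b_sorted, x))
        for x in a_sorted + b_sorted
    )


if __name__ == "__main__":
    print(ks_two_sample_statistic([1, 3], [2, 4]))        # 0.5
    print(ks_two_sample_statistic([1, 2, 3], [4, 5, 6]))  # 1.0
```

For real analyses, SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value.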

Theory

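Many distribution-free tests are based on rank statistics. Each observation is replaced by its rank, its position in the sorted data, so the test depends only on the ordering of the values. Under the null hypothesis (for example, that two samples come from the same continuous distribution), every assignment of ranks to the observations is equally likely, so the null distribution of a rank statistic can be computed exactly without knowing the underlying distribution. If one sample tends to produce larger values than the other, its ranks cluster at the top and the statistic falls in the tail of that null distribution. Not every distribution-free test uses ranks: the Kolmogorov-Smirnov test is built on the empirical distribution function, but it shares the key property that, for continuous data, the null distribution of its statistic does not depend on the underlying distribution.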
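To make the rank argument concrete, here is a minimal Python sketch of an exact one-sided rank-sum test. It assumes there are no ties and that the samples are small enough to enumerate every possible set of ranks; the helper names `ranks` and `rank_sum_exact_test` are illustrative, not from any library. Because the null distribution is built only from the integers 1 to n, nothing about the data's distribution enters the calculation.

```python
from itertools import combinations


def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch does not handle ties."""
    if len(set(values)) != len(values):
        raise ValueError("ties are not handled in this sketch")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rank_sum_exact_test(x, y):
    """Exact one-sided test of 'x tends to be larger than y'.

    Returns (W, p) where W is the sum of the ranks of x in the pooled data.
    Under H0 (both samples drawn from the same continuous distribution) every
    subset of len(x) ranks from 1..n is equally likely, so the null
    distribution of W is obtained by enumerating those subsets.
    """
    pooled = list(x) + list(y)
    r = ranks(pooled)
    w_observed = sum(r[: len(x)])
    n = len(pooled)
    null_sums = [sum(c) for c in combinations(range(1, n + 1), len(x))]
    # p-value: share of equally likely rank subsets at least as extreme as observed
    p_value = sum(1 for s in null_sums if s >= w_observed) / len(null_sums)
    return w_observed, p_value


if __name__ == "__main__":
    w, p = rank_sum_exact_test([5.1, 6.3, 7.2], [1.0, 2.4, 3.3])
    print(w, p)  # 15 0.05
```

The number of subsets grows combinatorially, so larger samples are usually handled with a normal approximation or a Monte Carlo permutation instead. SciPy's `scipy.stats.mannwhitneyu` implements the equivalent Mann-Whitney U test and handles ties.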

Application

Distribution-free tests are often used when the data are skewed, contain outliers, or are measured on an ordinal scale, so that the assumptions of a parametric test are doubtful. For example, a rank-sum test can check whether one of two populations tends to produce larger values than the other (often summarized as a difference in medians when the two distributions differ only by a shift), and a rank correlation such as Spearman's coefficient can test for a monotonic association between two variables.
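Spearman's coefficient is the correlation between the ranks of two variables. The following minimal Python sketch computes it with the standard no-ties formula and tests it against an exact permutation null distribution. It assumes distinct values within each variable and a small sample; `spearman_rho_exact` and `spearman_permutation_test` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations


def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch does not handle ties."""
    if len(set(values)) != len(values):
        raise ValueError("ties are not handled in this sketch")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho_exact(x, y):
    """Spearman's rho as an exact Fraction, using the no-ties formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the rank difference."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - Fraction(6 * d_squared, n * (n * n - 1))


def spearman_permutation_test(x, y):
    """Two-sided exact permutation test of H0: x and y are independent.

    Under H0 every pairing of x with a reordering of y is equally likely,
    so all n! orderings of y are enumerated (only feasible for small n).
    """
    observed = spearman_rho_exact(x, y)
    extreme = total = 0
    for y_perm in permutations(y):
        total += 1
        if abs(spearman_rho_exact(x, y_perm)) >= abs(observed):
            extreme += 1
    return float(observed), extreme / total


if __name__ == "__main__":
    rho, p = spearman_permutation_test([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(rho, round(p, 4))  # 0.8 0.1333
```

Exact fractions keep the comparison between permuted and observed coefficients free of rounding error. For larger samples or tied data, SciPy's `scipy.stats.spearmanr` is a practical alternative; it assigns tied values their average rank.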

Conclusion

Distribution-free tests are a valuable tool for data analysis because they support statistical inference without assuming a particular distribution for the data. They can be used to test whether two populations differ in location or whether two variables are associated. As such, they are an important tool for data scientists.

References

Pearson, E. (1956). Distribution-free tests. Journal of the American Statistical Association, 51(272), 577-593.

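Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83-91.

Smirnov, N. V. (1939). On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bulletin Mathématique de l'Université de Moscou, 2(2), 3-14.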

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72-101.
