Selected Nonparametric Tests and Techniques
Difference Between Parametric and Nonparametric Test …
Fisherian parametric tests are classified by data miners Nisbet, Elder, and Miner (2009) as first-generation statistical methods. Parametric tests handle the relatively small experimental data sets typical of academic settings efficiently, but business and industry work with huge data sets, and there it was recognized that "analysts could bring computers to their 'knees' with the processing of classical statistical analyses" (Nisbet, Elder, & Miner, 2009, p. 30). As a remedy, a new approach to decision making was created based on artificial intelligence (AI), which is modeled on the human brain rather than on Fisher’s parametric approach. As a result, a new set of nonparametric tools, including neural nets, classification trees, and multivariate adaptive regression splines (MARS), was developed for analyzing huge data sets. This cluster of tools is called data mining. Unlike conventional parametric tests, which emphasize theoretical explanation, data mining is primarily used by business for prediction. This paradigm shift is reflected in the renaming of SPSS in 2009, the year IBM acquired the company: the package was rebranded as Predictive Analytics SoftWare (PASW) because data mining and text mining tools had been tightly integrated with what were formerly SPSS's parametric procedures. IBM later reverted the name to SPSS.
Parametric and Nonparametric Methods in Statistics
In the social sciences, the assumption of independence, which is required by ANOVA and many other parametric procedures, is always violated to some degree. Take the Trends in International Mathematics and Science Study (TIMSS) as an example. The TIMSS sample design is a two-stage stratified cluster sampling scheme. In the first stage, schools are sampled with probability proportional to size. In the second stage, one or more intact classes of students from the target grades are drawn (Joncas, 2008). Parametric ordinary least squares (OLS) regression models are valid if and only if the residuals are normally distributed and independent, with a mean of zero and a constant variance. However, TIMSS data are collected using a complex sampling method in which units at one level are nested within another (i.e., students are nested within classes, classes within schools, and schools within nations), so the residuals are unlikely to be independent of each other. If OLS regression is used to estimate relationships in nested data, the estimated standard errors will be biased downward, which overstates the statistical significance of the regression coefficients. In this case, hierarchical linear modeling (HLM) (Raudenbush & Bryk, 2002) should be employed to handle the nested data structure. Instead of fitting one overall model, HLM constructs models at each level of the hierarchy, which is why it is also called multilevel modeling.
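The downward bias in OLS standard errors can be illustrated with a small simulation. The sketch below is not taken from TIMSS or from any of the cited authors. It assumes a hypothetical design of 30 schools with 20 students each, a school-level predictor whose true slope is zero, and made-up variance components (`tau`, `sigma`). It fits a single-level OLS regression that ignores the nesting, then compares the standard error OLS reports with the actual spread of the slope estimates across repeated samples.

```python
import random
import statistics


def simulate_once(rng, n_schools=30, n_students=20, tau=1.0, sigma=1.5):
    """Draw one nested sample and fit y ~ x by OLS, ignoring the nesting.

    All parameter values are illustrative assumptions.
    Returns (slope, naive_se). The true slope is 0 by construction.
    """
    xs, ys = [], []
    for _ in range(n_schools):
        x_school = rng.gauss(0, 1)    # predictor measured at the school level
        u_school = rng.gauss(0, tau)  # effect shared by all students in a school
        for _ in range(n_students):
            xs.append(x_school)
            ys.append(u_school + rng.gauss(0, sigma))  # student-level noise

    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Textbook OLS standard error, which assumes independent residuals.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    naive_se = (rss / (n - 2) / sxx) ** 0.5
    return slope, naive_se


def compare(reps=300, seed=0):
    """Return (empirical SD of the slope, mean naive OLS standard error)."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(reps)]
    slopes = [slope for slope, _ in results]
    naive_ses = [se for _, se in results]
    return statistics.stdev(slopes), statistics.fmean(naive_ses)


empirical_sd, mean_naive_se = compare()
print(f"empirical SD of slope estimates: {empirical_sd:.3f}")
print(f"mean naive OLS standard error:   {mean_naive_se:.3f}")
```

Under these assumptions the slope estimates vary from sample to sample considerably more than the naive standard error suggests, so a test based on the naive value would reject a true null hypothesis too often. A multilevel model addresses this by estimating the school-level variance explicitly instead of folding it into a single residual term.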
The merits of HLM do not end here. For analyzing longitudinal data, HLM is considered superior to repeated measures ANOVA because the latter must assume compound symmetry, whereas HLM allows the analyst to specify many different forms of covariance structure (Littell & Milliken, 2006). Readers are encouraged to read Shin's (2009) concise comparison of repeated measures ANOVA and HLM.
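To make the contrast concrete, here is a sketch of two covariance structures for three repeated measurements on the same subject. The symbols (a common variance σ² and a correlation ρ) and the choice of a first-order autoregressive structure, AR(1), as the alternative are illustrative assumptions rather than examples drawn from the cited sources. Compound symmetry forces every pair of occasions to share the same covariance, whereas AR(1) lets the correlation decay as the occasions grow farther apart in time.

```latex
% Compound symmetry (left) versus first-order autoregressive (right)
% for three measurement occasions; sigma^2 and rho are illustrative symbols.
\[
\Sigma_{\mathrm{CS}} = \sigma^2
\begin{pmatrix}
1    & \rho & \rho \\
\rho & 1    & \rho \\
\rho & \rho & 1
\end{pmatrix},
\qquad
\Sigma_{\mathrm{AR(1)}} = \sigma^2
\begin{pmatrix}
1      & \rho & \rho^2 \\
\rho   & 1    & \rho   \\
\rho^2 & \rho & 1
\end{pmatrix}
\]
```

With measurements taken at, say, three successive time points, the first and third occasions are often less strongly related than adjacent ones, a pattern the left-hand matrix cannot represent and the right-hand matrix can.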