Backward elimination is a model-selection method used in regression analysis to identify and remove predictor variables that are not statistically significant. It starts with all candidate predictor variables in the model and removes the least significant one at a time until only significant predictors remain. Because it relies on repeated significance tests, each applied to a refitted model, it is simple to carry out and is a widely used approach for choosing a smaller regression model.

In regression analysis, the goal is to explain as much of the variation in the dependent variable as possible using the independent variables. Backward elimination aims to keep the predictor variables that matter most for explaining the dependent variable while reducing the number of predictors, which can also lower the risk of multicollinearity [1]. Multicollinearity occurs when two or more predictor variables are highly correlated. It inflates the standard errors of the affected coefficients, which makes their estimates unstable and the resulting inferences unreliable [2]. Removing redundant predictors can therefore reduce multicollinearity and make the model's coefficient estimates more stable and easier to interpret.
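A common way to quantify multicollinearity is the variance inflation factor (VIF). For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. Values well above 1 (rules of thumb often cite 5 or 10) suggest that the predictor is largely explained by the others. The sketch below assumes NumPy; the function name `variance_inflation_factors` and the simulated data are hypothetical and only illustrate the idea.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an OLS regression of
    predictor j on all the other predictors plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical data: x2 is almost a copy of x1, x3 is independent of both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
print(variance_inflation_factors(np.column_stack([x1, x2, x3])))
```

Here x1 and x2 should show large VIFs because each is nearly determined by the other, while x3 should stay close to 1. Dropping either x1 or x2 would remove most of the collinearity.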

The process begins by fitting the full model with all candidate predictor variables. A significance level α is then chosen, usually 0.05 or 0.01, and each predictor's coefficient is tested for statistical significance (for ordinary least squares, typically with a t-test). If the least significant predictor, the one with the largest p-value, has a p-value of at least α, it is removed and the model is refitted without it. This repeats until every remaining predictor is significant at level α [3].
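The loop described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes ordinary least squares with an intercept, uses two-sided t-tests on the coefficients as the significance test, and relies on NumPy and SciPy. The function names `ols_pvalues` and `backward_elimination` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Fit OLS with an intercept; return two-sided t-test p-values of the slopes."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]                      # residual degrees of freedom
    sigma2 = (resid @ resid) / dof                 # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = coef / np.sqrt(np.diag(cov))
    pvals = 2 * stats.t.sf(np.abs(t_stats), dof)
    return pvals[1:]  # the intercept is never a removal candidate

def backward_elimination(X, y, names, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value until all p < alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        pvals = ols_pvalues(X[:, keep], y)         # refit on the current subset
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:                   # every remaining predictor is significant
            break
        del keep[worst]                            # remove the least significant predictor
    return [names[i] for i in keep]

# Hypothetical data: y depends on x0 and x2; x1 and x3 are pure noise.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)
print(backward_elimination(X, y, ["x0", "x1", "x2", "x3"]))
```

The model is refitted after every removal because the p-values of the remaining predictors change once a variable is dropped, especially when predictors are correlated. The noise columns will usually be eliminated, although by chance either can occasionally pass the α threshold. In practice, a library such as statsmodels reports these p-values directly through the `pvalues` attribute of a fitted `OLS` results object.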

Backward elimination is a straightforward and widely used method for model selection. Because it uses statistical tests to decide which predictor variables to drop, it produces a smaller model that is easier to interpret and may suffer less from multicollinearity.

References

[1] M. E. Burrows, “Backward Elimination in Regression Analysis,” Journal of Business & Economic Statistics, vol. 17, no. 4, pp. 469–474, 1999.

[2] B. G. Tabachnick and L. S. Fidell, Using Multivariate Statistics, 6th ed., Boston: Pearson, 2014.

[3] S. L. Miller, “Stepwise and Other Automated Model Selection Strategies,” in Handbook of Biological Statistics, 3rd ed., Baltimore, MD: Sparky House Publishing, 2015, pp. 395–398.