Best Model Selection Method for the Best Latent Variables Determination When Solving Multicollinearity with Partial Least Squares

Authors

  • Olusegun O. Alabi, Department of Statistics, Federal University of Technology, Akure, Nigeria
  • Gbemisola W. Ogunmefun, Department of Statistics, Federal University of Technology, Akure, Nigeria
  • Rasaki Y. Akinbo, Department of Mathematics and Statistics, Federal Polytechnic, Ilaro, Nigeria
  • Toba T. Bamidele, Department of Statistics, Federal University of Technology, Akure, Nigeria
  • Olusesan T. Akintola, Department of Mathematics and Statistics, Joseph Ayo Babalola University, Ikeji-Arakeji, Nigeria

DOI:

https://doi.org/10.62054/ijdm/0204.17

Abstract

Multicollinearity arises when the assumption of independence among the explanatory variables of a linear regression model is violated. Under multicollinearity, the Ordinary Least Squares (OLS) estimator yields inefficient parameter estimates, whereas Partial Least Squares (PLS) estimates are more robust. In PLS, weights must be assigned to each explanatory variable before the latent variables (LVs) are extracted. Two significant challenges therefore arise with the PLS method: the choice of weighting scheme and the selection of the LVs needed to obtain efficient estimates of the model parameters. This study considers two weight allocation schemes, equal weights and weights based on the variance of the regressors, and two commonly used model selection criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), which were used to select the best model and hence the optimal LVs. The study compared the performance of PLS under each combination of weighting scheme and model selection criterion. Performance was assessed using the Total Mean Squared Error (TMSE) of all model parameters estimated by PLS across scenarios that varied the sample size, the level of multicollinearity, the variability values, the weight assignment, and the model selection method. The study concluded that BIC is the best model selection method for determining the optimal LVs when PLS estimation is used to handle multicollinearity in a linear regression model.
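As an illustration of how AIC and BIC can be used to choose the number of latent variables, the following is a minimal Python sketch and not the authors' implementation. It makes several assumptions that do not come from the abstract: the function names `pls1_fit` and `information_criteria` are hypothetical; the latent variables are extracted with the standard NIPALS covariance weights rather than the equal-weight or variance-based schemes studied in the paper; the criteria use the Gaussian forms n ln(RSS/n) + 2m and n ln(RSS/n) + m ln(n), with m taken as the number of latent variables plus one for the intercept; and the collinear data are simulated with an arbitrary correlation level.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # work on centered data
    p = X.shape[1]
    W = np.zeros((p, n_components))          # weight vectors
    P = np.zeros((p, n_components))          # X loadings
    q = np.zeros(n_components)               # y loadings
    for a in range(n_components):
        w = Xk.T @ yk                        # covariance-based weights (assumption)
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores of the a-th latent variable
        tt = t @ t
        p_a = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p_a)           # deflate X
        yk = yk - q[a] * t                   # deflate y
        W[:, a], P[:, a] = w, p_a
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients on the original X scale
    return beta, y_mean - x_mean @ beta


def information_criteria(X, y, max_components):
    """Return a list of (k, AIC, BIC) for k = 1..max_components latent variables."""
    n = len(y)
    rows = []
    for k in range(1, max_components + 1):
        beta, b0 = pls1_fit(X, y, k)
        rss = float(np.sum((y - b0 - X @ beta) ** 2))
        n_params = k + 1                     # assumed parameter count: k LVs + intercept
        aic = n * np.log(rss / n) + 2 * n_params
        bic = n * np.log(rss / n) + n_params * np.log(n)
        rows.append((k, aic, bic))
    return rows


# Simulated regressors sharing a common factor, so they are strongly correlated.
rng = np.random.default_rng(0)
n, p, rho = 100, 4, 0.95                     # illustrative values only
z = rng.standard_normal((n, 1))
X = np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)

results = information_criteria(X, y, max_components=p)
best_aic = min(results, key=lambda r: r[1])[0]
best_bic = min(results, key=lambda r: r[2])[0]
print("LVs chosen by AIC:", best_aic, "| LVs chosen by BIC:", best_bic)
```

Because the BIC penalty ln(n) exceeds the AIC penalty of 2 whenever n is at least 8, BIC never selects more latent variables than AIC in this setup. Comparing the two criteria by TMSE, as the study does, would additionally require repeating the simulation over many replications with known true parameters.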

Published

2025-12-30

How to Cite

Best Model Selection Method for the Best Latent Variables Determination When Solving Multicollinearity with Partial Least Squares. (2025). International Journal of Development Mathematics (IJDM), 2(4), 245-253. https://doi.org/10.62054/ijdm/0204.17