Abstract.
This article addresses the pressing problem of constructing quite interpretable linear regressions. When linear regression parameters are estimated by the ordinary least squares method, this problem is known to reduce to a mixed 0-1 integer linear programming problem. The existing technology for solving such problems relies on the LPSolve solver. The purpose of this work is to compare the efficiency of two methods for constructing quite interpretable linear regressions: the all-subsets generation method, using the Gretl package, and the method based on mixed 0-1 integer linear programming, using the COPT solver. Across 250 optimization problems solved on large samples, the COPT solver substantially outperformed the LPSolve solver in all cases. The effectiveness of the mixed 0-1 integer linear programming method is clarified.
Keywords:
regression analysis, linear regression, interpretability, subset selection in regression, ordinary least squares method, mixed 0-1 integer linear programming problem, efficiency.
DOI 10.14357/20718632260114
EDN RVRAJB
PP. 154-166.
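As a rough illustration of the all-subsets idea named in the abstract, the sketch below enumerates every subset of a fixed size, fits each by ordinary least squares, and keeps the subset with the smallest residual sum of squares. It is a minimal sketch under stated assumptions: the function names (`solve_linear_system`, `ols_rss`, `best_subset`) and the toy data are hypothetical, it does not reproduce what Gretl does internally, and it does not model the interpretability conditions that the article imposes on the selected regression.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(rows, y, subset):
    """Fit y on an intercept plus the chosen columns; return (beta, RSS)."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    p = len(subset) + 1
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, yi in zip(design, y))
    return beta, rss


def best_subset(rows, y, k):
    """Enumerate every k-column subset and keep the one with the smallest RSS."""
    best = None
    for subset in combinations(range(len(rows[0])), k):
        beta, rss = ols_rss(rows, y, subset)
        if best is None or rss < best[2]:
            best = (subset, beta, rss)
    return best


# Hypothetical toy data: y depends only on columns 0 and 2.
rows = [[1, 2, 0], [2, 1, 1], [3, 5, 0], [4, 3, 2], [5, 8, 1], [6, 4, 3]]
y = [1 + 2 * r[0] - 3 * r[2] for r in rows]

subset, beta, rss = best_subset(rows, y, k=2)
print(subset, beta, rss)
```

The number of candidate subsets grows combinatorially with the number of regressors, which is the usual motivation for recasting the selection as a mixed 0-1 integer linear program and handing it to a solver instead.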