Fairness-Aware Mutual Information for Feature Selection in Employee Attrition Prediction

Authors

  • Omar Shakir, Directorate of Educational Nineveh, Mosul, Iraq

DOI:

https://doi.org/10.31185/wjcms.404

Keywords:

Feature selection, Mutual Information, Equal Opportunity Difference (EOD), Fairness-aware

Abstract

Feature selection is a critical part of building machine learning models, particularly in sensitive domains such as HR, finance, and healthcare. The literature shows that many current methods evaluate results on accuracy alone and ignore both fairness and interpretability. This study presents a feature selection method that uses Mutual Information to assess feature relevance while imposing fairness constraints measured by the Equal Opportunity Difference (EOD) metric. The aim of the study is to maximize accuracy, fairness, and interpretability together. The method was tested on the IBM HR Analytics dataset with six models: Fair Logistic Regression, Explainable Boosting Machines (EBMs), XGBoost, Random Forest, SVM, and KNN. Fair Logistic Regression provided the best overall results, with 96% accuracy and very little fairness bias (EOD = 0.005). XGBoost and Random Forest showed good predictive reliability but also evidence of disparity in the fairness measures. EBMs met the accuracy standards and offer good interpretability, so they could be used in settings where transparency of the model and of its construction process is required.
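The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy data, and the sign convention for EOD (true-positive rate of the unprivileged group minus that of the privileged group) are assumptions made for illustration, and the mutual information estimate assumes discrete features.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def equal_opportunity_difference(y_true, y_pred, group, privileged=1):
    """TPR of the unprivileged group minus TPR of the privileged group.

    Assumed convention: labels and predictions are 0/1, and every member
    whose group value differs from `privileged` counts as unprivileged.
    """
    def tpr(in_privileged):
        preds = [p for t, p, g in zip(y_true, y_pred, group)
                 if t == 1 and (g == privileged) == in_privileged]
        if not preds:
            raise ValueError("a group has no positive examples")
        return sum(preds) / len(preds)

    return tpr(False) - tpr(True)


def rank_features_by_mi(features, y):
    """Return feature names sorted by decreasing mutual information with y."""
    return sorted(features,
                  key=lambda name: mutual_information(features[name], y),
                  reverse=True)


# Hypothetical toy data, not taken from the IBM HR Analytics dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
group = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "feature_a": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "feature_b": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the label
}

ranking = rank_features_by_mi(features, y_true)
eod = equal_opportunity_difference(y_true, y_pred, group)
print(ranking, eod)
```

In a fairness-aware selection loop, a ranking like this would supply candidate features, and a candidate subset would be kept only if the absolute EOD of the resulting model stays under a chosen tolerance. How the study combines the two quantities is not specified in the abstract.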




Published

2025-12-30

Section

Computer

How to Cite

[1] Omar Shakir, “Fairness-Aware Mutual Information for Feature Selection in Employee Attrition Prediction”, WJCMS, vol. 4, no. 4, pp. 26–37, Dec. 2025, doi: 10.31185/wjcms.404.