Hybrid Framework for Multi-Disease Chest X-Ray Diagnosis Using Vision Transformers with Label Noise Correction and Uncertainty Calibration
DOI:
https://doi.org/10.31185/wjcms.451

Keywords:
Chest X-ray, Vision Transformer, label noise, uncertainty calibration, multi-disease diagnosis, medical image analysis.

Abstract
Chest X-ray imaging is the most accessible radiological technique for detecting and diagnosing thoracic disease. Yet the deployment of automated chest X-ray analysis systems faces two main obstacles: unreliable labels in large datasets and poorly calibrated predictive confidence. This research proposes a hybrid framework that combines a Vision Transformer (ViT) architecture with methods for handling noisy labels and producing well-calibrated probability estimates for multi-disease diagnosis of chest X-ray images. The framework is trained on the CheXpert and NIH ChestX-ray14 datasets, using the Co-Teaching and DivideMix noise-handling methods together with self-supervised pretraining to make the learned features robust to supervision errors. Temperature scaling and Monte Carlo dropout are applied as post-hoc calibration methods to improve confidence reliability without compromising discriminative performance. The framework aims to match or exceed conventional CNN and standard ViT baselines on AUROC, mAP, and F1 score, while mitigating the effect of unreliable labels and generating confidence scores that clinicians can interpret. Grad-CAM++ explanations are produced to help clinicians understand the model's decision-making process. The overall goal is an AI system that delivers both accurate results and safe operational readiness for real-world chest X-ray decision support.
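To make the calibration step concrete: temperature scaling fits a single scalar T on held-out validation logits, and Monte Carlo dropout averages stochastic forward passes to obtain a mean prediction and an uncertainty estimate. The snippet below is a minimal illustrative sketch of both ideas, not the authors' implementation; the function names, the grid-search fitting procedure, and the synthetic overconfident logits are all assumptions introduced for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(logits, labels, T):
    """Mean binary cross-entropy of temperature-scaled logits."""
    p = sigmoid(logits / T)
    eps = 1e-12
    return -np.mean(labels * np.log(p + eps) + (1.0 - labels) * np.log(1.0 - p + eps))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Grid-search the single temperature T that minimizes validation NLL
    (a simple stand-in for gradient-based fitting on real validation logits)."""
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def mc_dropout_mean_std(stochastic_logit_fn, n_passes=20):
    """Average sigmoid outputs over stochastic forward passes (dropout kept
    active at test time); the per-example std serves as an uncertainty score."""
    probs = np.stack([sigmoid(stochastic_logit_fn()) for _ in range(n_passes)])
    return probs.mean(axis=0), probs.std(axis=0)

# Toy demonstration: synthetic, deliberately overconfident binary logits.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000).astype(float)
logits = (2.0 * labels - 1.0) * 6.0 + rng.normal(0.0, 4.0, size=2000)

T = fit_temperature(logits, labels)  # expect T > 1, softening overconfident outputs
mean_p, unc = mc_dropout_mean_std(lambda: logits / T + rng.normal(0.0, 0.5, size=2000))
print(f"fitted T = {T:.2f}, mean predictive uncertainty = {unc.mean():.3f}")
```

Because the toy logits are more confident than they are accurate, the fitted temperature exceeds 1 and shrinks the predicted probabilities toward 0.5; in a real multi-label setting the same scalar T would be fit per dataset (or per label) on a clean validation split after training.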
References
[1] H. Park, J. Kim, and S. Hong, “Self-evolving Vision Transformer for Chest X-Ray
Diagnosis (DISTL),” Nature Communications, vol. 13, no. 7892, pp. 1–12, 2022.
[2] A. Dosovitskiy, L. Beyer, A. Kolesnikov et al., “An Image is Worth 16×16 Words:
Transformers for Image Recognition at Scale,” in Proc. International Conference on
Learning Representations (ICLR), 2021.
[3] M. Raghu, T. Unterthiner, S. Kornblith, C. Zhang, and A. Dosovitskiy, “Do Vision
Transformers See Like Convolutional Neural Networks?,” in Proc. Advances in Neural
Information Processing Systems (NeurIPS), 2021.
[4] Z. Liu, Y. Lin, Y. Cao et al., “Swin Transformer: Hierarchical Vision Transformer
Using Shifted Windows,” in Proc. IEEE/CVF International Conference on Computer
Vision (ICCV), pp. 9992–10002, 2021.
[5] X. Wang, J. Wang, and F. Li, “Transformer-Based Global Spatial Representation for
Chest X-ray Classification,” IEEE Access, vol. 11, pp. 15687–15698, 2023.
[6] T. Mahmood, A. Rehman, and K. Kim, “Hybrid CNN-ViT Model for Robust Chest
X-ray Diagnosis under Weak Supervision,” Computer Methods and Programs in
Biomedicine, vol. 241, 107673, 2024.
[7] B. Han, Q. Yao, X. Yu et al., “Co-Teaching: Robust Training of Deep Neural
Networks with Extremely Noisy Labels,” in Proc. Advances in Neural Information
Processing Systems (NeurIPS), pp. 8527–8537, 2018.
[8] J. Li, R. Socher, and S. Hoi, “DivideMix: Learning with Noisy Labels as Semi-supervised Learning,” in Proc. International Conference on Learning Representations (ICLR), 2020.
[9] S. Khanal, L. Li, and M. Ghafoor, “Investigating the Robustness of Vision
Transformers Against Label Noise in Medical Image Classification,” arXiv preprint,
arXiv:2401.01872, 2024.
[10] D. Lin, J. Zhao, and H. Xu, “Efficiency and Safety of Automated Label Cleaning on
Multimodal Retinal Images Using Cleanlab,” Nature Machine Intelligence, vol. 7, pp.
120–132, 2025.
[11] M. Taassori, R. Ahmad, and A. Patel, “RobustDeiT: Noise-Robust Vision
Transformers for Medical Image Classification,” in Proc. IEEE International Symposium
on Signal Processing and Information Technology (ISSPIT), 2025.
[12] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On Calibration of Modern Neural Networks,” in Proc. International Conference on Machine Learning (ICML), pp. 1321–1330, 2017.
[13] M. Gawlikowski, J. Tassi, and A. Kruspe, “A Survey of Uncertainty in Deep Neural Networks for Medical Image Analysis,” Artificial Intelligence Review, vol. 56, pp. 4821–4865, 2023.
[14] S. Ayhan and P. Berens, “Test-Time Data Augmentation for Estimating Prediction
Uncertainty in Deep Neural Networks,” Medical Image Analysis, vol. 82, 102642, 2022.
[15] T. Leibig, V. Allken, and F. Berens, “Leveraging Uncertainty Information from Deep Neural Networks for Disease Detection,” Scientific Reports, vol. 10, no. 1, pp. 1–14, 2020.
[16] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B.
Haghgoo, R. Ball, K. Shpanskaya, M. Seekins, A. Mong, S. Halabi, J. Sandberg, R.
Jones, D. Larson, C. Langlotz, B. Patel, M. Lungren, and A. Ng,
“CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert
Comparison,” Proceedings of the AAAI Conference on Artificial Intelligence (AAAI),
vol. 33, no. 1, pp. 590–597, 2019.
[Online]. Available: https://aimi.stanford.edu/datasets/chexpert-chest-x-rays
[17] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers,
“ChestX-ray8: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly-Supervised
Classification and Localization of Common Thorax Diseases,”
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 3462–3471, 2017.
[Online]. Available: https://nihcc.app.box.com/v/ChestXray-NIHCC
[18] A. E. Johnson, T. J. Pollard, S. J. Berkowitz, L. Shen, H. L. Lu, M. Ghassemi, and
R. A. Celi, “MIMIC-CXR, a large publicly available database of labeled chest
radiographs,” Scientific Data, vol. 7, no. 1, pp. 1–8, 2020.
[19] A. Bustos, A. Pertusa, J. Salinas, and M. de la Iglesia-Vayá, “PadChest: A Large
Chest X-ray Image Dataset with Multi-Label Annotated Reports,” Scientific Data,
vol. 6, no. 1, pp. 1–8, 2019.
[20] P. Chambon, J. Irvin, and M. P. Lungren, “CheXpert Plus: Augmenting a Large
Chest X-ray Dataset with Text Reports and Demographics,” arXiv preprint
arXiv:2405.19538, 2024.
[21] S. Majkowska, J. Mittal, D. Steiner, and A. Kalidindi, “Chest Radiograph
Interpretation with Deep Learning Models Trained on Multiple Large-Scale
Datasets,” Radiology: Artificial Intelligence, vol. 2, no. 2, e190080, 2020.
[22] H. Tang, Y. Chen, and L. Zhang, “A Self-Supervised Vision Transformer for
Medical Image Diagnosis,” IEEE Journal of Biomedical and Health Informatics, vol.
28, no. 3, pp. 1094–1105, 2024.
[23] J. Wu, Z. Zhang, and H. Liu, “Noise-Aware Semi-Supervised Learning for
Robust Medical Image Classification,” Pattern Recognition, vol. 147, 110015, 2024.
[24] D. Ghosh, R. Shankar, and S. Saha, “Entropy-Based Sample Reweighting for
Learning with Noisy Labels in Chest X-ray Images,” IEEE Access, vol. 11, pp.
112358–112369, 2023.
[25] C. Gawlikowski, J. Tassi, A. Kruspe, and D. Xiao, “A Survey of Uncertainty in
Deep Neural Networks for Medical Image Analysis,” Artificial Intelligence Review,
vol. 56, no. 6, pp. 4821–4865, 2023.
[26] S. Ayhan and P. Berens, “Test-Time Data Augmentation for Estimating
Prediction Uncertainty in Deep Neural Networks,” Medical Image Analysis, vol. 82,
102642, 2022.
[27] J. Ma, Y. Zhao, and K. Wang, “Vision Transformer-Based Multimodal Fusion for
Chest X-ray and Clinical Text Integration,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 35, no. 5, pp. 892–905, 2025.
[28] Y. He, F. Wang, and L. Jin, “Trustworthy AI for Medical Imaging: Challenges,
Methods, and Opportunities,” Nature Machine Intelligence, vol. 6, pp. 210–223, 2024.
License
Copyright (c) 2025 bahaa mohammed, Nor Azura Husin

This work is licensed under a Creative Commons Attribution 4.0 International License.



