Hybrid Framework for Multi-Disease Chest X-Ray Diagnosis Using Vision Transformers with Label Noise Correction and Uncertainty Calibration

Authors

Bahaa Mohammed and N. Azura Husin
DOI:

https://doi.org/10.31185/wjcms.451

Keywords:

Chest X-ray, Vision Transformer, label noise, uncertainty calibration, multi-disease diagnosis, medical image analysis.

Abstract

Chest X-ray imaging is the most accessible radiological technique for detecting and diagnosing thoracic disease. However, the deployment of automated chest X-ray analysis systems faces two main obstacles: unreliable labels in large datasets and poorly calibrated predictive confidence. This research proposes a hybrid framework that combines a Vision Transformer (ViT) architecture with methods for handling noisy labels and producing accurate probability estimates for multi-disease diagnosis in chest X-ray images. The framework is trained on the CheXpert and NIH ChestX-ray14 datasets, using the Co-Teaching and DivideMix noise-handling methods together with self-supervised pretraining to make learned features robust to supervision errors. Temperature scaling and Monte Carlo dropout are applied as post-hoc methods to improve confidence reliability without compromising discriminative performance. The framework aims to match or exceed conventional CNN and standard ViT baselines on AUROC, mAP, and F1-score metrics. It mitigates the effect of unreliable labels while generating meaningful confidence scores that clinicians can interpret, and it produces Grad-CAM++ explanations to help clinicians understand its decision-making process. The hybrid framework thus works toward AI systems that deliver both accurate results and safe operational readiness for real-world chest X-ray decision support.
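The abstract names temperature scaling as one of the post-hoc calibration methods. As a minimal illustration of the general technique (not the paper's implementation), the sketch below fits a single scalar temperature T on synthetic, deliberately overconfident validation logits by grid-searching the value that minimises negative log-likelihood; all data and parameter choices here are hypothetical.

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # negative log-likelihood of temperature-scaled probabilities
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    # grid-search the scalar T > 0 that minimises validation NLL
    grid = np.linspace(0.5, 5.0, 91)
    return min(grid, key=lambda T: nll(logits, labels, T))

# Synthetic validation set: 10 classes; the model assigns logit 8 to its
# predicted class but is wrong on the first 30% of samples, so its raw
# confidences (~99.9%) overstate its ~70% accuracy.
n, k = 1000, 10
labels = np.arange(n) % k
pred = labels.copy()
pred[: 3 * n // 10] = (labels[: 3 * n // 10] + 1) % k  # 30% mispredicted
logits = np.zeros((n, k))
logits[np.arange(n), pred] = 8.0

T_hat = fit_temperature(logits, labels)
print(f"fitted temperature: {T_hat:.2f}")  # T > 1 softens overconfident outputs
```

At inference, the same fitted T divides the network's logits before the softmax (or the per-class sigmoid, in a multi-label chest X-ray setting); a value of T greater than 1 indicates the raw model was overconfident. Monte Carlo dropout, the other method the abstract names, would complement this by averaging predictions over several stochastic forward passes.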


References

[1] H. Park, J. Kim, and S. Hong, “Self-evolving Vision Transformer for Chest X-Ray Diagnosis (DISTL),” Nature Communications, vol. 13, no. 7892, pp. 1–12, 2022.

[2] A. Dosovitskiy, L. Beyer, A. Kolesnikov et al., “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale,” in Proc. International Conference on Learning Representations (ICLR), 2021.

[3] M. Raghu, T. Unterthiner, S. Kornblith, C. Zhang, and A. Dosovitskiy, “Do Vision Transformers See Like Convolutional Neural Networks?,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2021.

[4] Z. Liu, Y. Lin, Y. Cao et al., “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows,” in Proc. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9992–10002, 2021.

[5] X. Wang, J. Wang, and F. Li, “Transformer-Based Global Spatial Representation for Chest X-ray Classification,” IEEE Access, vol. 11, pp. 15687–15698, 2023.

[6] T. Mahmood, A. Rehman, and K. Kim, “Hybrid CNN-ViT Model for Robust Chest X-ray Diagnosis under Weak Supervision,” Computer Methods and Programs in Biomedicine, vol. 241, 107673, 2024.

[7] B. Han, Q. Yao, X. Yu et al., “Co-Teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), pp. 8527–8537, 2018.

[8] J. Li, R. Socher, and S. Hoi, “DivideMix: Learning with Noisy Labels as Semi-supervised Learning,” in Proc. International Conference on Learning Representations (ICLR), 2020.

[9] S. Khanal, L. Li, and M. Ghafoor, “Investigating the Robustness of Vision Transformers Against Label Noise in Medical Image Classification,” arXiv preprint arXiv:2401.01872, 2024.

[10] D. Lin, J. Zhao, and H. Xu, “Efficiency and Safety of Automated Label Cleaning on Multimodal Retinal Images Using Cleanlab,” Nature Machine Intelligence, vol. 7, pp. 120–132, 2025.

[11] M. Taassori, R. Ahmad, and A. Patel, “RobustDeiT: Noise-Robust Vision Transformers for Medical Image Classification,” in Proc. IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2025.

[12] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On Calibration of Modern Neural Networks,” in Proc. International Conference on Machine Learning (ICML), pp. 1321–1330, 2017.

[13] M. Gawlikowski, J. Tassi, and A. Kruspe, “A Survey of Uncertainty in Deep Neural Networks for Medical Image Analysis,” Artificial Intelligence Review, vol. 56, pp. 4821–4865, 2023.

[14] S. Ayhan and P. Berens, “Test-Time Data Augmentation for Estimating Prediction Uncertainty in Deep Neural Networks,” Medical Image Analysis, vol. 82, 102642, 2022.

[15] T. Leibig, V. Allken, and F. Berens, “Leveraging Uncertainty Information from Deep Neural Networks for Disease Detection,” Scientific Reports, vol. 10, no. 1, pp. 1–14, 2020.

[16] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, M. Seekins, A. Mong, S. Halabi, J. Sandberg, R. Jones, D. Larson, C. Langlotz, B. Patel, M. Lungren, and A. Ng, “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” in Proc. AAAI Conference on Artificial Intelligence (AAAI), vol. 33, no. 1, pp. 590–597, 2019. [Online]. Available: https://aimi.stanford.edu/datasets/chexpert-chest-x-rays

[17] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-supervised Classification and Localization of Common Thorax Diseases,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3462–3471, 2017. [Online]. Available: https://nihcc.app.box.com/v/ChestXray-NIHCC

[18] A. E. Johnson, T. J. Pollard, S. J. Berkowitz, L. Shen, H. L. Lu, M. Ghassemi, and R. A. Celi, “MIMIC-CXR, a large publicly available database of labeled chest radiographs,” Scientific Data, vol. 7, no. 1, pp. 1–8, 2020.

[19] A. Bustos, A. Pertusa, J. Salinas, and M. de la Iglesia-Vayá, “PadChest: A Large Chest X-ray Image Dataset with Multi-Label Annotated Reports,” Scientific Data, vol. 6, no. 1, pp. 1–8, 2019.

[20] P. Chambon, J. Irvin, and M. P. Lungren, “CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Reports and Demographics,” arXiv preprint arXiv:2405.19538, 2024.

[21] S. Majkowska, J. Mittal, D. Steiner, and A. Kalidindi, “Chest Radiograph Interpretation with Deep Learning Models Trained on Multiple Large-Scale Datasets,” Radiology: Artificial Intelligence, vol. 2, no. 2, e190080, 2020.

[22] H. Tang, Y. Chen, and L. Zhang, “A Self-Supervised Vision Transformer for Medical Image Diagnosis,” IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 3, pp. 1094–1105, 2024.

[23] J. Wu, Z. Zhang, and H. Liu, “Noise-Aware Semi-Supervised Learning for Robust Medical Image Classification,” Pattern Recognition, vol. 147, 110015, 2024.

[24] D. Ghosh, R. Shankar, and S. Saha, “Entropy-Based Sample Reweighting for Learning with Noisy Labels in Chest X-ray Images,” IEEE Access, vol. 11, pp. 112358–112369, 2023.


[27] J. Ma, Y. Zhao, and K. Wang, “Vision Transformer-Based Multimodal Fusion for Chest X-ray and Clinical Text Integration,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 5, pp. 892–905, 2025.

[28] Y. He, F. Wang, and L. Jin, “Trustworthy AI for Medical Imaging: Challenges, Methods, and Opportunities,” Nature Machine Intelligence, vol. 6, pp. 210–223, 2024.

Published

2025-12-31

Issue

Section

Computer

How to Cite

[1] B. Mohammed and N. Azura Husin, “Hybrid Framework for Multi-Disease Chest X-Ray Diagnosis Using Vision Transformers with Label Noise Correction and Uncertainty Calibration,” WJCMS, vol. 4, no. 4, pp. 1–9, Dec. 2025, doi: 10.31185/wjcms.451.