Keywords

Lung Cancer, LC25000 dataset, Colon cancer, Machine learning, Explainable AI

Document Type

Research Paper

Abstract

Lung and colon cancers are among the most common and deadly cancers worldwide and pose a significant public health concern. Artificial intelligence (AI) and machine learning (ML) have substantially advanced cancer research, particularly in early detection, histopathological analysis, and personalized therapy planning. However, despite their high accuracy, ML models often lack transparency, which makes explainability crucial in medical applications. Although various ML-based cancer classification models exist, how they arrive at their predictions is often poorly understood. The current research addresses this diagnostic gap by developing a highly accurate system that uses Explainable Artificial Intelligence (XAI) methods to clarify its predictions. We used the LC25000 dataset from Kaggle, which contains histology images of human lung and colon tissue. To determine the best cancer classification strategy, we tested several machine learning algorithms: Random Forest, Decision Tree, Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). Furthermore, XAI approaches such as LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (Shapley Additive Explanations) were used to examine model predictions and identify the features that most influence classification outcomes. XGBoost proved effective at identifying colon and lung cancer, achieving the highest accuracy among the models tested, 99.80%. The XAI techniques also provided useful information about the most significant features. SHAP analysis highlighted LBP and color histogram features as key for distinguishing lung and colon tissues, while LIME confirmed their importance by identifying the critical image regions that influence predictions.

Highlights

An image-based method was developed to classify lung and colon cancer using CLAHE-enhanced histopathology images
Color histograms and LBP features were combined to improve classification across five cancer-related classes
XGBoost with RFE achieved 99.80% accuracy by selecting the most relevant handcrafted features
CLAHE preprocessing enhanced feature clarity, improving model accuracy and interpretability
SHAP and LIME tools were used to explain model decisions, supporting transparent AI-driven cancer diagnosis
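
As a rough illustration of how color histogram and LBP features can be combined into one handcrafted feature vector, the following is a minimal NumPy sketch. It is not the authors' implementation: the bin count, the basic 8-neighbour LBP variant with radius 1, the grayscale conversion, and the function names `color_histogram`, `lbp_histogram`, and `extract_features` are all assumptions made for illustration, and the CLAHE, RFE, and XGBoost stages are omitted.

```python
import numpy as np


def color_histogram(rgb: np.ndarray, bins: int = 8) -> np.ndarray:
    """Per-channel intensity histogram, each channel normalized to sum to 1.

    The bin count is an illustrative assumption, not a value from the paper.
    """
    parts = []
    for channel in range(3):
        hist, _ = np.histogram(rgb[..., channel], bins=bins, range=(0, 256))
        parts.append(hist / hist.sum())
    return np.concatenate(parts)


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour local binary pattern (radius 1) as a 256-bin histogram.

    Each interior pixel gets an 8-bit code: bit k is set when the k-th
    neighbour is greater than or equal to the centre pixel.
    """
    gray = gray.astype(np.int32)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def extract_features(rgb: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate color histogram and LBP histogram into one feature vector."""
    gray = rgb.mean(axis=2)  # simple channel average as a stand-in grayscale
    return np.concatenate([color_histogram(rgb, bins), lbp_histogram(gray)])


if __name__ == "__main__":
    # Synthetic image standing in for a histopathology tile.
    rng = np.random.default_rng(0)
    tile = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    features = extract_features(tile)
    print(features.shape)
```

With these assumed settings each image yields 3 × 8 color bins plus 256 LBP bins. A vector of this kind is what a feature selector and a tabular classifier would consume, and it is also the level at which per-feature attributions such as SHAP values would be reported.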

DOI

10.30684/etj.2025.158895.1935

First Page

775

Last Page

794
