Predictive Modeling of Hepatitis using Machine Learning Algorithms and Mathematical Formulations
DOI:
https://doi.org/10.62054/ijdm/0202.16
Keywords:
Hepatitis Prediction, Healthcare AI, Early Diagnosis, SMOTE, Machine Learning
Abstract
Hepatitis remains a major global health concern, disproportionately affecting high-risk groups such as infants and pregnant women. This research aims to develop a machine learning model for early hepatitis detection, with the goal of reducing hepatitis-related illness and death. Hepatitis B (HBV), Hepatitis C (HCV), and Hepatitis E (HEV) were characterized using data covering a range of demographics, drawn from healthcare facilities and online repositories. Data preprocessing techniques, including feature selection, imputation, and normalization, were employed to prepare a robust dataset. Three ensemble classifiers were trained and tested: Random Forest and the two boosting algorithms AdaBoost and XGBoost. Model performance was assessed using Accuracy, Precision, Recall, F1 Score, and the Confusion Matrix. XGBoost performed best on all metrics, with an accuracy of 0.96, precision of 0.95, recall of 0.94, and F1 score of 0.94. AdaBoost achieved an accuracy of 0.90, precision of 0.90, recall of 0.88, and F1 score of 0.89. Random Forest performed worst of the three, with an accuracy of 0.85, precision of 0.85, recall of 0.84, and F1 score of 0.84. XGBoost was further evaluated with ten-fold cross-validation, which gave an average accuracy of 0.8532, precision of 0.8503, recall of 0.8395, and F1 score of 0.8450. When compared with other models, the proposed model outperformed them, suggesting that it could be applied in a clinical setting.
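The four scalar metrics named in the abstract can all be derived from the entries of a confusion matrix, and ten-fold cross-validation amounts to partitioning the samples into ten disjoint test folds. The following is a minimal sketch of both ideas using only the Python standard library. It is not the authors' implementation: the binary (positive/negative) setting, the names `ConfusionCounts`, `classification_metrics`, and `k_fold_indices`, and the example counts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class ConfusionCounts:
    """Entries of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion counts.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds.

    Fold sizes differ by at most one; every sample is used for testing
    exactly once across the k folds.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Illustrative counts only; these are NOT results from the study.
    example = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
```

In a cross-validated evaluation, the metrics would typically be computed on each test fold and then averaged, which is presumably how the averaged figures in the abstract were obtained. For a multi-class task such as distinguishing HBV, HCV, and HEV, precision, recall, and F1 additionally require an averaging scheme (for example macro or weighted), which the abstract does not specify.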
License
Copyright (c) 2025 International Journal of Development Mathematics (IJDM)

This work is licensed under a Creative Commons Attribution 4.0 International License.
The Journal articles are open access and are distributed under the terms of the Creative
Commons Attribution-NonCommercial-NoDerivs 4.0 IGO License, which permits use,
distribution, and reproduction in any medium, provided the original work is properly cited.
No modifications or commercial use of the articles are permitted.