Hereditary-Based Disease Prediction Model using Machine Learning Techniques

Authors

  • SAMBO AHMAD SALIHU, Department of Computer Science, Modibbo Adama University, Yola, Nigeria
  • Yusuf M. Malgwi, Department of Computer Science, Modibbo Adama University, Yola, Nigeria

DOI:

https://doi.org/10.62054/ijdm/0203.23

Abstract

This research explored the use of machine learning for predicting hereditary diseases from genotype data. Two models, an Interpretable Genomic Neural Network (IGNN) and a Random Forest, were trained and evaluated on genetic datasets. The Random Forest model achieved an overall accuracy of 86%, a precision of 0.68, and a recall of 0.52, while the IGNN model offered enhanced interpretability with comparable performance. Random Forest also outperformed baseline models such as Logistic Regression, which highlights its predictive strength, although its performance was still limited by class imbalance. These results illustrate the potential of combining interpretable and ensemble learning approaches for early disease detection in support of personalized medicine.
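To make the relationship between the reported metrics concrete, the following is a minimal sketch of how accuracy, precision, and recall are computed from confusion-matrix counts, and why accuracy can look strong under class imbalance while recall stays modest. The counts are hypothetical, chosen only so that the rounded metrics resemble the figures quoted above; they are not taken from the study, and the names `ConfusionCounts` and `majority_baseline_accuracy` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive = disease present)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    return (c.tp + c.tn) / c.total


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of true positives that the model actually finds."""
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def majority_baseline_accuracy(c: ConfusionCounts) -> float:
    """Accuracy of a trivial model that always predicts 'negative'."""
    return (c.tn + c.fp) / c.total


# HYPOTHETICAL counts (not from the study): 100 positives out of 514 samples.
counts = ConfusionCounts(tp=52, fp=24, fn=48, tn=390)

print(f"accuracy  = {accuracy(counts):.2f}")
print(f"precision = {precision(counts):.2f}")
print(f"recall    = {recall(counts):.2f}")
print(f"always-negative baseline accuracy = {majority_baseline_accuracy(counts):.2f}")
```

With an imbalanced class distribution like this one, a model that never predicts the disease would still score a high accuracy, which is why precision and recall are reported alongside it.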

Published

2025-09-28

How to Cite

Hereditary-Based Disease Prediction Model using Machine Learning Techniques. (2025). International Journal of Development Mathematics (IJDM), 2(3), 245-359. https://doi.org/10.62054/ijdm/0203.23