Hereditary-Based Disease Prediction Model using Machine Learning Techniques
DOI: https://doi.org/10.62054/ijdm/0203.23
Abstract
This study explores the use of machine learning for predicting hereditary diseases from genotype data. Two models, the Interpretable Genomic Neural Network (IGNN) and Random Forest, were trained and evaluated on genetic datasets. The Random Forest model achieved an overall accuracy of 86%, a precision of 0.68, and a recall of 0.52, while the IGNN model offered greater interpretability with comparable performance. Random Forest also outperformed baseline models such as Logistic Regression, showing its strength in predictive accuracy, although its performance is still limited by class imbalance. These results illustrate the potential of combining interpretable and ensemble learning approaches for early disease detection in support of personalized medicine.
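The reported metrics show how an accuracy of 86% can coexist with a recall of 0.52 when the disease class is the minority. The sketch below computes accuracy, precision, and recall from a confusion matrix. The four counts are hypothetical: they are not taken from the paper and were chosen only so that the resulting values land close to the reported figures. The function name `classification_metrics` and the `majority_baseline` entry are likewise illustrative assumptions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return basic binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model actually finds.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Accuracy of always predicting the larger class, for reference.
        "majority_baseline": max(tp + fn, fp + tn) / total,
    }


# Hypothetical counts (NOT from the paper): 100 affected and 414 unaffected samples.
metrics = classification_metrics(tp=52, fp=24, fn=48, tn=390)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, always predicting the majority class would already score about 0.81 accuracy, so the 0.86 figure says little about the minority class on its own. Recall shows that roughly half of the affected samples are missed.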
License
Copyright (c) 2025 International Journal of Development Mathematics (IJDM)

This volume is published under the Creative Commons Attribution 4.0 International license.
Authors are solely responsible for obtaining permission to reproduce any copyrighted material contained in the manuscript as submitted. Any instance of possible prior publication in any form must be disclosed at the time the manuscript is submitted and a copy or link to the publication must be provided.
The Journal articles are open access and are distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 IGO License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited. No modifications or commercial use of the articles are permitted.