On Expectation-Maximization Algorithm with Split and Merge in R

Authors

  • Hassan Ahmed, Department of Mathematical Sciences, Abubakar Tafawa Balewa University, Bauchi, Bauchi State, Nigeria
  • Ahmed Abdulkadir, Department of Mathematical Sciences, Abubakar Tafawa Balewa University, Bauchi, Bauchi State, Nigeria
  • Kazeem Lasisi, Department of Mathematical Sciences, Abubakar Tafawa Balewa University, Bauchi, Bauchi State, Nigeria
  • Rasheed Bello, Department of Mathematical Sciences, Gombe State University, Gombe, Gombe State, Nigeria

DOI:

https://doi.org/10.62054/ijdm/0102.14

Keywords:

Algorithm, EM, GMM, Mixture-model, Split and merge

Abstract

This paper presents a study of an implementation of the Expectation-Maximization (EM) algorithm for a 4-component bivariate Gaussian Mixture Model (GMM), with a focus on incorporating split and merge techniques. The bivariate GMM is widely used in fields such as pattern recognition, image processing, and data clustering because of its flexibility in capturing complex data distributions. Our implementation uses the R programming language to provide a framework for modeling and optimizing the parameters of the 4-component GMM. The EM algorithm estimates the model parameters iteratively; each iteration does not decrease the likelihood, so the procedure converges to a local maximum of the likelihood function rather than necessarily the global one. To reduce this dependence on the starting point, we introduce split and merge strategies, which let the algorithm adapt to diverse data structures and manage the complexity of the mixture components. The split operation subdivides a component when needed, while the merge operation combines components that represent similar patterns. This adaptive approach improves the model’s ability to capture intricate data patterns and improves convergence during optimization. The implementation is validated through extensive experimentation on synthetic data, demonstrating its effectiveness in accurately estimating the parameters of the 4-component bivariate GMM. The proposed methodology is a useful addition to the existing tools for GMM-based modeling, providing researchers and practitioners with a flexible framework for analyzing complex data structures in diverse applications.
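
For orientation, the standard EM updates for a Gaussian mixture can be written out as below. This is a sketch of the textbook iteration, not a transcription of the paper's R code. The symbols are assumptions introduced here: $x_i \in \mathbb{R}^2$ are the $n$ observations, $\pi_k$, $\mu_k$, $\Sigma_k$ are the weight, mean, and covariance of component $k = 1, \dots, 4$, and $\gamma_{ik}$ is the posterior probability that $x_i$ came from component $k$. The last two lines show a moment-matching merge of two components $a$ and $b$, which is only one common choice; the abstract does not state which merge or split rule the paper uses.

```latex
% E-step: responsibilities under the current parameters
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{4} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M-step: weighted maximum-likelihood updates
N_k = \sum_{i=1}^{n} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{n}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}

% Observed-data log-likelihood, non-decreasing across iterations
\ell(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{4} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)

% Assumed merge rule (moment matching) for components a and b
\pi' = \pi_a + \pi_b, \qquad
\mu' = \frac{\pi_a \mu_a + \pi_b \mu_b}{\pi'}, \qquad
\Sigma' = \sum_{k \in \{a, b\}} \frac{\pi_k}{\pi'}
          \left[ \Sigma_k + (\mu_k - \mu')(\mu_k - \mu')^{\top} \right]
```

Under this assumed rule the merged component keeps the combined weight, mean, and covariance of the pair it replaces, so a merge followed by a split elsewhere leaves the number of components at four.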

Published

2024-06-02

How to Cite

On Expectation-Maximization Algorithm with Split and Merge in R. (2024). International Journal of Development Mathematics (IJDM), 1(2), 179-190. https://doi.org/10.62054/ijdm/0102.14
