Application of the random forest algorithm to a model of anemia classification in Peruvian children

Author

Bernardo Céspedes Panduro

Keywords:

anemia, algorithms, area under curve, child

Abstract

Introduction: poverty in Peru has declined in recent years; however, the prevalence of anemia remains high, affecting 40.00% of children aged 6 to 35 months.

Objective: to identify risk factors or predictors of the onset of anemia in Peruvian children.

Methods: a cross-sectional observational study was conducted using the database of the Demographic and Family Health Survey (ENDES), compiled by the National Institute of Statistics and Informatics (INEI) for the years 2015-2019. The population comprised 57,410 children aged 6 to 35 months who had undergone hemoglobin testing. Thirty-three independent variables were selected, and six modeling procedures were built with the random forest algorithm. For each procedure, the area under the curve (AUC), specificity, and sensitivity were obtained.
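The abstract reports AUC, specificity, and sensitivity for each procedure but does not describe how they were computed. As a minimal sketch, assuming binary labels (1 = anemia, 0 = no anemia) and a predicted probability of anemia for each child from some classifier, the three indicators can be computed in plain Python. The function names, the 0.5 cut-off, and the toy data are hypothetical and not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity for binary labels (1 = anemia, 0 = no anemia)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # anemic children correctly flagged
    specificity = tn / (tn + fp)  # non-anemic children correctly cleared
    return sensitivity, specificity


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen anemic child gets a
    higher score than a randomly chosen non-anemic child (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, not study data.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
print(sensitivity_specificity(y_true, y_pred))
print(auc(y_true, scores))
```

With these toy values, sensitivity and specificity are both 2/3 and the AUC is 8/9. Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes how well the scores rank anemic above non-anemic children across all cut-offs.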

Results: the procedure that best predicted the presence of anemia, with the most similar specificity (63.62%) and sensitivity (65.88%) values, used balanced data, parameter retuning, a reduced number of trees, and variable selection.
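The abstract does not state how the data were balanced. One common option is random undersampling of the majority class; oversampling and SMOTE are alternatives. The sketch below assumes random undersampling and uses hypothetical names and toy data; it is not the author's procedure, and it does not cover the parameter retuning or tree-count reduction also mentioned.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def undersample_majority(rows: Sequence[Row], labels: Sequence[int],
                         seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Randomly discard rows from the larger class(es) until every class
    has as many rows as the smallest one (random undersampling)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Row]] = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows: List[Row] = []
    out_labels: List[int] = []
    for label, group in by_class.items():
        kept = rng.sample(group, n_min)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * n_min)
    return out_rows, out_labels


# Hypothetical example: 3 anemic (1) and 7 non-anemic (0) children.
rows = [[float(i)] for i in range(10)]
labels = [1] * 3 + [0] * 7
bal_rows, bal_labels = undersample_majority(rows, labels, seed=1)
print(bal_labels.count(1), bal_labels.count(0))
```

This prints `3 3`. Balancing is usually applied to the training partition only, so that sensitivity and specificity are still measured on data with the original class proportions.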

Conclusions: the five most important independent variables in the model were the child's age, the altitude of the survey cluster, the number of prenatal visits during pregnancy, the timing of the first prenatal check-up, and the mother's height. The study provides scientific evidence on the use of machine learning algorithms to predict the onset of anemia from common risk factors.
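The abstract ranks variables by importance without naming the measure used. Random forest implementations commonly report either the mean decrease in Gini impurity or permutation importance (mean decrease in accuracy). The sketch below assumes permutation importance and takes a generic `predict` function, so it could wrap any fitted classifier; the names, the single shuffle per variable, and the toy predictor are hypothetical simplifications, not the study's method.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], rows: Sequence[Row],
             labels: Sequence[int]) -> float:
    """Share of rows whose predicted class matches the observed class."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict: Callable[[Row], int], rows: Sequence[Row],
                           labels: Sequence[int], seed: int = 0) -> List[float]:
    """Drop in accuracy after shuffling each column, which breaks that
    variable's link to the outcome while keeping its distribution."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = []
        for r, value in zip(rows, column):
            new_row = list(r)
            new_row[j] = value
            permuted.append(new_row)
        importances.append(baseline - accuracy(predict, permuted, labels))
    return importances


def top_k(names: Sequence[str], importances: Sequence[float], k: int = 5) -> List[str]:
    """Names of the k variables with the largest importance, highest first."""
    order = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)
    return [names[j] for j in order[:k]]


# Hypothetical toy data: the outcome depends only on the first column.
rows = [[float(i % 2), float(i % 5)] for i in range(100)]
labels = [i % 2 for i in range(100)]


def toy_predict(r: Row) -> int:
    return int(r[0] > 0.5)


imp = permutation_importance(toy_predict, rows, labels, seed=42)
print(top_k(["signal", "noise"], imp, k=2))  # the informative column ranks first
```

In the study's setting, each column would be a candidate predictor such as child age or cluster altitude, and `top_k(..., k=5)` would return the five highest-ranked variables. Production implementations typically repeat the shuffle several times and, in random forests, often evaluate on out-of-bag samples.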

Author Biography

Bernardo Céspedes Panduro, Lima, Peru.

Universidad Nacional Mayor de San Marcos.

Published

2022-09-19

How to Cite

Céspedes Panduro B. Application of the random forest algorithm to a model of anemia classification in Peruvian children. Mediciego [Internet]. 2022 Sep. 19 [cited 2025 Jan. 10];28(1):e3471. Available from: https://revmediciego.sld.cu/index.php/mediciego/article/view/3471

Issue

Vol. 28 No. 1 (2022)

Section

Original article
