PREDICTIVE MODELING OF DIABETES, BREAST CANCER, CIRRHOSIS, AND THYROID DISORDERS IN MEXICAN WOMEN: A METHODOLOGICALLY RIGOROUS MACHINE LEARNING APPROACH WITH INTERSECTIONAL FAIRNESS EVALUATION

Luis Alberto Chavero Chavez; Gabriel Sánchez Bautista; Mónica García Munguía

doi:10.4238/0r6rws32

Abstract

Background. Diabetes, breast cancer, cirrhosis, and thyroid disorders impose a disproportionate burden on Mexican women, and indigenous women face elevated diabetes risk [1]. Published machine-learning models often report near-perfect discrimination that is implausible for real clinical data, owing to leakage-prone resampling and the absence of uncertainty reporting [2,3].
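The leakage mechanism named above can be illustrated with a minimal sketch. It assumes plain random duplication of minority records as a stand-in for SMOTE-style oversampling and uses hypothetical integer patient IDs; it is not the authors' code. When resampling precedes the train/test split, copies of one patient can land on both sides, so the test set is no longer unseen. Splitting first removes that overlap by construction.

```python
import random


def oversample(ids, labels, rng):
    """Duplicate random minority (label 1) records until both classes are equal in size."""
    minority = [i for i, y in zip(ids, labels) if y == 1]
    n_major = labels.count(0)
    extra = [rng.choice(minority) for _ in range(n_major - len(minority))]
    return ids + extra, labels + [1] * len(extra)


def shared_ids(train_ids, test_ids):
    """Number of distinct patient IDs that appear in both partitions."""
    return len(set(train_ids) & set(test_ids))


def leakage_demo(n=1000, prevalence=0.1, seed=42):
    """Return (leaky, clean): patient overlap for resample-then-split vs split-then-resample."""
    rng = random.Random(seed)
    ids = list(range(n))
    labels = [1 if i < int(n * prevalence) else 0 for i in ids]

    # Leaky order: resample the whole data set, then split 80/20.
    pooled, _ = oversample(ids, labels, rng)
    rng.shuffle(pooled)
    cut = int(0.8 * len(pooled))
    leaky = shared_ids(pooled[:cut], pooled[cut:])

    # Correct order: split 80/20 first, then resample only the training partition.
    order = ids[:]
    rng.shuffle(order)
    cut = int(0.8 * n)
    train, test = order[:cut], order[cut:]
    train_res, _ = oversample(train, [labels[i] for i in train], rng)
    clean = shared_ids(train_res, test)
    return leaky, clean
```

`leakage_demo()` returns `(leaky, clean)`: `clean` is always 0, while `leaky` counts minority patients whose copies straddle the split and would inflate test-set discrimination.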

Objective. We test whether Random Forest and a Deep Neural Network, developed and evaluated in accordance with the TRIPOD+AI reporting guideline and the PROBAST risk-of-bias tool, can deliver reliable early identification of these four conditions, and whether performance holds for indigenous women as assessed with FAIR-MED [4,5].
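The abstract does not spell out how FAIR-MED is computed, so the following sketch uses a simpler stand-in, chosen here purely for illustration: the difference in AUC-ROC between women outside and inside a subgroup, with AUC estimated by the Mann-Whitney pairwise statistic. The function names `auc` and `subgroup_auc_gap` are hypothetical.

```python
def auc(scores, labels):
    """Mann-Whitney AUC-ROC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc_gap(scores, labels, in_group):
    """Return (gap, auc_in, auc_out); a positive gap means the subgroup is ranked less accurately."""
    inside = [(s, y) for s, y, g in zip(scores, labels, in_group) if g]
    outside = [(s, y) for s, y, g in zip(scores, labels, in_group) if not g]
    auc_in = auc(*zip(*inside))
    auc_out = auc(*zip(*outside))
    return auc_out - auc_in, auc_in, auc_out
```

For scores `[0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.5]`, labels `[1, 1, 0, 0, 1, 0, 0, 1]`, and the last four records flagged as the subgroup, `subgroup_auc_gap` returns `(0.25, 0.75, 1.0)`. The pairwise loop is quadratic in sample size; at survey scale a rank-based AUC formula would be the practical choice.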

Methods. We analyze ENSANUT 2022, Mexico's National Health and Nutrition Survey (N = 115,307; indigenous n = 9,275; 38 variables). Random Forest models are fit for diabetes, cirrhosis, and thyroid disorders, and a five-layer Deep Neural Network for breast cancer. Imputation, normalization, and SMOTE-ENN resampling are confined to the training folds of a stratified 5-fold cross-validation. We report G4, AUC-ROC, F1, Brier score, and calibration slope and intercept with bootstrap 95% confidence intervals [6], and compare SHAP feature importance between indigenous and non-indigenous women.
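Confining preprocessing to training folds can be sketched in dependency-free Python. The sketch assumes median imputation, z-score normalization, and missing values encoded as `None`; the abstract does not state the authors' imputation method. SMOTE-ENN is left as a pluggable `resample(X, y)` callback (a hypothetical hook) that only ever receives the training fold; in practice a library implementation such as imbalanced-learn's `SMOTEENN` class would fill that role.

```python
import random
from statistics import mean, median, pstdev


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, dealing each class out round-robin so class ratios match."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_preprocessor(rows):
    """Learn per-column (median fill, mean, std) from training rows only; None marks missing."""
    params = []
    for col in zip(*rows):
        fill = median([v for v in col if v is not None])
        filled = [fill if v is None else v for v in col]
        sd = pstdev(filled) or 1.0  # guard against constant columns
        params.append((fill, mean(filled), sd))
    return params


def transform(rows, params):
    """Impute and standardize any rows using parameters learned from a training fold."""
    return [
        [((fill if v is None else v) - mu) / sd for v, (fill, mu, sd) in zip(row, params)]
        for row in rows
    ]


def cross_validate(X, y, k=5, resample=None, seed=0):
    """Yield (X_train, y_train, X_val, y_val) per fold with all fitting confined to training rows."""
    for val_idx in stratified_folds(y, k, seed):
        val_set = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in val_set]
        params = fit_preprocessor([X[i] for i in train_idx])
        X_tr = transform([X[i] for i in train_idx], params)
        y_tr = [y[i] for i in train_idx]
        if resample is not None:
            # Hypothetical hook, e.g. SMOTE-ENN: it sees only the training fold.
            X_tr, y_tr = resample(X_tr, y_tr)
        X_val = transform([X[i] for i in val_idx], params)
        y_val = [y[i] for i in val_idx]
        yield X_tr, y_tr, X_val, y_val
```

Validation rows are transformed with medians, means, and standard deviations learned from the training rows alone, so no statistic from a validation record influences the model trained for that fold.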

Results. Discrimination is high but imperfect (G4 0.87–0.93; AUC-ROC 0.91–0.95). Calibration is good (Brier score 0.058–0.092; calibration slopes 0.89–0.97), and learning curves converge to within five percent. Performance is lower for indigenous women (FAIR-MED 0.17–0.23; G4 deficits of 3–4 points), and their dominant predictors shift from clinical to sociostructural variables.
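The calibration quantities reported above can be reproduced with a minimal sketch: the Brier score as the mean squared error of predicted probabilities, calibration intercept and slope from a logistic recalibration model logit P(y = 1) = a + b · logit(p) fitted by Newton-Raphson, and a percentile bootstrap for 95% intervals. Perfect calibration corresponds to a = 0 and b = 1. Some guidelines instead estimate the intercept with the slope fixed at 1 (calibration-in-the-large); which definition the authors used is an assumption here, as are all function names.

```python
import math
import random


def brier(p, y):
    """Brier score: mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def logit(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
    return math.log(p / (1.0 - p))


def sigmoid(z):
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_intercept_slope(p, y, iters=50, tol=1e-10):
    """Fit logit P(y=1) = a + b*logit(p) by Newton-Raphson; perfect calibration gives (0, 1)."""
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        ga = gb = iaa = iab = ibb = 0.0  # gradient and Fisher information entries
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            ga += yi - mu
            gb += (yi - mu) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab * iab
        if det == 0.0:
            break
        # Newton step: delta = I^{-1} g for the 2x2 information matrix I.
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def bootstrap_ci(metric, p, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample records with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([p[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

`bootstrap_ci(brier, p, y)` gives an interval for the Brier score; wrapping the recalibration fit, e.g. `lambda p, y: calibration_intercept_slope(p, y)[1]`, gives one for the slope.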

Conclusions. Both architectures can support early identification of these conditions when methodological discipline replaces inflated metrics with calibrated estimates. Equitable deployment for indigenous women requires subgroup-aware modeling and external validation.
