PREDICTIVE MODELING OF DIABETES, BREAST CANCER, CIRRHOSIS, AND THYROID DISORDERS IN MEXICAN WOMEN: A METHODOLOGICALLY RIGOROUS MACHINE LEARNING APPROACH WITH INTERSECTIONAL FAIRNESS EVALUATION

Authors

  • Luis Alberto Chavero Chavez
  • Gabriel Sánchez Bautista
  • Mónica García Munguía

DOI:

https://doi.org/10.4238/0r6rws32

Abstract

Background. Diabetes, breast cancer, cirrhosis, and thyroid disorders impose a disproportionate burden on Mexican women, with indigenous women at elevated diabetes risk [1]. Published machine-learning models often report near-perfect discrimination that is implausible for real-world data, a pattern attributed to leakage-prone resampling and the absence of uncertainty reporting [2,3].

Objective. We test whether Random Forest and a Deep Neural Network can deliver reliable early identification of these four conditions under TRIPOD+AI and PROBAST standards, and whether performance holds for indigenous women under FAIR-MED [4,5].

Methods. We analyze ENSANUT 2022 (N = 115,307; indigenous n = 9,275; 38 variables). Random Forest is fit for diabetes, cirrhosis, and thyroid disorders; a five-layer Deep Neural Network for breast cancer. Imputation, normalization, and SMOTE-ENN are confined to training folds within stratified 5-fold cross-validation. We report G4, AUC-ROC, F1, Brier, and calibration slope and intercept with bootstrap 95% CIs [6], and compare SHAP feature importance.
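As an illustration of what confining preprocessing to training folds means, the following is a minimal sketch in standard-library Python, not the study's pipeline. The helper names (stratified_folds, fit_scaler, cross_validation_splits) and the toy data are assumptions made for this example. Only normalization is shown; imputation and SMOTE-ENN would be fitted at the marked point on the training portion in the same way.

```python
import random
from statistics import mean, pstdev

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # round-robin within each class
    return folds

def fit_scaler(values):
    """Learn the mean and standard deviation from training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant feature
    return mu, sigma

def cross_validation_splits(x, y, k=5, seed=0):
    """Yield (train_x, train_y, test_x, test_y) with train-only scaling."""
    folds = stratified_folds(y, k, seed)
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
        # Statistics come from the training portion; the held-out fold
        # is transformed with them but never contributes to them.
        mu, sigma = fit_scaler([x[i] for i in train_idx])
        # Imputation and resampling (e.g. SMOTE-ENN) would also be fitted
        # here, on train_idx only, before any model is trained.
        train_x = [(x[i] - mu) / sigma for i in train_idx]
        test_x = [(x[i] - mu) / sigma for i in test_idx]
        yield train_x, [y[i] for i in train_idx], test_x, [y[i] for i in test_idx]

# Toy data (hypothetical): one feature, 20 positives among 100 samples.
y = [1] * 20 + [0] * 80
x = [float(i) for i in range(100)]
for train_x, train_y, test_x, test_y in cross_validation_splits(x, y):
    print(len(train_x), len(test_x), sum(test_y))
```

Because the scaler is refitted inside each iteration, the held-out fold cannot influence the statistics used to transform it, which is the property that leakage-prone pipelines violate when they normalize or resample before splitting.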

Results. Discrimination is high but non-perfect (G4 0.87–0.93; AUC 0.91–0.95). Calibration is good (Brier 0.058–0.092; slopes 0.89–0.97); learning curves converge within five percent. Indigenous women show lower performance (FAIR-MED 0.17–0.23; G4 deficits of 3–4 points), with dominant predictors shifting from clinical to sociostructural variables.
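For readers unfamiliar with the Brier score and bootstrap intervals, the sketch below shows one common way to compute them in standard-library Python. It assumes a percentile bootstrap, which may differ from the procedure the study follows [6]. The function names and the ten toy predictions are hypothetical and unrelated to ENSANUT.

```python
import random

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def bootstrap_ci(y_true, y_prob, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric of (y_true, y_prob)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping outcome and
        # prediction paired.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy predictions (hypothetical values for illustration only).
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.2, 0.1, 0.3, 0.8, 0.7, 0.4, 0.9, 0.2, 0.6]

point = brier_score(y_true, y_prob)
lower, upper = bootstrap_ci(y_true, y_prob, brier_score)
print(f"Brier = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Lower Brier scores indicate better agreement between predicted probabilities and observed outcomes. An interval this wide on ten cases shows why point estimates alone can mislead.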

Conclusions. Both architectures can support early identification of these conditions when methodological discipline replaces inflated metrics with calibrated estimates. Equitable deployment for indigenous women requires subgroup-aware modeling and external validation.

Published

2026-06-08

Section

Articles