Machine Learning Reveals Links Between PFAS Exposure and COPD Risk
Key Highlights:
- The CatBoost machine learning model achieved 84% accuracy and an area under the curve of 0.89 in predicting chronic obstructive pulmonary disease (COPD).
- Higher levels of perfluorooctane sulfonic acid and perfluoroundecanoic acid, both types of per- and polyfluoroalkyl substances (PFAS), were associated with reduced COPD risk.
- Higher levels of perfluorooctanoic acid and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid, also PFAS compounds, were associated with increased COPD risk.
- A web-based risk calculator was developed to provide real-time, individualized COPD risk assessment using demographic, behavioral, and PFAS biomarker data.
In a machine learning analysis of US population data, researchers identified distinct associations between specific per- and polyfluoroalkyl substances (PFAS) and chronic obstructive pulmonary disease (COPD). The best-performing model, CatBoost, achieved 84% accuracy, an area under the curve (AUC) of 0.89, sensitivity of 81%, and specificity of 84%. These findings were used to build a publicly accessible COPD risk calculator that generates individualized risk estimates based on demographic, behavioral, and PFAS exposure variables.
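For readers less familiar with these metrics, the minimal Python sketch below shows how accuracy, sensitivity, specificity, and AUC can be computed from true labels and predicted probabilities. The labels, scores, and 0.5 decision threshold are invented for illustration; they are not data from the study, and this is not the authors' evaluation code.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


def roc_auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(non_cases))


# Hypothetical labels (1 = COPD) and predicted probabilities, for illustration only.
example_labels = [1, 1, 1, 0, 0, 0, 0, 1]
example_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

example_metrics = classification_metrics(example_labels, example_scores)
example_auc = roc_auc(example_labels, example_scores)
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on the chosen threshold, which is one reason it is commonly reported alongside them.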
COPD is a major contributor to global morbidity and mortality, with limited options for early detection. PFAS—synthetic chemicals widely used in industry and consumer goods—have been associated with various health risks, including respiratory diseases. While prior studies have examined links between PFAS and COPD, few have used interpretable machine learning approaches capable of modeling complex, non-linear relationships or providing individualized risk predictions.
Using data from 4450 participants in the 2013–2018 National Health and Nutrition Examination Survey (NHANES), the researchers constructed and evaluated nine machine learning models. CatBoost emerged as the top performer. SHapley Additive exPlanations (SHAP) and partial dependence plots (PDPs) were used to interpret the model and assess how individual PFAS influenced predicted COPD risk. The final model was deployed as a web-based calculator for clinical and public health use.
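The idea behind SHAP can be illustrated with a small, self-contained Python sketch. It computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over every possible ordering of the features. The function `toy_risk_score`, its coefficients, and the example inputs are hypothetical, and "absent" features are simply set to a single baseline value, which is a simplification. The study's actual SHAP computation for CatBoost is not described here and would rely on a dedicated library.

```python
from itertools import permutations


def toy_risk_score(features):
    """Hypothetical score with an age-by-smoking interaction (not the study's model)."""
    age, smoker, biomarker = features
    return 0.02 * age + 0.5 * smoker + 0.1 * biomarker + 0.01 * age * smoker


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features are switched from their baseline value to their observed value
    one at a time, in every possible order, and the resulting change in the
    model output is credited to the feature that was switched.
    """
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            contributions[i] += new_output - previous_output
            previous_output = new_output
    return [c / len(orderings) for c in contributions]


# Hypothetical individual and reference profile: [age, smoker, biomarker level].
person = [60, 1, 2.0]
reference = [40, 0, 1.0]

phi = shapley_values(toy_risk_score, person, reference)
```

The contributions in `phi` add up to the difference between the score for `person` and the score for `reference`, which is the property that lets SHAP split an individual prediction among its predictors. Enumerating all orderings is only feasible for a handful of features.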
Partial dependence analysis revealed that higher levels of perfluorooctane sulfonic acid (PFOS) and perfluoroundecanoic acid (PFUA) were associated with reduced COPD risk. In contrast, perfluorooctanoic acid (PFOA) and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH) were linked to increased risk. Perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDE), and perfluorohexane sulfonic acid (PFHxS) showed mixed or non-linear effects. SHAP analysis ranked predictors by overall contribution, identifying age and smoking status as the most influential, with several PFAS compounds also contributing substantially.
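A partial dependence curve of the kind described above can be sketched in a few lines of Python: for each candidate value of one feature, that value is assigned to every participant, the model is re-run, and the predictions are averaged. The model `toy_model`, its coefficients, and the four data rows are hypothetical and exist only to make the example runnable; they do not reproduce the study's model or results.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(features):
    """Hypothetical risk score: [smoker, biomarker_a, biomarker_b]."""
    smoker, biomarker_a, biomarker_b = features
    return 0.1 + 0.05 * smoker + 0.02 * biomarker_a - 0.01 * biomarker_b


# Hypothetical participants, for illustration only.
toy_rows = [
    [1, 1.0, 5.0],
    [0, 2.0, 3.0],
    [0, 3.0, 4.0],
    [1, 2.0, 8.0],
]

# Curve for biomarker_a (index 1) evaluated at three concentrations.
pd_curve = partial_dependence(toy_model, toy_rows, 1, [1.0, 2.0, 3.0])
```

A curve that rises across the grid, as it does in this toy example, corresponds to a feature associated with higher predicted risk, while a falling curve corresponds to lower predicted risk. Because the curve describes the model's predictions, it reflects association rather than causation.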
The study has several limitations. COPD classification was based on self-reported data rather than clinical testing, and PFAS levels were measured at a single time point, limiting conclusions about long-term exposure. The model was not externally validated, and important covariates such as physical activity and occupational exposures were not included.
“CatBoost identified PFOS and PFUA as protective factors against COPD, while PFOA and MPAH increased risk of COPD,” the study authors concluded. “These findings emphasize the need for stricter PFAS regulation and highlight the potential of machine learning in guiding prevention strategies.”
Reference:
Shao X, Zhang L, Wang Y, Ying Y, Chen X. Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration. Front Public Health. 2025;13:1602566. doi:10.3389/fpubh.2025.1602566
