Veerakul, Gumpanart, Pasupa, Kitsuchart, Pinroj, Yod, Chaothawee, Lertlak, Somprakit, Pradit, Kietdumrongwong, Pongtorn, Tonphu, Somkiat, Chaiwong, Warut and Yasri, Saowaluck (2025) Predicting Cardiovascular Diseases Risk in Thai Population by Machine Learning. The Bangkok Medical Journal, 21 (2), 133-144.
OBJECTIVES: To establish a clinical data lake for artificial intelligence (AI) and develop a machine learning model to predict cardiovascular disease (CVD) risk. MATERIALS AND METHODS: Following IRB approval, de-identified clinical data from 2.9 million patients (2010–2019) across eight Bangkok Dusit Medical Services (BDMS) hospitals were collected in compliance with the Personal Data Protection Act (PDPA). Two datasets were constructed: BDMS-CVD-Large (n = 9,072), comprising 3-year clinical records with 20 SHAP-selected features plus age and sex; and BDMS-CVD-Small (n = 107), incorporating coronary artery calcium scores (CACS) and time-from-test. XGBoost models were trained using 5-fold cross-validation and grid search, with the procedure repeated across 10 random splits. RESULTS: The BDMS-CVD-Large model achieved strong performance (F1-score: Macro 0.93 ± 0.008; Weighted 0.97 ± 0.003), with age, HDL, and LDL as key predictors. In the BDMS-CVD-Small model, including CACS improved the F1-score (0.92 ± 0.032 vs. 0.87 ± 0.031), supporting its predictive value. Limitations included potential occult CVD, exclusion of over 40% of cases due to incomplete data, and missing longitudinal data for many patients. CONCLUSION: This study demonstrates the feasibility of machine learning (ML)-based CVD prediction using large-scale clinical data under PDPA compliance. Prospective validation over 5–10 years is warranted, and integrating CACS may enhance future predictive accuracy.
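The abstract reports both a macro and a weighted F1-score. As a reading aid, the following minimal sketch shows how the two averages differ, using the standard definitions. It is not the authors' code: the function names and the toy labels are hypothetical and are not drawn from the BDMS datasets.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, scoring each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true samples."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy labels (0 = no CVD, 1 = CVD); invented for illustration only.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]

print("macro F1:", round(macro_f1(y_true, y_pred), 3))
print("weighted F1:", round(weighted_f1(y_true, y_pred), 3))
```

When the larger class is predicted more accurately than the smaller one, as in this toy example, the weighted average exceeds the macro average, because the macro average gives the smaller class equal weight.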
Item Type:
Article
Identification Number (DOI):
Subjects:
Subjects > Computer Science > Machine Learning
Subjects > Statistics > Applications
Deposited by:
Kitsuchart Pasupa
Date Deposited:
2026-01-06 00:29:43
Last Modified:
2026-01-06 00:38:40