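Abstract: | Background: Research on predicting severe morbidity and mortality in newborns, including preterm infants, continues to advance, from risk calculators based on traditional linear regression or statistical analysis to prediction models built in recent years with artificial intelligence and machine learning, which predict mortality from perinatal characteristics, physiological measurements, and complications of prematurity. MIMIC-III (Medical Information Mart for Intensive Care-III), a critical care database from a single medical center in Boston, USA, is large and structured and has been widely used for machine learning in critical care medicine. This study analyzes the characteristics of very low birth weight (VLBW) preterm infants who died during their neonatal intensive care unit (NICU) stay in the MIMIC-III database, trains machine learning models on these data, and compares how well different models predict in-hospital death in this population. We hope that characterizing the fatal cases will help improve the quality of NICU care and reduce mortality, and that the model can eventually be applied prospectively as a clinical alert and decision-support tool for medical staff.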
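Purpose: To preprocess the data of very low birth weight (VLBW) preterm infants in the neonatal intensive care unit (NICU) from the MIMIC-III database into a structured data set, to build a model that predicts in-hospital death among these infants, and to validate the model externally with local NICU data on VLBW preterm infants from Shuang Ho Hospital, Ministry of Health and Welfare.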
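Method: From neonatal intensive care unit (NICU) admissions recorded in the MIMIC-III database between 2001 and 2008, we selected very low birth weight (VLBW) preterm infants and collected demographic data, perinatal information, and vital signs recorded during the 24 hours of the day before the outcome event (death or discharge alive) as features. Feature analysis was performed with logistic regression (LR). For machine learning, 80% of the data served as the training set and 20% as the test set, and SMOTE (Synthetic Minority Over-sampling Technique) was used to balance the classes. Five models, logistic regression (LR), random forest (RF), K-nearest neighbors (KNN), XGBoost (eXtreme Gradient Boosting), and AdaBoost (Adaptive Boosting), were trained to predict in-hospital death of VLBW preterm infants in the NICU and were evaluated with four metrics: area under the curve (AUC), accuracy, positive predictive value (PPV), and recall. Local data from VLBW preterm infants in the NICU of Shuang Ho Hospital, Ministry of Health and Welfare, from 2021 to 2023 were then used for external validation to test the models' applicability.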
Result: A total of 765 neonatal intensive care unit (NICU) admissions of very low birth weight (VLBW) preterm infants meeting the inclusion criteria were collected from the MIMIC-III database. Eighteen features were used for feature analysis and for training the in-hospital mortality prediction models: 6 neonatal features (sex, gestational age at birth, birth weight, Apgar scores at the first and fifth minute after birth, and whether the infant was a multiple birth) and 12 features, such as body temperature, heart rate, and respiratory rate, recorded during the 24 hours of the day before the event. In internal validation, XGBoost performed best (AUC: 0.99; accuracy: 0.98) and KNN performed worst (AUC: 0.81; accuracy: 0.88). XGBoost also performed best in external validation, but its AUC was only 0.71, while logistic regression (AUC: 0.46) and KNN (AUC: 0.55) had very little predictive ability. In addition, in both internal and external validation, PPV and recall were markedly higher for predicting survival than for predicting death. The models in this study therefore cannot yet be applied to predicting in-hospital death of local VLBW preterm infants; more data and analysis will be needed to build an accurate model.
Background: Considerable effort has gone into predicting in-hospital mortality in very low birth weight (VLBW) preterm infants, from traditional statistical methods such as linear regression models to recent risk predictors built with machine learning. Features such as gestational and perinatal factors, physiological and biochemical markers, and complications of prematurity have been used to develop risk calculators. Databases such as MIMIC-III (Medical Information Mart for Intensive Care-III), from a single center in Boston, Massachusetts, are used extensively in machine learning to predict mortality and complication rates in adult intensive care patients, yet rarely for infants in the neonatal intensive care unit (NICU). This study aimed to use data on VLBW preterm infants in the NICU extracted from MIMIC-III for feature analysis and for building machine learning models that predict in-hospital mortality, in the hope of improving care quality and providing an alert system or decision-support tool for clinicians.
Purpose: This study extracts and preprocesses data on VLBW preterm infants in the NICU from MIMIC-III, performs feature analysis with machine learning methods, and builds prediction models for in-hospital mortality that are tested on the MIMIC-III data and on local data from Shuang Ho Hospital, Ministry of Health and Welfare.
Method: Records of VLBW preterm infants admitted to the NICU between 2001 and 2008 were extracted from the MIMIC-III data set. Demographic information and clinical features were processed, and prediction models were developed in Python with machine learning algorithms, using 80% of the data as the training set and 20% as the test set. SMOTE (Synthetic Minority Over-sampling Technique) was used to balance the classes. Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost) models were trained to predict in-hospital mortality of VLBW preterm infants. Area Under Curve (AUC), Accuracy, Positive Predictive Value (PPV), and Recall were used for model evaluation. External validation was performed with local data from VLBW preterm infants admitted to the NICU of Shuang Ho Hospital, Ministry of Health and Welfare, between 2021 and 2023.
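The class-balancing step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority-class neighbours. This is not the study's code. The function names `smote` and `balance_training_set`, the toy feature values, and the choice to oversample only the training split are assumptions for illustration; the abstract does not state at which point in the pipeline balancing was applied. The sketch also assumes at least two minority samples.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance_training_set(X_train, y_train, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.
    Only the training split is passed in, so test data stay untouched."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    extra = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X_train + extra, y_train + [minority_label] * len(extra)


# Invented toy data: [gestational age in weeks, birth weight in g]; 1 = died.
X = [[24, 600], [25, 700], [26, 650], [28, 1000], [29, 1100],
     [30, 1200], [27, 900], [31, 1400], [29, 1300], [30, 1250]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_training_set(X, y, k=2)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # deaths vs. survivors after balancing
```

Because the synthetic points lie on segments between real minority samples, they add no information about patients outside that range, which is one reason a balanced training set does not guarantee good performance on an external cohort.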
Result: The study obtained 765 admissions of VLBW preterm infants in the NICU from MIMIC-III for feature analysis and for building machine learning prediction models. Six demographic and perinatal features (sex, gestational age, birth weight, Apgar scores at the 1st and 5th minute after birth, and number of fetuses) and 12 vital-sign features, including temperature, heart rate, and respiratory rate, were selected for feature analysis and for building the in-hospital mortality prediction models. XGBoost (AUC: 0.99; Accuracy: 0.98) was the best model in internal validation, and KNN (AUC: 0.81; Accuracy: 0.88) was the weakest. XGBoost was also the best model in external validation, but with markedly weaker performance (AUC: 0.71), and the performance of the logistic regression (AUC: 0.46) and KNN (AUC: 0.55) models on the external data was close to random guessing. In both internal and external validation, PPV and Recall were much higher for survival prediction than for mortality prediction. In summary, the models trained in this study are not yet capable of predicting in-hospital mortality of VLBW preterm infants in local data. More data and further analysis are needed to build an accurate prediction model. |
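The gap between overall accuracy and the per-class figures reported above can be made concrete with a small sketch of the four evaluation metrics. The labels and scores below are invented and are not study data; the helper names `auc` and `class_report` are hypothetical, and the 0.5 decision threshold is an assumption.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored higher
    than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def class_report(y_true, y_pred, label):
    """PPV and recall when `label` is treated as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    ppv = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return ppv, recall


# Invented test split with 10% mortality: 1 = died, 0 = survived.
y_true = [1, 1] + [0] * 18
scores = [0.9, 0.3] + [0.6] + [0.1] * 17  # predicted probability of death
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy", accuracy)
print("AUC", auc(y_true, scores))
print("death    PPV, recall", class_report(y_true, y_pred, 1))
print("survival PPV, recall", class_report(y_true, y_pred, 0))
```

In this toy split, accuracy is 0.90 and the AUC is about 0.97, yet PPV and recall for the death class are both 0.50 while those for the survival class are about 0.94. When deaths are rare, a high accuracy or AUC can coexist with weak mortality prediction, which matches the pattern the abstract describes.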