Abstract: | Stroke is currently the fourth leading cause of death in Taiwan and the most common cause of disability in adults. Previous studies report that 3.1% to 18% of stroke patients have a recurrent stroke within 3 months. Recurrence leads to more severe disability, so being able to prevent and predict it would help clinicians adjust treatment strategies in time and reduce the risk of another stroke. The purpose of this study is to use machine learning and deep learning models to predict the likelihood that patients with ischemic stroke will have a recurrent ischemic stroke within 5 years. The data come from Taiwan's National Health Insurance database. We collected inpatient data from each patient's first ischemic stroke admission (length of stay, hospital level, discharge status, diagnoses, surgical procedures, medications, medical expenses, etc.), together with inpatient data (length of stay, diagnoses, surgical procedures, medical expenses) and outpatient data (diagnoses, surgical procedures, medical expenses) from the preceding 3 years. The prediction models were decision tree, random forest, logistic regression, Naïve Bayes, adaptive boosting (AdaBoost), support vector machine (SVM), and deep neural network (DNN).
We found that the machine learning models performed well in predicting recurrent stroke in patients with ischemic stroke. Among them, the SVM model, after oversampling to a 50:50 class ratio and appropriate feature selection, achieved an accuracy of 84.1%, a sensitivity of 84.9%, a specificity of 83.5%, and a precision of 80.2%. With hyperparameter tuning, our current deep learning model reached a maximum accuracy of 78.5%, but its precision was only 0.1%; based on these results, it cannot be shown to outperform the machine learning models. |
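The abstract reports accuracy, sensitivity, specificity, and precision after the training data were oversampled to a 50:50 class ratio. The Python sketch below shows how these four metrics are conventionally computed from confusion-matrix counts and one simple way to reach a 50:50 ratio. It rests on several assumptions. It uses plain random oversampling with replacement, because the abstract does not name the oversampling method. It labels recurrence as class 1. The function names `random_oversample` and `binary_metrics` are hypothetical illustrations, not the study's code. The toy labels also show how a model on an imbalanced test set can score reasonably high accuracy while its precision is near zero.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples (drawn with replacement) until every
    class matches the largest class, i.e. a 50:50 ratio for a binary task.

    Assumption: plain random oversampling; the study's exact method is not
    stated. Intended for the training split only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = rng.choices(pool, k=target - n)  # empty list for the majority class
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity and precision, where
    `positive` marks the recurrent-stroke class (assumed to be 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined metrics (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 8 non-recurrence (0) vs 2 recurrence (1) cases.
    X, y = list(range(10)), [0] * 8 + [1] * 2
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))  # Counter({0: 8, 1: 8})

    # Hypothetical imbalanced test set with one true recurrence: a model that
    # flags two wrong patients still scores 70% accuracy but 0% precision.
    y_true = [1] + [0] * 9
    y_pred = [0, 1, 1] + [0] * 7
    print(binary_metrics(y_true, y_pred))
```

In typical practice, oversampling is applied only to the training split. The test set then keeps the true class distribution, and precision is measured against it. This is also why precision can stay low even when accuracy looks acceptable.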