Abstract: | Objective: Planning a patient's discharge care requires the physician to take a detailed history, order examinations, confirm the diagnosis and prescribe accordingly, then document the inpatient treatment in the discharge summary together with an appropriate plan for follow-up diagnosis and treatment. This is a complex routine clinical workflow. A model that assists in assigning disease diagnosis codes during this process could reduce wasted labor and time and thereby improve the completeness of medical records and treatment.
Methods: Deep learning has driven a wave of research in recent years, with breakthroughs in many fields, and a growing number of text classification studies now develop deep models. This study uses word2vec to convert the text of electronic medical records into aggregated vector representations, applies a convolutional neural network (CNN) to strengthen feature extraction, and finally uses multi-label text classification to produce the predictions, with the aim of automatically selecting disease diagnosis codes (ICD-10-CM) for discharge summaries.
Results: A predictive model was successfully built with a deep learning algorithm, and manual verification of randomly selected cases showed good predictions in many of them. From internal medicine and surgery, the study selected three departments with comparatively large and complex inpatient data for model training: Neurosurgery, Nephrology, and Cardiology. The first model was trained on all physicians in the three departments, with Full Level ICD-10 codes as training labels and one full year of discharge summaries as training data; its performance was 46.4% for Neurosurgery, 45.8% for Cardiology, and 18.3% for Nephrology. The second model was trained separately for individual physicians in the same three departments, again with Full Level ICD-10 labels and one full year of discharge summaries, and was restricted to physicians with more than 100 records. Per-physician models improved markedly: the best, for a Neurosurgery physician, reached 86.1%, and this physician's number of training cases and case-to-disease ratio (案病比) were both high relative to other physicians. From the second-ranked physician onward, however, the composite metric dropped sharply to 42.1%, about 44 percentage points below the first, with 47% fewer training cases. Judged by the composite metric, the number of training cases and the case-to-disease ratio follow similar trends, but both show variation. The third model used the same three departments. The first two models indicated that performance is positively related to the number of training cases and to the case-to-disease ratio, so, because the number of training cases could not be increased, the ICD label level was changed from Full to Categorical and the model was retrained on all physicians in the three departments; overall composite performance rose to 46.4%. The trained models will next be tested on an independent dataset, and the overall performance of the predictive model will be assessed with appropriate evaluation metrics. |
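The abstract does not say how the diagnosis codes were encoded as training labels, so the following is only a minimal sketch under stated assumptions. It illustrates two ideas from the Methods and Results: collapsing Full Level ICD-10-CM codes to their three-character category (the change made for the third model), and treating prediction as multi-label classification, where each code gets its own sigmoid probability and every code above a threshold is selected. The function names, the example codes, the scores, and the 0.5 threshold are all hypothetical and are not taken from the study.

```python
import math


def to_category(code: str) -> str:
    """Collapse a full ICD-10-CM code to its three-character category."""
    return code.strip().upper().replace(".", "")[:3]


def build_label_index(label_sets):
    """Map every distinct label to a column index, in sorted order."""
    labels = sorted({label for labels in label_sets for label in labels})
    return {label: i for i, label in enumerate(labels)}


def multi_hot(labels, index):
    """Encode a set of labels as a 0/1 vector, one position per known label."""
    vec = [0] * len(index)
    for label in labels:
        vec[index[label]] = 1
    return vec


def decode(logits, index, threshold=0.5):
    """Keep every label whose sigmoid probability reaches the threshold."""
    names = sorted(index, key=index.get)
    return {
        names[i]
        for i, z in enumerate(logits)
        if 1.0 / (1.0 + math.exp(-z)) >= threshold
    }


# Hypothetical discharge summaries, each labeled with a set of full-level codes.
full_labels = [
    {"I63.9", "I10"},
    {"I63.40", "N18.6"},
    {"N18.5", "I10"},
]
category_labels = [{to_category(c) for c in codes} for codes in full_labels]

full_index = build_label_index(full_labels)          # 5 distinct labels
category_index = build_label_index(category_labels)  # 3 distinct labels

# Illustrative raw scores (logits), one per category label, for a single summary.
predicted = decode([2.0, -1.0, 0.3], category_index)
```

In this toy data the label space shrinks from five full-level codes to three categories, so each remaining label is backed by more training examples. That is one plausible reading of why the abstract turns to category-level labels when the number of training cases cannot be increased.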