Abstract: | Colorectal cancer (CRC) is the fourth leading cause of cancer death worldwide, and according to the 2019 cancer registry report of the Health Promotion Administration (HPA), Ministry of Health and Welfare, it is also the most frequently diagnosed cancer in Taiwan. Colorectal polyps are a risk factor for CRC, and patients with inflammatory bowel disease (IBD) also have a relatively higher risk of developing it. To develop a prediction model for CRC, this study collected clinical data from the Taipei Medical University Clinical Research Database (TMUCRD), covering 7,747 patients from Taipei Medical University Hospital (TMUH) who had been diagnosed with CRC, colitis, or polyps. The data were analyzed in Python, and prediction models were built with logistic regression, decision tree, random forest, support vector machine (SVM), Extreme Gradient Boosting (XGBoost), and LightGBM. Among the models for CRC and colitis, LightGBM was the most suitable, achieving an accuracy of 85.5%, a sensitivity of 84.3%, a specificity of 86.6%, and an AUC of 0.919 on the TMUH data, with CEA and RBC as the clinical laboratory test items. Future research could build on the combination of CEA and RBC so that possible cancer in patients with colitis or polyps can be detected early and treated early. |