摘要: | Background: Inflammatory bowel disease (IBD) is an immune-mediated disorder. Its complex etiology and its frequent confusion with acute gastroenteritis or functional bowel disorders make early diagnosis difficult. Patients with IBD experience chronic, recurrent inflammation of the gastrointestinal tract and often suffer from diarrhea, abdominal pain, or bloody stools. If the disease is not treated early, or if inflammation is not effectively controlled after treatment, patients may require advanced medication or surgery; otherwise, intestinal damage accumulated over time can lead to complications such as intestinal strictures, bowel obstruction, and anal fistulas. IBD requires lifelong treatment, and its prevalence in Taiwan has risen in recent years. Some patients require advanced treatment, including biologic agents or surgery. 
Beyond early diagnosis, identifying in advance whether a patient's inflammation remains uncontrolled, so that advanced medication or surgery can be started promptly, has become an important research goal. Purpose: Because imaging examinations are not routine and are difficult to obtain, this study analyzes clinical and laboratory data to identify the parameters that best indicate remission status in IBD patients and uses those parameters to build a predictive model of IBD remission. Methods: We searched the Taipei Medical University Clinical Research Database using ICD-9 and ICD-10 codes for ulcerative colitis (UC) and Crohn's disease (CD). We extracted clinical data and blood test results for patients aged 18 years or older who were diagnosed with IBD between 2004 and 2022. Data cleaning and statistical analysis were performed in SAS and Python. Continuous variables were compared with independent t-tests, and the independence of each categorical variable from the dependent variable was examined with chi-square tests. Machine learning models, including logistic regression and random forest, were then built in Python to predict the remission status of IBD patients. Results and Conclusions: We included 1,147 patients with UC or CD, of whom 1,073 had a good prognosis and 74 had a poor prognosis. In predicting prognosis, the model achieved an overall AUC of 0.91 ± 0.04 and, for patients with a poor prognosis, a recall of 0.98 ± 0.03. Across five external validations, recall ranged from 0.73 to 0.87, indicating good and stable performance. 
This study developed a model that accurately identifies IBD patients with a poor prognosis. The model performed strongly in both internal and external validation and remained good and stable on unseen data. Albumin was the strongest predictor in the model. These findings provide useful information for clinical decision-making, although the study faced challenges such as missing data and feature selection. Keywords: inflammatory bowel disease, Crohn's disease, ulcerative colitis, inflammation, machine learning, remission prediction |
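The abstract reports recall for the poor-prognosis class alongside AUC. As a rough illustration of why both matter when only 74 of 1,147 patients have a poor prognosis, the following Python sketch computes the two metrics from scratch. The helper names `recall` and `roc_auc` and the toy scores are assumptions made for illustration; they are not the study's code or data.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that are predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance that a random positive case scores higher than a
    random negative case (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class sizes taken from the abstract: 1 = poor prognosis, 0 = good prognosis.
n_good, n_poor = 1073, 74
baseline_true = [0] * n_good + [1] * n_poor
# A trivial baseline that predicts "good prognosis" for every patient.
baseline_pred = [0] * (n_good + n_poor)

baseline_accuracy = sum(t == p for t, p in zip(baseline_true, baseline_pred)) / len(baseline_true)
baseline_recall = recall(baseline_true, baseline_pred)
# Accuracy is roughly 0.94, yet recall for poor prognosis is 0.0.
print(baseline_accuracy, baseline_recall)

# Hypothetical toy scores (not study data) to exercise both metrics.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold
print(recall(toy_true, toy_pred), roc_auc(toy_true, toy_scores))
```

The baseline shows that a model can look accurate on imbalanced data while missing every poor-prognosis patient, which is presumably why recall for that class is reported separately from the overall AUC.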