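Abstract: | The urinary bladder stores urine. Because it is in prolonged contact with urine produced by the body's metabolism, its tissue is exposed to, and bathed in, the various substances that urine carries, which makes it prone to lesions and malignant transformation. Painless hematuria, urinary frequency, and painful urination are common symptoms of bladder cancer, which occurs more often in men, smokers, drinkers, and people with a family history of the disease. According to the World Health Organization, bladder cancer (bladder urothelial carcinoma, BLCA) ranks among the ten most common cancers worldwide. In Taiwan, it is also among the ten most common cancers in men.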
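Unlike other abdominal or urological cancers, bladder cancer currently has no non-invasive diagnostic indicator; diagnosis relies on biopsy and morphological examination. This study applies data mining and machine learning to clinical big data, extracting features from relevant laboratory items such as routine urinalysis, blood biochemistry, and cytopathology results, in order to build models that detect bladder cancer and distinguish it from other conditions.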
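The study sample comprises patient data from a medical center in northern Taiwan over the ten years from 2009 to 2019, covering patients whose pathology reports confirmed one of four cancers: bladder cancer (BLCA), kidney cancer (kidney renal clear cell carcinoma, KIRC), prostate cancer in men (prostate adenocarcinoma, PRAD), and cancer of the uterine corpus or cervix in women (cervical squamous cell carcinoma and endocervical adenocarcinoma, CSEC). For comparison, patients whose pathology reports indicated cystitis were also included.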
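Data mining was carried out in Python, and models were trained with 10-fold cross-validation. After parameter tuning, a random forest yielded a reasonably good model for discriminating bladder cancer: accuracy reached 71.8% against cystitis, 74.2% against kidney cancer, 78% against prostate cancer, and as high as 86.8% against uterine corpus cancer.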
These results show that applying data mining and machine learning to readily available clinical data can produce reasonably good models, which may help improve the chances of detecting cancer early.
The urinary bladder is the organ responsible for storing urine. Because of its prolonged contact with urine, the bladder is likely to be immersed in various chemical substances, which puts its tissue at high risk of neoplastic change. Painless hematuria, frequent urination, and painful urination are common symptoms of bladder cancer, which occurs more often in men, people who smoke or drink, and patients with a family history of the disease. According to the World Health Organization, bladder urothelial carcinoma (BLCA) is one of the ten most common cancers in the world. In Taiwan, bladder cancer is also among the ten most common cancers in men.
Unlike other cancers in the abdominal cavity, bladder cancer has no non-invasive diagnostic indicator; diagnosis still depends on cystoscopic biopsy. Our study uses data mining and machine learning techniques to extract features from clinical data such as routine urinalysis, blood tests, and cytology results in order to build models for detecting bladder cancer.
We collected cases from one medical center from 2009 to 2019. Cases whose pathology reports confirmed bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), prostate adenocarcinoma (PRAD), or cervical squamous cell carcinoma and endocervical adenocarcinoma (CSEC) were included. For comparison, cases with a pathology report of cystitis were also collected.
Data exploration was performed in Python, and machine learning was carried out with 10-fold cross-validation. After parameter tuning, a random forest produced a good bladder cancer discrimination model. The accuracy of distinguishing bladder cancer from cystitis was 71.8%, from kidney cancer 74.2%, from prostate cancer 78%, and from uterine cancer 86.8%.
The results of this study show that applying data mining and machine learning techniques to clinical laboratory data can produce good models that may improve the early detection of cancer. |
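
The abstract names Python, 10-fold cross-validation, parameter tuning, and a random forest, but it does not give the feature set, the parameter grid, or the code. The following is a minimal sketch of how one pairwise comparison (bladder cancer versus a single comparison group) could be set up with scikit-learn. Everything specific here is an assumption for illustration: the synthetic data from `make_classification`, the helper names `make_synthetic_pair` and `tune_random_forest`, and the small grid over `n_estimators` and `max_depth`. It is not the authors' pipeline and will not reproduce the reported accuracies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def make_synthetic_pair(n_samples=200, n_features=12, seed=0):
    """Stand-in for one pairwise data set (1 = BLCA, 0 = comparison group).

    Synthetic data only; real features would be laboratory values.
    """
    return make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=5,
        random_state=seed,
    )


def tune_random_forest(X, y, seed=0):
    """Grid-search a random forest, scoring each setting by 10-fold CV accuracy."""
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # Hypothetical grid; the source does not say which parameters were tuned.
    param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=folds,
        scoring="accuracy",
    )
    search.fit(X, y)
    # best_score_ is the mean accuracy over the 10 held-out folds
    # for the best parameter setting.
    return search.best_params_, float(search.best_score_)


if __name__ == "__main__":
    X, y = make_synthetic_pair()
    params, accuracy = tune_random_forest(X, y)
    print(f"best parameters: {params}")
    print(f"mean 10-fold CV accuracy: {accuracy:.3f}")
```

Under this setup, each of the four comparisons (cystitis, KIRC, PRAD, CSEC) would be run as its own binary problem. Because the same folds are used both to pick parameters and to report accuracy, the score is somewhat optimistic; a nested cross-validation or a held-out test set would give a less biased estimate.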