Abstract: | According to World Health Organization (WHO) statistics, the number of people diagnosed with bladder cancer worldwide is rising year by year. The disease occurs most often in men aged 70 to 80. Its cause remains unknown, but known risk factors include family history, smoking, aging, and certain occupations. Early symptoms are nonspecific: hematuria and frequent urination (pollakiuria) resemble the presentation of a bacterial urinary tract infection or bladder stones. Bladder cancer can be detected by routine urinalysis, urine cytology, and cystoscopy. When a patient presents with hematuria and belongs to a high-risk group, the physician recommends cystoscopy to confirm or rule out bladder cancer, and any abnormal tissue found during the procedure is biopsied to determine whether it is malignant. Staging is then established from the combined examination results, including MRI, CT, and cystoscopy, and treatment follows accordingly. Early-stage bladder cancer can be treated with BCG therapy and transurethral resection of bladder tumor (TURBT), with a good prognosis and a low recurrence rate. Once the disease has become invasive, cystectomy combined with chemotherapy and radiotherapy is recommended, along with close monitoring for metastasis; the prognosis is poor and recurrence is more likely. Current examination methods still have shortcomings, and even cystoscopy, the gold standard, carries risks. This study therefore combines clinical laboratory data with machine learning to build models that relate non-invasive test results to bladder cancer, in order to predict a patient's probability of having bladder cancer and to distinguish among cancers in and near the urinary system, such as prostate, bladder, kidney, and endometrial cancer. The algorithms used include logistic regression, decision trees, random forest, SVM, XGBoost, and LightGBM. Model performance is evaluated with the confusion matrix (sensitivity, specificity, accuracy, precision, and F1 score) and AUROC.
With features chosen by forward feature selection, such as BUN, creatinine, eGFR, urine turbidity, urine strip glucose, blood WBC, and occult blood, the model separates the disease group from the disease-control group (cystitis) in the Shuang Ho dataset with an average sensitivity of 93.30%, specificity of 92.33%, accuracy of 92.48%, and AUROC of 0.975. |