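Abstract: | With changing times and advances in medical technology, human lifespans have lengthened, disease patterns have shifted, and the elderly population is growing rapidly. Dementia, which becomes more common with age, has in recent years become a serious public health issue in many countries. Alzheimer's Disease International (ADI) estimated in 2019 that more than 50 million people worldwide live with dementia. Alzheimer's disease (AD) is the most common form, accounting for about 60% to 70% of dementia patients. Dementia is a progressive chronic disease. It has long-term physical, psychological, social, and economic effects on patients, and it poses an equally serious challenge for their caregivers, their families, and society as a whole. AD has long been an active research topic both domestically and internationally. No curative treatment exists in clinical practice, and management relies mainly on medication to slow progression. Because efficient diagnostic methods are lacking, patients often cannot be diagnosed and treated early, and drug treatment sometimes falls short of expectations. Current clinical diagnosis mostly combines questionnaires with magnetic resonance imaging (MRI) of the brain or with cerebrospinal fluid (CSF) sampling. However, the health insurance reimbursement criteria for MRI are strict, and CSF sampling carries the risks of a spinal tap. A low-cost, minimally invasive, and effective tool for early diagnosis is therefore of considerable importance.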
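This study aims to develop a prediction model for Alzheimer's disease from the gene expression database provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI). It combines machine learning algorithms with feature selection algorithms to identify important biomarkers and to predict the likelihood of developing AD after one, two, and three years. The data set divides subjects into three categories: normal aging (NL), mild cognitive impairment (MCI), and Alzheimer's disease (AD). Blood samples were collected from these subjects, and gene expression was measured by microarray with 49,386 probes, each corresponding to the expression level of a gene. Models were built with machine learning methods such as random forest and support vector machine, with the data split into training, validation, and test sets for rigorous model construction and validation, in order to establish an effective model for early diagnosis of AD. Feature selection used the chi-square test, differentially expressed genes (DEGs), a wrapper-based method, and intersection genes. Within the MCI group, the study also analyzes differences in the expression of specific genes between subjects who remain MCI and patients predicted to deteriorate to AD.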
After applying the feature selection methods above and trying eight different machine learning models, the study identified a combination of 18 genes (19 gene probes). Combined with the random forest algorithm, this gene set predicts which MCI patients will remain MCI and not progress to AD within two years, reaching 88% accuracy and an AUC (area under the ROC curve) of 71% on the test set. Moreover, among subjects with a prediction score less than or equal to 0.1 who remained MCI, accuracy reached 100%. To verify the model's usability, it was also used to predict status change in MCI patients after one and three years, giving AUCs of 82% and 74%, respectively. This indicates that the model and gene set perform well at predicting whether MCI status is maintained. These results also suggest that gene expression levels obtained through low-cost blood sampling, together with machine learning models, are a feasible basis for predictive assessment of whether the disease will progress in patients with MCI, the precursor state of AD.
As medical technology advances, life expectancy has increased, disease patterns have changed, and the aging population is growing. With populations aging around the world, dementia has become a serious public health issue. According to Alzheimer's Disease International (ADI), more than 50 million people are living with dementia, and Alzheimer's disease (AD) accounts for 60% to 70% of them. Dementia is a progressive chronic disease that has physical, psychological, social, and economic impacts on patients and also creates many difficulties for their carers. AD remains a prominent topic in both national and international research. No treatment is currently available to cure dementia or to alter its progressive course. Clinically, the main treatment for AD is medication to slow its worsening. Physicians mostly use questionnaires combined with magnetic resonance imaging (MRI) of the brain or cerebrospinal fluid (CSF) tests. However, health insurance reimbursement requirements for MRI are strict, and CSF collection carries a risk of injury from the spinal needle. It is therefore vital to find a low-cost, minimally invasive procedure for early diagnosis.
This research aims to build a model for early diagnosis of AD using a database from the Alzheimer's Disease Neuroimaging Initiative (ADNI), which provides clinical blood gene expression data for normal aging (NL), mild cognitive impairment (MCI), and AD groups. We used random forest, support vector machine (SVM), and several other machine learning algorithms to build prediction models, and applied the chi-square test, differentially expressed genes (DEGs), a wrapper-based method, and intersection genes to select important genes that could distinguish normal subjects from patients with a deteriorating condition.
After feature selection and evaluation of eight machine learning models, we identified 18 genes (19 probes). Using random forest to predict which MCI patients do not convert to AD within two years, the model achieves 88% accuracy and an AUC (area under the ROC curve) of 71%. In addition, for subjects who remain MCI and have a prediction score less than or equal to 0.1, accuracy reaches 100%. To validate the model's usability, we also predicted the status of MCI patients after one year and three years; the AUCs are 82% and 74%, respectively. These results indicate that the model performs well in prediction. They also suggest that gene expression biomarkers collected by low-cost blood sampling can be used with machine learning models to predict and evaluate progression in the precursor stage of AD. |
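The workflow described in the abstract (filter-based feature selection, intersection of the selected probes, a random forest classifier, and evaluation by AUC and by accuracy in a low-score subgroup) can be illustrated with a minimal Python sketch. This is not the study's implementation. It assumes scikit-learn and SciPy, uses randomly generated data in place of the ADNI expression matrix, stands in a two-sample t-test for the DEG analysis, and omits the wrapper-based step and the separate validation split for brevity. All function names, the number of probes kept (`k=50`), and the forest size are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def make_synthetic_expression(n_samples=300, n_probes=500, n_informative=10, seed=0):
    """Synthetic stand-in for probe-level expression values (NOT ADNI data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)    # 1 = converts to AD, 0 = stays MCI
    X = rng.normal(size=(n_samples, n_probes))
    X[:, :n_informative] += y[:, None]        # shift a few probes in converters
    return X, y


def select_intersection(X_train, y_train, k=50):
    """Indices of probes ranked in the top k by both filters (assumed k)."""
    # chi2 requires non-negative inputs, so rescale each probe to [0, 1].
    X_scaled = MinMaxScaler().fit_transform(X_train)
    chi_idx = SelectKBest(chi2, k=k).fit(X_scaled, y_train).get_support(indices=True)
    # Two-sample t-test per probe as a simple stand-in for a DEG analysis.
    _, p_values = ttest_ind(X_train[y_train == 1], X_train[y_train == 0], axis=0)
    deg_idx = np.argsort(p_values)[:k]
    common = set(chi_idx.tolist()) & set(deg_idx.tolist())
    return np.array(sorted(common), dtype=int)


def run_sketch(seed=0):
    X, y = make_synthetic_expression(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Select features on the training split only, to avoid leaking test labels.
    selected = select_intersection(X_train, y_train)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train[:, selected], y_train)

    # Prediction score = estimated probability of converting to AD.
    scores = model.predict_proba(X_test[:, selected])[:, 1]
    low = scores <= 0.1                       # low-score subgroup, as in the abstract
    low_acc = float(np.mean(y_test[low] == 0)) if low.any() else None
    return {
        "n_selected": int(selected.size),
        "auc": float(roc_auc_score(y_test, scores)),
        "accuracy": float(accuracy_score(y_test, (scores >= 0.5).astype(int))),
        "low_score_accuracy": low_acc,
    }


if __name__ == "__main__":
    print(run_sketch())
```

In this sketch, `low_score_accuracy` is the fraction of test subjects with a score of 0.1 or below who did in fact remain in the non-converter class, which is one plausible reading of the subgroup result reported above. Numbers produced on synthetic data say nothing about the performance reported for the ADNI cohort.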