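Abstract: | Background:
Approximately 15% of the global population has asthma, a potentially fatal disease. Its main features are chronic airway inflammation and airway hyperresponsiveness. Acute attacks, or poorly controlled asthma, usually result from failure to give timely, targeted treatment. Changes in disease phenotype among chronic patients are another possible reason why current medications cannot fully control asthma symptoms. Airway inflammation activates many immune cells, especially type 2 T lymphocytes and eosinophils; these cells produce nitric oxide, which fills the entire airway. Fractional exhaled nitric oxide (FENO) is currently recognized by the medical community as the simplest and most reliable test of whether the airway immune response in asthma is dominated by type 2 T lymphocytes. The blood concentration of immunoglobulin E (IgE) is one of the tests most commonly used in thoracic medicine to assess whether asthma is allergic in nature. Random forest is an ensemble learning method that uses bootstrap aggregating (bagging) with decision trees as its base classifiers, and it is currently considered among the most efficient artificial intelligence classification learners. The literature of the past two years shows that some researchers have already used random forest ensemble learning to diagnose respiratory diseases.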
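As a rough illustration of the bagging idea mentioned above, the sketch below trains one-split decision trees (stumps) on bootstrap samples and combines them by majority vote. It is a minimal sketch, not the study's implementation: the function names, the toy data, and the use of stumps are all assumptions made for brevity, and it omits the per-split random feature selection and deeper trees that a full random forest adds on top of bagging.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split decision tree (stump) by minimising training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split (all sampled rows identical): predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagged(rows, labels, n_trees, rng):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict_bagged(ensemble, row):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): the class depends only on the first feature.
rows = [(i, (i * 7) % 10) for i in range(10)]
labels = [int(i > 5) for i in range(10)]
ensemble = fit_bagged(rows, labels, n_trees=25, rng=random.Random(0))
print(predict_bagged(ensemble, (0, 3)), predict_bagged(ensemble, (9, 3)))
```

Each stump sees a slightly different resampled view of the data, so individual stumps may disagree; the vote smooths out that variance.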
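Objectives:
This study has two main objectives: first, to use random forest to identify the features most important for asthma phenotyping; second, to build a machine-learning classification model of asthma clinical phenotypes.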
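Methods:
Basic biological data, pulmonary function results, FENO, IgE, and other measurements were collected from asthma patients, for a total of 33 features. Using the R language and the random forest ensemble classification method, we analyzed the data to identify the important features for asthma phenotyping, and then used random forest ensemble learning to build a prediction model for asthma phenotype classification.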
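The abstract does not say which importance measure was used to rank the features. One measure commonly reported for random forests is permutation importance: shuffle one feature's values and record how much accuracy drops. The sketch below illustrates that idea under loud assumptions: a nearest-centroid classifier stands in for the forest to keep the example short, the data are synthetic, and every function name is hypothetical. A random forest would normally score this on out-of-bag samples rather than on the training rows used here.

```python
import random


def fit_centroids(rows, labels):
    """Mean feature vector per class (a stand-in model, NOT a random forest)."""
    sums, counts = {}, {}
    for r, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(r))
        for j, v in enumerate(r):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(row, model[y])))


def accuracy(model, rows, labels):
    return sum(predict(model, r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, rng):
    """Accuracy drop after shuffling each feature column in turn."""
    base = accuracy(model, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        rng.shuffle(col)  # break the link between feature j and the label
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
        scores.append(base - accuracy(model, shuffled, labels))
    return scores


# Synthetic data: feature 0 carries the signal, features 1 and 2 are pure noise.
rng = random.Random(0)
rows, labels = [], []
for i in range(200):
    y = i % 2
    rows.append([4.0 * y + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
    labels.append(y)

model = fit_centroids(rows, labels)
importance = permutation_importance(model, rows, labels, rng)
print([round(s, 3) for s in importance])
```

A feature whose shuffling barely changes accuracy contributes little to the prediction, while a large drop marks it as important; ranking features by this drop gives an ordering of the kind reported in the results.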
Results:
This study collected data from 67 asthma patients and entered 33 feature values in total, including biological data, pulmonary function test results, blood IgE, and FENO. Random forest analysis ranked IgE, FENO, FVC, reversibility, and BMI as the five most important features for asthma phenotyping. A random forest classifier trained on the unaugmented data reached an accuracy of 0.88; when the data were augmented by random resampling to 100, 200, and 300 records, accuracy was 0.94, 0.94, and 0.97, respectively.
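The augmentation step can be sketched as follows. This is only one plausible reading of "random repeated sampling": it assumes all 67 original records are kept and the remainder is drawn from them with replacement. The function name and the placeholder records are hypothetical.

```python
import random


def augment_by_resampling(records, target_size, rng):
    """Return the original records plus random repeats, target_size rows in total."""
    if target_size < len(records):
        raise ValueError("target_size must be at least the number of original records")
    extra = [rng.choice(records) for _ in range(target_size - len(records))]
    return list(records) + extra


# Placeholder patient records; real rows would carry the 33 features and a label.
original = [{"patient_id": i} for i in range(67)]
rng = random.Random(42)
augmented = {size: augment_by_resampling(original, size, rng) for size in (100, 200, 300)}
for size, data in augmented.items():
    # Print the target size, the actual size, and the number of distinct patients.
    print(size, len(data), len({row["patient_id"] for row in data}))
```

Resampling adds repeated rows but no new patients, so the number of distinct records stays at 67. If accuracy is measured on a held-out split, resampling only the training portion keeps copies of the same record from landing on both sides of the split.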
Conclusion:
The random forest classification method can be applied to predicting asthma phenotypes, and the model performs quite well in terms of accuracy. The results of training on randomly augmented data suggest that increasing the amount of real training data in the future should further improve the model's predictive ability, which would help clinicians correctly diagnose asthma phenotypes and improve the quality of asthma care.
Background: Approximately 15% of the global population has asthma, a potentially fatal disease. The main features of asthma are chronic airway inflammation and airway hyperresponsiveness. Acute attacks, or asthma that is not properly controlled, usually result from failure to treat symptoms in time. Changes in disease phenotype among chronic patients are another possible reason why current medications cannot completely control asthma symptoms. Airway inflammation activates many immune cells, such as T lymphocytes and eosinophils. These immune cells, especially type 2 immune cells, produce nitric oxide, which fills the entire respiratory tract. Exhaled nitric oxide (FENO) is currently recognized by the medical profession as one of the simplest and most reliable tests for type 2 immune responses in asthma. The blood level of immunoglobulin E (IgE) is one of the tests most commonly used in thoracic medicine to assess allergic reactions in the respiratory tract. Random forest is an ensemble learning method that uses bootstrap aggregating (bagging) with decision trees as its base classifiers, and it is currently considered among the most efficient artificial intelligence classification learners. The literature of the past two years shows that some researchers have used random forest ensemble learning to diagnose respiratory diseases.
Objectives: This study has two main objectives: first, to use random forest to identify the features most important for asthma phenotyping; second, to build a machine-learning classification model of asthma clinical phenotypes.
Methods: Basic biological data, lung function results, FENO, IgE, and other measurements were collected from asthma patients, for a total of 33 features. Using the R language and the random forest ensemble classification method, we analyzed the data to identify the important features for asthma phenotype classification, and then used random forest ensemble learning to build a prediction model for asthma phenotype classification.
Results: This study collected data from 67 asthma patients and entered 33 feature values in total, including biological data, lung function test results, blood IgE, and FENO. Random forest analysis ranked IgE, FENO, FVC, reversibility, and BMI as the five most important features for asthma phenotyping. A random forest classifier trained on the unaugmented data reached an accuracy of 0.88; when random repeated sampling was used to augment the data to 100, 200, and 300 records, accuracy was 0.94, 0.94, and 0.97, respectively.
Conclusion: The random forest classification method can be applied to predicting asthma phenotypes, and the model performs quite well in terms of accuracy. The results of training on randomly augmented data in this study suggest that if the amount of real training data can be increased in the future, and if factors such as medication adherence, the number of acute attacks, and residential air quality are also taken into account, the model's predictive ability should improve. This would help clinicians correctly diagnose the asthma phenotype and improve the quality of asthma care. |