Abstract: | With an aging population, preventing osteoporosis and predicting its onset early have become an important public health issue. Currently, the diagnosis of osteoporosis relies mainly on dual-energy X-ray absorptiometry (DXA) to measure bone mineral density (BMD). In this study, data from 1,919 patients with abnormal bone mass were collected, and data mining and machine learning techniques were used to extract features from the relevant laboratory test items in order to build a model for detecting low bone mass (osteopenia) and osteoporosis.
Three machine learning algorithms (decision tree, random forest, and logistic regression) were employed in this study, and the performance of the resulting models was evaluated and compared using confusion matrices. The area under the receiver operating characteristic curve (AUROC) of the trained models was 0.692 for the decision tree, 0.784 for the random forest, and 0.693 for logistic regression. After body mass index (BMI) was added as a feature, the AUROC of the model improved to 0.800. Age, gender, height, high-density lipoprotein cholesterol (HDL-c), and weight were identified as important features in the random forest model.
This study assessed the accuracy of machine learning models in distinguishing people with normal bone mass from those with abnormal bone mass. Further analysis of the descriptive statistics and feature importance suggested that HDL-c, a laboratory test item, has potential value as an indicator for bone mass detection. |
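The two quantities the abstract relies on, the derived BMI feature and the AUROC metric, can be illustrated with a minimal sketch. This is not the study's code: the function names, the units (kilograms and centimetres), and the toy labels and scores are assumptions made purely for illustration.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by the square of height in m.

    The units are an assumption; the abstract does not state how height
    and weight were recorded.
    """
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win, which makes this equivalent to the
    normalized Mann-Whitney U statistic.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up toy values, not data from the study.
# Label 1 = abnormal bone mass, label 0 = normal bone mass (hypothetical coding).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

toy_auroc = auroc(labels, scores)               # 8 of 9 pairs ranked correctly
toy_bmi = bmi(weight_kg=60.0, height_cm=160.0)  # 60 / 1.6**2
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.692 to 0.800 should be read.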