    Please use this permanent URL to cite or link to this item: http://libir.tmu.edu.tw/handle/987654321/64261


    Title: AI-Assisted Structural Profiling of Polysaccharides for Plant Phylogeny and Tissue-Specific Identification
    Title (Chinese): 人工智慧輔助的植物多醣結構分析及其在植物演化系統和組織特異性鑑定中的應用
    Author: Hsiung, Shih-Yi (熊師怡)
    Contributors: PhD Program, School of Pharmacy
    謝尚逸
    王靜瓊
    Keywords: Artificial intelligence; Polysaccharide; Plant cell wall; Fungal cell wall
    Date: 2024-06-17
    Upload time: 2024-09-11 19:15:55 (UTC+8)
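    Abstract: Plant cells have a rigid cell wall composed mainly of cellulose and non-cellulosic polysaccharides. Cellulose microfibrils are the main structural component of the plant cell wall. Non-cellulosic polysaccharides show diverse structures and functions across biological contexts; in the plant cell wall, they play a crucial role in enhancing the wall's overall structural flexibility and functionality. A notable example is xyloglucan, which is considered the most abundant non-cellulosic polysaccharide in the primary cell walls of eudicotyledons. Chapter 1 discusses the main polysaccharide polymers of the plant and fungal kingdoms, including pectin, xyloglucan, heteroxylan, and heteromannan, and introduces several polysaccharide structural analysis techniques in common use, together with the role of artificial intelligence and programming platforms in them.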

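    Chapter 2 focuses on xyloglucan profiling in the Araceae. Earlier studies found that the xyloglucan structure of the aquatic Araceae subfamily Lemnoideae differs markedly from the fucogalactoxyloglucans of other non-commelinid monocotyledons. In this study, we investigated 26 species from five of the seven Araceae subfamilies, including Lemna minor, which had attracted attention in earlier work for its unusual fucogalactoxyloglucan structure. All seven aquatic species analyzed showed unusual xyloglucan structural proportions, with one or two of the following features: an XXXG core motif of more than 77% [observed in Lemnoideae and Orontioideae]; lack of fucosylation [found in Lemnoideae, Cryptocoryne aponogetonifolia, and Lagenandra ovata (Aroideae, Rheophytes clade)]; and more than 14% of oligosaccharide units bearing S or D side chains [observed in Spirodela polyrhiza and Landoltia punctata (Lemnoideae), and Pistia stratiotes (Aroideae, Dracunculus clade)]. Orontioideae and Lemnoideae are considered the two earliest-diverging subfamilies, and all of their species are aquatic, whereas Aroideae is considered the most recently diverged. Interestingly, two terrestrial species [Dieffenbachia seguine and Spathicarpa hastifolia (Aroideae, Zantedeschia clade)] also had xyloglucans lacking fucose, indicating that this feature is not limited to aquatic plants.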

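    Chapter 3 uses machine learning (ML) models to analyze plant xyloglucans. The study performs ML-based profiling of xyloglucans from 172 species across plant lineages. Xyloglucan (XG) has a wide range of medical uses because it is biodegradable, biocompatible, readily chemically modified, and approved by the U.S. Food and Drug Administration (FDA). However, there has so far been no comprehensive study of XG structures in green plants, including lycopodiopsida (L), polypodiopsida (P), gymnosperms (G), non-commelinid monocotyledons (MN), commelinid monocotyledons (MC), and magnoliids (Ma), nor any analysis of XG structure in relation to plant phylogeny. Broad phylogenetic analysis of XG structures has been hindered by the difficulty of obtaining samples from such a wide range of plants and by the lack of a systematic method of analysis. In this study, we built seven ML models, linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), gradient boosting (GB), and naive Bayes (NB), to analyze the XG structures of 172 species of lycopodiopsida (2), polypodiopsida (12), gymnosperms (9), magnoliids (1), eudicotyledons (29), non-commelinid monocotyledons (37), and commelinid monocotyledons (82). The ML models performed well on the XG dataset (AUC > 0.900). The plant groups polypodiopsida (P), gymnosperms (G), eudicotyledons (E), non-commelinid monocotyledons (MN), and commelinid monocotyledons (MC) could be classified, identified, and predicted based on XG structures and the XXGn-type, XXXG-type, XXFG, XLFG, and XXFG+XLFG percentages. This may indicate that XG composition carries phylogenetic information and structural patterns across the plant kingdom. The ROC curves showed that performance for certain plant groups differed slightly among KNN, RF, GB, SVM, and NB. LDA and LR performed outstandingly in every plant group, in both classification and prediction. Every ML model showed that the XXGn-type, XXXG-type, XXFG, XLFG, and XXFG+XLFG percentages within each plant group have properties that can be used for classification and prediction. These findings support the inference that the models classify and predict accurately within each specific phylogenetic group. Groups G, MC, and P showed higher mean AUC values, which may indicate that the evolutionary signal in XG structures is more pronounced in these groups than in groups MN and E. That XG can be used for phylogenetic classification and prediction suggests that ML models combined with XG data have potential applications in frontline analysis and material selection in future biotechnological and pharmaceutical research.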

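    Chapter 4 is another example of ML-based polysaccharide profiling, in which models classify tissues of the fungus Wolfiporia extensa by their monosaccharides, using monosaccharide ratios as the classification features. ML has become a valuable tool in a variety of clinical decision-making and diagnostic processes in bioinformatics. In our study, we evaluated eight ML algorithms, linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), gradient boosting (GB), naive Bayes (NB), and artificial neural network (ANN), on the monosaccharide analysis of four tissue types of W. extensa. All models performed well, with AUC above 0.8. Five models, LDA, KNN, RF, GB, and ANN, performed particularly well in classifying the four tissue types, with AUC greater than 0.9. In addition, all eight models showed good predictive performance in the three-tissue-type classification, with AUC greater than 0.8. Notably, all ML classification and prediction methods outperformed LDA plotting analysis alone. ML-based methods outperform traditional regression techniques, and larger sample sizes may improve the accuracy of identifying W. extensa tissue samples. In the future, ML classification based on tissue monosaccharides has the potential to be used in the market to authenticate powdered W. extensa samples.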
    The plant cell has a rigid cell wall containing mainly cellulose and non-cellulosic polysaccharides. Cellulose microfibrils are a major structural component of plant cell walls. Non-cellulosic polysaccharides exhibit diverse structures and functions across various biological contexts. In plant cell walls, these polysaccharides play a crucial role in enhancing the overall structural flexibility and functionality of the wall. One noteworthy example is xyloglucan (XG), recognized as the most abundant non-cellulosic polysaccharide in the primary cell wall of eudicotyledons. Chapter 1 discusses the structural diversity of polysaccharides, including pectins, XGs, heteroxylans, and heteromannans, in the plant and fungal kingdoms. It also summarizes how these polysaccharides are structurally characterized by current technologies and how artificial intelligence (AI) could be implemented in structural profiling.

    Chapter 2 focuses on xyloglucan profiling in the Araceae family. Earlier studies demonstrated that the aquatic Araceae species Lemna minor possesses xyloglucans with a distinct structure compared to the fucogalactoxyloglucans found in other non-commelinid monocotyledons. In this study, we investigated 26 Araceae species, including L. minor, from five of the seven subfamilies. All seven aquatic species examined exhibited xyloglucans with unusual characteristics, including one or two of the following features: a core motif of <77% XXXG [observed in L. minor (Lemnoideae) and Orontium aquaticum (Orontioideae)]; lack of fucosylation [found in L. minor (Lemnoideae), Cryptocoryne aponogetonifolia, and Lagenandra ovata (Aroideae, Rheophytes clade)]; and >14% of oligosaccharide units with S or D side chains [observed in Spirodela polyrhiza and Landoltia punctata (Lemnoideae), and Pistia stratiotes (Aroideae, Dracunculus clade)]. The subfamilies Orontioideae and Lemnoideae are considered the two most basal subfamilies, with all their species being aquatic, while Aroideae is considered the most derived. Interestingly, two terrestrial species [Dieffenbachia seguine and Spathicarpa hastifolia (Aroideae, Zantedeschia clade)] also exhibited xyloglucans without fucose, indicating that this feature was not unique to aquatic species.

    Chapter 3 presents a machine learning study on xyloglucan profiling of 172 species across 7 plant lineages. Xyloglucan (XG) has a wide range of medical uses because it is biodegradable, biocompatible, amenable to chemical modification, and approved by the U.S. Food and Drug Administration (FDA). However, there has been no comprehensive study of XG structures in green plants, including lycopodiopsida (L), polypodiopsida (P), gymnosperms (G), non-commelinid monocotyledons (MN), commelinid monocotyledons (MC), and magnoliids (Ma), nor any extensive report on XG structures across phylogenetic groups. Such wide-ranging XG profiling has been hindered by the difficulty of obtaining plant material and the lack of systematic analysis. In this study, we applied seven machine learning (ML) models, linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), gradient boosting (GB), and naive Bayes (NB), to analyze XG structures in 172 species of lycopodiopsida (2), polypodiopsida (12), gymnosperms (9), magnoliids (1), eudicotyledons (29), non-commelinid monocotyledons (37), and commelinid monocotyledons (82). The results indicated that the ML models performed well on the XG dataset (AUC > 0.900). The phylogenetic groups polypodiopsida (P), gymnosperms (G), eudicotyledons (E), non-commelinid monocotyledons (MN), and commelinid monocotyledons (MC) could be identified and predicted based on XG structures and the XXGn-type, XXXG-type, XXFG, XLFG, and XXFG+XLFG percentages. This suggests that XG compositions may contain phylogenetic signals and structural patterns across the plant kingdom. The ROC curves showed that performance in certain phylogenetic groups varied across KNN, RF, GB, SVM, and NB. LDA and LR gave the best classifications and predictions in each phylogenetic group. Every model revealed distinctive patterns in the XXGn-type, XXXG-type, XXFG, XLFG, and XXFG+XLFG percentages within each phylogenetic group. These findings underscored the models' effectiveness in achieving accurate classification and prediction within each specific phylogenetic group. Phylogenetic groups G, MC, and P exhibited higher average AUC values, suggesting that the phylogenetic patterns in XG structures within these groups were more pronounced than in groups MN and E. The results suggest a potential application of ML in frontline examination and material selection for future biotechnological and pharmaceutical studies.
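
    The per-group and average AUC values mentioned above can be illustrated with a minimal, standard-library sketch of a one-vs-rest AUC calculation. This is not the thesis's code: the function names, the three-group subset, and the classifier scores are hypothetical placeholders, and the thesis does not state which software or averaging scheme was used.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive sample is scored
    above a random negative sample; ties count as half a win."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(score_rows, true_groups, groups):
    """Return {group: AUC}, treating each group in turn as the positive class."""
    return {
        g: binary_auc([row[g] for row in score_rows],
                      [t == g for t in true_groups])
        for g in groups
    }


# Hypothetical classifier scores for six samples (NOT data from the thesis).
groups = ["P", "G", "MC"]
true_groups = ["P", "P", "G", "G", "MC", "MC"]
score_rows = [
    {"P": 0.70, "G": 0.20, "MC": 0.10},
    {"P": 0.45, "G": 0.35, "MC": 0.20},
    {"P": 0.20, "G": 0.70, "MC": 0.10},
    {"P": 0.40, "G": 0.50, "MC": 0.10},
    {"P": 0.10, "G": 0.10, "MC": 0.80},
    {"P": 0.50, "G": 0.20, "MC": 0.30},
]

per_group = one_vs_rest_auc(score_rows, true_groups, groups)
macro_auc = sum(per_group.values()) / len(per_group)
print(per_group, round(macro_auc, 3))
```

    Comparing the per-group values in `per_group` across several models is one way to arrive at statements such as "groups G, MC, and P exhibited higher average AUC values"; the unweighted macro average shown here is an assumption.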

    Chapter 4 presents another example of using machine learning for polysaccharide profiling: the tissue-specific classification of Wolfiporia extensa samples. Monosaccharide ratios were used as the classification features. Machine learning (ML) has become a valuable tool in various clinical decision-making and diagnostic procedures within bioinformatics. In our study, we assessed the classification and prediction capabilities of eight algorithms, linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbors (KNN), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), naive Bayes classifier (NB), and artificial neural network (ANN), using the monosaccharide composition profiles of four tissue types in Wolfiporia extensa. All eight ML-based models performed well, with an AUC exceeding 0.8. In particular, five models, LDA, KNN, RF, GBM, and ANN, performed exceptionally well in classifying the four tissue types, achieving an AUC greater than 0.9. Moreover, all eight models demonstrated good predictive performance, with an AUC greater than 0.8 in the three-tissue-type classification. Notably, all eight ML-based methods outperformed the single LDA plotting method. For larger sample sizes, ML-based methods proved superior to traditional regression techniques, potentially enhancing the accuracy of identifying tissue samples of W. extensa.
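
    As an illustration of classifying tissue types from monosaccharide ratios, the following minimal sketch implements k-nearest neighbors (one of the eight algorithms named above) with the Python standard library only. The tissue labels, the three-component ratio vectors, and the choice of k = 3 are hypothetical placeholders, not values from the thesis, which does not specify its implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k training samples
    closest to `query` in Euclidean distance. On a tied vote, the label
    seen first among the nearest neighbors wins."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Fraction of samples predicted correctly when each sample is held
    out in turn and classified from the remaining ones."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Hypothetical monosaccharide ratio vectors (NOT data from the thesis);
# each tuple is a made-up three-sugar composition summing to 1.
X = [
    (0.80, 0.15, 0.05), (0.78, 0.17, 0.05), (0.82, 0.13, 0.05),
    (0.60, 0.30, 0.10), (0.62, 0.28, 0.10), (0.58, 0.31, 0.11),
    (0.40, 0.35, 0.25), (0.42, 0.33, 0.25), (0.38, 0.36, 0.26),
]
y = ["tissue_A"] * 3 + ["tissue_B"] * 3 + ["tissue_C"] * 3

print(knn_predict(X, y, (0.79, 0.16, 0.05)))
print(leave_one_out_accuracy(X, y))
```

    Leave-one-out evaluation is used here only because the toy dataset is tiny; the validation scheme used in the thesis is not stated in this abstract.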
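    Description: Doctoral degree
    Advisor: 謝尚逸
    Co-advisor: 王靜瓊
    Oral examination committee member: 李慶國
    Oral examination committee member: 張嘉銓
    Oral examination committee member: 林東毅
    Oral examination committee member: 王靜瓊
    Type: thesis
    Appears in collections: [School of Pharmacy] Theses and Dissertations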

    Files in this item:

    File          Description   Size   Format   Views
    index.html                  0Kb    HTML     0


    All items in TMUIR are protected by the original copyright.
