    Please use this permanent URL to cite or link to this document: http://libir.tmu.edu.tw/handle/987654321/64320


    Title: Analysis of Large Language Models and Deep Learning Methods in Identifying Protein Interaction Relationships in Biomedical Literature
    Author: 黃顗亘
    Contributors: Master's Program, Graduate Institute of Big Data Technology and Management (大數據科技及管理研究所碩士班)
    張詠淳
    Keywords: Natural Language Processing; Deep Learning; Large Language Models; Protein-Protein Interaction Extraction; GPT
    Date: 2024-01-12
    Upload time: 2024-09-30 14:21:17 (UTC+8)
    Abstract: "With the rapid growth of the biomedical literature, accurately identifying protein-protein interactions (PPIs) so that researchers can quickly capture the key information in publications has become an important and arduous task. In the past, breakthroughs in neural networks (NNs) drove their widespread application in text mining tasks. However, directly applying general-domain methods to biomedicine still has limitations. In recent years, large language models (LLMs), which are pre-trained on large amounts of literature, have become able to understand specialized terminology and contextual information more effectively across domains. Such developments could lead to a deeper understanding of disease and improved treatments, and could aid research into new drug development.
    This study explores the use of different prompt instructions with GPT-3.5 and GPT-4 to predict interactions between proteins, and proposes the prompting method best suited to GPT models. In addition, the method improves the handling of more complex entity types, such as nested protein structures and exception handling for compound-word proteins. We evaluated it on five publicly available PPI data sets commonly used for performance comparison (LLL, IEPA, HPRD50, AIMed, and BioInfer). The experimental results show that the proposed method achieves considerable accuracy, especially on the LLL data set, where its F1-score of 87.3% is second only to the DSTK model among multiple-kernel methods. Furthermore, compared with other deep learning models, GPT offers highly flexible prompting and multiple adjustable parameters, which we believe will provide more convenience for biomedical researchers."
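    Description: Master's
    Advisor: 張詠淳
    Oral examination committee member: 張詠淳
    Oral examination committee member: 蘇家玉
    Oral examination committee member: 許明暉
    Note: Thesis public release date: 2024-01-23
    Type: thesis
    Appears in collections: [Graduate Institute of Big Data Technology and Management] Theses and Dissertations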

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    67

