Abstract:

Background: In 2022, OpenAI released a new version of its chatbot, ChatGPT 3.5, which rapidly gained attention; in early 2023 an API was released so that commercial applications could call the service. Tested in the United States at the end of 2022, without any model fine-tuning, ChatGPT 3.5 achieved an accuracy close to 60% on the United States Medical Licensing Examination. When ChatGPT 3.5 is used for health-education consultation, the first issue is therefore whether its knowledge is sufficient for it to be consulted by patients and their family members.

Purpose: ChatGPT is a pre-trained model. This study explores a medical health-education consultation workflow built on ChatGPT 3.5, and the effect of using prompt engineering to constrain its replies to a set of health-education texts. In an observational study of the adjusted workflow, physicians were invited to evaluate the correctness, consistency, and difference of the replies, and general users were invited to evaluate their readability, at a ninth-grade (junior-high) Traditional Chinese reading level, and trustworthiness.

Method: In the initial phase, the medical health-education consultation workflow was built and tested. A health-education website was first set up to compile expert-recommended health-education texts. Data source: the health-education information pages of the official website of Taipei Medical University Hospital, retrieved from March to June 2023; this material was used for testing and scoring the ChatGPT consultation workflow. Three health-education topics were sampled for the experiment: insomnia, chronic kidney disease, and diabetes. For each topic, a group of five general users each posed three consultation questions, which were developed into a question bank and answered by ChatGPT. The prompt (prompt engineering) added the following requirements: (1) reply in Traditional Chinese; (2) if the question is not covered by the health-education text, answer "您所提問的內容不在資料庫中" ("the content of your question is not in the database"); (3) base answers on the health-education texts provided by this study; (4) if a question cannot be answered, prompt the user to revise it; (5) answer consultation questions from the health-education materials.
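As a concrete illustration of the prompt constraints listed above, the following is a minimal sketch only: it assumes the gpt-3.5-turbo chat-completion API and uses hypothetical names (SYSTEM_PROMPT, consult); it is not the study's actual implementation.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt restating the study's five requirements.
SYSTEM_PROMPT = (
    "You are a health-education consultation assistant.\n"
    "1. Reply in Traditional Chinese.\n"
    "2. If the question is not covered by the health-education text below, "
    "answer exactly: 您所提問的內容不在資料庫中\n"
    "3. Base every answer only on the health-education text provided below.\n"
    "4. If a question cannot be answered, ask the user to revise it.\n"
    "5. Answer consultation questions only from the health-education materials.\n"
)

def consult(health_education_text: str, question: str) -> str:
    """Send one consultation question, grounded in the supplied health-education text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + "\n" + health_education_text},
            {"role": "user", "content": question},
        ],
        temperature=0,  # assumption: low randomness so repeated consultations are comparable
    )
    return response.choices[0].message.content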
The ChatGPT replies were compiled into the question bank. Three physicians read the question bank, rated the correctness, consistency, and difference of the answers against the text content, and completed a physician rating form; fifteen general users rated the readability and trustworthiness of the replies on a general-user rating form, giving five variables in total. The study observed whether the score distributions and standard deviations were close, and whether correctness judgments across physicians differed by only a small standard deviation, in order to verify whether the generative AI model can answer consultation questions correctly and serve as a future health-education consultation tool (a minimal scoring-aggregation sketch is given after this abstract). Cases were collected from June 12 to June 25, 2023.

Result: After the texts were compiled, prompt-engineering syntax was tested in ChatGPT, and health-education consultation replies were obtained with the adjusted prompts. Three physicians and fifteen general users scored 45 consultation questions in Traditional Chinese; each question was consulted twice, for a total of 90 consultations. According to the scoring results, of the 45 questions, 26 reached a correctness score of 8 or above, while 48 responses did not pass; 24 questions reached a consistency score of 8 or above, with a mean of at least 7 but below 8. The current correctness and consistency are therefore not yet sufficient for ChatGPT to serve as a medical health-education consultation assistant.
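The scoring-aggregation sketch referenced in the Method: a minimal example, with entirely hypothetical data and variable names, of computing the per-question mean and standard deviation for each rated variable. The pass threshold of 8 points mirrors the cut-off reported in the Result and is an assumption here, not the study's formal scoring rule.

from statistics import mean, stdev

# scores[variable][question_id] -> list of rater scores (0-10); data below is illustrative only
scores = {
    "correctness": {"Q01": [8, 9, 7]},          # e.g. ratings from the 3 physicians
    "readability": {"Q01": [9, 8, 8, 7, 9]},    # e.g. ratings from general users
}

for variable, per_question in scores.items():
    for qid, ratings in per_question.items():
        m = mean(ratings)
        sd = stdev(ratings) if len(ratings) > 1 else 0.0
        passed = m >= 8  # assumption: 8 of 10 treated as the passing threshold
        print(f"{variable} {qid}: mean={m:.2f} sd={sd:.2f} pass={passed}")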