Abstract: | Background and Purpose: Baseball is a physically demanding sport, and pitcher injuries have long been a central concern. The literature consistently reports that pitchers are injured at a markedly higher rate than players at other positions, chiefly because the repetitive pitching motion places sustained load and fatigue on the shoulder and elbow. Previous studies have mainly analyzed performance data aggregated over whole seasons or entire careers, applying statistical methods or machine learning algorithms to identify the key factors behind injury risk. That work has two limitations: by focusing on long-term performance it overlooks the relationship between injury and individual games, and its predictive accuracy leaves room for improvement. This study therefore proposes a method that incorporates single-game data to improve predictive accuracy. It analyzes the relationship between a pitcher's performance in the games before an injury and the injury itself, and builds machine learning models to predict the likelihood of re-injury in the same season and in the following season. The results are intended as a reference for players, coaches, and teams in adjusting training and game usage to reduce injuries. 
Method: Publicly available performance and injury data for Major League Baseball pitchers from 2012 to 2022 were preprocessed and merged into a single sample set. The dataset includes pitchers' physical characteristics, such as height and weight; injury details, such as the injured body part and injury status; and pitching performance indicators from the one, three, and five games preceding each injury. Fourteen machine learning algorithms were compared, and the best-performing one was adopted as the primary predictive model. Results: The best result was obtained with an independent sample allocation on the dataset that combines features from the five games before the injury with pitching velocity changes, using the 2012 to 2021 data as the training set. A Voting ensemble of the three most accurate of ten single models predicted same-season re-injury most accurately, with an Accuracy of 0.90 ± 0.02, an AUC of 0.96 ± 0.01, and an F1-score of 0.90 ± 0.021. The features that influenced model accuracy were placement on the 60-day injured list, whether the injury was season-ending, and the number of days injured. Conclusion: Game performance before an injury, together with the details of that injury, can effectively predict whether a pitcher will be re-injured in the same season. Although performance on the test set was biased toward the class without re-injury, the results offer sports medicine a different research perspective. More complete data and additional features are directions for future work. |
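The selection and Voting step described in the Results can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not say whether the Voting ensemble was hard (majority) or soft (probability-averaged), so hard voting is assumed here, and the model names, labels, and predictions below are invented toy values standing in for validation output.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def top_k_by_accuracy(predictions, y_true, k=3):
    """Names of the k models whose predictions are most accurate."""
    ranked = sorted(
        predictions,
        key=lambda name: accuracy(y_true, predictions[name]),
        reverse=True,
    )
    return ranked[:k]

def hard_vote(predictions, selected):
    """Majority vote per sample across the selected models (assumed hard voting)."""
    per_sample = zip(*(predictions[name] for name in selected))
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]

# Toy validation labels: 1 = re-injured in the same season, 0 = not (hypothetical).
val_labels = [1, 0, 1, 1, 0, 0, 1, 0]

# Toy predictions from four hypothetical single models.
val_predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 1],
    "model_b": [1, 0, 0, 1, 0, 0, 1, 0],
    "model_c": [1, 1, 1, 1, 0, 0, 1, 0],
    "model_d": [0, 1, 0, 1, 1, 0, 0, 0],
}

selected = top_k_by_accuracy(val_predictions, val_labels, k=3)
ensemble = hard_vote(val_predictions, selected)
print(selected, accuracy(val_labels, ensemble))
```

In this toy case each selected model makes one error on a different sample, so the majority vote corrects all of them. In a real pipeline the ranking would be done on validation data kept separate from the final test season to avoid an optimistic estimate.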