Abstract: | With the rise of big data and artificial intelligence, programming ability has become an essential skill for students' future competitiveness, and demand for programming education has grown rapidly. In the programming classroom, however, teachers struggle to keep track of how each student is learning and usually have to rely on assigning and grading homework to find out. By the time a problem surfaces in the homework, the best moment to help has often already passed. In addition, while many studies focus on block-based programming environments, text-based programming languages are closer to real-world software development for students in higher education, yet learning features are harder to extract from text-based programming.
To address these problems, this study conducted an experiment in the course "Educational Data Mining Using Python" at a national university in northern Taiwan, in which 32 students were enrolled. We built a teaching assistant system with Jupyter Notebook as the development interface. Through a server dedicated to the course, the system collects in real time the operation logs generated as students write programs, integrates them with exam and assignment submission data from the university's learning management system, and presents the quantified results on a Tableau dashboard so that teachers can monitor the state of the course. A total of 118,738 system log records were collected during the study period.
From these system logs and the students' code, this study extracts features of each student's learning process, including the number of executions, the number of copy-and-paste operations, the counts of each type of error, the time spent fixing errors, and the counts of program statement types and package methods used. Using these features, we compared groups of students with different grades and different levels of prior experience to identify indicators that signal poor learning and a need for intervention. Cluster analysis grouped students with similar editing habits; summarizing each cluster's tendencies in learning approach and cross-referencing them with grades yields study suggestions for students. The same features were used in a factor analysis to explore the states underlying students' behavior, which yielded four states: "reading and reviewing the code", "writing Python code", "attempting to solve problems", and "constructing programming logic". Comparing the share of each state among students with different grades and learning approaches provides a reference for improving teaching. Finally, using the feature data accumulated up to each week of the semester, we built models to predict semester grades and examined whether the features can support early warning of learning problems. With data accumulated through the sixth week, the model predicted semester grades with an accuracy of 0.81, which suggests that these features have potential for early warning. |
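The kind of log-based feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the `LogEvent` schema, the `extract_features` function, the action names `"execute"` and `"paste"`, and the definition of repair time (the interval from the first failed run in an unbroken series of failures to the next successful run) are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogEvent:
    """One row of a hypothetical notebook log (schema assumed for illustration)."""
    student: str
    time: datetime
    action: str                  # assumed values: "execute" or "paste"
    error: Optional[str] = None  # e.g. "NameError"; None if the run succeeded


def extract_features(events):
    """Aggregate per-student counts and the mean error repair time in seconds."""
    per_student = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.time):
        per_student[ev.student].append(ev)

    features = {}
    for student, evs in per_student.items():
        errors = Counter()
        executions = pastes = 0
        repair_times = []
        first_error_at = None  # start of the current unbroken series of failed runs
        for ev in evs:
            if ev.action == "paste":
                pastes += 1
                continue
            if ev.action != "execute":
                continue  # ignore actions this sketch does not model
            executions += 1
            if ev.error is not None:
                errors[ev.error] += 1
                if first_error_at is None:
                    first_error_at = ev.time
            elif first_error_at is not None:
                # First successful run after one or more failed runs.
                repair_times.append((ev.time - first_error_at).total_seconds())
                first_error_at = None
        features[student] = {
            "executions": executions,
            "pastes": pastes,
            "errors": dict(errors),
            "mean_repair_seconds": (
                sum(repair_times) / len(repair_times) if repair_times else None
            ),
        }
    return features


if __name__ == "__main__":
    demo = [
        LogEvent("s1", datetime(2024, 1, 1, 10, 0, 0), "execute", "NameError"),
        LogEvent("s1", datetime(2024, 1, 1, 10, 2, 0), "execute"),
    ]
    print(extract_features(demo))
```

Errors that are never followed by a successful run contribute to the error counts but not to the repair time, so a student who abandons a failing cell does not distort the average. Per-week cumulative features could be obtained by filtering `events` by timestamp before calling the function.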