AI Summary • Published on Apr 19, 2026
Code-generating AI tools such as GitHub Copilot are increasingly adopted in both professional and educational settings, yet there is little empirical evidence about their consequences for computing students. While human pair programming is known to benefit student self-efficacy and achievement, its use is threatened by the proposition that AI can replace human partners. This study addresses that gap by comparing human-human and human-AI pair programming, investigating differences in actual and perceived performance, learning retention, workload, and emotional impact, guided by the Control-Value Theory of emotion and Cognitive Load Theory. The core problem is understanding how AI assistance affects students beyond raw performance, considering learning, cognitive load, and emotional well-being relative to traditional human collaboration.
A mixed-methods, within-subjects study was conducted with 22 novice and intermediate Python programmers. Participants worked on Python coding tasks under time pressure, both in teams of two (human-human) and individually with GitHub Copilot (human-AI), for 20 minutes in each condition. The design was counterbalanced across participants and tasks to mitigate order and task-pool effects. One week later, participants individually completed a retention test on the same programming tasks. Objective measures included programming performance (tasks completed and time to spare) and learning (retest performance). Subjective measures, collected via questionnaires, included workload (NASA Task Load Index dimensions: mental demand, temporal demand, effort, frustration) and emotion (valence and arousal, based on Russell's circumplex model). Participants were incentivized to balance performance with understanding. Data analysis used Ordinary Least Squares (OLS) regression with Studentized Wild Cluster Bootstrapping for inference, addressing the small sample size and dependence in the data, and Benjamini-Hochberg adjustment to control the false discovery rate across the family of tests. Qualitative data from exit questionnaires were used to contextualize and triangulate the quantitative findings.
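To make the multiple-comparison step concrete, below is a minimal sketch of the Benjamini-Hochberg step-up procedure the analysis relies on. This is not the authors' analysis code; it is a generic NumPy illustration of how raw p-values from a family of tests (e.g., one per outcome measure) are adjusted to control the false discovery rate.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (adjusted p-values, boolean rejection mask) for a family
    of tests, controlling the false discovery rate at level alpha.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)           # ranks from smallest to largest p
    ranked = p[order]
    # Raw BH adjustment: p_(i) * m / i for rank i
    adj = ranked * m / np.arange(1, m + 1)
    # Enforce monotonicity (adjusted p must be non-decreasing in rank)
    adj = np.minimum.accumulate(adj[::-1])[::-1]
    adj = np.clip(adj, 0.0, 1.0)
    # Map back to the original test order
    adjusted = np.empty(m)
    adjusted[order] = adj
    return adjusted, adjusted <= alpha

# Hypothetical p-values for four outcome measures
adjusted, reject = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With these illustrative inputs, only the smallest p-value (0.01, adjusted to 0.04) survives at alpha = 0.05, showing how the adjustment guards against spurious findings when several outcomes are tested at once.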
Participants demonstrated significantly better objective performance with GitHub Copilot than with human teammates, completing tasks faster and more efficiently. However, their self-grades did not significantly reflect this AI-driven performance advantage, indicating that they did not attribute the success to their own individual contribution. Workload measures revealed significantly reduced mental demand, temporal demand, and effort when working with Copilot, though frustration levels did not differ significantly. Emotionally, working with a human teammate was rated significantly more positive and more arousing than working with Copilot, suggesting a more engaging and enjoyable experience despite the higher workload and lower raw performance. Regarding learning retention, there was a non-significant trend toward worse retest performance for tasks initially completed with AI, particularly for stronger teammates, who may have engaged less deeply with the material when AI was present. Qualitative feedback highlighted that human interaction fostered deeper thinking, collaboration, validation, and a sense of contribution that was often absent with AI.
The study strongly recommends that educators integrate human-human pair programming into their courses, as it fosters greater emotional engagement, enjoyment, and motivation to work harder, even if objective performance is initially lower than with AI. While Copilot offers significant performance benefits and reduces workload, with potentially minor learning consequences for simple problems, it can diminish a student's sense of control and autonomy and the valuable social interactions crucial for deeper learning and emotional well-being. The findings suggest that AI is best treated as an accessible, low-stakes information retrieval tool. Future research should explore hybrid models in which AI augments human teams, taking care to prevent AI from disrupting beneficial social dynamics and student autonomy. Mechanisms such as user-defined rate limiting, or encouraging intentional AI use through system guardrails, could help students leverage AI without sacrificing the intrinsic and social value of human collaboration. The study also acknowledges limitations related to sample size, generalizability, and the experimental setting's fixed time limits, suggesting that future work explore student-determined time investment and more complex programming tasks.