This bachelor thesis addresses the challenge of verifying students' individual understanding of programming submissions in an era where large language models are widely used. Instructors traditionally rely on targeted questions during tutorials to confirm that students truly comprehend the code they submit, but this process is time-consuming and difficult to scale.
The goal of this thesis is to design and implement a standalone web application that simulates this tutorial setting independently of any existing programming platform. Teachers create programming tasks with deadlines, grading criteria, a ground-truth solution, and unit tests. Students upload their source code, run the unit tests, and then answer multiple-choice and open questions that an LLM-powered chatbot generates from their own submission. The system scores the answers automatically and combines them with the unit test results into an overall grade for the task.
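The combination of unit test results and question scores into one grade could be sketched as a simple weighted sum. The weighting below is a hypothetical illustration, not a decision taken from the thesis, and all names are invented for this sketch:

```python
from dataclasses import dataclass


@dataclass
class SubmissionResult:
    tests_passed: int       # number of unit tests that passed
    tests_total: int        # number of unit tests run
    question_score: float   # chatbot answer score, normalized to [0, 1]


def final_grade(result: SubmissionResult, test_weight: float = 0.6) -> float:
    """Combine unit test results and question scores into a grade in [0, 1].

    The 60/40 split is an assumed example weighting; the actual criteria
    would be configured by the teacher per task.
    """
    test_ratio = (
        result.tests_passed / result.tests_total if result.tests_total else 0.0
    )
    return test_weight * test_ratio + (1 - test_weight) * result.question_score


# 8 of 10 tests passed, chatbot answers scored 0.75
print(round(final_grade(SubmissionResult(8, 10, 0.75)), 2))
```

A scheme like this keeps the two signals separable, so a submission that passes all tests but whose author cannot answer questions about it is still penalized.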
The thesis will cover the design and implementation of the web application itself, the task creation and submission workflow, the LLM-based question generation and answer scoring, and the automatic computation of the final grade.
The final concept and implementation will be evaluated with example programming exercises, incorporating feedback from instructors to assess usability, pedagogical value, and the reliability of LLM-generated assessments.