Abstract and keywords
Abstract:
The paper presents an algorithm for the automatic generation of thematic tests, using English-language tests delivered through a mobile application as an example and applying counterfactual analysis to improve their quality. A detailed analysis of the language domain led to clear requirements for the future service. Key forms of knowledge assessment were classified, along with descriptions of typical exercises and the difficulty levels at which they are used, giving a comprehensive picture of the skills that require step-by-step assessment. The shortcomings of existing tests are highlighted: ambiguous wording, multiple correct answers, and labor-intensive item selection. The paper develops and tests a comprehensive approach to assessing the effectiveness of prompts for generating grammar tests with Large Language Models. At its core is a counterfactual algorithm that identifies the latent features that actually influence the model's choice of grammatical structures, selectively modifies the prompt, and evaluates the changes using three complementary metrics. Applying the algorithm showed that adding explicit indications of the most significant hidden features increases the model's sensitivity to the key factors of the task. Re-evaluation with the developed metrics and independent expert review confirmed a statistically significant improvement (p < 0.01) in both grammatical correctness and adherence to the task structure: the average score increased from 0.91 to 0.95. Counterfactual analysis is thus an effective tool for fine-tuning prompts; the proposed improved prompt ensures more reliable generation of test materials that meet educational standards and lays the foundation for scaling the algorithm to other task types and language skills.
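The counterfactual loop described above (toggle a candidate latent feature in the prompt, regenerate, and measure how much a quality metric moves) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate` and `score` are hypothetical stand-ins for an LLM call and one of the paper's quality metrics, and the feature clauses are invented examples.

```python
def counterfactual_feature_ranking(base_prompt, features, generate, score):
    """Rank candidate prompt features by how much adding each feature
    clause to the prompt shifts the scored quality of the output.

    features: dict mapping a feature name to a clause that makes the
    feature explicit in the prompt (the counterfactual intervention).
    """
    base_score = score(generate(base_prompt))
    sensitivity = {}
    for name, clause in features.items():
        # Counterfactual prompt: the same task with one feature made explicit.
        variant_score = score(generate(base_prompt + " " + clause))
        sensitivity[name] = abs(variant_score - base_score)
    # Features whose toggling moves the score most are the most influential.
    return sorted(sensitivity.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stubs for demonstration only.
def fake_generate(prompt):
    return prompt  # a real system would call an LLM here

def fake_score(text):
    # Pretend the metric rewards an explicit target tense more than level.
    return 0.90 + 0.04 * ("tense" in text) + 0.01 * ("CEFR" in text)

features = {
    "target_tense": "State the target tense explicitly.",
    "learner_level": "State the CEFR level of the test taker.",
}
ranking = counterfactual_feature_ranking(
    "Generate a grammar test item.", features, fake_generate, fake_score)
print(ranking)  # most influential feature first
```

The most significant features surfaced this way are then written into the improved prompt, after which quality is re-measured with the same metrics.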

Keywords:
quality of education, artificial intelligence, Large Language Models, prompt, counterfactual analysis, latent features, grammar test, counterfactual algorithm, model sensitivity, test generation