<?xml version="1.0"?>
<!DOCTYPE article
PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20190208//EN"
       "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.4" xml:lang="en">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">Scientific and analytical journal «Vestnik Saint-Petersburg university of State fire service of EMERCOM of Russia»</journal-id>
   <journal-title-group>
    <journal-title xml:lang="en">Scientific and analytical journal «Vestnik Saint-Petersburg university of State fire service of EMERCOM of Russia»</journal-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Научно-аналитический журнал «Вестник Санкт-Петербургского университета ГПС МЧС России»</trans-title>
    </trans-title-group>
   </journal-title-group>
   <issn publication-format="online">2218-130X</issn>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="publisher-id">120408</article-id>
   <article-id pub-id-type="doi">10.61260/2218-130X-2026-1-30-42</article-id>
   <article-categories>
    <subj-group subj-group-type="toc-heading" xml:lang="ru">
     <subject>ИНФОРМАТИКА, ВЫЧИСЛИТЕЛЬНАЯ ТЕХНИКА И УПРАВЛЕНИЕ</subject>
    </subj-group>
    <subj-group subj-group-type="toc-heading" xml:lang="en">
     <subject>INFORMATICS, COMPUTER ENGINEERING AND CONTROL</subject>
    </subj-group>
    <subj-group>
     <subject>ИНФОРМАТИКА, ВЫЧИСЛИТЕЛЬНАЯ ТЕХНИКА И УПРАВЛЕНИЕ</subject>
    </subj-group>
   </article-categories>
   <title-group>
    <article-title xml:lang="en">ALGORITHM FOR SUPPORTING INDIVIDUAL KNOWLEDGE TESTING BASED ON A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM</article-title>
    <trans-title-group xml:lang="ru">
     <trans-title>АЛГОРИТМ ПОДДЕРЖКИ ИНДИВИДУАЛЬНОГО ТЕСТИРОВАНИЯ ЗНАНИЙ НА ОСНОВЕ СИСТЕМ ГЕНЕРАТИВНОГО ИСКУССТВЕННОГО ИНТЕЛЛЕКТА</trans-title>
    </trans-title-group>
   </title-group>
   <contrib-group content-type="authors">
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Коцюба</surname>
       <given-names>Игорь Юрьевич</given-names>
      </name>
      <name xml:lang="en">
       <surname>Kotsyuba</surname>
       <given-names>Igor Yurievich</given-names>
      </name>
     </name-alternatives>
     <email>ikotciuba@itmo.ru</email>
     <bio xml:lang="ru">
       <p>кандидат технических наук</p>
     </bio>
     <bio xml:lang="en">
       <p>candidate of technical sciences</p>
     </bio>
     <xref ref-type="aff" rid="aff-1"/>
    </contrib>
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Лайок</surname>
       <given-names>Олег Владимирович</given-names>
      </name>
      <name xml:lang="en">
       <surname>Layok</surname>
       <given-names>Oleg Vladimirovich</given-names>
      </name>
     </name-alternatives>
     <email>laolvl@mail.ru</email>
     <xref ref-type="aff" rid="aff-2"/>
    </contrib>
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Валдайцева</surname>
       <given-names>Мария Викторовна</given-names>
      </name>
      <name xml:lang="en">
       <surname>Valdayceva</surname>
       <given-names>Mariya Viktorovna</given-names>
      </name>
     </name-alternatives>
     <email>mvvaldaitceva@itmo.ru</email>
     <bio xml:lang="ru">
       <p>кандидат технических наук</p>
     </bio>
     <bio xml:lang="en">
       <p>candidate of technical sciences</p>
     </bio>
     <xref ref-type="aff" rid="aff-3"/>
    </contrib>
   </contrib-group>
   <aff-alternatives id="aff-1">
    <aff>
     <institution xml:lang="ru">Санкт-Петербургский национальный исследовательский университет информационных технологий, механики и оптики</institution>
    </aff>
    <aff>
     <institution xml:lang="en">Saint-Petersburg National Research University of Information Technologies, Mechanics and Optics</institution>
    </aff>
   </aff-alternatives>
   <aff-alternatives id="aff-2">
    <aff>
     <institution xml:lang="ru">Санкт-Петербургский национальный исследовательский университет информационных технологий, механики и оптики</institution>
     <city>Санкт-Петербург</city>
     <country>Россия</country>
    </aff>
    <aff>
     <institution xml:lang="en">ITMO university</institution>
     <city>Saint-Petersburg</city>
     <country>Russian Federation</country>
    </aff>
   </aff-alternatives>
   <aff-alternatives id="aff-3">
    <aff>
     <institution xml:lang="ru">Санкт-Петербургский национальный исследовательский университет информационных технологий, механики и оптики</institution>
     <city>Санкт-Петербург</city>
     <country>Россия</country>
    </aff>
    <aff>
     <institution xml:lang="en">ITMO university</institution>
     <city>Saint-Petersburg</city>
     <country>Russian Federation</country>
    </aff>
   </aff-alternatives>
   <pub-date publication-format="print" date-type="pub" iso-8601-date="2026-04-11T00:00:00+03:00">
    <day>11</day>
    <month>04</month>
    <year>2026</year>
   </pub-date>
   <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2026-04-11T00:00:00+03:00">
    <day>11</day>
    <month>04</month>
    <year>2026</year>
   </pub-date>
   <volume>2026</volume>
   <issue>1</issue>
   <fpage>30</fpage>
   <lpage>42</lpage>
   <history>
    <date date-type="received" iso-8601-date="2026-01-12T00:00:00+03:00">
     <day>12</day>
     <month>01</month>
     <year>2026</year>
    </date>
    <date date-type="accepted" iso-8601-date="2026-03-25T00:00:00+03:00">
     <day>25</day>
     <month>03</month>
     <year>2026</year>
    </date>
   </history>
   <self-uri xlink:href="https://journals.igps.ru/en/nauka/article/120408/view">https://journals.igps.ru/en/nauka/article/120408/view</self-uri>
   <abstract xml:lang="ru">
    <p>Рассмотрен алгоритм автоматической генерации тематических тестов на примере тестов по английскому языку с использованием метода контрфактного анализа для повышения их качества на базе мобильного приложения. В ходе детального анализа предметной области языкового тестирования были выстроены четкие требования к будущему сервису, классифицированы ключевые форматы контроля знаний с описанием типовых упражнений и уровней сложности, на которых они применяются, что помогло собрать целостную картину навыков, требующих автоматизированной проверки. Выделены сложные точки существующих тестов: двусмысленные формулировки, множественность корректных ответов, трудоёмкий подбор. Разработан и апробирован комплексный подход к оценке эффективности промптов для генерации грамматических тестов на базе больших языковых моделей. В качестве ядра предложен контрфактный алгоритм, позволяющий выявлять латентные признаки, реально влияющие на выбор грамматических структур модели, точечно модифицировать промпт и оценивать изменения по трём взаимодополняющим метрикам. Применение алгоритма показало, что добавление явных указаний на самые значимые скрытые признаки повышает восприимчивость модели к ключевым факторам задания. Дальнейшая переоценка качества по разработанным метрикам и независимая экспертная проверка подтвердили статистически значимый прирост (p &lt; 0,01) как в грамматическом соответствии, так и в соответствии структуре заданий: средняя оценка повысилась с 0,91 до 0,95. Таким образом, контрфактный анализ действительно является эффективным инструментом тонкой настройки промптов; предложенный улучшенный промпт обеспечивает более надёжную генерацию тестовых материалов, соответствующих образовательным стандартам, и закладывает основу для масштабирования алгоритма на другие типы заданий и языковые навыки.</p>
   </abstract>
   <trans-abstract xml:lang="en">
    <p>The paper presents an algorithm for the automatic generation of thematic tests, illustrated with English language tests, that applies the counterfactual analysis method to improve their quality within a mobile application. A detailed analysis of the language testing domain yielded clear requirements for the future service. Key knowledge assessment formats were classified, along with typical exercises and the difficulty levels at which they are used, which helped to build a comprehensive picture of the skills requiring automated assessment. The weak points of existing tests are highlighted: ambiguous wording, multiple correct answers, and labor-intensive item selection. The paper develops and validates a comprehensive approach to assessing the effectiveness of prompts for generating grammar tests with large language models. At its core is a counterfactual algorithm that identifies the latent features actually influencing the model's choice of grammatical structures, selectively modifies the prompt, and evaluates the changes using three complementary metrics. Applying the algorithm showed that adding explicit indications of the most significant hidden features increases the model's sensitivity to the key factors of the task. Subsequent re-evaluation against the developed metrics and an independent expert review confirmed a statistically significant improvement (p &lt; 0.01) in both grammatical compliance and compliance with the task structure: the average score increased from 0.91 to 0.95. Thus, counterfactual analysis is indeed an effective tool for fine-tuning prompts; the proposed improved prompt provides more reliable generation of test materials that meet educational standards and lays the foundation for scaling the algorithm to other task types and language skills.</p>
   </trans-abstract>
   <kwd-group xml:lang="ru">
    <kwd>качество образования</kwd>
    <kwd>искусственный интеллект</kwd>
    <kwd>Large Language Models</kwd>
    <kwd>промпт</kwd>
    <kwd>контрфактный анализ</kwd>
    <kwd>латентные признаки</kwd>
    <kwd>грамматический тест</kwd>
    <kwd>контрфактный алгоритм</kwd>
    <kwd>восприимчивость модели</kwd>
    <kwd>генерация тестов</kwd>
   </kwd-group>
   <kwd-group xml:lang="en">
    <kwd>quality of education</kwd>
    <kwd>artificial intelligence</kwd>
    <kwd>Large Language Models</kwd>
    <kwd>prompt</kwd>
    <kwd>counterfactual analysis</kwd>
    <kwd>latent features</kwd>
    <kwd>grammar test</kwd>
    <kwd>counterfactual algorithm</kwd>
    <kwd>model sensitivity</kwd>
    <kwd>test generation</kwd>
   </kwd-group>
  </article-meta>
 </front>
 <body>
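  <sec>
   <title>Editorial sketch: counterfactual prompt refinement</title>
   <p>As a minimal illustration of the procedure summarized in the abstract, the Python sketch below probes one candidate latent feature of the prompt at a time, measures its counterfactual effect on aggregate test quality, and folds the most influential features back into the prompt. The model call, the quality metric, the feature list, and all identifiers are assumed stand-ins for this sketch rather than the authors' implementation; the numeric constants merely echo the abstract's 0.91/0.95 scale and are not real data.</p>
   <code language="python"><![CDATA[
import random
from statistics import mean

rng = random.Random(0)  # reproducible randomness for this stub

# Candidate latent prompt features to probe (illustrative assumptions).
FEATURES = {
    "cefr_level": "Target CEFR level: B2.",
    "distractors": "Each item must have exactly three plausible distractors.",
    "one_structure": "Test exactly one grammatical structure per item.",
}

BASE_PROMPT = ("Generate a 10-item multiple-choice English grammar test "
               "on Past Simple versus Present Perfect.")

def generate_test(prompt: str) -> str:
    """Stand-in for the LLM call; a real system would query the model here."""
    return f"<test #{rng.randrange(10**6)} from a {len(prompt)}-char prompt>"

def quality(test: str, prompt: str) -> float:
    """Stand-in for the aggregate of the three complementary metrics
    (grammatical compliance, task-structure compliance, expert score).
    The constants only mimic the abstract's scale; they are not real data."""
    bonus = 0.02 * sum(text in prompt for text in FEATURES.values())
    return min(1.0, rng.gauss(0.91 + bonus, 0.01))

def mean_quality(prompt: str, n: int = 30) -> float:
    """Average quality over n simulated generations from one prompt."""
    return mean(quality(generate_test(prompt), prompt) for _ in range(n))

def feature_effect(feature_text: str) -> float:
    """Counterfactual probe: change in mean quality when exactly one feature
    is made explicit while the rest of the prompt is held fixed."""
    return (mean_quality(BASE_PROMPT + " " + feature_text)
            - mean_quality(BASE_PROMPT))

# Rank features by counterfactual effect and keep the helpful ones.
effects = sorted(((feature_effect(text), name, text)
                  for name, text in FEATURES.items()), reverse=True)
improved_prompt = BASE_PROMPT + " " + " ".join(
    text for effect, _, text in effects if effect > 0)

print(f"baseline quality: {mean_quality(BASE_PROMPT):.3f}")
print(f"improved quality: {mean_quality(improved_prompt):.3f}")
]]></code>
   <p>In this toy run the improved prompt scores higher only because the stubbed metric rewards explicit features; in the paper's setting the same loop is driven by the real model and the three metrics plus expert review.</p>
  </sec>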
 </body>
 <back>
  <ref-list>
   <ref id="B1">
    <label>1.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education / T.K.F. Chiu [et al.] // Computers and Education: Artificial Intelligence. 2023. Vol. 4. P. 100118. DOI: 10.1016/j.caeai.2022.100070</mixed-citation>
     <mixed-citation xml:lang="en">Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education / T.K.F. Chiu [et al.] // Computers and Education: Artificial Intelligence. 2023. Vol. 4. P. 100118. DOI: 10.1016/j.caeai.2022.100070</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B2">
    <label>2.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Kalyan K.S., Rajasekharan A., Sangeetha S. AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing // arXiv preprint. 2021. DOI: 10.48550/arXiv.2108.05542</mixed-citation>
     <mixed-citation xml:lang="en">Kalyan K.S., Rajasekharan A., Sangeetha S. AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing // arXiv preprint. 2021. DOI: 10.48550/arXiv.2108.05542</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B3">
    <label>3.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Training language models to follow instructions with human feedback / L. Ouyang [et al.] // arXiv preprint. 2022. DOI:10.48550/arXiv.2203.02155</mixed-citation>
     <mixed-citation xml:lang="en">Training language models to follow instructions with human feedback / L. Ouyang [et al.] // arXiv preprint. 2022. DOI:10.48550/arXiv.2203.02155</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B4">
    <label>4.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Language Models are Few-Shot Learners / T.B. Brown [et al.] // arXiv preprint. 2020. DOI: 10.48550/arXiv:2005.14165</mixed-citation>
     <mixed-citation xml:lang="en">Language Models are Few-Shot Learners / T.B. Brown [et al.] // arXiv preprint. 2020. DOI: 10.48550/arXiv:2005.14165</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B5">
    <label>5.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">GPT-3 family: Diverse applications of a large language model / T.B. Brown [et al.] // arXiv preprint. 2021. DOI: 10.48550/arXiv:2105.14208</mixed-citation>
     <mixed-citation xml:lang="en">GPT-3 family: Diverse applications of a large language model / T.B. Brown [et al.] // arXiv preprint. 2021. DOI: 10.48550/arXiv:2105.14208</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B6">
    <label>6.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Text-davinci: A large language model for diverse and creative text generation / A. Radford [et al.] // arXiv preprint. 2022. DOI: 10.48550/arXiv:2201.12136</mixed-citation>
     <mixed-citation xml:lang="en">Text-davinci: A large language model for diverse and creative text generation / A. Radford [et al.] // arXiv preprint. 2022. DOI: 10.48550/arXiv:2201.12136</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B7">
    <label>7.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education / E. Kasneci [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2304.11208</mixed-citation>
     <mixed-citation xml:lang="en">ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education / E. Kasneci [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2304.11208</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B8">
    <label>8.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges / Q. Li [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2401.08664</mixed-citation>
     <mixed-citation xml:lang="en">Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges / Q. Li [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2401.08664</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B9">
    <label>9.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review / L. Yan [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2303.13379</mixed-citation>
     <mixed-citation xml:lang="en">Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review / L. Yan [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2303.13379</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B10">
    <label>10.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Nitze A. Future-proofing Education: A Prototype for Simulating Oral Examinations Using Large Language Models // arXiv preprint. 2023. DOI: 10.48550/arXiv:2401.06160</mixed-citation>
     <mixed-citation xml:lang="en">Nitze A. Future-proofing Education: A Prototype for Simulating Oral Examinations Using Large Language Models // arXiv preprint. 2023. DOI: 10.48550/arXiv:2401.06160</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B11">
    <label>11.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Peng L., Nuchged B., Gao Y. Spoken Language Intelligence of Large Language Models for Language Learning // arXiv preprint. 2023. DOI: 10.48550/arXiv:2308.14536</mixed-citation>
     <mixed-citation xml:lang="en">Peng L., Nuchged B., Gao Y. Spoken Language Intelligence of Large Language Models for Language Learning // arXiv preprint. 2023. DOI: 10.48550/arXiv:2308.14536</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B12">
    <label>12.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Wang K., Ramos J., Lawrence R. ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education // arXiv preprint. 2023. DOI: 10.48550/arXiv:2401.00052</mixed-citation>
     <mixed-citation xml:lang="en">Wang K., Ramos J., Lawrence R. ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education // arXiv preprint. 2023. DOI: 10.48550/arXiv:2401.00052</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B13">
    <label>13.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Castleman B., Turkcan M.K. Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors // arXiv preprint. 2023. DOI: 10.48550/arXiv:2309.12367</mixed-citation>
     <mixed-citation xml:lang="en">Castleman B., Turkcan M.K. Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors // arXiv preprint. 2023. DOI: 10.48550/arXiv:2309.12367</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B14">
    <label>14.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Large Language Models in Education: Vision and Opportunities / W. Gan [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2311.13160</mixed-citation>
     <mixed-citation xml:lang="en">Large Language Models in Education: Vision and Opportunities / W. Gan [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2311.13160</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B15">
    <label>15.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Challenges and Opportunities of Generative AI for Higher Education as Explained by ChatGPT / R. Michel-Villarreal [et al.] // Education Sciences. 2023. Vol. 13. № 9. P. 856. DOI: 10.3390/educsci13090856</mixed-citation>
     <mixed-citation xml:lang="en">Challenges and Opportunities of Generative AI for Higher Education as Explained by ChatGPT / R. Michel-Villarreal [et al.] // Education Sciences. 2023. Vol. 13. № 9. P. 856. DOI: 10.3390/educsci13090856</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B16">
    <label>16.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">A systematic survey of prompt engineering in large language models: Techniques and applications / P. Sahoo [et al.] // arXiv preprint. 2024. DOI: 10.48550/arXiv:2402.07927</mixed-citation>
     <mixed-citation xml:lang="en">A systematic survey of prompt engineering in large language models: Techniques and applications / P. Sahoo [et al.] // arXiv preprint. 2024. DOI: 10.48550/arXiv:2402.07927</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B17">
    <label>17.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Luo H., Specia L. From understanding to utilization: A survey on explainability for large language models // arXiv preprint. 2024. DOI: 10.48550/arXiv:2309.01029</mixed-citation>
     <mixed-citation xml:lang="en">Luo H., Specia L. From understanding to utilization: A survey on explainability for large language models // arXiv preprint. 2024. DOI: 10.48550/arXiv:2309.01029</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B18">
    <label>18.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions / S. Wu [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2309.01029</mixed-citation>
     <mixed-citation xml:lang="en">Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions / S. Wu [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2309.01029</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B19">
    <label>19.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Larger language models do in-context learning differently / J. Wei [et al.] // arXiv preprint. 2024. DOI: 10.48550/arXiv:2405.19592</mixed-citation>
     <mixed-citation xml:lang="en">Larger language models do in-context learning differently / J. Wei [et al.] // arXiv preprint. 2024. DOI: 10.48550/arXiv:2405.19592</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B20">
    <label>20.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Madsen A., Chandar S., Reddy S. Can Large Language Models Explain Themselves? // arXiv preprint. 2024. DOI: 10.48550/arXiv:2401.07927</mixed-citation>
     <mixed-citation xml:lang="en">Madsen A., Chandar S., Reddy S. Can Large Language Models Explain Themselves? // arXiv preprint. 2024. DOI: 10.48550/arXiv:2401.07927</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B21">
    <label>21.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers? / A. Bhattacharjee [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2309.13340</mixed-citation>
     <mixed-citation xml:lang="en">LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers? / A. Bhattacharjee [et al.] // arXiv preprint. 2023. DOI: 10.48550/arXiv:2309.13340</mixed-citation>
    </citation-alternatives>
   </ref>
  </ref-list>
 </back>
</article>
