PROBLEM ISSUES IN USING LARGE LANGUAGE MODELS FOR DECOMPILATION OF MACHINE CODE WITH VULNERABILITIES

Konstantin Izrailov

doi:10.61260/2218-130X-2025-4-72-81

Journal: SCIENTIFIC AND ANALYTICAL JOURNAL «VESTNIK SAINT-PETERSBURG UNIVERSITY OF STATE FIRE SERVICE OF EMERCOM OF RUSSIA», Volume 2025, № 4, 2025

Rubrics: INFORMATICS, COMPUTER ENGINEERING AND CONTROL

UDC 004.04

Konstantin Izrailov ¹

Author and publication information

Authors:

1. Saint-Petersburg university of State fire service of EMERCOM of Russia (department of applied mathematics and information technology security, professor)

Russian Federation

Type: Article

DOI: https://doi.org/10.61260/2218-130X-2025-4-72-81

Pages: from 72 to 81

Status: Published

Received: 24.10.2025

Accepted: 23.11.2025

Published: 24.12.2025

Subject area: UDC 004.04

Language: Russian

Keywords: software security, vulnerabilities, reverse engineering, decompilation, artificial intelligence, problem issues

Abstract and keywords

Abstract:
This paper examines the problem of software vulnerabilities in the absence of source code. One way to counter them is to decompile the machine (executable) code of programs. The paper considers the application of a relatively new technology, large language models, to the task of restoring pseudo-source code suitable for detecting and eliminating vulnerabilities. It identifies problematic issues in the subject area: the incompleteness of datasets for rare processor architectures, the lack of a guarantee that the recovered source code is equivalent to the given machine code, the sanitization of the recovered source code, in which vulnerabilities are fixed during recovery, hallucinations in the code, and the difficulty of restoring obfuscated (including optimized) code. To substantiate and demonstrate the essence of each problematic issue, a practical example of decompiling assembly-code functions with the widely used large language model DeepSeek-V3.2 is provided. The negative impact of these problematic issues on the final neutralization of vulnerabilities is also indicated.
