Russian Federation
This paper examines the problem of software vulnerabilities when source code is unavailable. One way to counter them is to decompile the machine (executable) code of programs. The paper considers applying a relatively new technology, large language models, to the task of recovering pseudo-source code suitable for detecting and eliminating vulnerabilities. It identifies open problems in this area: incomplete datasets for rare processor architectures, the lack of any guarantee that the recovered source code is equivalent to the given machine code, sanitization of the recovered source code (the model fixes vulnerabilities that are present in the original), hallucinations in the code, and the difficulty of recovering obfuscated (including optimized) code. To substantiate and illustrate each problem, the paper provides a practical example of decompiling assembly-code functions with the widely used large language model DeepSeek-V3.2. The negative effect of each problem on the eventual neutralization of vulnerabilities is also indicated.
software security, vulnerabilities, reverse engineering, decompilation, artificial intelligence, open problems



