This article addresses the problem of finding copies of source code that duplicate a given sample. To this end, existing approaches to code clone detection based on textual, lexical, syntactic, metric, and semantic analysis are reviewed. Based on a comparison of these approaches against a set of criteria, a new duplicate search method built on the random walk algorithm is proposed. The essence of the method is to build graphs of two source codes (where the nodes are the tokens of the text, and the edges are the links between them), to which the random walk algorithm is then applied; the method is described in pseudocode. An experiment is carried out to evaluate the performance of the method using the following metrics: the Jaccard coefficient, the differences in the number of edges, the number of vertices, and the average clustering of the graphs, the shortest path between their vertices, and the similarity between the graphs. The experimental scenarios consist of calculating the metrics for combinations of two source code instances, their union, and a portion of one of them. Conclusions are drawn regarding the applicability of both the method itself and each of the evaluation metrics.

information security, duplicate search, random walk, source code
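
The idea summarized in the abstract (token graphs of two source codes, a random walk over each, and a Jaccard-style comparison) can be illustrated with a minimal Python sketch. It is not the pseudocode of the article. It rests on assumptions the abstract does not state: that a "link" joins two tokens adjacent in the text, and that two walks are compared by the Jaccard coefficient of the edges they visit. All function names (`tokenize`, `build_token_graph`, `random_walk_edges`, `jaccard`) are hypothetical.

```python
import random
import re
from collections import defaultdict


def tokenize(source):
    """Split source text into identifiers, numbers and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def build_token_graph(source):
    """Undirected graph: nodes are tokens, edges join adjacent tokens.

    Assumption: adjacency in the text stands in for the "links"
    between tokens, which the abstract does not define.
    """
    tokens = tokenize(source)
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops
            graph[left].add(right)
            graph[right].add(left)
    return graph


def random_walk_edges(graph, steps, seed=0):
    """Walk `steps` random steps and return the set of edges visited."""
    if not graph:
        return set()
    rng = random.Random(seed)  # fixed seed keeps the walk reproducible
    node = rng.choice(sorted(graph))
    visited = set()
    for _ in range(steps):
        neighbours = sorted(graph[node])
        if not neighbours:
            break
        nxt = rng.choice(neighbours)
        visited.add(frozenset((node, nxt)))
        node = nxt
    return visited


def jaccard(a, b):
    """Jaccard coefficient of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    code_a = "total = price * count + tax"
    code_b = "total = price * amount + tax"
    walk_a = random_walk_edges(build_token_graph(code_a), steps=200)
    walk_b = random_walk_edges(build_token_graph(code_b), steps=200)
    print("similarity:", jaccard(walk_a, walk_b))
```

A fixed seed is used only so that repeated runs are comparable; in a real evaluation the score would presumably be averaged over many walks.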
