How does an Internet search engine work? The Yandex source code leak offers some clues


Some media reported last week that a source code repository for the Yandex search engine had allegedly been stolen by a former employee of the Russian tech company and leaked as a torrent on a popular hacking forum.

The leak consists of 44.7 GB of files allegedly stolen from the company in July 2022. These repositories supposedly contain all of the company’s source code, plus anti-spam rules. In a statement, Yandex said that an initial investigation showed that the leaked code “appears to be old snippets that differ from the current version in the company repository.”

Security researcher Arseniy Shestakov claims that the exposed files date back to February 2022, coinciding with the Russian invasion of Ukraine. According to Shestakov, the leaked files include the source code for a number of services but do not contain sensitive user data.

As noted, the leaked repository contains only code. Other important parts, such as the model weights for the neural networks, are missing, so on its own the leak is almost useless for running the search engine. Still, there are many interesting files, with names like “blacklist.txt”, that could shed light on services currently in operation.


Even so, the leak is unprecedented. Although a lot of data and context is still missing, for the first time it is possible to see the inner workings of a major search engine. That gives us a chance to dig into how Yandex works, at least as it stands today, and to explain, if you don’t know it yet, what this search engine is.

What is Yandex?

Let’s start with the basics before getting into the leak itself. Yandex is a Russian technology company best known for its search engine of the same name. According to Statcounter, in July 2020 Yandex had a 39.6% market share in Russia, compared to Google’s 57.9%.

Yandex works like other search engines such as Google or Bing: you type a query, press Enter, and a list of results appears. According to one study, Yandex generates 52% of web traffic in Russia. It also grew in popularity in 2017, when Android phones in Russia stopped using Google as the default search engine.


Of course, although it works like most conventional search engines, there are some key differences that set Yandex apart from competitors like Google.

For example, Yandex places more emphasis than Google on local SEO and regionality and performs geodependent searches that only show websites from a specific region. This means that people in different places will be shown different results for the same search term.
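To make the idea of geo-dependent search concrete, here is a minimal sketch, assuming a toy index in which every site carries a single region label. The `Result` class, the `rank` function, the `geo_dependent` flag and the example domains are all hypothetical and are not taken from the leaked Yandex code; the point is only that the same query returns different results depending on where the user is.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    region: str   # hypothetical region label attached to each indexed site
    score: float  # relevance score computed elsewhere


def rank(results: list[Result], user_region: str, geo_dependent: bool) -> list[Result]:
    """Sort results by score; for geo-dependent queries, keep only sites
    from the user's region (a simplified model, not Yandex's actual logic)."""
    if geo_dependent:
        results = [r for r in results if r.region == user_region]
    return sorted(results, key=lambda r: r.score, reverse=True)


index = [
    Result("plumber-moscow.example", "moscow", 0.7),
    Result("plumber-spb.example", "spb", 0.9),
    Result("plumbing-wiki.example", "global", 0.8),
]

# Same query, two users in different cities, two different result lists.
print([r.url for r in rank(index, "moscow", geo_dependent=True)])
print([r.url for r in rank(index, "spb", geo_dependent=True)])
```

A query such as “plumber” would be treated as geo-dependent, so a user in Moscow sees only the Moscow site while a user in Saint Petersburg sees only the local one; with `geo_dependent=False` all three sites are ranked by score alone.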


On the other hand, user behavior and dwell time are key ranking factors for Yandex. Google also takes them into account, but they are critical for ranking well in Yandex.

Finally, while link building is still important, it is more about driving relevant traffic to your site than about demonstrating its authority or reliability. Domain age and creation date play a bigger role in Yandex rankings, which is why you may find a lot of outdated content in its results.

Some conclusions drawn from the Yandex code leak

Below are some of the conclusions that caught our attention most. If you want to read the full article, we have linked the original Search Engine source.

First, the search engine has anti-SEO upper limits for some ranking factors, and 39 of these ranking factors are among the initially weighted factors that can prevent a page from being included in the initial list of results.

Other search engines, such as Bing, do something similar: Bing, for example, treats abusive use of meta keywords as a negative factor. Yandex, however, appears to go much further.
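The idea of an anti-SEO upper limit can be illustrated with a minimal sketch. It assumes two kinds of limits: a soft cap, above which a factor stops adding to the score, and a hard limit, above which the page is dropped from the initial list altogether. The factor names, weights and thresholds below are hypothetical and chosen only for illustration; the leak does not tell us that Yandex combines factors this way.

```python
# Hypothetical factor names, weights and limits, for illustration only.
WEIGHTS = {"text_relevance": 2.0, "keyword_density": 10.0}
SOFT_CAPS = {"keyword_density": 0.05}    # values above this add nothing more
HARD_LIMITS = {"keyword_density": 0.20}  # values above this exclude the page


def score(factors: dict[str, float]) -> float:
    """Weighted sum of factors, with capped factors clamped to their upper limit."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = factors.get(name, 0.0)
        if name in SOFT_CAPS:
            value = min(value, SOFT_CAPS[name])
        total += weight * value
    return total


def initial_list(pages: dict[str, dict[str, float]]) -> list[str]:
    """Drop pages that break a hard limit, then rank the rest by capped score."""
    eligible = {
        url: f
        for url, f in pages.items()
        if all(f.get(name, 0.0) <= limit for name, limit in HARD_LIMITS.items())
    }
    return sorted(eligible, key=lambda url: score(eligible[url]), reverse=True)


pages = {
    "a.example": {"text_relevance": 0.8, "keyword_density": 0.03},
    "b.example": {"text_relevance": 0.8, "keyword_density": 0.12},  # stuffed, but capped
    "c.example": {"text_relevance": 0.9, "keyword_density": 0.30},  # over the hard limit
}
print(initial_list(pages))
```

Here `b.example` still edges out `a.example`, but its extra keywords earn it only as much as the cap allows (a score of 2.1 instead of 2.8), while `c.example`, despite the best text relevance, never makes it into the initial list.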


The code also suggests that some parameters get a bigger lift from the ranking algorithm than others, a practice known as “boosting”.

For example, the analysis mentions that smaller files make it into the index and, most strikingly, that Yandex applies a boost that skews its results toward certain news organizations, improving the positions of the outlets it favors.
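One simple way a source-level boost could work is a per-domain multiplier applied on top of a base relevance score. The sketch below assumes exactly that; the `SOURCE_BOOSTS` table, the multipliers and the domains are hypothetical and only illustrate how such a boost can reorder results, not how Yandex actually implements it.

```python
from urllib.parse import urlparse

# Hypothetical favoured sources and boost multipliers (illustrative only).
SOURCE_BOOSTS = {"news-a.example": 1.5, "news-b.example": 1.2}


def boosted_ranking(results: list[tuple[str, float]]) -> list[str]:
    """Multiply each result's base score by its source's boost, then sort."""

    def final_score(item: tuple[str, float]) -> float:
        url, base = item
        host = urlparse(url).hostname or ""
        return base * SOURCE_BOOSTS.get(host, 1.0)  # unboosted sources keep 1.0

    return [url for url, _ in sorted(results, key=final_score, reverse=True)]


results = [
    ("https://independent.example/story", 0.9),
    ("https://news-a.example/story", 0.7),
    ("https://news-b.example/story", 0.6),
]
print(boosted_ranking(results))
```

Although the independent outlet has the highest base score (0.9), the 1.5× multiplier lifts `news-a.example` to about 1.05 and puts it first, which is the kind of skew the analysis describes.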

Racist code in the Yandex code base?

Another striking aspect of the leak is the apparent use of racist language in the codebase. Those who have reviewed it have found racial slurs throughout the code leaked from Yandex.

To understand how this can happen, keep in mind that programmers often use specific terms or names so that other developers can tell what a given function or line of code does.

This makes it quicker to find the relevant code when it has to be modified or updated. In this case, Yandex developers appear to have replaced a generic term for a feature with offensive language.

It is not clear exactly why these terms were included. However, using offensive language in code violates good practice and, as Yandex pointed out in its statement, goes against its code of ethics.

Yandex did not provide any further explanation of why these slurs were used, but those who have examined the code noted that one of them also appeared to be used in place of “workers” in various parts of the codebase. All in all, the leak is a real curiosity for anyone interested in Yandex and in search engines in general.

