The New Semrush Backlink Database: Bigger, Better, Faster

2024-12-25 13:20

Although the Backlink Analytics tool is one of the oldest features of Semrush, we have to admit that it may have been the weaker link of our SEO toolkit. We knew we had to step up our game, so about a year and a half ago, we set out to change the status quo.

Semrush, while being a well-rounded toolkit for digital marketers, has always had a soft spot for SEO. Helping people drive organic traffic from search engines to their content has been one of our most important goals since our inception.

This goal led us to become a world-renowned SEO Suite, allowing us to win multiple industry awards over the years.

SEO is tricky, as it involves complex and intertwined moving parts. To get the top rankings, you have to nail down every single step on-site and off-site. Throughout this entire process, we strive to provide our users with the best solutions.

To stay praiseworthy, we are continuously working on improving our toolkit. Today, we are proud to share with you our latest breakthrough:

We needed a major improvement in the quality of our backlink data. There was no workaround; only a complete overhaul of our data-gathering process would do. To focus on our end goal, we put the development of all other backlink features on hold and made a long list of things that would improve the backlink data we deliver to clients.

The path was clear, and all we had to do was cross off the items on our list.

We won’t bore you with the technical details of our backlink database’s overhaul, but here is a quick rundown of what was done:

Crawler. After carefully examining the drawbacks and limits of the existing architecture, we decided to rewrite our crawler from scratch. And so we did, designing an entirely new approach to data gathering.

Crawling queue. The first tests of our new crawler revealed that its request queue was not properly handling the amount of data it was now collecting. We tried solving this by simply increasing hardware capabilities, but it was not good enough, so we developed a more efficient crawling queue.

Seeding. To provide our crawler with a quality initial seeding, we queued up all the URLs from Google’s Top 100 for 450 million keywords from our Organic Research tool; this ensured that our database was relevant from the ground up.

Storage. Increased data collection obviously demands more storage space — we had to quadruple our server size.

To find out exactly where we are as a backlink provider, we decided to measure ourselves against the best: Majestic, Ahrefs, and Moz.

We will explain the methodology in a second. First, let’s assess our development progress during the past six months. Looking at the relationship between the four top SEO tools, you can see that we have made a giant step forward.

It was not easy to figure out a methodology that would be both clear-cut and fair.

You can always find domains that show your backlink tool in a good light. That is why we decided to use a random set of 100 domains (out of 100,000) for each month to show how the contestants performed over the past six months.
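As a rough illustration of this kind of sampling, here is a minimal Python sketch. It is not the code used for the study: the function name `monthly_sample`, the placeholder domain names, and the fixed seed are all assumptions made for the example.

```python
import random

def monthly_sample(pool, size=100, seed=None):
    """Draw `size` distinct domains uniformly at random from `pool`."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(pool, size)

# Placeholder pool standing in for the 100,000 candidate domains (hypothetical names).
pool = [f"domain{i}.example" for i in range(100_000)]

sample = monthly_sample(pool, seed=2024)
print(len(sample), len(set(sample)))
```

Drawing without replacement guarantees that no domain appears twice in a month's sample.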

We were looking at the number of referring domains and the total number of backlinks each contestant had for the 100 domains.

Next, for each domain in the test sample, we computed the ratio of the Semrush result to each competitor's result. A ratio below 1 means the Semrush database has less information for the test domain; a ratio above 1 shows by how many times the Semrush result exceeds the competitor's.

To get the final score, we calculated the median of all results.
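The scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `median_ratio` and the sample counts are made up, and skipping domains where the competitor reports zero links is a choice made for this sketch, not something described in the study.

```python
from statistics import median

def median_ratio(semrush_counts, competitor_counts):
    """Median of per-domain ratios (Semrush result / competitor result).

    Domains where the competitor reports zero are skipped, because the
    ratio is undefined there (an assumption made for this sketch).
    """
    ratios = [
        s / c
        for s, c in zip(semrush_counts, competitor_counts)
        if c > 0
    ]
    return median(ratios)

# Made-up referring-domain counts for five test domains.
semrush = [120, 80, 300, 45, 10]
competitor = [100, 100, 150, 45, 20]

score = median_ratio(semrush, competitor)
print(score)
```

With these made-up numbers the per-domain ratios are 1.2, 0.8, 2.0, 1.0 and 0.5, so the median score is 1.0: the two indexes come out level on the typical domain.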

As expected, we’ve had a lot of feedback on this post, and we would like to thank you all for your responses!

For the most part, the community was very supportive, and one of the first people to give us kudos was Aleyda Solis.

This was followed by a barrage of positive messages, with Gregg Lee putting the cherry on top — Brian Dean has checked and approved our database growth.

Of course, we’ve also had a good share of criticism. Russ Jones claimed to prove us wrong with his own research.

After a bit of back and forth, he revised his conclusion. But we still have to disagree with it.

To quote Russ: “Comparing link indexes accurately is no easy endeavor.”

That’s completely true. First off, getting a truly random sample of domains is an important and very complex part of a quality backlink index comparison. We really appreciate the methodology that Russ presented in his article; it is quite a helpful piece. Yet we cannot agree with the way he assesses and compares the indexes themselves.

The method he uses only shows the likelihood of one index having more data than the other. It does not reveal how much more data there is, which means the method cannot be used for the real comparison (if your goal is to find out which index has more data).

The following are examples illustrating why.

Example 1:

Let’s say we have backlink indexes for Contestant 1 (C1) and Contestant 2 (C2).

The comparison for a sample of 12 domains shows that C2 wins every time:

According to Russ’s approach, C2 is the absolute champion. But in reality, the difference between indexes is 0.1%, which means that the C2 and C1 indexes are basically equal.

Example 2:

This time, let’s say, the comparison shows that out of a sample of 12 domains, C2 has 9 wins, and C1 has 3 wins:

Once again, according to Russ’s approach, C2 here is 3 times bigger than C1. But by looking at the actual index sizes, you can see that for 75% of sample domains the indexes are almost equal (0.1% difference), and for 25%, C1 has a complete victory (C2 has no data). Overall, C1 in this example has a better backlink index.

These examples are extreme, but they do illustrate the flaws of the approach. Without knowing how much data there actually is, you cannot claim that one backlink index is more useful for SEO than another.
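To make Example 2 concrete, the short Python sketch below counts wins per domain and then totals the links. The counts are hypothetical numbers chosen only to match the shape of the example (a 0.1% lead for C2 on 9 domains, no C2 data on 3 domains); they are not taken from any real index.

```python
def wins(c1, c2):
    """Count the domains on which each index reports more links."""
    c1_wins = sum(a > b for a, b in zip(c1, c2))
    c2_wins = sum(b > a for a, b in zip(c1, c2))
    return c1_wins, c2_wins

# Hypothetical counts shaped like Example 2: on 9 domains C2 leads by 0.1%,
# on 3 domains C2 has no data at all.
c1 = [1000] * 12
c2 = [1001] * 9 + [0] * 3

c1_wins, c2_wins = wins(c1, c2)
print(c1_wins, c2_wins)   # win counts favour C2
print(sum(c1), sum(c2))   # total links favour C1
```

C2 wins 9 of the 12 comparisons, yet C1 holds 12,000 links in total against 9,009 for C2. The win count alone hides how much data each index actually has.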

Our comparison method acknowledges this, as we were calculating the median using the actual numbers of referring domains and backlinks.

Russ kindly shared the sample used in his research so that we could verify it ourselves. The results turned out to be very similar to those presented in our research.

In terms of the number of backlinks, this graph shows a drastically different picture from what Russ presented in his research.

We also took the first domain (amotherthing.com) from Russ’s sample and ran it through the interfaces of both Semrush and Moz.

The numbers proved to be different from those presented by Russ.

Russ’s research:
Semrush: 28469 backlinks
Moz: 404078 backlinks

Tool interfaces:
Semrush: 37.7k backlinks
Moz: 26.8k backlinks

Anyway, we wanted to thank Russ for his time and ideas, as we believe that healthy competition is a good incentive for us and the industry as a whole.

We have made a huge leap forward with our backlink database, and it feels great to look at the numbers and pat ourselves on the back, but, obviously, it is not just about the numbers.

The quantity of data does not necessarily translate into quality, and we are making a great effort to ensure that our database stays fresh and useful.

Now that we have a new data-gathering process, we will build upon it, designing new features and capabilities that will make our tools even stronger. Stay tuned for more exciting news!