The web has always battled spam. Every reader here has likely dealt with the onslaught of email spam, and many have also dealt with the scourge of link spam.

At first glance, it’s a mild annoyance, a part of everyday life online. But over my years at Moz and my prior years as a consultant, I’ve come to understand that the underlying motivations for spam are economic — and that economy is booming.

Why do authority metrics get manipulated?

When Google finally retired Toolbar PageRank in 2016, it left a vacuum for webmasters who skirt Google’s quality guidelines.

PageRank was the scale upon which nearly every link acquisition was judged. With PageRank gone, webmasters flocked to the next free metrics: Moz’s Domain Authority (DA) and Page Authority (PA). Today, it’s common practice to see links and domains priced based on their DA or PA score.

Unfortunately, as the market shifted from PageRank to Domain Authority, so did the attempts to manipulate the metric with spam. Admittedly, these attempts were largely effective. The original Domain Authority was built using a simple process:

  • Download a large number of search results for a random set of keywords
  • Use a machine learning algorithm to predict those search result rankings from link metrics
  • Place all sites on a 0–100 scale using the output of that machine-learned model

This seemed like the obvious way to build a metric that predicts rankings, but the methodology had a flaw, or at least a limitation.
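
To make that three-step process concrete, here is a rough sketch of what such a pipeline could look like in Python. The feature names, the sample data, the model choice, and the 0–100 scaling are illustrative assumptions, not the production system.

```python
# Sketch of the original approach: fit a model that predicts SERP position
# from link metrics, then rescale its scores onto a 0-100 range.
# Features, numbers, and model choice are placeholders for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training rows: one per (keyword, result) pair.
# Columns: linking root domains, total links, followed linking domains.
X = np.array([
    [1200, 45000, 300],
    [  80,  2000,  40],
    [  15,   300,   9],
])
y = np.array([1, 4, 9])   # observed SERP position for each row

model = GradientBoostingRegressor().fit(X, y)

def authority_score(features, model):
    """Map raw model output onto a 0-100 scale (better predicted rank = higher score)."""
    preds = -model.predict(features)              # negate so "better" is larger
    lo, hi = preds.min(), preds.max()
    return 100 * (preds - lo) / (hi - lo + 1e-9)

print(authority_score(X, model))
```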

How and why we changed Domain Authority to root out spam

Imagine you wanted to figure out what it takes to become an NBA All-Star. You take the stats of all the players that year and the list of those who made the All-Star team, and then you use a machine learning model to predict which players will make the team in the future.

Your model would very quickly learn that scoring points and racking up assists and rebounds improve your chances of making the All-Star team.

However, what it wouldn’t tell you is that height is a major component. Why? Because everyone in the training set is already tall. But of course, no one under 5 feet tall is going to become an NBA All-Star.

So what would happen if we applied that machine learning model to the average person on the street, rather than to a professional player?

The algorithm would miss this glaring predictor and perform far more poorly than it would had it known that height matters.

Similarly, while Domain Authority did a good job of predicting rankings among sites that already rank, it was at a loss with sites whose bizarre link profiles weren’t aimed at ranking at all. This left Domain Authority vulnerable to manipulation.
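
To see why this matters, here’s a toy illustration in Python: if every player in the training data is the same height, a model cannot learn that height matters, and it will happily hand an All-Star prediction to someone a foot too short. All numbers are made up purely for illustration.

```python
# Toy illustration of training-set bias: a feature that never varies in the
# training data (height) looks useless to the model, even though it matters
# enormously outside that sample.
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: points per game, height in inches. Every training row is 79" tall.
X_train = np.array([[28, 79], [25, 79], [18, 79], [12, 79], [8, 79]])
y_train = np.array([1, 1, 0, 0, 0])     # 1 = made the All-Star team

model = LinearRegression().fit(X_train, y_train)
print(model.coef_)                       # the height coefficient comes out ~0

# Apply the model to a short but high-scoring person off the street:
print(model.predict(np.array([[28, 60]])))   # predicted like an All-Star; height is ignored
```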

New methods to power a more effective metric

Members of the data science and engineering teams at Moz came together to create a new training set and new variables to address this very problem.

The first step involved modifying the training set so that, for a certain percentage of search engine results pages (SERPs), it also included sites that rank for no terms at all, appended in last place. This would allow the neural network to learn not only how to compare sites that already rank, but how to properly devalue sites that don’t rank for any keyword at all.
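
As a rough sketch of that idea (not the actual training code; the sampling rate, data shapes, and domain names are assumptions), appending a never-ranking domain to the bottom of a fraction of SERPs might look like this:

```python
# Sketch: for some fraction of SERPs, append a domain that ranks for nothing
# in last place, so the model has examples of sites it should score low.
import random

def augment_serps(serps, non_ranking_domains, rate=0.25, seed=0):
    """serps: list of lists of domains, ordered best to worst. Returns augmented copies."""
    rng = random.Random(seed)
    augmented = []
    for serp in serps:
        serp = list(serp)
        if non_ranking_domains and rng.random() < rate:
            # Place a never-ranking domain in the very last position.
            serp.append(rng.choice(non_ranking_domains))
        augmented.append(serp)
    return augmented

serps = [["site-a.com", "site-b.com", "site-c.com"],
         ["site-d.com", "site-e.com"]]
print(augment_serps(serps, ["spammy-network.example", "parked-domain.example"]))
```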

The second step involved dramatically improving the variables from which the neural network could learn.

Historically, we used largely raw, singular metrics, like the number of linking root domains pointing to your site, to power the Domain Authority model. However, with the release of our massive new link index, Link Explorer, we were able to incorporate far more complex variables, like link distributions across various categories.

Let me give an example or two. The number of links you have from websites that get more than 100 visits a month might be a useful metric in creating Domain Authority.

However, it’s a singular metric with no context and no standard to compare it against. What if, instead, we mapped all of a domain’s links into categories based on how much traffic each linking domain gets? As it turns out, this is an excellent signal for detecting certain types of link spam, giant link networks in particular.

As you can plainly see, the spam site gets most of its links from sites that just aren’t visited by anyone. By contrast, the model site receives a much healthier proportion of its links from sites that are well trafficked.
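
For illustration, computing such a traffic-bucket distribution for a domain’s links might look like the following sketch. The bucket edges and example numbers are assumptions for illustration, not the values behind the chart above.

```python
# Sketch of a distribution feature: instead of one raw link count, bucket a
# domain's links by the monthly traffic of each linking domain and report the
# share of links falling in each bucket.
from bisect import bisect_right

BUCKET_EDGES = [0, 100, 1_000, 10_000, 100_000]   # monthly visits (illustrative)

def traffic_distribution(linking_domain_visits):
    """Return the share of links whose linking domain falls in each traffic bucket."""
    counts = [0] * len(BUCKET_EDGES)
    for visits in linking_domain_visits:
        counts[bisect_right(BUCKET_EDGES, visits) - 1] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# A link-network-style profile vs. a naturally linked site (made-up numbers):
print(traffic_distribution([5, 12, 0, 3, 40, 7]))               # almost all unvisited linkers
print(traffic_distribution([250, 8_000, 90_000, 1_500, 600]))   # healthier spread
```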

Of course, this isn’t the only distribution available to the neural network model; we considered other distributions as well, like one based on Moz’s proprietary Spam Score.

Link manipulators lost between 15% and 98% of their Domain Authority

So, how did our new model and variables fare in devaluing link spam compared to the previous DA?

The graph above depicts how Domain Authority was affected, on average, in moving from DA 1.0 to DA 2.0. The first two columns in the bar chart are for random domains and customers. We saw an average drop of 6% for random domains, representing a recentering of the metric. The results for known link manipulation, however, were quite different.

  • Sites which purchase links saw an average drop of over 15%
  • Auction domains dropped between 61% and 98%, depending on the quality
  • Comment spammers lost a third of their Domain Authority
  • Link sellers lost more than half their Domain Authority
  • Link networks and domainer networks lost over 70% and 90% of their Domain Authority, respectively

In short, the link and domain economy built on Domain Authority was thoroughly culled, decimating inventories of link-inflated sites.

The new Domain Authority effectively does its part to cleanse the web of the kind of spam that causes real harm. More reliable than ever before, it’s also better at predicting rankings and weeding out overt manipulation.

We see a bright future for Domain Authority and look forward to rolling out Page Authority 2.0 in the future, as well.