Google's Spam Problem Worsens Amid AI Shift | Marpo

Spam has made a significant resurgence in Google's search results, reaching an unprecedented scale that many experts believe Google is struggling to manage. Despite past efforts to combat manipulative tactics, the search giant's algorithms appear to be inadvertently rewarding the very abuses they were designed to prevent.

A few years ago, Google seemed to be gaining the upper hand against spam, with updates like the Helpful Content update penalizing low-quality sites. The prospect of being hit by a spam update, together with Google's apparent commitment to search quality, deterred many would-be spammers. However, the landscape has shifted dramatically. The rise of artificial intelligence is haphazardly rewriting the rules, and major tech companies like Google are prioritizing other, more costly endeavors, leaving white hat SEOs at a disadvantage.

How Google's Spam Detection System Works

Google's primary spam detection system is known as SpamBrain. Historically, Google introduced algorithms such as Panda (content quality), Penguin (link spam), and RankBrain (machine-learning query interpretation) to improve how it evaluated content, links, and keywords. SpamBrain is designed to identify spammy content and websites with what Google describes as "shocking" accuracy.

Over time, the algorithm learns to distinguish legitimate content from spam, building up the signals associated with problematic sites in a neural network. Similar to the concept of seed sites, SpamBrain maps out known spammy websites and scores other sites by how closely they resemble them. It analyzes content, link, behavioral, and reputational signals at scale to group sites. The process can be summarized as:

Inputs: Content, linking, reputational, and behavioral signals.
Hidden layer: Clustering and comparing each site to known spam sites.
Outputs: Classifying a site as spam or not spam.
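
To make the three stages concrete, here is a toy nearest-neighbour classifier. It is a minimal sketch under stated assumptions, not how SpamBrain is actually built: the four-number signal vectors, the example domains, the Euclidean distance measure, and the `max_distance` cutoff are all hypothetical.

```python
import math

# Hypothetical per-site signals: (content, links, reputational, behavioral),
# each scaled 0..1 where higher means more spam-like. Illustrative values only.
KNOWN_SPAM_SITES = {
    "spam-a.example": (0.9, 0.8, 0.7, 0.9),
    "spam-b.example": (0.8, 0.9, 0.9, 0.7),
}

def classify(signals, max_distance=0.3):
    """Hidden layer: measure the Euclidean distance from this site to every
    known spam site. Output: label it spam if the nearest one is close enough."""
    nearest = min(math.dist(signals, ref) for ref in KNOWN_SPAM_SITES.values())
    label = "spam" if nearest <= max_distance else "not spam"
    return label, round(nearest, 3)

print(classify((0.85, 0.85, 0.8, 0.8)))  # label: spam (clusters with known spam)
print(classify((0.1, 0.2, 0.1, 0.1)))    # label: not spam (far from every reference)
```

A distance measure is used rather than cosine similarity because cosine ignores magnitude, so a uniformly low-risk site could look "similar" to a uniformly high-risk one.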

If a website is grouped with obviously spammy sites based on these signals, it indicates a significant problem. The algorithm operates on thresholds, meaning sites must often engage in questionable practices for an extended period before being penalized. However, sites with thin, low-value content, combined with dangerous links, poor business decisions (like parasite SEO), and scaled content abuse, are particularly vulnerable.
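
The threshold behavior can be sketched as a persistence rule. This is a hypothetical illustration: the per-period scores, the 0.7 threshold, and the three-period requirement are assumptions, because Google has not published how its thresholds work.

```python
def periods_until_penalty(scores, threshold=0.7, min_periods=3):
    """Return the index of the period at which a penalty would trigger, i.e. the
    first time the score has stayed at or above `threshold` for `min_periods`
    consecutive periods. Return None if that never happens.
    Threshold and period count are hypothetical values."""
    streak = 0
    for i, score in enumerate(scores):
        streak = streak + 1 if score >= threshold else 0
        if streak >= min_periods:
            return i
    return None

# A brief spike is tolerated; sustained abuse eventually crosses the line.
print(periods_until_penalty([0.2, 0.9, 0.3, 0.8, 0.75, 0.9, 0.95]))  # 5
print(periods_until_penalty([0.2, 0.9, 0.3, 0.4]))                   # None
```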

Prevalent Types of Spam

Google outlines various egregious spam activities, many of which are interconnected and currently thriving:

Cloaking: Presenting different content to users and search engines.
Doorway abuse: Creating multiple pages optimized for specific keywords to funnel users to a single destination.
Expired domain abuse: Acquiring old, authoritative domains to leverage their existing link equity for new, often unrelated, content.
Hacked content: Spam injected into compromised websites.
Hidden text and content: Text or links concealed from users but visible to search engines.
Keyword stuffing: Overloading pages with keywords in an attempt to manipulate rankings.
Link spam: Manipulating inbound links to boost rankings.
Scaled content abuse: Mass production of low-quality content, often AI-generated.
Site reputation abuse: Hosting low-quality, third-party content on reputable sites to benefit from their authority.
Thin affiliate content: Affiliate pages with minimal original content.
UGC spam: Spam in user-generated content like comments or forum posts.

While some tactics like keyword stuffing are less common, link spam and scaled content abuse are at an all-time high. The strategy often involves spreading content across multiple semantically similar websites, using exact and partial match anchors to direct authority to "money" pages, thereby increasing profitability.
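
The anchor mix described above can be measured with a simple bucketing function. The categories (exact, partial, branded, other), the matching rules, and the example keyword and brand `Acme` are hypothetical simplifications. SEO tools and search engines each use their own definitions.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def classify_anchor(anchor, target_keyword, brand):
    """Bucket one anchor text relative to a money page's target keyword.
    Simplified, hypothetical rules: exact = same words in the same order;
    branded = mentions the brand; partial = shares at least half the keyword's words."""
    anchor_words = tokens(anchor)
    keyword_words = tokens(target_keyword)
    if anchor_words == keyword_words:
        return "exact"
    if brand.lower() in anchor.lower():
        return "branded"
    shared = set(anchor_words) & set(keyword_words)
    if 2 * len(shared) >= len(set(keyword_words)):
        return "partial"
    return "other"

anchors = ["best betting sites", "Best Betting Sites!", "top betting sites uk",
           "Acme", "click here for the best deals"]
profile = Counter(classify_anchor(a, "best betting sites", "Acme") for a in anchors)
print(profile)
```

A profile dominated by exact and partial matches pointing at one commercial page is the pattern this article associates with link spam.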

Fake News and Scaled Content Abuse

Google Discover, the platform designed for engagement, has recently been exploited by spammers. Instances of fake, AI-driven content reaching a mass audience are numerous, even appearing on legacy media sites. These spammers understand how to incite emotions, often spreading misinformation about topics like state pension age or free bus passes.

Simultaneously, scaled content abuse, largely fueled by AI, is rampant. Estimates suggest that over 50% of internet content is now AI-generated, with some analyses indicating that 74% of new content contains AI elements. Award-winning journalist Jean-Marc Manach's research identified over 8,300 AI-generated news websites in French and more than 300 in English, suggesting a much larger underlying problem. These tactics are reportedly making some site owners millionaires: they leverage authoritative expired domains and Private Blog Networks (PBNs) to game the system by faking clicks, manipulating engagement, and exploiting past link equity.

Expired Domain Abuse and PBNs

Expired domain abuse is a cornerstone of black hat SEO. It involves purchasing valuable expired domains with strong, clean backlink histories (free from manual penalties). These domains are then used to build Private Blog Networks (PBNs).

A PBN is a network of websites controlled by one individual or entity and designed to link back to a "money site" (the site intended to generate revenue). To avoid detection, each site within a PBN needs to appear independent, with its own hosting provider, nameservers, and IP address. This lets black hat SEOs build significant link equity and fabricated topical authority while mitigating the risk associated with any individual expired domain.
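
The need for separate infrastructure follows from how easily shared infrastructure can be spotted. The sketch below uses a small union-find to group domains that share an IP address or nameserver, either directly or through a chain of other domains. The domains and records are hypothetical, with IPs taken from reserved documentation ranges. Real footprint analysis would draw on many more signals.

```python
from collections import defaultdict

# Hypothetical infrastructure records for domains linking to one money site.
SITES = {
    "old-charity.org.uk": {"ip": "203.0.113.10", "ns": "ns1.host-a.example"},
    "garden-tips.example": {"ip": "203.0.113.10", "ns": "ns1.host-b.example"},
    "local-news.example": {"ip": "198.51.100.7", "ns": "ns1.host-b.example"},
    "independent.example": {"ip": "192.0.2.44", "ns": "ns1.host-c.example"},
}

def footprint_clusters(sites):
    """Group domains that share any IP or nameserver, directly or transitively.
    Clusters with more than one domain are shared-infrastructure footprints."""
    parent = {d: d for d in sites}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    first_seen = {}  # (record type, value) -> first domain seen with it
    for domain, record in sites.items():
        for key in (("ip", record["ip"]), ("ns", record["ns"])):
            if key in first_seen:
                parent[find(domain)] = find(first_seen[key])  # merge the groups
            else:
                first_seen[key] = domain

    groups = defaultdict(set)
    for d in sites:
        groups[find(d)].add(d)
    return [g for g in groups.values() if len(g) > 1]

print(footprint_clusters(SITES))
```

Here `local-news.example` shares no IP with `old-charity.org.uk`, yet still lands in the same cluster through `garden-tips.example`. That transitive linkage is why every PBN site needs fully separate infrastructure.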

The process often involves:

Acquiring expired, valuable domains with strong backlink profiles.
Creating a PBN with unique hosting, nameservers, and IP addresses across various authoritative, aged, and newer domains.
Establishing these domains as equity strongholds.
Spinning up multiple TLD variations (e.g., .org.uk instead of .com).
Adding a mix of exact and partial match anchors from the PBN to the money site to signal its new focus.
Implementing 301 redirects or canonicalization to pass equity to the money site.
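
Redirect chains like these are straightforward to trace. The following sketch follows 301 redirects to each URL's final destination and totals the referring links that end up there. Canonical hints are treated the same way, which is a simplification. Counting links is a crude, hypothetical proxy for link equity, and the URLs are made up.

```python
def resolve(url, redirects, max_hops=10):
    """Follow a chain of 301 redirects (or canonical hints) to its final URL.
    `redirects` maps a source URL to the URL it points at (hypothetical data)."""
    seen = [url]
    while url in redirects:
        url = redirects[url]
        if url in seen or len(seen) > max_hops:
            raise ValueError(f"redirect loop or chain too long: {seen + [url]}")
        seen.append(url)
    return url

def equity_by_destination(link_counts, redirects):
    """Sum referring-link counts by the URL each page finally resolves to."""
    totals = {}
    for url, links in link_counts.items():
        dest = resolve(url, redirects)
        totals[dest] = totals.get(dest, 0) + links
    return totals

REDIRECTS = {
    "https://old-charity.org.uk/": "https://old-charity.org.uk/casino-reviews/",
    "https://old-charity.org.uk/casino-reviews/": "https://money-site.example/casino/",
}
link_counts = {
    "https://old-charity.org.uk/": 1200,         # the aged domain's inherited links
    "https://money-site.example/casino/": 15,    # links earned by the money page itself
}
print(equity_by_destination(link_counts, REDIRECTS))
```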

These scams are typically short-term but can yield substantial profits, sometimes hundreds of thousands of pounds monthly, particularly in lucrative niches like betting and crypto. The strategy often involves acquiring old charity domains, quickly rebranding them, and using redirects to funnel authority to commercial pages.

According to notorious black hat practitioner Charles Floate, some companies are laundering hundreds of thousands of pounds a month through these tactics.

Protecting the primary asset (the purchased aged or expired domain) is crucial. Instead of directly linking to the money site, spammers often link to sites that, in turn, link to the money site, indirectly boosting its value while shielding it from Google's direct scrutiny.
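
The effect of tiering can be illustrated with the classic PageRank formula. It is used here purely as a stand-in because Google's current link valuation is not public. In this hypothetical graph, no PBN domain links to the money site directly, yet the money site still ends up with the highest score because equity flows through the intermediary tier.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Classic power-iteration PageRank. `graph` maps each page to the pages it
    links to. Pages with no outlinks spread their score evenly across all pages."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in graph.items():
            for t in targets:
                new[t] += damping * rank[node] / len(targets)
        rank = new
    return rank

# Hypothetical tiered network: PBN sites -> intermediary sites -> money site.
GRAPH = {
    "pbn-1": ["tier2-a"], "pbn-2": ["tier2-a"], "pbn-3": ["tier2-b"],
    "tier2-a": ["money-site"], "tier2-b": ["money-site"],
}
ranks = pagerank(GRAPH)
print(max(ranks, key=ranks.get))  # money-site
```

This simplified model ignores anchor text, spam filtering, and every other signal discussed in this article. It only shows how indirect links can still concentrate authority on one page.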

Google Leak Insights on Spam Detection

Analysis of data from the recent Google Leak, while an inexact science, offers insights into Google's spam detection priorities. By examining module names and descriptions related to "spam," approximately 115 relevant entries can be categorized into content, links, reputational, and behavioral signals. Further classification reveals a strong focus on link-related factors, particularly anchor text.

Key modules include:

spambrainTotalDocSpamScore: Calculates a document's overall spam score.
IndexingDocjoinerAnchorPhraseSpamInfo and IndexingDocjoinerAnchorSpamInfo modules: Identify spammy anchor phrases by analyzing link velocity, discovery dates, and the duration of link spikes.
GeostoreSourceTrustProto: Helps evaluate the trustworthiness of a source.

The primary takeaway is the significant importance Google places on links for spam detection, especially anchor text. The velocity at which links are acquired, along with the text and surrounding content, serves as a major indicator. A sudden spike in exact match anchors pointing to highly commercial pages is a red flag. Once a site is flagged for content or link-related abuse, SpamBrain further analyzes behavioral and reputational signals. If these corroborate, and the site exceeds certain thresholds, it faces penalties.
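
Here is a minimal sketch of velocity-based spike detection. It assumes you have daily counts of newly discovered exact match anchors for one page. The leak descriptions cited above mention velocity, discovery dates, and spike duration, but not how they are computed. The trailing seven-day baseline, the three-standard-deviation rule, and the floor of 1.0 on the deviation are therefore hypothetical choices.

```python
from statistics import mean, pstdev

def find_spikes(daily_counts, window=7, z=3.0):
    """Return spikes as (start_day, duration_in_days) tuples.

    A day is flagged when its count exceeds the mean of the last `window`
    non-spike days by more than `z` standard deviations. Flagged days are kept
    out of the baseline so a multi-day spike does not mask itself."""
    baseline = []  # counts from recent days that were not flagged
    flagged = []
    for day, count in enumerate(daily_counts):
        if len(baseline) >= window:
            history = baseline[-window:]
            # Floor the deviation at 1.0 so a perfectly flat history does not
            # flag a change of a single link.
            sigma = max(pstdev(history), 1.0)
            if count > mean(history) + z * sigma:
                flagged.append(day)
                continue
        baseline.append(count)

    # Merge consecutive flagged days into (start, duration) runs.
    spikes = []
    for day in flagged:
        if spikes and day == spikes[-1][0] + spikes[-1][1]:
            start, length = spikes[-1]
            spikes[-1] = (start, length + 1)
        else:
            spikes.append((day, 1))
    return spikes

counts = [1, 0, 2, 1, 1, 0, 1, 40, 45, 38, 1, 0, 2]
print(find_spikes(counts))  # [(7, 3)]
```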

Underinvestment in Traditional Search

A key factor contributing to the worsening spam problem is Google's apparent shift in investment priorities. As Martin McGarry noted, Google seems less focused on traditional search, having "bigger, more hallucinogenic fish to fry" in the realm of AI. In 2025, Google released four updates with a combined rollout of roughly 70 days, compared with seven updates totaling almost 130 days in 2024, which suggests fewer resources dedicated to core search quality.

The search experience is evolving, with Google rolling out preferred publisher sources globally and integrating inline links more effectively into its AI products. Initiatives like Google Web Guide, which combines a personalized mix of trusted sources, AI Mode, and a classic search interface, indicate a move towards a more curated, engagement-driven feed. Unconfirmed reports also suggest Google is implementing persona-driven recommendation signals and a private publisher entity layer for Discover, which helps content go viral by grouping users into cohorts.

However, the economics of this shift are significant. Traditional "ten blue links" search is inexpensive to run, relying on static, retrieval-based indexing. In contrast, AI-driven search experiences are vastly more costly.

The High Cost of AI Searches

Google is projected to spend an additional $10 billion this year due to rising demand for cloud services, bringing its capital expenditure (CAPEX) close to double 2024's $52.5 billion. This investment, part of a broader Silicon Valley trend, highlights the financial strain of the AI race.

While Google hasn't publicly disclosed exact figures, it's widely accepted that AI searches are significantly more expensive than traditional searches. A classic search is largely static and retrieval-based, serving pre-indexed links at a low operational cost. An AI Overview, however, is generative, requiring a large language model to summarize and produce natural language answers. AI Mode, with its multi-turn conversational interface, processes the entire dialogue in addition to new queries, demanding substantially more computational power through techniques like query fan-out, where dozens of searches run in parallel. Custom chips, efficiencies, and caching can mitigate these costs, but it remains one of Google's biggest challenges. This economic reality is likely why some experts believe AI Mode won't become the default search experience, especially for branded or navigational searches where it would be an enormous waste of resources.
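
Query fan-out, and the reason caching helps, can be sketched in a few lines. The `retrieve` function is a hypothetical stand-in for an index lookup. The fixed list of facets replaces the model-generated sub-queries a real system would produce. For brevity, the example uses six sub-queries rather than dozens.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
import threading

backend_calls = 0              # how many times the (simulated) index is actually hit
_calls_lock = threading.Lock()

@lru_cache(maxsize=1024)
def retrieve(query):
    """Hypothetical stand-in for one index lookup; cache hits skip this body."""
    global backend_calls
    with _calls_lock:
        backend_calls += 1
    return f"results for {query!r}"

def fan_out(query, facets):
    """Expand one question into sub-queries and run them concurrently."""
    sub_queries = [query] + [f"{query} {facet}" for facet in facets]
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(retrieve, sub_queries))

facets = ["age", "amount", "eligibility", "how to claim", "latest news"]
answers = fan_out("state pension", facets)
print(len(answers), backend_calls)  # 6 6: one question became six lookups
fan_out("state pension", facets)    # identical follow-up question
print(backend_calls)                # 6: served entirely from the cache
```

A classic search would make a single lookup for the same question, and the generative summarization step would add further cost on top of the retrieval shown here.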

According to The IET, if each of London's more than 9.7 million residents asked ChatGPT to write a 100-word email, cooling the servers would require 4,874,000 liters of water, enough to fill more than seven 25m swimming pools.
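
The IET figures imply the following per-email and per-pool amounts. This is simple arithmetic on the numbers quoted above, not additional data. Because the population is slightly above 9.7 million and the total fills "over" seven pools, both results are upper bounds.

```latex
\frac{4{,}874{,}000\ \text{L}}{9{,}700{,}000\ \text{emails}} \approx 0.50\ \text{L per email}
\qquad
\frac{4{,}874{,}000\ \text{L}}{7\ \text{pools}} \approx 696{,}000\ \text{L} \approx 696\ \text{m}^3\ \text{per pool}
```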

LLMs and Their Own Spam Problem

Large Language Models (LLMs) are themselves susceptible to spam. They learn, at least in part, from the sheer volume of mentions in their training data, ingesting information and accepting it at face value. This means that a descriptive line in a website footer, even a spammy one, can be taken as fact by an LLM. Low-quality, manipulative tactics can therefore be more effective than genuine marketing efforts.
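
A deliberately crude toy model shows why repetition can win. It is not how LLMs are trained. It simply counts how many pages carry each claim about a hypothetical brand, `Acme`, and picks the most frequent one. The claims and page counts are invented for illustration.

```python
from collections import Counter

def naive_consensus(pages):
    """Toy model: return the claim that appears on the most pages, with its count.
    Each claim is counted once per page. Real LLM training is far more complex;
    this only shows how raw repetition can outweigh a few independent sources."""
    counts = Counter(claim for page in pages for claim in set(page))
    return counts.most_common(1)[0]

footer = "Acme is the most trusted broker in the UK"  # hypothetical self-published claim
independent = "Acme has mixed customer reviews"       # hypothetical third-party claim

corpus = [[footer] for _ in range(500)]       # the same footer on every page of a 500-page site
corpus += [[independent] for _ in range(12)]  # twelve independent articles
print(naive_consensus(corpus))  # ('Acme is the most trusted broker in the UK', 500)
```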

Basic SEO strategies reminiscent of 2012 are making a comeback in the LLM context, including "best" lists, paid placements, and reciprocal link exchanges. If a tactic is "half-arsed," it seems to be regaining traction. Because these models rely on Google's index for queries they cannot confidently answer on their own (a technique known as retrieval-augmented generation, or RAG), the effectiveness of Google's spam engine is more critical than ever. Just as publishers need to take a stand against big tech and AI, Google must seriously address its escalating spam problem.
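
The dependency can be sketched as a retrieval step that only grounds an answer in documents the index considers clean. Everything here is hypothetical: the `Doc` type, the per-document `spam_score`, and the confidence and spam cut-offs are assumptions, not any vendor's API. The point is that the filter is only as good as the spam scores it receives.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    url: str
    text: str
    spam_score: float  # hypothetical 0..1 score supplied by the search index

def build_context(answer_confidence, retrieved, confidence_floor=0.6,
                  spam_ceiling=0.5, k=3):
    """Decide whether to ground an answer in retrieved documents, and which ones.
    A confident model answers from its own parameters (no retrieval); otherwise
    it takes the top-k documents whose spam score is below the ceiling."""
    if answer_confidence >= confidence_floor:
        return []
    clean = [d for d in retrieved if d.spam_score < spam_ceiling]
    return clean[:k]

results = [
    Doc("https://expired-domain.example/pension-news", "Pension age scrapped!", 0.9),
    Doc("https://gov.example/state-pension-age", "Official guidance", 0.05),
    Doc("https://pbn-site.example/pension", "Spun article", 0.2),  # spam the index missed
]
context = build_context(answer_confidence=0.3, retrieved=results)
print([d.url for d in context])
```

The PBN page passes the filter because its (hypothetical) spam score is wrong. When the index misjudges a page, the error flows straight into the generated answer.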

The Future Outlook

The immediate future for combating spam in search remains uncertain. OpenAI has signed extraordinary contracts, yet its revenue is far from where it needs to be, and Google's CAPEX is soaring. In this financially driven environment, quality and accuracy may not be top priorities. Consumer and investor confidence is not particularly high, and companies are under pressure to generate revenue.

According to HSBC, OpenAI needs to raise at least $207 billion by 2030 to continue operating at a loss, leading to descriptions of it as "a money pit with a website on top." This financial pressure means new funding is primarily directed towards data centers, potentially diverting resources from core search quality and spam fighting. The challenge for Google and other tech giants will be to rationalize these investments while maintaining the integrity of their search results.

This post was originally published on Leadership in SEO.
