Where AI Brand Visibility Breaks at the Consensus Layer
Summary
Most people would guess that if a URL gets cited by one major AI engine, it has a reasonable shot at appearing in the others. But the data says otherwise: only about 2% of cited URLs appear in all three major engines for the same query. The practical question is what this changes for SEO, content quality, and AI search visibility.
For most of us working in digital content or SEO, there is a lingering anxiety about the "black box" of AI. We spend hours tweaking content for AI Engine Optimization (AEO), hoping to secure a spot in the citations of a chatbot or an AI overview. The assumption has always been that if you "win" in one major AI engine, you've likely cracked the code for the others. We treat AI visibility as a single, monolithic goal.
But the reality is far more fragmented. When we look at the data, it becomes clear that appearing in one AI engine doesn't guarantee, and often doesn't even suggest, that you'll appear in another. There is a massive "consensus gap" that most of our current dashboards are simply smoothing over with averages.
Understanding this gap is critical because it changes how we measure success. If the engines aren't agreeing on who the authorities are, then a blended visibility score is a vanity metric. We need to stop asking if we are "visible" and start asking if our visibility is "portable."
Only 2% Of URLs Get Cited By All Three Engines
It feels intuitive to think that ChatGPT, Perplexity, and Google AI Overviews are all drawing from a similar pool of high authority sources and simply ranking them in a slightly different order. If a page is authoritative enough for one, it should be authoritative enough for all.
The data tells a different story. In a sample of 20,000 prompts, only 2.37% of the cited URLs appeared across all three engines for the same query. To put that in perspective, 91.07% of citations appeared in only one engine.
These two numbers are the most important part of the equation. The fact that the vast majority of citations are exclusive to a single platform suggests that these AI engines are not just ranking the same pool of content differently. They are drawing from largely disjoint pools of information.
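To make the two headline rates concrete, here is a minimal sketch of how they could be computed from a raw citation log. It assumes each record is a hypothetical (prompt_id, engine, url) tuple and that the unit being counted is a URL cited for a given prompt; the engine labels and the function name are illustrative, not the schema used in the analysis.

```python
from collections import defaultdict

# Hypothetical engine labels; replace with whatever your tracking tool emits.
ENGINES = ("chatgpt", "perplexity", "google_aio")


def overlap_rates(citations, engines=ENGINES):
    """Return the universal and exclusive citation rates.

    citations: iterable of (prompt_id, engine, url) tuples.
    A (prompt_id, url) pair is "universal" when every engine cited it
    for that prompt, and "exclusive" when exactly one engine did.
    """
    engine_set = set(engines)
    cited_by = defaultdict(set)
    for prompt_id, engine, url in citations:
        if engine in engine_set:  # ignore engines we are not comparing
            cited_by[(prompt_id, url)].add(engine)

    total = len(cited_by)
    if total == 0:
        return {"universal": 0.0, "exclusive": 0.0}

    universal = sum(1 for s in cited_by.values() if len(s) == len(engine_set))
    exclusive = sum(1 for s in cited_by.values() if len(s) == 1)
    return {"universal": universal / total, "exclusive": exclusive / total}
```

Counting per prompt rather than per URL overall is a deliberate choice here: it matches the "same query" framing above, so a URL cited by different engines for different prompts does not count as consensus.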
For anyone managing a brand's presence, this is a wake up call. If you rely on a single composite visibility score, you are likely hiding a strategic failure. A brand might look dominant in an aggregate report while being completely invisible in two of the three major engines. We aren't dealing with one leaderboard; we are dealing with three different distribution systems that rarely agree. A useful companion piece is "Improve Your Brand’s LLM Visibility," which looks at a nearby part of the same system. The same pattern also shows up in "AI Recommendation Sets Leave Some Brands Out."
The 2% Holds Across Every Cut
One might wonder if this fragmentation is just a temporary glitch or a result of a specific set of queries. However, the consistency of this gap suggests it is structural rather than incidental. Across four different data samples, the overlap rate remained stubbornly low, hovering around 2% while the exclusive rate stayed near 91%.
Looking at the timeline, we see a very slight trend toward convergence, but nothing that fundamentally changes the landscape. In Q3 2025, the universal overlap was 2.2%. By Q4 2025 and Q1 2026, that number rose slightly to 2.7%. During that same period, engine exclusive citations dipped from 90.1% to roughly 88%.
While a small amount of convergence is happening, fragmentation is still the dominant state of the ecosystem. The "consensus gap" isn't a fluke of the data; it's a characteristic of how these LLMs and retrieval systems are built. They have different training sets, different retrieval mechanisms, and different definitions of what constitutes a "reliable" source.
Commercial Prompts Don't Converge Either
There is a common instinct in SEO that high intent, commercial queries should produce more consensus. The logic is that for a query like "best CRM" or "best running shoes," the pool of acceptable, authoritative sources is much narrower than it is for a broad informational query. We assume that the "industry leaders" will be recognized by every engine.
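Surprisingly, the data doesn't support this. Commercial prompts showed a universal overlap of 2.4%, while informational prompts showed 2.0%. A gap of 0.4 percentage points is negligible when, in both cases, more than 97% of cited URLs still fail to appear in all three engines.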
Even when the answer set should theoretically be narrow, the engines still choose different sources the majority of the time. This suggests that the retrieval logic, the internal "trust" mechanism of each AI, outweighs the general industry consensus of authority. Even in the most competitive commercial spaces, each engine is playing by its own set of rules.
Guides Beat Homepages
If we look at which types of pages actually "travel" across engines, a clear pattern emerges. Content that explains, teaches, or compares performs better than brand centric or transactional pages.
The breakdown of cross engine overlap by page type looks like this:
Guides and Tutorials: 2.3%
Blogs: 1.8%
Category Pages: 1.6%
Product Pages: 1.2%
Homepages: 1.1%
The takeaway here is that helpful, explanatory content is more portable than brand assets: guides overlap at roughly twice the rate of homepages (2.3% versus 1.1%). If your goal is to appear across multiple AI platforms, your best bet isn't to optimize your homepage or your product landing pages; it's to create deep, utility driven guides.
However, it is important to keep the absolute numbers in mind. Even the "best" performing page type, guides, only has a 2.3% overlap. This isn't a signal to simply "publish more guides" in hopes of universal dominance. Rather, it's a reminder that utility is the primary currency of AI citations. Helpful content travels better than brand content, but it still faces an uphill battle against fragmentation.
Visibility Is Not The Same As Portability
This is perhaps the most critical distinction for operators to make. We often confuse citation frequency (how often you appear) with citation portability (how often you appear across different engines).
Wikipedia is the perfect example of this paradox. In the dataset, Wikipedia appears 16,073 times. It is incredibly visible. Yet, only 1.3% of those appearances are universal across all three engines. Reddit fares even worse; despite appearing 14,267 times, its universal overlap is a mere 0.1%. Reuters, with 1,202 appearances, had a 0.0% universal overlap.
This reveals a dangerous blind spot in aggregate dashboards. A domain can appear to be dominant because it has a high volume of citations in one engine, but it may have zero portability. If you are heavily reliant on a single platform's citation habits, you are one algorithm update away from total invisibility.
Presence tells you that you are visible. Portability tells you that your visibility is resilient.
What This Means For Operators
The practical conclusion is that we need to stop treating AI visibility as a single metric. A blended AEO score is too abstract to be useful. Instead, I suggest measuring your domain's health through three distinct lenses:
1. Presence: This is the percentage of your tracked prompts where your domain appears in any engine. This is your baseline: it tells you if you are in the game at all.
2. Portability: This is the percentage of your cited URLs that appear in all three engines. This is your resilience metric. High portability means your content is recognized as authoritative regardless of the engine's specific retrieval logic.
3. Concentration: This is the share of your citations that comes from your single largest engine. This reveals where your risk lies. If your concentration is 90% in one engine, your "visibility" is actually a dependency.
When the overlap between engines is this low, a one size fits all strategy is a mistake. We have to acknowledge that different engines prefer different formats and sources, and our goal should be to move from fragile presence to resilient portability.
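The three lenses above can be sketched as a single function over a citation log. This is an illustration under stated assumptions rather than a reference implementation: records are hypothetical (prompt_id, engine, url) tuples, portability is counted per URL per prompt, and concentration is read as the share of citations coming from the domain's largest engine.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def _on_domain(url, domain):
    """True if the URL's host is the domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def domain_metrics(citations, tracked_prompts, domain, n_engines=3):
    """Compute presence, portability, and concentration for one domain.

    citations: iterable of (prompt_id, engine, url) tuples (assumed schema).
    tracked_prompts: iterable of every prompt_id being monitored.
    """
    tracked = set(tracked_prompts)
    own = [(p, e, u) for p, e, u in citations
           if p in tracked and _on_domain(u, domain)]
    if not tracked or not own:
        return {"presence": 0.0, "portability": 0.0,
                "concentration": 0.0, "top_engine": None}

    # Presence: share of tracked prompts where the domain is cited anywhere.
    presence = len({p for p, _, _ in own}) / len(tracked)

    # Portability: share of cited (prompt, URL) pairs seen in every engine.
    engines_by_pair = defaultdict(set)
    for p, e, u in own:
        engines_by_pair[(p, u)].add(e)
    portable = sum(1 for s in engines_by_pair.values() if len(s) >= n_engines)
    portability = portable / len(engines_by_pair)

    # Concentration: share of citations from the single largest engine.
    per_engine = Counter(e for _, e, _ in own)
    top_engine, top_count = per_engine.most_common(1)[0]
    concentration = top_count / len(own)

    return {"presence": presence, "portability": portability,
            "concentration": concentration, "top_engine": top_engine}
```

Reporting the three values side by side, instead of folding them into one score, keeps a high presence number from masking low portability or a heavy dependency on one engine.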
Methodology
It is worth noting a few caveats regarding this analysis. The dataset is skewed toward the customer base of Omnia, and the classification of intent and page types relies on regex, which is a directional tool rather than a perfect taxonomy.
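For readers unfamiliar with regex-based classification, the sketch below shows the general technique applied to page types. The patterns and labels are hypothetical examples chosen for illustration, not the rules used in this analysis, and they show why such a classifier is directional: any URL that does not follow a common path convention falls through to "other."

```python
import re
from urllib.parse import urlparse

# Hypothetical rules, checked in order; the first match wins.
PAGE_TYPE_RULES = [
    ("homepage", re.compile(r"^/?$")),
    ("guide", re.compile(r"/(guides?|tutorials?|how-to|learn)(/|$)")),
    ("blog", re.compile(r"/(blog|articles?|news)(/|$)")),
    ("product", re.compile(r"/(products?|item)/[^/]+")),
    ("category", re.compile(r"/(category|categories|collections?)(/|$)")),
]


def classify_page(url):
    """Assign a coarse page type based only on the URL path."""
    path = urlparse(url).path.lower()
    for label, pattern in PAGE_TYPE_RULES:
        if pattern.search(path):
            return label
    return "other"
```

Because only the path is inspected, a guide published at an unconventional URL would be mislabeled, which is one reason the page type figures should be read as approximate.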
However, these details don't undermine the core finding. The signal is found in the consistency of the pattern. Regardless of the specific cut, the result is the same: very little overlap and very high engine specificity. The analysis is based on four prompt samples, including three cohorts of 5,000 prompts each, tracked starting from January 1, 2025, and July 1, 2025.
Practical next steps
The useful part is not only the idea itself, but the operating habit behind it. Use it as a checklist for decisions: what deserves attention now, what should be monitored, what needs a stronger evidence base, and what can wait until the system has more scale.