Mt. Stupid Has a Pricing Page: the Practical Angle

Shalin Siriwardhana

Summary

Anthropic published its main interpretability research post in May 2024. It opens: "We mostly treat AI models as a black box: ..." The practical question is what this changes for SEO, content quality, and AI-search visibility.

There is a specific kind of anxiety that comes with the current state of search. If you spend any time on professional networks like LinkedIn, you've likely seen the surge of "Generative Engine Optimization" (GEO) advice. It arrives as a flood of certainty: do this specific thing, use this exact schema, and you will "guarantee" your presence in AI Overviews.

It is tempting to lean into this certainty. When the ground is shifting beneath us, a precise framework feels like a lifeline. But there is a dangerous gap between the people selling these "hacks" and the people actually building the models. When we ignore that gap, we aren't just wasting time. We are standing on "Mt. Stupid," the peak of the Dunning-Kruger curve where confidence is highest and actual knowledge is lowest.

The Admission of the Architects

To understand why the current GEO trend is problematic, we have to look at what the creators of these systems are saying. Dario Amodei, the CEO of Anthropic, noted as recently as January that AI systems remain unpredictable and difficult to control. This isn't a casual observation; it is a statement about the very technology his company sells.

Anthropic’s own interpretability research from May 2024 is even more candid. They describe their models as "black boxes," admitting that while an input goes in and a response comes out, the "why" behind a specific answer remains opaque. The people who wrote the code cannot fully explain the output.

This sentiment is echoed across the industry. Neel Nanda of Google DeepMind’s mechanistic interpretability team suggested in late 2025 that the goal of achieving robust, guaranteed interpretability is likely an impossible dream. Even Ilya Sutskever, a foundational figure in the scaling hypothesis, has pointed out that as these models reason more, they actually become less predictable.

Expert Interpretation: This is the most critical piece of evidence in the entire puzzle. If the engineers who designed the architecture, the researchers who trained the weights, and the scientists who study the neurons all describe the system as a black box, a claim of "deterministic control" from the outside has nothing to stand on. The tradeoff here is between the comfort of a "guarantee" and the reality of a probability. The question to ask is whether you are trusting a vendor who claims to have "cracked the code" of a system that its own creators call unpredictable.

The Language of Deterministic Selling

Contrast the caution of the builders with the confidence of the consultants. The current "Technical GEO" discourse is filled with deterministic language. You will see "four-pillar frameworks," "guaranteed inclusion," and claims of "13% citation lifts" or "2.8x conversion improvements."

The prescriptions are incredibly specific: maintain a 300-character paragraph limit to dictate how a vector database chunks your content, or use a specific first-sentence structure to ensure the AI parses the answer. These aren't presented as hypotheses to be tested; they are presented as laws of physics.

The problem is that these "proofs" are often circular. An agency produces data based on its own prescriptions, applies those prescriptions to a client, sees a metric move, and calls it a victory. There are rarely control groups or pre-registered hypotheses. It is confirmation bias dressed up as data science.

Expert Interpretation: In any technical field, the level of confidence should be proportional to the level of predictability. When you see a consultant using words like "ensures," "guarantees," or "dictates" regarding a black-box LLM, they are selling a product, not a process. The risk here is "confirmation in costume": mistaking a random correlation for a causal lever. Before investing in a "GEO framework," ask to see the control group. If they can't provide one, they aren't doing science; they are doing marketing.
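
The 300-character prescription is a useful test case, because it makes a claim about a component the publisher cannot see. The Python sketch below is an illustration under stated assumptions, not a description of how any search or AI product actually chunks content: `fixed_size_chunks` is a hypothetical splitter, and the chunk sizes are invented. It shows that two paragraphs written to the limit can still be split or merged differently depending on a setting chosen by someone else.

```python
def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters.

    Hypothetical splitter for illustration; real retrieval systems may
    split on tokens, sentences, paragraphs, or something else entirely.
    """
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Two placeholder paragraphs, each kept under a 300-character limit.
paragraphs = ["A" * 280, "B" * 290]
document = "\n\n".join(paragraphs)

results = {}
for chunk_size in (200, 300, 512):  # assumed values; the publisher does not know the real one
    chunks = fixed_size_chunks(document, chunk_size)
    results[chunk_size] = {
        "num_chunks": len(chunks),
        # True only if every chunk is exactly one of the original paragraphs.
        "aligned": [c.strip() for c in chunks] == paragraphs,
    }
    print(chunk_size, results[chunk_size])
```

A paragraph-aware splitter would behave differently, which is the point: the outcome depends on a design choice made inside the retrieval system, so the prescription is a hypothesis to test rather than a lever to pull.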

The Reality of Controlled Testing

When we move away from anecdotal "case studies" and toward actual controlled testing, the GEO narrative begins to crumble. A recent study by Ahrefs, conducted by Louise Linehan and Xibeijia Guan, provides a stark reality check. They tracked 1,885 pages that implemented JSON-LD schema between August 2025 and March 2026.

To ensure the data was clean, they used 4,000 matched control pages and measured citation changes 30 days before and after the schema was added. They looked across ChatGPT, Google AI Mode, and Google AI Overviews using a difference-in-differences methodology.

The result? There was no meaningful uplift in citations. In fact, for Google AI Overviews, there was a small but statistically significant decline in citations for the pages that added the schema.

Expert Interpretation: This study is a masterclass in why methodology matters. Most "SEO wins" are reported as "I did X and then Y happened," which ignores the baseline of what would have happened anyway. By using a matched control group, Ahrefs isolated the variable. The takeaway is that the "schema for AI" trend is likely a phantom. The decision to make here is to stop prioritizing "AI-specific" technical tweaks that lack empirical support and instead focus on the quality of the information being retrieved.
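
The arithmetic behind a difference-in-differences comparison is simple enough to sketch. The example below uses invented citation counts (not the Ahrefs data) and a hypothetical helper, `diff_in_diff`, to show how a before/after comparison can report a lift that disappears once the control group's change is subtracted.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Return (treated_change, control_change, did_estimate) on mean counts."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    # The estimate is the treated change minus what happened anyway.
    return treated_change, control_change, treated_change - control_change


# Hypothetical citations per page, for illustration only.
treated_before = [10, 12, 8, 14]   # pages that later added schema
treated_after = [13, 15, 11, 17]
control_before = [9, 11, 10, 12]   # matched pages left unchanged
control_after = [12, 14, 13, 15]

naive, baseline, did = diff_in_diff(
    treated_before, treated_after, control_before, control_after
)
print(f"naive before/after change: {naive:+.1f}")
print(f"control group change:      {baseline:+.1f}")
print(f"difference-in-differences: {did:+.1f}")
```

With these made-up numbers the treated pages gain three citations on average, but so do the untouched controls, so the estimated effect of the change is zero. A real analysis would also need properly matched pages and a significance test; this only shows the subtraction.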

When the Source of Truth Speaks

If the Ahrefs study was a warning, Google’s own documentation is a stop sign. In May 2026, Google published official guidance on optimizing for generative AI features. Rather than providing a new playbook, they spent the page debunking the existing GEO prescriptions.

Google explicitly stated that several common "hacks" are ineffective. Specifically, they noted that llms.txt files are not needed, content "chunking" is not required, and rewriting content specifically for AI systems is unnecessary. They also dismissed the need for special schema markup for AI and warned that pursuing inauthentic mentions does not help.

The language used by Google was unusually blunt for a developer page, stating directly that many of these suggested "hacks" are not supported by how Google Search actually works. They named "Answer Engine Optimization" and "Generative Engine Optimization" by their full names and rejected the entire playbook.

Expert Interpretation: There is a significant difference between "reading between the lines" of a Google leak and reading a direct rejection of a methodology in official documentation. When the entity that controls the system tells you that a specific set of tactics is a "hack" and not a feature, the cost of continuing those tactics is high. The tradeoff is spending resources on "optimization" versus spending them on "utility." If the system is designed to surface the best answer, the only sustainable "hack" is to actually be the best answer.

The Incentive Structure of Confidence

You might wonder why, in the face of builder admissions, failed tests, and official denials, the GEO industry continues to thrive. The answer lies in the social and professional cost of being skeptical.

On platforms like LinkedIn, confidence is rewarded with engagement. Posting a bold, deterministic claim costs nothing and yields high returns in the form of audience growth and inbound leads. If the claim is later proven wrong, the author has already moved on to the next acronym before the correction catches up.

Conversely, posting a correction or a skeptical analysis is costly. It marks you as a contrarian or suggests you "don't get it." In a professional environment where "thought leadership" is measured by the ability to predict the future, admitting that the future is unpredictable is a brand risk.

Expert Interpretation: We are seeing a misalignment of incentives. The people with the most confidence have the least to lose if they are wrong, while the people with the most caution (the builders) have the most to lose if they overpromise. When evaluating advice, discount the confidence of the speaker and look at the evidence for the claim. The question to ask is whether this person is offering a framework because it works, or because a framework is an easy product to sell.

The Data of Absence

If we strip away the noise of the "optimization" discourse, we are left with a glaring absence. We have a technical field where the dominant prescriptions are consistently contradicted by controlled tests, yet the prescriptions continue to be sold.

The most telling data point is the gradient of certainty. When the people who built the systems hedge their bets and the people optimizing for those systems offer guarantees, the builders are almost certainly the ones who are right. No one working on inference attribution believes that a 300-character paragraph limit "dictates" the behavior of a multi-billion-parameter model.

The real question isn't whether GEO "hacks" work; the evidence suggests they don't. The real question is why the industry is so eager to believe in a shortcut for a system that is fundamentally unpredictable.

Expert Interpretation: The "absence" of evidence for GEO is, in itself, the evidence. The most practical path forward is to stop chasing the gradient of "AI optimization" and return to the fundamentals of information gain and user utility. If the model is a black box, the only way to influence it is to provide the highest quality, most authoritative signal possible. Everything else is just noise on the way up Mt. Stupid.

Practical next steps

The useful part is not only the argument itself but the operating habit behind it. Use it as a checklist when a new GEO prescription lands on your desk: what deserves attention now, what should be monitored, what needs a stronger evidence base (a control group, a stated hypothesis) before you act on it, and what can wait until the system has more scale.
