Generative engine optimisation: separating sound practice from snake oil

By Iain

A new three-letter acronym is stalking the marketing industry. Generative Engine Optimisation (GEO) is the practice of making your content visible in AI-generated answers, such as those produced by ChatGPT, Perplexity, Google AI Overviews, and Claude. The term was coined in a 2023 research paper from Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi, presented at the KDD 2024 conference, and has since been adopted by a cottage industry of consultants, conferences, and SaaS dashboards all promising to crack the code.

The underlying shift is large enough to warrant attention, even if the hype exceeds the evidence. Gartner predicted that traditional search engine volume would drop 25% by 2026 as AI chatbots cannibalise queries. Whether the exact number lands is debatable, but the direction of travel is not, and the raw volumes are staggering. ChatGPT processes roughly 2.5 billion prompts per day, Perplexity handles 780 million monthly queries, and Google AI Overviews now appear across more than 200 countries and territories. According to an analysis by Ahrefs, AI Overviews reduced clicks to the top-ranking result by 58% by the end of 2025, up from 34.5% just eight months earlier, and the trend is accelerating rather than flattening. The question for anyone producing content with commercial intent is no longer whether to care about this, but what you can do about it that stands on firmer ground than hope.

What the research says

The foundational GEO paper tested nine optimisation strategies across 10,000 search queries using a generative engine modelled on BingChat’s architecture. The results, later validated on Perplexity, were unambiguous in one respect and maddeningly vague in another. Three techniques produced consistent improvements of 30-40% on the researchers’ Position-Adjusted Word Count metric, and 15-30% on their Subjective Impression metric. Those three were Statistics Addition (embedding quantitative data into content), Quotation Addition (incorporating attributed quotes from credible sources), and Cite Sources (adding citations from authoritative domains).

The maddeningly vague part is that the techniques which flopped are the ones a naive observer might expect to work. Keyword Stuffing, the blunt instrument that dominated early SEO, produced negligible gains. Authoritative tone, which rewrites content to sound more assertive and confident, performed inconsistently and only reliably improved visibility in narrow domains such as historical content. The Princeton team was careful to note that results varied by domain: statistics performed best for law and government queries, while quotations outperformed other methods in the people, society, and history categories.

One result from the study deserves more attention. The best-performing combination was not any single technique in isolation but Fluency Optimisation paired with Statistics Addition, which beat every standalone method by more than 5.5%. Cite Sources performed modestly on its own but amplified the impact of other techniques by an average of 31.4% when combined with them. The implication is useful and unglamorous. Write clearly, back your claims with numbers, and cite your sources, which is what any decent editor would tell you, and that is partly the point.

The SEO parallels

If you spent any time in the SEO industry between 2005 and 2015, the current GEO hype cycle will feel disturbingly familiar. The early days of search optimisation were dominated by technical tricks, keyword density formulae, link farms, hidden text, and a general conviction that Google’s algorithm was a lock to be picked. It took the industry roughly a decade to accept that the most reliable ranking strategy was to produce genuinely useful content and make it technically accessible. Google’s Panda update in 2011 buried a generation of content farms, and Penguin the following year did the same to link schemes. The slow, grudging consensus that emerged was that quality, authority, and relevance were what search engines rewarded because those were the signals most resistant to gaming.

GEO is at an equivalent stage, somewhere around 2008 in SEO years. The techniques that work, according to the limited evidence we have, are the same ones that worked in mature SEO once the tricks stopped working. Write with clarity and specificity, include verifiable data, cite reputable sources, and build topical authority through depth rather than breadth. The parallels run deeper than technique, because SEO had its own snake oil era of guaranteed first-page rankings and proprietary algorithms that were really just automated link-building scripts. GEO is already generating its own version, with agencies promising AI citation guarantees and tools claiming to have decoded how each model selects sources. A critique of the Princeton study by SandboxSEO pointed out that the three winning techniques all involved adding content, while the six weaker ones only tweaked existing text, raising the possibility that the gains came from volume and information density rather than any particular optimisation magic.

There is one critical difference between SEO and GEO that changes the economics for content producers. In traditional search, you got a click, the user landed on your site, saw your brand, maybe converted, and you could retarget them later. In a generative engine response, the AI synthesises your content into its answer, and the user may never visit your page at all. This is the zero-click problem scaled to a new dimension. Research from Seer Interactive found that 58.5% of Google searches in the US already end without a click, rising to 75% on mobile. When AI Overviews are present, organic click-through rates drop by a further 61%. You are optimising for citation, not traffic, which means the return is brand authority rather than direct page visits.

What to do (the practical part)

Strip away the jargon and the GEO-specific tooling, and the evidence supports a surprisingly short list of practices, none of which are complicated but all of which require discipline.

Answer the question in the first paragraph. AI retrieval systems extract from the opening of sections. GenOptima’s cross-platform monitoring found that AI Overviews cite from the first 30% of content 55% of the time. This echoes the inverted-pyramid structure that journalists have used for 150 years and that good SEO copywriting already follows. Put the answer first and elaborate after the reader has what they came for.

Include statistics, and make them specific. The Princeton study found that embedding quantitative data produced the single largest visibility improvement at 41%. ZipTie.dev’s analysis found that websites with high data density received 4.31 times more citation occurrences per URL than directory-style listings. Specificity is what separates visible content from invisible content in this context. “Revenue grew last year” is invisible to a retrieval system. “Revenue grew 23% year-on-year to $4.7 million” gives the model something it can extract and attribute.

Cite credible external sources. The Princeton paper found that including citations from authoritative domains (.edu, .gov, peer-reviewed journals) improved visibility and amplified the effect of other techniques. This is the GEO equivalent of backlinks in SEO, except the direction is reversed. Instead of earning inbound links to prove your authority, you are linking outward to ground your claims.

Publish original research and proprietary data. AI engines are risk-minimising systems. They prefer to cite content with verifiable, attributable data because it reduces the probability of generating incorrect responses. ZipTie.dev noted that Domain Authority, the traditional SEO metric, correlates with AI citations at only r=0.18, a relationship that explains roughly 3% of the variance in AI citation rates (r² ≈ 0.03). What does correlate is original data, named expertise, and verifiable methodology. If you have proprietary data, publish it, and if you run surveys, share the results openly. The content nobody else has is the content an AI model has a reason to cite over a dozen lookalike alternatives.

Structure content for extraction, not just reading. Generative engines do not evaluate whole pages the way Google’s traditional algorithm does, because they retrieve and synthesise individual sections rather than scoring entire documents. Each section needs to function as a self-contained unit with a clear heading, a direct statement of the point, and supporting evidence. The old SEO practice of writing a 4,000-word page where the useful answer was buried at the 2,000-word mark is even more counterproductive in a GEO context. The GEO Lab’s framework describes this as optimising for “extractability,” the ability of an AI system to cleanly parse and compress a content block into a response.

Implement schema markup. This is the driest recommendation on the list and probably the most undervalued. JSON-LD structured data (Article, FAQPage, HowTo, Organisation) gives AI systems machine-readable signals about what your content is and what it covers. It is the equivalent of putting labels on the boxes in your warehouse rather than trusting the delivery driver to open each one and guess.
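As a concrete illustration, here is a minimal sketch in Python that builds an Article object from standard schema.org properties and wraps it in the JSON-LD script tag that crawlers parse. The publisher name, dates, and topics are placeholders rather than a prescription; swap in your own values and run the output through a validator such as Google's Rich Results Test before publishing.

    # A minimal sketch: build an Article object with standard schema.org
    # properties and wrap it in the JSON-LD script tag that crawlers and AI
    # retrieval systems parse. All values below are placeholders.
    import json

    article_schema = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Generative engine optimisation: separating sound practice from snake oil",
        "author": {"@type": "Person", "name": "Iain"},
        "publisher": {"@type": "Organization", "name": "Example Agency"},  # placeholder
        "datePublished": "2026-01-15",  # placeholder dates
        "dateModified": "2026-02-15",
        "about": ["Generative Engine Optimisation", "AI search visibility"],
    }

    snippet = (
        '<script type="application/ld+json">\n'
        + json.dumps(article_schema, indent=2)
        + "\n</script>"
    )
    print(snippet)

The same pattern extends to FAQPage, HowTo, and Organisation types, and keeping dateModified honest here dovetails with the freshness point below.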

Keep content current. AI retrieval systems weigh recency for time-sensitive queries. Search Engine Land’s 2026 guide found that pages left without updates lose citations at three times the normal rate. Add a visible “Last updated” date, refresh statistics annually, and add a “What changed in [current year]” section to evergreen articles. This is identical to the SEO practice of content refreshing, which has been shown to recover ranking decay since at least 2018.

What the snake oil merchants are selling

Where the evidence is thin, the salespeople are thick. Several claims circulating in the GEO industry deserve scepticism proportional to their confidence.

“We can guarantee AI citations.” Nobody can guarantee citations in a system where the retrieval mechanism changes with every model update and where different platforms cite entirely different sources. ZipTie.dev’s cross-platform analysis found that 89% of citations differ between ChatGPT and Perplexity, and only 18% of brands are visible across all three major AI platforms simultaneously. If someone tells you they can guarantee your brand appears in AI answers, they are either lying or they have a meaningfully different definition of “guarantee” than you do.

“GEO is a completely new discipline.” It is not, and calling it one obscures how much of GEO is inherited from mature content strategy. The techniques that work are enhanced versions of practices any competent content strategist already follows. The retrieval layer is new and the source selection mechanism differs from PageRank, but the underlying principle, that AI models prefer well-structured, well-sourced, information-dense content produced by credible authors, is a restatement of Google’s E-E-A-T framework in a different technical context.

“You need our proprietary tool to track AI visibility.” You need measurement, certainly. Tracking whether your brand is mentioned in AI-generated responses is a legitimate, unsolved problem. GA4 does not natively segment AI referral traffic well, and manual citation audits are tedious. But the tool market is immature, the metrics are unstandardised, and committing to an annual contract with a GEO analytics vendor in 2026 is like buying an SEO tool subscription in 2004. The ground is shifting too quickly for any vendor’s current approach to be definitive.
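If you want a rough first picture without committing to a vendor, a few lines of scripting against your own server logs or a GA4 referrer export will get you part of the way. The sketch below, a deliberately minimal example in Python, buckets referrer URLs by hostname; the domain list is an assumption based on commonly observed AI assistant referrers, not an official registry, and it will need maintaining as platforms come and go.

    # A minimal sketch: classify referrer URLs (from server logs or a GA4
    # export) by hostname. The hostname list is an assumption, not exhaustive.
    from collections import Counter
    from urllib.parse import urlparse

    AI_REFERRER_HOSTS = {
        "chatgpt.com": "ChatGPT",
        "chat.openai.com": "ChatGPT",
        "perplexity.ai": "Perplexity",
        "www.perplexity.ai": "Perplexity",
        "gemini.google.com": "Gemini",
        "claude.ai": "Claude",
        "copilot.microsoft.com": "Copilot",
    }

    def classify_referrer(referrer_url: str) -> str:
        host = (urlparse(referrer_url).hostname or "").lower()
        return AI_REFERRER_HOSTS.get(host, "other")

    # Example rows pulled from a referrer column.
    referrers = [
        "https://chatgpt.com/",
        "https://www.perplexity.ai/search?q=geo",
        "https://www.google.com/",
    ]
    print(Counter(classify_referrer(r) for r in referrers))
    # e.g. Counter({'ChatGPT': 1, 'Perplexity': 1, 'other': 1})

It will not tell you when your content is cited without a click, which is the harder and still largely unsolved measurement problem, but it does tell you which platforms are actually sending people your way.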

“Blocking AI crawlers will protect your content.” This one is more complicated than the others. A study by Rutgers Business School and The Wharton School published in December 2025 found that publishers who blocked AI bots experienced a 23% traffic decline, while those who allowed crawling fared better. The robots.txt standard is voluntary, compliance is patchy, and the evidence suggests blocking may do more harm than good. Tollbit’s Q2 2025 report found that 13.26% of AI bot requests ignored robots.txt directives entirely. BuzzStream’s analysis of the top 100 news sites found that 79% block AI training bots, but only 14% block all AI bots, suggesting that most publishers are trying to thread a needle between protecting training data and maintaining retrieval visibility. Blocking is a blunt instrument for a problem that requires a scalpel.
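Whatever you decide, it is worth knowing what your robots.txt currently says to these bots before changing it. The sketch below uses Python's standard robotparser to audit a handful of AI user agents against a site; the agent strings are assumptions drawn from publicly documented crawler names, and you should verify them against each vendor's documentation, since vendors distinguish training crawlers from retrieval and user-triggered fetchers.

    # A minimal sketch: check which AI user agents your robots.txt allows to
    # fetch a given path. The user-agent strings are assumptions based on
    # publicly documented crawler names; confirm against each vendor's docs.
    from urllib import robotparser

    AI_USER_AGENTS = [
        "GPTBot",           # OpenAI training crawler
        "OAI-SearchBot",    # OpenAI search/retrieval crawler
        "ChatGPT-User",     # OpenAI user-triggered fetches
        "ClaudeBot",        # Anthropic crawler
        "PerplexityBot",    # Perplexity crawler
        "Google-Extended",  # Google's AI training control token
        "CCBot",            # Common Crawl
    ]

    def audit(site: str, path: str = "/") -> None:
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{site.rstrip('/')}/robots.txt")
        rp.read()
        for agent in AI_USER_AGENTS:
            allowed = rp.can_fetch(agent, f"{site.rstrip('/')}{path}")
            print(f"{agent:<16} {'allowed' if allowed else 'blocked'}")

    audit("https://example.com")

That distinction is the needle BuzzStream describes publishers trying to thread: disallowing training crawlers while leaving retrieval and user-triggered agents alone.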

The uncomfortable economics

The elephant in the GEO room is the same one that sat quietly in the corner throughout the history of SEO and eventually grew too large to ignore: who pays for the content that AI models cite?

In traditional search, the implicit deal was that Google crawled your content, ranked it, and sent you traffic in return. With AI-generated responses, that deal is broken because the model extracts your information, synthesises it into an answer, and the user gets what they need without clicking through. Cloudflare’s data quantified the asymmetry with a clarity that should worry every publisher. In June 2025, OpenAI’s crawl-to-referral ratio was 1,700 to one, and Anthropic’s was 73,000 to one, meaning that for every visitor Anthropic sent to a publisher’s site, it crawled 73,000 pages of their content.

That arrangement is not sustainable, because content production costs money, and somebody has to pay for it. If the return on that investment shifts from traffic to brand mentions inside an AI response, the economics of content marketing change in ways most businesses have not yet priced in. The businesses that will handle this transition most successfully are those with existing brand recognition (because AI models tend to cite known brands), those producing genuinely original research (because derivative content has no citation advantage), and those treating AI visibility as one channel in a diversified mix rather than the whole strategy.

Where this goes next

The more interesting question is not how to optimise for today’s generative engines, which are still search tools with a conversational interface bolted on, but what happens when the web’s primary audience stops being human altogether.

The agentic browser has already arrived, and it changes the picture considerably. OpenAI launched Atlas in October 2025 with an Agent Mode that browses websites, fills forms, and completes transactions on behalf of users. Perplexity’s Comet browser autonomously browses and synthesises across multiple tabs. Google shipped an Auto Browse feature in Chrome in January 2026 that handles multi-step workflows through a Gemini-powered side panel. Fellou, Genspark, and Opera Neon are all competing in the same space, and in February 2026, Google released an early preview of WebMCP in Chrome Canary, a protocol for structured AI agent interaction with websites. Nimble, an enterprise data company, launched its Agentic Search Platform backed by $47 million in Series B funding, with its CEO stating plainly that machines are becoming the web’s first-class citizens. The web was built for humans clicking through pages, but the next version may be built for agents moving through APIs, structured data, and machine-readable interfaces, with the visual, human-facing layer becoming almost secondary.

When that shift is complete, GEO, as currently conceived, will look quaint. Optimising for AI citation in a text-based answer is a transitional concern. The endgame is a web where your content, your product catalogue, your pricing, and your availability data must be consumable by autonomous systems that never render a webpage or read a paragraph the way a human does. The parallel to draw here is not with SEO but with the transition from print catalogues to e-commerce in the 1990s. The companies that understood that the internet was not a digital version of a paper catalogue, but an entirely new distribution channel with different economics and different design constraints, were the ones that built Amazon, not the ones that scanned their Argos book into a PDF.

The practical implication for anyone working on their digital presence today is that the basics of GEO (clear structure, verifiable data, authoritative sourcing, schema markup) are worth doing because they are also the basics of making your content machine-readable. They are building blocks for the agentic web, not throwaway tactics for a temporary trend. The content strategies that will age well are those built on substance rather than tricks, on being the kind of source an AI system would be foolish not to cite, rather than gaming a retrieval algorithm that will change next quarter.

Somewhere in the timeline of every optimisation discipline, there is a moment when the practitioners stop trying to exploit the system and start trying to be genuinely useful to it. SEO reached that point roughly around 2015 when Google got good enough at detecting manipulation that the game stopped being worth the candle. GEO will reach it sooner because the systems it targets are smarter and update faster. The businesses that skip straight to the “be useful” phase will look like geniuses in three years. Those buying citation guarantees from agencies with a three-month track record will be shopping for a new acronym.
