SAGEO Arena: The Benchmark That Puts GEO to the Test

Based on the paper "SAGEO Arena: A Realistic Environment for Evaluating Search-Augmented Generative Engine Optimization" by Kim, Jeong, Kim, Lee & Lee (Yonsei University / Konkuk University). Published in February 2026.

Paper: arXiv:2602.12187


The Problem: GEO Lives in a Simplified World

Since the foundational GEO paper in 2024, researchers have proposed numerous strategies to improve content visibility in AI search engine responses. Adding statistics, adding citations, improving fluency: all of these work. In the lab.

But there's a fundamental problem: the existing benchmarks take a shortcut.

How? They skip the hardest steps. When Perplexity or Google AI Overviews answer a question, here's what actually happens:

  1. Retrieval — the engine searches through billions of pages and selects a few hundred
  2. Reranking — a model reranks these pages to keep the top 5-10
  3. Generation — an LLM synthesizes a response from these selected pages

Existing benchmarks (GEO-Bench, AutoGEO, etc.) only test step 3. They feed the documents directly to the LLM and measure whether optimization improves citation. They completely ignore steps 1 and 2.

It's like testing a sprinter only on the last 10 meters of a 100-meter race. It says nothing about their ability to run the full race.


What SAGEO Arena Changes

SAGEO Arena is the first benchmark that tests the end-to-end pipeline: retrieval, reranking, and generation. It also introduces a key concept everyone was ignoring: structural information.

A Real Web Corpus at Scale

Unlike previous benchmarks, which use raw text, the SAGEO Arena authors built a corpus of 170,000 real web documents covering 9 domains. For each document, they extracted:

  • The body text — the main content
  • The structural information — title, meta description, heading hierarchy (h1, h2...), and schema markup

Why does this matter? Because in the real world, search engines heavily use these structural signals to decide which documents to retrieve and rank. Ignoring them means ignoring half the game.
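
To make the body/structure split concrete, here is a minimal sketch of how one corpus entry could be represented. The class name, field names, and example values are all hypothetical; the paper's actual data format is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WebDocument:
    """Hypothetical record for one corpus entry (names are illustrative)."""
    url: str
    body: str                          # the main content
    title: str = ""                    # structural: page title
    meta_description: str = ""         # structural: meta description
    headings: List[Tuple[int, str]] = field(default_factory=list)  # (level, text)
    schema_markup: Dict[str, str] = field(default_factory=dict)    # e.g. parsed JSON-LD

    def structural_text(self) -> str:
        """Join the structural fields into one string, leaving the body out."""
        parts = [self.title, self.meta_description]
        parts += [text for _, text in self.headings]
        parts += [str(value) for value in self.schema_markup.values()]
        return " ".join(p for p in parts if p)


doc = WebDocument(
    url="https://example.com/espresso",
    body="Espresso is brewed by forcing hot water through fine coffee grounds.",
    title="How to Brew Espresso",
    meta_description="A short guide to brewing espresso at home.",
    headings=[(1, "How to Brew Espresso"), (2, "Equipment")],
    schema_markup={"@type": "HowTo", "name": "Brew espresso"},
)
print(doc.structural_text())
```

Keeping the two kinds of information in separate fields lets a retriever index them separately, and lets an optimizer edit one without touching the other.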

A Complete and Modular Pipeline

SAGEO Arena integrates all three stages of the generative search pipeline:

Stage | What happens | What is evaluated
Retrieval | The engine searches through the 170K document corpus | Is the optimized document still retrieved?
Reranking | A model reranks the results | Does the document move up or down in ranking?
Generation | An LLM generates the final response | Is the document cited in the response?

Each stage is modular and configurable. Researchers can trace exactly how an optimization affects each stage of the pipeline.
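
As a rough illustration of what "modular and traceable" can mean, the sketch below wires three swappable stage functions together and records which documents survive each stage. Everything in it is a toy stand-in: the word-overlap retriever, the length-normalised reranker, and the "LLM" that cites whatever it receives are assumptions for illustration, not the models used in the paper.

```python
def tokens(text):
    """Lowercased whitespace tokens; a stand-in for real text analysis."""
    return set(text.lower().split())


def retrieve(query, corpus, k=3):
    """Toy retriever: rank by word overlap with title plus body."""
    q = tokens(query)
    return sorted(
        corpus,
        key=lambda d: len(q & tokens(d["title"] + " " + d["body"])),
        reverse=True,
    )[:k]


def rerank(query, docs, k=2):
    """Toy reranker: query overlap with the body, normalised by body length."""
    q = tokens(query)

    def score(d):
        body = tokens(d["body"])
        return len(q & body) / max(len(body), 1)

    return sorted(docs, key=score, reverse=True)[:k]


def generate(query, docs):
    """Stand-in for the LLM: it simply 'cites' every document it receives."""
    return [d["id"] for d in docs]


def run_pipeline(query, corpus, retriever=retrieve, reranker=rerank, generator=generate):
    """Run the three stages and record which document ids survive each one."""
    retrieved = retriever(query, corpus)
    reranked = reranker(query, retrieved)
    cited = generator(query, reranked)
    return {
        "retrieval": [d["id"] for d in retrieved],
        "reranking": [d["id"] for d in reranked],
        "generation": cited,
    }


corpus = [
    {"id": "a", "title": "Espresso brewing guide", "body": "how to brew espresso at home with a machine"},
    {"id": "b", "title": "Tea basics", "body": "green tea is steeped in hot water"},
    {"id": "c", "title": "Espresso grinders", "body": "choose a burr grinder for espresso"},
    {"id": "d", "title": "Bread baking", "body": "knead the dough and bake"},
]
trace = run_pipeline("how to brew espresso", corpus)
print(trace)
```

Because each stage is passed in as an argument, any one of them can be swapped for a different model while the per-stage trace stays comparable.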

The Evaluation Protocol

The process is simple but rigorous:

  1. Run the pipeline with a test query
  2. Select a target document from those that reach the generation stage
  3. Apply a GEO optimization strategy to that document
  4. Reinsert the optimized document into the corpus
  5. Rerun the pipeline with the same query
  6. Measure: did the document gain, maintain, or lose visibility at each stage?
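
The six steps above can be sketched as a small before/after comparison. This is a minimal sketch under stated assumptions: `toy_pipeline`, the three-document corpus, and the "optimization" (a rewrite that happens to drop a query term) are all hypothetical and exist only to make the example runnable.

```python
from typing import Optional


def rank_of(doc_id, ranked_ids) -> Optional[int]:
    """1-based position of doc_id, or None if it was dropped at this stage."""
    return ranked_ids.index(doc_id) + 1 if doc_id in ranked_ids else None


def compare(before, after):
    """Classify a rank change as 'gain', 'maintain', or 'loss'."""
    if before == after:
        return "maintain"
    if after is None:
        return "loss"
    if before is None or after < before:
        return "gain"
    return "loss"


def evaluate(pipeline, corpus, query, target_id, optimize):
    baseline = pipeline(query, corpus)                    # step 1: first run
    if target_id not in baseline["generation"]:           # step 2: valid target?
        raise ValueError("target must reach the generation stage")
    modified = dict(corpus)                               # steps 3-4: optimize
    modified[target_id] = optimize(corpus[target_id])     # and reinsert
    rerun = pipeline(query, modified)                     # step 5: same query
    return {                                              # step 6: per-stage verdict
        stage: compare(rank_of(target_id, baseline[stage]),
                       rank_of(target_id, rerun[stage]))
        for stage in baseline
    }


def toy_pipeline(query, corpus):
    """Hypothetical stand-in pipeline: word-overlap retrieval, top-2 kept."""
    q = set(query.lower().split())

    def overlap(doc_id):
        return len(q & set(corpus[doc_id].lower().split()))

    retrieved = [d for d in sorted(corpus, key=overlap, reverse=True) if overlap(d) > 0][:3]
    reranked = retrieved[:2]
    return {"retrieval": retrieved, "reranking": reranked, "generation": reranked}


corpus = {
    "a": "brew espresso at home",
    "b": "espresso machine reviews",
    "c": "tea guide",
}
# Hypothetical rewrite that swaps out the only query term the page contained.
result = evaluate(toy_pipeline, corpus, "brew espresso", "b",
                  optimize=lambda text: text.replace("espresso", "coffee"))
print(result)
```

Reporting a verdict per stage, rather than only for the final citation, is what makes it possible to see where in the pipeline a document was lost.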

The Shocking Finding: GEO Breaks Retrieval

This is the paper's most important result, and it calls the entire field into question:

Existing GEO strategies sometimes improve generation, but actively degrade retrieval and reranking.

Concretely: when you optimize your page's text with classic GEO techniques (adding stats, citations, etc.), your page can disappear from search results before even reaching the LLM. The optimized document is no longer retrieved by the search engine, so it has zero chance of being cited.

It's as if you improved your store's window display, but the GPS stopped directing customers to your street.

Why Does This Happen?

GEO strategies modify body text to please the LLM (step 3), but these modifications can:

  • Alter the signals the retrieval engine uses to find the document (step 1)
  • Change the perceived relevance for the reranking model (step 2)

Optimizing for one stage can sabotage the others. This is a problem nobody had measured before SAGEO Arena.
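
One way to picture the mechanism: many lexical scoring schemes normalise by document length, so adding sentences that contain no query terms dilutes the match. The snippet below uses a deliberately crude score (the fraction of tokens that are query terms) as an assumption for illustration; real retrievers use BM25 or dense embeddings and behave differently in detail. The appended sentence is invented placeholder text, not real data.

```python
def tf_score(query, text):
    """Fraction of tokens in text that are query terms (crude, length-normalised)."""
    q = set(query.lower().split())
    toks = text.lower().split()
    return sum(t in q for t in toks) / len(toks) if toks else 0.0


query = "best running shoes"
original = "The best running shoes balance cushioning and weight."
# Placeholder "statistic" appended in the style of a classic GEO edit.
optimized = original + " In one made-up survey, 64 percent of buyers agreed with this view."

before = tf_score(query, original)
after = tf_score(query, optimized)
print(before, after, after < before)
```

The added sentence may help at the generation stage, yet under this toy score the page now looks less focused on the query than it did before.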


Structural Information: The Unexpected Hero

The second major finding: structural information (titles, meta descriptions, schema markup) is essential for maintaining visibility at the retrieval stage.

The researchers found that the two types of information play complementary roles:

Information type | Primary role
Structural information (title, meta, schema) | Determines whether the document is found (retrieval)
Body text | Determines whether the document is well-ranked (reranking) and cited (generation)

In other words: structure gets your document into the race, and content determines whether it wins. Optimizing only content without touching structure is like having an excellent product with no marketing.

This is a direct bridge between traditional SEO (which focuses on structure) and GEO (which focuses on content). Both are necessary.


SAGEO: Stage-Aware Optimization

Armed with these findings, the researchers propose stage-aware SAGEO — an approach that adapts optimization to each stage of the pipeline:

  • For retrieval — optimize structural information (titles, meta descriptions, schema markup)
  • For reranking — optimize relevance and text quality
  • For generation — apply classic GEO techniques (stats, citations, fluency)

This combined approach achieves the best overall visibility among all evaluated strategies. It's the first time a method jointly optimizes SEO and GEO.
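
The stage-to-edit mapping described above can be sketched as three small functions applied in sequence. This is only an illustration of the idea, not the authors' method: the `Page` class, the function names, and the specific edits are placeholders invented for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    title: str
    meta_description: str
    body: str


def optimize_for_retrieval(page, query):
    """Placeholder structural edit: surface the query topic in the title."""
    if query.lower() in page.title.lower():
        return page
    return replace(page, title=f"{query.title()}: {page.title}")


def optimize_for_reranking(page, query):
    """Placeholder relevance edit: open the body with a sentence on the query."""
    return replace(page, body=f"This page answers: {query}. {page.body}")


def optimize_for_generation(page, evidence):
    """Placeholder GEO edit: append a supporting statistic or citation."""
    return replace(page, body=f"{page.body} {evidence}")


def stage_aware_optimize(page, query, evidence):
    """Apply one edit per pipeline stage, in pipeline order."""
    page = optimize_for_retrieval(page, query)
    page = optimize_for_reranking(page, query)
    return optimize_for_generation(page, evidence)


page = Page(
    title="Coffee Notes",
    meta_description="Notes on coffee.",
    body="Use coarse grounds and steep overnight.",
)
optimized = stage_aware_optimize(
    page, "cold brew ratio", "[supporting statistic with source goes here]"
)
print(optimized.title)
print(optimized.body)
```

Splitting the edits by stage makes it easy to check each one against the stage it targets, instead of judging a single rewrite only by the final citation.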


Comparison With Existing Benchmarks

Benchmark | Corpus | Retrieval | Reranking | Generation | Structural info
GEO-Bench (2024) | | No | No | Yes | No
AutoGEO (2025) | | No | No | Yes | No
C-SEO Bench (2025) | | No | No | Yes | No
SAGEO Arena (2026) | 170K docs | Yes | Yes | Yes | Yes

The difference is striking: SAGEO Arena is the only benchmark in this comparison that covers the complete pipeline with real web documents.


What This Means for Content Creators

1. SEO Is Not Dead

Contrary to what many claim, traditional SEO remains crucial. Structural information (titles, meta descriptions, schema markup) is what enables your content to be found by AI search engines in the first place. Without it, your GEO optimizations are useless.

2. Don't Sacrifice Retrieval for Generation

If you rewrite your content to please LLMs, verify that your page is still properly indexed and retrieved by search engines. Content perfectly optimized for generation but invisible to retrieval has zero visibility.

3. Think in Pipelines, Not Silos

Effective optimization requires working on all three stages:

  • Structure — to be found
  • Relevance — to be well-ranked
  • Content quality — to be cited

4. Schema Markup Matters

Structured data (JSON-LD, schema.org) isn't just an SEO bonus. It plays an active role in how AI search engines understand and select your content.
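
For readers who have not written schema markup before, here is a minimal sketch that builds a schema.org Article description as JSON-LD using only Python's standard `json` module. The helper name `article_json_ld` and all the example values are hypothetical; which properties a given search engine actually uses is not something this example can tell you.

```python
import json


def article_json_ld(headline, description, author, date_published):
    """Return a <script> tag holding a schema.org Article as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date string
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


snippet = article_json_ld(
    headline="How to Brew Espresso",
    description="A short guide to brewing espresso at home.",
    author="Jane Doe",
    date_published="2026-02-01",
)
print(snippet)
```

The resulting tag goes in the page's HTML, where it sits alongside the title and meta description as part of the structural information.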


Limitations to Keep in Mind

  • The benchmark uses open-source models to simulate the pipeline, not actual commercial engines (Google, Perplexity, etc.)
  • The 170K document corpus is large but remains smaller than the real web
  • Off-page factors (backlinks, domain authority) are not modeled

Despite these limitations, SAGEO Arena represents a major qualitative leap over previous benchmarks.


The Big Picture

The SAGEO Arena paper should make the entire GEO community think. Its central message is simple but powerful:

Optimizing only for the last stage of the pipeline is insufficient — and potentially counterproductive.

The future of optimization for AI search is neither SEO alone nor GEO alone, but an integrated approach that understands and optimizes each stage of the generative search pipeline. SAGEO Arena finally provides the tool to measure this.

For content creators, the lesson is clear: keep your SEO fundamentals solid (structure, metadata, schema) while applying GEO techniques (stats, citations, fluency) to your content. One without the other isn't enough.


Paper: Kim, S., Jeong, W., Kim, S., Lee, S., & Lee, D. (2026). SAGEO Arena: A Realistic Environment for Evaluating Search-Augmented Generative Engine Optimization. arXiv:2602.12187