Beyond SEO: Fine-Tuning a Transformer to Optimize Content for AI Search

Based on the paper "Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation" by Lüttgenau, Colic & Ramirez. Published in July 2025.

Paper: arXiv:2507.03169


The Idea: What If a Model Could Rewrite Your Content Automatically?

Until now, GEO has relied on manual strategies: "add statistics," "cite sources," "improve fluency." These strategies work, but they require human effort for every single page.

This paper proposes a radically different approach: fine-tuning a language model (BART) to automatically rewrite web content into a GEO-optimized version.

The idea is simple: feed the model raw website text as input and get an optimized version as output, one that AI search engines are more likely to cite.
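
To make that input/output setup concrete, here is a minimal inference sketch using Hugging Face's transformers library. The checkpoint path "./bart-geo-travel" is a hypothetical name for a locally fine-tuned model, not an artifact released by the authors:

```python
# Minimal sketch: rewriting raw text with a fine-tuned BART checkpoint.
# "./bart-geo-travel" is a hypothetical local path, not a released model.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("./bart-geo-travel")

raw_text = "Our hotel sits near the beach and offers great views of the bay."

inputs = tokenizer(raw_text, return_tensors="pt", truncation=True, max_length=1024)
output_ids = model.generate(**inputs, max_length=1024, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```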


How It Works

The Training Data

The researchers created a synthetic dataset of 1,905 pairs of content in the travel domain:

  • Input: raw text from a travel website
  • Output: GEO-optimized version of the same text

The optimized versions incorporate:

  • Credible citations
  • Statistical evidence
  • Improved linguistic fluency
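
For illustration, a single training pair might look like the sketch below. The field names and the statistics in the target are invented for this example; the paper does not publish its dataset schema:

```python
# One illustrative training pair. Field names ("input", "target") and all
# figures in the target text are invented, not taken from the paper.
pair = {
    "input": (
        "Bali is a popular island in Indonesia. Many tourists visit "
        "its beaches and temples every year."
    ),
    "target": (
        "Bali welcomed roughly 6 million international visitors last year "
        "(illustrative figure). Beyond its beaches, travel guides such as "
        "Lonely Planet highlight the island's thousands of temples, making "
        "it one of Southeast Asia's leading cultural destinations."
    ),
}
```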

The Model

They fine-tuned BART-base — a relatively small and accessible transformer model — on these pairs. No need for massive GPUs: this is an approach that works with modest resources.
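
A condensed fine-tuning sketch with Hugging Face's Seq2SeqTrainer might look like this. The hyperparameters and the file name "geo_pairs.jsonl" are assumptions, not values reported in the paper:

```python
# Sketch of fine-tuning BART-base on (raw, optimized) text pairs.
# Hyperparameters and the dataset file name are assumptions.
from datasets import load_dataset
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

dataset = load_dataset("json", data_files="geo_pairs.jsonl", split="train")

def preprocess(batch):
    # Tokenize raw text as input and the optimized version as the target.
    model_inputs = tokenizer(batch["input"], truncation=True, max_length=1024)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=1024)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="./bart-geo-travel",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=3e-5,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("./bart-geo-travel")
```

With fewer than 2,000 short pairs, a run like this fits comfortably on a single consumer GPU.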


The Results

Rewriting Quality

The fine-tuned model outperforms base BART on text quality metrics:

  • ROUGE-L: 0.249 (vs 0.226 for the baseline) — measures longest-common-subsequence overlap with the reference text
  • BLEU: 0.200 (vs 0.173) — measures n-gram precision of the generated text against the reference
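
Both metrics can be reproduced with Hugging Face's evaluate library; the strings below are placeholders:

```python
# Computing ROUGE-L and BLEU on model outputs with the `evaluate` library.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["Bali welcomed 6 million visitors last year, per official data."]
references = ["Bali received 6 million international visitors last year."]

print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```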

Visibility in AI Search

This is where it gets interesting. The researchers tested the optimized content with Llama-3.3-70B as the generative engine:

  • +15.6% improvement in absolute word count within generated responses
  • +31% improvement in position-adjusted word count (words at the beginning of the response count more)

Content rewritten by the model is significantly more visible in AI responses.
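
A position-adjusted word count can be implemented as a weighted sum over the sentences of the generated answer, as in the sketch below. The exponential decay is one plausible weighting, not necessarily the paper's exact formula:

```python
# Position-adjusted word count: words from a source's sentences weigh more
# when they appear early in the answer. The exp(-pos/n) decay is an
# assumption; the paper may use a different weighting.
import math
import re

def position_adjusted_word_count(answer: str, cited_positions: set) -> float:
    """Weighted word count of the sentences attributed to one source."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    n = len(sentences)
    total = 0.0
    for pos, sentence in enumerate(sentences):
        if pos in cited_positions:
            total += math.exp(-pos / n) * len(sentence.split())
    return total

answer = "Source A reports X. Some filler sentence. Source A also notes Y."
print(position_adjusted_word_count(answer, cited_positions={0, 2}))
```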


Why This Matters

1. The First Proof That Fine-Tuning Works for GEO

Before this paper, all GEO approaches used either manual rules or LLM prompting. This is the first empirical demonstration that a model fine-tuned specifically for GEO can deliver measurable visibility gains.

2. Accessible With Modest Resources

BART-base is a small model. The dataset has fewer than 2,000 examples. No GPU cluster is needed to reproduce these results. This is an approach that small teams or independent practitioners could adopt.

3. Domain-Specific

The model was trained on travel content. The researchers emphasize that the approach is domain-specific — different datasets would be needed for other sectors. But the synthetic data creation pipeline is reproducible.
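
One way to reproduce such a pipeline is to have a strong LLM rewrite raw domain text into the optimized target, as sketched below. The prompt wording, the use of OpenAI's API, and the model name are all assumptions; the paper does not tie its pipeline to this setup:

```python
# Hedged sketch of a synthetic pair-generation pipeline: an LLM rewrites
# raw domain text into a GEO-optimized target. Prompt and model choice
# are assumptions, not the paper's documented setup.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Rewrite the following travel website text so it includes credible "
    "citations, statistical evidence, and fluent prose. Preserve the "
    "original meaning:\n\n{text}"
)

def make_pair(raw_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable instruction-following model
        messages=[{"role": "user", "content": PROMPT.format(text=raw_text)}],
    )
    return {"input": raw_text, "target": response.choices[0].message.content}

with open("geo_pairs.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(make_pair("Paris has many museums worth visiting.")) + "\n")
```

Swapping the prompt's domain vocabulary is all it takes to target a different sector, which is why the pipeline, rather than the trained model, is the reusable piece.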


The Limitations

  • Only one domain tested (travel) — generalization to other sectors is not proven
  • Synthetic dataset — the training pairs are AI-generated, not human-created
  • Small model — BART-base has its limits in terms of understanding and generation
  • No end-to-end testing — as SAGEO Arena highlights, optimizing for generation without testing retrieval can be counterproductive

What This Signals for the Future of GEO

This paper opens a promising path: automating GEO through fine-tuning. One can imagine:

  • Models fine-tuned by sector (health, legal, e-commerce...)
  • Automated pipelines that rewrite content at scale
  • SaaS tools that integrate these models for content creators

It's still a first step — the dataset is small, the domain is limited — but the proof of concept is there. Automated GEO through fine-tuning is viable, even with limited resources.

Combined with agentic approaches like AgenticGEO, we can see a future taking shape where optimization for AI search will be largely automated.


Paper: Lüttgenau, F., Colic, I., & Ramirez, G. (2025). Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation. arXiv:2507.03169