# Content Negotiation — Teaching Our Site to Speak AI

*2026-02-14 — Tech — By Kim*

> We built content negotiation into every page on morgondag.io. Bots and AI agents now get clean markdown instead of HTML — using Accept headers, the same HTTP mechanism browsers have used since 1996.

So you want to publish a blog in 2026. You spin up your framework, write some posts, hit deploy.
Then you remember: oh right, the machines are reading too. Think about the accessibility features!

You add a `sitemap.xml` for Google. An RSS feed for the three people still using feed readers (respect).
OpenGraph tags so your links don't look broken on Discord and Bluesky. JSON-LD so search engines understand your structured data.
A `robots.txt` to be polite. Maybe even an [`llms.txt`](/llms.txt) because that's the new thing.

That's a lot of machine-readable formats bolted onto what is essentially a pile of HTML.

LLMs (Large Language Models — the thing reading your website right now, probably) are eating the web. Not maliciously — they're just doing what we built them to do. They crawl, they scrape, they try to understand. But right now everyone's doing it the hard way. Fetch a full HTML page, strip out the nav and the footer and the cookie banners, parse what's left, hope for the best.

**What** if the machines could just *ask* for the format they want?

## The 30-year-old feature nobody used

HTTP has had content negotiation since the beginning. The `Accept` header lets a client say "I'd prefer this format" and the server responds accordingly. Browsers send `Accept: text/html` and get HTML. An API client (an API being the door you knock on when you want data but don't want to be invited inside) sends `Accept: application/json` and gets JSON (curly braces all the way down; XML lost the war).

I've used this pattern for years building APIs: same endpoint, different representations.
- Send `Accept: application/json` and get JSON
- Send `Accept: text/plain` for debugging
- Send `Accept: text/html` for a rendered page

One URL, one resource, multiple formats. REST (Representational State Transfer — the architectural style everyone claims to follow and almost nobody actually implements correctly) as it was intended. You can still support `.json` and `.xml` extensions alongside it — but the `Accept` header means clients don't *have* to know about them.
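The pattern in the list above fits in a few lines. This is a minimal sketch, not the morgondag.io implementation: it does naive substring matching on the `Accept` header and ignores q-value preferences, which a production server would honor, and the resource data is made up for illustration.

```python
# Minimal content-negotiation sketch: one resource, three representations.
# Illustrative only — real servers should parse q-values per RFC 9110.
import json

RESOURCE = {"title": "Hello", "body": "One URL, many formats."}

def negotiate(accept_header: str) -> tuple[str, str]:
    """Return (content_type, body) for the client's preferred representation."""
    accept = accept_header.lower()
    if "application/json" in accept:
        return "application/json", json.dumps(RESOURCE)
    if "text/plain" in accept:
        return "text/plain", f"{RESOURCE['title']}\n{RESOURCE['body']}"
    # Default: browsers send text/html (or */*) and get the rendered page
    return "text/html", f"<h1>{RESOURCE['title']}</h1><p>{RESOURCE['body']}</p>"
```

Same function, three callers: `negotiate("application/json")` returns JSON, `negotiate("text/plain")` returns plain text, and anything else falls through to HTML.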

Nobody really used content negotiation for *websites* though.

Browsers wanted HTML. The end.

---

Until now. Suddenly half your traffic isn't browsers anymore — it's GPTBot, ClaudeBot, PerplexityBot, and a growing swarm of agents that would really rather not parse a React/Svelte/HTMX/Angular component tree to find a paragraph of text.

[This piece by Reading.sh](https://medium.com/reading-sh/cloudflare-just-taught-the-web-to-speak-ai-ad92a485e9e0) about Cloudflare rolling out markdown responses at the CDN level inspired me to take the same pattern and apply it across the whole site. If a client asks for markdown, give them markdown. Same URL, different representation.

HTTP was *designed* to do this. We just never had a reason to use it for websites — until the machines showed up.

## Welcome, machines

Every page on morgondag.io now supports content negotiation. The homepage, the about page, every game page, every news post — including this one.

When a request comes in we check a few things:

1. Does the `Accept` header include `text/markdown` or `text/plain`?
2. Is there a `?format=markdown` query parameter?
3. Is the User-Agent a known bot, crawler, or AI agent?
4. Are browser-specific headers like `Sec-Fetch-Mode` missing?

If any match, we rewrite the request to a markdown route. Same URL to the outside world — clean, lightweight text instead of a full web app.
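The four checks above can be sketched as a single predicate. Everything here is an assumption about shape, not the site's actual middleware: the function name `wants_markdown`, the short pattern list (the real one matches roughly 70 User-Agents), and the convention that header names arrive lowercased.

```python
# Hypothetical sketch of the request checks described above.
# Header names are assumed to be lowercased before lookup.
BOT_UA_PATTERNS = ("googlebot", "gptbot", "claudebot", "perplexitybot",
                   "curl", "python-requests")

def wants_markdown(headers: dict[str, str], query: dict[str, str]) -> bool:
    accept = headers.get("accept", "").lower()
    ua = headers.get("user-agent", "")

    # 1. Explicit opt-in via the Accept header
    if "text/markdown" in accept or "text/plain" in accept:
        return True
    # 2. Explicit opt-in via query parameter
    if query.get("format") == "markdown":
        return True
    # 3. Known bot / crawler / AI-agent User-Agent
    if any(p in ua.lower() for p in BOT_UA_PATTERNS):
        return True
    # 4. No Sec-Fetch-Mode and no Mozilla/ token (covers empty UA too):
    #    almost certainly not a real browser
    if "sec-fetch-mode" not in headers and "Mozilla/" not in ua:
        return True
    return False
```

If it returns true, the request gets rewritten to the markdown route; otherwise the normal app renders HTML.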

The response comes back with `Content-Type: text/markdown` and an `x-markdown-tokens` header estimating the token count for agents managing context windows.
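Assembling that response might look like the sketch below. The roughly-four-characters-per-token heuristic is the one described in the Q&A at the end of this post; the function shape is generic and not tied to any particular framework.

```python
# Illustrative sketch of the markdown response with its token-estimate header.
def markdown_response(markdown: str) -> tuple[dict[str, str], str]:
    headers = {
        "Content-Type": "text/markdown; charset=utf-8",
        # Rough estimate: ~4 characters per token for English prose
        "x-markdown-tokens": str(len(markdown) // 4),
    }
    return headers, markdown
```

An agent with a tight context window can read `x-markdown-tokens` and decide whether to fetch the body at all.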

### Positive bot detection

We don't just wait for polite agents to send the right headers. We actively detect bots and serve them markdown automatically:

- **Known User-Agent patterns** — Googlebot, GPTBot, ClaudeBot, PerplexityBot, curl, python-requests, and about 70 others
- **Missing `Sec-Fetch-Mode`** — every real browser has sent this since 2020. Missing + no `Mozilla/` in the UA = not a browser
- **Empty User-Agent** — no legitimate browser omits this
- **Explicit opt-in** — `Accept: text/markdown` from any client

If you're a machine, you get machine-readable content. No DOM parsing, no guessing where the article starts.

## Try it

```bash
curl https://morgondag.io/news/ai-content-negotiation -H 'Accept: text/markdown'
```

```bash
curl https://morgondag.io -H 'Accept: text/markdown'
```

Or skip headers entirely — [https://morgondag.io/news/ai-content-negotiation?format=markdown](/news/ai-content-negotiation?format=markdown).


Bots that *want* HTML can override the automatic markdown by sending `Accept: text/html` or appending `?format=html` to the URL. Content negotiation goes both ways.

No new standards. No new file formats. No committee meetings. Just the `Accept` header doing what it was always meant to do.

This also opens up some fun ideas for the [AI podcast & NPC generation](/news/ai-podcast) project — imagine agents pulling structured episode data directly instead of scraping a page. Now they can. 🚀

## Questions & Answers

*Questions about the article, answered by the developer.*

**1. So you're basically saying HTTP already solved this problem 30 years ago and we just never needed it until now — what was the moment you went 'oh wait, the Accept header already does this'?**

I usually think about these things every time I make a REST API, or whenever I play around with any HTTP or web framework like Next.js, or even the HTMX memes.

It's just adapting those patterns to the AI world.

**2. You're not just passively waiting for bots to ask nicely — you're actively detecting them with things like missing Sec-Fetch-Mode headers and known User-Agent patterns. Isn't there a risk of false positives where a real user gets served raw markdown instead of the site?**

Sure, that could happen, but never let the edge cases drive your main decision arc.
Most users use normal browsers; those who don't, I assume, know what they are doing.

**3. You mentioned sending back an x-markdown-tokens header estimating token count for agents managing context windows — that's a really specific detail. How are you calculating that, and have you seen any AI agents actually use it?**

It's just the content size divided by four, a rough indicator of the length of the content.
A nice gesture, a hello.

**4. You called out Cloudflare doing this at the CDN level and you took it site-wide instead. What's your take on where this should live — should it be an infrastructure thing or something developers own per-site?**

For static content, sure, the CDN can do it. But for dynamic content, authors want to keep control of how their content is represented, so it belongs per-site.

**5. You've got llms.txt, robots.txt, sitemaps, RSS, OpenGraph, JSON-LD — and now content negotiation on top of all that. Is this replacing any of those, or is it just yet another layer in the machine-readable lasagna?**

More lasagna for everybody. There are lots of standards for many reasons; markdown is just a sign of the current age, and it might go out of fashion eventually.

**6. You teased at the end that this connects to your AI podcast and NPC generation work — agents pulling structured episode data instead of scraping. How far along is that idea, and what does that actually look like in practice?**

I mean, these questions and answers are one of the inputs for generative podcast content. Layers of meta-content: these answers are a human being interviewed by an LLM about the content of the news post.


---

*Canonical URL: https://morgondag.io/news/ai-content-negotiation*