← All Insights
PublishingApr 25, 2026

What drives whether AI assistants cite your content?

Duane Grey

Duane Grey

AI Strategy & Implementation

Find this useful? Share the visual.

The Short Version

I expected the winning features to be the ones content people argue about. Topic adjacency to proven winners. Keyword density and structural polish. Stats and citations woven into the draft. One feature outranked the rest. How recently the piece was published. Across five weeks of citation data on my own site, a baseline that used just recency beat the more sophisticated models I built. At a content library in the dozens rather than the thousands, the most defensible use of your hour is publishing the next piece, not polishing the current one past a reasonable threshold.

When I started modeling this on my own site, I expected the winning features to be the ones content people usually argue about. Topic adjacency to proven winners. Keyword density and structural polish. Stats and citations woven into the draft. The kind of work a good writer can deliberately engineer.

What the data showed, at least at the scale I can measure so far, is that one feature dominates. How recently the piece was published. The other features I tried either tied that result or made predictions worse.

I want to walk through what I tested and what the numbers said, then what that implies for where to spend time if you are writing for AI citation.

The Setup

I publish a running set of insights on my site and I track which of them get fetched by AI assistants like ChatGPT, Claude, and Perplexity when a real user asks those assistants a question. Each fetch shows up in my server logs with a specific user agent I can identify. Over five weeks of data, that produced twenty three real user citations spread across about twenty five eligible pieces.

The project goal was simple. Can I build a model that, given a candidate draft, predicts whether it will get cited in the next seven days. If I can, I have a tool that helps me prioritize what to work on.

Before building anything fancy, I ran three baselines. A baseline is a deliberately dumb model used to set a floor. If your sophisticated model cannot beat the dumb one, the sophistication is not earning its keep.

The three baselines.

Baseline one, mean at base rate. Predict the average rate for any piece. If about ten percent of pieces get cited in a given week, predict ten percent across the board.

Baseline two, recency. The single feature the model gets to see is days since published. Pieces published yesterday score higher than pieces from last month.

Baseline three, similarity. The features the model gets to see are how topically similar each piece is to pieces that were already cited. Closer to a proven winner equals higher score. This is the "adjacent content" thesis you hear in a lot of AI search optimization advice.

What the Data Showed

I scored each baseline two ways. Spearman correlation measures whether the model's ranking of pieces lines up with the real ranking, from negative one for exactly wrong through zero for no relationship to positive one for exactly right. Precision at five takes the five pieces the model ranked highest and asks how many of those were cited.

For crawler attention, meaning traffic from indexing bots preparing training or search data, recency was dominant. Spearman of 0.64 and precision at five of 1.0. Each of the five most recently published pieces was visited by crawlers in the following week.

For real user citation, the effect was weaker but still pointed the same way. Recency got a Spearman of 0.22 and a precision at five of 0.4. Two of the five most recent pieces were cited by an AI assistant during a real user query, versus one out of five expected if you picked at random.

Similarity features did nothing. For both targets, the similarity baseline scored at or below the random floor. Negative Spearman on one of them, which means the ranking was slightly inverted. Pieces that looked most similar to previous winners were not more likely to become winners themselves, at least not in any way my setup could detect.

The full model I built on top of these baselines, with the structural features I could think of, did not beat recency on the user citation target. The honest read is that at my current content volume, recency explains most of what is explainable.

What Surprised Me

Two things.

The first is that the gap between crawler behavior and user citation is large. Crawlers grab new content aggressively. AI assistants answering real user questions are more selective. Optimizing for what crawlers pick up is not the same as optimizing for what gets cited in an actual conversation. A lot of AI content advice conflates the two.

The second is that topic adjacency did not show up in the numbers. I had written several pieces as deliberate bets adjacent to proven winners, and I had anecdotal evidence the bets were paying off. The data does not refute the anecdotes, but it says the effect is not strong enough to beat a recency baseline on twenty three events. Either the effect is smaller than I thought, or I need more data to see it, or my way of measuring similarity misses the thing that matters.

What This Implies for Where to Spend Time

If you are writing for AI citation and your content library is in the dozens rather than the thousands, the most defensible use of your hour is publishing another piece, not polishing the current one past a reasonable threshold.

That is not the advice I wanted to give. I wanted to find a feature that would tell me which drafts to prioritize. What I got instead is closer to "keep the bar high, keep shipping, and measure the library rather than the piece."

Polish still matters for the reader you are trying to convert once they arrive. A thin page will not turn a citation into a lead. The point is narrower. The act of publishing the next piece produces more expected citation value than making the last one ten percent better.

What I Am Not Claiming

I want to be careful about what this data can and cannot support.

Twenty three positive events is directional, not definitive. The picture could change meaningfully once I have a few more months of data. AI assistant behavior drifts monthly, which means a model trained on April data may not describe July.

I tested just the similarity features I could build cheaply with off the shelf embeddings. It is possible that a better measure of "does this piece engage the same argument as a proven winner" would show a real effect. That is on my list.

What I feel confident saying is this. If someone sells you a content engineering framework that promises to move AI citation rates meaningfully without increasing publishing cadence, ask them to show their work on a sample the size of yours. My sample is small and my answer is already recency. It is unlikely their sample is both larger than mine and says something very different.

By the Numbers

Recent research on generative engine optimization finds that content freshness and structural clarity are stronger predictors of AI assistant citation than topical clustering alone.

Aggarwal et al., 'GEO: Generative Engine Optimization,' arXiv:2311.09735, 2023

Across studies of search and recommender systems, simple baselines using recency or popularity frequently match or beat more sophisticated models on small datasets, a result known as the 'strong baselines' problem.

Ferrari Dacrema et al., 'Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches,' RecSys, 2019