
AirOps’ State of AI Search Report: What 70% Churn Means for Every Visibility Number in the Category

Kevin McCabe

CRO

5 min read

AirOps published a finding in their 2026 State of AI Search report that should change how every marketer reads an AI visibility number: only 30% of brands that appear in one AI answer show up in the next, and only 20% persist across five consecutive runs of the same prompt. Nothing changes on the brand’s side between runs. The engine just rebuilds the answer from scratch each time. 

That’s 70% single-run churn: if only 30% of brands carry over from one answer to the next, the other 70% do not. On any given measurement pass, most brand appearances are transient and would not be there if you asked again a minute later.

The question is what that finding demands of every visibility number published in the category, including AirOps’ own. 


What the research found 

The AirOps report measures what happens when the same prompt set is fired at AI engines on repeated passes with no intervention between runs. The results: 

About 30% of brands that appear in one answer also appear in the next.

About 20% persist across five consecutive runs.

Brands earning both a mention and a citation are 40% more likely to resurface, but only 28% of answers include brands with that dual signal.

More than 50% of brands that drop out of an answer resurface within two runs, meaning visibility isn’t permanently lost, but it is constantly cycling.

These aren’t edge cases. They’re the baseline behavior of the system being measured. Every AI visibility number is a reading taken from a system that exhibits this level of churn by default. 


What 70% churn demands of measurement 

If the system reshuffles 70% of its brand appearances on every pass, then a single-pass reading captures one frame of a distribution that’s constantly moving. That has specific consequences for how visibility numbers should be reported: 

A reported number needs to be an average across multiple runs, not a single snapshot. If each prompt was fired once, the reported visibility percentage is one realization of a distribution. Fire the same prompt set again and the number changes not because the brand’s visibility changed, but because the engine rebuilt the answer differently. Averaging across multiple runs produces a number that represents the brand’s typical visibility, not the visibility it happened to show on one particular pass. 
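A minimal sketch of multi-run averaging, assuming each run is stored as a list of brand sets, one set per prompt. The data, brand names, and function names below are hypothetical and are not taken from the AirOps report.

```python
from statistics import mean


def visibility_rate(run, brand):
    """Share of prompts in a single run whose answer mentions the brand."""
    hits = sum(1 for brands_in_answer in run if brand in brands_in_answer)
    return hits / len(run)


def averaged_visibility(runs, brand):
    """Mean of the per-run visibility rates across repeated runs."""
    return mean(visibility_rate(run, brand) for run in runs)


# Hypothetical data: three runs of the same four prompts.
# Each inner set holds the brands that appeared in one answer.
runs = [
    [{"acme", "globex"}, {"globex"}, {"acme"}, {"initech"}],
    [{"globex"}, {"acme"}, {"initech"}, {"initech"}],
    [{"acme"}, {"acme", "initech"}, {"acme"}, {"globex"}],
]

per_run = [visibility_rate(run, "acme") for run in runs]
averaged = averaged_visibility(runs, "acme")

print(per_run)   # the single-pass readings differ from run to run
print(averaged)  # the averaged figure is the one worth reporting
```

In this toy data the single-pass readings range from 25% to 75% for the same brand and the same prompts, which is the spread a one-shot report would hide.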

A reported number needs a confidence range. Even averaged across multiple runs, the resulting number sits inside a band of uncertainty that depends on how many prompts were tracked and how many times each was run. Without that band, the reader can’t tell whether a 6-point week-over-week change is meaningful movement or well within the expected fluctuation of a system with 70% single-run churn. 
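One simple way to attach a range to an averaged number is a normal-approximation interval around the mean of the per-run rates. This is a sketch under stated assumptions: the per-run rates are hypothetical, the 1.96 multiplier corresponds to a roughly 95% range, and the approximation is crude when only a handful of runs are available. It also captures only run-to-run spread, not the uncertainty that comes from which prompts were chosen.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_range(per_run_rates, z=1.96):
    """Approximate range around the mean of per-run visibility rates.

    Uses mean +/- z * (sample standard deviation / sqrt(number of runs)),
    clipped to the valid 0..1 interval for a rate.
    """
    n = len(per_run_rates)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    center = mean(per_run_rates)
    half_width = z * stdev(per_run_rates) / sqrt(n)
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical per-run visibility rates for one brand over five runs.
rates = [0.42, 0.35, 0.48, 0.39, 0.44]
low, high = confidence_range(rates)
print(f"{mean(rates):.1%} (range {low:.1%} to {high:.1%})")
```

More runs shrink the half-width roughly with the square root of the run count, which is why the number of runs per measurement window matters as much as the headline figure.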

A baseline needs to have settled before being treated as a starting point. If brand rankings in a category are still shifting when the “before” reading is taken, the measurement is anchored to a moving point. The AirOps report shows that more than 50% of brands that drop out resurface within two runs, which means the system cycles through periods of higher and lower visibility for any given brand. A “before” reading taken at a low point in that cycle will show lift against almost any later reading, even if nothing about the brand changed in between.
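One possible readiness check, sketched below, treats a baseline as settled only when the most recent readings stay inside a narrow band. The window size, the tolerance, and the readings are illustrative assumptions, not thresholds from the AirOps report.

```python
def baseline_is_stable(readings, window=5, tolerance=0.05):
    """Return True if the last `window` readings span no more than `tolerance`.

    `readings` is a chronological list of averaged visibility rates (0..1).
    Both `window` and `tolerance` are hypothetical defaults to tune per category.
    """
    if len(readings) < window:
        return False  # not enough history to call the baseline settled
    recent = readings[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical histories of averaged visibility for one brand.
settled = [0.20, 0.31, 0.27, 0.29, 0.28, 0.30, 0.29]
still_moving = [0.20, 0.35, 0.22, 0.40, 0.28]

print(baseline_is_stable(settled))       # recent readings sit in a tight band
print(baseline_is_stable(still_moving))  # readings are still swinging widely
```

Locking the "before" number only after a check like this passes avoids anchoring a case study to the bottom of a cycle.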

A week-over-week change needs to be compared against the noise floor. The AirOps research quantifies the noise floor: 70% churn per run. A 6-point weekly delta on a system with that level of churn could easily be the churn doing what churn does, not the brand’s actual position changing. Without a way to compare the reported delta against the expected variance, the marketer can’t distinguish signal from noise. 
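A rough way to make that comparison is to set the week-over-week delta against the standard error of the difference between the two weeks' run averages. The sketch below assumes five hypothetical runs per week and a normal-approximation threshold of 1.96 standard errors. It is an illustration, not a method described in the AirOps report.

```python
from math import sqrt
from statistics import mean, variance


def delta_vs_noise(before_rates, after_rates, z=1.96):
    """Compare a change in mean visibility with its expected run-to-run noise.

    Returns (delta, noise_threshold, exceeds_noise), where the threshold is
    z times the standard error of the difference between the two means.
    """
    delta = mean(after_rates) - mean(before_rates)
    standard_error = sqrt(
        variance(before_rates) / len(before_rates)
        + variance(after_rates) / len(after_rates)
    )
    threshold = z * standard_error
    return delta, threshold, abs(delta) > threshold


# Hypothetical per-run visibility rates for two consecutive weeks.
last_week = [0.30, 0.42, 0.25, 0.38, 0.35]
this_week = [0.44, 0.33, 0.47, 0.36, 0.40]

delta, threshold, exceeds = delta_vs_noise(last_week, this_week)
print(f"delta={delta:+.3f}, noise threshold={threshold:.3f}, signal={exceeds}")
```

With these made-up numbers the averages move by 6 points, but the run-to-run spread puts the threshold near 8 points, so the change cannot be separated from noise.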


The implication for every case study in the category 

AirOps’ research doesn’t just apply to AirOps’ own numbers. It applies to every vendor reporting AI visibility metrics. If the system has 70% single-run churn, then any case study, from any vendor, that reports a before-and-after lift without disclosing multi-run averaging, confidence ranges, and baseline stability is reporting a number the underlying system may not support. 

The four measurement questions this research implies are the same ones a buyer should ask of every case study and every dashboard in the category: 

  1. How many times was each prompt run per measurement window? 

  2. What confidence range surrounds each reported number? 

  3. Was the baseline stable before the “before” reading was locked? 

  4. How many prompts were used, how were they selected, and was the same set used for both readings? 

AirOps’ own report is the evidence for why these questions matter. Any vendor whose reporting doesn’t address them is asking the buyer to treat a single frame as a trend, on a system that AirOps’ research shows is reshuffling most of its brand appearances on every pass.


What the research got right 

The AirOps State of AI Search report is genuinely useful work. Measuring run-to-run variance across engines, publishing the persistence rates, and quantifying the dual-signal (mention plus citation) effect gives the category a shared reference point for what the noise floor actually looks like. The finding that more than 50% of dropped brands resurface within two runs is a useful corrective to the assumption that visibility loss is permanent: it reframes the problem as cycling rather than declining.

The report’s recommendations — structured content, quarterly freshness, off-site credibility, tracking patterns over time rather than individual snapshots — are grounded in the data and actionable. The measurement challenge it surfaces is real and the category is better for having it documented. 

The question that remains is whether the reporting standards the research implies — multi-run averaging, confidence intervals, baseline readiness — are applied consistently to every visibility number that carries the weight of a business decision. 

Kevin McCabe is CRO at IQRush. If you want to see how your brand’s AI visibility holds up under the same measurement framework described here, book a 30-minute walkthrough.
