
AirOps’ Asana Case Study: The Questions Behind a 71% Citation Lift

Kevin McCabe

CRO

Jun 16, 2026

5 min read

AirOps’ Asana case study reports a 71% lift in AI citations over a 33-day window. It’s a content velocity story: AirOps’ Quill platform produced 232 articles across four languages, and the citation numbers moved.

The case study itself calls the numbers “directional,” noting the short measurement window. But it doesn’t carry the methodology to tell a buyer whether the 71% is a reliable reading or a favorable snapshot — and AirOps’ own published research is the strongest argument in the category for why that methodology matters.

The claim

Over a 33-day post-publish window, Asana’s AI citation count increased 71%. ChatGPT citations rose 93%. Google AI Overviews rose 42%. Google AI Mode rose 11%. First-mention rate jumped 18%. Citation rate lifted 16%. 58% of tracked prompts went from zero Asana presence to active citation. The 232 articles produced by Quill drove 58% of all Asana brand citations in the measurement window.

The case study doesn’t disclose how many prompts were tracked, how many times each prompt was run, what confidence range surrounds any of the reported figures, or whether the baseline was stable before measurement began.

Here’s what has to be true for the 71% headline to hold up.

What you’d have to believe

1. The measurements were based on repeated runs, not single passes.

AirOps has published research showing that only about 30% of brands persist in AI answers from one run to the next, and only about 20% across five consecutive runs. That research is the most direct evidence in the category that single-pass readings on AI engines are unreliable, and it comes from AirOps itself.

The Asana case study reports six metrics as point estimates. It doesn’t disclose whether those numbers are averages across multiple runs or single snapshots. If they’re single-pass readings, the instability that AirOps’ own research documents applies directly to these numbers. A 71% citation lift measured on a single pass could look like 30% or 120% on the next pass, and nobody touched the content in between.

The case study and the research need to be reconciled. Either the measurements were multi-run — in which case the case study should say so — or they weren’t, in which case AirOps’ own findings are the reason the numbers can’t be treated as settled.
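You can see why the number of runs matters by simulating a before/after comparison in which nothing changes. The Python sketch below is a minimal illustration, not AirOps’ or Asana’s methodology. It assumes a hypothetical 100 tracked prompts and a 20% chance that any prompt cites the brand on a given pass. It also assumes independence across prompts and runs. The true citation rate is identical in both windows, so every nonzero "lift" it produces is noise.

```python
import random
import statistics


def one_pass(rng: random.Random, num_prompts: int, cite_prob: float) -> int:
    """Citations on a single pass. Simplifying assumption: every prompt cites
    the brand independently, with the same probability, on every run."""
    return sum(1 for _ in range(num_prompts) if rng.random() < cite_prob)


def null_lift_interval(runs: int, num_prompts: int = 100, cite_prob: float = 0.2,
                       trials: int = 2000, seed: int = 7) -> tuple[float, float]:
    """5th and 95th percentiles of the measured lift when NOTHING changed.

    Each window's count is the mean over `runs` passes; the lift is
    (after - before) / before, as in a before/after case study.
    """
    rng = random.Random(seed)
    lifts = []
    for _ in range(trials):
        before = statistics.mean(one_pass(rng, num_prompts, cite_prob) for _ in range(runs))
        after = statistics.mean(one_pass(rng, num_prompts, cite_prob) for _ in range(runs))
        if before > 0:  # guard against an (unlikely) empty baseline
            lifts.append((after - before) / before)
    cuts = statistics.quantiles(lifts, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    for runs in (1, 5):
        lo, hi = null_lift_interval(runs)
        print(f"{runs} run(s) per window: 90% of null lifts fall in {lo:+.0%} .. {hi:+.0%}")
```

Under these assumptions, the single-pass interval is wide and straddles zero. Averaging five passes per window narrows it by roughly a factor of √5. Real AI engines correlate citations across prompts and runs, so they will not match this model exactly. The point is only that a single-pass delta carries substantial noise even when the content is untouched.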

2. The reported lifts would hold up inside a confidence range.

Six metrics are reported to the percentage point. None carries an uncertainty band. A 71% citation lift sounds decisive, but without knowing the prompt count and the number of runs, a buyer can’t tell how wide the confidence range around that number is. A 16% citation rate lift is even more vulnerable — smaller moves on smaller bases need larger samples to separate from noise.
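Attaching a confidence range to a lift does not require exotic tooling. One common approach is a percentile bootstrap over prompts. It is shown here as a sketch, not anything the case study describes. Resample the tracked prompts with replacement, recompute the lift each time, and read off the middle 90% of results. The per-prompt citation rates are generated synthetically to show how the interval width depends on prompt count. The inputs are hypothetical: a 30% baseline, a true 16% relative lift, and three runs per window.

```python
import random


def bootstrap_lift_ci(before, after, n_boot=2000, alpha=0.10, seed=0):
    """Point estimate and percentile-bootstrap interval for a relative lift.

    before[i] and after[i] are citation rates for prompt i (e.g. the share of
    runs that cited the brand). Prompts are resampled as before/after pairs,
    so every resample uses the same prompt mix in both windows.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need the same, non-empty prompt set in both windows")
    total_before = sum(before)
    if total_before == 0:
        raise ValueError("baseline has no citations; relative lift is undefined")
    rng = random.Random(seed)
    n = len(before)
    lifts = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b = sum(before[i] for i in idx)
        if b > 0:
            lifts.append(sum(after[i] for i in idx) / b - 1)
    lifts.sort()
    lo = lifts[int(alpha / 2 * len(lifts))]
    hi = lifts[int((1 - alpha / 2) * len(lifts)) - 1]
    return sum(after) / total_before - 1, lo, hi


def synthetic_rates(num_prompts, rate, runs, rng):
    """Hypothetical per-prompt citation rates: share of `runs` passes that cite."""
    return [sum(rng.random() < rate for _ in range(runs)) / runs
            for _ in range(num_prompts)]


if __name__ == "__main__":
    rng = random.Random(42)
    for num_prompts in (50, 400):
        before = synthetic_rates(num_prompts, 0.30, 3, rng)
        after = synthetic_rates(num_prompts, 0.30 * 1.16, 3, rng)
        point, lo, hi = bootstrap_lift_ci(before, after)
        print(f"{num_prompts} prompts: lift {point:+.0%}, 90% CI {lo:+.0%} .. {hi:+.0%}")
```

The 400-prompt interval comes out markedly narrower than the 50-prompt one. With few prompts, a true lift of this size can be hard to distinguish from zero, which is why the prompt count and run count belong next to every reported figure.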

Reporting six metrics without confidence ranges creates an impression of precision across multiple dimensions. Each number reinforces the others visually. But if the underlying methodology is the same for all six, they aren’t six independent confirmations. They’re six views of the same data, and they share the same uncertainty.

3. The baseline was stable before 232 articles started publishing at seven per day.

This is the measurement challenge specific to the Asana case study. Most case studies compare a quiet “before” period against an “after” period where something changed. Here, the “after” period wasn’t a targeted content push; it was a sustained production operation at seven articles a day across four languages.

That volume is its own intervention. The baseline (whatever Asana’s citation rate was before Quill) was captured during or just before a period of massive content acceleration. If the pre-publish reading caught Asana during a natural low in citation cycling, the lift is mechanically inflated. Part of the rise would simply be a return to normal levels, which would have happened even if no article had gone live.

For the “before” number to be a real starting point, Asana’s citation presence would need to have been stable going into the measurement window. The case study doesn’t disclose a pre-period baseline, how long it was observed, or whether it had settled before the first article published.
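A pre-period baseline check can be equally simple. The sketch below takes a hypothetical series of daily citation counts observed before the first article published, plus the single "before" reading locked for the comparison. It reports the daily trend as a percentage of the mean. It also reports how far the locked reading sits from the pre-period average. The thresholds in `is_stable` (1% trend per day, |z| ≤ 2) are illustrative assumptions, not industry standards.

```python
import statistics


def baseline_report(pre_period: list[float], before_reading: float) -> dict[str, float]:
    """Summarize pre-period daily citation counts and where the locked
    "before" reading sits relative to them."""
    n = len(pre_period)
    if n < 3:
        raise ValueError("need at least three pre-period observations")
    mean = statistics.mean(pre_period)
    if mean == 0:
        raise ValueError("pre-period has no citations; relative trend is undefined")
    sd = statistics.stdev(pre_period)
    # Ordinary least-squares slope of daily count against day index 0..n-1.
    x_bar = (n - 1) / 2
    slope = (sum((x - x_bar) * (y - mean) for x, y in enumerate(pre_period))
             / sum((x - x_bar) ** 2 for x in range(n)))
    if sd > 0:
        before_z = (before_reading - mean) / sd
    else:
        before_z = 0.0 if before_reading == mean else float("inf")
    return {
        "mean": mean,
        "trend_pct_per_day": 100 * slope / mean,
        "before_z": before_z,
    }


def is_stable(report: dict[str, float], max_trend_pct: float = 1.0,
              max_abs_z: float = 2.0) -> bool:
    """Illustrative thresholds: a near-flat trend and a "before" reading
    that is not an outlier relative to the pre-period."""
    return (abs(report["trend_pct_per_day"]) <= max_trend_pct
            and abs(report["before_z"]) <= max_abs_z)


if __name__ == "__main__":
    # Hypothetical 14-day pre-period of daily citation counts.
    flat = [20, 22, 19, 21, 20, 18, 22, 21, 19, 20, 21, 20, 19, 22]
    print(is_stable(baseline_report(flat, before_reading=20)))  # True: reading near the mean
    print(is_stable(baseline_report(flat, before_reading=16)))  # False: reading at a low
```

The second example is the "natural low" scenario described above. Its locked reading sits well below an otherwise flat pre-period. Any lift measured from it would partly reflect a return to normal rather than the effect of new content.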

4. The prompt set was representative, balanced, and held constant across both windows.

The case study reports that 58% of tracked prompts went from zero Asana presence to active citation. That’s a striking number, but it depends entirely on which prompts were being tracked and when the set was defined.

If the prompt set was chosen or adjusted after the content was published to include prompts where the new articles happened to get cited, then the measurement is fitted to the outcome. A prompt set defined before the content push, held constant through both windows, and broad enough to represent how buyers actually search for work management tools would be a stronger foundation. The case study doesn’t disclose how the prompts were selected or whether the set was the same in both periods.
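Freezing the prompt set is mostly bookkeeping. One way to make it auditable is sketched below with hypothetical prompts and counts. First, record a hash of the normalized prompt set before publishing. Then diff it against the set used for the "after" reading. Finally, compute the zero-to-cited share only over the frozen prompts. The function names and the normalization rule (lowercase, collapsed whitespace) are assumptions for illustration.

```python
import hashlib


def normalize(prompt: str) -> str:
    """Case- and whitespace-insensitive form used for every comparison."""
    return " ".join(prompt.lower().split())


def fingerprint(prompts) -> str:
    """SHA-256 of the sorted, normalized prompt set; record it before publishing."""
    canonical = "\n".join(sorted({normalize(p) for p in prompts}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_prompt_sets(before_prompts, after_prompts) -> dict:
    """Diff the prompt set used for the "before" reading against the "after" one."""
    before = {normalize(p) for p in before_prompts}
    after = {normalize(p) for p in after_prompts}
    return {
        "unchanged": fingerprint(before) == fingerprint(after),
        "added_after": sorted(after - before),
        "dropped_after": sorted(before - after),
    }


def zero_to_cited_share(before_counts: dict, after_counts: dict, frozen_prompts) -> float:
    """Share of FROZEN prompts with zero brand citations before and at least
    one after. Prompts that were added later are ignored."""
    frozen = {normalize(p) for p in frozen_prompts}
    if not frozen:
        raise ValueError("frozen prompt set is empty")
    before = {normalize(k): v for k, v in before_counts.items()}
    after = {normalize(k): v for k, v in after_counts.items()}
    flipped = sum(1 for p in frozen if before.get(p, 0) == 0 and after.get(p, 0) > 0)
    return flipped / len(frozen)


if __name__ == "__main__":
    frozen = ["best project management software", "Asana vs Monday"]
    later = ["best project  management software", "asana vs monday",
             "ai tools for marketing ops"]
    print(compare_prompt_sets(frozen, later))
    before_counts = {"best project management software": 0, "asana vs monday": 2}
    after_counts = {"best project management software": 1, "asana vs monday": 2,
                    "ai tools for marketing ops": 3}
    print(zero_to_cited_share(before_counts, after_counts, frozen))  # 0.5
```

Prompts added after publication show up in `added_after` and are excluded from the share. A prompt set that grew to fit the outcome therefore cannot inflate the headline number.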

Why this matters

The Asana case study isn’t just reporting a lift; it’s describing a production engine. Seven articles a day, four languages, 232 pieces in 33 days. That’s an operation built for speed. The measurement is running at the same pace: a 33-day window, point-estimate metrics, and a forward-looking statement that the numbers will compound.

The risk is that the production velocity and the measurement velocity are both outpacing the reliability of the data underneath. Each article published is a resource allocation decision. Each one is justified, in part, by metrics the case study itself calls directional. At one or two articles, that’s a reasonable bet on early signal. At 232 articles with more markets ramping, it’s a content operation scaled on top of numbers that haven’t been verified, and the case study frames that as a feature, not a limitation.

The ask

Four questions any AI visibility case study should answer:

How many times was each prompt run per measurement window?
What confidence range surrounds each reported number?
Was the baseline stable and decision-ready before the “before” reading was locked?
How many prompts were used, how were they selected, and was the same set used for both readings?

The Asana case study doesn’t answer any of them. AirOps has the research to know why these questions matter: their own variance findings are the reason. Applying that research to their own case study methodology would turn these numbers from directional into defensible.

Frequently asked questions

Why does AirOps' own variance research matter here?

Because AirOps has published findings showing that only about 30% of brands persist in AI answers from one run to the next. That's the strongest evidence in the category for why single-pass readings can't be treated as reliable. If the Asana metrics are based on single-pass readings, they inherit the instability AirOps' own research documents.

Is AirOps being dishonest?

No. The case study is transparent about the 33-day window and uses the word "directional." The gap isn't honesty; it's methodology. The reported numbers don't carry the measurement disclosure a buyer would need to independently evaluate whether the lifts are statistically reliable.

Does publishing 232 articles in 33 days make the results harder to evaluate?

Yes. High content velocity during the measurement window means the baseline was shifting as measurement began. Separating the effect of the content from normal citation fluctuation requires a stable pre-period baseline, and the case study doesn’t disclose one.

Kevin McCabe is CRO at IQRush. If you want to see how your brand’s AI visibility holds up under the same measurement framework described here, book a 30-minute walkthrough.
