
AthenaHQ’s Grüns Case Study: What’s Behind a 6x Share of Voice Lift?

Kevin McCabe

CRO

5 min read

AthenaHQ’s Grüns case study reports a 6x Share of Voice lift in 60 days. It’s a compelling headline backed by a detailed content strategy. But AI visibility is measured on probabilistic systems that reshuffle answers constantly, and a before-and-after number on that kind of system carries a burden of proof. 

The question isn’t whether Grüns did the work. It’s whether the measurement underneath the headline can support a 6x improvement claim. And this case study has a problem the math can actually demonstrate: depending on how many prompts were tracked, the confidence bands around the two reported numbers may overlap. When the bands overlap, sampling noise alone could explain the entire reported lift. 

That doesn’t mean the lift didn’t happen. It means the case study, as published, can’t rule out the possibility that it didn’t. 


The claim

Grüns’ Share of Voice went from 2.0% to 12.6% over roughly 60 days (July to September 2025). Brand Mention Rate went from 4.0% to 25.0%. Citation Rate went from 0.3% to 7.0%. The case study credits a pillar-and-cluster content strategy executed with AthenaHQ’s platform, with approximately 70% of citations attributed to Athena-authored content. 

The case study doesn’t disclose how many prompts were tracked per topic, how many times each prompt was run, or what confidence range surrounds any of the reported numbers. 

Here’s what has to be true for the headline to hold up. 


What you’d have to believe 

1. Each prompt was run more than once, and enough prompts were tracked to separate signal from noise. 

This is where the Grüns case study has a problem the math can demonstrate. 

At a moderate prompt count, the textbook 95% confidence band around a 2.0% reading is wide: the upper bound can reach into the high single digits or above 10%, depending on the calculation method. The band around 12.6% starts in the single digits on the low end. The result: the two ranges overlap. That means there’s a plausible scenario where the brand’s true underlying Share of Voice sat inside the range where the two bands overlap during both periods, and the move from 2.0% to 12.6% is the result of sampling fluctuation across two measurement windows, not a real change. 
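To make the overlap concrete, here is a minimal Python sketch that computes Wilson 95% score intervals for both readings. It rests on simplifying assumptions: each Share of Voice figure is treated as a binomial proportion over `n` independent prompt responses, and both windows are assumed to use the same `n`. The prompt count of 50 is hypothetical, because the case study does not disclose the real one, and the function names are illustrative only.

```python
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for an observed proportion p_hat over n trials."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5
    return center - half_width, center + half_width


def bands_overlap(p_before, p_after, n):
    """True if the 'after' band's lower bound sits at or below the 'before' band's upper bound."""
    _, before_high = wilson_interval(p_before, n)
    after_low, _ = wilson_interval(p_after, n)
    return after_low <= before_high


N_PROMPTS = 50  # hypothetical: the case study does not report a prompt count

for label, p in [("before", 0.020), ("after", 0.126)]:
    low, high = wilson_interval(p, N_PROMPTS)
    print(f"{label}: {p:.1%} -> 95% CI {low:.1%} to {high:.1%}")
print("overlap:", bands_overlap(0.020, 0.126, N_PROMPTS))
```

With the hypothetical 50 prompts, the before band runs from about 0.4% to 10.5% and the after band from about 6.0% to 24.5%, so `bands_overlap` returns `True`. Wilson is used here rather than the simpler normal approximation because it behaves sensibly for proportions near zero.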

With a larger prompt count, the bands narrow and the overlap can disappear. But the case study doesn’t disclose the prompt count. Without that number, a buyer looking at this case study can’t determine whether the 6x headline clears even the most basic statistical check. 
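How large would the prompt count need to be? The sketch below scans candidate prompt counts and returns the first one at which the Wilson 95% bands around the two readings stop overlapping. It assumes each reading behaves like a binomial proportion over `n` independent prompt responses and that both windows used the same `n`; `min_prompts_for_separation` is a hypothetical helper, and the threshold it finds applies only to this interval method.

```python
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for an observed proportion p_hat over n trials."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5
    return center - half_width, center + half_width


def min_prompts_for_separation(p_before, p_after, n_max=10_000):
    """Smallest prompt count at which the 'after' band clears the 'before' band, or None."""
    for n in range(1, n_max + 1):
        _, before_high = wilson_interval(p_before, n)
        after_low, _ = wilson_interval(p_after, n)
        if after_low > before_high:
            return n
    return None


print(min_prompts_for_separation(0.020, 0.126))
```

For the reported 2.0% and 12.6%, the scan lands in the low 90s. A case study tracking well under that many prompts per window would sit in overlap territory under these assumptions; one tracking several hundred would not.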

And that’s before accounting for run-to-run variance. If each prompt was run only once per measurement window, both the 2.0% and the 12.6% are single draws from answer distributions that, published research suggests, reshuffle roughly 70% of their cited brands from one run to the next. Multiple runs per prompt, averaged into each endpoint, would substantially strengthen the measurement. The case study doesn’t say whether that happened. 
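A small simulation shows why repeated runs matter. This sketch holds a prompt set fixed and models each run of each prompt as an independent yes/no draw at a per-prompt mention rate. That is a simplification of how engines actually reshuffle answers. The 100 prompts, the per-prompt rates drawn uniformly between 0% and 25%, and the function names are all hypothetical. It then measures how much the window-level estimate moves from one simulated window to the next.

```python
import random
from statistics import pstdev


def simulate_window(prompt_rates, runs_per_prompt, rng):
    """One measurement window: mention rate averaged over every prompt and every run."""
    hits = 0
    for rate in prompt_rates:
        # Each run is an independent Bernoulli draw at this prompt's mention rate.
        hits += sum(rng.random() < rate for _ in range(runs_per_prompt))
    return hits / (len(prompt_rates) * runs_per_prompt)


def estimate_spread(prompt_rates, runs_per_prompt, trials=1000, seed=0):
    """Standard deviation of the window-level estimate across repeated simulated windows."""
    rng = random.Random(seed)
    estimates = [simulate_window(prompt_rates, runs_per_prompt, rng) for _ in range(trials)]
    return pstdev(estimates)


setup_rng = random.Random(42)
prompt_rates = [setup_rng.uniform(0.0, 0.25) for _ in range(100)]  # hypothetical rates

for runs in (1, 3, 5):
    spread = estimate_spread(prompt_rates, runs)
    print(f"{runs} run(s) per prompt: spread of the estimate = {spread:.2%}")
```

Under these assumptions the spread shrinks roughly with the square root of the number of runs, so five runs per prompt cut run-to-run noise by a bit more than half. Because the prompt set is held fixed, this captures only run-to-run noise; it says nothing about noise from which prompts were chosen.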


2. The reported numbers would survive a confidence range. 

Neither 2.0% nor 12.6% is reported with any uncertainty band. Both are presented as point estimates to one decimal place, which implies a level of precision that the underlying measurement may not support. 

The challenge is specific to the size of these numbers. A 2.0% Share of Voice is a small signal. Small proportions measured on moderate sample sizes produce wide, asymmetric bands — the lower bound clips near zero while the upper bound stretches much further than most marketers would expect. That asymmetry means the 2.0% “before” number carries more uncertainty than its single decimal place suggests. Without a confidence range alongside the number, the reader has no way to know whether 2.0% is a tight estimate or a rough approximation sitting inside a band that reaches above 10%. 
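The asymmetry is easy to see by putting the textbook normal-approximation (Wald) interval next to the Wilson score interval for a 2.0% reading. This is a sketch that treats the reading as a binomial proportion; the prompt count of 50 is hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_interval(p_hat, n, z=Z95):
    """Normal-approximation ('textbook') interval: symmetric, and can fall below zero."""
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def wilson_interval(p_hat, n, z=Z95):
    """Wilson score interval: stays inside [0, 1] and is lopsided near zero."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5
    return center - half_width, center + half_width


p_hat, n = 0.020, 50  # n is hypothetical
for name, (low, high) in [("Wald", wald_interval(p_hat, n)), ("Wilson", wilson_interval(p_hat, n))]:
    print(f"{name}: {low:+.1%} to {high:.1%} "
          f"(below estimate: {p_hat - low:.1%}, above estimate: {high - p_hat:.1%})")
```

At these inputs the Wald band dips to about -1.9%, an impossible value for a proportion. The Wilson band runs from about 0.4% to 10.5%, extending roughly 1.6 points below the estimate and 8.5 points above it. That lopsided shape is the asymmetry described above.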


3. The baseline of 2.0% was stable, not a low point in normal fluctuation. 

Greens supplements are a growing, competitive category in AI recommendations. Grüns competes against established brands with significant content footprints. In a category like that, brand mentions rotate in and out of AI answers as engines resample and re-weight sources. 

A Share of Voice of 2.0% is a very small number. At that level, the difference between “present in AI answers” and “absent from AI answers” is a matter of a few prompt responses. If the July reading caught Grüns during a period when it happened to be cycling low — appearing in fewer answers than its typical rate — the baseline is artificially depressed and the lift is mechanically inflated before anyone publishes an article. 
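A quick binomial calculation shows how easily a low window can happen by chance. Assume, hypothetically, 100 prompt responses in the July window, so a 2.0% reading means just 2 mentions. Assume also that the brand’s typical mention rate is 5%. Both numbers are illustrative, not from the case study. The sketch computes the probability that such a brand still shows 2 or fewer mentions in a given window.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of seeing k or fewer mentions in n responses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_responses = 100  # hypothetical number of prompt responses in the baseline window
observed = 2       # 2.0% of 100 responses
true_rate = 0.05   # hypothetical typical mention rate for the brand

chance = prob_at_most(observed, n_responses, true_rate)
print(f"P(<= {observed} mentions | typical rate {true_rate:.0%}) = {chance:.1%}")
```

Under these assumptions the answer is about 12%, roughly one window in eight. A depressed baseline of that kind is not a rare event.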

For 2.0% to be a real starting point, the measurement window it came from would need to have been stable. The brand’s appearance rate in AI answers would need to have settled into a consistent pattern, not still bouncing around. The case study doesn’t address this. It presents 2.0% as a fixed floor. 


4. The prompt set reflects how consumers actually search, and was held constant across both readings. 

The case study describes the topic as greens supplements and names a content strategy built around AthenaHQ’s Prompt Planner. But it doesn’t disclose how many prompts were tracked, how they were selected, or whether the same set was used for both the before and after readings. 

A prompt set skewed toward queries where Grüns’ new pillar-and-cluster content happened to land would produce a larger lift than a set reflecting the full range of how consumers ask about greens supplements — dosage, taste, ingredients, comparisons, side effects, and value, among others. A balanced set, structured across query types and buyer intents, produces a measurement that’s harder to inflate accidentally. And if different prompts were used for the July and September readings, the comparison isn’t paired. You’re measuring two different things and calling it a change. 
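Pairing is what makes a before-and-after comparison meaningful. In this sketch, the two readings are compared only on prompts that appear in both windows. The change is then tested with McNemar’s exact test, one standard paired test, on the prompts whose outcome flipped. The prompt IDs and results are hypothetical, and nothing here implies this is the method AthenaHQ used.

```python
from math import comb


def paired_counts(before, after):
    """Restrict to prompts present in both readings; each dict maps prompt -> brand mentioned?"""
    shared = before.keys() & after.keys()
    lost = sum(1 for p in shared if before[p] and not after[p])    # mentioned before only
    gained = sum(1 for p in shared if not before[p] and after[p])  # mentioned after only
    return len(shared), lost, gained


def mcnemar_exact_p(lost, gained):
    """Two-sided exact McNemar test: discordant pairs ~ Binomial(lost + gained, 0.5)."""
    discordant = lost + gained
    if discordant == 0:
        return 1.0
    tail = sum(comb(discordant, k) for k in range(min(lost, gained) + 1)) / 2**discordant
    return min(1.0, 2 * tail)


# Hypothetical prompt-level results: True means the brand was mentioned.
before = {f"q{i}": i < 2 for i in range(20)}
after = {f"q{i}": i in (1, *range(10, 19)) for i in range(20)}
after.update({"new_q1": True, "new_q2": True})  # prompts that exist only in the later window

n_shared, lost, gained = paired_counts(before, after)
print(n_shared, lost, gained, round(mcnemar_exact_p(lost, gained), 4))
```

The two prompts added only in the later window are excluded from the comparison. Including them would inflate the after reading with prompts that were never measured at baseline, which is exactly the unpaired comparison the paragraph above warns against.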


Why this matters for the buyer 

A 6x Share of Voice lift is the kind of number that moves budget. It gets briefed into a content strategy, used to justify a platform commitment, and cited in a quarterly review. The specific risk with the Grüns case study is that the confidence bands may overlap, which means there’s a plausible reading where the brand’s underlying Share of Voice didn’t change and the reported gap is sampling fluctuation. A budget decision deserves a number that can be defended when someone asks “how do we know?” 


The ask 

Four questions any AI visibility case study should answer: 

  1. How many times was each prompt run per measurement window? 

  2. What confidence range surrounds each reported number? 

  3. Was the baseline stable and decision-ready before the “before” reading was locked? 

  4. How many prompts were used, how were they selected, and was the same set used for both readings? 

The Grüns case study doesn’t answer any of them. And those gaps have a measurable consequence: depending on the prompt count, the confidence bands around the before and after numbers may overlap, which means sampling noise alone could explain the full reported lift. 

The next time a vendor walks you through a headline lift, run it through the four questions above. Not because the number is wrong. Because you don’t have enough information to know whether it’s right.

Kevin McCabe is CRO at IQRush. If you want to see how your brand’s AI visibility holds up under the same measurement framework described here, book a 30-minute walkthrough.

