
Apr 30, 2026

The AI Visibility Tracking Playbook: How to Measure Your Brand's Citation Share Across ChatGPT, Gemini, Claude & Perplexity

Measure your brand's citation share across ChatGPT, Gemini, Claude, and Perplexity. Tools, prompt templates, KPI framework, and a monthly reporting structure.


If you've spent a single dollar on Generative Engine Optimization (GEO), schema markup, or "AI SEO" — and you can't answer "is it working?" with a number, you're flying blind. The whole proposition is that AI engines like ChatGPT, Gemini, Claude, and Perplexity recommend your business to users. But almost no one tracks whether that's actually happening.

This playbook fixes that. Inside: the three layers of AI visibility worth tracking, ten prompt templates you can run today, the tools that actually deliver signal (free, paid, and DIY), a five-KPI framework, and a monthly reporting structure you can hand to a non-technical stakeholder.

By the end you'll know how to measure AI visibility the way a search marketer measures rankings — except the rankings are now whether AI calls your name when somebody asks for a recommendation.

Why AI search can't be tracked like Google rankings

Google rankings are deterministic. Position 1, 2, 3 — same query, same SERP, same data center, same answer. AI search isn't like that. The same prompt asked twice on the same day can produce two different recommendations. Personalization, model temperature, retrieval-augmented results, and even time of day shift the output.

This is why traditional rank trackers fail at AI visibility. Ahrefs and Semrush will tell you where your homepage ranks for "Toronto dental clinic." Neither will tell you whether ChatGPT recommends your clinic when a user asks "best Invisalign provider in Toronto."

To measure AI visibility, you need a different mental model: citation share over time across a representative sample of buyer queries. Statistical, not deterministic. Trends, not snapshots.

The three layers of AI visibility

Before you pick tools, decide which layer you're actually tracking. Most people conflate these, then wonder why their dashboard is noisy.

Layer 1 — Knowledge graph presence

Does the AI know your brand exists? Ask ChatGPT "Who is [Your Brand]?" and it should return a clean factual paragraph. If the response is hallucinated, hedged, or "I don't have information on that brand," you have a knowledge graph problem, not a citation problem. Schema markup, Wikidata entries, structured About pages, and authoritative sameAs links control this layer. (We cover the schema side in Schema Markup for AI: The JSON-LD Templates That Get You Cited by ChatGPT.)

Layer 2 — Citation rate

When a user asks an AI for a recommendation in your category, how often does your brand appear in the answer? This is the layer most people mean when they say "AI visibility." It's measured as a percentage of relevant queries that mention you, by name.

Layer 3 — Sentiment and share of voice

When AI mentions you, how does it describe you? Positive, neutral, or critical? And what's your share of voice relative to direct competitors in the same category? A 40% citation rate means nothing if the AI is saying "they had a class-action settlement in 2024."

A useful tracking system measures all three. A bad tracking system measures only Layer 1 and calls it done.

The manual tracking method (start here)

Before you buy software, you need to do this manually for two weeks. It builds intuition that no dashboard replaces.

Step 1 — Build your prompt set

You need a representative sample of 20-30 queries that match how real customers describe their need. Not your branded queries (that's vanity). Not abstract terms like "AI marketing." Concrete, intent-driven, vertical-specific. Group them by intent:

Recommendation prompts (the highest-value category):

Who's the best [type of business] in [city] for [specific need]?
I need a [service]. Recommend three providers in [area] and tell me the trade-offs.
Who do you trust for [problem]? Give me names, not categories.

Comparison prompts:

Compare [Your Brand] vs [Competitor 1] vs [Competitor 2] for [use case].
What's the difference between [Your Brand] and [Top Competitor]?

Discovery prompts:

What does [Your Brand] do?
Tell me about [Your Brand] — what are they known for?

Vertical-authority prompts:

Who are the top experts in [your niche]?
What are the leading agencies for [your service category] in [your region]?
Who publishes the best research on [your topic]?
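
As a worked example, the templates above can be expanded into a concrete prompt set programmatically. All slot values below (brand, competitor, city, service) are placeholders; substitute your own:

```python
# Hypothetical slot values; swap in your own brand, market, and service.
slots = {
    "business": "dental clinic",
    "city": "Toronto",
    "service": "Invisalign",
    "brand": "Acme Dental",        # placeholder brand
    "competitor": "Rival Dental",  # placeholder competitor
}

# One template per intent group from the lists above.
templates = [
    "Who's the best {business} in {city} for {service}?",
    "I need {service}. Recommend three providers in {city} and tell me the trade-offs.",
    "Compare {brand} vs {competitor} for {service}.",
    "What does {brand} do?",
    "Who are the top experts in {service} in {city}?",
]

prompt_set = [t.format(**slots) for t in templates]
for p in prompt_set:
    print(p)
```

Generate the full 20-30 prompt set this way, then freeze it — comparing month over month only works if the prompts stay stable.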

Step 2 — Run them across the four engines

Run every prompt through ChatGPT, Gemini, Claude, and Perplexity. Use a fresh session each time (no chat history) and log the responses verbatim into a spreadsheet. Don't paraphrase — exact wording matters for sentiment scoring.

Step 3 — Score each response

Five columns per row: mentioned (Y/N), rank (1st named, 2nd, 3rd, etc.), sentiment (+1 / 0 / -1), competitors named, and source links cited (if any). Two hours of manual work the first time. After two weeks of doing this, you'll know your real baseline — and whether your shiny new GEO investment is producing signal.
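
If you prefer a script over a spreadsheet, a minimal sketch of that scoring schema in Python (field names mirror the five columns; the sample rows are illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResponseScore:
    engine: str                     # "chatgpt", "gemini", "claude", "perplexity"
    prompt: str
    mentioned: bool                 # brand named at all?
    rank: Optional[int]             # 1 = first-named; None if absent
    sentiment: int                  # +1 / 0 / -1
    competitors: list = field(default_factory=list)
    sources: list = field(default_factory=list)  # cited source links

def citation_rate(rows: list) -> float:
    """Share of logged responses that mention the brand by name."""
    return sum(r.mentioned for r in rows) / len(rows)

# Two illustrative rows: one citation, one miss.
rows = [
    ResponseScore("chatgpt", "best dental clinic in Toronto?", True, 2, 1,
                  competitors=["Rival Dental"]),
    ResponseScore("gemini", "best dental clinic in Toronto?", False, None, 0),
]
print(citation_rate(rows))  # 0.5
```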

Reality check: Manual tracking is tedious, but it's the only way to spot patterns that automated tools miss — like when AI starts citing a competitor's blog post as the authoritative source on your category. That's a strategic threat. No dashboard flags it; manual review does.

Tools that actually work

Once your manual baseline is set, you can start automating. Here's a tiered breakdown.

Free tier

The four AI engines + Google Trends

ChatGPT, Gemini, Claude, and Perplexity all have free tiers. Use them directly with the prompt templates above. Run on a Tuesday morning and a Friday afternoon to capture variance. Cost: $0 + your time.

Pair with Google Trends for context — when a topic spikes, AI engines re-train on it faster than you'd expect. Watch for trend lifts that should improve your citation rate.

Pros: Real, ground-truth data. Cons: Doesn't scale beyond ~30 queries/week without a team.

Paid tier

Brand Radar (Ahrefs), Profound, Otterly, Daydream

The 2026 AI-visibility tools market has exploded. The best ones automate prompt sweeps across the four major engines, score citations, and chart trends. Pricing ranges from $99/month (Otterly) to $499+/month (Profound, Brand Radar).

Brand Radar by Ahrefs integrates with their existing rank tracker, so you can see Google rankings and AI citations in one dashboard. Best fit if you're already an Ahrefs customer.

Profound is the most thorough on competitive intelligence — it tracks not just whether you're cited, but who else is cited and from which sources.

Otterly is the cheapest and easiest to set up. Good starting point for a single brand.

Daydream focuses on conversational query patterns and is strong at sentiment scoring.

Pros: Scale, trend tracking, alerts. Cons: All of them rely on prompt sampling — none capture every real user query, so your numbers are statistically representative, not exhaustive.

DIY tier

Build your own with the OpenAI / Anthropic / Google APIs

If your brand has data engineering capacity, you can build a custom tracker in a weekend. Stack: Google Sheets or Airtable for prompts and results storage; a Cloudflare Worker or Lambda function to call each AI's API on a cron schedule; a simple sentiment classifier (the cheapest model from any of the four works fine for this); a Looker / Metabase dashboard.

Cost: ~$30/month in API fees for daily sweeps of 30 queries × 4 engines. Pros: Total control, exact data, easy to extend. Cons: Maintenance burden, and you're querying APIs (not the consumer products) so results aren't 100% identical to what real users see.
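
A sketch of the core sweep loop, with engine clients abstracted behind plain callables so you can plug in whichever API client you use. The stub engine below is purely illustrative; in production each callable would wrap an OpenAI, Anthropic, or Google API request:

```python
def sweep(prompts, engines, brand):
    """Run each prompt through each engine and flag brand mentions.

    `engines` maps an engine name to a callable (prompt -> response text).
    Injecting the callables keeps the loop testable and API-agnostic.
    """
    results = []
    for name, ask in engines.items():
        for prompt in prompts:
            text = ask(prompt)
            results.append({
                "engine": name,
                "prompt": prompt,
                "mentioned": brand.lower() in text.lower(),
                "response": text,  # keep verbatim for sentiment scoring
            })
    return results

# Stub engine for illustration; replace with real API calls.
fake_engine = lambda p: "I'd recommend Acme Dental or Rival Dental."
results = sweep(["Best dentist in Toronto?"], {"chatgpt": fake_engine}, "Acme Dental")
print(results[0]["mentioned"])  # True
```

Schedule this on a cron (the Cloudflare Worker or Lambda mentioned above), append the results to your Sheet or Airtable base, and the dashboard layer reads from there.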

The KPI framework: five metrics that matter

Pick these five. Track them monthly. Show them to your CEO.

  • Citation rate: % of category-relevant queries (across all four engines) that mention your brand by name. The headline number. Target: 25%+.
  • Citation rank: when mentioned, what position in the answer? 1st-named carries dramatically more weight than 4th-named. Average across cited responses. Target: ≤2.0.
  • Sentiment score: average sentiment of citations on a -1 to +1 scale. Positive language ("trusted," "leading," "best for...") scores higher than neutral ("based in Toronto, founded in..."). Target: +0.6.
  • Share of voice: your citation count divided by total citations of all named brands in the same category. Tells you whether you're winning the category narrative. Target: 30%+.
  • Source authority: of the sources AI cites when discussing your brand, the average domain authority. Higher means AI trusts the references, and you should reinforce them with link-building. Target: 60+.

These targets are aspirational for a category leader. Most brands start at 5–10% citation rate, +0.2 sentiment, single-digit share of voice. Three months of disciplined GEO work typically doubles the citation rate. Six months can push share of voice past 25% in a regional category.
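
All five KPIs fall out of the scored rows directly. A sketch, assuming each row is a dict with mentioned, rank, sentiment, and source_da (domain authorities of cited sources) keys; every number below is made up for illustration:

```python
def kpis(rows, all_brand_citations):
    """Compute the five monthly KPIs from scored response rows.

    `all_brand_citations`: total citations of every named brand in the
    category, used as the share-of-voice denominator.
    """
    cited = [r for r in rows if r["mentioned"]]
    n = len(cited)
    return {
        "citation_rate": n / len(rows),
        "citation_rank": sum(r["rank"] for r in cited) / n if n else None,
        "sentiment": sum(r["sentiment"] for r in cited) / n if n else None,
        "share_of_voice": n / all_brand_citations,
        "source_authority": sum(da for r in cited for da in r["source_da"])
                            / max(1, sum(len(r["source_da"]) for r in cited)),
    }

# Illustrative month: 4 logged responses, 2 citations, 8 category-wide citations.
rows = [
    {"mentioned": True,  "rank": 1,    "sentiment": 1, "source_da": [70, 60]},
    {"mentioned": True,  "rank": 3,    "sentiment": 0, "source_da": []},
    {"mentioned": False, "rank": None, "sentiment": 0, "source_da": []},
    {"mentioned": False, "rank": None, "sentiment": 0, "source_da": []},
]
print(kpis(rows, all_brand_citations=8))
# {'citation_rate': 0.5, 'citation_rank': 2.0, 'sentiment': 0.5,
#  'share_of_voice': 0.25, 'source_authority': 65.0}
```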

The monthly reporting template

Five sections. One page. Send to non-technical stakeholders.

1. Headline numbers — Citation rate, share of voice, sentiment score, all with the delta vs. previous month. One sentence interpretation.

2. Engine-by-engine breakdown — Citation rate per engine (ChatGPT, Gemini, Claude, Perplexity). One usually leads, one usually lags. Knowing which is which informs where to focus next month's content.

3. Top-performing queries — The 5 prompts where you have the highest citation rate. These are your defensible category positions.

4. Loss queries — The 5 prompts where your citation rate dropped vs. last month, or where you're absent. These are next month's priority targets.

5. Competitor delta — A small bar chart of your share of voice vs. the top three competitors. Stakeholders will spend 80% of their attention here.

The whole report fits on one page. If yours doesn't, you're including too much. Stakeholders need a trend line, not a data dump.
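
For teams automating the one-pager, here's a sketch that renders the five sections as plain text; every input value below is illustrative:

```python
def monthly_report(month, cur, prev, per_engine, top, losses, sov):
    """Render the five-section one-pager as plain text.

    `cur`/`prev`: KPI dicts with citation_rate and share_of_voice (0-1).
    `per_engine`: engine -> citation rate. `top`/`losses`: (prompt, rate)
    pairs. `sov`: brand -> share of voice for the competitor chart.
    """
    d = lambda k: f"{(cur[k] - prev[k]) * 100:+.0f}pp"  # delta vs. last month
    out = [f"AI Visibility Report: {month}",
           "1. Headline numbers",
           f"   Citation rate {cur['citation_rate']:.0%} ({d('citation_rate')}), "
           f"share of voice {cur['share_of_voice']:.0%} ({d('share_of_voice')})",
           "2. Engine breakdown"]
    out += [f"   {e}: {r:.0%}" for e, r in per_engine.items()]
    out.append("3. Top-performing queries")
    out += [f"   {p}: {r:.0%}" for p, r in top]
    out.append("4. Loss queries")
    out += [f"   {p}: {r:.0%}" for p, r in losses]
    out.append("5. Competitor share of voice")
    out += [f"   {b}: {'#' * round(s * 20)} {s:.0%}" for b, s in sov.items()]
    return "\n".join(out)

report = monthly_report(
    "April 2026",
    {"citation_rate": 0.18, "share_of_voice": 0.22},
    {"citation_rate": 0.12, "share_of_voice": 0.19},
    {"chatgpt": 0.25, "gemini": 0.15, "claude": 0.20, "perplexity": 0.12},
    [("best dental clinic in Toronto", 0.75)],
    [("Invisalign cost Toronto", 0.0)],
    {"You": 0.22, "Competitor A": 0.35, "Competitor B": 0.18},
)
print(report)
```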

When to track what

Don't be obsessive. AI visibility is a trailing indicator — over-monitoring it generates noise.

  • Daily: Nothing. Skip the dashboard. You'll see variance and start making bad decisions.
  • Weekly: Run your top 5 priority queries (the ones tied to revenue) across all four engines. Note any sudden disappearances.
  • Monthly: Full audit, KPI dashboard refresh, stakeholder report. This is the cadence that matters.
  • Quarterly: Competitive benchmarking. Add or rotate prompts in your tracking set as buyer language evolves. Re-baseline targets.

Common pitfalls

After running this for hundreds of clients, these are the five tracking mistakes you'll want to avoid.

Caching and personalization create noise. ChatGPT, Gemini, and the rest cache responses for similar prompts and personalize for logged-in users. Always test in a fresh session. Always rotate phrasing.

Sample size too small. Three prompts is not a sample. Twenty is the floor. Thirty is comfortable. Below twenty and you're measuring noise, not signal.

Wrong queries. Tracking branded queries ("What is [Your Brand]?") is vanity. Real visibility lives in unbranded recommendation queries — the ones a buyer asks before they know your name. If your prompt set is half branded, your numbers look great and your pipeline doesn't move.

Mistaking volatility for trends. A single bad week is noise. Three weeks in a row is a trend. Don't pivot strategy on a single data point.

Not pairing tracking with action. Citation tracking only matters if it informs what you publish, what schema you deploy, and what citations you build. A dashboard with no follow-through is theatre.

Frequently asked questions

How often do AI engines update their knowledge of my brand? Continuously, but unevenly. ChatGPT's web-browsing tool can pick up new content within hours. Its base model retraining happens roughly every few months. Gemini and Claude follow similar patterns. Perplexity is fastest because it does live retrieval — your latest blog post can show up the same day.

Should I track AI Overviews (Google's SERP feature) separately? Yes. AI Overviews behave differently than ChatGPT/Gemini direct responses. Treat them as a fifth engine in your tracking matrix. Tools like Brand Radar can do this automatically.

What's a realistic timeline for citation rate improvement? Schema and entity fixes typically produce measurable lifts in 30–60 days. Displacing an entrenched competitor in AI recommendations takes 3–6 months of sustained authority work. We cover the operational side in The 90-Day GEO Roadmap — coming soon.

Is Perplexity worth tracking if it has lower usage than ChatGPT? Yes. Perplexity is over-represented among researchers, journalists, and B2B buyers — high-LTV audiences. Its citation patterns also lead the others (because of live retrieval), making it an early signal of where ChatGPT and Gemini will be in 30–60 days.

Stop tracking. Start fixing.

Tracking is necessary but tedious. After a couple of months it becomes obvious that the rate-limiting step isn't the dashboard — it's the schema, content, and citation work that moves the numbers.

If you'd rather skip the manual tracking entirely: request a free AI Visibility Audit. We run all of the above, on your domain, across all four major AI engines, and deliver a prioritized list of what to fix first. Twenty-four hours, no credit card.

Or — if you want the operator path — pair this playbook with The AI Knowledge Graph Playbook and Schema Markup for AI and you'll have everything you need to track and improve your citation rate inside 90 days.

The brands that win the AI search era aren't the ones with the prettiest websites. They're the ones with measurable answers when somebody asks "is it working?"

AI Visibility · GEO · Brand Monitoring · ChatGPT · Measurement · Tracking
Lorne Fade

Founder & CEO, Fade Digital

Lorne runs the world's first AI-Native digital marketing agency. He writes about generative engine optimization, AI search citation mechanics, and entity architecture — the infrastructure layer that determines whether AI recommends your brand or your competitor's.
