
Content without clicks: Making sure your content is found on and off your website

Content discovery has become increasingly complex. Marketing teams are now responsible for ensuring content can be found by users not just on their own websites, but also in AI-powered search environments like ChatGPT and Perplexity. Watch this webinar to learn how you can make your content discoverable across all search channels, efficiently.

Squiz Consulting Team 21 Oct 2025

Webinar Q&A


Making content machine-readable

What does 'proper content structure' actually mean in practice? Can you give us an example?

In practice, having a proper content structure means:

  • Using proper headings (H1s, H2s, H3s) to organize your content hierarchically
  • Writing in plain language that's easy to understand for both people and AI
  • Marking up your content properly using HTML elements like paragraphs, lists, tables, captions, and quotes to give structure
  • Adding machine-readable structure through schema markup so AI crawlers can interpret the meaning, not just the text

Crawlers don't look at visual elements on your page; they read the code behind it. So even if your content looks clear visually, AI won't understand it properly if the underlying structure isn't there.
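To make this concrete, here's a minimal sketch of what that underlying structure looks like in HTML. The page topic, steps, and fee are invented for illustration:

```html
<!-- Hypothetical "How to apply" page: one H1, descriptive H2s, semantic elements -->
<article>
  <h1>How to apply for a building permit</h1>

  <h2>Who can apply</h2>
  <p>Any property owner, or an agent authorised to act on their behalf, can apply.</p>

  <h2>Steps to apply</h2>
  <ol>
    <li>Prepare your site plans.</li>
    <li>Complete the application form.</li>
    <li>Pay the assessment fee.</li>
  </ol>

  <h2>Fees</h2>
  <table>
    <caption>Permit assessment fees</caption>
    <tr><th>Permit type</th><th>Fee</th></tr>
    <tr><td>Residential</td><td>$250</td></tr>
  </table>
</article>
```

A crawler reading this sees the hierarchy (H1 → H2), a genuine ordered list of steps, and a captioned table – not just styled text that happens to look structured.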

What exactly do you mean by "machine-readable structure"? What are some characteristics of that?

Machine-readable structure primarily refers to schema markup, i.e. structured data that helps AI crawlers interpret your content without ambiguity.

Here's a practical example: Imagine you have a picture showing someone holding a microphone at a concert. Humans might recognize the band and venue, but a crawler won't know that from an image alone. Schema markup explicitly identifies the content type (event type: concert, performer, venue details) to remove that ambiguity.
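As a sketch, the JSON-LD for that concert page might look like this (the band, venue, and date are invented for illustration):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "MusicEvent",
  "name": "Example Band: Live in Sydney",
  "performer": { "@type": "MusicGroup", "name": "Example Band" },
  "location": {
    "@type": "MusicVenue",
    "name": "Example Arena",
    "address": "123 Example Street, Sydney NSW"
  },
  "startDate": "2025-11-15T20:00:00+11:00",
  "image": "https://www.example.com/images/concert.jpg"
}
</script>
```

The crawler no longer has to guess: the markup states outright that this is a concert, who is performing, and where.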

Does the machine readability focus mean shifting from best-practice answer-statement headings (back) to question headings in content structure?

Not necessarily. While LLMs do tend to prefer FAQ structures, you don't need to force all your content into question-format headings just to be AI-ready.

Machine-readability is more about the underlying code structure than the visible heading format. This means:

  • Using proper heading hierarchy (H1s, H2s, H3s) in your HTML
  • Writing in plain language
  • Adding schema markup to provide context AI can understand

Well-structured content with answer-statement headings (like "How to apply" or "Application process") can still be AI-ready if the underlying structure and markup are solid. The format matters less than ensuring your content clearly addresses user questions and is properly structured.

This is something our content auditing tools will address in the future. They'll help you identify where your content needs optimization to be AI-ready, rather than forcing all your content into FAQ formats.

Are there any AI industry standards adopted across major players/models?

While there isn't a single unified standard across all AI platforms, the principles discussed in the webinar apply broadly across major players. The key is that well-structured, machine-readable content performs better across all AI systems, whether that’s Google AI Overviews, ChatGPT, Perplexity, or other LLMs.

The best standards, while not specific to AI, are still schema.org and JSON-LD. If you follow those standards, most AI models will be able to parse your content reliably.

There are many types of schema markup. How do we decide what works for what content?

Choose schema types that match the purpose of your content. For example:

  • FAQPage – for pages that answer common public questions
  • HowTo – for step-by-step guides (e.g., how to apply for a building permit)
  • Article or NewsArticle – for updates or announcements
  • Dataset – for open data releases
  • Organization – for describing your organization's details
  • WebPage – as a default for general information pages

Where multiple schema types apply, you can nest or combine them - for example, a Service page might also include Organization and BreadcrumbList markup. The goal is to make each page's intent clear to AI systems and search engines, so they can identify you as the authoritative source on that topic.
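Here's a sketch of how that combination might look on a hypothetical council service page, using JSON-LD (all names and URLs are invented for illustration):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Service",
      "name": "Household waste collection",
      "provider": { "@id": "https://www.example.gov/#org" }
    },
    {
      "@type": "GovernmentOrganization",
      "@id": "https://www.example.gov/#org",
      "name": "Example City Council",
      "url": "https://www.example.gov/"
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://www.example.gov/" },
        { "@type": "ListItem", "position": 2, "name": "Services", "item": "https://www.example.gov/services" },
        { "@type": "ListItem", "position": 3, "name": "Household waste collection" }
      ]
    }
  ]
}
</script>
```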

How to test if you've got it right: Prompt an LLM to explain your page back to you and see how accurate the response is. If it's not as accurate as you'd like, improve the schema markup.

Here’s an example prompt you could use: "Act as an SEO and structured data specialist. Review the schema markup on [link to your website]. Identify what schema types are currently in use, whether they are valid and aligned with Google’s Rich Results guidelines, and where improvements can be made. Recommend additional schema types that would strengthen entity recognition, Answer Engine Optimisation (AEO), and visibility in AI-powered search. Suggest concrete changes (e.g. adding Organization, Service, FAQ, Article, BreadcrumbList) and explain why each matters for discoverability and performance. Highlight risks of over- or mis-use of schema. Present your findings as: Current State, Gaps, Recommendations, and Next Actions."

Do you have any insight into how discoverable content in PDFs/documents on our site would be vs on-page content?

When it comes to your own website's search experience, PDFs will work well, as Squiz Conversational Search will soon be able to handle unstructured data like PDFs effectively. However, for broader AI discoverability across external platforms like ChatGPT, Perplexity, and Google AI Overviews, HTML web pages are preferred because they allow for proper semantic structure, schema markup, and accessibility features that these AI systems rely on.

If discoverability across the broader AI ecosystem is important for that content, consider publishing it as a web page in addition to (or instead of) a PDF.

Optimization strategy and prioritization

For large websites with a huge body of content, how can we pivot quickly without resorting to the nuclear option (starting from scratch)?

Starting from scratch isn't realistic for most teams, and honestly not necessary. We recommend starting with just a "slice" of your website: choose a high-impact content area and optimize that first, rather than tackling your entire website at once.

Look for content areas that are important to your audience but lower-risk to experiment with. Think of areas like student life content, community resources, or general information pages, i.e. valuable content that supports your goals without being your most critical conversion pages. The goal is to start somewhere that will deliver value without putting high-stakes pages at risk while you're learning.

This approach works because:

  • You don't need to optimize everything at once
  • You can test, learn, and refine your strategy quickly
  • Quick wins build organizational buy-in
  • It's realistic for small teams without massive resources

You mentioned choosing a high-impact content area or "slice" to start optimizing. How do you determine what qualifies as a high-impact area?

When you're starting out, the sweet spot is content that's high-value but low-risk, i.e. meaningful to your audience without being business-critical. This gives you space to learn and refine your approach before tackling your most important pages.

Take student life content at a university as an example. It includes information about accommodation, clubs and societies, health services, and general facilities – all important to prospective and current students. But unlike admissions or scholarship pages, if something doesn't work perfectly during optimization, it won't stop students from applying or enrolling. That makes it ideal for testing and learning.

The same logic applies across industries. Look for content areas that:

  • Support important user needs or journeys
  • Have decent visibility but aren't your main conversion pages
  • Are contained enough to manage with your current resources
  • Won't cause major issues if there are hiccups while you’re optimizing

Use your analytics and content audit tools to identify these areas, but also consider what's manageable for your team to tackle as a first project.

We’re an agency that supports large enterprise organizations that may have 20-50K pages in an existing set of websites. Often the org knows that the content quality varies widely, but they have no scalable way to evaluate the content and estimate the level of work required to optimize it.

This is a challenge that we see time and time again – for organizations of this size, trying to optimize everything at once is overwhelming and not realistic.

We would suggest:

  1. Use auditing tools to get a comprehensive view of where problems exist across your content. A good auditing tool won't just flag issues; it will also give you the information you need to prioritize fixes:
    1. Content and accessibility auditing tools to flag issues like outdated content, duplicates, missing metadata, or quality problems
    2. Search analytics to understand what content matters most to your users and will have the highest impact
    This gives you a scalable way to assess content quality across large volumes without manual review.
  2. Use the slice approach
    Don't try to fix everything at once. Focus on one high-impact content area or user journey, optimize that, measure results, then move to the next slice. This makes the work manageable, gives you concrete data on effort required, and lets you scale based on what you learn from the first slice.

The key is to use tools to identify priority areas, start with one slice, and build your optimization approach iteratively rather than trying to tackle everything upfront.

We used to use keyword tools to target content for traditional SEO, but how do we know what to optimize for Generative Engine Optimization (GEO)?

The shift is from optimizing for keywords to optimizing for questions and answers. Focus on the questions your audience is actually asking in your search analytics, and ensure your content provides clear, well-structured answers that AI can easily interpret and cite, using the principles we've discussed above around structure, clarity, and machine-readability.

Testing is key. The best way to know if your content is AI-ready is to test it against actual AI models with real user questions. For Squiz customers, we're developing auditing tools that simulate how AI search will use your content to answer common FAQs, pinpointing exactly which fragments in your content need to be optimized. You can test, learn, optimize, and repeat until you're happy with the answers that are surfaced. Because you're testing against an actual AI model (in our case, Anthropic Claude), the resulting improvements in content quality will help you perform better across other AI platforms as well, whether it’s ChatGPT, Perplexity, or Gemini.

How are you finding real user questions to use in an audit?

The best source for real user questions is your own search analytics. We recommend looking at what users are actually searching for on your site - these queries tell you exactly what questions matter to your audience.

If you have conversational search implemented (like Squiz Conversational Search), your admins will also have access to conversation history showing the actual questions users are asking in natural language. This gives you invaluable insight into how people phrase their questions, what information they're looking for, and where gaps might exist in your content.

We recommend that you start with your existing search data, identify the most common queries and questions, and use those as the basis for your content audit. This ensures you're optimizing for questions that real users are actually asking, not just what you think they might ask.

Does users arriving further along in the decision journey mean you need to change the content?

It's less about changing the content itself and more about considering the user journey as a whole.

Because users are arriving on your site more informed and ready to take action, your content should have clear pathways for these highly qualified leads to get directly to what they're trying to do. Of course, your content still needs to work well for humans - i.e. it needs to be clear, helpful, and accurate - but the focus shifts to making it easy for informed visitors to take the next step.

In addition to following best practices as discussed, what else do I need to do for dual search (traditional + AI) if I'm starting my site from scratch?

You're in a great position! Starting from scratch means you can build with dual search in mind from day one, without needing to audit and fix existing content.

Here’s what we recommend:

  1. Map your key questions - identify what questions your audience will ask and what answers you need to provide
  2. Create a content development plan - ensure every page follows the principles we've covered: proper structure, schema markup, clear headings, plain language
  3. Use AI to support development - build a strong prompt that includes:
    • Non-negotiables that must be on every page
    • The questions you want each page to answer
    • SEO and GEO best practices
    • Then tailor this prompt for each specific page you create
  4. Implement and monitor - publish your content and track how it performs across both traditional and AI search from the beginning

Most importantly, we want to emphasize that you should treat this as an ongoing experiment, not a one-off build. The way AI systems surface and summarize content is evolving fast, so make testing part of your routine. Regularly prompt tools like ChatGPT, Perplexity, and Gemini to describe your site or specific pages back to you, and see if they capture your intended message. If they don’t, refine your structure, language, or schema. Think of it as training both humans and machines to understand your brand the way you want them to.

Authority and accuracy concerns

As we learn more about how AI prioritizes its sources (e.g., ranking Reddit highly), and with evidence showing that AI ranks what people say about you higher than your own content, how should we account for that in our content organization? This is particularly concerning from a government perspective, where ensuring accurate information reaches the public is critical.

LLMs tend to give more weight to official, trustworthy sources, especially for things like government information, legal content, or policy guidelines. However, as you’ve noticed, it’s not quite as simple as that. LLMs don’t just rank content by credibility; they look at context, intent, and consensus.

If someone asks a factual question like “What’s the official guidance on X?”, the model will usually surface government or institutional sources. But if the question leans more toward opinion or interpretation, e.g. “What do people think about policy X?”, then discussions on Reddit, LinkedIn, or in the media can rise higher, because that’s where the broader public conversation lives. And to the model, that conversation equals relevance.

That being said, here are some of the steps you can take (beyond the machine-readability and schema markup we discussed in the webinar):

  1. Build accurate signals across platforms
    Think of citations as the new backlinks. Previously you needed links to your site for SEO; now mentions matter more:
    1. Engage on platforms like Reddit and LinkedIn - not as PR, but to provide factual clarification
    2. Encourage credible third parties (academics, NGOs, trusted community leaders) to reference and echo your official information
  2. Monitor how AI represents you
    Treat this like media monitoring:
    1. Regularly test queries like "What is the policy on [topic]?" across major AI platforms (ChatGPT, Gemini, Perplexity, Claude)
    2. Track how these models currently cite or summarize your content
    3. Spot misinformation or drift early so you can correct it
  3. Reinforce your authority signals
    1. Maintain strong domain authority (.gov domains) with clear internal linking
    2. Include "last updated" metadata and explanations of policy changes

The key shift is that authority now comes from a combination of your own well-structured content plus credible mentions and citations across the ecosystem.
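On the "last updated" metadata point above, a simple way to express it is to pair a visible statement for humans with schema.org's date properties for crawlers. A minimal sketch with invented dates:

```html
<!-- Visible statement for humans, machine-readable dates for crawlers -->
<p>Last updated: 1 October 2025 (fee schedule revised)</p>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "Building permit rules",
  "datePublished": "2023-02-01",
  "dateModified": "2025-10-01"
}
</script>
```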

How can we compete to have our information prioritized by AI over other websites? I work in local government, and building construction rules are sometimes pulled by AI from a builder's website instead of ours, which can display incorrect information.

As mentioned in the previous question about government information, LLMs weigh both credibility and context - looking at where information is discussed and cited across the web, not just at official sources.

In addition to the content structure and schema principles covered in the webinar, there are specific steps you can take to establish yourself as the authoritative source:

  1. Establish digital trust signals
    AI models recognize and weigh trust indicators:
    1. Use your official government domain prominently (.gov, etc.) - models recognize these as official
    2. Ensure other authoritative domains link back to your pages (national agencies, regulatory bodies, regional government sites)
    3. Use canonical tags on all policy pages to declare the "main" version (see the sketch at the end of this answer)
  2. Engage beyond your website
    AIs learn from where people discuss topics, not just where official information is published. Seed accurate information in the places AI models draw from:
    1. Post clarifications on relevant forums and platforms (e.g. Reddit) from an official account
    2. Contribute to Wikipedia pages about your regulations and programs
    3. Collaborate with trusted industry bodies so their pages link to your content
    Each link and mention reinforces to AI models that your domain is the authority.
  3. Publish versioned, referenced content
    If other sites quote outdated rules, make it easy for AI (and humans) to identify current information:
    1. Include "effective from" and "superseded" statements on every rule page
    2. Maintain an archive of prior versions with redirects so AI models can identify which versions are current
  4. Monitor and correct the record
    Treat AI outputs as a channel you need to manage:
    1. Regularly test prompts relevant to your area (e.g., "What are the rules for [specific regulation] in [your jurisdiction]?")
    2. Track where AI pulls incorrect information and which sites are cited
    3. Contact those site owners (or use AI feedback tools) to request corrections

You can also signal quality to AI platforms by ensuring fast page loads, HTTPS, and clean metadata.
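To illustrate the canonical tag (point 1) and the versioning statements (point 3), here's a sketch of a policy page with those signals in place – the URL and dates are hypothetical:

```html
<head>
  <!-- Declares this URL as the authoritative version of the rule -->
  <link rel="canonical" href="https://www.example.gov/building/setback-rules" />
  <title>Residential setback rules | Example City Council</title>
</head>
<body>
  <h1>Residential setback rules</h1>
  <p>Effective from: 1 July 2025. This version supersedes the 2022 rules,
     which remain available in the <a href="https://www.example.gov/building/setback-rules-2022">archive</a>.</p>
</body>
```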

Team resourcing and implementation

In your experience, how long does it take to audit and implement a "slice"?

Our slice approach at Squiz goes beyond just auditing and optimizing, and also involves setting up conversational search on your website. In our experience, this generally takes 6-8 weeks for the first slice – but it may be shorter or longer depending on the topic's scope and the current state of your content.

However, this is just the first slice when your team is still learning the process, so it should get quicker as you scale to additional content areas.

We're a small team of 3 people. Is content optimization even realistic for us?

Yes, absolutely! The slice approach we've discussed is designed specifically for teams with resource constraints. You don't need to hire a large team to get started.

The key is strategic focus:

  • Use the "slice" approach. Start with one high-impact content area.
  • Focus on what will make the biggest difference first.
  • Build on successes incrementally.

Many successful optimization projects are run by small, focused teams. The difference isn't team size, it's having a clear strategy and taking an iterative approach.

Our content is generated by multiple departments. Do you have tips on how we can maintain consistency across teams?

There's no silver bullet, but these strategies work:

  1. Strong content governance: Create clear, up-to-date guidelines that specifically address AI and GEO optimization. Update existing brand voice and grammar guidelines to include how to write content for AI discovery.
  2. Cross-departmental working group: Meet regularly (even quarterly) to share insights, discuss what's working, and build capability across the organization.
  3. Leverage technology: Use tools that enforce consistency automatically. For example, in the Squiz DXP, you can set up AI prompts that help refine content within defined guardrails before publishing.

The most effective approach combines all three: building capability, maintaining clear standards, and leveraging technology to make consistency easier to achieve.

Testing and measurement

Does the 90% statistic you mentioned apply to Google's AI summaries linking out?

No, the 90% statistic refers specifically to ChatGPT citations, not Google's AI summaries.

Google's AI Overviews typically pull from pages that are already ranking well in traditional search results. This is different from ChatGPT, which often cites pages outside Google's top 20 results.

What digital content tools can I use to alert me to problems with content?

Our Product team at Squiz are building comprehensive auditing tools that help in two key ways:

  1. Scan your content for issues across multiple dimensions – content quality, accessibility, spellcheck, broken links, and adherence to your brand guidelines – and flag them for optimization with suggested prioritization.
  2. Simulate how AI search will use your content to answer common FAQs, pinpointing exactly which fragments of your content need to be optimized.

Together, this gives you visibility into any problems with your content so you know exactly what needs to be addressed.

After we optimize our content following these steps, how can we determine whether it's actually ready for AI discovery?

The best way to validate your content is by testing it against actual AI models with real user questions.

For Squiz customers, we're developing auditing tools that simulate how AI search will use your content to answer common FAQs, pinpointing exactly which fragments in your content are performing well and which need further optimization. You can test, learn, optimize, and repeat until you're happy with the answers that are surfaced.

Because you're testing against an actual AI model (in our case, Anthropic Claude), the resulting improvements in content quality will help you perform better across other AI platforms as well, whether it’s ChatGPT, Perplexity, or Gemini.

What metrics could you use to assess whether staff are finding information faster?

The most direct way to measure this is through user satisfaction metrics. If you have conversational search implemented, you can track:

  • Thumbs up/thumbs down ratings - direct feedback on whether answers were helpful
  • Conversation completion - reviewing conversation history to see if users are getting to the end of their query or abandoning mid-conversation, and whether they are getting answers on the first try or struggling

Beyond conversational search, you can look at:

  • Support ticket reduction - fewer "where can I find X?" requests to your internal helpdesk
  • Time to resolution - how quickly information-related queries are resolved

While efficiency gains can be hard to quantify directly, user satisfaction scores and support ticket trends can give you tangible indicators that staff are finding information faster.

We have noticed that ChatGPT is not accessing some fairly important types of content pages on our site. We plan to add schema to these pages. If we do this, how soon should we start to see changes in how ChatGPT finds them?

The timing is uncertain and varies by platform. Unlike traditional search engines that crawl websites regularly, AI platforms such as ChatGPT are updated periodically with new web data but don’t publish specific schedules for those updates. As a result, any changes you make won’t be reflected immediately in AI-generated responses.

However, here is what you can do:

  • Add schema markup as soon as you can. This improves your site’s clarity and structure, increasing the likelihood of being accurately represented when ChatGPT and other AI systems update their data.
  • Monitor regularly. Test by querying ChatGPT (and other AI platforms) from time to time to see when your updated content begins appearing.
  • Focus on overall content quality. Clear, well-structured, and schema-rich content is the best long-term strategy for improving visibility across all AI platforms.

Because AI update cycles are unpredictable, investing in broad content quality and technical accuracy, rather than trying to optimize for a single platform’s timing, is the most sustainable approach.

Platform-specific questions

Can you share real-world examples where Squiz's implementation of conversational AI search is being used?

Yes! You can check out our own implementation of Conversational Search on squiz.net – you can ask it anything about the Conversational Search feature. Check it out here.

If you’d like to see additional examples, please reach out to your account manager who can share relevant customer implementations with you. Alternatively, let us know at ask@squiz.net.

In terms of schemas/machine readable structure, does Squiz have any built-in structure that exports to AI industry standard/best practice to provide that context etc.? Or is that for us to create based on the likes of schema.org?

We don’t currently have a built-in structure that automatically exports to AI standards. This is something our content auditing tools will address in the future – they’ll help you identify where your content needs optimization to be AI-ready, rather than forcing all your content into FAQ formats.

In the meantime, for schema markup specifically, you’ll want to implement this based on your content types using standards like schema.org.

How do we implement AI search on our website?

Squiz offers Conversational AI Search for your website, allowing users to ask questions in natural language and get answers drawn from your content.

We've learned a lot from implementing this on our own site and working through it with customers, and we've developed a 5-step framework that our consulting team uses to guide you through the process. You can read about it in this blog post.

If you'd like to chat about how this could work for your site, reach out to your account manager or book a call with us.

Could you share a link to the previous webinar recordings in this series?

Absolutely! You can find all of our previous webinars here.