Google Search Advocate John Mueller has expressed skepticism regarding the growing trend of creating dedicated Markdown or JSON pages exclusively for large language models (LLMs). Mueller questioned the necessity of building content formats that are unseen by human users, emphasizing that clean HTML and structured data should remain the primary focus for content creators.
The discussion originated on Bluesky, where SEO expert Lily Ray asked for Google's perspective on a practice she had been hearing about more often: publishers generating "shadow" copies of their content in Markdown or JSON, formats believed to be easier for AI systems to digest, and serving those URLs only to bots.
Not sure if you can answer, but starting to hear a lot about creating separate markdown / JSON pages for LLMs and serving those URLs to bots. Can you share Google's perspective on this?
Ray further noted that this has become a "hot topic," with companies actively pitching solutions for creating such AI-specific content versions. A more active discussion on this topic also took place on X (formerly Twitter).
Mueller's Stance on LLM-Only Pages
In response to Ray's inquiry, Mueller stated that he was unaware of any internal Google requirements that would necessitate such a setup. He pointed out that LLMs have been successfully trained on and have parsed standard web pages from their inception.
I'm not aware of anything in that regard. In my POV, LLMs have trained on – read & parsed – normal web pages since the beginning, it seems a given that they have no problems dealing with HTML. Why would they want to see a page that no user sees? And, if they check for equivalence, why not use HTML?
When Ray followed up, asking if a separate format might "expedite getting key points across to LLMs quickly," Mueller argued that if specific file formats made a significant difference, the companies developing and running these AI systems would be very vocal about it.
If those creating and running these systems knew they could create better responses from sites with specific file formats, I expect they would be very vocal about that. AI companies aren't really known for being shy.
He conceded that some pages might work better for AI systems than others, but he doubted that the file format, HTML versus Markdown, is the main reason. He also noted that content that depends on JavaScript still presents challenges for many AI systems.
Collectively, Mueller's comments suggest that, from Google's perspective, there's no immediate need for publishers to create bot-only Markdown or JSON duplicates of existing pages merely to ensure LLM comprehension.
The Role of Structured Data
Other contributors to the discussion differentiated between speculative "shadow" formats and instances where AI platforms have clearly defined content requirements. Matt Wright, for example, highlighted OpenAI's eCommerce product feeds as a case where JSON schemas are crucial.
In such contexts, a precise specification dictates how platforms like ChatGPT ingest and display product data. Wright explained:
Interestingly, the OpenAI eCommerce product feeds are live: JSON schemas appear to have a key role in AI search already.
This example supports the idea that structured feeds and schemas matter most when a platform explicitly publishes a specification and requires its use. Additionally, Chris Long observed on LinkedIn that "editorial sites using product schemas tend to get included in ChatGPT citations," further underscoring the value of structured data in specific, well-defined scenarios.
Why This Matters for Publishers and SEOs
For publishers and SEO professionals contemplating the development of "LLM-optimized" Markdown or JSON versions of their content, this exchange offers valuable guidance. Mueller's remarks reinforce the long-standing capability of LLMs to effectively read and parse standard HTML.
For most websites, a more productive approach involves continuously improving the speed, readability, and overall content structure of existing pages. Implementing schema markup is also beneficial, particularly where clear platform guidelines or specifications exist.
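As a rough illustration of what schema markup on an existing HTML page can look like, the Python sketch below builds a schema.org Article block as JSON-LD and wraps it in a script tag. The function name `build_json_ld` and the sample values are hypothetical and chosen only for this example. Which schema.org types and properties a given platform actually reads is an assumption here and should be checked against that platform's documentation.

```python
import json


def build_json_ld(headline: str, author: str, date_published: str) -> str:
    """Return a <script> tag carrying schema.org Article markup as JSON-LD.

    Hypothetical helper for illustration; the property set is a minimal example.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
    }
    # Escape "</" so a value cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Placeholder values, not taken from any real page.
    print(build_json_ld("Example headline", "Jane Doe", "2025-01-01"))
```

The markup lives inside the same HTML page that human visitors see, so it adds machine-readable structure without creating a separate bot-only version of the content.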
While the Bluesky discussion indicates that AI-specific formats are emerging in niche areas like product feeds, these are typically tied to explicit integrations rather than a general rule that Markdown is inherently superior for LLMs. These specific requirements are certainly worth monitoring.
Looking Ahead
This conversation shows how quickly AI-driven changes in search turn into technical requests for SEO and development teams, often before any comprehensive documentation exists. Until LLM providers publish more concrete guidelines, the current advice points back to fundamental best practices:
- Maintain clean HTML.
- Reduce unnecessary JavaScript, especially where it hinders content parsing.
- Utilize structured data where platforms have clearly documented schemas.
By adhering to these principles, content creators can help keep their material accessible and understandable to both human users and AI systems.
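One way to act on the JavaScript point is to check whether a page's key content is already present in the HTML the server returns, before any script runs. The sketch below does this with Python's standard `html.parser` module. The class `VisibleTextParser`, the function `missing_from_raw_html`, and the sample pages are hypothetical names and data for this example. It assumes a crawler that reads raw HTML without executing JavaScript, which may not match how any particular AI system behaves.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text found in raw HTML, ignoring script and style bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def missing_from_raw_html(html, phrases):
    """Return the phrases that do not appear in the page's static text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [p for p in phrases if p.lower() not in text]


if __name__ == "__main__":
    # Made-up pages: one server-rendered, one filled in by JavaScript.
    static_page = "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"
    js_page = '<html><body><div id="app"></div><script>render("Pricing")</script></body></html>'
    print(missing_from_raw_html(static_page, ["Pricing"]))
    print(missing_from_raw_html(js_page, ["Pricing"]))
```

A phrase reported as missing suggests that the content only appears after client-side rendering, which makes that page a candidate for server-side rendering or a plain HTML fallback.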









