Cloudflare gives creators new tool to control use of their content
Cloudflare has introduced a way to help website owners and publishers gain more control over their content: a new Content Signals Policy that makes it easy for any website owner to update their robots.txt, the simple text file that tells web crawlers which parts of a site they may or may not access. The policy enables website operators to express preferences over how their data is used by others, including the ability to opt out of AI overviews and inference.
The Internet is shifting from ‘search engines’, which provided a treasure map of links that a user could explore for information, to ‘answer engines’ powered by AI, which give a direct answer without a user ever needing to click on the original site’s content. This severely threatens the original business model of the Internet, where websites, publishers and content creators could earn money or attention by driving traffic and views to their site.
Today, AI crawlers scrape vast troves of data from websites, but website operators have no way to express the nuances of whether, how and for what purpose their content may be used. The robots.txt file allows website operators to specify which crawlers are allowed and which parts of a website they can access. It does not, however, tell a crawler what it may do with the content after accessing it. There needs to be a standard, machine-readable way to signal how data can be used even after it has been accessed.
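To make that gap concrete, here is a minimal sketch using Python's standard urllib.robotparser and a hypothetical crawler name (ExampleAIBot is used purely for illustration): robots.txt can answer whether a crawler may fetch a path, but carries no information about what the crawler may do with the content afterwards.

```python
# Minimal sketch: today's robots.txt answers "may this crawler fetch this
# path?" but says nothing about what happens to the content afterwards.
# "ExampleAIBot" is a hypothetical crawler name used only for illustration.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Access control is expressible: a well-behaved crawler can check each URL.
print(parser.can_fetch("ExampleAIBot", "https://example.com/private/page"))  # False
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1"))    # True

# Nothing in the file says whether fetched pages may be used for search
# indexing, AI answers or model training; that is the gap a usage signal
# has to fill.
```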
“The Internet cannot wait for a solution, while in the meantime, creators’ original content is used for profit by other companies,” said Matthew Prince, co-founder and CEO of Cloudflare. “To ensure the web remains open and thriving, we’re giving website owners a better way to express how companies are allowed to use their content. Robots.txt is an underutilised resource that we can help strengthen, and make it clear to AI companies that they can no longer ignore a content creator's preferences.”
Cloudflare believes that an operator of a website, API, MCP server or any Internet-connected service, whether they are a local news organisation, AI startup or an ecommerce shop, should get to decide how their data is used by others for commercial purposes. Today, more than 3.8 million domains use Cloudflare’s managed robots.txt service to express that they do not want their content used for training. Now, Cloudflare's new Content Signals Policy will enable users to strengthen their robots.txt preferences with a clear set of instructions for anyone accessing the website via automated means, such as an AI crawler. The policy, illustrated in the sketch following this list, informs crawlers by:
- Explaining how to interpret the content signals in simple terms: ‘yes’ means allowed, ‘no’ means not allowed, and no signal means no expressed preference.
- Defining the different ways that a crawler typically uses content in clear terms, including search, AI input and AI training.
- Reminding companies that website operators’ preferences in robots.txt files can have legal significance.
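As a rough illustration of how such signals could sit alongside ordinary robots.txt rules, the sketch below assembles a robots.txt with a content-signal line covering the search, AI input and AI training uses described above. The directive and category names follow how Cloudflare describes the policy, but treat the exact syntax as an assumption, and the yes/no values shown as an example rather than a recommended configuration.

```python
# Illustrative sketch only: assembles a robots.txt that keeps ordinary
# access rules and adds a content-signal line. The category names mirror
# the three uses described above (search, AI input, AI training); the
# yes/no values chosen here are an example, not a recommendation.
signals = {
    "search": "yes",    # allow use in traditional search results
    "ai-input": "no",   # opt out of content being fed into AI answers
    "ai-train": "no",   # opt out of AI model training
}

content_signal = ", ".join(f"{key}={value}" for key, value in signals.items())

robots_txt = "\n".join([
    "# 'yes' means allowed, 'no' means not allowed,",
    "# and an omitted signal means no expressed preference.",
    f"Content-Signal: {content_signal}",
    "",
    "User-agent: *",
    "Allow: /",
])

print(robots_txt)
```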
While robots.txt files may not stop unwanted scraping, Cloudflare’s aim is that this improved policy language will communicate a website owner’s preferences to bot operators more clearly, and push companies to respect content creators’ preferences.
Starting today, Cloudflare will automatically update robots.txt files to include the new policy language for all customers who ask Cloudflare to manage their robots.txt file. For anyone who wants to declare how crawlers can use their content via a customised robots.txt file, Cloudflare is publishing tools to help.
Organisations have seen the need for solutions like the Content Signals Policy as a way to gain more say over how their content is used.
“We are thrilled that Cloudflare is offering a powerful new tool, now widely available to all users, for publishers to dictate how and where their content is used,” said Danielle Coffey, President and CEO of the News/Media Alliance. “This is an important step towards empowering publishers of all sizes to reclaim control over their own content and ensure they can continue to fund the creation of quality content that users rely on.
“We hope that this encourages tech companies to respect content creators’ preferences. Cloudflare is showing that doing the right thing isn’t just possible, it’s good business.”
To learn more, read Cloudflare’s Content Signals blog.