The rise of artificial intelligence has sparked a quiet war on the Internet. On one side are AI companies eager to train models on massive amounts of data; on the other are worried content creators and publishers. They find their work being used for AI training without compensation, and they are forced either to leave the door open or to build high walls (walled gardens) that block access to the content altogether. Now Internet infrastructure giant Cloudflare is proposing a third way, attempting to defuse the conflict with an almost forgotten part of the HTTP protocol.
Breaking the binary choice: pay per crawl
The dilemma facing content owners is real. Some media companies, such as The New York Times, have taken legal action against OpenAI and Microsoft, accusing them of copyright infringement. Others, such as Axel Springer and the Associated Press, have entered into licensing agreements with AI companies, trading content for financial and technical cooperation. But the bar for negotiating these one-off deals is extremely high, putting them nearly out of reach for small and medium-sized content creators.
Cloudflare's proposal, called "Pay per Crawl," centers on giving content owners a third option between "completely open" and "completely closed": charging for access. Rather than creating an entirely new technology, the solution "resurrects" a long-neglected HTTP status code that has been reserved for future use for a long time: 402 Payment Required.
This status code was originally intended for digital cash or micropayment systems but was never widely adopted. Cloudflare is bringing it back with the aim of creating a programmatic framework for monetizing content at web scale.
How does "402 Payment Required" work?
"Pay per Crawl," currently in private beta, allows website owners to set a flat, per-request price for their content. When an AI crawler visits the site, the publisher has three options:
- Allow: Serve the content freely.
- Charge: Require payment from the crawler at the set price.
- Block: Deny access entirely.
Interestingly, even if a crawler has no payment relationship with Cloudflare, the publisher can still choose "Charge". Functionally, this is equivalent to a network-level block (returning 403 Forbidden), but it also signals to the crawler that a paid relationship could exist in the future.
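The three publisher options can be sketched as a small decision function. This is an illustrative model only, not Cloudflare's implementation: the names `Policy` and `respond` and the `payment_agreed` flag are hypothetical, and the sketch assumes that a "Charge" policy answers with 402 whenever no payment has been agreed, including when the crawler has no billing relationship at all.

```python
from enum import Enum


class Policy(Enum):
    ALLOW = "allow"
    CHARGE = "charge"
    BLOCK = "block"


def respond(policy: Policy, payment_agreed: bool) -> tuple[int, bool]:
    """Return (HTTP status, content_served) for one crawler request.

    payment_agreed is a hypothetical flag: True only when the crawler has a
    billing relationship and has accepted the price for this request.
    """
    if policy is Policy.ALLOW:
        return 200, True
    if policy is Policy.BLOCK:
        return 403, False
    # Policy.CHARGE: without an agreed payment no content is returned, just
    # as with a block, but the 402 status tells the crawler that paid access
    # could exist (assumption: 402 is used in this case).
    if payment_agreed:
        return 200, True
    return 402, False
```

For a crawler that cannot pay, "Charge" and "Block" both withhold the content; only the status code differs.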
A cornerstone of trust: verifying the identity of a crawler
The key challenge for this system is making sure that a paying crawler is who it claims to be and not an impostor. Cloudflare addresses this with a scheme called Web Bot Auth, which uses cryptographic signatures on HTTP messages to verify that a request really does come from a given automated bot.
For the crawler operator, the whole process is as follows:
- Generate keys: Create an Ed25519 key pair.
- Publish the public key: Publish the public key in JWK format in a self-hosted directory.
- Register: Provide Cloudflare with the URL of the public key directory and the crawler's user agent information.
- Sign requests: Attach a message signature to every HTTP request.
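The publishing step can be illustrated with a short sketch that turns a raw Ed25519 public key into a JWK and wraps it in a directory document. Several things here are assumptions rather than details from Cloudflare: the `{"keys": [...]}` directory layout, the use of an RFC 7638 JWK thumbprint as the key identifier, and the helper names. Key generation itself is left out because Python's standard library has no Ed25519 support, so a placeholder byte string stands in for a real public key.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWK fields require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ed25519_jwk(public_key: bytes) -> dict:
    """Wrap a raw 32-byte Ed25519 public key as a JWK (RFC 8037)."""
    if len(public_key) != 32:
        raise ValueError("an Ed25519 public key is exactly 32 bytes")
    return {"kty": "OKP", "crv": "Ed25519", "x": b64url(public_key)}


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint (RFC 7638): required members only,
    sorted by name, serialized without whitespace."""
    canonical = json.dumps(
        {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]},
        separators=(",", ":"),
        sort_keys=True,
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


# Placeholder bytes, NOT a real key: substitute the public half of the
# Ed25519 key pair generated in the first step.
PLACEHOLDER_PUBLIC_KEY = bytes(range(32))

jwk = ed25519_jwk(PLACEHOLDER_PUBLIC_KEY)
# Assumed directory layout: a JWK Set with the thumbprint as "kid".
directory = {"keys": [{**jwk, "kid": jwk_thumbprint(jwk)}]}
print(json.dumps(directory, indent=2))
```

The printed JSON is what the operator would host at the directory URL that is later given to Cloudflare during registration.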
When the crawler makes a request, the request headers include the Signature-Agent, Signature-Input, and Signature fields for authentication.
// Example request carrying a digital signature, used to verify the crawler's identity
GET /example.html
Signature-Agent: "https://signature-agent.example.com"
Signature-Input: sig2=("@authority" "signature-agent")
  ;created=1735689600
  ;keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"
  ;alg="ed25519"
  ;expires=1735693200
  ;nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4lAZYadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLElW1qg=="
  ;tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:
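To make the example request more concrete, the sketch below rebuilds the string that such a signature covers, following the signature base layout of HTTP Message Signatures (RFC 9421), and checks the created/expires window. The host `example.com` is an assumption, since the example request does not show one, and the function names are hypothetical. Verifying the Ed25519 signature itself would need a third-party cryptography library and is not shown.

```python
import re

# Signature parameters copied from the example request (the value of
# Signature-Input after "sig2=", joined into one line).
PARAMS = (
    '("@authority" "signature-agent")'
    ";created=1735689600"
    ';keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"'
    ';alg="ed25519"'
    ";expires=1735693200"
    ';nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4lAZYadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLElW1qg=="'
    ';tag="web-bot-auth"'
)


def signature_base(authority: str, signature_agent: str, params: str) -> str:
    """Build the signed string for the covered components
    ("@authority" "signature-agent"), one line per component."""
    lines = [
        f'"@authority": {authority.lower()}',
        f'"signature-agent": {signature_agent}',
        f'"@signature-params": {params}',
    ]
    return "\n".join(lines)  # no trailing newline


def is_within_validity(params: str, now: int) -> bool:
    """Check that `now` (Unix seconds) lies between created and expires."""
    created = int(re.search(r";created=(\d+)", params).group(1))
    expires = int(re.search(r";expires=(\d+)", params).group(1))
    return created <= now <= expires


base = signature_base(
    "example.com",  # assumed host, not shown in the example request
    '"https://signature-agent.example.com"',
    PARAMS,
)
print(base)
```

A verifier would fetch the public key named by `keyid` from the signature agent's directory and check the `Signature` value against this base string.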
Two payment models: reactive and proactive
In practice, paid interactions are divided into two models:
- Reactive: The crawler sends a request first. If the target content requires payment, the server returns an HTTP 402 Payment Required response whose crawler-price response header states the price. Once the crawler receives it, it can decide whether to retry the request with a crawler-exact-price header, indicating that it agrees to pay.
- Proactive: The crawler includes a crawler-max-price request header stating the maximum price it is willing to pay. If the price of the content is less than or equal to that maximum, the server returns HTTP 200 OK with the content and confirms the amount actually charged in the crawler-charged response header. If the price of the content is higher than the crawler's bid, the server returns a 402 response.
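Both flows can be modeled from the server's side in a few lines. This is a sketch under stated assumptions, not Cloudflare's logic: the header values are assumed to look like `USD 0.01`, the function names are hypothetical, and the 402 sent in the proactive case is assumed to carry the same crawler-price header as in the reactive case.

```python
from decimal import Decimal


def parse_price(value: str) -> Decimal:
    """Parse an assumed 'USD <amount>' header value."""
    currency, amount = value.split()
    if currency != "USD":
        raise ValueError(f"unsupported currency: {currency}")
    return Decimal(amount)


def format_price(price: Decimal) -> str:
    return f"USD {price}"


def handle(request_headers: dict, price: Decimal) -> tuple[int, dict]:
    """Return (status, response headers) for content priced at `price`."""
    exact = request_headers.get("crawler-exact-price")
    maximum = request_headers.get("crawler-max-price")

    # Reactive retry: the crawler repeats the quoted price to accept it.
    if exact is not None and parse_price(exact) == price:
        return 200, {"crawler-charged": format_price(price)}

    # Proactive: the crawler's bid covers the price, so it is charged
    # the actual price, not its maximum.
    if maximum is not None and parse_price(maximum) >= price:
        return 200, {"crawler-charged": format_price(price)}

    # No acceptable offer: quote the price.
    return 402, {"crawler-price": format_price(price)}


price = Decimal("0.01")
print(handle({}, price))
print(handle({"crawler-exact-price": "USD 0.01"}, price))
print(handle({"crawler-max-price": "USD 0.05"}, price))
```

Using `Decimal` rather than floats keeps small per-request amounts exact when they are compared.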
Cloudflare acts as the merchant of record: it aggregates transactions, charges crawlers, and ultimately distributes the proceeds to content publishers.
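The aggregation role can be pictured as a simple ledger roll-up. The sketch below is purely illustrative: the record layout and the names are hypothetical, and it models no fees or payout schedule because the article does not describe any.

```python
from collections import defaultdict
from decimal import Decimal


def settle(charges):
    """Sum per-request charges into totals per crawler and per publisher.

    `charges` is an iterable of (crawler, publisher, amount) tuples,
    a hypothetical record format for this sketch.
    """
    owed_by_crawler = defaultdict(Decimal)    # what each crawler is billed
    owed_to_publisher = defaultdict(Decimal)  # what each publisher receives
    for crawler, publisher, amount in charges:
        owed_by_crawler[crawler] += amount
        owed_to_publisher[publisher] += amount
    return dict(owed_by_crawler), dict(owed_to_publisher)


charges = [
    ("crawler-a", "news.example", Decimal("0.01")),
    ("crawler-a", "blog.example", Decimal("0.02")),
    ("crawler-b", "news.example", Decimal("0.01")),
]
billed, payouts = settle(charges)
print(billed)
print(payouts)
```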
A look into the future: from crawlers to AI agents
"Pay per Crawl" may be about much more than solving the crawler data problem of the day; Cloudflare is really looking at a future dominated by AI Agents.
Imagine your personal AI assistant needs to write a review of the latest cancer research for you, or find you the best local restaurant, and you give this agent a budget. It could be programmed to use the HTTP 402 protocol to automatically negotiate, pay for, and access the highest-quality, most relevant content from various information sources.
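A budgeted agent of this kind might look roughly like the sketch below, which follows the reactive flow from the client's side. Everything here is hypothetical: the `BudgetedAgent` class, the `fake_fetch` stand-in for real HTTP calls, the example URLs, and the `USD <amount>` header format are assumptions made for illustration.

```python
from decimal import Decimal

# Hypothetical per-URL prices served by the fake origin below.
PRICES = {
    "https://a.example/review": Decimal("0.02"),
    "https://b.example/paper": Decimal("0.10"),
}


def fake_fetch(url, headers):
    """Stand-in for an HTTP client: returns (status, headers, body)."""
    quoted = f"USD {PRICES[url]}"
    if headers.get("crawler-exact-price") == quoted:
        return 200, {"crawler-charged": quoted}, f"content of {url}"
    return 402, {"crawler-price": quoted}, None


class BudgetedAgent:
    def __init__(self, budget: Decimal, fetch):
        self.remaining = budget
        self.fetch = fetch

    def get(self, url):
        """Fetch `url`, paying the quoted price only if the budget allows."""
        status, headers, body = self.fetch(url, {})
        if status == 200:
            return body
        if status != 402:
            return None
        quoted = headers["crawler-price"]
        if Decimal(quoted.split()[1]) > self.remaining:
            return None  # too expensive: skip this source
        # Retry, echoing the quoted price to signal agreement to pay.
        status, headers, body = self.fetch(url, {"crawler-exact-price": quoted})
        if status != 200:
            return None
        self.remaining -= Decimal(headers["crawler-charged"].split()[1])
        return body


agent = BudgetedAgent(Decimal("0.05"), fake_fetch)
print(agent.get("https://a.example/review"), agent.remaining)
print(agent.get("https://b.example/paper"), agent.remaining)
```

The second request is declined because its quoted price exceeds what is left of the budget, so nothing is charged for it.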
This marks a technological shift towards a robust, automated mechanism that gives creators real control over the value of their digital assets. While this system is still in its very early stages, with issues such as dynamic pricing and more granular licensing models still to be explored, it opens a new door to building a fairer and more diverse Internet content ecosystem.