AI bots power some of the most advanced technologies we use today, from search engines to AI assistants. However, their growing presence has led a growing number of websites to block them.
Bots crawling your site come at a cost, and there is a social contract between search engines and website owners in which search engines add value by sending referral traffic to websites. This is what keeps most websites from blocking search engines such as Google, even as Google seems intent on keeping more of that traffic for itself.
When we looked at the traffic makeup of ~35 thousand websites in Ahrefs Analytics, we found that AI sends only 0.1% of total referral traffic, far behind search.
I think many site owners want to let these bots learn about their brand, their business, and their products and offerings. But while many people are betting that these systems are the future, right now they risk not adding enough value for website owners.
The first LLM that adds more value by showing impressions and clicks to website owners will probably have a substantial advantage. Companies will report on the metrics from that LLM, which will probably increase adoption and keep more websites from blocking its bot.
Bots consume resources, use data to train their AIs, and create potential privacy issues. As a result, many websites are choosing to block AI bots.
We looked at ~140 million websites, and our data shows that block rates for AI bots have increased significantly over the past year. A big thank-you to our data scientist Xibeijia Guan for pulling this data.
- The number of AI bots has doubled since August 2023, with 21 major AI bots now operating on the web.
- GPTBot (OpenAI) is the most blocked AI bot, blocked by 5.89% of all websites.
- ClaudeBot (Anthropic) saw the largest increase in block rate, up 32.67% over the past year.
The most blocked bots are also the most active. It is likely that lesser-known bots are blocked less because they are less well known and less active.
We looked at the total number of websites blocking bots. There are many ways to block bots with robots.txt, and this count includes all of them:
- Explicit blocks, where the bot is named and disallowed
- General blocks, where all bots are blocked
- Cases where a directive allows the bot after all bots have been blocked
Caveat: This does not include any other types of blocks, such as firewalls or IP blocks.
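
To make the three robots.txt cases above concrete, here is a minimal Python sketch. It is not the pipeline used for this study. It is a simplified reading of robots.txt that only looks at site-wide `Disallow: /` and `Allow: /` rules, and the function names (`parse_groups`, `blocks_everything`, `classify`) are hypothetical. It also assumes that a bot with its own group that does not disallow everything counts as allowed, even when `*` is blocked.

```python
def parse_groups(robots_txt):
    """Map each user-agent token (lowercased) to its (directive, path) rules."""
    groups = {}
    current_agents = []
    reading_agents = False  # True while consecutive User-agent lines share one group
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []
                reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def blocks_everything(rules):
    """Simplification: blocked means 'Disallow: /' with no 'Allow: /' in the group."""
    disallow_all = any(f == "disallow" and p == "/" for f, p in rules)
    allow_all = any(f == "allow" and p == "/" for f, p in rules)
    return disallow_all and not allow_all


def classify(robots_txt, bot):
    """Return 'explicit', 'general', or 'allowed' for the given bot name."""
    groups = parse_groups(robots_txt)
    bot = bot.lower()
    if bot in groups:
        # The bot's own group takes precedence over the '*' group.
        return "explicit" if blocks_everything(groups[bot]) else "allowed"
    if blocks_everything(groups.get("*", [])):
        return "general"
    return "allowed"


explicit = "User-agent: GPTBot\nDisallow: /\n"
general = "User-agent: *\nDisallow: /\n"
override = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"

print(classify(explicit, "GPTBot"))  # explicit
print(classify(general, "GPTBot"))   # general
print(classify(override, "GPTBot"))  # allowed
```

A real crawler applies longest-path matching across all rules in a group, so partial blocks such as `Disallow: /private/` need a fuller parser than this sketch.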
As I mentioned earlier, GPTBot is the most blocked bot. It is also the most active AI bot according to Cloudflare Radar.


There is a moderate positive correlation between the request rate and the block rate for these bots. Bots that make more requests tend to be blocked more. For the nerds, the numbers are a Pearson correlation coefficient of 0.512 and a p-value of 0.0149, which is statistically significant at the 5% level.
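
For readers who want to sanity-check a correlation like this, here is a rough Python sketch using only the standard library. It shows how a Pearson coefficient is computed and how the coefficient converts to a t statistic. The sample size of 22 is an assumption (the number of bots listed in this article's tables; the article does not state it), and 2.086 is the standard two-sided 5% critical value of Student's t with 20 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Assumption: n = 22 bots, so 20 degrees of freedom.
T_CRITICAL_DF20 = 2.086  # two-sided 5% critical value for df = 20

t = t_statistic(0.512, 22)
print(f"t = {t:.2f}, significant at 5%: {t > T_CRITICAL_DF20}")
```

Under that assumption the t statistic comes out at roughly 2.67, above the critical value, which is consistent with the reported significance at the 5% level.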


Here is the data for general blocks:


Here is the total number of websites blocking AI bots:


Here are the data:
Bot name | Count | Percentage | Bot operator |
---|---|---|---|
GPTBot | 8245987 | 5.89 | OpenAI |
CCBot | 8188656 | 5.85 | Common Crawl |
Amazonbot | 8082636 | 5.78 | Amazon |
Bytespider | 8024980 | 5.74 | ByteDance |
ClaudeBot | 8023055 | 5.74 | Anthropic |
Google-Extended | 7989344 | 5.71 | Google |
anthropic-ai | 7963740 | 5.69 | Anthropic |
FacebookBot | 7931812 | 5.67 | Meta |
omgili | 7911471 | 5.66 | Webz.io |
Claude-Web | 7909953 | 5.65 | Anthropic |
cohere-ai | 7894417 | 5.64 | Cohere |
ChatGPT-User | 7890973 | 5.64 | OpenAI |
Applebot-Extended | 7888105 | 5.64 | Apple |
Meta-ExternalAgent | 7886636 | 5.64 | Meta |
Diffbot | 7855329 | 5.62 | Diffbot |
PerplexityBot | 7844977 | 5.61 | Perplexity |
Timpibot | 7818696 | 5.59 | Timpi |
Applebot | 7768055 | 5.55 | Apple |
OAI-SearchBot | 7753426 | 5.54 | OpenAI |
Webzio-Extended | 7745014 | 5.54 | Webz.io |
Meta-ExternalFetcher | 7744251 | 5.54 | Meta |
Kangaroo Bot | 7739707 | 5.53 | Kangaroo LLM |
It gets a little more complicated. For the numbers above, we looked at the main robots.txt file for each website, but every subdomain can have its own set of instructions. If we look at all ~461 million robots.txt files, the total block rate for GPTBot rises to 7.3%.
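
As a small illustration of why subdomains matter, here is a Python sketch that derives the robots.txt location for any page URL. A robots.txt file applies per host, so a blog subdomain and the main site are governed by separate files. The helper name `robots_txt_url` and the example.com hosts are hypothetical.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs the host of page_url."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


pages = [
    "https://example.com/pricing",
    "https://blog.example.com/post/1",
    "https://blog.example.com/post/2",
]

# Two distinct hosts, so two separate robots.txt files to check.
for url in sorted({robots_txt_url(p) for p in pages}):
    print(url)
```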
AI bot blocks over time
Blocking of AI bots rose in 2024, but the trend dipped toward the end of the year. The decline seems to come mainly from general blocks. The trend for blocks aimed specifically at AI bots is still rising, and I will show you that in a minute.


Do some types of sites block AI more?
Here’s how it breaks down for each bot across website categories. I actually expected news sites to block more than other categories, because there have been many stories about news sites blocking these bots, but Arts & Entertainment (45% blocked) and Law & Government (42% blocked) sites blocked them more.


The decision to block AI bots varies by industry, and there can be many different reasons. These are somewhat speculative:
- Arts and entertainment: ethical aversion, reluctance to become training data.
- Books and literature: copyright.
- Law and government: legal concerns, compliance.
- News and media: preventing their articles from being used to train AI models that could compete with their journalism and cut into their revenue.
- Shopping: preventing price scraping or inventory monitoring by competitors.
- Sports: similar to news and media, with concerns about revenue.
For this part, we only looked at cases where a given bot is explicitly disallowed. It does not include general blocks or cases where only some bots are allowed. In these cases, site owners went out of their way to block specific bots.
Again, GPTBot is the most targeted, followed by Common Crawl's bot (CCBot). Common Crawl data is probably used as a data source for most LLMs.
Here are the most blocked AI bots among sites that name them explicitly:


Here is the data on the number of websites explicitly blocking each bot:


Here are the data:
Bot name | Count | Percentage | Bot operator |
---|---|---|---|
GPTBot | 693639 | 0.5 | OpenAI |
CCBot | 682861 | 0.49 | Common Crawl |
Amazonbot | 469086 | 0.34 | Amazon |
Bytespider | 461706 | 0.33 | ByteDance |
Google-Extended | 415821 | 0.3 | Google |
ClaudeBot | 393511 | 0.28 | Anthropic |
anthropic-ai | 383176 | 0.27 | Anthropic |
FacebookBot | 361803 | 0.26 | Meta |
omgili | 322502 | 0.23 | Webz.io |
ChatGPT-User | 310430 | 0.22 | OpenAI |
cohere-ai | 306385 | 0.22 | Cohere |
Claude-Web | 276411 | 0.2 | Anthropic |
Applebot-Extended | 258451 | 0.18 | Apple |
Meta-ExternalAgent | 245176 | 0.18 | Meta |
PerplexityBot | 214488 | 0.15 | Perplexity |
Diffbot | 213828 | 0.15 | Diffbot |
Timpibot | 174434 | 0.12 | Timpi |
Applebot | 163148 | 0.12 | Apple |
OAI-SearchBot | 110376 | 0.08 | OpenAI |
Webzio-Extended | 100572 | 0.07 | Webz.io |
Meta-ExternalFetcher | 99993 | 0.07 | Meta |
Kangaroo Bot | 95056 | 0.07 | Kangaroo LLM |
Explicit blocks of AI bots over time
As you can see, AI bots are starting to be blocked by many more sites, most of all the major commercial bots.


The number of AI bots more than doubled in just over a year, from 10 in August 2023 to 21 in December 2024. More new entrants in the market mean more bots consuming website resources.
ClaudeBot saw the fastest growth in blocks of any bot over the past year.


Here are the data:
Bot name | Growth % | Absolute growth |
---|---|---|
ClaudeBot | 32.67% | 0.85 |
anthropic-ai | 25.14% | 0.67 |
Claude-Web | 20.66% | 0.54 |
Bytespider | 19.57% | 0.54 |
ChatGPT-User | 15.52% | 0.47 |
PerplexityBot | 15.37% | 0.4 |
GPTBot | 13.38% | 0.53 |
cohere-ai | 12.45% | 0.32 |
FacebookBot | 11.71% | 0.32 |
CCBot | 11.41% | 0.44 |
Amazonbot | 10.22% | 0.3 |
Google-Extended | 10.07% | 0.3 |
Diffbot | 8.98% | 0.23 |
omgili | 8.96% | 0.25 |
Applebot-Extended | 7.11% | 0.18 |
Meta-ExternalAgent | 5.90% | 0.15 |
OAI-SearchBot | 2.17% | 0.06 |
Timpibot | 0.01% | 0 |
Webzio-Extended | -1.69% | -0.04 |
Applebot | -3.32% | -0.09 |
Meta-ExternalFetcher | -4.32% | -0.11 |
Kangaroo Bot | -5.89% | -0.15 |
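
The two growth columns measure different things, and a short Python sketch may help tell them apart. It assumes "Growth %" is relative growth (change divided by the starting block rate) and "Absolute growth" is the change in percentage points. The article does not spell out either definition, and the function names are hypothetical.

```python
def growth(start_rate, end_rate):
    """Return (relative growth in %, absolute growth in percentage points)."""
    absolute = end_rate - start_rate
    relative = absolute / start_rate * 100
    return relative, absolute


def implied_start_rate(relative_pct, absolute):
    """Starting rate implied by a relative growth figure and an absolute change."""
    return absolute / (relative_pct / 100)


# Hypothetical rates: a block rate moving from 2.0% to 2.5%.
relative, absolute = growth(2.0, 2.5)
print(relative, absolute)  # 25.0 0.5

# ClaudeBot row from the table above, under the stated assumptions.
print(round(implied_start_rate(32.67, 0.85), 2))
```

Because the absolute figures in the table are rounded to two decimals, any starting rate backed out this way is only approximate.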
Final thoughts
It will be interesting to see how block rates evolve as more and more of these bots consume ever more resources. Will they be able to fulfill this social contract with site owners and send them more traffic, or will they decide to keep that traffic for themselves?
I think that if they take the walled garden approach, more websites will block the bots, and these systems will either have to pay websites for access to their data, or the bots may end up crawling websites anyway and ignoring the blocks. There have already been several reports of some bots ignoring blocks.
What about you? Do you block them on your website, or do you see value in giving them access? Let me know on X or LinkedIn.