The AI Bots That ~140 Million Websites Block the Most

AI bots power some of the most advanced technologies we use today, from search engines to AI assistants. However, their growing presence has led a growing number of websites to block them.

Bots crawling your site come at a cost, and there is a social contract between search engines and website owners: search engines add value by sending referral traffic to websites. This is what keeps most websites from blocking search engines such as Google, even as Google seems intent on keeping more of that traffic for itself.

When we looked at the traffic makeup of ~35 thousand pages in Ahrefs Analytics, we found that AI sends only 0.1% of total referral traffic, far behind search.

I think many site owners want to let these bots learn about their brand, their business, and their products and offers. But while many people are betting that these systems are the future, right now they risk not adding enough value for website owners.

The first LLM to add more value for users by showing impressions and clicks to website owners will probably have a substantial advantage. Companies will report on the metrics from that LLM, which will probably increase adoption and keep more websites from blocking its bot.

Bots use resources, use the data to train their AIs, and create potential privacy problems. As a result, many websites choose to block AI bots.

We looked at ~140 million websites, and our data shows that block rates for AI bots have increased significantly over the past year. A big thank you to our data scientist Xibeijia Guan for pulling this data.

  • The number of AI bots has doubled since August 2023, with 21 major AI bots now active on the web.
  • GPTBot (OpenAI) is the most blocked AI bot, blocked by 5.89% of all websites.
  • ClaudeBot (Anthropic) saw the highest growth in block rate, increasing by 32.67% over the past year.

The most blocked bots are also the most popular ones. Lesser-known bots are probably blocked less because they are less well known and less active.

We looked at the total number of websites blocking each bot. There are many ways to block bots in robots.txt, and this count covers all of them, including:

  • Explicit blocks, where the bot is named and disallowed
  • General blocks, where all bots are disallowed
  • Any cases where an allow directive was used for the bot after all bots were blocked
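
The three cases above can be told apart with Python's standard `urllib.robotparser`. This is a minimal sketch, not the method used for the study: the helper name `classify_block` and the sample robots.txt bodies are hypothetical, and it only checks whether the site root (`/`) may be fetched.

```python
from urllib.robotparser import RobotFileParser


def classify_block(robots_txt: str, bot: str) -> str:
    """Classify how a robots.txt body treats `bot` at the site root.

    Hypothetical helper for illustration; only the path "/" is checked.
    """
    lines = robots_txt.splitlines()

    parser = RobotFileParser()
    parser.parse(lines)

    # Is the bot named in any User-agent line (case-insensitive)?
    named = any(
        line.split(":", 1)[1].strip().lower() == bot.lower()
        for line in (raw.strip() for raw in lines)
        if line.lower().startswith("user-agent:")
    )
    blocked = not parser.can_fetch(bot, "/")

    if blocked:
        return "explicit block" if named else "general block"
    return "explicitly allowed" if named else "not blocked"


# Made-up robots.txt bodies, one per case.
EXPLICIT = "User-agent: GPTBot\nDisallow: /\n"
GENERAL = "User-agent: *\nDisallow: /\n"
ALLOW_AFTER_BLOCK = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"

for body in (EXPLICIT, GENERAL, ALLOW_AFTER_BLOCK):
    print(classify_block(body, "GPTBot"))
```

A real crawl would also need to handle partial disallows (for example, only `/private/`), which this sketch deliberately ignores.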

Caveat: This does not include any other types of blocks, such as firewalls or IP blocks.

As I mentioned earlier, GPTBot is the most blocked bot. It is also the most active AI bot according to Cloudflare Radar.

[Chart: Bots that crawl the most, according to Cloudflare Radar]

There is a moderate positive correlation between the request rate and the block rate for these bots. Bots that make more requests tend to be blocked more. The nerdy numbers: a Pearson correlation coefficient of 0.512 and a p-value of 0.0149, which is statistically significant at the 5% level.
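
The significance claim can be sanity-checked from the coefficient alone. The sketch below converts r into a t-statistic and a two-sided p-value using only the standard library. The sample size is not stated in the article; n = 22 (the number of bots listed in the data tables) is an assumption, and the function names are illustrative.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t-statistic for a Pearson r computed from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_sided_p_value(t: float, df: int, steps: int = 20_000) -> float:
    """Two-sided p-value from Student's t distribution.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even) and uses the symmetry of the distribution.
    """
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x: float) -> float:
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area


r = 0.512  # reported in the article
n = 22     # ASSUMPTION: one observation per bot in the data tables
t = t_statistic(r, n)
p = two_sided_p_value(t, df=n - 2)
print(f"t = {t:.3f}, p = {p:.4f}")
```

Under that assumed sample size, the result lands close to the reported p-value and below the 0.05 threshold.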

[Chart: Bots that crawl more tend to be blocked more]

Here is the data for overall blocks:

[Chart: AI bot block rate]

Here is the total number of websites blocking AI bots:

[Chart: Total websites blocking AI bots]

Here are the data:

Bot name              Count      Percentage  Bot operator
GPTBot                8,245,987  5.89        OpenAI
CCBot                 8,188,656  5.85        Common Crawl
Amazonbot             8,082,636  5.78        Amazon
Bytespider            8,024,980  5.74        ByteDance
ClaudeBot             8,023,055  5.74        Anthropic
Google-Extended       7,989,344  5.71        Google
anthropic-ai          7,963,740  5.69        Anthropic
FacebookBot           7,931,812  5.67        Meta
omgili                7,911,471  5.66        Webz.io
Claude-Web            7,909,953  5.65        Anthropic
cohere-ai             7,894,417  5.64        Cohere
ChatGPT-User          7,890,973  5.64        OpenAI
Applebot-Extended     7,888,105  5.64        Apple
Meta-ExternalAgent    7,886,636  5.64        Meta
Diffbot               7,855,329  5.62        Diffbot
PerplexityBot         7,844,977  5.61        Perplexity
Timpibot              7,818,696  5.59        Timpi
Applebot              7,768,055  5.55        Apple
OAI-SearchBot         7,753,426  5.54        OpenAI
Webzio-Extended       7,745,014  5.54        Webz.io
Meta-ExternalFetcher  7,744,251  5.54        Meta
Kangaroo Bot          7,739,707  5.53        Kangaroo LLM
Bot Kangaroo 7739707 5.53 Kangaroo llm

It gets a little more complicated. For the numbers above, we looked at the main robots.txt file for each website, but each subdomain can have its own set of instructions. If we look at all ~461 million robots.txt files, the total block rate for GPTBot goes up to 7.3%.
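
To make the subdomain point concrete: robots.txt is looked up per host, so a blog subdomain and the main domain are governed by two separate files. A small sketch, where the helper name `robots_txt_url` and the example.com hosts are placeholders:

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location that governs the given page URL."""
    parts = urlsplit(page_url)
    # The file always lives at the root of the page's own scheme and host.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


for url in (
    "https://example.com/pricing?plan=pro",
    "https://blog.example.com/posts/ai-bots",
):
    print(url, "->", robots_txt_url(url))
```

The two pages resolve to different robots.txt files, which is why counting only the main file per website can understate how often a bot is blocked.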

AI bot blocks over time

Blocks of AI bots rose over the course of 2024, but the trend turns downward at the end of the year. The decline seems to come mainly from general blocks. The trend for blocks aimed specifically at AI bots is still growing, and I will show you that in a minute.

[Chart: AI bot block rate over time, by traffic]

Do some types of sites block AI more?

Here's how it breaks down for each bot across different website categories. I actually expected news sites to block more than other categories, because there have been many stories about news outlets blocking these bots, but arts and entertainment sites (45% blocked) and law and government sites (42% blocked) blocked them more.

[Chart: AI block rate over time, by domain category]

The decision to block AI bots varies by industry. There can be many unique reasons, and the ones below are somewhat speculative:

  • Arts and entertainment: ethical aversion, reluctance to become training data.
  • Books and literature: copyright.
  • Law and government: legal concerns, compliance.
  • News and media: preventing their articles from being used to train AI models that could compete with their journalism and cut into their revenue.
  • Shopping: preventing price scraping or inventory monitoring by competitors.
  • Sports: revenue concerns similar to those of news and media.

For this part, we only look at cases where a given bot is explicitly disallowed. It does not include general blocks or cases where only some bots were allowed. In these cases, site owners went out of their way to block specific bots.

Again, GPTBot is the most targeted, followed by Common Crawl's bot (CCBot). Common Crawl data is probably used as a data source for most LLMs.

Here are the most blocked AI bots when sites target them specifically:

[Chart: Explicit AI bot blocks]

Here is the number of websites explicitly blocking each bot:

[Chart: Total number of sites explicitly blocking AI bots]

Here are the data:

Bot name              Count      Percentage  Bot operator
GPTBot                693,639    0.5         OpenAI
CCBot                 682,861    0.49        Common Crawl
Amazonbot             469,086    0.34        Amazon
Bytespider            461,706    0.33        ByteDance
Google-Extended       415,821    0.3         Google
ClaudeBot             393,511    0.28        Anthropic
anthropic-ai          383,176    0.27        Anthropic
FacebookBot           361,803    0.26        Meta
omgili                322,502    0.23        Webz.io
ChatGPT-User          310,430    0.22        OpenAI
cohere-ai             306,385    0.22        Cohere
Claude-Web            276,411    0.2         Anthropic
Applebot-Extended     258,451    0.18        Apple
Meta-ExternalAgent    245,176    0.18        Meta
PerplexityBot         214,488    0.15        Perplexity
Diffbot               213,828    0.15        Diffbot
Timpibot              174,434    0.12        Timpi
Applebot              163,148    0.12        Apple
OAI-SearchBot         110,376    0.08        OpenAI
Webzio-Extended       100,572    0.07        Webz.io
Meta-ExternalFetcher  99,993     0.07        Meta
Kangaroo Bot          95,056     0.07        Kangaroo LLM
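
Dividing these explicit counts by the total counts reported earlier shows how much of each bot's blocking is targeted. A quick sketch using three rows from the article's two tables; the dictionary and function names are illustrative only.

```python
# Site counts copied from the article's total and explicit block tables.
total_blocks = {"GPTBot": 8_245_987, "CCBot": 8_188_656, "ClaudeBot": 8_023_055}
explicit_blocks = {"GPTBot": 693_639, "CCBot": 682_861, "ClaudeBot": 393_511}


def explicit_share(bot: str) -> float:
    """Fraction of a bot's total blocks that name the bot explicitly."""
    return explicit_blocks[bot] / total_blocks[bot]


for bot in total_blocks:
    print(f"{bot}: {explicit_share(bot):.1%} of blocks are explicit")
```

By these figures, only a small minority of the sites blocking each bot name it directly; most of the blocking comes from rules that apply to all bots.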

Explicit AI bot blocks over time

As you can see, AI bots are starting to be blocked by many more sites, especially among the sites with the most traffic.

[Chart: Explicit AI bot blocks for the top 1 million websites by traffic]

The number of AI bots more than doubled in just over a year, from 10 in August 2023 to 21 in December 2024. More new entrants on the market mean more bots using the resources of website owners.

ClaudeBot had the fastest growth of any bot in the last year.

[Chart: Total AI bot blocks for the top 1 million websites by traffic]

Here are the data:

Bot name              Growth %  Absolute growth
ClaudeBot             32.67%    0.85
anthropic-ai          25.14%    0.67
Claude-Web            20.66%    0.54
Bytespider            19.57%    0.54
ChatGPT-User          15.52%    0.47
PerplexityBot         15.37%    0.4
GPTBot                13.38%    0.53
cohere-ai             12.45%    0.32
FacebookBot           11.71%    0.32
CCBot                 11.41%    0.44
Amazonbot             10.22%    0.3
Google-Extended       10.07%    0.3
Diffbot               8.98%     0.23
omgili                8.96%     0.25
Applebot-Extended     7.11%     0.18
Meta-ExternalAgent    5.90%     0.15
OAI-SearchBot         2.17%     0.06
Timpibot              0.01%     0
Webzio-Extended       -1.69%    -0.04
Applebot              -3.32%    -0.09
Meta-ExternalFetcher  -4.32%    -0.11
Kangaroo Bot          -5.89%    -0.15

Final thoughts

It will be interesting to see how the block rate evolves as more and more of these bots use an increasing amount of resources. Will they uphold the social contract with site owners and send them more traffic, or will they decide to keep that traffic for themselves?

I think that if they take a walled garden approach, more websites will block the bots, and these systems will either have to pay websites for access to their data, or the bots may end up crawling websites and ignoring the blocks. There have already been several reports of some bots ignoring blocks.

What about you? Do you block them on your website, or do you see value in giving them access? Let me know on X or LinkedIn.
