Wikimedia claims it's groaning under traffic from bots scraping for AI

  • Voltaire Staff
  • 16 hours ago
  • 2 min read


At least 65 per cent of the most resource-intensive traffic to Wikimedia's core data centres comes from bots, stretching the non-profit's resources thin across the globe, the Wikimedia Foundation has said. 


While bots make up only 35 per cent of total pageviews, their disproportionate use of backend resources has put growing strain on Wikimedia's infrastructure, threatening performance and reliability, the foundation has found.


It claimed that the surge is driven largely by automated programs scraping Wikimedia Commons — home to over 144 million freely licensed images, videos, and files — for use in training AI models.


These bots often bypass local caching systems by bulk-accessing rarely viewed content, triggering more expensive requests that must be processed by core data centres, it said.
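Why bulk crawling of rarely viewed pages is so costly can be illustrated with a minimal cache sketch. The cache size, request mix, and key names below are illustrative assumptions, not Wikimedia's actual architecture or numbers:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: popular pages stay resident; a long tail of
    one-off requests evicts everything and misses constantly."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as recently used
            self.hits += 1
        else:
            self.misses += 1  # miss: the request falls through to the core data centre
            self.store[key] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

# Human-like traffic: repeated views of a small set of popular pages.
human = LRUCache(capacity=100)
for i in range(10_000):
    human.get(f"popular-{i % 50}")
human_hit_rate = human.hits / (human.hits + human.misses)

# Bot-like crawl: 10,000 rarely viewed pages, each requested once.
bot = LRUCache(capacity=100)
for i in range(10_000):
    bot.get(f"obscure-{i}")
bot_hit_rate = bot.hits / (bot.hits + bot.misses)

print(f"human-style hit rate: {human_hit_rate:.1%}")  # nearly all served from cache
print(f"bot-crawl hit rate:   {bot_hit_rate:.1%}")    # every request hits the backend
```

The same total number of requests produces radically different backend load: the human-style workload is absorbed almost entirely by the cache, while the crawl sends every request through to origin servers.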


This spike in non-human traffic is already impacting operations, it claimed. 


When former US President Jimmy Carter died in December 2024, his Wikipedia page received 2.8 million views in a single day. 


Simultaneously, a spike in video views of a 1980 presidential debate between him and Ronald Reagan doubled Wikimedia's network traffic, briefly filling up some internet connections and slowing load times.


The foundation, which relies on donations to keep the site running, said that bandwidth used for downloading multimedia has grown by 50 per cent since the beginning of 2024, almost entirely due to scraping activity.


It said that while Wikimedia's infrastructure is designed to absorb sudden spikes from human interest, the sustained and growing base demand from bots is creating persistent pressure and escalating operational costs.


It claimed that the activity poses a broader challenge across the web, as tech companies rush to harvest high-quality, human-verified content for AI development. 


Large language models, which form the core of most chatbot-driven AI systems, rely on outside knowledge sources such as these for their training data.


Several AI firms, including ChatGPT maker OpenAI, have been accused of scraping data from publishers, news websites and other sources without permission, leading to lawsuits.


Image Source: Unsplash


© 2023 by Voltaire News Developed & Designed by Intertoons