
Data scraping is not what it was a few years ago, according to the 2025 edition of "Most Scraped Websites," a study by Decodo.
"Companies aren't just grabbing text from websites anymore," the study states. "They want videos, images, and audio to train their AI solutions."
The biggest change is that "everyone's racing to collect data for AI training, which means they need way more diverse content than ever before," it adds.
As a result, video-first platforms now lead the top 10 list.
The study does not address the legal issues that trouble publishers. Rather, it explores the data needs of companies using large language models (LLMs).
"In 2025, outdated data is useless," says Vytautas Savickas, CEO at Decodo. "LLMs and AI agents live on real-time, relevant information collected from various sources, including product reviews, the latest research papers, and trending content on community platforms. Companies are betting their future on having access to this kind of current, reliable data."
Decodo has also seen "an increasing demand for data from eCommerce platforms like Coupang, Amazon, and Walmart," says Gabriel Verbickait, senior product marketing manager at Decodo. "Businesses are increasingly collecting more data from each platform, meaning these sites now play a bigger role in pricing strategies, product assortment decisions, and shaping customer experiences."
Verbickait adds: "Data might have been the new oil in 2006, but in 2025, it's the fuel that powers artificial intelligence. And AI systems have an appetite for fresh, diverse, and high-quality training data at unprecedented scale."
According to the study, the most-scraped websites of 2025 are:
- TikTok
- Google
- Amazon
- YouTube
- Walmart
- Coupang
- eBay
- ScienceDirect
- Crunchbase
- Airbnb