Commentary

For The Record: The Case For The Wayback Machine Archive

by Ray Schultz, Columnist, Yesterday

I see on LinkedIn that some former colleagues are upset that the owner of our old products is killing the web archive.

My initial reaction was that I don’t care if my 10-year-old articles are trashed – it’s probably for the best in some cases.

Anyway, with the volume that reporters have to put out these days, pride of authorship must be limited to the most recent stories.

But it raises a point. There is a similar outcry over the Wayback Machine, a seemingly benign tool that lets people retrieve web pages preserved by the Internet Archive.

Three major online publishers – The New York Times, The Guardian and Reddit – are blocking access to their content because they fear it will be scraped by AI crawlers and used without permission.

It’s easy to understand how they feel. AI scraping is one of the biggest problems facing publishers today. They are losing money and traffic to this nefarious practice.

But let’s look at the Wayback side of it.

“For thirty years, the Wayback Machine has worked in the background, preserving more than 1 trillion web pages so that reporting doesn’t simply vanish with the next site redesign or corporate decision,” Mark Graham, director of the Wayback Machine at the Internet Archive, said in a recent email that thanked publishers for their support.

“Today, more than 100 news articles every month reference, cite, or rely on material preserved by the Wayback Machine to verify claims, recover deleted information, or provide historical context,” he added.

Graham continues, “Where previous generations could walk into a newsroom morgue or a local library archive, today’s journalists increasingly rely on digital preservation to trace accountability and verify claims that might otherwise be lost. When a source disappears, when a statement is rewritten, when a page is taken down, the ability to recover that record is not a luxury.”

(I will say that going through musty paper archives is more fun.)

Addressing the scraping issue, Graham asserts, “We build systems designed for people, not bulk extraction; we monitor our services to manage abusive access; and we actively collaborate with publishers and newsrooms to ensure their work is preserved with integrity.”

He concludes, “At a time when the pressures on journalism are mounting – from economic shifts to the rapid evolution of AI – your support sends a clear message: preserving the public record is not optional. It is essential infrastructure for a functioning democracy.”

Save the archive!

About the Author

Ray Schultz is the former editor of DM News, Chief Marketer, Direct, Circulation Management and other marketing titles.