It’s About to Get Harder to Read Old Reddit Threads, and You Can Blame AI

With more and more AI showing up in Google searches lately, I’ve been leaning extra hard on that one magic word that makes the internet work: Reddit. It’s got its problems, but appending “Reddit” to a search is still the surest bet I have of getting an honest opinion from a real person, which is more than I can say for some other platforms. Unfortunately, it seems like the “Reddit” trick is about to get a lot less useful, and once again, you can blame AI for it.

The problem with any live forum is that information comes and goes as people delete old posts and new updates break older parts of the site. There used to be a way to get around this, but going forward, that loophole’s getting closed.

Yes, Reddit is about to start blocking the Internet Archive. The site, run by a nonprofit dedicated to preserving the open internet, is host to the Wayback Machine, a popular way to browse internet pages that are no longer active, or have changed significantly since they first went up. Simply enter a URL in the Machine’s search box, and you’ll be able to browse captures of what that page used to look like, sometimes going as far back as the 1990s.

It’s a useful way to see how a site has changed, or access information that’s supposed to be long gone. In Reddit’s case, you could use it to look at, say, a hotel review that’s since been deleted. Sure, you might feel a bit awkward about reading a post that’s been purposefully taken down, but because deleting all your threads when leaving the service is a common practice, the Wayback Machine is a great way to preserve useful content well into the future, and keep classic memes from becoming lost media.

Unfortunately, while Reddit says it’s not against the Wayback Machine in general, it’s about to stop the Internet Archive from indexing anything but the Reddit homepage, which means the only archives it’ll be able to keep going forward will be lists of what was popular on Reddit on a certain day. Individual subreddits and posts will be blocked.

That’s not totally useless if you’re, say, an internet researcher, but it will make all future Reddit threads far more temporary, and it will definitely hurt casual web searches down the line. If I review a hotel now and then delete my thread, users a month or two from now won’t be able to easily see it. On the bright side, existing archives shouldn’t be affected by this block, at least unless Reddit asks the Internet Archive to take down existing captures. But as time passes, the lack of Reddit archives is only going to become a bigger issue.

So why is this happening? Basically, Reddit doesn’t like AI companies scraping content from its site, at least without paying for it first.

“Internet Archive provides a service to the open web,” Reddit spokesperson Tim Rathschmidt told The Verge, “but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine.”

Essentially, Reddit wants to tightly control which AI companies it works with (it’s sued over this before), and has blocked most of them from crawling its site. However, with some then turning to scraping Reddit pages captured by the Internet Archive instead, the company is now going to crack down on those captures as well. In short, we’re all paying the price for a few bad apples.

Rathschmidt told The Verge that limits on the Internet Archive will start “ramping up” today, although he wasn’t entirely clear about how. I’ve reached out to Reddit for details, but for now, I did double-check, and I’m still able to access archives that already exist, so at least Reddit hasn’t gone nuclear yet.

As for any future posts, all might not be lost. The Verge also spoke to Wayback Machine director Mark Graham, who said that the Internet Archive has a “longstanding relationship with Reddit,” and that there are “ongoing discussions about this matter.”
