web-archive
Wayback Machine (Internet Archive)
The Internet Archive's flagship web-page archive and the largest of its kind, holding more than a trillion captures of URLs as they appeared over time. Enter any web address to browse its snapshot history in the web interface; free APIs also let you check whether a page is archived and list every capture. It is the default first stop when you need an old or deleted version of almost any web page. The service has had intermittent outages in 2026 and throttles heavy use, though existing captures remain readable.
Why it’s useful & how it works
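Backbone of web-archive lookups. The Availability API is the simplest existence check and returns a tiny JSON document; the CDX API returns the full capture list. Both work from datacenter IPs over a direct connection. Service is degraded only by the intermittent 2026 outages (the most recent on May 5, 2026) and by a growing set of news publishers blocking the crawler, which limits new captures but does not affect reads of existing ones. Expect aggressive HTTP 429 responses under heavy load.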
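Because heavy use is answered with 429 responses, a client usually needs to slow down and retry. The sketch below shows one common approach, exponential backoff, using only the Python standard library. The function names, retry count, and delay values are illustrative assumptions, not limits published by the Internet Archive, and the Retry-After handling only applies if the server happens to send that header.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(retries=4, base=2.0, cap=60.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ... (capped)."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def fetch_with_backoff(url, retries=4, base=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url and return the body bytes, retrying only on HTTP 429.

    retries/base are assumed values chosen for illustration.
    opener and sleep are injectable so the logic can be tested offline.
    """
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Any other status, or running out of retries, is passed to the caller.
            if err.code != 429 or attempt == retries:
                raise
            # Prefer a numeric Retry-After header if present (assumption:
            # the server may not send one), else use our own schedule.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                wait = float(retry_after)
            else:
                wait = delays[attempt]
            sleep(wait)
```

A call such as fetch_with_backoff("https://archive.org/wayback/available?url=example.com") would then wait 2, 4, 8, and 16 seconds between attempts before giving up. Outages are a different failure mode and are deliberately not retried here.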
What’s inside
More than 1 trillion archived web pages (milestone reached October 2025); roughly 99 PB of unique data.
API access
Availability: https://archive.org/wayback/available?url= ; CDX: http://web.archive.org/cdx/search/cdx?url=&output=json ; TimeMap: http://web.archive.org/web/timemap/link/ ; replay: https://web.archive.org/web/<ts>/<url>
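As a rough illustration of how these endpoints fit together, the sketch below builds request URLs and parses the two JSON responses without making any network calls. The helper names are hypothetical. The optional limit parameter and the response shapes (an archived_snapshots.closest object for Availability, and a header row followed by capture rows for CDX with output=json) reflect the commonly documented behavior and should be treated as assumptions to verify against a live response.

```python
import json
from urllib.parse import urlencode

AVAILABILITY = "https://archive.org/wayback/available"
CDX = "http://web.archive.org/cdx/search/cdx"
REPLAY = "https://web.archive.org/web"


def availability_url(page_url):
    """URL for the simple 'is this page archived?' check."""
    return AVAILABILITY + "?" + urlencode({"url": page_url})


def cdx_url(page_url, limit=None):
    """URL for the capture list as JSON; limit is an optional row cap."""
    params = {"url": page_url, "output": "json"}
    if limit is not None:
        params["limit"] = limit
    return CDX + "?" + urlencode(params)


def replay_url(timestamp, page_url):
    """URL that replays one capture; timestamp is a YYYYMMDDhhmmss string."""
    return f"{REPLAY}/{timestamp}/{page_url}"


def closest_snapshot(availability_json):
    """Return (timestamp, snapshot_url) from an Availability response, or None."""
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def cdx_rows(cdx_json):
    """Turn a CDX JSON response (header row + capture rows) into dicts."""
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]
```

In practice the Availability check is the cheap first call; the CDX listing is worth requesting only when the full capture history is needed, since it can be large for popular URLs.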
Access
Freely reachable — no key, login, or captcha.