web-archive
Archive-It (Internet Archive)
The Internet Archive's subscription archiving service, used by more than a thousand libraries, governments, and universities to build curated web collections that together hold tens of billions of documents. Captures are viewed through replay links on wayback.archive-it.org, organized by collection. It pays off when a specific institution's collection covers your subject; coverage is collection-by-collection, so it is not the place to look up arbitrary URLs.
No programmatic check — opens the archive’s own search.
Why it’s useful & how it works
FINDING: the aggregate CDX endpoint under /all/ returns HTTP 403 (the known DDoS block), but a per-collection CDX endpoint (e.g. /2950/timemap/cdx) returns 200 both ways. Integrate per collection, not through /all/. This needs a collection-ID strategy; the alternative is to accept replay-link-only handling for arbitrary URLs.
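One possible shape for a collection-ID strategy, as a minimal sketch only: try a caller-supplied list of candidate collection IDs in order and stop at the first one whose CDX lookup returns any rows. The function name, the `lookup` callable, and the candidate list are all hypothetical; nothing here is an Archive-It API, and returning None stands for the replay-link-only fallback.

```python
from typing import Callable, Iterable, List, Optional


def first_collection_with_captures(
    url: str,
    collection_ids: Iterable[int],
    lookup: Callable[[int, str], List[str]],
) -> Optional[int]:
    """Return the first collection ID whose lookup yields CDX rows, else None.

    `lookup` is a hypothetical callable (collection_id, url) -> list of raw
    CDX lines. It is injected so the strategy can be exercised offline.
    """
    for coll_id in collection_ids:
        try:
            rows = lookup(coll_id, url)
        except OSError:
            # urllib's HTTPError/URLError are OSError subclasses, so a 403
            # or a network failure on one collection just moves on to the next.
            continue
        if rows:
            return coll_id
    # No candidate collection had captures: fall back to replay-link-only.
    return None
```

Passing the lookup in as a parameter keeps the ordering logic separate from HTTP, so it can be tested with a stub and later wired to a real per-collection CDX request.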
What’s inside
Tens of billions of documents; petabyte-scale.
API access
Per-collection CDX: https://wayback.archive-it.org/<collId>/timemap/cdx?url= (works); replay: https://wayback.archive-it.org/<collId>/<ts>/<url>. The aggregate /all/ CDX is blocked.
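The two URL patterns above can be assembled as in the following sketch. The helper names are hypothetical, and percent-encoding the `url=` value is an assumption rather than a documented requirement. The response format of the CDX endpoint is not described here, so the fetch helper returns raw lines without parsing them.

```python
from typing import List
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://wayback.archive-it.org"


def cdx_url(collection_id: int, target_url: str) -> str:
    """Build the per-collection CDX query URL for one target URL."""
    # Assumption: the target is percent-encoded so it survives as one
    # query-string value.
    return f"{BASE}/{collection_id}/timemap/cdx?url={quote(target_url, safe='')}"


def replay_url(collection_id: int, timestamp: str, target_url: str) -> str:
    """Build a replay link: <base>/<collId>/<ts>/<url>."""
    return f"{BASE}/{collection_id}/{timestamp}/{target_url}"


def fetch_cdx_lines(collection_id: int, target_url: str, timeout: float = 30.0) -> List[str]:
    """Fetch the per-collection CDX response and return its non-empty lines.

    Raises urllib.error.HTTPError on a non-2xx status, such as the 403 seen
    on the aggregate /all/ endpoint.
    """
    with urlopen(cdx_url(collection_id, target_url), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return [line for line in body.splitlines() if line.strip()]
```

For example, `cdx_url(2950, "http://example.com/")` builds the query for collection 2950. `fetch_cdx_lines` makes a live request, so it is only defined here, not called.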
Access
Programmatic API access (a key may be required — see the API tag).