web-archive
Library of Congress Web Archives
The Library of Congress's curated web-archiving program: more than a hundred thematic collections of websites selected by its librarians, at petabyte scale. You browse the collections on loc.gov, and collection and item pages can also be fetched as JSON. It is most useful when your topic matches one of the curated themes (elections, events, organizations); it is less suited to looking up arbitrary URLs.
Why it’s useful & how it works
The program is still active. Only the LC Labs "Web Archive Datasets" experiment was retired in 2025; collecting continues. Requests work from a direct or residential IP, but datacenter proxies are blocked, so connect directly. The /web-archives/ landing page is not itself a collection; query specific collection slugs with ?fo=json instead.
What’s inside
More than 100 curated collections, at petabyte scale.
API access
loc.gov JSON: append ?fo=json to collection/item URLs.
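A minimal sketch of the ?fo=json pattern in Python, using only the standard library. It assumes collection pages live at https://www.loc.gov/collections/<slug>/ and uses a hypothetical example slug; confirm the real slug on the collection's loc.gov page. It makes no assumptions about the structure of the returned JSON.

```python
import json
import urllib.parse
import urllib.request

LOC_BASE = "https://www.loc.gov"


def with_json_format(url: str) -> str:
    """Return `url` with fo=json in its query string, keeping any other parameters."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    # Drop any existing fo= value so the request carries exactly one format.
    query = [(k, v) for (k, v) in query if k != "fo"]
    query.append(("fo", "json"))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def collection_url(slug: str) -> str:
    """Build a collection page URL (assumed /collections/<slug>/ shape)."""
    return f"{LOC_BASE}/collections/{slug}/"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET the JSON view of a loc.gov collection or item page.

    Run this from a direct or residential IP; datacenter proxies are blocked.
    """
    with urllib.request.urlopen(with_json_format(url), timeout=timeout) as resp:
        return json.load(resp)


# Example (needs network access; the slug is a hypothetical placeholder):
# data = fetch_json(collection_url("united-states-elections-web-archive"))
# print(sorted(data.keys()))
```

The helper rewrites the query string instead of concatenating "?fo=json", so it also works on URLs that already carry search parameters.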
Access
Freely reachable — no key, login, or captcha.