Archivarix · Echo

web-archive

Library of Congress Web Archives

The Library of Congress's curated web-archiving program: more than a hundred thematic collections of websites selected by its librarians, at petabyte scale. You browse the collections on loc.gov, and collection and item pages can also be fetched as JSON. It shines when your topic matches one of the curated themes — elections, events, organizations — rather than for looking up arbitrary URLs.

Web pages / sites API

Why it’s useful & how it works

Correction: the program is still active. Only the LC Labs "Web Archive Datasets" experiment was retired in 2025; the collecting itself continues. Requests work from a direct or residential IP, but datacenter proxies are blocked, so connect directly. The /web-archives/ landing page is not itself a collection: to get JSON, query a specific collection slug with ?fo=json.

What’s inside

More than 100 curated thematic collections, at petabyte scale.

API access

loc.gov JSON: append ?fo=json to collection/item URLs.
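As a rough illustration of the ?fo=json convention, the Python sketch below adds the parameter to a page URL while keeping any query parameters already present, then fetches and decodes the response. The helper names (with_json_format, fetch_json), the User-Agent string, and the collection slug in the usage line are hypothetical placeholders, not taken from loc.gov documentation; substitute the slug of a real collection. The shape of the returned JSON is not assumed here, so the example only prints its top-level keys.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def with_json_format(url: str) -> str:
    """Return `url` with fo=json set, preserving other query parameters."""
    parts = urlsplit(url)
    # Drop any existing "fo" value so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "fo"
    ]
    query.append(("fo", "json"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a loc.gov page as JSON (hypothetical helper, no retries)."""
    request = Request(
        with_json_format(url),
        # Placeholder User-Agent; an assumption, not a documented requirement.
        headers={"User-Agent": "example-script/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical slug: replace with the slug of an actual collection.
    page_url = "https://www.loc.gov/collections/example-collection-slug/"
    print(with_json_format(page_url))
    data = fetch_json(page_url)
    print(sorted(data.keys()))
```

Run it from a direct connection rather than through a datacenter proxy, since the entry above notes that such proxies are blocked.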

Access

Freely reachable — no key, login, or captcha.

Homepage

https://www.loc.gov/web-archives/