Browse the archive catalog
Every source we know — filter by category, cluster, API, or access, then open any archive for details.
- Wayback Machine (Internet Archive) The largest general web-page archive; arbitrary URLs over time. Web pages / sites API
- archive.today (.ph / .is / .md / .li / .vn / .fo) On-demand single-snapshot archiver; flattens dynamic pages, bypasses many paywalls. Web pages / sites API captcha
- Common Crawl (CDX index) Petabyte-scale open web crawl with a per-crawl URL capture index. Web pages / sites API
- Arquivo.pt (Portuguese Web Archive) Portuguese national archive with full-text search over preserved pages. Web pages / sites API
- Library of Congress Web Archives Curated thematic web-archive collections selected by LoC. Web pages / sites API
- Perma.cc Citation-grade permanent archiving for legal/academic links (Harvard LIL). Web pages / sites API
- Archive-It (Internet Archive) Subscription curated archiving used by 1000+ institutions; cross-collection coverage. Web pages / sites
- UK Government Web Archive (The National Archives) Public archive of UK central-government websites since ~2003. Web pages / sites
- Vefsafn.is (Icelandic Web Archive) National Library of Iceland archive of .is and Iceland-related sites since 1996. Web pages / sites API captcha
- Trove / Australian Web Archive (NLA) NLA web archive (PANDORA + AGWA + .au), searchable via Trove. Web pages / sites
- DNB Webarchiv (German National Library) German National Library thematic/selective web archive since 2012.
- Ghostarchive Small on-demand archiver noted for YouTube/Twitter + general pages. Web pages / sites
- Conifer (Rhizome, ex-Webrecorder.io) Hosted high-fidelity interactive web capture.
- Webrecorder / Browsertrix Self-hostable/hosted crawler producing WACZ/WARC; the Conifer successor toolchain.
- Stanford Web Archive Portal (SWAP) Access portal to Stanford Libraries' web archives (stored via Archive-It).
- EU Web Archive (Publications Office) Publications Office archive of EU-institution websites since 2018. IP-blocked
- KB Netherlands Web Archive Royal Library of the Netherlands web archive ('Websites van Nederland'). IP-blocked
- US NARA Web Harvests (webharvest.gov) US National Archives congressional/presidential web harvests.
- Kiwix / Zimit Offline-archive toolchain (ZIM files); library at download.kiwix.org/zim/.
- PADICAT (Catalonia) Catalan national web archive.
- Croatian Web Archive (HAW) Croatian national library archive of .hr and Croatian content.
- Bibliotheca Alexandrina Web Archive Egypt's Library of Alexandria — historical IA mirror + regional collections.
- Yandex (cache / SERP) Last major search engine still surfacing cached page copies (via SERP). Web full-text captcha
- Crossref REST API Authoritative DOI registration metadata (~165M records). Academic papers API
- Unpaywall DOI -> best open-access copy locator. API
- OpenAlex Open index of ~480M scholarly works/authors/institutions. Academic papers API · key
- Semantic Scholar (Academic Graph) AI2 citation graph (~220M papers) with citation context. Academic papers API
- DataCite REST API DOI metadata for datasets, software, preprints. Academic papers API
- OpenCitations Open citation index (DOI-to-DOI); 2B+ links. API
- dblp (CS bibliography) Curated CS publication index (~7M records). Academic papers API
- arXiv (export API) Preprint server (physics/math/CS/...); ~2.5M papers. Academic papers API
- bioRxiv Biology preprint server (CSHL). Academic papers API captcha
- medRxiv Medical preprint server (shares bioRxiv API). Academic papers API captcha
- PubMed / PMC (NCBI E-utilities) NIH biomedical index (~37M) + PMC full text. Academic papers API
- Europe PMC EMBL-EBI life-sciences literature + full text. Academic papers API
- DOAJ Directory of Open Access Journals + article metadata. Academic papers API
- DOAB (Open Access Books) Directory of peer-reviewed open-access academic books. Books API
- OpenAIRE Graph EU open-science graph: pubs, datasets, software, funding. Academic papers API
- CORE (core.ac.uk) Largest OA aggregator: full-text + metadata from 10k+ repos. API · key
- BASE (Bielefeld) OAI-PMH index of 400M+ docs from 12k+ providers. IP-blocked
- The Lens Combined scholarly + patent search/analytics.
- Dimensions (Digital Science) Linked research DB: pubs, grants, patents, trials, policy.
- Scite.ai Smart Citations (supporting/contrasting/mentioning). captcha
- Connected Papers Visual similarity graph of related papers.
- Open Library Open catalog of books/editions/authors (IA); lending links. Books API
- Internet Archive Texts ~40M+ digitized texts/books on archive.org (some lending-gated). Books API
- scholar.archive.org / fatcat IA scholarly catalog (fatcat) + full-text search over 25M+ OA papers. Academic papers
- HathiTrust ~18M+ digitized volumes; bibliographic API. Books API captcha
- Google Books API Volume metadata + previews; broad ISBN/title coverage. Books API
- Project Gutenberg (Gutendex) ~75k public-domain ebooks; Gutendex serves the catalog as JSON. Books API
- Standard Ebooks Hand-produced public-domain ebooks; OPDS feed. Books IP-blocked
- Wikisource Wikimedia free-content library of source texts. Books API
- WorldCat / OCLC Largest union library catalog (holdings).
- Trove books (NLA) NLA discovery: books, newspapers, images, AU material. API · key
- Anna's Archive Meta-search/mirror over LibGen, Sci-Hub, Z-Library + scraped collections. Books captcha
- Library Genesis (LibGen) Long-running shadow library of books/textbooks/articles. Books IP-blocked
- Sci-Hub Full-text PDFs of paywalled papers by DOI. Academic papers captcha
- Z-Library Large shadow ebook/article library; account-gated. captcha
- Bluesky public AppView (AT Protocol) Open keyless API for profiles, posts, feeds, search — the cleanest social API in 2026. API
- Telegram public preview (t.me/s/) First-party server-rendered HTML preview of any public channel — no app, no login. Social API
- Wayback — twitter/x URLs Snapshots of tweet/profile URLs — most reliable deleted-tweet recovery in 2026. Social API
- Wayback Tweets (claromes) Tool that queries Wayback CDX for all archived tweets of a handle. Social API
- archive.today — X / Facebook On-demand snapshot of X/FB pages the Wayback crawler can't reach. Social API captcha
- Wayback — facebook.com Snapshots of public FB pages/posts where crawled. Social API
- Wayback — reddit.com Snapshots of Reddit threads/profiles — API-independent, so it survives. Social API
- IA Twitter Stream Grab Internet Archive bulk dumps of the old Twitter sample stream (JSON, by month). Social API
- PullPush.io Open Pushshift successor: full-text Reddit comment/submission search, no key. Social API
- Arctic Shift Reddit data via API + bulk dumps + web UI; better limits than PullPush. Social API
- Reddit Data API (official) Official Reddit API; unauthenticated .json endpoints work, OAuth unlocks more; commercial use is contract-gated. API
- Desuarchive (4chan archive) FoolFuuka 4chan archive (/a/, /int/, /k/, /tg/, /wsg/…) with JSON API. Social API
- Mastodon per-instance API Each Mastodon server exposes a REST API; public reads often keyless. API
- Fediverse Observer Directory + uptime monitor of Fediverse/Mastodon instances.
- Nitter (xcancel.com et al.) Privacy front-end for X; ecosystem mostly collapsed, a few instances cling on. IP-blocked
- Nitter health tracker Live uptime tracker for remaining Nitter instances.
- Thread Reader App Unrolls X threads into a readable page; unrolled threads persist after deletion.
- Politwoops (ProPublica) Archive of deleted politician tweets; tracking ended, archive browsable.
- Telegago (Google CSE) Google Custom Search scoped to public t.me for Telegram OSINT discovery.
- TGStat Largest Telegram channel/group catalog + analytics + post search. captcha
- Telemetr.io Large Telegram analytics service.
- SocialData API Paid X/Twitter data proxy — search, user, tweet endpoints without the official API. API · key
- X API v2 (official) Official X API; pay-per-use credit model since 2026-02, free tier killed.
- Meta Content Library CrowdTangle successor: public FB/IG content for vetted researchers only. IP-blocked
- Reveddit Shows mod/AutoMod-removed content on a user/thread; relied on Pushshift. captcha
- redditsearch.io Old Pushshift-backed Reddit search UI.
- twstalker / Sotwe (X viewers) Third-party X profile/timeline viewers that scrape and re-display tweets. captcha
- Picuki (IG viewer) Anonymous Instagram profile/post/story viewer. captcha
- Imginn (IG viewer) Anonymous Instagram post/story viewer and downloader. captcha
- AnonyIG (IG story viewer) No-login anonymous Instagram story/highlight viewer (one of many such tools). IP-blocked
- Bluesky Firehose / Jetstream Real-time network-wide event stream; Jetstream serves a lighter-weight JSON version. API
- 4plebs (4chan archive) Long-running FoolFuuka 4chan archive (/pol/, /x/, /tv/, /adv/…) with JSON API. Social IP-blocked
- archived.moe (4chan) Broad multi-board FoolFuuka 4chan archive. captcha
- Software Heritage Universal source-code archive ('Library of Alexandria of code'); REST API + content-addressed SWHIDs. Software / packages API
- Wikimedia Commons Largest free-media repository (100M+ files); full MediaWiki API. Images API
- Openverse Search across 800M+ CC-licensed / public-domain images & audio (WordPress). Images API
- Internet Archive (Video) Massive open video archive (films, TV, ephemera) with keyless search/metadata API. Video / audio / music API
- Internet Archive (Audio) / Live Music Huge open audio archive incl. Live Music Archive (etree) of legal live recordings. Video / audio / music API
- YouTube via Wayback Machine Recover deleted/changed YouTube watch pages (metadata, thumbnails) via Wayback captures. API
- Dailymotion Major Western video host with a keyless public Graph API and stable search; the best reupload source with an open API in research. Video / audio / music
- RuTube Russian video host — ~4% YouTube-reupload overlap in research. Video / audio / music captcha
- VK Video VKontakte video — high RU reupload potential, mostly auth-walled. Video / audio / music IP-blocked
- Odysee LBRY-based video host — open access, no captcha in research. Video / audio / music
- Rumble US video host — blocks datacenter IPs (403 in research). Video / audio / music IP-blocked
- BitChute Independent video host with captcha-gated search. Video / audio / music captcha
- OK.ru Video Odnoklassniki video — empty body to datacenter IPs in research. Video / audio / music IP-blocked
- Bilibili Largest Chinese video host — captcha + geo from outside CN. Video / audio / music captcha
- Torrents-CSV Open, self-hostable torrent search backed by a community CSV dataset; JSON API. Torrents API
- Academic Torrents BitTorrent sharing of research datasets and academic data (legitimate). Torrents API
- The Pirate Bay (apibay) Longest-running general torrent index; apibay.org is its keyless JSON backend. Torrents API
- Nyaa Leading anime/East-Asian-media torrent index; RSS acts as a pseudo-API. Torrents API
- SauceNAO Source-finder for anime/manga/art (Pixiv, Danbooru…); the de-facto art reverse-search. Images API · key
- VirusTotal Aggregated AV/sandbox verdicts + file/URL/domain/hash intelligence. File by hash API · key
- MalwareBazaar (abuse.ch) Free repository to search/download malware samples by hash/tag/signature. API · key
- URLhaus (abuse.ch) Database of malware-distribution URLs (payload URLs, host/hash lookups). API · key
- ThreatFox (abuse.ch) Open IOC-sharing platform (IPs, domains, hashes, URLs tied to malware). API · key
- Triage (tria.ge) Automated malware sandbox with public sample search by hash (Recorded Future). API · key
- MalShare Community public malware repository; hash search + sample download. API · key
- Maltiverse Threat-intel aggregator; lookup IPs/domains/URLs/hashes. API · key
- Flickr Large photo-sharing archive with mature REST API. API · key
- Imgur Image host; archival value reduced after the 2023 anonymous-content purge. API · key
- DeviantArt Large art community/archive with OAuth API. API · key
- Freesound Large collaborative database of CC-licensed sound samples/effects. API · key
- Yandex Images Best-in-class reverse image search, esp. faces & visually-similar. Images captcha
- Google Images / Lens World's largest reverse-image / visual search; Lens does object + text recognition. Images captcha
- TinEye Oldest dedicated reverse-image engine; exact/edited copies + earliest occurrence. Images
- Bing Visual Search Microsoft's visual/reverse image search. Images
- IQDB Multi-service booru reverse-image search (Danbooru, Gelbooru, Zerochan…). Images
- ascii2d Japanese reverse-image search (color + bag-of-visual-words hashes); strong on Pixiv/Twitter art. IP-blocked
- Karma Decay Reverse-image search scoped to Reddit (prior submissions of an image).
- PimEyes Face-recognition reverse search across the open web. captcha
- Photobucket Legacy image host; large historical (often hotlink-broken) archive.
- Pinterest Visual discovery / pin board image collection. IP-blocked
- Google Arts & Culture High-res museum/cultural artworks and exhibits.
- Tube Archivarix Archivarix's own deleted-YouTube-video finder: reupload discovery across mirror hosts + archived metadata. Video / audio / music
- Filmot Searchable index of YouTube metadata + subtitles, incl. deleted/private videos. Video / audio / music captcha
- Ghostarchive (video) On-demand archiver that preserves YouTube/Twitter video and pages. Video / audio / music
- ANY.RUN Interactive online malware sandbox with public report feed.
- Hybrid Analysis CrowdStrike Falcon Sandbox community portal; submit/search by hash. API · key captcha
- VirusShare Invite-only malware sample repository (hash lookup + download). captcha
- BTDigg (btdig.com) DHT-crawling torrent search engine (magnet links only). Torrents IP-blocked
- SolidTorrents DHT-based torrent/metadata search engine. Torrents IP-blocked
- Snowfl Meta torrent search aggregating multiple indexers. IP-blocked
- TorrentGalaxy General torrent index/community. IP-blocked
- 1337x One of the most-trafficked general torrent indexes. Torrents IP-blocked
- Bitmagnet (self-host) Self-hosted DHT crawler + torrent indexer (you run it).
- GH Archive Hourly archive of the public GitHub event timeline since 2011.
- Sourcegraph (public code search) Cross-repository code search over public repositories. Software / packages
- ccMixter Community remix site of CC-licensed music, samples, a cappellas.
- GifCities (GeoCities GIFs) Search engine over animated GIFs salvaged from GeoCities (rebuilt in 2025 with semantic search). Images API
- OoCities Mirror/resurrection of GeoCities pages, browsable by old neighborhood path.
- GeoCities.ws GeoCities archive + free retro hosting; browsable mirror. captcha
- ReoCities One of the original GeoCities rescue mirrors from the 2009 shutdown.
- IA GeoCities Collection IA's dedicated GeoCities crawl (source corpus behind GifCities). API
- Cameron's World Curated collage artwork built from GeoCities graphics/GIFs (web-art).
- ProtoWeb HTTP proxy serving the 1990s web to vintage browsers from a restored-site cache. captcha
- oldweb.today Run emulated vintage browsers against Wayback/Memento archives in-browser.
- Marginalia Search Independent crawler/index for the small, old, text-heavy, non-commercial web. Web full-text API
- Wiby Search engine for hobbyist/early-web pages; 'surprise me' random old page. Web full-text API
- Mojeek UK independent search engine with its own crawler (no Bing/Google backfill).
- Stract Open-source, customizable search engine (NLnet-funded).
- Million Short Search that removes the top N most-popular sites to surface obscure results. captcha
- Teclis Non-commercial 'creative web' search; now Kagi's internal index.
- Google Groups Usenet DejaNews-derived Usenet archive back to 1981; frozen/read-only since 2024. Niche text
- Usenet Archives Independent free web archive of hundreds of millions of historical Usenet posts.
- Narkive Long-running searchable Usenet web archive.
- textfiles.com Curated archive of BBS-era text files, zines, phreaking, BBS lists (Jason Scott).
- DiscMaster (files inside old discs) Full-text/file search inside millions of vintage files in archive.org disc images. Niche text API
- MobyGames Comprehensive cataloged history of video games across all platforms. captcha
- IA MS-DOS / Software Library In-browser emulated MS-DOS games + historical software on IA. Software / packages API
- My Abandonware Large abandonware database/download for 80s-90s DOS/Win/Amiga/console games. Software / packages
- Europeana Aggregator of 50M+ digitized items from European GLAM institutions.
- DPLA Aggregates digital items from US libraries/archives/museums.
- Smithsonian Open Access Open Access metadata + CC0 media across all Smithsonian museums. captcha
- Met Museum API The Met's full collection metadata + open-access images. Cultural heritage API
- Rijksmuseum API Dutch national museum collection metadata + high-res images.
- Chronicling America (LoC) US historic newspaper pages (1700s-1960s) with OCR full text (LoC). API
- crt.sh (Cert Transparency) Queryable search over Certificate Transparency logs (historical certs/subdomains). Web pages / sites
- Censys Search Internet-wide host/cert scanning index (Shodan alternative). captcha
- Shodan Search engine for internet-connected devices/services with historical banners.
- Intelligence X Searches pastes, darkweb, leaks, whois + a 'time machine' of removed data.
- Have I Been Pwned (pwned passwords) Index of breached accounts/credentials; Pwned-Passwords range API is keyless. API
- Ahmia (Tor index) Clearnet search engine indexing Tor .onion hidden services (abuse-filtered).
- APKMirror Archive of historical Android APK versions (signed, verified). Software / packages
- F-Droid FOSS Android app repo with full version archive + reproducible builds. Software / packages API
- PyPI Python package index with all historical versions + JSON metadata. Software / packages API
- npm registry Node package registry with full version history + tarballs. Software / packages API captcha
- OldMapsOnline Search portal for georeferenced historical maps across institutions. Cultural heritage captcha
- David Rumsey Maps Premier digitized historical map collection (Luna), high-res + georef.
- OSM / Overpass Editable world map; Overpass = read-only query API; full history dumps. API
- GDELT Global news event/tone monitoring across world media + TV news. API
- IA TV News Archive Searchable US/intl TV news since 2009 via closed-caption text (IA). Video / audio / music API
- Radio Garden Globe interface to thousands of live radio stations worldwide. API
- IA Live Music (etree) Legal taper recordings of live concerts (etree). Video / audio / music API
- Zenodo CERN-run general-purpose research data/software repository with DOIs. Datasets
- Figshare Repository for datasets, figures, posters, supplementary outputs. Datasets API
- Harvard Dataverse Large Dataverse research-data repository. Datasets API
- OSF Research project/data collaboration + preprint registry. Datasets API
- Hugging Face Datasets Largest hub of ML datasets (and models) with rich metadata. Datasets API
- Archive of Our Own Largest fanfiction archive (OTW); no official API by policy. Niche text captcha
- FanFiction.Net Veteran fanfiction site (pre-AO3 era); Cloudflare-walled. captcha
- Fanlore Wiki documenting fan history, terms, fandoms (OTW); MediaWiki API. captcha
- WikiTeam dumps Preserved dumps of 600k+ wikis (Fandom/Wikia, niche MediaWikis) on IA. Niche text API
- Discogs Crowdsourced database of music releases/pressings + marketplace. captcha
- MusicBrainz Open music encyclopedia; canonical MBIDs for artists/releases. Video / audio / music API
- Genius Song lyrics + annotations database.
- CourtListener / RECAP Free US case law + PACER docs (RECAP) + oral-argument audio. Cultural heritage API
- PatentsView (USPTO) USPTO patent data API/search with disambiguated inventors/assignees.
- Google Patents Global patent + non-patent literature search (many offices). Academic papers
- Espacenet / EPO OPS EPO worldwide patent search + Open Patent Services REST API. captcha
- FamilySearch World's largest free genealogy records archive (LDS Church).
- pouet.net (demoscene) Canonical demoscene database: productions, parties, groups since the 90s. API
- 16colo.rs (ANSI/ASCII art) Archive of ANSI/ASCII artpacks from the BBS artscene since the early 1990s.
- ASCII Art Archive Large categorized library of classic single-image ASCII art.
- CyberLeninka (КиберЛенинка) Russian open-access scholarly library: full-text journal articles, free to read. Academic papers
- НЭБ — National Electronic Library (rusneb.ru) Russia's National Electronic Library: digitized books, periodicals, and dissertations from major Russian libraries.
- End of Term Web Archive Snapshots of US federal government websites captured at each presidential term transition since 2008.
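
Several entries above (Wayback Machine, Wayback Tweets, YouTube via Wayback Machine) depend on the Wayback CDX index. The sketch below shows one way to build a CDX query and turn its JSON rows into replay links. It is a minimal illustration, not the catalog's own tooling: the function names `build_cdx_query`, `rows_to_snapshots`, and `fetch_snapshots` are hypothetical, and it assumes the public CDX endpoint at `web.archive.org/cdx/search/cdx` with its `url`, `output`, `from`, `to`, and `limit` parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public Wayback CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, year_from=None, year_to=None, limit=50):
    """Build a CDX query URL for captures of `url` (hypothetical helper)."""
    params = {"url": url, "output": "json", "limit": str(limit)}
    if year_from is not None:
        params["from"] = str(year_from)
    if year_to is not None:
        params["to"] = str(year_to)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_snapshots(rows):
    """Convert CDX JSON output (first row is the header) into dicts.

    Each dict also gets a `replay_url` pointing at the archived copy.
    """
    if not rows:
        return []
    header, *records = rows
    snapshots = []
    for record in records:
        snap = dict(zip(header, record))
        snap["replay_url"] = (
            f"https://web.archive.org/web/{snap['timestamp']}/{snap['original']}"
        )
        snapshots.append(snap)
    return snapshots


def fetch_snapshots(url, **kwargs):
    """Query the CDX index over the network and return snapshot dicts."""
    with urlopen(build_cdx_query(url, **kwargs), timeout=30) as resp:
        return rows_to_snapshots(json.load(resp))


if __name__ == "__main__":
    # Network call; results depend on what the archive holds at query time.
    for snap in fetch_snapshots("example.com", year_from=2015, limit=5):
        print(snap["timestamp"], snap["statuscode"], snap["replay_url"])
```

Keeping URL construction and row parsing separate from the network call means both can be checked offline, and the same parsing step can be reused for other CDX-style indexes only after confirming their field layout matches.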