Accelovation Crawl
collection
1,324
ITEMS
59.6M
VIEWS
collection
eye 59.6M
Web crawl snapshots generously donated by Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
Wikileaks.org Archive
collection
8
ITEMS
9,873
VIEWS
collection
eye 9,873
A collection of web pages from the WikiLeaks websites as well as news coverage and commentary surrounding the WikiLeaks releases. It includes coverage of the Afghan war diaries, the Iraq war logs, and the US State Department diplomatic cables. Please note this content was collected from July 27, 2010 to December 9, 2010. The Internet Archive has archived more pages that will be available as the general Wayback Machine is brought up to date. For ongoing updates or full text search please visit the Internet...
Topic: wikileaks
Ferguson Tweets
collection
212
ITEMS
1.5M
VIEWS
collection
eye 1.5M
IDs of tweets that mention Ferguson, Missouri between August 10th and August 27th, 2014, subsequent to the death of Michael Brown. Tweets collected by Ed Summers. He subsequently extracted the URLs from these tweets, and they were crawled by the Internet Archive. Please read Summers's article at inkdroid.org, with an update here, for more information. Photo: "Memorial to Michael Brown" by Jamelle Bouie
Mercator Crawl
collection
1
ITEMS
73
VIEWS
collection
eye 73
Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.
Internet Memory Foundation
collection
1,916
ITEMS
138.9M
VIEWS
collection
eye 138.9M
Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. From Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a non-profit foundation whose purpose is archiving web content; it supports projects and research which include the preservation and protection of multimedia content. Its archives form a digital library of cultural content.
Rescue Crawls
collection
2
ITEMS
670
VIEWS
collection
eye 670
Rescue crawls conducted by the public for sites that have announced that they are closing.
To Crawl
collection
1
ITEMS
94,066
VIEWS
collection
eye 94,066
Data collected by Internet Archive. This data is currently not publicly accessible.
Biblioteca Nazionale Centrale di Firenze
collection
223
ITEMS
11.8M
VIEWS
collection
eye 11.8M
Data collected by Internet Archive on behalf of Biblioteca Nazionale Centrale di Firenze. This data is currently not publicly accessible.
Open Sky
collection
1
ITEMS
2,453
VIEWS
collection
eye 2,453
Demo crawl of scientific data. This data is currently not publicly accessible.
web_sup
collection
88
ITEMS
6.4M
VIEWS
collection
eye 6.4M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Title
collection
1
ITEMS
317,199
VIEWS
collection
eye 317,199
Crawl Title from Alexa Internet. This data is currently not publicly accessible.
Hurricane Katrina
collection
112
ITEMS
7.5M
VIEWS
collection
eye 7.5M
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
Inktomi 2001
collection
1
ITEMS
75,370
VIEWS
collection
eye 75,370
Data collected in 2001. This data is currently not publicly accessible. From Wikipedia: Inktomi Corporation was a California company that provided software for Internet service providers. It was founded in 1996 by UC Berkeley professor Eric Brewer and graduate student Paul Gauthier. The company was initially founded based on the real-world success of the web search engine they developed at the university. After the bursting of the dot-com bubble, Inktomi was acquired by Yahoo!
web_mon
collection
3,809
ITEMS
102.4M
VIEWS
collection
eye 102.4M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Yahoo! Video Crawl
collection
4,484
ITEMS
44,238
VIEWS
collection
eye 44,238
Pages captured from Yahoo! Video prior to removal of user uploads. Crawl started February 2011. This data is currently not publicly accessible. From Wikipedia: Yahoo! Video is a video sharing website on which users could upload and share videos. The service is owned and created by Yahoo! Yahoo! Video began as an internet-wide video search engine and added the ability to upload and share video clips in June 2006. A re-designed site was launched in February 2008 that changed the focus to...
National Library of New Zealand Crawls
collection
13,058
ITEMS
67.3M
VIEWS
collection
eye 67.3M
Crawls performed by Internet Archive on behalf of the National Library of New Zealand. This data is currently not publicly accessible.
UK Government Site Crawl
collection
106
ITEMS
4.4M
VIEWS
collection
eye 4.4M
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Alexa Crawl F2
collection
1
ITEMS
218
VIEWS
collection
eye 218
Crawl F2 from Alexa Internet. This data is currently not publicly accessible.
web_ind
collection
91
ITEMS
5.8M
VIEWS
collection
eye 5.8M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Image
collection
92
ITEMS
40M
VIEWS
collection
eye 40M
Crawl Image from Alexa Internet. This data is currently not publicly accessible.
Nigerian Election
collection
1
ITEMS
28,226
VIEWS
collection
eye 28,226
Data related to Nigerian elections, 2001, collected by Internet Archive. This data is currently not publicly accessible.
VOX.com Crawl September 2010
collection
28
ITEMS
1.1M
VIEWS
collection
eye 1.1M
Crawl of vox.com, September 2010. This was an attempt to preserve vox.com content as much as possible in the wake of service closure, September 30, 2010.
Topic: webwidecrawl
web_osi
collection
677
ITEMS
21.9M
VIEWS
collection
eye 21.9M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Test
collection
6
ITEMS
10.7M
VIEWS
collection
eye 10.7M
Crawl Test from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl ST
collection
1
ITEMS
619,607
VIEWS
collection
eye 619,607
Crawl ST from Alexa Internet. This data is currently not publicly accessible.
web_oso
collection
150
ITEMS
8.8M
VIEWS
collection
eye 8.8M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_el
collection
925
ITEMS
49.3M
VIEWS
collection
eye 49.3M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
National Archives and Records Administration
collection
9,688
ITEMS
79M
VIEWS
collection
eye 79M
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
web_ma
collection
1,085
ITEMS
52M
VIEWS
collection
eye 52M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Robot
collection
1
ITEMS
69,855
VIEWS
collection
eye 69,855
Crawl Robot from Alexa Internet. This data is currently not publicly accessible.
web_is_m
collection
1
ITEMS
10,223
VIEWS
collection
eye 10,223
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl EE
collection
484
ITEMS
49.9M
VIEWS
collection
eye 49.9M
Crawl EE from Alexa Internet. This data is currently not publicly accessible.
web_eot
collection
245
ITEMS
1.2M
VIEWS
collection
eye 1.2M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
collection
eye 2.7M
Data collected by Internet Archive on behalf of the Fundacao para a Computacao Cientifica Nacional of Portugal. This data is currently not publicly accessible.
web_sm_or
collection
16
ITEMS
2.3M
VIEWS
collection
eye 2.3M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
FS Fed US
collection
3
ITEMS
14,431
VIEWS
collection
eye 14,431
Data collected in 2005 by Internet Archive. This data is currently not publicly accessible.
NDIIPP Youtube Crawl
collection
90
ITEMS
2.2M
VIEWS
collection
eye 2.2M
YouTube crawl performed by Internet Archive on behalf of the National Digital Information Infrastructure and Preservation Program. This data is currently not publicly accessible.
Top 150 Crawl
collection
29
ITEMS
4.6M
VIEWS
collection
eye 4.6M
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
Shallow Crawls
collection
1,042
ITEMS
121.8M
VIEWS
collection
eye 121.8M
Shallow crawls that collect content one level deep, including embeds. This data is currently not publicly accessible.
University of Michigan
collection
5
ITEMS
1.2M
VIEWS
collection
eye 1.2M
Data collected by Internet Archive on behalf of the University of Michigan. This data is currently not publicly accessible. From Wikipedia: The University of Michigan, frequently referred to as simply Michigan, is a public research university located in Ann Arbor, Michigan, United States. It is the state's oldest university and the flagship campus of the University of Michigan.
Cuil Crawl Data
collection
0
ITEMS
22M
VIEWS
collection
eye 22M
Web crawl snapshot generously donated by cuil.com. This collection of pages, mostly from 2007 with some from 2008, is about 310 terabytes of compressed data and almost 60 billion URLs (mostly text). Cuil was a search engine that organized web pages by content and displayed relatively long entries along with thumbnail pictures for many results. Cuil said it had a larger index than any other search engine, with about 120 billion web pages. It went live on July 28, 2008. Cuil's servers were shut...
Fix Broken Links Web Crawls
collection
108,210
ITEMS
2.2B
VIEWS
collection
eye 2.2B
These crawls are part of an effort to archive pages as they are created and to archive the pages that they refer to. That way, as the referenced pages are changed or taken off the web, a link to the version that was live when the page was written will be preserved. The Internet Archive hopes that references to these archived pages will then be put in place of a link that would otherwise be broken, or added as a companion link to allow people to see what was originally intended by a page's...
Alexa Crawl DZ
collection
1,207
ITEMS
103.2M
VIEWS
collection
eye 103.2M
Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EH
collection
1,218
ITEMS
120.2M
VIEWS
collection
eye 120.2M
Crawl EH from Alexa Internet. This data is currently not publicly accessible.
Mayoral Crawls
collection
1
ITEMS
204,471
VIEWS
collection
eye 204,471
Mayoral crawls performed by Internet Archive. This data is currently not publicly accessible.
web_con
collection
1,507
ITEMS
51.4M
VIEWS
collection
eye 51.4M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_leg
collection
58
ITEMS
6.6M
VIEWS
collection
eye 6.6M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Short
collection
5
ITEMS
5.6M
VIEWS
collection
eye 5.6M
Crawl Short from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl TO
collection
1
ITEMS
1.4M
VIEWS
collection
eye 1.4M
Crawl TO from Alexa Internet. This data is currently not publicly accessible.
Alexa MP3.com Crawl
collection
43
ITEMS
7,721
VIEWS
collection
eye 7,721
MP3.com Crawl from Alexa Internet. This data is currently not publicly accessible.
Swiss National Library
collection
10
ITEMS
315,748
VIEWS
collection
eye 315,748
Data collected by Internet Archive on behalf of the Swiss National Library. This data is currently not publicly accessible.
Internet Archive Web Crawls
by Common Crawl
data
eye 30
favorite 0
comment 0
1213890617409_4.arc includes crawl data collected between 2008-05-13 09:36:54 GMT and 2008-05-13 09:37:44 GMT; 1213890649760_4.arc includes crawl data collected between 2008-05-13 09:37:47 GMT and 2008-05-13 11:09:12 GMT; etc.
Topic: crawldata
2004 Election
collection
178
ITEMS
9.8M
VIEWS
collection
eye 9.8M
2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
NL TV
collection
1
ITEMS
61,398
VIEWS
collection
eye 61,398
Data collected in 2005. This data is currently not publicly accessible.
Alexa Crawl DX
collection
1,442
ITEMS
120.4M
VIEWS
collection
eye 120.4M
Crawl DX from Alexa Internet. This data is currently not publicly accessible.
web_tran
collection
4,192
ITEMS
93.2M
VIEWS
collection
eye 93.2M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl ARC
collection
79
ITEMS
16.6M
VIEWS
collection
eye 16.6M
Crawl ARC from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl TS
collection
1
ITEMS
7,341
VIEWS
collection
eye 7,341
Crawl TS from Alexa Internet. This data is currently not publicly accessible.
Product DB
collection
1
ITEMS
6,517
VIEWS
collection
eye 6,517
Product DB data collected by Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DJ
collection
341
ITEMS
58.6M
VIEWS
collection
eye 58.6M
Crawl DJ from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DL
collection
413
ITEMS
69.2M
VIEWS
collection
eye 69.2M
Crawl DL from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EB
collection
653
ITEMS
91.3M
VIEWS
collection
eye 91.3M
Crawl EB from Alexa Internet. This data is currently not publicly accessible.
Alexa 2002 Election Crawl
collection
24
ITEMS
14.5M
VIEWS
collection
eye 14.5M
2002 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Bibliotheque Nationale de France Domain Crawls
collection
1,652
ITEMS
135.3M
VIEWS
collection
eye 135.3M
Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
World Wars Crawl
collection
13
ITEMS
10.1M
VIEWS
collection
eye 10.1M
Web data related to World Wars I and II collected by Internet Archive in an experimental crawl sponsored by the National Endowment for the Humanities and JISC. This data is currently not publicly accessible.
Alexa Crawl GR
collection
74
ITEMS
11.4M
VIEWS
collection
eye 11.4M
Crawl GR from Alexa Internet. This data is currently not publicly accessible.
National Library of Sweden
collection
309
ITEMS
21.6M
VIEWS
collection
eye 21.6M
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
web_wk
collection
9,973
ITEMS
208M
VIEWS
collection
eye 208M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl RECY
collection
1
ITEMS
146,630
VIEWS
collection
eye 146,630
Crawl RECY from Alexa Internet. This data is currently not publicly accessible.
NDIIPP Reality
collection
1
ITEMS
4,078
VIEWS
collection
eye 4,078
Immersive gaming environments R&D project for the National Digital Information Infrastructure and Preservation Program. This data is currently not publicly accessible. From Wikipedia: The National Digital Information Infrastructure and Preservation Program (NDIIPP) is an archival program led by the Library of Congress to archive and provide access to digital resources. The U.S. Congress established the program in 2000. The Library was chosen because of its role as one of the leading providers of...
Alexa Crawl DH
collection
141
ITEMS
29.6M
VIEWS
collection
eye 29.6M
Crawl DH from Alexa Internet. This data is currently not publicly accessible.
Crawl Data
collection
32,956
ITEMS
14.8M
VIEWS
collection
eye 14.8M
Crawl Data. This data is currently not publicly accessible.
2004 Indian Ocean earthquake and tsunami
collection
42
ITEMS
4.8M
VIEWS
collection
eye 4.8M
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
web_sm_sing
collection
3
ITEMS
1M
VIEWS
collection
eye 1M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa 1996 Election Crawl
collection
1
ITEMS
32,515
VIEWS
collection
eye 32,515
1996 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Wayback Robots Crawl
collection
129
ITEMS
4.1M
VIEWS
collection
eye 4.1M
Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
Brookings Institute Crawl
collection
1
ITEMS
124,556
VIEWS
collection
eye 124,556
Crawl data gathered by Internet Archive on behalf of the Brookings Institute. This data is currently not publicly accessible.
web_iq
collection
2,637
ITEMS
187.1M
VIEWS
collection
eye 187.1M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Web Crawls
data
eye 1,253
favorite 1
comment 0
Test WARC files.
National Library of Ireland Crawls
collection
2,620
ITEMS
20.7M
VIEWS
collection
eye 20.7M
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
web_is
collection
5
ITEMS
1.7M
VIEWS
collection
eye 1.7M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl AUG
collection
80
ITEMS
37M
VIEWS
collection
eye 37M
Crawl AUG from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl CRC
collection
32
ITEMS
20.9M
VIEWS
collection
eye 20.9M
Crawl CRC from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EI
collection
1,408
ITEMS
144.3M
VIEWS
collection
eye 144.3M
Crawl EI from Alexa Internet. This data is currently not publicly accessible.
collection
eye 61,569
Demo crawl for the National Oceanic and Atmospheric Administration (NOAA). This data is currently not publicly accessible. From Wikipedia: The National Oceanic and Atmospheric Administration (NOAA) is a scientific agency within the United States Department of Commerce focused on the conditions of the oceans and the atmosphere. NOAA warns of dangerous weather, charts seas and skies, guides the use and protection of ocean and coastal resources, and conducts research to improve understanding and...
web_sm_prin
collection
1
ITEMS
98,348
VIEWS
collection
eye 98,348
Crawl performed by Internet Archive. This data is currently not publicly accessible.
September 11th
collection
1
ITEMS
652,606
VIEWS
collection
eye 652,606
Data related to September 11th, 2001, collected by Internet Archive. This data is currently not publicly accessible. From Wikipedia: The September 11 attacks (also referred to as September 11, September 11th, or 9/11) were a series of four coordinated terrorist attacks launched by the Islamic terrorist group al-Qaeda upon the United States in New York City and the Washington, D.C. areas on September 11, 2001.
Standards
collection
1
ITEMS
599
VIEWS
collection
eye 599
Standards crawl data collected by Internet Archive. This data is currently not publicly accessible.
web_pop
collection
13
ITEMS
2.6M
VIEWS
collection
eye 2.6M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Traffic
collection
89
ITEMS
808
VIEWS
collection
eye 808
Sanitized traffic files from Alexa Internet, containing only base URLs (no parameters) and time/date. This data is currently not publicly accessible. Covers the period from December 2001 to February 2009.
Target Product Crawl
collection
4
ITEMS
241
VIEWS
collection
eye 241
Target product crawl data collected by Alexa Internet. This data is currently not publicly accessible.
National Science Digital Library
collection
3
ITEMS
39,138
VIEWS
collection
eye 39,138
Demo crawl for the National Science Digital Library. This data is currently not publicly accessible. From Wikipedia: The United States' National Science Digital Library (NSDL) is an open-access online digital library and collaborative network of disciplinary and grade-level focused education providers. NSDL's mission is to provide quality digital learning collections to the science, technology, engineering, and mathematics (STEM) education community, both formal and informal, institutional and...
Alexa 2000 Election Crawl
collection
4
ITEMS
248,112
VIEWS
collection
eye 248,112
2000 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
web_eg
collection
32
ITEMS
2.7M
VIEWS
collection
eye 2.7M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_dar
collection
112
ITEMS
6.1M
VIEWS
collection
eye 6.1M
Crawl performed by Internet Archive. This data is currently not publicly accessible.