Accelovation Crawl
collection
1,324
ITEMS
51.1M
VIEWS
collection
eye 51.1M
Web crawl snapshots generously donated by Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
Fix Broken Links Web Crawls
collection
102,611
ITEMS
1.8B
VIEWS
collection
eye 1.8B
These crawls are part of an effort to archive pages as they are created and to archive the pages that they refer to. That way, as the referenced pages are changed or taken off the web, a link to the version that was live when the page was written will be preserved. The Internet Archive hopes that references to these archived pages will then be put in place of a link that would otherwise be broken, or a companion link to allow people to see what was originally intended by a page's...
Internet Memory Foundation
collection
1,916
ITEMS
107.7M
VIEWS
collection
eye 107.7M
Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. From Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a non-profit foundation whose purpose is archiving web content; it supports projects and research which include the preservation and protection of multimedia content. Its archives form a digital library of cultural content.
Rescue Crawls
collection
2
ITEMS
670
VIEWS
collection
eye 670
Rescue crawls conducted by the public for sites that have announced that they are closing.
Mercator Crawl
collection
1
ITEMS
72
VIEWS
collection
eye 72
Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.
Wikileaks.org Archive
collection
8
ITEMS
9,666
VIEWS
collection
eye 9,666
A collection of web pages from the Wikileaks websites as well as news coverage and commentary surrounding the Wikileaks releases. It includes coverage of the Afghan war diaries, the Iraq war logs, and the US State Department diplomatic cables. Please note this content was collected from July 27, 2010 to December 9, 2010. The Internet Archive has archived more pages that will be available as the general Wayback Machine is brought up to date. For ongoing updates or full text search please visit the Internet...
Topic: wikileaks
Ferguson Tweets
collection
212
ITEMS
1.4M
VIEWS
collection
eye 1.4M
IDs of tweets that mention Ferguson, Missouri between August 10th and August 27th, 2014, subsequent to the death of Michael Brown. Tweets collected by Ed Summers. He subsequently extracted the URLs from these tweets, and they were crawled by the Internet Archive. Please read Summers's article at inkdroid.org, with an update here, for more information. Photo: "Memorial to Michael Brown" by Jamelle Bouie
web_mon
collection
3,809
ITEMS
90.9M
VIEWS
collection
eye 90.9M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Open Sky
collection
1
ITEMS
2,222
VIEWS
collection
eye 2,222
Demo crawl of scientific data. This data is currently not publicly accessible.
web_sup
collection
88
ITEMS
5.7M
VIEWS
collection
eye 5.7M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl F2
collection
1
ITEMS
191
VIEWS
collection
eye 191
Crawl F2 from Alexa Internet. This data is currently not publicly accessible.
VOX.com Crawl September 2010
collection
28
ITEMS
1M
VIEWS
collection
eye 1M
Crawl of vox.com, September 2010. This was an attempt to preserve vox.com content as much as possible in the wake of service closure, September 30, 2010.
Topic: webwidecrawl
Inktomi 2001
collection
1
ITEMS
67,675
VIEWS
collection
eye 67,675
Data collected in 2001. This data is currently not publicly accessible. From Wikipedia: Inktomi Corporation was a California company that provided software for Internet service providers. It was founded in 1996 by UC Berkeley professor Eric Brewer and graduate student Paul Gauthier. The company was initially founded based on the real-world success of the web search engine they developed at the university. After the bursting of the dot-com bubble, Inktomi was acquired by Yahoo!
Alexa Crawl Title
collection
1
ITEMS
274,172
VIEWS
collection
eye 274,172
Crawl Title from Alexa Internet. This data is currently not publicly accessible.
UK Government Site Crawl
collection
106
ITEMS
3.9M
VIEWS
collection
eye 3.9M
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Alexa Crawl Test
collection
6
ITEMS
9.7M
VIEWS
collection
eye 9.7M
Crawl Test from Alexa Internet. This data is currently not publicly accessible.
National Library of New Zealand Crawls
collection
10,952
ITEMS
54.9M
VIEWS
collection
eye 54.9M
Crawls performed by Internet Archive on behalf of the National Library of New Zealand. This data is currently not publicly accessible.
Nigerian Election
collection
1
ITEMS
25,184
VIEWS
collection
eye 25,184
Data related to Nigerian elections, 2001, collected by Internet Archive. This data is currently not publicly accessible.
Yahoo! Video Crawl
collection
4,484
ITEMS
42,873
VIEWS
collection
eye 42,873
Pages captured from Yahoo! Video prior to removal of user uploads. Crawl started February 2011. This data is currently not publicly accessible. From Wikipedia: Yahoo! Video is a video sharing website on which users could upload and share videos. The service is owned and created by Yahoo! Yahoo! Video began as an internet-wide video search engine and added the ability to upload and share video clips in June 2006. A re-designed site was launched in February 2008 that changed the focus to...
Alexa Crawl Image
collection
92
ITEMS
35.8M
VIEWS
collection
eye 35.8M
Crawl Image from Alexa Internet. This data is currently not publicly accessible.
web_ind
collection
91
ITEMS
5.2M
VIEWS
collection
eye 5.2M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Hurricane Katrina
collection
112
ITEMS
6.7M
VIEWS
collection
eye 6.7M
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
To Crawl
collection
1
ITEMS
78,773
VIEWS
collection
eye 78,773
Data collected by Internet Archive. This data is currently not publicly accessible.
web_osi
collection
677
ITEMS
19.6M
VIEWS
collection
eye 19.6M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Biblioteca Nazionale Centrale di Firenze
collection
223
ITEMS
10.4M
VIEWS
collection
eye 10.4M
Data collected by Internet Archive on behalf of Biblioteca Nazionale Centrale di Firenze. This data is currently not publicly accessible.
Alexa Crawl EE
collection
484
ITEMS
41.2M
VIEWS
collection
eye 41.2M
Crawl EE from Alexa Internet. This data is currently not publicly accessible.
web_sm_or
collection
16
ITEMS
2M
VIEWS
collection
eye 2M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Shallow Crawls
collection
1,042
ITEMS
106.5M
VIEWS
collection
eye 106.5M
Shallow crawls that collect content 1 level deep including embeds. This data is currently not publicly accessible.
Top 150 Crawl
collection
30
ITEMS
4.1M
VIEWS
collection
eye 4.1M
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
National Archives and Records Administration
collection
9,688
ITEMS
68.8M
VIEWS
collection
eye 68.8M
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Robot
collection
1
ITEMS
61,515
VIEWS
collection
eye 61,515
Crawl Robot from Alexa Internet. This data is currently not publicly accessible.
web_is_m
collection
1
ITEMS
9,408
VIEWS
collection
eye 9,408
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_oso
collection
150
ITEMS
7.8M
VIEWS
collection
eye 7.8M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
NDIIPP Youtube Crawl
collection
90
ITEMS
2M
VIEWS
collection
eye 2M
YouTube crawl performed by Internet Archive on behalf of the National Digital Information Infrastructure and Preservation Program (NDIIPP). This data is currently not publicly accessible.
collection
eye 2.3M
Data collected by Internet Archive on behalf of the Fundacao para a Computacao Cientifica Nacional of Portugal. This data is currently not publicly accessible.
FS Fed US
collection
3
ITEMS
13,101
VIEWS
collection
eye 13,101
Data collected in 2005 by Internet Archive. This data is currently not publicly accessible.
Cuil Crawl Data
collection
0
ITEMS
22M
VIEWS
collection
eye 22M
Web crawl snapshot generously donated by cuil.com. This collection of pages, mostly from 2007 with some from 2008, is about 310 terabytes of compressed data and almost 60 billion URLs (mostly text). Cuil was a search engine that organized web pages by content and displayed relatively long entries along with thumbnail pictures for many results. Cuil said it had a larger index than any other search engine, with about 120 billion web pages. It went live on July 28, 2008. Cuil's servers were shut...
web_eot
collection
245
ITEMS
1M
VIEWS
collection
eye 1M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_ma
collection
1,085
ITEMS
46.2M
VIEWS
collection
eye 46.2M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
University of Michigan
collection
5
ITEMS
1.1M
VIEWS
collection
eye 1.1M
Data collected by Internet Archive on behalf of the University of Michigan. This data is currently not publicly accessible. From Wikipedia: The University of Michigan, frequently referred to as simply Michigan, is a public research university located in Ann Arbor, Michigan, United States. It is the state's oldest university and the flagship campus of the University of Michigan.
Alexa Crawl ST
collection
1
ITEMS
518,137
VIEWS
collection
eye 518,137
Crawl ST from Alexa Internet. This data is currently not publicly accessible.
web_el
collection
925
ITEMS
44.9M
VIEWS
collection
eye 44.9M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Wordpress Blogs and the Pages They Link To
collection
27,752
ITEMS
349.6M
VIEWS
collection
eye 349.6M
This is a collection of pages and embedded objects from WordPress blogs and the external pages they link to. Captures of these pages are made continuously, seeded from a feed of new or changed pages hosted by WordPress.com or by sites running a properly configured Jetpack WordPress plugin.
Topics: Wordpress.com, blogs, jetpack
Brookings Institute Crawl
collection
1
ITEMS
112,802
VIEWS
collection
eye 112,802
Crawl data gathered by Internet Archive on behalf of the Brookings Institute. This data is currently not publicly accessible.
web_iq
collection
2,637
ITEMS
167M
VIEWS
collection
eye 167M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Web Crawls
data
eye 1,199
favorite 1
comment 0
Test WARC files.
Institut national de l’audiovisuel
collection
50
ITEMS
53.4M
VIEWS
collection
eye 53.4M
Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. From Wikipedia: The Institut national de l'audiovisuel (or INA, French for National Audiovisual Institute) is a repository of all French radio and television audiovisual archives. Since 2006, it has allowed free online consultation on a website called ina.fr with a search tool indexing 100,000 archives of historical programs, for a total of 20,000 hours.
Wayback Robots Crawl
collection
129
ITEMS
3.4M
VIEWS
collection
eye 3.4M
Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_sing
collection
3
ITEMS
916,023
VIEWS
collection
eye 916,023
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa 1996 Election Crawl
collection
1
ITEMS
29,049
VIEWS
collection
eye 29,049
1996 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Wikipedia Dumps
collection
810
ITEMS
64,595
VIEWS
collection
eye 64,595
Data dumps of the wikipedia.org web site.
Crawl Data
collection
32,956
ITEMS
11.6M
VIEWS
collection
eye 11.6M
Crawl Data. This data is currently not publicly accessible.
2004 Indian Ocean earthquake and tsunami
collection
42
ITEMS
4.3M
VIEWS
collection
eye 4.3M
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl CRC
collection
32
ITEMS
18.9M
VIEWS
collection
eye 18.9M
Crawl CRC from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl AUG
collection
80
ITEMS
33.2M
VIEWS
collection
eye 33.2M
Crawl AUG from Alexa Internet. This data is currently not publicly accessible.
collection
eye 53,670
Demo crawl for the National Oceanic and Atmospheric Administration (NOAA). This data is currently not publicly accessible. From Wikipedia: The National Oceanic and Atmospheric Administration (NOAA) is a scientific agency within the United States Department of Commerce focused on the conditions of the oceans and the atmosphere. NOAA warns of dangerous weather, charts seas and skies, guides the use and protection of ocean and coastal resources, and conducts research to improve understanding and...
Alexa Crawl BK
collection
1
ITEMS
63,001
VIEWS
collection
eye 63,001
Crawl BK from Alexa Internet. This data is currently not publicly accessible.
September 11th
collection
1
ITEMS
591,816
VIEWS
collection
eye 591,816
Data related to September 11th, 2001 collected by Internet Archive. This data is currently not publicly accessible. From Wikipedia: The September 11 attacks (also referred to as September 11, September 11th, or 9/11) were a series of four coordinated terrorist attacks launched by the Islamic terrorist group al-Qaeda upon the United States in New York City and the Washington, D.C. areas on September 11, 2001.
Target Product Crawl
collection
4
ITEMS
186
VIEWS
collection
eye 186
Target product crawl data collected by Alexa Internet. This data is currently not publicly accessible.
National Library of Ireland Crawls
collection
2,620
ITEMS
17.2M
VIEWS
collection
eye 17.2M
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
web_pop
collection
13
ITEMS
2.3M
VIEWS
collection
eye 2.3M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Standards
collection
1
ITEMS
495
VIEWS
collection
eye 495
Standards crawl data collected by Internet Archive. This data is currently not publicly accessible.
web_is
collection
5
ITEMS
1.5M
VIEWS
collection
eye 1.5M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl EI
collection
1,408
ITEMS
121.3M
VIEWS
collection
eye 121.3M
Crawl EI from Alexa Internet. This data is currently not publicly accessible.
Edu & Gov Crawl, June 2010
collection
704
ITEMS
13.2M
VIEWS
collection
eye 13.2M
TEST COLLECTION: Crawl of .edu and .gov sites started in June 2010.
Topic: crawldata
Alexa 2000 Election Crawl
collection
4
ITEMS
226,333
VIEWS
collection
eye 226,333
2000 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
web_eg
collection
32
ITEMS
2.4M
VIEWS
collection
eye 2.4M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_prin
collection
1
ITEMS
87,602
VIEWS
collection
eye 87,602
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_dar
collection
112
ITEMS
5.4M
VIEWS
collection
eye 5.4M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Traffic
collection
89
ITEMS
800
VIEWS
collection
eye 800
Sanitized traffic files from Alexa Internet: just base URLs (no parameters) and time/date. This data is currently not publicly accessible. Covers the period from December 2001 to February 2009.
National Science Digital Library
collection
3
ITEMS
34,907
VIEWS
collection
eye 34,907
Demo crawl for the National Science Digital Library. This data is currently not publicly accessible. From Wikipedia: The United States' National Science Digital Library (NSDL) is an open-access online digital library and collaborative network of disciplinary and grade-level focused education providers. NSDL's mission is to provide quality digital learning collections to the science, technology, engineering, and mathematics (STEM) education community, both formal and informal, institutional and...
Swiss National Library
collection
10
ITEMS
288,260
VIEWS
collection
eye 288,260
Data collected by Internet Archive on behalf of the Swiss National Library. This data is currently not publicly accessible.
Google Video
collection
7,208
ITEMS
130,708
VIEWS
collection
eye 130,708
Content crawled from video.google.com prior to shutdown. From Wikipedia: Google Videos (originally Google Video) was a video search engine, and formerly a free video sharing website, from Google Inc. Before removing user-uploaded content, the service allowed selected videos to be remotely embedded on other websites and provided the necessary HTML code alongside the media, similar to YouTube. This allowed for websites to host large amounts of video remotely without running into bandwidth or...
Alexa Crawl TO
collection
1
ITEMS
1.2M
VIEWS
collection
eye 1.2M
Crawl TO from Alexa Internet. This data is currently not publicly accessible.
Alexa MP3.com Crawl
collection
43
ITEMS
7,652
VIEWS
collection
eye 7,652
MP3.com Crawl from Alexa Internet. This data is currently not publicly accessible.
Mayoral Crawls
collection
1
ITEMS
185,779
VIEWS
collection
eye 185,779
Mayoral crawls performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl Short
collection
5
ITEMS
5.1M
VIEWS
collection
eye 5.1M
Crawl Short from Alexa Internet. This data is currently not publicly accessible.
web_con
collection
1,507
ITEMS
45.8M
VIEWS
collection
eye 45.8M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl DZ
collection
1,207
ITEMS
85.8M
VIEWS
collection
eye 85.8M
Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
web_leg
collection
58
ITEMS
5.9M
VIEWS
collection
eye 5.9M
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl EH
collection
1,218
ITEMS
100.8M
VIEWS
collection
eye 100.8M
Crawl EH from Alexa Internet. This data is currently not publicly accessible.
Wikipedia Dumps
by Wikipedia
web
eye 98
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 126
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 74
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 108
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 87
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 93
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 95
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 100
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 94
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 96
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 115
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 119
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 97
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web
eye 88
favorite 0
comment 0
Retrieved from wikipedia.org on April 8, 2010