
Worldwide Web Crawls

Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites.



Collection | 1.3B views
Wide17 was seeded with the "Total Domains" list of 256,796,456 URLs provided by Domains Index on June 26th, and crawled with max-hops set to "3" and de-duplication set to "on".
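To make the "max-hops" setting concrete, here is a minimal sketch of a hop-limited, breadth-first crawl. It is not Heritrix's implementation: the function `hop_limited_crawl` and the in-memory `links` graph are hypothetical stand-ins for fetching pages, and "de-duplication" is modeled only as never queuing the same URL twice (the crawler's own de-duplication setting may instead refer to not re-storing content already seen).

```python
from collections import deque


def hop_limited_crawl(seeds, links, max_hops=3):
    """Breadth-first traversal that stops expanding at max_hops from any seed.

    `links` is a hypothetical in-memory link graph {url: [outlink, ...]};
    a real crawler would fetch each page and extract its links instead.
    Returns {url: hop_count} for every URL that would be queued.
    """
    hops = {}
    queue = deque()
    for seed in seeds:
        if seed not in hops:
            hops[seed] = 0
            queue.append(seed)
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hops:
            continue  # at the hop limit: keep the URL but do not follow its links
        for out in links.get(url, []):
            if out not in hops:  # URL-level de-duplication: queue each URL at most once
                hops[out] = hops[url] + 1
                queue.append(out)
    return hops


graph = {"a": ["b"], "b": ["c", "a"], "c": ["d"], "d": ["e"]}
print(hop_limited_crawl(["a"], graph))
# {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

Because the traversal is breadth-first, each URL is recorded with its shortest hop distance from a seed, so "e" (four hops away) is never queued.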
Collection | 1.9B views
The seed for Wide00014 was:
- Slash pages from every domain on the web:
  - a list of domains using Survey crawl seeds
  - a list of domains using Wide00012 web graph
  - a list of domains using Wide00013 web graph
- Top ranked pages (up to a max of 100) from every linked-to domain using the Wide00012 inter-domain navigational link graph:
  - a ranking of all URLs that have more than one incoming inter-domain link (rank was determined by number of incoming links using Wide00012 inter domain links)...
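The "top ranked pages" rule described above can be sketched as follows, assuming the link graph is available as (source URL, target URL) pairs. The function names are hypothetical, and treating the hostname as the "domain" is a simplifying assumption; the actual Wide00014 seed tooling is not described here.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    # Hypothetical simplification: use the hostname as the "domain".
    return urlsplit(url).hostname or ""


def top_pages_per_domain(edges, max_per_domain=100):
    """Rank linked-to URLs by incoming inter-domain links, per target domain.

    edges: iterable of (source_url, target_url) link pairs.
    """
    inbound = Counter()
    for src, dst in edges:
        if domain_of(src) != domain_of(dst):  # count inter-domain links only
            inbound[dst] += 1
    by_domain = defaultdict(list)
    for url, count in inbound.items():
        if count > 1:  # keep URLs with more than one incoming inter-domain link
            by_domain[domain_of(url)].append((count, url))
    result = {}
    for dom, ranked in by_domain.items():
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))  # most-linked first, ties by URL
        result[dom] = [url for _, url in ranked[:max_per_domain]]
    return result


edges = [
    ("http://a.com/1", "http://b.com/x"),
    ("http://c.com/1", "http://b.com/x"),
    ("http://a.com/2", "http://b.com/y"),
    ("http://c.com/2", "http://b.com/y"),
    ("http://d.com/", "http://b.com/y"),
    ("http://a.com/3", "http://b.com/z"),
    ("http://b.com/1", "http://b.com/x"),  # intra-domain link, ignored
]
print(top_pages_per_domain(edges))
# {'b.com': ['http://b.com/y', 'http://b.com/x']}
```

Here `http://b.com/z` is dropped because it has only one incoming inter-domain link, and the intra-domain link to `http://b.com/x` does not count toward its rank.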
Collection | 1.1B views
Web wide crawl.
Collection | 946.7M views
Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from Cisco and the top 1 million domains from Alexa.
Wide Crawl started April 2013
Collection | 25,035 items | 1.1B views
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Wide Crawl Number 12 - started March 14th, 2015
Collection | 49,621 items | 1B views
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Wide Crawl started June 2014
Collection | 45,341 items | 1B views
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Wide Crawl Number 13
Collection | 46,050 items | 758M views
Web Wide Crawl Number 13
Wide Crawl started August 2013
Collection | 21,932 items | 742.7M views
Web wide crawl with initial seedlist and crawler configuration from August 2013.
Wide Crawl started February 2014
Collection | 9,806 items | 468.3M views
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Wide Crawl started January 2012
Collection | 30,373 items | 646M views
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wide Crawl started April 2012
Collection | 39,279 items | 567.6M views
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Host Screen Captures
Collection | 17,413 items | 108.1M views
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Wide Crawl started October 2010
Collection | 15,839 items | 435.1M views
Web wide crawl with initial seedlist and crawler configuration from October 2010.
Wide Crawl started October 2011
Collection | 12,648 items | 392.8M views
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Wide Crawl started January 2013
Collection | 15,157 items | 420M views
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Wide Crawl started September 2012
Collection | 22,423 items | 413.7M views
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Wide Crawl started March 2011
Collection | 8,528 items | 366.4M views
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
- Crawl start date: 09 March 2011
- Crawl end date: 23 December 2011
- Number of captures: 2,713,676,341
- Number of unique URLs: 2,273,840,159
- Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
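The March 2011 statistics can be combined into a few derived figures. The short worked calculation below uses only the numbers listed in the description; the derived quantities (repeat captures, captures per URL, URLs per host) are illustrative arithmetic, not figures published with the crawl.

```python
from datetime import date

# Figures copied from the March 2011 collection description.
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069
duration_days = (date(2011, 12, 23) - date(2011, 3, 9)).days

repeat_captures = captures - unique_urls  # captures beyond the first one per URL
captures_per_url = captures / unique_urls
urls_per_host = unique_urls / hosts

print(f"{duration_days} days")                      # prints: 289 days
print(f"{repeat_captures:,} repeat captures")       # prints: 439,836,182 repeat captures
print(f"{captures_per_url:.2f} captures per URL")   # prints: 1.19 captures per URL
print(f"{urls_per_host:.1f} unique URLs per host")  # prints: 78.3 unique URLs per host
```

In other words, over roughly nine and a half months most URLs were captured once, with about one capture in six being a revisit of an already-captured URL.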
Wide Crawl started September 2010
Collection | 332 items | 12.9M views
Web wide crawl with initial seedlist and crawler configuration from September 2010.
survey_00010
Web | 5.5M views
Internet Archive crawldata from a feed driven by 1.2 million top-ranked domains from data.domainrank.io, captured by crawl423.us.archive.org:survey_00010 from Mon May 11 14:14:43 PDT 2020 to Mon May 11 09:09:55 PDT 2020.
Topics: survey_00010, crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Sun Aug 5 03:32:04 PDT 2018 to Mon Aug 6 22:50:58 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Mon Aug 6 14:06:39 PDT 2018 to Wed Aug 8 10:30:37 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Fri Aug 3 21:22:16 PDT 2018 to Sun Aug 5 10:02:05 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Mon Aug 6 12:59:58 PDT 2018 to Wed Aug 8 04:44:37 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Fri Aug 3 21:35:03 PDT 2018 to Sun Aug 5 05:00:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sat Aug 4 03:13:16 PDT 2018 to Sun Aug 5 14:12:46 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Fri Aug 3 20:25:18 PDT 2018 to Sun Aug 5 08:14:01 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Sun Aug 5 04:40:59 PDT 2018 to Tue Aug 7 09:40:31 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sun Aug 5 03:33:48 PDT 2018 to Tue Aug 7 01:48:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun Aug 5 05:32:56 PDT 2018 to Tue Aug 7 09:24:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl811.us.archive.org:wide from Sun Aug 5 01:02:39 PDT 2018 to Tue Aug 7 17:04:42 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Sun Aug 5 05:22:26 PDT 2018 to Mon Aug 6 22:26:16 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Mon Aug 6 13:08:43 PDT 2018 to Wed Aug 8 10:49:06 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Mon Aug 6 06:15:02 PDT 2018 to Wed Aug 8 17:46:21 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Sun Aug 5 00:35:40 PDT 2018 to Tue Aug 7 01:25:09 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sat Aug 4 03:13:18 PDT 2018 to Sun Aug 5 11:15:19 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Sat Aug 4 22:37:55 PDT 2018 to Mon Aug 6 22:17:37 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Sun Aug 5 01:06:54 PDT 2018 to Mon Aug 6 15:07:05 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Sun Aug 5 05:36:11 PDT 2018 to Mon Aug 6 06:38:33 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Mon Aug 6 12:03:20 PDT 2018 to Wed Aug 8 23:51:48 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Mon Aug 6 09:25:04 PDT 2018 to Wed Aug 8 09:47:22 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Sat Aug 4 20:34:59 PDT 2018 to Mon Aug 6 22:41:00 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Sun Aug 5 04:29:12 PDT 2018 to Mon Aug 6 09:39:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Mon Aug 6 13:38:33 PDT 2018 to Wed Aug 8 08:00:38 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Mon Aug 6 06:15:56 PDT 2018 to Wed Aug 8 04:21:46 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Mon Aug 6 16:39:31 PDT 2018 to Wed Aug 8 09:59:07 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Fri Aug 3 21:18:11 PDT 2018 to Sun Aug 5 14:38:58 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Sun Aug 5 06:35:49 PDT 2018 to Mon Aug 6 18:20:05 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sun Aug 5 03:16:34 PDT 2018 to Mon Aug 6 16:55:41 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl804.us.archive.org:wide from Sun Aug 5 04:04:57 PDT 2018 to Tue Aug 7 13:12:57 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Sat Aug 4 03:13:13 PDT 2018 to Sun Aug 5 16:18:03 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Fri Aug 3 21:35:01 PDT 2018 to Sun Aug 5 13:59:54 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Fri Aug 3 21:34:22 PDT 2018 to Sun Aug 5 08:16:26 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Mon Aug 6 09:28:15 PDT 2018 to Wed Aug 8 18:02:43 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Tue Aug 7 14:29:40 PDT 2018 to Thu Aug 9 10:29:22 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sun Aug 5 05:35:10 PDT 2018 to Mon Aug 6 17:04:32 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Mon Aug 6 09:59:22 PDT 2018 to Wed Aug 8 07:33:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Sat Aug 4 23:46:37 PDT 2018 to Tue Aug 7 01:00:56 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Mon Aug 6 13:48:48 PDT 2018 to Wed Aug 8 11:44:48 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Fri Aug 3 21:34:24 PDT 2018 to Sun Aug 5 10:09:58 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Mon Aug 6 14:52:48 PDT 2018 to Wed Aug 8 10:36:01 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl804.us.archive.org:wide from Fri Aug 3 21:34:25 PDT 2018 to Sun Aug 5 13:42:13 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Fri Aug 3 21:35:00 PDT 2018 to Sun Aug 5 12:48:40 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Aug 3 21:18:13 PDT 2018 to Sun Aug 5 13:22:13 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Mon Aug 6 09:17:48 PDT 2018 to Wed Aug 8 09:30:08 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Fri Aug 3 21:18:10 PDT 2018 to Sun Aug 5 09:52:49 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Fri Aug 3 20:26:12 PDT 2018 to Sun Aug 5 04:54:14 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl811.us.archive.org:wide from Fri Aug 3 21:22:15 PDT 2018 to Sun Aug 5 04:38:03 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Mon Aug 6 09:40:02 PDT 2018 to Wed Aug 8 08:03:57 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon Aug 6 04:16:16 PDT 2018 to Wed Aug 8 01:01:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Fri Aug 3 21:34:26 PDT 2018 to Sun Aug 5 00:05:49 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Mon Aug 6 02:27:04 PDT 2018 to Tue Aug 7 12:04:12 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Sun Aug 5 00:24:43 PDT 2018 to Thu Aug 16 16:44:19 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Fri Aug 3 21:22:17 PDT 2018 to Sun Aug 5 05:21:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Aug 3 20:27:13 PDT 2018 to Sun Aug 5 03:13:30 PDT 2018.
Topic: crawldata
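Each crawldata description above follows the same "captured by HOST:JOB from START to END" wording, so the capture window of an item can be recovered mechanically. The sketch below assumes that wording stays fixed (it is inferred from this listing, not an official schema), and `capture_window` and `parse_stamp` are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches the "captured by HOST:JOB from START to END" wording used in the
# descriptions above (an assumption about their format, not an official schema).
DESC_RE = re.compile(
    r"captured by (?P<host>[^\s:]+):(?P<job>\S+) from (?P<start>.+?) to (?P<end>.+?)\.?$"
)


def parse_stamp(text):
    """Parse e.g. 'Sun Aug 5 03:32:04 PDT 2018' into a naive datetime.

    The zone token (5th field) is dropped: both ends of each window in this
    listing carry the same zone, so their difference is unaffected.
    """
    fields = text.split()
    del fields[4]
    return datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")


def capture_window(description):
    """Return (crawler_host, duration) for one crawldata description."""
    m = DESC_RE.search(description)
    if m is None:
        raise ValueError(f"unrecognized description: {description!r}")
    return m["host"], parse_stamp(m["end"]) - parse_stamp(m["start"])


line = ("Internet Archive crawldata from Webwide Crawl, captured by "
        "crawl806.us.archive.org:wide from Sun Aug 5 03:32:04 PDT 2018 "
        "to Mon Aug 6 22:50:58 PDT 2018.")
host, span = capture_window(line)
print(host, span)  # prints: crawl806.us.archive.org 1 day, 19:18:54
```

Applied across the whole listing, this makes it easy to group items by crawler host or to spot unusually long windows, such as the crawl420 item that runs from August 5 to August 16, 2018.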