Wide17 was seeded with the "Total Domains" list of 256,796,456 URLs provided by Domains Index on June 26th, and crawled with max-hops set to "3" and de-duplication set "on".
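As a rough illustration of what a hop limit of 3 means, the sketch below walks a toy link graph breadth-first and stops following links from any page that is already 3 hops from a seed. The function name `crawl_frontier`, the `outlinks` mapping, and the toy graph are hypothetical and are not taken from the crawler's actual configuration. The visited map only keeps a URL from being queued twice; the crawl's de-duplication setting is not modelled here.

```python
from collections import deque

def crawl_frontier(seeds, outlinks, max_hops=3):
    """Return {url: hop_count} for every URL reachable within max_hops of a seed."""
    hops = {url: 0 for url in seeds}   # seeds are at hop 0
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue                   # at the hop limit: do not follow its links
        for nxt in outlinks.get(url, []):
            if nxt not in hops:        # first discovery gives the shortest hop count
                hops[nxt] = hops[url] + 1
                queue.append(nxt)
    return hops

# Hypothetical chain s -> a -> b -> c -> d
toy_graph = {"s": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": []}
reached = crawl_frontier(["s"], toy_graph)
print(reached)
```

In this toy chain, `d` is four hops from the seed, so it is never queued.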
The seed for Wide00014 was:
- Slash pages from every domain on the web:
-- a list of domains using Survey crawl seeds
-- a list of domains using Wide00012 web graph
-- a list of domains using Wide00013 web graph
- Top ranked pages (up to a max of 100) from every linked-to domain using the Wide00012 inter-domain navigational link graph
-- a ranking of all URLs that have more than one incoming inter-domain link (rank was determined by number of incoming links using Wide00012 inter domain links)...
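The ranking step described for Wide00014 can be sketched as follows. This is a minimal illustration, not the pipeline that was actually used: the function names, the `(source, target)` edge format, and the sample links are all hypothetical, and the URL's host stands in for its "domain". It counts distinct inter-domain inbound links per URL, keeps URLs with more than one, and returns up to 100 per linked-to domain.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

def host(url):
    """Lower-cased host of a URL (used here as a stand-in for its domain)."""
    return (urlsplit(url).hostname or "").lower()

def top_pages_per_domain(links, max_per_domain=100, min_inlinks=2):
    """Rank URLs by inter-domain inbound links and keep the top ones per domain."""
    inlinks = Counter()
    for src, dst in set(links):              # count each (source, target) pair once
        if host(src) != host(dst):           # ignore links within the same host
            inlinks[dst] += 1
    by_domain = defaultdict(list)
    for url, count in inlinks.items():
        if count >= min_inlinks:             # "more than one incoming inter-domain link"
            by_domain[host(url)].append((count, url))
    return {
        domain: [url for _, url in
                 sorted(pages, key=lambda p: (-p[0], p[1]))[:max_per_domain]]
        for domain, pages in by_domain.items()
    }

# Hypothetical link graph edges
links = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://c.example/1", "http://b.example/x"),
    ("http://d.example/", "http://b.example/x"),
    ("http://b.example/z", "http://b.example/x"),   # same host, ignored
    ("http://a.example/2", "http://b.example/y"),
    ("http://c.example/2", "http://b.example/y"),
    ("http://a.example/1", "http://c.example/only-one"),
]
ranked = top_pages_per_domain(links)
print(ranked)
```

Ties are broken alphabetically by URL here only to make the output deterministic; the source does not say how ties were handled.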
Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from CISCO and the top 1 million domains from Alexa.
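One way to read "join" here is as a union of the two lists with duplicates removed. The sketch below shows that reading under stated assumptions: the helper `merge_seed_lists`, the normalisation rules, and the sample domains are hypothetical, and the source does not say how the lists were actually combined.

```python
def merge_seed_lists(*ranked_lists):
    """Union several domain lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for domains in ranked_lists:
        for domain in domains:
            # Normalise case and a trailing dot so equivalent names compare equal
            key = domain.strip().lower().rstrip(".")
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged

# Hypothetical stand-ins for the two top-1-million lists
cisco_top = ["google.com", "netflix.com", "Example.org"]
alexa_top = ["google.com", "youtube.com", "example.org."]
seeds = merge_seed_lists(cisco_top, alexa_top)
print(seeds)
```

Because the two lists overlap, the merged seed list would be smaller than 2 million entries under this reading.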
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Web wide crawl with initial seedlist and crawler configuration from August 2013.
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Web wide crawl with initial seedlist and crawler configuration from October 2010.
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
Web wide crawl with initial seedlist and crawler configuration from September 2010.
May 14, 2020, by Internet Archive
"Internet Archive crawldata from feed-driven by 1.2 million top ranked domains from data.domainrank.io - captured by crawl423.us.archive.org:survey_00010 from Mon May 11 14:14:43 PDT 2020 to Mon May 11 09:09:55 PDT 2020."
Topics: survey_00010, crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Sun Aug 5 03:32:04 PDT 2018 to Mon Aug 6 22:50:58 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Mon Aug 6 14:06:39 PDT 2018 to Wed Aug 8 10:30:37 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Fri Aug 3 21:22:16 PDT 2018 to Sun Aug 5 10:02:05 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Mon Aug 6 12:59:58 PDT 2018 to Wed Aug 8 04:44:37 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Fri Aug 3 21:35:03 PDT 2018 to Sun Aug 5 05:00:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sat Aug 4 03:13:16 PDT 2018 to Sun Aug 5 14:12:46 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Fri Aug 3 20:25:18 PDT 2018 to Sun Aug 5 08:14:01 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Sun Aug 5 04:40:59 PDT 2018 to Tue Aug 7 09:40:31 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sun Aug 5 03:33:48 PDT 2018 to Tue Aug 7 01:48:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun Aug 5 05:32:56 PDT 2018 to Tue Aug 7 09:24:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl811.us.archive.org:wide from Sun Aug 5 01:02:39 PDT 2018 to Tue Aug 7 17:04:42 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Sun Aug 5 05:22:26 PDT 2018 to Mon Aug 6 22:26:16 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Mon Aug 6 13:08:43 PDT 2018 to Wed Aug 8 10:49:06 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Mon Aug 6 06:15:02 PDT 2018 to Wed Aug 8 17:46:21 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Sun Aug 5 00:35:40 PDT 2018 to Tue Aug 7 01:25:09 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sat Aug 4 03:13:18 PDT 2018 to Sun Aug 5 11:15:19 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Sat Aug 4 22:37:55 PDT 2018 to Mon Aug 6 22:17:37 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Sun Aug 5 01:06:54 PDT 2018 to Mon Aug 6 15:07:05 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Sun Aug 5 05:36:11 PDT 2018 to Mon Aug 6 06:38:33 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Mon Aug 6 12:03:20 PDT 2018 to Wed Aug 8 23:51:48 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Mon Aug 6 09:25:04 PDT 2018 to Wed Aug 8 09:47:22 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Sat Aug 4 20:34:59 PDT 2018 to Mon Aug 6 22:41:00 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Sun Aug 5 04:29:12 PDT 2018 to Mon Aug 6 09:39:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Mon Aug 6 13:38:33 PDT 2018 to Wed Aug 8 08:00:38 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Mon Aug 6 06:15:56 PDT 2018 to Wed Aug 8 04:21:46 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Mon Aug 6 16:39:31 PDT 2018 to Wed Aug 8 09:59:07 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Fri Aug 3 21:18:11 PDT 2018 to Sun Aug 5 14:38:58 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Sun Aug 5 06:35:49 PDT 2018 to Mon Aug 6 18:20:05 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sun Aug 5 03:16:34 PDT 2018 to Mon Aug 6 16:55:41 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl804.us.archive.org:wide from Sun Aug 5 04:04:57 PDT 2018 to Tue Aug 7 13:12:57 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Sat Aug 4 03:13:13 PDT 2018 to Sun Aug 5 16:18:03 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Fri Aug 3 21:35:01 PDT 2018 to Sun Aug 5 13:59:54 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Fri Aug 3 21:34:22 PDT 2018 to Sun Aug 5 08:16:26 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Mon Aug 6 09:28:15 PDT 2018 to Wed Aug 8 18:02:43 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Tue Aug 7 14:29:40 PDT 2018 to Thu Aug 9 10:29:22 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sun Aug 5 05:35:10 PDT 2018 to Mon Aug 6 17:04:32 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Mon Aug 6 09:59:22 PDT 2018 to Wed Aug 8 07:33:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Sat Aug 4 23:46:37 PDT 2018 to Tue Aug 7 01:00:56 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Mon Aug 6 13:48:48 PDT 2018 to Wed Aug 8 11:44:48 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Fri Aug 3 21:34:24 PDT 2018 to Sun Aug 5 10:09:58 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Mon Aug 6 14:52:48 PDT 2018 to Wed Aug 8 10:36:01 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl804.us.archive.org:wide from Fri Aug 3 21:34:25 PDT 2018 to Sun Aug 5 13:42:13 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Fri Aug 3 21:35:00 PDT 2018 to Sun Aug 5 12:48:40 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Aug 3 21:18:13 PDT 2018 to Sun Aug 5 13:22:13 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Mon Aug 6 09:17:48 PDT 2018 to Wed Aug 8 09:30:08 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Fri Aug 3 21:18:10 PDT 2018 to Sun Aug 5 09:52:49 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Fri Aug 3 20:26:12 PDT 2018 to Sun Aug 5 04:54:14 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl811.us.archive.org:wide from Fri Aug 3 21:22:15 PDT 2018 to Sun Aug 5 04:38:03 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Mon Aug 6 09:40:02 PDT 2018 to Wed Aug 8 08:03:57 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon Aug 6 04:16:16 PDT 2018 to Wed Aug 8 01:01:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Fri Aug 3 21:34:26 PDT 2018 to Sun Aug 5 00:05:49 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Mon Aug 6 02:27:04 PDT 2018 to Tue Aug 7 12:04:12 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Sun Aug 5 00:24:43 PDT 2018 to Thu Aug 16 16:44:19 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Fri Aug 3 21:22:17 PDT 2018 to Sun Aug 5 05:21:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Aug 3 20:27:13 PDT 2018 to Sun Aug 5 03:13:30 PDT 2018.
Topic: crawldata