Incremental crawl of the Portuguese web performed between 23 September 2014 and 24 October 2014 mainly from .PT domain. The AWP16 crawl is incremental because it was performed using DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/) taking the content of AWP15 as baseline. Thus, the files that remained unchanged from the AWP15 complete crawl were not archived (duplicated) on the AWP16 incremental crawl.
Topics: Incremental crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
Incremental crawl of the Portuguese web performed between 23 September 2014 and 24 October 2014 mainly from .PT domain. The AWP16 crawl is incremental because it was performed using DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/) taking the content of AWP15 as baseline. Thus, the files that remained unchanged from the AWP15 complete crawl were not archived (duplicated) on the AWP16 incremental crawl.
Topics: Incremental crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...