org.archive.wayback.resourceindex.filters
Class ConditionalGetAnnotationFilter

java.lang.Object
  extended by org.archive.wayback.resourceindex.filters.ConditionalGetAnnotationFilter
All Implemented Interfaces:
ObjectFilter<CaptureSearchResult>

public class ConditionalGetAnnotationFilter
extends Object
implements ObjectFilter<CaptureSearchResult>

The WARC format supports two forms of deduplication.

The first downloads each document and compares its digest against a database of previously seen values. When a new capture exactly matches the previous digest, an abbreviated record is stored in the WARC file.

The second uses an HTTP conditional GET request, sending the values previously returned for a given URL (ETag, Last-Modified, etc.). The remote server then either sends a new document (200), which is stored normally, or returns a 304 (Not Modified) response, which is stored in the WARC file.

For the first record type, the wayback indexer outputs a placeholder record that includes the digest of the last-stored record. For 304 responses, the indexer outputs a normal-looking record, but its SHA1 digest is easily recognizable as that of an "empty" (zero-length) document. This SHA1 is always:

3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ

This class observes a stream of SearchResults and stores the values from the last record seen with a non-empty SHA1. Each subsequent SearchResult carrying the empty SHA1 is annotated with values copied from that last non-empty record. This is highly experimental.
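The constant above is the RFC 4648 Base32 encoding of the SHA-1 digest of a zero-length payload, consistent with the "empty" document described. A minimal Java sketch to check this, assuming upper-case Base32 without padding; the class and method names are hypothetical and not part of the Wayback API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Wayback: recomputes the "empty" SHA1.
final class EmptyDigest {
    // RFC 4648 Base32 alphabet.
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    /** Encodes bytes as RFC 4648 Base32, omitting '=' padding. */
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;   // pending bits (only the low bitsLeft bits matter)
        int bitsLeft = 0; // number of pending bits not yet emitted
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitsLeft += 8;
            while (bitsLeft >= 5) {
                out.append(ALPHABET[(buffer >> (bitsLeft - 5)) & 0x1F]);
                bitsLeft -= 5;
            }
        }
        if (bitsLeft > 0) { // flush remaining bits, zero-padded on the right
            out.append(ALPHABET[(buffer << (5 - bitsLeft)) & 0x1F]);
        }
        return out.toString();
    }

    /** Base32-encoded SHA-1 of the given payload. */
    static String sha1Base32(byte[] payload) {
        try {
            return base32(MessageDigest.getInstance("SHA-1").digest(payload));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Base32(new byte[0]));
    }
}
```

Because a SHA-1 digest is 20 bytes (160 bits), its Base32 form is exactly 32 characters and needs no padding.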

Version:
$Date: 2010-09-29 05:28:38 +0700 (Wed, 29 Sep 2010) $, $Revision: 3262 $
Author:
brad

Field Summary
 
Fields inherited from interface org.archive.wayback.util.ObjectFilter
FILTER_ABORT, FILTER_EXCLUDE, FILTER_INCLUDE
 
Constructor Summary
ConditionalGetAnnotationFilter()
           
 
Method Summary
 int filterObject(CaptureSearchResult o)
          Inspect a record and determine whether it should be included in the results, or whether processing of further records should stop.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ConditionalGetAnnotationFilter

public ConditionalGetAnnotationFilter()
Method Detail

filterObject

public int filterObject(CaptureSearchResult o)
Description copied from interface: ObjectFilter
Inspect a record and determine whether it should be included in the results, or whether processing of further records should stop.

Specified by:
filterObject in interface ObjectFilter<CaptureSearchResult>
Parameters:
o - Object which should be checked for inclusion/exclusion or abort
Returns:
one of FILTER_INCLUDE, FILTER_EXCLUDE, or FILTER_ABORT
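The annotation behaviour from the class description can be sketched roughly as follows. This is not the Wayback source: Capture is a hypothetical stand-in for CaptureSearchResult, the file/offset fields and the constant values are assumptions, and an empty-SHA1 record that arrives before any non-empty record is simply passed through, which the documentation does not specify. The apply loop shows one way a caller might honour the three return codes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CaptureSearchResult; field names are assumptions.
final class Capture {
    final String timestamp;
    String digest;      // Base32 SHA-1 of the payload
    String file;        // WARC file holding the payload (assumed field)
    long offset;        // record offset within that file (assumed field)
    boolean annotated;  // true once values were copied from an earlier capture

    Capture(String timestamp, String digest, String file, long offset) {
        this.timestamp = timestamp;
        this.digest = digest;
        this.file = file;
        this.offset = offset;
    }
}

// Sketch of the annotation behaviour; not the actual Wayback implementation.
final class AnnotationSketch {
    // Assumed values; the real constants are defined on ObjectFilter.
    static final int FILTER_INCLUDE = 0;
    static final int FILTER_EXCLUDE = 1;
    static final int FILTER_ABORT = -1;

    // SHA-1 of an empty payload, as stated in the class description.
    static final String EMPTY_SHA1 = "3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ";

    private Capture lastNonEmpty; // most recent capture with a real payload

    int filterObject(Capture c) {
        if (!EMPTY_SHA1.equals(c.digest)) {
            lastNonEmpty = c;          // remember it for later 304 records
            return FILTER_INCLUDE;
        }
        if (lastNonEmpty != null) {    // a 304 record: copy the earlier values
            c.digest = lastNonEmpty.digest;
            c.file = lastNonEmpty.file;
            c.offset = lastNonEmpty.offset;
            c.annotated = true;
        }
        // Assumption: an empty record with nothing to copy from is kept unchanged.
        return FILTER_INCLUDE;
    }

    // Caller loop: keep INCLUDE results, skip EXCLUDE, stop on ABORT.
    static List<Capture> apply(AnnotationSketch filter, List<Capture> captures) {
        List<Capture> kept = new ArrayList<>();
        for (Capture c : captures) {
            int decision = filter.filterObject(c);
            if (decision == FILTER_ABORT) {
                break;
            }
            if (decision == FILTER_INCLUDE) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

Since the sketch only remembers the last non-empty record it saw, it assumes the captures of one URL arrive in chronological order.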


Copyright © 2005-2011 Internet Archive. All Rights Reserved.