org.archive.wayback.resourceindex.filters
Class ConditionalGetAnnotationFilter
java.lang.Object
org.archive.wayback.resourceindex.filters.ConditionalGetAnnotationFilter
- All Implemented Interfaces:
- ObjectFilter<CaptureSearchResult>
public class ConditionalGetAnnotationFilter
- extends Object
- implements ObjectFilter<CaptureSearchResult>
The WARC format allows two forms of deduplication. The first downloads each
document and compares its digest with a database of previous values. When
a new capture of a document exactly matches the previous digest, an
abbreviated record is stored in the WARC file. The second form uses an HTTP
conditional GET request, sending the values previously returned for a given
URL (ETag, Last-Modified, etc.). In this case, the remote server either sends
a new document (200), which is stored normally, or returns a
304 (Not Modified) response, which is stored in the WARC file.
For the first record type, the wayback indexer will output a placeholder
record that includes the digest of the last-stored record. For 304 responses,
the indexer outputs a normal-looking record, but the record has a
SHA1 digest that is easily recognizable as that of an "empty" document. This
Base32-encoded SHA1 is always:
3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
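The constant above is the Base32 encoding of the SHA1 digest of zero bytes of input. As a minimal sketch of how it can be reproduced, the following uses only the JDK; the class and method names (EmptyDigestSketch, base32, sha1Base32) are illustrative and are not part of the wayback API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of wayback: reproduces the "empty" digest.
class EmptyDigestSketch {
    // RFC 4648 Base32 alphabet.
    private static final String BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Encode bytes as unpadded Base32, consuming the input 5 bits at a time.
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0; // holds bits not yet emitted (low-order bits are valid)
        int bits = 0;   // number of valid bits currently in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(BASE32_ALPHABET.charAt((buffer >> (bits - 5)) & 0x1F));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the remaining bits in a final 5-bit group.
            out.append(BASE32_ALPHABET.charAt((buffer << (5 - bits)) & 0x1F));
        }
        return out.toString();
    }

    // SHA1 of the payload, rendered as Base32.
    static String sha1Base32(byte[] payload) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return base32(sha1.digest(payload));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Digest of an empty payload; compare with the constant quoted above.
        System.out.println(sha1Base32(new byte[0]));
    }
}
```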
This class observes a stream of CaptureSearchResult objects, remembering the
values from the most recently seen record with a non-empty SHA1. Any
subsequent result carrying the empty SHA1 is annotated by copying the values
from that last non-empty record.
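The behaviour described above can be sketched roughly as follows. This is not the class's actual source: Capture is a simplified stand-in for CaptureSearchResult, the copied fields (copiedDigest, copiedFrom) are hypothetical, the constant value for FILTER_INCLUDE is a placeholder, and leaving a 304 record untouched when no earlier non-empty record exists is an assumption.

```java
// Illustrative sketch only; names and fields are hypothetical stand-ins.
class AnnotationSketch {
    static final String EMPTY_SHA1 = "3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ";
    static final int FILTER_INCLUDE = 0; // placeholder value for this sketch

    // Simplified stand-in for CaptureSearchResult.
    static class Capture {
        final String timestamp;
        final String digest;
        String copiedDigest; // digest copied from the last non-empty record
        String copiedFrom;   // timestamp of the record the values came from

        Capture(String timestamp, String digest) {
            this.timestamp = timestamp;
            this.digest = digest;
        }
    }

    static class Annotator {
        private Capture lastNonEmpty; // last record seen with a non-empty SHA1

        int filterObject(Capture c) {
            if (EMPTY_SHA1.equals(c.digest)) {
                // 304 record: annotate from the last non-empty record, if any.
                if (lastNonEmpty != null) {
                    c.copiedDigest = lastNonEmpty.digest;
                    c.copiedFrom = lastNonEmpty.timestamp;
                }
            } else {
                // Normal record: remember it for later 304 records.
                lastNonEmpty = c;
            }
            // Every record is kept; this filter only annotates.
            return FILTER_INCLUDE;
        }
    }
}
```

Because the annotator keeps state between calls, a sketch like this assumes results arrive in capture order for a single URL and that a fresh instance is used per stream.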
This is highly experimental.
- Version:
- $Date: 2010-09-29 05:28:38 +0700 (Wed, 29 Sep 2010) $, $Revision: 3262 $
- Author:
- brad
Method Summary
int filterObject(CaptureSearchResult o)
Inspect record and determine if it should be included in the
results or not, or if processing of new records should stop.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ConditionalGetAnnotationFilter
public ConditionalGetAnnotationFilter()
filterObject
public int filterObject(CaptureSearchResult o)
- Description copied from interface:
ObjectFilter
- Inspect record and determine if it should be included in the
results or not, or if processing of new records should stop.
- Specified by:
filterObject in interface ObjectFilter<CaptureSearchResult>
- Parameters:
o - Object which should be checked for inclusion/exclusion or abort
- Returns:
- int of FILTER_INCLUDE, FILTER_EXCLUDE, or FILTER_ABORT
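As a rough illustration of how a caller might act on the three return codes, the sketch below applies a filter function to a sequence of records. It is not wayback code: FilterLoopSketch and apply are hypothetical names, the numeric constant values are placeholders, and dropping the record that triggers FILTER_ABORT is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical caller loop; constant values are placeholders for this sketch.
class FilterLoopSketch {
    static final int FILTER_INCLUDE = 0;
    static final int FILTER_EXCLUDE = 1;
    static final int FILTER_ABORT = 2;

    // Run each record through the filter and collect the included ones.
    static <T> List<T> apply(Iterable<T> records, ToIntFunction<T> filter) {
        List<T> kept = new ArrayList<>();
        for (T record : records) {
            int verdict = filter.applyAsInt(record);
            if (verdict == FILTER_ABORT) {
                break; // stop processing any further records
            }
            if (verdict == FILTER_INCLUDE) {
                kept.add(record);
            }
            // FILTER_EXCLUDE: skip this record and continue
        }
        return kept;
    }
}
```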
Copyright © 2005-2011 Internet Archive. All Rights Reserved.