Package org.archive.wayback.resourceindex.filters

Class Summary
BeanShellFilter  
ClosestResultTrackingFilter Class which observes CaptureSearchResults, keeping track of the closest result found to a given date.
CompositeExclusionFilter SearchResultFilter that wraps multiple SearchResultFilters: a result is included only if every filter returns INCLUDE, and the first filter to return ABORT or EXCLUDE short-circuits the rest.
ConditionalGetAnnotationFilter The WARC file format allows two forms of deduplication.
CounterFilter SearchResultFilter which INCLUDEs all checked records, but keeps track of how many were seen during processing.
DateRangeFilter SearchResultFilter that excludes records falling outside the range between a start date and an end date.
DuplicateRecordFilter ObjectFilter which omits exact duplicate URL+date records from a stream of CaptureSearchResult.
EndDateFilter SearchResultFilter which includes all records until one is found beyond the end date, then aborts processing.
ExclusionFilter  
FilePrefixFilter  
FileRegexFilter  
GuardRailFilter SearchResultFilter which aborts processing when too many records have been inspected.
HostMatchFilter SearchResultFilter which includes only records whose original host matches the specified host.
HttpCodeFilter ObjectFilter which allows including or excluding results based on the HTTP response code.
MimeTypeFilter SearchResultFilter which includes only records matching one or more supplied MIME types.
OracleAnnotationFilter SearchResult filter class which contacts an access-control Oracle, using information from the public comment field to annotate SearchResult objects.
SchemeMatchFilter ObjectFilter which omits CaptureSearchResult objects if their scheme does not match the specified scheme.
SelfRedirectFilter SearchResultFilter which INCLUDEs all records, unless they redirect to themselves, via whatever URL purification schemes are in use.
StartDateFilter SearchResultFilter which includes all records until one is found before the start date, then aborts processing.
UrlMatchFilter SearchResultFilter which includes only records whose URL matches, and aborts as soon as a URL does not match.
UrlPrefixMatchFilter SearchResultFilter which includes any URL which begins with a given prefix, and aborts processing when any URL does not match the prefix.
UserInfoInAuthorityFilter Class which omits CaptureSearchResults that have an '@' in the original URL field, if that '@' is after the scheme and before the first '/' or ':'.
WARCRevisitAnnotationFilter Filter class that observes a stream of SearchResults, tracking for each complete record a mapping from that record's digest to its ARC/WARC filename, ARC/WARC offset, HTTP response, MIME type, and redirect URL. If a subsequent SearchResult is missing these fields ("-") and its digest is in the map, the missing fields are replaced with the values from the previously seen record with the same digest, and an additional annotation field is added.
WindowEndFilter<T> SearchResultFilter that includes the first N records seen.
WindowStartFilter<T> SearchResultFilter that omits the first N records seen.
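
Several of the classes above share one pattern: each filter answers INCLUDE, EXCLUDE, or ABORT for a record, and a composite stops at the first answer that is not INCLUDE. The following is a minimal, self-contained sketch of that pattern, not the package's actual API. The names FilterSketch, Decision, Filter, check, skipFirst, takeFirst, composite, and apply are all hypothetical, and the choice to ABORT once the first N records have been included is an assumption about how a window-end filter might behave.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSketch {
    /** Hypothetical outcome codes mirroring INCLUDE / EXCLUDE / ABORT. */
    public enum Decision { INCLUDE, EXCLUDE, ABORT }

    /** Hypothetical single-method filter interface (not the real ObjectFilter). */
    public interface Filter<T> {
        Decision check(T item);
    }

    /** Omits the first n records seen, then includes everything (window-start style). */
    public static <T> Filter<T> skipFirst(int n) {
        final int[] seen = {0};
        return item -> (seen[0]++ < n) ? Decision.EXCLUDE : Decision.INCLUDE;
    }

    /** Includes the first n records seen; assumed here to abort afterwards (window-end style). */
    public static <T> Filter<T> takeFirst(int n) {
        final int[] seen = {0};
        return item -> (seen[0]++ < n) ? Decision.INCLUDE : Decision.ABORT;
    }

    /** A record is included only if every filter includes it; the first EXCLUDE or ABORT wins. */
    public static <T> Filter<T> composite(List<Filter<T>> filters) {
        return item -> {
            for (Filter<T> f : filters) {
                Decision d = f.check(item);
                if (d != Decision.INCLUDE) {
                    return d; // short-circuit: later filters never see this record
                }
            }
            return Decision.INCLUDE;
        };
    }

    /** Runs a filter over records, collecting included ones and stopping on ABORT. */
    public static <T> List<T> apply(Iterable<T> records, Filter<T> filter) {
        List<T> out = new ArrayList<>();
        for (T r : records) {
            Decision d = filter.check(r);
            if (d == Decision.ABORT) {
                break;
            }
            if (d == Decision.INCLUDE) {
                out.add(r);
            }
        }
        return out;
    }
}
```

In this sketch, composing skipFirst(1) with takeFirst(2) yields a paging window: records excluded by the first filter never reach the second, so the second counts only records that passed the first.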
 
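
The digest-based back-filling described for WARCRevisitAnnotationFilter can be illustrated with a small sketch. This is not the class's real implementation: RevisitSketch, Capture, and the boolean annotated flag are hypothetical stand-ins, a record is assumed to count as incomplete when its file name is "-", and the most recently seen complete record is assumed to win when a digest repeats.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RevisitSketch {
    /** Placeholder value marking a missing field, as in the description above. */
    public static final String MISSING = "-";

    /** Hypothetical, simplified capture record; not the real CaptureSearchResult. */
    public static class Capture {
        public String digest, file, offset, httpCode, mimeType, redirectUrl;
        public boolean annotated; // stands in for the "additional annotation field"

        public Capture(String digest, String file, String offset,
                       String httpCode, String mimeType, String redirectUrl) {
            this.digest = digest;
            this.file = file;
            this.offset = offset;
            this.httpCode = httpCode;
            this.mimeType = mimeType;
            this.redirectUrl = redirectUrl;
        }
    }

    /** Fills in incomplete captures from an earlier complete capture with the same digest. */
    public static void annotate(List<Capture> captures) {
        Map<String, Capture> byDigest = new HashMap<>();
        for (Capture c : captures) {
            if (!MISSING.equals(c.file)) {
                // Complete record: remember it for later records with the same digest.
                byDigest.put(c.digest, c);
                continue;
            }
            Capture original = byDigest.get(c.digest);
            if (original == null) {
                continue; // no earlier record with this digest; leave the capture unchanged
            }
            c.file = original.file;
            c.offset = original.offset;
            c.httpCode = original.httpCode;
            c.mimeType = original.mimeType;
            c.redirectUrl = original.redirectUrl;
            c.annotated = true;
        }
    }
}
```

Because the map is filled while the stream is read in order, an incomplete record can only borrow from a complete record that appeared earlier in the stream.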



Copyright © 2005-2011 Internet Archive. All Rights Reserved.