org.archive.wayback.accesscontrol.robotstxt
Class RobotExclusionFilter
java.lang.Object
org.archive.wayback.accesscontrol.robotstxt.RobotExclusionFilter
- All Implemented Interfaces:
- ObjectFilter<CaptureSearchResult>
public class RobotExclusionFilter
- extends java.lang.Object
- implements ObjectFilter<CaptureSearchResult>
A CaptureSearchResult filter that uses a LiveWebCache to retrieve robots.txt
documents from the live web, and filters search results based on the rules
therein.
This class caches the parsed RobotRules it retrieves, so using the same
instance to filter multiple search results from the same host is more
efficient than creating a new instance for each result.
Instances are expected to be transient, created for a single request and not
shared between threads: the internally cached StringBuilder is not thread-safe.
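The caching behavior described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: HostRulesCache, its loader function, and loadCount() are hypothetical names introduced here, and the rules are represented by a generic type rather than RobotRules. The plain HashMap is deliberately unsynchronized, mirroring the expectation that an instance serves a single request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for per-host rule caching; not part of the Wayback API.
class HostRulesCache<R> {
    // Unsynchronized on purpose: one instance per request, one thread.
    private final Map<String, R> rulesByHost = new HashMap<>();
    // Assumed loader that fetches and parses robots.txt for a host.
    private final Function<String, R> loader;
    private int loads = 0;

    HostRulesCache(Function<String, R> loader) {
        this.loader = loader;
    }

    // Returns cached rules for the host, loading them only on first use.
    R rulesFor(String host) {
        R rules = rulesByHost.get(host);
        if (rules == null) {
            rules = loader.apply(host);
            loads++;
            rulesByHost.put(host, rules);
        }
        return rules;
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        HostRulesCache<String> cache = new HostRulesCache<>(h -> "rules for " + h);
        cache.rulesFor("example.org");
        cache.rulesFor("example.org"); // served from the cache
        cache.rulesFor("example.com");
        System.out.println(cache.loadCount()); // prints 2
    }
}
```

Filtering many results from the same host with one instance triggers a single load per host, which is the efficiency gain the class description refers to.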
- Version:
- $Date$, $Revision$
- Author:
- brad
Constructor Summary
RobotExclusionFilter(LiveWebCache webCache,
java.lang.String userAgent,
long maxCacheMS)
Construct a new RobotExclusionFilter that uses webCache to pull
robots.txt documents.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RobotExclusionFilter
public RobotExclusionFilter(LiveWebCache webCache,
java.lang.String userAgent,
long maxCacheMS)
- Construct a new RobotExclusionFilter that uses webCache to pull
robots.txt documents. Filtering is based on userAgent, and documents in the
webCache that were cached less than maxCacheMS milliseconds ago are
considered valid.
- Parameters:
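webCache - LiveWebCache used to retrieve robots.txt documents
userAgent - user agent on which filtering is based
maxCacheMS - maximum age, in milliseconds, at which a cached document is still considered valid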
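The role of maxCacheMS can be illustrated with a small freshness check. This is a sketch under stated assumptions: the actual validity test is performed by the LiveWebCache and is not shown on this page, so CacheFreshnessSketch, isFresh, and the strict "less than" boundary are all hypothetical.

```java
// Hypothetical illustration of a maxCacheMS-style freshness test;
// the real check lives in the LiveWebCache and may differ in detail.
class CacheFreshnessSketch {
    // True when the document was cached less than maxCacheMS milliseconds
    // before nowMs. The exclusive boundary is an assumption.
    static boolean isFresh(long cachedAtMs, long nowMs, long maxCacheMS) {
        return (nowMs - cachedAtMs) < maxCacheMS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDayMs = 24L * 60 * 60 * 1000;
        // Cached one hour ago, allowed age one day: still valid.
        System.out.println(isFresh(now - 60L * 60 * 1000, now, oneDayMs));
        // Cached two days ago, allowed age one day: stale.
        System.out.println(isFresh(now - 2 * oneDayMs, now, oneDayMs));
    }
}
```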
searchResultToRobotUrlStrings
protected java.util.List<java.lang.String> searchResultToRobotUrlStrings(java.lang.String resultHost)
filterObject
public int filterObject(CaptureSearchResult r)
- Description copied from interface:
ObjectFilter
- Inspect the record and determine whether it should be included in the
results, or whether processing of further records should stop.
- Specified by:
filterObject in interface ObjectFilter<CaptureSearchResult>
- Parameters:
r - Object which should be checked for inclusion/exclusion or abort
- Returns:
- one of the int constants FILTER_INCLUDE, FILTER_EXCLUDE, or FILTER_ABORT
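A caller might act on the three return codes as in the sketch below. It is self-contained and does not use the Wayback classes: SimpleFilter is a hypothetical stand-in for ObjectFilter, the numeric values assigned to the constants are assumptions made for illustration, and the path-prefix rule stands in for real robots.txt evaluation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of consuming filterObject return codes.
class FilterLoopSketch {
    // Constant names come from the documentation; the values are assumed.
    static final int FILTER_INCLUDE = 0;
    static final int FILTER_EXCLUDE = 1;
    static final int FILTER_ABORT = 2;

    // Stand-in for ObjectFilter<T>; not the Wayback interface itself.
    interface SimpleFilter<T> {
        int filterObject(T o);
    }

    // Keeps included items, skips excluded ones, and stops at the first abort.
    static <T> List<T> apply(List<T> input, SimpleFilter<T> filter) {
        List<T> kept = new ArrayList<>();
        for (T item : input) {
            int code = filter.filterObject(item);
            if (code == FILTER_ABORT) {
                break;
            }
            if (code == FILTER_INCLUDE) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/index.html", "/private/a", "/about.html");
        // Toy rule: everything under /private/ is excluded.
        List<String> kept = apply(paths,
                p -> p.startsWith("/private/") ? FILTER_EXCLUDE : FILTER_INCLUDE);
        System.out.println(kept); // prints [/index.html, /about.html]
    }
}
```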
Copyright © 2005-2009 Internet Archive. All Rights Reserved.