org.archive.wayback.accesscontrol.robotstxt
Class RobotRules
java.lang.Object
org.archive.wayback.accesscontrol.robotstxt.RobotRules
public class RobotRules
- extends Object
Class that parses a robots.txt file, stores the rules contained therein,
and allows testing whether path/userAgent tuples are blocked by those
rules.
- Version:
- $Date: 2010-09-29 05:28:38 +0700 (Wed, 29 Sep 2010) $, $Revision: 3262 $
- Author:
- brad
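Typical usage is to parse a robots.txt document from an InputStream and then
query it with blocksPathForUA. A minimal sketch using only the members
documented on this page; the robots.txt content, class name, and user-agent
string are illustrative:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    import org.archive.wayback.accesscontrol.robotstxt.RobotRules;

    public class RobotRulesExample {
        public static void main(String[] args) throws IOException {
            // Illustrative robots.txt content; any InputStream will do.
            String robotsTxt = "User-agent: *\nDisallow: /private/\n";
            InputStream is = new ByteArrayInputStream(
                    robotsTxt.getBytes(StandardCharsets.UTF_8));

            RobotRules rules = new RobotRules();
            rules.parse(is); // also sets the syntax-error flag as a side effect

            // "mybot" has no section of its own, so the '*' rules apply.
            System.out.println(rules.blocksPathForUA("/private/page.html", "mybot")); // expected: true
            System.out.println(rules.blocksPathForUA("/index.html", "mybot"));        // expected: false
        }
    }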
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
GLOBAL_USER_AGENT
public static final String GLOBAL_USER_AGENT
- Special User-agent name that matches all values
- See Also:
- Constant Field Values
RobotRules
public RobotRules()
hasSyntaxErrors
public boolean hasSyntaxErrors()
- Returns:
- true if the robots.txt file looked suspicious; currently this means
a Disallow rule was found that was not preceded by a "User-agent:" line
getUserAgentsFound
public List<String> getUserAgentsFound()
- Returns:
- a List of all user agents found in the robots.txt document
parse
public void parse(InputStream is)
throws IOException
- Reads rules from the InputStream argument into this RobotRules; as a
side effect, sets the bSyntaxErrors property.
- Parameters:
is - InputStream containing the robots.txt document
- Throws:
IOException - if an I/O error occurs while reading the stream
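A short sketch of parsing a robots.txt document from a local file and then
inspecting the side effects via hasSyntaxErrors() and getUserAgentsFound();
the file name and class name are illustrative:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.archive.wayback.accesscontrol.robotstxt.RobotRules;

    public class ParseExample {
        public static void main(String[] args) throws IOException {
            RobotRules rules = new RobotRules();
            // "robots.txt" is an illustrative local file name.
            try (InputStream is = new FileInputStream("robots.txt")) {
                rules.parse(is);
            }
            if (rules.hasSyntaxErrors()) {
                System.err.println("Suspicious robots.txt: Disallow before any User-agent: line");
            }
            // List the User-agent sections the document declared.
            for (String ua : rules.getUserAgentsFound()) {
                System.out.println("Found rules for: " + ua);
            }
        }
    }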
blocksPathForUA
public boolean blocksPathForUA(String path,
String ua)
- Checks the rules for the specified ua user agent first, if any are
present, and otherwise falls back to the rules for the '*' user agent.
- Parameters:
path - String server-relative path to check for access
ua - String user agent to check for access
- Returns:
- boolean value where true indicates the path is blocked for ua
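To illustrate that fallback, a sketch with one specific section and one '*'
section. It assumes the fallback is exclusive (the '*' section is consulted
only when no section matches ua), which is how the description above reads;
the user-agent names and paths are illustrative:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.archive.wayback.accesscontrol.robotstxt.RobotRules;

    public class FallbackExample {
        public static void main(String[] args) throws IOException {
            String doc = "User-agent: specialbot\n"
                    + "Disallow: /archive/\n"
                    + "\n"
                    + "User-agent: *\n"
                    + "Disallow: /private/\n";
            RobotRules rules = new RobotRules();
            rules.parse(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)));

            // specialbot has its own section, so only those rules apply...
            System.out.println(rules.blocksPathForUA("/archive/x", "specialbot")); // expected: true
            System.out.println(rules.blocksPathForUA("/private/x", "specialbot")); // expected: false
            // ...while an agent with no section of its own falls back to '*'.
            System.out.println(rules.blocksPathForUA("/private/x", "otherbot"));   // expected: true
        }
    }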
Copyright © 2005-2011 Internet Archive. All Rights Reserved.