org.archive.wayback.accesscontrol.robotstxt
Class RobotRules
java.lang.Object
org.archive.wayback.accesscontrol.robotstxt.RobotRules
public class RobotRules
- extends java.lang.Object
Parses a robots.txt file, stores the rules it contains, and allows
testing whether a given path/User-agent pair is blocked by those
rules.
- Version:
- $Date$, $Revision$
- Author:
- brad
Field Summary
static java.lang.String GLOBAL_USER_AGENT
Special name for User-agent which matches all values
Method Summary
boolean blocksPathForUA(java.lang.String path, java.lang.String ua)
Checks the rules for the specified ua User-agent first, if rules are present for it, and then falls back to the rules for the '*' User-agent.
java.util.List<java.lang.String> getUserAgentsFound()
boolean hasSyntaxErrors()
void parse(java.io.InputStream is)
Reads rules from the InputStream argument into this RobotRules. As a side effect, sets the bSyntaxErrors property.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
GLOBAL_USER_AGENT
public static final java.lang.String GLOBAL_USER_AGENT
- Special name for User-agent which matches all values
- See Also:
- Constant Field Values
RobotRules
public RobotRules()
hasSyntaxErrors
public boolean hasSyntaxErrors()
- Returns:
- true if the robots.txt file looked suspicious, currently meaning
we found a Disallow rule that was not preceded by a "User-agent:" line
getUserAgentsFound
public java.util.List<java.lang.String> getUserAgentsFound()
- Returns:
- a List of all User-agents found in the robots.txt document
parse
public void parse(java.io.InputStream is)
throws java.io.IOException
- Reads rules from the InputStream argument into this RobotRules. As a
side effect, sets the bSyntaxErrors property.
- Parameters:
is - the InputStream from which the robots.txt content is read
- Throws:
java.io.IOException
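The parsing behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the RobotRules source: the class ParseSketch, its fields, and the helper fromString are hypothetical names introduced here, and the sketch assumes that each "User-agent:" line starts its own rule list and that only "Disallow:" lines are recorded. It shows how a Disallow rule seen before any "User-agent:" line could set a syntax-error flag, and how the list of User-agents found could be collected.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Wayback implementation.
class ParseSketch {
    // User-agent name (lower-cased) -> Disallow path prefixes, in file order.
    final Map<String, List<String>> rules = new LinkedHashMap<>();
    // Set when a Disallow line appears before any User-agent line.
    boolean syntaxErrors = false;

    void parse(InputStream is) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8));
        List<String> current = null; // rule list of the most recent User-agent line
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');          // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // blank or unrecognised line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                current = rules.computeIfAbsent(
                        value.toLowerCase(Locale.ROOT), k -> new ArrayList<>());
            } else if (field.equals("disallow")) {
                if (current == null) {
                    syntaxErrors = true;           // Disallow with no preceding User-agent
                } else if (!value.isEmpty()) {
                    current.add(value);            // an empty Disallow blocks nothing
                }
            }
        }
    }

    List<String> userAgentsFound() {
        return new ArrayList<>(rules.keySet());
    }

    // Convenience wrapper (assumption, for the example only): parse from a String.
    static ParseSketch fromString(String text) {
        ParseSketch sketch = new ParseSketch();
        try {
            sketch.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sketch;
    }

    public static void main(String[] args) {
        ParseSketch sketch = fromString(
                "Disallow: /early\nUser-agent: ia_archiver\nDisallow: /private/\n");
        System.out.println("syntax errors: " + sketch.syntaxErrors);
        System.out.println("user agents:   " + sketch.userAgentsFound());
    }
}
```

In this sketch the flag is only ever set, never cleared, which mirrors the description of bSyntaxErrors as a side effect of parsing; whether the real class resets it between calls is not stated here.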
blocksPathForUA
public boolean blocksPathForUA(java.lang.String path,
java.lang.String ua)
- Checks the rules for the specified ua User-agent first, if rules are present for it,
and then falls back to the rules for the '*' User-agent.
- Parameters:
path - the path to test
ua - the User-agent to test
- Returns:
- boolean value where true indicates the path is blocked for ua
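The lookup order described above can be sketched as follows. This is an illustrative stand-in, not the RobotRules source: the class LookupSketch and its rules map are hypothetical, GLOBAL_USER_AGENT is assumed to have the value "*", rules are assumed to be simple path prefixes, and the fallback to '*' is assumed to apply only when no rules at all exist for the given User-agent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Wayback implementation.
class LookupSketch {
    // Assumed value of the special User-agent name that matches all agents.
    static final String GLOBAL_USER_AGENT = "*";

    // User-agent name (lower-cased) -> disallowed path prefixes.
    final Map<String, List<String>> rules = new HashMap<>();

    boolean blocksPathForUA(String path, String ua) {
        // 1. Prefer rules recorded for this specific User-agent.
        List<String> prefixes = rules.get(ua.toLowerCase(Locale.ROOT));
        // 2. Otherwise fall back to the rules for '*'.
        if (prefixes == null) {
            prefixes = rules.get(GLOBAL_USER_AGENT);
        }
        // 3. No applicable rules at all: nothing is blocked.
        if (prefixes == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LookupSketch sketch = new LookupSketch();
        sketch.rules.put("ia_archiver", Arrays.asList("/private/"));
        sketch.rules.put(GLOBAL_USER_AGENT, Arrays.asList("/"));
        System.out.println(sketch.blocksPathForUA("/private/a.html", "ia_archiver"));
        System.out.println(sketch.blocksPathForUA("/public/a.html", "ia_archiver"));
        System.out.println(sketch.blocksPathForUA("/public/a.html", "otherbot"));
    }
}
```

Under these assumptions, an agent with its own rules is never subject to the '*' rules, so a permissive agent-specific section overrides a restrictive global one.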
Copyright © 2005-2009 Internet Archive. All Rights Reserved.