org.archive.wayback.accesscontrol.robotstxt
Class RobotRules

java.lang.Object
  extended by org.archive.wayback.accesscontrol.robotstxt.RobotRules

public class RobotRules
extends Object

Parses a robots.txt file, stores the rules it contains, and allows testing whether a given path/user-agent pair is blocked by those rules.

Version:
$Date: 2010-09-29 05:28:38 +0700 (Wed, 29 Sep 2010) $, $Revision: 3262 $
Author:
brad

Field Summary
static String GLOBAL_USER_AGENT
          Special User-agent name that matches all user agents
 
Constructor Summary
RobotRules()
           
 
Method Summary
 boolean blocksPathForUA(String path, String ua)
          Checks the rules for the specified user agent ua, if any are present; otherwise falls back to the rules for the '*' user agent.
 List<String> getUserAgentsFound()
           
 boolean hasSyntaxErrors()
           
 void parse(InputStream is)
          Reads rules from the InputStream argument into this RobotRules. As a side effect, sets the bSyntaxErrors property.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GLOBAL_USER_AGENT

public static final String GLOBAL_USER_AGENT
Special User-agent name that matches all user agents

See Also:
Constant Field Values

Constructor Detail

RobotRules

public RobotRules()

Method Detail

hasSyntaxErrors

public boolean hasSyntaxErrors()
Returns:
true if the robots.txt file looked suspicious; currently this means a Disallow rule was found that was not preceded by a "User-agent:" line

getUserAgentsFound

public List<String> getUserAgentsFound()
Returns:
a List of all user agents found in the robots.txt document

parse

public void parse(InputStream is)
           throws IOException
Reads rules from the InputStream argument into this RobotRules. As a side effect, sets the bSyntaxErrors property.

Parameters:
is - InputStream containing the robots.txt document
Throws:
IOException - if an I/O error occurs while reading from the stream
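The parsing step can be modeled roughly as follows. This is a hypothetical, simplified sketch, not the Wayback implementation: the class name RobotsParseSketch, its public fields, the lowercasing of agent names, and the treatment of consecutive User-agent lines as one group are all assumptions. It records Disallow prefixes per user agent, collects agents in the order they first appear (the information getUserAgentsFound() reports), and sets a flag when a Disallow line has no preceding User-agent line, mirroring the condition hasSyntaxErrors() describes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of parse(InputStream); not the Wayback code.
class RobotsParseSketch {
    // Lowercased user agent -> Disallow path prefixes declared for it.
    final Map<String, List<String>> rules = new HashMap<>();
    // User agents in the order they were first seen.
    final List<String> userAgentsFound = new ArrayList<>();
    // Set when a Disallow line appears before any User-agent line.
    boolean syntaxErrors = false;

    void parse(InputStream is) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8));
        List<String> currentAgents = new ArrayList<>();
        boolean lastWasUserAgent = false;
        String line;
        while ((line = reader.readLine()) != null) {
            // Strip comments and surrounding whitespace.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("user-agent:")) {
                // Assumption: consecutive User-agent lines share the rules that follow.
                if (!lastWasUserAgent) {
                    currentAgents = new ArrayList<>();
                }
                String ua = lower.substring("user-agent:".length()).trim();
                currentAgents.add(ua);
                if (!rules.containsKey(ua)) {
                    rules.put(ua, new ArrayList<>());
                    userAgentsFound.add(ua);
                }
                lastWasUserAgent = true;
            } else if (lower.startsWith("disallow:")) {
                if (currentAgents.isEmpty()) {
                    // Disallow with no preceding User-agent line: flag it.
                    syntaxErrors = true;
                } else {
                    String path = line.substring("disallow:".length()).trim();
                    // An empty Disallow value blocks nothing, so record no prefix.
                    if (!path.isEmpty()) {
                        for (String ua : currentAgents) {
                            rules.get(ua).add(path);
                        }
                    }
                }
                lastWasUserAgent = false;
            } else if (!line.isEmpty()) {
                // Other directives (Allow, Crawl-delay, ...) are ignored here,
                // but they do end a run of User-agent lines.
                lastWasUserAgent = false;
            }
        }
    }
}
```

Allow lines, wildcard patterns, and Sitemap entries are ignored in this sketch; a production parser would need to decide how to treat them.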

blocksPathForUA

public boolean blocksPathForUA(String path,
                               String ua)
Checks the rules for the specified user agent ua, if any are present; otherwise falls back to the rules for the '*' user agent.

Parameters:
path - String, the server-relative path to check for access
ua - String, the user agent to check for access
Returns:
true if the path is blocked for ua, false otherwise
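The lookup performed by blocksPathForUA can be sketched as below, assuming the rules are held in a map from lowercased user-agent name to Disallow path prefixes. The class name UaLookupSketch, the map layout, the "*" value used for GLOBAL_USER_AGENT, exact case-insensitive agent matching, and simple prefix matching are illustrative assumptions, not taken from the Wayback source. The '*' rules are consulted only when no group exists for ua, which is one reading of the fallback described above.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the blocksPathForUA lookup; not the Wayback code.
final class UaLookupSketch {
    // Assumed value for the catch-all agent name.
    static final String GLOBAL_USER_AGENT = "*";

    // rules: lowercased user agent -> Disallow path prefixes.
    static boolean blocksPathForUA(Map<String, List<String>> rules,
                                   String path, String ua) {
        List<String> prefixes = rules.get(ua.toLowerCase(Locale.ROOT));
        if (prefixes == null) {
            // No group for this agent: fall back to the '*' group, if any.
            prefixes = rules.get(GLOBAL_USER_AGENT);
        }
        if (prefixes == null) {
            return false; // No applicable rules, so nothing is blocked.
        }
        for (String prefix : prefixes) {
            // A Disallow value blocks every path that starts with it.
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Because an agent with its own group never consults the '*' rules in this sketch, a robots.txt that blocks /private/ only for '*' would leave /private/ open to any agent that has a dedicated group.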


Copyright © 2005-2011 Internet Archive. All Rights Reserved.