org.archive.wayback.accesscontrol.robotstxt
Class RobotRules

java.lang.Object
  extended by org.archive.wayback.accesscontrol.robotstxt.RobotRules

public class RobotRules
extends java.lang.Object

Parses a robots.txt file, stores the rules it contains, and then allows testing whether a given path/User-agent pair is blocked by those rules.

Version:
$Date$, $Revision$
Author:
brad

Field Summary
static java.lang.String GLOBAL_USER_AGENT
          Special name for User-agent which matches all values
 
Constructor Summary
RobotRules()
           
 
Method Summary
 boolean blocksPathForUA(java.lang.String path, java.lang.String ua)
          Checks the rules for the specified ua User-agent first, if any are present, and then falls back to the rules for the '*' User-agent.
 java.util.List<java.lang.String> getUserAgentsFound()
          Returns a List of all User-agents found in the robots.txt document.
 boolean hasSyntaxErrors()
          Returns true if the robots.txt file looked suspicious.
 void parse(java.io.InputStream is)
          Reads rules from the InputStream argument into this RobotRules. As a side effect, sets the bSyntaxErrors property.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GLOBAL_USER_AGENT

public static final java.lang.String GLOBAL_USER_AGENT
Special name for User-agent which matches all values

See Also:
Constant Field Values

Constructor Detail

RobotRules

public RobotRules()

Method Detail

hasSyntaxErrors

public boolean hasSyntaxErrors()
Returns:
true if the robots.txt file looked suspicious; currently this means that a Disallow rule was found that was not preceded by a "User-agent:" line
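
The "suspicious" condition described above can be illustrated with a small standalone sketch. OrphanDisallowSketch and hasOrphanDisallow are hypothetical names that are not part of RobotRules, and the matching details (trimming, case-insensitive field names) are assumptions rather than the class's documented behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, NOT part of RobotRules: illustrates the
// "Disallow rule not preceded by a User-agent line" condition only.
final class OrphanDisallowSketch {

    /** Returns true if a Disallow line appears before any User-agent line. */
    static boolean hasOrphanDisallow(List<String> lines) {
        boolean seenUserAgent = false;
        for (String raw : lines) {
            // Assumption: field names are matched case-insensitively.
            String line = raw.trim().toLowerCase(Locale.ROOT);
            if (line.startsWith("user-agent:")) {
                seenUserAgent = true;
            } else if (line.startsWith("disallow:") && !seenUserAgent) {
                return true; // a rule with no User-agent to attach to
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> suspicious = Arrays.asList("Disallow: /private", "User-agent: *");
        List<String> clean = Arrays.asList("User-agent: *", "Disallow: /private");
        System.out.println(hasOrphanDisallow(suspicious)); // true
        System.out.println(hasOrphanDisallow(clean));      // false
    }
}
```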

getUserAgentsFound

public java.util.List<java.lang.String> getUserAgentsFound()
Returns:
a List of all User-agents found in the robots.txt document

parse

public void parse(java.io.InputStream is)
           throws java.io.IOException
Reads rules from the InputStream argument into this RobotRules. As a side effect, sets the bSyntaxErrors property.

Parameters:
is - the InputStream from which the robots.txt rules are read
Throws:
java.io.IOException
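
To make the idea of reading rules from an InputStream concrete, here is a minimal standalone sketch. It is not the RobotRules implementation: RobotsParseSketch, parse, and parseString are hypothetical names, and the handling of comments, character encoding, and User-agent grouping is simplified and assumed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, NOT the RobotRules source: maps each
// lower-cased User-agent name to its list of Disallow path prefixes.
final class RobotsParseSketch {

    static Map<String, List<String>> parse(InputStream is) throws IOException {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        List<String> current = null; // Disallow list of the most recent User-agent line
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');          // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // blank or unrecognised line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Simplification: each rule attaches to the latest User-agent only.
                current = rules.computeIfAbsent(
                        value.toLowerCase(Locale.ROOT), k -> new ArrayList<>());
            } else if (field.equals("disallow") && current != null) {
                current.add(value);
            }
        }
        return rules;
    }

    /** Convenience wrapper for in-memory text; wraps IOException unchecked. */
    static Map<String, List<String>> parseString(String robotsTxt) {
        try {
            return parse(new ByteArrayInputStream(
                    robotsTxt.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the keys of the returned map play a role loosely comparable to the User-agent names found in the document; how RobotRules stores them internally is not stated here.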

blocksPathForUA

public boolean blocksPathForUA(java.lang.String path,
                               java.lang.String ua)
Checks the rules for the specified ua User-agent first, if any are present, and then falls back to the rules for the '*' User-agent.

Parameters:
path - the path to test against the rules
ua - the User-agent name whose rules are checked first
Returns:
boolean value where true indicates the path is blocked for ua
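
The lookup order described above can be sketched as follows. This is one possible reading, not the RobotRules source: UaFallbackSketch and blocks are hypothetical names, and the sketch assumes that the '*' rules are consulted only when no rules exist for ua, that User-agent names are compared in lower case, and that a Disallow value blocks any path it is a prefix of.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration, NOT the RobotRules source.
final class UaFallbackSketch {

    // Assumption: the "matches all" User-agent name is "*".
    static final String GLOBAL_USER_AGENT = "*";

    static boolean blocks(Map<String, List<String>> rules, String path, String ua) {
        // 1. Prefer rules recorded for the specific User-agent.
        List<String> disallowed = rules.get(ua.toLowerCase(Locale.ROOT));
        // 2. Otherwise fall back to the rules for "*".
        if (disallowed == null) {
            disallowed = rules.get(GLOBAL_USER_AGENT);
        }
        if (disallowed == null) {
            return false; // no applicable rules at all
        }
        for (String prefix : disallowed) {
            // Assumption: an empty Disallow value blocks nothing.
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = new HashMap<>();
        rules.put("*", Arrays.asList("/private"));
        rules.put("ia_archiver", Arrays.asList("/"));
        System.out.println(blocks(rules, "/index.html", "ia_archiver")); // true
        System.out.println(blocks(rules, "/index.html", "otherbot"));    // false
        System.out.println(blocks(rules, "/private/a", "otherbot"));     // true
    }
}
```

Under this reading, a User-agent with its own rules is never affected by the '*' rules. If the fallback were instead applied whenever the specific rules do not block the path, the second step would run after the prefix check rather than before it.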


Copyright © 2005-2009 Internet Archive. All Rights Reserved.