org.archive.access.nutch
Class NutchwaxQuery

java.lang.Object
  extended by org.archive.access.nutch.NutchwaxQuery

public class NutchwaxQuery
extends java.lang.Object

Handle exacturl when present in queries.


Constructor Summary
NutchwaxQuery()
           
 
Method Summary
static java.lang.String encodeExacturl(java.lang.String queryString)
           
static org.apache.nutch.searcher.Query parse(java.lang.String queryString, org.apache.hadoop.conf.Configuration conf)
          Does fixup on the passed in query before giving it into nutch.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NutchwaxQuery

public NutchwaxQuery()
Method Detail

parse

public static org.apache.nutch.searcher.Query parse(java.lang.String queryString,
                                                    org.apache.hadoop.conf.Configuration conf)
                                             throws java.io.IOException
Does fixup on the passed in query before giving it into nutch. Preprocess the query to do special handling of 'exacturl' clause if present. Split on spaces and then for each term, look for one that begins 'exacturl'. If we find one, special-encode its value. Do an encoding that is neither url nor html. Same encoding must be done in index-exacturl. This custom encoding is necessary because NutchAnalysis will not let through '?' or '=' characters in clause values and the '&' character can't be passed in a query string because it'll confuse request.getParameter.

Parameters:
queryString - Query string.
conf -
Returns:
Analyzed query.
Throws:
java.io.IOException

encodeExacturl

public static java.lang.String encodeExacturl(java.lang.String queryString)


Copyright © 2005-2007 Internet Archive. All Rights Reserved.