org.archive.wayback.resourceindex
Class LocalResourceIndex

java.lang.Object
  extended by org.archive.wayback.resourceindex.LocalResourceIndex
All Implemented Interfaces:
ResourceIndex

public class LocalResourceIndex
extends Object
implements ResourceIndex

ResourceIndex implementation which assumes a "local" SearchResultSource. Extracting SearchResults from the source involves several layered steps:

1) Extraction of results based on a prefix into the index.
2) Passing each result through a series of adapters. These adapters can create new fields based on existing fields, or can annotate fields as they are scanned in order.
3) Filtering results based on request filters, which may come from:
   * WaybackRequest-specific parameters. Ex. exact host match only, exact scheme match only, ...
   * AccessPoint-specific configuration. Ex. only return records with (ARC/WARC) filename prefixed with XXX. Ex. block any dates not older than 6 months.
4) Filtering based on AccessControl configurations. Ex. block any URLs with prefixes in file X.
5) Windowing filters, which provide pagination of the results, allowing requests to specify "show results between 10 and 20".
6) Post filter adapters, which may annotate final results with other information. Ex. for each result, consult DB to see if user-contributed messages apply to the results.

After all results have been processed, the final SearchResults object is annotated with summary information about the results included. As the chain of filters is set up, it is instrumented with counters that observe the number of results that went into, and came out of, the Exclusion filters. If results were presented to the Exclusion filter but none were emitted from it, an AccessControlException is thrown.
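The layered steps above can be illustrated with a minimal, self-contained sketch. It is not the Wayback implementation: Capture, BlockedException, PipelineSketch and every parameter name are hypothetical stand-ins, the post-filter adapters of step 6 are omitted, and the windowing is assumed to be a simple start/count slice. It only shows how counters placed around the exclusion filter can distinguish "nothing matched" from "everything that matched was blocked".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a capture record; not Wayback's CaptureSearchResult. */
final class Capture {
    final String url;
    final String timestamp;

    Capture(String url, String timestamp) {
        this.url = url;
        this.timestamp = timestamp;
    }
}

/** Hypothetical stand-in for Wayback's AccessControlException. */
final class BlockedException extends Exception {
    BlockedException(String message) {
        super(message);
    }
}

final class PipelineSketch {
    /** Runs steps 1 to 5 over an in-memory list that plays the role of the index. */
    static List<Capture> run(List<Capture> index,
                             String urlPrefix,
                             UnaryOperator<Capture> adapter,
                             Predicate<Capture> requestFilter,
                             Predicate<Capture> exclusionFilter,
                             int start,
                             int count) throws BlockedException {
        int intoExclusion = 0;   // results presented to the exclusion filter
        int outOfExclusion = 0;  // results the exclusion filter let through
        List<Capture> kept = new ArrayList<>();

        for (Capture raw : index) {
            if (!raw.url.startsWith(urlPrefix)) {
                continue;                        // step 1: prefix extraction
            }
            Capture adapted = adapter.apply(raw); // step 2: adapters
            if (!requestFilter.test(adapted)) {
                continue;                        // step 3: request filters
            }
            intoExclusion++;
            if (!exclusionFilter.test(adapted)) {
                continue;                        // step 4: access control
            }
            outOfExclusion++;
            kept.add(adapted);
        }

        // Results reached the exclusion filter but none survived it.
        if (intoExclusion > 0 && outOfExclusion == 0) {
            throw new BlockedException("all matching results were excluded");
        }

        // Step 5: windowing, clamped to the number of results kept.
        int from = Math.max(0, Math.min(start, kept.size()));
        int to = (int) Math.min((long) from + Math.max(0, count), kept.size());
        return new ArrayList<>(kept.subList(from, to));
    }
}
```

In this sketch an empty return value means no record passed the prefix and request filters, while BlockedException means records did match but were all removed by the exclusion filter.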

Version:
$Date: 2010-09-29 05:28:38 +0700 (Wed, 29 Sep 2010) $, $Revision: 3262 $
Author:
brad

Field Summary
protected  SearchResultSource source
           
static int TYPE_CAPTURE
           
static int TYPE_REPLAY
           
static int TYPE_URL
           
 
Constructor Summary
LocalResourceIndex()
           
 
Method Summary
 void addSearchResults(Iterator<CaptureSearchResult> itr)
           
 CaptureSearchResults doCaptureQuery(WaybackRequest wbRequest, int type)
           
 UrlSearchResults doUrlQuery(WaybackRequest wbRequest)
           
 ObjectFilter<CaptureSearchResult> getAnnotater()
           
 UrlCanonicalizer getCanonicalizer()
           
 ObjectFilter<CaptureSearchResult> getFilter()
           
 int getMaxRecords()
           
 boolean isDedupeRecords()
           
 boolean isUpdatable()
           
 SearchResults query(WaybackRequest wbRequest)
          Transform a WaybackRequest into a SearchResults object.
 void setAnnotater(ObjectFilter<CaptureSearchResult> annotater)
           
 void setCanonicalizer(UrlCanonicalizer canonicalizer)
           
 void setDedupeRecords(boolean dedupeRecords)
           
 void setFilter(ObjectFilter<CaptureSearchResult> filter)
           
 void setMaxRecords(int maxRecords)
           
 void setSource(SearchResultSource source)
           
 void shutdown()
          Release any resources used by this ResourceIndex cleanly
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TYPE_REPLAY

public static final int TYPE_REPLAY
See Also:
Constant Field Values

TYPE_CAPTURE

public static final int TYPE_CAPTURE
See Also:
Constant Field Values

TYPE_URL

public static final int TYPE_URL
See Also:
Constant Field Values

source

protected SearchResultSource source
Constructor Detail

LocalResourceIndex

public LocalResourceIndex()
Method Detail

doCaptureQuery

public CaptureSearchResults doCaptureQuery(WaybackRequest wbRequest,
                                           int type)
                                    throws ResourceIndexNotAvailableException,
                                           ResourceNotInArchiveException,
                                           BadQueryException,
                                           AccessControlException
Throws:
ResourceIndexNotAvailableException
ResourceNotInArchiveException
BadQueryException
AccessControlException

doUrlQuery

public UrlSearchResults doUrlQuery(WaybackRequest wbRequest)
                            throws ResourceIndexNotAvailableException,
                                   ResourceNotInArchiveException,
                                   BadQueryException,
                                   AccessControlException
Throws:
ResourceIndexNotAvailableException
ResourceNotInArchiveException
BadQueryException
AccessControlException

query

public SearchResults query(WaybackRequest wbRequest)
                    throws ResourceIndexNotAvailableException,
                           ResourceNotInArchiveException,
                           BadQueryException,
                           AccessControlException
Description copied from interface: ResourceIndex
Transform a WaybackRequest into a SearchResults object.

Specified by:
query in interface ResourceIndex
Parameters:
wbRequest - WaybackRequest object from RequestParser
Returns:
SearchResults containing SearchResult objects matching the WaybackRequest
Throws:
ResourceIndexNotAvailableException - if the ResourceIndex is not available (remote host down, local files missing, etc)
ResourceNotInArchiveException - if the ResourceIndex could be contacted, but no SearchResult objects matched the request
BadQueryException - if the WaybackRequest is lacking information required to make a reasonable search of this ResourceIndex
AccessControlException - if SearchResult objects actually matched, but could not be returned due to AccessControl restrictions (robots.txt documents, Administrative URL blocks, etc)
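The four documented exceptions describe distinct outcomes, so a caller will usually want to handle each one separately. The sketch below is a hedged illustration only: the nested exception classes are local stand-ins named after the documented ones (the real classes live in the Wayback library), and Query, describe and the returned strings are hypothetical.

```java
final class QueryOutcomeSketch {
    // Local stand-ins that mirror only the documented exception names.
    static class ResourceIndexNotAvailableException extends Exception {}
    static class ResourceNotInArchiveException extends Exception {}
    static class BadQueryException extends Exception {}
    static class AccessControlException extends Exception {}

    /** Hypothetical wrapper around a call such as index.query(wbRequest). */
    interface Query<T> {
        T run() throws ResourceIndexNotAvailableException,
                       ResourceNotInArchiveException,
                       BadQueryException,
                       AccessControlException;
    }

    /** Maps each documented outcome to a short description. */
    static <T> String describe(Query<T> query) {
        try {
            return "ok: " + query.run();
        } catch (ResourceIndexNotAvailableException e) {
            return "index unavailable";          // remote host down, local files missing
        } catch (ResourceNotInArchiveException e) {
            return "not in archive";             // index reachable, nothing matched
        } catch (BadQueryException e) {
            return "bad query";                  // request lacks required information
        } catch (AccessControlException e) {
            return "blocked by access control";  // matches existed but were excluded
        }
    }
}
```

With the real library, the stand-in classes would be replaced by imports of the corresponding Wayback exception types and the lambda passed to describe would call query on a configured index.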

addSearchResults

public void addSearchResults(Iterator<CaptureSearchResult> itr)
                      throws IOException,
                             UnsupportedOperationException
Throws:
IOException
UnsupportedOperationException

isUpdatable

public boolean isUpdatable()

setMaxRecords

public void setMaxRecords(int maxRecords)
Parameters:
maxRecords - the maxRecords to set

getMaxRecords

public int getMaxRecords()

setSource

public void setSource(SearchResultSource source)
Parameters:
source - the source to set

isDedupeRecords

public boolean isDedupeRecords()

setDedupeRecords

public void setDedupeRecords(boolean dedupeRecords)

getCanonicalizer

public UrlCanonicalizer getCanonicalizer()

setCanonicalizer

public void setCanonicalizer(UrlCanonicalizer canonicalizer)

shutdown

public void shutdown()
              throws IOException
Description copied from interface: ResourceIndex
Release any resources used by this ResourceIndex cleanly

Specified by:
shutdown in interface ResourceIndex
Throws:
IOException - for usual causes

getAnnotater

public ObjectFilter<CaptureSearchResult> getAnnotater()

setAnnotater

public void setAnnotater(ObjectFilter<CaptureSearchResult> annotater)

getFilter

public ObjectFilter<CaptureSearchResult> getFilter()

setFilter

public void setFilter(ObjectFilter<CaptureSearchResult> filter)


Copyright © 2005-2011 Internet Archive. All Rights Reserved.