org.archive.wayback.resourceindex
Class LocalResourceIndex
java.lang.Object
org.archive.wayback.resourceindex.LocalResourceIndex
- All Implemented Interfaces:
- ResourceIndex
public class LocalResourceIndex
- extends Object
- implements ResourceIndex
ResourceIndex implementation which assumes a "local" SearchResultSource.
Extracting SearchResults from the source involves several layered steps:
1) extraction of results based on a prefix into the index
2) passing each result through a series of adapters; these adapters can
   create new fields based on existing fields, or can annotate fields as
   they are scanned in order
3) filtering results based on request filters, which may come from
   * WaybackRequest-specific parameters
     Ex. exact host match only, exact scheme match only, ...
   * AccessPoint-specific configuration
     Ex. only return records with (ARC/WARC) filename prefixed with XXX
     Ex. block any captures less than 6 months old
4) filtering based on AccessControl configurations
   Ex. block any URLs with prefixes in file X
5) windowing filters, which provide pagination of the results, allowing
   requests to specify "show results between 10 and 20"
6) post-filter adapters, which may annotate final results with other
   information
   Ex. for each result, consult a DB to see if user-contributed messages
   apply to the results
After all results have been processed, we annotate the final SearchResults
object with summary information about the results included. As we set up the
chain of filters, we instrument the chain with counters that observe the
number of results that went into, and came out of, the exclusion filters.
If results were presented to the exclusion filters but none were emitted
from them, an AccessControlException is thrown.
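
The counting behaviour described above can be illustrated with a small, self-contained sketch. It is not the Wayback implementation: the class FilterChainSketch, the CountingFilter wrapper, the AccessBlockedException stand-in and the use of plain URL strings as results are all assumptions made for illustration. Only steps 1, 3, 4 and 5 are modelled; the adapter steps (2 and 6) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the Wayback code base.
final class FilterChainSketch {

    /** Stand-in for the AccessControlException mentioned in the class description. */
    static class AccessBlockedException extends Exception {
        AccessBlockedException(String message) {
            super(message);
        }
    }

    /** Wraps a filter and counts how many items went in and how many came out. */
    static class CountingFilter<T> implements Predicate<T> {
        private final Predicate<T> inner;
        int seen = 0;
        int passed = 0;

        CountingFilter(Predicate<T> inner) {
            this.inner = inner;
        }

        @Override
        public boolean test(T item) {
            seen++;
            boolean keep = inner.test(item);
            if (keep) {
                passed++;
            }
            return keep;
        }
    }

    static List<String> run(List<String> index, String prefix,
                            Predicate<String> requestFilter,
                            Predicate<String> exclusionFilter,
                            int start, int count) throws AccessBlockedException {
        CountingFilter<String> exclusion = new CountingFilter<>(exclusionFilter);
        List<String> kept = new ArrayList<>();
        for (String url : index) {
            if (!url.startsWith(prefix)) continue;   // step 1: prefix extraction
            if (!requestFilter.test(url)) continue;  // step 3: request filters
            if (!exclusion.test(url)) continue;      // step 4: access control
            kept.add(url);
        }
        // Results reached the exclusion filter but none survived it.
        if (exclusion.seen > 0 && exclusion.passed == 0) {
            throw new AccessBlockedException("all matching results were excluded");
        }
        // Step 5: windowing, clamped to the number of surviving results.
        int from = Math.min(Math.max(start, 0), kept.size());
        int to = Math.min(from + Math.max(count, 0), kept.size());
        return new ArrayList<>(kept.subList(from, to));
    }
}
```

In this sketch an index with no prefix matches simply yields an empty list, because the exclusion counter never sees any input; how the real class reports that case is outside what the sketch models.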
- Version:
- $Date: 2010-09-29 05:28:38 +0700 (Wed, 29 Sep 2010) $, $Revision: 3262 $
- Author:
- brad
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TYPE_REPLAY
public static final int TYPE_REPLAY
- See Also:
- Constant Field Values
TYPE_CAPTURE
public static final int TYPE_CAPTURE
- See Also:
- Constant Field Values
TYPE_URL
public static final int TYPE_URL
- See Also:
- Constant Field Values
source
protected SearchResultSource source
LocalResourceIndex
public LocalResourceIndex()
doCaptureQuery
public CaptureSearchResults doCaptureQuery(WaybackRequest wbRequest,
int type)
throws ResourceIndexNotAvailableException,
ResourceNotInArchiveException,
BadQueryException,
AccessControlException
- Throws:
ResourceIndexNotAvailableException
ResourceNotInArchiveException
BadQueryException
AccessControlException
doUrlQuery
public UrlSearchResults doUrlQuery(WaybackRequest wbRequest)
throws ResourceIndexNotAvailableException,
ResourceNotInArchiveException,
BadQueryException,
AccessControlException
- Throws:
ResourceIndexNotAvailableException
ResourceNotInArchiveException
BadQueryException
AccessControlException
query
public SearchResults query(WaybackRequest wbRequest)
throws ResourceIndexNotAvailableException,
ResourceNotInArchiveException,
BadQueryException,
AccessControlException
- Description copied from interface:
ResourceIndex
- Transform a WaybackRequest into a SearchResults object.
- Specified by:
query in interface ResourceIndex
- Parameters:
wbRequest - WaybackRequest object from RequestParser
- Returns:
- SearchResults containing SearchResult objects matching the
WaybackRequest
- Throws:
ResourceIndexNotAvailableException - if the ResourceIndex
is not available (remote host down, local files missing, etc)
ResourceNotInArchiveException - if the ResourceIndex could be
contacted, but no SearchResult objects matched the request
BadQueryException - if the WaybackRequest is lacking information
required to make a reasonable search of this ResourceIndex
AccessControlException - if SearchResult objects actually matched,
but could not be returned due to AccessControl restrictions
(robots.txt documents, Administrative URL blocks, etc)
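
The four documented failure modes can be told apart by a few simple conditions. The following sketch is only an illustration of that decision logic under stated assumptions: QueryOutcomeSketch, its Outcome enum and the classify parameters are hypothetical, "lacking information" is assumed to mean a missing request URL, and the order of the checks is a guess rather than the behaviour of the real class.

```java
// Hypothetical sketch; these names are not part of the Wayback API.
final class QueryOutcomeSketch {

    /** One value per documented exception, plus OK for a normal return. */
    enum Outcome { OK, INDEX_NOT_AVAILABLE, NOT_IN_ARCHIVE, BAD_QUERY, ACCESS_BLOCKED }

    /**
     * @param indexReachable whether the index could be contacted
     * @param requestUrl     the URL carried by the request (assumed required)
     * @param matched        results matching the request before access control
     * @param allowed        results remaining after access control
     */
    static Outcome classify(boolean indexReachable, String requestUrl,
                            int matched, int allowed) {
        if (requestUrl == null || requestUrl.isEmpty()) {
            return Outcome.BAD_QUERY;            // request lacks required information
        }
        if (!indexReachable) {
            return Outcome.INDEX_NOT_AVAILABLE;  // remote host down, files missing
        }
        if (matched == 0) {
            return Outcome.NOT_IN_ARCHIVE;       // index reachable, nothing matched
        }
        if (allowed == 0) {
            return Outcome.ACCESS_BLOCKED;       // matches existed but all were blocked
        }
        return Outcome.OK;
    }
}
```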
addSearchResults
public void addSearchResults(Iterator<CaptureSearchResult> itr)
throws IOException,
UnsupportedOperationException
- Throws:
IOException
UnsupportedOperationException
isUpdatable
public boolean isUpdatable()
setMaxRecords
public void setMaxRecords(int maxRecords)
- Parameters:
maxRecords - the maxRecords to set
getMaxRecords
public int getMaxRecords()
setSource
public void setSource(SearchResultSource source)
- Parameters:
source - the source to set
isDedupeRecords
public boolean isDedupeRecords()
setDedupeRecords
public void setDedupeRecords(boolean dedupeRecords)
getCanonicalizer
public UrlCanonicalizer getCanonicalizer()
setCanonicalizer
public void setCanonicalizer(UrlCanonicalizer canonicalizer)
shutdown
public void shutdown()
throws IOException
- Description copied from interface:
ResourceIndex
- Release any resources used by this ResourceIndex cleanly
- Specified by:
shutdown in interface ResourceIndex
- Throws:
IOException - if an I/O error occurs while releasing resources
getAnnotater
public ObjectFilter<CaptureSearchResult> getAnnotater()
setAnnotater
public void setAnnotater(ObjectFilter<CaptureSearchResult> annotater)
getFilter
public ObjectFilter<CaptureSearchResult> getFilter()
setFilter
public void setFilter(ObjectFilter<CaptureSearchResult> filter)
Copyright © 2005-2011 Internet Archive. All Rights Reserved.