Retains all information about a particular Wayback configuration
within a ServletContext, including holding references to the
implementation instances of the primary Wayback classes:
RequestParser
ResourceIndex (via WaybackCollection)
ResourceStore (via WaybackCollection)
QueryRenderer
ReplayDispatcher
ExceptionRenderer
ResultURIConverter
Class that implements the RequestParser interface, and also understands how
to:
This class will attempt to use the overridable parseCustom() method to
create the WaybackRequest object, but if that fails (returns null), it will
fall back to:
A) attempting to parse out an incoming OpenSearch format query
B) attempting to parse out any and all incoming form elements submitted as
either GET or POST arguments
This class also contains the functionality to extract HTTP header
information into WaybackRequest objects, including HTTP auth info, referer,
remote IPs, etc.
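The fallback order described above (parseCustom() first, then OpenSearch, then form arguments) can be sketched roughly as follows. This is only an illustration: SketchRequest, FallbackParserSketch, and the "q" and "url" parameter names are hypothetical stand-ins, not the actual Wayback types or argument names.

```java
import java.util.Map;

// Hypothetical stand-in for WaybackRequest; records which parser produced it.
final class SketchRequest {
    final String source;
    final String url;

    SketchRequest(String source, String url) {
        this.source = source;
        this.url = url;
    }
}

class FallbackParserSketch {
    // Overridable hook. Returning null means "could not parse".
    protected SketchRequest parseCustom(Map<String, String> params) {
        return null;
    }

    // Assumption: an OpenSearch-style query arrives in a "q" argument.
    SketchRequest parseOpenSearch(Map<String, String> params) {
        String q = params.get("q");
        return q == null ? null : new SketchRequest("opensearch", q);
    }

    // Assumption: a plain form submission carries a "url" argument.
    SketchRequest parseForm(Map<String, String> params) {
        String url = params.get("url");
        return url == null ? null : new SketchRequest("form", url);
    }

    // Try each parser in order; the first non-null result wins.
    final SketchRequest parse(Map<String, String> params) {
        SketchRequest r = parseCustom(params);
        if (r == null) {
            r = parseOpenSearch(params);
        }
        if (r == null) {
            r = parseForm(params);
        }
        return r;
    }
}
```

A subclass would override parseCustom() to handle its own URL scheme and return null for anything it does not recognize, leaving the two fallbacks untouched.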
Result: this key being present indicates that this particular capture
was not actually stored, and that other values within this SearchResult
are actually values from a different record which *should* be identical
to this capture, had it been stored.
Flag indicating that this document was NOT downloaded, but that the
origin server indicated that the document had not changed, based on
If-Modified HTTP request headers.
Result: this key is present when the CAPTURE_DUPLICATE_ANNOTATION is also
present, with the value indicating the last date that was actually
stored for this duplicate.
SearchResultFilter that abstracts multiple SearchResultFilters -- if all
filters return INCLUDE, then the result is included, but the first to
return ABORT or EXCLUDE short-circuits the rest
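A minimal sketch of this short-circuit behaviour, assuming a three-valued result and a single-method filter interface. FilterResult, ResultFilter, and CompositeFilterSketch are hypothetical names used for illustration; they are not the Wayback classes themselves.

```java
import java.util.List;

// Hypothetical three-valued outcome mirroring INCLUDE / EXCLUDE / ABORT.
enum FilterResult { INCLUDE, EXCLUDE, ABORT }

// Hypothetical single-method filter interface.
interface ResultFilter<T> {
    FilterResult filter(T item);
}

final class CompositeFilterSketch<T> implements ResultFilter<T> {
    private final List<ResultFilter<T>> filters;

    CompositeFilterSketch(List<ResultFilter<T>> filters) {
        this.filters = filters;
    }

    @Override
    public FilterResult filter(T item) {
        for (ResultFilter<T> f : filters) {
            FilterResult r = f.filter(item);
            // The first EXCLUDE or ABORT ends evaluation; later filters never run.
            if (r != FilterResult.INCLUDE) {
                return r;
            }
        }
        // Every filter said INCLUDE.
        return FilterResult.INCLUDE;
    }
}
```

Because evaluation stops at the first non-INCLUDE result, placing cheap filters ahead of expensive ones avoids unnecessary work.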
Class that provides SearchResult Filtering based on multiple
ExclusionFilterFactory instances by returning a single composite
SearchResultFilter based on the results of each ExclusionFilter.
Adapter class that observes a stream of SearchResults, tracking, for each
complete record, a mapping of that record's digest to:
Arc/Warc Filename
Arc/Warc offset
HTTP Response
MIME-Type
Redirect URL
If subsequent SearchResults are missing these fields ("-") and the Digest
field has been seen, then the subsequent SearchResults are updated with the
values from the kept copy matching that digest, and an additional annotation
field is added.
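The digest-tracking idea can be sketched as below, reduced to two of the listed fields (filename and offset) for brevity. CaptureSketch, DedupSketch, the boolean annotation, and the choice to let the most recent complete record replace an earlier one are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, trimmed-down capture record; "-" marks a missing field.
final class CaptureSketch {
    String digest;
    String file;
    String offset;
    boolean duplicate; // stands in for the additional annotation field

    CaptureSketch(String digest, String file, String offset) {
        this.digest = digest;
        this.file = file;
        this.offset = offset;
    }
}

final class DedupSketch {
    // Digest of each complete record seen so far, mapped to that record.
    private final Map<String, CaptureSketch> seen = new HashMap<>();

    CaptureSketch annotate(CaptureSketch c) {
        if ("-".equals(c.file)) {
            CaptureSketch kept = seen.get(c.digest);
            if (kept != null) {
                // Fill the missing fields from the kept copy and mark the result.
                c.file = kept.file;
                c.offset = kept.offset;
                c.duplicate = true;
            }
        } else {
            // Complete record: remember it (assumption: latest one wins).
            seen.put(c.digest, c);
        }
        return c;
    }
}
```

An incomplete record whose digest has not been seen yet passes through unchanged in this sketch.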
A CompositeSearchResultSource that automatically manages its list of sources
based on 3 configuration files, and a background thread:
Config 1: Mapping of ranges to hosts responsible for that range
this class is aware of the local host name, so uses this file
to determine which range(s) should be local
Config 2: Mapping of ranges to one or more MD5s that compose that range
when all of these MD5s have been copied local, this index
becomes active, and each request uses a composite of these
local files
Config 3: Mapping of MD5s to locations from which they can be retrieved
when a file that should be local is missing, these locations
will be used to retrieve a copy of that file
Background Thread: compares the current set of files to the various
configuration files, retrieves any files that need to be local, and
updates the composite set searched when the correct set of
MD5s has been localized.
ServletRequestContext interface which uses a ResourceFileLocationDB to
reverse proxy an incoming HTTP request for a file by name to its actual
back-end location.
Prune down rules to only those which apply for a particular timestamp,
first eliminating those outside the timestamp range, and then removing
any ADD rules which have a (subsequent) DELETE.
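A rough sketch of this two-pass pruning follows. The rule model is assumed, not taken from the source: RuleSketch, its key field (used to pair a DELETE with the ADD it cancels), the inclusive start/end range, and 14-digit timestamp strings compared lexicographically are all hypothetical. DELETE rules are left in the output here because the description only mentions removing ADD rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule: an ADD or DELETE for some key, valid within [start, end].
final class RuleSketch {
    enum Type { ADD, DELETE }

    final Type type;
    final String key;
    final String start; // 14-digit timestamp, e.g. "20060101000000"
    final String end;

    RuleSketch(Type type, String key, String start, String end) {
        this.type = type;
        this.key = key;
        this.start = start;
        this.end = end;
    }
}

final class RulePrunerSketch {
    static List<RuleSketch> prune(List<RuleSketch> rules, String timestamp) {
        // Pass 1: keep only rules whose range covers the timestamp.
        List<RuleSketch> inRange = new ArrayList<>();
        for (RuleSketch r : rules) {
            if (r.start.compareTo(timestamp) <= 0 && timestamp.compareTo(r.end) <= 0) {
                inRange.add(r);
            }
        }
        // Pass 2: drop each ADD that is followed by a DELETE for the same key.
        List<RuleSketch> out = new ArrayList<>();
        for (int i = 0; i < inRange.size(); i++) {
            RuleSketch r = inRange.get(i);
            if (r.type == RuleSketch.Type.ADD && hasLaterDelete(inRange, i)) {
                continue;
            }
            out.add(r);
        }
        return out;
    }

    private static boolean hasLaterDelete(List<RuleSketch> rules, int index) {
        String key = rules.get(index).key;
        for (int j = index + 1; j < rules.size(); j++) {
            RuleSketch r = rules.get(j);
            if (r.type == RuleSketch.Type.DELETE && r.key.equals(key)) {
                return true;
            }
        }
        return false;
    }
}
```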
Subclass of File, which allows binary searching, returning Iterators
that allow scanning forwards and backwards through the (sorted) file starting
from a particular prefix.
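The search-then-scan idea can be illustrated on an in-memory list of sorted lines. This is a simplification: the class described above works on a file on disk, whereas SortedLinesSketch and its method names are hypothetical and only show the lower-bound binary search and the two scan directions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

final class SortedLinesSketch {
    private final List<String> lines; // must already be sorted

    SortedLinesSketch(List<String> sortedLines) {
        this.lines = sortedLines;
    }

    // Lower bound: index of the first line that is >= prefix.
    int firstIndexAtOrAfter(String prefix) {
        int lo = 0;
        int hi = lines.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lines.get(mid).compareTo(prefix) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Scan forwards, starting at the first line >= prefix.
    Iterator<String> forwardFrom(String prefix) {
        return lines.subList(firstIndexAtOrAfter(prefix), lines.size()).iterator();
    }

    // Scan backwards, starting at the last line < prefix.
    Iterator<String> backwardFrom(String prefix) {
        final ListIterator<String> it = lines.listIterator(firstIndexAtOrAfter(prefix));
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return it.hasPrevious();
            }

            @Override
            public String next() {
                return it.previous();
            }
        };
    }
}
```

A caller interested only in lines that actually start with the prefix would stop the forward scan at the first line for which startsWith(prefix) is false.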
FlatFile() -
Constructor for class org.archive.wayback.util.flatfile.FlatFile
Simple worker, which gets tasks from an IndexQueue, in this case, the names
of ARC/WARC files to be indexed, retrieves the ARC/WARC location from a
ResourceFileLocationDB, creates the index, which is serialized into a file,
and then hands that file off to a ResourceIndex for merging, using an
IndexClient.
CaptureSearchResult ObjectFilter which passes through all inputs, modifying
each to construct a corrected original URL to comply with new Identity
format.
Class which starts a background thread that repeatedly scans an incoming
directory and merges files found therein (which are assumed to be in CDX
format) with a BDBIndex.
Alter the HTML document in page, updating URLs in the attrName attributes
of all tagName tags such that:
1) absolute URLs are prefixed with: wmPrefix + pageTS
2) server-relative URLs are prefixed with: wmPrefix + pageTS + (host of page)
3) path-relative URLs are prefixed with: wmPrefix + pageTS + (attribute URL
resolved against pageUrl)
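For a single attribute value, the three cases above can be sketched with java.net.URI, whose resolve() method handles absolute, server-relative, and path-relative references uniformly. UrlRewriteSketch is a hypothetical helper, and the "/" placed between pageTS and the original URL is an assumption about the replay URL layout. Locating the tagName/attrName pairs in the HTML is not shown.

```java
import java.net.URI;

final class UrlRewriteSketch {
    private final String wmPrefix;
    private final String pageTS;
    private final URI pageUri;

    UrlRewriteSketch(String wmPrefix, String pageTS, String pageUrl) {
        this.wmPrefix = wmPrefix;
        this.pageTS = pageTS;
        this.pageUri = URI.create(pageUrl);
    }

    // Rewrite one attribute URL so that it points back through the replay prefix.
    String rewrite(String attributeUrl) {
        // resolve() returns absolute URLs unchanged, anchors server-relative
        // URLs at the page's host, and resolves path-relative URLs against
        // the page's own path.
        URI resolved = pageUri.resolve(attributeUrl);
        // Assumption: a "/" separates the timestamp from the original URL.
        return wmPrefix + pageTS + "/" + resolved;
    }
}
```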
Render the contents of a WaybackException in either html, javascript, or
css format, depending on the guessed context, so errors in embedded
documents do not cause unneeded errors in the embedding document.
Class which repeatedly builds a ResourceFileList for a set of
ResourceFileSource objects, serializing them into files, and dropping them
into the incoming directory of a ResourceFileLocationDBUpdater.
CaptureSearchResult Filter that uses a LiveWebCache to retrieve robots.txt documents
from the live web, and filters SearchResults based on the rules therein.
Class which parses a robots.txt file, storing the rules contained therein,
and then allows for testing if path/userAgent tuples are blocked by those
rules.
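A simplified sketch of the parse-then-test flow, handling only User-agent and Disallow lines with plain prefix matching and a "*" fallback. RobotRulesSketch and its method names are hypothetical and deliberately ignore Allow lines, wildcards, and other robots.txt extensions, so it is not a description of the actual class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class RobotRulesSketch {
    // Lower-cased user agent mapped to its disallowed path prefixes.
    private final Map<String, List<String>> disallows = new HashMap<>();

    void parse(String robotsTxt) {
        List<String> currentAgents = new ArrayList<>();
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines share the rules that follow them.
                if (!lastWasAgent) {
                    currentAgents = new ArrayList<>();
                }
                String ua = value.toLowerCase(Locale.ROOT);
                currentAgents.add(ua);
                disallows.computeIfAbsent(ua, k -> new ArrayList<>());
                lastWasAgent = true;
            } else {
                if (field.equals("disallow") && !value.isEmpty()) {
                    for (String ua : currentAgents) {
                        disallows.get(ua).add(value);
                    }
                }
                lastWasAgent = false;
            }
        }
    }

    // True if any stored Disallow prefix for this agent (or "*") matches the path.
    boolean isBlocked(String path, String userAgent) {
        List<String> rules = disallows.get(userAgent.toLowerCase(Locale.ROOT));
        if (rules == null) {
            rules = disallows.get("*");
        }
        if (rules == null) {
            return false;
        }
        for (String prefix : rules) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```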
RobotRules() -
Constructor for class org.archive.wayback.accesscontrol.robotstxt.RobotRules