Retains all information about a particular Wayback configuration
within a ServletContext, including holding references to the
implementation instances of the primary Wayback classes:
Abstract implementation of the RequestParser interface, which provides some
convenience methods for accessing data in Maps, and also
allows for configuring maxRecords and the earliest and latest timestamp strings.
Result: this key being present indicates that this particular capture
was not actually stored, and that other values within this SearchResult
are actually values from a different record which *should* be identical
to this capture, had it been stored.
Mapper which reads an identity Funky format CDX line, outputting:
key - canonicalized original URL + timestamp
val - everything else
input lines are a hybrid format:
ARC_PREFIX (sans .arc.gz)
ROBOT_FLAG (combo of AIF - no: Archive,Index,Follow, or '-' if none)
http://www.myow.de:80/news_show.php? 20061126032815 - text/html 200 DVKFPTOJGCLT3G5GUVLCETHLFO3222JM - 91098929 foo A
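The key/value split described for this Mapper can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the line is whitespace-separated with the original URL first and the timestamp second, and that the key joins the two with a single space. The class name CdxLineSplitter and the pluggable canonicalizer argument are hypothetical; real URL canonicalization is left to the caller.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: splits one CDX-style line into (key, value).
final class CdxLineSplitter {

    // Assumption: field 1 is the original URL, field 2 the timestamp,
    // and everything after them is carried through unchanged as the value.
    static Map.Entry<String, String> split(String line,
                                           UnaryOperator<String> canonicalizer) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        // key - canonicalized original URL + timestamp
        String key = canonicalizer.apply(parts[0]) + " " + parts[1];
        // val - everything else
        return new SimpleEntry<>(key, parts[2]);
    }
}
```

A caller would pass whatever canonicalizer the index uses; a lowercasing lambda is enough to try the sketch out.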
Classic ReplayRenderer which uses a combination of server-side modification
URLs point back to a specific ArchivalURL AccessPoint.
A CompositeSearchResultSource that automatically manages its list of sources
based on 3 configuration files, and a background thread:
Config 1: Mapping of ranges to hosts responsible for that range
this class is aware of the local host name, so uses this file
to determine which range(s) should be local
Config 2: Mapping of ranges to one or more MD5s that compose that range
when all of these MD5s have been copied local, this index
becomes active, and each request uses a composite of these
Config 3: Mapping of MD5s to locations from which they can be retrieved
when a file that should be local is missing, these locations
will be used to retrieve a copy of that file
Background Thread: compares the current set of files to the various
configuration files, fetches any files that need to be local, and
updates the composite set searched once the correct set of
MD5s has been localized.
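The activation rule implied by the three configurations can be sketched as below. This is an illustration under assumptions, not the class's actual code: the configuration files are taken to be already parsed into maps, and the names RangeActivation, rangeToHosts, rangeToMd5s and localMd5s are hypothetical. Retrieval of missing files (Config 3) is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides which ranges can be served locally right now.
final class RangeActivation {

    static List<String> activeRanges(String localHost,
                                     Map<String, List<String>> rangeToHosts,
                                     Map<String, List<String>> rangeToMd5s,
                                     Set<String> localMd5s) {
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : rangeToHosts.entrySet()) {
            // Config 1: skip ranges this host is not responsible for.
            if (!entry.getValue().contains(localHost)) {
                continue;
            }
            // Config 2: a range becomes active only when every MD5 that
            // composes it has been copied local.
            List<String> needed = rangeToMd5s.get(entry.getKey());
            if (needed != null && !needed.isEmpty() && localMd5s.containsAll(needed)) {
                active.add(entry.getKey());
            }
        }
        return active;
    }
}
```

A background thread could call this periodically and swap the resulting set into the composite; how the real class performs that swap is not described here.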
Simple worker, which gets tasks from an IndexQueue, in this case, the name
of ARC/WARC files to be indexed, retrieves the ARC/WARC location from a
ResourceFileLocationDB, creates the index, which is serialized into a file,
and then hands that file off to a ResourceIndex for merging, using an
Tests if the String argument looks like it could be a legitimate
authority fragment of a URL, that is, is it an IP address, or, are the
characters legal in an authority, and does the string end with a legal
Alter the HTML document in page, updating URLs in the attrName attributes
of all tagName tags such that:
1) absolute URLs are prefixed with: wmPrefix + pageTS
2) server-relative URLs are prefixed with: wmPrefix + pageTS + (host of page)
3) path-relative URLs are prefixed with: wmPrefix + pageTS + (attribute URL
resolved against pageUrl)
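The three prefixing rules can be sketched with java.net.URI. Resolving the attribute URL against pageUrl produces an absolute URL in all three cases, so one code path covers them. The class name ReplayUrlRewriter and the "/" placed between pageTS and the resolved URL are assumptions for illustration; the HTML parsing and tag/attribute scanning are not shown.

```java
import java.net.URI;

// Hypothetical helper: rewrites one attribute URL for replay.
final class ReplayUrlRewriter {

    // Throws IllegalArgumentException if either URL is not a valid URI.
    static String rewrite(String wmPrefix, String pageTS,
                          String pageUrl, String attrUrl) {
        // Absolute URLs come back unchanged, server-relative URLs gain the
        // scheme and host of the page, and path-relative URLs are resolved
        // against the page's directory.
        URI resolved = URI.create(pageUrl).resolve(attrUrl.trim());
        // Assumption: a "/" separates the timestamp from the target URL.
        return wmPrefix + pageTS + "/" + resolved;
    }
}
```

For example, with the hypothetical prefix http://wayback.example/web/ and page http://example.com/a/b.html, the attribute value c.html is rewritten to point at http://example.com/a/c.html under the same timestamp.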
RequestParser which attempts to extract data from an HTML form, that is, from
HTTP GET request arguments containing a query, an optional count (results
per page), and an optional current page argument.
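A minimal sketch of this kind of form parsing is shown below. The parameter names q, count and page, the class name FormQuery, and the fallback behaviour are all assumptions chosen for illustration; the map shape matches what ServletRequest.getParameterMap() returns.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical value object holding the parsed form arguments.
final class FormQuery {
    final String query;
    final int count; // results per page
    final int page;  // current page, 1-based in this sketch

    private FormQuery(String query, int count, int page) {
        this.query = query;
        this.count = count;
        this.page = page;
    }

    // Returns empty when no query argument is present, so another parser
    // could be tried instead.
    static Optional<FormQuery> parse(Map<String, String[]> params, int defaultCount) {
        String q = first(params, "q");
        if (q == null || q.trim().isEmpty()) {
            return Optional.empty();
        }
        int count = positiveOr(first(params, "count"), defaultCount);
        int page = positiveOr(first(params, "page"), 1);
        return Optional.of(new FormQuery(q.trim(), count, page));
    }

    private static String first(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        return (values == null || values.length == 0) ? null : values[0];
    }

    // Falls back when the argument is missing, non-numeric, or not positive.
    private static int positiveOr(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return n > 0 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```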
Common interface to decouple application-specific handlers from the
ParseEventDelegator object: Any object interested in registering for specific
low-level events can implement this interface, and can be added to the
ParseEventDelegator parserVisitors list, and it will be given an opportunity
to register with the ParseEventDelegator for specific events it is
Class which allows matching based on:
a) one of several strings, any of which being found in the path causes a match
b) one of several strings, any of which being found in the query causes a match
c) one of several strings, *ALL* of which being found in the url causes a match
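The three criteria can be sketched as separate checks. The description does not say how the criteria are combined, so this sketch keeps them apart; the class name UrlStringMatcher, the field names, and the choice to treat an empty "all" list as a non-match are assumptions.

```java
import java.util.List;

// Hypothetical matcher exposing the three criteria as separate checks.
final class UrlStringMatcher {
    private final List<String> pathAny;
    private final List<String> queryAny;
    private final List<String> urlAll;

    UrlStringMatcher(List<String> pathAny, List<String> queryAny, List<String> urlAll) {
        this.pathAny = pathAny;
        this.queryAny = queryAny;
        this.urlAll = urlAll;
    }

    // a) any configured string found in the path
    boolean matchesPath(String path) {
        return pathAny.stream().anyMatch(path::contains);
    }

    // b) any configured string found in the query
    boolean matchesQuery(String query) {
        return queryAny.stream().anyMatch(query::contains);
    }

    // c) ALL configured strings found in the url; an empty list never
    // matches here, since allMatch on an empty stream would return true
    boolean matchesAll(String url) {
        return !urlAll.isEmpty() && urlAll.stream().allMatch(url::contains);
    }
}
```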
Brutally simple, barely functional class to allow simple recording of
millisecond level timing within a particular request, enabling rough logging
of the time spent in various parts of the handling of a WaybackRequest
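A timer of this kind can be sketched in a few lines. The class name RequestTimer and its start/end/summary methods are hypothetical and only illustrate the idea of named, millisecond-level timings collected per request; the sketch is not thread-safe, on the assumption that one instance serves one request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-request timer; one instance per request.
final class RequestTimer {
    private final Map<String, Long> startNanos = new LinkedHashMap<>();
    private final Map<String, Long> elapsedMs = new LinkedHashMap<>();

    // Marks the beginning of a named phase.
    void start(String phase) {
        startNanos.put(phase, System.nanoTime());
    }

    // Marks the end of a named phase; repeated phases accumulate.
    // Ending a phase that was never started is ignored.
    void end(String phase) {
        Long begun = startNanos.remove(phase);
        if (begun != null) {
            long ms = (System.nanoTime() - begun) / 1_000_000L;
            elapsedMs.merge(phase, ms, Long::sum);
        }
    }

    long elapsed(String phase) {
        return elapsedMs.getOrDefault(phase, 0L);
    }

    // One-line summary suitable for a log message, in phase order.
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : elapsedMs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue()).append("ms");
        }
        return sb.toString();
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is monotonic, so clock adjustments during a request cannot produce negative durations.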
Read the single Spring XML configuration file located at the specified
path, performing PropertyPlaceHolder interpolation, extracting all beans
which implement the RequestHandler interface, and construct a
RequestMapper for those RequestHandlers, on the specified ServletContext.
Containing object for data associated with one region (month/year/etc) in the
graph, including the:
highlighted value index
int array of values to graph within this region
the global max int value across all values in the overall graph
css format, depending on the guessed context, so errors in embedded
documents do not cause unneeded errors in the embedding document.
ReplayDispatcher instance which uses a configurable ClosestResultSelector
to find the best result to show from a given set, and a list of
ReplayRendererSelector to determine how best to replay that result to a user.
Class which wraps functionality for converting a Resource (InputStream +
HTTP headers) into a StringBuilder, performing several common URL
resolution methods against that StringBuilder, inserting arbitrary Strings
into the page, and then converting the page back to a byte array.
Sad but needed subclass of the ArchiveReaderFactory, which allows configuration of
timeouts for connect and reads on underlying HTTP connections, and overrides
the one getArchiveReader(URL,long) method to enable setting the timeouts.