org.archive.wayback.resourceindex.ziplines
Class ZiplinesSearchResultSource
java.lang.Object
org.archive.wayback.resourceindex.ziplines.ZiplinesSearchResultSource
- All Implemented Interfaces:
- SearchResultSource
public class ZiplinesSearchResultSource
- extends Object
- implements SearchResultSource
A set of Ziplines files, which are CDX files specially compressed into a
series of GZip members such that:
1) each member is exactly 128K, padded using a GZip comment header
2) each member contains complete lines: no line spans two GZip members
If the data put into these files is sorted, then individual members can be
uncompressed only when needed, minimizing the total data to be uncompressed.
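To make the two member properties concrete, here is a minimal Java sketch of how one such member could be produced. It is not the tool that writes real Ziplines files: the class ZiplinesMemberWriter and its writeMember method are hypothetical, and the sketch assumes the padding is done by sizing the optional FCOMMENT field of the gzip header (RFC 1952) so that header, compressed data and trailer add up to exactly 128K.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

/** Hypothetical helper: packs complete lines into one fixed-size gzip member. */
public class ZiplinesMemberWriter {
    static final int MEMBER_SIZE = 128 * 1024;
    // 10-byte fixed header + comment terminator (1) + CRC32/ISIZE trailer (8)
    static final int OVERHEAD = 10 + 1 + 8;

    static byte[] writeMember(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n'); // every line is complete inside this member
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Raw deflate (nowrap = true): the gzip framing is written by hand below.
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream deflated = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            deflated.write(buf, 0, n);
        }
        deflater.end();

        int padding = MEMBER_SIZE - OVERHEAD - deflated.size();
        if (padding < 0) {
            throw new IllegalArgumentException("lines do not fit in one member");
        }
        CRC32 crc = new CRC32();
        crc.update(raw);

        ByteArrayOutputStream out = new ByteArrayOutputStream(MEMBER_SIZE);
        // ID1, ID2, CM=8 (deflate), FLG=FCOMMENT, MTIME=0 (4 bytes), XFL=0, OS=255 (unknown)
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0x10, 0, 0, 0, 0, 0, (byte) 0xff}, 0, 10);
        for (int i = 0; i < padding; i++) {
            out.write(' '); // comment bytes must be non-zero
        }
        out.write(0); // zero-terminates the comment
        out.write(deflated.toByteArray(), 0, deflated.size());
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, raw.length); // ISIZE: uncompressed length mod 2^32
        return out.toByteArray();
    }

    private static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v);
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }
}
```

Because the comment length is chosen after compression, the member size is fixed no matter how well the lines compress; lines that would overflow the budget have to start the next member.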
This SearchResultSource assumes a set of alphabetically partitioned Ziplined
CDX files, so that each file is sorted, and no regions overlap.
This class takes 2 files as input:
1) a specially constructed map of the first N bytes of data from each GZip
member, and the filename and offset of that GZip member.
2) a mapping of filenames to URLs
Data from #1 is actually stored in a serialized
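Input #1 is what makes prefix lookups cheap: a binary search over the member keys identifies which 128K members to decompress. The sketch below assumes the index has already been loaded as a list sorted by key, where each key is the first keyLength characters of a member's first line. ChunkIndexSketch, Block and candidateBlocks are hypothetical names, the on-disk format of the chunk index is not shown, and capping the result at maxBlocks only mirrors the maxBlocks property of this class; how the real class applies it is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkIndexSketch {
    /** One index entry: the key of a member plus where that member lives. */
    static final class Block {
        final String key; // first keyLength characters of the member's first line
        final String filename;
        final long offset;

        Block(String key, String filename, long offset) {
            this.key = key;
            this.filename = filename;
            this.offset = offset;
        }
    }

    private final List<Block> blocks; // sorted ascending by key
    private final int keyLength;

    ChunkIndexSketch(List<Block> sortedBlocks, int keyLength) {
        this.blocks = new ArrayList<>(sortedBlocks);
        this.keyLength = keyLength;
    }

    /** Index of the first block whose key is >= probe, or blocks.size() if none. */
    private int lowerBound(String probe) {
        int lo = 0;
        int hi = blocks.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (blocks.get(mid).key.compareTo(probe) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Members that may hold lines >= prefix, in order, at most maxBlocks of them. */
    List<Block> candidateBlocks(String prefix, int maxBlocks) {
        // Compare at key resolution: a longer prefix is cut to keyLength characters.
        String probe = prefix.length() > keyLength ? prefix.substring(0, keyLength) : prefix;
        // The last block whose key sorts strictly before the probe may hold the first match.
        int start = Math.max(0, lowerBound(probe) - 1);
        int end = (int) Math.min(blocks.size(), (long) start + maxBlocks);
        return new ArrayList<>(blocks.subList(start, end));
    }
}
```

Since keys are truncated, a block whose key equals the truncated prefix may still begin before the prefix, so the scan starts at the last block whose key is strictly smaller. The cost is possibly decompressing a block that yields no matching lines, which the caller skips.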
- Author:
- brad
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ZiplinesSearchResultSource
public ZiplinesSearchResultSource()
ZiplinesSearchResultSource
public ZiplinesSearchResultSource(CDXFormat format)
init
public void init()
throws IOException
- Throws:
IOException
adaptIterator
protected CloseableIterator<CaptureSearchResult> adaptIterator(Iterator<String> itr)
throws IOException
- Throws:
IOException
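adaptIterator turns raw CDX lines into CaptureSearchResult objects. A minimal sketch of the same lazy adaptation, assuming space-separated lines whose first three fields are the canonicalized URL key, the 14-digit timestamp and the original URL (the N, b and a fields of the common CDX legend). Capture and CdxLineAdapter are hypothetical stand-ins; the real class presumably relies on its configured CDXFormat for field handling.

```java
import java.util.Iterator;

public class CdxLineAdapter {
    /** Hypothetical stand-in for CaptureSearchResult, holding three fields. */
    static final class Capture {
        final String urlKey;
        final String timestamp;
        final String originalUrl;

        Capture(String urlKey, String timestamp, String originalUrl) {
            this.urlKey = urlKey;
            this.timestamp = timestamp;
            this.originalUrl = originalUrl;
        }
    }

    /** Parses one space-separated CDX line, assuming N b a field order. */
    static Capture parse(String line) {
        String[] f = line.split(" ");
        if (f.length < 3) {
            throw new IllegalArgumentException("malformed CDX line: " + line);
        }
        return new Capture(f[0], f[1], f[2]);
    }

    /** Wraps the line iterator; each line is parsed only when next() is called. */
    static Iterator<Capture> adapt(Iterator<String> lines) {
        return new Iterator<Capture>() {
            @Override
            public boolean hasNext() {
                return lines.hasNext();
            }

            @Override
            public Capture next() {
                return parse(lines.next());
            }
        };
    }
}
```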
cleanup
public void cleanup(CloseableIterator<CaptureSearchResult> c)
throws IOException
- Specified by:
cleanup in interface SearchResultSource
- Throws:
IOException
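Because the iterators handed out by this class may hold open readers on Ziplines blocks, callers are presumably expected to pass them back to cleanup() even when they stop reading early. The sketch below shows that discipline with try-with-resources. ClosingIterator is a hypothetical stand-in assumed to combine Iterator and Closeable, not the Wayback CloseableIterator type itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CleanupSketch {
    /** Hypothetical stand-in for an iterator that owns resources. */
    interface ClosingIterator<T> extends Iterator<T>, Closeable {}

    /** Iterates over a list and records whether close() was called. */
    static final class ListClosingIterator<T> implements ClosingIterator<T> {
        private final Iterator<T> delegate;
        boolean closed = false;

        ListClosingIterator(List<T> items) {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true; // a real source would close its block readers here
        }
    }

    /** Reads at most limit items; the iterator is closed on every exit path. */
    static <T> List<T> take(ClosingIterator<T> itr, int limit) throws IOException {
        List<T> out = new ArrayList<>();
        try (ClosingIterator<T> c = itr) {
            while (out.size() < limit && c.hasNext()) {
                out.add(c.next());
            }
        }
        return out;
    }
}
```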
getPrefixIterator
public CloseableIterator<CaptureSearchResult> getPrefixIterator(String prefix)
throws ResourceIndexNotAvailableException
- Specified by:
getPrefixIterator in interface SearchResultSource
- Returns:
- CloseableIterator that will return SearchResults beginning with the prefix
argument, with subsequent next() calls returning subsequent
results.
- Throws:
ResourceIndexNotAvailableException
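One reading of the getPrefixIterator contract, sketched over an in-memory sorted set of CDX lines (an assumption; the real class reads them from Ziplines blocks): iteration starts at the first line that sorts at or after the prefix and continues in ascending order, and it is up to the caller to stop once lines no longer start with the prefix. ForwardPrefixSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;

public class ForwardPrefixSketch {
    /** Starts at the first line >= prefix and continues in ascending order. */
    static Iterator<String> prefixIterator(NavigableSet<String> lines, String prefix) {
        return lines.tailSet(prefix, true).iterator();
    }

    /** Caller-side filter: collect results while they still start with prefix. */
    static List<String> matching(NavigableSet<String> lines, String prefix) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = prefixIterator(lines, prefix);
        while (it.hasNext()) {
            String line = it.next();
            if (!line.startsWith(prefix)) {
                break; // sorted input: prefixed lines are contiguous, so none can follow
            }
            out.add(line);
        }
        return out;
    }
}
```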
getStringPrefixIterator
public Iterator<String> getStringPrefixIterator(String prefix)
throws ResourceIndexNotAvailableException,
IOException
- Throws:
ResourceIndexNotAvailableException
IOException
getPrefixReverseIterator
public CloseableIterator<CaptureSearchResult> getPrefixReverseIterator(String prefix)
throws ResourceIndexNotAvailableException
- Specified by:
getPrefixReverseIterator in interface SearchResultSource
- Returns:
- CloseableIterator that will return SearchResults starting *before* the
prefix argument, with subsequent next() calls returning previous
results.
- Throws:
ResourceIndexNotAvailableException
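The reverse variant can be sketched the same way, again over a hypothetical in-memory sorted set: iteration begins at the last line strictly before the prefix and walks backwards. Appending a timestamp to a URL key turns this into a lookup for the most recent capture before a given date, which is one plausible use; ReversePrefixSketch and latestBefore are hypothetical names.

```java
import java.util.Iterator;
import java.util.NavigableSet;

public class ReversePrefixSketch {
    /** Starts at the last line strictly before prefix and walks backwards. */
    static Iterator<String> prefixReverseIterator(NavigableSet<String> lines, String prefix) {
        return lines.headSet(prefix, false).descendingIterator();
    }

    /** Most recent line for urlKey whose timestamp sorts before the given one, or null. */
    static String latestBefore(NavigableSet<String> lines, String urlKey, String timestamp) {
        Iterator<String> it = prefixReverseIterator(lines, urlKey + " " + timestamp);
        if (it.hasNext()) {
            String line = it.next();
            if (line.startsWith(urlKey + " ")) {
                return line;
            }
        }
        return null; // no earlier capture of this URL
    }
}
```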
shutdown
public void shutdown()
throws IOException
- Specified by:
shutdown in interface SearchResultSource
- Throws:
IOException
getFormat
public CDXFormat getFormat()
- Returns:
- the format
setFormat
public void setFormat(CDXFormat format)
- Parameters:
format - the format to set
getChunkIndexPath
public String getChunkIndexPath()
- Returns:
- the chunkIndexPath
setChunkIndexPath
public void setChunkIndexPath(String chunkIndexPath)
- Parameters:
chunkIndexPath - the chunkIndexPath to set
getChunkMapPath
public String getChunkMapPath()
- Returns:
- the chunkMapPath
setChunkMapPath
public void setChunkMapPath(String chunkMapPath)
- Parameters:
chunkMapPath - the chunkMapPath to set
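Once the chunk map has resolved a filename to a location, reading a block amounts to a seek plus a gunzip of one 128K member. The sketch below assumes a local file; for a remote URL the same byte span would presumably be fetched with an HTTP Range request instead. BlockReaderSketch and readBlock are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class BlockReaderSketch {
    static final int BLOCK_SIZE = 128 * 1024;

    /** Reads up to length bytes at offset (one gzip member) and returns its lines. */
    static List<String> readBlock(Path file, long offset, int length) throws IOException {
        byte[] buf;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long available = raf.length() - offset;
            if (available <= 0) {
                throw new IOException("offset past end of file: " + offset);
            }
            buf = new byte[(int) Math.min(length, available)]; // last block may be short
            raf.seek(offset);
            raf.readFully(buf);
        }
        String text;
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(buf))) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                lines.add(line); // members hold complete lines, so none is cut off
            }
        }
        return lines;
    }
}
```

A caller would normally pass BLOCK_SIZE as length; the cap at end of file only matters for the final member in the sketch's local-file setting.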
getMaxBlocks
public int getMaxBlocks()
- Returns:
- the maxBlocks
setMaxBlocks
public void setMaxBlocks(int maxBlocks)
- Parameters:
maxBlocks - the maxBlocks to set
main
public static void main(String[] args)
- Parameters:
args -
Copyright © 2005-2011 Internet Archive. All Rights Reserved.