org.archive.wayback.resourceindex.ziplines
Class ZiplinesSearchResultSource

java.lang.Object
  extended by org.archive.wayback.resourceindex.ziplines.ZiplinesSearchResultSource
All Implemented Interfaces:
SearchResultSource

public class ZiplinesSearchResultSource
extends Object
implements SearchResultSource

A set of Ziplines files, which are CDX files specially compressed into a series of GZip members such that:
1) each member is exactly 128K, padded using a GZip comment header
2) each member contains complete lines: no line spans two GZip members

If the data put into these files is sorted, then only the members that are needed have to be uncompressed, minimizing the total data to be uncompressed.

This SearchResultSource assumes a set of alphabetically partitioned Ziplined CDX files, so that each file is sorted, and no regions overlap.

This class takes 2 files as input:
1) a specially constructed map of the first N bytes of data from each GZip member, and the filename and offset of that GZip member
2) a mapping of filenames to URLs

Data from #1 is actually stored in a serialized
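The layout described above can be illustrated with a small in-memory sketch. It is not the Wayback implementation: the class ZiplinesLookupSketch, its Block entry, and the methods build and readBlockFor are hypothetical names, the 128K comment-header padding is not reproduced, and the whole first line of each member stands in for "the first N bytes". It only shows why sorted input plus whole-line members lets a reader decompress a single member chosen from a sorted map.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative sketch only; names and layout are assumptions, not the Wayback code. */
final class ZiplinesLookupSketch {
    /** Hypothetical map entry: where one GZip member sits in the (in-memory) file. */
    static final class Block {
        final int offset;
        final int length;
        Block(int offset, int length) { this.offset = offset; this.length = length; }
    }

    /** Sorted map from the first line of each member to its location. */
    final TreeMap<String, Block> index = new TreeMap<>();
    byte[] data = new byte[0];

    /** Writes each non-empty group of sorted lines as its own GZip member. */
    void build(List<List<String>> groups) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try {
            for (List<String> group : groups) {
                ByteArrayOutputStream member = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(member)) {
                    for (String line : group) {
                        gz.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                    }
                }
                // Record where this member starts and how long it is.
                index.put(group.get(0), new Block(file.size(), member.size()));
                member.writeTo(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        data = file.toByteArray();
    }

    /** Decompresses only the member where a scan for the given prefix would start. */
    List<String> readBlockFor(String prefix) {
        // Greatest first-line key <= prefix; fall back to the first member.
        Map.Entry<String, Block> entry = index.floorEntry(prefix);
        if (entry == null) {
            entry = index.firstEntry();
        }
        List<String> lines = new ArrayList<>();
        if (entry == null) {
            return lines; // nothing has been built yet
        }
        Block b = entry.getValue();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data, b.offset, b.length)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        ZiplinesLookupSketch sketch = new ZiplinesLookupSketch();
        sketch.build(Arrays.asList(
                Arrays.asList("a.com/ 2001", "b.com/ 2002"),
                Arrays.asList("c.com/ 2003", "d.com/ 2004")));
        // Only the second member is decompressed for this prefix.
        System.out.println(sketch.readBlockFor("d.com/"));
    }
}
```

Because no line spans two members, each member can be handed to a GZIP reader on its own; a real prefix scan would continue into the following members once the starting one is exhausted.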

Author:
brad

Constructor Summary
ZiplinesSearchResultSource()
           
ZiplinesSearchResultSource(CDXFormat format)
           
 
Method Summary
protected  CloseableIterator<CaptureSearchResult> adaptIterator(Iterator<String> itr)
           
 void cleanup(CloseableIterator<CaptureSearchResult> c)
           
 String getChunkIndexPath()
           
 String getChunkMapPath()
           
 CDXFormat getFormat()
           
 int getMaxBlocks()
           
 CloseableIterator<CaptureSearchResult> getPrefixIterator(String prefix)
           
 CloseableIterator<CaptureSearchResult> getPrefixReverseIterator(String prefix)
           
 Iterator<String> getStringPrefixIterator(String prefix)
           
 void init()
           
static void main(String[] args)
           
 void setChunkIndexPath(String chunkIndexPath)
           
 void setChunkMapPath(String chunkMapPath)
           
 void setFormat(CDXFormat format)
           
 void setMaxBlocks(int maxBlocks)
           
 void shutdown()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ZiplinesSearchResultSource

public ZiplinesSearchResultSource()

ZiplinesSearchResultSource

public ZiplinesSearchResultSource(CDXFormat format)
Method Detail

init

public void init()
          throws IOException
Throws:
IOException

adaptIterator

protected CloseableIterator<CaptureSearchResult> adaptIterator(Iterator<String> itr)
                                                        throws IOException
Throws:
IOException

cleanup

public void cleanup(CloseableIterator<CaptureSearchResult> c)
             throws IOException
Specified by:
cleanup in interface SearchResultSource
Throws:
IOException

getPrefixIterator

public CloseableIterator<CaptureSearchResult> getPrefixIterator(String prefix)
                                                         throws ResourceIndexNotAvailableException
Specified by:
getPrefixIterator in interface SearchResultSource
Returns:
CloseableIterator that will return SearchResults beginning with the prefix argument, with subsequent next() calls returning subsequent results.
Throws:
ResourceIndexNotAvailableException
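One reading of the Returns description is an iterator positioned at the first record that sorts at or after the prefix and then moving forward. The following sketch shows that behavior over an in-memory sorted list of distinct lines; PrefixIteratorSketch and forwardFrom are hypothetical names, and whether the real iterator stops at the end of the prefix range or leaves that to the caller is an assumption not settled by this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; not the Wayback implementation. */
final class PrefixIteratorSketch {
    /**
     * Returns an iterator over sortedLines starting at the first line
     * that sorts at or after prefix. Assumes the list is sorted and
     * its lines are distinct.
     */
    static Iterator<String> forwardFrom(List<String> sortedLines, String prefix) {
        int pos = Collections.binarySearch(sortedLines, prefix);
        // A negative result encodes the insertion point as -(point) - 1.
        int start = pos >= 0 ? pos : -(pos + 1);
        return sortedLines.subList(start, sortedLines.size()).iterator();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a.com/ 1", "b.com/ 1", "b.com/ 2", "c.com/ 1");
        Iterator<String> it = forwardFrom(lines, "b.com/");
        while (it.hasNext()) {
            String line = it.next();
            // In this sketch the caller decides when to stop.
            if (!line.startsWith("b.com/")) {
                break;
            }
            System.out.println(line);
        }
    }
}
```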

getStringPrefixIterator

public Iterator<String> getStringPrefixIterator(String prefix)
                                         throws ResourceIndexNotAvailableException,
                                                IOException
Throws:
ResourceIndexNotAvailableException
IOException

getPrefixReverseIterator

public CloseableIterator<CaptureSearchResult> getPrefixReverseIterator(String prefix)
                                                                throws ResourceIndexNotAvailableException
Specified by:
getPrefixReverseIterator in interface SearchResultSource
Returns:
CloseableIterator that will return SearchResults starting *before* the prefix argument, with subsequent next() calls returning previous results.
Throws:
ResourceIndexNotAvailableException
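The reverse contract (start just before the prefix, then walk backward) can be sketched the same way over an in-memory sorted list of distinct lines. ReversePrefixIteratorSketch and reverseFrom are hypothetical names used only for illustration; the real class reads from compressed blocks rather than a list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

/** Illustrative sketch only; not the Wayback implementation. */
final class ReversePrefixIteratorSketch {
    /**
     * Returns an iterator whose first element is the line just before
     * the position where prefix would sort, and whose later elements
     * are the preceding lines in descending order.
     */
    static Iterator<String> reverseFrom(List<String> sortedLines, String prefix) {
        int pos = Collections.binarySearch(sortedLines, prefix);
        // A negative result encodes the insertion point as -(point) - 1.
        int start = pos >= 0 ? pos : -(pos + 1);
        final ListIterator<String> cursor = sortedLines.listIterator(start);
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return cursor.hasPrevious();
            }

            @Override
            public String next() {
                // Throws NoSuchElementException once the start is reached.
                return cursor.previous();
            }
        };
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a.com/ 1", "b.com/ 1", "b.com/ 2", "c.com/ 1");
        Iterator<String> it = reverseFrom(lines, "c.com/");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```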

shutdown

public void shutdown()
              throws IOException
Specified by:
shutdown in interface SearchResultSource
Throws:
IOException

getFormat

public CDXFormat getFormat()
Returns:
the format

setFormat

public void setFormat(CDXFormat format)
Parameters:
format - the format to set

getChunkIndexPath

public String getChunkIndexPath()
Returns:
the chunkIndexPath

setChunkIndexPath

public void setChunkIndexPath(String chunkIndexPath)
Parameters:
chunkIndexPath - the chunkIndexPath to set

getChunkMapPath

public String getChunkMapPath()
Returns:
the chunkMapPath

setChunkMapPath

public void setChunkMapPath(String chunkMapPath)
Parameters:
chunkMapPath - the chunkMapPath to set

getMaxBlocks

public int getMaxBlocks()
Returns:
the maxBlocks

setMaxBlocks

public void setMaxBlocks(int maxBlocks)
Parameters:
maxBlocks - the maxBlocks to set

main

public static void main(String[] args)
Parameters:
args -


Copyright © 2005-2011 Internet Archive. All Rights Reserved.