A C D E F G I L M N O P R S T U W

A

ARCCOLLECTION_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 
ARCFILENAME_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 
ARCFILEOFFSET_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 

C

checkArcsDir(Path) - Method in class org.archive.access.nutch.Nutchwax
Checks that the arcs dir exists and looks like it holds files that list ARCs (rather than the ARCs themselves).
checkMimetype(String) - Static method in class org.archive.access.nutch.ImportArcs
 
cleanup(Thread, Reporter) - Method in class org.archive.access.nutch.ImportArcs
 
close() - Method in class org.archive.access.nutch.ImportArcs
 
configure(JobConf) - Method in class org.archive.access.nutch.ImportArcs
 
configure(JobConf) - Method in class org.archive.access.nutch.NutchwaxLinkDb
 
createLinkdb(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 

D

doAll(Path, String, Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
Run the passed list of mapreduce indexing jobs.
doAllUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doClass(String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doClassUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doDedup(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doDedupUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.archive.access.nutch.NutchwaxOpenSearchServlet
 
doImport(Path, String, Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doImportUsage(String, int) - Static method in class org.archive.access.nutch.ImportArcs
 
doIndexing(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doIndexing(Nutchwax.OutputDirectories, Path[]) - Method in class org.archive.access.nutch.Nutchwax
 
doIndexUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doInvert(Nutchwax.OutputDirectories, Path[]) - Method in class org.archive.access.nutch.Nutchwax
 
doInvert(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doInvertUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doJob(String, String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doMerge(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doMergeUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doUpdate(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doUpdate(Nutchwax.OutputDirectories, String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doUpdateUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 

E

encodeExacturl(String) - Static method in class org.archive.access.nutch.NutchwaxQuery
 

F

formatToOneLine(String) - Method in class org.archive.access.nutch.ImportArcs
 

G

generateWaxKey(WritableComparable, String) - Static method in class org.archive.access.nutch.Nutchwax
 
generateWaxKey(String, String) - Static method in class org.archive.access.nutch.Nutchwax
 
get(ServletContext, Configuration) - Static method in class org.archive.access.nutch.NutchwaxBean
 
getAnchors(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
 
getCollectionFromArcname(String) - Static method in class org.archive.access.nutch.ImportArcs
 
getCollectionFromWaxKey(WritableComparable) - Static method in class org.archive.access.nutch.Nutchwax
 
getCollectionQualifiedHitDetails(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
TODO: Make it so I don't have to create a new HitDetails just to change the key used when doing the lookup.
getConfiguration() - Static method in class org.archive.access.nutch.NutchwaxConfiguration
 
getConfiguration(ServletContext) - Static method in class org.archive.access.nutch.NutchwaxConfiguration
 
getCrawlDb() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getDate(String) - Static method in class org.archive.access.nutch.Nutchwax
 
getFS() - Method in class org.archive.access.nutch.Nutchwax
 
getIndex() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getIndexes() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getInlinks(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
 
getJobConf() - Method in class org.archive.access.nutch.Nutchwax
 
getLinkDb() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getMimetype(String, MimeTypes, String) - Method in class org.archive.access.nutch.ImportArcs
 
getOutput() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getParseRate(long, long) - Method in class org.archive.access.nutch.ImportArcs
 
getParseRateLogMessage(String, String, double) - Method in class org.archive.access.nutch.ImportArcs
 
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.archive.access.nutch.ImportArcs.WaxFetcherOutputFormat
 
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
 
getSegments(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
getSegments() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getStatus(String, String, String, String) - Method in class org.archive.access.nutch.ImportArcs
 
getSummary(HitDetails[], Query) - Method in class org.archive.access.nutch.NutchwaxBean
 
getTmpDir() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getUrlFromWaxKey(WritableComparable) - Static method in class org.archive.access.nutch.Nutchwax
 

I

ImportArcs - Class in org.archive.access.nutch
Ingests ARCs, writing each parsed ARC record as Nutch FetcherOutputFormat.
ImportArcs() - Constructor for class org.archive.access.nutch.ImportArcs
 
ImportArcs(Configuration) - Constructor for class org.archive.access.nutch.ImportArcs
 
importArcs(Path, Path, String) - Method in class org.archive.access.nutch.ImportArcs
 
ImportArcs.WaxFetcherOutputFormat - Class in org.archive.access.nutch
Override of nutch FetcherOutputFormat so I can substitute my own ParseOutputFormat, ImportArcs.WaxParseOutputFormat.
ImportArcs.WaxFetcherOutputFormat() - Constructor for class org.archive.access.nutch.ImportArcs.WaxFetcherOutputFormat
 
ImportArcs.WaxParseOutputFormat - Class in org.archive.access.nutch
Copy so I can add the collection prefix to the produced signature and link CrawlDatums.
ImportArcs.WaxParseOutputFormat() - Constructor for class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
 
index(Path, Path, Path, Path[]) - Method in class org.archive.access.nutch.NutchwaxIndexer
 
init(ServletConfig) - Method in class org.archive.access.nutch.NutchwaxOpenSearchServlet
 
invert(Path, Path[], boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxLinkDb
 
isIndex(ARCRecord) - Method in class org.archive.access.nutch.ImportArcs
 

L

LOG - Variable in class org.archive.access.nutch.ImportArcs
 
LOG - Variable in class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
 
LOG - Static variable in class org.archive.access.nutch.Nutchwax
 
LOG - Static variable in class org.archive.access.nutch.NutchwaxCrawlDb
 
LOG - Variable in class org.archive.access.nutch.NutchwaxIndexer
 

M

main(String[]) - Static method in class org.archive.access.nutch.ImportArcs
 
main(String[]) - Static method in class org.archive.access.nutch.Nutchwax
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxCrawlDb
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxDistributedSearch.Server
Use to start org.apache.nutch.searcher.DistributedSearch$Server but with nutchwax configuration mixed in so nutchwax plugins can be found (and properly configured).
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxIndexer
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxLinkDb
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxLinkDbMerger
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.ImportArcs
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxCrawlDbFilter
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxLinkDb
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxLinkDbFilter
 
merge(Path, Path[], boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxLinkDbMerger
 

N

Nutchwax - Class in org.archive.access.nutch
Script to run all indexing jobs from index through merge of final index.
Nutchwax() - Constructor for class org.archive.access.nutch.Nutchwax
Default constructor.
Nutchwax.OutputDirectories - Class in org.archive.access.nutch
 
Nutchwax.OutputDirectories(Path) - Constructor for class org.archive.access.nutch.Nutchwax.OutputDirectories
 
NutchwaxBean - Class in org.archive.access.nutch
Proxy that allows us to intercept getSummary so we can change the key used.
NutchwaxBean(Configuration, Path) - Constructor for class org.archive.access.nutch.NutchwaxBean
 
NutchwaxBean(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxBean
 
NutchwaxConfiguration - Class in org.archive.access.nutch
Configuration that adds NutchWAX configuration to base nutch and hadoop config.
NutchwaxCrawlDb - Class in org.archive.access.nutch
Adds setting of the NutchwaxCrawlDbFilter.
NutchwaxCrawlDb() - Constructor for class org.archive.access.nutch.NutchwaxCrawlDb
 
NutchwaxCrawlDb(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxCrawlDb
 
NutchwaxCrawlDbFilter - Class in org.archive.access.nutch
Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, putting the collection back).
NutchwaxCrawlDbFilter() - Constructor for class org.archive.access.nutch.NutchwaxCrawlDbFilter
 
NutchwaxDistributedSearch - Class in org.archive.access.nutch
Script to start up a Nutchwax Distributed Searcher.
NutchwaxDistributedSearch() - Constructor for class org.archive.access.nutch.NutchwaxDistributedSearch
 
NutchwaxDistributedSearch.Server - Class in org.archive.access.nutch
 
NutchwaxIndexer - Class in org.archive.access.nutch
Subclass of nutch Indexer that handles keys that are not just URLs.
NutchwaxIndexer() - Constructor for class org.archive.access.nutch.NutchwaxIndexer
 
NutchwaxIndexer(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxIndexer
 
NutchwaxLinkDb - Class in org.archive.access.nutch
Subclass of nutch indexer that writes out LinkDb keys that include the collection name.
NutchwaxLinkDb() - Constructor for class org.archive.access.nutch.NutchwaxLinkDb
 
NutchwaxLinkDb(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxLinkDb
Construct a LinkDb.
NutchwaxLinkDbFilter - Class in org.archive.access.nutch
Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, putting the collection back).
NutchwaxLinkDbFilter() - Constructor for class org.archive.access.nutch.NutchwaxLinkDbFilter
 
NutchwaxLinkDbMerger - Class in org.archive.access.nutch
Wrapper around LinkDbMerger.
NutchwaxLinkDbMerger() - Constructor for class org.archive.access.nutch.NutchwaxLinkDbMerger
 
NutchwaxOpenSearchServlet - Class in org.archive.access.nutch
Subclass of OpenSearchServlet from nutch.
NutchwaxOpenSearchServlet() - Constructor for class org.archive.access.nutch.NutchwaxOpenSearchServlet
 
NutchwaxQuery - Class in org.archive.access.nutch
Handle exacturl when present in queries.
NutchwaxQuery() - Constructor for class org.archive.access.nutch.NutchwaxQuery
 
NutchwaxTest - Class in org.archive.access.nutch
 
NutchwaxTest() - Constructor for class org.archive.access.nutch.NutchwaxTest
 

O

org.archive.access.nutch - package org.archive.access.nutch
Provides mapreduce jobs to import ARCs, and plugins to add 'collection' and ARC repository location to the nutch index.

P

parse(String, Configuration) - Static method in class org.archive.access.nutch.NutchwaxQuery
Does fixup on the passed-in query before handing it to nutch.

R

reduce(WritableComparable, Iterator, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxIndexer
 
run(String[]) - Method in class org.archive.access.nutch.ImportArcs
 

S

skip(String) - Method in class org.archive.access.nutch.ImportArcs
 

T

testGetCollectionFromWaxKey() - Method in class org.archive.access.nutch.NutchwaxTest
 

U

update(Path, Path[], boolean, boolean, boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxCrawlDb
 
usage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 

W

WAX_COLLECTION_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 

Copyright © 2005-2007 Internet Archive. All Rights Reserved.