A C D E F G I L M N O P R S T U W

A

ARCCOLLECTION_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 
ARCFILENAME_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 
ARCFILEOFFSET_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 

C

checkArcsDir(Path) - Method in class org.archive.access.nutch.Nutchwax
Check that the arcs dir exists and looks like it has files that list ARCs (rather than the ARCs themselves).
checkCollectionName() - Method in class org.archive.access.nutch.ImportArcs
 
checkMimetype(String) - Static method in class org.archive.access.nutch.ImportArcs
 
close() - Method in class org.archive.access.nutch.ImportArcs
 
close() - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
configure(JobConf) - Method in class org.archive.access.nutch.ImportArcs
 
configure(JobConf) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
configure(JobConf) - Method in class org.archive.access.nutch.mapred.TaskLogMapRunner
 
configure(JobConf) - Method in class org.archive.access.nutch.NutchwaxLinkDb
 
createLinkdb(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 

D

doAll(Path, String, Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
Run the passed list of mapreduce indexing jobs.
doAllUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doClass(String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doClassUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doDedup(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doDedupUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.archive.access.nutch.NutchwaxOpenSearchServlet
 
doImport(Path, String, Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doImportUsage(String, int) - Static method in class org.archive.access.nutch.ImportArcs
 
doIndexing(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doIndexing(Nutchwax.OutputDirectories, Path[]) - Method in class org.archive.access.nutch.Nutchwax
 
doIndexUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doInvert(Nutchwax.OutputDirectories, Path[]) - Method in class org.archive.access.nutch.Nutchwax
 
doInvert(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doInvertUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doJob(String, String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doLog(String, String, OutputCollector, Reporter) - Method in class org.archive.access.nutch.mapred.TaskLogMapRunner
 
doMerge(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doMergeUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doSearch(String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doSearchUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 
doUpdate(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
doUpdate(Nutchwax.OutputDirectories, String[]) - Method in class org.archive.access.nutch.Nutchwax
 
doUpdateUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 

E

encodeExacturl(String) - Static method in class org.archive.access.nutch.NutchwaxQuery
 

F

fetchAll() - Method in class org.archive.access.nutch.mapred.TaskLogReader
Return the entire user-log (remaining splits).
formatToOneLine(String) - Method in class org.archive.access.nutch.ImportArcs
 

G

generateWaxKey(WritableComparable, String) - Static method in class org.archive.access.nutch.Nutchwax
 
generateWaxKey(String, String) - Static method in class org.archive.access.nutch.Nutchwax
 
get(ServletContext, Configuration) - Static method in class org.archive.access.nutch.NutchwaxBean
 
getAnchors(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
 
getARCName(ARCRecordMetaData) - Method in class org.archive.access.nutch.ImportArcs
 
getCollectionFromArcname(String) - Static method in class org.archive.access.nutch.ImportArcs
 
getCollectionFromWaxKey(WritableComparable) - Static method in class org.archive.access.nutch.Nutchwax
 
getCollectionQualifiedHitDetails(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
TODO: Make it so I don't have to create a new HitDetails just to change the key used when doing the lookup.
getConf() - Method in class org.archive.access.nutch.ImportArcs
 
getConfiguration() - Static method in class org.archive.access.nutch.NutchwaxConfiguration
 
getConfiguration(ServletContext) - Static method in class org.archive.access.nutch.NutchwaxConfiguration
 
getCrawlDb() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getDate(String) - Static method in class org.archive.access.nutch.Nutchwax
 
getFS() - Method in class org.archive.access.nutch.Nutchwax
 
getIndex() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getIndexes() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getInlinks(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
 
getInputStream() - Method in class org.archive.access.nutch.mapred.TaskLogReader
 
getJobConf() - Method in class org.archive.access.nutch.Nutchwax
 
getLinkDb() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getMimetype(String, MimeTypes, String) - Method in class org.archive.access.nutch.ImportArcs
 
getOutput() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getParseRate(long, long) - Method in class org.archive.access.nutch.ImportArcs
 
getParseRateLogMessage(String, String, double) - Method in class org.archive.access.nutch.ImportArcs
 
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.archive.access.nutch.ImportArcs.WaxFetcherOutputFormat
 
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
 
getSegments(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
 
getSegments() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getStatus(String, String, String, String) - Method in class org.archive.access.nutch.ImportArcs
 
getSummary(HitDetails[], Query) - Method in class org.archive.access.nutch.NutchwaxBean
 
getTmpDir() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
 
getTotalLogSize() - Method in class org.archive.access.nutch.mapred.TaskLogReader
Return the total 'logical' log-size written by the task, including purged data.
getUrlFromWaxKey(WritableComparable) - Static method in class org.archive.access.nutch.Nutchwax
 

I

ImportArcs - Class in org.archive.access.nutch
Ingests ARCs, writing each ARC record parse as Nutch FetcherOutputFormat.
ImportArcs() - Constructor for class org.archive.access.nutch.ImportArcs
 
ImportArcs(Configuration) - Constructor for class org.archive.access.nutch.ImportArcs
 
importArcs(Path, Path, String) - Method in class org.archive.access.nutch.ImportArcs
 
ImportArcs.WaxFetcherOutputFormat - Class in org.archive.access.nutch
Override of nutch FetcherOutputFormat so I can substitute my own ParseOutputFormat, ImportArcs.WaxParseOutputFormat.
ImportArcs.WaxFetcherOutputFormat() - Constructor for class org.archive.access.nutch.ImportArcs.WaxFetcherOutputFormat
 
ImportArcs.WaxParseOutputFormat - Class in org.archive.access.nutch
Copy so I can add collection prefix to produced signature and link CrawlDatums.
ImportArcs.WaxParseOutputFormat() - Constructor for class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
 
ImportLogsReporter - Class in org.archive.access.nutch.jobs
Makes a report based on the passed log inputs.
ImportLogsReporter() - Constructor for class org.archive.access.nutch.jobs.ImportLogsReporter
 
index(Path, Path, Path, Path[]) - Method in class org.archive.access.nutch.NutchwaxIndexer
 
init(ServletConfig) - Method in class org.archive.access.nutch.NutchwaxOpenSearchServlet
 
invert(Path, Path[], boolean, boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxLinkDb
 
isERROR(String) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
isIndex(ARCRecord) - Method in class org.archive.access.nutch.ImportArcs
 
isWARN(String) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 

L

LOG - Variable in class org.archive.access.nutch.ImportArcs
 
LOG - Variable in class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
 
LOG - Variable in class org.archive.access.nutch.mapred.TaskLogMapRunner
 
LOG - Static variable in class org.archive.access.nutch.Nutchwax
 
LOG - Static variable in class org.archive.access.nutch.NutchwaxCrawlDb
 
LOG - Variable in class org.archive.access.nutch.NutchwaxIndexer
 

M

main(String[]) - Static method in class org.archive.access.nutch.ImportArcs
 
main(String[]) - Static method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
main(String[]) - Static method in class org.archive.access.nutch.mapred.TaskLogReader
For testing the TaskLog Reader.
main(String[]) - Static method in class org.archive.access.nutch.Nutchwax
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxBean
For debugging.
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxCrawlDb
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxDistributedSearch.Server
Use to start org.apache.nutch.searcher.DistributedSearch$Server but with nutchwax configuration mixed in so nutchwax plugins can be found (and properly configured).
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxIndexer
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxLinkDb
 
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxLinkDbMerger
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.ImportArcs
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxCrawlDbFilter
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxLinkDb
 
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxLinkDbFilter
 
merge(Path, Path[], boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxLinkDbMerger
 

N

Nutchwax - Class in org.archive.access.nutch
Script to run all indexing jobs from index through merge of final index.
Nutchwax() - Constructor for class org.archive.access.nutch.Nutchwax
Default constructor.
Nutchwax.OutputDirectories - Class in org.archive.access.nutch
 
Nutchwax.OutputDirectories(Path) - Constructor for class org.archive.access.nutch.Nutchwax.OutputDirectories
 
NutchwaxBean - Class in org.archive.access.nutch
Proxy that allows us to intercept getSummary so we can change the key used.
NutchwaxBean(Configuration, Path) - Constructor for class org.archive.access.nutch.NutchwaxBean
 
NutchwaxBean(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxBean
 
NutchwaxConfiguration - Class in org.archive.access.nutch
Configuration that adds NutchWAX configuration to base nutch and hadoop config.
NutchwaxCrawlDb - Class in org.archive.access.nutch
Adds setting of the NutchwaxCrawlDbFilter.
NutchwaxCrawlDb() - Constructor for class org.archive.access.nutch.NutchwaxCrawlDb
 
NutchwaxCrawlDb(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxCrawlDb
 
NutchwaxCrawlDbFilter - Class in org.archive.access.nutch
Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, put the collection back).
NutchwaxCrawlDbFilter() - Constructor for class org.archive.access.nutch.NutchwaxCrawlDbFilter
 
NutchwaxDistributedSearch - Class in org.archive.access.nutch
Script to start up a Nutchwax Distributed Searcher.
NutchwaxDistributedSearch() - Constructor for class org.archive.access.nutch.NutchwaxDistributedSearch
 
NutchwaxDistributedSearch.Server - Class in org.archive.access.nutch
 
NutchwaxIndexer - Class in org.archive.access.nutch
Subclass of nutch Indexer that handles keys that are not just URLs.
NutchwaxIndexer() - Constructor for class org.archive.access.nutch.NutchwaxIndexer
 
NutchwaxIndexer(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxIndexer
 
NutchwaxLinkDb - Class in org.archive.access.nutch
Subclass of nutch indexer that writes out LinkDb keys that include the collection name.
NutchwaxLinkDb() - Constructor for class org.archive.access.nutch.NutchwaxLinkDb
 
NutchwaxLinkDb(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxLinkDb
Construct a LinkDb.
NutchwaxLinkDbFilter - Class in org.archive.access.nutch
Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, put the collection back).
NutchwaxLinkDbFilter() - Constructor for class org.archive.access.nutch.NutchwaxLinkDbFilter
 
NutchwaxLinkDbMerger - Class in org.archive.access.nutch
Wrapper around LinkDbMerger.
NutchwaxLinkDbMerger() - Constructor for class org.archive.access.nutch.NutchwaxLinkDbMerger
 
NutchwaxOpenSearchServlet - Class in org.archive.access.nutch
Subclass of OpenSearchServlet from nutch.
NutchwaxOpenSearchServlet() - Constructor for class org.archive.access.nutch.NutchwaxOpenSearchServlet
 
NutchwaxQuery - Class in org.archive.access.nutch
Handle exacturl when present in queries.
NutchwaxQuery() - Constructor for class org.archive.access.nutch.NutchwaxQuery
 
NutchwaxTest - Class in org.archive.access.nutch
 
NutchwaxTest() - Constructor for class org.archive.access.nutch.NutchwaxTest
 

O

onARCClose() - Method in class org.archive.access.nutch.ImportArcs
 
onARCOpen() - Method in class org.archive.access.nutch.ImportArcs
 
org.archive.access.nutch - package org.archive.access.nutch
Provides mapreduce jobs to import ARCs, and plugins to add 'collection' and ARC repository location to the nutch index.
org.archive.access.nutch.jobs - package org.archive.access.nutch.jobs
 
org.archive.access.nutch.mapred - package org.archive.access.nutch.mapred
 

P

parse(String, Configuration) - Static method in class org.archive.access.nutch.NutchwaxQuery
Does fixup on the passed-in query before handing it to nutch.

R

read(byte[], int, int, long, long) - Method in class org.archive.access.nutch.mapred.TaskLogReader
Read user-log data given an offset/length.
reduce(WritableComparable, Iterator, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxIndexer
 
report(String, String) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
rewriteArgs(String[], int) - Method in class org.archive.access.nutch.Nutchwax
 
run(String[]) - Method in class org.archive.access.nutch.ImportArcs
 
run(String[]) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
 
run(RecordReader, OutputCollector, Reporter) - Method in class org.archive.access.nutch.mapred.TaskLogMapRunner
 

S

setConf(Configuration) - Method in class org.archive.access.nutch.ImportArcs
 
skip(String) - Method in class org.archive.access.nutch.ImportArcs
 

T

tail(byte[], int, int, long, int) - Method in class org.archive.access.nutch.mapred.TaskLogReader
Tail the user-log.
TaskLogMapRunner - Class in org.archive.access.nutch.mapred
Calls a map for every line in a hadoop userlog directory.
TaskLogMapRunner() - Constructor for class org.archive.access.nutch.mapred.TaskLogMapRunner
 
TaskLogReader - Class in org.archive.access.nutch.mapred
Bulk of below is a patched hadoop TaskLog$Reader that can read from URL streams.
TaskLogReader(URL) - Constructor for class org.archive.access.nutch.mapred.TaskLogReader
Create a new task log reader.
testGetCollectionFromWaxKey() - Method in class org.archive.access.nutch.NutchwaxTest
 

U

update(Path, Path[], boolean, boolean, boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxCrawlDb
 
usage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
 

W

WAX_COLLECTION_KEY - Static variable in class org.archive.access.nutch.ImportArcs
 

Copyright © 2005-2007 Internet Archive. All Rights Reserved.