A C D E F G I L M N O P R S T U W
A
ARCCOLLECTION_KEY - Static variable in class org.archive.access.nutch.ImportArcs
ARCFILENAME_KEY - Static variable in class org.archive.access.nutch.ImportArcs
ARCFILEOFFSET_KEY - Static variable in class org.archive.access.nutch.ImportArcs
C
checkArcsDir(Path) - Method in class org.archive.access.nutch.Nutchwax
Checks that the arcs dir exists and looks like it has files that list ARCs (rather than the ARCs themselves).
checkCollectionName() - Method in class org.archive.access.nutch.ImportArcs
checkMimetype(String) - Static method in class org.archive.access.nutch.ImportArcs
close() - Method in class org.archive.access.nutch.ImportArcs
close() - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
configure(JobConf) - Method in class org.archive.access.nutch.ImportArcs
configure(JobConf) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
configure(JobConf) - Method in class org.archive.access.nutch.mapred.TaskLogMapRunner
configure(JobConf) - Method in class org.archive.access.nutch.NutchwaxLinkDb
createLinkdb(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
D
doAll(Path, String, Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
Run passed list of mapreduce indexing jobs.
doAllUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doClass(String[]) - Method in class org.archive.access.nutch.Nutchwax
doClassUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doDedup(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
doDedupUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.archive.access.nutch.NutchwaxOpenSearchServlet
doImport(Path, String, Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
doImportUsage(String, int) - Static method in class org.archive.access.nutch.ImportArcs
doIndexing(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
doIndexing(Nutchwax.OutputDirectories, Path[]) - Method in class org.archive.access.nutch.Nutchwax
doIndexUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doInvert(Nutchwax.OutputDirectories, Path[]) - Method in class org.archive.access.nutch.Nutchwax
doInvert(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
doInvertUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doJob(String, String[]) - Method in class org.archive.access.nutch.Nutchwax
doLog(String, String, OutputCollector, Reporter) - Method in class org.archive.access.nutch.mapred.TaskLogMapRunner
doMerge(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
doMergeUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doSearch(String[]) - Method in class org.archive.access.nutch.Nutchwax
doSearchUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
doUpdate(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
doUpdate(Nutchwax.OutputDirectories, String[]) - Method in class org.archive.access.nutch.Nutchwax
doUpdateUsage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
E
encodeExacturl(String) - Static method in class org.archive.access.nutch.NutchwaxQuery
F
fetchAll() - Method in class org.archive.access.nutch.mapred.TaskLogReader
Return the entire user-log (remaining splits).
formatToOneLine(String) - Method in class org.archive.access.nutch.ImportArcs
G
generateWaxKey(WritableComparable, String) - Static method in class org.archive.access.nutch.Nutchwax
generateWaxKey(String, String) - Static method in class org.archive.access.nutch.Nutchwax
get(ServletContext, Configuration) - Static method in class org.archive.access.nutch.NutchwaxBean
getAnchors(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
getARCName(ARCRecordMetaData) - Method in class org.archive.access.nutch.ImportArcs
getCollectionFromArcname(String) - Static method in class org.archive.access.nutch.ImportArcs
getCollectionFromWaxKey(WritableComparable) - Static method in class org.archive.access.nutch.Nutchwax
getCollectionQualifiedHitDetails(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
TODO: Make it so I don't have to create a new HitDetails changing the key used doing lookup.
getConf() - Method in class org.archive.access.nutch.ImportArcs
getConfiguration() - Static method in class org.archive.access.nutch.NutchwaxConfiguration
getConfiguration(ServletContext) - Static method in class org.archive.access.nutch.NutchwaxConfiguration
getCrawlDb() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getDate(String) - Static method in class org.archive.access.nutch.Nutchwax
getFS() - Method in class org.archive.access.nutch.Nutchwax
getIndex() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getIndexes() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getInlinks(HitDetails) - Method in class org.archive.access.nutch.NutchwaxBean
getInputStream() - Method in class org.archive.access.nutch.mapred.TaskLogReader
getJobConf() - Method in class org.archive.access.nutch.Nutchwax
getLinkDb() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getMimetype(String, MimeTypes, String) - Method in class org.archive.access.nutch.ImportArcs
getOutput() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getParseRate(long, long) - Method in class org.archive.access.nutch.ImportArcs
getParseRateLogMessage(String, String, double) - Method in class org.archive.access.nutch.ImportArcs
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.archive.access.nutch.ImportArcs.WaxFetcherOutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
getSegments(Nutchwax.OutputDirectories) - Method in class org.archive.access.nutch.Nutchwax
getSegments() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getStatus(String, String, String, String) - Method in class org.archive.access.nutch.ImportArcs
getSummary(HitDetails[], Query) - Method in class org.archive.access.nutch.NutchwaxBean
getTmpDir() - Method in class org.archive.access.nutch.Nutchwax.OutputDirectories
getTotalLogSize() - Method in class org.archive.access.nutch.mapred.TaskLogReader
Return the total 'logical' log-size written by the task, including purged data.
getUrlFromWaxKey(WritableComparable) - Static method in class org.archive.access.nutch.Nutchwax
I
ImportArcs - Class in org.archive.access.nutch
Ingests ARCs, writing the parse of each ARC record as Nutch FetcherOutputFormat.
ImportArcs() - Constructor for class org.archive.access.nutch.ImportArcs
ImportArcs(Configuration) - Constructor for class org.archive.access.nutch.ImportArcs
importArcs(Path, Path, String) - Method in class org.archive.access.nutch.ImportArcs
ImportArcs.WaxFetcherOutputFormat - Class in org.archive.access.nutch
Override of nutch FetcherOutputFormat so I can substitute my own ParseOutputFormat, ImportArcs.WaxParseOutputFormat.
ImportArcs.WaxFetcherOutputFormat() - Constructor for class org.archive.access.nutch.ImportArcs.WaxFetcherOutputFormat
ImportArcs.WaxParseOutputFormat - Class in org.archive.access.nutch
Copy so I can add collection prefix to produced signature and link CrawlDatums.
ImportArcs.WaxParseOutputFormat() - Constructor for class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
ImportLogsReporter - Class in org.archive.access.nutch.jobs
Makes a report based on the passed log inputs.
ImportLogsReporter() - Constructor for class org.archive.access.nutch.jobs.ImportLogsReporter
index(Path, Path, Path, Path[]) - Method in class org.archive.access.nutch.NutchwaxIndexer
init(ServletConfig) - Method in class org.archive.access.nutch.NutchwaxOpenSearchServlet
invert(Path, Path[], boolean, boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxLinkDb
isERROR(String) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
isIndex(ARCRecord) - Method in class org.archive.access.nutch.ImportArcs
isWARN(String) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
L
LOG - Variable in class org.archive.access.nutch.ImportArcs
LOG - Variable in class org.archive.access.nutch.ImportArcs.WaxParseOutputFormat
LOG - Variable in class org.archive.access.nutch.mapred.TaskLogMapRunner
LOG - Static variable in class org.archive.access.nutch.Nutchwax
LOG - Static variable in class org.archive.access.nutch.NutchwaxCrawlDb
LOG - Variable in class org.archive.access.nutch.NutchwaxIndexer
M
main(String[]) - Static method in class org.archive.access.nutch.ImportArcs
main(String[]) - Static method in class org.archive.access.nutch.jobs.ImportLogsReporter
main(String[]) - Static method in class org.archive.access.nutch.mapred.TaskLogReader
For testing the TaskLog Reader.
main(String[]) - Static method in class org.archive.access.nutch.Nutchwax
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxBean
For debugging.
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxCrawlDb
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxDistributedSearch.Server
Use to start org.apache.nutch.searcher.DistributedSearch$Server but with nutchwax configuration mixed in so nutchwax plugins can be found (and properly configured).
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxIndexer
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxLinkDb
main(String[]) - Static method in class org.archive.access.nutch.NutchwaxLinkDbMerger
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.ImportArcs
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxCrawlDbFilter
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxLinkDb
map(WritableComparable, Writable, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxLinkDbFilter
merge(Path, Path[], boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxLinkDbMerger
N
Nutchwax - Class in org.archive.access.nutch
Script to run all indexing jobs from index through merge of final index.
Nutchwax() - Constructor for class org.archive.access.nutch.Nutchwax
Default constructor.
Nutchwax.OutputDirectories - Class in org.archive.access.nutch
Nutchwax.OutputDirectories(Path) - Constructor for class org.archive.access.nutch.Nutchwax.OutputDirectories
NutchwaxBean - Class in org.archive.access.nutch
Proxy that allows us to intercept getSummary so we can change the key used.
NutchwaxBean(Configuration, Path) - Constructor for class org.archive.access.nutch.NutchwaxBean
NutchwaxBean(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxBean
NutchwaxConfiguration - Class in org.archive.access.nutch
Configuration that adds NutchWAX configuration to base nutch and hadoop config.
NutchwaxCrawlDb - Class in org.archive.access.nutch
Adds setting of the NutchwaxCrawlDbFilter.
NutchwaxCrawlDb() - Constructor for class org.archive.access.nutch.NutchwaxCrawlDb
NutchwaxCrawlDb(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxCrawlDb
NutchwaxCrawlDbFilter - Class in org.archive.access.nutch
Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, putting the collection back).
NutchwaxCrawlDbFilter() - Constructor for class org.archive.access.nutch.NutchwaxCrawlDbFilter
NutchwaxDistributedSearch - Class in org.archive.access.nutch
Script to start up a Nutchwax Distributed Searcher.
NutchwaxDistributedSearch() - Constructor for class org.archive.access.nutch.NutchwaxDistributedSearch
NutchwaxDistributedSearch.Server - Class in org.archive.access.nutch
NutchwaxIndexer - Class in org.archive.access.nutch
Subclass of nutch Indexer that handles keys that are not just URLs.
NutchwaxIndexer() - Constructor for class org.archive.access.nutch.NutchwaxIndexer
NutchwaxIndexer(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxIndexer
NutchwaxLinkDb - Class in org.archive.access.nutch
Subclass of nutch LinkDb that writes out LinkDb keys that include the collection name.
NutchwaxLinkDb() - Constructor for class org.archive.access.nutch.NutchwaxLinkDb
NutchwaxLinkDb(Configuration) - Constructor for class org.archive.access.nutch.NutchwaxLinkDb
Construct a LinkDb.
NutchwaxLinkDbFilter - Class in org.archive.access.nutch
Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, putting the collection back).
NutchwaxLinkDbFilter() - Constructor for class org.archive.access.nutch.NutchwaxLinkDbFilter
NutchwaxLinkDbMerger - Class in org.archive.access.nutch
Wrapper around LinkDbMerger.
NutchwaxLinkDbMerger() - Constructor for class org.archive.access.nutch.NutchwaxLinkDbMerger
NutchwaxOpenSearchServlet - Class in org.archive.access.nutch
Subclass of OpenSearchServlet from nutch.
NutchwaxOpenSearchServlet() - Constructor for class org.archive.access.nutch.NutchwaxOpenSearchServlet
NutchwaxQuery - Class in org.archive.access.nutch
Handle exacturl when present in queries.
NutchwaxQuery() - Constructor for class org.archive.access.nutch.NutchwaxQuery
NutchwaxTest - Class in org.archive.access.nutch
NutchwaxTest() - Constructor for class org.archive.access.nutch.NutchwaxTest
O
onARCClose() - Method in class org.archive.access.nutch.ImportArcs
onARCOpen() - Method in class org.archive.access.nutch.ImportArcs
org.archive.access.nutch - package org.archive.access.nutch
Provides mapreduce jobs to import ARCs, and plugins to add 'collection' and ARC repository location to the nutch index.
org.archive.access.nutch.jobs - package org.archive.access.nutch.jobs
org.archive.access.nutch.mapred - package org.archive.access.nutch.mapred
P
parse(String, Configuration) - Static method in class org.archive.access.nutch.NutchwaxQuery
Does fixup on the passed-in query before handing it to nutch.
R
read(byte[], int, int, long, long) - Method in class org.archive.access.nutch.mapred.TaskLogReader
Read user-log data given an offset/length.
reduce(WritableComparable, Iterator, OutputCollector, Reporter) - Method in class org.archive.access.nutch.NutchwaxIndexer
report(String, String) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
rewriteArgs(String[], int) - Method in class org.archive.access.nutch.Nutchwax
run(String[]) - Method in class org.archive.access.nutch.ImportArcs
run(String[]) - Method in class org.archive.access.nutch.jobs.ImportLogsReporter
run(RecordReader, OutputCollector, Reporter) - Method in class org.archive.access.nutch.mapred.TaskLogMapRunner
S
setConf(Configuration) - Method in class org.archive.access.nutch.ImportArcs
skip(String) - Method in class org.archive.access.nutch.ImportArcs
T
tail(byte[], int, int, long, int) - Method in class org.archive.access.nutch.mapred.TaskLogReader
Tail the user-log.
TaskLogMapRunner - Class in org.archive.access.nutch.mapred
Calls a map for every line in a hadoop userlog directory.
TaskLogMapRunner() - Constructor for class org.archive.access.nutch.mapred.TaskLogMapRunner
TaskLogReader - Class in org.archive.access.nutch.mapred
Bulk of below is a patched hadoop TaskLog$Reader that can read from URL streams.
TaskLogReader(URL) - Constructor for class org.archive.access.nutch.mapred.TaskLogReader
Create a new task log reader.
testGetCollectionFromWaxKey() - Method in class org.archive.access.nutch.NutchwaxTest
U
update(Path, Path[], boolean, boolean, boolean, boolean) - Method in class org.archive.access.nutch.NutchwaxCrawlDb
usage(String, int) - Static method in class org.archive.access.nutch.Nutchwax
W
WAX_COLLECTION_KEY - Static variable in class org.archive.access.nutch.ImportArcs
Copyright © 2005-2007 Internet Archive. All Rights Reserved.