| Class Summary | |
|---|---|
| ImportArcs | Ingests ARCs, writing each ARC record parse as Nutch FetcherOutputFormat. |
| ImportArcs.WaxFetcherOutputFormat | Override of nutch FetcherOutputFormat so I can substitute my own ParseOutputFormat, ImportArcs.WaxParseOutputFormat. |
| ImportArcs.WaxParseOutputFormat | Copy so I can add collection prefix to produced signature and link CrawlDatums. |
| Nutchwax | Script to run all indexing jobs from index through merge of final index. |
| NutchwaxBean | Proxy that allows us to intercept getSummary so we can change the key used. |
| NutchwaxConfiguration | Configuration that adds NutchWAX configuration to base nutch and hadoop config. |
| NutchwaxCrawlDb | Adds setting of the NutchwaxCrawlDbFilter. |
| NutchwaxCrawlDbFilter | Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, putting the collection back). |
| NutchwaxDistributedSearch | Script to start up a Nutchwax Distributed Searcher. |
| NutchwaxDistributedSearch.Server | |
| NutchwaxIndexer | Subclass of nutch Indexer that handles keys that are not just URLs. |
| NutchwaxLinkDb | Subclass of nutch indexer that writes out LinkDb keys that include the collection name. |
| NutchwaxLinkDbFilter | Override so we can meddle with the key passed to the superclass, stripping the collection (then, when the super's mapper is done, putting the collection back). |
| NutchwaxLinkDbMerger | Wrapper around LinkDbMerger. |
| NutchwaxOpenSearchServlet | Subclass of OpenSearchServlet from nutch. |
| NutchwaxQuery | Handle exacturl when present in queries. |
| NutchwaxTest | |
Provides MapReduce jobs to import ARCs, and plugins to add 'collection' and ARC repository location to the nutch index.
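
Several classes above (NutchwaxCrawlDbFilter, NutchwaxLinkDbFilter) describe stripping the collection from a key before handing it to the superclass and restoring it afterwards. The following is a minimal, self-contained sketch of that idea only. It does not use the Nutch or NutchWAX APIs, and the class name `CollectionKeySketch`, the space separator, and the `superFilter` parameter are all hypothetical stand-ins; the real key layout may differ.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class CollectionKeySketch {
    // Hypothetical separator between collection name and URL (an assumption).
    static final char SEP = ' ';

    // Build a composite key: "<collection><SEP><url>".
    static String addCollection(String collection, String url) {
        return collection + SEP + url;
    }

    // Collection part of a composite key, or "" if the key has no separator.
    static String collectionOf(String key) {
        int i = key.indexOf(SEP);
        return i < 0 ? "" : key.substring(0, i);
    }

    // URL part of a composite key, or the whole key if it has no separator.
    static String urlOf(String key) {
        int i = key.indexOf(SEP);
        return i < 0 ? key : key.substring(i + 1);
    }

    // Strip the collection, let the URL-only logic (standing in for the
    // superclass) run, then put the collection back on the result.
    static String filterKey(String key, UnaryOperator<String> superFilter) {
        String collection = collectionOf(key);
        String filtered = superFilter.apply(urlOf(key));
        if (filtered == null) {
            return null; // the URL-only logic rejected this entry
        }
        return collection.isEmpty() ? filtered : addCollection(collection, filtered);
    }

    public static void main(String[] args) {
        // Stand-in for a URL normalizer that knows nothing about collections.
        UnaryOperator<String> lower = u -> u.toLowerCase(Locale.ROOT);
        System.out.println(filterKey("mycoll HTTP://Example.org/", lower));
    }
}
```

The point of the wrapper shape is that the URL-only logic stays unaware of collections, so it can be reused unchanged while keys still carry the collection name.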