org.archive.access.nutch
Class NutchwaxLinkDb

java.lang.Object
  extended by org.apache.hadoop.util.ToolBase
      extended by org.apache.nutch.crawl.LinkDb
          extended by org.archive.access.nutch.NutchwaxLinkDb
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper, org.apache.hadoop.mapred.Reducer, org.apache.hadoop.util.Tool

public class NutchwaxLinkDb
extends org.apache.nutch.crawl.LinkDb

Subclass of the Nutch LinkDb that writes out LinkDb keys that include the collection name. The bulk of the code is copied from LinkDb, because LinkDb is not amenable to subclassing.
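The idea of a key that carries the collection name can be illustrated with a small, self-contained sketch. This is not the NutchWAX key format: the class name CollectionKeySketch, its methods, and the single-space separator are all hypothetical, chosen only to show how the same URL can be kept distinct per collection. It assumes collection names contain no spaces.

```java
public class CollectionKeySketch {
    // Hypothetical separator; the actual NutchWAX key layout is not stated on this page.
    public static final char SEPARATOR = ' ';

    // Builds a key of the form "<collection><SEPARATOR><url>".
    public static String collectionKey(String collection, String url) {
        if (collection == null || collection.length() == 0) {
            throw new IllegalArgumentException("collection name is required");
        }
        return collection + SEPARATOR + url;
    }

    // Returns the collection part of a key built by collectionKey.
    public static String collectionOf(String key) {
        return key.substring(0, separatorIndex(key));
    }

    // Returns the URL part of a key built by collectionKey.
    public static String urlOf(String key) {
        return key.substring(separatorIndex(key) + 1);
    }

    // Locates the first separator, rejecting keys that have none.
    private static int separatorIndex(String key) {
        int i = key.indexOf(SEPARATOR);
        if (i < 0) {
            throw new IllegalArgumentException("no collection in key: " + key);
        }
        return i;
    }

    public static void main(String[] args) {
        String key = collectionKey("news2006", "http://example.org/a.html");
        System.out.println(collectionOf(key) + " | " + urlOf(key));
    }
}
```

With keys of this kind, the same URL crawled into two collections yields two distinct entries rather than one merged entry.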

Author:
stack

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.nutch.crawl.LinkDb
org.apache.nutch.crawl.LinkDb.Merger
 
Field Summary
 
Fields inherited from class org.apache.nutch.crawl.LinkDb
CURRENT_NAME, LOCK_NAME, LOG
 
Fields inherited from class org.apache.hadoop.util.ToolBase
conf
 
Constructor Summary
NutchwaxLinkDb()
           
NutchwaxLinkDb(org.apache.hadoop.conf.Configuration conf)
          Constructs a NutchwaxLinkDb with the given configuration.
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf job)
           
 void invert(org.apache.hadoop.fs.Path linkDb, org.apache.hadoop.fs.Path[] segments, boolean normalize, boolean filter, boolean force)
           
static void main(java.lang.String[] args)
           
 void map(org.apache.hadoop.io.WritableComparable key, org.apache.hadoop.io.Writable value, org.apache.hadoop.mapred.OutputCollector output, org.apache.hadoop.mapred.Reporter reporter)
           
 
Methods inherited from class org.apache.nutch.crawl.LinkDb
close, createMergeJob, install, invert, reduce, run
 
Methods inherited from class org.apache.hadoop.util.ToolBase
doMain, getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NutchwaxLinkDb

public NutchwaxLinkDb()

NutchwaxLinkDb

public NutchwaxLinkDb(org.apache.hadoop.conf.Configuration conf)
Constructs a NutchwaxLinkDb with the given configuration.

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class org.apache.nutch.crawl.LinkDb

map

public void map(org.apache.hadoop.io.WritableComparable key,
                org.apache.hadoop.io.Writable value,
                org.apache.hadoop.mapred.OutputCollector output,
                org.apache.hadoop.mapred.Reporter reporter)
         throws java.io.IOException
Specified by:
map in interface org.apache.hadoop.mapred.Mapper
Overrides:
map in class org.apache.nutch.crawl.LinkDb
Throws:
java.io.IOException

invert

public void invert(org.apache.hadoop.fs.Path linkDb,
                   org.apache.hadoop.fs.Path[] segments,
                   boolean normalize,
                   boolean filter,
                   boolean force)
            throws java.io.IOException
Overrides:
invert in class org.apache.nutch.crawl.LinkDb
Throws:
java.io.IOException
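As background for invert, the following is a minimal plain-Java sketch of what link inversion means: turning a map of page to outlinks into a map of URL to inlinks. It is an illustration only and does not reproduce this method. The class LinkInversionSketch and its invert signature are hypothetical, it uses in-memory maps rather than Hadoop jobs over segment paths, and it leaves out the normalize, filter, and force options as well as the collection name that this class adds to each key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LinkInversionSketch {
    // Inverts "page -> pages it links to" into "page -> pages that link to it".
    public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        // TreeMap keeps the result sorted by target URL for predictable output.
        Map<String, List<String>> inlinks = new TreeMap<String, List<String>>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String from = entry.getKey();
            for (String to : entry.getValue()) {
                List<String> sources = inlinks.get(to);
                if (sources == null) {
                    sources = new ArrayList<String>();
                    inlinks.put(to, sources);
                }
                sources.add(from);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        // Example data only; these URLs are placeholders.
        Map<String, List<String>> outlinks = new LinkedHashMap<String, List<String>>();
        outlinks.put("http://a.example/", Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", Arrays.asList("http://c.example/"));
        System.out.println(invert(outlinks));
    }
}
```

In a MapReduce setting the inner loop corresponds roughly to the map step (emit one record per target URL) and the grouping into lists to the reduce step.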

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception


Copyright © 2005-2007 Internet Archive. All Rights Reserved.