java.lang.Object
  org.apache.hadoop.conf.Configured
    org.archive.wayback.hadoop.CDXSort
public class CDXSort extends org.apache.hadoop.conf.Configured implements org.apache.hadoop.util.Tool
| Nested Class Summary | |
|---|---|
| static class | CDXSort.CDXCanonicalizerMapClass: Mapper which reads an identity CDX line, outputting: key = canonicalized original URL + timestamp; val = everything else. |
| static class | CDXSort.CDXMapClass: Mapper which reads a canonicalized CDX line, splitting it into: key = URL + timestamp; val = everything else. |
| static class | CDXSort.DeReffingCDXCanonicalizerMapClass |
| static class | CDXSort.FunkyCDXCanonicalizerMapClass: Mapper which reads an identity Funky format CDX line, outputting: key = canonicalized original URL + timestamp; val = everything else. Input lines are a hybrid format: ORIG_URL DATE '-' (literal) MIME HTTP_CODE SHA1 REDIRECT START_OFFSET ARC_PREFIX (sans .arc.gz) ROBOT_FLAG (combination of A, I, F meaning no Archive, no Index, no Follow, or '-' if none). Example: http://www.myow.de:80/news_show.php? 20061126032815 - text/html 200 DVKFPTOJGCLT3G5GUVLCETHLFO3222JM - 91098929 foo A |
| static class | CDXSort.FunkyDeReffingCDXCanonicalizerMapClass |
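The key/value split described for CDXSort.FunkyCDXCanonicalizerMapClass can be sketched in plain Java, without any Hadoop types. This is an illustration under stated assumptions, not the class's actual code: the class name FunkyCdxLineSketch, the methods toKeyValue and canonicalize, the single-space separator inside the key, and the canonicalization rules are all hypothetical. The real mapper presumably uses Wayback's own URL canonicalizer, which is not documented on this page.

```java
import java.util.Arrays;
import java.util.Locale;

public class FunkyCdxLineSketch {

    /** The Funky format described above has ten whitespace-separated fields. */
    static final int FIELD_COUNT = 10;

    /**
     * Placeholder canonicalizer (an assumption, not Wayback's real one):
     * lowercases the URL, drops a leading "http://" and a default ":80" port.
     */
    static String canonicalize(String url) {
        String s = url.toLowerCase(Locale.ROOT);
        if (s.startsWith("http://")) {
            s = s.substring("http://".length());
        }
        // Remove ":80" only when it directly follows the host.
        return s.replaceFirst("^([^/]+):80(/|$)", "$1$2");
    }

    /**
     * Splits one Funky format line into {key, value}:
     * key   = canonicalized ORIG_URL + " " + DATE (separator is an assumption)
     * value = the remaining eight fields, unchanged.
     */
    static String[] toKeyValue(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != FIELD_COUNT) {
            throw new IllegalArgumentException(
                "expected " + FIELD_COUNT + " fields, got " + fields.length);
        }
        String key = canonicalize(fields[0]) + " " + fields[1];
        String value = String.join(" ",
            Arrays.copyOfRange(fields, 2, fields.length));
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        // The example line from the class description.
        String line = "http://www.myow.de:80/news_show.php? 20061126032815 - "
            + "text/html 200 DVKFPTOJGCLT3G5GUVLCETHLFO3222JM - 91098929 foo A";
        String[] kv = toKeyValue(line);
        System.out.println("key: " + kv[0]);
        System.out.println("val: " + kv[1]);
    }
}
```

Because the timestamp is part of the key, a sort on keys would group captures of the same canonical URL together in date order, which appears to be the purpose of the split.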
| Constructor Summary | |
|---|---|
| CDXSort() | |
| Method Summary | |
|---|---|
| org.apache.hadoop.mapred.RunningJob | getResult(): Get the last job that was run using this instance. |
| static void | main(String[] args) |
| int | run(String[] args): The main driver for the sort program. |
| Methods inherited from class org.apache.hadoop.conf.Configured |
|---|
| getConf, setConf |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface org.apache.hadoop.conf.Configurable |
|---|
| getConf, setConf |
| Constructor Detail |
|---|
public CDXSort()
| Method Detail |
|---|
public int run(String[] args)
throws Exception
Specified by: run in interface org.apache.hadoop.util.Tool
Throws: IOException - When there are communication problems with the job tracker.
Throws: Exception

public static void main(String[] args)
throws Exception
Throws: Exception

public org.apache.hadoop.mapred.RunningJob getResult()