org.archive.access.nutch.mapred
Class TaskLogReader

java.lang.Object
  extended by org.archive.access.nutch.mapred.TaskLogReader

public class TaskLogReader
extends java.lang.Object

The bulk of this class is a patched Hadoop TaskLog$Reader that can read from URL streams. Issue '[1637951] [nutchwax] Redo reporting scripts as mapreduce jobs' has a patch to contribute back to Hadoop if this class works as a basis for pulling logs from across the cluster.

Author:
stack

Constructor Summary
TaskLogReader(java.net.URL u)
          Create a new task log reader.
 
Method Summary
 byte[] fetchAll()
          Return the entire user-log (remaining splits).
 java.io.InputStream getInputStream()
           
 long getTotalLogSize()
          Return the total 'logical' log-size written by the task, including purged data.
static void main(java.lang.String[] args)
          For testing the TaskLog Reader.
 int read(byte[] b, int off, int len, long logOffset, long logLength)
          Read user-log data given an offset/length.
 int tail(byte[] b, int off, int len, long tailSize, int tailWindow)
          Tail the user-log.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TaskLogReader

public TaskLogReader(java.net.URL u)
              throws java.io.IOException
Create a new task log reader.

Parameters:
u - URL that includes the task id and the log filter, e.g. file:///hadoop/logs/userlogs/task0001_m_00000_0/stdout/ (the filter to apply on userlogs is given as part of the URL; the constructor takes no separate filter argument).
Throws:
java.io.IOException
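
The URL shape described for the constructor can be sketched as follows. This is a minimal illustration, not part of the class: the helper name taskLogUrl and the directory layout (userlogs directory, then task id, then filter name) are assumptions taken from the single example URL in the parameter description, and the TaskLogReader call is left as a comment so the sketch compiles without the class on the classpath.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class TaskLogUrlExample {
    // Hypothetical helper: joins an absolute userlogs directory, a task id,
    // and a filter name (for example "stdout") into a file: URL shaped like
    // the example in the parameter description. The layout is an assumption.
    static URL taskLogUrl(String userlogsDir, String taskId, String filter)
            throws MalformedURLException {
        String dir = userlogsDir.endsWith("/") ? userlogsDir : userlogsDir + "/";
        return URI.create("file://" + dir + taskId + "/" + filter + "/").toURL();
    }

    public static void main(String[] args) throws Exception {
        URL u = taskLogUrl("/hadoop/logs/userlogs", "task0001_m_00000_0", "stdout");
        System.out.println(u.getPath());
        // With the class available, the reader would then be created as:
        // TaskLogReader reader = new TaskLogReader(u);
    }
}
```

Going through java.net.URI first keeps the string handling explicit; a caller that already has a URL for a remote log server could pass that URL to the constructor directly.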
Method Detail

getTotalLogSize

public long getTotalLogSize()
                     throws java.io.IOException
Return the total 'logical' log-size written by the task, including purged data.

Returns:
the total 'logical' log-size written by the task, including purged data.
Throws:
java.io.IOException

getInputStream

public java.io.InputStream getInputStream()
                                   throws java.io.IOException
Throws:
java.io.IOException

fetchAll

public byte[] fetchAll()
                throws java.io.IOException
Return the entire user-log (remaining splits).

Returns:
a byte[] containing the data in the user-log.
Throws:
java.io.IOException
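
For callers that prefer to work from getInputStream() rather than fetchAll(), draining a stream into a byte[] might look like the sketch below. It uses only the standard library and is not the implementation of fetchAll(); the class name DrainStreamExample, the method drain, and the in-memory stream standing in for a real log stream are all illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DrainStreamExample {
    // Copies everything left in the stream into a byte[].
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        // InputStream.read returns -1 once the end of the stream is reached.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for reader.getInputStream(); a real caller would pass that stream.
        InputStream in = new ByteArrayInputStream(
                "task log line\n".getBytes(StandardCharsets.UTF_8));
        byte[] all = drain(in);
        System.out.println(all.length + " bytes read");
    }
}
```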

tail

public int tail(byte[] b,
                int off,
                int len,
                long tailSize,
                int tailWindow)
         throws java.io.IOException
Tail the user-log.

Parameters:
b - the buffer into which the data is read.
off - the start offset in array b at which the data is written.
len - the maximum number of bytes to read.
tailSize - the number of bytes to be read from the end of the file.
tailWindow - the sliding window for tailing the logs.
Returns:
the total number of bytes of user-log data read into the buffer.
Throws:
java.io.IOException

read

public int read(byte[] b,
                int off,
                int len,
                long logOffset,
                long logLength)
         throws java.io.IOException
Read user-log data given an offset/length.

Parameters:
b - the buffer into which the data is read.
off - the start offset in array b at which the data is written.
len - the maximum number of bytes to read.
logOffset - the offset of the user-log from which to get data.
logLength - the maximum number of bytes of user-log data to fetch.
Returns:
the total number of bytes of user-log data read into the buffer.
Throws:
java.io.IOException
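
The offset/length parameters of read lend themselves to paging through a log in fixed-size chunks. The sketch below shows one way a caller might do that. To stay self-contained it defines a hypothetical interface, OffsetLog, that mirrors the documented signatures of getTotalLogSize() and read(...), plus an in-memory stand-in. The stand-in's behavior (copy at most min(len, logLength) bytes and return the count, or 0 past the end) is an assumption about the semantics, not something this page confirms.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class PagedReadExample {
    // Hypothetical interface mirroring two documented signatures so the
    // sketch compiles without the real class.
    interface OffsetLog {
        long getTotalLogSize() throws IOException;
        int read(byte[] b, int off, int len, long logOffset, long logLength)
                throws IOException;
    }

    // In-memory stand-in. Assumed semantics: copy at most min(len, logLength)
    // bytes starting at logOffset and return the count, or 0 past the end.
    static OffsetLog inMemory(final byte[] data) {
        return new OffsetLog() {
            public long getTotalLogSize() { return data.length; }
            public int read(byte[] b, int off, int len, long logOffset, long logLength) {
                if (logOffset < 0 || logOffset >= data.length) return 0;
                int n = (int) Math.min(Math.min(len, logLength), data.length - logOffset);
                if (n <= 0) return 0;
                System.arraycopy(data, (int) logOffset, b, off, n);
                return n;
            }
        };
    }

    // Walks the log in pages of pageSize bytes and concatenates them.
    static byte[] readInPages(OffsetLog log, int pageSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] page = new byte[pageSize];
        long total = log.getTotalLogSize();
        long offset = 0;
        while (offset < total) {
            int n = log.read(page, 0, page.length, offset, page.length);
            if (n <= 0) break; // nothing more available at this offset
            out.write(page, 0, n);
            offset += n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] src = "abcdefghij".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readInPages(inMemory(src), 3);
        System.out.println(new String(copy, StandardCharsets.UTF_8));
    }
}
```

The early break matters because getTotalLogSize() is documented as a 'logical' size that includes purged data, so the loop cannot assume every offset below that size is still readable.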

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
For testing the TaskLog Reader.

Parameters:
args -
Throws:
java.io.IOException


Copyright © 2005-2007 Internet Archive. All Rights Reserved.