java - NullPointerException from Hadoop's JobSplitWriter / SerializationFactory when calling InputSplit's getClass()

Question

I'm getting a NullPointerException when launching a MapReduce job. It is thrown by SerializationFactory's getSerializer() method. I'm using a custom InputSplit, InputFormat, RecordReader, and MapReduce value class.

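I know the error is thrown some time after my InputFormat class creates the splits, but before the RecordReader is created. As far as I can tell, it occurs directly after the "Cleaning up the staging area" message.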

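From checking the Hadoop source at the places indicated by the stack trace, it looks like the error occurs when getSerialization() receives a null Class<T> pointer. The JobClient's writeNewSplits() calls that method like this: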

Serializer<T> serializer = factory.getSerializer((Class<T>) split.getClass());

So I assume that when getClass() is called on my custom InputSplit objects, it returns a null pointer, but that is just baffling. Any ideas?

The full stack trace from the error follows:

12/06/24 14:26:49 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/tmp/hadoop-s3cur3/mapred/staging/s3cur3/.staging/job_201206240915_0035
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeNewSplits(JobSplitWriter.java:123)
    at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:74)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:968)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at edu.cs.illinois.cogcomp.hadoopinterface.infrastructure.CuratorJob.start(CuratorJob.java:94)
    at edu.cs.illinois.cogcomp.hadoopinterface.HadoopInterface.main(HadoopInterface.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Thanks!

EDIT: My code for the custom InputSplit follows:

import . . .

/**
 * A document directory within the input directory. 
 * Returned by DirectoryInputFormat.getSplits()
 * and passed to DirectoryInputFormat.createRecordReader().
 *
 * Represents the data to be processed by an individual Map process.
 */
public class DirectorySplit extends InputSplit {
    /**
     * Constructs a DirectorySplit object
     * @param docDirectoryInHDFS The location (in HDFS) of this
     *            document's directory, complete with all annotations.
     * @param fs The filesystem associated with this job
     */
    public  DirectorySplit( Path docDirectoryInHDFS, FileSystem fs )
            throws IOException {
        this.inputPath = docDirectoryInHDFS;
        hash = FileSystemHandler.getFileNameFromPath(inputPath);
        this.fs = fs;
    }

    /**
     * Get the size of the split so that the input splits can be sorted by size.
     * Here, we calculate the size to be the number of bytes in the original
     * document (i.e., ignoring all annotations).
     *
     * @return The number of characters in the original document
     */
    @Override
    public long getLength() throws IOException, InterruptedException {
        Path origTxt = new Path( inputPath, "original.txt" );
        HadoopInterface.logger.log( msg );
        return FileSystemHandler.getFileSizeInBytes( origTxt, fs);
    }

    /**
     * Get the list of nodes where the data for this split would be local.
     * This list includes all nodes that contain any of the required data---it's
     * up to Hadoop to decide which one to use.
     *
     * @return An array of the nodes for whom the split is local
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        FileStatus status = fs.getFileStatus(inputPath);

        BlockLocation[] blockLocs = fs.getFileBlockLocations( status, 0,
                                                              status.getLen() );

        HashSet<String> allBlockHosts = new HashSet<String>();
        for( BlockLocation blockLoc : blockLocs ) {
            allBlockHosts.addAll( Arrays.asList( blockLoc.getHosts() ) );
        }

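        // toArray() with no argument returns Object[], which cannot be cast
        // to String[]; pass a typed array instead.
        return allBlockHosts.toArray( new String[allBlockHosts.size()] );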
    }

    /**
     * @return The hash of the document that this split handles
     */
    public String toString() {
        return hash;
    }

    private Path inputPath;
    private String hash;
    private FileSystem fs;
}

score 5 · Accepted Answer


InputSplit does not extend Writable; you need to explicitly declare that your input split implements Writable.
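For context, getClass() never returns null. As far as I can tell from the Hadoop 1.x source, the null comes from the serialization lookup: the default WritableSerialization only accepts classes that implement Writable, so no serialization matches the split class and getSerializer() ends up dereferencing a null result. Below is a minimal sketch of what implementing the interface involves. It makes several assumptions: the Writable interface here is a local stand-in that mirrors the two methods of org.apache.hadoop.io.Writable so that the example runs without Hadoop on the classpath, DirectorySplitSketch is a hypothetical reduced version of DirectorySplit with the path stored as a string, and WritableSplitDemo is only a test harness. The FileSystem field is left out because it cannot be written to the split file; it would presumably have to be re-obtained on the task side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable, so the
// sketch compiles without Hadoop. In the real job, import Hadoop's interface
// instead of declaring this one.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical reduced version of DirectorySplit. The real class would be
// declared as: public class DirectorySplit extends InputSplit implements Writable
class DirectorySplitSketch implements Writable {
    private String inputPath = ""; // the split's directory, kept as a string
    private String hash = "";

    // A no-argument constructor is needed: the split is instantiated
    // reflectively on the task side and then filled in via readFields().
    DirectorySplitSketch() {
    }

    DirectorySplitSketch(String inputPath, String hash) {
        this.inputPath = inputPath;
        this.hash = hash;
    }

    // Write every field that the task side needs, in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(inputPath);
        out.writeUTF(hash);
    }

    // Read the fields back in exactly the order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        inputPath = in.readUTF();
        hash = in.readUTF();
    }

    String getInputPath() {
        return inputPath;
    }

    @Override
    public String toString() {
        return hash;
    }
}

class WritableSplitDemo {
    // Serializes a split to bytes and rebuilds it, roughly what happens
    // between job submission and the start of a map task.
    static DirectorySplitSketch roundTrip(DirectorySplitSketch split) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bytes));

        DirectorySplitSketch copy = new DirectorySplitSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        DirectorySplitSketch copy =
                roundTrip(new DirectorySplitSketch("/input/doc-0001", "doc-0001"));
        System.out.println(copy.getInputPath() + " " + copy);
    }
}
```

The path and hash values in main() are made up for illustration. The point of the round trip is that everything a map task needs from the split has to pass through write() and readFields(); fields that are not written come back at their default values.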

Answered 2012-06-25T14:32:26.313
