java - Converting from FSDataInputStream to FileInputStream

Question

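I'm new to Hadoop HDFS and quite rusty with Java, and I need some help. I'm trying to read a file from HDFS and calculate the MD5 hash of that file. The general Hadoop configuration is as follows.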

private FSDataInputStream hdfsDIS;
private FileInputStream FinputStream;
private FileSystem hdfs;
private Configuration myConfig;

myConfig.addResource("/HADOOP_HOME/conf/core-site.xml");
myConfig.addResource("/HADOOP_HOME/conf/hdfs-site.xml");

hdfs = FileSystem.get(new URI("hdfs://NodeName:54310"), myConfig);

hdfsDIS = hdfs.open(hdfsFilePath);

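The call hdfs.open(hdfsFilePath) returns an FSDataInputStream.

The problem is that I can only get an FSDataInputStream out of HDFS, but I'd like to get a FileInputStream out of it.

The code below performs the hashing part and is adapted from something I found somewhere on StackOverflow (I can't seem to find the link to it now).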

FileInputStream FinputStream = hdfsDIS;   // <---This is where the problem is
MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");  
        FileChannel channel = FinputStream.getChannel();
        ByteBuffer buff = ByteBuffer.allocate(2048);

        while(channel.read(buff) != -1){
            buff.flip();
            md.update(buff);
            buff.clear();
        }
        byte[] hashValue = md.digest();

        return toHex(hashValue);
    }
    catch (NoSuchAlgorithmException e){
        return null;
    } 
    catch (IOException e){
        return null;
    }

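The reason I need a FileInputStream is that the code that does the hashing uses a FileChannel, which supposedly increases the efficiency with which data is read from the file.

Could someone show me how to convert the FSDataInputStream into a FileInputStream?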

score 2 · Accepted Answer

Just use it as an InputStream:

MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");  
    byte[] buff = new byte[2048];
    int count;

    while((count = hdfsDIS.read(buff)) != -1){
        md.update(buff, 0, count);
    }
    byte[] hashValue = md.digest();

    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e){
    return null;
} 
catch (IOException e){
    return null;
}

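"The code that does the hashing uses a FileChannel, which supposedly increases the efficiency with which data is read from the file"

Not in this case. It only improves efficiency if you're just copying the data to another channel, and only if you use a DirectByteBuffer. If you're processing the data, as here, it makes no difference. A read is still a read.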
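To try the loop above outside a Hadoop cluster, here is a minimal sketch that hashes any InputStream. Because FSDataInputStream is a DataInputStream, the stream returned by hdfs.open(...) can be passed to it directly. The class name Md5Streams and this toHex implementation are assumptions for illustration; the question's own toHex helper is not shown in the post.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of Hadoop or of the original post.
final class Md5Streams {

    // Reads the stream to the end and returns its MD5 digest as lowercase hex.
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        byte[] buff = new byte[8192];
        int count;
        while ((count = in.read(buff)) != -1) {
            md.update(buff, 0, count); // hash only the bytes actually read
        }
        return toHex(md.digest());
    }

    // Assumed stand-in for the question's toHex helper.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayInputStream stands in for the stream returned by hdfs.open(path).
        try (InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(md5Hex(in));
        }
    }
}
```

With a real HDFS stream the call would look like Md5Streams.md5Hex(hdfsDIS), ideally inside a try-with-resources block so the stream is closed afterwards.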

score 0 · Accepted Answer

You can use the FSDataInputStream as just a regular InputStream, and pass that to Channels.newChannel to get back a ReadableByteChannel instead of a FileChannel. Here's an updated version:

InputStream inputStream = hdfsDIS;
MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");  
    ReadableByteChannel channel = Channels.newChannel(inputStream);
    ByteBuffer buff = ByteBuffer.allocate(2048);

    while(channel.read(buff) != -1){
        buff.flip();
        md.update(buff);
        buff.clear();
    }
    byte[] hashValue = md.digest();

    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e){
    return null;
} 
catch (IOException e){
    return null;
}
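As a sanity check on the channel-based variant, the sketch below hashes a byte array through Channels.newChannel and compares the result with a one-shot MessageDigest.digest call over the same bytes. It also confirms that the channel wrapping a non-file stream is not a FileChannel. A ByteArrayInputStream stands in for the HDFS stream, and ChannelDigestDemo and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, not part of Hadoop or of the original post.
final class ChannelDigestDemo {

    // Same loop as in the answer, applied to an arbitrary InputStream.
    static byte[] md5ViaChannel(InputStream in) throws IOException {
        MessageDigest md = newMd5();
        ReadableByteChannel channel = Channels.newChannel(in);
        ByteBuffer buff = ByteBuffer.allocate(2048);
        while (channel.read(buff) != -1) {
            buff.flip();     // switch the buffer from filling to draining
            md.update(buff); // consumes everything between position and limit
            buff.clear();    // make the buffer ready for the next read
        }
        return md.digest();
    }

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000]; // larger than the buffer, so several reads occur
        Arrays.fill(data, (byte) 7);

        byte[] viaChannel = md5ViaChannel(new ByteArrayInputStream(data));
        byte[] oneShot = newMd5().digest(data);
        System.out.println("digests match: " + Arrays.equals(viaChannel, oneShot));

        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data));
        System.out.println("is FileChannel: " + (ch instanceof FileChannel));
    }
}
```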

score -1 · Accepted Answer

You can't do that assignment, because:

java.lang.Object
  extended by java.io.InputStream
    extended by java.io.FilterInputStream
      extended by java.io.DataInputStream
        extended by org.apache.hadoop.fs.FSDataInputStream

FSDataInputStream is not a FileInputStream.
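The same relationship can be checked with plain java.io classes. In this sketch a DataInputStream stands in for FSDataInputStream (its direct superclass in the hierarchy listed here), so the example runs without Hadoop on the classpath. The class name HierarchyCheck and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

// Hypothetical demo class; DataInputStream is used as a stand-in for FSDataInputStream.
final class HierarchyCheck {

    static boolean isInputStream(Object o) {
        return o instanceof InputStream;
    }

    static boolean isFileInputStream(Object o) {
        return o instanceof FileInputStream;
    }

    public static void main(String[] args) {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(new byte[0]));

        // A DataInputStream can be used wherever an InputStream is expected...
        System.out.println("InputStream: " + isInputStream(dis));

        // ...but FileInputStream is a sibling branch of the hierarchy, not an ancestor.
        System.out.println("FileInputStream: " + isFileInputStream(dis));

        // FileInputStream fis = dis; // would not compile: incompatible types
    }
}
```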

That said, to convert from an FSDataInputStream to a FileInputStream, you could use the FSDataInputStream's FileDescriptor to create a FileInputStream, according to the API:

new FileInputStream(hdfsDIS.getFileDescriptor());

I'm not sure it will work.
