api - Does the libhdfs C/C++ API support reading/writing compressed files?

Question

I found a discussion from around 2010 saying that libhdfs does not support reading or writing gzip files.

I downloaded the latest hadoop-2.0.4 and read hdfs.h. It has no compression-related arguments either.

So I am wondering whether it supports reading compressed files now.

If not, how can I patch libhdfs to make it work?

Thanks in advance.

Best Regards Haiti

score 0 · Accepted Answer

Thanks for the reply. Reading the raw file with libhdfs and then inflating the contents with zlib works. The file is gzip-compressed. I used code like this.

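/* Assumes buf holds readlen bytes of gzip data read with libhdfs, and
   buf1 is an output buffer of at least 4096 * 4096 bytes. */
z_stream gzip_stream;
int ret;

gzip_stream.zalloc = Z_NULL;
gzip_stream.zfree  = Z_NULL;
gzip_stream.opaque = Z_NULL;

gzip_stream.next_in   = buf;
gzip_stream.avail_in  = readlen;
gzip_stream.next_out  = buf1;
gzip_stream.avail_out = 4096 * 4096 - 1;   /* leave room for a trailing '\0' */

/* 16 + MAX_WBITS tells zlib to expect a gzip header and trailer. */
ret = inflateInit2(&gzip_stream, 16 + MAX_WBITS);
if (ret != Z_OK) {
    printf("inflate init error\n");
    return NULL;
}
ret = inflate(&gzip_stream, Z_NO_FLUSH);
if (ret != Z_OK && ret != Z_STREAM_END) {
    printf("inflate error: %d\n", ret);
    inflateEnd(&gzip_stream);
    return NULL;
}
buf1[gzip_stream.total_out] = '\0';        /* terminate so that %s is safe */
inflateEnd(&gzip_stream);
printf("the buf \n%s\n", buf1);

return buf1;   /* the decompressed data */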

score 0 · Accepted Answer

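As far as I know, libhdfs only uses JNI to access HDFS. If you are familiar with the HDFS Java API, libhdfs is essentially just a wrapper around org.apache.hadoop.fs.FSDataInputStream. So it cannot read compressed files directly at present.

I guess you want to access files in HDFS from C/C++. If so, you can use libhdfs to read the raw file and use a zip/unzip C/C++ library to decompress the contents. The compressed file format is the same. For example, if the files are compressed with lzo, you can use the lzo library to decompress them.

But if the file is a sequence file, you may need to use JNI to access it, because sequence files are a Hadoop-specific format. I have seen Impala do similar work before, but it is not available out of the box.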
