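I have a dataset of roughly 1.85 GB made up of h5 files that I need to process with Hadoop, so I may have to convert them to text or CSV first. Is there a way for Hadoop to read h5 files directly? Failing that, is there a good online tool for converting h5 files to CSV or text? Alternatively, can anyone point me to a link where I can download a large dataset that is already in text or CSV form?

Thanks in advance.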

1 Answer

Have you tried the OPeNDAP Hyrax server with the hdf5_handler module?

For example, from the sample HDF5 file [1], you can get the following ASCII data [2]:

Dataset: grid_1_2d.h5
temperature[0], 10, 10, 10, 10, 10, 10, 10, 10
temperature[1], 11, 11, 11, 11, 11, 11, 11, 11
temperature[2], 12, 12, 12, 12, 12, 12, 12, 12
temperature[3], 13, 13, 13, 13, 13, 13, 13, 13
...

The OPeNDAP Hyrax server with hdf5_handler is a useful tool/service because you can also select (and subset) a dataset from an HDF5 file through an HTML form [3]. Detailed information about the OPeNDAP hdf5_handler is available at [4].
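Once you have the ASCII response, turning it into CSV lines that Hadoop can read as plain text is mostly string handling. Below is a minimal Python sketch of that step, not a definitive converter. It assumes the response looks like the excerpt above (a `Dataset:` header followed by `name[index], v1, v2, ...` rows). The function names `fetch_ascii`, `ascii_to_rows`, and `rows_to_csv` are hypothetical helpers written for this illustration, and real responses for other datasets may be laid out differently.

```python
import csv
import io
import re
import urllib.request

# Matches a row label such as "temperature[3]" or "temperature[3][0]".
ROW_LABEL = re.compile(r"^(?P<name>[^\[\],]+)(?P<index>(\[\d+\])+)$")


def fetch_ascii(url, timeout=30):
    """Download the ASCII rendering of a dataset (assumes the server is reachable)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def ascii_to_rows(text):
    """Parse ASCII output of the assumed shape into [name, index, values...] rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "Dataset: ..." header, and elision markers.
        if not line or line.startswith("Dataset:") or line == "...":
            continue
        label, *values = [field.strip() for field in line.split(",")]
        match = ROW_LABEL.match(label)
        if match is None or not values:
            continue  # not a data row in the format assumed here
        # "[3][0]" becomes "3.0"; "[3]" becomes "3".
        index = match.group("index").strip("[]").replace("][", ".")
        rows.append([match.group("name"), index, *values])
    return rows


def rows_to_csv(rows):
    """Serialize parsed rows as CSV text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

To use it, pass the text returned by `fetch_ascii` for a URL such as [2] to `ascii_to_rows`, then write the result of `rows_to_csv` to a file and copy that file into HDFS. Keeping one record per line means Hadoop's default line-oriented text input can split the file without any HDF5-specific reader.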

[1] http://eosdap.hdfgroup.org:8080/opendap/data/hdf5/grid_1_2d.h5

[2] http://eosdap.hdfgroup.org:8080/opendap/data/hdf5/grid_1_2d.h5.ascii

[3] http://eosdap.hdfgroup.org:8080/opendap/data/hdf5/grid_1_2d.h5.html

[4] http://hdfeos.org/software/hdf5_handler.php

Answered 2014-06-18T20:13:01.043