Questions tagged "hdf" - Stack Overflow (Chinese site)

0 votes

1 answer

360 views

python - How to use HDF to store a very large matrix

I am planning to use HDF to store a very large matrix, something like 1e6 x 1e6 of floats.

I would need to read the matrix in batches of consecutive rows or columns.

My question is, what would be the optimal way to structure/tweak the HDF file to maximize speed?

Some points:

I have estimated that reading/writing the full matrix uncompressed in HDF would take ~5 hours on my system. This is reasonable, but it is not reasonable to store the matrix uncompressed, since it will be several terabytes in size.
If the matrix is sparse, could compression cause reading speed to be comparable or even faster than reading an uncompressed dense matrix?
Breaking the matrix into separate submatrix datasets would be annoying, since it would complicate reading a row/column from the original matrix or doing things like matrix multiplication. So I would like to avoid this if possible (unless this gives a major speed advantage).
After reading the matrix once, I plan to read it many times. So read/decompression speed is more important than write/compression speed.
I am using Python's h5py to interface with the HDF file.

2014-04-15T13:58:34.100
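
For the large-matrix question above, one commonly suggested layout is a single chunked, compressed dataset rather than separate submatrix datasets. The following is a minimal sketch under assumptions, not a tuned solution: the dataset name "matrix", the 1 MiB chunk target, and all helper names are illustrative, and `create_matrix` assumes h5py and NumPy are installed. Roughly square chunks are chosen because the question needs both row batches and column batches.

```python
import math


def dense_size_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed to store the full matrix uncompressed."""
    return n_rows * n_cols * itemsize


def square_chunk_shape(n_rows, n_cols, itemsize=8, target_bytes=1 << 20):
    """Roughly square chunk of about target_bytes, so that row batches and
    column batches touch a similar number of chunks."""
    side = max(1, int(math.sqrt(target_bytes // itemsize)))
    return (min(side, n_rows), min(side, n_cols))


def create_matrix(path, shape, dtype="float64"):
    """Create an empty chunked, gzip-compressed dataset (assumes h5py)."""
    import h5py  # imported here so the helpers above work without it
    import numpy as np

    chunks = square_chunk_shape(shape[0], shape[1], np.dtype(dtype).itemsize)
    with h5py.File(path, "w") as f:
        f.create_dataset(
            "matrix", shape=shape, dtype=dtype,
            chunks=chunks, compression="gzip", shuffle=True,
        )
    return chunks


print(dense_size_bytes(10**6, 10**6) / 1e12, "TB")  # 8.0 TB for float64
print(square_chunk_shape(10**6, 10**6))  # (362, 362)
```

With such a file, a batch of rows would be read as `dset[i:j, :]` and a batch of columns as `dset[:, i:j]`. Whether compression makes reads faster than an uncompressed dense file depends on how sparse the data is and on the disk, so it is worth measuring on a small sample first.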

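0 votes

1 answer

624 views

c# - How to filter specific objects in an HDF5 file

I am learning the ILNumerics HDF5 API. I really like the option of setting up a complex HDF5 file in a single expression using C# object initializers. I created the following file:

Now I am looking for a clever way to iterate over the file and filter for datasets with a specific property. I want to find all datasets that have at least one attribute with "att" in its name, then collect and return their contents. This is what I have so far:

But it does not work recursively. I could adapt it, but ILNumerics claims to be convenient, so there must be a better way? Something similar to h5py in Python?

c# python hdf5 ilnumerics hdf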

2014-04-23T15:32:57.643

0 votes

1 answer

1377 views

python - HDF4 file on Anaconda distribution of python

I am trying to read an HDF4 file with my Anaconda Python distribution on 64-bit Windows 7. I tried to conda install both the pyhdf and pyNio packages, but conda finds neither. Does anyone have advice on how to do this? I added conda.binstar.org/mutirri to my .condarc file, but conda says it still cannot find the packages. Thanks!

python anaconda conda hdf pyhdf

2014-04-28T02:54:10.193

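0 votes

1 answer

355 views

python - Is it possible to preserve the frequency of a Pandas tseries DatetimeIndex when writing to an HDFStore?

I have a Pandas DataFrame whose index is (note Freq: H):

There are several columns, but the first few rows (and others scattered throughout) have all NA entries. If I write it to an HDF file:

then read it back:

and look at the index, I see:

Note that Freq is now None, there are fewer rows, and the start datetime is later. The first row is now the first row of the original DataFrame that contains at least one non-NA column value.

First, is this expected behavior due to limitations of the HDF5 format and the way DataFrames are stored, or is it a bug?

Is there a clean way to prevent this from happening, or do I just need to "fix" the index after loading? I am also not sure what the best way to do that would be.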

python pandas scipy pytables hdf

2014-05-07T15:28:24.300
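
For the HDFStore frequency question above, the post-load "fix" could look roughly like the sketch below. It does not touch an HDF file: `dropna(how="all")` stands in for the round trip described in the question, and the column name and dates are made up for illustration. pandas also documents a `dropna` argument on `to_hdf` for table-format writes, which may be worth trying to keep the all-NA rows in the first place.

```python
import numpy as np
import pandas as pd

# Hourly index like the one in the question (dates are illustrative)
full_index = pd.date_range("2014-05-01", periods=6, freq="h")
original = pd.DataFrame(
    {"a": [np.nan, np.nan, 1.0, np.nan, 2.0, 3.0]}, index=full_index
)

# Stand-in for the write/read round trip: rows that are entirely NA are gone
loaded = original.dropna(how="all")

# Put the remaining rows back on a regular hourly grid; gaps become NaN
restored = loaded.asfreq("h")

print(len(loaded), len(restored))
print(restored.index.freq)
```

Note that `asfreq` only fills gaps between the first and last surviving rows. Leading all-NA rows would need `reindex` with the full original index, if that index is still known.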

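0 votes

1 answer

78 views

matlab - Exporting HDF4 data in Matlab

I need a script to export data from Matlab to the HDF4 format. The variable I want to store in the HDF4 file has dimensions 3128 x 242 x 256 (type int16).

Thanks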

matlab export hdf

2014-06-09T17:10:22.160

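0 votes

4 answers

6467 views

python - Reading the attributes of an HDF file in Python

I am having trouble reading an hdf file in pandas. At the moment I do not know the keys of the file.

How do I read the file [data.hdf] in this case? Also, my file is .hdf and not .h5; does that make a difference for fetching the data?

I see that you need a "group identifier in the store".

I am able to get the metadata from pytables.

How can I make it readable through pandas?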

python pandas hdf5 hdfstore hdf

user2517372

2014-06-16T03:23:10.430
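
For the unknown-key question above, one way to discover the keys is to open the file as an `HDFStore` and list them. This is a sketch under assumptions: it only works if the file is HDF5 and was written in the pandas/PyTables layout, it needs the PyTables package installed, and `read_all_frames` is a hypothetical helper name. The file extension itself does not matter to HDF5, but an .hdf file may also be HDF4, which pandas cannot open.

```python
import pandas as pd


def read_all_frames(path):
    """Return {key: object} for every pandas object stored in an HDF5 file."""
    with pd.HDFStore(path, mode="r") as store:
        # store.keys() returns the group identifiers, e.g. "/measurements"
        return {key: store[key] for key in store.keys()}
```

Once a key is known, `pd.read_hdf(path, key)` reads that one object directly.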

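0 votes

3 answers

1415 views

hadoop - Problem connecting to the HDFS Namenode

After a fresh single-node hadoop installation, I get the following error in hadoop-root-datanode-localhost.localdomain.log:

Any ideas?

jps does not give any output.

core-site.xml has been updated:

Also, formatting with hadoop namenode -format aborts with the error below: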

hadoop hdf

2014-06-18T18:20:54.653

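0 votes

1 answer

1029 views

c++ - Writing a 2D array int[n][m] to an HDF5 file using Visual C++

I am just getting started with HDF5 and would appreciate some advice on the following.

I have a 2D array, data[][], passed into a method. The method looks like this:

The size of the data is not actually 48 x 100 but 48 x sizes[i], i.e. each row can have a different length! In one simple case I am dealing with, all rows are the same size (but not 100), so you could say the array is 48 x sizes[0].

How do I best write this to HDF5?

I have some working code in which I loop from 0 to 48 and create a new DataSet for each row.

Something like:

Is there a way to write all the data at once in one DataSet? Perhaps one solution for the simple case where all rows have the same length, and another for ragged rows?

I have tried several things to no avail. I called dataSet.write(data, intDataType), i.e. I threw the whole array at it. I seem to get garbage in the file, I suspect because the array the data is stored in is actually 48 x 100 and I only need a small part of it.

It occurred to me that I could use double pointers (int**) or vector<vector<int>>, but I am stuck on that. As far as I can tell, "write" needs a void* pointer. Also, I would like the file to "look correct", i.e. one giant row containing all the rows of data is not desirable. If I have to go that route, someone would need to suggest a clever way to store the information that lets me read the data back from the file (perhaps storing the row lengths as attributes?).

Perhaps my real problem is finding C++ examples of non-trivial use cases.

Any help is much appreciated.

Dave

c++ visual-c++ hdf5 hdf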

2014-07-03T19:02:36.583
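
The "row lengths as attributes" idea raised in the ragged-array question above can be sketched independently of the HDF5 API. The example is in Python rather than C++ and uses hypothetical helper names; it only shows the layout: one flat array of values plus one array of row lengths, which is enough to rebuild the rows after reading. HDF5 also offers variable-length datatypes as an alternative, which this sketch does not cover.

```python
def flatten_rows(rows):
    """Pack ragged rows into one flat list plus the length of each row."""
    values = [v for row in rows for v in row]
    lengths = [len(row) for row in rows]
    return values, lengths


def rebuild_rows(values, lengths):
    """Inverse of flatten_rows: slice the flat list back into rows."""
    rows, start = [], 0
    for n in lengths:
        rows.append(values[start:start + n])
        start += n
    return rows


ragged = [[1, 2, 3], [4], [5, 6]]
values, lengths = flatten_rows(ragged)
print(values)   # [1, 2, 3, 4, 5, 6]
print(lengths)  # [3, 1, 2]
print(rebuild_rows(values, lengths) == ragged)  # True
```

In an HDF5 file, `values` would go into one 1-D dataset and `lengths` into a second small dataset or an attribute on the first. For the simple case where all rows have the same length, the flat buffer can instead be written as a single 2-D dataset of shape 48 x sizes[0].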

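0 votes

1 answer

4165 views

python - How to concatenate all the HDF5 files in a given directory?

I have a number of HDF5 files in a directory and I want to concatenate all of them. I tried the following:

However, this only creates an empty file. Each HDF5 file contains two datasets, but I only care about taking the second one (which has the same name in every file) and adding it to the new file.

Is there a better way of concatenating HDF files? Is there a way to fix my method?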

python hdf

2014-07-03T19:09:12.843
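
For the concatenation question above, one possible approach with h5py is to append the shared dataset from each file to a resizable dataset in the output file. This is a sketch under assumptions, not a fix for the asker's code, which is not shown: it assumes h5py is installed, that the files end in .h5, that the dataset has the same name and trailing dimensions in every file, and that each one fits in memory. The function name and arguments are illustrative.

```python
import glob
import os

import h5py


def concatenate_hdf5(input_dir, output_path, dataset_name):
    """Append dataset_name from every .h5 file in input_dir along axis 0."""
    out_abs = os.path.abspath(output_path)
    # Skip the output file in case it lives in the same directory
    paths = sorted(
        p for p in glob.glob(os.path.join(input_dir, "*.h5"))
        if os.path.abspath(p) != out_abs
    )
    with h5py.File(output_path, "w") as out:
        merged = None
        for path in paths:
            with h5py.File(path, "r") as src:
                data = src[dataset_name][...]  # read the whole dataset
            if merged is None:
                # First file: create a dataset that can grow along axis 0
                merged = out.create_dataset(
                    dataset_name,
                    data=data,
                    maxshape=(None,) + data.shape[1:],
                    chunks=True,
                )
            else:
                start = merged.shape[0]
                merged.resize(start + data.shape[0], axis=0)
                merged[start:] = data
    return paths
```

The files are processed in sorted name order, so the row order in the output follows the file names.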

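0 votes

1 answer

1424 views

c - Reading from a dataset in an HDF5 file with datatype detection

I am currently trying to read some data from an hdf5 dataset in C, as shown below.

Here ic_group is a group containing the dataset vx, memspace is the hyperslab in memory, and vx_ptr is the data in memory. This approach works well, but since I want to use different datatypes later, I would like to read the type directly from the dataset:

Unfortunately, this approach causes a segfault in the function H5Dread. Maybe I have missed something? Thanks for any suggestions.

Edit: I do not know if this is useful, but the gdb backtrace goes down to 0x00007ffff5adbd1e in __memcpy_ssse3_back () from /lib64/libc.so.6.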

c hdf5 hdf

2014-07-21T18:07:50.733
