I am trying to read a large binary LAS file like this:

struct format
{
    double X;
    double Y;
    double Z;
    short red;
    short green;
    short blue;
    short alpha;
    unsigned long intensity;
    // etc.
};

std::ifstream stream;
std::streamoff offset = 0;  // byte position of the next point record

Point3 GetPoint()
{
    format f;
    stream.seekg(offset);
    offset += sizeof(format);
    // read() needs the address of f, not f itself
    stream.read((char *)&f, sizeof(format));
    return Point3(f.X, f.Y, f.Z);
}

In the main function:

Point3* points = new Point3[count];
for (int i = 0; i < count; i++)
    points[i] = GetPoint();

This takes about 116 seconds for 18 million point records. LAS tools, however, needs only about 15 seconds to read the same data and start visualizing it.

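How can it be 7 times faster than mine? Is it multithreading or something else? Even if my read function is bad, how can it be 7 times worse?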

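I have read a little about memory-mapped files. Loading the whole file into memory is very fast, but LAS files can exceed 15 GB, which is more than my RAM, so the file would end up in virtual memory. And even if I had enough memory, I would still have to read the memory-mapped file in a loop.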

Can anyone help me with this?

2 Answers

Since the file is being read sequentially, why call seekg at all? Try removing the seekg call.

Some other things you can try:

  • Read the file in blocks (32K) and pass these to another thread (look up the producer/consumer pattern). The second thread (the consumer) can parse the blocks and fill the points array while the first thread (the producer) is waiting for I/O.
  • If Point3 defines a constructor, use a vector<> instead. That way you won't have to construct 'count' Point3 objects when you create the array.
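As a rough, single-threaded sketch of the block-reading idea (without the second thread), something like the following reads many records per read() call and never calls seekg. The Point3 struct, the ReadPoints name, the 36-byte record size and the block size are all assumptions made for illustration; a real LAS file states its record length in its header, and the threaded version would hand each filled block to a consumer instead of parsing it in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's Point3.
struct Point3 {
    double x, y, z;
};

// Assumed on-disk record size: 3 doubles + 4 shorts + a 4-byte intensity,
// tightly packed. Check the real file header before relying on this.
constexpr std::size_t kRecordSize = 36;

std::vector<Point3> ReadPoints(const std::string& path, std::size_t count)
{
    std::vector<Point3> points;
    std::ifstream in(path, std::ios::binary);
    if (!in) return points;
    points.reserve(count);  // reserves memory without constructing any Point3

    constexpr std::size_t kRecordsPerBlock = 4096;  // arbitrary block size
    std::vector<char> block(kRecordsPerBlock * kRecordSize);

    while (points.size() < count) {
        const std::size_t want = std::min(kRecordsPerBlock, count - points.size());
        // One read() call fetches many records; no seekg is needed because
        // the stream position already advances sequentially.
        in.read(block.data(), static_cast<std::streamsize>(want * kRecordSize));
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / kRecordSize;
        assert(got <= want);
        if (got == 0) break;  // end of file or read error

        for (std::size_t i = 0; i < got; ++i) {
            Point3 p;
            // X, Y, Z are the first 24 bytes of each record.
            std::memcpy(&p, block.data() + i * kRecordSize, sizeof(Point3));
            points.push_back(p);
        }
    }
    return points;
}
```

The vector with reserve() also covers the second bullet: no Point3 is constructed until a record has actually been parsed.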

Also, how do you know that the LAS tool waits for the entire file to be read before rendering? Is it possible that it starts the rendering before the file is completely read in?

answered 2013-07-24T19:48:29.407

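Depending on your implementation, ifstream can be very slow. For example, with the MS compiler it relies on <cstdio> buffering, which means it calls a C function for every byte it reads.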

Also, are you sure you can copy the bytes straight into your struct? Have you considered padding?
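To make the padding concern concrete, here is a small sketch that parses a record field by field from a byte buffer instead of copying it over the struct in one go. The Record, ReadField and ParseRecord names and the packed 36-byte layout are assumptions for illustration, and the code assumes the file's byte order matches the host's. Note that unsigned long is 8 bytes on typical 64-bit Linux but 4 bytes on Windows, so fixed-width types are used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same fields as the question's struct, but with fixed-width types.
struct Record {
    double X, Y, Z;
    std::int16_t red, green, blue, alpha;
    std::uint32_t intensity;
};

// Assumed size of one record on disk if the fields are tightly packed:
// 24 + 8 + 4 = 36 bytes. sizeof(Record) is often larger (commonly 40)
// because the compiler pads the struct to a multiple of 8.
constexpr std::size_t kPackedSize =
    3 * sizeof(double) + 4 * sizeof(std::int16_t) + sizeof(std::uint32_t);

// Copies one field out of the buffer and advances the cursor past it.
template <typename T>
T ReadField(const unsigned char*& cursor)
{
    T value;
    std::memcpy(&value, cursor, sizeof(T));
    cursor += sizeof(T);
    return value;
}

Record ParseRecord(const unsigned char* bytes)
{
    const unsigned char* cursor = bytes;
    Record r;
    r.X = ReadField<double>(cursor);
    r.Y = ReadField<double>(cursor);
    r.Z = ReadField<double>(cursor);
    r.red = ReadField<std::int16_t>(cursor);
    r.green = ReadField<std::int16_t>(cursor);
    r.blue = ReadField<std::int16_t>(cursor);
    r.alpha = ReadField<std::int16_t>(cursor);
    r.intensity = ReadField<std::uint32_t>(cursor);
    // Exactly one packed record must have been consumed.
    assert(static_cast<std::size_t>(cursor - bytes) == kPackedSize);
    return r;
}
```

If sizeof(Record) and kPackedSize differ on your compiler, advancing the file offset by sizeof(format), as the question does, would drift out of step with the records.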

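As your question says, memory-mapped files are much faster. You don't need to map the whole file; you can map a small part of it at a time. Usually, mapping a section the size of the system page is enough.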

Look into mmap.
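A minimal sketch of mapping only a window of the file with POSIX mmap might look like this. ReadWindow is a hypothetical helper, not part of any library, and it assumes a POSIX system (on Windows the equivalent calls are CreateFileMapping and MapViewOfFile). The offset passed to mmap must be a multiple of the page size, so the helper rounds it down and skips the extra bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Copies `length` bytes starting at byte `offset` of the file into `dest`,
// using a temporary read-only mapping of just that window.
// Returns false if the file cannot be opened or the window is out of range.
bool ReadWindow(const char* path, off_t offset, std::size_t length, void* dest)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || length == 0 || offset < 0 ||
        offset + static_cast<off_t>(length) > st.st_size) {
        close(fd);
        return false;
    }

    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    // mmap requires a page-aligned file offset, so round down.
    const off_t aligned = offset - (offset % page);
    const std::size_t slack = static_cast<std::size_t>(offset - aligned);
    const std::size_t mapLength = slack + length;

    void* base = mmap(nullptr, mapLength, PROT_READ, MAP_PRIVATE, fd, aligned);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return false;

    std::memcpy(dest, static_cast<const char*>(base) + slack, length);
    munmap(base, mapLength);
    return true;
}
```

Real code would probably keep a larger window mapped and parse records directly from it, remapping as it moves through the file, rather than mapping and copying for every call.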

answered 2013-07-24T19:29:43.393