parallel-processing - Reading text files using MPI-IO?

Question

I have a text file containing a header with the matrix dimensions, followed by the matrix itself. Here is an example for a 3x3 matrix:

I kept getting garbage values with MPI-IO and found out that it only works for binary files, not text files.

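I figured I would read in a stream of characters and convert them to integers, but I am not sure how to go about it, since the number of digits in each matrix element varies.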

score 1 · Accepted Answer

Generally, you know the type and count of the items written to a binary file (e.g. all integers, or 10 integers followed by 3 floats). You can read the data as a number of bytes, but MPI binary files are usually read and written as a whole number of typed elements: in your case 9 integers, so the number of digits does not matter. You open the file and read it with something like:

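    ! Open the file collectively, read-only
    call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, &
                       MPI_MODE_RDONLY, MPI_INFO_NULL, fh, ierr)

    ! 3x3 matrix: every rank reads all 9 integers
    bufsize = 3*3
    allocate(buf(bufsize))
    call MPI_FILE_READ_ALL(fh, buf, bufsize, MPI_INTEGER, &
                           MPI_STATUS_IGNORE, ierr)

    call MPI_FILE_CLOSE(fh, ierr)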

For variable matrix sizes, you can use MPI_File_get_size to get the file size in bytes and work out how many elements to read. For mixed data, you can put a header at the start (or end) of the binary file, read it first, and use it to decode the rest of the file. You still need to know the format of the header, which can be a problem: changing the code or the header format breaks backwards compatibility. This is part of the reason data formats like HDF5 exist: https://support.hdfgroup.org/HDF5/
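As a rough sketch of the header idea, the program below assumes a binary file whose first two default integers are the row and column counts, followed by rows*cols integers. The file name `matrix.bin` and this header layout are assumptions made for illustration, not something the question specifies.

```fortran
program read_matrix_with_header
  use mpi
  implicit none
  integer :: fh, ierr, int_bytes, dims(2)
  integer(kind=MPI_OFFSET_KIND) :: file_bytes, expected_bytes
  integer, allocatable :: buf(:)

  call MPI_INIT(ierr)

  ! 'matrix.bin' is a hypothetical file name.
  call MPI_FILE_OPEN(MPI_COMM_WORLD, 'matrix.bin', MPI_MODE_RDONLY, &
                     MPI_INFO_NULL, fh, ierr)
  ! File errors are returned rather than fatal by default, so check.
  if (ierr /= MPI_SUCCESS) call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)

  ! Assumed header: two integers (rows, cols) at the start of the file.
  call MPI_FILE_READ_ALL(fh, dims, 2, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)

  ! Cross-check the header against the file size in bytes.
  call MPI_FILE_GET_SIZE(fh, file_bytes, ierr)
  call MPI_TYPE_SIZE(MPI_INTEGER, int_bytes, ierr)
  expected_bytes = int(int_bytes, MPI_OFFSET_KIND) * &
                   (2 + int(dims(1), MPI_OFFSET_KIND) * dims(2))
  if (file_bytes /= expected_bytes) call MPI_ABORT(MPI_COMM_WORLD, 2, ierr)

  ! The individual file pointer now sits just past the header,
  ! so this read returns the matrix elements on every rank.
  allocate(buf(dims(1) * dims(2)))
  call MPI_FILE_READ_ALL(fh, buf, size(buf), MPI_INTEGER, &
                         MPI_STATUS_IGNORE, ierr)

  call MPI_FILE_CLOSE(fh, ierr)
  call MPI_FINALIZE(ierr)
end program read_matrix_with_header
```

Here every rank reads the whole matrix, as in the snippet above. Splitting the elements across ranks would need explicit offsets or a file view.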

score 0 · Accepted Answer

Text files are tricky because you need to know "bytes", not "numbers". For example, 1 1 1 is shorter than 10 15 123355.

Now, if your convention says "every number will be zero-padded to 6 digits", then you can have each process start reading at offset (size/nprocs)*rank.
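A minimal sketch of that fixed-width convention, assuming each number is written as 6 zero-padded digits plus one separator byte (a space or newline), so every record is 7 bytes, and assuming the header length in bytes is known. The module and routine names are hypothetical.

```fortran
module fixed_width_chunks
  use iso_fortran_env, only: int64
  implicit none
  ! Assumed record layout: 6 digits + 1 separator byte.
  integer, parameter :: RECORD_BYTES = 7
contains
  ! Split nrecords fixed-width records over nprocs ranks. The first
  ! mod(nrecords, nprocs) ranks each take one extra record.
  subroutine rank_chunk(nrecords, nprocs, rank, header_bytes, offset, count)
    integer, intent(in) :: nrecords, nprocs, rank
    integer(int64), intent(in) :: header_bytes
    integer(int64), intent(out) :: offset   ! zero-based byte offset
    integer, intent(out) :: count           ! records owned by this rank
    integer :: base, extra, first

    base  = nrecords / nprocs
    extra = mod(nrecords, nprocs)
    count = base + merge(1, 0, rank < extra)
    first = rank * base + min(rank, extra)
    offset = header_bytes + int(first, int64) * RECORD_BYTES
  end subroutine rank_chunk

  ! Convert a buffer holding whole records into integers.
  subroutine parse_records(text, values)
    character(len=*), intent(in) :: text
    integer, intent(out) :: values(:)
    integer :: i, lo

    do i = 1, size(values)
      lo = (i - 1) * RECORD_BYTES + 1
      read(text(lo:lo + 5), '(I6)') values(i)
    end do
  end subroutine parse_records
end module fixed_width_chunks
```

Each rank could then pass `offset` and `count * RECORD_BYTES` to MPI_FILE_READ_AT with MPI_CHARACTER, and hand the resulting character buffer to `parse_records`.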

Alternatively, you need an indexer that reads the file and records the offset at which each row of the matrix begins.
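One possible shape for such an indexer, assuming the file contents are already in a character buffer and rows are separated by newline characters. The module and function names are hypothetical, and a file larger than a default integer can address would need a wider offset kind such as MPI_OFFSET_KIND.

```fortran
module line_index
  implicit none
contains
  ! Return the zero-based byte offset at which each line of text begins.
  ! A newline at the very end of the buffer does not start a new line.
  function line_offsets(text) result(offsets)
    character(len=*), intent(in) :: text
    integer, allocatable :: offsets(:)
    integer :: i, n

    ! First pass: count the lines.
    n = 0
    if (len(text) > 0) n = 1
    do i = 1, len(text) - 1
      if (text(i:i) == new_line('a')) n = n + 1
    end do

    allocate(offsets(n))
    if (n == 0) return

    ! Second pass: a newline at 1-based position i means the next
    ! line starts at zero-based offset i.
    offsets(1) = 0
    n = 1
    do i = 1, len(text) - 1
      if (text(i:i) == new_line('a')) then
        n = n + 1
        offsets(n) = i
      end if
    end do
  end function line_offsets
end module line_index
```

One rank could build this index and broadcast it, after which each rank reads its rows with MPI_FILE_READ_AT starting at the recorded offsets. If the first line is the dimensions header, matrix row k begins at `offsets(k + 1)`.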

Or, as you have observed, using binary data makes this much easier.
