c++ - Segmentation fault when multiplying matrices with the MPI library

Question

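I am writing a program that multiplies two matrices, A and B, stored in text files. Their sizes can vary, so my program has to determine the dimensions of A and B, check whether they can be multiplied, and so on.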

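That part is not the problem. The real trouble starts when I pass data from the master process to the slave processes. In my program the master sends rows to the slaves, and the number of rows each slave receives depends on the number of rows in the matrix and the number of processes.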

Matrix A is stored by rows, but matrix B is stored by columns.

matrixA[0]----------------

matrixA[1]----------------

matrixA[2]----------------

matrixB[ 0 ]  matrixB[ 1 ]  matrixB[ 2 ]  ....
|             |             |             |
|             |             |             |
|             |             |             |
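
To make the layout concrete: with A stored by rows and B stored by columns, each entry of the product is the dot product of two runs of memory, one row of A and one column of B. The following is only a minimal sketch under assumptions that are not in the program above: it uses flat std::vector storage instead of double **, and the function name multiplyRowByColumn and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layout (NOT the double ** layout used in the question):
//   a holds rowsA x inner entries, row by row:       a[i * inner + k] = A(i, k)
//   b holds inner x colsB entries, column by column: b[j * inner + k] = B(k, j)
// The result is returned row by row: c[i * colsB + j] = C(i, j).
std::vector<double> multiplyRowByColumn(const std::vector<double>& a,
                                        const std::vector<double>& b,
                                        std::size_t rowsA, std::size_t inner,
                                        std::size_t colsB)
{
    assert(a.size() == rowsA * inner);
    assert(b.size() == inner * colsB);

    std::vector<double> c(rowsA * colsB, 0.0);
    for (std::size_t i = 0; i < rowsA; ++i) {
        for (std::size_t j = 0; j < colsB; ++j) {
            double sum = 0.0;
            // Row i of A and column j of B are both walked sequentially.
            for (std::size_t k = 0; k < inner; ++k)
                sum += a[i * inner + k] * b[j * inner + k];
            c[i * colsB + j] = sum;
        }
    }
    return c;
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], B is passed column by column as {5, 7, 6, 8}.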

You can find the text files (used as input) here: matrixA matrixB.

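After a few days of 80s-style debugging (meaning no debugger at all), I believe the problem (all I get as output is a segmentation fault) lies in these lines of code (from the slave function):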

void slave( int id, int slaves, double **matrixA, double **matrixB, double **matrixC )
{
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
    MPI_Status status;

    /* Receives columns of A and B from master. */
    type = 3;

    MPI_Recv( &columnsA, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rowsA, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &columnsB, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rowsB, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    printf( "%d slave recieved ColumnA = %d, RowsA = %d, ColumnB = %d, RowsB = %d.\n", id, columnsA, rowsA, columnsB, rowsB );


    /* Receive from master. */
    type = 0;

    MPI_Recv( &offset, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rows, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );

    matrixAllocate( &matrixA, columnsA, rows );
    matrixAllocate( &matrixB, rowsB, columnsB );
    matrixAllocate( &matrixC, columnsB, rows );
    printf( "Correctly allocated.\n" );

    /* This part is only to see if the mem was correctly allocated.*/
    for( int i = 0; i < rows; i++ ){
        for( int j = 0; j < columnsA; j++)
            matrixA[ i ][ j ] = i + j;
    }

    for( int i = 0; i < columnsB; i++ ){
        for( int j = 0; j < rowsB; j++)
            matrixB[ i ][ j ] = i * j;
    }

    if ( id == 1 ){
        matrixPrinter( "matrixA", matrixA, rows, columnsA );
        matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
        matrixPrinter( "matrixC", matrixC, rows, columnsB );
    }

    MPI_Recv( &matrixA, ( rows * columnsA ) , MPI_DOUBLE, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &matrixB, ( rowsB * columnsB ), MPI_DOUBLE, 0, type, MPI_COMM_WORLD, &status );
    printf( "Correctly recieved.\n" );

    matrixPrinter( "matrixA", matrixA, rows, columnsA );
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
    matrixPrinter( "matrixC", matrixC, rows, columnsB );

    if ( id == 1 ){
        printf( "My id is %d.\n", id );
        for ( int i = 0; i < rows; i++ ){
            for( int j = 0; j < columnsA; j++ ){
                printf( "%lf    ", matrixA[ i ][ j ] );
            }
            printf( "\n" );
        }
    }
}

The full code can be found here: MPI Matrix Multiplier in C.

The output in the terminal is:

[image: terminal output]

score 6 · Accepted Answer

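The problem is that the matrices are of type "double **", allocated in "matrixAllocate". When sending and receiving data, MPI assumes that buf holds the data contiguously, as a one-dimensional array, but that is not the case here. (You can easily check this by printing the address of each matrix entry.)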
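
The address check mentioned above can also be done in code instead of by reading printed addresses. This is a minimal sketch, assuming the matrix is a double ** with one pointer per row (which is what matrixAllocate appears to produce); rowsAreContiguous and rowTable are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when the rows addressed by m sit back to back in memory,
// i.e. row i + 1 starts exactly where row i ends.
bool rowsAreContiguous(double* const* m, std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i + 1 < rows; ++i)
        if (m[i] + cols != m[i + 1])
            return false;
    return true;
}

// Hypothetical helper for the demonstration: builds a table of row pointers
// into one block, placing the row starts `stride` doubles apart.
std::vector<double*> rowTable(std::vector<double>& block, std::size_t rows,
                              std::size_t stride)
{
    assert(rows == 0 || block.size() >= (rows - 1) * stride + 1);
    std::vector<double*> table(rows);
    for (std::size_t i = 0; i < rows; ++i)
        table[i] = block.data() + i * stride;
    return table;
}
```

Rows of 4 doubles spaced 4 apart pass the check; rows of 3 doubles spaced 4 apart do not. A matrix built from one allocation per row typically behaves like the second case, because nothing obliges the allocator to place the rows next to each other. Calling rowsAreContiguous( matrixA, rows, columnsA ) right after matrixAllocate would show whether the data could be handed to MPI as a single buffer.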

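I think this is a well-known pitfall in C: pointers and arrays are different things. If the matrix were a true two-dimensional array, all of its entries would be laid out contiguously.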

My suggestion is to allocate each matrix as a 1-D array and not use multidimensional subscripts.
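
A minimal sketch of that suggestion follows, assuming row-by-row storage. The FlatMatrix type and its members are hypothetical and not part of the original program, and the MPI call appears only in a comment so that the example compiles without MPI. Note also that the slave code passes &matrixA, the address of the pointer variable itself, to MPI_Recv; with a flat buffer the argument is simply a pointer to the first element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat matrix: one contiguous block, indexed by hand.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // rows * cols doubles, stored row after row

    FlatMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c, 0.0) {}

    // Entry (i, j) lives at offset i * cols + j.
    double& at(std::size_t i, std::size_t j)
    {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }

    // Pointer to the first of rows * cols contiguous doubles. This is the
    // kind of buffer MPI_Send / MPI_Recv expect, for example (not compiled
    // here, names taken from the question's slave function):
    //   MPI_Recv( a.buffer(), rows * columnsA, MPI_DOUBLE, 0, type,
    //             MPI_COMM_WORLD, &status );
    double* buffer() { return data.data(); }
};
```

With this layout, m.at( i, j ) replaces matrix[ i ][ j ], and one send or receive call can move the whole block of rows.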

score 1 · Accepted Answer

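I hesitate to post an answer like this without digging into all of the MPI code, but I suggest compiling with the -Wall flag in the future. It can help catch errors like this one. For MPI, and for anything computation-related, you almost always want the -Wall compiler flag.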

Look at the output and the list of warnings for your code.

$ mpic++ test.cpp -Wall -o  test
test.cpp:30:63: warning: unused variable 'rank' [-Wunused-variable]
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                              ^
test.cpp:30:69: warning: unused variable 'source' [-Wunused-variable]
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                                    ^
test.cpp:126:50: warning: variable 'matrixC' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                                                 ^~~~~~~
test.cpp:34:21: note: initialize the variable 'matrixC' to silence this warning
           **matrixC;
                    ^
                     = NULL
test.cpp:126:41: warning: variable 'matrixB' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                                        ^~~~~~~
test.cpp:33:21: note: initialize the variable 'matrixB' to silence this warning
           **matrixB,
                    ^
                     = NULL
test.cpp:85:44: warning: variable 'rc' is uninitialized when used here [-Wuninitialized]
                MPI_Abort( MPI_COMM_WORLD, rc );
                                           ^~
test.cpp:30:53: note: initialize the variable 'rc' to silence this warning
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                    ^
                                                     = 0
test.cpp:126:32: warning: variable 'matrixA' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                               ^~~~~~~
test.cpp:32:21: note: initialize the variable 'matrixA' to silence this warning
    double **matrixA,
                    ^
                     = NULL
test.cpp:398:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixA", matrixA, rows, columnsA );
                   ^
test.cpp:399:21: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
                    ^
test.cpp:400:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixC", matrixC, rows, columnsB );
                   ^
test.cpp:407:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixA", matrixA, rows, columnsA );
                   ^
test.cpp:408:21: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
                    ^
test.cpp:409:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixC", matrixC, rows, columnsB );
                   ^
test.cpp:363:70: warning: unused variable 'averageRows' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                                     ^
test.cpp:363:83: warning: unused variable 'extraRows' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                                                  ^
test.cpp:363:49: warning: unused variable 'Btype' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                ^
15 warnings generated.
