17.2.164. MPI_Gatherv

MPI_Gatherv, MPI_Igatherv, MPI_Gatherv_init - Gathers varying amounts of data from all processes to the root process.

17.2.164.1. Syntax

17.2.164.1.1. C Syntax

#include <mpi.h>

int MPI_Gatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts[], const int displs[], MPI_Datatype recvtype,
    int root, MPI_Comm comm)

int MPI_Igatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts[], const int displs[], MPI_Datatype recvtype,
    int root, MPI_Comm comm, MPI_Request *request)

int MPI_Gatherv_init(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts[], const int displs[], MPI_Datatype recvtype,
    int root, MPI_Comm comm, MPI_Info info, MPI_Request *request)

17.2.164.1.2. Fortran Syntax

USE MPI
! or the older form: INCLUDE 'mpif.h'

MPI_GATHERV(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNTS,
        DISPLS, RECVTYPE, ROOT, COMM, IERROR)
    <type>  SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNT, SENDTYPE, RECVCOUNTS(*), DISPLS(*)
    INTEGER RECVTYPE, ROOT, COMM, IERROR

MPI_IGATHERV(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNTS,
        DISPLS, RECVTYPE, ROOT, COMM, REQUEST, IERROR)
    <type>  SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNT, SENDTYPE, RECVCOUNTS(*), DISPLS(*)
    INTEGER RECVTYPE, ROOT, COMM, REQUEST, IERROR

MPI_GATHERV_INIT(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNTS,
        DISPLS, RECVTYPE, ROOT, COMM, INFO, REQUEST, IERROR)
    <type>  SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNT, SENDTYPE, RECVCOUNTS(*), DISPLS(*)
    INTEGER RECVTYPE, ROOT, COMM, INFO, REQUEST, IERROR

17.2.164.1.3. Fortran 2008 Syntax

USE mpi_f08

MPI_Gatherv(sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs,
        recvtype, root, comm, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    TYPE(*), DIMENSION(..) :: recvbuf
    INTEGER, INTENT(IN) :: sendcount, recvcounts(*), displs(*), root
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror

MPI_Igatherv(sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs,
        recvtype, root, comm, request, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN), ASYNCHRONOUS :: sendbuf
    TYPE(*), DIMENSION(..), ASYNCHRONOUS :: recvbuf
    INTEGER, INTENT(IN) :: sendcount, root
    INTEGER, INTENT(IN), ASYNCHRONOUS :: recvcounts(*), displs(*)
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(MPI_Comm), INTENT(IN) :: comm
    TYPE(MPI_Request), INTENT(OUT) :: request
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror

MPI_Gatherv_init(sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs,
        recvtype, root, comm, info, request, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN), ASYNCHRONOUS :: sendbuf
    TYPE(*), DIMENSION(..), ASYNCHRONOUS :: recvbuf
    INTEGER, INTENT(IN) :: sendcount, root
    INTEGER, INTENT(IN), ASYNCHRONOUS :: recvcounts(*), displs(*)
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(MPI_Comm), INTENT(IN) :: comm
    TYPE(MPI_Info), INTENT(IN) :: info
    TYPE(MPI_Request), INTENT(OUT) :: request
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror

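17.2.164.2. Input Parameters

sendbuf : Starting address of send buffer (choice).
sendcount : Number of elements in send buffer (integer).
sendtype : Datatype of send buffer elements (handle).
recvcounts : Integer array (of length group size) containing the number of elements that are received from each process (significant only at root).
displs : Integer array (of length group size). Entry i specifies the displacement relative to recvbuf at which to place the incoming data from process i (significant only at root).
recvtype : Datatype of receive buffer elements (handle, significant only at root).
root : Rank of receiving process (integer).
comm : Communicator (handle).
info : Info (handle, persistent only).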

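17.2.164.3. Output Parameters

recvbuf : Address of receive buffer (choice, significant only at root).
request : Request (handle, non-blocking and persistent only).
ierror : Fortran only: Error status (integer).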

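17.2.164.4. Description

MPI_Gatherv extends the functionality of MPI_Gather by allowing a varying count of data from each process, since recvcounts is now an array. It also allows more flexibility as to where the data is placed on the root, by providing the new argument displs.

The outcome is as if each process, including the root process, sends a message to the root,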

MPI_Send(sendbuf, sendcount, sendtype, root, ...);

and the root executes n receives,

MPI_Recv(recvbuf + displs[i] * extent(recvtype), recvcounts[i],
         recvtype, i, ...);

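Messages are placed in the receive buffer of the root process in rank order, that is, the data sent from process j is placed in the jth portion of the receive buffer recvbuf on process root. The jth portion of recvbuf begins at offset displs[j] elements (in terms of recvtype) into recvbuf.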
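
The address arithmetic in the MPI_Recv form above can be sketched in plain C. The helper below is a serial model, not an MPI call: the name model_place_block is hypothetical, and it assumes recvtype is a contiguous type whose size equals its extent (true for basic types such as MPI_INT).

```c
#include <stddef.h>
#include <string.h>

/* Serial model of the root-side receive for rank i (hypothetical helper,
 * not an MPI call). Assumes recvtype is contiguous with size == extent,
 * so block i starts displs[i] * extent bytes into recvbuf. */
void model_place_block(void *recvbuf, const int *displs,
                       const int *recvcounts, size_t extent,
                       int i, const void *data)
{
    char *dst = (char *)recvbuf + (size_t)displs[i] * extent;
    memcpy(dst, data, (size_t)recvcounts[i] * extent);
}
```

Calling it once per rank, with extent set to sizeof(int), reproduces the layout that MPI_Gatherv would leave in recvbuf for MPI_INT data. Elements of recvbuf that fall between blocks are left untouched.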

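The receive buffer is ignored for all nonroot processes.

The type signature implied by sendcount, sendtype on process i must be equal to the type signature implied by recvcounts[i], recvtype at the root. This implies that the amount of data sent must be equal to the amount of data received, pairwise between each process and the root. Distinct type maps between sender and receiver are still allowed, as illustrated in Example 2, below.

All arguments to the function are significant on process root, while on other processes, only the arguments sendbuf, sendcount, sendtype, root, and comm are significant. The arguments root and comm must have identical values on all processes.

The specification of counts, types, and displacements should not cause any location on the root to be written more than once. Such a call is erroneous.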
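
One way to catch an erroneous layout before the call is to check the blocks pairwise at the root. The following is a minimal sketch, not part of MPI: blocks_overlap is a hypothetical name, and it assumes recvtype is a basic type such as MPI_INT, so that block i covers exactly the elements displs[i] through displs[i] + recvcounts[i] - 1.

```c
/* Returns 1 if any two non-empty blocks [displs[i], displs[i] + counts[i])
 * share an element, 0 otherwise. Hypothetical helper; assumes recvtype is
 * a basic type, so each received element occupies exactly one slot. */
int blocks_overlap(const int *displs, const int *counts, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            long long lo_i = displs[i], hi_i = lo_i + counts[i];
            long long lo_j = displs[j], hi_j = lo_j + counts[j];
            if (counts[i] > 0 && counts[j] > 0 && lo_i < hi_j && lo_j < hi_i)
                return 1;
        }
    }
    return 0;
}
```

The check is quadratic in the group size, which is usually acceptable for a debugging aid. Blocks with a zero count write nothing and are therefore never reported as overlapping.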

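Example 1: Have each process send 100 ints to root, but place each set (of 100) stride ints apart at the receiving end. Use MPI_Gatherv and the displs argument to achieve this effect. Assume stride >= 100.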

MPI_Comm comm;
int gsize, sendarray[100];
int root, *rbuf, stride;
int *displs, i, *rcounts;
...

MPI_Comm_size(comm, &gsize);
rbuf = (int *)malloc(gsize * stride * sizeof(int));
displs = (int *)malloc(gsize * sizeof(int));
rcounts = (int *)malloc(gsize * sizeof(int));

for (i=0; i<gsize; ++i) {
  displs[i] = i * stride;
  rcounts[i] = 100;
}
MPI_Gatherv(sendarray, 100, MPI_INT, rbuf, rcounts, displs, MPI_INT,
            root, comm);

Note that the program is erroneous if stride < 100, because consecutive blocks would then overlap in the receive buffer.
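
The setup loop of Example 1 and its stride constraint can be folded into one helper. This is a sketch under stated assumptions: strided_layout is a hypothetical name, not an MPI routine, and it assumes every rank contributes the same count of elements of a basic type.

```c
/* Fills displs[] and rcounts[] for n equal-sized blocks placed stride
 * elements apart. Returns the number of elements the receive buffer must
 * hold, or -1 if n < 1, count < 0, or the blocks would overlap
 * (stride < count). Hypothetical helper, not part of MPI. */
long strided_layout(int *displs, int *rcounts, int n, int stride, int count)
{
    if (n < 1 || count < 0 || stride < count)
        return -1;
    for (int i = 0; i < n; ++i) {
        displs[i] = i * stride;
        rcounts[i] = count;
    }
    /* The last block starts at (n - 1) * stride and holds count elements. */
    return (long)(n - 1) * stride + count;
}
```

The returned length is the minimum the layout needs. Allocating gsize * stride elements, as Example 1 does, is always at least that large when stride >= count.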

Example 2: Same as Example 1 on the receiving side, but send the 100 ints from the 0th column of a 100x150 int array, in C.

MPI_Comm comm;
int gsize, sendarray[100][150];
int root, *rbuf, stride;
MPI_Datatype stype;
int *displs, i, *rcounts;
...

MPI_Comm_size(comm, &gsize);
rbuf = (int *)malloc(gsize * stride * sizeof(int));
displs = (int *)malloc(gsize * sizeof(int));
rcounts = (int *)malloc(gsize * sizeof(int));

for (i=0; i<gsize; ++i) {
  displs[i] = i * stride;
  rcounts[i] = 100;
}

// Create datatype for 1 column of array
MPI_Type_vector(100, 1, 150, MPI_INT, &stype);
MPI_Type_commit( &stype );
MPI_Gatherv(sendarray, 1, stype, rbuf, rcounts, displs, MPI_INT,
            root, comm);
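
To make the column datatype concrete, the following plain-C sketch models which elements a type built with MPI_Type_vector(count, 1, stride, MPI_INT) selects from memory. The function name model_pack_column is hypothetical and the code performs no MPI calls; it only mirrors the access pattern.

```c
#include <stddef.h>

/* Serial model of the elements selected by
 * MPI_Type_vector(count, 1, stride, MPI_INT): count ints, each located
 * stride ints after the previous one, starting at base.
 * Hypothetical helper, not an MPI call. */
void model_pack_column(const int *base, int count, int stride, int *out)
{
    for (int k = 0; k < count; ++k)
        out[k] = base[(size_t)k * stride];
}
```

With base pointing at the first element of sendarray, count = 100, and stride = 150, out receives column 0. The sender describes these 100 ints as one element of stype while the root describes them as 100 elements of MPI_INT, so the type signatures match even though the type maps differ.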

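Example 3: Process i sends (100-i) ints from the ith column of a 100x150 int array, in C. The data is received into a buffer with stride, as in the previous two examples.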

MPI_Comm comm;
int gsize, sendarray[100][150], *sptr;
int root, *rbuf, stride, myrank;
MPI_Datatype stype;
int *displs, i, *rcounts;
...

MPI_Comm_size(comm, &gsize);
MPI_Comm_rank(comm, &myrank);
rbuf = (int *)malloc(gsize * stride * sizeof(int));
displs = (int *)malloc(gsize * sizeof(int));
rcounts = (int *)malloc(gsize * sizeof(int));

for (i=0; i<gsize; ++i) {
  displs[i] = i * stride;
  rcounts[i] = 100-i; // note change from previous example
}

// Create datatype for the column we are sending
MPI_Type_vector(100-myrank, 1, 150, MPI_INT, &stype);
MPI_Type_commit( &stype );
// sptr is the address of start of "myrank" column
sptr = &sendarray[0][myrank];
MPI_Gatherv(sptr, 1, stype, rbuf, rcounts, displs, MPI_INT,
            root, comm);

Note that a different amount of data is received from each process.

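Example 4: Same as Example 3, but done in a different way at the sending end. We create a datatype that causes the correct striding at the sending end so that we read a column of a C array.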

MPI_Comm comm;
int gsize, sendarray[100][150], *sptr;
int root, *rbuf, stride, myrank;
MPI_Datatype stype;
int *displs, i, *rcounts;
...

MPI_Comm_size(comm, &gsize);
MPI_Comm_rank(comm, &myrank);
rbuf = (int *)malloc(gsize * stride * sizeof(int));
displs = (int *)malloc(gsize * sizeof(int));
rcounts = (int *)malloc(gsize * sizeof(int));

for (i=0; i<gsize; ++i) {
  displs[i] = i * stride;
  rcounts[i] = 100-i;
}
// Create datatype for one int, with extent of entire row.
// MPI_Type_struct and MPI_UB were removed in MPI-3.0, so the
// extent is set with MPI_Type_create_resized instead.
MPI_Type_create_resized(MPI_INT, 0, 150 * sizeof(int), &stype);
MPI_Type_commit(&stype);
sptr = &sendarray[0][myrank];
MPI_Gatherv(sptr, 100-myrank, stype, rbuf, rcounts, displs, MPI_INT,
            root, comm);

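Example 5: Same as Example 3 at the sending side, but at the receiving side we make the stride between received blocks vary from block to block.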

MPI_Comm comm;
int gsize, sendarray[100][150], *sptr;
int root, *rbuf, *stride, myrank, bufsize;
MPI_Datatype stype;
int *displs, i, *rcounts, offset;
...

MPI_Comm_size( comm, &gsize);
MPI_Comm_rank( comm, &myrank );
stride = (int *)malloc(gsize * sizeof(int));
...
// stride[i] for i = 0 to gsize-1 is set somehow

// set up displs and rcounts vectors first
displs = (int *)malloc(gsize * sizeof(int));
rcounts = (int *)malloc(gsize * sizeof(int));
offset = 0;

for (i=0; i<gsize; ++i) {
  displs[i] = offset;
  offset += stride[i];
  rcounts[i] = 100-i;
}

// the required buffer size for rbuf is now easily obtained
bufsize = displs[gsize-1]+rcounts[gsize-1];
rbuf = (int *)malloc(bufsize * sizeof(int));
// Create datatype for the column we are sending
MPI_Type_vector(100-myrank, 1, 150, MPI_INT, &stype);
MPI_Type_commit( &stype );
sptr = &sendarray[0][myrank];
MPI_Gatherv(sptr, 1, stype, rbuf, rcounts, displs, MPI_INT,
            root, comm);

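Example 6: Process i sends num ints from the ith column of a 100x150 int array, in C. The complicating factor is that the various values of num are not known to root, so a separate gather must first be run to find these out. The data is placed contiguously at the receiving end.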

MPI_Comm comm;
int gsize, sendarray[100][150], *sptr;
int root, *rbuf, myrank;
MPI_Datatype stype;
int *displs, i, *rcounts, num;
...

MPI_Comm_size(comm, &gsize);
MPI_Comm_rank(comm, &myrank);

// First, gather nums to root
rcounts = (int *)malloc(gsize * sizeof(int));
MPI_Gather(&num, 1, MPI_INT, rcounts, 1, MPI_INT, root, comm);
// root now has correct rcounts, using these we set
// displs[] so that data is placed contiguously (or concatenated) at receive end

displs = (int *)malloc(gsize * sizeof(int));
displs[0] = 0;
for (i=1; i<gsize; ++i) {
  displs[i] = displs[i-1]+rcounts[i-1];
}

// And, create receive buffer (one slot per gathered int)
rbuf = (int *)malloc((displs[gsize-1]+rcounts[gsize-1]) * sizeof(int));
// Create datatype for one int, with extent of entire row.
// MPI_Type_struct and MPI_UB were removed in MPI-3.0, so the
// extent is set with MPI_Type_create_resized instead.
MPI_Type_create_resized(MPI_INT, 0, 150 * sizeof(int), &stype);
MPI_Type_commit(&stype);
sptr = &sendarray[0][myrank];
MPI_Gatherv(sptr, num, stype, rbuf, rcounts, displs, MPI_INT, root, comm);
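
The displacement loop in Example 6 is an exclusive prefix sum over rcounts. The sketch below isolates that step in plain C; contiguous_displs is a hypothetical name, not an MPI routine, and it assumes recvtype is a basic type so that counts and displacements use the same unit.

```c
/* Exclusive prefix sum: displs[i] = rcounts[0] + ... + rcounts[i-1].
 * Returns the total element count, which is also the minimum length of
 * the receive buffer. Hypothetical helper, not part of MPI. */
int contiguous_displs(const int *rcounts, int *displs, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        displs[i] = total;
        total += rcounts[i];
    }
    return total;
}
```

At the root, after the MPI_Gather of the num values, the return value can be used directly as the number of ints to allocate for rbuf. A rank that contributes zero elements simply shares its displacement with the next rank, which is harmless because nothing is written for it.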

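17.2.164.5. Use of In-Place Option

The in-place option operates in the same way as it does for MPI_Gather. When the communicator is an intracommunicator, you can perform a gather operation in-place (the output buffer is used as the input buffer). Use the variable MPI_IN_PLACE as the value of the root process sendbuf. In this case, sendcount and sendtype are ignored, and the contribution of the root process to the gathered vector is assumed to already be in the correct place in the receive buffer.

Note that MPI_IN_PLACE is a special kind of value; it has the same restrictions on its use as MPI_BOTTOM.

Because the in-place option converts the receive buffer into a send-and-receive buffer, a Fortran binding that includes INTENT must mark it as INOUT, not OUT.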

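17.2.164.6. When Communicator is an Inter-Communicator

When the communicator is an inter-communicator, the root process in the first group gathers data from all the processes in the second group. The first group defines the root process. That process uses MPI_ROOT as the value of its root argument. The remaining processes in the first group use MPI_PROC_NULL as the value of their root argument. All processes in the second group use the rank of that root process in the first group as the value of their root argument. The send buffer arguments of the processes in the second group must be consistent with the receive buffer argument of the root process in the first group.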

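17.2.164.7. Errors

Almost all MPI routines return an error value; C routines as the return result of the function and Fortran routines in the last argument.

Before the error value is returned, the current MPI error handler associated with the communication object (e.g., communicator, window, file) is called. If no communication object is associated with the MPI call, then the call is considered attached to MPI_COMM_SELF and will call the associated MPI error handler. When MPI_COMM_SELF is not initialized (i.e., before MPI_Init/MPI_Init_thread, after MPI_Finalize, or when using the Sessions Model exclusively) the error raises the initial error handler. The initial error handler can be changed by calling MPI_Comm_set_errhandler on MPI_COMM_SELF when using the World model, or by the mpi_initial_errhandler command-line argument to mpiexec, or by the info key to MPI_Comm_spawn/MPI_Comm_spawn_multiple. If no other appropriate error handler has been set, then the MPI_ERRORS_RETURN error handler is called for MPI I/O functions and the MPI_ERRORS_ABORT error handler is called for all other MPI functions.

Open MPI includes three predefined error handlers that can be used:

MPI_ERRORS_ARE_FATAL : Causes the program to abort all connected MPI processes.
MPI_ERRORS_ABORT : An error handler that can be invoked on a communicator, window, file, or session. When called on a communicator, it acts as if MPI_Abort was called on that communicator. If called on a window or file, it acts as if MPI_Abort was called on a communicator containing the group of processes in the corresponding window or file. If called on a session, it aborts only the local process.
MPI_ERRORS_RETURN : Returns an error code to the application.

MPI applications can also implement their own error handlers by calling:

MPI_Comm_create_errhandler then MPI_Comm_set_errhandler
MPI_File_create_errhandler then MPI_File_set_errhandler
MPI_Session_create_errhandler then MPI_Session_set_errhandler or at MPI_Session_init
MPI_Win_create_errhandler then MPI_Win_set_errhandler

Note that MPI does not guarantee that an MPI program can continue past an error.

See the MPI man page for a full list of MPI error codes.

See the Error Handling section of the MPI-3.1 standard for more information.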

See also

MPI_Gather