Aluminum API Documentation

Aluminum initialization and communication operations.

Aluminum provides an interface to high-performance and accelerator-aware communication operations.

namespace Al

Functions

void Initialize(int &argc, char **&argv)

Initialize Aluminum.

This must be called before any other calls to Aluminum are made, except for Initialized(). It is safe to call this multiple times, but it may not be called after Finalize().

The argc and argv arguments are used to initialize MPI. They may be null if the underlying MPI library does not rely on arguments to initialize.

This will initialize Aluminum to use the whole MPI_COMM_WORLD. See Initialize(int&, char**&, MPI_Comm) if a specific subcommunicator is desired.

Parameters:

argc, argv – The argc and argv arguments provided to the binary.

void Initialize(int &argc, char **&argv, MPI_Comm world_comm)

Initialize Aluminum with an explicit MPI world communicator.

This is identical to Initialize(int&, char**&), except that it allows a different world communicator, world_comm, to be specified. Aluminum will treat this as its world in instances where a default world communicator is needed.

Aluminum will create a duplicate of world_comm.

Parameters:
  • argc, argv – The argc and argv arguments provided to the binary.

  • world_comm – A default world communicator for Aluminum.

void Finalize()

Clean up Aluminum.

This will clean up all outstanding Aluminum resources and shut down communication libraries.

Do not make any additional calls to Aluminum after calling this function, except for Initialized().

bool Initialized()

Return true if Aluminum has been initialized and has not been finalized.

It is always safe to call this.
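
The lifecycle rules above (initialize before any other call, finalize at the end, query with Initialized() at any time) fit naturally into a scope guard. The following is a minimal sketch and not part of Aluminum: ScopedLibrary and MockLib are hypothetical names, and MockLib only stands in for Initialize(), Finalize(), and Initialized() so that the block compiles without Aluminum or MPI.

```cpp
#include <cassert>

// Hypothetical stand-in for the three lifecycle calls documented above.
// It only tracks state; it performs no communication.
struct MockLib {
  static bool& state() {
    static bool initialized = false;
    return initialized;
  }
  static void Initialize(int& /*argc*/, char**& /*argv*/) { state() = true; }
  static void Finalize() { state() = false; }
  static bool Initialized() { return state(); }
};

// Hypothetical scope guard: initializes on construction and finalizes on
// destruction, so Finalize() runs when the guarded scope ends.
template <typename Lib>
class ScopedLibrary {
 public:
  ScopedLibrary(int& argc, char**& argv) { Lib::Initialize(argc, argv); }
  ~ScopedLibrary() {
    if (Lib::Initialized()) Lib::Finalize();
  }
  ScopedLibrary(const ScopedLibrary&) = delete;
  ScopedLibrary& operator=(const ScopedLibrary&) = delete;
};

// Returns true if the library reports initialized inside the guarded scope
// and not initialized after it.
inline bool lifecycle_roundtrip() {
  int argc = 0;
  char** argv = nullptr;
  bool inside = false;
  {
    ScopedLibrary<MockLib> guard(argc, argv);
    inside = MockLib::Initialized();
  }
  return inside && !MockLib::Initialized();
}
```

With Aluminum itself, a small policy struct that forwards to Initialize(), Finalize(), and Initialized() in namespace Al would take the place of MockLib.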

template<typename Backend, typename T>
void Allreduce(const T *sendbuf, T *recvbuf, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::allreduce_algo_type algo = Backend::allreduce_algo_type::automatic)

Perform an allreduce.

See Allreduce.

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced.

  • recvbuf[out] Buffer for the reduced vector.

  • count[in] Length of sendbuf and recvbuf in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce over.

  • algo[in] Request a particular allreduce algorithm.
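
To make the buffer contract concrete, the sketch below models an allreduce on simulated ranks in plain C++. It does not call Aluminum or MPI; model_allreduce_sum is a hypothetical helper, and a sum is assumed as the reduction operation. Each inner vector plays the role of one rank's sendbuf, and the result holds each rank's recvbuf.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of a sum allreduce over simulated ranks.
// send[r] is rank r's sendbuf; all sendbufs must have the same length.
template <typename T>
std::vector<std::vector<T>> model_allreduce_sum(
    const std::vector<std::vector<T>>& send) {
  if (send.empty()) return {};
  const std::size_t count = send[0].size();
  std::vector<T> reduced(count, T{});
  for (const auto& buf : send) {
    assert(buf.size() == count);
    for (std::size_t i = 0; i < count; ++i) reduced[i] += buf[i];
  }
  // Every rank receives the full reduced vector of length count.
  return std::vector<std::vector<T>>(send.size(), reduced);
}
```

For three simulated ranks holding {1, 2}, {10, 20}, and {100, 200}, every rank's result is {111, 222}.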

template<typename Backend, typename T>
void Allreduce(T *buffer, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::allreduce_algo_type algo = Backend::allreduce_algo_type::automatic)

Perform an in-place Allreduce().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced. Will be replaced with the reduced vector.

  • count[in] Length of buffer in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce over.

  • algo[in] Request a particular allreduce algorithm.

template<typename Backend, typename T>
void NonblockingAllreduce(const T *sendbuf, T *recvbuf, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::allreduce_algo_type algo = Backend::allreduce_algo_type::automatic)

Perform a non-blocking Allreduce().

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced.

  • recvbuf[out] Buffer for the reduced vector.

  • count[in] Length of sendbuf and recvbuf in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular allreduce algorithm.

template<typename Backend, typename T>
void NonblockingAllreduce(T *buffer, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::allreduce_algo_type algo = Backend::allreduce_algo_type::automatic)

Perform a non-blocking in-place Allreduce().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced. Will be replaced with the reduced vector.

  • count[in] Length of buffer in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular allreduce algorithm.

template<typename Backend, typename T>
void Reduce(const T *sendbuf, T *recvbuf, size_t count, ReductionOperator op, int root, typename Backend::comm_type &comm, typename Backend::reduce_algo_type algo = Backend::reduce_algo_type::automatic)

Perform a reduce-to-one.

See Reduce.

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced.

  • recvbuf[out] Buffer for the reduced vector.

  • count[in] Length of sendbuf and recvbuf in elements of type T.

  • op[in] The reduction operation to perform.

  • root[in] Root rank for the operation.

  • comm[in] The communicator to reduce over.

  • algo[in] Request a particular reduction algorithm.

template<typename Backend, typename T>
void Reduce(T *buffer, size_t count, ReductionOperator op, int root, typename Backend::comm_type &comm, typename Backend::reduce_algo_type algo = Backend::reduce_algo_type::automatic)

Perform an in-place Reduce().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced. Will be replaced with the reduced vector.

  • count[in] Length of buffer in elements of type T.

  • op[in] The reduction operation to perform.

  • root[in] Root rank for the operation.

  • comm[in] The communicator to reduce over.

  • algo[in] Request a particular reduction algorithm.

template<typename Backend, typename T>
void NonblockingReduce(const T *sendbuf, T *recvbuf, size_t count, ReductionOperator op, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::reduce_algo_type algo = Backend::reduce_algo_type::automatic)

Perform a non-blocking Reduce().

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced.

  • recvbuf[out] Buffer for the reduced vector.

  • count[in] Length of sendbuf and recvbuf in elements of type T.

  • op[in] The reduction operation to perform.

  • root[in] Root rank for the operation.

  • comm[in] The communicator to reduce over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular reduction algorithm.

template<typename Backend, typename T>
void NonblockingReduce(T *buffer, size_t count, ReductionOperator op, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::reduce_algo_type algo = Backend::reduce_algo_type::automatic)

Perform a non-blocking in-place Reduce().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced. Will be replaced with the reduced vector.

  • count[in] Length of buffer in elements of type T.

  • op[in] The reduction operation to perform.

  • root[in] Root rank for the operation.

  • comm[in] The communicator to reduce over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular reduction algorithm.

template<typename Backend, typename T>
void Reduce_scatter(const T *sendbuf, T *recvbuf, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::reduce_scatter_algo_type algo = Backend::reduce_scatter_algo_type::automatic)

Perform a reduce-scatter.

See Reduce-scatter.

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced/scattered.

  • recvbuf[out] Buffer for the scattered portion of the reduced vector.

  • count[in] Length of recvbuf in elements of type T. sendbuf should be count * comm.size() elements.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • algo[in] Request a particular reduce-scatter algorithm.
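
The size relationship between sendbuf (count * comm.size() elements) and recvbuf (count elements) can be illustrated with a model on simulated ranks. This is a sketch in plain C++, not a call into Aluminum: model_reduce_scatter_sum is a hypothetical helper, a sum is assumed as the reduction, and it assumes (following the usual MPI convention) that rank r receives the r-th block of the reduced vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of a sum reduce-scatter over simulated ranks.
// send[r] is rank r's sendbuf with count * nranks elements.
// The result's entry r is rank r's recvbuf with count elements.
template <typename T>
std::vector<std::vector<T>> model_reduce_scatter_sum(
    const std::vector<std::vector<T>>& send, std::size_t count) {
  const std::size_t nranks = send.size();
  std::vector<std::vector<T>> recv(nranks, std::vector<T>(count, T{}));
  for (const auto& buf : send) {
    assert(buf.size() == count * nranks);
    for (std::size_t r = 0; r < nranks; ++r) {
      // Assumed layout: block r of the reduced vector goes to rank r.
      for (std::size_t i = 0; i < count; ++i) recv[r][i] += buf[r * count + i];
    }
  }
  return recv;
}
```

With two simulated ranks, count = 2, and sendbufs {1, 2, 3, 4} and {10, 20, 30, 40}, rank 0 ends up with {11, 22} and rank 1 with {33, 44}.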

template<typename Backend, typename T>
void Reduce_scatter(T *buffer, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::reduce_scatter_algo_type algo = Backend::reduce_scatter_algo_type::automatic)

Perform an in-place Reduce_scatter().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced/scattered. Will be replaced with the scattered portion of the reduced vector.

  • count[in] Length, in elements of type T, of the scattered portion of the reduced vector. buffer should be count * comm.size() elements.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • algo[in] Request a particular reduce-scatter algorithm.

template<typename Backend, typename T>
void NonblockingReduce_scatter(const T *sendbuf, T *recvbuf, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::reduce_scatter_algo_type algo = Backend::reduce_scatter_algo_type::automatic)

Perform a non-blocking Reduce_scatter().

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced/scattered.

  • recvbuf[out] Buffer for the scattered portion of the reduced vector.

  • count[in] Length of recvbuf in elements of type T. sendbuf should be count * comm.size() elements.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular reduce-scatter algorithm.

template<typename Backend, typename T>
void NonblockingReduce_scatter(T *buffer, size_t count, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::reduce_scatter_algo_type algo = Backend::reduce_scatter_algo_type::automatic)

Perform a non-blocking in-place Reduce_scatter().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced/scattered. Will be replaced with the scattered portion of the reduced vector.

  • count[in] Length, in elements of type T, of the scattered portion of the reduced vector. buffer should be count * comm.size() elements.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular reduce-scatter algorithm.

template<typename Backend, typename T>
void Reduce_scatterv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::reduce_scatterv_algo_type algo = Backend::reduce_scatterv_algo_type::automatic)

Perform a vector Reduce_scatter().

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced/scattered.

  • recvbuf[out] Buffer for the scattered portion of the reduced vector.

  • counts[in] Vector of the length of the scattered vector each rank should receive, in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • algo[in] Request a particular reduce-scatterv algorithm.
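
In the vector variant each rank may receive a different number of elements. The sketch below models this on simulated ranks in plain C++ without calling Aluminum. model_reduce_scatterv_sum is a hypothetical helper; it assumes a sum reduction, that sendbuf holds the sum of all counts, and that segments are laid out back to back in rank order, as in the MPI convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of a sum reduce-scatterv over simulated ranks.
// send[r] is rank r's sendbuf; counts[r] is the number of elements
// rank r receives. Entry r of the result is rank r's recvbuf.
template <typename T>
std::vector<std::vector<T>> model_reduce_scatterv_sum(
    const std::vector<std::vector<T>>& send,
    const std::vector<std::size_t>& counts) {
  const std::size_t nranks = send.size();
  assert(counts.size() == nranks);
  std::size_t total = 0;
  for (std::size_t c : counts) total += c;

  std::vector<std::vector<T>> recv(nranks);
  std::size_t offset = 0;  // start of rank r's segment (assumed packed layout)
  for (std::size_t r = 0; r < nranks; ++r) {
    recv[r].assign(counts[r], T{});
    for (const auto& buf : send) {
      assert(buf.size() == total);
      for (std::size_t i = 0; i < counts[r]; ++i) recv[r][i] += buf[offset + i];
    }
    offset += counts[r];
  }
  return recv;
}
```

For counts = {1, 2} and sendbufs {1, 2, 3} and {10, 20, 30}, rank 0 receives {11} and rank 1 receives {22, 33}.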

template<typename Backend, typename T>
void Reduce_scatterv(T *buffer, std::vector<size_t> counts, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::reduce_scatterv_algo_type algo = Backend::reduce_scatterv_algo_type::automatic)

Perform an in-place Reduce_scatterv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced/scattered. Will be replaced with the scattered portion of the reduced vector.

  • counts[in] Vector of the length of the scattered vector each rank should receive, in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • algo[in] Request a particular reduce-scatterv algorithm.

template<typename Backend, typename T>
void NonblockingReduce_scatterv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::reduce_scatterv_algo_type algo = Backend::reduce_scatterv_algo_type::automatic)

Perform a non-blocking Reduce_scatterv().

Parameters:
  • sendbuf[in] Buffer containing the local vector to be reduced/scattered.

  • recvbuf[out] Buffer for the scattered portion of the reduced vector.

  • counts[in] Vector of the length of the scattered vector each rank should receive, in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular reduce-scatterv algorithm.

template<typename Backend, typename T>
void NonblockingReduce_scatterv(T *buffer, std::vector<size_t> counts, ReductionOperator op, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::reduce_scatterv_algo_type algo = Backend::reduce_scatterv_algo_type::automatic)

Perform a non-blocking in-place Reduce_scatterv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector to be reduced/scattered. Will be replaced with the scattered portion of the reduced vector.

  • counts[in] Vector of the length of the scattered vector each rank should receive, in elements of type T.

  • op[in] The reduction operation to perform.

  • comm[in] The communicator to reduce/scatter over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular reduce-scatterv algorithm.

template<typename Backend, typename T>
void Allgather(const T *sendbuf, T *recvbuf, size_t count, typename Backend::comm_type &comm, typename Backend::allgather_algo_type algo = Backend::allgather_algo_type::automatic)

Perform an allgather.

See Allgather.

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector.

  • count[in] Length of sendbuf in elements of type T. recvbuf should be count * comm.size() elements.

  • comm[in] The communicator to allgather over.

  • algo[in] Request a particular allgather algorithm.
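
As an illustration of the sizes involved, the following sketch models an allgather on simulated ranks. It is plain C++ and does not use Aluminum; model_allgather is a hypothetical helper, and it assumes slices are concatenated in rank order, as in MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of an allgather over simulated ranks.
// send[r] is rank r's sendbuf with count elements. The returned vector is
// the recvbuf contents, which are the same on every rank.
template <typename T>
std::vector<T> model_allgather(const std::vector<std::vector<T>>& send,
                               std::size_t count) {
  std::vector<T> gathered;
  gathered.reserve(count * send.size());
  for (const auto& buf : send) {
    assert(buf.size() == count);
    // Assumed layout: rank r's slice occupies elements [r*count, (r+1)*count).
    gathered.insert(gathered.end(), buf.begin(), buf.end());
  }
  return gathered;
}
```

Three simulated ranks holding {1, 2}, {3, 4}, and {5, 6} yield a gathered vector of count * 3 = 6 elements: {1, 2, 3, 4, 5, 6}.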

template<typename Backend, typename T>
void Allgather(T *buffer, size_t count, typename Backend::comm_type &comm, typename Backend::allgather_algo_type algo = Backend::allgather_algo_type::automatic)

Perform an in-place Allgather().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. Will contain the gathered vector.

  • count[in] Length, in elements of type T, of the local slice of data. buffer should be count * comm.size() elements.

  • comm[in] The communicator to allgather over.

  • algo[in] Request a particular allgather algorithm.

template<typename Backend, typename T>
void NonblockingAllgather(const T *sendbuf, T *recvbuf, size_t count, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::allgather_algo_type algo = Backend::allgather_algo_type::automatic)

Perform a non-blocking Allgather().

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector.

  • count[in] Length of sendbuf in elements of type T. recvbuf should be count * comm.size() elements.

  • comm[in] The communicator to allgather over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular allgather algorithm.

template<typename Backend, typename T>
void NonblockingAllgather(T *buffer, size_t count, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::allgather_algo_type algo = Backend::allgather_algo_type::automatic)

Perform a non-blocking in-place Allgather().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. Will contain the gathered vector.

  • count[in] Length, in elements of type T, of the local slice of data. buffer should be count * comm.size() elements.

  • comm[in] The communicator to allgather over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular allgather algorithm.

template<typename Backend, typename T>
void Allgatherv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, std::vector<size_t> displs, typename Backend::comm_type &comm, typename Backend::allgatherv_algo_type algo = Backend::allgatherv_algo_type::automatic)

Perform a vector Allgather().

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector.

  • counts[in] Length of sendbuf on each rank in elements of type T.

  • displs[in] Offsets, in elements of type T, into recvbuf where data from the corresponding rank should be received.

  • comm[in] The communicator to allgatherv over.

  • algo[in] Request a particular allgatherv algorithm.
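
A common way to build displs is to pack the slices back to back, so that each offset is the sum of the counts before it. The helpers below are a sketch under that assumption; packed_displs and required_elems are hypothetical names, not Aluminum functions, and other layouts (for example with padding between slices) are equally possible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum of counts: slice r starts where slice r-1 ends.
inline std::vector<std::size_t> packed_displs(
    const std::vector<std::size_t>& counts) {
  std::vector<std::size_t> displs(counts.size(), 0);
  for (std::size_t r = 1; r < counts.size(); ++r) {
    displs[r] = displs[r - 1] + counts[r - 1];
  }
  return displs;
}

// Smallest receive-buffer length, in elements, that holds every slice
// for the given counts and displacements.
inline std::size_t required_elems(const std::vector<std::size_t>& counts,
                                  const std::vector<std::size_t>& displs) {
  assert(counts.size() == displs.size());
  std::size_t needed = 0;
  for (std::size_t r = 0; r < counts.size(); ++r) {
    needed = std::max(needed, displs[r] + counts[r]);
  }
  return needed;
}
```

For counts = {3, 0, 2, 4}, packed_displs gives {0, 3, 3, 5}, and the receive buffer needs 9 elements.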

template<typename Backend, typename T>
void Allgatherv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, typename Backend::comm_type &comm, typename Backend::allgatherv_algo_type algo = Backend::allgatherv_algo_type::automatic)

Perform an in-place Allgatherv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. Will contain the gathered vector.

  • counts[in] Length of each rank’s slice in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer where data from the corresponding rank should be received.

  • comm[in] The communicator to allgatherv over.

  • algo[in] Request a particular allgatherv algorithm.

template<typename Backend, typename T>
void NonblockingAllgatherv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, std::vector<size_t> displs, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::allgatherv_algo_type algo = Backend::allgatherv_algo_type::automatic)

Perform a non-blocking Allgatherv().

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector.

  • counts[in] Length of sendbuf on each rank in elements of type T.

  • displs[in] Offsets, in elements of type T, into recvbuf where data from the corresponding rank should be received.

  • comm[in] The communicator to allgatherv over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular allgatherv algorithm.

template<typename Backend, typename T>
void NonblockingAllgatherv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::allgatherv_algo_type algo = Backend::allgatherv_algo_type::automatic)

Perform a non-blocking in-place Allgatherv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. Will contain the gathered vector.

  • counts[in] Length of each rank’s slice in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer where data from the corresponding rank should be received.

  • comm[in] The communicator to allgatherv over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular allgatherv algorithm.

template<typename Backend>
void Barrier(typename Backend::comm_type &comm, typename Backend::barrier_algo_type algo = Backend::barrier_algo_type::automatic)

Perform a barrier synchronization.

See Barrier.

Parameters:
  • comm[in] The communicator to synchronize over.

  • algo[in] Request a particular barrier algorithm.

template<typename Backend>
void NonblockingBarrier(typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::barrier_algo_type algo = Backend::barrier_algo_type::automatic)

Perform a non-blocking Barrier().

Parameters:
  • comm[in] The communicator to synchronize over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular barrier algorithm.

template<typename Backend, typename T>
void Bcast(T *buffer, size_t count, int root, typename Backend::comm_type &comm, typename Backend::bcast_algo_type algo = Backend::bcast_algo_type::automatic)

Perform a broadcast.

Broadcast is always in-place.

See Bcast.

Parameters:
  • buffer[inout] On the root, buffer containing the data to broadcast. On other ranks, buffer that will receive the broadcast data.

  • count[in] Length of buffer in elements of type T.

  • root[in] Root rank for the operation.

  • comm[in] The communicator to broadcast over.

  • algo[in] Request a particular broadcast algorithm.

template<typename Backend, typename T>
void NonblockingBcast(T *buffer, size_t count, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::bcast_algo_type algo = Backend::bcast_algo_type::automatic)

Perform a non-blocking Bcast().

Broadcast is always in-place.

Parameters:
  • buffer[inout] On the root, buffer containing the data to broadcast. On other ranks, buffer that will receive the broadcast data.

  • count[in] Length of buffer in elements of type T.

  • root[in] Root rank for the operation.

  • comm[in] The communicator to broadcast over.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular broadcast algorithm.

template<typename Backend, typename T>
void Alltoall(const T *sendbuf, T *recvbuf, size_t count, typename Backend::comm_type &comm, typename Backend::alltoall_algo_type algo = Backend::alltoall_algo_type::automatic)

Perform an all-to-all.

See Alltoall.

Parameters:
  • sendbuf[in] Buffer containing the local vector slices.

  • recvbuf[out] Buffer for the assembled slices.

  • count[in] Length of each slice in sendbuf in elements of type T. sendbuf and recvbuf should be count * comm.size() elements.

  • comm[in] The communicator for this all-to-all operation.

  • algo[in] Request a particular all-to-all algorithm.
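
An all-to-all can be read as a transpose of slices across ranks. The sketch below models that on simulated ranks in plain C++ without calling Aluminum. model_alltoall is a hypothetical helper, and the slice ordering (slice d of a sendbuf is destined for rank d, slice s of a recvbuf came from rank s) is assumed to follow the MPI convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of an all-to-all over simulated ranks.
// send[s] is rank s's sendbuf with count * nranks elements.
// Entry d of the result is rank d's recvbuf, also count * nranks elements.
template <typename T>
std::vector<std::vector<T>> model_alltoall(
    const std::vector<std::vector<T>>& send, std::size_t count) {
  const std::size_t nranks = send.size();
  std::vector<std::vector<T>> recv(nranks, std::vector<T>(count * nranks));
  for (std::size_t src = 0; src < nranks; ++src) {
    assert(send[src].size() == count * nranks);
    for (std::size_t dst = 0; dst < nranks; ++dst) {
      // Slice dst of src's sendbuf lands in slice src of dst's recvbuf.
      for (std::size_t i = 0; i < count; ++i) {
        recv[dst][src * count + i] = send[src][dst * count + i];
      }
    }
  }
  return recv;
}
```

With two simulated ranks, count = 1, and sendbufs {1, 2} and {3, 4}, rank 0 receives {1, 3} and rank 1 receives {2, 4}.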

template<typename Backend, typename T>
void Alltoall(T *buffer, size_t count, typename Backend::comm_type &comm, typename Backend::alltoall_algo_type algo = Backend::alltoall_algo_type::automatic)

Perform an in-place Alltoall().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector slices. Will be replaced with the assembled slices.

  • count[in] Length of each slice in buffer in elements of type T. buffer should be count * comm.size() elements.

  • comm[in] The communicator for this all-to-all operation.

  • algo[in] Request a particular all-to-all algorithm.

template<typename Backend, typename T>
void NonblockingAlltoall(const T *sendbuf, T *recvbuf, size_t count, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::alltoall_algo_type algo = Backend::alltoall_algo_type::automatic)

Perform a non-blocking Alltoall().

Parameters:
  • sendbuf[in] Buffer containing the local vector slices.

  • recvbuf[out] Buffer for the assembled slices.

  • count[in] Length of each slice in sendbuf in elements of type T. sendbuf and recvbuf should be count * comm.size() elements.

  • comm[in] The communicator for this all-to-all operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular all-to-all algorithm.

template<typename Backend, typename T>
void NonblockingAlltoall(T *buffer, size_t count, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::alltoall_algo_type algo = Backend::alltoall_algo_type::automatic)

Perform a non-blocking in-place Alltoall().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector slices. Will be replaced with the assembled slices.

  • count[in] Length of each slice in buffer in elements of type T. buffer should be count * comm.size() elements.

  • comm[in] The communicator for this all-to-all operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular all-to-all algorithm.

template<typename Backend, typename T>
void Alltoallv(const T *sendbuf, std::vector<size_t> send_counts, std::vector<size_t> send_displs, T *recvbuf, std::vector<size_t> recv_counts, std::vector<size_t> recv_displs, typename Backend::comm_type &comm, typename Backend::alltoallv_algo_type algo = Backend::alltoallv_algo_type::automatic)

Perform a vector Alltoall().

Parameters:
  • sendbuf[in] Buffer containing the local vector slices.

  • send_counts[in] Length of each slice in sendbuf in elements of type T.

  • send_displs[in] Offsets, in elements of type T, into sendbuf where the data for the corresponding rank begins.

  • recvbuf[out] Buffer for the assembled slices.

  • recv_counts[in] Length of each slice that will be received in recvbuf in elements of type T.

  • recv_displs[in] Offsets, in elements of type T, into recvbuf where data from the corresponding rank should be received.

  • comm[in] Communicator for this all-to-all operation.

  • algo[in] Request a particular vector all-to-all algorithm.
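
The send and receive counts of a vector all-to-all have to agree across ranks. Assuming the MPI convention, the number of elements rank d expects from rank s equals the number rank s sends to rank d. The sketch below expresses that relationship for a whole communicator at once; matching_recv_counts is a hypothetical helper written in plain C++, not an Aluminum function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// send_counts[s][d] is the number of elements rank s sends to rank d.
// Returns recv_counts with recv_counts[d][s] == send_counts[s][d], i.e. the
// recv_counts vector each rank d would pass for a consistent exchange.
inline std::vector<std::vector<std::size_t>> matching_recv_counts(
    const std::vector<std::vector<std::size_t>>& send_counts) {
  const std::size_t nranks = send_counts.size();
  std::vector<std::vector<std::size_t>> recv_counts(
      nranks, std::vector<std::size_t>(nranks, 0));
  for (std::size_t s = 0; s < nranks; ++s) {
    assert(send_counts[s].size() == nranks);
    for (std::size_t d = 0; d < nranks; ++d) {
      recv_counts[d][s] = send_counts[s][d];
    }
  }
  return recv_counts;
}
```

In a real program no rank sees the full matrix; each rank typically learns its own row of recv_counts by exchanging counts with the other ranks first.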

template<typename Backend, typename T>
void Alltoallv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, typename Backend::comm_type &comm, typename Backend::alltoallv_algo_type algo = Backend::alltoallv_algo_type::automatic)

Perform an in-place Alltoallv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector slices for each rank. Will contain the assembled slices for this rank.

  • counts[in] Length of the slice sent to and received from each rank, in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer for data sent to and received from the corresponding rank.

  • comm[in] Communicator for this all-to-all operation.

  • algo[in] Request a particular vector all-to-all algorithm.

template<typename Backend, typename T>
void NonblockingAlltoallv(const T *sendbuf, std::vector<size_t> send_counts, std::vector<size_t> send_displs, T *recvbuf, std::vector<size_t> recv_counts, std::vector<size_t> recv_displs, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::alltoallv_algo_type algo = Backend::alltoallv_algo_type::automatic)

Perform a non-blocking Alltoallv().

Parameters:
  • sendbuf[in] Buffer containing the local vector slices.

  • send_counts[in] Length of each slice in sendbuf in elements of type T.

  • send_displs[in] Offsets, in elements of type T, into sendbuf where the data for the corresponding rank begins.

  • recvbuf[out] Buffer for the assembled slices.

  • recv_counts[in] Length of each slice that will be received in recvbuf in elements of type T.

  • recv_displs[in] Offsets, in elements of type T, into recvbuf where data from the corresponding rank should be received.

  • comm[in] Communicator for this all-to-all operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular vector all-to-all algorithm.

template<typename Backend, typename T>
void NonblockingAlltoallv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::alltoallv_algo_type algo = Backend::alltoallv_algo_type::automatic)

Perform a non-blocking in-place Alltoallv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local vector slices for each rank. Will contain the assembled slices for this rank.

  • counts[in] Length of the slice sent to and received from each rank, in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer for data sent to and received from the corresponding rank.

  • comm[in] Communicator for this all-to-all operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular vector all-to-all algorithm.

template<typename Backend, typename T>
void Gather(const T *sendbuf, T *recvbuf, size_t count, int root, typename Backend::comm_type &comm, typename Backend::gather_algo_type algo = Backend::gather_algo_type::automatic)

Perform a gather-to-one.

See Gather.

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector on the root.

  • count[in] Length of each local slice in elements of type T. sendbuf should be count elements and recvbuf should be count * comm.size() elements on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • algo[in] Request a particular gather algorithm.

template<typename Backend, typename T>
void Gather(T *buffer, size_t count, int root, typename Backend::comm_type &comm, typename Backend::gather_algo_type algo = Backend::gather_algo_type::automatic)

Perform an in-place Gather().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. On the root, its slice must be in the location corresponding to its rank position. On non-roots, the entire buffer is the slice. Will be replaced with the gathered vector on the root.

  • count[in] Length of each local slice in elements of type T. buffer should be count elements on non-roots and count * comm.size() elements on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • algo[in] Request a particular gather algorithm.
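
The in-place buffer layout differs between the root and the other ranks. The sketch below builds the buffer each rank would pass, reading "the location corresponding to its rank position" as element offset rank * count; that reading is an assumption. make_inplace_gather_buffer is a hypothetical helper in plain C++ and does not call Aluminum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Build the buffer a rank would pass to an in-place gather.
// slice holds this rank's count elements.
template <typename T>
std::vector<T> make_inplace_gather_buffer(const std::vector<T>& slice,
                                          std::size_t rank, std::size_t root,
                                          std::size_t nranks) {
  assert(rank < nranks && root < nranks);
  const std::size_t count = slice.size();
  // Non-root: the entire buffer is the slice (count elements).
  if (rank != root) return slice;
  // Root: count * nranks elements, with its own slice at offset rank * count
  // (assumed layout). The other positions are filled in by the gather.
  std::vector<T> buffer(count * nranks, T{});
  std::copy(slice.begin(), slice.end(),
            buffer.begin() + static_cast<std::ptrdiff_t>(rank * count));
  return buffer;
}
```

For a slice {7, 8} on rank 1 of 3 with root 1, the prepared buffer is {0, 0, 7, 8, 0, 0}; on a non-root rank the buffer is just {7, 8}.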

template<typename Backend, typename T>
void NonblockingGather(const T *sendbuf, T *recvbuf, size_t count, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::gather_algo_type algo = Backend::gather_algo_type::automatic)

Perform a non-blocking Gather().

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector on the root.

  • count[in] Length of each local slice in elements of type T. sendbuf should be count elements and recvbuf should be count * comm.size() elements on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular gather algorithm.

template<typename Backend, typename T>
void NonblockingGather(T *buffer, size_t count, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::gather_algo_type algo = Backend::gather_algo_type::automatic)

Perform a non-blocking in-place Gather().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. On the root, its slice must be in the location corresponding to its rank position. On non-roots, the entire buffer is the slice. Will be replaced with the gathered vector on the root.

  • count[in] Length of each local slice in elements of type T. buffer should be count elements on non-roots and count * comm.size() elements on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular gather algorithm.

template<typename Backend, typename T>
void Gatherv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::gatherv_algo_type algo = Backend::gatherv_algo_type::automatic)

Perform a vector Gather().

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector on the root.

  • counts[in] Length of each rank’s slice in elements of type T.

  • displs[in] Offsets, in elements of type T, into recvbuf where data from the corresponding rank will be received on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • algo[in] Request a particular vector gather algorithm.
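
The counts and displs arguments are usually derived together. The following is a minimal sketch, not part of Aluminum, that assumes the slices are packed back to back in rank order in recvbuf. The helper names packed_displs and required_elements are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Aluminum function): displacements for slices
// packed contiguously in rank order, i.e. an exclusive prefix sum of counts.
std::vector<std::size_t> packed_displs(const std::vector<std::size_t>& counts) {
  std::vector<std::size_t> displs(counts.size());
  std::size_t offset = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    displs[i] = offset;
    offset += counts[i];
  }
  return displs;
}

// Hypothetical helper: smallest number of elements recvbuf needs on the root
// so that every range [displs[i], displs[i] + counts[i]) fits.
std::size_t required_elements(const std::vector<std::size_t>& counts,
                              const std::vector<std::size_t>& displs) {
  assert(counts.size() == displs.size());
  std::size_t needed = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    needed = std::max(needed, displs[i] + counts[i]);
  }
  return needed;
}
```

The resulting vectors could then be passed as the counts and displs arguments of Gatherv(), with recvbuf on the root sized to at least required_elements(counts, displs) elements.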

template<typename Backend, typename T>
void Gatherv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::gatherv_algo_type algo = Backend::gatherv_algo_type::automatic)

Perform an in-place Gatherv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. On the root, its slice must be in the location corresponding to its rank position. On non-roots, the entire buffer is the slice. Will be replaced with the gathered vector on the root.

  • counts[in] Length of each rank’s slice in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer where data from the corresponding rank will be received on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • algo[in] Request a particular vector gather algorithm.

template<typename Backend, typename T>
void NonblockingGatherv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::gatherv_algo_type algo = Backend::gatherv_algo_type::automatic)

Perform a non-blocking Gatherv().

Parameters:
  • sendbuf[in] Buffer containing the local slice of data.

  • recvbuf[out] Buffer for the gathered vector on the root.

  • counts[in] Length of each rank’s slice in elements of type T.

  • displs[in] Offsets, in elements of type T, into recvbuf where data from the corresponding rank will be received on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular vector gather algorithm.

template<typename Backend, typename T>
void NonblockingGatherv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::gatherv_algo_type algo = Backend::gatherv_algo_type::automatic)

Perform a non-blocking in-place Gatherv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local slice of data. On the root, its slice must be in the location corresponding to its rank position. On non-roots, the entire buffer is the slice. Will be replaced with the gathered vector on the root.

  • counts[in] Length of each rank’s slice in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer where data from the corresponding rank will be received on the root.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this gather operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular vector gather algorithm.

template<typename Backend, typename T>
void Scatter(const T *sendbuf, T *recvbuf, size_t count, int root, typename Backend::comm_type &comm, typename Backend::scatter_algo_type algo = Backend::scatter_algo_type::automatic)

Perform a scatter-to-all.

See Scatter.

Parameters:
  • sendbuf[in] Buffer containing the complete vector at the root. Empty on non-roots.

  • recvbuf[out] Buffer for the scattered slice.

  • count[in] Length of each scattered slice in elements of type T. sendbuf should be count * comm.size() elements on the root and empty on non-roots. recvbuf should be count elements.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • algo[in] Request a particular scatter algorithm.
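
To make the buffer sizes concrete, here is a local reference model that performs no communication and is not part of Aluminum. It assumes the root's sendbuf stores the slices in rank order, so rank r receives elements [r * count, (r + 1) * count). The function name scatter_slice is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local model only (not an Aluminum function): returns the slice that `rank`
// would receive if the root scattered `root_sendbuf` with `count` elements
// per rank, assuming slices are stored in rank order.
template <typename T>
std::vector<T> scatter_slice(const std::vector<T>& root_sendbuf,
                             std::size_t count, int rank) {
  assert(rank >= 0);
  const std::size_t begin = static_cast<std::size_t>(rank) * count;
  assert(begin + count <= root_sendbuf.size());
  const auto first = root_sendbuf.begin() + static_cast<std::ptrdiff_t>(begin);
  return std::vector<T>(first, first + static_cast<std::ptrdiff_t>(count));
}
```

With a six-element vector on the root and count = 2, this model gives rank 1 the third and fourth elements.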

template<typename Backend, typename T>
void Scatter(T *buffer, size_t count, int root, typename Backend::comm_type &comm, typename Backend::scatter_algo_type algo = Backend::scatter_algo_type::automatic)

Perform an in-place Scatter().

Parameters:
  • buffer[inout] Input and output buffer initially containing the complete vector at the root and empty on non-roots. Will be replaced with the scattered slice on each rank. At the root, the scattered slice is in its corresponding rank position.

  • count[in] Length of each scattered slice in elements of type T. buffer should be count * comm.size() elements on the root and count elements on non-roots.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • algo[in] Request a particular scatter algorithm.

template<typename Backend, typename T>
void NonblockingScatter(const T *sendbuf, T *recvbuf, size_t count, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::scatter_algo_type algo = Backend::scatter_algo_type::automatic)

Perform a non-blocking Scatter().

Parameters:
  • sendbuf[in] Buffer containing the complete vector at the root. Empty on non-roots.

  • recvbuf[out] Buffer for the scattered slice.

  • count[in] Length of each scattered slice in elements of type T. sendbuf should be count * comm.size() elements on the root and empty on non-roots. recvbuf should be count elements.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular scatter algorithm.

template<typename Backend, typename T>
void NonblockingScatter(T *buffer, size_t count, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::scatter_algo_type algo = Backend::scatter_algo_type::automatic)

Perform a non-blocking in-place Scatter().

Parameters:
  • buffer[inout] Input and output buffer initially containing the complete vector at the root and empty on non-roots. Will be replaced with the scattered slice on each rank. At the root, the scattered slice is in its corresponding rank position.

  • count[in] Length of each scattered slice in elements of type T. buffer should be count * comm.size() elements on the root and count elements on non-roots.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular scatter algorithm.

template<typename Backend, typename T>
void Scatterv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::scatterv_algo_type algo = Backend::scatterv_algo_type::automatic)

Perform a vector Scatter().

Parameters:
  • sendbuf[in] Buffer containing the complete vector at the root. Empty on non-roots.

  • recvbuf[out] Buffer for the scattered slice.

  • counts[in] Length of the slice each rank will receive, in elements of type T.

  • displs[in] Offsets, in elements of type T, into sendbuf where the slices for each corresponding rank begin.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • algo[in] Request a particular vector scatter algorithm.
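
Unlike Scatter(), the vector variant lets slices differ in length and leaves room for gaps in sendbuf. The local model below performs no communication and is not part of Aluminum. It is a sketch of how counts and displs select each rank's slice, and the name scatterv_slice is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local model only (not an Aluminum function): returns the elements
// [displs[rank], displs[rank] + counts[rank]) of the root's send buffer.
template <typename T>
std::vector<T> scatterv_slice(const std::vector<T>& root_sendbuf,
                              const std::vector<std::size_t>& counts,
                              const std::vector<std::size_t>& displs,
                              std::size_t rank) {
  assert(counts.size() == displs.size());
  assert(rank < counts.size());
  assert(displs[rank] + counts[rank] <= root_sendbuf.size());
  const auto first =
      root_sendbuf.begin() + static_cast<std::ptrdiff_t>(displs[rank]);
  return std::vector<T>(first,
                        first + static_cast<std::ptrdiff_t>(counts[rank]));
}
```

For instance, counts = {2, 3, 1} with displs = {0, 3, 7} skips elements 2, 5 and 6 of an eight-element send buffer.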

template<typename Backend, typename T>
void Scatterv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::scatterv_algo_type algo = Backend::scatterv_algo_type::automatic)

Perform an in-place Scatterv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the complete vector at the root and empty on non-roots. Will be replaced with the scattered slice on each rank. At the root, the scattered slice is in its corresponding rank position.

  • counts[in] Length of the slice each rank will receive, in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer where the slices for each corresponding rank begin.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • algo[in] Request a particular vector scatter algorithm.

template<typename Backend, typename T>
void NonblockingScatterv(const T *sendbuf, T *recvbuf, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::scatterv_algo_type algo = Backend::scatterv_algo_type::automatic)

Perform a non-blocking Scatterv().

Parameters:
  • sendbuf[in] Buffer containing the complete vector at the root. Empty on non-roots.

  • recvbuf[out] Buffer for the scattered slice.

  • counts[in] Length of the slice each rank will receive, in elements of type T.

  • displs[in] Offsets, in elements of type T, into sendbuf where the slices for each corresponding rank begin.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular vector scatter algorithm.

template<typename Backend, typename T>
void NonblockingScatterv(T *buffer, std::vector<size_t> counts, std::vector<size_t> displs, int root, typename Backend::comm_type &comm, typename Backend::req_type &req, typename Backend::scatterv_algo_type algo = Backend::scatterv_algo_type::automatic)

Perform a non-blocking in-place Scatterv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the complete vector at the root and empty on non-roots. Will be replaced with the scattered slice on each rank. At the root, the scattered slice is in its corresponding rank position.

  • counts[in] Length of the slice each rank will receive, in elements of type T.

  • displs[in] Offsets, in elements of type T, into buffer where the slices for each corresponding rank begin.

  • root[in] Root rank for the operation.

  • comm[in] The communicator for this scatter operation.

  • req[out] Request object for the asynchronous operation.

  • algo[in] Request a particular vector scatter algorithm.

template<typename Backend, typename T>
void Send(const T *sendbuf, size_t count, int dest, typename Backend::comm_type &comm)

Send a point-to-point message.

See Send and Recv.

Parameters:
  • sendbuf[in] Buffer containing the local data to send.

  • count[in] Length of sendbuf in elements of type T.

  • dest[in] Rank in comm to send to.

  • comm[in] Communicator to send within.

template<typename Backend, typename T>
void NonblockingSend(const T *sendbuf, size_t count, int dest, typename Backend::comm_type &comm, typename Backend::req_type &req)

Perform a non-blocking Send().

Parameters:
  • sendbuf[in] Buffer containing the local data to send.

  • count[in] Length of sendbuf in elements of type T.

  • dest[in] Rank in comm to send to.

  • comm[in] Communicator to send within.

  • req[out] Request object for the asynchronous operation.

template<typename Backend, typename T>
void Recv(T *recvbuf, size_t count, int src, typename Backend::comm_type &comm)

Receive a point-to-point message.

See Send and Recv.

Parameters:
  • recvbuf[out] Buffer to receive the sent data.

  • count[in] Length of recvbuf in elements of type T.

  • src[in] Rank in comm to receive from.

  • comm[in] Communicator to receive within.

template<typename Backend, typename T>
void NonblockingRecv(T *recvbuf, size_t count, int src, typename Backend::comm_type &comm, typename Backend::req_type &req)

Perform a non-blocking Recv().

Parameters:
  • recvbuf[out] Buffer to receive the sent data.

  • count[in] Length of recvbuf in elements of type T.

  • src[in] Rank in comm to receive from.

  • comm[in] Communicator to receive within.

  • req[out] Request object for the asynchronous operation.

template<typename Backend, typename T>
void SendRecv(const T *sendbuf, size_t send_count, int dest, T *recvbuf, size_t recv_count, int src, typename Backend::comm_type &comm)

Perform a simultaneous Send() and Recv().

See SendRecv.

Parameters:
  • sendbuf[in] Buffer containing the local data to send.

  • send_count[in] Length of sendbuf in elements of type T.

  • dest[in] Rank in comm to send to.

  • recvbuf[out] Buffer to receive the sent data.

  • recv_count[in] Length of recvbuf in elements of type T.

  • src[in] Rank in comm to receive from.

  • comm[in] Communicator to send/recv within.

template<typename Backend, typename T>
void SendRecv(T *buffer, size_t count, int dest, int src, typename Backend::comm_type &comm)

Perform an in-place SendRecv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local data to send. Will be replaced with the received data.

  • count[in] Length of the data to send and receive, in elements of type T. buffer should be count elements.

  • dest[in] Rank in comm to send to.

  • src[in] Rank in comm to receive from.

  • comm[in] Communicator to send/recv within.
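
A common use of a combined send and receive is a ring shift, where each rank sends to its right neighbor and receives from its left one. The sketch below only computes the dest and src arguments for that pattern. It is not part of Aluminum, and the names RingNeighbors and ring_neighbors are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper type: the two ranks involved in one ring-shift step.
struct RingNeighbors {
  int dest;  // rank to send to (right neighbor)
  int src;   // rank to receive from (left neighbor)
};

// Hypothetical helper (not an Aluminum function): neighbors of `rank` in a
// ring of `size` ranks, wrapping around at both ends.
RingNeighbors ring_neighbors(int rank, int size) {
  assert(size > 0);
  assert(rank >= 0 && rank < size);
  return RingNeighbors{(rank + 1) % size, (rank + size - 1) % size};
}
```

The returned dest and src could then be passed to either SendRecv() overload. With a single rank, both neighbors are the rank itself.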

template<typename Backend, typename T>
void NonblockingSendRecv(const T *sendbuf, size_t send_count, int dest, T *recvbuf, size_t recv_count, int src, typename Backend::comm_type &comm, typename Backend::req_type &req)

Perform a non-blocking SendRecv().

Parameters:
  • sendbuf[in] Buffer containing the local data to send.

  • send_count[in] Length of sendbuf in elements of type T.

  • dest[in] Rank in comm to send to.

  • recvbuf[out] Buffer to receive the sent data.

  • recv_count[in] Length of recvbuf in elements of type T.

  • src[in] Rank in comm to receive from.

  • comm[in] Communicator to send/recv within.

  • req[out] Request object for the asynchronous operation.

template<typename Backend, typename T>
void NonblockingSendRecv(T *buffer, size_t count, int dest, int src, typename Backend::comm_type &comm, typename Backend::req_type &req)

Perform a non-blocking in-place SendRecv().

Parameters:
  • buffer[inout] Input and output buffer initially containing the local data to send. Will be replaced with the received data.

  • count[in] Length of the data to send and receive, in elements of type T. buffer should be count elements.

  • dest[in] Rank in comm to send to.

  • src[in] Rank in comm to receive from.

  • comm[in] Communicator to send/recv within.

  • req[out] Request object for the asynchronous operation.

template<typename Backend, typename T>
void MultiSendRecv(std::vector<const T*> send_buffers, std::vector<size_t> send_counts, std::vector<int> dests, std::vector<T*> recv_buffers, std::vector<size_t> recv_counts, std::vector<int> srcs, typename Backend::comm_type &comm)

Perform an arbitrary sequence of Send() and Recv() operations.

See MultiSendRecv.

Parameters:
  • send_buffers[in] Vector of buffers containing the local data to send.

  • send_counts[in] Vector of the lengths of each buffer in send_buffers in elements of type T.

  • dests[in] Vector of the destination rank to send each buffer to.

  • recv_buffers[out] Vector of buffers to receive data in.

  • recv_counts[in] Vector of the lengths of each buffer in recv_buffers in elements of type T.

  • srcs[in] Vector of the ranks to receive from.

  • comm[in] Communicator to send/recv within.
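
Because the arguments are parallel vectors, a mismatch in their lengths is an easy mistake to make. The following is a sketch of a pre-flight check, not part of Aluminum. It assumes that entry i of a buffer vector, its count vector and its rank vector together describe one transfer. The name transfers_consistent and the comm_size parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-flight check (not an Aluminum function). Returns true if
// the three parallel vectors have equal length and every rank lies in
// [0, comm_size). Works for both the send side and the receive side.
template <typename Ptr>
bool transfers_consistent(const std::vector<Ptr>& buffers,
                          const std::vector<std::size_t>& counts,
                          const std::vector<int>& ranks, int comm_size) {
  assert(comm_size > 0);
  if (buffers.size() != counts.size() || buffers.size() != ranks.size()) {
    return false;
  }
  for (int rank : ranks) {
    if (rank < 0 || rank >= comm_size) {
      return false;
    }
  }
  return true;
}
```

An application might call it once with send_buffers, send_counts and dests, and once with recv_buffers, recv_counts and srcs, before issuing the operation.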

template<typename Backend, typename T>
void MultiSendRecv(std::vector<T*> buffers, std::vector<size_t> counts, std::vector<int> dests, std::vector<int> srcs, typename Backend::comm_type &comm)

Perform an in-place MultiSendRecv().

Parameters:
  • buffers[inout] Vector of input and output buffers initially containing the local data to send. Will be replaced with the received data.

  • counts[in] Vector of the lengths of data to send and receive, in elements of type T.

  • dests[in] Vector of the destination rank to send each buffer to.

  • srcs[in] Vector of the ranks to receive from.

  • comm[in] Communicator to send/recv within.

template<typename Backend, typename T>
void NonblockingMultiSendRecv(std::vector<const T*> send_buffers, std::vector<size_t> send_counts, std::vector<int> dests, std::vector<T*> recv_buffers, std::vector<size_t> recv_counts, std::vector<int> srcs, typename Backend::comm_type &comm, typename Backend::req_type &req)

Perform a non-blocking MultiSendRecv().

Parameters:
  • send_buffers[in] Vector of buffers containing the local data to send.

  • send_counts[in] Vector of the lengths of each buffer in send_buffers in elements of type T.

  • dests[in] Vector of the destination rank to send each buffer to.

  • recv_buffers[out] Vector of buffers to receive data in.

  • recv_counts[in] Vector of the lengths of each buffer in recv_buffers in elements of type T.

  • srcs[in] Vector of the ranks to receive from.

  • comm[in] Communicator to send/recv within.

  • req[out] Request object for the asynchronous operation.

template<typename Backend, typename T>
void NonblockingMultiSendRecv(std::vector<T*> buffers, std::vector<size_t> counts, std::vector<int> dests, std::vector<int> srcs, typename Backend::comm_type &comm, typename Backend::req_type &req)

Perform a non-blocking in-place MultiSendRecv().

Parameters:
  • buffers[inout] Vector of input and output buffers initially containing the local data to send. Will be replaced with the received data.

  • counts[in] Vector of the lengths of data to send and receive, in elements of type T.

  • dests[in] Vector of the destination rank to send each buffer to.

  • srcs[in] Vector of the ranks to receive from.

  • comm[in] Communicator to send/recv within.

  • req[out] Request object for the asynchronous operation.

template<typename Backend>
bool Test(typename Backend::req_type &req)

Return true if the asynchronous operation associated with req has completed.

This does not block. If the operation has completed, req will be reset to Backend::null_req.

See Non-Blocking Operations.

Parameters:

req[inout] Request object for the asynchronous operation.
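
Since Test() does not block, it can be polled while other work proceeds. The sketch below shows that polling pattern in isolation. It is not part of Aluminum: poll_with_work is a hypothetical helper that takes the completion check as a callable, so it runs without any communication library.

```cpp
#include <cassert>

// Hypothetical helper (not an Aluminum function): run `work` between polls
// until `test` reports completion. Returns how many times `work` ran.
// `test` is not called again after it first returns true.
template <typename TestFn, typename WorkFn>
int poll_with_work(TestFn&& test, WorkFn&& work) {
  int work_items = 0;
  while (!test()) {
    work();
    ++work_items;
  }
  return work_items;
}
```

In an application, the test callable might be a lambda that forwards to Test() for a request returned by one of the non-blocking operations above, and work would be a unit of computation that does not depend on the pending result.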

template<typename Backend>
void Wait(typename Backend::req_type &req)

Wait until the asynchronous operation associated with req has completed.

This blocks the compute stream associated with the operation.

req will be reset to Backend::null_req after the operation completes.

See Non-Blocking Operations.

Parameters:

req[inout] Request object for the asynchronous operation.

namespace ext