nccl

Description

nccl website

The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and Networking. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter as well as point-to-point send and receive that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox Network across nodes.

Environment Modules

Run module spider nccl to find out what environment modules are available for this application.

Environment Variables

  • HPC_NCCL_DIR - Installation Directory
  • HPC_NCCL_BIN - Executable Directory
  • HPC_NCCL_LIB - Library Directory
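After loading the nccl module, these variables make it easy to compile and link your own code against the library. A minimal sketch, assuming the headers live under $HPC_NCCL_DIR/include and with my_allreduce.cu standing in for your own source file:

```shell
# Load the NCCL environment module to set the HPC_NCCL_* variables
module load nccl

# Compile and link against the module's NCCL (paths assumed, see above)
nvcc my_allreduce.cu -o my_allreduce \
    -I"${HPC_NCCL_DIR}/include" \
    -L"${HPC_NCCL_LIB}" -lnccl \
    -Xlinker -rpath -Xlinker "${HPC_NCCL_LIB}"
```

The rpath linker flag embeds the library directory so the binary can find libnccl.so at run time without additional LD_LIBRARY_PATH changes.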

Additional Usage Information

NCCL helps increase the throughput of analyses by allowing the scheduler to provision GPUs on multiple compute nodes for multi-GPU jobs instead of a single node. The hpg-b200 partition is configured as an NVIDIA SuperPOD, which allows jobs to achieve the highest distributed efficiency without relying solely on intra-node NVLink connections.

A common error in setting up such jobs is selecting the wrong network interfaces for the NVIDIA SuperPOD network topology, or letting processes use all network interfaces on the node, because only some InfiniBand (IB) interfaces are used for GPU communication.
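When diagnosing interface selection, NCCL's built-in debug logging can confirm which interfaces and HCAs were actually chosen. NCCL_DEBUG and NCCL_DEBUG_SUBSYS are standard NCCL environment variables:

```shell
# Ask NCCL to log its initialization decisions, including which
# socket interface and IB HCAs it selected for communication
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET
```

With these set, each rank prints lines at startup identifying the selected NET/IB devices, which can be checked against the recommended settings below.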

We recommend the following settings for distributed SuperPOD jobs on the HiPerGator hpg-b200 partition:

  • NCCL_SOCKET_IFNAME, which specifies which IP network interface NCCL uses for socket-based (bootstrap) communication.

    NCCL_SOCKET_IFNAME=bridge-1145

  • NCCL_IB_HCA, which specifies which Host Channel Adapter (RDMA) interfaces to use for communication.

    NCCL_IB_HCA=mlx5_15,mlx5_10,mlx5_14,mlx5_13,mlx5_8,mlx5_7,mlx5_9,mlx5_4

The above settings will direct NCCL to use all of the interfaces on the compute fabric, and only those.
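In a Slurm batch script, these settings can be applied with export before launching the application. A minimal sketch (the srun command line and script name are illustrative placeholders, not part of the recommendation):

```shell
# Recommended NCCL settings for the hpg-b200 SuperPOD fabric
export NCCL_SOCKET_IFNAME=bridge-1145
export NCCL_IB_HCA=mlx5_15,mlx5_10,mlx5_14,mlx5_13,mlx5_8,mlx5_7,mlx5_9,mlx5_4

# Launch the distributed application (illustrative placeholder)
# srun python train.py
```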

Categories

programming, library, gpu