NVLink
About NVLink
NVLink is supported only on instance flavors that include 8× NVIDIA H100 or H200 GPUs. In this configuration, NVLink provides high-bandwidth, low-latency GPU-to-GPU communication, enabling:
Faster model training, especially for large models that require frequent inter-GPU data exchange
Improved scaling efficiency when using distributed training frameworks (e.g., Megatron-LM, DeepSpeed, PyTorch FSDP)
Reduced communication bottlenecks compared to PCIe-only GPU connectivity
Higher overall compute throughput for workloads that depend on multi-GPU synchronization
Instances with fewer than 8 GPUs do not support NVLink.
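Communication libraries such as NCCL use NVLink automatically once it is active. As a quick sanity check during a distributed run, you can enable NCCL's debug logging to see which transport is selected between GPU pairs; this is a minimal sketch, and train.py is a placeholder for your own training script:
# NCCL logs its transport selection (NVLink peer-to-peer on 8x H100/H200) when debug output is enabled
NCCL_DEBUG=INFO torchrun --nproc_per_node=8 train.py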
Enable NVLink for 8× GPU Flavors
Note: NVLink is not enabled by default in FPT-provided images. Users must manually enable NVLink if required.
To enable NVLink support, follow the steps below:
1. Open the file:
/etc/default/grub.d/00-fci-grub.cfg
2. Locate the following line and remove or comment it out:
GRUB_CMDLINE_LINUX_DEFAULT="nvidia.NVreg_NvLinkDisable=1"
3. Update the GRUB configuration:
sudo update-grub
4. Reboot the instance so the updated kernel command line takes effect.
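As an alternative to editing the file by hand, the same change can be scripted. This is a minimal sketch that assumes the file path and line contents shown above:
# Comment out the kernel parameter that disables NVLink, regenerate the GRUB config, then reboot
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="nvidia.NVreg_NvLinkDisable=1"/#&/' /etc/default/grub.d/00-fci-grub.cfg
sudo update-grub
sudo reboot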
Verification
To verify that NVLink has been enabled successfully, run the following commands.
1. Check NVLink status
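For example, query the per-link status with nvidia-smi:
nvidia-smi nvlink --status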
The output should show active NVLink connections with their link speeds (e.g., 25 GB/s per lane).
2. Check GPU topology
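For example, print the GPU topology matrix with:
nvidia-smi topo -m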
The output should display NVLink (NV#) connections between all GPUs in the topology matrix. The legend below explains the connection types:
Legend
X : Same GPU
SYS : PCIe + inter-NUMA interconnect
NODE: PCIe + interconnect within one NUMA node
PHB : PCIe Host Bridge
PXB : Multiple PCIe bridges
PIX : Single PCIe bridge
NV# : NVLink connection with # bonded links
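Per-link capabilities, such as peer-to-peer and atomics support, can also be inspected; the GPU index 0 below is only an example:
nvidia-smi nvlink --capabilities -i 0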
Troubleshooting
If you encounter errors such as:
The error "system not yet initialized"
Issues when calling torch.cuda.device_count() or torch.cuda.get_device_name(i)
Restart the NVIDIA Fabric Manager:
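On systemd-based images the service is typically named nvidia-fabricmanager:
sudo systemctl restart nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager --no-pager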
Then retry your application or verification steps.