# NVLink

### About NVLink

NVLink is supported **only** on instance flavors that include **8× NVIDIA H100 or H200 GPUs**.\
In this configuration, NVLink provides high-bandwidth, low-latency GPU-to-GPU communication, enabling:

* **Faster model training**, especially for large models that require frequent inter-GPU data exchange
* **Improved scaling efficiency** when using distributed training frameworks (e.g., Megatron-LM, DeepSpeed, PyTorch FSDP)
* **Reduced communication bottlenecks** compared to PCIe-only GPU connectivity
* **Higher overall compute throughput** for workloads that depend on multi-GPU synchronization

Instances with fewer than 8 GPUs do **not** support NVLink.

### Enable NVLink on 8× GPU Flavors

> **Note:** NVLink is **not enabled by default** in FPT-provided images.\
> Users must manually enable NVLink if required.

**To enable NVLink support**, follow the steps below:

* Open the file:\
  **`/etc/default/grub.d/00-fci-grub.cfg`**
* Locate the following line and **remove or comment it out**:

```
GRUB_CMDLINE_LINUX_DEFAULT="nvidia.NVreg_NvLinkDisable=1" 
```

* Update the GRUB configuration, then reboot the instance so that the change to the kernel command line takes effect:

```
sudo update-grub 
```
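The file edit described above can also be scripted. The following is a minimal sketch, assuming the parameter sits alone on its `GRUB_CMDLINE_LINUX_DEFAULT` line exactly as shown; the function name `comment_out_nvlink_disable` is hypothetical. It comments the line out rather than deleting it and keeps a `.bak` copy of the original file.

```bash
#!/usr/bin/env bash
# Sketch: comment out the NVLink-disable kernel parameter in a GRUB drop-in file.
# Assumption: the parameter sits alone on its line, as in the example above.
# If the line carries other kernel parameters, edit it by hand instead.

comment_out_nvlink_disable() {
    local cfg="$1"
    # -i.bak keeps a backup copy next to the original file.
    # Lines that are already commented out are left unchanged.
    sed -i.bak -E '/^[[:space:]]*#/! s/^(.*nvidia\.NVreg_NvLinkDisable=1.*)$/# \1/' "$cfg"
}

# Usage (run as root, then regenerate the GRUB configuration):
#   comment_out_nvlink_disable /etc/default/grub.d/00-fci-grub.cfg
#   update-grub
```

Commenting out instead of deleting makes the change easy to revert: remove the leading `# ` or restore the `.bak` file.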

***

### Verification

To verify that NVLink has been enabled successfully, run the following commands.

#### 1. Check NVLink status

```bash
nvidia-smi nvlink --status
```

The output should list the **active NVLink links** of each GPU together with their speeds (e.g., about **25 GB/s per link**).
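For a quick summary instead of reading every line, the status output can be piped through a small filter. This is a sketch under an assumption about the output format: active links are expected to appear as `Link N: <speed> GB/s`, and any other `Link N:` line (for example `<inactive>`) is counted as not active. The function name `summarize_nvlink_status` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: summarize `nvidia-smi nvlink --status` output read from stdin.
# Assumption: active links appear as "Link N: <speed> GB/s"; any other
# "Link N:" line (for example "<inactive>") is counted as not active.

summarize_nvlink_status() {
    awk '
        /Link [0-9]+:/ {
            if ($0 ~ /Link [0-9]+: *[0-9.]+ GB\/s/) active++
            else inactive++
        }
        END { printf "active=%d inactive=%d\n", active, inactive }
    '
}

# Usage:
#   nvidia-smi nvlink --status | summarize_nvlink_status
```

If NVLink is still disabled, the expectation is a summary with `active=0`; check the output format on your driver version before relying on the counts.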

#### 2. Check GPU topology

```bash
nvidia-smi topo -m
```

The output should display **NVLink (NV##)** connections between all GPUs, similar to the example below:

```
        GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   CPU Affinity   NUMA Affinity
GPU0     X    NV18   NV18   NV18   NV18   NV18   NV18   NV18   0-127          0-1
GPU1   NV18     X    NV18   NV18   NV18   NV18   NV18   NV18   0-127          0-1
GPU2   NV18   NV18     X    NV18   NV18   NV18   NV18   NV18   0-127          0-1
GPU3   NV18   NV18   NV18     X    NV18   NV18   NV18   NV18   0-127          0-1
GPU4   NV18   NV18   NV18   NV18     X    NV18   NV18   NV18   0-127          0-1
GPU5   NV18   NV18   NV18   NV18   NV18     X    NV18   NV18   0-127          0-1
GPU6   NV18   NV18   NV18   NV18   NV18   NV18     X    NV18   0-127          0-1
GPU7   NV18   NV18   NV18   NV18   NV18   NV18   NV18     X    0-127          0-1
```

#### Legend

* **X** : Same GPU
* **SYS** : PCIe + inter-NUMA interconnect
* **NODE**: PCIe + interconnect within one NUMA node
* **PHB** : PCIe Host Bridge
* **PXB** : Multiple PCIe bridges
* **PIX** : Single PCIe bridge
* **NV#** : NVLink connection with # bonded links
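The topology check can be automated along the same lines. The sketch below reads `nvidia-smi topo -m` output from stdin and reports any GPU pair whose matrix cell is neither `X` nor `NV#`. It assumes the layout of the example above: matrix rows start at the beginning of the line with `GPU<index>`, and the GPU columns come first in the same order as the rows. The function name `check_all_nvlink` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that every GPU-to-GPU cell in `nvidia-smi topo -m` output
# (read from stdin) is either X or NV<number>.
# Assumption: matrix rows start at column 0 with "GPU<index>" (the header
# line is indented), and the GPU columns come first, in row order.

check_all_nvlink() {
    awk '
        /^GPU[0-9]+[ \t]/ { rows[++n] = $0 }
        END {
            if (n == 0) { print "no GPU rows found"; exit 2 }
            for (r = 1; r <= n; r++) {
                split(rows[r], f)            # f[1] is the row label
                for (c = 2; c <= n + 1; c++) {
                    if (f[c] != "X" && f[c] !~ /^NV[0-9]+$/) {
                        print f[1] " -> GPU" (c - 2) ": " f[c]
                        bad = 1
                    }
                }
            }
            if (bad) exit 1
            print "all " n " GPUs are connected over NVLink"
        }
    '
}

# Usage:
#   nvidia-smi topo -m | check_all_nvlink
```

The function exits with status 0 only when every pair is connected over NVLink, so it can serve as a gate in a provisioning script.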

***

### Troubleshooting

If you encounter errors such as:

* `system not yet initialized`
* Issues when calling `torch.cuda.device_count()` or `torch.cuda.get_device_name(i)`

Restart the NVIDIA Fabric Manager:

```bash
sudo systemctl restart nvidia-fabricmanager
```

Then retry your application or verification steps.
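Fabric Manager may need a few seconds to come up after a restart, so retrying immediately can fail again. The following sketch polls a command until it succeeds; the helper name `wait_until` and the attempt count in the usage comment are illustrative, not part of any NVIDIA tooling.

```bash
#!/usr/bin/env bash
# Sketch: poll a command until it succeeds or the attempts run out.
# The helper name `wait_until` and the attempt counts are illustrative.

wait_until() {
    local attempts="$1"; shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if ((i < attempts)); then
            sleep 1
        fi
    done
    return 1
}

# Usage (restart Fabric Manager, then wait up to about 30 s for it to be active):
#   sudo systemctl restart nvidia-fabricmanager
#   wait_until 30 systemctl is-active --quiet nvidia-fabricmanager \
#       && nvidia-smi nvlink --status
```

If the service never becomes active, `sudo systemctl status nvidia-fabricmanager` is a reasonable next place to look.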

