# Managed K8s with GPU Virtual Machine

## Overview

FPT Cloud provides Kubernetes clusters backed by NVIDIA GPUs, with the following key features:

* Flexible GPU configuration: multiple GPU types and configurable GPU memory, applied per Worker Group.
* Automated management and provisioning of GPU resources in Kubernetes with NVIDIA GPU Operator.
* Visualization and monitoring of GPUs using NVIDIA DCGM.
* Automatic scaling of containers and nodes with Autoscaler as application demand for GPU resources increases or decreases.
* GPU sharing with the Multi-Instance GPU (MIG) mechanism, which helps optimize GPU resource usage and cost.

FPT Cloud uses NVIDIA GPU Operator to provide tools for automatically managing all the software components needed to use GPUs on Kubernetes. GPU Operator allows users to use GPU resources just like they use CPUs in a Kubernetes cluster.
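As a minimal sketch of what "using GPUs like CPUs" can look like, the Python snippet below builds a Pod manifest that requests one GPU through the `nvidia.com/gpu` extended resource advertised by the NVIDIA Device Plugin. The Pod name, container name, and image are placeholders chosen for illustration, not values taken from FPT Cloud. The printed JSON can be piped to `kubectl apply -f -`.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "command": ["nvidia-smi"],
                    # GPUs are requested under `limits`, alongside CPU or memory limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical Pod name and image, used only for illustration.
manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04")
print(json.dumps(manifest, indent=2))
```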

The Operator's components include:

* NVIDIA Drivers (CUDA, MIG, etc.)
* NVIDIA Device Plugin
* NVIDIA Container Toolkit
* NVIDIA GPU Feature Discovery
* NVIDIA Data Center GPU Manager (Monitoring)

In the Hanoi 2 and Japan regions, FPT Cloud currently supports Kubernetes with NVIDIA H100 and NVIDIA H200 GPUs. The supported MIG configurations for each GPU model are listed below.

| **No.** | **MIG configuration (H100 SXM5)** | **Strategy** | **Instances per GPU** | **Instance profile**                 |
| ------- | --------------------------------- | ------------ | --------------------- | ------------------------------------ |
| 1       | all-1g.10gb                       | single       | 7                     | 1g.10gb                              |
| 2       | all-1g.20gb                       | single       | 4                     | 1g.20gb                              |
| 3       | all-2g.20gb                       | single       | 3                     | 2g.20gb                              |
| 4       | all-3g.40gb                       | single       | 2                     | 3g.40gb                              |
| 5       | all-4g.40gb                       | single       | 1                     | 4g.40gb                              |
| 6       | all-7g.80gb                       | single       | 1                     | 7g.80gb                              |
| 7       | all-balanced                      | mixed        | <p>2<br>1<br>1</p>    | <p>1g.10gb<br>2g.20gb<br>3g.40gb</p> |
| 8       | none (no label)                   | none         | 0                     | None (entire GPU)                    |

| **No.** | **MIG configuration (H200 SXM5)** | **Strategy** | **Instances per GPU** | **Instance profile**                 |
| ------- | --------------------------------- | ------------ | --------------------- | ------------------------------------ |
| 1       | all-1g.18gb                       | single       | 7                     | 1g.18gb                              |
| 2       | all-1g.35gb                       | single       | 4                     | 1g.35gb                              |
| 3       | all-2g.35gb                       | single       | 3                     | 2g.35gb                              |
| 4       | all-3g.71gb                       | single       | 2                     | 3g.71gb                              |
| 5       | all-4g.71gb                       | single       | 1                     | 4g.71gb                              |
| 6       | all-7g.141gb                      | single       | 1                     | 7g.141gb                             |
| 7       | all-balanced                      | mixed        | <p>2<br>1<br>1</p>    | <p>1g.18gb<br>2g.35gb<br>3g.71gb</p> |
| 8       | none (no label)                   | none         | 0                     | None (entire GPU)                    |
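The strategy column affects which resource name a workload requests. The sketch below follows the upstream NVIDIA device plugin convention, where `none` and `single` expose devices as `nvidia.com/gpu` and `mixed` exposes one resource per profile as `nvidia.com/mig-<profile>`. That FPT Cloud clusters follow this upstream naming unchanged is an assumption, and the helper `gpu_resource_name` is hypothetical.

```python
def gpu_resource_name(strategy: str, instance_profile: str = "") -> str:
    """Return the resource name a container would request (upstream NVIDIA convention)."""
    if strategy in ("none", "single"):
        # Whole GPUs and uniformly partitioned GPUs share one resource name.
        return "nvidia.com/gpu"
    if strategy == "mixed":
        if not instance_profile:
            raise ValueError("mixed strategy needs an instance profile such as '1g.10gb'")
        # Each MIG profile is advertised as its own resource.
        return f"nvidia.com/mig-{instance_profile}"
    raise ValueError(f"unknown strategy: {strategy}")


print(gpu_resource_name("single"))            # nvidia.com/gpu
print(gpu_resource_name("mixed", "2g.20gb"))  # nvidia.com/mig-2g.20gb
```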

***Example:***

* If you select the single-strategy configuration `all-1g.10gb`, each H100 GPU on the worker is divided into 7 MIG devices. Each device provides 1/7 of the physical GPU's compute resources and 10 GB of GPU memory.
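The arithmetic in this example extends to a whole worker, since the configuration is applied to every GPU attached to it. The following sketch uses the H100 instance counts from the table above; the figure of 8 GPUs per worker is a hypothetical value for illustration.

```python
# Instances per physical GPU for each single-strategy H100 configuration,
# copied from the H100 table in this page.
H100_INSTANCES_PER_GPU = {
    "all-1g.10gb": 7,
    "all-1g.20gb": 4,
    "all-2g.20gb": 3,
    "all-3g.40gb": 2,
    "all-4g.40gb": 1,
    "all-7g.80gb": 1,
}


def mig_devices_per_worker(mig_config: str, gpus_per_worker: int) -> int:
    """Total MIG devices on one worker when every GPU uses `mig_config`."""
    if gpus_per_worker < 1:
        raise ValueError("gpus_per_worker must be at least 1")
    return H100_INSTANCES_PER_GPU[mig_config] * gpus_per_worker


# Hypothetical worker with 8 H100 GPUs.
print(mig_devices_per_worker("all-1g.10gb", 8))  # 56
```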

**Note:**

The MIG configuration applies to every GPU attached to the worker. All worker groups within the same cluster must use the same MIG strategy type (single, mixed, or none).
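This constraint can be checked before creating or updating worker groups. The sketch below is a hypothetical pre-flight check, not an FPT Cloud API: worker groups are modeled as a plain mapping from an illustrative group name to its MIG configuration, with `None` meaning no label.

```python
from typing import Dict, Optional


def strategy_of(mig_config: Optional[str]) -> str:
    """Map a MIG configuration name to its strategy (single, mixed, or none)."""
    if mig_config is None:
        return "none"  # no label: the entire GPU is exposed
    if mig_config == "all-balanced":
        return "mixed"
    return "single"


def check_cluster(worker_groups: Dict[str, Optional[str]]) -> str:
    """Return the strategy shared by all worker groups, or raise if they differ."""
    strategies = {name: strategy_of(cfg) for name, cfg in worker_groups.items()}
    distinct = set(strategies.values())
    if len(distinct) > 1:
        raise ValueError(f"worker groups use different MIG strategies: {strategies}")
    return distinct.pop() if distinct else "none"


# Two single-strategy groups may coexist, even with different profiles.
valid = {"inference": "all-1g.10gb", "training": "all-7g.80gb"}
print(check_cluster(valid))  # single
```

Mixing, for example, `all-balanced` with `all-1g.10gb` in the same mapping would raise `ValueError`, because the first is a mixed strategy and the second is single.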

### Terminology and Definitions <a href="#toc123732120" id="toc123732120"></a>

<table data-header-hidden><thead><tr><th valign="top"></th><th valign="top"></th></tr></thead><tbody><tr><td valign="top"><mark style="color:blue;"><strong>Terminology</strong></mark></td><td valign="top"><mark style="color:blue;"><strong>Definition</strong></mark></td></tr><tr><td valign="top"><strong>K8s</strong></td><td valign="top">Kubernetes</td></tr><tr><td valign="top"><strong>FKE</strong></td><td valign="top">FPT Kubernetes Engine</td></tr><tr><td valign="top"><strong>D-FKE</strong></td><td valign="top">Dedicated – FPT Kubernetes Engine</td></tr><tr><td valign="top"><strong>M-FKE</strong></td><td valign="top">Managed – FPT Kubernetes Engine</td></tr><tr><td valign="top"><strong>Master node</strong></td><td valign="top">A node that runs the control plane components.</td></tr><tr><td valign="top"><strong>Worker node</strong></td><td valign="top">A node that executes workloads.</td></tr><tr><td valign="top"><strong>Automatic scaling of nodes</strong></td><td valign="top">Automatically increasing or decreasing the number of worker nodes.</td></tr><tr><td valign="top"><strong>K8s cluster</strong></td><td valign="top">A collection of nodes (VMs) configured as a Kubernetes cluster.</td></tr><tr><td valign="top"><strong>NFS persistent storage</strong></td><td valign="top">A persistent storage partition on NFS.</td></tr><tr><td valign="top"><strong>Pod</strong></td><td valign="top">The smallest unit managed by Kubernetes. A Pod contains one or more containers.</td></tr><tr><td valign="top"><strong>Pod network</strong></td><td valign="top">The network/subnet used to assign IP addresses to Pods.</td></tr><tr><td valign="top"><strong>Service network</strong></td><td valign="top">The network/subnet used to assign IP addresses to Services.</td></tr></tbody></table>

