Install GPU drivers

On an FPT Kubernetes Engine cluster with integrated GPU support, users can install the GPU driver of their choice.

Step 1: Create a GPU Cluster with Driver Installation set to User-Install

When creating the cluster, set Driver Installation to User-Install.

Step 2: Install the software required to use the GPU (driver, toolkit, device plugin, etc.)

Refer to the GPU driver versions:

The following DaemonSet can be used as a reference for installing the driver:

# Copyright 2023 FPT Cloud - PaaS
# Runs only on nodes labeled worker.fptcloud/type=gpu

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fptcloud-gpu-driver-installer
  namespace: kube-system
  labels:
    k8s-app: gpu-driver
spec:
  selector:
    matchLabels:
      k8s-app: gpu-driver
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-driver-installer
        k8s-app: gpu-driver
    spec:
      priorityClassName: system-node-critical
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: worker.fptcloud/type
                operator: In
                values: ["gpu"]
      tolerations:
      - operator: "Exists"
      containers:
        - image: docker.io/alpine:3.13
          name: nvidia-driver-installer
          command:
            - 'nsenter'
            - '-t'
            - '1'
            - '-m'
            - '-u'
            - '-i'
            - '-n'
            - '--'
            - 'bash'
            - '-l'
            - '-c'
            - 'curl -Ls https://raw.githubusercontent.com/fci-xplat/fke-config/main/fptcloud-gpu-driver-installer.sh | bash -s -- -p admin'
          resources:
            requests:
              cpu: 150m
          env:
          - name: NVIDIA_DRIVER_VERSION
            value: "535.54.03"
          - name: NVIDIA_TOOLKIT_INSTALL
            value: "true"
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
      hostPID: true
      hostNetwork: true
      hostIPC: true

The DaemonSet is configured through the following environment variables:

  • NVIDIA_DRIVER_VERSION: the driver version to install.

  • NVIDIA_TOOLKIT_INSTALL: whether to install the toolkit automatically. Accepts "true" or "false"; the default is "true".
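
These variables can also be changed on a DaemonSet that has already been applied to the cluster, without editing the manifest. The following is a minimal sketch using `kubectl set env`; the helper name `set_installer_env` is hypothetical and exists only for illustration, and the example version is the one already used in the manifest.

```shell
# Assumption: the DaemonSet from the manifest above already exists in kube-system.
DAEMONSET="daemonset/fptcloud-gpu-driver-installer"

# Hypothetical helper: update both variables on the DaemonSet's Pod template.
# $1: driver version (for example 535.54.03, the value used in the manifest)
# $2: "true" or "false" for NVIDIA_TOOLKIT_INSTALL (defaults to "true")
set_installer_env() {
  kubectl -n kube-system set env "$DAEMONSET" \
    "NVIDIA_DRIVER_VERSION=$1" \
    "NVIDIA_TOOLKIT_INSTALL=${2:-true}"
}
```

For example, `set_installer_env 535.54.03 false` keeps the driver version and turns off the toolkit installation. Changing the Pod template makes the DaemonSet replace its Pods according to its RollingUpdate strategy. Whether the installer script replaces a driver that is already installed on a node is not stated on this page, so check the installer logs after any change.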

To apply the fptcloud DaemonSet to the K8s cluster, use the following command:
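
A minimal sketch of this step, assuming the manifest above has been saved locally as fptcloud-gpu-driver-installer.yaml. That filename is hypothetical (use whatever name you chose), and the helper name `apply_gpu_driver_installer` is illustrative only.

```shell
# Assumption: the DaemonSet manifest above was saved under this (hypothetical) filename.
MANIFEST="fptcloud-gpu-driver-installer.yaml"

# Create or update the DaemonSet in the cluster selected by the current kubeconfig context.
apply_gpu_driver_installer() {
  kubectl apply -f "$MANIFEST"
}
```

Calling `apply_gpu_driver_installer` is equivalent to running `kubectl apply -f fptcloud-gpu-driver-installer.yaml` directly. The namespace does not need to be passed on the command line because the manifest already sets it to kube-system.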

Check the status of the DaemonSet's Pods:

kubectl get pod -n kube-system | grep "gpu-driver"

The fptcloud-gpu-driver-installer DaemonSet schedules a Pod on every worker in the Worker Group that carries the label worker.fptcloud/type=gpu, and each Pod installs the driver and toolkit on its node.
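
To confirm that the DaemonSet covers the expected workers, the labeled nodes can be compared with the DaemonSet's Pod counts. This is a sketch under the assumption that the manifest above was applied unchanged; the helper names are hypothetical.

```shell
# List the workers carrying the GPU label that the DaemonSet's node affinity targets.
list_gpu_workers() {
  kubectl get nodes -l worker.fptcloud/type=gpu
}

# Show the DaemonSet's DESIRED, CURRENT and READY Pod counts.
show_installer_daemonset() {
  kubectl get daemonset fptcloud-gpu-driver-installer -n kube-system
}
```

The DESIRED count reported by `show_installer_daemonset` should equal the number of nodes returned by `list_gpu_workers`. If a GPU worker is missing from the first list, check that it carries the worker.fptcloud/type=gpu label.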

  • Check the logs of an installer Pod (fptcloud-gpu-driver-installer-7tj55 in this example; the suffix is generated and differs for each Pod) to see whether the installation has finished.

kubectl logs fptcloud-gpu-driver-installer-7tj55 -n kube-system
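
Because the Pod name suffix differs between clusters, the logs can also be selected by the k8s-app=gpu-driver label from the manifest instead of by Pod name. A minimal sketch; the helper name `installer_logs` is hypothetical.

```shell
# Print recent log lines from every installer Pod, selected by the label
# set in the DaemonSet's Pod template.
# $1: number of lines per Pod (defaults to 100)
installer_logs() {
  kubectl logs -n kube-system -l k8s-app=gpu-driver --tail="${1:-100}" --prefix
}
```

The `--prefix` flag prepends the source Pod and container name to each line, which helps tell the workers apart when several Pods are returned.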

  • If the installation is successful, the logs show that the installer has finished. The installation process usually takes a few minutes.
