Deploy an application

Overview

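Kubernetes manages and uses GPU resources in the same way as CPU resources. Declare the GPU resources for your application in Kubernetes according to the GPU configuration selected for the worker group.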

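Notes:

Kubernetes uses the limit as the default request value, so you can specify only a limit without a request.
You can specify both a GPU limit and a GPU request, but the two values must be equal.
You cannot specify a GPU request without also specifying a limit.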
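
The three rules above can be illustrated with a container `resources` fragment. This is a minimal sketch, assuming the `nvidia.com/gpu` resource name used later on this page; the count of 1 is only an example.

```yaml
# Fragment of a container spec (not a complete manifest).
resources:
  limits:
    nvidia.com/gpu: 1    # a limit on its own is enough; the request defaults to it
  requests:
    nvidia.com/gpu: 1    # optional; if present, it must equal the limit
```

Keeping only the `requests` entry for the GPU and dropping `limits` is the combination that is not allowed.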

Check the GPU configuration with the following command:

kubectl get node -o json | jq '.items[].metadata.labels'

Example: the image below shows a worker that uses an NVIDIA A30 card. Configuration strategy: all-balanced; status: success.
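
Node labels such as these can also be used to place GPU workloads. The following is a minimal sketch of a Pod template fragment. It assumes the node carries the labels `nvidia.com/mig.config` and `nvidia.com/mig.config.state`, which are the label names used by the NVIDIA GPU Operator. This page does not spell them out, so check them against the output of the command above before relying on them.

```yaml
# Fragment of a Pod template spec (not a complete manifest).
spec:
  nodeSelector:
    nvidia.com/mig.config: all-balanced     # assumed label name; value from the example above
    nvidia.com/mig.config.state: success    # assumed label name; only match nodes whose MIG setup succeeded
```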

Use the following command on the worker to check the GPU instance configuration (SSH into the worker, then run the command):

Examples of deploying applications that use GPUs:

MIG sharing mode with the single strategy

GPU resources are declared as follows:

#Syntax:
nvidia.com/gpu:
#Example:
nvidia.com/gpu: 1

*(With the single strategy, the GPU card is divided into equally sized instances.)

Deployment example using the single GPU strategy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-app
  template:
    metadata:
      labels:
        component: gpu-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # With the single strategy, the resource is declared as nvidia.com/gpu
              nvidia.com/gpu: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            # Generate GPU load in a loop: run for 300 s, then sleep 30 s
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done

MIG sharing mode with the mixed strategy

GPU resources are declared as follows:

#Syntax:
nvidia.com/<type>:
#Example:
nvidia.com/mig-1g.6gb: 2

*(With the mixed strategy, a GPU card can be split into two instance types, so you must specify the instance type when declaring resources.)

Deployment example using the mixed GPU strategy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-app
  template:
    metadata:
      labels:
        component: gpu-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # With the mixed strategy, the instance type is part of the resource name
              nvidia.com/mig-1g.6gb: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            # Generate GPU load in a loop: run for 300 s, then sleep 30 s
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
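
As a variation on the mixed-strategy example above, each container in a pod can request a different instance type. The sketch below is illustrative only. The pod and container names are hypothetical, and it assumes that the node exposes `nvidia.com/mig-2g.12gb` in addition to `nvidia.com/mig-1g.6gb` (both are A30 MIG profiles) and that `nvidia-smi` is available inside the container through the NVIDIA container runtime.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-mixed-mig-pod          # hypothetical name
spec:
  containers:
    - name: small-instance             # hypothetical name
      image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
      command: ["/bin/sh", "-c"]
      args:
        # List the GPU devices visible to this container, then idle
        - nvidia-smi -L; sleep 3600
      resources:
        limits:
          nvidia.com/mig-1g.6gb: 1     # instance type used elsewhere on this page
    - name: large-instance             # hypothetical name
      image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
      command: ["/bin/sh", "-c"]
      args:
        - nvidia-smi -L; sleep 3600
      resources:
        limits:
          nvidia.com/mig-2g.12gb: 1    # assumed to be available on the node
```

If the node does not expose one of the requested instance types, the pod stays in the Pending state, so it is worth checking the node's allocatable resources first.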

With the none strategy

GPU resources are declared as follows:

#Syntax:
nvidia.com/gpu: 1

*(With the none strategy, the pod will use all the resources of a single GPU card.)

Deployment example using the none strategy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-app
  template:
    metadata:
      labels:
        component: gpu-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # With the none strategy, one unit is one whole GPU card
              nvidia.com/gpu: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            # Generate GPU load in a loop: run for 300 s, then sleep 30 s
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done

MPS sharing mode

GPU resources are declared as follows:

#Syntax:
nvidia.com/gpu:
#Example:
nvidia.com/gpu: 1

Note: A pod can request at most 1 nvidia.com/gpu resource.
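
This page gives no deployment example for MPS mode, so the following is a minimal sketch modeled on the earlier examples rather than a confirmed configuration. The name `example-mps-app` and the label value `mps-app` are hypothetical. It assumes `nvidia-smi` is available inside the container through the NVIDIA container runtime. Whether two replicas fit on one card depends on the MPS configuration of the worker group, which this page does not describe.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-mps-app                # hypothetical name
spec:
  replicas: 2                          # scale by adding pods; one pod cannot request more than 1
  selector:
    matchLabels:
      component: mps-app               # hypothetical label
  template:
    metadata:
      labels:
        component: mps-app
    spec:
      containers:
        - name: gpu-container
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            # List the GPU devices visible to this container, then idle
            - nvidia-smi -L; sleep 3600
          resources:
            limits:
              nvidia.com/gpu: 1        # maximum allowed per pod in MPS mode
```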

