# Deploy applications

## Overview

Kubernetes manages and schedules GPU resources in the same way as CPU resources. The resource name that an application declares depends on the GPU configuration selected for the Worker Group.

Note:

* You can specify GPU limits without specifying requests, as Kubernetes uses limits as the default request value.
* You can specify both GPU limits and requests, but these two values must be equal.
* You cannot specify GPU requests without specifying limits.
* Check the GPU configuration using the following command:

`kubectl get node -o json | jq '.items[].metadata.labels'`

Example: The image below shows a worker with an NVIDIA A30 card, configuration strategy `all-balanced`, and status `success`.\
![](/files/ZdqUMt1dAEuL1vTihH9V)
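The request and limit rules in the note above can be sketched as a container `resources` fragment. This is a minimal illustration, not a required layout; the resource name `nvidia.com/gpu` and the count of 1 are examples and depend on the GPU configuration of your Worker Group.

```yaml
# Option 1: limits only. Kubernetes uses the limit as the default request.
resources:
  limits:
    nvidia.com/gpu: 1
---
# Option 2: requests and limits both set. The two values must be equal.
resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1
```

A fragment that sets only `requests` for the GPU resource, with no matching `limits` entry, is not valid.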

To check the GPU instance configuration on a worker, SSH into the worker and run the following command:

The following sections show how to deploy an application that uses a GPU under each GPU configuration.

## **With MIG sharing mode and the single strategy**

GPU resources are declared as follows:

```
# Syntax:
nvidia.com/gpu: <count>

# Example:
nvidia.com/gpu: 1
```

**Note**: With the single strategy, the GPU card is divided into equal instances.

Example deployment using the single strategy:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-app
  template:
    metadata:
      labels:
        component: gpu-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # Single strategy: declare the instance as nvidia.com/gpu
              nvidia.com/gpu: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
```

## **With MIG sharing mode and the mixed strategy**

GPU resources are declared as follows:

```
# Syntax:
nvidia.com/<type>: <count>

# Example:
nvidia.com/mig-1g.6gb: 2
```

**Note**: With the mixed strategy, a GPU card can be split into two instance types, so you must specify the instance type when declaring resources.

Example deployment using the mixed strategy:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-app
  template:
    metadata:
      labels:
        component: gpu-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # Mixed strategy: declare the MIG instance type explicitly
              nvidia.com/mig-1g.6gb: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
```
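Because the mixed strategy exposes each instance type as its own resource, a workload that needs a larger instance declares a different resource name. The Pod below is a minimal sketch of that idea. The instance type `mig-2g.12gb` and the Pod name are assumptions for illustration only; use an instance type that your worker nodes actually advertise (for example, as listed in the output of `kubectl describe node`).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-mig-2g-pod   # hypothetical name
spec:
  containers:
    - name: gpu-container
      securityContext:
        capabilities:
          add:
            - SYS_ADMIN
      resources:
        limits:
          # Assumed instance type; replace with one your nodes advertise
          nvidia.com/mig-2g.12gb: 1
      image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
      command: ["/bin/sh", "-c"]
      args:
        - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
```

If no node advertises the requested instance type, the Pod stays in the `Pending` state until a matching resource becomes available.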

## **With the none strategy**

GPU resources are declared as follows:

```
# Syntax:
nvidia.com/gpu: <count>

# Example:
nvidia.com/gpu: 1
```

**Note**: With the none strategy, the pod uses all the resources of a single GPU card.

Example deployment using the none strategy:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-app
  template:
    metadata:
      labels:
        component: gpu-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # None strategy: one unit is one whole GPU card
              nvidia.com/gpu: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
```

## With MPS sharing mode

GPU resources are declared as follows:

```
# Syntax:
nvidia.com/gpu: <count>

# Example:
nvidia.com/gpu: 1
```

**Note**: The maximum number of nvidia.com/gpu resources a pod can request is 1.
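For reference, a Deployment for MPS sharing mode might look like the following sketch. It reuses the container settings from the examples in the earlier sections; the names `example-mps-app` and `mps-app` and the replica count of 2 are assumptions for illustration. Each pod requests exactly one `nvidia.com/gpu`, in line with the limit stated above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-mps-app        # hypothetical name
spec:
  replicas: 2                  # assumed count; each pod gets one shared GPU resource
  selector:
    matchLabels:
      component: mps-app
  template:
    metadata:
      labels:
        component: mps-app
    spec:
      containers:
        - name: gpu-container
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              # MPS mode: a pod can request at most 1
              nvidia.com/gpu: 1
          image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
          command: ["/bin/sh", "-c"]
          args:
            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
```

Whether both replicas can be scheduled depends on how many shared `nvidia.com/gpu` resources the Worker Group configuration exposes.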

