# Cluster auto-scale using KEDA & Prometheus

## Requirements

* Kubernetes cluster with attached GPU
* The GPU application is in a running state
* The kube-prometheus-stack and prometheus-adapter packages in the FPT App Catalog service, as in [this documentation](/fpt-gpu-cloud/gpu-cluster/managed-k8s-with-gpu-virtual-machine/tutorial/cluster-auto-scaling/cluster-auto-scale-using-gpu-custom-metrics.md).

<figure><img src="/files/eHVtfSYltdZmc8K5htM9" alt="" width="289"><figcaption></figcaption></figure>

## Step by Step

### Step 1: Install KEDA&#x20;

#### **Using the FPT App Catalog**

Select the **FPT Cloud App Catalog** service, then search for **KEDA** in the `fptcloud-catalogs` repository.

#### Using the Helm chart

```
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```

Check if the KEDA pods are running normally

`kubectl -n keda get pod`

```
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/keda-admission-webhooks-54764ff7d5-l4tks           1/1     Running   0          3d
pod/keda-operator-567cb596fd-wx4t8                     1/1     Running   0          2d23h
pod/keda-operator-metrics-apiserver-6475bf5fff-8x8bw   1/1     Running   0          2d14h

NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)            AGE
service/keda-admission-webhooks           ClusterIP   100.71.2.54              443/TCP            3d2h
service/keda-operator                     ClusterIP   100.66.228.223           9666/TCP           3d2h
service/keda-operator-metrics-apiserver   ClusterIP   100.71.162.181           443/TCP,8080/TCP   3d2h

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/keda-admission-webhooks           1/1     1            1           3d2h
deployment.apps/keda-operator                     1/1     1            1           3d2h
deployment.apps/keda-operator-metrics-apiserver   1/1     1            1           3d2h

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/keda-admission-webhooks-54764ff7d5           1         1         1       3d2h
replicaset.apps/keda-operator-567cb596fd                     1         1         1       3d2h
replicaset.apps/keda-operator-metrics-apiserver-6475bf5fff   1         1         1       3d2h
```

### Step 2: Check if Prometheus has GPU metrics

`kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq -r . | grep DCGM`

```
"name": "namespaces/DCGM_FI_DEV_POWER_USAGE",
"name": "namespaces/DCGM_FI_DEV_FB_USED",
"name": "namespaces/DCGM_FI_DEV_PCIE_REPLAY_COUNTER",
"name": "pods/DCGM_FI_DEV_XID_ERRORS",
"name": "namespaces/DCGM_FI_PROF_GR_ENGINE_ACTIVE",
"name": "namespaces/DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION",
"name": "pods/DCGM_FI_PROF_DRAM_ACTIVE",
"name": "jobs.batch/DCGM_FI_DEV_POWER_USAGE",
"name": "jobs.batch/DCGM_FI_DEV_SM_CLOCK",
"name": "namespaces/DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL",
"name": "pods/DCGM_FI_DEV_POWER_USAGE",
"name": "jobs.batch/DCGM_FI_DEV_MEM_CLOCK",
"name": "jobs.batch/DCGM_FI_DEV_FB_USED",
"name": "namespaces/DCGM_FI_DEV_FB_FREE",
"name": "jobs.batch/DCGM_FI_PROF_GR_ENGINE_ACTIVE",
"name": "pods/DCGM_FI_DEV_MEMORY_TEMP",
"name": "pods/DCGM_FI_DEV_FB_FREE",
"name": "pods/DCGM_FI_DEV_MEM_CLOCK",
"name": "pods/DCGM_FI_PROF_GR_ENGINE_ACTIVE",
"name": "pods/DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL",
"name": "pods/DCGM_FI_PROF_PIPE_TENSOR_ACTIVE",
"name": "jobs.batch/DCGM_FI_DEV_MEMORY_TEMP",
"name": "namespaces/DCGM_FI_DEV_MEM_CLOCK",
"name": "jobs.batch/DCGM_FI_DEV_XID_ERRORS",
"name": "namespaces/DCGM_FI_DEV_VGPU_LICENSE_STATUS",
"name": "jobs.batch/DCGM_FI_DEV_VGPU_LICENSE_STATUS",
"name": "pods/DCGM_FI_DEV_GPU_TEMP",
"name": "jobs.batch/DCGM_FI_PROF_PIPE_TENSOR_ACTIVE",
"name": "pods/DCGM_FI_DEV_PCIE_REPLAY_COUNTER",
"name": "pods/DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION",
"name": "jobs.batch/DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION",
"name": "pods/DCGM_FI_DEV_FB_USED",
"name": "pods/DCGM_FI_DEV_VGPU_LICENSE_STATUS",
"name": "namespaces/DCGM_FI_DEV_MEMORY_TEMP",
"name": "jobs.batch/DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL",
"name": "namespaces/DCGM_FI_DEV_SM_CLOCK",
"name": "namespaces/DCGM_FI_PROF_PIPE_TENSOR_ACTIVE",
"name": "namespaces/DCGM_FI_DEV_GPU_TEMP",
"name": "jobs.batch/DCGM_FI_DEV_GPU_TEMP",
"name": "namespaces/DCGM_FI_PROF_DRAM_ACTIVE",
"name": "namespaces/DCGM_FI_DEV_XID_ERRORS",
"name": "jobs.batch/DCGM_FI_DEV_FB_FREE",
"name": "pods/DCGM_FI_DEV_SM_CLOCK",
"name": "jobs.batch/DCGM_FI_DEV_PCIE_REPLAY_COUNTER",
"name": "jobs.batch/DCGM_FI_PROF_DRAM_ACTIVE",
```

### Step 3: Create a ScaledObject to specify autoscaling for the application

* Manifest

```
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-object
spec:
  scaleTargetRef:
    name: gpu-test
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local:9090
        metricName: engine_active
        query: sum(DCGM_FI_PROF_GR_ENGINE_ACTIVE{modelName="NVIDIA A30", container="gpu-test"}) / count(DCGM_FI_PROF_GR_ENGINE_ACTIVE{modelName="NVIDIA A30", container="gpu-test"})
        threshold: '0.8'
```

* *name*: The name of the GPU deployment in the example is `gpu-test`
* serverAddress:\_The endpoint of the Prometheus server in the example is [`http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local:9090`](http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local:9090/)
* *query*: The PromQL query to find the value based on which autoscale is performed. In the example above, it finds the average values of the variable `DCGM_FI_PROF_GR_ENGINE_ACTIVE`
* *threshold*: The threshold value to trigger active autoscale; in the example it is `0.8`

As shown in the example above, whenever the average value of `DCGM_FI_PROF_GR_ENGINE_ACTIVE` exceeds `0.8`, ScaledObject will scale the pods of the Deployment named `gpu-test`.

After creating the ScaledObject, the deployment will automatically scale down to 0, indicating successful configuration.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ai-docs.fptcloud.com/fpt-gpu-cloud/gpu-cluster/managed-k8s-with-gpu-virtual-machine/tutorial/cluster-auto-scaling/cluster-auto-scale-using-keda-and-prometheus.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
