# Cluster configuration

The Managed GPU Cluster product is built on native Kubernetes and integrates additional cloud provider components, including the FPT Cloud Controller Manager. This component manages the cluster's worker nodes and LoadBalancer-type services. Users can expose their applications to the internet in several ways so that their customers can reach them: creating an Ingress for the service, creating a NodePort service and attaching a floating IP to the worker node, or using a LoadBalancer service.

FPT Cloud supports creating LoadBalancer services with the following annotation options in the service configuration:

| Key | Value | Default | Purpose |
| --- | --- | --- | --- |
| service.beta.kubernetes.io/fpt-load-balancer-internal | "true"/"false" | "false" | If you do not want to expose the service to the internet, set the value to "true". |
| loadbalancer.fptcloud.com/keep-floatingip | "true"/"false" | "false" | If you want to keep the LoadBalancer service's floating IP within the VPC after deleting the service, set the value to "true". |
| loadbalancer.fptcloud.com/proxy-protocol | "true"/"false" | "false" | If you want the LoadBalancer to use the PROXY protocol, set the value to "true". Note: The PROXY protocol is only used with Layer 4 LoadBalancers. |
| loadbalancer.fptcloud.com/enable-health-monitor | "true"/"false" | "true" | To disable the health monitor for the LoadBalancer pool, set the value to "false". |
| service.beta.kubernetes.io/fpt-load-balancer-type | LBv1: basic / advanced / standard / premium; LBv2: Basic-1 / Basic-2 / Standard / Advanced / Premium | LBv1: "basic"; LBv2: "Basic-1" | Sets the LoadBalancer flavor so that it can handle the load of the application behind the LoadBalancer pool backend. |
| loadbalancer.fptcloud.com/enable-ingress-hostname | "true"/"false" | "false" | To enable the ingress hostname for the LoadBalancer service type, set the value to "true". |
| loadbalancer.fptcloud.com/load-balancer-version | "v1"/"v2" | "v1" | To use LBv2 for the LoadBalancer service type, set the value to "v2". If this annotation is not set, LBv1 is created by default. |
| loadbalancer.fptcloud.com/x-forwarded-for | "true"/"false" | "false" | To forward the request header to the LoadBalancer pool backend when using a Layer 7 LoadBalancer, set the value to "true". Note: You cannot use the PROXY protocol and x-forwarded-for at the same time. |
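As an illustration of how these annotations fit into a service manifest, the following is a minimal sketch rather than an official template. The service name, selector label, and ports are hypothetical placeholders; only the annotation keys and values come from the table above.

```yaml
# Sketch of an internal LBv2 LoadBalancer service.
# Assumptions: the name "demo-app-lb", the label "app: demo-app",
# and the ports are placeholders, not values from FPT Cloud.
apiVersion: v1
kind: Service
metadata:
  name: demo-app-lb
  annotations:
    # Do not expose the service to the internet.
    service.beta.kubernetes.io/fpt-load-balancer-internal: "true"
    # Create an LBv2 load balancer instead of the default LBv1.
    loadbalancer.fptcloud.com/load-balancer-version: "v2"
    # LBv2 flavor; "Basic-1" is the documented LBv2 default.
    service.beta.kubernetes.io/fpt-load-balancer-type: "Basic-1"
spec:
  type: LoadBalancer
  selector:
    app: demo-app
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080
```

Annotation values are strings in Kubernetes, so "true" and "false" are quoted here, as they are in the table.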

Additionally, Managed GPU Cluster lets users configure the following:

**Create a LoadBalancer service with a specified floating IP attached to the Load Balancer**

<img src="/files/p3waDp1JuuAlJMM2daPT" alt="Group 133, Grouped object" data-size="original">

<img src="/files/EpQUfmfAL9AvDkhkn30j" alt="Group 139, Grouped object" data-size="original">

Note: The public IP must be allocated to the VPC and be in the Inactive state. To check, go to **Networking -> Floating IPs**.
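For readers who cannot view the screenshots, the sketch below shows one way such a service might be written. It assumes that the floating IP is requested through the standard Kubernetes `spec.loadBalancerIP` field; this page does not state the exact field in text, so treat the screenshots above as authoritative. The service name, selector, ports, and IP address (taken from a documentation-only range) are placeholders.

```yaml
# Sketch only. Assumption: the floating IP is set through the standard
# Kubernetes field spec.loadBalancerIP; verify against the screenshots.
apiVersion: v1
kind: Service
metadata:
  name: demo-app-lb              # hypothetical service name
spec:
  type: LoadBalancer
  # Placeholder address; replace it with a floating IP that is
  # allocated to the VPC and is in the Inactive state.
  loadBalancerIP: 203.0.113.10
  selector:
    app: demo-app                # hypothetical pod label
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```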

**Restrict access to the Load Balancer by configuring "loadBalancerSourceRanges" in the "spec" section of the service configuration:**

![](/files/43ca0a399ddd92be35b3d5ad09ad3cf1111cca87)

* 14.233.234.0/24
* 10.250.0.0/24

Note: The "loadBalancerSourceRanges" configuration contains an array of public IP ranges allowed to access the Load Balancer. By default, M-FKE creates a Load Balancer service type with the source IP range configured as 0.0.0.0/0.
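A minimal sketch of such a service follows, using the two example ranges listed above. `loadBalancerSourceRanges` is a standard field of the Kubernetes Service `spec`; the service name, selector, and ports are hypothetical placeholders.

```yaml
# Sketch: only clients in the listed CIDR ranges may reach the Load Balancer.
# Assumptions: "demo-app-lb", "app: demo-app", and the ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: demo-app-lb
spec:
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 14.233.234.0/24
    - 10.250.0.0/24
  selector:
    app: demo-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```

Omitting the field leaves the default source range of 0.0.0.0/0 in place, as described in the note above.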

Ollama is an open-source tool that allows you to run, manage, and customize large language models (LLMs) on personal computers or servers, supporting various models such as Llama, DeepSeek, and Mistral. Open-WebUI is an open-source web interface designed specifically to interact with Ollama, providing a user-friendly way to manage and use LLMs.

This document will guide you through the steps to deploy the DeepSeek-R1 model on the FPT Managed GPU Cluster using Ollama and Open-WebUI so that users can use it simply and easily.

**Step 1**: Clone the existing source code and script of Open-WebUI

![](/files/b481a18a13efd902bb634c4e74ee5657470d4233)

`git clone https://github.com/open-webui/open-webui`

**Step 2**: Run the scripts to deploy Ollama and Open-WebUI. The directory contains all the files needed for deployment, such as **namespace**, **ollama statefulSet**, **ollama service**, **open-webui deployment**, and **open-webui service**.

![](/files/4f6392019b77e04d840ce77b0c2198a1b2066bfa)

`kubectl apply -f ./kubernetes/manifest`

**Step 3**: Access Open-WebUI in your browser at the forwarded port, for example: [*http://localhost:52433*](http://localhost:52433/). The first time you use Open-WebUI, you need to configure the following information: name, email, and password.

![](/files/99ebf87b790394d2a7974eb163a13a1e749700ec)

**Step 4**: After installation is complete, select the model to use. For example, here we will install the DeepSeek-R1 model, version **1.5b**.

![](/files/1a130969cf2dfcbe95f5adb79f939eef286d9b48)

**Step 5**: After the model has been loaded and is running, users can interact with it simply and intuitively through the interface.

![](/files/44bed2ced7045969822949418c943b71080295f4)


