vLLM Use Case

Step 1: Create a GPU container using the vllm-openai template

  • In the Environment Variables field, set the API key (used to authorize inference requests) and your Hugging Face token (used to download the model from Hugging Face).

  • This tutorial uses DeepSeek-R1-Distill-Qwen-1.5B. To serve a different model, replace the value of MODEL with the model you prefer for inference.
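
As a sketch, the environment variables for Step 1 might look like the following. The variable names (MODEL, API_KEY, HUGGING_FACE_HUB_TOKEN) are assumptions based on common vLLM container templates; check your template's documentation for the exact names it expects:

```
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
API_KEY=<your-inference-api-key>
HUGGING_FACE_HUB_TOKEN=<your-hugging-face-token>
```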

Step 2: Test the deployment using Postman. Authorize the request with the API key you added in Step 1.
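
The same check can also be done from code instead of Postman. Below is a minimal Python sketch that builds an OpenAI-compatible chat completion request with the API key sent as a Bearer token, the same way Postman would. The endpoint URL, API key, and model name are placeholders; substitute your container's address and the values you set in Step 1:

```python
import json
import urllib.request

# Assumptions: replace with your container's address and your Step 1 values.
API_BASE = "http://localhost:8000/v1"
API_KEY = "your-api-key"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Same token Postman uses for Bearer authorization.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
# To actually send it once the container is running:
# response = urllib.request.urlopen(req)
```

In Postman, the equivalent setup is a POST to the /chat/completions URL with the Authorization type set to Bearer Token and the JSON payload above as the raw body.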
