vLLM Use Case
Step 1: Create a GPU Container using vllm-openai template
In the Environment Variables field, set the API key (used to authorize inference requests) and your Hugging Face token (used to download the model from Hugging Face).
In this tutorial, we use DeepSeek-R1-Distill-Qwen-1.5B. Replace the value of MODEL with any other model you prefer for inference.
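The exact variable names depend on how the vllm-openai template is defined; the values below are an illustrative sketch, assuming the template reads API_KEY, HF_TOKEN, and MODEL, with placeholder keys you must replace with your own:

```shell
# Example Environment Variables (names assumed from the vllm-openai template;
# adjust them to match your template's actual fields).
API_KEY=sk-my-inference-key                       # key clients send with inference requests
HF_TOKEN=hf_your_hugging_face_token               # token used to download the model
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B   # model served by vLLM
```

Keep both secrets out of version control; the template injects them into the container at startup.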

Step 2: Test using Postman. Authorize the request with the API key you added in Step 1.
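The same request Postman sends can be sketched in Python. This is a minimal example, assuming the container exposes vLLM's OpenAI-compatible /v1/chat/completions route; the base URL and key are placeholders to replace with your container's endpoint and the API key from Step 1:

```python
import json
import urllib.request

# Placeholders: replace with your container's endpoint and the API key
# you set in Step 1.
BASE_URL = "http://localhost:8000"
API_KEY = "sk-my-inference-key"

headers = {
    "Authorization": f"Bearer {API_KEY}",  # same key Postman uses to authorize
    "Content-Type": "application/json",
}
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers=headers,
)
# Uncomment once the container is running and reachable:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

In Postman, the equivalent setup is a POST to the same URL with the key supplied as a Bearer token under the Authorization tab and the payload as the raw JSON body.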
