vLLM Use Case

Step 1: Create a GPU container using the vllm-openai template

  • In the Environment Variables field, set the API key (used to authorize inference requests) and your Hugging Face token (used to download the model from Hugging Face).

  • This tutorial uses DeepSeek-R1-Distill-Qwen-1.5B. To serve a different model, replace the value of MODEL with the model you prefer for inference.
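
As a sketch, the environment variables for Step 1 might look like the following. The variable names (MODEL, API_KEY, HUGGING_FACE_HUB_TOKEN) are assumptions based on common vLLM container templates; check your template's documentation for the exact names it expects:

```
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
API_KEY=<your-inference-api-key>
HUGGING_FACE_HUB_TOKEN=<your-hugging-face-token>
```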

Step 2: Test the deployment using Postman. Authorize the request with the API key you added in Step 1.
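
The same check can also be done from code instead of Postman. Below is a minimal Python sketch that builds an OpenAI-compatible chat completion request with the API key sent as a Bearer token, the same way Postman would. The endpoint URL, API key, and model name are placeholders; substitute your container's address and the values you set in Step 1:

```python
import json
import urllib.request

# Assumptions: replace with your container's address and your Step 1 values.
API_BASE = "http://localhost:8000/v1"
API_KEY = "your-api-key"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Same token Postman uses for Bearer authorization.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
# To actually send it once the container is running:
# response = urllib.request.urlopen(req)
```

In Postman, the equivalent setup is a POST to the /chat/completions URL with the Authorization type set to Bearer Token and the JSON payload above as the raw body.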
