vllm-project/vllm
vLLM
vLLM is a high-performance engine for LLM inference and serving with an OpenAI-compatible API, batching, and efficient memory management.
Catalog projects marked with #serving. Tags work as dedicated landing pages, so related tools are easier to find and connect.