LLMs at Ludicrous Speed: Dockerizing vLLM for Real Apps
If you’ve ever watched your GPU twiddle its thumbs between prompts, this one’s for you. In this post, we’ll cover what vLLM is, why it’s fast, how to run it with Docker Compose, and how to test it with real API calls. I’ll also show concrete