ai - Darth Seldon's Blog

LLMs at Ludicrous Speed: Dockerizing vLLM for Real Apps

If you've ever watched your GPU twiddle its thumbs between prompts, this one's for you. In this post we'll cover what vLLM is, why it's fast, how to run it with Docker Compose, and how to test it with real calls. I'll also show concrete

Oct 16, 2025 5 min read

ai

Shrinking Giants: Understanding LLM Quantization Models (Q2, Q4, Q6 and Friends)

Why Quantization Matters: Large Language Models (LLMs) are huge. Even a "small" 7B parameter model can chew up 14+ GB in FP16 (16-bit floating point). If you've tried running one locally without a beefy GPU, you've probably noticed your machine crying in pain, or worse, swapping memory

Oct 10, 2025 4 min read

c#

BattleBots + MCP = Fun: Building Custom Tools in C# That Your IDE and Workflows Understand

Opening: AI copilots and IDE extensions are evolving fast, and the Model Context Protocol (MCP) is emerging as the glue that makes custom tools discoverable and usable by editors like VS Code and Visual Studio Insiders. For this post, I'll be using .NET 10 along with Visual

Sep 11, 2025 7 min read

Technical

Build Your Own Local AI Automation Hub with n8n + Ollama (No Cloud Required!)

If you've been playing with local LLMs like Llama 3.1 (8B) and are a fan of automation tools like n8n, you're in for a treat. Today, we'll connect n8n to your local Ollama instance (with Open WebUI) using Docker Compose. The result? Automated AI workflows that

Aug 13, 2025 8 min read