Today, we’re excited to announce that Docker Model Runner now integrates the vLLM inference
engine and safetensors models, unlocking high-throughput AI
inference with the same Docker tooling you already use. When we
first introduced Docker Model Runner, our goal was to make it
simple for developers to run and experiment with large language
models (LLMs) using Docker. We designed it to integrate multiple
inference engines from day one, starting with llama.cpp, to make it
easy to get models running.
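
As a quick illustration of that workflow, here is a minimal sketch of pulling and running a model with the Docker Model Runner CLI. The model name is just an example; substitute any model available to Model Runner:

```
# Pull a model from Docker Hub (ai/smollm2 is an illustrative example)
docker model pull ai/smollm2

# Run a one-off prompt against the model
docker model run ai/smollm2 "Write a haiku about containers."
```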