# LLM Backend Reference
Quick reference for the LLM backend integration layer. RUNE supports pluggable backends via the `LLMBackend` protocol; the default backend is Ollama.
## Quick Start (Generic API)
### List available models

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
models = backend.list_models()
print(models)
```
### Check running models

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
running = backend.list_running_models()
print(f"Currently running: {running}")
```
### Warm up a model

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
loaded = backend.warmup("mistral:latest", timeout_seconds=120)
print(f"Ready: {loaded}")
```
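Internally, a warmup of this shape typically amounts to polling the backend until the model reports as loaded or the timeout elapses. A minimal sketch of such a loop (the `wait_until_loaded` helper and the `is_loaded` callable are hypothetical illustrations, not part of the rune_bench API):

```python
import time
from typing import Callable


def wait_until_loaded(
    is_loaded: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 1.0,
) -> bool:
    """Poll is_loaded() until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_loaded():
            return True
        time.sleep(poll_interval)
    return False
```

A deadline based on `time.monotonic()` avoids surprises from wall-clock adjustments during long warmups.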
### Get model capabilities

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
normalized = backend.normalize_model_name("mistral:latest")
caps = backend.get_model_capabilities(normalized)
print(f"Context window: {caps.context_window}, Max tokens: {caps.max_output_tokens}")
```
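The returned capabilities are useful for budgeting a request, for example clamping the completion length so that prompt plus output still fits the context window. A small sketch (the `clamp_output_tokens` helper is hypothetical; it assumes only the two fields printed above):

```python
def clamp_output_tokens(
    prompt_tokens: int, context_window: int, max_output_tokens: int
) -> int:
    """Largest completion length that fits both the model's output cap
    and the context remaining after the prompt."""
    remaining = max(context_window - prompt_tokens, 0)
    return min(max_output_tokens, remaining)
```

For example, with a 32768-token window and a 4096-token output cap, a 30000-token prompt leaves only 2768 tokens for the completion.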
## CLI Usage
### List available models on a server

```shell
python -m rune llm-list-models --backend-url http://localhost:11434 --backend-type ollama
```
### Run benchmark with warm-up

```shell
python -m rune run-benchmark \
    --backend-url http://localhost:11434 \
    --model mistral:latest \
    --backend-warmup \
    --backend-warmup-timeout 90
```
## Ollama-Specific Module
For direct access to Ollama-specific features, the `OllamaBackend` facade and the lower-level `OllamaClient` and `OllamaModelManager` are still available:
```python
from rune_bench.backends.ollama import OllamaClient, OllamaModelManager

# Low-level client
client = OllamaClient("http://localhost:11434")
models = client.get_available_models()

# High-level manager
manager = OllamaModelManager.create("http://localhost:11434")
manager.warmup_model("mistral:latest", timeout_seconds=120, unload_others=True)
```
## Architecture
- `LLMBackend` (Protocol): `rune_bench/backends/base.py`. Six members: `base_url`, `get_model_capabilities`, `list_models`, `list_running_models`, `normalize_model_name`, `warmup`.
- `get_backend(type, url)`: factory in `rune_bench/backends/__init__.py` that resolves a backend by type.
- `OllamaBackend`: `rune_bench/backends/ollama.py`; implements `LLMBackend` for Ollama.
- `OllamaClient`: low-level HTTP transport for the Ollama API.
- `OllamaModelManager`: high-level model lifecycle operations.
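A custom backend only needs to satisfy the protocol's interface. The sketch below mirrors that interface with a local `typing.Protocol` purely for illustration (the real definitions live in `rune_bench/backends/base.py`; the exact signatures and the `DummyBackend` class here are assumptions):

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class ModelCapabilities:
    # Stand-in for rune_bench.backends.base.ModelCapabilities
    model_name: str
    context_window: int
    max_output_tokens: int


@runtime_checkable
class LLMBackend(Protocol):
    # Illustrative mirror of the six protocol members
    base_url: str

    def get_model_capabilities(self, model_name: str) -> ModelCapabilities: ...
    def list_models(self) -> list[str]: ...
    def list_running_models(self) -> list[str]: ...
    def normalize_model_name(self, model_name: str) -> str: ...
    def warmup(self, model_name: str, timeout_seconds: int = 120) -> bool: ...


class DummyBackend:
    """In-memory backend useful for tests: every known model is 'loaded'."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self._models = ["mistral:latest"]

    def get_model_capabilities(self, model_name: str) -> ModelCapabilities:
        return ModelCapabilities(model_name, 32768, 4096)

    def list_models(self) -> list[str]:
        return list(self._models)

    def list_running_models(self) -> list[str]:
        return list(self._models)

    def normalize_model_name(self, model_name: str) -> str:
        # Ollama-style normalization: default to the :latest tag
        return model_name if ":" in model_name else f"{model_name}:latest"

    def warmup(self, model_name: str, timeout_seconds: int = 120) -> bool:
        return model_name in self._models
```

Because the protocol is structural, `DummyBackend` needs no inheritance; any class with matching members satisfies it.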
## Testing with Mocks
```python
from unittest.mock import MagicMock

from rune_bench.backends.base import ModelCapabilities

fake_backend = MagicMock()
fake_backend.normalize_model_name.return_value = "mistral:latest"
fake_backend.get_model_capabilities.return_value = ModelCapabilities(
    model_name="mistral:latest",
    context_window=32768,
    max_output_tokens=4096,
)

# Patch get_backend to return the fake
# monkeypatch.setattr("rune_bench.backends.get_backend", lambda *a, **kw: fake_backend)
```
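To see the pattern end to end without a rune_bench install, here is a self-contained variant that swaps in a stand-in dataclass for the real `ModelCapabilities` and also verifies the call the way a test would (the stand-in and the asserted call pattern are illustrative assumptions):

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class ModelCapabilities:
    # Stand-in for rune_bench.backends.base.ModelCapabilities
    model_name: str
    context_window: int
    max_output_tokens: int


fake_backend = MagicMock()
fake_backend.normalize_model_name.return_value = "mistral:latest"
fake_backend.get_model_capabilities.return_value = ModelCapabilities(
    "mistral:latest", 32768, 4096
)

# Code under test calls the fake exactly like a real backend:
name = fake_backend.normalize_model_name("mistral")
caps = fake_backend.get_model_capabilities(name)

# The test side can then assert on both the result and the interaction:
assert caps.context_window == 32768
fake_backend.normalize_model_name.assert_called_once_with("mistral")
```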