# LLM Backend Reference
Quick reference for the LLM backend integration layer. RUNE supports pluggable backends via the `LLMBackend` protocol; the default backend is Ollama.
## Quick Start (Generic API)
### List available models

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
models = backend.list_models()
print(models)
```
### Check running models

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
running = backend.list_running_models()
print(f"Currently running: {running}")
```
### Warm up a model

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
loaded = backend.warmup("mistral:latest", timeout_seconds=120)
print(f"Ready: {loaded}")
```
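Internally, a warmup of this shape typically amounts to polling the backend until the model reports as loaded or the timeout elapses. A minimal sketch of such a loop (the `wait_until_loaded` helper and the `is_loaded` callable are hypothetical illustrations, not part of the rune_bench API):

```python
import time
from typing import Callable


def wait_until_loaded(
    is_loaded: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 1.0,
) -> bool:
    """Poll is_loaded() until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_loaded():
            return True
        time.sleep(poll_interval)
    return False
```

A deadline based on `time.monotonic()` avoids surprises from wall-clock adjustments during long warmups.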
### Get model capabilities

```python
from rune_bench.backends import get_backend

backend = get_backend("ollama", "http://localhost:11434")
normalized = backend.normalize_model_name("mistral:latest")
caps = backend.get_model_capabilities(normalized)
print(f"Context window: {caps.context_window}, Max tokens: {caps.max_output_tokens}")
```
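The returned capabilities are useful for budgeting a request, for example clamping the completion length so that prompt plus output still fits the context window. A small sketch (the `clamp_output_tokens` helper is hypothetical; it assumes only the two fields printed above):

```python
def clamp_output_tokens(
    prompt_tokens: int, context_window: int, max_output_tokens: int
) -> int:
    """Largest completion length that fits both the model's output cap
    and the context remaining after the prompt."""
    remaining = max(context_window - prompt_tokens, 0)
    return min(max_output_tokens, remaining)
```

For example, with a 32768-token window and a 4096-token output cap, a 30000-token prompt leaves only 2768 tokens for the completion.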
## CLI Usage
### List available models on a server

```shell
python -m rune llm-list-models --backend-url http://localhost:11434 --backend-type ollama
```
### Run benchmark with warm-up

```shell
python -m rune run-benchmark \
    --backend-url http://localhost:11434 \
    --model mistral:latest \
    --backend-warmup \
    --backend-warmup-timeout 90
```
## Ollama-Specific Module
For direct access to Ollama-specific features, the `OllamaBackend` facade and the lower-level `OllamaClient` and `OllamaModelManager` are still available:
```python
from rune_bench.backends.ollama import OllamaClient, OllamaModelManager

# Low-level client
client = OllamaClient("http://localhost:11434")
models = client.get_available_models()

# High-level manager
manager = OllamaModelManager.create("http://localhost:11434")
manager.warmup_model("mistral:latest", timeout_seconds=120, unload_others=True)
```
## Architecture
- `LLMBackend` (Protocol): `rune_bench/backends/base.py`. Six members: `base_url`, `get_model_capabilities`, `list_models`, `list_running_models`, `normalize_model_name`, `warmup`.
- `get_backend(type, url)`: factory in `rune_bench/backends/__init__.py` that resolves a backend by type.
- `OllamaBackend`: `rune_bench/backends/ollama.py`; implements `LLMBackend` for Ollama.
- `OllamaClient`: low-level HTTP transport for the Ollama API.
- `OllamaModelManager`: high-level model lifecycle operations.
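A custom backend only needs to satisfy the protocol's interface. The sketch below mirrors that interface with a local `typing.Protocol` purely for illustration (the real definitions live in `rune_bench/backends/base.py`; the exact signatures and the `DummyBackend` class here are assumptions):

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class ModelCapabilities:
    # Stand-in for rune_bench.backends.base.ModelCapabilities
    model_name: str
    context_window: int
    max_output_tokens: int


@runtime_checkable
class LLMBackend(Protocol):
    # Illustrative mirror of the six protocol members
    base_url: str

    def get_model_capabilities(self, model_name: str) -> ModelCapabilities: ...
    def list_models(self) -> list[str]: ...
    def list_running_models(self) -> list[str]: ...
    def normalize_model_name(self, model_name: str) -> str: ...
    def warmup(self, model_name: str, timeout_seconds: int = 120) -> bool: ...


class DummyBackend:
    """In-memory backend useful for tests: every known model is 'loaded'."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self._models = ["mistral:latest"]

    def get_model_capabilities(self, model_name: str) -> ModelCapabilities:
        return ModelCapabilities(model_name, 32768, 4096)

    def list_models(self) -> list[str]:
        return list(self._models)

    def list_running_models(self) -> list[str]:
        return list(self._models)

    def normalize_model_name(self, model_name: str) -> str:
        # Ollama-style normalization: default to the :latest tag
        return model_name if ":" in model_name else f"{model_name}:latest"

    def warmup(self, model_name: str, timeout_seconds: int = 120) -> bool:
        return model_name in self._models
```

Because the protocol is structural, `DummyBackend` needs no inheritance; any class with matching members satisfies it.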
## Testing with Mocks
```python
from unittest.mock import MagicMock

from rune_bench.backends.base import ModelCapabilities

fake_backend = MagicMock()
fake_backend.normalize_model_name.return_value = "mistral:latest"
fake_backend.get_model_capabilities.return_value = ModelCapabilities(
    model_name="mistral:latest",
    context_window=32768,
    max_output_tokens=4096,
)

# Patch get_backend to return the fake
# monkeypatch.setattr("rune_bench.backends.get_backend", lambda *a, **kw: fake_backend)
```
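To see the pattern end to end without a rune_bench install, here is a self-contained variant that swaps in a stand-in dataclass for the real `ModelCapabilities` and also verifies the call the way a test would (the stand-in and the asserted call pattern are illustrative assumptions):

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class ModelCapabilities:
    # Stand-in for rune_bench.backends.base.ModelCapabilities
    model_name: str
    context_window: int
    max_output_tokens: int


fake_backend = MagicMock()
fake_backend.normalize_model_name.return_value = "mistral:latest"
fake_backend.get_model_capabilities.return_value = ModelCapabilities(
    "mistral:latest", 32768, 4096
)

# Code under test calls the fake exactly like a real backend:
name = fake_backend.normalize_model_name("mistral")
caps = fake_backend.get_model_capabilities(name)

# The test side can then assert on both the result and the interaction:
assert caps.context_window == 32768
fake_backend.normalize_model_name.assert_called_once_with("mistral")
```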