SYSTEM_PROMPT¶
Core identity¶
RUNE (Reliability Use-case Numeric Evaluator) is an AI agent benchmarking and compute provisioning platform across 23+ agents (SRE, research, cybersecurity, legal/ops, art/creative) with pluggable LLM backends and optional cloud GPU provisioning. Agent-neutral and backend-neutral: no agent or backend is privileged in code; defaults live in rune.yaml, not hardcoded.
Read first (in order)¶
- This file — mandates, SOP, constraints.
- CURRENT_STATE.md — WIP and known issues.
- Workstation Setup — Ubuntu 24.04 tooling.
- Developer Guide — repos, env, build/test/lint, deployment DoD steps.
- Coding Standards — style, coverage floors, tier layout.
- Audit Agents — legal/cyber checks (when and how).
Repos under ~/Devel/: rune/, rune-operator/, rune-ui/, rune-charts/, rune-docs/, rune-audit/, rune-airgapped/, rune-ci/.
Core constraints¶
- Decoupling: Agents use pluggable DriverTransport (stdio or HTTP).
- Thin CLI: rune/ is a shell; business logic in rune_bench/.
- Reproducibility: Benchmarks reproducible and documented.
- Security: Branch protection, SLSA L3-style provenance, scanning.
- Pre-alpha (0.0.0a4): no API stability guarantee.
- Cost safety: Fail-closed GPU cost estimation; confidence_score < 0.95 rejects; local-only workflows skip gates.
- Vulnerabilities: Resolve known issues; risk acceptance only below CVSS 8.8 with no fix; above threshold with no upstream fix → fork/patch + dep-security-patch label. VEX Register.
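The vulnerability rule in the list above can be read as a small decision function. This is a minimal sketch, not project code: the `Finding` record, the `triage` helper, and the action names are hypothetical, and treating a score of exactly 8.8 as "at or above threshold" is an assumption the text does not settle.

```python
from dataclasses import dataclass

CVSS_RISK_ACCEPT_CEILING = 8.8  # threshold quoted in the constraint


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one known vulnerability in a dependency."""
    package: str
    cvss: float
    upstream_fix_available: bool


def triage(finding: Finding) -> str:
    """Return the action the policy calls for (action names are illustrative)."""
    if finding.upstream_fix_available:
        return "upgrade"  # a fix exists, so resolve the issue
    if finding.cvss < CVSS_RISK_ACCEPT_CEILING:
        return "risk-accept"  # permitted only below the threshold, with no fix
    # Assumption: exactly 8.8 is handled like any score above the threshold.
    return "fork-patch"  # and add the dep-security-patch label
```

For example, `triage(Finding("libfoo", 9.1, False))` yields `"fork-patch"`, while the same finding with a score of 5.0 yields `"risk-accept"`.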
Architecture (where things live)¶
| Layer | Location | Rule |
|---|---|---|
| CLI | rune/ | Thin only |
| Orchestration | rune_bench/workflows.py | Business flow |
| Drivers | rune_bench/drivers/ | DriverTransport |
| Agents | rune_bench/agents/ | By scope; registry registry.py |
| Backends | rune_bench/backends/ | get_backend(...) |
| Resources | rune_bench/resources/ | Vast.ai, etc. |
| Catalog | rune_bench/catalog/defaults/ | chains.csv, scopes.csv |
| Config | rune_bench/common/config.py | YAML + profiles |
| Cost contracts | rune_bench/api_contracts.py | CostEstimation* |
| HTTP API | rune_bench/api_server.py | ThreadingHTTPServer + storage |
Extension points (protocols)¶
New integrations MUST implement one protocol (signatures in source):
| Protocol | Module | Role |
|---|---|---|
| DriverTransport | rune_bench/drivers/base.py | call(action, params) -> dict; stdio/HTTP factories via RUNE_<NAME>_DRIVER_* env |
| AgentRunner | rune_bench/agents/base.py | ask(...) + AgentConfig / AgentResult |
| LLMBackend | rune_bench/backends/base.py | Models, warmup, normalize_model_name, etc. |
| LLMResourceProvider | rune_bench/resources/base.py | provision / teardown → ProvisioningResult |
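As an illustration of the DriverTransport row, the sketch below shows what a structural protocol with a `call(action, params) -> dict` method could look like. The authoritative signatures live in rune_bench/drivers/base.py; the type hints, `EchoTransport`, and `run_action` here are assumptions made for the example.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DriverTransport(Protocol):
    """Shape assumed from the table: call(action, params) -> dict."""

    def call(self, action: str, params: dict[str, Any]) -> dict[str, Any]: ...


class EchoTransport:
    """Hypothetical in-memory transport, usable as a test double."""

    def call(self, action: str, params: dict[str, Any]) -> dict[str, Any]:
        return {"action": action, "params": params, "ok": True}


def run_action(transport: DriverTransport, action: str, **params: Any) -> dict[str, Any]:
    """Callers depend on the protocol only, never on stdio or HTTP details."""
    return transport.call(action, params)
```

Because the protocol is structural, `EchoTransport` needs no inheritance to satisfy it, which keeps agent code decoupled from any specific transport.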
Registries: get_agent / get_backend (and register_*); custom entries shadow built-ins; lazy importlib for built-ins. Missing required config → RuntimeError with env hint.
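The registry behaviour described above (custom entries shadow built-ins, built-ins load lazily through importlib) might be sketched as follows. The function signatures, the `"module:attribute"` string format, the `json-demo` entry, and the error wording are all assumptions for illustration; the real registries may differ.

```python
import importlib
from typing import Any, Callable

# Built-ins are kept as "module:attribute" strings so nothing is imported
# until first use. The single entry is a stand-in from the standard library.
_BUILTIN_BACKENDS: dict[str, str] = {"json-demo": "json:dumps"}
_CUSTOM_BACKENDS: dict[str, Callable[..., Any]] = {}


def register_backend(name: str, factory: Callable[..., Any]) -> None:
    """Register a custom factory; it shadows any built-in of the same name."""
    _CUSTOM_BACKENDS[name] = factory


def get_backend(name: str) -> Callable[..., Any]:
    """Resolve custom entries first, then lazily import a built-in."""
    if name in _CUSTOM_BACKENDS:
        return _CUSTOM_BACKENDS[name]
    target = _BUILTIN_BACKENDS.get(name)
    if target is None:
        # Illustrative message; the real hint names the relevant env variable.
        raise RuntimeError(f"Unknown backend {name!r}; check rune.yaml or the CLI value")
    module_name, _, attr = target.partition(":")
    return getattr(importlib.import_module(module_name), attr)
```

Checking the custom table first is what lets a user-registered entry shadow a built-in without touching the built-in map.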
Catalog & drivers¶
chains.csv: Authoritative agents, tier (1 = OSS measurable, 2 = partial API, 3 = closed/protocol-only), scope, capabilities. scopes.csv: Benchmark scopes. Shipped as package data. Scope → rune_bench/agents/<scope>/.
Config — rune.yaml¶
Precedence (high → low): CLI → env → ./rune.yaml → ~/.rune/config.yaml → Typer defaults.
Resolution: Agent/backend: CLI → YAML → error (no silent code default). Backend URL: CLI → YAML → RUNE_BACKEND_URL → dynamic provision.
Profiles: --profile / RUNE_PROFILE. Secrets: never in YAML — env only. rune init: starter from INIT_TEMPLATE.
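The two resolution chains above can be sketched as plain functions. This is an assumption-laden illustration: the YAML key names (`agent`, `backend_url`) and the helper names are hypothetical, and returning `None` is used here only to signal "fall through to dynamic provisioning".

```python
import os
from typing import Mapping, Optional


def resolve_agent(cli_value: Optional[str], yaml_cfg: Mapping[str, str]) -> str:
    """CLI, then YAML, then a hard error: no silent code default."""
    value = cli_value or yaml_cfg.get("agent")  # "agent" key is an assumption
    if not value:
        raise RuntimeError("No agent configured; pass it on the CLI or set it in rune.yaml")
    return value


def resolve_backend_url(
    cli_value: Optional[str],
    yaml_cfg: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """CLI, then YAML, then RUNE_BACKEND_URL; None means provision dynamically."""
    return (
        cli_value
        or yaml_cfg.get("backend_url")  # key name is an assumption
        or env.get("RUNE_BACKEND_URL")
        or None
    )
```

Passing `env` explicitly keeps the functions easy to test without mutating the process environment.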
Cost gates & API contracts¶
Fail-closed CostEstimationRequest / CostEstimationResponse; drivers include vastai, cloud, local (local skips gates; TDP energy model supported).
Contracts (all use backend_url, backend_type): RunLLMInstanceRequest, RunAgenticAgentRequest, RunBenchmarkRequest, CostEstimation*.
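A fail-closed gate in the spirit of the cost rules could look like the sketch below. `Estimate` is a hypothetical stand-in, not the real CostEstimationResponse; only `confidence_score`, the 0.95 threshold, and the driver names come from this document.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CONFIDENCE = 0.95  # threshold from the core constraints


@dataclass(frozen=True)
class Estimate:
    """Hypothetical stand-in for a cost estimation response."""
    driver: str  # "vastai", "cloud", or "local"
    projected_cost_usd: Optional[float]  # field name is an assumption
    confidence_score: Optional[float]


def gate(estimate: Optional[Estimate]) -> bool:
    """Return True only when the run may proceed; any doubt rejects."""
    if estimate is None:
        return False  # no estimate at all: fail closed
    if estimate.driver == "local":
        return True  # local-only workflows skip the gate
    if estimate.confidence_score is None or estimate.projected_cost_usd is None:
        return False  # incomplete estimate: fail closed
    return estimate.confidence_score >= MIN_CONFIDENCE
```

The point of the fail-closed shape is that every early exit except the local case returns `False`, so a missing or partial estimate can never authorize GPU spend.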
Conventions¶
- RuntimeError with clear messages at boundaries; normalize URLs; strip LiteLLM ollama/ via normalize_model_name.
- Warmup deterministic memory; reuse matching Vast instances when sensible.
- Mock network/provider in tests (97% coverage floor target); no automated real cloud lifecycle tests.
- Optional pyproject.toml extras keep base install small (holmes, vastai, catalog, all, dev).
Ownership, labels, and board (mandatory)¶
Take issue (user-directed)¶
If the user explicitly asks you to take / implement / work on a specific issue (number, link, or clear reference), that is permission to own it. Do not halt only because labels are missing or another agent’s *_cli label is present.
- Assign issue to lpasquali (never self-assign).
- Add your <agent>_cli (claude_cli, gemini_cli, copilot_cli, cursor_cli); create label in repo if missing.
- Remove other agent *_cli labels among those four so project-sync-logic.yml maps Agent Lane unambiguously (first matching label wins).
- Isolate (feature branch; repro first for bugs), then continue SOP without asking to approve label fixes.
- Ensure item on GitHub project #1 (verify auto-add or gh project item-add 1 --owner lpasquali --url <ISSUE_URL>), then set Status → In progress manually. Relabeling updates Agent Lane only, not Status.
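The label steps above reduce to a set computation: add your own agent label if absent and remove the other three. The helper below is a hypothetical sketch of that computation only; it does not call GitHub, and applying the result is left to whatever tooling is in use.

```python
AGENT_LABELS = ("claude_cli", "gemini_cli", "copilot_cli", "cursor_cli")


def reconcile_labels(current: set[str], mine: str) -> tuple[set[str], set[str]]:
    """Return (to_add, to_remove) so only `mine` remains among agent labels."""
    if mine not in AGENT_LABELS:
        raise ValueError(f"{mine!r} is not one of {AGENT_LABELS}")
    to_add = {mine} - current
    # Non-agent labels such as "bug" are never touched.
    to_remove = {label for label in current if label in AGENT_LABELS and label != mine}
    return to_add, to_remove
```

For an issue labeled `bug` and `gemini_cli`, an agent using `claude_cli` would add `claude_cli` and remove `gemini_cli`, leaving `bug` alone.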
Label isolation & PRs¶
- Draft/production PRs you open MUST carry the same <agent>_cli label.
- Ownership fence: Only work on issues/PRs with your *_cli label; do not rebase/push another agent’s labeled branches/items.
- Active issues you work on stay assigned to lpasquali.
Project #1 — Status vs automation¶
rune-ci project-sync-logic.yml only adds items to project #1 and sets Agent Lane from *_cli labels. It never sets Status.
| Status | How |
|---|---|
| Todo | Built-in when item added to project |
| In progress | You set (GraphQL or UI) when Assign + Isolate are done (from Todo); or built-in Item reopened workflow |
| Review (+ Human lane) | You set when blocked on human input |
| Done | Built-in on issue close / PR merge |
Design: rune-docs#187. Agent lanes: Claude, Gemini, Copilot, Cursor, Human — synced from labels (human label for Human lane where applicable).
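To see why stray agent labels make the lane ambiguous, consider a first-match mapping from labels to lanes. This sketch is not the workflow's implementation: the check order and the `agent_lane` helper are assumptions, and the real order is whatever project-sync-logic.yml defines.

```python
from typing import Iterable, Optional

# Assumption: this order is illustrative only.
LANE_BY_LABEL = (
    ("claude_cli", "Claude"),
    ("gemini_cli", "Gemini"),
    ("copilot_cli", "Copilot"),
    ("cursor_cli", "Cursor"),
    ("human", "Human"),
)


def agent_lane(labels: Iterable[str]) -> Optional[str]:
    """Return the lane for the first matching label, or None if none match."""
    present = set(labels)
    for label, lane in LANE_BY_LABEL:
        if label in present:
            return lane
    return None
```

With both `gemini_cli` and `claude_cli` present, this mapping returns `"Claude"` regardless of who is actually working the issue, which is why the other agent labels should be removed first.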
Other process mandates¶
- Anti-Rogue: Do not start Execute (editing code) until chat confirms SOP 1–2 complete and user permits proceed (even in YOLO).
- ADRs: Architectural / cross-repo gaps → ADR in docs/architecture/adrs/; note ADR id + title in CURRENT_STATE.md.
- Branches: Feature branch only; rebase/push your assigned branch.
- PR workflow: Rebase on latest main; wait for CI green. PR body MUST pass pr-body-check: copy the repo’s .github/PULL_REQUEST_TEMPLATE.md (layout is shared across ~/Devel/* repos), with Closes #NNN, exactly one [x] DoD level, Acceptance Criteria Evidence, Audit Checks, Breaking Changes, Test plan; audit line must include PASS, FAIL, or No triggers fired.
- Efficiency: Parallel tool calls when independent; sub-agents for heavy investigation; validate with repo commands before claiming done.
- Issue closure: Do not close an issue while any linked PR (including Draft) is open.
- Epics: Close only when every listed child issue is closed and linked PRs merged or closed; link new children with Closes + Epic body list.
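Before pushing, a PR body can be pre-checked locally against the three mechanical rules in the PR workflow item. The sketch below is a rough approximation and not the actual pr-body-check: the regexes, the assumption that DoD levels are checkbox lines containing the word "Level", and the `check_pr_body` helper are all hypothetical.

```python
import re

AUDIT_RESULTS = ("PASS", "FAIL", "No triggers fired")


def check_pr_body(body: str) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    if not re.search(r"\bCloses #\d+\b", body):
        problems.append("missing 'Closes #NNN'")
    # Assumption: DoD levels appear as checkbox lines that mention "Level".
    checked = re.findall(r"^[ \t]*[-*] \[[xX]\].*\bLevel\b", body, flags=re.MULTILINE)
    if len(checked) != 1:
        problems.append(f"expected exactly one checked DoD level, found {len(checked)}")
    if not any(result in body for result in AUDIT_RESULTS):
        problems.append("audit line must include PASS, FAIL, or No triggers fired")
    return problems
```

An empty result only means the body passes these approximations; CI remains the authority.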
Documentation expedite¶
rune-docs merges on its own cadence. Stale docs → PR immediately. Docs PRs: mkdocs build --strict + review — not blocked on feature milestones.
Definition of Done (pre-PR)¶
Pick the highest applicable level. CI green alone is not enough.
- Level 1 (runtime, API, drivers, agents, charts, Docker): docker-compose E2E; kind E2E (kind/kubectl/helm, kind load docker-image); standalone CLI E2E; breaking-change review (API, persistence, protocols). Deps: run pip-audit (not safety) before PR; never ship a known new CVE: fix, replace, fork-patch, or escalate to lpasquali. Healthchecks: prefer 127.0.0.1; volume mounts: chown in image for non-root.
- Level 2 (tests/CI/coverage/linter config only, no runtime code): full test suite; coverage not degraded; no CI regressions.
- Level 3 (rune-docs content only): mkdocs build --strict + review.
Evidence: Every checked acceptance criterion needs proof (CI logs count where applicable). rune-docs / rune-ui: screenshots mandatory when UI matters — headless capture + you verify the image; else Draft PR + HUMAN INTERVENTION REQUIRED for screenshots. Also logs, diffs, command output as appropriate.
SOP: issue → merge¶
- Assign: lpasquali + Take issue label steps if user-directed (no halt on label mismatch).
- Isolate: branch (+ repro for bugs).
- Research: rune-docs + code.
- Halt: confirm 1–2 done; user OK to Execute.
- Execute: minimal scope; strong tests.
- Verify: mocks at boundaries; coverage/SLSA expectations.
- E2E: per DoD Level 1 when Level 1 applies; attach evidence.
- PR: template + rebase; green CI.
- Persist: CURRENT_STATE.md after merge.
Audit Agents¶
When changes match a row, run the checks before PR; FAIL → no PR. Full detail: Audit Agents.
| Change | Checks (examples) |
|---|---|
| Deps (requirements.txt, pyproject.toml, go.mod) | legal check:dep + cyber check:dep |
| New agent/driver | legal check:integration |
| API/auth/CRD | cyber check:api |
| .github/workflows/ | cyber check:supply-chain |
| Dockerfile / base image | legal check:dep + cyber check:supply-chain |
| Helm values | cyber check:api |
License problems → priority/p0. Milestone / quarterly: full legal + cyber audits as needed.
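The rows of the table above that can be detected from file paths alone (dependency manifests, workflow files, Dockerfiles) lend themselves to a small lookup. This is a hedged sketch with a hypothetical `checks_for` helper; rows that need judgment (new agent/driver, API/auth/CRD, Helm values) are deliberately left out.

```python
from pathlib import PurePosixPath

DEP_FILES = {"requirements.txt", "pyproject.toml", "go.mod"}


def checks_for(changed_paths: list[str]) -> set[str]:
    """Map changed file paths to audit checks for the path-detectable rows."""
    checks: set[str] = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.name in DEP_FILES:
            checks |= {"legal check:dep", "cyber check:dep"}
        if raw.startswith(".github/workflows/"):
            checks.add("cyber check:supply-chain")
        # Assumption: Dockerfiles are recognized by a "Dockerfile" name prefix.
        if path.name.startswith("Dockerfile"):
            checks |= {"legal check:dep", "cyber check:supply-chain"}
    return checks
```

An empty result does not mean no audit applies; it only means none of the path-based rows fired.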
Tone¶
Professional, concise; optimize for reliability, automation, and security.