ralph-orchestrator

Production patterns for multi-agent LLM systems — circuit breakers, token tracking, session memory, and observability metrics. Drop-in FastAPI router included.

Why this exists. Most “agent framework” tutorials stop at the happy path. Production multi-agent systems fail in five predictable ways: cascading failures when one agent degrades, runaway token cost, race conditions between parallel agents, context overflow in long conversations, and zero observability when something goes wrong. This library ships opinionated, minimal-dependency primitives for all five, extracted from a working autonomous agent system.

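from ralph_orchestrator import get_circuit_breaker_registry, get_token_tracker

registry = get_circuit_breaker_registry()
breaker = registry.get_breaker("inventory_agent")

async def run_inventory_agent(agent):
    # Skip the call entirely while the breaker reports the agent as unavailable.
    if not breaker.is_available():
        return None
    try:
        result = await agent.execute()
        breaker.record_success()
        # Attribute this call's token usage to the agent.
        get_token_tracker().record_usage(
            agent_id="inventory_agent",
            input_tokens=result.usage.input_tokens,
            output_tokens=result.usage.output_tokens,
            model="claude-sonnet-4-5",
        )
        return result
    except Exception:
        # Count the failure against the breaker, then let the caller handle it.
        breaker.record_failure()
        raise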
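
For readers new to the pattern, the three breaker calls used above map onto a small state machine. The following is a minimal sketch of a generic three-state breaker, not this library's implementation: the class name `SketchBreaker`, the `failure_threshold` and `recovery_timeout` parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # calls flow normally
    OPEN = "open"            # calls are rejected
    HALF_OPEN = "half_open"  # a trial call is allowed through


class SketchBreaker:
    """Illustrative breaker; names and defaults are hypothetical."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def is_available(self):
        # After the recovery timeout, an open breaker moves to half-open.
        if self.state is State.OPEN and self._clock() - self._opened_at >= self.recovery_timeout:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self):
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self._failures += 1
        # A failed trial call, or too many failures in a row, opens the breaker.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
```

Passing a fake `clock` (for example `lambda: now[0]`) makes the timeout transitions testable without sleeping.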

What’s in the box

| Module | Pattern | What it solves |
| --- | --- | --- |
| circuit_breaker | Hystrix-style circuit breaker with CLOSED / OPEN / HALF_OPEN states, per-agent | One flaky agent takes down the whole orchestrator |
| token_tracker | SQLite-backed LLM cost accounting with per-model pricing + daily/weekly/monthly rollups | No idea which agent is burning your Anthropic budget |
| agent_memory | MemoryManager (session-isolated shared state with 4 conflict-resolution strategies) + WindowMemory (sliding window with auto-compression) | Parallel agents corrupting shared state; long sessions hitting context limits |
| agent_metrics | Call-level observability: success rate, latency, error taxonomy, agent-to-agent interaction graph, 0–100 health score | Black-box agent failures with no leading indicators |
| fastapi_router | create_orchestration_router(): mounts /api/agents/* endpoints exposing everything above | Dashboard/Grafana needs a REST surface, not SDK imports |

The core has no third-party runtime dependencies; it uses only the standard library. FastAPI is required only if you mount fastapi_router.
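
To make "sliding window with auto-compression" concrete, here is a minimal standard-library sketch of the general idea: keep the most recent messages verbatim and fold evicted ones into a compact summary. It is not WindowMemory's actual code. The class name `SketchWindow`, the `max_messages` parameter, and the truncating `summarize` default are assumptions; a real system would more likely summarize with an LLM call.

```python
from collections import deque


class SketchWindow:
    """Illustrative sliding window; names and behavior are hypothetical."""

    def __init__(self, max_messages=4, summarize=None):
        self.max_messages = max_messages
        # Placeholder compressor: keep only the first 40 characters.
        self._summarize = summarize or (lambda message: message[:40])
        self.summary = []      # compressed form of evicted messages
        self.recent = deque()  # messages kept verbatim

    def add(self, message):
        self.recent.append(message)
        # Evict the oldest messages once the window overflows.
        while len(self.recent) > self.max_messages:
            self.summary.append(self._summarize(self.recent.popleft()))

    def context(self):
        # The summary line comes first, followed by the verbatim window.
        header = ["[summary] " + " | ".join(self.summary)] if self.summary else []
        return header + list(self.recent)
```

With `max_messages=2`, adding "a", "b", "c", "d" leaves "c" and "d" in the window and moves "a" and "b" into the summary line.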

Install

pip install -e .                 # core only
pip install -e ".[fastapi]"      # + fastapi / pydantic
pip install -e ".[dev]"          # + pytest + coverage

Python 3.10+. All persistence is SQLite (./database/metrics.db, created on first write).
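
As a rough picture of what SQLite-backed cost accounting can look like, the sketch below records usage rows and produces a per-agent daily rollup using only `sqlite3`. The table schema, the function names, the `example-model` entry, and its prices are all hypothetical and are not taken from token_tracker; substitute your provider's real rates.

```python
import sqlite3

# Hypothetical prices in USD per million tokens, for illustration only.
PRICES = {"example-model": {"input": 3.00, "output": 15.00}}


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS usage (
               day TEXT, agent_id TEXT, model TEXT,
               input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL)"""
    )
    return conn


def record_usage(conn, day, agent_id, model, input_tokens, output_tokens):
    price = PRICES[model]
    # Cost is tokens times the per-million rate, for each direction.
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    conn.execute(
        "INSERT INTO usage VALUES (?, ?, ?, ?, ?, ?)",
        (day, agent_id, model, input_tokens, output_tokens, cost),
    )
    conn.commit()
    return cost


def daily_rollup(conn, day):
    # Total tokens and cost per agent for one day.
    rows = conn.execute(
        "SELECT agent_id, SUM(input_tokens + output_tokens), SUM(cost_usd) "
        "FROM usage WHERE day = ? GROUP BY agent_id ORDER BY agent_id",
        (day,),
    )
    return {agent: (tokens, cost) for agent, tokens, cost in rows}
```

Weekly and monthly rollups follow the same shape with a date range in the `WHERE` clause instead of a single day.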

Run the tests

pytest

Three test files ship with the library (tests/test_circuit_breaker.py, tests/test_token_tracker.py, tests/test_agent_memory.py) covering state transitions, cost rollups, conflict resolution, and compression. No external services required.

REST API

Mount create_orchestration_router() on any FastAPI app and you get:

GET  /api/agents/token-stats?period=daily|weekly|monthly
GET  /api/agents/token-stats/trend?days=7
GET  /api/agents/metrics?period=daily
GET  /api/agents/metrics/trend?days=7
GET  /api/agents/metrics/errors
GET  /api/agents/metrics/interactions          # nodes + edges graph
GET  /api/agents/health                        # system-wide summary
GET  /api/agents/circuit-breakers
POST /api/agents/circuit-breakers/{agent_id}/reset
GET  /api/agents/memory/stats

Point Grafana, Metabase, or a custom dashboard at those endpoints and you’re done.

from fastapi import FastAPI
from ralph_orchestrator.fastapi_router import create_orchestration_router

app = FastAPI()
app.include_router(create_orchestration_router())
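
If you want to poll the mounted endpoints from a script rather than a dashboard, a small standard-library client is enough. The sketch below assumes the app is served at `http://localhost:8000`; the helper names `endpoint_url` and `fetch_json` are made up for this example, and the shape of the JSON responses is deliberately left uninterpreted because it is not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: adjust to wherever your FastAPI app is actually served.
BASE_URL = "http://localhost:8000"


def endpoint_url(path, base_url=BASE_URL, **params):
    """Build a URL under /api/agents/, e.g. endpoint_url("token-stats", period="daily")."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/api/agents/{path}{query}"


def fetch_json(path, **params):
    """GET one endpoint and decode the JSON body (requires a running server)."""
    with urlopen(endpoint_url(path, **params), timeout=5) as response:
        return json.load(response)
```

For example, `fetch_json("metrics/trend", days=7)` requests `/api/agents/metrics/trend?days=7` from the running app.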

Design choices worth knowing

Repository layout

src/ralph_orchestrator/
  circuit_breaker.py       # fault tolerance
  token_tracker.py         # cost accounting
  agent_memory.py          # MemoryManager + WindowMemory
  agent_metrics.py         # health scoring + interaction graph
  fastapi_router.py        # REST surface
tests/
  test_circuit_breaker.py
  test_token_tracker.py
  test_agent_memory.py
examples/
  quickstart.py

License

MIT — see LICENSE.