ralph-orchestrator

Production patterns for multi-agent LLM systems — circuit breakers, token tracking, session memory, and observability metrics. Drop-in FastAPI router included.

Why this exists. Most “agent framework” tutorials stop at the happy path. Production multi-agent systems fail in five predictable ways: cascading failures when one agent degrades, runaway token cost, race conditions between parallel agents, context overflow in long conversations, and zero observability when something goes wrong. This library ships opinionated, minimal-dependency primitives for all five, extracted from a working autonomous agent system.

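from ralph_orchestrator import get_circuit_breaker_registry, get_token_tracker

registry = get_circuit_breaker_registry()
breaker = registry.get_breaker("inventory_agent")


# `await` needs an enclosing coroutine; the wrapper name is illustrative.
async def run_inventory_agent(agent):
    # Skip the call while the breaker is open.
    if not breaker.is_available():
        return None
    try:
        result = await agent.execute()
        breaker.record_success()
        get_token_tracker().record_usage(
            agent_id="inventory_agent",
            input_tokens=result.usage.input_tokens,
            output_tokens=result.usage.output_tokens,
            model="claude-sonnet-4-5",
        )
        return result
    except Exception:
        # Count the failure so repeated errors can trip the breaker.
        breaker.record_failure()
        raise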
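
To make the three breaker calls above less of a black box, here is a minimal sketch of a CLOSED / OPEN / HALF_OPEN state machine behind the same method names. It is not the library's implementation: the class name `SketchBreaker`, the `failure_threshold` and `recovery_timeout` parameters, their defaults, and the injectable `clock` are all assumptions made for illustration.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class SketchBreaker:
    """Illustrative breaker; parameter names and defaults are assumptions."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def is_available(self):
        # Once the cool-down has elapsed, move to HALF_OPEN and allow trial calls.
        cooled_down = self.clock() - self.opened_at >= self.recovery_timeout
        if self.state is State.OPEN and cooled_down:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self):
        # Any success closes the breaker and clears the failure count.
        self.state = State.CLOSED
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures while closed, opens the breaker.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Passing the clock in as a parameter keeps the recovery timeout testable without sleeping, which is the main design choice worth copying from this sketch.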

What’s in the box

Module	Pattern	What it solves
`circuit_breaker`	Hystrix-style circuit breaker with CLOSED / OPEN / HALF_OPEN states, per-agent	One flaky agent takes down the whole orchestrator
`token_tracker`	SQLite-backed LLM cost accounting with per-model pricing + daily/weekly/monthly rollups	No idea which agent is burning your Anthropic budget
`agent_memory`	`MemoryManager` (session-isolated shared state with 4 conflict-resolution strategies) + `WindowMemory` (sliding window with auto-compression)	Parallel agents corrupting shared state; long sessions hitting context limits
`agent_metrics`	Call-level observability: success rate, latency, error taxonomy, agent-to-agent interaction graph, 0–100 health score	Black-box agent failures with no leading indicators
`fastapi_router`	`create_orchestration_router()` — mounts `/api/agents/*` endpoints exposing everything above	Dashboard/Grafana needs a REST surface, not SDK imports

The core has no third-party runtime dependencies; it uses only the standard library. FastAPI is required only if you mount fastapi_router.
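
As a rough picture of what session-isolated state with conflict resolution can look like, here is a standard-library sketch. Only the four strategy names come from this README; the `SketchStore` and `Session` classes, the per-key version counters used to detect conflicts, and the exact meaning given to each strategy are assumptions, so the real `MemoryManager` may behave differently.

```python
import copy
from enum import Enum

_MISSING = object()


class Strategy(Enum):
    LAST_WRITE_WINS = "last_write_wins"
    FIRST_WRITE_WINS = "first_write_wins"
    MERGE_ARRAYS = "merge_arrays"
    ERROR_ON_CONFLICT = "error_on_conflict"


class Session:
    def __init__(self, state, versions):
        self.base = copy.deepcopy(state)  # what the session started from
        self.data = copy.deepcopy(state)  # private working copy
        self.seen = dict(versions)        # per-key versions at snapshot time


class SketchStore:
    """Hypothetical shared store; not the library's MemoryManager."""

    def __init__(self):
        self.state = {}
        self.versions = {}  # key -> number of commits that wrote it

    def create_session(self):
        return Session(self.state, self.versions)

    def commit(self, session, strategy=Strategy.LAST_WRITE_WINS):
        updates = {}
        for key, value in session.data.items():
            if session.base.get(key, _MISSING) == value:
                continue  # this session did not change the key
            # Conflict: another session committed this key after our snapshot.
            conflict = self.versions.get(key, 0) != session.seen.get(key, 0)
            if conflict and strategy is Strategy.ERROR_ON_CONFLICT:
                raise RuntimeError(f"conflicting write to {key!r}")
            if conflict and strategy is Strategy.FIRST_WRITE_WINS:
                continue  # keep the value that was committed first
            current = self.state.get(key)
            if (conflict and strategy is Strategy.MERGE_ARRAYS
                    and isinstance(current, list) and isinstance(value, list)):
                value = current + [v for v in value if v not in current]
            updates[key] = value
        # Apply only after every key resolved, so a raised conflict changes nothing.
        for key, value in updates.items():
            self.state[key] = copy.deepcopy(value)
            self.versions[key] = self.versions.get(key, 0) + 1
```

The sketch ignores key deletions and treats non-list values under MERGE_ARRAYS as last-write-wins; both are simplifications rather than documented behavior.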

Install

pip install -e .                 # core only
pip install -e ".[fastapi]"      # + fastapi / pydantic
pip install -e ".[dev]"          # + pytest + coverage

Requires Python 3.10+. All persistence goes to SQLite (./database/metrics.db, created on first write).
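
To show what SQLite-backed cost accounting amounts to, here is a small sketch using only the standard library. The table name `token_usage`, its columns, the helper names, and the prices in `EXAMPLE_PRICING` are all made up for illustration; they are not the library's schema or real vendor pricing.

```python
import sqlite3

# Placeholder USD prices per million tokens; NOT real vendor pricing.
EXAMPLE_PRICING = {"example-model": {"input": 1.00, "output": 5.00}}


def cost_usd(model, input_tokens, output_tokens, pricing=EXAMPLE_PRICING):
    """Price one call from its token counts."""
    rates = pricing[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # IF NOT EXISTS lets the schema appear on first use and be reused afterwards.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS token_usage ("
        "day TEXT, agent_id TEXT, model TEXT, "
        "input_tokens INTEGER, output_tokens INTEGER, cost REAL)"
    )
    return conn


def record_usage(conn, day, agent_id, model, input_tokens, output_tokens):
    conn.execute(
        "INSERT INTO token_usage VALUES (?, ?, ?, ?, ?, ?)",
        (day, agent_id, model, input_tokens, output_tokens,
         cost_usd(model, input_tokens, output_tokens)),
    )
    conn.commit()


def daily_rollup(conn, day):
    """Return {agent_id: (total_tokens, total_cost)} for one day."""
    rows = conn.execute(
        "SELECT agent_id, SUM(input_tokens + output_tokens), SUM(cost) "
        "FROM token_usage WHERE day = ? GROUP BY agent_id ORDER BY agent_id",
        (day,),
    )
    return {agent: (tokens, cost) for agent, tokens, cost in rows}
```

The sketch defaults to an in-memory database. With a file path, `sqlite3.connect` creates the file but not its parent directory, so the directory has to exist first.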

Run the tests

pytest

Three test files ship with the library (tests/test_circuit_breaker.py, tests/test_token_tracker.py, tests/test_agent_memory.py) covering state transitions, cost rollups, conflict resolution, and compression. No external services required.

REST API

Mount create_orchestration_router() on any FastAPI app and you get:

GET  /api/agents/token-stats?period=daily|weekly|monthly
GET  /api/agents/token-stats/trend?days=7
GET  /api/agents/metrics?period=daily
GET  /api/agents/metrics/trend?days=7
GET  /api/agents/metrics/errors
GET  /api/agents/metrics/interactions          # nodes + edges graph
GET  /api/agents/health                        # system-wide summary
GET  /api/agents/circuit-breakers
POST /api/agents/circuit-breakers/{agent_id}/reset
GET  /api/agents/memory/stats

Point Grafana, Metabase, or a custom dashboard at those endpoints and you’re done.

from fastapi import FastAPI
from ralph_orchestrator.fastapi_router import create_orchestration_router

app = FastAPI()
app.include_router(create_orchestration_router())
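
For a custom dashboard, a thin client over these endpoints could look like the sketch below. It uses only the standard library. The base URL, the helper names, and the assumption that every endpoint returns a JSON body are mine; the response shapes are not documented here, so the sketch only decodes them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: where your FastAPI app listens


def endpoint(path, **params):
    """Build a full URL under /api/agents, dropping unset query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/agents/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


def fetch_json(url, method="GET", timeout=5.0):
    """Call an endpoint and decode the body, assuming it is JSON."""
    # An empty body on POST makes urllib send Content-Length: 0.
    data = b"" if method == "POST" else None
    with urlopen(Request(url, data=data, method=method), timeout=timeout) as resp:
        return json.load(resp)


def reset_breaker(agent_id):
    """POST to the reset endpoint for one agent."""
    path = f"circuit-breakers/{quote(agent_id, safe='')}/reset"
    return fetch_json(endpoint(path), method="POST")
```

With the app running, `fetch_json(endpoint("token-stats", period="weekly"))` would fetch the weekly token rollup.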

Design choices worth knowing

Singletons everywhere. get_circuit_breaker_registry(), get_token_tracker(), and get_agent_metrics() return process-local singletons. They are simple to reason about; swap them for dependency injection if you need to.
Snapshot-based memory isolation. MemoryManager.create_session() deep-copies shared state at session start; the session commits atomically with configurable conflict handling (LAST_WRITE_WINS, FIRST_WRITE_WINS, MERGE_ARRAYS, ERROR_ON_CONFLICT).
Health score is weighted, not magic. 60% success rate + 20% recent trend + 10% latency + 10% call volume. Tune AgentMetrics._calculate_health_score for your workload.
Circuit breaker states are tested. See test_circuit_breaker.py::TestCircuitBreakerIntegration::test_typical_failure_scenario for the full CLOSED → OPEN → HALF_OPEN → CLOSED recovery path.
Cost table is in MODEL_PRICING. Update it as Anthropic / OpenAI ship new models.
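
The 60/20/10/10 weighting behind the health score can be sketched as a plain function. Only the weights come from this README. How each component is normalized to the 0 to 1 range is an assumption, as are the function name and the `latency_budget_ms` and `volume_target` parameters; `AgentMetrics._calculate_health_score` may normalize quite differently.

```python
def _clamp(x):
    """Limit a component to the 0..1 range."""
    return max(0.0, min(1.0, x))


def health_score(success_rate, recent_success_rate, avg_latency_ms, calls,
                 latency_budget_ms=2000.0, volume_target=100):
    """Weighted 0-100 score; the normalizations below are assumptions."""
    success = _clamp(success_rate)
    # Trend component: success rate over a recent window.
    trend = _clamp(recent_success_rate)
    # Latency component: 1.0 at zero latency, 0.0 at or beyond the budget.
    latency = _clamp(1.0 - avg_latency_ms / latency_budget_ms)
    # Volume component: saturates once the agent reaches the target call count.
    volume = _clamp(calls / volume_target)
    score = 100 * (0.6 * success + 0.2 * trend + 0.1 * latency + 0.1 * volume)
    return round(score, 1)
```

Because success rate carries 60% of the weight, an agent that is fast and busy but fails half its calls still scores poorly under this sketch.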

Repository layout

src/ralph_orchestrator/
  circuit_breaker.py       # fault tolerance
  token_tracker.py         # cost accounting
  agent_memory.py          # MemoryManager + WindowMemory
  agent_metrics.py         # health scoring + interaction graph
  fastapi_router.py        # REST surface
tests/
  test_circuit_breaker.py
  test_token_tracker.py
  test_agent_memory.py
examples/
  quickstart.py

References

Anthropic’s “Agents 201” production-pattern write-ups
Martin Fowler on the Circuit Breaker pattern
Hystrix design docs (the original Java implementation)

License

MIT — see LICENSE.
