Ocarina

01Why

The reproducible part of an agentic system

MCP tool calls return the same output for the same inputs, regardless of which LLM issued them. Write a playbook by hand or record one from a session, then replay it in CI to check the server still returns what you expect and catch regressions, with no API key.

Docs

Document any server

Generate markdown for every tool and resource a server exposes, with a copy-paste example step for each. The fastest way to start a new rondo.

Play

Run without an LLM

Execute every step in order. Chain outputs with echo: and grab:. Assert on results with expect:. Exits non-zero if an assertion fails, so it works as a CI step.

Record

Capture any session

Run ocarina as a transparent stdio proxy. Every tools/call request and response goes into a YAML rondo, including sampling/createMessage LLM callbacks from agentic servers.

02Install

Install

Requires Go 1.26+.

go install

go install github.com/msradam/ocarina@latest

Or download a binary from releases.

Quick start

See what a server exposes. mcp-server-time needs no credentials:

terminal

ocarina docs uvx mcp-server-time

Write a rondo:

clock.yaml

server:
  command: uvx
  args: [mcp-server-time]

rondo:
  - name: time in Tokyo
    tool: get_current_time
    args:
      timezone: Asia/Tokyo
    expect:
      contains: "datetime"

Play it. No LLM, no API key:

terminal

ocarina play clock.yaml
# ==> time in Tokyo (get_current_time)
# {
#   "datetime": "2026-06-27T09:30:00+09:00",
#   "timezone": "Asia/Tokyo"
# }
#     PASS: contains "datetime"

To capture a live session instead of writing one, see record. For ready-made environments you can clone and run, see Showcases.

03Rondo format

Rondo format

A rondo is a YAML file with three parts: keys for variables, a server (or servers) to connect to, and a rondo list of steps. Write one by hand or record it from a live session, then commit it to git.

examples/github-investigation.yaml

# Investigate any GitHub repo. Change keys.owner and keys.repo to switch repos.
keys:
  owner: modelcontextprotocol
  repo: go-sdk

server:
  command: npx
  args: [-y, "@modelcontextprotocol/server-github"]

rondo:
  - name: list recent commits
    tool: list_commits
    args:
      owner: "{{owner}}"
      repo: "{{repo}}"
      per_page: 5
    grab: ".0.sha"          # extract the first SHA from the JSON array
    echo: latest_sha        # capture it into keys for later steps

  - name: show latest commit
    tool: get_commit
    args:
      owner: "{{owner}}"
      repo: "{{repo}}"
      sha: "{{latest_sha}}"   # value from the previous step
    expect:
      contains: "author"

  - name: list open issues
    tool: list_issues
    args:
      owner: "{{owner}}"
      repo: "{{repo}}"
      state: open

Step fields

Field	Description
`tool`	Tool to call (`tools/call`).
`resource`	Resource URI to read (`resources/read`).
`list_resources`	List a server's resources. Output is a JSON array of URIs.
`sleep`	Pause for a duration (`500ms`, `2s`) to pace a run.
`server`	Which entry in `servers` to run this step against. Defaults to the first.
`args`	Tool arguments. `{{key}}` interpolates from `keys`, a prior `echo:`, or `{{env.NAME}}`.
`when`	CEL condition; the step runs only if it is true. Bare variable names, not `{{...}}`.
`loop`	Iterate over a JSON array, setting `{{item}}` each pass.
`grab`	gjson path into the JSON result (`.0.sha`, `.name`), applied before `echo`.
`echo`	Capture the (grabbed) value into a key for later steps. `register:` is an alias.
`expect`	Assertions: `contains`, `matches` (regex), `equals`, `is_error`, `rule` (CEL), `message`. `play` exits non-zero on failure.
`timeout`	Per-step deadline (`10s`). The step fails if it is exceeded.
`retry`	`retries`, `delay`, and `until:` (CEL). Re-run until the condition holds.
`tags`	Label the step for `--tags` / `--skip-tags` filtering.
`ignore_errors`	Continue past a failing step instead of failing the run.

Rondo-level fields

Field	Description
`keys`	Static variables, interpolated as `{{key}}` everywhere. Override at run time with `-e key=value`.
`servers`	A map of named servers (`command`, `args`, `env`). Steps pick one with `server:`.
`server`	A single server block, the shorthand when a rondo talks to one server.
`llm`	Captured `sampling/createMessage` exchanges. Written by `record` when an agentic server calls back to the LLM.

Coming from Ansible? tasks: is accepted in place of rondo:, and register: in place of echo:.

Multiple servers

One rondo can talk to several servers. Declare them under servers and set server: on each step. A step that omits server: uses the first entry. When more than one server is in play, output and diff namespace tool names by server, like time.get_current_time.

multi-server.yaml

servers:
  time:  {command: uvx, args: [mcp-server-time]}
  fetch: {command: uvx, args: [mcp-server-fetch]}

rondo:
  - name: get the time
    server: time
    tool: get_current_time
    args: {timezone: UTC}

  - name: fetch a page
    server: fetch
    tool: fetch
    args: {url: "https://example.com"}

Remote servers

Give a server a url: instead of a command: to reach a hosted MCP server over the Streamable HTTP transport. Headers are sent on every request, so a bearer token works through {{env.X}}. When a tool returns structuredContent, grab and expect run against that typed JSON instead of the text block.

remote.yaml

server:
  url: https://api.githubcopilot.com/mcp/
  headers:
    Authorization: "Bearer {{env.GITHUB_TOKEN}}"

rondo:
  - name: who am I
    tool: get_me
    args: {}
    expect:
      contains: "login"

04Commands

Commands

docs

ocarina docs <command> [args...]

Connects to a server and writes markdown for every tool and resource it exposes: a synopsis, an argument table, and a copy-paste example step. The fastest way to learn a server and start a rondo.

terminal

ocarina docs uvx mcp-server-time
ocarina docs --out docs/github.md npx -y @modelcontextprotocol/server-github

Flags

--out FILE Write to a file instead of stdout. Place it before the server command.

record

ocarina record <output.yaml> <command> [args...]

A stdio proxy between your MCP host and server. Records every tools/call request and response into a rondo. Also captures sampling/createMessage exchanges when an agentic server calls back to the LLM, stored in the rondo's llm: block.

terminal

ocarina record out.yaml uvx mcp-server-sqlite --db-path /tmp/db.sqlite
ocarina record out.yaml npx -y @modelcontextprotocol/server-github

Flags

--no-result Omit result blocks from the rondo (smaller files, cleaner diffs).

play

ocarina play <rondo.yaml>

Executes each step in order against the live server. No LLM needed. Values captured with echo: feed into later steps through {{key}} interpolation. Exits non-zero if any step fails or any expect: assertion fails, so it works as a CI step.

terminal

ocarina play examples/github-investigation.yaml
ocarina play examples/mcp-smoke-test.yaml   # has assertions
ocarina play examples/time-zones.yaml -e owner=acme --tags smoke

Flags

`--output json`	Emit a machine-readable pass/fail report on a clean stdout, for CI.
`--trace`	Log every JSON-RPC frame to stderr.
`--dry-run`	Print steps without executing them.
`-e key=value`	Override a `keys` variable at run time. Repeatable.
`--tags` / `--skip-tags`	Run only, or skip, steps with the given tags.

validate

ocarina validate <rondo.yaml>

Checks a rondo against the server's schemas without calling any tools: the tool exists, required args are present, types match, every {{key}} resolves, and the CEL, timeout:, and server: references are valid. Exits non-zero on errors. The pre-flight.

terminal

ocarina validate examples/github-investigation.yaml

diff

ocarina diff <rondo.yaml>

Connects to the live server and compares the rondo's tools against the current schemas. Flags tools that were removed, args that became required, server references that are undefined, and new tools the server now offers. Exits non-zero if a tool the rondo uses is gone. Schema-drift detection for CI.

terminal

ocarina diff examples/github-investigation.yaml

lock

ocarina lock <rondo.yaml>

Snapshots each server's full tool schema, including tool descriptions, to a lock file. With --check, compares the live server against the lock and exits non-zero if a tool was removed or its description or input schema changed. A reworded tool description is a breaking change for an agent and a possible tool-poisoning signal, so drift is a failure.

terminal

ocarina lock audit.yaml           # write audit.yaml.lock
ocarina lock audit.yaml --check   # fail if the live schema drifted

hum

ocarina hum <command> [args...] -- <tool> [key=value ...]

Calls a single tool ad hoc and prints the result. Use it to see a tool's real output shape before you write a step around it. Values are coerced to their natural type, so per_page=1 is sent as a number.

terminal

ocarina hum uvx mcp-server-time -- get_current_time timezone=UTC
ocarina hum npx -y @modelcontextprotocol/server-github -- list_commits owner=pytorch repo=pytorch per_page=1

05Examples

Examples

Working rondos in examples/, each verified against a live server.

File	Server	What it shows
fetch-demo.yaml	`uvx mcp-server-fetch`	Basic fetch and content capture
github-investigation.yaml	`server-github`	`echo:` + `grab:` to chain API calls, extract SHA from JSON array
sqlite-migration.yaml	`uvx mcp-server-sqlite`	Stateful workflow: create table, seed rows, query results
git-repo-audit.yaml	`uvx mcp-server-git`	Local repo audit: log, status, diff, branches
mcp-smoke-test.yaml	`server-everything`	`expect:` assertions; use as a CI smoke test
sequential-thinking-demo.yaml	`server-sequential-thinking`	Multi-step reasoning chain captured as a rondo
knowledge-graph-demo.yaml	`server-memory`	Create, relate, search, and delete lifecycle with assertions
puppeteer-scrape.yaml	`server-puppeteer`	Headless browser: navigate, evaluate JS, screenshot
time-zones.yaml	`uvx mcp-server-time`	Parameterized timezone conversions, `[readonly]` annotations

06Showcases

Showcases

Standalone repositories you can clone and run, each a real working environment for a different MCP server.

Repository	What it does
duckdb-mcp-ocarina	Data integrity, migration, and regression tests against a DuckDB database. Clone and run, no credentials.
chrome-devtools-mcp-ocarina	Web health checks through Chrome DevTools MCP. Fail on a console error or a failed request.
github-mcp-ocarina	Repo governance as tests through the GitHub MCP server: a license, docs, and history.
blender-mcp-ocarina	Automate and snapshot-test a 3D scene in Blender, an app with no external API at all.

07Use in CI

Use in CI

play exits 0 if all expectations pass, non-zero otherwise. No API keys needed.

.github/workflows/mcp-test.yml

name: MCP smoke tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.26'
      - run: go install github.com/msradam/ocarina@latest
      - run: ocarina play examples/mcp-smoke-test.yaml

08Scope

What MCP makes easy, and what it does not yet

Ocarina works because MCP gives every server the same shape: named tools with typed arguments, reachable over one protocol. Learn the grammar once and you can drive a time server, a GitHub server, a Postgres database, and a 3D modeling app with the same YAML. You point Ocarina at a server, fill in the calls, and it runs them the same way on every execution. That part is solid today.

The friction is in what comes back. These servers were built so a model could read the output and decide what to do next, so many of them answer in prose rather than data. mcp-server-fetch returns markdown. mcp-server-sqlite returns the Python repr of a row, single quotes and all. The arguments are typed; the results often are not. Ocarina smooths over the common cases, and will parse that Python repr before a grab: runs, but when a server replies in free text, pulling a clean value out of it stays awkward. That ceiling belongs to the server, not to Ocarina.

Errors are uneven for the same reason. The spec lets a tool flag a failure with isError, but plenty of servers report trouble as ordinary text and never set it. Ocarina catches a misspelled tool name or a missing required argument by checking the live schema before it calls, so those fail loudly. A server that returns table not found as a cheerful string is harder to catch on its own, and that is what an expect: assertion is for.

So the honest scope. Ocarina is an automation framework for the MCP servers that behave like clean RPC, and a deterministic test harness for the ones that do not. Today that means smoke tests in CI and scripts that chain a few well-behaved servers. Recording a session and replaying it without a model works now too. The reach grows as the ecosystem matures and servers return more structure. The Blender demo on the home page is the far end of what is already possible. The rough edges above are the near end you will actually hit.

What Ocarina is not

It does not score LLM outputs, compare models, or track token costs. It connects to the real server on every play run, so a broken server breaks the replay. It runs one linear pass with no scheduler and no state carried between runs. The output is stdout and an exit code.

GitHub · Releases · MIT license · Whistle icon by Alessio Capponi / Noun Project (CC BY 3.0)