Reviewing agent run results

Walk flagged runs in the flipbook UI, set verdicts, and take notes

The Review page is a focused queue for an agent’s flagged runs. It pages through them one at a time so you can quickly decide whether each result was good, was bad, or needs more thought, and then feed the bad ones back into tuning.

Agent review flipbook

  1. From the agent’s detail page, click Review (or open /agents/{id}/review directly).

  2. The page loads with the agent’s currently flagged runs. The header shows the position counter (3 of 12) and the count of runs still flagged.

  3. If nothing is flagged, you’ll see the “Nothing to review” empty state — flag a run from the Runs tab first.
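If you want to reason about what the header shows, the position counter and the empty state reduce to a small function over the flagged queue. This is a minimal TypeScript sketch, not the app’s code: `QueuedRun`, `reviewUrl`, and `positionLabel` are hypothetical names, and only the `/agents/{id}/review` route comes from this page.

```typescript
// Hypothetical queue item; the real app's run type is not documented here.
interface QueuedRun {
  id: string;
}

// Builds the documented review route: /agents/{id}/review
function reviewUrl(agentId: string): string {
  return `/agents/${encodeURIComponent(agentId)}/review`;
}

// Returns the header's position counter (e.g. "3 of 12") for a 0-based index,
// or null when the queue is empty and the "Nothing to review" state applies.
function positionLabel(queue: QueuedRun[], index: number): string | null {
  if (queue.length === 0) return null;
  // Clamp so an out-of-range index still points at a real card.
  const clamped = Math.min(Math.max(index, 0), queue.length - 1);
  return `${clamped + 1} of ${queue.length}`;
}

// Usage with a 12-run queue.
const queue: QueuedRun[] = Array.from({ length: 12 }, (_, i) => ({ id: `run-${i + 1}` }));
console.log(reviewUrl("agent-42")); // prints /agents/agent-42/review
console.log(positionLabel(queue, 2)); // prints 3 of 12
console.log(positionLabel([], 0)); // prints null
```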

  1. Use the Previous / Next buttons or the dot pagination at the bottom to step through the queue.

  2. Press ← / → (or j / k) on the keyboard to navigate without moving your hand to the mouse.

  3. Press Esc to leave the queue and return to the agent detail page.
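The navigation keys can be sketched as a lookup table plus a clamped index update. Assumptions: Previous/Next map to the left and right arrow keys, `j` moves forward and `k` moves back (the Vim convention; the app’s actual pairing may differ), and the queue stops at its ends instead of wrapping. `NAV_KEYS` and `navigate` are hypothetical names.

```typescript
// Hypothetical actions the flipbook's key handler can trigger.
type NavAction = "previous" | "next" | "exit";

// Assumed bindings, keyed by KeyboardEvent.key values: arrow keys plus j / k,
// and Esc to leave the queue. Treating j as "next" is an assumption.
const NAV_KEYS = new Map<string, NavAction>([
  ["ArrowLeft", "previous"],
  ["ArrowRight", "next"],
  ["j", "next"],
  ["k", "previous"],
  ["Escape", "exit"],
]);

// Returns the new 0-based index, or null when the reviewer exits with Esc.
function navigate(key: string, index: number, length: number): number | null {
  const action = NAV_KEYS.get(key);
  if (action === undefined) return index; // unbound key: stay on this card
  if (action === "exit") return null; // caller routes back to the agent page
  const step = action === "next" ? 1 : -1;
  // Stop at either end of the queue instead of wrapping around (assumption).
  return Math.min(Math.max(index + step, 0), Math.max(length - 1, 0));
}

console.log(navigate("j", 2, 12)); // prints 3
console.log(navigate("ArrowLeft", 0, 12)); // prints 0
console.log(navigate("Escape", 5, 12)); // prints null
```

Clamping rather than wrapping keeps a fast repeated keypress from silently jumping from the last run back to the first.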

  1. Read the run summary (asked / did blocks plus the response panel) on the current card.

  2. Press U for thumbs-up (good) or D for thumbs-down (flagged for tuning). Setting a verdict automatically advances to the next run.

  3. To clear a verdict, click the same verdict again; the run goes back to “needs review”.
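The verdict behavior (set with U or D, clear by choosing the same verdict again, then move on) fits in a pure state update. This sketch is illustrative only: the `Verdict` values, `ReviewState`, and `applyVerdictKey` are hypothetical, and advancing only when a verdict is set, not when it is cleared, is an assumption.

```typescript
// Hypothetical verdict values; "needs_review" is the cleared state.
type Verdict = "up" | "down" | "needs_review";

interface ReviewState {
  verdicts: Map<string, Verdict>; // runId -> current verdict
  index: number; // 0-based position in the flipbook
}

// Choosing the verdict a run already has clears it; anything else replaces it.
function toggleVerdict(current: Verdict, chosen: "up" | "down"): Verdict {
  return current === chosen ? "needs_review" : chosen;
}

// Handles the U / D keys: update the current run's verdict, then advance.
// Advancing only when a verdict is set (not when cleared) is an assumption.
function applyVerdictKey(state: ReviewState, runIds: string[], key: string): ReviewState {
  const k = key.toLowerCase();
  const chosen = k === "u" ? "up" : k === "d" ? "down" : null;
  if (chosen === null || runIds.length === 0) return state; // not a verdict key
  const runId = runIds[state.index];
  const next = toggleVerdict(state.verdicts.get(runId) ?? "needs_review", chosen);
  // Copy the map so the previous state stays unchanged (immutable update).
  const verdicts = new Map(state.verdicts).set(runId, next);
  const advance = next !== "needs_review" && state.index < runIds.length - 1;
  return { verdicts, index: advance ? state.index + 1 : state.index };
}

const ids = ["r1", "r2"];
const afterUp = applyVerdictKey({ verdicts: new Map(), index: 0 }, ids, "U");
console.log(afterUp.verdicts.get("r1"), afterUp.index); // prints up 1
```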

  1. Type a note in the verdict text field on the flipbook card. Notes are saved with the verdict.

  2. The note appears on the run detail page and is included in the tuning conversation context.

  3. Notes are optional, but they make consolidated tuning proposals much better because they tell the LLM why a run was bad.
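To see why notes help, here is one hypothetical way a note could travel with its verdict into the tuning context: each thumbs-down run becomes one line, annotated with the reviewer’s note when present. `VerdictRecord` and `tuningContext` are illustrative names; the actual format the app sends to the LLM is not documented on this page.

```typescript
// Hypothetical record saved with each verdict; the note is optional.
interface VerdictRecord {
  runId: string;
  verdict: "up" | "down" | "needs_review";
  note?: string;
}

// Folds reviewer notes into a plain-text block for the tuning conversation:
// one line per thumbs-down run, with its note when present (format assumed).
function tuningContext(records: VerdictRecord[]): string {
  return records
    .filter((r) => r.verdict === "down")
    .map((r) => {
      const note = r.note?.trim();
      return `- ${r.runId}: ${note ? note : "(no note)"}`;
    })
    .join("\n");
}

console.log(
  tuningContext([
    { runId: "r1", verdict: "down", note: "Ignored the date filter in the request." },
    { runId: "r2", verdict: "up" },
    { runId: "r3", verdict: "down" },
  ]),
);
// prints:
// - r1: Ignored the date filter in the request.
// - r3: (no note)
```

The "(no note)" line shows the cost of skipping a note: the model learns that r3 was bad but not why.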

  1. Click Open full detail in the flipbook card header to jump to /agents/{id}/runs/{runId} for the step-by-step timeline.

  2. Use the browser back button to return to the same position in the flipbook.
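The detail link and the Back-button behavior can be sketched with plain URL helpers. Only the `/agents/{id}/runs/{runId}` route comes from this page; keeping the flipbook position in a `pos` query parameter is a hypothetical mechanism chosen for illustration, and the app may restore position some other way. `runDetailUrl`, `reviewUrlAt`, and `positionFromSearch` are hypothetical names.

```typescript
// Builds the documented run detail route: /agents/{id}/runs/{runId}
function runDetailUrl(agentId: string, runId: string): string {
  return `/agents/${encodeURIComponent(agentId)}/runs/${encodeURIComponent(runId)}`;
}

// Hypothetical: record the flipbook position (1-based) in the review URL so
// that Back from the detail page lands on the same card.
function reviewUrlAt(agentId: string, index: number): string {
  const params = new URLSearchParams({ pos: String(index + 1) });
  return `/agents/${encodeURIComponent(agentId)}/review?${params.toString()}`;
}

// Reads the position back as a 0-based index, falling back to the first card
// when "pos" is missing or invalid, and clamping to the queue length.
function positionFromSearch(search: string, length: number): number {
  const pos = Number(new URLSearchParams(search).get("pos"));
  if (!Number.isInteger(pos) || pos < 1) return 0;
  return Math.max(Math.min(pos, length) - 1, 0);
}

console.log(runDetailUrl("agent-42", "run-7")); // prints /agents/agent-42/runs/run-7
console.log(reviewUrlAt("agent-42", 2)); // prints /agents/agent-42/review?pos=3
console.log(positionFromSearch("?pos=3", 12)); // prints 2
```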

  1. When the header shows N flagged, click Tune with N flagged to jump into the tune workbench.

  2. The workbench reads the same flagged set you just reviewed — your notes feed directly into the consolidated proposal.
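The header count and the tune button can share one selector, which is one way to guarantee the workbench sees exactly the set you reviewed. This sketch assumes "flagged" means a thumbs-down verdict and that the button is hidden when nothing is flagged; `ReviewedRun`, `flaggedSet`, and `tuneButtonLabel` are hypothetical.

```typescript
// Hypothetical shape of a reviewed run.
interface ReviewedRun {
  runId: string;
  verdict: "up" | "down" | "needs_review";
  note?: string;
}

// One selector for "flagged" (assumed: thumbs-down), used by both the header
// count and the tune workbench so they always agree on the set.
function flaggedSet(runs: ReviewedRun[]): ReviewedRun[] {
  return runs.filter((r) => r.verdict === "down");
}

// Text for the tune button, or null to hide it when nothing is flagged.
function tuneButtonLabel(runs: ReviewedRun[]): string | null {
  const n = flaggedSet(runs).length;
  return n === 0 ? null : `Tune with ${n} flagged`;
}

const reviewed: ReviewedRun[] = [
  { runId: "r1", verdict: "down", note: "Wrong tool chosen." },
  { runId: "r2", verdict: "up" },
  { runId: "r3", verdict: "down" },
];
console.log(tuneButtonLabel(reviewed)); // prints Tune with 2 flagged
```

Deriving both the count and the workbench input from `flaggedSet` avoids the two drifting apart if the definition of "flagged" ever changes.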