
Work Supervision

Intermediate · Quality Assurance

Supervise completed work by evaluating both output quality (correctness, completeness, test coverage) and process quality (efficiency, scope discipline, tool usage). The pipeline reads claudit session transcripts stored as git notes to understand not just what was done, but how it was done.

Prerequisites

  • Wave installed and initialized (wave init)
  • Git repository with recent work to review
  • Optional: claudit for session transcript storage via git notes

Quick Start

bash
# Auto-detect last pipeline run
wave run supervise

# Review a specific pipeline run
wave run supervise "last pipeline run"

# Review a specific branch
wave run supervise "feature/add-auth"

# Review current PR
wave run supervise "current pr"

With `-o text`, the run log looks like:

[10:00:01] -> gather (supervisor)
[10:00:01]   gather: Executing agent
[10:04:30] + gather completed (269s, 8.2k tokens)
[10:04:31] -> evaluate (supervisor)
[10:08:45] + evaluate completed (254s, 6.1k tokens)
[10:08:46] -> verdict (reviewer)
[10:12:30] + verdict completed (224s, 3.8k tokens)

  + Pipeline 'supervise' completed successfully (748s)

Pipeline Structure

gather (supervisor) -> evaluate (supervisor) -> verdict (reviewer)

All three steps use read-only workspace mounts -- this is a purely analytical pipeline that never modifies code.

Step 1: Gather Evidence

The supervisor persona parses input heuristically to determine what to inspect:

| Input | Detection Strategy |
| --- | --- |
| (empty) | Most recent pipeline run from `.wave/workspaces/` |
| `"last pipeline run"` | Same as empty |
| `"current pr"` or `"PR #42"` | Current or specified pull request |
| `"feature/auth"` | All commits on that branch vs main |
| Free-form text | Search via grep/git log |
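The detection strategy above can be sketched as a simple classifier. This is an illustrative Python sketch, not Wave's actual implementation -- the function name and category labels are assumptions:

```python
import re

def classify_target(user_input: str) -> str:
    """Hypothetical sketch of the input-detection heuristic --
    not Wave's actual code; categories are illustrative."""
    text = user_input.strip().lower()
    if not text or text == "last pipeline run":
        return "last-run"        # scan .wave/workspaces/ for the newest run
    if text == "current pr" or re.match(r"^pr #\d+$", text):
        return "pull-request"    # inspect the current or numbered PR
    if re.match(r"^[\w./-]+/[\w./-]+$", user_input.strip()):
        return "branch"          # diff the branch against main
    return "free-form"           # fall back to grep / git log search
```

Checking the pull-request pattern before the branch pattern matters: `"PR #42"` contains no slash, but a looser branch regex could swallow other inputs first.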

Evidence collected includes:

  • Recent commits with diffs and stats
  • Claudit session transcripts from git notes
  • Pipeline workspace artifacts
  • Test results and coverage
  • Branch and PR state
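The collected evidence is written to `.wave/output/supervision-evidence.json`. As a rough illustration of what such a bundle might contain -- the field names here are assumptions, not the actual schema:

```json
{
  "target": "feature/add-auth",
  "commits": [
    { "sha": "abc1234", "subject": "Add auth middleware", "stats": "+120 -8" }
  ],
  "transcripts": ["claudit session transcript recovered from git notes"],
  "workspace_artifacts": [".wave/workspaces/..."],
  "tests": { "passed": 47, "failed": 0 }
}
```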

Step 2: Evaluate Quality

The supervisor scores each dimension as excellent / good / adequate / poor:

Output Quality:

  • Correctness, completeness, test coverage, code quality

Process Quality:

  • Efficiency, scope discipline, tool usage, token economy

Step 3: Final Verdict

The reviewer independently verifies claims, runs the test suite, and issues a verdict:

  • APPROVE -- work is good quality, process was efficient
  • PARTIAL_APPROVE -- output acceptable but process had notable issues
  • REWORK -- significant issues requiring attention
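The verdict hinges on how output and process scores combine. A minimal sketch of one possible mapping -- the thresholds and function name are assumptions, not Wave's actual rules:

```python
# Illustrative rating scale and verdict mapping; not Wave's actual logic.
RATING = {"excellent": 3, "good": 2, "adequate": 1, "poor": 0}

def decide_verdict(output_scores: dict, process_scores: dict) -> str:
    out_min = min(RATING[v] for v in output_scores.values())
    proc_min = min(RATING[v] for v in process_scores.values())
    if out_min < RATING["adequate"]:
        return "REWORK"            # any poor output dimension forces rework
    if proc_min < RATING["good"]:
        return "PARTIAL_APPROVE"   # output acceptable, process had issues
    return "APPROVE"
```

Taking the minimum across dimensions (rather than an average) reflects that a single poor dimension should not be masked by strong scores elsewhere.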

Expected Outputs

| Artifact | Path | Description |
| --- | --- | --- |
| evidence | `.wave/output/supervision-evidence.json` | Raw evidence bundle with commits, artifacts, transcripts |
| evaluation | `.wave/output/supervision-evaluation.json` | Scored evaluation across all quality dimensions |
| verdict | `.wave/output/supervision-verdict.md` | Final verdict with action items and lessons learned |

Example Output

The pipeline produces .wave/output/supervision-verdict.md:

markdown
## Verdict: PARTIAL_APPROVE

## Output Quality
The implementation is correct and complete. All 47 tests pass,
including 12 new tests added for the feature. Code follows
existing project conventions.

## Process Quality
The agent took 3 unnecessary detours:
1. Read 14 unrelated files before finding the target module
2. Attempted a refactor that was reverted after 200 lines of changes
3. Re-ran the full test suite 5 times when targeted tests would suffice

Estimated 30% of tokens were spent on non-productive exploration.

## Action Items
- should-fix: Consider using targeted `go test ./internal/pipeline/...`
  instead of full suite during iterative development

## Lessons Learned
- Scope the initial exploration phase more tightly
- Use Glob/Grep before reading files to narrow candidates

Customization

Focus on process quality only

bash
wave run supervise "focus on process efficiency of the last pipeline run"

Review a specific PR

bash
wave run supervise "PR #42"
