Skip to content

feat(cli): sb chat tool gating, approval channel, and multi-turn tool loop (by Wren + Lumen)#204

Merged
conoremclaughlin merged 19 commits intomainfrom
wren/fix/sb-chat-tool-gating
Mar 9, 2026
Merged

feat(cli): sb chat tool gating, approval channel, and multi-turn tool loop (by Wren + Lumen)#204
conoremclaughlin merged 19 commits intomainfrom
wren/fix/sb-chat-tool-gating

Conversation

@conoremclaughlin
Copy link
Owner

@conoremclaughlin conoremclaughlin commented Mar 8, 2026

Summary

Takes over Lumen's PR #192 and extends significantly. Joint work by Wren (Claude Code) and Lumen (Codex CLI).

Lumen's foundation (6 commits)

  • Backend-aware tool passthrough (--allowedTools for Claude, --allowed-tools for Gemini, strict hardening for Codex)
  • Codex exec mode for one-shot turns + correct passthrough arg ordering
  • --sb-strict-tools flag: read-only sandbox, no color UI, MCP server disables for Codex local routing
  • Backend auth status/login commands (sb auth backend status, sb auth backend login)
  • Integration tests covering local tool execution across all 3 backends
  • Optional live smoke test harness

Wren's additions (12 commits)

  • Root cause fix: CLAUDECODE=1 env var leaked into backend spawns, triggering Claude's nested session detection — this was the actual cause of "not logged in"
  • Multi-turn tool feedback loop: sb CLI feeds tool results back to the backend for iterative reasoning (not one-shot)
  • JSONL approval channel: structured protocol for tool permission requests/responses with configurable timeout and auto-deny for non-interactive mode
  • Default tool routing to local: sb is the tool execution authority; backend-native calling is opt-in via /tool-routing backend
  • Shared runner utilities: extracted spawnBackend, buildCleanEnv, LineBuffer into packages/shared
  • Auto-persist runtime preferences: /tool-routing changes saved to .pcp/identity.json automatically
  • /prompt slash command: inject system-level instructions mid-conversation
  • Auto-approve mode: skip approval prompts for trusted tool calls
  • Bug fixes: handle expired Gemini OAuth tokens, correct max iterations test

Stats

Test plan

  • 1498/1498 tests pass (npx vitest run)
  • Integration tests: JSONL approval channel, auto-deny non-interactive, /tool-routing auto-persist, /prompt, grant-execute flow, multi-turn tool loop
  • Unit tests: spawnBackend, buildCleanEnv, LineBuffer, approval-channel, backend-auth
  • Live Claude backend smoke verified
  • Manual: sb chat with each backend (claude, codex, gemini) to verify tool routing end-to-end

🤖 Generated with Claude Code

conoremclaughlin and others added 7 commits March 8, 2026 22:28
- fix non-interactive local tool execution TDZ crash by initializing readline handle before turn queue execution
- make backend tool passthrough backend-aware (claude/gemini allowlist flags, codex no unsupported --allowedTools)
- remove contradictory prompt instructions when local routing is active
- add integration coverage for codex passthrough and non-interactive local tool approval flow
- add --sb-strict-tools option for sb chat
- in codex + local routing strict mode, force read-only sandbox, no approval prompts, and disable backend MCP servers
- include strict-mode instruction in prompt envelope
- add integration test coverage for strict codex passthrough behavior
- add gemini local-routing integration test asserting sb executes pcp-tool get_inbox
- add codex local-routing integration test asserting sb executes pcp-tool get_inbox
- preserve backend-specific passthrough assertions for tool gating
- add packages/cli/scripts/smoke-chat-live.sh for local real-backend sb-chat tool-routing checks
- add package script test:smoke:live
- document optional non-CI smoke workflow and env overrides in packages/cli/TESTS.md
- introduce backend auth status detection helpers for claude/codex/gemini\n- add  and  commands\n- include tests for backend auth status parsing and flow
…(by lumen)

- enforce codex one-shot turns via exec and correct passthrough arg ordering\n- tighten strict local codex passthrough (read-only sandbox + mcp server disables)\n- add backend turn timeout plumbing and sb debug turn logging\n- improve live smoke script diagnostics and add runner/adapter/chat test coverage
…f Claude 'not logged in'

The CLI backend runner and auth status checker inherited CLAUDECODE=1 from the
parent process when sb runs inside Claude Code. This triggered Claude's nested
session detection, causing it to refuse to run — which Lumen misdiagnosed as an
auth issue.

Same fix pattern as the API server runners (claude-runner, codex-runner,
gemini-runner) which already strip CLAUDECODE.

Co-Authored-By: Wren <noreply@anthropic.com>
conoremclaughlin and others added 9 commits March 8, 2026 23:39
Gemini CLI handles token refresh internally — an expired access_token
is fine as long as a refresh_token exists in ~/.gemini/oauth_creds.json.
Only fail if the file is missing or has no tokens at all.

Co-Authored-By: Wren <noreply@anthropic.com>
When local tool routing is active and the backend emits pcp-tool blocks,
sb now executes the tools and automatically re-invokes the backend with
the results so it can reason about them. This continues in a loop until
the backend produces no more tool calls or hits the 5-iteration limit.

This transforms sb from a one-shot tool executor into a real agent runtime
where multi-step chains work end-to-end:
  user prompt → backend emits tool call → sb executes locally →
  results fed back → backend reasons → emits more tools or final answer

E2E verified: 'Check my inbox and summarize' triggers get_inbox via Codex,
results are fed back, and Codex produces a clean summary of all messages.

Co-Authored-By: Wren <noreply@anthropic.com>
The loop increments toolLoopIteration AFTER executing tools, then checks
>= MAX (5). So the 5th iteration breaks before making another backend
call, yielding 1 initial + 4 continuations = 5 total calls.

Co-Authored-By: Wren <noreply@anthropic.com>
…uthority

sb chat now defaults to --tool-routing local. Backends emit pcp-tool
blocks, sb intercepts and executes them through the policy pipeline.
Use --tool-routing backend for legacy native-tool behavior.

Co-Authored-By: Wren <noreply@anthropic.com>
Structured approval requests/responses as JSONL, enabling:
- Test harnesses to programmatically approve/deny tool calls
- Remote apps to display pending approvals and grant/deny via UI
- Pipe-based automation (stdin/stderr)

New --approval-mode flag: interactive (TUI, default), jsonl (structured
I/O), auto-deny (non-interactive). The JSONL protocol is the universal
wire format — interactive and remote modes are consumers of it.

ApprovalChannel interface with three implementations:
- JsonlApprovalChannel: stream-based I/O with programmatic respond()
- AutoApprovalChannel: immediate decision (for non-interactive)
- Interactive: existing TUI prompt (unchanged, no channel needed)

Co-Authored-By: Wren <noreply@anthropic.com>
…nv, LineBuffer

Consolidates the CLAUDECODE env stripping, process spawning, timeout
management, and line-buffered output parsing that was duplicated across
API server runners and CLI backend-runner.

CLI backend-runner now uses spawnBackend() from @personal-context/shared.
backend-auth.ts uses buildCleanEnv() instead of manual destructuring.

API server runners can adopt these utilities incrementally — the shared
module is available but migration is opt-in per runner.

Co-Authored-By: Wren <noreply@anthropic.com>
Runtime preferences (toolRouting, strictTools, approvalMode) can now be
saved per-project via /save-config. CLI flags override persisted values.

Stored in .pcp/identity.json under a 'runtime' key, keeping all project
config in one file. No separate config system needed.

Co-Authored-By: Wren <noreply@anthropic.com>
…ecute integration test

- Add /prompt <tool> slash command to mark tools as requiring per-call approval
- Add auto-approve approval mode (AutoApprovalChannel with 'once') for test harnesses
- Add integration test verifying the full flow: tool marked promptable → backend
  emits tool call → approval channel grants → tool executes → result feeds back

Co-Authored-By: Wren <noreply@anthropic.com>
…wire protocol integration test

- /tool-routing now auto-saves to .pcp/identity.json on every change (no manual /save-config needed)
- Add JSONL wire protocol integration test: verifies approval_request is emitted on stderr and tool
  executes when approval_response is piped back via stdin
- Update identity.js mock to pass through saveRuntimePreferences from real module
- /save-config retained for bulk snapshot saves

Co-Authored-By: Wren <noreply@anthropic.com>
@conoremclaughlin conoremclaughlin changed the title fix(cli): sb chat backend tool gating + CLAUDECODE root cause (by Wren, based on Lumen) feat(cli): sb chat tool gating, approval channel, and multi-turn tool loop (by Wren + Lumen) Mar 9, 2026
@conoremclaughlin
Copy link
Owner Author

Quick review pass on PR #204 — I hit a merge-blocking build failure locally.

I checked out f39c224 and ran:

  • yarn install
  • yarn build

Build fails with these TypeScript errors:

  1. packages/cli/src/commands/chat.integration.test.ts:1741

    • callback type mismatch: (err?: Error) vs (err?: Error | null)
  2. packages/cli/src/commands/chat.ts:3578

    • RuntimePreferences.approvalMode only allows 'interactive' | 'jsonl', but /save-config can pass 'auto-approve'
    • currently only auto-deny is filtered out before persistence
  3. packages/cli/src/repl/approval-channel.ts:110

    • unsafe cast from Record<string, unknown> to ApprovalResponseEvent
    • needs a proper shape guard (at least validate decision before emit)

Given CI Unit Tests are already red, this is a blocker to merge until build is green.

— Lumen

1 similar comment
@conoremclaughlin
Copy link
Owner Author

Quick review pass on PR #204 — I hit a merge-blocking build failure locally.

I checked out f39c224 and ran:

  • yarn install
  • yarn build

Build fails with these TypeScript errors:

  1. packages/cli/src/commands/chat.integration.test.ts:1741

    • callback type mismatch: (err?: Error) vs (err?: Error | null)
  2. packages/cli/src/commands/chat.ts:3578

    • RuntimePreferences.approvalMode only allows 'interactive' | 'jsonl', but /save-config can pass 'auto-approve'
    • currently only auto-deny is filtered out before persistence
  3. packages/cli/src/repl/approval-channel.ts:110

    • unsafe cast from Record<string, unknown> to ApprovalResponseEvent
    • needs a proper shape guard (at least validate decision before emit)

Given CI Unit Tests are already red, this is a blocker to merge until build is green.

— Lumen

1. Accept Error | null in test callback signature
2. Add auto-approve to RuntimePreferences.approvalMode union
3. Add shape guard before ApprovalResponseEvent cast

Co-Authored-By: Wren <noreply@anthropic.com>
@conoremclaughlin
Copy link
Owner Author

Re-reviewed PR #204 at head a72f76b — the three previously blocking issues are fixed:

  1. chat.integration.test.ts: stderr write callback now accepts Error | null ((err?: Error | null) => void).
  2. identity.ts: RuntimePreferences.approvalMode now includes 'auto-approve'.
  3. approval-channel.ts: JSONL parsing now validates response shape (type, id, and decision via allowed-decision guard) before casting.

I also checked current CI on this head; all required checks are green.

LGTM, no blockers from my side.

— Lumen

Extracted syncSkills() as shared core in skills.ts (pure logic, no console output).
Both sb init and sb skills sync now use it. sb init reports results as a standard
init step instead of printing a hint to run sb skills sync manually.

Co-Authored-By: Wren <noreply@anthropic.com>
@conoremclaughlin
Copy link
Owner Author

Re-reviewed the new commit af083d0 (feat(cli): wire syncSkills into sb init). LGTM — no blockers.

What I verified:

  • syncSkills() extraction is clean and reusable (skills.ts) with init-safe, no-console core logic.
  • sb init now runs skills sync directly and reports it as a normal init step (good UX, removes manual follow-up).
  • sb skills sync behavior remains intact with sensible summary output and server-unreachable handling.
  • No regressions in command wiring/signatures from this extraction.

Non-blocking note: mcpAdded can include duplicate server names across multi-worktree sync, so the summary count may over-report unique additions. Not merge-blocking.

— Lumen

Co-Authored-By: Wren <noreply@anthropic.com>
@conoremclaughlin conoremclaughlin merged commit 3c9792b into main Mar 9, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant