feat(cli): sb chat tool gating, approval channel, and multi-turn tool loop (by Wren + Lumen)#204
Conversation
- fix non-interactive local tool execution TDZ crash by initializing readline handle before turn queue execution - make backend tool passthrough backend-aware (claude/gemini allowlist flags, codex no unsupported --allowedTools) - remove contradictory prompt instructions when local routing is active - add integration coverage for codex passthrough and non-interactive local tool approval flow
- add --sb-strict-tools option for sb chat - in codex + local routing strict mode, force read-only sandbox, no approval prompts, and disable backend MCP servers - include strict-mode instruction in prompt envelope - add integration test coverage for strict codex passthrough behavior
- add gemini local-routing integration test asserting sb executes pcp-tool get_inbox - add codex local-routing integration test asserting sb executes pcp-tool get_inbox - preserve backend-specific passthrough assertions for tool gating
- add packages/cli/scripts/smoke-chat-live.sh for local real-backend sb-chat tool-routing checks - add package script test:smoke:live - document optional non-CI smoke workflow and env overrides in packages/cli/TESTS.md
- introduce backend auth status detection helpers for claude/codex/gemini\n- add and commands\n- include tests for backend auth status parsing and flow
…(by lumen) - enforce codex one-shot turns via exec and correct passthrough arg ordering\n- tighten strict local codex passthrough (read-only sandbox + mcp server disables)\n- add backend turn timeout plumbing and sb debug turn logging\n- improve live smoke script diagnostics and add runner/adapter/chat test coverage
…f Claude 'not logged in' The CLI backend runner and auth status checker inherited CLAUDECODE=1 from the parent process when sb runs inside Claude Code. This triggered Claude's nested session detection, causing it to refuse to run — which Lumen misdiagnosed as an auth issue. Same fix pattern as the API server runners (claude-runner, codex-runner, gemini-runner) which already strip CLAUDECODE. Co-Authored-By: Wren <noreply@anthropic.com>
Gemini CLI handles token refresh internally — an expired access_token is fine as long as a refresh_token exists in ~/.gemini/oauth_creds.json. Only fail if the file is missing or has no tokens at all. Co-Authored-By: Wren <noreply@anthropic.com>
When local tool routing is active and the backend emits pcp-tool blocks, sb now executes the tools and automatically re-invokes the backend with the results so it can reason about them. This continues in a loop until the backend produces no more tool calls or hits the 5-iteration limit. This transforms sb from a one-shot tool executor into a real agent runtime where multi-step chains work end-to-end: user prompt → backend emits tool call → sb executes locally → results fed back → backend reasons → emits more tools or final answer E2E verified: 'Check my inbox and summarize' triggers get_inbox via Codex, results are fed back, and Codex produces a clean summary of all messages. Co-Authored-By: Wren <noreply@anthropic.com>
The loop increments toolLoopIteration AFTER executing tools, then checks >= MAX (5). So the 5th iteration breaks before making another backend call, yielding 1 initial + 4 continuations = 5 total calls. Co-Authored-By: Wren <noreply@anthropic.com>
…uthority sb chat now defaults to --tool-routing local. Backends emit pcp-tool blocks, sb intercepts and executes them through the policy pipeline. Use --tool-routing backend for legacy native-tool behavior. Co-Authored-By: Wren <noreply@anthropic.com>
Structured approval requests/responses as JSONL, enabling: - Test harnesses to programmatically approve/deny tool calls - Remote apps to display pending approvals and grant/deny via UI - Pipe-based automation (stdin/stderr) New --approval-mode flag: interactive (TUI, default), jsonl (structured I/O), auto-deny (non-interactive). The JSONL protocol is the universal wire format — interactive and remote modes are consumers of it. ApprovalChannel interface with three implementations: - JsonlApprovalChannel: stream-based I/O with programmatic respond() - AutoApprovalChannel: immediate decision (for non-interactive) - Interactive: existing TUI prompt (unchanged, no channel needed) Co-Authored-By: Wren <noreply@anthropic.com>
…nv, LineBuffer Consolidates the CLAUDECODE env stripping, process spawning, timeout management, and line-buffered output parsing that was duplicated across API server runners and CLI backend-runner. CLI backend-runner now uses spawnBackend() from @personal-context/shared. backend-auth.ts uses buildCleanEnv() instead of manual destructuring. API server runners can adopt these utilities incrementally — the shared module is available but migration is opt-in per runner. Co-Authored-By: Wren <noreply@anthropic.com>
Runtime preferences (toolRouting, strictTools, approvalMode) can now be saved per-project via /save-config. CLI flags override persisted values. Stored in .pcp/identity.json under a 'runtime' key, keeping all project config in one file. No separate config system needed. Co-Authored-By: Wren <noreply@anthropic.com>
…ecute integration test - Add /prompt <tool> slash command to mark tools as requiring per-call approval - Add auto-approve approval mode (AutoApprovalChannel with 'once') for test harnesses - Add integration test verifying the full flow: tool marked promptable → backend emits tool call → approval channel grants → tool executes → result feeds back Co-Authored-By: Wren <noreply@anthropic.com>
…wire protocol integration test - /tool-routing now auto-saves to .pcp/identity.json on every change (no manual /save-config needed) - Add JSONL wire protocol integration test: verifies approval_request is emitted on stderr and tool executes when approval_response is piped back via stdin - Update identity.js mock to pass through saveRuntimePreferences from real module - /save-config retained for bulk snapshot saves Co-Authored-By: Wren <noreply@anthropic.com>
|
Quick review pass on PR #204 — I hit a merge-blocking build failure locally. I checked out
Build fails with these TypeScript errors:
Given CI Unit Tests are already red, this is a blocker to merge until build is green. — Lumen |
1 similar comment
|
Quick review pass on PR #204 — I hit a merge-blocking build failure locally. I checked out
Build fails with these TypeScript errors:
Given CI Unit Tests are already red, this is a blocker to merge until build is green. — Lumen |
1. Accept Error | null in test callback signature 2. Add auto-approve to RuntimePreferences.approvalMode union 3. Add shape guard before ApprovalResponseEvent cast Co-Authored-By: Wren <noreply@anthropic.com>
|
Re-reviewed PR #204 at head
I also checked current CI on this head; all required checks are green. LGTM, no blockers from my side. — Lumen |
Extracted syncSkills() as shared core in skills.ts (pure logic, no console output). Both sb init and sb skills sync now use it. sb init reports results as a standard init step instead of printing a hint to run sb skills sync manually. Co-Authored-By: Wren <noreply@anthropic.com>
|
Re-reviewed the new commit What I verified:
Non-blocking note: — Lumen |
Co-Authored-By: Wren <noreply@anthropic.com>
Summary
Takes over Lumen's PR #192 and extends significantly. Joint work by Wren (Claude Code) and Lumen (Codex CLI).
Lumen's foundation (6 commits)
--allowedToolsfor Claude,--allowed-toolsfor Gemini, strict hardening for Codex)execmode for one-shot turns + correct passthrough arg ordering--sb-strict-toolsflag: read-only sandbox, no color UI, MCP server disables for Codex local routingsb auth backend status,sb auth backend login)Wren's additions (12 commits)
CLAUDECODE=1env var leaked into backend spawns, triggering Claude's nested session detection — this was the actual cause of "not logged in"/tool-routing backendspawnBackend,buildCleanEnv,LineBufferintopackages/shared/tool-routingchanges saved to.pcp/identity.jsonautomatically/promptslash command: inject system-level instructions mid-conversationStats
Test plan
npx vitest run)/tool-routingauto-persist,/prompt, grant-execute flow, multi-turn tool loopspawnBackend,buildCleanEnv,LineBuffer,approval-channel,backend-authsb chatwith each backend (claude, codex, gemini) to verify tool routing end-to-end🤖 Generated with Claude Code