feat(cli,nodejs): add daemon process with ocap daemon CLI #843
Conversation
…tils Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…patch The system console vat manages a REPL loop over an IO channel, dispatching CLI commands (help, status, launch, terminate, subclusters, listRefs, revoke) and managing refs in persistent baggage. Refs use a monotonic counter (d-1, d-2, ...) since crypto.randomUUID() is unavailable under SES lockdown. Cross-vat errors are serialized via JSON.stringify fallback for reliable error reporting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
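A minimal sketch of the two mechanisms described above, with names assumed for illustration (this console vat is later removed in the PR): a baggage-backed monotonic ref counter and a JSON.stringify fallback for cross-vat errors.

```typescript
// Hypothetical sketch: names (Baggage, makeRefIdAllocator) are illustrative,
// not the vat's actual API.
type Baggage = Map<string, unknown>;

// Monotonic ref IDs ("d-1", "d-2", ...) persisted in baggage, since
// crypto.randomUUID() is unavailable under SES lockdown.
function makeRefIdAllocator(baggage: Baggage): () => string {
  if (!baggage.has('refCounter')) {
    baggage.set('refCounter', 0);
  }
  return () => {
    const next = (baggage.get('refCounter') as number) + 1;
    baggage.set('refCounter', next);
    return `d-${next}`;
  };
}

// Cross-vat errors may not serialize cleanly, so fall back to JSON.stringify
// of the salient fields, then to String().
function serializeError(error: unknown): string {
  if (error instanceof Error) {
    return JSON.stringify({ name: error.name, message: error.message });
  }
  try {
    return JSON.stringify(error) ?? String(error);
  } catch {
    return String(error);
  }
}
```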
Add startDaemon() which boots a kernel with a system console vat listening on a UNIX domain socket IO channel. The kernel process IS the daemon — no separate HTTP server. Includes socket channel fix to block reads when no client is connected, flush-daemon utility, and e2e tests for the full daemon stack protocol. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the 'ok' CLI that communicates with the kernel daemon over a UNIX domain socket using newline-delimited JSON. Uses yargs for command definitions with --help support on all commands. Supports three input modes: file arg (ok file.ocap method), stdin redirect (ok launch < config.json), and pipe (cat config.json | ok launch). Relative bundleSpec paths in launch configs are resolved to file:// URLs against CWD. Ref results are output as .ocap files when stdout is not a TTY. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
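A minimal sketch of the relative-path resolution described above; the `LaunchConfig` shape and function name are assumptions, with only the `bundleSpec` field taken from the commit message.

```typescript
import { isAbsolute, resolve } from 'node:path';
import { pathToFileURL } from 'node:url';

// Hypothetical shape: only bundleSpec is assumed from the description above.
type LaunchConfig = { bundleSpec: string };

// Rewrite a bare relative path into a file:// URL resolved against CWD,
// leaving anything that already has a URL scheme untouched.
function resolveBundleSpec(
  config: LaunchConfig,
  cwd = process.cwd(),
): LaunchConfig {
  const { bundleSpec } = config;
  if (/^[a-z][a-z0-9+.-]*:/iu.test(bundleSpec)) {
    return config; // Already a URL (file://, http://, ...).
  }
  const absolutePath = isAbsolute(bundleSpec)
    ? bundleSpec
    : resolve(cwd, bundleSpec);
  return { ...config, bundleSpec: pathToFileURL(absolutePath).href };
}
```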
Implement a two-tier access model: unauthenticated daemon-tier commands (help, status) and privileged ref-based dispatch via .ocap capability files. Self-ref dispatch bypasses kernel round-trip for the console root object. Fix kref leaks, improve socket channel reliability with stale connection detection and client-side retry.
…rect JSON-RPC daemon
Replace the system-console-vat architecture with direct JSON-RPC over Unix
socket. The old flow routed CLI commands through IOChannels and a REPL vat;
the new flow sends JSON-RPC requests directly to kernel RPC handlers.
- Add RPC socket server and daemon lifecycle to @ocap/nodejs under ./daemon
export path, reusing RpcService and rpcHandlers from the kernel
- Simplify CLI: ok.ts sends JSON-RPC commands, daemon-entry.ts boots kernel
and starts the daemon socket server
- Move libp2p relay from @ocap/cli to @metamask/kernel-utils under ./libp2p
export path, breaking the cli<->nodejs dependency cycle
- Remove @ocap/cli devDep from packages that only used the binary; use
yarn run -T ocap for workspace-wide binary access
- Delete system-console-vat and related IOChannel/ref plumbing
- makeKernel now returns { kernel, kernelDatabase }
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
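A minimal sketch of the new flow, assuming newline-delimited JSON-RPC framing over the Unix domain socket; the function name and error handling are illustrative, not the actual CLI code.

```typescript
import { createConnection } from 'node:net';

// Send one JSON-RPC request to the daemon socket and read one newline-
// terminated response line.
async function sendJsonRpc(
  socketPath: string,
  method: string,
  params: unknown = [],
): Promise<unknown> {
  return new Promise((resolvePromise, rejectPromise) => {
    const socket = createConnection(socketPath);
    let buffer = '';
    socket.setEncoding('utf8');
    socket.on('error', rejectPromise);
    socket.on('data', (chunk: string) => {
      buffer += chunk;
      const newline = buffer.indexOf('\n');
      if (newline === -1) {
        return; // Wait for a complete line.
      }
      socket.end();
      try {
        const response = JSON.parse(buffer.slice(0, newline)) as {
          result?: unknown;
          error?: unknown;
        };
        resolvePromise(response.result ?? response.error);
      } catch (parseError) {
        rejectPromise(parseError);
      }
    });
    socket.on('connect', () => {
      socket.write(
        `${JSON.stringify({ jsonrpc: '2.0', id: 1, method, params })}\n`,
      );
    });
  });
}
```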
- Preserve kernel state across restarts (resetStorage: false)
- Clean up stale socket files before listen
- Add socket-based shutdown RPC with PID+SIGTERM fallback
- Stop daemon before flushing state in begone handler
- Narrow sendCommand retry to ECONNREFUSED/ECONNRESET only
- Replace bare socket probe with getStatus RPC ping
- Use JsonRpcResponse from @metamask/utils with runtime validation
- Extract shared readLine/writeLine into socket-line.ts
- Document 6 known limitations in CLI readme

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
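A minimal sketch of the narrowed retry policy listed above (the helper name and backoff values are assumptions): only connection-level failures are retried, while JSON-RPC and application errors surface immediately.

```typescript
// Only these connection-level failures warrant a retry.
const RETRYABLE_CODES = new Set(['ECONNREFUSED', 'ECONNRESET']);

async function withConnectionRetry<Result>(
  operation: () => Promise<Result>,
  attempts = 5,
  delayMs = 200,
): Promise<Result> {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      const code = (error as NodeJS.ErrnoException).code;
      if (
        attempt >= attempts ||
        code === undefined ||
        !RETRYABLE_CODES.has(code)
      ) {
        throw error; // Not retryable, or retries exhausted.
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```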
Merge the standalone `ok` binary into the existing `ocap` CLI as nested `daemon` subcommands (start, stop, begone, exec), removing the need for two separate entry points. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n.sock Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…od to --force Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…p eslint disables Move n/no-process-env exemption for cli package to eslint config, replace process.exit() calls with process.exitCode to allow pending I/O to complete, and simplify daemon-entry.ts error handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
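A minimal sketch of the `process.exitCode` pattern mentioned above (the `main` body is a placeholder): the process exits with the recorded status once the event loop drains, instead of being cut off mid-write by `process.exit()`.

```typescript
async function main(): Promise<void> {
  // ... run the selected CLI command ...
}

main().catch((error) => {
  console.error(error);
  // Exit status is applied after pending stdout/stderr writes flush.
  process.exitCode = 1;
});
```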
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard the shutdown function with a stored promise so concurrent calls from RPC shutdown, SIGTERM, and SIGINT coalesce into a single handle.close() instead of throwing ERR_SERVER_NOT_RUNNING. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
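A minimal sketch of the coalescing guard (the `Server` handle and factory name are illustrative): the first caller creates the close promise, and every later caller awaits that same promise.

```typescript
import type { Server } from 'node:net';

// Concurrent shutdown requests (RPC shutdown, SIGTERM, SIGINT) share one
// close() call instead of racing and throwing ERR_SERVER_NOT_RUNNING.
function makeShutdown(server: Server): () => Promise<void> {
  let shutdownPromise: Promise<void> | undefined;
  return async () => {
    shutdownPromise ??= new Promise((resolve, reject) => {
      server.close((error) => (error ? reject(error) : resolve()));
    });
    return shutdownPromise;
  };
}

// Example wiring (hypothetical):
// const shutdown = makeShutdown(server);
// process.on('SIGTERM', () => void shutdown());
// process.on('SIGINT', () => void shutdown());
```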
Pass an explicit dbFilename to makeKernel so the daemon uses an on-disk SQLite database instead of the default in-memory one. This matches the path deleteDaemonState already expects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use .finally() instead of .then() for PID file removal so stale daemon.pid files do not persist after a failed shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
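A minimal sketch of the `.finally()` change (paths and function names are assumptions): the PID file is removed whether shutdown resolves or rejects, and a shutdown failure still propagates to the caller.

```typescript
import { rm } from 'node:fs/promises';

async function shutdownAndCleanup(
  shutdown: () => Promise<void>,
  pidFilePath: string,
): Promise<void> {
  // .finally() runs (and is awaited) on both success and failure, so a failed
  // shutdown no longer leaves a stale daemon.pid behind.
  await shutdown().finally(async () => {
    await rm(pidFilePath, { force: true });
  });
}
```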
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…path Return a boolean from stopDaemon so callers (purge, stop) can react to failure. Replace the SIGTERM poll loop with a short sleep since SIGTERM delivery is reliable. Refuse to delete state in purge if the daemon failed to stop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
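A minimal sketch of the purge flow implied above (function signatures are assumptions): persisted state is deleted only after `stopDaemon` reports success.

```typescript
async function purge(
  stopDaemon: (socketPath: string) => Promise<boolean>,
  deleteDaemonState: () => Promise<void>,
  socketPath: string,
): Promise<void> {
  const stopped = await stopDaemon(socketPath);
  if (!stopped) {
    // A daemon that is still running could keep writing to state we delete.
    console.error('Daemon failed to stop; refusing to delete its state.');
    process.exitCode = 1;
    return;
  }
  await deleteDaemonState();
}
```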
The `e2e` CI job was failing on #843 with "Executable doesn't exist" for the Playwright Chromium binary. The root cause: the `prepare` job installs dependencies with `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1` and caches `node_modules`. When the `e2e` job restores that cache, `yarn install` is skipped entirely, so `postinstall`/`playwright-install.sh` never runs and the browsers are never downloaded. Playwright browsers live in `~/.cache/ms-playwright/`, not in `node_modules`, so they are not covered by the node_modules cache.

## Changes

- Add a **Playwright browser cache** step (`actions/cache`) to the `e2e` job, keyed on `runner.os` + `yarn.lock` hash so it automatically invalidates when Playwright is updated.
- Add an **Install Playwright browsers** step that runs `yarn playwright install chromium` only on a cache miss, and always installs Playwright dependencies, which are not cached.
- Bump Playwright deps to trigger cache miss in CI.

## Testing

This is a CI configuration change. It is verified by pushing the branch and confirming the `E2E Tests / omnium-gatherum` (and sibling matrix) jobs pass. On the first run the cache will miss and download Chromium; subsequent runs with the same `yarn.lock` will hit the cache and skip the download.

---

> [!NOTE]
> **Low Risk**
> CI/workflow and dependency-version updates only; main risk is longer first-run CI times or cache-key issues affecting E2E job reliability.
>
> **Overview**
> Prevents E2E CI failures caused by missing Playwright Chromium binaries when `node_modules` is restored from cache and Playwright `postinstall` doesn't run.
>
> Updates the `e2e` GitHub Actions job to **cache `~/.cache/ms-playwright`** (keyed by OS + `yarn.lock`) and to **install Playwright Chromium on cache miss**, plus always install required system deps via `playwright install-deps`.
>
> Bumps Playwright tooling to `^1.58.2` across the repo (root `playwright` devDependency and workspace `@playwright/test`/`playwright` versions), updating `yarn.lock` accordingly.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
kernel.stop() now closes the database (ee4fed9), making the explicit close call in the daemon's shutdown path redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FUDCo
left a comment
Still looking at stuff, but wanted to get these comments in the pipeline for you.
Meta concern: This design seems to manifest what MarkM likes to call a "per error" (though I'm not sure it's actually an error as such). The CLI has a concept of The Daemon rather than A Daemon; the notion that there might be more than one of them is not supported by this tooling, though the possibility is certainly latent in some of the underlying implementation. It strikes me that being able to spin up multiple daemons might be useful for a number of testing and debugging scenarios, especially if we can arrange to wire up the various kernels to each other. On the other hand, supporting multiple daemons would complicate the UI, so I'm not sure how far we actually want to take that.
I was initially confused about what this was, which points to the existence of other possibilities in design space that may be worth exploring. What is implemented here is a headless process that contains an ocap kernel instance, and you can command the process or the kernel it contains by sending JSON RPC messages to the process via a domain socket, with shell command line support for sending such messages. I had started out thinking in terms of a headless process that would load an arbitrary bundle and then connect over a domain socket to an I/O configured vat running in some locally executing kernel. The vat would send commands to the daemon that would either command the daemon itself to do things or pass messages along to the code in the bundle it had loaded. Both of these ideas seem potentially useful, but they're very different.
packages/cli/src/commands/daemon.ts
Outdated
```ts
 * failed to stop within the timeout.
 */
export async function stopDaemon(socketPath: string): Promise<boolean> {
  if (!(await pingDaemon(socketPath))) {
```
It seems to me that this use of ping is vulnerable to being unable to distinguish between the case where daemon process is not running and the case where it is running but non-responsive. In particular, if you happen to discover (out of band somehow) that the process is stuck, this function will return before falling back on SIGTERM. If I'm reading this right, it only does the latter if the daemon is unresponsive to the shutdown request, but it will never get to the point of trying to send shutdown if the process is just stuck.
So pingDaemon() already had a timeout via sendCommand(), but I tidied up the shutdown process and added a SIGKILL fallback in 87b6b55
Check both socket responsiveness and PID-based process liveness before declaring the daemon stopped. Escalate through socket shutdown, SIGTERM, and SIGKILL with proper exit verification at each stage. Also refactor sendCommand to use an options bag and give pingDaemon a 3s timeout instead of the default 30s so stuck-daemon detection is fast. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
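A minimal sketch of the escalation ladder described above (helper names and timings are assumptions): each stage verifies process exit via `process.kill(pid, 0)` before escalating to the next signal.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Signal 0 performs no kill; it only checks whether the process exists.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch {
    return false;
  }
}

async function stopWithEscalation(
  requestShutdown: () => Promise<void>, // e.g. the socket shutdown RPC
  pid: number,
): Promise<boolean> {
  await requestShutdown().catch(() => undefined); // The daemon may be stuck.
  for (const signal of ['SIGTERM', 'SIGKILL'] as const) {
    await sleep(500);
    if (!isProcessAlive(pid)) {
      return true; // Exited at the previous stage.
    }
    try {
      process.kill(pid, signal);
    } catch {
      return !isProcessAlive(pid); // ESRCH: already gone.
    }
  }
  await sleep(500);
  return !isProcessAlive(pid);
}
```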
launchSubcluster params are shaped { config: ClusterConfig }, but
isClusterConfigLike was checking the top-level params object. Check
parsed.config instead so relative bundleSpec paths are resolved.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use conventional 0600 notation instead of JS 0o600 in documentation. Simplify post-shutdown socket check to a single ping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coverage Report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use removeListener with specific handler references instead of removeAllListeners, so callers' listeners are preserved. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
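A minimal sketch of a `readLine` that detaches only its own handlers; the framing details and error messages are assumptions, not the actual socket-line.ts code.

```typescript
import type { Socket } from 'node:net';

// Read one newline-terminated line, then remove only the listeners installed
// here, leaving any listeners the caller registered untouched.
async function readLine(socket: Socket): Promise<string> {
  return new Promise((resolve, reject) => {
    let buffer = '';

    function cleanup(): void {
      socket.removeListener('data', onData);
      socket.removeListener('error', onError);
      socket.removeListener('end', onClosed);
      socket.removeListener('close', onClosed);
    }

    function onData(chunk: Buffer): void {
      buffer += chunk.toString('utf8');
      const newline = buffer.indexOf('\n');
      if (newline !== -1) {
        cleanup();
        resolve(buffer.slice(0, newline));
      }
    }

    function onError(error: Error): void {
      cleanup();
      reject(error);
    }

    function onClosed(): void {
      cleanup();
      reject(new Error('Socket closed before a full line was received'));
    }

    socket.on('data', onData);
    socket.on('error', onError);
    socket.on('end', onClosed);
    socket.on('close', onClosed);
  });
}
```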
Covers line parsing, buffering, error/end/close rejection, timeout, and verifies that readLine only removes its own socket listeners. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a long-running daemon process to the OCAP kernel, managed via new `ocap daemon` CLI subcommands. The daemon spawns as a detached child process, exposes the kernel's RPC service over a Unix domain socket (`~/.ocap/daemon.sock`), and auto-starts on first `exec` invocation. The kernel database is persisted at `~/.ocap/kernel.sqlite`.

Supersedes #842, and defers the introduction of its notion of a "console vat" and REPL / IO functionality to a later date.
## New CLI commands

- `ocap daemon start` — start the daemon (or confirm it is already running)
- `ocap daemon stop` — gracefully shut down the daemon
- `ocap daemon purge --force` — stop the daemon and delete all persisted state
- `ocap daemon exec [method] [params-json]` — send a JSON-RPC call to the daemon (defaults to `getStatus`)

## Kernel changes

- `makeKernel()` now returns `{ kernel, kernelDatabase }` and accepts optional `systemSubclusters`
- `ifDefined` utility moved from `kernel-agents` to `kernel-utils`
- `startRelay` moved from `cli` to `kernel-utils/libp2p`

## New modules

- `@ocap/nodejs/daemon` — daemon orchestration (`startDaemon`, `deleteDaemonState`, `startRpcSocketServer`, socket line protocol)
- `@ocap/cli/commands/daemon*` — CLI-side daemon client, spawner, and command handlers
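A minimal sketch of the detached-spawn behavior described above (the entry-point path and log-file location are assumptions): the child gets its own process group and the CLI does not wait for it.

```typescript
import { spawn } from 'node:child_process';
import { mkdirSync, openSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';

// Spawn the daemon entry point with the current Node binary, detach it so it
// survives the CLI exiting, and send its output to a log file under ~/.ocap.
function spawnDaemon(daemonEntryPath: string): number | undefined {
  const stateDir = join(homedir(), '.ocap');
  mkdirSync(stateDir, { recursive: true });
  const logFd = openSync(join(stateDir, 'daemon.log'), 'a'); // Assumed log path.
  const child = spawn(process.execPath, [daemonEntryPath], {
    detached: true, // New process group, independent of the CLI's terminal.
    stdio: ['ignore', logFd, logFd],
  });
  child.unref(); // Let the CLI exit without waiting on the daemon.
  return child.pid;
}
```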
> [!NOTE]
> **High Risk**
> Adds a new local RPC control plane (daemon + socket server) and changes kernel construction/IO-channel semantics, which can impact process lifecycle, persistence, and local security posture (e.g., arbitrary SQL via `executeDBQuery`).
>
> **Overview**
> Adds a detached, long-running OCAP daemon that hosts kernel JSON-RPC over a Unix domain socket and persists state under `~/.ocap`, with new `ocap daemon start|stop|purge|exec` commands (including auto-spawn on `exec`) and prototype safeguards/behavior documented.
>
> Introduces `@ocap/nodejs/daemon` (RPC socket server, line protocol helpers, daemon start/stop + state deletion) and updates `makeKernel` to return `{ kernel, kernelDatabase }` (plus optional `systemSubclusters`) to support the daemon lifecycle.
>
> Refactors shared utilities by moving `startRelay` to `@metamask/kernel-utils/libp2p` (and shifting libp2p deps accordingly) and moving `ifDefined` into `@metamask/kernel-utils`; updates tests/scripts and fixes `makeSocketIOChannel` reads to block until a client connects (instead of returning `null`).