feat(cli,nodejs): add daemon process with ocap daemon CLI #843

Merged
rekmarks merged 36 commits into main from rekm/grypez/daemon
Feb 20, 2026

@rekmarks
Member

@rekmarks rekmarks commented Feb 19, 2026

Adds a long-running daemon process to the OCAP kernel, managed via new ocap daemon CLI subcommands. The daemon spawns as a detached child process, exposes the kernel's RPC service over a Unix domain socket (~/.ocap/daemon.sock), and auto-starts on first exec invocation. The kernel database is persisted at ~/.ocap/kernel.sqlite.

Supersedes #842, and defers the introduction of its notion of a "console vat" and repl / IO functionality to a later date.

New CLI commands

  • ocap daemon start — start the daemon (or confirm it is already running)
  • ocap daemon stop — gracefully shut down the daemon
  • ocap daemon purge --force — stop the daemon and delete all persisted state
  • ocap daemon exec [method] [params-json] — send a JSON-RPC call to the daemon (defaults to getStatus)
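
As a rough illustration of what `ocap daemon exec [method] [params-json]` has to do before anything reaches the socket, the sketch below builds a JSON-RPC 2.0 request with `getStatus` as the default method. The helper names (`buildExecRequest`, `encodeRequest`) and the id counter are hypothetical and are not taken from the CLI source; the one-request-per-line framing is assumed from the "socket line protocol" mentioned in this PR.

```typescript
// Hypothetical helpers for illustration; the real CLI internals may differ.
type JsonRpcRequest = {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: unknown;
};

let nextId = 1;

// Build a request from the two optional CLI arguments.
function buildExecRequest(
  method: string = 'getStatus',
  paramsJson?: string,
): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: '2.0', id: nextId, method };
  nextId += 1;
  if (paramsJson !== undefined) {
    // JSON.parse throws a SyntaxError on malformed input, so bad
    // arguments fail before anything is sent to the daemon.
    request.params = JSON.parse(paramsJson);
  }
  return request;
}

// Assumed framing: one JSON document per line.
function encodeRequest(request: JsonRpcRequest): string {
  return `${JSON.stringify(request)}\n`;
}
```

With no arguments this yields a `getStatus` request without a `params` member, which matches the documented default.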

Kernel changes

  • makeKernel() now returns { kernel, kernelDatabase } and accepts optional systemSubclusters
  • ifDefined utility moved from kernel-agents to kernel-utils
  • startRelay moved from cli to kernel-utils/libp2p

New modules

  • @ocap/nodejs/daemon — daemon orchestration (startDaemon, deleteDaemonState, startRpcSocketServer, socket line protocol)
  • @ocap/cli/commands/daemon* — CLI-side daemon client, spawner, and command handlers
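
For readers unfamiliar with line-based socket protocols, here is a minimal sketch of newline-delimited JSON framing. It assumes the socket line protocol frames each message as one JSON document terminated by `\n`. `LineBuffer` and `encodeLine` are hypothetical names, not exports of `@ocap/nodejs/daemon`.

```typescript
// Hypothetical sketch of newline-delimited framing; not the module's actual API.
class LineBuffer {
  private pending = '';

  // Append a chunk and return every complete line it finishes,
  // without the trailing '\n'. Incomplete data stays buffered.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split('\n');
    // The last element is the unfinished tail ('' if the chunk ended on '\n').
    this.pending = parts.pop() ?? '';
    return parts;
  }
}

// JSON.stringify without an indent argument never emits a raw newline
// (newlines inside strings are escaped), so '\n' is a safe delimiter.
function encodeLine(message: unknown): string {
  return `${JSON.stringify(message)}\n`;
}
```

Socket reads can split a message at any byte boundary, which is why the reader has to buffer until it sees a delimiter rather than parse each chunk on arrival.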

Note

High Risk
Adds a new local RPC control plane (daemon + socket server) and changes kernel construction/IO-channel semantics, which can impact process lifecycle, persistence, and local security posture (e.g., arbitrary SQL via executeDBQuery).

Overview
Adds a detached, long-running OCAP daemon that hosts kernel JSON-RPC over a Unix domain socket and persists state under ~/.ocap, with new ocap daemon start|stop|purge|exec commands (including auto-spawn on exec) and prototype safeguards/behavior documented.

Introduces @ocap/nodejs/daemon (RPC socket server, line protocol helpers, daemon start/stop + state deletion) and updates makeKernel to return { kernel, kernelDatabase } (plus optional systemSubclusters) to support the daemon lifecycle.

Refactors shared utilities by moving startRelay to @metamask/kernel-utils/libp2p (and shifting libp2p deps accordingly) and moving ifDefined into @metamask/kernel-utils; updates tests/scripts and fixes makeSocketIOChannel reads to block until a client connects (instead of returning null).

Written by Cursor Bugbot for commit c0adc26.

grypez and others added 13 commits February 17, 2026 18:29
…tils

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…patch

The system console vat manages a REPL loop over an IO channel,
dispatching CLI commands (help, status, launch, terminate, subclusters,
listRefs, revoke) and managing refs in persistent baggage. Refs use
a monotonic counter (d-1, d-2, ...) since crypto.randomUUID() is
unavailable under SES lockdown. Cross-vat errors are serialized via
JSON.stringify fallback for reliable error reporting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add startDaemon() which boots a kernel with a system console vat
listening on a UNIX domain socket IO channel. The kernel process IS
the daemon — no separate HTTP server. Includes socket channel fix to
block reads when no client is connected, flush-daemon utility, and
e2e tests for the full daemon stack protocol.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the 'ok' CLI that communicates with the kernel daemon over a UNIX
domain socket using newline-delimited JSON. Uses yargs for command
definitions with --help support on all commands. Supports three input
modes: file arg (ok file.ocap method), stdin redirect (ok launch <
config.json), and pipe (cat config.json | ok launch). Relative
bundleSpec paths in launch configs are resolved to file:// URLs
against CWD. Ref results are output as .ocap files when stdout is
not a TTY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement a two-tier access model: unauthenticated daemon-tier commands
(help, status) and privileged ref-based dispatch via .ocap capability
files. Self-ref dispatch bypasses kernel round-trip for the console
root object. Fix kref leaks, improve socket channel reliability with
stale connection detection and client-side retry.
…rect JSON-RPC daemon

Replace the system-console-vat architecture with direct JSON-RPC over Unix
socket. The old flow routed CLI commands through IOChannels and a REPL vat;
the new flow sends JSON-RPC requests directly to kernel RPC handlers.

- Add RPC socket server and daemon lifecycle to @ocap/nodejs under ./daemon
  export path, reusing RpcService and rpcHandlers from the kernel
- Simplify CLI: ok.ts sends JSON-RPC commands, daemon-entry.ts boots kernel
  and starts the daemon socket server
- Move libp2p relay from @ocap/cli to @metamask/kernel-utils under ./libp2p
  export path, breaking the cli<->nodejs dependency cycle
- Remove @ocap/cli devDep from packages that only used the binary; use
  yarn run -T ocap for workspace-wide binary access
- Delete system-console-vat and related IOChannel/ref plumbing
- makeKernel now returns { kernel, kernelDatabase }

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Preserve kernel state across restarts (resetStorage: false)
- Clean up stale socket files before listen
- Add socket-based shutdown RPC with PID+SIGTERM fallback
- Stop daemon before flushing state in begone handler
- Narrow sendCommand retry to ECONNREFUSED/ECONNRESET only
- Replace bare socket probe with getStatus RPC ping
- Use JsonRpcResponse from @metamask/utils with runtime validation
- Extract shared readLine/writeLine into socket-line.ts
- Document 6 known limitations in CLI readme

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
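
The "narrow sendCommand retry" item in the commit above amounts to a predicate over Node.js error codes. A minimal sketch, assuming the retry decision is made from the `code` property that Node attaches to socket errors; `isRetryableSocketError` is a hypothetical name:

```typescript
// Hypothetical predicate; the actual sendCommand implementation is not shown in this PR page.
function isRetryableSocketError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) {
    return false;
  }
  const { code } = error as { code?: unknown };
  // Only transient connection failures are retried. Anything else
  // (ENOENT, EACCES, protocol errors) is reported immediately.
  return code === 'ECONNREFUSED' || code === 'ECONNRESET';
}
```

Keeping the list this short means a missing socket file or a permissions problem fails fast instead of being masked by retries.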
Merge the standalone `ok` binary into the existing `ocap` CLI as nested
`daemon` subcommands (start, stop, begone, exec), removing the need for
two separate entry points.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n.sock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…od to --force

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…p eslint disables

Move n/no-process-env exemption for cli package to eslint config,
replace process.exit() calls with process.exitCode to allow pending
I/O to complete, and simplify daemon-entry.ts error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rekmarks rekmarks requested a review from a team as a code owner February 19, 2026 06:03
@rekmarks rekmarks changed the title from "Consolidate ok CLI into ocap daemon command" to "feat(cli,nodejs): add daemon process with ocap daemon CLI" Feb 19, 2026
Guard the shutdown function with a stored promise so concurrent calls
from RPC shutdown, SIGTERM, and SIGINT coalesce into a single
handle.close() instead of throwing ERR_SERVER_NOT_RUNNING.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
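
The stored-promise guard described in the commit above can be sketched as follows. This is an illustration of the pattern only; `makeCoalescedShutdown` is a hypothetical name and `close` stands in for whatever wraps `handle.close()` in the daemon.

```typescript
// Hypothetical illustration of coalescing concurrent shutdown calls.
function makeCoalescedShutdown(
  close: () => Promise<void>,
): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    // The first caller starts the close; every later caller
    // receives the same promise instead of closing again.
    if (pending === undefined) {
      pending = close();
    }
    return pending;
  };
}

// Possible wiring (not executed here):
//   const shutdown = makeCoalescedShutdown(closeServer);
//   process.on('SIGTERM', () => void shutdown());
//   process.on('SIGINT', () => void shutdown());
```

Because the promise is stored before any `await`, a shutdown RPC and a signal arriving in the same tick still produce a single close.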
Pass an explicit dbFilename to makeKernel so the daemon uses an
on-disk SQLite database instead of the default in-memory one.
This matches the path deleteDaemonState already expects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rekmarks and others added 2 commits February 19, 2026 10:49
Use .finally() instead of .then() for PID file removal so stale
daemon.pid files do not persist after a failed shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
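
The difference between `.then()` and `.finally()` matters here because a `.then()` callback is skipped when the shutdown promise rejects. A minimal sketch, with `shutdown` and `removePidFile` as hypothetical stand-ins for the daemon's own functions:

```typescript
// Hypothetical wrapper; the parameter names are placeholders, not the daemon's API.
function shutdownAndCleanUp(
  shutdown: () => Promise<void>,
  removePidFile: () => void,
): Promise<void> {
  // .finally() runs on both fulfilment and rejection and passes the
  // original outcome through, so callers still see a failed shutdown.
  return shutdown().finally(removePidFile);
}
```

One caveat of this sketch: if `shutdown` throws synchronously instead of returning a rejected promise, `removePidFile` is never reached.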
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…path

Return a boolean from stopDaemon so callers (purge, stop) can react
to failure. Replace the SIGTERM poll loop with a short sleep since
SIGTERM delivery is reliable. Refuse to delete state in purge if the
daemon failed to stop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Feb 19, 2026
The `e2e` CI job was failing on #843 with "Executable doesn't exist"
for the Playwright Chromium binary. The root cause: the `prepare` job
installs dependencies with `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1` and
caches `node_modules`. When the `e2e` job restores that cache, `yarn
install` is skipped entirely, so `postinstall`/`playwright-install.sh`
never runs and the browsers are never downloaded. Playwright browsers
live in `~/.cache/ms-playwright/`, not in `node_modules`, so they are
not covered by the node_modules cache.

## Changes

- Add a **Playwright browser cache** step (`actions/cache`) to the `e2e`
job, keyed on `runner.os` + `yarn.lock` hash so it automatically
invalidates when Playwright is updated.
- Add an **Install Playwright browsers** step that runs `yarn playwright
install chromium` only on a cache miss, and always installs Playwright
dependencies, which are not cached.
- Bump Playwright deps to trigger cache miss in CI.

## Testing

This is a CI configuration change. It is verified by pushing the branch
and confirming the `E2E Tests / omnium-gatherum` (and sibling matrix)
jobs pass. On the first run the cache will miss and download Chromium;
subsequent runs with the same `yarn.lock` will hit the cache and skip
the download.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> CI/workflow and dependency-version updates only; main risk is longer
first-run CI times or cache-key issues affecting E2E job reliability.
> 
> **Overview**
> Prevents E2E CI failures caused by missing Playwright Chromium
binaries when `node_modules` is restored from cache and Playwright
`postinstall` doesn’t run.
> 
> Updates the `e2e` GitHub Actions job to **cache
`~/.cache/ms-playwright`** (keyed by OS + `yarn.lock`) and to **install
Playwright Chromium on cache miss**, plus always install required system
deps via `playwright install-deps`.
> 
> Bumps Playwright tooling to `^1.58.2` across the repo (root
`playwright` devDependency and workspace `@playwright/test`/`playwright`
versions), updating `yarn.lock` accordingly.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
fb33ca9. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
kernel.stop() now closes the database (ee4fed9), making the explicit
close call in the daemon's shutdown path redundant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@FUDCo FUDCo left a comment

Still looking at stuff, but wanted to get these comments in the pipeline for you.

Meta concern: This design seems to manifest what MarkM likes to call a "per error" (though I'm not sure it's actually an error as such). The CLI has a concept of The Daemon rather than A Daemon; the notion that there might be more than one of them is not supported by this tooling, though the possibility is certainly latent in some of the underlying implementation. It strikes me that being able to spin up multiple daemons might be useful for a number of testing and debugging scenarios, especially if we can arrange to wire up the various kernels to each other. On the other hand, supporting multiple daemons would complicate the UI, so I'm not sure how far we actually want to take that.

I was initially confused about what this was, which points to the existence of other possibilities in design space that may be worth exploring. What is implemented here is a headless process that contains an ocap kernel instance, and you can command the process or the kernel it contains by sending JSON RPC messages to the process via a domain socket, with shell command line support for sending such messages. I had started out thinking in terms of a headless process that would load an arbitrary bundle and then connect over a domain socket to an I/O configured vat running in some locally executing kernel. The vat would send commands to the daemon that would either command the daemon itself to do things or pass messages along to the code in the bundle it had loaded. Both of these ideas seem potentially useful, but they're very different.

* failed to stop within the timeout.
*/
export async function stopDaemon(socketPath: string): Promise<boolean> {
if (!(await pingDaemon(socketPath))) {
Contributor

It seems to me that this use of ping cannot distinguish between the case where the daemon process is not running and the case where it is running but non-responsive. In particular, if you happen to discover (out of band somehow) that the process is stuck, this function will return before falling back on SIGTERM. If I'm reading this right, it only does the latter if the daemon is unresponsive to the shutdown request, but it will never get to the point of trying to send shutdown if the process is just stuck.

Member Author

So pingDaemon() already had a timeout via sendCommand(), but I tidied up the shutdown process and added a SIGKILL fallback in 87b6b55

rekmarks and others added 3 commits February 19, 2026 14:54
Check both socket responsiveness and PID-based process liveness before
declaring the daemon stopped. Escalate through socket shutdown, SIGTERM,
and SIGKILL with proper exit verification at each stage.

Also refactor sendCommand to use an options bag and give pingDaemon a
3s timeout instead of the default 30s so stuck-daemon detection is fast.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
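
The commit above combines a socket ping with a PID-based liveness check so that a stuck daemon is not mistaken for a stopped one. The sketch below shows one way to do the PID half in Node.js, using signal `0`, plus a small classification of the two results. `isProcessAlive`, `classifyDaemon`, and the state names are hypothetical; how the daemon records its PID is not shown here.

```typescript
// Hypothetical helpers; the real stopDaemon logic may be structured differently.
function isProcessAlive(pid: number): boolean {
  try {
    // Signal 0 performs the existence and permission checks
    // without delivering a signal to the target process.
    process.kill(pid, 0);
    return true;
  } catch (error) {
    // EPERM means the process exists but is owned by another user;
    // ESRCH (and anything else) is treated as "not running".
    return (error as { code?: string }).code === 'EPERM';
  }
}

type DaemonState = 'running' | 'unresponsive' | 'stopped';

function classifyDaemon(
  socketResponsive: boolean,
  processAlive: boolean,
): DaemonState {
  if (socketResponsive) {
    return 'running';
  }
  // No answer on the socket but the PID is live: escalate
  // (for example SIGTERM, then SIGKILL) instead of returning early.
  return processAlive ? 'unresponsive' : 'stopped';
}
```

Only the `stopped` state should let a caller such as purge go on to delete persisted state.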
launchSubcluster params are shaped { config: ClusterConfig }, but
isClusterConfigLike was checking the top-level params object. Check
parsed.config instead so relative bundleSpec paths are resolved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
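
To make the fix above concrete, here is a sketch of resolving relative `bundleSpec` paths by inspecting `params.config` and leaving the top-level params object alone. The `ClusterConfigLike` shape (a `vats` record whose entries carry a string `bundleSpec`) is an assumption made for illustration, and `resolveBundleSpecs` is a hypothetical name. Only the `isClusterConfigLike` name and the `{ config: ClusterConfig }` params shape come from the commit message.

```typescript
import { resolve } from 'node:path';
import { pathToFileURL } from 'node:url';

// Assumed shapes, for illustration only.
type VatConfigLike = { bundleSpec: string };
type ClusterConfigLike = { vats: Record<string, VatConfigLike> };

// Matches specs that already carry a scheme such as file:// or http://.
const HAS_SCHEME = /^[a-z][a-z0-9+.-]*:\/\//i;

function isClusterConfigLike(value: unknown): value is ClusterConfigLike {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const { vats } = value as { vats?: unknown };
  if (typeof vats !== 'object' || vats === null) {
    return false;
  }
  return Object.values(vats).every(
    (vat) =>
      typeof vat === 'object' &&
      vat !== null &&
      typeof (vat as { bundleSpec?: unknown }).bundleSpec === 'string',
  );
}

function resolveBundleSpecs(params: unknown, cwd: string): unknown {
  // launchSubcluster params are { config: ClusterConfig }, so the check
  // is applied to params.config, not to params itself.
  const config = (params as { config?: unknown } | null | undefined)?.config;
  if (!isClusterConfigLike(config)) {
    return params;
  }
  const vats: Record<string, VatConfigLike> = {};
  for (const [name, vat] of Object.entries(config.vats)) {
    const bundleSpec = HAS_SCHEME.test(vat.bundleSpec)
      ? vat.bundleSpec
      : pathToFileURL(resolve(cwd, vat.bundleSpec)).href;
    vats[name] = { ...vat, bundleSpec };
  }
  return { ...(params as object), config: { ...config, vats } };
}
```

Params that do not match the assumed shape are returned unchanged, so other RPC methods pass through untouched.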
Use conventional 0600 notation instead of JS 0o600 in documentation.
Simplify post-shutdown socket check to a single ping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rekmarks
Member Author

For posterity, re: @FUDCo's review:

  • We are postponing multi-tenant daemons until later.
  • The daemon is "some locally executing kernel" for the purposes of @FUDCo's I/O vat design.

@github-actions
Contributor

github-actions bot commented Feb 20, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 75.98% (⬇️ -2.24%) | 6550 / 8620 |
| 🔵 | Statements | 75.87% (⬇️ -2.31%) | 6655 / 8771 |
| 🔵 | Functions | 73.79% (⬇️ -2.22%) | 1639 / 2221 |
| 🔵 | Branches | 75.35% (⬇️ -2.81%) | 2416 / 3206 |

File Coverage (Changed Files)

| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
| --- | --- | --- | --- | --- | --- |
| packages/cli/src/app.ts | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 30-288 |
| packages/cli/src/commands/daemon-client.ts | 0% | 0% | 0% | 0% | 16-117 |
| packages/cli/src/commands/daemon-entry.ts | 0% | 0% | 0% | 0% | 11-102 |
| packages/cli/src/commands/daemon-spawn.ts | 0% | 0% | 0% | 0% | 7-47 |
| packages/cli/src/commands/daemon.ts | 0% | 0% | 0% | 0% | 11-263 |
| packages/kernel-agents/src/utils.ts | 96.42% (⬇️ -0.35%) | 80% (🟰 ±0%) | 100% (🟰 ±0%) | 96.29% (⬇️ -0.37%) | 71 |
| packages/kernel-utils/src/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-utils/src/misc.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/nodejs/src/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/nodejs/src/daemon/delete-daemon-state.ts | 0% | 0% | 0% | 0% | 23-37 |
| packages/nodejs/src/daemon/index.ts | 100% | 100% | 100% | 100% | |
| packages/nodejs/src/daemon/rpc-socket-server.ts | 0% | 0% | 0% | 0% | 43-228 |
| packages/nodejs/src/daemon/socket-line.ts | 100% | 100% | 100% | 100% | |
| packages/nodejs/src/daemon/start-daemon.ts | 100% | 100% | 100% | 100% | |
| packages/nodejs/src/io/socket-channel.ts | 90.58% (⬇️ -3.46%) | 84.61% (⬇️ -2.89%) | 94.73% (🟰 ±0%) | 91.35% (⬇️ -3.65%) | 84-85, 96-98, 115, 160, 170 |
| packages/nodejs/src/kernel/make-kernel.ts | 100% (🟰 ±0%) | 85.71% (⬇️ -14.29%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |

Generated in workflow #3785 for commit c0adc26 by the Vitest Coverage Report Action

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FUDCo
FUDCo previously approved these changes Feb 20, 2026
Contributor

@FUDCo FUDCo left a comment

Excelsior!

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@rekmarks rekmarks disabled auto-merge February 20, 2026 02:56
rekmarks and others added 3 commits February 19, 2026 18:57
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use removeListener with specific handler references instead of
removeAllListeners, so callers' listeners are preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
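
The listener-cleanup change in the commit above can be illustrated with a plain `EventEmitter`. This is a reduced sketch under stated assumptions: `readFirstLine` is a hypothetical name, and it only handles `data` and `error`, not the end/close/timeout cases the tests in this PR mention.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical reduced reader; the real readLine handles more events.
function readFirstLine(source: EventEmitter): Promise<string> {
  return new Promise((resolveLine, rejectLine) => {
    let buffered = '';

    function cleanup(): void {
      // Remove only the handlers registered below. removeAllListeners
      // would also drop listeners that the caller attached.
      source.removeListener('data', onData);
      source.removeListener('error', onError);
    }

    function onData(chunk: Buffer | string): void {
      buffered += chunk.toString();
      const newlineIndex = buffered.indexOf('\n');
      if (newlineIndex !== -1) {
        cleanup();
        resolveLine(buffered.slice(0, newlineIndex));
      }
    }

    function onError(error: Error): void {
      cleanup();
      rejectLine(error);
    }

    source.on('data', onData);
    source.on('error', onError);
  });
}
```

Passing the same function reference to `on` and `removeListener` is what makes the targeted removal work; an inline arrow function could not be removed this way.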
Covers line parsing, buffering, error/end/close rejection, timeout,
and verifies that readLine only removes its own socket listeners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rekmarks rekmarks enabled auto-merge February 20, 2026 04:17
@rekmarks rekmarks requested a review from FUDCo February 20, 2026 04:17
Contributor

@FUDCo FUDCo left a comment

Avaunt!

@rekmarks rekmarks added this pull request to the merge queue Feb 20, 2026
Merged via the queue into main with commit 5e60e4a Feb 20, 2026
29 checks passed
@rekmarks rekmarks deleted the rekm/grypez/daemon branch February 20, 2026 04:58
3 participants