Skip to content

fix: stabilize tray pairing and reconnect behavior#80

Open
andyeskridge wants to merge 2 commits intoopenclaw:masterfrom
andyeskridge:fix/tray-pairing-and-scopes
Open

fix: stabilize tray pairing and reconnect behavior#80
andyeskridge wants to merge 2 commits intoopenclaw:masterfrom
andyeskridge:fix/tray-pairing-and-scopes

Conversation

@andyeskridge
Copy link

Summary

This updates the Windows tray node pairing flow to match current OpenClaw gateway behavior and fixes reconnect recovery after gateway restarts.

Why

The tray previously assumed a node was only paired if hello-ok returned auth.deviceToken and that token was persisted locally.

In practice, current gateway behavior for Windows WS nodes looks like this:

  • unapproved device -> NOT_PAIRED
  • approved device -> hello-ok
  • node commands work normally after approval
  • auth.deviceToken is not always returned
  • That left the tray in incorrect states like false Pending, false Unknown, and failed reconnect recovery after gateway restarts.

Changes

  • treat NOT_PAIRED as a real pairing-needed state instead of a generic registration error
  • only show pending approval when the gateway explicitly indicates pairing is required
  • treat an approved hello-ok as paired even when auth.deviceToken is absent
  • fix websocket reconnect so the tray keeps retrying until the gateway is back
  • suppress repeated pairing/connect notifications and duplicate activity noise on routine reconnects
  • tighten pairing event matching so node pairing events use the actual node identity when available

Validation

Tested locally against a real gateway:

  • fresh tray state enters Pending on NOT_PAIRED
  • approving the device in OpenClaw transitions the node to Connected
  • node requests succeed after approval (system.notify verified)
  • gateway restart disconnects the tray, retries through startup errors, and reconnects automatically

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stabilizes the Windows tray “node mode” pairing state machine and improves reconnect recovery after gateway restarts, aligning tray behavior with current gateway responses (including NOT_PAIRED and hello-ok without auth.deviceToken).

Changes:

  • Update tray pairing notifications to suppress duplicate Pending/Paired toasts/activity.
  • Extend WindowsNodeClient to handle pairing-required errors, pairing events, and treat hello-ok as paired even without auth.deviceToken.
  • Make WebSocket reconnect retry continuously with backoff until the gateway is reachable again.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/OpenClaw.Tray.WinUI/App.xaml.cs Debounces pairing toasts/activity based on last observed pairing status.
src/OpenClaw.Shared/WindowsNodeClient.cs Adds explicit pairing state tracking, handles NOT_PAIRED + pair events, and revises hello-ok handling.
src/OpenClaw.Shared/WebSocketClientBase.cs Changes reconnect to loop with backoff until connected/disposed.
Comments suppressed due to low confidence (1)

src/OpenClaw.Shared/WindowsNodeClient.cs:575

  • _pairingApprovedAwaitingReconnect is documented as “until the next successful reconnect”, but in the hello-ok path it is only cleared when auth.deviceToken is present. If the gateway approves and then reconnects without returning auth.deviceToken, this flag stays true indefinitely, causing incorrect state/logging on subsequent reconnects. Clear _pairingApprovedAwaitingReconnect after processing the first successful hello-ok post-approval (even when no device token is returned).
            else if (_pairingApprovedAwaitingReconnect)
            {
                _logger.Info("hello-ok arrived after pairing approval without auth.deviceToken; keeping local state paired.");
            }

            _logger.Info($"Node registered successfully! ID: {_nodeId ?? _deviceIdentity.DeviceId.Substring(0, 16)}");
            _logger.Info($"[NODE] hello-ok auth present={hasAuthPayload}, receivedDeviceToken={receivedDeviceToken}, storedDeviceToken={!string.IsNullOrEmpty(_deviceIdentity.DeviceToken)}, pendingApproval={_isPendingApproval}, awaitingReconnect={_pairingApprovedAwaitingReconnect}");
            
            _isPendingApproval = false;
            _isPaired = true;
            _logger.Info(string.IsNullOrEmpty(_deviceIdentity.DeviceToken)
                ? "Gateway accepted the node without returning a device token; treating this device as paired"
                : "Already paired with stored device token");

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +576 to 582
if (!wasPairedBeforeHello)
{
_isPendingApproval = true;
_logger.Info("Not yet paired - check 'openclaw devices list' for pending approval");
_logger.Info($"To approve, run: openclaw devices approve {_deviceIdentity.DeviceId}");
PairingStatusChanged?.Invoke(this, new PairingStatusEventArgs(
PairingStatus.Pending,
PairingStatus.Paired,
_deviceIdentity.DeviceId,
$"Run: openclaw devices approve {ShortDeviceId}..."));
}
else
{
_isPendingApproval = false;
_logger.Info("Already paired with stored device token");
PairingStatusChanged?.Invoke(this, new PairingStatusEventArgs(
PairingStatus.Paired,
_deviceIdentity.DeviceId));
"Pairing approved!"));
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PairingStatusChanged event emitted on hello-ok uses the message "Pairing approved!" even when approval may have happened previously (e.g., device already approved on the gateway but no auth.deviceToken is returned). This is likely to mislead users because hello-ok indicates successful registration, not necessarily a new approval action. Consider using a more neutral message (e.g., "Node connected" / "Node registration accepted") or only using "approved" wording when a pairing approval event was actually observed.

This issue also appears on line 563 of the same file.

See below for a potential fix:

                var pairingMessage = (receivedDeviceToken || _pairingApprovedAwaitingReconnect)
                    ? "Pairing approved!"
                    : "Node registration accepted";
                PairingStatusChanged?.Invoke(this, new PairingStatusEventArgs(
                    PairingStatus.Paired,
                    _deviceIdentity.DeviceId,
                    pairingMessage));

Copilot uses AI. Check for mistakes.
Comment on lines +1161 to +1175
var previousStatus = _lastNodePairingStatus;
_lastNodePairingStatus = args.Status;

try
{
if (args.Status == OpenClaw.Shared.PairingStatus.Pending)
{
AddRecentActivity("Node pairing pending", category: "node", dashboardPath: "nodes", nodeId: args.DeviceId);
// Show toast with approval instructions
new ToastContentBuilder()
.AddText(LocalizationHelper.GetString("Toast_PairingPending"))
.AddText(string.Format(LocalizationHelper.GetString("Toast_PairingPendingDetail"), args.DeviceId.Substring(0, 16)))
.Show();
if (previousStatus != OpenClaw.Shared.PairingStatus.Pending)
{
AddRecentActivity("Node pairing pending", category: "node", dashboardPath: "nodes", nodeId: args.DeviceId);
new ToastContentBuilder()
.AddText(LocalizationHelper.GetString("Toast_PairingPending"))
.AddText(string.Format(LocalizationHelper.GetString("Toast_PairingPendingDetail"), args.DeviceId.Substring(0, 16)))
.Show();
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pairing toast/activity suppression only keys off the previous status, not the device identity. If the device identity changes during the app session (e.g., device key regenerated) and the new device immediately enters Pending/Paired, the toast could be incorrectly suppressed because previousStatus matches. Track the last deviceId along with status (or reset _lastNodePairingStatus when args.DeviceId changes).

Copilot uses AI. Check for mistakes.
Comment on lines +189 to 195
while (!_disposed)
{
await Task.Delay(delay, _cts.Token);
var delay = BackoffMs[Math.Min(_reconnectAttempts, BackoffMs.Length - 1)];
_reconnectAttempts++;
_logger.Warn($"{ClientRole} reconnecting in {delay}ms (attempt {_reconnectAttempts})");
RaiseStatusChanged(ConnectionStatus.Connecting);

Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReconnectWithBackoffAsync now raises ConnectionStatus.Connecting on every retry iteration. In the tray app, OnNodeStatusChanged logs recent activity for every status change, so this can create noisy repeated "Node mode Connecting" entries while the gateway is down. Consider deduping/only emitting Connecting when transitioning from a non-connecting state, or moving the attempt counter into logs without raising a new status event each loop.

See below for a potential fix:

        // Only emit "Connecting" once when entering the reconnect loop to avoid noisy duplicates.
        RaiseStatusChanged(ConnectionStatus.Connecting);

        while (!_disposed)
        {
            var delay = BackoffMs[Math.Min(_reconnectAttempts, BackoffMs.Length - 1)];
            _reconnectAttempts++;
            _logger.Warn($"{ClientRole} reconnecting in {delay}ms (attempt {_reconnectAttempts})");

Copilot uses AI. Check for mistakes.
Comment on lines +620 to +647
if (string.Equals(errorCode, "NOT_PAIRED", StringComparison.OrdinalIgnoreCase))
{
if (_isPendingApproval)
{
return;
}

_isPendingApproval = true;
_isPaired = false;
_pairingApprovedAwaitingReconnect = false;

var detail = $"Device {ShortDeviceId} requires approval";
if (!string.IsNullOrWhiteSpace(pairingRequestId))
{
detail += $" (request {pairingRequestId})";
}

_logger.Info($"[NODE] Pairing required for this device; waiting for gateway approval. reason={pairingReason ?? "unknown"}, requestId={pairingRequestId ?? "none"}");
PairingStatusChanged?.Invoke(this, new PairingStatusEventArgs(
PairingStatus.Pending,
_deviceIdentity.DeviceId,
detail));
return;
}

_logger.Error($"Node registration failed: {error} (code: {errorCode})");
RaiseStatusChanged(ConnectionStatus.Error);
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new pairing state machine (handling NOT_PAIRED, node/device.pair.* events, and hello-ok without auth.deviceToken) isn’t covered by unit tests. There are existing WindowsNodeClientTests, but they only validate URL normalization; adding tests that feed representative JSON into ProcessMessageAsync/handlers would help prevent regressions in reconnect + pairing behavior.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Contributor

🤖 This is an automated response from Repo Assist.

Great work on this PR, @andyeskridge! The pairing and reconnect logic was one of the trickier areas to get right. Here's a review:


What's well done:

  • The ReconnectWithBackoffAsync loop is a solid fix — retrying until connected (or disposed) eliminates the missed-reconnect window that caused the original issue.
  • HandlePairingRequestedEvent / HandlePairingResolvedEventAsync correctly filter events to the current device via PayloadTargetsCurrentDevice, which is important since pairing events may arrive for other nodes on the same gateway.
  • Treating hello-ok as "paired" regardless of whether auth.deviceToken is returned correctly matches observed gateway behaviour (the issue your PR describes).
  • Deduplicating toast notifications via _lastNodePairingStatus is a good UX improvement.

A few things to consider:

  1. _lastNodePairingStatus not reset on service re-init. In OnSettingsSaved, the node service is torn down and a new one created. But _lastNodePairingStatus in App.xaml.cs isn't reset to null. If a user reconnects after changing settings, the first PairingStatus.Pending or PairingStatus.Paired event would be silently suppressed if it matches the previous state. Suggest resetting it to null in OnSettingsSaved before InitializeNodeService().

  2. Unconditional _isPaired = true on hello-ok. The current code always sets _isPaired = true when hello-ok arrives — even on a brand new unregistered device. In practice the gateway only sends hello-ok to approved nodes, so this should be fine, but it means there's no path back to _isPaired = false except through NOT_PAIRED or a rejected pairing. Worth a comment or assertion to document the assumption.

  3. _pairingApprovedAwaitingReconnect thread safety. This flag is written from HandlePairingResolvedEventAsync (called from the WS receive loop) and read/reset in HandleResponse (also on the receive loop) — so it's single-threaded and safe. Just noting it so reviewers don't worry.

  4. Minor: CloseWebSocketAsync() is called in HandlePairingResolvedEventAsync with await but the method is async void–like (called via HandleEventAsync). The close triggers reconnect correctly via the existing loop, but if CloseWebSocketAsync throws, the exception is observed by the event loop. This is fine as-is, just worth keeping in mind.

Overall, this looks like a correct and well-motivated fix. The approach of separating _isPaired from DeviceToken presence is the right design.

Generated by Repo Assist ·

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@cbb46ab386962aa371045839fc9998ee4e97ca64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants