Resubscribe to project changes after websocket reconnects#2174
Conversation
When the websocket times out and reconnects, such as after a laptop is put to sleep or a phone changes networks, HubConnection reconnects with a new connection ID. But the connection ID is what is used on the server side to determine who should receive project-change notifications. So whenever the HubConnection reconnects, it needs to resubscribe to the project-change notification events, which will send the server its new connection ID.
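The mechanics described above can be sketched in a few lines. This is an illustration only, not the PR's actual code (which also caches connections per server and handles multiple projects); the hub URL and the projectId variable are made up:

```csharp
// Illustrative sketch: re-register on every reconnect, because the server
// tracks subscribers by connection ID and a reconnect produces a new ID.
var projectId = Guid.NewGuid(); // stand-in for a real project ID

var connection = new HubConnectionBuilder()
    .WithUrl("https://lexbox.example/hub/project-changes") // hypothetical URL
    .WithAutomaticReconnect()
    .Build();

connection.Reconnected += async newConnectionId =>
{
    // newConnectionId differs from the pre-disconnect ID, so the server no
    // longer knows this client wants project-change notifications.
    await connection.SendAsync("ListenForProjectChanges", projectId);
};

await connection.StartAsync();
await connection.SendAsync("ListenForProjectChanges", projectId);
```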
📝 Walkthrough: Reworks SignalR hub contracts and reconnection: adds IProjectChangeHubServer/IProjectChangeHubClient, removes IProjectChangeListener, switches hub generic types, and adds per-connection subscription caching and an InfiniteRetryPolicy to re-send ListenForProjectChanges after reconnects.
Ideally we would be able to tell the Lexbox server "Hey, connection ID 123abc has gone away" and have the server remove that connection ID from the project-change notification group. Otherwise connection IDs will just pile up in the group, which could potentially lead to slower performance depending on how SignalR handles things (I didn't do a deep dive into SignalR to find out). But in HubConnection's Reconnected event handler, I could see no way to retrieve the old connection ID in order to tell the server "Hey, I used to be ID thus-and-such". So we register with the new connection ID and hope that SignalR does the sensible thing with old connection IDs.
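For what it's worth, if the hub tracks subscribers with SignalR groups (the PR touches backend/LexBoxApi/Hub/CrdtProjectChangeHub.cs, which suggests it does), stale IDs should not pile up: ASP.NET Core SignalR removes a connection from all of its groups when that connection disconnects, and group membership is not preserved across reconnects. A hub-side sketch, with the method body hypothetical:

```csharp
// Sketch of a group-based hub. SignalR cleans up group membership when a
// connection disconnects, so no explicit "unsubscribe old ID" call is needed.
public class CrdtProjectChangeHub : Hub
{
    public Task ListenForProjectChanges(Guid projectId)
    {
        // Subscribes the *current* connection ID; a reconnected client must
        // call this again because its connection ID has changed.
        return Groups.AddToGroupAsync(Context.ConnectionId, projectId.ToString());
    }
}
```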
Actionable comments posted: 2
🧹 Nitpick comments (1)
backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs (1)
221-224: Add logging and error handling inside the `Reconnected` handler. If `SendAsync` throws (e.g., due to a transient network error or cancellation before reconnect completes), the exception silently propagates to the SignalR event-dispatch loop. Adding a try/catch with structured logging makes reconnect failures visible without destabilising the connection.

♻️ Proposed improvement — log and swallow recoverable errors

- lexboxConnection.Reconnected += (newId) =>
- {
-     return lexboxConnection.SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken);
- };
+ lexboxConnection.Reconnected += async (newId) =>
+ {
+     try
+     {
+         logger.LogInformation("SignalR reconnected (connectionId={NewId}), re-subscribing to project {ProjectId}", newId, projectData.Id);
+         await lexboxConnection.SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken);
+     }
+     catch (Exception ex) when (ex is not OperationCanceledException)
+     {
+         logger.LogError(ex, "Failed to re-subscribe to project changes for {ProjectId} after reconnect", projectData.Id);
+     }
+ };

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs` around lines 221 - 224, Wrap the Reconnected handler body in an async lambda and add try/catch around the SendAsync call: change lexboxConnection.Reconnected += (newId) => { ... } to an async (newId) => { try { await lexboxConnection.SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken); } catch (OperationCanceledException) { /* optional debug log if needed and swallow */ } catch (Exception ex) { /* structured log including projectData.Id, newId and the exception */ } }, so failures are logged with context (projectData.Id, newId) and do not propagate to the SignalR event loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 220-224: Register the lexboxConnection.Reconnected handler before
calling lexboxConnection.SendAsync to avoid the race where a reconnect occurs
with no handler attached; move the subscription of the Reconnected event (the
lambda that calls lexboxConnection.SendAsync("ListenForProjectChanges",
projectData.Id, stoppingToken)) so it runs prior to the initial
lexboxConnection.SendAsync("ListenForProjectChanges", projectData.Id,
stoppingToken), keeping the same arguments (projectData.Id, stoppingToken) in
the handler.
- Around line 221-224: StartLexboxProjectChangeListener currently appends a new
lexboxConnection.Reconnected handler per call which causes duplicate
ListenForProjectChanges sends on reconnect; fix by registering a single
Reconnected handler per cached HubConnection and maintain a thread-safe set of
registered project IDs for that connection (e.g., ConcurrentDictionary/HashSet)
so that when adding a project you only add its ID to the set (not another
Reconnected delegate), and the single Reconnected handler iterates the set and
calls lexboxConnection.SendAsync("ListenForProjectChanges", projectId,
stoppingToken) once per unique projectId; ensure thread-safety when updating the
set and remove project IDs / dispose handlers when listeners stop.
---
Nitpick comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 221-224: Wrap the Reconnected handler body in an async lambda and
add try/catch around the SendAsync call: change lexboxConnection.Reconnected +=
(newId) => { ... } to an async (newId) => { try { await
lexboxConnection.SendAsync("ListenForProjectChanges", projectData.Id,
stoppingToken); } catch (OperationCanceledException) { /* optional debug log if
needed and swallow */ } catch (Exception ex) { /* structured log including
projectData.Id, newId and the exception */ } }, so failures are logged with
context (projectData.Id, newId) and do not propagate to the SignalR event loop.
@coderabbitai help
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs (1)
220-222: Lambda parameter `cache` shadows the class field `this.cache`. The `GetOrCreate` factory parameter is named `cache`, which shadows the `IMemoryCache` field. This doesn't cause a bug today since the field isn't accessed inside the factory, but it's a readability trap — especially if someone later adds logic inside the factory that intends to reference the `IMemoryCache`. Consider naming it `entry` to match the `ICacheEntry` type.

Naming fix

- var reconnectedHandler = cache.GetOrCreate<Func<string?, Task>>(HubReconnectionHandlerCacheKey(server, projectData), cache =>
+ var reconnectedHandler = cache.GetOrCreate<Func<string?, Task>>(HubReconnectionHandlerCacheKey(server, projectData), entry =>
  {
-     cache.SlidingExpiration = TimeSpan.FromHours(1);
+     entry.SlidingExpiration = TimeSpan.FromHours(1);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs` around lines 220 - 222, The lambda parameter named "cache" in the GetOrCreate call (where reconnectedHandler is created using GetOrCreate<Func<string?, Task>>(HubReconnectionHandlerCacheKey(server, projectData), cache => { ... })) shadows the class field this.cache; rename that lambda parameter to a nonconflicting name such as "entry" (or "cacheEntry") and update any internal references (e.g., cache.SlidingExpiration) to entry.SlidingExpiration so the factory uses the ICacheEntry parameter and avoids hiding the IMemoryCache field.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 220-231: The cached reconnection handler currently closes over the
local lexboxConnection, causing it to call SendAsync on a disposed
HubConnection; change the factory used by
cache.GetOrCreate(HubReconnectionHandlerCacheKey(...)) so the returned
Func<string?, Task> does not capture lexboxConnection but instead looks up the
current/active HubConnection at invocation time (e.g., resolve the connection
from the same connection cache by server/project inside the delegate) and calls
SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken) on that
resolved connection; also update the Closed event handler to evict any cached
reconnection handlers for that server (use HubReconnectionHandlerCacheKey or
whatever key-generation helper) when a connection is permanently closed so stale
handlers are removed.
---
Nitpick comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 220-222: The lambda parameter named "cache" in the GetOrCreate
call (where reconnectedHandler is created using GetOrCreate<Func<string?,
Task>>(HubReconnectionHandlerCacheKey(server, projectData), cache => { ... }))
shadows the class field this.cache; rename that lambda parameter to a
nonconflicting name such as "entry" (or "cacheEntry") and update any internal
references (e.g., cache.SlidingExpiration) to entry.SlidingExpiration so the
factory uses the ICacheEntry parameter and avoids hiding the IMemoryCache field.
CodeRabbit AI review pointed out a possible race condition I would have missed: when the HubConnection is closed, it is removed from the cache. If ListenForProjectChanges is called after that point, it will return a new HubConnection object. The handler, though, has a cache duration of 1 hour (and I should probably bump that up to 1 or more day(s)) so that we will be certain to unregister the same event handler that was previously registered. Which means that closing over lexboxConnection would be sure to catch the stale, now-closed connection. By fetching the connection from the cache, we can ensure that we get the fresh connection rather than the stale one.
@CodeRabbit review
✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs (1)
234: `stoppingToken` captured in handler silently drops resubscription if cancelled at reconnect time. `stoppingToken` is captured at factory-creation time. If the token is cancelled before the `Reconnected` event fires (e.g., the service/scope that originally called `ListenForProjectChanges` shuts down), `SendAsync` throws `OperationCanceledException`. Since `HubConnection.Reconnected` is a `Func<string?, Task>` event and SignalR silently swallows faulted handler tasks, the resubscription fails with no log entry. Consider using `CancellationToken.None` inside the handler (reconnect itself isn't bounded by the caller's lifetime) or at least logging the failure:

♻️ Proposed refactor

- return connection.SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken);
+ return connection.SendAsync("ListenForProjectChanges", projectData.Id, CancellationToken.None);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs` at line 234, The Reconnected handler captures the original stoppingToken causing SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken) to throw OperationCanceledException and silently drop resubscription; update the Reconnected handler in LexboxProjectService so it does not use the captured stoppingToken (use CancellationToken.None for the SendAsync call) or at minimum catch and log exceptions from SendAsync inside the HubConnection.Reconnected delegate (reference: stoppingToken, ListenForProjectChanges call, HubConnection.Reconnected, and SendAsync) to ensure failures are visible and resubscribe attempts aren't cancelled by the caller's lifetime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 225-233: The fallback logic in the handler that uses
lexboxConnection captured at factory-creation is unsafe and the comment about a
five-minute guard is incorrect; update the handler inside the cache retrieval
(the block that checks cache.TryGetValue(HubConnectionCacheKey(server), out
HubConnection? connection) || connection is null) to not call SendAsync on the
captured lexboxConnection fallback. Instead return Task.CompletedTask when no
valid cached connection is found; also remove or correct the misleading comment
and reference StartLexboxProjectChangeListener, cache.GetOrCreate and
lexboxConnection so the code clearly uses the cached connection when available
and otherwise safely no-ops.
---
Nitpick comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Line 234: The Reconnected handler captures the original stoppingToken causing
SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken) to throw
OperationCanceledException and silently drop resubscription; update the
Reconnected handler in LexboxProjectService so it does not use the captured
stoppingToken (use CancellationToken.None for the SendAsync call) or at minimum
catch and log exceptions from SendAsync inside the HubConnection.Reconnected
delegate (reference: stoppingToken, ListenForProjectChanges call,
HubConnection.Reconnected, and SendAsync) to ensure failures are visible and
resubscribe attempts aren't cancelled by the caller's lifetime.
@CodeRabbit full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs (1)
227-232: The "five minutes" in the comment is inaccurate — correct it. The `HubConnectionCacheKey` cache entry (created at line 297) has no sliding or absolute expiration set on it; it stays in the cache indefinitely until the `Closed` event explicitly calls `cache.Remove(HubConnectionCacheKey(server))`. There is no 5-minute timer, so the described race would not require "five minutes to pass" — the real reason it is vanishingly unlikely is simply that `StartLexboxProjectChangeListener` and the `GetOrCreate` on the next line execute sequentially within the same synchronous call chain.

✏️ Suggested comment correction

- // In theory this could have a race condition where it ends up closing over the old lexboxConnection value,
- // which might have been disposed and ListenForProjectChanges is being called with a new connection. But if that
- // has happened, it is vanishingly unlikely for the new connection to NOT be in the cache by the time we reach
- // this point, because that would require five minutes to pass between the StartLexboxProjectChangeListener call
- // and the cache.GetOrCreate call two lines later.
+ // In theory this could have a race condition where it ends up closing over the old lexboxConnection value,
+ // which might have been disposed and ListenForProjectChanges is being called with a new connection. But if that
+ // has happened, it is vanishingly unlikely for the new connection to NOT be in the cache by the time we reach
+ // this point, because StartLexboxProjectChangeListener and this GetOrCreate call execute sequentially in the
+ // same call chain, so the new connection is already in the cache by the time we get here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs` around lines 227 - 232, The inline comment incorrectly refers to a "five minutes" delay; update the comment near the assignment to connection = lexboxConnection to remove the five‑minute claim and instead state the actual reason the race is vanishingly unlikely: the HubConnectionCacheKey entry has no expiration (it is only removed by cache.Remove(HubConnectionCacheKey(server)) in the Closed handler), and StartLexboxProjectChangeListener / ListenForProjectChanges and the subsequent cache.GetOrCreate call run synchronously in the same call chain so the new connection will already be in the cache; reference the symbols StartLexboxProjectChangeListener, ListenForProjectChanges, HubConnectionCacheKey, cache.GetOrCreate, lexboxConnection and connection when editing the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 220-238: Add a null-check after calling
cache.GetOrCreate<Func<string?, Task>>(HubReconnectionHandlerCacheKey(server,
projectData)) to ensure reconnectedHandler is not null before
unsubscribing/subscribing to lexboxConnection.Reconnected; if the cached value
is null, call the factory (or create a non-null handler) and store it so that
reconnectedHandler is always a valid Func<string?, Task>, then use that non-null
reconnectedHandler for lexboxConnection.Reconnected -= and += so
ListenForProjectChanges is reliably re-sent on reconnect.
---
Nitpick comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 227-232: The inline comment incorrectly refers to a "five minutes"
delay; update the comment near the assignment to connection = lexboxConnection
to remove the five‑minute claim and instead state the actual reason the race is
vanishingly unlikely: the HubConnectionCacheKey entry has no expiration (it is
only removed by cache.Remove(HubConnectionCacheKey(server)) in the Closed
handler), and StartLexboxProjectChangeListener / ListenForProjectChanges and the
subsequent cache.GetOrCreate call run synchronously in the same call chain so
the new connection will already be in the cache; reference the symbols
StartLexboxProjectChangeListener, ListenForProjectChanges,
HubConnectionCacheKey, cache.GetOrCreate, lexboxConnection and connection when
editing the comment.
@CodeRabbit You're right; I was getting confused with the five-minute absolute expiration in GetLexboxProjects. The comment no longer applies at all, so I have removed it in commit 932ef4c.
@myieye - I have finished going through this code with CodeRabbit, and don't expect to make any more changes to it. Go ahead and take a look at it, it's in its final form (unless you request changes) so I won't be changing it while you review.
I think this is looking good. I expect this will add a lot more stability to our auto-syncing 👍
I went over this a bunch with Devin AI. Here's what I came up with:
1) There's an old bug: cache.CreateEntry(...) only commits the cache entry if you dispose the returned ICacheEntry (e.g. with using I suppose). So we actually never cache connections. That might be one reason you were seeing different connection IDs. But I suspect we still need the Reconnected handler.
2) As it is we're wiring up 1 reconnected handler per project and we never know when to remove them from the cache. What if we use a ConditionalWeakTable keyed on the HubConnection instead? One handler per connection, automatic cleanup when the connection is GC'd, no expiration to worry about. Code something like this:
private static readonly ConditionalWeakTable<HubConnection, HashSet<Guid>> _reconnectProjects = new();

public async Task ListenForProjectChanges(ProjectData projectData, CancellationToken stoppingToken)
{
    if (!options.Value.TryGetServer(projectData, out var server)) return;
    var lexboxConnection = await StartLexboxProjectChangeListener(server, stoppingToken);
    if (lexboxConnection is null) return;
    var subscribedProjects = _reconnectProjects.GetValue(lexboxConnection, static conn =>
    {
        var projects = new HashSet<Guid>();
        conn.Reconnected += async _ =>
        {
            Guid[] projectIds;
            lock (projects)
            {
                projectIds = [.. projects];
            }
            foreach (var projectId in projectIds)
            {
                await conn.SendAsync("ListenForProjectChanges", projectId);
            }
        };
        return projects;
    });
    lock (subscribedProjects)
    {
        subscribedProjects.Add(projectData.Id);
    }
    await lexboxConnection.SendAsync("ListenForProjectChanges", projectData.Id, stoppingToken);
}

This also means HubReconnectionHandlerCacheKey can be deleted.
3) How about we add a more persistent retry policy? The default WithAutomaticReconnect() only retries 4 times over ~42 seconds. If a user closes their laptop for a few minutes, all retries are exhausted, Closed fires, the connection is disposed, and nothing re-establishes it. That might have been the more common cause of #2149 than a brief reconnect. Something like this:
.WithAutomaticReconnect(new InfiniteRetryPolicy())

private class InfiniteRetryPolicy : IRetryPolicy
{
    public TimeSpan? GetNextRetryDelay(RetryContext retryContext)
    {
        return retryContext.PreviousRetryCount switch
        {
            0 => TimeSpan.Zero,
            1 => TimeSpan.FromSeconds(2),
            2 => TimeSpan.FromSeconds(10),
            3 => TimeSpan.FromSeconds(30),
            _ => TimeSpan.FromSeconds(60),
        };
    }
}
That is a strange API: I was actually skeptical of whether the AI had hallucinated that, because it seems a bizarre way to implement this. Disposing the object commits it? But sure enough, https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.icacheentry?view=net-10.0-pp says that ICacheEntry "Represents an entry in the IMemoryCache implementation. When Disposed, is committed to the cache." I suppose the idea is that you take the just-created cache entry and do things to it (setting the expiration time, for example), and then when you're done you exit the using block, which disposes the entry and commits it to the cache.
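To make that dispose-to-commit behavior concrete, here is a sketch of the pattern; `cache` is assumed to be an IMemoryCache, and the key and value names are made up:

```csharp
// ICacheEntry is committed to the cache only when it is disposed, so the
// idiomatic shape is a using block; the GetOrCreate extension method wraps
// this same pattern and disposes the entry for you.
using (var entry = cache.CreateEntry("lexbox-connection")) // hypothetical key
{
    entry.SlidingExpiration = TimeSpan.FromHours(1);
    entry.Value = lexboxConnection;
} // Dispose() runs here, committing the entry to the cache
```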
Sounds like a good plan. I never remember the existence of ConditionalWeakTable and similar. I'll do that too.
Yes, closing the laptop lid for hours at a time is going to be a common scenario. And when the laptop lid opens, it's likely to be on a different WiFi network with a different IP address, so that's something we should test too. I'll do an infinite retry policy as suggested, especially since as per 2) we'll have a persistent Reconnect handler.
That depends very much on how the retry system is set up, actually. If the laptop is suspended, then the retry code isn't running. So on second thought, I don't think this is the issue we thought it was. When the laptop gets opened up, that's the moment when the system will notice the connection dropped, try to reconnect, and start the timer on the retry policy. I'd have to test this, of course, but I suspect the laptop could be closed for an hour and the default policy would still work: try again after 2 seconds, then after 10, then after 30, then give up, all of it with timers started at the moment of unsuspending the laptop — as long as the user's laptop connects to WiFi within 42 seconds, the policy would succeed.
I'm rethinking whether a ConditionalWeakTable keyed on the HubConnection is the right approach. I'm slightly hampered by not having a deep understanding of when we might end up getting a new HubConnection object back instead of the existing one. But if we get a new object back and don't have the existing one anymore, wouldn't that mean that we would lose the list of connected projects entirely? The Reconnected handler closes over the HashSet, but I think it might be closing over the wrong HashSet. Let me think it through.

Scenario: Start up FW Lite, connect to two projects, A and B. Then something happens to the HubConnection and it gets torn down and a new one is established. (E.g. if, when it tries to reconnect, it fails to get through for 45 seconds, enough to exhaust the default retry policy). I'll walk through this step by step.

Step 1. FW Lite starts up. HubConnection 1 established. Connect to project A. State of the weak table: HubConnection 1 has a reference to HashSet 0x123, which contains Guid A. HubConnection 1 now gets a Reconnect handler, which closes over HashSet 0x123 in its closure.

Step 2. Connect to project B. ListenForProjectChanges gets called again, retrieves HubConnection 1 from the memory cache. The weak table already has HashSet 0x123 registered for HubConnection 1, so Guid B is added to it and no new handler is attached.

Step 3. HubConnection 1 disconnected, tries to reconnect, successfully reconnects after 20 seconds. The Reconnect handler is called. It closed over HashSet 0x123, so it will retrieve A and B and successfully ask the server to receive project events again on the new connection ID. All good so far.

Step 4. HubConnection 1 disconnected again, tries to reconnect, can't get through at all. Eventually gives up, Closed fires, and HubConnection 1 is booted out of the memory cache. Its Reconnected handler is never fired, because it never succeeded in reconnecting. Now HubConnection 1 gets disposed, and soon no strong references to it exist, so it is booted out of the ConditionalWeakTable.
Its Reconnected handler, which had closed over HashSet 0x123, is presumably disposed along with HubConnection 1. As far as I can see, nothing reconnects projects A and B here.

Step 5. User tries to connect to a new project, C. ListenForProjectChanges is called. It doesn't find a HubConnection in the cache, so it creates new HubConnection 2. It then registers HubConnection 2 in the ConditionalWeakTable. The CreateValue callback (GetValue's second param) is called. It creates a new HashSet, 0x456, and registers it with HubConnection 2. It has not found HashSet 0x123, and there's no way for it to find that hash set because HubConnection 1 is gone. The list of project GUIDs in HashSet 0x456 now contains only project C, and projects A and B are nowhere to be found.
As far as I can tell, thinking through this scenario, we want a list of projects keyed to the server, not the HubConnection object, if we want to be able to reconnect them when a new HubConnection ends up being created. |
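A sketch of what server-keyed tracking could look like; every name here is hypothetical, not the PR's actual code:

```csharp
// Desired subscriptions are keyed by server, so they survive any individual
// HubConnection being torn down and replaced with a new object.
private readonly ConcurrentDictionary<string, ConcurrentDictionary<Guid, byte>> _desiredProjects = new();

private void RememberProject(string serverId, Guid projectId) =>
    _desiredProjects.GetOrAdd(serverId, _ => new ConcurrentDictionary<Guid, byte>()).TryAdd(projectId, 0);

// Called whenever a connection for serverId (re)connects, whether it is the
// same HubConnection reconnecting or a brand-new replacement.
private async Task ResubscribeAllAsync(HubConnection connection, string serverId)
{
    if (!_desiredProjects.TryGetValue(serverId, out var projects)) return;
    foreach (var projectId in projects.Keys)
    {
        await connection.SendAsync("ListenForProjectChanges", projectId);
    }
}
```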
IMemoryCache.CreateEntry() returns an IDisposable that must be disposed for the cache entry to actually be created.
This allows us to resubscribe to all projects in a single handler, so we never need to do the "Event -= Handler; Event += Handler" dance.
@rmunn I understand better now what you mean. We're not so interested in the connection<->project relationship, but rather the question: "What projects does the user currently want to listen to and what server are they on?" As it is, our ideas kind of side-step that question. Our options:
I'd say (4) is out of reach. That leaves us with (1) and (2). I like the idea of an infinite retry policy. I've had days when my network adapter dropped out for at least a minute several times. I expect some of our users lose internet for much longer periods of time. I think it would be great if we could expect our app to recover from longer interruptions. The idea that "it just keeps trying" is both less confusing and more satisfying than "it'll retry for approx X minutes".
I've tested this (up through commit 4be2ae6, I had not yet pulled your commit when I tested) and it reconnected after suspending the laptop for 2 minutes. So it's not "laptop was suspended for 1 hour" that we need to guard against, it's "internet was disconnected for over 1 minute while laptop was running". Which, as you mentioned, is definitely a scenario people will run into. So yes, infinite retry should go in as well. But I'm satisfied that suspending the laptop for over 42 seconds is not going to kill the HubConnection.
Default retry policy is four reconnect attempts: immediately, then after 2, 10, and 30 seconds. But if a network disconnect lasts for more than about 42 seconds, it gives up and stops trying to reconnect. We will instead try to reconnect every minute, without ever stopping, because our users are very likely to encounter network hiccups that last several minutes, even several hours. When the network does come back, we want to reconnect no matter how long it took to come back, so use an infinite retry policy.
Infinite retry implemented in commit 8b51eae.
@CodeRabbit review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs (1)
310-324: Double `DisposeAsync` on `HubConnection` — `Closed` handler and post-eviction callback both dispose. When `Closed` fires, line 312 calls `cache.Remove(...)` which triggers the post-eviction callback (line 322: `con.DisposeAsync()`). Then line 313 immediately calls `connection.DisposeAsync()` again on the same object. While `DisposeAsync` is likely idempotent, this is redundant. Consider removing the explicit dispose from the `Closed` handler and relying solely on the post-eviction callback, or vice versa:

♻️ Option: remove the redundant dispose from the Closed handler

  connection.Closed += async (exception) =>
  {
      cache.Remove(HubConnectionCacheKey(server));
-     await connection.DisposeAsync();
  };

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs` around lines 310 - 324, The Closed event handler and the cache post-eviction callback both call DisposeAsync on the same HubConnection, causing a redundant double-dispose; remove the explicit await connection.DisposeAsync() from the connection.Closed handler and let cache.Remove(HubConnectionCacheKey(server)) trigger the RegisterPostEvictionCallback (which calls con.DisposeAsync()) to perform disposal, ensuring disposal is done exactly once and keeping the cache entry creation (using var __ = cache.CreateEntry(...).RegisterPostEvictionCallback(...)) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs`:
- Around line 216-244: The Reconnected handler attached in
ListenForProjectChanges via _reconnectProjects (inside the GetValue factory for
lexboxConnection) can abort resubscription if one conn.SendAsync throws; modify
the conn.Reconnected async handler to wrap each await
conn.SendAsync(nameof(IProjectChangeHubServer.ListenForProjectChanges),
projectId) call in a try-catch that logs the error and continues so all
projectIds are attempted; ensure the catch does not rethrow and consider logging
with context (connection and projectId) to aid diagnostics while preserving the
existing locking and project enumeration logic.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs
- backend/FwLite/MiniLcm/Push/IProjectChangeHub.cs
- backend/FwLite/MiniLcm/Push/IProjectChangeListener.cs
- backend/LexBoxApi/Controllers/CrdtController.cs
- backend/LexBoxApi/Hub/CrdtProjectChangeHub.cs
💤 Files with no reviewable changes (1)
- backend/FwLite/MiniLcm/Push/IProjectChangeListener.cs
If one project fails to reconnect and throws an exception, we want to swallow it so the rest of the projects aren't blocked from reconnecting.
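A minimal sketch of that try/catch-per-project shape inside the `Reconnected` handler. The `SendAsync` call and `IProjectChangeHubServer.ListenForProjectChanges` come from the review comments above; the `_lock`, `_projectIds`, and `logger` names are assumptions for illustration, not the actual field names in `LexboxProjectService`:

```csharp
// Sketch only: resubscribe every project after a reconnect, swallowing
// per-project failures so one bad project can't block the rest.
conn.Reconnected += async (newConnectionId) =>
{
    Guid[] projects;
    lock (_lock)
    {
        // Snapshot under the lock so we don't enumerate while it mutates.
        projects = _projectIds.ToArray();
    }
    foreach (var projectId in projects)
    {
        try
        {
            await conn.SendAsync(nameof(IProjectChangeHubServer.ListenForProjectChanges), projectId);
        }
        catch (Exception ex)
        {
            // Log and keep going; the remaining projects still get resubscribed.
            logger.LogError(ex, "Failed to resubscribe project {ProjectId} after reconnect", projectId);
        }
    }
};
```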
@CodeRabbit full review

✅ Actions performed

Full review triggered.
🧹 Nitpick comments (1)
backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs (1)
**317-331**: `DisposeAsync` may be called twice on the same `HubConnection`.

The `Closed` handler (line 320) calls `connection.DisposeAsync()`, but `cache.Remove` on line 319 triggers the `PostEvictionCallback` (lines 325-330), which also calls `DisposeAsync()`. `HubConnection.DisposeAsync()` should be idempotent, so this is unlikely to cause issues, but you could remove the explicit dispose in the `Closed` handler since the post-eviction callback already handles it.

Suggested simplification:

```diff
 connection.Closed += async (exception) =>
 {
     cache.Remove(HubConnectionCacheKey(server));
-    await connection.DisposeAsync();
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs` around lines 317 - 331, The Closed handler currently awaits connection.DisposeAsync() while cache.Remove(HubConnectionCacheKey(server)) will invoke the CreateEntry().RegisterPostEvictionCallback which already calls DisposeAsync; remove the explicit await connection.DisposeAsync() from the connection.Closed async delegate (but keep cache.Remove and the PostEvictionCallback intact) so DisposeAsync is only invoked via the PostEvictionCallback registered on the cache entry; ensure the Closed handler still unregisters/cleans up the cache entry and does not swallow exceptions.
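The single-disposal pattern the nitpick is asking for can be sketched as follows. `cache`, `connection`, `server`, and `HubConnectionCacheKey` follow the review text; the exact wiring is an assumption, not the code as merged:

```csharp
// Sketch: let the cache's eviction callback own disposal. Removing the
// entry (whether from the Closed handler or from normal eviction)
// disposes the connection exactly once.
using (var entry = cache.CreateEntry(HubConnectionCacheKey(server)))
{
    entry.Value = connection;
    entry.RegisterPostEvictionCallback(static (_, value, _, _) =>
    {
        // Eviction callbacks are synchronous, so disposal is fire-and-forget.
        if (value is HubConnection con) _ = con.DisposeAsync();
    });
}

connection.Closed += (exception) =>
{
    // Triggers the post-eviction callback above, which disposes the connection.
    cache.Remove(HubConnectionCacheKey(server));
    return Task.CompletedTask;
};
```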
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/FwLite/FwLiteShared/Projects/LexboxProjectService.cs
- backend/FwLite/MiniLcm/Push/IProjectChangeHub.cs
- backend/FwLite/MiniLcm/Push/IProjectChangeListener.cs
- backend/LexBoxApi/Controllers/CrdtController.cs
- backend/LexBoxApi/Hub/CrdtProjectChangeHub.cs
💤 Files with no reviewable changes (1)
- backend/FwLite/MiniLcm/Push/IProjectChangeListener.cs
@myieye - Having thought through your four-item list a bit, I think I'm okay with the following logic (which exists in the PR as it is right now, meaning that as far as I'm concerned, it's ready to merge):
I don't think very many people will load so many projects that they will have performance issues (only consultants are likely to do that, and even consultants probably won't have 20-25 projects active at the same time), so making them do a close-and-reload step isn't a huge demand, I think.
Fixes #2149.
When the websocket times out and reconnects, such as after a laptop is put to sleep or a phone changes networks, HubConnection reconnects with a new connection ID. But the connection ID is what is used on the server side to determine who should receive project-change notifications. So whenever the HubConnection reconnects, it needs to resubscribe to the project-change notification events, which will send the server its new connection ID.
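The flow described above can be sketched with the SignalR .NET client. `InfiniteRetryPolicy` and `ListenForProjectChanges` are named elsewhere in this PR; the hub URL and the `subscribedProjectIds` collection are illustrative assumptions:

```csharp
// Sketch: a HubConnection gets a fresh ConnectionId after each reconnect,
// so the Reconnected event re-sends every subscription, handing the
// server the new connection ID.
var connection = new HubConnectionBuilder()
    .WithUrl("https://example.org/api/hub/crdt-project-changes") // illustrative URL
    .WithAutomaticReconnect(new InfiniteRetryPolicy())
    .Build();

connection.Reconnected += async (newConnectionId) =>
{
    // The old connection ID is gone; the server only knows the new one,
    // so every project subscription must be registered again.
    foreach (var projectId in subscribedProjectIds)
    {
        await connection.SendAsync(nameof(IProjectChangeHubServer.ListenForProjectChanges), projectId);
    }
};
```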