[Feat] Wait for connections to establish before callback, retry on fail#1129
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough
Arr, this PR defers calling the `onConnect()` callback until the connection is likely established, and adds configurable retries fer 0x3e connection failures.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Client as NimBLEClient
    participant GAP as BLE GAP
    participant Timer as Timer
    participant CB as onConnect Callback
    App->>Client: connect(address)
    Client->>Client: reset pending & retry state
    Client->>GAP: startConnectionAttempt(address)
    GAP-->>Client: GAP_CONNECT (success)
    Client->>Client: set m_connectCallbackPending
    Client->>Timer: startConnectEstablishedTimer(delay based on conn_itvl)
    Note over GAP,Client: GAP events (MTU, conn update, ENC, ID, PHY) may call completeConnectEstablished() early
    Timer-->>Client: timer fires
    Client->>Client: completeConnectEstablished() (stop timer, clear pending)
    Client->>CB: invoke onConnect()
    CB-->>App: user notified
```
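The race in the diagram above, between the timer firing and an early GAP event, hinges on `completeConnectEstablished()` being idempotent. A stand-in model (the types and counter here are illustrative, not the PR's actual code) shows the intended exactly-once behavior:

```cpp
#include <cassert>

// Model of the deferred-callback handshake: the timer and any early GAP event
// both funnel into completeConnectEstablished(), which must fire onConnect once.
struct ConnModel {
    bool pending          = false; // stands in for m_connectCallbackPending
    int  connectCallbacks = 0;     // counts onConnect() invocations

    bool completeConnectEstablished() {
        if (!pending) return false; // already completed (timer vs. early event race)
        pending = false;
        connectCallbacks++;         // invoke onConnect() exactly once
        return true;
    }
};
```

Whichever side loses the race simply sees `pending == false` and returns without touching the callback.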
```mermaid
sequenceDiagram
    participant Client as NimBLEClient
    participant GAP as BLE GAP
    participant Retry as Retry Logic
    participant CB as Callback
    Client->>GAP: startConnectionAttempt()
    GAP-->>Client: GAP_DISCONNECT (reason 0x3e)
    Client->>Retry: check retry count < connectFailRetries
    alt Retries remaining
        Retry->>Client: increment retry count
        Client->>GAP: startConnectionAttempt() (retry)
    else Final retry exhausted
        Client->>Client: clear pending, stop timer
        Client->>CB: invoke connect-fail/disconnect callbacks
    end
```
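The branch in the retry diagram reduces to a single predicate. A hedged stand-in (the function and constant names are illustrative, not the PR's symbols):

```cpp
#include <cassert>
#include <cstdint>

// BLE reason 0x3e: connection failed to be established (illustrative constant name).
constexpr uint8_t kConnEstablishFail = 0x3e;

// Decide whether a failed connection attempt should be retried silently.
// Returns true when the failure reason is 0x3e and retries remain; the caller
// would then increment retryCount and restart the attempt without invoking
// user callbacks. Any other reason, or an exhausted budget, is terminal.
bool shouldRetryConnect(uint8_t reason, uint8_t retryCount, uint8_t maxRetries) {
    return reason == kConnEstablishFail && retryCount < maxRetries;
}
```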
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/NimBLEClient.cpp (1)
296-307: ⚠️ Potential issue | 🟠 Major
Arrr, the blocking wait budget undercounts retry attempts.
Each 0x3e retry starts a fresh `ble_gap_connect(..., m_connectTimeout, ...)`, but this wait only budgets one connect timeout. A late first failure can leave the next attempt with almost no time before the local timeout path cancels it, and the final 7×interval settle window also be missing when retries are disabled.
⚙️ Possible fix
```diff
-    // Wait for the connect timeout time +1 second +retry time for the connection to complete
-    if (!NimBLEUtils::taskWait(
-            taskData,
-            m_connectTimeout + 1000 + connIntervalToMs(m_connParams.itvl_max) * 7 * m_config.connectFailRetries)) {
+    const uint32_t settleMs   = connIntervalToMs(m_connParams.itvl_max) * 7U;
+    const uint32_t attempts   = static_cast<uint32_t>(m_config.connectFailRetries) + 1U;
+    const uint32_t waitBudget = attempts * static_cast<uint32_t>(m_connectTimeout) + settleMs + 1000U;
+    if (!NimBLEUtils::taskWait(taskData, waitBudget)) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/NimBLEClient.cpp` around lines 296 - 307, The wait timeout undercounts multiple connection attempts: update the taskWait calculation to account for all connect retries and their per-attempt time plus the final 7× interval settle window; specifically, when computing the timeout passed to NimBLEUtils::taskWait replace the single m_connectTimeout and the 7× interval component with values multiplied by (1 + m_config.connectFailRetries) (or otherwise sum time for each retry), so symbols to change include taskWait call, m_connectTimeout, connIntervalToMs(m_connParams.itvl_max) * 7 * m_config.connectFailRetries, and ensure ble_gap_connect retry behavior and BLE_HS_ETIMEOUT cancellation path still work correctly (i.e., budget = (m_connectTimeout * (1 + m_config.connectFailRetries)) + connIntervalToMs(m_connParams.itvl_max) * 7 * (1 + m_config.connectFailRetries) ).
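To sanity-check the budget the prompt above describes, the arithmetic can be run as a standalone model (the function name and the sample numbers below are illustrative, not the library's API):

```cpp
#include <cassert>
#include <cstdint>

// Suggested wait budget: every attempt (initial try + retries) gets a full
// connect timeout, plus one 7x-interval settle window and a 1 s safety margin.
// Mirrors the reviewer's proposed formula; not the code as merged.
uint32_t connectWaitBudgetMs(uint32_t connectTimeoutMs, uint32_t connItvlMs, uint8_t retries) {
    const uint32_t attempts = static_cast<uint32_t>(retries) + 1U; // the initial attempt counts too
    const uint32_t settleMs = connItvlMs * 7U;                     // one settle window, not one per retry
    return attempts * connectTimeoutMs + settleMs + 1000U;
}
```

With retries disabled the budget still covers one full attempt plus the settle window, which is the case the review flags as missing in the original expression.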
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/NimBLEClient.cpp`:
- Around line 966-979: completeConnectEstablished() calls user code
(m_pClientCallbacks->onConnect) while still relying on the instance field
m_pTaskData for the eventual NimBLEUtils::taskRelease, which allows the callback
to clobber m_pTaskData and leak the original wait; fix by capturing m_pTaskData
into a local variable (e.g., auto* savedTask = m_pTaskData), nulling m_pTaskData
before invoking m_pClientCallbacks->onConnect(this) (so nested synchronous calls
can't overwrite it), and after the callback return use
NimBLEUtils::taskRelease(*savedTask, 0) only if savedTask is non-null; keep
existing behavior for m_connectCallbackPending and
ble_npl_callout_stop(&m_connectEstablishedTimer).
In `@src/NimBLEClient.h`:
- Around line 105-112: The bitfield default-member-initializer syntax used in
struct Config (e.g., uint8_t deleteCallbacks : 1 {0};) requires C++20 and can
break builds on toolchains using C++11/17; remove the brace-initializers from
the bitfield declarations in struct Config and add an explicit Config()
constructor that initializes each bitfield (deleteCallbacks, deleteOnDisconnect,
deleteOnConnectFail, asyncConnect, exchangeMTU, connectFailRetries) to the
intended defaults (0 for flags, 2 for connectFailRetries), ensuring no
C++20-only syntax remains so it compiles on older Arduino/espressif toolchains.
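The C++11-compatible shape the prompt above asks for might look like this. This is a sketch: the field names follow the review comment, but the bit widths and the actual `Config` struct in `NimBLEClient.h` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Bitfields get their defaults from an explicit constructor instead of
// brace default-member-initializers (`uint8_t f : 1 {0}` requires C++20),
// so this compiles on older Arduino/espressif toolchains.
struct Config {
    uint8_t deleteCallbacks     : 1;
    uint8_t deleteOnDisconnect  : 1;
    uint8_t deleteOnConnectFail : 1;
    uint8_t asyncConnect        : 1;
    uint8_t exchangeMTU         : 1;
    uint8_t connectFailRetries  : 2; // illustrative width; holds the default of 2

    Config()
        : deleteCallbacks(0),
          deleteOnDisconnect(0),
          deleteOnConnectFail(0),
          asyncConnect(0),
          exchangeMTU(0),
          connectFailRetries(2) {} // default: 2 silent retries on 0x3e
};
```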
---
Outside diff comments:
In `@src/NimBLEClient.cpp`:
- Around line 296-307: The wait timeout undercounts multiple connection
attempts: update the taskWait calculation to account for all connect retries and
their per-attempt time plus the final 7× interval settle window; specifically,
when computing the timeout passed to NimBLEUtils::taskWait replace the single
m_connectTimeout and the 7× interval component with values multiplied by (1 +
m_config.connectFailRetries) (or otherwise sum time for each retry), so symbols
to change include taskWait call, m_connectTimeout,
connIntervalToMs(m_connParams.itvl_max) * 7 * m_config.connectFailRetries, and
ensure ble_gap_connect retry behavior and BLE_HS_ETIMEOUT cancellation path
still work correctly (i.e., budget = (m_connectTimeout * (1 +
m_config.connectFailRetries)) + connIntervalToMs(m_connParams.itvl_max) * 7 * (1
+ m_config.connectFailRetries) ).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a74a8d9-fb0c-41b0-a629-c6691b87cac7
📒 Files selected for processing (2)
- src/NimBLEClient.cpp
- src/NimBLEClient.h
♻️ Duplicate comments (1)
src/NimBLEClient.cpp (1)
966-981: ⚠️ Potential issue | 🔴 Critical
Avast! The task data pointer still be vulnerable to hijackin', ye landlubber!
The code saves `pTaskData` before the callback (good!), but it doesn't clear `m_pTaskData` beforehand. If another GAP event fires while `onConnect()` be runnin' (e.g., encryption change initiated by the peer), that event handler will read `m_pTaskData`, which still points to the original task data, potentially causin' a double-release or early unblock.
🔒 Apply the fix from the previous review
```diff
 bool NimBLEClient::completeConnectEstablished() {
     if (!m_connectCallbackPending) {
         return false;
     }

     m_connectCallbackPending = false;
     ble_npl_callout_stop(&m_connectEstablishedTimer);
-    auto pTaskData = m_pTaskData; // save a copy in case something in the callback changes it
+    auto pTaskData = m_pTaskData;
+    m_pTaskData    = nullptr; // clear before callback to prevent other handlers from releasing
     m_pClientCallbacks->onConnect(this);
     if (pTaskData != nullptr) {
         NimBLEUtils::taskRelease(*pTaskData, 0);
     }
     return true;
 } // completeConnectEstablished
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/NimBLEClient.cpp` around lines 966 - 981, completeConnectEstablished saves pTaskData but leaves m_pTaskData pointing at it, so concurrent GAP events could see and reuse/free the same pointer; fix by clearing m_pTaskData before invoking the callback: after setting m_connectCallbackPending = false and stopping m_connectEstablishedTimer, move m_pTaskData into a local (pTaskData) and set m_pTaskData = nullptr, then call m_pClientCallbacks->onConnect(this), and finally if (pTaskData) NimBLEUtils::taskRelease(*pTaskData, 0); this ensures completeConnectEstablished, m_pTaskData, pTaskData, m_pClientCallbacks->onConnect, and NimBLEUtils::taskRelease are used safely.
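Why clearing before the callback matters can be demonstrated with stand-in types; `TaskData`, `ClientModel`, and the release counter below are illustrative models, not NimBLE symbols:

```cpp
#include <cassert>

// Stand-in model of the capture-and-clear pattern: move the shared pointer
// into a local and null the member before running user code, so a reentrant
// handler cannot see (and release) the same task data again.
struct TaskData { int released = 0; };

struct ClientModel {
    TaskData* pTaskData = nullptr;
    void (*onConnect)(ClientModel&) = nullptr; // user callback slot

    void completeConnectEstablished() {
        TaskData* saved = pTaskData;
        pTaskData = nullptr;           // reentrant handlers now see no pending task
        if (onConnect) onConnect(*this);
        if (saved) saved->released++;  // release exactly once, after user code returns
    }
};
```

Without the `pTaskData = nullptr` line, the reentrant handler in the test below would bump the counter a second time, the double-release the review warns about.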
🧹 Nitpick comments (1)
src/NimBLEClient.cpp (1)
296-299: The timeout calculation be interestin', but may warrant a second look, ye salty sea dog.
The formula adds `connIntervalToMs(itvl_max) * 7 * connectFailRetries`, which multiplies the connection-established wait time by the number of retries. However, ye only need ONE established wait at the end (after a successful connection), not one per retry attempt. On the other hand, the formula doesn't account fer the full `m_connectTimeout` fer each retry attempt. In practice, 0x3e failures happen quickly (within a few connection events), so this may not cause actual timeouts, but the math be a wee bit off fer worst-case scenarios.
🔧 Mathematically correct formula (optional)
```diff
-    if (!NimBLEUtils::taskWait(
-            taskData,
-            m_connectTimeout + 1000 + connIntervalToMs(m_connParams.itvl_max) * 7 * m_config.connectFailRetries)) {
+    // Timeout: base timeout per attempt * (1 + retries) + established wait + buffer
+    uint32_t totalTimeout = m_connectTimeout * (1 + m_config.connectFailRetries) +
+                            connIntervalToMs(m_connParams.itvl_max) * 7 + 1000;
+    if (!NimBLEUtils::taskWait(taskData, totalTimeout)) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/NimBLEClient.cpp` around lines 296 - 299, The current timeout passed to NimBLEUtils::taskWait incorrectly multiplies the connection-established wait (connIntervalToMs(m_connParams.itvl_max) * 7) by m_config.connectFailRetries and doesn't include m_connectTimeout for each retry; update the expression used in the NimBLEUtils::taskWait call (using taskData, m_connectTimeout, connIntervalToMs, m_connParams.itvl_max and m_config.connectFailRetries) to sum retry time correctly—for example sum m_connectTimeout for each attempt (m_connectTimeout * (1 + m_config.connectFailRetries)), add the extra 1000ms, and add a single connection-established wait (connIntervalToMs(m_connParams.itvl_max) * 7) instead of multiplying that by the retry count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/NimBLEClient.cpp`:
- Around line 966-981: completeConnectEstablished saves pTaskData but leaves
m_pTaskData pointing at it, so concurrent GAP events could see and reuse/free
the same pointer; fix by clearing m_pTaskData before invoking the callback:
after setting m_connectCallbackPending = false and stopping
m_connectEstablishedTimer, move m_pTaskData into a local (pTaskData) and set
m_pTaskData = nullptr, then call m_pClientCallbacks->onConnect(this), and
finally if (pTaskData) NimBLEUtils::taskRelease(*pTaskData, 0); this ensures
completeConnectEstablished, m_pTaskData, pTaskData,
m_pClientCallbacks->onConnect, and NimBLEUtils::taskRelease are used safely.
---
Nitpick comments:
In `@src/NimBLEClient.cpp`:
- Around line 296-299: The current timeout passed to NimBLEUtils::taskWait
incorrectly multiplies the connection-established wait
(connIntervalToMs(m_connParams.itvl_max) * 7) by m_config.connectFailRetries and
doesn't include m_connectTimeout for each retry; update the expression used in
the NimBLEUtils::taskWait call (using taskData, m_connectTimeout,
connIntervalToMs, m_connParams.itvl_max and m_config.connectFailRetries) to sum
retry time correctly—for example sum m_connectTimeout for each attempt
(m_connectTimeout * (1 + m_config.connectFailRetries)), add the extra 1000ms,
and add a single connection-established wait
(connIntervalToMs(m_connParams.itvl_max) * 7) instead of multiplying that by the
retry count.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 151d7889-21f6-4c6f-83e1-e1eed5816e57
📒 Files selected for processing (2)
- src/NimBLEClient.cpp
- src/NimBLEClient.h
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/NimBLEClient.cpp (1)
277-278: Shiver me timbers! Connection retries be a silent behavioral change fer existing crews, arr.
With `connectFailRetries` defaultin' to 2 in the Config struct, existing applications will automatically get retry behavior on 0x3e connection establishment failures—settin' sail fer up to 2 automatic retry attempts without the skipper liftin' a finger. While this be generally a grand improvement fer connection robustness, some applications might not expect this silent retry—especially those with strict timing requirements or that handle reconnection at a higher level. Recommend documentin' this behavioral change prominently in the release notes or CHANGELOG, so captains upgradin' their ships know what waves be comin'. They can set `setConnectRetries(0)` to restore the previous immediate-failure behavior if needed, yarrr.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/NimBLEClient.cpp` around lines 277 - 278, Changing the default connectFailRetries in the Config struct to 2 introduces automatic retry behavior on 0x3e connection failures which is a behavioral change; update the project release notes/CHANGELOG to call this out and document how to revert to prior behavior (e.g., use setConnectRetries(0) or set connectFailRetries to 0 in Config), and mention the related symbols (Config::connectFailRetries, NimBLEClient::setConnectRetries, and member m_connectFailRetryCount) so users know where to adjust the setting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/NimBLEClient.cpp`:
- Around line 296-299: The timeout multiplier uses std::min on
m_config.connectFailRetries which returns 0 when retries are disabled, producing
a zero total timeout; change the multiplier to use
std::max<uint8_t>(m_config.connectFailRetries, 1) so the NimBLEUtils::taskWait
call always waits at least once using m_connectTimeout plus
connIntervalToMs(m_connParams.itvl_max) * 7; update the expression around
NimBLEUtils::taskWait in NimBLEClient.cpp to replace the std::min usage with
std::max referencing m_config.connectFailRetries, m_connectTimeout,
connIntervalToMs, and m_connParams.itvl_max.
---
Nitpick comments:
In `@src/NimBLEClient.cpp`:
- Around line 277-278: Changing the default connectFailRetries in the Config
struct to 2 introduces automatic retry behavior on 0x3e connection failures
which is a behavioral change; update the project release notes/CHANGELOG to call
this out and document how to revert to prior behavior (e.g., use
setConnectRetries(0) or set connectFailRetries to 0 in Config), and mention the
related symbols (Config::connectFailRetries, NimBLEClient::setConnectRetries,
and member m_connectFailRetryCount) so users know where to adjust the setting.
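As a usage sketch of the opt-out mentioned above (the setup boilerplate is illustrative; `setConnectRetries` is the setter named in this PR):

```cpp
// Sketch: disable the new silent 0x3e retries so a failed connection attempt
// reports its failure immediately, matching pre-PR behavior. Assumes the
// standard NimBLE-Arduino client setup; not a complete application.
#include <NimBLEDevice.h>

void setupClient() {
    NimBLEDevice::init("");
    NimBLEClient* pClient = NimBLEDevice::createClient();
    pClient->setConnectRetries(0); // 0 = no automatic retries on 0x3e
}
```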
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 352a77e9-aef6-45ba-9de3-47dcdda01736
📒 Files selected for processing (2)
- src/NimBLEClient.cpp
- src/NimBLEClient.h
🚧 Files skipped from review as they are similar to previous changes (1)
- src/NimBLEClient.h
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/NimBLEClient.cpp`:
- Around line 1163-1165: completeConnectEstablished() may run user callbacks
that can delete or disconnect the client, so after calling
pClient->completeConnectEstablished() do not keep using the old pClient pointer;
instead save conn_handle before the call, invoke completeConnectEstablished(),
then re-resolve pClient from the stable conn_handle (or skip the callback until
after you finish operations) before calling methods like getCharacteristic,
onMTUChange, onPhyUpdate, etc.; apply this pattern to the other branches (lines
mentioned) to avoid dereferencing a freed pClient.
- Around line 296-299: The blocking wait uses max(connectFailRetries,1) windows
but misses the initial attempt, so synchronous connect() can wait one attempt
too few; update the NimBLEUtils::taskWait call(s) (the one using
m_connectTimeout + connIntervalToMs(m_connParams.itvl_max) * 7 and the analogous
occurrence near the second block) to multiply by
(std::max<uint8_t>(m_config.connectFailRetries, 1) + 1) (i.e., retries + the
initial attempt) so the wait budget covers all attempts including the first.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d68cba87-6c4e-4cff-8eac-7c313fdf2d8d
📒 Files selected for processing (2)
- src/NimBLEClient.cpp
- src/NimBLEClient.h
🚧 Files skipped from review as they are similar to previous changes (1)
- src/NimBLEClient.h
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/NimBLEClient.cpp (1)
1075-1091: ⚠️ Potential issue | 🔴 Critical
Arrr, this new terminal `onConnectFail()` path can still touch a freed client.
By the time line 1076 calls user code, this handler has already left the deferred-delete states. If `onConnectFail()` calls `NimBLEDevice::deleteClient(pClient)`, the client can be freed immediately, and lines 1081-1091 then keep dereferencing dangling memory. Please avoid touching `pClient` after this callback, or add a deferred-delete guard for this path too. Based on learnings, `NimBLEDevice::deleteClient()` only defers deletion for CONNECTED/DISCONNECTING or CONNECTING clients; a DISCONNECTED client can be freed immediately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/NimBLEClient.cpp` around lines 1075 - 1091, The onConnectFail callback can free pClient, so avoid touching pClient after calling m_pClientCallbacks->onConnectFail; compute and apply any deletion decisions before invoking the callback (e.g., set pClient->m_connHandle = BLE_HS_CONN_HANDLE_NONE, evaluate whether to delete and set pClient->m_config.deleteOnConnectFail if pTaskData and connEstablishFailReason apply), or capture the deleteDecision beforehand and if true call NimBLEDevice::deleteClient(pClient) immediately after the callback and then return without accessing pClient; refer to pClient, m_pClientCallbacks->onConnectFail, m_connHandle, m_config.deleteOnDisconnect, m_config.deleteOnConnectFail, pTaskData and NimBLEDevice::deleteClient to locate and implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/NimBLEClient.cpp`:
- Around line 1014-1021: The disconnect handling currently calls
NimBLEDevice::getClientByHandle(event->disconnect.conn.conn_handle)
unconditionally which matches idle clients because their m_connHandle ==
BLE_HS_CONN_HANDLE_NONE; update the lookup logic in the disconnect handling to
first test if event->disconnect.conn.conn_handle != BLE_HS_CONN_HANDLE_NONE
before calling getClientByHandle, and only if that condition fails (or returns
nullptr) proceed to look up by peer addresses via
NimBLEDevice::getClientByPeerAddress(event->disconnect.conn.peer_ota_addr) and
then peer_id_addr; this ensures the correct client is selected and
onConnectFail()/other callbacks are routed to the intended client.
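The lookup ordering the prompt above describes can be modeled with stand-in types. The real code uses `NimBLEDevice::getClientByHandle` and `getClientByPeerAddress`; everything below is a simplified, illustrative stand-in:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint16_t BLE_HS_CONN_HANDLE_NONE = 0xFFFF; // "no handle" sentinel

struct FakeClient {
    uint16_t connHandle;
    uint32_t peerAddr; // simplified stand-in for a BLE address
};

// Model of the suggested disconnect-handler lookup: only match by handle when
// the event carries a real handle (idle clients also hold CONN_HANDLE_NONE and
// would match spuriously); otherwise fall back to matching the peer address.
FakeClient* findClient(std::vector<FakeClient>& clients, uint16_t evtHandle, uint32_t peerAddr) {
    if (evtHandle != BLE_HS_CONN_HANDLE_NONE) {
        for (auto& c : clients) {
            if (c.connHandle == evtHandle) return &c;
        }
    }
    for (auto& c : clients) {
        if (c.peerAddr == peerAddr) return &c;
    }
    return nullptr;
}
```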
---
Outside diff comments:
In `@src/NimBLEClient.cpp`:
- Around line 1075-1091: The onConnectFail callback can free pClient, so avoid
touching pClient after calling m_pClientCallbacks->onConnectFail; compute and
apply any deletion decisions before invoking the callback (e.g., set
pClient->m_connHandle = BLE_HS_CONN_HANDLE_NONE, evaluate whether to delete and
set pClient->m_config.deleteOnConnectFail if pTaskData and
connEstablishFailReason apply), or capture the deleteDecision beforehand and if
true call NimBLEDevice::deleteClient(pClient) immediately after the callback and
then return without accessing pClient; refer to pClient,
m_pClientCallbacks->onConnectFail, m_connHandle, m_config.deleteOnDisconnect,
m_config.deleteOnConnectFail, pTaskData and NimBLEDevice::deleteClient to locate
and implement the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 897bf23d-b2d3-42f1-8d1d-dc4b8c9d3bd7
📒 Files selected for processing (2)
- src/NimBLEClient.cpp
- src/NimBLEClient.h
* Delay onConnect until connection is likely established using a timer (7x conn interval), with early completion on peer activity.
* Suppress onDisconnect for 0x3e and route terminal 0x3e failures to onConnectFail.
* Add configurable 0x3e reconnect attempts via `setConnectRetries` or client config (default 2).
* Retry logic suppresses callbacks while retrying; callbacks fire only on final outcome.
* Refactor connect-start path into a shared helper used by initial connect and retries.

Summary by CodeRabbit
New Features
Bug Fixes
Refactor