-
Notifications
You must be signed in to change notification settings - Fork 0
<fix>[storage]: Atomic modification API for MDS nodes. #3233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 5.5.6
Are you sure you want to change the base?
Conversation
漫步 (Walkthrough)新增可选乐观锁参数 变更 (Changes)
序列图 (Sequence Diagram)sequenceDiagram
participant Client as 客户端
participant Server as 服务器\n(ExternalPrimaryStorage)
participant DB as 数据库
Client->>Server: 发送 APIUpdateExternalPrimaryStorageMsg\n(新配置 + 可选 oldConfig)
alt 提供 oldConfig(乐观锁路径)
Server->>Server: 获取 UUID 对应的锁(Interners)
Server->>DB: compareAndSetConfig(expectedConfig, newVO)
DB-->>Server: 返回 成功 或 失败
alt 成功
Server->>DB: updateAndRefresh(newVO)
DB-->>Server: 返回最新 VO
Server-->>Client: 返回成功响应
else 失败
Server-->>Client: 返回并发冲突错误(OperationFailure)
end
else 未提供 oldConfig(传统路径)
Server->>DB: 无条件 updateAndRefresh(newVO)
DB-->>Server: 返回最新 VO
Server-->>Client: 返回成功响应
end
代码审核工作量评估 (Estimated Code Review Effort)🎯 4 (Complex) | ⏱️ ~45 minutes 诗歌 (Poem)
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.40.5)utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.javaComment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 325-349: The concurrent-update test is wrong: both Thread.start
blocks expect AssertionError and exceptions in those threads are not propagated
or waited for, making the final assert invalid; change the concurrency test so
one update is allowed to succeed and the other is expected to fail, capture and
propagate thread exceptions to the main test, and join the threads before
running retryInSecs. Specifically, modify the two updateExternalPrimaryStorage
calls (the blocks currently wrapped with expect(AssertionError.class)) to run in
separate threads but record their outcomes (e.g., store thrown exceptions or
success flags from each thread), assert that exactly one thread threw an
AssertionError and one succeeded, call join() (or use a CountDownLatch) to wait
for both threads to finish before querying the ExternalPrimaryStorageVO config
(the Q.New(...) block) and the final assertion that checks newConfig3 for the
two IPs.
🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)
205-219: 变量命名可能引起混淆。Line 205 的
oldConfig存储的是更新前的当前配置,而msg.getOldConfig()是用户提供的期望配置(用于 CAS 比较)。两者含义不同但名称相似,建议将 Line 205 的变量重命名为更明确的名称,如configBeforeUpdate或previousConfig。♻️ 建议的重命名
- String oldConfig = externalVO.getConfig(); + String configBeforeUpdate = externalVO.getConfig();
...st/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy
Outdated
Show resolved
Hide resolved
773db0b to
3a5e2b1
Compare
| .findValue() | ||
| assert ([ "127.0.1.6", "127.0.1.7" ].count { newConfig3.contains(it) } == 1) | ||
| assert exceptionCount.get() == 1 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from yu.qiu:
resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 211-219: The code currently mutates externalVO before calling
compareAndSet(externalVO.getUuid(), msg.getOldConfig(), externalVO), so on CAS
failure the in-memory externalVO may be inconsistent with DB; change the failure
branch to reload the fresh state from DB (externalVO = dbf.reload(externalVO))
or revert to a pre-modified copy and only swap in the updated object after
successful compareAndSet; specifically, ensure compareAndSet operates on an
immutable copy or rollback by calling dbf.reload(...) before throwing the
OperationFailureException to avoid leaving a polluted in-memory externalVO.
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
Show resolved
Hide resolved
3a5e2b1 to
04e1e32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 176-197: The compareAndSet method is swallowing all exceptions (in
the try/catch) and returning false, causing real DB/SQL/lock errors to be
misreported as CAS conflicts; change the catch to log the exception (including
context: uuid, expectedConfig, newVO fields) and rethrow a runtime/checked
exception so callers see the real failure, while keeping the current behavior of
returning false only when updatedRows == 0 (i.e., CAS mismatch); keep GLock
usage and the finally lock.unlock() as-is and ensure the SQL.New(...) updateSql
block remains the same but does not translate exceptions into false.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 326-365: The test currently catches Throwable in both thread1 and
thread2 which masks unexpected errors; change the handlers in the Thread.start
blocks that call updateExternalPrimaryStorage (the closures in thread1 and
thread2) to catch AssertionError (or the specific expected failure) and
increment exceptionCount there, and catch other Exceptions (or
BrokenBarrierException/RuntimeException) separately so they are rethrown or
captured in a new unexpectedExceptions collection and cause the test to fail;
keep successCount increment on success as-is and assert that
unexpectedExceptions is empty (or rethrow any caught non-AssertionError) after
thread joins so only the intended stale-oldConfig failure is counted in
exceptionCount.
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
Outdated
Show resolved
Hide resolved
...st/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy
Show resolved
Hide resolved
04e1e32 to
456c5ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@header/src/main/java/org/zstack/header/storage/addon/primary/APIUpdateExternalPrimaryStorageMsgDoc_zh_cn.groovy`:
- Around line 97-105: The API field definition for oldConfig (column name
"oldConfig" enclosedIn "updateExternalPrimaryStorage") has an empty desc; update
its desc to a concise English description stating that this optional String
holds the expected CAS configuration for the external primary storage (e.g.,
"Expected CAS configuration for the external primary storage in JSON/string
format"), keeping existing metadata (optional true, type "String", since
"5.5.0") intact so the generated docs correctly convey its purpose.
♻️ Duplicate comments (2)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)
176-194: compareAndSet 吞掉异常会把真实故障伪装成 CAS 冲突。异常被吞掉后统一返回
false,上层会抛出“config has been modified”的业务错误,真实的 DB/锁/SQL 故障被掩盖,影响排障与重试策略。建议直接抛出明确错误,仅在updatedRows == 0时返回false。🐛 建议修改
- } catch (Exception e) { - return false; + } catch (Exception e) { + throw new OperationFailureException(operr( + "Failed to update ExternalPrimaryStorage[uuid:%s], %s", uuid, e.getMessage())); } finally { lock.unlock(); }test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy (1)
326-365: 不要用 catch Throwable 统计“预期失败”。这会把意外异常(如 BrokenBarrierException、运行时错误)也当作 CAS 失败计数,导致测试误报通过。应区分
AssertionError(预期失败)与其他异常,并将非预期异常显式失败。🧪 建议修改
+ def unexpectedErrors = Collections.synchronizedList([]) def thread1 = Thread.start{ try { barrier.await() updateExternalPrimaryStorage { uuid = ps.uuid config ="{\"mdsUrls\":[\"root:password@127.0.1.6\",\"root:password@127.0.1.2\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}" oldConfig = newConfig2 } successCount.incrementAndGet() - } catch (Throwable e) { - exceptionCount.incrementAndGet() + } catch (AssertionError e) { + exceptionCount.incrementAndGet() + } catch (Throwable t) { + unexpectedErrors << t } } def thread2 = Thread.start{ try { barrier.await() updateExternalPrimaryStorage { uuid = ps.uuid config ="{\"mdsUrls\":[\"root:password@127.0.1.4\",\"root:password@127.0.1.7\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}" oldConfig = newConfig2 } successCount.incrementAndGet() - } catch (Throwable e) { - exceptionCount.incrementAndGet() + } catch (AssertionError e) { + exceptionCount.incrementAndGet() + } catch (Throwable t) { + unexpectedErrors << t } } thread1.join() thread2.join() retryInSecs { + assert unexpectedErrors.isEmpty() : "Unexpected errors occurred: ${unexpectedErrors*.message}" String newConfig3= Q.New(ExternalPrimaryStorageVO.class) .select(ExternalPrimaryStorageVO_.config) .eq(ExternalPrimaryStorageVO_.uuid, ps.uuid) .findValue() assert ([ "127.0.1.6", "127.0.1.7" ].count { newConfig3.contains(it) } == 1) assert exceptionCount.get() == 1 assert successCount.get() == 1 }
...a/org/zstack/header/storage/addon/primary/APIUpdateExternalPrimaryStorageMsgDoc_zh_cn.groovy
Show resolved
Hide resolved
9b8bc9f to
97ab24f
Compare
| } | ||
| } | ||
|
|
||
| private boolean compareAndSet(String uuid, String expectedConfig, ExternalPrimaryStorageVO newVO) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from jin.ma:
需要这么重吗,能排队就不要 glock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from yu.qiu:
需要排队,数据库层面的事务和锁不足以正常CAS,经过UT多次测试,在两个api同时发起时候,偶尔报错
后续参考您的建议,用sync修饰符,锁的是一个自定义的静态String对象(因为每个api都是一个新的实例)
85a733f to
aa2f5d8
Compare
| } | ||
| externalVO = dbf.updateAndRefresh(externalVO); | ||
| if (msg.getOldConfig() != null) { | ||
| boolean success = compareAndSet(externalVO.getUuid(), msg.getOldConfig(), externalVO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from jin.ma:
不需要这么多语义,compareAndSetConfig 只需要 set config,其他不需要。保持单一职责,不然下次 ext ps 新增个字段这里还需要写两遍 set ?还是那句话做什么事情就描述什么,compare config 就体现在函数名上。
aa2f5d8 to
a33d0c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)
178-221: CAS 成功后再updateAndRefresh可能覆盖后续并发更新
compareAndSetConfig只在同步块内更新config,但dbf.updateAndRefresh(externalVO)在锁外执行。两个并发 CAS 更新可在这之间交错,导致先完成的请求把后完成的配置覆盖,破坏 CAS 语义。建议把updateAndRefresh放入同一锁内,或将 CAS 更新与 refresh 合并成一个原子步骤。🛠️ 一种可行调整
- private boolean compareAndSetConfig(String expectedConfig, ExternalPrimaryStorageVO newVO) { + private ExternalPrimaryStorageVO compareAndSetConfig(String expectedConfig, ExternalPrimaryStorageVO newVO) { synchronized (externalPsUpdateLock.intern(String.format("ExternalPrimaryStorage-update-%s", newVO.getUuid()))) { try { String updateSql = "update ExternalPrimaryStorageVO set config = :config where uuid = :uuid and config = :expectedConfig"; int updatedRows = SQL.New(updateSql) .param("config", newVO.getConfig()) .param("uuid", newVO.getUuid()) .param("expectedConfig", expectedConfig) .execute(); - return updatedRows > 0; + if (updatedRows == 0) { + return null; + } + return dbf.updateAndRefresh(newVO); } catch (Exception e) { throw new OperationFailureException(operr(ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10041,"Failed to update ExternalPrimaryStorage[uuid:%s], %s", newVO.getUuid(), e.getMessage())); } } } ... if (msg.getOldConfig() != null) { - boolean success = compareAndSetConfig(msg.getOldConfig(), externalVO); - if (!success) { + externalVO = compareAndSetConfig(msg.getOldConfig(), externalVO); + if (externalVO == null) { throw new OperationFailureException(operr(ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10040,"Failed to update ExternalPrimaryStorage[uuid:%s], config has been modified by another operation", externalVO.getUuid())); } - } - externalVO = dbf.updateAndRefresh(externalVO); + } else { + externalVO = dbf.updateAndRefresh(externalVO); + }
♻️ Duplicate comments (1)
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy (1)
329-353: 不要把非预期异常计入预期失败当前用
catch (Throwable)会把运行时错误/屏障异常当作预期失败计数,可能导致测试误绿。建议仅把AssertionError计为预期失败,其他异常单独收集并让测试失败。🛠️ 建议修改
+ def unexpectedErrors = Collections.synchronizedList([]) def thread1 = Thread.start{ try { barrier.await() updateExternalPrimaryStorage { uuid = ps.uuid config ="{\"mdsUrls\":[\"root:password@127.0.1.6\",\"root:password@127.0.1.2\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}" oldConfig = newConfig2 } successCount.incrementAndGet() - } catch (Throwable e) { + } catch (AssertionError e) { exceptionCount.incrementAndGet() + } catch (Throwable t) { + unexpectedErrors << t } } def thread2 = Thread.start{ try { barrier.await() updateExternalPrimaryStorage { uuid = ps.uuid config ="{\"mdsUrls\":[\"root:password@127.0.1.4\",\"root:password@127.0.1.7\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}" oldConfig = newConfig2 } successCount.incrementAndGet() - } catch (Throwable e) { + } catch (AssertionError e) { exceptionCount.incrementAndGet() + } catch (Throwable t) { + unexpectedErrors << t } } thread1.join() thread2.join() retryInSecs { + assert unexpectedErrors.isEmpty() : "Unexpected errors: ${unexpectedErrors*.message}" String newConfig3= Q.New(ExternalPrimaryStorageVO.class) .select(ExternalPrimaryStorageVO_.config) .eq(ExternalPrimaryStorageVO_.uuid, ps.uuid) .findValue() assert ([ "127.0.1.6", "127.0.1.7" ].count { newConfig3.contains(it) } == 1) assert exceptionCount.get() == 1 assert successCount.get() == 1 }
| } | ||
| externalVO = dbf.updateAndRefresh(externalVO); | ||
| if (msg.getOldConfig() != null) { | ||
| boolean success = compareAndSet(externalVO.getUuid(), msg.getOldConfig(), externalVO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .param("expectedConfig", expectedConfig) | ||
| .execute(); | ||
| return updatedRows > 0; | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from jin.ma:
zstack 基本上不需要处理 exception ,删掉 try catch
| if (msg.getOldConfig() != null) { | ||
| boolean success = compareAndSetConfig(msg.getOldConfig(), externalVO); | ||
| if (!success) { | ||
| throw new OperationFailureException(operr(ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10040,"Failed to update ExternalPrimaryStorage[uuid:%s], config has been modified by another operation", externalVO.getUuid())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from jin.ma:
能直接 evt.setError 就不要 throw exception ,很不好的习惯
When API requests for modifying multiple MDS nodes are initiated, place the API requests into a queue for serialized modification. APIImpact Resolves: ZSTAC-79225 Change-Id: I7073636e6f656c67636e7771726b6b79687a707a
a33d0c8 to
7b0388c
Compare

When multiple APIs for modifying MDS nodes are initiated,
the API semantics are Compare-And-Set (CAS),
performing atomic operations.
APIImpact
Resolves: ZSTAC-79225
Change-Id: I6774636f6b6d7a6a6c717568626f64727867796c
sync from gitlab !9051