Skip to content

Conversation

@zstack-robot-2
Copy link
Collaborator

When multiple APIs for modifying MDS nodes are initiated,
the API semantics are Compare-And-Set (CAS),
performing atomic operations.

APIImpact

Resolves: ZSTAC-79225

Change-Id: I6774636f6b6d7a6a6c717568626f64727867796c

sync from gitlab !9051

@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

漫步 (Walkthrough)

新增可选乐观锁参数 oldConfig 到更新外部主存储的 API;在存储处理路径中使用基于 UUID 的锁与 compare-and-set(compareAndSetConfig)做条件更新并在冲突时返回错误;增加并发测试与文档及错误码常量。

变更 (Changes)

内聚组 / 文件 变更摘要
API 参数扩展
header/src/main/java/org/zstack/header/storage/addon/primary/APIUpdateExternalPrimaryStorageMsg.java
新增可选字段 oldConfig@APIParam(required = false))、getOldConfig / setOldConfig 访问器,并在 __example__() 中加入示例值
存储更新逻辑
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
新增基于 UUID 的 Interners 锁与私有方法 compareAndSetConfig(String expectedConfig, ExternalPrimaryStorageVO newVO);在处理 APIUpdateExternalPrimaryStorageMsg 时若提供 oldConfig 则走 compare-and-set 路径并在失败时返回并发冲突错误,失败时中止后续刷新流程;否则保留原有无条件 updateAndRefresh 流程
集成测试用例
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy
增加并发压力测试:使用 CyclicBarrier 启动两线程基于相同 oldConfig 同时更新不同 mdsUrls,断言只有一个成功并验证最终配置
API 文档
header/src/main/java/org/zstack/header/storage/addon/primary/APIUpdateExternalPrimaryStorageMsgDoc_zh_cn.groovy (及对应英文文档块)
updateExternalPrimaryStorage 文档中新增可选参数 oldConfig(类型 String,body,since 5.5.0),说明用于基于旧值的原子更新以保证并发安全
错误码常量
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
新增错误码常量 ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10040

序列图 (Sequence Diagram)

sequenceDiagram
    participant Client as 客户端
    participant Server as 服务器\n(ExternalPrimaryStorage)
    participant DB as 数据库

    Client->>Server: 发送 APIUpdateExternalPrimaryStorageMsg\n(新配置 + 可选 oldConfig)
    alt 提供 oldConfig(乐观锁路径)
        Server->>Server: 获取 UUID 对应的锁(Interners)
        Server->>DB: compareAndSetConfig(expectedConfig, newVO)
        DB-->>Server: 返回 成功 或 失败
        alt 成功
            Server->>DB: updateAndRefresh(newVO)
            DB-->>Server: 返回最新 VO
            Server-->>Client: 返回成功响应
        else 失败
            Server-->>Client: 返回并发冲突错误(OperationFailure)
        end
    else 未提供 oldConfig(传统路径)
        Server->>DB: 无条件 updateAndRefresh(newVO)
        DB-->>Server: 返回最新 VO
        Server-->>Client: 返回成功响应
    end
Loading

代码审核工作量评估 (Estimated Code Review Effort)

🎯 4 (Complex) | ⏱️ ~45 minutes

诗歌 (Poem)

🐰 我是田间的小兔子,嗅着旧字风,
若旧值仍在掌心,我轻按换新棚。
两手同时触草地,只容一方舞动,
成功留下一串脚印,月光为它送钟。

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题遵循规定的格式 [scope]: ,长度54字符,未超过72字符限制,清晰描述了主要变更内容:为MDS节点添加原子修改API。
Description check ✅ Passed 描述与代码变更内容相关,提到了Compare-And-Set(CAS)原子操作语义、MDS节点修改、相关问题编号ZSTAC-79225,与代码中实现的并发安全更新机制一致。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 325-349: The concurrent-update test is wrong: both Thread.start
blocks expect AssertionError and exceptions in those threads are not propagated
or waited for, making the final assert invalid; change the concurrency test so
one update is allowed to succeed and the other is expected to fail, capture and
propagate thread exceptions to the main test, and join the threads before
running retryInSecs. Specifically, modify the two updateExternalPrimaryStorage
calls (the blocks currently wrapped with expect(AssertionError.class)) to run in
separate threads but record their outcomes (e.g., store thrown exceptions or
success flags from each thread), assert that exactly one thread threw an
AssertionError and one succeeded, call join() (or use a CountDownLatch) to wait
for both threads to finish before querying the ExternalPrimaryStorageVO config
(the Q.New(...) block) and the final assertion that checks newConfig3 for the
two IPs.
🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)

205-219: 变量命名可能引起混淆。

Line 205 的 oldConfig 存储的是更新前的当前配置,而 msg.getOldConfig() 是用户提供的期望配置(用于 CAS 比较)。两者含义不同但名称相似,建议将 Line 205 的变量重命名为更明确的名称,如 configBeforeUpdatepreviousConfig

♻️ 建议的重命名
-        String oldConfig = externalVO.getConfig();
+        String configBeforeUpdate = externalVO.getConfig();

@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch 2 times, most recently from 773db0b to 3a5e2b1 Compare January 22, 2026 03:55
.findValue()
assert ([ "127.0.1.6", "127.0.1.7" ].count { newConfig3.contains(it) } == 1)
assert exceptionCount.get() == 1
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from yu.qiu:

resolved

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 211-219: The code currently mutates externalVO before calling
compareAndSet(externalVO.getUuid(), msg.getOldConfig(), externalVO), so on CAS
failure the in-memory externalVO may be inconsistent with DB; change the failure
branch to reload the fresh state from DB (externalVO = dbf.reload(externalVO))
or revert to a pre-modified copy and only swap in the updated object after
successful compareAndSet; specifically, ensure compareAndSet operates on an
immutable copy or rollback by calling dbf.reload(...) before throwing the
OperationFailureException to avoid leaving a polluted in-memory externalVO.

@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch from 3a5e2b1 to 04e1e32 Compare January 22, 2026 08:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 176-197: The compareAndSet method is swallowing all exceptions (in
the try/catch) and returning false, causing real DB/SQL/lock errors to be
misreported as CAS conflicts; change the catch to log the exception (including
context: uuid, expectedConfig, newVO fields) and rethrow a runtime/checked
exception so callers see the real failure, while keeping the current behavior of
returning false only when updatedRows == 0 (i.e., CAS mismatch); keep GLock
usage and the finally lock.unlock() as-is and ensure the SQL.New(...) updateSql
block remains the same but does not translate exceptions into false.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 326-365: The test currently catches Throwable in both thread1 and
thread2 which masks unexpected errors; change the handlers in the Thread.start
blocks that call updateExternalPrimaryStorage (the closures in thread1 and
thread2) to catch AssertionError (or the specific expected failure) and
increment exceptionCount there, and catch other Exceptions (or
BrokenBarrierException/RuntimeException) separately so they are rethrown or
captured in a new unexpectedExceptions collection and cause the test to fail;
keep successCount increment on success as-is and assert that
unexpectedExceptions is empty (or rethrow any caught non-AssertionError) after
thread joins so only the intended stale-oldConfig failure is counted in
exceptionCount.

@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch from 04e1e32 to 456c5ed Compare January 22, 2026 10:33
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@header/src/main/java/org/zstack/header/storage/addon/primary/APIUpdateExternalPrimaryStorageMsgDoc_zh_cn.groovy`:
- Around line 97-105: The API field definition for oldConfig (column name
"oldConfig" enclosedIn "updateExternalPrimaryStorage") has an empty desc; update
its desc to a concise English description stating that this optional String
holds the expected CAS configuration for the external primary storage (e.g.,
"Expected CAS configuration for the external primary storage in JSON/string
format"), keeping existing metadata (optional true, type "String", since
"5.5.0") intact so the generated docs correctly convey its purpose.
♻️ Duplicate comments (2)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)

176-194: compareAndSet 吞掉异常会把真实故障伪装成 CAS 冲突。

异常被吞掉后统一返回 false,上层会抛出“config has been modified”的业务错误,真实的 DB/锁/SQL 故障被掩盖,影响排障与重试策略。建议直接抛出明确错误,仅在 updatedRows == 0 时返回 false

🐛 建议修改
-        } catch (Exception e) {
-            return false;
+        } catch (Exception e) {
+            throw new OperationFailureException(operr(
+                    "Failed to update ExternalPrimaryStorage[uuid:%s], %s", uuid, e.getMessage()));
         } finally {
             lock.unlock();
         }
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy (1)

326-365: 不要用 catch Throwable 统计“预期失败”。

这会把意外异常(如 BrokenBarrierException、运行时错误)也当作 CAS 失败计数,导致测试误报通过。应区分 AssertionError(预期失败)与其他异常,并将非预期异常显式失败。

🧪 建议修改
+        def unexpectedErrors = Collections.synchronizedList([])
         def thread1 = Thread.start{
             try {
                 barrier.await()
                 updateExternalPrimaryStorage {
                     uuid = ps.uuid
                     config ="{\"mdsUrls\":[\"root:password@127.0.1.6\",\"root:password@127.0.1.2\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}"
                     oldConfig = newConfig2
                 }
                 successCount.incrementAndGet()
-            } catch (Throwable e) {
-                exceptionCount.incrementAndGet()
+            } catch (AssertionError e) {
+                exceptionCount.incrementAndGet()
+            } catch (Throwable t) {
+                unexpectedErrors << t
             }
         }
         def thread2 = Thread.start{
             try {
                 barrier.await()
                 updateExternalPrimaryStorage {
                     uuid = ps.uuid
                     config ="{\"mdsUrls\":[\"root:password@127.0.1.4\",\"root:password@127.0.1.7\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}"
                     oldConfig = newConfig2
                 }
                 successCount.incrementAndGet()
-            } catch (Throwable e) {
-                exceptionCount.incrementAndGet()
+            } catch (AssertionError e) {
+                exceptionCount.incrementAndGet()
+            } catch (Throwable t) {
+                unexpectedErrors << t
             }
         }
         thread1.join()
         thread2.join()
         retryInSecs {
+            assert unexpectedErrors.isEmpty() : "Unexpected errors occurred: ${unexpectedErrors*.message}"
             String newConfig3= Q.New(ExternalPrimaryStorageVO.class)
                     .select(ExternalPrimaryStorageVO_.config)
                     .eq(ExternalPrimaryStorageVO_.uuid, ps.uuid)
                     .findValue()
             assert ([ "127.0.1.6", "127.0.1.7" ].count { newConfig3.contains(it) } == 1)
             assert exceptionCount.get() == 1
             assert successCount.get() == 1
         }

@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch 3 times, most recently from 9b8bc9f to 97ab24f Compare January 23, 2026 06:06
}
}

private boolean compareAndSet(String uuid, String expectedConfig, ExternalPrimaryStorageVO newVO) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from jin.ma:

需要这么重吗,能排队就不要 glock

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from yu.qiu:

需要排队,数据库层面的事务和锁不足以正常CAS,经过UT多次测试,在两个api同时发起时候,偶尔报错
后续参考您的建议,用sync修饰符,锁的是一个自定义的静态String对象(因为每个api都是一个新的实例)

@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch 2 times, most recently from 85a733f to aa2f5d8 Compare January 23, 2026 10:53
}
externalVO = dbf.updateAndRefresh(externalVO);
if (msg.getOldConfig() != null) {
boolean success = compareAndSet(externalVO.getUuid(), msg.getOldConfig(), externalVO);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from jin.ma:

不需要这么多语义,compareAndSetConfig 只需要 set config,其他不需要。保持单一职责,不然下次 ext ps 新增个字段这里还需要写两遍 set ?还是那句话做什么事情就描述什么,compare config 就体现在函数名上。

@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch from aa2f5d8 to a33d0c8 Compare January 26, 2026 06:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)

178-221: CAS 成功后再 updateAndRefresh 可能覆盖后续并发更新

compareAndSetConfig 只在同步块内更新 config,但 dbf.updateAndRefresh(externalVO) 在锁外执行。两个并发 CAS 更新可在这之间交错,导致先完成的请求把后完成的配置覆盖,破坏 CAS 语义。建议把 updateAndRefresh 放入同一锁内,或将 CAS 更新与 refresh 合并成一个原子步骤。

🛠️ 一种可行调整
-    private boolean compareAndSetConfig(String expectedConfig, ExternalPrimaryStorageVO newVO) {
+    private ExternalPrimaryStorageVO compareAndSetConfig(String expectedConfig, ExternalPrimaryStorageVO newVO) {
         synchronized (externalPsUpdateLock.intern(String.format("ExternalPrimaryStorage-update-%s", newVO.getUuid()))) {
             try {
                 String updateSql = "update ExternalPrimaryStorageVO set config = :config where uuid = :uuid and config = :expectedConfig";
                 int updatedRows = SQL.New(updateSql)
                         .param("config", newVO.getConfig())
                         .param("uuid", newVO.getUuid())
                         .param("expectedConfig", expectedConfig)
                         .execute();
-                return updatedRows > 0;
+                if (updatedRows == 0) {
+                    return null;
+                }
+                return dbf.updateAndRefresh(newVO);
             } catch (Exception e) {
                 throw new OperationFailureException(operr(ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10041,"Failed to update ExternalPrimaryStorage[uuid:%s], %s", newVO.getUuid(), e.getMessage()));
             }
         }
     }
...
         if (msg.getOldConfig() != null) {
-            boolean success = compareAndSetConfig(msg.getOldConfig(), externalVO);
-            if (!success) {
+            externalVO = compareAndSetConfig(msg.getOldConfig(), externalVO);
+            if (externalVO == null) {
                 throw new OperationFailureException(operr(ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10040,"Failed to update ExternalPrimaryStorage[uuid:%s], config has been modified by another operation", externalVO.getUuid()));
             }
-        }
-        externalVO = dbf.updateAndRefresh(externalVO);
+        } else {
+            externalVO = dbf.updateAndRefresh(externalVO);
+        }
♻️ Duplicate comments (1)
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy (1)

329-353: 不要把非预期异常计入预期失败

当前用 catch (Throwable) 会把运行时错误/屏障异常当作预期失败计数,可能导致测试误绿。建议仅把 AssertionError 计为预期失败,其他异常单独收集并让测试失败。

🛠️ 建议修改
+        def unexpectedErrors = Collections.synchronizedList([])
         def thread1 = Thread.start{
             try {
                 barrier.await()
                 updateExternalPrimaryStorage {
                     uuid = ps.uuid
                     config ="{\"mdsUrls\":[\"root:password@127.0.1.6\",\"root:password@127.0.1.2\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}"
                     oldConfig = newConfig2
                 }
                 successCount.incrementAndGet()
-            } catch (Throwable e) {
+            } catch (AssertionError e) {
                 exceptionCount.incrementAndGet()
+            } catch (Throwable t) {
+                unexpectedErrors << t
             }
         }
         def thread2 = Thread.start{
             try {
                 barrier.await()
                 updateExternalPrimaryStorage {
                     uuid = ps.uuid
                     config ="{\"mdsUrls\":[\"root:password@127.0.1.4\",\"root:password@127.0.1.7\",\"root:password@127.0.1.3\"],\"logicalPoolName\":\"lpool1\"}"
                     oldConfig = newConfig2
                 }
                 successCount.incrementAndGet()
-            } catch (Throwable e) {
+            } catch (AssertionError e) {
                 exceptionCount.incrementAndGet()
+            } catch (Throwable t) {
+                unexpectedErrors << t
             }
         }
         thread1.join()
         thread2.join()
         retryInSecs {
+            assert unexpectedErrors.isEmpty() : "Unexpected errors: ${unexpectedErrors*.message}"
             String newConfig3= Q.New(ExternalPrimaryStorageVO.class)
                     .select(ExternalPrimaryStorageVO_.config)
                     .eq(ExternalPrimaryStorageVO_.uuid, ps.uuid)
                     .findValue()
             assert ([ "127.0.1.6", "127.0.1.7" ].count { newConfig3.contains(it) } == 1)
             assert exceptionCount.get() == 1
             assert successCount.get() == 1
         }

}
externalVO = dbf.updateAndRefresh(externalVO);
if (msg.getOldConfig() != null) {
boolean success = compareAndSet(externalVO.getUuid(), msg.getOldConfig(), externalVO);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from yu.qiu:

image-20260126141753356

.param("expectedConfig", expectedConfig)
.execute();
return updatedRows > 0;
} catch (Exception e) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from jin.ma:

zstack 基本上不需要处理 exception ,删掉 try catch

if (msg.getOldConfig() != null) {
boolean success = compareAndSetConfig(msg.getOldConfig(), externalVO);
if (!success) {
throw new OperationFailureException(operr(ORG_ZSTACK_STORAGE_ADDON_PRIMARY_10040,"Failed to update ExternalPrimaryStorage[uuid:%s], config has been modified by another operation", externalVO.getUuid()));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from jin.ma:

能直接 evt.setError 就不要 throw exception ,很不好的习惯

When API requests for modifying
multiple MDS nodes are initiated,
place the API requests into a queue for serialized modification.

APIImpact

Resolves: ZSTAC-79225

Change-Id: I7073636e6f656c67636e7771726b6b79687a707a
@MatheMatrix MatheMatrix force-pushed the sync/yu.qiu/ZSTAC-79225 branch from a33d0c8 to 7b0388c Compare January 26, 2026 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants