Skip to content

Conversation

@ZStack-Robot
Copy link
Collaborator

APIImpact

Resolves: ZSTAC-81237

Change-Id: I6f7262716c7075637373646b726a796c73737563

sync from gitlab !9059

@coderabbitai
Copy link

coderabbitai bot commented Jan 24, 2026

Walkthrough

新增“暂停长运行任务”能力:添加 APISuspendLongJobMsg/APISuspendLongJobEvent,扩展 LongJob/LongJobFactory 接口以支持 suspend,LongJobFactoryImpl 与 LongJobManagerImpl 中实现能力检测与暂停执行及事件发布,LongJobUtils 增加可恢复错误判定并新增测试客户端辅助与错误码常量。

Changes

Cohort / File(s) 变更摘要
API 事件与消息
header/src/main/java/org/zstack/header/longjob/APISuspendLongJobEvent.java, header/src/main/java/org/zstack/header/longjob/APISuspendLongJobMsg.java
新增用于暂停操作的消息与事件类;APISuspendLongJobMsg 标注为 PUT /longjobs/{uuid}/actions 并产生 APISuspendLongJobEvent;包含 uuid 字段与示例方法。
接口扩展
header/src/main/java/org/zstack/header/longjob/LongJob.java, longjob/src/main/java/org/zstack/longjob/LongJobFactory.java
LongJob 接口新增默认 suspend(LongJobVO, ReturnValueCompletion<Boolean>) 方法;在 LongJobFactory 接口新增 supportSuspend(String jobName) 方法。
工厂实现
longjob/src/main/java/org/zstack/longjob/LongJobFactoryImpl.java
添加 notSupportSuspendJobType 集合并实现 supportSuspend;在行为检测中将默认 suspend 方法视为不支持标记,保持与 cancel/resume/clean 检测一致。
管理实现
longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java
在消息分发中增加 APISuspendLongJobMsg 分支;新增 handle(APISuspendLongJobMsg)suspendLongJob 流程(DB 查询、状态校验、能力检查、调用 Job.suspend、更新状态、发布事件)。
错误处理调整
longjob/src/main/java/org/zstack/longjob/LongJobUtils.java
将带有可恢复标识(opaque 字段 longJobRecoverable)的错误视为可恢复并映射为 suspend 路径;新增私有 isRecoverableError(ErrorCode) 与文档说明。
测试/客户端辅助
testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
新增 suspendLongJob 测试辅助方法,构建并调用 SuspendLongJobAction,保持与其他 API helper 一致的调用与错误处理模式。
错误码扩展
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
追加三个错误码常量:ORG_ZSTACK_AI_MESSAGE_10003ORG_ZSTACK_AI_MESSAGE_10032ORG_ZSTACK_AI_MESSAGE_10033

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as API
    participant Manager as LongJobManager
    participant Factory as LongJobFactory
    participant Job as LongJob
    participant DB as Database

    Client->>API: PUT /longjobs/{uuid}/actions (APISuspendLongJobMsg)
    API->>Manager: handle(APISuspendLongJobMsg)
    activate Manager
    Manager->>DB: 查询 LongJobVO (uuid)
    alt 已为 Suspended
        Manager->>API: 发布 APISuspendLongJobEvent (已暂停)
    else 非 Running
        Manager->>API: 发布 APISuspendLongJobEvent (状态不允许)
    else 运行中
        Manager->>Factory: supportSuspend(jobName)?
        alt 不支持
            Manager->>API: 发布 APISuspendLongJobEvent (不支持暂停)
        else 支持
            Manager->>Job: 调用 suspend(LongJobVO, completion)
            activate Job
            Job-->>Manager: 回调(成功/失败)
            deactivate Job
            Manager->>DB: 更新状态为 Suspended(成功时)
            Manager->>API: 发布 APISuspendLongJobEvent (成功或错误)
        end
    end
    deactivate Manager
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 轻踏步停工,长作业梦一瞥,
消息送云端,事件携新页,
管厂相互验,错误可复归,
兔子鼓掌跳,休憩又再来 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.44% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题遵循了要求的格式 [type][scope]: description,长度为54个字符,清晰描述了主要变更内容:引入暂停长任务消息功能。
Description check ✅ Passed 描述与变更集相关,提及了API影响、解决的问题ZSTAC-81237,以及GitLab同步来源,与长任务暂停功能的引入相关联。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java

Comment @coderabbitai help to get the list of available commands and usage tips.

@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/ZSTAC-81237@@2 branch 2 times, most recently from 6c098e1 to af47416 Compare January 25, 2026 03:56
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java`:
- Around line 484-501: The suspend path lacks the intermediate "Suspending"
state used by cancelLongJob, which can allow concurrent suspend calls to race;
before calling job.suspend(...) in suspendLongJob, set the intermediate state
(e.g., changeState(uuid, LongJobStateEvent.suspending) or the appropriate enum)
while holding the same synchronization (synchronized(jobUuids.intern(uuid))) as
cancelLongJob, then call job.suspend; on success move to Suspended (as now) and
on failure revert or record the proper state and propagate the error, mirroring
the cancelLongJob pattern (use the same locking, changeState calls, and error
handling around job.suspend to ensure consistency).
🧹 Nitpick comments (1)
longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java (1)

463-472: 可考虑优化:减少重复数据库查询

在第 464-466 行查询了 statejobName,但在第 469 行又执行了一次 findByUuid 获取完整的 LongJobVO。可以考虑在需要返回 VO 时直接查询完整对象,或者延迟到真正需要时再查询。

♻️ 建议的优化
     private void suspendLongJob(String uuid, ReturnValueCompletion<LongJobVO> completion) {
         Tuple t = Q.New(LongJobVO.class).eq(LongJobVO_.uuid, uuid).select(LongJobVO_.state, LongJobVO_.jobName).findTuple();
         LongJobState currentState = t.get(0, LongJobState.class);
         String jobName = t.get(1, String.class);
 
         if (currentState == LongJobState.Suspended) {
-            LongJobVO vo = dbf.findByUuid(uuid, LongJobVO.class);
-            completion.success(vo);
+            completion.success(dbf.findByUuid(uuid, LongJobVO.class));
             return;
         }

Comment on lines +484 to +501
LongJobVO vo = dbf.findByUuid(uuid, LongJobVO.class);
LongJob job = longJobFactory.getLongJob(vo.getJobName());

job.suspend(vo, new ReturnValueCompletion<Boolean>(completion) {
@Override
public void success(Boolean suspended) {
LongJobVO updatedVo = changeState(uuid, LongJobStateEvent.suspend);
logger.info(String.format("longjob [uuid:%s, name:%s] has been suspended", vo.getUuid(), vo.getName()));
completion.success(updatedVo);
}

@Override
public void fail(ErrorCode errorCode) {
logger.error(String.format("failed to suspend longjob [uuid:%s, name:%s]: %s", vo.getUuid(), vo.getName(), errorCode));
completion.fail(errorCode);
}
});
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# 查找 LongJobState 和 LongJobStateEvent 枚举定义
find . -name "*.java" -type f | xargs grep -l "enum LongJobState\|enum LongJobStateEvent" | head -5

Repository: MatheMatrix/zstack

Length of output: 253


🏁 Script executed:

# 在 LongJobManagerImpl.java 中查找 cancelLongJob 和其他状态转换方法
rg -n "cancelLongJob|suspendLongJob|resumeLongJob|changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -A 5 -B 1

Repository: MatheMatrix/zstack

Length of output: 6832


🏁 Script executed:

# 查找 LongJobState 和 LongJobStateEvent 的定义
rg -n "enum LongJobState|enum LongJobStateEvent" --type=java -A 25

Repository: MatheMatrix/zstack

Length of output: 4185


🏁 Script executed:

# 查看 Running 状态的所有可能转移
rg -n "Running\.transactions" header/src/main/java/org/zstack/header/longjob/LongJobState.java -A 10

Repository: MatheMatrix/zstack

Length of output: 750


🏁 Script executed:

# 查看 changeState 方法的实现
rg -n "private.*changeState|public.*changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -A 15

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 用不同方式查找 changeState 方法
rg -n "changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | head -20

Repository: MatheMatrix/zstack

Length of output: 824


🏁 Script executed:

# 查找 changeState 的完整实现
rg -n "def changeState|LongJobVO changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -A 20

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 使用 ast-grep 查找 changeState 方法
ast-grep --pattern 'LongJobVO changeState($_, $_) { $$$ }'

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 查找 changeState 方法定义,搜索返回类型
rg -n "changeState\(" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -B 2 | grep -E "private|public|protected" -A 2

Repository: MatheMatrix/zstack

Length of output: 1026


🏁 Script executed:

# 在整个文件中搜索方法签名
cat longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | grep -n "changeState" -B 5 | head -50

Repository: MatheMatrix/zstack

Length of output: 2176


🏁 Script executed:

# 直接查看文件的特定行范围,寻找 changeState 方法定义
wc -l longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java

Repository: MatheMatrix/zstack

Length of output: 131


🏁 Script executed:

# 查找 changeState 的完整定义(在文件后部分)
tail -400 longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | head -200 | grep -n "changeState" -B 5 -A 15

Repository: MatheMatrix/zstack

Length of output: 4088


🏁 Script executed:

# 搜索所有的方法定义,找到 changeState
rg -n "^\s*(private|public|protected|static).*changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 用 sed 查找 changeState 方法定义
sed -n '900,1005p' longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | grep -n "changeState" -B 5 -A 20

Repository: MatheMatrix/zstack

Length of output: 1126


🏁 Script executed:

# 使用 sed 直接查看文件特定行数范围
sed -n '750,850p' longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java

Repository: MatheMatrix/zstack

Length of output: 3523


🏁 Script executed:

# 搜索 LongJobVO 返回类型的方法定义
rg -n "LongJobVO.*changeState|changeState.*\{" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -B 1 -A 20 | head -80

Repository: MatheMatrix/zstack

Length of output: 4100


🏁 Script executed:

# 搜索 private/public 修饰的 changeState 方法
sed -n '800,900p' longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | grep -n "changeState\|private\|public"

Repository: MatheMatrix/zstack

Length of output: 513


🏁 Script executed:

# 用 awk 找到 changeState 方法的确切位置
awk '/private.*changeState|public.*changeState/{print NR":"$0; for(i=1;i<=30;i++){getline; print NR":"$0}}' longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | head -60

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 从后往前搜索 changeState 的方法定义
rg -n "^\s*private\s+LongJobVO\s+changeState|^\s*public\s+LongJobVO\s+changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -A 30

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 直接查看整个文件,搜索所有的 LongJobVO changeState 方法
grep -n "LongJobVO changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -A 25

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 搜索包含 changeState 的所有行
rg "changeState" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java -n

Repository: MatheMatrix/zstack

Length of output: 824


🏁 Script executed:

# 尝试搜索方法定义的其他模式
rg -n "changeState.*\(" longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | tail -20

Repository: MatheMatrix/zstack

Length of output: 824


🏁 Script executed:

# 查看文件末尾部分,changeState 可能定义在那里
tail -200 longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java

Repository: MatheMatrix/zstack

Length of output: 7196


🏁 Script executed:

# 查找 LongJobManagerImpl 的类定义和继承
head -50 longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java | grep -n "class\|extends\|implements"

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# 搜索所有文件中的 changeState 方法定义
rg -n "LongJobVO.*changeState\s*\(" --type=java -B 2 -A 20 | head -100

Repository: MatheMatrix/zstack

Length of output: 10796


可考虑添加中间状态以保持一致性,但当前设计可能依赖幂等性

对比 cancelLongJob 的实现,取消操作在调用 job.cancel() 之前会先设置 Canceling 中间状态(第 319 行),以防止并发请求干扰。而 suspendLongJob 在调用 job.suspend() 前没有设置中间状态,而是在成功后才通过 changeState() 改为 Suspended 状态(第 490 行)。

这意味着两个并发的 suspend 请求都可能通过 Running 状态检查(第 465 行)并调用 job.suspend()。不过值得注意的是:

  1. changeState() 方法使用了 synchronized (jobUuids.intern(uuid)) 进行同步化处理
  2. 状态机允许 Suspended → suspend → Suspended 的幂等转换

这表明当前设计可能依赖 suspend 操作的幂等性。建议明确文档说明 suspend 操作是否设计为幂等的,或考虑添加 Suspending 中间状态以与 cancel 的设计模式保持一致。

🤖 Prompt for AI Agents
In `@longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java` around
lines 484 - 501, The suspend path lacks the intermediate "Suspending" state used
by cancelLongJob, which can allow concurrent suspend calls to race; before
calling job.suspend(...) in suspendLongJob, set the intermediate state (e.g.,
changeState(uuid, LongJobStateEvent.suspending) or the appropriate enum) while
holding the same synchronization (synchronized(jobUuids.intern(uuid))) as
cancelLongJob, then call job.suspend; on success move to Suspended (as now) and
on failure revert or record the proper state and propagate the error, mirroring
the cancelLongJob pattern (use the same locking, changeState calls, and error
handling around job.suspend to ensure consistency).

@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/ZSTAC-81237@@2 branch 2 times, most recently from a5268ac to 0ac0d2c Compare January 25, 2026 06:16
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
longjob/src/main/java/org/zstack/longjob/LongJobFactory.java (1)

10-15: 新增接口方法建议补充 Javadoc
Line 14 新增接口方法但未补充 Javadoc,建议按规范补齐以便生成文档与保持一致性。

📝 建议修改
     boolean supportCancel(String jobName);
+    /**
+     * `@param` jobName long job name
+     * `@return` true if suspend is supported for the job type
+     */
     boolean supportSuspend(String jobName);

As per coding guidelines, ...

♻️ Duplicate comments (1)
longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java (1)

484-501: cancelLongJob 相比缺少中间状态的问题已在之前的评审中提出。

当前实现在调用 job.suspend() 之前没有设置中间状态(如 Suspending),而 cancelLongJob 会在第 319 行先设置 Canceling 状态。虽然 changeState 内部使用了 synchronized(jobUuids.intern(uuid)) 进行同步,但这可能依赖于 suspend 操作的幂等性设计。

建议确认 suspend 操作是否设计为幂等的,或考虑添加 Suspending 中间状态以与 cancel 模式保持一致。

🧹 Nitpick comments (1)
longjob/src/main/java/org/zstack/longjob/LongJobUtils.java (1)

172-189: 建议将 "longJobRecoverable" 提取为常量
Line 185 使用字符串字面量作为跨模块约定的 key,存在拼写不一致风险。建议提为常量(必要时可公开)以减少魔法值。

♻️ 建议修改
 public class LongJobUtils {
     private static final CLogger logger = Utils.getLogger(LongJobUtils.class);
+    public static final String LONG_JOB_RECOVERABLE_OPAQUE_KEY = "longJobRecoverable";
@@
-        Object recoverable = errorCode.getFromOpaque("longJobRecoverable");
+        Object recoverable = errorCode.getFromOpaque(LONG_JOB_RECOVERABLE_OPAQUE_KEY);

As per coding guidelines, ...

@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/ZSTAC-81237@@2 branch from 0ac0d2c to 8d07640 Compare January 25, 2026 06:28
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
longjob/src/main/java/org/zstack/longjob/LongJobUtils.java (1)

160-169: 检查可恢复标记在聚合错误的递归传播路径

ErrorCodeList extends ErrorCode 并包含 List<ErrorCode> causes 字段,若业务侧将 longJobRecoverable 标记设置在 causes 列表中的错误对象的 opaque 字段上,当前的 isRecoverableError 方法(仅检查顶层 getFromOpaque)无法检测到该标记,导致判定为 fail 而非 suspend。建议补充递归遍历 causes 的逻辑,或在接口文档中明确要求该标记必须设置在顶层错误的 opaque 字段上。

🤖 Fix all issues with AI agents
In `@longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java`:
- Around line 487-493: The success handler for job.suspend currently ignores the
suspended Boolean and always transitions the job to Suspended; modify the
anonymous ReturnValueCompletion<Boolean> in the job.suspend(...) call so it
checks the suspended value and only calls changeState(uuid,
LongJobStateEvent.suspend) and completion.success(...) when suspended == true,
and otherwise call completion.fail(...) or completion.success(vo) (to keep
original state) per the method contract; update the success(Boolean suspended)
method in that block and ensure logging reflects whether the suspend actually
succeeded.
🧹 Nitpick comments (1)
longjob/src/main/java/org/zstack/longjob/LongJobUtils.java (1)

182-186: 将 opaque key 抽成常量,避免魔法字符串
建议把 "longJobRecoverable" 提取为私有常量,降低拼写风险和复用成本。

♻️ 建议修改
 public class LongJobUtils {
     private static final CLogger logger = Utils.getLogger(LongJobUtils.class);
+    private static final String LONG_JOB_RECOVERABLE_OPAQUE_KEY = "longJobRecoverable";
@@
-        Object recoverable = errorCode.getFromOpaque("longJobRecoverable");
+        Object recoverable = errorCode.getFromOpaque(LONG_JOB_RECOVERABLE_OPAQUE_KEY);

Comment on lines +487 to +493
job.suspend(vo, new ReturnValueCompletion<Boolean>(completion) {
@Override
public void success(Boolean suspended) {
LongJobVO updatedVo = changeState(uuid, LongJobStateEvent.suspend);
logger.info(String.format("longjob [uuid:%s, name:%s] has been suspended", vo.getUuid(), vo.getName()));
completion.success(updatedVo);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

必须处理 suspended 返回值,否则可能错误地标记为 Suspended。
当前无论 job.suspend(...) 返回 true/false 都会切到 Suspended。若实现返回 false 表示未真正暂停,这里会造成状态错误。

建议仅在 suspended == true 时变更状态,或按契约将 false 视为失败/保持原状态。

🔧 建议修正示例
job.suspend(vo, new ReturnValueCompletion<Boolean>(completion) {
    `@Override`
    public void success(Boolean suspended) {
-        LongJobVO updatedVo = changeState(uuid, LongJobStateEvent.suspend);
-        logger.info(String.format("longjob [uuid:%s, name:%s] has been suspended", vo.getUuid(), vo.getName()));
-        completion.success(updatedVo);
+        if (Boolean.TRUE.equals(suspended)) {
+            LongJobVO updatedVo = changeState(uuid, LongJobStateEvent.suspend);
+            logger.info(String.format("longjob [uuid:%s, name:%s] has been suspended", vo.getUuid(), vo.getName()));
+            completion.success(updatedVo);
+        } else {
+            logger.info(String.format("longjob [uuid:%s, name:%s] suspend returned false; keep state unchanged", vo.getUuid(), vo.getName()));
+            completion.success(dbf.findByUuid(uuid, LongJobVO.class));
+        }
    }
🤖 Prompt for AI Agents
In `@longjob/src/main/java/org/zstack/longjob/LongJobManagerImpl.java` around
lines 487 - 493, The success handler for job.suspend currently ignores the
suspended Boolean and always transitions the job to Suspended; modify the
anonymous ReturnValueCompletion<Boolean> in the job.suspend(...) call so it
checks the suspended value and only calls changeState(uuid,
LongJobStateEvent.suspend) and completion.success(...) when suspended == true,
and otherwise call completion.fail(...) or completion.success(vo) (to keep
original state) per the method contract; update the success(Boolean suspended)
method in that block and ensure logging reflects whether the suspend actually
succeeded.

APIImpact

Resolves: ZSTAC-81237

Change-Id: I6f7262716c7075637373646b726a796c73737563
Signed-off-by: AlanJager <ye.zou@zstack.io>
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/ZSTAC-81237@@2 branch from 8d07640 to 50a007a Compare January 25, 2026 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants