Skip to content

feat(mcp): enhance logging and initialize MCP clients in background#5993

Merged
Soulter merged 4 commits intomasterfrom
fix/mcp
Mar 10, 2026
Merged

feat(mcp): enhance logging and initialize MCP clients in background#5993
Soulter merged 4 commits intomasterfrom
fix/mcp

Conversation

@Soulter
Copy link
Member

@Soulter Soulter commented Mar 10, 2026

fixes: #5777

Modifications / 改动点

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果


Checklist / 检查清单

  • 😊 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。/ If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
  • 👀 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。/ My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到了 requirements.txtpyproject.toml 文件相应位置。/ I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 我的更改没有引入恶意代码。/ My changes do not introduce malicious code.

Summary by Sourcery

在后台异步初始化 MCP 客户端,并改进与 MCP 相关的日志和超时机制。

新功能:

  • 在后台任务中运行 MCP 客户端初始化,避免阻塞 provider 启动。

增强改进:

  • 延长默认 MCP 初始化时间并启用超时设置,以更好地应对响应缓慢的 MCP 端点。
  • 优化 MCP 日志消息,统一使用英文文案,并筛选服务器日志通知,仅记录高严重级别的问题。
  • 确保在 provider 终止期间,MCP 后台初始化任务能够被干净地取消。
Original summary in English

Summary by Sourcery

Initialize MCP clients asynchronously in the background and improve MCP-related logging and timeouts.

New Features:

  • Run MCP client initialization in a background task to avoid blocking provider startup.

Enhancements:

  • Extend default MCP initialization and enable timeouts to better tolerate slow MCP endpoints.
  • Refine MCP logging messages, use English copy, and filter server log notifications to record only high-severity issues.
  • Ensure MCP background initialization task is cancelled cleanly during provider termination.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Mar 10, 2026
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了 1 个问题,并给出了一些整体反馈:

  • init_mcp_clients 移到后台任务后,ASTRBOT_MCP_INIT_STRICT 在 MCP 完全失败时不再真正阻塞启动;如果严格模式本意是强制在失败时立刻终止,那么可以考虑在严格模式下保留一条阻塞式路径,或者显式更新文档/重命名该标志位以匹配新的行为。
  • 在两个 MCP 的 logging_callback 处理函数中,现在都接受 str | LoggingMessageNotificationParams,但只处理结构化的通知类型;如果 MCP 库未来发送纯字符串日志,它们会被直接忽略。建议要么处理 str 情况,要么收窄回调类型。
  • LogPipe 的日志级别从 ERROR 改成了 INFO,这会把所有 MCP 服务器日志透传到主日志记录器;请确认这不会在生产环境中淹没日志,并考虑只放行 warning 及以上,或者更精细地映射 MCP 的日志级别。
给 AI Agent 的提示词
Please address the comments from this code review:

## Overall Comments
- Moving `init_mcp_clients` into a background task means `ASTRBOT_MCP_INIT_STRICT` no longer actually blocks startup on complete MCP failure; if strict mode is meant to enforce a hard failure, consider either keeping a blocking path for strict or explicitly documenting/renaming the flag to match the new behavior.
- In both MCP `logging_callback` handlers you now accept `str | LoggingMessageNotificationParams` but only process the structured notification type; if the MCP library ever sends plain strings they will be silently ignored, so consider either handling the `str` case or narrowing the callback type.
- The `LogPipe` level was changed from `ERROR` to `INFO`, which will surface all MCP server logs into your main logger; double‑check that this won't flood logs in production and consider filtering to warning+ or mapping MCP levels more selectively.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/manager.py" line_range="343-352" />
<code_context>
-                "MCP 服务全部初始化失败,系统将继续启动(可设置 "
-                "ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
+
+        async def _init_mcp_clients_bg() -> None:
+            try:
+                mcp_init_summary = await self.llm_tools.init_mcp_clients(
+                    raise_on_all_failed=strict_mcp_init
+                )
+                if (
+                    mcp_init_summary.total > 0
+                    and mcp_init_summary.success == 0
+                    and not strict_mcp_init
+                ):
+                    logger.warning(
+                        "MCP 服务全部初始化失败,系统将继续启动(可设置 "
+                        "ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
+                    )
+            except Exception:
+                logger.error("MCP 后台初始化失败", exc_info=True)
+
+        if self._mcp_init_task is None or self._mcp_init_task.done():
</code_context>
<issue_to_address>
**issue (bug_risk):** 严格的 MCP 初始化标志不再像之前那样终止启动或向外传播失败。

在新的后台任务实现下,`initialize()` 不再等待 `init_mcp_clients` 完成。即使设置了 `raise_on_all_failed=True`,异常也会在 `_init_mcp_clients_bg` 内部被捕获并仅记录日志,所以启动总是会成功。这把 `ASTRBOT_MCP_INIT_STRICT` 的语义从“在 MCP 初始化失败时快速失败”变成了“异步记录失败日志”。为了保留严格模式的语义,可以在 `strict_mcp_init` 为 true 时改为同步执行初始化,或者允许异常从任务中向外抛出,并在严格模式启用时在 `initialize()` 中显式地等待 `_mcp_init_task`。
</issue_to_address>

Sourcery 对开源项目永久免费——如果你觉得这次 Review 有帮助,欢迎分享 ✨
帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进后续的 Review。
Original comment in English

Hey - I've found 1 issue, and left some high level feedback:

  • Moving init_mcp_clients into a background task means ASTRBOT_MCP_INIT_STRICT no longer actually blocks startup on complete MCP failure; if strict mode is meant to enforce a hard failure, consider either keeping a blocking path for strict or explicitly documenting/renaming the flag to match the new behavior.
  • In both MCP logging_callback handlers you now accept str | LoggingMessageNotificationParams but only process the structured notification type; if the MCP library ever sends plain strings they will be silently ignored, so consider either handling the str case or narrowing the callback type.
  • The LogPipe level was changed from ERROR to INFO, which will surface all MCP server logs into your main logger; double‑check that this won't flood logs in production and consider filtering to warning+ or mapping MCP levels more selectively.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Moving `init_mcp_clients` into a background task means `ASTRBOT_MCP_INIT_STRICT` no longer actually blocks startup on complete MCP failure; if strict mode is meant to enforce a hard failure, consider either keeping a blocking path for strict or explicitly documenting/renaming the flag to match the new behavior.
- In both MCP `logging_callback` handlers you now accept `str | LoggingMessageNotificationParams` but only process the structured notification type; if the MCP library ever sends plain strings they will be silently ignored, so consider either handling the `str` case or narrowing the callback type.
- The `LogPipe` level was changed from `ERROR` to `INFO`, which will surface all MCP server logs into your main logger; double‑check that this won't flood logs in production and consider filtering to warning+ or mapping MCP levels more selectively.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/manager.py" line_range="343-352" />
<code_context>
-                "MCP 服务全部初始化失败,系统将继续启动(可设置 "
-                "ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
+
+        async def _init_mcp_clients_bg() -> None:
+            try:
+                mcp_init_summary = await self.llm_tools.init_mcp_clients(
+                    raise_on_all_failed=strict_mcp_init
+                )
+                if (
+                    mcp_init_summary.total > 0
+                    and mcp_init_summary.success == 0
+                    and not strict_mcp_init
+                ):
+                    logger.warning(
+                        "MCP 服务全部初始化失败,系统将继续启动(可设置 "
+                        "ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
+                    )
+            except Exception:
+                logger.error("MCP 后台初始化失败", exc_info=True)
+
+        if self._mcp_init_task is None or self._mcp_init_task.done():
</code_context>
<issue_to_address>
**issue (bug_risk):** Strict MCP init flag no longer aborts startup or propagates failures as before.

With the new background task, `initialize()` no longer waits on `init_mcp_clients`. Even with `raise_on_all_failed=True`, exceptions are caught inside `_init_mcp_clients_bg` and only logged, so startup always succeeds. This changes `ASTRBOT_MCP_INIT_STRICT` from "fail fast on MCP init failures" to "log failures asynchronously". To preserve the strict semantics, either run init synchronously when `strict_mcp_init` is true, or allow exceptions to propagate from the task and explicitly await `_mcp_init_task` in `initialize()` when strict mode is enabled.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@dosubot
Copy link

dosubot bot commented Mar 10, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

AstrBotTeam's Space

pr4697的改动
View Suggested Changes
@@ -359,6 +359,48 @@
 
 #### 逻辑改进
 工具注册和配置加载逻辑已优化,确保子代理配置的正确性和工具的动态注册。FunctionTool 新增 `is_background_task` 属性,支持异步后台任务。
+
+#### MCP 客户端初始化(PR #5993)
+
+[PR #5993](https://github.com/AstrBotDevs/AstrBot/pull/5993) 对 MCP(Model Context Protocol)客户端初始化流程进行了重大优化,从阻塞系统启动改为后台异步初始化,提升了系统启动速度和容错能力。
+
+**后台初始化机制:**
+- MCP 客户端初始化现在在提供商加载完成后异步执行,不再阻塞系统启动
+- 初始化任务通过 `_init_mcp_clients_bg()` 后台任务管理,由 `ProviderManager._mcp_init_task` 追踪
+- 系统启动时 MCP 工具可能尚未完全可用,但不影响其他功能正常运行
+- 后台任务在系统终止时会被正确取消,确保资源清理
+
+**超时配置调整:**
+- `DEFAULT_MCP_INIT_TIMEOUT_SECONDS`:从 20.0 秒增加到 180.0 秒
+- `DEFAULT_ENABLE_MCP_TIMEOUT_SECONDS`:从 30.0 秒增加到 180.0 秒
+- 超时值可通过以下环境变量自定义:
+  - `ASTRBOT_MCP_INIT_TIMEOUT`:MCP 服务初始化超时时间
+  - `ASTRBOT_MCP_ENABLE_TIMEOUT`:MCP 服务启用超时时间
+  - 最大超时限制:300 秒
+
+**增强的错误日志系统:**
+- MCP 服务器错误日志现在使用结构化的 `LoggingMessageNotificationParams`,支持日志级别过滤
+- 仅记录严重级别的日志(warning、error、critical、alert、emergency)到 `server_errlogs`
+- 错误消息格式:`[LEVEL] message_data`(例如:`[ERROR] Connection refused`)
+- stdio 传输日志级别从 ERROR 调整为 INFO,提供更详细的运行时信息
+
+**消息本地化更新:**
+所有 MCP 相关的日志和错误消息已从中文更新为英文,确保国际化一致性:
+- "MCP 服务初始化失败" → "Failed to initialize MCP server"
+- "MCP 服务初始化超时" → "Connected to MCP server timeout"
+- "已连接 MCP 服务" → "Connected to MCP server"
+- "已关闭 MCP 服务" → "Disconnected from MCP server"
+- "全部 MCP 服务初始化失败" → "All MCP services failed to initialize"
+
+**生命周期管理:**
+- `ProviderManager.terminate()` 方法现在会在终止前取消活跃的 MCP 初始化任务
+- 取消逻辑通过 `asyncio.CancelledError` 安全处理,避免资源泄漏
+- 后台任务失败不会导致系统崩溃,错误通过日志记录并继续启动流程
+
+**严格模式支持:**
+- 环境变量 `ASTRBOT_MCP_INIT_STRICT` 可配置为 `1`、`true`、`yes` 或 `on` 启用严格模式
+- 严格模式下,如果所有 MCP 服务初始化失败,系统会中止启动
+- 非严格模式下(默认),MCP 初始化失败仅记录警告,系统继续启动
 
 #### 后台任务执行流程(PR #5722)
 

[Accept] [Decline]

Note: You must be authenticated to accept/decline updates.

How did I do? Any feedback?  Join Discord

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and startup efficiency of the system's integration with MCP (Multi-Client Protocol) services. By moving MCP client initialization to a non-blocking background task, the system can start faster, even when MCP endpoints are slow. It also improves the reliability of MCP connections by extending timeouts and provides clearer operational insights through refined, English-only logging for MCP-related events.

Highlights

  • Asynchronous MCP Client Initialization: MCP clients are now initialized in a background task, preventing startup delays caused by slow or unavailable MCP endpoints.
  • Extended MCP Timeouts: Default initialization and enable timeouts for MCP clients have been significantly increased to better accommodate potentially slow MCP servers.
  • Enhanced MCP Logging: Logging for MCP server errors has been refined to filter messages, only recording high-severity issues, and all related log messages have been translated to English for consistency.
  • Clean Background Task Cancellation: The background MCP initialization task is now properly cancelled and awaited during provider termination, ensuring a clean shutdown.
Changelog
  • astrbot/core/agent/mcp_client.py
    • Updated logging_callback to process mcp.types.LoggingMessageNotificationParams and filter logs, only appending messages with levels "warning", "error", "critical", "alert", or "emergency" to server_errlogs.
    • Modified LogPipe's level from logging.ERROR to logging.INFO when creating stdio_client to allow more detailed logging.
  • astrbot/core/provider/func_tool_manager.py
    • Increased DEFAULT_MCP_INIT_TIMEOUT_SECONDS and DEFAULT_ENABLE_MCP_TIMEOUT_SECONDS from 20.0 and 30.0 to 180.0 seconds, respectively.
    • Translated several error and info messages related to MCP server initialization, connection, and shutdown from Chinese to English.
  • astrbot/core/provider/manager.py
    • Added _mcp_init_task as an asyncio.Task to manage background MCP initialization.
    • Refactored the initialize method to create and run init_mcp_clients as a background task, preventing startup blocking.
    • Implemented logic in the terminate method to cancel and await the _mcp_init_task if it is still running, ensuring proper cleanup.
Activity
  • No human activity (comments, reviews, progress updates) has been recorded for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully moves the MCP client initialization to a background task, which is a great improvement to avoid blocking application startup. The changes also enhance logging by translating many messages to English and adding more structure to MCP server logs. My review includes suggestions to address a few remaining untranslated log messages for consistency, improve the clarity of some log phrasings, and refactor a piece of duplicated code to enhance maintainability.

Comment on lines +221 to +226
def callback(msg: str | mcp.types.LoggingMessageNotificationParams) -> None:
# Handle MCP service error logs
self.server_errlogs.append(msg)
if isinstance(msg, mcp.types.LoggingMessageNotificationParams):
if msg.level in (
"warning",
"error",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's significant code duplication between this callback function and the logging_callback function defined earlier in this file (lines 147-154). To improve maintainability and reduce redundancy, consider extracting this logic into a shared method on the MCPClient class.

if name in self._mcp_server_runtime or name in self._mcp_starting:
logger.warning(
f"MCP 服务 {name} 已在运行,忽略本次启用请求(timeout={timeout:g})。"
f"Connected to MCP server {name}, ignoring this startup request (timeout={timeout:g})."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The log message Connected to MCP server {name}, ignoring this startup request can be a bit misleading, as this code path is taken when the server is already running or in the process of starting. A more accurate message would be something like MCP server {name} is already running or starting, ignoring this startup request.

Suggested change
f"Connected to MCP server {name}, ignoring this startup request (timeout={timeout:g})."
f"MCP server {name} is already running or starting, ignoring this startup request (timeout={timeout:g})."

except asyncio.TimeoutError as exc:
raise MCPInitTimeoutError(
f"MCP 服务 {name} 初始化超时({timeout:g} 秒)"
f"Connected to MCP server {name} timeout ({timeout:g} seconds)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The phrasing Connected to MCP server {name} timeout is a bit awkward. A clearer message would be Connection to MCP server {name} timed out.

Suggested change
f"Connected to MCP server {name} timeout ({timeout:g} seconds)"
f"Connection to MCP server {name} timed out ({timeout:g} seconds)"

Comment on lines +354 to +355
"MCP 服务全部初始化失败,系统将继续启动(可设置 "
"ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This warning message is still in Chinese. To maintain consistency with the other logging changes in this pull request, it should be translated to English.

Suggested change
"MCP 服务全部初始化失败,系统将继续启动(可设置 "
"ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
"All MCP services failed to initialize. The system will continue to start. "
"(Set ASTRBOT_MCP_INIT_STRICT=1 to abort startup in this scenario)."

"ASTRBOT_MCP_INIT_STRICT=1 以在此场景下中止启动)。"
)
except Exception:
logger.error("MCP 后台初始化失败", exc_info=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This error message is still in Chinese. It should be translated to English for consistency.

Suggested change
logger.error("MCP 后台初始化失败", exc_info=True)
logger.error("MCP background initialization failed", exc_info=True)

@Soulter Soulter merged commit 3379587 into master Mar 10, 2026
1 check passed
@Soulter Soulter deleted the fix/mcp branch March 10, 2026 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant