
add support for qwen3.5 vl model #693

Open

RobotSail wants to merge 1 commit into instructlab:main from RobotSail:add-qwen3-vl-support

Conversation

@RobotSail (Member) commented Mar 3, 2026

This PR adds support for the qwen3.5 vl model.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of M-RoPE models with automatic attention implementation selection and SDPA fallback.
    • Enhanced multi-module training configuration validation with better error handling.

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

These changes enhance FSDP configuration and Flash Attention 2 handling in the training module. The accelerator now resolves all entries in _no_split_modules to support multi-target module wrapping. The model initialization adds M-RoPE detection to conditionally disable Flash Attention 2 when incompatible, falling back to SDPA.

Changes

FSDP Configuration (src/instructlab/training/accelerator.py):
Updated get_fsdp_config to iterate over all entries in _no_split_modules, resolving each to a class and collecting the results into a set instead of using a single class. Added error handling to raise ValueError if no classes resolve successfully.

Flash Attention Compatibility (src/instructlab/training/model.py):
Added M-RoPE detection via a new _has_mrope() static method that inspects the model config. Flash Attention 2 is now conditionally disabled when M-RoPE is detected, with a fallback to SDPA and a warning message.
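The attention-selection behavior described above can be sketched as follows. This is a minimal illustration based on the summary, not the PR's actual code; the function name and the shape of `rope_scaling` are assumptions here.

```python
# Minimal sketch (not the PR's actual code) of the attention-implementation
# selection the summary describes: M-RoPE models fall back to SDPA because
# Flash Attention 2 is incompatible with them.
def select_attn_implementation(rope_scaling):
    """Return "sdpa" for M-RoPE configs, else "flash_attention_2".

    `rope_scaling` mirrors the HF config field; M-RoPE configs carry an
    "mrope_section" key (as in Qwen-VL-style models).
    """
    if isinstance(rope_scaling, dict) and "mrope_section" in rope_scaling:
        return "sdpa"
    return "flash_attention_2"
```

For example, a Qwen-VL-style config with `rope_scaling={"mrope_section": [16, 24, 24]}` selects SDPA, while a plain decoder config with no rope_scaling keeps Flash Attention 2.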

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppy new logic in layers so grand,
More modules unwrapped across the land,
M-RoPE detection helps us see,
Which attention engine we should be!
Flash or SDPA, we choose with care,
Training runs smooth through the developer's prayer! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

Title Check: ✅ Passed. The title describes adding support for a specific model (qwen3.5 vl), which aligns with the PR description objective. However, the code changes also modify the Flash Attention 2 logic for M-RoPE detection and update FSDP config handling; these implementation details support the broader goal but are not explicitly captured in the simple title.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@mergify mergify bot added the ci-failure label Mar 3, 2026
coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/instructlab/training/accelerator.py`:
- Around line 148-157: The code currently raises a ValueError when no classes
are resolved from self.model._no_split_modules; instead, try resolving names
against the underlying HF model (e.g., check getattr(self.model, "model", None)
or getattr(self.model, "hf_model", None) and call get_module_class_from_name
against that if the first pass yields nothing), and if still no layer_classes,
do not raise—instead emit a clear warning (use warnings.warn or the module
logger) that resolution failed and that FSDP will fall back to a safe wrap
policy, then continue with an empty layer_classes so the downstream FSDP wrap
logic uses the safe/default policy. Ensure you reference
get_module_class_from_name, self.model._no_split_modules, and layer_classes in
the change.
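The resolution-with-fallback flow this prompt describes can be sketched as below. The resolver is passed in so the logic is self-contained; attribute names like `model.model` and `hf_model` are the prompt's suggestions, not confirmed repo internals.

```python
import logging

logger = logging.getLogger(__name__)

def resolve_layer_classes(model, get_module_class_from_name):
    """Resolve _no_split_modules names to classes, retrying on wrapped models.

    Returns an empty set (with a warning) instead of raising, so the caller
    can fall back to a safe default FSDP wrap policy.
    """
    # Try the wrapper first, then the underlying HF model attributes the
    # prompt suggests checking.
    candidates = [model, getattr(model, "model", None), getattr(model, "hf_model", None)]
    for target in candidates:
        if target is None:
            continue
        names = getattr(target, "_no_split_modules", None) or []
        classes = {
            cls
            for cls in (get_module_class_from_name(target, name) for name in names)
            if cls is not None
        }
        if classes:
            return classes
    logger.warning(
        "Could not resolve _no_split_modules to classes; "
        "FSDP will fall back to its default wrap policy."
    )
    return set()
```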

In `@src/instructlab/training/model.py`:
- Around line 125-131: The _has_mrope() helper should catch config-load errors
and validate types: wrap AutoConfig.from_pretrained(model_path) in a try/except
and return False on any exception, then retrieve rope_scaling safely by checking
attributes and types (use getattr(config, "rope_scaling", None) and
getattr(getattr(config, "text_config", None), "rope_scaling", None) but ensure
the resulting rope_scaling is a dict or has __contains__ before testing
"mrope_section"); if type checks fail, return False so M-RoPE detection is a
safe, non-fatal fallback.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f02ea6 and a1e6782.

📒 Files selected for processing (2)
  • src/instructlab/training/accelerator.py
  • src/instructlab/training/model.py

Comment on lines +148 to +157

```python
            for block_name in self.model._no_split_modules:
                cls = get_module_class_from_name(self.model, block_name)
                if cls is not None:
                    layer_classes.add(cls)
            if not layer_classes:
                raise ValueError(
                    f"Could not resolve any _no_split_modules "
                    f"({self.model._no_split_modules}) to actual module classes "
                    f"in the model."
                )
```

⚠️ Potential issue | 🔴 Critical

Avoid hard-failing FSDP init when _no_split_modules resolution misses.

The new ValueError at Line 153 is currently blocking training (matches the pipeline failure). Resolve names against the underlying HF model and fall back to a safe wrap policy with a warning instead of aborting.

Proposed fix:

```diff
-            layer_classes = set()
-            for block_name in self.model._no_split_modules:
-                cls = get_module_class_from_name(self.model, block_name)
+            hf_model = self.model.model
+            no_split_modules = getattr(hf_model, "_no_split_modules", []) or []
+            layer_classes = set()
+            for block_name in no_split_modules:
+                cls = get_module_class_from_name(hf_model, block_name)
                 if cls is not None:
                     layer_classes.add(cls)
-            if not layer_classes:
-                raise ValueError(
-                    f"Could not resolve any _no_split_modules "
-                    f"({self.model._no_split_modules}) to actual module classes "
-                    f"in the model."
-                )
-            wrap_policy = partial(
-                transformer_auto_wrap_policy,
-                transformer_layer_cls=layer_classes,
-            )
+            if not layer_classes:
+                logger.warning(
+                    "Could not resolve _no_split_modules=%s to concrete classes; "
+                    "falling back to fsdp_auto_wrap_policy.",
+                    no_split_modules,
+                )
+                wrap_policy = fsdp_auto_wrap_policy(hf_model)
+            else:
+                wrap_policy = partial(
+                    transformer_auto_wrap_policy,
+                    transformer_layer_cls=layer_classes,
+                )
```
🧰 Tools
🪛 GitHub Actions: Run unit tests via Tox::pytest

[error] 153-153: Could not resolve any _no_split_modules ({'LlamaDecoderLayer'}) to actual module classes in the model during get_fsdp_config().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/instructlab/training/accelerator.py` around lines 148 - 157, The code
currently raises a ValueError when no classes are resolved from
self.model._no_split_modules; instead, try resolving names against the
underlying HF model (e.g., check getattr(self.model, "model", None) or
getattr(self.model, "hf_model", None) and call get_module_class_from_name
against that if the first pass yields nothing), and if still no layer_classes,
do not raise—instead emit a clear warning (use warnings.warn or the module
logger) that resolution failed and that FSDP will fall back to a safe wrap
policy, then continue with an empty layer_classes so the downstream FSDP wrap
logic uses the safe/default policy. Ensure you reference
get_module_class_from_name, self.model._no_split_modules, and layer_classes in
the change.

Comment on lines +125 to +131
from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_path)
rope_scaling = getattr(config, "rope_scaling", None) or getattr(
getattr(config, "text_config", None), "rope_scaling", None
)
return rope_scaling is not None and "mrope_section" in rope_scaling

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify current _has_mrope implementation
rg -n -C3 'def _has_mrope|AutoConfig.from_pretrained|mrope_section|except' src/instructlab/training/model.py
```

Repository: instructlab/training
Length of output: 1766

🏁 Script executed:

```shell
#!/bin/bash
# Find where _has_mrope is called
rg -n '_has_mrope' src/instructlab/training/model.py
```

Repository: instructlab/training
Length of output: 168

🏁 Script executed:

```shell
#!/bin/bash
# Get context around line 103 where _has_mrope is called
sed -n '90,120p' src/instructlab/training/model.py | cat -n
```

Repository: instructlab/training
Length of output: 1836

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any try-except wrapper in the __init__ or _setup_flash_attention method
# Get more context to see the full method structure
sed -n '75,135p' src/instructlab/training/model.py | cat -n
```

Repository: instructlab/training
Length of output: 3239


Guard _has_mrope() against config-load failures.

Line 127 performs an external config load without error handling. A transient Hub/local-config failure will crash model initialization even though M-RoPE detection is an optional optimization for flash attention.

Wrap the AutoConfig.from_pretrained() call in a try-except block and return False on failure (safe fallback). Additionally, add type validation before accessing rope_scaling as a dict to prevent runtime errors from unexpected config structures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/instructlab/training/model.py` around lines 125 - 131, The _has_mrope()
helper should catch config-load errors and validate types: wrap
AutoConfig.from_pretrained(model_path) in a try/except and return False on any
exception, then retrieve rope_scaling safely by checking attributes and types
(use getattr(config, "rope_scaling", None) and getattr(getattr(config,
"text_config", None), "rope_scaling", None) but ensure the resulting
rope_scaling is a dict or has __contains__ before testing "mrope_section"); if
type checks fail, return False so M-RoPE detection is a safe, non-fatal
fallback.
