
add support for qwen3.5 vl model #693

Open

RobotSail wants to merge 1 commit into instructlab:main from RobotSail:add-qwen3-vl-support

Conversation

@RobotSail (Member) commented Mar 3, 2026

This PR adds support for the qwen3.5 vl model.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of M-RoPE models with automatic attention implementation selection and SDPA fallback.
    • Enhanced multi-module training configuration validation with better error handling.

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

These changes enhance FSDP configuration and Flash Attention 2 handling in the training module. The accelerator now resolves all entries in _no_split_modules to support multi-target module wrapping. The model initialization adds M-RoPE detection to conditionally disable Flash Attention 2 when incompatible, falling back to SDPA.

Changes

FSDP Configuration (src/instructlab/training/accelerator.py):
Updated get_fsdp_config to iterate over all entries in _no_split_modules, resolving each to a class and collecting the results into a set instead of using a single class. Added error handling to raise ValueError if no classes resolve successfully.

Flash Attention Compatibility (src/instructlab/training/model.py):
Added M-RoPE detection via a new _has_mrope() static method that inspects the model config. Flash Attention 2 is now conditionally disabled when M-RoPE is detected, with a fallback to SDPA and a warning message.
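The attention-selection behavior described above can be sketched as follows. This is a minimal illustration based on the summary, not the PR's actual code; the function name and the shape of `rope_scaling` are assumptions here.

```python
# Minimal sketch (not the PR's actual code) of the attention-implementation
# selection the summary describes: M-RoPE models fall back to SDPA because
# Flash Attention 2 is incompatible with them.
def select_attn_implementation(rope_scaling):
    """Return "sdpa" for M-RoPE configs, else "flash_attention_2".

    `rope_scaling` mirrors the HF config field; M-RoPE configs carry an
    "mrope_section" key (as in Qwen-VL-style models).
    """
    if isinstance(rope_scaling, dict) and "mrope_section" in rope_scaling:
        return "sdpa"
    return "flash_attention_2"
```

For example, a Qwen-VL-style config with `rope_scaling={"mrope_section": [16, 24, 24]}` selects SDPA, while a plain decoder config with no rope_scaling keeps Flash Attention 2.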

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppy new logic in layers so grand,
More modules unwrapped across the land,
M-RoPE detection helps us see,
Which attention engine we should be!
Flash or SDPA, we choose with care,
Training runs smooth through the developer's prayer! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

Title Check: ✅ Passed. The title describes adding support for a specific model (qwen3.5 vl), which aligns with the PR description objective. However, the code changes also modify the Flash Attention 2 logic for M-RoPE detection and update FSDP config handling; these implementation details support the broader goal but are not explicitly captured in the simple title.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@mergify mergify bot added the ci-failure label Mar 3, 2026
coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/instructlab/training/accelerator.py`:
- Around line 148-157: The code currently raises a ValueError when no classes
are resolved from self.model._no_split_modules; instead, try resolving names
against the underlying HF model (e.g., check getattr(self.model, "model", None)
or getattr(self.model, "hf_model", None) and call get_module_class_from_name
against that if the first pass yields nothing), and if still no layer_classes,
do not raise—instead emit a clear warning (use warnings.warn or the module
logger) that resolution failed and that FSDP will fall back to a safe wrap
policy, then continue with an empty layer_classes so the downstream FSDP wrap
logic uses the safe/default policy. Ensure you reference
get_module_class_from_name, self.model._no_split_modules, and layer_classes in
the change.
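The resolution-with-fallback flow this prompt describes can be sketched as below. The resolver is passed in so the logic is self-contained; attribute names like `model.model` and `hf_model` are the prompt's suggestions, not confirmed repo internals.

```python
import logging

logger = logging.getLogger(__name__)

def resolve_layer_classes(model, get_module_class_from_name):
    """Resolve _no_split_modules names to classes, retrying on wrapped models.

    Returns an empty set (with a warning) instead of raising, so the caller
    can fall back to a safe default FSDP wrap policy.
    """
    # Try the wrapper first, then the underlying HF model attributes the
    # prompt suggests checking.
    candidates = [model, getattr(model, "model", None), getattr(model, "hf_model", None)]
    for target in candidates:
        if target is None:
            continue
        names = getattr(target, "_no_split_modules", None) or []
        classes = {
            cls
            for cls in (get_module_class_from_name(target, name) for name in names)
            if cls is not None
        }
        if classes:
            return classes
    logger.warning(
        "Could not resolve _no_split_modules to classes; "
        "FSDP will fall back to its default wrap policy."
    )
    return set()
```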

In `@src/instructlab/training/model.py`:
- Around line 125-131: The _has_mrope() helper should catch config-load errors
and validate types: wrap AutoConfig.from_pretrained(model_path) in a try/except
and return False on any exception, then retrieve rope_scaling safely by checking
attributes and types (use getattr(config, "rope_scaling", None) and
getattr(getattr(config, "text_config", None), "rope_scaling", None) but ensure
the resulting rope_scaling is a dict or has __contains__ before testing
"mrope_section"); if type checks fail, return False so M-RoPE detection is a
safe, non-fatal fallback.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f02ea6 and a1e6782.

📒 Files selected for processing (2)
  • src/instructlab/training/accelerator.py
  • src/instructlab/training/model.py

Comment on lines +148 to +157

```python
            for block_name in self.model._no_split_modules:
                cls = get_module_class_from_name(self.model, block_name)
                if cls is not None:
                    layer_classes.add(cls)
            if not layer_classes:
                raise ValueError(
                    f"Could not resolve any _no_split_modules "
                    f"({self.model._no_split_modules}) to actual module classes "
                    f"in the model."
                )
```

⚠️ Potential issue | 🔴 Critical

Avoid hard-failing FSDP init when _no_split_modules resolution misses.

The new ValueError at Line 153 is currently blocking training (matches the pipeline failure). Resolve names against the underlying HF model and fall back to a safe wrap policy with a warning instead of aborting.

Proposed fix:

```diff
-            layer_classes = set()
-            for block_name in self.model._no_split_modules:
-                cls = get_module_class_from_name(self.model, block_name)
+            hf_model = self.model.model
+            no_split_modules = getattr(hf_model, "_no_split_modules", []) or []
+            layer_classes = set()
+            for block_name in no_split_modules:
+                cls = get_module_class_from_name(hf_model, block_name)
                 if cls is not None:
                     layer_classes.add(cls)
-            if not layer_classes:
-                raise ValueError(
-                    f"Could not resolve any _no_split_modules "
-                    f"({self.model._no_split_modules}) to actual module classes "
-                    f"in the model."
-                )
-            wrap_policy = partial(
-                transformer_auto_wrap_policy,
-                transformer_layer_cls=layer_classes,
-            )
+            if not layer_classes:
+                logger.warning(
+                    "Could not resolve _no_split_modules=%s to concrete classes; "
+                    "falling back to fsdp_auto_wrap_policy.",
+                    no_split_modules,
+                )
+                wrap_policy = fsdp_auto_wrap_policy(hf_model)
+            else:
+                wrap_policy = partial(
+                    transformer_auto_wrap_policy,
+                    transformer_layer_cls=layer_classes,
+                )
```
🧰 Tools
🪛 GitHub Actions: Run unit tests via Tox::pytest

[error] 153-153: Could not resolve any _no_split_modules ({'LlamaDecoderLayer'}) to actual module classes in the model during get_fsdp_config().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/instructlab/training/accelerator.py` around lines 148 - 157, The code
currently raises a ValueError when no classes are resolved from
self.model._no_split_modules; instead, try resolving names against the
underlying HF model (e.g., check getattr(self.model, "model", None) or
getattr(self.model, "hf_model", None) and call get_module_class_from_name
against that if the first pass yields nothing), and if still no layer_classes,
do not raise—instead emit a clear warning (use warnings.warn or the module
logger) that resolution failed and that FSDP will fall back to a safe wrap
policy, then continue with an empty layer_classes so the downstream FSDP wrap
logic uses the safe/default policy. Ensure you reference
get_module_class_from_name, self.model._no_split_modules, and layer_classes in
the change.

Comment on lines +125 to +131
from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_path)
rope_scaling = getattr(config, "rope_scaling", None) or getattr(
getattr(config, "text_config", None), "rope_scaling", None
)
return rope_scaling is not None and "mrope_section" in rope_scaling

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify current _has_mrope implementation
rg -n -C3 'def _has_mrope|AutoConfig.from_pretrained|mrope_section|except' src/instructlab/training/model.py
```

Repository: instructlab/training
Length of output: 1766

🏁 Script executed:

```shell
#!/bin/bash
# Find where _has_mrope is called
rg -n '_has_mrope' src/instructlab/training/model.py
```

Repository: instructlab/training
Length of output: 168

🏁 Script executed:

```shell
#!/bin/bash
# Get context around line 103 where _has_mrope is called
sed -n '90,120p' src/instructlab/training/model.py | cat -n
```

Repository: instructlab/training
Length of output: 1836

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any try-except wrapper in the __init__ or _setup_flash_attention method
# Get more context to see the full method structure
sed -n '75,135p' src/instructlab/training/model.py | cat -n
```

Repository: instructlab/training
Length of output: 3239


Guard _has_mrope() against config-load failures.

Line 127 performs an external config load without error handling. A transient Hub/local-config failure will crash model initialization even though M-RoPE detection is an optional optimization for flash attention.

Wrap the AutoConfig.from_pretrained() call in a try-except block and return False on failure (safe fallback). Additionally, add type validation before accessing rope_scaling as a dict to prevent runtime errors from unexpected config structures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/instructlab/training/model.py` around lines 125 - 131, The _has_mrope()
helper should catch config-load errors and validate types: wrap
AutoConfig.from_pretrained(model_path) in a try/except and return False on any
exception, then retrieve rope_scaling safely by checking attributes and types
(use getattr(config, "rope_scaling", None) and getattr(getattr(config,
"text_config", None), "rope_scaling", None) but ensure the resulting
rope_scaling is a dict or has __contains__ before testing "mrope_section"); if
type checks fail, return False so M-RoPE detection is a safe, non-fatal
fallback.
