Implemented Qwen35ChatHandler for Qwen3.5 #61

Merged
JamePeng merged 8 commits into JamePeng:main from TAO71-AI:main on Mar 1, 2026

Conversation

@alcoftTAO

This PR implements support for Qwen3.5 models. It hasn't been tested yet due to a lack of compute power on my end.

This PR will stay a draft for now, until I can test it with smaller models and check and fix the chat template and parameters of the Qwen35ChatHandler class.

I still need to decide which parameters are useful.

@JamePeng JamePeng force-pushed the main branch 5 times, most recently from 76d8272 to 68eacae on February 19, 2026, 14:03
@JamePeng
Owner

Detailed adaptation work can be done after Qwen3.5-9B-Instruct and Qwen3.5-35B-A3B-Instruct are released.
Indeed, the current open-source Qwen3.5 model is too large.

@yamikumo-DSD

yamikumo-DSD commented Feb 23, 2026

I've tested a pruned version of the Qwen 3.5 model.
https://huggingface.co/infinityai/Qwen3.5-397B-REAP-50-IQ3_M/tree/main
This model also suffers from the memory_seq_rm failure problem (issue).
So I guess that even after the ChatHandler is implemented, multi-turn conversation won't currently work.

@alcoftTAO
Author

I'm updating and testing this with Qwen3.5-27B right now.

@alcoftTAO alcoftTAO reopened this Feb 26, 2026
@alcoftTAO
Author

Closed the PR temporarily to update to the latest commit.
I will continue to test this.

@JamePeng
Owner

I'm currently refactoring some logic locally, but the hybrid structure of qwen3-next and qwen3.5 is basically running. You can continue testing after I finish the initial implementation.

@alcoftTAO
Author

alcoftTAO commented Feb 26, 2026

I'm having some trouble testing this.

Traceback (most recent call last):
  ...
  File ".../.env/lib/python3.11/site-packages/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File ".../.env/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 1391, in <module>
    @ctypes_function(
     ^^^^^^^^^^^^^^^^
  File ".../.env/lib/python3.11/site-packages/llama_cpp/_ctypes_extensions.py", line 160, in decorator
    func = getattr(lib, name)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 389, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 394, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /usr/lib/libllama.so.0: undefined symbol: llama_params_fit

I'm not sure if it's related to my OS, Python version, or this project. Last time I compiled it (about two days ago) it worked fine.

I'm currently refactoring some logic locally, but the hybrid structure of qwen3-next and qwen3.5 is basically running. You can continue testing after I finish the initial implementation.

Does it crash because of this?

@JamePeng
Owner

No, it's because your compiled library is incompatible or the library file is missing.
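
A minimal sketch of one way to check this (it assumes the library path /usr/lib/libllama.so.0 from the traceback above; adjust it to wherever your libllama lives), verifying whether the installed library actually exports the symbol the bindings expect:

import ctypes

# Path taken from the traceback above; adjust as needed.
lib = ctypes.CDLL("/usr/lib/libllama.so.0")

try:
    getattr(lib, "llama_params_fit")
    print("symbol found: the library matches the bindings")
except AttributeError:
    print("symbol missing: rebuild/reinstall so the bindings and libllama come from the same llama.cpp revision")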

@alcoftTAO
Author

Okay, thanks! I'll try to fix it as soon as possible to continue testing.

@JamePeng
Owner

JamePeng commented Feb 26, 2026

The initial implementation is now live. You can try out qwen3.5 and see how it performs. Looking forward to your feedback.
Also, many hybrid-structure models are multimodal. For example, LFM2-VL can now run normally without the previous hack code, and LFM2.5-VL should also work without problems.

@alcoftTAO
Author

I've tested the chat template with and without images, thinking mode, etc.

I've only used Qwen3.5-27B (quantized to IQ2_M and mmproj in F16) with a ctx of 4096 tokens, but it should work fine with any other Qwen3.5-series model and quantization.


The model seems to work fine without images, but hallucinates a lot when using images (can't describe them properly, etc.).

Since the model is so big and I'm just asking for a simple description of the image, the quantization I'm using should not be a problem.

Note

Activating the thinking mode and looking at the reasoning content, the model does not see the image. I'll check the chat template to make sure it's not causing this problem.

The user wants me to describe an image. However, looking at the "Picture 1" content provided, it appears to be empty or contains only exclamation marks (which might be a placeholder or error). Since I cannot see any actual image content in the prompt, I need to inform the user that I cannot see the image.
...

I'll continue to work on this.

@alcoftTAO
Author

Looking at the console output, I find this:

find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 559 after 513 for sequence 0 with 15 new tokens
find_slot: non-consecutive token position 559 after 513 for sequence 0 with 15 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 557 after 513 for sequence 0 with 13 new tokens
find_slot: non-consecutive token position 557 after 513 for sequence 0 with 13 new tokens

I have no idea if it's related to this bug, but it's only printed when adding an image to the prompt.


Comparing the Qwen35ChatHandler's template with the Qwen3VLChatHandler's template, the code to load an image is similar:

Qwen3VLChatHandler:

...
                    {%- if 'image_url' in content -%}
                        {%- set image_count.value = image_count.value + 1 -%}
                        {%- if add_vision_id -%}
                            {{- 'Picture ' -}}
                            {{- image_count.value | string -}}
                            {{- ': ' -}}
                        {%- endif -%}
                        {{- '<|vision_start|>' -}}
                        {%- if content.image_url is string -%}
                            {{- content.image_url -}}
                        {%- else -%}
                            {{- content.image_url.url -}}
                        {%- endif -%}
                        {{- '<|vision_end|>' -}}
                    {%- endif -%}
...

Qwen35ChatHandler:

...
                    {%- if 'image' in item or 'image_url' in item -%}
                        {%- if is_system_content -%}
                            {{- raise_exception('System message cannot contain images.') -}}
                        {%- endif -%}
                        {%- if do_vision_count -%}
                            {%- set image_count.value = image_count.value + 1 -%}
                        {%- endif -%}
                        {%- if add_vision_id -%}
                            {{- 'Picture ' ~ image_count.value ~ ': ' -}}
                        {%- endif -%}
                        {{- '<|vision_start|>' -}}
                        {%- if 'image' in item -%}
                            {%- if item.image is string -%}
                                {{- item.image -}}
                            {%- else -%}
                                {{- item.image.url -}}
                            {%- endif -%}
                        {%- elif 'image_url' in item -%}
                            {%- if item.image_url is string -%}
                                {{- item.image_url -}}
                            {%- else -%}
                                {{- item.image_url.url -}}
                            {%- endif -%}
                        {%- endif -%}
                        {{- '<|vision_end|>' -}}
                    {%- elif 'video' in item -%}
...

@JamePeng
Owner

JamePeng commented Feb 27, 2026

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.memory_clear(True)
        llama.n_tokens = 0

These all need to be removed; the generation and eval processes should now manage them.

@JamePeng
Owner

The model seems to work fine without images, but hallucinates a lot when using images (can't describe them properly, etc.).

Since the model is so big and I'm just asking for a simple description of the image, the quantization I'm using should not be a problem.

Note

Activating the thinking mode and looking at the reasoning content, the model does not see the image. I'll check the chat template to make sure it's not causing this problem.

You can check how the final prompt is assembled from the template, and whether any <media> elements are mounted in it.
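
A minimal sketch of one way to inspect that (assuming jinja2 is installed and that Qwen35ChatHandler is importable from llama_cpp.llama_chat_format in this branch; the import path and sample data are assumptions, not part of the PR):

import jinja2

# The chat handler from this branch; the import path is an assumption.
from llama_cpp.llama_chat_format import Qwen35ChatHandler

def raise_exception(message):
    # The template calls raise_exception(); provide it explicitly for standalone rendering.
    raise ValueError(message)

env = jinja2.Environment()
template = env.from_string(Qwen35ChatHandler.CHAT_FORMAT)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
        ],
    },
]

prompt = template.render(
    messages=messages,
    tools=None,
    add_generation_prompt=True,
    add_vision_id=True,
    enable_thinking=True,
    raise_exception=raise_exception,
)
# The image reference should appear between <|vision_start|> and <|vision_end|>.
print(prompt)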

@yamikumo-DSD

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.memory_clear(True)
        llama.n_tokens = 0

These all need to be removed; the generation and eval processes should now manage them.

I guess that even after you remove this part, identical code will still be called by Llava15ChatHandler because of return super().__call__(**kwargs). Is that okay?

@alcoftTAO
Author

alcoftTAO commented Feb 28, 2026

I've tested this again.
I deleted the code mentioned above, and I've also tried using the Qwen3-VL chat template, but the bug persists.

find_slot: non-consecutive token position 446 after 445 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 445 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 128 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 32 new tokens
find_slot: non-consecutive token position 446 after 446 for sequence 0 with 32 new tokens
find_slot: non-consecutive token position 496 after 446 for sequence 0 with 7 new tokens

You can check how the final prompt is assembled from the template, and whether any <media> elements are mounted in it.

Printing the chat template shows the <__media__> tag between <|vision_start|> and <|vision_end|>, just as expected.

I'll try to re-download the model and mmproj, maybe with another quantization.

NOTE: According to the model's thoughts, sometimes it sees the image, but it's all black or just !!!!!!!!!! characters. Keep in mind that this is only according to the model's thoughts; it doesn't mean it can actually see anything.

Also I'm using base64-encoded images, not URLs.
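
For reference, a minimal sketch of how the image is being passed (the file path is a placeholder), wrapping the base64 data in a data URI inside an OpenAI-style image_url item:

import base64

# Placeholder path to the test image.
with open("test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    },
]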

@JamePeng
Owner

JamePeng commented Feb 28, 2026

This is normal. Currently, multimodal inputs are not entering the hybrid cache. I'll look into modifying the llava15chathandler into a temporary version this weekend.
Also, I want to separate out an mtmd_engine that encapsulates and manages the mtmd API, so that in the future we only need to maintain the new chat_template instead of maintaining the old llava15chathandler.

@JamePeng
Owner

JamePeng commented Mar 1, 2026

@alcoftTAO I've completed the local adaptation for qwen3.5 and tested your chat template; it seems to work fine. Please remove the changes from the qwen3-vl section, and then modify the chat_template in qwen3.5 as follows:

class Qwen35ChatHandler(MTMDChatHandler):
    CHAT_FORMAT = (
        "{%- set image_count = namespace(value=0) -%}"
        "{%- set video_count = namespace(value=0) -%}"
        "{%- macro render_content(content, do_vision_count, is_system_content=false) -%}"
        "    {%- if content is string -%}"
        "        {{- content -}}"
        "    {%- elif content is iterable and content is not mapping -%}"
        "        {%- for item in content -%}"
        "            {%- if 'image_url' in item or item.type == 'image_url' -%}"
        "                {%- if is_system_content -%}"
        "                    {{- raise_exception('System message cannot contain images.') -}}"
        "                {%- endif -%}"
        "                {%- if do_vision_count -%}"
        "                    {%- set image_count.value = image_count.value + 1 -%}"
        "                {%- endif -%}"
        "                {%- if add_vision_id -%}"
        "                    {{- 'Picture ' -}}"
        "                    {{- image_count.value | string -}}"
        "                    {{- ': ' -}}"
        "                {%- endif -%}"
        "                {{- '<|vision_start|>' -}}"
        "                {%- if item.image_url is string -%}"
        "                    {{- item.image_url -}}"
        "                {%- else -%}"
        "                    {{- item.image_url.url -}}"
        "                {%- endif -%}"
        "                {{- '<|vision_end|>' -}}"
        "            {%- elif 'video' in item -%}"
        "                {{- raise_exception('llama.cpp does not currently support video.') -}}"  # Video not supported, raise exception
        "                {%- if is_system_content -%}"
        "                    {{- raise_exception('System message cannot contain videos.') -}}"
        "                {%- endif -%}"
        "                {%- if do_vision_count -%}"
        "                    {%- set video_count.value = video_count.value + 1 -%}"
        "                {%- endif -%}"
        "                {%- if add_vision_id -%}"
        "                    {{- 'Video ' ~ video_count.value ~ ': ' -}}"
        "                {%- endif -%}"
        "                {{- '<|vision_start|>' -}}"
        "                {{- item.video -}}"
        "                {{- '<|vision_end|>' -}}"
        "            {%- elif 'text' in item -%}"
        "                {{- item.text -}}"
        "            {%- else -%}"
        "                {{- raise_exception('Unexpected item type in content.') -}}"
        "            {%- endif -%}"
        "        {%- endfor -%}"
        "    {%- elif content is none or content is undefined -%}"
        "        {{- '' -}}"
        "    {%- else -%}"
        "        {{- raise_exception('Unexpected content type.') -}}"
        "    {%- endif -%}"
        "{%- endmacro -%}"
        "{%- if not messages -%}"
        "    {{- raise_exception('No messages provided.') -}}"
        "{%- endif -%}"
        "{%- if tools and tools is iterable and tools is not mapping -%}"
        "    {{- '<|im_start|>system\n' -}}"
        "    {{- '# Tools\n\nYou have access to the following functions:\n\n<tools>' -}}"
        "    {%- for tool in tools -%}"
        "        {{- '\n' -}}"
        "        {{- tool | tojson -}}"
        "    {%- endfor -%}"
        "    {{- '\n</tools>' -}}"
        "    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' -}}"
        "    {%- if messages[0].role == 'system' -%}"
        "        {%- set content = render_content(messages[0].content, false, true) | trim -%}"
        "        {%- if content -%}"
        "            {{- '\n\n' + content -}}"
        "        {%- endif -%}"
        "    {%- endif -%}"
        "    {{- '<|im_end|>\n' -}}"
        "{%- elif messages[0].role == 'system' -%}"
        "    {%- set content = render_content(messages[0].content, false, true) -%}"
        "    {{- '<|im_start|>system\n' + content + '<|im_end|>\n' -}}"
        "{%- endif -%}"
        "{%- set ns = namespace(multi_step_tool=true, last_query_index=messages | length - 1) -%}"
        "{%- for message in messages[::-1] -%}"
        "    {%- set index = messages | length - 1 - loop.index0 -%}"
        "    {%- if ns.multi_step_tool and message.role == 'user' -%}"
        "        {%- set content = render_content(message.content, false) | trim -%}"
        "        {%- if not (content.startswith('<tool_response>') and content.endswith('</tool_response>')) -%}"
        "            {%- set ns.multi_step_tool = false -%}"
        "            {%- set ns.last_query_index = index -%}"
        "        {%- endif -%}"
        "    {%- endif -%}"
        "{%- endfor -%}"
        "{%- if ns.multi_step_tool -%}"
        "    {{- raise_exception('No user query found in messages.') -}}"
        "{%- endif -%}"
        "{%- for message in messages -%}"
        "    {%- set content = render_content(message.content, true) | trim -%}"
        "    {%- if message.role == 'system' -%}"
        "        {%- if not loop.first -%}"
        "            {{- raise_exception('System message must be at the beginning.') -}}"
        "        {%- endif -%}"
        "    {%- elif message.role == 'user' -%}"
        "        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>\n' -}}"
        "    {%- elif message.role == 'assistant' -%}"
        "        {%- set reasoning_content = '' -%}"
        "        {%- if message.reasoning_content is string -%}"
        "            {%- set reasoning_content = message.reasoning_content -%}"
        "        {%- elif '</think>' in content -%}"
        "            {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') -%}"
        "            {%- set content = content.split('</think>')[-1].lstrip('\n') -%}"
        "        {%- endif -%}"
        "        {%- set reasoning_content = reasoning_content | trim -%}"
        "        {%- if loop.index0 > ns.last_query_index -%}"
        "            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content -}}"
        "        {%- else -%}"
        "            {{- '<|im_start|>' + message.role + '\n' + content -}}"
        "        {%- endif -%}"
        "        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping -%}"
        "            {%- for tool_call in message.tool_call -%}"
        "                {%- if tool_call.function is defined -%}"
        "                    {%- set tool_call = tool_call.function -%}"
        "                {%- endif -%}"
        "                {%- if loop.first -%}"
        "                    {%- if content | trim -%}"
        "                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' -}}"
        "                    {%- else -%}"
        "                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' -}}"
        "                    {%- endif -%}"
        "                {%- else -%}"
        "                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' -}}"
        "                {%- endif -%}"
        "                {%- if tool_call.arguments is defined -%}"
        "                    {%- for (args_name, args_value) in tool_calls.arguments | items -%}"
        "                        {{- '<parameter=' + args.name + '>\n' -}}"
        "                        {%- set args_value = args_value | tojson | safe if args_value is mapping or args_value is sequence and args_value is not string else args_value | string -%}"
        "                        {{- args_value -}}"
        "                        {{- '\n</parameter>' -}}"
        "                    {%- endfor -%}"
        "                {%- endif -%}"
        "                {{- '</function>\n</tool_call>' -}}"
        "            {%- endfor -%}"
        "        {%- endif -%}"
        "        {{- '<|im_end|>\n' -}}"
        "    {%- elif message.role == 'tool' -%}"
        "        {%- if loop.previtem and loop.previtem.role != 'tool' -%}"
        "            {{- '<|im_start|>user' -}}"
        "        {%- endif -%}"
        "        {{- '\n<tool_response>\n' -}}"
        "        {{- content -}}"
        "        {{- '\n</tool_response>' -}}"
        "        {%- if not loop.last and loop.nextitem.role != 'tool' -%}"
        "            {{- '<|im_end|>\n' -}}"
        "        {%- elif loop.last -%}"
        "            {{- '<|im_end|>\n' -}}"
        "        {%- endif -%}"
        "    {%- else -%}"
        "        {{- raise_exception('Unexpected message role.') -}}"
        "    {%- endif -%}"
        "{%- endfor -%}"
        "{%- if add_generation_prompt -%}"
        "    {{- '<|im_start|>assistant\n' -}}"
        "    {%- if enable_thinking is false -%}"
        "        {{- '<think>\n\n</think>\n\n' -}}"
        "    {%- else -%}"
        "        {{- '<think>\n' -}}"
        "    {%- endif -%}"
        "{%- endif -%}"
    )

    def __init__(
        self,
        enable_thinking: bool = True,
        add_vision_id: bool = True,
        **kwargs,
    ):
        """
        Parameters:
        - enable_thinking (bool):
            - True (default): Enables reasoning for better results.
            - False: Disables reasoning for faster results.
        - add_vision_id (bool):
            - True (default): Count all the images. Recommended for multi-image.
            - False: Doesn't count the images. Can save tokens with single-image.
        """
        super().__init__(**kwargs)
        self.enable_thinking = enable_thinking
        self.extra_template_arguments["enable_thinking"] = enable_thinking
        self.extra_template_arguments["add_vision_id"] = add_vision_id

    def __call__(self, **kwargs):
        llama = kwargs['llama']

        if hasattr(llama, 'input_ids'):
            llama.input_ids.fill(0)

        if self.verbose:
            print(f"{self.log_prefix}(enable_thinking={self.enable_thinking}) - Start processing")

        # Use parent implementation
        return super().__call__(**kwargs)
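
For anyone who wants to try it end to end, a usage sketch: the file paths are placeholders, and clip_model_path plus the import path are assumptions based on how the existing multimodal chat handlers are wired up, not a confirmed API of this PR.

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen35ChatHandler  # import path is an assumption

# clip_model_path points at the mmproj file, following the pattern of the existing
# multimodal handlers; both paths below are placeholders.
chat_handler = Qwen35ChatHandler(
    clip_model_path="qwen3.5-27b-mmproj-f16.gguf",
    enable_thinking=True,
    add_vision_id=True,
)

llm = Llama(
    model_path="qwen3.5-27b-iq2_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["choices"][0]["message"]["content"])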

@JamePeng
Owner

JamePeng commented Mar 1, 2026

Test result:
(screenshots of the model's output attached)

@JamePeng
Owner

JamePeng commented Mar 1, 2026

The new refactoring code has been committed; you can try it again after syncing.

After this PR is merged, it will be about time to release the new version of the code :)

@alcoftTAO
Author

I've tested it and it works fine!
I'll push the commit now.

@alcoftTAO alcoftTAO marked this pull request as ready for review March 1, 2026 18:22
@JamePeng JamePeng changed the title from "Implemented Qwen3.5" to "Implemented Qwen35ChatHandler for Qwen3.5" on Mar 1, 2026
@JamePeng JamePeng merged commit 65b8497 into JamePeng:main Mar 1, 2026
12 checks passed
@JamePeng
Owner

JamePeng commented Mar 1, 2026

@alcoftTAO If you're interested, you can also adapt multimodal models such as LFM2.5-VL or others with Hybrid/Recurrent/SWA structures. Let's see how well other models perform once adapted and running.

I've also added a multi-threaded image-to-bitmap conversion function this time. You can also try passing in multiple images as bitmaps with the video flags to see how efficient it is.

@alcoftTAO
Author

Thanks! I look forward to implementing LFM2.5-VL soon!
