I'll be speaking to the context for Big-Endian systems. llama.cpp currently provides the following scripts that already implement byteswapping:

- convert_hf_to_gguf.py (converting straight to Big-Endian with its --bigendian flag)
- gguf_convert_endian.py (byteswapping an existing GGUF file in place)
Both of these will produce Big-Endian GGUF files. However, gguf_convert_endian.py has limited support for byteswapping the GGML types, so more implementation work is required.

Next, quantization. On s390x, we usually take the Big-Endian GGUF model created above and run it through llama-quantize to create a quantized model. This ensures that the model stays in Big-Endian byte order. If you could somehow merge these steps, so that we can take a BF16 Little-Endian GGUF model and quantize it to, let's say, Q4_K_M while byteswapping it to Big-Endian, that would be great :) lmk your thoughts
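For illustration, here is a minimal Python sketch (my own, not llama.cpp's actual code) of why byteswapping is type-dependent: plain float tensors can be swapped element-wise, but quantized block formats mix multi-byte and single-byte fields that have to be handled per field. It assumes ggml's Q8_0 block layout, i.e. one f16 scale followed by 32 int8 quants per block.

```python
import numpy as np

def byteswap_f32(buf: bytes) -> bytes:
    # F32 data is just 4-byte floats, so an element-wise byteswap
    # converts the whole tensor between byte orders.
    return np.frombuffer(buf, dtype=np.float32).byteswap().tobytes()

def byteswap_q8_0(buf: bytes) -> bytes:
    # A Q8_0 block is one f16 scale (2 bytes) followed by 32 int8
    # quants (1 byte each). Only the multi-byte scale needs swapping;
    # the single-byte quants must be left untouched.
    block = np.dtype([("d", "f2"), ("qs", "i1", (32,))])
    arr = np.frombuffer(buf, dtype=block).copy()
    arr["d"] = arr["d"].byteswap()
    return arr.tobytes()
```

A full converter needs one such routine per GGML quantization type, and that per-type coverage is exactly what is still missing from gguf_convert_endian.py.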
-
Hello everyone!
I was considering adding an endianness converter, which currently exists as a dangling Python file, to the llama.cpp project's llama-quantize module. Do you think there is a scope problem with that? I'm new to the source code and not a top-notch C/C++ developer, so I'd love a heads-up before investing time in implementing it.

Personally, I think it would be super useful to have that, or even to auto-convert with a prompt to the user, but a flag would probably be enough to improve the workflow.

Is there a similar feature already implemented? And should I go ahead?
Thanks for your time!
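As a starting point for the auto-convert idea, here is a minimal sketch (my own illustration, not an existing llama.cpp API) of the first step such a flag or prompt would need: detecting a GGUF file's byte order. The four magic bytes are b"GGUF" regardless of byte order, but the uint32 version field that follows is not; since the version is a small number, its low 16 bits come out as zero when read in the wrong byte order. I believe gguf-py's GGUFReader uses the same heuristic.

```python
import struct

def gguf_byteorder(path: str) -> str:
    # Read the 8-byte GGUF header prefix: 4 magic bytes + uint32 version.
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        raw = f.read(4)
    # Interpret the version as little-endian; a real version is a small
    # nonzero number, so zero low bits mean the file is big-endian.
    version_le = struct.unpack("<I", raw)[0]
    return "little" if version_le & 0xFFFF else "big"
```

An endianness flag in llama-quantize could run a check like this first and only byteswap when the input's byte order differs from the requested output.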