I'll be speaking to the context for Big-Endian systems. llama.cpp currently provides the following scripts that already implement byteswapping:

- convert_hf_to_gguf.py (converting straight to Big-Endian with its --bigendian flag)
- gguf_convert_endian.py (byteswapping an existing GGUF file in place)
Both of these will produce Big-Endian GGUF files. However, gguf_convert_endian.py has limited support for byteswapping the GGML types, so more implementation work is required.

Next, quantization. On s390x, we usually take the Big-Endian GGUF model created above and run it through llama-quantize to create a quantized model. This ensures that the model stays in Big-Endian byte order. If you could somehow merge these steps, so that we can take a BF16 Little-Endian GGUF model and quantize it to, let's say, Q4_K_M while byteswapping it to Big-Endian, that would be great :) lmk your thoughts
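For illustration, here is a minimal Python sketch (my own, not llama.cpp's actual code) of why byteswapping is type-dependent: plain float tensors can be swapped element-wise, but quantized block formats mix multi-byte and single-byte fields that have to be handled per field. It assumes ggml's Q8_0 block layout, i.e. one f16 scale followed by 32 int8 quants per block.

```python
import numpy as np

def byteswap_f32(buf: bytes) -> bytes:
    # F32 data is just 4-byte floats, so an element-wise byteswap
    # converts the whole tensor between byte orders.
    return np.frombuffer(buf, dtype=np.float32).byteswap().tobytes()

def byteswap_q8_0(buf: bytes) -> bytes:
    # A Q8_0 block is one f16 scale (2 bytes) followed by 32 int8
    # quants (1 byte each). Only the multi-byte scale needs swapping;
    # the single-byte quants must be left untouched.
    block = np.dtype([("d", "f2"), ("qs", "i1", (32,))])
    arr = np.frombuffer(buf, dtype=block).copy()
    arr["d"] = arr["d"].byteswap()
    return arr.tobytes()
```

A full converter needs one such routine per GGML quantization type, and that per-type coverage is exactly what is still missing from gguf_convert_endian.py.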
-
Hello everyone!
I was considering adding an endianness converter, which currently exists as a dangling Python file, to the llama.cpp project's llama-quantize module. Do you think there is a scope problem with that? I'm new to the source code and not a top-notch C/C++ developer, so I'd love a heads-up before investing time in implementing it.

Personally, I think it would be super useful to have that, or even to auto-convert with a prompt to the user, but a flag would probably be enough to improve the workflow.

Is there a similar feature already implemented? And should I go ahead?
Thanks for your time!
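As a starting point for the auto-convert idea, here is a minimal sketch (my own illustration, not an existing llama.cpp API) of the first step such a flag or prompt would need: detecting a GGUF file's byte order. The four magic bytes are b"GGUF" regardless of byte order, but the uint32 version field that follows is not; since the version is a small number, its low 16 bits come out as zero when read in the wrong byte order. I believe gguf-py's GGUFReader uses the same heuristic.

```python
import struct

def gguf_byteorder(path: str) -> str:
    # Read the 8-byte GGUF header prefix: 4 magic bytes + uint32 version.
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        raw = f.read(4)
    # Interpret the version as little-endian; a real version is a small
    # nonzero number, so zero low bits mean the file is big-endian.
    version_le = struct.unpack("<I", raw)[0]
    return "little" if version_le & 0xFFFF else "big"
```

An endianness flag in llama-quantize could run a check like this first and only byteswap when the input's byte order differs from the requested output.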