Description
Name and Version
I've used several versions from b7296 to b7380
Operating systems
Mac
GGML backends
Metal
Hardware
M1 Max, M4 Max
Models
gpt-oss20b FP16
Problem description & steps to reproduce
I have a task that I use to test models. It's a simple tool-use task: "see three function implementations in file 1, update them to use security measures implemented in file and add them to file2". The usual success rate for gpt-oss20b was always 100% or very close.
I noticed yesterday that performance on this task has degraded significantly. I started narrowing it down to specific llama.cpp builds, and it looks like it broke around b7371.
You can see that 7350, 7363 and 7370 made proper code inserts without bugs. 7380 can't insert correct code.
I was not able to get any inserts from 7371 at all. It's as if the model is partially blind and barely "sees" the code. Sometimes it just claims the code is already there and stops. Sometimes it keeps calling the "read file" and "search in file" tools forever. Sometimes it inserts the same code several times (after checking whether the inserts went fine).
I don't know how to provide a reproducible example because the task involves several MCP servers and proprietary code. I hope the data I've provided is enough. I see 7371 has some breaking changes, so hopefully the fix will be easy.
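Since the original setup can't be shared, here is a rough sketch of the narrowing described above, written as a binary search over build numbers. It is an illustration only, not the procedure actually used: `first_bad_build` and `is_good` are hypothetical names, and the `observed` table just replays the results reported in this issue instead of running the task. It also assumes that once a build is bad, every later build is bad too.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], is_good: Callable[[int], bool]) -> int:
    """Return the first build for which is_good() is False.

    Assumes `builds` is sorted ascending, builds[0] is good, builds[-1] is bad,
    and that badness is monotonic (no good build after a bad one).
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]


# Results reported in this issue, used as a stand-in for actually running the task.
observed = {7350: True, 7363: True, 7370: True, 7371: False, 7380: False}
builds = sorted(observed)

print(first_bad_build(builds, observed.__getitem__))
```

In a real run, `is_good` would have to launch the given build, repeat the tool-use task several times, and compare the success rate against a threshold, since a single run of a sampling model is noisy.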
First Bad Commit
I think it's release b7371
Relevant log output
The logs look completely normal. I ran a diff on them and only two strings differ: "ggml_metal_library_init: loaded <time>" and "build: <build>".
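For anyone who wants to repeat the log comparison, below is a minimal sketch that masks the two strings known to differ and then diffs the rest. The regular expressions are assumptions about how those lines look, `normalize` and `diff_logs` are hypothetical helper names, and the sample log text is made-up placeholder data, not real llama.cpp output.

```python
import difflib
import re

# Strings expected to differ between builds (from the report above).
# The exact patterns are assumptions about the log format.
VOLATILE = [
    (re.compile(r"(ggml_metal_library_init: loaded).*"), r"\1 <time>"),
    (re.compile(r"^(build:).*"), r"\1 <build>"),
]


def normalize(log_text: str) -> list[str]:
    """Replace the volatile parts of each line with fixed placeholders."""
    lines = []
    for line in log_text.splitlines():
        for pattern, replacement in VOLATILE:
            line = pattern.sub(replacement, line)
        lines.append(line)
    return lines


def diff_logs(good: str, bad: str) -> list[str]:
    """Return unified-diff lines; an empty list means no meaningful difference."""
    return list(
        difflib.unified_diff(
            normalize(good), normalize(bad), "good.log", "bad.log", lineterm=""
        )
    )


# Placeholder logs for illustration only.
log_a = "build: AAA\nggml_metal_library_init: loaded T1\nsome other line"
log_b = "build: BBB\nggml_metal_library_init: loaded T2\nsome other line"

for line in diff_logs(log_a, log_b):
    print(line)
```

With the placeholder logs above the loop prints nothing, which corresponds to the situation described here: after masking the load time and build string, the logs are identical.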