Eval bug: gpt-oss20b strange behaviour

### Name and Version

I've used several versions from b7296 to b7380

### Operating systems

Mac

### GGML backends

Metal

### Hardware

M1 Max, M4 Max

### Models

gpt-oss20b FP16 

### Problem description & steps to reproduce

I have a task that I use to test models. It is a simple tool-use task: "see three functions implementations in file 1, update them to use security measures implemented in file and add them to file2". The usual success rate for gpt-oss20b was always 100% or very close.

Yesterday I noticed that performance on this task had degraded significantly. I started narrowing it down to specific llama.cpp builds, and it looks like it broke around b7371.

<img width="484" height="625" alt="Image" src="https://github.com/user-attachments/assets/99e11c81-f1d8-4684-b5dc-906f939f4bf3" />

You can see that 7350, 7363 and 7370 made proper code inserts without bugs. 7380 can't insert correct code.
And I was not able to get any inserts from 7371 at all; it's like the model is partially blind and barely "sees" the code. Sometimes it just claims the code is already there and ends. Sometimes it keeps using the "read file" and "search in file" tools forever. Sometimes it inserts the same code several times (after checking whether the inserts went fine).
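For anyone who wants to repeat the narrowing-down on their own task, here is a minimal sketch of the bisection over build numbers. It assumes the builds are sorted, the oldest one is good, the newest one is bad, and everything after the first bad build stays bad. `first_bad_build`, `make_checker` and `run_task` are hypothetical names for illustration; `run_task` stands in for whatever runs the task once against a given build and reports success. Because model output is not deterministic, the checker repeats the task and compares the success rate to a threshold.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], is_good: Callable[[int], bool]) -> int:
    """Return the earliest build for which is_good() is False.

    Assumes builds is sorted ascending, builds[0] is good, builds[-1] is bad,
    and every build after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # builds[lo] is known good, builds[hi] is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]


def make_checker(run_task: Callable[[int], bool], runs: int = 10,
                 threshold: float = 0.9) -> Callable[[int], bool]:
    """Wrap a single-run task into a success-rate check over several runs."""
    def is_good(build: int) -> bool:
        passed = sum(1 for _ in range(runs) if run_task(build))
        return passed / runs >= threshold
    return is_good


if __name__ == "__main__":
    # Stand-in for the real task: pretend every build before 7371 succeeds.
    fake_run_task = lambda build: build < 7371
    builds = list(range(7296, 7381))
    print(first_bad_build(builds, make_checker(fake_run_task)))  # prints 7371
```

With about 85 builds between b7296 and b7380 this needs roughly seven checked builds instead of testing each one.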

I don't know how to provide a reproducible example because it involves several MCP servers and proprietary code. I hope the data I've provided is enough: I see 7371 has some breaking changes, so hopefully the fix will be easy.

### First Bad Commit

**_I think_** it's release b7371

### Relevant log output

```shell
Logs look absolutely normal. I ran diff on them and only two strings are different: "ggml_metal_library_init: loaded <time>" and "build: <build>".
```
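The log comparison described above can be sketched as follows: mask the two lines that are expected to differ, then diff the rest. This is only an illustration. The exact text after `ggml_metal_library_init: loaded` and `build:` is an assumption, so the patterns match anything following those prefixes, and `normalize` and `diff_logs` are hypothetical helper names.

```python
import difflib
import re

# Lines expected to differ between builds; everything after the prefix is masked.
VOLATILE = [
    (re.compile(r"^(ggml_metal_library_init: loaded).*$"), r"\1 <time>"),
    (re.compile(r"^(build:).*$"), r"\1 <build>"),
]


def normalize(line: str) -> str:
    """Replace the volatile part of a known line with a placeholder."""
    for pattern, replacement in VOLATILE:
        line = pattern.sub(replacement, line)
    return line


def diff_logs(a_text: str, b_text: str) -> list[str]:
    """Return unified-diff lines between two logs after masking volatile lines."""
    a = [normalize(line) for line in a_text.splitlines()]
    b = [normalize(line) for line in b_text.splitlines()]
    return list(difflib.unified_diff(a, b, lineterm=""))
```

An empty result means the two logs differ only in the masked lines.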
