I just updated to b7347 from a previous release, b738, and I've noticed an across-the-board reduction in prompt eval speed. Oddly, the generation eval time is about the same.
No other changes to OS, drivers or llama cli parameters. Testing with a single GPU, RTX 3090, on a variety of models including Gemma 27B, Deepseek R1 70B, GPT-OSS 120B.
Anyone else seeing this, or is it just me? Thanks.
Deepseek R1 70B - Previous
common_perf_print: prompt eval time = 426.54 ms / 13 tokens ( 32.81 ms per token, 30.48 tokens per second)
common_perf_print: eval time = 158021.47 ms / 2127 runs ( 74.29 ms per token, 13.46 tokens per second)
Deepseek R1 70B - Current
common_perf_print: prompt eval time = 528.70 ms / 13 tokens ( 40.67 ms per token, 24.59 tokens per second)
common_perf_print: eval time = 151626.33 ms / 2208 runs ( 68.67 ms per token, 14.56 tokens per second)
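For reference, the tokens-per-second figures in those `common_perf_print` lines can be recomputed from the raw timings. A quick sketch (values copied from the logs above), showing that the regression is in prompt processing while generation actually improved slightly:

```python
# Sanity-check of the common_perf_print numbers above (values copied from the logs).
prev_pp_ms, prev_pp_tokens = 426.54, 13      # prompt eval, previous build
curr_pp_ms, curr_pp_tokens = 528.70, 13      # prompt eval, current build
prev_tg_ms, prev_tg_runs = 158021.47, 2127   # generation ("eval time"), previous build
curr_tg_ms, curr_tg_runs = 151626.33, 2208   # generation ("eval time"), current build

def tok_per_sec(total_ms: float, n_tokens: int) -> float:
    """Tokens per second from total milliseconds and token count."""
    return n_tokens / (total_ms / 1000.0)

pp_prev = tok_per_sec(prev_pp_ms, prev_pp_tokens)  # ~30.48 t/s
pp_curr = tok_per_sec(curr_pp_ms, curr_pp_tokens)  # ~24.59 t/s
tg_prev = tok_per_sec(prev_tg_ms, prev_tg_runs)    # ~13.46 t/s
tg_curr = tok_per_sec(curr_tg_ms, curr_tg_runs)    # ~14.56 t/s

print(f"prompt eval: {pp_prev:.2f} -> {pp_curr:.2f} t/s ({(pp_curr / pp_prev - 1) * 100:+.1f}%)")
print(f"generation : {tg_prev:.2f} -> {tg_curr:.2f} t/s ({(tg_curr / tg_prev - 1) * 100:+.1f}%)")
```

So for this run, prompt eval dropped roughly 19% while generation throughput rose about 8%. Note the 13-token prompt is very short, so the prompt-eval number is noisy; a longer prompt (or `llama-bench`) would give a more reliable comparison between the two builds.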