@ngxson (Collaborator) commented Dec 13, 2025

Fix #17989

Related discussion: #16736 (comment)

| Argument | Explanation |
| -------- | ----------- |
| `--kv-unified, -kvu` | use single unified KV buffer shared across all sequences (default: enabled if number of slots is auto)<br/>(env: LLAMA_ARG_KV_UNIFIED) |
| `-np, --parallel N` | number of server slots (default: -1, -1 = auto)<br/>(env: LLAMA_ARG_N_PARALLEL) |
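
To illustrate how the two defaults described above interact, here is a minimal Python sketch. It is not the server's actual code: `resolve_slot_config`, `SlotConfig`, and the `auto_slots` parameter are hypothetical names, and the value used for `auto_slots` is only a placeholder, since the help text does not say how many slots "auto" resolves to. The sketch also assumes that the unified KV buffer is disabled by default when an explicit slot count is given, which is an inference from "enabled if number of slots is auto".

```python
from dataclasses import dataclass
from typing import Optional

AUTO = -1  # sentinel taken from the help text: "-1 = auto"


@dataclass
class SlotConfig:
    n_parallel: int   # resolved number of server slots
    kv_unified: bool  # whether a single unified KV buffer is used


def resolve_slot_config(
    n_parallel: int = AUTO,
    kv_unified: Optional[bool] = None,
    auto_slots: int = 4,  # placeholder assumption, not a documented value
) -> SlotConfig:
    """Resolve the effective slot count and KV buffer mode (illustrative only)."""
    slots_are_auto = n_parallel == AUTO

    # --kv-unified defaults to enabled only when the slot count is auto.
    if kv_unified is None:
        kv_unified = slots_are_auto

    # Replace the auto sentinel with a concrete slot count.
    if slots_are_auto:
        n_parallel = auto_slots

    if n_parallel < 1:
        raise ValueError("number of slots must be >= 1 or -1 (auto)")

    return SlotConfig(n_parallel=n_parallel, kv_unified=kv_unified)


if __name__ == "__main__":
    print(resolve_slot_config())              # no flags: auto slots, unified KV on
    print(resolve_slot_config(n_parallel=1))  # explicit "--parallel 1" stays at 1 slot
```

Under these assumptions, an explicit `--parallel 1` is kept as a single slot instead of being treated as "unset", which is the behavior the linked issue asks for.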

Successfully merging this pull request may close these issues.

Misc. bug: "--parallel 1" initializes 4 slots, while docs say default is 1
