@ggerganov
Member

from #17004

Extracting some refactoring portions from #17004 to make the review easier:

  • Simplify the management of llama objects (samplers, contexts, model) and make it safer
  • The common_init_result now also owns the sampler chains constructed during common_init_from_params()
  • The sampler chains of common_init_result are constructed before the model and the context. We will need this for #17004 (sampling : add support for backend sampling) in order to optionally pass the samplers during the construction of the context
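To illustrate the ownership and construction order described in the list above, here is a minimal sketch. It does not use the real llama.cpp types: `toy_sampler`, `toy_model`, `toy_context`, `toy_init_result` and `toy_init_from_params` are hypothetical stand-ins, and the actual layout of common_init_result may differ.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the real llama.cpp types.
struct toy_sampler { int seq_id; };
struct toy_model   { std::string path; };
struct toy_context {
    const toy_model * model = nullptr;          // non-owning
    std::vector<const toy_sampler *> samplers;  // non-owning, optionally filled
};

// Hypothetical analogue of common_init_result: a single object owns everything.
struct toy_init_result {
    // Members are destroyed in reverse declaration order, so the context is
    // released first, then the model, then the sampler chains.
    std::vector<std::unique_ptr<toy_sampler>> samplers;
    std::unique_ptr<toy_model>                model;
    std::unique_ptr<toy_context>              context;
};

// Hypothetical analogue of common_init_from_params().
toy_init_result toy_init_from_params(const std::string & model_path, int n_seq, bool pass_samplers) {
    toy_init_result res;

    // 1. the sampler chains are constructed first
    for (int i = 0; i < n_seq; ++i) {
        res.samplers.push_back(std::make_unique<toy_sampler>(toy_sampler{i}));
    }

    // 2. then the model
    res.model = std::make_unique<toy_model>(toy_model{model_path});

    // 3. finally the context, which can optionally receive the existing samplers
    res.context = std::make_unique<toy_context>();
    res.context->model = res.model.get();
    if (pass_samplers) {
        for (const auto & s : res.samplers) {
            res.context->samplers.push_back(s.get());
        }
    }

    return res;
}
```

Because the samplers already exist when the context is built, passing them in becomes a plain optional step rather than something that has to be patched in afterwards. Callers only hold the result object and never free the individual pieces themselves.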

ref #17750 (comment)

Another change related to the grammar logic (the explanation is in the referenced comment):

  • No longer maintain a separate sampler chain for the grammar
  • Merge the grammar into the main common_sampler chain
  • The grammar is now always applied first to the raw logits, before the rest of the samplers

The main reason for this change is to make the integration of #17004 compatible with grammar usage and to simplify the logic for handling the grammar when it is present. The main concern is that this will likely hurt performance when grammar sampling is involved, since we no longer do the "rejection sampling" trick. I think it's better to put effort into optimizing the performance of the grammar in general so we don't need to do the trick at all.
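The trade-off can be sketched with a toy example. This is not the llama.cpp implementation: the grammar is modeled as a hypothetical predicate over token ids (`toy_grammar`), sampling is reduced to a greedy argmax, and the check counter only stands in for the cost of evaluating the grammar. The "rejection" variant reflects my understanding of the trick: sample first, test only the sampled token, and fall back to the full mask on rejection.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical grammar: returns true if the token id is allowed next.
using toy_grammar = std::function<bool(int)>;

struct toy_pick {
    int         token;           // chosen token id
    std::size_t grammar_checks;  // how many tokens were tested against the grammar
};

// Greedy stand-in for "the rest of the samplers". Assumes logits is non-empty.
static int toy_argmax(const std::vector<float> & logits) {
    int best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) {
            best = (int) i;
        }
    }
    return best;
}

// Grammar first: mask every disallowed token on the raw logits, then sample.
// If the grammar rejects every token, toy_argmax falls back to token 0.
toy_pick pick_grammar_first(std::vector<float> logits, const toy_grammar & allowed) {
    std::size_t checks = 0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        ++checks;
        if (!allowed((int) i)) {
            logits[i] = -std::numeric_limits<float>::infinity();
        }
    }
    return { toy_argmax(logits), checks };
}

// Rejection trick: sample first and test only the sampled token;
// run the full grammar mask only when that token is rejected.
toy_pick pick_with_rejection(const std::vector<float> & logits, const toy_grammar & allowed) {
    const int candidate = toy_argmax(logits);
    if (allowed(candidate)) {
        return { candidate, 1 };
    }
    toy_pick res = pick_grammar_first(logits, allowed);
    res.grammar_checks += 1; // account for the rejected candidate
    return res;
}
```

In this toy, both functions pick the same token, but the grammar-first path always tests the whole vocabulary, while the rejection path tests a single token whenever the unconstrained choice is already valid. That gap is the performance concern, and making the grammar itself cheaper would remove the need for the second code path.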

@github-actions github-actions bot added the python python script changes label Dec 11, 2025
@ggerganov ggerganov requested a review from danbev December 12, 2025 12:32
@ggerganov ggerganov merged commit 254098a into master Dec 14, 2025
75 of 78 checks passed
@ggerganov ggerganov deleted the gg/common-refactor branch December 14, 2025 08:11

Labels

examples python python script changes server


3 participants