Description
Describe the feature
Chat templates are executable Jinja2 programs bundled with instruction-tuned and chat-finetuned models that translate structured input into the token sequences the model expects. They are present across the large majority of deployed LLMs and SLMs today, including diffusion-based language models such as LLaDA 2.0. They are typically distributed as Jinja2 strings, either embedded in GGUF file metadata or stored in tokenizer_config.json within HuggingFace model repositories.
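As a minimal sketch of where such a template lives, the snippet below reads the `chat_template` entry from a tokenizer_config.json file using only the Python standard library. The helper name `load_chat_templates` is hypothetical, and the handling of a list of named templates is an assumption about how repositories with multiple templates serialize them.

```python
import json
from pathlib import Path


def load_chat_templates(config_path):
    """Return {name: template_source} from a tokenizer_config.json file.

    Hypothetical helper for illustration. Assumes the template is stored
    under the "chat_template" key, either as a single string or as a list
    of {"name": ..., "template": ...} entries.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    raw = config.get("chat_template")
    if raw is None:
        return {}  # no template bundled with this tokenizer config
    if isinstance(raw, str):
        return {"default": raw}
    return {entry["name"]: entry["template"] for entry in raw}
```

A BOM generator could call a helper like this to collect the template source before recording it.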
They run automatically before model inference, mapping conversational roles (user, assistant, system) into model-specific serialized formats using special tokens (e.g., Llama's <|start_header_id|>, Qwen's <|im_start|>, Mistral's [INST]).
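To make the mapping concrete, here is a plain-Python stand-in for what a ChatML-style template does. It is a sketch, not any model's actual template: real templates are Jinja2 programs and differ per model, and the function name `render_chatml` and the `<|im_end|>` closing token are assumptions for illustration.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Serialize role/content messages the way a ChatML-style template would."""
    parts = []
    for message in messages:
        # Each turn is wrapped in role-demarcating special tokens.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a BOM?"},
])
print(prompt)
```

Because everything the user types passes through this step, any extra text the template adds becomes part of the prompt without appearing in the application's own message list.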
Despite their critical role in the inference pipeline, chat templates are not currently tracked in CycloneDX model cards. This means a component that directly controls how all user input reaches a model has no representation in the BOM.
Why this matters
Recent peer-reviewed research shows that a maliciously modified chat template is sufficient to backdoor a model at inference time. The attacker only needs to change the template file itself. Across 18 models spanning 7 families, triggered backdoors achieved roughly 80% success rates while staying dormant under normal use and evading HuggingFace's existing security scanners.
The research has been accepted as a workshop paper at ICLR 2026:
Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates
The paper demonstrates two attack scenarios:
- Integrity degradation: The model produces plausible but subtly wrong answers (e.g., incorrect dates for historical facts) while maintaining fluent output.
- Forbidden resource emission: The model emits attacker-controlled URLs, either in plaintext, hidden in HTML comments, or Base64-encoded.
Relationship to existing work
Issue #702 already identifies chat templates as a TODO item for the MLBOM 2.0 schema rework, noting that inputs and outputs "are really (chat) template parameters (which may vary by template as models can have multiple)." This proposal provides a concrete, research-backed design for that item and makes the security case for treating it as a first-class field rather than a property extension.
This proposal targets CycloneDX 2.0, aligning with the broader MLBOM schema improvements tracked in #702 and the modularization goals in #631.
Possible solutions
Add a chatTemplate object to modelCard.modelParameters (or to the top-level modelCard if fields are reorganized per #702).
Proposed schema structure

    "chatTemplate": {
      "type": "object",
      "description": "Chat template used to format structured input into token sequences.",
      "properties": {
        "format": {
          "type": "string",
          "description": "The template language or format.",
          "examples": ["jinja2", "chatml"]
        },
        "content": {
          "type": "string",
          "description": "The raw template content."
        },
        "hashes": {
          "type": "array",
          "description": "Cryptographic hashes of the template for integrity verification.",
          "items": { "$ref": "#/definitions/hash" }
        },
        "signature": {
          "$ref": "#/definitions/signature",
          "description": "Cryptographic signature from the model provider for provenance."
        },
        "specialTokens": {
          "type": "array",
          "description": "Special tokens the template uses for role demarcation.",
          "items": {
            "type": "object",
            "properties": {
              "role": { "type": "string" },
              "startDelimiter": { "type": "string" },
              "endDelimiter": { "type": "string" }
            }
          }
        }
      }
    }

This reuses existing CycloneDX primitives (hash, signature) rather than introducing new types, which keeps the addition minimal while directly addressing the paper's core recommendations:
| Paper recommendation | CycloneDX mechanism |
|---|---|
| Treat templates as security-relevant | First-class schema field |
| Cryptographic signing for provenance | Existing signature type |
| Automated anomaly detection | hashes enable integrity checks |
| Deployer-side auditing | content field for inspection |
Since models can support multiple chat templates (as noted in #702), this field could also be an array to capture template variants.
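As a rough illustration of how the `hashes` field could support deployer-side checks, the sketch below builds a `chatTemplate` entry and later verifies a deployed template against it. The function names are hypothetical, and hashing the UTF-8 bytes of the raw template string is an assumption: the proposal does not yet define a canonical form. The `alg`/`content` shape follows the existing CycloneDX hash object.

```python
import hashlib
import json


def build_chat_template_entry(template_source, template_format="jinja2"):
    """Build a chatTemplate object following the proposed structure."""
    # Assumption: the hash covers the UTF-8 bytes of the raw template string.
    digest = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
    return {
        "format": template_format,
        "content": template_source,
        "hashes": [{"alg": "SHA-256", "content": digest}],
    }


def template_matches_entry(template_source, entry):
    """Return True if the template hashes to a SHA-256 value recorded in entry."""
    digest = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
    return any(
        h["alg"] == "SHA-256" and h["content"] == digest
        for h in entry.get("hashes", [])
    )


# Toy template for illustration only, not any model's real template.
original = (
    "{% for m in messages %}<|im_start|>{{ m.role }}\n"
    "{{ m.content }}<|im_end|>\n{% endfor %}"
)
tampered = original + "{# hidden change #}"

entry = build_chat_template_entry(original)
print(json.dumps(entry["hashes"], indent=2))
print(template_matches_entry(original, entry))  # True
print(template_matches_entry(tampered, entry))  # False
```

A hash check of this kind only detects drift from the recorded template. Establishing that the recorded template itself came from the model provider would rely on the `signature` field.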
Alternatives
- Property extension only: Use the existing properties name-value mechanism to store template metadata. This is possible today but provides no structure, no integrity guarantees, and no tooling interoperability.
- External reference: Point to the template via externalReferences. This captures location but not content or hashes, limiting offline auditing and integrity verification.
Additional context
- ICLR 2026 workshop paper (accepted): https://arxiv.org/abs/2602.04653v2
- CycloneDX 2.0 MLBOM schema discussion: [FEATURE]: Proposed changes for MLBOM schema for CycloneDX 2.0 #702
- CycloneDX 2.0 tracker: CycloneDX 2.0 #631
- HuggingFace chat template documentation: https://huggingface.co/docs/transformers/main/en/chat_templating
- The paper also shows that "defensive" chat templates can improve model robustness against jailbreaks by 12.5% without degrading benign performance, suggesting that template tracking has value for both security auditing and safety documentation.