
[FEATURE]: Track chat templates in modelCard for ML model supply chain integrity #862

@afogel

Description


Describe the feature

Chat templates are executable Jinja2 programs bundled with instruction-tuned and chat-finetuned models; they translate structured conversation input into the token sequences the model expects. They ship with the large majority of LLMs and SLMs deployed today, including diffusion-based language models such as LLaDA 2.0. They are typically distributed as Jinja2 strings, either embedded in GGUF file metadata or stored in tokenizer_config.json within Hugging Face model repositories.
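For illustration, here is a minimal sketch of how a BOM generator might pull the template out of a Hugging Face tokenizer_config.json. It assumes the common convention that the chat_template key holds either a single string or a list of {"name", "template"} objects; the function name chat_templates_from_config is hypothetical, and GGUF files (which carry the template under the tokenizer.chat_template metadata key) are not handled here.

```python
import json


def chat_templates_from_config(config_text):
    """Return {template_name: jinja2_source} from tokenizer_config.json text.

    Assumes "chat_template" is either one string or a list of
    {"name": ..., "template": ...} objects (hypothetical helper).
    """
    config = json.loads(config_text)
    raw = config.get("chat_template")
    if raw is None:
        # No template shipped with this tokenizer config.
        return {}
    if isinstance(raw, str):
        # A single, unnamed template; label it "default" for convenience.
        return {"default": raw}
    # Multiple named templates, e.g. "default" and "tool_use".
    return {entry["name"]: entry["template"] for entry in raw}
```

Normalizing both shapes into one mapping means downstream BOM code does not need to care which convention a given repository uses.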

They are rendered automatically before every inference call, mapping conversational roles (user, assistant, system) into model-specific serialized formats delimited by special tokens (e.g., Llama 3's <|start_header_id|>, Qwen's <|im_start|>, Mistral's [INST]).
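To make the role mapping concrete, the sketch below reproduces in plain Python what a ChatML-style template (the <|im_start|>/<|im_end|> format used by Qwen) emits. It is an illustrative stand-in, not an actual Jinja2 template; in practice the template is rendered by tooling such as transformers' tokenizer.apply_chat_template, and every model family's template differs.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Serialize role/content messages the way a ChatML-style template would.

    Plain-Python stand-in for a Jinja2 chat template (illustrative only).
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in role-tagged special tokens.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues generating from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "system", "content": "Be brief."},
                     {"role": "user", "content": "Hi"}]))
```

Every user message passes through this step, which is why whoever controls the template controls exactly what text the model sees.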

Despite their critical role in the inference pipeline, chat templates are not currently tracked in CycloneDX model cards. This means a component that directly controls how all user input reaches a model has no representation in the BOM.

Why this matters

Recent peer-reviewed research shows that a maliciously modified chat template is sufficient to backdoor a model at inference time: the attacker needs to modify only the template itself. Across 18 models spanning 7 families, triggered backdoors achieved success rates of roughly 80% while staying dormant under normal use and evading Hugging Face's existing security scanners.

This has been accepted as a workshop paper at ICLR 2026:

Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates

https://arxiv.org/abs/2602.04653v2

The paper demonstrates two attack scenarios:

  1. Integrity degradation: The model produces plausible but subtly wrong answers (e.g., incorrect dates for historical facts) while maintaining fluent output.
  2. Forbidden resource emission: The model emits attacker-controlled URLs in plaintext, hidden in HTML comments, or Base64-encoded.
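Because the second scenario leaves recognizable artifacts, a deployer with access to the template content could run a crude static check for them. The patterns below (URLs, HTML comments, long Base64-alphabet runs) are hypothetical heuristics chosen for this sketch and are not the detection approach from the paper.

```python
import re

# Hypothetical heuristics for illustration; NOT the paper's detection method.
SUSPICIOUS_PATTERNS = {
    "url": re.compile(r"https?://[^\s'\"<>]+", re.IGNORECASE),
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    # Long runs of Base64-alphabet characters, optionally '='-padded.
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
}


def scan_template(content):
    """Return {indicator_name: [matches]} for each pattern found in content."""
    findings = {}
    for name, pattern in SUSPICIOUS_PATTERNS.items():
        matches = pattern.findall(content)
        if matches:
            findings[name] = matches
    return findings


print(scan_template("{{ m.content }}<!-- see https://example.com/x -->"))
```

A template that assembles such strings at render time (for example through Jinja2 concatenation) would slip past a scan like this, so it can only complement hash and signature checks, not replace them.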

Relationship to existing work

Issue #702 already identifies chat templates as a TODO item for the MLBOM 2.0 schema rework, noting that inputs and outputs "are really (chat) template parameters (which may vary by template as models can have multiple)." This proposal provides a concrete, research-backed design for that item and makes the security case for treating it as a first-class field rather than a property extension.

This proposal targets CycloneDX 2.0, aligning with the broader MLBOM schema improvements tracked in #702 and the modularization goals in #631.

Possible solutions

Add a chatTemplate object to modelCard.modelParameters (or to the top-level modelCard if fields are reorganized per #702).

Proposed schema structure

"chatTemplate": {
  "type": "object",
  "description": "Chat template used to format structured input into token sequences.",
  "properties": {
    "format": {
      "type": "string",
      "description": "The template language or format.",
      "examples": ["jinja2", "chatml"]
    },
    "content": {
      "type": "string",
      "description": "The raw template content."
    },
    "hashes": {
      "type": "array",
      "description": "Cryptographic hashes of the template for integrity verification.",
      "items": { "$ref": "#/definitions/hash" }
    },
    "signature": {
      "$ref": "#/definitions/signature",
      "description": "Cryptographic signature from the model provider for provenance."
    },
    "specialTokens": {
      "type": "array",
      "description": "Special tokens the template uses for role demarcation.",
      "items": {
        "type": "object",
        "properties": {
          "role": { "type": "string" },
          "startDelimiter": { "type": "string" },
          "endDelimiter": { "type": "string" }
        }
      }
    }
  }
}
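As a sketch of how a producer might populate this object, the function below builds a chatTemplate instance with a SHA-256 entry in CycloneDX's existing hash shape ({"alg", "content"}). The helper name is hypothetical, and the signature is deliberately left out because it would come from the model provider.

```python
import hashlib


def make_chat_template_entry(content, fmt="jinja2", special_tokens=None):
    """Build a chatTemplate object shaped like the proposed schema (hypothetical helper)."""
    entry = {
        "format": fmt,
        "content": content,
        "hashes": [
            {
                "alg": "SHA-256",
                # Hash the exact UTF-8 bytes of the template text.
                "content": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            }
        ],
    }
    if special_tokens:
        entry["specialTokens"] = special_tokens
    return entry


entry = make_chat_template_entry(
    "{% for m in messages %}...{% endfor %}",
    special_tokens=[{"role": "user", "startDelimiter": "<|im_start|>user\n", "endDelimiter": "<|im_end|>"}],
)
```

Hashing the exact bytes means producers and verifiers must agree on any normalization (line endings, trailing whitespace); otherwise a benign reformat looks identical to tampering.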

This reuses existing CycloneDX primitives (hash, signature) rather than introducing new types, which keeps the addition minimal while directly addressing the paper's core recommendations:

| Paper recommendation | CycloneDX mechanism |
|---|---|
| Treat templates as security-relevant | First-class schema field |
| Cryptographic signing for provenance | Existing signature type |
| Automated anomaly detection | hashes enable integrity checks |
| Deployer-side auditing | content field for inspection |
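On the deployer side, the hashes field enables a simple integrity check before serving: recompute the digest of the template actually loaded and compare it with what the BOM records. The sketch below assumes the BOM uses CycloneDX algorithm names and covers only a subset of them; the function and mapping names are hypothetical.

```python
import hashlib

# Subset of CycloneDX hash algorithm names mapped to hashlib constructors.
_HASHLIB_NAMES = {
    "SHA-256": hashlib.sha256,
    "SHA-384": hashlib.sha384,
    "SHA-512": hashlib.sha512,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-512": hashlib.sha3_512,
}


def verify_template(deployed_content, recorded_hashes):
    """True only if at least one supported hash is checked and all checked hashes match."""
    data = deployed_content.encode("utf-8")
    checked = 0
    for h in recorded_hashes:
        constructor = _HASHLIB_NAMES.get(h["alg"])
        if constructor is None:
            # Skip algorithms this sketch does not support.
            continue
        checked += 1
        if constructor(data).hexdigest() != h["content"].lower():
            return False
    # Fail closed when nothing could be verified.
    return checked > 0
```

Failing closed when no supported hash is present avoids silently accepting a template that was never actually verified.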

Since models can support multiple chat templates (as noted in #702), this field could also be an array to capture template variants.
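If the field becomes an array, each variant needs some way to tell it apart from the others. The sketch below converts a {name: template} mapping into a list of chatTemplate-style entries; note that the name field it adds is a hypothetical extension, not part of the schema proposed above.

```python
import hashlib


def chat_template_variants(templates):
    """Turn {name: jinja2_source} into a list of chatTemplate-style entries.

    "name" is a hypothetical field used to distinguish variants
    such as "default" and "tool_use".
    """
    variants = []
    # Sort by name so the generated BOM is deterministic across runs.
    for name in sorted(templates):
        content = templates[name]
        variants.append({
            "name": name,  # hypothetical field, not in the proposed schema
            "format": "jinja2",
            "content": content,
            "hashes": [{
                "alg": "SHA-256",
                "content": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            }],
        })
    return variants
```

Deterministic ordering keeps BOM diffs meaningful: a reordered list never masquerades as a changed template.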

Alternatives

  • Property extension only: Use the existing properties name-value mechanism to store template metadata. This is possible today but provides no structure, no integrity guarantees, and no tooling interoperability.
  • External reference: Point to the template via externalReferences. This captures location but not content or hashes, limiting offline auditing and integrity verification.

Additional context
