
Conversation

@pggPL
Collaborator

@pggPL pggPL commented Jan 14, 2026

Description

  1. Fixes the incorrect implementation of the no_torch_dynamo decorator, which caused errors with the newest PyTorch versions: the decorator was not correctly disabled during ONNX export.
  2. Adds support for FP8 attention ONNX export (a usage sketch follows below).

Fixes #2588
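
To make the second item concrete, below is a rough, hypothetical usage sketch of exporting FP8 attention to ONNX. It is not code from this PR; it assumes the te.onnx_export and te.fp8_autocast context managers and the fp8_dpa recipe flag behave as exercised in tests/pytorch/test_onnx_export.py.

# Hypothetical sketch, not from this PR: export DotProductAttention under an
# FP8 recipe. Assumes te.onnx_export, te.fp8_autocast and
# recipe.DelayedScaling(fp8_dpa=True) behave as in the existing ONNX export tests.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

attn = te.DotProductAttention(num_attention_heads=16, kv_channels=64)

# Dummy inputs in the default "sbhd" layout: (seq_len, batch, heads, head_dim).
q = torch.randn(128, 2, 16, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

fp8_recipe = recipe.DelayedScaling(fp8_dpa=True)  # FP8 dot-product attention

with torch.inference_mode(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    with te.onnx_export(enabled=True):
        torch.onnx.export(attn, (q, k, v), "fp8_attention.onnx")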

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Fixed the no_torch_dynamo decorator to check ONNX export mode at call time instead of at decoration time
  • Added an ONNX-compatible FP8 emulation path for attention export (onnx_forward in FP8EmulationFunc)

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

pggPL and others added 7 commits January 14, 2026 19:46
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL pggPL marked this pull request as ready for review January 27, 2026 17:09
@greptile-apps
Contributor

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

This PR fixes ONNX export issues and adds FP8 attention support for ONNX export.

Key Changes:

  • Fixed no_torch_dynamo decorator bug: The previous lambda-based implementation incorrectly evaluated is_in_onnx_export_mode() at decoration time instead of at runtime, causing errors with newer PyTorch versions during ONNX export. The new implementation uses a proper wrapper function that checks the export mode at each call (see the sketch after this summary).

  • Added FP8 attention ONNX export: Implemented onnx_forward method in FP8EmulationFunc that uses ONNX-compatible operations (flatten, concatenate, quantize, split) to emulate FP8 quantization during export.

  • Enabled FP8 emulation during ONNX export: Modified attention backend selection logic to allow FP8 emulation when is_in_onnx_export_mode() returns true, even without the environment variable (a small sketch follows the sequence diagram below).

  • Updated tests: Added parameterization for FP8 recipes (DelayedScaling and Float8CurrentScaling with fp8_dpa=True), removed attention_dropout=0.5 to avoid non-deterministic outputs, and adjusted tolerance to 5e-1 for FP8 tests.

  • Updated CI: Set NVTE_UnfusedDPA_Emulate_FP8=1 environment variable in test script to enable FP8 emulation in CI environments without native FP8 hardware.

The implementation follows best practices for ONNX export by using operations with defined ONNX translations and properly handling the export mode detection.
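
To illustrate the decorator fix, here is a minimal sketch of the runtime-check pattern described above. It assumes TE's is_in_onnx_export_mode() helper and is not the verbatim implementation from transformer_engine/pytorch/jit.py.

# Minimal sketch (assumptions noted above); names and structure may differ
# from the actual transformer_engine/pytorch/jit.py.
import functools
import torch
from transformer_engine.pytorch.export import is_in_onnx_export_mode

# Buggy pattern (before): the export mode was evaluated once, at decoration
# time, so functions remained dynamo-disabled during ONNX export:
# no_torch_dynamo = lambda fn: fn if is_in_onnx_export_mode() else torch._dynamo.disable(fn)

def no_torch_dynamo(recursive=True):
    """Disable TorchDynamo for the wrapped function, except during ONNX export."""
    def decorator(fn):
        dynamo_disabled_fn = torch._dynamo.disable(fn, recursive=recursive)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Check the export mode on every call rather than at decoration time.
            if is_in_onnx_export_mode():
                return fn(*args, **kwargs)
            return dynamo_disabled_fn(*args, **kwargs)

        return wrapper

    return decorator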

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations
  • The changes correctly address the no_torch_dynamo decorator bug that was causing issues with newer PyTorch versions, and properly implement FP8 attention ONNX export. The implementation follows best practices by using ONNX-compatible operations and runtime mode detection. Minor consideration: assert statement in onnx_forward could be improved with better error handling.
  • Pay attention to transformer_engine/pytorch/attention/dot_product_attention/backends.py - the assert statement could cause issues if non-Float8 quantizers are passed during ONNX export

Important Files Changed

  • transformer_engine/pytorch/jit.py: Fixed the no_torch_dynamo decorator to check ONNX export mode at runtime instead of at decoration time, preventing errors with newer PyTorch versions
  • transformer_engine/pytorch/attention/dot_product_attention/backends.py: Added an onnx_forward method to FP8EmulationFunc for ONNX-compatible FP8 quantization/dequantization using flatten + concat + quantize + split operations
  • tests/pytorch/test_onnx_export.py: Added FP8 recipe parameterization to core attention tests, removed the attention_dropout=0.5 parameter, and adjusted tolerance for FP8 tests

Sequence Diagram

sequenceDiagram
    participant User
    participant DotProductAttention
    participant AttentionBackend as get_attention_backend
    participant UnfusedDPA as UnfusedDotProductAttention
    participant FP8Emulation as FP8EmulationFunc
    participant Quantizer as Float8Quantizer
    participant ONNX as ONNX Export

    User->>DotProductAttention: forward(query, key, value)
    DotProductAttention->>AttentionBackend: get_attention_backend()
    
    alt ONNX Export Mode
        AttentionBackend->>AttentionBackend: is_in_onnx_export_mode() == True
        AttentionBackend->>AttentionBackend: allow_emulation = True
        AttentionBackend-->>DotProductAttention: use UnfusedDotProductAttention
        
        DotProductAttention->>UnfusedDPA: forward(Q, K, V)
        UnfusedDPA->>FP8Emulation: apply(Q, K, V, quantizer, "QKV_quantizer")
        FP8Emulation->>FP8Emulation: is_in_onnx_export_mode() == True
        FP8Emulation->>FP8Emulation: onnx_forward()
        
        FP8Emulation->>FP8Emulation: flatten Q, K, V
        FP8Emulation->>FP8Emulation: concatenate tensors
        FP8Emulation->>Quantizer: onnx_quantize(combined)
        Quantizer-->>FP8Emulation: FP8 tensor
        FP8Emulation->>Quantizer: onnx_dequantize(fp8_tensor)
        Quantizer-->>FP8Emulation: dequantized tensor
        FP8Emulation->>FP8Emulation: split and reshape Q, K, V
        FP8Emulation-->>UnfusedDPA: emulated FP8 Q, K, V
        
        UnfusedDPA->>UnfusedDPA: compute attention(Q, K, V)
        UnfusedDPA-->>DotProductAttention: attention output
        DotProductAttention->>ONNX: export to ONNX graph
        ONNX-->>User: ONNX model with FP8 attention
    else Normal Training/Inference
        AttentionBackend->>AttentionBackend: check env var NVTE_UnfusedDPA_Emulate_FP8
        AttentionBackend-->>DotProductAttention: select appropriate backend
        DotProductAttention-->>User: attention output
    end
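
As a concrete, hypothetical reading of the branch at the top of the diagram (and of the "Enabled FP8 emulation during ONNX export" item above), the selection condition amounts to something like the following; the function and variable names are illustrative, not the exact ones used by get_attention_backend.

# Illustrative sketch only; the real backend-selection code differs.
import os
from transformer_engine.pytorch.export import is_in_onnx_export_mode

def allow_fp8_emulation() -> bool:
    """Allow FP8 emulation via the env var, or unconditionally during ONNX export."""
    env_enabled = os.getenv("NVTE_UnfusedDPA_Emulate_FP8", "0") == "1"
    return env_enabled or is_in_onnx_export_mode()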

Contributor

@greptile-apps greptile-apps bot left a comment


No files reviewed, no comments


pggPL and others added 2 commits January 27, 2026 17:39
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


@pggPL
Collaborator Author

pggPL commented Jan 27, 2026

/te-ci pytorch L1

timmoon10
timmoon10 previously approved these changes Jan 27, 2026
Collaborator

@timmoon10 timmoon10 left a comment


LGTM

Comment on lines +223 to +226
# Flatten and concatenate
combined = torch.cat(
    [tensor1.reshape(-1), tensor2.reshape(-1), tensor3.reshape(-1)], dim=0
)
Collaborator


This is fine for FP8 attention, although we'll need to revisit whenever we support MXFP8 or NVFP4. Why can't we concatenate the 2D tensors?

Collaborator Author


I'm not sure if this will work for all layouts and different max_q_length and max_kv_length. I added assertions that it's not MXFP8, because I want to merge this fast. I will rethink it when adding support for MXFP8.
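
For readers following this thread, here is a rough sketch of the round trip being discussed (flatten, concatenate, quantize, dequantize, split). The helper is hypothetical, and the onnx_quantize/onnx_dequantize calls are the quantizer entry points named in the summary above; their exact signatures in backends.py may differ.

# Hypothetical helper sketching the ONNX FP8 emulation round trip; not the
# actual onnx_forward from backends.py.
import torch

def fp8_emulate_qkv(q, k, v, quantizer):
    """Apply one FP8 quantize/dequantize round trip jointly to Q, K and V."""
    shapes = (q.shape, k.shape, v.shape)
    numels = (q.numel(), k.numel(), v.numel())

    # Flatten and concatenate so a single FP8 scale covers all three tensors.
    combined = torch.cat([q.reshape(-1), k.reshape(-1), v.reshape(-1)], dim=0)

    # Quantize to FP8 and immediately dequantize (emulation only).
    fp8_combined = quantizer.onnx_quantize(combined)
    dequantized = quantizer.onnx_dequantize(fp8_combined)

    # Split back into Q, K, V and restore the original layouts.
    q_out, k_out, v_out = torch.split(dequantized, numels, dim=0)
    return q_out.reshape(shapes[0]), k_out.reshape(shapes[1]), v_out.reshape(shapes[2])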

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


Comment on lines +217 to +219
assert isinstance(
    quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer)
), "ONNX FP8 emulation path supports only Float8 quantizers."
Contributor


The assert statement will cause ONNX export to fail if non-Float8 quantizers are used. Consider replacing it with a runtime check that raises a more descriptive error or logs a warning.
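
A hedged sketch of that suggestion, reusing the names from the quoted snippet (not the code actually merged):

# Possible replacement for the assert; a sketch, not the merged implementation.
if not isinstance(quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer)):
    raise NotImplementedError(
        "ONNX FP8 emulation supports only Float8 quantizers, got "
        f"{type(quantizer).__name__}. Export without FP8 attention or use a "
        "Float8 delayed/current scaling recipe."
    )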

@pggPL
Collaborator Author

pggPL commented Jan 27, 2026

/te-ci pytorch L1

@pggPL pggPL merged commit f04b094 into NVIDIA:main Jan 28, 2026
27 of 31 checks passed
KshitijLakhani pushed a commit that referenced this pull request Jan 28, 2026
* jjit bug fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix'

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

---------

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Successfully merging this pull request may close these issues.

Multihead Attention fails fp8 ONNX export
