Fix mel spectrogram preprocessor allocating gigabytes of planned memory #18229
mergennachin merged 1 commit into main
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18229
Note: links to docs will display an error until the docs builds have completed. ⏳ No Failures, 9 Pending as of commit d6c31f6 with merge base 1e17e28. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview

This PR fixes excessive planned-memory allocation during torch.export of the Whisper mel-spectrogram preprocessor by correcting the bound used for the waveform's dynamic length dimension, with a tighter cap for streaming mode to keep STFT intermediate buffers small.

Changes:
- Fix the offline export dynamic max to `max_audio_len * sampling_rate` (seconds → samples) instead of mistakenly multiplying by `n_samples`.
- Add a streaming-specific dynamic max cap at `2 * sampling_rate` to prevent multi-GB memory plans.
The dynamic dimension max was computed as `max_audio_len * n_samples` (samples per 30 s chunk), not `max_audio_len * sampling_rate`. With `max_audio_len=300`, this produced 144M samples (150 minutes) instead of 4.8M (5 minutes), causing a ~3.3 GB planned buffer for STFT intermediates. For streaming mode, the max was even worse: 600 * 480K = 288M samples, producing a 6.6 GB planned buffer, even though streaming processes ~1,640 samples per step.

Fix both paths:
- Offline: use `max_audio_len * sampling_rate` (300 s → 4.8M samples, ~110 MB)
- Streaming: cap at 2 seconds (32K samples, ~0.7 MB)
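The size arithmetic above can be checked directly; this sketch just reproduces the numbers quoted in the fix description (the variable names mirror the PR's description, not necessarily its code):

```python
# Reproduce the bound arithmetic from the fix (values from the PR text).
sampling_rate = 16_000                       # Hz
n_samples = 30 * sampling_rate               # samples per 30 s chunk = 480,000
max_audio_len = 300                          # seconds (offline)

buggy_max = max_audio_len * n_samples        # 144,000,000 samples (150 min)
fixed_max = max_audio_len * sampling_rate    # 4,800,000 samples (5 min)

# Streaming: the old bound was 600 * 480K; the new cap is 2 s of audio.
streaming_buggy = 600 * n_samples            # 288,000,000 samples
streaming_fixed = 2 * sampling_rate          # 32,000 samples

print(buggy_max, fixed_max, streaming_buggy, streaming_fixed)
```

The ~30x offline and ~9000x streaming reductions in the declared bound are what shrink the planned STFT buffers from gigabytes to megabytes.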
Force-pushed from a08eec5 to d6c31f6
```python
if model.streaming:
    # Streaming processes small windows per step. 2 seconds gives
    # comfortable headroom while keeping the memory plan tight.
    max_samples = 2 * model.sampling_rate
```
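Both paths of the bound selection can be sketched as a small helper (a hypothetical function for illustration, not the PR's actual code):

```python
def export_max_samples(streaming: bool,
                       max_audio_len: int,
                       sampling_rate: int = 16_000) -> int:
    """Upper bound for the waveform's dynamic length dimension at export
    time. Illustrative helper; names follow the PR description."""
    if streaming:
        # Streaming steps are ~1,640 samples; 2 s is generous headroom.
        return 2 * sampling_rate
    # Offline: seconds * samples-per-second, NOT samples-per-30s-chunk.
    return max_audio_len * sampling_rate

print(export_max_samples(True, 300))   # 32000
print(export_max_samples(False, 300))  # 4800000
```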
Any performance issues with this? In streaming mode, does each inference take 2 s worth of samples and then start over for the next two seconds?
No performance issue. `max_samples = 2 * sampling_rate` is only the dynamic-shape upper bound at export time: it tells the memory planner the maximum buffer size to allocate. It doesn't affect how inference runs.
At runtime, the streaming preprocessor is called with ~1,640 samples per step (~0.1 s). The exported graph handles any input size from 1 up to the declared max.
The 2-second cap just means that if someone somehow passed more than 32,000 samples in a single call, it would fail. In practice the streaming window is fixed at 1,640 samples.
@pytorchbot cherry-pick --onto release/1.2 -c critical
…ry (#18229)

Peak RSS for voxtral runner: 9,556 MB before, 4,712 MB after.

(cherry picked from commit 776979f)
Cherry picking #18229: the cherry-pick PR is at #18238, and it is recommended to link a critical cherry-pick PR with an issue. Raised by workflow job.