Description
Feature Idea
My custom node, https://github.com/phaserblast/ComfyUI-DGXSparkSafetensorsLoader, uses fastsafetensors to allocate memory and load safetensors models directly into VRAM using GPUDirect, bypassing ComfyUI's memory management entirely. This was needed to work around the mmap() call used by Hugging Face's safetensors loader, which doesn't work well on shared memory systems like DGX Spark.
The problem with this approach is that, although models load in only a few seconds (as opposed to a minute or more with the Hugging Face loader), the allocated memory can't be freed. ComfyUI's memory management has no knowledge of the memory the node allocates, so it cannot call the fastsafetensors methods needed to free it.
The solution is to have ComfyUI accept a callback containing the code needed to free the memory allocated by a custom implementation. When ComfyUI wants to unload a model, it can check whether the callback is non-null and, if so, call it instead of its own built-in unload logic. This would give custom nodes far more flexibility when loading models.
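A minimal sketch of the idea, in plain Python. Every name here (`LoadedModel`, `MemoryManager`, `free_callback`) is hypothetical and only illustrates the control flow; none of it is ComfyUI's actual API, and the lambda stands in for whatever fastsafetensors cleanup the custom node would really perform.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoadedModel:
    """Hypothetical record of a loaded model (not a real ComfyUI class)."""
    name: str
    # Optional hook supplied by a custom node. None means "use the default path".
    free_callback: Optional[Callable[[], None]] = None


class MemoryManager:
    """Hypothetical stand-in for the host's memory management."""

    def __init__(self) -> None:
        self.loaded: List[LoadedModel] = []
        self.log: List[str] = []  # records which unload path was taken

    def register(self, model: LoadedModel) -> None:
        self.loaded.append(model)

    def _default_unload(self, model: LoadedModel) -> None:
        # Placeholder for the built-in unload logic.
        self.log.append(f"default:{model.name}")

    def unload(self, model: LoadedModel) -> None:
        if model.free_callback is not None:
            # The custom node owns this memory, so let it free it.
            model.free_callback()
            self.log.append(f"callback:{model.name}")
        else:
            self._default_unload(model)
        self.loaded.remove(model)


# Usage: one model with a custom free hook, one without.
freed: List[str] = []
manager = MemoryManager()
custom = LoadedModel("gds_model", free_callback=lambda: freed.append("gds_model"))
builtin = LoadedModel("regular_model")
manager.register(custom)
manager.register(builtin)
manager.unload(custom)
manager.unload(builtin)
```

The point of the design is that the manager still decides when a model is unloaded, while the node that allocated the memory decides how it is freed.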
Existing Solutions
No response
Other
No response