
Conversation

@JTischbein (Contributor)

Implements Direct I/O (uncached) file reading on Linux to improve model loading performance by bypassing the page cache. This is especially beneficial for large model files.

While mmap is fast when loading the same model multiple times, uncached reads provide consistent model loading times at the sequential read speed of the disk. On DGX Spark, loading GPT-OSS-120B-MXFP4 using mmap takes ~110s on the first load and ~67s on subsequent loads. With these changes it takes consistently ~10.5s. The speedup depends on the model size, the disk read speed and, for repeated loads, the available RAM.

I would propose making uncached reads the default; Windows already has async uncached I/O (PR).
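
As a minimal sketch of the idea (this is not the PR's actual code; the 4096-byte alignment and 1 MiB chunk size are illustrative choices), uncached reading on Linux boils down to opening the file with O_DIRECT and reading into a block-aligned buffer:

```c
// Minimal sketch of uncached (O_DIRECT) sequential reading on Linux.
// Not the code from this PR; alignment and chunk size are illustrative.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model-file>\n", argv[0]);
        return 1;
    }

    // O_DIRECT bypasses the page cache: data moves directly from disk
    // into the user buffer, so repeated loads do not fill buff/cache.
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open(O_DIRECT)");
        return 1;
    }

    // O_DIRECT requires the buffer, offset and request size to be aligned
    // to the logical block size; 4096 bytes is safe on most devices.
    const size_t chunk = 1u << 20; // 1 MiB requests keep the disk streaming
    void * buf = NULL;
    if (posix_memalign(&buf, 4096, chunk) != 0) {
        close(fd);
        return 1;
    }

    size_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0) {
        total += (size_t) n; // a real loader would copy/upload the tensor data here
    }
    if (n < 0) {
        perror("read");
    }
    printf("read %zu bytes uncached\n", total);

    free(buf);
    close(fd);
    return n < 0 ? 1 : 0;
}
```

Note that O_DIRECT also requires the file offset to stay block-aligned, which a sequential whole-file read satisfies naturally; only the final read at EOF may return fewer bytes than requested.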

@ggerganov (Member) left a comment

This results in a huge load speedup on DGX Spark and also leaves the memory in the free state instead of buff/cache at the end of the program.

Currently, the implementation is gated behind defined(__linux__). Is this functionality generally supported across all Linux platforms? If I am reading this correctly, it boils down to having O_DIRECT support for open().
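
For reference, open() has accepted O_DIRECT since Linux 2.4.10, but individual filesystems may still reject the flag (the open(2) man page documents EINVAL for this case, e.g. on tmpfs), so a defensive sketch might look like the following (open_uncached_or_buffered is a hypothetical helper, not code from this PR):

```c
// Hypothetical fallback: try O_DIRECT first, fall back to buffered
// (page-cached) I/O if the filesystem rejects the flag.
#define _GNU_SOURCE
#include <fcntl.h>

static int open_uncached_or_buffered(const char * path, int * is_direct) {
    int fd = open(path, O_RDONLY | O_DIRECT);
    *is_direct = (fd >= 0);
    if (fd < 0) {
        fd = open(path, O_RDONLY); // buffered fallback, uses the page cache
    }
    return fd;
}
```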

Also, do we expect this change to have an effect on non-DGX Spark systems as well?

@lemmi commented Dec 14, 2025

On my Strix Halo machine with btrfs, this is strictly worse than both master and mmap: mmap shows the highest throughput while loading the model (~6 GByte/s), master is around 3 GByte/s, and this patch is 2 GByte/s.

@ehoogeveen-medweb

IIRC, with Strix Halo and ROCm/HIP, loading a model into memory reserved for the GPU using mmap has a major performance issue, hanging essentially indefinitely for larger models. Since reserving memory for the GPU also means having less RAM available to the CPU, it would be great if this Direct I/O path didn't have that issue, as it would make ROCm/HIP more viable for larger models. Vulkan doesn't have this issue.
