Description
In the v3 spec, the chunk_grid field is extensible. People can define their own chunk grids.
In Zarr Python, we currently have one chunk grid (the regular chunk grid, where every chunk has the same shape), and we are planning on adding another one (the rectilinear chunk grid, where every chunk can have a different shape). Over in zarrs, there's a third chunk grid, which is regular except for boundary chunks, which are trimmed. We should support this too.
The rectilinear chunk grid is strictly more general than both the regular chunk grid and the zarrs regular-bounded chunk grid -- any regular or regular-bounded chunk grid can be expressed as a rectilinear chunk grid, and if your implementation supports the rectilinear chunk grid, you support these other two chunk grids for free.
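To make the "special case" claim concrete, here is a minimal sketch that expands a regular or regular-bounded grid into explicit per-axis chunk edge lengths, which is all a rectilinear grid needs. The function name `regular_edges`, the `trim_boundary` flag, and the tuple-of-tuples representation are hypothetical, chosen for illustration; they are not the Zarr Python API or the rectilinear spec's metadata format.

```python
from math import ceil


def regular_edges(array_shape, chunk_shape, trim_boundary=False):
    """Expand a regular grid into explicit per-axis chunk edge lengths.

    Hypothetical helper for illustration only.
    trim_boundary=False models the regular grid (last chunk may overhang the array).
    trim_boundary=True models a regular-bounded grid (last chunk is trimmed to fit).
    """
    edges = []
    for n, c in zip(array_shape, chunk_shape):
        nchunks = ceil(n / c)
        axis = [c] * nchunks
        if trim_boundary and nchunks > 0:
            # Shrink the final chunk so the edges sum exactly to the axis length.
            axis[-1] = n - c * (nchunks - 1)
        edges.append(tuple(axis))
    return tuple(edges)


regular = regular_edges((10, 6), (4, 3))
bounded = regular_edges((10, 6), (4, 3), trim_boundary=True)
print(regular)  # ((4, 4, 4), (3, 3))
print(bounded)  # ((4, 4, 2), (3, 3))
```

Both results are just lists of edge lengths per axis, so code that understands arbitrary edge lengths handles both grids without a separate code path.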
This is very different from other extension points like data types or chunk key encodings. Supporting uint8 doesn't help you with float32 or vice versa.
Over in #3735 I set up chunk grids the same way as the other extensible metadata, with a registry. This was basically extrapolation from how data types and chunk key encodings work. But @maxrjones was skeptical about this approach, and I think that skepticism was right. I don't think we can re-use the chunk key encoding / data type patterns for chunk grids.
Here are some thoughts about how our chunk grid API could look when we add support for rectilinear chunking. I'm interested in any and all feedback on these ideas:
- we should have all our chunk grid logic in one place, e.g. just one chunk grid class. The regular chunk grid doesn't need a separate class, because it's just a special case of rectilinear chunking (rectilinear chunk grids can also be regular). We should think about the rectilinear chunk grid implementation as a replacement for our current (regular) chunk grid implementation.
- Unless directed otherwise, we should serialize array metadata with the simplest chunk grid metadata that can describe the user-requested chunk arrangement of the array. If a user asks for regular chunks, we serialize the regular chunk grid. If a user has a regularly-chunked array and resizes it, we should expose control over the chunking of the expanded / contracted region, and update the chunk grid metadata accordingly. It should be easy, but also transparent, to opt into rectilinear chunking starting from a regularly-chunked array.
- Because the rectilinear chunk grid will be the most general chunk grid we can support as an implementation, and other known chunk grids are special cases of the rectilinear chunk grid, I don't think there's value in allowing user-defined chunk grids. Happy to be corrected here if there's some unknown chunk grid that's more general than the rectilinear chunk grid which we could actually support. I'm not aware of one that retains the properties we assume for chunk grids.
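As a rough sketch of the "serialize the simplest chunk grid" idea, the snippet below picks the metadata from a single internal representation of per-axis chunk edge lengths. The `regular` output follows the v3 spec's `chunk_grid` shape (`name` plus `configuration.chunk_shape`). The function name `simplest_chunk_grid` and the `chunk_edges` key in the rectilinear branch are placeholders I made up for illustration, not the actual rectilinear metadata format, and each axis is assumed to have at least one chunk.

```python
def simplest_chunk_grid(edges):
    """Pick the simplest chunk grid metadata that describes `edges`.

    `edges` holds one tuple of chunk edge lengths per axis (hypothetical
    internal representation). Assumes every axis has at least one chunk.
    """
    if all(len(set(axis)) == 1 for axis in edges):
        # Every chunk along every axis has the same length: a regular grid suffices.
        return {
            "name": "regular",
            "configuration": {"chunk_shape": [axis[0] for axis in edges]},
        }
    # Otherwise fall back to the general grid. "chunk_edges" is a placeholder key.
    return {
        "name": "rectilinear",
        "configuration": {"chunk_edges": [list(axis) for axis in edges]},
    }


print(simplest_chunk_grid(((4, 4, 4), (3, 3)))["name"])     # regular
print(simplest_chunk_grid(((4, 4, 4, 7), (3, 3)))["name"])  # rectilinear
```

The second call models a resize where the user chose a differently sized chunk for the expanded region: the same class and the same data flow through, and only the serialized metadata changes.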
Anyway, those are some of my thoughts from thinking about how to ship the features added in #3534. Curious to hear if people have other ideas.