[Python] Add a PyArrow sanitizers build #49321

@raulcd

Describe the enhancement requested

As part of:

The discussion about adding a sanitizers build for PyArrow popped up. I am creating this issue to track the discussion and raise it as a separate enhancement.

Summary of the discussion there so far:

I think the main difficulty for a PyArrow sanitizers build is that the sanitizer instrumentation should be enabled in CPython as well (and potentially NumPy?).

Originally posted by @pitrou #36411

You may be interested in how numpy & scipy are doing this, in conjunction with CPython. That setup uses pixi as a kind of "light-weight conda-build" orchestrator that wraps the various rebuilds (independent of whether that's via CMake/meson/whatever):

Originally posted by @h-vetinari in #36411

That's an ideal setup, but I don't think it's required: you could point LD_PRELOAD to the sanitizer library to have it loaded correctly from a process that was not built with sanitizers enabled (i.e. Python). We used to do that in CI with pandas, although we abandoned it after a while because it was a maintenance burden.

Originally posted by @WillAyd in #36411
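For reference, the LD_PRELOAD approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not what pandas CI actually ran: the helper names (`asan_preload_env`, `find_libasan`, `run_tests_with_asan`) are hypothetical, the runtime is assumed to be GCC's `libasan.so`, and leak detection is switched off by default on the assumption that an uninstrumented CPython would otherwise produce noisy reports at exit.

```python
import os
import subprocess
import sys


def asan_preload_env(libasan_path, base_env=None):
    """Return a copy of the environment with the ASan runtime preloaded.

    Hypothetical helper: `libasan_path` is the path to the sanitizer
    runtime shared library (an assumption, e.g. GCC's libasan.so).
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Put the sanitizer runtime first so it is loaded before any other
    # preloaded library; LD_PRELOAD entries may be colon-separated.
    env["LD_PRELOAD"] = libasan_path if not existing else f"{libasan_path}:{existing}"
    # Assumption: disable leak detection unless the caller already set
    # ASAN_OPTIONS, since the interpreter itself is not instrumented.
    env.setdefault("ASAN_OPTIONS", "detect_leaks=0")
    return env


def find_libasan(compiler="gcc"):
    """Ask the compiler where its ASan runtime lives (-print-file-name)."""
    result = subprocess.run(
        [compiler, "-print-file-name=libasan.so"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def run_tests_with_asan(libasan_path, pytest_args):
    """Run pytest in a child interpreter that has the ASan runtime preloaded."""
    cmd = [sys.executable, "-m", "pytest", *pytest_args]
    return subprocess.run(cmd, env=asan_preload_env(libasan_path)).returncode
```

A call such as `run_tests_with_asan(find_libasan(), ["some_test"])` would then start the test run in a child process. Only libraries that were themselves compiled with ASan get instrumented memory accesses this way; the preload just makes the runtime available to them.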

Is that enough, though? Ideally, the code is instrumented at compile time (memory accesses etc.). For example, if PyArrow passes a bogus memory pointer to NumPy, we want ASan to notice and that might not happen if NumPy was not compiled with ASan enabled.

Originally posted by @pitrou in #36411

Yeah, for ASAN/TSAN, you need to instrument the other relevant libraries, which means rebuilding them. That is generally a huge pain, which is why the approach I referenced above provides a real benefit. Once all the pieces are in place, it comes down to

pixi run test-asan -t some_test

which rebuilds (& caches) instrumented cpython, numpy etc. as necessary. I haven't been very involved, but the scipy PR contains more details; and I'm pretty sure that Lucas wouldn't mind answering questions (not tagged here because it's already a bit OT).

Originally posted by @h-vetinari in #36411

Component(s)

Python
