Add benchmarks for some of the basic operations on some of the basic types #349
Conversation
Is there some package that could be used for benchmarking here? Ideally you would be able to compare two different versions and see whether any differences are statistically significant.
The failed CI job is possibly due to the Cython constraint and might be fixed after gh-350. |
I was initially assuming that we'd want to follow the philosophy of the tests and keep things pretty minimal. But I've done a bit of research now, and it looks like pyperf could be useful here - it has good support for running a suite of benchmarks and for comparing multiple runs (which would allow us to get comparisons between different builds of the library). We'd still need either some manual effort to set up the different builds in different environments, or some scripting on top of pyperf to automate that a little (I was planning to do that anyway in the world where we aren't using pyperf). If that sounds reasonable to you, I can rewrite these benchmarks to use pyperf. I plan to leave the scaffolding for handling multiple builds to a future PR, so that right now we can focus on whether these are the right things to measure.
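As a rough illustration of what a pyperf version could look like (a sketch, not the code in this PR; the operand values and function names below are made up), each operation becomes a small function passed to pyperf.Runner.bench_func:

```python
# Sketch of a pyperf benchmark suite for python-flint (illustrative only).
import pyperf
from flint import fmpz, arb


def bench_fmpz_add(a=fmpz(10) ** 20, b=fmpz(3) ** 40):
    # Operands are bound as defaults so the timed call measures the
    # operation itself rather than global name lookups.
    return a + b


def bench_arb_mul(a=arb(2).sqrt(), b=arb(3).sqrt()):
    return a * b


runner = pyperf.Runner()
runner.bench_func("fmpz addition", bench_fmpz_add)
runner.bench_func("arb multiplication", bench_arb_mul)
```

A script like this accepts pyperf's usual command-line options, e.g. -o results.json to save the results of a run for later comparison.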
I went ahead and wrote up a version that uses pyperf.
Sorry, this dropped off my radar.
It has taken me a while to figure out how to actually run the benchmarks in my dev setup, but it is:

spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH

This is because I am using environment variables to make libflint.so available to the runtime linker, but pyperf by default drops environment variables when launching the subprocesses that actually run the benchmarks. When I run the benchmarks I see these warnings in the output for each case:

$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (47.8 ns) is 12% of the mean (389 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
acb addition: Mean +- std dev: 389 ns +- 48 ns
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainty of less than 1% variation)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
...

Is there a way to write the benchmarking code differently so that the results are considered to be more reliable? (See the batching sketch after this comment.) The warnings can be suppressed with --quiet. These timings are with a release build:

$ meson setup build --reconfigure -Dbuildtype=release
$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH --quiet
acb addition: Mean +- std dev: 173 ns +- 2 ns
acb contains: Mean +- std dev: 566 ns +- 6 ns
acb multiplication: Mean +- std dev: 165 ns +- 21 ns
arb addition: Mean +- std dev: 138 ns +- 2 ns
arb contains: Mean +- std dev: 1.06 us +- 0.01 us
arb multiplication: Mean +- std dev: 133 ns +- 1 ns
fmpq addition: Mean +- std dev: 184 ns +- 28 ns
fmpq equality: Mean +- std dev: 342 ns +- 6 ns
fmpq multiplication: Mean +- std dev: 208 ns +- 4 ns
fmpz addition: Mean +- std dev: 92.7 ns +- 0.9 ns
fmpz equality: Mean +- std dev: 93.1 ns +- 1.2 ns
fmpz multiplication: Mean +- std dev: 97.4 ns +- 6.0 ns

Then this is using the stable ABI v3.12:

$ meson setup build --reconfigure -Dbuildtype=release -Dpython.allow_limited_api=true -Dlimited_api_version=3.12
$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH --quiet
acb addition: Mean +- std dev: 236 ns +- 42 ns
acb contains: Mean +- std dev: 573 ns +- 17 ns
acb multiplication: Mean +- std dev: 197 ns +- 11 ns
arb addition: Mean +- std dev: 171 ns +- 25 ns
arb contains: Mean +- std dev: 1.05 us +- 0.01 us
arb multiplication: Mean +- std dev: 159 ns +- 14 ns
fmpq addition: Mean +- std dev: 231 ns +- 16 ns
fmpq equality: Mean +- std dev: 464 ns +- 4 ns
fmpq multiplication: Mean +- std dev: 265 ns +- 12 ns
fmpz addition: Mean +- std dev: 130 ns +- 9 ns
fmpz equality: Mean +- std dev: 99.8 ns +- 7.4 ns
fmpz multiplication: Mean +- std dev: 141 ns +- 13 ns

(Side note that I needed to do ...)

This is the stable ABI v3.9:

$ meson setup build --reconfigure -Dbuildtype=release -Dpython.allow_limited_api=true -Dlimited_api_version=3.9
$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH --quiet
acb addition: Mean +- std dev: 195 ns +- 8 ns
acb contains: Mean +- std dev: 545 ns +- 10 ns
acb multiplication: Mean +- std dev: 182 ns +- 16 ns
arb addition: Mean +- std dev: 165 ns +- 10 ns
arb contains: Mean +- std dev: 1.05 us +- 0.03 us
arb multiplication: Mean +- std dev: 152 ns +- 7 ns
fmpq addition: Mean +- std dev: 206 ns +- 10 ns
fmpq equality: Mean +- std dev: 451 ns +- 70 ns
fmpq multiplication: Mean +- std dev: 247 ns +- 12 ns
fmpz addition: Mean +- std dev: 120 ns +- 10 ns
fmpz equality: Mean +- std dev: 94.1 ns +- 3.2 ns
fmpz multiplication: Mean +- std dev: 117 ns +- 7 ns

Those timings are all quite noisy. I haven't done a systematic analysis of statistical significance, but it does look like the stable ABI gives a slowdown for these micro-operations, maybe about 20% slower overall. I don't see a clear difference between using the 3.9 vs 3.12 version of the stable ABI (the Cython docs say that using 3.12 can make some things faster). Probably with something bigger like an ...

Further investigation could be done, especially running the timings again and on a different computer, because this is an old and not very powerful computer. Assuming there is just a 20% slowdown, I think what that means is that in general we don't want to just use the stable ABI for all of the wheels uploaded to PyPI. We could however do something hybrid like using the stable ABI for less common platforms or for older Python versions.

CC @da-woods who may be interested to know about the Cython+stable-ABI timings.
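On the question above about writing the benchmarking code so that the results are considered more reliable: one option sometimes used (a sketch, not necessarily what this PR should do) is to repeat the operation inside the timed function and pass inner_loops to bench_func, so pyperf divides the measured time and still reports per-operation numbers; combined with more processes/values (pyperf's -p/-n options) this usually reduces the reported jitter. The batch size INNER below is an arbitrary choice.

```python
# Sketch: batching to reduce per-call timing jitter (illustrative only).
import pyperf
from flint import fmpz

INNER = 100  # number of operations per timed call (arbitrary)


def bench_fmpz_add_batched(a=fmpz(10) ** 20, b=fmpz(3) ** 40):
    # Repeat the operation INNER times; pyperf divides the measured time
    # by inner_loops, so the reported result is still per operation.
    for _ in range(INNER):
        a + b


runner = pyperf.Runner()
runner.bench_func("fmpz addition", bench_fmpz_add_batched, inner_loops=INNER)
```

Whether the extra loop overhead is acceptable for operations this cheap would need checking.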
Covers addition, multiplication, and equality (contains, for arbs) for acb, arb, fmpq, and fmpz.
The primary goal is to use these to measure the performance effect of using the stable API (#338), but they could be useful for other things in the future.
I'm particularly looking for feedback on whether this should include additional types or operations.
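For the build-to-build comparison that is the stated goal (e.g. default build vs stable ABI), one approach (a sketch; default.json and limited.json are hypothetical file names) is to save each run with -o and then either run python -m pyperf compare_to default.json limited.json, which includes a significance test, or load the files with pyperf's Python API:

```python
# Sketch: compare two saved pyperf result files (hypothetical file names).
import pyperf

default_suite = pyperf.BenchmarkSuite.load("default.json")
limited_suite = pyperf.BenchmarkSuite.load("limited.json")

for bench in default_suite.get_benchmarks():
    name = bench.get_name()
    other = limited_suite.get_benchmark(name)
    # Means are in seconds; report nanoseconds and the slowdown ratio.
    ratio = other.mean() / bench.mean()
    print(f"{name}: {bench.mean() * 1e9:.0f} ns -> {other.mean() * 1e9:.0f} ns ({ratio:.2f}x)")
```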