Add benchmarks for some of the basic operations on some of the basic types #349
Conversation
Is there some package that could be used for benchmarking here? Ideally you would be able to compare two different versions and see whether any differences are statistically significant.
The failed CI job is possibly due to the Cython constraint and might be fixed after gh-350. |
I was initially assuming that we'd want to follow the philosophy of the tests and keep things pretty minimal. But I've done a bit of research now, and it looks like pyperf could be useful here - it has good support for running a suite of benchmarks and for comparing multiple runs (which would allow us to get comparisons between different builds of the library). We'd still need either some manual effort to set up the different builds in different environments, or some scripting on top of pyperf to automate that a little (I was planning to do that anyway in the world where we aren't using pyperf). If that sounds reasonable to you, I can rewrite these benchmarks to use pyperf. I plan to leave the scaffolding for handling multiple builds to a future PR, so that right now we can focus on whether these are the right things to measure.
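As a rough illustration of what a pyperf version could look like (a sketch, not the code in this PR; the operand values and function names below are made up), each operation becomes a small function passed to pyperf.Runner.bench_func:

```python
# Sketch of a pyperf benchmark suite for python-flint (illustrative only).
import pyperf
from flint import fmpz, arb


def bench_fmpz_add(a=fmpz(10) ** 20, b=fmpz(3) ** 40):
    # Operands are bound as defaults so the timed call measures the
    # operation itself rather than global name lookups.
    return a + b


def bench_arb_mul(a=arb(2).sqrt(), b=arb(3).sqrt()):
    return a * b


runner = pyperf.Runner()
runner.bench_func("fmpz addition", bench_fmpz_add)
runner.bench_func("arb multiplication", bench_arb_mul)
```

A script like this accepts pyperf's usual command-line options, e.g. -o results.json to save the results of a run for later comparison.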
I went ahead and wrote up a version that uses pyperf.
Sorry, this dropped off my radar.
It has taken me a while to figure out how to actually run the benchmarks in my dev setup, but it is:

spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH

This is because I am using environment variables to make libflint.so available to the runtime linker, but pyperf by default drops environment variables when launching the subprocesses that actually run the benchmarks. When I run the benchmarks I see these warnings in the output for each case:

$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (47.8 ns) is 12% of the mean (389 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
acb addition: Mean +- std dev: 389 ns +- 48 ns
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainty of less than 1% variation)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
...

Is there a way to write the benchmarking code differently so that the results are considered to be more reliable? (See the batching sketch after this comment.) The warnings can be suppressed with --quiet. These timings are with a release build:

$ meson setup build --reconfigure -Dbuildtype=release
$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH --quiet
acb addition: Mean +- std dev: 173 ns +- 2 ns
acb contains: Mean +- std dev: 566 ns +- 6 ns
acb multiplication: Mean +- std dev: 165 ns +- 21 ns
arb addition: Mean +- std dev: 138 ns +- 2 ns
arb contains: Mean +- std dev: 1.06 us +- 0.01 us
arb multiplication: Mean +- std dev: 133 ns +- 1 ns
fmpq addition: Mean +- std dev: 184 ns +- 28 ns
fmpq equality: Mean +- std dev: 342 ns +- 6 ns
fmpq multiplication: Mean +- std dev: 208 ns +- 4 ns
fmpz addition: Mean +- std dev: 92.7 ns +- 0.9 ns
fmpz equality: Mean +- std dev: 93.1 ns +- 1.2 ns
fmpz multiplication: Mean +- std dev: 97.4 ns +- 6.0 ns

Then this is using the stable ABI v3.12:

$ meson setup build --reconfigure -Dbuildtype=release -Dpython.allow_limited_api=true -Dlimited_api_version=3.12
$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH --quiet
acb addition: Mean +- std dev: 236 ns +- 42 ns
acb contains: Mean +- std dev: 573 ns +- 17 ns
acb multiplication: Mean +- std dev: 197 ns +- 11 ns
arb addition: Mean +- std dev: 171 ns +- 25 ns
arb contains: Mean +- std dev: 1.05 us +- 0.01 us
arb multiplication: Mean +- std dev: 159 ns +- 14 ns
fmpq addition: Mean +- std dev: 231 ns +- 16 ns
fmpq equality: Mean +- std dev: 464 ns +- 4 ns
fmpq multiplication: Mean +- std dev: 265 ns +- 12 ns
fmpz addition: Mean +- std dev: 130 ns +- 9 ns
fmpz equality: Mean +- std dev: 99.8 ns +- 7.4 ns
fmpz multiplication: Mean +- std dev: 141 ns +- 13 ns

(Side note that I needed to do ...)

This is the stable ABI v3.9:

$ meson setup build --reconfigure -Dbuildtype=release -Dpython.allow_limited_api=true -Dlimited_api_version=3.9
$ spin run python benchmarks/simple_benchmarks.py --inherit-environ=PYTHONPATH,LD_LIBRARY_PATH --quiet
acb addition: Mean +- std dev: 195 ns +- 8 ns
acb contains: Mean +- std dev: 545 ns +- 10 ns
acb multiplication: Mean +- std dev: 182 ns +- 16 ns
arb addition: Mean +- std dev: 165 ns +- 10 ns
arb contains: Mean +- std dev: 1.05 us +- 0.03 us
arb multiplication: Mean +- std dev: 152 ns +- 7 ns
fmpq addition: Mean +- std dev: 206 ns +- 10 ns
fmpq equality: Mean +- std dev: 451 ns +- 70 ns
fmpq multiplication: Mean +- std dev: 247 ns +- 12 ns
fmpz addition: Mean +- std dev: 120 ns +- 10 ns
fmpz equality: Mean +- std dev: 94.1 ns +- 3.2 ns
fmpz multiplication: Mean +- std dev: 117 ns +- 7 ns

Those timings are all quite noisy. I haven't done a systematic analysis of statistical significance, but it does look like the stable ABI gives a slowdown for these micro-operations, maybe about 20% slower overall. I don't see a clear difference between using the 3.9 vs 3.12 version of the stable ABI (the Cython docs say that using 3.12 can make some things faster). Probably with something bigger like an ...

Further investigation could be done, especially running the timings again and on a different computer, because this is an old and not very powerful computer. Assuming there is just a 20% slowdown, I think what that means is that in general we don't want to just use the stable ABI for all of the wheels uploaded to PyPI. We could however do something hybrid like using the stable ABI for less common platforms or for older Python versions.

CC @da-woods who may be interested to know about the Cython+stable-ABI timings.
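On the question above about writing the benchmarking code so that the results are considered more reliable: one option sometimes used (a sketch, not necessarily what this PR should do) is to repeat the operation inside the timed function and pass inner_loops to bench_func, so pyperf divides the measured time and still reports per-operation numbers; combined with more processes/values (pyperf's -p/-n options) this usually reduces the reported jitter. The batch size INNER below is an arbitrary choice.

```python
# Sketch: batching to reduce per-call timing jitter (illustrative only).
import pyperf
from flint import fmpz

INNER = 100  # number of operations per timed call (arbitrary)


def bench_fmpz_add_batched(a=fmpz(10) ** 20, b=fmpz(3) ** 40):
    # Repeat the operation INNER times; pyperf divides the measured time
    # by inner_loops, so the reported result is still per operation.
    for _ in range(INNER):
        a + b


runner = pyperf.Runner()
runner.bench_func("fmpz addition", bench_fmpz_add_batched, inner_loops=INNER)
```

Whether the extra loop overhead is acceptable for operations this cheap would need checking.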
Covers addition, multiplication, and equality (contains, for arbs) for acb, arb, fmpq, and fmpz.
The primary goal is to use these to measure the performance effect of using the stable API (#338), but they could be useful for other things in the future.
I'm particularly looking for feedback on whether this should include additional types or operations.
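For the build-to-build comparison that is the stated goal (e.g. default build vs stable ABI), one approach (a sketch; default.json and limited.json are hypothetical file names) is to save each run with -o and then either run python -m pyperf compare_to default.json limited.json, which includes a significance test, or load the files with pyperf's Python API:

```python
# Sketch: compare two saved pyperf result files (hypothetical file names).
import pyperf

default_suite = pyperf.BenchmarkSuite.load("default.json")
limited_suite = pyperf.BenchmarkSuite.load("limited.json")

for bench in default_suite.get_benchmarks():
    name = bench.get_name()
    other = limited_suite.get_benchmark(name)
    # Means are in seconds; report nanoseconds and the slowdown ratio.
    ratio = other.mean() / bench.mean()
    print(f"{name}: {bench.mean() * 1e9:.0f} ns -> {other.mean() * 1e9:.0f} ns ({ratio:.2f}x)")
```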