


# Benchmarks








The results of these benchmarks suggest that building this `bc` with
optimization at `-O3` with link-time optimization (`-flto`) will result in the
best performance. However, using `-march=native` can result in **WORSE**
performance.








*Note*: all benchmarks were run four times, and the fastest run is the one
shown. In the commands, `[bc]` means whichever `bc` was being run, and the
assumed working directory is the root directory of this repository. This `bc`
was at version `3.0.0` while GNU `bc` was at version `1.07.1`, and all tests
were conducted on an `x86_64` machine running Gentoo Linux with `clang` `9.0.1`
as the compiler.
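
As a rough illustration of the "fastest of four runs" rule, the sketch below
picks the smallest `real` value out of several `time -p` reports. It is not the
script that produced the numbers in this document: `fastest_real` is a
hypothetical helper, the sample timings are made up, and POSIX `awk` is assumed.

```shell
# fastest_real is a hypothetical helper: it reads the output of several
# `time -p` runs on stdin and prints the smallest "real" value.
fastest_real() {
	awk '$1 == "real" && (!seen || $2 + 0 < min) { min = $2 + 0; best = $2; seen = 1 }
	     END { if (seen) print best }'
}

# Illustrative input only (not measurements from this document): the
# concatenated `time -p` output of four runs of the same command.
fastest_real <<'EOF'
real 0.93
user 0.88
sys 0.04
real 0.90
user 0.86
sys 0.03
real 0.95
user 0.89
sys 0.05
real 0.91
user 0.87
sys 0.03
EOF
```

Only the `real` lines are compared; the `user` and `sys` lines of the other
runs are ignored.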








## Typical Optimization Level








These benchmarks were run with both `bc`'s compiled with the typical `-O2`
optimizations and no link-time optimization.








### Addition








The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.54
user 1.21
sys 1.32
```

For this `bc`:

```
real 0.88
user 0.85
sys 0.02
```








### Subtraction








The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.51
user 1.05
sys 1.45
```

For this `bc`:

```
real 0.91
user 0.85
sys 0.05
```








### Multiplication








The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.15
user 4.69
sys 2.46
```

For this `bc`:

```
real 2.20
user 2.10
sys 0.09
```








### Division








The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.36
user 1.87
sys 1.48
```

For this `bc`:

```
real 1.61
user 1.57
sys 0.03
```








### Power








The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.30
user 11.30
sys 0.00
```

For this `bc`:

```
real 0.73
user 0.72
sys 0.00
```








### Scripts








[This file][1] was downloaded, saved at `../timeconst.bc`, and the following
patch was applied:

```
--- ../timeconst.bc	2018-09-28 11:32:22.808669000 -0600
+++ ../timeconst.bc	2019-06-07 07:26:36.359913078 -0600
@@ -110,8 +110,10 @@
 
 		print "#endif /* KERNEL_TIMECONST_H */\n"
 	}
-	halt
 }
 
-hz = read();
-timeconst(hz)
+for (i = 0; i <= 50000; ++i) {
+	timeconst(i)
+}
+
+halt
```








The command used was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 16.71
user 16.06
sys 0.65
```

For this `bc`:

```
real 13.16
user 13.15
sys 0.00
```








Because this `bc` is faster when doing math, it might be a better comparison to
run a script that does not do any math. As such, I put the following into
`../test.bc`:

```
for (i = 0; i < 100000000; ++i) {
	y = i
}

i
y

halt
```

The command used was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 16.60
user 16.59
sys 0.00
```

For this `bc`:

```
real 22.76
user 22.75
sys 0.00
```








I also put the following into `../test2.bc`:

```
i = 0

while (i < 100000000) {
	i += 1
}

i

halt
```

The command used was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 17.32
user 17.30
sys 0.00
```

For this `bc`:

```
real 16.98
user 16.96
sys 0.01
```








It seems that the improvements to the interpreter helped a lot in certain cases.








Also, I have no idea why GNU `bc` did worse when it is technically doing less
work.
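
To put the `-O2` timings above in relative terms, the `real` times can be
divided directly. This is only a sketch: `ratio` is a hypothetical helper, POSIX
`awk` is assumed, and the inputs are the `real` values copied from the sections
above.

```shell
# ratio is a hypothetical helper: GNU bc "real" time divided by this bc's
# "real" time. A value above 1 means this bc was faster.
ratio() {
	awk -v gnu="$1" -v this="$2" 'BEGIN { printf "%.2f\n", gnu / this }'
}

# "real" times from the -O2 results above: GNU bc first, this bc second.
ratio 2.54 0.88      # addition
ratio 2.51 0.91      # subtraction
ratio 7.15 2.20      # multiplication
ratio 3.36 1.61      # division
ratio 11.30 0.73     # power
ratio 16.71 13.16    # timeconst.bc
ratio 16.60 22.76    # test.bc (for loop)
ratio 17.32 16.98    # test2.bc (while loop)
```

The `for` loop script is the one case where the ratio drops below 1, which
matches the raw timings: this `bc` took 22.76 seconds against 16.60 for GNU
`bc`.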








## Recommended Optimizations from `2.7.0`








Note that the optimizations used for the benchmarks above are not the ones I
recommended for version `2.7.0`, which are `-O3 -flto -march=native`.








This `bc` separates its code into modules, and optimizing at link time removes
a lot of the inefficiency that comes from function call overhead between them.
This is most keenly felt with one function: `bc_vec_item()`, which should turn
into just one instruction (on `x86_64`) when optimized at link time and inlined.
There are other functions that matter as well.








I also recommended `-march=native` on the grounds that newer instructions would
increase performance on math-heavy code. We will see if that assumption was
correct. (Spoiler: **NO**.)








When compiling both `bc`'s with the optimizations I recommended for this `bc`
for version `2.7.0`, the results are as follows.








### Addition








The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.44
user 1.11
sys 1.32
```

For this `bc`:

```
real 0.59
user 0.54
sys 0.05
```








### Subtraction








The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.42
user 1.02
sys 1.40
```

For this `bc`:

```
real 0.64
user 0.57
sys 0.06
```








### Multiplication








The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.01
user 4.50
sys 2.50
```

For this `bc`:

```
real 1.59
user 1.53
sys 0.05
```








### Division








The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.26
user 1.82
sys 1.44
```

For this `bc`:

```
real 1.24
user 1.20
sys 0.03
```








### Power








The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.08
user 11.07
sys 0.00
```

For this `bc`:

```
real 0.71
user 0.70
sys 0.00
```








### Scripts








The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.62
user 15.08
sys 0.53
```

For this `bc`:

```
real 10.09
user 10.08
sys 0.01
```








The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.76
user 14.75
sys 0.00
```

For this `bc`:

```
real 17.95
user 17.94
sys 0.00
```








The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.84
user 14.83
sys 0.00
```

For this `bc`:

```
real 13.53
user 13.52
sys 0.00
```








## Link-Time Optimization Only

Just for kicks, let's see if `-march=native` is even useful.

The optimizations I used for both `bc`'s were `-O3 -flto`.








### Addition








The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.41
user 1.05
sys 1.35
```

For this `bc`:

```
real 0.58
user 0.52
sys 0.05
```








### Subtraction








The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.39
user 1.10
sys 1.28
```

For this `bc`:

```
real 0.65
user 0.57
sys 0.07
```








### Multiplication








The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 6.82
user 4.30
sys 2.51
```

For this `bc`:

```
real 1.57
user 1.49
sys 0.08
```








### Division








The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.25
user 1.81
sys 1.43
```

For this `bc`:

```
real 1.27
user 1.23
sys 0.04
```








### Power








The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 10.50
user 10.49
sys 0.00
```

For this `bc`:

```
real 0.72
user 0.71
sys 0.00
```








### Scripts








The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.50
user 14.81
sys 0.68
```

For this `bc`:

```
real 10.17
user 10.15
sys 0.01
```








The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.99
user 14.99
sys 0.00
```

For this `bc`:

```
real 16.85
user 16.84
sys 0.00
```








The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.92
user 14.91
sys 0.00
```

For this `bc`:

```
real 12.75
user 12.75
sys 0.00
```








It turns out that `-march=native` can be a problem. As such, I have removed the
recommendation to build with `-march=native`.
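
To see where `-march=native` made a difference, the `real` times for this `bc`
from the two `-O3` sections above can be compared as percent changes. This is a
sketch under stated assumptions: `pct_change` is a hypothetical helper, POSIX
`awk` is assumed, and the inputs are copied from the results above.

```shell
# pct_change is a hypothetical helper: percent change in "real" time when
# -march=native is added. A positive value means slower with -march=native.
pct_change() {
	awk -v native="$1" -v lto="$2" \
		'BEGIN { printf "%.1f%%\n", (native - lto) / lto * 100 }'
}

# "real" times for this bc, copied from the sections above. The first
# argument is the -O3 -flto -march=native time, the second the -O3 -flto time.
pct_change 0.59 0.58      # addition
pct_change 0.64 0.65      # subtraction
pct_change 1.59 1.57      # multiplication
pct_change 1.24 1.27      # division
pct_change 0.71 0.72      # power
pct_change 10.09 10.17    # timeconst.bc
pct_change 17.95 16.85    # test.bc (for loop)
pct_change 13.53 12.75    # test2.bc (while loop)
```

The two loop scripts show the largest changes, and both are slower with
`-march=native`; the math benchmarks differ by only a few hundredths of a
second in either direction.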








## Recommended Compiler








When I ran these benchmarks with my `bc` compiled under `clang` vs. `gcc`, it
performed much better under `clang`. I recommend compiling this `bc` with
`clang`.








[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
