==================================
Benchmarking tips
==================================


Introduction
============

When benchmarking a patch, we want to reduce every source of noise as
much as possible. How to do that is very OS dependent.

Note that low noise is required, but not sufficient. It does not
exclude measurement bias. See
https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
an example.

General
=======

* Use a high resolution timer, e.g. ``perf`` on Linux.

* Run the benchmark multiple times to be able to recognize noise.

* Disable as many processes or services as possible on the target system.

* Disable frequency scaling, turbo boost and address space
  randomization (see the OS specific sections).

* Link statically if the OS supports it. That avoids any variation that
  might be introduced by loading dynamic libraries. This can be done
  by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake.

* Try to avoid storage. On some systems you can use tmpfs. Putting the
  program, inputs and outputs on tmpfs avoids touching a real storage
  system, which can be highly variable.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount

Linux
=====

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Set scaling_governor to performance::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done

* Use https://github.com/lpechacek/cpuset to reserve CPUs for just the
  program you are benchmarking. If using perf, reserve at least 2 cores
  so that perf runs in one and your program in another::

    cset shield -c N1,N2 -k on

  This will move all threads out of N1 and N2. The ``-k on`` means
  that even kernel threads are moved out.

* Disable the SMT sibling of each CPU you will use for the benchmark.
  The sibling of CPU N is listed in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list``. Take
  it offline, where X is the sibling's number, with::

    echo 0 > /sys/devices/system/cpu/cpuX/online

* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This will run the command after ``--`` on the isolated CPUs. The
  particular perf command runs the ``<cmd>`` 10 times and reports
  statistics.

With these in place you can expect perf variations of less than 0.1%.

Linux Intel
-----------

* Disable turbo mode::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
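
Before a run it can help to confirm that the Linux settings described
in this document are actually in effect, since they are reset on
reboot. The following is a minimal, read-only sketch and not part of
any existing tool. The function name ``check_bench_settings`` and the
``SYSROOT`` prefix are hypothetical, introduced here only for
illustration (``SYSROOT`` lets the function be pointed at a fake
directory tree; leave it unset on a real system)::

    #!/bin/sh
    # Read-only check of the benchmark settings described in this document.
    # SYSROOT is a hypothetical prefix for testing; leave it empty normally.

    check_bench_settings() {
        root="${SYSROOT:-}"
        status=0

        # Address space randomization should be 0 (disabled).
        aslr_file="$root/proc/sys/kernel/randomize_va_space"
        if [ -r "$aslr_file" ]; then
            aslr=$(cat "$aslr_file")
            if [ "$aslr" = "0" ]; then
                echo "ok: ASLR disabled"
            else
                echo "warn: randomize_va_space is $aslr, expected 0"
                status=1
            fi
        else
            echo "skip: $aslr_file not readable"
        fi

        # Every scaling governor should be "performance".
        for gov_file in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
        do
            # Skip the unexpanded pattern or CPUs without cpufreq support.
            [ -r "$gov_file" ] || continue
            gov=$(cat "$gov_file")
            if [ "$gov" != "performance" ]; then
                echo "warn: $gov_file is $gov, expected performance"
                status=1
            fi
        done

        # On Intel systems using intel_pstate, no_turbo should be 1.
        turbo_file="$root/sys/devices/system/cpu/intel_pstate/no_turbo"
        if [ -r "$turbo_file" ]; then
            turbo=$(cat "$turbo_file")
            if [ "$turbo" = "1" ]; then
                echo "ok: turbo disabled"
            else
                echo "warn: no_turbo is $turbo, expected 1"
                status=1
            fi
        else
            echo "skip: $turbo_file not readable"
        fi

        return "$status"
    }

After sourcing the file, calling ``check_bench_settings`` prints one
line per finding and returns a non-zero status if any setting differs
from the values recommended here. It only reads files, so it does not
need root.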