b_eff_io is short for Effective I/O Bandwidth Benchmark. While it aims to measure the I/O performance that can be expected in parallel applications, the benchmark also evaluates the speed of several different access patterns.
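The access patterns in question are the MPI-IO patterns listed in the output further below (scatter, shared, separate, segmented, and segmented-collective). As a rough illustration only, and not code from b_eff_io itself, the following sketch shows two of them; the file name and chunk size are placeholders:

/* Minimal sketch of two MPI-IO access patterns (illustration only). */
#include <mpi.h>
#include <string.h>

#define CHUNK (1024 * 1024)  /* 1 MByte chunk; one of the chunk sizes b_eff_io uses */

int main(int argc, char **argv)
{
    MPI_File    fh;
    MPI_Status  st;
    int         rank;
    static char buf[CHUNK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank & 0xff, CHUNK);

    /* file name is illustrative only */
    MPI_File_open(MPI_COMM_WORLD, "pattern_demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* segmented-style access (roughly pattern type 3): each rank writes
       one contiguous chunk at its own explicit offset */
    MPI_File_write_at(fh, (MPI_Offset)rank * CHUNK, buf, CHUNK, MPI_BYTE, &st);

    /* shared-file-pointer access (pattern type 1): all ranks write through
       one shared pointer; the -noshared option makes b_eff_io avoid this
       call and use individual file pointers instead */
    MPI_File_write_shared(fh, buf, CHUNK, MPI_BYTE, &st);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}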
After compiling the benchmark with mpicc, simply call the tool with mpirun. The only mandatory parameters are -MB and -MT, which set the memory per node and the total memory used for the benchmark, respectively.
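A minimal invocation could therefore look like this (the process count and memory sizes are placeholders that must match your system):

mpirun -np 4 ./b_eff_io -MB 512 -MT 2048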
Running

mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920 -noshared -rewrite -N 32 -T 60 -p /MY_fast_filesystem/MY_directory -f MYSYS_32pe_0060sec_noshared

results in the following output on our test machine (saved in MYSYS_32pe_0060sec_noshared.sum). Here, -N sets the number of processes, -T the scheduled run time in seconds, -p the directory in which the test files are created, and -f the prefix of the result files; -rewrite adds the rewrite patterns, and -noshared replaces the shared file pointers of pattern type 1 with individual file pointers.
b_eff_io.c, Revision 2.1 from Dec. 12, 2001
MEMORY_PER_PROCESSOR = 2048 MBytes [1MBytes = 1024*1024 bytes, 1MB = 1e6 bytes]
Maximum chunk size   = 16.000 MBytes
-N 32  T=60, MT=81920 MBytes, -noshared, -rewrite
PATH=/home/gresens/test, PREFIX=MYSYS_32pe_0060sec_noshared

 system name : Linux
 hostname    : cluster
 OS release  : 3.13.0-76-generic
 OS version  : #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016
 machine     : x86_64
Date of measurement: Sat Mar 26 17:42:34 2016

Summary of file I/O bandwidth accumulated on 32 processes with 2048 MByte/PE
-----------------------------------------------------------------------------
 number  pos  chunk-    access   type=0   type=1   type=2   type=3   type=4
 of PEs       size (l)  methode  scatter  shared   separate segmened seg-coll
              [bytes]   methode  [MB/s]   [MB/s]   [MB/s]   [MB/s]   [MB/s]
-----------------------------------------------------------------------
 32 PEs   1      1024   write     50.373    0.621    1.462    2.136    0.814
 32 PEs   2      1032   write     60.067    0.662    1.909    1.582    0.695
 32 PEs   3     32768   write     58.940   11.607   32.038   19.695   17.600
 32 PEs   4     32776   write     61.503   16.008   34.560   21.868   13.998
 32 PEs   5   1048576   write     82.490   89.902  113.474   92.407   83.098
 32 PEs   6   1048584   write     82.964   84.907  105.175   80.906   82.537
 32 PEs   7  16777216   write    109.741  111.965  113.723  109.800  109.211
 32 PEs      total-write          86.998   88.472   95.186   83.414   71.959
 32 PEs   1      1024   rewrite   71.952    0.532    4.815    1.552    1.129
 32 PEs   2      1032   rewrite   30.374    0.571    2.961    1.132    0.610
 32 PEs   3     32768   rewrite   78.283    6.473   18.009    7.372    7.933
 32 PEs   4     32776   rewrite   81.708    9.767   10.925   11.109    8.962
 32 PEs   5   1048576   rewrite   51.444   27.174  110.536   44.736   48.128
 32 PEs   6   1048584   rewrite   42.578   52.136   60.421   57.537   60.699
 32 PEs   7  16777216   rewrite   76.624   63.918  107.528   78.295   85.087
 32 PEs      total-rewrite        64.469   49.024   70.673   50.648   53.845
 32 PEs   1      1024   read      38.914    3.526    3.072    2.419    0.818
 32 PEs   2      1032   read      37.997    1.154    4.093    2.984    0.953
 32 PEs   3     32768   read      31.802   23.553   11.717    2.973    2.758
 32 PEs   4     32776   read      39.174   28.625   14.161   11.857    9.168
 32 PEs   5   1048576   read      70.816   49.608   40.960   31.331   31.057
 32 PEs   6   1048584   read      77.031   77.107   77.181   25.288   27.258
 32 PEs   7  16777216   read      50.218   41.240   84.615   71.739   78.096
 32 PEs      total-read           47.801   40.171   46.944   40.404   40.009

This table shows all results, except pattern 2 (scatter, l=1MBytes, L=2MBytes):
bw_pat2= 59.459 MB/s write, 76.267 MB/s rewrite, 37.815 MB/s read

(For gnuplot:)
set xtics ( '1k' 1, '+8' 2, '32k' 4, '+8' 5, '1M' 7, '+8' 8, '16M' 10)
set title 'Linux cluster 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64' -4
set label 1 'b_eff_io' at 10,50000 right
set label 2 'rel. 2.1' at 10,25000 right
set label 3 'T=1.0min' at 10,12500 right
set label 4 'n=32' at 10,6250 right
set label 5 'workaround for type 1:' at 10,0.50 right
set label 6 'individual file pointer' at 10,0.25 right

weighted average bandwidth for write   : 85.504 MB/s on 32 processes
weighted average bandwidth for rewrite : 58.855 MB/s on 32 processes
weighted average bandwidth for read    : 43.855 MB/s on 32 processes
(type=0 is weighted double)

Total amount of data written/read with each access method:
3516.800 MBytes = 4.3 percent of the total memory (81920 MBytes)

b_eff_io of these measurements = 59.876 MB/s on 32 processes
with 2048 MByte/PE and scheduled time=1.0 min
NOT VALID for comparison of different systems
 criterion 1: scheduled time 1.0 min >= 30 min -- NOT reached
 criterion 2: shared file pointers must be used for pattern type 1 -- NOT reached
 criterion 3: error count (0) == 0 -- reached

Maximum over all number of PEs
------------------------------
b_eff_io = 59.876 MB/s on 32 processes with 2048 MByte/PE,
scheduled time=1.0 Min,
on Linux cluster 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64,
NOT VALID (see above)
This output was created with version 2.1 of the benchmark.
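Note that a run is only valid for comparing different systems if criteria 1 and 2 above are also met: the scheduled time must be at least 30 minutes, and pattern type 1 must use shared file pointers. Starting from the command above, that means raising -T to 1800 and dropping -noshared (the output prefix is again only an example):

mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920 -rewrite -N 32 -T 1800 -p /MY_fast_filesystem/MY_directory -f MYSYS_32pe_1800sec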