b_eff_io is short for Effective I/O Bandwidth Benchmark. It aims to measure the I/O performance that parallel applications can expect in practice, while also evaluating the speed of different access patterns.
After compiling the benchmark with mpicc, simply run the tool with mpirun.
The only mandatory parameters are -MB and -MT, which set the memory per node and the total memory used for the benchmark, respectively.
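For example, a build and a minimal run could look roughly like this (the -lm flag and the process count are assumptions; adjust them to your MPI installation):

# build the benchmark (source file name as reported in the output below)
mpicc -o b_eff_io b_eff_io.c -lm
# minimal invocation with only the mandatory parameters
mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920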
Running
mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920 -noshared -rewrite -N 32 -T 60 -p /MY_fast_filesystem/MY_directory -f MYSYS_32pe_0060sec_noshared
results in the following output on our test machine (saved in MYSYS_32pe_0060sec_noshared.sum):
b_eff_io.c, Revision 2.1 from Dec. 12, 2001
MEMORY_PER_PROCESSOR = 2048 MBytes [1MBytes = 1024*1024 bytes, 1MB = 1e6 bytes]
Maximum chunk size = 16.000 MBytes
-N 32 T=60, MT=81920 MBytes, -noshared, -rewrite
PATH=/home/gresens/test, PREFIX=MYSYS_32pe_0060sec_noshared
system name : Linux
hostname : cluster
OS release : 3.13.0-76-generic
OS version : #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016
machine : x86_64
Date of measurement: Sat Mar 26 17:42:34 2016
Summary of file I/O bandwidth accumulated on 32 processes with 2048 MByte/PE
-----------------------------------------------------------------------------
 number   pos   chunk-size (l)   access    type=0    type=1    type=2     type=3     type=4
 of PEs         [bytes]          method    scatter   shared    separate   segmented  seg-coll
                                            [MB/s]    [MB/s]    [MB/s]     [MB/s]     [MB/s]
-----------------------------------------------------------------------------
32 PEs 1 1024 write 50.373 0.621 1.462 2.136 0.814
32 PEs 2 1032 write 60.067 0.662 1.909 1.582 0.695
32 PEs 3 32768 write 58.940 11.607 32.038 19.695 17.600
32 PEs 4 32776 write 61.503 16.008 34.560 21.868 13.998
32 PEs 5 1048576 write 82.490 89.902 113.474 92.407 83.098
32 PEs 6 1048584 write 82.964 84.907 105.175 80.906 82.537
32 PEs 7 16777216 write 109.741 111.965 113.723 109.800 109.211
32 PEs total-write 86.998 88.472 95.186 83.414 71.959
32 PEs 1 1024 rewrite 71.952 0.532 4.815 1.552 1.129
32 PEs 2 1032 rewrite 30.374 0.571 2.961 1.132 0.610
32 PEs 3 32768 rewrite 78.283 6.473 18.009 7.372 7.933
32 PEs 4 32776 rewrite 81.708 9.767 10.925 11.109 8.962
32 PEs 5 1048576 rewrite 51.444 27.174 110.536 44.736 48.128
32 PEs 6 1048584 rewrite 42.578 52.136 60.421 57.537 60.699
32 PEs 7 16777216 rewrite 76.624 63.918 107.528 78.295 85.087
32 PEs total-rewrite 64.469 49.024 70.673 50.648 53.845
32 PEs 1 1024 read 38.914 3.526 3.072 2.419 0.818
32 PEs 2 1032 read 37.997 1.154 4.093 2.984 0.953
32 PEs 3 32768 read 31.802 23.553 11.717 2.973 2.758
32 PEs 4 32776 read 39.174 28.625 14.161 11.857 9.168
32 PEs 5 1048576 read 70.816 49.608 40.960 31.331 31.057
32 PEs 6 1048584 read 77.031 77.107 77.181 25.288 27.258
32 PEs 7 16777216 read 50.218 41.240 84.615 71.739 78.096
32 PEs total-read 47.801 40.171 46.944 40.404 40.009
This table shows all results, except pattern 2 (scatter, l=1MBytes, L=2MBytes):
bw_pat2= 59.459 MB/s write, 76.267 MB/s rewrite, 37.815 MB/s read
(For gnuplot:)
set xtics ( '1k' 1, '+8' 2, '32k' 4, '+8' 5, '1M' 7, '+8' 8, '16M' 10)
set title 'Linux cluster 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64' -4
set label 1 'b_eff_io' at 10,50000 right
set label 2 'rel. 2.1' at 10,25000 right
set label 3 'T=1.0min' at 10,12500 right
set label 4 'n=32' at 10,6250 right
set label 5 'workaround for type 1:' at 10,0.50 right
set label 6 'individual file pointer' at 10,0.25 right
weighted average bandwidth for write : 85.504 MB/s on 32 processes
weighted average bandwidth for rewrite : 58.855 MB/s on 32 processes
weighted average bandwidth for read : 43.855 MB/s on 32 processes
(type=0 is weighted double)
Total amount of data written/read with each access method: 3516.800 MBytes
= 4.3 percent of the total memory (81920 MBytes)
b_eff_io of these measurements = 59.876 MB/s on 32 processes with 2048 MByte/PE and scheduled time=1.0 min
NOT VALID for comparison of different systems
criterion 1: scheduled time 1.0 min >= 30 min -- NOT reached
criterion 2: shared file pointers must be used for pattern type 1 -- NOT reached
criterion 3: error count (0) == 0 -- reached
Maximum over all number of PEs
------------------------------
b_eff_io = 59.876 MB/s on 32 processes with 2048 MByte/PE, scheduled time=1.0 Min, on Linux cluster 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64, NOT VALID (see above)
Output has been created using version 2.1.
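The summary above is marked NOT VALID because criterion 1 requires a scheduled time of at least 30 minutes (we used -T 60, i.e. 1.0 min) and because criterion 2 requires shared file pointers for pattern type 1, which our -noshared option replaced with individual file pointers. Assuming -T takes the scheduled time in seconds, as in the run above, a run intended to satisfy both criteria could look roughly like this (the output prefix is a placeholder, and shared file pointers must actually be supported by the MPI-IO implementation):

mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920 -rewrite -N 32 -T 1800 -p /MY_fast_filesystem/MY_directory -f MYSYS_32pe_1800sec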