====== b_eff_io ======
---- dataentry benchmark ----
name         : b_eff_io # 
layers_ttags : MPI-IO # 
features_ttags : data
parallel_ttags : MPI 
type_ttags   : synthetic # 
webpage_url  : https://fs.hlrs.de/projects/par/mpi//b_eff_io/ # 
license_ttag : tbd # 
----
b_eff_io is short for //Effective I/O Bandwidth Benchmark//. It aims to measure the I/O performance that parallel applications can expect in practice, while also evaluating the speed of a range of different access patterns.
===== Usage =====
After compiling the benchmark with ''mpicc'', simply launch it with ''mpirun''.
The only mandatory parameters are ''-MB'' and ''-MT'', which set the memory per node and the total memory used for the benchmark (both in MBytes), respectively.
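A minimal build-and-run sketch is shown below. The source file name ''b_eff_io.c'' and the process count match the example output further down; the filesystem path and result prefix are placeholders you must adapt to your system.

<code bash>
# Build the benchmark from its single source file with the MPI compiler wrapper.
mpicc -o b_eff_io b_eff_io.c

# Minimal run: 2048 MBytes per node, 81920 MBytes total memory.
# -p selects the (fast) target filesystem, -f the prefix for the result files;
# both paths here are placeholders.
mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920 \
       -p /path/to/fast_filesystem -f my_result_prefix
</code>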
===== Example Output =====
Running ''mpirun -np 32 ./b_eff_io -MB 2048 -MT 81920 -noshared -rewrite -N 32 -T 60 -p /MY_fast_filesystem/MY_directory -f MYSYS_32pe_0060sec_noshared'' results in the following output on our test machine (saved in ''MYSYS_32pe_0060sec_noshared.sum''):
<code>
b_eff_io.c, Revision 2.1 from Dec. 12, 2001
MEMORY_PER_PROCESSOR = 2048 MBytes  [1MBytes = 1024*1024 bytes, 1MB = 1e6 bytes]
Maximum chunk size   =   16.000 MBytes
-N  32 T=60, MT=81920 MBytes, -noshared, -rewrite
PATH=/home/gresens/test, PREFIX=MYSYS_32pe_0060sec_noshared
       system name : Linux
       hostname    : cluster
       OS release  : 3.13.0-76-generic
       OS version  : #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016
       machine     : x86_64
Date of measurement: Sat Mar 26 17:42:34 2016
Summary of file I/O bandwidth accumulated on  32 processes with 2048 MByte/PE
-----------------------------------------------------------------------------
 number pos chunk-   access   type=0   type=1   type=2   type=3   type=4
 of PEs     size (l) methode scatter  shared  separate segmened seg-coll
            [bytes]  methode  [MB/s]   [MB/s]   [MB/s]   [MB/s]   [MB/s]
 -----------------------------------------------------------------------
  32 PEs 1     1024 write     50.373    0.621    1.462    2.136    0.814
  32 PEs 2     1032 write     60.067    0.662    1.909    1.582    0.695
  32 PEs 3    32768 write     58.940   11.607   32.038   19.695   17.600
  32 PEs 4    32776 write     61.503   16.008   34.560   21.868   13.998
  32 PEs 5  1048576 write     82.490   89.902  113.474   92.407   83.098
  32 PEs 6  1048584 write     82.964   84.907  105.175   80.906   82.537
  32 PEs 7 16777216 write    109.741  111.965  113.723  109.800  109.211
  32 PEs      total-write     86.998   88.472   95.186   83.414   71.959
  32 PEs 1     1024 rewrite   71.952    0.532    4.815    1.552    1.129
  32 PEs 2     1032 rewrite   30.374    0.571    2.961    1.132    0.610
  32 PEs 3    32768 rewrite   78.283    6.473   18.009    7.372    7.933
  32 PEs 4    32776 rewrite   81.708    9.767   10.925   11.109    8.962
  32 PEs 5  1048576 rewrite   51.444   27.174  110.536   44.736   48.128
  32 PEs 6  1048584 rewrite   42.578   52.136   60.421   57.537   60.699
  32 PEs 7 16777216 rewrite   76.624   63.918  107.528   78.295   85.087
  32 PEs      total-rewrite   64.469   49.024   70.673   50.648   53.845
  32 PEs 1     1024 read      38.914    3.526    3.072    2.419    0.818
  32 PEs 2     1032 read      37.997    1.154    4.093    2.984    0.953
  32 PEs 3    32768 read      31.802   23.553   11.717    2.973    2.758
  32 PEs 4    32776 read      39.174   28.625   14.161   11.857    9.168
  32 PEs 5  1048576 read      70.816   49.608   40.960   31.331   31.057
  32 PEs 6  1048584 read      77.031   77.107   77.181   25.288   27.258
  32 PEs 7 16777216 read      50.218   41.240   84.615   71.739   78.096
  32 PEs      total-read      47.801   40.171   46.944   40.404   40.009
This table shows all results, except pattern 2 (scatter, l=1MBytes, L=2MBytes): 
  bw_pat2=   59.459 MB/s write,   76.267 MB/s rewrite,   37.815 MB/s read
(For gnuplot:)
  set xtics ( '1k' 1, '+8' 2, '32k' 4, '+8' 5, '1M' 7, '+8' 8, '16M' 10)
  set title 'Linux cluster 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64' -4
  set label 1 'b_eff_io'  at 10,50000 right
  set label 2 'rel. 2.1'  at 10,25000 right
  set label 3 'T=1.0min' at 10,12500 right
  set label 4 'n=32'     at 10,6250  right
  set label 5 'workaround for type 1:'  at 10,0.50 right
  set label 6 'individual file pointer' at 10,0.25 right
weighted average bandwidth for write   :   85.504 MB/s on 32 processes
weighted average bandwidth for rewrite :   58.855 MB/s on 32 processes
weighted average bandwidth for read    :   43.855 MB/s on 32 processes
(type=0 is weighted double)
Total amount of data written/read with each access method: 3516.800 MBytes
  =  4.3 percent of the total memory (81920 MBytes)
b_eff_io of these measurements =   59.876 MB/s on 32 processes with 2048 MByte/PE and scheduled time=1.0 min
NOT VALID for comparison of different systems
  criterion 1: scheduled time 1.0 min >= 30 min -- NOT reached
  criterion 2: shared file pointers must be used for pattern type 1 -- NOT reached
  criterion 3: error count (0) == 0 -- reached
Maximum over all number of PEs
------------------------------
 b_eff_io =   59.876 MB/s on 32 processes with 2048 MByte/PE, scheduled time=1.0 Min, on Linux cluster 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64, NOT VALID (see above)
Output has been created using version 2.1.
</code>