====== FS: Lustre01 ======

===== Characteristics =====

<data_cdcl>
name:Lustre Phase1
</data_cdcl>

===== Description =====

The DKRZ system was procured in two phases of roughly the same size.
The storage of the first phase consisted of [[http://www.seagate.com/www-content/product-content/xyratex-branded/clustered-file-systems/en-us/docs/clusterstor-cs9000-datasheet.pdf|ClusterStor 9000]] storage devices.

Both systems are configured in Scalable System Units (SSUs): pairs of servers in active/active fail-over mode that manage an extension unit (a JBOD containing additional devices), resulting in two OSTs per OSS.
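
The OST count used in the sustained-performance test below can be sanity-checked from these building blocks. The following is a minimal sketch using only figures from this page (21 SSUs in the striping test, a pair of OSSes per SSU, two OSTs per OSS); the variable names are illustrative.

<code python>
# Sanity check of the OST count from the SSU building blocks.
# All figures are taken from this page; the names are illustrative only.

SSUS          = 21   # SSU/extension-unit pairs used in the striping test below
OSSES_PER_SSU = 2    # an SSU is a pair of servers in active/active fail-over
OSTS_PER_OSS  = 2    # each OSS serves two OSTs once the extension unit is attached

osts = SSUS * OSSES_PER_SSU * OSTS_PER_OSS
print(f"OSTs available for striping: {osts}")   # -> 84, matching the IOR setup
</code>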

===== Measurement protocols =====

==== Peak performance ====

The peak performance is derived from the maximum performance possible on a ClusterStor 9000, that is 5.4 GiB/s, multiplied by the number of servers in the SSU/extension pairs we have installed (31 in phase 1).
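
The derivation is a single multiplication; the sketch below merely reproduces it with the figures quoted above, assuming the 5.4 GiB/s applies to each of the 31 counted units.

<code python>
# Peak-performance estimate as described above: per-unit peak times unit count.
PEAK_PER_UNIT_GIB_S = 5.4   # maximum performance of a ClusterStor 9000
UNITS_PHASE1        = 31    # count quoted above for phase 1

peak = PEAK_PER_UNIT_GIB_S * UNITS_PHASE1
print(f"Derived phase 1 peak: {peak:.1f} GiB/s")   # about 167 GiB/s
</code>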

==== Sustained metadata performance ====

Performance has been measured using [[tools:benchmarks:parabench|Parabench]].
With Lustre DNE phase 1, DKRZ distributes the load among five metadata servers.
The observed performance was roughly 25 kOP/s on the root MDS and 15 kOP/s on each DNE MDS, resulting in about 80 kOP/s in aggregate.
For the measurement, five folders were pre-created and distributed across the MDSs.
Then five Parabench instances were started at the same time to create the typical workload behavior.
Each benchmark ran for a considerable time on 16 nodes with 16 processes per node, but the individual Parabench runs were not explicitly synchronized with each other.
In principle, a single Parabench run could have handled this setup, but the simpler approach was chosen.
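
The page does not show how the five folders were placed on different metadata servers. With DNE phase 1 this is typically done by creating remote directories on explicit MDT indices, e.g. with ''lfs mkdir -i''. The sketch below is a hypothetical illustration of that step; the mount point ''/mnt/lustre01/benchmark'' and the loop are assumptions, not the script used at DKRZ.

<code python>
# Hypothetical pre-creation of five benchmark directories, one per MDT,
# using Lustre DNE phase 1 remote directories ("lfs mkdir -i <mdt-index>").
# The base path is an assumption for illustration, not taken from this page.
import subprocess

BASE     = "/mnt/lustre01/benchmark"  # assumed mount point
NUM_DIRS = 5                          # one folder per Parabench instance

for mdt_index in range(NUM_DIRS):
    target = f"{BASE}/run{mdt_index}"
    # Pin each directory to a specific MDT so the load is spread across the MDSs.
    subprocess.run(["lfs", "mkdir", "-i", str(mdt_index), target], check=True)
</code>

Each Parabench instance then works inside one of these directories.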

==== Sustained performance ====

Performance of the phase 1 system has been measured with [[tools:benchmarks:ior|IOR]] using POSIX I/O to one file per process.
The configuration was as follows (an illustrative reconstruction of the invocation is sketched after the list):

  * Striping across 84 OSTs = 21 SSUs + 21 extension units
  * 168 compute nodes with 6 IOR processes per node
  * Arguments to IOR: -b 2000000 -t 2000000
  * The amount of data was about three times the main memory of the nodes used
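
Only ''-b 2000000'' and ''-t 2000000'' are quoted above; in the sketch below, ''-a POSIX'' and ''-F'' express "POSIX I/O to one file per process", while the launcher, the output path and the segment/repetition settings needed to reach roughly three times the nodes' main memory are not given on this page and are therefore assumptions or omitted. Striping over the 84 OSTs would likewise have to be configured on the output directory beforehand (e.g. with ''lfs setstripe''), which is not shown here.

<code python>
# Hypothetical reconstruction of the IOR command line described above.
# Flags -a, -F, -b, -t and -o are standard IOR options; everything not quoted
# on this page (launcher, output path) is an assumption for illustration.

NODES          = 168
PROCS_PER_NODE = 6
TOTAL_PROCS    = NODES * PROCS_PER_NODE   # 1008 IOR processes in total

cmd = [
    "mpirun", "-np", str(TOTAL_PROCS),    # assumed MPI launcher
    "ior",
    "-a", "POSIX",                        # POSIX I/O backend
    "-F",                                 # one file per process
    "-b", "2000000",                      # block size, as quoted above
    "-t", "2000000",                      # transfer size, as quoted above
    "-o", "/mnt/lustre01/ior/testfile",   # assumed output location
]
print(" ".join(cmd))                      # in practice submitted via the batch system
</code>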