====== FS: Lustre02 ======

===== Characteristics =====

<data_cdcl>
name:Lustre Phase2
</data_cdcl>

===== Description =====

The DKRZ system was procured in two phases of roughly the same size.
The second phase consists of [[http://www.seagate.com/files/www-content/product-content/xyratex-branded/clustered-file-systems/_shared/datasheets/seagate-clusterStor-l300-datasheet-11-06-15.pdf|ClusterStor L300]] systems equipped with Seagate Enterprise Capacity V5 disks (8 TB, ST8000NM0095).

Both systems are configured in Scalable System Units (SSUs): pairs of servers in active/active fail-over mode that manage an extension unit (a JBOD containing additional devices), resulting in two OSTs per OSS.

Initially, we planned to create one big shared file system, but we are now using two file systems (one for the storage of phase 1 and one for phase 2). Both file systems are mounted on all compute nodes.

===== Measurement protocols =====

==== Peak performance ====

The peak performance is derived from the maximum performance possible on an L300, which is 5.4 GiB/s, multiplied by the number of servers in the SSU/extension pairs we have installed (34 in phase 2).
The L300 actually achieves better performance and operates at InfiniBand speed; still, for the theoretical maximum, we use the limit of 5.4 GiB/s.
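
For illustration, the arithmetic can be reproduced as a short sketch, assuming the peak is simply the per-L300 limit multiplied by the count of 34 quoted above:

<code python>
# Back-of-the-envelope check of the theoretical peak described above.
# Assumption: peak = per-L300 limit (5.4 GiB/s) * installed count (34, as quoted).
PER_L300_LIMIT_GIB_S = 5.4
UNITS_PHASE2 = 34

peak_gib_s = PER_L300_LIMIT_GIB_S * UNITS_PHASE2
print(f"Theoretical peak (phase 2): {peak_gib_s:.1f} GiB/s")
</code>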

Lustre's obdfilter-survey demonstrates that the phase 2 system alone is able to deliver 480 GB/s and 580 GB/s for write and read, respectively.

==== Sustained metadata performance ====

Performance has been measured using [[tools:benchmarks:parabench|Parabench]]; see the description in [[lustre01]].
The benchmark runs for a considerable time on 16 nodes with 16 processes per node, but does not explicitly synchronize between the individual Parabench runs.
Theoretically, a single Parabench run could handle this setup, but the simpler approach has been chosen.

In phase 2, we received 7 additional metadata servers; stressed individually, they each deliver between 30 and 35 kOps/s, resulting in about 210 kOps/s in aggregate.
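
A minimal sketch of that aggregation, assuming the per-server rates simply add up as described:

<code python>
# Aggregate metadata rate implied by the per-server measurements above.
# Assumption: the 7 phase-2 metadata servers scale additively.
PER_MDS_KOPS = (30, 35)   # kOps/s measured per server when stressed individually
NUM_MDS = 7

low, high = (rate * NUM_MDS for rate in PER_MDS_KOPS)
print(f"Aggregate metadata rate: {low}-{high} kOps/s")
</code>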

While both benchmarks have been executed individually, there is strong evidence that the way the measurement is done allows us to add up the results of both runs.

==== Sustained performance ====

The reported performance result is only for the new phase 2 system.

Performance of the phase 1 system has been measured with [[tools:benchmarks:ior|IOR]] (see also [[lustre01]]).

The performance of the phase 2 system has been measured in the same way.
The configuration was as follows (a sketch of the resulting data volume is given after the list):
  * Striping across 128 OSTs = 32 SSUs
  * 852 compute nodes, 4 IOR processes per node
  * Arguments to IOR: -b 2000000 -t 2000000
  * The amount of data was about 3x the main memory of the nodes used
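
A rough sketch of the data volume implied by this configuration; the per-node main memory is not stated above, so the 64 GiB used here is only an assumption:

<code python>
# Rough data-volume estimate for the IOR run described above.
# ASSUMPTION: 64 GiB of main memory per compute node (hypothetical, not stated).
NODES = 852
PROCS_PER_NODE = 4
MEM_PER_NODE_GIB = 64
MEMORY_MULTIPLE = 3            # "about 3x main memory"

total_gib = NODES * MEM_PER_NODE_GIB * MEMORY_MULTIPLE
per_proc_gib = total_gib / (NODES * PROCS_PER_NODE)

print(f"Total data per run:   ~{total_gib / 1024:.0f} TiB")
print(f"Data per IOR process: ~{per_proc_gib:.0f} GiB")
</code>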

The measurement has been conducted while production on the phase 1 system was active. Since both systems share the InfiniBand tree network, the observed performance is lower than the system's capabilities.