
List:       lustre-discuss
Subject:    [Lustre-discuss] [HPDD-discuss] Same performance Infiniband and Ethernet
From:       Pardo Diaz, Alfonso <alfonso.pardo@ciemat.es>
Date:       2014-05-21 6:32:54
Message-ID: 1CC41228-9D18-4B45-85BB-FE6C6C95A9ED@ciemat.es

Thanks Richard, I appreciate your advice.

I was able to saturate the channel using xdd with 10 threads, each writing to a
different OST, with every OST on a different OSS. These are the results:

ETHERNET
                    T   Q        Bytes     Ops     Time     Rate     IOPS  Latency    %CPU
TARGET   Average    0   1   2147483648   65536  140.156   15.322   467.59   0.0021   39.16
TARGET   Average    1   1   2147483648   65536  140.785   15.254   465.50   0.0021   39.11
TARGET   Average    2   1   2147483648   65536  140.559   15.278   466.25   0.0021   39.14
TARGET   Average    3   1   2147483648   65536  176.141   12.192   372.07   0.0027   38.02
TARGET   Average    4   1   2147483648   65536  168.234   12.765   389.55   0.0026   38.54
TARGET   Average    5   1   2147483648   65536  140.823   15.250   465.38   0.0021   39.11
TARGET   Average    6   1   2147483648   65536  140.183   15.319   467.50   0.0021   39.16
TARGET   Average    8   1   2147483648   65536  176.432   12.172   371.45   0.0027   38.02
TARGET   Average    9   1   2147483648   65536  167.944   12.787   390.23   0.0026   38.57
         Combined  10  10  21474836480  655360  180.000  119.305  3640.89   0.0003  387.99

INFINIBAND
                    T   Q        Bytes     Ops    Time      Rate      IOPS  Latency     %CPU
TARGET   Average    0   1   2147483648   65536   9.369   229.217   6995.16   0.0001   480.40
TARGET   Average    1   1   2147483648   65536   9.540   225.110   6869.80   0.0001   474.25
TARGET   Average    2   1   2147483648   65536   8.963   239.582   7311.45   0.0001   479.85
TARGET   Average    3   1   2147483648   65536   9.480   226.521   6912.86   0.0001   478.21
TARGET   Average    4   1   2147483648   65536   9.109   235.748   7194.47   0.0001   480.83
TARGET   Average    5   1   2147483648   65536   9.284   231.299   7058.69   0.0001   479.04
TARGET   Average    6   1   2147483648   65536   8.839   242.947   7414.15   0.0001   480.55
TARGET   Average    7   1   2147483648   65536   9.210   233.166   7115.65   0.0001   480.17
TARGET   Average    8   1   2147483648   65536   9.373   229.125   6992.33   0.0001   475.13
TARGET   Average    9   1   2147483648   65536   9.184   233.828   7135.86   0.0001   480.25
         Combined  10  10  21474836480  655360   9.540  2251.097  68698.03   0.0000  4788.69


A rough estimate is 0.6 Gbit/s (max 1 Gbit/s) over Ethernet and 16 Gbit/s (max 40 Gbit/s)
over InfiniBand.
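
For context, a quick sanity check on those numbers, assuming the xdd Rate column is
reported in MB/s:

    2251.097 MB/s x 8 bit/byte = ~18 Gbit/s aggregate over InfiniBand

which is in the same ballpark as the estimate above. Note that QDR signals at 40 Gbit/s
but carries roughly 32 Gbit/s of payload after 8b/10b encoding, so there is likely still
some headroom on the link.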

REGARDS!


On 19/05/2014, at 17:37, Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote:

> Alfonso,
> 
> Based on my attempts to benchmark single-client Lustre performance, here is some
> advice and a few comments that I have.  (YMMV)
> 
> 1) On the IB client, I recommend disabling checksums (lctl set_param
> osc.*.checksums=0).  Having checksums enabled sometimes results in a significant
> performance hit.
> 
> 2) Single-threaded tests (like dd) will usually bottleneck before you can max out
> the total client performance.  You need to use a multi-threaded tool (like xdd) and
> have several threads perform IO at the same time in order to measure aggregate
> single-client performance.
> 
> 3) When using a tool like xdd, set up the test to run for a fixed amount of time
> rather than having each thread write a fixed amount of data.  If all threads write
> a fixed amount of data (say 1 GB), and any of the threads run slower than the
> others, you might get skewed results for the aggregate throughput because of the
> stragglers.
> 
> 4) In order to avoid contention at the OST level among the multiple threads on a
> single client, precreate the output files with stripe_count=1 and statically assign
> them evenly to the different OSTs (see the sketch after this list).  Have each
> thread write to a different file so that no two processes write to the same OST.
> If you don't have enough OSTs to saturate the client, you can always have two files
> per OST.  Going beyond that will likely hurt more than help, at least for an
> ldiskfs backend.
> 
> 5) In my testing, I seem to get worse results using direct I/O for write tests, so
> I usually just use buffered I/O.  Based on my understanding, the max_dirty_mb
> parameter on the client (which defaults to 32 MB) limits the amount of dirty
> written data that can be cached for each OST.  Unless you have increased this to a
> very large number, that parameter will likely mitigate any effects of client
> caching on the test results.  (NOTE: This reasoning only applies to write tests.
> Written data can still be cached by the client, and a subsequent read test might
> very well pull data from cache unless you have taken steps to flush the cached
> data.)
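> 
> As a rough illustration of points 1, 4 and 5 (the mount point, directory and OST
> indices below are placeholders, and whether OST indices 0-9 really land on ten
> different OSS nodes depends on how your OSTs are numbered):
> 
>   # point 1: disable client-side checksums for the duration of the test
>   lctl set_param osc.*.checksums=0
> 
>   # point 5: check how much dirty data may be cached per OST (defaults to 32 MB)
>   lctl get_param osc.*.max_dirty_mb
> 
>   # point 4: precreate one single-stripe file per OST, one file per xdd thread
>   for i in $(seq 0 9); do
>       lfs setstripe -c 1 -i $i /mnt/lustre/xddtest/file.$i
>   done
> 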
> If you have 10 OSS nodes and 20 OSTs in your file system, I would start by running
> a test with 10 threads and have each thread write to a single OST on different
> servers.  You can increase/decrease the number of threads as needed to see if the
> aggregate performance gets better/worse.  On my clients with QDR IB, I typically
> see aggregate write speeds in the range of 2.5-3.0 GB/s.
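> 
> A time-limited xdd run over those ten files (one target per thread) might then look
> roughly like the sketch below; treat the option names and sizes as an illustration,
> since they can differ between xdd versions:
> 
>   xdd -op write -targets 10 /mnt/lustre/xddtest/file.{0..9} \
>       -queuedepth 1 -blocksize 8192 -reqsize 128 -timelimit 180 -passes 1 -verbose
> 
> The Combined line of the output is the aggregate single-client throughput across
> all targets.
> 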
> You are probably already aware of this, but just in case, make sure that the IB
> clients you use for testing don't also have ethernet connections to your OSS
> servers.  If the client has an ethernet and an IB path to the same server, it will
> choose one of the paths to use.  It could end up choosing ethernet instead of IB
> and mess up your results.
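> 
> One quick way to check which path is actually in use (a sketch; the exact layout of
> the import file differs a bit between Lustre versions) is to look at the NIDs the
> client is using:
> 
>   # NIDs configured on this client (should list both an @o2ib and an @tcp NID)
>   lctl list_nids
> 
>   # the server NID each OSC is currently connected to
>   lctl get_param osc.*.import | grep current_connection
> 
> If the current_connection values end in @tcp rather than @o2ib, the client is
> talking to the OSS nodes over ethernet even though IB is available.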
> -- 
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 
> 
> On May 19, 2014, at 6:33 AM, "Pardo Diaz, Alfonso" <alfonso.pardo at ciemat.es>
> wrote:
> 
> > Hi,
> > 
> > I have migrated my Lustre 2.2 file system to 2.5.1 and equipped my OSS/MDS and
> > clients with InfiniBand QDR interfaces. I have compiled Lustre against OFED 3.2
> > and configured the lnet module with:
> > options lnet networks="o2ib(ib0),tcp(eth0)"
> > 
> > 
> > But when I compare the Lustre performance over InfiniBand (o2ib) with the
> > performance over Ethernet (tcp), I get the same result:
> > INFINIBAND TEST:
> > dd if=/dev/zero of=test.dat bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1,0 GB) copied, 5,88433 s, 178 MB/s
> > 
> > ETHERNET TEST:
> > dd if=/dev/zero of=test.dat bs=1M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1,0 GB) copied, 5,97423 s, 154 MB/s
> > 
> > 
> > And this is my scenario:
> > 
> > - 1 MDS with an SSD RAID10 MDT
> > - 10 OSS with 2 OSTs per OSS
> > - InfiniBand interfaces in connected mode
> > - CentOS 6.5
> > - Lustre 2.5.1
> > - Striped file system: "lfs setstripe -s 1M -c 10"
> > 
> > 
> > I know my InfiniBand is working correctly, because when I run iperf3 between the
> > client and the servers I get 40 Gb/s over InfiniBand and 1 Gb/s over the Ethernet
> > connections.
> > 
> > 
> > Could you help me?
> > 
> > 
> > Regards,
> > 
> > 
> > 
> > 
> > 
> > Alfonso Pardo Diaz
> > System Administrator / Researcher
> > c/ Sola nº 1; 10200 Trujillo, ESPAÑA
> > Tel: +34 927 65 93 17  Fax: +34 927 32 32 37
> > 
> > 
> > 
> > 
> > ----------------------------
> > Disclaimer:
> > This message and its attached files are intended exclusively for their recipients
> > and may contain confidential information. If you received this e-mail in error,
> > you are hereby notified that any dissemination, copying or disclosure of this
> > communication is strictly prohibited and may be unlawful. In this case, please
> > notify us by reply and delete this email and its contents immediately.
> > ----------------------------
> > 
> > _______________________________________________
> > HPDD-discuss mailing list
> > HPDD-discuss at lists.01.org
> > https://lists.01.org/mailman/listinfo/hpdd-discuss
> 
> 
> 

