List:       lustre-discuss
Subject:    [Lustre-discuss] lustre 1.6.0.1
From:       aaron () iges ! org (Aaron Knister)
Date:       2007-06-24 21:17:47
Message-ID: D77D30F2-FB3F-4864-8309-4F479EB4A063 () iges ! org

If you weren't happy with GFS, try OCFS2. It's Oracle's cluster
filesystem and it's so easy to set up. Sadly, I don't have answers to
any of your other questions, other than to say that Lustre's
performance with small files is abysmal for me too. I'm very
interested in any tunables.

-Aaron

On Jun 21, 2007, at 9:20 AM, Balagopal Pillai wrote:

> Hi,
>
> I am using Lustre 1.6.0.1 with one OST and 20 clients in an HPC
> cluster. The OST/MDT/MGS server has a 16-channel 3ware 9650 running
> RAID 6. I also have another Lustre installation (version 1.4.5) that
> has been working trouble-free for over a year. The OS is CentOS 4.
> The storage server has 4 network ports in adaptive load balanced
> mode, and aggregate network throughput is great (measured with
> 4 x netperf/iperf from clients) in the ideal situation where clients
> pick up the MAC addresses of different interfaces in their ARP
> tables.
>
> I have a few questions about Lustre and hope someone can help me.
>
> * I had to re-export the Lustre volume via NFS on the new 1.6.0.1
> setup to other infrastructure boxes. After the export, I get the
> following error messages on the OSS:
>
>
> Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4946:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000: slow journal start 33s
> Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4946:0:(lustre_fsfilt.h:205:fsfilt_start_log()) Skipped 22 previous similar messages
> Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4874:0:(filter.c:1139:filter_parent_lock()) scratch-OST0000: slow parent lock 33s
> Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4874:0:(filter.c:1139:filter_parent_lock()) Skipped 6 previous similar messages
>
> Also, is the NFS re-export option stable in version 1.6? I read some
> earlier posts on the list reporting kernel panics with Lustre 1.4.
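
For anyone who wants to track how often these warnings show up, here is a
rough Python sketch that pulls the "slow ... Ns" durations and the
"Skipped N" counts out of the log. The regular expressions are my own
guess at the message layout, based only on the four lines quoted above,
so other Lustre versions may format these differently.

```python
import re

# Sample lines copied from the syslog excerpt quoted above (unwrapped).
LOG = """\
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4946:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000: slow journal start 33s
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4946:0:(lustre_fsfilt.h:205:fsfilt_start_log()) Skipped 22 previous similar messages
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4874:0:(filter.c:1139:filter_parent_lock()) scratch-OST0000: slow parent lock 33s
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4874:0:(filter.c:1139:filter_parent_lock()) Skipped 6 previous similar messages
"""

# Assumed layout: "<target>: slow <operation> <seconds>s" at end of line.
SLOW_RE = re.compile(r"(?P<target>\S+): slow (?P<what>.+?) (?P<secs>\d+)s$")
# Assumed layout: "Skipped <count> previous similar messages" at end of line.
SKIPPED_RE = re.compile(r"Skipped (?P<count>\d+) previous similar messages$")


def summarize(text):
    """Return ([(target, operation, seconds), ...], total_skipped)."""
    events = []
    skipped = 0
    for line in text.splitlines():
        m = SLOW_RE.search(line)
        if m:
            events.append((m["target"], m["what"], int(m["secs"])))
            continue
        m = SKIPPED_RE.search(line)
        if m:
            skipped += int(m["count"])
    return events, skipped


events, skipped = summarize(LOG)
for target, what, secs in events:
    print(f"{target}: {what} took {secs}s")
print(f"{skipped} similar messages were suppressed")
```

Feeding it the full /var/log/messages instead of the sample string would
show whether the stalls line up with NFS client activity.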
>
>
> * I was evaluating GFS with GNBD for the past few weeks and the
> performance was amazing (at least for my purpose with one storage
> server). It was very fast, especially for small files. But I had to
> drop it for stability reasons. The problems were these: it has 6
> daemons that need to come up in a particular order, and if some of
> the kernel modules crash under heavy load on a node, the whole
> cluster freezes. It also has the issue of quorum, which is beneficial
> in an HA setup but maybe not for HPC. In some cases I have to keep
> just one server running that re-exports the volume via NFS even when
> the HPC nodes are down, for example during a power failure. Quorum is
> a problem in that case. But it was mostly stability that made me not
> go with GFS + GNBD.
>
> * Now the problem: Lustre performance dips a lot when it comes to
> small files. Please see the following fileop -f 5 test comparing
> NFS and Lustre:
>
>
> Lustre -
> Fileop:  File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
> .       mkdir  rmdir create   read  write  close   stat access  chmod readdir   link unlink delete Total_files
> A    5   1654    691    132  14228    719   4874   1987  32737   1718    2506   1262   1340   1608         125
>
>
> NFS -
> Fileop:  File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
> .       mkdir  rmdir create   read  write  close   stat access  chmod readdir   link unlink delete Total_files
> A    5    177    594    459 380747 137392   2282   1219 444312    502    1274    306    513    464         125
>
> Could you please recommend any tunables to get a bit more
> performance out of Lustre with lots of small files? Lots of small
> files performed badly on GFS too, though it was still better than NFS.
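
To see where the gap actually is, it helps to turn the two fileop rows
into per-operation ratios. The short Python sketch below does that with
the averages quoted above. It assumes the wrapped columns map to the
header names in order and that the leading "5" is just the -f argument,
so treat the mapping as my reading of the output rather than a verified
one.

```python
# Column order and average ops/sec copied from the fileop -f 5 output
# quoted above (Total_files omitted; the leading "5" is assumed to be
# the -f argument, not a measurement).
OPS = ["mkdir", "rmdir", "create", "read", "write", "close", "stat",
       "access", "chmod", "readdir", "link", "unlink", "delete"]
LUSTRE = [1654, 691, 132, 14228, 719, 4874, 1987,
          32737, 1718, 2506, 1262, 1340, 1608]
NFS = [177, 594, 459, 380747, 137392, 2282, 1219,
       444312, 502, 1274, 306, 513, 464]


def ratios(lustre, nfs):
    """Map each operation to Lustre ops/sec divided by NFS ops/sec."""
    return {op: l / n for op, l, n in zip(OPS, lustre, nfs)}


r = ratios(LUSTRE, NFS)

# Print slowest-relative-to-NFS first.
for op in sorted(r, key=r.get):
    print(f"{op:8s} Lustre/NFS = {r[op]:8.3f}")

# Operations where Lustre is behind NFS in this test.
slower = [op for op in OPS if r[op] < 1.0]
print("Lustre slower on:", ", ".join(slower))
```

On these numbers Lustre trails NFS only on create, read, write and
access, and is ahead on the other metadata operations, so any tuning
would presumably need to target the data path and file creation rather
than metadata in general.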
>
> * Also, the read performance of Lustre seems to be a little behind
> NFS. In the new setup I moved /opt, which holds all the software for
> users, to Lustre. But software like Matlab, Splus etc. takes almost a
> minute to come up. The second start is very fast though, maybe due to
> caching. So I am thinking of putting /opt back on NFS. Is it possible
> to boost the read performance of Lustre a bit?
>
> * Is there a way to make disk quotas activate automatically at
> startup on a Lustre client? The lfs quotaon <mount point> command
> works sometimes, but sometimes it gives a resource busy error
> message.
>
> * One last question. In the older Lustre setup (version 1.4.5), I
> have 5 SCSI drives, each one an OST for a single volume. The volume
> became full, but df still reported 27GB free. There doesn't seem to
> be an lfs df option in that version of Lustre, so I couldn't see the
> individual utilization of each of the 5 OSTs. Is this a striping
> problem?
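
One possible explanation, offered as an assumption rather than a
diagnosis: df on a client adds up the free space of all OSTs, so the
total can stay above zero after a single OST has filled, and writes to
files whose objects sit on that OST then fail with out-of-space errors.
The Python sketch below illustrates the arithmetic. The per-OST figures
and OST names are hypothetical and invented for the example; only the
27GB total and the count of 5 OSTs come from the message.

```python
# Hypothetical per-OST free space in GB for the 5 single-drive OSTs.
# These values and names are invented for illustration; only the 27GB
# total and the number of OSTs come from the message above.
free_gb = {
    "OST0000": 0,   # hypothetical: this one has filled up
    "OST0001": 4,
    "OST0002": 6,
    "OST0003": 8,
    "OST0004": 9,
}


def aggregate_free(per_ost):
    """Free space as a client-side df would show it: the sum over OSTs."""
    return sum(per_ost.values())


def full_osts(per_ost, threshold_gb=0):
    """Names of OSTs with no more than threshold_gb of free space."""
    return sorted(name for name, free in per_ost.items()
                  if free <= threshold_gb)


print(f"df would report {aggregate_free(free_gb)}GB free")
print("full OSTs:", full_osts(free_gb))
```

If that is what happened, the imbalance would come from how files were
placed on the OSTs, which is why a per-OST view is the thing to look for.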
> I know it's a lot of questions. Hope some of them are solvable.
> Thanks very much.
>
>
>
> Best Regards
>
> Balagopal Pillai
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Aaron Knister
Systems Administrator/Web Master
Center for Research on Environment and Water

(301) 595-7001
aaron@iges.org
