
List:       lustre-discuss
Subject:    [Lustre-discuss] mds inodes
From:       anselm.strauss () id ! unibe ! ch (Anselm Strauss)
Date:       2006-05-19 7:36:50
Message-ID: C8F1036D-6529-4210-B5CB-7BBE61778BBC () id ! unibe ! ch

thanks for the detailed answers.

On May 15, 2006, at 6:51 PM, Andreas Dilger wrote:

> On May 15, 2006  15:04 +0200, Anselm Strauss wrote:
>> i noticed that every time i create a file on a lustre filesystem one inode
>> on the corresponding mds is used (as well as one inode on the ost
>> itself). the minimum bytes per inode for ext3 is 1024 and the maximum
>> block size is 4096, thus the maximum ratio of inodes per block is 4.
>
> That is only true if you use the "-i" option for mke2fs.  If you specify
> some absolute number of inodes using "-N {num inodes}" newer e2fsprogs
> will reduce the group size to allow an increased number of inodes beyond
> 1 per 1024 bytes.
>
> That said, there likely isn't a good reason to do this.  Lustre uses large
> inodes (at least 512 bytes by default), and there has to be space left
> in the filesystem for other metadata like the journal (up to 400MB),
> bitmaps, and directories.  There is also a small number of regular files
> that Lustre uses to maintain cluster consistency.
>
> CFS recommends 4kB/inode on the MDS to be safe.
>
>> [so my mds] needs to be at least a fourth of the size of my lustre fs
>> (assuming a block size of 4K on my lustre fs and 1K bytes per inode on
>> my mds). like this the mds has four times fewer data blocks than the
>> lustre fs, but 4 inodes for each block and therefore one inode for each
>> block on the lustre fs. for example, with a 10tb lustre fs i need at
>> least 2.5tb metadata storage.
>
> Firstly, you are confusing several items here:
> - the MDS and OST filesystems are unrelated, so the formatting parameters
>   for the two do not need to be the same
> - since the MDS and OST filesystems are independent, the size of the MDS
>   filesystem is purely a factor of how many inodes you want in the total
>   Lustre filesystem, and not the size of the aggregate OST space
> - you can have a much higher maximum number of bytes per inode in the
>   filesystem, up to 128 MB per 8 inodes, which is useful for OSTs if
>   you have a very large average file size
>
> As a result, the only important factor when calculating the MDS size is
> the average size of files to be stored in the filesystem.  If the average
> file size is, say, 5MB and you have 100TB of usable OST space then you
> need at least (100 * 1024 * 1024 / 5) = 20M inodes, though I would always
> recommend 2x the minimum, so roughly 40M inodes.  At the default 4kB/inode
> space the 20M minimum already works out to only 80GB of space for the MDS.

this is actually the point i was concerned about. for example, i currently
have an xfs filesystem of 4TB with 2600M inodes, and i used standard
parameters to format it. with your calculation i would need a very big mds
to hold that many inodes. but i also have to say that none of my filesystems
has an inode usage bigger than 1%, so your calculation seems sufficient.
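
just to make sure i follow the arithmetic, this is the rough back-of-envelope
sketch i'm using (my own reading of your rule of thumb, not an official
formula; the 4kB/inode figure and the 2x margin are taken from your mail,
the example numbers are the ones you used):

  TB = 1024 ** 4
  MB = 1024 ** 2
  GB = 1024 ** 3

  ost_space           = 100 * TB   # usable OST space in the example
  avg_file_size       = 5 * MB     # expected average file size
  margin              = 2          # safety factor on the inode count
  mds_bytes_per_inode = 4 * 1024   # recommended 4kB/inode on the MDS (mke2fs -i 4096)

  min_inodes = ost_space // avg_file_size    # ~21M files fit at this size
  inodes     = margin * min_inodes           # ~42M inodes with the margin
  mds_size   = inodes * mds_bytes_per_inode  # ~160GB of MDS space

  print(f"{min_inodes / 1e6:.0f}M inodes minimum, "
        f"{inodes / 1e6:.0f}M with margin, "
        f"mds ~{mds_size / GB:.0f}GB")
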
still, would you think it's a good idea to format the mds space with -N,
for example to use half the space for inodes and the other half for
metadata? 40GB for metadata should be sufficient, i think. or would a lot
of inodes on the mds also have some performance impact?
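
(to put rough numbers on that idea: on an 80GB mds device, dedicating half
of it to the inode table at the 512-byte inodes you mention would already
mean about 40GB / 512B, i.e. over 80M inodes, far more than the ~42M from
the sketch above.)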

> On the other end of the spectrum, if you had a very small average file size
> (e.g. 4kB), it is true that Lustre isn't very efficient, since at that point
> you consume as much space on the MDS as you do on the OSTs.  This is not
> a very common configuration for Lustre.  With a 2TB MDS you could
> potentially have 1kB/inode (with 512-byte inodes I wouldn't go any lower)
> so 2B inodes, and this would need 2B * 4kB = 8TB of usable OST space.
> Depending on your needs, you could just do this with a single ext3
> filesystem instead of Lustre.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>
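
just to check i follow the small-file end of the range: 2TB of mds space at
1kB per inode is roughly 2 * 2^40 / 1024 = ~2.1 billion inodes, and that many
4kB files is indeed about 8TB of ost data, so mds and ost capacity really do
end up in the same ballpark there.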

anselm
