
List:       lustre-discuss
Subject:    Re: [lustre-discuss] Query regarding MDS and OSS functions
From:       "Dilger, Andreas" <andreas.dilger () intel ! com>
Date:       2016-06-29 7:50:34
Message-ID: F7801125-04F9-4D23-97A2-02AA466B9D43 () intel ! com

One caveat - the Linux VFS still serializes file creates/unlinks in a
single directory, even though the server allows them in parallel (it
doesn't use the VFS).  Even lookups within a single directory are
serialized on the client by the VFS except with the very latest kernels,
and Lustre hasn't been modified yet to take advantage of this.  That
said, at least progress is being made on that front.

Cheers, Andreas
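
Since the serialization described above is per directory, one common
workaround (not something proposed in this thread) is to spread new
files over several subdirectories so that concurrent creates do not all
contend on the same parent. A minimal sketch, assuming a hash-based
layout; the helper names and the "shardNNN" naming are hypothetical:

```python
import hashlib
import os


def shard_index(name, shards):
    """Map a file name to a shard number using a stable hash of the name."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shards


def sharded_path(root, name, shards=8):
    """Return the path for `name` inside one of `shards` subdirectories."""
    return os.path.join(root, "shard%03d" % shard_index(name, shards), name)


def make_shards(root, shards=8):
    """Create the shard subdirectories once, before any files are created."""
    for i in range(shards):
        os.makedirs(os.path.join(root, "shard%03d" % i), exist_ok=True)
```

The same name always maps to the same subdirectory, so readers can find
a file without any extra index. Whether this helps in practice depends
on the kernel and Lustre versions in use.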

On Jun 28, 2016, at 20:42, Patrick Farrell <paf@cray.com> wrote:

That's a bit complicated.  In Linux, creating files from one thread is
of course a one-at-a-time thing, because you have to wait for each file
to be created before going on.  The only way to do multiple creates at
once from one client is with multiple threads.
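
As an illustration of the point about threads, here is a minimal sketch
that issues creates from several threads at once. The function names,
file naming scheme, and worker count are assumptions made for the
example. Python threads can overlap here because the interpreter lock is
released while the open() system call is in progress.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def create_file(path):
    """Create one empty file; fail if it already exists."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    return path


def create_many(directory, count, workers=4):
    """Create `count` files in `directory` using `workers` threads."""
    paths = [os.path.join(directory, "file%05d" % i) for i in range(count)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each path to a worker thread; results keep input order.
        return list(pool.map(create_file, paths))
```

With workers=1 this degenerates to the one-at-a-time case described
above.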

The Lustre side of things:

Each MDS can have multiple MDTs (metadata volumes) attached to it.  Each
metadata volume is mostly independent of the others, so you can create
files on different MDTs at the same time.

Also, in the more recent versions of Lustre (2.8 and newer), one client
can have more than one modifying metadata request in flight at the same
time (that means some sort of metadata write, like a file create or
permissions change).  That means several threads on a client can create
files in parallel.  Prior versions were limited to one modifying
metadata request at a time for each client.
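
The difference between the two behaviours can be pictured with a toy
model in which a per-client semaphore caps the number of modifying
requests in flight. This is only a sketch of the idea, not Lustre code;
the class name, the fake request, and the limits used are all
hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ModRpcSlots:
    """Toy per-client limit on concurrent modifying metadata requests."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0  # highest number of requests seen in flight at once

    def send(self, work):
        with self._slots:  # wait for a free slot
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            try:
                return work()
            finally:
                with self._lock:
                    self._in_flight -= 1


def run(limit, requests=16, threads=8):
    """Send `requests` fake creates from `threads` threads; return the peak."""
    slots = ModRpcSlots(limit)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(requests):
            pool.submit(slots.send, lambda: time.sleep(0.01))
    return slots.peak
```

run(1) models the older one-at-a-time behaviour: the peak never exceeds
one no matter how many threads are submitting. A larger limit lets
several requests overlap.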

So, basically: Yes.
________________________________
From: Sangeetha Banavathi Srinivasa <bsangee@vt.edu>
Sent: Tuesday, June 28, 2016 8:42:37 PM
To: Dilger, Andreas
Cc: Patrick Farrell; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Query regarding MDS and OSS functions

Can a client issue multiple file requests at the same time, or is it
sequential, as in it initially asks for one file creation and, once that
is done, asks for the next?

On Jun 28, 2016, at 9:39 PM, Dilger, Andreas <andreas.dilger@intel.com> wrote:

The MDS is in charge of the storage that is attached to it, so for a
particular file or directory the client will always communicate with the
same MDS. This is different from e.g. Ceph where the MDS is a "service"
running on some node that doesn't have any of its own storage, so the
Ceph MDS serving a particular file or directory can change over time.

Cheers, Andreas

On Jun 28, 2016, at 19:24, Sangeetha Banavathi Srinivasa <bsangee@vt.edu> wrote:

I need a clarification on the second question.
Whenever a client has to communicate with an MDS, is it always the same
MDS, or can it be any MDS?

On Jun 28, 2016, at 9:19 PM, Patrick Farrell <paf@cray.com> wrote:


Replies inline.

Hi,

I had the following doubts about the functioning of lustre.

1. Whenever a file creation request is sent from the client, the request
   is sent to the MDS, from which a list of OSTs, file metadata, etc. is
   received. After this point, does the data flow through the OSS to the
   OST, or does the client communicate with the OST directly?

That question doesn't quite make sense.  The OST is just a disk volume.
The clients send data to the OSS (a server) that the OST in question is
connected to, then the server writes that data to the OST.  This is done
without involving the metadata server (MDS) except when opening and
closing the file.
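
To make that data path concrete, here is a rough sketch of how a client
could decide where a write goes once it holds the file's layout. It
assumes plain round-robin (RAID-0 style) striping across the file's
OSTs; the OST indices, the "oss0"/"oss1" server names, and the function
names are invented for the example.

```python
MIB = 1 << 20

# Hypothetical mapping of OST index -> the OSS that the OST is attached to.
OST_TO_OSS = {3: "oss0", 7: "oss1", 11: "oss1"}


def locate(offset, stripe_size, osts):
    """Map a file offset to (OST index, offset inside that OST's object)."""
    stripe_no = offset // stripe_size           # which stripe of the file
    ost = osts[stripe_no % len(osts)]           # stripes rotate over the OSTs
    obj_offset = (stripe_no // len(osts)) * stripe_size + offset % stripe_size
    return ost, obj_offset


def route_write(offset, layout):
    """Return (OSS, OST index, object offset) for a write at `offset`."""
    ost, obj_offset = locate(offset, layout["stripe_size"], layout["osts"])
    return OST_TO_OSS[ost], ost, obj_offset


layout = {"stripe_size": MIB, "osts": [3, 7, 11]}
```

The layout is fetched once, at open time; every call to route_write()
after that uses only information the client already has, which is why
the MDS is not involved in the data transfer itself.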

2. If a lustre system has more than one MDS, will the client always send
   its requests to the same MDS, or how is this decided?
An individual file is always on a particular single MDS.  So for a given
file, the requests always go to the same MDS.  Deciding which files are
put on which MDS is more complicated.  (This feature is known as
Distributed Namespace, or DNE; more info can be found online.)

3. How does the MDS decide which OSTs have to be allocated whenever a
   file request has been made?
There are a number of different policies available.  I believe the
default is a mix of round robin and space usage optimization to keep
particular OSTs from filling up.
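
A mix of round robin and space-aware placement could look roughly like
the sketch below. It is a simplified model and not the actual Lustre
allocator: the class, the imbalance threshold value, and the weighting
by free space are all assumptions chosen for illustration.

```python
import random


class OstAllocator:
    """Toy allocator: round robin when balanced, free-space weighted otherwise."""

    def __init__(self, free_bytes, imbalance_threshold=0.17, seed=None):
        self.free = dict(free_bytes)          # OST index -> free space
        self.threshold = imbalance_threshold  # placeholder value for the sketch
        self._rng = random.Random(seed)
        self._next = 0

    def _balanced(self):
        lo, hi = min(self.free.values()), max(self.free.values())
        return hi == 0 or (hi - lo) / hi <= self.threshold

    def pick(self):
        osts = sorted(self.free)
        if self._balanced():
            # Free space is similar everywhere: just rotate through the OSTs.
            ost = osts[self._next % len(osts)]
            self._next += 1
            return ost
        # Free space differs a lot: favour the emptier OSTs.
        weights = [self.free[o] for o in osts]
        return self._rng.choices(osts, weights=weights, k=1)[0]
```

With equal free space the picks simply cycle 0, 1, 2, 0, ...; once one
OST is much fuller than the others, it is chosen less often but not
excluded.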

4. How many OSTs can an OSS have? How is this upper limit decided, if at
   all?
I don't know what the upper limit is (it is at least 16), but the
software limit is not the main concern.  The OSS (the server) has to be
able to move enough data to keep the OSTs fed.  So while you can put
many, many OSTs on one server, in practice, you would not.
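
The sizing argument is simple arithmetic: once the combined bandwidth of
the OSTs exceeds what the server can move, extra OSTs add capacity but
no throughput. A back-of-the-envelope sketch, with made-up bandwidth
figures and hypothetical function names:

```python
def delivered_bandwidth(oss_gbytes_per_s, ost_gbytes_per_s, ost_count):
    """Throughput one OSS can deliver: capped by the server or by its OSTs."""
    return min(oss_gbytes_per_s, ost_gbytes_per_s * ost_count)


def max_useful_osts(oss_gbytes_per_s, ost_gbytes_per_s):
    """Number of OSTs the OSS can keep fully busy before it saturates."""
    if ost_gbytes_per_s <= 0:
        raise ValueError("per-OST bandwidth must be positive")
    return int(oss_gbytes_per_s // ost_gbytes_per_s)
```

For example, if a server can move 12.5 GB/s and each OST sustains
2 GB/s, max_useful_osts(12.5, 2.0) gives 6; a seventh OST would still
add space, but delivered_bandwidth() stays at 12.5 GB/s.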

5. How many clients can an application have?
There is no limit.  Real systems exist with tens of thousands of client
nodes connected to one Lustre file system.  The practical limit is not
client count; it is the performance of the servers, disks, and network.
________________________________
From: lustre-discuss <lustre-discuss-bounces@lists.lustre.org> on behalf of Sangeetha Banavathi Srinivasa <bsangee@vt.edu>
Sent: Tuesday, June 28, 2016 7:58:03 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Query regarding MDS and OSS functions

Hi,

I had the following doubts about the functioning of lustre.

1. Whenever a file creation request is sent from the client, the request
   is sent to the MDS, from which a list of OSTs, file metadata, etc. is
   received. After this point, does the data flow through the OSS to the
   OST, or does the client communicate with the OST directly?

2. If a lustre system has more than one MDS, will the client always send
   its requests to the same MDS, or how is this decided?

3. How does the MDS decide which OSTs have to be allocated whenever a
   file request has been made?

4. How many OSTs can an OSS have? How is this upper limit decided, if at
   all?

5. How many clients can an application have?


Thanks,
Sangeetha
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org





