List: lustre-discuss
Subject: Re: [lustre-discuss] Query regarding MDS and OSS functions
From: "Dilger, Andreas" <andreas.dilger () intel ! com>
Date: 2016-06-29 7:50:34
Message-ID: F7801125-04F9-4D23-97A2-02AA466B9D43 () intel ! com
One caveat - the Linux VFS still serializes file creates/unlinks in a single
directory, even though the server allows them in parallel (it doesn't use
the VFS). Even lookups within a single directory are serialized on the
client by the VFS except with the very latest kernels, and Lustre hasn't been
modified yet to take advantage of this. That said, at least progress is
being made on that front.
Cheers, Andreas
On Jun 28, 2016, at 20:42, Patrick Farrell <paf@cray.com> wrote:
That's a bit complicated - in Linux, creating files from one thread is of
course a one-at-a-time thing, because you have to wait for each file to be
created before going on. The only way to do multiple creates at once from
one client is with multiple threads.
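As a minimal sketch of the multi-threaded approach (not Lustre-specific; the helper names `create_file` and `create_many` and the `file.N` naming are assumptions made for illustration), each thread issues one blocking create at a time, so several creates can be outstanding at once. Note that when all files land in a single directory, the VFS caveat at the top of this thread still applies.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_file(path):
    # One blocking create: the call returns only after the file exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    return path


def create_many(directory, count, workers):
    # Each worker thread has at most one create outstanding, so up to
    # `workers` creates can be in progress at the same time.
    paths = [os.path.join(directory, f"file.{i}") for i in range(count)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(create_file, paths))


if __name__ == "__main__":
    # A temporary directory stands in for a directory on a Lustre mount.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_many(tmp, count=100, workers=8)
        print(len(created), "files created")
```

Spreading the files over several directories would be one way to sidestep the per-directory serialization, at the cost of a different layout.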
The Lustre side of things:
Each MDS can have multiple MDTs (metadata volumes) attached to it. Each
metadata volume is mostly independent of the others, so you can create
files on different MDTs at the same time.
Also, in the more recent versions of Lustre (2.8 and newer), one client can
have more than one modifying metadata request in flight at the same time
(that means some sort of metadata write, like a file create or permissions
change). That means several threads on a client can create files in
parallel. Prior versions were limited to one modifying metadata request at a
time for each client.
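To illustrate the difference between one and several modifying requests in flight, here is a toy model only. It does not use any Lustre API; `ToyClient`, `max_in_flight`, and the sleep that stands in for a server round trip are all assumptions made for this sketch. A semaphore caps how many "requests" the client may have outstanding, and the code records the peak concurrency it actually reached.

```python
import threading
import time


class ToyClient:
    """Toy model: caps concurrent 'modifying requests' with a semaphore."""

    def __init__(self, max_in_flight):
        self._slots = threading.Semaphore(max_in_flight)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0

    def modify(self, service_time=0.01):
        with self._slots:              # wait for a free request slot
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            time.sleep(service_time)   # stand-in for the server round trip
            with self._lock:
                self._in_flight -= 1


def run(max_in_flight, threads=8, requests_per_thread=5):
    client = ToyClient(max_in_flight)

    def worker():
        for _ in range(requests_per_thread):
            client.modify()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return client.peak, time.perf_counter() - start


if __name__ == "__main__":
    for limit in (1, 4):
        peak, elapsed = run(limit)
        print(f"limit={limit} peak={peak} elapsed={elapsed:.2f}s")
```

With a limit of 1 the threads take turns, which models the older behaviour described above; with a higher limit the same work should finish sooner because requests overlap.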
So, basically: Yes.
________________________________
From: Sangeetha Banavathi Srinivasa <bsangee@vt.edu>
Sent: Tuesday, June 28, 2016 8:42:37 PM
To: Dilger, Andreas
Cc: Patrick Farrell; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Query regarding MDS and OSS functions
Can a client ask for multiple file requests at the same time, or is it
sequential, as in it initially asks for one file creation and once that is
done it asks for the next?
On Jun 28, 2016, at 9:39 PM, Dilger, Andreas <andreas.dilger@intel.com> wrote:
The MDS is in charge of the storage that is attached to it, so for a
particular file or directory the client will always communicate with the same
MDS. This is different from e.g. Ceph where the MDS is a "service" running on
some node that doesn't have any of its own storage, so the Ceph MDS serving
a particular file or directory can change over time.
Cheers, Andreas
On Jun 28, 2016, at 19:24, Sangeetha Banavathi Srinivasa <bsangee@vt.edu> wrote:
Needed a clarification on the second question.
Whenever a client has to communicate with an MDS, is it always the same MDS
or can it be any MDS?
On Jun 28, 2016, at 9:19 PM, Patrick Farrell <paf@cray.com> wrote:
Replies inline.
Hi,
I had the following doubts about the functioning of lustre.
1. Whenever a file creation request is sent from the client, the request is
sent to MDS from where a list of OSTs, file metadata etc is received.
After this point does the data flow through the OSS to the OST or does the
client communicate with the OST directly?
That question doesn't quite make sense. The OST is just a disk volume. The
clients send data to the OSS (a server) that the OST in question is
connected to, then the server writes that data to the OST. This is done
without involving the metadata server (MDS) except when opening and closing
the file.
2. If a lustre system has more than one MDS will the client always send its
requests to the same MDS or how is this decided?
An individual file is always on a particular single MDS. So for a given
file, the requests always go to the same MDS. Deciding which files are put
on which MDS is more complicated. (This feature is known as distributed
namespace or DNE; more info can be found online.)
3. How does the MDS decide which OSTs have to be allocated whenever a file
request has been made?
There are a number of different policies available. I believe the default
is a mix of round robin and space usage optimization to keep particular
OSTs from filling up.
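To make "a mix of round robin and space usage optimization" concrete, here is a toy allocator. It is a sketch under stated assumptions, not Lustre's actual algorithm: the class name `ToyAllocator`, the OST names, and the 20% imbalance threshold are all hypothetical. It cycles through the OSTs in order while free space is roughly even, and switches to a choice weighted by free space once the gap between the fullest and emptiest OST grows too large.

```python
import random


class ToyAllocator:
    """Toy OST picker: round robin when balanced, space-weighted otherwise."""

    def __init__(self, free_space, imbalance_threshold=0.2, seed=0):
        self.free = dict(free_space)          # OST name -> free bytes
        self.order = sorted(self.free)
        self.next_index = 0
        self.threshold = imbalance_threshold  # hypothetical tuning knob
        self.rng = random.Random(seed)

    def _imbalanced(self):
        most = max(self.free.values())
        least = min(self.free.values())
        return most > 0 and (most - least) / most > self.threshold

    def pick(self):
        if self._imbalanced():
            # Space-aware mode: probability proportional to free space,
            # so nearly full OSTs are chosen less often.
            weights = [self.free[name] for name in self.order]
            return self.rng.choices(self.order, weights=weights, k=1)[0]
        # Balanced mode: plain round robin.
        name = self.order[self.next_index % len(self.order)]
        self.next_index += 1
        return name


if __name__ == "__main__":
    balanced = ToyAllocator({"OST0000": 100, "OST0001": 100, "OST0002": 100})
    print([balanced.pick() for _ in range(6)])

    skewed = ToyAllocator({"OST0000": 5, "OST0001": 100, "OST0002": 100})
    print([skewed.pick() for _ in range(6)])
```

In the balanced case the picks simply cycle through the three OSTs; in the skewed case the nearly full OST is picked only rarely.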
4. How many OSTs can an OSS have? How is this upper limit decided if at all?
I don't know what the upper limit is (it is at least 16), but the software
limit is not the main concern. The OSS (the server) has to be able to move
enough data to keep the OSTs fed. So while you can put many, many OSTs on
one server, in practice, you would not.
5. How many clients can an application have?
There is no limit. Real systems exist with tens of thousands of client
nodes connected to one Lustre file system. The practical limit is not client
count, it is performance of the servers, disks, and network.
________________________________
From: lustre-discuss <lustre-discuss-bounces@lists.lustre.org> on behalf of
Sangeetha Banavathi Srinivasa <bsangee@vt.edu>
Sent: Tuesday, June 28, 2016 7:58:03 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Query regarding MDS and OSS functions
Hi,
I had the following doubts about the functioning of lustre.
1. Whenever a file creation request is sent from the client, the request is
sent to MDS from where a list of OSTs, file metadata etc is received.
After this point does the data flow through the OSS to the OST or does the
client communicate with the OST directly?
2. If a lustre system has more than one MDS will the client always send its
requests to the same MDS or how is this decided?
3. How does the MDS decide which OSTs have to be allocated whenever a file
request has been made?
4. How many OSTs can an OSS have? How is this upper limit decided if at all?
5. How many clients can an application have?
Thanks,
Sangeetha
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org