List:       lustre-discuss
Subject:    Re: [lustre-discuss] Question regarding user access during recovery and journal replay
From:       Marc O'Brien via lustre-discuss <lustre-discuss () lists ! lustre ! org>
Date:       2023-03-14 14:38:51
Message-ID: LO2P265MB477388AF5F2B2947AAAC12FE86BE9 () LO2P265MB4773 ! GBRP265 ! PROD ! OUTLOOK ! COM

Thank you so much, that has puzzled me for some time now :).

From: Patrick Farrell <pfarrell@ddn.com>
Date: Tuesday, 14 March 2023 at 14:36
To: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>, Marc O'Brien <Marc.OBrien@cruk.cam.ac.uk>
Subject: Re: Question regarding user access during recovery and journal replay

Marc,



[Re-posting to the list...]



No, it’s fine to have interaction during those times. The system is
designed to do that work online.  Depending on what you’re trying to do and
what you’re accessing, some client operations will experience delays, but
that’s it.  For example, during failover/recovery for a particular OST or
MDT, no new IO to that target will complete.  But the user programs will
just wait - it’s safe to leave them running.



So recovery, etc., will show up to users as delays in some requests, but
it’s safe to do with users accessing the system.



Regards,

Patrick
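
[Illustration, not part of Patrick's reply.] For admins who want to see which targets are still in recovery while users keep working, here is a minimal sketch. It assumes the text comes from something like `lctl get_param obdfilter.*.recovery_status mdt.*.recovery_status` run on the servers, and that the output has a `name=` header line followed by `field: value` lines with a `status` field of RECOVERING or COMPLETE. The sample text and the function names are made up for illustration; check the real output on your own Lustre version before relying on it.

```python
def parse_recovery_status(text):
    """Parse recovery_status-style text into {target: {field: value}}."""
    targets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("="):
            # Header line such as "obdfilter.testfs-OST0000.recovery_status="
            current = line[:-1]
            targets[current] = {}
        elif ":" in line and current is not None:
            # Field line such as "status: RECOVERING"
            key, _, value = line.partition(":")
            targets[current][key.strip()] = value.strip()
    return targets


def targets_still_recovering(text):
    """Return the sorted names of targets whose status is RECOVERING."""
    parsed = parse_recovery_status(text)
    return sorted(
        name for name, fields in parsed.items()
        if fields.get("status") == "RECOVERING"
    )


# Hypothetical sample, shaped like the assumed output format.
SAMPLE = """\
obdfilter.testfs-OST0000.recovery_status=
status: RECOVERING
time_remaining: 74
connected_clients: 1/2
obdfilter.testfs-OST0001.recovery_status=
status: COMPLETE
recovery_duration: 300
"""

print(targets_still_recovering(SAMPLE))
# prints ['obdfilter.testfs-OST0000.recovery_status']
```

Client IO to any target listed as RECOVERING would be expected to wait, as described above, and resume once that target's status moves to COMPLETE.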

________________________________
From: lustre-discuss <lustre-discuss-bounces@lists.lustre.org> on behalf of Marc O'Brien via lustre-discuss <lustre-discuss@lists.lustre.org>
Sent: Tuesday, March 14, 2023 7:24 AM
To: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Question regarding user access during recovery and journal replay


Hi,

When I was first taught some Lustre file system administration, it was
stressed that when recovering a Lustre file system and while the journal
replay was occurring on each host, there should be no user interaction with
the file system. Any recovery was done with cluster access denied to HPC
users, or when the cluster was deemed to be quiescent. This seemed to make
sense as during journal replay the file system is in R/W state, but the
distributed file system may not have reached a stable state. We now have
multiple Lustre file systems (2 Ext4 based and 1 ZFS based) and evicting
users or finding a quiescent time is problematic (luckily there are
maintenance windows for the routine stuff).

I have searched online and have yet to see in print that there should be no
user interaction with Lustre during recovery or journal replay (I may have
missed it).

So, my question is: is the restriction on cluster user interaction during
recovery and journal replay actually a thing?

Thanks in advance for any enlightenment :)

Marc




_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
