
List:       lustre-discuss
Subject:    Re: [lustre-discuss] 2.15.4 o2iblnd on RoCEv2?
From:       Andreas Dilger via lustre-discuss <lustre-discuss () lists ! lustre ! org>
Date:       2024-01-10 21:55:57
Message-ID: 03B63E33-2F62-427D-95E6-4BAC13B15D44 () ddn ! com

Granted that I'm not an LNet expert, but "errno: -1 descr: cannot parse net
'<255:65535>'" doesn't immediately lead me to the same conclusion as if "unknown
interface 'ib0'" were printed for the error message.  Also "errno: -1" is "-EPERM =
Operation not permitted", and doesn't give the same information as "-ENXIO = No such
device or address" or even "-EINVAL = Invalid argument" would.

That said, I can't even offer a patch for this myself, since that exact error message
is used in a few different places, though I suspect it is coming from
lustre_lnet_config_ni().

Looking further into this, now that I've found where (I think) the error message is
generated, it seems that "errno: -1" is not "-EPERM" but rather
"LUSTRE_CFG_RC_BAD_PARAM".  IMHO it is a travesty to use different error numbers
(and then print them after "errno:") instead of existing POSIX error codes that could
fill the same role (with some creative mapping):

    #define LUSTRE_CFG_RC_NO_ERR                     0  => fine
    #define LUSTRE_CFG_RC_BAD_PARAM                 -1  => -EINVAL
    #define LUSTRE_CFG_RC_MISSING_PARAM             -2  => -EFAULT
    #define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM        -3  => -ERANGE
    #define LUSTRE_CFG_RC_OUT_OF_MEM                -4  => -ENOMEM
    #define LUSTRE_CFG_RC_GENERIC_ERR               -5  => -ENODATA
    #define LUSTRE_CFG_RC_NO_MATCH                  -6  => -ENOMSG
    #define LUSTRE_CFG_RC_MATCH                     -7  => -EXFULL
    #define LUSTRE_CFG_RC_SKIP                      -8  => -EBADSLT
    #define LUSTRE_CFG_RC_LAST_ELEM                 -9  => -ECHRNG
    #define LUSTRE_CFG_RC_MARSHAL_FAIL              -10 => -ENOSTR

I don't think "overloading" the POSIX error codes to mean something similar is worse
than using random numbers to report errors.  Also, in some cases (even in
lustre_lnet_config_ni()) it is using "rc = -errno" so the LUSTRE_CFG_RC_* errors are
*already* conflicting with POSIX error numbers, and it is impossible to distinguish
between them...
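
To illustrate the collision, here is a minimal sketch, assuming Linux errno values
and the LUSTRE_CFG_RC_* values listed above.  The helper rc_is_ambiguous() is
hypothetical and is not part of the Lustre tree:

```c
#include <errno.h>
#include <stdbool.h>

/* First and last error values, copied from the list above. */
#define LUSTRE_CFG_RC_BAD_PARAM     (-1)
#define LUSTRE_CFG_RC_MARSHAL_FAIL  (-10)

/*
 * Hypothetical helper: true if rc falls in the range shared by the
 * LUSTRE_CFG_RC_* codes and small negated errno values, so a caller
 * cannot tell which of the two was meant.
 */
bool rc_is_ambiguous(int rc)
{
	return rc <= LUSTRE_CFG_RC_BAD_PARAM &&
	       rc >= LUSTRE_CFG_RC_MARSHAL_FAIL;
}
```

Here rc_is_ambiguous(-EPERM) and rc_is_ambiguous(LUSTRE_CFG_RC_BAD_PARAM) are the
same call, since both are -1, while -EINVAL (-22 on Linux) falls outside the range.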

The main question is whether changing these numbers will break a user->kernel
interface, or if these definitions are only in userspace?  It looks like lnetctl.c
is only ever checking "!= LUSTRE_CFG_RC_NO_ERR", so maybe it is fine?  None of the
proposed values overlap with the current ones, so it would be possible to start
accepting either of the values for the return in the user tools, and then at some
point in the future start actually returning the POSIX ones...  Something for the
LNet folks to figure out.
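
A rough sketch of what "accepting either value" could look like in a user tool,
assuming Linux errno values and the creative mapping above.  The function
cfg_rc_normalize() is a hypothetical name, not an existing liblnetconfig call:

```c
#include <errno.h>

/* Current values, copied from the list above. */
#define LUSTRE_CFG_RC_NO_ERR              0
#define LUSTRE_CFG_RC_BAD_PARAM          (-1)
#define LUSTRE_CFG_RC_MISSING_PARAM      (-2)
#define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM (-3)
#define LUSTRE_CFG_RC_OUT_OF_MEM         (-4)
#define LUSTRE_CFG_RC_GENERIC_ERR        (-5)
#define LUSTRE_CFG_RC_NO_MATCH           (-6)
#define LUSTRE_CFG_RC_MATCH              (-7)
#define LUSTRE_CFG_RC_SKIP               (-8)
#define LUSTRE_CFG_RC_LAST_ELEM          (-9)
#define LUSTRE_CFG_RC_MARSHAL_FAIL       (-10)

/*
 * Hypothetical helper: map an old-style LUSTRE_CFG_RC_* value to the
 * proposed negated POSIX code.  Values that are already in the new form
 * (and 0) pass through unchanged, so callers can compare against one set.
 * Caveat: while "rc = -errno" paths still exist, a genuine -EPERM would
 * be translated as if it were LUSTRE_CFG_RC_BAD_PARAM.
 */
int cfg_rc_normalize(int rc)
{
	switch (rc) {
	case LUSTRE_CFG_RC_BAD_PARAM:          return -EINVAL;
	case LUSTRE_CFG_RC_MISSING_PARAM:      return -EFAULT;
	case LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM: return -ERANGE;
	case LUSTRE_CFG_RC_OUT_OF_MEM:         return -ENOMEM;
	case LUSTRE_CFG_RC_GENERIC_ERR:        return -ENODATA;
	case LUSTRE_CFG_RC_NO_MATCH:           return -ENOMSG;
	case LUSTRE_CFG_RC_MATCH:              return -EXFULL;
	case LUSTRE_CFG_RC_SKIP:               return -EBADSLT;
	case LUSTRE_CFG_RC_LAST_ELEM:          return -ECHRNG;
	case LUSTRE_CFG_RC_MARSHAL_FAIL:       return -ENOSTR;
	default:                               return rc;
	}
}
```

A tool could then test cfg_rc_normalize(rc) == -EINVAL and behave the same whether
the library returned the old or the new number.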

Cheers, Andreas

On Jan 10, 2024, at 13:29, Jeff Johnson <jeff.johnson@aeoncomputing.com> wrote:

A LU ticket and patch for lnetctl or for me being an under-caffeinated
idiot? ;-)

On Wed, Jan 10, 2024 at 12:06 PM Andreas Dilger <adilger@whamcloud.com> wrote:

It would seem that the error message could be improved in this case?  Could you file
an LU ticket for that with the reproducer below, and ideally along with a patch?

Cheers, Andreas
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud




