
List:       freenx-knx
Subject:    Re: [FreeNX-kNX] nxagent session gets lost,
From:       Marcelo Boveto Shima <marceloshima () gmail ! com>
Date:       2009-03-01 0:55:38
Message-ID: 7d3bf3160902281655o52664ec4me98d9e4fefee4398 () mail ! gmail ! com

The following patch fixes this bug.
http://bazaar.launchpad.net/~freenx-team/freenx-server/teambzr/revision/91

This line seems to trigger the problem.
echo "NX> 596 Error: Session $1 failed. Reason was: $line"

Running it only when the node failed to restore the session solves the bug.
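
A rough sketch of that idea, not the actual change in revision 91: the
function name report_session_failure and the restore_failed flag are
hypothetical, chosen only to show the 596 message being gated on a failed
restore.

```shell
# Hypothetical helper; neither the name nor the flag comes from nxserver.
report_session_failure()
{
    session_id="$1"
    restore_failed="$2"   # assumed: "1" when the node could not restore the session
    line="$3"             # reason text reported by the node

    # Emit the 596 error only when the restore really failed.
    if [ "$restore_failed" = "1" ]; then
        echo "NX> 596 Error: Session $session_id failed. Reason was: $line"
    fi
}
```

Called with restore_failed set to anything other than "1", the function
prints nothing, so the client never sees the 596 line for a session that
was restored.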

Regards.
Shima

On Mon, Feb 16, 2009 at 4:50 PM, Freerk Kalsbeek
<f.kalsbeek@mindswitch.nl> wrote:

> I've implemented this patch on one of our servers. Let's see what happens
> over the next few days.
> Haven't had the time to analyse our issues in more detail. Hopefully this
> fixes it.
>
> Regards,
> Freerk
>
>
> On Sun, Jan 25, 2009 at 4:32 AM, Mario Becroft <mb@gem.win.co.nz> wrote:
>
>> I still don't fully understand this problem, but I have a solution.
>>
>> I am not very sure about Marcelo's patch because as far as I can see,
>> NODE_SUSPEND_STATUS is never set to "Suspending". What is this patch
>> meant to do exactly?
>>
>> I found that with slave mode disabled, everything is much easier to
>> understand, and it does not appear to make it any slower. It did not
>> exactly fix the problem though, just modified the symptoms.
>>
>> The key problem is that when the client nxssh is killed, nxserver hangs
>> in the echo inside server_nxnode_echo(). It attempts to handle this
>> situation by installing a SIGPIPE handler that sets
>> SERVER_CHANNEL=0. Unfortunately, SIGPIPE is never received in this
>> situation; instead the echo hangs forever. This is what causes it never
>> to process any more commands from nxnode.
>>
>> It is not entirely clear why it happens in this way.
>>
>> Anyway, the workaround is to change echo to /bin/echo. /bin/echo returns
>> immediately if the client is disconnected. Probably it should also check
>> the status and set SERVER_CHANNEL=0 if /bin/echo failed. However I have
>> not bothered to do this. It does not seem to matter a great deal.
>>
>> This solves the problem both with and without slave mode. I think there
>> may still be some sort of timing-related problem here, but I am not
>> sure; it is all rather complicated.
>>
>> I have also noticed another problem that I thought might be related, but
>> is probably different. If you unplug the network from the currently
>> logged-in client, it takes about 30 seconds before nxagent notices that
>> the client is gone and suspends the session. If, in this 30-second
>> window, you log in from another client, everything works, but the session
>> status incorrectly remains in suspended state. I guess this is because
>> when the second client logs in, it must suspend the session before
>> restoring it on the new client. Somehow the suspended state of the
>> session is set after the resumed state. I am out of time and this
>> problem is not so serious, so I am ignoring it for now. Maybe someone
>> else has time to look into this one.
>>
>> Anyway, for anyone else who has the present problem, please try the
>> following patch and report back.
>>
>> See the patch below (the line numbers might be a bit off since my file
>> has lots of extra instrumentation):
>>
>> --8<---------------cut here---------------start------------->8---
>> --- nxserver.foo        2009-01-25 16:07:46.590977440 +1300
>> +++ nxserver    2009-01-25 16:07:54.498952944 +1300
>> @@ -967,8 +967,8 @@
>>  server_nxnode_echo()
>>  {
>>        log 6 "server_nxnode_echo: $@"
>> -       [ "$SERVER_CHANNEL" = "1" ] && echo "$@"
>> -       [ "$SERVER_CHANNEL" = "2" ] && echo "$@" >&2
>> +       [ "$SERVER_CHANNEL" = "1" ] && /bin/echo "$@"
>> +       [ "$SERVER_CHANNEL" = "2" ] && /bin/echo "$@" >&2
>>  }
>>
>>  server_nxnode_exit_func()
>> --8<---------------cut here---------------end--------------->8---
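
The quoted message suggests also checking the status of /bin/echo and
setting SERVER_CHANNEL=0 when it fails. A minimal sketch of that, untested
against a real nxserver: log() is stubbed here as an assumption (nxserver
has its own logger), and the case statement is just one way to write the
two channel checks.

```shell
# Stub for this sketch only; nxserver defines its own log().
log() { :; }

SERVER_CHANNEL=1

server_nxnode_echo()
{
    log 6 "server_nxnode_echo: $*"
    case "$SERVER_CHANNEL" in
        # A non-zero status from /bin/echo is taken to mean the client
        # is gone, so stop writing to it from now on.
        1) /bin/echo "$@" || SERVER_CHANNEL=0 ;;
        2) /bin/echo "$@" >&2 || SERVER_CHANNEL=0 ;;
    esac
}
```

The assignment only persists if server_nxnode_echo runs in the calling
shell; inside a subshell or pipeline the updated SERVER_CHANNEL would be
lost.
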
>>
>> --
>> Mario Becroft <mb@gem.win.co.nz>
>> ________________________________________________________________
>>     Were you helped on this list with your FreeNX problem?
>>    Then please write up the solution in the FreeNX Wiki/FAQ:
>>
>> http://openfacts2.berlios.de/wikien/index.php/BerliosProject:FreeNX_-_FAQ
>>
>>         Don't forget to check the NX Knowledge Base:
>>                 http://www.nomachine.com/kb/
>>
>> ________________________________________________________________
>>       FreeNX-kNX mailing list --- FreeNX-kNX@kde.org
>>      https://mail.kde.org/mailman/listinfo/freenx-knx
>> ________________________________________________________________
>>
