
List:       redhat-linux-cluster
Subject:    Re: [Linux-cluster] GFS2 and D state HTTPD processes
From:       Emilio Arjona <emilio.ah () gmail ! com>
Date:       2010-04-27 11:58:39
Message-ID: x2mdbaea1f51004270458p753d2d00u33403224673bd732 () mail ! gmail ! com



Thanks Ricardo,

We don't want to update the server because it's in production. We will plan
a system update for the summer, when the system load is low.

In the latest incidents a new process has been involved: [delete_workqueu].
It is now usually the process that starts the lockout of D-state
processes. I have been looking for information about this process but
couldn't find anything.

Any idea?
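The name [delete_workqueu] is exactly 15 characters long, which is the most the kernel keeps for a task name (the comm buffer holds 16 bytes including the terminating NUL), so it is likely a truncated name. A minimal Python sketch, assuming a Linux /proc filesystem and root privileges, that lists every task currently in D state, kernel threads such as this one included, together with the kernel function it is sleeping in (wchan). The helper names are illustrative, not from any existing tool.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    try:
        with open(path) as f:
            return f.read()
    except (IOError, OSError):
        # The task may have exited between listing /proc and reading it.
        return None


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for every task currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc, entry, "stat"))
        if stat is None:
            continue
        pid, comm, state = parse_stat(stat)
        if state != "D":
            continue
        # wchan names the kernel function the task sleeps in; it may read
        # as "0" or be unavailable, depending on kernel and permissions.
        wchan = read_text(os.path.join(proc, entry, "wchan")) or "?"
        yield pid, comm, wchan.strip()


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print("%6d  %-16s %s" % (pid, comm, wchan))
```

Running it a few times during an incident shows whether the httpd processes and the kernel thread are waiting in the same kernel function.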

Regards :)


2010/4/9 Ricardo Argüello <ricardo@fedoraproject.org>

> Looks like this bug:
>
> GFS2 - probably lost glock call back
> https://bugzilla.redhat.com/show_bug.cgi?id=498976
>
> This is fixed in the kernel included in RHEL 5.5.
> Do a "yum update" to fix it.
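Before scheduling the update, it can help to check which kernel each node runs. A minimal Python sketch, assuming (not confirmed in this thread) that the RHEL 5.5 kernel is 2.6.18-194.el5; `FIXED_RELEASE` and the helper names are hypothetical, and the comparison is only meaningful for RHEL 5 `el5` kernel release strings.

```python
import platform
import re

# Assumed threshold: RHEL 5.5 kernel 2.6.18-194.el5. Adjust if needed.
FIXED_RELEASE = (2, 6, 18, 194)


def kernel_tuple(release):
    """Turn '2.6.18-164.11.1.el5' into (2, 6, 18, 164, 11, 1)."""
    numbers = re.findall(r"\d+", release.split(".el")[0])
    return tuple(int(n) for n in numbers)


def has_fix(release, fixed=FIXED_RELEASE):
    # Tuple comparison orders releases field by field.
    return kernel_tuple(release) >= fixed


if __name__ == "__main__":
    release = platform.release()
    print(release, "is at or above 5.5" if has_fix(release) else "is older than 5.5")
```

A version comparison does not prove the specific fix is present; `rpm -q --changelog kernel` can be searched for the bug number if the changelog mentions it.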
>
> Ricardo Arguello
>
> On Tue, Mar 2, 2010 at 6:10 AM, Emilio Arjona <emilio.ah@gmail.com> wrote:
> > Thanks for your response, Steve.
> >
> > 2010/3/2 Steven Whitehouse <swhiteho@redhat.com>:
> >> Hi,
> >>
> >> On Fri, 2010-02-26 at 16:52 +0100, Emilio Arjona wrote:
> >>> Hi,
> >>>
> >>> we are experiencing some problems discussed in an old thread:
> >>>
> >>> http://www.mail-archive.com/linux-cluster@redhat.com/msg07091.html
> >>>
> >>> We have 3 clustered servers running Red Hat 5.4 accessing a GFS2
> >>> resource.
> >>>
> >>> fstab options:
> >>> /dev/vg_cluster/lv_cluster /opt/datacluster gfs2
> >>> defaults,noatime,nodiratime,noquota 0 0
> >>>
> >>> GFS options:
> >>> plock_rate_limit="0"
> >>> plock_ownership=1
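The fstab line records what was requested; /proc/mounts shows what the kernel actually applied. A minimal Python sketch, assuming the mount point /opt/datacluster from the fstab line above, that reports whether noatime and nodiratime are in effect. The GFS options listed are not mount options, so this check does not cover them; the helper names are illustrative.

```python
def mount_options(mount_point, mounts_text):
    """Return (fstype, options) for the last entry mounted on mount_point.

    Each /proc/mounts line is: device mount_point fstype options dump pass.
    The last match wins, since it is the one on top when several
    filesystems are stacked on the same directory.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = (fields[2], set(fields[3].split(",")))
    return found


# noquota is left out because how it is reported may vary by kernel.
WANTED = {"noatime", "nodiratime"}

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        result = mount_options("/opt/datacluster", f.read())
    if result is None:
        print("/opt/datacluster is not mounted")
    else:
        fstype, options = result
        missing = sorted(WANTED - options)
        print(fstype, "missing options:", ", ".join(missing) or "none")
```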
> >>>
> >>> httpd processes sometimes get stuck in D state, and the only solution
> >>> is a hard reset of the affected server.
> >>>
> >>> Can anyone give me some hints to diagnose the problem?
> >>>
> >>> Thanks :)
> >>>
> >> Can you give me a rough idea of what the actual workload is and how it
> >> is distributed among the director(y/ies)?
> >
> > We had problems with PHP sessions in the past, but we fixed them by
> > configuring PHP to store the sessions in the database instead of on
> > the GFS filesystem. Now we're having problems with files and
> > directories in the "data" folder of Moodle LMS.
> >
> > "lsof -p" showed an I/O operation on the same folder on 2 of the 3
> > nodes. We did a hard reset of these nodes, but a few hours later the
> > CPU load rose again, especially on the node that hadn't been rebooted.
> > We decided to reboot that node (via ssh), and then the CPU load went
> > back to normal on all nodes.
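What `lsof -p` reported can also be read straight from /proc, restricted to one directory tree. A minimal Python sketch, assuming Linux and root privileges for other users' processes; `open_files_under` is a hypothetical helper. It only calls readlink on the /proc/<pid>/fd entries and never touches the files themselves.

```python
import os


def open_files_under(pid, prefix, proc="/proc"):
    """Return sorted (fd, path) pairs that process `pid` holds open below `prefix`.

    Roughly what `lsof -p PID` lists, filtered to one directory tree.
    """
    fd_dir = os.path.join(proc, str(pid), "fd")
    prefix = prefix.rstrip("/") + "/"
    found = []
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return found  # process gone or permission denied
    for fd in fds:
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed while we were looking
        if target.startswith(prefix):
            found.append((int(fd), target))
    return sorted(found)
```

For example, `open_files_under(1234, "/opt/datacluster")` lists what a (hypothetical) PID 1234 has open on the GFS2 mount.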
> >
> > I don't think the system load is high enough to cause concurrent
> > access problems. It's more likely to be a misconfiguration; in
> > fact, we changed some GFS2 options to non-default values to increase
> > performance
> > (http://www.linuxdynasty.org/howto-increase-gfs2-performance-in-a-cluster.html).
> >
> >>
> >> This is often down to contention on glocks (one per inode), perhaps
> >> because a process or processes are writing to a file or directory
> >> which is in use (either read-only or writable) by other processes.
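Since there is one glock per inode, a first step is finding files that several processes hold open at once. A minimal Python sketch, assuming the input is a mapping from PID to open paths gathered on one node (for example from the /proc/<pid>/fd links); `shared_paths` is a hypothetical helper. It groups by path rather than calling stat(), because stat() on a GFS2 file needs that file's glock and could itself block if the glock is stuck. Hard links to the same inode are therefore not merged, and cross-node contention requires comparing the output from each node.

```python
from collections import defaultdict


def shared_paths(open_files):
    """Return {path: sorted PIDs} for paths held open by more than one PID.

    open_files maps each PID to an iterable of the paths it holds open.
    """
    holders = defaultdict(set)
    for pid, paths in open_files.items():
        for path in paths:
            holders[path].add(pid)
    return {path: sorted(pids) for path, pids in holders.items() if len(pids) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the affected cluster.
    sample = {
        1201: ["/opt/datacluster/example.txt"],
        1202: ["/opt/datacluster/example.txt", "/var/log/httpd/access_log"],
        1203: ["/var/log/httpd/access_log"],
    }
    for path, pids in sorted(shared_paths(sample).items()):
        print(path, pids)
```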
> >>
> >> If you are using PHP, then you might have to strace it to find out what
> >> it is really doing.
> >
> > OK, we will try to strace the D-state processes and post the results.
> > Hopefully we will find something!
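strace prints system calls as they are entered and return, so a task sleeping in D state inside one call may produce no new output until that call returns. Short D-state waits are also normal during disk I/O, so it can help to check which PIDs stay in D state across several samples. A minimal Python sketch, assuming Linux /proc; the function names, round count, and interval are illustrative choices.

```python
import os
import time


def d_state_pids(proc="/proc"):
    """Return the set of PIDs whose /proc/<pid>/stat state field is 'D'."""
    pids = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                data = f.read()
        except (IOError, OSError):
            continue  # task exited meanwhile
        # The state field follows the last ')' that closes the comm name.
        if data[data.rindex(")") + 1:].split()[0] == "D":
            pids.add(int(entry))
    return pids


def take_samples(rounds=5, interval=2.0):
    """Collect `rounds` snapshots of D-state PIDs, `interval` seconds apart."""
    samples = []
    for i in range(rounds):
        if i:
            time.sleep(interval)
        samples.append(d_state_pids())
    return samples


def persistently_blocked(samples):
    """Return the PIDs present in every snapshot."""
    if not samples:
        return set()
    stuck = set(samples[0])
    for snapshot in samples[1:]:
        stuck &= snapshot
    return stuck
```

`persistently_blocked(take_samples())` then gives the PIDs worth inspecting further, for example with the wchan entry in /proc.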
> >
> >>
> >> Steve.
> >>
> >>> --
> >>>
> >>> Emilio Arjona.
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster@redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >>
> >>
> >
> >
> >
> > --
> > Emilio Arjona.
> >
> >
>
>



-- 
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************



