List: redhat-linux-cluster
Subject: Re: [Linux-cluster] GFS2 and D state HTTPD processes
From: Emilio Arjona <emilio.ah () gmail ! com>
Date: 2010-04-27 11:58:39
Message-ID: x2mdbaea1f51004270458p753d2d00u33403224673bd732 () mail ! gmail ! com
Thanks Ricardo,
We don't want to update the server because it's in production; we will plan
a system update for the summer, when the system's load is low.
In the latest incidents a new process is involved: [delete_workqueu]. It is
now usually the one that triggers the D-state process lockup. I have been
looking for information about this process but couldn't find anything.
Any idea?
Regards :)
2010/4/9 Ricardo Argüello <ricardo@fedoraproject.org>
> Looks like this bug:
>
> GFS2 - probably lost glock call back
> https://bugzilla.redhat.com/show_bug.cgi?id=498976
>
> This is fixed in the kernel included in RHEL 5.5.
> Do a "yum update" to fix it.
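> For reference, a quick way to check whether a node already runs the
> fixed kernel (RHEL 5.5 ships 2.6.18-194.el5; verify the exact version
> against the erratum for bz#498976):

```shell
# Show the running kernel; 2.6.18-194.el5 or later includes the fix
uname -r

# In a maintenance window, pull the fixed kernel and reboot so it
# takes effect:
#   yum update kernel && shutdown -r now
```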
>
> Ricardo Arguello
>
> On Tue, Mar 2, 2010 at 6:10 AM, Emilio Arjona <emilio.ah@gmail.com> wrote:
> > Thanks for your response, Steve.
> >
> > 2010/3/2 Steven Whitehouse <swhiteho@redhat.com>:
> >> Hi,
> >>
> >> On Fri, 2010-02-26 at 16:52 +0100, Emilio Arjona wrote:
> >>> Hi,
> >>>
> >>> we are experiencing some problems discussed in an old thread:
> >>>
> >>> http://www.mail-archive.com/linux-cluster@redhat.com/msg07091.html
> >>>
> >>> We have 3 clustered servers under Red Hat 5.4 accessing a GFS2 resource.
> >>>
> >>> fstab options:
> >>> /dev/vg_cluster/lv_cluster /opt/datacluster gfs2
> >>> defaults,noatime,nodiratime,noquota 0 0
> >>>
> >>> GFS options:
> >>> plock_rate_limit="0"
> >>> plock_ownership=1
> >>>
> >>> httpd processes sometimes get stuck in D state, and the only solution
> >>> is to hard-reset the affected server.
> >>>
> >>> Can anyone give me some hints to diagnose the problem?
> >>>
> >>> Thanks :)
> >>>
> >> Can you give me a rough idea of what the actual workload is and how it
> >> is distributed among the director(y/ies)?
> >
> > We had problems with php sessions in the past but we fixed it by
> > configuring php to store the sessions in the database instead of in
> > the GFS filesystem. Now, we're having problems with files and
> > directories in the "data" folder of Moodle LMS.
> >
> > "lsof -p" returned a i/o operation over the same folder in 2/3 nodes,
> > we did a hard reset of these nodes but some hours after the CPU load
> > grew up again, specially in the node that wasn't rebooted. We decided
> > to reboot (vía ssh) this node, then the CPU load went down to normal
> > values in all nodes.
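> > A more direct check than lsof is to list the stuck tasks themselves;
> > a sketch (not from the thread) that shows each D-state task and the
> > kernel function it is blocked in:

```shell
# List tasks in uninterruptible sleep (state D), with the kernel
# function they are blocked in (wchan); on a healthy box this
# usually prints only the header line
ps -eo pid,stat,wchan:30,comm | awk 'NR==1 || $2 ~ /^D/'
```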
> >
> > I don't think the system's load is high enough to produce concurrent
> > access problems. It's more likely to be some misconfiguration; in
> > fact, we changed some GFS2 options to non-default values to increase
> > performance (
> > http://www.linuxdynasty.org/howto-increase-gfs2-performance-in-a-cluster.html
> > ).
> >
> >>
> >> This is often down to contention on glocks (one per inode), maybe
> >> because there is a process or processes writing a file or directory
> >> which is in use (either read-only or writable) by other processes.
> >>
> >> If you are using php, then you might have to strace it to find out what
> >> it is really doing.
> >
> > Ok, we will try to strace the D processes and post the results. Hope
> > we find something!!
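> > A sketch of that strace step (the PID is a placeholder; a task stuck
> > in D state typically shows no syscall activity at all, which itself
> > points at a kernel-side block such as a glock):

```shell
# Attach to a hung worker and log its syscalls with timestamps:
#   strace -f -tt -p <PID> -o /tmp/httpd-<PID>.strace
#
# For D-state tasks the wchan column is often more telling:
ps -o pid,stat,wchan:30,comm -C httpd || true
```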
> >
> >>
> >> Steve.
> >>
> >>> --
> >>>
> >>> Emilio Arjona.
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster@redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >>
> >
> >
> >
> > --
> > Emilio Arjona.
> >
>
--
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Phone: 958-241000 ext. 20206
*******************************************
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster