
List:       ocfs2-users
Subject:    Re: [Ocfs2-users] Periodic hangs
From:       Sunil Mushran <sunil.mushran () oracle ! com>
Date:       2010-10-15 20:05:47
Message-ID: 4CB8B41B.6050705 () oracle ! com


I am not asking you to cause a hang, just that you take a stack trace
when you encounter one.
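
To have something ready for the next hang, the loop quoted further down can be wrapped in a small function. This is only a sketch: the function name dump_stacks and its optional directory argument are made up for illustration (the argument defaults to /proc and exists so the loop can be tried on a test directory first).

```shell
# Sketch: print the kernel stack of every task that exposes a "stack" file.
# dump_stacks and its argument are illustrative names, not an ocfs2 tool.
dump_stacks() {
    proc_root=${1:-/proc}
    find "$proc_root" -name stack 2>/dev/null | while IFS= read -r stack_file; do
        task_dir=$(dirname "$stack_file")
        echo "$stack_file"
        # cmdline is NUL-separated; turn the NULs into spaces for readability.
        cat "$task_dir/cmdline" 2>/dev/null | tr '\0' ' '
        echo
        cat "$stack_file" 2>/dev/null
        echo
    done
}

# Example (as root, while the hang is in progress):
#   dump_stacks > /tmp/ocfs2-hang-stacks.txt
```

The output file name is arbitrary. Writing it to a local disk rather than the ocfs2 mount avoids blocking on the hung filesystem.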

If you don't see /proc/.../stack, then CONFIG_STACKTRACE has not been
enabled in your kernel. You'll have to use the old-fashioned method of
setting up a netconsole server and then issuing "echo t >/proc/sysrq-trigger".
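
A rough sketch of the netconsole route follows. Every value in it is an assumption: the receiver address 10.0.0.50, the interface eth0 and the ports are placeholders, and netconsole_param is just a helper invented here to assemble the module parameter in the src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac form described in the kernel's netconsole documentation.

```shell
# Sketch: build the "netconsole=" module parameter.
# usage: netconsole_param <src-port> <src-ip> <dev> <tgt-port> <tgt-ip> [tgt-mac]
# (netconsole_param is an illustrative helper, not an existing tool.)
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# On the receiving machine (placeholder 10.0.0.50), listen for the UDP stream.
# The option syntax differs between netcat variants:
#   nc -u -l -p 6666 | tee netconsole.log
#
# On the node that hangs (as root; eth0 and the ports are placeholders):
#   modprobe netconsole "netconsole=$(netconsole_param 6665 10.0.0.112 eth0 6666 10.0.0.50)"
#   dmesg -n 8      # may be needed so that all messages reach the console
#   echo t > /proc/sysrq-trigger
```

With the target MAC address left empty, the parameter ends in a trailing slash and the kernel falls back to the broadcast address.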

On 10/15/2010 12:20 PM, Emil Noether wrote:
> Hi,
> 
> thank you for the reply. Are you sure about this command? When I run
> 
> find /proc -name stack
> 
> I get no output. But I'm running this command while the server is OK. I can't cause
> the hang right now, because it is 21:00 here, which is "prime time" for my website, and
> my customers are already quite upset. But I can try it tomorrow morning.
> 
> Regards,
> Emil Noether
> 
> On 10/15/2010 07:22 PM, Sunil Mushran wrote:
> > Take a stack trace of the hang. If you are on 2.6.32, you could do:
> > 
> > # find /proc -name stack | while read A ; do D=$(dirname $A); echo $A; cat $D/cmdline; echo ; cat $A; echo ; done
> > 
> > Attach the output to a bugzilla on oss.oracle.com.
> > 
> > On 10/15/2010 08:16 AM, Emil Noether wrote:
> > > Hi,
> > > 
> > > I have a Nexsan SATABoy2 storage array with 8 disks (SATA Hitachi HUA721075KLA330)
> > > configured as RAID 6, plus two image servers and two webservers. The image servers are
> > > connected to the storage via iSCSI (1 Gbit) and the webservers via Fibre Channel
> > > (QLogic ISP2432-based 4Gb). There is an ocfs2 filesystem on the storage disk. When
> > > I disconnect webserver1 (identical to webserver2), everything is OK. But when
> > > I run "/etc/init.d/o2cb start", even without mounting the storage disk (so the
> > > webserver is actually doing nothing), my project goes down approximately every
> > > 30 minutes for approximately 2 minutes.
> > > 
> > > To describe what is down: there is no problem on the image servers, but there is a
> > > problem on webserver2. The mounted ocfs2 disk stops responding (I can't even run
> > > "df"), so the load climbs to approximately 400, the number of running Apache
> > > processes reaches its maximum, and so on. The web page does not respond.
> > > 
> > > I store all of my logs on local disks, not on the ocfs2 disk.
> > > 
> > > I use the 2.6.32 kernel on the servers. I have already tried switching to
> > > another kernel, but with no result.
> > > 
> > > I use ocfs2-tools version 1.4.1-1.
> > > 
> > > My distro is Debian Lenny (5.0.6) x64.
> > > 
> > > My /etc/default/o2cb:
> > > O2CB_ENABLED=true
> > > O2CB_BOOTCLUSTER=ocfs2
> > > O2CB_HEARTBEAT_THRESHOLD=14
> > > O2CB_IDLE_TIMEOUT_MS=10000
> > > O2CB_KEEPALIVE_DELAY_MS=5000
> > > O2CB_RECONNECT_DELAY_MS=2000
> > > 
> > > My /etc/ocfs2/cluster.conf:
> > > node:
> > >   ip_port = 7777
> > >   ip_address = 10.0.0.111
> > >   number = 0
> > >   name = www1
> > >   cluster = ocfs2
> > > 
> > > node:
> > >   ip_port = 7777
> > >   ip_address = 10.0.0.112
> > >   number = 1
> > >   name = ww2
> > >   cluster = ocfs2
> > > 
> > > node:
> > >   ip_port = 7777
> > >   ip_address = 10.0.0.121
> > >   number = 2
> > >   name = img1
> > >   cluster = ocfs2
> > > 
> > > node:
> > >   ip_port = 7777
> > >   ip_address = 10.0.0.122
> > >   number = 3
> > >   name = img2
> > >   cluster = ocfs2
> > > 
> > > cluster:
> > >   node_count = 4
> > >   name = ocfs2
> > > 
> > > 
> > > Any help is much appreciated,
> > > Best Regards,
> > > 
> > > Emil Noether
> > > 
> > > 
> > > _______________________________________________
> > > Ocfs2-users mailing list
> > > Ocfs2-users@oss.oracle.com
> > > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> > 
> 
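
For reading the quoted /etc/default/o2cb: according to the OCFS2 documentation, the disk heartbeat timeout in seconds is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2, since nodes heartbeat every 2 seconds. A minimal sketch of that arithmetic, with hb_timeout_secs being a helper name made up here:

```shell
# Sketch: convert O2CB_HEARTBEAT_THRESHOLD into the disk heartbeat timeout
# in seconds, assuming the documented 2-second heartbeat interval.
# hb_timeout_secs is an illustrative helper, not part of ocfs2-tools.
hb_timeout_secs() {
    echo $(( ($1 - 1) * 2 ))
}

# With the value from the quoted configuration:
#   hb_timeout_secs 14    # prints 26
```

So with the quoted settings a node is considered dead after roughly 26 seconds without disk heartbeats. Whether that is relevant to the hangs can only be judged from the stack traces.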


