List:       gluster-users
Subject:    Re: [Gluster-users] xfs_rename error and brick offline
From:       Paul <flypen () gmail ! com>
Date:       2017-11-23 1:53:20
Message-ID: CAFTsVLRJ9Hhj1aY-TLZuy3zSHZOj41FUKv6YxvLAQraM5YF_UA () mail ! gmail ! com


Vijay,

Yes, I later found that it was a problem in XFS. After upgrading the XFS
code, I haven't seen this problem again.
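
For anyone checking the same thing, one quick way to confirm which kernel
and XFS module are actually loaded after an upgrade (the exact modinfo
fields vary by build) is:

    uname -r
    modinfo xfs | head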

Thanks a lot!
Paul

On Fri, Nov 17, 2017 at 12:08 AM, Vijay Bellur <vbellur@redhat.com> wrote:

>
>
> On Thu, Nov 16, 2017 at 6:23 AM, Paul <flypen@gmail.com> wrote:
>
>> Hi,
>>
>> I have a 5-node GlusterFS cluster using Distributed-Replicate volumes, with
>> 180 bricks in total. The OS is CentOS 6.5 and GlusterFS is 3.11.0. I find
>> that many bricks go offline when we create some empty files and rename
>> them, and I see an XFS call trace on every node.
>>
>> For example,
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): Internal error
>> xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c.  Caller
>> 0xffffffffa04e33f9
>> Nov 16 11:15:12 node10 kernel:
>> Nov 16 11:15:12 node10 kernel: Pid: 9939, comm: glusterfsd Tainted: G
>>        --------------- H  2.6.32-prsys.1.1.0.13.x86_64 #1
>> Nov 16 11:15:12 node10 kernel: Call Trace:
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04c803f>] ?
>> xfs_error_report+0x3f/0x50 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04e33f9>] ?
>> xfs_rename+0x2c9/0x6c0 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04e5e39>] ?
>> xfs_trans_cancel+0xd9/0x100 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04e33f9>] ?
>> xfs_rename+0x2c9/0x6c0 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811962c5>] ?
>> mntput_no_expire+0x25/0xb0
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04f5a06>] ?
>> xfs_vn_rename+0x66/0x70 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81184580>] ?
>> vfs_rename+0x2a0/0x500
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81182cd6>] ?
>> generic_permission+0x16/0xa0
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811882d9>] ?
>> sys_renameat+0x369/0x420
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81185f06>] ?
>> final_putname+0x26/0x50
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81186189>] ? putname+0x29/0x40
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811861f9>] ?
>> user_path_at+0x59/0xa0
>> Nov 16 11:15:12 node10 kernel: [<ffffffff8151dc79>] ?
>> unroll_tree_refs+0x16/0xbc
>> Nov 16 11:15:12 node10 kernel: [<ffffffff810d1698>] ?
>> audit_syscall_entry+0x2d8/0x300
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811883ab>] ? sys_rename+0x1b/0x20
>> Nov 16 11:15:12 node10 kernel: [<ffffffff8100b032>] ?
>> system_call_fastpath+0x16/0x1b
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2):
>> xfs_do_force_shutdown(0x8) called from line 1949 of file
>> fs/xfs/xfs_trans.c.  Return address = 0xffffffffa04e5e52
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): Corruption of in-memory
>> data detected.  Shutting down filesystem
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): Please umount the
>> filesystem and rectify the problem(s)
>> Nov 16 11:15:30 node10 disks-FAvUzxiL-brick[29742]: [2017-11-16
>> 11:15:30.206208] M [MSGID: 113075] [posix-helpers.c:1891:posix_health_check_thread_proc]
>> 0-data-posix: health-check failed, going down
>> Nov 16 11:15:30 node10 disks-FAvUzxiL-brick[29742]: [2017-11-16
>> 11:15:30.206538] M [MSGID: 113075] [posix-helpers.c:1908:posix_health_check_thread_proc]
>> 0-data-posix: still alive! -> SIGTERM
>> Nov 16 11:15:37 node10 kernel: XFS (sdm): xfs_log_force: error 5 returned.
>> Nov 16 11:16:07 node10 kernel: XFS (sdm): xfs_log_force: error 5 returned.
>>
>>
>>
>
> As the logs indicate, xfs shut down and the posix health check feature in
> Gluster rendered the brick offline. You would be better off checking with
> the xfs community about this problem.
>
> Regards,
> Vijay
>
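
Following up on Vijay's note about the posix health check: once the XFS
filesystem behind a brick has shut down, the brick process stays dead until
it is restarted. A rough recovery sketch (the device, mount point and volume
name below are placeholders, not taken from this thread):

    # Unmount and remount the affected brick filesystem; remounting
    # replays the XFS log.
    umount /bricks/brick1
    mount /dev/sdX /bricks/brick1
    # xfs_repair /dev/sdX       # only if the remount / log replay fails

    # Restart the brick processes that the health check killed, then
    # confirm they are back online.
    gluster volume start <VOLNAME> force
    gluster volume status <VOLNAME>

If needed, the check frequency can also be tuned with the
storage.health-check-interval volume option.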

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
