List:       gluster-users
Subject:    Re: [Gluster-users] Question on stale shards with distribute-replicate volume
From:       Ronny Adsetts <ronny.adsetts () amazinginternet ! com>
Date:       2022-11-11 9:11:06
Message-ID: 5ad7e56b-56c0-ec2b-ec53-efe6c893a791 () amazinginternet ! com

Hi Strahil,

Thanks for the response, appreciate it.

There were two sets of shard files (each set comprising the two replicas and the arbiter data) showing up for the problem data: one 0-byte set and one of the correct size. The correct data looked fine. The two sets of shard files is what was causing Gluster to have a wobble. No idea how Gluster came to have the two sets of files. Maybe I'm missing something in my understanding of how Gluster works here.

In the end, I resorted to verifying the data by copying the iSCSI backing store files to /dev/null from the mounted Gluster volume, and then removing any "bad" 0-byte shards that were logged by Gluster with the "Stale file handle" error. This then resolved the I/O errors that were being seen within the iSCSI-mounted filesystem.
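For anyone hitting the same thing, the clean-up described above can be sketched roughly as follows. This is a hedged illustration, not the exact commands used in the thread: BRICK here is a temp directory seeded with a fake stale entry so the `find` can be demonstrated; substitute a real brick path in practice.

```shell
# BRICK stands in for a real brick path (e.g. /bricks/brick1/iscsi);
# here it is a temp dir seeded with a fake stale shard entry.
BRICK=$(mktemp -d)
mkdir -p "$BRICK/.shard"
touch "$BRICK/.shard/b42dc8f9-755e-46be-8418-4882a9f765e1.5613"
chmod 1000 "$BRICK/.shard/b42dc8f9-755e-46be-8418-4882a9f765e1.5613"

# DHT link-to files are 0 bytes with only the sticky bit set (---------T):
find "$BRICK/.shard" -type f -size 0 -perm 1000

# On a real brick, confirm before removing anything:
#   getfattr -n trusted.glusterfs.dht.linkto -e text <shard-file>
# And the verification read from the mounted volume is simply:
#   dd if=/srv/iscsi/<backing-file> of=/dev/null bs=1M status=progress
rm -r "$BRICK"
```

The `getfattr` check matters because a genuinely sparse 64MB shard can also be small; only the `trusted.glusterfs.dht.linkto` xattr marks a link-to file as safe to treat as stale.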

The problem I initially had in trying to track this down was the tgtd logs from libgfapi: I had no clue how to determine which shards were problematic from those logs.

Anyway, disaster over, but it does leave me a little nervous. Recovery from backup is quite tedious.

Ronny

Strahil Nikolov wrote on 10/11/2022 17:28:
> I skimmed over it, so take everything I say with a grain of salt.
> 
> Based on the logs, the gfid for one of the cases is clear -> b42dc8f9-755e-46be-8418-4882a9f765e1, shard 5613.
> As there is a linkto, most probably the shard's location was on another subvolume, in which case I would just "walk" over all bricks and get the extended file attributes of the real ones.
> I can't imagine why it happened, but I do suspect a gfid split-brain.
> 
> If I were in your shoes, I would check the gfids and assume that those with the same gfid value are the good ones (usually the one that differs has an older timestamp), and I would remove the copy from the last brick and check if that fixes things.
> Best Regards,
> Strahil Nikolov 
> 
> On Thu, Nov 3, 2022 at 17:24, Ronny Adsetts
> <ronny.adsetts@amazinginternet.com> wrote:
> Hi,
> 
> We have a 4 x (2 + 1) distribute-replicate volume with sharding enabled. We use the volume for storing backing files for iSCSI devices. The iSCSI devices are provided to our file server by tgtd, using the glfs backing store type via libgfapi.
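For readers unfamiliar with the layout: a 4 x (2 + 1) volume like the one described above would typically be created along these lines. Hostnames and brick paths are invented for illustration; this is a sketch of the topology, not the poster's actual configuration.

```shell
# 4 distribute subvolumes, each replica 2 + 1 arbiter = 4 x (2 + 1).
# srv1-4 and arb1 are hypothetical hosts; brick paths are made up.
gluster volume create iscsi replica 3 arbiter 1 \
  srv1:/bricks/b1 srv2:/bricks/b1 arb1:/bricks/a1 \
  srv1:/bricks/b2 srv2:/bricks/b2 arb1:/bricks/a2 \
  srv3:/bricks/b1 srv4:/bricks/b1 arb1:/bricks/a3 \
  srv3:/bricks/b2 srv4:/bricks/b2 arb1:/bricks/a4
# Sharding splits large image files into fixed-size pieces (64MB default):
gluster volume set iscsi features.shard on
gluster volume start iscsi
```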
> So we had a problem the other day where one of the filesystems wouldn't re-mount following a rolling tgtd restart (we have 4 servers providing tgtd). I think this rolling restart was done too quickly, which meant there was a disconnect at the file server end (speculating). After some investigation, and after manually trying to copy the fs image file to a temporary location, I found 0-byte shards.
> Because I mounted the volume directly, I got errors in the Gluster logs (/var/log/glusterfs/srv-iscsi.log) for the volume. I get no errors in the Gluster logs when this happens via libgfapi, though I did see tgtd errors.
> The tgtd errors look like this:
> 
> tgtd[24080]: tgtd: bs_glfs_request(279) Error on read ffffffff 1000
> tgtd: bs_glfs_request(370) io error 0x55da8d9820b0 2 28 -1 4096 376698519552, Stale file handle
> Not sure how to figure out which shard is the issue out of that log entry. :-)
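One plausible way to map a tgtd line like the one above to a shard, assuming the last large number is the byte offset of the failed request and the volume uses the default features.shard-block-size of 64MiB (an assumption; check with `gluster volume get <vol> features.shard-block-size`): integer-divide the offset by the shard size. For the offset above this gives 5613, matching the shard named in the Gluster log entries quoted further down.

```shell
# shard index = byte offset / shard-block-size (64MiB default, assumed)
echo $((376698519552 / (64 * 1024 * 1024)))   # -> 5613
```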
> 
> The gluster logs look like this:
> 
> [2022-11-01 16:51:28.496911] E [MSGID: 133010] [shard.c:2342:shard_common_lookup_shards_cbk] 0-iscsi-shard: Lookup on shard 5613 failed. Base file gfid = b42dc8f9-755e-46be-8418-4882a9f765e1 [Stale file handle]
> [2022-11-01 19:17:09.060376] E [MSGID: 133010] [shard.c:2342:shard_common_lookup_shards_cbk] 0-iscsi-shard: Lookup on shard 5418 failed. Base file gfid = b42dc8f9-755e-46be-8418-4882a9f765e1 [Stale file handle]
> So there were the two shards showing up as problematic. Checking the shard files showed that they were 0 bytes with a trusted.glusterfs.dht.linkto value in the file attributes. There were other shard files of the same name with the correct size, so I guess the shard had been moved at some point, leaving the 0-byte linkto copies behind. Anyway, moving the offending .shard and associated .glusterfs files out of the way meant I was able to, first, copy the file without error, and then run an "xfs_repair -L" on the filesystem and get it remounted. There was some data loss, but minor as far as I can tell.
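On the "associated .glusterfs files": every file on a brick carries a second hard link under .glusterfs/, keyed by that file's own gfid, which is why both directory entries have to be moved aside together. A toy illustration of the link pairing (the gfid-style path below is made up; on a real brick it comes from the shard file's trusted.gfid xattr):

```shell
BRICK=$(mktemp -d)
mkdir -p "$BRICK/.shard" "$BRICK/.glusterfs/b4/2d"
SHARD="$BRICK/.shard/b42dc8f9-755e-46be-8418-4882a9f765e1.5613"
touch "$SHARD"
# Bricks keep a second hard link at .glusterfs/<aa>/<bb>/<gfid>;
# this path is illustrative, not derived from a real xattr.
ln "$SHARD" "$BRICK/.glusterfs/b4/2d/b42dc8f9-755e-46be-8418-4882a9f765e1"
stat -c %h "$SHARD"   # prints 2: both links must go when cleaning up
rm -r "$BRICK"
```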
> So the two shards I removed (replica 2 + arbiter) look like so:
> 
> ronny@cogline:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
> total 0
> drwxr-xr-x 2 root root 104 Nov  2 00:13 .
> drwxr-xr-x 4 root root  38 Nov  2 00:05 ..
> ---------T 1 root root  0 Aug 14 04:26 b42dc8f9-755e-46be-8418-4882a9f765e1.5418
> ---------T 1 root root  0 Oct 25 10:49 b42dc8f9-755e-46be-8418-4882a9f765e1.5613
> 
> ronny@keratrix:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
> total 0
> drwxr-xr-x 2 root root 104 Nov  2 00:13 .
> drwxr-xr-x 4 root root  38 Nov  2 00:07 ..
> ---------T 1 root root  0 Aug 14 04:26 b42dc8f9-755e-46be-8418-4882a9f765e1.5418
> ---------T 1 root root  0 Oct 25 10:49 b42dc8f9-755e-46be-8418-4882a9f765e1.5613
> 
> ronny@bellizen:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
> total 0
> drwxr-xr-x 2 root root 55 Nov  2 00:07 .
> drwxr-xr-x 4 root root 38 Nov  2 00:07 ..
> ---------T 1 root root  0 Oct 25 10:49 b42dc8f9-755e-46be-8418-4882a9f765e1.5613
> 
> ronny@risca:~$ ls -al /tmp/publichomes-backup-stale-shards/.shard/
> total 0
> drwxr-xr-x 2 root root 55 Nov  2 00:13 .
> drwxr-xr-x 4 root root 38 Nov  2 00:13 ..
> ---------T 1 root root  0 Aug 14 04:26 b42dc8f9-755e-46be-8418-4882a9f765e1.5418
> 
> So the first question is: did I do the right thing to get this resolved?
> 
> The other, and more important, question relates to "Stale file handle" errors we are now seeing on a different filesystem.
> I only have tgtd log entries for this and wondered if anyone could help with taking a log entry and somehow figuring out which shard is the problematic one:
> tgtd[3052]: tgtd: bs_glfs_request(370) io error 0x56404e0dc510 2 2a -1 1310720 428680884224, Stale file handle
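If the trailing large number in that tgtd line is again the byte offset of the failed request (a guess on my part; the format isn't documented) and the volume uses the default 64MiB features.shard-block-size, dividing offset and offset+length by the shard size brackets the shard range to inspect:

```shell
# offset 428680884224, request length 1310720 (from the tgtd line above)
echo $((428680884224 / 67108864))                   # first shard touched -> 6387
echo $(((428680884224 + 1310720 - 1) / 67108864))   # last shard touched -> 6387
```

Under those assumptions the whole request lands in shard 6387 of the base file, so that is the shard whose copies and xattrs would be worth checking across the bricks.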
> Thanks for any help anyone can provide.
> 
> Ronny
> -- 
> Ronny Adsetts
> Technical Director
> Amazing Internet Ltd, London
> t: +44 20 8977 8943
> w: www.amazinginternet.com
> 
> Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
> Registered in England. Company No. 4042957
> 
> ________
> 
> 
> 
> Community Meeting Calendar:
> 
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
-- 

Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957




