List: gluster-users
Subject: Re: [Gluster-users] ganesha.nfsd process dies when copying files
From: Karli Sjöberg <karli@inparadise.se>
Date: 2018-08-16 17:29:50
Message-ID: 364a6c4f-2082-4ed8-bfbe-8befe284f98c@email.android.com
On 15 Aug 2018 13:14, Karli Sjöberg <karli@inparadise.se> wrote:

> On Wed, 2018-08-15 at 13:42 +0800, Pui Edylie wrote:
> > Hi Karli,
> >
> > I think Alex is right with regard to the NFS version and state.
> >
> > I am only using NFSv3 and the failover is working as expected.
>
> OK, so I've redone the test and it goes like this:
>
> 1) Start the copy loop [*]
> 2) Power off hv02
> 3) The copy loop stalls indefinitely
>
> I have attached a snippet of the ctdb log that looks interesting but
> doesn't say much to me except that something's wrong :)
>
> [*]: while true; do mount -o vers=3 hv03v.localdomain:/data /mnt/; dd
> if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress; rm -fv
> /mnt/test.bin; umount /mnt; done
>
> Thanks in advance!
>
> /K

Could someone just confirm whether this is the correct result for this
scenario?

Aren't you supposed to be able to reboot a host in the cluster without
compromising it?

/K

> > In my use case, I have 3 nodes with ESXi 6.7 as the OS and set up one
> > Gluster VM on each ESXi host using its local datastore.
> >
> > Once I have formed the replica 3 volume, I use the CTDB VIP to present
> > NFSv3 back to vCenter and use it as shared storage.
> >
> > Everything works great, other than that performance is not very
> > good ... I am still looking for ways to improve it.
> >
> > Cheers,
> > Edy
> >
> > On 8/15/2018 12:25 AM, Alex Chekholko wrote:
> > > Hi Karli,
> > >
> > > I'm not 100% sure this is related, but when I set up my ZFS NFS HA
> > > per https://github.com/ewwhite/zfs-ha/wiki I was not able to get
> > > failover to work with NFS v4, only with NFS v3.
> > >
> > > From the client's point of view, it really looked like with NFS v4
> > > there is an open file handle that just goes stale and hangs, or
> > > something like that, whereas with NFSv3 the client retries,
> > > recovers and continues. I did not investigate further; I just use
> > > v3. I think it has something to do with NFSv4 being "stateful" and
> > > NFSv3 being "stateless".
> > >
> > > Can you re-run your test using NFSv3 on the client mount? Or do
> > > you need to use v4.x?
> > >
> > > Regards,
> > > Alex
> > >
> > > On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg wrote:
> > > > On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> > > > > On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > > > > > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > > > > > Hi Karli,
> > > > > > >
> > > > > > > Storhaug works with glusterfs 4.1.2 and the latest
> > > > > > > nfs-ganesha.
> > > > > > >
> > > > > > > I just installed them last weekend ... they are working
> > > > > > > very well :)
> > > > > >
> > > > > > Okay, awesome!
> > > > > >
> > > > > > Is there any documentation on how to do that?
> > > > >
> > > > > https://github.com/gluster/storhaug/wiki
> > > >
> > > > Thanks Kaleb and Edy!
> > > >
> > > > I have now redone the cluster using the latest and greatest,
> > > > following the above guide, and repeated the same test I was
> > > > doing before (the rsync while loop) with success. I let (forgot)
> > > > it run for about a day and it was still chugging along nicely
> > > > when I aborted it, so success there!
> > > >
> > > > The next test, the catastrophic failure test where one of the
> > > > servers dies, I'm having a more difficult time with.
> > > >
> > > > 1) I start by mounting the share over NFS 4.1 and then proceed
> > > > to write an 8 GiB random data file with 'dd'. When I "hard-cut"
> > > > the power to the server I'm writing to, the transfer just stops
> > > > indefinitely until the server comes back again. Is that supposed
> > > > to happen? Like this:
> > > >
> > > > # dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
> > > > # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> > > > # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> > > > 2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s
> > > >
> > > > (here I cut the power and let it be for almost two hours before
> > > > turning it on again)
> > > >
> > > > dd: error writing '/mnt/test.bin': Remote I/O error
> > > > 2325+0 records in
> > > > 2324+0 records out
> > > > 2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
> > > > # umount /mnt
> > > >
> > > > Here the unmount command hung and I had to hard reset the
> > > > client.
> > > >
> > > > 2) Another question I have is why some files "change" as you
> > > > copy them out to the Gluster storage. Is that the way it should
> > > > be? This time, I deleted everything in the destination directory
> > > > to start over:
> > > >
> > > > # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> > > > # rm -f /mnt/test.bin
> > > > # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> > > > 8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
> > > > 8192+0 records in
> > > > 8192+0 records out
> > > > 8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
> > > > # md5sum /var/tmp/test.bin
> > > > 073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
> > > > # md5sum /mnt/test.bin
> > > > 634187d367f856f3f5fb31846f796397  /mnt/test.bin
> > > > # umount /mnt
> > > >
> > > > Thanks in advance!
> > > >
> > > > /K
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-users@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
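[Editor's note: the two failure modes discussed above, the indefinite stall and the silent checksum mismatch, can be exercised in one pass with a small variation of Karli's copy loop. The sketch below is illustrative only: the hostname, export path, and file names are taken from the thread, but the `soft,timeo,retrans` mount options are an assumption about how to make the client return an error instead of hanging, not something the original posters used.]

```shell
#!/bin/sh
# copy_and_verify SRC DSTDIR: copy SRC into DSTDIR as test.bin and compare
# MD5 checksums, returning non-zero if the copy fails or the data differs.
copy_and_verify() {
    src=$1
    dstdir=$2
    dd if="$src" of="$dstdir/test.bin" bs=1M conv=fsync 2>/dev/null || return 1
    want=$(md5sum "$src" | awk '{print $1}')
    got=$(md5sum "$dstdir/test.bin" | awk '{print $1}')
    [ "$want" = "$got" ]
}

# Example loop against the thread's export (hostname/export as in the
# thread; soft,timeo,retrans are assumed options that bound client
# retries so a dead server yields an I/O error rather than a hang):
#   mount -o vers=3,soft,timeo=100,retrans=3 hv03v.localdomain:/data /mnt
#   while copy_and_verify /var/tmp/test.bin /mnt; do :; done
#   echo "copy failed or checksum mismatch -- check server state"
```

With this, a failover that corrupts the file in flight stops the loop at the first bad checksum instead of going unnoticed until a manual `md5sum`.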
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users