List: freebsd-current
Subject: Re: still: Re: gbde data corruption?
From: Heiko Schaefer <hschaefer () fto ! de>
Date: 2003-04-30 14:14:20
Hello Poul,
> >the broken version of the file contains lots of 0-bytes (instead of high
> >entropy values in the original file). seems by the output of cmp that
> >every damaged value is replaced by 0.
>
> Zero bytes is the absolutely last thing I would expect...
>
> How long are the sequences of zero bytes, and do they start at
> sector boundaries ?
it seems that the (one and only) sequence is exactly 32k long and starts
nicely aligned (aligned to 1024*16, even).
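A rough sketch of how the zero run could be located and checked for alignment, instead of reading it off the cmp output by hand. The 512-byte sector size, the minimum run length and the function names are assumptions for illustration; only the 16k (1024*16) boundary comes from the observation above.

```python
import re

SECTOR = 512        # assumed sector size, not confirmed in this thread
BLOCK = 16 * 1024   # the 1024*16 boundary mentioned above


def zero_runs(data, min_len=SECTOR):
    """Return (offset, length) for every run of at least min_len zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def is_aligned(offset, boundary):
    """True if offset falls exactly on a multiple of boundary."""
    return offset % boundary == 0


def report(path):
    """Print each long zero run in the file together with its alignment."""
    with open(path, "rb") as f:
        data = f.read()  # the files here are 1mb to 10mb, so this fits in ram
    for offset, length in zero_runs(data):
        print(offset, length,
              "sector-aligned" if is_aligned(offset, SECTOR) else "unaligned",
              "16k-aligned" if is_aligned(offset, BLOCK) else "")
```

Calling report() on the damaged copy prints one line per zero run. Short runs are ignored on purpose, because high-entropy data contains isolated zero bytes by chance.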
> Do you also see this on the client ? (Ie: could it be that data is
> still cached on the client and not flushed ?)
i see the broken variant of the file both locally and via my nfs client.
which is to be expected - i'm moving rather large amounts of data...
the thing that i am doing (over and over again) is completely filling one
30gb and one 60gb filesystem.
> What is the approximate error-rate ? 1 file in 10 ? 1 file in 100 ?
> How long are the files ?
this last error i observe is one file on a 30gb filesystem that is filled
fully with files that are between 1mb and 10mb or so (most of them, at
least). so i'm talking about 1 in 10000, in this case.
> >another thing i just notice: /var/log/messages contains lots of
> >
> >[...]
> >Apr 30 15:24:55 zoidberg kernel: ENOMEM 0xc4c62100 on 0xc45c6c80(ad2s1e.bde)
> >Apr 30 15:25:19 zoidberg kernel: ENOMEM 0xc3fa5000 on 0xc45c6c80(ad2s1e.bde)
> >Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4b46100 on 0xc45c6c80(ad2s1e.bde)
> >Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4364500 on 0xc45c6c80(ad2s1e.bde)
> >[...]
>
> This means that the kernel ran out of ram and the operation was retried,
> it should not result in data corruption but it may reorder bio requests
> significantly. I must admit that I have not bashed NFS to see that it
> copes.
that sounds moderately suspicious to me. i could try to physically move
another disc with lots of unencrypted data into the fileserver and try
copying onto gbde without nfs - but only later today, when i get home.
> >if you have no other things i could report or try, i might just throw away
> >the gbde volumes and try the same copying with non-gbde partitions, just
> >to be sure.
>
> That would be a good first step, but we need to do it controlled to make
> sure we know what we prove, so please try it this way:
>
> add
> option MALLOC_MAKE_FAILURES
> to your kernel.
>
> Build filesystem without GBDE, run test, check for corruption.
well, i think i'll just try copying (over nfs) onto unencrypted
filesystems without any further changes first. one of these copy- and
checksum cycles takes quite a few hours ... if that test results in
errors, then i will instantly throw myself into the dust before you and
apologize :) if not, i'll try to stress my box some more (including malloc
failures if nothing else helps/hurts).
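A minimal sketch of what one verification pass of such a copy-and-checksum cycle could look like, assuming the source tree and the copy are both mounted locally. SHA-256 and the helper names are choices made for this illustration, not the tooling actually used in these tests.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src_root, dst_root):
    """Return sorted relative paths whose copy is missing or differs."""
    bad = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or file_digest(src) != file_digest(dst):
                bad.append(rel)
    return sorted(bad)
```

Running compare_trees() against both the gbde volume and an unencrypted one, with the same source data, gives directly comparable lists of damaged files.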
thanks, regards,
Heiko