List: linux-btrfs
Subject: Re: kernel panic after upgrading to Linux 5.5
From: Tomasz Chmielewski <mangoo () wpkg ! org>
Date: 2020-03-16 12:14:14
Message-ID: 910611ad09d3efb53b13b77bf3c4d99c () wpkg ! org
On 2020-03-16 19:26, Qu Wenruo wrote:
> On 2020/3/16 1:19 PM, Tomasz Chmielewski wrote:
>> On 2020-03-16 14:06, Qu Wenruo wrote:
>>>> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>>>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5), the
>>>> system panics shortly after mounting and starting to use a btrfs
>>>> filesystem. Here is a dmesg - please advise how to deal with it.
>>>> Because of the panic=10 parameter, it has since crashed several times
>>>> (system boots, runs for a while, crashes, boots again, and so on).
>>>>
>>>> Mount options:
>>>>
>>>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
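[For reference, the mount options above correspond to a mount invocation or fstab entry like the following sketch; the device name /dev/sda2 is taken from the dmesg below, but the mount point /data is an assumption, not from the original report:]

```shell
# One-off mount with the reported options (mount point assumed):
mount -o noatime,ssd,space_cache=v2,user_subvol_rm_allowed /dev/sda2 /data

# Equivalent /etc/fstab entry:
# /dev/sda2  /data  btrfs  noatime,ssd,space_cache=v2,user_subvol_rm_allowed  0 0
```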
>>>>
>>>>
>>>>
>>>> [ 65.777428] BTRFS info (device sda2): enabling ssd optimizations
>>>> [ 65.777435] BTRFS info (device sda2): using free space tree
>>>> [ 65.777436] BTRFS info (device sda2): has skinny extents
>>>> [ 98.225099] BTRFS error (device sda2): parent transid verify failed
>>>> on 19718118866944 wanted 664218442 found 674530371
>>>> [ 98.225594] BTRFS error (device sda2): parent transid verify failed
>>>> on 19718118866944 wanted 664218442 found 674530371
>>>
>>> This is the root cause, not quota.
>>>
>>> The metadata is already corrupted, and quota is the first to complain
>>> about it.
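[One way to gauge how far such metadata corruption extends, without risking further writes, is a read-only check; a minimal sketch, assuming the filesystem is unmounted and sits on /dev/sda2 as in the dmesg above:]

```shell
# Read-only metadata check; makes no changes to the filesystem.
# Run only while the filesystem is unmounted.
btrfs check --readonly /dev/sda2
```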
>>
>> Still, should it crash the server, putting it into a cycle of
>> crash-boot-crash-boot, possibly breaking the filesystem even more?
>
> The transid mismatch is the original cause, and I'm not sure
> how it happened.
>
> Do you have a history of the kernels used on that server?
>
> One potential corruption source is kernels v5.2.0~v5.2.14, which
> could cause some tree blocks not to be written to disk.
Yes, it has run a lot of kernels, starting with 4.18 or perhaps even
earlier.
>> Also, how do I fix that corruption?
>>
>> This server had a drive added, a full balance (to RAID-10 for data and
>> metadata) and scrub a few weeks ago, with no errors. Running scrub now
>> to see if it shows up anything.
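[A scrub like the one mentioned above can be started and monitored with btrfs-progs; a sketch, assuming a hypothetical mount point /data:]

```shell
# Start a scrub in the background on the mounted filesystem:
btrfs scrub start /data

# Check progress and any checksum/read errors found so far:
btrfs scrub status /data
```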
>
> Then at least at that time it wasn't corrupted.
>
> Was there any sudden power loss in recent days?
> Another potential cause is out-of-spec FLUSH/FUA behavior, which means
> the hard disk controller does not report FLUSH/FUA completion correctly.
>
> That means if you use the same disk/controller and manually cause
> power loss, it would fail after just a few cycles.
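[One crude way to probe the write-cache side of this is to check whether the drive's volatile write cache is enabled; a sketch assuming /dev/sda (a real FLUSH/FUA conformance test needs repeated physical power cuts, as described above):]

```shell
# Query the drive's volatile write-cache state:
hdparm -W /dev/sda

# Disabling the write cache trades performance for safety on suspect
# hardware (on many drives this lasts only until the next power cycle):
# hdparm -W0 /dev/sda
```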
Power loss - possibly there was.
Tomasz