
List:       linux-btrfs
Subject:    Re: kernel panic after upgrading to Linux 5.5
From:       Tomasz Chmielewski <mangoo@wpkg.org>
Date:       2020-03-16 12:14:14
Message-ID: 910611ad09d3efb53b13b77bf3c4d99c@wpkg.org

On 2020-03-16 19:26, Qu Wenruo wrote:
> On 2020/3/16 1:19 PM, Tomasz Chmielewski wrote:
>> On 2020-03-16 14:06, Qu Wenruo wrote:
>>> On 2020/3/16 11:13 AM, Tomasz Chmielewski wrote:
>>>> After upgrading to Linux 5.5 (tried 5.5.6, 5.5.9, also 5.6.0-rc5),
>>>> the system panics shortly after mounting and starting to use a btrfs
>>>> filesystem. Here is the dmesg output - please advise how to deal
>>>> with it. It has since crashed several times because of the panic=10
>>>> parameter (the system boots, runs for a while, crashes, boots again,
>>>> and so on).
>>>> 
>>>> Mount options:
>>>> 
>>>> noatime,ssd,space_cache=v2,user_subvol_rm_allowed
>>>> 
>>>> 
>>>> 
>>>> [   65.777428] BTRFS info (device sda2): enabling ssd optimizations
>>>> [   65.777435] BTRFS info (device sda2): using free space tree
>>>> [   65.777436] BTRFS info (device sda2): has skinny extents
>>>> [   98.225099] BTRFS error (device sda2): parent transid verify failed on 19718118866944 wanted 664218442 found 674530371
>>>> [   98.225594] BTRFS error (device sda2): parent transid verify failed on 19718118866944 wanted 664218442 found 674530371
>>> 
>>> This is the root cause, not quota.
>>> 
>>> The metadata is already corrupted, and quota is the first to complain
>>> about it.
>> 
>> Still, should it crash the server, putting it into a cycle of
>> crash-boot-crash-boot, possibly breaking the filesystem even more?
> 
> The transid mismatch is the root cause here, and I'm not sure
> how it happened.
> 
> Do you have any history of the kernels used on that server?
> 
> One potential corruption source is kernels v5.2.0~v5.2.14, which
> could leave some tree blocks unwritten to disk.

Yes, it has run a lot of kernels over time, starting with 4.18 or perhaps
even earlier.
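
If it helps, the kernel history can be pulled from the boot logs, roughly
like this (just a sketch - the log paths assume a Debian/Ubuntu-style
syslog setup, and zgrep also covers the rotated, compressed logs):

  # list every kernel version that has booted, from the boot banner
  zgrep -h "Linux version" /var/log/kern.log* | sort -u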


>> Also, how do I fix that corruption?
>> 
>> This server had a drive added, a full balance (to RAID-10 for data and
>> metadata) and a scrub a few weeks ago, with no errors. Running a scrub
>> now to see if it turns up anything.
> 
> Then at least at that time, it wasn't corrupted.
> 
> Was there any sudden power loss in recent days?
> Another potential cause is out-of-spec FLUSH/FUA behavior, meaning the
> hard disk controller does not correctly report FLUSH/FUA completion.
> 
> That means if you use the same disk/controller and manually cause a
> power loss, it would fail after just a few cycles.

Power loss - possibly there was.
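
In case the FLUSH/FUA path is the problem, I'll also check whether the
drives' volatile write cache is enabled (a rough sketch, assuming SATA
drives reachable via hdparm; /dev/sda stands in for each member device):

  # query the drive's volatile write-cache setting
  hdparm -W /dev/sda

  # if the controller is suspected of mishandling FLUSH/FUA, disabling
  # the write cache trades performance for crash safety
  hdparm -W0 /dev/sda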


Tomasz