'XFS Kernel 2.6.27.7 oopses'


List:       linux-xfs
Subject:    XFS Kernel 2.6.27.7 oopses
From:       Ralf Liebenow <ralf () theco ! de>
Date:       2009-01-30 22:23:59
Message-ID: 20090130222359.GB32142 () theco ! de

Hello !

I heavily use XFS for an incremental backup server (using rsync's --link-dest option
to create hardlinks to unchanged files), and therefore have about 10 million files
on my terabyte hard disk. To remove old versions, a nightly "rm -rf" deletes about
a million hardlinks/files.
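The rotation described above can be sketched roughly as follows. This is a hypothetical, flat-directory illustration in Python, not the actual backup script: `make_snapshot` and `remove_snapshot` are made-up helpers, and "unchanged" is decided here by comparing file contents, whereas rsync applies its own checks. It shows why expiring an old snapshot is almost pure metadata work: each removed hardlink is one unlink that decrements the inode's link count, and file data is freed only when that count reaches zero.

```python
import os
import tempfile


def make_snapshot(src_files, prev_snap, new_snap):
    """Hypothetical helper imitating rsync --link-dest for a flat directory.

    src_files maps file names to their current contents (bytes).
    A file identical to its copy in prev_snap is hardlinked into new_snap;
    anything new or changed is written as a fresh file (a new inode).
    """
    os.makedirs(new_snap)
    for name, data in src_files.items():
        new_path = os.path.join(new_snap, name)
        old_path = os.path.join(prev_snap, name) if prev_snap else None
        unchanged = False
        if old_path and os.path.exists(old_path):
            with open(old_path, "rb") as f:
                unchanged = f.read() == data
        if unchanged:
            os.link(old_path, new_path)  # extra directory entry, same inode
        else:
            with open(new_path, "wb") as f:
                f.write(data)


def remove_snapshot(snap):
    """Flat-directory stand-in for 'rm -rf snap': one unlink per entry."""
    removed = 0
    for name in os.listdir(snap):
        os.unlink(os.path.join(snap, name))  # drops the inode's link count by 1
        removed += 1
    os.rmdir(snap)
    return removed


def demo():
    """Take two nightly snapshots, then expire the older one."""
    with tempfile.TemporaryDirectory() as root:
        day1 = os.path.join(root, "day1")
        day2 = os.path.join(root, "day2")
        make_snapshot({"a.txt": b"alpha", "b.txt": b"beta"}, None, day1)
        make_snapshot({"a.txt": b"alpha", "b.txt": b"beta v2"}, day1, day2)
        shared = os.path.join(day2, "a.txt")
        links_before = os.stat(shared).st_nlink  # day1 and day2 share a.txt
        removed = remove_snapshot(day1)
        links_after = os.stat(shared).st_nlink   # the data survives in day2
        return links_before, removed, links_after


if __name__ == "__main__":
    print(demo())  # (2, 2, 1)
```

With millions of files per snapshot, the nightly expiry is therefore millions of unlink operations on directory entries and inodes, with comparatively little data actually freed.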

After a while I started getting regular oopses, so I updated the system to make sure
it is on a current version.

It is now running SuSE 11.1 64-bit with SuSE's kernel 2.6.27.7-9-default.

The server is a 64-bit quad-core Intel machine with 8 GB RAM running 64-bit Linux.
(I have VMware Server 2 installed, so its modules show up in the kmsg output,
but the oops also happens without them.)

Now the "rm -rf" job sometimes oopses the kernel and gets stuck (there is no
other measurable I/O traffic on that system). /proc/kmsg shows:

cat /proc/kmsg 
<0>general protection fault: 0000 [1] SMP 
<0>last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
<4>CPU 3 
<4>Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device binfmt_misc vmnet(N) vsock(N) vmci(N) vmmon(N) nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs microcode fuse loop dm_mod snd_hda_intel st r8169 snd_pcm snd_timer osst snd_page_alloc ppdev iTCO_wdt mii shpchp button rtc_cmos snd_hwdep pci_hotplug parport_pc rtc_core sky2 ohci1394 intel_agp rtc_lib snd i2c_i801 iTCO_vendor_support ieee1394 parport pcspkr i2c_core sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata dock aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon
<4>Supported: No
<4>Pid: 5176, comm: xfssyncd Tainted: G          2.6.27.7-9-default #1
<4>RIP: 0010:[<ffffffff80230865>]  [<ffffffff80230865>] __wake_up_common+0x29/0x76
<4>RSP: 0018:ffff880114df9d30  EFLAGS: 00010086
<4>RAX: 7fff8800255b8a70 RBX: ffff8800255b8a60 RCX: 0000000000000000
<4>RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8800255b8a68
<4>RBP: ffff880114df9d60 R08: 7fff8800255b8a58 R09: 0000000000000282
<4>R10: 0000000000000002 R11: ffff8800255b87c0 R12: 0000000000000001
<4>R13: 0000000000000282 R14: ffff8800255b8a70 R15: 0000000000000000
<4>FS:  0000000000000000(0000) GS:ffff88012fba0ec0(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f28d42a2000 CR3: 0000000124e34000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process xfssyncd (pid: 5176, threadinfo ffff880114df8000, task ffff88012bc1e0c0)
<4>Stack:  0000000300000000 ffff8800255b8a60 ffff8800255b8a68 0000000000000282
<4> ffff88012d802000 0000000000000001 ffff880114df9d90 ffffffff8023219a
<4> 0000000000000286 0000000000000000 ffff88006ef1d240 ffff88012aca3800
<4>Call Trace:
<4> [<ffffffff8023219a>] complete+0x38/0x4b
<4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
<4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
<4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
<4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
<4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
<4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
<4> [<ffffffff8025434d>] kthread+0x47/0x73
<4> [<ffffffff8020d7b9>] child_rip+0xa/0x11
<4>
<4>
<0>Code: c9 c3 55 48 89 e5 41 57 4d 89 c7 41 56 4c 8d 77 08 41 55 41 54 41 89 d4 53 48 83 ec 08 89 75 d4 89 4d d0 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 28 4c 89 f9 8b 55 d0 8b 75
<1>RIP  [<ffffffff80230865>] __wake_up_common+0x29/0x76
<4> RSP <ffff880114df9d30>
<4>---[ end trace a069bd11f2b4e6ab ]---
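One detail of the oops can be checked mechanically. It reports a general protection fault rather than a page fault; on x86-64, a memory access through a non-canonical address (one whose upper bits are not a sign extension of bit 47, assuming 48-bit virtual addresses) raises #GP. The bytes at the `<49>` marker, `49 8b 40 18`, decode to `mov 0x18(%r8),%rax`, i.e. a load through R08. The Python sketch below, an illustrative reading aid rather than a diagnosis, tests which of the logged register values are canonical; `is_canonical` and `report` are hypothetical helpers, and `va_bits=48` is an assumption about this machine.

```python
# Register values copied verbatim from the oops above.
REGS = {
    "RAX": 0x7FFF8800255B8A70,
    "RBX": 0xFFFF8800255B8A60,
    "RDI": 0xFFFF8800255B8A68,
    "R08": 0x7FFF8800255B8A58,
    "R14": 0xFFFF8800255B8A70,
}


def is_canonical(addr, va_bits=48):
    """x86-64 rule: bits 63..(va_bits - 1) must all be copies of one bit."""
    high = addr >> (va_bits - 1)            # the top 64 - va_bits + 1 bits
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return high in (0, all_ones)


def report(regs):
    """Map each register name to whether its value is a canonical address."""
    return {name: is_canonical(value) for name, value in regs.items()}


if __name__ == "__main__":
    for name, ok in report(REGS).items():
        print(name, "canonical" if ok else "NON-canonical")
    # RAX and R14 differ only in bit 63.
    print(hex(REGS["RAX"] ^ REGS["R14"]))  # 0x8000000000000000
```

RAX and R08 come out non-canonical while RBX, RDI and R14 are ordinary kernel addresses, which is consistent with the fault being a #GP on the load through R08.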

It _always_ gets stuck at the same place, in complete() called from xfssyncd, so I
don't think it's hardware-related.

I also ran xfs_repair after every oops and reboot, so the filesystem itself
should be consistent.

I initially used the default settings for mkfs.xfs and mount. I now use different
settings but get the same oops again, so the settings seem to be unrelated.

What do you recommend? Has this bug already been addressed by one of the
hundreds of fixes I've seen on the mailing list? Should I try a stock 2.6.28
kernel?

   Thanks in advance !

      Ralf
-- 
theCode AG 
HRB 78053, Amtsgericht Charlottenbg
USt-IdNr.: DE204114808
Vorstand: Ralf Liebenow, Michael Oesterreich, Peter Witzel
Aufsichtsratsvorsitzender: Wolf von Jaduczynski
Oranienstr. 10-11, 10997 Berlin
fon +49 30 617 897-0  fax -10
ralf@theCo.de http://www.theCo.de

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
