List:       openbsd-bugs
Subject:    Re: access softraid(4) raid-5, retrying read on block, kernel panic
From:       Marcus MERIGHI <mcmer-openbsd () tor ! at>
Date:       2016-02-03 13:08:00
Message-ID: 20160203130800.GD5576 () tor ! at

The softraid(4) volume sd5a was not mounted at the time this happened.
This time it says 'could not write metadata to sd1a'.
That's the same softraid(4) chunk that had to be rebuilt before.

sd5: retrying read on block 384
softraid0: I/O error 5 on dev 0x410 at block 16
softraid0: could not write metadata to sd1a
panic: softraid0: sd5: invalid volume state transition 1 -> 1
Starting stack trace...
panic() at panic+0x10b
sr_raid5_set_vol_state() at sr_raid5_set_vol_state+0xe7
sr_raid5_set_chunk_state() at sr_raid5_set_chunk_state+0xc7
sr_ccb_done() at sr_ccb_done+0x76
sr_raid5_intr() at sr_raid5_intr+0x3c
sd_buf_done() at sd_buf_done+0x7b
scsi_done() at scsi_done+0x1e
usb_transfer_complete() at usb_transfer_complete+0x26c
ehci_softintr() at ehci_softintr+0x3f
softintr_dispatch() at softintr_dispatch+0x8b
Xsoftnet() at Xsoftnet+0x1f
--- interrupt ---
end of kernel
end trace frame: 0x1388, count: 246
0x8: 
End of stack trace.
syncing disks... 15 6 2 1 1 1 [...] giving up

Bye, Marcus

mcmer-openbsd@tor.at (Marcus MERIGHI), 2016.02.02 (Tue) 13:38 (CET):
> The rebuild has finished:
> 
> Volume      Status               Size Device  
> softraid0 0 Online     12002360033280 sd5     RAID5 
> 0 Online      4000786726912 0:0.0   noencl <sd1a>
> 1 Online      4000786726912 0:1.0   noencl <sd2a>
> 2 Online      4000786726912 0:2.0   noencl <sd3a>
> 3 Online      4000786726912 0:3.0   noencl <sd4a>
> 
> I'm hoping for advice on how to proceed; I need to get as much
> data off this array as possible before things get worse. Or are
> things not that bad at all?
> 
> What bothers me is the 'retrying read on block 52992' message.
> IIRC I've seen this on dying HDDs but not on a softraid disk.
> 
> Does this mean the underlying softraid is broken?
> 
> Thanks in advance, Marcus
> 
> mcmer-openbsd@tor.at (Marcus MERIGHI), 2016.01.28 (Thu) 10:58 (CET):
> > Softraid RAID-5 on four 4TB HDDs. 
> > The four disks are in an external enclosure (JBOD), connected via USB. 
> > The array worked for about a month.
> > Tested before loading (bioctl -O, bioctl -R). 
> > Copied roughly 6TB onto it while rebuilding was ongoing. 
> > Took about 3 weeks (via network). 
> > When the data was there I started another copy from the array to yet another
> > external disk. 
> > While this was running the first kernel panic occurred:
> > 
> > sd5: retrying read on block 52992
> > panic: softraid0: sd5: invalid volume state transition 1 -> 1
> > Starting stack trace...
> > panic() at panic+0x10b
> > sr_raid5_set_vol_state() at sr_raid5_set_vol_state+0xe7
> > sr_raid5_set_chunk_state() at sr_raid5_set_chunk_state+0xc7
> > sr_ccb_done() at sr_ccb_done+0x76
> > sr_raid5_intr() at sr_raid5_intr+0x3c
> > sd_buf_done() at sd_buf_done+0x7b
> > scsi_done() at scsi_done+0x1e
> > usb_transfer_complete() at usb_transfer_complete+0x26c
> > ehci_softintr() at ehci_softintr+0x3f
> > softintr_dispatch() at softintr_dispatch+0x8b
> > Xsoftnet() at Xsoftnet+0x1f
> > --- interrupt ---
> > end of kernel
> > end trace frame: 0x1388, count: 246
> > 0x8:
> > End of stack trace.
> > syncing disks... 80 48 10 1 1 1 1 1 1 [...] giving up
> > 
> > After that I retried (reboot, fsck, mount), and accessing the mount
> > point caused another kernel panic. Silly me did not take another
> > picture.
> > Only after power-cycling the external enclosure and rebooting the
> > machine was the array automagically assembled again, with one disk
> > degraded. It is now rebuilding; 13% took about 14 hours.
> > 
> > Thanks for reading, Marcus
> > 
> > # bioctl softraid0
> > Volume      Status               Size Device  
> > softraid0 0 Rebuild    12002360033280 sd5     RAID5 13% done 
> > 0 Rebuild     4000786726912 0:0.0   noencl <sd1a>
> > 1 Online      4000786726912 0:1.0   noencl <sd2a>
> > 2 Online      4000786726912 0:2.0   noencl <sd3a>
> > 3 Online      4000786726912 0:3.0   noencl <sd4a>
> > softraid0 1 Online        53691555840 sd6     CRYPTO
> > 0 Online        53691555840 1:0.0   noencl <sd0k>
> > softraid0 2 Online        53949354496 sd7     CRYPTO
> > 0 Online        53949354496 2:0.0   noencl <sd0m>
> > 
> > OpenBSD 5.8 (GENERIC.MP) #1236: Sun Aug 16 02:31:04 MDT 2015
> > deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 4276822016 (4078MB)
> > avail mem = 4143312896 (3951MB)
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcff9c000 (46 entries)
> > bios0: vendor Dell Inc. version "1.4.3" date 06/05/2009
> > bios0: Dell Inc. PowerEdge R200
> > acpi0 at bios0: rev 2
> > acpi0: sleep states S0 S4 S5
> > acpi0: tables DSDT FACP APIC SPCR HPET MCFG WDAT SLIC ERST HEST BERT EINJ SSDT \
> >                 SSDT SSDT
> > acpi0: wakeup devices PCI0(S5)
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz, 1600.27 MHz
> > cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR
> > cpu0: 3MB 64b/line 8-way L2 cache
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 266MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE
> > cpu1 at mainbus0: apid 1 (application processor)
> > cpu1: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz, 1600.06 MHz
> > cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR
> > cpu1: 3MB 64b/line 8-way L2 cache
> > cpu1: smt 0, core 1, package 0
> > ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
> > ioapic0: misconfigured as apic 0, remapped to apid 2
> > ioapic1 at mainbus0: apid 3 pa 0xfec10000, version 20, 24 pins
> > ioapic1: misconfigured as apic 0, remapped to apid 3
> > acpihpet0 at acpi0: 14318179 Hz
> > acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
> > acpiprt0 at acpi0: bus 0 (PCI0)
> > acpiprt1 at acpi0: bus 1 (PEX1)
> > acpiprt2 at acpi0: bus 2 (SBE0)
> > acpiprt3 at acpi0: bus 3 (PXHA)
> > acpiprt4 at acpi0: bus 4 (SBE4)
> > acpiprt5 at acpi0: bus 5 (SBE5)
> > acpiprt6 at acpi0: bus 6 (COMP)
> > acpicpu0 at acpi0: C1(@1 halt!), PSS
> > acpicpu1 at acpi0: C1(@1 halt!), PSS
> > ipmi at mainbus0 not configured
> > cpu0: Enhanced SpeedStep 1600 MHz: speeds: 2667, 2400, 2133, 1867, 1600 MHz
> > pci0 at mainbus0 bus 0
> > pchb0 at pci0 dev 0 function 0 "Intel 3200/3210 Host" rev 0x01
> > ppb0 at pci0 dev 1 function 0 "Intel 3200/3210 PCIE" rev 0x01: msi
> > pci1 at ppb0 bus 1
> > mpi0 at pci1 dev 0 function 0 "Symbios Logic SAS1068E" rev 0x08: msi
> > mpi0: SAS6IR, firmware 0.25.47.0
> > scsibus1 at mpi0: 112 targets
> > sd0 at scsibus1 targ 0 lun 0: <Dell, VIRTUAL DISK, 1028> SCSI3 0/direct fixed \
> >                 naa.600508e0000000006e54a90066b37109
> > sd0: 152064MB, 512 bytes/sector, 311427072 sectors
> > ppb1 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02
> > pci2 at ppb1 bus 2
> > ppb2 at pci2 dev 0 function 0 "Intel 6702PXH PCIE-PCIX" rev 0x09
> > pci3 at ppb2 bus 3
> > ppb3 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02
> > pci4 at ppb3 bus 4
> > bge0 at pci4 dev 0 function 0 "Broadcom BCM5721" rev 0x21, BCM5750 C1 (0x4201): msi, address 00:25:64:3b:e9:34
> > brgphy0 at bge0 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0
> > ppb4 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02
> > pci5 at ppb4 bus 5
> > bge1 at pci5 dev 0 function 0 "Broadcom BCM5721" rev 0x21, BCM5750 C1 (0x4201): msi, address 00:25:64:3b:e9:35
> > brgphy1 at bge1 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0
> > uhci0 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x02: apic 2 int 21
> > uhci1 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x02: apic 2 int 20
> > uhci2 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x02: apic 2 int 21
> > ehci0 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x02: apic 2 int 21
> > usb0 at ehci0: USB revision 2.0
> > uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> > ppb5 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92
> > pci6 at ppb5 bus 6
> > radeondrm0 at pci6 dev 5 function 0 "ATI ES1000" rev 0x02
> > drm0 at radeondrm0
> > radeondrm0: apic 2 int 19
> > pcib0 at pci0 dev 31 function 0 "Intel 82801IR LPC" rev 0x02
> > pciide0 at pci0 dev 31 function 2 "Intel 82801I SATA" rev 0x02: DMA, channel 0 \
> >                 configured to native-PCI, channel 1 configured to native-PCI
> > pciide0: using apic 2 int 23 for native-PCI interrupt
> > atapiscsi0 at pciide0 channel 0 drive 1
> > scsibus2 at atapiscsi0: 2 targets
> > cd0 at scsibus2 targ 0 lun 0: <TSSTcorp, CDRWDVD TS-L463A, D550> ATAPI 5/cdrom removable
> > cd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 5
> > usb1 at uhci0: USB revision 1.0
> > uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> > usb2 at uhci1: USB revision 1.0
> > uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> > usb3 at uhci2: USB revision 1.0
> > uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> > isa0 at pcib0
> > isadma0 at isa0
> > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> > pckbd0 at pckbc0 (kbd slot)
> > wskbd0 at pckbd0: console keyboard
> > pcppi0 at isa0 port 0x61
> > spkr0 at pcppi0
> > umass0 at uhub0 port 4 configuration 1 interface 0 "JMicron USB to ATA/ATAPI \
> >                 Bridge" rev 2.10/1.00 addr 2
> > umass0: using SCSI over Bulk-Only
> > scsibus3 at umass0: 2 targets, initiator 0
> > sd1 at scsibus3 targ 1 lun 0: <WDC WD40, EFRX-68WT0N0, > SCSI3 0/direct fixed
> > sd1: 3815447MB, 512 bytes/sector, 7814037168 sectors
> > sd2 at scsibus3 targ 1 lun 1: <WDC WD40, EFRX-68WT0N0, > SCSI3 0/direct fixed
> > sd2: 3815447MB, 512 bytes/sector, 7814037168 sectors
> > sd3 at scsibus3 targ 1 lun 2: <WDC WD40, EFRX-68WT0N0, > SCSI3 0/direct fixed
> > sd3: 3815447MB, 512 bytes/sector, 7814037168 sectors
> > sd4 at scsibus3 targ 1 lun 3: <WDC WD40, EFRX-68WT0N0, > SCSI3 0/direct fixed
> > sd4: 3815447MB, 512 bytes/sector, 7814037168 sectors
> > uhub4 at uhub0 port 5 "Cypress Semiconductor USB2 Hub" rev 2.00/0.0b addr 3
> > uhidev0 at uhub4 port 1 configuration 1 interface 0 "CPS OR600ELCDRM1U" rev \
> >                 1.10/2.00 addr 4
> > uhidev0: iclass 3/0, 44 report ids
> > upd0 at uhidev0
> > uhub5 at uhub4 port 2 "Terminus Technology USB 2.0 Hub" rev 2.00/1.11 addr 5
> > umass1 at uhub5 port 1 configuration 1 interface 0 "Intenso USB 3.0 Device" rev \
> >                 2.10/1.00 addr 6
> > umass1: using SCSI over Bulk-Only
> > scsibus4 at umass1: 2 targets, initiator 0
> > sd5 at scsibus4 targ 1 lun 0: <Intenso, USB 3.0 Device, 0> SCSI4 0/direct fixed \
> >                 serial.174c55aa000000001F1F
> > sd5: 2861588MB, 4096 bytes/sector, 732566646 sectors
> > umass2 at uhub5 port 2 configuration 1 interface 0 "Intenso USB 3.0 Device" rev \
> >                 2.10/1.00 addr 7
> > umass2: using SCSI over Bulk-Only
> > scsibus5 at umass2: 2 targets, initiator 0
> > sd6 at scsibus5 targ 1 lun 0: <Intenso, USB 3.0 Device, 0> SCSI4 0/direct fixed \
> >                 serial.174c55aa0000000022DF
> > sd6: 2861588MB, 4096 bytes/sector, 732566646 sectors
> > umass3 at uhub5 port 3 configuration 1 interface 0 "Intenso USB 3.0 Device" rev \
> >                 2.10/1.00 addr 8
> > umass3: using SCSI over Bulk-Only
> > scsibus6 at umass3: 2 targets, initiator 0
> > sd7 at scsibus6 targ 1 lun 0: <Intenso, USB 3.0 Device, 0> SCSI4 0/direct fixed \
> >                 serial.174c55aa000000001EA0
> > sd7: 2861588MB, 4096 bytes/sector, 732566646 sectors
> > umass4 at uhub5 port 4 configuration 1 interface 0 "Intenso USB 3.0 Device" rev \
> >                 2.10/1.00 addr 9
> > umass4: using SCSI over Bulk-Only
> > scsibus7 at umass4: 2 targets, initiator 0
> > sd8 at scsibus7 targ 1 lun 0: <Intenso, USB 3.0 Device, 0> SCSI4 0/direct fixed \
> >                 serial.174c55aa000000004CC1
> > sd8: 2861588MB, 4096 bytes/sector, 732566646 sectors
> > uhidev1 at uhub2 port 1 configuration 1 interface 0 "Tangtop USB CAT5" rev \
> >                 1.10/0.01 addr 2
> > uhidev1: iclass 3/1
> > ukbd0 at uhidev1: 8 variable keys, 6 key codes
> > wskbd1 at ukbd0 mux 1
> > uhidev2 at uhub2 port 1 configuration 1 interface 1 "Tangtop USB CAT5" rev \
> >                 1.10/0.01 addr 2
> > uhidev2: iclass 3/1
> > ums0 at uhidev2: 3 buttons, Z dir
> > wsmouse0 at ums0 mux 0
> > vscsi0 at root
> > scsibus8 at vscsi0: 256 targets
> > softraid0 at root
> > scsibus9 at softraid0: 256 targets
> > softraid0: sd5 was not shutdown properly
> > sd9 at scsibus9 targ 1 lun 0: <OPENBSD, SR RAID 5, 005> SCSI2 0/direct fixed
> > sd9: 11446342MB, 512 bytes/sector, 23442109440 sectors
> > softraid0: volume sd9 is roaming, it used to be sd5, updating metadata
> > root on sd0a (4bc519b678dfdbe1.a) swap on sd0b dump on sd0b
> > WARNING: / was not properly unmounted
> > radeondrm0: 1024x768
> > wsdisplay0 at radeondrm0 mux 1: console (std, vt100 emulation), using wskbd0
> > wskbd1: connecting to wsdisplay0
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)

