[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux1394-user
Subject:    Re: sbp2: sbp2util_node_write_no_wait failed
From:       Thorsten Holtkaemper <openpub () web ! de>
Date:       2005-11-06 16:51:12
Message-ID: 200511061751.12787.openpub () web ! de
[Download RAW message or body]

[just subscribed to the list, so this mail is not a real reply]

Hi Stefan, Michael,

Stefan Richter wrote:
> Michael Brade wrote:
> > On Friday 04 November 2005 23:13, Stefan Richter wrote:
> > > Michael Brade wrote:
> > > > Heh, good news, debugging finished! :-)
> > > Let's say, _nearly_ finished.
> > Ok, fair enough, however, you didn't tell me yet what to do next...
>
> I hope my lack of insight didn't come across as lack of manners. :-)
>
> > > There are 64 tlabels, therefore what you have seen is that sbp2 added
> > > ORBs and rung the doorbell 64 times while the target did not finish
> > > these 64 small transactions to the doorbell register with a response.
> > 
> > Yep, but what I don't quite understand yet is when exactly it happens. It 
> > seems that I can copy one huge file almost without errors, maybe one or
> > two, rarely more. But when I write a lot of small files and try to do some
> > reading inbetween the error happens every 10 seconds or worse. With the
> > odd exception to the rule. 
> 
> I/O of huge files involves very few SCSI commands with big accompanying 
> data transfers, therefore the ratio of "protocol traffic" to "data 
> traffic" is much lower then. This explains why the frequence of the 
> problem and the number of files (rather than file size) correlate.

exactly the same problem here with an ieee1394 harddisk. serialize_io made the 
problem occure less but not go away. Also *not* to use the AS IO scheduler 
helped a little bit (ATM I'm using deadline).

[...]
kernel: ieee1394: sbp2: aborting sbp2 command
kernel: scsi5 : destination target 0, lun 0
kernel:         command: Test Unit Ready: 00 00 00 00 00 00
kernel:  offline device
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 372 times
kernel:  offline device
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 83 times
kernel: scsi5 (0:0): rejecting I/O to offline device offline device
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 341 times
kernel: printk: 3813 messages suppressed.
kernel: Buffer I/O error on device sde1, logical block 8051775
kernel: lost page write due to I/O error on sde1
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 55 times
kernel: printk: 1247 messages suppressed.
kernel: Buffer I/O error on device sde1, logical block 8063430
kernel: lost page write due to I/O error on sde1
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 580 times
kernel: printk: 13162 messages suppressed.
kernel: Buffer I/O error on device sde1, logical block 8076595
kernel: lost page write due to I/O error on sde1
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 697 times
kernel: EXT3-fs error (device sde1): read_block_bitmap: Cannot read block 
bitmap - block_group = 247, block_bitmap = 8093696
kernel: Aborting journal on device sde1.
kernel: scsi5 (0:0): rejecting I/O to offline device
kernel: scsi5 (0:0): rejecting I/O to offline device
kernel: ext3_abort called.
kernel: EXT3-fs error (device sde1): ext3_journal_start_sb: Detected aborted 
journal
kernel: Remounting filesystem read-only
kernel: EXT3-fs error (device sde1) in ext3_prepare_write: IO failure
kernel: __journal_remove_journal_head: freeing b_committed_data
kernel: scsi5 (0:0): rejecting I/O to offline device
last message repeated 43 times
kernel: scsi5 (0:0): rejecting I/O to offline device
kernel: EXT3-fs error (device sde1): ext3_readdir: directory #2 contains a 
hole at offset 0
kernel: __journal_remove_journal_head: freeing b_committed_data
kernel: scsi5 (0:0): rejecting I/O to offline device
kernel: printk: 15883 messages suppressed.
kernel: Buffer I/O error on device sde1, logical block 535
kernel: lost page write due to I/O error on sde1
[...]

In addition it usuallay needs several unload/load attemps of the firewire 
modules (sbp2 eth1394 ohci1394 ieee1394) to make the kernel recognize the 
harddisk at all.

I am running kernel 2.6.13.4

[...]
kernel: ieee1394: Initialized config rom entry `ip1394'
kernel: ohci1394: $Rev: 1299 $ Ben Collins <bcollins@debian.org>
kernel: ACPI: PCI Interrupt 0000:00:0c.0[A] -> GSI 19 (level, low) -> IRQ 209
kernel: ohci1394: fw-host0: Unexpected PCI resource length of 1000!
kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[209]  
MMIO=[dfff7000-dfff77ff]  Max Packet=[2048]
Nov  6 13:34:18 tux kernel: ieee1394: Host added: ID:BUS[0-00:1023]  
GUID[000010dc00592003]
kernel: eth1394: $Rev: 1264 $ Ben Collins <bcollins@debian.org>
kernel: eth1394: eth2: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
[...]

Log for a _not_ successful login attemp:

[...]
ieee1394: Error parsing configrom for node 0-01:1023
ieee1394: The root node is not cycle master capable; selecting a new root node 
and resetting...
ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Node changed: 0-00:1023 -> 0-01:1023
ieee1394: Node added: ID:BUS[0-00:1023]  GUID[0050770e501e1452]
ieee1394: unsolicited response packet received - no tlabel match
sbp2: $Rev: 1306 $ Ben Collins <bcollins@debian.org>
scsi2 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Error logging into SBP-2 device - login timed-out
sbp2: probe of 0050770e501e1452-0 failed with error -16
[...]

Log for a successful login attempt:

[...]
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[000010dc00592003]
ieee1394: Node added: ID:BUS[0-01:1023]  GUID[0050770e501e1452]
ieee1394: unsolicited response packet received - no tlabel match
ieee1394: The root node is not cycle master capable; selecting a new root node 
and resetting...
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: Node changed: 0-00:1023 -> 0-01:1023
ieee1394: unsolicited response packet received - no tlabel match
scsi5 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
  Vendor: WDC WD25  Model: 00JB-00GVA0       Rev:
  Type:   Direct-Access                      ANSI SCSI revision: 06
SCSI device sde: 488397168 512-byte hdwr sectors (250059 MB)
sde: got wrong page
sde: assuming drive cache: write through
SCSI device sde: 488397168 512-byte hdwr sectors (250059 MB)
sde: got wrong page
sde: assuming drive cache: write through
 sde: sde1
Attached scsi disk sde at scsi5, channel 0, id 0, lun 0
[...]


I can reproduce both problems easily. I am happily willing to test new code to 
resolve this issue(s).

Best regards,
Thorsten


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
mailing list Linux1394-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux1394-user
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic