
List:       opensolaris-storage-discuss
Subject:    Re: [storage-discuss] iSCSI Target and ZFS interaction
From:       Rick McNeal <Rick.McNeal@Sun.COM>
Date:       2006-08-01 19:44:17
Message-ID: 42E478F7-E19C-4B58-82FF-BE338C43750B@Sun.COM


On Aug 1, 2006, at 12:19 PM, Marco van Wieringen wrote:
>> While this was going on and I had debug output enabled, I noticed the
>> connection would close and then open, over and over again. Upon
>> further investigation I found a bug in the target's session-handling
>> code: when t10_cmd_create() fails, it would shut down the connection
>> instead of returning the status code that's available. It turns out
>> that Microsoft issues commands to logical units 0 through 255 instead
>> of using the information from REPORT_LUNS. The target only has LU 0,
>> so commands sent to the other LUs fail their calls to
>> t10_cmd_create().
>>
> OK, great. Does it even ask for REPORT_LUNS, or does it just scan all
> LUNs? That explains the number of connections.
>
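
For what it's worth, here's a rough sketch of the shape of the
session-handling fix described in the quote above. None of this is the
actual iscsitgtd source; session_t, send_check_condition(), and the
t10_cmd_create() signature shown here are stand-ins:

	/*
	 * Sketch only -- not the actual iscsitgtd code. The point:
	 * when the LU lookup fails, answer the initiator with
	 * CHECK CONDITION / LOGICAL UNIT NOT SUPPORTED instead of
	 * tearing the connection down.
	 */
	#include <stdint.h>
	#include <stdbool.h>

	#define	STATUS_CHECK_CONDITION	0x02
	#define	SKEY_ILLEGAL_REQUEST	0x05
	#define	ASC_LU_NOT_SUPPORTED	0x25	/* LOGICAL UNIT NOT SUPPORTED */

	typedef struct session session_t;	/* stand-in for the daemon's type */

	/* stand-ins for helpers the daemon would provide */
	extern bool t10_cmd_create(session_t *s, uint8_t lun, void **cmdp);
	extern void send_check_condition(session_t *s, uint8_t key, uint8_t asc);
	extern void conn_shutdown(session_t *s);

	void
	handle_scsi_cmd(session_t *s, uint8_t lun)
	{
		void *cmd;

		if (!t10_cmd_create(s, lun, &cmd)) {
			/*
			 * Old behavior: conn_shutdown(s). With Windows
			 * probing LUs 0-255, every probe of a nonexistent
			 * LU killed the connection and forced a reconnect.
			 */
			send_check_condition(s, SKEY_ILLEGAL_REQUEST,
			    ASC_LU_NOT_SUPPORTED);
			return;
		}
		/* ... dispatch cmd to the LU emulation as usual ... */
	}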

I found what triggered Windows to scan all 256 LUNs for a target.
When the target returned a UNIT_ATTENTION on the first command,
indicating that a power-on had occurred, Windows would ignore LUN 0
forever. With LUN 0 marked as disabled, Windows couldn't issue the
REPORT_LUNS command and decided to scan everything. Now, I had
removed those three lines I spoke of in a previous message, rebuilt
the daemon, killed the running version, and started a new one.
Windows reconnects, but its knowledge of LUN 0 being disabled isn't
flushed! By forcing the Windows initiator to log out via the GUI and
log back in, it becomes willing to revisit LUN 0. At that point the
first command succeeds, Windows issues a REPORT_LUNS command, and it
doesn't attempt to send commands to the other non-existent LUs.
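
To make the protocol details concrete (my sketch, not the daemon's
code): the UNIT_ATTENTION that tripped Windows is a CHECK CONDITION
with sense key 0x06 and ASC/ASCQ 0x29/0x00 ("POWER ON, RESET, OR BUS
DEVICE RESET OCCURRED"), and once LUN 0 answers, the REPORT_LUNS data
for a target with only LU 0 is just an 8-byte header plus a single
8-byte LUN entry (per SPC-3):

	/*
	 * Sketch: build REPORT_LUNS parameter data for a target whose
	 * only logical unit is LUN 0. Layout per SPC-3: a 4-byte
	 * LUN LIST LENGTH (big-endian, counting only the LUN entries),
	 * 4 reserved bytes, then one 8-byte LUN value per logical unit.
	 */
	#include <stddef.h>
	#include <stdint.h>
	#include <string.h>

	size_t
	report_luns_lun0_only(uint8_t *buf, size_t len)
	{
		/* 8-byte header + one 8-byte LUN entry */
		if (len < 16)
			return (0);
		memset(buf, 0, 16);
		buf[3] = 8;	/* LUN LIST LENGTH: one entry, 8 bytes */
		/* bytes 8-15 stay zero: an all-zeros entry is LUN 0 */
		return (16);
	}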

>> So, when this occurs (I haven't found a trigger that reproduces it
>> reliably), two commands are sent to each LU, which causes a
>> connection closure. You'd see a bunch of LWPs being spawned.
> OK, that sounds a lot like what I'm seeing; it just spins up to more
> than 600 LWPs.
>

I still haven't tracked down this issue completely. I did perform a
quick experiment where I caused a Solaris initiator to repeatedly
connect and disconnect. I watched the NLWP count reported by prstat
climb to well over 100. I then caused the daemon to drop core and
looked at the core with mdb. As I expected, there were only 4 LWPs
listed.
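
If anyone wants to cross-check prstat's figure, here's a quick sketch
(mine, not part of the daemon) that reads the daemon's psinfo from
/proc directly. psinfo_t and pr_nlwp are the documented proc(4)
interfaces; everything else is just illustration:

	#include <stdio.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <procfs.h>

	int
	main(int argc, char **argv)
	{
		char path[64];
		psinfo_t ps;
		int fd;

		if (argc != 2) {
			(void) fprintf(stderr, "usage: %s pid\n", argv[0]);
			return (1);
		}
		(void) snprintf(path, sizeof (path), "/proc/%s/psinfo",
		    argv[1]);
		if ((fd = open(path, O_RDONLY)) == -1 ||
		    read(fd, &ps, sizeof (ps)) != sizeof (ps)) {
			perror(path);
			return (1);
		}
		(void) printf("%s: %d LWPs\n", ps.pr_fname, (int)ps.pr_nlwp);
		return (0);
	}

If that number tracks prstat but the core only shows a handful of
LWPs, the extras were presumably already gone by the time the core
was taken.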

>
> -- 
> Marco van Wieringen <mvw@planets.elm.net>
> Planets Communications B.V.
>

----
Rick McNeal

A good friend will come and bail you out of jail ... but, a true  
friend will be sitting next to you saying, "Damn ... that was fun!"


_______________________________________________
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss