[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linuxbios
Subject:    Re: [coreboot] MP table multicore patch
From:       Myles Watson <mylesgw () gmail ! com>
Date:       2010-02-16 23:58:59
Message-ID: 2831fecf1002161558o33a71aadof752e2f3b82ffa60 () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


On Tue, Feb 16, 2010 at 4:40 PM, Stefan Reinauer <stepan@coresystems.de>wrote:

>  On 2/16/10 8:42 PM, Myles Watson wrote:
>
>
>
> On Tue, Feb 16, 2010 at 12:02 PM, Stefan Reinauer <stepan@coresystems.de>wrote:
>
>>  On 2/16/10 5:11 AM, Timothy Pearson wrote:
>> > Here is a cleaned up and tested version of the SMP APIC autodetect
>> patch.
>> >
>> > Signed-off-by: Timothy Pearson <tpearson@raptorengineeringinc.com>
>>
>
>
>> And, can you describe high level, what the patch changes? It looks to me
>> as if you are recursing through the tree instead of just walking the
>> "all_devices" list.
>> So this implies that you don't catch all devices when running through
>> all_devices. This sounds like a problem in the resource allocator and
>> maybe it should be fixed there instead?
>>
> I don't understand why this would be a resource allocator problem.  Aren't
> we talking about the device tree?
>
>
> Oh, sorry I didn't say more...  What the old code did was the following:
>
>     device_t cpu;
>     for(cpu = all_devices; cpu; cpu = cpu->next) {
>      ....
>      <- check if the device is an APIC and if it is, dump an APIC entry in
> the mptable ->
>      ....
>     }
>
> But it seems that the code does not see all devices -- anymore -- not sure
> if it ever did, but I think to remember that's what happened at some point,
> or was at least intended.
>
I'm really surprised that it didn't see all devices.


> So now the code looks l
>
> scan_for_apic_devices(parent)
> {
>     for(c_it=0; c_it < parent->links; c_it++) {
>         for (child = parent->link[c_it].children; child; child =
> child->sibling) {
> ....
>
>             if (child->path.type == DEVICE_PATH_APIC_CLUSTER) {
>                 // Found an APIC cluster, scan it for APICs
>                 smp_scan_for_apic_devices(child);
>             }
>         }
>     }
> }
>
> So two more steps are necessary:
> - check all the downwards links of a device instead of just walking devices
> and checking their type.
> - run recursively in a special case on APIC clusters.
>
> This sounds a whole lot like something changed in the way "all_devices"
> works. And if "all_devices" does not mean "all devices" I am sure there are
> more places in our code that need similar fixes.
>
I agree.  I've never seen the problem where I couldn't run through all the
devs with all_devices.

>
>  Maybe the real problem is related to the memory corruption seen with
> printk?
>
> Not completely impossible, but I figured it's easier to ask the guy who
> rewrote the resource allocator if he knows something about how the intended
> behavior of "all_devices" ;-)
>
As far as I know I didn't change that behavior.  Does anyone have a boot log
where they can see that the tree of devices has more devices than the list
(at the same stage of enumeration)?


> - Maybe the behavior is intended, then we just need to check in Timothy's
> patch.
> - Maybe the behavior is not intended, but that's how the code works right
> now. Then we'd rather have to look at the resource allocator and decide if
> we want "all_devices" to mean "all devices", or whether we rename the
> variable, or redefine its meaning.
>
I think it should mean all devices.  I changed resource allocation to go
through the tree because children of disabled devices shouldn't have
allocated resources.

- Maybe none of the above applies, then we need to do nasty stack corruption
> debugging. In this case it would be fundamentally wrong to touch either the
> device tree code or the mp table creation code until we fix the corruption
> in order to make sure we don't create funny special cases.
>
This is my vote, but I'm happy to be proven wrong.  I don't see anywhere in
the code where devices get added to the tree but not to the list of
devices.  I also don't remember a place where devices get removed from the
all_devices list.


> So, if you have hints enlightening any of the maybes, please share!
>
My suggestion would be to traverse the tree and the list and compare them.
The problem is that printk was causing a hang for him.

Thanks,
Myles

[Attachment #5 (text/html)]

<br><br><div class="gmail_quote">On Tue, Feb 16, 2010 at 4:40 PM, Stefan Reinauer \
<span dir="ltr">&lt;<a \
href="mailto:stepan@coresystems.de">stepan@coresystems.de</a>&gt;</span> \
wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, \
204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">



  

<div bgcolor="#ffffff" text="#000000"><div class="im">
On 2/16/10 8:42 PM, Myles Watson wrote:
<blockquote type="cite"><br>
  <br>
  <div class="gmail_quote">On Tue, Feb 16, 2010 at 12:02 PM, Stefan
Reinauer <span dir="ltr">&lt;<a href="mailto:stepan@coresystems.de" \
target="_blank">stepan@coresystems.de</a>&gt;</span> wrote:<br>
  <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); \
margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">  <div>
    <div>On 2/16/10 5:11 AM, Timothy Pearson wrote:<br>
&gt; Here is a cleaned up and tested version of the SMP APIC autodetect
patch.<br>
&gt;<br>
&gt; Signed-off-by: Timothy Pearson &lt;<a \
href="mailto:tpearson@raptorengineeringinc.com" \
target="_blank">tpearson@raptorengineeringinc.com</a>&gt;</div>  </div>
  </blockquote>
  <div> </div>
  <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); \
margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">And, can you describe high level, what \
the patch changes? It looks to me<br> as if you are recursing through the tree \
instead of just walking the<br> &quot;all_devices&quot; list.<br>
So this implies that you don&#39;t catch all devices when running through<br>
all_devices. This sounds like a problem in the resource allocator and<br>
maybe it should be fixed there instead?<br>
  </blockquote>
  <div>I don&#39;t understand why this would be a resource allocator
problem.  Aren&#39;t we talking about the device tree?<br>
  </div>
  </div>
</blockquote>
<br></div>
Oh, sorry I didn&#39;t say more...  What the old code did was the following:<br>
<br>
    device_t cpu;<br>
    for(cpu = all_devices; cpu; cpu = cpu-&gt;next) {<br>
     ....<br>
     &lt;- check if the device is an APIC and if it is, dump an APIC
entry in the mptable -&gt;<br>
     ....<br>
    }<br>
<br>
But it seems that the code does not see all devices -- anymore -- not
sure if it ever did, but I think to remember that&#39;s what happened at
some point, or was at least intended.<br></div></blockquote><div>I&#39;m really \
surprised that it didn&#39;t see all devices.<br> <br></div><blockquote \
class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt \
0pt 0.8ex; padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000">
So now the code looks l<br>
<br>
scan_for_apic_devices(parent)<br>
{<br>
    for(c_it=0; c_it &lt; parent-&gt;links; c_it++) {<br>
        for (child = parent-&gt;link[c_it].children; child; child =
child-&gt;sibling) {<br>
....<div class="im"><br>
            if (child-&gt;path.type == DEVICE_PATH_APIC_CLUSTER) {<br>
                // Found an APIC cluster, scan it for APICs<br></div>
                smp_scan_for_apic_devices(child);<br>
            }<br>
        }<br>
    }<br>
}<br>
<br>
So two more steps are necessary: <br>
- check all the downwards links of a device instead of just walking
devices and checking their type.<br>
- run recursively in a special case on APIC clusters.<br>
<br>
This sounds a whole lot like something changed in the way &quot;all_devices&quot;
works. And if &quot;all_devices&quot; does not mean &quot;all devices&quot; I am sure \
there are more places in our code that need similar fixes.<br><div \
class="im"></div></div></blockquote><div>I agree.  I&#39;ve never seen the problem \
where I couldn&#39;t run through all the devs with all_devices. <br></div><blockquote \
class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt \
0pt 0.8ex; padding-left: 1ex;"> <div bgcolor="#ffffff" text="#000000"><div \
class="im"> <br>
<blockquote type="cite">
  <div class="gmail_quote">
  <div>Maybe the real problem is related to the memory corruption seen
with printk?<br>
  </div>
  </div>
</blockquote></div>
Not completely impossible, but I figured it&#39;s easier to ask the guy who
rewrote the resource allocator if he knows something about how the
intended behavior of &quot;all_devices&quot; ;-) <br></div></blockquote><div>As far \
as I know I didn&#39;t change that behavior.  Does anyone have a boot log where they \
can see that the tree of devices has more devices than the list (at the same stage of \
enumeration)?<br>  <br></div><blockquote class="gmail_quote" style="border-left: 1px \
solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div \
                bgcolor="#ffffff" text="#000000">
- Maybe the behavior is intended, then we just need to check in
Timothy&#39;s patch.<br>
- Maybe the behavior is not intended, but that&#39;s how the code works
right now. Then we&#39;d rather have to look at the resource allocator and
decide if we want &quot;all_devices&quot; to mean &quot;all devices&quot;, or whether \
we rename the variable, or redefine its meaning.<br></div></blockquote><div>I think \
it should mean all devices.  I changed resource allocation to go through the tree \
because children of disabled devices shouldn&#39;t have allocated resources.<br> \
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, \
204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div bgcolor="#ffffff" \
                text="#000000">
- Maybe none of the above applies, then we need to do nasty stack
corruption debugging. In this case it would be fundamentally wrong to
touch either the device tree code or the mp table creation code until
we fix the corruption in order to make sure we don&#39;t create funny
special cases.<br></div></blockquote><div>This is my vote, but I&#39;m happy to be \
proven wrong.  I don&#39;t see anywhere in the code where devices get added to the \
tree but not to the list of devices.  I also don&#39;t remember a place where devices \
get removed from the all_devices list.<br>  <br></div><blockquote class="gmail_quote" \
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; \
padding-left: 1ex;"><div bgcolor="#ffffff" text="#000000"> So, if you have hints \
enlightening any of the maybes, please share!<br></div></blockquote><div>My \
suggestion would be to traverse the tree and the list and compare them.  The problem \
is that printk was causing a hang for him.<br> <br>Thanks,<br>Myles<br></div></div>



-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic