'Re: [Beowulf] Nehalem Xeons 55** series'


List:       beowulf
Subject:    Re: [Beowulf] Nehalem Xeons 55** series
From:       Tiago Marques <a28427 () ua ! pt>
Date:       2009-02-23 20:22:49
Message-ID: b1335fe90902231222t4126529q216e39f90b84a1b2 () mail ! gmail ! com


On Mon, Feb 23, 2009 at 5:27 AM, Mark Hahn <hahn@mcmaster.ca> wrote:

>> AMD Opteron "Shanghai" 6MB L3 : 90% of peak
>> Intel "Nehalem" Core i7 : 95.2% of peak
>> Intel Itanium 2 : 95.6% of peak
>>
>
> interesting numbers, and Goto's efforts are always respected.
> it would be valuable to understand why, though.  I usually think
> of ia64 as being basically designed specifically to make this kind of
> hero-optimization possible.  but I wonder how well each of these chips does
> on low-to-medium quality user code and current compilers.

As I see it, the reasons are plenty of memory bandwidth and an excellent
choice of cache sizes, latency and architecture. AMD did a good job with
Barcelona but didn't follow up with further improvements (other than cache
size) in Shanghai. Intel seems to have hit the sweet spot.
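
For anyone unsure how to read the "% of peak" figures quoted above, here is
a minimal sketch of the arithmetic. It is an illustration only: the core
count, clock speed and flops-per-cycle values are assumptions for a
hypothetical quad-core part, not numbers taken from the benchmark runs, and
the function names are made up for this example.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock (GHz) x floating-point ops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of theoretical peak actually achieved by a benchmark."""
    return measured_gflops / peak


# Assumed values (hypothetical): 4 cores, 2.66 GHz, 4 double-precision
# flops per cycle per core.
peak = peak_gflops(cores=4, clock_ghz=2.66, flops_per_cycle=4)

# Working backward from the 95.2% figure quoted in the thread to the
# sustained rate it would imply on this hypothetical part.
implied_gflops = 0.952 * peak

print(f"peak = {peak:.2f} GFLOPS, implied sustained = {implied_gflops:.2f} GFLOPS")
```

The point is only that a few percent of efficiency translates directly into
sustained GFLOPS per socket, which is why the gap between 90% and 95% is
worth discussing.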

>
>
>> It's a Core i7 but should give you an idea.
>>
>
> I suspect that lots of people are holding their breaths (and POs)
> waiting for Intel to release 2 (and 4?) socket nehalems.  it has to be a
> bit scary for AMD, since they're getting leapfrogged in the area
> of their traditional strength.  I don't have a sense for whether Goto's
> kind of tuning (beyond the ability of compilers, and relatively
> cache-friendly relative to flops) is terribly relevant to the market.

It's even worse for AMD in some other areas. Take a look at this:
http://it.anandtech.com/weblog/showpost.aspx?i=554
Best regards,
                                 Tiago Marques

>
>
> on the topic of memory bandwidth:
>  http://techreport.com/articles.x/16448
> makes it sound as if AMD knows they have a scaling problem with cache
> coherency, and have a solution queued.  does anyone know whether nehalem
> already has a probe filter?  or if AMD has mentioned anything about
> widening to a 3-4-dimm-per-socket memory interface?
>
> regards, mark hahn.
>


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit \
http://www.beowulf.org/mailman/listinfo/beowulf
