
List:       python-dev
Subject:    Re: [Python-Dev] Thoughts fresh after EuroPython
From:       Glenn Linderman <glenn () nevcal ! com>
Date:       2010-09-08 5:00:15
Message-ID: 4C87185F.6000408 () nevcal ! com



  On 7/26/2010 7:36 AM, Guido van Rossum wrote:
> According to CSP advocates, this approach will break down when you
> need more than 8-16 cores since cache coherence breaks down at 16
> cores. Then you would have to figure out a message-passing approach
> (but the messages would have to be very fast).

Catching up on Python-Dev after 3 months of travel (lucky me!), so 
apologies for a "blast from the past" as I'm 6 weeks late in replying here.

Think of the hardware implementation of cache coherence as a MIL - 
memory interleave lock, or a micro interpreter lock (the hardware is 
interpreting what the compiled software is doing).

That is not so different from Python's GIL, just at a lower level.

I didn't read the CSP advocacy papers, but experience with early parallel 
systems at CMU, Tandem Computers, and Teradata strongly implies that 
multiprocessing of some sort will always be able to scale larger than 
memory-coherent cores -- if the application can be made parallel at all.

It is interesting to note that all the parallel systems mentioned above 
implemented fast message passing hardware of various sorts (affected by 
available technologies of their times).

It is also interesting to note the similarities between some of the 
extreme multi-way cache coherence approaches and the various message 
passing hardware; some of the papers that talk about exceeding 16 cores 
were going down a message passing road to achieve it.  Maybe something 
new has been discovered in the last 8 years, since I've not been 
following the research... the only thing I've read about in that area in 
the last 8 years is the loss of Jim Gray at sea... but the IEEE paper you 
posted later seems to confirm my suspicion that there has not yet been 
a breakthrough.

The point of the scalability remark, though, is that while lots of 
problems can be solved on a multi-core system, problems also grow 
bigger, and there will likely always be problems that cannot be solved 
on a multi-core (single cache coherent memory) system.  Those problems 
will require message passing solutions.  Experience with the systems 
above has shown that switching from a multi-core (semaphore based) 
design to a message passing design is usually a rewrite.

Perhaps the existence of the GIL, forcing a message passing solution to 
be created early, is a blessing in disguise for the design of large 
scale applications.  For years I've been hearing about problems for 
which the data is too large to share and the calculation is too complex 
to parallelize, but once the available hardware is exhausted as the 
problem grows, the only path to larger scale is message passing 
parallelism... forcing a redesign of applications that outgrew the 
available hardware.
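To make that concrete, here is a minimal sketch (my own toy example, not 
any real system's design) of the message passing style in stock Python: 
workers exchange only serialized messages over multiprocessing Queues, so 
nothing is shared and the same shape would survive a move to multiple 
machines.

```python
# Message-passing sketch: no shared state between processes; work goes
# out as chunks on one Queue, partial results come back on another.
from multiprocessing import Process, Queue


def worker(tasks, results):
    # Pull chunks until the None sentinel arrives, then exit.
    while True:
        chunk = tasks.get()
        if chunk is None:
            break
        results.put(sum(x * x for x in chunk))


def parallel_sum_of_squares(data, nworkers=2, chunksize=1000):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    nchunks = 0
    for i in range(0, len(data), chunksize):
        tasks.put(data[i:i + chunksize])
        nchunks += 1
    for _ in procs:
        tasks.put(None)  # one shutdown sentinel per worker
    total = sum(results.get() for _ in range(nchunks))
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10_000))))
```

The point is the shape, not the speed: each chunk is pickled and copied 
across a pipe, which is exactly the overhead a shared memory design avoids.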

That said, applications that do fit in available hardware generally can 
run a little faster with some sort of shared memory approach: message 
passing does have overhead.
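For contrast, a sketch (again a toy of mine) of the semaphore-based shared 
memory style: the processes publish into one shared counter under a lock, 
so no per-result message is serialized, but the design is welded to a 
single coherent memory.

```python
# Shared-memory sketch: workers compute privately, then take a lock once
# to add their partial sum into a single shared counter.
from multiprocessing import Process, Value, Lock


def add_squares(start, stop, total, lock):
    local = sum(x * x for x in range(start, stop))  # private computation
    with lock:  # semaphore-style critical section to publish the result
        total.value += local


def shared_sum_of_squares(n, nworkers=2):
    total = Value('q', 0)  # shared 64-bit signed integer
    lock = Lock()
    step = n // nworkers
    bounds = [(i * step, (i + 1) * step if i < nworkers - 1 else n)
              for i in range(nworkers)]
    procs = [Process(target=add_squares, args=(a, b, total, lock))
             for a, b in bounds]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return total.value


if __name__ == "__main__":
    print(shared_sum_of_squares(10_000))
```

Switching between this and the Queue version touches every line of the 
coordination logic, which is the "usually a rewrite" experience in miniature.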

-- 
Glenn
------------------------------------------------------------------------
I have CDO..It's like OCD, but in alphabetical order..The way it should be!
(a facebook group is named this, except for a misspelling.)



_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-dev%40progressive-comp.com


