
List:       pypy-dev
Subject:    Re: [pypy-dev]
From:       Paolo Giarrusso <p.giarrusso () gmail ! com>
Date:       2009-03-30 18:38:31
Message-ID: a4d0c86c0903301138v1cd4d7d5lf294eefd13fd5e81 () mail ! gmail ! com

On Mon, Mar 30, 2009 at 13:48, Christian Tismer <tismer@stackless.com> wrote:
> On 3/30/09 1:28 AM, Paolo Giarrusso wrote:
>>
>> On Thu, Mar 26, 2009 at 02:42, Leonardo Santagada<santagada@gmail.com>
>>  wrote:
>>>
>>> On Mar 25, 2009, at 9:27 PM, Christian Tismer wrote:
>>>
>>>> Hi friends,
>>>>
>>>> please have a look at this.
>>>> http://code.google.com/p/unladen-swallow/wiki/ProjectPlan
>>>>
>>>> is this YAIPAP ?
>>>> Yet Another Ignorant Python Acceleration Project
>>>>
>>>> I see them mentioning Hölzle and Self, and I see a reference to
>>>> Psyco where they want to steal from, but PyPy does not exist.
>>>>
>>>> IMHO, this is totally GAGA  - chris
>>
>> First, I suggest you have a look at the FAQ - when I did, I discovered
>> the developers are full-time Google engineers. Having said that, the
>> Google V8 group does not seem to be involved, and that's quite stupid.
>> However, I studied with Lars Bak and the base ideas he had are the
>> same.
>
> Interesting. Yes, I read the FAQ, but I don't know the people.
>
>> Missing knowledge of PyPy is also stupid obviously, but I wonder why
>> nobody proposed "hey, let's tell them" instead of fighting another project.
>> They'd sure benefit, for instance, from CALL_METHOD and
>> CALL_LIKELY_BUILTIN ideas (they mention among their purposes fixing
>> the performance problems addressed by CALL_LIKELY_BUILTIN, that's why
>> I mention these).
>
> Well, we could have contacted them.
> But (I'm speaking for myself) if they have decided to totally
> ignore a project like PyPy, then I think it is for a reason.
> Not knowing about PyPy means to close more than two eyes.
> Therefore I see no point in approaching them.

Well, there is a mention of this project on the PyPy blog (I just
noticed), and they talk in terms of "friendly competition". Since they
seem to know about PyPy, I wonder why it isn't mentioned on their site.
In any case, their pages are far from complete (lots of interesting
questions are left unanswered).

>> Also, they've already started releasing. Have a look at their
>> benchmarks:
>> http://code.google.com/p/unladen-swallow/wiki/Releases
>>
>> Did you look at that before declaring it GAGA?
>
> My first perception of the project was a bit distorted.
> I saw this as an attempt to replace PyPy with something better,
> and it seemed GAGA to me to do that ignoring prior work.
>
> Now that I realize that the project has much smaller
> goals, it becomes more realistic.
>
> The Q1 goals are relatively doable without doubt. The current
> achievements speedwise remind me of the ancient Python2C project.
> It showed the typical acceleration by a factor of around 2, which
> is what you can expect when eliminating the interpreter loop.
>
>>> I was reading it earlier, the simpler ideas, like making pickle faster
>>> and most of q1 deliverables seems nice, and could really help python
>>> right now, but those seems more like the things Need For Speed sprint
>>> was doing.
>
> Yes, Need for Speed did small, doable things.
>
>>> Not the LLVM-JIT, new GC, eliminate the GIL seems
>>> unrealistic, even the pace of the project seems to be counting on tons
>>> of new developers joining the project after the Q1 milestone.
>>
>> Well, the milestones seem crazy, but they did achieve something notable
>> in their Q1 deliverable, and most of the ideas seem correct.
>
> Q1 is fine, but it does by no means give any proof that the next
> milestones can be achieved.
>
>> "Eliminate the GIL" is not hard by itself, the problem (IMHO, no hard
>> numbers for that) is that it's impossible with refcounting (because of
>> the cost of atomic refcount manipulation). The author of the old "free
>> threading" patch mentioned only another problem, that is locking
>> around name lookups in mutable name dictionaries (for looking up
>> object members, globals, etc.), which can also be approached (and I
>> think refcount manipulation is a bigger issue).
>
> As I remember that patch, the overhead was around 40%, mostly because
> of reference counting.

You're right, but I couldn't find anyone stating that it was due to
reference counting, either on the mailing lists or in the FAQs. The
mention of dictionaries (which are probably also a problem) came from
the author of the patch, Greg Stein:

http://mail.python.org/pipermail/python-dev/2001-August/017099.html
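
To make the refcounting point concrete, here is a rough sketch (not
CPython code, the names are invented) of what free threading would do
to every single refcount operation:

/* A rough sketch of why free threading clashes with refcounting:
 * every INCREF/DECREF has to become an atomic read-modify-write,
 * which on x86 compiles to a lock-prefixed instruction and is far
 * slower than a plain increment, even with no contention at all. */
#include <stdio.h>
#include <stdatomic.h>

typedef struct {
    long refcnt;               /* what the GIL protects today */
    atomic_long refcnt_atomic; /* what free threading would need */
} fake_object;

/* today: plain increments, safe only because the GIL serializes them */
static void incref_plain(fake_object *o) { o->refcnt++; }
static void decref_plain(fake_object *o) { o->refcnt--; }

/* free threading: every touch of the refcount is an atomic op */
static void incref_atomic(fake_object *o) {
    atomic_fetch_add_explicit(&o->refcnt_atomic, 1, memory_order_relaxed);
}
static void decref_atomic(fake_object *o) {
    /* the decrement needs ordering so the "last reference" check is safe */
    if (atomic_fetch_sub_explicit(&o->refcnt_atomic, 1,
                                  memory_order_acq_rel) == 1) {
        /* here the object would be deallocated */
    }
}

int main(void) {
    fake_object o = { 1, 1 };
    incref_plain(&o);  decref_plain(&o);
    incref_atomic(&o); decref_atomic(&o);
    printf("%ld %ld\n", o.refcnt, atomic_load(&o.refcnt_atomic));
    return 0;
}

Even without any contention, the atomic version is noticeably more
expensive than the plain increment, and an interpreter executes these
pairs constantly.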

> I guess nobody actually goes this path,
> because it is such a waste, compared to multiple processes, and doing
> it "right" (where I'm referring to some Java achievements I heard of)
> is pretty much of a total re-write of Python.

Well, converting each and every C module from refcounting to garbage
collection is _not_ a total rewrite of Python. I can understand why
you claim it has the same costs. But eXtreme Programming on the one
hand, and the daily activity of the Linux kernel on the other, show
that given enough manpower such things can be done.

I mean, why doesn't CPython allow having multiple interpreters in the
same process? Because it would take time to move all the statics into
the thread state. Well, much bigger changes take place in the Linux
kernel more than once per release. Having worked there, I see here
either a severe lack of manpower, or a lack of trust in what
developers can do.
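
Just to illustrate the kind of change I mean, a purely hypothetical
sketch (none of these names exist in CPython): each such static
becomes a field of a state struct that gets passed through the calls,
which is repetitive work rather than deep surgery:

/* Purely hypothetical sketch of "moving statics into the state".
 * None of these names exist in CPython; the point is that the edit
 * is repetitive, not deep. */
#include <stdio.h>

/* before: one process-wide counter, so two interpreters in the same
 * process would stomp on each other */
static long allocation_count;
static void count_alloc_global(void) { allocation_count++; }

/* after: the same counter lives in a per-interpreter struct that is
 * passed explicitly to whoever needs it */
typedef struct { long allocation_count; } interp_state;
static void count_alloc(interp_state *interp) { interp->allocation_count++; }

int main(void) {
    interp_state first = {0}, second = {0};
    count_alloc_global();           /* shared, whether you want it or not */
    count_alloc(&first);            /* isolated per interpreter */
    count_alloc(&second);
    printf("%ld %ld %ld\n", allocation_count,
           first.allocation_count, second.allocation_count);
    return 0;
}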

And even if I say "enough manpower", it still doesn't amount to a
complete rewrite at all. For a human programmer the conversion is
mostly mechanical: remove (or make conditional) the refcount
manipulation, then, in every function, register all pointers with the
GC (or, in C++, turn them into handles). Even if you can't do it
overnight, you'll only get bugs in very particular situations. It's
obviously boring, but programming is not always fun.
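
To show what I mean by "mostly mechanical", here is a hypothetical
before/after sketch; the refcounted half only mimics the shape of an
extension object, and I use the Boehm collector purely for
illustration, not as a proposal for the actual API:

/* Hypothetical before/after sketch of the conversion; compile with
 * something like: gcc -O2 convert_sketch.c -lgc
 * The Boehm collector is used only to make the example runnable. */
#include <stdlib.h>
#include <string.h>
#include <gc.h>

/* before: manual refcounting; every exit path must balance the counts */
typedef struct { long refcnt; char *data; } rc_buf;

static rc_buf *rc_new(const char *s) {
    rc_buf *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->refcnt = 1;
    b->data = strdup(s);
    return b;
}
static void rc_incref(rc_buf *b) { b->refcnt++; }
static void rc_decref(rc_buf *b) {
    if (--b->refcnt == 0) { free(b->data); free(b); }
}

/* after: the refcount calls are simply deleted and allocation goes
 * through the collector, which traces the pointers for us */
typedef struct { char *data; } gc_buf;

static gc_buf *gc_new(const char *s) {
    gc_buf *b = GC_MALLOC(sizeof *b);           /* block is scanned */
    b->data = GC_MALLOC_ATOMIC(strlen(s) + 1);  /* no pointers inside */
    strcpy(b->data, s);
    return b;                                   /* no matching free anywhere */
}

int main(void) {
    GC_INIT();
    rc_buf *a = rc_new("refcounted");
    rc_incref(a); rc_decref(a); rc_decref(a);   /* the caller must balance */
    gc_buf *b = gc_new("collected");            /* nothing to balance */
    (void)b;
    return 0;
}

The interesting part is what disappears: the matching decrefs on every
exit path, which is exactly where extension code tends to leak or
crash today.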

> I'm pretty much wondering if the latter makes sense, given the
> existence of PyPy.

Because PyPy still needs lots of time, because the PyPy idea _is_
research, and because if they meet their deadlines (I'm not sure that's
even conceivable) Google will save computing power much earlier.
Especially because their C API will be more compatible with the
current one. Well, they actually claim full compatibility, but that
point IMHO _is_ crazy, and they'll notice it when they have to remove
refcounting.

About the "PyPy is research" statement, just a quote:
http://morepypy.blogspot.com/2008/10/sprint-discussions-jit-generator.html

"Partial evaluation (the basis for our JIT generator) is a 30 years
old technique that was always just promising and never really
successful, so the fact that we think we can solve its problems in a
few years is very much hubris anyway :-). On the positive side, we
think that we now know these problems much better than ever before and
that we have a plan that has a chance to succeed."

I haven't read the post about tracing JITs yet, so I don't know if
that still holds true (I mean, whether it is still about partial
evaluation). Anyway, at most it's research approaching completion and
production status; IMHO nobody expects production stability by the end
of 2009 (though that's just my opinion, I may very well be wrong, and
I hope I am).

Regards
-- 
Paolo Giarrusso
_______________________________________________
pypy-dev@codespeak.net
http://codespeak.net/mailman/listinfo/pypy-dev
