[prev in list] [next in list] [prev in thread] [next in thread] 

List:       python-capi-sig
Subject:    [capi-sig]Re: fixedint patch, proof of concept working now
From:       Robert Bradshaw <robertwb () math ! washington ! edu>
Date:       2018-09-19 8:48:39
Message-ID: CADiQ+QAirnjUWD8Nk35+N_7REieCPQqJD2ciom+V2knRQPpFZA () mail ! gmail ! com
[Download RAW message or body]

On Wed, Sep 19, 2018 at 10:15 AM INADA Naoki <songofacandy@gmail.com> wrote:

> >
> > I did consider trying to use a second tag for short strings.  Not
> > sure it will help too much as some quick analysis shows that only
> > 25% of strings used by PyDict_GetItem are short enough to fit.
> >
> > This morning I dreamed up new idea: analyze normal Python programs
> > and build a list of strings commonly used for PyDict_GetItem.  They
> > will be strings like "self",


"self" is almost never referred to by name (it's an indexed local in the
bytecode)


> builtin functions names, etc.


Strings encountered in CPython compilation are already interend.
Fortunately, this should make the experiment easier--you could just plug
into that framework to decide which strings to represent with tagged
pointers rather than having to come up with your own heuristic.


> Then use
> > a tagged pointer to hold these common strings.  I.e. a tag to denote
> > a string (or interned symbol, in Lisp speak) and an integer which is
> > the offset into the fixed array of interned strings.  The savings
> > would have to come from avoiding the INCREF/DECREF accounting of
> > refcounts on those strings.


Given that branching to avoid the refcounting is more expensive than doing
the refcounting itself, this might be hard, but maybe there's a slight win
if we're already doing this branching for ints, etc. Or if "normal"
refcounts become more expensive than increment/decrements (e.g. truly
threadsafe). Unlike ints, most strings are not ephemeral, and nor can the
"contents" be accessed without a deference, so tagged pointers are less of
a win here.)

Instead of fixed set of strings,
> > perhaps we could make the intern process dynamically allocate the
> > tag IDs.  We could have a specialized lookdict that works for dicts
> > containing only interned strings.
>
> It's very nice idea!
>

CPython already optimizes for dicts containing only strings; given that
most of the keys in this dict are interned, I'm not sure it would be a big
win to disallow non-interned keys (which would primarily speed up the
atypical key-not-found path).

Still, these are interesting experiments; thanks for exploring and sharing!
_______________________________________________
capi-sig mailing list -- capi-sig@python.org
To unsubscribe send an email to capi-sig-leave@python.org
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic