'[Python-ideas] Re: Experimenting with dict performance, and an immutable dict'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       python-ideas
Subject:    [Python-ideas] Re: Experimenting with dict performance, and an immutable dict
From:       Todd <toddrjen () gmail ! com>
Date:       2020-07-22 4:41:11
Message-ID: CAFpSVp+Tsvu7fUwS-d5Hy4NNgEjq9HTZ=GMjQFC3DzOGFSYeMw () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


What, exactly, is frozen?  My understanding is that one problem with frozen
dicts in the past is deciding exactly what is mutable and what is
immutable.  Can you change what object a key maps to so long as the set of
keys stay the same?  Can you modify the contents of mutable object that is
a value?

On Tue, Jul 21, 2020 at 6:30 PM Marco Sulla <Marco.Sulla.Python@gmail.com>
wrote:

> Let me first say that the code and discussion, as per title, is also about
> possible performance improvements to the dict base type.
>
> TL;DR I implemented a frozendict using CPython 3.9 code. It seems that an
> immutable dict *could* be faster than dict in some cases. Furthermore, some
> optimization could be applied to dict too.
>
> Long explaining:
>
> Since now I have some time, I decided to experiment a little with the
> creation of an immutable dict in Python.
>
> Unluckily, I started this experiment many months ago, so the CPython code
> I used is old. Maybe some or all of my considerations are outdated.
>
> Initially, I wrote a quick and dirty implementation:
>
> https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334
>
> The code was very simple, and the performance was identical to dict. So in
> theory, adding a frozendict to CPython is not hard. But there's more.
>
> After the first implementation, I started to try to improve the
> performance of frozendict. The result of the improvements are here:
>
>
> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt
>
> For benchmarks, I used simply timeit, with autorange and repeat and, as
> suggested in the module documentation, I got the minimum of the results.
> Here is the code:
>
> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py
>
> I have not tested with an optimized build, since optimization data is
> collected using the unit tests, and I didn't write tests for frozendict in
> the official CPython test framework.
>
> The tests and benchmarks were done using CPython 3.9a0. CPU and other pc
> resources were not restricted using pyperf or similar tools, to see the
> "real" speed. CPython was compiled using gcc and g++.
>
> In benchmarks, I compared methods and operators using dict and frozendict.
> The benchmark uses dicts with all integers and dicts with all strings.
> Furthermore, I tested dicts with size 8 (the most optimized size in
> CPython) and 1000 elements (maybe too much, but I wanted to see how they
> perform with a high number of elements).
>
> Every benchmark has a line in the output. The Name describes the
> benchmarked method, operator or code snippet. Size is the size of the dict,
> 8 or 1000. Keys are the keys type, str or int. Type is the dictionary type,
> dict or frozendict.
>
> In Name, the "o" represents the object itself (dict or frozendict). "d",
> in benchmark with dict, is "o"; in benchmarks with frozendict is an
> equivalent instance of type dict.
>
> Some consideration:
>
> 1. frozendict is very fast, as expected, at copying. But it's also faster
> at creation, using a (not frozen) dict, kwargs or a sequence2. Speedups
> range from 20% to 45%.
> 2. frozendict is also a bit faster when you iterate over it, especially
> over values, where is ~15% faster
> 3. hash seems really fast, but this is because it's cached the first time
> hash() is invoked
> 4. where frozendict is slower is when you unpickle it and with fromkeys
> and repr. This is because I wrote a very naif implementation of these
> methods, without optimizing them. The other methods have a comparable speed.
>
> Here is the code:
> https://github.com/Marco-Sulla/cpython
>
> Here is the diff between the CPython code and my commits:
> https://github.com/python/cpython/compare/master...Marco-Sulla:master
>
> About code
>
> I coded the implementation doing a simple copy/paste of the existing dict
> functions, modifying their code and renaming them. This way I'm sure dict
> continues to work as before, and I can compare the speed gain.
>
> Some of the optimization I adopted can be implemented in `dict` too. For
> example, instead of creating an empty dict and resizing it, I create it
> with the "maximum" size and I fill it. It seems to work, even if I did not
> explore the possibility that a mutable object can change while a frozendict
> creation is in progress.
>
> Some problems with optimizing dict and maintaining a frozendict:
>
> 1. duplication of code. To gain a significant boost, I had to copy and
> paste a lot of code. Functions can be remerged again, but maybe the speedup
> will be reduced.
> 2. split vs combined dicts. As I wrote, split dicts seem to be faster in
> reading than combined dicts. For example, iterating over values is faster
> with a split dict, as expected.
> But writing was not tested; furthermore, some of the optimizations can be
> adopted for dicts too, so the convenience of a split table can be lowered.
> dict continues to maintain both split and combined tables, so this could
> be not a problem. But the code could be less and more fast if only a table
> layout is supported
> 3. the CPython code I used is old, so some of the improvements I adopted
> could be already implemented
>
> About frozendict
>
> Apart the considerations done in the [PEP 416](
> https://www.python.org/dev/peps/pep-0416/), that was rejected since there
> was little gain from its implementation, I think that frozendict can be
> useful as a substitute of MappingProxyType, that is really slow.
> MappingProxyType is not much used, but it's present in critical parts of
> CPython code, for example in _sre. We have to see if a mapping proxy type
> *can* be substituted with an immutable map in some critical part of CPython
> code.
>
> Furthermore, frozendicts could be used for implementing "immutable"
> classes and modules, and can be used as a faster dict if its content does
> not change.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-leave@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

[Attachment #5 (text/html)]

<div dir="ltr">What, exactly, is frozen?   My understanding is that one problem with \
frozen dicts in the past is deciding exactly what is mutable and what is immutable.   \
Can you change what object a key maps to so long as the set of keys stay the same?   \
Can you modify the contents of mutable object that is a value?<br></div><br><div \
class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 21, 2020 at 6:30 PM \
Marco Sulla &lt;<a href="mailto:Marco.Sulla.Python@gmail.com">Marco.Sulla.Python@gmail.com</a>&gt; \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Let me \
first say that the code and discussion, as per title, is also about possible \
performance improvements to the dict base type.<br><br><div>TL;DR I implemented a \
frozendict using CPython 3.9 code. It seems that an immutable dict *could* be faster \
than dict in some cases. Furthermore, some optimization could be applied to dict \
too.</div><br>Long explaining:<br><br>Since now I have some time, I decided to \
experiment a little with the creation of an immutable dict in \
Python.<br><br>Unluckily, I started this experiment many months ago, so the CPython \
code I used is old. Maybe some or all of my considerations are \
outdated.<br><br>Initially, I wrote a quick and dirty implementation:<br><a \
href="https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334" \
target="_blank">https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334</a><br><br>The \
code was very simple, and the performance was identical to dict. So in theory, adding \
a frozendict to CPython is not hard. But there&#39;s more.<br><br>After the first \
implementation, I started to try to improve the performance of frozendict. The result \
of the improvements are here:<br><br><a \
href="https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt" \
target="_blank">https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt</a><br><br>For \
benchmarks, I used simply timeit, with autorange and repeat and, as suggested in the \
module documentation, I got the minimum of the results. Here is the code:<br><br><a \
href="https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py" \
target="_blank">https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py</a><br><br>I \
have not tested with an optimized build, since optimization data is collected using \
the unit tests, and I didn&#39;t write tests for frozendict in the official CPython \
test framework.<br><br>The tests and benchmarks were done using CPython 3.9a0. CPU \
and other pc resources were not restricted using pyperf or similar tools, to see the \
&quot;real&quot; speed. CPython was compiled using gcc and g++.<br><br>In benchmarks, \
I compared methods and operators using dict and frozendict. The benchmark uses dicts \
with all integers and dicts with all strings. Furthermore, I tested dicts with size 8 \
(the most optimized size in CPython) and 1000 elements (maybe too much, but I wanted \
to see how they perform with a high number of elements).<br><br>Every benchmark has a \
line in the output. The Name describes the benchmarked method, operator or code \
snippet. Size is the size of the dict, 8 or 1000. Keys are the keys type, str or int. \
Type is the dictionary type, dict or frozendict.<br><br>In Name, the &quot;o&quot; \
represents the object itself (dict or frozendict). &quot;d&quot;, in benchmark with \
dict, is &quot;o&quot;; in benchmarks with frozendict is an equivalent instance of \
type dict. <br><br>Some consideration:<br><br>1. frozendict is very fast, as \
expected, at copying. But it&#39;s also faster at creation, using a (not frozen) \
dict, kwargs or a sequence2. Speedups range from 20% to 45%.<br>2. frozendict is also \
a bit faster when you iterate over it, especially over values, where is ~15% \
faster<br>3. hash seems really fast, but this is because it&#39;s cached the first \
time hash() is invoked<br>4. where frozendict is slower is when you unpickle it and \
with fromkeys and repr. This is because I wrote a very naif implementation of these \
methods, without optimizing them. The other methods have a comparable \
speed.<br><br>Here is the code:<br><a href="https://github.com/Marco-Sulla/cpython" \
target="_blank">https://github.com/Marco-Sulla/cpython</a><br><br>Here is the diff \
between the CPython code and my commits:<br><a \
href="https://github.com/python/cpython/compare/master...Marco-Sulla:master" \
target="_blank">https://github.com/python/cpython/compare/master...Marco-Sulla:master</a><br><br>About \
code<br><br>I coded the implementation doing a simple copy/paste of the existing dict \
functions, modifying their code and renaming them. This way I&#39;m sure dict \
continues to work as before, and I can compare the speed gain.<br><br>Some of the \
optimization I adopted can be implemented in `dict` too. For example, instead of \
creating an empty dict and resizing it, I create it with the &quot;maximum&quot; size \
and I fill it. It seems to work, even if I did not explore the possibility that a \
mutable object can change while a frozendict creation is in progress.<br><br>Some \
problems with optimizing dict and maintaining a frozendict:<br><br>1. duplication of \
code. To gain a significant boost, I had to copy and paste a lot of code. Functions \
can be remerged again, but maybe the speedup will be reduced.<br>2. split vs combined \
dicts. As I wrote, split dicts seem to be faster in reading than combined dicts. For \
example, iterating over values is faster with a split dict, as expected. <br>But \
writing was not tested; furthermore, some of the optimizations can be adopted for \
dicts too, so the convenience of a split table can be lowered.<br>dict continues to \
maintain both split and combined tables, so this could be not a problem. But the code \
could be less and more fast if only a table layout is supported<br>3. the CPython \
code I used is old, so some of the improvements I adopted could be already \
implemented<br><br>About frozendict<br><br>Apart the considerations done in the [PEP \
416](<a href="https://www.python.org/dev/peps/pep-0416/" \
target="_blank">https://www.python.org/dev/peps/pep-0416/</a>), that was rejected \
since there was little gain from its implementation, I think that frozendict can be \
useful as a substitute of MappingProxyType, that is really slow. MappingProxyType is \
not much used, but it&#39;s present in critical parts of CPython code, for example in \
_sre. We have to see if a mapping proxy type *can* be substituted with an immutable \
map in some critical part of CPython code.<br><br>Furthermore, frozendicts could be \
used for implementing &quot;immutable&quot; classes and modules, and can be used as a \
faster dict if its content does not change.</div> \
_______________________________________________<br> Python-ideas mailing list -- <a \
href="mailto:python-ideas@python.org" target="_blank">python-ideas@python.org</a><br> \
To unsubscribe send an email to <a href="mailto:python-ideas-leave@python.org" \
target="_blank">python-ideas-leave@python.org</a><br> <a \
href="https://mail.python.org/mailman3/lists/python-ideas.python.org/" \
rel="noreferrer" target="_blank">https://mail.python.org/mailman3/lists/python-ideas.python.org/</a><br>
 Message archived at <a \
href="https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/" \
rel="noreferrer" target="_blank">https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/</a><br>
 Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" \
target="_blank">http://python.org/psf/codeofconduct/</a><br> </blockquote></div>



_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GAJHZEBOU5BGYJAUNQP5V66KSS6HHSQD/
 Code of Conduct: http://python.org/psf/codeofconduct/



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic