List: kfm-devel
Subject: Re: UDSEntry compression ideas
From: Mark <markg85 () gmail ! com>
Date: 2013-09-20 19:31:03
Message-ID: CAPd6JnEiHLiMbywy6k3TWwVvT7e66qKXtvoYP7uuD4c+NJJguw () mail ! gmail ! com
On Fri, Sep 20, 2013 at 10:29 AM, Frank Reininghaus <frank78ac@googlemail.com> wrote:
> Hi Mark,
>
> 2013/9/20 Mark:
> [...]
> >> > The easiest "compression" that I could think of was to force the
> >> > UDSEntries to implicitly share all their QStrings for one field if
> >> > they are all equal:
> >> >
> >> > http://paste.kde.org/p52a24b49/
> >> >
> >> > Reduces the memory consumption in Details View for 100,000 items from
> >> > 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
> >> > might have a small performance penalty. Maybe it's worth investigating
> >> > that further though.
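The implicit-sharing "compression" described above can be sketched in plain C++, using std::shared_ptr as a stand-in for Qt's implicitly shared QString (in Qt, simply assigning one QString to another already shares the buffer; the patch exploits this by reusing the previous entry's QString when the new value compares equal). The class and names below are illustrative, not from the actual patch:

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for the implicit-sharing trick: identical strings end up
// pointing at a single shared buffer instead of each entry owning a copy.
class StringInterner {
public:
    std::shared_ptr<const std::string> intern(const std::string& s) {
        auto it = pool_.find(s);
        if (it != pool_.end())
            return it->second;            // reuse the already-shared copy
        auto sp = std::make_shared<const std::string>(s);
        pool_.emplace(s, sp);
        return sp;
    }
    std::size_t poolSize() const { return pool_.size(); }

private:
    std::unordered_map<std::string, std::shared_ptr<const std::string>> pool_;
};
```

With 100,000 entries whose "user" or "group" fields are all equal, only one string allocation per distinct value remains; every entry just holds a pointer-sized handle.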
> >>
> >> OK, so I did some performance measurements. Short version: my earlier
> >> patch does make listing directories slightly slower, but I found
> >> another way to hack UDSEntry that does apparently not make things
> >> slower, and that reduces the memory usage by about 450 bytes per
> >> listed file, without any loss of functionality.
> >>
> >> Long version:
> >>
> >> (a) First of all, I took the current master branches of kdelibs and
> >> kde-baseapps (all built in release mode) and added some profiling
> >> output:
> >>
> >> http://paste.kde.org/p9b9b7b68/
> >> http://paste.kde.org/p992af44c/
> >>
> >> Then I opened a folder with 100,000 files in Dolphin in Icons View.
> >>
> >> According to KSysGuard, the memory usage of the dolphin process is 200.6
> >> MB.
> >>
> >> For the performance measurements, I discarded the first run and went
> >> out of the folder and back twice to get reproducible results (the time
> >> spent per item in the first run fluctuated strongly).
> >>
> >> The time that SlaveInterface::dispatch(int _cmd, const QByteArray
> >> &rawdata) needs per item was between 2.6 and 2.8 microseconds.
> >>
> >>
> >> (b) My first patch that only makes use of implicit sharing of the
> >> QStrings:
> >>
> >> http://paste.kde.org/p52a24b49/
> >>
> >> Memory usage drops to 187.9 MB, but the time spent per item increases
> >> to 2.9 to 3.1 microseconds (not surprising, I think).
> >>
> >>
> >> (c) My new patch, that makes more intrusive changes to the way the
> >> data is stored inside UDSEntry (explanations below):
> >>
> >> http://paste.kde.org/pc85e6b34/
> >>
> >> Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9
> >> microseconds.
> >>
> >>
> >> This means that we can save more than 450 bytes for each file, which
> >> is IMHO a lot.
> >>
> >>
> >>
> >> How does it work?
> >>
> >> The basic observation is that UDSEntry currently stores all data in a
> >> QHash<uint, Field>, and that a QHash is not really space-efficient
> >> (see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
> >> idea how much memory is needed to store the data inside a QHash).
> >> However, it has the nice property that the "Field" that an "uint" is
> >> mapped to can be accessed in O(1) time.
> >>
> >> My idea to improve this and keep the O(1) access is the following: We
> >> store all fields in UDSEntryPrivate in a simple
> >>
> >> QVector<Field> fields
> >>
> >> in the order in which the fields are read from the stream. We remember
> >> which UDS field comes at which position and store this in a
> >>
> >> QHash<uint, int> udsIndexes
> >>
> >> (uint is the type of the "UDS field", and a QVector uses "int" for
> >> indexing, but there are still a few signed/unsigned warnings with my
> >> patch which need investigation).
> >>
> >> Then we can access a "Field" that corresponds to a uint "uds" with
> >>
> >> d->fields.at(d->udsIndexes.value(uds)).
> >>
> >> Now the obvious question is: "How on earth can this be more efficient
> >> than the current solution? There is still a QHash in each
> >> UDSEntryPrivate after all, and the code gets more complicated!"
> >>
> >> The trick is that the mapping from the "uint uds" to the index of the
> >> corresponding "Field" is usually exactly the same for every UDSEntry
> >> if we read many of them in a row, and we can make all UDSEntryPrivates
> >> share this mapping (QHash is implicitly shared). This means that every
> >> UDSEntryPrivate only has a private QVector<Field>, which only needs a
> >> rather small overhead compared to the memory that the Fields
> >> themselves need.
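The layout Frank describes can be sketched in plain C++ (the real patch uses QHash/QVector with Qt's implicit sharing; here std::shared_ptr plays that role, and the names are illustrative). All entries produced by one listing share a single uds-field-to-index map; each entry only owns its compact vector of Fields:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Field {
    std::string str;
    long long num = 0;
};

// Maps a UDS field id to its position in the per-entry vector.
using IndexMap = std::unordered_map<uint32_t, int>;

class Entry {
public:
    explicit Entry(std::shared_ptr<const IndexMap> indexes)
        : indexes_(std::move(indexes)), fields_(indexes_->size()) {}

    void set(uint32_t uds, Field f) {
        fields_[indexes_->at(uds)] = std::move(f);
    }

    // Still O(1): one hash lookup into the *shared* map, then a vector access.
    const Field& get(uint32_t uds) const {
        return fields_.at(indexes_->at(uds));
    }

private:
    std::shared_ptr<const IndexMap> indexes_;  // shared by all entries
    std::vector<Field> fields_;                // per-entry payload, compact
};
```

Since every entry holds only a pointer to the common map, the per-entry cost of the hash shrinks to roughly one pointer, while the field data itself sits in a tightly packed vector.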
> >>
> >> The patch definitely needs further testing and polishing. I probably
> >> won't get much work done until next week though.
> >>
> >> Best regards,
> >> Frank
> >
> >
> > Hi Frank,
> >
> > That is really stunning! I sadly don't have a proper testcase to test it
> > on frameworks.
> > So I made a very quick, rough test app that simply stores all entries in
> > memory.
> >
> > Doing that I get these very impressive results:
> > PRE:
> > 500,000 files: 236.208 MB
> >
> > POST (Frank):
> > 500,000 files: 118.148 MB
> >
> > You can remove another ~14 MB in both cases, which the app just has with
> > other stuff going on.
> >
> > However, looking at your new patch I can't help but notice that the hash
> > is now only used as a lookup table. Perhaps you can just get rid of that
> > now and simply loop through the vector to find the right "Field" object?
> > You would probably have to add a uint field to the Field struct in that
> > case, otherwise you wouldn't know the field key. I guess it would take a
> > bit more CPU power, since you would have to loop through all of the
> > QVector to find a value. On the other hand, you only have to loop over 8
> > items. Perhaps that is even faster than a hash lookup: even if the lookup
> > is O(1), it still has to do the hash calculation and some other stuff, as
> > opposed to just iterating over a vector and comparing whether the uint
> > key (an int!) matches.
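Mark's alternative (store the key inside each Field and scan the vector linearly) might look like this in plain C++; the names are illustrative:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Each field carries its own UDS key, so no separate lookup hash is needed.
struct KeyedField {
    uint32_t uds;
    std::string str;
    long long num;
};

// Linear O(n) lookup. For the ~8 fields a typical entry has, this can beat
// a hash lookup in practice, but it gives up the O(1) guarantee.
const KeyedField* findField(const std::vector<KeyedField>& fields,
                            uint32_t uds) {
    for (const KeyedField& f : fields)
        if (f.uds == uds)
            return &f;
    return nullptr;
}
```

As Frank notes below, the catch is the extra uint per field (plus padding) and the loss of O(1) access if some kioslave ever emits many more fields.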
>
> Yes, the QHash is just a lookup table, but removing it won't help
> anything in terms of memory usage. Note that it's most likely
> implicitly shared with loads of other UDSEntryPrivate structs, so the
> amortized memory cost per UDSEntry is just a little more than the size
> of a pointer (8 bytes on a 64-bit system).
>
> OTOH, if you store a uint with each field, that requires
>
> "number of fields" * 4 bytes (size of unit) + overhead
>
> If you store the uints in a separate container, you have a certain
> constant overhead. If you store them inside the "Field" struct, the
> overhead will probably be 4 bytes per field on a 64-bit system because
> currently the "Field" contains a QString (effectively a pointer, i.e.,
> 8 bytes) and a long long (8 bytes). If you add a 4 byte uint to that
> struct, the compiler will add 4 extra bytes to the struct to preserve
> the 8 byte alignment of the pointers [1].
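Frank's alignment argument can be checked with static_assert, assuming a typical LP64 platform (8-byte pointers and long long); the struct names here are illustrative, with void* standing in for the QString's internal data pointer:

```cpp
#include <cstdint>

struct FieldNow { void* str; long long num; };                // 8 + 8 = 16 bytes

// Adding one 4-byte uint forces 4 bytes of tail padding so the struct size
// stays a multiple of the 8-byte pointer alignment: the uint costs 8 bytes.
struct FieldKeyed { void* str; long long num; uint32_t uds; };  // 16 + 4 + 4 pad

static_assert(sizeof(FieldNow) == 16, "pointer + long long");
static_assert(sizeof(FieldKeyed) == 24, "a 4-byte member costs 8 here");
static_assert(alignof(FieldKeyed) == 8, "alignment dictated by the pointer");
```

This is exactly what the godbolt experiment in [1] shows in the assembly output, just expressed as compile-time checks.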
>
> Moreover, I'm not sure if it's safe to assume that each UDSEntry will
> only contain 8 items. There might be kioslaves which add a lot more
> now or in the future, so I'm not sure if sacrificing the O(1) access
> is a good idea.
>
> > Btw. This stuff just shows again how freaking smart you are
>
> Thanks :-) But there are also lots of programming-related areas where
> I'm a lot less smart than other people. And I should also say that
> this blog post by Milian made me aware of the implicit sharing idea, I
> don't think I would have had such ideas without reading it:
>
> http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1
>
> The only small flaw in the KMail use case is that the "implicit
> sharing pool" can in principle grow indefinitely if it's being fed
> with different strings all the time. But fortunately, this cannot
> happen with the implicit sharing implementation in my UDSEntry patch. I
> wouldn't be surprised if there were many more such opportunities in
> Qt-based applications and libraries.
>
> Cheers,
> Frank
>
> [1] By the way, a cool way to check such things quickly is
> http://gcc.godbolt.org/. Just paste the code below into the "Code
> editor", and you can immediately see the size of the different structs
> in the assembly output:
>
> struct test1
> {
>     void *v;
> };
>
> struct test2
> {
>     void *v;
>     int i;
> };
>
> struct test3
> {
>     void *v;
>     int i;
>     int j; // Depending on the platform, 'j' will not need any
>            // additional space compared to 'test2'.
> };
>
> int size1()
> {
>     return sizeof(test1);
> }
>
> int size2()
> {
>     return sizeof(test2);
> }
>
> int size3()
> {
>     return sizeof(test3);
> }
>
That stuff is really cool!

Now I see a few more possible optimizations. I wonder what your opinion is
on those.

1. Instead of storing the string as a QString, you could store it as a
QByteArray and call squeeze() after setting the string. The places that
return a QString would simply construct one with QString(...).

2. Use Google's "sparse_hash_map"; it's extremely memory efficient!
https://code.google.com/p/sparsehash/ Reading how it is implemented is also
very interesting:
https://sparsehash.googlecode.com/svn/trunk/doc/implementation.html That
was another "WOW!" moment for me. The downside: it's slower.

3. Right now the Field struct stores a QString and a long long. Surely
there must be a way to use just one type. I guess QVariant has too much
overhead to be used here, but something like that.

Just my remaining ideas, which I can't implement all of :) (OK, 1 I can,
and 2 I likely can, but 3... nope. That requires templating knowledge that
I simply don't have, if it is possible at all.)
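Idea 3 could be sketched with std::variant, a tagged union that stores exactly one of the two alternatives per Field (this is a modern-C++ illustration only; the KDE code of that era predates std::variant, and QVariant, as noted, carries more overhead):

```cpp
#include <string>
#include <variant>

// One slot instead of a QString plus a long long side by side.
using Field = std::variant<std::string, long long>;

// Render either alternative as a string, e.g. for display.
std::string fieldToString(const Field& f) {
    if (const auto* s = std::get_if<std::string>(&f))
        return *s;
    return std::to_string(std::get<long long>(f));
}
```

Note that sizeof(Field) is still roughly the size of the larger alternative plus a discriminator, so the win over the current two-member struct depends on how large std::string (or QString) is on the platform.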