List: kfm-devel
Subject: Re: UDSEntry compression ideas
From: Mark <markg85 () gmail ! com>
Date: 2013-09-20 19:31:03
Message-ID: CAPd6JnEiHLiMbywy6k3TWwVvT7e66qKXtvoYP7uuD4c+NJJguw () mail ! gmail ! com
On Fri, Sep 20, 2013 at 10:29 AM, Frank Reininghaus <
frank78ac@googlemail.com> wrote:
> Hi Mark,
>
> 2013/9/20 Mark:
> [...]
> >> > The easiest "compression" that I could think of was to force the
> >> > UDSEntries to implicitly share all their QStrings for one field if
> >> > they are all equal:
> >> >
> >> > http://paste.kde.org/p52a24b49/
> >> >
> >> > Reduces the memory consumption in Details View for 100,000 items from
> >> > 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
> >> > might have a small performance penalty. Maybe it's worth investigating
> >> > that further though.
> >>
> >> OK, so I did some performance measurements. Short version: my earlier
> >> patch does make listing directories slightly slower, but I found
> >> another way to hack UDSEntry that does apparently not make things
> >> slower, and that reduces the memory usage by about 450 bytes per
> >> listed file, without any loss of functionality.
> >>
> >> Long version:
> >>
> >> (a) First of all, I took the current master branches of kdelibs and
> >> kde-baseapps (all built in release mode) and added some profiling
> >> output:
> >>
> >> http://paste.kde.org/p9b9b7b68/
> >> http://paste.kde.org/p992af44c/
> >>
> >> Then I opened a folder with 100,000 files in Dolphin in Icons View.
> >>
> >> According to KSysGuard, the memory usage of the dolphin process is 200.6
> >> MB.
> >>
> >> For the performance measurements, I discarded the first run and went
> >> out of the folder and back twice to get reproducible results (the time
> >> spent per item in the first run fluctuated strongly).
> >>
> >> The time that SlaveInterface::dispatch(int _cmd, const QByteArray
> >> &rawdata) needs per item was between 2.6 and 2.8 microseconds.
> >>
> >>
> >> (b) My first patch that only makes use of implicit sharing of the
> >> QStrings:
> >>
> >> http://paste.kde.org/p52a24b49/
> >>
> >> Memory usage drops to 187.9 MB, but the time spent per item increases
> >> to 2.9 to 3.1 microseconds (not surprising, I think).
> >>
> >>
> >> (c) My new patch, that makes more intrusive changes to the way the
> >> data is stored inside UDSEntry (explanations below):
> >>
> >> http://paste.kde.org/pc85e6b34/
> >>
> >> Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9
> >> microseconds.
> >>
> >>
> >> This means that we can save more than 450 bytes for each file, which
> >> is IMHO a lot.
> >>
> >>
> >>
> >> How does it work?
> >>
> >> The basic observation is that UDSEntry currently stores all data in a
> >> QHash<uint, Field>, and that a QHash is not really space-efficient
> >> (see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
> >> idea how much memory is needed to store the data inside a QHash).
> >> However, it has the nice property that the "Field" that an "uint" is
> >> mapped to can be accessed in O(1) time.
> >>
> >> My idea to improve this and keep the O(1) access is the following: We
> >> store all fields in UDSEntryPrivate in a simple
> >>
> >> QVector<Field> fields
> >>
> >> in the order in which the fields are read from the stream. We remember
> >> which UDS field comes at which position and store this in a
> >>
> >> QHash<uint, int> udsIndexes
> >>
> >> (uint is the type of the "UDS field", and a QVector uses "int" for
> >> indexing, but there are still a few signed/unsigned warnings with my
> >> patch which need investigation).
> >>
> >> Then we can access the "Field" that corresponds to a uint "uds" with
> >>
> >> d->fields.at(d->udsIndexes.value(uds)).
> >>
> >> Now the obvious question is: "How on earth can this be more efficient
> >> than the current solution? There is still a QHash in each
> >> UDSEntryPrivate after all, and the code gets more complicated!"
> >>
> >> The trick is that the mapping from the "uint uds" to the index of the
> >> corresponding "Field" is usually exactly the same for every UDSEntry
> >> if we read many of them in a row, and we can make all UDSEntryPrivates
> >> share this mapping (QHash is implicitly shared). This means that every
> >> UDSEntryPrivate only has a private QVector<Field>, which only needs a
> >> rather small overhead compared to the memory that the Fields
> >> themselves need.
> >>
> >> The patch definitely needs further testing and polishing. I probably
> >> won't get much work done until next week though.
> >>
> >> Best regards,
> >> Frank
> >
> >
> > Hi Frank,
> >
> > That is really stunning! Sadly, I don't have a proper test case to test
> > it on frameworks, so I made a very quick and rough test app that simply
> > stores all entries in memory.
> >
> > Doing that, I get these very impressive results:
> > PRE:
> > 500.000 files: 236,208MB
> >
> > POST (Frank):
> > 500.000 files: 118,148MB
> >
> > In both cases you can subtract another ~14 MB that the app uses for
> > other stuff going on.
> >
> > However, looking at your new patch, I can't help but notice that the
> > hash is now only used as a lookup table. Perhaps you can just get rid
> > of it now and simply loop through the vector to find the right "Field"
> > object? You would probably have to add a uint field to the Field struct
> > in that case, otherwise you wouldn't know the field key. I guess it
> > would take a bit more CPU power, since you would have to loop through
> > the whole QVector to find a value. On the other hand, you only have to
> > loop over about 8 items. Perhaps that is even faster than a hash
> > lookup: even if the lookup is O(1), it still has to do the hash
> > calculation and some other stuff, as opposed to just iterating over a
> > vector and comparing whether the uint key (an int!) matches.
>
> Yes, the QHash is just a lookup table, but removing it won't help
> anything in terms of memory usage. Note that it's most likely
> implicitly shared with loads of other UDSEntryPrivate structs, so the
> amortized memory cost per UDSEntry is just a little more than the size
> of a pointer (8 bytes on a 64-bit system).
>
> OTOH, if you store a uint with each field, that requires
>
> "number of fields" * 4 bytes (size of uint) + overhead
>
> If you store the uints in a separate container, you have a certain
> constant overhead. If you store them inside the "Field" struct, the
> overhead will probably be 4 bytes per field on a 64-bit system because
> currently the "Field" contains a QString (effectively a pointer, i.e.,
> 8 bytes) and a long long (8 bytes). If you add a 4 byte uint to that
> struct, the compiler will add 4 extra bytes to the struct to preserve
> the 8 byte alignment of the pointers [1].
>
> Moreover, I'm not sure if it's safe to assume that each UDSEntry will
> only contain 8 items. There might be kioslaves which add a lot more
> now or in the future, so I'm not sure if sacrificing the O(1) access
> is a good idea.
>
> > Btw. This stuff just shows again how freaking smart you are
>
> Thanks :-) But there are also lots of programming-related areas where
> I'm a lot less smart than other people. And I should also say that
> this blog post by Milian made me aware of the implicit sharing idea, I
> don't think I would have had such ideas without reading it:
>
> http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1
>
> The only small flaw in the KMail use case is that the "implicit
> sharing pool" can in principle grow indefinitely if it's being fed
> with different strings all the time. But fortunately, this cannot
> happen with the implicit sharing implementation in my UDSEntry patch. I
> wouldn't be surprised if there were many more such opportunities in
> Qt-based applications and libraries.
>
> Cheers,
> Frank
>
> [1] By the way, a cool way to check such things quickly is
> http://gcc.godbolt.org/. Just paste the code below into the "Code
> editor", and you can immediately see the size of the different structs
> in the assembly output:
>
> struct test1
> {
>     void *v;
> };
>
> struct test2
> {
>     void *v;
>     int i;
> };
>
> struct test3
> {
>     void *v;
>     int i;
>     int j; // Depending on the platform, 'j' will not need any
>            // additional space compared to 'test2'.
> };
>
> int size1()
> {
>     return sizeof(test1);
> }
>
> int size2()
> {
>     return sizeof(test2);
> }
>
> int size3()
> {
>     return sizeof(test3);
> }
>
That stuff is really cool!

Now I see a few more possible optimizations. I wonder what your opinion is
on those.

1. Instead of storing the string as a QString, you could store it as a
QByteArray and call squeeze() after setting the string. The places that
return a QString would simply construct one with QString(...).
2. Use Google's "sparse_hash_map"; it's extremely memory efficient:
https://code.google.com/p/sparsehash/. Reading how it is implemented is
also very interesting:
https://sparsehash.googlecode.com/svn/trunk/doc/implementation.html
That was another "WOW!" moment for me. The downside: it's slower.
3. Right now the Field struct stores a QString and a long long. Surely
there must be a way to use just one type. I guess QVariant has too much
overhead to be used here, but something along those lines.
Those are just my remaining ideas, which I can't implement all by myself :)
(OK, 1 I can, 2 I probably can, but 3... nope. That requires templating
knowledge that I simply don't have, if it's possible at all.)