[prev in list] [next in list] [prev in thread] [next in thread]
List: kfm-devel
Subject: Re: UDSEntry compression ideas
From: Mark <markg85 () gmail ! com>
Date: 2013-09-20 1:02:08
Message-ID: CAPd6JnF4O5sF4HAKjxhyabKRDk1KN_nqsZvriM-2owfRhY+2Dg () mail ! gmail ! com
[Download RAW message or body]
On Fri, Sep 20, 2013 at 12:32 AM, Frank Reininghaus <
frank78ac@googlemail.com> wrote:
> Hi,
>
> 2013/9/17 Frank Reininghaus:
> > Hi Mark,
> >
> > I'm too tired for a longer reply already, but here is a quick idea
> > that I just had and tried immediately.
> >
> > 2013/9/16 Mark:
> >> Hi All,
> >>
> >> please read this as a brainstorm post with some ideas. I have no code
> >> working in this area yet.
> >>
> >> Lazy loading is awesome, but if you can compress the data enough that
> >> you still have all details without lazy loading, that is even better,
> >> I think. For example, doing a directory listing on any given folder
> >> gives at least the following UDSEntry values:
> >> - Device ID
> >> - Inode
> >> - User
> >> - Group
> >>
> >> In the "experimental" case where you list the contents of a folder
> >> that has just a bunch of files created by one user, you can say that
> >> at least the above 4 properties are the same for all files. However,
> >> the experimental situation certainly doesn't hold for "real data",
> >> where you can have files from multiple users and folders as well.
> >
> > Well, I guess that most people have quite a few folders where the
> > "User" and "Group" are the same for all files ;-)
> >
> > The easiest "compression" that I could think of was to force the
> > UDSEntries to implicitly share all their QStrings for one field if
> > they are all equal:
> >
> > http://paste.kde.org/p52a24b49/
> >
> > Reduces the memory consumption in Details View for 100,000 items from
> > 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
> > might have a small performance penalty. Maybe it's worth investigating
> > that further though.
>
> OK, so I did some performance measurements. Short version: my earlier
> patch does make listing directories slightly slower, but I found
> another way to hack UDSEntry that apparently does not make things
> slower, and that reduces the memory usage by about 450 bytes per
> listed file, without any loss of functionality.
>
> Long version:
>
> (a) First of all, I took the current master branches of kdelibs and
> kde-baseapps (all built in release mode) and added some profiling
> output:
>
> http://paste.kde.org/p9b9b7b68/
> http://paste.kde.org/p992af44c/
>
> Then I opened a folder with 100,000 files in Dolphin in Icons View.
>
> According to KSysGuard, the memory usage of the dolphin process is 200.6
> MB.
>
> For the performance measurements, I discarded the first run and went
> out of the folder and back twice to get reproducible results (the time
> spent per item in the first run fluctuated strongly).
>
> The time that SlaveInterface::dispatch(int _cmd, const QByteArray
> &rawdata) needs per item was between 2.6 and 2.8 microseconds.
>
>
> (b) My first patch that only makes use of implicit sharing of the QStrings:
>
> http://paste.kde.org/p52a24b49/
>
> Memory usage drops to 187.9 MB, but the time spent per item increases
> to 2.9 to 3.1 microseconds (not surprising, I think).
>
>
> (c) My new patch, that makes more intrusive changes to the way the
> data is stored inside UDSEntry (explanations below):
>
> http://paste.kde.org/pc85e6b34/
>
> Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9
> microseconds.
>
>
> This means that we can save more than 450 bytes for each file, which
> is IMHO a lot.
>
>
>
> How does it work?
>
> The basic observation is that UDSEntry currently stores all data in a
> QHash<uint, Field>, and that a QHash is not really space-efficient
> (see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
> idea how much memory is needed to store the data inside a QHash).
> However, it has the nice property that the "Field" that an "uint" is
> mapped to can be accessed in O(1) time.
>
> My idea to improve this and keep the O(1) access is the following: We
> store all fields in UDSEntryPrivate in a simple
>
> QVector<Field> fields
>
> in the order in which the fields are read from the stream. We remember
> which UDS field comes at which position and store this in a
>
> QHash<uint, int> udsIndexes
>
> (uint is the type of the "UDS field", and a QVector uses "int" for
> indexing, but there are still a few signed/unsigned warnings with my
> patch which need investigation).
>
> Then we can access a "Field" that corresponds to a uint "uds" with
>
> d->fields.at(d->udsIndexes.value(uds)).
>
> Now the obvious question is: "How on earth can this be more efficient
> than the current solution? There is still a QHash in each
> UDSEntryPrivate after all, and the code gets more complicated!"
>
> The trick is that the mapping from the "uint uds" to the index of the
> corresponding "Field" is usually exactly the same for every UDSEntry
> if we read many of them in a row, and we can make all UDSEntryPrivates
> share this mapping (QHash is implicitly shared). This means that every
> UDSEntryPrivate only has a private QVector<Field>, which only needs a
> rather small overhead compared to the memory that the Fields
> themselves need.
>
> The patch definitely needs further testing and polishing. I probably
> won't get much work done until next week though.
>
> Best regards,
> Frank
>
Hi Frank,
That is really stunning! Sadly, I don't have a proper test case to test it
on frameworks, so I made a very quick, rough test app that simply stores
all entries in memory.
Doing that, I get these very impressive results:
PRE:
500,000 files: 236.208 MB
POST (Frank):
500,000 files: 118.148 MB
You can subtract another ~14 MB in both cases, which the app uses for
other stuff going on.
However, looking at your new patch, I can't help but notice that the hash
is now only used as a lookup table. Perhaps you can just get rid of it and
simply loop through the vector to find the right "Field" object? You would
probably have to add a uint field to the Field struct in that case,
otherwise you wouldn't know the field key. I guess it would take a bit
more CPU power, since you would have to loop through the whole QVector to
find a value. On the other hand, you only have to loop over ~8 items.
Perhaps that is even faster than a hash lookup: even if the lookup is
O(1), it still has to do the hash calculation and some other stuff, as
opposed to just iterating over a vector and comparing whether the uint
key matches.
Btw, this stuff just shows again how freaking smart you are and how much I
still have to learn in programming :) I guess it's a good thing that I
ordered this book: "Data Structures and Algorithm Analysis in C++ (4th
edition)": http://users.cis.fiu.edu/~weiss/