
List:       kfm-devel
Subject:    Re: UDSEntry compression ideas
From:       Mark <markg85 () gmail ! com>
Date:       2013-09-20 1:02:08
Message-ID: CAPd6JnF4O5sF4HAKjxhyabKRDk1KN_nqsZvriM-2owfRhY+2Dg () mail ! gmail ! com

On Fri, Sep 20, 2013 at 12:32 AM, Frank Reininghaus <
frank78ac@googlemail.com> wrote:

> Hi,
>
> 2013/9/17 Frank Reininghaus:
> > Hi Mark,
> >
> > I'm too tired for a longer reply already, but here is a quick idea
> > that I just had and tried immediately.
> >
> > 2013/9/16 Mark:
> >> Hi All,
> >>
> >> please read this as a brainstorm post with some ideas. I have no code
> >> working in this area yet.
> >>
> >> Lazy loading is awesome, but if you can compress the data enough that
> >> you still have all details without lazy loading, that is even better,
> >> I think. For example, doing a directory listing on any given folder
> >> gives at least the following UDSEntry values:
> >> - Device ID
> >> - Inode
> >> - User
> >> - Group
> >>
> >> In the "experimental" case where you list the contents of a folder
> >> that has just a bunch of files created by one user, you can say that
> >> at least the above 4 properties are the same for all files. However,
> >> the experimental situation certainly doesn't hold for "real data",
> >> where you can have files from multiple users and folders as well.
> >
> > Well, I guess that most people have quite a few folders where the
> > "User" and "Group" are the same for all files ;-)
> >
> > The easiest "compression" that I could think of was to force the
> > UDSEntries to implicitly share all their QStrings for one field if
> > they are all equal:
> >
> > http://paste.kde.org/p52a24b49/
> >
> > Reduces the memory consumption in Details View for 100,000 items from
> > 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
> > might have a small performance penalty. Maybe it's worth investigating
> > that further though.
>
> OK, so I did some performance measurements. Short version: my earlier
> patch does make listing directories slightly slower, but I found
> another way to hack UDSEntry that apparently does not make things
> slower, and that reduces the memory usage by about 450 bytes per
> listed file, without any loss of functionality.
>
> Long version:
>
> (a) First of all, I took the current master branches of kdelibs and
> kde-baseapps (all built in release mode) and added some profiling
> output:
>
> http://paste.kde.org/p9b9b7b68/
> http://paste.kde.org/p992af44c/
>
> Then I opened a folder with 100,000 files in Dolphin in Icons View.
>
> According to KSysGuard, the memory usage of the dolphin process is 200.6
> MB.
>
> For the performance measurements, I discarded the first run and went
> out of the folder and back twice to get reproducible results (the time
> spent per item in the first run fluctuated strongly).
>
> The time that SlaveInterface::dispatch(int _cmd, const QByteArray
> &rawdata) needs per item was between 2.6 and 2.8 microseconds.
>
>
> (b) My first patch that only makes use of implicit sharing of the QStrings:
>
> http://paste.kde.org/p52a24b49/
>
> Memory usage drops to 187.9 MB, but the time spent per item increases
> to 2.9-3.1 microseconds (not surprising, I think).
>
>
> (c) My new patch, which makes more intrusive changes to the way the
> data is stored inside UDSEntry (explanations below):
>
> http://paste.kde.org/pc85e6b34/
>
> Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9
> microseconds.
>
>
> This means that we can save more than 450 bytes for each file, which
> is IMHO a lot.
>
>
>
> How does it work?
>
> The basic observation is that UDSEntry currently stores all data in a
> QHash<uint, Field>, and that a QHash is not really space-efficient
> (see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
> idea how much memory is needed to store the data inside a QHash).
> However, it has the nice property that the "Field" that a "uint" is
> mapped to can be accessed in O(1) time.
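>
> As a minimal sketch of the current layout (simplified; assuming a
> Field that holds a QString and a long long, which is roughly what the
> real code uses):
>
>     #include <QHash>
>     #include <QString>
>
>     struct Field {
>         QString m_str;    // string value, if the field has one
>         long long m_long; // numeric value, if the field has one
>     };
>
>     // One hash node per field and per entry: each node pays the QHash
>     // bucket and pointer overhead on top of the Field payload itself.
>     QHash<uint, Field> fields;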
>
> My idea to improve this and keep the O(1) access is the following: We
> store all fields in UDSEntryPrivate in a simple
>
> QVector<Field> fields
>
> in the order in which the fields are read from the stream. We remember
> which UDS field comes at which position and store this in a
>
> QHash<uint, int> udsIndexes
>
> (uint is the type of the "UDS field", and a QVector uses "int" for
> indexing, but there are still a few signed/unsigned warnings with my
> patch which need investigation).
>
> Then we can access a "Field" that corresponds to a uint "uds" with
>
> d->fields.at(d->udsIndexes.value(uds)).
>
> Now the obvious question is: "How on earth can this be more efficient
> than the current solution? There is still a QHash in each
> UDSEntryPrivate after all, and the code gets more complicated!"
>
> The trick is that the mapping from the "uint uds" to the index of the
> corresponding "Field" is usually exactly the same for every UDSEntry
> if we read many of them in a row, and we can make all UDSEntryPrivates
> share this mapping (QHash is implicitly shared). This means that every
> UDSEntryPrivate only has a private QVector<Field>, which only needs a
> rather small overhead compared to the memory that the Fields
> themselves need.
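>
> To make the layout concrete, here is a rough sketch of what
> UDSEntryPrivate could look like with this scheme (simplified and
> illustrative, reusing the Field struct from the sketch above):
>
>     #include <QHash>
>     #include <QSharedData>
>     #include <QVector>
>
>     class UDSEntryPrivate : public QSharedData
>     {
>     public:
>         // Shared across all entries of one listing: QHash is
>         // implicitly shared, so copying it from entry to entry only
>         // bumps a reference count; the mapping exists once in memory.
>         QHash<uint, int> udsIndexes; // UDS field -> index into 'fields'
>
>         // Per-entry payload, in the order it was read from the stream.
>         QVector<Field> fields;
>
>         Field field(uint uds) const
>         {
>             // Still O(1): one hash lookup plus one vector access
>             // (assumes the entry actually contains the field).
>             return fields.at(udsIndexes.value(uds));
>         }
>     };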
>
> The patch definitely needs further testing and polishing. I probably
> won't get much work done until next week though.
>
> Best regards,
> Frank
>

Hi Frank,

That is really stunning! Sadly, I don't have a proper test case to test
it on frameworks, so I made a very quick and rough test app that simply
stores all entries in memory.
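
Roughly along these lines (paraphrased from memory rather than the
exact code; the includes and the connect syntax depend on which
kdelibs/frameworks snapshot you build against):

    #include <QCoreApplication>
    #include <QList>
    #include <QString>
    #include <QUrl>
    #include <kio/job.h>        // KIO::listDir
    #include <kio/jobclasses.h> // KIO::ListJob
    #include <kio/udsentry.h>   // KIO::UDSEntry, KIO::UDSEntryList

    // Keep every UDSEntry alive so that the memory usage of the
    // process reflects the per-entry storage cost.
    static QList<KIO::UDSEntry> allEntries;

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);
        const QString path = argc > 1 ? QString::fromLocal8Bit(argv[1])
                                      : QStringLiteral(".");

        KIO::ListJob *job = KIO::listDir(QUrl::fromLocalFile(path));
        QObject::connect(job, &KIO::ListJob::entries,
                         [](KIO::Job *, const KIO::UDSEntryList &entries) {
                             allEntries += entries; // just store everything
                         });
        QObject::connect(job, &KIO::ListJob::result,
                         &app, &QCoreApplication::quit);
        return app.exec();
    }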

Doing that, I get these very impressive results:

PRE:
500,000 files: 236.208 MB

POST (Frank):
500,000 files: 118.148 MB

You can subtract another ~14 MB in both cases; that is just what the app
uses for other stuff anyway.

However, looking at your new patch, I can't help but notice that the
hash is now only used as a lookup table. Perhaps you can get rid of it
entirely and simply loop through the vector to find the right "Field"
object? You would probably have to add a uint key to the Field struct in
that case, because otherwise you wouldn't know which UDS field a given
entry belongs to. I guess it would take a bit more CPU power, since you
would have to loop through the whole QVector to find a value. On the
other hand, you typically only have to loop over about 8 items. Perhaps
that is even faster than a hash lookup: even though the lookup is O(1),
it still has to compute the hash and do some other work, as opposed to
just iterating over a vector and comparing whether the uint key (a plain
integer!) matches.
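
Something like this sketch is what I mean (untested, and it assumes a
uint key member is added to Field, as described above):

    #include <QString>
    #include <QVector>

    struct Field {
        uint m_uds;       // the UDS field number, stored in the Field
        QString m_str;
        long long m_long;
    };

    // Linear scan instead of a hash lookup: for a handful of fields, a
    // plain loop of integer comparisons may well beat computing a hash.
    const Field *findField(const QVector<Field> &fields, uint uds)
    {
        for (int i = 0; i < fields.size(); ++i) {
            if (fields.at(i).m_uds == uds) {
                return &fields.at(i);
            }
        }
        return 0; // this entry does not have that field
    }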

Btw, this stuff just shows again how freaking smart you are and how much
I still have to learn in programming :) I guess it's a good thing that I
ordered this book: "Data Structures and Algorithm Analysis in C++ (4th
edition)": http://users.cis.fiu.edu/~weiss/
