On Fri, Sep 20, 2013 at 12:32 AM, Frank Reininghaus <frank78ac@googlemail.com> wrote:
Hi,

2013/9/17 Frank Reininghaus:
> Hi Mark,
>
> I'm too tired for a longer reply already, but here is a quick idea
> that I just had and tried immediately.
>
> 2013/9/16 Mark:
>> Hi All,
>>
>> please read this as a brainstorm post with some ideas. I have no code
>> working in this area yet.
>>
>> Lazy loading is awesome, but if you can compress the data enough that you
>> still have all details without lazy loading, that is even better I think. For
>> example, doing a directory listing on any given folder gives at least the
>> following UDSEntry values:
>> - Device ID
>> - Inode
>> - User
>> - Group
>>
>> In the "experimental" case where you list the folder con= tents that has just
>> a bunch of files created by one user you can say that at least the= above 4
>> properties are the same for all files. However, the experimental s= ituation
>> certainly doesn't work on "real data" where you can = have files from multiple
>> users and folders as well.
>
> Well, I guess that most people have quite a few folders where the
> "User" and "Group" are the same for all files ;-)<= br> >
> The easiest "compression" that I could think of was to force= the
> UDSEntries to implicitly share all their QStrings for one field if
> they are all equal:
>
> http://paste.kde.org/p52a24b49/
>
> Reduces the memory consumption in Details View for 100,000 items from
> 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
> might have a small performance penalty. Maybe it's worth investigating
> that further though.
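
For illustration, the "forcing" could boil down to a small cache along these lines -- a hand-written sketch, not taken from the actual patch; the helper name is made up and thread-safety is ignored:

    #include <QHash>
    #include <QString>

    // Sketch: return a shared instance for equal strings, so that all
    // UDSEntries storing e.g. the same user name point to one QString buffer.
    static QString sharedString(const QString &value)
    {
        static QHash<QString, QString> cache;
        QHash<QString, QString>::iterator it = cache.find(value);
        if (it == cache.end()) {
            it = cache.insert(value, value);
        }
        return it.value();
    }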

OK, so I did some performance measurements. Short version: my earlier
patch does make listing directories slightly slower, but I found
another way to hack UDSEntry that does apparently not make things
slower, and that reduces the memory usage by about 450 bytes per
listed file, without any loss of functionality.

Long version:

(a) First of all, I took the current master branches of kdelibs and
kde-baseapps (all built in release mode) and added some profiling
output:

http://paste.kde.org/p9b9b7b68/
http://paste.kde.org/p992af44c/

Then I opened a folder with 100,000 files in Dolphin in Icons View.

According to KSysGuard, the memory usage of the dolphin process is 200.6 MB.

For the performance measurements, I discarded the first run and went
out of the folder and back twice to get reproducible results (the time
spent per item in the first run fluctuated strongly).

The time that SlaveInterface::dispatch(int _cmd, const QByteArray
&rawdata) needs per item was between 2.6 and 2.8 microseconds.


(b) My first patch that only makes use of implicit sharing of the QStrings:

http://paste.kde.org/p52a24b49/

Memory usage drops to 187.9 MB, but the time spent per item increases
to 2.9 to 3.1 microseconds (not surprising, I think).


(c) My new patch, that makes more intrusive changes to the way the
data is stored inside UDSEntry (explanations below):

http://paste.kde.org/pc85e6b34/

Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9 microseconds.


This means that we can save more than 450 bytes for each file, which
is IMHO a lot.



How does it work?

The basic observation is that UDSEntry currently stores all data in a
QHash<uint, Field>, and that a QHash is not really space-efficient
(see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
idea how much memory is needed to store the data inside a QHash).
However, it has the nice property that the "Field" that an "uint" is
mapped to can be accessed in O(1) time.
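
Roughly, the current layout boils down to this (a simplified sketch, not the exact kdelibs code):

    #include <QHash>
    #include <QSharedData>
    #include <QString>

    // Simplified: the real Field holds either a string or a numeric value.
    struct Field {
        QString m_str;
        long long m_long;
    };

    class UDSEntryPrivate : public QSharedData
    {
    public:
        QHash<uint, Field> fields;   // one heap-allocated hash node per stored field
    };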

My idea to improve this and keep the O(1) access is the following: We
store all fields in UDSEntryPrivate in a simple

QVector<Field> fields

in the order in which the fields are read from the stream. We remember
which UDS field comes at which position and store this in a

QHash<uint, int> udsIndexes

(uint is the type of the "UDS field", and a QVector uses "int" for
indexing, but there are still a few signed/unsigned warnings with my
patch which need investigation).

Then we can access a "Field" that corresponds to a uint "uds" with

d->fields.at(d->udsIndexes.value(field)).

Now the obvious question is: "How on earth can this be more efficient
than the current solution? There is still a QHash in each
UDSEntryPrivate after all, and the code gets more complicated!"

The trick is that the mapping from the "uint uds" to the index of the
corresponding "Field" is usually exactly the same for every UDSEntry
if we read many of them in a row, and we can make all UDSEntryPrivates
share this mapping (QHash is implicitly shared). This means that every
UDSEntryPrivate only has a private QVector<Field>, which only needs a
rather small overhead compared to the memory that the Fields
themselves need.
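
Putting the pieces together, the new layout and the sharing trick could look roughly like this (again a hand-written sketch for illustration, not the actual patch; Field is the simplified struct sketched above, and assignIndexes() is a made-up helper):

    #include <QVector>

    class UDSEntryPrivate : public QSharedData
    {
    public:
        QVector<Field> fields;        // field values, in the order read from the stream
        QHash<uint, int> udsIndexes;  // UDS field -> index into 'fields'
    };

    // Lookup stays O(1): d->fields.at(d->udsIndexes.value(uds))

    // Consecutive entries of one directory listing almost always contain the
    // same UDS fields in the same order, so the index hash built for one entry
    // can be reused by the next. QHash is implicitly shared, so the assignment
    // below copies a pointer, not the hash nodes.
    static QHash<uint, int> s_lastIndexes;   // illustration only

    static void assignIndexes(UDSEntryPrivate *d, const QVector<uint> &udsFields)
    {
        bool sameLayout = (s_lastIndexes.size() == udsFields.size());
        for (int i = 0; sameLayout && i < udsFields.size(); ++i) {
            sameLayout = (s_lastIndexes.value(udsFields.at(i), -1) == i);
        }
        if (sameLayout) {
            d->udsIndexes = s_lastIndexes;    // shared with the previous entries
        } else {
            QHash<uint, int> indexes;
            for (int i = 0; i < udsFields.size(); ++i) {
                indexes.insert(udsFields.at(i), i);
            }
            d->udsIndexes = indexes;
            s_lastIndexes = indexes;          // remember the layout for the next entry
        }
    }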

The patch definitely needs further testing and polishing. I probably
won't get much work done until next week though.

Best regards,
Frank

Hi Frank,

That is really stunning! I sadly don't have a proper testcase to test it on frameworks.
So I made a very quick rough test app that simply stores all entries in memory.

Doing that I get these very impressive results:
PRE:
500,000 files: 236.208 MB

POST (Frank):
500,000 files: 118.148 MB

You can remove another ~14 MB in both cases, which the app just uses for other stuff going on.
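
(Subtracting those ~14 MB, that is about 222 MB / 500,000 = ~440 bytes per entry before and about 104 MB / 500,000 = ~210 bytes per entry after your patch, i.e. roughly 230 bytes saved per UDSEntry in this synthetic test.)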

However, looking at your new patch I can't help but notice that the hash is now only used as a lookup table. Perhaps you can just get rid of it now and simply loop through the vector to find the right "Field" object? You would probably have to add a uint key to the Field struct in that case, otherwise you wouldn't know which UDS field an entry belongs to. I guess it would take a bit more CPU power, since you would have to loop through the whole QVector to find a value. On the other hand, you only have to loop over about 8 items, so perhaps that is even faster than a hash lookup: even though the lookup is O(1), it still has to compute the hash and do some other work, as opposed to just iterating over a vector and comparing whether the uint key matches (a plain integer comparison).
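
Something like this is what I have in mind (just a sketch; the extra uds member is the key that would need to be added, and findField() is a made-up helper):

    #include <QString>
    #include <QVector>

    // Field extended with its UDS field number as the key.
    struct Field {
        uint uds;
        QString m_str;
        long long m_long;
    };

    // With only ~8 fields per entry, a linear scan over the small vector might
    // well beat the hash lookup, despite being O(n) on paper.
    static const Field *findField(const QVector<Field> &fields, uint uds)
    {
        for (int i = 0; i < fields.size(); ++i) {
            if (fields.at(i).uds == uds) {
                return &fields.at(i);
            }
        }
        return 0;   // field not present in this entry
    }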

Btw, this stuff just shows again how freaking smart you are and how much I still have to learn in programming :) I guess it's a good thing that I ordered this book: "Data Structures and Algorithm Analysis in C++ (4th edition)": http://users.cis.fiu.edu/~weiss/