From kfm-devel  Thu Oct 25 21:00:23 2012
From: Christoph Feck <christoph () maxiom ! de>
Date: Thu, 25 Oct 2012 21:00:23 +0000
To: kfm-devel
Subject: Re: Ideas for improving sorting. Goal: under 1 second for 200.000 files.
Message-Id: <201210252300.23951.christoph () maxiom ! de>
X-MARC-Message: https://marc.info/?l=kfm-devel&m=135119902825853

On Thursday 25 October 2012 20:24:23 Mark wrote:
> On Thu, Oct 25, 2012 at 2:09 PM, Christoph Feck=20
<christoph@maxiom.de> wrote:
> >         'a' ->  'a'
> >         'b' -> 'b'
> >         '=E4' -> 'a' '\0377'
> >=20
> > The appended 0xFF character ensures that '=E4' is always after 'a',
> > regardless of which other character follows. But it will never be
> > after 'b', because of the first 'a' in the sort key.
>=20
> This is where things get odd... So, i should place '=E4' before 'a',
> but that means that i have to make a translation table somewhere
> from latin chars to alphabet chars.. Or am i misunderstanding
> something here?

You are right, there need to be some tables, but as I wrote, you do=20
not bother, because you can use strxform() from glibc for this (or=20
libicu, which is what QCollator is going to use). Those libraries=20
already load locale-specific tables (every wondered, why "glibc-
locale" package is several hundred megabytes big? ;) Simply check the=20
source code of QString::localeAwareCompare(), it's no black magic to=20
call glibc.

> So how about this stuff in the implementations side?
> Am i right that i'm going to get 2 new parts:
>=20
> 1. A part for building up the collation sequence. That part is
> outside the sorting algorithm.

Yes. That's why Frank said this should be in kdelibs, right besides=20
the "naturalCompare" function. Looking at the QCollator link for=20
future Q5, we should model the function in kdelibs the same way, so=20
that it simply can be "moved" to Qt in frameworks.

> 2. The sorting algorithm becomes a LOT less code and simply does
> string compares based on the collation sequences. So no more
> difficult checks like are done now. Thus the sorting would indeed
> speed up massively, right?

Well, the comparing gets massively faster, but the actual sorting=20
algorithm gets a bit more complicated. First, you have to create a=20
temporary SortEntry list, create the keys, then sort the side-
information, and after that, have to use that sorted list to do the=20
actual model sorting. But those additional operations are only O(N),=20
not O(N*log N), which is why we expect it to be very worth for large=20
models.

> For Todd and Christoph, thanks a lot for writing such long
> informative replies! I really appreciate that and it really helps
> me understand how to implement this. Though i obviously still have
> questions ;)

Wer nicht fragt, bleibt dumm ;)

Christoph Feck (kdepepo)