From kfm-devel Thu Oct 25 21:00:23 2012 From: Christoph Feck Date: Thu, 25 Oct 2012 21:00:23 +0000 To: kfm-devel Subject: Re: Ideas for improving sorting. Goal: under 1 second for 200.000 files. Message-Id: <201210252300.23951.christoph () maxiom ! de> X-MARC-Message: https://marc.info/?l=kfm-devel&m=135119902825853 On Thursday 25 October 2012 20:24:23 Mark wrote: > On Thu, Oct 25, 2012 at 2:09 PM, Christoph Feck=20 wrote: > > 'a' -> 'a' > > 'b' -> 'b' > > '=E4' -> 'a' '\0377' > >=20 > > The appended 0xFF character ensures that '=E4' is always after 'a', > > regardless of which other character follows. But it will never be > > after 'b', because of the first 'a' in the sort key. >=20 > This is where things get odd... So, i should place '=E4' before 'a', > but that means that i have to make a translation table somewhere > from latin chars to alphabet chars.. Or am i misunderstanding > something here? You are right, there need to be some tables, but as I wrote, you do=20 not bother, because you can use strxform() from glibc for this (or=20 libicu, which is what QCollator is going to use). Those libraries=20 already load locale-specific tables (every wondered, why "glibc- locale" package is several hundred megabytes big? ;) Simply check the=20 source code of QString::localeAwareCompare(), it's no black magic to=20 call glibc. > So how about this stuff in the implementations side? > Am i right that i'm going to get 2 new parts: >=20 > 1. A part for building up the collation sequence. That part is > outside the sorting algorithm. Yes. That's why Frank said this should be in kdelibs, right besides=20 the "naturalCompare" function. Looking at the QCollator link for=20 future Q5, we should model the function in kdelibs the same way, so=20 that it simply can be "moved" to Qt in frameworks. > 2. The sorting algorithm becomes a LOT less code and simply does > string compares based on the collation sequences. So no more > difficult checks like are done now. Thus the sorting would indeed > speed up massively, right? Well, the comparing gets massively faster, but the actual sorting=20 algorithm gets a bit more complicated. First, you have to create a=20 temporary SortEntry list, create the keys, then sort the side- information, and after that, have to use that sorted list to do the=20 actual model sorting. But those additional operations are only O(N),=20 not O(N*log N), which is why we expect it to be very worth for large=20 models. > For Todd and Christoph, thanks a lot for writing such long > informative replies! I really appreciate that and it really helps > me understand how to implement this. Though i obviously still have > questions ;) Wer nicht fragt, bleibt dumm ;) Christoph Feck (kdepepo)