[prev in list] [next in list] [prev in thread] [next in thread]
List: kde-core-devel
Subject: Re: Review Request 110262: KRandomSequence::randomize: use the Fisher-Yates Algorithm to achieve O(N
From: "Frank Reininghaus" <frank78ac () googlemail ! com>
Date: 2013-05-02 12:52:43
Message-ID: 20130502125243.27858.82092 () vidsolbach ! de
[Download RAW message or body]
> On May 1, 2013, 9:03 p.m., Parker Coates wrote:
> > Since KRandomSequence is a class for generating a predictable random sequence, is \
> > it not possible that this change would break applications that relied on the code \
> > below producing a consistent result between kdelibs releases?
> > QList list = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
> > KRandomSequence pseudoRand( 123545 );
> > pseudoRand.randomize( list );
> > return list;
> >
> >
> >
>
> Frank Reininghaus wrote:
> I had thought about that as well, this is the reason why I propose to push that to \
> master only, and not to KDE/4.10. Even though I think that a design which relies on \
> the order of the items in a randomized sequence might be questionable in an \
> application, it might make sense in unit tests. If the patch is pushed to master \
> only, it would give people some time to react if anything breaks. At first sight, I \
> did not notice anything in http://lxr.kde.org/ident?i=randomize that looks \
> dangerous though.
> On the other hand, if people are worried that this change might break things, then \
> we could just deprecate the existing method and implement Fisher-Yates in a new one \
> (maybe KRandomSequence::shuffle ?). The disadvantage would be that every \
> application that uses the old method needs to be ported to the new one in order to \
> stop wasting CPU cycles and prevent freezes (like when you set up a Plasma picture \
> frame slideshow with many pictures).
>
> Parker Coates wrote:
> The only reason I mentioned this is that I considered using \
> KRandomSequence::randomize() for shuffling in card games. There I needed to take an \
> integer deal number and a sorted list of cards to produce a predictably randomised \
> deck. I wouldn't have been all that impressed if my deal numbers suddenly changed \
> meaning due to a kdelibs update. (In the end, I didn't end up using \
> KRandomSequence::randomize(), so this is a non issue for me personally.)
> On the other hand, if the change is planned for a new major library version, and \
> your survey of current uses didn't turn up any that seem to rely on the \
> predictability of the result, I see no reason not to make the change. I just wanted \
> to make sure that you'd fully considered the implications. Maybe a note in the API \
> docs mentioning the change in behaviour would be a good idea, though?
> (Oh and sorry about the triple comment. ReviewBoard's feedback can be... \
> interesting at times. I'll see if I can manage to send this one only once.)
Thanks for the quick reply. The deal number issue is a valid concern, of course, but \
none of the places I found using lxr seem to use something like that. In fact, it \
looks like basically everyone is just using the default seed value. But I'll wait for \
futher feedback from other people and see if anyone whom I missed relies on a \
reproducible shuffling.
BTW, your shuffling algorithm in kpat.git/libkcardgame/shuffle.h looks equivalent to \
what I'm proposing here :-)
- Frank
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/110262/#review31855
-----------------------------------------------------------
On May 1, 2013, 8:34 p.m., Frank Reininghaus wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/110262/
> -----------------------------------------------------------
>
> (Updated May 1, 2013, 8:34 p.m.)
>
>
> Review request for kdelibs.
>
>
> Description
> -------
>
> The current algorithm that is used to shuffle lists is rather inefficient. It works \
> by removing the first item of the list repeatedly and inserting it at a random \
> position in a new list, which is finally used to replace the original list. \
> Unfortunately, this results in O(N^2) run time complexity because inserting into a \
> list, which is done N itmes, is O(N).
> I propose to replace this algorithm by the Fisher-Yates algorithm, which works by \
> swapping items N - 1 times. One could modify the entire thing further, like \
> providing randomization also for other containers and not only QList, but that \
> would probably be frameworks material.
>
> Diffs
> -----
>
> kdecore/util/krandomsequence.h 46949b4
>
> Diff: http://git.reviewboard.kde.org/r/110262/diff/
>
>
> Testing
> -------
>
> I wrote a small benchmark: http://paste.kde.org/735914/
>
> I got the following results with the existing algorithm:
>
> RESULT : Benchmark::randomSequenceBenchmark():"n=0":
> 0.000015 msecs per iteration (total: 66, iterations: 4194304)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1":
> 0.000192 msecs per iteration (total: 101, iterations: 524288)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3":
> 0.00070 msecs per iteration (total: 93, iterations: 131072)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10":
> 0.0025 msecs per iteration (total: 83, iterations: 32768)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30":
> 0.0070 msecs per iteration (total: 58, iterations: 8192)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100":
> 0.023 msecs per iteration (total: 97, iterations: 4096)
> RESULT : Benchmark::randomSequenceBenchmark():"n=300":
> 0.077 msecs per iteration (total: 79, iterations: 1024)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1000":
> 0.35 msecs per iteration (total: 90, iterations: 256)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3000":
> 1.8 msecs per iteration (total: 58, iterations: 32)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10000":
> 18 msecs per iteration (total: 72, iterations: 4)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30000":
> 283 msecs per iteration (total: 283, iterations: 1)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100000":
> 3,823 msecs per iteration (total: 3,823, iterations: 1)
>
> Here are the numbers for the proposed new algorithm:
>
> RESULT : Benchmark::randomSequenceBenchmark():"n=0":
> 0.000015 msecs per iteration (total: 65, iterations: 4194304)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1":
> 0.000015 msecs per iteration (total: 65, iterations: 4194304)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3":
> 0.00018 msecs per iteration (total: 98, iterations: 524288)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10":
> 0.00079 msecs per iteration (total: 52, iterations: 65536)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30":
> 0.0025 msecs per iteration (total: 83, iterations: 32768)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100":
> 0.0084 msecs per iteration (total: 69, iterations: 8192)
> RESULT : Benchmark::randomSequenceBenchmark():"n=300":
> 0.025 msecs per iteration (total: 52, iterations: 2048)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1000":
> 0.085 msecs per iteration (total: 88, iterations: 1024)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3000":
> 0.25 msecs per iteration (total: 66, iterations: 256)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10000":
> 0.85 msecs per iteration (total: 55, iterations: 64)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30000":
> 2.6 msecs per iteration (total: 86, iterations: 32)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100000":
> 10 msecs per iteration (total: 81, iterations: 8)
>
>
> Thanks,
>
> Frank Reininghaus
>
>
[Attachment #3 (text/html)]
<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;"> <tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="http://git.reviewboard.kde.org/r/110262/">http://git.reviewboard.kde.org/r/110262/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <p style="margin-top: 0;">On May 1st, 2013, 9:03 p.m. UTC, <b>Parker \
Coates</b> wrote:</p> <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;"> <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Since KRandomSequence is a class for generating a predictable random \
sequence, is it not possible that this change would break applications that relied on \
the code below producing a consistent result between kdelibs releases?
QList list = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
KRandomSequence pseudoRand( 123545 );
pseudoRand.randomize( list );
return list;
</pre>
</blockquote>
<p>On May 2nd, 2013, 9:43 a.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I had thought about that \
as well, this is the reason why I propose to push that to master only, and not to \
KDE/4.10. Even though I think that a design which relies on the order of the items in \
a randomized sequence might be questionable in an application, it might make sense in \
unit tests. If the patch is pushed to master only, it would give people some time to \
react if anything breaks. At first sight, I did not notice anything in \
http://lxr.kde.org/ident?i=randomize that looks dangerous though.
On the other hand, if people are worried that this change might break things, then we \
could just deprecate the existing method and implement Fisher-Yates in a new one \
(maybe KRandomSequence::shuffle ?). The disadvantage would be that every application \
that uses the old method needs to be ported to the new one in order to stop wasting \
CPU cycles and prevent freezes (like when you set up a Plasma picture frame slideshow \
with many pictures). </pre>
</blockquote>
<p>On May 2nd, 2013, 12:18 p.m. UTC, <b>Parker Coates</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">The only reason I \
mentioned this is that I considered using KRandomSequence::randomize() for shuffling \
in card games. There I needed to take an integer deal number and a sorted list of \
cards to produce a predictably randomised deck. I wouldn't have been all that \
impressed if my deal numbers suddenly changed meaning due to a kdelibs update. (In \
the end, I didn't end up using KRandomSequence::randomize(), so this is a non \
issue for me personally.)
On the other hand, if the change is planned for a new major library version, and your \
survey of current uses didn't turn up any that seem to rely on the predictability \
of the result, I see no reason not to make the change. I just wanted to make sure \
that you'd fully considered the implications. Maybe a note in the API docs \
mentioning the change in behaviour would be a good idea, though?
(Oh and sorry about the triple comment. ReviewBoard's feedback can be... \
interesting at times. I'll see if I can manage to send this one only once.)</pre> \
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Thanks for the quick \
reply. The deal number issue is a valid concern, of course, but none of the places I \
found using lxr seem to use something like that. In fact, it looks like basically \
everyone is just using the default seed value. But I'll wait for futher feedback \
from other people and see if anyone whom I missed relies on a reproducible shuffling.
BTW, your shuffling algorithm in kpat.git/libkcardgame/shuffle.h looks equivalent to \
what I'm proposing here :-) </pre>
<br />
<p>- Frank</p>
<br />
<p>On May 1st, 2013, 8:34 p.m. UTC, Frank Reininghaus wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;"> <tr>
<td>
<div>Review request for kdelibs.</div>
<div>By Frank Reininghaus.</div>
<p style="color: grey;"><i>Updated May 1, 2013, 8:34 p.m.</i></p>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">The current algorithm that is used to shuffle lists is rather \
inefficient. It works by removing the first item of the list repeatedly and inserting \
it at a random position in a new list, which is finally used to replace the original \
list. Unfortunately, this results in O(N^2) run time complexity because inserting \
into a list, which is done N itmes, is O(N).
I propose to replace this algorithm by the Fisher-Yates algorithm, which works by \
swapping items N - 1 times. One could modify the entire thing further, like providing \
randomization also for other containers and not only QList, but that would probably \
be frameworks material. </pre> </td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I wrote a small benchmark: http://paste.kde.org/735914/
I got the following results with the existing algorithm:
RESULT : Benchmark::randomSequenceBenchmark():"n=0":
0.000015 msecs per iteration (total: 66, iterations: 4194304)
RESULT : Benchmark::randomSequenceBenchmark():"n=1":
0.000192 msecs per iteration (total: 101, iterations: 524288)
RESULT : Benchmark::randomSequenceBenchmark():"n=3":
0.00070 msecs per iteration (total: 93, iterations: 131072)
RESULT : Benchmark::randomSequenceBenchmark():"n=10":
0.0025 msecs per iteration (total: 83, iterations: 32768)
RESULT : Benchmark::randomSequenceBenchmark():"n=30":
0.0070 msecs per iteration (total: 58, iterations: 8192)
RESULT : Benchmark::randomSequenceBenchmark():"n=100":
0.023 msecs per iteration (total: 97, iterations: 4096)
RESULT : Benchmark::randomSequenceBenchmark():"n=300":
0.077 msecs per iteration (total: 79, iterations: 1024)
RESULT : Benchmark::randomSequenceBenchmark():"n=1000":
0.35 msecs per iteration (total: 90, iterations: 256)
RESULT : Benchmark::randomSequenceBenchmark():"n=3000":
1.8 msecs per iteration (total: 58, iterations: 32)
RESULT : Benchmark::randomSequenceBenchmark():"n=10000":
18 msecs per iteration (total: 72, iterations: 4)
RESULT : Benchmark::randomSequenceBenchmark():"n=30000":
283 msecs per iteration (total: 283, iterations: 1)
RESULT : Benchmark::randomSequenceBenchmark():"n=100000":
3,823 msecs per iteration (total: 3,823, iterations: 1)
Here are the numbers for the proposed new algorithm:
RESULT : Benchmark::randomSequenceBenchmark():"n=0":
0.000015 msecs per iteration (total: 65, iterations: 4194304)
RESULT : Benchmark::randomSequenceBenchmark():"n=1":
0.000015 msecs per iteration (total: 65, iterations: 4194304)
RESULT : Benchmark::randomSequenceBenchmark():"n=3":
0.00018 msecs per iteration (total: 98, iterations: 524288)
RESULT : Benchmark::randomSequenceBenchmark():"n=10":
0.00079 msecs per iteration (total: 52, iterations: 65536)
RESULT : Benchmark::randomSequenceBenchmark():"n=30":
0.0025 msecs per iteration (total: 83, iterations: 32768)
RESULT : Benchmark::randomSequenceBenchmark():"n=100":
0.0084 msecs per iteration (total: 69, iterations: 8192)
RESULT : Benchmark::randomSequenceBenchmark():"n=300":
0.025 msecs per iteration (total: 52, iterations: 2048)
RESULT : Benchmark::randomSequenceBenchmark():"n=1000":
0.085 msecs per iteration (total: 88, iterations: 1024)
RESULT : Benchmark::randomSequenceBenchmark():"n=3000":
0.25 msecs per iteration (total: 66, iterations: 256)
RESULT : Benchmark::randomSequenceBenchmark():"n=10000":
0.85 msecs per iteration (total: 55, iterations: 64)
RESULT : Benchmark::randomSequenceBenchmark():"n=30000":
2.6 msecs per iteration (total: 86, iterations: 32)
RESULT : Benchmark::randomSequenceBenchmark():"n=100000":
10 msecs per iteration (total: 81, iterations: 8)
</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>kdecore/util/krandomsequence.h <span style="color: grey">(46949b4)</span></li>
</ul>
<p><a href="http://git.reviewboard.kde.org/r/110262/diff/" style="margin-left: \
3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic