'Re: Review Request 110262: KRandomSequence::randomize: use the Fisher-Yates Algorithm to achieve O(N'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-core-devel
Subject:    Re: Review Request 110262: KRandomSequence::randomize: use the Fisher-Yates Algorithm to achieve O(N
From:       "Frank Reininghaus" <frank78ac () googlemail ! com>
Date:       2013-05-02 12:52:43
Message-ID: 20130502125243.27858.82092 () vidsolbach ! de
[Download RAW message or body]

> On May 1, 2013, 9:03 p.m., Parker Coates wrote:
> > Since KRandomSequence is a class for generating a predictable random sequence, is \
> > it not possible that this change would break applications that relied on the code \
> > below producing a consistent result between kdelibs releases? 
> > QList list = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
> > KRandomSequence pseudoRand( 123545 );
> > pseudoRand.randomize( list );
> > return list;
> > 
> > 
> > 
> 
> Frank Reininghaus wrote:
> I had thought about that as well, this is the reason why I propose to push that to \
> master only, and not to KDE/4.10. Even though I think that a design which relies on \
> the order of the items in a randomized sequence might be questionable in an \
> application, it might make sense in unit tests. If the patch is pushed to master \
> only, it would give people some time to react if anything breaks. At first sight, I \
> did not notice anything in http://lxr.kde.org/ident?i=randomize that looks \
> dangerous though. 
> On the other hand, if people are worried that this change might break things, then \
> we could just deprecate the existing method and implement Fisher-Yates in a new one \
> (maybe KRandomSequence::shuffle ?). The disadvantage would be that every \
> application that uses the old method needs to be ported to the new one in order to \
> stop wasting CPU cycles and prevent freezes (like when you set up a Plasma picture \
> frame slideshow with many pictures). 
> 
> Parker Coates wrote:
> The only reason I mentioned this is that I considered using \
> KRandomSequence::randomize() for shuffling in card games. There I needed to take an \
> integer deal number and a sorted list of cards to produce a predictably randomised \
> deck. I wouldn't have been all that impressed if my deal numbers suddenly changed \
> meaning due to a kdelibs update. (In the end, I didn't end up using \
> KRandomSequence::randomize(), so this is a non issue for me personally.) 
> On the other hand, if the change is planned for a new major library version, and \
> your survey of current uses didn't turn up any that seem to rely on the \
> predictability of the result, I see no reason not to make the change. I just wanted \
> to make sure that you'd fully considered the implications. Maybe a note in the API \
> docs mentioning the change in behaviour would be a good idea, though? 
> (Oh and sorry about the triple comment. ReviewBoard's feedback can be... \
> interesting at times. I'll see if I can manage to send this one only once.)

Thanks for the quick reply. The deal number issue is a valid concern, of course, but \
none of the places I found using lxr seem to use something like that. In fact, it \
looks like basically everyone is just using the default seed value. But I'll wait for \
futher feedback from other people and see if anyone whom I missed relies on a \
reproducible shuffling.

BTW, your shuffling algorithm in kpat.git/libkcardgame/shuffle.h looks equivalent to \
what I'm proposing here :-)


- Frank


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/110262/#review31855
-----------------------------------------------------------


On May 1, 2013, 8:34 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/110262/
> -----------------------------------------------------------
> 
> (Updated May 1, 2013, 8:34 p.m.)
> 
> 
> Review request for kdelibs.
> 
> 
> Description
> -------
> 
> The current algorithm that is used to shuffle lists is rather inefficient. It works \
> by removing the first item of the list repeatedly and inserting it at a random \
> position in a new list, which is finally used to replace the original list. \
> Unfortunately, this results in O(N^2) run time complexity because inserting into a \
> list, which is done N itmes, is O(N). 
> I propose to replace this algorithm by the Fisher-Yates algorithm, which works by \
> swapping items N - 1 times. One could modify the entire thing further, like \
> providing randomization also for other containers and not only QList, but that \
> would probably be frameworks material.  
> 
> Diffs
> -----
> 
> kdecore/util/krandomsequence.h 46949b4 
> 
> Diff: http://git.reviewboard.kde.org/r/110262/diff/
> 
> 
> Testing
> -------
> 
> I wrote a small benchmark: http://paste.kde.org/735914/
> 
> I got the following results with the existing algorithm:
> 
> RESULT : Benchmark::randomSequenceBenchmark():"n=0":
> 0.000015 msecs per iteration (total: 66, iterations: 4194304)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1":
> 0.000192 msecs per iteration (total: 101, iterations: 524288)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3":
> 0.00070 msecs per iteration (total: 93, iterations: 131072)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10":
> 0.0025 msecs per iteration (total: 83, iterations: 32768)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30":
> 0.0070 msecs per iteration (total: 58, iterations: 8192)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100":
> 0.023 msecs per iteration (total: 97, iterations: 4096)
> RESULT : Benchmark::randomSequenceBenchmark():"n=300":
> 0.077 msecs per iteration (total: 79, iterations: 1024)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1000":
> 0.35 msecs per iteration (total: 90, iterations: 256)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3000":
> 1.8 msecs per iteration (total: 58, iterations: 32)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10000":
> 18 msecs per iteration (total: 72, iterations: 4)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30000":
> 283 msecs per iteration (total: 283, iterations: 1)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100000":
> 3,823 msecs per iteration (total: 3,823, iterations: 1)
> 
> Here are the numbers for the proposed new algorithm:
> 
> RESULT : Benchmark::randomSequenceBenchmark():"n=0":
> 0.000015 msecs per iteration (total: 65, iterations: 4194304)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1":
> 0.000015 msecs per iteration (total: 65, iterations: 4194304)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3":
> 0.00018 msecs per iteration (total: 98, iterations: 524288)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10":
> 0.00079 msecs per iteration (total: 52, iterations: 65536)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30":
> 0.0025 msecs per iteration (total: 83, iterations: 32768)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100":
> 0.0084 msecs per iteration (total: 69, iterations: 8192)
> RESULT : Benchmark::randomSequenceBenchmark():"n=300":
> 0.025 msecs per iteration (total: 52, iterations: 2048)
> RESULT : Benchmark::randomSequenceBenchmark():"n=1000":
> 0.085 msecs per iteration (total: 88, iterations: 1024)
> RESULT : Benchmark::randomSequenceBenchmark():"n=3000":
> 0.25 msecs per iteration (total: 66, iterations: 256)
> RESULT : Benchmark::randomSequenceBenchmark():"n=10000":
> 0.85 msecs per iteration (total: 55, iterations: 64)
> RESULT : Benchmark::randomSequenceBenchmark():"n=30000":
> 2.6 msecs per iteration (total: 86, iterations: 32)
> RESULT : Benchmark::randomSequenceBenchmark():"n=100000":
> 10 msecs per iteration (total: 81, iterations: 8)
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
> 


[Attachment #3 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="http://git.reviewboard.kde.org/r/110262/">http://git.reviewboard.kde.org/r/110262/</a>
  </td>
    </tr>
   </table>
   <br />





<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On May 1st, 2013, 9:03 p.m. UTC, <b>Parker \
Coates</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Since KRandomSequence is a class for generating a predictable random \
sequence, is it not possible that this change would break applications that relied on \
the code below producing a consistent result between kdelibs releases?

QList list = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
KRandomSequence pseudoRand( 123545 );
pseudoRand.randomize( list );
return list;


</pre>
 </blockquote>




 <p>On May 2nd, 2013, 9:43 a.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I had thought about that \
as well, this is the reason why I propose to push that to master only, and not to \
KDE/4.10. Even though I think that a design which relies on the order of the items in \
a randomized sequence might be questionable in an application, it might make sense in \
unit tests. If the patch is pushed to master only, it would give people some time to \
react if anything breaks. At first sight, I did not notice anything in \
http://lxr.kde.org/ident?i=randomize that looks dangerous though.

On the other hand, if people are worried that this change might break things, then we \
could just deprecate the existing method and implement Fisher-Yates in a new one \
(maybe KRandomSequence::shuffle ?). The disadvantage would be that every application \
that uses the old method needs to be ported to the new one in order to stop wasting \
CPU cycles and prevent freezes (like when you set up a Plasma picture frame slideshow \
with many pictures). </pre>
 </blockquote>





 <p>On May 2nd, 2013, 12:18 p.m. UTC, <b>Parker Coates</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">The only reason I \
mentioned this is that I considered using KRandomSequence::randomize() for shuffling \
in card games. There I needed to take an integer deal number and a sorted list of \
cards to produce a predictably randomised deck. I wouldn&#39;t have been all that \
impressed if my deal numbers suddenly changed meaning due to a kdelibs update. (In \
the end, I didn&#39;t end up using KRandomSequence::randomize(), so this is a non \
issue for me personally.)

On the other hand, if the change is planned for a new major library version, and your \
survey of current uses didn&#39;t turn up any that seem to rely on the predictability \
of the result, I see no reason not to make the change. I just wanted to make sure \
that you&#39;d fully considered the implications. Maybe a note in the API docs \
mentioning the change in behaviour would be a good idea, though?

(Oh and sorry about the triple comment. ReviewBoard&#39;s feedback can be... \
interesting at times. I&#39;ll see if I can manage to send this one only once.)</pre> \
</blockquote>








</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Thanks for the quick \
reply. The deal number issue is a valid concern, of course, but none of the places I \
found using lxr seem to use something like that. In fact, it looks like basically \
everyone is just using the default seed value. But I&#39;ll wait for futher feedback \
from other people and see if anyone whom I missed relies on a reproducible shuffling.

BTW, your shuffling algorithm in kpat.git/libkcardgame/shuffle.h looks equivalent to \
what I&#39;m proposing here :-) </pre>
<br />










<p>- Frank</p>


<br />
<p>On May 1st, 2013, 8:34 p.m. UTC, Frank Reininghaus wrote:</p>








<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;">  <tr>
  <td>

<div>Review request for kdelibs.</div>
<div>By Frank Reininghaus.</div>


<p style="color: grey;"><i>Updated May 1, 2013, 8:34 p.m.</i></p>






<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">The current algorithm that is used to shuffle lists is rather \
inefficient. It works by removing the first item of the list repeatedly and inserting \
it at a random position in a new list, which is finally used to replace the original \
list. Unfortunately, this results in O(N^2) run time complexity because inserting \
into a list, which is done N itmes, is O(N).

I propose to replace this algorithm by the Fisher-Yates algorithm, which works by \
swapping items N - 1 times. One could modify the entire thing further, like providing \
randomization also for other containers and not only QList, but that would probably \
be frameworks material. </pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I wrote a small benchmark: http://paste.kde.org/735914/

I got the following results with the existing algorithm:

RESULT : Benchmark::randomSequenceBenchmark():&quot;n=0&quot;:
     0.000015 msecs per iteration (total: 66, iterations: 4194304)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=1&quot;:
     0.000192 msecs per iteration (total: 101, iterations: 524288)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=3&quot;:
     0.00070 msecs per iteration (total: 93, iterations: 131072)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=10&quot;:
     0.0025 msecs per iteration (total: 83, iterations: 32768)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=30&quot;:
     0.0070 msecs per iteration (total: 58, iterations: 8192)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=100&quot;:
     0.023 msecs per iteration (total: 97, iterations: 4096)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=300&quot;:
     0.077 msecs per iteration (total: 79, iterations: 1024)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=1000&quot;:
     0.35 msecs per iteration (total: 90, iterations: 256)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=3000&quot;:
     1.8 msecs per iteration (total: 58, iterations: 32)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=10000&quot;:
     18 msecs per iteration (total: 72, iterations: 4)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=30000&quot;:
     283 msecs per iteration (total: 283, iterations: 1)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=100000&quot;:
     3,823 msecs per iteration (total: 3,823, iterations: 1)

Here are the numbers for the proposed new algorithm:

RESULT : Benchmark::randomSequenceBenchmark():&quot;n=0&quot;:
     0.000015 msecs per iteration (total: 65, iterations: 4194304)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=1&quot;:
     0.000015 msecs per iteration (total: 65, iterations: 4194304)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=3&quot;:
     0.00018 msecs per iteration (total: 98, iterations: 524288)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=10&quot;:
     0.00079 msecs per iteration (total: 52, iterations: 65536)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=30&quot;:
     0.0025 msecs per iteration (total: 83, iterations: 32768)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=100&quot;:
     0.0084 msecs per iteration (total: 69, iterations: 8192)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=300&quot;:
     0.025 msecs per iteration (total: 52, iterations: 2048)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=1000&quot;:
     0.085 msecs per iteration (total: 88, iterations: 1024)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=3000&quot;:
     0.25 msecs per iteration (total: 66, iterations: 256)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=10000&quot;:
     0.85 msecs per iteration (total: 55, iterations: 64)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=30000&quot;:
     2.6 msecs per iteration (total: 86, iterations: 32)
RESULT : Benchmark::randomSequenceBenchmark():&quot;n=100000&quot;:
     10 msecs per iteration (total: 81, iterations: 8)
</pre>
  </td>
 </tr>
</table>




<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>kdecore/util/krandomsequence.h <span style="color: grey">(46949b4)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/110262/diff/" style="margin-left: \
3em;">View Diff</a></p>







  </td>
 </tr>
</table>








  </div>
 </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic