
List:       ruby-talk
Subject:    Re: Multi-cpu and ruby Threading
From:       Charles Oliver Nutter <headius () headius ! com>
Date:       2010-06-30 17:57:34
Message-ID: AANLkTimINuhA-hdoJ1FMn_C_uizQ02eg3iilmtyRJHv3 () mail ! gmail ! com
[Download RAW message or body]

On Wed, Jun 30, 2010 at 10:20 AM, Regis d'Aubarede
<regis.aubarede@gmail.com> wrote:
> To determine whether the issue is on the JRuby side or the JVM side, I ran
> the same JRuby code, but invoked a pure Java computation:
>    (1..nb_threads).map {  Thread.new() { Calc.calc(p1,n1) } }
> with
>
> class Calc {
>   public static long calc(int a, int b) {
>     long res = 0;
>     for (int i = 0; i < a; i++)
>       for (int j = 0; j < b; j++)
>         for (int k = 0; k < 1000; k++)
>           res += i + j + k;
>     return res;
>   }
> }
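For comparison, a pure-Ruby version of the same nested loop (a hypothetical sketch, not code from the original post) would look like the following; on JRuby this variant allocates Fixnum objects on every iteration, whereas the Java Calc.calc above works entirely on primitive longs:

```ruby
# Pure-Ruby equivalent of Calc.calc (illustrative sketch).
# On JRuby, every intermediate result of i + j + k and every res +=
# is a fresh boxed Fixnum, so this loop is allocation-bound.
def calc(a, b)
  res = 0
  a.times do |i|
    b.times do |j|
      1000.times { |k| res += i + j + k }
    end
  end
  res
end
```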

Yes, this result is not surprising to me. In the original case, the
benchmark suffers mostly from all the objects being created. For
example:

* All the numeric loops (in JRuby) create at least one new Fixnum
object for every iteration
* All the math operations create Fixnum or Float objects as well
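A minimal sketch of that kind of Fixnum churn (hypothetical code, not the original benchmark):

```ruby
# On JRuby, each arithmetic result below is a new org.jruby.RubyFixnum
# heap object (small values hit a cache, but loop counters quickly
# exceed it), so every iteration allocates at least one object.
def churn(n)
  res = 0
  n.times do |i|       # the block argument i is boxed on JRuby
    res += i * 2 + 1   # each intermediate result is a new Fixnum
  end
  res                  # sum of the first n odd numbers, i.e. n**2
end

churn(1_000)  # => 1_000_000, after ~thousands of Fixnum allocations
```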

Running an allocation profile of your benchmark (which actually runs
pretty slow because there's *so much* allocation happening) shows the
amount of data that's being chewed up...it's very likely that the
bottleneck is in allocating all those closures and all those Fixnums
for this particular case:

~/projects/jruby ➔ jruby -J-Xrunhprof thread_bench.rb
1.8.7, java, 2010-06-17
1000 iterations by 1 threads  , Duration  = 399267 ms
^CDumping Java heap ... allocation sites ... done.

~/projects/jruby ➔ egrep "%|objs" java.hprof.txt | head -n 11
 rank   self  accum     bytes objs     bytes  objs trace name
    1 65.18% 65.18%  13545024 423282 1133938432 35435576 302318 org.jruby.RubyFixnum
    2 22.61% 87.79%   4697920 146810 381348672 11917146 302867 org.jruby.RubyFloat
    3  1.32% 89.12%    274992 5350    274992  5350 300000 char[]
    4  0.62% 89.74%    128488 5341    128488  5341 300000 java.lang.String
    5  0.18% 89.92%     38184    1     38184     1 306423 short[]
    6  0.18% 90.10%     38184    1     38184     1 306428 short[]
    7  0.14% 90.24%     28720  718     29400   735 300521 java.util.WeakHashMap$Entry
    8  0.13% 90.37%     27792   70     27792    70 300000 byte[]
    9  0.13% 90.50%     26832 1118     35040  1460 300704 java.util.concurrent.ConcurrentHashMap$HashEntry
   10  0.12% 90.63%     25232  166     25232   166 300557 org.jruby.MetaClass

Note that this is only after the 1000-iteration run, and during
execution over 1GB of memory was allocated and released, mostly in
Fixnum objects with a smaller amount (380MB+) in Float objects.
Running with verbose GC:


~/projects/jruby ➔ jruby -J-verbose:gc thread_bench.rb
1.8.7, java, 2010-06-17
[GC 13184K->1128K(63936K), 0.0108696 secs]
[GC 14312K->2124K(63936K), 0.0077762 secs]
[GC 15308K->1445K(63936K), 0.0010409 secs]
[GC 14629K->1246K(63936K), 0.0031958 secs]
...

And adding up all the size changes (number of GC runs times the
before/after difference in heap occupancy per run) produces roughly the
same estimate: over the period the 1000-iteration part of the benchmark
runs, it allocates a *lot* of objects.
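That "add up the size changes" estimate can be sketched as a small script (hypothetical helper, not part of the original post) that parses the verbose GC lines above; each line reports heap occupancy before -> after a collection, so (before - after) approximates the garbage reclaimed by that run:

```ruby
# Rough allocation estimate from HotSpot -verbose:gc output:
# each "[GC beforeK->afterK(capacityK), t secs]" line tells us how
# much garbage that young collection reclaimed.
GC_LINE = /\[GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]/

def reclaimed_kb(log)
  log.scan(GC_LINE).sum { |before, after, _cap, _secs| before.to_i - after.to_i }
end

# Sample lines taken from the output above.
log = <<~LOG
  [GC 13184K->1128K(63936K), 0.0108696 secs]
  [GC 14312K->2124K(63936K), 0.0077762 secs]
  [GC 15308K->1445K(63936K), 0.0010409 secs]
LOG

reclaimed_kb(log)  # total KB reclaimed across the sampled GC runs
```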

IronRuby may do better here if they're able to treat Fixnum objects as
value types, which the CLR handles more efficiently than the JVM's
"every object is on the heap". Ultimately this is largely an
allocation-rate benchmark, at least on JRuby, since our Fixnum objects
are "real" objects (or to put it in MRI's favor...our Fixnum objects
are forced to be "real" objects with heap lifecycles).

The dynopt work is part of efforts in JRuby to bring math performance
closer to Java's, largely by eliminating the excessive object churn and
layers of noise for math operations.

- Charlie

