List: ruby-talk
Subject: Re: Multi-cpu and ruby Threading
From: Charles Oliver Nutter <headius () headius ! com>
Date: 2010-06-30 17:57:34
Message-ID: AANLkTimINuhA-hdoJ1FMn_C_uizQ02eg3iilmtyRJHv3 () mail ! gmail ! com
On Wed, Jun 30, 2010 at 10:20 AM, Regis d'Aubarede
<regis.aubarede@gmail.com> wrote:
> To determine whether the issue is on the JRuby side or the JVM side, I run
> the same JRuby code, but invoke a pure Java computation:
> (1..nb_threads).map { Thread.new { Calc.calc(p1, n1) } }
> with
>
> class Calc {
>   public static long calc(int a, int b) {
>     long res = 0;
>     for (int i = 0; i < a; i++)
>       for (int j = 0; j < b; j++)
>         for (int k = 0; k < 1000; k++)
>           res += i + j + k;
>     return res;
>   }
> }
Yes, this result is not surprising to me. In the original case, the
benchmark suffers mostly from all the objects being created. For
example:
* All the numeric loops (in JRuby) create at least one new Fixnum
object for every iteration
* All the math operations create Fixnum or Float objects as well
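A minimal sketch of the kind of numeric kernel under discussion (a hypothetical shape, since the original thread_bench.rb isn't shown here). On JRuby 1.x, every `res += i + j + k` result is boxed as a fresh RubyFixnum, so the inner loop alone accounts for millions of allocations:

```ruby
# Hypothetical Ruby equivalent of the Java Calc.calc above.
# Each arithmetic result in the innermost loop becomes a new
# Fixnum object on JRuby 1.x (no fixnum value types on the JVM).
def calc(a, b)
  res = 0
  a.times do |i|
    b.times do |j|
      1000.times do |k|
        res += i + j + k  # allocates a new boxed Fixnum per iteration
      end
    end
  end
  res
end

puts calc(10, 10)  # => 50850000, after 100,000 inner iterations
```

Running this under the same `-J-Xrunhprof` flag would be expected to show org.jruby.RubyFixnum dominating the allocation profile, just as in the output below.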
Running an allocation profile of your benchmark (which actually runs
pretty slow because there's *so much* allocation happening) shows the
amount of data that's being chewed up...it's very likely that the
bottleneck is in allocating all those closures and all those Fixnums
for this particular case:
~/projects/jruby ➔ jruby -J-Xrunhprof thread_bench.rb
1.8.7, java, 2010-06-17
1000 iterations by 1 threads , Duration = 399267 ms
^CDumping Java heap ... allocation sites ... done.
~/projects/jruby ➔ egrep "%|objs" java.hprof.txt | head -n 11
rank  self    accum   bytes     objs    bytes       objs      trace  name
   1  65.18%  65.18%  13545024  423282  1133938432  35435576  302318 org.jruby.RubyFixnum
   2  22.61%  87.79%  4697920   146810  381348672   11917146  302867 org.jruby.RubyFloat
   3   1.32%  89.12%  274992    5350    274992      5350      300000 char[]
   4   0.62%  89.74%  128488    5341    128488      5341      300000 java.lang.String
   5   0.18%  89.92%  38184     1       38184       1         306423 short[]
   6   0.18%  90.10%  38184     1       38184       1         306428 short[]
   7   0.14%  90.24%  28720     718     29400       735       300521 java.util.WeakHashMap$Entry
   8   0.13%  90.37%  27792     70      27792       70        300000 byte[]
   9   0.13%  90.50%  26832     1118    35040       1460      300704 java.util.concurrent.ConcurrentHashMap$HashEntry
  10   0.12%  90.63%  25232     166     25232       166       300557 org.jruby.MetaClass
Note that this is only after the 1000-iteration run, and during
execution over 1GB of memory was allocated and released, mostly in
Fixnum objects with a smaller amount (380MB+) in Float objects.
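As a quick sanity check of those figures, the second pair of bytes/objs columns in the hprof table is cumulative, so converting them directly:

```ruby
# Cumulative allocation totals taken from the hprof output above (bytes).
fixnum_bytes = 1_133_938_432  # org.jruby.RubyFixnum
float_bytes  =   381_348_672  # org.jruby.RubyFloat

puts fixnum_bytes / 1e9  # ~1.13 GB allocated as Fixnum objects
puts float_bytes  / 1e6  # ~381  MB allocated as Float objects
```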
Running with verbose GC:
~/projects/jruby ➔ jruby -J-verbose:gc thread_bench.rb
1.8.7, java, 2010-06-17
[GC 13184K->1128K(63936K), 0.0108696 secs]
[GC 14312K->2124K(63936K), 0.0077762 secs]
[GC 15308K->1445K(63936K), 0.0010409 secs]
[GC 14629K->1246K(63936K), 0.0031958 secs]
...
Adding up all the size changes (number of GC runs multiplied by the drop in
heap usage per collection) produces roughly the same estimate: over the
period the 1000-iteration part of the bench runs, it allocates a *lot* of
objects.
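That estimate can be mechanized. Here is a hypothetical helper (not part of the benchmark) that sums the memory reclaimed per collection from `-J-verbose:gc` lines of the form shown above:

```ruby
# Sums KB reclaimed across young-gen collections from -verbose:gc output.
# Each line looks like: [GC 13184K->1128K(63936K), 0.0108696 secs]
GC_LINE = /\[GC\s+(\d+)K->(\d+)K\(\d+K\)/

def reclaimed_kb(log)
  log.scan(GC_LINE).sum { |before, after| before.to_i - after.to_i }
end

sample = <<~LOG
  [GC 13184K->1128K(63936K), 0.0108696 secs]
  [GC 14312K->2124K(63936K), 0.0077762 secs]
LOG

puts reclaimed_kb(sample)  # 12056 + 12188 = 24244 KB reclaimed
```

Feeding it the full log for the benchmark run should land in the same ballpark as the hprof totals: on the order of a gigabyte of short-lived objects.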
IronRuby may do better here if they're able to treat Fixnum objects as
value types, which the CLR handles more efficiently than the JVM's
"every object is on the heap". Ultimately this is largely an
allocation-rate benchmark, at least on JRuby, since our Fixnum objects
are "real" objects (or to put it in MRI's favor...our Fixnum objects
are forced to be "real" objects with heap lifecycles).
The dynopt work is part of efforts in JRuby to bring math performance
closer to Java, largely by eliminating the excessive object churn and
layers of indirection around math operations.
- Charlie