[prev in list] [next in list] [prev in thread] [next in thread] 

List:       ruby-talk
Subject:    Re: array comparison
From:       Robert Klemme <shortcutter () googlemail ! com>
Date:       2008-10-31 17:19:08
Message-ID: 6n0svvFj7s9lU1 () mid ! individual ! net
[Download RAW message or body]

On 31.10.2008 11:21, Sarcar, Shourya C (GE Healthcare) wrote:
>> -----Original Message-----
>> From: Robert Klemme [mailto:shortcutter@googlemail.com] 
>> Sent: Friday, October 31, 2008 12:04 AM
>> To: ruby-talk ML
>> Subject: Re: array comparison
>>
>> The comparison is a bit unfair since you include the build 
>> time for the structure in case of Hash but not for the Array. 
>>  You should at least also compare just the intersection time. 
>>  And while you're at it you can also add Set to the mix. :-)
> 
> 
> Interesting discussion.
> 
> I wrote a small program to compare Array intersection to Set
> intersection.
> I noticed that for smaller sets (~10 ** 3), Array is much faster that
> Set, sometimes upto 5x.
> As the set size grows, they seem to converge (~10 ** 7, 100 million)
> 
> # Program to compare timings for intersections
> require 'set'
> 
> n = 10000
> while (n <= 10 ** 7) do 
> 	c = Array.new(n) {rand n}.uniq
> 	d = Array.new(n) {rand n}.uniq
> 
> 	# Native array intersection
> 	t0 = Time.now
> 	cd = c & d
> 	t1 = Time.now
> 	printf("Array | n = %d : %d intersects found in %f
> seconds\n",n,cd.size,t1-t0)
> 	
> 	# Convert array to Set objects and perform intersection
> 	c = c.to_set
> 	d = d.to_set
> 	t2 = Time.now
> 	cd = c & d
> 	t3 = Time.now
> 	printf("Set   | n = %d : %d intersects found in %f
> seconds\n",n,cd.size,t3-t2)
> 	printf("Ratio of Set/Array intersect time =
> %.2f\n",(t3-t2)/(t1-t0))
> 
> 	n = 2 * n
> end

Some things are noteworthy about your program:

  - It has a large intersection, i.e. roughly 50% on average.  This may 
be different for other scenarios.

  - You use Fixnums which compare pretty fast, you can expect to reach 
break even earlier when using other types.

I have taken the liberty to extend your test program a bit changing 
those two parameters mentioned above and you can see significant 
differences. :-)

Kind regards

	robert



["i2.rb" (text/plain)]

require 'set'

constr = [
  lambda do |n|
  [
    Array.new(n) {rand n}.uniq,
    Array.new(n) {rand(n) + (n*0.9).to_i}.uniq
  ]
  end,
    lambda do |n|
    [
      Array.new(n) {"#{rand n}" << ("*" * 100_000)}.uniq,
      Array.new(n) {"#{rand(n) + (n*0.9).to_i}" << ("*" * 100_000)}.uniq
    ]
    end
]

2.times do |run|
  puts "=== run #{run + 1} ========================================================"
  constr.each_with_index do |creat, idx|
    puts "--- construct #{idx} --------------------------------------------------------"
    n = 1
    loop do 
      c, d = creat[n]

      # Native array intersection
      t0 = Time.now
      cd = c & d
      t1 = Time.now
      printf("Array | n = %d : %d intersects found in %f seconds\n",n,cd.size,t1-t0)

      # Convert array to Set objects and perform intersection
      c = c.to_set
      d = d.to_set
      t2 = Time.now
      cd = c & d
      t3 = Time.now
      printf("Set   | n = %d : %d intersects found in %f seconds\n",n,cd.size,t3-t2)

      ratio = (t3-t2)/(t1-t0)
      printf("Ratio of Set/Array intersect time = %.2f\n",ratio)

      n *= 2
      break if n > 100 && ratio > 0.1 && ratio < 0.9
    end
  end
end

["output.txt" (text/plain)]

=== run 1 ========================================================
--- construct 0 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.000000 seconds
Set   | n = 1 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 2 : 0 intersects found in 0.000000 seconds
Set   | n = 2 : 0 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 4 : 1 intersects found in 0.000000 seconds
Set   | n = 4 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 8 : 0 intersects found in 0.000000 seconds
Set   | n = 8 : 0 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 16 : 0 intersects found in 0.000000 seconds
Set   | n = 16 : 0 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 32 : 1 intersects found in 0.000000 seconds
Set   | n = 32 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 64 : 2 intersects found in 0.000000 seconds
Set   | n = 64 : 2 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 128 : 3 intersects found in 0.000000 seconds
Set   | n = 128 : 3 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 256 : 13 intersects found in 0.000000 seconds
Set   | n = 256 : 13 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 512 : 19 intersects found in 0.000000 seconds
Set   | n = 512 : 19 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 1024 : 38 intersects found in 0.001000 seconds
Set   | n = 1024 : 38 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = 1.00
Array | n = 2048 : 87 intersects found in 0.001000 seconds
Set   | n = 2048 : 87 intersects found in 0.003000 seconds
Ratio of Set/Array intersect time = 3.00
Array | n = 4096 : 147 intersects found in 0.002000 seconds
Set   | n = 4096 : 147 intersects found in 0.005000 seconds
Ratio of Set/Array intersect time = 2.50
Array | n = 8192 : 344 intersects found in 0.004000 seconds
Set   | n = 8192 : 344 intersects found in 0.010000 seconds
Ratio of Set/Array intersect time = 2.50
Array | n = 16384 : 674 intersects found in 0.008000 seconds
Set   | n = 16384 : 674 intersects found in 0.021000 seconds
Ratio of Set/Array intersect time = 2.62
Array | n = 32768 : 1353 intersects found in 0.015000 seconds
Set   | n = 32768 : 1353 intersects found in 0.046000 seconds
Ratio of Set/Array intersect time = 3.07
Array | n = 65536 : 2605 intersects found in 0.040000 seconds
Set   | n = 65536 : 2605 intersects found in 0.096000 seconds
Ratio of Set/Array intersect time = 2.40
Array | n = 131072 : 5278 intersects found in 0.125000 seconds
Set   | n = 131072 : 5278 intersects found in 0.203000 seconds
Ratio of Set/Array intersect time = 1.62
Array | n = 262144 : 10610 intersects found in 0.327000 seconds
Set   | n = 262144 : 10610 intersects found in 0.416000 seconds
Ratio of Set/Array intersect time = 1.27
Array | n = 524288 : 21003 intersects found in 0.869000 seconds
Set   | n = 524288 : 21003 intersects found in 0.827000 seconds
Ratio of Set/Array intersect time = 0.95
Array | n = 1048576 : 41912 intersects found in 2.059000 seconds
Set   | n = 1048576 : 41912 intersects found in 1.674000 seconds
Ratio of Set/Array intersect time = 0.81
--- construct 1 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.002000 seconds
Set   | n = 1 : 1 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = 0.50
Array | n = 2 : 1 intersects found in 0.003000 seconds
Set   | n = 2 : 1 intersects found in 0.003000 seconds
Ratio of Set/Array intersect time = 1.00
Array | n = 4 : 0 intersects found in 0.003000 seconds
Set   | n = 4 : 0 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = 0.33
Array | n = 8 : 0 intersects found in 0.011000 seconds
Set   | n = 8 : 0 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.18
Array | n = 16 : 2 intersects found in 0.013000 seconds
Set   | n = 16 : 2 intersects found in 0.008000 seconds
Ratio of Set/Array intersect time = 0.62
Array | n = 32 : 1 intersects found in 0.026000 seconds
Set   | n = 32 : 1 intersects found in 0.009000 seconds
Ratio of Set/Array intersect time = 0.35
Array | n = 64 : 3 intersects found in 0.054000 seconds
Set   | n = 64 : 3 intersects found in 0.020000 seconds
Ratio of Set/Array intersect time = 0.37
=== run 2 ========================================================
--- construct 0 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.000000 seconds
Set   | n = 1 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 2 : 1 intersects found in 0.000000 seconds
Set   | n = 2 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 4 : 1 intersects found in 0.000000 seconds
Set   | n = 4 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 8 : 1 intersects found in 0.000000 seconds
Set   | n = 8 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 16 : 1 intersects found in 0.000000 seconds
Set   | n = 16 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 32 : 1 intersects found in 0.000000 seconds
Set   | n = 32 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 64 : 5 intersects found in 0.000000 seconds
Set   | n = 64 : 5 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 128 : 7 intersects found in 0.000000 seconds
Set   | n = 128 : 7 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 256 : 9 intersects found in 0.000000 seconds
Set   | n = 256 : 9 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 512 : 24 intersects found in 0.000000 seconds
Set   | n = 512 : 24 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 1024 : 47 intersects found in 0.000000 seconds
Set   | n = 1024 : 47 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 2048 : 79 intersects found in 0.000000 seconds
Set   | n = 2048 : 79 intersects found in 0.003000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 4096 : 178 intersects found in 0.002000 seconds
Set   | n = 4096 : 178 intersects found in 0.004000 seconds
Ratio of Set/Array intersect time = 2.00
Array | n = 8192 : 322 intersects found in 0.004000 seconds
Set   | n = 8192 : 322 intersects found in 0.010000 seconds
Ratio of Set/Array intersect time = 2.50
Array | n = 16384 : 650 intersects found in 0.008000 seconds
Set   | n = 16384 : 650 intersects found in 0.022000 seconds
Ratio of Set/Array intersect time = 2.75
Array | n = 32768 : 1326 intersects found in 0.015000 seconds
Set   | n = 32768 : 1326 intersects found in 0.046000 seconds
Ratio of Set/Array intersect time = 3.07
Array | n = 65536 : 2626 intersects found in 0.041000 seconds
Set   | n = 65536 : 2626 intersects found in 0.095000 seconds
Ratio of Set/Array intersect time = 2.32
Array | n = 131072 : 5279 intersects found in 0.129000 seconds
Set   | n = 131072 : 5279 intersects found in 0.205000 seconds
Ratio of Set/Array intersect time = 1.59
Array | n = 262144 : 10491 intersects found in 0.334000 seconds
Set   | n = 262144 : 10491 intersects found in 0.403000 seconds
Ratio of Set/Array intersect time = 1.21
Array | n = 524288 : 21031 intersects found in 1.112000 seconds
Set   | n = 524288 : 21031 intersects found in 0.814000 seconds
Ratio of Set/Array intersect time = 0.73
--- construct 1 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.001000 seconds
Set   | n = 1 : 1 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 2.00
Array | n = 2 : 1 intersects found in 0.003000 seconds
Set   | n = 2 : 1 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.67
Array | n = 4 : 1 intersects found in 0.004000 seconds
Set   | n = 4 : 1 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.50
Array | n = 8 : 0 intersects found in 0.007000 seconds
Set   | n = 8 : 0 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.29
Array | n = 16 : 0 intersects found in 0.013000 seconds
Set   | n = 16 : 0 intersects found in 0.005000 seconds
Ratio of Set/Array intersect time = 0.38
Array | n = 32 : 1 intersects found in 0.024000 seconds
Set   | n = 32 : 1 intersects found in 0.009000 seconds
Ratio of Set/Array intersect time = 0.37
Array | n = 64 : 3 intersects found in 0.055000 seconds
Set   | n = 64 : 3 intersects found in 0.020000 seconds
Ratio of Set/Array intersect time = 0.36


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic