'Cartesian Join Efficiency'

List:       sas-l
Subject:    Cartesian Join Efficiency
From:       pigpigpig <pigzhu740 () GMAIL ! COM>
Date:       2009-10-30 18:05:36
Message-ID: 2b09e2a8-7e68-437f-9a51-320d798ca409 () p8g2000yqb ! googlegroups ! com

I use Cartesian joins quite frequently in my projects. A Cartesian join
is inefficient, but I have to use it because I need to match every
record in table a with every record in table b. If table a has 10,000
records and table b has 5,000, the Cartesian join produces
10,000 * 5,000 = 50,000,000 records.

This is very time-consuming. To save time, I split the observations of
one table into chunks and run the joins in parallel across several SAS
sessions: I open about 5 SAS windows, and one window runs
%catersion_join(Low=0, high=1000), another runs
%catersion_join(Low=1000, high=2000), and so on.
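The arithmetic behind the chunking, writing n_a and n_b for the row counts of tables a and b and k for the chunk size (k = 1000 in the calls below; these symbols are introduced here only for illustration): each call produces k * n_b rows, and the n_a / k chunks together still produce the full product. Splitting therefore spreads the work across sessions but does not reduce the total number of output rows.

```latex
\[
N_{\text{total}} = n_a \times n_b = 10\,000 \times 5\,000 = 50\,000\,000
\]
\[
N_{\text{chunk}} = k \times n_b = 1\,000 \times 5\,000 = 5\,000\,000,
\qquad
\frac{n_a}{k}\, N_{\text{chunk}} = 10 \times 5\,000\,000 = 50\,000\,000 = N_{\text{total}}
\]
```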

I am posting my solution here to share with you all; feel free to
comment. I am not sure this is a good solution, and I believe there
must be a better way to do this. Any suggestions?

Thanks!


For example:

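%macro catersion_join(Low=0, high=1000);

  /* Take rows &Low+1 through &high of table a.
     FIRSTOBS=/OBS= read only that slice instead of the whole table. */
  data partial_a;
    set a(firstobs=%eval(&Low+1) obs=&high);
  run;

  /* Cartesian product of this slice with all of table b.
     If a and b share column names, SAS keeps the first one
     and writes a warning. */
  proc sql;
    create table c_&high as
      select partial_a.*, b.*
      from partial_a, b;
  quit;

  /* Add this slice to the combined result table c_1.
     Note: each SAS session has its own WORK library, so for the
     multi-session approach a, b and c_1 must live in a shared
     permanent library. */
  proc append base=c_1 data=c_&high;
  run;
%mend catersion_join;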

%catersion_join  ( Low=0, high=1000);
%catersion_join  ( Low=1000, high=2000);
%catersion_join  ( Low=2000, high=3000);
%catersion_join  ( Low=3000, high=4000);
%catersion_join  ( Low=4000, high=5000);
%catersion_join  ( Low=5000, high=6000);
%catersion_join  ( Low=6000, high=7000);
%catersion_join  ( Low=7000, high=8000);
%catersion_join  ( Low=8000, high=9000);
%catersion_join  ( Low=9000, high=10000);
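Instead of typing the ten calls by hand, a small driver macro can generate them. This is a sketch: `run_chunks` and its parameters `first`, `last` and `size` are hypothetical names, not part of the original post, and it simply calls the `%catersion_join` macro above once per chunk. Each SAS session could call it with a different chunk range.

```sas
/* Hypothetical driver (not from the original post): calls the
   %catersion_join macro for chunks &first through &last, where
   chunk i covers rows (i-1)*&size+1 to i*&size of table a. */
%macro run_chunks(first=1, last=10, size=1000);
  %local i;
  %do i = &first %to &last;
    %catersion_join(Low=%eval((&i-1)*&size), high=%eval(&i*&size));
  %end;
%mend run_chunks;

/* Example: session 1 handles rows 1-2000, session 2 rows 2001-4000 */
/* %run_chunks(first=1, last=2, size=1000); */
/* %run_chunks(first=3, last=4, size=1000); */
```

With first=1, last=10 and size=1000 it issues exactly the ten calls listed above, from (Low=0, high=1000) to (Low=9000, high=10000).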

One of the best things in the world is sharing.