[prev in list] [next in list] [prev in thread] [next in thread]
List: sas-l
Subject: Re: Data Step vs. Proc sql
From: Glenn Heagerty <gheagerty () EARTHLINK ! NET>
Date: 2004-02-28 0:41:47
[Download RAW message or body]
Sigurd,
Thank you for the summary and example. I will keep it with me as I start working
more with PROC SQL.
Have a nice weekend,
Glenn
Sigurd Hermansen wrote:
> Glenn and all:
> I recall at least two other full rounds of arguments on SAS-L about whether
> SAS SQL performs as well or better than the SAS merge. The relative
> performance of SQL joins and Data step merges depends on
> - the 'width' of the datasets being joined (SQL works faster with 'narrow'
> tables);
> - whether SAS knows the sort order of sorted datasets (the merge assumes BY
> group order);
> - LEFT (RIGHT, FULL) vs INNER joins (SAS tends to hash inner joins but not
> others);
> - memory available (though not consistently in MS Windows);
> - prior caching of results (1st attempt runs slower);
> - number of datasets being merged (the merge works much faster on three or
> more datasets
> being merged on the same key);
> - the density and distribution of join/merge key values (sort/merge works
> better than merge
> when key 'hit rate' exceeds ~20%);
> - what else is happening in the computing environment.
>
> I wrote a fairly complete program (or perhaps, 'adapted it from a posted
> program') to test left joins vs. merges during a minor join/merge skirmish
> last summer. It follows the Sig line. I don't believe that highly accurate
> differences in the execution times would, in a <100M row context, be worth
> another thought.
>
> Intelligent use of join/merges in SAS would, on the other hand, make a big
> difference in the performance and reliability of SAS programs. Better SQL
> execution plans, at least to the extent that they perform no much worse than
> a default merge and use available memory and indexes to good advantage,
> would make it easier to choose the logically correct method for the task.
> Finally, SAS join/merges work so fast now bullet-proof programming pays
> greater dividends than programming that saves system resources.
> Sig
>
> %let obs = 1e6;;
> Data one ;
> retain y 1;
> a=repeat('X',15999);
> do x = &obs to 1 by -1; output; if uniform(7) < .05 then output; end;
> run;
>
> Data two ;
> retain y 2 z 3;
> a=repeat('X',15999);
> do x = &obs to 1 by -1; output; end;
> run;
>
> options mprint;
> proc sql noprint;
> select t1.name into :vars2 separated by ','
> from (select name from dictionary.columns
> where libname = "WORK" and upcase(memname)='TWO') as t1
> left join
> (select name from dictionary.columns
> where libname="WORK" and upcase(memname)='ONE') as t2
> on t1.name=t2.name
> where t2.name is missing
> ;
> quit;
> proc sort data=one;
> by x;
> run;
> proc sort data=two;
> by x;
> run;
> proc sql _method;
> create table three as
> select t1.x,t2.&vars2
> from one (sortedby=x) as t1 left join two (sortedby=x) as t2
> on t1.x=t2.x
> ;
> quit;
> data three;
> merge one(in = i1) two(keep = x &vars2);
> by x;
> if i1;
> run;
> proc sql;
> create index x on two(x);
> create table three as
> select t1.x,t2.&vars2
> from one as t1 left join two as t2
> on t1.x=t2.x
> ;
> drop index x from two;
> quit;
> proc sql _method;
> create table three as
> select t1.x,t1.a,t2.&vars2
> from one as t1 left join two as t2
> on t1.x=t2.x
> ;
> quit;
> proc datasets;
> modify two;
> index create x;
> run;
> data three;
> do until(eof);
> set one end=eof;
> set two (keep= x &vars2);
> key=x;
> output;
> end;
> stop;
> run;
>
SNIP>>>>>>>>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic