'Re: Data Step vs. Proc sql'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       sas-l
Subject:    Re: Data Step vs. Proc sql
From:       Glenn Heagerty <gheagerty () EARTHLINK ! NET>
Date:       2004-02-28 0:41:47
[Download RAW message or body]

Sigurd,

Thank you for the summary and example. I will keep it with me as I start working
more with PROC SQL.

Have a nice weekend,

Glenn

Sigurd Hermansen wrote:
> Glenn and all:
> I recall at least two other full rounds of arguments on SAS-L about whether
> SAS SQL performs as well or better than the SAS merge. The relative
> performance of SQL joins and Data step merges depends on
> - the 'width' of the datasets being joined (SQL works faster with 'narrow'
> tables);
> - whether SAS knows the sort order of sorted datasets (the merge assumes BY
> group order);
> - LEFT (RIGHT, FULL) vs INNER joins (SAS tends to hash inner joins but not
> others);
> - memory available (though not consistently in MS Windows);
> - prior caching of results (1st attempt runs slower);
> - number of datasets being merged (the merge works much faster on three or
> more datasets
>   being merged on the same key);
> - the density and distribution of join/merge key values (sort/merge works
> better than merge
>   when key 'hit rate' exceeds ~20%);
> - what else is happening in the computing environment.
>
> I wrote a fairly complete program (or perhaps, 'adapted it from a posted
> program') to test left joins vs. merges during a minor join/merge skirmish
> last summer. It follows the Sig line. I don't believe that highly accurate
> differences in the execution times would, in a <100M row context, be worth
> another thought.
>
> Intelligent use of join/merges in SAS would, on the other hand, make a big
> difference in the performance and reliability of SAS programs. Better SQL
> execution plans, at least to the extent that they perform no much worse than
> a default merge and use available memory and indexes to good advantage,
> would make it easier to choose the logically correct method for the task.
> Finally, SAS join/merges work so fast now bullet-proof programming pays
> greater dividends than programming that saves system resources.
> Sig
>
> %let obs = 1e6;;
> Data one ;
>  retain y 1;
>  a=repeat('X',15999);
>  do x = &obs to 1 by -1; output; if uniform(7) < .05 then output; end;
>  run;
>
> Data two ;
>  retain y 2 z 3;
>  a=repeat('X',15999);
>  do x = &obs to 1 by -1; output; end;
>  run;
>
> options mprint;
> proc sql noprint;
>   select t1.name into :vars2 separated by ','
>   from (select name from dictionary.columns
>         where libname = "WORK" and upcase(memname)='TWO') as t1
>         left join
>            (select name from dictionary.columns
>             where libname="WORK" and upcase(memname)='ONE') as t2
>   on t1.name=t2.name
>   where t2.name is missing
>   ;
> quit;
> proc sort data=one;
>   by x;
> run;
> proc sort data=two;
>   by x;
> run;
> proc sql _method;
>   create table three as
>   select t1.x,t2.&vars2
>   from one (sortedby=x) as t1 left join two (sortedby=x) as t2
>   on t1.x=t2.x
>   ;
> quit;
> data three;
>   merge one(in = i1) two(keep = x &vars2);
>    by x;
>     if i1;
> run;
> proc sql;
>   create index x on two(x);
>   create table three as
>   select t1.x,t2.&vars2
>   from one as t1 left join two as t2
>   on t1.x=t2.x
>   ;
>   drop index x from two;
> quit;
> proc sql _method;
>   create table three as
>   select t1.x,t1.a,t2.&vars2
>   from one as t1 left join two as t2
>   on t1.x=t2.x
>   ;
> quit;
> proc datasets;
>    modify two;
>    index create x;
> run;
> data three;
>   do until(eof);
>    set one end=eof;
>    set two (keep= x &vars2);
>    key=x;
>    output;
>   end;
>   stop;
> run;
>

SNIP>>>>>>>>

[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic