List:       sas-l
Subject:    Re: too many sorts--?Use hashing
From:       "Data _null_;" <iebupdte@gmail.com>
Date:       2012-11-27 16:32:04
Message-ID: CAEZCystmkyMw06OGq_yw2Gi9Kc+EykOWzbuVZBKr3+k4GiOY3Q@mail.gmail.com

I think you need to be more specific about how to pick the good records.

My first thought is

where not missing(AMOUNT);
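
If "the record with data" just means a non-missing AMOUNT (that variable
name is a guess from your description), one way is to sort so that records
with data come first within each key and then keep the first record per
key.  Note that NODUPKEY alone won't do it once DESCENDING AMOUNT is part
of the BY statement, because NODUPKEY compares every BY variable, AMOUNT
included.  A rough sketch:

proc sort data=large_dataset out=ordered;
   /* missing numeric values sort lowest in SAS, so DESCENDING AMOUNT
      puts records with a non-missing AMOUNT first within each key */
   by id1 month_year descending amount;
run;

data large_nodup;
   /* keep only the first record per key, i.e. the one with data */
   set ordered;
   by id1 month_year;
   if first.month_year;
run;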

On Tue, Nov 27, 2012 at 10:11 AM, Dave <d_crimkey@yahoo.com> wrote:
> I'm going to try hashing the two sorts by id1 and id2. But I still need help on the
> first sort -- the no-dup one. The large dataset can have multiple records per key, but
> I need to make sure I pick up the record with data -- meaning that there might be an
> amount in one record that is missing in the other. Any ideas on how to pick up the
> record with data?
> Sent from my iPhone
> 
> On Nov 27, 2012, at 9:24 AM, "Worley, Larry" <Larry.Worley@TEA.STATE.TX.US> wrote:
> 
> > Some thoughts
> > 
> > 1.  Can you pre-merge the two smaller datasets before merging with the
> > larger dataset?  If you can, that could reduce the sorts on the larger
> > dataset.
> > 
> > 2.  Is this a recurring task?  If so, you might consider building two
> > composite indexes on the larger dataset, one for each key (a sketch
> > follows below).
> > 
> > 3.  Matthew offers another useful approach with hashing.
> > 
> > BTW, the new required sort may be able to replace your first sort on the
> > large table: the first two keys of the new sort are identical to the
> > keys of the original first sort and are in the same order.
> > 
> > Hope this helps.
> > 
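A minimal sketch of suggestion 2, assuming the large dataset lives in the
WORK library (use the permanent libref otherwise):

proc datasets library=work nolist;
   modify large_dataset;
   /* one composite index per merge key; SAS can use an index for
      BY processing, avoiding a physical sort of the large table */
   index create key1 = (id1 month_year);
   index create key2 = (id2 month_year);
quit;
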
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > Dave
> > Sent: Tuesday, November 27, 2012 8:03 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: too many sorts--?Use hashing
> > 
> > I still would have to use two hashes, correct?
> > 
> > Sent from my iPhone
> > 
> > On Nov 27, 2012, at 8:11 AM, "Zack, Matthew M. (CDC/ONDIEH/NCCDPHP)"
> > <mmz1@cdc.gov> wrote:
> > 
> > > Can you use hashing to replace your merging steps and reduce the
> > > total time?
> > > 
> > > http://www2.sas.com/proceedings/sugi31/244-31.pdf
> > > 
> > > Matthew Zack
> > > 
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > > dave crimkey
> > > Sent: Monday, November 26, 2012 9:03 PM
> > > To: SAS-L@LISTSERV.UGA.EDU
> > > Subject: too many sorts
> > > 
> > > I have a large dataset that I need to merge with two other datasets.
> > > The program first sorts by id1 and month_year so that I can merge by
> > > id1 and month_year.  The second sort takes the same large dataset and
> > > sorts by id2 and month_year for a merge with another dataset.  So the
> > > code looks like this:
> > > 
> > > proc sort data=large_dataset;
> > >    by id1 month_year;
> > > run;
> > > 
> > > proc sort nodupkey data=dataset1 out=sd1;
> > >    by id1 month_year;
> > > run;
> > > 
> > > data firstmerge;
> > >    merge large_dataset sd1;
> > >    by id1 month_year;
> > > run;
> > > 
> > > proc sort data=large_dataset;
> > >    by id2 month_year;
> > > run;
> > > 
> > > proc sort nodupkey data=dataset2 out=sd2;
> > >    by id2 month_year;
> > > run;
> > > 
> > > data secondmerge;
> > >    merge large_dataset sd2;
> > >    by id2 month_year;
> > > run;
> > > 
> > > My question is:  Is there a way to do this more efficiently?  The two
> > > sorts on the large dataset take a long time.  And now I'm being told I
> > > need another sort at the beginning of all of this to make sure that
> > > duplicate rows pick up the right data.  So that sort would look like:
> > > 
> > > proc sort nodupkey data=large_dataset;
> > >    by id1 month_year descending member_count;
> > > run;
> > > 
> > > 
> > > The three sorts on large_dataset are going to take a long time, and
> > > this doesn't seem very efficient.
> > > 
> > > Thanks,
> > > Dave
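
On the "two hashes" question: yes, two hash objects, but only one pass
through the large dataset and no sorts of it at all.  A rough sketch,
assuming dataset1 and dataset2 fit in memory and using VAL1 and VAL2 as
stand-ins for whatever variables you actually want from them (those names
are invented).  Unlike MERGE, this keeps every row of large_dataset but
drops small-table rows that have no match:

data both_merges;
   if _n_ = 1 then do;
      if 0 then set dataset1 dataset2;  /* define host variables at compile time */
      /* when a loaded dataset has duplicate keys, the hash object keeps
         the first record and ignores the rest, so the NODUPKEY sorts of
         dataset1 and dataset2 are no longer needed */
      declare hash h1(dataset:'dataset1');
      h1.defineKey('id1','month_year');
      h1.defineData('val1');
      h1.defineDone();
      declare hash h2(dataset:'dataset2');
      h2.defineKey('id2','month_year');
      h2.defineData('val2');
      h2.defineDone();
   end;
   set large_dataset;
   call missing(val1, val2);   /* clear lookup values so non-matches stay missing */
   rc1 = h1.find();            /* fills VAL1 on an id1/month_year match */
   rc2 = h2.find();            /* fills VAL2 on an id2/month_year match */
   drop rc1 rc2;
run;

If the two merges really must land in separate datasets, split both_merges
afterwards; the expensive sorts of large_dataset are still gone.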

