List: sas-l
Subject: Re: too many sorts--?Use hashing
From: "Data _null_;" <iebupdte () GMAIL ! COM>
Date: 2012-11-27 16:32:04
Message-ID: CAEZCystmkyMw06OGq_yw2Gi9Kc+EykOWzbuVZBKr3+k4GiOY3Q () mail ! gmail ! com
I think you need to be more specific about how to pick the good records.
My first thought is
where not missing(AMOUNT);
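Applied inside the NODUPKEY sort, that WHERE clause drops the no-data records before deduplication. Another way, when every key should survive but the record with data should win, is to sort non-missing AMOUNT to the front of each key group first. A sketch (assumes AMOUNT is the variable that distinguishes the good records):

```sas
/* Sketch: prefer the record with a non-missing AMOUNT per key.
   In SAS, missing sorts as the smallest value, so DESCENDING AMOUNT
   puts non-missing records first within each key group. */
proc sort data=large_dataset;
  by id1 month_year descending amount;
run;

proc sort nodupkey data=large_dataset;
  by id1 month_year;   /* keeps the first record per key, i.e. one with data */
run;
```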
On Tue, Nov 27, 2012 at 10:11 AM, Dave <d_crimkey@yahoo.com> wrote:
> I'm going to try hashing the two sorts by id1 and id2. But I still need help on the
> first sort -- the no dup one. The large dataset can have multiple recs per key but
> I need to make sure I pick up the record with data--meaning that there might be an
> amount in one record that is missing in the other. Any ideas on how to pick up the
> rec with data?
> Sent from my iPhone
>
> On Nov 27, 2012, at 9:24 AM, "Worley, Larry" <Larry.Worley@TEA.STATE.TX.US> wrote:
>
> > Some thoughts
> >
> > 1. Can you pre-merge the two smaller datasets before merging with the
> > larger dataset? If you can, that could reduce the number of sorts on
> > the larger dataset.
> >
> > 2. Is this a recurring task? If so, you might consider building two
> > indexes on the larger dataset, one for each merge key.
> >
> > 3. Matthew offers another useful approach with hashing.
> >
> > BTW the new required sort may be able to replace your first sort on the
> > large table. The first two keys of the new sort are identical with the
> > keys of the original first sort and are in the same order.
> >
> > Hope this helps.
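The index suggestion above might look like the following sketch (assumes the large table lives in the WORK library; the index names key1/key2 are placeholders):

```sas
/* Sketch: one composite index per merge key on the large table, so
   BY-group processing can use an index instead of a physical sort. */
proc datasets library=work nolist;
  modify large_dataset;
  index create key1 = (id1 month_year);
  index create key2 = (id2 month_year);
quit;
```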
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > Dave
> > Sent: Tuesday, November 27, 2012 8:03 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: too many sorts--?Use hashing
> >
> > I still would have to use two hashes, correct?
> >
> > Sent from my iPhone
> >
> > On Nov 27, 2012, at 8:11 AM, "Zack, Matthew M. (CDC/ONDIEH/NCCDPHP)"
> > <mmz1@cdc.gov> wrote:
> >
> > > Can you use hashing to replace your merging steps and reduce the
> > > total time?
> > >
> > > http://www2.sas.com/proceedings/sugi31/244-31.pdf
> > >
> > > Matthew Zack
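The hash approach in the paper above can be sketched as one DATA step that reads the large table once, with two hash objects for the two lookup tables. This is a hedged sketch, not the paper's exact code; the data variables val1 and val2 are placeholders for whatever sd1 and sd2 actually contribute, and it assumes both lookup tables fit in memory:

```sas
/* Sketch: one pass over large_dataset, two keyed hash lookups,
   no sorting of the large table at all. */
data both_merges;
  if _n_ = 1 then do;
    declare hash h1(dataset:'sd1');       /* lookup table keyed on id1 */
    h1.defineKey('id1', 'month_year');
    h1.defineData('val1');
    h1.defineDone();
    declare hash h2(dataset:'sd2');       /* lookup table keyed on id2 */
    h2.defineKey('id2', 'month_year');
    h2.defineData('val2');
    h2.defineDone();
    call missing(val1, val2);
  end;
  set large_dataset;
  if h1.find() ne 0 then call missing(val1);  /* no match: leave blank */
  if h2.find() ne 0 then call missing(val2);
run;
```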
> > >
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > > dave crimkey
> > > Sent: Monday, November 26, 2012 9:03 PM
> > > To: SAS-L@LISTSERV.UGA.EDU
> > > Subject: too many sorts
> > >
> > > I have a large dataset that I need to merge to two other datasets.
> > > The program is written with the first sort by id1 and month_year so
> > > that I can merge by id1 and month_year. The second sort takes the
> > > same large dataset and sorts by id2 and month_year for a merge to
> > > another dataset. So the code looks like this:
> > >
> > > Proc sort data=large_dataset;
> > > by id1 month_year;
> > > run;
> > > proc sort nodupkey data=dataset1 out=sd1;
> > > by id1 month_year;
> > > run;
> > > data firstmerge;
> > > merge large_dataset sd1;
> > > by id1 month_year;
> > > run;
> > > proc sort data=large_dataset;
> > > by id2 month_year;
> > > run;
> > > proc sort nodupkey data=dataset2 out=sd2;
> > > by id2 month_year;
> > > run;
> > > data secondmerge;
> > > merge large_dataset sd2;
> > > by id2 month_year;
> > > run;
> > >
> > > My question is: Is there a way to do this more efficiently? The two
> > > sorts on the large dataset take a long time. And now I'm being told
> > > I need another sort at the beginning of all of this to make sure
> > > that duplicate rows pick up the right data. So that sort would look
> > > like:
> > >
> > > Proc sort nodupkey data=large_dataset;
> > > by id1 month_year descending member_count;
> > > run;
> > >
> > >
> > > The three sorts on large_dataset are going to take a long time and
> > > this doesn't seem very efficient.
> > >
> > > Thanks,
> > > Dave