
List:       sas-l
Subject:    Re: Remove Duplicates
From:       Sigurd Hermansen <HERMANS1 () WESTAT ! COM>
Date:       2002-08-30 22:28:00

I called the SELECT DISTINCT ... a solution to the variant of the problem that
you stated. If you add other variables to the SELECT list, it will as before
remove exact duplicates of all of the column variables listed.
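
As a rough sketch of what that looks like, assuming the goal is simply to keep
every column of foo and drop only rows that match on all of them (the asterisk
stands in for the full list: Var1, Var2, Var3, MortgageId, Name, PIF, etc.):

```sas
proc sql;
  /* DISTINCT compares entire rows of the SELECT list, so only exact
     duplicates across every column are collapsed into one row. */
  create table fooNoDup as
  select distinct *
  from foo;
quit;
```

Two rows that share Var1, Var2, Var3 but differ in, say, Name would both
survive this step.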

You may want to eliminate duplicates of the key, Var1,Var2,Var3, but keep one
set of values of the other variables even if they do not match exactly. That
means you must build into your program an arbitrary rule that determines which
of the rows containing duplicate keys to keep.

To build in a rule for keeping one of several possibilities, you'll need a
more complex program. What rule do you want to specify?
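
Purely as an illustration, and assuming a rule you have not stated (keep, for
each Var1,Var2,Var3 key, the row with the lowest MortgageId), a sort followed
by a DATA step would do it. The data set names fooSorted and fooOnePerKey are
made up for this sketch:

```sas
/* Assumed rule: within each key, the row with the lowest MortgageId wins.
   Using MortgageId as the tiebreaker is a guess, not your stated rule. */
proc sort data=foo out=fooSorted;
  by var1 var2 var3 MortgageId;
run;

data fooOnePerKey;
  set fooSorted;
  by var1 var2 var3;
  /* first.var3 is true on the first row of each var1/var2/var3 group */
  if first.var3;
run;
```

PROC SORT with the NODUPKEY option also leaves one row per key, but the row
that survives then depends on the order of the input rather than on a rule
you chose.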

Sig

-----Original Message-----
From: Nick I [mailto:ni14@mail.com]
Sent: Thursday, August 29, 2002 5:40 PM
To: Sigurd Hermansen
Subject: Re: Re: Remove Duplicates


Yes, that works, IF I only have those 3 variables (Var1, Var2, Var3).
However, the data set has other variables as well, which I need to keep!
Doing
....
select distinct var1, var2, var3
from foo
......

will remove the other fields and keep ONLY Var1-Var3 (removes duplicates but
other variables are gone!). I Want ALL variables,
Var1, Var2, Var3, MortgageId, Name, PIF, etc.

Thanks.
--nick


----- Original Message -----
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Date:         Thu, 29 Aug 2002 17:10:48 -0400
To: SAS-L@LISTSERV.UGA.EDU
Subject:      Re: Remove Duplicates


> This variant of the general problem of removing duplicates has an easy SQL
> solution:
>
> create table fooNoDup as select distinct var1,var2,var3 from foo;
>
> The DISTINCT qualifier treats the three variables as a class and forms a
> set (no duplicates) of elements of the class.
>
> Sig
>
> -----Original Message-----
> From: Nick I [mailto:ni14@MAIL.COM]
> Sent: Thursday, August 29, 2002 4:59 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Remove Duplicates
>
>
> Dear SAS experts,
>
> I need to remove duplicate records and I don't know how to write the code.
>
> Here is an example data (just copy and paste into SAS):
>
> Var1 = Numeric
> Var2 = Char
> Var3 = Num
>
> data foo;
> input var1 var2 $ var3;
> cards;
> 123 a 1
> 124 a 1
> 125 a 1
> 125 a 1
> 125 b 1
> 126 b 1
> 127 b 1
> 128 c 1
> 128 c 1
> 129 c 1
> 129 c 1
> 130 d 2
> 131 d 2
> 132 c 2
> 133 c 2
> 134 a 2
> 134 e 2
> 134 e 2
> 134 a 2
> 135 a 2
> 136 a 3
> 137 b 3
> 138 d 3
> 138 f 3
> 139 f 3
> 139 f 3
> ;
> run;
>
> Obs 4 must be removed since it appears more than once under the SAME
> VAR3 variable AND the SAME VAR2 variable AND the SAME VAR1 variable.
>
> Obs 9 must be removed.
>
> Obs 11 must be removed.
>
> Obs 16 and 19 must be removed. (TRICKY ! MUST SORT or something?)
>
> Obs 17 must be removed.
> Obs 18 must be removed.
>
> Obs 24 must be removed.
>
> Obs 26 must be removed.
>
> I hope you see the idea from this small example.
>
> Thanks a bunch.
>
> --nick
>
> --
> __________________________________________________________
> Sign-up for your own FREE Personalized E-mail at Mail.com
> http://www.mail.com/?sr=signup
>

