List:       sas-l
Subject:    Re: How to create 10 deciles using Proc Rank when there are too
From:       "Keintz, H. Mark" <mkeintz () WHARTON ! UPENN ! EDU>
Date:       2010-01-31 23:06:44
Message-ID: FEE685C811A7E44AAD17E47B2A966E29BC416968E7 () KITE ! wharton ! upenn ! edu

Nick:

If there are too many ties to make deciles relatively unique,
then what do you mean by "deciles"?

As an extreme example of my question, consider a binary variable
with exactly 500 zeroes and 500 ones.  Would you then insist on
getting deciles?  You apparently have a somewhat similar, but
less extreme, situation.

I suggest you take a look at the distribution of your data.  Maybe
you have a lot of zeroes, and then relatively few ties for values
above zero (I assume no negatives).  In that case, you might want
to put all the zeroes in one (large) group, and divide the others
into roughly equal size quantiles.
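
A rough sketch of that idea, assuming (as in your post) the data set
is mydata and the variable is myscore, that zero is the heavily tied
value, and that there are no negatives.  The names nonzero_ranked and
r_mydata2, and the choice of 9 groups for the non-zero values, are
only illustrative:

proc rank data=mydata(where=(myscore > 0)) out=nonzero_ranked
          groups=9 ties=low descending;
  var myscore;
  ranks r_myscore;      /* groups 0-8, 0 = highest scores */
run;

data r_mydata2;
  set nonzero_ranked
      mydata(where=(myscore = 0) in=iszero);
  /* all the zeroes go into one (large) final group */
  if iszero then r_myscore = 9;
run;

Note that observations with a missing myscore satisfy neither WHERE
condition, so they are left out of r_mydata2.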

If you must form unique, replicable deciles and need a way to
break ties, then add a small random number to your ranking variable
and run PROC RANK.  Use a fixed seed so the result is replicable,
and make sure the random number is small enough that it can never
re-order values.  For instance, if the smallest interval between
distinct neighboring values is, say, 1, then make sure the random
value is always less than 1.
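
If you don't know that smallest interval, something like the
following should report it.  This is only a sketch: the data set name
distinct is made up, and for non-integer scores floating-point noise
can produce tiny spurious gaps, so sanity-check the number it prints.

proc sort data=mydata(keep=myscore) out=distinct nodupkey;
  by myscore;           /* one row per distinct value, ascending */
run;

data _null_;
  set distinct end=last;
  retain mingap;
  gap = dif(myscore);   /* difference from the previous distinct value */
  if gap > 0 then mingap = min(mingap, gap);
  if last then put 'Smallest gap between distinct values: ' mingap best32.;
run;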

Example:


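/* View that adds a small random tie-breaker to myscore.           */
/* 0.5*uniform() is always less than 1, the assumed smallest gap.  */
data rankable / view=rankable;
  set mydata;
  /* fixed seed, so the jiggled values are replicable */
  myscore_jiggled = myscore + 0.5*uniform(10481408);
run;

/* Rank the jiggled score into 10 groups (0-9, 0 = highest scores) */
proc rank data=rankable out=r_mydata groups=10 ties=low descending;
  var myscore_jiggled;
  ranks r_myscore;
run;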
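
To check the result and see the cut values, a quick sketch using the
r_mydata and r_myscore names from the example above:

proc means data=r_mydata n min max;
  class r_myscore;      /* one row per decile group */
  var myscore;          /* original score, not the jiggled one */
run;

N should come out at roughly one tenth of the observations in each
group.  Because the ties are broken at random, a heavily tied value
(such as zero) will straddle group boundaries, so the MIN of one
group can equal the MAX of the next.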

Regards,
Mark

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> newtous
> Sent: Saturday, January 30, 2010 2:26 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: How to create 10 deciles using Proc Rank when there are too
> many ties
>
> Hello everyone,
> I am trying to create 10 deciles using Proc Rank out of a dataset with
> about 100K obs.  Here is the code I used:
>
> Proc rank data=mydata out=r_mydata group=10 ties=low descending;
>   var myscore;
>   ranks r_myscore;
> run;
>
> Unfortunately I got fewer than 10 deciles - for some score I got
> decile 0-7, some decile 0-5.  The reason is  that there are too many
> ties (e.g. ties with value of 0.000) in the data.  The last decile has
> more than 10K obs because SAS treats all the zeros as the same, so
> decile 7 has 30K instead of the desired 10K.
>
> Can anyone help with this?  I am trying to decide the decile cut
> values.  Or does it even make sense to include all the ties?
>
> Thanks in advance for any help.
>
> Nick Yang