List: sas-l
Subject: Re: Randomly Print Records
From: John Whittington <John.W@mediscience.co.uk>
Date: 1999-09-30 21:24:49
Since I suggested:
proc print data = yourfile (where = ( ranuni(2345346) < 20/1000000 ) ) ;
run ;
... several people have asked me about the chances of ending up with
samples of particular sizes using this approach. The theoretical answer is
a straightforward application of the binomial distribution: each of the
1,000,000 records is selected independently with probability 20/1,000,000,
so the sample size follows a binomial(1,000,000, 20/1,000,000) distribution.
The following code, and the output which follows it, gives the probability
of getting a sample of each possible size from 0 to 40 from a source dataset
of 1,000,000 records when 'aiming for' a sample of 20 - although the 20 and
the 1,000,000 can obviously be changed in the code to suit any other situation:
data doit ;
   /* PROBBNML(p, n, m) returns the cumulative binomial probability P(X <= m) */
   n = 0 ; p = probbnml(20/1e6, 1e6, 0) ; p2 = p*100 ; output ;
   do n = 1 to 40 ;
      /* exact probability of a sample of size n: F(n) - F(n-1) */
      p = probbnml(20/1e6, 1e6, n) - probbnml(20/1e6, 1e6, n-1) ;
      p2 = p*100 ;
      output ;
   end ;
   format p p2 16.14 ;
   label n  = 'No. in Sample'
         p  = 'Probability of this'
         p2 = 'Probability as %' ;
run ;
proc print label noobs ; run ;
... gives output:
No. in Sample    Probability of this    Probability as %
0 0.00000000206074 0.00000020607414
1 0.00000004121565 0.00000412156529
2 0.00000041216436 0.00004121643597
3 0.00000274781186 0.00027478118590
4 0.00001373929286 0.00137392928637
5 0.00005495805079 0.00549580507870
6 0.00018319625058 0.01831962505810
7 0.00052342518680 0.05234251868001
8 0.00130857997866 0.13085799786603
9 0.00290799040430 0.29079904043011
10 0.00581604478568 0.58160447856764
11 0.01057473263144 1.05747326314427
12 0.01762471300992 1.76247130099162
13 0.02711516001609 2.71151600160886
14 0.03873621403719 3.87362140371920
15 0.05164859527888 5.16485952788821
16 0.06456106690885 6.45610669088481
17 0.07595450018630 7.59545001862986
18 0.08439414228268 8.43941422826766
19 0.08883611692044 8.88361169204376
20 0.08883620575845 8.88362057584528
21 0.08460591024620 8.46059102461959
22 0.07691438694430 7.69143869442992
23 0.06688194183694 6.68819418369368
24 0.05573478432266 5.57347843226550
25 0.04458764910328 4.45876491032804
26 0.03429802012356 3.42980201235620
27 0.02540578839360 2.54057883936003
28 0.01814686467825 1.81468646782497
29 0.01251497896582 1.25149789658247
30 0.00834324421918 0.83432442191819
31 0.00538268437747 0.53826843774717
32 0.00336414072923 0.33641407292276
33 0.00203884870226 0.20388487022636
34 0.00119930717453 0.11993071745305
35 0.00068530879080 0.06853087907976
36 0.00038072139498 0.03807213949781
37 0.00020579205585 0.02057920558457
38 0.00010830976701 0.01083097670057
39 0.00005554247046 0.00555424704570
40 0.00002777070756 0.00277707075644
It can be seen that the highest probability is, indeed, of 20 in the
sample, but there is only an 8.9% chance of that happening - and 19 has
almost exactly the same chance, with the probabilities becoming
progressively smaller as one moves away from 20.
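Of course, the more useful question is usually how likely the sample is to
land reasonably close to 20 rather than exactly on it. The same PROBBNML
function gives the probability of any chosen range of sample sizes; the
sketch below (the dataset and variable names, and the bounds of 10 and 30,
are just illustrative choices, not anything from the code above) computes
P(10 <= size <= 30), which - from summing the tabulated values - comes to
roughly 98%:
data range ;
   /* P(10 <= size <= 30) = F(30) - F(9), with F the cumulative */
   /* binomial probability returned by PROBBNML                 */
   p_range = probbnml(20/1e6, 1e6, 30) - probbnml(20/1e6, 1e6, 9) ;
   pct = p_range * 100 ;
   format p_range 16.14 pct 8.4 ;
run ;
proc print data = range noobs ; run ;
So although no single sample size is particularly likely, a sample
reasonably near 20 is close to a certainty.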
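Finally, if an exact sample of 20 is essential, the one-liner above will
not guarantee it. A different approach (not part of my original suggestion,
and only a sketch, assuming the source dataset is again called YOURFILE) is
sequential selection, which keeps each record with probability
(records still needed)/(records still to be read) and so always ends up
with exactly 20:
data sample ;
   retain need 20 ;              /* records still wanted                     */
   set yourfile nobs = nrecs ;
   left = nrecs - _n_ + 1 ;      /* records not yet read, this one included  */
   if ranuni(2345346) < need / left then do ;
      output ;
      need = need - 1 ;
   end ;
   if need = 0 then stop ;       /* stop reading once 20 have been output    */
   drop need left ;
run ;
proc print data = sample ; run ;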
Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------