List:       sas-l
Subject:    Re: (MVS) Creating random sample from large dataset
From:       "Michael A. Raithel" <MICHAEL.RAITHEL () RAITHM49 ! CUSTOMS ! SPRINT ! COM>
Date:       1996-09-30 19:48:53

Steve Des Jardins wrote:

>I have a very large dataset (950,000 rows/790 columns) and would like
>to create a random sample of this dataset to work with.  I tried
>1) reading in all data, 2) creating a random number, 3) sorting on the
>random number, 4) using obs= to select a small subset of the original
>data.  The problem is the sort procedure (on the random number) only
>gets about halfway through and bombs because of space limitations.
>Any suggestions?
>

Steve,

Your immediate problem, the sort space abend, may be alleviated by
increasing the size of your SORT data set.  To do this, you should
override your default SORT data set size.  This can be done by modifying
your SAS EXEC JCL statement as in this example:

//STEP01 EXEC SAS,SORT='xxx'

In the above statement, xxx represents the number of cylinders (or
tracks--if that is the space unit) that are to be allocated to the sort
data set.

But consider that you may not need to make several passes of the
950,000-observation data set.  There may be a less resource-intensive
way to obtain the sample.

If the records are already in a random order, a simple way to get (say)
every 50th observation into an output data set is:

data keepers;
   set steve.bigfile;

   /* _n_ counts DATA step iterations, so this keeps rows 50, 100, ... */
   if mod(_n_, 50) = 0 then output;

run;

You could change the divisor in the MOD function (50 here) to suit your
needs; for example, 100 would keep every 100th observation, or about
9,500 of the 950,000.
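
If BIGFILE is a disk-resident SAS data set (an assumption; direct
access is not available for tape-format libraries or for views), a
sketch along the following lines might pick up every 50th observation
without reading the other 49.  It relies on the POINT= and NOBS= options
of the SET statement; the variable names i and total are made up for
illustration.

```sas
data keepers;
   /* NOBS= is filled in at compile time, so total is usable in the DO */
   do i = 50 to total by 50;
      /* POINT= reads observation number i directly; i is not kept */
      set steve.bigfile point=i nobs=total;
      output;
   end;
   /* POINT= never hits end-of-file, so STOP is needed to end the step */
   stop;
run;
```

This selects the same rows as the MOD test, so it is only as random as
the order of the records in the file.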

Alternatively, you could replace the MOD test with a call to a random
number function that would determine whether a particular observation
is admitted to your extract.  There are a number of SAS-L gurus who
would probably provide guidance along these lines.
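
As a rough sketch of the random-number approach, and not necessarily
what the gurus would recommend, the two steps below use the RANUNI
function, which returns a uniform value between 0 and 1.  The seed
(12345), the 2% rate, the sample size of 19,000, the output name
KEEPFIX, and the variables need, remain, and total are all illustrative
assumptions.  The second step also assumes NOBS= is available, which is
the case for disk SAS data sets but not for tape-format libraries or
views.

```sas
/* Approximate size: each row is kept independently with probability  */
/* 0.02, so the result is roughly 19,000 of the 950,000 rows.         */
data keepers;
   set steve.bigfile;
   if ranuni(12345) < 0.02 then output;
run;

/* Exact size: keep each row with probability                         */
/* (rows still needed) / (rows not yet examined).                     */
data keepfix (drop=need remain);
   retain need 19000;             /* rows still needed                */
   set steve.bigfile nobs=total;
   remain = total - _n_ + 1;      /* rows left, including this one    */
   if ranuni(12345) < need / remain then do;
      output;
      need = need - 1;
   end;
   if need = 0 then stop;         /* sample is complete; stop reading */
run;
```

Either step makes a single pass of the file and needs no sort, so the
sort work space problem does not arise.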

Good luck!

I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not
reflect those of my organization or my associates.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
E-mail: maraithel@mcimail.com
Author: Tuning SAS Applications in the MVS Environment
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
How do you dare to tell me that I'm my father's son, when that was just
an accident of birth...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
