'Re: Creating random sample from large dataset'


List:       sas-l
Subject:    Re: Creating random sample from large dataset
From:       Howard Lethbridge <lethcon () DIRCON ! CO ! UK>
Date:       1996-09-30 23:09:34

At 08:24 28/09/96 -0500, you wrote:
>Hi,
>
>First, I'm working on a mainframe (MVS) v6.
>
>I have a very large dataset (950,000 rows/790 columns) and would like to
>create a random sample of this dataset to work with.  I tried 1)reading in
>all data, 2)creating a random number, 3)sorting on random number, 4)using
>obs= to select a small subset of the original data.  The problem is the sort
>procedure (on the random number) only gets about half way through and bombs
>because of space limitations.  Any suggestions??
>
>If you think you have a solution but need more info please address me
>personally (not the list) and I'll summarize for the list at a later date.
>
>Thanks, Steve
>Stephen L. DesJardins                                      Phone:(612) 625-2579
>Research Fellow                                            Fax:(612)624-6057
>Office of Planning and Analysis                 E-mail: S-DESJ@MAROON.TC.UMN.EDU
>Office of the V.P. for Planning
>University of Minnesota - Twin Cities
>260 Williamson Hall
>231 Pillsbury Dr. SE
>Minneapolis, MN  55455
>http://www.opa.pres.umn.edu/personal/sdesj/
>
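You could try the TAGSORT option, e.g.
PROC SORT DATA=xxx TAGSORT;
  BY yyy; RUN;

(where xxx is your dataset and yyy is your random-number variable).

With TAGSORT, only the BY keys plus the observation numbers are sorted, and
the original observations are then retrieved in the order of their sorted
obs. no. It's slower, but needs much less temporary sort work space, which
helps most when the key is short compared with the full record.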
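
As a rough sketch only, this is how TAGSORT might slot into the four steps
described in the question. The names KEYED, SHUFFLED, SAMPLE and RANDKEY, the
seed 12345 and the sample size of 1000 are all made up for illustration, and
xxx again stands for the original dataset.

```sas
/* Steps 1-2: read the data and attach a random sort key.        */
/* RANDKEY and the seed 12345 are illustrative assumptions.      */
DATA keyed;
  SET xxx;
  randkey = RANUNI(12345);
RUN;

/* Step 3: sort on the random key. With TAGSORT only RANDKEY and */
/* the observation number go into the sort work files.           */
PROC SORT DATA=keyed OUT=shuffled TAGSORT;
  BY randkey;
RUN;

/* Step 4: keep the first 1000 observations as the sample        */
/* (1000 is an arbitrary example size).                          */
DATA sample;
  SET shuffled (OBS=1000);
RUN;
```

Note that the sorted output dataset is still written in full, so this only
eases the sort work space, not the space for KEYED and SHUFFLED themselves.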

_______________________________________________________________
Howard J Lethbridge
+44 (0)181-907 8655
lethcon@dircon.co.uk
