List: sas-l
Subject: Re: Creating random sample from large dataset
From: Howard Lethbridge <lethcon () DIRCON ! CO ! UK>
Date: 1996-09-30 23:09:34
At 08:24 28/09/96 -0500, you wrote:
>Hi,
>
>First, I'm working on a mainframe (MVS) v6.
>
>I have a very large dataset (950,000 rows/790 columns) and would like to
>create a random sample of this dataset to work with. I tried 1)reading in
>all data, 2)creating a random number, 3)sorting on random number, 4)using
>obs= to select a small subset of the original data. The problem is the sort
>procedure (on the random number) only gets about half way through and bombs
>because of space limitations. Any suggestions??
>
>If you think you have a solution but need more info please address me
>personally (not the list) and I'll summarize for the list at a later date.
>
>Thanks, Steve
>Stephen L. DesJardins Phone:(612) 625-2579
>Research Fellow Fax:(612)624-6057
>Office of Planning and Analysis E-mail: S-DESJ@MAROON.TC.UMN.EDU
>Office of the V.P. for Planning
>University of Minnesota - Twin Cities
>260 Williamson Hall
>231 Pillsbury Dr. SE
>Minneapolis, MN 55455
>http://www.opa.pres.umn.edu/personal/sdesj/
>
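You could try the TAGSORT option, e.g.
PROC SORT DATA=xxx TAGSORT;
(followed by your usual BY statement on the random number). This sorts
only the keys plus the observation numbers, and then retrieves the original
observations in their sorted order. It's slower, but the sort needs far
less work space, because the 790 columns are not carried through it.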
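
For anyone who wants to see the idea outside SAS, here is a rough sketch in
Python of what a tag sort does. It is only an illustration under my own
assumptions, not how PROC SORT is implemented: the function name
tag_sort_sample, the seed argument and the small made-up dataset are all
hypothetical. It builds a small (random key, obs. no.) tag per row, sorts
only the tags, and then fetches the first n original rows by obs. no.

```python
import random

def tag_sort_sample(records, n, seed=0):
    """Return n records chosen at random using a tag sort (illustrative only)."""
    rng = random.Random(seed)
    # Build small (key, obs_no) tags; the wide records themselves are not copied.
    tags = [(rng.random(), obs_no) for obs_no in range(len(records))]
    # Sort only the tags: the analogue of sorting the keys plus the obs. no.
    tags.sort()
    # Fetch the original records in sorted-tag order, keeping the first n.
    return [records[obs_no] for _, obs_no in tags[:n]]

# Hypothetical stand-in for the large dataset: 1,000 rows of 5 columns.
data = [[row * 10 + col for col in range(5)] for row in range(1000)]
sample = tag_sort_sample(data, 10)
```

The point of the design is that the sort only ever handles two small values
per row, however wide the rows are; the price is the extra pass to fetch the
selected rows afterwards.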
_______________________________________________________________
Howard J Lethbridge
+44 (0)181-907 8655
lethcon@dircon.co.uk