[prev in list] [next in list] [prev in thread] [next in thread] 

List:       sas-l
Subject:    Re: SAS/MVS: Delete x previous records based on condition
From:       "Huang, Ya" <yhuang () AMYLIN ! COM>
Date:       2003-07-31 23:19:24
[Download RAW message or body]

Here is a one pass solution. The idea is to build up a
3 element FIFO queue, and test if any of the three elements
in the queue is the summary record, if NOT then POP the
first for output, otherwise do nothing. To fill the queue
with consecutive three records, use firstobs data set options:

data xx;
input str0 $1-30;
cards;
Record1AAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDD
1111111111111111111111111111
2222222222222222222222222222
3333333333333333333333
SUMMARYXXXXXXXXXXXXXXXXX
Record2AAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDD
3333333333333333333333333333
5555555555555555555555555555
SUMMARYXXXXXXXXXXXXXXXXX
Record3AAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDD
4444444444444441111111111111
7799999999999999222222222222
SUMMARYXXXXXXXXXXXXXXXXX
;

data xx;
length str1-str3 $100;
str1=''; str2=''; str3='';
 set xx (firstobs=1);
 str1=str0;
 set xx (firstobs=2);
 str2=str0;
 set xx (firstobs=3);
 str3=str0;
if not(
   substr(str1,1,7) = 'SUMMARY' or
   substr(str2,1,7) = 'SUMMARY' or
   substr(str3,1,7) = 'SUMMARY') then put str1;
run;

-------------

Record1AAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDD
1111111111111111111111111111
Record2AAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDD
Record3AAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDD
NOTE: There were 21 observations read from the data set WORK.XX.

I added an extra record for record1, '333333333333333333',
to show that it only get rid of the two records immediately
before the summary record, as you can see, '11111111111111111111111111'
is preserved, because it is the third record above summary.

Kind regards,

Ya Huang


-----Original Message-----
From: Ian Whitlock [mailto:WHITLOI1@WESTAT.COM]
Sent: Thursday, July 31, 2003 12:17 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SAS/MVS: Delete x previous records based on condition


Prompted by Howard's fine solution, he is a different sort of two pass
solution with slightly different requirements - SUMMARY is not on the first
two records and is on the last one.  If SUMMARY comes too soon after RECORD
then a logical record my be lost or at least its identifier, but that is the
fault of the specs not the program.

filename demo temp ;
filename demo2 "%sysfunc(pathname(demo))" ;

data _null_;
   file demo;
   /* copy Howard's PUT statements below */
   run;

data _null_ ;
   infile demo2 firstobs = 3 ;
   input test $char7. ;
   if test = "SUMMARY" then state = 1 ;
   infile demo ;
   input ;
   if state = 0 then put _infile_ ;
   else
      state + 1 ;
   if state = 4 then state = 0 ;
run ;

IanWhitlock@westat.com

-----Original Message-----
From: Howard Schreier [mailto:Howard_Schreier@ITA.DOC.GOV]
Sent: Thursday, July 31, 2003 2:15 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SAS/MVS: Delete x previous records based on condition


It's hard to do in one pass unless there is some reliable decision rule for
distinguishing the summary records from the pre-summary records as they are
processed. A lookback rule is hard to implement.

>From the example, it appears that pre-summary records are all alpha and
summary records are all numerics. I suspect this is just a lack of realism
and generality in the example, so I won't build from that.

Instead, here's a two-pass approach which disregards the logical binding of
lines pertaining to the same transaction. It simply deletes lines beginning
with "SUMMARY" and the two lines preceding each such line.

Store the example:

   filename demo 'demo';

   data _null_;
   file demo;
   put 'Record1AAAAAAAAAAAAAAAAAAA';
   put 'BBBBBBBBBBBBBBBBBBBBBBBBB';
   put 'CCCCCCCCCCCCCCCCCCCCCC';
   put 'DDDDDDDDDDDDDDDDDDDDDD';
   put '1111111111111111111111111111';
   put '2222222222222222222222222222';
   put 'SUMMARYXXXXXXXXXXXXXXXXX';
   put 'Record2AAAAAAAAAAAAAAAAAAA';
   put 'BBBBBBBBBBBBBBBBBBBBBBBBB';
   put 'CCCCCCCCCCCCCCCCCCCCCC';
   put 'DDDDDDDDDDDDDDDDDDDDDD';
   put '3333333333333333333333333333';
   put '5555555555555555555555555555';
   put 'SUMMARYXXXXXXXXXXXXXXXXX';
   put 'Record3AAAAAAAAAAAAAAAAAAA';
   put 'BBBBBBBBBBBBBBBBBBBBBBBBB';
   put 'CCCCCCCCCCCCCCCCCCCCCC';
   put 'DDDDDDDDDDDDDDDDDDDDDD';
   put '4444444444444441111111111111';
   put '7799999999999999222222222222';
   put 'SUMMARYXXXXXXXXXXXXXXXXX';
   run;

Create one view which simply transcribes everything, with explicit line
numbers:

   data all / view=all;
   infile demo;
   obsnum  = _n_;
   input;
   rec = _infile_;
   run;

Create a second view which lists the summary lines to be deleted:

   data summarylines / view=summarylines;
   infile demo;
   input;
   if upcase(_infile_) = : 'SUMMARY' then
    do obsnum = max(0,_n_-2) to _n_;
      output;
      end;
   run;

Finally, a fairly simple MERGE step does the job:

   data result;
   merge all summarylines(in=summary);
   by   obsnum;
   drop obsnum;
   if not summary;
   run;

This approach assumes that lines containing "SUMMARY" are not consecutive.
If this is violated, the SUMMARYLINES view will generate data out of order
and the MERGE will fail.

Note that this is a two-pass solution, with both passes occurring in the
final DATA step.

On Wed, 30 Jul 2003 19:27:33 -0700, Kumar <kamur@LYCOS.COM> wrote:

>Hello,
>
>My input file has layout like:
>
>Code:
>
>
>Record1AAAAAAAAAAAAAAAAAAA
>BBBBBBBBBBBBBBBBBBBBBBBBB
>CCCCCCCCCCCCCCCCCCCCCC
>DDDDDDDDDDDDDDDDDDDDDD
>1111111111111111111111111111
>2222222222222222222222222222
>SUMMARYXXXXXXXXXXXXXXXXX
>Record2AAAAAAAAAAAAAAAAAAA
>BBBBBBBBBBBBBBBBBBBBBBBBB
>CCCCCCCCCCCCCCCCCCCCCC
>DDDDDDDDDDDDDDDDDDDDDD
>3333333333333333333333333333
>5555555555555555555555555555
>SUMMARYXXXXXXXXXXXXXXXXX
>Record3AAAAAAAAAAAAAAAAAAA
>BBBBBBBBBBBBBBBBBBBBBBBBB
>CCCCCCCCCCCCCCCCCCCCCC
>DDDDDDDDDDDDDDDDDDDDDD
>4444444444444441111111111111
>7799999999999999222222222222
>SUMMARYXXXXXXXXXXXXXXXXX
>
>
>
>
>
>Here, lets say these are records for 3 transactions, identified by
>Record1, Record2 and Record3. As the last record of each of this
>transaction, there is a SUMMARY record.
>The Summary records in actual, includes the SUMMARY line and 2 records
>above this line (the numeric records in this sample).
>
>I need to conditionally see if the record is SUMMARY line and if so,
>need to delete the SUMMARY line and 2 records above it.
>
>So my output file will look like:
>
>Code:
>
>
>Record1AAAAAAAAAAAAAAAAAAA
>BBBBBBBBBBBBBBBBBBBBBBBBB
>CCCCCCCCCCCCCCCCCCCCCC
>DDDDDDDDDDDDDDDDDDDDDD
>Record2AAAAAAAAAAAAAAAAAAA
>BBBBBBBBBBBBBBBBBBBBBBBBB
>CCCCCCCCCCCCCCCCCCCCCC
>DDDDDDDDDDDDDDDDDDDDDD
>Record3AAAAAAAAAAAAAAAAAAA
>BBBBBBBBBBBBBBBBBBBBBBBBB
>CCCCCCCCCCCCCCCCCCCCCC
>DDDDDDDDDDDDDDDDDDDDDD
>
>
>
>
>How can I achieve this with SAS/MVS program?
>
>Regards
>Kamur

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic