[prev in list] [next in list] [prev in thread] [next in thread]
List: sas-l
Subject: Re: PRoc Summary 'Internals'
From: Biff Canlett <Biff () CANLETT ! COM>
Date: 2001-05-31 16:04:34
[Download RAW message or body]
I would never consider debugging a proc summary step without re running it
with the missing option. Missing values can creep into data so easily in
SAS, particularly if you use '+' for addition instead of SUM( ) . Using
Class in proc summary will discard all data with a missing value in the
class variable. If you re run with "PROC SUMMARY MISSING NWAY" and review
your results, I suspect you will find some key values missing.
"teedl" <teedl@tdbank.ca> wrote in message
news:000601c0e9d0$d858c6e0$164b1331@isd.tdbank.ca...
> I have just come across a small interesting issue while performing some
> summarisation on data using Proc Summary and I was hoping some of you
> familiar with this process, or SAS Internal processes could provide me
with
> some insight as to what is happening here. I am pasting the log below.
>
> This particular process reads a DB2 table consisting of all new load
> advances(cheques), selects the desired records based on the date parameter
> passed to it, and then generates a key value. Since a loan can have
multiple
> advances occur within the time period requested, and the desire is to have
a
> single record per loan, the records need to be 'summarized' based on the
> AcctKey. In originally setting this process up I had used the Proc Summary
> NWAY, Class process. However, in reviewing the results I noted that the
> number of records output was much lower than expected. While multiple
> advances do occur, it is not a common occurence. After thinking that the
> uniqueness of my AcctKey was the issue, I spent some time testing that. My
> AcctKey logic proved to be, at least in this case, correct. Lon_Num is
never
> higher than 40, Account_ is a 7 digit number, and Branch is a 4 digit
> number. The resulting Key is, in essence, BBBBAAAAAAANN. While trying to
> determine what was going on, I modified the routine slightly to sort the
> file prior to summarization, and the use the By process. Expecting to get
> the same results I was astonished to find that the By on the Summary
> actually produced the correct results! To make matters even more
> interesting, if I add a Format AcctKey 13. to the DATA step, the Summary
> Class process works fine! So then, it appears that Class is using some
form
> of Format unkown to me, while By does not. Can anyone enlighten me on what
> is internally happening here? And is their a way, using Proc Summary
itself,
> that I can produce the same result for both the Class and By, without
> knowing in advance the format structure?
>
> In essence, I am a little surprised that I would get a different result
from
> what appears to be two identical processes.
>
>
>
> Thanks.
>
>
> Lionel
>
>
> PS to Paul Dorfman. This routine is a 'patch' to an existing process. As a
> result, I did not try to use any of your neat Hashing/Direct
> Addressing/Bitmapping stuff. I will, however, be using that stuff in some
of
> my newer projects.
>
> 1
> 2 DATA NEWLOANS (KEEP=ACCTKEY ADVTP LON_NUM ACCOUNT_ BRANCH_N
> 3 XDATE FROMDATE TODATE
> 4 );
> 5 RETAIN FROMDATE TODATE DSTAMP ;
> 6 SET LOANSDB2.V_NWLN_L (KEEP=BRANCH_N ACCOUNT_ LON_NUM
> 7 LON_TYPE LON_SUBT TRANS_TI PROCEED_
> 8 );
> 9 IF NOT FROMDATE THEN DO ;
> 10 INFILE PARMFILE ;
> 11 INPUT
> 12 %PARMFILE
> 13 ;
> 14 FROMDATE=INPUT(PUT(BSTAMP,8.),YYMMDD8.) ;
> 15 TODATE =INPUT(PUT(DSTAMP,8.),YYMMDD8.) ;
> 16 END ;
> 17 XDATE =DATEPART(TRANS_TI) ;
> 18 IF (FROMDATE LT XDATE LE TODATE) ;
> 19 ACCTKEY =BRANCH_N*1000000000+ACCOUNT_*100+LON_NUM ;
> 20 ACCTYPE =LON_TYPE*1000+LON_SUBT ;
> 21 ADVTP =PROCEED_ ;
> 22
>
> NOTE: THE INFILE PARMFILE IS:
> DSNAME=FSDU.I.M.PARMFILE.VER2.G0035V00,
> UNIT=3390,VOLUME=INFC59,DISP=SHR,BLKSIZE=27960,
> LRECL=40,RECFM=FB
>
> NOTE: 1 RECORD WAS READ FROM THE INFILE PARMFILE.
> NOTE: THE DATA SET WORK.NEWLOANS HAS 136982 OBSERVATIONS AND 8 VARIABLES.
>
> 23 PROC CONTENTS ;
> 24
>
> NOTE: THE PROCEDURE CONTENTS PRINTED PAGE 1.
>
> 25 PROC SORT ;
> 26 BY ACCTKEY ;
> 27
>
> NOTE: THE DATA SET WORK.NEWLOANS HAS 136982 OBSERVATIONS AND 8 VARIABLES.
>
> 28 PROC SUMMARY ;
> 29 BY ACCTKEY ;
> 30 VAR ADVTP ;
> 31 OUTPUT OUT=ADVTP00 SUM= ;
> 32
>
> NOTE: THE DATA SET WORK.ADVTP00 HAS 115176 OBSERVATIONS AND 4 VARIABLES.
>
> 33 PROC SUMMARY DATA=NEWLOANS NWAY ;
> 34 CLASS ACCTKEY ;
> 35 VAR ADVTP ;
> 36 OUTPUT OUT=ADVTP00 SUM= ;
> NOTE: THE DATA SET WORK.ADVTP00 HAS 18826 OBSERVATIONS AND 4 VARIABLES.
>
> NOTE: SAS INSTITUTE INC., SAS CAMPUS DRIVE, CARY, NC USA 27513-2414
>
>
> CONTENTS PROCEDURE
>
> DATA SET NAME: WORK.NEWLOANS OBSERVATIONS:
> 136982
> MEMBER TYPE: DATA VARIABLES:
8
> ENGINE: V609 INDEXES:
0
> CREATED: 8:05 THURSDAY, MAY 31, 2001 OBSERVATION LENGTH:
64
> LAST MODIFIED: 8:05 THURSDAY, MAY 31, 2001 DELETED OBSERVATIONS:
0
> PROTECTION: COMPRESSED:
NO
> DATA SET TYPE: SORTED:
NO
> LABEL:
>
> -----ENGINE/HOST DEPENDENT INFORMATION-----
>
> DATA SET PAGE SIZE: 6144
> NUMBER OF DATA SET PAGES: 1443
> FILE FORMAT: 607
> FIRST DATA PAGE: 1
> MAX OBS PER PAGE: 95
> OBS IN FIRST DATA PAGE: 75
> PHYSICAL NAME: SYS01151.T080551.RA000.UCLN23RT.R0423646
> RELEASE CREATED: 6.090470
> RELEASE LAST MODIFIED: 6.090470
> CREATED BY: UCLN23RT
> LAST MODIFIED BY: UCLN23RT
> SUBEXTENTS: 2
> TOTAL BLOCKS USED: 1443
>
> -----ALPHABETIC LIST OF VARIABLES AND ATTRIBUTES-----
>
> # VARIABLE TYPE LEN POS FORMAT INFORMAT LABEL
> ----------------------------------------------------------------------
--
> 4 ACCOUNT_ NUM 8 24 9. 9.
ACCOUNT_NUM
> 7 ACCTKEY NUM 8 48
> 8 ADVTP NUM 8 56
> 3 BRANCH_N NUM 8 16 7. 7.
BRANCH_NUM
> 1 FROMDATE NUM 8 0
> 5 LON_NUM NUM 8 32 7. 7. LON_NUM
> 2 TODATE NUM 8 8
> 6 XDATE NUM 8 40
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic