[prev in list] [next in list] [prev in thread] [next in thread] 

List:       sas-l
Subject:    Re: PRoc Summary 'Internals'
From:       Biff Canlett <Biff () CANLETT ! COM>
Date:       2001-05-31 16:04:34
[Download RAW message or body]

I would never consider debugging a proc summary step without re running it
with the missing option.  Missing values can creep into data so easily in
SAS, particularly if you use '+' for addition instead of SUM( ) . Using
Class in proc summary will discard all data with a missing value in the
class variable. If you re run with "PROC SUMMARY MISSING NWAY" and review
your results, I suspect you will find some key values missing.

"teedl" <teedl@tdbank.ca> wrote in message
news:000601c0e9d0$d858c6e0$164b1331@isd.tdbank.ca...
> I have just come across a small interesting issue while performing some
> summarisation on data using Proc Summary and I was hoping some of you
> familiar with this process, or SAS Internal processes could provide me
with
> some insight as to what is happening here. I am pasting the log below.
>
> This particular process reads a DB2 table consisting of all new load
> advances(cheques), selects the desired records based on the date parameter
> passed to it, and then generates a key value. Since a loan can have
multiple
> advances occur within the time period requested, and the desire is to have
a
> single record per loan, the records need to be 'summarized' based on the
> AcctKey. In originally setting this process up I had used the Proc Summary
> NWAY, Class process. However, in reviewing the results I noted that the
> number of records output was much lower than expected. While multiple
> advances do occur, it is not a common occurence. After thinking that the
> uniqueness of my AcctKey was the issue, I spent some time testing that. My
> AcctKey logic proved to be, at least in this case, correct. Lon_Num is
never
> higher than 40, Account_ is a 7 digit number, and Branch is a 4 digit
> number. The resulting Key is, in essence, BBBBAAAAAAANN. While trying to
> determine what was going on, I modified the routine slightly to sort the
> file prior to summarization, and the use the By process. Expecting to get
> the same results I was astonished to find that the By on the Summary
> actually produced the correct results! To make matters even more
> interesting, if I add a Format AcctKey  13. to the DATA step, the Summary
> Class process works fine! So then, it appears that Class is using some
form
> of Format unkown to me, while By does not. Can anyone enlighten me on what
> is internally happening here? And is their a way, using Proc Summary
itself,
> that I can produce the same result for both the Class and By, without
> knowing in advance the format structure?
>
> In essence, I am a little surprised that I would get a different result
from
> what appears to be two identical processes.
>
>
>
> Thanks.
>
>
> Lionel
>
>
> PS to Paul Dorfman. This routine is a 'patch' to an existing process. As a
> result, I did not try to use any of your neat Hashing/Direct
> Addressing/Bitmapping stuff. I will, however, be using that stuff in some
of
> my newer projects.
>
> 1
> 2          DATA NEWLOANS (KEEP=ACCTKEY ADVTP LON_NUM ACCOUNT_ BRANCH_N
> 3                              XDATE  FROMDATE TODATE
> 4                         );
> 5            RETAIN FROMDATE TODATE  DSTAMP ;
> 6            SET LOANSDB2.V_NWLN_L (KEEP=BRANCH_N ACCOUNT_ LON_NUM
> 7                                  LON_TYPE LON_SUBT TRANS_TI PROCEED_
> 8                                  );
> 9            IF NOT FROMDATE THEN DO ;
> 10             INFILE PARMFILE ;
> 11             INPUT
> 12             %PARMFILE
> 13             ;
> 14             FROMDATE=INPUT(PUT(BSTAMP,8.),YYMMDD8.) ;
> 15             TODATE  =INPUT(PUT(DSTAMP,8.),YYMMDD8.) ;
> 16           END ;
> 17           XDATE    =DATEPART(TRANS_TI) ;
> 18           IF (FROMDATE LT XDATE    LE TODATE)      ;
> 19           ACCTKEY  =BRANCH_N*1000000000+ACCOUNT_*100+LON_NUM  ;
> 20           ACCTYPE  =LON_TYPE*1000+LON_SUBT  ;
> 21           ADVTP    =PROCEED_ ;
> 22
>
> NOTE: THE INFILE PARMFILE IS:
>       DSNAME=FSDU.I.M.PARMFILE.VER2.G0035V00,
>       UNIT=3390,VOLUME=INFC59,DISP=SHR,BLKSIZE=27960,
>       LRECL=40,RECFM=FB
>
> NOTE: 1 RECORD WAS READ FROM THE INFILE PARMFILE.
> NOTE: THE DATA SET WORK.NEWLOANS HAS 136982 OBSERVATIONS AND 8 VARIABLES.
>
> 23         PROC CONTENTS ;
> 24
>
> NOTE: THE PROCEDURE CONTENTS PRINTED PAGE 1.
>
> 25         PROC SORT ;
> 26           BY ACCTKEY ;
> 27
>
> NOTE: THE DATA SET WORK.NEWLOANS HAS 136982 OBSERVATIONS AND 8 VARIABLES.
>
> 28         PROC SUMMARY ;
> 29           BY    ACCTKEY    ;
> 30           VAR   ADVTP     ;
> 31           OUTPUT OUT=ADVTP00 SUM= ;
> 32
>
> NOTE: THE DATA SET WORK.ADVTP00 HAS 115176 OBSERVATIONS AND 4 VARIABLES.
>
> 33         PROC SUMMARY DATA=NEWLOANS NWAY ;
> 34           CLASS  ACCTKEY ;
> 35           VAR    ADVTP   ;
> 36           OUTPUT OUT=ADVTP00 SUM= ;
> NOTE: THE DATA SET WORK.ADVTP00 HAS 18826 OBSERVATIONS AND 4 VARIABLES.
>
> NOTE: SAS INSTITUTE INC., SAS CAMPUS DRIVE, CARY, NC USA 27513-2414
>
>
>                                CONTENTS PROCEDURE
>
> DATA SET NAME: WORK.NEWLOANS                        OBSERVATIONS:
> 136982
> MEMBER TYPE:   DATA                                 VARIABLES:
8
> ENGINE:        V609                                 INDEXES:
0
> CREATED:       8:05 THURSDAY, MAY 31, 2001          OBSERVATION LENGTH:
64
> LAST MODIFIED: 8:05 THURSDAY, MAY 31, 2001          DELETED OBSERVATIONS:
0
> PROTECTION:                                         COMPRESSED:
NO
> DATA SET TYPE:                                      SORTED:
NO
> LABEL:
>
>                   -----ENGINE/HOST DEPENDENT INFORMATION-----
>
>        DATA SET PAGE SIZE:       6144
>        NUMBER OF DATA SET PAGES: 1443
>        FILE FORMAT:              607
>        FIRST DATA PAGE:          1
>        MAX OBS PER PAGE:         95
>        OBS IN FIRST DATA PAGE:   75
>        PHYSICAL NAME:            SYS01151.T080551.RA000.UCLN23RT.R0423646
>        RELEASE CREATED:          6.090470
>        RELEASE LAST MODIFIED:    6.090470
>        CREATED BY:               UCLN23RT
>        LAST MODIFIED BY:         UCLN23RT
>        SUBEXTENTS:               2
>        TOTAL BLOCKS USED:        1443
>
>               -----ALPHABETIC LIST OF VARIABLES AND ATTRIBUTES-----
>
>     #    VARIABLE    TYPE    LEN    POS    FORMAT    INFORMAT    LABEL
>     ----------------------------------------------------------------------
--
>     4    ACCOUNT_    NUM       8     24    9.        9.
ACCOUNT_NUM
>     7    ACCTKEY     NUM       8     48
>     8    ADVTP       NUM       8     56
>     3    BRANCH_N    NUM       8     16    7.        7.
BRANCH_NUM
>     1    FROMDATE    NUM       8      0
>     5    LON_NUM     NUM       8     32    7.        7.          LON_NUM
>     2    TODATE      NUM       8      8
>     6    XDATE       NUM       8     40

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic