[prev in list] [next in list] [prev in thread] [next in thread] 

List:       sas-l
Subject:    SAS/Python: Parsing Signatures into components
From:       Roger DeAngelis <rogerjdeangelis () GMAIL ! COM>
Date:       2017-03-30 23:51:05
Message-ID: 3747998728144051.WA.rogerjdeangelisgmail.com () listserv ! uga ! edu
[Download RAW message or body]

SAS/Python: Parsing Signatures into components


HAVE  Signatures that I want to parse
====================================

Up to 40 obs WORK.SIGS total obs=21

Obs    SIGNATURE

  1    HATEM (LOVEM) A NOUR EL DEEN III MD
  2    MRS CRISTINA ISABEL DE LA PENA DDS
  3    JUAN (JOE) JOSE LEON DE LOS RIOS MD
  4    LILIANA M DE LA HOZ AYARZA MD
  5    MR RONDA L XARSON MD (DOC RON)
  6    MS ALBERT R BXOWN DDS
  7    MR BETTY B HXN DO
  8    MRS SIOBHAN AC AXWOTT DDS
  9    DR JASON R GERDON PHD
 10    DR TERI A DAVINE DO
 11    DR WALTER R WIEXZTORT DO
 12    MRS GREGORY S MCDENALD DDS
 13    MS JOHN C SACK DDS
 14    DR SUSAN NIEVES CZXNKA DDS
 15    DR NOEL I TBRMULO DDS
 16    DR MARK W JUEXGENS DDS
 17    PETER A MCGUIRE
 18    ZACHARY D ROTH
 19    PETER A MCGUIRE
 20    MARILYN G MASCHGAN
 21    DAVID J VILLANUEVA

WANT
====

You can add to the TITLE and SUFFIX dictionaries

Up to 40 obs WORK.TST total obs=21

Obs    TITLE    FIRST       MIDDLE       LAST                SUFFIX     NICKNAME

  1             HATEM       A NOUR EL    DEEN                III, MD    LOVEM
  2     MRS     CRISTINA    ISABEL       DE LA PENA          DDS
  3             JUAN        JOSE LEON    DE LOS RIOS         MD         JOE
  4             LILIANA     M            DE LA HOZ AYARZA    MD
  5     MR      RONDA       L            XARSON              MD         DOC RON
  6     MS      ALBERT      R            BXOWN               DDS
  7     MR      BETTY       B            HXN                 DO
  8     MRS     SIOBHAN     AC           AXWOTT              DDS
  9     DR      JASON       R            GERDON              PHD
 10     DR      TERI        A            DAVINE              DO
 11     DR      WALTER      R            WIEXZTORT           DO
 12     MRS     GREGORY     S            MCDENALD            DDS
 13     MS      JOHN        C            SACK                DDS
 14     DR      SUSAN       NIEVES       CZXNKA              DDS
 15     DR      NOEL        I            TBRMULO             DDS
 16     DR      MARK        W            JUEXGENS            DDS
 17             PETER       A            MCGUIRE
 18             ZACHARY     D            ROTH
 19             PETER       A            MCGUIRE
 20             MARILYN     G            MASCHGAN
 21             DAVID       J            VILLANUEVA


WORKING CODE
=============

   Python
      fo.write(str(HumanName(signature).as_dict()) + "\n");


*                _                  _       _
 _ __ ___   __ _| | _____        __| | __ _| |_ __ _
> '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` |
> > > > > > (_| |   <  __/_____| (_| | (_| | || (_| |
> _| |_| |_|\__,_|_|\_\___|      \__,_|\__,_|\__\__,_|

;

data sigs;
  length signature $55;
  input;
  signature=_infile_;
  file "d:/txt/nams.txt";
  put signature;
  putlog signature;
cards4;
HATEM (LOVEM) A NOUR EL DEEN III MD
MRS CRISTINA ISABEL DE LA PENA DDS
JUAN (JOE) JOSE LEON DE LOS RIOS MD
LILIANA M DE LA HOZ AYARZA MD
MR RONDA L XARSON MD (DOC RON)
MS ALBERT R BXOWN DDS
MR BETTY B HXN DO
MRS SIOBHAN AC AXWOTT DDS
DR JASON R GERDON PHD
DR TERI A DAVINE DO
DR WALTER R WIEXZTORT DO
MRS GREGORY S MCDENALD DDS
MS JOHN C SACK DDS
DR SUSAN NIEVES CZXNKA DDS
DR NOEL I TBRMULO DDS
DR MARK W JUEXGENS DDS
PETER A MCGUIRE
ZACHARY D ROTH
PETER A MCGUIRE
MARILYN G MASCHGAN
DAVID J VILLANUEVA
;;;;
run;quit;

*            _   _
 _ __  _   _| |_| |__   ___  _ __
> '_ \| | | | __| '_ \ / _ \| '_ \
> > _) | |_| | |_| | | | (_) | | | |
> .__/ \__, |\__|_| |_|\___/|_| |_|
> _|    |___/
;

%utlchkfyl(d:/txt/namfix.txt); * delete if exists;

%utl_submit_py64('
from nameparser import HumanName;
fo = open("d:/txt/namfix.txt", "w");
for line in open("d:/txt/nams.txt"):;
.   fo.write(str(HumanName(line).as_dict()) + "\n");
');


data WANT;
  length title first middle last suffix nickname $32;
  infile "d:/txt/namfix.txt";
  array keys[6] $18 ("u'last': u'","u'suffix': u'","u'title': u'","u'middle': \
u'","u'nickname': u'","u'first': u'");  array nams[6] $32 last suffix title middle \
nickname first;  input;
  do i=1 to 6;
    pos=find(_infile_,strip(keys[i]));
    nams[i] = scan(substr(_infile_,pos),4,"'");
    if nams[i]=:', u' then nams[i]='';
  end;
  output;
  keep last suffix title middle nickname first;
run;quit;


NOTE: The infile "d:/txt/namfix.txt" is:
      Filename=d:\txt\namfix.txt,
      RECFM=V,LRECL=384,File Size (bytes)=2420,
      Last Modified=30Mar2017:19:45:57,
      Create Time=30Mar2017:19:45:57

NOTE: 21 records were read from the infile "d:/txt/namfix.txt".
      The minimum record length was 106.
      The maximum record length was 124.
NOTE: The data set WORK.WANT has 21 observations and 6 variables.


Up to 40 obs from WANT total obs=21

Obs    TITLE    FIRST       MIDDLE       LAST                SUFFIX     NICKNAME

  1             HATEM       A NOUR EL    DEEN                III, MD    LOVEM
  2     MRS     CRISTINA    ISABEL       DE LA PENA          DDS
  3             JUAN        JOSE LEON    DE LOS RIOS         MD         JOE
  4             LILIANA     M            DE LA HOZ AYARZA    MD
  5     MR      RONDA       L            XARSON              MD         DOC RON
  6     MS      ALBERT      R            BXOWN               DDS
  7     MR      BETTY       B            HXN                 DO
  8     MRS     SIOBHAN     AC           AXWOTT              DDS
  9     DR      JASON       R            GERDON              PHD
 10     DR      TERI        A            DAVINE              DO
 11     DR      WALTER      R            WIEXZTORT           DO
 12     MRS     GREGORY     S            MCDENALD            DDS
 13     MS      JOHN        C            SACK                DDS
 14     DR      SUSAN       NIEVES       CZXNKA              DDS
 15     DR      NOEL        I            TBRMULO             DDS
 16     DR      MARK        W            JUEXGENS            DDS
 17             PETER       A            MCGUIRE
 18             ZACHARY     D            ROTH
 19             PETER       A            MCGUIRE
 20             MARILYN     G            MASCHGAN
 21             DAVID       J            VILLANUEVA


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic