
List:       gcc-fortran
Subject:    libcpp how-to question: Tokenizing and spaces & tabs – [...]
From:       Tobias Burnus <burnus () net-b ! de>
Date:       2014-11-29 15:10:35
Message-ID: 5479E1EB.3080204 () net-b ! de

BACKGROUND

Currently, gfortran reads source files directly. If preprocessing is 
enabled, it calls libcpp directly but writes the preprocessed output 
into a temporary file, which is then read. In order to bring processing 
closer to the common code, to show macro expansions in error messages, 
and similar, I'd like to use libcpp for reading all files – both already 
preprocessed ones and those still to be preprocessed.

The problem is that whitespace seems to get lost in libcpp. In Fortran, 
spaces play a role:
* In free format, only to warn if tabs appear (invalid per the ISO 
standard, -Wtabs) and to print an error if the line is too long (at most 
132 characters according to the standard; -ffree-line-length-none makes 
it unlimited).
* For fixed format, the whitespace is crucial. The columns 1 to 6 have a 
special meaning, but also the total length is limited to 72 characters; 
excess characters are ignored (comment). That dates back to the time of 
punch cards and the eight excess characters were e.g. used to enumerate 
the punch cards. There are still Fortran programs out there which assume 
that everything beyond 72 characters is ignored. Others assume that 80 
(= full punch card) or 132 characters are permitted (free-form limit). 
(gfortran permits any value >=72, including unlimited.)
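The column rules above can be sketched in a few lines of C. This is only 
an illustration, not gfortran or libcpp code: the names 
fixed_form_fields, split_fixed_form and line_has_tab are made up here, 
the split into a label field (columns 1 to 5) and a continuation column 
(column 6) follows the usual fixed-form layout, and the sketch assumes 
one character per column (i.e. tabs already expanded) and uses 
limit == 0 to mean "unlimited".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, not part of gfortran or libcpp: the
   column fields of one fixed-form source line.  */
struct fixed_form_fields
{
  size_t label_len;    /* characters present in columns 1-5 */
  char continuation;   /* character in column 6, ' ' if the line is shorter */
  size_t stmt_len;     /* characters in columns 7..limit */
  size_t ignored_len;  /* characters beyond the limit (treated as comment) */
};

/* Split LINE (without its newline) into fixed-form fields.  LIMIT is
   the line-length limit, e.g. 72, 80 or 132; 0 means unlimited.
   Assumes one character per column, i.e. no tabs in LINE.  */
struct fixed_form_fields
split_fixed_form (const char *line, size_t limit)
{
  struct fixed_form_fields f = { 0, ' ', 0, 0 };
  size_t len = strlen (line);
  size_t used = len;

  if (limit != 0 && len > limit)
    {
      used = limit;
      f.ignored_len = len - limit;
    }

  f.label_len = used < 5 ? used : 5;
  if (used >= 6)
    f.continuation = line[5];
  if (used > 6)
    f.stmt_len = used - 6;
  return f;
}

/* Free form: whitespace only matters for diagnostics such as -Wtabs.  */
int
line_has_tab (const char *line)
{
  return strchr (line, '\t') != NULL;
}
```

Both helpers need the raw characters of the line, which is exactly the 
information that is no longer available once the line has been turned 
into tokens.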


LIBCPP

Now back to libcpp: As first step, I tried to use
   token = cpp_get_token (cpp_in);
   cpp_token_as_text (cpp_in, token);
for converting the input. I can recover line breaks and whether there 
was preceding whitespace with the flags BOL and PREV_WHITE; for line 
breaks, also by defining a callback. At the beginning of a line, I can 
still recover the number of spaces from the source location 
(SOURCE_COLUMN) but not whether it was done with " " or via a tab. For 
mid-line whitespace, I could use: source column of the current token 
minus source column of the previous token minus the length of the 
previous token when spelled as text. However, that's not really elegant.
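To make the mid-line arithmetic concrete, here is a minimal sketch. It 
deliberately does not use libcpp's types: struct tok and gap_columns are 
hypothetical stand-ins that only carry a start column and the token's 
spelling, as one would obtain them from the source location and from 
cpp_token_as_text.

```c
#include <string.h>

/* Hypothetical token record; libcpp's real cpp_token looks different.  */
struct tok
{
  unsigned int column;   /* 1-based source column where the token starts */
  const char *spelling;  /* token spelled as text */
};

/* Number of columns between the end of PREV and the start of CUR,
   assuming both tokens are on the same source line.  Returns 0 if
   the tokens touch or if the columns are inconsistent.  */
unsigned int
gap_columns (const struct tok *prev, const struct tok *cur)
{
  unsigned int prev_end
    = prev->column + (unsigned int) strlen (prev->spelling);

  return cur->column > prev_end ? cur->column - prev_end : 0;
}
```

For "x   = 1" with "x" in column 1 and "=" in column 5, this yields a 
gap of 3. The result is only a column count: it cannot tell a tab from 
spaces, and it presumably breaks down as soon as a token's spelling no 
longer matches what was written in the source, for instance after macro 
expansion.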

[A bit related, adding a special Fortran mode makes sense; currently, 
preprocessing can only use the traditional mode as things like
    print *, 'That''s a string which &
      ! Here's a comment line inbetween
      &is continued in the next line'
are not handled properly otherwise: cpp complains about unterminated 
strings, either because of the & continuation line or because of the ' 
in the comment. However, as some other compilers support features such 
as "##" concatenation, users wish to go beyond traditional mode.]

As the Fortran standard doesn't define how the preprocessing works,* we 
do have quite some leeway. However, for -fpreprocessed, the whitespace 
really has to be passed through as is. (-fpreprocessed is the default in 
gfortran unless a special file extension (.F, .F90, .fpp) or "-cpp" is 
used.)


Do you have a suggestion for how best to implement this whitespace 
preservation with libcpp? It can (and presumably should) be a special 
flag/function for Fortran.


Tobias

* To be precise: Part 3 of the Fortran standardization series (ISO/IEC 
1539-3:1998) defines conditional compilation ("coco") but that never 
caught on [an external tool "coco" exists to use it]. I think coco is 
supposed to get retired. On the other hand, all Fortran compilers 
can optionally run the code through the C preprocessor automatically; 
some simply use "cpp", others netlib.org's fpp, and some support newer 
features.