
List:       python-3000
Subject:    [Python-3000] Invalid \U escape in source code give
From:       g.brandl () gmx ! net (Georg Brandl)
Date:       2007-07-18 21:42:56
Message-ID: f7m1gn$odp$1 () sea ! gmane ! org

Guido van Rossum wrote:
> On 7/17/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> > When a source file contains a string literal with an out-of-range \U
>> > escape (e.g. "\U12345678"), instead of a syntax error pointing to the
>> > offending literal, I get this, without any indication of the file or
>> > line:
>> >
>> > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
>> > position 0-9: illegal Unicode character
>> >
>> > This is quite hard to track down.
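For reference, the reported failure can be reproduced along these lines. This is a sketch assuming a Python 3 interpreter: the first part calls the `unicode_escape` codec directly, and the second compiles a two-line source string under the made-up filename `example.py`. Recent CPython versions wrap the codec error in a SyntaxError for the literal instead of letting the bare UnicodeDecodeError escape, so the exact output depends on the interpreter version.

```python
# The 10-byte escape sequence from the report. U+12345678 lies far above
# the largest code point (U+10FFFF), so decoding it must fail.
raw = b"\\U12345678"

codec_error = None
try:
    raw.decode("unicode_escape")
except UnicodeDecodeError as exc:
    codec_error = exc
    print("codec:", exc)

# The same literal inside source code. "example.py" is an arbitrary name
# passed to compile(); newer interpreters report it together with a line number.
source = 'x = 1\ny = "\\U12345678"\n'

syntax_error = None
try:
    compile(source, "example.py", "exec")
except SyntaxError as exc:
    syntax_error = exc
    print("compile:", exc.filename, exc.lineno, exc.msg)
```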
>>
>> I think the fundamental flaw is that a codec is used to implement
>> the Python syntax (or, rather, lexical rules).
>>
>> Not quite sure what the rationale for this design was; doing it on
>> the lexical level is (was) tricky because \u escapes were allowed
>> only for Unicode literals, and the lexer had no knowledge of the
>> prefix preceding a literal. (In 3k it is still similar, because
>> \U escapes have no effect in bytes and raw literals.)
>>
>> Still, even if it is "only" handled at the parsing level, I
>> don't see why it needs to be a codec. Instead, implementing
>> escapes in the compiler would still allow for proper diagnostics
>> (notice that in the AST the original lexical form of the string
>> literal is gone).
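A minimal sketch of the alternative described above, decoding escapes in the compiler rather than through a codec, might look like the following. The function `decode_escapes` and its parameters are hypothetical, it handles only a subset of Python's escapes, and it assumes the caller has already stripped the quotes and established that the literal is neither raw nor bytes (the prefix knowledge the lexer lacked). Because the caller passes in where the literal sits, an invalid escape can be reported as a SyntaxError carrying file, line and column.

```python
import sys

# One-character escapes handled by this sketch (a subset of Python's).
SIMPLE_ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t", "r": "\r"}
# Escape letter -> number of hex digits that must follow it.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escapes(body, filename="<string>", lineno=1, col_offset=0):
    """Decode backslash escapes in the body of a (non-raw) str literal.

    Hypothetical helper: errors become SyntaxError with location info
    instead of a bare UnicodeDecodeError.
    """
    out = []
    i = 0

    def fail(msg, pos):
        # SyntaxError offsets are 1-based columns.
        raise SyntaxError(msg, (filename, lineno, col_offset + pos + 1, body))

    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            fail("trailing backslash in string literal", i)
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc in HEX_ESCAPES:
            ndigits = HEX_ESCAPES[esc]
            digits = body[i + 2 : i + 2 + ndigits]
            # Check digits explicitly: int(..., 16) would also accept "_" or "+".
            if len(digits) != ndigits or not set(digits) <= HEX_DIGITS:
                fail(f"truncated \\{esc} escape", i)
            code = int(digits, 16)
            if code > sys.maxunicode:
                fail(f"illegal Unicode character in \\{esc}{digits} escape", i)
            out.append(chr(code))
            i += 2 + ndigits
        else:
            # Unknown escapes are kept verbatim, as Python does for str literals.
            out.append("\\" + esc)
            i += 2
    return "".join(out)


if __name__ == "__main__":
    print(repr(decode_escapes(r"caf\u00e9\n")))
    try:
        decode_escapes(r"\U12345678", filename="example.py", lineno=2, col_offset=5)
    except SyntaxError as exc:
        print(exc.filename, exc.lineno, exc.offset, exc.msg)
```

The design choice mirrors the argument in the message: the decoding logic is the same as a codec's, but because it runs where the source position is known, the diagnostic can point at the offending literal.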
> 
> I guess because it was deemed useful to have a codec for this purpose
> too, thereby exposing the algorithm to Python code that needs the same
> functionality (e.g. the compiler package, RIP).

And it still is useful. If you want to convert a string into a printable
representation, you can use repr(), but for the inverse you need this
codec (or eval()).

Georg
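The round trip described above can be sketched with the standard library alone. This is an illustration, not part of the original message: it uses `ast.literal_eval` as a safer stand-in for `eval()` on the output of `repr()`, and the `unicode_escape` codec pair for the escaped byte-string form.

```python
import ast

s = "tab\there, caf\u00e9, snowman \u2603"

# Forward direction: repr() yields a printable Python string literal.
literal = repr(s)

# Inverse without eval(): ast.literal_eval accepts only literal syntax.
assert ast.literal_eval(literal) == s

# Codec pair: encode to an ASCII-only byte string of escapes...
escaped = s.encode("unicode_escape")
# ...and decode it back to the original text.
assert escaped.decode("unicode_escape") == s

print(literal)
print(escaped)
```

The codec decodes bytes outside escape sequences as Latin-1, so it reliably inverts escaped ASCII input such as the output of `encode("unicode_escape")`; it does not interpret raw UTF-8 input as UTF-8.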


