List:       openjdk-compiler-dev
Subject:    Re: [jdk17] RFR: JDK-8269150 UnicodeReader not translating \u005c\\u005d to \\] [v8]
From:       Joe Darcy <darcy () openjdk ! java ! net>
Date:       2021-07-26 17:25:38
Message-ID: 5JqhSGAEZMmgKKLtwvK23jgBXebXwAM3LiWbqHJXs_4=.19a349af-fe5b-47c8-815b-9170574ae3a1 () github ! com

On Fri, 23 Jul 2021 13:31:50 GMT, Jim Laskey <jlaskey@openjdk.org> wrote:

> > This issue relates to *Unicode escapes*, described in section 3.3 of the JLS. \
> > javac interprets Unicode escapes during the reading of ASCII characters from \
> > source. Later on, javac interprets *escape sequences*, described in section 3.10.7 \
> > of the JLS, during the tokenization of character literals, string literals, and \
> > text blocks. Escape sequences are only indirectly affected by this bug. 
> > 
> > During reading, a _normal backslash_ (that is, the ASCII `\` character, not the \
> > corresponding Unicode escape `\u005c`) followed by another normal backslash is \
> > treated collectively as a pair of backslash characters. No further interpretation \
> > is done. This means that if a normal backslash immediately precedes the sequence \
> > `\` `u` `A` `B` `C` `D`, which would "normally" be interpreted as a Unicode \
> > escape, then the interpretation of that sequence as a Unicode escape is \
> > suppressed. 
> > 
> > For example, the sequence `\u2022` would be interpreted as the `•` character, \
> > whereas `\\u2022` would be interpreted as the seven characters `\` `\` `u` `2` `0` \
> > `2` `2`. 
> > 
> > An issue arises when Java developers choose to use a _Unicode escape backslash_ \
> > `\u005c` in their source code, instead of a normal backslash. Prior to JDK 16, if \
> > the Unicode escape backslash was followed by a second Unicode escape, then *the \
> > second Unicode escape was always interpreted*. The normal backslash at the \
> > beginning of the second Unicode escape (immediately followed by `u`) was *not* \
> > paired with the preceding Unicode escape backslash. Otherwise, any following \
> > normal backslash was paired with the `\u005c`. 
> > 
> > For example, the sequence `\u005c\u2022` would be interpreted as `\` and `•`, \
> > whereas `\u005c\tXYZ` would be interpreted as `\` `\` `t` `X` `Y` `Z`. 
> > 
> > The bug was that JDK 16 treated `\u005c` as having no effect on the \
> > interpretation of Unicode escapes. Using the example from the compiler-dev \
> > discussions, `\u005c\\u005d`: 
> > - Prior to JDK 16, it was interpreted as `\` `\` `]`.
> > - JDK 16 interpreted it as `\` `\` `\` `u` `0` `0` `5` `d`, which would produce a \
> > syntax error downstream in the lexer because the escape sequence `\u` is invalid.
> 
> Jim Laskey has updated the pull request with a new target base due to a merge or a \
> rebase. The incremental webrev excludes the unrelated changes brought in by the \
> merge/rebase. The pull request contains 11 additional commits since the last \
> revision: 
> - Merge branch 'master' into 8269150
> - Update UnicodeBackslash test to be easier to follow
> - Remove comment duplicated by merge
> - Merge branch 'master' into 8269150
> - Merge branch '8269150b' into 8269150
> - Use jdk15 logic
> - Proposed change
> - Merge branch 'master' into 8269150
> - Updated the test to include all combinations
> - Merge branch 'master' into 8269150
> - ... and 1 more: https://git.openjdk.java.net/jdk17/compare/3e29056e...3bc5789c
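
The reading rules in the quoted description can be sketched as a small standalone model. This is not javac's `UnicodeReader` code; the class `UnicodeEscapeSketch`, the methods `read` and `escapeEnd`, and the flag `unicodeBackslashPairs` are hypothetical names introduced only for illustration. Passing `true` models the pre-JDK 16 pairing rule as described above, and `false` models the JDK 16 behavior as described. Input strings are assembled from a backslash constant so that the example source itself contains no Unicode escapes.

```java
// A minimal sketch of the rules in the PR description, NOT javac's implementation.
// All names here are hypothetical and exist only for illustration.
class UnicodeEscapeSketch {
    private static final char BS = '\\';

    // Returns the index just past a Unicode escape starting at i, or -1 if
    // no well-formed escape (backslash, one or more 'u', four hex digits) starts there.
    static int escapeEnd(String s, int i) {
        if (i >= s.length() || s.charAt(i) != BS) return -1;
        int j = i + 1;
        if (j >= s.length() || s.charAt(j) != 'u') return -1;
        while (j < s.length() && s.charAt(j) == 'u') j++; // repeated 'u' is permitted
        if (j + 4 > s.length()) return -1;
        for (int k = j; k < j + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return -1;
        }
        return j + 4;
    }

    // unicodeBackslashPairs == true : a decoded backslash escape pairs with a
    //                                 following normal backslash (pre-JDK 16 rule).
    // unicodeBackslashPairs == false: a decoded backslash escape has no effect on
    //                                 pairing (the JDK 16 behavior as described).
    static String read(String src, boolean unicodeBackslashPairs) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            int end = escapeEnd(src, i);
            if (end > 0) {
                // Decode the four hex digits at the end of the escape.
                char decoded = (char) Integer.parseInt(src.substring(end - 4, end), 16);
                out.append(decoded);
                i = end;
                // A following Unicode escape is never paired; it is decoded on the
                // next iteration. Any other following normal backslash is paired.
                if (decoded == BS && unicodeBackslashPairs
                        && i < src.length() && src.charAt(i) == BS
                        && escapeEnd(src, i) < 0) {
                    out.append(BS);
                    i++;
                }
            } else if (src.charAt(i) == BS
                    && i + 1 < src.length() && src.charAt(i + 1) == BS) {
                // Two normal backslashes form a pair; no further interpretation.
                out.append(BS).append(BS);
                i += 2;
            } else {
                out.append(src.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String b = String.valueOf(BS);
        // The example from the thread: backslash escape, normal backslash,
        // then an escape for the closing bracket.
        String example = b + "u005c" + b + b + "u005d";
        System.out.println(read(example, true));
        System.out.println(read(example, false));
    }
}
```

Under this model, the example from the thread reads as `\` `\` `]` with `unicodeBackslashPairs` set to `true`, and as `\` `\` `\` `u` `0` `0` `5` `d` with it set to `false`, matching the two outcomes listed in the description.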

Marked as reviewed by darcy (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk17/pull/126

