List: kde-i18n-doc
Subject: Re: Migrating Pology to Python 3
From: Karl Ove Hufthammer <karl () huftis ! org>
Date: 2022-10-08 10:21:12
Message-ID: 69f3d0a5-7d01-2415-8f09-f2ebbd9d6a31 () huftis ! org
Adrian Chaves wrote on 08.10.2022 12:13:
> I have debugged this issue and I believe the root cause is
> “addFilterHook name="normalize/noinvisible" on="pmsgstr"
> handle="noinvisible"”, defined in puretext.filters, which is included
> in ortography.rules. So I think this is another case where Python 3 is
> working as expected, and Python 2 was not.
Hmm. The ‘noinvisible’ hook should remove only invisible characters.
They are defined in normalize.py:
# As defined by http://www.unicode.org/faq/unsup_char.html.
_invisible_character_codepoints = ([]
    + [0x200C, 0x200D] # cursive joiners
    + list(range(0x202A, 0x202E + 1)) # bidirectional format controls
    + [0x00AD] # soft hyphen
    + [0x2060, 0xFEFF] # word joiners
    + [0x200B] # the zero width space
    + list(range(0x2061, 0x2064 + 1)) # invisible math operators
    + [0x115F, 0x1160] # Jamo filler characters
    + list(range(0xFE00, 0xFE0F + 1)) # variation selectors
)
But the non-breaking space (U+00A0) is not among these characters, and
shouldn’t be removed (or replaced by a normal space).
--
Karl Ove Hufthammer