List:       kde-i18n-doc
Subject:    Re: Migrating Pology to Python 3
From:       Karl Ove Hufthammer <karl () huftis ! org>
Date:       2022-10-08 10:21:12
Message-ID: 69f3d0a5-7d01-2415-8f09-f2ebbd9d6a31 () huftis ! org

Adrian Chaves wrote on 08.10.2022 12:13:
> I have debugged this issue and I believe the root cause is 
> “addFilterHook name="normalize/noinvisible" on="pmsgstr" 
> handle="noinvisible"”, defined in puretext.filters, which is included 
> in ortography.rules. So I think this is another case where Python 3 is 
> working as expected, and Python 2 was not.

Hmm. The ‘noinvisible’ hook should remove only invisible characters. 
They are defined in normalize.py:

# As defined by http://www.unicode.org/faq/unsup_char.html.
_invisible_character_codepoints = ([]
     + [0x200C, 0x200D] # cursive joiners
     + list(range(0x202A, 0x202E + 1)) # bidirectional format controls
     + [0x00AD] # soft hyphen
     + [0x2060, 0xFEFF] # word joiners
     + [0x200B] # the zero width space
     + list(range(0x2061, 0x2064 + 1)) # invisible math operators
     + [0x115F, 0x1160] # Jamo filler characters
     + list(range(0xFE00, 0xFE0F + 1)) # variation selectors
)

But the non-breaking space (U+00A0) is not among these characters, and 
shouldn’t be removed (or replaced by a normal space).
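
To make the expectation concrete, here is a minimal sketch of what I would expect from such a filter. It is not Pology's actual hook code: the names INVISIBLE_CODEPOINTS and strip_invisible are made up for illustration, and only the code points are copied from the list above.

```python
# Code points copied from the list quoted above (normalize.py).
INVISIBLE_CODEPOINTS = (
    [0x200C, 0x200D]                    # cursive joiners
    + list(range(0x202A, 0x202E + 1))   # bidirectional format controls
    + [0x00AD]                          # soft hyphen
    + [0x2060, 0xFEFF]                  # word joiners
    + [0x200B]                          # the zero width space
    + list(range(0x2061, 0x2064 + 1))   # invisible math operators
    + [0x115F, 0x1160]                  # Jamo filler characters
    + list(range(0xFE00, 0xFE0F + 1))   # variation selectors
)

# Mapping a code point to None makes str.translate() delete it.
_DELETE_TABLE = dict.fromkeys(INVISIBLE_CODEPOINTS)


def strip_invisible(text):
    """Hypothetical helper (an assumption, not the real 'noinvisible' hook)."""
    return text.translate(_DELETE_TABLE)


# Sample with a non-breaking space, a soft hyphen and a zero width space.
sample = "10\u00a0km soft\u00adhyphen zero\u200bwidth"
cleaned = strip_invisible(sample)

# U+00A0 is not in the list, so it should survive the filter.
print(0x00A0 in INVISIBLE_CODEPOINTS)
print("\u00a0" in cleaned)
```

If the real hook behaves like this sketch, the soft hyphen and the zero width space disappear while the non-breaking space is left alone, so a missing U+00A0 would have to come from somewhere else.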


-- 
Karl Ove Hufthammer
