
List:       freedesktop-poppler
Subject:    [poppler] tweaking pdfto[html|xml] to avoid spaces within words + which spellcheck to use ...
From:       Albretch Mueller <lbrtchx () gmail ! com>
Date:       2020-06-02 11:53:17
Message-ID: CAFakBwib2okLyM+eWKuas_3Qkf1rxWNyXroMtEts303i8=SB8A () mail ! gmail ! com

 Which option should be used to avoid results like this one?

 <a href="...#183">Per cep tual  Re sponse .</a></text>

 Or, which spellcheckers do you use in tandem with pdftohtml to
correct such spaces within words (and, ideally, spellcheck those
lines)?
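 A minimal sketch of one possible post-processing step, assuming a
plain word list is available: greedily merge runs of adjacent fragments
whenever their concatenation (lower-cased, trailing punctuation
stripped) is a known word. The tiny WORDS set and the rejoin() helper
are hypothetical stand-ins for illustration, not part of poppler or any
spellchecker.

```python
# Hypothetical word list; in practice you would load a real dictionary
# (for example a hunspell .dic file or /usr/share/dict/words).
WORDS = {"perceptual", "response"}


def rejoin(text: str, words: set[str], max_parts: int = 4) -> str:
    """Greedily merge adjacent fragments whose concatenation is a known word."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest merge first, so "Per cep tual" wins over "Per cep".
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            # Ignore trailing punctuation when looking the candidate up.
            if candidate.lower().rstrip(".,;:!?") in words:
                out.append(candidate)
                i += n
                break
        else:
            # No merge found: keep the fragment as it is.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(rejoin("Per cep tual  Re sponse .", WORDS))  # → Perceptual Response.
```

 Greedy merging like this can also glue together genuinely separate
words (e.g. "in to" -> "into"), so its output would still need review.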

 It appears to be caused either by something in the PDF file itself or
by the text extraction algorithm (based on phonemes?), because the
starting and ending characters of the words (meaningful sequences of
characters) are never split off.

 LibreOffice's spellchecker doesn't fix all such word-splitting spaces
with "Correct All". The same splits also appear if you go: Okular >
Export As > Text.

 lbrtchx
_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler