List: kde-devel
Subject: Re: Indexing Text Files and Text Encoding
From: David Narvaez <david.narvaez () computer ! org>
Date: 2015-01-20 18:18:03
Message-ID: CACFh1D5b_h091HHu9ogKiXCx1c0Gvhz+Fr_BZ-3sXW52XmOkbg () mail ! gmail ! com
On Tue, Jan 20, 2015 at 12:10 PM, Vishesh Handa <me@vhanda.in> wrote:
> Hey guys
>
> We have a plain text indexing plugin in KFileMetaData. It gives the plain text of
> any file whose mimetype begins with 'text/'. We used to use QString::fromUtf8 to
> convert this into a string. However, this may not be ideal, as a different encoding
> can exist.
> I've just written a patch to use the system codec and, if the conversion fails, to
> abort. Does anyone have any opinions on this? I'm slightly conflicted.
> Reasons for doing this: If we cannot correctly convert it to text, we're just
> indexing garbage. This often happens with a binary file getting detected as text
> [1].
What about guessing the encoding from some heuristic[0]?
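
To make the idea concrete, here is a minimal sketch of a heuristic fallback chain, written in Python only for brevity. It is not the ICU CharsetDetector algorithm (which uses statistical analysis) and not the KFileMetaData patch. The function name `guess_and_decode`, the `cp1252` fallback, and the "no unexpected control characters" plausibility check are all assumptions made for illustration. It tries a byte order mark first, then strict UTF-8, then one legacy codec, and gives up (returns None) rather than indexing garbage.

```python
import codecs
from typing import Optional, Tuple

# Byte order marks and the Python codec that consumes each of them.
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

# Control characters that are normal in text files.
ALLOWED_CONTROLS = "\t\n\r\f"


def _plausible(text: str) -> bool:
    """Reject decoded text containing control characters unusual in plain text."""
    return not any(ord(c) < 32 and c not in ALLOWED_CONTROLS for c in text)


def _try_decode(data: bytes, codec: str) -> Optional[str]:
    """Strictly decode; return None on invalid bytes or implausible output."""
    try:
        text = data.decode(codec)  # errors="strict" is the default
    except UnicodeDecodeError:
        return None
    return text if _plausible(text) else None


def guess_and_decode(data: bytes, fallback: str = "cp1252") -> Optional[Tuple[str, str]]:
    """Return (text, codec_name), or None if the data should not be indexed.

    Hypothetical helper: the order of the checks and the fallback codec
    are assumptions, not something taken from the patch under discussion.
    """
    # 1. A byte order mark is the strongest hint available.
    for bom, codec in BOMS:
        if data.startswith(bom):
            text = _try_decode(data, codec)
            return (text, codec) if text is not None else None

    # 2. NUL bytes without a BOM usually mean a binary file.
    if b"\x00" in data:
        return None

    # 3. Strict UTF-8: random binary data rarely validates.
    text = _try_decode(data, "utf-8")
    if text is not None:
        return text, "utf-8"

    # 4. One legacy single-byte codec as a last resort.
    text = _try_decode(data, fallback)
    return (text, fallback) if text is not None else None
```

With this sketch, a file that fails every step is skipped, which matches the "abort if the conversion fails" behaviour of the patch, while files in a non-system encoding still have a chance of being indexed. A real detector would also report a confidence value so the caller could set its own threshold.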
David E. Narvaez
[0] http://icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html