List: kde-devel
Subject: Indexing Text Files and Text Encoding
From: Vishesh Handa <me () vhanda ! in>
Date: 2015-01-20 17:10:34
Message-ID: CAOPTMKDOSrf0dCmSV=MYJj7AADv9wnbXPggACTnpy0VSO=2xdw () mail ! gmail ! com
Hey guys
We have a plain text indexing plugin in KFileMetaData. It returns the plain
text of any file whose mimetype begins with 'text/'. We used to use
QString::fromUtf8 to convert this into a string. However, this may not be
ideal, as the file can be in a different encoding.
I've just written a patch that uses the system codec instead and aborts if
the conversion fails. Does anyone have any opinions on this? I'm slightly
conflicted.
Reasons for doing this: if we cannot correctly convert the file to text,
we're just indexing garbage. This often happens when a binary file gets
detected as text [1].
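A minimal sketch of the abort-on-failure idea, not the actual patch: it
assumes the system codec is UTF-8 and uses plain C++17 instead of Qt so it
stays self-contained. In Qt 5, QTextCodec::toUnicode() can fill in a
QTextCodec::ConverterState whose invalidChars count could serve the same
purpose. The names isValidUtf8 and extractPlainText are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Returns true if `bytes` is well-formed UTF-8: rejects stray continuation
// bytes, truncated sequences, overlong forms, UTF-16 surrogates and code
// points above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80) { ++i; continue; }             // plain ASCII
        if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                               // invalid lead byte
        if (i + len > bytes.size()) return false;        // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[len]) return false;           // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// Hypothetical plugin step: return the text (already UTF-8, so no further
// conversion is needed here), or std::nullopt to mean "do not index".
std::optional<std::string> extractPlainText(const std::string& rawBytes) {
    if (!isValidUtf8(rawBytes)) {
        return std::nullopt;  // abort instead of indexing garbage
    }
    return rawBytes;
}
```

With this, a file that starts with the PNG signature (0x89 'P' 'N' 'G'
...) is skipped, while "caf\xC3\xA9" is indexed as "café". The trade-off is
that a text file in a legacy 8-bit encoding, such as Latin-1 with accented
characters, is rejected too, not only binary data.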
--
Vishesh Handa
[1] https://bugs.kde.org/show_bug.cgi?id=342312