
List:       kde-devel
Subject:    Indexing Text Files and Text Encoding
From:       Vishesh Handa <me () vhanda ! in>
Date:       2015-01-20 17:10:34
Message-ID: CAOPTMKDOSrf0dCmSV=MYJj7AADv9wnbXPggACTnpy0VSO=2xdw () mail ! gmail ! com

Hey guys

We have a plain text indexing plugin in KFileMetaData. It extracts the plain
text of any file whose mimetype begins with 'text/'. We used to use
QString::fromUtf8 to convert the file contents into a string. However, this
may not be ideal, as the file may be in a different encoding.

I've just written a patch that uses the system codec instead and aborts if
the conversion fails. Does anyone have any opinions on this? I'm slightly
conflicted.

Reasons for doing this: if we cannot correctly convert the contents to text,
we're just indexing garbage. This often happens when a binary file gets
detected as text [1].
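
To make the "convert or abort" idea concrete, here is a minimal sketch in
plain C++. It is not the patch itself: it assumes the system encoding is
UTF-8 and hand-rolls the validity check, whereas the patch goes through the
system codec. The names isValidUtf8 and decodeOrAbort are hypothetical and
exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true only if `bytes` is well-formed UTF-8.
// Assumption: the system encoding is UTF-8 (the real patch asks the
// system codec instead of hard-coding an encoding).
bool isValidUtf8(const std::string& bytes)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long minForLength[4] = {0x0, 0x80, 0x800, 0x10000};

    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point

        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;              // stray continuation or invalid lead byte

        if (i + extra >= n) return false;   // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < minForLength[extra]) return false;       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate range
        if (cp > 0x10FFFF) return false;                  // beyond Unicode

        i += extra + 1;
    }
    return true;
}

// Hypothetical entry point: copies `raw` into `text` and returns true only
// when the bytes decode cleanly; otherwise leaves `text` empty and returns
// false, so the caller can skip indexing the file.
bool decodeOrAbort(const std::string& raw, std::string& text)
{
    text.clear();
    if (!isValidUtf8(raw))
        return false;
    text = raw;
    return true;
}
```

A caller would index `text` only when decodeOrAbort returns true. The
trade-off is the one I'm conflicted about: a text file in some other
encoding is skipped entirely, rather than indexed with a few wrong
characters.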

-- 
Vishesh Handa

[1] https://bugs.kde.org/show_bug.cgi?id=342312





