List:       kde-core-devel
Subject:    [PATCH]: KNewsTicker querying non-ISO8859-1 sites
From:       Frerich Raabe <frerichraabe () gmx ! de>
Date:       2002-03-07 16:53:43

Hi,

the attached patch, courtesy of Volker Augustin 
<volker.augustin@perfektionismus.de>, apparently makes Cyrillic characters as 
well as German umlauts work in KNewsTicker. I hope it makes Asian charsets 
work as well, but I haven't found a suitable font yet.

You can use the URLs http://www.slashdot.jp/slashdot.rdf (Japanese), 
http://www.hamovniki.net/~d00mer/lenta_rdf/lenta.rdf (Russian) and 
http://www.heise.de/newsticker/heise.rdf (German) to test, in case 
somebody has one of those giant Unicode fonts handy.

Ok to commit?

- Frerich

["xmlnewsaccess2.diff" (text/x-diff)]

Index: kdenetwork/knewsticker/common/xmlnewsaccess.cpp
===================================================================
RCS file: /home/kde/kdenetwork/knewsticker/common/xmlnewsaccess.cpp,v
retrieving revision 1.27
diff -u -r1.27 xmlnewsaccess.cpp
--- kdenetwork/knewsticker/common/xmlnewsaccess.cpp	2002/02/09 22:33:37	1.27
+++ kdenetwork/knewsticker/common/xmlnewsaccess.cpp	2002/03/06 15:02:18
@@ -16,6 +16,7 @@
 
 #include <qdom.h>
 #include <qregexp.h>
+#include <qtextcodec.h>
 
 XMLNewsArticle::XMLNewsArticle(const QString &headline, const KURL &address)
 	: m_headline(headline),
@@ -72,9 +73,36 @@
 	if (okSoFar) {
 		QDomDocument domDoc;
 		// Some servers like to prepend a blank line, QDom doesn't like that...
-		if (validContent = domDoc.setContent(QCString(data).stripWhiteSpace())) {
+		if (validContent = domDoc.setContent(QString(data).stripWhiteSpace())) {
+
+			/*
+			 * Detect the encoding and create a suitable QTextCodec object.
+			 * If an XML processing instruction is present, it should be of
+			 * the following form:
+			 * <?xml version = "1.0" encoding = "ISO-8859-1"?>
+			 * where the encoding attribute need not necessarily be present
+			 * (e.g. slashdot.org omits the encoding).
+			 * This should then be in the first node of the document which
+			 * in turn should be of type QDomProcessingInstruction.
+			 */
+			QTextCodec *codec = 0;
+
+			QDomNode firstNode = domDoc.firstChild();
+			if ( firstNode.isProcessingInstruction() ) {
+				QString data = firstNode.toProcessingInstruction().data();
+				QString encKey = QString::fromLatin1( "encoding" );
+				if ( data.contains( encKey ) ) {
+					QString containingPart = data.mid( data.find(encKey) );
+					QString encoding = containingPart.section( ' ', 2, 2 );
+					encoding = encoding.mid( 1, encoding.length() - 2 );
+					kdDebug(5005) << QString::fromLatin1( "Encoding: " ) << encoding << endl;
+
+					codec = QTextCodec::codecForName(encoding.latin1());
+				}
+			}
+
 			QDomNode channelNode = domDoc.documentElement().namedItem(QString::fromLatin1("channel"));
-	
+
 			m_name = channelNode.namedItem(QString::fromLatin1("title")).toElement().text().simplifyWhiteSpace();
 			m_link = channelNode.namedItem(QString::fromLatin1("link")).toElement().text().simplifyWhiteSpace();
 			m_description = channelNode.namedItem(QString::fromLatin1("description")).toElement().text().simplifyWhiteSpace();
@@ -85,7 +113,11 @@
 			QString headline, address;
 			for (unsigned int i = 0; i < items.count(); i++) {
 				itemNode = items.item(i);
-				headline = decodeEntities(itemNode.namedItem(QString::fromLatin1("title")).toElement().text().simplifyWhiteSpace());
+				QString title = itemNode.namedItem(QString::fromLatin1("title")).toElement().text().simplifyWhiteSpace();
+				if ( codec != 0 ) {
+					title = codec->toUnicode( title.latin1() );
+				}
+				headline = decodeEntities( title );
 				address = decodeEntities(itemNode.namedItem(QString::fromLatin1("link")).toElement().text().simplifyWhiteSpace());
 				m_articles.append(XMLNewsArticle(headline, address));
 			}
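
The encoding lookup in the patch takes the third space-separated token
starting at "encoding", so it relies on the declaration being written with
spaces around the "=" sign, as in the comment's example. Below is a minimal
sketch of a more tolerant lookup. It is written in plain standard C++ rather
than Qt so that it stands alone, and extractEncoding is a hypothetical helper
name, not part of the patch or of KNewsTicker.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the patch): returns the value of the
// "encoding" pseudo-attribute found in the data of an <?xml ...?>
// declaration, or an empty string if it is absent or malformed.
std::string extractEncoding(const std::string &piData)
{
    const std::string key = "encoding";
    std::string::size_type pos = piData.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // Skip optional whitespace, require '=', skip optional whitespace again.
    while (pos < piData.size() &&
           std::isspace(static_cast<unsigned char>(piData[pos])))
        ++pos;
    if (pos >= piData.size() || piData[pos] != '=')
        return std::string();
    ++pos;
    while (pos < piData.size() &&
           std::isspace(static_cast<unsigned char>(piData[pos])))
        ++pos;

    // The value has to be quoted with either " or '.
    if (pos >= piData.size() || (piData[pos] != '"' && piData[pos] != '\''))
        return std::string();
    const char quote = piData[pos];
    const std::string::size_type end = piData.find(quote, pos + 1);
    if (end == std::string::npos)
        return std::string();

    return piData.substr(pos + 1, end - pos - 1);
}
```

The returned name could then be handed to QTextCodec::codecForName() in the
same way the patch does; an empty result corresponds to the case where the
patch leaves codec at 0.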


