'RE: html parsing'


List:       kde-devel
Subject:    RE: html parsing
From:       Nicolas Goutte <nicog () snafu ! de>
Date:       2001-06-27 21:39:13

>>
>> That's very easy. You either load the document with
>> KHTMLPart::openURL or you construct it by inputting the HTML data
>> directly with ::begin, ::write and ::end.
>>
>> After either end or openURL you just do a document() or
>> htmlDocument() call, which return DOM::Document or
>> DOM::HTMLDocument respectively.
>>
>
>Thanks for the info. However, there's one thing I'm a bit concerned about.
>As far as I'm aware the KHTMLPart class sets up everything needed to access,
>parse and render html (basically a browser). But I don't need to do any
>rendering, therefore won't I incur a lot of extra overhead? If so, is there
>a more specific class that I can use that just does the accessing and
>parsing bit?

In my experience, you cannot use KHTML without a view.

I tried to use it this way for KWord's HTML import filter, where
I cannot have a view. However, I was not able to make it work (tested
with kdelibs version 2.1.1).

The problem is that the need for a view is embedded deep in the code. In my
tests, when reading an HTML file, it first crashed at the <title> tag; later,
once I had fixed that crash, it crashed at another tag. After this, I
completely dropped the idea of using KHTML for KWord's HTML import filter.

I would be very glad if someone could prove me wrong, as it would be
very useful for KWord to have a working HTML import filter.

>
>Cheers
>Prash

Have a nice day/evening/night!

