List: kfm-devel
Subject: Re: using KHTML without display
From: "Nom Declavier" <achats () blarg ! net>
Date: 2005-05-03 16:30:22
Message-ID: 00a401c54ffd$676a7f40$0200a8c0 () Dell030306
Thanks very much for the xfake suggestion. Suppressing the display is part of what
I'm after. The more critical requirement is to dispense with functionality that my
application doesn't need. I want the application to be as small and fast as possible.
What's the minimal context that allows the DOM, CSS, and KJS classes to be used for
parsing and measurement?
----- Original Message -----
From: Nom Declavier
To: kfm-devel@kde.org
Sent: Sunday, May 01, 2005 6:31 PM
Subject: using KHTML without display
I'd like to use KHTML to parse HTML/CSS/Javascript, and to deduce the
sizes and positions of Web page constituents when the page is rendered
by a KHTML-based browser. But I want to do this without actually
rendering to any screen, and without invoking more browser functionality
than I need. My planned application has no graphical user interface and
displays nothing. It works entirely with trees whose nodes may be
annotated with size and position information, so I expect to call
getRect() frequently.
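
For illustration only, the kind of annotated tree described above might be sketched as follows. This is a sketch under stated assumptions, not KHTML code: Rect, AnnotatedNode, setRect, addChild, and countMeasured are hypothetical names, and the geometry is assumed to come from something like getRect() once that returns useful values.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these are KHTML classes.
struct Rect {
    int x, y, width, height;
};

struct AnnotatedNode {
    std::string name;        // e.g. an element's tag name
    bool hasRect;            // false until geometry has been recorded
    Rect rect;               // meaningful only when hasRect is true
    std::vector<std::unique_ptr<AnnotatedNode>> children;

    explicit AnnotatedNode(std::string n)
        : name(std::move(n)), hasRect(false), rect{0, 0, 0, 0} {}

    // Record size and position for this node.
    void setRect(const Rect& r) {
        assert(r.width >= 0 && r.height >= 0);
        rect = r;
        hasRect = true;
    }

    // Append a child and return a non-owning pointer to it.
    AnnotatedNode* addChild(std::string childName) {
        children.push_back(
            std::make_unique<AnnotatedNode>(std::move(childName)));
        return children.back().get();
    }
};

// Count the nodes in a subtree that carry geometry.
std::size_t countMeasured(const AnnotatedNode& node) {
    std::size_t total = node.hasRect ? 1 : 0;
    for (const auto& child : node.children) {
        total += countMeasured(*child);
    }
    return total;
}
```

Keeping hasRect separate from rect lets the tree distinguish a node that was never measured from one whose measured rectangle happens to be empty.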
So what I really need is DOM::Document and so on. I'll use KApplication,
KHTMLPart, KHTMLView, and so on, only as I need them to invoke the
functionality of DOM, CSS, and KJS classes.
I'm aware of two ways to get a DOM::HTMLDocument from an HTML file, without
getting into windows and widgets.
Technique 1 looks like this:
DOM::HTMLDocument doc;
doc.setAsync(false);
doc.load(url);
Technique 1 has two serious problems. First, when getRect() is called on
nodes, it produces no useful information. Second, when the program
exits, either the automatically-invoked destructors bomb, or if I invoke
destructors myself, they still bomb.
Technique 2 looks like this, where inputHTMLQString is a QString that's
read from the HTML file (how it is read doesn't matter).
KHTMLPart * pPart = new KHTMLPart();
pPart->begin();
pPart->write(inputHTMLQString);
pPart->end();
DOM::Document doc = pPart->document();
Technique 2 also has two serious problems. First, when getRect() is called
on nodes, it produces no useful information, so there's nothing to choose
between Technique 1 and Technique 2 on that score. Second, although
Technique 2 leads to graceful destruction, it brings along by default a
very fussy version of the HTML parser, which wreaks havoc with scripts,
among other constituents. I can get around the fussy parser, but the way
I've done it so far isn't pretty.
If I want to have all of the following:
  - calls to getRect() produce useful results
  - tolerant parser
  - effective destruction
and I want to have them to the extent possible without windows
and widgets, I'm guessing the best technique isn't either of the ones
I've tried. Best aside, what's a good technique?