List: kfm-devel
Subject: using KHTML without display
From: "Nom Declavier" <achats () blarg ! net>
Date: 2005-05-02 1:31:27
Message-ID: 003d01c54eb6$a9197740$0200a8c0 () Dell030306
I'd like to use KHTML to parse HTML/CSS/JavaScript and to deduce the
sizes and positions of Web page constituents when the page is rendered
by a KHTML-based browser. But I want to do this without actually
rendering to any screen, and without invoking more browser functionality
than I need. My planned application has no graphical user interface and
produces no display. It's all about trees whose nodes may be
annotated with size and position information. I expect to call getRect()
frequently.
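The kind of result described here, a tree whose nodes may carry size and position information, can be sketched in plain C++ with no KHTML code at all. This is a minimal, hypothetical sketch: `Rect`, `LayoutNode`, `countSized`, `dump` and `makeExampleTree` are illustrative names, not KHTML classes, and `Rect` only stands in for whatever getRect() reports. Filling such a tree from a real DOM::Document is not shown.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the rectangle that getRect() reports for a node.
struct Rect {
    int x = 0;
    int y = 0;
    int width = 0;
    int height = 0;
};

// Hypothetical annotated tree node (not a KHTML class): a tag name, an
// optional layout box, and the children this node owns.
struct LayoutNode {
    std::string tag;
    std::optional<Rect> box;  // empty when no layout information exists
    std::vector<std::unique_ptr<LayoutNode>> children;

    // Appends a child and returns a non-owning pointer to it.
    LayoutNode* addChild(std::string childTag,
                         std::optional<Rect> childBox = std::nullopt) {
        assert(!childTag.empty() && "every node needs a tag name");
        auto child = std::make_unique<LayoutNode>();
        child->tag = std::move(childTag);
        child->box = childBox;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Counts nodes whose box has a positive width and height.
int countSized(const LayoutNode& node) {
    int n = (node.box && node.box->width > 0 && node.box->height > 0) ? 1 : 0;
    for (const auto& child : node.children) n += countSized(*child);
    return n;
}

// Writes the tree with two spaces of indentation per level, one node per line.
void dumpTo(std::ostringstream& out, const LayoutNode& node, int depth) {
    out << std::string(depth * 2, ' ') << node.tag;
    if (node.box) {
        out << " @(" << node.box->x << ',' << node.box->y << ") "
            << node.box->width << 'x' << node.box->height;
    }
    out << '\n';
    for (const auto& child : node.children) dumpTo(out, *child, depth + 1);
}

std::string dump(const LayoutNode& root) {
    std::ostringstream out;
    dumpTo(out, root, 0);
    return out.str();
}

// Builds a small example page in which the <script> element has no box.
LayoutNode makeExampleTree() {
    LayoutNode root;
    root.tag = "html";
    LayoutNode* body = root.addChild("body", Rect{0, 0, 800, 600});
    body->addChild("div", Rect{8, 8, 784, 40});
    body->addChild("script");  // no box: scripts are not rendered
    return root;
}
```

Keeping the box optional lets nodes that never get laid out (scripts, for instance) sit in the same tree without a fake zero-sized rectangle.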
So what I really need is DOM::Document and related classes. I'll use
KApplication, KHTMLPart, KHTMLView, and so on, only as I need them to
invoke the functionality of the DOM, CSS, and KJS classes.
I'm aware of two ways to get a DOM::HTMLDocument from an HTML file, without
getting into windows and widgets.
Technique 1 looks like this:
    DOM::HTMLDocument doc;
    doc.setAsync(false);
    doc.load(url);
Technique 1 has two serious problems. First, when getRect() is called on
nodes, it produces no useful information. Second, when the program
exits, the destructors bomb, whether they are invoked automatically or
I invoke them myself.
Technique 2 looks like this, where inputHTMLQString is a QString read
from the HTML file (how it is read doesn't matter):
    KHTMLPart * pPart = new KHTMLPart();
    pPart->begin();
    pPart->write(inputHTMLQString);
    pPart->end();
    DOM::Document doc = pPart->document();
Technique 2 has two serious problems. First, when getRect() is called on
nodes, it produces no useful information, so there's nothing to choose
between Technique 1 and Technique 2 here. Second, although Technique 2
leads to graceful destruction, it brings along by default a very fussy
version of the HTML parser, which wreaks havoc with scripts, among other
constituents. I can get around the fussy parser, but the way I've done
it so far isn't pretty.
If I want all of the following:
  - calls to getRect() that produce useful results
  - a tolerant parser
  - effective destruction
and I want them with as little involvement of windows and widgets as
possible, then I'm guessing the best technique is neither of the ones
I've tried. Best aside, what's a good technique?