List:       kfm-devel
Subject:    Re: using KHTML without display
From:       "Nom Declavier" <achats () blarg ! net>
Date:       2005-05-03 16:30:22
Message-ID: 00a401c54ffd$676a7f40$0200a8c0 () Dell030306

Thanks very much for the xfake suggestion. Suppressing the display is part of what
I'm after. The more critical requirement is to dispense with functionality that my
application doesn't need. I want the application to be as small and fast as possible.
What's the minimal context that allows the DOM, CSS, and KJS classes to be used for
parsing and measurement?
  ----- Original Message ----- 
  From: Nom Declavier 
  To: kfm-devel@kde.org 
  Sent: Sunday, May 01, 2005 6:31 PM
  Subject: using KHTML without display


  I'd like to use KHTML to parse HTML/CSS/JavaScript, and to deduce the
  sizes and positions of Web page constituents when the page is rendered
  by a KHTML-based browser. But I want to do this without actually
  rendering to any screen, and without invoking more browser functionality
  than I need. My planned application has no graphical user interface. It
  brings about no display. It's all about trees whose nodes may be
  annotated with size and position information. I expect to call getRect()
  frequently.
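
  As a minimal sketch of the kind of annotated tree described above, written
  independently of KHTML: the names Rect, LayoutNode, annotate, and
  countUnmeasured are hypothetical and are not part of the KHTML or DOM API.
  Treating an empty rectangle as "not yet measured" is an assumption of this
  sketch, not documented library behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the rectangle a layout engine would report.
struct Rect {
    int x = 0, y = 0, width = 0, height = 0;
    bool isEmpty() const { return width <= 0 || height <= 0; }
};

// Hypothetical tree node: a name, a geometry annotation, and owned children.
struct LayoutNode {
    std::string name;
    Rect rect;  // stays empty until a measurement is recorded
    std::vector<std::unique_ptr<LayoutNode>> children;

    explicit LayoutNode(std::string n) : name(std::move(n)) {}

    // Append a child and return a reference to it; the child lives on the
    // heap, so the reference stays valid when the vector grows.
    LayoutNode& addChild(std::string childName) {
        children.push_back(std::make_unique<LayoutNode>(std::move(childName)));
        return *children.back();
    }
};

// Record a measured position and size on a node.
void annotate(LayoutNode& node, int x, int y, int width, int height) {
    assert(width >= 0 && height >= 0);
    node.rect = Rect{x, y, width, height};
}

// Count the nodes in the subtree whose rectangle is still empty.
std::size_t countUnmeasured(const LayoutNode& node) {
    std::size_t n = node.rect.isEmpty() ? 1 : 0;
    for (const auto& child : node.children) n += countUnmeasured(*child);
    return n;
}
```

  A walk such as countUnmeasured is one way to check, after loading a
  document, whether any size and position information was produced at all.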

  So what I really need is DOM::Document and so on. I'll use KApplication,
  KHTMLPart, KHTMLView, and so on, only as I need them to invoke the
  functionality of DOM, CSS, and KJS classes.

  I'm aware of two ways to get a DOM::HTMLDocument from an HTML file, without
  getting into windows and widgets.

  Technique 1 looks like this:

  DOM::HTMLDocument doc;
  doc.setAsync(false);  // load synchronously, so load() returns only when done
  doc.load(url);        // url names the HTML file

  Technique 1 has two serious problems. First, when getRect() is called on
  nodes, it produces no useful information. Second, when the program
  exits, either the automatically-invoked destructors bomb, or if I invoke
  destructors myself, they still bomb.

  Technique 2 looks like this, where inputHTMLQString is a QString read
  from the HTML file (it doesn't matter how).

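  KHTMLPart * pPart = new KHTMLPart();    // no parent widget
  pPart->begin();                         // start a new document
  pPart->write(inputHTMLQString);         // feed the HTML source
  pPart->end();                           // signal that the source is complete
  DOM::Document doc = pPart->document();  // handle to the parsed document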

  Technique 2 has two serious problems. First, when getRect() is called on
  nodes, it produces no useful information, so there's nothing to choose
  between Technique 1 and Technique 2 here. Second, although Technique 2
  leads to graceful destruction, it brings along by default a very fussy
  version of the HTML parser, which wreaks havoc with scripts, among other
  constituents. I can get around the fussy parser, but the way I've done it
  so far isn't pretty.

  If I want to have all of the following:

  calls to getRect() produce useful results
  tolerant parser
  effective destruction

  and I want to have them to the extent possible without windows
  and widgets, I'm guessing the best technique isn't either of the ones
  I've tried. Best aside, what's a good technique?

