
List:       htmlunit-develop
Subject:    [HtmlUnit] [ htmlunit-Feature Requests-2962074 ] Get HtmlUnit to
From:       "SourceForge.net" <noreply () sourceforge ! net>
Date:       2010-03-31 15:18:11
Message-ID: E1NwzgJ-0002Bp-4b () sfs-web-7 ! v29 ! ch3 ! sourceforge ! com

Feature Requests item #2962074, was opened at 2010-03-02 10:03
Message generated for change (Comment added) made by amitmanjhi
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=448269&aid=2962074&group_id=47038

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Amit Manjhi (amitmanjhi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Get HtmlUnit to run on Google App Engine (GAE)

Initial Comment:
There are several restrictions GAE places that prevent a vanilla HtmlUnit from
running on GAE. Specifically, no threads can be started on GAE. GAE also does not
allow many classes, such as URLStreamHandler, Applets etc.

I have a somewhat hacky solution to get HtmlUnit to run on GAE. It builds on the
dethreading patch, and has 3 main changes:
-- the EventLoop does not start a thread. Instead, it just accumulates jobs. When the
   main thread calls pumpEventLoop (long timeout)
-- An appEngine compatible implementation of WebConnection called
   UrlFetchWebConnection.
-- A hack to avoid the use of URLHandler. I rewrite "javascript:<data>" url as
   "http://javascript/<data>" and then in UrlFetchWebConnection, I return the
   appropriate response.
In addition, I had to change some of the variables in WebClient.java, so that they
are not statically initialized. I have attached a patch against a "recent version" of
my dethreading patch, just to get the discussion started.
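
A minimal sketch of the first and third changes listed above. This is not
the code from the attached patch: every name except pumpEventLoop
(PumpedEventLoop, addJob, JavaScriptUrlRewriter, rewrite, scriptOf) is
hypothetical, and running the queued jobs on the calling thread until the
queue is empty or the timeout has elapsed is an assumption about the
intended behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the patched EventLoop: it queues jobs and starts no thread.
final class PumpedEventLoop {
    private final Deque<Runnable> jobs = new ArrayDeque<>();

    // Accumulate a job; nothing runs until the loop is pumped.
    void addJob(Runnable job) {
        jobs.addLast(job);
    }

    // Run queued jobs on the calling thread until the queue is empty or
    // timeoutMillis has elapsed (assumed semantics). Returns the number of jobs run.
    int pumpEventLoop(long timeoutMillis) {
        long start = System.nanoTime();
        long budgetNanos = timeoutMillis * 1_000_000L;
        int executed = 0;
        while (!jobs.isEmpty() && System.nanoTime() - start < budgetNanos) {
            jobs.pollFirst().run();
            executed++;
        }
        return executed;
    }
}

// Hypothetical helper for the "javascript:" rewrite, so that no
// URLStreamHandler is needed for the non-standard scheme.
final class JavaScriptUrlRewriter {
    private static final String SCHEME = "javascript:";
    private static final String PLACEHOLDER = "http://javascript/";

    // "javascript:<data>" becomes "http://javascript/<data>"; other URLs pass through.
    static String rewrite(String url) {
        if (url.startsWith(SCHEME)) {
            return PLACEHOLDER + url.substring(SCHEME.length());
        }
        return url;
    }

    // Recover <data> from a rewritten URL, or null if the URL was not rewritten.
    static String scriptOf(String url) {
        return url.startsWith(PLACEHOLDER) ? url.substring(PLACEHOLDER.length()) : null;
    }
}
```

A web connection in the spirit of UrlFetchWebConnection could call something
like scriptOf on each request and build its response from the result instead
of going to the network.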

----------------------------------------------------------------------

> Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-31 08:18

Message:
Marc: I will take a look at the changes and update the patch.

Wouldn't HttpWebConnection fail because it tries to load the
java.lang.Thread class somewhere?

----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2010-03-29 03:29

Message:
I've committed a fix that avoids the NoClassDefFoundError for
URLStreamHandler and uses a hack similar to the one in your patch for
javascript, about and data URLs, BUT without the need to change
WebClient.URL_ABOUT_BLANK or to make any change that would modify the
normal behaviour of HtmlUnit.

Can you take these changes into account and simplify your patch?

I'd like to continue step by step, with dedicated tests all the time. The
next problem with HtmlUnit on GAE is now that the HttpWebConnection starts
a thread. My plan is to write a test that reproduces this problem and then
to use the web connection of your patch. I don't know when I'll be able to
work on it. If you provide a patch for that, it could go faster.

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-26 12:10

Message:
Marc: I looked at r5610. It is a great approach for testing appEngine
compatibility.  Looking forward to the rest of the patch. 

----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2010-03-26 00:59

Message:
For info: I work slowly on this. I've added a unit test (as
NotYetImplemented) simulating the problems due to GAE white list. I have an
idea how to cleanly fix it (proposed patch is a hack and can't be
integrated this way).  Once this is done, you can update the patch to make
use of it.
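
One way such a white-list simulation could look, as a sketch only: this is
not the test added to HtmlUnit, and the class name
WhitelistSimulatingClassLoader is hypothetical. It rejects a configured set
of class names the way a restricted runtime might, and delegates everything
else to its parent loader.

```java
import java.util.Set;

// Hypothetical class loader that refuses to load a configured set of classes.
final class WhitelistSimulatingClassLoader extends ClassLoader {
    private final Set<String> blocked;

    WhitelistSimulatingClassLoader(ClassLoader parent, Set<String> blocked) {
        super(parent);
        this.blocked = blocked;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (blocked.contains(name)) {
            // Simulated restriction, raised before any delegation to the parent.
            throw new NoClassDefFoundError(name + " is a restricted class (simulated)");
        }
        return super.loadClass(name, resolve);
    }
}
```

Only lookups that go through this loader are intercepted. Classes that the
parent loader resolves on behalf of already-loaded code are not affected, so
a real test would have to load the code under test through such a loader as
well.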

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-25 18:32

Message:
@Phil: you are welcome to try this patch. Apply it against a current
htmlunit svn source (r5611 or higher).  Build the HtmlUnit jars and try
them on AppEngine.

Now that Marc is back, I hope things can move quickly on landing a version
of this patch.

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-25 18:22

Message:
Updated git patch against r5611. To see this in action, visit:
http://ajax-crawler.appspot.com/

Built with GWT + HtmlUnit with this patch.

----------------------------------------------------------------------

Comment By: PhilBeaudoin ()
Date: 2010-03-14 14:14

Message:
@Amit, @Marc: Cheers cheers! Go go go! Viva Open Source. :)  More
seriously, if there's anything I can do to help speed up the process, let
me know, I could have some time to contribute myself.

----------------------------------------------------------------------

Comment By: Daniel Gredler (sdanig)
Date: 2010-03-13 08:43

Message:
@Phil: I believe this is currently at the "working proof of concept" stage;
you can grab the sources in SVN trunk and apply this patch and probably get
it to work, but it may be a while yet before you can just download HtmlUnit
and use it in GAE. I guess if Amit and Marc continue to collaborate as
quickly as they did for the dethreading patch, it might happen before the
next version of HtmlUnit is released :-)

----------------------------------------------------------------------

Comment By: PhilBeaudoin ()
Date: 2010-03-12 14:16

Message:
I just want to chip in to say I too am very interested in running HTML Unit
on GAE. amitmanjhi mentions that the dethreading patch is in. Is there any
way to get this version of HTML Unit? Would it run on GAE?

(And thanks all for this great tool!)

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-09 13:16

Message:
Marc: Sorry for assigning the bug to you.

Now that the dethreading patch is in, I will update the patch so that
anyone can play with it. Let us also continue the discussion around what
tests to add. So far, we have:
- Being able to run most of HtmlUnit JS tests, in the AppEngine mode.
- AppEngine integration tests, where we test the DOM produced by HtmlUnit
in AppEngine mode against the HtmlUnit in DOM mode. 

Did I miss anything?

----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2010-03-09 00:48

Message:
@Amit: please let committers assign bugs themselves. Remember that most of
the time we work on HtmlUnit in our free time (gigs are welcome to speed up
things ;-)) and that even if I would welcome GAE support I can't foresee
how much time I will have to work on it.

I find the idea interesting to take "normal HtmlUnit" as reference to test
"HtmlUnit on GAE".

Concerning the default browser: if it wouldn't hurt so many users, I would
prefer to remove the notion of default browser! ;-)

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-08 09:59

Message:
@Marc: Marking it as "next release" and assigning it to you. Is that fine?

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-05 18:20

Message:
Almost all the HtmlUnit JS library tests "should" work out of the box,
except that a pumpEventLoop(...) call would need to be added. However, I
still think it would be worthwhile to have an integration test going -- the
knob we have is in selecting the URLs and what conditions to check for
them. Perhaps, the "new reference DOM" could be obtained by running
HtmlUnit in non-appEngine mode. So the test would check whether the
AppEngine-mode HtmlUnit produces the same output as the "default" HtmlUnit.
Thoughts?
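
A sketch of the comparison idea above, assuming each mode can be wrapped as
a function from URL to serialized DOM. ModeComparison and mismatches are
hypothetical names and nothing here touches HtmlUnit itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: compares the DOM produced by two rendering modes.
final class ModeComparison {
    // Returns the URLs for which the AppEngine-mode output differs from the reference.
    static List<String> mismatches(List<String> urls,
                                   Function<String, String> referenceMode,
                                   Function<String, String> appEngineMode) {
        List<String> differing = new ArrayList<>();
        for (String url : urls) {
            String expected = referenceMode.apply(url);
            String actual = appEngineMode.apply(url);
            if (!expected.equals(actual)) {
                differing.add(url);
            }
        }
        return differing;
    }
}
```

In a real test each function would wrap a WebClient: call getPage, pump the
event loop in the AppEngine case, and return page.asXml(). The test passes
when the returned list is empty.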

Regarding IE7, it is just  an attempt to keep things simple. (plus, in
most cases, it doesn't affect the output.) Maybe it is time to update the
default browser in HtmlUnit :-). 

----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2010-03-04 00:40

Message:
Hmm, external URLs are surely interesting but I wouldn't use them as the
first test. As we don't control them, we would have to regularly verify
manually that:
- "normal" HtmlUnit still works with them
- get new reference DOM

What about a set of things that we can control, without external
dependencies like HtmlUnit JS library tests?

Additional note: in your comment as well as in
http://code.google.com/web/ajaxcrawling/docs/html-snapshot.html you use
"new WebClient()" which is (currently) equivalent to "new
WebClient(BrowserVersion.INTERNET_EXPLORER_7)". Is it intentional to
simulate IE7 rather than another browser?

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-03 23:27

Message:
A file containing a comprehensive list of classes used in HtmlUnit code,
which can't be used on AppEngine.

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-03 15:52

Message:
In the code fragment,  the first line should be:
 for (String inputUrl : urls) {

In addition, the client could be set up as: 
 WebClient client = new WebClient();

static {
   WebClient.setAppEngineMode();
}

----------------------------------------------------------------------

Comment By: Amit Manjhi (amitmanjhi)
Date: 2010-03-03 15:49

Message:
Marc: Thanks for the comments. To start with, I would suggest adding a test
that does the following, where urls is just a collection of popular JS
heavy URLs: 

for (String url : urls) {
    HtmlPage page = client.getPage(inputUrl);
    client.pumpEventLoop(10000);
    String pageAsString = page.asXml();
    assert(...); // assert pageAsString contains the rendered DOM.
}

To implement such a test, we will need to come up with a list of URLs and
conditions for each URL that confirm whether HtmlUnit produced the DOM
correctly or not. This test would also directly help any developers who aim
to use HtmlUnit for crawling, as outlined in
http://code.google.com/web/ajaxcrawling/docs/html-snapshot.html

Thoughts?

----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2010-03-03 07:53

Message:
Thanks for the patch. This is indeed a hack, but this is good enough to
start the discussion.

In this first comment I don't want to go into details of the patch but
rather about the target. I personally find GAE very interesting and would
really welcome it if HtmlUnit could run on GAE, even if it is not the
primary target.

I believe that the first thing that we have to define is what we want to
see running on GAE. Currently HtmlUnit doesn't work at all and it would
surely be very difficult to have the "full" HtmlUnit running on GAE. We
have to define something between these two extremes as the target.

Once the target is defined, we need to ensure that it is reached and that
future releases of HtmlUnit continue to reach it. Everything that can be
tested by unit tests integrated in HtmlUnit's build has to be tested this
way but this isn't enough. GAE is different and the only real test would be
to deploy a project on GAE to run some tests (I've personally experienced
that the dev mode is interesting but not good enough to give a definitive
result). This means that we need to set up a new project for instance in
HtmlUnit's SVN for this purpose and create a GAE app with HtmlUnit
committers as developers (+ you? or you submit some other patches to become
committer ;-)).

What do you think? What would you see as initial set of tests that should
pass?

----------------------------------------------------------------------



