
List:       htmlunit-user
Subject:    Re: [Htmlunit-user] tmp files
From:       J <ihavealegohead () yahoo ! com>
Date:       2011-04-22 20:58:50
Message-ID: 670503.61411.qm () web65816 ! mail ! ac4 ! yahoo ! com


Here is the fix I implemented, which deletes the files once they are 10 minutes 
old. The file-name matching in the second function should probably be stricter, 
but you get the idea. Just insert these functions and call cleanup() 
periodically. 


-J


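    // Requires: import java.io.File;

    // Deletes HtmlUnit temp files older than 10 minutes from the JVM temp directory.
    public static void cleanup() {
        String tempdir = System.getProperty("java.io.tmpdir");
        deleteFilesOlderThanNminutes(10, tempdir);
    }

    public static void deleteFilesOlderThanNminutes(int minsBack, String dirWay) {
        File directory = new File(dirWay);
        if (directory.exists()) {
            // listFiles() returns null if the path is not a directory or cannot be read.
            File[] listFiles = directory.listFiles();
            if (listFiles == null) {
                System.out.println("Unable to list files in: " + dirWay);
                return;
            }
            // 60L forces long arithmetic so a large minsBack cannot overflow an int.
            long purgeTime = System.currentTimeMillis() - (minsBack * 60L * 1000L);
            for (File listFile : listFiles) {
                // Loose match: any file whose name contains "htmlunit".
                if (listFile.getName().contains("htmlunit")) {
                    if (listFile.lastModified() < purgeTime) {
                        if (!listFile.delete()) {
                            System.out.println("Unable to delete file: " + listFile);
                        }
                    }
                }
            }
        } else {
            System.out.println("Files were not deleted, directory " + dirWay
                    + " doesn't exist!");
        }
    }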
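
The two loose ends in the note above (stricter file-name matching, and calling the cleanup periodically) could be handled along these lines. This is only a sketch, assuming Java 8 or later. The class and method names (HtmlUnitTempCleaner, isHtmlUnitTempFile, deleteOlderThan, startPeriodicCleanup) are made up for illustration and are not part of HtmlUnit, and the pattern "htmlunit" + digits + ".tmp" is inferred from the file listing quoted in this thread, not from HtmlUnit's source.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative, not an HtmlUnit API.
final class HtmlUnitTempCleaner {

    // Assumed naming scheme, inferred from the listing in this thread:
    // "htmlunit", then digits, then ".tmp".
    private static final Pattern TEMP_NAME = Pattern.compile("htmlunit\\d+\\.tmp");

    static boolean isHtmlUnitTempFile(String name) {
        return TEMP_NAME.matcher(name).matches();
    }

    // Deletes matching files in dir that were last modified before cutoffMillis.
    // Returns the number of files actually deleted.
    static int deleteOlderThan(File dir, long cutoffMillis) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or it could not be read
        }
        int deleted = 0;
        for (File f : files) {
            if (f.isFile()
                    && isHtmlUnitTempFile(f.getName())
                    && f.lastModified() < cutoffMillis
                    && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    // Runs the cleanup every maxAgeMinutes on a single background thread.
    // The caller should call shutdown() on the returned scheduler when done,
    // because its worker thread is not a daemon thread.
    static ScheduledExecutorService startPeriodicCleanup(File dir, long maxAgeMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            long cutoff = System.currentTimeMillis()
                    - TimeUnit.MINUTES.toMillis(maxAgeMinutes);
            deleteOlderThan(dir, cutoff);
        }, maxAgeMinutes, maxAgeMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Something like startPeriodicCleanup(new File(System.getProperty("java.io.tmpdir")), 10) at spider start-up would then replace manual calls to cleanup(). The age cut-off is a guess at "no longer in use"; a page held longer than that could lose its backing file.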





________________________________
From: J <ihavealegohead@yahoo.com>
To: htmlunit-user@lists.sourceforge.net
Sent: Wed, April 20, 2011 8:22:06 AM
Subject: Re: [Htmlunit-user] tmp files


Hi, 

Thank you for the speedy reply.  

The JVM can be up as long as three days (Friday to Monday) prior to a restart.  
There is no one to monitor it over the weekend.  During that time, it scrapes 
pages and stores them for indexing by another app.  During those three days we 
noticed that it collects about 57 GB of temp files (multi-threaded spider with 
JavaScript). The user asserts that this is slowing down the application over 
time (but I haven't evaluated that).

I will try to write my own function to remove the files (for Windows) and will 
post it here. 


-J





________________________________
From: Marc Guillemot <mguillemot@yahoo.fr>
To: htmlunit-user@lists.sourceforge.net
Sent: Fri, April 15, 2011 7:55:58 AM
Subject: Re: [Htmlunit-user] tmp files

Hi,

I think that it was introduced after 2.7. Someone already reported 
this problem but didn't provide any feedback on what I proposed to test, 
so no change has been made in the code base.

If you can patch the HtmlUnit version you're using, I can propose a 
change that may solve the problem. If it does, I would be happy to add it 
to the code base.

Btw: is your JVM using HtmlUnit up all the time?

Cheers,
Marc.

-- 
HtmlUnit support & consulting from the source
Blog: http://mguillem.wordpress.com



On 15/04/2011 15:13, Rural Hunter wrote:
> I didn't see this behavior with htmlunit 2.7. Maybe it was introduced
> in v2.8?
>
> 2011/4/15 21:00, Mark Locklear wrote:
>> Looks like there is not any way to handle this natively within
>> HTMLUnit. Have you considered a 'tear down' function that would remove
>> the files? Should not be hard to write your own function to remove them.
>>
>> On Fri, Apr 15, 2011 at 8:56 AM, J <ihavealegohead@yahoo.com
>> <mailto:ihavealegohead@yahoo.com>> wrote:
>>
>>     Thoughts anyone?
>>
>>     ------------------------------------------------------------------------
>>     *From:* J <ihavealegohead@yahoo.com <mailto:ihavealegohead@yahoo.com>>
>>     *To:* htmlunit-user@lists.sourceforge.net
>>     <mailto:htmlunit-user@lists.sourceforge.net>
>>     *Sent:* Sun, April 10, 2011 8:07:52 AM
>>     *Subject:* tmp files
>>
>>     Hi Gang,
>>
>>     I know this has been discussed before. I have a java app that
>>     scrapes pages. Over a three day period I collected 57GB of temp
>>     files like this:
>>
>>     04/10/2011 07:55 AM 601,845 htmlunit5801385718691168118.tmp
>>     04/10/2011 07:55 AM 656,736 htmlunit5969965552760849390.tmp
>>     04/10/2011 07:55 AM 726,575 htmlunit6117172858876623233.tmp
>>     04/10/2011 07:53 AM 857,098 htmlunit6202458995179421076.tmp
>>     04/10/2011 07:55 AM 779,627 htmlunit627685916148993480.tmp
>>     04/10/2011 07:54 AM 567,019 htmlunit6283066291185857988.tmp
>>     04/10/2011 07:53 AM 788,032 htmlunit6299129423422802544.tmp
>>     04/10/2011 07:55 AM 842,307 htmlunit6349285296783299349.tmp
>>
>>     HtmlUnit uses temporary files when downloading content with
>>     size > 500kb.
>>
>>     These tmp files are representations of the page the application
>>     is extracting data from. Even though I issue a
>>     "webClient.closeAllWindows()", these files keep accumulating.
>>
>>     It would be beneficial for me to call an html function that in
>>     turn will call deleteOnExit(), or for some internal function to
>>     delete them once the number of files reaches a threshold.
>>
>>     As this issue has never been resolved, I'd like to attempt a hack
>>     and see if I can clean it up.
>>
>>     Can anyone provide some direction on where I might start? I don't
>>     want to hack this in the wrong place. ;(
>>
>>     -J
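
The threshold idea in the original message (delete files once their number passes a limit) could be sketched from outside HtmlUnit roughly as follows. This is a rough illustration only, assuming Java 8 or later. HtmlUnitTempLimiter and trimToNewest are hypothetical names, not HtmlUnit functions, and the "htmlunit*.tmp" name check is based on the listing quoted above. It keeps the newest maxFiles matching files and deletes the rest, oldest first.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative, not an HtmlUnit API.
final class HtmlUnitTempLimiter {

    // Keeps at most maxFiles files named htmlunit*.tmp in dir, deleting the
    // oldest ones first. Returns the number of files actually deleted.
    static int trimToNewest(File dir, int maxFiles) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or it could not be read
        }
        List<File> candidates = new ArrayList<>();
        // Snapshot the timestamps once so the sort sees stable values even
        // if a file is modified while we are working.
        Map<File, Long> stamps = new HashMap<>();
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && name.startsWith("htmlunit") && name.endsWith(".tmp")) {
                candidates.add(f);
                stamps.put(f, f.lastModified());
            }
        }
        // Oldest first.
        candidates.sort((a, b) -> Long.compare(stamps.get(a), stamps.get(b)));

        int excess = candidates.size() - Math.max(0, maxFiles);
        int deleted = 0;
        for (int i = 0; i < excess; i++) {
            if (candidates.get(i).delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

A count limit says nothing about whether a file is still in use, so the limit would need to be comfortably larger than the number of pages the spider holds open at once.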

_______________________________________________
Htmlunit-user mailing list
Htmlunit-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/htmlunit-user





