
List:       flume-user
Subject:    Re: multiple flume clients and memory
From:       Matt Fair <matt.fair () gmail ! com>
Date:       2015-03-29 15:47:25
Message-ID: CAAXZqF1hOCNJqygmP3wau5p4b33JvwY=iD3SxF3eT_sASx4ZVQ () mail ! gmail ! com

Thank you very much! The suggestion to use VisualVM was very useful for exploring the
usage of Java resources, which was really the main issue I was running
into. I re-architected my code to run as many threads instead of many
separate Java processes. Doing that alleviated all of my memory
issues, which I suspect were really just the overhead of each separate Java
process, not the Flume client code.

Thanks again!
Matt
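The thread-based layout described above can be sketched as follows, assuming each former process's work can run as an independent task inside one JVM. `EventSender` and `CountingSender` are hypothetical stand-ins for the Flume RPC client (nothing here is Flume API). The sender's `send` is `synchronized` because whether one Flume client instance may be shared across threads should be confirmed in the Flume documentation before relying on it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadedClients {

    // Hypothetical stand-in for a Flume RPC client; not a Flume API.
    interface EventSender {
        void send(String body);
    }

    // Counts events instead of sending them over the network.
    static final class CountingSender implements EventSender {
        private int sent;

        @Override
        public synchronized void send(String body) {
            sent++; // a real client would transmit `body` here
        }

        synchronized int sent() {
            return sent;
        }
    }

    // Runs `workers` tasks as threads in ONE JVM, all sharing one sender,
    // instead of launching `workers` separate JVM processes.
    public static int runWorkers(int workers, int eventsPerWorker) {
        CountingSender sender = new CountingSender();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool.submit(() -> {
                for (int i = 0; i < eventsPerWorker; i++) {
                    sender.send("worker-" + id + " event-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sender.sent();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(40, 100)); // prints 4000
    }
}
```

In a real deployment the counting stub would presumably be replaced by a client created through Flume's `RpcClientFactory`; the memory saving comes from paying the JVM's fixed per-process overhead once rather than 40 times.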


On Wed, Mar 25, 2015 at 11:18 PM, Ashish <paliwalashish@gmail.com> wrote:

> Is the memory usage of all these clients in the same range? If yes, then
> taking a heap dump would reveal what is consuming memory.
>
> As Hari said, the batch is kept in memory, so Event size matters. Here
> is what I would do to debug this:
>
> 1. Check the memory usage of all clients.
> 2. If they are in the same range, use VisualVM to take a heap dump of
> any one of the processes; otherwise take heap dumps of a few processes
> (max usage, min usage, etc.).
> 3. Use Eclipse MAT or another tool to see what's consuming the memory.
>
> You can also try tweaking the batch size to see if it makes any difference
> in memory usage.
>
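Steps 1 and 2 can also be done from inside a client process. This is a minimal sketch using the standard `java.lang.management` API for heap usage and the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean` for the dump; `HeapCheck` is a hypothetical helper class, and the `.hprof` file it writes can be opened in VisualVM or Eclipse MAT.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {

    // Step 1: report this JVM's current heap usage in bytes.
    public static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Step 2: write a heap dump for Eclipse MAT or VisualVM.
    // HotSpot-specific (OpenJDK/Oracle JVMs). Recent JDKs require the path to
    // end in ".hprof", and the call fails if the file already exists.
    public static void dumpHeap(String path, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveObjectsOnly);
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("heap used: %d MB%n", usedHeapBytes() / (1024 * 1024));
        if (args.length > 0) {
            dumpHeap(args[0], true); // e.g. java HeapCheck client.hprof
        }
    }
}
```

From outside a running process, `jmap -dump:live,format=b,file=client.hprof <pid>` produces the same kind of file without code changes.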
> On Thu, Mar 26, 2015 at 8:33 AM, Matt Fair <matt.fair@gmail.com> wrote:
> > I have seen this on two machines, one with 16 GB and one with 60 GB of
> > memory, when running about 40 clients and ~4k clients respectively; both
> > use up 100% of memory. If I run without the Flume client I have no memory
> > problems, but when I instantiate a Flume RPCClient, I run into memory
> > problems.
> >
> > Thanks,
> > Matt
> >
> > On Wed, Mar 25, 2015 at 6:42 PM, Hari Shreedharan
> > <hshreedharan@cloudera.com> wrote:
> >>
> >> How much memory are you talking about? The RPC client will hold on to the
> >> batch of events you sent, plus some additional threading overhead. Under the
> >> hood, it uses a Netty client, which should not really have a big memory
> >> footprint.
> >>
> >> Thanks,
> >> Hari
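Hari's point that the client holds on to the batch can be illustrated with a small bounded buffer: at most `batchSize` events are retained before they are handed off, so the memory held is roughly `batchSize` × event size. This is a hypothetical sketch, not Flume code; `flusher` stands in for a network call such as `appendBatch(List<Event>)` on Flume's `RpcClient` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BoundedBatcher {
    private final int batchSize;
    private final Consumer<List<byte[]>> flusher; // hypothetical transport hook
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private long maxPendingBytes = 0;

    public BoundedBatcher(int batchSize, Consumer<List<byte[]>> flusher) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Buffers one event body; flushes once batchSize events are held.
    public void add(byte[] body) {
        pending.add(body);
        pendingBytes += body.length;
        maxPendingBytes = Math.max(maxPendingBytes, pendingBytes);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Hands the buffered batch to the transport and releases it.
    public void flush() {
        if (pending.isEmpty()) return;
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }

    // Largest number of body bytes held at once: about batchSize x body size.
    public long maxPendingBytes() {
        return maxPendingBytes;
    }

    public static void main(String[] args) {
        List<Integer> sentBatches = new ArrayList<>();
        BoundedBatcher b = new BoundedBatcher(100, batch -> sentBatches.add(batch.size()));
        byte[] body = new byte[1024]; // 1 KiB event body
        for (int i = 0; i < 250; i++) {
            b.add(body);
        }
        b.flush();
        System.out.println(sentBatches);         // [100, 100, 50]
        System.out.println(b.maxPendingBytes()); // 102400
    }
}
```

Halving `batchSize` halves the peak buffered bytes, which is the effect worth measuring when tweaking the batch size.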
> >>
> >>
> >> On Wed, Mar 25, 2015 at 3:27 PM, Matt Fair <matt.fair@gmail.com> wrote:
> >>>
> >>> I have an application that launches a bunch of processes (40+) on the
> >>> same machine, and each one connects to Flume using the default Flume
> >>> RPCClient. However, I have noticed that each RPCClient takes up a decent
> >>> amount of memory, and when you create as many clients as I am, it adds up
> >>> to a lot of memory. One idea I had to avoid creating all of these
> >>> clients was to create only a single RPCClient and have my other
> >>> processes connect to it via a socket, but that seems a little redundant,
> >>> since that is what the RPCClient is supposed to do anyway. Have others
> >>> found themselves in the same situation? Is there a way to handle memory
> >>> more efficiently, or is there another RPCClient implementation that
> >>> doesn't take up as much memory?
> >>>
> >>> Thanks,
> >>> Matt
> >>
> >>
> >
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
