List: flume-user
Subject: Re: multiple flume clients and memory
From: Matt Fair <matt.fair () gmail ! com>
Date: 2015-03-29 15:47:25
Message-ID: CAAXZqF1hOCNJqygmP3wau5p4b33JvwY=iD3SxF3eT_sASx4ZVQ () mail ! gmail ! com
Thank you very much! The suggestion to use VisualVM was very useful for
exploring Java resource usage, which turned out to be the real issue I was
running into. I re-architected my code to run as many threads instead of as
many separate Java processes. That alleviated all of my memory issues,
which I suspect came from the overhead of each separate Java process, not
from the Flume client code.
Thanks again!
Matt
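
For readers who land on this thread: the change described above (many worker threads in one JVM rather than one JVM per worker) could look roughly like the sketch below. It is only an illustration under assumptions. `EventSender` is a hypothetical stand-in for whatever wraps the Flume client, not part of the Flume API, and `runWorkers` is a name made up here. Whether a single client can be shared across threads or each thread needs its own is not stated in this thread, so the sender passed in is simply assumed to be safe to call concurrently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadedSenders {
    /** Hypothetical stand-in for a Flume RPC client wrapper; not a Flume type. */
    public interface EventSender {
        void send(String body);
    }

    /**
     * Runs `workers` tasks on a fixed thread pool inside this one JVM.
     * Each task sends `eventsPerWorker` events through the given sender.
     * Returns the total number of send calls that completed.
     */
    public static int runWorkers(int workers, int eventsPerWorker, EventSender sender)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger sent = new AtomicInteger();
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool.submit(() -> {
                for (int i = 0; i < eventsPerWorker; i++) {
                    sender.send("worker-" + id + " event-" + i);
                    sent.incrementAndGet();
                }
            });
        }
        // Stop accepting new tasks and wait for the submitted ones to finish.
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            pool.shutdownNow();
            throw new IllegalStateException("workers did not finish in time");
        }
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Counting sender used only for this demo; a real one would forward to Flume.
        AtomicInteger received = new AtomicInteger();
        int sent = runWorkers(40, 100, body -> received.incrementAndGet());
        System.out.println("sent=" + sent + " received=" + received.get());
    }
}
```

The point of the design is that the JVM's fixed per-process cost (runtime, loaded classes, default heap reservation) is paid once instead of once per worker.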
On Wed, Mar 25, 2015 at 11:18 PM, Ashish <paliwalashish@gmail.com> wrote:
> Do all these clients have memory usage in the same range? If yes, then
> taking a heap dump would reveal what is consuming memory.
>
> As Hari said, the batch is kept in memory, meaning Event size would
> matter. Here is what I would do to debug this:
>
> 1. Check the memory usage of all clients.
> 2. If they are in the same range, use VisualVM to get a heap dump of
> any one of the processes; otherwise take heap dumps of a few processes
> (max, min usage, etc.).
> 3. Use Eclipse MAT or another tool to see what's consuming the memory.
>
> You can also try tweaking the batch size to see if it makes any difference
> in memory usage.
>
> On Thu, Mar 26, 2015 at 8:33 AM, Matt Fair <matt.fair@gmail.com> wrote:
> > I have seen it on machines with both 16 GB and 60 GB of memory, when
> > running about 40 clients and ~4k clients respectively, using up
> > 100% of memory. If I run without the flume client I have no memory
> > problems, but when I instantiate a flume RPCClient, then I run into memory
> > problems.
> >
> > Thanks,
> > Matt
> >
> > On Wed, Mar 25, 2015 at 6:42 PM, Hari Shreedharan
> > <hshreedharan@cloudera.com> wrote:
> >>
> >> How much memory are you talking about? The RPC client will hold on to the
> >> batch of events you sent, plus some additional threading overhead. Under the
> >> hood, it uses a Netty client which should not really have a big memory
> >> footprint.
> >>
> >> Thanks,
> >> Hari
> >>
> >>
> >> On Wed, Mar 25, 2015 at 3:27 PM, Matt Fair <matt.fair@gmail.com> wrote:
> >>>
> >>> I have an application that launches a bunch of processes (40+) on the
> >>> same machine, each of which connects to flume using the default flume
> >>> RPCClient. However, I have noticed that each RPCClient takes up a decent
> >>> amount of memory, and when you create as many clients as I do, it adds up
> >>> to a lot of memory. One thought I had to avoid creating all of the
> >>> clients was to create only a single RPCClient and then have my other
> >>> processes connect to it via a socket, but that seems a little redundant
> >>> since that is what the RPCClient is supposed to do anyway. Have others
> >>> found themselves in this same situation? Is there a way to handle memory
> >>> more efficiently, or is there another RPCClient implementation that
> >>> doesn't take up as much memory?
> >>>
> >>> Thanks,
> >>> Matt
> >>
> >>
> >
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
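
As a companion to the debugging steps quoted above, here is a minimal sketch of two quick checks that can be done before reaching for a full heap dump: reading the heap in use from inside the client JVM, and estimating how many bytes one in-flight batch holds. The class and method names are made up for this example, and the batch size and event size in `main` are assumed values, not figures from this thread. The estimate counts event bodies only and ignores headers and object overhead, so treat it as a rough lower bound.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class ClientMemoryCheck {
    /** Rough lower bound on bytes held by one in-flight batch: batch size times average body size. */
    public static long estimateBatchBytes(int batchSize, int avgEventBytes) {
        return (long) batchSize * avgEventBytes;
    }

    /** Heap currently in use by this JVM, in bytes, as reported by the standard memory MXBean. */
    public static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        // Assumed values for illustration only: 100 events per batch, 1 KiB per event body.
        long batchBytes = estimateBatchBytes(100, 1024);
        System.out.println("estimated batch bytes: " + batchBytes);
        System.out.println("used heap bytes: " + usedHeapBytes());
    }
}
```

If the used heap per process is far larger than the batch estimate, that suggests the memory is going to per-JVM overhead rather than to buffered events, which matches the conclusion reached at the top of this thread.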