
List:       flume-user
Subject:    Re: Flume-ng 1.6 reliable setup
From:       Simone Roselli <simoneroselli78 () gmail ! com>
Date:       2015-10-21 9:52:03
Message-ID: CAH34K+m8-Gxq-ZOt1xBvr89bF93tt4v2jqOqioqaigk5S_5TRw () mail ! gmail ! com
[Download RAW message or body]

In case of a disk crash, the Flume host is removed from the backend pool and
stops receiving events, so that won't be a problem.

I've found a nice solution (nice for our setup): a Spooldir source configured
on the same channel as the Kafka sink. This means that each event placed in
this directory will be instantly pushed to the Kafka sink.
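A minimal sketch of such a configuration (component names, broker list, and
topic are illustrative assumptions, not values taken from this thread):

```properties
# Sketch only: agent/component names, brokers and topic are assumptions.
agent1.sources = spoolsrc
agent1.channels = ch1
agent1.sinks = kafkasink filesink
agent1.sinkgroups = g1

# Spooldir source: files moved into /spool_directory are replayed
# into the same channel the Kafka sink drains.
agent1.sources.spoolsrc.type = spooldir
agent1.sources.spoolsrc.spoolDir = /spool_directory
agent1.sources.spoolsrc.channels = ch1

# Primary sink: Kafka (Flume 1.6 property names).
agent1.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkasink.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
agent1.sinks.kafkasink.topic = events
agent1.sinks.kafkasink.channel = ch1

# Secondary sink: File Roll writing to /failover while Kafka is down.
agent1.sinks.filesink.type = file_roll
agent1.sinks.filesink.sink.directory = /failover
agent1.sinks.filesink.channel = ch1

# Failover sink group: prefer Kafka, fall back to the local disk.
agent1.sinkgroups.g1.sinks = kafkasink filesink
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.kafkasink = 10
agent1.sinkgroups.g1.processor.priority.filesink = 5
```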

So, to recap: the File Roll sink will write any events that cannot reach the
main Kafka cluster to a directory (/failover); periodically, a script will
check for the presence of events in this directory and, if present, move them
to /spool_directory.
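The periodic mover can be as simple as the following sketch (paths match the
ones above; the script itself is hypothetical, and it assumes the File Roll
sink has finished writing the files it moves and that both directories are on
the same filesystem, so each mv is atomic):

```shell
#!/bin/sh
# Illustrative sketch: move completed failover files into the spooling
# directory, where the Spooldir source will pick them up and replay them
# to Kafka. Assumes files in $FAILOVER_DIR are no longer being written to.
FAILOVER_DIR=/failover
SPOOL_DIR=/spool_directory

for f in "$FAILOVER_DIR"/*; do
    [ -f "$f" ] || continue   # skip if the glob matched nothing
    mv "$f" "$SPOOL_DIR/"
done
```

Note that the Spooldir source requires files to be complete and immutable once
they appear in the spool directory, which is why handing them over with an
atomic mv (rather than copying) is the safe approach.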


Thanks anyway for the support

On Mon, Oct 19, 2015 at 3:14 PM, Gonzalo Herreros <gherreros@gmail.com>
wrote:

> I see. Maybe you need more Kafka nodes and fewer Flume agents (I have the
> same number of each).
>
> None of the solutions you mention will survive a disk crash.
> I would rather rely on Kafka to guarantee no message losses.
>
> Gonzalo
>
> On 19 October 2015 at 13:39, Simone Roselli <simoneroselli78@gmail.com>
> wrote:
>
>> Hi,
>>
>> .. because a Kafka channel will lead me to the same problem, no?
>>
>> I have 200 nodes, each one with a Flume-ng agent aboard. I cannot lose a
>> single event.
>>
>> With a memory/file channel, in case Kafka is down/broken/bugged, I could
>> still take care of events (Spillable memory, File Roll, other sinks...). In
>> the case of a Kafka channel (another, separate Kafka cluster), I would rely
>> exclusively on that Kafka cluster, which was my initial non-ideal situation
>> when having it as a sink.
>>
>>
>> Thanks
>> Simone
>>
>>
>>
>>
>>
>> On Mon, Oct 19, 2015 at 11:28 AM, Gonzalo Herreros <gherreros@gmail.com>
>> wrote:
>>
>>> Why don't you use a Kafka channel?
>>> It would be simpler and it would meet your initial requirement of having
>>> channel fail tolerance.
>>>
>>> Regards,
>>> Gonzalo
>>>
>>> On 19 October 2015 at 10:23, Simone Roselli <simoneroselli78@gmail.com>
>>> wrote:
>>>
>>>> However,
>>>>
>>>> since the arrival order on Kafka (the main sink) is not a particular
>>>> problem for me, my current solution would be:
>>>>
>>>>  * memory channel
>>>>  * sinkgroup with 2 sinks:
>>>>    ** Kafka
>>>>    ** File_roll (write events to the '/data/x' directory, in case Kafka
>>>> is down)
>>>>  * periodically check for the presence of files in '/data/x' and, if
>>>> present, re-push them to Kafka
>>>>
>>>> I still don't know whether it is possible to re-push File Roll files to
>>>> Kafka using bin/flume-ng.
>>>>
>>>> Any hints would be appreciated.
>>>>
>>>> Many thanks
>>>>
>>>> On Fri, Oct 16, 2015 at 4:32 PM, Simone Roselli <
>>>> simoneroselli78@gmail.com> wrote:
>>>>
>>>>> Hi Phil,
>>>>>
>>>>> thanks for your reply.
>>>>>
>>>>> Yes, with a file-channel configuration, CPU usage goes up to
>>>>> 80-90%.
>>>>>
>>>>> My settings:
>>>>> # Channel configuration
>>>>> agent1.channels.ch1.type = file
>>>>> agent1.channels.ch1.checkpointDir = /opt/flume-ng/checkpoint
>>>>> agent1.channels.ch1.dataDirs = /opt/flume-ng/data
>>>>> agent1.channels.ch1.capacity = 1000000
>>>>> agent1.channels.ch1.transactionCapacity = 10000
>>>>>
>>>>> # flume-env.sh
>>>>> export JAVA_OPTS="-Xms512m -Xmx2048m"
>>>>>
>>>>> # top
>>>>> 22079 flume-ng  20   0 6924752 785536  17132 S  83.7%  2.4   3:53.19
>>>>> java
>>>>>
>>>>> Do you have any tuning for the GC ?
>>>>>
>>>>> Thanks
>>>>> Simone
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 15, 2015 at 7:59 PM, Phil Scala <
>>>>> Phil.Scala@globalrelay.net> wrote:
>>>>>
>>>>>> Hi Simone
>>>>>>
>>>>>>
>>>>>>
>>>>>> I wonder why you're seeing 90% CPU use when you use a file channel;
>>>>>> I would expect high disk I/O instead. As a counterpoint, I have 4
>>>>>> spool dir sources on a single server, each going to a separate file
>>>>>> channel, also on an SSD-based server. I do not see any notable CPU or
>>>>>> disk I/O utilization. I am pushing about 10 million events per day
>>>>>> across all 4 sources, and it has been running reliably for 2 years now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would always use a file channel; any memory channel runs the risk
>>>>>> of data loss if the node were to fail. I would be more worried about
>>>>>> the local node failing, seeing that a 3-node Kafka cluster would have
>>>>>> to lose 2 nodes before it lost quorum.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Not sure what your data source is; if you can add more Flume nodes,
>>>>>> of course that would help.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Have you given it ample heap space? Maybe GCs are causing the high
>>>>>> CPU?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Simone Roselli [mailto:simoneroselli78@gmail.com]
>>>>>> *Sent:* Friday, October 09, 2015 12:33 AM
>>>>>> *To:* user@flume.apache.org
>>>>>> *Subject:* Flume-ng 1.6 reliable setup
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm currently planning to migrate from Flume 0.9 to Flume-ng 1.6, but
>>>>>> I'm having trouble finding a reliable setup for it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> My sink is a 3-node Kafka cluster. I must avoid *losing events in
>>>>>> case the main sink is down*, broken, or unreachable for a while.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In Flume 0.9, I use a memory channel with the *store on failure *feature,
>>>>>> which starts writing events to the local disk in case the target sink is
>>>>>> not available.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In Flume-ng 1.6, the same behaviour would be accomplished by setting
>>>>>> up a *Spillable memory channel*, but the problem with this solution
>>>>>> is stated at the end of the channel's description: "*This channel
>>>>>> is currently experimental and not recommended for use in production.*"
>>>>>>
>>>>>>
>>>>>>
>>>>>> In Flume-ng 1.6, it's possible to set up a pool of *Failover sinks*.
>>>>>> So I was thinking of configuring a *File Roll* sink as the secondary
>>>>>> sink in case the primary is down. However, once the primary sink came
>>>>>> back online, the data placed on the secondary sink (local disk)
>>>>>> wouldn't automatically be pushed to the primary one.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Another option would be setting up a *file channel*: write each
>>>>>> event to disk, then sink it. Aside from the fact that I don't love the
>>>>>> idea of continuously writing/deleting every single event on an SSD,
>>>>>> this setup is taking 90% of the CPU. The exact same configuration with
>>>>>> a memory channel takes 3%.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Other solutions to evaluate ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Simone
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
