List:       flume-user
Subject:    RE: checkpoint lifecycle
From:       Umesh Telang <Umesh.Telang () bbc ! co ! uk>
Date:       2014-01-30 17:44:42
Message-ID: B28E1934F72F05418ED42B5174609A1F0F184A42 () BGB01XUD1005 ! national ! core ! bbc ! co ! uk

Thanks very much, Brock, for all your help.

________________________________
From: Brock Noland [brock@cloudera.com]
Sent: 30 January 2014 16:28
To: user@flume.apache.org
Subject: Re: checkpoint lifecycle


On Thu, Jan 30, 2014 at 9:29 AM, Umesh Telang <Umesh.Telang@bbc.co.uk> wrote:

Ah, ok. So 32 bytes is required for each pointer to an event.

Yep :)

We'll amend our heap size accordingly. We may also be able to reduce our FileChannel size. We hadn't understood the implications of the capacity value of the FileChannel we have been using.

Regarding the multiple data directories, I hadn't realised that that implied distinct disks. Just to confirm, you're saying that each data directory has to be on a distinct disk?

The recommendation is that you have two data directories per distinct disk.

Is it that FileChannel can't utilise an entire disk from an IO perspective, regardless of how big the disk is?

Right, it has nothing to do with size and everything to do with IO bandwidth. We could optimize this area (and will), but for now specifying two data directories per disk is a good workaround.
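As an illustration of the two-directories-per-disk workaround, here is a minimal Python sketch that builds a value for the FileChannel `dataDirs` property (a comma-separated list of directories). The helper name, the mount points and the `<mount>/flume/data-N` layout are hypothetical; substitute your own disks and paths.

```python
from typing import List


def data_dirs_two_per_disk(mount_points: List[str], per_disk: int = 2) -> str:
    """Hypothetical helper: build a dataDirs value with `per_disk` directories per disk.

    The <mount>/flume/data-N layout is an illustrative assumption, not a Flume default.
    """
    dirs = []
    for mount in mount_points:
        for i in range(1, per_disk + 1):
            # Strip a trailing slash so "/disk2/" and "/disk2" give the same paths.
            dirs.append(f"{mount.rstrip('/')}/flume/data-{i}")
    # dataDirs takes a comma-separated list of directories.
    return ",".join(dirs)


print(data_dirs_two_per_disk(["/disk1", "/disk2"]))
```

The resulting string would be the value of the channel's `dataDirs` setting (for example `agent.channels.c1.dataDirs`, where `agent` and `c1` are placeholder names). As discussed above, the point is to spread IO over more directories per disk, not to make room for a larger channel.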

Or is this size-dependent? i.e. above a certain size, you need a second data directory? If the latter, could you let me know what that size is? If it's a general point, then I'll follow the earlier advice of 2 data dirs per file channel.

Doesn't relate to size.


Apologies for all the questions!

We had made an estimation of disk space (avg event size (~250 bytes) * channel size (150M)) and have provisioned disks that are significantly larger than the required space.
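The disk estimate above is plain multiplication; a quick Python check using the figures from this thread (about 250 bytes per event, 150M capacity) is sketched below. The function name is hypothetical. It counts event payload only; the data files presumably also hold per-record metadata, so treat the result as a lower bound rather than a measured figure.

```python
def full_channel_bytes(avg_event_bytes: int, capacity_events: int) -> int:
    """Disk needed to hold a completely full channel: avg event size * capacity."""
    return avg_event_bytes * capacity_events


need = full_channel_bytes(250, 150_000_000)
# Report both decimal gigabytes and binary gibibytes, since tools differ.
print(f"{need:,} bytes = {need / 10**9:.1f} GB = {need / 1024**3:.1f} GiB")
```

This prints `37,500,000,000 bytes = 37.5 GB = 34.9 GiB`.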

Perfect, great to hear!

Thanks,
Umesh

________________________________
From: Brock Noland [brock@cloudera.com]
Sent: 30 January 2014 14:38
To: user@flume.apache.org
Subject: Re: checkpoint lifecycle

On Thu, Jan 30, 2014 at 8:16 AM, Umesh Telang <Umesh.Telang@bbc.co.uk> wrote:

Hi Brock,

Our heap size is 2GB.

That is not enough heap for 150M events. It's 150 million * 32 bytes = 4.8 billion bytes (about 4.5GB), plus say 100-500MB for the rest of Flume.
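The heap arithmetic above can be reproduced with a small Python sketch. The 32-bytes-per-event-pointer figure and the 100-500MB allowance come from this thread; the function and constant names are hypothetical, and the result is a rough estimate, not a sizing guarantee.

```python
# Per-event figure quoted in this thread; treat it as an approximation.
BYTES_PER_EVENT_POINTER = 32
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_heap_bytes(capacity_events: int, overhead_bytes: int) -> int:
    """Estimated heap for a FileChannel of the given capacity plus other overhead."""
    return capacity_events * BYTES_PER_EVENT_POINTER + overhead_bytes


capacity = 150_000_000
pointers_only = estimate_heap_bytes(capacity, 0)
low = estimate_heap_bytes(capacity, 100 * MIB)   # lower allowance for the rest of Flume
high = estimate_heap_bytes(capacity, 500 * MIB)  # upper allowance for the rest of Flume

print(f"pointers alone: {pointers_only / GIB:.2f} GiB")
print(f"with overhead:  {low / GIB:.2f}-{high / GIB:.2f} GiB")
```

For 150M events this gives about 4.47 GiB for the pointers alone and roughly 4.57-4.96 GiB with the overhead, well above a 2GB heap. The heap itself is normally raised through the JVM's `-Xmx` option; where that is set depends on how the agent is launched.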


Thanks for the advice on data directories. Could you please let me know the heuristic for that? (e.g. 1 data directory per N-sized channel where N is...)

File channel at present cannot utilize an entire disk from an IO perspective; that is why I suggest multiple disks. Of course you'll want to ensure that you have enough disk to support a full channel, but that is a different discussion (avg event size * channel size).


Thanks also for suggesting backup checkpoints. Are these something that increases the integrity of Flume's execution in an automatic fashion, or do they aid in some form of manual recovery?

Automatic. If Flume is killed or shut down during a checkpoint, that checkpoint is invalid, and unless a backup checkpoint exists, a full replay will have to take place. Furthermore, without FLUME-2155, full replays are very time-consuming under certain conditions.
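For reference, backup checkpoints are enabled per FileChannel. The Python sketch below renders a set of FileChannel properties with dual checkpoints turned on. `checkpointDir`, `useDualCheckpoints`, `backupCheckpointDir`, `dataDirs` and `capacity` are FileChannel settings described in the Flume User Guide (dual checkpoints arrived around Flume 1.4.0; check the guide for your version). The helper function, agent name, channel name and paths are placeholders.

```python
from typing import List


def file_channel_properties(agent: str, channel: str, checkpoint_dir: str,
                            backup_dir: str, data_dirs: List[str],
                            capacity: int) -> List[str]:
    """Hypothetical helper: render FileChannel settings with a backup checkpoint.

    The agent/channel names and all paths are placeholders chosen by the caller.
    """
    # The backup checkpoint directory should not coincide with the checkpoint
    # directory or any data directory.
    if backup_dir == checkpoint_dir or backup_dir in data_dirs:
        raise ValueError("backupCheckpointDir must differ from checkpointDir and dataDirs")
    prefix = f"{agent}.channels.{channel}"
    props = {
        "type": "file",
        "checkpointDir": checkpoint_dir,
        "useDualCheckpoints": "true",
        "backupCheckpointDir": backup_dir,
        "dataDirs": ",".join(data_dirs),
        "capacity": str(capacity),
    }
    # Dicts keep insertion order, so the output order matches the dict above.
    return [f"{prefix}.{key} = {value}" for key, value in props.items()]


for line in file_channel_properties(
        "agent", "c1", "/disk1/flume/checkpoint", "/disk2/flume/checkpoint-backup",
        ["/disk1/flume/data-1", "/disk1/flume/data-2"], 1_000_000):
    print(line)
```

The printed lines are in the `agent.channels.c1.<property> = <value>` form used by Flume's properties-file configuration.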


Re: FLUME-2155, I've scanned through it, and will read it in more detail. I'm not sure about the unit of measurement for some of the metrics (milliseconds?), but is there any guidance as to the order of magnitude (10^4, 10^6 or 10^8?) at which the channel size causes the replay issue to become apparent?

It's not purely about channel size. Specifically, it's about:

1) Large channel size
2) Having a large number of events in your channel (queue depth)
3) Having run the channel for some time such that old WALs were cleaned up (causing there to be removes for which no event exists)
4) Performing a full replay in these conditions

Generally I wouldn't go over a 1M channel size without a backup checkpoint, this change (FLUME-2155), or both. There are more details here:

https://issues.apache.org/jira/browse/FLUME-2155?focusedCommentId=13841465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13841465
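To illustrate why queue depth (condition 2) and removes for missing events (condition 3) combine badly, here is a toy Python model. It is not Flume's implementation and does not claim to reproduce the FLUME-2155 fix: it simply assumes that, during replay, each remove searches the queue linearly, so a remove whose event no longer exists scans the whole queue. An index (here a Python set) makes each lookup constant time on average. Function names and numbers are hypothetical.

```python
from typing import Iterable, List


def replay_linear(queue_items: Iterable[int], removes: Iterable[int]) -> int:
    """Toy replay: each remove scans the queue from the head. Returns comparisons."""
    queue: List[int] = list(queue_items)
    comparisons = 0
    for target in removes:
        for i, item in enumerate(queue):
            comparisons += 1
            if item == target:
                del queue[i]  # safe: we stop iterating right after deleting
                break
        # A target that is never found has cost a full scan of the queue.
    return comparisons


def replay_indexed(queue_items: Iterable[int], removes: Iterable[int]) -> int:
    """Same removes checked against a set. Returns lookups (order is not modelled)."""
    present = set(queue_items)
    lookups = 0
    for target in removes:
        lookups += 1
        present.discard(target)
    return lookups


# 2,000 events queued; 500 removes refer to events whose puts were cleaned up.
depth, orphan_removes = 2_000, 500
orphans = range(10_000, 10_000 + orphan_removes)
print(replay_linear(range(depth), orphans))   # 1000000 comparisons
print(replay_indexed(range(depth), orphans))  # 500 lookups
```

In this model the linear cost grows as queue depth times the number of orphaned removes, which is why a deep queue plus a long-running channel is the problematic combination.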


Brock



----------------------------


http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

---------------------



--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org


