
List:       flume-user
Subject:    Re: flume agent with HDFS sink, syslog source and memory channel - stuck on hdfs IOException
From:       Suhas Satish <suhas.satish () gmail ! com>
Date:       2014-01-15 1:42:54
Message-ID: CAPwc21a+aSuYsZhK7QqfF4XQ1O1Rh7TX-Avdohn82HbFnPdEUA () mail ! gmail ! com

The patch has been tested and uploaded. This should fix Flume 1.4 and
earlier releases.
https://issues.apache.org/jira/browse/FLUME-1654
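
For anyone who wants to try the fix locally before it lands in a release, the
usual pattern is roughly the following (the patch file name and checkout
location are placeholders, not taken from the JIRA):

  # from a flume-1.4.0 source checkout, apply the patch and rebuild
  patch -p1 < FLUME-1654.patch
  mvn clean install -DskipTests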


Cheers,
Suhas.


On Wed, Oct 16, 2013 at 5:15 PM, Suhas Satish <suhas.satish@gmail.com> wrote:

> There already exists a JIRA. I have come up with a local fix which works.
> https://issues.apache.org/jira/browse/FLUME-1654
>
> Will be uploading a patch soon.
>
> Cheers,
> Suhas.
>
>
> On Tue, Oct 15, 2013 at 1:15 PM, Roshan Naik <roshan@hortonworks.com> wrote:
>
>> Paul,
>>    HDFS sink issue apart... it sounds like this is a setup where Hive is
>> being allowed to read new files/directories flowing into the partition
>> while the HDFS sink is still writing to it. To my knowledge, in Hive, a
>> partition is considered immutable and should not be updated once it is
>> created. So the previous directory should be exposed to Hive only once the
>> HDFS sink has rolled over to the next directory.
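>>
>> One way to make that rollover boundary explicit is to roll purely on time
>> and close idle files, so a directory is complete before Hive ever reads it.
>> A minimal sketch of the relevant sink properties (the sink name and the
>> values below are assumptions, not taken from the attached config):
>>
>> # time-based rolling only; close files that go idle
>> agent.sinks.hdfsSink.hdfs.path = /flume_import/%Y/%m/%d
>> agent.sinks.hdfsSink.hdfs.rollInterval = 300
>> agent.sinks.hdfsSink.hdfs.rollSize = 0
>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>> agent.sinks.hdfsSink.hdfs.idleTimeout = 60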
>> -roshan
>>
>>
>> On Tue, Oct 15, 2013 at 11:23 AM, Paul Chavez <
>> pchavez@verticalsearchworks.com> wrote:
>>
>>> I can't speak for Suhas, but I face a similar issue in production. For
>>> me it occurs when someone queries a .tmp file from Hive or Pig. This causes
>>> the HDFS sink to lose the ability to close and rename the file, and then the
>>> HDFS sink is completely out of commission until the agent is restarted.
>>> We've mitigated this in our environment by careful Hive partition
>>> coordination, but it still crops up in cases where people are running ad-hoc
>>> queries they probably shouldn't be. We are waiting to get the latest CDH in
>>> production, which eliminates the .tmp file issue, but I would still like to
>>> have a more resilient HDFS sink, and so I support development effort in this
>>> area.
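>>>
>>> One mitigation that may help with the ad-hoc query case: the HDFS sink has
>>> an hdfs.inUsePrefix property, and the MapReduce input machinery behind Hive
>>> and Pig skips files whose names start with "_" or ".", so prefixing
>>> in-progress files keeps them invisible until they are closed and renamed.
>>> A sketch (the agent and sink names are assumptions, not from the actual
>>> config):
>>>
>>> # hide in-flight files from Hive/Pig until they are closed and renamed
>>> agent.sinks.hdfsSink.hdfs.inUsePrefix = _
>>> agent.sinks.hdfsSink.hdfs.inUseSuffix = .tmp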
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Paul Chavez
>>>
>>>
>>>
>>>
>>>
>>> *From:* Roshan Naik [mailto:roshan@hortonworks.com]
>>> *Sent:* Tuesday, October 15, 2013 11:14 AM
>>> *To:* dev@flume.apache.org
>>> *Cc:* user@flume.apache.org; commits@flume.apache.org
>>> *Subject:* Re: flume agent with HDFS sink, syslog source and memory
>>> channel - stuck on hdfs IOException
>>>
>>>
>>>
>>> Sounds like a valid bug. I am curious though... is there a real use
>>> scenario you are facing in production?
>>>
>>>
>>>
>>> On Mon, Oct 14, 2013 at 7:39 PM, Suhas Satish <suhas.satish@gmail.com>
>>> wrote:
>>>
>>> In summary, although the flume-agent JVM doesn't exit, once an HDFS
>>> IOException occurs due to deleting a .tmp file, the agent doesn't recover
>>> from this and stops logging the other HDFS sink output generated by the
>>> syslog source.
>>>
>>> There was only one JIRA I found in Apache that is even remotely related
>>> to this HDFS sink issue, and we didn't have it. I tested by pulling the
>>> FLUME-2007 patch into flume-1.4.0:
>>>
>>> https://github.com/apache/flume/commit/5b5470bd5d3e94842032009c36788d4ae346674b
>>> https://issues.apache.org/jira/browse/FLUME-2007
>>>
>>> But it doesn't solve this issue.
>>>
>>> Should I open a new JIRA ticket?
>>>
>>>
>>>
>>> Thanks,
>>> Suhas.
>>>
>>>
>>> On Fri, Oct 11, 2013 at 4:13 PM, Suhas Satish <suhas.satish@gmail.com> wrote:
>>>
>>>
>>> > Hi, I have the following flume configuration file, flume-syslog.conf
>>> > (attached; a rough sketch of an equivalent config appears after these steps) -
>>> >
>>> > 1.) I launch it with -
>>> >
>>> > bin/flume-ng agent -n agent -c conf -f conf/flume-syslog.conf
>>> >
>>> > 2.) Generate log output using loggen (provided by syslog-ng):
>>> > loggen -I 30 -s 300 -r 900 localhost 13073
>>> >
>>> > 3.) I verify flume output is generated under /flume_import/ on the
>>> > hadoop cluster.
>>> >
>>> > It generates output of the form -
>>> >
>>> > -rwxr-xr-x   3 root root     139235 2013-10-11 14:35
>>> > /flume_import/2013/10/14/logdata-2013-10-14-35-45.1381527345384.tmp
>>> > -rwxr-xr-x   3 root root     138095 2013-10-11 14:35
>>> > /flume_import/2013/10/14/logdata-2013-10-14-35-46.1381527346543.tmp
>>> > -rwxr-xr-x   3 root root     135795 2013-10-11 14:35
>>> > /flume_import/2013/10/14/logdata-2013-10-14-35-47.1381527347670.tmp
>>> >
>>> >
>>> > 4.) Delete the flume output files while loggen is still running and
>>> > Flume is generating the sink output.
>>> >
>>> > hadoop fs -rmr /flume_import/2013/10/14/logdata-2013-10-14-35-47.1381527347670.tmp
>>> >
>>> > 5.) This gives me the following exception in the flume log. Although
>>> > the flume agent JVM continues to run, it does not generate any more output
>>> > files from syslog-ng until the flume agent JVM is restarted. Is flume
>>> > expected to behave like this, or should it handle the IOException gracefully
>>> > and continue to log syslog output to other output directories?
>>> >
>>> > 10 Oct 2013 16:55:42,092 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> > (org.apache.flume.sink.hdfs.BucketWriter.append:430)  - Caught IOException
>>> > while closing file
>>> > (maprfs:///flume_import/2013/10/16//logdata-2013-10-16-50-03.1381449008596.tmp).
>>> > Exception follows.
>>> > java.io.IOException: 2049.112.5249612
>>> > /flume_import/2013/10/16/logdata-2013-10-16-50-03.1381449008596.tmp (Stale file
>>> > handle)
>>> >     at com.mapr.fs.Inode.throwIfFailed(Inode.java:269)
>>> >     at com.mapr.fs.Inode.flushJniBuffers(Inode.java:402)
>>> >     at com.mapr.fs.Inode.syncInternal(Inode.java:478)
>>> >     at com.mapr.fs.Inode.syncUpto(Inode.java:484)
>>> >     at com.mapr.fs.MapRFsOutStream.sync(MapRFsOutStream.java:244)
>>> >     at com.mapr.fs.MapRFsDataOutputStream.sync(MapRFsDataOutputStream.java:68)
>>> >     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:946)
>>> >     at org.apache.flume.sink.hdfs.HDFSSequenceFile.sync(HDFSSequenceFile.java:107)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:356)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:353)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
>>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(T
>>> >
>>> >
>>> > 6.) I found the following related post
>>> >
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/flume-user/201305.mbox/%3C24FEFF6B2CA048F7A4A7D9D6E30084FB@cloudera.com%3E
>>> >
>>> > Not sure if it's related to this issue. Can anyone comment?
>>> >
>>> > Thanks,
>>> > Suhas.
>>> >
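>>>
>>> For anyone trying to reproduce this without the attachment, a minimal
>>> configuration along the lines of the setup described above would look
>>> roughly like the following (component names and most values are
>>> assumptions, not the exact attached file):
>>>
>>> # flume-syslog.conf (sketch): syslog TCP source -> memory channel -> HDFS sink
>>> agent.sources = syslogSrc
>>> agent.channels = memChannel
>>> agent.sinks = hdfsSink
>>>
>>> agent.sources.syslogSrc.type = syslogtcp
>>> agent.sources.syslogSrc.host = 0.0.0.0
>>> agent.sources.syslogSrc.port = 13073
>>> agent.sources.syslogSrc.channels = memChannel
>>>
>>> agent.channels.memChannel.type = memory
>>> agent.channels.memChannel.capacity = 10000
>>>
>>> agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.channel = memChannel
>>> agent.sinks.hdfsSink.hdfs.path = /flume_import/%Y/%m/%d
>>> agent.sinks.hdfsSink.hdfs.filePrefix = logdata
>>> agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
>>> agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true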
>>>
>>>
>>>
>>>
>>
>>
>
>
