
List:       hadoop-user
Subject:    Re: Intermittent BindException during long MR jobs
From:       Krishna Rao <krishnanjrao () gmail ! com>
Date:       2015-03-25 17:19:17
Message-ID: CAPEqew+KHCdS2-JOCv1pDnHoMeu-LCLhTCOXAN_zPhXk5kqyCg () mail ! gmail ! com

Thanks for the responses. In our case the port is 0, so according to the page
Ted linked <http://wiki.apache.org/hadoop/BindException>, a port collision is
highly unlikely:

"If the port is "0", then the OS is looking for any free port -so the
port-in-use and port-below-1024 problems are highly unlikely to be the
cause of the problem."
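
To make the port-0 behaviour concrete, here is a minimal sketch using only the
JDK (not Hadoop code; the class and method names are hypothetical). It binds a
client socket to a given local address with port 0 and reports the port the OS
picked. If the address cannot be bound, `bind` throws a
java.net.BindException, the same exception type seen in the trace below.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical demo class, not part of Hadoop.
final class EphemeralBindDemo {

    // Binds an unconnected socket to localAddr with port 0 and returns the
    // port chosen by the OS. A failed bind surfaces as java.net.BindException.
    static int bindEphemeral(InetAddress localAddr) throws IOException {
        try (Socket socket = new Socket()) {
            socket.bind(new InetSocketAddress(localAddr, 0));
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback is used here only so the sketch runs anywhere.
        int port = bindEphemeral(InetAddress.getLoopbackAddress());
        System.out.println("OS assigned port " + port);
    }
}
```

Running the same call against the address from the trace (10.4.2.10 on back10)
at the time of a failure would show whether a plain bind also fails outside
Hadoop.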

I think load may be the culprit, since the nodes are heavily used at the
times the exception occurs.

Is there any way to set or increase the timeout for the call/connection
attempt? In all cases so far it seems to be on a call to delete a file in
HDFS. I searched the HDFS code base but couldn't see an obvious way to set
a timeout, and couldn't see one being set.
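
Failing a timeout setting, one workaround would be to retry at the call site.
The following is only a sketch under assumptions: it uses plain JDK classes,
`BindRetry` and `callWithRetry` are hypothetical names rather than anything in
Hadoop, and it only helps where the failing call is in code we control (it
would not wrap Hive's own job submission). It retries an operation with
exponential backoff when a java.net.BindException is thrown.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop or Hive.
final class BindRetry {

    // Runs op, retrying on BindException up to maxAttempts times in total.
    // The delay starts at initialDelayMs and doubles after each failure.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (BindException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```

A call such as `BindRetry.callWithRetry(() -> fs.delete(path, false), 5, 500L)`
would then wrap the HDFS delete, where `fs` and `path` stand in for an existing
FileSystem and Path.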

Krishna

On 28 February 2015 at 15:20, Ted Yu <yuzhihong@gmail.com> wrote:

> Krishna:
> Please take a look at:
> http://wiki.apache.org/hadoop/BindException
> 
> Cheers
> 
> On Thu, Feb 26, 2015 at 10:30 PM, <hadoop.support@visolve.com> wrote:
> 
> > Hello Krishna,
> > 
> > 
> > 
> > The exception seems to be IP specific. It might have occurred because no
> > IP address was available on the system to assign. Double check the IP
> > address availability and run the job again.
> > 
> > 
> > 
> > *Thanks,*
> > 
> > *S.RagavendraGanesh*
> > 
> > ViSolve Hadoop Support Team
> > ViSolve Inc. | San Jose, California
> > Website: www.visolve.com
> > 
> > email: services@visolve.com | Phone: 408-850-2243
> > 
> > 
> > 
> > 
> > 
> > *From:* Krishna Rao [mailto:krishnanjrao@gmail.com]
> > *Sent:* Thursday, February 26, 2015 9:48 PM
> > *To:* user@hive.apache.org; user@hadoop.apache.org
> > *Subject:* Intermittent BindException during long MR jobs
> > 
> > 
> > 
> > Hi,
> > 
> > 
> > 
> > We occasionally run into a BindException that causes long-running jobs
> > to fail.
> > 
> > 
> > 
> > The stacktrace is below.
> > 
> > 
> > 
> > Any ideas what this could be caused by?
> > 
> > 
> > 
> > Cheers,
> > 
> > 
> > 
> > Krishna
> > 
> > 
> > 
> > 
> > 
> > Stacktrace:
> > 
> > 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job
> > Submission failed with exception 'java.net.BindException(Problem binding to
> > [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested
> > address; For more details see:
> > http://wiki.apache.org/hadoop/BindException)'
> > 
> > java.net.BindException: Problem binding to [back10/10.4.2.10:0]
> > java.net.BindException: Cannot assign requested address; For more details
> > see:  http://wiki.apache.org/hadoop/BindException
> >     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1242)
> >     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> >     at com.sun.proxy.$Proxy10.create(Unknown Source)
> >     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
> >     at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> >     at com.sun.proxy.$Proxy11.create(Unknown Source)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1376)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
> >     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
> >     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
> >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
> >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
> >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
> >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
> >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
> >     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
> >     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
> >     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> >     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
> >     at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:56)
> > 
> > 
> > 
> 
> 




