
List:       hadoop-user
Subject:    Re: Error: Too Many Fetch Failures
From:       "Ellis H. Wilson III" <ellis@cse.psu.edu>
Date:       2012-06-29 0:30:46
Message-ID: 4FECF736.7030306@cse.psu.edu

On 06/19/12 23:10, Ellis H. Wilson III wrote:
> On 06/19/12 20:42, Raj Vishwanathan wrote:
>> You probably have a very low somaxconn parameter (the default on
>> CentOS is 128, if I remember correctly). You can check the value
>> under /proc/sys/net/core/somaxconn
>
> Aha! Excellent, it does seem it's at the default, and that particular
> sysctl item had slipped my notice:
> [ellis@pool100 ~]$ cat /proc/sys/net/core/somaxconn
> 128
>
>> Can you also check the value of ulimit -n? It could be low.
>
> I did look for and alter this already, but it is set fairly high from
> what I can tell:
> [ellis@pool100 ~]$ ulimit -n
> 16384
>
> I altered both of these in /etc/sysctl.conf and have forced them to be
> re-read with `sysctl -p` on all nodes. I will report back if this fixes
> the issues tomorrow.

To anyone who runs into this problem in the future: I found that
increasing the somaxconn parameter fixed the fetch-failure issue
completely (based on 3 tests run so far on largish datasets). This
should be particularly useful for others dealing with an extremely high
TaskTracker-to-DataNode ratio (10:1 in my case).
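
For those curious why this parameter matters here: the kernel silently
caps any server's listen() backlog at net.core.somaxconn, so with the
old default of 128 and many reducers fetching map output at once, the
pending-connection queue overflows and fetches fail. A minimal sketch
(Linux only; the 1024 requested backlog is an assumption for
illustration, not a Hadoop setting):

```python
import socket

# Read the kernel's ceiling on pending TCP connections (Linux only).
with open("/proc/sys/net/core/somaxconn") as f:
    somaxconn = int(f.read().strip())

# A server (like the shuffle HTTP endpoint) asks for a listen backlog,
# but the kernel silently caps it at somaxconn. If many clients connect
# at once, connections beyond the capped queue are dropped.
requested_backlog = 1024  # illustrative value, not from this thread
effective_backlog = min(requested_backlog, somaxconn)
print(f"somaxconn={somaxconn}, effective backlog={effective_backlog}")

# listen() succeeds either way; the cap is applied without any error.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(requested_backlog)
srv.close()
```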

Thanks again to Raj for this solution, and others for their suggestions.

Best,

ellis
