
List:       hadoop-user
Subject:    Re: Too many fetch-failures - reduce task problem
From:       Nachiket Vaidya <vaidyand () gmail ! com>
Date:       2010-01-28 12:27:13
Message-ID: 1c802db51001280427j5b8e57dai4a8d0fdd038f41 () mail ! gmail ! com


After adding the hostnames of the master and the slaves to /etc/hosts and
removing the entry for 127.0.1.1, it worked!

I had always specified IP addresses instead of hostnames in the conf files.
But Hadoop uses the IP address only at startup; for all other operations it
uses hostnames. So I added the IP-to-hostname mappings to the /etc/hosts file.
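For anyone hitting the same problem, the fix amounts to an /etc/hosts along
these lines on every node (the hostnames and 192.168.x.x addresses below are
placeholders; substitute your own):

```
127.0.0.1    localhost
# 127.0.1.1  hadoop-desktop1   <- remove this default Ubuntu entry
192.168.0.1  hadoop-desktop1   # master
192.168.0.2  hadoop-desktop2   # slave
```

With this in place, the hostname each TaskTracker advertises resolves to a
routable address on every machine instead of the loopback address.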

On Wed, Jan 27, 2010 at 1:16 PM, Nachiket Vaidya <vaidyand@gmail.com> wrote:

> Hi all,
> My problem is the same as
> http://issues.apache.org/jira/browse/HADOOP-3362, and no solution is given
> there :(
>
> 1. I am using Hadoop 0.20.1. My setup is very simple. I have two machines
> (both are Ubuntu machines):
> machine1 = namenode, jobtracker, and also datanode and tasktracker (we will
> call this the master)
> machine2 = datanode and tasktracker (we will call this the slave)
> Same as given in
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> The only difference is that I have not changed my /etc/hosts file, since I
> am using IP addresses in the conf files. *Is that ok?*
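> For reference, pointing the conf at the master by IP looks roughly like
> this in 0.20 (fs.default.name and mapred.job.tracker are the standard keys;
> the address and ports below are placeholders following the tutorial above):
>
> ```xml
> <!-- conf/core-site.xml -->
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://192.168.0.1:54310</value>
> </property>
>
> <!-- conf/mapred-site.xml -->
> <property>
>   <name>mapred.job.tracker</name>
>   <value>192.168.0.1:54311</value>
> </property>
> ```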
> 2. The program runs fine in standalone mode, but in multi-node mode it
> stalls in the reduce phase and only eventually completes successfully. I am
> running just the word-count example.
> /**************************/
> 10/01/27 12:08:21 INFO input.FileInputFormat: Total input paths to process
> : 17
> 10/01/27 12:08:21 INFO mapred.JobClient: Running job: job_201001271157_0002
> 10/01/27 12:08:22 INFO mapred.JobClient:  map 0% reduce 0%
> 10/01/27 12:08:39 INFO mapred.JobClient:  map 11% reduce 0%
> 10/01/27 12:08:46 INFO mapred.JobClient:  map 23% reduce 0%
> 10/01/27 12:08:53 INFO mapred.JobClient:  map 35% reduce 0%
> 10/01/27 12:08:56 INFO mapred.JobClient:  map 47% reduce 3%
> 10/01/27 12:09:02 INFO mapred.JobClient:  map 58% reduce 7%
> 10/01/27 12:09:05 INFO mapred.JobClient:  map 70% reduce 7%
> 10/01/27 12:09:08 INFO mapred.JobClient:  map 82% reduce 11%
> 10/01/27 12:09:11 INFO mapred.JobClient:  map 88% reduce 11%
> 10/01/27 12:09:14 INFO mapred.JobClient:  map 100% reduce 11%
> 10/01/27 12:09:23 INFO mapred.JobClient:  map 100% reduce 17%
> 10/01/27 12:16:39 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000002_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 12:16:54 INFO mapred.JobClient:  map 100% reduce 19%
> 10/01/27 12:26:52 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000003_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 12:27:08 INFO mapred.JobClient:  map 100% reduce 21%
> 10/01/27 12:37:08 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000006_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 12:37:24 INFO mapred.JobClient:  map 100% reduce 23%
> 10/01/27 12:47:24 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000007_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 12:47:28 INFO mapred.JobClient:  map 94% reduce 23%
> 10/01/27 12:47:31 INFO mapred.JobClient:  map 100% reduce 23%
> 10/01/27 12:47:40 INFO mapred.JobClient:  map 100% reduce 25%
> 10/01/27 12:57:38 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000010_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 12:57:54 INFO mapred.JobClient:  map 100% reduce 27%
> 10/01/27 13:07:55 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000011_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 13:08:11 INFO mapred.JobClient:  map 100% reduce 29%
> 10/01/27 13:18:11 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000014_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 13:18:27 INFO mapred.JobClient:  map 100% reduce 31%
> 10/01/27 13:28:24 INFO mapred.JobClient: Task Id :
> attempt_201001271157_0002_m_000015_0, Status : FAILED
> Too many fetch-failures
> 10/01/27 13:28:40 INFO mapred.JobClient:  map 100% reduce 100%
> 10/01/27 13:28:42 INFO mapred.JobClient: Job complete:
> job_201001271157_0002
> 10/01/27 13:28:42 INFO mapred.JobClient: Counters: 17
> 10/01/27 13:28:42 INFO mapred.JobClient:   Job Counters
> 10/01/27 13:28:42 INFO mapred.JobClient:     Launched reduce tasks=1
> 10/01/27 13:28:42 INFO mapred.JobClient:     Launched map tasks=25
> 10/01/27 13:28:42 INFO mapred.JobClient:     Data-local map tasks=25
> 10/01/27 13:28:42 INFO mapred.JobClient:   FileSystemCounters
> 10/01/27 13:28:42 INFO mapred.JobClient:     FILE_BYTES_READ=16584
> 10/01/27 13:28:42 INFO mapred.JobClient:     HDFS_BYTES_READ=18805
> 10/01/27 13:28:42 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=33808
> 10/01/27 13:28:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=10731
> 10/01/27 13:28:42 INFO mapred.JobClient:   Map-Reduce Framework
> 10/01/27 13:28:42 INFO mapred.JobClient:     Reduce input groups=0
> 10/01/27 13:28:42 INFO mapred.JobClient:     Combine output records=821
> 10/01/27 13:28:42 INFO mapred.JobClient:     Map input records=580
> 10/01/27 13:28:42 INFO mapred.JobClient:     Reduce shuffle bytes=16680
> 10/01/27 13:28:42 INFO mapred.JobClient:     Reduce output records=0
> 10/01/27 13:28:42 INFO mapred.JobClient:     Spilled Records=1642
> 10/01/27 13:28:42 INFO mapred.JobClient:     Map output bytes=25180
> 10/01/27 13:28:42 INFO mapred.JobClient:     Combine input records=1818
> 10/01/27 13:28:42 INFO mapred.JobClient:     Map output records=1818
> 10/01/27 13:28:42 INFO mapred.JobClient:     Reduce input records=821
> /**************************/
>
> I checked the logs for the namenode/jobtracker/datanodes/tasktrackers
> (attached herewith).
> There are no exceptions in the files, just failure statements in the
> jobtracker logs such as:
> /*-------------------------*/
> 2010-01-27 12:26:51,554 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_201001271157_0002_m_000003_1' to tip
> task_201001271157_0002_m_000003, for tracker
> 'tracker_hadoop-desktop2:localhost/127.0.0.1:55734'
> 2010-01-27 12:26:51,554 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_201001271157_0002_m_000003
> 2010-01-27 12:26:54,350 INFO org.apache.hadoop.mapred.JobTracker: Removed
> completed task 'attempt_201001271157_0002_m_000003_0' from
> 'tracker_hadoop-desktop1:localhost/127.0.0.1:36778'
> 2010-01-27 12:26:54,626 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201001271157_0002_m_000003_1' has completed
> task_201001271157_0002_m_000003 successfully.
> 2010-01-27 12:26:54,627 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:19  completedMapsInputSize:23876
>  completedMapsOutputSize:22641
> 2010-01-27 12:29:30,987 INFO org.apache.hadoop.mapred.JobInProgress: Failed
> fetch notification #1 for task attempt_201001271157_0002_m_000006_0
> 2010-01-27 12:32:07,410 INFO org.apache.hadoop.mapred.JobInProgress: Failed
> fetch notification #2 for task attempt_201001271157_0002_m_000006_0
> 2010-01-27 12:37:08,075 INFO org.apache.hadoop.mapred.JobInProgress: Failed
> fetch notification #3 for task attempt_201001271157_0002_m_000006_0
> 2010-01-27 12:37:08,075 INFO org.apache.hadoop.mapred.JobInProgress: Too
> many fetch-failures for output of task: attempt_201001271157_0002_m_000006_0
> ... killing it
> 2010-01-27 12:37:08,075 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_201001271157_0002_m_000006_0: Too many fetch-failures
> 2010-01-27 12:37:08,076 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_201001271157_0002_m_000006_1' to tip
> task_201001271157_0002_m_000006, for tracker
> 'tracker_hadoop-desktop2:localhost/127.0.0.1:55734'
> 2010-01-27 12:37:08,076 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_201001271157_0002_m_000006
> 2010-01-27 12:37:10,613 INFO org.apache.hadoop.mapred.JobTracker: Removed
> completed task 'attempt_201001271157_0002_m_000006_0' from
> 'tracker_hadoop-desktop1:localhost/127.0.0.1:36778'
> 2010-01-27 12:37:11,084 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201001271157_0002_m_000006_1' has completed
> task_201001271157_0002_m_000006 successfully.
> 2010-01-27 12:37:11,084 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20  completedMapsInputSize:25072
>  completedMapsOutputSize:23508
> 2010-01-27 12:39:47,424 INFO org.apache.hadoop.mapred.JobInProgress: Failed
> fetch notification #1 for task attempt_201001271157_0002_m_000007_0
> 2010-01-27 12:42:23,822 INFO org.apache.hadoop.mapred.JobInProgress: Failed
> fetch notification #2 for task attempt_201001271157_0002_m_000007_0
> 2010-01-27 12:47:24,576 INFO org.apache.hadoop.mapred.JobInProgress: Failed
> fetch notification #3 for task attempt_201001271157_0002_m_000007_0
> 2010-01-27 12:47:24,578 INFO org.apache.hadoop.mapred.JobInProgress: Too
> many fetch-failures for output of task: attempt_201001271157_0002_m_000007_0
> ... killing it
> 2010-01-27 12:47:24,578 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_201001271157_0002_m_000007_0: Too many fetch-failures
> 2010-01-27 12:47:24,578 INFO org.apache.hadoop.mapred.JobInProgress:
> TaskTracker at 'hadoop-desktop1' turned 'flaky'
> 2010-01-27 12:47:24,579 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_201001271157_0002_m_000007_1' to tip
> task_201001271157_0002_m_000007, for tracker
> 'tracker_hadoop-desktop2:localhost/127.0.0.1:55734'
> 2010-01-27 12:47:24,579 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_201001271157_0002_m_000007
> /*--------------------------*/
>
> More info:
> 1. The tasks that failed due to "Too many fetch-failures" were on the
> *master machine only*; the slave was able to finish those tasks.
> 2. From both the master and the slave machine, we cannot access the web UI
> when the master's IP address is given. But we can access the web UI on the
> master machine when we use localhost instead of the master's IP address.
> - Nachiket
>
>
> On Fri, Jan 22, 2010 at 7:13 PM, Sayali <sayali.kulkarni@gmail.com> wrote:
>
>> Hey Nachiket!
>> So nice to hear from you! I recently rejoined PSL and am currently working
>> hard on adjusting to the new environment :) I guess you can understand
>> what I mean -- after 2 years in IIT, it's tough to get back :)
>>
>> Well... your news server needs to be tested! It should not give out such
>> false info! :P (but anyway, these reporters have a habit of spicing up
>> whatever they say... so take from it what you will :) )
>>
>> Jokes apart... I have worked a little bit on Hadoop, so let me know what
>> help you need. I will try to help as much as my little memory allows.
>>
>> :)
>> --s
>>
>>
>>
>> On Fri, Jan 22, 2010 at 9:39 PM, Nachiket Vaidya <vaidyand@gmail.com>wrote:
>>
>>> Hey Sayali,
>>> How are you? Where are you now?
>>>
>>> I am using Hadoop. From the news server I got the info that you are a
>>> boss at Hadoop. I want some help with it.
>>> Could you help me?
>>>
>>>  - Nachiket
>>>
>>
>>
>
>

