'[jira] Created: (HADOOP-5367) After some jobs have finished,'


List:       hadoop-dev
Subject:    [jira] Created: (HADOOP-5367) After some jobs have finished,
From:       "Thibaut (JIRA)" <jira () apache ! org>
Date:       2009-02-28 13:10:13
Message-ID: 860192477.1235826613394.JavaMail.jira () brutus

After some jobs have finished, Reducer will run new job's reduce tasks sequentially
and not in parallel (mapred.JobTracker: Serious problem.  While updating status,
cannot find taskid...)
-------------------------------------------------------------------------------------


                 Key: HADOOP-5367
                 URL: https://issues.apache.org/jira/browse/HADOOP-5367
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.19.1
         Environment: State: RUNNING
Started: Fri Feb 27 17:00:07 CET 2009
Version: 0.19.1, r745977
Compiled: Fri Feb 20 00:16:34 UTC 2009 by ndaley

            Reporter: Thibaut
            Priority: Critical


Hi,

After a while, my cluster will only run the reduce tasks sequentially (all reducers
run on the same node) while the other nodes stay idle. The map phase, however, still
runs its tasks on all the nodes. In my cluster this happens after about 160
successfully completed jobs (some of these jobs have the number of reducers set to 0).
As a workaround, I have to restart the MapReduce service.

I didn't notice this behaviour in version 0.19.0, but I can't use version 0.19.0
because of the multiple-output bug that occurs when the number of reducers is set to 0.

Another side note, which might be related: I also tried running the jobs with
speculative execution turned on. My cluster would always hold back one reducer and
only run it (in multiple instances) after the first of the other 6 reducers had
finished, instead of launching all of them at the same time.


Below is a short extract from the related log file, which is full of entries like these.

09/02/28 12:48:07 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0051_r_000006_1
09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000002_1
09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0083_r_000006_1
09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000005_1
09/02/28 12:48:10 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0105_r_000006_1
09/02/28 12:48:10 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0102_r_000006_1
09/02/28 12:48:12 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0051_r_000006_1
09/02/28 12:48:13 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000002_1
09/02/28 12:48:13 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0083_r_000006_1
09/02/28 12:48:13 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000005_1
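To see which attempts the JobTracker keeps failing to find, and which jobs they belong to, the log can be tallied with a short script. This is a minimal, hypothetical helper (not part of Hadoop), assuming each warning sits on one line in the format shown above and that attempt IDs follow the attempt_<jobtracker-start>_<job>_<m|r>_<task>_<attempt> layout seen in the extract.

```python
import re
from collections import Counter

# Matches the JobTracker warning from the extract and captures the attempt ID.
# Assumed ID layout: attempt_<jtStart>_<jobNum>_<m|r>_<taskNum>_<attemptNum>
PATTERN = re.compile(
    r"Serious problem\.\s+While updating status, cannot find taskid "
    r"(attempt_\d+_\d+_[mr]_\d+_\d+)"
)


def tally_missing_attempts(log_text):
    """Count how often each unknown attempt ID is reported (hypothetical helper)."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


def jobs_affected(counts):
    """Return the sorted job numbers (third '_' field) of the unknown attempts."""
    return sorted({attempt.split("_")[2] for attempt in counts})


if __name__ == "__main__":
    sample = (
        "09/02/28 12:48:07 INFO mapred.JobTracker: Serious problem.  While updating "
        "status, cannot find taskid attempt_200902271700_0051_r_000006_1\n"
        "09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating "
        "status, cannot find taskid attempt_200902271700_0041_r_000002_1\n"
        "09/02/28 12:48:12 INFO mapred.JobTracker: Serious problem.  While updating "
        "status, cannot find taskid attempt_200902271700_0051_r_000006_1\n"
    )
    counts = tally_missing_attempts(sample)
    # Print each attempt with how many times it was reported, most frequent first.
    for attempt, n in counts.most_common():
        print(f"{n:3d}  {attempt}")
    print("jobs:", ", ".join(jobs_affected(counts)))
```

Run over the full extract, this shows the same few reduce attempts (all with attempt number 1) being reported again every few seconds.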


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

