[prev in list] [next in list] [prev in thread] [next in thread] 

List:       hadoop-dev
Subject:    [jira] Updated: (HADOOP-934) TaskTracker sends duplicate status
From:       "Doug Cutting (JIRA)" <jira () apache ! org>
Date:       2007-01-31 19:05:05
Message-ID: 2856168.1170270305772.JavaMail.jira () brutus
[Download RAW message or body]


     [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel \
]

Doug Cutting updated HADOOP-934:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Arun!

We should also probably figure out why that exception is thrown by metrics, since \
that looks like a bug, but we certainly shouldn't let a metrics exception mess things \
up this way, so I think this is a good patch.

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
> 
> Key: HADOOP-934
> URL: https://issues.apache.org/jira/browse/HADOOP-934
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.10.1
> Reporter: Nigel Daley
> Assigned To: Arun C Murthy
> Fix For: 0.11.0
> 
> Attachments: HADOOP-934_20070129_1.patch
> 
> 
> When updating the TaskTracker metrics repeatedly throws an exception, the \
> TaskTracker repeatedly sends status events to the JobTracker.  This seems to get \
> the JobTracker into a funny state. See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: \
> LaunchTaskAction: task_0001_m_000000_0 2007-01-21 23:51:05,669 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing \
> -1539638179018996759@438272/1858392 2007-01-21 23:51:06,722 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote \
> 7041109562936259497 2007-01-21 23:51:07,784 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote \
> -3327068643288265619 2007-01-21 23:51:08,819 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote \
> -53923196326936926 2007-01-21 23:51:09,322 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote \
> -3407225283463353195 2007-01-21 23:51:09,323 INFO \
> org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done. 2007-01-21 \
> 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: \
> org.apache.hadoop.metrics.MetricsException: at foo.Bar.setMetric(Unknown Source)
> at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>  at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending \
> 'status' to 'jobtrackerhost' with reponseId '21 2007-01-21 23:51:14,060 ERROR \
> org.apache.hadoop.mapred.TaskTracker: Caught exception: \
> org.apache.hadoop.metrics.MetricsException: at foo.Bar.setMetric(Unknown Source)
> at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>  at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending \
> 'status' to 'jobtrackerhost' with reponseId '22 2007-01-21 23:51:14,063 ERROR \
> org.apache.hadoop.mapred.TaskTracker: Caught exception: \
> org.apache.hadoop.metrics.MetricsException: at foo.Bar.setMetric(Unknown Source)
> at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>  at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an \
> increasing responseId] JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing \
> normal task tip_0001_m_000000 2007-01-21 23:51:03,511 INFO \
> org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip \
> tip_0001_m_000000, for tracker 'tracker:50050' 2007-01-21 23:51:03,569 INFO \
> org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000 \
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task \
> 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050' \
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task \
> 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully. 2007-01-21 \
> 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task \
> 'task_0001_m_000000_0' has completed. 2007-01-21 23:51:14,059 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' 2007-01-21 \
> 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate \
> status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' \
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved \
> duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP \
> 'tip_0001_m_000000' 2007-01-21 23:51:14,066 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
>                 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved \
> duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP \
> 'tip_0001_m_000000' 2007-01-21 23:51:16,442 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' 2007-01-21 \
> 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task \
> 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully. 2007-01-21 \
> 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task \
> 'task_0001_r_000000_0' has completed. 2007-01-21 23:51:16,456 INFO \
> org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully. \
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed \
> task 'task_0001_r_000000_0' from 'tracker2:50050' 2007-01-21 23:51:16,460 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' 2007-01-21 \
> 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task \
> 'task_0001_m_000000_0' from 'tracker:50050' 2007-01-21 23:51:16,462 INFO \
> org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, \
> cannot find taskid task_0001_m_000000_0 2007-01-21 23:51:16,463 INFO \
> org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, \
> cannot find taskid task_0001_m_000000_0 2007-01-21 23:51:16,465 INFO \
> org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, \
>                 cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic