[prev in list] [next in list] [prev in thread] [next in thread]
List: hadoop-dev
Subject: [jira] Updated: (HADOOP-934) TaskTracker sends duplicate status
From: "Doug Cutting (JIRA)" <jira () apache ! org>
Date: 2007-01-31 19:05:05
Message-ID: 2856168.1170270305772.JavaMail.jira () brutus
[Download RAW message or body]
[ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel \
]
Doug Cutting updated HADOOP-934:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I just committed this. Thanks, Arun!
We should also probably figure out why that exception is thrown by metrics, since \
that looks like a bug, but we certainly shouldn't let a metrics exception mess things \
up this way, so I think this is a good patch.
> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
> Key: HADOOP-934
> URL: https://issues.apache.org/jira/browse/HADOOP-934
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.10.1
> Reporter: Nigel Daley
> Assigned To: Arun C Murthy
> Fix For: 0.11.0
>
> Attachments: HADOOP-934_20070129_1.patch
>
>
> When updating the TaskTracker metrics repeatedly throws an exception, the \
> TaskTracker repeatedly sends status events to the JobTracker. This seems to get \
> the JobTracker into a funny state. See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: \
> LaunchTaskAction: task_0001_m_000000_0 2007-01-21 23:51:05,669 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing \
> -1539638179018996759@438272/1858392 2007-01-21 23:51:06,722 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote \
> 7041109562936259497 2007-01-21 23:51:07,784 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote \
> -3327068643288265619 2007-01-21 23:51:08,819 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote \
> -53923196326936926 2007-01-21 23:51:09,322 INFO \
> org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote \
> -3407225283463353195 2007-01-21 23:51:09,323 INFO \
> org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done. 2007-01-21 \
> 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: \
> org.apache.hadoop.metrics.MetricsException: at foo.Bar.setMetric(Unknown Source)
> at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
> at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending \
> 'status' to 'jobtrackerhost' with reponseId '21 2007-01-21 23:51:14,060 ERROR \
> org.apache.hadoop.mapred.TaskTracker: Caught exception: \
> org.apache.hadoop.metrics.MetricsException: at foo.Bar.setMetric(Unknown Source)
> at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
> at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending \
> 'status' to 'jobtrackerhost' with reponseId '22 2007-01-21 23:51:14,063 ERROR \
> org.apache.hadoop.mapred.TaskTracker: Caught exception: \
> org.apache.hadoop.metrics.MetricsException: at foo.Bar.setMetric(Unknown Source)
> at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
> at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an \
> increasing responseId] JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing \
> normal task tip_0001_m_000000 2007-01-21 23:51:03,511 INFO \
> org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip \
> tip_0001_m_000000, for tracker 'tracker:50050' 2007-01-21 23:51:03,569 INFO \
> org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000 \
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task \
> 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050' \
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task \
> 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully. 2007-01-21 \
> 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task \
> 'task_0001_m_000000_0' has completed. 2007-01-21 23:51:14,059 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' 2007-01-21 \
> 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate \
> status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' \
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved \
> duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP \
> 'tip_0001_m_000000' 2007-01-21 23:51:14,066 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved \
> duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP \
> 'tip_0001_m_000000' 2007-01-21 23:51:16,442 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' 2007-01-21 \
> 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task \
> 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully. 2007-01-21 \
> 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task \
> 'task_0001_r_000000_0' has completed. 2007-01-21 23:51:16,456 INFO \
> org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully. \
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed \
> task 'task_0001_r_000000_0' from 'tracker2:50050' 2007-01-21 23:51:16,460 WARN \
> org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of \
> 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000' 2007-01-21 \
> 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task \
> 'task_0001_m_000000_0' from 'tracker:50050' 2007-01-21 23:51:16,462 INFO \
> org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, \
> cannot find taskid task_0001_m_000000_0 2007-01-21 23:51:16,463 INFO \
> org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, \
> cannot find taskid task_0001_m_000000_0 2007-01-21 23:51:16,465 INFO \
> org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, \
> cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic