List:       mesos-user
Subject:    hadoop on mesos odd issues with heartbeat and ghost task trackers.
From:       John Omernik <john () omernik ! com>
Date:       2015-02-25 17:01:13
Message-ID: CAKOFcwoCLkiHMJgPBr0kqr2o7xTn+RjdPz2xdWpgy1-KwjKwLg () mail ! gmail ! com

I am running Hadoop on Mesos 0.0.8 on Mesos 0.21.0 and am running into
a weird issue. On two of my nodes, a task tracker that is launched
never really completes the check-in process: the job tracker keeps
waiting for its heartbeat while the task tracker thinks it is running
successfully, and tasks that would be assigned to it stay in a
hung/pending state waiting for that heartbeat.

Basically, in the job tracker log I see the output below: pending
tasks is 1 and inactive slots is 2 (launched but no heartbeat yet),
so the job tracker just sits there waiting while the node thinks it's
running fine.

Is there a way to have the JobTracker give up on a task tracker
sooner?  Waiting out the full timeout period seems odd.

Thanks!

(If there is any other information I can provide, please let me know.)



Job Tracker Log:

      Pending Map Tasks: 0
   Pending Reduce Tasks: 1
      Running Map Tasks: 0
   Running Reduce Tasks: 0
         Idle Map Slots: 2
      Idle Reduce Slots: 0
     Inactive Map Slots: 2 (launched but no hearbeat yet)
  Inactive Reduce Slots: 2 (launched but no hearbeat yet)
       Needed Map Slots: 0
    Needed Reduce Slots: 0
     Unhealthy Trackers: 0

2015-02-25 10:57:01,930 INFO mapred.ResourcePolicy [Thread-1290]: Satisfied map and reduce slots needed.
2015-02-25 10:57:02,083 INFO mapred.MesosScheduler [IPC Server handler 7 on 7676]: Unknown/exited TaskTracker: http://hadoopmapr3:31264.
2015-02-25 10:57:02,097 INFO mapred.MesosScheduler [IPC Server handler 0 on 7676]: Unknown/exited TaskTracker: http://hadoopmapr3:50060.
2015-02-25 10:57:02,148 INFO mapred.MesosScheduler [IPC Server handler 4 on 7676]: Unknown/exited TaskTracker: http://moonman:31182.
2015-02-25 10:57:02,392 INFO mapred.MesosScheduler [IPC Server handler 1 on 7676]: Unknown/exited TaskTracker: http://hadoopmapr3:31264.
2015-02-25 10:57:02,403 INFO mapred.MesosScheduler [IPC Server handler 3 on 7676]: Unknown/exited TaskTracker: http://hadoopmapr3:50060.
2015-02-25 10:57:02,459 INFO mapred.MesosScheduler [IPC Server handler 6 on 7676]: Unknown/exited TaskTracker: http://moonman:31182.
2015-02-25 10:57:02,702 INFO mapred.MesosScheduler [IPC Server handler 4 on 7676]: Unknown/exited TaskTracker: http://hadoopmapr3:31264.
2015-02-25 10:57:02,714 INFO mapred.MesosScheduler [IPC Server handler 5 on 7676]: Unknown/exited TaskTracker: http://hadoopmapr3:50060.
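
For anyone trying to spot the same symptom, here is a minimal Python
sketch that reads a status block like the one above and flags the
state described: tasks pending while launched slots have not sent a
heartbeat. This is not part of Hadoop on Mesos; the function names
parse_status and looks_stuck are made up for illustration, and it
assumes the "Label: N" layout shown in the log above.

```python
import re

# Matches status lines such as "Inactive Map Slots: 2 (launched but ...)".
# Timestamped log lines start with a digit, so they are skipped.
STATUS_LINE = re.compile(r"^\s*([A-Za-z ]+?):\s*(\d+)\b")


def parse_status(text):
    """Return {label: count} for every 'Label: N' line in the text."""
    counts = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


def looks_stuck(counts):
    """True when tasks are pending and some launched slots are still inactive."""
    pending = (counts.get("Pending Map Tasks", 0)
               + counts.get("Pending Reduce Tasks", 0))
    inactive = (counts.get("Inactive Map Slots", 0)
                + counts.get("Inactive Reduce Slots", 0))
    return pending > 0 and inactive > 0
```

A single snapshot only shows the state at one moment, so a check like
this is more useful when the same result repeats across several
consecutive status blocks.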
