
List:       mesos-issues
Subject:    [jira] [Commented] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launc
From:       "Sargun Dhillon (JIRA)" <jira () apache ! org>
Date:       2017-06-30 2:10:00
Message-ID: JIRA.13083651.1498788565000.140966.1498788600054 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069355#comment-16069355 ]

Sargun Dhillon commented on MESOS-7744:
---------------------------------------

Full log:
{code}
Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned task \
                Titus-7590548-worker-0-4476 for framework TitusFramework
Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task \
                Titus-7590548-worker-0-4476 for framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task \
'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework \
                TitusFramework at executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill task \
                Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status update \
TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task \
                Titus-7590548-worker-0-4476 of framework TitusFramework from \
                @0.0.0.0:0
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued task \
'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework \
                TitusFramework at executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.494860  5171 slave.cpp:3211] Handling status update \
TASK_STARTING (UUID: d6aaed02-5d21-11e7-846c-0a0c90a7033c) for task \
Titus-7590548-worker-0-4476 of framework TitusFramework from \
                executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.496829  5191 status_update_manager.cpp:320] \
Received status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for \
                task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.497530  5191 status_update_manager.cpp:825] \
Checkpointing UPDATE for status update TASK_KILLED (UUID: \
898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.498082  5171 slave.cpp:3211] Handling status update \
TASK_STARTING (UUID: d6aafd3f-5d21-11e7-846c-0a0c90a7033c) for task \
Titus-7590548-worker-0-4476 of framework TitusFramework from \
                executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.500267  5191 status_update_manager.cpp:320] \
Received status update TASK_STARTING (UUID: d6aaed02-5d21-11e7-846c-0a0c90a7033c) for \
                task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.500377  5191 status_update_manager.cpp:825] \
Checkpointing UPDATE for status update TASK_STARTING (UUID: \
d6aaed02-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.500562  5189 slave.cpp:3604] Forwarding the update \
TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task \
                Titus-7590548-worker-0-4476 of framework TitusFramework to \
                master@100.66.3.213:7103
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.502029  5191 status_update_manager.cpp:320] \
Received status update TASK_STARTING (UUID: d6aafd3f-5d21-11e7-846c-0a0c90a7033c) for \
                task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.502092  5191 status_update_manager.cpp:825] \
Checkpointing UPDATE for status update TASK_STARTING (UUID: \
d6aafd3f-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.502393  5189 slave.cpp:3514] Sending \
acknowledgement for status update TASK_STARTING (UUID: \
d6aaed02-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework to executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.504465  5189 slave.cpp:3514] Sending \
acknowledgement for status update TASK_STARTING (UUID: \
d6aafd3f-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework to executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.518888  5191 status_update_manager.cpp:392] \
Received status update acknowledgement (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:37.519039  5191 status_update_manager.cpp:825] \
Checkpointing ACK for status update TASK_KILLED (UUID: \
898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: W0629 23:22:37.520956  5191 status_update_manager.cpp:446] \
Acknowledged a terminal status update TASK_KILLED (UUID: \
898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework but updates are still pending
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:39.681637  5183 slave.cpp:3211] Handling status update \
TASK_STARTING (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for task \
Titus-7590548-worker-0-4476 of framework TitusFramework from \
                executor(1)@100.66.11.10:17707
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: W0629 23:22:39.681761  5183 slave.cpp:3291] Could not find the \
executor for status update TASK_STARTING (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:39.682006  5180 status_update_manager.cpp:320] \
Received status update TASK_STARTING (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for \
                task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:39.682586  5181 slave.cpp:3604] Forwarding the update \
TASK_STARTING (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for task \
                Titus-7590548-worker-0-4476 of framework TitusFramework to \
                master@100.66.3.213:7103
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:39.682958  5181 slave.cpp:3514] Sending \
acknowledgement for status update TASK_STARTING (UUID: \
d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework to executor(1)@100.66.11.10:17707
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:39.686782  5172 status_update_manager.cpp:392] \
Received status update acknowledgement (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:39 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: E0629 23:22:39.687196  5195 slave.cpp:2621] Status update \
acknowledgement (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for task \
                Titus-7590548-worker-0-4476 of unknown executor
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.530827  5182 slave.cpp:3211] Handling status update \
TASK_STARTING (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) for task \
Titus-7590548-worker-0-4476 of framework TitusFramework from \
                executor(1)@100.66.11.10:17707
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: W0629 23:22:51.530951  5182 slave.cpp:3291] Could not find the \
executor for status update TASK_STARTING (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.531138  5172 status_update_manager.cpp:320] \
Received status update TASK_STARTING (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) for \
                task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.531445  5181 slave.cpp:3604] Forwarding the update \
TASK_STARTING (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) for task \
                Titus-7590548-worker-0-4476 of framework TitusFramework to \
                master@100.66.3.213:7103
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.531718  5181 slave.cpp:3514] Sending \
acknowledgement for status update TASK_STARTING (UUID: \
df08f5a4-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework to executor(1)@100.66.11.10:17707
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.536438  5196 status_update_manager.cpp:392] \
Received status update acknowledgement (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: E0629 23:22:51.536902  5197 slave.cpp:2621] Status update \
acknowledgement (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) for task \
                Titus-7590548-worker-0-4476 of unknown executor
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.693526  5189 slave.cpp:3211] Handling status update \
TASK_RUNNING (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) for task \
Titus-7590548-worker-0-4476 of framework TitusFramework from \
                executor(1)@100.66.11.10:17707
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: W0629 23:22:51.693653  5189 slave.cpp:3291] Could not find the \
executor for status update TASK_RUNNING (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.693857  5199 status_update_manager.cpp:320] \
Received status update TASK_RUNNING (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) for \
                task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.694207  5170 slave.cpp:3604] Forwarding the update \
TASK_RUNNING (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) for task \
                Titus-7590548-worker-0-4476 of framework TitusFramework to \
                master@100.66.3.213:7103
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.694473  5170 slave.cpp:3514] Sending \
acknowledgement for status update TASK_RUNNING (UUID: \
df21c703-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of \
                framework TitusFramework to executor(1)@100.66.11.10:17707
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: I0629 23:22:51.698933  5201 status_update_manager.cpp:392] \
Received status update acknowledgement (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) \
                for task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:51 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
mesos-slave[4290]: E0629 23:22:51.699404  5172 slave.cpp:2621] Status update \
acknowledgement (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) for task \
Titus-7590548-worker-0-4476 of unknown executor
{code}
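
One way to read the log above is to check, per status update UUID, whether a "Forwarding the update" line ever appears. The following is a minimal Python sketch for that, not part of Mesos. The names `forwarding_summary` and `SAMPLE` are made up for illustration, and the sample lines are message texts copied from the log with the syslog and glog prefixes dropped and wrapped lines joined.

```python
import re

# Message texts copied from the log above (prefixes dropped, wrapped lines
# joined). Only the "Handling" and "Forwarding" lines are kept.
SAMPLE = """\
Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
Handling status update TASK_STARTING (UUID: d6aaed02-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework from executor(1)@100.66.11.10:17707
Handling status update TASK_STARTING (UUID: d6aafd3f-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework from executor(1)@100.66.11.10:17707
Forwarding the update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework to master@100.66.3.213:7103
Handling status update TASK_STARTING (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework from executor(1)@100.66.11.10:17707
Forwarding the update TASK_STARTING (UUID: d7f8e8bc-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework to master@100.66.3.213:7103
Handling status update TASK_STARTING (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework from executor(1)@100.66.11.10:17707
Forwarding the update TASK_STARTING (UUID: df08f5a4-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework to master@100.66.3.213:7103
Handling status update TASK_RUNNING (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework from executor(1)@100.66.11.10:17707
Forwarding the update TASK_RUNNING (UUID: df21c703-5d21-11e7-846c-0a0c90a7033c) for task Titus-7590548-worker-0-4476 of framework TitusFramework to master@100.66.3.213:7103
"""

HANDLING = re.compile(r"Handling status update (\w+) \(UUID: ([0-9a-f-]+)\)")
FORWARDING = re.compile(r"Forwarding the update (\w+) \(UUID: ([0-9a-f-]+)\)")


def forwarding_summary(lines):
    """Return (uuid, state, forwarded) tuples in first-seen order."""
    updates = {}  # uuid -> [state, forwarded]; dicts keep insertion order
    for line in lines:
        match = HANDLING.search(line)
        if match:
            updates.setdefault(match.group(2), [match.group(1), False])
            continue
        match = FORWARDING.search(line)
        if match:
            # Mark the update as forwarded, creating the entry if needed.
            updates.setdefault(match.group(2), [match.group(1), False])[1] = True
    return [(uuid, state, fwd) for uuid, (state, fwd) in updates.items()]


if __name__ == "__main__":
    for uuid, state, forwarded in forwarding_summary(SAMPLE.splitlines()):
        print(uuid, state, "forwarded" if forwarded else "no forwarding line")
```

On these lines, the two TASK_STARTING updates with UUIDs starting d6aaed02 and d6aafd3f have no matching forwarding line, while TASK_KILLED and the later updates do.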

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> ----------------------------------------------------------------------------
> 
> Key: MESOS-7744
> URL: https://issues.apache.org/jira/browse/MESOS-7744
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.0.1
> Reporter: Sargun Dhillon
> Priority: Minor
> 
> We sometimes launch a job and cancel it after about 7 seconds if we have not
> received a TASK_STARTING update from the agent. Under certain conditions this
> results in Mesos losing track of the task. The relevant part of the agent log is:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
> mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned task \
>                 Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
> mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task \
>                 Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
> mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task \
> 'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework \
>                 TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
> mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill task \
>                 Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
> mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status \
> update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task \
>                 Titus-7590548-worker-0-4476 of framework TitusFramework from \
>                 @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c \
> mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued task \
> 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework \
> TitusFramework at executor(1)@100.66.11.10:17707
> {code}
> In our executor, we see the launch message arrive after the master has already
> received the kill update. The executor then sends non-terminal status updates to
> the agent, yet the agent does not forward them to our framework.
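
The ordering described in the issue (a queued task handed to the executor after a terminal update was already handled for it) can be checked mechanically. Below is a minimal Python sketch under stated assumptions: `sent_after_terminal`, `TERMINAL` and `SAMPLE` are illustrative names rather than Mesos APIs, the set of terminal states is a hand-written assumption, and the sample lines are message texts copied from the excerpt above with prefixes dropped and wrapped lines joined.

```python
import re

# Assumption: states treated as terminal for this check.
TERMINAL = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR"}

# Message texts copied from the excerpt above, in log order.
SAMPLE = """\
Queuing task 'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
Asked to kill task Titus-7590548-worker-0-4476 of framework TitusFramework
Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
Sending queued task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
"""

UPDATE = re.compile(r"Handling status update (\w+) \(UUID: [0-9a-f-]+\) for task (\S+)")
# \W? tolerates whichever quote character surrounds the task ID.
SEND = re.compile(r"Sending queued task \W?([\w.-]+)\W? to executor")


def sent_after_terminal(lines):
    """Return task IDs sent to an executor after a terminal update was handled."""
    terminal_seen = set()
    offenders = []
    for line in lines:
        match = UPDATE.search(line)
        if match and match.group(1) in TERMINAL:
            terminal_seen.add(match.group(2))
            continue
        match = SEND.search(line)
        if match and match.group(1) in terminal_seen:
            offenders.append(match.group(1))
    return offenders


if __name__ == "__main__":
    print(sent_after_terminal(SAMPLE.splitlines()))
```

The check depends only on line order, so it can be run over a whole agent log once wrapped lines are joined. It flags the symptom and says nothing about the cause.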



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

