
List:       mesos-issues
Subject:    [jira] [Commented] (MESOS-10231) Mesos master crashes during framework teardown
From:       "Andreas Peters (Jira)" <jira () apache ! org>
Date:       2021-10-04 10:08:00
Message-ID: JIRA.13403938.1632860815000.1095867.1633342080035 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/MESOS-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423868#comment-17423868 ]

Andreas Peters commented on MESOS-10231:
----------------------------------------

Can you show us the configuration you use to start Spark? It would be helpful to try it
out ourselves.
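
For context, the failing CHECK in Framework::untrackUnderRole() asserts that a framework
has no outstanding offered resources left for a role at the moment the master untracks it
from that role. Below is a simplified, self-contained sketch of that invariant; the types
and names are placeholders for illustration, not the actual Mesos code:

{code}
// Minimal sketch of the invariant behind the failing CHECK -- NOT the
// real Mesos types. In the master, a framework's outstanding offers
// ("totalOfferedResources") must contain nothing allocated to a role
// before the framework can be untracked from that role.
#include <cassert>
#include <string>
#include <vector>

struct Resource {
  std::string role;  // role this resource is allocated to
  double cpus;
};

// Stand-in for totalOfferedResources.filter(allocatedToRole): keep
// only the offered resources allocated to `role`.
std::vector<Resource> filterByRole(
    const std::vector<Resource>& offered, const std::string& role) {
  std::vector<Resource> result;
  for (const Resource& r : offered) {
    if (r.role == role) {
      result.push_back(r);
    }
  }
  return result;
}

int main() {
  // Offers the master still tracks for the framework.
  std::vector<Resource> totalOfferedResources = {{"spark-role", 2.0}};

  // During a normal teardown the master first rescinds/recovers all
  // outstanding offers, so the framework's offered resources are
  // empty by the time it is untracked from the role...
  totalOfferedResources.clear();

  // ...and this invariant holds. The reported crash means teardown
  // reached the untrack step while the role still had offered
  // resources, so the fatal CHECK aborted the master.
  assert(filterByRole(totalOfferedResources, "spark-role").empty());
  return 0;
}
{code}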

> Mesos master crashes during framework teardown
> ----------------------------------------------
> 
> Key: MESOS-10231
> URL: https://issues.apache.org/jira/browse/MESOS-10231
> Project: Mesos
> Issue Type: Bug
> Components: framework, master
> Affects Versions: 1.9.0
> Environment: CentOS Linux release 7.9.2009
> Mesos version - 1.9.0
> Reporter: Divyansh Jamuaar
> Priority: Major
> 
> I have set up a Mesos cluster with a single Mesos master and I submit Spark jobs to
> it in "cluster" mode. After running a few Spark jobs successfully, the Mesos master
> crashes while trying to shut down one of the Spark frameworks with the following
> error -
> {code:java}
> F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: totalOfferedResources.filter(allocatedToRole).empty()
> *** Check failure stack trace: ***
> @     0x7f1e024ded2e  google::LogMessage::Fail()
> @     0x7f1e024dec8d  google::LogMessage::SendToLog()
> @     0x7f1e024de637  google::LogMessage::Flush()
> @     0x7f1e024e191c  google::LogMessageFatal::~LogMessageFatal()
> @     0x7f1dff93978d  mesos::internal::master::Framework::untrackUnderRole()
> @     0x7f1dffad004b  mesos::internal::master::Master::removeFramework()
> @     0x7f1dfface859  mesos::internal::master::Master::teardown()
> @     0x7f1dffa8ba25  mesos::internal::master::Master::receive()
> @     0x7f1dffb2f1cf  ProtobufProcess<>::handlerMutM<>()
> @     0x7f1dffbe6809  std::__invoke_impl<>()
> @     0x7f1dffbdae22  std::__invoke<>()
> @     0x7f1dffbc8079  _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEES4_SD_St12_PlaceholderILi1EESO_ILi2EEEE6__callIvJS8_SL_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @     0x7f1dffbaaae5  std::_Bind<>::operator()<>()
> @     0x7f1dffb833c9  std::_Function_handler<>::_M_invoke()
> @     0x7f1dff330281  std::function<>::operator()()
> @     0x7f1dffb13329  ProtobufProcess<>::consume()
> @     0x7f1dffa85436  mesos::internal::master::Master::_consume()
> @     0x7f1dffa84ad5  mesos::internal::master::Master::consume()
> @     0x7f1dffafb9ae  _ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE
> @     0x564c359f7002  process::ProcessBase::serve()
> @     0x7f1e023a7bbd  process::ProcessManager::resume()
> @     0x7f1e023a407c  _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
> @     0x7f1e023cf1ba  _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
> @     0x7f1e023cd9c9  _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
> @     0x7f1e023cc482  _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
> @     0x7f1e023cb53b  _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv
> @     0x7f1e023ca3c4  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEEE6_M_runEv
> @     0x7f1e051f419d  execute_native_thread_routine
> @     0x7f1df4200ea5  start_thread
> @     0x7f1df3f2996d  __clone
> {code}
> 
> 
> It seems like an assertion check categorized as fatal is failing, but I am not able
> to figure out the root cause of this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

