List: grid-engine-dev
Subject: Re: [GE dev] Qselect problem with sge6?
From: Andy Schwierskott <andy.schwierskott () sun ! com>
Date: 2004-07-20 9:52:06
Message-ID: Pine.SOC.4.60.0407201151120.2368 () sr-ergb01-01
Jeff,
I assume you have a core file (or can get one) - could you please attach the
stack trace to the Issue?
Thanks,
Andy
On Mon, 19 Jul 2004, Beadles, Jeff wrote:
>
> It must be strange-problem day for me, but if I run:
> $ qselect -l arch=sol-sparc64
> critical error: !!!!!!!!!! lGetList(): got NULL element for EH_load_list !!!!!!!!!!
> Aborted
>
> However this works fine:
>
> $ qhost -l arch=sol-sparc64
> HOSTNAME  ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global    -            -     -     -       -       -       -
> b2500     sol-sparc64  1.00  0.00  4.0G    420.0M  4.0G    1.0M
> bertha    sol-sparc64  8.00  0.02  16.0G   2.3G    32.0G   0.0
> blakswan  sol-sparc64  4.00  0.03  4.0G    1.5G    4.0G    49.0M
> devalssw  sol-sparc64  2.00  0.19  1.0G    713.0M  2.0G    144.0M
> lab240    sol-sparc64  2.00  0.00  4.0G    498.0M  2.0G    0.0
> matebert  sol-sparc64  1.00  -     2.0G    -       4.0G    -
> mirror    sol-sparc64  1.00  0.04  512.0M  136.0M  2.0G    28.0M
> scotch    -            -     -     -       -       -       -
> srsc101   sol-sparc64  4.00  0.01  4.0G    470.0M  8.9G    0.0
> ss5svr1   sol-sparc64  2.00  0.05  4.0G    643.0M  4.0G    0.0
> ...
>
> I believe that this has to do with hosts that are down, and that haven't
> reported their load/config information.
> I've taken a look at the abort, and it looks like the code is abort()ing
> rather than returning 0 matches.
> In particular, in libs/cull/cull_multitype.c, there are several places with
> code like:
> const char *lGetHost(const lListElem *ep, int name)
> {
>    int pos;
>    DENTER(CULL_BASIS_LAYER, "lGetHost");
>
>    if (!ep) {
>       CRITICAL((SGE_EVENT, MSG_CULL_POINTER_GETHOST_NULLELEMENTFORX_S ,
>                 lNm2Str(name)));
>       DEXIT;
>       abort();
>    }
>    ...
>
> I think that it should read:
> const char *lGetHost(const lListElem *ep, int name)
> {
>    int pos;
>    DENTER(CULL_BASIS_LAYER, "lGetHost");
>
>    if (!ep) {
>       return(NULL);
>    }
>
> There are three places where this shows up: lGetHost(), lGetList(), and
> lGetSubStr().
>
> Shouldn't it be returning 'no matches' rather than dumping core?
>
> Attached is an updated version of cull_multitype.c, that I would appreciate
> someone with source access reviewing & checking in.
> We've been running with these changes for a year on V5, and now with V6.
>
> Regards,
> -Jeff
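
For readers following the thread, here is a minimal, self-contained sketch of
the behaviour Jeff is asking for: an accessor that reports a NULL element and
returns NULL, and a caller that treats NULL as "no match" instead of aborting.
It does not use the real CULL API. The type demo_elem and the functions
demo_get_host() and demo_host_matches() are hypothetical names invented for
this illustration, and the real lGetHost() also takes a field name and uses
the DENTER/DEXIT tracing macros, which are left out here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a list element; NOT the real lListElem. */
typedef struct {
    const char *hostname;   /* may be NULL if the host never reported */
} demo_elem;

/*
 * Lenient accessor (illustrative only): log the problem and return NULL
 * rather than calling abort() when the element pointer is NULL.
 */
static const char *demo_get_host(const demo_elem *ep)
{
    if (ep == NULL) {
        fprintf(stderr, "demo_get_host: got NULL element\n");
        return NULL;
    }
    return ep->hostname;
}

/*
 * Caller side: a NULL result means "this element cannot match",
 * so the selection simply skips it and returns 0.
 */
static int demo_host_matches(const demo_elem *ep, const char *wanted)
{
    const char *host = demo_get_host(ep);

    if (host == NULL || wanted == NULL) {
        return 0;
    }
    return strcmp(host, wanted) == 0;
}
```

With this shape, a lookup on a host that is down (modelled here as a NULL
element or a NULL hostname) yields 0 matches, while a lookup on a host that
has reported still compares normally. The trade-off is that every caller has
to check for NULL, which the abort() version never required.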