
List:       npaci-rocks-discussion
Subject:    Re: [Rocks-Discuss] sge queues threading on each other
From:       Eli Morris <ermorris@ucsc.edu>
Date:       2010-09-28 19:56:26
Message-ID: FF8A44A7-71B0-4A81-B34F-598FFBCC5C0D@ucsc.edu

Hi Jonathan,

Thanks for responding. I'm sorry, but I don't understand what you are saying in
point 1. Also, I tried Nadya's suggestion and it didn't seem to have the desired
effect. Some of the jobs still wound up trying to use processors on nodes whose
processors were already all in use. I could see on ganglia that some nodes had 16
processors in use, when we only have 8 processors per node. Is there some other
parameter I need to set, in addition to the command she wrote about, to get
Nadya's suggestion to work?
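
One way to double-check that the setting actually took effect on each host, and to see per-queue slot usage, is something like the following (the hostnames are examples; adjust to your cluster):

```shell
# Show what each execute host advertises in complex_values
for host in compute-0-0 compute-0-1; do
    echo "== $host =="
    qconf -se "$host" | grep complex_values
done

# Per-queue-instance view of used/total slots, to spot over-subscription
qstat -f
```

If complex_values still shows NONE on some hosts, the limit was never applied there.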

For point 2, the problem is that user A will request 32 processors and SGE will
assign nodes 10, 11, 12, and 13. So far, so good. User B will then use a different
queue and request another 32 processors. Instead of assigning those 32 processors
to the free nodes 5, 6, 7, and 8, SGE will sometimes assign the job nodes 10, 11,
5, and 6, or something like that. So instead of assigning jobs to empty nodes, it
assigns them to nodes already at maximum usage. It isn't 'seeing' that jobs are
already running on some nodes and avoiding those nodes.

I'll mention that I'm using Rocks 5.2.

Again, thanks for helping.

Eli


On Sep 27, 2010, at 4:13 PM, Jonathan Pierce wrote:

> Hi Eli,
> 
> Nadya's method is definitely the easiest way to accomplish what you want.
> You could also define a parallel environment and attach it to the queues,
> but that might be a little much to dive into if you're just starting with
> GE. Two notes I'd like to add on using complex values:
> 
> 1) Make sure you force the resource request with a default request of 1.
> 2) It is *entirely* up to the user to make a larger request--say, "-l
> slots=8" if their job spawns eight threads. That's why a lot of sites have
> a bold/red/screaming/flashing-neon warning on their usage page that it's
> possible to accidentally circumvent the design of the scheduling system,
> which has the effect of making everybody's jobs suffer (including that
> particular individual's jobs). I could be wrong, but I don't believe
> there are any hard resource limits one can set in GE that would kill a job
> if it uses more cores than it requests.
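
One common reading of point 1 is that the slots attribute is marked FORCED, with a default of 1, in the complex configuration (`qconf -mc`), so every job implicitly consumes one slot unless it requests more. The exact values below are a sketch, not taken from Eli's cluster:

```
# qconf -mc  (one line of the complex attribute table)
# name   shortcut  type  relop  requestable  consumable  default  urgency
slots    s         INT   <=     FORCED       YES         1        1000
```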
> 
> Best,
> Jonathan
> 
> 
> On 9/26/10 10:10 PM, "Eli Morris" <ermorris@ucsc.edu> wrote:
> 
> > Hi Nadya,
> > 
> > Thanks a lot for the suggestion. I took it and I'm still having the same
> > problem of jobs in different queues trying to use more processors than we
> > have on each of the compute nodes. Any other ideas, guys?
> > 
> > thanks for the help,
> > 
> > Eli
> > 
> > On Sep 24, 2010, at 3:20 PM, Nadya Williams wrote:
> > 
> > > Eli,
> > > 
> > > For each execute host, you need to set the predefined complex value "slots"
> > > in "complex_values":
> > > qconf -me compute-0-0.local
> > > 
> > > You will see something like:
> > > hostname            compute-0-0.local
> > > load_scaling        NONE
> > > complex_values      NONE
> > > user_lists          NONE
> > > xuser_lists         NONE
> > > projects            NONE
> > > xprojects           NONE
> > > usage_scaling       NONE
> > > report_variables    NONE
> > > 
> > > Set the variable "complex_values" to "slots=8" or whatever number you
> > > need (no quotes) :
> > > complex_values        slots=8
> > > 
> > > Or, using the CLI, you can issue this for each host:
> > > qconf -mattr exechost complex_values slots=8 compute-0-0
> > > 
> > > This forces the slot pool shared by the multiple queues to be the number
> > > of slots available, so at most that many jobs run on any host at any
> > > time.
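
Applied across the whole cluster, the one-host command above could be wrapped in a loop like this (a sketch; `qconf -sel` lists the execute hosts, and slots=8 assumes 8 cores per node):

```shell
# Set the shared slot limit on every execute host in one pass
for host in $(qconf -sel); do
    qconf -mattr exechost complex_values slots=8 "$host"
done
```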
> > > 
> > > Nadya
> > > 
> > > On Sep 23, 2010, at 11:10 PM, Eli Morris wrote:
> > > 
> > > > Hi All,
> > > > 
> > > > I tried to set up a couple of new queues besides the default 'all.q'
> > > > on my small cluster. I cloned the default queue in qmon and then made
> > > > one of the queues subordinate to the other, so that jobs on the lower
> > > > priority queue will be suspended for jobs in the higher priority queue
> > > > when the cluster does not have enough processors to run all the jobs
> > > > submitted. It's a small group and we just need a simple scheme. Here's
> > > > the problem: the jobs from one queue now try to run on the same nodes /
> > > > processors as the jobs from the other queue, such that one compute node
> > > > will be loaded up with 16 processors' worth of jobs, even if the node
> > > > only has 8 processors, while some nodes go totally unused. So, it looks
> > > > like one queue isn't 'aware' of the other, and they are both trying to
> > > > use the same processors, instead of knowing that one queue has jobs on
> > > > node X, so it should use node Y. Does anyone know how to deal with this?
> > > > This is my first exposure to messing with scheduling and sge is a beast
> > > > to try to understand at first. I'd appreciate any help.
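
For reference, the subordinate relationship described above is configured on the higher-priority queue, via its subordinate_list attribute; the queue names and the slot threshold here are assumptions:

```
# In the higher-priority queue's configuration (qconf -mq high.q):
# suspend low.q on a host once 8 slots of high.q are in use there
subordinate_list      low.q=8
```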
> > > > 
> > > > Thanks very much,
> > > > 
> > > > Eli
> > > 
> > > Nadya Williams         University of Zurich
> > > nadya@oci.uzh.ch     Winterthurerstrasse 190
> > > Tel:  +41 44 635 4296    CH-8057 Zurich
> > > Fax: +41 44 635 6888    Switzerland
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> .................................
> Jonathan Pierce
> Manager, High Performance Computing
> Laboratory of Neuro Imaging, UCLA
> 635 Charles E. Young Drive South,
> Suite 225 Los Angeles, CA 90095-7334
> Tel: 310.267.5076
> Fax: 310.206.5518
> jonathan.pierce@loni.ucla.edu
> .................................
> 

