'Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       lse-tech
Subject:    Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement
From:       Paul Jackson <pj () sgi ! com>
Date:       2005-02-09 4:23:56
Message-ID: 20050208202356.41bda38b.pj () sgi ! com
[Download RAW message or body]

Nick wrote:
> The biggest issues may be the userspace
> interface and a decent userspace management tool.

One possibility, perhaps, would be to have a boolean flag "sched_domain"
on each cpuset, indicating whether it was a sched domain or not.  If a
cpuset had its sched_domain flag set, then that cpusets cpus_allowed
mask would define a sched domain.

Later Nick wrote:
> In the (hopefully) common case where there are disjoint partitions
> _somewhere_, sched domains can do the job in a much better
> way than task cpu affinities (better isolation, multiprocessor
> balancing shouldn't break down).
> 
> Those users with overlapping CPU sets can then use task affinities
> on top of sched domains partitions to get the desired result.

Ok - seems it should work with the above cpuset flag marking sched
domains, and a rule that _those_ cpusets so marked can't overlap.  Other
cpusets that are not so marked, and any sched_setaffinity calls, can do
whatever they want.  Trying to turn on the sched_domain flag on a cpuset
that overlapped with existing such cpuset sched_domains, or trying to
mess with the CPUs (cpus_allowed) in an existing cpuset sched_domain so
as to force it to overlap, would return an error to user space on that
write(2).

If the sysadmin didn't mark any cpusets as sched_domains, then fall back
to something automatic and useful.

Inside the kernel, we'll need someway for the cpuset code to tell the
sched code about sched_domain changes.  This might mean something like
the following.  Have the sched code provide the cpuset code a couple of
routines, one to setup and and the other to tear down sched_domains.

Both calls would take a cpumask_t argument, and return void.  The setup
call must pass a cpumask that does not overlap any existing sched
domains defined via cpusets.  The tear down call must pass a cpumask
value exactly matching a previous, still active, setup call.

So if someone made a single CPU change to an existing sched_domain
defining cpuset, the kernel cpuset code would have to call the kernel
sched code twice, first to tear down the old sched_domain, and then to
setup the new, slightly different, one.  The cpuset code would likely be
holding the single global cpuset_sem semaphore across this pair of
calls.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
Lse-tech mailing list
Lse-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lse-tech
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic