
List:       veritas-ha
Subject:    Re: [Veritas-ha] Nice levels of processes.
From:       Darren Dunham <ddunham () taos ! com>
Date:       2001-10-25 19:10:22

> > > While investigating an unrelated problem, I just now noticed that all
> > > the VCS processes and applications started by it run with a nice
> > > level of -20 on my box.
> 
> This should not be a big deal.  Solaris process scheduling is dynamic;
> Solaris adjusts the priority of the process during its lifetime.

Maybe it shouldn't be a big deal, but I'm not going to declare that it
isn't.  To me, this is a change to an oracle installation.  If I went to
management and said that I wanted to change the priority of every oracle
job on the box from 0 to +60, they'd want to know why.  VCS has done
this for me because I didn't understand it fully.  I don't see a good
reason for it to change, so I don't trust it.

Besides, -20 can cause problems.

I took a simple CPU bound process (perl running while(1)) and set it to
-20.  It locked up every single xterm on my test box.  Fortunately I had
an ssh in and was able to kill the process.
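That test can be reproduced with something like the following sketch.  A
plain shell busy loop stands in for the perl one-liner, and since setting
a negative nice value requires root, the renice is allowed to fail:

```shell
# Start one CPU-bound busy loop (stand-in for perl running while(1)).
while :; do :; done &
hog=$!
# Raise its priority to -20, as VCS does for its children.
# Negative nice values need root, so tolerate failure here.
renice -n -20 -p "$hog" 2>/dev/null || echo "renice to -20 requires root"
# Read the nice value back (it stays 0 if the renice was refused).
nice_val=$(ps -o nice= -p "$hog" | tr -d ' ')
echo "hog pid=$hog nice=$nice_val"
kill "$hog"
```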

Now most processes aren't so CPU piggy, but I can't rely on that.

I took four CPU pigs and ran them for a while.  The load on the box
climbed to around 5 or so.

I reniced them all to -19 (no -20s this time) and the load climbed to
over 12 before I shut them down.  No other CPU jobs were running on the
box.
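The four-hog run can be sketched like this (a hypothetical reproduction;
again the loops are shell stand-ins for the perl ones, and the renice to
a negative value needs root):

```shell
# Start four CPU-bound busy loops and collect their PIDs.
pids=""
for i in 1 2 3 4; do
    while :; do :; done &
    pids="$pids $!"
done
# Renice them all to -19 (needs root; tolerate failure otherwise).
renice -n -19 -p $pids 2>/dev/null || echo "renice to -19 requires root"
sleep 2      # let them accumulate some CPU time
uptime       # the load average climbs while they run
kill $pids
```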

last pid: 13814;  load averages: 12.05,  9.08,  5.19                 12:03:27
82 processes:  73 sleeping, 8 running, 1 on cpu
CPU states:  0.0% idle, 99.8% user,  0.2% kernel,  0.0% iowait,  0.0% swap
Memory: 384M real, 98M free, 179M swap in use, 581M swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 13806 ddunham    1  57  -19 2592K 2008K run      3:58 29.14% perl
 13805 ddunham    1  57  -19 2592K 2008K run      3:52 25.09% perl
 13807 ddunham    1  57  -19 2592K 2008K run      3:36 23.19% perl
 13808 ddunham    1  59  -19 2592K 2008K run      3:37 21.46% perl
 13813 root       1  59    0 2576K 1632K cpu      0:01  0.14% top

Conclusion:  Renicing user jobs to very high priorities can affect the
system in unpredictable ways.  I have just a few oracle jobs in one
cluster, but I'm getting some messages about sendmail shutting down due
to system loads over 12.

I'm going to reset the priority on this cluster and hope I can get some
downtime to cycle the application and see if this behavior goes away.

> After that, the priority is recomputed based on how much CPU time
> the process has had, whether the process was able to run for
> the entire quantum, and other things.  If the process is interrupted
> by the kernel before it can complete its quantum, it gets a boost in
> priority.  If the process runs for a full quantum, the priority is
> adjusted downward.  The point is to prevent processes that need
> a lot of CPU time (and thus are going to take a while to run
> regardless) from starving processes that only need a little CPU
> time.  The idea is to balance response time and throughput.

I agree.  I do think that even after running a full quantum, the
priority is not adjusted down enough in all situations.

> > I initially tried to resolve it by adding some stuff to the 'online'
> > script for Oracle to renice itself, but it appears that's not necessary.
> 
> I personally would just leave it alone.

If I weren't seeing problems that I'm associating with it, I might.

> I would be hesitant to tune this - I suspect the values may have
> been chosen for a reason.  For example, a higher nice value will
> give the process some boost in priority initially, which will
> improve startup time slightly.  This would be beneficial in
> a cluster environment, I think, particularly on a failover.

If there's a good reason, I want to hear it.  In my mind, the values
have been changed on my oracle processes for no good reason and I'm just
changing it back to "normal".

> In any event the system will tune itself.  I am using top right
> now, watching the priorities for various oracle processes vary
> between 24 and 59, and the processes running with a nice level
> of -20 are NOT getting any perceived benefit over those with
> a nice of 0.  I don't remember all the details off the top of
> my head, but the general impressions that stuck in my mind were
> that after a very few seconds, the nice value was of almost no
> real consequence, and that the way Solaris did process scheduling
> and dispatching seemed to make a lot of sense.

"nice" and priority are CPU scheduling controls.  Most oracle processes
don't spend a great deal of time on CPU, because they are waiting on I/O
and talking to other processes, so in the common case they are less
affected by them.  My concern is that a "runaway" oracle process could
be created that attempts to consume a great deal of CPU.  It's not that
oracle doesn't run fine this way normally, it's that I don't trust it if
it wants to do something strange.

I just set the nice value of those cpu hogs to 0 and the load on the
machine dropped right down....
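Resetting them is the mirror of the earlier renice, and it needs no
special privilege, since raising the numeric nice value (from -20 back
toward 0) is permitted for the process owner.  A minimal sketch, with a
shell busy loop standing in again:

```shell
# Start a busy loop, then put its nice value back to 0.
while :; do :; done &
hog=$!
renice -n 0 -p "$hog" >/dev/null   # no-op here since it starts at 0
after=$(ps -o nice= -p "$hog" | tr -d ' ')
echo "nice is now $after"
kill "$hog"
```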

last pid: 13814;  load averages:  5.41,  7.51,  5.39                  12:06:54
80 processes:  74 sleeping, 5 running, 1 on cpu
CPU states:  0.0% idle, 99.6% user,  0.4% kernel,  0.0% iowait,  0.0% swap
Memory: 384M real, 99M free, 178M swap in use, 582M swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 13806 ddunham    1  21    0 2592K 2008K run      4:51 24.96% perl
 13805 ddunham    1  31    0 2592K 2008K run      4:44 24.78% perl
 13807 ddunham    1  21    0 2592K 2008K run      4:27 24.69% perl
 13808 ddunham    1  22    0 2592K 2008K run      4:27 24.60% perl
 13813 root       1  58    0 2576K 1632K cpu      0:01  0.15% top

-- 
Darren Dunham                                           ddunham@taos.com
Unix System Administrator                    Taos - The SysAdmin Company
Got some Dr Pepper?                           San Francisco, CA bay area
          < How are you gentlemen!! Take off every '.SIG'!! >
