List:       ssic-linux-users
Subject:    [SSI-users] unbalanced load & strange numerous migrations?
From:       Maurice Libes <Maurice.Libes () com ! univ-mrs ! fr>
Date:       2006-01-09 15:03:08
Message-ID: 43C27B2C.30500 () com ! univ-mrs ! fr

hi to all

this morning my 3-node cluster (oSSI 1.2 on debian) seemed frozen ..
no system command was answering (top ifconfig w ps ..)...
to my knowledge (the top command was frozen) there was only one computing
process running..

trying to understand what happened, i saw the situation below..
a heavily unbalanced load between nodes...

loads
1:      3792
3:      8
4:      54

$ onnode 3 cat /proc/cluster/loadlog

shows plenty of "uptime" processes which were regularly migrated from
node 3 to node 1, every 15 minutes ... increasing the node 1 load up to 3656

how was this situation possible?
1. why and how could the loads between nodes become so unbalanced?

i don't know the origin of such a discrepancy, could it be due to these
uptime processes which were regularly migrated to node 1?

any ideas welcome
thanks
ML

1136747701 rexec :pid 255475(/usr/bin/uptime) -> node 1 mem 810204 my 
load 8 node1 load 60
1136748601 rexec :pid 255496(/usr/bin/uptime) -> node 1 mem 808844 my 
load 8 node1 load 120
1136749502 rexec :pid 255523(/usr/bin/uptime) -> node 1 mem 807536 my 
load 8 node1 load 177
1136750401 rexec :pid 255542(/usr/bin/uptime) -> node 1 mem 806184 my 
load 8 node1 load 237
1136751301 rexec :pid 255569(/usr/bin/uptime) -> node 1 mem 804824 my 
load 8 node1 load 294
1136752201 rexec :pid 255590(/usr/bin/uptime) -> node 1 mem 803444 my 
load 8 node1 load 354
1136753101 rexec :pid 255617(/usr/bin/uptime) -> node 1 mem 802020 my 
load 8 node1 load 415
1136754002 rexec :pid 255636(/usr/bin/uptime) -> node 1 mem 800648 my 
load 8 node1 load 447
1136754901 rexec :pid 255663(/usr/bin/uptime) -> node 1 mem 799204 my 
load 8 node1 load 532
1136755801 rexec :pid 255684(/usr/bin/uptime) -> node 1 mem 797928 my 
load 8 node1 load 600
1136756701 rexec :pid 255711(/usr/bin/uptime) -> node 1 mem 796536 my 
load 8 node1 load 649
1136757601 rexec :pid 255730(/usr/bin/uptime) -> node 1 mem 795104 my 
load 8 node1 load 709
1136758502 rexec :pid 255757(/usr/bin/uptime) -> node 1 mem 793728 my 
load 8 node1 load 766
1136759401 rexec :pid 255778(/usr/bin/uptime) -> node 1 mem 792328 my 
load 8 node1 load 826
1136760301 rexec :pid 255805(/usr/bin/uptime) -> node 1 mem 790964 my 
load 8 node1 load 886
1136761201 rexec :pid 255824(/usr/bin/uptime) -> node 1 mem 789676 my 
load 8 node1 load 943
1136762101 rexec :pid 255851(/usr/bin/uptime) -> node 1 mem 788148 my 
load 8 node1 load 1003
1136763002 rexec :pid 255872(/usr/bin/uptime) -> node 1 mem 786768 my 
load 8 node1 load 1060
1136763901 rexec :pid 255899(/usr/bin/uptime) -> node 1 mem 785368 my 
load 8 node1 load 1120
1136764801 rexec :pid 255918(/usr/bin/uptime) -> node 1 mem 783980 my 
load 8 node1 load 1181
1136765701 rexec :pid 255945(/usr/bin/uptime) -> node 1 mem 782596 my 
load 8 node1 load 1237
1136766601 rexec :pid 255966(/usr/bin/uptime) -> node 1 mem 781212 my 
load 8 node1 load 1298
1136767501 rexec :pid 255993(/usr/bin/uptime) -> node 1 mem 779856 my 
load 8 node1 load 1354
1136768401 rexec :pid 256012(/usr/bin/uptime) -> node 1 mem 778564 my 
load 8 node1 load 1415
1136769301 rexec :pid 256039(/usr/bin/uptime) -> node 1 mem 777176 my 
load 8 node1 load 1475
1136770201 rexec :pid 256060(/usr/bin/uptime) -> node 1 mem 775852 my 
load 8 node1 load 1532
1136771101 rexec :pid 256087(/usr/bin/uptime) -> node 1 mem 774364 my 
load 8 node1 load 1592
1136772002 rexec :pid 256106(/usr/bin/uptime) -> node 1 mem 773012 my 
load 8 node1 load 1652
1136772901 rexec :pid 256133(/usr/bin/uptime) -> node 1 mem 771524 my 
load 8 node1 load 1709
1136773801 rexec :pid 256154(/usr/bin/uptime) -> node 1 mem 770216 my 
load 8 node1 load 1769
1136774701 rexec :pid 256181(/usr/bin/uptime) -> node 1 mem 768832 my 
load 8 node1 load 1826
1136775601 rexec :pid 256200(/usr/bin/uptime) -> node 1 mem 767468 my 
load 8 node1 load 1923
1136776501 rexec :pid 256227(/usr/bin/uptime) -> node 1 mem 765952 my 
load 8 node1 load 1947
1136777401 rexec :pid 256248(/usr/bin/uptime) -> node 1 mem 764668 my 
load 8 node1 load 2003
1136778301 rexec :pid 256275(/usr/bin/uptime) -> node 1 mem 763264 my 
load 8 node1 load 2064
1136779201 rexec :pid 256294(/usr/bin/uptime) -> node 1 mem 761896 my 
load 8 node1 load 2120
1136780101 rexec :pid 256321(/usr/bin/uptime) -> node 1 mem 760540 my 
load 8 node1 load 2181
1136781001 rexec :pid 256342(/usr/bin/uptime) -> node 1 mem 759156 my 
load 8 node1 load 2241

1136781901 rexec :pid 256369(/usr/bin/uptime) -> node 1 mem 757772 my 
load 8 node1 load 2347
1136782801 rexec :pid 256388(/usr/bin/uptime) -> node 1 mem 756376 my 
load 8 node1 load 2358
1136783702 rexec :pid 256415(/usr/bin/uptime) -> node 1 mem 754896 my 
load 8 node1 load 2418
1136784601 rexec :pid 256565(/usr/bin/uptime) -> node 1 mem 753016 my 
load 8 node1 load 2494
1136785502 rexec :pid 256592(/usr/bin/uptime) -> node 1 mem 751652 my 
load 8 node1 load 2554
1136786401 rexec :pid 256611(/usr/bin/uptime) -> node 1 mem 750328 my 
load 8 node1 load 2665
1136787301 rexec :pid 256638(/usr/bin/uptime) -> node 1 mem 748824 my 
load 8 node1 load 2723
1136788201 rexec :pid 256659(/usr/bin/uptime) -> node 1 mem 747532 my 
load 8 node1 load 2732
1136789101 rexec :pid 256686(/usr/bin/uptime) -> node 1 mem 746076 my 
load 8 node1 load 2788
1136790002 rexec :pid 256705(/usr/bin/uptime) -> node 1 mem 744760 my 
load 8 node1 load 2849
1136790901 rexec :pid 256732(/usr/bin/uptime) -> node 1 mem 743296 my 
load 8 node1 load 2909
1136791801 rexec :pid 256753(/usr/bin/uptime) -> node 1 mem 741964 my 
load 8 node1 load 3023
1136792701 rexec :pid 256780(/usr/bin/uptime) -> node 1 mem 740472 my 
load 8 node1 load 3026
1136793601 rexec :pid 256799(/usr/bin/uptime) -> node 1 mem 739176 my 
load 8 node1 load 3086
1136794502 rexec :pid 256826(/usr/bin/uptime) -> node 1 mem 737744 my 
load 8 node1 load 3143
1136795401 rexec :pid 256847(/usr/bin/uptime) -> node 1 mem 736456 my 
load 8 node1 load 3203
1136796301 rexec :pid 256874(/usr/bin/uptime) -> node 1 mem 735032 my 
load 8 node1 load 3260
1136797201 rexec :pid 256893(/usr/bin/uptime) -> node 1 mem 733612 my 
load 8 node1 load 3320
1136798101 rexec :pid 256920(/usr/bin/uptime) -> node 1 mem 732136 my 
load 8 node1 load 3381
1136799002 rexec :pid 256941(/usr/bin/uptime) -> node 1 mem 730800 my 
load 8 node1 load 3498
1136799901 rexec :pid 256968(/usr/bin/uptime) -> node 1 mem 729412 my 
load 8 node1 load 3554
1136800801 rexec :pid 256988(/usr/bin/uptime) -> node 1 mem 728032 my 
load 8 node1 load 3656
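
To make the pattern in the log above easier to check, here is a minimal sketch that parses such entries and prints the time gap and the node 1 load increase between consecutive migrations. It assumes the first field is a Unix timestamp in seconds, that "my load" is the load of the sending node, and that "nodeN load" is the load reported for the destination node; the names ENTRY, parse_loadlog and deltas are only illustrative and are not part of OpenSSI. The sample holds the first three entries of the log, still wrapped as in this mail.

```python
import re

# One loadlog entry, after the mail line wrapping has been undone.
# The field meanings are assumptions, see the text above.
ENTRY = re.compile(
    r"\b(?P<ts>\d+) rexec :pid (?P<pid>\d+)\((?P<cmd>[^)]*)\) "
    r"-> node (?P<dest>\d+) mem (?P<mem>\d+) "
    r"my load (?P<my_load>\d+) node\d+ load (?P<dest_load>\d+)"
)


def parse_loadlog(text):
    """Return one dict per rexec entry; numeric fields become ints."""
    # Collapse all whitespace so entries wrapped over two lines are rejoined.
    flat = " ".join(text.split())
    return [
        {k: (v if k == "cmd" else int(v)) for k, v in m.groupdict().items()}
        for m in ENTRY.finditer(flat)
    ]


def deltas(entries):
    """Return (seconds elapsed, destination load increase) per consecutive pair."""
    return [
        (b["ts"] - a["ts"], b["dest_load"] - a["dest_load"])
        for a, b in zip(entries, entries[1:])
    ]


SAMPLE = """\
1136747701 rexec :pid 255475(/usr/bin/uptime) -> node 1 mem 810204 my
load 8 node1 load 60
1136748601 rexec :pid 255496(/usr/bin/uptime) -> node 1 mem 808844 my
load 8 node1 load 120
1136749502 rexec :pid 255523(/usr/bin/uptime) -> node 1 mem 807536 my
load 8 node1 load 177
"""

if __name__ == "__main__":
    for seconds, load_increase in deltas(parse_loadlog(SAMPLE)):
        print(f"{seconds} s since previous migration, node 1 load +{load_increase}")
```

On these three entries the gaps are 900 s and 901 s, and the node 1 load grows by 60 and then 57, which matches the "every 15 minutes" observation; feeding the whole output of /proc/cluster/loadlog to parse_loadlog should show whether the same step holds for the full log.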

-- 
      Maurice Libes
Tel : +33 (04) 91 82 93 25            Centre d'Oceanologie de Marseille
Fax : +33 (04) 91 82 65 48            UMS2196 CNRS- Campus de Luminy, 
Case 901
mailto:maurice.libes@com.univ-mrs.fr  F-13288 Marseille cedex 9
Annuaire : http://annuaire.univmed.fr/showuser.php?uid=libes
