
List:       npaci-rocks-discussion
Subject:    RE: [Rocks-Discuss]Nodes in 'E' state
From:       "Gurgul, Dennis J." <DGURGUL () PARTNERS ! ORG>
Date:       2006-04-28 16:58:55
Message-ID: 11821FC5155A5A48AB5CF6480C46D864037904EF () PHSXMB1 ! partners ! org

I run "qstat -f -j job#", and one line of the output gives the reason for the
error.  The qstat man page also lists "-explain E", but it seems to do the same
thing as qstat -f -j job#.
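
To pull just that line out of the job listing, something like the following
might work.  It is only a sketch: it assumes the reason is printed on lines
that start with "error reason", which may differ between Grid Engine versions,
and the function names are made up for illustration.

```shell
#!/bin/bash
# Filter: keep only the "error reason" lines from qstat -j output on stdin.
# Assumption: the reason is reported on lines beginning with "error reason".
error_reason_lines() {
    grep -i '^error reason'
}

# Hypothetical wrapper: show the error reason for one job number.
job_error_reason() {
    qstat -j "$1" | error_reason_lines
}
```

Calling "job_error_reason job#" would then print only the reason line(s) for
that job, if the assumption about the output format holds.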

Usually the errors I see don't make sense.  For example, "cannot chdir to
/home/user'sdir...no such file or dir", even though "cluster-fork ls -l
/home/user'sdir | wc" returns the expected output for each node (so the
directory obviously exists).

I clear the errors with "qmod -cj job#".  There's supposed to be a wildcard
option (to clear the errors for ALL of a user's jobs), but I can't figure out
how to use it.  I usually use a script like:

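#!/bin/bash
# The C-style for loop is a bash feature, so run this under bash, not plain sh.
start=$1   # first job number to clear
end=$2     # the loop stops just before this job number
for ((i = start; i < end; i++)); do
    qmod -cj "$i"
done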
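
Without the wildcard option, clearing the errors for all of one user's jobs
could also be approximated by reading the job numbers from qstat instead of
guessing a range.  This is a rough sketch, not a tested recipe: it assumes the
default "qstat -u user" layout (two header lines, job number in column 1, state
in column 5, with "E" somewhere in the state of errored jobs), and the function
names are invented for illustration.

```shell
#!/bin/bash
# Read qstat-style output on stdin and print the numbers of jobs whose
# state column contains "E" (e.g. "Eqw").
# Assumption: two header lines, job number in column 1, state in column 5.
error_job_ids() {
    awk 'NR > 2 && $5 ~ /E/ { print $1 }' | sort -un
}

# Hypothetical wrapper: clear the error state of every errored job
# belonging to the given user.
clear_user_errors() {
    local user="$1"
    qstat -u "$user" | error_job_ids | while read -r id; do
        qmod -cj "$id"
    done
}
```

Running "clear_user_errors username" would then call qmod -cj once for each
job in error state, leaving that user's other jobs alone.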



Dennis Gurgul
Partners HealthCare
Research Computing
617.724.3169
http://www.partners.org/rescomputing/

-----Original Message-----
From: npaci-rocks-discussion-admin@sdsc.edu
[mailto:npaci-rocks-discussion-admin@sdsc.edu] On Behalf Of
kannaiah@bsd.uchicago.edu
Sent: Thursday, April 27, 2006 6:59 PM
To: npaci-rocks-discussion@sdsc.edu
Subject: [Rocks-Discuss]Nodes in 'E' state

Hello,

What does it mean when a node is in the 'E' (error) state, and how do I get the
nodes out of that state?


Thank you
-Kiran




