List: sun-managers
Subject: SUMMARY - Multiple Power Supply Replacements in Sun E250
From: Chris Hoogendyk <hoogendyk () bio ! umass ! edu>
Date: 2009-09-24 19:18:08
Message-ID: 4ABBC5F0.1000606 () bio ! umass ! edu
Problem resolved.
Original email at bottom.
Thanks to Sean Walmsley, Francisco Roque, Paul Kraus, Bryan Hodgson, Tim
Bradshaw, Michael Horton, J.E., Todd A. Cox, and Joseph A. Belford. Todd
nailed it, and a couple of others had pieces of the same puzzle.
First I took the original "failed" power supplies and put them in a
spare E250 that had been set up with adequate hardware components to
run. They came up fine and continued to be fine for over 24 hours.
Then we took an E250 that we had been gutting for parts and removed the
Power Distribution Board (The E250 Owner's Manual has simple
directions). Last night, around 8pm, my boss and I both came in, took
down the server, replaced the power distribution board, and brought it
back up. I had set the eeprom for diag-level=max before we took it down.
It came up clean without any trouble. It has remained clean through
today. Previously, we had incessant complaints in /var/adm/messages
about power supply 1 having failed again.
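For anyone who wants to tally this kind of complaint per supply before and after a repair, here is a minimal sketch in Python 3. It is only an illustration: the exact wording of the E250 messages is not quoted in this email, so the pattern simply assumes each relevant line contains "power supply <n>", and the function names are made up for the example.

```python
import re
from collections import Counter

# Assumption: the exact syslog wording is not quoted in this message, so the
# pattern only looks for "power supply <n>" anywhere on a line.
PSU_PATTERN = re.compile(r"power supply (\d+)", re.IGNORECASE)


def count_psu_mentions(lines):
    """Return a Counter mapping power supply number -> number of matching lines."""
    counts = Counter()
    for line in lines:
        match = PSU_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_psu_mentions_in_file(path="/var/adm/messages"):
    """Scan a log file (hypothetical helper; default path as named above)."""
    # errors="replace" keeps stray non-UTF-8 bytes in the log from aborting the scan.
    with open(path, errors="replace") as handle:
        return count_psu_mentions(handle)
```

Running count_psu_mentions_in_file() once before the board swap and again a day later should show whether the count for supply 1 has stopped growing, provided the log has not been rotated in between.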
The only slight bit of difficulty we had was that there are a lot of
cables connecting to the power distribution board. They all have to be
disconnected and reconnected correctly, and the excess cable has to be
dressed out of the way of things that move during hot swap. The first
time we tried to slide a power supply back in, part of a cable loop had
gotten in the way. We had to rearrange things and then try again.
Note: in looking over the internals of several E250's, most of them
around 10 years old, there was no sign of any of the capacitor
swelling or leakage that we routinely see in low cost PCs.
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk@bio.umass.edu>
---------------
Erdös 4
-------- Original Message --------
Subject: Multiple Power Supply Replacements in Sun E250
Date: Mon, 21 Sep 2009 16:19:49 -0400
From: Chris Hoogendyk <hoogendyk@bio.umass.edu>
To: Sun Managers List <sunmanagers@sunmanagers.org>
I'm tossing this to the list because I'm sure there is something I'm
missing here.
We have a number of E250's that have been in operation for a number of
years. We haven't had any trouble with any of them.
A couple of years ago, we also took in 10 used E250's that were being
discarded by another department on campus. We put 3 of them into
operation, collecting parts from some of the others and adding new disk
drives. The rest were set aside in our store room for scavenging.
They've just been sitting there for a couple of years now.
Now to the problem. Around the beginning of September we noticed a
service light on the front of one of our E250's. Turns out it was
complaining that power supply 1 had faulted. That power supply showed AC
in but no DC out on its indicator lights. So, we went back to our store
room, pulled a power supply, and hotswapped it. Since the hotswapped
supply had been in the off mode when it was put in, we had to turn the
switch on the front of the E250 to diagnostic and back to run. That
turned off the service light. Cool. That was Sept. 3.
Then on the weekend of Sept. 12/13 there were 3 warnings in
/var/adm/messages on Saturday night saying first that power supply 0 was
faulting and then that power supply 1 was faulting. However, they seemed
to be separated in time in some way so that it didn't take down the
server. Then, on Sunday around 4pm, the server went down. The indicator
lights pointed to power supply 0. My boss swapped that out. Weird.
Then, midweek the following week, the same E250 started reporting that
power supply 1 had faulted. We've been under an onslaught of other work,
so we didn't notice it right away. When we did notice it, I did an
inventory of our stored E250's, picked the newest one (based on serial
numbers) that had been stored above ground level (paranoia about water
leakage), pulled its upper power supply 1, and swapped it in for the
"faulted 1" in our running E250. That gave us about 10 minutes of
respite from the warnings. Then the warnings resumed, saying power
supply 1 not ok.
This just doesn't make sense.
Is there something we are doing wrong? Is flipping the switch to
diagnostic and back to run inadequate to really set the power supply to
be in the on mode? Is there likely something more serious wrong with
this E250? Should we be looking at swapping out the whole box? Have
these additional power supplies just gone stale from sitting idle for a
couple of years? And, can anyone give any guidance on how to
authoritatively diagnose what the problem really is? This happens to be
the one department that has the most trouble coming up with money for
any kind of equipment updates/additions/repairs.
Thanks,
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk@bio.umass.edu>
---------------
Erdös 4
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers