List: beowulf
Subject: Re: [Beowulf] [EXTERNAL] Power Cycling Question
From: Prentice Bisbal via Beowulf <beowulf () beowulf ! org>
Date: 2021-07-19 15:46:33
Message-ID: 3164c16f-e8cc-50cf-e461-ea6ca2f5c9eb () pppl ! gov
> On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can
> take surprisingly long, so there's another tradeoff there.
Not really, especially not with NVMe drives. I have NVMe drives in
both my laptop and my desktop, and it is startling how fast they boot and
resume from suspend.
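For a rough sense of scale, spool time is bounded below by the amount of RAM to write divided by the drive's sustained write bandwidth. The sketch below is only that back-of-envelope division; the function name and the sample figures are illustrative assumptions, not measurements from anyone in this thread.

```python
def spool_seconds(ram_in_use_gib: float, write_gib_per_s: float) -> float:
    """Lower bound on the time to write a RAM image to disk.

    Ignores compression, page skipping, and firmware/boot overhead,
    so real suspend and resume times will differ.
    """
    if write_gib_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    return ram_in_use_gib / write_gib_per_s


# Assumed figures, for illustration only.
laptop = spool_seconds(ram_in_use_gib=16, write_gib_per_s=2)
big_node = spool_seconds(ram_in_use_gib=512, write_gib_per_s=2)
print(f"laptop: {laptop:.0f} s, large-memory node: {big_node:.0f} s")
```

On those assumed figures a laptop-sized image takes seconds while a large-memory node takes minutes, so how long "surprisingly long" is depends heavily on the node.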
I think the bigger issue with this approach is whether enterprise servers
support it. I believe it requires some level of hardware
support, which I doubt servers designed for constant-on use
have. Someone please jump in and correct me if I'm wrong here.
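One way to check a given Linux node is to look at /sys/power/state, where the kernel lists the sleep states it offers; "disk" is suspend-to-disk (hibernation). The following is a minimal sketch with hypothetical helper names. A listed "disk" state only shows that the kernel advertises it, not that the firmware, swap space, and drivers will actually complete a hibernate and resume cycle.

```python
from pathlib import Path


def parse_sleep_states(text: str) -> set[str]:
    """Split the space-separated contents of /sys/power/state."""
    return set(text.split())


def supports_suspend_to_disk(state_file: str = "/sys/power/state") -> bool:
    """Return True if the kernel advertises the 'disk' sleep state.

    Returns False when the file is missing or unreadable, for example
    on a non-Linux host or a kernel built without power management.
    """
    try:
        return "disk" in parse_sleep_states(Path(state_file).read_text())
    except OSError:
        return False


if __name__ == "__main__":
    print("suspend-to-disk advertised:", supports_suspend_to_disk())
```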
Prentice
On 7/16/21 8:38 PM, Lux, Jim (US 7140) via Beowulf wrote:
> An interesting question.
> The power cycling reliability thing is probably not a big deal - the temperatures
> change a lot between light load and heavy load already, and if a "server class" PC
> can't take a power cycle per day, when the grungiest consumer unit can do it, I'd
> be surprised. It's not like you're cycling between -40C and 70C every hour like in
> an automotive application.
> Managing the chillers, though - That might be a bigger problem.
>
> And as Jörg points out, there's a fair amount of sophistication needed in setting
> your turn on and turn off thresholds.
> On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can
> take surprisingly long, so there's another tradeoff there.
>
> On 7/16/21, 12:35 PM, "Beowulf on behalf of Douglas Eadline"
> <beowulf-bounces@beowulf.org on behalf of deadline@eadline.org> wrote:
>
> Hi everyone:
>
> Reducing power use has become an important topic. One
> of the questions I have always wondered about is
> why more clusters do not turn off unused nodes. Slurm
> has hooks to turn nodes off when not in use and
> turn them on when resources are needed.
>
> My understanding is that power cycling creates
> temperature cycling, which then leads to premature node
> failure. That makes sense, but has anyone ever studied or tested
> this?
>
> The only other reasons I can think of are that the delay
> from server boot time makes job starts slow, or that power
> surges become an issue.
>
> I'm curious about other ideas or experiences.
>
> Thanks
>
> --
> Doug
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf