
List:       beowulf
Subject:    Re: [Beowulf] [EXTERNAL] Power Cycling Question
From:       Prentice Bisbal via Beowulf <beowulf () beowulf ! org>
Date:       2021-07-19 15:46:33
Message-ID: 3164c16f-e8cc-50cf-e461-ea6ca2f5c9eb () pppl ! gov

> On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can \
> take surprisingly long, so there's another tradeoff there.

Not really, especially not with NVMe drives. I have NVMe drives in
both my laptop and my desktop, and it's startling how fast they boot
and resume from suspend.
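
Whether writing RAM to disk is fast or slow mostly depends on how much memory has to be saved compared with the drive's sustained write bandwidth. Here is a back-of-the-envelope sketch. It assumes a ~2 GB/s NVMe write rate and that all of RAM must be saved. Both are assumptions: real hibernation may drop caches or compress the image.

```python
def hibernate_seconds(ram_gib: float, used_fraction: float,
                      write_gb_per_s: float, compression_ratio: float = 1.0) -> float:
    """Estimate seconds needed to write a hibernation image to disk.

    ram_gib            installed memory in GiB (2**30 bytes)
    used_fraction      fraction of RAM that has to be saved, 0..1
    write_gb_per_s     assumed sustained write bandwidth in GB/s (10**9 bytes/s)
    compression_ratio  how much the image shrinks; 1.0 means no compression
    """
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be between 0 and 1")
    image_bytes = ram_gib * 2**30 * used_fraction / compression_ratio
    return image_bytes / (write_gb_per_s * 1e9)


if __name__ == "__main__":
    # Illustrative sizes only; 2 GB/s is an assumed NVMe write rate.
    for label, ram in [("16 GiB laptop", 16), ("512 GiB compute node", 512)]:
        t = hibernate_seconds(ram, used_fraction=1.0, write_gb_per_s=2.0)
        print(f"{label}: about {t:.0f} s to write the image")
```

Under those assumptions, a 16 GiB laptop writes its image in about 9 seconds. A hypothetical 512 GiB compute node needs roughly 4.5 minutes just to write it. That may explain why a laptop and a cluster node feel so different here.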

I think the bigger issue with this approach is whether enterprise
servers support it. I believe this requires some level of hardware
support, which I doubt servers designed for always-on use have.
Someone please jump in and correct me if I'm wrong here.
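
On Linux, a first step is to check what the kernel advertises. /sys/power/state lists the supported sleep states, where "disk" means suspend-to-disk. /sys/power/disk lists the hibernation modes, with the active one in brackets. A minimal sketch that reads both is below. It only reports kernel support, not whether a given server's firmware will actually resume cleanly. Hibernation also typically needs a swap area configured as the resume device.

```python
from pathlib import Path
from typing import Optional, Set


def supported_sleep_states(state_text: str) -> Set[str]:
    """Parse the contents of /sys/power/state, e.g. 'freeze mem disk'."""
    return set(state_text.split())


def hibernation_listed(state_text: str) -> bool:
    """True if the kernel advertises suspend-to-disk ('disk')."""
    return "disk" in supported_sleep_states(state_text)


def active_disk_mode(disk_text: str) -> Optional[str]:
    """Return the bracketed (currently active) entry of /sys/power/disk, if any."""
    for token in disk_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    try:
        state = Path("/sys/power/state").read_text()
    except OSError:
        print("/sys/power/state not readable (not Linux, or sysfs not mounted)")
    else:
        print("supported states:", sorted(supported_sleep_states(state)))
        print("suspend-to-disk listed:", hibernation_listed(state))
        try:
            mode = active_disk_mode(Path("/sys/power/disk").read_text())
            print("hibernation mode:", mode)
        except OSError:
            print("/sys/power/disk not readable")
```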

Prentice

On 7/16/21 8:38 PM, Lux, Jim (US 7140) via Beowulf wrote:
> An interesting question.
> The power cycling reliability thing is probably not a big deal - the temperatures \
> change a lot between light load and heavy load already, and if a "server class" PC \
> can't take a power cycle per day, when the grungiest consumer unit can do it, I'd \
> be surprised. It's not like you're cycling between -40C and 70C every hour like in \
> an automotive application. 
> Managing the chillers, though - That might be a bigger problem.
> 
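Jim's comparison can be made semi-quantitative with the Coffin-Manson relation, which is commonly used for thermal-cycling fatigue of solder joints. It gives the number of cycles to failure N_f for a temperature swing ΔT, with a material exponent n that is often taken to be around 2 for solder. Summing damage linearly over cycles (Miner's rule) then compares damage per day. The -40 °C to 70 °C hourly automotive cycle comes from the message above. The 40 K swing for one daily server power cycle and n = 2 are assumptions for illustration only. More detailed models, such as Norris-Landzberg, also account for cycle frequency and peak temperature.

```latex
\begin{align*}
N_f &= C\,(\Delta T)^{-n} \\
\frac{N_f^{\mathrm{server}}}{N_f^{\mathrm{auto}}}
  &= \left(\frac{\Delta T_{\mathrm{auto}}}{\Delta T_{\mathrm{server}}}\right)^{n}
   = \left(\frac{110\ \mathrm{K}}{40\ \mathrm{K}}\right)^{2} \approx 7.6 \\
\frac{D_{\mathrm{auto}}}{D_{\mathrm{server}}}
  &= \frac{24 / N_f^{\mathrm{auto}}}{1 / N_f^{\mathrm{server}}}
   = 24 \times 7.6 \approx 180
\end{align*}
```

Here D is the fatigue damage accumulated per day. With these assumed numbers, one power cycle a day does roughly two orders of magnitude less damage than the automotive case.
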
> And as Jörg points out, there's a fair amount of sophistication needed in setting \
> your turn on and turn off thresholds. 
> On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can \
> take surprisingly long, so there's another tradeoff there. 
> 
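One common way to keep turn-on and turn-off thresholds from fighting each other is hysteresis. Nodes are switched on when the pool of spare idle nodes drops below a low watermark. They are switched off only when the pool rises above a higher watermark, and only after a grace period of idleness. The policy sketch below is hypothetical; its watermarks and grace period are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class PowerPolicy:
    """Hypothetical hysteresis thresholds; the defaults are illustrative only."""
    min_idle_on: int = 2          # low watermark: spare idle nodes kept powered as a warm pool
    max_idle_on: int = 6          # high watermark: only power off when spare nodes exceed this
    idle_grace_s: float = 1800.0  # a node must be idle at least this long before power-off


def decide(policy: PowerPolicy, idle_times: Sequence[float],
           queued_nodes: int, powered_off: int) -> Tuple[int, int]:
    """Return (nodes_to_power_on, nodes_to_power_off).

    idle_times    idle durations in seconds of nodes that are on and idle
    queued_nodes  nodes requested by pending jobs
    powered_off   nodes currently switched off
    """
    spare = len(idle_times) - queued_nodes
    if spare < policy.min_idle_on:
        # Below the low watermark: wake enough nodes to cover demand plus the pool.
        return min(policy.min_idle_on - spare, powered_off), 0
    if spare > policy.max_idle_on:
        # Above the high watermark: shut down long-idle nodes, back down to the pool size.
        long_idle = sum(1 for t in idle_times if t >= policy.idle_grace_s)
        return 0, min(long_idle, spare - policy.min_idle_on)
    return 0, 0  # inside the dead band: leave everything as it is


if __name__ == "__main__":
    policy = PowerPolicy()
    print(decide(policy, idle_times=[], queued_nodes=3, powered_off=10))            # (5, 0)
    print(decide(policy, idle_times=[3600.0] * 10, queued_nodes=0, powered_off=0))  # (0, 8)
```

The gap between min_idle_on and max_idle_on is the dead band. It stops a node from being flapped on and off when load hovers near a single threshold. Slurm's SuspendTime setting plays a role similar to idle_grace_s.
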
> On 7/16/21, 12:35 PM, "Beowulf on behalf of Douglas Eadline" \
> <beowulf-bounces@beowulf.org on behalf of deadline@eadline.org> wrote: 
> 
> Hi everyone:
> 
> Reducing power use has become an important topic. One
> of the questions I have always wondered about is
> why more clusters do not turn off unused nodes. Slurm
> has hooks to turn nodes off when not in use and
> turn them on when resources are needed.
> 
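Slurm's hooks for this are the SuspendProgram and ResumeProgram settings in slurm.conf. Slurm runs each one with a single argument: a hostlist expression naming the nodes to act on. SuspendTime sets how long a node must sit idle first. The sketch below is a dry-run hook in Python that expands a simple hostlist and prints the ipmitool commands it would issue. Several details describe a hypothetical site and are assumptions:

- the script name power_hook.py
- the BMC naming convention `<node>-ipmi`
- the `admin` user
- the password being read from IPMI_PASSWORD via `-E`

```python
from __future__ import annotations

import re
import shlex
import sys


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple Slurm-style hostlist such as 'node[01-03,07],login1'.

    Only one bracket group per host name is handled; zero padding is kept.
    """
    hosts: list[str] = []
    # Split on commas that are not inside [...].
    for part in re.split(r",(?![^\[]*\])", expr.strip()):
        m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\](.*)", part)
        if m is None:
            if part:
                hosts.append(part)  # plain host name, no brackets
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            width = len(lo)  # '01' -> width 2, so node01 rather than node1
            for i in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


def power_commands(nodes: list[str], action: str,
                   bmc_suffix: str = "-ipmi") -> list[list[str]]:
    """Build one ipmitool command per node (hypothetical BMC naming and user)."""
    return [
        ["ipmitool", "-I", "lanplus", "-H", f"{node}{bmc_suffix}",
         "-U", "admin", "-E", "chassis", "power", action]
        for node in nodes
    ]


if __name__ == "__main__":
    # A wrapper set as SuspendProgram would call: power_hook.py off "$1"
    if len(sys.argv) == 3 and sys.argv[1] in ("on", "off"):
        for cmd in power_commands(expand_hostlist(sys.argv[2]), sys.argv[1]):
            print(shlex.join(cmd))  # dry run: print instead of executing
    else:
        print("usage: power_hook.py on|off HOSTLIST")
```

Real hook scripts often call `scontrol show hostnames` instead of parsing hostlists themselves, because this parser only handles one bracket group per name.
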
> My understanding is that power cycling causes
> temperature cycling, which in turn leads to premature node
> failure. That makes sense, but has anyone actually
> studied or tested this?
> 
> The only other reasons I can think of are the delay
> that server boot time adds to job starts, and
> power-surge issues.
> 
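Boot delay and power surges can be traded against each other by staggering power-on. The sketch below assumes a fixed boot time and a cap on how many nodes are switched on per minute. Slurm's ResumeRate parameter is also expressed in nodes per minute, but the simple batching here is a simplification and not Slurm's exact behavior. The node names are hypothetical.

```python
from typing import List, Sequence, Tuple


def resume_schedule(nodes: Sequence[str], rate_per_min: int,
                    boot_s: float) -> List[Tuple[str, float, float]]:
    """Return (node, power_on_at_s, ready_at_s) for a staggered power-on.

    rate_per_min  maximum nodes switched on per minute (must be >= 1)
    boot_s        assumed time from power-on until the node can take jobs
    """
    if rate_per_min < 1:
        raise ValueError("rate_per_min must be at least 1")
    schedule = []
    for i, node in enumerate(nodes):
        start = (i // rate_per_min) * 60.0  # batch index times one minute
        schedule.append((node, start, start + boot_s))
    return schedule


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 11)]  # hypothetical node names
    sched = resume_schedule(nodes, rate_per_min=4, boot_s=180.0)
    for node, on, ready in sched:
        print(f"{node}: power on at {on:5.0f} s, ready at {ready:5.0f} s")
    print("last node ready after", max(r for _, _, r in sched), "s")
```

For example, 10 nodes at 4 per minute with a 180 s boot give power-on batches at 0, 60 and 120 s, and the last node is ready after 300 s. Lowering the rate shrinks the surge but pushes job starts back further.
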
> I'm curious about other ideas or experiences.
> 
> Thanks
> 
> --
> Doug
> 
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit \
> https://urldefense.us/v3/__https://beowulf.org/cgi-bin/mailman/listinfo/beowulf__;!! \
> PvBDto6Hs4WbVuu7!ef5Z3NxzUcVChBwMKSYQ9u5d4nI_weKdbvUWM6BY8x2UyBeye1j64LNSRzJZUkml3wOJ0TM$

