'Re: Ansible install Re: Reboot and re-link (ignore previously sent message)'


List:       openbsd-misc
Subject:    Re: Ansible install Re: Reboot and re-link (ignore previously sent message)
From:       U'll Be King Of The Stars <ullbeking () andrewnesbit ! org>
Date:       2019-06-23 2:54:21
Message-ID: 6182dc04-86f0-316a-5b98-c05df73871e7 () andrewnesbit ! org

[Please ignore the previous message I sent on this topic.  I
accidentally pressed 'Send' before my message was complete.]

On 22/06/2019 19:52, chohag@jtan.com wrote:
> Lyndon Nerenberg writes:
>> We are looking forward to that.  *However*, there is a lot to be
>> said for regularly re-installing your hosts from scratch.  This
>> ensures your installer scripts don't rot as host system "features"
>> accrete over time.  This is prone to happen when you Ansible- or
>
> Or as I like to put it: Reboot* often, to ensure that you can. Uptime is
> overrated.

In my experience, there are indeed benefits to rebooting production
servers on a scheduled maintenance basis.  Here are three example
problems that it could help with:

1.  If processes have been running for a long time then there is some
chance that the system is suffering from memory fragmentation.  This can
make your server slower.  I think it could also trigger an OOM.

2.  Untested changes could have been deployed since the last reboot.
They might have unpredictable effects on the startup scripts.

3.  The startup scripts might no longer work _at all_ if the server has
been in continual operation for a long time, such as five years.  This
can happen due to the phenomenon known as "bit rot".

Some benefits of a regular, scheduled reboot cycle:

1.  Rebooting will clear up memory fragmentation.

2.  Rebooting will improve confidence that it is possible to reboot the
server in a clean way and that the startup scripts still work.  After
initial boot the server will progress to its intended runtime state.
("Have you tried turning it off and then back on again?")

    Having this kind of confidence is particularly important when a
server crashes or when you need to perform unscheduled maintenance to
deploy an urgent hotfix.

    Another thought literally just occurred to me.  Regular
_unscheduled_ reboots seem like a typical chaos engineering technique.
I haven't investigated chaos engineering closely but I'd be surprised if
it isn't.
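
As a rough illustration of the scheduled reboot cycle described above,
here is a minimal sketch of a check for whether a host is overdue for
its maintenance reboot.  The function name, the 30-day default, and the
idea of passing the boot time in as a datetime are all my own
assumptions for the example; how the boot time is obtained is left to
the caller.

```python
from datetime import datetime, timedelta


def reboot_overdue(boot_time, now, max_uptime=timedelta(days=30)):
    """Return True if the host has been up longer than max_uptime.

    boot_time and now are datetime objects; max_uptime is a
    hypothetical policy value, not a recommendation.
    """
    return now - boot_time > max_uptime


if __name__ == "__main__":
    # Example: a host booted on 1 January, checked on 23 June.
    booted = datetime(2019, 1, 1)
    checked = datetime(2019, 6, 23)
    if reboot_overdue(booted, checked):
        print("schedule a maintenance reboot")
```

Something like this would only flag the host; the reboot itself would
still happen in a planned maintenance window.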

Andrew
-- 
OpenPGP key: EB28 0338 28B7 19DA DAB0  B193 D21D 996E 883B E5B9
