
List:       infrastructures
Subject:    Re: [Infrastructures] continous long term live host management
From:       Jim Rowan <jmr () computing ! com>
Date:       2002-11-09 6:39:47
Message-ID: 0A85B135-F3AE-11D6-859D-003065A913C6 () computing ! com

On Thursday, September 5, 2002, at 09:53 PM, Kevin M. Counts wrote:

>
> I'd appreciate some input to the practice of moving forward in revisions
> of OS with regards to an infrastructure approach.
>
> Steve mentions on the front page of infrastructures.org
> that a benefit of IS is "Continuous, long-term live host management
> (no re-installation needed to apply upgrades)."

I subscribe to the theory that this is a primary goal.

> How exactly has this been accomplished in the past? Is the idea
> that since we can reproduce a clone system we can test out
> a live upgrade with certainty?

Because we have good control over the state of a machine, we believe that
we can reproduce a clone, with the proposed changes applied, on which to
test.  Testing is an area that is frequently imprecisely defined, poorly
executed, and incomplete.  It usually involves players in the user
community who aren't excited about doing it thoroughly, don't know how, and
have other pressures.  (Yes, even though they are the ones who will get bit
in the ass...)  Thus, unless you've done a lot of other homework, the test
environment only sets the stage for a potentially clean upgrade.

> Or is it that we can lay
> down a new OS and replay all changes? (dont think its that one)

The "good control" also allows you to back out the change (or reinstall and 
replay; it doesn't make a huge difference..).  It's this one that I like 
the most.

Admin: we've set up a test machine; please run all your acceptance tests 
and tell me if it's ok to roll this out.
User/Acceptance tester (you've all formally identified him, right?:): 
<later> Ok, I ran some tests and I think it's ok.  Go ahead.
Admin: <rolls changes to production environment><pager starts ringing>
Users: YOU BROKE IT!
Admin: It wasn't actually me, it was ____.  However, I can fix it right
away.  <pushes "rollback" button>  <goes back to drinking
{jolt/coffee/beer}>  <adds annotation to the ISO 9000/6 sigma/TCM log that
this outage was caused by inadequate acceptance testing>

Seriously, it's like when you start using revision control for managing a 
file (of any kind).  You're much more willing to make changes because you 
know you can undo them at little cost.  This is just revision control of a 
bigger object.  Because you're able to make incremental changes, you get to 
stay in control while you make forward progress.  Overall system quality is 
much higher.
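
To make the revision-control analogy concrete, here is a minimal sketch in
Python.  It is only an illustration, not a description of any particular
tool: the ManagedHost class, its apply/rollback methods, and the example
state keys ("os", "mailer") are all hypothetical names invented for this
example.  The idea is that every incremental change records the prior
state first, so backing out is cheap.

```python
import copy


class ManagedHost:
    """Toy model of a host whose state is kept under revision control.

    Hypothetical illustration only; real hosts have far more state.
    """

    def __init__(self, state):
        self.state = dict(state)
        self.history = []  # earlier states, oldest first

    def apply(self, change):
        """Record the current state, then apply an incremental change."""
        self.history.append(copy.deepcopy(self.state))
        self.state.update(change)

    def rollback(self):
        """Restore the state from before the most recent change."""
        if not self.history:
            raise RuntimeError("nothing to roll back")
        self.state = self.history.pop()


host = ManagedHost({"os": "1.0", "mailer": "2.3"})
host.apply({"os": "1.1"})   # roll the upgrade out
host.rollback()             # pager rang: push the "rollback" button
```

After the rollback the host is back in the state it had before the
upgrade, which is what makes small, frequent changes low-risk.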


Jim Rowan
DCSI
jmr@computing.com

