
List:       toasters
Subject:    RE: SMVI / VMWare Experiences...
From:       Ken Williams <kwillia () smud ! org>
Date:       2009-09-14 16:51:21
Message-ID: 8A98218D44C9EF4D851C95427928DDAB04850F21 () snpexch01 ! corporate ! smud ! org

Sounds like whatever user-defined script you have is failing intermittently?
Or perhaps it's a VMware Tools issue.

We've been able to track our issue down to the guest OS level (Win2k3
specifically). It looks like an issue with VSS or LUN alignment.

I would recommend making sure your LUNs are aligned (use mbrscan /
mbralign from the Host Utilities Kit for VMware ESX). There is detailed
documentation on the NOW.netapp.com site.
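The alignment check comes down to simple arithmetic: a partition's starting byte offset (start LBA x 512) should be an exact multiple of the storage block size. Below is a minimal Python 3 sketch of that arithmetic, assuming 512-byte sectors, a 4 KB target block (the WAFL block size), and a classic MBR partition table. The function names are hypothetical; it only illustrates the check and is not a substitute for mbrscan/mbralign.

```python
import struct
import sys

SECTOR_SIZE = 512   # assumed bytes per sector on the virtual disk
WAFL_BLOCK = 4096   # assumed alignment target: 4 KB WAFL block

def partition_starts(mbr):
    """Return (slot index, start LBA) for each non-empty primary MBR partition."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR (missing 0x55AA signature)")
    starts = []
    for i in range(4):
        # The partition table begins at byte 446: four 16-byte entries.
        entry = mbr[446 + 16 * i : 446 + 16 * (i + 1)]
        ptype = entry[4]                                   # partition type byte
        start_lba = struct.unpack_from("<I", entry, 8)[0]  # little-endian uint32
        if ptype != 0:  # type 0x00 marks an unused slot
            starts.append((i, start_lba))
    return starts

def is_aligned(start_lba, sector_size=SECTOR_SIZE, block=WAFL_BLOCK):
    """True if the partition's byte offset is a whole multiple of the block size."""
    return (start_lba * sector_size) % block == 0

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        mbr = f.read(512)
    for idx, lba in partition_starts(mbr):
        status = "aligned" if is_aligned(lba) else "MISALIGNED"
        print(f"partition {idx + 1}: start LBA {lba} "
              f"(offset {lba * SECTOR_SIZE} bytes) -> {status}")
```

Run it against a copy of a flat VMDK or a raw disk device, e.g. `python3 check_align.py vm1-flat.vmdk` (file names hypothetical). A Windows 2003 partition created with default settings starts at LBA 63: 63 x 512 = 32,256 bytes, and 32,256 mod 4,096 = 3,584, so it is misaligned; a start at LBA 64 (32,768 bytes) is aligned.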

-----Original Message-----
From: Nick Silkey [mailto:nick@silkey.org] 
Sent: Friday, September 11, 2009 7:15 PM
To: Ken Williams
Cc: toasters@mathworks.com
Subject: Re: SMVI / VMWare Experiences...

Ken --

We too are experiencing issues with SMVI 1.2 bombing out when attempting
to perform a VMware quiesce snap on _some_ RHEL5.3 32-bit VMs.  A couple
of notable points:

- These problematic VMs have a 100% success rate at taking VMware
quiesce snaps within vCenter, independent of SMVI.
- The problem is 100% reproducible, day or night.
- We deploy several VMs at a time, all from the same build.  When the
next SMVI schedule hits, some fail while others succeed.  Bizarre.
- Over time (we've been experiencing this issue for several weeks now),
the 'problem' VMs change.  Example: VMs abc and xyz will fail for days;
then, without intervention, VM abc will stop failing while VM xyz
continues to fail ... even if they're built from the same base
template/kickstart.
- We are nowhere near our snap limit on the volumes.
- These problematic VMs only bomb when attempting a quiesce.
Non-quiesce SMVI snaps work like a champ.

We've been working with NetApp and VMware on this for some time now.
Our setup: ESX 3.5u4+ hosts connected via NFS to a 3160-R5 running
7.2.6.1P3, managed by vCenter 4.0, with synchronous SnapMirror to another
3160-R5 running 7.2.6.1P3.  The only revealing clue is this message in
the SMVI and vCenter logs: "cannot create a quiesced snapshot because the
(user-supplied) custom prefreeze script in the virtual machine exited
with a nonzero return code".
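Since the logs point at the pre-freeze script's exit code, one way to see which step fails on which VM is to have that script log every command it runs. Below is a minimal sketch of such a wrapper, assuming Python 3.7+ is available in the guest (stock RHEL 5 ships an older Python, so treat this as illustrative). `run_steps`, the log path, and the step list are all hypothetical placeholders for whatever the actual pre-freeze script does.

```python
import datetime
import subprocess
import sys

LOG_PATH = "/var/log/prefreeze-debug.log"  # hypothetical log location

def run_steps(steps, log_path=LOG_PATH):
    """Run each command in order, log its exit code, and return the first
    nonzero code (0 if every step succeeds)."""
    with open(log_path, "a") as log:
        for cmd in steps:
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            try:
                result = subprocess.run(cmd, capture_output=True,
                                        text=True, timeout=60)
                code = result.returncode
                detail = result.stderr.strip()
            except (OSError, subprocess.TimeoutExpired) as exc:
                code = 1  # treat a missing binary or a hung step as a failure
                detail = repr(exc)
            log.write(f"{stamp} {cmd!r} -> {code} {detail}\n")
            if code != 0:
                # Stop at the first failure; per the log message above, a
                # nonzero exit from the pre-freeze script aborts the quiesce.
                return code
    return 0

if __name__ == "__main__":
    # Hypothetical step list: replace with the real pre-freeze commands.
    steps = [["sync"]]
    sys.exit(run_steps(steps))
```

Because the wrapper still exits with the first nonzero code, the freeze fails exactly as before, but the log now records which command failed, when, and its stderr, which may help correlate failures with the set of 'problem' VMs as it shifts over time.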

--
Nick

On Wed, Aug 26, 2009 at 5:32 PM, Ken Williams <kwillia@smud.org> wrote:
> I'm looking for experiences people out there may have had with SMVI
> on NetApp. We're currently experiencing major issues with SMVI
> snapshots failing. I've had open tickets with NetApp/VMware/Microsoft
> for 3 months and still don't have a solution.
>
> My environment looks like this:
>
> 6 x HP DL380 G5 (32 GB RAM) in an ESX cluster, dual Emulex 10000 cards in each host.
> Cisco MDS SAN.
> NetApp FAS3070 cluster, ~9 TB aggregate for VMware.
> VMFS datastores, ~10-15 VMs per datastore, ~50 GB per VM.
> A-SIS turned on.
> Volume and LUN space reservation turned off.
> Data ONTAP 7.2.5.1.
> Windows 2003 guest OS.
>
> I can't see us reaching any limitation on the filers or the SAN, yet we
> have random VMs failing snapshots every night. Are other people seeing
> these issues? (I've gone through the gamut of troubleshooting and version
> management of ESX/VMware Tools/etc.) Snapshots time out and fail at the
> VMware/guest level, not at the NetApp snapshot level.
>
> We want to have SMVI function with VSS enabled.
>
> Has anyone who had failing snapshots been able to resolve a similar issue?
> Or does anyone have SMVI working properly that we could use as a
> reference to compare configurations?
>
> __________________________________________________________
> Ken Williams
> Storage Administrator, Business Technology Operations
> Sacramento Municipal Utility District
> E-Mail: kwillia@smud.org
> Phone: (916) 732-6744
> Cell: (916) 240-4213


