My love/hate relationship with Cisco Nexus 1000v Part 1

Over a year ago, we deployed the Cisco Nexus 1000v virtual switch into our VMware cluster. I love some of the features it offers, but the problems we have run into still haunt me.

The first time I deployed it into our environment, vCenter itself was virtualized, with its database running on a separate SQL Server VM in the same cluster. This was all fine and dandy - until we had a major power outage that took the entire cluster offline. The problem with this setup is that it invites circular dependencies and cascading failures if it isn't carefully designed. Even if it is carefully designed, recovery can still be an unpleasant affair. Think this through with me: in the event of a complete power outage, what does the recovery process look like with a typical virtualized vCenter installation? I believe it goes something like this:
1) Power the hosts back up
2) Power up AD server (physical)
3) Locate and power up SQL server (unless this is on-box with vCenter)
4) Locate the vCenter VM and boot it up (see the sketch below)
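One practical snag with steps 3 and 4: with vCenter down, there is no inventory to search, so you have to connect to each host directly. A minimal sketch using the ESXi shell, assuming you have SSH access to the hosts (the Vmid shown is a placeholder - yours will differ):

    # List all VMs registered on this host; note the Vmid of the SQL/vCenter VM
    vim-cmd vmsvc/getallvms

    # Check the current power state of, say, Vmid 12
    vim-cmd vmsvc/power.getstate 12

    # Power it on
    vim-cmd vmsvc/power.on 12

The same trick works for locating and powering up the VSMs in step 5 of the list below.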

What about with Nexus 1000v?
1) Power the hosts back up
2) Power up AD server (physical)
3) Locate and power up SQL server (unless this is on-box with vCenter)
4) Locate the vCenter VM and boot it up
5) Locate the VSMs and power them up

But wait, what if your standard server port profiles aren't configured with System VLANs that match how the servers are connected? That potentially means you can power up your AD & SQL servers, but they will have no network connectivity until the VSMs are up - the VEMs (the modules running on each host) keep VM ports on non-system VLANs in a blocked state until they can re-establish contact with a VSM.
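For context, a System VLAN is declared inside the port profile on the VSM. A minimal sketch of what that looks like, with a made-up profile name and VLAN ID (assume VLAN 10 carries the AD & SQL servers):

    port-profile type vethernet ServerVlan10
      vmware port-group
      switchport mode access
      switchport access vlan 10
      system vlan 10
      no shutdown
      state enabled

The "system vlan 10" line is what tells the VEM to keep forwarding on that VLAN even before it can reach a VSM; without it, those ports stay blocked as described above.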

Ok, so you put your SQL server into a subnet that lives on a System VLAN. Then you can power up AD and SQL, power up vCenter and the VSMs (both of which should also be on System VLANs), and the VEMs should come online and restore connectivity to the remainder of your VMs.
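One caveat worth noting: the System VLAN also has to be declared on the uplink port profile, or the VEM still can't pass that traffic upstream during recovery. Another hedged sketch, with made-up names and VLAN IDs (10 for the servers, 900 and 901 standing in for the 1000v control and packet VLANs):

    port-profile type ethernet SystemUplink
      vmware port-group
      switchport mode trunk
      switchport trunk allowed vlan 10,900-901
      system vlan 10,900-901
      no shutdown
      state enabled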

However, what is the downside of putting VMs on the System VLAN? If I understand correctly, that means that traffic will bypass any ACLs, QoS configuration, and potentially other features if the VSMs become unreachable. In my book, this isn't a particularly appealing solution either.

Another option is to go ahead and run SQL server on-box with your vCenter VM. I think this is certainly a better option than putting multiple servers on System VLANs. Or, you can do what we did: create a physical vCenter server with on-box SQL server. Then, your recovery process is:
1) Power up the vCenter server & AD server (both physical)
2) Power hosts back up
3) Power up VSM VMs

Sure, it isn't really any shorter in the long run, but it's a heck of a lot easier to document in case you want someone other than yourself to be able to start things back up again.
