Friday, May 3, 2024

Twilio

Programmatically sending messages sure has gotten significantly more complicated. This page is a placeholder for opt-in numbers, which will be added as comments.
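For my own reference, a minimal sketch of sending a single message with Twilio's Python helper library; the account SID, auth token, and phone numbers are placeholders, not values tied to this page:

# Minimal sketch using the Twilio Python helper library (pip install twilio).
# Credentials and numbers below are placeholders - substitute your own.
from twilio.rest import Client

client = Client("ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "your_auth_token")

message = client.messages.create(
    body="Thanks for opting in. Reply STOP to opt out.",
    from_="+15005550006",   # a Twilio number you own
    to="+15005550123",      # a recipient who has opted in
)
print(message.sid)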

Monday, July 23, 2012

Why is Cisco Licensing so terrible?

Well, it's been a while since I've written anything here. Since my last post, a lot of things have happened. One of those things perfectly illustrates a poorly thought out aspect of Cisco Licensing, especially for Unified Communications running on VMware. I've believed forever that using the Primary DNS and NTP servers as part of the License MAC generation was a bad idea, but I never really thought about how unpleasant it could be.

I recently traveled to Brazil to deploy our UC-on-UCS environment there, and built out all of the servers. When I first arrived onsite, I contacted our systems team and requested they send me the DNS/NTP information that I would need to use for the build (and licensing). They sent me the information without any hassle, which was a huge bonus. Fast forward: installation is done, I'm back home, and I set up RTMT. Lo and behold, I'm getting alarms that the Primary NTP server is inaccessible. Hmmm, that's odd, but sure enough, I can't ping it. So, I contact the systems team and ask what's wrong with the NTP server. Oh, there isn't anything wrong with it, they just used a different IP address when they installed it. Wait, what? But you sent me the IP information while I was onsite, how did it change so quickly? Well, they actually built out the NTP server while I was onsite, since it wasn't already live. When they did, they realized that they had allocated an IP address from the DMZ instead of the internal network, and so they had to assign a different one...

So now, here I sit with a completely installed, in-production UC system that needs a DNS/NTP server change, and making that change will invalidate my licenses. Awesome. Better yet, Cisco's Answer File Generator bites, for multiple reasons:
1) When I change from CUCM to CUC, for some inexplicable reason, the AFG changes from Virtual Machine to Physical Server. Why? If this setting is tied to the Product, then the Product should be listed first, not second.
2) I need to generate a License MAC for a redundant Unity Connection server, and I've never been able to get this to work right. If I list it as a second node, the AFG shows the same License MAC for both boxes, which certainly isn't right. If I instead try to create the second node using the regular settings, the resultant License MAC doesn't match what the production box has now, so obviously it's not going to match after I change my settings.

Cisco: this is stupid. Please remove the License MAC tie to NTP/DNS (and SMTP Location); it's an absurd dependency on settings that are not always under my control. I've worked for several companies now where these functions are managed by other teams, and communication is never as good as it should be. The other setting that I think is dumb: NIC Speed/Duplex. Granted, this setting should never change in a virtual environment, but if it does have to change for some unusual reason, it's a really dumb reason to have to get a new license.
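For context on why this is so painful: the License MAC isn't a real NIC address, it's a value derived from a set of platform settings (hostname, IP details, primary DNS, NTP, SMTP location, NIC speed/duplex, time zone), so changing any one of them produces a different License MAC and invalidates the existing license files. Here is a rough conceptual sketch in Python - this is not Cisco's actual algorithm, and the field names and values are made up purely for illustration:

# Conceptual sketch only - NOT Cisco's actual License MAC algorithm.
# It just illustrates why changing a single input (e.g. the primary NTP
# server) produces a completely different "License MAC".
import hashlib

def license_mac(settings):
    """Hash the platform settings down to a 12-hex-digit identifier."""
    blob = "|".join(f"{key}={settings[key]}" for key in sorted(settings))
    return hashlib.sha256(blob.encode()).hexdigest()[:12]

settings = {
    "hostname": "cucm-pub",
    "ip_address": "10.1.1.10",
    "primary_dns": "10.1.1.5",
    "primary_ntp": "10.1.1.6",      # the value the systems team changed on me
    "smtp_location": "10.1.1.7",
    "nic_speed_duplex": "auto/auto",
    "timezone": "America/Sao_Paulo",
}

print(license_mac(settings))        # the License MAC the licenses were issued against
settings["primary_ntp"] = "10.2.2.6"
print(license_mac(settings))        # after the NTP change - no longer matches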

At least fix the crummy AFG so that it actually works.


Now, let's move on to Tandberg Licensing. We ordered 5 C40s and an EX60 for our offices in Brazil, and for some reason, we are missing SmartNet on them. Either we somehow didn't buy it (even though it was on the Quote), or our VAR jacked something up. That part is still a mystery that will apparently never be solved - no one can seem to produce anything showing what we actually bought, and our account folks from the VAR speak minimal English, so they can't even understand what we are trying to tell them. So, whatever, I wrote it off as a lost cause; I'll fix it next year when I renew everything else.

So, I fire these bad boys up, only to discover that they are loaded with 4.x software. Ok, no problem, I'll just drop 5.x software on them and get a new release key - after all, they just arrived in the mail, so they should be under warranty at least. Guess what. Nope. Cisco (or Tandberg) apparently starts the warranty ticking the day that these things are ordered, and by the time you add in even the most insignificant lead time, these things arrive on the doorstep with an expired warranty. Thankfully, I was able to bug enough people to get the appropriate release keys (thank you to whoever the product manager is), but come on - this is ridiculous.

First of all, in CCW, the only software that is even selectable is 5.x software, but they ship with 4.x software, and then you get all twitchy when I try to order the "upgrade"? Then the warranty is expired before I even get them? What kind of shady deal are you running here, Cisco? Thinking this was an anomaly, and something unique to international orders, we ordered two additional EX60s in the US. Guess what? The exact same thing happened. Shipped with 4.x software, warranty expired before they even arrived! The VAR hadn't registered SmartNet yet, so I didn't have contract information. I was not happy, to say the least. So, I complained again, and again received new keys (again, thanks to the product manager). C'mon Cisco, this is dumb. I don't care if the unit is covered by warranty or SmartNet - if the product is only orderable with 5.x software, I should get a 5.x release key in the box; I shouldn't have to beg for it.

Thursday, October 27, 2011

It's Been Awhile...

It's been a while since I've posted here, and that is primarily because of a job change. Although there are all sorts of negative things I could say about my previous employer, I will instead focus on the positive aspects of my new position. Instead of being solely responsible for practically everything in the enterprise, I am now the Senior Engineer over Global Telephony. There are dedicated teams handling Network, Security, and Servers, and that allows me to focus my efforts and produce quality work. So, that being said, I will be posting more about Telephony, and less about other Data Center stuff.

So, what is going on in my life? New CUCM Cluster in London, planning a new CUCM Cluster for Brazil, a CUCMBE installation in Singapore, and a CUCM Migration here in the States. That should keep me busy for a couple of months...

Monday, July 18, 2011

VMware hates its loyal customers

Now that vSphere 5 has officially been announced, has anyone else reviewed the licensing changes [PDF]? They are changing the model so that each socket license carries a vRAM entitlement, and the total vRAM of your powered-on VMs is capped by the pooled entitlements. So it works out like this: per socket license, these are the vRAM entitlements for each edition:
- 24GB vRAM for Essentials Kit
- 24GB vRAM for Essentials Plus Kit
- 24GB vRAM for Standard
- 32GB vRAM for Enterprise
- 48GB vRAM for Enterprise Plus

Let's say that you are running an 8-node cluster, each host with two sockets and 96GB of physical RAM, and each socket licensed for Enterprise Plus. That means you are now entitled to 768GB of vRAM (16 licenses x 48GB). Now let's say that you use a script, such as this one, to determine how much vRAM is in use in your current environment. If the answer is >768GB, you are now out of compliance. Let's say that you have fairly low consolidation ratios and you are consuming 1024GB of vRAM. That means you need to purchase an additional 256GB of vRAM licensing, which equates to 6 additional Enterprise Plus licenses. No one pays list, but for the sake of argument, let's assume $4,229 per license, which equals $25,374. For a version upgrade. That you are probably also paying SnS on for the "privilege" of software subscription, which I always thought meant you received free upgrades. Guess not.
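To put numbers on it, here's the back-of-the-envelope math as a quick Python sketch (assuming dual-socket hosts and list pricing; plug in your own numbers):

# Back-of-the-envelope vRAM licensing math for the example above.
# Assumes dual-socket hosts and list pricing - adjust for your environment.
hosts = 8
sockets_per_host = 2
vram_per_license_gb = 48            # Enterprise Plus entitlement per socket license
list_price_per_license = 4229

licenses_owned = hosts * sockets_per_host
entitled_gb = licenses_owned * vram_per_license_gb       # 768 GB pooled entitlement
in_use_gb = 1024                                         # reported by the vRAM-usage script

overage_gb = max(0, in_use_gb - entitled_gb)             # 256 GB over the pool
extra_licenses = -(-overage_gb // vram_per_license_gb)   # ceiling division -> 6
print(f"Entitled: {entitled_gb} GB, in use: {in_use_gb} GB, over by: {overage_gb} GB")
print(f"Additional Enterprise Plus licenses: {extra_licenses}, "
      f"cost at list: ${extra_licenses * list_price_per_license:,}")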

VMware is punishing its existing customers for its own failure to have a well-designed licensing mechanism in place. I understand they want to make more money, but they should not be dipping back into the pockets of existing customers who are already paying SnS maintenance - changing the licensing scheme with every major version will eventually cost them customers. This new licensing mechanism seems to be here only to give cloud providers another billing mechanism, and to punish enterprise customers who actually want to virtualize memory-bound applications.

Thursday, May 26, 2011

Nexus 5000 - FWM-2-STM_LOOP_DETECT

In a previous post, I mentioned problems we were having with one of our Nexus 5000 switches. During all of the Nexus 1000v issues, it was throwing these messages continually:

2011 Mar 29 05:22:13 N5K-2 %FWM-2-STM_LEARNING_RE_ENABLE: Re enabling dynamic learning on all interfaces
2011 Mar 29 05:22:20 N5K-2 %FWM-2-STM_LOOP_DETECT: Loops detected in the network among ports Eth1/10 and Eth1/2 vlan 801 - Disabling dynamic learn notifications for 180 seconds

I couldn't tell if it was actually affecting anything, since VLAN 801 was being used as an FCoE VLAN. Looking at the MAC addresses bound to VLAN 801 would reveal one MAC address in particular that would move around:

N5K-2(config)# sho mac add vlan 801
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY    Ports
---------+-----------------+--------+---------+------+----+------------------
* 801      0005.9b7a.8800    dynamic  0           F    F     Eth1/7

But what was this MAC address? After a little bit of digging, I finally found it:

N5K-2(config)# sho int mgmt0
mgmt0 is down (Link not connected)
Hardware: GigabitEthernet, address: 0005.9b7a.8800 (bia 0005.9b7a.8800)

There it is, but why is it being learned? The interface is down (and the MAC remained even when it was administratively down). After opening a TAC case and collecting debugs over and over again, the engineer opened a bug for the issue, which was then terminated. Solution? Stop the leak of the MAC address and reload the switch. Fun.
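For anyone chasing a similar wandering MAC, it can help to poll the switch and watch which port the address is being learned on over time. A rough sketch using Netmiko - the hostname and credentials are placeholders, and the parsing assumes the output format shown above:

# Rough sketch: poll a Nexus 5000 and report where a suspect MAC is learned.
# Hostname/credentials are placeholders; parsing assumes the output format above.
import time
from netmiko import ConnectHandler   # pip install netmiko

SUSPECT_MAC = "0005.9b7a.8800"

switch = {
    "device_type": "cisco_nxos",
    "host": "n5k-2.example.com",
    "username": "admin",
    "password": "changeme",
}

with ConnectHandler(**switch) as conn:
    for _ in range(10):                     # ten samples, 30 seconds apart
        output = conn.send_command("show mac address-table vlan 801")
        for line in output.splitlines():
            if SUSPECT_MAC in line:
                port = line.split()[-1]     # last column is the port
                print(f"{time.strftime('%H:%M:%S')}  {SUSPECT_MAC} learned on {port}")
        time.sleep(30)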

But how did this come about? Well, before we installed our FCoE storage, we were using vPCs off of the Nexus 5000s to each of our ESX hosts. When we installed the storage, EMC told us we could not use FCoE across our vPCs. I can't find any evidence that this advice was accurate, but it is what it is. The Management ports had originally been assigned internally routable IP addresses, and the vPC Keep-Alive was built on top of that. That means that the default VRF would have learned the address from the management VRF at some point in time. Upon removing the vPC configuration, it apparently just never un-learned it.

More Dell PowerEdge M1000e woes

I previously commented about issues we had with one of the pass-through I/O modules with our M1000e chassis. After opening a case with support, they had us do some things such as remove the blades, remove the modules, etc., and it started working. Still not a particularly promising sign. After building out our ESX servers and trying to put VMs on them, we had all kinds of unusual issues trying to run FCoE Active/Active on them. We were getting errors such as:

Apr 23 15:02:04 host vmkernel: 0:00:39:45.582 cpu0:4284)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic2: transmit timed out
Apr 23 15:02:04 host vmkernel: 0:00:39:45.854 cpu3:4260)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.60060060060060060060060060060060".
Apr 23 15:02:04 host vmkernel: 0:00:39:45.854 cpu4:4258)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.60060060060060060060060060060060".
Apr 23 15:02:04 host vmkernel: 0:00:39:45.854 cpu10:4255)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.60060060060060060060060060060060".

There was another blog post documenting a similar issue with other Emulex cards that seemed to point to an incompatibility between the Emulex drivers and the current version of ESX. Dell did not have an update available, and they reached the same conclusion - that they were incompatible. So, they RMA'd the I/O modules and the Emulex Mezzanine cards, replacing them with Brocade Mezzanine cards and the corresponding pass-through I/O modules. So far, things have been better. We'll see if it holds.

Wednesday, April 6, 2011

Nexus 1000v and Cisco Support

After writing my previous posts about my love/hate relationship with Nexus 1000v, I received a phone call from the Cisco Nexus 1000v Product Manager. I can only guess that he tracked me down because I posted a Bug ID in there. Regardless, he was very interested in making sure that my issues were resolved, and he pulled some resources together to help me out.

I needed the help because my Secondary VSM had started into a reboot loop. Even deploying a fresh VSM would do the same thing after the Config Sync happened. While pulling some debugs off of the busted VSM, somehow 6 of our VEMs (Hosts) unregistered with the Primary VSM. My TAC Engineer was out of the office, but an Engineer from the 1000v Escalation Team got on the phone with me, and started digging around. What he found was this: a 3750 switch, home to several Development ESX Hosts, using a port-channel connected via vPC to our Nexus 7000 switches. The only traffic allowed across this port-channel was the Control/Packet/Management VLANs. Something on the 3750 was causing a broadcast loop in the network. After the Engineer pointed out the issue, shutting down the ports connected to the 3750 instantly resolved the connectivity issues - looks like I need to revisit my spanning-tree and port-channel configurations to see where I went astray.

During the course of all of this, we were trying to add the Dell Chassis that I wrote about previously. The stupid VEMs (hosts/blades) just wouldn't register with the Nexus 1000v VSMs. Why? After spending some more time on the phone with my TAC Engineer - who I'm sure was quite sick of hearing from me - more wonderful news was discovered: the firmware version on the Emulex OneConnect Mezzanine cards (2.702.200.17) apparently has an issue processing tagged multicast/broadcast traffic. Excellent news! Thanks, Dell, for shipping me cards in brand new servers that have firmware old enough to have major issues that are at least somewhat well known. I lucked out by having a TAC Engineer who has supported Nexus 5000 switches as well, and had run into this exact issue before. I don't even want to think about how long it may have taken to resolve this problem had she not been familiar with it. And sure enough, after upgrading the firmware (which was another fun process - I'll write about it later), the Dell VEMs happily registered with the VSM.

To be fair, I should have caught the issue with the Emulex cards. We bought PCI Express versions with the same chipset about 6 months before. Inside each of the boxes were big warning messages stating not to use the cards without first updating the firmware. Apparently it's too much to ask Emulex to do that before sending them out of the factory. It's probably because there is one customer who is still using that original firmware version; maybe it's not possible to downgrade the cards that far once they have been upgraded. Regardless, it's substantially annoying.

Oh, yeah, then there was that other Nexus 5000 issue - I'll get to that in another post.