3 Real World vSphere Situations to Avoid

Written by Mike Solinap
Published on January 8, 2014

Almost a year ago, I wrote about 10 Pitfalls that Can Impact VMware Performance. I thought I’d revisit this topic, provide some specific situations I’ve encountered over the past year, and explain what I’ve learned from them. You may have already taken the advice of my previous article, but as we’re all aware, the real world will typically bring us unexpected issues.

Here are 3 real world vSphere situations that you will want to avoid:

1. Poor NFS performance

You’ve got powerful hosts, a fast network, and a decent number of spindles on your storage array, yet your IO performance is horrible. Are your datastores mounted via NFS? If so, that may be the culprit. Don’t get me wrong: after a decade of managing Unix systems as part of our engineering services, I’ve learned that NFS is something we can’t live without. For vSphere, NFS provides several key benefits over the alternative, block storage. Compared to Fibre Channel in particular, no storage area network investment is needed, LUNs no longer need to be carved, individual VMDKs are easier to get at, and troubleshooting is simpler (port mirror with Wireshark!). The list goes on.

However, there is one big limitation with NFS on vSphere: the NFS implementation in vSphere seems to support only synchronous writes. This is regardless of how your storage array is exporting the NFS volume.

Why is this problematic? It may or may not be, depending on what your storage array is. Take a NetApp filer, for example. Synchronous writes are NOT a problem, because NetApp uses NVRAM. Writes are considered committed as soon as they are written to NVRAM, rather than waiting for your spindles to write the data. A Linux or BSD based machine with ZFS and an SSD-backed intent log is in a similar situation: the write is acknowledged before the data is committed to the physical disks.

Without the ability to quickly acknowledge the write requests from the vSphere hosts, your overall performance will suffer.
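To get a rough feel for what synchronous writes cost on a given volume, a small comparison like the one below can help. It is only a sketch: it uses `os.fsync` after every block as a stand-in for a client that insists on committed writes, and the function names, block size, and write count are my own illustrative choices, not anything vSphere exposes.

```python
import os
import time


def time_writes(path, count=200, block=b"x" * 4096, sync=False):
    """Write `count` blocks to `path` and return the elapsed seconds.

    With sync=True every block is flushed and fsync'ed before the next
    one is written, which approximates a synchronous-write workload.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
            if sync:
                f.flush()
                os.fsync(f.fileno())
    return time.perf_counter() - start


def compare(directory):
    """Return (buffered_seconds, synchronous_seconds) for the same workload."""
    buffered = time_writes(os.path.join(directory, "buffered.bin"), sync=False)
    synchronous = time_writes(os.path.join(directory, "sync.bin"), sync=True)
    return buffered, synchronous
```

Calling `compare()` on a directory that lives on the storage in question returns the two timings. On storage that can acknowledge writes from NVRAM or an SSD-backed log, I would expect the gap between them to be small; on plain spindles it is usually much larger.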

2. Snapshots Are Not Free

The beauty of virtualizing a machine is that we can take snapshots of the running state, and revert back to them if needed. This comes in extremely handy for testing software, configuration changes, or as an easy fallback when migrating to production as part of your build and release management plan.

Unfortunately, this benefit isn’t “free”. When you initiate a snapshot of a virtual machine, vSphere stops writing changes to the original VMDK file. A new file is created and subsequent changes are appended to this file throughout the life of the snapshot. Chances are, we forget that we created the snapshot in the first place, and this can create some obvious, but also not so obvious repercussions.

Obviously, as the snapshot ages, the delta file will grow. And grow. And grow. Even though you may think your virtual machine is mostly “idle”, things constantly get written to system logs, periodic tasks run, OS updates get applied, etc. If you have your datastores accessible via NFS (hint hint!), run a find for all delta files and see how much space they’ve accumulated. You may be surprised.
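If you would rather script that search than run find by hand, something like the following sketch works against an NFS-mounted datastore. It assumes the snapshot delta files end in `-delta.vmdk`, and `delta_usage` is just a name I picked for illustration; adjust the suffix if your files are named differently.

```python
import os


def delta_usage(datastore_root, suffix="-delta.vmdk"):
    """Walk a mounted datastore and report snapshot delta files.

    Returns (total_bytes, files), where files is a list of
    (path, size_in_bytes) tuples sorted from largest to smallest.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(datastore_root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
    found.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _path, size in found)
    return total, found
```

Point it at wherever the datastore is mounted on your admin machine, and the first entries in the returned list are the snapshots to deal with first.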

Growing snapshots present another issue. At some point, you will want to commit (delete) them. If your snapshots have grown significantly, committing them will generate a HUGE amount of IO. Delete them and cross your fingers. I’ve encountered a situation where I was deleting a large snapshot. It generated enough IO to cause vSphere to time out the operation. The result – a corrupted VM, a saturated network, and a bogged down storage array.
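A toy model makes the cost easier to see. The class below is purely illustrative and is not how vSphere stores data: it keeps blocks in dictionaries, with `base` standing in for the original VMDK and `delta` for the snapshot's delta file. The point is only that every block written during the snapshot's life has to be written again when the snapshot is committed.

```python
class ToyDisk:
    """Toy block device with a single redo-log style snapshot."""

    def __init__(self):
        self.base = {}     # block number -> data (the original VMDK)
        self.delta = None  # the delta file; None means no snapshot exists

    def take_snapshot(self):
        """Freeze the base; later writes go to the delta instead."""
        self.delta = {}

    def write(self, block, data):
        target = self.base if self.delta is None else self.delta
        target[block] = data

    def read(self, block):
        # The newest copy of a block wins, so check the delta first.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base.get(block)

    def commit_snapshot(self):
        """Merge the delta into the base.

        Returns the number of blocks that had to be rewritten, which is
        the extra IO the commit generates.
        """
        if self.delta is None:
            return 0
        rewritten = len(self.delta)
        self.base.update(self.delta)
        self.delta = None
        return rewritten
```

The value returned by `commit_snapshot()` grows with the size of the delta, which is why an old, forgotten snapshot is so much more painful to delete than a fresh one.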

3. Underutilized Memory

If you have an abundance of hardware available, and deep pockets, feel free to skip over this section. Otherwise, you’re likely in the same situation as the rest of us — with limited budgets and hardware sorely needing upgrades.

You may be tempted to splurge on some additional RAM since it’s typically the lowest hanging fruit. Or is it? What if your host has no free RAM slots left? With some simple analysis, it is worth investigating whether or not additional RAM is needed at all.

Although your host says its memory utilization is at 80%, it may not necessarily need any more. vSphere employs many techniques to get the most utilization from the host:

  • Memory deduplication: vSphere will look for identical memory pages, and only keep one copy. Since hosts run multiple, similar machines, running similar applications, there’s good potential for memory savings.
  • Ballooning: This concept is similar to the “swappiness” behavior of a Linux machine. The idea is that if pages in memory aren’t being accessed often or at all, they can be paged to disk so that memory is freed up. However, vSphere goes one step further and attempts to control or force this paging by employing a balloon driver. Installed with VMware Tools, the driver starts consuming memory within the guest, which forces the guest to decide for itself what is best paged to disk.
  • Memory Compression: vSphere has the ability to check how compressible a memory page is. If the page can be compressed by more than 50%, vSphere compresses it. Compressing a memory page still outperforms having to swap it to disk.
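Two of these techniques are easy to illustrate in a few lines. The sketch below is not vSphere’s implementation: it uses a SHA-256 hash to spot identical pages and zlib as a stand-in compressor, and the 4 KB page size and function names are assumptions made for the example.

```python
import hashlib
import zlib

PAGE_SIZE = 4096  # assumed page size for this illustration


def shared_page_savings(pages):
    """Return how many pages could be dropped if identical pages were kept once.

    A content hash stands in for the page comparison here.
    """
    unique = {hashlib.sha256(page).digest() for page in pages}
    return len(pages) - len(unique)


def worth_compressing(page, threshold=0.5):
    """True if zlib shrinks the page to less than `threshold` of its size.

    With the default threshold this mirrors the "more than 50%" rule above.
    """
    return len(zlib.compress(page)) < len(page) * threshold
```

For example, a page of zeros passes `worth_compressing()`, while a page of random bytes does not, and a list containing three zero pages plus one random page yields a saving of two pages from `shared_page_savings()`.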

If, after looking at the performance metrics for each of these techniques, you see that the guest is still swapping to disk, then it’s likely that you do in fact need more RAM. However, if new RAM is out of the question for whatever reason, vSphere can take advantage of an SSD and use it as your swap space. With SSDs dropping in price month after month, this may be your biggest bang for your buck. Additionally, if you are licensed for Enterprise Plus, the SSD can also be used as a read cache, reducing IO on your storage array even further.

Hopefully you’re fortunate enough to avoid situations like these. Do you have any unique situations you’d like to share, or feedback on the ones I’ve mentioned? I’d like to hear them!

Michael Solinap
Sr. Systems Integrator
SPK & Associates
