
The Do’s and Don’ts of VM Mobilization: Your Rulebook

Image Credit: monstij/bigstockphoto.com

Of the many capabilities the cloud provides, one of the most notable is overseeing the physical servers that run virtual machines (VMs) so humans don’t have to. Virtual machines are software computers that run an operating system and applications, with functionality similar to that of a physical computer. The difference lies in how they are built: a VM is ultimately a set of computer files that runs on a physical computer but behaves as a separate computer system, and it is created within a computing environment called a host. Multiple VMs may exist on one host at the same time. In theory, your VM should keep your functions executing as designed while maintaining appropriate service levels, so a transfer to a different host should not be a concern.

That said, effective VM mobilization remains quite complex in practice, despite the flexibility and efficiency it provides in the long term. There are important “what-ifs” to consider, including restrictions that limit VM mobility, the question of who is responsible for evacuation strategies, and technical limitations.

For example, telco operators run their private clouds under a multitude of restrictions (e.g. traditional barriers, customer requirements, regulations) that limit the freedom to automatically mobilize a VM to a new location. Telco applications often include parameters such as CPU pinning, SR-IOV, affinity/anti-affinity and no oversubscription, which greatly restrict the options to migrate a VM, either live or cold, or to evacuate it to a different host.

The evacuation option raises a debate about responsibility: who is permitted to change a VM’s location? In the same vein, is evacuation within the mandate of the infrastructure layer, or should it be approved, or even triggered, by the application layer?

There are four key points to consider when approaching automatic host evacuation.

First, we need to define the cases in which we prefer to evacuate VMs from their source host to a target host. It is hard to set well-defined rules for triggering evacuation from the infrastructure layer. One common case is host failure, where the host becomes unavailable and ultimately stops functioning. Here you need to balance the desire to act fast, by triggering evacuation shortly after failure detection, against the willingness to accept many false positives. A host may be only temporarily unavailable because of a server restart or passing connectivity issues, and acting fast in those situations may lead to unnecessary evacuation. Waiting longer to validate that the host failure is not temporary can help prevent those false positives.
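The trade-off between acting fast and avoiding false positives can be sketched as a simple confirmation window. The example below is a minimal illustration, not a description of any particular product: the class name, the `grace_period_s` parameter and the probe interface are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HostFailureDetector:
    """Declares a host failed only after it stays unreachable for a full grace period."""

    grace_period_s: float  # hypothetical tuning knob: longer means fewer false positives
    _down_since: Optional[float] = field(default=None, init=False)

    def observe(self, reachable: bool, now_s: float) -> bool:
        """Record one health probe; return True when evacuation should be triggered."""
        if reachable:
            # The host answered, so any earlier outage was temporary: reset the timer.
            self._down_since = None
            return False
        if self._down_since is None:
            # First failed probe of this outage: start the timer.
            self._down_since = now_s
        return now_s - self._down_since >= self.grace_period_s


detector = HostFailureDetector(grace_period_s=60)
detector.observe(False, now_s=0)   # outage starts, no evacuation yet
detector.observe(True, now_s=30)   # host came back (e.g. a restart), timer resets
```

A short grace period reacts quickly but evacuates on every reboot or network blip; a long one avoids that at the cost of longer downtime when the failure is real.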

Second, you need to ensure that you can evacuate and re-allocate all VMs successfully. Today, evacuation and placement are separate actions, and when evacuating you cannot be certain that a placement will be found for every evacuated VM.

One way to circumvent this problem is to reserve a dedicated pool of empty servers as optional targets for evacuation. Alternatively, you could implement tools that simulate placement before evacuation. Lastly, consider whether you want to evacuate all VMs on the source host or evacuate selectively, which reduces the number of VMs to place and increases the chances that placement succeeds.
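As a rough illustration of simulating placement before evacuation, the sketch below tries to fit each VM onto a reserved pool using a greedy largest-first heuristic and reports which VMs would be left without a target. It assumes vCPU count is the only resource and that no oversubscription is allowed; the function name and the dictionary-based inputs are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate_placement(
    vm_vcpus: Dict[str, int], spare_free_vcpus: Dict[str, int]
) -> Tuple[Dict[str, str], List[str]]:
    """Dry-run an evacuation: return (vm -> target host, VMs with no target)."""
    remaining = dict(spare_free_vcpus)  # work on a copy, the real pool is untouched
    placed: Dict[str, str] = {}
    unplaced: List[str] = []
    # Place the largest VMs first, since they are the hardest to fit.
    for vm, need in sorted(vm_vcpus.items(), key=lambda kv: kv[1], reverse=True):
        target = next((h for h, free in remaining.items() if free >= need), None)
        if target is None:
            unplaced.append(vm)
        else:
            remaining[target] -= need  # no oversubscription: capacity is consumed
            placed[vm] = target
    return placed, unplaced


placed, unplaced = simulate_placement(
    {"vm-a": 8, "vm-b": 8}, {"spare-1": 8}
)
# Only one of the two VMs fits, so a full evacuation would not succeed as-is.
```

If the dry run leaves VMs unplaced, that is the signal to evacuate selectively or grow the reserved pool. Because the heuristic is greedy, it can also report a failure where a smarter packing exists, so a real tool would need to account for that along with memory, affinity rules and other constraints.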

A third important hurdle is technical limitations, such as migrating a running VM that uses SR-IOV or CPU pinning. If you understand those limitations, you can plan ahead for what can and can’t be done and so mitigate future issues.
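Planning ahead can be as simple as recording, per VM, which mobility action is realistic. The sketch below is a hypothetical policy rather than a statement about any specific platform: whether live migration works with SR-IOV or pinned CPUs depends on your infrastructure, so those capabilities are passed in as flags that the operator must set.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LIVE_MIGRATE = "live-migrate"
    COLD_MIGRATE = "cold-migrate"


@dataclass(frozen=True)
class VmProfile:
    name: str
    uses_sriov: bool = False
    cpu_pinned: bool = False


def plan_action(
    vm: VmProfile,
    live_sriov_supported: bool = False,   # assumption: set per platform
    live_pinned_supported: bool = False,  # assumption: set per platform
) -> Action:
    """Pick the least disruptive action the platform is assumed to support."""
    if vm.uses_sriov and not live_sriov_supported:
        return Action.COLD_MIGRATE
    if vm.cpu_pinned and not live_pinned_supported:
        return Action.COLD_MIGRATE
    return Action.LIVE_MIGRATE


fleet = [VmProfile("web"), VmProfile("packet-core", uses_sriov=True, cpu_pinned=True)]
plan = {vm.name: plan_action(vm).value for vm in fleet}
```

Building such a plan before an incident makes it clear which VMs will need downtime, and therefore which application owners should be involved in the decision.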

The fourth and final point is recurring evacuation. Ensuring that the first evacuation succeeds is only the first part. The second is thinking through and planning for multiple evacuations, which requires more complex tools that keep optional targets available at all times.
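One way to reason about recurring evacuation is to ask how many successive host failures the reserved capacity can absorb. The sketch below is a deliberately coarse estimate under stated assumptions: it counts only vCPUs, treats the spare pool as one aggregate number (ignoring fragmentation across spare hosts), and assumes the most heavily loaded hosts fail first. The function name and inputs are hypothetical.

```python
from typing import Dict


def evacuation_headroom(host_load_vcpus: Dict[str, int], spare_vcpus: int) -> int:
    """Return how many worst-case host failures the spare capacity can absorb."""
    remaining = spare_vcpus
    absorbed = 0
    # Worst case: hosts fail in order of decreasing load.
    for load in sorted(host_load_vcpus.values(), reverse=True):
        if load > remaining:
            break  # the next failure could not be fully evacuated
        remaining -= load
        absorbed += 1
    return absorbed


headroom = evacuation_headroom({"h1": 16, "h2": 12, "h3": 8}, spare_vcpus=30)
```

Re-running a check like this after every evacuation shows whether the pool needs to be replenished before the next failure, rather than discovering the shortfall during it.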

When it comes to VM evacuations, the topic of responsibility is not trivial. The infrastructure layer can and should monitor resources and send notifications to get ahead of a host failure or any other infrastructure failure. The problem, however, is that the infrastructure layer often lacks knowledge about the VM itself, which forces organizations to rely on certain people as the key decision makers. We see organizations approach evacuations in many different ways, and we find it’s important to consider the true limitations, as well as the capabilities, of your specific infrastructure rather than mirroring an approach that may not work for your organization.

Regardless of who you serve and the role VMs play in your business, you will be better able to navigate the somewhat complicated process of VM mobilization if you have a strategy in place for potential roadblocks.

Author

With 8 years of experience as a product manager in Telco Cloud, Ohad Shamir is the product manager in the Cloud Infra product team in Nokia Software, spanning the CloudBand Infrastructure Software and Nokia Container Services products. In addition, he leads the security domain for the Cloud Infra team.
