How to Select a Placement Policy for Site-Aware Clusters

One of the more popular failover clustering enhancements in Windows Server 2016 and 2019 is the ability to define the different fault domains in your infrastructure. A fault domain lets you scope a single point of failure in hardware, whether this is a Hyper-V host (a cluster node), its enclosure (chassis), its server rack or an entire datacenter. To configure these fault domains, check out the Altaro blog post on configuring site-aware clusters and fault domains in Windows Server 2016 & 2019. After you have defined the hierarchy between your nodes, chassis, racks, and sites then the cluster’s placement policies, failover behavior, and health checks will be optimized. This blog will explain the automatic placements policies and advanced settings you can use to maximize the availability of your virtual machines (VMs) with site-aware clusters.

Site-Aware Placement Based on Storage Affinity

From reading the earlier Altaro blog about fault-tolerance, you may recall that the resiliency is created by distributing identical (mirrored) storage spaces direct (S2D) disks across the different fault domains.  Each node, chassis, rack or site may contain a copy of a VM’s virtual hard disks. However, you always want the VM to be in the same site as its disk for performance reasons to avoid having the I/O transmitted across distance. In the event that a VM is forced to start in a separate site from its disk, then it will automatically live migrate the VM to the same site as its disk after about a minute.  With site-awareness, the automatic enforcement of storage affinity between a VM and its disk is given the highest site placement priority.

Configuring Preferred Sites with Site-Aware Clusters

If you have configured multiple sites in your infrastructure, then you should consider which site is your “primary” site and which should be used as a backup. Many organizations will designate their primary site as the location closest to their customers or with the best hardware, and the secondary site as the failover location which may have limited hardware to only support critical workloads.  Some enterprises may deploy identical datacenters, and distribute specific workloads to each location to balance their resources. If you are splitting your workloads across different sites you can assign each clustered workload or VM (cluster group) a preferred site. Let’s say that you want your US-East VM to run in your primary datacenter and your US-West VM to run in your secondary datacenter, you could configure the following settings via PowerShell:

Designating a preferred site for the entire cluster will ensure that after a failure that the VMs will start in this location. After you defined your sites by creating a New-ClusterFaultDomain you can use the cluster-wide property PreferredSite to set the default location to launch VMs. Below is the PowerShell cmdlet:

Be aware of your capacity if you are usually distributing your workloads across two sites and they are forced to run in a single location as performance will diminish with less hardware. Consider using the VM prioritization feature and disabling automatic VM restarts after a failure, as this will ensure that only the most important VMs will run. You can find more information from this Altaro blog on how to configure start order priority for clustered VMs.

To summarize, placement priority is based on:

  • Storage affinity
  • Preferred site for a cluster group or VM
  • Preferred site for the entire cluster

Site-Aware Placement Based on Failover Affinity

When site-awareness has been configured for a cluster, there are several automatic failover policies that are enforced behind the scenes. First, a clustered VM or group will always failover to a node, chassis or rack within the same site before it moves to a different site. This is because local failover is always faster than cross-site failover since it can bring the VM online faster by accessing the local disk and avoid any network latency between sites. Similarly, site-awareness is also honored by the cluster when a node is drained for maintenance. The VMs will automatically move to a local node, rather than a cross-site node.

Cluster Shared Volumes (CSV) disks are also site-aware. A single CSV disk can store multiple Hyper-V virtual hard disks while allowing their VMs to run simultaneously on different nodes.  However, it is important that these VMs are all running on nodes within the same site. This is because the CSV service coordinates disk write access across multiple nodes to a single disk. In the case of Storage Spaces Direct (S2D), the disks are mirrored so there are identical copies running in different locations (or sites). If VMs were writing to mirrored CSV disks in different locations and replicating their data without any coordination, it could lead to disk corruption. Microsoft ensures that this problem never occurs by enforcing all VMs which share a CSV disk to run on the local site and write to a single instance of that disk. Furthermore, CSV distributes the VMs across different nodes within the same site, balancing the workloads and write requests to that coordinate node.

Site-Aware Health Checks and Cluster Heartbeats

Advanced cluster administrators may be familiar with cluster heartbeats, which are health checks between cluster nodes. This is the primary way in which cluster nodes validate that their peers are healthy and functioning. The nodes will ping each other once per predefined interval, and if a node does not respond after several attempts it will be considered offline, failed or partitioned from the rest of the cluster. When this happens, the host is not considered an active node in the cluster and it does not provide a vote towards cluster quorum (membership).

If you have configured multiple sites in different physical locations, then you should configure the frequency of these pings (CrossSiteDelay) and the number of health check which can be missed (CrossSiteThreshold) before a node is considered failed. The greater the distance between sites, the more network latency will exist, so these values should be tweaked to minimize the chances of a false failover during times when there is high network traffic. By default, the pings are sent every 1 second (1000 milliseconds) and when 20 are missed, a node is considered unavailable and any workloads it was hosting will be redistributed. You should test your network latency and cross-site resiliency regularly to determine whether you should increase or reduce these default values. Below is an example to change the testing frequency from every 1 second to 5 seconds and the number of missed responses from 20 to 30.

By increasing these values, it will now take longer for a failure to be confirmed and failover to happen resulting in greater downtime. The default time is 1-second x 20 misses = 20 seconds, and this example extends it to 5 seconds x 30 misses = 150 seconds.

Site-Aware Quorum Considerations

Cluster quorum is an algorithm that clusters use to determine whether there are enough active nodes in the cluster to run its core operations. For additional information, check out this series of blogs from Altaro about multi-site cluster quorum configuration.  In a multi-site cluster, quorum becomes complicated since there could be a different number of nodes in each site. With site-aware clusters, “dynamic quorum” will be used to automatically rebalance the number of nodes which have votes. This means that as clusters nodes drop out of membership, the number of voting nodes changes. If there are two sites with an equal number of voting nodes, then the group of nodes that are assigned to be the preferred site will stay online and run the workloads, while the lower priority site will reduce their votes and not host any VMs.

Windows Server 2012 R2 introduced a setting known as the LowerQuorumPriorityNodeID, which allowed you to set a node in a site as the least important, but this was deprecated in Windows Server 2016 and should no longer be used. The idea behind this was to easily declare which location was the least important when there were two sites with the same number of voting nodes. The site with the lower priority node would stay offline while the other partition would run the clustered workloads. That caused some confusion since the setting was only applied to a single host, but you may still see this setting referenced in blogs such as Altaro’s https://www.altaro.com/hyper-v/quorum-microsoft-failover-clusters/.

The site-awareness features added to the latest version of Window Server will greatly enhance a cluster’s resilience through a combination of user-defined policies and automatic actions. By creating the fault domains for clusters, it is easy to provide even greater VM availability by moving the workloads between nodes, chassis, racks, and sites as efficiently as possible. Failover clustering further reduces the configuration overhead by automatically applying best practices to make failover faster and keep your workloads online for longer.

Wrap-Up

Useful information yes? How many of you are using multi-site clusters in your organizations? Are you finding it easy to configure and manage? Having issues? If so, let us know in the comments section below! We’re always looking to see what challenges and successes people in the industry are running into!

Thanks for reading!


Go to Original Article
Author: Symon Perriman