Hyper-V Replica to the Rescue!

Power outages are not infrequent where I live (something I find quite confounding – to be honest).  Earlier this week we had an extended outage, and my Hyper-V servers were powered off uncleanly.  When the power returned I had to sit down and make sure that everything came back correctly.

At first glance – everything looked good.  My Hyper-V servers powered up happily and started up all the virtual machines.  Hyper-V Replica reported that it was in a critical state – but it automatically scheduled resynchronization for all of my virtual machines.  But as I was going through the virtual machines – I found a problem.

Something had gone wrong with my firewall.

I could not figure out what exactly was wrong – but it was using 100% CPU and not allowing any network traffic through.  I shut down the virtual machine cleanly and restarted it – but no dice.  It still would not work.  Thankfully, there was a simple solution.

I shut down the misbehaving virtual machine and started up the replica version of it.  This came up with no problems and started functioning correctly – as it had not been powered off uncleanly.  Yay!

Now, there is a key point to make here:  if I had performed a planned failover in Hyper-V (select the primary virtual machine and perform a planned failover) this would not have worked.  Hyper-V would have copied across the outstanding (bad) changes and would have broken my replica virtual machine too.  What I actually did was go straight to the replica virtual machine and select to perform a failover (not planned).  By doing this, Hyper-V did not copy across the latest data and everything worked.

At the end of this process I reversed the replication relationship and was good to go.
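
For readers who prefer PowerShell, the steps above can be sketched roughly as follows.  This is a sketch under assumptions, not a record of what I actually ran: the virtual machine name is hypothetical, the comments say which server each command is meant for, and the exact order of committing the failover and reversing replication is worth checking against the Hyper-V Replica documentation for your version.

```powershell
# Hypothetical name; substitute the virtual machine that is misbehaving.
$vmName = 'Firewall'

# --- On the primary server ---
# Make sure the misbehaving virtual machine is off.
Stop-VM -Name $vmName

# --- On the Replica server ---
# Unplanned failover: run Start-VMFailover directly against the replica,
# without first running "Start-VMFailover -Prepare" on the primary.
# Skipping the prepare step is what avoids copying the latest (bad) changes.
Start-VMFailover -VMName $vmName
Start-VM -Name $vmName

# Once the replica is confirmed healthy, commit the failover
# and reverse the replication relationship.
Complete-VMFailover -VMName $vmName
Set-VMReplication -VMName $vmName -Reverse
```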


Backup of a Replica VM

This blog post covers the scenarios and motivations that drive the backup of a Replica VM, and product guidance to administrators.

Why back up a Replica VM?

Ever since the advent of Hyper-V Replica in Windows Server 2012, customers have been interested in backing up the Replica VM. Traditionally, IT administrators have taken backups of the VM that contains the running workload (the primary VM) and backup products have been built to cater to this need. So when a significant proportion of customers talked about the backup of Replica VMs, we were intrigued. There are a few key scenarios where backup of a Replica VM becomes useful:

  1. Reduce the impact of backup on the running workload:   Taking the backup of a VM involves the creation of a snapshot/diff-disk to baseline the changes that need to be backed up. For the duration of the backup job, the workload is running on a diff-disk and there is an impact on the system when that happens. By offloading the backup to the Replica site, the running workload is no longer impacted by the backup operation. Of course, this is applicable only to deployments where the backup copy is stored on the remote site. For example, the daily backup operation might store the data locally for quicker restore times, but the monthly or quarterly backups for long-term retention, which are stored remotely, can be taken from the Replica VM.
  2. Limited bandwidth between sites:   This is typical of Branch Office-Head Office (BO-HO) deployments where there are multiple smaller remote branch office sites and a larger central Head Office site. The backup data for the branch offices is stored in the head office, and an appropriate amount of bandwidth is provisioned by administrators to transfer the backup data between the two sites. The introduction of disaster recovery using Hyper-V Replica creates another stream of network traffic, and administrators have to re-evaluate their network infrastructure. In most cases, administrators either could not or were not willing to increase the bandwidth between sites to accommodate both backup and DR traffic. However, they did come to the realization that backup and DR were independently sending copies of the same data over the network – and this was an area that could be optimized. With Hyper-V Replica creating a VM in the Head Office site, administrators could save on the network transfer by backing up the Replica VM locally rather than backing up the primary VM and sending the data over the network.
  3. Backup of all VMs in the Hoster datacenter:   Some customers use the Hoster datacenter as the Replica site, with the intention of not building a secondary datacenter of their own. Hosters have SLAs around the protection of all customer VMs in their datacenters – typically a once-a-day backup. Thus the backup of Replica VMs becomes a requirement for the success of their business.

Thus various customer segments found that the backup of a Replica VM has value for their specific scenarios.

Data consistency

A key aspect of the backup operation is related to the consistency of the backed-up data. Customers have a clear prioritization and preference when it comes to data consistency of backed up VMs:

  1. Application-consistent backup
  2. Crash-consistent backup

And this prioritization applied to Replica VMs as well. Conversations with customers indicated that they were comfortable with crash-consistency for a Replica VM, if application-consistency was not possible. Of course, anything less than crash-consistency was not acceptable and customers preferred that backups fail rather than have inconsistent data getting backed up.

Attempting application-consistency

Typical backup products try to ensure application-consistency of the data being backed up (using the VSS framework) – and this works out well when the VM is running. However, the Replica VM is always turned off until a failover is initiated, and VSS is unable to guarantee application-consistent backup for a Replica VM. Thus getting application-consistent backup of a Replica VM is not possible.

Guaranteeing crash-consistency

In order to ensure that customers backing up Replica VMs always get crash-consistent data, a set of changes was introduced in Windows Server 2012 R2 that fails the backup operation if consistency cannot be guaranteed. The virtual disk could be inconsistent when any one of the conditions below is encountered, and in these cases backup is expected to fail.

  1. HRL logs are being applied to the Replica VM
  2. Previous HRL log apply operation was cancelled or interrupted
  3. Previous HRL log apply operation failed
  4. Replica VM health is Critical
  5. VM is in the Resynchronization Required state or the Resynchronization in progress state
  6. Migration of Replica VM is in progress
  7. Initial replication is in progress (between the primary site and secondary site)
  8. Failover is in progress
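
Some of the conditions above are visible through Get-VMReplication, so a backup script could check them before starting. The sketch below is illustrative only. The helper name Test-ReplicaBackupReady is hypothetical, the state names are my reading of the replication state values and should be verified on your system, and conditions such as an HRL log apply in progress are not covered, so the backup can still fail and need a retry.

```powershell
# Hypothetical helper: returns $false when the replication object reports
# a health or state in which backup of the Replica VM is expected to fail.
function Test-ReplicaBackupReady {
    param(
        # An object with Health and State properties,
        # such as the output of Get-VMReplication.
        [Parameter(Mandatory)] $Replication
    )

    # Assumed state names for initial replication, resynchronization and errors.
    $blockingStates = @(
        'InitialReplicationInProgress',
        'WaitingForInitialReplication',
        'WaitingForStartResynchronize',
        'Resynchronizing',
        'ResynchronizeSuspended',
        'Error'
    )

    # Condition 4: Replica VM health is Critical.
    if ("$($Replication.Health)" -eq 'Critical') { return $false }

    # Conditions 5 and 7: resynchronization or initial replication.
    if ($blockingStates -contains "$($Replication.State)") { return $false }

    return $true
}
```

On a Replica server this could be used as `Get-VMReplication | Where-Object { Test-ReplicaBackupReady $_ }` to list the virtual machines that pass the check.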

Dealing with failures

These are largely treated as transient error states and the backup product is expected to retry the backup operation based on its own retry policies. With 30-second replication and apply being supported in Windows Server 2012 R2, the backup operation is expected to collide with HRL log apply more frequently – resulting in error scenario 1 mentioned above. A robust retry mechanism is needed to ensure a high backup success rate. If the backup product is unable to retry or cope with failures, an option is to explicitly pause the replication before the backup is scheduled to run.
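
For administrators who script their own backups, both options can be sketched in PowerShell. This is a minimal illustration rather than how any particular backup product behaves. The function names, the attempt count and the delay are assumptions, and the backup itself is passed in as a script block because the actual backup command depends on the product in use.

```powershell
# Hypothetical helper: runs a script block and retries it after a fixed
# delay when it throws. The action must raise a terminating error on failure.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 5,
        [int] $DelaySeconds = 60
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Give up and rethrow once the last attempt has failed.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Hypothetical helper: pauses replication for one virtual machine while
# the action runs, and resumes it afterwards even if the action fails.
function Invoke-WithReplicationPaused {
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [Parameter(Mandatory)] [scriptblock] $Action
    )

    Suspend-VMReplication -VMName $VMName
    try {
        & $Action
    }
    finally {
        Resume-VMReplication -VMName $VMName
    }
}
```

A retry delay longer than the replication frequency gives an in-progress log apply time to finish before the next attempt.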


Key Takeaways

Impact on administrators 

  1. Backup of Replica VMs is better with Windows Server 2012 R2.
  2. Only crash-consistent backup of a Replica VM is guaranteed.
  3. A robust retry mechanism needs to be configured in the backup product to deal with failures. Alternatively, ensure that replication is paused when backup is scheduled.

Impact on backup vendors

  1. The changes introduced in Windows Server 2012 R2 would benefit customers using any backup product to take backup of Replica VMs.
  2. A robust retry mechanism would need to be built to deal with Replica VM backup failures.
  3. For specific details on how Data Protection Manager (DPM) deals with the backup of Replica VMs, refer to this blog post.


Update 25-Apr-2014:  The DPM-specific details on this post have been moved to the DPM blog.

Listing all the IP Addresses used by VMs

Here is a neat little snippet of PowerShell:

Get-VM | ?{$_.State -eq "Running"} | Get-VMNetworkAdapter | Select VMName, IPAddresses

If you run this on a Hyper-V server, it will give you a listing of all the IP addresses that are assigned to running virtual machines.


This works whether you are using DHCP or Static IP addresses – and can really help when you are trying to track down a rogue virtual machine.
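
If you already know the address you are hunting for, the same pipeline can be filtered down to the virtual machine that owns it.  The address below is a made-up example.

```powershell
# Hypothetical address to look for.
$address = '192.168.1.50'

Get-VM | Where-Object { $_.State -eq 'Running' } |
    Get-VMNetworkAdapter |
    Where-Object { $_.IPAddresses -contains $address } |
    Select-Object VMName, MacAddress, IPAddresses
```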


Exporting a Virtual Machine Checkpoint

Something neat that you can do in Windows Server 2012 / Windows 8 or later is to export a virtual machine checkpoint.  You can do this by either:

  1. Selecting the checkpoint in the UI and selecting Export from the action pane
  2. Using the Export-VMSnapshot cmdlet

When you do this, we will actually create an exported virtual machine that represents just the checkpoint you selected.  This means that we will merge all the changes from the checkpoint into a single set of virtual hard disks – so that you can create a new virtual machine from that point in time without having to include other checkpoints.
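
As a rough example of the cmdlet route, the commands below list the checkpoints for a virtual machine and then export one of them.  The virtual machine name, checkpoint name and destination path are all placeholders.

```powershell
# List the checkpoints that exist for the virtual machine.
Get-VMSnapshot -VMName 'TestVM'

# Export a single checkpoint (by name) to a folder.
Export-VMSnapshot -Name 'Before update' -VMName 'TestVM' -Path 'D:\Exports'
```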


Hyper-V Networking–NIC Teaming

If you look at the advanced features of a network adapter in Hyper-V, you may have noticed the NIC Teaming option and wondered what it was about.

In most deployments you will enable network adapter teaming in the host operating system, and connect a virtual switch to the team.  If you do this you will never need to enable this option.

But there are situations where you will not want to use network adapter teaming in the host operating system, and instead you will want to connect two virtual network adapters to the virtual machine and configure teaming inside the guest operating system.  One situation where you would want to do this is if you were using SR-IOV enabled network adapters.

The effect of enabling this option is that if there is a connection failure on the physical network adapter that is being used by the virtual switch, we also disconnect the virtual network adapter.  This is required to ensure that network teaming functions correctly inside the guest operating system – but will cause problems if network teaming is not configured inside the guest operating system.
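
The same option can be set from PowerShell with the AllowTeaming parameter of Set-VMNetworkAdapter.  The virtual machine name here is a placeholder, and the first command as written applies to every network adapter on that virtual machine.

```powershell
# Enable the NIC Teaming option on the network adapters of one virtual machine.
Set-VMNetworkAdapter -VMName 'TestVM' -AllowTeaming On

# Check the current setting for each adapter.
Get-VMNetworkAdapter -VMName 'TestVM' | Select-Object Name, AllowTeaming
```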


Understanding Maximum Dynamic Memory from inside a VM

Dynamic memory is a great feature that allows Hyper-V administrators to get better utilization of their physical memory.  But it can be hard to tell what is going on from inside of a virtual machine.  There are, however, some things that you can do from inside a virtual machine.

The first thing you can do is see how much memory is currently available to your virtual machine.  This is just the free memory as is displayed in Task Manager inside the guest operating system:


Beyond this, if you are running Windows 8 or later, you can also see what the maximum memory is set at for your virtual machine.  In the screenshot above you can see that the Maximum memory of this virtual machine is set to 4GB.

You can also access the information about the maximum memory configured inside of a virtual machine by using Performance Monitor to look at the Maximum Memory, Mbytes performance counter under Hyper-V Dynamic Memory Integration Service.


Finally, you can access this information using PowerShell.  Running the following command inside of a virtual machine:

(Get-Counter "\Hyper-V Dynamic Memory Integration Service\Maximum Memory, Mbytes").CounterSamples.CookedValue

Will tell you what the maximum memory for that virtual machine is configured to.
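
If you want to see what else this counter set exposes, or prefer the value in gigabytes, something like the following should work inside the guest.  The variable names are just for illustration.

```powershell
# List every counter path in the Dynamic Memory integration counter set.
$set = Get-Counter -ListSet 'Hyper-V Dynamic Memory Integration Service'
$set.Paths

# Read the maximum memory (reported in megabytes) and show it in gigabytes.
$maxMB = (Get-Counter '\Hyper-V Dynamic Memory Integration Service\Maximum Memory, Mbytes').CounterSamples.CookedValue
'{0:N1} GB' -f ($maxMB / 1024)
```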