Optimizing Hyper-V Replica HTTPS traffic using Riverbed SteelHead

Hyper-V Replica supports both Kerberos-based authentication and certificate-based authentication. The former sends the replication traffic between the two servers/sites over HTTP, while the latter sends it over HTTPS. Network bandwidth is a precious commodity, and any optimization can have a significant impact on the organization’s TCO and on the achievable Recovery Point Objective (RPO).

Around a year ago, we partnered with the folks from Riverbed in Microsoft’s EEC lab to publish a whitepaper detailing the bandwidth optimization of replication traffic sent over HTTP.

A few months ago, we decided to revisit the setup with the latest release of RiOS (the Riverbed OS that runs on the Riverbed appliance). Using resources and appliances from EEC and Riverbed, we ran a set of experiments to study the network optimizations delivered by the Riverbed SteelHead appliance. Optimizing SSL traffic has been a tough nut to crack, and we saw some really impressive numbers. The whitepaper documenting the results and technology is available here: http://www.microsoft.com/en-us/download/details.aspx?id=42627.

At a high level, in order to optimize HTTPS traffic, the client-side Riverbed SteelHead appliance decrypts the packets coming from the client (the primary server). It then optimizes the payload and re-encrypts it before sending it to the server-side SteelHead appliance over the internet/WAN. The server-side SteelHead appliance decrypts the payload, de-optimizes the traffic and re-encrypts it. Finally, the server-side appliance sends it to the destination server (the replica server), which decrypts the replication traffic. The diagram below is taken from Riverbed’s user manual and illustrates this flow:

image

When Hyper-V Replica’s built-in compression was disabled, the reduction in traffic sent over the WAN was ~80%.

image

When Hyper-V Replica’s built-in compression was enabled, the reduction in traffic sent over the WAN was ~30%.

image

It’s worth calling out that the percentage reduction depends on a number of factors, such as the workload’s read/write pattern and the sparseness of the disk, but the numbers were quite impressive.

In summary, both Hyper-V Replica and the SteelHead devices were easy to configure and worked “out of the box”. Neither product required specific configuration to light up the scenario. The Riverbed appliance delivered a ~30% reduction on compressed, encrypted Hyper-V Replica traffic and a ~80% reduction on uncompressed, encrypted Hyper-V Replica traffic.

Hyper-V events at TechEd North America

I’m excited to report we, the Hyper-V team, will have a record high presence this year at TechEd North America in Houston.  Come join us at the Hyper-V booth and for our Hyper-V sessions.

Sessions:

Monday:

11:00 – 12:00

FDN06 Transform the Datacenter: Making the Promise of Connected Clouds a Reality 
Speaker(s): Brian Hillger, Elden Christensen, Jeff Woolsey, Jeffrey Snover, Matt McSpirit

1:15 – 2:30 

DCIM-B319 Building a Backup Strategy for Your Private Cloud
Speaker(s): Doug Hazelman, Michael Jones, Shivam Garg, Taylor Brown, Vineeth Karinta

4:45 – 6:00 

DCIM-B378 Converged Networking for Windows Server 2012 R2 Hyper-V
Speaker(s): Don Stanwyck, Taylor Brown

Tuesday:

8:30 – 9:45 

DCIM-B379 Using VMware? The Advantages of Microsoft Cloud Fundamentals with Virtualization 
Speaker(s): Jeff Woolsey, Matt McSpirit

Wednesday:

5:00 – 6:15 

DCIM-B380 What’s New in Windows Server 2012 R2 Hyper-V 
Speaker(s): Jeff Woolsey

Thursday:

2:45 – 4:00 

DCIM-B219 Secure Design and Best Practices for Your Private Cloud
Speaker(s): Patrick Lang, Sam Chandrashekar

Booth Info:

The Hyper-V booth will be in the center of the Expo floor with the Server and Cloud Tools booth block.  Come find us when you have a chance.

I’ll post bios for the Hyper-V attendees shortly.

Cheers,
Sarah

Replication Health-Windows Server 2012 R2

We have made improvements to the way we display Replication Health in Windows Server 2012 R2 to support Extended Replication. If you are new to measuring replication health, I strongly suggest reading the two-part blog series on Interpreting Replication Health. This post covers only the additional changes we made in Windows Server 2012 R2.

Replication Tab in Replica Site Hyper-V Manager:

The Replication tab on the Replica site now shows replication health information for both the primary replication relationship and the extended replication relationship. The health values for each relationship are shown in a single pane, separated by a line.

[Image: Replication health tab]

Replication Health Screen in Replica Site:

Replication health information for Extended Replication is available on the “Extended Replication” tab of the Replication Health screen. To open the Replication Health screen, go to Hyper-V Manager or Failover Cluster Manager, right-click the protected VM, and choose “View Replication Health”.

Replication health information for the primary replication relationship is shown on the “Replication” tab, while the “Extended Replication” tab shows the health of extended replication. What’s more, the Extended Replication tab looks exactly like the Replication Health screen on the primary server, giving a consistent view, while the Replication tab continues to display its content the way it used to. You can even use “Reset Statistics” or “Save as CSV file” on a per-relationship basis.

[Images: Replication Health screen]

Replication Health through PowerShell:

You can get the Replication Health details of Extended Replication through PowerShell by setting the “ReplicationRelationshipType” parameter to “Extended”. To view the health of replication from primary to replica, pass “Simple” as the value of the ReplicationRelationshipType parameter.

Measure-VMReplication -VMName <VMName> -ReplicationRelationshipType Extended

While we have added support for displaying extended replication in our UI and PowerShell, getting details about the primary replication relationship remains the same.
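As a small sketch of the two parameter values described above, the following compares both relationships for one VM. The VM name is a hypothetical placeholder, and the commands are assumed to run on the Replica server that also acts as the source of extended replication.

```powershell
# Hypothetical VM name, used for illustration only.
$vmName = 'FileServer01'

# Health of the primary-to-replica relationship.
Measure-VMReplication -VMName $vmName -ReplicationRelationshipType Simple | Format-List *

# Health of the replica-to-extended-replica relationship.
Measure-VMReplication -VMName $vmName -ReplicationRelationshipType Extended | Format-List *
```

Format-List * is used here so that no particular output property names have to be assumed.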

Hyper-V Replica to the Rescue!

Power outages are not infrequent where I live (something I find quite confounding – to be honest) and earlier this week we had an extended power outage and my Hyper-V servers were powered off uncleanly.  When the power returned I had to sit down and make sure that everything came back correctly.

At first glance – everything looked good.  My Hyper-V servers powered up happily and started up all the virtual machines.  Hyper-V Replica reported that it was in a critical state – but it automatically scheduled resynchronization for all of my virtual machines.  But as I was going through the virtual machines – I found a problem.

Something had gone wrong with my firewall.

I could not figure out what exactly was wrong – but it was using 100% CPU and not allowing any network traffic through. I shut down the virtual machine cleanly and restarted it – but no dice. It still would not work. Thankfully, there was a simple solution.

I shut down the misbehaving virtual machine and started up the replica version of it. This came up with no problems and started functioning correctly – as it had not been powered off uncleanly. Yay!

Now, there is a key point to make here: if I had performed a planned failover in Hyper-V (select the primary virtual machine and perform a planned failover), this would not have worked. Hyper-V would have copied across the outstanding (bad) changes and would have broken my replica virtual machine too. What I actually did was go straight to the replica virtual machine and perform a failover (not planned). By doing this, Hyper-V did not copy across the latest data and everything worked.

At the end of this process I reversed the replication relationship and was good to go.
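For anyone who prefers PowerShell to the UI, here is a rough sketch of what the same sequence might look like. The VM and host names are made up for illustration, and this is an approximation of the steps described above, not a record of the exact commands that were run.

```powershell
# Hypothetical names, used for illustration only.
$vmName      = 'Firewall'
$primaryHost = 'HV-PRIMARY'
$replicaHost = 'HV-REPLICA'

# 1. Shut down the misbehaving primary virtual machine.
Stop-VM -Name $vmName -ComputerName $primaryHost

# 2. On the replica host, perform an unplanned failover. Without -Prepare this
#    does not pull the outstanding changes across from the primary.
#    (The cmdlet asks for confirmation before it proceeds.)
Start-VMFailover -VMName $vmName -ComputerName $replicaHost

# 3. Start the replica and check that it behaves.
Start-VM -Name $vmName -ComputerName $replicaHost

# 4. Once satisfied, commit the failover and reverse the replication direction.
Complete-VMFailover -VMName $vmName -ComputerName $replicaHost
Set-VMReplication -VMName $vmName -ComputerName $replicaHost -Reverse
```

The important part is step 2: a planned failover would start on the primary with Start-VMFailover -Prepare, which is exactly the step that sends the latest (bad) changes to the replica.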

Cheers,
Ben

Backup of a Replica VM

This blog post covers the scenarios and motivations that drive the backup of a Replica VM, and product guidance to administrators.

Why backup a Replica VM?

Ever since the advent of Hyper-V Replica in Windows Server 2012, customers have been interested in backing up the Replica VM. Traditionally, IT administrators have taken backups of the VM that contains the running workload (the primary VM) and backup products have been built to cater to this need. So when a significant proportion of customers talked about the backup of Replica VMs, we were intrigued. There are a few key scenarios where backup of a Replica VM becomes useful:

  1. Reduce the impact of backup on the running workload:   Taking the backup of a VM involves the creation of a snapshot/diff-disk to baseline the changes that need to be backed up. For the duration of the backup job, the workload is running on a diff-disk and there is an impact on the system when that happens. By offloading the backup to the Replica site, the running workload is no longer impacted by the backup operation. Of course, this is applicable only to deployments where the backup copy is stored on the remote site. For example, the daily backup operation might store the data locally for quicker restore times, but monthly or quarterly backups for long-term retention that are stored remotely can be taken from the Replica VM.
  2. Limited bandwidth between sites:   This is typical of Branch Office-Head Office (BO-HO) deployments, where there are multiple smaller remote branch office sites and a larger central Head Office site. The backup data for the branch offices is stored in the head office, and an appropriate amount of bandwidth is provisioned by administrators to transfer the backup data between the two sites. The introduction of disaster recovery using Hyper-V Replica creates another stream of network traffic, and administrators have to re-evaluate their network infrastructure. In most cases, administrators either could not or were not willing to increase the bandwidth between sites to accommodate both backup and DR traffic. However, they did come to the realization that backup and DR were independently sending copies of the same data over the network, and that this was an area that could be optimized. With Hyper-V Replica creating a VM in the Head Office site, administrators could save on the network transfer by backing up the Replica VM locally rather than backing up the primary VM and sending the data over the network.
  3. Backup of all VMs in the Hoster datacenter:   Some customers use the Hoster datacenter as the Replica site, with the intention of not building a secondary datacenter of their own. Hosters have SLAs around the protection of all customer VMs in their datacenters – typically once a day backup. Thus the backup of Replica VMs becomes a requirement for the success of their business.

Thus various customer segments found that the backup of a Replica VM has value for their specific scenarios.

Data consistency

A key aspect of the backup operation is related to the consistency of the backed-up data. Customers have a clear prioritization and preference when it comes to data consistency of backed up VMs:

  1. Application-consistent backup
  2. Crash-consistent backup

And this prioritization applied to Replica VMs as well. Conversations with customers indicated that they were comfortable with crash-consistency for a Replica VM, if application-consistency was not possible. Of course, anything less than crash-consistency was not acceptable and customers preferred that backups fail rather than have inconsistent data getting backed up.

Attempting application-consistency

Typical backup products try to ensure application-consistency of the data being backed up (using the VSS framework) – and this works out well when the VM is running. However, the Replica VM is always turned off until a failover is initiated, and VSS is unable to guarantee application-consistent backup for a Replica VM. Thus getting application-consistent backup of a Replica VM is not possible.

Guaranteeing crash-consistency

In order to ensure that customers backing up Replica VMs always get crash-consistent data, a set of changes was introduced in Windows Server 2012 R2 that fails the backup operation if consistency cannot be guaranteed. The virtual disk could be inconsistent when any one of the conditions below is encountered, and in these cases the backup is expected to fail.

  1. HRL logs are being applied to the Replica VM
  2. Previous HRL log apply operation was cancelled or interrupted
  3. Previous HRL log apply operation failed
  4. Replica VM health is Critical
  5. VM is in the Resynchronization Required state or the Resynchronization in progress state
  6. Migration of Replica VM is in progress
  7. Initial replication is in progress (between the primary site and secondary site)
  8. Failover is in progress
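Some of these conditions can be approximated from PowerShell before a backup is attempted. The sketch below is a rough heuristic only: it looks at just the replication state and health reported by Get-VMReplication and does not detect every condition in the list (for example, a log apply that is in progress at that moment). The VM name is a hypothetical placeholder, and the backup operation itself remains the authority on whether consistency can be guaranteed.

```powershell
# Hypothetical VM name; run this on the Replica host.
$vmName = 'FileServer01'

# Look at the primary-to-replica relationship only.
$replication = Get-VMReplication -VMName $vmName -ReplicationRelationshipType Simple

# Rough heuristic: steady-state replication and health that is not Critical.
$looksSafe = ($replication.State -eq 'Replicating') -and
             ($replication.Health -ne 'Critical')

if ($looksSafe) {
    Write-Output "${vmName}: replication looks steady, a backup attempt is reasonable."
}
else {
    Write-Output "${vmName}: state is $($replication.State), health is $($replication.Health); backup is likely to fail."
}
```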

Dealing with failures

These are largely treated as transient error states, and the backup product is expected to retry the backup operation based on its own retry policies. With 30-second replication and apply being supported in Windows Server 2012 R2, the backup operation is expected to collide with HRL log apply more frequently, resulting in error scenario 1 mentioned above. A robust retry mechanism is needed to ensure a high backup success rate. If the backup product is unable to retry or cope with failures, an option is to explicitly pause the replication before the backup is scheduled to run.
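To make the two options concrete, here is a minimal sketch of a generic retry wrapper, followed by a commented-out outline of the pause-and-resume alternative. Invoke-WithRetry, Start-MyBackupJob and the VM name are hypothetical names introduced for illustration; real backup products implement their own retry policies, and this does not describe how any particular product behaves.

```powershell
# Hypothetical helper: runs a script block and retries it when it throws.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 5,
        [int] $DelaySeconds = 60
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Success: hand back whatever the action produced.
            return & $Action
        }
        catch {
            # Out of attempts: rethrow the last error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }

            Write-Warning "Attempt $attempt failed: $($_.Exception.Message). Retrying in $DelaySeconds s."
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Option 1 (sketch): retry the backup when it collides with a log apply.
# Start-MyBackupJob stands in for whatever command your backup product exposes.
# Invoke-WithRetry -Action { Start-MyBackupJob -VMName 'FileServer01' } -MaxAttempts 5 -DelaySeconds 120

# Option 2 (sketch): pause replication for the duration of the backup.
# Suspend-VMReplication -VMName 'FileServer01'
# try     { Start-MyBackupJob -VMName 'FileServer01' }
# finally { Resume-VMReplication -VMName 'FileServer01' }
```

A fixed delay keeps the sketch short. Keep in mind that pausing replication for the backup window means the Replica VM falls behind the primary until replication is resumed.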

Key Takeaways

Impact on administrators 

  1. Backup of Replica VMs is better with Windows Server 2012 R2.
  2. Only crash-consistent backup of a Replica VM is guaranteed.
  3. A robust retry mechanism needs to be configured in the backup product to deal with failures. Alternatively, ensure that replication is paused when the backup is scheduled to run.

Impact on backup vendors

  1. The changes introduced in Windows Server 2012 R2 would benefit customers using any backup product to take backup of Replica VMs.
  2. A robust retry mechanism would need to be built to deal with Replica VM backup failures.
  3. For specific details on how Data Protection Manager (DPM) deals with the backup of Replica VMs, refer to this blog post.

Update 25-Apr-2014:  The DPM-specific details on this post have been moved to the DPM blog.

Listing all the IP Addresses used by VMs

Here is a neat little snippet of PowerShell:

Get-VM | ?{$_.State -eq "Running"} | Get-VMNetworkAdapter | Select VMName, IPAddresses

If you run this on a Hyper-V Server it will give you a listing of all the IP addresses that are assigned to running virtual machines:

image

This works whether you are using DHCP or Static IP addresses – and can really help when you are trying to track down a rogue virtual machine.
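If you already know the offending address, a variation on the same pipeline can point straight at the virtual machine that owns it. This is just a sketch, and the IP address below is a made-up example value.

```powershell
# Hypothetical address to look for.
$rogueIp = '192.168.1.50'

Get-VM | Where-Object { $_.State -eq 'Running' } |
    Get-VMNetworkAdapter |
    Where-Object { $_.IPAddresses -contains $rogueIp } |
    Select-Object VMName, MacAddress, SwitchName, IPAddresses
```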

Cheers,
Ben

Exporting a Virtual Machine Checkpoint

Something neat that you can do in Windows Server 2012 / Windows 8 or later is to export a virtual machine checkpoint.  You can do this by either:

  1. Selecting the checkpoint in the UI and selecting Export from the action pane
  2. Using the Export-VMSnapshot cmdlet

When you do this, we will actually create an exported virtual machine that represents just the checkpoint you selected.  This means that we will merge all the changes from the checkpoint into a single set of virtual hard disks – so that you can create a new virtual machine from that point in time without having to include other checkpoints.
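As a quick sketch of the cmdlet route, the following lists a virtual machine's checkpoints and then exports one of them. The VM name, checkpoint name and export path are made-up values for illustration.

```powershell
# Hypothetical values, used for illustration only.
$vmName         = 'TestVM'
$checkpointName = 'Before update'
$exportPath     = 'D:\Exports'

# List the checkpoints so you can pick the one to export.
Get-VMSnapshot -VMName $vmName

# Export just that checkpoint as a standalone virtual machine.
Export-VMSnapshot -VMName $vmName -Name $checkpointName -Path $exportPath
```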

Cheers,
Ben