Tag Archives: HVR

Announcing GA of Disaster Recovery to Azure – Purpose-Built for Branch Offices and SMB

Today, we are excited to announce the GA of Disaster Recovery to Azure for Branch Office and SMB.  Azure Site Recovery delivers a simple, reliable, and cost-effective Disaster Recovery solution to Branch Office and SMB customers.  With the new Hyper-V Virtual Machine Protection from Windows Server 2012 R2 to Microsoft Azure, ASR can now be used at customer-owned sites where SCVMM is not deployed.

 

You can visit the Getting Started with Azure Site Recovery page for additional information.

Out-of-band Initial Replication (OOB IR) and Deduplication

A recent conversation with a customer brought out the question:  What is the best way to create an entire Replica site from scratch? On the surface this seems simple enough – configure initial replication to send the data over the network for the VMs one after another in sequence. For this specific customer, however, there were some additional constraints:

  1. The network bandwidth was less than 10 Mbps and it primarily catered to their daily business needs (email etc…). Adding more bandwidth was not possible within their budget. This came as quite a surprise – despite the incredible download speeds encountered these days, there are still places in the world where it isn’t cost effective to purchase those speeds.
  2. The VMs were between 150 GB and 300 GB in size. This made it rather impractical to send the data over the wire: even in the best case, a single 150 GB VM would have taken about 34 hours (150 GB ≈ 1.2 × 10^12 bits, which at 10 Mbps is roughly 120,000 seconds).

This left OOB IR as the only realistic way to transfer data. But at 300GB per VM, it is easy to exhaust a removable drive of 1TB. That left us thinking about deduplication – after all, deduplication is supported on the Replica site. So why not use it for deduplicating OOB IR data?

So I tested this out in my lab environment with a removable USB drive and a bunch of VMs created out of the same Windows Server 2012 VHDX file. The expectation was that at least 20% to 40% of the data would be the same across the VMs, the overall deduplication rate would be quite high, and we could fit a good number of VMs onto the removable USB drive.

I started this experiment by attaching the removable drive to my server and attempted to enable deduplication on the associated volume in Server Manager.

Interesting discovery #1:  Deduplication is not allowed on volumes on removable disks

Whoops! This seems like a fundamental block to our scenario – how do you build deduplicated OOB IR if deduplication is not supported on removable media? This limitation is officially documented here: http://technet.microsoft.com/en-us/library/hh831700.aspx, and says “Volumes that are candidates for deduplication must conform to the following requirements:  Must be exposed to the operating system as non-removable drives. Remotely-mapped drives are not supported.”

Fortunately my colleague Paul Despe in the Windows Server Data Deduplication team came to the rescue. There is a (slightly) convoluted way to get the data on the removable drive and deduplicated. Here goes:

  • Create a dynamically expanding VHDX file. The size doesn’t matter too much as you can always start off with the default and expand if required.

[screenshot]

  • Using Disk Management, bring the disk online, initialize it, create a single volume, and format it with NTFS. You should be able to see the new volume in your Explorer window. I used Y: as the drive letter.

[screenshot]

  • Mount this VHDX on the server you are using to do the OOB IR process.
  • If you go to Server Manager and view this volume (Y:), you will see that it is backed by a fixed disk.

[screenshot]

  • In the volume view, enable deduplication on this volume by right-clicking and selecting ‘Configure Data Deduplication’. Set the ‘Deduplicate files older than (in days)’ field to zero.

[screenshots]
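The VHDX creation and mounting steps above can also be scripted. Here is a minimal sketch, assuming the Y: drive letter used in this walkthrough and a hypothetical path of D:\DedupStore.vhdx – it chains the storage cmdlets into one pipeline:

# Create a dynamically expanding VHDX, then mount, initialize, and format it as Y:
New-VHD -Path D:\DedupStore.vhdx -SizeBytes 500GB -Dynamic |
    Mount-VHD -Passthru |
    Initialize-Disk -Passthru |
    New-Partition -DriveLetter Y -UseMaximumSize |
    Format-Volume -FileSystem NTFS -Confirm:$false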

You can also enable deduplication in PowerShell with the following cmdlets:

PS C:\> Enable-DedupVolume Y: -UsageType HyperV

PS C:\> Set-DedupVolume Y: -MinimumFileAgeDays 0
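With the deduplicated volume ready, the out-of-band initial replication can be pointed at it. A minimal sketch, assuming a hypothetical VM named VM01 and replica server replicaserver.contoso.com – the -DestinationPath parameter of Start-VMInitialReplication directs the initial copy to the mounted volume instead of over the network:

PS C:\> Enable-VMReplication -VMName VM01 -ReplicaServerName replicaserver.contoso.com -AuthenticationType Kerberos -ReplicaServerPort 80
PS C:\> Start-VMInitialReplication -VMName VM01 -DestinationPath Y:\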

Now you are set to start the OOB IR process and take advantage of the deduplicated volume. This is what I saw after 1 VM was enabled for replication with OOB IR:

[screenshots]

That’s about 32.6GB of storage used. Wait… shouldn’t there be a reduction in size because of deduplication?

Interesting discovery #2:  Deduplication doesn’t work on-the-fly

Ah… so if you were expecting the VHD data to arrive in the volume in deduplicated form, this is going to be a bit of a surprise. At first, the VHD data will be present in the volume at its original size. Deduplication happens post-facto, as a job that crunches the data and reduces the size of the VHD after it has been fully copied as part of the OOB IR process. This is because deduplication needs an exclusive handle on the file in order to do its work.

The good part is that you can trigger the job on demand and start the deduplication as soon as the first VHD is copied, using the PowerShell cmdlet provided:

PS C:\> Start-DedupJob Y: -Type Optimization

There are other parameters provided by the cmdlet that allow you to control the deduplication job. You can explore the various options in the TechNet documentation: http://technet.microsoft.com/en-us/library/hh848442.aspx.
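To monitor the running job and check the realized savings afterwards, a quick sketch with the built-in deduplication cmdlets:

PS C:\> Get-DedupJob -Volume Y:
PS C:\> Get-DedupVolume Y: | Format-List SavedSpace, SavingsRate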

This is what I got after the deduplication job completed:

[screenshot]

That’s a 54% saving with just one VM – a very good start!

Deduplication rate with more virtual machines

After this I threw in a few more virtual machines with completely different applications installed, and here are the observed savings after each step:

[chart: observed savings after each VM]

I think the excellent results speak for themselves! Notice how between VM2 and VM3, almost all of the data (~9 GB) has been absorbed by deduplication, with an increase of only 300 MB! As the deduplication team has published on TechNet, VDI VMs have a high degree of similarity in their disks and result in a much higher deduplication rate. A random mix of VMs yields surprisingly good results as well.

Final steps

Once you are done with the OOB IR and deduplication of your VMs, you need to do the following steps:

  1. Ensure that no deduplication job is running on the volume
  2. Eject the fixed disk – this should disconnect the VHD from the host
  3. Compact the VHD using the “Edit Virtual Hard Disk Wizard”. At the time I disconnected the VHD from the host, the size of the VHD was 36.38 GB. After compacting it, the size came down to 28.13 GB… which is more in line with the actual disk consumption you see in the graph above (steps 2 and 3 can also be scripted – see the sketch after this list)
  4. Copy the VHD to the Replica site, mount it on the Replica host, and complete the OOB IR process!
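A minimal PowerShell sketch for steps 2 and 3, reusing the hypothetical D:\DedupStore.vhdx path from the earlier sketch:

PS C:\> Dismount-VHD -Path D:\DedupStore.vhdx
PS C:\> Optimize-VHD -Path D:\DedupStore.vhdx -Mode Full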

 

Hope this blog post helps with setting up your own Hyper-V Replica sites from scratch using OOB IR! Try it out and let us know your feedback.

Disaster Recovery to Microsoft Azure – Part 2

 

Continuing from the previous blog – check out the recent TechEd NA 2014 talk @ https://channel9.msdn.com/Events/TechEd/NorthAmerica/2014/DCIM-B322, which includes a cool demo of this product.

Love it??? Talk about it, try it and share your comments.

Let’s retrace the journey – in Jan 2014, we announced the General Availability of Hyper-V Recovery Manager (HRM). HRM enabled customers to coordinate protection and recovery of virtualized workloads between SCVMM-managed clouds. Using this Azure service, customers could set up, monitor, and orchestrate protection and recovery of their Virtual Machines on top of Windows Server 2012 and Windows Server 2012 R2 Hyper-V Replica.

Like Hyper-V Replica, the solution works great when customers have a secondary location. But what if that isn’t the case? After all, the CAPEX and OPEX costs of building and maintaining multiple datacenters are high. A common question/suggestion to our team was around using Azure as the secondary datacenter. Azure provides a world-class, reliable, resilient platform – at a fraction of the cost of maintaining a secondary datacenter.

The rebranded HRM service – Azure Site Recovery (ASR) – delivers this capability. On 6/19, we announced the availability of the preview version of ASR which orchestrates, manages and replicates VMs to Azure.

When a disaster strikes the customer’s on-premises site, ASR can “failover” the replicated VMs in Azure.

And once the customer recovers the on-premises site, ASR can “failback” the Azure IaaS VMs to the customer’s private cloud. We want you to decide which VM runs where and when!

There is some exciting technology built on top of Azure which enables the scenario and in the coming weeks we will dive deep into the workflows and the technology.

Off the top of my head, the key features in the product are:

  • Replication from a System Center 2012 R2 Virtual Machine Manager cloud to Azure – From a SCVMM 2012 R2 managed private cloud, any VM (we will cover some caveats in subsequent blogs) running on Windows Server 2012 R2 hypervisor can be replicated to Azure.

  • Replication frequency of 30 seconds, 5 minutes, or 15 minutes – just like the on-premises product, you can replicate to Azure every 30 seconds.

  • 24 additional recovery points to choose from during failover – You can configure up to 24 additional recovery points at an hourly granularity.

 

  • Encryption @ Rest: You’ve got to love this – we encrypt the data *before* it leaves your on-premises server. We never decrypt the payload until you initiate a failover. You own the encryption key and it’s safe with you.

  • Self-service DR with Planned, Unplanned and Test Failover – Need I say more – everything is in your hands and at your convenience.

  • One click app-level failover using Recovery Plans
  • Audit and compliance reporting
  • …and many more!

The documentation explaining the end to end workflows is available @ http://azure.microsoft.com/en-us/documentation/articles/hyper-v-recovery-manager-azure/ to help you get started.

The landing page for this service is @ http://azure.microsoft.com/en-us/services/site-recovery/

If you have questions when using the product, post them @ http://social.msdn.microsoft.com/Forums/windowsazure/en-US/home?forum=hypervrecovmgr or in this blog.

Keep watching this blog space for more information on this capability.

Application consistent recovery points with Windows Server 2008/2003 guest OS

I recently had a conversation with a customer around a very interesting problem, and the insights that were gained there are worth sharing. The issue was about VSS errors popping up in the guest event viewer while Hyper-V Replica reported the successful creation of application-consistent (VSS-based) recovery points.

Deployment details

The customer had the following setup that was throwing errors:

  1. Primary site:   Hyper-V Cluster with Windows Server 2012 R2
  2. Replica site:   Hyper-V Cluster with Windows Server 2012 R2
  3. Virtual machines:   SQL server instances with SQL Server 2012 SP1, SQL Server 2005, and SQL Server 2008

At the time of enabling replication, the customer selected the option to create additional recovery points and set the “Volume Shadow Copy Service (VSS) snapshot frequency” to 1 hour. This means that every hour the VSS writers of the guest OS would be invoked to take an application-consistent snapshot.
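For reference, these settings map to Set-VMReplication parameters on the primary server. A rough sketch, assuming a hypothetical VM named SQLVM (-RecoveryHistory sets the number of additional recovery points, -VSSSnapshotFrequencyHour sets the app-consistent snapshot interval):

PS C:\> Set-VMReplication -VMName SQLVM -RecoveryHistory 24 -VSSSnapshotFrequencyHour 1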

Symptoms

With this configuration, there was a contradiction in the output – the guest event viewer showed errors/failures during the VSS process, while the Replica VM showed application-consistent points in the recovery history.

Here is an example of the error registered in the guest:

SQLVM: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=2644. Thread=7212. Client. Instance=. VD=Global*******

 

BACKUP failed to complete the command BACKUP DATABASE model. Check the backup application log for detailed messages.

 

BackupVirtualDeviceFile::SendFileInfoBegin:  failure on backup device '{********-63**-49**-BA**-5DB6********}1'. Operating system error 995(error not found).

Root cause and dealing with the errors

The big question was:  Why was Hyper-V Replica showing application-consistent recovery points if there are failures?

The behavior seen by the customer is a benign error caused by the interaction between Hyper-V and VSS, especially for older versions of the guest OS. Details about this can be found in the KB article here: http://support.microsoft.com/kb/2952783

The Hyper-V requestor explicitly stops the VSS operation right after the OnThaw phase. While this ensures application consistency of the writes going to the disk, it also results in the VSS errors being logged. Meanwhile, Hyper-V correctly reports the consistency to Hyper-V Replica, which in turn makes sure that the recovery side shows application-consistent points.

A great way to validate whether a recovery point is application-consistent is to do a test failover on that recovery point. After the VM has booted up, check the event viewer logs: events pertaining to a rollback would mean that the point is not application-consistent.
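The test failover itself can be driven from PowerShell on the Replica server. A sketch, assuming a hypothetical Replica VM named SQLVM – list the replica recovery points and test-fail-over to a chosen one:

PS C:\> $points = Get-VMSnapshot -VMName SQLVM -SnapshotType Replica
PS C:\> Start-VMFailover -VMRecoverySnapshot $points[0] -AsTest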

Key Takeaways

  1. All in all, you can rest assured that in the case of VMs with older operating systems, Hyper-V Replica is correctly taking an application-consistent snapshot of the virtual machine.
  2. Although there are errors seen in the guest, they are benign and having a recovery history with application-consistent points is an expected behavior.

Excluding virtual disks in Hyper-V Replica

Since its introduction in Windows Server 2012, Hyper-V Replica has provided a way for users to exclude specific virtual disks from being replicated. This option is rarely exercised but can have significant benefits when used correctly. This blog post covers the disk exclusion scenarios and the impact exclusion has on the various operations done during the lifecycle of VM replication. This blog post has been co-authored by Priyank Gaharwar of the Hyper-V Replica test team.

Why exclude disks?

Excluding disks from replication is done because:

  1. The data churned on the excluded disk is not important or doesn’t need to be replicated    (and)
  2. Storage and network resources can be saved by not replicating this churn

Point #1 is worth elaborating on a little. What data isn’t “important”? The lens used to judge the importance of replicated data is its usefulness at the time of Failover. Data that is not replicated should also not be needed at the time of failover. Lack of this data would then also not impact the Recovery Point Objective (RPO) in any material way.

There are some specific examples of data churn that can be easily identified and are great candidates for exclusion – for example, page file writes. Depending on the workload and the storage subsystem, the page file can register a significant amount of churn. However, replicating this data from the primary site to the replica site would be resource intensive and yet completely worthless. Thus the replication of a VM with a single virtual disk having both the OS and the page file can be optimized by:

  1. Splitting the single virtual disk into two virtual disks – one with the OS, and one with the page file
  2. Excluding the page file disk from replication

How to exclude disks

Application impact – isolating the churn to a separate disk

The first step in using this feature is to isolate the superfluous churn onto a separate virtual disk, similar to what is described above for page files. This is a change to the virtual machine and to the guest. Depending on how your VM is configured and what kind of disk you are adding (IDE, SCSI), you may have to power off your VM before any changes can be made.

At the end, an additional disk should surface in the guest. Appropriate configuration changes should be made in the application to change the location of the temporary files to point to the newly added disk.

Figure 1:  Changing the location of the System Page File to another disk/volume

Excluding disks in the Hyper-V Replica UI

Right-click on a VM and select “Enable Replication…”. This will bring up the wizard that walks you through the various inputs required to enable replication on the VM. The screen titled “Choose Replication VHDs” is where you deselect the virtual disks that you do not want to replicate. By default, all virtual disks will be selected for replication.

Figure 2:  Excluding the page file virtual disk from a virtual machine

Excluding disks using PowerShell

The Enable-VMReplication cmdlet provides two optional parameters: –ExcludedVhd and –ExcludedVhdPath. These parameters should be used to exclude the virtual disks at the time of enabling replication.

PS C:\Windows\system32> Enable-VMReplication -VMName SQLSERVER -ReplicaServerName repserv01.contoso.com -AuthenticationType Kerberos -ReplicaServerPort 80 -ExcludedVhdPath 'D:\Primary-Site\Hyper-V\Virtual Hard Disks\SQL-PageFile.vhdx'

After running this command, you will be able to see the excluded disks under VM Settings > Replication > Replication VHDs.
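The same information is available from PowerShell – a quick check using the ReplicatedDisks and ExcludedDisks properties of the replication object:

PS C:\> Get-VMReplication -VMName SQLSERVER | Format-List ReplicatedDisks, ExcludedDisks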

Figure 3:  List of disks included for and excluded from replication

Impact of disk exclusion

  • Enable replication:  A placeholder disk (for use during initial replication) is not created on the Replica VM. The excluded disk doesn’t exist on the replica in any form.
  • Initial replication:  The data from the excluded disks is not transferred to the replica site.
  • Delta replication:  The churn on any of the excluded disks is not transferred to the replica site.
  • Failover:  The failover is initiated without the disk that has been excluded. Applications that refer to the disk/volume in the guest will have incorrect configurations. For page files specifically, if the page file disk is not attached to the VM before VM boot-up, the page file location is automatically shifted to the OS disk.
  • Resynchronization:  The excluded disk is not part of the resynchronization process.

Ensuring a successful failover

Most applications have configurable settings that make use of file system paths. In order to run correctly, the application expects these paths to be present. The key to a successful failover and an error-free application startup is to ensure that the configured paths are present where they should be. In the case of file system paths associated with the excluded disk, this means updating the Replica VM by adding a disk – along with any subfolders that need to be present for the application to work correctly.

The prerequisites for doing this correctly are:

  • The disk should be added to the Replica VM before the VM is started. This can be done at any time after initial replication completes, but is preferably done immediately after the VM has failed over.
  • The disk should be added to the Replica VM with the exact controller type, controller number, and controller location as the disk has on the primary.

There are two ways of making a virtual disk available for use at the time of failover:

  1. Copy the excluded disk manually (once) from the primary site to the replica site
  2. Create a new disk, and format it appropriately (with any folders if required)

When possible, option #2 is preferred over option #1 because of the resources saved by not having to copy the disk. The following PowerShell script can be used to implement option #2, focusing on meeting the prerequisites to ensure that the Replica VM is exactly the same as the primary VM from a virtual disk perspective:

param (
    [string]$VMNAME,
    [string]$PRIMARYSERVER
)

## Get VHD details from the primary and replica servers
$excludedDisks = Get-VMReplication -VMName $VMNAME -ComputerName $PRIMARYSERVER | select ExcludedDisks
$includedDisks = Get-VMReplication -VMName $VMNAME | select ReplicatedDisks

# Nothing to do if no disks were excluded from replication
if( $excludedDisks.ExcludedDisks -eq $null ) {
    exit
}

# Get the location of the first replica VM disk
$replicaPath = $includedDisks.ReplicatedDisks[0].Path | Split-Path -Parent

## Create and attach each excluded disk
foreach( $exDisk in $excludedDisks.ExcludedDisks )
{
    # Get the actual disk object from the primary server
    $pDisk = Get-VHD -Path $exDisk.Path -ComputerName $PRIMARYSERVER
    $pDisk

    # Create a new VHD on the Replica with the same name and characteristics
    $diskpath = $replicaPath + "\" + ($pDisk.Path | Split-Path -Leaf)
    $newvhd = New-VHD -Path $diskpath `
                      -SizeBytes $pDisk.Size `
                      -Dynamic `
                      -LogicalSectorSizeBytes $pDisk.LogicalSectorSize `
                      -PhysicalSectorSizeBytes $pDisk.PhysicalSectorSize `
                      -BlockSizeBytes $pDisk.BlockSize `
                      -Verbose

    if($newvhd -eq $null)
    {
        Write-Host "It is assumed that the VHD [" ($pDisk.Path | Split-Path -Leaf) "] already exists and has been added to the Replica VM [" $VMNAME "]"
        continue;
    }

    # Mount, initialize, and format the new VHD
    $newvhd | Mount-VHD -Passthru -Verbose `
            | Initialize-Disk -Passthru -Verbose `
            | New-Partition -AssignDriveLetter -UseMaximumSize -Verbose `
            | Format-Volume -FileSystem NTFS -Confirm:$false -Force -Verbose

    # Unmount the disk
    $newvhd | Dismount-VHD -Passthru -Verbose

    # Attach the disk to the Replica VM at the same controller location as on the primary
    Add-VMHardDiskDrive -VMName $VMNAME `
                        -ControllerType $exDisk.ControllerType `
                        -ControllerNumber $exDisk.ControllerNumber `
                        -ControllerLocation $exDisk.ControllerLocation `
                        -Path $newvhd.Path `
                        -Verbose
}
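A hypothetical invocation – assuming the script above is saved as Add-ExcludedDisks.ps1 and run on the Replica host:

PS C:\> .\Add-ExcludedDisks.ps1 -VMNAME SQLSERVER -PRIMARYSERVER primaryhost01.contoso.com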

The script can also be customized for use with Azure Hyper-V Recovery Manager, but we’ll save that for another post!

Capacity Planner and disk exclusion

The Capacity Planner for Hyper-V Replica allows you to forecast your resource needs. It lets you be precise about the replication inputs that impact resource consumption – such as the disks that will be replicated and the disks that will not.

Figure 4:  Disks excluded for capacity planning

Key Takeaways

  1. Excluding virtual disks from replication can save on storage, IOPS, and network resources used during replication
  2. At the time of failover, ensure that the excluded virtual disk is attached to the Replica VM
  3. In most cases, the excluded virtual disk can be recreated on the Replica side using the PowerShell script provided

Optimizing Hyper-V Replica HTTPS traffic using Riverbed SteelHead

Hyper-V Replica supports both Kerberos-based authentication and certificate-based authentication – the former sends the replication traffic between the two servers/sites over HTTP while the latter sends it over HTTPS. Network bandwidth is a precious commodity, and any optimization delivered has a huge impact on the organization’s TCO and the Recovery Point Objective (RPO).
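For context, certificate-based (HTTPS) replication is enabled with the same Enable-VMReplication cmdlet shown elsewhere in this blog – a rough sketch, with hypothetical server names and certificate thumbprint:

PS C:\> Enable-VMReplication -VMName VM01 -ReplicaServerName replica.contoso.com -AuthenticationType Certificate -ReplicaServerPort 443 -CertificateThumbprint '0a1b2c3d4e5f'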

Around a year ago, we partnered with the folks from Riverbed in Microsoft’s EEC lab to publish a whitepaper detailing the bandwidth optimization of replication traffic sent over HTTP.

A few months back, we decided to revisit the setup with the latest release of RiOS (the Riverbed OS which runs in the Riverbed appliance). Using the resources and appliances from EEC and Riverbed, a set of experiments was performed to study the network optimizations delivered by the Riverbed SteelHead appliance. Optimizing SSL traffic has been a tough nut to crack, and we saw some really impressive numbers.  The whitepaper documenting the results and technology is available here – http://www.microsoft.com/en-us/download/details.aspx?id=42627.

At a high level, in order to optimize HTTPS traffic, the Riverbed SteelHead appliance decrypts the packets from the client (the primary server). It then optimizes the payload and re-encrypts it before sending it to the server-side SteelHead appliance over the internet/WAN. The server-side SteelHead appliance decrypts the payload, de-optimizes the traffic, and re-encrypts it, finally sending it to the destination server (the replica server), which proceeds to decrypt the replication traffic. The diagram below is taken from Riverbed’s user manual and explains this technology:

[diagram]

When Hyper-V Replica’s inbuilt compression is disabled, the reduction delivered over WAN was ~80%

[screenshot]

When Hyper-V Replica’s inbuilt compression is enabled, the reduction delivered over WAN was ~30%

[screenshot]

It’s worth calling out that the percentage reduction delivered depends on a number of factors – such as the workload’s read/write pattern, the sparseness of the disk, etc. – but the numbers were quite impressive.

In summary, both Hyper-V Replica and the SteelHead devices were easy to configure and worked “out-of the box”. Neither product required specific configurations to light up the scenario. The Riverbed appliance delivered ~30% on compressed, encrypted Hyper-V Replica traffic and ~80% on uncompressed, encrypted Hyper-V Replica traffic.

Backup of a Replica VM

This blog post covers the scenarios and motivations that drive the backup of a Replica VM, and product guidance to administrators.

Why backup a Replica VM?

Ever since the advent of Hyper-V Replica in Windows Server 2012, customers have been interested in backing up the Replica VM. Traditionally, IT administrators have taken backups of the VM that contains the running workload (the primary VM) and backup products have been built to cater to this need. So when a significant proportion of customers talked about the backup of Replica VMs, we were intrigued. There are a few key scenarios where backup of a Replica VM becomes useful:

  1. Reduce the impact of backup on the running workload:   Taking the backup of a VM involves the creation of a snapshot/diff-disk to baseline the changes that need to be backed up. For the duration of the backup job, the workload is running on a diff-disk, and there is an impact on the system when that happens. By offloading the backup to the Replica site, the running workload is no longer impacted by the backup operation. Of course, this is applicable only to deployments where the backup copy is stored on the remote site. For example, the daily backup operation might store the data locally for quicker restore times, but monthly or quarterly backups for long-term retention that are stored remotely can be done from the Replica VM.
  2. Limited bandwidth between sites:   This is typical of Branch Office-Head Office (BO-HO) kind of deployments where there are multiple smaller remote branch office sites and a larger central Head Office site. The backup data for the branch offices is stored in the head office, and an appropriate amount of bandwidth is provisioned by administrators to transfer the backup data between the two sites. The introduction of disaster recovery using Hyper-V Replica creates another stream of network traffic, and administrators have to re-evaluate their network infrastructure. In most cases, administrators either could not or were not willing to increase the bandwidth between sites to accommodate both backup and DR traffic. However they did come to the realization that backup and DR were independently sending copies of the same data over the network – and this was an area that could be optimized. With Hyper-V Replica creating a VM in the Head Office site, administrators could save on the network transfer by backing up the Replica VM locally rather than backing up the primary VM and sending the data over the network.
  3. Backup of all VMs in the Hoster datacenter:   Some customers use the Hoster datacenter as the Replica site, with the intention of not building a secondary datacenter of their own. Hosters have SLAs around the protection of all customer VMs in their datacenters – typically once a day backup. Thus the backup of Replica VMs becomes a requirement for the success of their business.

Thus various customer segments found that the backup of a Replica VM has value for their specific scenarios.

Data consistency

A key aspect of the backup operation is related to the consistency of the backed-up data. Customers have a clear prioritization and preference when it comes to data consistency of backed up VMs:

  1. Application-consistent backup
  2. Crash-consistent backup

And this prioritization applied to Replica VMs as well. Conversations with customers indicated that they were comfortable with crash-consistency for a Replica VM, if application-consistency was not possible. Of course, anything less than crash-consistency was not acceptable and customers preferred that backups fail rather than have inconsistent data getting backed up.

Attempting application-consistency

Typical backup products try to ensure application consistency of the data being backed up (using the VSS framework) – and this works out well when the VM is running. However, the Replica VM is always turned off until a failover is initiated, and VSS cannot be invoked inside a turned-off guest. Thus getting an application-consistent backup of a Replica VM is not possible.

Guaranteeing crash-consistency

In order to ensure that customers backing up Replica VMs always get crash-consistent data, a set of changes was introduced in Windows Server 2012 R2 that fails the backup operation if consistency cannot be guaranteed. The virtual disk could be inconsistent when any one of the conditions below is encountered, and in these cases backup is expected to fail:

  1. HRL logs are being applied to the Replica VM
  2. Previous HRL log apply operation was cancelled or interrupted
  3. Previous HRL log apply operation failed
  4. Replica VM health is Critical
  5. VM is in the Resynchronization Required state or the Resynchronization in progress state
  6. Migration of Replica VM is in progress
  7. Initial replication is in progress (between the primary site and secondary site)
  8. Failover is in progress

Dealing with failures

These are largely transient error states, and the backup product is expected to retry the backup operation based on its own retry policies. With 30-second replication and apply being supported in Windows Server 2012 R2, the backup operation is expected to collide with an HRL log apply more frequently – resulting in error scenario 1 mentioned above. A robust retry mechanism is needed to ensure a high backup success rate. In case the backup product is unable to retry or cope with failures, an option is to explicitly pause the replication before the backup is scheduled to run.
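Pausing and resuming replication around the backup window can be scripted on the Replica host. A minimal sketch, assuming a hypothetical Replica VM named SQLVM:

PS C:\> Suspend-VMReplication -VMName SQLVM
PS C:\> # ... run the scheduled backup job here ...
PS C:\> Resume-VMReplication -VMName SQLVM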

 

Key Takeaways

Impact on administrators 

  1. Backup of Replica VMs is better with Windows Server 2012 R2.
  2. Only crash-consistent backup of a Replica VM is guaranteed.
  3. A robust retry mechanism needs to be configured in the backup product to deal with failures. Or ensure that replication is paused when backup is scheduled.

Impact on backup vendors

  1. The changes introduced in Windows Server 2012 R2 would benefit customers using any backup product to take backup of Replica VMs.
  2. A robust retry mechanism would need to be built to deal with Replica VM failure.
  3. For specific details on how Data Protection Manager (DPM) deals with the backup of Replica VMs, refer to this blog post.

 

Update 25-Apr-2014:  The DPM-specific details on this post have been moved to the DPM blog.

Hosting Providers and HRM

If you are a hosting provider interested in offering DR as a service, you should go over this great post by Gaurav on how Hyper-V Recovery Manager (HRM) helps you build this capability: http://blogs.technet.com/b/scvmm/archive/2014/02/18/disaster-recovery-as-a-service-by-hosting-service-providers-using-windows-azure-hyper-v-recovery-manager.aspx

The post provides a high-level overview of the capability and also a detailed FAQ covering the common set of queries we have heard from our customers. If you have any further questions, leave a comment on the blog post.

Hyper-V Replica Certificate based authentication and Proxy servers

Continuing from where we left off, I have a small lab deployment which consists of AD, DNS, a proxy server (Forefront TMG 2010 on WS 2008 R2 SP1), primary servers, and replica servers. With the primary server behind the proxy (forward proxy), when I tried to enable replication using certificate-based authentication, I got the following error message: The handle is in the wrong state for the requested operation (0x00002EF3)

[screenshot]

That didn’t convey too much, did it? Fortunately I had Netmon running in the background, and the only network traffic seen was between the primary server and the proxy. A particular HTTP response caught my eye:

[screenshot]

The highlighted text indicated that the proxy was terminating the connection and returning a ‘Bad gateway’ error. A closer look at the TMG error log indicated that the error was encountered during the https-inspect state.

After some bing’ing of the errors, the pieces began to emerge. When HTTPS inspection is enabled, the TMG server terminates the connection and establishes a new connection (in our case, to the replica server), acting as a trusted man-in-the-middle. This doesn’t work for Hyper-V Replica, as we mutually authenticate the primary and replica server endpoints. To work around the situation, I disabled HTTPS inspection in the proxy server

[screenshot]

and things worked as expected. The primary server was able to establish the connection and replication was on track.