Tag Archives: replication

How to Use ASR and Hyper-V Replica with Failover Clusters

In the third and final post of this blog series, we will evaluate Microsoft’s replication solutions for multi-site clusters and how to integrate basic backup/DR with them. These include Hyper-V Replica, Azure Site Recovery, and DFS Replication. In the first part of the series, you learned about setting up failover clusters to work with DR solutions, and in the second post, you learned about disk replication considerations from third-party storage vendors. The challenge with the solutions we previously discussed is that they typically require third-party hardware or software. Let’s look at the basic technologies provided by Microsoft to reduce these upfront fixed costs.

Note: The features discussed in this article are native Microsoft features with a baseline level of functionality. Should you require capabilities beyond this baseline, you should look at a third-party backup/replication product such as Altaro VM Backup.

Multi-Site Disaster Recovery with Windows Server DFS Replication (DFSR)

DFS Replication (DFSR) is a Windows Server role service that has been around for many releases. Although DFSR is built into Windows Server and is easy to configure, it is not supported for multi-site clustering. This is because the replication of files only happens when a file is closed, so it works great for file servers hosting documents. However, it is not designed to work with application workloads where the file is kept open, such as SQL databases or Hyper-V VMs. Since these file types will only close during a planned failover or unplanned crash, it is hard to keep the data consistent at both sites. This means that if your first site crashes, the data will not be available at the second site, so DFSR should not be considered as a possible solution.

Multi-Site Disaster Recovery with Hyper-V Replica

The most popular Microsoft DR solution is Hyper-V Replica, which is a built-in Hyper-V feature available to Windows Server customers at no additional cost. It copies the virtual hard disk (VHD) file of a running virtual machine from one host to a second host in a different location. This is an excellent low-cost solution to replicate your data between your primary and secondary sites, and it even allows you to do extended (“chained”) replication to a third location. However, it is limited in that it only replicates Hyper-V virtual machines (VMs), so it cannot be used for any other application unless it is virtualized and running inside a VM. The way it works is that any changes to the VHD file are tracked by a log file, which is copied to an offline VM/VHD in the secondary site. This means that replication is asynchronous, with copies sent every 30 seconds, 5 minutes, or 15 minutes. While this means that there is no distance limitation between the sites, there could be some data loss if any in-memory data has not been written to the disk or if there is a crash between replication cycles.

Figure 1 – Two Clusters Replicate Data between Sites with Hyper-V Replica
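To make the log-shipping model concrete, here is a toy sketch in Python. This is purely illustrative: the class, method names, and data structures are invented for this example and do not reflect Hyper-V Replica’s actual internals.

```python
# Toy sketch of log-based asynchronous replication (illustrative only --
# NOT how Hyper-V Replica is implemented internally).

class ReplicatedDisk:
    def __init__(self):
        self.primary = {}   # block offset -> data on the primary VHD
        self.replica = {}   # offline copy at the secondary site
        self.log = []       # change log of writes not yet shipped

    def write(self, offset, data):
        """A guest write lands on the primary and is recorded in the log."""
        self.primary[offset] = data
        self.log.append((offset, data))

    def replication_cycle(self):
        """Every 30 seconds / 5 minutes / 15 minutes: ship the accumulated
        log to the secondary site and apply it to the offline replica."""
        for offset, data in self.log:
            self.replica[offset] = data
        self.log.clear()

disk = ReplicatedDisk()
disk.write(0, "boot sector")
disk.replication_cycle()      # replica now has block 0
disk.write(8, "new data")     # recorded in the log, not yet shipped
# If the primary site crashes here, block 8 is lost: it was written
# after the last replication cycle completed.
print(disk.replica.get(8))    # -> None
```

The gap between `write()` and the next `replication_cycle()` is exactly the data-loss window described above.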

Hyper-V Replica allows for replication between standalone Hyper-V hosts, between separate clusters, or any combination of the two. This means that instead of stretching a single cluster across two sites, you will set up two independent clusters. This also allows for a more affordable solution by letting businesses set up a cluster in their primary site and a single host in their secondary site that is used only for mission-critical applications. If Hyper-V Replica is deployed on a failover cluster, a new clustered workload type is created, known as the Hyper-V Replica Broker. This makes the replication service highly available, so that if a node crashes, the replication engine will fail over to a different node and continue to copy logs to the secondary site, providing greater resiliency.

Another powerful feature of Hyper-V Replica is its built-in testing, allowing you to simulate both planned and unplanned failovers to the secondary site. While this solution will meet the needs of most virtualized datacenters, it is important to remember that there are no integrity checks on the data being copied between the VMs. This means that if a VM becomes corrupted or is infected with a virus, that same fault will be sent to its replica. For this reason, backups of the virtual machine are still a critical part of standard operating procedure. Additionally, this Altaro blog notes that Hyper-V Replica has other limitations compared to backups when it comes to retention, file space management, keeping separate copies, using multiple storage locations, and replication frequency, and it may have a higher total cost of ownership. If you are using a multi-site DR solution with two clusters, make sure that you are taking and storing backups in both sites, so that you can recover your data at either location. Also make sure that your backup provider supports clusters, CSV disks, and Hyper-V Replica; fortunately, this support is now standard in the industry.

Multi-Site Disaster Recovery with Azure Site Recovery (ASR)

All of the aforementioned solutions require you to have a second datacenter, which simply is not possible for some businesses. While you could rent rack space from a colocation facility, the economics may not make sense. Fortunately, the Microsoft Azure public cloud can now serve as your disaster recovery site using Azure Site Recovery (ASR). This technology works with Hyper-V Replica, but instead of copying your VMs to a secondary site, you push them to a nearby Microsoft datacenter. It still has the same limitations as Hyper-V Replica, including the replication frequency, and furthermore you do not have access to the physical infrastructure of your DR site in Azure. The replicated VM can run on the native Azure infrastructure, or you can even build a virtualized guest cluster and replicate to this highly available infrastructure.

While ASR is a significantly cheaper solution than maintaining your own hardware in the secondary site, it is not free. You have to pay for the service, the storage of your virtual hard disks (VHDs) in the cloud, and if you turn on any of those VMs, you will pay for standard Azure VM operating costs.

If you are using ASR, you should follow the same backup best practices mentioned in the earlier Hyper-V Replica section. The main difference is that you should use an Azure-native backup solution to protect your replicated VHDs in Azure, in case you fail over to the Azure VMs for any extended period of time.

Conclusion

From reviewing this blog series, you should be equipped to make the right decisions when planning your disaster recovery solution using multi-site clustering. Start by understanding your site restrictions, and from there you can plan your hardware needs and storage replication solution. The options range from higher-priced solutions with more features to cost-effective solutions using Microsoft Azure that offer more limited control. Even after you have deployed this resilient infrastructure, keep in mind that there are still three main reasons why disaster recovery plans fail:

  • The detection of the outage failed, so the failover to the secondary datacenter never happens.
  • One component in the DR failover process does not work, which is usually due to poor or infrequent testing.
  • The process was not automated and depended on humans, who create a bottleneck and are unreliable during a disaster.
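The first failure mode, missed outage detection, usually boils down to a heartbeat timeout. A minimal sketch of the idea follows; the function name and thresholds are hypothetical and do not represent any specific product’s detection logic:

```python
# Minimal heartbeat-based outage detection sketch (illustrative only;
# real DR products use redundant probes and quorum votes, not one timer).

def should_fail_over(heartbeat_times, now, interval=10.0, missed_allowed=3):
    """Declare the primary site down only after several consecutive
    heartbeats are missed, to avoid failing over on a network blip."""
    if not heartbeat_times:
        return False                      # never seen the site: don't act
    silence = now - max(heartbeat_times)  # seconds since the last beat
    return silence > interval * missed_allowed

beats = [0.0, 10.0, 20.0]                 # heartbeats every 10 seconds
print(should_fail_over(beats, now=25.0))  # -> False (just a blip so far)
print(should_fail_over(beats, now=55.0))  # -> True  (3+ intervals silent)
```

If this check never fires, the failover never starts, which is exactly the first bullet above.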

This means that whichever solution you choose, make sure that it is well tested with quick failure detection and try to eliminate all dependencies on humans! Good luck with your deployment and please post any questions that you have in the comments section of this blog.


Go to Original Article
Author: Symon Perriman

How to Use Failover Clusters with 3rd Party Replication

In this second post, we will review the different types of replication options and give you guidance on what you need to ask your storage vendor if you are considering a third-party storage replication solution.

If you want to set up a resilient disaster recovery (DR) solution for Windows Server and Hyper-V, you’ll need to understand how to configure a multi-site cluster, as this also provides you with local high availability. In the first post in this series, you learned about best practices for planning the location, node count, quorum configuration, and hardware setup. The next critical decision you have to make is how to maintain identical copies of your data at both sites, so that the same information is available to your applications, VMs, and users.

Multi-Site Cluster Storage Planning

All Windows Server Failover Clusters require some type of shared storage so that an application can run on any host and access the same data. Multi-site clusters behave the same way, but they require independent storage arrays at each site, with the data replicated between them. The clustered application or virtual machine (VM) at each site should use its own local storage array; otherwise, every disk I/O operation would have to traverse the link to the other location, adding significant latency.

If you are running Hyper-V VMs on your multi-site cluster, you may wish to use Cluster Shared Volumes (CSV) disks. This type of clustered storage configuration is optimized for Hyper-V and allows multiple virtual hard disks (VHDs) to reside on the same disk while the VMs run on different nodes. The challenge when using CSV in a multi-site cluster is ensuring that the VMs always write to the disk in their own site, and not to the replicated copy. Most storage providers offer CSV-aware solutions, but you must make sure that they explicitly support multi-site clustering scenarios. Often the vendors will force writes to the primary site by making the CSV disk at the second site read-only, ensuring that the correct disks are always used.

Understanding Synchronous and Asynchronous Replication

As you progress in planning your multi-site cluster you will have to select how your data is copied between sites, either synchronously or asynchronously. With asynchronous replication, the application will write to the clustered disk at the primary site, then at regular intervals, the changes will be copied to the disk at the secondary site. This usually happens every few minutes or hours, but if a site fails between replication cycles, then any data from the primary site which has not yet been copied to the secondary site will be lost. This is the recommended configuration for applications that can sustain some amount of data loss, and this generally does not impose any restrictions on the distance between sites. The following image shows the asynchronous replication cycle.

Asynchronous Replication in a Multi-Site Cluster

With synchronous replication, whenever a disk write command occurs at the primary site, it is copied to the secondary site, and an acknowledgment must be returned by both storage arrays before the write is committed. Synchronous replication ensures consistency between both sites and avoids data loss in the event of a crash. The challenge of writing to two sets of disks in different locations is that the sites must be physically close, or the performance of the application can suffer. Even with a high-bandwidth, low-latency connection, synchronous replication is usually recommended only for critical applications that cannot sustain any data loss, and this should factor into the location of your secondary site. The following image shows the synchronous replication cycle.

Synchronous Replication in a Multi-Site Cluster
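The difference between the two modes comes down to when the write is acknowledged to the application. The following toy sketch contrasts the two semantics; it is illustrative pseudologic only (real storage arrays implement this in firmware and drivers), with invented function names:

```python
# Toy contrast of synchronous vs asynchronous replication semantics.
# Illustrative only -- not any storage vendor's actual implementation.

def synchronous_write(primary, secondary, offset, data):
    """The write is committed only after BOTH sites hold it,
    so the secondary can never be behind (no data loss)."""
    primary[offset] = data
    secondary[offset] = data      # must complete before we return
    return "committed"            # ack returned to the application

def asynchronous_write(primary, pending_log, offset, data):
    """The write is acknowledged as soon as the primary has it;
    shipping to the secondary happens later, on the next cycle."""
    primary[offset] = data
    pending_log.append((offset, data))
    return "committed"            # ack does not wait on the secondary

primary, secondary, log = {}, {}, []
synchronous_write(primary, secondary, 0, "A")
asynchronous_write(primary, log, 1, "B")
# After a crash right now: block 0 is safe at both sites, but block 1
# exists only on the primary and in its unshipped log.
print(0 in secondary, 1 in secondary)   # -> True False
```

The synchronous path’s extra round trip per write is why physical distance between sites directly limits application performance.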

As you continue to evaluate different storage vendors, you may also want to assess the granularity of their replication solution. Most of the traditional storage vendors will replicate data at the block-level, which means that they track specific segments of data on the disk which have changed since the last replication. This is usually fast and works well with larger files (like virtual hard disks or databases), as only blocks that have changed need to be copied to the secondary site. Some examples of integrated block-level solutions include HP’s Cluster Extension, Dell/EMC’s Cluster Enabler (SRDF/CE for DMX, RecoverPoint for CLARiiON), Hitachi’s Storage Cluster (HSC), NetApp’s MetroCluster, and IBM’s Storage System.
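The core idea behind block-level replication can be sketched as a dirty-block bitmap: only blocks touched since the last cycle cross the wire, no matter how large the file is. The sketch below is a simplified illustration with invented names, not any of the vendors’ actual algorithms:

```python
# Simplified dirty-block tracking, the core idea behind block-level
# replication (illustrative; not any specific vendor's implementation).

BLOCK_SIZE = 4096  # bytes per tracked block

class BlockTracker:
    def __init__(self):
        self.dirty = set()   # block indices changed since the last sync

    def record_write(self, byte_offset, length):
        """Mark every block touched by a write as dirty."""
        first = byte_offset // BLOCK_SIZE
        last = (byte_offset + length - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def blocks_to_replicate(self):
        """Only the dirty blocks are copied; then the bitmap is reset."""
        todo = sorted(self.dirty)
        self.dirty.clear()
        return todo

tracker = BlockTracker()
tracker.record_write(byte_offset=0, length=100)       # touches block 0
tracker.record_write(byte_offset=8192, length=4096)   # touches block 2
print(tracker.blocks_to_replicate())                  # -> [0, 2]
```

A multi-gigabyte VHD with a few changed blocks replicates in seconds under this scheme, which is why block-level tracking suits large, frequently open files.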

There are also some storage vendors that provide a file-based replication solution that can run on top of commodity storage hardware. These providers keep track of individual files that have changed and only copy those. This is often less efficient than block-level replication, since larger chunks of data (full files) must be copied; however, the total cost of ownership can be much lower. A few of the top file-level vendors who support multi-site clusters include Symantec’s Storage Foundation High Availability, Sanbolic’s Melio, SIOS’s DataKeeper Cluster Edition, and Vision Solutions’ Double-Take Availability.

The final class of replication providers will abstract the underlying sets of storage arrays at each site. This software manages disk access and redirection to the correct location. The more popular solutions include EMC’s VPLEX, FalconStor’s Continuous Data Protector and DataCore’s SANsymphony. Almost all of the block-level, file-level, and appliance-level providers are compatible with CSV disks, but it is best to check that they support the latest version of Windows Server if you are planning a fresh deployment.

By now you should have a good understanding of how you plan to configure your multi-site cluster and your replication requirements, so you can plan your backup and recovery process. Even though the application’s data is being copied to the secondary site, which is similar to a backup, it does not replace the real thing. If the VM (VHD) at one site becomes corrupted, that same error is likely to be copied to the secondary site, so you should still regularly back up any production workloads running at either site. This means that you need to deploy your cluster-aware backup software and agents in both locations and ensure that they are regularly taking backups. The backups should also be stored independently at both sites so that data can be recovered from either location if one datacenter becomes unavailable. Testing recovery from both sites is strongly recommended. Altaro’s Hyper-V Backup is a great solution for multi-site clusters and is CSV-aware, ensuring that your disaster recovery solution is resilient to all types of disasters.

If you are looking for a more affordable multi-site cluster replication solution, only have a single datacenter, or your storage provider does not support these scenarios, Microsoft offers a few solutions. This includes Hyper-V Replica and Azure Site Recovery, and we’ll explore these disaster recovery options and how they integrate with Windows Server Failover Clustering in the third part of this blog series.

Let us know if you have any questions in the comments form below!

