
Gathering Recent Events for a Specific VM

Imagine this scenario: you log in to one of your Hyper-V servers and find that something has gone wrong with a virtual machine. Maybe the guest operating system is not responding, maybe it is running slower than expected, or maybe something else has gone wrong.

As you triage the problem, you will likely want to gather all the information you can about what has been happening with the virtual machine in question. Luckily, this is quite easy to do with PowerShell.

In fact, you just need to run this code snippet:

$vmName = "File Server"
Get-WinEvent -FilterHashtable @{LogName = "Microsoft-Windows-Hyper-V*"; StartTime = (Get-Date).AddDays(-2)} | ?{([xml]$_.ToXml()).Event.UserData.VmlEventLog.VmName -eq $vmName}

And you will get results like this:
[Screenshot: output of the Get-WinEvent command]
(sorry for the lack of results – I have not had any problems with my virtual machines lately!)

This works because Hyper-V tags each event log entry with the virtual machine name, and the Get-WinEvent cmdlet lets you look for this tag in the XML of each event it returns.
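If you find yourself doing this often, the filter can be wrapped in a small helper. This is only a sketch: the function names Test-EventMatchesVM and Get-VMRecentEvent are made up for illustration, and it assumes the events you care about carry the VM name under UserData/VmlEventLog/VmName, which is the path the snippet above reads. Events that keep their data elsewhere are simply skipped.

```powershell
# Hypothetical helper: returns $true when an event's XML carries the given VM name
# under UserData/VmlEventLog/VmName (the path read by the one-liner above).
function Test-EventMatchesVM {
    param(
        [Parameter(Mandatory)] [string] $EventXml,
        [Parameter(Mandatory)] [string] $VMName
    )
    $doc = [xml]$EventXml
    # Missing nodes evaluate to $null, so events without the tag return $false.
    return ($doc.Event.UserData.VmlEventLog.VmName -eq $VMName)
}

# Hypothetical wrapper: Hyper-V events from the last $Days days for one VM.
function Get-VMRecentEvent {
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [int] $Days = 2
    )
    $filter = @{ LogName = "Microsoft-Windows-Hyper-V*"; StartTime = (Get-Date).AddDays(-$Days) }
    Get-WinEvent -FilterHashtable $filter |
        Where-Object { Test-EventMatchesVM -EventXml $_.ToXml() -VMName $VMName }
}
```

With these defined, something like Get-VMRecentEvent -VMName "File Server" -Days 7 should return the same kind of results as the one-liner, over a longer window.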

Cheers,
Ben

Application consistent recovery points with Windows Server 2008/2003 guest OS

I recently had a conversation with a customer around a very interesting problem, and the insights that were gained there are worth sharing. The issue was about VSS errors popping up in the guest event viewer while Hyper-V Replica reported the successful creation of application-consistent (VSS-based) recovery points.

Deployment details

The customer had the following setup that was throwing errors:

  1. Primary site:   Hyper-V Cluster with Windows Server 2012 R2
  2. Replica site:   Hyper-V Cluster with Windows Server 2012 R2
  3. Virtual machines:   SQL Server instances running SQL Server 2012 SP1, SQL Server 2005, and SQL Server 2008

At the time of enabling replication, the customer selected the option to create additional recovery points and set the “Volume Shadow Copy Service (VSS) snapshot frequency” to 1 hour. This means that every hour the VSS writer of the guest OS would be invoked to take an application-consistent snapshot.
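For reference, the same settings can also be applied from PowerShell on the primary host. The following is a sketch under assumptions, not the customer's actual configuration: "SQL-Guest" is a placeholder VM name, and the recovery history value of 4 is an arbitrary example.

```powershell
# Sketch only: run on the primary Hyper-V host.
# "SQL-Guest" is a placeholder VM name; 4 additional recovery points is an arbitrary example.
Set-VMReplication -VMName "SQL-Guest" -RecoveryHistory 4 -VSSSnapshotFrequencyHour 1

# Review the resulting replication settings for the VM.
Get-VMReplication -VMName "SQL-Guest" | Format-List *
```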

Symptoms

With this configuration, the output was contradictory: the guest event viewer showed errors and failures during the VSS process, while the Replica VM showed application-consistent points in its recovery history.

Here is an example of the error registered in the guest:

SQLVM: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=2644. Thread=7212. Client. Instance=. VD=Global*******

BACKUP failed to complete the command BACKUP DATABASE model. Check the backup application log for detailed messages.

BackupVirtualDeviceFile::SendFileInfoBegin:  failure on backup device '{********-63**-49**-BA**-5DB6********}1'. Operating system error 995(error not found).

Root cause and dealing with the errors

The big question was: why was Hyper-V Replica showing application-consistent recovery points if there were failures?

The behavior seen by the customer is a benign error caused by the interaction between Hyper-V and VSS, especially for older versions of the guest OS. Details about this can be found in the KB article here: http://support.microsoft.com/kb/2952783

The Hyper-V requestor explicitly stops the VSS operation right after the OnThaw phase. While this ensures application-consistency of the writes going to the disk, it also results in the VSS errors being logged. Meanwhile, Hyper-V returns the consistency correctly to Hyper-V Replica, which in turn makes sure that the recovery side shows application-consistent points.

A great way to validate whether a recovery point is application-consistent is to do a test failover on that recovery point. After the VM has booted up, check the event viewer logs in the guest: if they contain events pertaining to a rollback, the point is not application-consistent.
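The test failover can be started from PowerShell on the Replica server. This is a minimal sketch, assuming a placeholder VM name of "SQL-Guest" and that you want the most recent point that Hyper-V Replica reports as application-consistent. Checking the guest event logs afterwards is still a manual step.

```powershell
# Sketch only: run on the Replica server. "SQL-Guest" is a placeholder VM name.
$vmName = "SQL-Guest"

# Pick the most recent recovery point reported as application-consistent.
$latest = Get-VMSnapshot -VMName $vmName -SnapshotType AppConsistentReplica |
    Sort-Object CreationTime -Descending |
    Select-Object -First 1

# Start a test failover from that point; Hyper-V creates a separate test VM for it.
Start-VMFailover -VMRecoverySnapshot $latest -AsTest

# After starting the test VM and inspecting its event logs, end the test failover.
# Stop-VMFailover -VMName $vmName
```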

Key Takeaways

  1. All in all, you can rest assured that in the case of VMs with older operating systems, Hyper-V Replica is correctly taking an application-consistent snapshot of the virtual machine.
  2. Although errors are seen in the guest, they are benign, and a recovery history with application-consistent points is expected behavior.

Quickly Recovering Replication on Hyper-V

Two weeks ago, I had to recover from a sizable power outage. When this happened, my first priority was to make sure that all of my virtual machines were running well. Once I had done this, my next goal was to get Hyper-V Replica back up and running – so that I would be protected against any future problems.

Now, Hyper-V Replica would have eventually sorted itself out – but I did not want to wait for this to happen organically. I wanted things fixed immediately.

Hyper-V Replica had correctly detected that there was a problem, and had scheduled resynchronization for all of my virtual machines. What I did to speed up the process was to shut down all non-critical virtual machines, and then use PowerShell to run the following command:

Get-VM -ComputerName Hyper-V-1, Hyper-V-2 | ?{$_.ReplicationMode -eq "Primary" -and $_.ReplicationHealth -eq "Critical"} | Resume-VMReplication -Resynchronize

This caused replica resynchronization to start immediately for all virtual machines that were reporting that replication was in a critical state. At this stage I must give a word of caution. You may be wondering why I shut down non-critical virtual machines before doing this. The reason is that initiating a mass resynchronization like this will generate a huge amount of disk activity, as Hyper-V goes through and rechecks all of the data on disk. I shut down non-critical systems to try to minimize the amount of data churn that occurred during this process. Even with this precautionary step, I could feel the system slow down overall while resynchronization was happening.
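Given how heavy a mass resynchronization is, it can be worth checking which virtual machines the command would touch before running it for real. The sketch below reuses the same filter with -WhatIf, which reports what would be resynchronized without starting anything, and then lists replication status so you can watch progress. The host names are the ones from the command above; the choice of columns is just an illustration.

```powershell
# Dry run: show which VMs would be resynchronized, without starting anything.
Get-VM -ComputerName Hyper-V-1, Hyper-V-2 |
    Where-Object { $_.ReplicationMode -eq "Primary" -and $_.ReplicationHealth -eq "Critical" } |
    Resume-VMReplication -Resynchronize -WhatIf

# After the real run, check on progress by listing replication state and health.
Get-VM -ComputerName Hyper-V-1, Hyper-V-2 |
    Where-Object { $_.ReplicationMode -eq "Primary" } |
    Format-Table Name, ComputerName, ReplicationState, ReplicationHealth
```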

But after a relatively short period of time, resynchronization was complete and my computers were (almost) back to normal.

Cheers,
Ben

Upcoming Preview of 'Disaster Recovery to Azure' Functionality in Hyper-V Recovery Manager

In the coming weeks, we will preview functionality within Hyper-V Recovery Manager that enables Microsoft Azure as a Disaster Recovery point for virtualized workloads. The new functionality will add support for secure and seamless management of failover and failback operations using Azure IaaS Virtual Machines, thereby enabling our customers to save precious CAPEX and ongoing OPEX incurred in managing a secondary site for Disaster Recovery. Our enhanced DRaaS offering further delivers on our promise of democratizing Disaster Recovery and of making it available to everyone, everywhere. Hyper-V Recovery Manager provides enterprise-scale Disaster Recovery using a single-click failover in the event of a disaster to an alternate enterprise data center or to an IaaS VM in Microsoft Azure. Application and Site Level Disaster Recovery is delivered via automation of overall DR workflow, smart networking, and frequent testing using DR Drills.

We announced the Preview during TechEd 2014. For more details about the upcoming Preview and existing Hyper-V Recovery Manager functionality, check out the DCIM-B322 session recording.

Replication Health Mailer

One of our engineers, Sangeeth, has come up with a nifty PowerShell script that mails the replication health of a host or a cluster in a nice dashboard format. We thought it would help our customers to get the status of the replicating VMs and their footprint on CPU and memory. You can download the script here.

The sample output from the script looks like this. You can add as many recipients as you wish :)

[Screenshot: sample replication health report email]

On a cluster, you can run this script on one of the cluster nodes to get information about all cluster VMs. You can also run it against a remote host or a remote cluster by using the “HostorClusterName” parameter. For a cluster, use the “isCluster” parameter to tell the script to get information from the cluster rather than from the local node.
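To give a feel for the idea, here is a much smaller sketch of a replication health report. It is not Sangeeth's script and does not cover clusters or CPU and memory footprint. The helper name ConvertTo-ReplicationReport is hypothetical, and the mail addresses and SMTP server in the commented usage are placeholders.

```powershell
# Hypothetical helper: turns VM objects into a simple HTML table of replication status.
function ConvertTo-ReplicationReport {
    param([Parameter(Mandatory)] [object[]] $VM)
    $VM |
        Select-Object Name, ReplicationMode, ReplicationState, ReplicationHealth |
        ConvertTo-Html -Title "Replication health" |
        Out-String
}

# Example use on a Hyper-V host (addresses and SMTP server are placeholders):
# $vms  = Get-VM | Where-Object { $_.ReplicationMode -ne "None" }
# $body = ConvertTo-ReplicationReport -VM $vms
# Send-MailMessage -To "admin@example.com" -From "hyperv@example.com" `
#     -Subject "Replication health" -Body $body -BodyAsHtml -SmtpServer "smtp.example.com"
```

Keeping the formatting in its own function means the report can be checked with plain objects, without a Hyper-V host or a mail server.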

Isn’t it simple and easy to get the replication information about VMs?

Hyper-V Replica trouble-shooting wiki

We are happy to announce the availability of Hyper-V Replica trouble-shooting Wiki here:

http://social.technet.microsoft.com/wiki/contents/articles/21948.hyper-v-replica-troubleshooting-guide.aspx

This guide contains links and resources to trouble-shoot some common Hyper-V Replica failure scenarios. We will be updating the guide over time!

We would like this to be a community effort, so you are free to add content to this guide. To add content, follow this high-level schema for new articles [please feel free to add other sections as appropriate]:

a. Error Messages/Event Viewer details – This section describes the error messages the customer will see in the UI/PS/WMI and the event viewer messages that are logged.

b. Possible Causes – This section explains the scenarios (one or more) that might have led to the failure.

c. Resolution – This section lists the actions the admin has to take in their environment to resolve the failure.

d. Additional resources – A list of blogs/KB articles/documentation/other articles that contain more information for the customer about the failure.

If you are new to TechNet wiki, the guide on “How to contribute” is here.

Happy WIKI’ing :)