Azure Site Recovery Deployment Planner for VMware to Azure

Before you begin to protect any VMware virtual machines (VMs) by using Azure Site Recovery, allocate sufficient bandwidth, based on your daily data-change rate, to meet your desired recovery point objective (RPO). Be sure to deploy the right number of configuration servers and process servers on-premises.

This tool can help you estimate the duration of the migration.

What follows is a detailed, step-by-step guide to the tool.

Prerequisites

The tool has two main phases: profiling and report generation. There is also a third option to calculate throughput only. The requirements for the server from which the profiling and throughput measurement is initiated are presented in the following table.

Server requirement: Profiling and throughput measurement
  • Operating system: Windows Server 2016 or Windows Server 2012 R2 (ideally matching at least the size recommendations for the configuration server)
  • Machine configuration: 8 vCPUs, 16 GB RAM, 300 GB HDD
  • .NET Framework 4.5
  • VMware vSphere PowerCLI 6.0 R3
  • Visual C++ Redistributable for Visual Studio 2012
  • Internet access to Azure from this server
  • Azure storage account
  • Administrator access on the server
  • Minimum 100 GB of free disk space (assuming 1,000 VMs with an average of three disks each, profiled for 30 days)
  • VMware vCenter statistics level set to 1 or higher
  • Allow vCenter port (default 443): Site Recovery Deployment Planner uses this port to connect to the vCenter server/ESXi host

Server requirement: Report generation
  • A Windows PC or Windows Server with Excel 2013 or later
  • .NET Framework 4.5
  • Visual C++ Redistributable for Visual Studio 2012
  • VMware vSphere PowerCLI 6.0 R3, required only when you pass the -User option in the report generation command to fetch the latest configuration information of the VMs. In that case the Deployment Planner connects to the vCenter server, so allow the vCenter port (default 443).

Server requirement: User permissions
  • Read-only permission for the user account that’s used to access the VMware vCenter server/VMware vSphere ESXi host during profiling

  • Download the tool by clicking here

Modes of running deployment planner

You can run the command-line tool (ASRDeploymentPlanner.exe) in any of the following three modes:

  1. Profiling
  2. Report generation
  3. Get throughput

First, run the tool in profiling mode to gather VM data churn and IOPS. Next, run the tool to generate the report to find the network bandwidth, storage requirements and DR cost.

Profile VMware VMs

In profiling mode, the deployment planner tool connects to the vCenter server/vSphere ESXi host to collect performance data about the VM.

  • Profiling does not affect the performance of the production VMs, because no direct connection is made to them. All performance data is collected from the vCenter server/vSphere ESXi host.
  • To ensure that there is a negligible impact on the server because of profiling, the tool queries the vCenter server/vSphere ESXi host once every 15 minutes. This query interval does not compromise profiling accuracy, because the tool stores every minute’s performance counter data.
Create a list of VMs to profile

First, you need a list of the VMs to be profiled. You can get all the names of VMs on a vCenter server/vSphere ESXi host by using the VMware vSphere PowerCLI commands in the following procedure. Alternatively, you can list in a file the friendly names or IP addresses of the VMs that you want to profile manually.

  1. Sign in to the VM on which VMware vSphere PowerCLI is installed.
  2. Open a PowerShell command prompt and run the commands in the following steps.
  3. Ensure that the execution policy is enabled for the script. If it is disabled, launch the VMware vSphere PowerCLI console in administrator mode, and then enable it by running the following command: Set-ExecutionPolicy -ExecutionPolicy AllSigned
  4. If Connect-VIServer is not recognized as the name of a cmdlet, you may also need to run the following command: Add-PSSnapin VMware.VimAutomation.Core
  5. To get all the names of VMs on a vCenter server/vSphere ESXi host and store the list in a .txt file, run the two commands listed here
  6. Open the output file in Notepad, and then copy the names of all VMs that you want to profile to another file (for example, ProfileVMList.txt), one VM name per line. This file is used as input to the -VMListFile parameter of the command-line tool.
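The manual copy step in Notepad can also be scripted. The following is only a minimal sketch in Python: the function name build_vm_list and the sample VM names are hypothetical, and it assumes the exported file may contain a "Name" header row, a row of dashes, blank lines, and duplicate names. Check the layout of your own export before relying on it.

```python
def build_vm_list(exported_text, wanted):
    """Return the wanted VM names, one entry per VM, in export order."""
    names = []
    seen = set()
    for raw in exported_text.splitlines():
        name = raw.strip()
        # Skip blank lines and the header/underline rows that a PowerShell
        # table export may contain (assumed format).
        if not name or name == "Name" or set(name) == {"-"}:
            continue
        if name in wanted and name not in seen:
            names.append(name)
            seen.add(name)
    return names


# Hypothetical export content, for illustration only.
exported = """
Name
----
virtual_machine_A
virtual_machine_B
test_vm_01
virtual_machine_A
"""

selected = build_vm_list(exported, {"virtual_machine_A", "virtual_machine_B"})

# One VM name per line, as expected by the -VMListFile parameter.
profile_list_text = "\n".join(selected) + "\n"
# To save it: open("ProfileVMList.txt", "w").write(profile_list_text)
```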
Start profiling

After you have the list of VMs to be profiled, you can run the tool in profiling mode. Here is the list of mandatory and optional parameters of the tool to run in profiling mode.

ASRDeploymentPlanner.exe -Operation StartProfiling /?
Parameter name: Description
-Operation: StartProfiling
-Server: The fully qualified domain name or IP address of the vCenter server/vSphere ESXi host whose VMs are to be profiled.
-User: The user name to connect to the vCenter server/vSphere ESXi host. The user needs to have read-only access, at minimum.
-VMListFile: The file that contains the list of VMs to be profiled. The file path can be absolute or relative. The file should contain one VM name/IP address per line. The virtual machine name specified in the file should be the same as the VM name on the vCenter server/vSphere ESXi host. For example, the file VMList.txt contains the following VMs:
    virtual_machine_A
    10.150.29.110
    virtual_machine_B
-NoOfMinutesToProfile: The number of minutes for which profiling is to be run. Minimum is 30 minutes.
-NoOfHoursToProfile: The number of hours for which profiling is to be run.
-NoOfDaysToProfile: The number of days for which profiling is to be run. We recommend that you run profiling for more than 7 days to ensure that the workload pattern in your environment over the specified period is observed and used to provide an accurate recommendation.
-Virtualization: Specify the virtualization type (VMware or Hyper-V).
-Directory: (Optional) The universal naming convention (UNC) or local directory path to store profiling data generated during profiling. If a directory name is not given, the directory named ‘ProfiledData’ under the current path will be used as the default directory.
-Password: (Optional) The password to use to connect to the vCenter server/vSphere ESXi host. If you do not specify one now, you will be prompted for it when the command is executed.
-Port: (Optional) Port number to connect to the vCenter/ESXi host. Default port is 443.
-Protocol: (Optional) Specifies the protocol, either ‘http’ or ‘https’, to connect to vCenter. Default protocol is https.
-StorageAccountName: (Optional) The storage-account name that’s used to find the throughput achievable for replication of data from on-premises to Azure. The tool uploads test data to this storage account to calculate throughput. The storage account must be of the General-purpose v1 (GPv1) type.
-StorageAccountKey: (Optional) The storage-account key that’s used to access the storage account. Go to the Azure portal > Storage accounts > <Storage account name> > Settings > Access Keys > Key1.
-Environment: (Optional) Your target Azure Storage account environment. This can be one of three values: AzureCloud, AzureUSGovernment, AzureChinaCloud. Default is AzureCloud. Use the parameter when your target Azure region is either Azure US Government or Azure China 21Vianet.

Microsoft recommends that you profile your VMs for more than 7 days. If the churn pattern varies within a month, Microsoft recommends profiling during the week in which you see the maximum churn. The best approach is to profile for 31 days to get a better recommendation.

Because this is a test for the purposes of this blog, I will profile for only 60 minutes, using the following command within VMware vSphere PowerCLI:

.\ASRDeploymentPlanner.exe -Operation StartProfiling -Virtualization VMware -Directory "C:\vCenter_ProfiledData" -Server vcenter67.arnaud.lab -VMListFile "C:\vCenter_ProfiledData\ProfileVMList.txt" -NoOfMinutesToProfile 60 -User administrator@arnaud.lab
  • When prompted, provide the password for the user that connects to vCenter, and then press Enter.
  • When all the checks have passed, the tool displays a confirmation message.
  • Keep the window open until profiling finishes.
  • When the process ends, continue with the next step.

Generate Report

The tool generates a macro-enabled Microsoft Excel file (XLSM file) as the report output, which summarizes all the deployment recommendations. The report is named DeploymentPlannerReport_<unique numeric identifier>.xlsm and placed in the specified directory.

Note: You will need to have Excel installed on the machine when you generate the report.

  • Run the following command to generate the report (adapt the command line to your configuration)
  • The tool runs and creates the report.
  • Once the report has been created, you can analyze it.

Analyze Report

The generated Microsoft Excel report contains multiple sheets:

On-premises summary

The On-premises summary worksheet provides an overview of the profiled VMware environment.

Start Date and End Date: The start and end dates of the profiling data considered for report generation. By default, the start date is the date when profiling starts, and the end date is the date when profiling stops. This can be the ‘StartDate’ and ‘EndDate’ values if the report is generated with these parameters.

Total number of profiling days: The total number of days of profiling between the start and end dates for which the report is generated.

Number of compatible virtual machines: The total number of compatible VMs for which the required network bandwidth, required number of storage accounts, Microsoft Azure cores, configuration servers and additional process servers are calculated.

Total number of disks across all compatible virtual machines: The number that’s used as one of the inputs to decide the number of configuration servers and additional process servers to be used in the deployment.

Average number of disks per compatible virtual machine: The average number of disks calculated across all compatible VMs.

Average disk size (GB): The average disk size calculated across all compatible VMs.

Desired RPO (minutes): Either the default recovery point objective or the value passed for the ‘DesiredRPO’ parameter at the time of report generation to estimate required bandwidth.

Desired bandwidth (Mbps): The value that you have passed for the ‘Bandwidth’ parameter at the time of report generation to estimate achievable RPO.

Observed typical data churn per day (GB): The average data churn observed across all profiling days. This number is used as one of the inputs to decide the number of configuration servers and additional process servers to be used in the deployment.

Recommendations

The recommendations sheet of the VMware to Azure report has the following details as per the selected desired RPO:

Profiled data

Profiled data period: The period during which the profiling was run. By default, the tool includes all profiled data in the calculation, unless it generates the report for a specific period by using StartDate and EndDate options during report generation.

Server Name: The name or IP address of the VMware vCenter or ESXi host whose VMs’ report is generated.

Desired RPO: The recovery point objective for your deployment. By default, the required network bandwidth is calculated for RPO values of 15, 30, and 60 minutes. Based on the selection, the affected values are updated on the sheet. If you have used the DesiredRPO parameter while generating the report, that value is shown in the Desired RPO result.

Profiling overview

Total Profiled Virtual Machines: The total number of VMs whose profiled data is available. If the VMListFile has names of any VMs which were not profiled, those VMs are not considered in the report generation and are excluded from the total profiled VMs count.

Compatible Virtual Machines: The number of VMs that can be protected to Azure by using Site Recovery. It is the total number of compatible VMs for which the required network bandwidth, number of storage accounts, number of Azure cores, and number of configuration servers and additional process servers are calculated. The details of every compatible VM are available in the “Compatible VMs” section.

Incompatible Virtual Machines: The number of profiled VMs that are incompatible for protection with Site Recovery. The reasons for incompatibility are noted in the “Incompatible VMs” section. If the VMListFile has names of any VMs that were not profiled, those VMs are excluded from the incompatible VMs count. These VMs are listed as “Data not found” at the end of the “Incompatible VMs” section.

Desired RPO: Your desired recovery point objective, in minutes. The report is generated for three RPO values: 15 (default), 30, and 60 minutes. The bandwidth recommendation in the report is changed based on your selection in the Desired RPO drop-down list at the top right of the sheet. If you have generated the report by using the -DesiredRPO parameter with a custom value, this custom value will show as the default in the Desired RPO drop-down list.

Required network bandwidth (Mbps)

To meet RPO 100 percent of the time: The recommended bandwidth in Mbps to be allocated to meet your desired RPO 100 percent of the time. This amount of bandwidth must be dedicated for steady-state delta replication of all your compatible VMs to avoid any RPO violations.

To meet RPO 90 percent of the time: Because of broadband pricing or for any other reason, if you cannot set the bandwidth needed to meet your desired RPO 100 percent of the time, you can choose to go with a lower bandwidth setting that can meet your desired RPO 90 percent of the time. To understand the implications of setting this lower bandwidth, the report provides a what-if analysis on the number and duration of RPO violations to expect.

Achieved Throughput: The throughput from the server on which you have run the GetThroughput command to the Microsoft Azure region where the storage account is located. This throughput number indicates the estimated level that you can achieve when you protect the compatible VMs by using Site Recovery, provided that your configuration server or process server storage and network characteristics remain the same as that of the server from which you have run the tool.

For replication, you should set the recommended bandwidth to meet the RPO 100 percent of the time. After you set the bandwidth, if you don’t see any increase in the achieved throughput, as reported by the tool, do the following:

  1. Check to see whether there is any network Quality of Service (QoS) that is limiting Site Recovery throughput.
  2. Check to see whether your Site Recovery vault is in the nearest physically supported Microsoft Azure region to minimize network latency.
  3. Check your local storage characteristics to determine whether you can improve the hardware (for example, HDD to SSD).
  4. Change the Site Recovery settings in the process server to increase the amount of network bandwidth used for replication.

If you are running the tool on a configuration server or process server that already has protected VMs, run the tool a few times. The achieved throughput number changes depending on the amount of churn being processed at that point in time.

For all enterprise Site Recovery deployments, Microsoft recommends the use of ExpressRoute.

Required storage accounts

The following chart shows the total number of storage accounts (standard and premium) that are required to protect all the compatible VMs. To learn which storage account to use for each VM, see the “VM-storage placement” section. If you are using v2.5 of Deployment Planner, this recommendation only shows the number of standard cache storage accounts which are needed for replication since the data is being directly written to Managed Disks.

Required number of Azure cores

This result is the total number of cores to be set up before failover or test failover of all the compatible VMs. If too few cores are available in the subscription, Site Recovery fails to create VMs at the time of test failover or failover.

Required on-premises infrastructure

This figure is the total number of configuration servers and additional process servers to be configured that would suffice to protect all the compatible VMs. Depending on the supported size recommendations for the configuration server, the tool might recommend additional servers. The recommendation is based on the larger of either the per-day churn or the maximum number of protected VMs (assuming an average of three disks per VM), whichever is hit first on the configuration server or the additional process server. You’ll find the details of total churn per day and total number of protected disks in the “On-premises summary” section.

What-if analysis

This analysis outlines how many violations could occur during the profiling period when you set a lower bandwidth for the desired RPO to be met only 90 percent of the time. One or more RPO violations can occur on any given day. The graph shows the peak RPO of the day. Based on this analysis, you can decide whether the number of RPO violations across all days and the peak RPO hit per day are acceptable with the specified lower bandwidth. If they are acceptable, you can allocate the lower bandwidth for replication; otherwise, allocate the higher bandwidth as suggested to meet the desired RPO 100 percent of the time.
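To build intuition for how a lower bandwidth turns into RPO violations, here is a toy backlog model in Python. It is not the algorithm the Deployment Planner uses. The function name simulate_rpo_lag, the per-minute churn samples, and the simple rule "lag = backlog / upload capacity" are all assumptions made for illustration.

```python
def simulate_rpo_lag(churn_mb_per_minute, bandwidth_mbps, desired_rpo_minutes):
    """Return (minutes in violation, peak lag in minutes) for a toy model."""
    # Megabits per second -> megabytes that can be uploaded per minute.
    capacity_mb_per_minute = bandwidth_mbps / 8 * 60
    backlog_mb = 0.0
    violations = 0
    peak_lag_minutes = 0.0
    for churn_mb in churn_mb_per_minute:
        # Data that could not be uploaded in this minute stays in the backlog.
        backlog_mb = max(0.0, backlog_mb + churn_mb - capacity_mb_per_minute)
        # Time needed to drain the backlog, used here as a stand-in for RPO.
        lag_minutes = backlog_mb / capacity_mb_per_minute
        peak_lag_minutes = max(peak_lag_minutes, lag_minutes)
        if lag_minutes > desired_rpo_minutes:
            violations += 1
    return violations, peak_lag_minutes


# Hypothetical churn samples (MB written per minute) with one burst.
churn = [30, 30, 1000, 30, 30]
low = simulate_rpo_lag(churn, bandwidth_mbps=8, desired_rpo_minutes=15)
high = simulate_rpo_lag(churn, bandwidth_mbps=16, desired_rpo_minutes=15)
print(low, high)
```

With these made-up numbers, the lower bandwidth spends two minutes above the 15-minute target after the burst, while the higher bandwidth never exceeds it. The report's what-if graph answers the same kind of question using your real profiled data.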

In this section, we recommend the number of VMs that can be protected in parallel to complete the initial replication within 72 hours with the suggested bandwidth to meet desired RPO 100 percent of the time being set. This value is configurable. To change it at report-generation time, use the GoalToCompleteIR parameter.

The graph here shows a range of bandwidth values and a calculated VM batch size count to complete initial replication in 72 hours, based on the average detected VM size across all the compatible VMs.

In the public preview, the report does not specify which VMs should be included in a batch. You can use the disk size shown in the “Compatible VMs” section to find each VM’s size and select them for a batch, or you can select the VMs based on known workload characteristics. The completion time of the initial replication changes proportionally, based on the actual VM disk size, used disk space, and available network throughput.
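The relationship between bandwidth, average VM size, and batch size can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope estimate in Python, not the tool's own calculation: the function name vm_batch_size is hypothetical, and it assumes the full bandwidth is available for the whole window with no compression or protocol overhead.

```python
import math


def vm_batch_size(bandwidth_mbps, avg_vm_size_gb, goal_hours=72):
    """Estimate how many average-sized VMs can finish initial replication."""
    mb_per_second = bandwidth_mbps / 8  # megabits -> megabytes
    transferable_gb = mb_per_second * goal_hours * 3600 / 1024
    return math.floor(transferable_gb / avg_vm_size_gb)


# Hypothetical inputs: a 100 Mbps link and VMs that average 200 GB.
batch_72h = vm_batch_size(100, 200)
batch_24h = vm_batch_size(100, 200, goal_hours=24)
print(batch_72h, batch_24h)
```

Shortening the goal from 72 to 24 hours cuts the batch size to roughly a third, which is the trade-off the GoalToCompleteIR parameter lets you explore in the report.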

Cost estimation

The graph shows the summary view of the estimated total disaster recovery (DR) cost to Azure of your chosen target region and the currency that you have specified for report generation.

The summary helps you to understand the cost that you need to pay for storage, compute, network, and license when you protect all your compatible VMs to Azure using Azure Site Recovery. The cost is calculated only for compatible VMs and not for all the profiled VMs.

You can view the cost either monthly or yearly. Learn more about supported target regions and supported currencies.

Cost by components: The total DR cost is divided into four components: Compute, Storage, Network, and Azure Site Recovery license cost. The cost is calculated based on the consumption that will be incurred during replication and at DR drill time for compute, storage (premium and standard), ExpressRoute/VPN that is configured between the on-premises site and Azure, and Azure Site Recovery license.

Cost by states: The total disaster recovery (DR) cost is categorized based on two different states: Replication and DR drill.

Replication cost: The cost that will be incurred during replication. It covers the cost of storage, network, and Azure Site Recovery license.

DR-Drill cost: The cost that will be incurred during test failovers. Azure Site Recovery spins up VMs during test failover. The DR drill cost covers the running VMs’ compute and storage cost.

Azure storage cost per Month/Year: The total storage cost that will be incurred for premium and standard storage for replication and DR drill. You can view detailed cost analysis per VM in the Cost Estimation sheet.

Growth factor and percentile values used

This section at the bottom of the sheet shows the percentile value used for all the performance counters of the profiled VMs (default is 95th percentile), and the growth factor (default is 30 percent) that’s used in all the calculations.
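As a rough illustration of how a percentile and a growth factor combine, the Python sketch below takes the 95th percentile of a list of samples and adds 30 percent. The function name sized_value is hypothetical, and the nearest-rank percentile method is an assumption; the tool may compute its percentile differently.

```python
def sized_value(samples, percentile=95, growth_percent=30):
    """Nearest-rank percentile of the samples, scaled by the growth factor."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank method: rank = ceil(percentile / 100 * n), done with
    # integer arithmetic to avoid floating-point rounding surprises.
    rank = max((percentile * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1] * (1 + growth_percent / 100)


# Hypothetical IOPS samples: the values 1 through 100.
iops_samples = list(range(1, 101))
result = sized_value(iops_samples)
print(result)
```

Here the 95th percentile is 95, and the 30 percent growth factor raises the sizing value to about 123.5. Using a percentile instead of the maximum keeps short outlier spikes from dominating the recommendation.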

VM-storage placement

Replication Storage Type: Either a standard or premium managed disk, which is used to replicate all the corresponding VMs mentioned in the VMs to Place column.

Log Storage Account Type: All the replication logs are stored in a standard storage account.

Suggested Prefix for Storage Account: The suggested three-character prefix that can be used for naming the cache storage account. You can use your own prefix, but the tool’s suggestion follows the partition naming convention for storage accounts.

Suggested Log Account Name: The storage-account name after you include the suggested prefix. Replace the name within the angle brackets (< and >) with your custom input.

Placement Summary: A summary of the disks needed to protect VMs, by storage type. It includes the total number of VMs, total provisioned size across all disks, and total number of disks.

Virtual Machines to Place: A list of all the VMs that should be placed on the given storage account for optimal performance and use.

Compatible VMs

VM Name: The VM name or IP address that’s used in the VMListFile when a report is generated. This column also lists the disks (VMDKs) that are attached to the VMs. To distinguish vCenter VMs with duplicate names or IP addresses, the names include the ESXi host name. The listed ESXi host is the one where the VM was placed when the tool discovered it during the profiling period.

VM Compatibility: Values are Yes and Yes*. Yes* is for instances in which the VM is a fit for premium SSDs. Here, the profiled high-churn or IOPS disk fits in the P20 or P30 category, but the size of the disk causes it to be mapped down to a P10 or P20. The storage account decides which premium storage disk type to map a disk to, based on its size. For example:

  • <128 GB is a P10.
  • 128 GB to 256 GB is a P15.
  • 256 GB to 512 GB is a P20.
  • 512 GB to 1024 GB is a P30.
  • 1025 GB to 2048 GB is a P40.
  • 2049 GB to 4095 GB is a P50.

For example, if the workload characteristics of a disk put it in the P20 or P30 category, but the size maps it down to a lower premium storage disk type, the tool marks that VM as Yes*. The tool also recommends that you either change the source disk size to fit into the recommended premium storage disk type or change the target disk type post-failover.
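The size-to-tier mapping and the Yes* rule can be expressed as a small lookup. This Python sketch follows the size list above literally. Where the listed ranges overlap at a boundary (for example, 256 GB), it picks the lower tier, which is an assumption. The names premium_tier_for_size and compatibility_label are hypothetical and are not part of the tool.

```python
# (upper size bound in GB, tier), following the list in this section.
PREMIUM_TIERS = [
    (256, "P15"),
    (512, "P20"),
    (1024, "P30"),
    (2048, "P40"),
    (4095, "P50"),
]
TIER_ORDER = ["P10", "P15", "P20", "P30", "P40", "P50"]


def premium_tier_for_size(size_gb):
    """Return the premium disk type for a disk size, or None if too large."""
    if size_gb < 128:
        return "P10"
    for upper_gb, tier in PREMIUM_TIERS:
        if size_gb <= upper_gb:
            return tier
    return None  # larger than 4095 GB: outside the listed tiers


def compatibility_label(size_gb, workload_tier):
    """Return 'Yes*' when the disk size maps below the workload's tier."""
    size_tier = premium_tier_for_size(size_gb)
    if size_tier is None:
        return "No"
    if TIER_ORDER.index(size_tier) < TIER_ORDER.index(workload_tier):
        return "Yes*"
    return "Yes"


# A 100 GB disk whose churn/IOPS profile calls for a P30.
print(premium_tier_for_size(100), compatibility_label(100, "P30"))
```

In this example the 100 GB disk maps to a P10 by size even though its workload fits a P30, so the sketch labels it Yes*, matching the case described above.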

Storage Type: Standard or premium.

Asrseeddisk (Managed Disk) created for replication: The name of the disk that is created when you enable replication. It stores the data and its snapshots in Azure.

Peak R/W IOPS (with Growth Factor): The peak workload read/write IOPS on the disk (default is 95th percentile), including the future growth factor (default is 30 percent). Note that the total read/write IOPS of a VM is not always the sum of the VM’s individual disks’ read/write IOPS, because the peak read/write IOPS of the VM is the peak of the sum of its individual disks’ read/write IOPS during every minute of the profiling period.

Peak Data Churn in Mbps (with Growth Factor): The peak churn rate on the disk (default is 95th percentile), including the future growth factor (default is 30 percent). Note that the total data churn of the VM is not always the sum of the VM’s individual disks’ data churn, because the peak data churn of the VM is the peak of the sum of its individual disks’ churn during every minute of the profiling period.
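The difference between "the sum of the disk peaks" and "the peak of the per-minute sum" is easiest to see with numbers. The Python sketch below uses made-up per-minute samples for two disks. For simplicity it uses the plain maximum rather than the 95th percentile and applies no growth factor, and the name vm_peak is hypothetical.

```python
def vm_peak(per_disk_series):
    """Peak of the per-minute totals across all disks of one VM."""
    per_minute_totals = [sum(values) for values in zip(*per_disk_series)]
    return max(per_minute_totals)


# Hypothetical per-minute samples for two disks of the same VM.
disk_a = [10, 50, 10]
disk_b = [40, 10, 20]

sum_of_disk_peaks = max(disk_a) + max(disk_b)  # peaks occur in different minutes
peak_of_vm = vm_peak([disk_a, disk_b])         # busiest single minute for the VM
print(sum_of_disk_peaks, peak_of_vm)
```

Because the two disks peak in different minutes, the VM-level peak (60) is lower than the sum of the individual disk peaks (90). This is why the VM row in the report need not equal the total of its disk rows.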

Azure VM Size: The ideal mapped Azure Cloud Services virtual-machine size for this on-premises VM. The mapping is based on the on-premises VM’s memory, number of disks/cores/NICs, and read/write IOPS. The recommendation is always the lowest Azure VM size that matches all of the on-premises VM characteristics.

Number of Disks: The total number of virtual machine disks (VMDKs) on the VM.

Disk size (GB): The total setup size of all disks of the VM. The tool also shows the disk size for the individual disks in the VM.

Cores: The number of CPU cores on the VM.

Memory (MB): The RAM on the VM.

NICs: The number of NICs on the VM.

Boot Type: Boot type of the VM. It can be either BIOS or EFI. Currently Azure Site Recovery supports Windows Server EFI VMs (Windows Server 2012, 2012 R2 and 2016) provided the number of partitions in the boot disk is less than 4 and boot sector size is 512 bytes. To protect EFI VMs, Azure Site Recovery mobility service version must be 9.13 or above. Only failover is supported for EFI VMs. Failback is not supported.

OS Type: The OS type of the VM. It can be Windows, Linux, or other, based on the template chosen in VMware vSphere when the VM was created.

Incompatible VMs

VM Name: The VM name or IP address that’s used in the VMListFile when a report is generated. This column also lists the VMDKs that are attached to the VMs. To distinguish vCenter VMs with duplicate names or IP addresses, the names include the ESXi host name. The listed ESXi host is the one where the VM was placed when the tool discovered it during the profiling period.

VM Compatibility: Indicates why the given VM is incompatible for use with Site Recovery. The reasons are described for each incompatible disk of the VM and, based on published storage limits, can be any of the following:

  • Disk size is >4095 GB. Azure Storage currently does not support data disk sizes greater than 4095 GB.
  • OS disk is >2048 GB. Azure Storage currently does not support OS disk size greater than 2048 GB.
  • Total VM size (replication + TFO) exceeds the supported storage-account size limit (35 TB). This incompatibility usually occurs when a single disk in the VM has a performance characteristic that exceeds the maximum supported Azure or Site Recovery limits for standard storage. Such an instance pushes the VM into the premium storage zone. However, the maximum supported size of a premium storage account is 35 TB, and a single protected VM cannot be protected across multiple storage accounts. Also note that when a test failover is executed on a protected VM, it runs in the same storage account where replication is progressing. In this instance, set up 2x the size of the disk for replication to progress and test failover to succeed in parallel.
  • Source IOPS exceeds supported storage IOPS limit of 7500 per disk.
  • Source IOPS exceeds supported storage IOPS limit of 80,000 per VM.
  • Average data churn exceeds supported Site Recovery data churn limit of 20 MB/s for average I/O size for the disk.
  • Average data churn exceeds supported Site Recovery data churn limit of 25 MB/s for average I/O size for the VM (sum of all disks churn).
  • Peak data churn across all disks on the VM exceeds the maximum supported Site Recovery peak data churn limit of 54 MB/s per VM.
  • Average effective write IOPS exceeds the supported Site Recovery IOPS limit of 840 for disk.
  • Calculated snapshot storage exceeds the supported snapshot storage limit of 10 TB.
  • Total data churn per day exceeds supported churn per day limit of 2 TB by a Process Server.
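Several of the limits listed above can be checked with straightforward comparisons. The Python sketch below covers only a subset of them (disk size, IOPS, and churn limits) and is not how the tool itself evaluates compatibility. The Disk class, its field names, and the incompatibility_reasons function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    size_gb: float
    iops: float
    avg_churn_mb_s: float
    is_os_disk: bool = False


def incompatibility_reasons(disks, peak_vm_churn_mb_s):
    """Return a list of limit violations for one VM (empty if none found)."""
    reasons = []
    for number, disk in enumerate(disks, start=1):
        if disk.size_gb > 4095:
            reasons.append(f"disk {number}: size is >4095 GB")
        if disk.is_os_disk and disk.size_gb > 2048:
            reasons.append(f"disk {number}: OS disk is >2048 GB")
        if disk.iops > 7500:
            reasons.append(f"disk {number}: IOPS exceeds 7500 per disk")
        if disk.avg_churn_mb_s > 20:
            reasons.append(f"disk {number}: average churn exceeds 20 MB/s")
    if sum(d.iops for d in disks) > 80000:
        reasons.append("VM: IOPS exceeds 80,000 per VM")
    if sum(d.avg_churn_mb_s for d in disks) > 25:
        reasons.append("VM: average churn exceeds 25 MB/s")
    if peak_vm_churn_mb_s > 54:
        reasons.append("VM: peak churn exceeds 54 MB/s")
    return reasons


# Hypothetical VM: a small OS disk plus one oversized, busy data disk.
vm_disks = [Disk(100, 500, 1.0, is_os_disk=True), Disk(5000, 8000, 22.0)]
found = incompatibility_reasons(vm_disks, peak_vm_churn_mb_s=30)
for reason in found:
    print(reason)
```

For this made-up VM, the second disk trips three per-disk limits while the VM-level totals stay within bounds, which mirrors how the report lists reasons per incompatible disk.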

Peak R/W IOPS (with Growth Factor): The peak workload IOPS on the disk (default is 95th percentile), including the future growth factor (default is 30 percent). Note that the total read/write IOPS of the VM is not always the sum of the VM’s individual disks’ read/write IOPS, because the peak read/write IOPS of the VM is the peak of the sum of its individual disks’ read/write IOPS during every minute of the profiling period.

Peak Data Churn in Mbps (with Growth Factor): The peak churn rate on the disk (default 95th percentile) including the future growth factor (default 30 percent). Note that the total data churn of the VM is not always the sum of the VM’s individual disks’ data churn, because the peak data churn of the VM is the peak of the sum of its individual disks’ churn during every minute of the profiling period.

Number of Disks: The total number of VMDKs on the VM.

Disk size (GB): The total setup size of all disks of the VM. The tool also shows the disk size for the individual disks in the VM.

Cores: The number of CPU cores on the VM.

Memory (MB): The amount of RAM on the VM.

NICs: The number of NICs on the VM.

Boot Type: Boot type of the VM. It can be either BIOS or EFI. Currently Azure Site Recovery supports Windows Server EFI VMs (Windows Server 2012, 2012 R2 and 2016) provided the number of partitions in the boot disk is less than 4 and boot sector size is 512 bytes. To protect EFI VMs, Azure Site Recovery mobility service version must be 9.13 or above. Only failover is supported for EFI VMs. Failback is not supported.

OS Type: The OS type of the VM. It can be Windows, Linux, or other, based on the template chosen in VMware vSphere when the VM was created.

Azure Site Recovery limits

The following table provides the Azure Site Recovery limits. These limits are based on our tests, but they cannot cover all possible application I/O combinations. Actual results can vary based on your application I/O mix. For best results, even after deployment planning, we always recommend that you perform extensive application testing by issuing a test failover to get the true performance picture of the application.

Replication storage target | Average source disk I/O size | Average source disk data churn | Total source disk data churn per day
Standard storage | 8 KB | 2 MB/s | 168 GB per disk
Premium P10 or P15 disk | 8 KB | 2 MB/s | 168 GB per disk
Premium P10 or P15 disk | 16 KB | 4 MB/s | 336 GB per disk
Premium P10 or P15 disk | 32 KB or greater | 8 MB/s | 672 GB per disk
Premium P20 or P30 or P40 or P50 disk | 8 KB | 5 MB/s | 421 GB per disk
Premium P20 or P30 or P40 or P50 disk | 16 KB or greater | 20 MB/s | 1684 GB per disk

Source data churn | Maximum limit
Average data churn per VM | 25 MB/s
Peak data churn across all disks on a VM | 54 MB/s
Maximum data churn per day supported by a Process Server | 2 TB

These are average numbers assuming a 30 percent I/O overlap. Site Recovery is capable of handling higher throughput based on overlap ratio, larger write sizes, and actual workload I/O behavior. The preceding numbers assume a typical backlog of approximately five minutes. That is, after data is uploaded, it is processed and a recovery point is created within five minutes.
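To relate the per-second churn rates in the table to the per-day figures, you can convert a sustained rate into a daily volume. The Python sketch below is plain unit arithmetic, assuming 1 GB = 1024 MB and a churn rate sustained for a full 24 hours. The function name churn_gb_per_day is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def churn_gb_per_day(churn_mb_per_second):
    """Convert a sustained churn rate in MB/s into GB written per day."""
    return churn_mb_per_second * SECONDS_PER_DAY / 1024


standard_disk = churn_gb_per_day(2)  # 168.75
premium_p20 = churn_gb_per_day(5)    # 421.875
print(standard_disk, premium_p20)
```

The results, 168.75 GB and 421.875 GB, are close to the 168 GB and 421 GB per-disk figures shown for the 2 MB/s and 5 MB/s rows of the table.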