VMware SRM 6.1 – Configure Array-Based Replication

Introduction

 

This how-to will walk through the installation and configuration of array-based replication features for VMware Site Recovery Manager 6.1.

Before configuring array-based replication for use with VMware SRM, there are some prerequisites. First, visit the VMware Compatibility Guide to determine whether your specific array vendor is supported for use with SRM. Second, array-based replication must be configured on the storage side; that portion is out of scope for this post, as I did not have access to do so.

vmware_hcl_example

There are several ways to search the compatibility guide, but to be specific, you can select entries from the areas highlighted above. The highlighted section at the bottom shows your results once you click “Update and View Results.” I point this step out because if you assume your array vendor is supported and don’t verify first, you could end up wasting time planning and designing.

For this example, we are using SRM 6.1 with the Fibre Channel protocol on IBM SVC-fronted DS8Ks in both sites. I point that out because when I first set out to find the SRA for our solution, I tried the “IBM DS8000 Storage Replication Adapter”, only to find out later that it wasn’t the correct one. The correct SRA for my environment is the “IBM Storwize Family Storage Replication Adapter”, so there may be a little trial and error here. If you work it out up front during testing, you’ll save yourself time later when deploying to production.

Once you’ve verified that your storage is supported and determined which SRA version to download, you can get it from VMware downloads (login required). Be sure to also verify that the SRA version you download is compatible with the array manager code you’re running.
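If you support several arrays or code levels, it can help to record what the Compatibility Guide told you as data, so the check is repeatable before each upgrade. The sketch below is hypothetical: the function and field names are mine, and the version numbers are placeholders rather than values from the VMware Compatibility Guide; only the adapter names come from this post.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '7.6.1' into (7, 6, 1) for comparison."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class SraSupportEntry:
    sra_name: str
    sra_version: str
    min_array_code: str  # lowest supported array code level (placeholder value)
    max_array_code: str  # highest supported array code level (placeholder value)


# Hypothetical entries transcribed by hand; replace them with what the
# VMware Compatibility Guide actually lists for your array and SRM version.
SUPPORT_MATRIX = [
    SraSupportEntry("IBM Storwize Family Storage Replication Adapter", "3.2.0", "7.4.0", "7.8.0"),
]


def is_supported(sra_name: str, sra_version: str, array_code: str) -> bool:
    """True if some recorded entry covers this SRA version and array code level."""
    code = parse_version(array_code)
    return any(
        entry.sra_name == sra_name
        and entry.sra_version == sra_version
        and parse_version(entry.min_array_code) <= code <= parse_version(entry.max_array_code)
        for entry in SUPPORT_MATRIX
    )
```

Comparing versions as integer tuples avoids the classic string-comparison trap where "7.10" sorts before "7.9".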

 

Installing the SRA

Before You Begin – Before installing the SRA on the SRM server in each site (protected and recovery), you should have already paired the sites successfully. SRM must also be installed first; otherwise, the SRA installer fails when it discovers that SRM is not installed.

Installing the SRA should be straightforward and painless, as there are few options to configure during installation. Once the installation has completed on both the protected and recovery SRM servers, proceed to the verification steps below.

 

Verify That SRM Has Registered the SRAs

  1. Once you’ve installed the SRA on each site’s SRM server, log in to the vSphere Web Client, go to Site Recovery > Sites, and select a site.
    From this view, you can see which SRA is installed, its status, and compatibility information.
  2. Click the rescan button to ensure the connection is valid and there are no errors.

Configure Array Managers

After pairing the protected and recovery sites, you need to configure the respective array managers so SRM can discover replicated devices, compute datastore groups, and initiate storage operations. You typically only need to do this once; however, if array access credentials change or you want to use a different set of arrays, you can edit the connections accordingly.

Prerequisites

  • Sites have been paired and are connected
  • SRAs have been installed at both sites and verified

Procedure

  1. In the vSphere Web Client, go to Site Recovery > Array Based Replication.
  2. On the Objects tab in the right pane, click the icon to add an array manager.
  3. Choose one of the two options for adding array managers (a pair or a single array manager), then click Next.
  4. Select a pair of sites for the array manager(s), and click Next.
  5. Enter a name for the array in the Display Name field, and click Next.
  6. Provide the required information for the type of SRA you selected, and click Next.
  7. If you chose to add a pair of array managers, enter the paired array manager information, then click Next.
  8. Select the checkbox next to the array pair you just configured, and click Next.
  9. Review your configuration, then click Finish when ready.

 

Rescan Arrays to Detect Configuration Changes

By default, SRM performs an automatic rescan every 24 hours to detect changes made to the array configurations. After any change at either site, such as reconfiguration or adding or removing devices, it is recommended to perform a manual rescan so SRM recomputes the datastore groups. To change the default rescan interval, edit the storage.minDsGroupComputationInterval advanced setting in each site’s advanced settings:

srm_abr_settings_1_11

To perform a manual rescan after making any configuration changes:

  1. Go to Site Recovery > Array Based Replication.
  2. Select an array for either site.
  3. On the Manage tab of the selected array, click the Array Pairs sub-tab.
  4. Click the rescan button to perform a manual rescan.

 

Once you’ve got all of the above configured, you can begin setting up your protection groups and recovery plans.


Product Comparison: VMware SRM & Zerto Virtual Replication

Introduction

As my previous blog posts suggest, I’ve spent the past few months testing VMware Site Recovery Manager and Zerto Virtual Replication to see which product best meets our business continuity and disaster recovery requirements. My task was to compare the two products feature for feature based on our use cases, which are primarily protection, recovery, re-protection, and workload migration.

Get comfortable, this could take a while…

Blue vs. Red

As of today, SRM and Zerto have been tested in a sandbox environment consisting of 2 sites (Seattle and Denver), 2 vCenters, a cluster of 2 physical hosts in each site, and 1 test workload: a Windows Server VM with auto-generated files of different sizes. The two geographically separated sites are joined by a dual 20 Gb/s connection. There are no bandwidth throttling mechanisms in place outside of what’s available in the software, and that is used only to throttle down during business hours. Host-level physical networking in both sites is 10GbE.
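When planning initial synchronization, it helps to put link speeds in perspective, and the arithmetic is simple enough to script. This is a rough sketch using decimal units (1 GB = 8,000 megabits) that ignores protocol overhead, compression, and competing traffic; the function names are illustrative.

```python
def gb_per_hour(link_mbps: float, utilization: float = 1.0) -> float:
    """Decimal gigabytes a link can move in one hour at the given utilization."""
    megabits_per_hour = link_mbps * utilization * 3600
    return megabits_per_hour / 8 / 1000  # megabits -> megabytes -> gigabytes


def transfer_hours(data_gb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Hours needed to move data_gb over the link (no overhead or compression)."""
    return data_gb / gb_per_hour(link_mbps, utilization)


if __name__ == "__main__":
    # Treat one 20 Gb/s path of the lab link as fully available to replication.
    print(f"{transfer_hours(40, 20_000) * 3600:.0f} s to move 40 GB at 20 Gb/s")
    # A dedicated 10 Mb/s link for comparison.
    print(f"{gb_per_hour(10):.1f} GB per hour at 10 Mb/s")
```

By this arithmetic, 40 GB crosses a fully available 20 Gb/s path in about 16 seconds, while a dedicated 10 Mb/s link moves roughly 4.5 GB per hour, which is worth keeping in mind when sizing WAN links against your change rate.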

VMware Site Recovery Manager is the only one of the two products with array-based replication, so to keep this an “apples-to-apples” comparison, that feature isn’t covered in depth here. It has been tested, though, and it works well, so I’m happy.

Both hypervisor-based products have been tested in each direction for recovery testing, failover, re-protection, and migration. The results for both solutions are similar; however, based on those results, we are leaning toward one product in terms of simplicity, flexibility, scalability, monitoring capabilities, and user experience.

Below are images of the topology for both test environments, with SRM on the left and Zerto on the right.

If you are interested in seeing these diagrams up close, you can download the PDFs for each here:

topology_showdown_generic

^^ Not pictured in the Zerto Diagram: External PSCs for vCenter, vCenter SQL Servers, and all port communication native to vCenter components.

Product Comparison

VMware Site Recovery Manager forms a complete solution when combined with vSphere Replication (which can also be used without SRM), and Zerto likewise protects workloads with hypervisor-based replication. To compare the two fairly, we must first compare vSphere Replication (without SRM) to Zerto Virtual Replication. Note that without SRM, vSphere Replication is rather limited in several areas. The tables below lay out the use cases and features of each product.

Use Cases

VMware vSphere Replication Use Cases
  • Data protection and disaster recovery within the same site and across sites
  • Data center migration
  • Replication engine for VMware vCloud Air Disaster Recovery
  • Replication engine for VMware vCenter Site Recovery Manager

Zerto Virtual Replication Use Cases
  • Replication & Disaster Recovery
  • Offsite Backup and Data Protection
  • Data Migrations & Workload Mobility
  • Automated Failover, Failback & Testing
  • Reduce RTO/RPO
  • Complete BC/DR solution: Business Continuity and Disaster Recovery
  • Storage Savings
  • AWS Migrations: Cloud migration to Amazon Web Services (ZVR 5.0 introduces DRaaS to Azure)
  • Cross-Hypervisor Replication: MS Hyper-V to VMware vSphere/VMware vSphere to MS Hyper-V

 

Feature Comparison: vSphere Replication (Without SRM) and Zerto Virtual Replication

Licensing Requirement
  • vSphere Replication: VMware Essentials Plus and above
  • Zerto Virtual Replication: VMware Essentials
Automation/Orchestration of Disaster Recovery
  • vSphere Replication: Manual; PowerCLI provides basic automation (add to inventory, power on/off). Otherwise, use SRM with vSphere Replication.
  • Zerto Virtual Replication: Full automation/orchestration features
Version Compatibility
  • vSphere Replication: The vSphere Replication version must match the vCenter version.
  • Zerto Virtual Replication: Works with vSphere 4.0 and later; hypervisor and vCenter components do not all need to match versions.
Automated Recovery Capabilities
  • vSphere Replication: Each VM in the recovery site must be powered on manually.
  • Zerto Virtual Replication: Fully automated recovery capabilities
Automated Connection to the Correct Network(s)
  • vSphere Replication: Done manually when recovering with vSphere Replication. To automate post-recovery tasks, use SRM.
  • Zerto Virtual Replication: Fully automated
WAN Compression
  • vSphere Replication: Network compression is available in 6.1, at the cost of vSphere Replication appliance CPU resources. Note: 1 vSphere Replication appliance per vCenter instance is supported, protecting a maximum of 2000 VMs per appliance.
  • Zerto Virtual Replication: Built in, often achieving a 50% compression ratio. Replication appliances are deployed 1:1 (one VRA per host), with automatic resource reservations to ensure the best replication performance.
IP Re-Addressing
  • vSphere Replication: Manual process. For automated re-IP, use SRM.
  • Zerto Virtual Replication: Built into the failover plan (assigned in the VPG)
Non-Disruptive Testing
  • vSphere Replication: Not available, since you cannot power on the replica VM while the original VM is still running and reachable. Use SRM with vSphere Replication for recovery testing.
  • Zerto Virtual Replication: Real or bubble networks can be used for recovery testing and isolation.
Cloning Capability
  • vSphere Replication: None
  • Zerto Virtual Replication: Supports recovery-site clones, allowing full long-term archival backups of the VMs or file-level recovery from a point-in-time clone.
Failback Option
  • vSphere Replication: None; SRM is required.
  • Zerto Virtual Replication: Automated failback workflow
Point-in-Time Recovery
  • vSphere Replication: Available with vSphere Replication 6.x, with a maximum of 24 PIT instances. Uses VMware snapshots.
  • Zerto Virtual Replication: Configurable, up to 1 year when using the Offsite Backup feature. Does not use VMware snapshots.
RDM (Raw Device Mapping) Support
  • vSphere Replication: Virtual RDMs are supported; physical RDMs are not.
  • Zerto Virtual Replication: Both physical and virtual mode RDMs are supported.
Bandwidth Control
  • vSphere Replication: None
  • Zerto Virtual Replication: Throttling and priorities can be scheduled to reduce bandwidth consumption at certain times and leave it unlimited at others.
vApp Support
  • vSphere Replication: Not supported
  • Zerto Virtual Replication: Zerto leverages vApps to simplify administration. If a vApp is protected by a VPG, any VM added to the vApp is automatically protected.
Storage DRS Support
  • vSphere Replication: Not supported; SRM is required.
  • Zerto Virtual Replication: Supported
RPO Range
  • vSphere Replication: 15 minutes to 24 hours
  • Zerto Virtual Replication: Seconds
How VMs Are Chosen
  • vSphere Replication: Selected individually or by multi-selecting in the interface; protection grouping is not available.
  • Zerto Virtual Replication: VMs can be organized into Virtual Protection Groups (VPGs).
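To reason about how the WAN compression and RPO rows interact, here is a back-of-the-envelope sketch. It assumes that a “50% compression ratio” means the payload on the wire is half its original size, and that the data changed during one RPO window must be shipped within that window. Both are simplifications, not how either product actually schedules transfers, and the function names are hypothetical.

```python
def wire_megabits(changed_gb: float, compression_savings: float) -> float:
    """Megabits sent after compression; savings=0.5 means half the bytes are saved."""
    if not 0.0 <= compression_savings < 1.0:
        raise ValueError("compression_savings must be in [0, 1)")
    return changed_gb * 8 * 1000 * (1 - compression_savings)


def meets_rpo(changed_gb: float, link_mbps: float, rpo_minutes: float,
              compression_savings: float = 0.0) -> bool:
    """True if one RPO window's worth of changes can be shipped within that window."""
    seconds_needed = wire_megabits(changed_gb, compression_savings) / link_mbps
    return seconds_needed <= rpo_minutes * 60


if __name__ == "__main__":
    # 6 GB changed per 15-minute window over a 50 Mb/s share of the WAN.
    print(meets_rpo(6, 50, 15))                           # 960 s needed, so False
    print(meets_rpo(6, 50, 15, compression_savings=0.5))  # 480 s needed, so True
```

The example shows why compression matters most when a link is close to saturation: halving the payload turns a missed 15-minute RPO into one met with time to spare.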

 

Feature Comparison: vSphere Replication (with SRM) and Zerto Virtual Replication

Provides planning, testing, and execution of disaster recovery for vSphere
  • vSphere Replication with SRM: Yes
  • Zerto Virtual Replication: Yes
Designed for
  • vSphere Replication with SRM: SRM was designed for disaster recovery orchestration only.
  • Zerto Virtual Replication: Designed for hypervisor-based replication and disaster recovery orchestration.
Licensed
  • vSphere Replication with SRM: Per-VM
  • Zerto Virtual Replication: Per-VM
Replication granularity
  • vSphere Replication with SRM: Per-VM or multi-select, but virtual protection grouping is not available.
  • Zerto Virtual Replication: Per-VM and/or per Virtual Protection Group
Configure consistency groups (virtual protection groups)
  • vSphere Replication with SRM: No
  • Zerto Virtual Replication: Yes
Replication recovery points
  • vSphere Replication with SRM: Yes, up to 24 snapshots
  • Zerto Virtual Replication: Yes, up to 14 days with standard recovery, and up to 1 year with extended recovery using the Offsite Backup feature.
Compatibility
  • vSphere Replication with SRM: vSphere Replication works with ESXi 5.x and above. SRM requires the same version of vCenter and SRM to be installed at both sites.
  • Zerto Virtual Replication: Zerto works with ESXi 4.0 U1 and above and can replicate between different versions of vCenter. Zerto can also protect and recover from vSphere to Hyper-V, from Hyper-V to vSphere, and from either virtualization platform to the cloud (AWS, and Azure with Zerto 5.0).
Managed with
  • vSphere Replication with SRM: vSphere Client plugin
  • Zerto Virtual Replication: vSphere Client plugin and standalone browser UI
Replication is performed with
  • vSphere Replication with SRM: vSphere Replication
  • Zerto Virtual Replication: Zerto hypervisor-based replication, through VRAs deployed to each host with protected VMs
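The consistency-group row matters because VMs that replicate independently rarely share identical recovery points, while Zerto takes its checkpoints per VPG so every member shares them. The sketch below is a simplified, hypothetical model (not how SRM or Zerto store recovery points): given each VM’s available recovery points, it finds the latest point that every member of an application can be recovered to.

```python
from datetime import datetime
from typing import Optional


def latest_common_point(recovery_points: dict[str, set[datetime]]) -> Optional[datetime]:
    """Latest recovery point shared by every VM, or None if there is none."""
    if not recovery_points:
        return None
    shared = set.intersection(*recovery_points.values())
    return max(shared) if shared else None


if __name__ == "__main__":
    # Independently replicated VMs: each has its own schedule, so nothing lines up.
    independent = {
        "web01": {datetime(2016, 10, 1, 9, 0), datetime(2016, 10, 1, 9, 15)},
        "sql01": {datetime(2016, 10, 1, 9, 5), datetime(2016, 10, 1, 9, 20)},
    }
    # Grouped VMs: checkpoints are written across the whole group.
    grouped = {
        "web01": {datetime(2016, 10, 1, 9, 0), datetime(2016, 10, 1, 9, 15)},
        "sql01": {datetime(2016, 10, 1, 9, 0), datetime(2016, 10, 1, 9, 15)},
    }
    print(latest_common_point(independent))  # None
    print(latest_common_point(grouped))      # 2016-10-01 09:15:00
```

With independent replication, a multi-tier application is recovered to slightly different moments per VM; a group-wide checkpoint removes that skew.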

 

Feature Comparison: VMware Site Recovery Manager & Zerto Virtual Replication API Availability

The following table displays the availability, use cases, and capabilities of both the VMware Site Recovery Manager and Zerto Virtual Replication APIs for access, integration, and automation.

Availability
  • VMware Site Recovery Manager: Similar to the vSphere API, SRM uses a web service that allows access to the API from Java, C#, or any language that supports WSDL (Web Services Description Language).
  • Zerto Virtual Replication: REST APIs are available to automate the virtual infrastructure, bringing the benefits of software-defined replication and recovery.
Use Cases
  • VMware Site Recovery Manager:
    • Automation of protection operations
  • Zerto Virtual Replication:
    • Automation of protection operations
    • Automation of product deployment
    • Querying and reporting
Capabilities
  • VMware Site Recovery Manager:
    • Create protection groups
    • Initiate testing
    • Initiate recovery
    • Re-protection
    • Revert operations
    • Collect results
  • Zerto Virtual Replication:
    • Bulk automated VRA deployment
    • Bulk automated VPG creation
    • Automating VM protection by vSphere folder
    • Automating VM protection with vRealize Orchestrator
    • Listing unprotected VMs
    • Listing protected VMs & VPGs
    • Long-term RPO & storage reporting to CSV
    • Resource reports
    • VPG, VM, VMNIC & re-IP settings report
    • Emailing reports
Programming Environments/Supported Languages
  • VMware Site Recovery Manager (all require SDK installation for each environment):
    • Java JAX-WS Framework
    • C# and Visual Studio
    • Java Axis Framework
    • Managed Objects as WSDL
  • Zerto Virtual Replication:
    • PowerShell
    • cURL
    • Python
    • C#
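As an illustration of the Zerto REST column, the sketch below builds (but does not send) the two requests a typical script starts with: opening a session and listing VPGs. Port 9669, the `/v1/session/add` and `/v1/vpgs` paths, and the `x-zerto-session` header reflect my understanding of the ZVR REST API of this era; treat them as assumptions and confirm them against the REST API reference for your Zerto version. The host name and credentials are placeholders.

```python
import base64
import urllib.request

ZVM_PORT = 9669  # assumption: default ZVM REST API port; confirm for your version


def session_request(zvm_host: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST that opens a Zerto API session."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"https://{zvm_host}:{ZVM_PORT}/v1/session/add",
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


def vpg_list_request(zvm_host: str, session_token: str) -> urllib.request.Request:
    """Build a GET for the VPG list, authenticated with the session token that
    the ZVM returns in the x-zerto-session response header (assumed name)."""
    return urllib.request.Request(
        f"https://{zvm_host}:{ZVM_PORT}/v1/vpgs",
        headers={"x-zerto-session": session_token, "Accept": "application/json"},
    )
```

To use it, pass the first request to `urllib.request.urlopen()` with an SSL context that trusts your ZVM’s certificate, read the session token from the response headers, and hand it to `vpg_list_request()`. Building requests separately from sending them keeps the logic testable without a live ZVM.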

 

System Requirements

The following table outlines the system requirements for both VMware Site Recovery Manager and Zerto Virtual Replication.

VMware Site Recovery Manager 6.1 vs. Zerto Virtual Replication 4.5 U3
Virtualization Management
  • SRM 6.1: VMware vCenter 6.0 U2 in both protected and recovery sites
  • ZVR 4.5 U3:
    • VMware vCenter 4.0 U1
    • Microsoft SCVMM 2012 R2
    • As long as the protected and recovery sites meet the minimum versions, cross-version protection and recovery are supported.
Hypervisor
  • SRM 6.1: Minimum VMware vSphere ESXi 5.0
  • ZVR 4.5 U3:
    • Minimum VMware vSphere ESXi 4.0 U1
    • Microsoft Windows Server 2012 R2 and Server Core
vSphere Replication Appliance
  • SRM 6.1: Minimum vSphere Replication 6.0
  • ZVR 4.5 U3: Not required
Storage Replication Adapter
  • SRM 6.1: Depends on SAN vendor, code level, availability, and support
  • ZVR 4.5 U3: Not required
Client
  • SRM 6.1: vSphere Web Client; by default, its version matches the installed vCenter version required for SRM.
  • ZVR 4.5 U3:
    • vSphere Client Console (thick client) 4.0 and higher
    • vSphere Web Client 5.0 - 5.0 U3: not supported
    • vSphere Web Client 5.1 and up: supported
    • Zerto standalone web UI
vSphere Replication Appliance Resource Requirements (per site)
  • SRM 6.1:
    • 2 vCPU
    • 4 GB RAM
    • 18 GB storage
    • According to VMware, the CPU and memory resources consumed by vSphere Replication on a host or guest OS are negligible.
    • The numbers above are the appliance’s default configuration.
  • ZVR 4.5 U3: N/A
Zerto Virtual Replication Appliance (VRA)
  • SRM 6.1: N/A
  • ZVR 4.5 U3:
    • 1 vCPU
    • 2 GB RAM (minimum)
    • 12.5 GB storage
    • One appliance must be deployed (via the Zerto UI) to each host that will protect VMs in VPGs.
    • Zerto creates DRS affinity rules automatically during deployment, so VRAs always stay on the hosts they are installed to.
Recovery Orchestration Provided By
  • SRM 6.1: Site Recovery Manager 6.1 (see versions above for compatibility), or review VMware’s product interoperability matrix for all version information.
  • ZVR 4.5 U3: Zerto Virtual Replication (required before VRAs can be deployed)
Server Requirements (1 per site)
  • SRM 6.1:
    • At least 2 CPUs, 4 for large environments
    • 2 GB RAM minimum; at least 6 GB including OS requirements
    • 5 GB storage (in addition to OS requirements)
    • At least a 1 Gb/s NIC
    • Windows Server 2008 R2 (64-bit) or Windows Server 2012 R2 (64-bit)
  • ZVM 4.5 U3:
    • Protecting up to 750 VMs and up to 5 peer sites: 2 CPUs (reserved), 4 GB RAM (reserved)
    • Protecting 751-2000 VMs and up to 15 peer sites: 4 CPUs (reserved), 4 GB RAM (reserved)
    • Protecting over 2000 VMs and over 15 peer sites: 8 CPUs (reserved), 8 GB RAM (reserved)
    • 2 GB storage space for binaries
Supported Databases
  • SRM 6.1:
    • Microsoft SQL Server
      • 2008 Express R2 SP2, SP3 (32-bit and 64-bit)
      • 2008 Standard/Enterprise R2 SP3 (32-bit and 64-bit)
      • 2008 Standard/Enterprise/Datacenter R2 SP2 (32-bit and 64-bit)
      • 2008 Standard/Enterprise R2 SP1 (32-bit and 64-bit)
      • 2012 Express SP2 (32-bit and 64-bit)
      • 2012 Standard/Enterprise SP2 (32-bit and 64-bit)
      • 2012 Standard/Enterprise SP1 (32-bit and 64-bit)
      • 2012 Enterprise (64-bit)
      • 2014 Standard/Enterprise (32-bit and 64-bit)
    • Oracle
      • 11g Standard Edition One, R2 (32-bit and 64-bit)
      • 11g Standard/Enterprise Edition, R2 (32-bit and 64-bit)
      • 12c Standard Edition One, R1 (32-bit and 64-bit)
      • 12c Standard/Enterprise Edition (32-bit and 64-bit)
  • ZVR 4.5 U3:
    • Embedded SQL database for protecting up to 4 sites, 40 hosts, and 400 VMs
    • Microsoft SQL Server Standard & Enterprise Editions for anything beyond the above
    • Microsoft SQL Server Express
    • Supported MSSQL database versions: 2008, 2008 R2, 2012, 2014
Bandwidth Requirements
  • SRM 6.1: > 10 Mb/s (dedicated to move 40 GB in an hour)
  • ZVR 4.5 U3: > 5 Mb/s
Number of Firewall Ports for Cross-Site Communication, Replication, and Recovery
  • SRM 6.1: WAN - 7 (in addition to all vCenter-related ports); see the topology diagram for port listings.
  • ZVR 4.5 U3: WAN - 3 (in addition to all vCenter-related ports); see the topology diagram for port listings.
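The Zerto sizing rows in the table above translate directly into a small planning helper. This is a sketch built only from those figures (ZVR 4.5 U3); the function names are mine, and where the ZVM tiers are ambiguous (for example, few VMs but many peer sites) it rounds up to the larger tier, which is my assumption rather than Zerto guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    vcpu: int
    ram_gb: float
    storage_gb: float


# Per-host VRA requirements as listed in the table (1 vCPU, 2 GB RAM, 12.5 GB storage).
VRA_PER_HOST = Footprint(vcpu=1, ram_gb=2, storage_gb=12.5)


def vra_footprint(protecting_hosts: int) -> Footprint:
    """Total VRA resources: one VRA on every host that protects VMs."""
    return Footprint(
        vcpu=VRA_PER_HOST.vcpu * protecting_hosts,
        ram_gb=VRA_PER_HOST.ram_gb * protecting_hosts,
        storage_gb=VRA_PER_HOST.storage_gb * protecting_hosts,
    )


def zvm_reservation(protected_vms: int, peer_sites: int) -> tuple[int, int]:
    """(reserved CPUs, reserved RAM in GB) for the ZVM; exceeding either limit moves up a tier."""
    if protected_vms <= 750 and peer_sites <= 5:
        return (2, 4)
    if protected_vms <= 2000 and peer_sites <= 15:
        return (4, 4)
    return (8, 8)


def embedded_db_ok(sites: int, hosts: int, vms: int) -> bool:
    """True if Zerto's embedded database is within its stated limits."""
    return sites <= 4 and hosts <= 40 and vms <= 400
```

For the two-site lab in this post (2 hosts per site), each site needs 2 vCPU, 4 GB RAM, and 25 GB of storage for VRAs, the smallest ZVM tier, and the embedded database.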

     

Steps from Installation to Protection

The following table compares the high-level installation tasks for VMware Site Recovery Manager and Zerto Virtual Replication. These steps assume that prerequisites such as vCenter installation and firewall rules are already in place.

Note that SRM appears to have many more steps because it supports array-based replication in addition to vSphere Replication. If you use only one of the two, the number of steps drops considerably. In my test environment, both features were tested, so SRM has more steps.

VMware Site Recovery Manager
  1. Build Windows VMs to host SRM in each site
  2. Build a SQL Server, leverage an existing one, or use the embedded vPostgres database
  3. Install and license SRM in the protected and recovery sites
  4. Connect the SRM instances in the protected and recovery sites
    Note: This requires a functional, error-free vCenter/PSC infrastructure. PSCs should be in sync with no errors.
  5. Pair the SRM instances
  6. Install & configure Storage Replication Adapters (SRAs)
  7. Pair array managers
  8. Configure inventory mappings
  9. Create protection groups and recovery plans
  10. Test, validate, protect, test recovery, monitor, and alert
  11. If using vSphere Replication: install, configure, & pair vSphere Replication appliances in each site

Zerto Virtual Replication
  1. Build Windows VMs to host Zerto in each site
  2. Install Zerto on each ZVM and apply the license on login
  3. Optional: Build a SQL Server or leverage an existing one, or use the embedded database
    • See the database requirements in the system requirements table above for sizing guidance and when to use an external SQL Server.
  4. Pair the Zerto instances
  5. Edit site settings, schedule throttling if using a shared WAN connection, and configure alerts, thresholds, etc.
  6. Deploy VRAs (Virtual Replication Appliances, one per host that will be protecting VMs)
  7. Build Virtual Protection Groups (the VPG configuration also includes recovery options such as re-IP or pre/post scripts)
  8. Test, validate, protect, test recovery, monitor, and alert

     

Protection Workflow

The following workflows illustrate the process of protecting virtual workloads using VMware Site Recovery Manager with vSphere Replication, and using Zerto Virtual Replication.
Individual files for each protection workflow in full-size view are here:

srm_zerto_protection_workflows

In the images above, with SRM on the left and Zerto on the right, you can see that SRM has many more steps, performed in multiple places. Most of the additional steps in the SRM protection workflow come from the multiple layers at which protection is configured in the vSphere Web Client for a single VM using vSphere Replication. On the Zerto side, most if not all of the steps for protecting virtual workloads take place at the top layer, the Zerto Virtual Manager UI.

In SRM, protecting a single VM using vSphere Replication involves selecting the VM, enabling vSphere Replication, going into Site Recovery, building and configuring a protection group, and then creating and configuring a recovery plan. The recovery plan is where customizations such as boot priority and IP address changes are made.

In Zerto, protecting a single VM is as easy as logging in to the ZVM UI, creating a VPG, and providing protection and recovery settings, all within one wizard.

     

Recovery Workflow

The following workflows illustrate the process of recovering from a site failure using VMware Site Recovery Manager with vSphere Replication, and using Zerto Virtual Replication.

Individual files for each recovery workflow in full-size view are here:

srm_zerto_recovery_workflows

In the images above (SRM on the left, Zerto on the right), the recovery steps are fairly similar, except that recovery in SRM is performed in the vSphere Web Client, while recovery in Zerto is performed from the ZVM UI (in both scenarios, recovery is performed at the recovery site). The most complex part of any recovery is coordinating the admins, engineers, and business stakeholders who recover, reconfigure, and validate. Of course, if routine recovery testing has been taking place, a real failure should largely mimic a recovery test, although at that point it is a commitment rather than an exercise.

In SRM, there is really one place to handle a recovery: Site Recovery > Recovery Plans. Locate the recovery plan for the application(s) you want to recover and click the red button. It’s a no-brainer!

On the Zerto UI home screen, toggle the failover type from test to “live” and click the recover button. A 3-step wizard opens, where you select the VPG(s) to recover; select the checkpoint to recover from, set the commit policy, and choose whether to re-protect; then click the “Start Failover” button. Recovery and re-protection are handled in one place. The re-protection process in either product is straightforward; however, if there isn’t already a site built to re-protect to, there will be some work to do in either case.

     

Implementation Time and Complexity

Planning, designing, and implementing either product shouldn’t be difficult, but several prerequisites take time: change management processes and schedules to follow, and firewall rules to create and verify. With SRM, I’ve found that because the product ties so closely into vSphere and requires version matching, anyone without a version-aligned environment, or without vSphere or SRM experience, could be delayed. The biggest requirement for SRM? vSphere: you must have a fully functional vSphere deployment at a specific minimum version in both sites to deploy SRM successfully. Zerto doesn’t care whether the vCenter/ESXi versions in both sites match, as long as the minimum supported version is in use.

Because SRM relies heavily on version compatibility (as do other VMware products), these granular requirements add administrative overhead and demand close team coordination for upgrades, maintenance, recovery, etc. In such cases, specific orders of operations are required for upgrades and power-on operations. Those requirements are out of scope here, but it pays to know they exist, so do some research and, if you can, test before performing them in production.

When installing Zerto, what took the most time was building the Windows VMs to house the ZVM in each site (a few hours x 2)... that and the firewall rules (about 2 weeks in my case, including approval, change management, and implementation). Once the VMs were built and the firewall rules were in place, installing Zerto took about 10-15 minutes per ZVM, and deploying each VRA took approximately 10 minutes (VRA deployment can also be bulk scripted). Zerto works as long as the hypervisor and vCenter are at a minimum version supported by Zerto, and it can protect across versions, or even across hypervisors (VMware vSphere & Microsoft Hyper-V)! VPG creation time varies depending on how many VMs you protect per VPG and how many options you customize; recovery and test IP settings are among the more time-consuming items. That’s it. Once a VPG is created, initial synchronization starts, and as soon as the sites are in sync, you’re ready to test, recover, or migrate and re-protect.

     

Monitoring and Reporting

Monitoring and Reporting with VMware Site Recovery Manager

VMware Site Recovery Manager provides monitoring and reporting, but what you can see depends on where you are in the object hierarchy (the data is there!):

  • number of replicated VMs per host
  • amount of data transferred
  • number of RPO violations
  • replication count
  • number of sites successfully connected

These reports can also be expanded to show more detail, and the date range can be modified. In my experience during testing, replication status and information aren’t as intuitive or centrally located as you would expect; there are several different places to monitor protection status and get additional information.

Some of this is at the VM level, where you will see replication status, last sync point, target site, quiescing (enabled/disabled), network compression (enabled/disabled), RPO, point-in-time recovery (enabled/disabled), and disk status.

     

Monitoring at the VM Object

vm_replication_status

At the VM (protected VM) level, you can monitor replication performance, but only through 2 counters:

  • Replication Data Receive Rate (Average in KBps)
  • Replication Data Transmit Rate (Average in KBps)

srm_vm_counters

     

Monitoring at the Site Recovery > Sites Level

At the site level, you can monitor issues and recovery plan history, and get basic information for Array Based Replication, Protection Groups, and Recovery Plans:

srm_site_monitors

Monitoring at the Protection Group Level

At the protection group level, the Summary tab gives you information such as status, the number of VMs in the protection group, the configuration status of those VMs, and any replication warnings (which can’t be clicked for more detail):

srm_pg_summary

Selecting a protection group gives you a list of recovery plans and VMs, plus general protection information, but no logging or reporting.

srm_pg_monitors

     

Monitoring at the Recovery Plan Level

When you select a recovery plan, you see the plan status, VM status, and recent history if the plan has been run for testing or failover:

srm_rp_summary

Digging deeper into a recovery plan, you can see the recovery plan steps, history, general protection information for the protection group, and general protection information for each virtual machine:

srm_rp_monitors

     

Monitoring vSphere Replication at the vCenter Level

One more place I found monitoring and reporting is at the vSphere Replication level. Go to vSphere Replication in the vSphere Web Client and click a vCenter. From there, go to the Monitor tab and click vSphere Replication to reach the screen in the image below, where you can monitor Outgoing Replications, Incoming Replications, View Reports, and Cloud Recovery Settings. The Reports section appears to contain the most information; however, the UI offers no way to export reports, for example when a customer requests a history of their replication jobs.

Monitoring Outgoing Replications (per vCenter)

This section displays any point-in-time snapshots available for recovery (if configured), along with fairly general replication information such as:

  • Status
  • VM
  • Target Site
  • vR Server used
  • Configured Disks
  • Last Instance Sync Point
  • Last Sync Duration
  • Last Sync Size
  • RPO
  • Quiescing (enabled/disabled)
  • Network Compression (enabled/disabled)

monitoring_vsphere_replication_outgoing_rep

     

Monitoring Incoming Replications (per vCenter)

This section displays point-in-time snapshots, recovery history, and (again, fairly general) replication information such as:

  • Status
  • VM (when a VM is selected above)
  • Target Site
  • vR Server
  • Configured Disks
  • What manages the incoming replications (in this case, SRM)
  • Last Instance Sync Point
  • Last Sync Duration
  • Last Sync Size
  • RPO
  • Quiescing (enabled/disabled)
  • Network Compression (enabled/disabled)

monitoring_vsphere_replication_incoming_rep

     

Reporting for vSphere Replication (per vCenter)

This section contains statistical information that can be filtered by date range. It is a little more detailed (my favorite view) and actually shows numbers on graphs, including:

  • Count of replicated vs. non-replicated VMs
  • Replicated VMs per host
  • Transferred bytes
  • RPO violations
  • Replication count
  • Site connectivity status
  • vR Server connectivity (not pictured)

While this is great information, there is no way to export the reports from the interface.

monitoring_vsphere_replication

     

    Cloud Recovery Testing

    This section contains general information on any replications to the cloud.  Since we are not replicating to the public cloud, this section is empty, but I have shown it to display what detail it contains.

    monitoring_vsphere_replication_cloud_settings

    As shown above, monitoring vSphere Replication and SRM means looking in multiple places for information, statistics, and reports. Monitoring ongoing replication jobs, recoveries, and performance is a multi-tiered exercise, and there is no centralized, exportable view of the information for review. Out of the box, there are simply too many places to look, which makes it too tedious to monitor protection jobs, recoveries, and performance effectively.

     

    Monitoring and Reporting in Zerto Virtual Replication

    Monitoring protection status in Zerto has been intuitive, detailed, and centralized. Zerto separates the two functions into “tabs” within the UI: one tab for monitoring (including tasks and alerts) and one tab for reporting. E-mail alerting and reports sent at a regular, scheduled interval are built into the product natively. Nor is it limited to a single e-mail destination: the site settings accept multiple recipients separated by commas or semicolons. In the resource reports, you can set the sampling rate and the sampling time interval. For a BC/DR solution, it is far better to receive more information than necessary than to wait for a problem to surface. Nothing is more embarrassing (or resume-generating) than finding out at the point of failure that your replication product hasn’t been replicating much, or hasn’t been able to meet your RPO/RTO.
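    To illustrate the recipient format described above, the sketch below splits a recipient string on commas and/or semicolons. This is my own illustration of the accepted separators, not Zerto’s code; the function name and addresses are placeholders:

```python
import re


def parse_recipients(setting: str) -> list:
    """Split a recipient string on commas and/or semicolons, ignoring blanks and spaces."""
    return [addr.strip() for addr in re.split(r"[;,]", setting) if addr.strip()]


print(parse_recipients("dr-team@example.com; storage@example.com,oncall@example.com"))
# ['dr-team@example.com', 'storage@example.com', 'oncall@example.com']
```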

    In the Zerto UI, monitoring alerts, events, and tasks is as simple as clicking the “monitoring” tab. You can search for specific events, alerts, or both, and adjust the timeframe you are targeting. In the reporting tab, you can run reports on the following items, each of which can be selected per VPG or for all VPGs (with customizable reporting dates):

    • VPG Performance (RPO in seconds, IOPS, throughput in MB/s, and WAN traffic in MB/s)
    • Outbound Protection Over Time (data in GB) – for each recovery site
    • Protection Over Time by Site (Journal Usage in GB, VMs protected by count)
    • Recovery Reports by VPG, type, and/or status
    • Resource Report – shows the resources used by protected VMs, which Zerto requires to ensure recovery capability (exports to Excel)
    • Usage – exports to CSV, PDF, or ZIP

    zerto_monitoring_tab

    zerto_reports_tab

     

    Conclusion

    In conclusion, both products work as advertised, and deciding which one to go with may come down to trust, flexibility, simplicity, scalability, monitoring & reporting, re-protection capabilities, and of course, cost. When weighing the cost of either solution, be sure to include the staff hours required to deploy and support it. Both products have their benefits and quirks, but the bottom line is that THEY BOTH WORK GREAT!

    Having gone through the entire process (design, implementation, protection, testing, and recovery), I found that VMware Site Recovery Manager took a considerable amount of time to become usable because of some external problems we were having. That left a bad taste in my mouth (it was frustrating, though the cause was specific to my environment). Because those existing problems didn’t prevent Zerto from working, it felt much simpler; but don’t get me wrong, you still have to plan your deployment. The time needed just to deploy both products and get them functioning varied considerably, and in my experience Zerto was the winner on time to protection versus Site Recovery Manager (again, because of the underlying problems in my environment).

    Array-based replication is an optional feature of SRM, and once we figured out what was needed on the SAN side for it to work properly, it actually runs nicely. Historically, though, this method has been an expensive route, because it requires the same storage (at least the same vendor) in each site (protected and recovery). It also adds another layer of complexity in configuration, administration, maintenance, and support alignment, and it will involve your SAN administrators. vSphere Replication, on the other hand, is easy to set up, and you can be replicating VMs with it in a short period of time.

    Scalability is another area I researched. While both products can protect up to 5000 VMs per vCenter (refer to the comparison tables), vSphere Replication is limited to one vSphere Replication Appliance per vCenter (per the documentation). That said, the only real problem here is the VM limit per appliance. Zerto can scale out to take advantage of resources on each host, but it requires a VRA (Virtual Replication Appliance) deployed to each host in a cluster where you are protecting VMs. The VRAs come at no additional cost (both products are licensed per protected VM) and can be sized as needed for best performance. The only way to scale the vSphere Replication Appliance is up, which means adding more CPU/RAM to it, but hosts still carry the bulk of the replication tasks, even through an appliance failure. Each Zerto VRA needs its own IP address, which is one downside of having one per host, especially in large environments. On the plus side, you can deploy all of those VRAs from one screen, and their deployment can be automated, which saves time.
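    Because every host in a cluster with protected VMs gets a VRA, and every VRA needs an IP address, the address count is easy to estimate up front. A back-of-the-envelope Python sketch, assuming a hypothetical cluster layout (the recovery site’s hosts need VRAs too, so run it once per site):

```python
# Hypothetical clusters: name -> (host count, does the cluster run protected VMs?)
clusters = {
    "Prod-A": (8, True),
    "Prod-B": (6, True),
    "Dev": (4, False),
}


def vra_ips_needed(clusters):
    """One VRA, and therefore one IP address, per host in each cluster with protected VMs."""
    return sum(hosts for hosts, protected in clusters.values() if protected)


print(vra_ips_needed(clusters))  # 14
```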

    Compatibility and requirements vary as well, with SRM having the most requirements in both sites (protected and recovery). Because Zerto is deployed on top of the virtualization infrastructure, it is not tightly integrated into the base vSphere product, nor does it share SRM’s version requirements. Zerto is very flexible about versions in both the protected and recovery sites, and it can also protect and recover between vSphere and Microsoft Hyper-V (or the cloud).

    Lastly, while I’m not a seasoned programmer or scripting guru, at a high level both products can be managed programmatically, and both support PowerShell (SRM requires VMware’s PowerCLI add-on). Both products can also leverage vRealize Orchestrator, allowing workflow automation for protection tasks. Both support multiple scripting/programming languages and have documented APIs; however, in the case of SRM, creation of recovery plans and forced failovers cannot be automated (per the API documentation). Zerto can be managed through a feature-rich RESTful API that covers pretty much every aspect of the product and its capabilities, and its documentation is clear and full of example scripts in each supported language for everyday tasks.

    I hope this information has been helpful for those who are trying to decide which product to go with, and as always, comments or questions are welcome!  And if you find this to be useful information, please share it!


    SRM 6.1 POC Update – Post Failed PSC Remediation

    Just a quick update to show that, after resolving the PSC synchronization issue in our environment, I can now successfully pair the two SRM sites in our POC.

    Because I replaced the failed PSC with a new one (new name/IP), and the SRM server was originally connected to the old PSC, I first had to modify the SRM installation to point it at the new PSC. Once I did that, site pairing succeeded, and all of the SSL and user/password errors I had been getting went away.

    srm_poc_update_post_pscfix

    So, my advice if you run into the same issues I did: don’t rule out other systems in the environment; otherwise, you may be thrown for a loop, and support may not be able to help.

    If we hadn’t discovered the synchronization issue between the external PSCs, this would likely have remained an ongoing problem, and it would have seemed like there was no light at the end of the tunnel.

    For a recap of the issues seen with site pairing due to the PSC synchronization being broken, see this blog entry.


    SRM 6.1 POC Update & PSC Problems

    I recently ran into some problems during my first attempt at pairing the two sites in my SRM POC: the pairing failed, with some misleading error messages. Since help on this was pretty scarce on the ’net, I opened a case with VMware Support. After about a week of troubleshooting, none of our attempts (repairing the installation, re-installing SRM with a fresh database, and regenerating/re-registering certificates) resolved the problem.

    srm_psc_error_1

    While I was waiting for an escalation from VMware, we discovered that one of the PSCs in this environment had stopped replicating changes to the other. On further analysis, I found that this particular Platform Services Controller had stopped replicating changes about a month earlier. What made the problem hard to find was that we could still log in to vCenter and manage it just fine; only a peek under the covers proved there was definitely an issue. It was by chance that a license change to vCenter exposed the problem, when we saw that the change didn’t make it from one vCenter to the other.

    The following command will provide you with the results seen below, which indicate the synchronization problem:

    On each PSC, run the following command from the vmdird directory:

    • .\vdcrepadmin.exe -f showpartnerstatus -h localhost -u administrator -w [password]

    Partner: psc1.domain.local
    Host available: Yes
    Status available: Yes
    My last change number: 872590
    Partner has seen my change number: 10846
    Partner is 861744 changes behind.
    
    Partner: psc2.domain.local
    Host available: Yes
    Status available: Yes
    My last change number: 2147483197
    Partner has seen my change number: 2147483197
    Partner is 0 changes behind.
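    The lag is simply the difference between the two change numbers (872590 - 10846 = 861744 for psc1). If you want to check this across several PSCs, a small parser saves eyeballing the output. This is a minimal Python sketch that assumes the output format shown above; the function names and the alert threshold are my own, hypothetical choices:

```python
import re

# Output of "vdcrepadmin -f showpartnerstatus", as shown above
SAMPLE = """\
Partner: psc1.domain.local
Host available: Yes
Status available: Yes
My last change number: 872590
Partner has seen my change number: 10846
Partner is 861744 changes behind.

Partner: psc2.domain.local
Host available: Yes
Status available: Yes
My last change number: 2147483197
Partner has seen my change number: 2147483197
Partner is 0 changes behind.
"""


def partner_lag(output: str) -> dict:
    """Map each partner to how many of our changes it has not yet seen."""
    lag, partner, mine = {}, None, None
    for raw in output.splitlines():
        line = raw.strip()
        if m := re.match(r"Partner:\s*(\S+)", line):
            partner = m.group(1)
        elif m := re.match(r"My last change number:\s*(\d+)", line):
            mine = int(m.group(1))
        elif m := re.match(r"Partner has seen my change number:\s*(\d+)", line):
            # Changes behind = our latest change number minus the last one the partner saw
            lag[partner] = mine - int(m.group(1))
    return lag


def lagging_partners(output: str, threshold: int = 1000) -> list:
    """Return partners whose lag exceeds a (hypothetical) alert threshold."""
    return [p for p, behind in partner_lag(output).items() if behind > threshold]


print(partner_lag(SAMPLE))       # {'psc1.domain.local': 861744, 'psc2.domain.local': 0}
print(lagging_partners(SAMPLE))  # ['psc1.domain.local']
```

    A lag at or near zero is what you want to see; a large lag that doesn’t shrink between runs, like psc1’s here, is the kind of silent failure we ran into.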

     

    Once this was discovered, the support engineer and I agreed to put the site recovery pairing on hold until the PSC issue was resolved, so we didn’t have too many variables involved in our troubleshooting. To make a long story short, the broken PSC synchronization was the root cause of SRM not being able to pair the sites. I’ve also written up a series on re-creating the environment in isolation and performing the PSC replacement, which provided the ultimate solution.
