Zerto: Dual NIC ZVM

Something I recently ran into with Zerto (and this can happen with anything else) was the dilemma of protecting remote sites that happen to use identical IP addresses in both the protected and recovery sites.  It doesn’t happen often, and no, this wasn’t planned for; it was discovered during my Zerto deployment in what we’ll call the protected sites.

Luckily, our network team had provisioned two new networks that are isolated, and connected to these protected sites via MPLS.  Those two new networks do not have the ability to talk back to our existing enterprise network without firewalls getting involved, and this is by design since we are basically consolidating data centers while absorbing assets and virtual workloads from a recently acquired company.

When I originally installed the ZVM in my site (which we’ll call the recovery site), I used IP addresses for the ZVM and VRAs that were part of our production network, not the isolated network set aside for this consolidation.  Note: I installed the Zerto infrastructure in the recovery site before any discussion of the isolated networks came up.  Because I needed to get it onto the isolated network in order to replicate data from the protected sites to the recovery site, I set out to re-IP the ZVM and the VRAs.  Before I could do that, I needed to justify firewall exceptions so that the ZVM in the recovery site could link to vCenter, communicate with the ESXi hosts for VRA deployment, and authenticate the computer, users, and service accounts in use on the ZVM.  Oh, and I also needed DNS and time services.

The network and security teams asked if they could NAT the traffic, and my answer was “no” because Zerto doesn’t support replication using NAT.  That was easy, and now the network team had to create firewall exceptions for the ports I needed.

Well, as expected, they delivered what I needed.  To make a long story short, it all worked, and then about 12 hours before we were scheduled to perform our first VPG move, it all stopped working, and no one knew why.  At this point, we were getting really close to pulling the plug on the next day’s migration, but I was determined to get this going and prevent another delay in the project.

When looking for answers, I contacted my Zerto SE, reached out on Twitter, and also contacted Zerto Support.  While I was on the phone with support, we couldn’t do anything because communication to the resources I needed was not working.  We couldn’t perform a Zerto re-configure to re-connect to the vCenter.  At this point, I had about 24 VPGs reporting that they were in sync (lucky!), but ZVM-to-ZVM communication wasn’t working and the recovery site ZVM was not able to communicate with vCenter, so I wouldn’t have been able to perform the cutover.  Since support couldn’t help me out in that instance, I scoured the Zerto KB looking for an alternate way of configuring this where I could get the best of both worlds and still stay isolated as needed.

I eventually found this KB article that explained that not only is it supported, but it’s also considered a best practice in CSP or large environments to dual-NIC the ZVM to separate management from replication traffic.  I figured, I’m all out of ideas, and the back-and-forth with firewall admins wasn’t getting us anywhere; I might as well give this a go.  While the KB article offers the solution, it doesn’t tell you exactly how to do it, outside of adding a second vNIC to the ZVM.  There were some steps missing, which I figured out within a few minutes of completing the configuration.

Oh, and part of this required me to re-IP the original NIC back to the original IP I used, which was on our production network.  Doing this re-opened the lines of communication to vCenter, ESXi hosts, AD, DNS, SMTP, etc.  Now I had to focus on the vNIC that was to be used for all ZVM-to-ZVM and replication traffic.  In a few short minutes, I was able to get communication going the way I needed it, so the final thing I needed to do was re-configure Zerto to use the new vNIC for its replication-related activities.  I did that, and while I had re-established the production network communications I needed, I still wasn’t able to access the remote sites (ZVM to ZVM) or the recovery site VRAs.

It turns out, what I needed here were some static, persistent routes to the remote networks, configured to use the specific interface I created for it.

Here’s how:

The steps I took are below the image.  If the image is too small, consider downloading the PDF here.

[Image: Zerto dual-NIC ZVM diagram]

 

On the ZVM:

  1. Power it down, add a second vNIC, and set its network to the isolated network.  Set the primary vNIC to the production network.
  2. Power it on.  When it’s booted up, log in to Windows, and re-configure the IP address for the primary vNIC.  Reboot to make sure everything comes up successfully now that it is on the correct production network.
  3. After the reboot, edit the IP configuration of the second vNIC (the one on the isolated network).  DO NOT configure a default gateway for it.
  4. Open the Zerto Diagnostics Utility on the ZVM. You’ll find this by opening the start menu and looking for the Zerto Diagnostics Utility.  If you’re on Windows Server 2008 or 2012, you can search for it by clicking the start menu and starting to type “Zerto.”
  5. Once the Zerto Diagnostics Utility loads, select “Reconfigure Zerto Virtual Manager” and click Next.
  6. On the vCenter Server Connectivity screen, make any necessary changes you need to and click Next.  (Note: We’re only after changing the IP address the ZVM uses for replication and ZVM-to-ZVM communication, so in most cases, you can just click Next on this screen.)
  7. On the vCloud Director (vCD) Connectivity screen, make any necessary changes and click Next.  (Note: same as the note in step 6.)
  8. On the Zerto Virtual Manager Site Details screen, make any necessary changes and click Next.  (Note: same as the note in step 6.)
  9. On the Zerto Virtual Manager Communication screen, the only thing to change is the “IP/Host Name Used by the Zerto User Interface.”  Change this to the IP address of your vNIC on the isolated network, then click Next.
  10. Continue to accept any defaults on following screens, and after validation completes, click Finish, and your changes will be saved.
  11. Once the above step has completed, you will now need to add a persistent, static route to the Windows routing table.  This will tell the ZVM that for any traffic destined for the protected site(s), it will need to send that traffic over the vNIC that is configured for the isolated network.
  12. Use the following route statement from the Windows CLI to create those static routes:
    route ADD [Destination IP] MASK [SubnetMask] [LocalGatewayIP] IF [InterfaceNumberforIsolatedNetworkNIC] -p
    Example:
    route ADD 192.168.100.0 MASK 255.255.255.0 10.10.10.1 IF 2 -p
    route ADD 192.168.200.0 MASK 255.255.255.0 10.10.10.1 IF 2 -p
    
    Note: To find the interface number for your isolated network vNIC, run route print from the Windows CLI.  The number appears in the Interface List at the top of the output.
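If you have more than a couple of remote networks, typing each route statement by hand gets error-prone.  Below is a minimal Python sketch that prints route statements in the same format as the example in step 12.  The function name build_route_command is my own invention for illustration, and the networks, gateway, and interface number are just the sample values from this post, so substitute your own.  It only builds the text of the command; it does not touch the routing table.

```python
import ipaddress


def build_route_command(destination_cidr, gateway, interface_index):
    """Return a persistent Windows 'route ADD' statement for one remote network.

    destination_cidr: remote network in CIDR form, e.g. "192.168.100.0/24"
    gateway:          local gateway on the isolated network
    interface_index:  interface number shown by 'route print'
    """
    # strict=True rejects input such as 192.168.100.5/24 (host bits set),
    # which is usually a typo when a network address was intended.
    network = ipaddress.ip_network(destination_cidr, strict=True)
    # Parse the gateway only to validate that it is a well-formed address.
    gateway_address = ipaddress.ip_address(gateway)
    return (
        f"route ADD {network.network_address} MASK {network.netmask} "
        f"{gateway_address} IF {interface_index} -p"
    )


if __name__ == "__main__":
    # Sample values taken from the example in this post.
    for cidr in ("192.168.100.0/24", "192.168.200.0/24"):
        print(build_route_command(cidr, "10.10.10.1", 2))
```

Review the printed statements, then paste them into an elevated command prompt on the ZVM.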
    

 


Once you’ve configured your route(s), you can test by sending pings to remote site IP addresses that you would normally not be able to see.
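If the pings fail, it helps to reason about which vNIC Windows should be picking for a given destination.  The sketch below is a simplified model of longest-prefix matching, which is why the static routes win over the default gateway on the production vNIC.  The route table, the interface labels, and the function name pick_interface are all hypothetical and for illustration only; a real Windows routing table also contains on-link routes and metrics, which this ignores.

```python
import ipaddress

# Hypothetical routing table: (destination network, interface label).
# The default route lives on the production vNIC because the isolated
# vNIC has no default gateway configured.
ROUTES = [
    ("0.0.0.0/0", "production"),
    ("192.168.100.0/24", "isolated"),
    ("192.168.200.0/24", "isolated"),
]


def pick_interface(destination_ip, routes=ROUTES):
    """Return the label of the most specific route matching destination_ip."""
    address = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, label in routes:
        network = ipaddress.ip_network(cidr)
        if address in network:
            matches.append((network.prefixlen, label))
    if not matches:
        return None
    # The longest prefix (most specific network) wins.
    return max(matches, key=lambda match: match[0])[1]


if __name__ == "__main__":
    for ip in ("192.168.100.25", "172.16.5.10"):
        print(ip, "->", pick_interface(ip))
```

Without the two static routes, a remote-site address such as 192.168.100.25 would only match the default route and leave via the production vNIC, which is the behavior I was seeing before adding them.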

After performing all of these steps, my ZVMs are now communicating without issue and replications are all taking place.  A huge difference from hours before when everything looked like it was broken.  The next day, we were able to successfully move our VPGs from protected sites to recovery sites without issue, and reverse protect (which we’re doing for now as a failback option until we can guarantee everything is working as expected).

If this is helpful or you have any questions/suggestions, please comment, and please share! Thanks for reading!

 


Protecting a VM with vSphere Replication

Continuing on from the previous blog about configuring array-based replication with SRM, in this blog post we’ll be going through configuring protection of a VM using vSphere Replication.  The reason I’m doing this instead of jumping right into creating the protection groups and recovery plans is because vSphere Replication can function on its own without SRM.  That said, we’ll go through the steps to protect a virtual workload using vSphere Replication, and follow this up with creating protection groups and recovery plans, which come into play in either situation (ABR vs vR) when we get to the orchestration functionality that SRM brings to the table.

vSphere Replication is included with vSphere Essentials Plus and above, so chances are you have this feature available should you decide to protect VMs using hypervisor-based replication.  In my experience, vSphere Replication works great and can be used to either migrate or protect virtual workloads; however, as stated above, it can be limited on its own.  See this previous post for the details of what vSphere Replication can and can’t do without Site Recovery Manager.

 

Procedure

In this walkthrough for protecting a VM using vSphere Replication, I will be performing the steps using a decently sized Windows VM as the asset that needs protection.  This VM is a plain installation of Windows; however, I use fsutil to generate files of different sizes to simulate data change.
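I used fsutil for this (fsutil file createnew takes a file name and a length in bytes), but if you would rather script it, here is a rough Python sketch that does something similar.  The function name create_test_files and the file naming scheme are my own for illustration.  One difference worth noting: this writes random bytes, whereas, as far as I know, fsutil creates zero-filled files, so results with network compression enabled may differ between the two.

```python
import os
from pathlib import Path


def create_test_files(directory, sizes_in_bytes):
    """Create one file of random data per requested size; return the paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for index, size in enumerate(sizes_in_bytes):
        path = target / f"change_{index}_{size}.bin"
        with open(path, "wb") as handle:
            remaining = size
            while remaining > 0:
                # Write in chunks of at most 1 MiB to keep memory use low.
                chunk = min(remaining, 1024 * 1024)
                handle.write(os.urandom(chunk))
                remaining -= chunk
        created.append(path)
    return created


if __name__ == "__main__":
    # Example: a 1 MiB, a 10 MiB, and a 50 MiB file in a hypothetical folder.
    sizes = [1 * 1024 * 1024, 10 * 1024 * 1024, 50 * 1024 * 1024]
    for path in create_test_files("replication_test_data", sizes):
        print(path, path.stat().st_size)
```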

    1. In your vSphere Web Client, locate a VM that you wish to protect via hypervisor-based replication.
    2. Right-click on the VM and go to All vSphere Replication Actions > Configure Replication.
    3. When the wizard loads, the first screen asks for the replication type.  Select Replicate to a vCenter Server, and click Next.
    4. Select the Target Site and click Next.
    5. Select the remote vSphere Replication server (or if you only have 1, then select auto-assign), wait for validation, then click Next.
    6. On the target location screen, there are several options to configure, so we’ll go through each one by one:
      - Expand the settings by clicking the arrow next to the VM, or click the info link.
      - Click edit in the area labeled Target VM Location, select the target datastore and location for the recovery VM, then click OK to be returned to the previous screen.
      - Typically, the previous step would be enough; however, if you want to place VMDKs in specific datastores, edit their format (thick vs. thin provisioned), or assign a policy, use the edit links beside each hard disk.  Once all your settings are how you want them, click Next.

    7. Specify your replication options, then click Next.
      Notes:
      - Enable quiescing if your guest OS supports it, but keep in mind
        that quiescing may affect your RPO times.
      - Enable network compression to reduce required bandwidth and free up
        buffer memory on the vSphere Replication server.  Higher CPU usage
        may result, so it is best to test with and without compression to
        see what works best in your environment.

    8. Configure RPO to meet customer requirements, enable point in time instances (snapshots in time as recovery points – maximum of 24) if needed, then click Next.
    9. Review your configuration summary, make changes if necessary, but when you’re done, click Finish.  As soon as you finish, a full sync will be initiated.
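When picking an RPO in step 8, a quick sanity check is whether your link can actually move the data that changes within one RPO window.  The sketch below is back-of-the-envelope arithmetic only, not a VMware formula: it assumes changes are spread evenly, ignores compression and protocol overhead, and the function name required_mbps and the sample numbers are mine.

```python
def required_mbps(changed_bytes_per_rpo_window, rpo_minutes):
    """Average throughput (megabits/s) needed to ship one window's changes
    within the RPO window itself."""
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be greater than zero")
    window_seconds = rpo_minutes * 60
    bits_to_send = changed_bytes_per_rpo_window * 8
    return bits_to_send / window_seconds / 1_000_000


if __name__ == "__main__":
    # Hypothetical example: 450 MB of changed data every 15 minutes.
    print(required_mbps(450_000_000, 15))
```

If the number that comes back is close to or above what your WAN link can sustain, consider a longer RPO or test with compression enabled.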

There you go: vSphere Replication is now configured for a VM.  The next post will cover creating protection groups and recovery plans, which we will then tie into what we’ve just performed here and with the array-based replication post.


VMware SRM 6.1 – Configure Array-Based Replication

Introduction

 

This how-to will walk through the installation and configuration of array-based replication features for VMware Site Recovery Manager 6.1.

Before configuring array-based replication for use with VMware SRM, there are some pre-requisites.  First of all, you’re going to need to visit the VMware Compatibility Guide, which will help you determine if your specific array vendor is supported for use with SRM.  Second, there are steps to take to configure array-based replication on the storage side; that portion is out of scope for this blog, as I did not have access to do so.

[Image: VMware Compatibility Guide search example]

There are several ways to search the compatibility guide, but to be specific, you can select entries from the areas highlighted above.  The bottom section that is highlighted will be your results once you click “Update and View Results.”  The reason why I wanted to point this step out is because if you assume your array vendor is supported, and don’t verify first, you could end up wasting your time planning and designing.

For this example, we are using SRM 6.1 with the Fibre Channel protocol on IBM SVC-fronted DS8Ks in both sites.  I wanted to point that out because when I first set out to find the SRAs for use with our solution, I attempted to use the “IBM DS8000 Storage Replication Adapter”, only to find out later that it wasn’t the correct one.  The correct SRA for use with my environment is the “IBM Storwize Family Storage Replication Adapter”, so there may be a little bit of trial and error with this; however, if you do it up front during testing, you’ll save yourself some time later when deploying to production.

That all said, once you’ve verified that your storage is supported and which version of the SRA to download, you can get it from the VMware downloads page (you will need to log in).  Be sure to also verify that the version of the SRA you are downloading is compatible with the version of array manager code you’re running.

 

Installing the SRA

Before you Begin – Prior to installing the SRA on the SRM server in each site (protected and recovery), you should have already paired the sites successfully.  Also, if you haven’t installed SRM yet, you will need to, otherwise the SRA installer will fail once it discovers that SRM is not installed.

Installing the SRA should be straightforward and painless, as there are not many options to configure during installation.  Once the installation is completed on both the protected and recovery SRM servers, proceed.

 

Verify That SRM Has Registered the SRAs

  1. Once you’ve installed the SRA on each site’s SRM server, log into the vSphere Web Client, and go to Site Recovery > Sites and select a site.
    From this view, you can see what SRA has been installed, its status, and compatibility information.
  2. Click the rescan button to ensure the connection is valid and there are no errors.

Configure Array Managers

After pairing the protected and recovery sites, you will need to configure the respective array managers so SRM can discover replicated devices, compute datastore groups, and initiate storage operations.  You typically only need to do this once, however, if array access credentials change, or you want to use a different set of arrays, you can edit the connections to update accordingly.

Pre-Requisites

  • Sites have been paired and are connected
  • SRAs have been installed at both sites and verified

Procedure

  1. In the vSphere Web Client, go to Site Recovery > Array Based Replication.
  2. On the Objects tab in the right window pane, click the icon to add an array manager.
  3. Select from one of two options for adding array managers (pair or single), then click Next.
  4. Select a pair of sites for the array manager(s), and click Next.
  5. Enter a name for the array in the Display Name field, and click Next.
  6. Provide the required information for the type of SRA you selected, and click Next.
  7. If you chose to add a pair of array managers, enter the paired array manager information, then click Next.
  8. Click-to-enable the checkbox beside the array pair you just configured, and click Next.
  9. Review your configuration, then click Finish when ready.

 

Rescan Arrays to Detect Configuration Changes

SRM performs an automatic rescan every 24 hours by default to detect any changes made to the array configurations.  It is recommended to perform a manual rescan after reconfiguring either site or adding/removing devices, so that the datastore groups are recomputed.  If you need to change the default interval at which SRM performs a rescan, you can do this in the advanced settings for each site by editing the storage.minDsGroupComputationInterval advanced setting.


To perform a manual rescan after making any configuration changes:

  1. Go to Site Recovery > Array Based Replication
  2. Select an array for either site
  3. On the Manage tab of the selected array, click the Array Pairs sub tab
  4. Click the rescan button to perform a manual rescan.

 

Once you’ve got all of the above configured, you can begin setting up your protection groups and recovery plans.
