Testing, Testing 1, 2, 3

Thunder Technologies
6 min read · Sep 28, 2021

Testing your disaster recovery protection once a year might be typical for manual disaster recovery processes. But would you buy software from a company that tests its code once a year?

Once a month makes more sense, but given your busy schedule and competing priorities, chances are high that this regimen will be routinely skipped.

If it can be done with a minimum investment in time and money, testing can never be done too often. Every hour makes sense, and in this post I’ll show you how you can do it for your mission-critical AWS EC2 instances robustly and affordably. Spoiler alert though: our product Thunder for EC2 Serverless automates essentially all of these operations for you for a one-time $399 fee and minimal ongoing operational costs.

In a previous post I described how Amazon Data Lifecycle Manager can replicate either snapshots or AMIs of your mission-critical EC2 instances across regions. In theory, in case of an outage you can deploy instances from your replicated AMIs to keep your business running.

But how do you know it will work?

Ideally, you convert those AMIs to instances and power them on regularly. However, you are not protecting EC2 instances; you are protecting the applications hosted on them. Without connecting to the application, you cannot be confident that it will actually recover, that all of its licenses are in place, and that all of the storage it needs is available, nor can you see any other unforeseen glitch that could affect its boot.

Plainly, testing each application manually during these frequent tests is cumbersome and time-consuming, even with painstaking documentation. Ideally you would have a battery of automated testing code that can be simply stored and safely executed against your backup EC2 instance every time a replication job completes. What’s the best way to do this?

A serverless function, perhaps? Consider the following:

Take for example an EC2 instance hosting a Redmine server, a popular open-source project management solution. In Redmine you can configure a test project with a test user who only has permission to read that project.

Then you can write a simple Python program to connect to the Redmine server and extract information about that project. The python-redmine library provides a simple client, which can be installed into a Python virtual environment with pip.
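As a rough illustration of such a client, the sketch below reads one project through python-redmine and returns a few of its fields. The server address, API key, and project identifier shown in the comments are placeholders, and authenticating with the test user's API key (rather than a username and password) is an assumption made for this example.

```python
"""Read-only smoke test against a Redmine server (illustrative sketch)."""


def make_client(url, api_key):
    # Imported lazily so this module still loads where python-redmine is absent.
    from redminelib import Redmine
    return Redmine(url, key=api_key)


def summarize_project(client, identifier):
    """Fetch one project and return fields that show the application is serving data."""
    project = client.project.get(identifier)
    return {
        "id": project.id,
        "name": project.name,
        "identifier": project.identifier,
    }


# Example call (placeholder values, not taken from the article):
#   client = make_client("http://10.0.0.10", "test-user-api-key")
#   print(summarize_project(client, "dr-test"))
```

If the server is down, the credentials are wrong, or the project is missing, the `get` call raises an exception, which is exactly the signal a recovery test needs.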

Because the execution environment in Lambda functions contains only a limited subset of Python libraries, you will have to supply the Redmine Python client modules, and their dependencies, along with your client code. This is relatively straightforward with Python virtual environments: include all of the directories under venv/lib/python3.8/site-packages that support the Redmine client.

You will need certifi, idna, redminelib, requests, and urllib3. pip, setuptools, and wheel are standard Python packaging tools and do not need to be bundled.
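One possible way to assemble the bundle is a small script built on the standard `zipfile` module. This is a sketch, not the article's own tooling: the `build_bundle` helper, its arguments, and the handler file name in the usage comment are hypothetical. Note that the package is spelled `certifi` on disk.

```python
import os
import zipfile

# Package directories named in the text ("certifi" is the actual package name).
PACKAGES = ["certifi", "idna", "redminelib", "requests", "urllib3"]


def build_bundle(site_packages, handler_path, zip_path, packages=PACKAGES):
    """Zip the handler file plus the listed package directories at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(handler_path, os.path.basename(handler_path))
        for package in packages:
            package_dir = os.path.join(site_packages, package)
            if not os.path.isdir(package_dir):
                raise FileNotFoundError("missing package directory: " + package_dir)
            for root, _dirs, files in os.walk(package_dir):
                for name in files:
                    if name.endswith(".pyc"):
                        continue  # compiled caches are not needed in the bundle
                    full_path = os.path.join(root, name)
                    # Store paths relative to site-packages so imports resolve.
                    bundle.write(full_path, os.path.relpath(full_path, site_packages))
    return zip_path


# Example call (hypothetical paths):
#   build_bundle("venv/lib/python3.8/site-packages", "lambda_function.py", "bundle.zip")
```

Failing loudly on a missing directory is deliberate: a bundle that silently lacks a dependency only shows the problem later, as an import error inside Lambda.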

Convert your Redmine client to a Lambda function handler, include the library directories in the zip file, and instruct your function handler to look in the local directory for Python libraries.
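A handler along these lines might look like the following sketch. The environment variable names `REDMINE_URL`, `REDMINE_API_KEY`, and `REDMINE_PROJECT` are hypothetical, as is the `check_redmine` helper; `LAMBDA_TASK_ROOT` is set by the Lambda runtime and points at the directory where the zip file is unpacked.

```python
import os
import sys

# Look for bundled libraries in the function's own directory.
# LAMBDA_TASK_ROOT is set by the Lambda runtime; fall back to the cwd elsewhere.
TASK_ROOT = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
if TASK_ROOT not in sys.path:
    sys.path.insert(0, TASK_ROOT)


def check_redmine(client, identifier):
    """Read the test project; any failure surfaces as an exception."""
    project = client.project.get(identifier)
    return {"ok": True, "project": project.name}


def lambda_handler(event, context):
    # Resolved from the library directories bundled in the zip file.
    from redminelib import Redmine

    # Hypothetical variable names; set them in the function's configuration.
    client = Redmine(os.environ["REDMINE_URL"], key=os.environ["REDMINE_API_KEY"])
    # Exceptions are left to propagate so a failed check marks the invocation as failed.
    return check_redmine(client, os.environ.get("REDMINE_PROJECT", "dr-test"))
```

Letting errors propagate, rather than catching them and returning a status field, means the Lambda console and any caller see the test as failed without having to parse the result.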

Then configure a new AWS Lambda function in your DR region. The function will need secure access to the Redmine DR instance, so first create a security group in the VPC hosting the DR Redmine server; it needs no special rules.

Then create the Lambda function, attaching it to the Redmine server’s VPC, and specifying the security group you just created.
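These two steps can also be scripted with boto3 instead of the console. The sketch below is one way to do it under several assumptions: the group name, the handler file name `lambda_function.py`, the `REDMINE_URL` variable, and the timeout are all hypothetical, and the runtime defaults to `python3.8` only to match the virtual environment used in this article. The execution role is assumed to already carry the permissions a VPC-attached function needs (for example the `AWSLambdaVPCAccessExecutionRole` managed policy).

```python
def create_test_security_group(ec2, vpc_id, name="redmine-dr-test-lambda"):
    """Create an empty security group for the function and return its ID."""
    response = ec2.create_security_group(
        GroupName=name,  # hypothetical name
        Description="Security group for the DR test Lambda function",
        VpcId=vpc_id,
    )
    return response["GroupId"]


def lambda_function_config(name, role_arn, zip_bytes, subnet_ids,
                           security_group_id, redmine_url, runtime="python3.8"):
    """Build the keyword arguments for the Lambda create_function call."""
    return {
        "FunctionName": name,
        "Runtime": runtime,
        "Role": role_arn,
        # Assumes the handler lives in lambda_function.py at the zip root.
        "Handler": "lambda_function.lambda_handler",
        "Code": {"ZipFile": zip_bytes},
        "Timeout": 30,
        # Attach the function to the Redmine server's VPC.
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": [security_group_id],
        },
        "Environment": {"Variables": {"REDMINE_URL": redmine_url}},
    }


def create_function(config, region):
    import boto3  # imported lazily; requires AWS credentials
    return boto3.client("lambda", region_name=region).create_function(**config)
```

Keeping the configuration in a plain dictionary makes it easy to review, or to reuse for other applications, before anything is created in the account.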

Upload the code to the function, specifying the DR Redmine server’s private IP as the URL. This private IP stays constant for the life of the instance, even across reboots.

Add the function’s security group to the Redmine DR instance’s security group so that the function can reach the application through the private network. Do this by creating a new inbound rule that allows all traffic from the Lambda function’s security group ID.
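Expressed with boto3, that rule could be added roughly as follows. The helper name and the rule description are hypothetical; the `ec2` argument is expected to be a boto3 EC2 client for the DR region.

```python
def allow_from_security_group(ec2, instance_sg_id, lambda_sg_id):
    """Add an inbound rule to the instance's group that admits the Lambda's group."""
    permission = {
        "IpProtocol": "-1",  # all traffic, as described in the text
        "UserIdGroupPairs": [
            {"GroupId": lambda_sg_id, "Description": "DR test Lambda function"}
        ],
    }
    ec2.authorize_security_group_ingress(
        GroupId=instance_sg_id,
        IpPermissions=[permission],
    )
    return permission


# Example call (hypothetical IDs):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   allow_from_security_group(ec2, "sg-0redmine", "sg-0lambda")
```

Referencing the security group instead of an IP range means the rule keeps working no matter which private addresses the function's network interfaces receive. If all traffic is broader than you want, the same structure accepts a single TCP port.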

Power on the DR instance, create a Lambda test to run your code, and then run the test to confirm that it works.

Power off the Redmine DR instance. Run the test again after each replication job, or more often if you see fit. Now you can be highly confident that your Redmine server will recover in case of a DR failover.
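The power-on, test, power-off cycle lends itself to a small script. The following is a minimal sketch, assuming boto3 EC2 and Lambda clients are passed in; the `run_dr_test` name and its return shape are hypothetical. It waits on the EC2 `instance_status_ok` waiter, which only confirms the instance's status checks, so an application that starts slowly may need an extra delay or retry before the invocation.

```python
import json


def run_dr_test(ec2, lambda_client, instance_id, function_name):
    """Start the DR instance, invoke the test function, and always stop the instance."""
    ec2.start_instances(InstanceIds=[instance_id])
    try:
        # Block until the instance passes its EC2 status checks.
        ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])
        response = lambda_client.invoke(FunctionName=function_name, Payload=b"{}")
        payload = json.loads(response["Payload"].read() or b"null")
        # Lambda sets FunctionError when the handler raised an exception.
        return {"passed": "FunctionError" not in response, "result": payload}
    finally:
        # Stop the instance even if the test failed, to keep costs low.
        ec2.stop_instances(InstanceIds=[instance_id])


# Example call (hypothetical identifiers):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   lam = boto3.client("lambda", region_name="us-west-2")
#   print(run_dr_test(ec2, lam, "i-0123456789abcdef0", "redmine-dr-test"))
```

The `finally` clause is the important design choice here: a failed test should never leave the DR instance running and accruing charges.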

Using a Lambda function has many advantages:

  • The code is securely stored in AWS, so there is no confusion about where the code is
  • The function connects securely to your application through the private network
  • Execution time is very short and the test instance only needs to run briefly; overall cost to test is extremely low
  • You gain confidence that the application will recover in case of a true failure; any errors can be corrected in advance
  • The process is repeatable for all of your applications; most modern software has open-source python libraries available and is easily packaged

Hopefully by now you are nodding your head in agreement that this is the most robust way to test your replicated EC2 instances, though perhaps with some hesitation about the effort involved and the need to automate many of the steps, including running the test itself.

That’s where our product Thunder for EC2 Serverless comes in: it automates the provisioning, replication, and, most importantly, testing of duplicate EC2 instances across regions for robust disaster recovery protection, all for a single $399 license fee. Our product is itself implemented as an AWS Lambda function, so its operation costs are minimal, and it hooks into the testing procedure described in this article. We already have several supported tests, all easily configured through a CloudFormation template, including MySQL and HTTP, and we are adding more regularly. We can also assist in creating more and making those readily available to you from one of our S3 buckets. The cloud just makes all of this so smooth and easy.

For more information, check out our hands-on demo, which includes a brief introductory video, or contact us at info@thundertech.io to ask questions or schedule a live demo.

Originally published at https://www.linkedin.com.


Thunder Technologies provides robust, cost-effective disaster recovery automation for the public cloud