One of the most useful concepts to come from the “as a service” model that cloud computing created is Disaster Recovery as a Service (DRaaS). DRaaS allows a business to outsource the critical part of its IT infrastructure strategy that assures the organization will still operate in the event of an IT outage. The primary technology that has allowed Disaster Recovery (DR) to be outsourced, or offered as a service, is virtualization. DRaaS providers operate their own datacenters and provide a cloud infrastructure where they rent servers for the replication and recovery of their customers’ data. DRaaS solutions have grown in popularity partly because of the increased need for small and medium-sized businesses (SMBs) to include DR in their IT strategy. DR plans have become mandated by the larger companies that SMBs supply services to, as well as by insurers and regulatory agencies. These entities require proof of the DR plan and of the ability to recover quickly from an outage. Building such a plan is a complicated process that few organizations take the proper time to address. A custom solution for each business needs to be designed by an experienced IT professional who focuses on cloud and DR. Most times, an expert such as Two Ears One Mouth Consulting, partnered with a DRaaS provider, will create the best custom solution.
The Principles and Best Practices for Disaster Recovery (DR)
Disaster Recovery (DR) plans and strategies can vary greatly. One extreme is the idea that “my data is in the cloud, so I’m covered.” The other end of the spectrum is “I want a duplicate of my entire infrastructure off site and replicated continually,” an active-active strategy. Most businesses today have some sort of backup; however, backup is not a DR plan. IT leadership at larger organizations favors the duplicated IT infrastructure that an active-active strategy dictates, but balks at the cost. The answer for your company will depend on your tolerance for an IT outage, how long you’re willing to be off-line, and your company’s financial constraints.
First, it’s important to understand the primary causes of IT outages. Many times, we think of weather events and the power outages they create. Disruptive weather such as hurricanes, tornadoes and lightning strikes from severe thunderstorms affects us all. These weather-related events make the news but are not the most common causes. Human error is the greatest source of IT outages. This type of outage can come from failed upgrades and updates, errors by IT employees or even mistakes by end users. Another growing source of IT outages is malware and IT security breaches (see the previous article on phishing). Ransomware outages require an organization to recover from backups, because the organization’s data has been encrypted and will only be unlocked with a ransom payment. It is vital that security threats are understood, addressed and planned for in the DR process.
Two important concepts of DR are Recovery Point Objective (RPO) and Recovery Time Objective (RTO). The RPO is the organization’s tolerance for data loss, expressed as an interval of time: it defines how far back in time the customer is willing to go for the data that will be restored. This determines the frequency of the data replication and ultimately the cost of the solution. The RPO concept also applies to a ransomware attack, described above, where the organization falls back to data from a time before the breach. The RTO defines how quickly the vendor will have the customer up and running on the DR solution during an outage. The plan should also cover how the customer will “fail back” to the primary environment when the outage is over.
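To make the RPO idea concrete, here is a minimal Python sketch. It assumes data is replicated on a fixed schedule, so the worst-case data loss is one full replication interval. The function names and any dates passed in are hypothetical and for illustration only; they do not come from any DRaaS product.

```python
from datetime import datetime, timedelta
from typing import Optional


def meets_rpo(replication_interval: timedelta, rpo: timedelta) -> bool:
    # Worst case: the outage hits just before the next replication run,
    # so everything written since the last copy (up to one interval) is lost.
    return replication_interval <= rpo


def latest_clean_restore_point(
    restore_points: list[datetime], breach_time: datetime
) -> Optional[datetime]:
    # For a ransomware recovery, pick the most recent copy taken
    # strictly before the breach; return None if no such copy exists.
    clean = [point for point in restore_points if point < breach_time]
    return max(clean) if clean else None
```

For example, replicating every six hours does not satisfy a four-hour RPO, so meets_rpo(timedelta(hours=6), timedelta(hours=4)) returns False. Tightening the interval fixes that, which is why a smaller RPO generally raises the cost of the solution.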
If the company is unable to create an active-active DR solution, it is important to rate and prioritize critical applications. Business leadership needs to decide which applications are most important to the operations of the company and set them to recover first in the DR solution. Typically these applications are grouped into “phases” that reflect their importance to the business and the order in which they will be restored.
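The phased approach can be sketched in a few lines of Python. This is only an illustration, assuming the business has already assigned each application a phase number where 1 is restored first. The application names and phase assignments below are hypothetical.

```python
from collections import defaultdict


def recovery_phases(app_phase: dict[str, int]) -> list[list[str]]:
    # Group applications by the phase number the business assigned to them.
    grouped: dict[int, list[str]] = defaultdict(list)
    for app, phase in app_phase.items():
        grouped[phase].append(app)
    # Lower phase numbers are restored first; names are sorted for a stable plan.
    return [sorted(grouped[phase]) for phase in sorted(grouped)]


# Hypothetical applications and phase assignments, for illustration only.
plan = recovery_phases({"email": 2, "order-entry": 1, "intranet-wiki": 3, "erp": 1})
for number, apps in enumerate(plan, start=1):
    print(f"Phase {number}: {', '.join(apps)}")
```

The value of writing the order down this way is that the priority decision is made by business leadership ahead of time, not improvised during an outage.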
Telecommunications networking can sometimes be the cause of an IT outage and is often the most complicated part of the recovery process. Customers directed to one site in normal circumstances need to be redirected to another when the DR plan is engaged. In the early days of DR, there was a critical piece of documentation called a playbook. A playbook was a physical document with step-by-step instructions detailing what needs to happen in the event of an IT outage. It would also define what is considered a disaster and at what point the DR plan is engaged. Software automation has partially replaced the playbook; however, the playbook concept remains. While automating the process is often beneficial, there are steps that can’t be automated. Adjusting the networking of the IT infrastructure when the DR plan is initiated is one example.
Considerations for the DRaaS Solution
DRaaS, like other outsourced solutions, has special considerations. The agreement with the DRaaS provider needs to include Service Level Agreements (SLAs). SLAs are not exclusive to DRaaS but are critical to it. An SLA defines all the metrics you expect your vendor to attain in the recovery process. RTO and RPO are important metrics in an SLA. SLAs need to be in writing and have well-defined penalties if deliverables are not met. There should also be consideration for how the recovery of an application is defined. A vendor can point out that the application is working at the server level but may not consider whether it’s working at the desktop and at all sites. If the customer has multiple sites, the details of the networking between sites are a critical part of the DR plan. That is why a partner that understands both DR and telecommunications, like Two Ears One Mouth IT Consulting, is critical.
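One way to pin down what “recovered” means in an SLA is to write it as an explicit check. The Python sketch below is an assumption about how such a definition could be expressed, not any vendor’s actual SLA terms. The function names, site names and minute values are hypothetical.

```python
def application_recovered(server_ok: bool, desktop_ok_by_site: dict[str, bool]) -> bool:
    # The application counts as recovered only when the server is up AND
    # users at every listed site can reach it from the desktop.
    # An empty site list is treated as "not verified", so it fails.
    return server_ok and bool(desktop_ok_by_site) and all(desktop_ok_by_site.values())


def rto_met(measured_recovery_minutes: float, rto_minutes: float) -> bool:
    # Compare the measured recovery time against the RTO written into the SLA.
    return measured_recovery_minutes <= rto_minutes
```

With this definition, a server that is running while one branch office still cannot connect, for example application_recovered(True, {"headquarters": True, "branch": False}), is not counted as a recovery.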
The financial benefits of an outsourced solution such as DRaaS are a primary consideration. A CapEx purchase of the required infrastructure, implemented in a remote and secure facility, is very costly. Most businesses see the value of renting DR infrastructure that is already implemented and tested in a secure, telecom-rich site.
DR is a complicated and very important technology that a business will pay for but may never use. Like other insurance policies, it’s important and worth the expense. However, because it’s complicated, it should be designed and executed by professionals, which may make an outsourced service the best alternative.
If you need assistance designing your DR Solution (in Cincinnati or remotely), please contact us at:
Jim Conwell (513) 227-4131 email@example.com www.twoearsonemouth.net