Lesson V: Availability

The final aspect of security I want to discuss in this series is Availability: the ability to access data when needed. While availability is not normally considered part of security, it is part of the CIA (Confidentiality, Integrity, Availability) approach that we use at Workshare.

Availability can be in conflict with security, especially with confidentiality; after all, the most secure computer is one unplugged from the network and turned off, but that is not useful if you need data.

Availability issues come at different levels, from temporary outages to complete data loss, and all of them need to be considered.

In order for data to always be available when needed, we have to first understand the requirements.

Availability considerations usually include the following elements:

  • Determining availability requirements.
    • Availability periods: Some data must be available 24/7, but other data may only be needed Monday to Friday during office hours. If you do not need 24/7 availability, you will be able to schedule maintenance outside business hours without affecting your users.
    • Uptime requirements: Due to the nature of computer systems, it is impossible to guarantee 100% availability and, even if it were possible, it may not be cost effective. Usually, uptime requirements are guaranteed by a Service Level Agreement (SLA) with your customers.
    • Definition of what uptime is: Availability SLAs often do not cover the whole infrastructure or service, and they may or may not include hard limits on response times, so what counts as "up" has to be defined explicitly.
  • Risk analysis: Identifying the different things that may go wrong in the infrastructure (including human error), how they will affect availability and how they can be mitigated.
  • Technical measures: There are a number of technical measures to help with availability. At the most basic level, you can increase availability (and cost!) by adding redundancy and eliminating, or at least reducing, single points of failure, but you can go all the way to self-healing systems.
  • Constraints: There will be multiple constraints at different levels, such as monetary, architectural, staffing or even technical ones, which will shape the requirements.
  • Monitoring: This is a combination of automation and manual intervention to verify that the system is "up". A common approach to monitoring sets different thresholds that go from healthy to warning (the system is approaching its limits) to critical, where the system is unavailable.
  • Processes: A major aspect of maintaining high levels of availability is recovering from failure situations. This includes technical measures as well as processes and documentation ensuring that issues are handled quickly and correctly.
    • One often-forgotten element is having Business Continuity and Disaster Recovery processes and tools that ensure the system can be brought back up after major incidents affecting the infrastructure or the business.
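
To make a few of these considerations concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any particular system: the function names, the 80% warning threshold and the idea that redundant copies fail independently are all assumptions made for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def downtime_budget_minutes(uptime_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime an uptime target allows over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (1 - uptime_percent / 100)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when one working
    copy is enough. Assumes failures are independent, which real
    systems only approximate."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single) ** copies


def classify(reachable: bool, utilisation: float,
             warning_at: float = 0.8) -> str:
    """Map a health check onto the healthy / warning / critical scale.

    `warning_at` is a hypothetical threshold: the fraction of capacity
    at which the system is considered to be approaching its limits.
    """
    if not reachable:
        return "critical"  # the system is unavailable
    if utilisation >= warning_at:
        return "warning"   # up, but close to its limits
    return "healthy"
```

For example, `downtime_budget_minutes(99.9)` gives roughly 525.6 minutes, so a "three nines" target leaves under nine hours of downtime per year. Under the independence assumption, `redundant_availability(0.99, 2)` is about 0.9999, which shows why adding a second copy of a component is often the cheapest first step.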

Once all the considerations are in place, the next step is to develop a plan that allows us to reach or exceed availability goals within the estimated budget, and to make sure it gets reviewed on a regular basis. On top of scheduled reviews, the plan should be reviewed after changes to the service, including code, infrastructure or architecture. In most cases, there should also be a capacity plan that looks ahead to ensure scalability issues are addressed before they become a problem for availability.
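
As a rough illustration of the "look ahead" part of a capacity plan, the sketch below estimates how long it takes for load to reach capacity. It assumes steady compound growth, which is a simplification, and the function and parameter names are made up for the example.

```python
import math


def periods_until_full(current_load: float, capacity: float,
                       growth_rate: float) -> float:
    """Number of periods (e.g. months) until load reaches capacity.

    Assumes load grows by a constant `growth_rate` per period
    (0.10 means 10% growth per period). Returns 0 if the system is
    already at capacity and infinity if load is not growing.
    """
    if current_load <= 0 or capacity <= 0:
        raise ValueError("current_load and capacity must be positive")
    if current_load >= capacity:
        return 0.0
    if growth_rate <= 0:
        return math.inf
    return math.log(capacity / current_load) / math.log(1 + growth_rate)
```

A system running at half its capacity with 10% monthly growth, `periods_until_full(50, 100, 0.10)`, has a little over seven months before it is full, which is the kind of number that helps decide when the next review needs to happen.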

The move to the cloud has completely changed the approach to availability. In the past, the approach was to set up systems that did not fail, with enough capacity to handle peak load; now the normal approach is to have fault-tolerant systems that can scale up and down as required. This approach can be both cheaper and more reliable, but the system has to be designed with those constraints in mind, which complicates the solution.
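
The "scale up and down as required" idea can be sketched as a simple sizing rule. This is a simplified illustration, not how any specific cloud provider's autoscaler works; the names and the default limits are assumptions for the example.

```python
import math


def desired_instances(load: float, per_instance_capacity: float,
                      minimum: int = 2, maximum: int = 20) -> int:
    """How many instances to run for the current load.

    `minimum` defaults to 2 so that a single instance is never a
    single point of failure; `maximum` caps cost. Both are
    hypothetical limits chosen for illustration.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    needed = math.ceil(load / per_instance_capacity)
    return max(minimum, min(maximum, needed))
```

With instances that each handle 100 requests per second, a load of 950 calls for 10 instances, while a quiet period with a load of 50 still keeps 2 running for redundancy. The upper limit is where the cost constraint shows up: beyond it, the system stops scaling and the capacity plan needs revisiting.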

At Workshare, services are designed with availability and scalability in mind. All of our services are replicated, with minimal single points of failure and a multi-cloud approach to ensure that an outage at our main provider will not affect our customers.

This is the final post in The Evangelist series. I hope you have enjoyed them!