In today's market, tough competition forces every company to fight for its survival. As demand in the IT market grows, people are also looking for reliable infrastructure. With that in mind, doesn't High Availability play a vital role even in a cloud server environment? The answer is, of course, yes. People tend to think that once they are in the cloud they no longer need a load balancer, but remember that every cloud provider, including AWS, offers a load balancer service, and there is a reason for that.
With competition growing, it is essential to make sure there is no breakdown or failure while delivering the service at the client's end. Put plainly, High Availability built on a load balancer is the foundation on which the other important aspects rely.
Let's take a look at what exactly high availability is and how you can achieve it.
About High Availability
High availability is a quality of a system or component that assures a high level of operational performance for a given period of time. Availability covers both the period of time during which a service is available and the time the system takes to respond to a user's request.
How to measure High Availability
Availability is often expressed as a percentage that indicates how much uptime is expected from a particular system in a given period of time. A value of 100% would indicate that the system never fails or goes offline. For instance, a system that guarantees 99% availability over a year can have up to 3.65 days of downtime (1% of 365 days).
These percentages are based on many factors, including both scheduled and unscheduled maintenance periods, as well as the time to recover from a possible system failure.
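The arithmetic behind the 99% figure can be sketched in a few lines of Python. This is only an illustration: the function name `allowed_downtime_hours` and the 365-day year are assumptions made for the example, and the 99.9% and 99.99% targets are added for comparison.

```python
def allowed_downtime_hours(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the maximum downtime, in hours, that an availability target permits."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    # The share of the period during which the system may be down.
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * period_days * 24.0


# Compare a few common targets over one (assumed 365-day) year.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours per year ({hours / 24:.2f} days)")
```

Each extra "nine" in the target cuts the permitted downtime by a factor of ten, which is why higher targets are so much harder to meet.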
How does Load Balancer help to increase the availability?
The basic goal of high availability is to eliminate single points of failure in the infrastructure. If a server goes offline and nothing can take its place, the service is interrupted and becomes unavailable. Therefore, any component that is required for the proper functioning of the application should have a redundant counterpart. This keeps the system available.
The system should be designed so that the failure of one component does not stop it from operating. For example, place two identical web servers behind a load balancer. Traffic coming from clients is distributed evenly between the servers. If one server goes down, the load balancer redirects the traffic to the server that is still online.
The main question that arises here is what happens if the load balancer itself goes offline.
Although this is less common, an additional load balancer can be configured to cover that case. This gives you redundancy. But redundancy by itself cannot guarantee high availability. There must be a mechanism in place for detecting failures and taking action when one of the components of the stack becomes unavailable.
Failure detection and recovery for redundant systems can be implemented using a top-to-bottom approach. This makes the layer on top responsible for monitoring the layer immediately beneath it for failures. In our previous example, the load balancer is the top layer. If one of the web servers becomes unavailable, the load balancer stops sending requests to that specific server.
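To make the two-server example concrete, here is a minimal sketch of round-robin distribution with failover. The class `RoundRobinBalancer` and the server names `web1` and `web2` are hypothetical, and a real load balancer does far more (connection handling, timeouts, weighting); this only shows the selection logic.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips the ones marked offline."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.online = set(self.servers)
        self._next = 0  # index of the server to try next

    def mark_offline(self, server):
        self.online.discard(server)

    def mark_online(self, server):
        if server in self.servers:
            self.online.add(server)

    def pick(self):
        """Return the next online server, or raise if none is left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.online:
                return server
        raise RuntimeError("no servers online")


balancer = RoundRobinBalancer(["web1", "web2"])
print([balancer.pick() for _ in range(4)])  # traffic alternates between the servers

balancer.mark_offline("web1")
print([balancer.pick() for _ in range(2)])  # all traffic now goes to web2
```

Note that when both servers are offline the sketch raises an error, which is exactly the interruption of service that redundancy is meant to make unlikely.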
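The top-to-bottom monitoring described here can be sketched as a small health monitor that the top layer runs against the layer beneath it. Everything named below is an assumption for illustration: the `HealthMonitor` class, the injected `probe` callable (in practice this might be an HTTP or TCP check), and the rule that a server is dropped only after `max_failures` consecutive failed probes.

```python
from typing import Callable, Dict, Iterable, List


class HealthMonitor:
    """Tracks consecutive probe failures for each server in the layer below."""

    def __init__(self, servers: Iterable[str], probe: Callable[[str], bool],
                 max_failures: int = 3):
        self.probe = probe                # hypothetical check: True means the server answered
        self.max_failures = max_failures  # assumed threshold, not taken from the article
        self.failures: Dict[str, int] = {server: 0 for server in servers}

    def run_checks(self) -> List[str]:
        """Probe every server once and return the servers still considered healthy."""
        for server in self.failures:
            if self.probe(server):
                self.failures[server] = 0   # a successful probe resets the counter
            else:
                self.failures[server] += 1
        return self.healthy()

    def healthy(self) -> List[str]:
        return [s for s, count in self.failures.items() if count < self.max_failures]


# Simulated server states stand in for real network probes.
status = {"web1": True, "web2": True}
monitor = HealthMonitor(status, probe=lambda server: status[server], max_failures=2)

status["web1"] = False
print(monitor.run_checks())  # one failure: web1 is still within the threshold
print(monitor.run_checks())  # second failure: web1 is removed from rotation
```

Requiring several consecutive failures is a design choice that avoids removing a server because of a single dropped probe, at the cost of slightly slower detection.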
The diagram below illustrates this.
This method looks simple, but it has certain limitations. There will be a point where a top layer is out of reach, meaning nothing sits above it to monitor it. This is generally the case for the load balancer layer. Creating a failure detection service for the load balancer on an external server would simply create a new single point of failure.
In such a scenario, multiple redundant nodes must be connected together as a cluster, where each node is equally capable of failure detection and recovery.
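One way to picture such a cluster is a set of nodes that exchange heartbeats, each deciding for itself which peers are alive. The sketch below is a deliberate simplification under stated assumptions: the `ClusterNode` class, the node names `lb1` and `lb2`, the 3-second timeout, and the "lowest name among live nodes is active" rule are all hypothetical. Real cluster managers rely on more careful membership and quorum protocols.

```python
class ClusterNode:
    """Toy cluster member: records peer heartbeats and decides who should be active."""

    def __init__(self, name, peers, timeout=3.0):
        self.name = name
        self.timeout = timeout  # seconds without a heartbeat before a peer counts as failed
        # No heartbeat seen yet, so every peer starts out as "not alive".
        self.last_seen = {peer: float("-inf") for peer in peers}

    def receive_heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def alive_members(self, now):
        alive = [p for p, seen in self.last_seen.items() if now - seen <= self.timeout]
        return sorted(alive + [self.name])  # a node always counts itself as alive

    def active_node(self, now):
        # Assumed rule: the live node with the lowest name serves traffic.
        return self.alive_members(now)[0]

    def should_take_over(self, now):
        return self.active_node(now) == self.name


lb2 = ClusterNode("lb2", peers=["lb1"])
lb2.receive_heartbeat("lb1", now=10.0)
print(lb2.should_take_over(now=11.0))  # lb1 was heard from recently, so lb2 stays passive
print(lb2.should_take_over(now=20.0))  # lb1 has been silent past the timeout, so lb2 takes over
```

Because every node runs the same logic, no external monitor is needed. The sketch does not handle a network partition in which both nodes believe the other has failed, which is one of the problems production cluster software has to solve.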
What System Components Are Required for High Availability?
Several components must be taken into consideration when implementing high availability. The main factors are:
- Software: The software must be designed to handle unexpected failures, including ones that could require a system restart.
- Environment: It is good to keep the redundant servers in different locations so that a natural disaster in one area does not take all the servers down.
- Hardware: Highly available servers should be resilient to power outages and hardware failures, including failures of hard disks and network interfaces.
- Network: It is very important that a redundant network strategy is in place for possible failures. Otherwise, an unplanned network outage becomes another single point of failure.
- Data: Hard disk failure is not the only cause of data loss; several other factors can cause it too. Highly available systems must account for data safety in the event of a failure.
What Software Can Be Used to Configure High Availability?
Each layer will have different needs in terms of software and its configuration. At the application level, load balancers represent an important part of the software for the setup.
HAProxy (High Availability Proxy) is a common choice for load balancing. It can handle load balancing at multiple layers and for different kinds of servers, including any database servers you may be using, such as MySQL or MSSQL.
The next important step is to implement a reliable redundant solution for the application entry point. To remove this single point of failure, we need to implement a cluster of load balancers behind a Floating IP. Corosync and Pacemaker are commonly used for this kind of setup on both Ubuntu and CentOS servers.
This may seem complicated, but high availability is an important subset of reliability engineering. It focuses on making sure that your system runs at an optimum level for a given period of time, and it matters most for systems that require high reliability.