When monitoring a server, it is essential to assess the system load correctly. Knowing how heavily the system is loaded lets you judge its performance and availability realistically. For this purpose, administrators usually look at the Load Average indicator. This article explains what it shows and how to measure it correctly.
Load Average (LA) is an averaged measure of system load: the number of processes that are either running or waiting for resources, averaged over intervals of 1, 5 and 15 minutes. Because the instantaneous load fluctuates rapidly as short-lived processes come and go, these averages give a better picture of system performance than any single reading.
There are several simple ways to measure the average load. The simplest is to run a command. On Linux, for example, run the uptime command in the terminal. The output shows the current time, how long the system has been running, the number of logged-in users and, most importantly, the average load over the last 1, 5 and 15 minutes. The w command, run over an SSH session, reports the same load figures together with a list of logged-in users.
The result looks like this:
The average load value is calculated from the processes that are running or queued for execution. CPU utilization usually affects LA the most, but it is not the only factor: on Linux, processes blocked in uninterruptible waits, typically on disk I/O, are counted as well, so heavy I/O or swapping caused by a shortage of RAM can also raise the value.
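On Linux the kernel exposes these figures as a single line in /proc/loadavg: the three averages, then the number of currently runnable scheduling entities over the total, then the most recently assigned PID. The following is a minimal sketch of parsing that line; the `LoadAvg` type, the `parse_loadavg` helper and the sample line are illustrative assumptions, not part of any standard tool.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float       # 1-minute load average
    five: float      # 5-minute load average
    fifteen: float   # 15-minute load average
    runnable: int    # scheduling entities currently runnable
    total: int       # scheduling entities that currently exist


def parse_loadavg(line: str) -> LoadAvg:
    """Parse one line in the format used by /proc/loadavg on Linux."""
    fields = line.split()
    runnable, total = fields[3].split("/")
    return LoadAvg(
        float(fields[0]),
        float(fields[1]),
        float(fields[2]),
        int(runnable),
        int(total),
    )


# Hypothetical sample line: the averages are the ones used in this article,
# the remaining fields are made up for illustration.
sample = "1.03 1.11 1.20 2/345 6789"
print(parse_loadavg(sample))
```

On a real Linux host the same function could be fed the contents of /proc/loadavg, for example `parse_loadavg(open("/proc/loadavg").read())`.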
Here is a simple example: a VPS with two cores. The average load values shown above (1.03, 1.11, 1.20) are normal for a VPS with 2 cores.
1 unit of LA corresponds to 100% load on 1 CPU core. A VPS with two cores can therefore carry an average load of up to 2 before processes start to queue:
- LA shows 3.21, 4.22, 5.23 - the load is dropping, but over the last 5 minutes it still averaged 4.22, i.e. 422% load, or about 4 cores' worth of work on 2 cores (the 15-minute average, 5.23, is higher still), which is not normal;
- LA shows 7.15, 5.24, 1.18 - the load is increasing, but over the last 15 minutes it averaged only 1.18, i.e. 118% load, or roughly 1 of the 2 cores, which is within normal limits (a peak lasting up to, say, 30 minutes is acceptable).
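The arithmetic behind these two cases can be sketched in a few lines. The helper names below (`single_core_percent`, `per_core_percent`, `exceeds_capacity`) are hypothetical; they only encode the rule that 1 unit of LA equals 100% of one core.

```python
def single_core_percent(la: float) -> float:
    """Load expressed as a percentage of ONE core (1.0 LA = 100%)."""
    return la * 100.0


def per_core_percent(la: float, cores: int) -> float:
    """Load expressed as a percentage of the machine's total CPU capacity."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return la / cores * 100.0


def exceeds_capacity(la: float, cores: int) -> bool:
    """True when there is more runnable work than cores to run it."""
    return la > cores


# The two examples from the list above, on a 2-core VPS.
CORES = 2
for label, la in [("5-minute average, first case", 4.22),
                  ("15-minute average, second case", 1.18)]:
    print(
        f"{label}: LA {la} = {single_core_percent(la):.0f}% of one core, "
        f"{per_core_percent(la, CORES):.0f}% of {CORES} cores, "
        f"over capacity: {exceeds_capacity(la, CORES)}"
    )
```

On a live system the core count could be taken from `os.cpu_count()` instead of being hard-coded.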
With three values at your disposal, you can analyze the state of the system and evaluate its performance. If all three values are 0, the system is idle. If the 1-minute value is higher than the 5- and 15-minute values, the load is growing; if it is lower, the load is decreasing.
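This reading of the three values can be written down as a small classifier. It is only a sketch: the function name `load_trend` and the `tolerance` threshold are assumptions, and for simplicity the trend is decided by comparing the 1-minute value with the 15-minute value.

```python
def load_trend(one: float, five: float, fifteen: float,
               tolerance: float = 0.05) -> str:
    """Classify the load trend from the 1-, 5- and 15-minute averages.

    `tolerance` (an assumed value) keeps tiny differences from being
    reported as a trend.
    """
    if max(one, five, fifteen) == 0:
        return "idle"
    if one > fifteen + tolerance:
        return "rising"
    if one < fifteen - tolerance:
        return "falling"
    return "steady"


# The example triples used earlier in the article.
for triple in [(3.21, 4.22, 5.23), (7.15, 5.24, 1.18), (0.0, 0.0, 0.0)]:
    print(triple, "->", load_trend(*triple))
```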
Keep in mind that the system may experience frequent load spikes when many users connect at once. For this reason, alongside these commands you should use monitoring tools such as Zabbix, Nagios or Monit, which record CPU and memory activity over the long term.
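To illustrate what such tools do at the most basic level, here is a minimal sampler built on Python's standard `os.getloadavg()`, which is available on Unix-like systems. It is a sketch for experimentation, not a substitute for a real monitoring system; the `sample_load` name and its parameters are assumptions.

```python
import os
import time
from datetime import datetime, timezone


def sample_load(count: int, interval_s: float) -> list:
    """Collect `count` load-average readings, `interval_s` seconds apart.

    Each reading is a tuple: (UTC timestamp, 1-min, 5-min, 15-min average).
    """
    samples = []
    for i in range(count):
        one, five, fifteen = os.getloadavg()  # raises OSError if unavailable
        samples.append((datetime.now(timezone.utc), one, five, fifteen))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Take three readings one second apart and print them.
    for ts, one, five, fifteen in sample_load(count=3, interval_s=1.0):
        print(f"{ts.isoformat()} {one:.2f} {five:.2f} {fifteen:.2f}")
```

Stored over days or weeks, readings like these make it possible to tell a short spike from a sustained rise.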
For a hosting provider, monitoring LA is extremely important. How the provider responds to a load increase depends on its cause. For example, if the load grows beyond the number of cores and stays there for a long time, the queue of processes waiting to run keeps growing. With KVM or OpenVZ virtualization, this load also weighs on the physical server.
When a user runs a backup or exports products from 1C, the resulting bursts of load are not a serious cause for concern for the provider. But if the LA on the physical server significantly exceeds the norm for a long time, the provider often has to take action, because high LA negatively affects all customers whose projects are hosted on that physical server.
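The difference between a harmless burst and a sustained overload can be expressed as a simple check over a series of readings. The sketch below is one possible rule, not a provider's actual policy: the function name `sustained_overload`, the `min_consecutive` threshold and the sample data are all assumptions.

```python
from typing import Iterable


def sustained_overload(samples: Iterable[float], cores: int,
                       min_consecutive: int) -> bool:
    """Return True if LA stayed above `cores` for at least
    `min_consecutive` readings in a row."""
    streak = 0
    for la in samples:
        # Extend the streak while over capacity, reset it otherwise.
        streak = streak + 1 if la > cores else 0
        if streak >= min_consecutive:
            return True
    return False


# Hypothetical 1-minute readings taken once a minute on a 2-core server.
backup_burst = [0.8, 3.5, 4.1, 1.2, 0.9, 0.7]
long_overload = [2.5, 3.0, 3.4, 3.1, 2.9, 3.6]

print(sustained_overload(backup_burst, cores=2, min_consecutive=5))
print(sustained_overload(long_overload, cores=2, min_consecutive=5))
```

With these made-up series, only the second one trips the check: the burst exceeds two cores for just two readings, while the overload does so for all six.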