Linux Performance Monitoring and Tuning Introduction
Linux system administrators should be proficient in Linux performance monitoring and tuning. This article gives a high-level overview of how to approach performance monitoring and tuning in Linux, and of the subsystems that should be monitored.
To identify system bottlenecks and come up with solutions to fix them, you should understand how the various components of Linux work. For instance, how the kernel gives preference to one Linux process over others using nice values, how I/O interrupts are handled, how memory management works, how the Linux file system works, how the network layer is implemented in Linux, and so on.
At a high level, the following four subsystems should be monitored.
- CPU
- Memory
- I/O
- Network
1. CPU (Central Processing Unit)
A CPU is an electronic circuit that executes computer programs. The miniaturization and standardization of CPUs have extended their presence far beyond dedicated computing machines. Modern microprocessors appear in everything from automobiles to cell phones.
You should understand the four critical performance metrics for the CPU: context switches, run queue, CPU utilization, and load average.
Context Switch:
- A context switch is the procedure a computer's CPU follows to change from one task (or process) to another while ensuring that the tasks don't conflict.
- In a CPU, the term “context” refers to the data in the registers and program counter at a specific moment in time.
- When a process switch happens, the kernel saves the current state of the CPU (for a process or thread) in memory.
- A modern CPU can perform hundreds of context switches per second, so the user gets the impression that the computer is performing many tasks in parallel.
- The kernel also restores the previously saved state (of a process or thread) from memory and loads it into the CPU.
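The context switch rate described above can be measured by sampling a counter twice. The following is a minimal Python sketch, assuming a Linux host where /proc/stat exposes a cumulative `ctxt` line; the function names are illustrative and not part of any standard tool.

```python
import time


def parse_context_switches(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def context_switches_per_second(read_stat, interval=1.0):
    """Sample the counter twice, `interval` seconds apart, and return the rate.

    `read_stat` is any callable returning the text of /proc/stat.
    """
    first = parse_context_switches(read_stat())
    time.sleep(interval)
    second = parse_context_switches(read_stat())
    return (second - first) / interval


def read_proc_stat():
    """Read the live counters (assumes /proc is mounted)."""
    with open("/proc/stat") as f:
        return f.read()
```

On a Linux machine, `context_switches_per_second(read_proc_stat)` returns the system-wide rate over one second. Passing the reader in as a callable keeps the parsing logic testable without access to /proc.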
Run Queue:
- The run queue shows the total number of active processes currently queued for the CPU.
- When the CPU is ready to execute a process, it picks one from the run queue based on the priority of the process.
- Processes that are in the sleep state or the I/O wait state are not in the run queue.
- A large number of processes in the run queue can cause performance issues.
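As a rough illustration of the run queue points above, the sketch below reads the `procs_running` and `procs_blocked` counters that Linux exposes in /proc/stat. It assumes those lines are present; the helper `waiting_for_cpu` is a hypothetical name, and comparing against the CPU count is only a rule of thumb.

```python
def parse_run_queue(stat_text):
    """Extract runnable and blocked process counts from /proc/stat contents."""
    counts = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("procs_running", "procs_blocked"):
            counts[fields[0]] = int(fields[1])
    return counts


def waiting_for_cpu(procs_running, cpu_count):
    """Estimate how many runnable processes have no CPU to run on.

    procs_running includes processes that are executing right now, so only
    the excess over the number of CPUs is actually queued.
    """
    return max(0, procs_running - cpu_count)
```

For example, six runnable processes on a four-CPU machine leave about two waiting, whereas two runnable processes leave none.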
CPU Utilization:
CPU utilization is the percentage of time a CPU spends doing work. It is an important metric for measuring system performance, because it indicates how much of the CPU is currently in use. Most performance monitoring tools divide CPU utilization into the following categories:
- User Time – The percentage of time a CPU spends executing process threads in the user space.
- System Time – The percentage of time the CPU spends executing kernel threads and interrupts.
- Wait IO – The percentage of time a CPU spends idle because ALL process threads are blocked waiting for IO requests to complete.
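The categories above can be derived from the aggregate `cpu` line of /proc/stat, which holds cumulative tick counts. The sketch below is one possible way to do it, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); grouping nice with user time and irq/softirq with system time is a simplification chosen here, not a fixed convention.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_times(stat_text):
    """Return cumulative tick counts from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:len(FIELDS) + 1]]
            # Very old kernels report fewer columns; pad the rest with zero.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Turn two samples of cumulative ticks into percentages of elapsed time."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {"user": 0.0, "system": 0.0, "iowait": 0.0, "idle": 0.0}

    def pct(*names):
        return 100.0 * sum(delta[n] for n in names) / total

    # Steal time is counted in the total but not reported separately, so the
    # four values may sum to slightly less than 100 on virtual machines.
    return {
        "user": pct("user", "nice"),
        "system": pct("system", "irq", "softirq"),
        "iowait": pct("iowait"),
        "idle": pct("idle"),
    }
```

Two samples taken a second or more apart give percentages comparable to what tools such as top or mpstat display.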
Load Average:
The system load is a measure of the computational work the system is performing. It is shown as a number.
- The load average indicates the average CPU load over a specific time period.
- It is calculated by combining the total number of processes in the run queue and the total number of processes in the uninterruptible task state.
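A load average is easier to interpret relative to the number of CPUs. The following sketch uses Python's `os.getloadavg()` and `os.cpu_count()`; the normalisation and the helper names are illustrative assumptions, not an official formula.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the number of CPUs.

    A value near or above 1.0 suggests there is roughly as much runnable
    (or uninterruptible) work as the CPUs can handle.
    """
    return tuple(load / cpu_count for load in load_averages)


def current_load_per_cpu():
    """Normalise the live 1, 5 and 15 minute averages (Unix-like systems only)."""
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return load_per_cpu(os.getloadavg(), cpus)
```

For instance, a 1-minute load average of 4.0 on a four-CPU machine normalises to 1.0, while the same load on a single-CPU machine means about three tasks are waiting at any moment.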
2. Network
Computers are joined in a network to exchange information or share resources with one another. Two or more computers connected through network media form a computer network, and a number of network devices and media are involved in building one. The following network metrics should be monitored:
- Packets received and sent: the number of packets received and sent by a given network interface.
- Bytes received and sent: the number of bytes received and sent by a given network interface.
- Dropped packets: the number of packets that have been dropped by the kernel, either because of a firewall configuration or because of a lack of network buffers.
- A good understanding of TCP/IP concepts is helpful while analyzing any network issues.
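The per-interface counters listed above are available in /proc/net/dev on Linux. The sketch below parses that file, assuming the standard layout of 16 numeric columns per interface (8 receive, 8 transmit). The firewall drops mentioned above may not appear in these interface-level counters.

```python
def parse_net_dev(text):
    """Return packet, byte and drop counters per interface from /proc/net/dev."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the two header lines contain no colon
        fields = rest.split()
        if len(fields) < 16:
            continue
        values = [int(v) for v in fields[:16]]
        stats[name.strip()] = {
            "rx_bytes": values[0],
            "rx_packets": values[1],
            "rx_dropped": values[3],
            "tx_bytes": values[8],
            "tx_packets": values[9],
            "tx_dropped": values[11],
        }
    return stats


def read_net_dev():
    """Read the live counters (assumes /proc is mounted)."""
    with open("/proc/net/dev") as f:
        return parse_net_dev(f.read())
```

The counters are cumulative since boot, so a rate is obtained by taking two readings and dividing the difference by the elapsed time.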
3. I/O
- I/O wait is the amount of time the CPU spends waiting for I/O. If you see consistently high I/O wait on your system, it usually indicates a problem in the disk subsystem.
- You should also monitor reads per second and writes per second. These are measured in blocks, i.e. the number of blocks read or written per second. They are also referred to as bi and bo (blocks in and blocks out).
- tps indicates total transactions per second, which is the sum of rtps (read transactions per second) and wtps (write transactions per second).
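The relationship tps = rtps + wtps can be worked through with two samples of /proc/diskstats. The sketch below assumes the usual column order (major, minor, device name, reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, sectors written, ...); the device name `sda` in the usage note is only an example.

```python
def parse_diskstats(text, device):
    """Return cumulative read/write counters for one device."""
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10 and f[2] == device:
            return {
                "reads": int(f[3]),
                "sectors_read": int(f[5]),
                "writes": int(f[7]),
                "sectors_written": int(f[9]),
            }
    raise ValueError(f"device {device!r} not found")


def io_rates(before, after, interval):
    """Convert two cumulative samples into per-second rates."""
    rtps = (after["reads"] - before["reads"]) / interval
    wtps = (after["writes"] - before["writes"]) / interval
    return {
        "rtps": rtps,
        "wtps": wtps,
        "tps": rtps + wtps,  # total transactions = reads + writes
        "sectors_read_per_s": (after["sectors_read"] - before["sectors_read"]) / interval,
        "sectors_written_per_s": (after["sectors_written"] - before["sectors_written"]) / interval,
    }
```

If a hypothetical disk `sda` completes 60 reads and 40 writes in a 2-second window, this yields rtps = 30, wtps = 20 and tps = 50.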
4. Memory
Memory is the electronic holding place for instructions and data that your computer's microprocessor can reach quickly. When your computer is in normal operation, its memory usually contains the main parts of the operating system and some or all of the application programs and related data that are in use.
- The unused RAM will be used as file system cache by the kernel.
- The Linux system will swap when it needs more memory, i.e. when it needs more memory than the physical memory available.
- A lot of swapping can cause performance issues, as the disk is much slower than physical memory, and it takes time to swap memory pages from RAM to disk.
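The cache and swap behaviour described above can be observed in /proc/meminfo. The following sketch assumes the standard `Key: value kB` layout and the `MemFree`, `Cached`, `SwapTotal` and `SwapFree` keys; the summary fields are illustrative names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo contents into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Report free RAM, page cache size and swap usage."""
    swap_total = info["SwapTotal"]
    swap_used = swap_total - info["SwapFree"]
    return {
        "free_kb": info["MemFree"],
        "page_cache_kb": info["Cached"],  # otherwise unused RAM holding file data
        "swap_used_kb": swap_used,
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }
```

A large `page_cache_kb` alongside a small `free_kb` is normal, since the kernel uses spare RAM as cache. A steadily growing `swap_used_kb` is the more useful warning sign.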
As a rule of thumb, about 80% of the performance improvement comes from tuning the application, and the remaining 20% comes from tuning the infrastructure components. There are various tools available to monitor Linux system performance, for example: top, free, ps, iostat, vmstat, mpstat, sar, tcpdump, netstat, iozone, etc.