Linux Scheduler profiling


This project designed various benchmarks to profile the overheads, latencies, run time and behavior of different schedulers on a Linux machine.

The Linux process scheduler is an important part of the kernel in a Linux operating system. A scheduler must fairly allocate processor time among all the processes. We focus on three scheduling policies in particular: SCHED_FIFO, SCHED_RR, and SCHED_OTHER.

A.SCHED_FIFO (First-In-First-Out)

This is also known as a real-time policy, in which ready processes of equal priority execute in FIFO order. The SCHED_FIFO scheduling class is a longstanding, POSIX-specified real-time feature. Processes in this class are given the CPU for as long as they want it, subject to the needs of higher-priority real-time processes. If there are two SCHED_FIFO processes with the same priority contending for the CPU, the process which is currently running will continue to do so until it decides to give the processor up. SCHED_FIFO is thus useful for real-time applications where one wants to know, with great assurance, that the highest-priority process on the system will have full access to the processor for as long as it needs it.
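As a minimal sketch of how a process could ask for this policy from Python, assuming Linux and the standard `os.sched_setscheduler` call: the helper name `try_set_fifo` and the priority value 10 are illustrative choices, not values taken from the experiments.

```python
import os


def try_set_fifo(priority=10, pid=0):
    """Try to move `pid` (0 means the calling process) into SCHED_FIFO.

    Returns True on success and False when the kernel refuses, which is
    the usual outcome without root or CAP_SYS_NICE.
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError as exc:
        # EPERM is the typical error; sandboxes may report other errno values.
        print(f"could not switch to SCHED_FIFO: {exc}")
        return False
    return True
```

Once the call succeeds, the process keeps the CPU until it blocks, yields, or a higher-priority real-time task becomes runnable, so a busy loop started this way can starve ordinary tasks on that core.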

B.SCHED_RR (Round-Robin)

Tasks running under SCHED_RR are also real-time, but each one runs only for a time slice and then gives up the CPU if another real-time task of the same priority is waiting in the run queue. CPU time is thus shared among all SCHED_RR tasks of equal priority. While a real-time task is runnable on a CPU, no SCHED_OTHER task is permitted to run on that CPU. Each real-time task has an rt_priority, which the scheduler uses to decide which real-time task runs next; it plays the same role for the real-time classes that the basic priority field plays for the SCHED_OTHER class. Under the SCHED_RR policy, threads of equal priority are scheduled in round-robin fashion. In what we have seen, SCHED_FIFO (First-In-First-Out) is generally preferred over SCHED_RR (Round-Robin).

SCHED_FIFO and SCHED_RR threads run until one of the following events happens:
1. The thread goes to sleep or begins waiting for a particular event.
2. A higher-priority real-time thread becomes ready to run.
A SCHED_RR thread additionally gives up the CPU when its time slice expires and another thread of the same priority is ready. If none of these events happens, the thread executes indefinitely on that processor, and lower-priority threads are not given a chance to run. This can result in system service threads failing to run, and operations such as memory swapping and file system data flushing not occurring as expected.
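The difference between the two real-time policies for tasks of equal priority can be illustrated with a toy model. This is a simplified sketch rather than kernel code: the function `simulate`, the task names and the abstract "work units" are all hypothetical, and sleeping, preemption by higher priorities and multiple CPUs are ignored.

```python
from collections import deque


def simulate(tasks, quantum=None):
    """Return the sequence of (task, units_run) CPU grants.

    tasks:   list of (name, work_units) pairs, all at the same priority.
    quantum: None models SCHED_FIFO (run until finished); an integer
             models SCHED_RR with that many units per time slice.
    """
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = remaining if quantum is None else min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Time slice expired: the task goes to the back of the queue.
            queue.append((name, remaining - run))
    return timeline


print(simulate([("A", 3), ("B", 2)]))             # FIFO: A finishes before B starts
print(simulate([("A", 3), ("B", 2)], quantum=1))  # RR: A and B alternate
```

Under the FIFO setting task B waits for all of A's work, while under the round-robin setting both tasks make progress in turns, at the cost of more switches.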


C.SCHED_OTHER

SCHED_OTHER is the common round-robin time-sharing scheduling policy that schedules a job for a certain time slice depending on the other jobs running in the system. SCHED_OTHER, also called SCHED_NORMAL, is the default scheduling policy for Linux threads. It has a dynamic priority that the system adjusts based on the characteristics of the thread. Another thing that affects the priority of SCHED_OTHER threads is their nice value. The nice value is an integer ranging from -20 (highest priority) to 19 (lowest priority). By default, SCHED_OTHER threads have a nice value of 0. Adjusting the nice value alters how much CPU time the thread receives relative to other threads.
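A short sketch of how the policy and nice value of the current process could be inspected and adjusted from Python, assuming Linux and the standard `os` module; the helper names `describe_current_task` and `lower_priority` are illustrative.

```python
import os


def describe_current_task():
    """Return (policy, nice) for the calling process."""
    policy = os.sched_getscheduler(0)           # e.g. os.SCHED_OTHER
    nice = os.getpriority(os.PRIO_PROCESS, 0)   # -20 (highest) .. 19 (lowest)
    return policy, nice


def lower_priority(step=1):
    """Raise the nice value by `step` and return the new value.

    Making a process "nicer" needs no privileges; lowering the nice
    value again normally requires root or CAP_SYS_NICE.
    """
    return os.nice(step)
```

Calling `lower_priority()` on a CPU-bound process gives it a smaller share of the CPU when it competes with other SCHED_OTHER tasks, without affecting real-time tasks at all.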




A.Ftrace

Ftrace is an internal tracer designed to help developers and system designers find out what is going on inside the kernel. It can be used for debugging or for analyzing latencies and performance issues that take place outside of user space. Although ftrace is typically considered the function tracer, it is really a framework of several assorted tracing utilities. There is latency tracing to examine what occurs between interrupts being disabled and enabled, as well as for preemption, and from the time a task is woken to the time it is actually scheduled in. One of the most common uses of ftrace is event tracing. Throughout the kernel are hundreds of static event points that can be enabled via the tracefs file system to see what is going on in certain parts of the kernel. Ftrace uses the tracefs file system to hold the control files as well as the files that display output.
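Enabling one of these static events comes down to writing to a control file. The sketch below assumes tracefs is mounted at `/sys/kernel/tracing` (older systems expose the same files under `/sys/kernel/debug/tracing`); the helper names are hypothetical, and the `root` parameter exists only so the code can be pointed at a scratch directory instead of the real tracefs.

```python
from pathlib import Path

# Common tracefs mount point (an assumption about the target system).
TRACEFS = Path("/sys/kernel/tracing")


def event_enable_file(subsystem, event, root=TRACEFS):
    """Path of the control file that switches one static event on or off."""
    return Path(root) / "events" / subsystem / event / "enable"


def set_event(subsystem, event, enabled, root=TRACEFS):
    """Write "1" or "0" to the event's enable file (needs root on a real system)."""
    event_enable_file(subsystem, event, root).write_text("1" if enabled else "0")


def read_trace(root=TRACEFS, max_lines=20):
    """Return the first lines of the human-readable trace output file."""
    with open(Path(root) / "trace") as trace:
        return [line.rstrip("\n") for _, line in zip(range(max_lines), trace)]
```

For scheduler work, `set_event("sched", "sched_switch", True)` followed by `read_trace()` would show context switches as they are recorded, provided the caller has permission to write to tracefs.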


B.trace-cmd

trace-cmd is a command that interacts with the Ftrace tracer built into the Linux kernel. It interfaces with the Ftrace-specific files found in the debugfs file system (tracefs on newer kernels) under the tracing directory. A COMMAND must be specified to tell trace-cmd what to do; the commands used here include record and report.
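A sketch of how the record and report commands might be assembled from a Python driver script. The flags `-e` (event), `-o` (output file) and `-i` (input file) are standard trace-cmd options; the chosen events, the file name `trace.dat` and the helper names are assumptions made for illustration.

```python
import subprocess


def record_cmd(workload, events=("sched_switch", "sched_wakeup"), output="trace.dat"):
    """Build `trace-cmd record` for the given events around a workload command."""
    cmd = ["trace-cmd", "record", "-o", output]
    for event in events:
        cmd += ["-e", event]
    return cmd + list(workload)


def report_cmd(trace_file="trace.dat"):
    """Build `trace-cmd report` to print a recorded trace as text."""
    return ["trace-cmd", "report", "-i", trace_file]


def run(cmd):
    """Execute a command built above; recording normally requires root."""
    return subprocess.run(cmd, check=True)
```

For example, `run(record_cmd(["sleep", "1"]))` followed by `run(report_cmd())` would record one second of scheduler events and then print them.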

C.Kernel Shark

Kernel Shark is a front-end reader of trace-cmd output. It reads a trace-cmd.dat formatted file and presents the data as a graph and a tabulated view.


D.Perf sched

Perf sched uses a dump-and-post-process approach for analyzing scheduler events, which can be a problem because these events can be very frequent (millions per second), costing CPU, memory, and disk overhead to record. Other scheduler analysis tools greatly reduce this overhead by using in-kernel summaries. But there are cases where you might want to capture every event using perf sched instead, in spite of the higher overhead. Imagine having about five minutes to examine a bad cloud instance before it is auto-terminated, and you want to capture everything for later analysis.


A.First Milestone

As part of Experiment 1, we designed a Python program that maximizes the load on all CPU cores of the system, driving CPU utilization up to 100%. This program was run under each of the three policies: SCHED_FIFO, SCHED_RR and SCHED_OTHER. With the help of the time command in Linux, the real, user and system execution times were extracted. At the same time, perf was used to obtain statistics such as the number of context switches, the average preemption delay and the maximum preemption delay.
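The original program is not reproduced here; the following is only a sketch of what such a CPU load generator could look like, assuming a pool with one busy-looping worker per core. The function names `burn` and `load_all_cores` are hypothetical.

```python
import multiprocessing
import os
import time


def burn(seconds):
    """Spin in a tight loop for `seconds`; return the iterations completed."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def load_all_cores(seconds):
    """Run one `burn` worker per CPU core and return each worker's count."""
    workers = os.cpu_count() or 1
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

A script would call `load_all_cores(...)` under an `if __name__ == "__main__":` guard. It could then be started under a given policy with a wrapper such as `chrt -f 10` (SCHED_FIFO, priority 10) and measured with `time` and `perf sched record`; the exact invocation used in the experiments is not stated in the text.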

B.Second Milestone

In milestone 1, we stressed only the CPU, using a pool of four processes. In the second milestone, we designed Experiment 2, which stresses not only the CPU but also memory and I/O. To load memory, we used a third-party tool called “stress”, which starts one worker that continuously calls mmap/munmap and writes up to 1 GB of memory. To increase I/O utilization, we took an 8 MB file and performed read and write operations on that same file.

One more difference between the two experiments is that Experiment 1 pools four processes, whereas Experiment 2 applies the I/O and memory load from threads. This benchmark therefore consists of both thread and process scheduling, which was observed under the three schedulers, and the same set of results was collected to analyze their behavior.
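A sketch of how the I/O part of such a benchmark might be written, with the read/write loop running in a thread. The function names, the number of rounds and the exact `stress` command line are assumptions; `--vm` and `--vm-bytes` are the stress options for memory workers and their allocation size.

```python
import threading

# Memory load as described in the text: one worker cycling through 1 GB.
# The exact command line used in the experiments is an assumption.
MEMORY_STRESS_CMD = ["stress", "--vm", "1", "--vm-bytes", "1G"]


def io_worker(path, size_bytes, rounds, results):
    """Repeatedly write and then read back one file; record the bytes read."""
    payload = b"x" * size_bytes
    total_read = 0
    for _ in range(rounds):
        with open(path, "wb") as out:
            out.write(payload)
        with open(path, "rb") as src:
            total_read += len(src.read())
    results.append(total_read)


def run_io_load(path, size_bytes=8 * 1024 * 1024, rounds=10):
    """Run the I/O loop in a thread and return the total number of bytes read."""
    results = []
    worker = threading.Thread(target=io_worker, args=(path, size_bytes, rounds, results))
    worker.start()
    worker.join()
    return results[0]
```

The memory load would be started alongside it, for instance with `subprocess.Popen(MEMORY_STRESS_CMD)`, so that the scheduler has to handle a process and a thread competing at the same time.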

C.Third Milestone

In this milestone (Experiment 3), we considered two real-time applications to get a clear distinction in results: the Chromium browser and VLC. We also ran the program written for Experiment 2 alongside them, to analyze how each of the three schedulers schedules a normal task together with Chromium and VLC.

D.Fourth Milestone

As part of an advanced analysis, we used a tool called stress-ng to fork child processes continuously for 50 seconds under each scheduler. We did this to check how the three schedulers handle forks. Here, each forked child process exits immediately. In this milestone, we took into consideration metrics such as sched_wakeup and sched_migrate_task events, CPU cycles, and CPU migrations.
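One way such a run could be assembled is sketched below: `perf stat` counts the events, `chrt` selects the policy, and stress-ng does the forking. The flags shown are standard options of those tools, but the worker count, the real-time priority of 10 and the helper name `fork_benchmark_cmd` are assumptions rather than the values used in the experiments.

```python
# chrt flags that select each scheduling policy.
CHRT_FLAGS = {"SCHED_FIFO": "-f", "SCHED_RR": "-r", "SCHED_OTHER": "-o"}

# Counters matching the metrics named in the text.
EVENTS = "sched:sched_wakeup,sched:sched_migrate_task,cycles,cpu-migrations"


def fork_benchmark_cmd(policy, rt_priority=10, workers=4, seconds=50):
    """Build a perf-stat-wrapped stress-ng fork test under one policy."""
    # SCHED_OTHER only accepts a static priority of 0.
    priority = 0 if policy == "SCHED_OTHER" else rt_priority
    return [
        "perf", "stat", "-e", EVENTS, "--",
        "chrt", CHRT_FLAGS[policy], str(priority),
        "stress-ng", "--fork", str(workers), "--timeout", f"{seconds}s",
    ]
```

Running the resulting command for each of the three policy names, for example with `subprocess.run(fork_benchmark_cmd("SCHED_RR"), check=True)` as root, would give one set of counters per scheduler to compare.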


Metrics of Execution times:

Real – The wall-clock time elapsed, including time slices used by other processes and time the process spends blocked.
User – CPU time spent executing the process's own code in user mode.
Sys – CPU time spent in the kernel on behalf of the process.
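The three times can also be collected programmatically. The sketch below mirrors what the `time` command reports, using `os.times()` to read the CPU time charged to finished child processes; `time_command` and the small demo workload are hypothetical names chosen for illustration.

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in workload: a short CPU-bound Python one-liner.
DEMO_COMMAND = [sys.executable, "-c", "sum(range(10**6))"]


def time_command(argv):
    """Run `argv` and return its real, user and sys times in seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    return {
        "real": real,
        # CPU time of waited-for children, split into user and kernel mode.
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }
```

On a multi-core workload, user plus sys can exceed real, because CPU time is summed over all cores while real is wall-clock time.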

Metrics of preemption overhead:

Switches – Number of context switches that occurred.
Average delay – Average time the process spends preempted, waiting to run again.
Maximum delay – The maximum time the process spends preempted.
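To make the three definitions concrete, here is a small worked sketch that derives them from a list of waiting times. The sample values are made up for illustration and are not measurements from the experiments; `preemption_stats` is a hypothetical helper.

```python
def preemption_stats(waits_ms):
    """Summarize scheduling delays.

    waits_ms holds one entry per switch back onto the CPU: the time in
    milliseconds the task spent runnable but waiting to run again.
    """
    if not waits_ms:
        return {"switches": 0, "avg_delay_ms": 0.0, "max_delay_ms": 0.0}
    return {
        "switches": len(waits_ms),
        "avg_delay_ms": sum(waits_ms) / len(waits_ms),
        "max_delay_ms": max(waits_ms),
    }


# Hypothetical sample: three waits of 0.5 ms, 1.5 ms and 4.0 ms.
print(preemption_stats([0.5, 1.5, 4.0]))
```

With these sample waits the task was switched in three times, waited 2.0 ms on average and 4.0 ms in the worst case, which shows how a low switch count can still go together with a high maximum delay.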


From the results obtained in milestone 1, execution times are nearly the same across all the schedulers, though RR and OTHER show more context switches. These switches result from their time-slicing policies. Overall, this experiment does not show much difference between the schedulers.

But from the results obtained in milestones 2 and 3, it is clearly seen that SCHED_OTHER, implemented by the Completely Fair Scheduler (CFS), performs best of the three, and FIFO is somewhat slower. Although FIFO has fewer switches, its average and maximum preemption delays are higher. RR falls between OTHER and FIFO. Of the three schedulers profiled, OTHER uses time slicing and also picks the task with the lowest virtual runtime to run next, which proved to be the best-performing policy in these experiments. FIFO and RR schedule by static real-time priority, whereas OTHER does not; it relies on dynamic priority and nice values instead.
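The "lowest virtual runtime runs next" rule can be illustrated with a toy model. This is a heavily simplified sketch of the idea, not the kernel's implementation: it assumes fixed-length slices, tasks that never sleep, a single CPU, and load weights where 1024 corresponds to nice 0. The function name and task names are hypothetical.

```python
import heapq

NICE_0_WEIGHT = 1024  # load weight of a nice-0 task in this model


def simulate_cfs(weights, slices, slice_ms=1.0):
    """Return the CPU time in ms each task receives.

    weights: dict mapping task name to load weight (higher = more CPU).
    slices:  number of scheduling decisions to simulate.
    """
    # Min-heap keyed on virtual runtime; every task starts at 0.
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    runtime = {name: 0.0 for name in weights}
    for _ in range(slices):
        vruntime, name = heapq.heappop(heap)  # lowest vruntime runs next
        runtime[name] += slice_ms
        # Virtual time advances more slowly for heavier tasks.
        vruntime += slice_ms * NICE_0_WEIGHT / weights[name]
        heapq.heappush(heap, (vruntime, name))
    return runtime


print(simulate_cfs({"a": 1024, "b": 512}, slices=300))
```

In this model a task with twice the weight ends up with twice the CPU time, and no runnable task is starved, which is the behavior that distinguishes SCHED_OTHER from the strict-priority real-time policies.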


We have successfully designed the benchmarks and profiled all three schedulers with the help of tools such as trace-cmd, Kernel Shark, perf and stress-ng, which let us draw inferences about execution time and preemption latencies. The results are tabulated for the three schedulers.