
4. Time Accounting

In order to implement the scheduling policies described above, we must keep track of how long a process has run, so that we can make a fair selection among the processes waiting to be run. And if a process has used up its credit to run, we must signal this to the system so that another process can be chosen to run.

In Linux this time accounting is done using bottom halves, a concept that is (as far as I know) unique to Linux.

4.1 What are bottom halves?

There are often times in a kernel when you do not want to do work right away. A good example of this is during interrupt processing. When the interrupt was asserted, the processor stopped what it was doing and the operating system delivered the interrupt to the appropriate device driver. Device drivers should not spend too much time handling interrupts as, during this time, nothing else in the system can run. There is often some work that could just as well be done later on. Linux's bottom half handlers were invented so that device drivers and other parts of the Linux kernel could queue work to be done later.

Whenever a device driver, or some other part of the kernel, needs to schedule work to be done later, it adds work to the appropriate system queue, for example the timer queue, and then signals the kernel that some bottom half handling needs to be done. It does this by setting the appropriate bit in bh_active. Bit 8 is set if the driver has queued something on the immediate queue and wishes the immediate bottom half handler to run and process it. The bh_active bitmask is checked at the end of each system call, just before control is returned to the calling process. If it has any bits set, the bottom half handler routines that are active are called. Bit 0 is checked first, then 1 and so on until bit 31.

The bit in bh_active is cleared as each bottom half handling routine is called. bh_active is transient; it only has meaning between calls to the scheduler and is a way of not calling bottom half handling routines when there is no work for them to do.

4.2 Time Accounting using bottom halves

Very early in the boot process, when the system is set up (paging, traps and IRQs get initialized), the scheduler is initialized as well (see sched_init() in init/main.c). It is here that the infrastructure for time accounting is set up, by registering a function pointer to the time accounting code, which is run whenever the bottom halves are processed, i.e. on every clock tick.

From kernel/sched.c:

1160 void __init sched_init(void) 
1161 { 

..

1174         init_bh(TIMER_BH, timer_bh); 
1175         init_bh(TQUEUE_BH, tqueue_bh); 
1176         init_bh(IMMEDIATE_BH, immediate_bh); 
1177 
1178         /* 
1179          * The boot idle thread does lazy MMU switching as well: 
1180          */ 
1181         atomic_inc(&init_mm.mm_count); 
1182         enter_lazy_tlb(&init_mm, current, cpu); 

timer_bh is the interesting function here: it is registered as the handler for TIMER_BH and will therefore run on every clock tick. (tqueue_bh and immediate_bh will cause the timer task queue and the immediate task queue to be run.)

From kernel/timer.c:

664 void timer_bh(void) 
665 { 
666         update_times(); 
667         run_old_timers(); 
668         run_timer_list(); 
669 } 

update_times() calls update_process_times(ticks, system), where the time left for the current process is updated. The counter variable is decreased by ticks, the number of clock ticks that have passed since the last update (usually one). It is important to see that if the currently running process has used up its credit (counter reaches 0), a flag is set (need_resched = 1) that will force the scheduler to reschedule as soon as possible by selecting a new process to run.

From kernel/timer.c:

563 static void update_process_times(unsigned long ticks, unsigned long system)
564 {
565 /*
566  * SMP does this on a per-CPU basis elsewhere
567  */
568 #ifndef  CONFIG_SMP
569         struct task_struct * p = current;
570         unsigned long user = ticks - system;
571         if (p->pid) {
572                 p->counter -= ticks;
573                 if (p->counter <= 0) {
574                         p->counter = 0;
575                         p->need_resched = 1;
576                 }

Next, some usage statistics are gathered:

577                 if (p->priority < DEF_PRIORITY)
578                         kstat.cpu_nice += user;
579                 else
580                         kstat.cpu_user += user;
581                 kstat.cpu_system += system;
582         }
583         update_one_process(p, ticks, user, system, 0);
584 #endif
585 }
