Next Previous Table of Contents
In order to be able to implement the scheduling policies described above, we must keep track of how long a process has run to be able to do a fair selectioning between the processes waiting to be processed. And if a process has used up its credit to run, we must signal this to system so another process can be choosen to run.
In Linux this time accounting is done using bottom halfs, a concept unique to Linux(?).
There are often times in a kernel when you do not want to do work at this moment. A good example of this is during interrupt processing. When the interrupt was asserted, the processor stopped what it was doing and the operating system delivered the interrupt to the appropriate device driver. Device drivers should not spend too much time handling interrupts as, during this time, nothing else in the system can run. There is often some work that could just as well be done later on. Linux's bottom half handlers were invented so that device drivers and other parts of the Linux kernel could queue work to
Whenever a device driver, or some other part of the kernel, needs to schedule work to be done later, it adds work to the appropriate system queue, for example the timer queue, and then signals the kernel that some bottom half handling needs to be done. It does this by setting the appropriate bit in bh_active. Bit 8 is set if the driver has queued something on the immediate queue and wishes the immediate bottom half handler to run and process it. The bh_active bitmask is checked at the end of each system call, just before control is returned to the calling process. If it has any bits set, the bottom half handler routines that are active are called. Bit 0 is checked first, then 1 and so on until bit 31.
The bit in bh_active is cleared as each bottom half handling routine is called. bh_active is transient; it only has meaning between calls to the scheduler and is a way of not calling bottom half handling routines when there is no work for them to do.
Very early in the boot process when the system gets setup (paging, traps and IRQ get intialized) the scheduler too gets initialized (see sched_init() in init/main.c). It's here where the infrastructure for time accouting is set up by setting a function pointer to the time accouting code which is run whenever the bottom halfs are processed, ergo every clock tick.
From kernel/sched.c:
1160 void __init sched_init(void) 1161 { .. 1174 init_bh(TIMER_BH, timer_bh); 1175 init_bh(TQUEUE_BH, tqueue_bh); 1176 init_bh(IMMEDIATE_BH, immediate_bh); 1177 1178 /* 1179 * The boot idle thread does lazy MMU switching as well: 1180 */ 1181 atomic_inc(&init_mm.mm_count); 1182 enter_lazy_tlb(&init_mm, current, cpu);
update_timers is the interesting funciton call here. (run_old_timers and immediate_bh will trigger the timer task queue and the immediate task queue to be run.)
From kernel/timer.c:
664 void timer_bh(void) 665 { 666 update_times(); 667 run_old_timers(); 668 run_timer_list(); 669 }
update_times() calls update_process_times(ticks, system) where the updating of the time left for a process is done. Therefore the counter variable gets decreased by ticks, a magical value (at least in my eyes) that describes the time past between the last call. It's important to see, that if the currently running process has used up its credit (counter<0) a flag is set (need_resched=1) that will force the scheduler to reschedule as soon as possible by selecting a process to run.
From kernel/timer.c:
563 static void update_process_times(unsigned long ticks, unsigned long system) 564 { 565 /* 566 * SMP does this on a per-CPU basis elsewhere 567 */ 568 #ifndef CONFIG_SMP 569 struct task_struct * p = current; 570 unsigned long user = ticks - system; 571 if (p->pid) { 572 p->counter -= ticks; 573 if (p->counter <= 0) { 574 p->counter = 0; 575 p->need_resched = 1; 576 }
Doing some statistics, again.
577 if (p->priority < DEF_PRIORITY) 578 kstat.cpu_nice += user; 579 else 580 kstat.cpu_user += user; 581 kstat.cpu_system += system; 582 } 583 update_one_process(p, ticks, user, system, 0); 584 #endif 585 }
Next Previous Table of Contents