CSUA Software Projects

Lottery Scheduling Kernel


Authors: David Petrou and John Milford
Status: Was used on Soda Mark V (FreeBSD 2.2.x), but hasn't been ported to Soda Mark VI (FreeBSD 4.x).

The upgrade to Soda Mark V allowed the CSUA to once again have source for the operating system kernel, and to customize its operation. David and John have taken the FreeBSD kernel source and implemented a radically different process scheduling algorithm that is better suited to the CSUA environment. This new kernel is being stress tested on Soda, with up to (and sometimes even a bit more than) 150 simultaneous login sessions. If you want to know if the lottery scheduler is running on soda at any given time, login and run /sbin/dmesg and look for the following line:

initializing lottery scheduler 0.x, <dpetrou@cs.berkeley.edu>

This work is based on Waldspurger's lottery scheduling algorithm, which implements proportional-share resource management. The primary advantages are that users have strict control over the relative execution rates of their processes, and users are load-insulated from each other, preventing one user from dominating the CPU. See below for more information on lottery scheduling algorithm and its applications.

David & John have written a paper on their design, evaluation, and experience of their system in comparison with decay-usage schedulers, emphasizing solutions we have developed for several problems encountered while implementing lottery scheduling for a real system. Infrequently, they will need to switch soda back and forth between the standard FreeBSD scheduler and the lottery scheduler so that they can measure both systems. They will try to be as unobtrusive as possible, and we hope they have your patience, as one of their goals is to make soda a more usable system for all of us.

They are also soliciting user feedback. If, after using the system, you have an opinion one way or the other, or if you have done some hard experiments, they want to hear from you.


The Gory Details of Lottery Scheduling

by David Petrou

Here is a brief introduction to our implementation of lottery scheduling. Processes are assigned a number of tickets, 10 tickets by default, ranging from 1 to 100, in a user currency. Users can inflate or deflate their currency by adding or removing tickets, or running more or less processes. Users can adjust the relative execution rates of their own programs by adjusting ticket distributions. A CPU-bound process with 10 tickets will get the processor twice as often as a CPU-bound process with 5 tickets.

To keep users isolated from each other, user tickets are converted into base tickets via an exchange rate before a scheduling decision is made. Users by default have 1000 base tickets (ranging from 100 to 10000) that fund their user currency. So, even if one user has 10000 user tickets in running processes, the tickets get scaled down to the number of base tickets that fund them. Root can adjust the number of base tickets that fund users to adjust the relative execution rate of users. We also mapped the 'nice' semantics to lottery scheduling, so 'nice' works as expected.

There are userlevel programs available for tweaking scheduling parameters. Right now they are in /csua/bin. As time permits, we will write man pages for these programs and the new system calls, and install these programs in the appropriate place. Here are their descriptions:

set_tickets -ppid -ttickets
Sets the number of tickets held by pid to tickets. The user must own pid unless the user is root. tickets is in user tickets (as opposed to base tickets).
run_tickets tickets prog [arg1 ...]
Executes prog with optional argN's using tickets represented in user tickets.
show_tickets [-u] [-p] [-x]
Shows the number of tickets held by pid. The pid must be owned by the user, unless the user is root. A pid of -1 shows the number of tickets held by all of the user's processes, the default. Root can view information on other users by specifying the uid, or -1 for all users. Use -x to get additional information.
set_funding [-u] -t
Root sets the number of base tickets that funds uid's processes.
show_funding [-u]
Shows the number of base tickets that fund uid. If uid is -1, all users are shown. Only root can view another uid.

We encourage experimentation and would appreciate it if you let us know if you experience behavior out of the ordinary. These are some of the problems you should watch out for: bad interactive response, slow execution of CPU-bound jobs, runnable processes that starve (that are not being scheduled). We have not noticed any of these, but these are some of the problems that may occur when a new scheduler gets heavy use. If we notice poor performance, we should not be overly concerned (we still want to hear about it, though!), because there are a few important things that are left for us to optimize. We are more concerned about process starvation, or a weird synchronization problem that causes the machine to crash.

Want to know more? Read:

David Petrou and John Milford. Proportional-Share Scheduling: Implementation and Evaluation in a Widely-Deployed Operating System, December 1997. http://www.cs.cmu.edu/~dpetrou/papers/freebsd_lottery_writeup98.ps, source code: http://www.cs.cmu.edu/~dpetrou/code/freebsd_lottery_code.tar.gz
Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management, Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI '94), pages 1-11, Monterey, California, November 1994. http://www.research.digital.com/SRC/personal/caw/papers.html [Received award for best paper.]

Send feedback and questions to dpetrou@cs.cmu.edu.


[CSUA] Last Modified:
www@CSUA.Berkeley.EDU