Process Scheduling
If there are several runnable jobs, the operating system has to decidewhich job to run next, a process known as Process Scheduling.
In the old days, when computers ran batch jobs, this was not an issue.The computer operator simply submitted the jobs in the order that theywere delivered to him or her, and each job ran to completion. We cancall this algorithm First come first served, or FIFO (first in firstout).
However, even this primitive system had problems. Suppose there arefive jobs waiting to be run. Four of the five jobs will take aboutten seconds each to run, and one will take ten minutes, but the tenminute job was submitted first. In a FIFO system, the four fast jobswill all be held up for a long time by a large job that happened to bedelivered first.
In a batch system, this was not serious since jobs were neverinteractive. However, if we knew ahead of time how long each jobwould take, we could maximize throughput. In fact, some computercenters in the '60s could actually do this. When a user submitted ajob, he or she was also asked to specify an estimated run time. Theoperator would then allow a job to run only for that amount of time(or perhaps a few seconds longer). This was important, because eventhen, programmers could write programs with infinite loops. If aprogram exceeded its estimated run time, the operator killed it.
This permitted the operator to run jobs using a shortest jobfirst algorithm. As the name implies, instead of running jobs inthe order that they are delivered, the operator would search throughall available jobs and run that job which had the shortest run time.This is provably the fastest job scheduling algorithm. A simpleexample will demonstrate this.
Suppose four jobs arrived at about the same, but not exactly the same,time. We happen to know exactly how long each job will take. Here arethe run times of the four jobs, in the order that they arrived.
Job One | 25 |
Job Two | 10 |
Job Three | 20 |
Job Four | 15 |
If the jobs are run with a FIFO algorithm, hereare the total times to completion for each of the four jobs (assuming that it takes notime to load a job).
Job One | 25 |
Job Two | 35 |
Job Three | 55 |
Job Four | 70 |
The average time to completion was 46.25 seconds ((25 + 35 + 55 + 70) / 4).
If the jobs are run shortest job first, (Job Two, Job Four, Job Three, Job One) here are the total times to completion of the four jobs.
Job One | 70 |
Job Two | 10 |
Job Three | 45 |
Job Four | 25 |
It still takes 70 seconds to run all four jobs, but theaverage time to completion for a job was 37.5 seconds ((70 + 10 + 45 + 25) / 4).
Aside:Asking users to estimate the run time of their jobs putthem in somewhat of a bind, because often they did not have an exact guess.If they submitted a high guess, their job would almost certainly run tocompletion, but it might be delayed by the operator. On the other hand, ifthey submitted a somewhat lower time estimate, the operator would start runningit sooner, but there was a greater chance that it would time out before completing.
Process scheduling on a modern multiprogramming operating system is far more complex.Recall our state diagram for the states of a process
The earlier discussion made reference to a kernel process called the scheduleror the dispatcher. This is a process which decides what process to run next.The scheduler has a number of competing demands on it.
- Efficiency
- Keeping the CPU and other computer resources as busy as possible. If the CPU is idle but thereare jobs to run, everything is slowed down
- Minimizing response time for interactive jobs
- Modern computers are interactive, there are users waiting for responsesto their commands. It is important to schedule those jobs first where theuser is waiting for a response.
- Maximizing throughput
- Throughput is the number of jobs completed per unit time. Clearlythis should be a high as possible
- Minimizing turnaround time
- The average time to complete jobs should be as low as possible.
- No starvation
- Even long running batch jobs should be making progress.
- Low overhead
- The process scheduler itself is a process, but this process shouldnot take too much time. We can use the term overhead to refer toall of the time which is taken by the process scheduler and by the workinvolved in switching from one process to another.
- Enforcing priorities
- On a large system, some jobs may be run at a higher priority than others, because their owner is more important or they are doing moreimportant work. Scheduling algorithms have to take these priorities intoaccount.
- Enforcing real-time deadlines
- A real time process is one which has strict deadlines becauseit is impacting events in the world. An example is a process which isdisplaying a video in which each frame has to be loaded within a strictschedule in order that it not appear jittery. A system which is runningmany processes, some of which are real time processes, has to make surethat the real time process deadlines are met.
- It is waiting for an event to occur, such as a keystroke from the useror a read from a file or a mutex to become unlocked. In this case, theprocess is moved from the running state to the blocked state, and some otherprocess is allowed to run.
- It has used up its time quantum (time slice). One of thefeatures of modern process scheduling is that if there are several runnablejobs, one job runs for a while, and then it is replaced by another job.The maximum amount of time that a job can run before being preempted isset by the system and is called the time quantum. It is typically on theorder of several hundred milliseconds.
- A process with a higher priority has become runnable. This most oftenoccurs because a job which had been blocked, waiting for an event, isawakened because the event has occurred. It may be the case that thisrecently awakened process has a higher priority than the currently runningjob, and so it replaces it.
There are many types of jobs which may be running more or less concurrently.Some jobs are interactive; the user is sitting at the monitor hitting keys andmoving the mouse and waiting for a response. Other jobs may be running inbackground. For example, a web server has no user, but it still requiresrelatively rapid response to requests. Still other jobs are pure batch jobs;with no user waiting for an immediate response. On many mainframes, filesof transactions are read in and a database is updated, but this is nothappening in real time.
We can distinguish two categories of jobs, compute intensive jobs andI/O intensive jobs. Any running process consists of alternating periodsof computation followed by an I/O operation. In a compute intensivejob, the computation phase is long relative to the I/O phase. In anI/O intensive job, the computation phase is very short; the computerreads a record from a file for example, performs a trivial computationon it, then reads the next record.
Question: Which type of job should be given priority, a computeintensive job or an I/O intensive job?
Answer: The I/O intensive job should be given priority, because itwill run for a very brief period of time before issuing its next I/O command, whichwill put it to sleep, and then the compute intensive job can run.
Here is a simple example: Two jobs are submitted at the same time, P1 iscompute intensive and P2 is I/O intensive. The compute intensive job willrequire 300 milliseconds to run, the I/O intensive job will perform 3 I/O operations.Each I/O operation takes 60 msec, and the process will need 20 milliseconds toprocess the data from one I/O operation. There is also an initial startup of 20 msec, so it requires a total of 80 msec of CPU time. The time slice is100 msec.
Consider two scenarios. In the first scenario, the compute intensive jobis given priority. In the second scenario, the I/O intensive job is givenpriority and will preempt a running compute intensive job whenever it isrunnable. To keep things simple, we will assume that a context switch takesno time. Here are the two time lines.
In the first scenario, P1 takes a total of 340 msec and P2 takes 440 msec.In the second scenario, P1 takes 380 msec but P2 takes only 260 msec.
It is important to keep in mind throughout all of this discussion that thescheduler does not know how long a process will run or whether or not itwill be I/O intensive or CPU intensive. Also, the scheduler does not usually knowwhether or not a particular job is interactive or not. The only information that it hasis the priority and its past history.
If we look at the kinds of jobs that are actually run on computers doing realwork, it turns out that we can make some assumptions.
- A process which has been IO intensive in the recent past will probably beIO intensive in the near future. The same is true of CPU intensive jobs.
- A process which has been running for a long time is less likely to end soonthan a process which has just started. This may be somewhat counterintuitive,but it turns out that the mix of jobs on most computers consists of a large numberof very short jobs (the user typing ls for example) with a small numberof long running jobs.
Scheduling Algorithms and strategies
Let's consider some simple scheduling algorithms to see how well they meet ourcriteria.
We have already discussed two non-preemptive algorithms
First-come-first-served (FCFS) - Just run the jobs as they arrive. This issimple to implement, but it means that a long running job can block a quickjob, so it is not very efficient.
Shortest job first - Run the shortest job in the queue first. This isthe most efficient non-preemptive strategy, but it is based on the assumptionthat the system knows how long each job will run, and generally this is not known.Thus, it is not practical.
Preemptive strategies
Essentially all modern operating systems use a preemptive scheduling strategy,so we can discuss several such algorithms.
Round Robin
A simple yet fairly effective scheduling algorithm is called RoundRobin. This is a variant of the FCFS algorithm discussed above with atime slice preemption added. In a Round Robin scheduler, the processes whichare ready to run are kept in a FIFO queue. A running job continues runninguntil it terminates, it becomes blocked, or it uses up its time slice. Whenits time slice is used up, it is put on the end of the ready queue. When sleeping processes are awakened or when new processes are started theydo not preempt the currently running job, they are put on the end of the queue.
The Round Robin Scheduling algorithm has the following advantages
- It is simple to implement. Simplicity generally means low overhead.
- There is no danger of starvation for any job
- It cannot take scheduling priorities into account
- Interactive jobs are not necessarily given priority, so there can belonger than optimal waits.
- IO intensive jobs are not necessarily given priority.
Job 1 | 40 |
Job 2 | 10 |
Job 3 | 30 |
Job 4 | 20 |
The average time to completion is 70 msec. With shortest job firstscheduling, the average time to completion of the four jobs is 50 msec.
The size of the time quantum is an important consideration. Ourexamples have ignored the time to perform a context switch, but inpractice there is some overhead associated with switching from onerunning process to another. If the goal is to minimize overheadtime, then it is better to have a large time quantum. In practicemost processes will not use up their entire quantum because they willbe blocked on an IO operation or a mutex.
However, if the time quantum is large, the turnaround time on shortjobs will be longer because they will spend more time waiting forCPU intensive jobs to complete their time slice.
Multiple queues based on priority
If job priority is an important criterion, as it is in real time systemsfor example, another algorithms is to have several queues depending onpriority. High priority jobs are placed on one queue, low priorityjobs on another. High priority jobs are run first. This algorithmhas the risk that low priority jobs might starve if there are enoughhigh priority jobs, so one solution to this is that jobs that are ready torun but have not received any CPU time recently are moved up in priority.
Multiple queues based on prior usage (feedback)
A variant of this is to assign jobs to different priority queues depending on recent CPU usage. Consider a simple system with twoqueues. Newly started jobs or jobs that have just been awakenedafter being blocked are placed on the high priority queue. A jobwhich is preempted because it uses up its time slice is placed onthe lower priority queue. Jobs on the high priority queue arerun first. If there are no jobs on this queue, then jobs on thelow priority queue are run. Each queue operates on an FIFO basis.
This algorithm has a number of advantages over a simple round robinscheduler. Interactive processes are given priority because theytend to be either very short or to spend most of their time blockedwaiting for input. The disadvantage of this is that long running,CPU intensive jobs can starve if there are enough interactive jobs.One solution to this problem is to have the scheduler periodicallyscan the low priority queue to look for jobs which have notreceived any CPU time recently and move them to the higher priorityqueue.
If two queues are better than one queue, perhaps three or more queuesare better than two. With multiple queues, a job which is on, sayqueue 2 which uses up its time slice is then placed on queue 3, (lowernumbers mean higher priority), and so on. Thus a long, compute intensive jobwill quickly migrate to a lower priority queue. In order to avoid starvation,there needs to be a mechanism to move low priority jobs which have not received anyCPU time for a while to move to a higher priority queue.
Yet another variant of this, which helps to address the problem oflow priority jobs starving, is to vary the time slice depending onthe priority. Low priority jobs run less often, but when they getto run, they are given a longer time slice.
Scheduling systems which use multiple queues based on prior usage aresometimes called feedback models.
Calculating CPU usage with decay
A multiple queue system assigns jobs a priority based onCPU usage. One simple but poor solution is to just assignpriority based on total CPU usage so far. This effectivelystarves long running jobs. A second solution isto base priority only on CPU usage during the recent past, the past second for example. Thus, a job which received no CPU time in the past second istreated the same as a brand new job. An even better solution isto use an algorithm which is a hybrid between these two such that recent CPU time reduces priority more than CPU time in the distant past.
For example, suppose we divide timeup into discrete periods of, say, one second. We want to count usage inthe most recent time period more heavily than usage in the second most recent timeperiod, and so on. This is called a decay function. Each second, a valueT
is calculated for each current process according to a formula suchas the following:
TN = .5 * TimeN-1 + .25 * TimeN-2 + .125 * TimeN-3 + .0625 * TimeN-4 ...
In this function, TN
is our total usage indicator at time N
.TimeN-1
is the amount of CPU time that the process used in thelast time period, TimeN-2
is the amount of CPU time thatthe process used in the second to last time period, TimeN-3
is the amount of CPU time that the process used in the third to last time period, and so on.
Jobs are prioritized by T
, the job with the lowest value of T
is run first, the job with the second lowest value of T
isrun second, and so on.
This algorithm has several advantages
- Newly arrived jobs are given the highest priority because they haveno prior CPU time.
- IO intensive jobs are given priority, because, since they spend most oftheir time sleeping, they do not accumulate much CPU time.
- Long running, CPU intensive jobs do not starve, because the longerthey wait for CPU time, the lower their value of
T
is.
TN = P * Time + (1 - P) * TN-1
In this formula Time
is the amount of CPU time that the processused since T
was last recomputed, and P
is avalue between zero and one. If P
has the value of .5, thisexactly computes the expansion above. A higher value of P
gives moreweight to the recent past. In other words, the function decays more quickly.
Fair-share scheduling
All of the models described above treat each process as independent. However,on a modern multiuser system with threading, this may not be the case. In thediscussion of threads, we made a distinction between two different ways toimplement threads. On system which implements user threads, thread schedulingis handled entirely within a process without the kernel dispatcher knowinganything about the threads. In this case, there is a thread library which isa part of the process structure, and this will contain a scheduling mechanism,but of course, it only schedules threads within that process when that processis actually running. The opposite is kernel threads, in which the kernelknows about multiple threads in a single process and is responsible forscheduling these threads.
In the latter case, the scheduler might want to schedule threads such thateach process gets its fair share of the CPU, in contrast to giving a process with,say, six threads, six times as much run time as a process with only a single thread.This is known as fair-share scheduling (FSS). In a typical FSS system,the system divides the executable threads into related groups, and allocatesa fraction of the total CPU time to each group. Scheduling is thus based on acomplex priority system which can take into account the recent history of thatparticular thread, the total usage of all of the threads in that group, andperhaps the priority of the group, and/or the priority of the threads withinthat group.
Swapping
Multiprogramming only works efficiently if all runnable processesare in memory. During times of heavy load, it may be the casethat not all processes will fit into memory at the same time.This leads to a lot of memory page faults (exception conditions in whicha process or a component of a process needs to be loaded into memoryfrom disk before it can run), which are extremely inefficient.One solution which is adopted by many real world operating systemsis to swap out some jobs during times of peak load.This means that entire jobs are copied onto disk and removedfrom the ready queue, even though they are potentially runnable.This not only frees up memory for higher priority jobs, but italso makes the scheduler work more efficiently because it hasfewer potentially runnable processes to consider. Only lowpriority jobs should be swapped out.
The obvious disadvantage is that swapping can dramatically slowdown long running jobs, so there needs to be a mechanism toperiodically copy swapped out jobs back to memory and give themsome CPU time.
Swapping processes in and out of memory is called medium level scheduling. Thiscan be contrasted with short term scheduling on the one hand, which was the kindof scheduling that was discussed in the rest of this lesson, and long term scheduling,on the other hand, which is deciding which and how many jobs to run. Althoughmany operating systems texts discuss long term scheduling, modern systems usually have little control over when new processes are created, and so itis only relevant for a few large mainframe systems that run mostly batch jobs.
An observation
Although process scheduling is a crucial component of the study of operatingsystems, it is only important for systems which are running many more or lesssimultaneous processes. In practice, the typical PC or Unix workstation isonly running one or at most a small number of active processes at one time,and so process scheduling is not important.
Scheduling in Unix and Windows
In spite of that observation, both of theoperating systems that we discuss in this course use scheduling algorithmswhich are more complex than any of those discussed above.
Unix (4.4BSD)
Unix uses multilevel feedback queues. All runnable processesare assigned a scheduling priority that determines which queue theyare placed in. Each queue uses round robin. Priorities aredynamically adjusted.
When a nonrunning process with a higher priority becomes available, itimmediately preempts the current running process if the current processis in user mode, otherwise it switches when the current process exits thekernel.
Unix processes have a priority value called niceness, which ranges from-20 (highest priority) to +20 (lowest priority). The default value for a user process is zero.A user can run a process with a lower priority with the nice
command at the shell. For examplenice a.out
This sets its niceness value to 10. It is called nice
becauseusers who use this are being nice to other users by running their jobs at alower priority. Only the superuser is allowed to increase the priority of a job(nastiness).
Short term priority of user processes is based on two values, recent cpu utilization and niceness. Every four clock ticks(40 msec) priority is recalculated base on this equation.
priority = PUSER + (recentcputime/4) + 2 * Niceness.
recentcputime is incremented each time that the system clock ticks,and the process is found to be executing. It is adjusted onceper second by a decay factor which causes 90 percent of the cpu usageover the past second to be decremented over a period of timewhich is dependent on the system load factor. PUSER is a constant.
recentcputime = ((2 * load) / (2 * load + 1)) * recentcputime + nice
load
is the average length of the run queue. Thus, this function uses a decay function as described above with the rate ofdecay determined by system load. When the system is lightly loaded,decay is rapid, when the system is heavily loaded, decay is slower.
Each type of job has its own priority. This is determined bythe constant in the equation. User jobs have the value PUSER,Processes running in kernel mode have a higherpriority than user jobs, so the constant is higher.
Process Scheduling in Windows 2000
Windows 2000 also uses a priority driven, preemptive scheduling algorithm. A running job is assigned a time quantum, but size of thetime quantum varies. Unlike Unix, what is scheduled is threads,not processes, and in general, no consideration is given to whatprocess the thread belongs to. This means that a process withmany runnable threads would receive much more CPU time than a processwith only one runnable thread.
Windows 2000 uses 32 priority levels (higher numbers indicatehigher priority). The top 16 are calledreal time levels and the bottom 16 are for ordinary userprocesses.
A thread in a real time process is assigned a realtime priority and this does not change. However, note thatin spite of the name, Windows 2000 does not support truereal time scheduling (i.e. guaranteed turnaround time);these simply refer to higher priority jobs. Many kernelthreads run with real time priority.
The priority of a thread in an ordinaryuser process changes dynamically. Each thread starts out with itsprocess base priority (the default is 8). However, prioritycan be increased by the following events
- Completion of an I/O operation
- after waiting on a semaphore or mutex
- when a GUI thread wakes up because of a windowing activity
- when a thread that is ready to run hasn't run for sometime (starvation).
The scheduler also changes the length of a time quantum, thuseffectively increasing the CPU time of certain processes. Oneof the advantages of integrating the Graphical User Interface withthe kernel is that the system knows which window has the focus,and can increase the time quantum for the process or processesassociated with that window (foreground processes). This makes sense because theseprocesses are more likely to be interactive.
The dispatcher runs higher priority jobs first. A process that uses upits time quantum has its priority reduced.
Here is an exercise which simulates two of thesealgorithms
There is no programming assignment this week.