LINUX SCHEDULER BENCHMARK RESULTS
Richard Gooch
30-SEP-1998
Introduction
One of the important considerations when choosing a Real Time (RT)
operating system is the scheduler (context switch) overhead. For
example, if a low priority process releases a lock which a high
priority task is waiting for, how long will it take for the high
priority task to resume processing? The minimum time this can take is
determined by the time taken to perform a context switch: in other
words, the time taken to schedule a new process.
Serious RT operating systems can quote context switch overheads of a
few tens of microseconds or less. This page shows some results for the
Linux scheduler.
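To make the measurement concrete, below is a heavily simplified sketch
of the basic technique: two processes repeatedly yield the CPU to each
other and the elapsed time is divided by the number of switches. It is
only an illustration; the full programme listed in the Source Code
section below adds syscall overhead calibration, RT scheduling,
pipe-based token passing, threads and FPU pollution.

/* sketch.c: a minimal sketch of the measurement idea only, not the
   benchmark itself. Two processes yield to each other in lock-step, so
   each parent iteration causes two context switches. Error checking and
   the RT/mlockall setup of the real programme are omitted. */
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>

#define LOOPS 10000

int main ()
{
    int i;
    pid_t child;
    double us;
    struct timeval before, after;

    if ( ( child = fork () ) == 0 )
    {
        /* Child: yield forever */
        while (1) sched_yield ();
    }
    sleep (1);    /* Crude: give the child time to start spinning */
    gettimeofday (&before, NULL);
    for (i = 0; i < LOOPS; ++i) sched_yield ();
    gettimeofday (&after, NULL);
    kill (child, SIGTERM);
    us = 1000000.0 * (after.tv_sec - before.tv_sec) +
         (after.tv_usec - before.tv_usec);
    /* Each parent yield produces two switches: away and back again */
    printf ("%.2f us per context switch\n", us / (2.0 * LOOPS));
    return (0);
}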
Raw Results
This section lists some raw results collected by me and by other
people. All results are in microseconds.
First my own results (UP kernels, no extra processes, sched_yield(),
RT processes):
CPU              Memory  process switch  thread switch  Kernel version
AMD386DX40       FPM     39.4            24.8           2.1.122+fix
Intel386DX33     FPM     22.5            12.3           2.1.122+fix
486DX266         FPM     8.3             5.2            2.1.122+fix
Pentium 100      FPM     8.8             2.8            2.1.122+fix
PPro 180         EDO     2.2             1.5            2.1.122+fix
Pentium/MMX 200  SDRAM   2.4             0.6            2.1.122+fix
PPro 180         EDO     2.2             1.4            2.1.122+REG
Pentium 100      FPM     8.0-8.2         2.5            2.1.123+fix
PPro 180         FPM     2.0             1.0            2.1.123+fix
Pentium 100      FPM     7.2             2.0            2.1.123+rtqueue
PPro 180         FPM     1.9             1.0            2.1.123+rtqueue
Now the same tests with 10 low-priority processes on the run queue:
CPU              Memory  process switch  thread switch  Kernel version
AMD386DX40       FPM     118.7           106.0          2.1.122+fix
Intel386DX33     FPM     96.7            85.9           2.1.122+fix
486DX266         FPM     46.5            35.5           2.1.122+fix
Pentium 100      FPM     23.0            11.7           2.1.122+fix
Pentium/MMX 200  SDRAM   7.8             6.0            2.1.122+fix
PPro 180         EDO     4.3             3.6            2.1.122+fix
PPro 180         EDO     3.6             3.0            2.1.122+REG
Pentium 100      FPM     22.1            10.8           2.1.123+fix
PPro 180         FPM     4.1             3.2            2.1.123+fix
Pentium 100      FPM     7.2             2.0            2.1.123+rtqueue
PPro 180         FPM     1.9             1.0            2.1.123+rtqueue
Now using pipes and token passing rather than sched_yield() (no extra
processes):
CPU              Memory  process switch  thread switch  Kernel version
486DX266         FPM     16.4            10.8           2.1.122+fix
Pentium 100      FPM     13.1            7.0            2.1.122+fix
PPro 180         EDO     3.0             2.2            2.1.122+fix
Pentium/MMX 200  SDRAM   4.2             1.3            2.1.122+fix
PPro 180         EDO     3.4             2.6            2.1.122+REG
Pentium 100      FPM     13.6            7.5            2.1.123+fix
PPro 180         FPM     3.1             2.4            2.1.123+fix
Pentium 100      FPM     14.2            6.5            2.1.123+rtqueue
PPro 180         FPM     3.0             2.1            2.1.123+rtqueue
And now adding FPU pollution in the main timing loop:
CPU              Memory  process switch  thread switch  Kernel version
486DX266         FPM     27.1            16.9           2.1.122+fix
Pentium 100      FPM     15.6            9.0            2.1.122+fix
PPro 180         EDO     4.8             4.0            2.1.122+fix
Pentium/MMX 200  SDRAM   5.0             2.3            2.1.122+fix
PPro 180         EDO     5.3             4.6            2.1.122+REG
Pentium 100      FPM     15.7            9.8            2.1.123+fix
PPro 180         FPM     5.2             4.2            2.1.123+fix
Pentium 100      FPM     16.3            9.0            2.1.123+rtqueue
PPro 180         FPM     4.9             4.1            2.1.123+rtqueue
A comparative test using "lat_ctx" from lmbench shows the context
switch time going from about 19 us to 41 us when 10 low-priority
processes are added to the run queue. These timings are for a Pentium 100.
Another test for normal processes (UP kernels, no extra processes,
sched_yield(), including syscall overheads):
CPU              Memory  process switch  thread switch  Kernel version
Pentium 100      FPM     11.2-11.5       6.0            2.1.123+fix
PPro 180         FPM     5.3             4.5            2.1.123+fix
Pentium 100      FPM     12.2            6.6            2.1.123+rtqueue
PPro 180         FPM     5.2             4.6            2.1.123+rtqueue
Note the "2.1.112+fix" kernel is 2.1.122 plus a bugfix from Linus for
lazy FPU state save/restoring.
Discussion
This section discusses the results.
Effect of Run Queue Length
Looking at the costs of having more processes on the run queue, we can
see from the PPro 180 times with kernel 2.1.122+fix that an extra 10
processes on the run queue increased the thread switch time from 1.5
us to 3.6 us. This is a cost of 2.1 us per 10 processes, or 0.21 us
per process. For the Pentium 100 the cost is 0.89 us. With only 3
extra processes on the run queue, we can expect the thread switching
time for a Pentium 100 to double! A more pessimistic result is
obtained with lmbench, which shows a cost of 2.2 us per process for a
Pentium 100. For an Intel 386DX33 the cost is 7.4 us per process: here
a mere two extra processes on the run queue more than double the
thread switch latency.
Note that I look at thread switch times since I expect an RT
application to be implemented with threads rather than full
processes. In an RT OS such as pSOS+, all tasks run in the same memory
space, so this seems to be a common paradigm for RT applications.
Given these results, I think there is a case to be made for a
separate run queue for RT processes under Linux. Consider a system
which serves a dual purpose: an RT application used to control an
instrument, and an online data reduction and visualisation
application. While the "obvious" solution is to run the RT and non-RT
applications on separate machines, this has some drawbacks. Firstly,
there is the added cost of a second machine: not all organisations
have the necessary budget. Secondly, it makes sense to perform the
data reduction and visualisation on the same machine performing the
data capture. Having separate machines would require data to be
shipped across a network and would also consume memory bandwidth on
the data capture machine.
A separate RT run queue provides an increased measure of isolation
between RT processes and normal processes. This would help give a more
stable response for RT processes on a system loaded with normal
processes.
I have written a kernel patch which maintains a separate run queue for
RT processes and also simplifies some of the code paths. The results
are shown for the "2.1.123+rtqueue" kernel. The "2.1.123+fix" kernel
includes a bugfix for POSIX.4 compliance. From the results we can see
that there is a minor benefit for the case where the run queue is
short. This benefit is due to the simplified code paths and probably
better cache utilisation. In addition, we can see that there is now no
penalty for additional low-priority processes on the run queue.
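The patch itself is not reproduced here. Purely to illustrate the idea
(this is not the actual Linux scheduler code, nor my patch), a
scheduler with a separate RT run queue might select the next task
along these lines:

/* rtqueue-sketch.c: illustration only, not the Linux scheduler or my
   patch. The point is that RT tasks are selected from their own short
   queue, so the number of runnable normal processes no longer affects
   RT scheduling latency. */
#include <stdio.h>

struct task
{
    struct task *next;
    int counter;        /* Dynamic priority of a normal task */
    int rt_priority;    /* Static priority of an RT task     */
};

static struct task *rt_queue = NULL;      /* SCHED_FIFO/SCHED_RR tasks */
static struct task *normal_queue = NULL;  /* SCHED_OTHER tasks         */

static struct task *pick_next_task ()
{
    struct task *t;
    struct task *best = NULL;
    int best_weight = -1;

    /* RT tasks always win: only the (usually tiny) RT queue is scanned */
    for (t = rt_queue; t != NULL; t = t->next)
    {
        if (t->rt_priority > best_weight)
        {
            best_weight = t->rt_priority;
            best = t;
        }
    }
    if (best != NULL) return (best);
    /* Fall back to the normal queue: its length now only costs normal
       processes, not RT processes */
    for (t = normal_queue; t != NULL; t = t->next)
    {
        if (t->counter > best_weight)
        {
            best_weight = t->counter;
            best = t;
        }
    }
    return (best);
}

int main ()
{
    struct task normal1 = {NULL, 3, 0};
    struct task normal2 = {&normal1, 7, 0};
    struct task rt1 = {NULL, 0, 50};

    normal_queue = &normal2;
    rt_queue = &rt1;
    printf ("With an RT task runnable: rt_priority %d is chosen\n",
            pick_next_task ()->rt_priority);
    rt_queue = NULL;
    printf ("Without RT tasks: counter %d is chosen\n",
            pick_next_task ()->counter);
    return (0);
}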
Effect of Cache Line Aliasing
It's interesting to note that the thread switch ratio between a
Pentium 100 and a Pentium Pro 180 is only about two, and that the
Pentium/MMX 200 is twice as fast as the PPro 180. On the other hand
the PPro 180 is slightly faster than the Pentium/MMX 200 for process
context switches and both are around 4 times faster than the Pentium
100.
A possible explanation for this is that since the Pentium/MMX has a
4-way L1 cache, cache line aliasing is not occurring for the thread
switch case, whereas for the Pentium and PPro cache line aliasing
requires more cache line fills and hence slows down thread
switching. A full process switch on the other hand requires more
memory to be manipulated and hence the faster L2 cache of the PPro
dominates.
Looking further at the case with an extra 10 low-priority processes
placed on the run queue, we see again that the Pentium/MMX is not
performing as well as the PPro. Since the run queue is longer, the
Pentium/MMX may now also be exhibiting cache line aliasing
problems. The faster L2 cache of the PPro may again be helping.
The Linux task (process) structure is page aligned. Thus there is the
potential for cache line thrashing, as accesses to the same member in
successive task structures can be aliased to the same cache line. The
cache line is then flushed or invalidated, and a new cache line must
be brought in from main memory. To reduce the impact
from this, I've reordered some of the members in the task structure so
that the scheduler only needs to access a maximum of 48 contiguous
bytes when scanning the run queue. On a CPU with 32 byte cache lines,
this means only 2 cache lines per task need to be brought in. The
standard version of the kernel requires at least 3 cache lines per
thread and 4 cache lines per non-RT process on the run queue.
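A rough illustration of the aliasing effect follows. Because the task
structures are page aligned, the same member offset in every task
structure maps to the same cache set. The cache geometry used below
(32 byte lines, 128 sets, corresponding to an 8 kByte 2-way L1 data
cache) is my assumption for the sake of illustration, not something
measured above.

/* alias-sketch.c: illustration of why page-aligned task structures can
   thrash the cache. The cache geometry (32 byte lines, 128 sets) is an
   assumption. Every page-aligned structure maps the same member offset
   to the same cache set, so only as many structures as the cache has
   ways can be held at once. */
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define LINE_SIZE 32UL
#define NUM_SETS  128UL   /* e.g. 8 kBytes / 32 bytes / 2 ways */

int main ()
{
    unsigned long task;
    unsigned long offset = 64;   /* Hypothetical offset of a hot member */

    for (task = 0; task < 4; ++task)
    {
        unsigned long addr = task * PAGE_SIZE + offset;
        printf ("task %lu: member at 0x%05lx -> cache set %lu\n",
                task, addr, (addr / LINE_SIZE) % NUM_SETS);
    }
    /* All four addresses land in the same set: a 2-way cache (Pentium,
       PPro) starts evicting after scanning 2 task structures, whereas a
       4-way cache (Pentium/MMX) tolerates twice as many */
    return (0);
}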
The effect of my reordering can be seen in the results for the
"2.1.122+REG" kernel. As well as reducing the basic (2 thread)
switching time, the per-process cost has been reduced from 0.21 us to
0.16 us for a PPro 180. However, other tests show that this change
makes some operations slower, so clearly more work is required to
determine an optimal ordering. In my view it is more fruitful to
create a separate RT run queue, as this would have a greater benefit.
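To illustrate the kind of reordering involved (the member names below
are hypothetical and this is not the real task structure layout), the
idea is to group the members the run queue scan reads at the front of
the structure so that they stay within the first 48 bytes, i.e. two
32 byte cache lines:

/* reorder-sketch.c: illustration only; hypothetical member names, not
   the real Linux task structure or my actual reordering. */
#include <stdio.h>
#include <stddef.h>

struct task_like
{
    /* Hot: everything the run queue scan needs to read */
    long              state;
    long              counter;
    long              priority;
    unsigned long     policy;
    unsigned long     rt_priority;
    struct task_like *next_run;
    struct task_like *prev_run;
    void             *mm;
    /* Cold: only touched once a task has actually been chosen */
    char              comm[16];
    /* ... signal state, file tables, timers, etc. ... */
};

int main ()
{
    unsigned long hot_bytes = offsetof (struct task_like, comm);

    printf ("hot region: %lu bytes -> %lu cache line(s) of 32 bytes\n",
            hot_bytes, (hot_bytes + 31) / 32);
    return (0);
}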
FPU Issues
From the results above we can see that the time taken to save and
restore the FPU state dominates the (short run queue) context switch
time. On a PPro 180, process switch times went from 3.0 us to 4.8 us,
a cost of 1.8 us. This is worth noting for a few reasons.
Firstly, RT processes which wish to ensure low context switch times
and wakeup latencies should avoid floating point operations if
possible. Performing floating point operations will increase the
latency.
Secondly, when comparing results from my test code with lmbench
results, it is worth noting that lmbench uses floating point
arithmetic to compute timings, and hence pays the FPU save/restore
cost on every switch. This explains why most of my benchmarks give
lower absolute times than lmbench.
Finally, the effect that this FPU state saving/restoring has on
variance. This is discussed below.
Variance
Earlier generations of my test code showed considerable variance (up
to 100%). Some of that variance has been explained by unexpected
system effects (such as my shell, xterm and X server being placed on
the run queue while the benchmark was running). The current code takes
steps to avoid these effects.
There still remains some variance (up to 50%) where an occasional
measurement is higher than usual. This is in contrast to the lmbench
results, which so far have only exhibited a 30% variance (mean 5.25
us, minimum 3.94 us on a PPro 180). However, it is worth considering
the FPU save/restore times. The large variance in my benchmarks was
without FPU pollution (and hence the lower switch times). If we take
the 1.8 us FPU state overhead away from the lmbench numbers, we get a
mean of 3.45 us and a minimum of 2.14 us, yielding a variance of over
60%. While this is a crude comparison, it should be a reasonable first
approximation. In my tests which included FPU pollution, the variance
was under 30%.
The assumption I've made is that FPU state save/restore times are less
subject to variance than the rest of the context switch time. I think
this is a reasonable assumption considering that the FPU state is kept
in a single block of memory whereas the rest of the context switch
operations require access to scattered parts of memory. Access to
scattered memory is more dependent on cache effects.
Finally, it is worth noting that I've seen greatest variance on the
486DX266, a bit less on the Pentium 100, less again on the PPro 180
and least on the Pentium/MMX 200. This leads me to the suspicion that
much of the variance is due to cache effects.
Source Code
Below is the source code used in these tests.
/* time-schedule.c
Programme to test how long a context switch takes.
Copyright (C) 1998 Richard Gooch
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Richard Gooch may be reached by email at rgooch@atnf.csiro.au
The postal address is:
Richard Gooch, c/o ATNF, P. O. Box 76, Epping, N.S.W., 2121, Australia.
*/
/*
This programme will determine the context switch (scheduling) overhead on
a system. It takes into account SMP machines. True context switches are
measured.
Written by Richard Gooch 15-SEP-1998
Last updated by Richard Gooch 25-SEP-1998
*/
#include <unistd.h>
#ifdef _POSIX_THREADS
# ifndef _REENTRANT
# define _REENTRANT
# endif
# ifndef _POSIX_THREAD_SAFE_FUNCTIONS
# define _POSIX_THREAD_SAFE_FUNCTIONS
# endif
#  include <pthread.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <ctype.h>
#include <errno.h>
#include <signal.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <dirent.h>
#ifndef __KARMA__
# define mt_num_processors() 1 /* Set to the number of processors */
# define ERRSTRING sys_errlist[errno]
# define FALSE 0
# define TRUE 1
#else
#  include <karma.h>
#  include <karma_mt.h>
#endif
#define MAX_ITERATIONS 1000
static unsigned int hog_other_cpus ();
static void run_yielder (int use_threads, int read_fd);
static void *yielder_main (void *arg);
static void s_term_handler ();
static void run_low_priority (unsigned int num, int read_fd);
static unsigned long compute_median (unsigned long values[MAX_ITERATIONS],
unsigned long max_value);
static unsigned int get_run_queue_size ();
static unsigned long get_num_switches ();
static void use_fpu_value (double val);
static volatile unsigned int sched_count = 0;
/* For yielder */
static int pipe_read_fd = -1;
static int pipe_write_fd = -1;
static pid_t child = -1;
int main (int argc, char **argv)
{
int use_threads = FALSE;
int use_pipe = FALSE;
int no_test = FALSE;
int frob_fpu = FALSE;
int read_fd = -1;
int write_fd = -1;
int num_low_priority = -1;
int i, j;
int fds[2];
unsigned int count, num_yields, run_queue_size1, run_queue_size2, num_hogs;
unsigned long median, switches1, num_switches, num_overhead_switches;
signed long overhead, total_diffs;
signed long min_diff = 1000000000;
signed long max_diff = -1000000000;
double dcount = 0.0;
unsigned long diffs[MAX_ITERATIONS];
struct timeval before, after;
static char *usage =
"time-schedule [-h] [-thread] [-notest] [-pipe] [-fpu] [num_running]";
/* First create pipe used to sychronise low priority processes */
if (pipe (fds) != 0)
{
fprintf (stderr, "Error creating pipe\t%s\n", ERRSTRING);
exit (1);
}
read_fd = fds[0];
pipe_write_fd = fds[1];
for (count = 1; count < argc; ++count)
{
if (strcmp (argv[count], "-thread") == 0)
{
#ifdef _POSIX_THREADS
use_threads = TRUE;
#else
fprintf (stderr, "POSIX threads not available\n");
#endif
}
else if (strcmp (argv[count], "-pipe") == 0) use_pipe = TRUE;
else if (strcmp (argv[count], "-notest") == 0) no_test = TRUE;
else if (strcmp (argv[count], "-fpu") == 0) frob_fpu = TRUE;
else if ( isdigit (argv[count][0]) )
num_low_priority = atoi (argv[count]);
else
{
fprintf (stderr,
"Programme to time context switches (schedules)\n");
fprintf (stderr,
"(C) 1998 Richard Gooch \n");
fprintf (stderr, "Usage:\t%s\n", usage);
fprintf (stderr, "\t-thread\t\tswitch threads not processes\n");
fprintf (stderr, "\t-pipe\t\tuse pipes not sched_yield()\n");
fprintf (stderr,
"\t-fpu\t\tpollute the FPU after each switch in main\n");
fprintf (stderr, "\tnum_running\tnumber of extra processes\n");
exit (0);
}
}
if (no_test)
{
if (num_low_priority > 0) run_low_priority (num_low_priority, read_fd);
while (TRUE) pause ();
}
if (geteuid () == 0)
{
struct sched_param sp;
memset (&sp, 0, sizeof sp);
sp.sched_priority = 10;
if (sched_setscheduler (0, SCHED_FIFO, &sp) != 0)
{
fprintf (stderr, "Error changing to RT class\t%s\n", ERRSTRING);
exit (1);
}
if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
{
fprintf (stderr, "Error locking pages\t%s\n", ERRSTRING);
exit (1);
}
}
else fprintf (stderr, "Not running with RT priority!\n");
/* Give shell and login programme time to finish up and get off the run
queue */
usleep (200000);
if (use_pipe)
{
if (pipe (fds) != 0)
{
fprintf (stderr, "Error creating pipe\t%s\n", ERRSTRING);
exit (1);
}
pipe_read_fd = fds[0];
write_fd = fds[1];
}
num_hogs = hog_other_cpus ();
/* Determine the syscall overhead. Do this in a loop of 2 iterations: the
first iteration should warm the cache, the second will compute the overhead */
for (j = 0; j < 2; ++j)
{
switches1 = get_num_switches ();
gettimeofday (&before, NULL);
for (i = 0; i < 20; ++i)
{
if (use_pipe)
{
char ch = 0;
write (pipe_write_fd, &ch, 1);
read (read_fd, &ch, 1);
}
else sched_yield ();
if (frob_fpu) ++dcount;
}
gettimeofday (&after, NULL);
num_overhead_switches = get_num_switches () - switches1;
overhead = 1000000 * (after.tv_sec - before.tv_sec);
overhead += after.tv_usec - before.tv_usec;
}
use_fpu_value (dcount);
if (num_low_priority > 0) run_low_priority (num_low_priority, read_fd);
/* Set up for the benchmark */
run_yielder (use_threads, read_fd);
memset (diffs, 0, sizeof diffs);
run_queue_size1 = get_run_queue_size ();
total_diffs = 0;
switches1 = get_num_switches ();
/* Benchmark! */
for (count = 0; count < MAX_ITERATIONS; ++count)
{
signed long diff;
gettimeofday (&before, NULL);
/* Generate 20 context switches */
for (i = 0; i < 10; ++i)
{
if (use_pipe)
{
char ch = 0;
write (write_fd, &ch, 1);
read (read_fd, &ch, 1);
}
else sched_yield ();
if (frob_fpu) dcount += 1.0;
}
gettimeofday (&after, NULL);
diff = 1000000 * (after.tv_sec - before.tv_sec);
diff += after.tv_usec - before.tv_usec;
diffs[count] = diff;
total_diffs += diff;
if (diff < min_diff) min_diff = diff;
if (diff > max_diff) max_diff = diff;
}
num_yields = sched_count;
run_queue_size2 = get_run_queue_size ();
num_switches = get_num_switches () - switches1;
if (!use_threads) kill (child, SIGTERM);
fprintf (stderr, "Started %u hog processes\n", num_hogs);
fprintf (stderr, "Syscall%s overhead: %.1f us\n", frob_fpu ? "/FPU" : "",
(double) overhead / 20.0);
if (switches1 > 0)
fprintf (stderr, "Num switches during overhead check: %lu\n",
num_overhead_switches);
fprintf (stderr, "Minimum scheduling latency: %.1f (%.1f) us\n",
(double) min_diff / 20.0, (double) (min_diff - overhead) / 20.0);
median = compute_median (diffs, max_diff);
fprintf (stderr, "Median scheduling latency: %.1f (%.1f) us\n",
(double) median / 20.0, (double) (median - overhead) / 20.0);
fprintf (stderr, "Average scheduling latency: %.1f (%.1f) us\n",
(double) total_diffs / (double) MAX_ITERATIONS / 20.0,
(double) (total_diffs - overhead * MAX_ITERATIONS) /
(double) MAX_ITERATIONS / 20.0);
fprintf (stderr, "Maximum scheduling latency: %.1f (%.1f) us\n",
(double) max_diff / 20.0, (double) (max_diff - overhead) / 20.0);
fprintf (stderr, "Run queue size: %u, %u\n",
run_queue_size1, run_queue_size2);
use_fpu_value (dcount);
if (use_threads) fprintf (stderr, "Number of yields: %u\n", num_yields);
if (num_switches > 0) fprintf (stderr, "Num switches: %lu\n",num_switches);
/* Finish up */
kill (0, SIGTERM);
return (0);
} /* End Function main */
static unsigned int hog_other_cpus ()
/* [SUMMARY] Hog other CPUs with a high-priority job.
[RETURNS] The number of hogged CPUs.
*/
{
unsigned int count;
for (count = mt_num_processors (); count > 1; --count)
{
switch ( fork () )
{
case 0:
/* Child */
while (TRUE);
break;
case -1:
/* Error */
fprintf (stderr, "Error forking\t%s\n", ERRSTRING);
kill (0, SIGTERM);
break;
default:
/* Parent */
break;
}
}
return mt_num_processors () - 1;
} /* End Function hog_other_cpus */
static void run_yielder (int use_threads, int read_fd)
/* [SUMMARY] Run other process which will continuously yield.
<use_threads> If TRUE, the yielding process is just a thread.
<read_fd> The pipe to read the synchronisation byte from.
[RETURNS] Nothing.
*/
{
char ch;
struct sigaction new_action;
#ifdef _POSIX_THREADS
pthread_t thread;
if (use_threads)
{
if (pthread_create (&thread, NULL, yielder_main, NULL) != 0)
{
fprintf (stderr, "Error creating thread\t%s\n", ERRSTRING);
kill (0, SIGTERM);
}
read (read_fd, &ch, 1);
return;
}
#endif
switch ( child = fork () )
{
case 0:
/* Child */
break;
case -1:
/* Error */
fprintf (stderr, "Error forking\t%s\n", ERRSTRING);
kill (0, SIGTERM);
break;
default:
/* Parent */
read (read_fd, &ch, 1);
return;
/*break;*/
}
memset (&new_action, 0, sizeof new_action);
sigemptyset (&new_action.sa_mask);
new_action.sa_handler = s_term_handler;
if (sigaction (SIGTERM, &new_action, NULL) != 0)
{
fprintf (stderr, "Error setting SIGTERM handler\t%s\n", ERRSTRING);
exit (1);
}
yielder_main (NULL);
} /* End Function run_yielder */
static void *yielder_main (void *arg)
/* [SUMMARY] Yielder function.
<arg> An arbitrary pointer. Ignored.
[RETURNS] NULL.
*/
{
char ch = 0;
sched_count = 0;
write (pipe_write_fd, &ch, 1);
while (TRUE)
{
if (pipe_read_fd >= 0)
{
read (pipe_read_fd, &ch, 1);
write (pipe_write_fd, &ch, 1);
}
else sched_yield ();
++sched_count;
}
return (NULL);
} /* End Function yielder_main */
static void s_term_handler ()
{
fprintf (stderr, "Number of yields: %u\n", sched_count);
exit (0);
} /* End Function s_term_handler */
static void run_low_priority (unsigned int num, int read_fd)
/* [SUMMARY] Run low priority processes.
<num> Number of processes.
<read_fd> The pipe to read the synchronisation byte from.
[RETURNS] Nothing.
*/
{
char ch = 0;
for (; num > 0; --num)
{
switch ( fork () )
{
case 0:
/* Child */
if (geteuid () == 0)
{
struct sched_param sp;
memset (&sp, 0, sizeof sp);
sp.sched_priority = 0;
if (sched_setscheduler (0, SCHED_OTHER, &sp) != 0)
{
fprintf (stderr,
"Error changing to SCHED_OTHER class\t%s\n",
ERRSTRING);
exit (1);
}
}
if (nice (20) != 0)
{
fprintf (stderr, "Error nicing\t%s\n", ERRSTRING);
kill (0, SIGTERM);
}
write (pipe_write_fd, &ch, 1);
while (TRUE) sched_yield ();
break;
case -1:
/* Error */
fprintf (stderr, "Error forking\t%s\n", ERRSTRING);
kill (0, SIGTERM);
break;
default:
/* Parent */
read (read_fd, &ch, 1);
break;
}
}
} /* End Function run_low_priority */
static unsigned long compute_median (unsigned long values[MAX_ITERATIONS],
unsigned long max_value)
/* [SUMMARY] Compute the median from an array of values.
<values> The array of values.
<max_value> The maximum value in the array.
[RETURNS] The median value.
*/
{
unsigned long count;
unsigned long median = 0;
unsigned long peak = 0;
unsigned long *table;
/* Crude but effective */
if ( ( table = calloc (max_value + 1, sizeof *table) ) == NULL )
{
fprintf (stderr, "Error allocating median table\n");
exit (1);
}
for (count = 0; count < MAX_ITERATIONS; ++count)
{
++table[ values[count] ];
}
/* Now search for peak. Position of peak is median */
for (count = 0; count < max_value + 1; ++count)
{
if (table[count] < peak) continue;
peak = table[count];
median = count;
}
return (median);
} /* End Function compute_median */
static unsigned int get_run_queue_size ()
/* [SUMMARY] Compute the current size of the run queue.
[RETURNS] The length of the run queue.
*/
{
int dummy_i;
unsigned int length = 0;
FILE *fp;
DIR *dp;
struct dirent *de;
char txt[64], dummy_str[64];
if ( ( dp = opendir ("/proc") ) == NULL ) return (0);
while ( ( de = readdir (dp) ) != NULL )
{
if ( !isdigit (de->d_name[0]) ) continue;
sprintf (txt, "/proc/%s/stat", de->d_name);
if ( ( fp = fopen (txt, "r") ) == NULL ) return (length);
fscanf (fp, "%d %s %s", &dummy_i, dummy_str, txt);
if (txt[0] == 'R') ++length;
fclose (fp);
}
closedir (dp);
return (length);
} /* End Function get_run_queue_size */
static unsigned long get_num_switches ()
/* [SUMMARY] Get the number of context switches.
[RETURNS] The number of context switches on success, else 0.
*/
{
unsigned long val;
FILE *fp;
char line[256], name[64];
if ( ( fp = fopen ("/proc/stat", "r") ) == NULL ) return (0);
while (fgets (line, sizeof line, fp) != NULL)
{
sscanf (line, "%s %lu", name, &val);
if (strcasecmp (name, "ctxt") != 0) continue;
fclose (fp);
return (val);
}
fclose (fp);
return (0);
} /* End Function get_num_switches */
static void use_fpu_value (double val)
/* [SUMMARY] Dummy function to consume FPU value. Fools compiler.
<val> The value.
[RETURNS] Nothing.
*/
{
} /* End Function use_fpu_value */