Latency test of POSIX timers under Linux SCHED

Discussion:

Latency test of POSIX timers under Linux SCHED_FIFO

Arvid Brodin

2013-03-19 23:20:43 UTC

Hello,

Just out of curiosity, I wrote this program to test the latency using a
vanilla 3.4 kernel's "real-time scheduler". Maybe it is of interest to
someone on this list.

$ uname -srvmp
Linux 3.3.8 #7 SMP PREEMPT Tue Jun 26 03:08:40 CEST 2012 x86_64 Intel(R)
Core(TM) i3-2120T CPU @ 2.60GHz

Tests were made while running a 1080p x264 movie and "shaking" a large
glxgears window.

Sample running as normal user without permission to change to realtime
scheduler, i.e. running under the normal scheduler (SCHED_OTHER):

$ ./rt_test
Setting SCHED priority 1
Could not set scheduler: Operation not permitted
CLOCK_REALTIME resolution: 1 ns
Establishing handler for signal 34
timer ID is 0x1ae6010
...
Min, max diff -686, 694 µs; cumulative diff 1990 µs; overruns 0;
calls 70913
Min, max diff -827, 1257 µs; cumulative diff 4003 µs; overruns 2;
calls 71970
Min, max diff -645, 1193 µs; cumulative diff 4994 µs; overruns 1;
calls 73028
Min, max diff -629, 635 µs; cumulative diff 4996 µs; overruns 0;
calls 74087
Min, max diff -48, 65 µs; cumulative diff 4997 µs; overruns 0;
calls 75146
Min, max diff -29, 34 µs; cumulative diff 4990 µs; overruns 0;
calls 76204
Min, max diff -253, 160 µs; cumulative diff 4995 µs; overruns 0;
calls 77263
Min, max diff -854, 868 µs; cumulative diff 4996 µs; overruns 0;
calls 78323
Min, max diff -812, 1812 µs; cumulative diff 5990 µs; overruns 1;
calls 79381
Min, max diff -635, 371 µs; cumulative diff 5990 µs; overruns 0;
calls 80441
Min, max diff -536, 540 µs; cumulative diff 6000 µs; overruns 0;
calls 81500
Min, max diff -855, 1229 µs; cumulative diff 6988 µs; overruns 1;
calls 82558
Min, max diff -305, 435 µs; cumulative diff 7002 µs; overruns 0;
calls 83618
Min, max diff -293, 304 µs; cumulative diff 6995 µs; overruns 0;
calls 84677
Min, max diff -644, 647 µs; cumulative diff 6993 µs; overruns 0;
calls 85737

Sample running under real-time scheduler (SCHED_FIFO):
# ulimit -r 1
# ./rt_test
Setting SCHED priority 1
CLOCK_REALTIME resolution: 1 ns
Establishing handler for signal 34
timer ID is 0x1236010
...
Min, max diff -24, 36 µs; cumulative diff 2 µs; overruns 0; calls 35173
Min, max diff -15, 20 µs; cumulative diff -3 µs; overruns 0; calls
36178
Min, max diff -30, 24 µs; cumulative diff -3 µs; overruns 0; calls
37183
Min, max diff -17, 14 µs; cumulative diff -1 µs; overruns 0; calls
38187
Min, max diff -27, 22 µs; cumulative diff -4 µs; overruns 0; calls
39192
Min, max diff -68, 80 µs; cumulative diff 2 µs; overruns 0; calls 40199
Min, max diff -26, 32 µs; cumulative diff 0 µs; overruns 0; calls 41204
Min, max diff -23, 29 µs; cumulative diff -3 µs; overruns 0; calls
42209
Min, max diff -29, 28 µs; cumulative diff 3 µs; overruns 0; calls 43215
Min, max diff -24, 25 µs; cumulative diff -3 µs; overruns 0; calls
44220
Min, max diff -26, 28 µs; cumulative diff 2 µs; overruns 0; calls 45225
Min, max diff -19, 25 µs; cumulative diff -3 µs; overruns 0; calls
46231
Min, max diff -47, 57 µs; cumulative diff -1 µs; overruns 0; calls
48242

It looks good enough for the servo thread, at least on my computer!
Here's the program, in case you want to try it yourself:

Compile with
$ gcc -Wall -lrt rt_test.c -o rt_test

---
/*
* Latency/precision test program for POSIX timers under Linux
* real-time scheduling (SCHED_FIFO).
*
* Based on the sample code in 'man 2 timer_create',
* copyright (c) 2009 Linux Foundation, written by Michael Kerrisk
* <mtk.manpages-***@public.gmane.org>
* and released under the "Verbatim" License:
* https://www.kernel.org/doc/man-pages/licenses.html#verbatim
*
* Modifications for this usage by Arvid Brodin <arvidb-***@public.gmane.org>.
*/

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <time.h>
#include <errno.h>

#define CLOCKID CLOCK_REALTIME
#define SIG SIGRTMIN
#define PERIOD 1000000

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

static inline int min(int a, int b)
{
return (a < b) ? a : b;
}

static inline int max(int a, int b)
{
return (a > b) ? a : b;
}

/* Global variable to keep track of latency statistics etc. */
static struct {
int mindiff; // Maximum negative deviation
int maxdiff; // Maximum positive deviation
int cumdiff; // Cumulative deviation from first call
int calls; // Total number of calls made
int overruns; // Number of completely missed calls
int last_ns;
} period_info;

static void print_clockinfo()
{
struct timespec ts;

if (clock_getres(CLOCK_REALTIME, &ts) != 0) {
printf("Could not get clock info.\n");
return;
}

printf("CLOCK_REALTIME resolution: %ld ns\n", ts.tv_nsec);
}

/*
* Print statistics. Called from main loop.
*/
static void print_diffs()
{
printf("Min, max diff %5d, %5d µs; cumulative diff %d µs; overruns %d;
calls %d\n",
period_info.mindiff / 1000,
period_info.maxdiff / 1000,
period_info.cumdiff / 1000,
period_info.overruns,
period_info.calls);
period_info.mindiff = 0;
period_info.maxdiff = 0;
period_info.overruns = 0;
}

static struct sigaction sa;

/*
* Signal handler. Called every PERIOD ns.
*/
static void handler(int sig, siginfo_t *si, void *uc)
{
timer_t *tidp;
struct timespec ts;
int overruns;
int diff;

period_info.calls += 1;

tidp = si->si_value.sival_ptr;
overruns = timer_getoverrun(*tidp);
if (overruns > 0)
period_info.overruns += overruns;

clock_gettime(CLOCKID, &ts);
if (period_info.last_ns < 0)
// First call
period_info.last_ns = ts.tv_nsec - PERIOD;
else if (ts.tv_nsec < period_info.last_ns)
// Seconds wrapped
period_info.last_ns -= 1000000000;

diff = ts.tv_nsec - period_info.last_ns - PERIOD;
period_info.mindiff = min(period_info.mindiff, diff);
period_info.maxdiff = max(period_info.maxdiff, diff);
period_info.cumdiff += diff;

period_info.last_ns = ts.tv_nsec;
}

/*
* Signal delivery causes system/library calls to return prematurely.
* See 'man 7 signal' for details. This function repeatedly call
* nanosleep() until (at least) 'sec' seconds has passed.
*/
static void sigsafe_sleep(int sec)
{
struct timespec rqtp, rmtp;
struct timespec *ts_req, *ts_rem, *ts_tmp;
int res;

rqtp.tv_sec = sec;
rqtp.tv_nsec = 0;

ts_req = &rqtp;
ts_rem = &rmtp;
res = nanosleep(ts_req, ts_rem);
while (res < 0 && errno == EINTR) {
ts_tmp = ts_req;
ts_req = ts_rem;
ts_rem = ts_tmp;
res = nanosleep(ts_req, ts_rem);
}
}

int main(int argc, char *argv[])
{
struct sched_param sched_param;
timer_t timerid;
struct sigevent sev;
struct itimerspec its;
int res;
const int sched_policy = SCHED_FIFO;

// Scheduler
sched_param.sched_priority = sched_get_priority_min(sched_policy);
printf("Setting SCHED priority %d\n", sched_param.sched_priority);
res = sched_setscheduler(0, sched_policy, &sched_param);
if (res < 0)
perror("Could not set scheduler");

// Some init
print_clockinfo();
period_info.mindiff = 0;
period_info.maxdiff = 0;
period_info.cumdiff = 0;
period_info.last_ns = -1;

// Establish handler for timer signal
printf("Establishing handler for signal %d\n", SIG);
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
if (sigaction(SIG, &sa, NULL) == -1)
errExit("sigaction");

// Create the timer
sev.sigev_notify = SIGEV_SIGNAL;
sev.sigev_signo = SIG;
sev.sigev_value.sival_ptr = &timerid;
if (timer_create(CLOCKID, &sev, &timerid) == -1)
errExit("timer_create");
printf("timer ID is 0x%lx\n", (long) timerid);

// Start the timer
its.it_value.tv_sec = 0;
its.it_value.tv_nsec = PERIOD;
its.it_interval.tv_sec = its.it_value.tv_sec;
its.it_interval.tv_nsec = its.it_value.tv_nsec;

if (timer_settime(timerid, 0, &its, NULL) == -1)
errExit("timer_settime");

while (1) {
sigsafe_sleep(1);
print_diffs();
}

// Not reached
exit(EXIT_SUCCESS);
}
---

--
Arvid

Arvid Brodin

2013-03-19 23:26:43 UTC

Permalink

Post by Arvid Brodin
Hello,
Just out of curiosity, I wrote this program to test the latency using a
vanilla 3.4 kernel's "real-time scheduler". Maybe it is of interest to
someone on this list.

Or maybe 3.3. Oh well. :)

Post by Arvid Brodin
$ uname -srvmp
Linux 3.3.8 #7 SMP PREEMPT Tue Jun 26 03:08:40 CEST 2012 x86_64 Intel(R)

--
Arvid

Michael Haberler

2013-03-20 08:32:29 UTC

Permalink

Hi Arvid,

I got curious and tried, also on 2.6.32-122-rtai - it turns out it has a significant improvement on rtai too, so that feature might go back even longer than 3.3.

I do see potential use for this, it would be in the plain kernel support of what is now 'sim mode' (aka posix threads, which are not RT-hardened wrt latency in any way)

what we currently do is: Posix threads are used for for both 'vanilla' and 'RT PREEMPT' RTOS support; in case an RT PREEMPT kernel is detected its facilities are used (I'd need to read up on the specifics).

If not, it falls back to plain scheduling, and thats where your suggestion could come it - it could improve RT latency when running on a vanilla kernel.

here's the current code which is used for both vanilla and RT PREEMPT kernels:

http://git.mah.priv.at/gitweb?p=emc2-dev.git;a=blob;f=src/rtapi/rt-preempt-user.c;h=d46994b172b030b98d601eba1a23707531794ed8;hb=refs/heads/rtos-integration-preview3

would you be willing to help plug that in? ideally it would be runtime flag; there will be a kernel autodetection routine which executes at the beginning of an RTAPI instance, and thats where a flag could be handed down from

thanks for the suggestion!

regards

Michael

Post by Arvid Brodin
Hello,
Just out of curiosity, I wrote this program to test the latency using a
vanilla 3.4 kernel's "real-time scheduler". Maybe it is of interest to
someone on this list.
$ uname -srvmp
Linux 3.3.8 #7 SMP PREEMPT Tue Jun 26 03:08:40 CEST 2012 x86_64 Intel(R)
Tests were made while running a 1080p x264 movie and "shaking" a large
glxgears window.
Sample running as normal user without permission to change to realtime
$ ./rt_test
Setting SCHED priority 1
Could not set scheduler: Operation not permitted
CLOCK_REALTIME resolution: 1 ns
Establishing handler for signal 34
timer ID is 0x1ae6010
...
Min, max diff -686, 694 µs; cumulative diff 1990 µs; overruns 0;
calls 70913
Min, max diff -827, 1257 µs; cumulative diff 4003 µs; overruns 2;
calls 71970
Min, max diff -645, 1193 µs; cumulative diff 4994 µs; overruns 1;
calls 73028
Min, max diff -629, 635 µs; cumulative diff 4996 µs; overruns 0;
calls 74087
Min, max diff -48, 65 µs; cumulative diff 4997 µs; overruns 0;
calls 75146
Min, max diff -29, 34 µs; cumulative diff 4990 µs; overruns 0;
calls 76204
Min, max diff -253, 160 µs; cumulative diff 4995 µs; overruns 0;
calls 77263
Min, max diff -854, 868 µs; cumulative diff 4996 µs; overruns 0;
calls 78323
Min, max diff -812, 1812 µs; cumulative diff 5990 µs; overruns 1;
calls 79381
Min, max diff -635, 371 µs; cumulative diff 5990 µs; overruns 0;
calls 80441
Min, max diff -536, 540 µs; cumulative diff 6000 µs; overruns 0;
calls 81500
Min, max diff -855, 1229 µs; cumulative diff 6988 µs; overruns 1;
calls 82558
Min, max diff -305, 435 µs; cumulative diff 7002 µs; overruns 0;
calls 83618
Min, max diff -293, 304 µs; cumulative diff 6995 µs; overruns 0;
calls 84677
Min, max diff -644, 647 µs; cumulative diff 6993 µs; overruns 0;
calls 85737
# ulimit -r 1
# ./rt_test
Setting SCHED priority 1
CLOCK_REALTIME resolution: 1 ns
Establishing handler for signal 34
timer ID is 0x1236010
...
Min, max diff -24, 36 µs; cumulative diff 2 µs; overruns 0; calls 35173
Min, max diff -15, 20 µs; cumulative diff -3 µs; overruns 0; calls
36178
Min, max diff -30, 24 µs; cumulative diff -3 µs; overruns 0; calls
37183
Min, max diff -17, 14 µs; cumulative diff -1 µs; overruns 0; calls
38187
Min, max diff -27, 22 µs; cumulative diff -4 µs; overruns 0; calls
39192
Min, max diff -68, 80 µs; cumulative diff 2 µs; overruns 0; calls 40199
Min, max diff -26, 32 µs; cumulative diff 0 µs; overruns 0; calls 41204
Min, max diff -23, 29 µs; cumulative diff -3 µs; overruns 0; calls
42209
Min, max diff -29, 28 µs; cumulative diff 3 µs; overruns 0; calls 43215
Min, max diff -24, 25 µs; cumulative diff -3 µs; overruns 0; calls
44220
Min, max diff -26, 28 µs; cumulative diff 2 µs; overruns 0; calls 45225
Min, max diff -19, 25 µs; cumulative diff -3 µs; overruns 0; calls
46231
Min, max diff -47, 57 µs; cumulative diff -1 µs; overruns 0; calls
48242
It looks good enough for the servo thread, at least on my computer!
Compile with
$ gcc -Wall -lrt rt_test.c -o rt_test
---
/*
* Latency/precision test program for POSIX timers under Linux
* real-time scheduling (SCHED_FIFO).
*
* Based on the sample code in 'man 2 timer_create',
* copyright (c) 2009 Linux Foundation, written by Michael Kerrisk
* https://www.kernel.org/doc/man-pages/licenses.html#verbatim
*
*/
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <time.h>
#include <errno.h>
#define CLOCKID CLOCK_REALTIME
#define SIG SIGRTMIN
#define PERIOD 1000000
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
static inline int min(int a, int b)
{
return (a < b) ? a : b;
}
static inline int max(int a, int b)
{
return (a > b) ? a : b;
}
/* Global variable to keep track of latency statistics etc. */
static struct {
int mindiff; // Maximum negative deviation
int maxdiff; // Maximum positive deviation
int cumdiff; // Cumulative deviation from first call
int calls; // Total number of calls made
int overruns; // Number of completely missed calls
int last_ns;
} period_info;
static void print_clockinfo()
{
struct timespec ts;
if (clock_getres(CLOCK_REALTIME, &ts) != 0) {
printf("Could not get clock info.\n");
return;
}
printf("CLOCK_REALTIME resolution: %ld ns\n", ts.tv_nsec);
}
/*
* Print statistics. Called from main loop.
*/
static void print_diffs()
{
printf("Min, max diff %5d, %5d µs; cumulative diff %d µs; overruns %d;
calls %d\n",
period_info.mindiff / 1000,
period_info.maxdiff / 1000,
period_info.cumdiff / 1000,
period_info.overruns,
period_info.calls);
period_info.mindiff = 0;
period_info.maxdiff = 0;
period_info.overruns = 0;
}
static struct sigaction sa;
/*
* Signal handler. Called every PERIOD ns.
*/
static void handler(int sig, siginfo_t *si, void *uc)
{
timer_t *tidp;
struct timespec ts;
int overruns;
int diff;
period_info.calls += 1;
tidp = si->si_value.sival_ptr;
overruns = timer_getoverrun(*tidp);
if (overruns > 0)
period_info.overruns += overruns;
clock_gettime(CLOCKID, &ts);
if (period_info.last_ns < 0)
// First call
period_info.last_ns = ts.tv_nsec - PERIOD;
else if (ts.tv_nsec < period_info.last_ns)
// Seconds wrapped
period_info.last_ns -= 1000000000;
diff = ts.tv_nsec - period_info.last_ns - PERIOD;
period_info.mindiff = min(period_info.mindiff, diff);
period_info.maxdiff = max(period_info.maxdiff, diff);
period_info.cumdiff += diff;
period_info.last_ns = ts.tv_nsec;
}
/*
* Signal delivery causes system/library calls to return prematurely.
* See 'man 7 signal' for details. This function repeatedly call
* nanosleep() until (at least) 'sec' seconds has passed.
*/
static void sigsafe_sleep(int sec)
{
struct timespec rqtp, rmtp;
struct timespec *ts_req, *ts_rem, *ts_tmp;
int res;
rqtp.tv_sec = sec;
rqtp.tv_nsec = 0;
ts_req = &rqtp;
ts_rem = &rmtp;
res = nanosleep(ts_req, ts_rem);
while (res < 0 && errno == EINTR) {
ts_tmp = ts_req;
ts_req = ts_rem;
ts_rem = ts_tmp;
res = nanosleep(ts_req, ts_rem);
}
}
int main(int argc, char *argv[])
{
struct sched_param sched_param;
timer_t timerid;
struct sigevent sev;
struct itimerspec its;
int res;
const int sched_policy = SCHED_FIFO;
// Scheduler
sched_param.sched_priority = sched_get_priority_min(sched_policy);
printf("Setting SCHED priority %d\n", sched_param.sched_priority);
res = sched_setscheduler(0, sched_policy, &sched_param);
if (res < 0)
perror("Could not set scheduler");
// Some init
print_clockinfo();
period_info.mindiff = 0;
period_info.maxdiff = 0;
period_info.cumdiff = 0;
period_info.last_ns = -1;
// Establish handler for timer signal
printf("Establishing handler for signal %d\n", SIG);
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
if (sigaction(SIG, &sa, NULL) == -1)
errExit("sigaction");
// Create the timer
sev.sigev_notify = SIGEV_SIGNAL;
sev.sigev_signo = SIG;
sev.sigev_value.sival_ptr = &timerid;
if (timer_create(CLOCKID, &sev, &timerid) == -1)
errExit("timer_create");
printf("timer ID is 0x%lx\n", (long) timerid);
// Start the timer
its.it_value.tv_sec = 0;
its.it_value.tv_nsec = PERIOD;
its.it_interval.tv_sec = its.it_value.tv_sec;
its.it_interval.tv_nsec = its.it_value.tv_nsec;
if (timer_settime(timerid, 0, &its, NULL) == -1)
errExit("timer_settime");
while (1) {
sigsafe_sleep(1);
print_diffs();
}
// Not reached
exit(EXIT_SUCCESS);
}
---
--
Arvid
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
http://p.sf.net/sfu/appdyn_d2d_mar
_______________________________________________
Emc-developers mailing list
https://lists.sourceforge.net/lists/listinfo/emc-developers

Arvid Brodin

2013-03-20 15:03:56 UTC

Permalink

Post by Michael Haberler
Hi Arvid,
I got curious and tried, also on 2.6.32-122-rtai - it turns out it has a significant improvement on rtai too, so that feature might go back even longer than 3.3.
I do see potential use for this, it would be in the plain kernel support of what is now 'sim mode' (aka posix threads, which are not RT-hardened wrt latency in any way)
what we currently do is: Posix threads are used for for both 'vanilla' and 'RT PREEMPT' RTOS support; in case an RT PREEMPT kernel is detected its facilities are used (I'd need to read up on the specifics).
If not, it falls back to plain scheduling, and thats where your suggestion could come it - it could improve RT latency when running on a vanilla kernel.
http://git.mah.priv.at/gitweb?p=emc2-dev.git;a=blob;f=src/rtapi/rt-preempt-user.c;h=d46994b172b030b98d601eba1a23707531794ed8;hb=refs/heads/rtos-integration-preview3

It seems like that code is already using SCHED_FIFO. Look at line 349 -
if SCHED_DEADLINE cannot be set, it fallbacks to SCHED_FIFO. I _think_
it works like this: if you have an RT_PREEMPTion patched kernel, you get
deterministic behaviour with SCHED_FIFO/SCHED_RR. If not, you just get
"lower latency".

See e.g.
https://wiki-ext.aps.anl.gov/epics/index.php/How_To_Use_Posix_Thread_Priority_Scheduling_under_Linux
https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO#A_Realtime_.22Hello_World.22_Example

(Under "Benchmarking" in the second link, there's a link to another
latency test program, BTW...)

Now, I've just started looking at these things (both Linux realtime and
LinuxCNC), so please correct me if I'm wrong. :)

--
Arvid

Arvid Brodin

2013-03-20 15:41:44 UTC

Permalink

Post by Arvid Brodin
Hello,
Just out of curiosity, I wrote this program to test the latency using a
vanilla 3.4 kernel's "real-time scheduler". Maybe it is of interest to
someone on this list.

I just ran this on a Gentoo system with a radeon card, running linux-3.4
and an up-to-date user space. Although in general the latencies were
much lower running under SCHED_FIFO, I still got latencies up to almost
2000 µs while browsing the web. So not very usable...

The first system I ran the test on (and which got latencies in the +/-80
µs range) was an Intel Core i3 with Intel HD graphics.

--
Arvid

Michael Haberler

2013-03-20 16:07:00 UTC

Permalink

Post by Arvid Brodin

Post by Arvid Brodin
Hello,
Just out of curiosity, I wrote this program to test the latency using a
vanilla 3.4 kernel's "real-time scheduler". Maybe it is of interest to
someone on this list.

it's not necessarily the scheduler

all tricks of the trade, and rules of reducing latency still apply - just because you're running on a vanilla kernel you cant skimp on that ;)

try booting with the 'idle=poll' or 'nohlt' option and see what changes (and feel the room temperature rising ;)

there are sources of latency like recovery from idle/powersave states which regulate down the onboard SMPS for the CPU; that needs time to get back up to full speed voltage given the large bypass capacitors; also some clock generation might go through a phase-locked loop and that wont kick into high gear instantaneous either

- Michael

here's an intro to every trick hardware manufacturers came up with over time wrt power saving states:

http://www.hardwaresecrets.com/article/Everything-You-Need-to-Know-About-the-CPU-C-States-Power-Saving-Modes/611/2

Post by Arvid Brodin
The first system I ran the test on (and which got latencies in the +/-80
µs range) was an Intel Core i3 with Intel HD graphics.
--
Arvid
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
http://p.sf.net/sfu/appdyn_d2d_mar
_______________________________________________
Emc-developers mailing list
https://lists.sourceforge.net/lists/listinfo/emc-developers