Problem with using time(2) as a time source

The problem

POE, like a lot of software, expects time(2) to return a stable and monotonic view of time. That is, if time(2) is called a second later, it should return the previous value plus one. This is guaranteed nowhere. The system time can be changed, either manually by an admin or automatically by ntpd or a similar tool. A hybrid case is a VM that is suspended and then resumed.

Note that this doesn't normally happen often. But a daemon that runs 24/7 is going to see it eventually, and it needs to react intelligently.

The symptoms

In POE, a delay is set 5 seconds hence. Ntpd sets the system time back 2 seconds. The delay event will be delivered in 7 seconds.

Conversely, a delay is set 5 seconds hence. Ntpd sets the system time forward 2 seconds. The delay event will be delivered in either 5 or 3 seconds, depending on other activity in the POE kernel.

This is less of a problem for alarms; they are absolute. An alarm set for "5:30 am on Thursday" will be delivered at that time, unless time jumps forwards by a large amount.

The cause

At its heart, the POE kernel tracks a list of file handles and one timer. When no events are active, the kernel waits for activity on the file handles or for the timer to expire. The kernel also holds a queue of future events, ordered by due time, where each due time is a time(2) value. The timer is calculated by subtracting the current time(2) from the due time of the next event in the queue. But time(2) might have changed unpredictably since the event was added to the queue.

This is why there are 2 possible results if the time moves forwards. If the kernel detects activity on a file handle, it will recalculate the timer. In the scenario above, the kernel sees activity after 2 seconds, calls the related events, then recalculates the timeout, which is now 1 second and not 3 seconds.
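The arithmetic above can be checked with a small simulation. This is only a sketch in Python, not POE's actual code: the function simulate_delay, its parameters and the starting wall-clock value of 1000 are all made up for illustration. It models a loop that sleeps for "due time minus current wall clock" and recalculates whenever it wakes up.

```python
def simulate_delay(delay, jump, jump_at, activity_at=None):
    """Return the real seconds that pass before a delayed event is delivered.

    delay       -- requested delay in seconds
    jump        -- wall clock adjustment in seconds (negative means set back)
    jump_at     -- real time at which the adjustment happens
    activity_at -- real time of file handle activity, or None for an idle loop
    """
    real = 0.0      # true elapsed time; the loop never sees this directly
    offset = 0.0    # accumulated wall clock adjustment
    jumped = False

    def wall():
        # Hypothetical wall clock: arbitrary start value plus adjustments.
        return 1000.0 + real + offset

    due = wall() + delay            # due time is stored as a wall clock value
    while True:
        timeout = max(0.0, due - wall())
        wake = real + timeout
        # File handle activity wakes the loop before the timer expires.
        if activity_at is not None and real < activity_at < wake:
            wake = activity_at
        # The clock adjustment lands somewhere during this sleep.
        if not jumped and jump_at <= wake:
            offset += jump
            jumped = True
        real = wake
        if wall() >= due:
            return real             # event delivered


print(simulate_delay(5, -2, 1))                  # clock set back 2 seconds
print(simulate_delay(5, +2, 1))                  # clock set forward, idle loop
print(simulate_delay(5, +2, 1, activity_at=2))   # clock set forward, activity
```

The three calls correspond to the three cases described in the text: a late delivery after the clock is set back, an on-time delivery when the loop stays idle, and an early delivery when file handle activity forces the timeout to be recalculated.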

Solutions

Use POSIX::RT::Timer.
Use the underlying event loop's timer mechanism, if it is reliable (libev).
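The idea common to both options is to measure delays against a clock that is not affected by changes to the system time. The sketch below illustrates that idea in Python with time.monotonic(). It does not use POSIX::RT::Timer or libev, and the DelayQueue class and its method names are hypothetical, not part of POE.

```python
import heapq
import itertools
import time


class DelayQueue:
    """Queue of delayed events keyed on a monotonic clock (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock             # injectable so it can be faked in tests
        self._heap = []                 # entries are (due, sequence, name)
        self._seq = itertools.count()   # tie-breaker for equal due times

    def delay(self, seconds, name):
        # The due time is a monotonic reading, never a time(2) value.
        due = self._clock() + seconds
        heapq.heappush(self._heap, (due, next(self._seq), name))

    def timeout(self):
        """Seconds to wait for the next event, or None if the queue is empty."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - self._clock())

    def pop_due(self):
        """Remove and return the names of all events that are now due."""
        now = self._clock()
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


queue = DelayQueue()
queue.delay(5, "tick")
print(queue.timeout())   # close to 5, whatever happens to the system time
```

Because neither the due time nor the timeout is derived from time(2), an adjustment by ntpd or an admin changes neither of them. Alarms are a different matter: they are absolute, so they would still have to be compared against the wall clock.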
