Timeout Anything
Sometimes blocking operations take a long time to complete and we need a way to break them and investigate what’s going on. Unfortunately, not all of them support timing out and returning control to the program. There are many ways to deal with this situation, like using a separate thread or an async library. If the blocking operation involves waiting for a file descriptor to be ready, we could wrap it in a select() or poll() call. These solutions are fine, but unnecessarily complex and might bite us when we want to add other features to our program.
Let’s write a simple filter: a program which runs a subprocess, reads its piped output, and converts it to something else. In Python we can write it like this:
import sys
import subprocess

def process_line(s: str) -> str:
    ...  # skip the logic for simplicity

cmd = ["tail", "-f", "myfile"]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

while True:
    line = proc.stdout.readline()
    print(process_line(line.decode()), flush=True)
This program has a problem: the call to proc.stdout.readline() might block indefinitely. readline() only returns when it reaches a newline character or when the subprocess sends EOF (end-of-file), typically when closing its standard output. But a process can finish abruptly, for example when killed with SIGKILL. In such a case it won’t have an opportunity to cleanly close its file descriptors, readline() will block forever, and our program will hang. It would be good to check from time to time whether there’s still a subprocess to read from.
We need a background timeout for the readline() call, but we have a synchronous, single-threaded application. How do we add one? The answer is: use standard OS signals, specifically SIGALRM.
With the signal.alarm() function we can arrange for a SIGALRM signal to be delivered once a configured time (in seconds) passes. If at the same time we install a signal handler which raises TimeoutError when SIGALRM occurs, we end up with a very simple and robust way to set up a timeout for any function.
To simplify its use, let’s put it into a context manager. It’ll set up and clear the alarm automatically whenever we enter and exit the with block.
import signal
from contextlib import contextmanager

@contextmanager
def timeout(secs: int):
    def _handler(signum, frame):
        raise TimeoutError("Timeout expired")

    assert secs > 0, "timeout must be positive integer"
    curr = signal.alarm(0)
    if curr != 0:
        # restore previous alarm and fail setting the new one
        signal.alarm(curr)
        raise AssertionError("only one SIGALRM can be active at a time")
    signal.signal(signal.SIGALRM, _handler)
    signal.alarm(secs)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, signal.SIG_DFL)
From now on we can check every 5 seconds that the subprocess is alive and act accordingly if it isn’t. Let’s see how it works:
while True:
    with timeout(5):
        try:
            line = proc.stdout.readline()
        except TimeoutError:
            if proc.poll() is not None:
                print("Subprocess died", file=sys.stderr)
                sys.exit(1)
            continue
    if not line:  # EOF
        break
    print(process_line(line.decode()), flush=True)
This way we can add timeouts to any function, not only I/O-related ones. A caveat of this approach is that only one alarm can be scheduled at a time. We must be aware of this and not use alarms for other purposes, or at least not while a timeout clock is ticking. Specifically, we mustn’t nest timeouts.
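The single-alarm limitation comes straight from the OS: each process has one alarm clock, and signal.alarm() replaces any pending alarm, returning the seconds that were left on it. A quick sketch:

```python
import signal

signal.alarm(30)              # schedule an alarm 30 seconds from now
remaining = signal.alarm(10)  # replaces it; returns time left on the old one
print(remaining)              # close to 30 -- the first alarm is simply gone
signal.alarm(0)               # cancel the second alarm too
```

This is exactly why the context manager above refuses to arm a new alarm while another is pending.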
The manual page for alarm() also notes that the sleep() function may be implemented using SIGALRM, so mixing calls to alarm() and sleep() might not be the best idea.