Cleanup

The weakref module has a wonderful mechanism for performing cleanup actions: finalizer objects. The idea is to register a cleanup function which will be called when some object is garbage collected, or when the application exits, without worrying about the lifetime of the function itself. I’ve used finalizer objects as a replacement for the ordinary atexit module, but they have two drawbacks:

  1. they tie the function call to the existence of an object: global cleanups require a global application object;
  2. you can’t easily call all the cleanup functions before atexit handlers run.
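
For reference, here is a minimal sketch of how a finalizer object behaves. The Resource class, the release function and the log list are hypothetical names used only for illustration; weakref.finalize and its alive attribute are part of the standard library.

```python
import weakref


class Resource:
    """Hypothetical object whose lifetime drives the cleanup."""


def release(name: str, log: list) -> None:
    log.append(f"released {name}")


log = []
res = Resource()

# The finalizer keeps `release` and its arguments alive on its own; it fires
# when `res` is garbage collected or, by default, at interpreter exit.
fin = weakref.finalize(res, release, "resource", log)
print(fin.alive)   # True: not called yet

fin()              # a finalizer may also be called explicitly...
fin()              # ...but it runs at most once
print(fin.alive)   # False
print(log)         # ['released resource']
```

Note that the finalizer needs the res object to exist in the first place, which is the first drawback listed above.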

The latter becomes painful if we’re dealing with non-daemonic threads. (sidenote: Daemon threads borrow their name from Unix daemon processes. They are threads which run in the background and aren’t automatically joined by Python when the application exits. They are a kind of run-and-forget threads.) If we can tell a thread that it should stop, then we should do so before we try to join it (either manually or automatically); otherwise we risk unnecessarily delaying application exit. Timers are a good example: if we exit an application while a timer is still waiting, then joining its thread means waiting out the remaining time and then running the underlying action anyway. When the application is quitting, it would be preferable to cancel the timer first, but there’s no way to do that with the weakref or atexit modules, because in CPython their handlers run only after non-daemonic threads have been joined. (sidenote: The CPython implementation of the threading module provides an undocumented _register_atexit function for the purpose of running cleanup actions before non-daemonic threads are joined.)
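
The difference that cancelling makes can be seen in a small sketch. The 5-second interval and the fired list are arbitrary choices for illustration; the point is only that join() returns promptly once the timer has been cancelled.

```python
import threading
import time

fired = []

# Hypothetical long-running timer; 5 seconds stands in for any long wait.
timer = threading.Timer(5.0, fired.append, args=("action ran",))
timer.start()

start = time.monotonic()
timer.cancel()   # tell the timer thread to stop waiting
timer.join()     # returns almost immediately instead of after ~5 seconds
elapsed = time.monotonic() - start

print(f"joined after {elapsed:.2f}s, fired={fired}")
```

Without the cancel() call, join() would block for the full interval and the action would still run.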

Custom Solution

Inspired by weakref’s finalizer objects, I wrote a custom class for lifetime management, which is part of one of my projects, kpsh (the implementation and tests are available under the terms of the GPL3 or later license).

The usage is simple: you can register any cleanup function by passing it and its arguments to the cleanup() call. With nothing else done, registered functions are called in reverse order of registration when the application quits, the same as with the atexit module.

Calling cleanup returns a proxy object. If you call it without any arguments, it will run the stored function immediately, and that function won’t be called again later. You may also cancel() the cleanup.

You may call all remaining cleanup actions (in reverse order) at any time by running cleanup.run_all(). You can also wrap blocks of code in a context manager cleanup.clean_on_exit(), in which case run_all() will be automatically called when you exit this block of code. You can then register more cleanup actions for later use.

import sys
from cleanup import cleanup


def print_on_exit(s: str, **kw):
    print(f"Cleanup: {s}", **kw)


def recursive_cleanup():
    cleanup(print_on_exit, "Recursive cleanup")


cleanup(print_on_exit, "Some cleanup")
cleanup(print_on_exit, "Cleanup on stderr", file=sys.stderr)
cleanup(recursive_cleanup)
c = cleanup(print_on_exit, "Cleanup which will be cancelled")
p = cleanup(print_on_exit, "Preemptive cleanup")

print("Application code")

c.cancel()
p()

cleanup.run_all()

print("-------------------------------------")

try:
    with cleanup.clean_on_exit():
        cleanup(print_on_exit, "Cleanup even though there's an exception")
        assert False
except AssertionError:
    print("Exception handler")

print("-------------------------------------")

cleanup(print_on_exit, "This will be called AFTER the traceback")
assert False, "The End"

The output of the above code is:

Application code
Cleanup: Preemptive cleanup
Cleanup: Recursive cleanup
Cleanup: Cleanup on stderr
Cleanup: Some cleanup
-------------------------------------
Cleanup: Cleanup even though there's an exception
Exception handler
-------------------------------------
Traceback (most recent call last):
  File "/home/mgoral/temp/cl/./bla.py", line 41, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mgoral/temp/cl/./bla.py", line 39, in main
    assert False, "The End"
AssertionError: The End
Cleanup: This will be called AFTER the traceback

Source Code

# SPDX-License-Identifier: GPL-3.0-or-later
# Copyright (C) 2024 Michał Góral.

import sys
import atexit
import itertools
from threading import RLock
from contextlib import contextmanager


# Rough idea for function registry stolen from weakref.finalize implementation
class cleanup:
    __slots__ = ()
    _index_iter = itertools.count()
    _registry = {}
    _is_atexit = False
    lock = RLock()

    class _Info:
        __slots__ = ("fn", "args", "kwargs", "index")

    def __init__(self, fn, *args, **kwargs):
        if not self._is_atexit:
            atexit.register(self.run_all)
            cleanup._is_atexit = True

        info = self._Info()
        info.fn = fn
        info.args = args
        info.kwargs = kwargs
        info.index = next(self._index_iter)
        with self.lock:
            self._registry[self] = info

    def __call__(self):
        with self.lock:
            info = self._registry.pop(self, None)
        if info:
            return info.fn(*info.args, **info.kwargs)
        return None

    def cancel(self):
        with self.lock:
            self._registry.pop(self, None)

    @classmethod
    @contextmanager
    def clean_on_exit(cls):
        try:
            yield
        finally:
            cls.run_all()

    @classmethod
    def run_all(cls):
        try:
            # theoretically cleanup actions may create new cleanups by
            # themselves, so we must handle this in an infinite loop
            while True:
                cleanups = cls._get_cleanups()
                if not cleanups:
                    break

                cl = cleanups.pop()
                try:
                    cl()
                except Exception:
                    sys.excepthook(*sys.exc_info())  # show exception on stderr

            with cls.lock:
                assert len(cls._registry) == 0
        finally:
            atexit.unregister(cls.run_all)
            cls._is_atexit = False

    @classmethod
    def _get_cleanups(cls):
        with cls.lock:
            lst = list(cls._registry.items())
        lst.sort(key=lambda elem: elem[1].index)  # newest last; list.pop() takes it first
        return [cl for cl, _ in lst]  # force retrieval of info via cleanup object