Conversion Descriptor

All programs accept two types of input: input from outside the program (user input, input from other programs) and input from inside the program (programmer input: source code). Both are error-prone and both should be validated, preferably as soon as they appear in the program. Don’t store them until they have been validated and converted to an internal data type.

A feature of statically typed languages, like Rust or C++, is that they validate programmer input and rule out a lot of mistakes. Not having to worry about data types is a serious relief. Python is a dynamically typed language, but since version 3.5 it has supported type annotations (sidenote: Or: type hints.), which allow some degree of type (and thus data) validation via static checkers like mypy. For example, this code produces a mypy error:

foo: int = "11"

While working on ledgerify, my work-in-progress approach to importing data into ledger and hledger, I created a descriptor which automatically converts any assigned value to the annotated data type. This frees the programmer’s mind and is a convenience for the end user, who can, for example, type a date-like string which gets automatically converted to a datetime object.
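
To illustrate the idea before getting to the real thing, here is a minimal sketch of a converting descriptor. It is not the Conversion class described in this article: the names Converted and Entry, and the explicit target type argument, are made up for this illustration only.

```python
from datetime import datetime


class Converted:
    """Hypothetical, stripped-down converting descriptor (illustration only)."""

    def __init__(self, tp, conv):
        self._tp = tp      # type the stored value should have
        self._conv = conv  # callable used to convert anything else

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        # Convert only values which aren't of the desired type yet
        if not isinstance(value, self._tp):
            value = self._conv(value)
        setattr(obj, self._name, value)


class Entry:
    date = Converted(datetime, datetime.fromisoformat)


e = Entry()
e.date = "2024-03-01"
print(repr(e.date))  # datetime.datetime(2024, 3, 1, 0, 0)
```

The real descriptor differs mainly in where the target type comes from: it reads it from the field’s annotation instead of taking it as an argument.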

To use Conversion, create a type-annotated class field and assign a Conversion descriptor to it. The field should be annotated as Conversion[T], with T being the actual type to which data should be converted. (sidenote: Incidentally, this form of type annotation is, in my opinion, the correct one for any descriptor. Some people lie about their types and write e.g. foo: int = Descriptor(), but the field foo isn’t an integer. It’s a descriptor. Descriptor[int] gives a clear message: you’re dealing with a descriptor which wraps an integer.) The type annotation is necessary because the descriptor has to detect whether a value is already converted: many converter functions, often just bare types, would break if they were handed an already-converted value.
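
How a descriptor might find out which T it was annotated with can be sketched with the standard typing helpers. Wrapper and Config are hypothetical names used only in this example.

```python
import typing
from typing import Generic, TypeVar

T = TypeVar("T")


class Wrapper(Generic[T]):
    """Hypothetical stand-in for a generic descriptor."""

    def __set_name__(self, owner, name):
        self.name = name


class Config:
    port: Wrapper[int] = Wrapper()


# Annotations of the class, keyed by field name
hints = typing.get_type_hints(Config)

# Wrapper[int] carries a single type argument, so unpack it
(target,) = typing.get_args(hints["port"])
print(target)  # <class 'int'>
```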

Conversion works well with dataclasses, which use a special __get__() call to determine whether a descriptor supplies a default value or not. It also provides a default_factory, to avoid the same pitfall with mutable defaults that dataclasses guard against. (sidenote: If the descriptor is not part of a dataclass, then default and default_factory do nothing, because they rely on that dataclass-specific call.)
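
That dataclass-specific call can be observed with a small, self-contained sketch. IntField and Point are hypothetical names; the only thing taken from dataclasses itself is the documented behaviour of calling __get__(None, cls) and treating a raised exception as "no default".

```python
from dataclasses import MISSING, dataclass, fields

_NO_DEFAULT = object()


class IntField:
    """Hypothetical descriptor which stores int(value) under a private name."""

    def __init__(self, default=_NO_DEFAULT):
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, cls=None):
        if obj is None:
            # Class-level access: this is what dataclasses inspects
            if self._default is _NO_DEFAULT:
                raise AttributeError("no default")
            return self._default
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))


@dataclass
class Point:
    x: IntField = IntField()            # required argument
    y: IntField = IntField(default=42)  # optional argument


p = Point("1")
print(p)  # Point(x=1, y=42)
```

The generated __init__() assigns through the descriptor, which is why the string "1" ends up stored as an integer.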

There are two kinds of validation: runtime validation and static validation. The descriptor is heavily annotated, so mypy catches a lot of type errors. However, Python is dynamically typed, so we can’t check everything statically. The Conversion descriptor tries to be as helpful as possible and checks for a lot of common pitfalls at runtime, like a converter function which doesn’t convert, or a missing or incorrect annotation.

The last one is actually important, because it’s a limitation of Conversion: how can an annotation be incorrect and not caught by mypy? Conversion[T] requires that T is an actual type, not a type alias or any of the constructs the typing module provides (apart from Optional). So, it should be Conversion[list] instead of Conversion[list[int]] or Conversion[Iterable[int]]. The descriptor compares the assigned value against this type and skips conversion altogether when the types match. This isn’t a mere convenience or performance improvement: it’s necessary for converters which can’t accept their own output (like split_list in the examples below, which expects a string and fails when handed a list).
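
The restriction comes straight from isinstance(), which rejects parameterized generics. A small probe shows which annotations would pass; the helper name usable_with_isinstance is invented for this example.

```python
import typing


def usable_with_isinstance(tp) -> bool:
    """Return True if tp can be passed as the second argument of isinstance()."""
    try:
        isinstance(None, tp)
    except TypeError:
        return False
    return True


print(usable_with_isinstance(list))                  # True
print(usable_with_isinstance((int, str)))            # True
print(usable_with_isinstance(list[int]))             # False
print(usable_with_isinstance(typing.Iterable[int]))  # False
```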

Usage Examples

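from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Conversion is the descriptor listed under "Descriptor Source Code" below.


def split_list(val: str) -> list:
    return [v.strip() for v in val.split(",")]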


@dataclass
class Foo:
    arg: Conversion[int] = Conversion(int)
    arg2: Conversion[Optional[int]] = Conversion(int)

    dec: Conversion[Decimal] = Conversion(Decimal, default=Decimal(11))
    list1: Conversion[list] = Conversion(split_list, default_factory=list)

    dec2: Conversion[Optional[Decimal]] = Conversion(Decimal, default=None)
    list2: Conversion[list | None] = Conversion(split_list, default_factory=list)

    # mypy error: default isn't int
    # err: Conversion[int] = Conversion(int, default="11")

    # mypy error: default isn't int
    # runtime error: conversion function doesn't convert to int
    # err: Conversion[int] = Conversion(str, default="11")

    # mypy error: no overload for both default and default_factory
    # runtime error: disallowed use of both default and default_factory
    # err: Conversion[list] = Conversion(split_list, default=[], default_factory=list)

    # runtime error: missing mandatory annotation
    # err = Conversion(Decimal, default=Decimal(11))

    # runtime error: invalid type annotation
    # err: Conversion[list[int]] = Conversion(split_list, default_factory=list)
    # err: Conversion[List[int]] = Conversion(split_list, default_factory=list)

Let’s set some data:

f = Foo("123", 111)
f.arg2 = None
f.list1 = "Hello, world!"
f.list2 = None

f.dec = "11.23"

try:
    f.dec = None
except ValueError:
    print("Couldn't unset f.dec.")

print(f)

The output is:

Couldn't unset f.dec.
Foo(arg=123, arg2=None, dec=Decimal('11.23'), list1=['Hello', 'world!'], dec2=None, list2=None)

Descriptor Source Code

# SPDX-License-Identifier: GPL-3.0
# Copyright (C) 2024 Michał Góral.

import typing
from typing import Generic, TypeVar, Optional, Callable, Any, Type, overload

T = TypeVar("T")
Converter = Callable[[Any], T]
Factory = Callable[[], T]


# A sentinel which forces creating a new instance of default_factory
class _CreateDefaultFactory:
    pass


_CREATE_DEFAULT_FACTORY = _CreateDefaultFactory()


# A sentinel which indicates that no default/default_factory was given for a member
class _MissingArgument:
    pass


_MISSING = _MissingArgument()


class Conversion(Generic[T]):
    @overload
    def __init__(self, conv: Converter) -> None:
        ...

    @overload
    def __init__(self, conv: Converter, *, default: T) -> None:
        ...

    @overload
    def __init__(self, conv: Converter, *, default_factory: Factory) -> None:
        ...

    def __init__(self, conv, *, default=_MISSING, default_factory=_MISSING):
        if default is not _MISSING and default_factory is not _MISSING:
            raise ValueError("cannot specify both default and default_factory")

        self._conv = conv
        self._default = default
        self._default_factory = default_factory
        self._name: str = ""
        self._pubname: str = ""
        self._tp = None

    def __set_name__(self, cls, name: str):
        self._pubname = name
        self._name = "_" + name

    def __get__(self, obj, cls=None):
        # dataclasses determines the default value by calling
        # descriptor.__get__(None, cls)
        if obj is None:
            if self._default is not _MISSING:
                return self._default
            if self._default_factory is not _MISSING:
                return _CREATE_DEFAULT_FACTORY
            raise AttributeError("no default")

        return getattr(obj, self._name)

    def __set__(self, obj, value):
        if value is _CREATE_DEFAULT_FACTORY and self._default_factory is not _MISSING:
            value = self._default_factory()

        # Don't convert values which already match desired type
        if not isinstance(value, self._target_type(obj)):
            try:
                converted = self._conv(value)
            except Exception as e:
                raise ValueError(
                    f"conversion error for '{self._pubname}': {value}"
                ) from e

            # Checked outside of the try block, so this error isn't swallowed
            # and re-raised as a generic conversion error
            if not isinstance(converted, self._target_type(obj)):
                raise TypeError(
                    f"unexpected type after conversion for '{self._pubname}': {type(converted)}"
                )
            value = converted

        setattr(obj, self._name, value)

    def _target_type(self, obj) -> Type:
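        if self._tp is None:
            # Resolve hints on the class rather than the instance, so that
            # inherited annotations are found as well
            hints = typing.get_type_hints(type(obj))
            cls = type(obj).__name__
            assert (
                self._pubname in hints
            ), f"missing mandatory type annotation for {cls}.{self._pubname}"

            # Conversion[T] carries exactly one type argument: unpack T
            myhint = hints[self._pubname]
            (tp,) = typing.get_args(myhint)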

            # duck-typing validate that annotated type is suitable for
            # isinstance (only types, tuples of types and Unions are)
            try:
                isinstance(None, tp)
            except TypeError as e:
                raise TypeError(
                    f"invalid type annotation for {cls}.{self._pubname}: {myhint}"
                ) from e

            self._tp = tp

        assert self._tp is not None
        return self._tp