Conversion Descriptor
- python, descriptors, ledgerify
- 6
- 2
- finished
All programs accept two types of input: input from outside the program (user input, input from other programs) and input from inside the program (programmer input: source code). Both are error-prone and both should be validated, preferably as soon as they appear in the program. Don’t store them until they are validated and converted to the internal data type.
A feature of statically typed languages, like Rust or C++, is that they validate programmer input and disallow a lot of mistakes. Not having to worry about data types is a serious relief. Python is a dynamically typed language, but since version 3.5 it has included support for type annotations (sidenote: Or: type hints.), which allow some degree of type (and thus data) validation via static checkers like mypy. For example, this code produces a mypy error:
foo: int = "11"
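The flip side is that the interpreter itself never enforces annotations, which is why runtime conversion is still useful. A minimal sketch (the class name `Config` is made up for illustration):

```python
class Config:
    # mypy rejects this assignment, but the interpreter accepts it as-is.
    foo: int = "11"


# The annotation is only recorded; the stored value is still a string.
print(type(Config.foo).__name__)
print(Config.__annotations__["foo"])
```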
While working on ledgerify, which
is my work-in-progress approach to importing data into
ledger and hledger,
I’ve created a descriptor which automatically converts any assigned value to
the annotated data type. This frees the programmer’s mind and is a convenience
for the end user, who can, for example, type a date-like string which gets
automatically converted to a datetime
object.
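To make the idea concrete, here is a stripped-down sketch of a converting descriptor. It is not the real Conversion class: `DateTimeField` and `Transaction` are hypothetical names, the target type is hard-coded, and ISO-formatted input is assumed.

```python
from datetime import datetime


class DateTimeField:
    """Hypothetical, simplified descriptor (not the real Conversion class)."""

    def __set_name__(self, owner, name):
        # Store the converted value under a private name on the instance.
        self._name = "_" + name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        # Convert only when the value is not a datetime yet.
        if not isinstance(value, datetime):
            value = datetime.fromisoformat(value)
        setattr(obj, self._name, value)


class Transaction:
    when = DateTimeField()


t = Transaction()
t.when = "2024-03-01"          # a string goes in...
from_string = t.when           # ...and a datetime comes out
t.when = datetime(2024, 3, 2)  # already a datetime, stored unchanged
print(from_string, t.when)
```

Conversion generalizes this by taking the converter as an argument and reading the target type from the annotation instead of hard-coding it.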
To use it, just create a type-annotated class field and assign a Conversion
descriptor to it. The field should be annotated as Conversion[T], with T
being the actual type to which data should be converted.
(sidenote: Incidentally, this form of type annotation is, in my opinion, the correct one
for any descriptor. Some people lie about their types and write e.g.
foo: int = Descriptor(), but the field foo isn’t an integer. It’s a descriptor.
Descriptor[int] gives a clear message: you’re dealing with a descriptor which
wraps an integer.)
The type annotation is necessary because the descriptor has to detect
whether a value is already converted. Without that check, a lot of converter
functions, often just bare types, would break when handed an already-converted value.
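Reading T back out of a Conversion[T]-style annotation takes two calls from the typing module. A small sketch, with `Wrapper` and `Record` as hypothetical stand-ins for a generic descriptor class and its owner:

```python
import typing
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Wrapper(Generic[T]):
    """Hypothetical stand-in for a generic descriptor class."""


class Record:
    amount: Wrapper[int]
    note: Wrapper[Optional[str]]


# get_type_hints() returns the annotations, get_args() unpacks the [T] part.
hints = typing.get_type_hints(Record)
for name, hint in hints.items():
    (target,) = typing.get_args(hint)
    print(name, target)
```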
Conversion works well with dataclasses, which use a special __get__() call
to determine whether a descriptor supplies a default value. It also
provides default_factory to avoid the same pitfall with mutable defaults
that dataclasses guard against.
(sidenote: If the descriptor is not part of a dataclass,
then default and default_factory do nothing, because they rely on a
special dataclass-specific call.)
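The dataclass-specific call is plain class-level attribute access, which ends up as `__get__(None, cls)`. Here is a minimal sketch of both outcomes; `WithDefault`, `NoDefault` and `Point` are hypothetical names and the default of 0 is hard-coded for brevity:

```python
from dataclasses import dataclass


class WithDefault:
    """Hypothetical descriptor that reports a default of 0 to dataclasses."""

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, owner=None):
        if obj is None:
            # Class-level access: dataclasses takes this as the default.
            return 0
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))


class NoDefault(WithDefault):
    def __get__(self, obj, owner=None):
        if obj is None:
            # AttributeError tells dataclasses that the field is mandatory.
            raise AttributeError("no default")
        return getattr(obj, self._name)


@dataclass
class Point:
    x: NoDefault = NoDefault()
    y: WithDefault = WithDefault()


p = Point("3")
print(p.x, p.y)  # 3 0
```

Calling `Point()` without arguments fails with a TypeError, because x ends up as a required parameter of the generated `__init__`.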
There are two kinds of validation: runtime validation and static validation.
The descriptor is heavily annotated, so mypy catches a lot of type errors.
However, Python is dynamically typed, so we can’t check
everything statically. The Conversion descriptor tries to be as helpful as
possible and checks for a lot of common pitfalls at runtime, like a converter
function which doesn’t convert, or a missing or incorrect annotation.
The last one is actually important, because it’s a limitation of
Conversion: how can an annotation be incorrect and not be caught by mypy?
Conversion[T] requires that T is an actual type, not a type alias or
any other construct provided by the typing module (apart from Optional). So
it should be Conversion[list] instead of Conversion[list[int]] or
Conversion[Iterable[int]]. The descriptor compares the assigned value against this
type and skips conversion altogether when the types match. This isn’t a mere
convenience or performance improvement: it’s necessary for converters which
don’t accept values of their own target type (like split_list in the usage
examples below, which calls val.split() and would fail if handed a list).
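The limitation comes straight from isinstance(), which refuses parameterized generics. A quick way to check whether an annotation is usable (the helper name `usable_with_isinstance` is made up for this sketch; Python 3.10 or newer is assumed for the Optional case):

```python
from typing import List, Optional


def usable_with_isinstance(tp) -> bool:
    """Hypothetical helper: can tp be the second argument of isinstance()?"""
    try:
        isinstance(None, tp)
    except TypeError:
        return False
    return True


print(usable_with_isinstance(list))           # True
print(usable_with_isinstance(Optional[int]))  # True on Python 3.10+
print(usable_with_isinstance(list[int]))      # False
print(usable_with_isinstance(List[int]))      # False
```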
Usage Examples
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional

# Conversion is the descriptor from the source code section below.


def split_list(val: str) -> list:
    return [v.strip() for v in val.split(",")]


@dataclass
class Foo:
    arg: Conversion[int] = Conversion(int)
    arg2: Conversion[Optional[int]] = Conversion(int)
    dec: Conversion[Decimal] = Conversion(Decimal, default=Decimal(11))
    list1: Conversion[list] = Conversion(split_list, default_factory=list)
    dec2: Conversion[Optional[Decimal]] = Conversion(Decimal, default=None)
    list2: Conversion[list | None] = Conversion(split_list, default_factory=list)

    # mypy error: default isn't int
    # err: Conversion[int] = Conversion(int, default="11")

    # mypy error: default isn't int
    # runtime error: conversion function doesn't convert to int
    # err: Conversion[int] = Conversion(str, default="11")

    # mypy error: no overload for both default and default_factory
    # runtime error: disallowed use of both default and default_factory
    # err: Conversion[list] = Conversion(split_list, default=[], default_factory=list)

    # runtime error: missing mandatory annotation
    # err = Conversion(Decimal, default=Decimal(11))

    # runtime error: invalid type annotation
    # err: Conversion[list[int]] = Conversion(split_list, default_factory=list)
    # err: Conversion[List[int]] = Conversion(split_list, default_factory=list)
Let’s set some data:
f = Foo("123", 111)
f.arg2 = None
f.list1 = "Hello, world!"
f.list2 = None
f.dec = "11.23"

try:
    f.dec = None
except ValueError:
    print("Couldn't unset f.dec.")

print(f)
The output is:
Couldn't unset f.dec.
Foo(arg=123, arg2=None, dec=Decimal('11.23'), list1=['Hello', 'world!'], dec2=None, list2=None)
Descriptor Source Code
# SPDX-License-Identifier: GPL-3.0
# Copyright (C) 2024 Michał Góral.

import typing
from typing import Generic, TypeVar, Optional, Callable, Any, Type, overload

T = TypeVar("T")
Converter = Callable[[Any], T]
Factory = Callable[[], T]


# A sentinel which forces creating a new instance of default_factory
class _CreateDefaultFactory:
    pass


_CREATE_DEFAULT_FACTORY = _CreateDefaultFactory()


# A sentinel which indicates not existing default/default_factory for a member
class _MissingArgument:
    pass


_MISSING = _MissingArgument()
class Conversion(Generic[T]):
    @overload
    def __init__(self, conv: Converter) -> None:
        ...

    @overload
    def __init__(self, conv: Converter, *, default: T) -> None:
        ...

    @overload
    def __init__(self, conv: Converter, *, default_factory: Factory) -> None:
        ...

    def __init__(self, conv, *, default=_MISSING, default_factory=_MISSING):
        if default is not _MISSING and default_factory is not _MISSING:
            raise ValueError("cannot specify both default and default_factory")

        self._conv = conv
        self._default = default
        self._default_factory = default_factory
        self._name: str = ""
        self._pubname: str = ""
        self._tp = None

    def __set_name__(self, cls, name: str):
        self._pubname = name
        self._name = "_" + name

    def __get__(self, obj, cls=None):
        # dataclasses determines default value by calling
        # descriptor.__get__(obj=None, tp=cls)
        if obj is None:
            if self._default is not _MISSING:
                return self._default
            if self._default_factory is not _MISSING:
                return _CREATE_DEFAULT_FACTORY
            raise AttributeError("no default")
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        # The sentinel is only handed out when a default_factory exists
        if value is _CREATE_DEFAULT_FACTORY and self._default_factory is not _MISSING:
            value = self._default_factory()

        # Don't convert values which already match desired type
        if not isinstance(value, self._target_type(obj)):
            try:
                value = self._conv(value)
                if not isinstance(value, self._target_type(obj)):
                    raise TypeError(
                        f"unexpected type after conversion for '{self._pubname}': {type(value)}"
                    )
            except Exception as e:
                # Report every conversion failure as ValueError, keeping the cause
                raise ValueError(
                    f"conversion error for '{self._pubname}': {value}"
                ) from e
        setattr(obj, self._name, value)

    def _target_type(self, obj) -> Type:
        if self._tp is None:
            hints = typing.get_type_hints(obj)
            cls = type(obj).__name__
            assert (
                self._pubname in hints
            ), f"missing mandatory type annotation for {cls}.{self._pubname}"

            # automatically extract a tuple
            myhint = hints[self._pubname]
            (tp,) = typing.get_args(myhint)

            # duck-typing validate that annotated type is suitable for
            # isinstance (only types, tuples of types and Unions are)
            try:
                isinstance(None, tp)
            except TypeError as e:
                raise TypeError(
                    f"invalid type annotation for {cls}.{self._pubname}: {myhint}"
                ) from e
            self._tp = tp

        assert self._tp is not None
        return self._tp