
Python Enhancement Proposals

PEP 712 – Adding a “converter” parameter to dataclasses.field

Author:
Joshua Cannon <joshdcannon at gmail.com>
Sponsor:
Eric V. Smith <eric at trueblade.com>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
01-Jan-2023
Python-Version:
3.13
Post-History:
27-Dec-2022, 19-Jan-2023, 23-Apr-2023


Abstract

PEP 557 added dataclasses to the Python stdlib. PEP 681 added dataclass_transform() to help type checkers understand several common dataclass-like libraries, such as attrs, Pydantic, and object relational mapper (ORM) packages such as SQLAlchemy and Django.

A common feature other libraries provide over the standard library implementation is the ability for the library to convert arguments given at initialization time into the types expected for each field using a user-provided conversion function.

Therefore, this PEP adds a converter parameter to dataclasses.field() (along with the requisite changes to dataclasses.Field and dataclass_transform()) to specify the function to use to convert the input value for each field to the representation to be stored in the dataclass.

Motivation

There is no existing, standard way for dataclasses or third-party dataclass-like libraries to support argument conversion in a type-checkable way. To work around this limitation, library authors/users are forced to choose to:

  • Opt in to a custom Mypy plugin. These plugins help Mypy understand the conversion semantics, but not other tools.
  • Shift conversion responsibility onto the caller of the dataclass constructor. This can make constructing certain dataclasses unnecessarily verbose and repetitive.
  • Provide a custom __init__ which declares “wider” parameter types and converts them when setting the appropriate attribute. This not only duplicates the typing annotations between the converter and __init__, but also opts the user out of many of the features dataclasses provides.
  • Provide a custom __init__ but without meaningful type annotations for the parameter types requiring conversion.

None of these choices are ideal.

Rationale

Adding argument conversion semantics is useful and beneficial enough that most dataclass-like libraries provide support. Adding this feature to the standard library means more users are able to opt in to these benefits without requiring third-party libraries. Additionally, third-party libraries are able to clue type-checkers into their own conversion semantics through added support in dataclass_transform(), meaning users of those libraries benefit as well.

Specification

New converter parameter

This specification introduces a new parameter named converter to the dataclasses.field() function. If provided, it represents a single-argument callable used to convert all values when assigning to the associated attribute.

For frozen dataclasses, the converter is only used inside a dataclass-synthesized __init__ when setting the attribute. For non-frozen dataclasses, the converter is used for all attribute assignment (e.g. obj.attr = value), which includes assignment of default values.

The converter is not used when reading attributes, as the attributes should already have been converted.

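Adding this parameter also implies the following changes:

  • A converter attribute will be added to dataclasses.Field.
  • converter will be added to the field specifier parameters supported by typing.dataclass_transform().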

Example

import dataclasses
import pathlib
from typing import Any

def str_or_none(x: Any) -> str | None:
    return str(x) if x is not None else None

@dataclasses.dataclass
class InventoryItem:
    # `converter` as a type (including a GenericAlias).
    id: int = dataclasses.field(converter=int)
    skus: tuple[int, ...] = dataclasses.field(converter=tuple[int, ...])
    # `converter` as a callable.
    vendor: str | None = dataclasses.field(converter=str_or_none)
    names: tuple[str, ...] = dataclasses.field(
      converter=lambda names: tuple(map(str.lower, names))
    )  # Note that lambdas are supported, but discouraged as they are untyped.

    # The default value is also converted; therefore the following is not a
    # type error.
    stock_image_path: pathlib.PurePosixPath = dataclasses.field(
      converter=pathlib.PurePosixPath, default="assets/unknown.png"
    )

    # Default value conversion extends to `default_factory`;
    # therefore the following is also not a type error.
    shelves: tuple = dataclasses.field(
      converter=tuple, default_factory=list
    )

item1 = InventoryItem(
  "1",
  [234, 765],
  None,
  ["PYTHON PLUSHIE", "FLUFFY SNAKE"]
)
# item1's repr would be (with added newlines for readability):
#   InventoryItem(
#     id=1,
#     skus=(234, 765),
#     vendor=None,
#     names=('python plushie', 'fluffy snake'),
#     stock_image_path=PurePosixPath('assets/unknown.png'),
#     shelves=()
#   )

# Attribute assignment also participates in conversion.
item1.skus = [555]
# item1's skus attribute is now (555,).

Impact on typing

A converter must be a callable that accepts a single positional argument, and the parameter type corresponding to this positional argument provides the type of the synthesized __init__ parameter associated with the field.

In other words, the argument provided for the converter parameter must be compatible with Callable[[T], X] where T is the input type for the converter and X is the output type of the converter.

Type-checking default and default_factory

Because default values are unconditionally converted using converter, if an argument for converter is provided alongside either default or default_factory, the type of the default (the default argument if provided, otherwise the return value of default_factory) should be checked using the type of the single argument to the converter callable.

Converter return type

The return type of the callable must be a type that’s compatible with the field’s declared type. This includes the field’s type exactly, but can also be a type that’s more specialized (such as a converter returning a list[int] for a field annotated as list, or a converter returning an int for a field annotated as int | str).

Indirection of allowable argument types

One downside introduced by this PEP is that knowing what argument types are allowed in the dataclass’ __init__ and during attribute assignment is not immediately obvious from reading the dataclass. The allowable types are defined by the converter.

This is true when reading code from source; however, typing-related aids such as typing.reveal_type and “IntelliSense” in an IDE should make it easy to know exactly what types are allowed without having to read any source code.

Backward Compatibility

These changes don’t introduce any compatibility problems since they only introduce opt-in new features.

Security Implications

There are no direct security concerns with these changes.

How to Teach This

Documentation and examples explaining the new parameter and behavior will be added to the relevant sections of the docs site (primarily on dataclasses) and linked from the What’s New document.

The added documentation/examples will also cover the “common pitfalls” that users of converters are likely to encounter. Such pitfalls include:

  • Needing to handle None/sentinel values.
  • Needing to handle values that are already of the correct type.
  • Avoiding lambdas for converters, as the synthesized __init__ parameter’s type will become Any.
  • Forgetting to convert values in the bodies of user-defined __init__ in frozen dataclasses.
  • Forgetting to convert values in the bodies of user-defined __setattr__ in non-frozen dataclasses.

Additionally, potentially confusing pattern matching semantics should be covered:

from dataclasses import dataclass, field

@dataclass
class Point:
    x: int = field(converter=int)
    y: int

match Point(x="0", y=0):
    case Point(x="0", y=0):  # Won't be matched
        ...
    case Point():  # Will be matched
        ...
    case _:
        ...

However, it’s worth noting this behavior is true of any type that does conversion in its initializer, and type-checkers should be able to catch this pitfall:

match int("0"):
    case int("0"):  # Won't be matched
        ...
    case _:  # Will be matched
        ...

Reference Implementation

The attrs library already includes a converter parameter exhibiting the same converter semantics (converting in the initializer and on attribute setting) when using the @define class decorator.

CPython support is implemented on a branch in the author’s fork.

Rejected Ideas

Just adding “converter” to typing.dataclass_transform’s field_specifiers

The idea of isolating this addition to dataclass_transform() was briefly discussed on Typing-SIG where it was suggested to broaden this to dataclasses more generally.

Additionally, adding this to dataclasses ensures anyone can reap the benefits without requiring additional libraries.

Not converting default values

There are pros and cons with both converting and not converting default values. Leaving default values as-is allows type-checkers and dataclass authors to expect that the type of the default matches the type of the field. However, converting default values has three large advantages:

  1. Consistency. Unconditionally converting all values that are assigned to the attribute involves fewer “special rules” that users must remember.
  2. Simpler defaults. Allowing the default value to have the same type as user-provided values means dataclass authors get the same conveniences as their callers.
  3. Compatibility with attrs. Attrs unconditionally uses the converter to convert default values.

Automatic conversion using the field’s type

One idea could be to allow the type of the field specified (e.g. str or int) to be used as a converter for each argument provided. Pydantic’s data conversion has semantics which appear to be similar to this approach.

This works well for fairly simple types, but leads to ambiguity in expected behavior for complex types such as generics. For example, for tuple[int, ...] it is ambiguous whether the converter is supposed to simply convert an iterable to a tuple, or whether it is additionally supposed to convert each element to int. Other annotations, such as int | None, are not callable at all.

Deducing the attribute type from the return type of the converter

Another idea would be to allow the user to omit the attribute’s type annotation if providing a field with a converter argument. Although this would reduce the common repetition this PEP introduces (e.g. x: str = field(converter=str)), it isn’t clear how to best support this while maintaining the current dataclass semantics (namely, that the attribute order is preserved for things like the synthesized __init__, or dataclasses.fields). This is because there isn’t an easy way in Python (today) to get the annotation-only attributes interspersed with un-annotated attributes in the order they were defined.
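The ordering problem can be seen in a plain class, as in this short sketch (the Example class is hypothetical). Annotation-only attributes are recorded in __annotations__, while un-annotated assignments are recorded only in the class namespace, and neither mapping records how the two interleave.

```python
class Example:
    a: int            # annotation only: absent from the class namespace
    b = 2             # assignment only: absent from __annotations__
    c: str = "three"  # present in both


annotated = list(Example.__annotations__)
assigned = [name for name in vars(Example) if not name.startswith("__")]
# Nothing here says whether `a` was defined before or after `b`.
```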

A sentinel annotation could be applied (e.g. x: FromConverter = ...); however, this breaks a fundamental assumption of type annotations.

Lastly, this is feasible if all fields (including those without a converter) were assigned to dataclasses.field, which means the class’ own namespace holds the order. However, this trades repetition of type and converter for repetition of field assignment. The end result is no gain or loss of repetition, but with the added complexity of dataclasses semantics.

This PEP doesn’t suggest it can’t or shouldn’t be done, just that it isn’t included in this PEP.


Source: https://github.com/python/peps/blob/main/peps/pep-0712.rst

Last modified: 2023-11-14 19:22:34 GMT