Draft PEP: Sealed decorator for static typing

Below is a draft PEP that proposes adding a @sealed decorator to the typing module to support algebraic data types.

Previous discussions: typing-sig mailing list discussing an earlier version of this draft PEP and typing meetup to discuss a more ambitious idea.

PEP: XXX
Title: Sealed Decorator for Static Typing
Author: John Hagen johnthagen@gmail.com, David Hagen david@drhagen.com
Sponsor: Jelle Zijlstra
PEP-Delegate: TBD
Discussions-To: Discourse thread
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Mar-2024
Python-Version: 3.13
Post-History:
Resolution:

Abstract

This PEP proposes a @sealed decorator be added to the typing module to
support creating versatile algebraic data types (ADTs) which type checkers can
exhaustively pattern match against.

Motivation

Quite often it is desirable to apply exhaustiveness to a set of classes without
defining ad-hoc union types, which is itself fragile if a class is missing in
the union definition. A design pattern where a group of record-like classes is
combined into a union is popular in other languages that support pattern
matching 1 and is known as a nominal sum type, a key instantiation of
algebraic data types 2.

We propose adding a special decorator class @sealed to the typing
module 3, that will have no effect at runtime, but will indicate to static
type checkers that all direct subclasses of this class should be defined in the
same module as the base class.

The idea is that, since all subclasses are known, the type checker can treat
the sealed base class as a union of all its subclasses. Together with
dataclasses this allows a clean and safe support of algebraic data types
in Python. Consider this example,

from dataclasses import dataclass
from typing import sealed

@sealed
class Node:
    ...

@sealed
class Expression(Node):
    ...

@sealed
class Statement(Node):
    ...

@dataclass
class Name(Expression):
    name: str

@dataclass
class Operation(Expression):
    left: Expression
    op: str
    right: Expression

@dataclass
class Assignment(Statement):
    target: str
    value: Expression

@dataclass
class Print(Statement):
    value: Expression

With such a definition, a type checker can safely treat Node as
Union[Expression, Statement], and also safely treat Expression as
Union[Name, Operation] and Statement as Union[Assignment, Print].
With these declarations, a type checking error will occur in the below snippet,
because Name is not handled (and the type checker can give a useful error
message).

def dump(node: Node) -> str:
    match node:
        case Assignment(target, value):
            return f"{target} = {dump(value)}"
        case Print(value):
            return f"print({dump(value)})"
        case Operation(left, op, right):
            return f"({dump(left)} {op} {dump(right)})"

Note: This section was largely derived from PEP 622 4.

Rationale

Kotlin 5, Scala 2 6, and Java 17 7 all support a sealed keyword
that is used to declare algebraic data types. By using the same terminology,
the @sealed decorator will be recognizable to developers familiar with those
languages.

Specification

The typing.sealed decorator can be applied to the declaration of any class.
This decoration indicates to type checkers that all immediate subclasses of the
decorated class are defined in the current file.

The exhaustiveness checking features of type checkers should assume that there
are no subclasses outside the current file, treating the decorated class as a
Union of all its same-file subclasses.

Type checkers should raise an error if a sealed class is inherited in a file
different from where the sealed class is declared.

A sealed class is automatically declared to be abstract. Whatever actions a
type checker normally takes with abstract classes should be taken with sealed
classes as well. What exactly these behaviors are (e.g. disallowing
instantiation) is outside the scope of this PEP.

Similar to the typing.final decorator 8, the only runtime behavior of
this decorator is to set the __sealed__ attribute of the class to True so
that the sealed property of the class can be introspected. There is no runtime
enforcement of sealed class inheritance.
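
The runtime behavior described above could look roughly like the following
sketch. The sealed function here is a hypothetical stand-in written for
illustration, not the reference implementation; it assumes the decorator only
records a marker attribute, in the same spirit as typing.final.

```python
from typing import TypeVar

T = TypeVar("T", bound=type)


def sealed(cls: T) -> T:
    """Hypothetical runtime stand-in for the proposed typing.sealed."""
    try:
        # Record the marker for introspection; nothing is enforced at runtime.
        cls.__sealed__ = True
    except (AttributeError, TypeError):
        # Some objects reject attribute assignment; skip them silently.
        pass
    return cls


@sealed
class Node:
    ...


class Leaf(Node):
    ...


# The marker is an ordinary class attribute, so subclasses inherit it.
# Inspect the class's own namespace to see whether it was decorated itself.
node_is_sealed = "__sealed__" in vars(Node)
leaf_is_sealed = "__sealed__" in vars(Leaf)
```

One consequence of this design is that Leaf.__sealed__ also evaluates to True
through ordinary attribute lookup, so introspection code would likely want to
check vars(cls) rather than the attribute alone.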

Reference Implementation

[Link to any existing implementation and details about its state, e.g. proof-of-concept.]

Rejected Ideas

Union of independent variants

Some of the behavior of sealed can be emulated with Union today.

class Leaf: ...
class Branch: ...

Node = Leaf | Branch

The main problem with this is that the ADT loses all the features of
inheritance, which is rather featureful in Python, to put it mildly. There can
be no abstract methods, private methods to be reused by the subclasses, public
methods to be exposed on all subclasses, class methods of any kind,
__init_subclass__, etc. Even if a specific method is implemented on each
subclass, then rename, jump-to-definition, find-usage, and other IDE features
are difficult to make work reliably.

Adding a base class in addition to the union type alleviates some of these
issues:

class BaseNode: ...

class Leaf(BaseNode): ...
class Branch(BaseNode): ...

Node = Leaf | Branch

Despite being possible today, this is quite unergonomic. The base class and the
union type are conceptually the same thing, but have to be defined as two
separate objects. If this became standard, it seems Python would be the first
language to separate the definition of an ADT into two different objects.

This duplication causes a serious don’t-repeat-yourself problem. A new subclass
must be added to both the base class and the union type. Failure to do so will
not result in an immediate error but in inconsistent behavior between the two
representations.

The base class is not merely passive, either. There are a number of operations
that will only work when using the base class instead of the union type and
vice versa. For example, matching only works on the base class, not the union
type:

maybe_node: Node | None = ...  # must be Node to enforce exhaustiveness

match maybe_node:
    case Node():  # TypeError: called match pattern must be a type
        ...
    case None:
        ...

match maybe_node:
    case BaseNode():  # no error
        ...
    case None:
        ...

Having to remember whether to use the base class or the union type in each
situation is particularly unfriendly to the user of a sealed class.

Generalize Enum

Rust 9, Scala 3 10, and Swift 11 support algebraic data types using a
generalized enum mechanism.

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}

One could imagine a generalization of the Python Enum 12 to support
variants of different shapes. Valueless variants could use enum.auto to
keep themselves terse.

    from dataclasses import dataclass
    from enum import auto, Enum

    class Message(Enum):
        Quit = auto()

        @dataclass
        class Move:
            x: int
            y: int

        @dataclass
        class Write:
            message: str

        @dataclass
        class ChangeColor:
            r: int
            g: int
            b: int

This solution allows attaching methods directly to the base ADT type,
something a Union type lacks, but does not support the full
power of inheritance that @sealed would provide.

This would be a substantial addition to the implementation and
semantics of Enum.

Explicitly list subclasses

Java requires that subclasses be explicitly listed with the base class.

public sealed interface Node
    permits Leaf, Branch {}

public final class Leaf implements Node {}
public final class Branch implements Node {}

The advantage of this requirement is that subclasses can be defined anywhere,
not just in the same file, eliminating the somewhat weird file dependence of
this feature. One disadvantage is that it requires all subclasses to be
written twice: once when defined and once in the enumerated list on the base
class.

There is also an inherent circular reference when explicitly enumerating the
subclasses. The subclass refers to the base class in order to inherit from it,
and the base class refers to the subclasses in order to enumerate them. In
statically typed languages, these kinds of circular references in the types can
be managed, but in Python, it is much harder.

For example, this Sealed base class that behaves like Generic:

from typing import Sealed

class Node(Sealed[Leaf, Branch]): ...

class Leaf(Node): ...
class Branch(Node): ...

This cannot work because Leaf must be defined before Node and Node
must be defined before Leaf. This is not an annotation, so lazy
annotations cannot save it. Perhaps, the subclasses in the enumerated list could
be strings, but that severely hurts the ergonomics of this feature.

If the enumerated list was in an annotation, it could be made to work, but there
is no natural place for the annotation to live. Here is one possibility:

class Node:
    __sealed__: Leaf | Branch

class Leaf(Node): ...
class Branch(Node): ...

Copyright

This document is placed in the public domain.

I have a real-world use-case for this: the dry-python/returns GitHub issue #1361, "mypy plugin allows false positive error when exhaustively pattern matching on Result". Thanks a lot for writing down this PEP, I hope that it will be accepted.

As I said in the typing-sig thread and the typing meeting, I’m very negative on this idea. I will repeat my arguments here for completeness and visibility. I think the proposed mechanism is largely redundant, and it violates some important principles in the Python language. It provides only a small convenience to a small audience but would result in a net negative to millions of Python developers if it were implemented in pyright. As such, I think it represents a bad tradeoff for the Python community as a whole.

For context, pyright implements lazy (just-in-time) type evaluation. This is critical for its performance and low latency when it is used to power language server features like completion suggestions. If you have several hundred classes defined in a “.py” or “.pyi” file and you reference one of them by import, pyright evaluates only that one class, not the entire file. This is possible because Python has been designed such that the meaning of a symbol can be determined by examining only its definition. This PEP proposes to break this design principle for the first time and say that the meaning of a symbol (for purposes of type checking) depends on an arbitrary number of other symbol definitions in the same file. This would require a lazy type checker to analyze all class definitions in a file to determine the type and meaning of one symbol in that file.

The mechanism described in this PEP also represents the first time where the meaning of a symbol changes when you move class definitions between files. For the past 30 years, Python has allowed symbols to be defined in whatever source file is most appropriate. This PEP breaks the ability to refactor because all subclasses of a @sealed class are required to be defined in the same file.

This PEP proposes to create a new, implicit way to define a union. Once a sealed class is defined, the symbol represents both a class and a union type from the perspective of type checking. The alternative solution is to define an explicit union of classes. Unions are already well-supported in the type system and by all static type checking tools. The explicit union solution is listed in the “Rejected Ideas” section, but I find the argument against it unconvincing. I frequently use the explicit union technique (with or without a common base class), and I do not find it to be unergonomic or a maintenance burden.

I’ll also note that the proposed solution would not work for TypedDict classes because they are not allowed to derive from non-TypedDict classes, so this technique has compositional issues with existing typing features.

I would love to be able to avoid this pattern

which is currently inevitable if you use type hints with inheritance. It clutters the code with so many Base[ClassName]s and [ClassName]s and makes it hard to read (imagine if you also want to subclass Leaf and Branch).

But your proposed implementation doesn’t convince me. Having the type checker look for all subclasses of a sealed class defined in the same module feels too “magical”.

I much prefer this rejected idea

or something along that line, e.g. @sealed("Leaf", "Branch") works too.

“Explicit is better than implicit.”

Can this not be fixed?

Did I promise at some point to sponsor this PEP? I’m not currently enthusiastic about the idea, so if you do choose to submit it as a PEP, I’d prefer you find another sponsor.

I don’t think there is anything technical blocking the allowance of Union in runtime type checks. An object is an instance of a Union if it is an instance of any of its members. If there were a lot of members, this would be substantially less performant than an isinstance check on the base class unless some caching mechanism was added to each Union object, like the one used in singledispatch.

Not a promise, but a probable willingness in the original discussion. In any case, that was a while ago and I don’t want to hold you to something you aren’t interested in anymore.

Any core dev interested in being the sponsor of this PEP?

It provides only a small convenience to a small audience but would result in a net negative to millions of Python developers if it were implemented in pyright. As such, I think it represents a bad tradeoff for the Python community as a whole.

Mainly curious, is this based purely on an assessment of pyright’s current implementation and community?

This would require a lazy type checker to analyze all class definitions in a file to determine the type and meaning of one symbol in that file.

Would this negatively affect mypy and other type checkers in the same manner? It might be worth seeing others chime in.

This PEP proposes to create a new, implicit way to define a union

Agree. I do see this idea as a bigger benefit from a typing perspective though. Looking at how other languages do this, I don’t see this as such a negative runtime tradeoff over alternative implementations. I can see how libraries like python-returns would benefit here the most, helping the community at large.

I was originally relatively positive on this, but after giving it more thought, I think the problems this attempts to solve may be better assisted by allowing case-matching a Union. I’m not sure what the semantics of that should require (that the union members share the same __match_args__ signature is probably reasonable), but this is the only place where there’s any actual need to expose the baseclass and end up with divergence, and it seems like solving this would be a much better route to fix ergonomics.

Edit: This was an off-handed comment that I had not thought through enough. I still think the problems the sealed decorator is attempting to solve should be solved a different way, for reasons that have been captured by the rest of the discussion already, but I don’t think adding match support to unions is the right way to go.

I started a discussion for your idea here: Syntactic sugar for union cases in match statements

Making Union work in a match may be nice on its own, but it does not resolve the need for sealed or something like it. An equally big problem with Unions is that inheritance does not work at all. A Union will never have abstract methods, class methods, __init_subclass__, etc.–all the things that you can do with inheritance in Python. If you have

class Leaf: ...
class Branch: ...

Node = Leaf | Branch

then Node.parse(text) cannot be made to work in any way I can think of.

I think this is a fine path to go down. The difficulty is that it fundamentally requires a circular reference, which is uniquely tricky to make exist in Python even if only for typing.

Annotations are basically the only things that are evaluated lazily in Python, so the only implementation that works is probably this one:

class Node:
    __sealed__: Leaf | Branch

class Leaf(Node): ...
class Branch(Node): ...

But if we were willing to add syntax for this, then this is solvable by copying the feature directly from other languages:

class Node of Leaf | Branch: ...

class Leaf(Node): ...
class Branch(Node): ...

I would have thought there was no chance syntax would be added for this, but we just got syntax for type parameters on classes and methods, so maybe there is more appetite for nice typing syntax these days.

I don’t think that’s exactly compelling.

class _Base:
    def foo(self) -> str:
        ...

class A(_Base):  ...
class B(_Base):  ...

SomeSumType = A | B

From here, you’ve declared your functionality, shared functionality is only declared once, and shared functionality can safely be used on any object that belongs to that union.

def use_above(obj: SomeSumType):
    obj.foo()  # this is fine

As for inheritance, your sealed example doesn’t allow other users to inherit from it, and it generally doesn’t appear to make sense to expose the actual base as a constructor if you’re expecting one of the subclasses of it.

I haven’t had the opportunity to use Python 3.10 and above too much and I don’t understand ADTs all too well so could you explain what the benefit of the proposed sealed decorator would be over using abstract base classes?

I agree with all of these arguments, but is there a way to get runtime checking too?

Override __init_subclass__? You could raise an error on any new subclass outside of a fixed list.
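
Something like the following sketch, perhaps. The _permitted attribute and the Twig class are hypothetical names for illustration. The list holds class names rather than classes because the subclasses do not exist yet when Node is defined, and only direct subclasses are restricted.

```python
class Node:
    # Hypothetical fixed list of permitted direct subclasses, by name.
    _permitted = frozenset({"Leaf", "Branch"})

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only direct subclasses of Node are checked; subclasses of
        # Leaf or Branch are left unrestricted.
        if Node in cls.__bases__ and cls.__name__ not in Node._permitted:
            raise TypeError(f"{cls.__name__} is not a permitted subclass of Node")


class Leaf(Node): ...
class Branch(Node): ...


# Defining an unlisted direct subclass fails at class-creation time.
try:
    class Twig(Node): ...
except TypeError as exc:
    error = str(exc)
```

Checking by name is weak, since any class called Leaf would pass, and a type checker cannot see this list at all, so it gives runtime enforcement without static exhaustiveness checking.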

Right, but you need to expose the list to both the runtime and the type checker. And you need to be able to seal subclasses with a different set of types, right?

As I understand it, the core benefit here is to allow static verification tools (linters, type checkers) to do what is called exhaustiveness checking.

When writing a function that processes an instance of the base data type, where you need to detect the child type and act differently for each one, the type checker can enforce that you have handled every child type.

Edit: this is pretty common in languages that emphasize ADTs and structural pattern matching. In Python the match statement is still pretty new but “sealed” would probably be most helpful with that.

Runtime checking alone would not provide the main benefit of this proposal: static verification of exhaustive matching. I guess I don’t feel strongly about there being runtime behavior in addition to the static type checking, but most of the recent additions to typing in Python have leaned fully into “consenting adults”, having no runtime behavior whatsoever. This draft PEP reflects that.
