Altering Python attribute handling for modules

By Jake Edge
September 6, 2023

A recent discussion on the Python forum looked at a way to protect module objects (and users) from mistaken attribute assignment and deletion. There are ways to get the same effect today, but the mechanism that would be used causes a performance penalty for an unrelated, and heavily used, action: attribute lookup on modules. Back in 2017, PEP 562 ("Module __getattr__ and __dir__") set the stage for adding magic methods to module objects; now a new proposal would extend that idea to add __setattr__() and __delattr__() to them.

The idea came from a GitHub enhancement request that was filed for the mpmath Python module, which provides arbitrary-precision floating-point arithmetic. A common mistake is for users to import mpmath incorrectly and then end up setting an attribute on the module itself, which has no real effect, rather than on the context object mp. From the request:

    import numpy as np
    import mpmath as mp  # should be from mpmath import mp
    mp.dps = 50

The import statement for mpmath is incorrect, as noted, but it is a common mistake because of the way NumPy is usually (correctly) imported. The programmer thought they were setting the dps (decimal precision) attribute for the mpmath context object, but instead set the attribute on the module object itself, which has no effect on the precision.

The enhancement request notes that the problem is clearly a user error, but that it would be nice to somehow catch it and provide the user with an error message redirecting them to the proper setting (mp.mp.dps in this case). Since the functions of interest in mpmath are available from both the module object and the context object, the user may well not notice that their calculations are being done with the wrong precision. If the module could have a __setattr__() magic method that would be called when mp.dps is set (erroneously), the problem could be resolved by raising an exception. There is no support for __setattr__() on a module, however.

There is another way to get there, though, as is shown in the Python documentation. A module can create a subclass of ModuleType, which can contain other magic methods of interest, then change the class of the module object to be that subclass:

    import sys
    from types import ModuleType

    class VerboseModule(ModuleType):
        def __setattr__(self, attr, value):
            print(f'Setting {attr}...')
            super().__setattr__(attr, value)

    # change the class of the module instance __name__
    sys.modules[__name__].__class__ = VerboseModule
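
As a rough illustration of how that technique could catch the mpmath-style mistake, the sketch below swaps the class of a stand-in module so that a misplaced assignment raises an error pointing at the context object. The module name fake_mpmath, the GuardedModule class, and the set of guarded names are all hypothetical; none of this is mpmath's actual code.

```python
from types import ModuleType, SimpleNamespace


class GuardedModule(ModuleType):
    """Hypothetical ModuleType subclass that rejects misplaced settings."""

    # Assumption: names that belong on the context object, not the module.
    _context_only = {"dps", "prec"}

    def __setattr__(self, attr, value):
        if attr in self._context_only:
            raise AttributeError(
                f"cannot set {attr!r} on module {self.__name__!r}; "
                f"set {self.__name__}.mp.{attr} instead"
            )
        super().__setattr__(attr, value)


# Stand-in for a real module; inside a module one would instead write
# sys.modules[__name__].__class__ = GuardedModule
fake_mpmath = ModuleType("fake_mpmath")
fake_mpmath.mp = SimpleNamespace(dps=15)  # stand-in context object
fake_mpmath.__class__ = GuardedModule

message = None
try:
    fake_mpmath.dps = 50  # the common mistake
except AttributeError as exc:
    message = str(exc)

fake_mpmath.mp.dps = 50  # the intended assignment still works
print(message)
```

The guard only intercepts assignments made through normal attribute syntax on the module; everything else is passed along to the default ModuleType behavior.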

The problem with that approach, as pointed out by Sergey B Kirpichev in the initial discussion back in April, is that doing so negatively impacts the performance of attribute lookups for modules. He measured a nearly 3x difference in the speed of a lookup using the regular module object versus one with the __class__ modification (48ns versus 131ns). The lookup operation is commonplace for accessing anything that a module provides, of course.
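
A comparison of that kind can be approximated with the standard timeit module. The sketch below is not Kirpichev's benchmark; it assumes two stand-in modules built with ModuleType and simply times a single attribute lookup on each. The absolute numbers, and the size of any gap, will depend on the Python version and the machine.

```python
import timeit
from types import ModuleType


class VerboseModule(ModuleType):
    def __setattr__(self, attr, value):
        super().__setattr__(attr, value)


# An ordinary module object
plain = ModuleType("plain")
plain.foo = 1

# The same kind of module, but with its class swapped for the subclass
patched = ModuleType("patched")
patched.foo = 1
patched.__class__ = VerboseModule


def per_lookup_ns(mod, number=200_000):
    """Return the approximate cost of one `mod.foo` lookup in nanoseconds."""
    total = timeit.timeit("mod.foo", globals={"mod": mod}, number=number)
    return total / number * 1e9


plain_ns = per_lookup_ns(plain)
patched_ns = per_lookup_ns(patched)
print(f"plain module:   {plain_ns:.1f} ns per lookup")
print(f"patched module: {patched_ns:.1f} ns per lookup")
```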

The difference is that the __class__-based mechanism bypasses the internal C-based module object so all of its operations are done using the Python class. Allowing __setattr__() to be added to the module object, as was done in PEP 562 for __getattr__(), would mean that the C-based module object would only call out to Python when attributes are being set. It is worth noting that the module __getattr__() calls are only made if the attribute is not found on the object, which is also how it works for classes; so any performance penalty for adding __getattr__() is only paid for attributes that are being handled specially.

But, normally, looking up magic methods (also known as "dunders" for double underscore) like __getattr__() is done at the class level; those methods are defined for a class. PEP 562 added the ability to look up __getattr__() (and __dir__()) on module instances. The import machinery creates an instance from the ModuleType class when the module is imported. So a module that wants to add __getattr__() defines it at the top level of the module namespace, which is a big departure from the usual situation.
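
That departure can be demonstrated by attaching the same hook to an ordinary instance and to a module instance. This is a minimal sketch; the Plain class, the fallback function, and the module name demo are hypothetical names chosen for the example.

```python
from types import ModuleType


class Plain:
    pass


def fallback(name):
    return f"fallback for {name}"


# On an ordinary instance, a __getattr__ stored on the instance is ignored,
# because dunder lookup goes through the class.
obj = Plain()
obj.__getattr__ = fallback
try:
    obj.missing
    instance_hook_used = True
except AttributeError:
    instance_hook_used = False

# On a module instance, PEP 562 makes the same assignment effective.
mod = ModuleType("demo")
mod.__getattr__ = fallback
module_result = mod.missing

print(instance_hook_used, module_result)
```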

The April discussion showed a somewhat mixed reaction to the idea. Around the same time, PEP 713 ("Callable Modules") was proposed. It would add the ability to define a __call__() method for module instances, so that modules that only provide a single (main) interface can be called directly. The author of the PEP, Amethyst Reese, listed a number of standard library modules that could benefit from the feature. It also builds on the "precedent" of PEP 562.
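
PEP 713 would have let a module define __call__() in its own namespace. For comparison, the existing class-swapping technique can already produce a callable module, as in the following sketch. It is not the PEP's mechanism, and the CallableModule class, the greeter module, and its main function are hypothetical.

```python
from types import ModuleType


class CallableModule(ModuleType):
    def __call__(self, *args, **kwargs):
        # Delegate to a function named `main` in the module's namespace
        return self.main(*args, **kwargs)


greeter = ModuleType("greeter")  # stand-in for a real module
greeter.main = lambda name: f"hello, {name}"
greeter.__class__ = CallableModule

result = greeter("world")
print(result)
```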

In June, Victor Stinner posted a link to a GitHub issue in support of the module __setattr__() idea. He would like to protect the attributes of the sys module from accidental modification, which can result in strange error messages. For example, setting the sys.modules dictionary to an integer value will cause a later import to report:

    AttributeError: 'int' object has no attribute 'get'

He also suggested adding __delattr__() to the list of allowable module overrides so that efforts to delete attributes from modules like sys could be disallowed.
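
The kind of protection being described can be sketched with today's class-swapping technique on a stand-in module. Nothing here touches the real sys module or reflects how CPython would implement it; the ProtectedModule class, the fake_sys module, and the table of required types are assumptions made for illustration.

```python
from types import ModuleType


class ProtectedModule(ModuleType):
    # Assumption: attribute name -> required type, for illustration only.
    _required_types = {"modules": dict}

    def __setattr__(self, attr, value):
        expected = self._required_types.get(attr)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{self.__name__}.{attr} must be a {expected.__name__}, "
                f"not {type(value).__name__}"
            )
        super().__setattr__(attr, value)

    def __delattr__(self, attr):
        if attr in self._required_types:
            raise AttributeError(f"cannot delete {self.__name__}.{attr}")
        super().__delattr__(attr)


fake_sys = ModuleType("fake_sys")  # stand-in, not the real sys module
fake_sys.modules = {}
fake_sys.__class__ = ProtectedModule

errors = []
try:
    fake_sys.modules = 42  # rejected at assignment time
except TypeError as exc:
    errors.append(str(exc))
try:
    del fake_sys.modules   # deletion is refused
except AttributeError as exc:
    errors.append(str(exc))

print(errors)
```

With a check like this, the mistake surfaces where the bad assignment happens rather than at some later import.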

On August 31, Kirpichev proposed PEP 726 ("Module __setattr__() and __delattr__()"); he had drafted the PEP earlier, but it lacked a core-developer sponsor at that time. In the interim, he had worked on a reference implementation and gotten a sponsor. The reaction was generally favorable to the idea, but it was noted that there are ways around the protection for sys attributes that Stinner was looking for. As Oscar Benjamin put it:

This PEP does not in itself prevent anyone from replacing an object if they really want to. A determined monkeypatcher can still use sys.__dict__['modules'] = whatever. The intention here is more to prevent accidentally setting attributes. Otherwise the consenting adults principle still applies.
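
The escape hatch that Benjamin describes can be shown with a stand-in module. The sketch uses the existing class-swapping technique rather than the PEP's proposed hook, and the ReadOnlyModule class and guarded module are hypothetical names; the point is only that writes through the module's __dict__ never reach __setattr__().

```python
from types import ModuleType


class ReadOnlyModule(ModuleType):
    def __setattr__(self, attr, value):
        raise AttributeError(f"{self.__name__} is read-only")


guarded = ModuleType("guarded")
guarded.limit = 1
guarded.__class__ = ReadOnlyModule

# Ordinary assignment is intercepted and refused
try:
    guarded.limit = 2
    blocked = False
except AttributeError:
    blocked = True

# Writing through the module's __dict__ skips __setattr__ entirely
vars(guarded)["limit"] = 3

print(blocked, guarded.limit)
```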

On September 5, both Jelle Zijlstra and steering council member Gregory P. Smith said that the PEP needed additional reasons to justify its addition to the language. Zijlstra said that Stinner's example was good, "but I'm not sure a single module is enough to justify a language change". Smith suggested that the PEP should change:

The current 726 text has a Motivation section that covers what one could do with it. But the thing I personally think is currently missing are specific use cases for why it'd be useful. ex: What problems are package or module maintainers having today that would be improved by having this in the future?

Smith noted that the council had just rejected PEP 713 (for adding __call__()) because there was "not enough compelling evidence of seemingly important practical uses for the feature". He suggested that PEP 726 would need to clear a similar bar in order to get accepted. Kirpichev thought that the example in the PEP (which is based around the mpmath problem that started the whole effort) was sufficiently concrete and that Stinner's example further showed the benefits.

He did update the PEP to try to make the motivation clearer; he also listed a few different real-world examples of the problem in his reply. Chris Markiewicz posted another use case, for handling a deprecated value for a module attribute, in the thread. Likewise, Andrej Klychin explained his use case for it, which involves invalidating a cache in the module when an attribute gets updated.

It is hard to say if there are enough "seemingly important practical uses" of the feature for the steering council to approve it. There is some discomfort with adding exceptions to the normal dunder lookup rules, though __setattr__() and __delattr__() seem like a reasonable extension of the PEP 562 change. There are a wide variety of ways that the proposed feature could be used, and the alternative mechanism, which slows down all module lookups, is not an attractive option—for something like sys in particular. It seems that there may be enough to clear the bar; the council is considering the PEP now, so a pronouncement should be coming fairly soon. If it is accepted, we should see it in Python 3.13, which is scheduled for October 2024.



Posted Sep 7, 2023 0:15 UTC (Thu) by NYKevin (subscriber, #129325)

Hot take: Modules are literally just singleton objects with funny syntax.

Posted Sep 7, 2023 4:51 UTC (Thu) by mb (subscriber, #50428)

>In June, Victor Stinner posted a link to a GitHub issue in support of the module __setattr__() idea.
>He would like to protect the attributes of the sys module from accidental modification,

Yeah, well, no. I understand the idea and the intention, but please don't do this.
It changes the API significantly.
Python is a dynamic language and that is by design. It comes with advantages and disadvantages. One person's disadvantage is another one's advantage.
Monkey patching is very common and useful. Please don't restrict it further.

It will result in https://xkcd.com/1172/

Posted Sep 7, 2023 7:00 UTC (Thu) by gdiscry (subscriber, #91125)

There are several mechanisms in Python to restrict operations. It's just that they can be bypassed if necessary.

For example, it's possible to bypass a custom __setattr__() by directly modifying the object's __dict__, or by using object.__setattr__() for classes using slots (which are techniques used by attrs for its frozen classes [1]).

The PEP already mentions using mod.__dict__ to bypass mod.__setattr__() [2].

[1] https://www.attrs.org/en/stable/how-does-it-work.html#imm...
[2] https://peps.python.org/pep-0726/#specification

Posted Sep 8, 2023 18:18 UTC (Fri) by geofft (subscriber, #59789)

I'm reading the linked GitHub issue in a slightly different way than I think you're reading it. I don't think Stinner is intending to break monkey-patching at all, and is indeed trying to make it work better. (So yes, he's trying to protect them from accidental modification, but not at all from intentional modification.)

His proposal for sys.__setattr__ is in fact not to prevent modification at all, but to make that modification reflected in the C API. Currently, if you modify sys.modules, PyImport_AddModule() doesn't take your modification into account: it still uses the original object that was exposed as sys.modules so the behavior is inconsistent and your monkey patch doesn't work. If he gets his way, he won't be preventing you from modifying sys.modules - on the contrary, he will make your modification to sys.modules also take effect for callers using the C API.

He also wants to add validators to ensure that, for instance, sys.modules is actually a dict. I think this also serves to help monkey-patchers - the goal of modifying sys.modules isn't to create a variable with a funny name for your own use, it's to actually change how the module system works. People who set sys.modules aren't generally intending to set it to an integer, and if they do, it's probably a bug in the monkey-patching code. Right now, that bug will only surface an error when someone tries to do an import. With the proposal, it will surface an error at assignment time, which is much easier to track down.

His proposal for sys.__delattr__ is to prevent you from deleting attributes, yes, but I don't think this irrecoverably breaks any workflows. Apart from the unlikelihood of anyone actually needing this, he points out that current code is always careful to guard accesses and treat missing attributes as None. So, if you need to preserve the current behavior after his proposal takes effect, you can always just explicitly assign None.

Posted Sep 19, 2023 21:21 UTC (Tue) by dmwyatt (guest, #167070)

This doesn't prevent monkey patching. It aims to prevent *accidental* monkey patching.


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.