Python Attribute Access and the Descriptor Protocol
source link: https://www.tuicool.com/articles/UvQbMn3
Let’s look at the following snippet:
class Foo:
    def __init__(self):
        self.bar = 'hello!'

foo = Foo()
print(foo.bar)
We already know how Foo instantiation works. Today, our question is this: what exactly happens when we say foo.bar?
You might already know that most Python objects have an internal dictionary called __dict__, which holds their instance attributes. And what's amazing about Python is that we can simply inspect even internal implementation details like this one:
>>> foo = Foo()
>>> foo.__dict__
{'bar': 'hello!'}
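Here is a small sketch that pokes at this a bit further. It shows that vars() returns this very same dictionary and that writing into the dictionary creates an attribute. It also shows why we say *most* objects: a class that declares __slots__ gives its instances no __dict__ at all. The Slotted class is a hypothetical example and does not come from the original text.

```python
class Foo:
    def __init__(self):
        self.bar = 'hello!'


class Slotted:
    # Hypothetical example: __slots__ replaces the per-instance __dict__
    # with a fixed set of attribute slots.
    __slots__ = ('bar',)

    def __init__(self):
        self.bar = 'hello!'


foo = Foo()
print(vars(foo))                    # {'bar': 'hello!'}
print(vars(foo) is foo.__dict__)    # True: vars() hands back the same dict object

# Writing into __dict__ directly is enough to create a new attribute.
foo.__dict__['baz'] = 42
print(foo.baz)                      # 42

slotted = Slotted()
print(slotted.bar)                  # hello!
print(hasattr(slotted, '__dict__')) # False: no instance dictionary at all
```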
So we can arrive at the following incomplete hypothesis: foo.bar is equivalent to foo.__dict__['bar'].
It looks correct:
>>> foo = Foo()
>>> foo.__dict__['bar']
'hello!'
Now, suppose you're a sophisticated fancy-pants modern Pythonista and you know you can define dynamic attributes in Python classes. How does that sit with our knowledge about __dict__?
>>> class Foo:
...     def __init__(self):
...         self.bar = 'hello!'
...
...     def __getattr__(self, item):
...         return 'goodbye!'
...
>>> foo = Foo()
>>> foo.bar
'hello!'
>>> foo.baz
'goodbye!'
>>> foo.__dict__
{'bar': 'hello!'}
Err… okay. We can see that __getattr__ can "fake" attribute access, but it isn't consulted when the attribute already exists (foo.bar returns 'hello!', not 'goodbye!'). So this mechanism is more complex than it seemed, and there's actual logic involved when accessing attributes. Indeed, there's a magic method that's called whenever we access instance attributes, but as the example above shows, it's clearly not __getattr__. This magic method is called __getattribute__, and we'll try to reverse engineer it by observing its different behaviors. For now, let's modify our hypothesis:
foo.bar is equivalent to calling foo.__getattribute__('bar'), which is roughly:
def __getattribute__(self, item):
    if item in self.__dict__:
        return self.__dict__[item]
    return self.__getattr__(item)
Let’s test this out by actually implementing this method (with a different name) and calling it directly:
>>> class Foo:
...     def __init__(self):
...         self.bar = 'hello!'
...
...     def __getattr__(self, item):
...         return 'goodbye!'
...
...     def my_getattribute(self, item):
...         if item in self.__dict__:
...             return self.__dict__[item]
...         return self.__getattr__(item)
>>> foo = Foo()
>>> foo.bar
'hello!'
>>> foo.baz
'goodbye!'
>>> foo.my_getattribute('bar')
'hello!'
>>> foo.my_getattribute('baz')
'goodbye!'
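In the real implementation the fallback sits one level up. The default lookup is object.__getattribute__, which knows nothing about __getattr__. Python only calls __getattr__ when that default lookup raises AttributeError. A small sketch that calls object.__getattribute__ directly makes the split visible:

```python
class Foo:
    def __init__(self):
        self.bar = 'hello!'

    def __getattr__(self, item):
        # Only reached after the normal lookup has raised AttributeError.
        return 'goodbye!'


foo = Foo()

# The default lookup, called directly, bypasses the __getattr__ fallback.
print(object.__getattribute__(foo, 'bar'))  # hello!
try:
    object.__getattribute__(foo, 'baz')
except AttributeError as exc:
    print('AttributeError:', exc)

# Ordinary attribute access (and getattr) adds the fallback on top.
print(foo.baz)               # goodbye!
print(getattr(foo, 'baz'))   # goodbye!
```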
Looks good, right?
Great, so let’s just make sure that we also support setting these variables and we can go home and enjoy the rest of the -
>>> foo.baz = 1337
>>> foo.baz
1337
>>> foo.my_getattribute('baz') = 'h4x0r'
SyntaxError: can't assign to function call
Damn.
In retrospect this seems a bit obvious. my_getattribute returns a value: a reference to some object. We can mutate that object, but we can't use the return value to rebind the attribute to a new object. So what the hell is going on here? If foo.baz translates to a function call, how can we ever assign to it?
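Part of the answer lives in the compiler. When an attribute appears on the left-hand side of =, the compiler emits a dedicated store instruction (STORE_ATTR) rather than the load used for reads (LOAD_ATTR). A function call is simply not a valid assignment target. The sketch below uses the standard dis module to check opcode names. Full bytecode listings vary between Python versions, so it compares names only; opnames is a hypothetical helper.

```python
import dis


def opnames(source):
    """Compile a snippet and return the names of the opcodes it uses."""
    code = compile(source, '<demo>', 'exec')
    return [instr.opname for instr in dis.get_instructions(code)]


# Reading an attribute compiles to LOAD_ATTR ...
print('LOAD_ATTR' in opnames('foo.baz'))          # True
# ... while assigning to one compiles to STORE_ATTR.
print('STORE_ATTR' in opnames('foo.baz = 1337'))  # True

# A call is not an assignment target, so this fails at compile time.
try:
    compile("foo.my_getattribute('baz') = 'h4x0r'", '<demo>', 'exec')
except SyntaxError as exc:
    print('SyntaxError:', exc.msg)
```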
When we look at a statement like foo.bar = 1, there's something extra going on: Python doesn't access attributes the same way when setting them as when getting them. Assignment goes through its own magic method, __setattr__, which we can override in a similar manner:
>>> class Foo:
...     def __init__(self):
...         self.__dict__['my_dunder_dict'] = {}
...         self.bar = 'hello!'
...
...     def __setattr__(self, item, value):
...         self.my_dunder_dict[item] = value
...
...     def __getattr__(self, item):
...         return self.my_dunder_dict[item]
>>> foo = Foo()
>>> foo.bar
'hello!'
>>> foo.bar = 'goodbye!'
>>> foo.bar
'goodbye!'
>>> foo.baz
Traceback (most recent call last):
  File "<pyshell#75>", line 1, in <module>
    foo.baz
  File "<pyshell#70>", line 10, in __getattr__
    return self.my_dunder_dict[item]
KeyError: 'baz'
>>> foo.baz = 1337
>>> foo.baz
1337
>>> foo.__dict__
{'my_dunder_dict': {'bar': 'goodbye!', 'baz': 1337}}
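Without an override, every spelling of assignment below ends up in the same place: the inherited object.__setattr__. When no descriptor on the class intercepts the name, it simply stores the value in the instance __dict__. Here is a minimal sketch with a hypothetical empty class:

```python
class Plain:
    """Hypothetical class with no custom attribute hooks."""


obj = Plain()

# Three spellings of the same operation.
obj.a = 1                          # attribute assignment statement
setattr(obj, 'b', 2)               # builtin setattr()
object.__setattr__(obj, 'c', 3)    # calling the default hook explicitly

print(obj.__dict__)  # {'a': 1, 'b': 2, 'c': 3}
print(type(obj).__setattr__ is object.__setattr__)  # True: nothing overridden
```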
A few things to note about the above snippet:
- There's an intentional asymmetry: __setattr__ is called on every assignment, so it plays the role that __getattribute__ plays for reads. There's no separate __setattribute__, and no fallback-only setter analogous to __getattr__.
- __setattr__ is called inside __init__ as well. That's why we do a weird assignment to my_dunder_dict (self.__dict__['my_dunder_dict'] = {}). Otherwise, we'd get infinite recursion.
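The recursion is easy to reproduce. In the sketch below, Naive and Fixed are hypothetical names. Naive creates its storage with a plain assignment, so __setattr__ runs before my_dunder_dict exists. The lookup of my_dunder_dict then falls through to __getattr__, which looks up my_dunder_dict again, and so on until Python gives up. Fixed calls object.__setattr__ instead of writing to __dict__ by hand. This is an equivalent, commonly used way to bypass the override.

```python
class Naive:
    def __init__(self):
        self.my_dunder_dict = {}  # goes through __setattr__ below!

    def __setattr__(self, item, value):
        self.my_dunder_dict[item] = value

    def __getattr__(self, item):
        return self.my_dunder_dict[item]


class Fixed:
    def __init__(self):
        # Bypass our own __setattr__ to create the real storage attribute.
        object.__setattr__(self, 'my_dunder_dict', {})
        self.bar = 'hello!'

    def __setattr__(self, item, value):
        self.my_dunder_dict[item] = value

    def __getattr__(self, item):
        return self.my_dunder_dict[item]


try:
    Naive()
except RecursionError:
    print('Naive: RecursionError')

fixed = Fixed()
print(fixed.bar)       # hello!
print(fixed.__dict__)  # {'my_dunder_dict': {'bar': 'hello!'}}
```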
And then… there's property (and friends): decorators that make methods behave like plain attributes. Sigh.
Let’s try to understand how this is happening.
>>> class Foo(object):
...     def __getattribute__(self, item):
...         print('__getattribute__ was called')
...         return super().__getattribute__(item)
...
...     def __getattr__(self, item):
...         print('__getattr__ was called')
...         # object defines no __getattr__ to delegate to, so signal "not found" directly.
...         raise AttributeError(item)
...
...     @property
...     def bar(self):
...         print('bar property was called')
...         return 100
>>> f = Foo()
>>> f.bar
__getattribute__ was called
bar property was called
100
Out of curiosity, what's in f.__dict__ then?
>>> f.__dict__
__getattribute__ was called
{}
Let me get this straight: bar is not in __dict__, but __getattr__ isn't called. Wat?
Well, bar is a method that accepts the instance as self, but it's actually an attribute of the class object, not of the instance. Let's verify:
>>> Foo.__dict__
mappingproxy({'__dict__': <attribute '__dict__' of 'Foo' objects>, '__doc__': None, '__getattr__': <function Foo.__getattr__ at 0x038308A0>, '__getattribute__': <function Foo.__getattribute__ at 0x038308E8>, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'Foo' objects>, 'bar': <property object at 0x0381EC30>})
We can see bar in that dictionary, stored as a property object. In order to reconstruct __getattribute__, we need to answer another question here: who has precedence, the instance or the class?
>>> f.__dict__['bar'] = 'will we see this printed?'
__getattribute__ was called
>>> f.bar
__getattribute__ was called
bar property was called
100
Alright. We now know that the class' __dict__ is also checked and that, at least for bar, it has priority. So it's just a minor complicati –
Wait wait wait, when did we even call the bar method? I mean, our pseudo-code for __getattribute__ never calls the object, so what's going on?
Enter the Descriptor Protocol:
descr.__get__(self, obj, type=None) -> value
descr.__set__(self, obj, value) -> None
descr.__delete__(self, obj) -> None
That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute.
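Very little is needed to write one. The sketch below is a hypothetical Positive descriptor that validates assignments. It defines both __get__ and __set__, so it is a data descriptor, and it keeps the actual value in the instance __dict__ under a private name. __set_name__ is an extra hook (Python 3.6+) that tells the descriptor which attribute name it was assigned to.

```python
class Positive:
    """Hypothetical data descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created.
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself, e.g. Account.balance
        return obj.__dict__[self.private_name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f'expected a positive number, got {value!r}')
        obj.__dict__[self.private_name] = value


class Account:
    balance = Positive()

    def __init__(self, balance):
        self.balance = balance  # routed through Positive.__set__


acct = Account(100)
print(acct.balance)   # 100
print(acct.__dict__)  # {'_balance': 100}
try:
    acct.balance = -5
except ValueError as exc:
    print('ValueError:', exc)
```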
If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).
Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.
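We can check the precedence rule directly. In the hypothetical class below, greet is an ordinary function. Functions define only __get__, so they are non-data descriptors, and that __get__ is how methods get bound. answer is a property, which is a data descriptor. Planting same-named entries in the instance __dict__ shadows greet but not answer:

```python
class Foo:
    def greet(self):
        return 'hello from the method'

    @property
    def answer(self):
        return 42


foo = Foo()

# Functions are non-data descriptors: __get__ produces a bound method.
func = Foo.__dict__['greet']
print(hasattr(func, '__get__'), hasattr(func, '__set__'))  # True False
print(func.__get__(foo, Foo)())  # hello from the method

# property defines __get__ and __set__, so it is a data descriptor.
prop = Foo.__dict__['answer']
print(hasattr(prop, '__get__'), hasattr(prop, '__set__'))  # True True

# Same-named instance dictionary entries:
foo.__dict__['greet'] = 'instance value'
foo.__dict__['answer'] = 'instance value'
print(foo.greet)   # instance value  (dict beats the non-data descriptor)
print(foo.answer)  # 42              (data descriptor beats the dict)
```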
To make a read-only data descriptor, define both __get__() and __set__() with the __set__() raising an AttributeError when called. Defining the __set__() method with an exception raising placeholder is enough to make it a data descriptor.
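Following that recipe, a read-only data descriptor only needs a __set__ that raises. Below is a minimal sketch with a hypothetical Constant descriptor. Because it defines __set__, an entry planted in the instance dictionary cannot shadow it either.

```python
class Constant:
    """Hypothetical read-only data descriptor."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        # Merely defining __set__ makes this a data descriptor;
        # raising here is what makes it read-only.
        raise AttributeError('read-only attribute')


class Circle:
    pi = Constant(3.14159)


c = Circle()
print(c.pi)  # 3.14159
try:
    c.pi = 3
except AttributeError as exc:
    print('AttributeError:', exc)

# Writing to __dict__ directly bypasses __set__, but the data descriptor
# still wins on lookup.
c.__dict__['pi'] = 3
print(c.pi)  # 3.14159
```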
TL;DR: if you implement any of __get__, __set__ or __delete__, you have officially, erm… Descripted a Protocol, I guess? That's exactly what the property decorator does. Used the way we did (with only a getter), it creates a read-only data descriptor, whose __get__ is then called from __getattribute__.
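To see that property really is just a descriptor, the sketch below reproduces an f.bar lookup by hand. It fetches the property object from the class __dict__ and calls its __get__ directly. Assigning to a property that has no setter goes through property.__set__, which raises AttributeError. The exact message differs between Python versions, so the sketch doesn't show it.

```python
class Foo:
    @property
    def bar(self):
        return 100


f = Foo()
prop = type(f).__dict__['bar']

print(type(prop).__name__)              # property
print(prop.__get__(f, type(f)))         # 100, same as f.bar
print(prop.__get__(None, Foo) is prop)  # True: class access returns the property itself

try:
    f.bar = 1
except AttributeError:
    print('no setter, so assignment is rejected')
```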
One last refactor: foo.bar, as a getter, is equivalent to calling foo.__getattribute__('bar'), which is roughly:
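def __getattribute__(self, item):
    cls_dict = self.__class__.__dict__  # really: every class in type(self).__mro__
    # 1. Data descriptors on the class (such as property) beat the instance dict.
    if item in cls_dict:
        v = cls_dict[item]
        if hasattr(v, '__get__') and (hasattr(v, '__set__') or hasattr(v, '__delete__')):
            return v.__get__(self, type(self))
    # 2. Then the instance's own attributes; these are never treated as descriptors.
    if item in self.__dict__:
        return self.__dict__[item]
    # 3. Then non-data descriptors (e.g. functions) and plain class attributes.
    if item in cls_dict:
        v = cls_dict[item]
        if hasattr(v, '__get__'):
            v = v.__get__(self, type(self))
        return v
    # 4. Finally, the __getattr__ fallback.
    return self.__getattr__(item)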
Let’s try to demonstrate all the behaviors we know:
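class Foo:
    class_attr = "I'm a class attribute!"

    def __init__(self):
        self.dict_attr = "I'm in a dict!"

    @property
    def property_attr(self):
        return "I'm a read-only property!"

    def __getattr__(self, item):
        return "I'm dynamically returned!"

    def my_getattribute(self, item):
        cls_dict = self.__class__.__dict__
        v = cls_dict.get(item)
        # Data descriptors on the class (like property) take precedence.
        if hasattr(v, '__get__') and (hasattr(v, '__set__') or hasattr(v, '__delete__')):
            print('Retrieving from self.__class__.__dict__')
            print("Invoking descriptor's __get__")
            return v.__get__(self, type(self))
        if item in self.__dict__:
            print('Retrieving from self.__dict__')
            return self.__dict__[item]  # instance values are returned as-is
        if item in cls_dict:
            print('Retrieving from self.__class__.__dict__')
            # Non-data descriptors (e.g. functions) are bound; plain values are not.
            if hasattr(v, '__get__'):
                print("Invoking descriptor's __get__")
                v = v.__get__(self, type(self))
            return v
        print('Retrieving from self.__getattr__')
        return self.__getattr__(item)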
foo = Foo()

print(foo.class_attr)
print(foo.dict_attr)
print(foo.property_attr)
print(foo.dynamic_attr)

print()

print(foo.my_getattribute('class_attr'))
print(foo.my_getattribute('dict_attr'))
print(foo.my_getattribute('property_attr'))
print(foo.my_getattribute('dynamic_attr'))

This prints:

I'm a class attribute!
I'm in a dict!
I'm a read-only property!
I'm dynamically returned!

Retrieving from self.__class__.__dict__
I'm a class attribute!
Retrieving from self.__dict__
I'm in a dict!
Retrieving from self.__class__.__dict__
Invoking descriptor's __get__
I'm a read-only property!
Retrieving from self.__getattr__
I'm dynamically returned!
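The toy version above only looks at the class's own __dict__. Below is a somewhat more faithful sketch, though still an approximation and not CPython's actual code. It walks the whole MRO so that inherited attributes are found. It looks up __get__ and __set__ on the attribute's type rather than on the attribute itself, and it calls __getattr__ only at the very end. The lookup and find_in_mro helpers and the example classes are hypothetical.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Return the first attribute called `name` along cls.__mro__, or _MISSING."""
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    return _MISSING


def lookup(obj, name):
    """Rough approximation of normal attribute lookup on an instance."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    # Like CPython, check descriptor methods on the attribute's *type*.
    attr_type = type(cls_attr)
    getter = getattr(attr_type, '__get__', None) if cls_attr is not _MISSING else None
    is_data = cls_attr is not _MISSING and (
        hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__'))

    if getter is not None and is_data:
        return getter(cls_attr, obj, cls)   # 1. data descriptor (e.g. property)
    inst_dict = getattr(obj, '__dict__', {})
    if name in inst_dict:
        return inst_dict[name]              # 2. instance __dict__
    if getter is not None:
        return getter(cls_attr, obj, cls)   # 3. non-data descriptor (e.g. function)
    if cls_attr is not _MISSING:
        return cls_attr                     # 4. plain class attribute
    # 5. Fallback. (Real Python also runs __getattr__ when a getter above
    #    raises AttributeError; this sketch ignores that case.)
    fallback = find_in_mro(cls, '__getattr__')
    if fallback is not _MISSING:
        return fallback(obj, name)
    raise AttributeError(f'{cls.__name__!r} object has no attribute {name!r}')


class Base:
    inherited = "I'm inherited!"


class Foo(Base):
    shadowed = 'class value'

    def __init__(self):
        self.shadowed = 'instance value'  # a plain class attribute doesn't block this

    @property
    def prop(self):
        return "I'm a property!"

    def method(self):
        return 'method called'

    def __getattr__(self, item):
        return "I'm dynamically returned!"


foo = Foo()
for name in ['inherited', 'shadowed', 'prop', 'method', 'dynamic']:
    print(f'{name:>9}: {lookup(foo, name)!r}')
print(lookup(foo, 'method')())  # method called
```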
There's always more. I've just scratched the surface of Python's internals, and while the general idea is correct, some of the small details are probably implemented differently. Please read the official sources if you need exact implementation details.
My hope is that aside from demonstrating how attribute access works, I've also convinced you of how beautiful Python is: a language you can push and prod and experiment with. Settle some knowledge debt today.
Sources