July 09, 2016
Some days ago I got a simple question.
class A():
    x = 1
a = A()
b = A()
b.x = 11
a.x # ?
My answer was wrong, but it was a good occasion to pick up some books and the reference docs again and spend a few hours with the concepts behind it: classes, instances, attributes and namespaces.
Everything you are going to read applies to Python 3.x.
In Python there are two very important concepts: scope and namespace. They are related, but generally scope concerns unqualified names (X, for example) while namespace concerns qualified attribute names (object.X). Since in Python everything is an object the difference is subtle, but in general we can assume that.
Time for some code
X = 20 # global X
def f():
    print(X)
def f1():
    X = 1 # local X
    print(X)
f() # 20
f1() # 1
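Python (generally) follows the LEGB rule, where LEGB means Local -> Enclosing -> Global -> Built-in.
The LEGB rule means that when you reference X, Python will look for it, in order, inside:
- the Local scope (the current function)
- the Enclosing scopes (any functions that contain the current one)
- the Global scope (the module)
- the Built-in scope (names like print or len)
And if Python doesn't find the name anywhere, it raises a NameError: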
#X = 20
def f():
    print(X)
f() # 20
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-6b47feb56ddf> in <module>()
3 print(X)
4
----> 5 f() # 20
<ipython-input-2-6b47feb56ddf> in f()
1 #X = 20
2 def f():
----> 3 print(X)
4
5 f() # 20
NameError: name 'X' is not defined
If you want to know more about LEGB you can start from here.
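To see all four LEGB levels at work, here is a small sketch; the function names (outer, inner, only_local, builtin_lookup) are hypothetical, chosen only for this illustration:

```python
X = "global"  # global scope (module level)


def outer():
    X = "enclosing"  # enclosing scope for inner()

    def inner():
        # No local X here, so Python finds X in the enclosing scope of outer()
        return X

    return inner()


def only_local():
    X = "local"  # local scope: shadows the global X
    return X


def builtin_lookup():
    # len is not local, enclosing or global, so Python finds it in the
    # built-in scope; X itself resolves to the global X
    return len(X)


print(outer())           # enclosing
print(only_local())      # local
print(builtin_lookup())  # 6
```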
Let's go back to our classes.
class C(): # class is a reserved keyword used to create a class
    X = 10
    def f(self):
        print(X)
c = C()
We just defined a simple class named C. X is defined inside class C, and c is what we call an object of C, or an instance of C.
"f" is a function that accepts one parameter, self.
print(C.f, c.f)
<function C.f at 0x1039aaae8> <bound method C.f of <__main__.C object at 0x103ea8e10>>
Now the first strange part: we accessed f, a function defined inside C, and we got two different things.
A function with C.f and a method with c.f.
The keyword here is "bound", or at least that's where the main difference lives.
But let's call our function (or method in this case):
c.f()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-5abfb4dafb0b> in <module>()
4 print(X)
5 c = C()
----> 6 c.f()
<ipython-input-5-5abfb4dafb0b> in f(self)
2 X = 10
3 def f(self):
----> 4 print(X)
5 c = C()
6 c.f()
NameError: name 'X' is not defined
Mmmm... Python should find X following the LEGB rule. Does the LEGB rule still apply or not?
Let's try with this:
X = 50
class C():
    X = 10
    def f(self):
        Y = 10
        print(X)
        def f1():
            print(Y)
        f1()
c = C()
c.f()
# Output: 50
# Output: 10
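We have a nested function (f1), we added X = 50 in the global scope, and now the code works: print(X) finds the global X (50) and f1 finds Y in the enclosing scope of f (10).
But what about X inside class C?
Well, X (inside class C) is not exactly a variable: it's an attribute, and it behaves differently from a variable when we talk about LEGB. The class body is not an enclosing scope for the functions defined inside it, so a bare X inside f skips C entirely.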
class C():
    X = 10
    def f(self):
        print(self.X)
c = C()
c.f() # 10
We just replaced X with self.X inside the print call, and now it works.
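To see that LEGB skips the class body while attribute lookup does not, here is a small sketch with a global X and a class attribute X side by side (the method names f and g are just for illustration):

```python
X = "global"


class C:
    X = "class"  # class attribute, stored in C.__dict__

    def f(self):
        # The class body is not an enclosing scope for methods:
        # this bare X skips C and resolves to the global X
        return X

    def g(self):
        # Attribute lookup through the instance finds the class attribute
        return self.X


c = C()
print(c.f())  # global
print(c.g())  # class
```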
Why?
Well... time to explain the concepts of self and namespaces.
As I said, self is just a parameter.
class C():
    X = 10
    def f(legion):
        print(legion.X)
c = C()
c.f() # 10
This code works in the same way. We use self only as a convention: it's just a reference to the instance (in this case c) that is passed to the function when we call it.
When we type c.f(), Python effectively calls C.f(c), where C is the class of our instance, f is our function/method and c is the first argument required by f (self or legion).
And do you remember this?
c.f # <bound method C.f of <__main__.C object at 0x1039cf6d8>>
Now the meaning of "bound" is clearer: when we call f through c.f, a reference to our instance is passed automatically.
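A quick sketch to verify both claims, with f changed to return self.X instead of printing it so the results are easy to compare: c.f() and C.f(c) give the same result, and the bound method keeps a reference to the instance in __self__ and to the plain function in __func__.

```python
class C:
    X = 10

    def f(self):
        return self.X


c = C()

# Calling through the instance and calling through the class
# with the instance as the first argument are equivalent
print(c.f())   # 10
print(C.f(c))  # 10

# A bound method remembers the instance (__self__)
# and the plain function it wraps (__func__)
m = c.f
print(m.__self__ is c)    # True
print(m.__func__ is C.f)  # True
```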
So from now on, when we talk about a function that accepts a self parameter, we will call it an instance method.
And yes, you can also have methods that are not tied to an instance, like class methods (bound to the class instead) and static methods (not bound at all), but we leave those for the future.
A namespace is a collection of...names.
A collection of references to objects, in the form name=object.
Why are namespaces so important?
Because every class has a namespace and... every instance of a class has a namespace too.
They are completely separate, but somehow related.
Let's look inside our class and instance namespaces. There is a built-in attribute for this: __dict__
# I cleaned the output of C.__dict__ of all the built-in methods/attributes
C.__dict__ # mappingproxy({'f': <function C.f at 0x1039c1378>, 'X': 10})
c.__dict__ # {}
As you can see, the class namespace and the instance namespace are completely different.
The class namespace is a mappingproxy; the instance namespace is a dict.
A mappingproxy is a kind of read-only dict.
You can find out why mappingproxy is used here.
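You can check both types yourself; a minimal sketch, using types.MappingProxyType from the standard library:

```python
import types


class C:
    X = 10


c = C()

print(type(C.__dict__))                                # <class 'mappingproxy'>
print(isinstance(C.__dict__, types.MappingProxyType))  # True
print(type(c.__dict__))                                # <class 'dict'>
```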
When I say that mappingproxy is read-only, I mean that you cannot assign items through it as you would with a dict, for example:
C.__dict__['X'] # 10
C.__dict__['X'] = 10 # raises TypeError
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-193-ec143c56e8cd> in <module>()
1 C.__dict__['X']
----> 2 C.__dict__['X'] = 10
TypeError: 'mappingproxy' object does not support item assignment
If you want to put new elements inside your class namespace you have to use other ways.
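For example, plain assignment on the class and the built-in setattr both work, because they go through the attribute-setting machinery of the class rather than through the mappingproxy. A minimal sketch (Y and Z are hypothetical attribute names):

```python
class C:
    X = 10


# Both statements add new entries to the class namespace,
# even though C.__dict__ itself does not support item assignment
C.Y = 20
setattr(C, "Z", 30)

print(C.__dict__["Y"], C.__dict__["Z"])  # 20 30

# The mappingproxy is a live view, so it reflects the new attributes
print("Z" in C.__dict__)  # True
```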
Back to the C namespace: there we find f, our instance method, and X.
X, as we said, is not properly a variable, and it is defined in C.__dict__.
What do we call it? A class attribute.
By class attribute we mean that X (the attribute) belongs to C (the class) because, as we saw, X is inside the C namespace (C.__dict__).
Can we access a class attribute directly?
Yes, with NameOfTheClass.AttributeName
C.X # 10
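Just to recap: C is a class, c is an instance of C, f is an instance method and X is a class attribute stored in C.__dict__.
But earlier I said that class namespaces and instance namespaces are completely separate.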
class C():
    X = 10
    def f(self):
        print(self.X)
c = C()
c.f() # 10
When we type self.X we tell Python to "look inside the namespace of instance self for X".
How can we access something that belongs to C (attribute X) from self.X (our instance)?
Time for the next part.
MRO means Method Resolution Order, and it is how and why we get C.__dict__['X'] from self.X.
As I said, the namespaces of a class and its instances are separate but somehow related.
MRO is behind the "somehow related".
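When Python looks for an attribute, like self.X, it will search, in order:
- the namespace of the instance (self.__dict__)
- the namespace of its class (C.__dict__)
- the namespaces of the superclasses of C, if any
- finally, the namespace of object
object is... the mother of all classes, but we are not going to talk about her here.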
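The search order above can be sketched as a small function. This is a hypothetical, simplified model (simple_lookup is not a real Python API): it ignores descriptors, properties, __getattr__ and __slots__, and it returns plain functions instead of bound methods, so treat it as an illustration only.

```python
def simple_lookup(obj, name):
    # Hypothetical helper: a simplified model of instance attribute lookup.
    # It ignores descriptors, properties, __getattr__ and __slots__.

    # 1. The instance namespace
    if name in vars(obj):
        return vars(obj)[name]
    # 2. The class, its superclasses and finally object, in MRO order
    for cls in type(obj).__mro__:
        if name in cls.__dict__:
            return cls.__dict__[name]
    # 3. Nothing found anywhere
    raise AttributeError(f"{type(obj).__name__!r} object has no attribute {name!r}")


class C:
    X = 10


c = C()
print(simple_lookup(c, "X"))  # 10, found in C.__dict__
c.X = 99
print(simple_lookup(c, "X"))  # 99, found in c.__dict__ first
```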
Let's look at another example
class Mother():
    M = 22
class Father():
    F = 34
class Son(Mother, Father):
    S = 10
    def f(self, x):
        print(getattr(self, x))
a = Son()
a.f("M") # Output: 22
a.f("F") # Output: 34
a.f("S") # Output: 10
print(a.M) # Output: 22
print(a.F) # Output: 34
print(a.S) # Output: 10
With getattr we are using our instance (self) and another parameter ("M", "F" or "S") to get our class attributes.
I used getattr because it lets me specify the name of the attribute dynamically, but it's exactly like self.M (or a.M in our case).
As I said our instance is related to its class (Son) and to the superclasses of Son (Mother and Father).
The process of putting a class and all its superclasses into a single search order is called linearization.
The algorithm behind the Python MRO is called C3, and the main thing to remember is this: a class always comes before its bases, and the bases keep the left-to-right order in which they are listed in the class definition.
Bases are what we put inside the parentheses after the name of a class: class Name(bases).
With __bases__ we get a tuple with the parents of a class; with __class__ we get the class of an instance:
a.__class__ # __main__.Son
Son.__bases__ # (__main__.Mother, __main__.Father)
Mother.__bases__ # (object,)
Father.__bases__ # (object,)
Nothing new, I would say. Can we see the search order of the MRO?
Yes, with __mro__:
Son.__mro__ # (__main__.Son, __main__.Mother, __main__.Father, object)
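To see the C3 rule on a slightly harder case, here is a sketch with a hypothetical diamond hierarchy (Base, Left, Right, Child and Bad are illustration names, not from the examples above):

```python
class Base:
    pass


class Left(Base):
    pass


class Right(Base):
    pass


class Child(Left, Right):
    pass


# Each class comes before its bases, and Left stays before Right
# because that is the order used in class Child(Left, Right)
print([cls.__name__ for cls in Child.__mro__])
# ['Child', 'Left', 'Right', 'Base', 'object']

# mro() returns the same order as a list
print(Child.mro() == list(Child.__mro__))  # True

# Listing a class before its own subclass breaks the rule,
# so Python refuses to create the class
try:
    class Bad(Base, Left):
        pass
except TypeError:
    print("C3 cannot linearize Bad")
```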
When we type the code below:
a.M # 22
Python searches, in order:
- a.__dict__
- Son.__dict__
- Mother.__dict__
- Father.__dict__
- object.__dict__
following the rule first come, first served.
class Mother():
    M = 22
class Father():
    F = 34
class Son(Mother, Father):
    S = 10
    F = 50
    def f(self, x):
        print(getattr(self, x))
a = Son()
a.F # Output: 50
This is why we get 50 here instead of 34.
And what about F defined inside Father ?
As I said, first come, first served: in our __mro__ the Son namespace comes before the Father namespace.
And what happens if you look for a name that doesn't exist in any of the __mro__ namespaces?
a.DXIUISD
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-242-bcb50197f1e4> in <module>()
----> 1 a.DXIUISD
AttributeError: 'Son' object has no attribute 'DXIUISD'
You get an AttributeError.
But where is the tricky part?
MRO and everything explained above apply only when you retrieve an attribute/method, and by retrieve I mean object.attribute or object.method in general.
When you assign an attribute/method (like object.attribute = 10), you do that in the namespace of the object itself (instance or class).
You can change the behaviour with advanced and magic things (metaclass, inheritance, descriptor, property...) but this is how it normally works.
So when we type this:
Son.F
Python looks for the attribute inside the namespaces following the __mro__ order.
Son.F = 50
But here the __mro__ doesn't matter: Python will create a new attribute in Son.__dict__, or change its value if it already exists there.
class Mother():
    F = 34
class Son(Mother):
    S = 10
    F = 50
Mother.F # 34
Son.F # 50 # operation 1
del Son.F # remove F from Son.__dict__ # operation 2
Son.F # 34 # operation 3
Son.F = 30 # operation 4
Son.F # 30
Is it clear how it works?
Here is the namespace of Son (Son.__dict__, built-ins removed) after each operation:
{'S': 10, 'F': 50} # After 1
{'S': 10,} # After 2
{'S': 10, 'F': 30} # After 4
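The same steps can be checked by looking at Son.__dict__ directly; a minimal sketch reusing the Mother/Son classes from above:

```python
class Mother:
    F = 34


class Son(Mother):
    S = 10
    F = 50


# Reading follows the MRO; writing and deleting touch only Son.__dict__
del Son.F
print("F" in Son.__dict__, Son.F)  # False 34 (found in Mother)

Son.F = 30
print("F" in Son.__dict__, Son.F)  # True 30
print(Mother.F)                    # 34, Mother is untouched
```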
The __init__ method is a special method used to customize our instance, and it's called when we create an instance.
class A():
    C = 10
    def __init__(self, x):
        self.x = x
a = A(10)
b = A(50)
In other words: "when you call A(something), create a new instance of A and assign something to self.x".
Let's look inside the namespace.
A.__dict__ # mappingproxy({'__init__': <function A.__init__ at 0x103ecab70>, 'C': 10})
a.__dict__ # {'x': 10}
b.__dict__ # {'x': 50}
A has its attributes, C and __init__, while a and b each have their own x.
The attributes of a and b are called instance attributes, and they belong to their instance.
Now the output of this should be clear:
A.C, a.C, b.C # (10, 10, 10)
Neither a nor b has a reference to C in its namespace, so Python looks inside the A namespace to find it (following the MRO).
But what happens if we do this?
a.C = 50
a.__dict__ # {'C': 50, 'x': 10}
We create a new reference inside the namespace of a.
A.C, a.C, b.C # (10, 50, 10)
That's why we get this result: now, when we look for C in a, we find it directly in a's own namespace.
Let's look at the __dict__ of A, a and b again:
A.__dict__ # mappingproxy({'__init__': <function A.__init__ at 0x103ecab70>, 'C': 10})
a.__dict__ # {'C': 50, 'x': 10}
b.__dict__ # {'x': 50}
But what if we change A.C?
A.C = 20
A.C, a.C, b.C # (20, 50, 20)
a.C remains 50 and b.C still looks inside A namespace because its namespace doesn't have any value for "C".
b.C = 70
A.C, a.C, b.C # (20, 50, 70)
Now instances a, b and the class A have a reference for "C" in their namespaces.
A.__dict__ # mappingproxy({'__init__': <function A.__init__ at 0x103ecab70>, 'C': 20})
a.__dict__ # {'C': 50, 'x': 10}
b.__dict__ # {'C': 70, 'x': 50}
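Since the instance attribute only shadows the class attribute, removing it with del makes the class attribute visible again. A quick sketch:

```python
class A:
    C = 10

    def __init__(self, x):
        self.x = x


a = A(10)
a.C = 50          # instance attribute shadows A.C
print(a.C, A.C)   # 50 10

del a.C                   # removes 'C' from a.__dict__ only
print("C" in a.__dict__)  # False
print(a.C)                # 10, found again in A.__dict__
```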
Can a method still reach the class attribute C when the instance has its own C? Yes, but if you have a reference with the same name in your instance namespace, you need to be explicit.
class A():
    C = 10
    def __init__(self, x):
        self.x = x
    def p(self):
        print(A.C, self.x)
a = A(10)
b = A(50)
a.p() # 10 10
b.p() # 10 50
So we just hardcoded A.C inside our method.
Is there a better way?
class A():
    C = 10
    def __init__(self, x):
        self.x = x
    def p(self):
        print(type(self).C, self.x) # First
        print(self.__class__.C, self.x) # Second
a = A(10)
b = A(50)
a.p() # 10 10 (printed twice)
b.p() # 10 50 (printed twice)
Personally I prefer the second (it's clearer to me).
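Why not just hardcode A.C? A quick sketch with a hypothetical subclass B that overrides C shows the difference: A.C always reads the attribute of A, while type(self).C (like self.__class__.C) follows the class of the actual instance.

```python
class A:
    C = 10

    def hardcoded(self):
        return A.C           # always the attribute of A

    def dynamic(self):
        return type(self).C  # follows the class of the instance


class B(A):  # hypothetical subclass overriding C
    C = 99


b = B()
print(b.hardcoded())  # 10
print(b.dynamic())    # 99
```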
Now let's say we have this code, where the class attribute is a list:
class A():
    C = []
    def __init__(self, x):
        self.x = x
    def p(self):
        print(self.__class__.C, self.x)
a = A(10)
b = A(50)
And then we type this:
A.C, a.C, b.C # ([], [], [])
a.C.append(50)
A.C, a.C, b.C # ([50], [50], [50])
b.C.append(10)
A.C, a.C, b.C # ([50, 10], [50, 10], [50, 10])
A.C.pop(0)
A.C, a.C, b.C # ([10], [10], [10])
This time the class attribute A.C seems really shared, but the class attributes we used before were also shared initially.
So why can we append and pop elements without any kind of problem?
Because we didn't do any kind of assignment.
Our instances and the class are all "working" on the same mutable object: they access the object the reference C points to and change it directly.
a.C = a.C * 2
A.C, a.C, b.C # ([10], [10, 10], [10])
After the assignment, a has a new reference C inside its own namespace, pointing to a new list.
But here is the tricky part... what if, instead of the last code, we type this:
a.C *= 2
A.C, a.C, b.C # ?
What would the output be?
([10, 10], [10, 10], [10, 10])
In this case the list is modified in place, so A.C, a.C and b.C still all point to the same object.
Why?
Due to how augmented assignments (+=, -=, *=, /=, ...) work.
With lists and other mutable objects that support in-place operations, a.C *= 2 works roughly like a.C = a.C.__imul__(2): the referenced list is updated directly and then the same object is assigned back. So a does get its own C in a.__dict__, but it refers to the very same list as A.C.
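A small sketch to check this in the interpreter, using a fresh class with C = [10], plus a hypothetical class T holding a tuple for comparison with an immutable object:

```python
class A:
    C = [10]


a = A()
b = A()

# a.C *= 2 behaves roughly like a.C = a.C.__imul__(2):
# the list is doubled in place, then the same object is bound to a.C
a.C *= 2
print(A.C, a.C, b.C)      # [10, 10] [10, 10] [10, 10]
print("C" in a.__dict__)  # True: a now has its own reference...
print(a.C is A.C)         # True: ...to the very same list

# A plain assignment builds a new list instead
b.C = b.C * 2
print(b.C is A.C)         # False
print(A.C, b.C)           # [10, 10] [10, 10, 10, 10]


class T:
    C = (10,)  # tuples are immutable: there is no in-place *=


t = T()
t.C *= 2         # builds a new tuple and binds it to t.C
print(T.C, t.C)  # (10,) (10, 10)
```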
So back to the initial question...
class A():
    x = 1
a = A()
b = A()
b.x = 11
a.x # ?
The answer is 1, because b.x = 11 creates a new attribute (an instance attribute) inside the b namespace.
The namespace of a remains empty, so a.x will look inside the namespace of A where x is still equal to 1.
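We can double-check the answer, and the namespaces behind it, with a few lines:

```python
class A:
    x = 1


a = A()
b = A()
b.x = 11

print(a.__dict__, b.__dict__)  # {} {'x': 11}
print(a.x, b.x, A.x)           # 1 11 1
```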
If you want to go further with OOP in Python, I think nothing is better than Leonardo Giordani's training.