When Python Practices Go Wrong

code::dive • Wrocław
2019 November 20
Brandon Rhodes
Not Python, but Python practices
First Topic
Compiling new code at runtime
exec statement
eval() function
a = 'abc'
s = '4 + len(a)'

eval(s)

# string → tokens → AST → bytecode → run

 7
Some beginners use eval()
for any dynamic operation
# “I need to get an attribute whose
# name I don’t know until runtime—”

eval('my_object.' + attribute_name)
Many early languages were called “dynamic”
merely because they offered eval()
Python is “dynamic” because
it offers introspection and data-driven
object operations without making
you build strings for eval()
eval('my_object.' + attribute_name)
                 ↓
getattr(my_object, attribute_name)
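A minimal sketch of the data-driven alternatives to eval(). The `Config` class and the attribute names are hypothetical and not from the talk:

```python
class Config:
    """Hypothetical object used only for illustration."""
    debug = True

my_object = Config()
attribute_name = 'debug'

# Data-driven lookup: no string is compiled, so the name
# cannot smuggle in extra code.
value = getattr(my_object, attribute_name)

# A third argument supplies a default for a missing attribute.
missing = getattr(my_object, 'verbose', False)

# setattr() and hasattr() cover the other common cases.
setattr(my_object, 'level', 3)
has_level = hasattr(my_object, 'level')
```

Each built-in takes the attribute name as ordinary data, which is the "dynamic" behavior the slides describe.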
So we steer new programmers away from eval()
A better mechanism is almost always available
But eval() wound up having
a great future — just not where
we first expected it!
Interesting example in Standard Library
from collections import namedtuple
Point = namedtuple('Point', 'x y z')

p = Point(3.0, 4.0, 5.0)
print(p.z)

 5.0
# How is the new class constructed?
# It uses exec!

_class_template = 'class {typename}...'

                ↓

exec(_class_template.format(...))
A contributor once offered an alternative that
builds the namedtuple class without exec
The core developers turned it down!

• Running the string is faster!
• Easier to get correct
So, templating types was
judged to be a legitimate use
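A toy sketch of the templating technique, assuming a much simpler template than the Standard Library ever used. The names `_template` and `make_record` are invented for illustration:

```python
# Hypothetical, heavily simplified class template.
_template = '''
class {typename}:
    __slots__ = {fields!r}

    def __init__(self, {args}):
{assignments}
'''

def make_record(typename, field_names):
    fields = tuple(field_names.split())
    # Validate every name before it reaches exec().
    if not all(name.isidentifier() for name in (typename,) + fields):
        raise ValueError('names must be valid identifiers')
    source = _template.format(
        typename=typename,
        fields=fields,
        args=', '.join(fields),
        assignments='\n'.join(
            '        self.{0} = {0}'.format(name) for name in fields
        ),
    )
    namespace = {}
    exec(source, namespace)   # compile and run the generated class
    return namespace[typename]

Point = make_record('Point', 'x y z')
p = Point(3.0, 4.0, 5.0)
```

The generated class has a real `__init__` with named parameters, which is hard to produce without compiling source text.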
doctests
Documentation often
includes sample code
┌────────────┐
│ README.txt │
├────────────┴───────────────────┐
│Adding a float and int together │
│produces a float as output.     │
│                                │
│>>> 15.0 + 1                    │
│16.0                            │
│                                │
│This is also true for multiply. │
└────────────────────────────────┘
def count_letters(s):
    """Return a dictionary of letter counts.

    >>> count_letters('abba')
    {'a': 2, 'b': 2}

    """
    counts = {}
    for letter in s:
        n = counts.get(letter, 0)
        counts[letter] = n + 1
    return counts
What if we could execute
the code inside documentation?
for line in lines:
   if line.startswith('>>> '):
       expr = line[4:]
       expected = next(lines)
       result = repr(eval(expr, scope))
       if expected != result:
          raise Exception()
Standard Library doctest module
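A small sketch of running the count_letters() docstring through the real `doctest` module. Using `DocTestFinder` and `DocTestRunner` directly, rather than `doctest.testmod()`, is a choice made here so the result can be inspected:

```python
import doctest

def count_letters(s):
    """Return a dictionary of letter counts.

    >>> count_letters('abba')
    {'a': 2, 'b': 2}

    """
    counts = {}
    for letter in s:
        counts[letter] = counts.get(letter, 0) + 1
    return counts

# Collect the examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
globs = {'count_letters': count_letters}
for test in finder.find(count_letters, 'count_letters', globs=globs):
    runner.run(test)

results = runner.summarize(verbose=False)
```

`results.failed` and `results.attempted` report how many examples failed and how many ran.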
Downsides

• All the code snippets in a file are coupled
• Awkward to make excuses for trying edge cases
• Not a good model for real tests
Upside

• Great for testing documentation
The greatest success of eval()?

[Image: ipynb-example.png, an example notebook]
eval()

Wound up having only limited use
for application code but very wide
use in the tools we build
around code!
Second Topic
Python’s Object Model
Syntax     Dunder Methods

a + b      a.__add__(b)
a - b      a.__sub__(b)
a < b      a.__lt__(b)
repr(a)    a.__repr__()
a.b        a.__getattribute__('b')
a(b)       a.__call__(b)
Let’s look at three dunder methods.
#1
__bool__()
if a: ...
elif a: ...
while a: ...
'yes' if a else 'no'
[a for a in sequence if a]
(a for a in sequence if a)
if False:   if True:
if 0:       if 1:
if 0.0:     if 1.0:
if '':      if 'nonempty string':
if []:      if ['non', 'empty', 'list']:
if {}:      if {'nonempty': 'dict'}:
if None:    if {'non', 'empty', 'set'}:
Programmers love brevity
PEP 8
“For sequences, (strings, lists, tuples),
use the fact that empty sequences are false.”
if len(seq) > 0:
if len(seq):
if seq:
Problem

Python code
is always in danger
of becoming a
type desert
      ┌┐
      ││     ┌┐
    ┌┐││┌┐   ││┌┐
    │└┘│││   │└┘│
    └─┐└┘│   │┌─┘    \|/
      │┌─┘   ││       |
──────┴┴─────┴┴───────────
def unfriend(subject, users):
    if not users:
        return
    remove_edges('friend', subject, users)

# Q: What’s the type of “users”?
Because anything can have a __bool__() method,
the object in an if statement can be anything
<UserCollection>
[<User angelica>, <User eliza>]
['angelica', 'eliza']
12
import this
“Explicit is better than implicit.”
if users:
if len(users):  # for a container
if users > 0:   # for an integer
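One way to read the three spellings above: the explicit forms fail loudly when handed the wrong type, while a bare truth test accepts anything. The functions below are hypothetical illustrations, not code from the talk:

```python
def unfriend_all(subject, users):
    """Hypothetical variant of unfriend() that expects a container."""
    if len(users) == 0:      # raises TypeError if given an integer
        return []
    return [(subject, user) for user in users]

def has_credit(balance):
    """Hypothetical check that expects a number."""
    return balance > 0       # raises TypeError if given a list
```

A reader of either function can tell from the test itself which kind of object is expected.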
#2
__getattr__()
class C:
    def __getattr__(self, name):
        print('Asked for attribute:', name)
        return 5

c = C()
print('c.foo equals', c.foo)

 Asked for attribute: foo
 c.foo equals 5
# Proxy Pattern

class Proxy:
    def __init__(self, target):
        self.target = target

    def __getattr__(self, name):
        return getattr(self.target, name)
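A short usage sketch for the Proxy class above, wrapping a plain list chosen only for illustration:

```python
class Proxy:
    def __init__(self, target):
        self.target = target

    def __getattr__(self, name):
        # Only called when normal lookup fails, so "target" itself
        # is found on the instance without recursion.
        return getattr(self.target, name)

p = Proxy([3, 1, 2])
p.sort()              # forwarded to the wrapped list
first = p.index(1)    # so is this
same = p.target       # found normally; __getattr__() is not called
```

Implicit special-method lookups such as `len(p)` bypass `__getattr__()`, so a proxy written this way forwards only explicit attribute access.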
# A "mock" test object

class Mock:
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def fake_method(*args):
            self.calls.append((name, args))
        return fake_method
You get an object that records
what methods have been invoked
m = Mock()

m.open('file.txt')
m.close()

assert m.calls == [
    ('open', ('file.txt',)), ('close', ())
]
from unittest.mock import Mock, call

def test_layout():
    window = Mock()
    layout_list(['a', 'b'], window)
    # method_calls records each call made on the mock's attributes
    assert window.method_calls == [
        call.text(0, 0, 'a'),
        call.text(0, 12, 'b'),
    ]
Problem?
Mock encourages tests
that are not really tests
Ideal tests
unit test ↔ A
unit test ↔ B
integration test ↔ A ↔ B

Mocked test
A ↔ Mock()    Mock() ↔ B
My complaints:

Tests start to lose signal
when Mock becomes routine
instead of a reluctant workaround

Docs and blog posts too often focus
on how to use rather than when
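A sketch of one way a bare Mock loses signal, and a partial mitigation using the `spec` argument of `unittest.mock.Mock`. The `Window` class is a hypothetical collaborator; `spec` narrows the problem but does not turn a mocked test into an integration test:

```python
from unittest.mock import Mock

class Window:
    """Hypothetical collaborator, standing in for real code."""
    def text(self, x, y, s):
        ...

loose = Mock()
loose.txet(0, 0, 'a')        # typo, but a bare Mock accepts it silently

strict = Mock(spec=Window)
strict.text(0, 0, 'a')       # fine: Window really has text()
try:
    strict.txet(0, 0, 'a')   # the typo is now caught
    typo_caught = False
except AttributeError:
    typo_caught = True
```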
#3
__call__
The Python syntax f(...)
is an operator named a “call”
f is often a function or method,
but could be any other object instead
class Template:
    def __init__(self, path):
        with open(path) as f:
            self.text = f.read()

    def render(self, **kw):
        return self.text.format(**kw)

t = Template('index.html')
t.render(city='Wrocław', conf='code::dive')
Programmers love brevity
“I only have one real method.
Does it even need a name?”
class Template:
    def __init__(self, path):
        with open(path) as f:
            self.text = f.read()

    def __call__(self, **kw):
        return self.text.format(**kw)

t = Template('index.html')
t(city='Wrocław', conf='code::dive')
Problem
Readability
# build tree of XML element objects
                         ↓
return get_template(args)()()
                           ↑
#      flatten to plain text
I’m sure it felt clever to give
each of those objects a “default verb”
with a __call__() method
But I prefer:
return template.bind(args)()()
return template.bind(args).build().flatten()
But wait!
“But what if an API expects a plain callable?
Then my class definitely needs a __call__() method!”
class Template:
    def __call__(...):
        ...

t = Template()
framework(t)  # Needs a callable
So the class needs __call__(), right?
No.
It doesn’t.
class Template:
    def render(...):
        ...

t = Template()
framework(t.render)  # Needs a callable

# "t.render" is a "bound method"
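A runnable sketch of the bound-method point. This `Template` takes its text directly instead of a path, and `framework()` is a hypothetical hook; both are simplifications for illustration:

```python
class Template:
    def __init__(self, text):
        self.text = text

    def render(self, **kw):
        return self.text.format(**kw)

def framework(callback):
    """Hypothetical framework hook that only knows how to call things."""
    return callback(city='Wrocław', conf='code::dive')

t = Template('{conf} in {city}')
render = t.render        # a bound method remembers its instance
page = framework(render)
```

The framework receives a plain callable, and the class keeps a readable method name.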
Third Topic
Python’s mutability
modules and classes are mutable
import string

string.ascii_letters = ('aąbcćdeęfghijklł'
                        'mnńoóprsśtuwyzźż')
class C:
    value = 1

C.value = 2
print(C.value)  # prints “2”
class C:
    def method(self):
        print('A')

C.method = lambda self: print('B')

obj = C()
obj.method()  # prints “B”
This resulted in some
predictable mayhem
Programmers dislike repetition
DRY
“Don’t Repeat Yourself”
# your_web_app.py

def index_view(request):
    ...

def settings_view(request):
    ...

def shop_view(request):
    ...
“Python global objects are mutable.
Let’s eliminate the repetition!”
# your_web_app.py
from framework import request

def index_view():
    ...

def settings_view():
    ...

def shop_view():
    ...
# The framework loads your module.
name = 'your_web_app'
module = __import__(name)

# Then mutates `request` with HTTP data.
request.method = 'GET'
request.url = '/shop/'
response = getattr(module, 'shop_view')()
This is a bad idea
• Data now enters a function from two directions
• Tests will have to mutate a global
• Threads would overwrite the one request object
• Async framework will need special knowledge
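A sketch contrasting the two styles from a test's point of view. The `request` object below is a hypothetical stand-in built from `types.SimpleNamespace`, not any real framework's API:

```python
from types import SimpleNamespace

# Hypothetical stand-in for a framework's global request object.
request = SimpleNamespace(method=None, url=None)

def shop_view_global():
    # Input arrives through a module global.
    return 'shop page for ' + request.url

def shop_view(request):
    # Input arrives through the argument list only.
    return 'shop page for ' + request.url

# Testing the global version means mutating shared state...
request.url = '/shop/'
from_global = shop_view_global()

# ...while the explicit version just receives a value.
from_argument = shop_view(SimpleNamespace(method='GET', url='/shop/'))
```

Two threads calling `shop_view_global()` would share the single `request` object, which is the overwrite hazard listed above.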
Bad

But can get even worse!
Why import request at all?
# The framework loads your module.
name = 'your_web_app'
module = __import__(name)

# Then injects a global named `request`!
module.request = request_object
Global state

Useful in emergencies,
but should not be used routinely
“Emergency”?

Gating: experimental feature chosen by data
the routine normally would not have
Another use of mutability:
code that’s about other code

Tests
from unittest.mock import patch

with patch.object(C, 'method', replacement_method):
    # ...
    # indented block runs with altered method
    # ...
So patch() has become the official
mechanism for testing code that’s
coupled to side effects
   get_books()       fetch()      download()
────────────────────────────────────────────────
for isbn in books:
    fetch(isbn)
               
                 url = URL.format(isbn)
                 download(url)
                              
                                u = urlopen(url)
                                data = u.read()
                                return data
Q: How much of that code can you
test without triggering real HTTP?
A: None of it!
The code abstracts away the I/O
but fails to actually decouple it
Solution?
# Best solution is Clean Architecture / Hexagonal

    get_books()           book_urls()    build_url()
────────────────────────────────────────────────────────
urls = book_urls(books)
for url in urls:
    u = urlopen(url)    for isbn in books:
    data = u.read()         yield build_url(isbn)

                                    u = URL.format(isbn)
                                    return u
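The decoupled layout above, written out as a runnable sketch. The URL template is a hypothetical placeholder:

```python
from urllib.request import urlopen

URL = 'https://example.com/books/{}'   # hypothetical URL template

def build_url(isbn):
    return URL.format(isbn)

def book_urls(books):
    for isbn in books:
        yield build_url(isbn)

def get_books(books):
    # The only function that touches the network.
    for url in book_urls(books):
        with urlopen(url) as u:
            yield u.read()

# The pure functions are testable with no patching and no HTTP.
urls = list(book_urls(['ISBN1', 'ISBN2']))
```

Only `get_books()` performs I/O, so the logic worth testing lives in functions that never need a network.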
# Second best: patch() only the I/O
# get_books() → fetch() → download() → urlopen()

with patch('urlopen', ...):
    get_books(['ISBN1', 'ISBN2'])
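A runnable sketch of patching only the I/O boundary. It uses `patch.object()` on the `urllib.request` module instead of the abbreviated `patch('urlopen', ...)` on the slide; the URL template and `fake_urlopen()` are hypothetical:

```python
import io
import urllib.request
from unittest.mock import patch

URL = 'https://example.com/books/{}'   # hypothetical URL template

def download(url):
    # Looked up on the module at call time, so the patch is seen.
    u = urllib.request.urlopen(url)
    return u.read()

def fetch(isbn):
    return download(URL.format(isbn))

def get_books(books):
    return [fetch(isbn) for isbn in books]

def fake_urlopen(url):
    # Echo the URL back so the test can see what was requested.
    return io.BytesIO(url.encode('ascii'))

with patch.object(urllib.request, 'urlopen', fake_urlopen):
    data = get_books(['ISBN1', 'ISBN2'])
```

All three of the application's own functions run for real; only the network call is replaced.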
# Worst: disable immediate subroutine
# get_books() → fetch() → download() → urlopen()

args = []
with patch('fetch', args.append):
    get_books(['ISBN1', 'ISBN2'])

assert args == ['ISBN1', 'ISBN2']
Like testing objects with only Mocks,
testing functions with all of their
subroutines patched falls short
of being a real test
And you lose half
the benefit of testing!
Why write tests?

1. Automated alert when code is broken
2. Apply pressure to decouple your code
Good: heavily coupled code is
still testable in Python thanks to patch()

Bad: overuse of patch()
not only makes tests less meaningful,
but ruins what the tests would teach us
about our code’s architecture
One last danger of mutability:
import-time side effects
Python modules are executed
top-to-bottom at import time

Each module starts as a blank slate
and runs its code sequentially
# my_module.py

def f():
    for i in range(3):
        print(i)

for letter in 'aąbcć':
    print(letter)
The ability to execute arbitrary code
at import time leads to temptation
1. Configuration lives in Python code.
2. Switch to loading config from a file.
3. Then, config moves inside a database.
4. Then, database port moves to ZooKeeper.
Result: a module that can't
be imported or tested until
ZooKeeper and Postgres are both up
Advice:
Never start down the road.
Admit no import-time side effects!
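One possible way to follow that advice, sketched under the assumption that configuration can be loaded on first use. `CONFIG_TEXT` is a hypothetical stand-in for a file, database, or service:

```python
import functools
import json

CONFIG_TEXT = '{"db_port": 5432}'   # hypothetical stand-in for a real source

@functools.lru_cache(maxsize=None)
def get_config():
    # Runs on the first call, not at import time; later calls
    # reuse the cached result.
    return json.loads(CONFIG_TEXT)

def connect_string():
    return 'localhost:{}'.format(get_config()['db_port'])
```

Importing this module does no work beyond defining functions, so it can be imported and tested while the real configuration source is down.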
One last thought
my_package/
     __init__.py  ← people keep putting code here
     models.py
     views.py
Worse yet, some __init__.py files
import all their package’s modules
Advice: keep __init__.py empty of code

skyfield — docstring but no code
skyfield.api — imports everything
Final Topic
Object Orientation
subclass
from threading import Thread
# Option 1

def task():
   ...

t = Thread(target=task)
t.start()
# Option 2

class MyThread(Thread):
   def run(self):
       ...

t = MyThread()
t.start()
Subclassing Thread

• Two new objects instead of one
• Extra 4 spaces of indentation
• Can’t test task in isolation
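A sketch of Option 1 showing the testing benefit. The `task()` body and its arguments are invented for illustration:

```python
from threading import Thread

results = []

def task(n, out):
    """Plain function: callable from a test without any thread."""
    out.append(n * n)

# Tested in isolation, with no Thread involved.
task(3, results)

# Composed with a Thread at the point of use.
t = Thread(target=task, args=(4, results))
t.start()
t.join()
```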
Composition
vs
Specialization
Composition — Putting simple things together
Specialization — Making one thing more complicated
Problem

Python provides
an extra temptation
toward specialization
m×n
Gang of Four book Design Patterns

MSWindow MacWindow LinuxWindow
×
ListWindow IconWindow ImageWindow
MSListWindow
MSIconWindow
MSImageWindow
MacListWindow
MacIconWindow
MacImageWindow
LinuxListWindow
LinuxIconWindow
LinuxImageWindow
m×n
Gang of Four
m×n → Bridge Pattern
Window        Layout
  |              |
MSWindow      ListLayout
MacWindow     IconLayout
LinuxWindow   ImageLayout
# At runtime, two class instances
# are composed together

w = MacWindow()
return IconLayout(w)
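A toy sketch of the Bridge arrangement. The `draw_text()` and `show()` methods and the coordinates are assumptions made for illustration, not part of the talk:

```python
class MacWindow:
    def draw_text(self, x, y, s):
        return 'mac text {!r} at ({}, {})'.format(s, x, y)

class LinuxWindow:
    def draw_text(self, x, y, s):
        return 'linux text {!r} at ({}, {})'.format(s, x, y)

class ListLayout:
    def __init__(self, window):
        self.window = window            # composition, not inheritance

    def show(self, items):
        return [self.window.draw_text(0, 12 * i, item)
                for i, item in enumerate(items)]

class IconLayout:
    def __init__(self, window):
        self.window = window

    def show(self, items):
        return [self.window.draw_text(48 * i, 0, item)
                for i, item in enumerate(items)]

# Any layout works with any window: m + n classes, m × n combinations.
lines = ListLayout(MacWindow()).show(['a', 'b'])
icons = IconLayout(LinuxWindow()).show(['x'])
```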
m×n → m+n
Q: m×n → Bridge Pattern → ?
You can use data to couple
code instead of using behavior
layout(metrics, information)
           ↓
    [Text, Text, Line,
     Box, Line, Text]
           ↓
  Window.render(graphics)
Is there any advantage
to coupling with nouns
instead of verbs?
“Show me your flowchart
and conceal your tables,
and I shall continue to be mystified.

Show me your tables,
and I won’t usually need your flowchart;
it’ll be obvious.”

Fred Brooks (1975)
If Fred Brooks was right,
coupling classes with data structures
will be easier to understand than
coupling them with behavior
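A sketch of coupling through data rather than behavior. The `Text` and `Line` records, the `layout()` signature, and the `Window` class are all hypothetical:

```python
from collections import namedtuple

# Hypothetical drawing instructions: plain data, no behavior.
Text = namedtuple('Text', 'x y string')
Line = namedtuple('Line', 'x1 y1 x2 y2')

def layout(items, line_height=12):
    """Pure function: information in, list of instructions out."""
    instructions = []
    for i, item in enumerate(items):
        top = i * line_height
        bottom = top + line_height - 1
        instructions.append(Text(0, top, item))
        instructions.append(Line(0, bottom, 100, bottom))
    return instructions

class Window:
    def __init__(self):
        self.drawn = []

    def render(self, graphics):
        # A real window would draw; this one only records names.
        for g in graphics:
            self.drawn.append(type(g).__name__)

graphics = layout(['a', 'b'])
w = Window()
w.render(graphics)
```

The list returned by `layout()` can be printed, compared, and asserted on directly, with no mock window needed.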
m×n → Bridge Pattern → Pipeline
Python
m×n → ? → Bridge Pattern → Pipeline
m×n → Mixins → Bridge Pattern → Pipeline
Let’s look at an early
experiment that still lingers
in the Python Standard Library
1990s


crazy times
from socketserver import BaseServer
finish_request()             C
get_request()             B*
handle_error()                 D*
handle_request()       A
handle_timeout()               D*
process_request()         B*
serve_forever()    API
server_activate()      A*
server_bind()          A*
server_close()     API
service_actions()      A*
shutdown()         API
verify_request()          B*
This single class is in fact
several layers of behavior
Why so many behaviors
on a single class?
“Modeling”
2 nouns: listening socket, connected socket
2 classes: BaseServer, StreamRequestHandler
If you think of classes
as structs for modeling the world,
you’ll create one class for one noun
instead of one class for each interface (behavior)
Result?
Generally, too few classes
You’ll model patterns of behavior
with a dense graph of method calls
instead of clean separate interfaces
You’ll wind up
with one big “multi-storey” class
instead of several sleek single-storey
classes that cooperate together
Bridge Pattern
“abstraction and its implementation
in separate class hierarchies”
“S” in SOLID:

Single Responsibility Principle
“A class should have only one reason to change”
Alas, the BaseServer has
many reasons to change!
Server            Port       Service
───────────────── ────────── ─────────────────
serve_forever()   bind()     service_actions()
process_request() activate() verify_request()
shutdown()        get()      handle_request()
                  close()    handle_error()
                             handle_timeout()
Yes: three different classes
to wrap a single actual resource!
          <Port>  <listening socket>
<Server>
          <Service>
Q: So, how does the BaseServer class
survive needing to be customized in
different directions at once?
A: Mixins!
# Mixin does not inherit from anything else
class ThreadingMixIn:
    def process_request(self):
        ...

# By listing mixin first, its methods get
# priority over those of the other class
class MyService(ThreadingMixIn, BaseServer):
    ...
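A toy sketch of why listing the mixin first matters. These classes are stand-ins that reuse the names above; they are not the real socketserver classes:

```python
class BaseServer:
    def process_request(self):
        return 'handled inline'

    def serve(self):
        return self.process_request()

class ThreadingMixIn:
    def process_request(self):
        return 'handled in a thread'

class MyService(ThreadingMixIn, BaseServer):
    pass

# The method resolution order puts the mixin ahead of the base class.
mro = [cls.__name__ for cls in MyService.__mro__]
outcome = MyService().serve()
```

`serve()` is defined on the base class, yet its call to `self.process_request()` reaches the mixin's version first.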
       socketserver

             
TCPServer     ForkingMixIn
UDPServer  ×  ThreadingMixIn
             
Mixins let you extend a class
in several directions at the same time
without solving the actual problem:
that the class is too complicated
m×n → Mixins → Bridge Pattern → Pipeline
It’s nearly 2020,
but new Python books are
still recommending mixins
class HybridDetailView(
    JSONResponseMixin,
    SingleObjectTemplateResponseMixin,
    BaseDetailView
): ...
Poor architecture is going
to happen despite our best efforts,
so it’s wonderful that Python has powerful
mechanisms for surviving it
But: Mixins too often become
routine instead of an emergency
survival mechanism
Conclusion

Python is a flexible
and powerful language
It’s so powerful that we often
work around poor architecture
without noticing it
A powerful language by itself
is not enough
The ease and power of the Python language need
to be combined with the experience of its community
for healthy long-term software projects

@brandon_rhodes