Saturday, July 9, 2022

Just use Dictionaries

A Python dictionary has a simple & well-known API. It is possible to merge data using a nice & minimalistic syntax, without mutating or worrying about state. You're probably not gonna need classes.
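Here is a minimal sketch of that merge syntax. The keys and values are made up for illustration, and the `|` operator assumes Python 3.9 or later.

```python
# Hypothetical settings data, only for illustration.
defaults = {"theme": "light", "language": "en"}
overrides = {"theme": "dark"}

# Unpacking both dicts builds a brand-new dict; keys to the right win.
merged = {**defaults, **overrides}

# Same result with the union operator (Python 3.9+).
merged_with_operator = defaults | overrides

print(merged)    # the merged copy
print(defaults)  # the original dict is left untouched
```

Neither form mutates `defaults` or `overrides`, so there is no shared state to worry about.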

Hey, what's wrong with classes? 🤔

From what I've seen in Python, classes often add unnecessary complexity to code. Remember, the Python language is all about keeping it simple.

My impression is that class and instance-based code generally feels like the proper way of coding: encapsulating data, inheriting features, exposing public methods & writing smart objects. The result is very often a lot of code, weird APIs (each class with its own) and objects that are not smart enough. That kind of code quickly tends to become an obstacle. I guess that's when workarounds & hacks usually get added to the app.

Two ways of solving a problem: class-based vs data-oriented.
Less code, fewer problems.

What about Dataclasses?

Python dataclasses might be a good tradeoff between a heavy object with methods and the simple dictionary. You get type hints and autocomplete. You can also create immutable data classes, and that's great! But you might miss the flexibility: the simplicity of merging, picking or omitting data from a dictionary, and of letting data flow smoothly through your app.
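To make the tradeoff a bit more concrete, here is a rough sketch. The `User` class and the `pick` and `omit` helpers are hypothetical names invented for this example; only `dataclass` and `asdict` come from the standard library.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen=True makes instances immutable
class User:  # hypothetical type, only for illustration
    name: str
    email: str
    is_admin: bool = False


def pick(data: dict, keys: set) -> dict:
    """Return a new dict with only the given keys (hypothetical helper)."""
    return {k: v for k, v in data.items() if k in keys}


def omit(data: dict, keys: set) -> dict:
    """Return a new dict without the given keys (hypothetical helper)."""
    return {k: v for k, v in data.items() if k not in keys}


user = User(name="Ada", email="ada@example.com")

# Once the data is a plain dict, picking and omitting are one-liners.
data = asdict(user)
contact = pick(data, {"name", "email"})
public = omit(data, {"email"})
```

The dataclass gives you the typed, immutable object; the dictionary version is the one that is easy to reshape and pass along.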

Hey, what about Pydantic?

That's a really good choice for things like defining FastAPI endpoints. You'll get the typed data as OpenAPI docs for free.

I would convert the Pydantic model to a dictionary as early as possible (using the model.dict() function, renamed model_dump() in Pydantic 2), or just pick the individual keys and pass those on to the rest of the app. That way, the rest of the app doesn't need to know about a domain-specific type, or about some base class created as a workaround for the new set of problems that custom types introduce.

Just data. Keeping it simple.

What about the basic types? 🤔

That is certainly a tradeoff when using dictionaries: the data can be of any type, and you will potentially get runtime errors. On the other hand, is that a real problem when using basic data types like dict, bool, str or int? For me, I can't remember that ever having been an issue.

But shouldn't data be private?

Classes are often used to separate public and private functionality. From my experience, explicitly making data and functionality private rarely adds value to a product. I think Python agrees with me about this. By default, all things in a Python module are public. I remember learning about it, and the authors saying that’s okay because we’re all adults here. I very much liked that perspective!

Do you like Interfaces? 🤩

Yes! Especially when structuring code in modules and packages (more about that in the next section). Using __init__.py is a great way to make the intention of a small package clearer and easier to grasp. Maybe there's only one function that makes sense to use from the outside? That's where the package interface (aka the __init__.py file) feature fits in well.

Python files, modules, packages?

In Python, a file is a module. One or more modules in a folder form a package. One or more packages can be combined into an app. Using a package interface makes sense when structuring code in this way.
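A package interface could look something like the sketch below. The `greetings` package, its `core` module and the `greet` function are all hypothetical. The files are written to a temporary folder only so that the example runs as-is; in a real project they would simply live in your source tree.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout:
#   greetings/
#       __init__.py   <- the package interface
#       core.py       <- implementation details

CORE_PY = '''
def _normalize(name):
    # Helper not exposed by the package interface.
    return name.strip().title()


def greet(name):
    return {"message": f"Hello, {_normalize(name)}!"}
'''

INIT_PY = '''
# Expose only the one function that makes sense to use from the outside.
from .core import greet

__all__ = ["greet"]
'''

# Write the two files to a temporary folder.
root = Path(tempfile.mkdtemp())
package_dir = root / "greetings"
package_dir.mkdir()
(package_dir / "core.py").write_text(CORE_PY)
(package_dir / "__init__.py").write_text(INIT_PY)

# Make the folder importable, then use the package through its interface.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
greetings = importlib.import_module("greetings")

result = greetings.greet("  ada ")
```

Callers only need to know about `greetings.greet`. Nothing stops them from reaching into `greetings.core`, but the `__init__.py` file makes the intended entry point clear.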

Keeping it simple. 😊

I'm finishing off this post with a quote from the past:

“Data is formless, shapeless, like water.
If you put data in a Clojure map, it becomes the map.
You put data in a Python list and it becomes the list.
Now data in a program can flow or it can crash.

Be data, my friend.”

Bruce Lee, 1940 - 1973

ps.

If neither Bruce nor I convinced you of how great simple data structures are, maybe Rich Hickey will? Don't miss his "just use maps" talk!

ds.



Top photo by Syd Wachs on Unsplash
