Numbers in Python

Sun 26 April 2020 by Moshe Zadka

Numbers in Python come in all shapes and forms. Different kinds of representations exist because they make different trade-offs. These trade-offs are often surprising!

Integers

The most surprising thing about integers is how easily they stop being integers. Dividing two integers, for example 4/3, gives a float, and (4/3)*3 is the float 4.0. Even if a program has no floating point numbers coming in, all it takes for floating point numbers to appear somewhere is a single division.
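
For example:

>>> type(4/3)
<class 'float'>
>>> (4/3)*3
4.0
>>> type(4//3)
<class 'int'>

Floor division, //, is the operation that stays within integers.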

Floats

Floats do not behave like numbers. Numbers obey certain mathematical properties: subtraction is the inverse of addition, addition is associative, and more.

For example:

>>> 1 + 2 - 2 - 1
0
>>> 0.1 + 0.2 - 0.2 - 0.1
2.7755575615628914e-17

Adding two numbers and then subtracting them one at a time does not always get back to zero.

They do not obey the associative law of addition, a + (b + c) = (a + b) + c:

>>> a = 2**-53
>>> (a + a) + 1 == a + (a + 1)
False

These show just two of the corner cases that floating point numbers exhibit, which can be surprising. A full treatise on the ways that floating point behavior can be surprising is too big to fit in the margin of this blog post.

Fractions

Many algorithms that look straightforward "explode" with exact fractions. The explosion usually starts as a time explosion: the algorithm becomes "quadratic", with the running time proportional not to the input length, but to the square of the input length. In other words, doubling the input size quadruples the time it takes.

If enough time is spent, a memory explosion is also possible: the space requirements keep increasing until all memory fills up.

One weird protection against the memory explosion is that it usually takes too long to reach: the program gets killed for "hanging" before memory runs out.

One such "algorithm" is addition.

>>> import fractions
>>> from datetime import datetime
>>> # primes: a pre-built list of prime numbers (described below)
>>> print(set(type(p) for p in primes))
{<class 'int'>}
>>> one = fractions.Fraction(1)
>>> before = datetime.now()
>>> res = sum(one/p for p in primes[:10000])
>>> after = datetime.now()
>>> print("It took", after-before)
It took 0:01:16.033260
>>> print("Size of output", len(str(res)))
Size of output 90676
>>> print("Approximate value", float(res))
Approximate value 2.7092582487972945

This is just adding the inverses of some primes (I removed the first few from the list, and then chopped the list to the next 10,000). On a nice laptop designed as a gaming rig, adding 10,000 numbers took over a minute, and resulted in an output that was over 90K!

In comparison, running the same algorithm with floats is much more efficient:

>>> print(set(type(p) for p in primes))
{<class 'int'>}
>>> before = datetime.now()
>>> res = sum(1/p for p in primes[:10000])
>>> after = datetime.now()
>>> print("It took", after-before)
It took 0:00:00.000480
>>> print("Size of output", len(str(res)))
Size of output 17
>>> print("Approximate value", float(res))
Approximate value 2.709258248797317

The time it took is less than a millisecond, and some of that is possibly measurement error from datetime: this is around five orders of magnitude faster. The output fits in 17 bytes, more than a 5,000x reduction in space. However, the result is inaccurate:

Approximate value 2.7092582487972945
Approximate value 2.709258248797317
                    123456789012

The results agree in their first 12 decimal digits; they differ by about 2e-14. That is like measuring the distance to the moon and being off by less than a millimeter. In cases that do not involve sending a rocket to the moon with sub-millimeter (one grain of sand) tolerance, floats give a result that is precise enough, several orders of magnitude more efficiently.
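
When the tolerance is known up front, the comparison can even be made explicit with math.isclose, using the two results from above:

>>> import math
>>> math.isclose(2.7092582487972945, 2.709258248797317, rel_tol=1e-9)
True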

A lot of the responses to this example were along the lines of "fractions are slow because they are implemented in Python". Python can be responsible for a 10x slowdown, but not a 10,000x one. There is a third-party module, quicktions, which implements fractions using Cython.

Using quicktions was, indeed, quicker: it took the time down from a minute and sixteen seconds to a minute and fifteen seconds on my laptop.

Fundamentally, the problem is that this is a quadratic algorithm. I chose the inputs carefully: the worst-case behavior for fraction addition is on prime numbers, as the session below shows. But unless you can predict the inputs to an algorithm, you cannot rely on anything but the worst-case behavior.
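
The worst case is easy to see directly: prime denominators share no factors, so nothing ever cancels, and the common denominator becomes the product of every prime seen so far:

>>> from fractions import Fraction
>>> total = Fraction(0)
>>> for p in [2, 3, 5, 7, 11]:
...     total += Fraction(1, p)
...     print(total.denominator)
...
2
6
30
210
2310

Each addition manipulates ever-longer integers, and that is where the quadratic time (and the 90K of output) goes.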

Decimals

Decimal numbers are useful when managing financial transactions, for the most boring reason possible: the laws governing finance are specified in decimal. However, all decimal calculations in Python are governed by hidden global state: the context. The context determines the precision, and it could serve as the caricature of why action at a distance makes an API problematic.

Quoting the documentation (for Python 3.8):

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')

In practice, code might have hundreds of lines between setting the precision and doing a calculation. The calculation can be in another function, or even another file.
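
Here is a sketch of the failure mode; compute_rate is a hypothetical helper standing in for code that could live in a different module:

>>> from decimal import Decimal, getcontext
>>> def compute_rate():
...     # imagine this function defined far away, in another file
...     return Decimal(1) / Decimal(7)
...
>>> getcontext().prec = 3
>>> compute_rate()
Decimal('0.143')

Nothing in compute_rate hints that its result depends on whoever touched the context last.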

The only safe way to use decimal numbers in Python is with localcontext:

>>> from decimal import Decimal, getcontext, localcontext
>>> getcontext().prec = 6
>>> # 6853 lines elided
>>> with localcontext() as ctx:
...     ctx.prec = 10
...     Decimal(1) / Decimal(7)
...
Decimal('0.1428571429')

As long as you are careful to use localcontext, decimals work pretty well: localcontext is both thread-safe and signal-safe.

Summary

Before you do things with numbers in your code, stop and think. What types should you use? What do you want to happen? What tolerances are important?

Not thinking means letting the corner cases in the code just happen.

(Thanks to Adi Stav, Aaron Hall, and Avy Faingezicht for their feedback on an earlier draft. All issues and mistakes that remain are my responsibility.)