Intro
mypy and other type checkers are popular tools for ensuring code quality in modern production-grade Python code. A little-known fact is the existence of mypyc. Mypyc uses the same facilities as mypy to analyze Python code, but instead of reporting type violations, it can emit IR code which later gets compiled as C-extension. It’s used to compile mypy itself, as well as other popular tools in Python ecosystem, e.g. black.
Dabbling into compiled Python extensions was long on my TODO list, and mypyc is a great substep to achieve that.
Without further ado, let’s go!
Baseline
A toy example to demonstrate the solution will be the 2021/09 Advent of Code problem, which I solved some time ago and published to my GitHub.
Advent of Code problems have a couple of properties which make them suitable for this demonstration:
Problems resemble competitive programming problems, often promoting low-level implementation, native looping, etc. This style of code is notoriously slow and discouraged in Python
Problems are structured as two-step problems, where the second problem is often designed to uncover performance issues with naive implementation achieved in the step before. This tends to promote problems that are heavily CPU-bound, which (once again) are notoriously slow in Python
For subsequent steps, the code was revamped.
aoc_2021_09.py contains all logic:
import functools
import heapq
def get_input(path: str) -> list[list[int]]:
with open(path) as fh:
return [[*map(int, line.strip())] for line in fh]
def solve(heightmap: list[list[int]]) -> int:
ret = 0
num_rows = len(heightmap)
num_cols = len(heightmap[0])
for row_number, row in enumerate(heightmap):
for col_number, value in enumerate(row):
to_compare: set[int] = set()
if row_number > 0:
to_compare.add(heightmap[row_number - 1][col_number])
if col_number > 0:
to_compare.add(heightmap[row_number][col_number - 1])
if row_number < num_rows - 1:
to_compare.add(heightmap[row_number + 1][col_number])
if col_number < num_cols - 1:
to_compare.add(heightmap[row_number][col_number + 1])
if all(c > value for c in to_compare):
ret += value + 1
return ret
def solve_p2(heightmap: list[list[int]]) -> int:
# this is basically problem of finding connected components
visited = [[False] * len(heightmap[0]) for _ in range(len(heightmap))]
sizes: list[int] = []
for row_number, row in enumerate(heightmap):
for col_number, value in enumerate(row):
size = dfs(visited, heightmap, row_number, col_number)
if size > 0:
sizes.append(size)
return functools.reduce(lambda x, y: x * y, heapq.nlargest(3, sizes))
def dfs(
visited: list[list[bool]],
heightmap: list[list[int]],
row_number: int,
col_number: int,
) -> int:
size = 0
num_rows = len(heightmap)
num_cols = len(heightmap[0])
if not visited[row_number][col_number]:
visited[row_number][col_number] = True
if heightmap[row_number][col_number] == 9:
return 0
size += 1
if row_number > 0:
size += dfs(visited, heightmap, row_number - 1, col_number)
if col_number > 0:
size += dfs(visited, heightmap, row_number, col_number - 1)
if row_number < num_rows - 1:
size += dfs(visited, heightmap, row_number + 1, col_number)
if col_number < num_cols - 1:
size += dfs(visited, heightmap, row_number, col_number + 1)
return size
runner.py contains just invocation:
import aoc_2021_09
my_input = aoc_2021_09.get_input("input.txt")
def benchmark():
"""Will be called by timeit"""
aoc_2021_09.solve_p2(my_input)
if __name__ == "__main__":
print(aoc_2021_09.__file__)
print(aoc_2021_09.solve_p2(my_input))
The reader is left to explore the 2021/09 problem definition and solution code.
Let’s just check if we can run our code:
$ python runner.py
C:\Users\rogal\repo\play-with-mypyc\aoc_2021_09.py
1050192
We can run the code and get the expected result. Awesome!
Compilation
Let’s start with the installation of mypy with relevant extras:
pip install mypy[mypyc]
...
Installing collected packages: mypy
Successfully installed mypy-1.13.0
Let’s check if our type model is accurate.
$ mypy --strict aoc_2021_09.py
Success: no issues found in 1 source file
Perfect! Typically, violations found here will result in an aborted compilation process later.
The compilation is as simple as telling mypyc to compile module (-m) named aoc_2021_09:
$ mypyc -m aoc_2021_09
running build_ext
copying build\lib.win-amd64-cpython-312\aoc_2021_09.cp312-win_amd64.pyd ->
In our filesystem, we can notice a new file:
$ ls
__pycache__/ aoc_2021_09.cp312-win_amd64.pyd* aoc_2021_09.py build/ ex.txt input.txt runner.py
I am on Windows, so *.pyd file is created. On the Linux box, *.so would be created instead.
The created file is fully equivalent to the original *.py file and will be used by the interpreter during execution:
$ python runner.py
C:\Users\rogal\repo\play-with-mypyc\aoc_2021_09.cp312-win_amd64.pyd
1050192
Now, let’s take a look at some numbers!
Benchmarks
The simplest benchmarks (based on timeit) give following results on my laptop.
Run with native Python:
$ python -m timeit 'import runner; runner.benchmark()'
50 loops, best of 5: 7.74 msec per loop
Run with compiled extension:
$ python -m timeit 'import runner; runner.benchmark()'
500 loops, best of 5: 939 usec per loop
Numbers vary from run to run, but they consistently show up as 8x speed-up. As discussed earlier, the example shown here (CPU bound, highly recursive) is almost designed to produce the highest speed-up possible. In my experiments, I’ve seen outputs on the full spectrum:
performance degradations
performance parity
minor speedups
big improvements such as shown in the results above
In practical terms, performance impact must be measured case by case.
Next steps
Of course, toy projects are a great way to start, but they do not represent real-life problems accurately. The next step on my TODO list is to apply these learnings on bigger codebase and hopefully find some performance improvements there.
Wish me luck!
Are you interested in more content like this post?
If you believe that I omitted something essential here, let me know!