A Python prompt into a running process: debugging with Manhole

Sometimes your Python process will behave strangely, run slowly, or give you the wrong answers. And while hopefully you have logging, the logging isn’t always enough.

So how do you debug this process?

If you planned ahead, you can access an interactive Python prompt inside your running process, so you can poke around and see what’s going on.

In this article you’ll learn how to do just that using the Manhole project, along with a discussion of the risks and some suggestions on how to get to the Python objects you care about.

How it works

Consider the following program; it runs Flask, and it enables Manhole:

from flask import Flask
import manhole

class Counter:
    def __init__(self):
        # Number of requests served so far.
        self.i = 0

    def increment(self):
        self.i += 1

    def __str__(self):
        return str(self.i)

_counter = Counter()
app = Flask(__name__)

@app.route("/")
def index():
    _counter.increment()
    return str(_counter) + "\n"


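if __name__ == '__main__':
    # patch_fork=False: don't monkeypatch os.fork.
    # daemon_connection=True: the connection thread is a daemon thread,
    # so it won't keep the process alive on exit.
    manhole.install(patch_fork=False, daemon_connection=True)
    app.run()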

We can run the server, and it starts both the Flask application and Manhole:

$ python flask1.py 
 * Serving Flask app "flask1" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
Manhole[5643:1565025066.7582]: Manhole UDS path: /tmp/manhole-5643
Manhole[5643:1565025066.7582]: Waiting for new connection (in pid:5643) ...
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Notice it tells us that the process has PID 5643, and that Manhole is listening at the path /tmp/manhole-5643.

We can send queries to the web server:

$ curl http://localhost:5000
1
$ curl http://localhost:5000
2
$ curl http://localhost:5000
3

But we can also get a Python prompt inside the running process. We use the socat networking tool to open the Unix socket where Manhole listens; the socket’s path is based on the process’s PID:

$ socat readline unix-connect:/tmp/manhole-5643

... here it prints the stack traces of all threads ...

Python 3.7.4 (default, Jul  9 2019, 16:32:37) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(ManholeConsole)
>>> import __main__
>>> __main__._counter.i
3

It also prints the stack traces of all running threads, useful if something is slow or deadlocked.
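
To get a feel for what such a thread dump contains, here is a minimal sketch that builds one with the standard library alone. It is not Manhole’s own implementation, which may well differ; the function name format_thread_stacks is made up for this example.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return one string holding the current stack of every running thread."""
    # Map thread identifiers to names so the output is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread id to that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        chunks.append(f"Thread {names.get(thread_id, '?')} ({thread_id}):\n")
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


if __name__ == "__main__":
    print(format_thread_stacks())
```

You could paste a function like this into the Manhole prompt later in a session, for example to check whether a thread is still stuck at the same line.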

(Note that Manhole comes with a manhole-cli utility which technically means you don’t need socat, but it has problems rendering, so I’d stick with socat.)

Security and other risks

So this is obviously very useful, but how risky is it?

As far as security goes, if you’re running inside a Docker container, not very. The interpreter is only accessible via a Unix socket file inside the container, so the only way to access the interpreter is via docker exec or kubectl exec or some equivalent.

If someone can run arbitrary programs inside your container they already have plenty of access, so adding Manhole doesn’t seem like much of a security risk.

More significant is the risk of breaking your production environment, since you are messing around in a production server—the same sort of risks you have from running queries on your production database.

  • If the server is running on your laptop or in staging, using Manhole is no problem at all.
  • If it’s a long-running batch process that is acting weird and whose output you control, it may be less risky.
  • If it’s a server that has access to users’ data and you’re worried about breaking it—maybe don’t use it.

Regardless, any time you do use the Manhole interpreter you should consider it a failure: you ought to have been able to debug this problem with logging and monitoring. So don’t just debug the problem; also make sure you add enough logging that you don’t have to do this again.

Getting access to objects

Poking around at objects and looking at their contents is useful, but how do you get access to the objects you need? For module-level objects, including the main program script which becomes the module __main__, you just import the module—see the example above.

Another option is explicitly giving Manhole an object to expose as a local variable. We can change the Flask server above like this:

if __name__ == '__main__':
    manhole.install(patch_fork=False, daemon_connection=True,
                    locals={"counter": _counter})
    app.run()

And now when we connect to the Python interpreter it has a counter object, with no need for imports:

$ socat readline unix-connect:/tmp/manhole-6679
...
(ManholeConsole)
>>> counter
<__main__.Counter object at 0x7fbfc14d1910>

Finally, the garbage collector in Python has access to every object in memory, so we can use that to, for example, find instances of arbitrary classes:

$ socat readline unix-connect:/tmp/manhole-6679
...
(ManholeConsole)
>>> import gc
>>> from flask import Flask
>>> apps = [o for o in gc.get_objects() if isinstance(o, Flask)]
>>> apps
[<Flask 'flask1'>]
>>> import __main__
>>> __main__.app is apps[0]
True
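
If you expect to do this kind of lookup often, the gc query above can be wrapped in a small helper. This is only a sketch: find_instances is a hypothetical name, and the Counter class here is a stand-in for whatever class you are hunting for. Keep in mind that gc.get_objects() only returns objects tracked by the collector, so simple values such as integers and strings will not show up.

```python
import gc


def find_instances(cls):
    """Return every object tracked by the garbage collector that is an instance of cls."""
    return [obj for obj in gc.get_objects() if isinstance(obj, cls)]


class Counter:
    """Stand-in class for the example."""

    def __init__(self):
        self.i = 0


if __name__ == "__main__":
    counter = Counter()
    found = find_instances(Counter)
    # Check by identity that the instance we just created was found.
    print(any(obj is counter for obj in found))
```

Scanning every tracked object takes time in a large process, so this is something to run once from the prompt rather than in a loop.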

Prepare for the worst

Failure is inevitable, and so you should prepare for it in advance: slowness, crashes, or strange behavior will happen sooner or later.

This can include logging (in fact you really ought to have as much logging as possible), crash handlers like faulthandler, and, as we saw in this article, a backdoor Python interpreter for when you really don’t know what’s gone wrong.
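
As a rough illustration of the crash-handler part, the sketch below turns on faulthandler from the standard library. The function name install_crash_handlers and the choice of SIGUSR1 are assumptions made for this example, not something Manhole or Flask requires.

```python
import faulthandler
import signal
import sys


def install_crash_handlers(file=None):
    """Dump Python tracebacks to `file` (default: stderr) on crashes and on SIGUSR1."""
    if file is None:
        file = sys.stderr
    # On fatal signals such as SIGSEGV, write the tracebacks of all threads.
    faulthandler.enable(file=file, all_threads=True)
    # faulthandler.register() and SIGUSR1 are not available on Windows.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        # `kill -USR1 <pid>` will now dump all thread tracebacks on demand.
        # The file must stay open for as long as the handler is registered.
        faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
```

You would call install_crash_handlers() once near the start of the program, for example next to the manhole.install() call.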