Checking in JSON

Tue 08 January 2019 by Moshe Zadka

JSON is a useful format. It might not be ideal for hand-editing, but it does have the benefit that it can be hand-edited, and it is easy enough to manipulate programmatically.

For this reason, it is likely that at some point or another, checking a JSON file into your repository will seem like a good idea. Perhaps it is even beyond your control: some existing technology uses JSON as a configuration file, and the easiest thing is to go with it.

It is still useful to keep the benefit of programmatic manipulation. For example, if the JSON file encodes a list of numbers, and we want to add 1 to every even number, we can do:

import json

with open("myfile.json") as fp:
    content = json.load(fp)
# Add 1 to every even number; leave odd numbers alone.
content = [x + 1 if x % 2 == 0 else x for x in content]
with open("myfile.json", "w") as fp:
    json.dump(content, fp)

However, this causes a problem. Presumably, the list was previously formatted in a visually pleasing way. By default, json.dump writes the whole structure on a single line, so the resulting diff replaces the entire file: it is unreadable and hard to audit visually.
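To make the problem concrete, here is a small sketch that counts how many lines a diff touches, comparing a default json.dumps with an indented one. The file contents and the changed_lines helper are hypothetical, made up for illustration; only difflib and json are standard library.

```python
import difflib
import json


def changed_lines(old, new):
    """Hypothetical helper: count (removed, added) lines in a diff of old -> new."""
    removed = added = 0
    for line in difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""):
        if line.startswith(("---", "+++")):
            continue  # skip the two file-header lines
        if line.startswith("-"):
            removed += 1
        elif line.startswith("+"):
            added += 1
    return removed, added


# Hypothetical hand-formatted file: one number per line.
before = "[\n    1,\n    2,\n    3,\n    4\n]\n"
content = [x + 1 if x % 2 == 0 else x for x in json.loads(before)]

one_line = json.dumps(content) + "\n"            # no indent: one long line
indented = json.dumps(content, indent=4) + "\n"  # same layout as the original

print(changed_lines(before, one_line))  # (6, 1): every line is rewritten
print(changed_lines(before, indented))  # (2, 2): only the even numbers change
```

With the default output, a two-number change shows up as a rewrite of the whole file; with a consistent indent, the diff points straight at the values that changed.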

One solution is to enforce consistent formatting.

For example, using pytest, we can write the following test:

import json

def test_formatting():
    with open("myfile.json") as fp:
        raw = fp.read()
    content = json.loads(raw)
    # The canonical form: 4-space indentation plus a trailing newline.
    redumped = json.dumps(content, indent=4) + "\n"
    assert raw == redumped
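Scripts that rewrite the file and the test can share a single definition of the canonical format, so the two cannot drift apart. The sketch below assumes two hypothetical helpers, dump_canonical and is_canonical (these names are not from any library); both use indent=4 plus a trailing newline, matching the test's json.dumps(content, indent=4) + "\n".

```python
import json
import os
import tempfile


def dump_canonical(obj, path):
    """Hypothetical helper: write obj in the format test_formatting expects."""
    with open(path, "w") as fp:
        fp.write(json.dumps(obj, indent=4) + "\n")


def is_canonical(path):
    """Hypothetical helper: the same check test_formatting performs."""
    with open(path) as fp:
        raw = fp.read()
    return raw == json.dumps(json.loads(raw), indent=4) + "\n"


# Round trip through a temporary directory instead of the real myfile.json.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.json")
    dump_canonical([1, 3, 3, 5], path)
    print(is_canonical(path))  # True
```

The increment script would then call dump_canonical(content, "myfile.json") instead of json.dump, and the test could assert is_canonical("myfile.json").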

Assuming we gate merges to the main branches on passing tests, it is impossible to check in something that breaks the formatting. Automated programs merely need to pass the right options to json.dumps: indent=4, followed by a trailing newline. However, what happens when humans make mistakes?

It turns out that Python already ships with a command-line tool that reformats JSON:

$ python -m json.tool myfile.json > myfile.json.formatted
$ mv myfile.json.formatted myfile.json
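These commands produce exactly what the test expects because, on recent Python 3 versions, json.tool indents with four spaces by default and ends its output with a newline. The sketch below checks that assumption by running json.tool through subprocess on a deliberately messy, made-up input, passing the output file as json.tool's second positional argument instead of using a shell redirect.

```python
import json
import os
import subprocess
import sys
import tempfile

# Deliberately badly formatted input (hypothetical example data).
messy = '{"name": "demo", "values": [1,2,  3]}'

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myfile.json")
    dst = os.path.join(tmp, "myfile.json.formatted")
    with open(src, "w") as fp:
        fp.write(messy)
    # Equivalent to: python -m json.tool myfile.json > myfile.json.formatted
    subprocess.run([sys.executable, "-m", "json.tool", src, dst], check=True)
    with open(dst) as fp:
        formatted = fp.read()

# The same canonical form the formatting test computes.
expected = json.dumps(json.loads(messy), indent=4) + "\n"
print(formatted == expected)  # True
```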

A nice test failure message can remind the programmer of this trick, so that the fix is easy to apply and check in.