How to Build a Webhook Receiver in Django

2021-05-09 Well, I’m hooked.

A common way to receive data in a web application is with a webhook. The external system pushes data to yours with an HTTP request.

Correctly receiving and processing webhook data can be vital to your application working. In this post we’ll create a Django view to receive incoming webhook data.

Example use case

Imagine our site receives messages via webhook from a system at the infamous Acme Corporation. They follow the convention of sending POST requests with JSON bodies to a path on our site that we provide. They send a header with a secret token which we can use to authenticate their requests.
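To make that convention concrete, here’s a sketch of what an Acme-style request might look like from the sender’s side, built with Python’s standard-library urllib. The URL, token, and build_acme_request() helper are hypothetical. Acme’s real client could look quite different.

```python
import json
import urllib.request

# Hypothetical values for illustration only.
RECEIVER_URL = "https://example.com/webhooks/acme/RANDOM_STRING/"
TOKEN = "abc123"


def build_acme_request(message: dict) -> urllib.request.Request:
    """Build, but don't send, a POST request shaped like Acme's webhooks."""
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        RECEIVER_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Acme-Webhook-Token": TOKEN,  # the shared secret token
        },
        method="POST",
    )


request = build_acme_request({"this": "is a message"})
print(request.get_method())  # POST
# urllib stores header names via str.capitalize(), hence the odd casing.
print(request.get_header("Acme-webhook-token"))  # abc123
```

To actually send it, you would pass it to urllib.request.urlopen(). The sketch stops short of that so it runs without a server.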

For the purposes of the example, we’ll ignore what we do with these messages and instead focus on the “scaffolding”.

Message log model

Before we start building a view, we should consider storing all incoming messages. Logging every message allows us to debug failures, check that their structure matches the documentation, and otherwise audit what’s happening.

We could use any data store for the messages, but the simplest solution is to use a database model. This provides all the benefits of Django’s ORM and our database server’s durability guarantees.

The messages are JSON, so we can store them directly in a JSONField. Since Django 3.1 this works for all database backends.

We should also store the time we received each message, and index it to improve query performance. This will allow us to see the messages in order. We can also use it to clear old messages, avoiding indefinite table growth.

Combining these requirements we get this model:

from django.db import models


class AcmeWebhookMessage(models.Model):
    received_at = models.DateTimeField(help_text="When we received the event.")
    payload = models.JSONField(default=None, null=True)

    class Meta:
        indexes = [
            models.Index(fields=["received_at"]),
        ]

Note we’re using models.Index, the modern way to define indexes.

View

Our view should verify the request, receive the incoming message, store it, process it, and reply with a success response. We can do these steps like so:

import datetime as dt
import json
from secrets import compare_digest

from django.conf import settings
from django.db.transaction import atomic, non_atomic_requests
from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST
from django.utils import timezone

from example.core.models import AcmeWebhookMessage


@csrf_exempt
@require_POST
@non_atomic_requests
def acme_webhook(request):
    given_token = request.headers.get("Acme-Webhook-Token", "")
    # Compare as bytes: compare_digest() raises TypeError for non-ASCII str input.
    if not compare_digest(given_token.encode(), settings.ACME_WEBHOOK_TOKEN.encode()):
        return HttpResponseForbidden(
            "Incorrect token in Acme-Webhook-Token header.",
            content_type="text/plain",
        )

    AcmeWebhookMessage.objects.filter(
        received_at__lte=timezone.now() - dt.timedelta(days=7)
    ).delete()
    payload = json.loads(request.body)
    AcmeWebhookMessage.objects.create(
        received_at=timezone.now(),
        payload=payload,
    )
    process_webhook_payload(payload)
    return HttpResponse("Message received okay.", content_type="text/plain")


@atomic
def process_webhook_payload(payload):
    # TODO: business logic
    ...

Note:

@csrf_exempt disables Django’s default cross-site request forgery (CSRF) protection. Normally we wouldn’t want to accept a POST request without a CSRF token, as it could indicate a user being tricked into submitting a malicious form to our site from another site. But webhook callers are other servers rather than browsers, and we verify their requests with a different authentication scheme, so we can disable CSRF.
@require_POST blocks non-POST requests.
@non_atomic_requests opts this view out of the ATOMIC_REQUESTS setting (transaction-per-request), if it is enabled. Using ATOMIC_REQUESTS is normally a good idea, and a straightforward way of adding transactions to your Django application. Here, though, we’re using direct transaction control, with @atomic on process_webhook_payload, to ensure that if our business logic crashes, we’ve at least saved the AcmeWebhookMessage for debugging. Therefore we don’t want a transaction around the whole view.
Acme’s system provides some authentication with a token in the Acme-Webhook-Token header. We check this header against the token they should be using, which we store in an environment variable and read in our settings. If the two do not match, we reject the incoming message with a 403 Forbidden response.

We use secrets.compare_digest() to perform the comparison. Normal string comparison stops at the first differing character, so its running time leaks how much of the input was correct. compare_digest() is designed so that its running time does not depend on where the inputs differ. This prevents timing attacks from recovering our secret token. (Thanks to Florian Apolloner for reminding me to add this protection.)

Authentication is very important for webhook receivers since they are on the public web, and anyone could potentially discover them. Since there’s no real standard for webhooks, different callers use different authentication methods. If you’re adapting this code, check your caller’s documentation.
Before storing the new message, we clean up stored messages older than a week. This is a simple way to remove old data.

If our webhook ends up running frequently, executing this delete query on every request may get expensive. In that case we could move the deletion to a periodic background task, similar to Django’s clearsessions management command.
We use json.loads() to load the request body. We don’t check the Content-Type header, nor handle the error if the body isn’t valid JSON. If the body is invalid, json.loads() raises an exception, the view crashes, and our error reporting software (e.g. Sentry) alerts us.

This is a fine failure mode for our example. Since we’ve verified the message is from Acme, if the body is not JSON, something has gone wrong, and we’d like to know about it.
We store the data in the AcmeWebhookMessage model before attempting to process it. This ensures we have it logged even if we crash later.
We call our business logic handler. This has a stub implementation, left empty for the purposes of this example; in a real-world application we’d add some code here. That said, deploying a first version with an empty handler is a good way to test that messages are being received correctly.
We return a plain-text OK response from our view. Typically webhook callers check only the status code, so we can keep the body minimal.
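One detail worth knowing when adapting the token check: secrets.compare_digest() accepts str arguments only if they are pure ASCII, and raises TypeError otherwise. Since header values come from whoever makes the request, encoding both sides to bytes first avoids crashing on unexpected input. Here’s a minimal sketch, where token_is_valid() and EXPECTED_TOKEN are hypothetical stand-ins for the view’s check and settings.ACME_WEBHOOK_TOKEN:

```python
from secrets import compare_digest

EXPECTED_TOKEN = "abc123"  # hypothetical stand-in for settings.ACME_WEBHOOK_TOKEN


def token_is_valid(given_token: str) -> bool:
    # Encode both sides to bytes: compare_digest() raises TypeError for
    # str arguments containing non-ASCII characters, but accepts any bytes.
    return compare_digest(given_token.encode(), EXPECTED_TOKEN.encode())


print(token_is_valid("abc123"))  # True
print(token_is_valid("def456"))  # False
print(token_is_valid(""))  # False
print(token_is_valid("café"))  # False, rather than a TypeError
```

Comparing bytes keeps compare_digest()’s protection against timing attacks, and any non-matching header, including an empty or non-ASCII one, simply fails the check.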

URL

We can add a URL mapping to our view with the standard path():

from django.urls import path

from example.core.views import acme_webhook

urlpatterns = [
    ...,
    path(
        "webhooks/acme/mPnBRC1qxapOAxQpWmjy4NofbgxCmXSj/",
        acme_webhook,
    ),
]

The path contains a random string, generated with a password manager. This adds a little extra security-by-obscurity, since we won’t provide this URL to anyone but Acme. At the very least, it stops URL enumeration attacks from discovering our receiver.

Random strings in URLs don’t provide real protection, though. URLs often get copied to insecure places, such as logs, emails, or sticky notes. Unfortunately some webhook callers do not support any authentication mechanism, in which case this can be the best option.
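If you’d rather generate the random string in code than with a password manager, Python’s secrets module can do it. A sketch, where random_path_segment() is a hypothetical helper and 24 random bytes (32 characters) is an arbitrary choice that happens to match the length of the example path:

```python
import secrets


def random_path_segment(nbytes: int = 24) -> str:
    # token_urlsafe() returns unpadded Base64url text (A-Z, a-z, 0-9, "-", "_"),
    # so the result can go straight into a URL path without escaping.
    return secrets.token_urlsafe(nbytes)


segment = random_path_segment()
print(f"webhooks/acme/{segment}/")
```

Like the password-manager version, this only adds obscurity; it doesn’t replace a real authentication check.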

Tests

To test our webhook view, we can make requests to it with Django’s test client:

import datetime as dt
from http import HTTPStatus

from django.test import Client, override_settings, TestCase
from django.utils import timezone

from example.core.models import AcmeWebhookMessage


@override_settings(ACME_WEBHOOK_TOKEN="abc123")
class AcmeWebhookTests(TestCase):
    def setUp(self):
        self.client = Client(enforce_csrf_checks=True)

    def test_bad_method(self):
        response = self.client.get("/webhooks/acme/mPnBRC1qxapOAxQpWmjy4NofbgxCmXSj/")

        assert response.status_code == HTTPStatus.METHOD_NOT_ALLOWED

    def test_missing_token(self):
        response = self.client.post(
            "/webhooks/acme/mPnBRC1qxapOAxQpWmjy4NofbgxCmXSj/",
        )

        assert response.status_code == HTTPStatus.FORBIDDEN
        assert (
            response.content.decode() == "Incorrect token in Acme-Webhook-Token header."
        )

    def test_bad_token(self):
        response = self.client.post(
            "/webhooks/acme/mPnBRC1qxapOAxQpWmjy4NofbgxCmXSj/",
            HTTP_ACME_WEBHOOK_TOKEN="def456",
        )

        assert response.status_code == HTTPStatus.FORBIDDEN
        assert (
            response.content.decode() == "Incorrect token in Acme-Webhook-Token header."
        )

    def test_success(self):
        start = timezone.now()
        old_message = AcmeWebhookMessage.objects.create(
            received_at=start - dt.timedelta(days=100),
        )

        response = self.client.post(
            "/webhooks/acme/mPnBRC1qxapOAxQpWmjy4NofbgxCmXSj/",
            HTTP_ACME_WEBHOOK_TOKEN="abc123",
            content_type="application/json",
            data={"this": "is a message"},
        )

        assert response.status_code == HTTPStatus.OK
        assert response.content.decode() == "Message received okay."
        assert not AcmeWebhookMessage.objects.filter(id=old_message.id).exists()
        awm = AcmeWebhookMessage.objects.get()
        assert awm.received_at >= start
        assert awm.payload == {"this": "is a message"}

Note:

We use @override_settings to replace the token setting for every test in the test case. This means we don’t need to set a value in our test settings, nor use the sensitive real token, which we should not save in our code base.
To check the @csrf_exempt decorator, we create our test client with the enforce_csrf_checks flag on. The test client would raise a CSRF error if we accidentally removed the decorator from the view.
We first test the view’s various failure modes before testing its success case. Testing both the missing and bad token cases is not strictly necessary for coverage, but we include both for completeness, in case the code changes.
When making assertions on the response status codes, we compare them with the HTTPStatus enum from the Python standard library.
To send the Acme-Webhook-Token header, we have to use the slightly unfriendly HTTP_* syntax.
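The HTTP_* names follow the CGI/WSGI convention for request headers in the environ dictionary, which Django exposes as request.META and which the test client’s extra keyword arguments feed into. As a rough illustration, this hypothetical header_to_meta_key() helper applies that naming rule:

```python
def header_to_meta_key(header_name: str) -> str:
    """Convert an HTTP header name to its CGI/WSGI environ key."""
    key = header_name.upper().replace("-", "_")
    # Content-Type and Content-Length are the two headers without the prefix.
    if key in {"CONTENT_TYPE", "CONTENT_LENGTH"}:
        return key
    return f"HTTP_{key}"


print(header_to_meta_key("Acme-Webhook-Token"))  # HTTP_ACME_WEBHOOK_TOKEN
print(header_to_meta_key("Content-Type"))  # CONTENT_TYPE
```

On Django 4.2 or later, the test client also accepts a headers argument, such as headers={"Acme-Webhook-Token": "abc123"}, which performs this conversion for you.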

Further changes

There are many ways we might need to improve our webhook receiver, beyond finishing its business logic. Here are some ideas:

We might extract more data from the JSON body into separate fields on our AcmeWebhookMessage model. For example, if there are multiple types of message, we might want to query them by type.
The caller might, by necessity, send us messages more than once. We’d want to guard against reprocessing repeat messages, to behave idempotently. We can do this by querying past messages, but we’d need more fields and maybe an index.
We could offload the processing of the messages to a background task, so we don’t make the caller wait for our success response. To do this we could extend our AcmeWebhookMessage model with more fields. Background processing could also make us more robust, allowing retries etc.
We could prevent the caller system from overwhelming us by adding rate-limiting, using django-ratelimit.
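To illustrate the idempotency idea, here’s a sketch that lets a database uniqueness constraint detect repeat messages. It uses Python’s built-in sqlite3 module with an in-memory database as a stand-in for the Django ORM, and assumes each payload carries a unique "id" key. Both that key and the message_id column are hypothetical, so check what your caller actually sends.

```python
import json
import sqlite3

# In-memory SQLite stands in for the real database. The message_id column
# and the payload's "id" key are hypothetical.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE webhook_message ("
    " message_id TEXT NOT NULL UNIQUE,"
    " payload TEXT NOT NULL)"
)


def record_message(payload: dict) -> bool:
    """Store a message, returning False if its ID was already stored."""
    with connection:  # commits on success, rolls back on error
        cursor = connection.execute(
            "INSERT OR IGNORE INTO webhook_message (message_id, payload) "
            "VALUES (?, ?)",
            (payload["id"], json.dumps(payload)),
        )
    # rowcount is 0 when the UNIQUE constraint turned the INSERT into a no-op.
    return cursor.rowcount == 1


print(record_message({"id": "evt_1", "this": "is a message"}))  # True
print(record_message({"id": "evt_1", "this": "is a message"}))  # False
print(record_message({"id": "evt_2", "this": "is another"}))  # True
```

The view would then only call process_webhook_payload() when record_message() returns True. In Django, a rough equivalent would be a unique field on AcmeWebhookMessage combined with get_or_create(), which returns an (object, created) tuple.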

Fin

I hope you’re web-hooked to my blog!

—Adam
