Siv Scripts

Solving Problems Using Code

Sun 25 October 2020

Using the Facade Pattern to Wrap Third-Party Integrations

Posted by Aly Sivji in Deep Dives   

The main idea behind Software Architecture Methodologies such as Clean Architecture and Hexagonal Architecture is to create loosely coupled components that can be organized into layers. This way of writing code leverages the separation of concerns design principle and makes our application easier to maintain, i.e. we can easily modify our code and test it using stubs.

There are many ways we can create systems with layered architecture; one of the more popular techniques is to leverage Structural Design Patterns to create explicit relationships between classes. This post explores how the Facade Pattern can be used to wrap third-party integrations to improve software design.

Note: this is a companion writeup to my PyTexas talk, Everyday Design Patterns: Facade Pattern.



What You Need to Follow Along

Language

Code


Project Description

We will be creating a changelog generator.

When I cut a new release for software that I own, I include a CHANGELOG that describes all the changes made since the last release.

Below is an example changelog; it contains a list of changes with links to the relevant GitHub Pull Request:

Figure 1. Example Changelog

To simplify our example we'll make some assumptions:

  • the master / main branch is protected and all changes need to be made through a Pull Request
  • we squash all commits before merging into master; this means each commit in the master / main branch represents one change

The process to generate a changelog is fairly straightforward:

  • get the date of the last release using the GitHub API
  • get all the commit messages since that date from the GitHub API
  • format the commit messages into a changelog

Direct Integration Implementation

In this section we will walk through our initial implementation of a changelog generator script; this script directly interacts with the GitHub API.

Changelog Script

Our command-line script looks as follows:

# changelog/a_direct_integration.py

import argparse
import requests


def generate_changelog(owner, repo, version):
    BASE_URL = f"https://api.github.com/repos/{owner}/{repo}"

    # get release date
    resp = requests.get(f"{BASE_URL}/releases/tags/{version}")
    if resp.status_code == 404:
        raise ValueError("Version does not exist")
    resp.raise_for_status()
    release_dt = resp.json()["published_at"]

    # get commit messages
    params = {"sha": "master", "since": release_dt}
    resp = requests.get(f"{BASE_URL}/commits", params=params)
    resp.raise_for_status()
    commit_messages = [item.get("commit", {}).get("message") for item in resp.json()]

    # format
    changelog = ["CHANGELOG", ""]
    for message in commit_messages[::-1]:
        changelog.append(f"- {message}")
    return changelog


def parse_args():
    description = "Generate changelog for repository"
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "-r",
        "--repo",
        type=str,
        help="Full path to repository, (abc/xyz)",
        required=True,
    )
    parser.add_argument(
        "-v",
        "--version",
        type=str,
        help="Version to generate CHANGELOG from",
        required=True,
    )
    return vars(parser.parse_args())


if __name__ == "__main__":
    args = parse_args()
    try:
        owner, repo = args["repo"].split("/")
    except ValueError:
        raise ValueError("Invalid repo")
    version = args["version"]

    changelog = generate_changelog(owner, repo, version)
    print()
    print("\n".join(changelog))

We can run this script as follows:

$ python changelog/a_direct_integration.py -r busy-beaver-dev/busy-beaver -v 2.9.0

CHANGELOG

- Merge dictionaries using new operator in Python 3.9 (#336)

Notes

  • used requests to interact with the GitHub API
  • used argparse to capture and parse command-line arguments

Testing Script

To figure out what / how to test, we need to understand our current workflow.

Figure 2. Diagram of Changelog Script workflow: script interacts with the GitHub API.

The GitHub API is an external dependency that adds complexity to our testing process. It makes our tests slow as we have the additional overhead of making API requests across the internet. Also, what if GitHub goes down? The tests which depend on GitHub are going to fail. That doesn't make a lot of sense.

This is why we replace our dependency on the GitHub API with a stub that returns canned responses.

Figure 3. Diagram of Changelog Script workflow for tests: script interacts with the GitHub API Stub.

In Python, we can use the responses library to create and return canned responses for interactions made using the requests library. We can specify the JSON to return when a specified endpoint is hit with a known HTTP verb. Stubbing out external dependencies also makes our tests deterministic.

Our tests look as follows:

# tests/test_a_direct_integration.py

import responses

from changelog.a_direct_integration import generate_changelog


@responses.activate
def test_generate_changelog():
    # Arrange -- created canned responses
    responses.add(
        responses.GET,
        "https://api.github.com/repos/owner/repo/releases/tags/1.0.0",
        json={"published_at": "2020-01-26"},
    )

    responses.add(
        responses.GET,
        "https://api.github.com/repos/owner/repo/commits",
        json=[
            {"commit": {"message": "last commit"}},
            {"commit": {"message": "first commit"}},
        ],
    )

    # Act
    changelog = generate_changelog("owner", "repo", "1.0.0")

    # Assert
    assert changelog == ["CHANGELOG", "", "- first commit", "- last commit"]

To run our test:

$ pytest tests/test_a_direct_integration.py

================== test session starts ==================
platform darwin -- Python 3.9.0, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /Users/alysivji/siv-dev/siv-scripts/clean-architecture--facade-pattern
collected 1 item

tests/test_a_direct_integration.py .              [100%]

=================== 1 passed in 0.12s ===================

Problem with Current Approach

The implementation above works, but it couples our code to something we do not control. If there is a change to our external dependency, we will have to update the generate_changelog function. Possible changes include:

  • having to modify our code if there is a GitHub API version upgrade
  • rewriting our entire integration logic if we move our project to GitLab

Neither of these changes would affect our actual business logic, but we would still have to modify our code as it is tightly coupled with the integration.

This is where the Facade Pattern comes in. The Facade Pattern helps us separate parts of our code that change from parts of our code that stay the same.


Facade Pattern

Provides a unified interface to a set of interfaces in a subsystem. The Facade defines a higher-level interface that makes the subsystem easier to use.

  • Head First Design Patterns

In this section, we will discuss the Facade Pattern.

Real World Example

(This example is from Head First Design Patterns)

Imagine we have a home theatre system with many different components: TV, cable box, receiver, BluRay player, and some lights. Each of the components of this home theatre system has its own remote we can use to interact with it.

Figure 4. Remote for each component of a home theatre system.

Or we can program a universal remote with a simple interface. This remote will be our Facade to our home theatre system.

Figure 5. Universal remote with a simple interface

We can interact with our interface:

  • The "Watch TV" button can turn on your TV and cable box
  • The "Watch a DVD" button can turn on your TV and BluRay player, and also dim your lights

If we need to access advanced features of any of our devices, we can use its supplied remote. But for most use cases, the universal remote does what we need.
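
The universal remote could be sketched in Python roughly as follows. The `Component`, `Lights`, and `UniversalRemote` classes and their methods are hypothetical names invented for this illustration; each component appends to a shared log instead of controlling real hardware.

```python
class Component:
    """Hypothetical device: records actions instead of driving hardware."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def power_on(self):
        self.log.append(f"{self.name} on")


class Lights:
    """Hypothetical lights with a different interface than the other devices."""

    def __init__(self, log):
        self.log = log

    def dim(self, percent):
        self.log.append(f"lights dimmed to {percent}%")


class UniversalRemote:
    """Facade: one simple interface over several components."""

    def __init__(self, tv, cable_box, bluray, lights):
        self.tv = tv
        self.cable_box = cable_box
        self.bluray = bluray
        self.lights = lights

    def watch_tv(self):
        self.tv.power_on()
        self.cable_box.power_on()

    def watch_dvd(self):
        self.tv.power_on()
        self.bluray.power_on()
        self.lights.dim(30)  # 30% is an arbitrary illustrative level


log = []
remote = UniversalRemote(
    tv=Component("TV", log),
    cable_box=Component("cable box", log),
    bluray=Component("BluRay player", log),
    lights=Lights(log),
)
remote.watch_dvd()
print(log)  # ['TV on', 'BluRay player on', 'lights dimmed to 30%']
```

The caller presses one button and never needs to know which devices are involved or in what order they are switched on.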

Class Diagram

We can visualize the Facade Pattern using the following diagram:

Figure 6. Facade Pattern Class Diagram

In the above diagram, we have multiple clients interacting with a complex subsystem through a Facade.

Use Cases

We can use the Facade Pattern to:

Wrap Third-Party Integrations

Third-party integrations (libraries, APIs, SDKs) are general-purpose tools designed to solve many different types of problems. Usually, we only require a small subset of the functionality a library provides.

We can use the Facade Pattern to "wrap" our integration and only expose the functionality we require.

If our clients require additional functionality from a third-party integration, we can expand the interface of our Facade for that use case. Our abstraction starts to leak if clients start bypassing the Facade.

Break Apart a Monolith

We can use the Facade Pattern to move from a monolith to microservices. Once we know the functionality we are migrating into a new service, we place this logic inside of a Facade. Then we rewrite our monolith's business logic to use the Facade.

Then when we are ready, we can replace method calls inside of the Facade with calls to another service via an API or by putting tasks on a queue.
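
A minimal sketch of that two-step migration, under stated assumptions: `InProcessNotifications`, `QueuedNotifications`, and `register_user` are hypothetical names, and a standard-library `queue.Queue` stands in for a real task queue.

```python
import queue


class InProcessNotifications:
    """Step 1: the Facade wraps logic that still lives in the monolith."""

    def __init__(self):
        self.sent = []

    def send_welcome(self, user_id):
        self.sent.append(f"welcome:{user_id}")


class QueuedNotifications:
    """Step 2: same interface, but the work is handed off via a queue."""

    def __init__(self, task_queue):
        self.task_queue = task_queue

    def send_welcome(self, user_id):
        self.task_queue.put({"task": "send_welcome", "user_id": user_id})


def register_user(user_id, notifications):
    """Monolith business logic; it only knows the Facade's interface."""
    notifications.send_welcome(user_id)
    return user_id


tasks = queue.Queue()
register_user(42, InProcessNotifications())  # before the migration
register_user(42, QueuedNotifications(tasks))  # after the migration
print(tasks.get_nowait())  # {'task': 'send_welcome', 'user_id': 42}
```

`register_user` is identical in both calls; only the object behind the Facade's interface changed.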

Benefits of the Facade Pattern

Using the Facade Pattern provides the following benefits:

Reduces interface of 3rd party integrations

Usually, we only require a small subset of functionality from third-party libraries. We can use the Facade Pattern to simplify a library's interface to only the subset we require.

This can also improve our code's readability. Instead of directly integrating dependencies using each library's API, we can write business logic in the language of our problem domain.
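
To illustrate with a library that needs no network access, here is a sketch that wraps the general-purpose `sqlite3` module from the standard library. The `ReleaseStore` class, its table layout, and its method names are assumptions made up for this example; the point is only that callers see two domain-level methods instead of connections, cursors, and SQL.

```python
import sqlite3


class ReleaseStore:
    """Hypothetical Facade over sqlite3 exposing only what our tool needs."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS releases "
            "(version TEXT PRIMARY KEY, published_at TEXT)"
        )

    def record_release(self, version, published_at):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO releases (version, published_at) VALUES (?, ?)",
                (version, published_at),
            )

    def release_date(self, version):
        row = self._conn.execute(
            "SELECT published_at FROM releases WHERE version = ?", (version,)
        ).fetchone()
        if row is None:
            raise ValueError("Version does not exist")
        return row[0]


store = ReleaseStore()
store.record_release("1.0.0", "2020-01-26")  # illustrative values
print(store.release_date("1.0.0"))  # 2020-01-26
```

Business logic that calls `store.release_date(...)` reads in the language of releases and versions, not in the language of the database driver.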

Weak Coupling

Our clients do not need to know about the underlying implementation of the integration. They only need to know the integration's interface: function names, what parameters it takes, what it sends back.

We can change the implementation of the integration and our clients wouldn't know as long as the interface stayed the same. Another way to say this is: we "program to interfaces, not to implementations".
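
One way to make such an interface explicit in Python is `typing.Protocol`. The sketch below is an illustration under assumptions: `SourceControlClient` and the two in-memory clients are hypothetical, they return canned values instead of calling any API, and `generate_changelog` takes the client as a parameter purely to keep the example self-contained.

```python
from typing import List, Protocol


class SourceControlClient(Protocol):
    """The interface our business logic depends on (hypothetical name)."""

    def get_release_date(self, owner: str, repo: str, version: str) -> str: ...

    def get_commit_messages(
        self, owner: str, repo: str, release_dt: str
    ) -> List[str]: ...


class InMemoryGitHubClient:
    def get_release_date(self, owner, repo, version):
        return "2020-01-26"

    def get_commit_messages(self, owner, repo, release_dt):
        return ["first commit", "last commit"]


class InMemoryGitLabClient:
    """Different internals, same interface."""

    def __init__(self):
        self._tags = {"1.0.0": "2020-01-26"}
        self._commits = ["first commit", "last commit"]

    def get_release_date(self, owner, repo, version):
        return self._tags[version]

    def get_commit_messages(self, owner, repo, release_dt):
        return list(self._commits)


def generate_changelog(client: SourceControlClient, owner, repo, version):
    release_dt = client.get_release_date(owner, repo, version)
    messages = client.get_commit_messages(owner, repo, release_dt)
    return ["CHANGELOG", ""] + [f"- {message}" for message in messages]


for client in (InMemoryGitHubClient(), InMemoryGitLabClient()):
    print(generate_changelog(client, "owner", "repo", "1.0.0"))
```

Both clients produce the same changelog, and `generate_changelog` never learns which one it was given.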

Separation of concerns

We abstract parts of our code that change, from parts of our code that stay the same. This allows us to develop and test each component independently.

Test by replacing each component boundary with a value

Just like we stubbed out the GitHub API, we can stub out each boundary and unit test our component. There is a great talk by Gary Bernhardt that explores this topic in a lot more depth.
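
As a small illustration of testing with plain values, the formatting step of our changelog generator could be pulled into a pure function. `format_changelog` is a hypothetical helper that does not appear in the scripts in this post; it performs no I/O, so its test needs no stubs at all.

```python
def format_changelog(commit_messages):
    """Pure business logic: values in, values out, no I/O."""
    changelog = ["CHANGELOG", ""]
    changelog.extend(f"- {message}" for message in commit_messages)
    return changelog


def test_format_changelog():
    assert format_changelog([]) == ["CHANGELOG", ""]
    assert format_changelog(["first commit", "last commit"]) == [
        "CHANGELOG",
        "",
        "- first commit",
        "- last commit",
    ]


test_format_changelog()
```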


Facade Pattern Implementation

We will refactor our previous script using the Facade Pattern. To do this we need to wrap all the logic associated with the GitHub API in a class.

Another way to say this is: we want to encapsulate the GitHub API into a higher-level abstraction that we can use in our business logic.

Changelog Script

Our updated command-line script looks as follows:

# changelog/b_facade.py

import argparse
import requests

BASE_URL = "https://api.github.com"


def generate_changelog(owner, repo, version):
    github = GitHubClient()
    release_dt = github.get_release_date(owner, repo, version)
    commit_messages = github.get_commit_messages(owner, repo, release_dt)

    changelog = ["CHANGELOG", ""]
    for message in commit_messages:
        changelog.append(f"- {message}")
    return changelog


class GitHubClient:
    """Facade around GitHub REST API"""

    def get_release_date(self, owner, repo, version):
        url = f"{BASE_URL}/repos/{owner}/{repo}/releases/tags/{version}"
        resp = requests.get(url)
        if resp.status_code == 404:
            raise ValueError("Version does not exist")
        resp.raise_for_status()

        return resp.json()["published_at"]

    def get_commit_messages(self, owner, repo, release_dt):
        url = f"{BASE_URL}/repos/{owner}/{repo}/commits"
        params = {"sha": "master", "since": release_dt}
        resp = requests.get(url, params=params)
        resp.raise_for_status()

        messages = [item.get("commit", {}).get("message") for item in resp.json()]
        return messages[::-1]


def parse_args():
    description = "Generate changelog for repository"
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "-r",
        "--repo",
        type=str,
        help="Full path to repository, (abc/xyz)",
        required=True,
    )
    parser.add_argument(
        "-v",
        "--version",
        type=str,
        help="Version to generate CHANGELOG from",
        required=True,
    )
    return vars(parser.parse_args())


if __name__ == "__main__":
    args = parse_args()
    try:
        owner, repo = args["repo"].split("/")
    except ValueError:
        raise ValueError("Invalid repo")
    version = args["version"]

    changelog = generate_changelog(owner, repo, version)
    print()
    print("\n".join(changelog))

We can run this script as follows:

$ python changelog/b_facade.py -r busy-beaver-dev/busy-beaver -v 2.9.0

CHANGELOG

- Merge dictionaries using new operator in Python 3.9 (#336)

Notes

  • this is a simple Facade that retrieves information from public GitHub repos
  • using sessions can improve performance, see Appendix A

Testing Script

To test the above script, we need to use responses as we did before. We will also need to test the generate_changelog driver function which interacts with the Facade to create a changelog.

This looks as follows:

# tests/test_b_facade.py

from unittest import mock
import responses

from changelog.b_facade import generate_changelog, GitHubClient


@responses.activate
def test_github_client_get_release_date():
    responses.add(
        responses.GET,
        "https://api.github.com/repos/owner/repo/releases/tags/1.0.0",
        json={"published_at": "2020-01-26"},
    )

    github = GitHubClient()
    release_dt = github.get_release_date("owner", "repo", "1.0.0")

    assert release_dt == "2020-01-26"


@responses.activate
def test_github_client_get_commit_messages():
    responses.add(
        responses.GET,
        "https://api.github.com/repos/owner/repo/commits",
        json=[
            {"commit": {"message": "last commit"}},
            {"commit": {"message": "first commit"}},
        ],
    )

    github = GitHubClient()
    messages = github.get_commit_messages("owner", "repo", "release_dt")

    assert messages == ["first commit", "last commit"]


class GitHubClientStub:
    def __init__(self, commit_messages=None):
        self.commit_messages = commit_messages
        self.mock = mock.Mock()

    def get_release_date(self, *args, **kwargs):
        self.mock(*args, **kwargs)

    def get_commit_messages(self, *args, **kwargs):
        self.mock(*args, **kwargs)
        return self.commit_messages


@mock.patch("changelog.b_facade.GitHubClient")
def test_generate_changelog(github_mock):
    commit_messages = ["first commit", "last commit"]
    github_mock.return_value = GitHubClientStub(commit_messages)

    messages = generate_changelog("owner", "repo", "1.0.0")

    assert messages == ["CHANGELOG", "", "- first commit", "- last commit"]

To run our test:

$ pytest tests/test_b_facade.py

================== test session starts ==================
platform darwin -- Python 3.9.0, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /Users/alysivji/siv-dev/siv-scripts/clean-architecture--facade-pattern
collected 3 items

tests/test_b_facade.py ...                        [100%]

=================== 3 passed in 0.11s ===================

Notes

  • in addition to replacing our GitHub integration boundary with a stub, we also replaced our internal integration boundary with a value
    • this way of development allows us to write robust tests; we can write loads of unit tests to make sure each component works as expected
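
A test at the internal boundary could also check how the driver uses the Facade, not just what it returns. The sketch below is one possible approach under two assumptions made to keep it self-contained: it uses a bare `mock.Mock()` in place of the hand-written stub, and it passes the client into `generate_changelog` as an argument, which is not how b_facade.py is written.

```python
from unittest import mock


def generate_changelog(github, owner, repo, version):
    """Variant of the driver that receives its client (an assumption)."""
    release_dt = github.get_release_date(owner, repo, version)
    commit_messages = github.get_commit_messages(owner, repo, release_dt)

    changelog = ["CHANGELOG", ""]
    for message in commit_messages:
        changelog.append(f"- {message}")
    return changelog


def test_generate_changelog_passes_release_date_along():
    # Arrange -- configure canned return values on the mock
    github = mock.Mock()
    github.get_release_date.return_value = "2020-01-26"
    github.get_commit_messages.return_value = ["first commit", "last commit"]

    # Act
    changelog = generate_changelog(github, "owner", "repo", "1.0.0")

    # Assert -- both the output and the calls made to the Facade
    assert changelog == ["CHANGELOG", "", "- first commit", "- last commit"]
    github.get_release_date.assert_called_once_with("owner", "repo", "1.0.0")
    github.get_commit_messages.assert_called_once_with(
        "owner", "repo", "2020-01-26"
    )


test_generate_changelog_passes_release_date_along()
```

The second call assertion confirms that the release date returned by the first Facade method is the value handed to the second.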

Facade Pattern: Migrate to GitHub GraphQL API

Now that we wrapped the GitHub API, let's explore how to refactor the underlying implementation in the Facade without changing business logic.

Throughout this post, we have been interacting with GitHub using the REST API interface. In this section, we will be migrating our integration to use the GraphQL API.

There are many videos that describe what GraphQL is and how to use it, but that's beyond the scope of what we need to know. For our purposes, GraphQL is a query language that retrieves the exact data we ask for. Instead of having to parse through large JSON blobs, we can make requests to get the exact information we need.

Changelog Script

Our refactored integration looks as follows:

# changelog/c_graphql.py

import os
import argparse

from sgqlc.endpoint.requests import RequestsEndpoint

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN", None)
BASE_URL = "https://api.github.com"


def generate_changelog(owner, repo, version):
    github = GitHubClient(GITHUB_TOKEN)
    release_dt = github.get_release_date(owner, repo, version)
    commit_messages = github.get_commit_messages(owner, repo, release_dt)

    changelog = ["CHANGELOG", ""]
    for message in commit_messages:
        changelog.append(f"- {message}")
    return changelog


class GitHubClient:
    """Facade around GitHub GraphQL API"""

    def __init__(self, oauth_token):
        headers = {"Authorization": f"Bearer {oauth_token}"}
        self.endpoint = RequestsEndpoint("https://api.github.com/graphql", headers)

    def get_release_date(self, owner, repo, tag):
        query = """
        query findReleaseDt($owner: String!, $repo: String!, $tag: String!) {
            repository(owner: $owner, name: $repo) {
                release(tagName: $tag) {
                    publishedAt
                }
            }
        }
        """
        variables = {"owner": owner, "repo": repo, "tag": tag}
        data = self.endpoint(query, variables)
        try:
            return data["data"]["repository"]["release"]["publishedAt"]
        except TypeError:  # returns {"release": None} if tag does not exist
            raise ValueError("Version does not exist")

    def get_commit_messages(self, owner, repo, release_dt):
        query = """
        query commitsSinceDt($owner: String!, $repo: String!, $branch: String!, $since_dt: GitTimestamp) {
            repository(owner: $owner, name: $repo) {
                object(expression: $branch) {
                    ... on Commit {
                        history(since: $since_dt) {
                            nodes {
                                messageHeadline
                            }
                        }
                    }
                }
            }
        }
        """  # noqa
        variables = {
            "owner": owner,
            "repo": repo,
            "branch": "master",
            "since_dt": release_dt,
        }
        data = self.endpoint(query, variables)
        if "errors" in data:
            # surface every error message returned by the API
            messages = [error.get("message", "") for error in data["errors"]]
            raise ValueError("; ".join(messages))

        commits = data["data"]["repository"]["object"]["history"]["nodes"]
        commit_messages = [commit["messageHeadline"] for commit in commits]
        return commit_messages[::-1]


def parse_args():
    description = "Generate changelog for repository"
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "-r",
        "--repo",
        type=str,
        help="Full path to repository, (abc/xyz)",
        required=True,
    )
    parser.add_argument(
        "-v",
        "--version",
        type=str,
        help="Version to generate CHANGELOG from",
        required=True,
    )
    return vars(parser.parse_args())


if __name__ == "__main__":
    args = parse_args()
    try:
        owner, repo = args["repo"].split("/")
    except ValueError:
        raise ValueError("Invalid repo")
    version = args["version"]

    changelog = generate_changelog(owner, repo, version)
    print()
    print("\n".join(changelog))

We can run this script as follows:

$ python changelog/c_graphql.py -r busy-beaver-dev/busy-beaver -v 2.9.0

CHANGELOG

- Merge dictionaries using new operator in Python 3.9 (#336)

Discussion

Notice that the only change we made was to our GitHub integration; our actual business logic stayed the same. This is exactly what we should expect because our business logic doesn't care if we use the GitHub REST API or the GitHub GraphQL API.

It treats the GitHub integration like a black box. As long as the integration's interface stays the same, our code will work as expected.

To complete this task, we will need to update our contract tests. I will leave this as an exercise for the reader. Appendix B walks through an API testing strategy that records requests and responses.
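
As a starting point for that exercise, here is a sketch that tests the release-date lookup without the network. It makes one assumption that differs from the script above: this trimmed `GitHubClient` receives its endpoint as a constructor argument, so a plain function can stand in for `RequestsEndpoint`. The `fake_endpoint` helper is hypothetical, and the canned payloads follow the response shape the script already parses.

```python
RELEASE_QUERY = """
query findReleaseDt($owner: String!, $repo: String!, $tag: String!) {
    repository(owner: $owner, name: $repo) {
        release(tagName: $tag) {
            publishedAt
        }
    }
}
"""


class GitHubClient:
    """Trimmed Facade; the endpoint is injected (an assumption)."""

    def __init__(self, endpoint):
        self.endpoint = endpoint  # any callable taking (query, variables)

    def get_release_date(self, owner, repo, tag):
        variables = {"owner": owner, "repo": repo, "tag": tag}
        data = self.endpoint(RELEASE_QUERY, variables)
        try:
            return data["data"]["repository"]["release"]["publishedAt"]
        except TypeError:  # release is None if the tag does not exist
            raise ValueError("Version does not exist")


def fake_endpoint(response):
    """Build a stand-in endpoint that always returns a canned payload."""

    def endpoint(query, variables):
        return response

    return endpoint


def test_get_release_date():
    payload = {
        "data": {"repository": {"release": {"publishedAt": "2020-01-26T19:04:10Z"}}}
    }
    github = GitHubClient(fake_endpoint(payload))
    assert github.get_release_date("owner", "repo", "1.0.0") == "2020-01-26T19:04:10Z"


def test_get_release_date_missing_tag():
    payload = {"data": {"repository": {"release": None}}}
    github = GitHubClient(fake_endpoint(payload))
    try:
        github.get_release_date("owner", "repo", "9.9.9")
    except ValueError as exc:
        assert str(exc) == "Version does not exist"
    else:
        raise AssertionError("expected ValueError")


test_get_release_date()
test_get_release_date_missing_tag()
```

Because the Facade's method names and return values match the REST version, the `generate_changelog` test from the previous section would not need to change.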


Conclusion

In this post, we wrapped a third-party integration using the Facade Pattern. This results in loosely coupled code that is easy to maintain and even easier to test.

In the Appendices below, we will build a full-featured Facade and show an easy way to test API integrations.

Additional Resources

  • Freeman, Eric & Robson, Elizabeth. (2004). Head First Design Patterns: A Brain-Friendly Guide. 1st ed. Sebastopol, CA: O’Reilly Media
  • “Gang of Four”. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. 1st ed. Boston, MA: Addison-Wesley Professional
  • Gary Bernhardt: Boundaries
  • Martin, Robert. (2017). Clean Architecture. 1st ed. Upper Saddle River, NJ: Prentice Hall

Appendix A: Full-Featured Facade

Our running example created a simple Facade to demonstrate concepts without additional overhead. While the code does work, it's not something we would use in production.

To create a proper abstraction around the GitHub API we need the following:

  • requests.Session to improve performance
  • set HTTP headers (Content-Type, User-Agent, Accept, etc) to be a good citizen of the web
  • token authentication (via the Authorization header) using a GitHub Access Token with repo permissions
    • will allow us to access private repos

Implementation

# changelog/d_full_featured_facade.py

import os
import argparse
import requests

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN", None)
BASE_URL = "https://api.github.com"


def generate_changelog(owner, repo, version):
    github = GitHubClient(GITHUB_TOKEN)
    release_dt = github.get_release_date(owner, repo, version)
    commit_messages = github.get_commit_messages(owner, repo, release_dt)

    changelog = ["CHANGELOG", ""]
    for message in commit_messages:
        changelog.append(f"- {message}")
    return changelog


class GitHubClient:
    def __init__(self, oauth_token):
        headers = {
            "User-Agent": "Change Log",
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {oauth_token}",
            "Content-Type": "application/json",
        }
        session = requests.session()
        session.headers.update(headers)
        self.session = session

    def get_release_date(self, owner, repo, version):
        url = f"{BASE_URL}/repos/{owner}/{repo}/releases/tags/{version}"
        resp = self.session.get(url)
        if resp.status_code == 404:
            raise ValueError("Version does not exist")
        resp.raise_for_status()

        return resp.json()["published_at"]

    def get_commit_messages(self, owner, repo, release_dt):
        url = f"{BASE_URL}/repos/{owner}/{repo}/commits"
        params = {"sha": "master", "since": release_dt}
        resp = self.session.get(url, params=params)
        resp.raise_for_status()

        messages = [item.get("commit", {}).get("message") for item in resp.json()]
        return messages[::-1]


def parse_args():
    description = "Generate changelog for repository"
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "-r",
        "--repo",
        type=str,
        help="Full path to repository, (abc/xyz)",
        required=True,
    )
    parser.add_argument(
        "-v",
        "--version",
        type=str,
        help="Version to generate CHANGELOG from",
        required=True,
    )
    return vars(parser.parse_args())


if __name__ == "__main__":
    args = parse_args()
    try:
        owner, repo = args["repo"].split("/")
    except ValueError:
        raise ValueError("Invalid repo")
    version = args["version"]

    changelog = generate_changelog(owner, repo, version)
    print()
    print("\n".join(changelog))

Appendix B: Testing with VCR.py

We used the responses library to stub out an external API in order to create deterministic tests. While this method does work, it requires us to manually construct each response payload.

An alternative approach to testing is to utilize VCR.py. VCR.py records requests and responses and saves them to disk as YAML files; these files are called cassettes. When we run our tests, VCR.py will use cassettes to replay the recorded requests and responses.

This approach deserves its own post but that's beyond the scope of this essay.

Implementation

We need to replace the Authorization header which contains our GitHub Access Token with a dummy value to ensure secrets do not get saved in our cassettes. With pytest, we can add the following snippet in our conftest.py:

# conftest.py

import pytest


@pytest.fixture(scope="session")
def vcr_config():
    """Overwrite headers where key can be leaked"""
    return {
        "filter_headers": [("authorization", "DUMMY")],
    }

Our tests will look as follows:

# tests/test_vcrpy.py

import os
from unittest import mock
import pytest

from changelog.d_full_featured_facade import generate_changelog, GitHubClient


class GitHubClientStub:
    def __init__(self, commit_messages=None):
        self.commit_messages = commit_messages
        self.mock = mock.Mock()

    def get_release_date(self, *args, **kwargs):
        self.mock(*args, **kwargs)

    def get_commit_messages(self, *args, **kwargs):
        self.mock(*args, **kwargs)
        return self.commit_messages


@mock.patch("changelog.d_full_featured_facade.GitHubClient")
def test_generate_changelog(github_mock):
    commit_messages = ["first commit", "last commit"]
    github_mock.return_value = GitHubClientStub(commit_messages)

    messages = generate_changelog("owner", "repo", "1.0.0")

    assert messages == ["CHANGELOG", "", "- first commit", "- last commit"]


@pytest.mark.vcr(cassette_library_dir="tests/cassettes/rest")
def test_github_client_get_release_date():
    GITHUB_TOKEN = os.getenv("GITHUB_TOKEN", None)
    github = GitHubClient(GITHUB_TOKEN)
    release_dt = github.get_release_date("busy-beaver-dev", "busy-beaver", "1.3.2")

    assert release_dt == "2020-01-26T19:04:10Z"


@pytest.mark.vcr(cassette_library_dir="tests/cassettes/rest")
def test_github_client_get_commit_messages():
    GITHUB_TOKEN = os.getenv("GITHUB_TOKEN", None)
    github = GitHubClient(GITHUB_TOKEN)

    release_dt = "2020-01-25T19:04:10Z"
    messages = github.get_commit_messages("busy-beaver-dev", "busy-beaver", release_dt)

    assert "Update to Python 3.9 (#335)" in messages

 
    
 
 
