Generating Code without Generating Technical Debt?

Delivering and maintaining AI-generated code.

Written by Reka Horvath

Photo by Paul Felberbauer on Unsplash

GPT and other large language models can produce huge volumes of code quickly. This allows for faster prototyping and iterative development, letting us try out multiple solutions. But it can also leave us with a much larger mess of code to maintain.

“People have talked about technical debt for a long time, and now we have a brand new credit card here that is going to allow us to accumulate technical debt in ways we were never able to do before,” Armando Solar-Lezama, a professor at MIT’s Computer Science & Artificial Intelligence Laboratory, told the Wall Street Journal.

This article explores several ways to improve the code generated by these powerful tools and to fit it into your project.


Prompt Engineering

GPT can produce high-quality code, but it often needs some “encouragement” to do so. The GPT best practices described by OpenAI can help with code generation as well.

As an example, let’s start with a fairly general prompt:

Generate a Python function that takes a url and determines whether it’s https.

from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except Exception as e:
        print(f"An error occurred: {e}")
        return False

Catching Exception like this is far too broad. Let’s try to improve it.

Please rewrite the code above, so that you specify which exceptions can actually occur.

And indeed, we get a better result:

from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except ValueError as e:
        print(f"Invalid URL, could not parse: {e}")
        return False
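We can sanity-check this behavior with a few inputs (a quick sketch; the function is repeated here so the snippet runs on its own). Note that urlparse accepts most strings without raising: a URL with no scheme simply parses with an empty scheme, so the function returns False without ever hitting the except branch.

```python
from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except ValueError as e:
        print(f"Invalid URL, could not parse: {e}")
        return False


print(is_https("https://example.com"))  # True
print(is_https("http://example.com"))   # False
print(is_https("example.com"))          # False: no scheme at all
print(is_https("http://[invalid"))      # False: malformed IPv6 netloc raises ValueError
```

The last input is one of the few cases where urlparse actually raises a ValueError, which is exactly the exception the improved version catches.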

Make it Up to Date

ChatGPT was trained on a “diverse range of internet text”, with a training cutoff of September 2021 (as stated by ChatGPT itself).

The logical consequence is that it doesn’t know about libraries, language constructs, etc. that were released after September 2021. For example, the last Python release ChatGPT knows about is 3.9. Thus, it won’t use any language features introduced in 3.10 or 3.11. You can work around this problem by explicitly explaining these new features in your prompts.

A less obvious consequence is that ChatGPT might prefer an older syntax or library, even if the newer one was released before the training cutoff. A plausible reason: The “diverse range of internet text” contains a ton of tutorials using the “old option”, but only a few texts with the new one.

For example, if you ask ChatGPT to write a Python script communicating with an API, it usually provides an answer using the requests library. However, if you explicitly ask for using httpx, ChatGPT provides an answer using various async features.

Following Standards & Explicit Guidelines

Just like human coders, ChatGPT tends to meet expectations better if it knows what those expectations are. As the GPT Best Practices guide in the OpenAI docs puts it:

GPTs can’t read your mind.

If your project follows widespread standards, like PEP-8 in Python, chances are good that ChatGPT will generate code that fits into your project. If your project has some unusual conventions, you need to teach ChatGPT “your way” first.

If you have explicit coding standards or style guides, you can reference them directly in your prompts.

What’s Next?

A crucial question is: what happens to the code after ChatGPT has generated it? In the next sections, we’ll explore two main aspects of this: putting the code in the right place, and delivering it reliably.

The Right Place

A frequent cause of tech debt is code duplication. And a frequent cause of code duplication is that the developer didn’t recognize that a functionality already exists.

Whether code has been written by a human or by our AI friend, it’s important to put it in the right place. A clear project structure means:

  • existing functionality is easy to find, and
  • it’s obvious where new code belongs.

Let’s say your project includes both a something.util and a something.whatever.util.

And something.util contains a function like this:

from urllib.parse import urlsplit


def is_https(url: str) -> bool:
    scheme, _, _, _, _ = urlsplit(url)
    return scheme == "https"

Now, you’re looking only in something.whatever.util and don’t see this function. So, you’re asking ChatGPT:

Generate a Python function that takes a url and determines whether it’s https.

Sure enough, you get this result:

from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except Exception as e:
        print(f"An error occurred: {e}")
        return False

Now you have two functions for the same functionality, using different libraries.
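The two implementations are interchangeable for typical inputs, which is exactly what makes the duplication easy to miss. A quick comparison sketch (function names here are made up; error handling is stripped for brevity):

```python
from urllib.parse import urlparse, urlsplit


def is_https_util(url: str) -> bool:
    # The existing version in something.util
    scheme, _, _, _, _ = urlsplit(url)
    return scheme == "https"


def is_https_generated(url: str) -> bool:
    # The freshly generated version
    return urlparse(url).scheme == "https"


urls = ["https://example.com", "http://example.com", "ftp://x", "example.com"]
# Both functions agree on every input: classic duplicated functionality.
assert all(is_https_util(u) == is_https_generated(u) for u in urls)
```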

A four-year research project, which led to the book Accelerate, came to a similar conclusion. The authors identified two architectural characteristics that correlated with high performance:

  • We can do most of our testing without requiring an integrated environment.
  • We can and do deploy or release our application independently of other applications/services it depends on.

📖 Forsgren, Nicole; Humble, Jez; Kim, Gene: Accelerate. IT Revolution Press, March 2018. Chapter 5.

Continuous Delivery

In the book Accelerate, Nicole Forsgren and her team dedicate a chapter to Technical Practices and how they influence the performance of software teams. The chapter focuses on a single concept: continuous delivery.

According to their findings, continuous delivery decreases lead time, change fail rates, and the time necessary to restore the service. It also reduces deployment pain and (probably related to that) burnout.

The research team identified key capabilities that drive continuous delivery. The practices they found beneficial include:

  • version control for application code, configuration, and tests
  • test automation and test data management
  • trunk-based development
  • continuous integration
  • deployment automation

📖 Forsgren, Nicole; Humble, Jez; Kim, Gene: Accelerate. IT Revolution Press, March 2018. Chapter 4 and Appendix A.

While there hasn’t been similar research on AI-generated code yet, our prediction is that the factors above will continue to matter. In fact, as our tools generate more code faster, we’ll need even more reliable systems to verify and deliver that code.
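Those existing practices transfer directly. Test automation, for example, applies to AI-generated code just as it does to human-written code; a minimal sketch using the is_https example from earlier:

```python
from urllib.parse import urlparse


def is_https(url: str) -> bool:
    return urlparse(url).scheme == "https"


# With pytest, these would live in a test_*.py file and run in CI on
# every change, whether the code was written by a human or generated.
def test_https_url():
    assert is_https("https://example.com")


def test_non_https_url():
    assert not is_https("http://example.com")
    assert not is_https("example.com")  # no scheme at all


test_https_url()
test_non_https_url()
```

The tests don’t care who, or what, wrote the function under test, which is exactly why they are such a good safety net for generated code.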

Conclusion: New Code, Old Challenges

GPT and the other large language models provide an amazing new way to write a lot of code quickly. But the code created by our virtual friend faces the same challenges as code created by humans. It has to:

  • fit into the structure of your project,
  • follow your standards and conventions, and
  • be verified, delivered, and maintained over time.

Getting better and more robust code out of these tools is an intriguing puzzle. It combines new elements like prompt engineering with established practices for continuous delivery and maintenance.
