Generating Code without Generating Technical Debt?

Delivering and maintaining AI-generated code.

Written by Reka Horvath

Photo by Paul Felberbauer on Unsplash

GPT and other large language models can produce huge volumes of code quickly. This allows for faster prototyping and iterative development, letting us try out multiple solutions. But it can also leave us with a much larger mess of code to maintain.

“People have talked about technical debt for a long time, and now we have a brand new credit card here that is going to allow us to accumulate technical debt in ways we were never able to do before,” Armando Solar-Lezama, a professor at MIT’s Computer Science & Artificial Intelligence Laboratory, told the Wall Street Journal.

This article explores several ways to improve the code generated by these powerful tools and to fit it into your project.


Prompt Engineering

GPT can produce high-quality code, but it often needs some “encouragement” to do so. The GPT best practices described by OpenAI can help with code generation as well.

As an example, let’s start with a fairly general prompt:

Generate a Python function that takes a url and determines whether it’s https.

from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except Exception as e:
        print(f"An error occurred: {e}")
        return False

Catching Exception like this is far too broad. Let’s try to improve it.

Please rewrite the code above, so that you specify which exceptions can actually occur.

And indeed, we get a better result:

from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except ValueError as e:
        print(f"Invalid URL, could not parse: {e}")
        return False
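We can sanity-check this behavior with a few inputs (a quick sketch; the function is repeated here so the snippet runs on its own). Note that urlparse accepts most strings without raising: a URL with no scheme simply parses with an empty scheme, so the function returns False without ever hitting the except branch.

```python
from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except ValueError as e:
        print(f"Invalid URL, could not parse: {e}")
        return False


print(is_https("https://example.com"))  # True
print(is_https("http://example.com"))   # False
print(is_https("example.com"))          # False: no scheme at all
print(is_https("http://[invalid"))      # False: malformed IPv6 netloc raises ValueError
```

The last input is one of the few cases where urlparse actually raises a ValueError, which is exactly the exception the improved version catches.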

Make it Up to Date

ChatGPT was trained on a “diverse range of internet text”, with a training cutoff of September 2021 (as stated by ChatGPT itself).

The logical consequence is that it doesn’t know about libraries, language constructs, etc. that were released after September 2021. For example, the last Python release ChatGPT knows about is 3.9. Thus, it won’t use any language features introduced in 3.10 or 3.11. You can work around this problem by explicitly explaining these new features in your prompts.

A less obvious consequence is that ChatGPT might prefer an older syntax or library, even if the newer one was released before the training cutoff. A plausible reason: The “diverse range of internet text” contains a ton of tutorials using the “old option”, but only a few texts with the new one.

For example, if you ask ChatGPT to write a Python script communicating with an API, it usually provides an answer using the requests library. However, if you explicitly ask for using httpx, ChatGPT provides an answer using various async features.

Following Standards & Explicit Guidelines

Just like human coders, ChatGPT tends to meet expectations better if it knows what those expectations are. As the GPT Best Practices guide in the OpenAI docs puts it:

GPTs can’t read your mind.

If your project follows widespread standards, like PEP-8 in Python, chances are good that ChatGPT will generate code that fits into your project. If your project has some unusual conventions, you need to teach ChatGPT “your way” first.

If you have explicit coding standards or style guides, you can reference them directly in your prompts.

What’s Next?

A crucial question is: what happens to the code after ChatGPT has generated it? In the next sections, we’ll explore two main aspects of this: putting the code in the right place, and delivering it reliably.

The Right Place

A frequent cause of tech debt is code duplication. And a frequent cause of code duplication is that the developer didn’t recognize that a functionality already exists.

Whether code has been written by a human or by our AI friend, it’s important to put it in the right place. A clear project structure means:

  • existing functionality is easy to find, and
  • it’s obvious where new code belongs.

Let’s say your project includes both a something.util and a something.whatever.util.

And something.util contains a function like this:

from urllib.parse import urlsplit


def is_https(url: str) -> bool:
    scheme, _, _, _, _ = urlsplit(url)
    return scheme == "https"

Now, you’re looking only in something.whatever.util and don’t see this function. So, you’re asking ChatGPT:

Generate a Python function that takes a url and determines whether it’s https.

Sure enough, you get this result:

from urllib.parse import urlparse


def is_https(url):
    try:
        result = urlparse(url)
        return result.scheme == "https"
    except Exception as e:
        print(f"An error occurred: {e}")
        return False

Now you have two functions for the same functionality, using different libraries.
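The two implementations are interchangeable for typical inputs, which is exactly what makes the duplication easy to miss. A quick comparison sketch (function names here are made up; error handling is stripped for brevity):

```python
from urllib.parse import urlparse, urlsplit


def is_https_util(url: str) -> bool:
    # The existing version in something.util
    scheme, _, _, _, _ = urlsplit(url)
    return scheme == "https"


def is_https_generated(url: str) -> bool:
    # The freshly generated version
    return urlparse(url).scheme == "https"


urls = ["https://example.com", "http://example.com", "ftp://x", "example.com"]
# Both functions agree on every input: classic duplicated functionality.
assert all(is_https_util(u) == is_https_generated(u) for u in urls)
```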

A four-year research project, which led to the book Accelerate, came to a similar conclusion. The authors identified two architectural characteristics that correlated with high performance:

  • We can do most of our testing without requiring an integrated environment.
  • We can and do deploy or release our application independently of other applications/services it depends on.

📖 Forsgren, Nicole; Humble, Jez; Kim, Gene: Accelerate. IT Revolution Press, March 2018. Chapter 5.

Continuous Delivery

In the book Accelerate, Nicole Forsgren and her team dedicate a chapter to Technical Practices and how they influence the performance of software teams. The chapter focuses on a single concept: continuous delivery.

According to their findings, continuous delivery decreases lead time, change fail rates, and the time necessary to restore the service. It also reduces deployment pain and (probably related to that) burnout.

The research team identified key capabilities that drive continuous delivery. The practices they found beneficial include:

  • version control for application code, configuration, and tests
  • test automation and test data management
  • trunk-based development
  • continuous integration
  • deployment automation

📖 Forsgren, Nicole; Humble, Jez; Kim, Gene: Accelerate. IT Revolution Press, March 2018. Chapter 4 and Appendix A.

While there hasn’t been similar research on AI-generated code yet, our prediction is that the factors above will continue to matter. In fact, as our tools generate more code faster, we’ll need even more reliable systems to verify and deliver that code.
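Those existing practices transfer directly. Test automation, for example, applies to AI-generated code just as it does to human-written code; a minimal sketch using the is_https example from earlier:

```python
from urllib.parse import urlparse


def is_https(url: str) -> bool:
    return urlparse(url).scheme == "https"


# With pytest, these would live in a test_*.py file and run in CI on
# every change, whether the code was written by a human or generated.
def test_https_url():
    assert is_https("https://example.com")


def test_non_https_url():
    assert not is_https("http://example.com")
    assert not is_https("example.com")  # no scheme at all


test_https_url()
test_non_https_url()
```

The tests don’t care who, or what, wrote the function under test, which is exactly why they are such a good safety net for generated code.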

Conclusion: New Code, Old Challenges

GPT and the other large language models provide an amazing new way to write a lot of code quickly. But the code created by our virtual friend faces the same challenges as code created by humans. It has to:

  • fit into the structure of your project,
  • follow your standards and conventions, and
  • be verified, delivered, and maintained over time.

Getting better and more robust code out of these tools is an intriguing puzzle. It combines new elements like prompt engineering with established practices for continuous delivery and maintenance.
