Gem: exploding string alternatives

Tuesday 28 December 2021

Here’s a Python gem: a small bit of Python that uses the power of the language and standard library well.

It’s a function to list strings generated by a pattern with embedded alternatives. It takes an input string with brace-wrapped possibilities, and generates all the strings made from making choices among them:

>>> list(explode("{Alice,Bob} ate a {banana,donut}."))
[
    'Alice ate a banana.',
    'Alice ate a donut.',
    'Bob ate a banana.',
    'Bob ate a donut.'
]

Here’s the function:

def explode(pattern: str) -> Iterable[str]:
    """
    Expand the brace-delimited possibilities in a string.
    """
    seg_choices = []
    for segment in re.split(r"(\{.*?\})", pattern):
        if segment.startswith("{"):
            seg_choices.append(segment.strip("{}").split(","))
        else:
            seg_choices.append([segment])

    for parts in itertools.product(*seg_choices):
        yield "".join(parts)

I call this a gem because it’s concise without being tricky, and uses Python’s tools to strong effect. Let’s look at how it works.

re.split: The first step is to break the string into pieces. I used re.split(): it takes a regular expression, divides the string wherever the pattern matches, and returns a list of parts.

A subtlety that I make use of here: if the splitting regex has capturing groups (parenthesized pieces), then those captured strings are also included in the result list. The pattern is anything enclosed in curly braces, and I’ve parenthesized the whole thing so that I’ll get those bracketed pieces in the split list.

For our sample string, re.split will return these segments:

['', '{Alice,Bob}', ' ate a ', '{banana,donut}', '.']

There’s an initial empty string which might seem concerning, but it won’t be a problem.

Grouping: I used that list of segments to make another list, the choices for each segment. If a segment starts with a brace, then I strip off the braces and split on commas to get a list of the alternatives. The segments that don’t start with a brace are the non-choices part of the string, so I add them as a one-choice list. This gives us a uniform list of lists, where the inner lists are the choices for each segment of the result.

For our sample string, this is the parts list:

[[''], ['Alice', 'Bob'], [' ate a '], ['banana', 'donut'], ['.']]

itertools.product: To generate all the combinations, I used itertools.product(), which does much of the heavy lifting of this function. It takes a number of iterables as arguments and generates all the different combinations of choosing one element from each. My seg_choices list is exactly the list of arguments needed for itertools.product, and I can apply it with the star syntax.

The values from itertools.product are tuples of the choices made. The last step is to join them together and use yield to provide the value to the caller.

Nice.

Comments

[gravatar]

For fun I tried a Ruby version of this for comparison, although maybe not as much of a gem (no pun) because I don’t think Ruby has a nice way to return an iterable (lazy enumerable) for a cartesian product

#!/usr/bin/env ruby

def explode(pattern)
  seg_choices = []

  for segment in pattern.split(/(\{.*?\})/) do
    if segment.start_with?("{")
      seg_choices.append(segment.delete_prefix('{').delete_suffix('}').split(','))
    else
      seg_choices.append([segment])
    end
  end

  seg_choices[0].product(*seg_choices[1...]).map(&:join)
end


pp explode("{Alice,Bob} ate a {banana,donut}.")
[gravatar]

Very nice. I would use

segment[1:-1]

instead of

segment.strip("{}")

since strip() would strip all leading and trailing braces and thus a pathological case like

"{Alice,Bob,}}"

would not work?

[gravatar]

@Kari: in fact, I originally had [1:-1], until I wrote in the text, “strip off the braces”. Then .strip() seemed like a better expression of the idea. But I could go either way.

[gravatar]

Really beatiful gem and it is straight-forward to transform it into comprehension-style:

def explode(pattern: str) -> Iterable[str]:
    """
    Expand the brace-delimited possibilities in a string.
    """

    seg_choices = (
        seg.strip("{}").split(",") if seg.startswith("{") else [seg]
        for seg in re.split(r"(\{.*?\})", pattern)
    )

    return ("".join(parts) for parts in itertools.product(*seg_choices))
[gravatar]
Jerry Ebalunode 12:22 AM on 5 Jan 2022

You can do something similar in Bash shell, with brace expansion

echo -e {Alice,Bob,Calvin,Mary}\\tate\\ta\\t{banana\\n\\n,donut\\n\\n}

Alice ate a banana

Alice ate a donut

Bob ate a banana

Bob ate a donut

Calvin ate a banana

Calvin ate a donut

Mary ate a banana

Mary ate a donut

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.