Regex spiffification: free stuff!

By Filip Salo; published on September 25, 2006.

I thought it could be nice if you could use a string (as opposed to a compiled regular expression) as the second operand of a concatenation or union operation, but I left that for later.

Keeping the implementation minimal has several benefits. One is that my current implementation already allows this:

>>> a = re.compile(r"a|b")
>>> a + "c|d"
<Regex object for '(?:a|b)(?:c|d)'>

This is probably not a good idea, though. Consider this example:

>>> a = re.compile("A")
>>> a | "B|C" + "X|Y"
<Regex object for 'A|B|CX|Y'>

Since the + operator takes precedence over |, the second and third operands are concatenated as plain strings before the regex object ever sees them, and the expression becomes a | "B|CX|Y" instead of a | "(B|C)(X|Y)". That kind of subtlety is not something I'd want to wrestle with in real code.
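To make the precedence trap concrete, here is a minimal sketch of a wrapper class that behaves like the examples above. The class name Regex and its attributes are assumptions made for illustration, not the actual implementation from this series; only the grouping behaviour is modelled on the output shown.

```python
import re


class Regex:
    """Hypothetical stand-in for the regex objects in the examples above."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.compiled = re.compile(pattern)

    @staticmethod
    def _pattern_of(other):
        # Accept either another Regex or a plain pattern string.
        return other.pattern if isinstance(other, Regex) else other

    def __add__(self, other):
        # Concatenation: wrap both sides in non-capturing groups.
        return Regex("(?:%s)(?:%s)" % (self.pattern, self._pattern_of(other)))

    def __or__(self, other):
        # Union: plain alternation, no extra grouping.
        return Regex("%s|%s" % (self.pattern, self._pattern_of(other)))

    def __repr__(self):
        return "<Regex object for %r>" % self.pattern


a = Regex("A")

# Python evaluates "B|C" + "X|Y" first, as ordinary string concatenation,
# so the wrapper only ever sees the single string "B|CX|Y".
surprising = a | "B|C" + "X|Y"

# Making the middle operand a Regex lets __add__ do the grouping.
intended = a | (Regex("B|C") + "X|Y")

print(surprising)  # <Regex object for 'A|B|CX|Y'>
print(intended)    # <Regex object for 'A|(?:B|C)(?:X|Y)'>
```

The two patterns really do differ: the second one matches "BX" as a whole string, while the first does not.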

Something else that came for free (as in beer, lunch, Tibet and/or Willy) was this thought: what, exactly, is the benefit of adding + and | to regex objects if you still have to compile the subexpressions first? Why not just keep them as strings and do things like re.compile("%s|%s%s" % (a, b, c)) instead of a | b + c?
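One thing to weigh while that question sinks in: with plain strings, any grouping has to be written by hand. The sketch below uses only the standard re module; the variable names and sample patterns are made up for illustration.

```python
import re

# Subexpressions kept as plain strings, as the question suggests.
a, b, c = "A", "B|C", "X|Y"

# Naive interpolation: the alternations inside b and c leak into the result.
naive = "%s|%s%s" % (a, b, c)

# Hand-written non-capturing groups keep each subexpression intact.
grouped = "%s|(?:%s)(?:%s)" % (a, b, c)

print(naive)    # A|B|CX|Y
print(grouped)  # A|(?:B|C)(?:X|Y)

print(re.fullmatch(naive, "BX"))    # None
print(re.fullmatch(grouped, "BX"))  # a match object spanning "BX"
```

So the string version works, as long as whoever writes the format string remembers the parentheses every time.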

I'll let that sink in for a bit.

Next up: no, less.