Section 12.1.  Extended Formatting

Always use the /x flag.

Because regular expressions are really just programs, all the arguments in favour of careful code layout that were advanced in Chapter 2 must apply equally to regexes. And possibly more than equally, since regexes are written in a language much "denser" than Perl itself.

At the very least, it's essential to use whitespace to make the code more readable, and comments to record your intent[*].

[*] Particularly as regular expressions so often fail precisely because the coder's intent is not accurately translated into their patterns.

Writing a pattern like m{'[^\\']*(?:\\.[^\\']*)*'} is no more acceptable than writing an entire program as one dense, unbroken block of code with no whitespace, layout, or comments. It is no more readable, and no more maintainable.

The /x mode allows regular expressions to be laid out and annotated in a maintainable manner. Under /x mode, whitespace in your regex is ignored (i.e., it no longer matches the corresponding whitespace character), so you're free to use spaces and newlines for indentation and layout, as you do in regular Perl code. The # character is also special under /x. Instead of matching a literal '#', it introduces a normal Perl comment.
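One practical consequence of these rules is that a literal space or '#' has to be written explicitly under /x. The following is a minimal sketch; the input string and the variable names are made up for illustration and do not come from the original text:

```perl
use strict;
use warnings;

# Hypothetical input, chosen because it contains both a space and a '#'.
my $text = 'issue #42 fixed';

# Without /x, the space and the '#' in the pattern match themselves.
my $plain = qr{issue #(\d+)};

# Under /x, an unescaped space is ignored and an unescaped '#' starts a
# comment, so both are spelled out: [ ] for the space, \# for the hash.
my $extended = qr{
    issue [ ]     # the word "issue" and one literal space
    \# (\d+)      # a literal hash sign, then the issue number
}x;

my ($n_plain)    = $text =~ $plain;
my ($n_extended) = $text =~ $extended;

print "$n_plain $n_extended\n";
```

Writing the space as [ ] rather than as a backslashed space keeps it visible in the layout; either form should work.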

For example, the pattern shown previously could be rewritten like so:


    # Match a single-quoted string efficiently...
    m{ '                # an opening single quote
       [^\\']*          # any non-special chars (i.e., not backslash or single quote)
       (?:              # then all of...
           \\ .         #    any explicitly backslashed char
           [^\\']*      #    followed by any non-special chars
       )*               # ...repeated zero or more times
       '                # a closing single quote
     }x

That may still not be pretty, but at least it's now survivable.
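As a rough usage sketch, a single-quoted-string pattern laid out this way can be compiled once with qr{} and then reused. The variable names and the sample input below are hypothetical, and the pattern is written out from the comments in this section rather than taken from any library:

```perl
use strict;
use warnings;

# Compile the annotated pattern once so it can be reused by name.
my $SINGLE_QUOTED = qr{
    '              # an opening single quote
    [^\\']*        # any non-special chars
    (?:            # then all of...
        \\ .       #    any explicitly backslashed char
        [^\\']*    #    followed by any non-special chars
    )*             # ...repeated zero or more times
    '              # a closing single quote
}x;

# Hypothetical input: two quoted strings, the first with an escaped quote.
my $input = q{say 'it\'s here' and 'plain'};

# Collect every single-quoted string found in the input.
my @found = $input =~ m{ ($SINGLE_QUOTED) }gx;

print "$_\n" for @found;
```

Because the compiled pattern carries its own /x flag, it can be interpolated into a larger regex without its whitespace and comments changing meaning.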

Some people argue that the /x flag should be used only when a regular expression exceeds some particular threshold of complexity, such as only when it won't fit on a single line. But, as with all forms of code, regular expressions tend to grow in complexity over time. So even "simple" regexes will eventually need a /x, which will most likely not be retrofitted when the pattern reaches the particular complexity threshold you are using.

Besides, setting some arbitrary threshold of complexity makes both coding and maintenance harder. If you always use /x, then you can train your fingers to type it automatically for you, and you never need to think about it again. That's much more efficient and reliable than having to consciously[*] assess each regex you write to determine whether it merits the flag. And when you're maintaining the code, if you can rely on every regex having a /x flag, then you never have to check whether a particular regex is or isn't using the flag, and you never have to mentally switch regex "dialects".

[*] Or, worse still, unconsciously.

In other words, it's perfectly okay to use the /x flag only when a regular expression exceeds some particular threshold of complexity...so long as you set that particular threshold at zero.
