Introduction

This new regex implementation is intended to eventually replace Python's current 're' module.

For testing and comparison with the current 're' module, the new implementation is provided as a separate module called 'regex'.

Also included are the compiled binary .pyd files for Python 2.5-2.7 and Python 3.1-3.2 on 32-bit Windows.

Flags

There are two kinds of flag: scoped and global. Scoped flags can apply to only part of a pattern and can be turned on or off; global flags apply to the entire pattern and can only be turned on.

The scoped flags are: IGNORECASE, MULTILINE, DOTALL, VERBOSE, WORD.

The global flags are: ASCII, LOCALE, NEW, REVERSE, UNICODE.
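As a rough illustration of how a scoped flag such as IGNORECASE can cover either the whole pattern or only part of it, the sketch below contrasts the flag passed as an argument with the inline form (?i:...). It imports 'regex' and falls back to 're' if 'regex' is not installed, on the assumption that both give the same results for these simple patterns. The sample strings are made up for the example.

```python
try:
    import regex as rx
except ImportError:
    import re as rx  # assumed equivalent for these simple patterns

# Flag passed as an argument: case is ignored across the whole pattern.
whole = rx.search(r"hello world", "HELLO WORLD", rx.IGNORECASE)

# Inline scoped flag: case is ignored only inside (?i:...).
scoped_hit = rx.search(r"(?i:hello) world", "HELLO world")
scoped_miss = rx.search(r"(?i:hello) world", "HELLO WORLD")

print(whole is not None)
print(scoped_hit is not None)
print(scoped_miss is not None)
```

The third search is expected to find nothing, because " world" lies outside the scoped group and is still matched case-sensitively.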

If none of the ASCII, LOCALE or UNICODE flags is specified, the default is UNICODE if the regex pattern is a Unicode string and ASCII if it's a bytestring.
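The default can be seen with a character class such as \w, which matches accented letters only under UNICODE. The following is a minimal sketch, assuming Python 3 (where str is the Unicode string type) and falling back to 're' if 'regex' is not installed; the sample text and the Latin-1 encoding are chosen only for the example.

```python
try:
    import regex as rx
except ImportError:
    import re as rx  # assumed equivalent for these simple patterns

text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Unicode string pattern, no flag given: UNICODE is the default.
unicode_default = rx.findall(r"\w+", text)

# Unicode string pattern with ASCII requested explicitly.
ascii_forced = rx.findall(r"\w+", text, rx.ASCII)

# Bytestring pattern, no flag given: ASCII is the default.
bytes_default = rx.findall(rb"\w+", text.encode("latin-1"))

print(unicode_default, ascii_forced, bytes_default)
```

Only the first call should include the accented letter in its match.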

The NEW flag turns on the new behaviour of this module, which can differ from that of the 're' module, such as splitting on zero-width matches, inline flags affecting only what follows, and being able to turn inline flags off.

Notes on named capture groups

All capture groups have a group number, starting from 1.

Groups with the same group name will have the same group number, and groups with a different group name will have a different group number.

The same group name can be used on different branches of an alternation because they are mutually exclusive, e.g. (?P<foo>first)|(?P<foo>second). They will, of course, have the same group number.

Group numbers will be reused, where possible, across different branches of a branch reset, e.g. (?|(first)|(second)) has only group 1. If capture groups have different group names then they will, of course, have different group numbers, e.g. (?|(?P<foo>first)|(?P<bar>second)) has group 1 ("foo") and group 2 ("bar").
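The numbering rules above can be checked by matching the three example patterns against a sample string and inspecting the resulting groups. This is only a sketch: describe() is a hypothetical helper written for this example, not part of the module, and the demonstration is skipped if 'regex' is not installed, because 're' rejects both repeated group names and the (?|...) syntax.

```python
try:
    import regex
except ImportError:
    regex = None  # the demonstration below is skipped without the module


def describe(pattern, text):
    """Return (numbered groups, named groups) for a match at the start of text."""
    m = regex.match(pattern, text)
    return m.groups(), m.groupdict()


if regex is not None:
    # Same name on both branches: one group, shared by "foo".
    print(describe(r"(?P<foo>first)|(?P<foo>second)", "second"))
    # Branch reset with unnamed groups: group 1 is reused.
    print(describe(r"(?|(first)|(second))", "second"))
    # Branch reset with different names: two distinct groups.
    print(describe(r"(?|(?P<foo>first)|(?P<bar>second))", "second"))
```

In the last pattern only the "bar" branch takes part in the match, so group 1 ("foo") should be reported as None.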

Multithreading

The regex module releases the GIL during matching on instances of the built-in (immutable) string classes, enabling other Python threads to run concurrently. It can also be forced to release the GIL during matching by calling the matching methods with the keyword argument concurrent=True. The behaviour is undefined if the string changes during matching, so use this option only when the string is guaranteed not to change.
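A minimal sketch of passing concurrent=True from several threads follows. It assumes Python 3 with the standard concurrent.futures module; count_words() is a hypothetical helper written for this example, and the demonstration is skipped if 'regex' is not installed. The inputs are ordinary immutable strings, so the keyword is redundant here and is shown only to illustrate the call.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    import regex
except ImportError:
    regex = None  # the demonstration below is skipped without the module


def count_words(texts):
    """Count the \\w+ matches in each text, one worker thread per text."""
    pattern = regex.compile(r"\w+")

    def count(text):
        # concurrent=True asks the module to release the GIL while matching.
        return len(pattern.findall(text, concurrent=True))

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(count, texts))


if regex is not None:
    print(count_words(["one two", "three four five"]))
```

Each text is only read by its worker thread, which satisfies the requirement that the string must not change during matching.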

Building for 64-bit

If the source files are built for a 64-bit target then the string positions will also be 64-bit. (The 're' module appears to limit string positions to 32 bits, even on a 64-bit build.)

Unicode

This module supports Unicode 6.0.0.

Additional features

The issue numbers relate to the Python bug tracker.