#
# Secret Labs' Regular Expression Engine
#
# re-compatible interface for the sre matching engine
#
# Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license.  For any other use, please contact Secret Labs
# AB (info@pythonware.com).
#
# Portions of this engine have been developed in cooperation with
# CNRI.  Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
#

r"""Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl.  It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves.  You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9].
    \D       Matches any non-digit character; equivalent to the set [^0-9].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
    \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.

This module exports the following functions:
    match    Match a regular expression pattern to the beginning of a string.
    search   Search a string for the presence of a pattern.
    sub      Substitute occurrences of a pattern found in a string.
    subn     Same as sub, but also return the number of substitutions made.
    split    Split a string by the occurrences of a pattern.
    findall  Find all occurrences of a pattern in a string.
    finditer Return an iterator yielding a match object for each match.
    compile  Compile a pattern into a RegexObject.
    purge    Clear the regular expression cache.
    escape   Backslash all non-alphanumerics in a string.

Some of the functions in this module take flags as optional parameters:
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.

This module also defines an exception 'error'.

"""

import sys
import sre_compile
import sre_parse

# public symbols
__all__ = [ "match", "search", "sub", "subn", "split", "findall",
    "compile", "purge", "template", "escape", "I", "L", "M", "S", "X",
    "U", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
    "UNICODE", "error" ]

__version__ = "2.2.1"

# flags
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments

# sre extensions (experimental, don't rely on these)
T = TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation

# sre exception
error = sre_compile.error

# --------------------------------------------------------------------
# public interface

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings."""
    return _compile(pattern, flags).split(string, maxsplit)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

if sys.hexversion >= 0x02020000:
    __all__.append("finditer")
    def finditer(pattern, string, flags=0):
        """Return an iterator over all non-overlapping matches in the
        string.  For each match, the iterator returns a match object.

        Empty matches are included in the result."""
        return _compile(pattern, flags).finditer(string)

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

def purge():
    "Clear the regular expression cache"
    _cache.clear()
    _cache_repl.clear()

def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)

_alphanum = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

def escape(pattern):
    "Escape all non-alphanumeric characters in pattern."
    s = list(pattern)
    alphanum = _alphanum
    for i, c in enumerate(pattern):
        if c not in alphanum:
            if c == "\000":
                s[i] = "\\000"
            else:
                s[i] = "\\" + c
    return pattern[:0].join(s)

# --------------------------------------------------------------------
# internals

_cache = {}
_cache_repl = {}

_pattern_type = type(sre_compile.compile("", 0))

_MAXCACHE = 100

def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
    pattern, flags = key
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError('Cannot process flags argument with a compiled pattern')
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError, "first argument must be string or compiled pattern"
    try:
        p = sre_compile.compile(pattern, flags)
    except error, v:
        raise error, v # invalid expression
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p

def _compile_repl(*key):
    # internal: compile replacement pattern
    p = _cache_repl.get(key)
    if p is not None:
        return p
    repl, pattern = key
    try:
        p = sre_parse.parse_template(repl, pattern)
    except error, v:
        raise error, v # invalid expression
    if len(_cache_repl) >= _MAXCACHE:
        _cache_repl.clear()
    _cache_repl[key] = p
    return p

def _expand(pattern, match, template):
    # internal: match.expand implementation hook
    template = sre_parse.parse_template(template, pattern)
    return sre_parse.expand_template(template, match)

def _subx(pattern, template):
    # internal: pattern.sub/subn implementation helper
    template = _compile_repl(template, pattern)
    if not template[0] and len(template[1]) == 1:
        # literal replacement
        return template[1][0]
    def filter(match, template=template):
        return sre_parse.expand_template(template, match)
    return filter

# register myself for pickling

import copy_reg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

copy_reg.pickle(_pattern_type, _pickle, _compile)

# --------------------------------------------------------------------
# experimental stuff (see python-dev discussions for details)

class Scanner:
    def __init__(self, lexicon, flags=0):
        from sre_constants import BRANCH, SUBPATTERN
        self.lexicon = lexicon
        # combine phrases into a compound pattern
        p = []
        s = sre_parse.Pattern()
        s.flags = flags
        for phrase, action in lexicon:
            p.append(sre_parse.SubPattern(s, [
                (SUBPATTERN, (len(p)+1, sre_parse.parse(phrase, flags))),
                ]))
        s.groups = len(p)+1
        p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
        self.scanner = sre_compile.compile(p)
    def scan(self, string):
        result = []
        append = result.append
        match = self.scanner.scanner(string).match
        i = 0
        while 1:
            m = match()
            if not m:
                break
            j = m.end()
            if i == j:
                break
            action = self.lexicon[m.lastindex-1][1]
            if hasattr(action, '__call__'):
                self.match = m
                action = action(self, m.group())
            if action is not None:
                append(action)
            i = j
        return result, string[i:]