#
# Secret Labs' Regular Expression Engine
#
# re-compatible interface for the sre matching engine
#
# Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license.  For any other use, please contact Secret Labs
# AB (info@pythonware.com).
#
# Portions of this engine have been developed in cooperation with
# CNRI.  Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
#

r"""Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl.  It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves.  You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9].
    \D       Matches any non-digit character; equivalent to the set [^0-9].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
    \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.

This module exports the following functions:
    match    Match a regular expression pattern to the beginning of a string.
    search   Search a string for the presence of a pattern.
    sub      Substitute occurrences of a pattern found in a string.
    subn     Same as sub, but also return the number of substitutions made.
    split    Split a string by the occurrences of a pattern.
    findall  Find all occurrences of a pattern in a string.
    finditer Return an iterator yielding a match object for each match.
    compile  Compile a pattern into a RegexObject.
    purge    Clear the regular expression cache.
    escape   Backslash all non-alphanumerics in a string.

Some of the functions in this module take flags as optional parameters:
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.

This module also defines an exception 'error'.
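
The functions and flags listed above can be combined roughly as in the
sketch below.  It is only an illustration: the sample text, the patterns
and the variable names are assumptions made for this example and are not
part of the module.

```python
import re

# Sample text is an illustrative assumption, not taken from the module.
text = "alpha=1, beta=22\nGamma=333"

# match() anchors at the start of the string; search() scans all of it.
at_start = re.match(r"\w+", text).group()
first_number = re.search(r"\d+", text).group()

# findall() returns tuples when the pattern has more than one group.
pairs = re.findall(r"(\w+)=(\d+)", text)

# subn() returns the new string together with the substitution count.
masked, count = re.subn(r"\d+", "N", text)

# split() breaks the string at every occurrence of the pattern.
fields = re.split(r",\s*|\n", text)

# Flags are combined with "|"; MULTILINE lets "^" match after a newline
# and IGNORECASE lets [a-z] match the capital "G".
line_starts = re.findall(r"^[a-z]+", text, re.I | re.M)

# escape() backslashes non-alphanumerics so the text matches literally.
literal = re.search(re.escape("beta=22"), text).group()

# An invalid pattern raises the module's 'error' exception.
compile_failed = False
try:
    re.compile("(unbalanced")
except re.error:
    compile_failed = True
```

Precompiling with compile() is mostly a convenience here, since the
module-level functions already cache compiled patterns internally.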

"""

import sys
import sre_compile
import sre_parse

# public symbols
__all__ = [ "match", "search", "sub", "subn", "split", "findall",
    "compile", "purge", "template", "escape", "I", "L", "M", "S", "X",
    "U", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
    "UNICODE", "error" ]

__version__ = "2.2.1"

# flags
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments

# sre extensions (experimental, don't rely on these)
T = TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation

# sre exception
error = sre_compile.error

# --------------------------------------------------------------------
# public interface

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings."""
    return _compile(pattern, flags).split(string, maxsplit)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

if sys.hexversion >= 0x02020000:
    __all__.append("finditer")
    def finditer(pattern, string, flags=0):
        """Return an iterator over all non-overlapping matches in the
        string.  For each match, the iterator returns a match object.

        Empty matches are included in the result."""
        return _compile(pattern, flags).finditer(string)

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

def purge():
    "Clear the regular expression cache"
    _cache.clear()
    _cache_repl.clear()

def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)

_alphanum = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

def escape(pattern):
    "Escape all non-alphanumeric characters in pattern."
    s = list(pattern)
    alphanum = _alphanum
    for i, c in enumerate(pattern):
        if c not in alphanum:
            if c == "\000":
                s[i] = "\\000"
            else:
                s[i] = "\\" + c
    return pattern[:0].join(s)

# --------------------------------------------------------------------
# internals

_cache = {}
_cache_repl = {}

_pattern_type = type(sre_compile.compile("", 0))

_MAXCACHE = 100

def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
    pattern, flags = key
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError('Cannot process flags argument with a compiled pattern')
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError, "first argument must be string or compiled pattern"
    try:
        p = sre_compile.compile(pattern, flags)
    except error, v:
        raise error, v # invalid expression
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p

def _compile_repl(*key):
    # internal: compile replacement pattern
    p = _cache_repl.get(key)
    if p is not None:
        return p
    repl, pattern = key
    try:
        p = sre_parse.parse_template(repl, pattern)
    except error, v:
        raise error, v # invalid expression
    if len(_cache_repl) >= _MAXCACHE:
        _cache_repl.clear()
    _cache_repl[key] = p
    return p

def _expand(pattern, match, template):
    # internal: match.expand implementation hook
    template = sre_parse.parse_template(template, pattern)
    return sre_parse.expand_template(template, match)

def _subx(pattern, template):
    # internal: pattern.sub/subn implementation helper
    template = _compile_repl(template, pattern)
    if not template[0] and len(template[1]) == 1:
        # literal replacement
        return template[1][0]
    def filter(match, template=template):
        return sre_parse.expand_template(template, match)
    return filter

# register myself for pickling

import copy_reg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

copy_reg.pickle(_pattern_type, _pickle, _compile)

# --------------------------------------------------------------------
# experimental stuff (see python-dev discussions for details)

class Scanner:
    def __init__(self, lexicon, flags=0):
        from sre_constants import BRANCH, SUBPATTERN
        self.lexicon = lexicon
        # combine phrases into a compound pattern
        p = []
        s = sre_parse.Pattern()
        s.flags = flags
        for phrase, action in lexicon:
            p.append(sre_parse.SubPattern(s, [
                (SUBPATTERN, (len(p)+1, sre_parse.parse(phrase, flags))),
                ]))
        s.groups = len(p)+1
        p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
        self.scanner = sre_compile.compile(p)
    def scan(self, string):
        result = []
        append = result.append
        match = self.scanner.scanner(string).match
        i = 0
        while 1:
            m = match()
            if not m:
                break
            j = m.end()
            if i == j:
                break
            action = self.lexicon[m.lastindex-1][1]
            if hasattr(action, '__call__'):
                self.match = m
                action = action(self, m.group())
            if action is not None:
                append(action)
            i = j
        return result, string[i:]