• Python Regular Expressions


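        I rarely use regular expressions (why? I don't know, although they are very useful), so every time I need one I have to look it up and experiment again before using it.
        This post organizes the help text of Python's re module for study and reference. Entries marked in blue and blank cells are still to be completed.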

    1. Functions
    2. Special characters
    3. Special sequences
    This module exports the following functions. The examples use s = 'abca'.
    Name | Description | Usage | Example
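    match | Match a regular expression pattern to the beginning of a string (matching starts at the first character). | match(pattern, string, flags=0)

    re.match('a', s)
    >>> <re.Match object; span=(0, 1), match='a'>
    re.match('a', s)[0]
    >>> 'a'
    re.match('ab', s)
    >>> <re.Match object; span=(0, 2), match='ab'>
    re.match('c', s)
    >>> None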

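    fullmatch | Match a regular expression pattern to all of a string (the whole string must match the pattern). | fullmatch(pattern, string, flags=0)
    re.fullmatch('abca', s)
    >>> <re.Match object; span=(0, 4), match='abca'>
    re.fullmatch('abc', s)
    >>> None
    search | Search a string for the presence of a pattern (the first match anywhere in the string is returned). | search(pattern, string, flags=0)
    re.search('a', s)
    >>> <re.Match object; span=(0, 1), match='a'>
    re.search('bc', s)
    >>> <re.Match object; span=(1, 3), match='bc'>
    re.search('d', s)
    >>> None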
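The difference between the three matching functions is easiest to see side by side. The following is a minimal sketch using the same s = 'abca'; the variable names are illustrative only.

```python
import re

s = 'abca'

# match() only tries position 0, search() scans forward,
# fullmatch() requires the pattern to cover the whole string.
at_start = re.match('bc', s)      # None: 'bc' is not at the start
anywhere = re.search('bc', s)     # found at span (1, 3)
whole = re.fullmatch('a.*a', s)   # 'a', then 'bc', then 'a' covers all of s

for name, m in [('match', at_start), ('search', anywhere), ('fullmatch', whole)]:
    # A failed match is None, so test it before calling Match methods.
    print(name, m.span() if m else None)
```

Because a failed match returns None rather than raising, the result should be checked before calling methods such as group() or span().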
    sub | Substitute occurrences of a pattern found in a string. All occurrences are replaced by default; count limits the number of replacements. | sub(pattern, repl, string, count=0, flags=0)
    re.sub('a', 'A', s)
    >>> 'AbcA'
    re.sub('a', 'A', s, count=1)
    >>> 'Abca'
    re.sub('abc', 'ABC', s)
    >>> 'ABCa'
    re.sub('d', 'A', s)
    >>> 'abca'
    subn | Same as sub, but also return the number of substitutions made; the result is a tuple (new_string, number_of_substitutions). | subn(pattern, repl, string, count=0, flags=0)
    re.subn('a', 'A', s)
    >>> ('AbcA', 2)
    re.subn('a', 'A', s)[1]
    >>> 2
    re.subn('d', 'A', s)
    >>> ('abca', 0)
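    split | Split a string by the occurrences of a pattern and return a list. The matched text itself is dropped; a match at the very start or end of the string leaves an empty string in the result. maxsplit limits the number of splits. | split(pattern, string, maxsplit=0, flags=0)
    re.split('a', s)
    >>> ['', 'bc', '']
    re.split('a', s, maxsplit=1)
    >>> ['', 'bca']
    findall | Find all occurrences of a pattern in a string and return them as a list. | findall(pattern, string, flags=0)
    re.findall('a', s)
    >>> ['a', 'a']
    finditer | Return an iterator yielding a Match object for each match (like findall, but with Match objects instead of strings). | finditer(pattern, string, flags=0)
    re.finditer('a', s)
    >>> <callable_iterator object at 0x...>
    list(re.finditer('a', s))
    >>> [<re.Match object; span=(0, 1), match='a'>, <re.Match object; span=(3, 4), match='a'>]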
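Since finditer yields Match objects, each result carries its position as well as its text. A short sketch, again with s = 'abca' (the name positions is illustrative):

```python
import re

s = 'abca'

# Collect the matched text together with its start and end offsets.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer('a', s)]
print(positions)  # [('a', 0, 1), ('a', 3, 4)]
```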
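    compile | Compile a pattern into a Pattern object, which can be reused. | compile(pattern, flags=0)
    mo = re.compile('a')
    re.findall(mo, s)
    >>> ['a', 'a']
    mo.findall(s)   # equivalent method call on the Pattern object
    >>> ['a', 'a']
    purge | Clear the regular expression cache (the module keeps a cache of recently compiled patterns). | purge() | The author has not yet verified when this is needed in practice.
    escape | Escape the special characters in a string by prefixing them with a backslash, so that the string is matched literally. Letters and digits are left unchanged. | escape(pattern)
    re.escape('\\')
    >>> '\\\\'
    re.escape('a')
    >>> 'a'
    re.escape('1')
    >>> '1'
    re.escape('#')
    >>> '\\#'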
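A typical use of compile and escape together is searching for text that happens to contain metacharacters. The sketch below assumes a hypothetical literal string 'a.b*c' and sample text; neither comes from the original post.

```python
import re

literal = 'a.b*c'                 # hypothetical input containing metacharacters
text = 'a.b*c axbbc a.b*c'        # illustrative text to search

unescaped = re.compile(literal)               # '.' and '*' keep their special meaning
escaped = re.compile(re.escape(literal))      # every special character is matched literally

print(unescaped.findall(text))    # ['axbbc']
print(escaped.findall(text))      # ['a.b*c', 'a.b*c']
```

Without escaping, the pattern matches 'axbbc' and misses the literal text entirely, which is usually not what was intended.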
    The special characters are listed below. The examples use s = 'abcdaabc \n reg'.
    Symbol | Description | Example
    . | Matches any character except a newline.
    re.findall('.', s)
    >>> ['a', 'b', 'c', 'd', 'a', 'a', 'b', 'c', ' ', ' ', 'r', 'e', 'g']
    ^ | Matches the start of the string.
    re.findall('^a', s)
    >>> ['a']
    re.findall('^b', s)
    >>> []
    $ | Matches the end of the string or just before the newline at the end of the string.
    re.findall('g$', s)
    >>> ['g']
    re.findall('a$', s)
    >>> []
    * | Matches 0 or more (greedy) repetitions of the preceding RE (the character or group immediately before the *).
    re.findall('a*', s)   # zero, one, or several consecutive 'a'; an empty match appears wherever there is no 'a'
    >>> ['a', '', '', '', 'aa', '', '', '', '', '', '', '', '', '']
    re.findall('abcc*', s)
    >>> ['abc', 'abc']
    + | Matches 1 or more (greedy) repetitions of the preceding RE (the character or group immediately before the +).
    re.findall('a+', s)   # one or several consecutive 'a'
    >>> ['a', 'aa']
    re.findall('abcc+', s)
    >>> []
    ? | Matches 0 or 1 (greedy) of the preceding RE (the character or group immediately before the ?).
    re.findall('a?', s)   # zero or one 'a'
    >>> ['a', '', '', '', 'a', 'a', '', '', '', '', '', '', '', '', '']
    re.findall('abcc?', s)
    >>> ['abc', 'abc']
    *?, +?, ?? | Non-greedy versions of the previous three special characters. The trailing ? turns off greedy matching, so as few characters as possible are matched.
    re.findall('a*?b', s)
    >>> ['ab', 'aab']
    re.findall('a+?b', s)
    >>> ['ab', 'aab']
    re.findall('a??b', s)
    >>> ['ab', 'ab']
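The examples above give the same result for greedy and non-greedy forms because the following 'b' pins the match. The difference shows up when the pattern can stop at more than one place. A minimal sketch with an illustrative tagged string (not from the original post):

```python
import re

text = '<b>one</b> and <b>two</b>'   # illustrative input

greedy = re.findall(r'<b>.*</b>', text)   # .* runs to the last </b>
lazy = re.findall(r'<b>.*?</b>', text)    # .*? stops at the first </b>

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```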
    {m,n} | Matches from m to n repetitions of the preceding RE (the character or group immediately before the braces).
    re.findall('abc{0,0}', s)
    >>> ['ab', 'ab']
    re.findall('abc{0,1}', s)
    >>> ['abc', 'abc']
    {m,n}? | Non-greedy version of the above: within the given range, as few repetitions as possible are matched.
    re.findall('abc{0,1}?', s)
    >>> ['ab', 'ab']
    re.findall('abc{0,1}?d', s)
    >>> ['abcd']
    \ | Either escapes special characters or signals a special sequence.
    re.findall('\\n', s)
    >>> ['\n']
    [] | Indicates a set of characters. Each character in the set is matched individually, and most special characters (such as *) lose their special meaning inside the set. A "^" as the first character indicates a complementing set: any character not listed matches.
    re.findall('[a*b\n]', s)
    >>> ['a', 'b', 'a', 'a', 'b', '\n']
    re.findall('[^a*b\n]', s)
    >>> ['c', 'd', 'c', ' ', ' ', 'r', 'e', 'g']
    | | A|B creates an RE that will match either A or B (alternation between different patterns).
    re.findall('a*?b|\\n', s)
    >>> ['ab', 'aab', '\n']
    (...) | Matches the RE inside the parentheses and captures it as a group. The contents can be retrieved or matched later in the string. With findall, only the groups are returned.
    re.findall('(a*)(bc+)(d?)', s)
    >>> [('a', 'bc', 'd'), ('aa', 'bc', '')]
    (?aiLmsux) | The letters set the corresponding flags defined below.
    (?:...) | Non-grouping version of regular parentheses.
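The two rows above have no example in the original table. The following sketch, using the same s, shows one way they behave; the variable names are illustrative.

```python
import re

s = 'abcdaabc \n reg'

capturing = re.findall(r'(a+)bc', s)        # findall returns only the captured group
non_capturing = re.findall(r'(?:a+)bc', s)  # no group is captured, so whole matches come back
inline_flag = re.findall(r'(?i)ABC', s)     # (?i) has the same effect as flags=re.IGNORECASE

print(capturing)      # ['a', 'aa']
print(non_capturing)  # ['abc', 'aabc']
print(inline_flag)    # ['abc', 'abc']
```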
    (?P<name>...) | The substring matched by the group is accessible by name.
    re.search('(?P<TEST>.*)', s).groupdict()
    >>> {'TEST': 'abcdaabc '}
    (?P=name) | Matches the text matched earlier by the group named name.
    re.search('(?P<abc>.*)(?P=abc)', s).groupdict()   # the group must be followed by a copy of itself; here only the empty string qualifies
    >>> {'abc': ''}
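A named backreference is more useful when the group can match something non-empty. As a sketch, the pattern below looks for a word that is immediately repeated; the sample sentence and the group name word are assumptions made for illustration.

```python
import re

text = 'the the quick brown fox'   # illustrative input

# (?P<word>...) captures a word; (?P=word) requires the same text again.
m = re.search(r'\b(?P<word>\w+) (?P=word)\b', text)

print(m.group('word'))  # the
print(m.groupdict())    # {'word': 'the'}
print(m.span())         # (0, 7)
```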
    (?#...) | A comment; ignored.
    re.search(r'(\w*)', s).group()
    >>> 'abcdaabc'
    re.search(r'(?#Comment:\w*)', s).group()   # the whole pattern is a comment, so it matches the empty string
    >>> ''
    (?=...) | Matches if ... matches next, but doesn't consume the string (lookahead). The text is consumed by whatever pattern follows, so it is included in the match.
    re.findall(r'(?=a)\w*', s)    # the match must start at an 'a', which is included
    >>> ['abcdaabc']
    re.findall(r'(?<=a)\w*', s)   # lookbehind, for contrast: the preceding 'a' is not included
    >>> ['bcdaabc']
    (?!...) | Matches if ... doesn't match next (negative lookahead).
    re.findall(r'(?! )\w*', s)    # the match must not start at a space
    >>> ['abcdaabc', '', 'reg', '']
    (?<=...) | Matches if preceded by ... (must be fixed length). The preceding text is not part of the match.
    re.findall(r'(?<= )', s)      # positions preceded by a space
    >>> ['', '']
    re.findall(r'(?<= )\w*', s)   # digits, letters, or underscores preceded by a space
    >>> ['', 'reg']
    (?<!...) | Matches if not preceded by ... (must be fixed length).
    re.findall(r'(?<! )', s)      # positions not preceded by a space
    >>> ['', '', '', '', '', '', '', '', '', '', '', '', '']
    re.findall(r'(?<! )\w*', s)   # digits, letters, or underscores not preceded by a space
    >>> ['abcdaabc', '', '', 'eg', '']
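Lookarounds are often used to pick out text by its context without including that context in the result. The sketch below uses an invented price string; the text and variable names are assumptions for illustration.

```python
import re

text = 'cost: $30, tax: $5, 12 items'   # illustrative input

after_dollar = re.findall(r'(?<=\$)\d+', text)         # digits preceded by '$'
not_after_dollar = re.findall(r'(?<!\$)\b\d+', text)   # whole numbers not preceded by '$'
before_items = re.findall(r'\d+(?= items)', text)      # digits followed by ' items'

print(after_dollar)      # ['30', '5']
print(not_after_dollar)  # ['12']
print(before_items)      # ['12']
```

The \b in the second pattern keeps the match from starting in the middle of a number such as the '0' of '30'.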
    (?(id/name)yes|no) | Matches yes pattern if the group with id/name matched, the (optional) no pattern otherwise.
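The conditional form has no example in the original table. One possible illustration, assuming we want a number that may be wrapped in parentheses but only as a balanced pair:

```python
import re

# Group 1 optionally captures '('. The conditional (?(1)\)) then demands ')'
# only when group 1 took part in the match; the "no" branch is omitted.
pattern = re.compile(r'(\()?\d+(?(1)\))')

results = {t: bool(pattern.fullmatch(t)) for t in ['42', '(42)', '(42', '42)']}
print(results)  # {'42': True, '(42)': True, '(42': False, '42)': False}
```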
    The special sequences consist of "\" and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character. The examples use s = '0123456789 this is regular tist express!' unless a row redefines s.
    Sequence | Description | Example
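    \number | Matches the contents of the group of the same number. Parentheses divide the pattern into groups, and \num refers back to the text captured by group num.
    s = '122211'
    re.findall(r'(\d{1})(\d{1})(\d{1})', s)   # 3 groups of one digit each
    >>> [('1', '2', '2'), ('2', '1', '1')]
    re.findall(r'(\d{1})(\d{1})(\1)', s)      # 3 groups of one digit each; group 3 must repeat the content of group 1
    >>> [('2', '2', '2')]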
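A shorter way to see a numbered backreference at work is to look for a digit that is immediately repeated. This sketch reuses s = '122211'; note that findall returns only the group, so finditer is used to get the whole match.

```python
import re

s = '122211'

groups_only = re.findall(r'(\d)\1', s)                       # only group 1 is returned
whole_matches = [m.group() for m in re.finditer(r'(\d)\1', s)]  # full matched text

print(groups_only)    # ['2', '1']
print(whole_matches)  # ['22', '11']
```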
    \A | Matches only at the start of the string (similar to ^).
    re.findall(r'\A.', s)   # s is again '0123456789 this is regular tist express!'
    >>> ['0']
    \Z | Matches only at the end of the string (similar to $).
    re.findall(r'.\Z', s)
    >>> ['!']
    \b | Matches the empty string, but only at the start or end of a word. It is a zero-width boundary between a word character and a non-word character; the space itself is not consumed.
    re.findall(r'is\b', s)
    >>> ['is', 'is']
    re.findall(r'\bis\b', s)
    >>> ['is']
    re.findall(r'.is\b', s)
    >>> ['his', ' is']
    \B | Matches the empty string, but not at the start or end of a word (for example, a position inside a word).
    re.findall(r'.is\B.', s)
    >>> ['tist']
    \d | Matches any decimal digit; equivalent to the set [0-9] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode digits.
    re.findall(r'\d', s)
    >>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    \D | Matches any non-digit character; equivalent to [^\d].
    re.findall(r'\D', s)
    >>> [' ', 't', 'h', 'i', 's', ' ', 'i', 's', ' ', 'r', 'e', 'g', 'u', 'l', 'a', 'r', ' ', 't', 'i', 's', 't', ' ', 'e', 'x', 'p', 'r', 'e', 's', 's', '!']
    \s | Matches any whitespace character; equivalent to [ \t\n\r\f\v] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode whitespace characters.
    re.findall(r'\s', s)
    >>> [' ', ' ', ' ', ' ', ' ']
    re.findall(r'\s[a-z]*', s)
    >>> [' this', ' is', ' regular', ' tist', ' express']
    \S | Matches any non-whitespace character; equivalent to [^\s].
    re.findall(r'\S', s)
    >>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 't', 'h', 'i', 's', 'i', 's', 'r', 'e', 'g', 'u', 'l', 'a', 'r', 't', 'i', 's', 't', 'e', 'x', 'p', 'r', 'e', 's', 's', '!']
    \w | Matches any alphanumeric character (letters, digits, and underscore); equivalent to [a-zA-Z0-9_] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the range of Unicode alphanumeric characters (letters plus digits plus underscore). With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale.
    re.findall(r'\s\w*', s)
    >>> [' this', ' is', ' regular', ' tist', ' express']
    re.findall(r'\w', s)
    >>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 't', 'h', 'i', 's', 'i', 's', 'r', 'e', 'g', 'u', 'l', 'a', 'r', 't', 'i', 's', 't', 'e', 'x', 'p', 'r', 'e', 's', 's']
    \W | Matches the complement of \w (anything that is not a letter, digit, or underscore).
    re.findall(r'\W', s)   # r'\W*' would additionally return an empty match at every word character
    >>> [' ', ' ', ' ', ' ', ' ', '!']
    re.findall(r'\S\W', s)
    >>> ['9 ', 's ', 's ', 'r ', 't ', 's!']
    \\ | Matches a literal backslash.
    s = 'abc\\'
    re.findall(r'\\', s)
    >>> ['\\']
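The \d, \s, and \w rows above mention that the ASCII flag narrows these classes. A small sketch of the difference for \d, assuming a sample string that contains the Arabic-Indic digits U+0664 and U+0662 (written as escapes so the source stays ASCII):

```python
import re

text = '42 \u0664\u0662'   # ASCII digits followed by Arabic-Indic digits

unicode_digits = re.findall(r'\d+', text)                  # str patterns default to Unicode classes
ascii_digits = re.findall(r'\d+', text, flags=re.ASCII)    # restrict \d to [0-9]

print(len(unicode_digits), len(ascii_digits))  # 2 1
```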
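    Notes
    Greedy means that it will match as many repetitions as possible.
    pattern | The pattern string, that is, the regular expression to match.
    flags | Flags that control how the regular expression is matched:
    1. re.I (re.IGNORECASE): ignore case.
    2. re.M (re.MULTILINE): multi-line mode; '^' and '$' also match at the start and end of each line.
    3. re.S (re.DOTALL): '.' matches any character, including a newline.
    4. re.L (re.LOCALE): make the predefined classes \w \W \b \B depend on the current locale (in Python 3, for bytes patterns only).
    5. re.U (re.UNICODE): make the predefined classes \w \W \b \B \s \S \d \D follow Unicode character properties (in Python 3 this is already the default for str patterns).
    6. re.X (re.VERBOSE): verbose mode; the pattern may span several lines, whitespace in it is ignored, and comments may be added.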
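The effect of the most common flags can be checked with a few short calls. This is a sketch on an invented three-line string; the text and variable names are assumptions for illustration.

```python
import re

text = 'First line\nsecond LINE\nthird line'   # illustrative input

first_only = re.findall(r'^\w+', text)                      # ^ matches only at the string start
each_line = re.findall(r'^\w+', text, flags=re.M)           # ^ matches at every line start
any_case = re.findall(r'line$', text, flags=re.I | re.M)    # flags combine with |

no_dotall = re.search(r'First.*third', text)                # '.' stops at the newline
dotall = re.search(r'First.*third', text, flags=re.S)       # '.' crosses newlines

# re.X lets the pattern be laid out over several lines with comments.
number = re.compile(r'''
    \d+          # integer part
    (?:\.\d+)?   # optional fractional part
''', re.X)

print(first_only, each_line, any_case)
print(no_dotall, dotall is not None)
print(number.findall('3.14 and 42'))
```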
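    \ | When a pattern uses the escape character '\', write it as a raw string such as r'\b' (note the r prefix). In an ordinary string literal Python processes backslash escapes first, so the regular expression engine would not receive the sequence that was intended.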
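The raw-string point can be verified directly: in an ordinary string literal '\b' is a single backspace character, not the two-character word-boundary sequence. A minimal sketch (the sample string is illustrative):

```python
import re

s = 'this is regular tist express!'

print(len('\b'), len(r'\b'))             # 1 2

plain = re.findall('\bis\b', s)          # pattern contains backspace characters
raw = re.findall(r'\bis\b', s)           # pattern contains the \b word boundary
doubled = re.findall('\\bis\\b', s)      # same pattern without the r prefix

print(plain, raw, doubled)               # [] ['is'] ['is']
```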
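    • Example: matching a resident ID number by groups, using the ID number 61052420220129128X
      An ID number consists of 18 characters, which mean the following:
        1) Digits 1-2: province code
        2) Digits 3-4: city code
        3) Digits 5-6: district or county code
        4) Digits 7-14: date of birth; digits 7-10 are the year, 11-12 the month, and 13-14 the day
        5) Digits 15-16: code of the local police station
        6) Digit 17: gender; odd means male, even means female
        7) Digit 18: check character; it can be a digit from 0 to 9, and X is sometimes used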
    import re

    s = '61052420220129128X'
    # Group names: 省 province, 市 city, 县 county, 年 year, 月 month, 日 day,
    # 派出所 police station, 性别 gender, 校验码 check character (a digit or X).
    pattern = r'(?P<省>\d{2})(?P<市>\d{2})(?P<县>\d{2})(?P<年>\d{4})(?P<月>\d{2})(?P<日>\d{2})(?P<派出所>\d{2})(?P<性别>\d{1})(?P<校验码>[\dXx])'
    idcard = re.findall(pattern, s)              # list with one tuple of groups per match
    print("FindAll:", idcard)
    idcard = re.search(pattern, s).groupdict()   # dict keyed by group name
    print("Search1:", idcard)
    print("Search2:")
    for name, value in idcard.items():
        print(name, ":", value)
    >>>>>>>>>>>>>>>>>>>>>
    FindAll: [('61', '05', '24', '2022', '01', '29', '12', '8', 'X')]
    Search1: {'省': '61', '市': '05', '县': '24', '年': '2022', '月': '01', '日': '29', '派出所': '12', '性别': '8', '校验码': 'X'}
    Search2:
    省 : 61
    市 : 05
    县 : 24
    年 : 2022
    月 : 01
    日 : 29
    派出所 : 12
    性别 : 8
    校验码 : X
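The same pattern becomes easier to read when written in verbose mode and compiled once. The following is a sketch only: the English group names, the constant ID_PATTERN, and the helper parse_id are hypothetical names introduced here, and the function checks the shape of the number, not the validity of the date or the check character.

```python
import re

# Verbose layout: one field per line, whitespace and comments are ignored.
ID_PATTERN = re.compile(r'''
    (?P<province>\d{2})   # digits 1-2
    (?P<city>\d{2})       # digits 3-4
    (?P<county>\d{2})     # digits 5-6
    (?P<year>\d{4})       # digits 7-10
    (?P<month>\d{2})      # digits 11-12
    (?P<day>\d{2})        # digits 13-14
    (?P<station>\d{2})    # digits 15-16
    (?P<gender>\d)        # digit 17
    (?P<check>[\dXx])     # digit 18: 0-9 or X
''', re.VERBOSE)


def parse_id(number):
    """Return the fields as a dict, or None if the shape does not fit."""
    m = ID_PATTERN.fullmatch(number)   # fullmatch rejects extra or missing characters
    return m.groupdict() if m else None


fields = parse_id('61052420220129128X')
print(fields['year'], fields['month'], fields['day'])   # 2022 01 29
print(parse_id('6105242022012912'))                     # None (only 16 characters)
```

Using fullmatch instead of search means a longer string that merely contains 18 suitable characters is not accepted.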

  • Original article: https://blog.csdn.net/u012841352/article/details/127273362