Linux Basics Study Notes (10): Regular Expressions


    Preface

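    We are getting close to shell scripting now, with only a few basic topics left. This post summarizes regular expressions and corresponds to Chapter 11 of Vbird's Linux Private Kitchen (《鸟哥的Linux私房菜》). That chapter also covers sed, but the examples here use only grep. What a regular expression is in general is not explained here.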

    The grep command

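    grep is one of the most widely used commands on Linux. It filters text data line by line and prints every line that contains a match. Its basic form is grep [-acinv] [--color=auto] 'pattern' filename. In that form, -a treats binary files as text, -c prints only the number of matching lines, -i ignores case, -n prefixes each output line with its line number, -v inverts the match (prints the lines that do not match), and --color=auto highlights the matched text.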

    Examples

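    ## Show recent login records for the root user
    [root@localhost ~]# last | grep 'root'
    # Show recent login records of users other than root
    [root@localhost ~]# last | grep -v 'root'
    # Print the lines of /etc/man_db.conf that contain MANPATH, with the matches highlighted
    [root@localhost ~]# grep --color=auto 'MANPATH' /etc/man_db.conf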
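    The sketch below lets you try these filters without a real login history. The `logins` function and the user names it prints are hypothetical stand-ins for `last` output, not real data. The pipeline works the same way as `last | grep 'root'`.

```shell
#!/bin/sh
# logins is a hypothetical stand-in for `last`: it prints made-up login records.
logins() {
    printf '%s\n' \
        'root     pts/0   Mon 10:01' \
        'alice    pts/1   Mon 10:05' \
        'root     tty1    Mon 11:20' \
        'bob      pts/2   Mon 12:00'
}

echo '== lines containing root =='
logins | grep 'root'

echo '== lines NOT containing root (-v) =='
logins | grep -v 'root'

echo '== number of lines containing root (-c) =='
logins | grep -c 'root'
```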

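    grep also offers a few more advanced options beyond these basics. -A n prints the n lines after each matching line, and -B n prints the n lines before it, which helps show the context of a match.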

    Examples

    ## dmesg prints the kernel messages; grep picks out the lines that contain 'IPv6'
    [root@localhost shell-learn]# dmesg | grep 'IPv6'
    [    6.852070] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
    [    6.876779] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
    [    6.894626] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
    [    6.915310] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    [    6.944766] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    [    7.013395] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    [    7.100007] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    [    7.896722] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
    ## Show line numbers (-n) and highlight the matches
    [root@localhost shell-learn]# dmesg | grep -n --color=auto 'IPv6'
    503:[    6.852070] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
    505:[    6.876779] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
    506:[    6.894626] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
    507:[    6.915310] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    509:[    6.944766] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    510:[    7.013395] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    511:[    7.100007] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
    512:[    7.896722] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
    ## Also show the 2 lines before and the 3 lines after each matching line
    [root@localhost shell-learn]# dmesg | grep -n -A3 -B2 --color=auto 'IPv6'
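    The dmesg output above differs from machine to machine. This small sketch uses hypothetical log lines instead, so you can see exactly which lines -B and -A add. The `demo_log` function is made up for illustration.

```shell
#!/bin/sh
# demo_log is hypothetical: seven made-up lines, and only line 4 mentions IPv6.
demo_log() {
    printf '%s\n' 'line one' 'line two' 'line three' 'IPv6 ready' \
        'line five' 'line six' 'line seven'
}

# -n: show line numbers; -B2: 2 lines of context before; -A3: 3 lines after
demo_log | grep -n -B2 -A3 'IPv6'
```

    With GNU grep this prints lines 2 to 7. The matching line appears as `4:IPv6 ready`. Context lines use a dash instead of a colon (for example `2-line two`), so the real match is easy to spot.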

    grep works one whole line at a time. It tests each line against the pattern and prints the entire line if that line contains a match.
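    A quick way to see this line-oriented behaviour is to compare plain grep with grep -o, which prints only the matched part. Splitting a word across two lines shows the other side of it. The sample strings here are made up.

```shell
#!/bin/sh
# grep prints the whole line that contains the match...
printf '%s\n' 'the apple is red' | grep 'apple'
# ...while -o prints only the matched text itself.
printf '%s\n' 'the apple is red' | grep -o 'apple'
# A match cannot span two lines: "app" and "le" are separate lines, so
# grep -c reports 0 matching lines. grep exits with status 1 when nothing
# matches; || true keeps a `set -e` script going.
printf 'app\nle\n' | grep -c 'apple' || true
```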

    Basic regular expressions

    Preparing the data

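    Use vi to create a file named regular_express.txt with the following content. All the examples below use it. The ^M at the end of some lines stands for a DOS/Windows-style line ending, so those lines do not end with a period. The file also ends with an empty line 22.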

    "Open Source" is a good mechanism to develop programs.
    apple is my favorite food.
    Football game is not use feet only.
    this dress doesn't fit me.
    However, this dress is about $ 3183 dollars.^M
    GNU is free air not free beer.^M
    Her hair is very beauty.^M
    I can't finish the test.^M
    Oh! The soup taste good.^M
    motorcycle is cheap than car.
    This window is clear.
    the symbol '*' is represented as start.
    Oh!     My god!
    The gd software is a library for drafting programs. "M
    You are the best is mean you are the no. 1.
    The world <Happy> is the same with "glad".
    I like dog.
    google is the best tools for search keyword.
    goooooogle yes!
    go! go! Let's go.
    # I am CoderMan

    Searching for a specific string

    A specific (literal) string is itself a regular expression, just the simplest kind.

    # Lines that contain the string the
    [root@localhost shell-learn]# grep -n 'the' regular_express.txt 
    8:I can't finish the test.^M
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    18:google is the best tools for search keyword.
    # Invert the match: lines that do not contain the string the
    [root@localhost shell-learn]# grep -vn 'the' regular_express.txt
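    Plain string matching is case-sensitive. That is why line 9 ("Oh! The soup…") was not reported above. The sketch below compares the default behaviour with -i, which ignores case. It uses a few lines copied from regular_express.txt, wrapped in a hypothetical `sample_lines` function.

```shell
#!/bin/sh
# sample_lines is hypothetical: four lines taken from regular_express.txt.
sample_lines() {
    printf '%s\n' \
        'Oh! The soup taste good.' \
        'This window is clear.' \
        'google is the best tools for search keyword.' \
        'I like dog.'
}

# Case-sensitive (default): only the lowercase "the" on line 3 matches.
sample_lines | grep -n 'the'
# -i ignores case, so "The" on line 1 matches as well.
sample_lines | grep -in 'the'
# -v inverts the case-sensitive match: lines 1, 2 and 4.
sample_lines | grep -vn 'the'
```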

    []: matching one character from a set

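    [] lists the characters to search for. However many characters the brackets contain, the whole bracket expression matches exactly one character from that set. The following example finds lines that contain tast or test: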

    [root@localhost shell-learn]# grep -n 't[ae]st' regular_express.txt 
    8:I can't finish the test.^M
    9:Oh! The soup taste good.^M
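    To see that `[ae]` stands for a single character, the sketch below runs the same pattern with -o, which prints only the matched text. The input lines and the `bracket_lines` function are made up for illustration.

```shell
#!/bin/sh
# bracket_lines is hypothetical sample input.
bracket_lines() {
    printf '%s\n' \
        'I can finish the test.' \
        'The soup taste good.' \
        'A toast for you.' \
        'The tist is not a word.'
}

# Only lines 1 and 2 match: t, then exactly one of a/e, then st.
bracket_lines | grep -n 't[ae]st'
# -o shows which text matched on each line: "test" and "tast".
bracket_lines | grep -o 't[ae]st'
```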

    [^...] negates the set. To find oo that is not preceded by g, use [^g]oo:

    [root@localhost shell-learn]# grep -n '[^g]oo' regular_express.txt 
    2:apple is my favorite food.
    3:Football game is not use feet only.
    18:google is the best tools for search keyword.
    19:goooooogle yes!

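    Line 19 (goooooogle) matches because it contains runs of o where oo is preceded by another o rather than by g, so [^g]oo still finds a match. Line 18 matches for a similar reason: the oo in google follows a g, but the oo in tools follows a t.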
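    grep -o makes this explanation visible by printing each piece of text that `[^g]oo` matched. The first two input lines are copied from regular_express.txt. `good` is added as a counterexample.

```shell
#!/bin/sh
# In goooooogle, [^g]oo matches "ooo" twice (an o followed by two more o's).
printf '%s\n' 'goooooogle yes!' | grep -o '[^g]oo'
# In line 18 the match comes from "tools", not from "google".
printf '%s\n' 'google is the best tools for search keyword.' | grep -o '[^g]oo'
# "good" has no match: its only oo is preceded by g.
# grep exits with status 1 when nothing matches; || true ignores that.
printf '%s\n' 'good' | grep -c '[^g]oo' || true
```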

    [^a-z]oo matches oo that is not preceded by a lowercase letter:

    # Lines where oo is not preceded by a lowercase letter
    [root@localhost shell-learn]# grep -n '[^a-z]oo' regular_express.txt 
    3:Football game is not use feet only.
    ## Lines that contain a digit
    [root@localhost shell-learn]# grep -n '[0-9]' regular_express.txt 
    5:However, this dress is about $ 3183 dollars.^M
    15:You are the best is mean you are the no. 1.

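    Character ranges such as a-z depend on the character ordering of the current locale. They behave as expected in the C locale, but in some other locales a range like [a-z] may also match uppercase letters. POSIX character classes such as [:lower:] and [:digit:] avoid this problem. These special symbols will be summarized later.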

    [root@localhost shell-learn]# grep -n '[^[:lower:]]oo' regular_express.txt
    3:Football game is not use feet only.
    [root@localhost shell-learn]# grep -n '[[:digit:]]' regular_express.txt
    5:However, this dress is about $ 3183 dollars.^M
    15:You are the best is mean you are the no. 1.
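    The bracketed names such as [:lower:] and [:digit:] are POSIX character classes. The sketch below prints which characters of a made-up test string `aZ3 !` each common class matches. It is built around a hypothetical `show_class` helper.

```shell
#!/bin/sh
# show_class is a hypothetical helper: print every character of the test
# string "aZ3 !" that belongs to the POSIX class named in $1.
show_class() {
    printf '%s\n' 'aZ3 !' | grep -o "[[:$1:]]" | tr -d '\n'
}

for class in alpha alnum digit lower upper punct; do
    printf '%-6s %s\n' "$class" "$(show_class "$class")"
done
```

    [:space:] (spaces and tabs) is another commonly used class. It is left out of the loop only because its match would be invisible.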

    ^ and $: start and end of a line

    In a regular expression, ^ anchors the match to the start of a line and $ anchors it to the end of a line.

    Matching at the start of a line

    ## Lines that start with the
    [root@localhost shell-learn]# grep -n '^the' regular_express.txt 
    12:the symbol '*' is represented as start.
    # Lines that start with a lowercase letter
    [root@localhost shell-learn]# grep -n '^[a-z]' regular_express.txt 
    2:apple is my favorite food.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    12:the symbol '*' is represented as start.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
    # ^[[:lower:]] does the same: lines that start with a lowercase letter
    [root@localhost shell-learn]# grep -n '^[[:lower:]]' regular_express.txt 
    2:apple is my favorite food.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    12:the symbol '*' is represented as start.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
    # Lines that do not start with a letter (^[^[:alpha:]] works too)
    [root@localhost shell-learn]# grep -n '^[^a-zA-Z]' regular_express.txt 
    1:"Open Source" is a good mechanism to develop programs.
    21:# I am CoderMan

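    Note that ^ means different things inside and outside []. As the first character inside brackets it negates the set. Outside brackets it anchors the match to the start of the line.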
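    Here is a short sketch of the roles ^ can play, using made-up lines in a hypothetical `caret_lines` function. The last command also shows that ^ is an ordinary character when it is not the first thing inside the brackets.

```shell
#!/bin/sh
# caret_lines is hypothetical sample input.
caret_lines() {
    printf '%s\n' '# I am CoderMan' 'apple is my favorite food.' \
        '"Open Source" is good.' 'x^2'
}

# Outside brackets, ^ anchors: lines that start with # -> line 1
caret_lines | grep -n '^#'
# First inside brackets, ^ negates: first character is not a letter -> lines 1 and 3
caret_lines | grep -n '^[^a-zA-Z]'
# Not first inside brackets, ^ is literal: lines containing # or ^ -> lines 1 and 4
caret_lines | grep -n '[#^]'
```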

    Matching at the end of a line

    # Lines that end with a period; the period is escaped as \. because an unescaped . matches any character
    [root@localhost shell-learn]# grep -n '\.$' regular_express.txt 
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    11:This window is clear.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    20:go! go! Let's go.
    ## Lines that end with e.
    [root@localhost shell-learn]# grep -n 'e\.$' regular_express.txt 
    4:this dress doesn't fit me.
    # Empty lines
    [root@localhost shell-learn]# grep -n '^$' regular_express.txt 
    22:
    # Strip empty lines and comment lines (lines starting with #) from /etc/rsyslog.conf
    [root@localhost shell-learn]# grep -v '^$' /etc/rsyslog.conf | grep -v '^#'
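    /etc/rsyslog.conf differs between systems, so this sketch applies the same two-stage filter to a small hypothetical config held in a here-document. The `demo_config` function and its contents are made up.

```shell
#!/bin/sh
# demo_config is hypothetical: a tiny config with comments and an empty line.
demo_config() {
    cat <<'EOF'
# Hypothetical config for this demo
$ModLoad imuxsock

# another comment
*.info;mail.none    /var/log/messages
EOF
}

# Stage 1 drops empty lines (^$), stage 2 drops lines starting with # (^#).
demo_config | grep -v '^$' | grep -v '^#'
```

    ^$ only matches truly empty lines. A line containing just spaces or tabs survives the first stage. Dropping those lines as well would need a pattern such as ^[[:space:]]*$.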

    . and *: any single character, and repetition

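    A dot . matches exactly one arbitrary character, so a character must be present at that position. An asterisk * repeats the preceding character zero or more times.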

    # Lines containing g??d: exactly two characters between g and d
    [root@localhost shell-learn]# grep -n 'g..d' regular_express.txt 
    1:"Open Source" is a good mechanism to develop programs.
    9:Oh! The soup taste good.^M
    16:The world <Happy> is the same with "glad".
    # Lines where two g's are separated by at least one o
    [root@localhost shell-learn]# grep -n 'goo*g' regular_express.txt 
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    # g* means zero or more g's, so 'g*g' matches g, gg, ggg, ...; in effect it matches every line that contains at least one g
    [root@localhost shell-learn]# grep -n 'g*g' regular_express.txt 
    1:"Open Source" is a good mechanism to develop programs.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.^M
    13:Oh!     My god!
    14:The gd software is a library for drafting programs. "M
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
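    The sketch below repeats the three patterns on a handful of made-up lines in a hypothetical `dot_star_lines` function. It uses grep -o so that the exact text matched by . and * is visible.

```shell
#!/bin/sh
# dot_star_lines is hypothetical sample input; only the last line has no g.
dot_star_lines() {
    printf '%s\n' 'Oh! The soup taste good.' 'The world is the same with glad.' \
        'I like dog.' 'google is the best' 'gd' 'This window is clear.'
}

# g..d: exactly two characters between g and d -> "good" and "glad"
dot_star_lines | grep -o 'g..d'
# goo*g: g, one o, zero or more further o's, then g -> "goog" (from google)
dot_star_lines | grep -o 'goo*g'
# g*g: any line with at least one g -> 5 of the 6 lines
dot_star_lines | grep -c 'g*g'
```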

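    .* deserves special attention. The . matches exactly one arbitrary character and * repeats it zero or more times, so .* matches any sequence of zero or more characters. 'g.*g' therefore matches lines that contain a g followed, somewhere later, by another g: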

    [root@localhost shell-learn]# grep -n 'g.*g' regular_express.txt 
    1:"Open Source" is a good mechanism to develop programs.
    14:The gd software is a library for drafting programs. "M
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
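    .* is greedy: it takes the longest possible match, so 'g.*g' stretches from the first g on a line to the last one. grep -o makes that visible. The input lines are adapted from regular_express.txt, with the apostrophe in line 20 and the trailing "M in line 14 dropped to keep the quoting simple.

```shell
#!/bin/sh
# The match runs from the first g to the LAST g on the line.
printf '%s\n' 'go! go! Lets go.' | grep -o 'g.*g'
# -> go! go! Lets g
printf '%s\n' 'The gd software is a library for drafting programs.' | grep -o 'g.*g'
# -> gd software is a library for drafting prog
```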

    {}: limiting the number of repetitions

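    .* allows anywhere from zero to infinitely many repetitions. To limit the repetition count to a specific range, use the interval {}. In basic regular expressions (grep's default), plain { and } are ordinary characters, so the interval must be written with backslashes as \{ and \}. The single quotes around the pattern keep the shell from interpreting it.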

    # \{n,\} means n or more of the preceding character,
    # so o\{2,\} (two or more o's) is equivalent to ooo*
    [root@localhost shell-learn]# grep -n 'o\{2,\}' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.^M
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    
    # \{n,m\} allows n to m repetitions: g, then 2 to 5 o's, then g
    # google (2 o's) matches; goooooogle (6 o's) does not
    [root@localhost shell-learn]# grep -n 'go\{2,5\}g' regular_express.txt 
    18:google is the best tools for search keyword.
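    The interval forms are easiest to compare on a list of made-up words in a hypothetical `o_words` function. Each pattern is wrapped in ^ and $ so that it has to match the whole line.

```shell
#!/bin/sh
# o_words is hypothetical: g, then a varying number of o's, then d.
o_words() {
    printf '%s\n' gd god good goood goooood
}

# \{2\}: exactly two o's -> good
o_words | grep '^go\{2\}d$'
# \{2,\}: two or more o's -> good, goood, goooood
o_words | grep '^go\{2,\}d$'
# \{1,3\}: one to three o's -> god, good, goood
o_words | grep '^go\{1,3\}d$'
```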

    Summary

    The examples above are all simple ones. The table below summarizes the common symbols of basic regular expressions.

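    Symbol     Meaning and example
    ^word      word must appear at the start of the line
               grep -n '^#' regular_express.txt
    word$      word must appear at the end of the line
               grep -n '!$' regular_express.txt
    .          exactly one arbitrary character
               grep -n 'e.e' regular_express.txt
    \          escape character: removes the special meaning of the following character
               lines containing a single quote: grep -n \' regular_express.txt
    *          zero or more repetitions of the preceding character
               grep -n 'ess*' regular_express.txt
    [list]     any one character from the list
               grep -n 'g[ld]' regular_express.txt
    [n1-n2]    any one character in the range n1 to n2
               grep -n '[a-z]' regular_express.txt
    [^list]    any one character that is NOT in the list
               grep -n 'oo[^t]' regular_express.txt   (matches oog or ood, but not oot)
    \{n,m\}    n to m consecutive repetitions of the preceding character;
               \{n\} means exactly n, and \{n,\} means n or more
               grep -n 'go\{2,3\}g' regular_express.txt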

    Extended regular expressions

    Vbird's Linux Private Kitchen (《鸟哥的Linux私房菜》) also introduces extended regular expressions.

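    By default, grep interprets patterns as basic regular expressions, and the basic syntax is enough for most everyday tasks. To use extended regular expressions, run grep -E or egrep. egrep is a legacy alias for grep -E, and recent GNU grep releases print a warning when it is used, so grep -E is the safer choice.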

    With basic regular expressions, removing comments and blank lines took two grep commands:

    # Remove comment lines and blank lines from the file
    grep -v '^$' regular_express.txt | grep -v '^#'

    With an extended regular expression, | (OR) combines both conditions in one command:

    egrep -v '^$|^#' regular_express.txt
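    To check that the one-command version is equivalent, the sketch below runs both versions on the same hypothetical input and compares their output. The `mixed_lines` function is made up. The sketch uses grep -E, which behaves the same as egrep.

```shell
#!/bin/sh
# mixed_lines is hypothetical input: comments, an empty line and two keepers.
mixed_lines() {
    printf '%s\n' '# comment' '' 'keep me' '#another' 'also keep'
}

two_step=$(mixed_lines | grep -v '^$' | grep -v '^#')
one_step=$(mixed_lines | grep -E -v '^$|^#')

printf '%s\n' "$one_step"
[ "$two_step" = "$one_step" ] && echo 'both versions give the same result'
```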

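    Compared with basic regular expressions, extended regular expressions add a few extra operators for repetition, alternation and grouping: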

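    Symbol     Meaning and example
    +          one or more repetitions of the preceding character
               egrep -n 'go+d' regular_express.txt
    ?          zero or one occurrence of the preceding character
               egrep -n 'go?d' regular_express.txt
    |          alternation: match any one of several strings (OR)
               egrep -n 'gd|good' regular_express.txt
    ()         group part of a pattern
               egrep -n 'g(la|oo)d' regular_express.txt
    ()+        one or more repetitions of a group
               egrep -n 'A(xyz)+' regular_express.txt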
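    The sketch below tries each operator from the table on a few made-up words in a hypothetical `ere_words` function. As in the interval example, ^ and $ force each pattern to match a whole line. The only exception is the last command, which uses -o to show the repeated group.

```shell
#!/bin/sh
# ere_words is hypothetical sample input.
ere_words() {
    printf '%s\n' gd god good glad AxyzxyzB
}

ere_words | grep -E '^go+d$'       # + : one or more o's     -> god, good
ere_words | grep -E '^go?d$'       # ? : zero or one o       -> gd, god
ere_words | grep -E '^(gd|good)$'  # | : either string       -> gd, good
ere_words | grep -E '^g(la|oo)d$'  # (): group inside a word -> good, glad
ere_words | grep -E -o 'A(xyz)+'   # ()+: repeated group     -> Axyzxyz
```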

    Conclusion

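    This wraps up the basics of regular expressions on Linux. Regular expressions are also important outside Linux, in many programming languages, and I will cover that in a later post.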

  • Original article: https://blog.csdn.net/liman65727/article/details/125585895