Linux Basics Study Notes (10): Regular Expressions

Contents

Preface
The grep command
Basic regular expressions
Extended regular expressions
Summary

Preface

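We are now getting close to shell scripting, with only a few basic topics left. This post summarizes regular expressions and corresponds to Chapter 11 of 《鸟哥的Linux私房菜》 (VBird's Linux Private Kitchen). It also touches on the grep and sed commands. It does not explain what a regular expression is.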

The grep command

grep is one of the most widely used commands on Linux. It filters text data line by line.

Examples

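# Show recent login records for root
[root@localhost ~]# last | grep 'root'
# Show recent login records for users other than root
[root@localhost ~]# last | grep -v 'root'
# Print the lines of /etc/man_db.conf that contain MANPATH, highlighting the match
[root@localhost ~]# grep --color=auto 'MANPATH' /etc/man_db.conf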
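The examples above depend on a real login history. The sketch below is self-contained: it runs a few everyday grep options (-n, -i, -c, -v) on a small made-up sample stored in a shell variable. The variable name sample and its contents are only illustrative.

```shell
# A small, made-up input; grep treats each line as one record.
sample='root pts/0
alice pts/1
ROOT tty1
bob pts/2'

# -n: prefix each matching line with its line number ("1:root pts/0")
printf '%s\n' "$sample" | grep -n 'root'
# -i: ignore case, so both "root" and "ROOT" match
printf '%s\n' "$sample" | grep -i 'root'
# -c: print only the number of matching lines (2 here, because of -i)
printf '%s\n' "$sample" | grep -c -i 'root'
# -v: invert the match; without -i, "ROOT tty1" is kept because the match is case-sensitive
printf '%s\n' "$sample" | grep -v 'root'
```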

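Beyond these basics, grep has some more advanced options, such as -n for line numbers and -A/-B for printing context lines around each match.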

Examples

# dmesg prints kernel messages; grep keeps only the lines that contain 'IPv6'
[root@localhost shell-learn]# dmesg | grep 'IPv6'
[    6.852070] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
[    6.876779] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
[    6.894626] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[    6.915310] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[    6.944766] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[    7.013395] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[    7.100007] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[    7.896722] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
# -n: show line numbers (--color=auto highlights the match)
[root@localhost shell-learn]# dmesg | grep -n --color=auto 'IPv6'
503:[    6.852070] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
505:[    6.876779] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
506:[    6.894626] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
507:[    6.915310] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
509:[    6.944766] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
510:[    7.013395] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
511:[    7.100007] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
512:[    7.896722] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
# -B2 -A3: also show the 2 lines before and the 3 lines after each match
[root@localhost shell-learn]# dmesg | grep -n -A3 -B2 --color=auto 'IPv6'
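The dmesg output varies from machine to machine. To make the context options easier to check, here is a minimal sketch that uses seq to produce the numbers 1 to 10 as predictable input. -A, -B and -C are supported by GNU grep but are not part of POSIX.

```shell
# seq prints 1..10, one number per line; '^5$' matches only the line "5".
# -B2 prints 2 lines Before the match, -A3 prints 3 lines After it.
# Output: 3-3 4-4 5:5 6-6 7-7 8-8 (one per line; ':' marks the match, '-' a context line)
seq 1 10 | grep -n -B2 -A3 '^5$'
# -C1 is shorthand for -A1 -B1 (one line of context on each side): 4 5 6
seq 1 10 | grep -C1 '^5$'
```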

grep reads and matches data one whole line at a time. When it finds the string anywhere in a line, it outputs the entire line.
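One way to see the whole-line behavior is to compare it with the -o option, which prints only the matched text. -o is available in GNU grep but not in POSIX grep. The sample sentence is made up.

```shell
line='the cat sat on the mat'
# Default: the whole line is printed once, because it contains at least one match
printf '%s\n' "$line" | grep 'at'
# -o: only the matching parts are printed, one per line ("at" three times: cat, sat, mat)
printf '%s\n' "$line" | grep -o 'at'
```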

Basic regular expressions

Preparing the data

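Use vi to create the practice file regular_express.txt with the following content. The file ends with an empty line 22, which the grep -n '^$' example relies on.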

"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.^M
GNU is free air not free beer.^M
Her hair is very beauty.^M
I can't finish the test.^M
Oh! The soup taste good.^M
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!     My god!
The gd software is a library for drafting programs. "M
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am CoderMan

Searching for a specific string

A plain string is itself the simplest kind of regular expression.

# Show lines that contain a specific string (case-sensitive, so "The" alone does not match)
[root@localhost shell-learn]# grep -n 'the' regular_express.txt 
8:I can't finish the test.^M
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
# Invert the match: show lines that do not contain the string
[root@localhost shell-learn]# grep -vn 'the' regular_express.txt

[]: matching one character from a set

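Square brackets [] list a set of candidate characters. However many characters the brackets contain, together they match exactly one character, which can be any of the listed ones. For example, the following finds lines that contain tast or test: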

[root@localhost shell-learn]# grep -n 't[ae]st' regular_express.txt 
8:I can't finish the test.^M
9:Oh! The soup taste good.^M
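A bracket expression consumes exactly one character. The minimal sketch below shows this on four made-up words, which are only illustrative.

```shell
# t[ae]st: t, then ONE character that is a or e, then st
# Output: 1:test and 2:tast. tost fails ('o' is not in the set), and teast fails
# because after "te" the next character must be 's' but it is 'a'.
printf 'test\ntast\ntost\nteast\n' | grep -n 't[ae]st'
```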

[^...] inverts the selection. For example, to find oo that is not preceded by g, use [^g]oo:

[root@localhost shell-learn]# grep -n '[^g]oo' regular_express.txt 
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

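The last line, goooooogle, also matches. Its run of o's contains "ooo", where the first o is a character other than g followed by oo, so the pattern is satisfied. Line 18 matches not because of google (there the oo is preceded by g) but because of tools, where t is followed by oo.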
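To check that explanation, -o (GNU grep) shows exactly which text satisfied [^g]oo on each line. The two input lines are shortened copies of lines 18 and 19 of the sample file.

```shell
# Line 1 matches only via "too" in "tools"; "google" itself has no match.
# Line 2 yields two non-overlapping "ooo" matches inside "goooooogle".
# Output: too, ooo, ooo (one per line)
printf 'google is the best tools\ngoooooogle yes!\n' | grep -o '[^g]oo'
```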

[^a-z]oo matches oo preceded by a character that is not a lowercase letter. Ranges such as [0-9] work the same way:

# Lines where oo is preceded by a character that is not a lowercase letter
[root@localhost shell-learn]# grep -n '[^a-z]oo' regular_express.txt 
3:Football game is not use feet only.
# Lines that contain a digit
[root@localhost shell-learn]# grep -n '[0-9]' regular_express.txt 
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.

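Ranges like a-z and 0-9 can also be written as POSIX character classes such as [:lower:] and [:digit:], placed inside a bracket expression. Classes are more reliable because what a range like a-z covers can depend on the current locale; only in the C locale is it guaranteed to mean exactly the 26 ASCII lowercase letters. These special classes will be summarized later.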

[root@localhost shell-learn]# grep -n '[^[:lower:]]oo' regular_express.txt
3:Football game is not use feet only.
[root@localhost shell-learn]# grep -n '[[:digit:]]' regular_express.txt
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
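Here is a small sketch of character classes on made-up input. It also shows LC_ALL=C, which forces the C locale for a single command so that a range like [a-z] has its plain ASCII meaning.

```shell
# [[:digit:]]: any line containing a digit -> 123, a1
printf 'abc\nABC\n123\na1\n' | grep '[[:digit:]]'
# ^[[:upper:]][[:upper:]]*$: lines made only of uppercase letters -> ABC
printf 'abc\nABC\n123\na1\n' | grep '^[[:upper:]][[:upper:]]*$'
# In the C locale, [a-z] is exactly the ASCII lowercase letters -> abc
printf 'abc\nABC\n' | LC_ALL=C grep '[a-z]'
```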

^ and $: start and end of line

In a regular expression, ^ anchors the match to the start of a line and $ anchors it to the end of a line.

Matching at the start of a line:

# Lines that start with 'the'
[root@localhost shell-learn]# grep -n '^the' regular_express.txt 
12:the symbol '*' is represented as start.
# Lines that start with a lowercase letter
[root@localhost shell-learn]# grep -n '^[a-z]' regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
# ^[[:lower:]] does the same: lines that start with a lowercase letter
[root@localhost shell-learn]# grep -n '^[[:lower:]]' regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
# Lines that do not start with a letter (^[^[:alpha:]] also works)
[root@localhost shell-learn]# grep -n '^[^a-zA-Z]' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
21:# I am CoderMan

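Note that ^ means different things inside and outside []. As the first character inside brackets it negates the set. Outside brackets it anchors the match to the start of the line.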
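A minimal sketch contrasting the two meanings of ^ on three made-up lines:

```shell
# ^ outside []: the line must START with g -> 1:go
printf 'go\nago\n#go\n' | grep -n '^g'
# The first ^ anchors to line start; the second ^ (first inside []) negates the set:
# the line must start with a character that is not a lowercase letter -> 3:#go
printf 'go\nago\n#go\n' | grep -n '^[^a-z]'
```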

Matching at the end of a line:

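# Lines ending with '.'; the dot is escaped as \. because a bare . matches any character.
# Lines 5-9 and 14 are not listed: they end with ^M ("M on line 14), not with '.'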
[root@localhost shell-learn]# grep -n '\.$' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.
# Lines ending with 'e.'
[root@localhost shell-learn]# grep -n 'e\.$' regular_express.txt 
4:this dress doesn't fit me.
# Match empty lines
[root@localhost shell-learn]# grep -n '^$' regular_express.txt 
22:
# Strip empty lines and comment lines starting with # from /etc/rsyslog.conf
[root@localhost shell-learn]# grep -v '^$' /etc/rsyslog.conf | grep -v '^#'
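The end-of-line examples can be reproduced without /etc/rsyslog.conf. The sketch below uses a hypothetical config snippet. It also shows why a line with Windows line endings (a trailing carriage return, displayed as ^M) fails '\.$', and how removing the \r with tr fixes that.

```shell
# Hypothetical config snippet: a comment line, an empty line and two settings
conf='# comment
key1=value1

key2=value2'
# Drop empty lines, then drop lines starting with # -> key1=value1, key2=value2
printf '%s\n' "$conf" | grep -v '^$' | grep -v '^#'

# The line really ends in \r, so '\.$' finds nothing: prints 0 (grep exits 1, hence || true)
printf 'done.\r\n' | grep -c '\.$' || true
# After deleting the carriage return, the dot is the last character: prints 1
printf 'done.\r\n' | tr -d '\r' | grep -c '\.$'
```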

. and *: any single character, and repetition

The dot . matches exactly one character, whatever it is. The asterisk * repeats the preceding character zero or more times.

# Lines containing g??d: there must be exactly two characters between g and d
[root@localhost shell-learn]# grep -n 'g..d' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world <Happy> is the same with "glad".
# Lines with at least one o between two g's
[root@localhost shell-learn]# grep -n 'goo*g' regular_express.txt 
18:google is the best tools for search keyword.
19:goooooogle yes!
# g* means zero or more g's, so 'g*g' matches g, gg, ggg, ...: any line containing at least one g
[root@localhost shell-learn]# grep -n 'g*g' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
13:Oh!     My god!
14:The gd software is a library for drafting programs. "M
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
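The same rules can be seen on a few made-up words. The sketch below checks that . consumes exactly one character and that o* may match zero o's.

```shell
# g..d: g, exactly two characters, d -> good, glad ("god" is too short)
printf 'good\ngod\nglad\n' | grep 'g..d'
# goo*g: g, o, zero or more further o's, g -> gog, goooog ("gg" has no o)
printf 'gg\ngog\ngoooog\n' | grep 'goo*g'
# g*g is just "at least one g": dog matches, cat does not
printf 'dog\ncat\n' | grep 'g*g'
```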

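The combination .* deserves special attention. The . matches exactly one arbitrary character and * repeats it zero or more times, so together .* matches zero or more arbitrary characters. For example, g.*g matches lines that contain a g, then anything (possibly nothing), then another g: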

[root@localhost shell-learn]# grep -n 'g.*g' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs. "M
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
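The pattern .* is also greedy: it matches as many characters as it can. With -o (GNU grep) you can see exactly how much text g.*g takes on a made-up sentence.

```shell
# The match starts at the first g and extends to the LAST g on the line
# Output: google is g
printf 'google is good\n' | grep -o 'g.*g'
```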

\{\}: limiting the number of repetitions

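With . and * you can only express "zero or more" repetitions. To limit a character to a specific number of repetitions, use an interval \{n,m\}. In basic regular expressions a bare { is an ordinary character, so the braces must be escaped with a backslash. Putting the pattern in single quotes keeps the shell from altering those backslashes.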

# \{n,\} means n or more repetitions of the preceding character,
# so 'o\{2,\}' (two or more o's) is equivalent to 'ooo*'
[root@localhost shell-learn]# grep -n 'o\{2,\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!

# \{n,m\} with two numbers sets both bounds: g, then 2 to 5 o's, then g (goooooogle, with six o's, is excluded)
[root@localhost shell-learn]# grep -n 'go\{2,5\}g' regular_express.txt 
18:google is the best tools for search keyword.
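The three interval forms can be compared on one made-up word list. In this sketch the surrounding g and d fix where the run of o's starts and stops.

```shell
# \{2,3\}: two or three o's -> good, goood
printf 'gd\ngod\ngood\ngoood\ngooood\n' | grep 'go\{2,3\}d'
# \{2\}: exactly two o's -> good
printf 'gd\ngod\ngood\ngoood\ngooood\n' | grep 'go\{2\}d'
# \{2,\}: two or more o's -> good, goood, gooood
printf 'gd\ngod\ngood\ngoood\ngooood\n' | grep 'go\{2,\}d'
```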

Recap

The examples above are all fairly simple. Here is a summary of the common basic regular expression syntax:

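Pattern	Meaning	Example
^word	word appears at the start of the line	grep -n '^#' regular_express.txt
word$	word appears at the end of the line	grep -n '!$' regular_express.txt
.	exactly one arbitrary character	grep -n 'e.e' regular_express.txt
\	escape character: removes the special meaning of the next character (here, match lines containing a single quote)	grep -n \' regular_express.txt
*	zero or more repetitions of the preceding character	grep -n 'ess*' regular_express.txt
[list]	one character from the set in list	grep -n 'g[ld]' regular_express.txt
[n1-n2]	one character from the range n1 to n2	grep -n '[a-z]' regular_express.txt
[^list]	one character not in list (^ inside [] means negation)	grep -n 'oo[^t]' regular_express.txt
\{n,m\}	n to m consecutive repetitions of the preceding character; \{n\} means exactly n, \{n,\} means n or more	grep -n 'go\{2,3\}g' regular_express.txt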

Extended regular expressions

《鸟哥的Linux私房菜》 also covers extended regular expressions.

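By default, grep interprets patterns as basic regular expressions, and once you are comfortable with those they cover most needs. To make grep understand extended regular expressions, use grep -E or egrep. egrep is equivalent to grep -E, and recent GNU grep releases warn that it is obsolescent.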

With basic regular expressions, removing empty lines and comment lines took two grep commands:

# Remove comment lines and empty lines from the file
grep -v '^$' regular_express.txt | grep -v '^#'

With an extended regular expression, a single command is enough:

egrep -v '^$|^#' regular_express.txt
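The sketch below runs the single-command version on a hypothetical config snippet. It also shows an equivalent spelling that uses a group: ^ followed by either # or the end of the line.

```shell
conf='# comment
key1=value1

key2=value2'
# ERE alternation: drop lines that are empty OR start with # -> key1=value1, key2=value2
printf '%s\n' "$conf" | grep -E -v '^$|^#'
# Same result with grouping: the line starts with # or ends right away
printf '%s\n' "$conf" | grep -E -v '^(#|$)'
```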

Compared with basic regular expressions, extended regular expressions add a few operators for combining patterns:

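Pattern	Meaning	Example
+	one or more repetitions of the preceding character	egrep -n 'go+d' regular_express.txt
?	zero or one occurrence of the preceding character	egrep -n 'go?d' regular_express.txt
|	alternation: match any of several strings (logical or)	egrep -n 'gd|good' regular_express.txt
()	group part of a pattern	egrep -n 'g(la|oo)d' regular_express.txt
()+	one or more repetitions of a group	egrep -n 'A(xyz)+' regular_express.txt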
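The table's operators can be tried on a few made-up words. In this sketch, each comment lists the lines that the command prints.

```shell
words='gd
god
good
glad
gold'
printf '%s\n' "$words" | grep -E 'go+d'       # one or more o:   god, good
printf '%s\n' "$words" | grep -E 'go?d'       # zero or one o:   gd, god
printf '%s\n' "$words" | grep -E 'gd|good'    # either string:   gd, good
printf '%s\n' "$words" | grep -E 'g(la|oo)d'  # grouped choice:  good, glad
# A repeated group; -o (GNU grep) shows the matched text: Axyz, Axyzxyz
printf 'Axyz\nAxyzxyz\nAx\n' | grep -E -o 'A(xyz)+'
```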

Summary

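That wraps up the basics of using regular expressions on Linux. Beyond the Linux command line, regular expressions also play an important role in many programming languages, which I will cover in a later post.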

Original article: https://blog.csdn.net/liman65727/article/details/125585895
