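With shell scripting itself getting closer, a few fundamentals remain. This post summarizes regular expressions, corresponding to Chapter 11 of 《鸟哥的Linux私房菜》, and also touches on the grep and sed commands. It does not explain what a regular expression is.
grep is one of the most widely used commands on Linux; it filters text data line by line.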

Examples
## Show recent login records for the root user
[root@localhost ~]# last | grep 'root'
# Show recent login records for users other than root
[root@localhost ~]# last | grep -v 'root'
# Print the lines of /etc/man_db.conf that contain MANPATH, highlighting the match
grep --color=auto 'MANPATH' /etc/man_db.conf
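To see what -v does without depending on a real login history, here is a minimal sketch. The `sample_logins` function and its three lines are made up for illustration; they only stand in for the output of `last`.

```shell
# Hypothetical stand-in for `last` output, so the example runs anywhere.
sample_logins() {
  printf '%s\n' \
    'root     pts/0   10.0.2.2   Mon Jan  1 10:00   still logged in' \
    'alice    pts/1   10.0.2.3   Mon Jan  1 09:30 - 09:45  (00:15)' \
    'root     tty1               Sun Dec 31 22:10 - 22:40  (00:30)'
}

# -c prints the number of matching lines instead of the lines themselves.
root_count=$(sample_logins | grep -c 'root')
# -v inverts the match: keep only the lines that do NOT contain "root".
other_lines=$(sample_logins | grep -v 'root')

printf 'root logins: %s\n' "$root_count"
printf 'other users:\n%s\n' "$other_lines"
```

grep always decides per line: a line is either printed whole or dropped whole.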
Beyond these basics, there are some more advanced usages.

Examples
## dmesg prints kernel messages; use grep to pick out the lines that contain 'IPv6'
[root@localhost shell-learn]# dmesg | grep 'IPv6'
[ 6.852070] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
[ 6.876779] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
[ 6.894626] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[ 6.915310] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[ 6.944766] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[ 7.013395] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[ 7.100007] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
[ 7.896722] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
## Show line numbers
[root@localhost shell-learn]# dmesg | grep -n --color=auto 'IPv6'
503:[ 6.852070] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
505:[ 6.876779] IPv6: ADDRCONF(NETDEV_UP): enp0s3: link is not ready
506:[ 6.894626] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
507:[ 6.915310] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
509:[ 6.944766] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
510:[ 7.013395] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
511:[ 7.100007] IPv6: ADDRCONF(NETDEV_UP): enp0s8: link is not ready
512:[ 7.896722] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s8: link becomes ready
## Also show the 2 lines before and the 3 lines after each matching line
[root@localhost shell-learn]# dmesg | grep -n -A3 -B2 --color=auto 'IPv6'
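The context options are easier to read on a tiny input. The sketch below assumes a grep that supports -A and -B (GNU and BSD grep do; they are not part of POSIX). The `numbered` helper is hypothetical and just generates ten lines, of which only line 5 contains the keyword.

```shell
# Ten numbered lines; only "line 5" contains the keyword.
numbered() {
  i=1
  while [ "$i" -le 10 ]; do
    if [ "$i" -eq 5 ]; then
      echo "line $i IPv6"
    else
      echo "line $i"
    fi
    i=$((i + 1))
  done
}

# -B2: two lines before the match, -A3: three lines after it, -n: line numbers.
context=$(numbered | grep -n -A3 -B2 'IPv6')
printf '%s\n' "$context"
```

With -n, the matching line is printed with a `:` after its number and the context lines with a `-`, which makes it easy to tell them apart.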
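When grep searches data for a string, it reads and tests the input one whole line at a time.
Create a file with vi, regular_express.txt, to use in the regular expression examples below: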
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.^M
GNU is free air not free beer.^M
Her hair is very beauty.^M
I can't finish the test.^M
Oh! The soup taste good.^M
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh! My god!
The gd software is a library for drafting programs. "M
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am CoderMan
A fixed string is itself a regular expression.
# Select lines that contain a specific string
[root@localhost shell-learn]# grep -n 'the' regular_express.txt
8:I can't finish the test.^M
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
# Inverse: select lines that do not contain the string
[root@localhost shell-learn]# grep -vn 'the' regular_express.txt
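Note that the match is case-sensitive: 'the' does not select "Oh! The soup taste good." in the output above. A small sketch, using three lines copied from the sample file, shows the difference the -i option makes:

```shell
# Three lines taken from regular_express.txt.
sample() {
  printf '%s\n' \
    "I can't finish the test." \
    'Oh! The soup taste good.' \
    'This window is clear.'
}

case_sensitive=$(sample | grep -c 'the')   # only "the test"
ignore_case=$(sample | grep -ci 'the')     # also "The soup"
without=$(sample | grep -vc 'the')         # lines with no lowercase "the"

echo "the: $case_sensitive, the (-i): $ignore_case, not the: $without"
```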
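[] specifies a set of candidate characters. No matter how many characters are listed inside [], the bracket expression matches exactly one of them. The example below finds lines that contain tast or test.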
[root@localhost shell-learn]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.^M
9:Oh! The soup taste good.^M
[^] negates the set. For example, to require that oo is not preceded by g, use the regular expression [^g]oo
[root@localhost shell-learn]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
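Line 18 matches because of tools, where oo is preceded by t. The last line matches because it contains ooo: an oo preceded by o rather than g satisfies the expression.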
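To check which part of a line actually matched, the -o option prints only the matched text. This is a sketch that assumes GNU or BSD grep, since -o is not in POSIX:

```shell
# -o prints each matched part on its own line instead of the whole line.
tools_match=$(printf 'google is the best tools\n' | grep -o '[^g]oo')
# For the long run of o's, keep just the first match.
long_match=$(printf 'goooooogle yes!\n' | grep -o '[^g]oo' | head -n 1)

echo "line 18 matched on: $tools_match"
echo "line 19 matched on: $long_match"
```

The leading g of google never takes part in a match here; the hit on line 18 comes entirely from "tools".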
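[^a-z]oo selects lines in which oo is preceded by a character that is not a lowercase letter.
# Find lines where oo is preceded by something other than a lowercase letter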
[root@localhost shell-learn]# grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only.
## Find lines that contain a digit
[root@localhost shell-learn]# grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
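A range such as a-z depends on how the current locale orders characters (it behaves as expected under the C locale). POSIX character classes such as [:lower:] and [:digit:] avoid that dependency; these special symbols will be summarized later.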
[root@localhost shell-learn]# grep -n '[^[:lower:]]oo' regular_express.txt
3:Football game is not use feet only.
[root@localhost shell-learn]# grep -n '[[:digit:]]' regular_express.txt
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
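A quick sketch of the POSIX classes on three made-up lines (shortened from the sample file). Note the double brackets: the class name `[:lower:]` only has meaning inside a bracket expression.

```shell
# Shortened versions of lines from regular_express.txt.
sample() {
  printf '%s\n' 'Football game' 'apple food' 'about $ 3183 dollars'
}

# oo preceded by a character that is not a lowercase letter.
no_lower_before_oo=$(sample | grep '[^[:lower:]]oo')
# Any line containing a digit.
has_digit=$(sample | grep '[[:digit:]]')

echo "$no_lower_before_oo"
echo "$has_digit"
```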
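In a regular expression, ^ anchors the match at the start of a line and $ anchors it at the end.
Examples of matching at the start of a line
## Match lines that start with the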
[root@localhost shell-learn]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.
# Match lines that start with a lowercase letter
[root@localhost shell-learn]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
# ^[[:lower:]] has the same effect: lines that start with a lowercase letter
[root@localhost shell-learn]# grep -n '^[[:lower:]]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
# Select lines that do not start with a letter; ^[^[:alpha:]] works as well
[root@localhost shell-learn]# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am CoderMan
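Note that ^ means different things inside and outside []. As the first character inside [] it negates the set; outside [] it anchors the match at the start of the line.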
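The two meanings of ^ can be combined in one pattern. A minimal sketch on four made-up lines:

```shell
sample() {
  printf '%s\n' 'the symbol' 'The world' '# comment' '42 is the answer'
}

# ^ outside []: anchor at the start of the line.
starts_with_the=$(sample | grep -c '^the')
# First ^ anchors, second ^ negates: first character is not a letter.
starts_non_letter=$(sample | grep -c '^[^[:alpha:]]')
# First character is anything except a lowercase t.
starts_not_t=$(sample | grep -c '^[^t]')

echo "$starts_with_the $starts_non_letter $starts_not_t"
```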
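Examples of matching at the end of a line
# Match lines that end with a period; lines 5 to 9 are missing because their last character is the carriage return shown as ^M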
[root@localhost shell-learn]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.
## Match lines that end with e.
[root@localhost shell-learn]# grep -n 'e\.$' regular_express.txt
4:this dress doesn't fit me.
# Match empty lines
[root@localhost shell-learn]# grep -n '^$' regular_express.txt
22:
# Strip empty lines and comment lines starting with # from /etc/rsyslog.conf
[root@localhost shell-learn]# grep -v '^$' /etc/rsyslog.conf | grep -v '^#'
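The ^M at the end of some lines in the sample file is a DOS-style carriage return, and it is the reason those lines never match a pattern ending in `\.$`. A minimal sketch, with two lines taken from the sample file, shows the effect and one possible workaround (deleting the carriage returns with tr before matching):

```shell
# Both lines end in ".", but the first has a carriage return (\r) before the newline.
dos_sample() {
  printf 'Her hair is very beauty.\r\nThis window is clear.\n'
}

# As-is: the last character of line 1 is \r, not ".", so only one line matches.
raw_count=$(dos_sample | grep -c '\.$')
# After stripping carriage returns, both lines end in ".".
clean_count=$(dos_sample | tr -d '\r' | grep -c '\.$')

echo "raw: $raw_count, after tr -d: $clean_count"
```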
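A period matches exactly one arbitrary character; an asterisk repeats the preceding character zero or more times.
# Match lines containing g??d
# that is, there must be exactly two characters between g and d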
[root@localhost shell-learn]# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world <Happy> is the same with "glad".
# Match lines with at least one o between two g's
[root@localhost shell-learn]# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
# g* matches zero or more g, so g*g matches g, gg, ggg, gggg and so on: any line that contains at least one g
[root@localhost shell-learn]# grep -n 'g*g' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
13:Oh! My god!
14:The gd software is a library for drafting programs. "M
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
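The combination .* deserves special attention: . is exactly one arbitrary character and * is zero or more repetitions, so together they match zero or more arbitrary characters. g.*g therefore matches a g, then anything (or nothing), then another g.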
[root@localhost shell-learn]# grep -n 'g.*g' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs. "M
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
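The three patterns above are easy to confuse, so here is a sketch that runs them against the same five made-up words and counts the matching lines:

```shell
sample() {
  printf '%s\n' 'google' 'gg' 'god' 'glad dog' 'apple'
}

# g, one o, then zero or more o, then g: only "google".
goo_star=$(sample | grep -c 'goo*g')
# Zero or more g followed by g: every line with at least one g.
g_star=$(sample | grep -c 'g*g')
# Two g's with anything (or nothing) between them: "google", "gg", "glad dog".
g_dot_star=$(sample | grep -c 'g.*g')

echo "goo*g: $goo_star, g*g: $g_star, g.*g: $g_dot_star"
```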
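The * operator (alone or as .*) allows anywhere from zero to unlimited repetitions. To limit the repetition count to a given range, use {}. In basic regular expressions the braces must be written as \{ and \}, because an unescaped { is an ordinary character there; the single quotes already keep the shell from interpreting them.
# With only a lower bound, \{2,\} means two or more of the preceding character
# o\{2,\} is equivalent to ooo*
[root@localhost shell-learn]# grep -n 'o\{2,\}' regular_express.txt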
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
# Two numbers in {} give an explicit range: 2 to 5 o's between the g's; line 19 has six and is excluded
[root@localhost shell-learn]# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.
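The three interval forms can be compared side by side. A minimal sketch on four made-up words with one, two, three and six o's:

```shell
sample() {
  printf '%s\n' 'gog' 'google' 'gooogle' 'goooooogle'
}

# Between 2 and 5 o's between the g's.
two_to_five=$(sample | grep 'go\{2,5\}g')
# Exactly 2 o's.
exactly_two=$(sample | grep 'go\{2\}g')
# 2 or more o's, with no upper bound.
at_least_two=$(sample | grep -c 'go\{2,\}g')

printf '2 to 5:\n%s\n' "$two_to_five"
printf 'exactly 2: %s\n' "$exactly_two"
printf 'at least 2: %s lines\n' "$at_least_two"
```

Because the pattern has a g on both sides, the six o's in goooooogle cannot be split into a shorter run that satisfies `\{2,5\}`.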
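The above are just a few simple examples. The table below summarizes the common basic regular expression constructs.

| Construct | Meaning |
|---|---|
| `^word` | word must appear at the start of the line. Example: `grep -n '^#' regular_express.txt` |
| `word$` | word must appear at the end of the line. Example: `grep -n '!$' regular_express.txt` |
| `.` | Exactly one arbitrary character. Example: `grep -n 'e.e' regular_express.txt` |
| `\` | Escape character. To match lines containing a single quote: `grep -n \' regular_express.txt` |
| `*` | Zero or more repetitions of the preceding character. Example: `grep -n 'ess*' regular_express.txt` |
| `[list]` | Character set listing the characters to match. Example: `grep -n 'g[ld]' regular_express.txt` |
| `[n1-n2]` | Character set given as a range of characters to match. Example: `grep -n '[a-z]' regular_express.txt` |
| `[^list]` | Character set listing the characters that must not match. Example: `grep -n 'oo[^t]' regular_express.txt` |
| `\{n,m\}` | n to m consecutive occurrences of the preceding character. `\{n\}` means exactly n occurrences; `\{n,\}` means n or more. Example: `grep -n 'go\{2,3\}g' regular_express.txt` |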
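《鸟哥的Linux私房菜》 also presents an extension of the above: extended regular expressions.
By default, grep on Linux interprets patterns as basic regular expressions, and being comfortable with those is enough for most tasks. To use extended regular expressions with grep, run grep -E, or the older egrep (which recent GNU grep releases mark as obsolescent).
With basic regular expressions, the earlier filtering example needed two grep commands:
# Strip comments and empty lines from a file
grep -v '^$' regular_express.txt | grep -v '^#'
With extended regular expressions, a single command is enough: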
egrep -v '^$|^#' regular_express.txt
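As a sanity check, the sketch below runs the two-step pipeline, a single grep with two -e patterns, and the extended form on the same input and confirms that they agree. The `conf_sample` function and its contents are made up for illustration.

```shell
# Hypothetical config-like input: comments, empty lines, and two real settings.
conf_sample() {
  printf '%s\n' '# comment' '' 'key = value' '' '# another' 'other = 1'
}

# Basic regex, two passes.
piped=$(conf_sample | grep -v '^$' | grep -v '^#')
# Basic regex, one pass: -e may be repeated, and -v drops lines matching any pattern.
multi_e=$(conf_sample | grep -v -e '^$' -e '^#')
# Extended regex, one pass with alternation.
extended=$(conf_sample | grep -Ev '^$|^#')

printf '%s\n' "$extended"
```

All three variables hold the same two lines, so the choice is mostly a matter of readability.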
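Compared with basic regular expressions, extended regular expressions add a few more operators for combining patterns.

| Construct | Meaning |
|---|---|
| `+` | One or more of the preceding character. Example: `egrep -n 'go+d' regular_express.txt` |
| `?` | Zero or one of the preceding character. Example: `egrep -n 'go?d' regular_express.txt` |
| `\|` | Alternation: match any one of several strings. Example: `egrep -n 'gd\|good' regular_express.txt` |
| `()` | Grouping. Example: `egrep -n 'g(la\|oo)d' regular_express.txt` |
| `()+` | One or more repetitions of a group. Example: `egrep -n 'A(xyz)+' regular_express.txt` |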
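
The operators in the table can be tried out on a handful of made-up words. This sketch uses grep -E rather than egrep; the -o in the last command assumes GNU or BSD grep.

```shell
sample() {
  printf '%s\n' 'gd' 'god' 'good' 'goood' 'glad' 'AxyzxyzB'
}

plus=$(sample | grep -Ec 'go+d')         # god, good, goood
question=$(sample | grep -Ec 'go?d')     # gd, god
either=$(sample | grep -Ec 'gd|good')    # gd, good
group=$(sample | grep -Ec 'g(la|oo)d')   # good, glad
# -o shows how much of the line the repeated group consumed.
repeat=$(sample | grep -Eo 'A(xyz)+')

echo "+: $plus  ?: $question  |: $either  (): $group  ()+: $repeat"
```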
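That wraps up the basics of using regular expressions on Linux. Regular expressions matter well beyond Linux, including in many programming languages, which later posts will summarize.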