P4: Regular Expression Study Notes

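1. A first look at regular expressions

A regular expression is a technique for performing pattern matching on strings.

1.1 Regular expression exercise 1

Extract all English words from content
Extract all numbers from content
Extract all English words and numbers from content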

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular expressions 1: a first taste of regular expressions
 */
public class Reg1 {
    public static void main(String[] args) {
        String content="1995年，互联网的蓬勃发展给了Oak机会。业界为了使死板、单调的静态网页能够“灵活”起来，" +
                "急需一种软件技术来开发一种程序，这种程序可以通过网络传播并且能够跨平台运行。" +
                "于是，世界各大IT企业为此纷纷投入了大量的人力、物力和财力。" +
                "这个时候，Sun公司想起了那个被搁置起来很久的Oak，并且重新审视了那个用软件编写的试验平台，" +
                "由于它是按照嵌入式系统硬件平台体系结构进行编写的，所以非常小，特别适用于网络上的传输系统，" +
                "而Oak也是一种精简的语言，程序非常小，适合在网络上传输。" +
                "Sun公司首先推出了可以嵌入网页并且可以随同网页在网络上传输的Applet（Applet是一种将小程序嵌入到网页中进行执行的技术），" +
                "并将Oak更名为Java。5月23日，Sun公司在Sun world会议上正式发布Java和HotJava浏览器。" +
                "IBM、Apple、DEC、Adobe、HP、Oracle、Netscape和微软等各大公司都纷纷停止了自己的相关开发项目，" +
                "竞相购买了Java使用许可证，并为自己的产品开发了相应的Java平台。";

        // 1. Create a Pattern object first; think of it as the compiled regular expression
        //Pattern pattern = Pattern.compile("[a-zA-Z]+"); // extract all English words from content
        //Pattern pattern = Pattern.compile("[0-9]+"); // extract all numbers from content
        Pattern pattern = Pattern.compile("([0-9]+|[a-zA-Z]+)"); // extract all English words and numbers from content

        // 2. Create a Matcher object
        // The matcher scans content according to the pattern; find() returns true when a match is found, false otherwise
        Matcher matcher = pattern.matcher(content);
        // 3. Loop over the matches
        while (matcher.find()){
            // The matched text is returned by matcher.group(0)
            System.out.println("Found: "+ matcher.group(0));
        }
    }
}

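1.2 Regular expression exercise 2

Extract the titles from a web page (the exercise is described as Baidu hot-search titles; the sample HTML below is the navigation block of 163.com)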

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg2 {
    public static void main(String[] args) {
        String context="  <div class=\"ntes-quicknav-list\">\n" +
                "              <div class=\"ntes-quicknav-content\">\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-1\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://news.163.com\">新闻</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://news.163.com/domestic\">国内</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://news.163.com/world\">国际</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://view.163.com\">评论</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://war.163.com\">军事</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://news.163.com/special/wangsansanhome/\">王三三</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-2\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://sports.163.com\">体育</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/nba\">NBA</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/cba\">CBA</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/allsports\">综合</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/zc\">中超</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/world\">国际足球</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/yc\">英超</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/xj\">西甲</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sports.163.com/yj\">意甲</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-3\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://ent.163.com\">娱乐</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/star\">明星</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/photo\">图片</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/movie\">电影</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/tv\">电视</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/music\">音乐</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/special/gsbjb/\">稿事编辑部</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://ent.163.com/special/focus_ent/\">娱乐FOCUS</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-4\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://money.163.com\">财经</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://money.163.com/stock\">股票</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"http://quotes.money.163.com/stock\">行情</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://money.163.com/ipo\">新股</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://money.163.com/finance\">金融</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://money.163.com/fund\">基金</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://biz.163.com\">商业</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://money.163.com/licai\">理财</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-5\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://auto.163.com\">汽车</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://auto.163.com/buy\">购车</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://auto.163.com/depreciate\">行情</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"http://product.auto.163.com\">车型库</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://auto.163.com/elec\">新能源</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://auto.163.com/news\">行业</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-6\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://tech.163.com\">科技</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://tech.163.com/telecom/\">通信</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://tech.163.com/it\">IT</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://tech.163.com/internet\">互联网</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://tech.163.com/special/chzt\">特别策划</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://tech.163.com/smart/\">网易智能</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-7\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://fashion.163.com\">时尚</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://baby.163.com\">亲子</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://fashion.163.com/art\">艺术</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-8\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://mobile.163.com\">手机</a>\n" +
                "                      <span>/</span>\n" +
                "                      <a href=\"https://digi.163.com/\">数码</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://tech.163.com/special/ydhlw\">移动互联网</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://mobile.163.com/special/jqkj_list/\">惊奇科技</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://mobile.163.com/special/cpshi_list/\">易评机</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-9\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://house.163.com\">房产</a>\n" +
                "                      <span>/</span>\n" +
                "                      <a href=\"https://home.163.com\">家居</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://bj.house.163.com\">北京房产</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://sh.house.163.com\">上海房产</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://gz.house.163.com\">广州房产</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://house.163.com/city\">全部分站</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://xf.house.163.com\">楼盘库</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://home.163.com/jiaju/\">家具</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://home.163.com/weiyu/\">卫浴</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-10\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://travel.163.com\">旅游</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://travel.163.com/outdoor\">户外</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://travel.163.com/food\">美食</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://travel.163.com/special/travellist/#f=endnav\">专题</a>\n" +
                "                  </li>\n" +
                "                </ul>\n" +
                "                <ul class=\"ntes-quicknav-column ntes-quicknav-column-11\">\n" +
                "                  <li>\n" +
                "                    <h3>\n" +
                "                      <a href=\"https://edu.163.com\">教育</a>\n" +
                "                    </h3>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://edu.163.com/yimin\">移民</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://edu.163.com/liuxue\">留学</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://edu.163.com/en\">外语</a>\n" +
                "                  </li>\n" +
                "                  <li>\n" +
                "                    <a href=\"https://edu.163.com/gaokao\">高考</a>\n" +
                "                  </li>\n" +
                "                </ul>";

        // 1. Create a Pattern object first; think of it as the compiled regular expression
        Pattern pattern = Pattern.compile("<a href=\"(\\S*)\">(\\S*)</a>"); // group 1 captures the link URL, group 2 the link text

        // 2. Create a Matcher object
        // The matcher scans context according to the pattern; find() returns true when a match is found, false otherwise
        Matcher matcher = pattern.matcher(context);
        // 3. Loop over the matches
        while (matcher.find()){
            // group(0) is the whole match; group(1) and group(2) are the captured parts
            System.out.println("Found: "+ matcher.group(0)); // e.g. Found: <a href="https://home.163.com/jiaju/">家具</a>
            //System.out.println("Found: "+ matcher.group(1)); // e.g. Found: https://house.163.com
            //System.out.println("Found: "+ matcher.group(2)); // e.g. Found: 新闻
        }
    }
}

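1.3 Regular expression exercise 3

Given a string (or an article), find every substring made of four consecutive digits.
Given a string (or an article), find every substring of four consecutive digits in which the first digit equals the fourth and the second equals the third, for example 1221 or 5775.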
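
Both tasks can be sketched as follows. This is a minimal illustration, not the notes' own solution: the class name FourDigitDemo, the helper findAll, and the sample text are made up for the example. The second pattern relies on backreferences (\\1, \\2) to the two captured digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourDigitDemo {
    // Collects every non-overlapping match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(0));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "id 1998, code 1221, tag 5775, short 12"; // hypothetical sample
        // Task 1: any four consecutive digits
        System.out.println(findAll("\\d{4}", text));
        // Task 2: first digit equals fourth, second equals third
        System.out.println(findAll("(\\d)(\\d)\\2\\1", text));
    }
}
```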

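2. Regular expression source code analysis

2.1 Source analysis: matcher.find()

What matcher.find() does

It locates the next substring that satisfies the pattern (for example 1998).
Once found, it records the start index of that substring in the matcher's int[] groups field,
groups[0] = 0, and records the end index + 1 of the substring in groups[1] = 4.
It also sets oldLast to the end index + 1, i.e. 4, so the next call to find() starts matching at index 4.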
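
The groups array is internal to Matcher, but the public start() and end() methods report the offsets of the current match, so the bookkeeping described above can be observed from outside. A small sketch, with a hypothetical class name and a made-up input string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindOffsetsDemo {
    // Returns {start, end} for each successive find(); end is exclusive.
    static List<int[]> offsets(String regex, String text) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        // Each find() resumes where the previous match ended.
        for (int[] span : offsets("\\d\\d\\d\\d", "1998abc1999")) {
            System.out.println("[" + span[0] + ", " + span[1] + ")");
        }
    }
}
```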

2.2 Source analysis: matcher.group(0)

Source code

public String group(int group) {
    if (first < 0)
        throw new IllegalStateException("No match found");
    if (group < 0 || group > groupCount())
        throw new IndexOutOfBoundsException("No group " + group);
    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
        return null;
    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
}

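Using the positions recorded in groups[0] = 0 and groups[1] = 4, group(0) cuts the substring [0, 4) out of context (index 0 included, index 4 excluded) and returns it: 1998.
If find() runs again, the same steps apply with groups[0] = 31 and groups[1] = 35 (index 31 included, index 35 excluded): 1999.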
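
The two behaviours visible in the source above can be checked from calling code: group(0) is the substring between the recorded start and end offsets, and calling group() before any successful find() throws IllegalStateException. The sketch below assumes a made-up input; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSubstringDemo {
    // True when group(0) equals text.substring(start, end) for every match.
    static boolean groupMatchesSubstring(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            String viaGroup = matcher.group(0);
            String viaSubstring = text.substring(matcher.start(0), matcher.end(0));
            if (!viaGroup.equals(viaSubstring)) {
                return false;
            }
        }
        return true;
    }

    // True when group(0) called before any find() throws IllegalStateException.
    static boolean groupBeforeFindThrows(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        try {
            matcher.group(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(groupMatchesSubstring("\\d{4}", "1998abc1999"));
        System.out.println(groupBeforeFindThrows("\\d{4}", "1998abc1999"));
    }
}
```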

2.3 Complete code

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg3 {
    public static void main(String[] args) {
        String context="1998年12月8日，第二代Java平台的企业版J2EE发布。" +
                "1999年6月，Sun公司发布了第二代Java平台（简称为Java2）的3个版本：" +
                "J2ME（Java2 Micro Edition，Java2平台的微型版），应用于移动、无线及有限资源的环境；" +
                "J2SE（Java 2 Standard Edition，Java 2平台的标准版），应用于桌面环境；" +
                "J2EE（Java 2Enterprise Edition，Java 2平台的企业版），应用于基于Java的应用服务器。" +
                "Java 2平台的发布，是Java发展过程中最重要的一个里程碑，标志着Java的应用开始普及9889。";
        /**
         *  \d matches a digit character. Equivalent to [0-9].
         */
        String regStr="\\d\\d\\d\\d"; // every substring of four consecutive digits
        // Create the pattern object (the compiled regular expression)
        Pattern pattern =Pattern.compile(regStr);
        // Create the matcher, which applies the pattern to the context string
        Matcher matcher = pattern.matcher(context);
        /**
         * What matcher.find() does
         * 1. It locates the next substring that satisfies the pattern (for example 1998).
         * 2. Once found, it records the start index of the substring in the matcher's int[] groups field,
         *    groups[0] = 0, and records the end index + 1 of the substring in groups[1] = 4.
         * 3. It also sets oldLast to the end index + 1, i.e. 4, so the next find() starts matching at index 4.
         *
         * matcher.group(0) analysis
         *      Source code
         *     public String group(int group) {
         *         if (first < 0)
         *             throw new IllegalStateException("No match found");
         *         if (group < 0 || group > groupCount())
         *             throw new IndexOutOfBoundsException("No group " + group);
         *         if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
         *             return null;
         *         return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
         *     }
         *  1. Using the positions recorded in groups[0] and groups[1], it cuts the substring out of context:
         *          [0,4), index 0 included, index 4 excluded: 1998
         *  If find() runs again, the same steps apply with
         *    groups[0] = 31 and groups[1] = 35, index 31 included, index 35 excluded: 1999
         *
         */
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}

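2.4 Source analysis: groups ()

What is a group? Take (\d\d)(\d\d): parentheses () in a regular expression define groups; the first () is group 1 and the second () is group 2.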

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg4 {
    public static void main(String[] args) {
        String context="1998年12月8日，第二代Java平台的企业版J2EE发布。" +
                "1999年6月，Sun公司发布了第二代Java平台（简称为Java2）的3个版本：" +
                "J2ME（Java2 Micro Edition，Java2平台的微型版），应用于移动、无线及有限资源的环境；" +
                "J2SE（Java 2 Standard Edition，Java 2平台的标准版），应用于桌面环境；" +
                "J2EE（Java 2Enterprise Edition，Java 2平台的企业版），应用于基于Java的应用服务器。" +
                "Java 2平台的发布，是Java发展过程中最重要的一个里程碑，标志着Java的应用开始普及9889。";
        /**
         *  \d matches a digit character. Equivalent to [0-9].
         */
        String regStr="(\\d\\d)(\\d\\d)"; // four consecutive digits, split into two groups of two
        // Create the pattern object (the compiled regular expression)
        Pattern pattern =Pattern.compile(regStr);
        // Create the matcher, which applies the pattern to the context string
        Matcher matcher = pattern.matcher(context);
        /**
         * What matcher.find() does
         *  What is a group? Take (\d\d)(\d\d): parentheses () define groups; the first () is group 1, the second () is group 2.
         * 1. It locates the next substring that satisfies the pattern (for example (19)(98)).
         * 2. Once found, it records the indexes in the matcher's int[] groups field:
         *    2.1  groups[0] = 0, and the end index + 1 of the whole match goes into groups[1] = 4
         *    2.2  the span of group 1 is recorded as groups[2] = 0, groups[3] = 2  (19)
         *    2.3  the span of group 2 is recorded as groups[4] = 2, groups[5] = 4  (98)
         *    2.4  further groups follow the same scheme
         *
         * 3. It also sets oldLast to the end index + 1, i.e. 4, so the next find() starts matching at index 4.
         *
         * matcher.group(0) analysis
         *      Source code
         *     public String group(int group) {
         *         if (first < 0)
         *             throw new IllegalStateException("No match found");
         *         if (group < 0 || group > groupCount())
         *             throw new IndexOutOfBoundsException("No group " + group);
         *         if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
         *             return null;
         *         return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
         *     }
         *  1. Using the positions recorded in groups[0] and groups[1], it cuts the substring out of context:
         *          [0,4), index 0 included, index 4 excluded: 1998
         *  If find() runs again, the same steps apply with
         *    groups[0] = 31 and groups[1] = 35, index 31 included, index 35 excluded: 1999
         *
         */
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
            System.out.println("Group 1 () matched: "+matcher.group(1));
            System.out.println("Group 2 () matched: "+matcher.group(2));
        }
    }
}

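matcher.find()

It locates the next substring that satisfies the pattern (for example (19)(98)).
Once found, it records the indexes in the matcher's int[] groups field:
2.1 groups[0] = 0, and the end index + 1 of the whole match goes into groups[1] = 4
2.2 the span of group 1 is recorded as groups[2] = 0, groups[3] = 2 (19)
2.3 the span of group 2 is recorded as groups[4] = 2, groups[5] = 4 (98)
2.4 further groups follow the same scheme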

System.out.println("Found: "+matcher.group(0)); // the whole matched substring: 1998
System.out.println("Group 1 () matched: "+matcher.group(1)); // 19
System.out.println("Group 2 () matched: "+matcher.group(2)); // 98
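
The per-group offsets can also be read through the public API: start(n) and end(n) give the span of group n, with n = 0 meaning the whole match. The sketch below lays them out in the same order as the groups array described above. The class and method names are hypothetical and the input is a made-up string.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsetsDemo {
    // Returns {start(0), end(0), start(1), end(1), ...} for the first match,
    // or null when the pattern does not match at all.
    static int[] firstMatchOffsets(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        int[] offsets = new int[(matcher.groupCount() + 1) * 2];
        for (int g = 0; g <= matcher.groupCount(); g++) {
            offsets[g * 2] = matcher.start(g);
            offsets[g * 2 + 1] = matcher.end(g);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Whole match [0,4), group 1 [0,2), group 2 [2,4)
        System.out.println(Arrays.toString(firstMatchOffsets("(\\d\\d)(\\d\\d)", "1998 J2EE")));
    }
}
```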

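3. Regular expression syntax

What the various metacharacters do

3.0 Metacharacter: the escape character `\\`

The \\ symbol: when a regular expression has to match certain special characters literally, they must be escaped; otherwise the search finds nothing or may even throw an error.
In a Java string literal, two backslashes \\ stand for the single \ used in the regex syntax of other languages.
Characters that need escaping include
```
. * + ( ) $ \ ? [ ] ^ { } |
```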
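
A short sketch of the difference between an unescaped and an escaped metacharacter. The class name, helper, and sample string are made up for illustration. Pattern.quote is a standard library method that escapes a whole literal at once, which can be easier than escaping character by character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a.b(c)"; // hypothetical sample
        System.out.println(countMatches(".", text));    // unescaped: matches every character
        System.out.println(countMatches("\\.", text));  // escaped: matches only the literal dot
        System.out.println(countMatches("\\(", text));  // escaped: matches the literal (
        System.out.println(countMatches(Pattern.quote("(c)"), text)); // whole literal quoted
    }
}
```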

3.1 Quantifiers

A quantifier specifies how many times in a row the preceding character or group must occur.

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Using quantifiers
 */
public class Reg7 {
    public static void main(String[] args) {
        String content="111234Aa2a1abbbccd13511";

        //String regStr="a{3}"; // matches aaa
        //String regStr="1{4}" ;  // matches 1111

        //String regStr="\\d{2}" ; // matches any two digits

        //String regStr="a{2,4}" ; // matches 2 to 4 consecutive a's, as many as possible (greedy)
        //String regStr="\\d{2,5}" ; // matches 2, 3, 4 or 5 consecutive digits

        //String regStr="1+"; // matches one or more 1s
        //String regStr="\\d+"; // matches one or more digits

        //String regStr="1*"; // matches zero or more 1s

        String regStr="a1?"; // matches a or a1, i.e. a followed by zero or one 1

        Pattern pattern =Pattern.compile(regStr);
        Matcher matcher =pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}
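
Quantifiers are greedy by default: a range such as {2,4} or {2,5} takes as many repetitions as it can before the matcher moves on. A small sketch of that behaviour follows, with a hypothetical class name and helper; the second and third inputs reuse the content string from Reg7.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyQuantifierDemo {
    // Collects every non-overlapping match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(0));
        }
        return result;
    }

    public static void main(String[] args) {
        // Six a's: the first match takes four, the second takes the remaining two.
        System.out.println(findAll("a{2,4}", "aaaaaa"));
        String content = "111234Aa2a1abbbccd13511";
        // Five digits are taken from 111234; the leftover single 4 is too short to match.
        System.out.println(findAll("\\d{2,5}", content));
        // a followed by an optional 1
        System.out.println(findAll("a1?", content));
    }
}
```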

3.2 Alternation operator

The | symbol matches either the expression before it or the expression after it.

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * The alternation operator
 */
public class Reg6 {
    public static void main(String[] args) {
        String context="后羿射手  houyisheshou 社长";
        String regStr="射|she|社";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(context);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}

3.3 Grouping

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Groups
 */
public class Reg9 {
    public static void main(String[] args) {
        String content ="han123 s7789 nn1189han";

        /**
         * Unnamed (numbered) groups
         * Notes
         * 1. matcher.group(0) returns the whole matched string
         * 2. matcher.group(1) returns the content of group 1 of the match
         * 3. matcher.group(2) returns the content of group 2 of the match
         */
        //String regStr ="(\\d\\d)(\\d\\d)"; // matches a string of 4 digits

        /**
         * Named groups
         * A group can be given a name
         */
        String regStr ="(?<g1>\\d\\d)(?<g2>\\d\\d)";

        Pattern pattern =Pattern.compile(regStr);
        Matcher matcher =pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
            System.out.println("Group 1: "+matcher.group(1));
            System.out.println("Group 1 [by name]: "+matcher.group("g1"));
            System.out.println("Group 2: "+matcher.group(2));
            System.out.println("Group 2 [by name]: "+matcher.group("g2"));
        }
    }
}

Non-capturing group (?:pattern)

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Non-capturing group
 */
public class Reg10 {
    public static void main(String[] args) {
        String content ="hello小明教育 jack小明老师 小明同学hello";

        //String  regStr="小明教育|小明老师|小明同学"; // finds 小明教育, 小明老师, 小明同学
        String  regStr="小明(?:教育|老师|同学)"; // non-capturing group: finds 小明教育, 小明老师, 小明同学

        Pattern pattern =Pattern.compile(regStr);
        Matcher matcher =pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}
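
What "non-capturing" means in practice can be seen through groupCount(): a (?:...) group still groups the alternatives, but it is not numbered, so there is no group(1) to read. A minimal sketch with a hypothetical class name and made-up ASCII patterns:

```java
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Number of capturing groups declared by the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("Java(SE|EE|ME)"));   // one capturing group
        System.out.println(groupCount("Java(?:SE|EE|ME)")); // no capturing group
    }
}
```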

Non-capturing match: positive lookahead (?=pattern)

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg11 {
    public static void main(String[] args) {
        String content ="hello小明教育 jack小明老师 小明同学hello";

        String  regStr="小明(?=教育|老师)"; // non-capturing lookahead: finds 小明 only where it is followed by 教育 or 老师; the text after 小明 is not part of the match

        Pattern pattern =Pattern.compile(regStr);
        Matcher matcher =pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}

Non-capturing match: negative lookahead (?!pattern)

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg12 {
    public static void main(String[] args) {
        String content ="hello小明教育 jack小明老师 小明同学hello";

        String  regStr="小明(?!教育|老师)"; // non-capturing negative lookahead: finds 小明 only where it is NOT followed by 教育 or 老师

        Pattern pattern =Pattern.compile(regStr);
        Matcher matcher =pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}
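
A lookahead only checks what follows; it does not consume it. The sketch below records where each match ends, so it is visible that the match stops right after the keyword and before the looked-ahead text. The class name, helper, and the ASCII sample are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns each match as "text@endIndex" (end index is exclusive).
    static List<String> matchesWithEnd(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(0) + "@" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Windows95 Windows98 WindowsNT"; // hypothetical sample
        // Positive lookahead: Windows followed by 95 or 98
        System.out.println(matchesWithEnd("Windows(?=95|98)", text));
        // Negative lookahead: Windows not followed by 95 or 98
        System.out.println(matchesWithEnd("Windows(?!95|98)", text));
    }
}
```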

3.4 Character matchers

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Demonstrates the character matchers
 */
public class Reg5 {
    public static void main(String[] args) {
        String content ="a11c8.a bc ABC*_";

        //String regStr="[a-z]"; // matches any single character in a-z
        //String regStr="[^a-z]"; // matches any single character not in a-z

        //String regStr="[A-Z]"; // matches any single character in A-Z
        //String regStr="[^A-Z]"; // matches any single character not in A-Z

        //String regStr="[0-9]"; // matches any single digit 0-9
        //String regStr="\\d"; // matches any single digit 0-9
        //String regStr="[^0-9]"; // matches any single character that is not a digit
        //String regStr="\\D"; // matches any single character that is not a digit

        //String regStr="\\w"; // matches an English letter (either case), a digit, or an underscore
        //String regStr="\\W";  // equivalent to [^a-zA-Z0-9_]

        //String regStr="abc"; // matches the string abc [case-sensitive by default]
        //String regStr="(?i)abc"; // matches abc [case-insensitive]
        //String regStr="a(?i)bc"; // matches abc [only bc is case-insensitive]
        //String regStr="a((?i)b)c"; // matches abc [only b is case-insensitive]

        //String regStr="[abcd]";// matches any single character among a, b, c, d

        //String regStr="\\s"; // matches any whitespace character (space, tab, newline, ...)
        //String regStr="\\S";  // matches any single non-whitespace character

        //String regStr=".";  // matches any character except line terminators such as \n; to match a literal . use \\.
        String regStr="\\."; // matches a literal .

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}
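
Besides the inline (?i) flag shown in the comments above, case-insensitive matching can be requested through the second argument of Pattern.compile. The sketch compares the two forms; the class name, helper, and sample string are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Counts matches of regex in text, compiled with the given flags (0 means none).
    static int count(String regex, int flags, String text) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "abc ABC aBC"; // hypothetical sample
        System.out.println(count("abc", 0, text));                        // case-sensitive
        System.out.println(count("(?i)abc", 0, text));                    // inline flag
        System.out.println(count("abc", Pattern.CASE_INSENSITIVE, text)); // compile-time flag
        System.out.println(count("a(?i)bc", 0, text));                    // only bc is case-insensitive
    }
}
```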

3.5 Anchors

An anchor specifies where the matched text must appear, for example at the start or at the end of the string.

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Using anchors
 */
public class Reg8 {
    public static void main(String[] args) {
        String content ="han123resdfgsfdsdahan  dphanu";

        //String regStr ="^[0-9]+[a-z]*"; // matches a string that starts with at least one digit, followed by any number of lowercase letters
        String regStr ="^[0-9]+[a-z]+$"; // matches a string that starts with at least one digit and ends with at least one lowercase letter

        //String regStr ="han\\b"; // matches han at a word boundary [here: at the end of the input, or right before a space]
        //String regStr ="han\\B"; // matches han that is not at a word boundary
        Pattern pattern =Pattern.compile(regStr);
        Matcher matcher =pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }
    }
}
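
To see how `\b` and `\B` differ on the sample string from Reg8, the matches can simply be counted. This is a small sketch; the class `BoundarySketch` and the helper `countMatches` are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundarySketch {
    // Counts how many times regex is found in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "han123resdfgsfdsdahan  dphanu";
        // Only the "han" followed by a space sits at a word boundary.
        System.out.println(countMatches("han\\b", content)); // 1
        // The "han" before "123" and the one before "u" are inside a word.
        System.out.println(countMatches("han\\B", content)); // 2
        // ^ and $ tie the pattern to both ends of the input.
        System.out.println(countMatches("^[0-9]+[a-z]+$", content));  // 0
        System.out.println(countMatches("^[0-9]+[a-z]+$", "123abc")); // 1
    }
}
```

Digits count as word characters, which is why the first `han` (followed by `1`) is not at a boundary.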

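3.6 Backreferences

Once a parenthesized group has captured some text, that text can be referred to again later, which makes some very practical patterns possible. This is called a backreference. A backreference can be used inside the regular expression, written as \\N in a Java string (N is the group number), or outside it, for example in a replacement string, written as $N.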

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Backreferences
 */
public class Reg19 {
    public static void main(String[] args) {
        String content ="hello jack222222 tom1551 yyy xx";
        //String regStr="(\\d)\\1"; // finds two identical digits in a row
        //String regStr="(\\d)\\1{4}";// finds five identical digits in a row
        String regStr="(\\d)(\\d)\\2\\1"; // matches four digits where the first equals the fourth and the second equals the third, e.g. 5225, 1551
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }

    }
}

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Search a string for a product number of the form 12321-333999111:
 * a five-digit number, then a hyphen, then a nine-digit number
 * in which each consecutive group of three digits repeats the same digit.
 */
public class Reg20 {
    public static void main(String[] args) {
        String content ="12321-333999111";

        String regStr="\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()){
            System.out.println("Found: "+matcher.group(0));
        }

    }
}
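
The two listings in this section use backreferences inside the pattern only. The sketch below puts an inner reference (`\\1`) next to an outer one (`$1`, `$2`, `$3` in a replacement string). The class `BackReferenceSketch`, its method names, and the date example are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class BackReferenceSketch {
    // Inner backreference: \\1 must repeat whatever digit group 1 captured.
    static boolean hasDoubledDigit(String s) {
        return Pattern.compile("(\\d)\\1").matcher(s).find();
    }

    // Outer backreference: $1, $2, $3 in the replacement refer to the captured groups.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledDigit("tom1551"));  // true, because of "55"
        System.out.println(hasDoubledDigit("1234"));     // false
        System.out.println(reorderDate("2022-06-26"));   // 26/06/2022
    }
}
```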

4. Regular expressions in practice

4.1 Checking for Chinese characters

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

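/**
 * Regular expressions in practice
 */
public class Reg13 {
    public static void main(String[] args) {
        String content="小明的教育";

        // Checks that every character lies in U+0391..U+FFE5. The range covers Chinese characters,
        // but it starts at the Greek capital alpha, so it accepts many non-Chinese characters too.
        String regStr="^[\u0391-\uffe5]+$";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()){
            System.out.println("Format OK");
        }else {
            System.out.println("Format not OK");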
        }
    }
}
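
The range `[\u0391-\uffe5]` used above is much wider than the Han script. As a hedged alternative, Java's regex engine also understands the Unicode script property `\p{IsHan}`. The sketch below compares the two checks; the class `HanCheckSketch` and its method names are hypothetical, and the test strings are written with Unicode escapes so the source stays ASCII-only.

```java
final class HanCheckSketch {
    // The range from the notes: U+0391 through U+FFE5.
    static boolean inBroadRange(String s) {
        return s.matches("[\\u0391-\\uffe5]+");
    }

    // Stricter check: every character must belong to the Han script.
    static boolean isHanOnly(String s) {
        return s.matches("\\p{IsHan}+");
    }

    public static void main(String[] args) {
        String han = "\u4e2d\u6587"; // two Han characters
        String greek = "\u03a9";     // Greek capital omega
        System.out.println(inBroadRange(han));   // true
        System.out.println(isHanOnly(han));      // true
        System.out.println(inBroadRange(greek)); // true: the broad range also admits Greek
        System.out.println(isHanOnly(greek));    // false
    }
}
```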

4.2 Validating a postal code

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular expressions in practice
 * Postal code
 * Required format: a six-digit number whose first digit is 1-9, e.g. 123890
 */
public class Reg14 {
    public static void main(String[] args) {
        String content="523890";

        String regStr="^[1-9]\\d{5}$";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()){
            System.out.println("Format OK");
        }else {
            System.out.println("Format not OK");
        }
    }
}

4.3 Validating a QQ number

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular expressions in practice
 * QQ number
 * Required format: 5 to 10 digits, the first of which is 1-9
 * e.g. 12389, 1234567890
 */
public class Reg15 {
    public static void main(String[] args) {
        String content="5678950";

        String regStr="^[1-9]\\d{4,9}$";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()){
            System.out.println("Format OK");
        }else {
            System.out.println("Format not OK");
        }
    }
}

4.4 Validating a mobile phone number

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Mobile phone number
 * Required format: 11 digits starting with 13, 14, 15 or 18, e.g. 135 8888 9999
 */
public class Reg16 {
    public static void main(String[] args) {
        String content="18668902219";

        // inside [...] the | is a literal character, so the allowed second digits are written as [3458]
        String regStr="^1[3458]\\d{9}$";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()){
            System.out.println("Format OK");
        }else {
            System.out.println("Format not OK");
        }
    }
}
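
A point worth checking with this kind of pattern: inside a character class, `|` is an ordinary character, not alternation. The sketch below compares `[3|4|5|8]` with `[3458]` on an input whose second character is a pipe. The class `CharClassPipeSketch` and its constant names are made up for illustration.

```java
final class CharClassPipeSketch {
    // The pipes here are literal members of the class, not alternation.
    static final String WITH_PIPES = "^1[3|4|5|8]\\d{9}$";
    // Lists only the four digits that are actually wanted.
    static final String DIGITS_ONLY = "^1[3458]\\d{9}$";

    public static void main(String[] args) {
        String valid = "18668902219";
        String bogus = "1|123456789"; // second character is a pipe
        System.out.println(valid.matches(WITH_PIPES));  // true
        System.out.println(valid.matches(DIGITS_ONLY)); // true
        System.out.println(bogus.matches(WITH_PIPES));  // true: the pipe is accepted
        System.out.println(bogus.matches(DIGITS_ONLY)); // false
    }
}
```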

4.5 URL

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular expressions in practice
 * URL
 */
public class Reg17 {
    public static void main(String[] args) {
        String content="https://www.bilibili.com/v/knowledge/?spm_id_from=333.1007.0.0";
        /**
         * Approach
         * 1. First pin down how the URL starts: https:// or http://
         * 2. Then use (([\w-])+\.)+[\w-]+ to match www.bilibili.com
         * 3. Finally (\\/[\w-?=&/%.#]*)? matches /v/knowledge/?spm_id_from=333.1007.0.0
         */
        String regStr="^((http|https)://)(([\\w-])+\\.)+[\\w-]+(\\/[\\w-?=&/%.#]*)?$";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()){
            System.out.println("Format OK");
        }else {
            System.out.println("Format not OK");
        }
    }
}
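
To get a feel for what this URL pattern accepts and rejects, it helps to run it against a few inputs. The sketch below uses a lightly tidied variant of the pattern (the hyphen is moved to the end of the last character class so it is unambiguously literal); the class `UrlCheckSketch`, the method `isHttpUrl`, and the `example.com` inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

final class UrlCheckSketch {
    // Same structure as the pattern in Reg17; only the hyphen position in the last class differs.
    private static final Pattern HTTP_URL = Pattern.compile(
            "^((http|https)://)(([\\w-])+\\.)+[\\w-]+(/[\\w?=&/%.#-]*)?$");

    static boolean isHttpUrl(String s) {
        return HTTP_URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHttpUrl("https://www.bilibili.com/v/knowledge/?spm_id_from=333.1007.0.0")); // true
        System.out.println(isHttpUrl("http://example.com"));        // true: the path is optional
        System.out.println(isHttpUrl("ftp://example.com"));         // false: only http and https
        System.out.println(isHttpUrl("https://example"));           // false: the host needs a dot
        System.out.println(isHttpUrl("http://example.com:8080/x")); // false: no port part in the pattern
    }
}
```

The last case shows a limitation rather than a bug: the pattern was written for URLs without a port.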

4.6 The classic stutter program

Take a string such as "我...我要..学学学学...编程java!"
and use regular expressions to turn it into "我要学编程java!"

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg21 {
    public static void main(String[] args) {
        String content ="我...我要..学学学学...编程java!";
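        // 1. Remove every .
        Pattern pattern = Pattern.compile("\\.");
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");
        System.out.println("content="+content);
        // 2. Collapse repeated characters
        /**
         * Approach
         * 1. Use (.)\1+ to find a character followed by one or more copies of itself
         * 2. Because the regular expression changes, a new matcher is needed
         * 3. Use the backreference $1 to replace each match with a single copy
         */

        // Collapse the repeated characters in a single statement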
        content = Pattern.compile("(.)\\1+").matcher(content).replaceAll("$1");
        System.out.println(content);
    }
}
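
The two steps can also be chained with `String.replaceAll`. This is a minimal sketch, not the original author's code; the class `StutterSketch` and the method `collapse` are hypothetical, and the Chinese sample is written with Unicode escapes so the source stays ASCII-only.

```java
final class StutterSketch {
    // Step 1 removes every literal dot; step 2 replaces each run of a
    // repeated character with a single copy via the outer reference $1.
    static String collapse(String s) {
        return s.replaceAll("\\.", "").replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("aa..bbb.c")); // abc
        // The sentence from this section, spelled with Unicode escapes.
        String stutter = "\u6211...\u6211\u8981..\u5b66\u5b66\u5b66\u5b66...\u7f16\u7a0bjava!";
        System.out.println(collapse(stutter));
        // Caveat: legitimate double letters collapse as well.
        System.out.println(collapse("hello")); // helo
    }
}
```

The last line shows why this trick suits stutter clean-up but not general text: any doubled character is treated as a stutter.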

5.1 Pattern class: the matches method

Whole-string matching

package Regexp_;

import java.util.regex.Pattern;

/**
 * The matches method performs a whole-string match; it is useful for validating that an input string meets a condition
 *
 */
public class Reg18 {
    public static void main(String[] args) {
        String content ="hello abc hello,小明";
        String regStr="hello.*";

        boolean matches = Pattern.matches(regStr, content);
        System.out.println("Whole-string match="+matches);
    }
}
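
Whole-string matching is easiest to understand next to the other two matching modes of `Matcher`: `lookingAt()` (match at the start) and `find()` (match anywhere). The sketch below wraps the three in small helpers; the class `MatchModesSketch` and the helper names are made up for illustration.

```java
import java.util.regex.Pattern;

final class MatchModesSketch {
    // The entire input must match the pattern.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // The pattern must match starting at index 0, but may stop early.
    static boolean prefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // The pattern may match anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String content = "hello abc";
        System.out.println(whole("hello", content));    // false: " abc" is left over
        System.out.println(whole("hello.*", content));  // true
        System.out.println(prefix("hello", content));   // true
        System.out.println(prefix("abc", content));     // false
        System.out.println(anywhere("abc", content));   // true
    }
}
```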

6. Using regular expressions with the String class

6.1 Replacing text

package Regexp_;

/**
 * Using regular expressions with the String class
 */
public class Reg22 {
    public static void main(String[] args) {
        String content ="2000年5月，JDK1.3、JDK1.4和J2SE1.3相继发布，几周后其获得了Apple公司Mac OS X的工业标准的支持。" +
                "2001年9月24日，J2EE1.3发布";
        // Use a regular expression to replace JDK1.3 and JDK1.4 with JDK
        content = content.replaceAll("JDK1\\.3|JDK1\\.4", "JDK");
        System.out.println(content);
    }
}
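
Two details of `replaceAll` are easy to miss: the first argument is a regular expression (so an unescaped `.` matches any character), while `String.replace` treats its argument literally. The sketch below shows both, plus a shorter spelling of the alternation using a character class. The class `ReplaceSketch`, the method `stripMinorVersion`, and the sample strings are assumptions for illustration.

```java
final class ReplaceSketch {
    // JDK1\\.[34] is a compact equivalent of JDK1\\.3|JDK1\\.4.
    static String stripMinorVersion(String text) {
        return text.replaceAll("JDK1\\.[34]", "JDK");
    }

    public static void main(String[] args) {
        System.out.println(stripMinorVersion("JDK1.3, JDK1.4, JDK1.5")); // JDK, JDK, JDK1.5
        // Unescaped dot: the regex JDK1.3 also matches "JDK1x3".
        System.out.println("JDK1x3".replaceAll("JDK1.3", "JDK")); // JDK
        // replace() is literal, so nothing matches here.
        System.out.println("JDK1x3".replace("JDK1.3", "JDK"));    // JDK1x3
    }
}
```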

6.2 Testing for a match

package Regexp_;

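/**
 * Using regular expressions with the String class
 */
public class Reg23 {
    public static void main(String[] args) {
        // Validate a mobile number that must start with 138 or 139
        // (this sample starts with 137, so the check fails)
        String content="13788899999";
        boolean matches = content.matches("^1(38|39)\\d{8}");
        if (matches){
            System.out.println("Validation succeeded");
        }else {
            System.out.println("Validation failed");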
        }
    }
}
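
`String.matches` always tests the whole string, so the leading `^` in the pattern above is optional and no trailing `$` is needed. The sketch below checks this with the anchors left out; the class `StringMatchesSketch`, the method `startsWith138Or139`, and the sample numbers are hypothetical.

```java
final class StringMatchesSketch {
    // No ^ or $: String.matches already requires the entire input to match.
    static boolean startsWith138Or139(String phone) {
        return phone.matches("1(38|39)\\d{8}");
    }

    public static void main(String[] args) {
        System.out.println(startsWith138Or139("13888899999"));  // true
        System.out.println(startsWith138Or139("13788899999"));  // false: starts with 137
        System.out.println(startsWith138Or139("138888999990")); // false: one digit too many
    }
}
```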

6.3 Splitting

package Regexp_;

/**
 * Using regular expressions with the String class
 * Splitting
 * Split on #, -, ~ or a run of digits
 */
public class Reg24 {
    public static void main(String[] args) {
        String content ="hell#lo-jack12sminth~本经";
        String[] strings = content.split("#|-|~|\\d+");
        for (String s : strings) {
            System.out.println(s);
        }

    }
}
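
`split` has two edge cases worth knowing: a delimiter at the very start produces a leading empty string, while trailing empty strings are dropped. The sketch below uses the same delimiter pattern as Reg24 on ASCII inputs; the class `SplitSketch`, the method `splitFields`, and the sample strings are made up for illustration.

```java
import java.util.Arrays;

final class SplitSketch {
    // Same delimiter pattern as in Reg24: #, -, ~ or a run of digits.
    static String[] splitFields(String s) {
        return s.split("#|-|~|\\d+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("hell#lo-jack12smith~end"))); // [hell, lo, jack, smith, end]
        // A leading delimiter yields an empty first element.
        System.out.println(Arrays.toString(splitFields("#a-b")));  // [, a, b]
        // Trailing empty strings are removed.
        System.out.println(Arrays.toString(splitFields("a#b##"))); // [a, b]
    }
}
```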

6.4 Validating an email address

package Regexp_;

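/**
 * Validate an email address
 * The rules for a valid address are:
 * 1. It contains exactly one @
 * 2. The part before @ is the user name, made of the characters a-z, A-Z, 0-9, _ and -
 * 3. The part after @ is the domain name, made only of English letters, e.g. sohu.com or tsinghua.org.cn
 * 4. Write the corresponding regular expression and check whether the input string satisfies the rules
 */
public class Reg25 {
    public static void main(String[] args) {
        String content="123464@qq.com.cn";

        boolean matches = content.matches("^[\\w-]+@([a-zA-Z]+\\.)+[a-zA-Z]+$");
        if (matches){
            System.out.println("Match");
        }else{
            System.out.println("No match");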
        }
    }
}
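
The rules above are deliberately simplified, so it is worth checking which addresses they admit. The sketch below runs the same pattern over a few inputs; the class `EmailRuleSketch`, the method `followsRules`, and the sample addresses other than `123464@qq.com.cn` are assumptions for illustration.

```java
final class EmailRuleSketch {
    // The pattern from Reg25; matches() already anchors both ends.
    static boolean followsRules(String s) {
        return s.matches("[\\w-]+@([a-zA-Z]+\\.)+[a-zA-Z]+");
    }

    public static void main(String[] args) {
        System.out.println(followsRules("123464@qq.com.cn")); // true
        System.out.println(followsRules("user@sohu.com"));    // true
        System.out.println(followsRules("a@b@c.com"));        // false: more than one @
        System.out.println(followsRules("user@sohu"));        // false: the domain needs a dot
        System.out.println(followsRules("user@163.com"));     // false: digits are not allowed in the domain
    }
}
```

The last case shows that these exercise rules are stricter than real-world addresses, where digits in a domain are common.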

6.5 Checking for an integer or a decimal

Both positive and negative numbers must be handled,
for example: 123 -345 34.89 -87.9 -0.01 0.45

package Regexp_;

/**
 * Check whether the input is an integer or a decimal
 * Hint: both positive and negative numbers must be handled
 * For example: 123 -345 34.89 -87.9 -0.01 0.45
 */
public class Reg26 {
    public static void main(String[] args) {
        String content ="0.36";
        String regStr="^[-+]?([1-9]\\d*|0)(\\.\\d+)?$";
        /**
         * 1. Start with a simple regular expression
         * 2. Then refine it step by step, case by case
         */
        if (content.matches(regStr)){
            System.out.println("Match: this is an integer or a decimal");
        }else {
            System.out.println("No match");
        }
    }
}
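
The "refine it case by case" advice is easiest to follow with a list of inputs to try. The sketch below runs the pattern from Reg26 over the examples from this section and a few edge cases; the class `NumberFormatSketch`, the method `isNumber`, and the edge-case inputs are made up for illustration.

```java
final class NumberFormatSketch {
    // Optional sign, then either 0 or a number without leading zeros,
    // then an optional fractional part with at least one digit.
    static boolean isNumber(String s) {
        return s.matches("[-+]?([1-9]\\d*|0)(\\.\\d+)?");
    }

    public static void main(String[] args) {
        String[] samples = {"123", "-345", "34.89", "-87.9", "-0.01", "0.45", "+0",
                            "007", "1.", ".5", "-"};
        for (String sample : samples) {
            // The first seven print true, the last four print false.
            System.out.println(sample + " -> " + isNumber(sample));
        }
    }
}
```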

6.6 Parsing a URL

package Regexp_;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg27 {
    public static void main(String[] args) {
        String content="http://www.sohu.com:8080/abc//sdsd/inde@#*/x.htm";
        // The regular expression is written to fit the requirement
        String  regStr="^([a-zA-Z]+)://([a-zA-Z.]+):(\\d+)[\\w-/]*/([\\w.@#*/]+)$";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.matches()){ // whole-string match; on success, group(x) returns the text of the corresponding group
            System.out.println("Whole match="+matcher.group(0));
            System.out.println("Protocol: "+matcher.group(1));
            System.out.println("Host: "+matcher.group(2));
            System.out.println("Port: "+matcher.group(3));
            // the last class also allows '/', so for this content group 4 is inde@#*/x.htm, not just x.htm
            System.out.println("File: "+matcher.group(4));
        }else {
            System.out.println("No match");
        }

    }
}
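
Numbered groups become hard to read once a pattern has four of them. Java also supports named groups, `(?<name>...)`, read back with `matcher.group("name")`. The sketch below is a variant of the parsing idea above under simplifying assumptions: the file part allows only word characters and dots, and the class `NamedGroupSketch`, the method `part`, and the group names are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupSketch {
    // Each captured part gets a name instead of a position.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "^(?<protocol>[a-zA-Z]+)://(?<host>[a-zA-Z.]+):(?<port>\\d+)[\\w/-]*/(?<file>[\\w.]+)$");

    // Returns the named part of the URL, or null when the URL does not match.
    static String part(String url, String name) {
        Matcher matcher = URL_PATTERN.matcher(url);
        return matcher.matches() ? matcher.group(name) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.sohu.com:8080/abc/index.htm";
        System.out.println(part(url, "protocol")); // http
        System.out.println(part(url, "host"));     // www.sohu.com
        System.out.println(part(url, "port"));     // 8080
        System.out.println(part(url, "file"));     // index.htm
    }
}
```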

Original article: https://blog.csdn.net/qq_45498432/article/details/125469634

3.0 Metacharacters: the escape sequence `\\`